PS5 Die Shot has been revealed

Those (top) are 3xxx series Zen 2 ('Matisse') cores. It looks like both PS5 and Series S/X are using Zen 2 'Renoir' cores, as found in the mobile parts (4xxx series) - that's 4MB of L3 cache per 4-core CCX.

Both the PS5 and Series CPU cores look very similar to Renoir cores imo - there are some images already in this thread, e.g. here https://www.neogaf.com/threads/ps5-die-shot-has-been-revealed.1591559/page-3#post-262358033

tldr - both Series X/S and PS5 have mobile Zen 2 cores, closer to "supercharged" mobile Zen chips (with extra CUs)..
Yep, precisely. I have said this many times on this forum and on Twitter that both CPUs are seemingly derivatives of Renoir.
 
so ps5 lost avx256 to just avx128, kinda disappointing, here i was excited to see how the apu handles the extra load of avx256 ...guess it can't/won't. 🤷‍♀️
That is very likely not true.

Seems like they cut other things from the FPU that it won't use, but native 256-bit is there, just like Cerny said.
 
Those (top) are 3xxx series Zen 2 ('Matisse') cores. It looks like both PS5 and Series S/X are using Zen 2 'Renoir' cores, as found in the mobile parts (4xxx series) - that's 4MB of L3 cache per 4-core CCX.

Both the PS5 and Series CPU cores look very similar to Renoir cores imo - there are some images already in this thread, e.g. here https://www.neogaf.com/threads/ps5-die-shot-has-been-revealed.1591559/page-3#post-262358033

tldr - both Series X/S and PS5 have mobile Zen 2 cores, closer to "supercharged" mobile Zen chips (with extra CUs)..

I mean, that shouldn't be surprising, right? The 4000 series makes sense in a (relatively) low-profile and low-power system.
 
this image should be in the first post.

so ps5 lost avx256 to just avx128, kinda disappointing, here i was excited to see how the apu handles the extra load of avx256 ...guess it can't/won't. 🤷‍♀️

so the bottom parts are the custom io and the audio/modified cu. that's a lot of silicon for io? feels like sony over engineered that for future hw/pro revision
Please don't make the same mistake I did and not read the entire thread before posting. I made an ill-informed post about ray tracing because I didn't read everything.
 
For RDNA 1, yeah, and they boasted the exact same gains for RDNA 2. 25 + 25 = 50%.
I thought IPC was identical between RDNA 1 & 2 (not to be confused with efficiency gains). Where in the pipeline is RDNA 2 performing extra calculations per clock? The diagrams they've released look identical to RDNA 1 to me.
 
 
I believe Cerny told the truth. It's RDNA2. But the fanboys took it to the next level with magical RDNA3 applications.

I think people keep going crazy over the mention of RDNA3.

Having a tweak in ps5/XsX that ends up also being in RDNA3 really wouldn't be surprising. Thinking that tweak is gonna boost the performance 2x is crazy though.
 
Alright, that's interesting, but it's in total contradiction with AMD's RDNA 2 presentation about the RDNA 2 CUs. Could it simply be a mistake by the person speaking?
Because otherwise I think they are only presenting the same gains over GCN as the RDNA 1 CUs, which would mean these are indeed RDNA 1 CUs (which I don't believe), or maybe they had to make concessions to fit that many CUs on the die (I'm not sure about that either).
Honestly I believe this is a simple mistake made by the person speaking.
You can look up the RDNA 1/2 IPC gains slide from AMD.

Edit: Or it could be a scenario where we are not talking about exactly the same thing; it could simply be a difference between theoretical and practical gains.

He didn't make a mistake; it's accurate. There are no traditional IPC gains in the Compute Units going from RDNA 1 to RDNA 2. The only IPC gain comes strictly from the addition of Infinity Cache. The additional 25% performance uplift from RDNA 1 to RDNA 2 comes entirely from roughly 30% higher clock speeds, which is not the same as there being an IPC improvement.
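As a rough sanity check (a simplified model that ignores memory and cache effects): per-CU performance scales with work per clock times clock frequency, so holding IPC flat and raising clocks by ~30% already lands in the ballpark of the quoted uplift.

```latex
\frac{\mathrm{perf}_{\mathrm{RDNA2}}}{\mathrm{perf}_{\mathrm{RDNA1}}}
\;\approx\;
\frac{\mathrm{IPC}_{2}\, f_{2}}{\mathrm{IPC}_{1}\, f_{1}}
\;=\; 1.00 \times 1.30 \;=\; 1.30
```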
 
I think people keep going crazy over the mention of RDNA3.

Having a tweak that ends up also being in RDNA3 really isn't crazy. That tweak might only provide a 1% boost in performance, however.

I think it was that fucking square faced idiot "Moore's Law is Dead" talking shit through his ass...
 
I think it was that fucking square faced idiot "Moore's Law is Dead" talking shit through his ass...

Maybe I'm out of the loop and people really are claiming the PS5 is using full-on RDNA 3 or something, but I just assume that when people talk about RDNA 3 they mean some small customizations will end up integrated in the next version of RDNA. Like the cache scrubbers or something.
 
Lmao I love all the dismissive fanboy comments. Typical loser replies when they lost the battle. "I never really cared anyway." Cracks me up every time.
 
I recall that when I told people I didn't buy that nonsense from YouTubers about a unified L3 cache on the PS5 CPU, I was attacked for saying so. Surely people have to realize those guys are just making stuff up now, right? They literally just say whatever people want to hear to get their video views and likes up.
 
So why is it performing as well as it has?

Something seems wrong.

Honestly this makes the performance comparisons even more confusing.



Maybe Sony's goal was to eliminate all bottlenecks and design an incredibly efficient system?

That's all that I can think of to be honest.

None of these early games are remotely tapping what these consoles can do, not a one. People may also not want to hear this, but it is true that some work needs to be done on the Series X development side of things. Microsoft built a brand new development kit, and there are quite a few new features Microsoft is pushing with DX12 Ultimate that no developer/game is yet taking advantage of. It will take time for developers to come to grips with all of this.
 
this image should be in the first post.

so ps5 lost avx256 to just avx128, kinda disappointing, here i was excited to see how the apu handles the extra load of avx256 ...guess it can't/won't. 🤷‍♀️

so the bottom parts are the custom io and the audio/modified cu. that's a lot of silicon for io? feels like sony over engineered that for future hw/pro revision
More FUD from you.

The CPU does support AVX256.
Mark Cerny said:
PlayStation 5 is especially challenging because the CPU supports 256-bit native instructions that consume a lot of power.
 
Are we still getting a sharper X-ray? This is too blurry for my expert eyes.

So did Sony downgrade from AVX-256 to AVX-128?
 
Cerny said in March the PS5 supports AVX 256 bit native instructions.
I saw it mentioned here about the cut-down FPU.


As panjev mentioned, the PS5 can still support AVX-256 by running 2x AVX-128, but performance of course will be slower than native AVX-256.
As far as I know, to be considered 'native' it has to be 1x256, i.e. a 256-bit-wide FPU. I don't think Cerny is lying.
They can cut FPU units (there are multiple units per core) and still have native 256-bit, just with lower FP32 IPC.

e.g. (AnandTech) "support for two AVX-256 instructions per cycle" [edit: it seems they mean 1 multiply + 1 add, so not sure now] - so they could halve that and still have native AVX-256 doing one per clock. I'm not sure they have, though. (Also, the idea that they removed FADD seems ridiculous - now the computer can't do floating point additions?!)
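For anyone wondering what "running 2x AVX-128" actually looks like, here is a minimal sketch (assumes an x86-64 compiler with AVX enabled, e.g. -mavx; the function names are made up for illustration): the same eight-float add written once as a native 256-bit operation and once split into two 128-bit halves, which is roughly how a narrower FPU would execute it internally.

```c
#include <immintrin.h>

/* One native AVX-256 addition: eight floats in a single instruction. */
__m256 add8_native(__m256 a, __m256 b) {
    return _mm256_add_ps(a, b);
}

/* The same work split into two 128-bit (SSE-width) halves - roughly what a
 * 128-bit-wide FPU does internally when executing a 256-bit instruction. */
__m256 add8_split(__m256 a, __m256 b) {
    __m128 lo = _mm_add_ps(_mm256_castps256_ps128(a),
                           _mm256_castps256_ps128(b));
    __m128 hi = _mm_add_ps(_mm256_extractf128_ps(a, 1),
                           _mm256_extractf128_ps(b, 1));
    return _mm256_insertf128_ps(_mm256_castps128_ps256(lo), hi, 1);
}
```

On hardware with full 256-bit data paths the first form completes as a single operation per unit; on a 128-bit-wide FPU both forms end up as two 128-bit operations, which is where the roughly-half-throughput intuition comes from.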
 
Seems like Sony/AMD redesigned the Zen 2 FPU, because it is smaller than a typical Zen 2 APU's.

The speculation was that they removed support for native 256-bit operations, but that was discarded after some more size comparisons, so now the theory is that some unused units were removed, like FADD.

Why did Sony/AMD redesign the Zen 2 FPU?
What was changed?
Nobody knows yet.

Outside of that, there is a big part of the silicon that nobody has identified yet... the I/O complex, co-processors and Tempest unit are probably in there, but where exactly is still a mystery.

Edit - A bit simpler, I guess.



 
Can someone explain to me how exactly people came to the conclusion that there is a difference in the CUs?

Actually, I can't make out shit at this resolution. Same for the GPU frontend.
 
Because it means a lot less pressure for the system. But the thing with AVX-256 is just a guess. Zen 2 is fast even without it. And AVX instructions are a nice-to-have: you can do everything even without AVX, but AVX support makes it faster/more efficient. More efficient doesn't mean it needs less power - it needs more, over a shorter time. E.g. Intel CPUs clock themselves down into <3GHz territory when AVX(-256) is really used, and the CPUs still draw a lot of power and generate a lot of heat. You can do the same work with less power and less heat, but then you also need more time.
Heat and power are critical for the PS5, because if the CPU needs more power, it is taken from the shared budget, which means the GPU must clock down because it has less power available.
If you remove AVX-256 from the equation, you might need a bit longer on the CPU side for your calculations, but you don't need as big a chunk of the power budget.

AVX instructions are great, but can really hurt your power budget. Btw, you need more than 2x the cycles if you want to do 256-bit calculations with only 128-bit support. But there is a high chance that games won't use those instructions too often, so removing them might be a great way to reduce the overall power draw of the CPU.
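To illustrate that "nice to have" point, here is a minimal sketch of how game code typically handles it (the function names are hypothetical; assumes GCC/Clang on x86-64): the vector width is picked at runtime, so a CPU without fast 256-bit support just takes a slower path rather than losing functionality.

```c
#include <immintrin.h>
#include <stddef.h>

/* Portable fallback: works without AVX, just takes more cycles per element. */
static void add_arrays_scalar(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* AVX path: eight floats per iteration. target("avx") lets GCC/Clang emit AVX
 * for this one function even if the rest of the file is built without -mavx. */
__attribute__((target("avx")))
static void add_arrays_avx(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(dst + i,
                         _mm256_add_ps(_mm256_loadu_ps(a + i),
                                       _mm256_loadu_ps(b + i)));
    for (; i < n; i++)  /* scalar tail for the remainder */
        dst[i] = a[i] + b[i];
}

/* Runtime dispatch: take the wide path only when the CPU reports AVX support. */
void add_arrays(float *dst, const float *a, const float *b, size_t n) {
    if (__builtin_cpu_supports("avx"))
        add_arrays_avx(dst, a, b, n);
    else
        add_arrays_scalar(dst, a, b, n);
}
```

Which is why narrowing or dropping wide AVX mostly costs time on hot loops rather than breaking anything.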

But that is all speculation. And not some dubious insider info ;)
Hey, yes but it is interesting speculation :).

I am aware of Intel chips downclocking heavily when AVX is used on their cores (especially with the Larrabee-derived AVX-512 extension set), but that was my point. Cerny actively made the point about power-hungry 256-bit instructions, and there is no reason why four 128-bit vector units with 128-bit data paths (Zen 1) should consume the same power as two 256-bit vector units with 256-bit data paths (Intel and Zen 2), so AVX-256 on a Zen 1-style floating point unit should not be a monster of power consumption at all. So why call out 256-bit operations when you are making the case about worst-case power consumption? Something is not adding up. Still, it is roughly 2x the clock cycles but 2x the units (AVX-256 is faster on the comparable Intel CPUs of the time, which had "proper" full-width AVX-256 SIMD units, but not THAT much).

References:
* https://www.anandtech.com/show/14525/amd-zen-2-microarchitecture-analysis-ryzen-3000-and-epyc-rome/9
* https://www.agner.org/optimize/blog/read.php?i=838#838
 
Seems like Sony/AMD redesigned the Zen 2 FPU, because it is smaller than a typical Zen 2 APU's.

The speculation was that they removed support for native 256-bit operations, but that was discarded after some more size comparisons, so now the theory is that some unused units were removed, like FADD.

Why did Sony/AMD redesign the Zen 2 FPU?
What was changed?
Nobody knows yet.

Outside of that, there is a big part of the silicon that nobody has identified yet... the I/O complex, co-processors and Tempest unit are probably in there, but where exactly is still a mystery.

Edit - A bit simpler, I guess.




Thanks! Though I still don't really understand; it's definitely above my pay grade lol.

Anyway, it doesn't really matter to me because I'm not the target audience. All I know (and should care about) is that if this generation PS5 can run games with the graphical fidelity of Demon's Souls at 60 FPS, and the loading time of Ratchet & Clank, and potential future games like the Unreal Engine 5 Demo (all of which we have seen running directly on the PS5 hardware), I'm a happy man.
 


This guy analyzed the XSX GPU X-ray and found that the console's GPU is closer to RDNA 1 than to the desktop RDNA 2 GPUs.

He says the base has a lot of the RDNA 1 architecture, with RDNA 2 features.
 
Alright, that's interesting, but it's in total contradiction with AMD's RDNA 2 presentation about the RDNA 2 CUs. Could it simply be a mistake by the person speaking?
Because otherwise I think they are only presenting the same gains over GCN as the RDNA 1 CUs, which would mean these are indeed RDNA 1 CUs (which I don't believe), or maybe they had to make concessions to fit that many CUs on the die (I'm not sure about that either).
Honestly I believe this is a simple mistake made by the person speaking.
You can look up the RDNA 1/2 IPC gains slide from AMD.

Edit: Or it could be a scenario where we are not talking about exactly the same thing; it could simply be a difference between theoretical and practical gains.

It is possible... it is also possible, as the quoted AMD leaker suggests, that both Sony and MS played a bit fast and loose with the "RDNA2 based" label and that both chips contain a mix of RDNA 1 and RDNA 2 features (and some that will be released outside of consoles a bit later, too).
 
Hey, yes but it is interesting speculation :).

I am aware of Intel chips downclocking heavily when AVX is used on their cores (especially with the Larrabee-derived AVX-512 extension set), but that was my point. Cerny actively made the point about power-hungry 256-bit instructions, and there is no reason why four 128-bit vector units with 128-bit data paths (Zen 1) should consume the same power as two 256-bit vector units with 256-bit data paths (Intel and Zen 2), so AVX-256 on a Zen 1-style floating point unit should not be a monster of power consumption at all. So why call out 256-bit operations when you are making the case about worst-case power consumption? Something is not adding up. Still, it is roughly 2x the clock cycles but 2x the units (AVX-256 is faster on the comparable Intel CPUs of the time, which had "proper" full-width AVX-256 SIMD units, but not THAT much).

References:
* https://www.anandtech.com/show/14525/amd-zen-2-microarchitecture-analysis-ryzen-3000-and-epyc-rome/9
* https://www.agner.org/optimize/blog/read.php?i=838#838

Maybe it was a last-minute downgrade after Mark's speech? They were unable to run high GPU clocks with true AVX-256 and reverted to Zen 1-style AVX?

In Zen 2, when you run AVX-256 loads like Prime95, it just falls to its base clocks; on Intel, when you run AVX-256, it falls below base clocks IIRC.
 
Maybe it was a last-minute downgrade after Mark's speech? They were unable to run high GPU clocks with true AVX-256 and reverted to Zen 1-style AVX?

In Zen 2, when you run AVX-256 loads like Prime95, it just falls to its base clocks; on Intel, when you run AVX-256, it falls below base clocks IIRC.
I think it is more likely they cut out some of the units they saw least used in their code, to cut down complexity. They do not need to run Windows and they are not running desktop-optimised software, so it is possible that, as one of the commenters mentioned, they cut out one of the pipes (there is more than just a single AVX-256 unit there) that was used the least, to save a bit on power but mostly on area (x8 cores).
On CELL for the PS3 they did, for example, optimise memory accesses around 32-bit addresses, despite the PowerPC architecture it was based on supporting 64-bit addresses.

I find it a lot less likely that they re-engineered an otherwise untouched Zen 2 core to remove the FP resources and tack on the Zen 1 ones (it is likely even more invasive than it sounds)... if they had time for that, they would have had time for the Zen 3 shared cache 😂.
When Cerny gave his speech the SoC was pretty much finalised and possibly already trialling mass production; few things could have changed at that point. What you suggest would have meant verifying the chip all over again. Mmh, a bit too late in the process for such a "feat" IMHO.
 
I think it is more likely they cut out some of the units they saw least used in their code, to cut down complexity. They do not need to run Windows and they are not running desktop-optimised software, so it is possible that, as one of the commenters mentioned, they cut out one of the pipes (there is more than just a single AVX-256 unit there) that was used the least, to save a bit on power but mostly on area (x8 cores).
On CELL for the PS3 they did, for example, optimise memory accesses around 32-bit addresses, despite the PowerPC architecture it was based on supporting 64-bit addresses.

I find it a lot less likely that they re-engineered an otherwise untouched Zen 2 core to remove the FP resources and tack on the Zen 1 ones (it is likely even more invasive than it sounds)... if they had time for that, they would have had time for the Zen 3 shared cache 😂.
When Cerny gave his speech the SoC was pretty much finalised and possibly already trialling mass production; few things could have changed at that point. What you suggest would have meant verifying the chip all over again. Mmh, a bit too late in the process for such a "feat" IMHO.

They could have been pushing things until the final moments, with two final designs: one downgraded, one best-case scenario.

Still, the loss of full AVX-256 is disappointing. I think that kind of FPU logic helps with games, and I remember some devs were excited about finally getting real AVX-256. So it's likely the PS5 runs it at half speed?
 