PS5 Die Shot has been revealed

Sony never said shit about Microsoft and never said "no cross gen games."

Find the lie. (Spoiler- there is none.)

"We have always said that we believe in generations. We believe that when you go to all the trouble of creating a next-gen console, that it should include features and benefits that the previous generation does not include. And that, in our view, people should make games that can make the most of those features."

"We do believe in generations, and whether it's the DualSense controller, whether it's the 3D audio, whether it's the multiple ways that the SSD can be used... we are thinking that it is time to give the PlayStation community something new, something different, that can really only be enjoyed on PS5."

- Jimmy Sony


It's standard marketing saying "PS5 is not just a PS4 Pro 2.0." It's hyping the hardware.
Amazing how different the quote looks in context. ;)
 
Even when PS4 had the obvious power crown, Sony wasn't anywhere near as adamant about it.
I thought their touting of the Xbox One X as the most powerful console ever was kind of odd, seeing as Sony had beaten them to market a year in advance, and while being $100 cheaper at that. What was most bizarre was that they revealed the X1X six months before the PS4 Pro was even revealed and said it would be the most powerful console ever. So they knew about the PS4 Pro being 4 TFLOPs in advance, and basically came up with specs on the fly, not caring at all about being a year late and more expensive. Surely that's not that big of an accomplishment.

They tried doing the same this time around, making a huge deal over an 18% TFLOPs advantage while being $100 more expensive, and they were still outdone by the PS5's SSD speeds. What would've been impressive is if they had come in at $299 with 15 TFLOPs and the same SSD speeds, because that's what Sony did back in 2013, which apparently traumatized Phil so much.

With all that said, I am not too happy with how Sony has gone about disclosing this stuff either. They should've just come out and said, "We don't support VRS and Mesh Shaders, but our primitive shaders and other techniques can do what those RDNA 2.0 features can do."

P.S. If I were an Xbox fan, I would be pretty fucking upset over no one utilizing these fancy RDNA 2.0 features. Like, what's the point of 12 TFLOPs, mesh shaders, machine learning and VRS if no one had any games ready for launch, or even a year after launch? The mesh shader benchmarks show an insane performance jump of 1700%; imagine what that means for photorealistic visuals in games. Surely the focus should've been to blow us away with visuals instead of expecting us to jerk off to specs.
 
Xbox has RDNA 2 because they waited until the end: AMD finished this cutting-edge technology in May, which is why the Xbox developer kits were not ready until the last minute, and because of that the first games were poorly optimized for the Xbox Series consoles.


---"In our quest to put gamers and developers first we chose to wait for the most advanced technology from our partners at AMD before finalizing our architecture."

Microsoft waited for AMD to finish RDNA 2; Sony didn't.
Till the end? So when was the end? Do you really think the spec for RDNA 2 was finalised in May 2020??? Really??? Hahahahaha. It's amazing how MS somehow made this chip sometime in May. So what was Phil taking home in Dec 2019?
I suggest you read up on silicon bring-up.
 
First we have a die shot of a 6800 and its Infinity Cache:

die-shot-color-front.jpg

Then the low-res PS5 die shot, with a similar outline around the GPU section:
Eu-M4-XEWYAYKn-Mk.jpg

And now the latest higher-res shot, showing there are things present in these seemingly empty channels:
50947289326-07f4ba7345-o.jpg

Of note to me are the little uniform rectangles, similar to those in the 6800 die shot, that appear on alternating sides of the "infinity cache".

Do any experts know what these are?

It seems very strange, when space is at a premium, to have these channels; at first, with the image the other day, I thought they were empty, but there is clearly stuff there.

Anyway, I'm here to be enlightened by someone more knowledgeable.

This...might potentially be interesting. Gonna have to look more into it. At first I was going to say this proved nothing but then I looked at the following:

10aa1328-1539-4735-928b-70e9283cc389.PNG


And there's also this:

z1k7j.jpg


If you look at the graphic Loxus posted, the IC, if present, would fit in two likely 8 MB banks per side. I'm trying to visualize a scale from the last image linked here.

I still won't outright say IC is there, but IF it's present, there are enough design parallels between the PS5's GPU setup and the RDNA 2 GPU references linked to support some possibility of a 16 MB (low end) to 32 MB (upper limit) Infinity Cache. That'd theoretically be enough for a 4K framebuffer (24.8832 MB), though that's probably not actually enough for a true 4K framebuffer, since it seems 128 MB is "just enough" for 4K on PC RDNA 2 GPUs. For smaller framebuffers, though, 32 MB would be pretty good; 16 MB less so, but we can't rule that out either.
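As a quick check on that 24.8832 MB figure, the arithmetic works out exactly if you assume 3 bytes per pixel and decimal megabytes (my assumption; a 4-bytes-per-pixel RGBA target needs more):

```latex
3840 \times 2160 \times 3\ \text{bytes} = 24{,}883{,}200\ \text{bytes} = 24.8832\ \text{MB}
\qquad\text{(at 4 B/px: } 33.1776\ \text{MB)}
```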

And again, we CAN'T rule out the possibility that there's no IC present at all; I'm just saying, going off these die shots and graphs, that there's some possibility a limited-capacity IC implementation is present in the PS5, that's all. Ironically, at 32 MB that'd be roughly the same size as the eSRAM in the Xbox One. (I think AMD took some inspiration from the eSRAM on XBO for going with IC on the RDNA 2 GPUs, at least partially. Maybe that's kind of ironic, I dunno.)
 
Nvidia wants to use, for their Nvidia I/O (or whatever they call it), the same parts of the GPU which currently accelerate ray tracing, or was it the parts that do the ML stuff?!

Nvidia says they will use the GPU shader processors to decompress data faster than typical CPU cores, at 14 GB/s throughput, and they have shown the press some demos of this with faster loading times and lower CPU usage.

I am very skeptical of this, because:
1 - GPU shader processors are good for highly parallel code with hundreds of threads; data decompression AFAIK uses just 2 or 3 threads per file. Unless it's streaming hundreds of files every second, a CPU is a better fit for decompression.
2 - Nvidia just assumed a 14 GB/s throughput because that's twice the 7 GB/s limit of a PCIe 4.0 4x SSD. They made no mention of which compression format they were using to achieve a 2:1 compression ratio, which is really odd (see the quick check after this list).
3 - The gains they presented could be coming from DirectStorage alone.
4 - It depends on DirectStorage, which won't be ready for almost 2 years. So many things could happen in the meantime (like moving on from Ampere and forgetting Nvidia ever mentioned this).
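To make point 2 concrete, the quoted figure is exactly a doubling of the interface limit; a trivial sketch (the numbers come from the post above, not from any Nvidia documentation):

```python
# The 14 GB/s claim is exactly 2x the practical cap of a PCIe 4.0 x4 SSD,
# i.e. it implies an assumed 2:1 compression ratio rather than a measured one.
raw_ssd_gbps = 7.0             # PCIe 4.0 x4 NVMe ceiling, per the post
claimed_effective_gbps = 14.0  # Nvidia's quoted decompression throughput

implied_ratio = claimed_effective_gbps / raw_ssd_gbps
print(f"implied compression ratio: {implied_ratio:.1f}:1")  # -> 2.0:1
```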



I said this many times already.
DirectX Raytracing, Mesh Shaders, Sampler Feedback and Variable Rate Shading are features of DirectX 12 Ultimate and have nothing to do with RDNA 2.
Except that RDNA 2 GPUs support DirectX 12 Ultimate; even Nvidia supports DirectX 12 Ultimate.
Those fancy names are just Microsoft hyping up DirectX 12 Ultimate.
So you won't hear those fancy names on PS5, because Sony doesn't use DirectX 12 Ultimate; they have their own API with their own names for the same features.
This.
Microsoft bragging about being the "only one with DX12 Ultimate VRS" is like AMD bragging about being the only one with FreeSync. It may trick some people into thinking Nvidia cards don't support Variable Refresh Rate.
 
This...might potentially be interesting. Gonna have to look more into it. At first I was going to say this proved nothing but then I looked at the following:

10aa1328-1539-4735-928b-70e9283cc389.PNG


And there's also this:

z1k7j.jpg


If you look at the graphic Loxus posted, the IC, if present, would fit in two likely 8 MB banks per side. I'm trying to visualize a scale from the last image linked here.

I still won't outright say IC is there, but IF it's present, there are enough design parallels between the PS5's GPU setup and the RDNA 2 GPU references linked to support some possibility of a 16 MB (low end) to 32 MB (upper limit) Infinity Cache. That'd theoretically be enough for a 4K framebuffer (24.8832 MB), though that's probably not actually enough for a true 4K framebuffer, since it seems 128 MB is "just enough" for 4K on PC RDNA 2 GPUs. For smaller framebuffers, though, 32 MB would be pretty good; 16 MB less so, but we can't rule that out either.

And again, we CAN'T rule out the possibility that there's no IC present at all; I'm just saying, going off these die shots and graphs, that there's some possibility a limited-capacity IC implementation is present in the PS5, that's all. Ironically, at 32 MB that'd be roughly the same size as the eSRAM in the Xbox One. (I think AMD took some inspiration from the eSRAM on XBO for going with IC on the RDNA 2 GPUs, at least partially. Maybe that's kind of ironic, I dunno.)

If we use the UE5 demo as an indication, the PS5 will probably not run the majority of games at native 4K. The console will probably just use reconstruction techniques, or upscale from a lower resolution. With that said, a smaller Infinity-Cache-like cache could be all that it needs.

Remember, moving data around quickly and efficiently seems to be one of the main focuses of the PS5. A small Infinity-Cache-like cache could definitely help with that.

This is definitely interesting and needs to be studied further.
 
This...might potentially be interesting. Gonna have to look more into it. At first I was going to say this proved nothing but then I looked at the following:

10aa1328-1539-4735-928b-70e9283cc389.PNG


And there's also this:

z1k7j.jpg


If you look at the graphic Loxus posted, the IC, if present, would fit in two likely 8 MB banks per side. I'm trying to visualize a scale from the last image linked here.

I still won't outright say IC is there, but IF it's present, there are enough design parallels between the PS5's GPU setup and the RDNA 2 GPU references linked to support some possibility of a 16 MB (low end) to 32 MB (upper limit) Infinity Cache. That'd theoretically be enough for a 4K framebuffer (24.8832 MB), though that's probably not actually enough for a true 4K framebuffer, since it seems 128 MB is "just enough" for 4K on PC RDNA 2 GPUs. For smaller framebuffers, though, 32 MB would be pretty good; 16 MB less so, but we can't rule that out either.

And again, we CAN'T rule out the possibility that there's no IC present at all; I'm just saying, going off these die shots and graphs, that there's some possibility a limited-capacity IC implementation is present in the PS5, that's all. Ironically, at 32 MB that'd be roughly the same size as the eSRAM in the Xbox One. (I think AMD took some inspiration from the eSRAM on XBO for going with IC on the RDNA 2 GPUs, at least partially. Maybe that's kind of ironic, I dunno.)
I still doubt the PS5 chip has enough room for any L3 cache.

Edit - I think that area is just the fabric to connect all the SoC parts together.
 
This is for you, DJ12.
I highlighted the areas so you can see them better.
LhpXQDN.jpg


P.S. Before anyone comes at me for adding in Infinity Cache, this is all just speculation.
It might not be in the PS5 at all.
Some of that highlighted area has to be the Infinity Fabric stuff that connects everything, right? Or do console SoCs not need that?

The XSX SoC has what it calls "SoC fabric" in a similar area. I bet the PS5 is the same. I doubt it's cache.

tBTehD6.jpg
 
Interesting bit of "Sony must have something to hide" F.U.D. at the end there, though ;). I am not sure more than a handful of people are, or have ever been, really into AVX-256 support in Ryzen 1 vs Ryzen 2, just like nobody cared when Jaguar was announced to support single-cycle AVX-128... and right now we are still looking at a lower-clocked CPU (by 100-300 MHz) with supposedly slower 256-bit support (but not half speed in practice; check the performance of Ryzen 1 vs Ryzen 2 in FP code, it grows, but not by 2x AFAIK).

Sony stopped doing deep dives into the SoC after the PS3. What does that tell you about their faith in the PS4 SoC, then ;)?
I always thought Sony/Cerny was a bit more involved with promoting the PS4/Pro SoC, or at least from memory there was more info on it. Maybe he focused on different parts with the PS5 (I/O and Tempest); that felt different from GPU/CPU/RAM.

I can applaud myself even more, because those "supposedly" slower 256-bit CPU instructions with variable clocks on the PS5 perform better in some games.
That's why I'm interested to know more, now that we have more leaks about AVX-256 cutbacks and software VRS. It's interesting to see the PS5 punching above its weight, but for how long? Is it due to a more efficient hardware design, or just a temporary software/API lead?
 
So, I guess we can't skewer the RTG guy quite yet. LOL

Seems like something that Sony would have talked about though, especially when they were discussing the system bandwidth.
 
I don't get why every other post here is crying about missing Infinity Cache. Do you know what happens when any data residing in that cache becomes invalid? The whole cache gets flushed, resulting in a latency "tank" as the data has to be read from RAM instead, affecting all other code paths as well. Having a bigger unified cache won't help you; it will just make it worse.

If only RDNA2 had cache scrubbers, this scenario would be moot, since only the outdated data gets scrubbed from the cache, leaving the other data intact and ready to be consumed. It's a night and day difference.

This is such a clutch feature that AMD would be nuts not to incorporate it into RDNA 3.
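To make the flush-vs-scrub difference concrete, here's a toy model (entirely my own illustration; real GPU caches are nothing this simple, and this says nothing about how the PS5's scrubbers are actually wired):

```python
# Toy cache: compare a full flush on invalidation against selective scrubbing.
class ToyCache:
    def __init__(self):
        self.lines = {}  # address -> cached data

    def fill(self, addr, data):
        self.lines[addr] = data

    def invalidate_flush(self, stale_addrs):
        # Naive policy: any stale data forces a full flush, so even the
        # still-valid lines must be re-fetched from RAM afterwards.
        self.lines.clear()

    def invalidate_scrub(self, stale_addrs):
        # Scrubber-style policy: only the stale lines are evicted;
        # everything else stays hot in the cache.
        for addr in stale_addrs:
            self.lines.pop(addr, None)

cache = ToyCache()
for addr in range(8):
    cache.fill(addr, f"data{addr}")

cache.invalidate_scrub({2, 5})
print(len(cache.lines))  # 6 lines survive -> only 2 misses to refill

cache.fill(2, "new2"); cache.fill(5, "new5")
cache.invalidate_flush({2})
print(len(cache.lines))  # 0 lines survive -> every subsequent access misses
```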

Are you saying that all that cache on desktop RDNA 2 makes the GPUs... worse?
That's what you're saying.
 
Absolutely true.
Now list all the advantages of a wider GPU and you will understand why the whole market and every GPU maker is going that route :)

Are you serious? They are going wider because they have to; there are hard physical limits on scaling frequency. GPU vendors would love to just scale up frequency if they could: less die space = more dies per wafer = more profit.
 
It's the interconnect linking the CPU to the rest of the system, but yes, all those transistors in the middle of the memory connectors are suspicious.
That said, if this were IF, the PS5 should be performing even better, and it's not.

This is for you, DJ12.
I highlighted the areas so you can see them better.
LhpXQDN.jpg


P.S. Before anyone comes at me for adding in Infinity Cache, this is all just speculation.
It might not be in the PS5 at all.

I'm pretty sure those are simply buffers (or inverters/NOR gates set up for data transmission) used on the bus signals to keep the integrity of the signals all along the connections.
Due to the length of these connections, the loads (mainly due to the resistivity of the wires) at the outputs of the sub-blocks are huge, and to avoid having a signal that is too degraded, and to keep your timing as you need it, your tools will place these buffers automatically in the routing space. You will find them on the XSX and other dies in such channels. We use them in dies between 20 mm² and 80 mm², so in dies like these, which are over 300 mm², I think this is mandatory.
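The underlying intuition is standard repeater-insertion reasoning (textbook RC-delay math, not anything measured from these dies): an unbuffered wire's delay grows with the square of its length, and splitting it into N buffered segments brings it back toward linear:

```latex
t_{\text{wire}} \;\propto\; r c L^{2}
\qquad\Longrightarrow\qquad
t_{\text{wire}} \;\approx\; N \cdot r c \left(\tfrac{L}{N}\right)^{2} \;=\; \frac{r c L^{2}}{N}
```

(Here r and c are resistance and capacitance per unit length and L is the total wire length; each buffer also adds a small delay of its own, which is why the tools insert a finite number of them.)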
 
I always thought Sony/Cerny was a bit more involved with promoting the PS4/Pro SoC, or at least from memory there was more info on it. Maybe he focused on different parts with the PS5 (I/O and Tempest); that felt different from GPU/CPU/RAM.
He gave pretty much the same coverage to both SoCs, but of course highlighted different aspects, as each console had different key strengths. Some bits, like the I/O and Tempest, are the big revolutionary aspects of the generation for them (especially the former). I think you remember wrong :).
 
Yeah, I did a little more thinking and looked at a Navi 21 GPU reference again... 99% sure the PS5 doesn't have Infinity Cache. The dead giveaway is the lack of an IFCC (Infinity Fabric Cache Controller); for a moment I was thinking the small grey block between the dual memory controllers could've been a (highly) scaled-down version, but that makes no sense, because the cache would be chunks arranged in parallel and would therefore need a wide parallel (rather than sequential) interface to the L2$ via the IFCC.

There's no room in the PS5's design for an IFCC, and the space between the PHYs is already very narrow; even taking into account that the L3$ would have a higher density than the L2$, maybe you could theoretically fit some 4-8 MB of IC in those spaces, but again, with no IFCC, what would be the point, since the cache would have no way to interact with the L2$? That hypothetical L3$ would run right into the memory controllers anyway.

Maybe there is some VERY slight chance that Sony redesigned their GDDR6 memory controllers to also house the IFCC logic, but the chances of that are close to 0%; implementing the memory controllers is one of the more challenging parts of GPU design IIRC, and going to those lengths to design a new controller handling both the GDDR6 and the IC would be pretty dumb. Not only that, but the L3$ would still not be getting to where it actually needs to go, i.e. the L2$.

Unless someone can provide proof that the unified memory controllers do in fact also house IFCC logic, there's no way the PS5 has any Infinity Cache. It simply can't work without the IFCC.

Loxus, that's a clever graphic, but the problem is that if your ?s are assumed (edit: NOT assumed on your end, you leave it open to speculation; I've assumed it on my end, but there are also likely a lot of people who will assume it could be IC, so this is for them) to be the IFCC, then the IFCC is not feeding into any L2$. That's how it's designed on RDNA 2 GPUs, and that's more or less how the IC system seems to work: the IFCC acts as the bridge between the L2$ and the L3$ (IC).

The placement of the assumed IFCC in that graphic leads essentially nowhere; the only hope would be if the unified memory controllers had IFCC logic built into them, and that is a very strong stretch, to put it mildly. There's essentially no proof of that having been done whatsoever, and we'd need even more detailed x-rays, specifically of the memory controllers, to even begin to treat that as serious speculation.

Also, even considering that, AMD's L3$ (just going off the L2$) is actually likely chunks of smaller cache blocks grouped together into clusters of a larger cache, i.e. it's not one physically contiguous block of memory. Cache is usually parallel, and IC would need to be in order to provide the bandwidths required, hence why on RDNA 2 GPUs the IFCC runs around the rest of the GPU. So even assuming EVERYTHING else provided some proof for IC (however slim), it'd still likely fall apart, because only a small portion of the IC would actually be able to send its data through the IFCC (which would somehow have to be worked into the memory controllers) in and out of the L2$.
 
This...might potentially be interesting. Gonna have to look more into it. At first I was going to say this proved nothing but then I looked at the following:

10aa1328-1539-4735-928b-70e9283cc389.PNG


And there's also this:

z1k7j.jpg


If you look at the graphic Loxus posted, the IC, if present, would fit in two likely 8 MB banks per side. I'm trying to visualize a scale from the last image linked here.

I still won't outright say IC is there, but IF it's present, there are enough design parallels between the PS5's GPU setup and the RDNA 2 GPU references linked to support some possibility of a 16 MB (low end) to 32 MB (upper limit) Infinity Cache. That'd theoretically be enough for a 4K framebuffer (24.8832 MB), though that's probably not actually enough for a true 4K framebuffer, since it seems 128 MB is "just enough" for 4K on PC RDNA 2 GPUs. For smaller framebuffers, though, 32 MB would be pretty good; 16 MB less so, but we can't rule that out either.

And again, we CAN'T rule out the possibility that there's no IC present at all; I'm just saying, going off these die shots and graphs, that there's some possibility a limited-capacity IC implementation is present in the PS5, that's all. Ironically, at 32 MB that'd be roughly the same size as the eSRAM in the Xbox One. (I think AMD took some inspiration from the eSRAM on XBO for going with IC on the RDNA 2 GPUs, at least partially. Maybe that's kind of ironic, I dunno.)
Possibly, but unless IC is user-programmable local storage (cache locking), IC is just L3 cache, so it's not really related to eSRAM beyond both being embedded SRAM blocks.
 
Are you serious? They are going wider because they have to; there are hard physical limits on scaling frequency. GPU vendors would love to just scale up frequency if they could: less die space = more dies per wafer = more profit.
Surely you would like to spend less, but to sell, products must be competitive. Performance does not increase linearly just by raising the clock... and as is now known (from practically all vendors), parallelism >> clock.
With the same number of CUs it is obviously always preferable to have the highest clock, but if you don't have a budget problem, only a fool would choose the lower number of CUs.
P.S. Before the spiel begins... no, I didn't say that Cerny is crazy. Cerny preferred to invest in the I/O, and whether you want to believe it or not, consoles are highly budget-constrained products.
 
Isn't that like 2 MB on one side and 2 MB on the other, and likewise 4 & 4 on the CPU? I can't really tell how to read this one. LOL
I'm reading the CPU diagram as three views at different abstraction layers. The second Zen 2 core complex (containing 4 cores) is the most abstract; the top left shows the next layer of abstraction, with block diagrams for two Zen 2 core blocks, two L2 blocks and an L3 block. And below that diagram is a repeat of the two Zen 2 cores, but without any abstraction. So the last diagram (without abstraction) fits 4x inside the entire Zen 2 CPU image, because the CPU has 8 cores total, 16 MB of L3 cache total (4x 4 MB blocks, each made of 4x 1 MB), etc. Unless I'm reading it wrong, which is easily possible :)
 
Surely you would like to spend less, but to sell, products must be competitive. Performance does not increase linearly just by raising the clock... and as is now known (from practically all vendors), parallelism >> clock.
With the same number of CUs it is obviously always preferable to have the highest clock, but if you don't have a budget problem, only a fool would choose the lower number of CUs.
P.S. Before the spiel begins... no, I didn't say that Cerny is crazy. Cerny preferred to invest in the I/O, and whether you want to believe it or not, consoles are highly budget-constrained products.
This is really not true - the equation is very complex.

The performance gain from a frequency increase is linear for the individual component. Going wider has increasing diminishing returns (each additional CU adds less than the last one).

However, at some point a frequency increase causes power and thermals to rise exponentially and manufacturing yields to go down. In addition, unless the other components of the system (CPU, memory, etc.) match the frequency increase of the individual component, you get synchronisation problems (which hurt performance even more).

Wider is not always a good thing. You can see this with SLI data: in games that supported it, adding +100% CUs gave roughly a 30% increase in performance. Clear diminishing returns.

In a controlled environment such as a console, frequency is probably your most powerful tool, as long as you can keep power and thermals in check.
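A deliberately crude toy model of that trade-off (all the constants here are made up for illustration; only the CU counts and clocks are the public XSX/PS5 figures):

```python
# Toy wide-vs-fast model: width scales throughput sublinearly (harder to keep
# every CU fed), frequency scales it linearly but drives power roughly ~f^3.
def gpu_throughput(cus, freq_ghz, falloff=0.002):
    utilisation = max(0.5, 1.0 - falloff * cus)  # assumed utilisation curve
    return cus * freq_ghz * utilisation

def relative_power(freq_ghz, base_ghz=1.825):
    return (freq_ghz / base_ghz) ** 3  # rough cube law for clocks + voltage

wide = gpu_throughput(52, 1.825)   # XSX-like: wider, slower
narrow = gpu_throughput(36, 2.23)  # PS5-like: narrower, faster
print(f"wide: {wide:.1f}  narrow: {narrow:.1f}")
print(f"narrow clock power cost: {relative_power(2.23):.2f}x")
```

Tweak the assumed falloff and the two designs trade places, which is really the point: the answer depends on how well you can keep a wide design fed.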
 
Surely you would like to spend less, but to sell, products must be competitive. Performance does not increase linearly just by raising the clock... and as is now known (from practically all vendors), parallelism >> clock.
With the same number of CUs it is obviously always preferable to have the highest clock, but if you don't have a budget problem, only a fool would choose the lower number of CUs.
P.S. Before the spiel begins... no, I didn't say that Cerny is crazy. Cerny preferred to invest in the I/O, and whether you want to believe it or not, consoles are highly budget-constrained products.
Not by software development, which is always the limiting factor; in his words, it is easier to keep fast-and-narrow filled with work. So it is a choice that maximises the effectiveness of the much bigger expense, which is game development. Making it easier for developers to fill the hardware with work and get similar results (because more CUs don't scale as linearly as frequency) seems like the technically correct choice IMHO.

I'm sure if Sony wanted to shave 30% off all their game budgets for a generation and make it a developer problem to keep 72 CUs busy - something they wouldn't have time to do, probably averaging 50% utilisation - they could still afford to subsidize a console at similar pricing, but the games would be less optimised or less ambitious at software maturity, and a bigger waste of power, a bigger waste of materials, worse for the planet, etc.
 
Not to the Zen Cores but to the Caches, is what I was implying.
But the caches are part of the CCX.
To me it looks like Infinity Fabric. You need to connect and wire up the CPU and GPU anyway.

That's what connects everything on AMD SoCs for low latency and high bandwidth.
But I'm also just a layman and don't have a PhD in microprocessor design.
 
Are you saying that all that cache on desktop RDNA 2 makes the GPUs... worse?
That's what you're saying.
No, but it's not the be-all, end-all some seem to think it is. Most of the time you will be fed cached data and all is fine and dandy, but _when_ a miss arrives, it brings extra latency with it, since more ALUs are probably dependent on some of the cached data. When there are no misses, though, you get huge returns.
The scrubbers just prevent that one domino piece from toppling all the others, keeping all the other, still-valid cached data available.
 
This is really not true - the equation is very complex.

The performance gain from a frequency increase is linear for the individual component. Going wider has increasing diminishing returns (each additional CU adds less than the last one).

However, at some point a frequency increase causes power and thermals to rise exponentially and manufacturing yields to go down. In addition, unless the other components of the system (CPU, memory, etc.) match the frequency increase of the individual component, you get synchronisation problems (which hurt performance even more).

Wider is not always a good thing. You can see this with SLI data: in games that supported it, adding +100% CUs gave roughly a 30% increase in performance. Clear diminishing returns.

In a controlled environment such as a console, frequency is probably your most powerful tool, as long as you can keep power and thermals in check.

Resource sharing produces a data dependency between the processors.
The CPU has to finish writing to the resource before the GPU reads it.
If the GPU reads the resource before the CPU writes to it, the GPU reads undefined resource data. If the GPU reads the resource while the CPU is writing to it, the GPU reads incorrect resource data.

These data dependencies produce processor stalls between the CPU and the GPU; each processor must wait for the other to finish its work before beginning its own.

Furthermore, the CPU and GPU are separate processors, so you should make them work simultaneously by using multiple instances of a resource. You have to provide the same arguments to your shaders for each frame, but this doesn't mean you need to reference the same resource object. Instead, you create a pool of multiple instances of a resource and use a different one each time you render a frame. For example, as sketched below, the CPU can write position data to a buffer used for the next frame at the same time that the GPU reads position data from a buffer used for the previous one.
By using multiple instances of a buffer, the CPU and the GPU can work continuously and avoid stalls, as long as you keep rendering frames.
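A minimal sketch of that multi-buffering pattern (my own illustration, not any specific graphics API):

```python
# Ring of per-frame resource instances: the CPU writes frame N's buffer while
# the GPU can still be reading the buffer written for an earlier frame.
FRAMES_IN_FLIGHT = 3  # triple buffering; 2 would give double buffering

class FrameResourcePool:
    def __init__(self, count=FRAMES_IN_FLIGHT):
        self.buffers = [dict() for _ in range(count)]
        self.frame = 0

    def current(self):
        # Each frame rotates to a different instance, so the CPU never
        # overwrites a buffer the GPU may still be reading.
        return self.buffers[self.frame % len(self.buffers)]

    def next_frame(self):
        self.frame += 1

pool = FrameResourcePool()
for n in range(4):
    buf = pool.current()
    buf["positions"] = [n, n + 1, n + 2]  # CPU writes this frame's data
    # ... GPU consumes the instance written FRAMES_IN_FLIGHT - 1 frames ago ...
    pool.next_frame()
```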
But that is just the basics. How does the frequency increase of an individual component further affect this? Could you elaborate on this?
 
There's no room in the PS5's design for an IFCC, and the space between the PHYs is already very narrow; even taking into account that the L3$ would have a higher density than the L2$, maybe you could theoretically fit some 4-8 MB of IC in those spaces, but again, with no IFCC, what would be the point, since the cache would have no way to interact with the L2$? That hypothetical L3$ would run right into the memory controllers anyway.
Just to be clear, I obviously do not know what the PS5 has here. And all we have to go on is a fairly grainy grey picture...

However, how can you be so sure of your statement above that there is no IFCC? If you look at the GDDR6 interface modules, the memory controllers and the Zen 2 core blocks, there is a fairly wide space that does not make sense to me (unless there is some sort of shared cache resource with cache controllers).

My issue is more that the cache that would be shared would be rather small; would that really give a significant step-up in performance? To me, the whole 'infinity cache' thing only makes sense if there is a real cache resource somewhere, and that somewhere can only be off-die, which I would love to be true but see as highly unlikely (it would be super cool though!).
 
But that is just the basics. How does the frequency increase of an individual component further affect this? Could you elaborate on this?
My point was exactly about these stalls, or the lack of synchronisation between the various work streams in a hardware setup. In an environment such as the PC, where the user has thrown together whatever hardware setup they have, if you increase the frequency of one component, such as the GPU, you will start to hit various stall problems with the other components. So a frequency increase does not result in linear performance gains unless you ensure that the other pieces are fast enough to keep up with the component whose speed has increased. On the PC, this has resulted in pieces of hardware acting as islands, and code trying to minimise interdependencies between the various islands.

In this regard, UE5, which seemingly uses the CPU much more for rendering, will be interesting to optimise for. I would assume that the optimisation map with regard to the frequencies/tuning of CPU, memory and GPU will suddenly be quite complex.

Now, we have only talked about this at a highly abstract level; on individual pieces of HW this is enormously important in terms of transistor layout. What is the physical distance between a computational core and its primary cache resource? This impacts the timings needed to prevent stalling to a significant degree. And I am just an amateur at this, so real HW engineers can describe it much better and more accurately than I can.

In a console, the HW vendor controls all the parameters (timings and frequency of every component, physical distances, etc.) and should be able to get a more or less linear relationship between frequency increases and performance, because they can ensure that all components stay in sync with each other with a minimum of stalling. That was my point :)
 
My point was exactly about these stalls, or the lack of synchronisation between the various work streams in a hardware setup. In an environment such as the PC, where the user has thrown together whatever hardware setup they have, if you increase the frequency of one component, such as the GPU, you will start to hit various stall problems with the other components. So a frequency increase does not result in linear performance gains unless you ensure that the other pieces are fast enough to keep up with the component whose speed has increased. On the PC, this has resulted in pieces of hardware acting as islands, and code trying to minimise interdependencies between the various islands.

In this regard, UE5, which seemingly uses the CPU much more for rendering, will be interesting to optimise for. I would assume that the optimisation map with regard to the frequencies/tuning of CPU, memory and GPU will suddenly be quite complex.

Now, we have only talked about this at a highly abstract level; on individual pieces of HW this is enormously important in terms of transistor layout. What is the physical distance between a computational core and its primary cache resource? This impacts the timings needed to prevent stalling to a significant degree. And I am just an amateur at this, so real HW engineers can describe it much better and more accurately than I can.

In a console, the HW vendor controls all the parameters (timings and frequency of every component, physical distances, etc.) and should be able to get a more or less linear relationship between frequency increases and performance, because they can ensure that all components stay in sync with each other with a minimum of stalling. That was my point :)

Thanks mate, very interesting reading.
I'm confident Cerny and his team worked hard on optimizations, removing most of the bottlenecks and providing a very balanced piece of hardware. Resource sharing and stalls/synchronisation between the various work streams were also key to the hardware and software development throughout the console-building process. It will take some time before we see the true capabilities of both PS5 and XSX; only the best exclusive games can show this, probably starting from 2022.
 
But the caches are part of the CCX.
To me it looks like Infinity Fabric. You need to connect and wire up the CPU and GPU anyway.

That's what connects everything on AMD SoCs for low latency and high bandwidth.
But I'm also just a layman and don't have a PhD in microprocessor design.

I think that's exactly it: all the component parts used for data transmission. :)
 
Wow... Thank you very much for spending time and effort on this, friend! Sadly, I am not the one who can give you advice, because I don't have the slightest experience in the matter. But I am sure that other, more capable members will help in no time. Great contribution. :) 👍

I took a quick look at the Series S die. Do we know the quantity of GPU L2 cache? From what I've found, I think we have 2 MB (it seems that with this configuration it simply scales: 2 MB => 128-bit bus, 4 MB => 256-bit, 5 MB => 320-bit).
And there's always this curious part: is it linked to the Command Processor part, or to the MM and I/O?

I also found something curious: it seems that on the XSX the CUs are not connected to the TMUs in the same way as on the XSS/PS5/RX 5700 etc., because the floorplan of this part seems to have changed due to the different form factor. Am I crazy?

I'll stop there for today; I need to end my break :p

xSqI90E.jpg


YgHZ1Nn.jpg
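If that scaling pattern holds, GPU L2 capacity is simply tracking the memory bus width at 1 MB per 64-bit GDDR6 channel. A trivial sketch of the extrapolation (the mapping is my inference from the sizes quoted above, not a confirmed AMD rule):

```python
# Assumed pattern: 1 MB of GPU L2 per 64-bit memory channel.
def expected_l2_mb(bus_width_bits):
    return bus_width_bits // 64

for name, bus in [("XSS", 128), ("PS5 / Navi 10", 256), ("XSX", 320)]:
    print(f"{name}: {bus}-bit bus -> {expected_l2_mb(bus)} MB L2")
```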
 
Sony never said shit about Microsoft and never said "no cross gen games."

Find the lie. (Spoiler- there is none.)

"We have always said that we believe in generations. We believe that when you go to all the trouble of creating a next-gen console, that it should include features and benefits that the previous generation does not include. And that, in our view, people should make games that can make the most of those features."

"We do believe in generations, and whether it's the DualSense controller, whether it's the 3D audio, whether it's the multiple ways that the SSD can be used... we are thinking that it is time to give the PlayStation community something new, something different, that can really only be enjoyed on PS5."

- Jimmy Sony


It's standard marketing saying "PS5 is not just a PS4 Pro 2.0." It's hyping the hardware.
They lied about Spiderman MM being exclusive before backtracking.
 
They lied about Spiderman MM being exclusive before backtracking.
Who lied?

And what exactly was the lie?

I ask this because, remember, people were asking the dev about that game.

Regardless, what does that have to do with the games that are PS5-exclusive? Because last time I checked, there are at least 2 we know of right now, one at launch and one yet to come out. Both were showcased in July last year.

3, I forgot about Destruction All Stars. 4, forgot about Astro's Playroom
 
I took a quick look at the Series S die. Do we know the quantity of GPU L2 cache? From what I've found, I think we have 2 MB (it seems that with this configuration it simply scales: 2 MB => 128-bit bus, 4 MB => 256-bit, 5 MB => 320-bit).
And there's always this curious part: is it linked to the Command Processor part, or to the MM and I/O?

I also found something curious: it seems that on the XSX the CUs are not connected to the TMUs in the same way as on the XSS/PS5/RX 5700 etc., because the floorplan of this part seems to have changed due to the different form factor. Am I crazy?

I'll stop there for today; I need to end my break :p

xSqI90E.jpg


YgHZ1Nn.jpg
Look closer at the die shots; the PS5 uses a mixture of the two:
kBhbawm.png


When the CUs are one in front of the other, the PS5 uses the same layout as the XSX, but if a CU doesn't have something else in front of it, it uses a flat layout like the XSS.
 
I took some time with a pixel ruler to measure all the sub-blocks; yes, I'm crazy, haha.
So I think I need to raise some points from my "estimated" die shot for the XSX:
  • I think I have not really found all the ROPs; I need to take a closer look at the XSS as well to verify the pattern. It seems I have missed half of them!! I will update that.
  • The part I have circled in green puzzles me; I don't know if it is linked to the Command Processor and GE part, or to the I/O, because the layout of this part is so different.
  • The multimedia HW acceleration part on the XSX is a monster.
  • For the (Shader Proc + Prim Units + ...) part + (ROPs) part + (Command Proc. + GE + ...) part, the total area of these three parts seems clearly bigger on the PS5 than the XSX (by more than 20% if the circled part is not included in the Command/GE part, and by 10% if it is).
Please share your advice (and tell me if I have made a big mistake!)

edit: Do not forget that this was done with obviously relative precision, due to the resolution of the screenshots.

2KFZnmh.jpg
Thanks for doing this.
 
Look closer at the die shots; the PS5 uses a mixture of the two:
kBhbawm.png


When the CUs are one in front of the other, the PS5 uses the same layout as the XSX, but if a CU doesn't have something else in front of it, it uses a flat layout like the XSS.

You are totally right :) My bad, I only looked at the bottom part and thought it was done like that for each CU.
 
I took some time with a pixel ruler to measure all the sub-blocks; yes, I'm crazy, haha.
So I think I need to raise some points from my "estimated" die shot for the XSX:
  • I think I have not really found all the ROPs; I need to take a closer look at the XSS as well to verify the pattern. It seems I have missed half of them!! I will update that.
  • The part I have circled in green puzzles me; I don't know if it is linked to the Command Processor and GE part, or to the I/O, because the layout of this part is so different.
  • The multimedia HW acceleration part on the XSX is a monster.
  • For the (Shader Proc + Prim Units + ...) part + (ROPs) part + (Command Proc. + GE + ...) part, the total area of these three parts seems clearly bigger on the PS5 than the XSX (by more than 20% if the circled part is not included in the Command/GE part, and by 10% if it is).
Please share your advice (and tell me if I have made a big mistake!)

edit: Do not forget that this was done with obviously relative precision, due to the resolution of the screenshots.

2KFZnmh.jpg
Thanks for this! It lines up more or less with the measurements I did myself the other night.

The observations that were given are there: more assumed I/O, and more command processor and GE mm².

My head-scratchers that I do not know how to fully interpret:
- The Zen 2 complex is very similar in size between the two... but the PS5 has (seemingly) cut features with regard to 256-bit instructions. Something has been added to the PS5; what could that be?
- The TMUs seem to be larger per CU on the PS5?? The size difference between the two systems is much smaller than the CU difference.
- The 'weird' interconnects between the Zen 2 complex, the GDDR6 interfaces, the memory controllers and the CUs. It's hard not to start speculating about some sort of shared cache functionality, but who knows.
 
I took a quick look at the Series S die. Do we know the quantity of GPU L2 cache? From what I've found, I think we have 2 MB (it seems that with this configuration it simply scales: 2 MB => 128-bit bus, 4 MB => 256-bit, 5 MB => 320-bit).
And there's always this curious part: is it linked to the Command Processor part, or to the MM and I/O?

I also found something curious: it seems that on the XSX the CUs are not connected to the TMUs in the same way as on the XSS/PS5/RX 5700 etc., because the floorplan of this part seems to have changed due to the different form factor. Am I crazy?

I'll stop there for today; I need to end my break :p

xSqI90E.jpg


YgHZ1Nn.jpg
Thanks. Yes, the XSS is very similar to the RX 5500 XT (one SA, 22 CUs, 128-bit MC at 224 GB/s), which also has 2 MB of L2 cache, so that makes sense. This was already confirmed, I think, but I don't remember the source. As for the bolded part, it needs more expertise and analysis; certainly interesting.
 
Look closer at the die shots; the PS5 uses a mixture of the two:
kBhbawm.png


When the CUs are one in front of the other, the PS5 uses the same layout as the XSX, but if a CU doesn't have something else in front of it, it uses a flat layout like the XSS.
What do you mean by a mix of the two? Are you talking about the layout of the TMUs?
 