[Digital Foundry] PS5 uncovered

As I said, an educated guess from a 20-year console/gaming veteran.

2.23GHz being a 'normal' clock on the same-ish 7nm process is a stretch. The 5700 ran at around ~1.8GHz.
Even if AMD makes architecture improvements in RDNA 2, this is not reflected in Series X, which will be ~1.825GHz normal.

The 5700 is not power starved; you can flash the BIOS and unlock it to give it 300W. The high-frequency limit is just that, a limit.
Pushing beyond 'normal' game clocks is just that, overclocking.

The PS5 smells awfully like an OC.
The best thing is that it is hard to prove PS5 games run at that 10TF most of the time. It is a closed console, and devs won't leak the truth because of NDAs.
So Sony gets away with the PR they wanted.
It is not the same process.
That is why there is a 50% increase in perf. per watt.

BTW there are a lot of examples of big improvements on the exact same process.
 
The green-bordered container in the slide is a CU. Increasing the CU count also scales more than just ALUs.
You see all those components in the slide you posted? They scale in performance with clocks, so the CU gap isn't a linear advantage when compared to a higher-clocked part.
That's before taking into account components that reside outside the CUs... which was the point of Cerny's statement.
Increasing GPU clock speed still runs into the memory bandwidth issue
True, but different architectures are more efficient with bandwidth consumption, see the 2080.
If Series X's comfy clock is ~1.8GHz, I don't believe PS5 is 2.23GHz.
It is not true a smaller chip can reach higher frequencies,
Absolutely, the bigger the chip the harder it is to dissipate heat, it's a physics limitation.
At least not 30% higher. My 1080 Ti can bench comfortably at 2.025GHz, about the same as smaller Pascals.

The 1080 Ti is using a premium bin compared to smaller Pascals.
That's why I specified comparable bins; the XSX can't go overboard with power consumption either.
 
This is why I posted the following slide to counter the "increasing clocks raises the performance of all components, not just ALUs" argument:

[Image: 3oVFeYA.png]

The green-bordered container in the slide is a CU. Increasing the CU count also scales more than just ALUs.

Increasing GPU clock speed still runs into the memory bandwidth issue, and the PC's RX 6700 XT SKU may have faster 15000 to 15500 rated GDDR6 memory modules.

PC RDNA 2 SKUs are not limited to the "budget" GDDR6-14000 rated modules, e.g. the GTX 1660 Super has GDDR6-14000 modules.
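To put rough numbers on the module-speed point (standard GDDR6 math, bus width times per-pin data rate; the exact SKU pairings here are just illustrative):

```python
# Peak GDDR6 bandwidth = bus width (bits) / 8 * per-pin data rate (Gb/s).
def gddr6_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    return bus_width_bits / 8 * data_rate_gbps

print(gddr6_bandwidth_gbs(256, 14.0))  # 448.0 GB/s - 256-bit bus on GDDR6-14000 (RX 5700 XT class)
print(gddr6_bandwidth_gbs(256, 16.0))  # 512.0 GB/s - the same bus if GDDR6-16000 modules are used
print(gddr6_bandwidth_gbs(320, 14.0))  # 560.0 GB/s - a 320-bit bus on GDDR6-14000
```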
I think you are confusing things here.

More memory bandwidth is needed as you increase the number of CUs... because if you have more parallel processing you need enough memory bandwidth to feed it.

Now if you just increase the clock you don't need to increase the memory bandwidth, because you keep the same number of parallel processing units.

Increasing clocks increases the performance of all components, but there is a limit to that... a hardware limit that makes the clock increase stop being proportional to the performance increase when you are near it.

For example, on RDNA beyond 1900MHz you can increase clocks by 30% and get only a 10% increase in performance.

RDNA 2 raises that hardware limit way above what RDNA delivers... so at 1900MHz RDNA 2 still gets a performance increase that tracks the clock increase.

You can expect the same kind of clock increase from RDNA to RDNA 2 as we got from GCN to RDNA.
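A toy model of what I mean, just to illustrate the shape of that "30% clock for 10% performance" claim (the knee and the slope are made-up illustrative values, not measurements):

```python
# Illustrative only: performance tracks clock up to a knee (~1900 MHz on RDNA per the
# claim above) and then scales at roughly 1/3 of that rate beyond it.
KNEE_MHZ = 1900

def relative_perf(clock_mhz: float, slope_beyond_knee: float = 1 / 3) -> float:
    effective_mhz = min(clock_mhz, KNEE_MHZ) + max(0.0, clock_mhz - KNEE_MHZ) * slope_beyond_knee
    return effective_mhz / KNEE_MHZ

print(relative_perf(1900))  # 1.00 -> baseline
print(relative_perf(2470))  # ~1.10 -> +30% clock, only ~+10% performance
```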
 
"RDNA 2" miracle does NOT exist.
What miracle? lol

The improvements are already confirmed... a 50% increase in perf. per watt.

You basically have no idea what you are talking about.

52 CUs at 1825MHz is not possible with RDNA.
36 CUs at 2230MHz is not possible with RDNA.

RDNA 2 already made it happen.
That is of course in an APU package... a standalone RDNA 2 chip could probably reach even better clocks.
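For reference, those CU/clock combinations come straight out of the usual peak-FP32 formula (64 ALUs per CU, 2 ops per clock for FMA):

```python
# Peak FP32 throughput for an RDNA-style GPU: CUs * 64 ALUs * 2 ops per clock * clock.
def peak_tflops(cus: int, clock_ghz: float) -> float:
    return cus * 64 * 2 * clock_ghz / 1000

print(peak_tflops(52, 1.825))  # ~12.15 TFLOPS (Series X configuration)
print(peak_tflops(36, 2.23))   # ~10.28 TFLOPS (PS5 configuration)
```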
 
It is not the same process.
That is why there is a 50% increase in perf. per watt.

BTW there are a lot of examples of big improvements on the exact same process.
Example: on the 28 nm node, the R9-290X (5.63 TFLOPS, 290 watts TDP, 16 GDDR5 chips) has a ~31% perf/watt improvement over the 7970 GHz Edition (4.3 TFLOPS, 300 watts TDP, 12 GDDR5 chips) on the same node. There's another 28 nm ~52% perf/watt improvement with Fury X (8.6 TFLOPS, ~275 watts TDP).

NVIDIA's Maxwell v2 28nm perf/watt improvement jump is industry-leading.

------

N7P has a ~10 percent efficiency improvement over first-gen N7.

RDNA 2 could involve logic gate/switching layout changes plus second-generation N7 efficiency improvements, hence a 50% perf/watt improvement would not be a first.
 
I think you are confusing things here.

More memory bandwidth is needed as you increase the number of CUs... because if you have more parallel processing you need enough memory bandwidth to feed it.

Now if you just increase the clock you don't need to increase the memory bandwidth, because you keep the same number of parallel processing units.

Increasing clocks increases the performance of all components, but there is a limit to that... a hardware limit that makes the clock increase stop being proportional to the performance increase when you are near it.

For example, on RDNA beyond 1900MHz you can increase clocks by 30% and get only a 10% increase in performance.


RDNA 2 raises that hardware limit way above what RDNA delivers... so at 1900MHz RDNA 2 still gets a performance increase that tracks the clock increase.

You can expect the same kind of clock increase from RDNA to RDNA 2 as we got from GCN to RDNA.

RX 5700, RX 5700 XT and any RX 5700 XT OC have the same 448GB/s memory bandwidth.



[Image: GgnCIUn.png]

The RX 5700 XT's 9.6 TFLOPS (the 122% result) scales from the RX 5600 XT OC's 7.9 TFLOPS.

Math: 9.6 / 7.9 = 1.215, i.e. +21.5%.

Apply 1.215 to the RX 5600 XT Pulse's 7.9 TF 100% result and you land on ~121%, while the real RX 5700 XT posts a 122% result.

RDNA 36 CUs' effective performance is penalized by an ~8% degradation with 336GB/s memory bandwidth, i.e. the Sapphire RX 5600 XT Pulse's TFLOPS increase is dampened by its 336 GB/s memory bandwidth.

You keep underestimating the memory bandwidth factor.
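Working the quoted arithmetic as a quick sanity check (figures as read from the chart above):

```python
# TFLOPS ratio between the two cards vs the benchmark index ratio quoted above.
rx5600xt_oc_tflops = 7.9   # 36 CU part, 336 GB/s, 100% index in the chart
rx5700xt_tflops = 9.6      # 40 CU part, 448 GB/s, 122% index in the chart

tflops_ratio = rx5700xt_tflops / rx5600xt_oc_tflops
print(f"TFLOPS ratio: {tflops_ratio:.3f}")                        # ~1.215, i.e. +21.5%
print(f"Predicted index from TFLOPS: {100 * tflops_ratio:.0f}%")  # ~122% vs the measured 122%
```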
 
Yes, Sony should have gone with 512GB/s memory bandwidth.
Drop the PR clocks to a more comfortable 2.1GHz, which should reduce BOM elsewhere.
Sell it for $399 and call it a GameCube (i.e. we are not chasing performance leadership).
Would make most of us happy.
 
You see all those components in the slide you posted? They scale in performance with clocks, so the CU gap isn't a linear advantage when compared to a higher-clocked part.
That's before taking into account components that reside outside the CUs... which was the point of Cerny's statement.

True, but different architectures are more efficient with bandwidth consumption, see the 2080.

Absolutely, the bigger the chip the harder it is to dissipate heat, it's a physics limitation.

The 1080 Ti is using a premium bin compared to smaller Pascals.
That's why I specified comparable bins; the XSX can't go overboard with power consumption either.
Mark Cerny's ALU-only argument is flawed when actual CU scaling involves more than just ALUs, i.e. texture mappers, texture filters, SRAM storage and RT cores also scale along with the shader ALUs.

Against Cerny's argument, XSX already shows RTX 2080-like results with a two-week raw Gears 5 benchmark port at PC Ultra settings, hence XSX's RDNA 2 TFLOPS scaling is not a major issue when it's backed by a 25% memory bandwidth increase over the RX 5700 XT's 448 GB/s.

The GTX 1080 Ti is not the full GP102, i.e. it's made from defective GP102 yields. The Titan Xp has the full GP102.
 
What miracle? lol

The improvements are already confirmed... a 50% increase in perf. per watt.

You basically have no idea what you are talking about.

52 CUs at 1825MHz is not possible with RDNA.
36 CUs at 2230MHz is not possible with RDNA.

RDNA 2 already made it happen.
That is of course in an APU package... a standalone RDNA 2 chip could probably reach even better clocks.
Reminder: XSX already shows RTX 2080-like results with a two-week raw Gears 5 benchmark port at PC Ultra settings, hence XSX's RDNA 2 TFLOPS scaling is not a major issue when it's backed by a 25% memory bandwidth increase over the RX 5700 XT's 448 GB/s.

The XSX GPU is landing at the expected NVIDIA RTX 2080 level. LOL

This is why I stated your RDNA 2 "miracle" does NOT exist, i.e. an RX 6700-like SKU with 36 CUs at 2230MHz and 448 GB/s of memory bandwidth shared with a Ryzen 7 4800H/HS-like CPU.
 
RX 5700, RX 5700 XT and any RX 5700 XT OC have the same 448GB/s memory bandwidth.



[Image: GgnCIUn.png]

The RX 5700 XT's 9.6 TFLOPS (the 122% result) scales from the RX 5600 XT OC's 7.9 TFLOPS.

Math: 9.6 / 7.9 = 1.215, i.e. +21.5%.

Apply 1.215 to the RX 5600 XT Pulse's 7.9 TF 100% result and you land on ~121%, while the real RX 5700 XT posts a 122% result.

RDNA 36 CUs' effective performance is penalized by an ~8% degradation with 336GB/s memory bandwidth, i.e. the Sapphire RX 5600 XT Pulse's TFLOPS increase is dampened by its 336 GB/s memory bandwidth.

You keep underestimating the memory bandwidth factor.
Again your post has nothing to do with what I said lol


Reminder: XSX already shows RTX 2080-like results with a two-week raw Gears 5 benchmark port at PC Ultra settings, hence XSX's RDNA 2 TFLOPS scaling is not a major issue when it's backed by a 25% memory bandwidth increase over the RX 5700 XT's 448 GB/s.

The XSX GPU is landing at the expected NVIDIA RTX 2080 level. LOL

This is why I stated your RDNA 2 "miracle" does NOT exist, i.e. an RX 6700-like SKU with 36 CUs at 2230MHz and 448 GB/s of memory bandwidth shared with a Ryzen 7 4800H/HS-like CPU.
What miracle?

RDNA 2 is already a reality and the clocks of the consoles show it is a big improvement over RDNA.

You keep posting the same nonsense posts lol
 
The best thing is that it is hard to prove PS5 games run at that 10TF most of the time. It is a closed console, and devs won't leak the truth because of NDAs.
So Sony gets away with the PR they wanted.

Agreed. I believe this holds true with Microsoft as well. They tell us it is 12 teraflops sustained for any given activity. However, how do we know the GPU clocks are exactly at 1.8GHz from the time the console turns on to when it powers off? I wish developers would include an OSD letting the player know what frequency the GPU is running at.
 
Agreed. I believe this holds true with Microsoft as well. They tell us it is 12 teraflops sustained for any given activity. However, how do we know the GPU clocks are exactly at 1.8GHz from the time the console turns on to when it powers off? I wish developers would include an OSD letting the player know what frequency the GPU is running at.

Since MS dared to commit to a number, three numbers actually, opening themselves to lawsuits.

I give them the benefit of the doubt. That is the difference between PS5 and Series X thus far: someone is being more clear, transparent and confident. :messenger_bicep:
 
This is why I posted the following slide to counter the "increasing clocks raises the performance of all components, not just ALUs" argument:

[Image: 3oVFeYA.png]

The green-bordered container in the slide is a CU. Increasing the CU count also scales more than just ALUs.

Increasing GPU clock speed still runs into the memory bandwidth issue, and the PC's RX 6700 XT SKU may have faster 15000 to 15500 rated GDDR6 memory modules.

PC RDNA 2 SKUs are not limited to the "budget" GDDR6-14000 rated modules, e.g. the GTX 1660 Super has GDDR6-14000 modules.

The DCUs also scale with clock speed and, yes, scaling DCUs scales TMUs and RT HW almost linearly (a DCU is two CUs packed together with some shared logic), but it does not scale a whole bunch of components that are quite important to the ease of programming of the chip and to balancing the external memory interface (ACEs, HW scheduler, L2 caches, rasteriser, Geometry Engine, ROPs, etc...).

See the following posts:


and

 
Since MS dared to commit to a number, three numbers actually, opening themselves to lawsuits.

I give them the benefit of the doubt. That is the difference between PS5 and Series X thus far: someone is being more clear, transparent and confident. :messenger_bicep:
Interesting narrative you weave ;). I am struggling to see either company being any less transparent, clear, or confident... but we shall see more as time goes on.
 
The Series X and PS5 APUs are from the same gen.
If Series X's comfy clock is ~1.8GHz, I don't believe PS5 is 2.23GHz.

It is not true a smaller chip can reach higher frequencies, at least not 30% higher. My 1080 Ti can bench comfortably at 2.025GHz, about the same as smaller Pascals.

We shall see I guess, with the actual multiplatform games. At best, via frame rates and pixel counting. It is really hard to prove Sony's 10TF claims sadly. That's what they wanted. :eek:

Not sure why you compare different GPUs with different feature sets on different nodes in your PC to prove how much they can overclock vs a totally different chip, with your precious GitHub leak already covering apparently 2.0+ GHz in very, very early silicon (hint: if the leak were true, it is one more proof the design was targeting very high clocks from day one; MS chose to invest their power budget in more but lower-clocked compute units... and the clock speed increase does not cover the difference in CUs).
 
Not sure why you compare different GPUs with different feature sets on different nodes in your PC to prove how much they can overclock vs a totally different chip, with your precious GitHub leak already covering apparently 2.0+ GHz in very, very early silicon (hint: if the leak were true, it is one more proof the design was targeting very high clocks from day one; MS chose to invest their power budget in more but lower-clocked compute units... and the clock speed increase does not cover the difference in CUs).

lol These people are contradicting themselves and don't even realize how silly they sound. You got one guy trying to argue that the PS5 isn't 10.2TF based off GPUs that don't even have RDNA2 to make a comparison. You got another guy literally saying that the PS5 can't hit 2.23GHz because according to him "a smaller chip can't reach higher frequencies" despite having no evidence to back up his claim at all. It's ridiculous and screams FUD.
 
The DCUs also scale with clock speed and, yes, scaling DCUs scales TMUs and RT HW almost linearly (a DCU is two CUs packed together with some shared logic), but it does not scale a whole bunch of components that are quite important to the ease of programming of the chip and to balancing the external memory interface (ACEs, HW scheduler, L2 caches, rasteriser, Geometry Engine, ROPs, etc...).

See the following posts:


and

The other poster is repeating Mark Cerny's flawed argument that scaling CUs only scales ALUs, which is BS when a CU includes wave queue schedulers, texture filtering, texture mappers, L0/L1 caches, LDS and RT cores.

MS hasn't revealed the XSX GPU's un-core sections, but there are four extra GDDR6 memory chips which could point to increased ROPs and L2 cache. Rendering at 4K resolution leads to greater parallelism when coupled with increased memory bandwidth.

Too bad for Mark Cerny's ALU-only scaling argument: XSX is delivering RTX 2080-class results with a two-week raw Gears 5 benchmark port at PC Ultra settings, so XSX's scaling holds up just like the wider RTX 2080's.

My MSI RTX 2080 Ti Gaming X Trio GPU card says hi.

Lisa Su's "Big NAVI" is the 4K Ryzen-style disruptor in the PC GPU market, NOT PS5!

Again your post has nothing to do with what I said lol

What miracle?

RDNA 2 is already a reality and the clocks of the consoles show it is a big improvement over RDNA.

You keep posting the same nonsense posts lol
Reminder,
1. The PC's PEG slot plus PEG power delivery is rated at 300 watts minimum, which is why PC AIB 5700 XTs already reach near 2GHz out of the box with close to that 300-watt power budget.
2. Future AMD PC GPU SKUs are not limited to the budget-card favourite GDDR6-14000 rated chips, e.g. GDDR6-15500 and GDDR6-16000 exist.

There's no RDNA 2 miracle for the memory bandwidth issue, i.e. "Big NAVI" comes with higher memory bandwidth.

"RDNA 2's" 50% perf/watt gain is needed for a 300-watt PEG "Big NAVI" to compete against NVIDIA's out-of-the-box OC TU102 and the GA102/GA103 RTX parts.

Mark Cerny is doing his best with the given corporate budget.

There will be an "R9-290X"-style flagship for the RDNA 2 generation, aka the "Big NAVI" flagship, just as the R9-290X overshadowed PS4's "supercharged PC" pitch.
 
lol These people are contradicting themselves and don't even realize how silly they sound. You got one guy trying to argue that the PS5 isn't 10.2TF based off GPUs that don't even have RDNA2 to make a comparison. You got another guy literally saying that the PS5 can't hit 2.23GHz because according to him "a smaller chip can't reach higher frequencies" despite having no evidence to back up his claim at all. It's ridiculous and screams FUD.
RDNA 2 has a 50% perf/watt improvement to enable higher clock speeds compared to PC AIB RX 5700 XTs that already run near 2GHz out of the box. AMD is not limited to the consoles' GDDR6-14000 rated chips, e.g. GDDR6-15500 and 16000 exist.

For the PC market, RDNA 2's 50% perf/watt is needed for "Big NAVI" to go up against the RTX 2080 Ti OC/RTX 3080/RTX 3080 Ti market segments.
 
Since MS dared to commit to a number, three numbers actually, opening themselves to lawsuits.

I give them the benefit of the doubt. That is the difference between PS5 and Series X thus far: someone is being more clear, transparent and confident. :messenger_bicep:

Pro tip: you shouldn't use a Phil Spencer avatar if you want to have any credibility in PS5 threads.
 
The other poster is repeating Mark Cerny's flawed argument that scaling CUs only scales ALUs, which is BS when a CU includes wave queue schedulers, texture filtering, texture mappers, L0/L1 caches, LDS and RT cores.

MS hasn't revealed the XSX GPU's un-core sections, but there are four extra GDDR6 memory chips which could point to increased ROPs and L2 cache. Rendering at 4K resolution leads to greater parallelism when coupled with increased memory bandwidth.

Too bad for Mark Cerny's ALU-only scaling argument: XSX is delivering RTX 2080-class results with a two-week raw Gears 5 benchmark port at PC Ultra settings, so XSX's scaling holds up just like the wider RTX 2080's.

My MSI RTX 2080 Ti Gaming X Trio GPU card says hi.

Lisa Su's "Big NAVI" is the 4K Ryzen-style disruptor in the PC GPU market, NOT PS5!
It is not bullshit lol
You really don't understand that these same units inside the CU are running at different clocks.

The wave queue schedulers, texture filtering, texture mappers, L0/L1 caches, LDS and RT cores you listed are all running at 2230MHz instead of 1825MHz.

Which extra GDDR6? Are you crazy?

Reminder,
1. The PC's PEG slot plus PEG power delivery is rated at 300 watts minimum, which is why PC AIB 5700 XTs already reach near 2GHz out of the box with close to that 300-watt power budget.
2. Future AMD PC GPU SKUs are not limited to the budget-card favourite GDDR6-14000 rated chips, e.g. GDDR6-15500 and GDDR6-16000 exist.

There's no RDNA 2 miracle for the memory bandwidth issue, i.e. "Big NAVI" comes with higher memory bandwidth.

"RDNA 2's" 50% perf/watt gain is needed for a 300-watt PEG "Big NAVI" to compete against NVIDIA's out-of-the-box OC TU102 and the GA102/GA103 RTX parts.

Mark Cerny is doing his best with the given corporate budget.

There will be an "R9-290X"-style flagship for the RDNA 2 generation, aka the "Big NAVI" flagship, just as the R9-290X overshadowed PS4's "supercharged PC" pitch.
Dude what miracle are you talking about?

RDNA 2 supports high clocks... Xbox and PS5 are examples of that.

Your post makes no sense because it is unrelated to the situation of the consoles and RDNA 2.
 
I think you are confusing things here.

More memory bandwidth is needed as you increase the number of CUs... because if you have more parallel processing you need enough memory bandwidth to feed it.

Now if you just increase the clock you don't need to increase the memory bandwidth, because you keep the same number of parallel processing units.

Increasing clocks increases the performance of all components, but there is a limit to that... a hardware limit that makes the clock increase stop being proportional to the performance increase when you are near it.

For example, on RDNA beyond 1900MHz you can increase clocks by 30% and get only a 10% increase in performance.

RDNA 2 raises that hardware limit way above what RDNA delivers... so at 1900MHz RDNA 2 still gets a performance increase that tracks the clock increase.

You can expect the same kind of clock increase from RDNA to RDNA 2 as we got from GCN to RDNA.
That's not true. Faster CUs will process more operations so they will need more bandwidth.
 
That's not true. Faster CUs will process more operations so they will need more bandwidth.
It will need better latency, not exactly more bandwidth... you need the data reaching the CUs in fewer cycles, not more data.

More CUs (more parallelism) need more bandwidth... you need more data reaching the CUs.

Let's give an example...

Let's say each CU works with 10GB of data for processing.

36 CUs need a total of 360GB of data.
52 CUs need a total of 520GB of data.

If you increase the clock of the 36 CUs it only means you will finish the workload faster and will then need another 360GB, but you won't need more than that at once... you just need that data to be available faster (in fewer cycles).

Of course increasing bandwidth helps to cover the latency... but you don't exactly need more data.

That is why when overclocking a GPU you focus on the core clock before the memory clock... the first gives you more performance than the second.
 
It is not bullshit lol
You really don't understand that these same units inside the CU are running at different clocks.

The wave queue schedulers, texture filtering, texture mappers, L0/L1 caches, LDS and RT cores you listed are all running at 2230MHz instead of 1825MHz.

Which extra GDDR6? Are you crazy?

Dude what miracle are you talking about?

RDNA 2 supports high clocks... Xbox and PS5 are examples of that.

Your post makes no sense because it is unrelated to the situation of the consoles and RDNA 2.
You argued NAVI above 1900MHz has bad performance scaling and are hoping RDNA 2 will deliver the miracle, while you keep forgetting:

1. RX 5700/RX 5700 XT's memory bandwidth remains at 448GB/s.

2. The PC PEG slot plus PEG power delivery design standard provides 300 watts minimum and isn't bound by a game console's budgetary limitations.

XSX has 10 GDDR6 memory chips on a 320-bit bus, compared to the 8 GDDR6 memory chips of a 256-bit bus.
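In raw numbers (assuming 14 Gbps modules and the standard 32-bit interface per GDDR6 chip):

```python
# Chip count sets bus width (32 bits per GDDR6 chip); bus width times per-pin rate sets peak bandwidth.
def peak_bandwidth_gbs(num_chips: int, data_rate_gbps: float = 14.0) -> float:
    bus_width_bits = num_chips * 32
    return bus_width_bits / 8 * data_rate_gbps

print(peak_bandwidth_gbs(8))   # 448.0 GB/s - 256-bit bus (PS5 / RX 5700 XT style)
print(peak_bandwidth_gbs(10))  # 560.0 GB/s - 320-bit bus (XSX, across all ten chips)
```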
 
You argued NAVI above 1900MHz has bad performance scaling and are hoping RDNA 2 will deliver the miracle, while you keep forgetting:

1. RX 5700/RX 5700 XT's memory bandwidth remains at 448GB/s.

2. The PC PEG slot plus PEG power delivery design standard provides 300 watts minimum and isn't bound by a game console's budgetary limitations.

XSX has 10 GDDR6 memory chips on a 320-bit bus, compared to the 8 GDDR6 memory chips of a 256-bit bus.
Hoping?

RDNA 2 is already confirmed to have better clock scaling... the fact that Xbox and PS5 reach these clocks already shows that.

RDNA examples and that PEG stuff unrelated to the discussion won't change that lol
In fact, if you didn't cap the power target on PS5 the GPU could maintain 2230MHz all the time because it is RDNA 2... RDNA can't do that no matter how much power you supply to it.

Xbox needs 10 memory chips for a 320-bit bus, and to go cheaper they split the memory bandwidth by not using the same density chip for all modules.
 
It will need better latency, not exactly more bandwidth... you need the data reaching the CUs in fewer cycles, not more data.

More CUs (more parallelism) need more bandwidth... you need more data reaching the CUs.

Let's give an example...

Let's say each CU works with 10GB of data for processing.

36 CUs need a total of 360GB of data.
52 CUs need a total of 520GB of data.

If you increase the clock of the 36 CUs it only means you will finish the workload faster and will then need another 360GB, but you won't need more than that at once... you just need that data to be available faster (in fewer cycles).

Of course increasing bandwidth helps to cover the latency... but you don't exactly need more data.

That is why when overclocking a GPU you focus on the core clock before the memory clock... the first gives you more performance than the second.
This is why I posted the following:

[Image: 3U7WkYV.png]

The Sapphire RX 5600 XT's NAVI 10 with 36 CUs at a 1712 MHz average clock (~7.89 TFLOPS) and 336 GB/s memory bandwidth debunks your argument.

Insufficient memory bandwidth can penalize the higher 7.89 TFLOPS Sapphire RX 5600 XT Pulse relative to the lower 7.7 TFLOPS RX 5700.

For PS5 and similar RDNA 2 class GPUs (e.g. RX 6700), RDNA 2/Turing RTX's resource-conservation hardware features will be needed.

XSX can brute-force its way like an RTX 2080-class GPU-equipped gaming PC, but RDNA 2/Turing RTX's resource-conservation hardware features are still useful.
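The comparison behind that penalty claim, in numbers (figures as quoted above):

```python
# Both are 36 CU NAVI 10 parts; the 5600 XT Pulse has slightly MORE compute but much LESS bandwidth.
pulse_tflops, rx5700_tflops = 7.89, 7.7
pulse_bw_gbs, rx5700_bw_gbs = 336, 448

print(f"compute:   {pulse_tflops / rx5700_tflops - 1:+.1%}")  # ~+2.5% for the 5600 XT Pulse
print(f"bandwidth: {pulse_bw_gbs / rx5700_bw_gbs - 1:+.1%}")  # -25.0% for the 5600 XT Pulse
```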
 
This is why I posted the following:

[Image: 3U7WkYV.png]

The Sapphire RX 5600 XT's NAVI 10 with 36 CUs at a 1712 MHz average clock (~7.89 TFLOPS) and 336 GB/s memory bandwidth debunks your argument.

Insufficient memory bandwidth can penalize the higher 7.89 TFLOPS Sapphire RX 5600 XT Pulse relative to the lower 7.7 TFLOPS RX 5700.

For PS5 and similar RDNA 2 class GPUs (e.g. RX 6700), RDNA 2/Turing RTX's resource-conservation hardware features will be needed.

XSX can brute-force its way like an RTX 2080-class GPU-equipped gaming PC, but RDNA 2/Turing RTX's resource-conservation hardware features are still useful.
WTF? lol

Posting random charts doesn't make a point, you know.

RDNA, Turing or anything else is not related to what RDNA 2 can do... stop reaching for things unrelated to the conversation.

BTW you need more bandwidth for post-processing/textures/filters, but those are unrelated to the CUs.
 
The other poster is repeating Mark Cerny's flawed argument that scaling CUs only scales ALUs, which is BS when a CU includes wave queue schedulers, texture filtering, texture mappers, L0/L1 caches, LDS and RT cores.

I do not see any flawed BS argument in what Cerny said, more like some people misinterpreting what he said, sometimes intentionally. Cerny said that in some cases the extra clock speed helps close the gap with a design with more CUs, and in some cases it speeds up important parts of the GPU that do not scale with the DCU count: triangle setup, RBs, Geometry Engine, ACEs, HW scheduler, L2 cache bandwidth, etc...

Even inside the CU, if you are running more general-purpose, dynamically branching code that causes per-wave divergence (i.e. throwing more calculations away per clock), having a higher peak frequency may improve efficiency. Also, not all GPU work is the same or resolution dependent: in cases where your compute shader is burning up a lot of resources and your maximum number of threads in flight is limited because of that or because of data dependencies, you may have scenarios where it is easier to exploit a 36 CU design at a higher frequency than a design with more CUs at a lower frequency.

Does it mean that PS5 has higher effective TFLOPS than XSX and thus trounces it? No 😂, I am not saying that.

MS hasn't revealed the XSX GPU's un-core sections, but there are four extra GDDR6 memory chips which could point to increased ROPs and L2 cache. Rendering at 4K resolution leads to greater parallelism when coupled with increased memory bandwidth.

Yeah, the higher the resolution the greater the chance of finding parallel work for the fragment shading part of the pipeline and for resolution- and view-dependent calculations.

As I was saying in the posts I linked above, I think the number of ROPs/RBs is linked to the number of Shader Arrays in each Shader Engine. I think there is a good chance the XSX design added two DCUs per Shader Array to get to 56 CUs total and 52 CUs active (4 of them, i.e. 2 DCUs, disabled in total).
This would mean two Shader Arrays for each of the two Shader Engines, and thus both XSX and PS5 would go with the same number of ROPs.
I am not convinced they deviated that much from what the RDNA architecture uses here.

Chances are that Sony is not going to max out ROPs usage all the time, but it may benefit from the performance they deliver in short bursts, and maybe break its work down so it sometimes burns fillrate instead of the extra compute resources that XSX can afford to use.
With the XSX chip already pushing for a big SoC as it is (very wide memory interface, lots more compute units, etc...), the amount of space and power they could burn on bigger ROPs and L2 cache (enough to make a sizeable difference) is limited.
 
Agreed. I believe this holds true with Microsoft as well. They tell us it is 12 teraflops sustained for any given activity. However, how do we know the GPU clocks are exactly at 1.8GHz from the time the console turns on to when it powers off? I wish developers would include an OSD letting the player know what frequency the GPU is running at.
In less than a year people on YT are gonna open up and hack both consoles to see what the truth really is during heavy gaming situations.

At least this is what some people told me on reeeeee.
 
Hoping?

RDNA 2 is already confirmed to have better clock scaling... the fact that Xbox and PS5 reach these clocks already shows that.

RDNA examples and that PEG stuff unrelated to the discussion won't change that lol
In fact, if you didn't cap the power target on PS5 the GPU could maintain 2230MHz all the time because it is RDNA 2... RDNA can't do that no matter how much power you supply to it.

Xbox needs 10 memory chips for a 320-bit bus, and to go cheaper they split the memory bandwidth by not using the same density chip for all modules.

FYI, a liquid-cooled RX 5700 XT can reach a 2.25 GHz average. Generated heat increases electrical resistance which degrades signal quality, hence it's a fight of increasing heat/increasing electrical resistance/increasing voltage until it reaches a hardware limit. The RDNA 2 design has mitigated this issue.

Info on RX 5700 XT liquid-cooled results https://www.igorslab.de/en/untied-r...werplaytables-for-the-rx-5700-and-rx-5700-xt/

The RX 5700 XT can switch at a 2.2 GHz clock speed, but RDNA 2 has mitigated the increasing heat/increasing electrical resistance/increasing voltage loop when raising clock speed.

----
X1X's asymmetric memory issue again? Have you factored in the CPU's memory bandwidth usage?


[Image: swzTBWT.png]


Scenario 1

For XSX GPU memory bandwidth

1st slice (odd channels), 28 GB/s x 6 = 168 GB/s potential

2nd slice (even channels), 28 GB/s x 6 = 168 GB/s potential

3rd slice, 56 GB/s x 4 = 224 GB/s potential

IF the CPU consumes 50 GB/s from the 1st slice, i.e. 168 - 50 = 118, then the total GPU bandwidth is 510 GB/s

----


For PS5 GPU memory bandwidth

448 - 50 = 398 GB/s

-----


XSX GPU has 28% memory bandwidth advantage.

PS5's GPU takes slightly more than half the penalty hit (i.e. ~4.5%) of the RX 5600 XT OC 36 CU (336GB/s) vs RX 5700 36 CU (448 GB/s) case, which showed an 8% performance hit.

XSX vs PS5 GPU performance gap is about 23% to 24.5%

This is without factoring in PS5's TFLOPS scaling issues with gimped GPU memory bandwidth.

Scenario 2

For XSX GPU memory bandwidth

1st slice, 28 GB/s x 6 = 168 GB/s potential

2nd slice, 28 GB/s x 6 = 168 GB/s potential

3rd slice, 56 GB/s x 4 = 224 GB/s potential

IF the CPU consumes 100 GB/s from the 1st slice, i.e. 168 - 100 = 68, then the total GPU bandwidth is 460 GB/s

----


For PS5 GPU memory bandwidth

448 - 100 = 348 GB/s

-----


XSX GPU has 32% memory bandwidth advantage.

PS5's GPU takes a penalty hit like the RX 5600 XT OC 36 CU (336GB/s) vs RX 5700 36 CU (448 GB/s) case, an 8% performance hit.

XSX vs PS5 GPU performance gap is about 28%

This is without factoring in PS5's TFLOPS scaling issues with gimped GPU memory bandwidth.


GDDR6 is not GDDR5/GDDR5X, i.e. GDDR6 improves fine-grained and random memory access performance. GDDR6 arrives in time for the larger-scale APUs of PS5 and XSX.


Killzone Shadow Fall's CPU vs GPU memory usage example:

[Image: naDwFg6.jpg]
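Spelled out, the two scenarios above are just this arithmetic (the per-channel split follows the chart, and the CPU figures are the hypothetical "IF" numbers, not official ones):

```python
# XSX: 6 x 2GB chips contribute two 28 GB/s half-channels each (the "slices" above),
# 4 x 1GB chips contribute 56 GB/s each; the CPU's share is assumed to come off slice 1.
def xsx_gpu_bandwidth_gbs(cpu_take_gbs: float) -> float:
    slice1 = 28 * 6   # 168 GB/s
    slice2 = 28 * 6   # 168 GB/s
    slice3 = 56 * 4   # 224 GB/s
    return (slice1 - cpu_take_gbs) + slice2 + slice3

def ps5_gpu_bandwidth_gbs(cpu_take_gbs: float) -> float:
    return 448 - cpu_take_gbs

for cpu_take in (50, 100):
    xsx = xsx_gpu_bandwidth_gbs(cpu_take)
    ps5 = ps5_gpu_bandwidth_gbs(cpu_take)
    print(f"CPU takes {cpu_take} GB/s: XSX {xsx:.0f} vs PS5 {ps5:.0f} -> +{xsx / ps5 - 1:.0%} for XSX")
```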
 
Some people here really believe Cerny just put some parts together and decided to overclock the GPU at the last minute... smh

It seems you should go back and watch Road to PS5, apparently for the first time. One of PS4's problems was cooling, loud fans and so on. The whole idea of PS5, as explained, is to improve on and fix previous flaws.

Cerny designed this together with Sony development teams worldwide and probably important 3rd parties too. There is a logic behind all of this, again, as explained before by Cerny.

PS5 has a fixed power budget by design. You may choose not to believe the presentation we saw, but that doesn't make you right.


Edit: this is also why the SSD was one of the priorities. Devs asked for it, that's it.
 
I do not see any flawed BS argument in what Cerny said, more like some people misinterpreting what he said, sometimes intentionally. Cerny said that in some cases the extra clock speed helps close the gap with a design with more CUs, and in some cases it speeds up important parts of the GPU that do not scale with the DCU count: triangle setup, RBs, Geometry Engine, ACEs, HW scheduler, L2 cache bandwidth, etc...

Even inside the CU, if you are running more general-purpose, dynamically branching code that causes per-wave divergence (i.e. throwing more calculations away per clock), having a higher peak frequency may improve efficiency. Also, not all GPU work is the same or resolution dependent: in cases where your compute shader is burning up a lot of resources and your maximum number of threads in flight is limited because of that or because of data dependencies, you may have scenarios where it is easier to exploit a 36 CU design at a higher frequency than a design with more CUs at a lower frequency.

Does it mean that PS5 has higher effective TFLOPS than XSX and thus trounces it? No 😂, I am not saying that.

Yeah, the higher the resolution the greater the chance of finding parallel work for the fragment shading part of the pipeline and for resolution- and view-dependent calculations.

As I was saying in the posts I linked above, I think the number of ROPs/RBs is linked to the number of Shader Arrays in each Shader Engine. I think there is a good chance the XSX design added two DCUs per Shader Array to get to 56 CUs total and 52 CUs active (4 of them, i.e. 2 DCUs, disabled in total).
This would mean two Shader Arrays for each of the two Shader Engines, and thus both XSX and PS5 would go with the same number of ROPs.
I am not convinced they deviated that much from what the RDNA architecture uses here.

Chances are that Sony is not going to max out ROPs usage all the time, but it may benefit from the performance they deliver in short bursts, and maybe break its work down so it sometimes burns fillrate instead of the extra compute resources that XSX can afford to use.
With the XSX chip already pushing for a big SoC as it is (very wide memory interface, lots more compute units, etc...), the amount of space and power they could burn on bigger ROPs and L2 cache (enough to make a sizeable difference) is limited.
Digital Foundry tested an RX 5700 at 9.6 TFLOPS against an RX 5700 XT at 9.6 TFLOPS and found out RDNA is not GCN. LOL

Actually, a higher CU count means more SRAM register storage for more wavefronts in flight, and 4K resolution leads to greater parallelism.

A GPU equipped with fewer CUs has less register storage, hence higher memory access rates outside the CUs.


I'm game for a GPGPU debate. This is getting ridiculous.
 
Actually, a higher CU count means more SRAM register storage for more wavefronts in flight, and 4K resolution leads to greater parallelism.

Overall for the entire GPU yes, but not per CU (of course you can run more in parallel when you have more units to run and/or park work on), and we already agreed that higher resolution gives both consoles a chance to exploit the greater parallelism.

Clock speed raising the performance of the same L0 + TMUs + ... per DCU helps avoid needing extra work in flight to maintain good utilisation, but as evidenced already it does not close the 14-18% gap in peak FP performance. Not sure how much non-embarrassingly-parallel work they plan to run on the GPU to take advantage of the extra frequency (at the CU level).
The perks of the much higher clock rate, and, as Cerny noted, its impact on the latency of the RAM accesses the GPU now sees, are also something far easier to fully flex in first-party titles than third-party ones. Same for the SSD, but maybe it will cause interesting tradeoffs there too (although not super noticeable; I think early on PS5 just tries to be easy to develop on, period).
 
For XSX GPU memory bandwidth

1st slice (odd channels), 28 GB/s x 6 = 168 GB/s potential

2nd slice (even channels), 28 GB/s x 6 = 168 GB/s potential

3rd slice, 56 GB/s x 4 = 224 GB/s potential

IF CPU consumes 50 GB/s from 1st slice i.e. 168 - 50

Not sure you can actually access memory like that, if that example chart is correct, that is. People were quoting a dev from Era discussing having to switch between the fast and slow memory "pools" each cycle to get closer to full memory bandwidth, and that in the average scenario, where for example you need more than what is available in the 10 GB fast section, your bandwidth would drop more than what you have calculated there (the tax being around 80 GB/s), thus bringing the bandwidth of the two GPUs closer to each other; they definitely seem matched to each console's bandwidth needs.
 
FYI, a liquid-cooled RX 5700 XT can reach a 2.25 GHz average. Generated heat increases electrical resistance which degrades signal quality, hence it's a fight of increasing heat/increasing electrical resistance/increasing voltage until it reaches a hardware limit. The RDNA 2 design has mitigated this issue.

Info on RX 5700 XT liquid-cooled results https://www.igorslab.de/en/untied-r...werplaytables-for-the-rx-5700-and-rx-5700-xt/

The RX 5700 XT can switch at a 2.2 GHz clock speed, but RDNA 2 has mitigated the increasing heat/increasing electrical resistance/increasing voltage loop when raising clock speed.

----
X1X's asymmetric memory issue again? Have you factored in the CPU's memory bandwidth usage?


[Image: swzTBWT.png]


Scenario 1

For XSX GPU memory bandwidth

1st slice (odd channels), 28 GB/s x 6 = 168 GB/s potential

2nd slice (even channels), 28 GB/s x 6 = 168 GB/s potential

3rd slice, 56 GB/s x 4 = 224 GB/s potential

IF the CPU consumes 50 GB/s from the 1st slice, i.e. 168 - 50 = 118, then the total GPU bandwidth is 510 GB/s

----


For PS5 GPU memory bandwidth

448 - 50 = 398 GB/s

-----


XSX GPU has 28% memory bandwidth advantage.

PS5's GPU takes slightly more than half the penalty hit (i.e. ~4.5%) of the RX 5600 XT OC 36 CU (336GB/s) vs RX 5700 36 CU (448 GB/s) case, which showed an 8% performance hit.

XSX vs PS5 GPU performance gap is about 23% to 24.5%

This is without factoring in PS5's TFLOPS scaling issues with gimped GPU memory bandwidth.

Scenario 2

For XSX GPU memory bandwidth

1st slice, 28 GB/s x 6 = 168 GB/s potential

2nd slice, 28 GB/s x 6 = 168 GB/s potential

3rd slice, 56 GB/s x 4 = 224 GB/s potential

IF the CPU consumes 100 GB/s from the 1st slice, i.e. 168 - 100 = 68, then the total GPU bandwidth is 460 GB/s

----


For PS5 GPU memory bandwidth

448 - 100 = 348 GB/s

-----


XSX GPU has 32% memory bandwidth advantage.

PS5's GPU takes a penalty hit like the RX 5600 XT OC 36 CU (336GB/s) vs RX 5700 36 CU (448 GB/s) case, an 8% performance hit.

XSX vs PS5 GPU performance gap is about 28%

This is without factoring in PS5's TFLOPS scaling issues with gimped GPU memory bandwidth.


GDDR6 is not GDDR5/GDDR5X, i.e. GDDR6 improves fine-grained and random memory access performance. GDDR6 arrives in time for the larger-scale APUs of PS5 and XSX.


Killzone Shadow Fall's CPU vs GPU memory usage example:

[Image: naDwFg6.jpg]
A wall of text with nothing substantial lol

RDNA doesn't scale at high clocks... that liquid-cooled RX 5700 XT is just for breaking speed records... its performance does not scale, it is impractical and the clock doesn't sustain.

About the Xbox memory setup: you can't access both parts at the same time, so you either access the 560GB/s part or the 336GB/s part... your made-up math makes no sense.

And now you try to use an old GCN 1.2 title as an example of memory management for RDNA 2.

Your examples are very misleading.
 
Not sure you can actually access memory like that, if that example chart is correct, that is. People were quoting a dev from Era discussing having to switch between the fast and slow memory "pools" each cycle to get closer to full memory bandwidth, and that in the average scenario, where for example you need more than what is available in the 10 GB fast section, your bandwidth would drop more than what you have calculated there (the tax being around 80 GB/s), thus bringing the bandwidth of the two GPUs closer to each other; they definitely seem matched to each console's bandwidth needs.
You can't.
It is a single bus with one access to the memory pool at a time...

But somehow MS fans believe you can access all the memory parts in parallel lol

It is a weird memory setup made to cut costs... it adds more complexity and performance variability than it helps developers.
 
You can't.
It is a single bus with one access to the memory pool at a time...

But somehow MS fans believe you can access all the memory parts in parallel lol

Yes, that is what I was saying and why the strategy in these cases is alternating access between those pools as often as possible.
 
ethomaz bro why are you arguing with yourself? (At least that's how it seems to me with all these people I've put on ignore, so you really look like you're answering the air and talking to yourself.)

I do suggest you stop engaging with fanatics, put them on ignore and let us PS fans argue among ourselves with clear heads, as we won't have presupposed agendas to push, we won't argue just for the sake of arguing... etc. Really mate, just ignore those constant agitators so that we can have a meaningful discussion instead of yelling matches between crazies.

Btw I really do not care about the hardware, or at least the nitty-gritty numbers. There was a stupid suggestion here (after which I put the guy on ignore, so I won't be able to respond if he/she has a fit of rage at me) that MS and Sony should put a screen on the unit showing what frequency the CPU/GPU is currently running at and perhaps even how much RAM is in use..... LOL like wtf, why do you even care mate? The game is designed to run on the box, so it will run on the box; you don't need that info like on the PC. 'Can my system run this game' <- those websites are only for PC hahahaha and the number craze is coming only from PC master racers.

And it just dawned on me right now how MS is courting those PC master racers with Series X, and those poor saps are turning into console warriors without even realizing it LOL. Just look at them expecting consoles to become more and more like PCs with RGB lighting and frequency displays, MS serving them up just what they want, and them becoming console lovers and maybe even fanatics running around threads shitting on them without any pay.

What PC-exclusive AAA game is there right now on the market? OK, include any upcoming games too. => Just one -> Star Citizen (and that is with bending the requirements of a AAA game).
I mean there isn't even a game console gamers will be missing out on except for that one, and yet we are still fighting among ourselves? About who got a few % better resolution/performance or whatever.

Games are all that matter. Don't tell me that even a single one of you will wonder about memory pools, current bandwidth, current frequency..... etc. when you put a disc in and start playing. None of you will.

PS4 had noise issues on some units. They said they've heard it and fixed it, are really proud of their engineering solution, and will show it in the near future. And worrying about it won't change that it is fixed. Guys, stop concern trolling about that shit because it is simply fixed man, move along.

Now that it is established both systems are powerful and quiet when playing, why on earth would you care about those spec numbers while you are playing?

Is there someone here who volunteers to say that they really care for that stuff when gaming? Pls do really tell, my ignore list is hungry.
 
It's like the PS5's final specs were locked in the day before Nvidia came out with its leap forward in ray-tracing graphics cards. That blows on many levels. The PS4 Pro is 4.2 TF and the Xbox One X is 6 TF. Is it not worth it, Sony? ES6, when released, will most likely have a minimum 2080-family requirement, so imagine playing that maxed out on a 4080 Ti compared to a 1080 that can't even hit the minimum specs.
 
Yes, Sony should have gone with 512GB/s memory bandwidth.
I agree and hope they upgrade it last minute, but consoles are a compromise.
XSX with 10x2GB chips would have been better, but alas, the compromise they went with is not without its tradeoffs: it limits total available memory to 10GB (including the CPU's share) or incurs a performance hit whenever the slower 3.5GB is accessed.
You can't.
It is a single bus with one access to the memory pool at a time...
But somehow MS fans believe you can access all the memory parts in parallel lol
It is a weird memory setup made to cut costs... it adds more complexity and performance variability than it helps developers.
Yes, it's what I read too: the memory interface (bus) is saturated and can't access the slower portion in the same cycle, thus total (average) bandwidth drops whenever the extra 3.5GB is accessed, making the average bandwidth lower than the max.
Mark Cerny's ALU-only argument is flawed when actual CU scaling involves more than just ALUs, i.e. texture mappers, texture filters, SRAM storage and RT cores also scale along with the shader ALUs.
It's not; you need all those extra components to properly feed a CU, and higher clocks mean the whole unit (texture mapper, texture filter, SRAM storage and RT hardware) performs faster. To give you an easy-to-understand example:
A: 1 CU @ 2GHz vs B: 2 CUs @ 1GHz. Under ideal circumstances (100% utilization) both configurations would perform exactly the same (A performs tasks twice as fast, but B performs two tasks simultaneously). The advantage Cerny mentioned is related to GPU components besides the CUs, such as the front end, ROPs, caches, geometry processors etc., all being affected by the higher clock (quick numbers at the end of this post).
Against Cerny's argument, XSX already shows RTX 2080-like results with a two-week raw Gears 5 benchmark port at PC Ultra settings, hence XSX's RDNA 2 TFLOPS scaling is not a major issue when it's backed by a 25% memory bandwidth increase over the RX 5700 XT's 448 GB/s.
That's great to hear, but it doesn't contradict what he said one bit; if anything it paints a great picture for RDNA 2 improvements.
The GTX 1080 Ti is not the full GP102, i.e. it's made from defective GP102 yields. The Titan Xp has the full GP102.
Still a premium bin/silicon compared to smaller Pascals.
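The quick numbers behind that example (the 64-ROP figure is just an arbitrary illustration of a fixed-function block that does not grow with CU count):

```python
# Peak FP32 compute only depends on the product of ALU count and clock...
def peak_gflops(cus: int, clock_ghz: float) -> float:
    return cus * 64 * 2 * clock_ghz

print(peak_gflops(1, 2.0), peak_gflops(2, 1.0))  # 256.0 256.0 -> configs A and B tie on compute

# ...while something like pixel fill rate (ROPs * clock) only benefits from the higher clock.
rops = 64
print(rops * 2.0, rops * 1.0)  # 128.0 vs 64.0 Gpixel/s in favour of config A
```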
 
It's not; you need all those extra components to properly feed a CU, and higher clocks mean the whole unit performs faster. To give you an easy-to-understand example:
1 CU @ 2GHz vs 2 CUs @ 1GHz: under ideal circumstances (100% utilization) both configurations would perform exactly the same. The advantage Cerny mentioned is related to GPU components besides the CUs, such as the front end, ROPs, caches, geometry processors etc., all being affected by the higher clock.

That's great to hear, but it doesn't contradict what he said one bit; if anything it paints a great picture for RDNA 2 improvements.

Still a premium bin/silicon compared to smaller Pascals.
This is not just theory when a 48-CU-wide RTX 2080 Super at 1.9 GHz beats an overclocked 36-CU-wide RTX 2070 at 2.2 GHz.

The XSX GPU already scales to an RTX 2080-class result with the two-week raw Gears 5 built-in benchmark port at PC Ultra settings, i.e. the XSX GPU is already superior to the RX 5700 XT (~9.66 TFLOPS)!

Unlike on PS5, the overclocked RTX 2070's 448 GB/s of memory bandwidth is not shared with a CPU.
 
NXGamer can you explain why you said the GPU wouldn't "throttle" by more than 50 or 60 MHz? Or is it just not worth your time?




The bolded part is wrong, dude. I'm not sure why you keep thinking this. SonGoku just summarized the DF article (it's better than the video, so PLEASE go read it). The points below (written by DF, by the way) are what you need to keep in mind.



  • The CPU and GPU each have a power budget. If the CPU doesn't use its power budget - for example, if it is capped at 3.5GHz - then the unused portion of the budget goes to the GPU.
  • There's enough power that both CPU and GPU can potentially run at their limits of 3.5GHz and 2.23GHz, it isn't the case that the developer has to choose to run one of them slower.
  • GPU will spend most of its time at or near its top frequency in situations where the whole frame is being used productively in PS5 games. The same is true for the CPU, based on examination of situations where it has high utilization throughout the frame, we have concluded that the CPU will spend most of its time at its peak frequency.
  • With race to idle out of the equation and both CPU and GPU fully used, the boost clock system should still see both components running near to or at peak frequency most of the time.
The 50MHz was an estimate; the point is that to save 10-15% power you need a much smaller reduction in APU clock. 50MHz is 2.2% of the max GPU clock and would likely return 8-10% power. 4% (100MHz) would push beyond the amount needed.

I have covered this before, but NO GPU or CPU runs at full clock inter- and intra-frame; the amount of work determines the power budget being used. This is what "race to idle" means. When a GPU is at max clocks it is likely not being fully loaded from a transistor perspective. The CPU will likely be able to run at 3.2GHz for the first couple of years as teams ramp up engines to utilise it, and during this time the GPU will likely have sufficient power budget handed over from the CPU via SmartShift for this to rarely become an issue.

I have a video up today where I discuss this in more depth and give some real examples of how it all works and MAY work once the most important thing lands...
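For anyone wondering where estimates like that come from: dynamic power scales roughly with frequency times voltage squared, and voltage has to climb with frequency near the top of the curve, so power falls much faster than clock does. A rough sketch (the cubic exponent is a common approximation, not a measured PS5 figure, which is why such estimates vary a bit):

```python
# Rough model: relative power ~ (relative clock)^3 near the top of the voltage/frequency curve.
MAX_GPU_MHZ = 2230

def relative_power(clock_scale: float, exponent: float = 3.0) -> float:
    return clock_scale ** exponent

for drop_mhz in (50, 100):
    scale = (MAX_GPU_MHZ - drop_mhz) / MAX_GPU_MHZ
    saving = 1 - relative_power(scale)
    print(f"-{drop_mhz} MHz ({1 - scale:.1%} clock) -> roughly {saving:.0%} less power")
```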
 
The 50MHz was an estimate; the point is that to save 10-15% power you need a much smaller reduction in APU clock. 50MHz is 2.2% of the max GPU clock and would likely return 8-10% power. 4% (100MHz) would push beyond the amount needed.

I have covered this before, but NO GPU or CPU runs at full clock inter- and intra-frame; the amount of work determines the power budget being used. This is what "race to idle" means. When a GPU is at max clocks it is likely not being fully loaded from a transistor perspective. The CPU will likely be able to run at 3.2GHz for the first couple of years as teams ramp up engines to utilise it, and during this time the GPU will likely have sufficient power budget handed over from the CPU via SmartShift for this to rarely become an issue.

I have a video up today where I discuss this in more depth and give some real examples of how it all works and MAY work once the most important thing lands...

Thanks for this answer.
 
Can someone with more knowledge of emulation tell me if the PS5 would have enough power to emulate the earlier systems? I can't think of a reason why it couldn't.
 
The 50MHz was an estimate; the point is that to save 10-15% power you need a much smaller reduction in APU clock. 50MHz is 2.2% of the max GPU clock and would likely return 8-10% power. 4% (100MHz) would push beyond the amount needed.

I have covered this before, but NO GPU or CPU runs at full clock inter- and intra-frame; the amount of work determines the power budget being used. This is what "race to idle" means. When a GPU is at max clocks it is likely not being fully loaded from a transistor perspective. The CPU will likely be able to run at 3.2GHz for the first couple of years as teams ramp up engines to utilise it, and during this time the GPU will likely have sufficient power budget handed over from the CPU via SmartShift for this to rarely become an issue.

I have a video up today where I discuss this in more depth and give some real examples of how it all works and MAY work once the most important thing lands...
In your opinion, was the comparative test DF did with the different CU counts and frequencies representative of what we can expect from the next gen consoles? Is it solid proof that Cerny's statement about frequency versus CUs is wrong?
 
Can someone with more knowledge of emulation tell me if the PS5 would have enough power to emulate the earlier systems? I can't think of a reason why it couldn't.

PS1? Easily. PS2? That's a slightly harder task due to the Emotion Engine and its Vector Processing Units, but their tasks could simply be redirected to the GPU. PS3, now that is a basically impossible task due to Cell and its SPEs, as each game uses them for different purposes, some for CPU tasks, some for GPU tasks, some for sound processing etc., so the emulation would have to be done on a game-by-game basis, with a profile for each game telling where each SPE's work should go on the PS5.
 