
PlayStation 6 to utilize AMD's 3D stacked chips; AMD UDNA Flagship GPU revived for 2026, Zen 6 Halo with 3D stacking technology, and Zen 6 all on TSMC

My guess would be that the GPU in the PS6 is going to be three device layers, each one being a newer revision of the PS5 Pro GPU and clocked at whatever allows them to hit a 250-watt limit or less.

Then paired with a modern mobile Zen CPU inside the APU, with decent power efficiency but equal or higher clocks than the PS5/Pro for B/C, and better IPC and throughput because of the 3D cache.

Assuming they went with a 3x Crossfire-style GPU, I would expect 48GB of whatever GDDR memory won't bottleneck performance, so possibly sticking with GDDR6 and relying on the crossfire setup, with memory controllers operating in parallel on three 16GB regions, to get a big bandwidth multiplier through controller complexity rather than chasing expensive newer GDDR, combined with an updated I/O Complex with three times the bandwidth (ESRAM) to scale appropriately.
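A quick back-of-envelope of that bandwidth-multiplier claim (the pin speeds and bus widths below are illustrative assumptions, not leaked specs):

def bandwidth_gbs(bus_width_bits, pin_speed_gbps):
    # peak theoretical bandwidth in GB/s for a GDDR bus
    return bus_width_bits * pin_speed_gbps / 8

# hypothetical: each stacked GPU layer gets its own 256-bit, 16GB GDDR6 partition
per_partition = bandwidth_gbs(256, 18.0)   # 18 Gbps GDDR6 -> 576 GB/s
aggregate = 3 * per_partition              # three controllers working in parallel

# compare against a single conventional 256-bit GDDR7 bus
single_gddr7 = bandwidth_gbs(256, 32.0)    # 32 Gbps GDDR7 -> 1024 GB/s

print(f"3 x GDDR6 partitions: {aggregate:.0f} GB/s")     # ~1728 GB/s
print(f"1 x 256-bit GDDR7 bus: {single_gddr7:.0f} GB/s")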

If they were doing it this way, they'd be completely covered for PS5 B/C, mostly covered for PS5 Pro B/C with patches to handle clocks and to redirect raster, RT and ML to the different GPUs, and covered for cross-gen by taking the Pro solution, ramping up the ML AI and RT on those otherwise lightly used parallel GPUs, and using the newer Zen CPU and extra GDDR.

Early native PS6 games would then probably utilise the Zen CPU, new I/O complex and RAM fully, with raster on GPU1, RT on GPU2 and ML AI on GPU3.
Fully developed PS6 games would instead split the raster, ML AI and RT across GPUs 1-3 as jobs, to scale by need rather than dedicate whole GPU cores per feature IMO.
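A toy sketch of the difference between those two approaches (purely illustrative Python, not any real PS6 API; the job mix is made up):

# Toy model of the two scheduling ideas (illustrative only).
# "dedicated": each GPU unit owns one workload type.
# "split": any free unit takes the next job regardless of type.

jobs = ["raster"] * 6 + ["rt"] * 2 + ["ml"] * 1   # a frame that is heavy on raster

def dedicated(jobs):
    # one unit per feature: idle RT/ML units cannot help with raster
    lanes = {"raster": 0, "rt": 0, "ml": 0}
    for j in jobs:
        lanes[j] += 1
    return max(lanes.values())          # frame time ~ the busiest unit

def split(jobs):
    # each job goes to whichever of the 3 units frees up first
    units = [0, 0, 0]
    for _ in jobs:
        units[units.index(min(units))] += 1
    return max(units)

print("dedicated per-feature units:", dedicated(jobs), "time units")  # 6
print("jobs split by need:         ", split(jobs), "time units")      # 3

On this toy frame the per-feature split is limited by the busiest unit, while splitting jobs by need keeps all three units busy, which is the utilisation argument being made above.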
3x Crossfire GPU and 48GB GDDR6 in a console?! With an MSRP of $1499?

CF/SLI is dead even on PC these days.

There's no way Sony won't adopt GDDR7, since it will be cheaper and higher density.

I don't even understand what makes you think GPU2 will only have RT circuitry. Are you sure you understand modern GPU architectures?

Splitting jobs among cores sounds too Cell-y (software rendering), but modern GPUs have dedicated circuitry for RT.
 

PaintTinJr

Member
3x Crossfire GPU and 48GB GDDR6 in a console?! With an MSRP of $1499?

CF/SLI is dead even on PC these days.

There's no way Sony won't adopt GDDR7, since it will be cheaper and higher density.

I don't even understand what makes you think GPU2 will only have RT circuitry. Are you sure you understand modern GPU architectures?

Splitting jobs among cores sounds too Cell-y (software rendering), but modern GPUs have dedicated circuitry for RT.
The BVH accelerators are inside the general-purpose WGPs (alongside raster/shader, BVH RT and ML AI CNNs using the stacked CU caches), so yeah, why wouldn't each have the functionality? It is the very reason AMD and PlayStation haven't been chasing performant Nvidia-style ASIC features outside of the WGPs.

The cost of 48GB of PS5 Pro-level GDDR6 for a PS6 would be offset by its use across the 4 PlayStation SKUs they would still have in the market, not including a PS6, so the volumes they are dealing in would certainly let them hit a maximum £750 launch price point IMO.

Splitting jobs is what is already happening, and with the PSSR solution on PS5 Pro relying on stacking L1 and L2 bandwidth from CUs, the granularity that PS5 code operates at is very small, so piping jobs across GPUs on the same clock cycle (i.e. in parallel, like one big abstract 144-or-so-CU GPU) makes perfect sense, as opposed to chasing monolithic size or clock speed against thermals and power efficiency, which all diminish the higher you climb.

/edit
In case it wasn't obvious, I'm not being literal with crossfire/SLI as discrete GPUs; I'm still talking about GPU units within a 3D-stacked APU.
 
Last edited:
It depends... can they have competitive enough ARM cores to rival Zen 6? (wideness, AVX-512)

Well, according to the FTC leaks, the design (back then) mentioned interesting specs; AMD would be involved one way or another.

[image: XBOX-SERIES-NEXT-GEN-SPEC.jpg (spec sheet from the FTC leak)]
 
Last edited:

Panajev2001a

GAF's Pleasant Genius
I would still see an SoC (maybe the Pro model could explore other solutions), with GDDR7 and maybe VCache / Infinity Cache stacked memory for the GPU for the bandwidth they need.

If PS6 is serious about more flexible AI acceleration they need more memory and much more bandwidth (RT would benefit too). I think 32 GB of GDDR7 is a good part of the puzzle sorted, but we are not talking about HBM-like super-high-bandwidth external memory, so I think that pushing on the caching hierarchy and stacked memory to give the needed bandwidth is a must.
 
Last edited:

Panajev2001a

GAF's Pleasant Genius
Based on that rumor is it possible to guess the power of the PS6 ? Is it 3x, 6x, 10x vs PS5 ?
Too early to tell, but aside from a focus on some areas like RT, which will get another speedup on top of the 2-3x speedup gained with PS5 Pro, this generation will be a lot more based on AI to render sparsely and fill in the gaps. So a more focused use of, yes, increased resources (I would not expect the absolute numbers of "old" metrics like TFLOPS or fillrate to jump by an order of magnitude or so).

Being a new generation, the focus is not just to make games better without any effort from devs; more effort will be required to take advantage of the new architecture. We do not even know clock speeds, so even if we know Zen 6 will be better than Zen 2 (Zen 5 is :)), it might not mean much until we know more details.
 

Loxus

Member
I would still see an SoC (maybe the Pro model could explore other solutions), with GDDR7 and maybe VCache / Infinity Cache stacked memory for the GPU for the bandwidth they need.

If PS6 is serious about more flexible AI acceleration they need more memory and much more bandwidth (RT would benefit too). I think 32 GB of GDDR7 is a good part of the puzzle sorted, but we are not talking about HBM-like super-high-bandwidth external memory, so I think that pushing on the caching hierarchy and stacked memory to give the needed bandwidth is a must.
HBM can still be on the table.

If Sony is using 3D stacking, they can stack the HBM on top of the I/O die or use fanout HBM, which is similar to RDNA 3 MCDs.

Both of which remove what makes HBM so expensive: the interposer.


High Bandwidth Memory Will Stack on AI Chips Starting Around 2026 With HBM4
Currently, HBM stacks integrate 8, 12, or 16 memory devices as well as a logic layer that acts like a hub. HBM stacks are placed on the interposer next to CPUs or GPUs and are connected to their processors using a 1,024-bit interface. SK Hynix aims to put HBM4 stacks directly on processors, eliminating interposers altogether.

This approach resembles AMD’s 3D V-Cache, which is placed directly on CPU dies. But HBM will feature considerably higher capacities and will be cheaper than V-Cache, albeit slower.



SK hynix Prepares for ‘Fan-out Packaging’ with Next-generation HBM
A major reason for SK hynix’s application of Fan-out packaging in the memory semiconductor field is interpreted as a cost reduction in packaging. The industry regards 2.5D Fan-out packaging as a technology that can reduce costs by skipping the Through-Silicon Via (TSV) process while increasing the number of input/output (I/O) interfaces. The industry speculates that this packaging technology will be applied to Graphic DRAM (GDDR) and others that require an expansion of information I/O.
 

Panajev2001a

GAF's Pleasant Genius
HBM can still be on the table.

If Sony is using 3D stacking, they can stack the HBM on top of the I/O die or use fanout HBM, which is similar to RDNA 3 MCDs.

Both of which remove what makes HBM so expensive: the interposer.


High Bandwidth Memory Will Stack on AI Chips Starting Around 2026 With HBM4
Currently, HBM stacks integrate 8, 12, or 16 memory devices as well as a logic layer that acts like a hub. HBM stacks are placed on the interposer next to CPUs or GPUs and are connected to their processors using a 1,024-bit interface. SK Hynix aims to put HBM4 stacks directly on processors, eliminating interposers altogether.

This approach resembles AMD’s 3D V-Cache, which is placed directly on CPU dies. But HBM will feature considerably higher capacities and will be cheaper than V-Cache, albeit slower.



SK hynix Prepares for ‘Fan-out Packaging’ with Next-generation HBM
A major reason for SK hynix’s application of Fan-out packaging in the memory semiconductor field is interpreted as a cost reduction in packaging. The industry regards 2.5D Fan-out packaging as a technology that can reduce costs by skipping the Through-Silicon Via (TSV) process while increasing the number of input/output (I/O) interfaces. The industry speculates that this packaging technology will be applied to Graphic DRAM (GDDR) and others that require an expansion of information I/O.
I would like to see it become practical, but a last-level memory cache like VCache/Infinity Cache plus GDDR7, as well as tweaks to the CUs to reduce the bandwidth needed from external I/O, might be more economical.
 
Last edited:

winjer

Member
HBM can still be on the table.

If Sony is using 3D stacking, they can stack the HBM on top of the I/O die or use fanout HBM, which is similar to RDNA 3 MCDs.

Both of which remove what makes HBM so expensive: the interposer.


High Bandwidth Memory Will Stack on AI Chips Starting Around 2026 With HBM4
Currently, HBM stacks integrate 8, 12, or 16 memory devices as well as a logic layer that acts like a hub. HBM stacks are placed on the interposer next to CPUs or GPUs and are connected to their processors using a 1,024-bit interface. SK Hynix aims to put HBM4 stacks directly on processors, eliminating interposers altogether.

This approach resembles AMD’s 3D V-Cache, which is placed directly on CPU dies. But HBM will feature considerably higher capacities and will be cheaper than V-Cache, albeit slower.



SK hynix Prepares for ‘Fan-out Packaging’ with Next-generation HBM
A major reason for SK hynix’s application of Fan-out packaging in the memory semiconductor field is interpreted as a cost reduction in packaging. The industry regards 2.5D Fan-out packaging as a technology that can reduce costs by skipping the Through-Silicon Via (TSV) process while increasing the number of input/output (I/O) interfaces. The industry speculates that this packaging technology will be applied to Graphic DRAM (GDDR) and others that require an expansion of information I/O.

I really doubt we'll see a console using HBM, because it's too expensive and on a console price is the main factor.
But having a stack of 3D V-Cache, made on N6, as AMD is using on their Ryzen CPUs, could be something more realistic.
A chunk of 32 or 64MB of L3 on top of the SoC could do wonders for data locality.
 

Loxus

Member
I really doubt we'll see a console using HBM, because it's too expensive and on a console price is the main factor.
But having a stack of 3D V-Cache, made on N6, as AMD is using on their Ryzen CPUs, could be something more realistic.
A chunk of 32 or 64MB of L3 on top of the SoC could do wonders for data locality.
The article states 3D stacking HBM is cheaper than 3D V-Cache.

This video shows how expensive 3D V-Cache can be due to the steps involved, which may not be viable for mass production of millions of chips.

 
Last edited:

PaintTinJr

Member
Minor point maybe, but from what Cerny was saying, cache bandwidth was too low, so they are relying on dedicated registers in each CU's shader processors.
I clearly need to go back and glean more info from that chat, as I was under the impression the register caches were the L1 data. Going by your correction of what I was thinking, I suspect that means the granularity of the processing, within the WGPs, is even finer-grained than I thought, which again leads me to the view that the cheapest way to scale up the ML AI TOPS and RT, without breaking the GPU CU count (36 CUs, etc.) that has proved more optimal for CU saturation with raster/shader, is to crossfire/SLI GPU units within an APU.

In your other comment you mentioned GDDR7 as if it is already the most likely option, but thinking back to the PS5 using RDNA1.x ROPs to get double the number within cost, I'm still leaning towards GDDR6 being the same situation: more memory with more memory-controller complexity, versus simpler controllers with more expensive and still less memory, will lean PlayStation towards the older, cheaper and more complicated route, especially as GDDR7 would be an easy tick-list improvement for a PS6 Pro.

Nvidia's recent vision transformer (new) versus CNNs (old), despite in all likelihood being sleight-of-hand marketing via partial attention (partial-context, full-frame analysis) that is unlikely to scale down to RTX 20xx, is something I do foresee being added as full self-attention to PSSR on PS6. Latency-hiding the analysis on one crossfire/SLI GPU unit in an APU, or doing it quicker across all 3 GPU units with a frame of latency, seems more feasible when you don't have to account for the bandwidth in and out of an external NPU. Similarly, certain parts of raster/shader do benefit from more CUs in specific tasks, like generating the scene's depth buffer that forms the initial ray of a ray trace, so that again would benefit from the latency reduction of a three-unit crossfire GPU, to then kickstart the BVH intersection tracing on just one or two units. So I still think there are a lot of software flexibility benefits to multiple GPU units when done with AMD's very general-purpose WGPs.


Even a PS6 Pro could probably just be bigger CU counts on better lithography for the GPU units, attached to GDDR7.
 
Last edited:
The article states 3D stacking HBM is cheaper than 3D V-Cache.

This video shows how expensive 3D V-Cache can be due to the steps involved, which may not be viable for mass production of millions of chips.


Then how come AMD doesn't use it on Ryzen CPUs?

I clearly need to go back and glean more info from that chat, as I was under the impression the register caches were the L1 data. Going by your correction of what I was thinking, I suspect that means the granularity of the processing, within the WGPs, is even finer-grained than I thought, which again leads me to the view that the cheapest way to scale up the ML AI TOPS and RT, without breaking the GPU CU count (36 CUs, etc.) that has proved more optimal for CU saturation with raster/shader, is to crossfire/SLI GPU units within an APU.

In your other comment you mentioned GDDR7 as if it is already the most likely option, but thinking back to the PS5 using RDNA1.x ROPs to get double the number within cost, I'm still leaning towards GDDR6 being the same situation: more memory with more memory-controller complexity, versus simpler controllers with more expensive and still less memory, will lean PlayStation towards the older, cheaper and more complicated route, especially as GDDR7 would be an easy tick-list improvement for a PS6 Pro.

Nvidia's recent vision transformer (new) versus CNNs (old), despite in all likelihood being sleight-of-hand marketing via partial attention (partial-context, full-frame analysis) that is unlikely to scale down to RTX 20xx, is something I do foresee being added as full self-attention to PSSR on PS6. Latency-hiding the analysis on one crossfire/SLI GPU unit in an APU, or doing it quicker across all 3 GPU units with a frame of latency, seems more feasible when you don't have to account for the bandwidth in and out of an external NPU. Similarly, certain parts of raster/shader do benefit from more CUs in specific tasks, like generating the scene's depth buffer that forms the initial ray of a ray trace, so that again would benefit from the latency reduction of a three-unit crossfire GPU, to then kickstart the BVH intersection tracing on just one or two units. So I still think there are a lot of software flexibility benefits to multiple GPU units when done with AMD's very general-purpose WGPs.


Even a PS6 Pro could probably just be bigger CU counts on better lithography for the GPU units, attached to GDDR7.
Not gonna happen, soon enough GDDR6 will be EOL. Even nVidia has abandoned it. Sony needs to have solid logistics for the PS6 all the way to mid-2030s.

That's like saying PS4 should have stuck to good ol' GDDR3, even though the latest PS3 Super Slim revision (28nm RSX) adopted 2 x GDDR5 chips vs 4 x GDDR3 chips without compromising BC. It's just more economical, plain and simple:



The multi-GPU setup sounds too complicated for most devs, kinda reminiscent of Cell in a way... most devs would just use 1 GPU out of 3, so you'd be stuck with an expensive console that barely anyone utilizes to its full potential.

Sony will not go that wide as you suggest, in fact they're willing to go narrower judging by the PS5 GPU (less CUs, higher clocks).

Based on that rumor is it possible to guess the power of the PS6 ? Is it 3x, 6x, 10x vs PS5 ?
Don't expect a huge raster bump. Sony/AMD are following nVidia's approach (less brute force, more AI).

At best it might be equal to RTX 4090... maybe...
 
Last edited:

PaintTinJr

Member
Then how come AMD doesn't use it on Ryzen CPUs?


Not gonna happen, soon enough GDDR6 will be EOL. Even nVidia has abandoned it. Sony needs to have solid logistics for the PS6 all the way to mid-2030s.

That's like saying PS4 should have stuck to good ol' GDDR3, even though the latest PS3 Super Slim revision (28nm RSX) adopted 2 x GDDR5 chips vs 4 x GDDR3 chips without compromising BC. It's just more economical, plain and simple:



The multi-GPU setup sounds too complicated for most devs, kinda reminiscent of Cell in a way... most devs would just use 1 GPU out of 3, so you'd be stuck with an expensive console that barely anyone utilizes to its full potential.

Sony will not go that wide as you suggest, in fact they're willing to go narrower judging by the PS5 GPU (less CUs, higher clocks).


Don't expect a huge raster bump. Sony/AMD are following nVidia's approach (less brute force, more AI).

At best it might be equal to RTX 4090... maybe...
How is that comparable with the Slim? Are you suggesting that the PS5 or Pro won't still be selling in 2030? And logistics is a long game: they could design the launch console around GDDR6 and end up on GDDR7 running at GDDR6 specs later, if logistics and falling prices make GDDR7 the cheaper buy, so they can be forward compatible that way. But if the prohibitive costs of 48GB of GDDR7 at launch make the console cost £1000 in parts, that isn't the superior option when you didn't need GDDR7 speeds and 3x downclocked GDDR6 would have been bigger or the same.

The multi-GPU setup would be completely invisible to middleware like UE6, which would largely be used as-is, just as BMW-type games have been this gen on UE5/PS5, because an ICE-team-style abstraction layer (think SPURS) would handle the split automatically for all but native-to-the-metal PS6 games. So it would be nothing like launch-PS3 complexity, and more like the end-of-gen PS3 situation where middleware devs were happy using SPURS.

And they wouldn't be going wide per se: it could be 3x narrow at lower clocks, gaining advantages from newer RDNA, or configured 3-wide for a single cycle, then 3x narrow with three different uses (raster shader, RT, ML AI) on the next clock cycle to maintain utilisation.
 
Last edited:

Xyphie

Member
A non-handheld PS6 is going to use 24-32Gbit GDDR7 with whatever speed bin is cheap at the time. HBM is too expensive (it's not even used in $2000 GPUs, so why would consoles be able to afford it?) and GDDR6 is EOL and is not going to get faster or, more importantly, denser chips.
 

Crayon

Member
That is a bad idea. Sony has a good tick-tock strategy with base and Pro model, a suicidal empathy with MS approach there does not make sense.

If they have a base PS6 and a handheld again, with a Pro coming down the line, maybe, but looking at what MS did and copying it when it did not work (it did not show promise and it was an attempt to sandwich PS5 from low and high end at the same time more than anything IMHO) does not mean Sony should follow. By all accounts PS5 should have shipped with PSVR/PSVR2 bundled in then because MS did it with Kinect.

A Pro is not just an overclocked slightly bigger chipset. It is an evolution of an existing platform and testing grounds for a new generation. You need time to observe what developers do with the existing console (including internal devs and select third parties with pre-release DevKits) and where the industry is leading to in the future to chart the path to a Pro model and then to the new generation.

The “we launched our mid generation upgrade with the base console” idea is simply not the point nor, no offence meant, a good idea / good understanding of what mid-generation upgrades are meant for.

The Series S situation is worse than that. PlayStations always get support well past the launch of their successors. These days it's cross-gen, but back then it was bespoke versions or even distinct games. The PS4 IS Sony's Series S: a way to still enjoy games if you aren't ready for $500 at the moment.

Except you probably already have a PS4. Series S wouldn't be needed if MS wasn't so dead set on burying the XB1. Miles, GT7, and Horizon FW coming out on PS4 meant happier PS4 owners, who are your biggest pool of prospective PS5 owners.
 

PaintTinJr

Member
A non-handheld PS6 is going to use 24-32Gbit GDDR7 with whatever speed bin is cheap at the time. HBM is too expensive (it's not even used in $2000 GPUs, so why would consoles be able to afford it?) and GDDR6 is EOL and is not going to get faster or, more importantly, denser chips.
It doesn't need to get faster, but in my hypothetical scenario of a PS6 built from modern mobile Zen chiplets connected via 3D cache to 3 stacked GPU-unit layers, each GPU unit being a more modern Pro GPU downclocked to hit a 250-watt limit, each GPU unit could in effect have its own 16GB of GDDR6, effectively tripling system memory performance using old GDDR6.
 

Xyphie

Member
It doesn't need to get faster, but in my hypothetical scenario of a PS6 built from modern mobile Zen chiplets connected via 3D cache to 3 stacked GPU-unit layers, each GPU unit being a more modern Pro GPU downclocked to hit a 250-watt limit, each GPU unit could in effect have its own 16GB of GDDR6, effectively tripling system memory performance using old GDDR6.

So you expect the PS6 to have 3 GPUs, each with 256-bit GDDR6 interfaces, for a 768-bit total? And this SoC will be how many mm^2? You're going to get some laptop variant CPU paired with a mid-range AMD GPU with 24-32GB.
 
CPUs are more latency-sensitive than bandwidth-sensitive.
Who told you HBM increases latency?

HBM is nothing like GDDR in terms of latency, it behaves more like regular DDR.

How is that comparable with the slim, are you suggesting that the PS5 or Pro won't sell in 2030?
PS5 Pro will be phased out by then, since the console enthusiast audience will no longer be interested (same as with PS4 Pro vs PS5).

PS5 Super Slim would most likely adopt GDDR7, just like PS3 Super Slim with RSX @ 28nm adopted 2 GDDR5 chips. Lower cost + economies of scale (ordering a single type of memory for both consoles -> bigger order -> cheaper price). Logistics 101



But AMD doesn't use HBM, even on their GPUs. They use the same L3 cache as their CPUs, just placed on the side.
They don't use it, not because of latency, but because it's expensive as hell.

Not even nVidia uses HBM on their consumer GPUs (even the prosumer RTX 5090) and they dominate the GPU field (90% marketshare).
 
So you expect the PS6 to have 3 GPUs, each with 256-bit GDDR6 interfaces, for a 768-bit total? And this SoC will be how many mm^2? You're going to get some laptop variant CPU paired with a mid-range AMD GPU with 24-32GB.
Even if Cerny became Crazy Ken Vol2 (highly unlikely), this would still be a bad idea.



A $1000 console is DOA. Consoles need to be mass market products, otherwise game devs won't bother.

PSVR2 has probably taught Sony a lesson or two...
 

PaintTinJr

Member
So you expect the PS6 to have 3 GPUs, each with 256-bit GDDR6 interfaces, for a 768-bit total? And this SoC will be how many mm^2? You're going to get some laptop variant CPU paired with a mid-range AMD GPU with 24-32GB.
Stacked, with a 3D cache. Let's consider Sony's previous PlayStation interfaces that others wouldn't have considered; even the I/O Complex is a modern interface others wouldn't have come up with. When their choice is expensive 3rd-party parts or complex internal EE solutions, they favour the latter because it is at cost and ends up cheaper very quickly.
 
Last edited:

Panajev2001a

GAF's Pleasant Genius
The Series S situation is worse than that. PlayStations always get support well past the launch of their successors. These days it's cross-gen, but back then it was bespoke versions or even distinct games. The PS4 IS Sony's Series S: a way to still enjoy games if you aren't ready for $500 at the moment.

Except you probably already have a PS4. Series S wouldn't be needed if MS wasn't so dead set on burying the XB1. Miles, GT7, and Horizon FW coming out on PS4 meant happier PS4 owners, who are your biggest pool of prospective PS5 owners.
Very well said.
 

Loxus

Member
But AMD doesn't use HBM, even on their GPUs. They use the same L3 cache as their CPUs, just placed on the side.
AMD used HBM on Radeon VII.

3D-stacked and fan-out HBM aren't available yet. Did you read the article?

SK Hynix on track to explore interposer-free HBM4 production

They don't use it, not because of latency, but because it's expensive as hell.

Not even nVidia uses HBM on their consumer GPUs (even the prosumer RTX 5090) and they dominate the GPU field (90% marketshare).
I still don't think you guys are reading the article and seeing how interposer-free HBM can benefit a unified memory architecture.

HBM is high-bandwidth and low-latency, which benefits both the CPU and GPU without adding a large amount of L3 cache.

The interposer is the reason HBM is expensive. Removing it makes HBM suitable for a console's unified memory architecture.

However, the interposer itself is a significant cost factor in HBM due to the complex semiconductor manufacturing process and size limitations. Large interposers are difficult and expensive to produce.

According to a report from Tom’s Hardware, SK Hynix has been recruiting logic semiconductor design engineers for CPUs and GPUs since late 2023. The company seeks to stack HBM4 directly on processors, completely bypassing the need for interposers. This approach would revolutionize not only the way logic and memory chips are connected, but also their entire manufacturing process.

While stacking HBM4 directly on processors offers potential benefits in terms of design simplification and cost reduction, it comes with a significant hurdle – heat dissipation. SK Hynix is reportedly collaborating with Nvidia and other companies to tackle HBM4 integration design. There’s a high possibility of joint chip design from the start, with production likely handled by TSMC.



The rumor of the PS6 utilizing 3D stacking makes it a possibility that the HBM can be stacked on top of an I/O die, similarly to the MI300A.
 
Last edited:
I still don't think you guys are reading the article and seeing how interposer-free HBM can benefit a unified memory architecture.

HBM is high-bandwidth and low-latency, which benefits both the CPU and GPU without adding a large amount of L3 cache.

The interposer is the reason HBM is expensive. Removing it makes HBM suitable for a console's unified memory architecture.
I've been reading for over a decade about how HBM will become mainstream (i.e. organic vs silicon interposer), but so far it's been a nothingburger.

I'll stick to my safe GDDR7 prediction. :)
 

PaintTinJr

Member
@Xyphie

What is Sony's alternative, other than major EE innovation and software innovation, for the PS6 to remain better bang-for-buck than any generic competitor with money, without overspending on BoM or under-delivering on a launch PS6 spec that needs to put daylight between itself and the PS5 Pro to repeat the successes of PS1, PS2, PS4 and PS5?
 
Last edited:

PaintTinJr

Member
Even if Cerny became Crazy Ken Vol2 (highly unlikely), this would still be a bad idea.



A $1000 console is DOA. Consoles need to be mass market products, otherwise game devs won't bother.

PSVR2 has probably taught Sony a lesson or two...
Why would it cost £1000? Many of the PS5 innovations, like the I/O complex and the use of SSD modules with multiple channels to increase bandwidth, have already been amortised. The Pro GPU x3 at a smaller node would be amortised too, so the 3D cache to interface it all would surely be the biggest BoM cost, especially if downclocking GPU units on newer RDNA and not going for the newest lithography.

Where are you seeing the cost exploding on an APU with 3 GPU units and a Zen processor?
 

Xyphie

Member
@Xyphie

What is Sony's alternative, other than major EE innovation and software innovation, for the PS6 to remain better bang-for-buck than any generic competitor with money, without overspending on BoM or under-delivering on a launch PS6 spec that needs to put daylight between itself and the PS5 Pro to repeat the successes of PS1, PS2, PS4 and PS5?

We've seen Sony build the most predictable SoC, with a mid-range GPU on a 256-bit GDDRx bus, for 4 consoles in a row, so why do you expect this to change? If you bet that the PS6 is just going to be some Zen 6-7 CPU paired with a mid-range RX 10060/10070-derived 256-bit GDDR7 UDNA GPU, you'll more than likely be right. And with rumors abounding that Sony will do a 2-SoC strategy, my expectation is that 256-bit is the high end.

Look at what the feature set nVidia is launching with Blackwell does; that's the baseline Sony and AMD will catch up to in 3-4 years.
 
Why would it cost £1000?
but if the prohibitive costs of 48GB of GDDR7 at launch make the console cost £1000 in parts
If it costs £1000 with GDDR7, you can be 100% sure it's going to cost even more with GDDR6 (more chips)...

Besides that, Sony wants a hefty profit margin; they no longer want to sell consoles at a loss. The PS5 Pro BoM is nowhere near €800.

So your proposed console setup would sell for at least £1200, maybe even £1500.
 

FireFly

Member
Why would it cost £1000? Many of the PS5 innovations, like the I/O complex and the use of SSD modules with multiple channels to increase bandwidth, have already been amortised. The Pro GPU x3 at a smaller node would be amortised too, so the 3D cache to interface it all would surely be the biggest BoM cost, especially if downclocking GPU units on newer RDNA and not going for the newest lithography.

Where are you seeing the cost exploding on an APU with 3 GPU units and a Zen processor?
The cost per transistor is not going down significantly with newer nodes, so 3x the transistors could require well over 2x the silicon cost. It makes more sense to minimise total area and maximise clocks within your power budget, which is exactly what Sony did with the PS5.
 
Last edited:

PaintTinJr

Member
We've seen Sony build the most predictable SoC, with a mid-range GPU on a 256-bit GDDRx bus, for 4 consoles in a row, so why do you expect this to change? If you bet that the PS6 is just going to be some Zen 6-7 CPU paired with a mid-range RX 10060/10070-derived 256-bit GDDR7 UDNA GPU, you'll more than likely be right. And with rumors abounding that Sony will do a 2-SoC strategy, my expectation is that 256-bit is the high end.

Look at what the feature set nVidia is launching with Blackwell does; that's the baseline Sony and AMD will catch up to in 3-4 years.
Not 4 gens, and even just looking at those 2 gens you are downplaying what was delivered. The innovation started with the PS4 using an advanced, EE-innovated hUMA setup for an APU when the generic option was split RAM with DDR and ESRAM like the competition. The PS4 set a new expectation of what was possible, hence the X1X copied it.

Then look at the PS5: the ability of the I/O complex to DMA with check-in at 20x less latency than a PC, while still offering 14GB/s(?) of decompression bandwidth, is hardly just a run-of-the-mill 256-bit bus with GDDR solution that Microsoft and AMD or Nvidia could have whipped up.
 

PaintTinJr

Member
The cost per transistor is not going down significantly with newer nodes, so 3x the transistors could require well over 2x the silicon cost. It makes more sense to minimise total area and maximise clocks within your power budget, which is exactly what Sony did with the PS5.
But that's only if they chase the latest lithography, when mixing and matching nodes is now very much the norm for custom parts, and PlayStation fabs at least 10M PS5 Pro chips minimum, enough volume to get beyond such direct costing.

Higher clocks offer less and fight power efficiency, power limits and stability, and the further you clock above 1.4GHz the less performance per watt you get in parallel processing; that's a theoretical relationship that isn't going to move much with real-world improvements or tweaks.
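A simplified illustration of that perf-per-watt claim, using the standard dynamic-power relation P ≈ C·V²·f and an assumed voltage-frequency curve (the constants are placeholders, not measured silicon data):

# Simplified dynamic-power model: P ~ C * V^2 * f.
# Assumption for illustration: past the efficiency sweet spot, voltage has to
# rise with frequency, so power grows much faster than throughput.

def relative_perf_per_watt(f_ghz, f_base=1.4, k=0.25):
    v = 1.0 + k * (f_ghz - f_base) / f_base   # assumed voltage-frequency curve
    power = (v ** 2) * f_ghz                  # relative dynamic power
    base_power = (1.0 ** 2) * f_base
    perf = f_ghz / f_base                     # throughput scales roughly with clock
    return perf / (power / base_power)

for f in (1.4, 1.8, 2.2, 2.6):
    print(f"{f:.1f} GHz -> {relative_perf_per_watt(f):.2f}x perf/W vs 1.4 GHz")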

So how exactly do they do a console with 2-3x the performance of a PS5 Pro below a £750 BoM?
 

nial

Gold Member
They’ve been at 7 years almost on the dot for three generations. I think they’ll continue. Plus, November is right on time for the holidays.

November 2006: PS3
November 2013: PS4
November 2020: PS5
November 2027: PS6
Don't forget
March 2000: PS2 (not quite 7 years, but close enough).
and
December 2004: PSP
December 2011: PS Vita (discontinued in March 2019).
 
Last edited:

FireFly

Member
But that's only if they chase the latest lithography, when mixing and matching nodes is now very much the norm for custom parts, and PlayStation fabs at least 10M PS5 Pro chips minimum, enough volume to get beyond such direct costing.

Higher clocks offer less and fight power efficiency, power limits and stability, and the further you clock above 1.4GHz the less performance per watt you get in parallel processing; that's a theoretical relationship that isn't going to move much with real-world improvements or tweaks.

So how exactly do they do a console with 2-3x the performance of a PS5 Pro below a £750 BoM?
It applies to the 7nm to 5nm transition, which is why Sony already had to increase the price for the Pro vs. the base model. Now if you want to ship the equivalent of three of those Pro chips, that's 3x the cost on 5nm, and possibly slightly less on a newer process.

Yes, you lose power efficiency by clocking higher, but the price of the console is limited by the size of the APU, so it makes sense to sacrifice power efficiency to sell at a cheaper price. Which again, is exactly what Sony did with the PS5.

As for my prediction, I think Sony has a choice between building the smallest GPU that can fit a 256-bit bus with GDDR7 for ~1TB/s, which on 3nm or 2nm is going to be really expensive, or building a smaller chip (e.g. 150 mm^2) with a 128-bit bus and relying on cache. The GCD for the 7900 XTX is 300 mm^2, so on 2nm it could be half that, and perhaps there is still some room to increase clock speeds.
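A rough sketch of why the bigger die costs disproportionately more, using a simple dies-per-wafer estimate and a Poisson yield model; the wafer price and defect density are placeholder assumptions, not foundry figures:

import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300):
    # classic approximation that accounts for dies lost at the wafer edge
    r = wafer_diameter_mm / 2
    return int(math.pi * r ** 2 / die_area_mm2
               - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

def yield_rate(die_area_mm2, defects_per_cm2=0.1):
    # simple Poisson yield model
    return math.exp(-defects_per_cm2 * die_area_mm2 / 100)

def cost_per_good_die(die_area_mm2, wafer_cost_usd):
    good_dies = dies_per_wafer(die_area_mm2) * yield_rate(die_area_mm2)
    return wafer_cost_usd / good_dies

WAFER_COST = 20_000   # assumed price of a leading-edge wafer, not a foundry quote
for area in (150, 300):
    print(f"{area} mm^2 die: ~${cost_per_good_die(area, WAFER_COST):.0f} per good die")

With these assumed numbers the 300 mm^2 die costs roughly 2.5x as much per good die as the 150 mm^2 one, which is the non-linear scaling being described.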
 
The article states 3D stacking HBM is cheaper than 3D V-Cache.

This video shows how expensive 3D V-Cache can be due to the steps involved, which may not be viable for mass production of millions of chips.



If it's a choice between HBM and 3D V-cache, V-Cache wins every time.

We're at a point in real-time computing, especially with the importance of specialized, data-dense workloads like ML and ray tracing, where data locality matters more for GPU performance than ever before.

The costs of moving data around the chip are too prohibitive, and clock speeds can't be pushed much higher because the fabrication technologies are reaching physical limits and diminishing returns on cost.

Designs are starting to look more and more inwards at data locality on die, in order to see architectural improvements that can realise significant performance and efficiency improvements. We're already seeing this with Sony's work on the PS5 Pro registers and cache technologies.

The importance of ML and RT workloads next-gen will make it all the more critical to keep as much data as close to the execution cores as possible. So expanded caches and fat lower-level caches will provide orders of magnitude better performance for the majority of critical GPU workloads next-gen than any benefits HBM will provide to the main memory pool.

3D V-Cache beats HBM on bandwidth and blows it completely out of the water in terms of latency, and latency is the Achilles heel of ML performance. Having greater cache hierarchy performance overall in terms of cache sizes and bandwidth will reduce the pressure on main memory too; meaning Sony can opt for a smaller memory interface width, which reduces die size and complexity, increases yields and thus nets an overall more cost-efficient as well as energy efficient product.
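A small illustration of that main-memory-pressure argument: effective bandwidth seen by the GPU as a function of last-level cache hit rate (the bandwidth figures are assumptions for illustration only):

def effective_bandwidth(hit_rate, cache_bw_gbs, dram_bw_gbs):
    # crude blend: hits are served at on-die cache speed, misses go to DRAM
    return hit_rate * cache_bw_gbs + (1 - hit_rate) * dram_bw_gbs

DRAM_BW = 576.0     # e.g. a 256-bit GDDR6 bus (assumed)
CACHE_BW = 2500.0   # stacked SRAM is typically several times faster (assumed)

for hit in (0.0, 0.4, 0.6, 0.8):
    bw = effective_bandwidth(hit, CACHE_BW, DRAM_BW)
    dram_traffic = (1 - hit) * 100
    print(f"{hit:.0%} LLC hit rate -> ~{bw:.0f} GB/s effective, {dram_traffic:.0f}% of traffic reaches DRAM")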

CPUs are more latency-sensitive than bandwidth-sensitive.

With the criticality of workloads like ML to next-gen, GPUs are quickly becoming so too.
 

PaintTinJr

Member
It applies to the 7nm to 5nm transition, which is why Sony already had to increase the price for the Pro vs. the base model. Now if you want to ship the equivalent of three of those Pro chips, that's 3x the cost on 5nm, and possibly slightly less on a newer process.

Yes, you lose power efficiency by clocking higher, but the price of the console is limited by the size of the APU, so it makes sense to sacrifice power efficiency to sell at a cheaper price. Which again, is exactly what Sony did with the PS5.

As for my prediction, I think Sony has a choice between building the smallest GPU that can fit a 256-bit bus with GDDR7 for ~1TB/s, which on 3nm or 2nm is going to be really expensive, or building a smaller chip (e.g. 150 mm^2) with a 128-bit bus and relying on cache. The GCD for the 7900 XTX is 300 mm^2, so on 2nm it could be half that, and perhaps there is still some room to increase clock speeds.
Well, we know the Pro is massively overpriced and not intended to be a mass-market priced product.

The BoM is estimated at $500 when selling 10m units and sharing bulk component pricing with the OG PS5 and Slim, the estimate for the Pro's APU is between $180-$230, and RAM is estimated at under $70, so I still don't see why they couldn't hit a $750 BoM for launch with the spec I suggested. A memory budget over $150 and an APU budget around $400 should be enough given the product volumes expected for an OG PS6 IMO.
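Summing those figures as a sanity check (the APU and memory budgets are the ones suggested in the post; every other line item is a placeholder assumption):

bom = {
    "APU (Zen CPU + 3 GPU units, stacked)": 400,   # budget suggested above
    "48GB GDDR6": 150,                             # budget suggested above
    "SSD + board + I/O": 70,                       # placeholder assumption
    "PSU, cooling, enclosure": 80,                 # placeholder assumption
    "controller, packaging, assembly": 50,         # placeholder assumption
}

total = sum(bom.values())
print(f"sketched BoM total: ${total}")   # $750 with these assumptions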
 
Last edited: