Well, it ain't about hitting 5 WGP or not.
It's about trying to find out why Mark Cerny changed his philosophy on CU count.
Road to PS5
"In general, I like running the GPU at a higher frequency. Let me show you why.
Here are two possible configurations for a GPU roughly of the level of the PlayStation 4 Pro. This is a thought experiment; don't take these configurations too seriously.
If you just calculate teraflops, you get the same number, but actually, the performance is noticeably different because teraflops is defined as the computational capability of the vector ALU.
That's just one part of the GPU, there are a lot of other units and those other units all run faster when the GPU frequency is higher. At 33% higher frequency, rasterization goes 33% faster. Processing the command buffer goes that much faster, the L2 and other caches have that much higher bandwidth and so on.
About the only downside is that system memory is 33% further away in terms of cycles, but the large number of benefits more than counterbalance that.
As a friend of mine says, a rising tide lifts all boats.
Also, it's easier to fully use 36 CUs in parallel than it is to fully use 48 CUs. When triangles are small, it's much harder to fill all those CUs with useful work."
That last part of the quote is the most important here. Why would Cerny increase the CU count within the Shader Engines even though it's already hard to utilize 48 CUs in parallel?
AMD also never goes above 5 WGPs per Shader Array. How Cerny figured it out is what I'm interested in learning.
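For reference, Cerny's "same teraflops, different clocks" comparison can be sketched with quick arithmetic. The 36 CU @ 1.0 GHz vs 48 CU @ 0.75 GHz numbers below are my assumed stand-ins for his "roughly PS4 Pro-level" example, using the standard GCN/RDNA figure of 64 FP32 lanes per CU and 2 FLOPs per lane per clock (FMA):

```python
# Sketch of the teraflops thought experiment from "Road to PS5".
# Assumed configs: 36 CUs @ 1.00 GHz vs 48 CUs @ 0.75 GHz
# (not Cerny's exact numbers, just illustrative stand-ins).

def teraflops(cus: int, ghz: float, lanes: int = 64, flops_per_lane: int = 2) -> float:
    """Vector-ALU throughput: CUs x lanes x FLOPs-per-lane-per-clock x clock."""
    return cus * lanes * flops_per_lane * ghz / 1000.0

narrow_fast = teraflops(36, 1.00)   # fewer CUs, higher clock
wide_slow   = teraflops(48, 0.75)   # more CUs, lower clock

# Both land on 4.608 TF -- identical on a spec sheet.
print(narrow_fast, wide_slow)

# But rasterization, command processing, and cache bandwidth scale with
# clock, so the narrow config runs all of that ~33% faster:
print(f"{1.00 / 0.75 - 1:.0%} faster fixed-function/cache work")
```

The point of the sketch is that the teraflops formula only counts the vector ALUs, which is exactly why two "equal" configurations can perform noticeably differently.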