So it is fairly clear what the I/O complex does in terms of high-quality textures, on-the-fly changes to VRAM contents, and load times.
The question I have tried to find answers to - and have not found - is the following:
Assume you have a graphics card on a PC (pick your model) and you get 70 FPS on average at ultra settings and 1440p, with 5% lows at 55 FPS. What does utilisation look like across cores and shaders during rendering? What drives the dips into the lows? I have read several articles and book chapters now without getting solid numbers. There are multiple hints that utilisation across CUs and shaders is very far from even, but no-one gives practical answers.

What I am trying to understand is the value of variable frequency, i.e. one CU starts to hit the roof and becomes the bottleneck and a frequency increase solves that issue, and how much cache management by the GPU matters for efficiency, i.e. how much GPU cache management impacts GPU performance. In both of these areas PS5 seems to have a strength - I just cannot get my arms around what that strength might result in in practice.
Does anyone have practical knowledge about GPU utilisation patterns under load? (Please note that this is very different from the GPU utilisation number you get under Windows - that number does not tell you whether the hardware is actually doing useful work, just that it is fired up.)
I don't think there's an easy or even standardized answer here, because it comes down a lot to the engine a game is running, the GPU programming model being used (CUDA, etc.), and the programming techniques of the application in question. My understanding on this front isn't super-detailed, but I assume CUs are generally filled with work depending on the requirements of the workload, and (usually) that's done sequentially, i.e. when the first CU has its caches occupied and is working on data, the next CU in the block is given data to work with; once all CUs in a block are occupied, CUs in the next block are assigned tasks by the scheduler, and so on.
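To illustrate that "fill CUs in order" idea, here's a deliberately crude toy model in Python - the CU count, the slots per CU and the greedy policy are all assumptions for illustration, not how RDNA 2's workgroup distributor actually behaves:

```python
# Toy model: a scheduler that fills Compute Units (CUs) in order.
# All numbers and the policy are illustrative assumptions, not RDNA 2 internals.

NUM_CUS = 36                 # e.g. a PS5-sized GPU
WAVES_PER_CU = 2             # pretend each CU can hold 2 waves of this workload

def assign_sequentially(num_waves):
    """Greedy policy: put each wave on the lowest-numbered CU with free slots."""
    occupancy = [0] * NUM_CUS
    for _ in range(num_waves):
        for cu in range(NUM_CUS):
            if occupancy[cu] < WAVES_PER_CU:
                occupancy[cu] += 1
                break
    return occupancy

if __name__ == "__main__":
    occ = assign_sequentially(num_waves=20)
    busy = sum(1 for o in occ if o > 0)
    print(f"busy CUs: {busy}/{NUM_CUS}")   # only the front of the GPU is used
    print(f"idle CUs: {NUM_CUS - busy}")
```

The point is just that with a small workload and a greedy fill order, the front of the GPU is busy while the back sits idle - the uneven utilization pattern the question is about.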
So in that type of environment, a GPU operating at a higher clock clears the work sitting in its caches more quickly and is available sooner to queue up more work for the task. However, it's worth keeping in mind that's not the only way work can be assigned to CUs: there's also asynchronous compute, which, combined with the frontend improvements in the RDNA 2 architecture, should allow much better saturation of a wider array of CUs even for lower-demanding tasks, so you don't end up with that situation of extreme unevenness in GPU hardware utilization.
So with that type of example, a task only needing, say, 3 TF of computational power could be spread out over 18 CUs instead of 9 CUs on PS5 (each CU there is roughly 285 GF); that would give the task more L1 and L2 cache to work with in parallel, in addition to the speed of pushing data through the caches based on the GPU clock. If I had to guess, I think one of the reasons Sony focused on high clocks is that they were very aware of the problems you mentioned earlier and maybe weren't confident AMD could improve the frontend to the degree where higher and smarter rates of parallelized CU utilization would give the results they wanted without going absolutely massive on the GPU size, which would've driven up costs. So they chose a strategy of higher clocks instead, with whatever frontend improvements came being a "nice bonus" on top; at least this allowed them to go with a smaller GPU, banking on yield improvements with the shift to 7nm to save on costs, even if they would need higher-quality silicon due to the higher clock.
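To put rough numbers on that (napkin math only - the 3 TF task and the 9-vs-18 CU split are just the example figures above, and peak FLOPS says nothing about real shader behaviour):

```python
# Napkin math behind the "spread 3 TF over more CUs" example.
# Peak FP32 throughput per RDNA 2 CU: 64 lanes * 2 FLOPs/cycle (FMA) * clock.
# The 3 TF task and the CU splits are just the example figures from the post
# above, not measured data.

CLOCK_GHZ = 2.23
LANES_PER_CU = 64
FLOPS_PER_LANE_PER_CYCLE = 2

per_cu_gflops = LANES_PER_CU * FLOPS_PER_LANE_PER_CYCLE * CLOCK_GHZ
print(f"peak per CU: ~{per_cu_gflops:.0f} GFLOPS")   # ~285 GF, as in the post

task_gflops = 3000.0  # the hypothetical 3 TF task
for cus in (9, 18, 36):
    per_cu_load = task_gflops / cus
    # Spreading wider also means more CU-local cache working on the task in parallel.
    print(f"{cus:2d} CUs -> {per_cu_load:.0f} GFLOPS per CU "
          f"({per_cu_load / per_cu_gflops:.0%} of per-CU peak)")
```

One thing the arithmetic shows is that at ~285 GF per CU, 9 CUs can't even absorb a 3 TF task at full rate, which is exactly the kind of case where a frontend that spreads the work wider pays off.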
MS, on the other hand - I think they had a lot more confidence in AMD's design team making breakthroughs on the frontend and improving parallelized CU workloads by a wide margin. The fact that we're seeing much bigger GPUs take front-and-center focus from AMD (as well as Nvidia and Intel) seems to show those improvements have been made. They went with a more modest GPU clock, understanding that the frontend and architectural improvements would translate into bigger gains on the parts of GPU performance that are less reliant on clocks, and knowing those parts would benefit more from a larger GPU. Even if the larger die size would affect pricing, they wouldn't need such high-quality silicon for the GPU since the clocks are lower, offsetting some of that extra cost.
Overall I'd say both Sony and MS made smart decisions with their GPUs based on what they saw in the roadmaps and on the actual performance of their current-gen systems when starting next-gen design work. On the things you mention with PS5: to my understanding the system is in a "continuous boost" mode by default, so the GPU is already at its full clock speed of 2.23 GHz, and it clocks down based on load demands, but that is the result of the power allocation being scaled back, not the frequency itself being directly changed. So depending on the GPU task, the GPU will vary its frequency by adjusting the power allotted to it, but it never goes past 2.23 GHz.
I don't think the power-load adjustment works at a CU level, i.e. selectively scaling the power per CU. The CUs are either all operating at one frequency or all operating at another, depending on the power allotted to the GPU. So the example you mention - one CU hitting the roof and getting a frequency bump on its own - isn't, I think, something that can actually happen on the GPU side. This also extends to the caches: it's not a case of one CU operating at 2.23 GHz, another at 2.0 GHz and yet another at 1.9 GHz; across the board they all operate at whatever net power load results in a frequency at or below 2.23 GHz.
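As a purely illustrative sketch of what a single shared, power-driven clock looks like (the budget, the power curve and the activity numbers below are all made up - Sony's/AMD's actual power-management model isn't public at this level of detail):

```python
# Toy model of a globally shared variable clock driven by a power budget.
# The curve, budget and workload numbers are invented for illustration only;
# this is not the actual PS5/SmartShift power-management algorithm.

MAX_CLOCK_GHZ = 2.23
POWER_BUDGET_W = 180.0          # hypothetical GPU share of the SoC power budget

def estimated_power(activity, freq_ghz):
    # Made-up coefficients; only the shape (power rises steeply with clock) matters.
    return 40.0 + 14.0 * activity * freq_ghz ** 3

def clock_for_load(activity):
    """Pick one clock for the whole GPU given an activity factor in [0, 1].

    Simple model: when a heavy workload would exceed the power budget, the
    single shared clock is reduced until the estimated power fits again.
    """
    freq = MAX_CLOCK_GHZ
    while estimated_power(activity, freq) > POWER_BUDGET_W and freq > 0.5:
        freq -= 0.01
    return round(freq, 2)

if __name__ == "__main__":
    for activity in (0.3, 0.7, 1.0):   # light, typical, worst-case "power virus" load
        f = clock_for_load(activity)
        # Every CU and cache runs at this one frequency - there is no per-CU clock.
        print(f"activity {activity:.1f} -> GPU clock {f:.2f} GHz for all CUs")
```

The takeaway is that the clock is one global knob set from the power side; no individual CU or cache gets its own frequency bump.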