What AMD has officially detailed so far about RDNA 3 is yet another significant increase in performance per watt over RDNA 2, with contributions from both the process node and microarchitectural design choices. However, the design philosophy of gfx11 is all about area, area, area: what is the best way to hit the performance target with minimal area? The rearchitected Compute Unit and Optimized Graphics Pipeline changes are mostly about trimming the fat in pursuit of the lowest area and cost (for example, halving the relative FP64 rate to 1/32). As a result of this focus, PPA (power, performance, area) improves significantly. In fact, on the same node, an RDNA 3 WGP is slightly smaller than an RDNA 2 WGP despite packing twice the ALUs.
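The area claim can be restated as ALU density. As an illustration, write $N$ for the number of ALUs per WGP and $A$ for WGP area on the same node, with subscripts for the generation (these symbols are introduced here, not taken from AMD). Doubling the ALUs while shrinking the area slightly means density more than doubles:

```latex
\frac{N_3 / A_3}{N_2 / A_2}
  = \frac{2N_2 / A_3}{N_2 / A_2}
  = 2\,\frac{A_2}{A_3} > 2
  \qquad \text{since } A_3 < A_2
```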
OREO
One of the features in the RDNA 3 graphics pipeline is OREO (Opaque Random Export Order), which is just one of its many area-saving techniques. In gfx10, pixel shaders run out of order, and their outputs go into a Re-Order Buffer (ROB) so they can move on to the rest of the pipeline in order. With OREO, the next stage (blend) can receive and execute operations in any order and then export to the following stage in order. The ROB can therefore be replaced with a much smaller skid buffer, saving area.
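Why the buffer shrinks can be illustrated with a toy model. This is a sketch under simplifying assumptions, not AMD's design: at each step one pixel-shader output (identified by its program-order index) completes, and the downstream stage accepts at most one output per step. The hypothetical `rob_peak_occupancy` models gfx10-style in-order release, so the buffer must hold every output that finished ahead of an older one. The hypothetical `any_order_peak_occupancy` models a consumer that, like blend under OREO, accepts outputs as they arrive, so the buffer only has to absorb downstream stalls. The model covers only the buffer in front of that stage; it does not model how OREO restores order at export.

```python
from typing import AbstractSet, Sequence


def rob_peak_occupancy(completion_order: Sequence[int]) -> int:
    """Peak buffer entries when outputs must leave strictly in program order."""
    waiting = set()
    next_to_release = 0
    peak = 0
    for wave_id in completion_order:
        waiting.add(wave_id)
        peak = max(peak, len(waiting))
        # Release every consecutive output, starting from the oldest one still owed.
        while next_to_release in waiting:
            waiting.remove(next_to_release)
            next_to_release += 1
    return peak


def any_order_peak_occupancy(completion_order: Sequence[int],
                             stalled_steps: AbstractSet[int] = frozenset()) -> int:
    """Peak buffer entries when the consumer accepts outputs in arrival order."""
    occupancy = 0
    peak = 0
    for step, _wave_id in enumerate(completion_order):
        occupancy += 1  # one output arrives this step
        peak = max(peak, occupancy)
        if step not in stalled_steps:
            occupancy -= 1  # consumer takes one output, whatever its program order
    return peak


if __name__ == "__main__":
    order = [3, 1, 2, 0, 7, 5, 6, 4]  # hypothetical completion order
    print(rob_peak_occupancy(order))                               # 4
    print(any_order_peak_occupancy(order))                         # 1
    print(any_order_peak_occupancy(order, stalled_steps={1, 2}))   # 3
```

In this model the in-order buffer grows with how far out of order the shaders finish (a fully reversed order of eight outputs needs eight entries), while the any-order buffer grows only with backpressure from the next stage.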
Infinity Cache Updates
The Memory Attached Last Level (MALL) Cache blocks are each halved in size, doubling the number of banks for the same total capacity. There are also changes and additions that increase graphics-to-MALL bandwidth and reduce the penalty of going out to VRAM.
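The bank arithmetic can be sketched as follows. All numbers are hypothetical (a 64 MiB cache, 4 MiB versus 2 MiB blocks, 128-byte lines, simple line interleaving, and one line served per bank per cycle); AMD has not described the MALL's address mapping in these terms. The sketch only shows why, under those assumptions, more banks for the same capacity let a burst of requests finish in fewer cycles.

```python
from collections import Counter
from typing import Iterable

MIB = 1 << 20  # bytes in one mebibyte


def bank_count(total_bytes: int, block_bytes: int) -> int:
    """Number of independent blocks (banks) for a given total capacity."""
    if total_bytes % block_bytes:
        raise ValueError("total capacity must be a multiple of the block size")
    return total_bytes // block_bytes


def bank_of(address: int, num_banks: int, line_bytes: int = 128) -> int:
    """Hypothetical mapping: consecutive cache lines are interleaved across banks."""
    return (address // line_bytes) % num_banks


def cycles_to_serve(addresses: Iterable[int], num_banks: int, line_bytes: int = 128) -> int:
    """Cycles to serve a batch, assuming each bank returns one line per cycle."""
    per_bank = Counter(bank_of(a, num_banks, line_bytes) for a in addresses)
    return max(per_bank.values(), default=0)


if __name__ == "__main__":
    big = bank_count(64 * MIB, 4 * MIB)    # 16 banks
    small = bank_count(64 * MIB, 2 * MIB)  # 32 banks
    batch = [line * 128 for line in range(64)]  # 64 consecutive lines
    print(big, cycles_to_serve(batch, big))      # 16 4
    print(small, cycles_to_serve(batch, small))  # 32 2
```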