The use of Linux in this scenario isn't to run games without the same userland protection to gain performance - once debugged they'll run normally through the userland protection - it's just a means of locating performance throttling in the HAL/kernel while debugging a GPU API. But the point is moot, since Vulkan, OpenGL, etc. on Linux don't show the performance discrepancy that NVIDIA's DX gets over OpenGL/Vulkan on Windows in, say, running the Dolphin emulator. So with the stack being transparent on Linux, versus Windows and DX being a closed, opaque shop of proprietary closed-source software, my first point stands: the criticism that DX only serves Microsoft/NVIDIA and potentially throttles all GPU APIs living on top of that DX HAL is still valid.
That's irrelevant to the hardware issue.
Examples
1. The RTX 4090 has roughly 33% more TMUs and RT units than the RX 7900 XTX, hence the RX 7900 XTX sits one SKU tier lower than the RTX 4090.
RX 7900 XTX has 192 ROPS while RTX 4090 has 176 ROPS (192 in the full AD102); both use a 384-bit bus.
Within the RDNA3 DCU, AMD doubled the stream processor count without scaling the TMU count!
2. Before VEGA, AMD's ROPS were not linked to the L2 cache, hence AMD was behind in memory bandwidth conservation when compared to the GTX 980 Ti.
VEGA competed against NVIDIA's Pascal and later Volta generation.
Xbox One X's GPU's ROPS have a 2 MB render cache that didn't exist in the baseline Polaris IP.
Xbox One X's GPU (a modified 44 CU Hawaii with Polaris and semi-custom enhancements) has 2 MB of L2 cache for Geo/TMU and a 2 MB render cache for the ROPS, while VEGA 56/64 has 4 MB of L2 cache for Geo/TMU/ROPS.
3. Under "Mr TFLOPS" Raja Koduri's leadership, AMD's ROP count was stuck at 64 from R9-290X/R9-390X to R9 Fury to Vega 64 to Vega II to RX 5700 XT. This is why AMD was pushing hard for Async Compute's compute shader/TMU IO path as a workaround for the ROPS IO bottleneck (see the fill-rate sketch after this list).
4. AMD didn't properly scale the geometry engine with CU count, while NVIDIA scaled PolyMorph engines with SM count. For the mesh shader era, AMD doesn't have the compute shader TFLOPS high ground in NAVI 21 vs GA102 or NAVI 31 vs AD102.
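To put a rough number on that ROPS bottleneck, here's a back-of-envelope C++ sketch of theoretical pixel fill rate (ROP count x core clock) across those parts. The 64-ROP counts are from the list above; the boost clocks are approximate reference values I'm assuming, so treat the output as indicative rather than exact:

```cpp
// Rough fill-rate math: peak pixel fill = ROPs x clock.
// ROP counts come from the list above; the clocks are approximate
// reference boost values (assumptions), so the numbers are indicative only.
#include <cstdio>

int main() {
    struct Gpu { const char* name; int rops; double clock_ghz; };
    const Gpu gpus[] = {
        {"R9 290X",    64, 1.00},
        {"R9 Fury",    64, 1.00},
        {"Vega 64",    64, 1.55},
        {"Vega II",    64, 1.75},
        {"RX 5700 XT", 64, 1.90},
    };
    for (const Gpu& g : gpus)
        std::printf("%-10s %2d ROPs x %.2f GHz = %6.1f Gpixel/s\n",
                    g.name, g.rops, g.clock_ghz, g.rops * g.clock_ghz);
    return 0;
}
```

Five generations and peak fill rate only moves with clock speed, while shader TFLOPS grew far faster over the same span, which is exactly the pressure that pushed work onto the compute shader/TMU path.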
-------
PC has Direct3D profiling tools such as
https://developer.nvidia.com/conten...t3d-11-nvidia-nsight-visual-studio-edition-40
https://learn.microsoft.com/en-us/windows/win32/direct2d/profiling-directx-applications
https://gpuopen.com/rgp/
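As an aside, one common way to make captures from these tools easier to read is to bracket render passes with named events. A minimal D3D11.1 sketch using ID3DUserDefinedAnnotation is below; error handling is omitted, device/context creation is assumed elsewhere, and the pass names are just placeholders:

```cpp
// Minimal sketch: wrap render passes in named events so GPU profilers can
// attribute time per pass. Uses ID3DUserDefinedAnnotation from D3D11.1.
// Device/context creation and error handling are assumed elsewhere.
#include <d3d11_1.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void DrawSceneWithMarkers(ID3D11DeviceContext* ctx)
{
    ComPtr<ID3DUserDefinedAnnotation> annot;
    if (SUCCEEDED(ctx->QueryInterface(IID_PPV_ARGS(&annot))))
    {
        annot->BeginEvent(L"Shadow pass");   // placeholder pass name
        // ... shadow-map draw calls ...
        annot->EndEvent();

        annot->BeginEvent(L"Main pass");     // placeholder pass name
        // ... scene draw calls ...
        annot->EndEvent();
    }
}
```

Profilers such as Nsight or PIX display these as labeled regions on the GPU timeline, which makes it much easier to see which pass is eating the frame time.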
When designing Xbox One X,
Microsoft identified graphics pipeline bottlenecks for AMD.
--------------
Mark Cerny: "There's a specific set of formats you can use their variations on the same BVH concept. Then in your shader program you use a new instruction that asks the intersection engine to check array against the BVH.
While the Intersection Engine is processing the requested ray triangle or ray box intersections the shaders are free to do other work."
BVH RT has three functions, i.e. BVH traversal, box intersection check, and triangle intersection check. Your statement doesn't show BVH traversal hardware.
Mark Cerny confirmed Intersection Engine hardware for the PS5 GPU!
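To make those three functions concrete, here's a minimal C++ sketch of how the split works in an RDNA 2-style design: the traversal loop stays in the shader program, and the two intersection tests are the part the Intersection Engine accelerates. Everything here (the structures, and names like hw_box_test/hw_tri_test) is illustrative, with plain software tests standing in for the fixed-function hardware so the sketch is self-contained:

```cpp
// Illustrative sketch only: the three BVH ray-tracing functions named above,
// arranged RDNA 2-style. The traversal loop runs in the shader program, while
// the ray/box and ray/triangle tests are what the Intersection Engine handles.
// Plain software tests stand in for the hardware so this compiles standalone.
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };
struct Ray  { Vec3 o, d; float t_max; };
struct Tri  { Vec3 v0, v1, v2; };
struct Node {                      // simple binary BVH node
    Vec3 lo, hi;                   // AABB bounds
    int  left = -1, right = -1;    // child node indices, -1 = none
    int  tri  = -1;                // triangle index if this is a leaf
};

static Vec3  sub(Vec3 a, Vec3 b)   { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3  cross(Vec3 a, Vec3 b) { return {a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x}; }
static float dot(Vec3 a, Vec3 b)   { return a.x*b.x + a.y*b.y + a.z*b.z; }
static float comp(const Vec3& a, int i) { return i == 0 ? a.x : (i == 1 ? a.y : a.z); }

// --- What the Intersection Engine does: ray/box and ray/triangle tests ------
static bool hw_box_test(const Node& n, const Ray& r) {              // slab test
    float t0 = 0.0f, t1 = r.t_max;
    for (int a = 0; a < 3; ++a) {
        float inv = 1.0f / comp(r.d, a);
        float tn = (comp(n.lo, a) - comp(r.o, a)) * inv;
        float tf = (comp(n.hi, a) - comp(r.o, a)) * inv;
        if (tn > tf) std::swap(tn, tf);
        t0 = std::max(t0, tn); t1 = std::min(t1, tf);
    }
    return t0 <= t1;
}
static bool hw_tri_test(const Tri& t, const Ray& r, float& t_hit) { // Moeller-Trumbore
    Vec3 e1 = sub(t.v1, t.v0), e2 = sub(t.v2, t.v0);
    Vec3 p  = cross(r.d, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < 1e-8f) return false;
    float inv = 1.0f / det;
    Vec3 s = sub(r.o, t.v0);
    float u = dot(s, p) * inv;   if (u < 0.0f || u > 1.0f)     return false;
    Vec3 q = cross(s, e1);
    float v = dot(r.d, q) * inv; if (v < 0.0f || u + v > 1.0f) return false;
    t_hit = dot(e2, q) * inv;
    return t_hit > 0.0f && t_hit < r.t_max;
}

// --- BVH traversal: stays in the shader program ------------------------------
int trace(const std::vector<Node>& bvh, const std::vector<Tri>& tris, Ray r) {
    int best = -1;
    std::vector<int> stack{0};                       // start at the root node
    while (!stack.empty()) {
        const Node& n = bvh[stack.back()]; stack.pop_back();
        if (!hw_box_test(n, r)) continue;            // intersection HW: box test
        if (n.tri >= 0) {                            // leaf: hand triangle to HW
            float t_hit;
            if (hw_tri_test(tris[n.tri], r, t_hit)) {
                r.t_max = t_hit;                     // keep the closest hit so far
                best = n.tri;
            }
        } else {                                     // inner node: push children
            if (n.left  >= 0) stack.push_back(n.left);
            if (n.right >= 0) stack.push_back(n.right);
        }
    }
    return best;                                     // closest triangle index, or -1
}
```

On real hardware the shader issues an intersection instruction and can do other ALU work while the result comes back, which is the behavior Cerny describes; a design with full traversal hardware would also move the while-loop above into fixed function, and that is the part his statement doesn't cover.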
NAVI 21's 80 RT cores are close in count to NAVI 31's 96, yet NAVI 31's enhanced 96 RT cores delivered nearly twice the performance of NAVI 21's 80.
PS5's RT results are within the RDNA 2 power rankings, NOT the Ampere GA104 RT class, e.g. the 256-bit external bus RTX 3070 and RTX 3070 Ti SKUs.
PS5's Doom Eternal RT results are inferior to XSX's, and RTX 3070 beats both consoles.
Prove PS5 has Ampere's RT-level cores. Show PS5 beating RTX 3070 in RT!
Mark Cerny: "First we have a custom AMD GPU based on there 'RDNA2' technology what does that mean AMD is continuously improving and revising their tech for RDNA2 their goals were roughly speaking to reduce power of consumption by rhe architecting the GPU to put data close to where it's needed to optimize the GPU for performance and to adding new more advanced feature set.
But that feature set is malleable which is to say that we have our own needs for PlayStation and that can factor into what the AMD roadmap becomes.
So collaboration is born.
If we bring concepts to AMD that are felt to be widely useful then they can be adopted into RDNA - and used broadly including in PC GPUs.
Mark Cerny:" So we've implemented a gentler way of doing things where the coherency engines inform the GPU of the overwritten address ranges and custom scrubbers in several dozen GPU caches do pinpoint evictions of just those address ranges."
The GPU cache scrubbers are exactly why the complex slug text benchmark works more efficiently on the PS5's RDNA 2: as the depth of the problem increases (AAA games, even with simpler Forward+ rendering, are complicated compared to basic slug text), the scrubbers save more bandwidth, both by avoiding the latency of waiting for a transfer request/delivery and by reducing the volume of data being processed, since a lot of redundant data transfer is eliminated.
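A toy model of the difference, purely to illustrate the idea (the names and the map-based "cache" here are made up; no real driver exposes anything like this): a full flush throws away every resident line, while the scrubber path evicts only the lines inside the address range the coherency engines reported as overwritten.

```cpp
// Toy illustration only: pinpoint range eviction vs. a full cache flush.
// A real GPU cache is not an unordered_map, and the scrub happens in
// hardware across several dozen caches; this only shows why everything
// outside the overwritten range stays resident.
#include <cstdint>
#include <unordered_map>

struct ToyCache {
    static constexpr uint64_t kLineBytes = 64;
    std::unordered_map<uint64_t, int> lines;   // line index -> cached payload

    // Brute-force approach: invalidate everything, then refetch all hot data.
    void flush_all() { lines.clear(); }

    // Scrubber approach: the coherency engines report [base, base + size)
    // as overwritten, and only lines inside that range are evicted.
    void scrub_range(uint64_t base, uint64_t size) {
        for (auto it = lines.begin(); it != lines.end(); ) {
            uint64_t addr = it->first * kLineBytes;
            if (addr >= base && addr < base + size) it = lines.erase(it);
            else ++it;
        }
    }
};
```

The bandwidth saving described above comes from the second path: nothing outside the overwritten range has to be re-fetched, and the GPU never has to stall on a full flush while data streams in.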
PS5 GPU doesn't have PC RDNA 2's Infinity Cache design.
On a new-vs-new comparison, the RDNA 2 generation competes against NVIDIA's Ampere generation, and so does PS5.
RTX 3070 is NVIDIA's 256-bit GDDR6-14000 design, like PS5's, except PS5's 448 GB/s of memory bandwidth is shared between the CPU and GPU.
Comparing PS5 against the Turing RTX 2080 (TU104) is a lower-bar comparison, since Turing competed against the Vega II and RX 5700 series generation!
PS: I have an MSI RTX 3080 Ti Gaming X Trio OC (gaming room, faster than an RTX 3090 FE) and an MSI RTX 3070 Ti Suprim X (for the living room PC instead of game consoles).