AMD RDNA 1 & 2 GPU Driver Support Moved To “Maintenance” Mode, Game Optimizations & New Tech For RDNA 3, 4 & Beyond

This isn't a backtrack?

No, the RDNA 2 GPUs remain on a maintenance branch, as originally planned. They simply rephrased the same text due to the scale this issue reached.
Even Steve from Gamers Nexus noticed this. He even said that, judging by the language used, the text was written by a lawyer.
 
Were they really going to sideline older GPUs? Poor choice of words was the issue here.
Here is the popular Arc Raiders running on an RX 580.

 
"Market needs" sounds like "let's optimize for some new Call of Duty that comes out."

AMD confirms RDNA1/2 will get game optimizations alongside RDNA3/4, Call of Duty Black Ops 7 included

 
No. Nvidia only maintains security updates for those GPUs. The cutoff point for optimized drivers is for the RTX 2000 series.
AMD still provides security updates for GPUs prior to RDNA2. The problem is that there are no optimized drivers for RDNA1 and 2, from now on.
So what's the difference between "only maintains security updates" and "there are no optimized drivers for RDNA1 and 2, from now on"?

The cutoff point for optimized drivers is for the RTX 2000 series.
Not yet. Yesterday's Nvidia GRD release was still from the R580 branch, with Maxwell and Pascal support. This should be the last one, though.
Press has been running way ahead of the moment on this one.
 
AMD's Radeon Preview driver from https://www.amd.com/en/resources/su...-notes/RN-RAD-MS-AGILITY-SDK-25-10-07-01.html

AMD Radeon™ RX 7000 and 9000 series graphics products will support:
  • Advanced Shader Delivery
    • Target AMD's plugin DLL directly using --plugin <Your_Path>\amdxc64.dll
  • Application-Specific Driver States (PIX)
  • Fence Barriers
    • Limitation: "MaybeReorderThreads" does not move threads
  • Tightening Placed Resource Alignment
  • Tiled Resource Tier 4
AMD Radeon™ RX 9000 series graphics products will support:
  • Cooperative Vectors 1.0

RDNA 1 and RDNA 2 are missing hardware features.

NVIDIA Turing and Ampere don't have Shader Execution Reordering, i.e. it's a NOP (no operation) there.
This is wordplay on AMD's part.
"Limitation: "MaybeReorderThreads" does not move threads" means that the feature isn't actually working despite the driver declaring support.
It is exactly the same as on Turing and Ampere, where NVAPI also declares support for a feature that does nothing - the code just runs without any reordering.
SER will likely be supported from RDNA5 onward on AMD's h/w.
 
This is a word play on part of AMD.
"Limitation: "MaybeReorderThreads" does not move threads" means that the feature isn't actually working despite driver declaring support.
It is exactly the same as on Turing and Ampere where NVAPI also declares support which isn't doing anything - the code just runs without any reordering.
SER will likely be supported from RDNA5 onward on AMD's h/w.
RDNA4 owner here, is this thing a big deal?
 
This is a word play on part of AMD.
"Limitation: "MaybeReorderThreads" does not move threads" means that the feature isn't actually working despite driver declaring support.
It is exactly the same as on Turing and Ampere where NVAPI also declares support which isn't doing anything - the code just runs without any reordering.
SER will likely be supported from RDNA5 onward on AMD's h/w.

Shader Execution Reordering (SER) introduces a new HLSL built-in intrinsic, MaybeReorderThread, that enables application-controlled reordering of work across the GPU for improved execution and data coherence. Additionally, the introduction of HitObject allows separation of traversal, anyhit shading and intersection testing from closesthit and miss shading.

-------------

Shader Execution Reordering has two parts, i.e. MaybeReorderThread and HitObject.
Even on devices that don't do reordering, the HitObject portion of SER can be useful.
For instance, suppose an app wants to trace a ray, potentially including AnyHit shader invocations, and just wants the final T value without running the ClosestHit shader (even if it happens to exist in the HitGroup).
The app can call TraceRay returning a HitObject, call HitObject::GetRayTCurrent on the HitObject to get the T value and be done. Not calling HitObject::Invoke skips ClosestHit/Miss invocation, and this works on any device with Shader Model 6.9 support.

Try again.
 
RDNA4 owner here, is this thing a big deal?
Atm, the unified RDNA 3 and RDNA 4 preview driver supports half of SER.

It's NOP on Ampere and Turing.

With the Ada generation, Nvidia already used SER extensions through NVAPI in Cyberpunk 2077's path tracing. RDNA 3 / 4 is brute-forcing Cyberpunk's path tracing without NVAPI extensions.

In theory, RDNA 4 supports hardware out-of-order shader execution. Using it would require RDNA 4's segment of the driver code base to be separated from RDNA 3's.
 
Let's wait for RDNA5/Rubin
Both are highly likely to be less sensitive to traversal and shading divergence than RDNA2/3/4 are.
So I'd expect SER to be even less interesting on next gen h/w - and for that h/w to have more interesting things for RT than SER in it.
SER's main selling point is that it's a cheap optimization from h/w perspective, and anything which is getting you even +1% with close to zero transistors spent is interesting.

Try again.
"Try again" what?
You've just confirmed exactly what I've said - RDNA3/4 h/w won't do thread reordering.
HitObject is a purely API s/w optimization which doesn't require any support in the h/w.

SM 6.9 (required for the SER feature in DX) is supported from Turing onward on Nvidia, and I kinda thought that it would be supported from RDNA2 onward on AMD, as this makes sense feature-wise - but this thread suggests that AMD may be skipping RDNA2 for such support.
 
Both are highly likely to be less sensitive to traversal and shading divergence than RDNA2/3/4 are.
So I'd expect SER to be even less interesting on next gen h/w - and for that h/w to have more interesting things for RT than SER in it.
SER's main selling point is that it's a cheap optimization from h/w perspective, and anything which is getting you even +1% with close to zero transistors spent is interesting.


"Try again" what?
You've just confirmed exactly what I've said - RDNA3/4 h/w won't do thread reordering.
HitObject is a purely API s/w optimization which doesn't require any support in the h/w.

SM 6.9 (required for SER feature in DX) is supported from Turing onward on Nvidia and I kinda thought that it will be supported from RDNA2 onward on AMD as this makes sense feature wise - but this thread suggests that AMD may be skipping RDNA2 for such support.
Nvidia's Shader Execution Reordering (SER) allows for reordering threads that hit or miss.

RDNA 4's out-of-order memory access seems to be very similar to the capabilities of the Cortex-A510, which can absorb up to 2 cache misses without stalling the rest of the pipeline. The number of misses that an RDNA4 Compute Unit can handle is unknown.

From https://chipsandcheese.com/p/amds-rdna4-architecture-video
 
Nvidia's Shader Execution Reordering (SER) allows for reordering threads that hit or miss.
Which is where divergence happens.

RDNA 4's out-of-order memory access seems to be very similar to the capabilities of the Cortex-A510, which can absorb up to 2 cache misses without stalling the rest of the pipeline. The number of misses that an RDNA4 Compute Unit can handle is unknown.
This has nothing to do with SER.
 
Which is where divergence happens.


This has nothing to do with SER.
SER deals with data coherence. SER is about grouping threads based on local data coherence.

"SER allows for is reordering threads that hit or miss, as well as threads that go to the same cache or memory level, to be bundled in the same wave." - Microsoft

Out of Order Memory Access is another method for mitigating data load stalls with divergence.

For divergence mitigation,
1. NVIDIA's approach for an in-order processor (a GPU in this case) with many hyper-threads is to reorder the threads. The program order is changed.

2. A CPU company's approach for an in-order processor (a GPU in this case) is to add the out-of-order memory access feature. The program order is not changed. A CPU with OOOE (out-of-order execution) will process instructions and read data out of order, but maintain program order at the end. RDNA 4's approach is similar to ARM A510 CPU's approach, with just the out-of-order memory access, and this is where AMD's CPU knowledge base comes into play. ARM A510 is short of a full OOOE CPU design.
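
To put rough numbers on the second approach, here is a toy model. It is only a sketch under my own assumptions, not how RDNA 4 or the Cortex-A510 actually implement it: loads issue in program order, one per cycle at most, and the hypothetical parameter `max_outstanding` is how many misses can be in flight before the pipeline stalls. The function names and latency numbers are made up for illustration.

```python
def blocking_cycles(latencies):
    """In-order with blocking loads: each load must return before the next issues."""
    return sum(latencies)


def nonblocking_cycles(latencies, max_outstanding=2):
    """Loads still issue in program order, but up to max_outstanding may be in flight.

    Assumption (toy model): one load can issue per cycle, and the pipeline only
    stalls when the limit of outstanding loads is reached.
    """
    in_flight = []  # completion times of loads that have not returned yet
    cycle = 0
    finish = 0
    for latency in latencies:
        # Forget loads that have already returned by the current cycle.
        in_flight = [t for t in in_flight if t > cycle]
        if len(in_flight) >= max_outstanding:
            # Stall until the earliest outstanding load comes back.
            cycle = min(in_flight)
            in_flight = [t for t in in_flight if t > cycle]
        done = cycle + latency
        in_flight.append(done)
        finish = max(finish, done)
        cycle += 1  # the next load issues no earlier than the following cycle
    return finish


# Hypothetical latencies: two long misses with two short hits in between.
loads = [100, 4, 4, 100]
print(blocking_cycles(loads))        # 208
print(nonblocking_cycles(loads, 2))  # 109
```

With `max_outstanding=1` the model degenerates to the blocking case, which is the point: program order never changes, only how much miss latency can overlap.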

NVIDIA is not a proper CPU design house, given that it licensed ARM Holdings PLC's out-of-order-execution CPU IP, e.g. ARM's Neoverse.

Leading-edge CPU design houses have been dealing with divergent execution for a long time; the difference is that a GPU is an array of small processors. One might as well throw small OOOE processors into an array and call it a day, but the GPU ideology still excels at raster.
 
Out of Order Memory Access is another method for mitigating data load stalls with divergence.
You clearly don't understand what you're talking about. SER doesn't "help with data load stalls". It helps with optimal h/w utilization when ray tracing divergence leads to a shader thread group executing at less than the h/w width - SER allows threads to be "repacked" into wider groups in such cases.
Memory access stalls happen when a thread in a group needs something from memory, and depending on where this thread is in the pipeline, getting this data faster may significantly improve performance by limiting the length of a pipeline stall. This isn't necessarily related to any sort of divergence (which is also why it helps in general and not just when such execution divergence happens), it is not related to SER, and it doesn't help with h/w utilization when divergence happens - it may help with getting data from memory faster, but you will still get subpar execution h/w utilization w/o SER.
Also, we don't really know if RTX GPUs have similar OOO memory access features. They don't advertise a lot of what they have, for various reasons. I'd expect that they do have something similar, at least judging by the CUDA docs.
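
The "repacking" point can be sketched with a toy model. This is an illustration under stated assumptions, not how any real GPU or driver does it: a wave is assumed to be 32 lanes wide, each thread ends up needing one hit shader (identified by an integer), and a wave runs one serialized pass per distinct shader among its threads. The helper names are hypothetical.

```python
WAVE_WIDTH = 32  # assumed lanes per wave; real hardware differs


def count_passes(shader_ids, width=WAVE_WIDTH):
    """Serialized shader passes when threads are packed into waves in the given order.

    Toy assumption: a wave needs one pass per distinct shader among its threads,
    and lanes that want a different shader sit idle during that pass.
    """
    passes = 0
    for start in range(0, len(shader_ids), width):
        wave = shader_ids[start:start + width]
        passes += len(set(wave))
    return passes


def utilization(shader_ids, width=WAVE_WIDTH):
    """Fraction of lane slots doing useful work across all passes."""
    return len(shader_ids) / (count_passes(shader_ids, width) * width)


def reorder(shader_ids):
    """Stand-in for thread reordering: group threads that run the same shader."""
    return sorted(shader_ids)


# 128 threads whose hits alternate between four shaders (worst case for coherence).
divergent = [0, 1, 2, 3] * 32
print(utilization(divergent))           # 0.25
print(utilization(reorder(divergent)))  # 1.0
```

Nothing in this model touches memory latency: the gain comes purely from filling the waves, which is the distinction being argued above.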
 
I just read that support for the Pascal (GTX 1000) series has ended, but that Nvidia was actually still supporting the GTX 700, 800 and 900 series as of October? That's insane.

And they say that while game driver updates stop, quarterly security updates are promised up to 2028.
No game optimizations does not mean no driver support.

RX 400 series GPUs still receive driver updates in 2025. So rest assured RDNA1 will keep receiving driver updates.

On Linux, AMD GPUs get even longer-term driver support, with GPUs from 2002 still getting driver updates.

Though I should have expected a Microsoft fanboy to spread misinformation about AMD. Wintel for a reason.
 
This is a very interesting video about the situation with AMD drivers, according to a former AMD employee. The compiler that AMD uses is bad, and even RDNA4 suffers from bugs from the early GCN era.

 