Support NeoGAF

Wolzard · Nov 3, 2025

Mr Moose said:
This isn't a backtrack?

No, the RDNA 2 GPUs remain on a maintenance branch, as originally planned. They simply rephrased the same text due to the scale this issue reached.
Even Steve from Gamers Nexus noticed this. He even said that, judging by the language used, the text was written by a lawyer.

roosnam1980 · Nov 4, 2025

Article:

AMD confirms RDNA1/2 will get game optimizations alongside RDNA3/4, Call of Duty Black Ops 7 included

Source: https://videocardz.com/newz/amd-confirms-rdna1-2-will-get-game-optimizations-alongside-rdna3-4-call-of-duty-included

winjer · Nov 4, 2025

roosnam1980 · Nov 4, 2025

Were they really going to sideline older GPUs? Poor choice of words was the issue here.
Here is the popular Arc Raiders running on an RX 580.

Wolzard · Nov 4, 2025

Wolzard said:
"Market needs" sounds like "let's optimize for some new Call of Duty that comes out."

roosnam1980 said:
AMD confirms RDNA1/2 will get game optimizations alongside RDNA3/4, Call of Duty Black Ops 7 included

dgrdsv · Nov 5, 2025

winjer said:
No. Nvidia only maintains security updates for those GPUs. The cutoff point for optimized drivers is for the RTX 2000 series.
AMD still provides security updates for GPUs prior to RDNA2. The problem is that there are no optimized drivers for RDNA1 and 2, from now on.

So what's the difference between "only maintains security updates" and "there are no optimized drivers for RDNA1 and 2, from now on"?

winjer said:
The cutoff point for optimized drivers is for the RTX 2000 series.

Not yet. Yesterday's Nvidia GRD release was still from R580 branch with Maxwell and Pascal support. This should be the last one though.
Press has been running way ahead of the moment on this one.

dgrdsv · Nov 5, 2025

rnlval said:
AMD's Radeon Preview driver from https://www.amd.com/en/resources/su...-notes/RN-RAD-MS-AGILITY-SDK-25-10-07-01.html

AMD Radeon™ RX 7000 and 9000 series graphics products will support:

Advanced Shader Delivery

Target AMD's plugin DLL directly using --plugin <Your_Path>\amdxc64.dll

Application-Specific Driver States (PIX)

Fence Barriers

Limitation: "MaybeReorderThreads" does not move threads

Tightening Placed Resource Alignment

Tiled Resource Tier 4

AMD Radeon™ RX 9000 series graphics products will support:

Cooperative Vectors 1.0

RDNA 1 and RDNA 2 are missing hardware features.

NVIDIA Turing and Ampere don't have Shader Execution Reordering i.e. it's NOP(no operation).

This is wordplay on AMD's part.
"Limitation: "MaybeReorderThreads" does not move threads" means that the feature isn't actually working despite driver declaring support.
It is exactly the same as on Turing and Ampere where NVAPI also declares support which isn't doing anything - the code just runs without any reordering.
SER will likely be supported from RDNA5 onward on AMD's h/w.

Branded · Nov 5, 2025

I'm on a 6800XT... So I'm good for now or what?

roosnam1980 · Nov 5, 2025

dgrdsv said:
This is wordplay on AMD's part.
"Limitation: "MaybeReorderThreads" does not move threads" means that the feature isn't actually working despite driver declaring support.
It is exactly the same as on Turing and Ampere where NVAPI also declares support which isn't doing anything - the code just runs without any reordering.
SER will likely be supported from RDNA5 onward on AMD's h/w.

RDNA4 owner here , is this thing a big deal?

dgrdsv · Nov 5, 2025

roosnam1980 said:
RDNA4 owner here , is this thing a big deal?

Not really, gives some +10% of final performance with RT or so.
Although I'm sure there may be edge cases where this figure will be higher - and that different architectures may get different speedups from it.

SolidQ · Nov 5, 2025

dgrdsv said:
that different architectures may get different speedups from it.

Let's wait for RDNA5/Rubin

Branded said:
I'm on a 6800XT... So I'm good for now or what?

You're fine. Even the RX 580 still gets bug fixes.

rnlval · Nov 5, 2025

dgrdsv said:
This is wordplay on AMD's part.
"Limitation: "MaybeReorderThreads" does not move threads" means that the feature isn't actually working despite driver declaring support.
It is exactly the same as on Turing and Ampere where NVAPI also declares support which isn't doing anything - the code just runs without any reordering.
SER will likely be supported from RDNA5 onward on AMD's h/w.

DirectX Raytracing (DXR) Functional Spec

Engineering specs for DirectX features.

microsoft.github.io

Shader Execution Reordering (SER) introduces a new HLSL built-in intrinsic, MaybeReorderThread, that enables application-controlled reordering of work across the GPU for improved execution and data coherence. Additionally, the introduction of HitObject allows separation of traversal, anyhit shading and intersection testing from closesthit and miss shading.

-------------

Shader Execution Reordering has two parts, i.e. MaybeReorderThread and HitObject.

DirectX Raytracing (DXR) Functional Spec

Engineering specs for DirectX features.

microsoft.github.io

Even on devices that don't do reordering, the HitObject portion of SER can be useful.

For instance, suppose an app wants to trace a ray, potentially including AnyHit shader invocations, and just wants the final T value without running the ClosestHit shader (even if it happens to exist in the HitGroup).

The app can call TraceRay returning a HitObject, call HitObject::GetRayTCurrent on the HitObject to get the T value and be done. Not calling HitObject::Invoke, skips ClosestHit/Miss invocation, and this works on any device with Shader Model 6.9 support.

Try again.

rnlval · Nov 5, 2025

roosnam1980 said:
RDNA4 owner here , is this thing a big deal?

Atm, the unified RDNA 3 and RDNA 4 preview driver supports half of SER.

It's NOP on Ampere and Turing.

With the Ada generation, Nvidia already used SER extensions with NVAPI in Cyberpunk 2077's path tracing. RDNA 3 / 4 is brute-forcing Cyberpunk's path tracing without NVAPI extensions.

RDNA 4's "Out-of-Order" Memory Accesses

Examining RDNA 4's out-of-order memory accesses in detail, and investigating with testing

chipsandcheese.com

In theory, RDNA 4 supports hardware out-of-order shader execution. This requires the driver code base's segment to be separated from RDNA 3.

dgrdsv · Nov 6, 2025

SolidQ said:
Let's wait for RDNA5/Rubin

Both are highly likely to be less sensitive to traversal and shading divergence than RDNA2/3/4 are.
So I'd expect SER to be even less interesting on next gen h/w - and for that h/w to have more interesting things for RT than SER in it.
SER's main selling point is that it's a cheap optimization from h/w perspective, and anything which is getting you even +1% with close to zero transistors spent is interesting.

rnlval said:
Try again.

"Try again" what?
You've just confirmed exactly what I've said - RDNA3/4 h/w won't do thread reordering.
HitObject is a purely API s/w optimization which doesn't require any support in the h/w.

SM 6.9 (required for SER feature in DX) is supported from Turing onward on Nvidia and I kinda thought that it will be supported from RDNA2 onward on AMD as this makes sense feature wise - but this thread suggests that AMD may be skipping RDNA2 for such support.

rnlval · Nov 6, 2025

dgrdsv said:
Both are highly likely to be less sensitive to traversal and shading divergence than RDNA2/3/4 are.
So I'd expect SER to be even less interesting on next gen h/w - and for that h/w to have more interesting things for RT than SER in it.
SER's main selling point is that it's a cheap optimization from h/w perspective, and anything which is getting you even +1% with close to zero transistors spent is interesting.

"Try again" what?
You've just confirmed exactly what I've said - RDNA3/4 h/w won't do thread reordering.
HitObject is a purely API s/w optimization which doesn't require any support in the h/w.

SM 6.9 (required for SER feature in DX) is supported from Turing onward on Nvidia and I kinda thought that it will be supported from RDNA2 onward on AMD as this makes sense feature wise - but this thread suggests that AMD may be skipping RDNA2 for such support.

Nvidia's Shader Execution Reordering (SER) allows for reordering threads that hit or miss.

RDNA 4's out-of-order memory access seems to be very similar to the capabilities of the Cortex-A510, which can absorb up to 2 cache misses without stalling the rest of the pipeline. The number of misses that an RDNA4 Compute Unit can handle is unknown.

From https://chipsandcheese.com/p/amds-rdna4-architecture-video

dgrdsv · Monday at 7:33 AM

rnlval said:
Nvidia's Shader Execution Reordering (SER) allows for reordering threads that hit or miss.

Which is where divergence happens.

rnlval said:
RDNA 4's out-of-order memory access seems to be very similar to the capabilities of the Cortex-A510, which can absorb up to 2 cache misses without stalling the rest of the pipeline. The number of misses that an RDNA4 Compute Unit can handle is unknown.

This has nothing to do with SER.

rnlval · Monday at 10:23 PM

dgrdsv said:
Which is where divergence happens.

This has nothing to do with SER.

GDC 2025: DirectX State of the Union

Take a closer look at the latest improvements and new features available soon to developers in DirectX.

developer.microsoft.com

SER deals with data coherence. SER is about grouping threads based on local data coherence.

"SER allows for is reordering threads that hit or miss, as well as threads that go to the same cache or memory level, to be bundled in the same wave." - Microsoft

Out of Order Memory Access is another method for mitigating data load stalls with divergence.

For divergence mitigation,
1. NVIDIA's approach for an in-order processor (a GPU in this case) with many hyper-threads is to reorder the threads. The program order is changed.

2. A CPU company's approach for an in-order processor (a GPU in this case) is to add the out-of-order memory access feature. The program order is not changed. A CPU with OOOE (out-of-order execution) will process instructions and read data out of order, but maintain program order at the end. RDNA 4's approach is similar to ARM A510 CPU's approach, with just the out-of-order memory access, and this is where AMD's CPU knowledge base comes into play. ARM A510 is short of a full OOOE CPU design.
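To make the "out-of-order memory access" idea in point 2 concrete, here is a toy Python model. It is only a sketch under made-up assumptions: one load is issued per cycle, the latencies are hypothetical, and `completion_times` is an illustrative helper, not a model of how RDNA 4 or the Cortex-A510 actually works. It shows only that with in-order data return, loads that hit in cache wait behind an earlier miss, while with out-of-order return they do not.

```python
def completion_times(latencies, in_order_return):
    """Return the cycle at which each load's data becomes usable.

    Assumption for illustration: load i is issued at cycle i.
    """
    done = []
    for issue_cycle, latency in enumerate(latencies):
        ready = issue_cycle + latency
        if in_order_return and done:
            # Data must be returned in issue order, so a fast load
            # cannot complete before the load issued ahead of it.
            ready = max(ready, done[-1])
        done.append(ready)
    return done


# Hypothetical latencies: one cache miss followed by three cache hits.
latencies = [400, 20, 20, 20]

print(completion_times(latencies, in_order_return=True))   # [400, 400, 400, 400]
print(completion_times(latencies, in_order_return=False))  # [400, 21, 22, 23]
```

Program order is untouched in both cases. Only the time at which each load's data becomes available changes.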

NVIDIA is not a proper CPU design house, given that they licensed ARM Holdings PLC's out-of-order-execution CPU IP, e.g. ARM's Neoverse.

Leading-edge CPU design houses have been dealing with divergent execution for a long time; the difference is that the GPU is an array of small processors. Might as well throw small OOOE processors into an array and call it a day, but the GPU ideology still excels at raster.

dgrdsv · Tuesday at 8:18 AM

rnlval said:
Out of Order Memory Access is another method for mitigating data load stalls with divergence.

You clearly don't understand what you're talking about. SER doesn't "help with data load stalls". It helps with h/w utilization when ray tracing divergence leads to a shader thread group executing with fewer active threads than the h/w width - SER allows such threads to be "repacked" into wider groups.
Memory access stalls happen when a thread in a group needs something from memory, and depending on where this thread is in the pipeline, getting this data faster may significantly improve performance by limiting the length of the pipeline stall. This isn't necessarily related to any sort of divergence (which is also why it helps in general and not just when execution divergence happens), it isn't related to SER, and it doesn't help with h/w utilization when divergence happens - it may help with getting data from memory faster, but you will still get subpar execution h/w utilization w/o SER.
Also we don't really know if RTX GPUs have similar OOO memory access features. They don't advertise a lot of what they have due to different reasons. I'd expect that they do have something similar, at least from what they have according to CUDA docs.
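The "repacking" argument above can be sketched with a toy Python model. Everything here is an assumption for illustration: the wave width, the shader IDs, and the helpers `wave_passes`, `utilization` and `reorder` are hypothetical, and real hardware schedulers are far more involved. The model assumes a wave has to run one pass per distinct shader among its threads, with the lanes belonging to other shaders idle during that pass.

```python
def wave_passes(shader_ids, wave_width):
    """Count execution passes: each wave runs once per distinct shader it contains."""
    passes = 0
    for start in range(0, len(shader_ids), wave_width):
        wave = shader_ids[start:start + wave_width]
        passes += len(set(wave))
    return passes


def utilization(shader_ids, wave_width):
    """Fraction of lanes doing useful work, averaged over all passes."""
    return len(shader_ids) / (wave_passes(shader_ids, wave_width) * wave_width)


def reorder(shader_ids):
    """Idealized reorder: group threads that will run the same shader."""
    return sorted(shader_ids)


# Hypothetical workload: 32 rays, each hitting one of 4 materials,
# interleaved so every 4-wide wave contains all 4 shaders.
rays = [0, 1, 2, 3] * 8

print(utilization(rays, wave_width=4))           # 0.25
print(utilization(reorder(rays), wave_width=4))  # 1.0
```

In this model a driver that accepts the reorder call but leaves threads where they are corresponds to skipping `reorder`: the code still runs correctly, only at the lower utilization.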

Bernkastel · Tuesday at 11:56 AM

Buggy Loop said:
I just read that the support for 1000 pascal series ended but actually that nvidia was as of October still supporting GTX 700, 800, 900 series? That's insane

And they say while driver updates stop for games, quarterly security updates are promised up to 2028

No game optimizations does not mean no driver support.

https://www.amd.com/en/support/downloads/drivers.html/graphics/radeon-600-500-400/radeon-rx-400-series/radeon-rx-480.html

RX 400 series GPUs still receive driver updates in 2025. So rest assured RDNA1 will keep receiving driver updates.

On Linux, AMD GPUs get even longer-term driver updates, with GPUs from 2002 still getting driver updates.

https://news.ycombinator.com/item?id=38888897

Though I should have expected a Microsoft fanboy to spread misinformation about AMD. Wintel for a reason.

Wolzard · Tuesday at 1:45 PM

This is a very interesting video about the situation with AMD drivers, according to a former AMD employee. The compiler that AMD uses is bad, and even RDNA4 suffers from bugs from the early GCN era.

AMD RDNA 1 & 2 GPU Driver Support Moved To “Maintenance” Mode, Game Optimizations & New Tech For RDNA 3, 4 & Beyond