
DF: AMD FSR 4 Upscaling Tested vs DLSS 3/4 - A Big Leap Forward - RDNA 4 Delivers!

Gaiff

SBI’s Resident Gaslighter
I don't game on PC, but is there a rule saying AMD is not allowed to keep FSR 4 exclusive on AMD cards?

From a competitive perspective, I'm sure they'd want to.
There’s no rule, but it leverages AMD’s software in different ways than DLSS. I don’t think AMD will ever bother changing it to make it compatible with NVIDIA cards.
 

Kataploom

Gold Member
casual people won't care
my friend played Black Myth: Wukong with FSR set to performance... at 1080p
I asked her like "how does the game look, any good?" (without any upscaling context, by the way; she doesn't know anything about that stuff). She said, "hmm, it's fine idk"

another friend of mine doesn't even bother using "quality" mode as he thinks it will reduce his FPS. Most people have no clue about how any of this works
It's more like they see the screen and don't give a fuck. It happens to me as well; there are some artifacts from FSR that I really couldn't stand in a couple of games at most... But most of the time? It feels like free performance at no compromise, no kidding. I can't see the ghosting while gaming like many of you. I've tried, and I have perfect 20/20 eyes with no glasses, but I just don't give a fuck, so my brain basically omits the alleged issues.
 

Nex240

Neo Member
Still interested to see more comparisons at low resolutions. Most of these comparisons have been incredibly kind to AMD by testing at 4K (1440p internal for quality, 1080p for performance). DLSS has really been an absolute cheat code at 720p or lower, where FSR3 and PSSR start to really struggle.
Most PC gamers play on 1080p/1440p monitors, so they would be rendering at sub-1080p on DLSS much of the time (sub-1080p if they pick balanced or performance at 1440p).
 

yamaci17

Member
Still interested to see more comparisons at low resolutions. Most of these comparisons have been incredibly kind to AMD by testing at 4K (1440p internal for quality, 1080p for performance). DLSS has really been an absolute cheat code at 720p or lower, where FSR and PSSR start to really struggle.
Most PC gamers play on 1080p/1440p monitors, so they would be rendering at sub-1080p on DLSS. I believe quality DLSS at 1440p is about 900p.
quite good at 1440p
 

PaintTinJr

Member
We have yet to see FSR 4 running with 300 TOPs while producing superior results at a similar processing cost per frame, so I don't think we can declare PSSR an 'inferior' custom solution yet.

Why would Sony stick to RDNA 3/4 for PS6 in 2028?... This isn't Nintendo. It will surely incorporate hardware from a future/yet-to-be-released AMD architecture along with original customizations, like in their past systems. PS5 Pro is already RDNA 4 when it comes to RT hardware.
Agreed, although the TOPs comparison is going to be a calculation in each system, or at least for the Pro, because the Pro is using nothing like 300 TOPs of its mutually inclusive 16.7 TF/s.

If a game ran at a locked 60fps on the Pro with PSSR taking 1.5ms per frame, then 300 TOPs * (60 * 1.5 / 1000) tells you the real processing cost (e.g. 27 TOPs).

Whereas in, say, the RX 9070 XT, FSR4 is using all 1557 TOPs of the tensor cores regardless of frame-rate AFAIK, meaning you can't async compute more work in the idle periods.

So the silicon cost for comparison is all those tensor cores, which makes sense because the FSR4 algorithm runs isochronously on the tensors at the fixed or varied frame-rate, so interleaving other independent game compute tasks - which will also be isochronous - will have difficulty working around FSR4 to saturate what FSR4 doesn't use.

However, if we ignore that and say the game ran at a locked 90fps on the RX 9070 XT with a comparative FSR4 frame time of 2.5ms,

then that would be 1557 * (90 * 2.5 / 1000) = 350 TOPs, and the TOPs per frame would be
(27/60 vs 350/90), or

0.45 vs 3.9

still far fewer TOPs per frame for PSSR than FSR4, in the 5x-10x range.
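Since the per-frame arithmetic above is easy to get turned around, here is a minimal Python sketch of the same calculation. The peak-TOPs, frame-rate and frame-time figures are the assumed values from this post, not measurements.

```python
def tops_used(peak_tops: float, fps: float, frame_time_ms: float) -> float:
    """Average TOPs actually consumed: peak throughput scaled by the
    fraction of each second the upscaler occupies the ML hardware."""
    return peak_tops * (fps * frame_time_ms / 1000.0)

# PS5 Pro / PSSR: 300 peak TOPs, locked 60fps, 1.5ms per frame
pssr = tops_used(300, 60, 1.5)       # 27.0 TOPs
# RX 9070 XT / FSR4: 1557 peak TOPs, locked 90fps, 2.5ms per frame
fsr4 = tops_used(1557, 90, 2.5)      # ~350.3 TOPs

# TOPs per frame: 0.45 vs ~3.89, roughly an 8-9x gap
print(pssr / 60, fsr4 / 90)
```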
 

SKYF@ll

Member
Agreed, although the TOPs comparison is going to be a calculation in each system, or at least for the Pro, because the Pro is using nothing like 300 TOPs of its mutually inclusive 16.7 TF/s.

If a game ran at a locked 60fps on the Pro with PSSR taking 1.5ms per frame, then 300 TOPs * (60 * 1.5 / 1000) tells you the real processing cost (e.g. 27 TOPs).

Whereas in, say, the RX 9070 XT, FSR4 is using all 1557 TOPs of the tensor cores regardless of frame-rate AFAIK, meaning you can't async compute more work in the idle periods.

So the silicon cost for comparison is all those tensor cores, which makes sense because the FSR4 algorithm runs isochronously on the tensors at the fixed or varied frame-rate, so interleaving other independent game compute tasks - which will also be isochronous - will have difficulty working around FSR4 to saturate what FSR4 doesn't use.

However, if we ignore that and say the game ran at a locked 90fps on the RX 9070 XT with a comparative FSR4 frame time of 2.5ms,

then that would be 1557 * (90 * 2.5 / 1000) = 350 TOPs, and the TOPs per frame would be
(27/60 vs 350/90), or

0.45 vs 3.9

still far fewer TOPs per frame for PSSR than FSR4, in the 5x-10x range.
RX 9070: 291 TOPS = 582.5 TOPS (w/ sparsity) *INT8
RX 9070 XT: 389 TOPS = 778.5 TOPS (w/ sparsity) *INT8
PS5 Pro: 300 TOPS = 600 TOPS (w/ sparsity) *INT8 (Sony never mentioned sparsity)

RX 9070 XT: 1557 TOPS (w/ sparsity) *INT4

Do you just compare INT4 and INT8?
I'm confused because different people claim that FSR4 uses INT4 or FP8.
Also, one journalist said that AMD uses INT4 numbers to make performance appear higher.
 

PaintTinJr

Member
RX 9070: 291 TOPS = 582.5 TOPS (w/ sparsity) *INT8
RX 9070 XT: 389 TOPS = 778.5 TOPS (w/ sparsity) *INT8
PS5 Pro: 300 TOPS = 600 TOPS (w/ sparsity) *INT8 (Sony never mentioned sparsity)

RX 9070 XT: 1557 TOPS (w/ sparsity) *INT4

Do you just compare INT4 and INT8?
I'm confused because different people claim that FSR4 uses INT4 or FP8.
Also, one journalist said that AMD uses INT4 numbers to make performance appear higher.
Good point.

INT8 is/was the common means of discussing TOPs, but as INT4 and INT2 become universally available and applied in certain ML problems, and since TOPs is unitless as Tera OPerations (per) second, the discussion will probably shift to whatever the comparative unit is AFAIK. So, given that FSR4 and PSSR do indeed use INT4 for quantisation - I'm sure I've read that in documents - the TOPs used by PSSR would indeed need to be doubled, and the scale difference per frame versus the RX 9070 XT would halve to 2.5x-5x.

I was working on the basis of sparsity being available in the Pro and used, same as with FSR4, but not being part of the calculation. I can't remember exactly where I read that the Pro uses sparsity - it must have been in the leaked document thread - but given how long hardware matrix sparsity has been available in AMD RDNA 3, prior to the Pro design being finished, I'm 99.9% sure it is in. But as the TOPs numbers are derived from the clock and FLOPs-to-INT-OPS conversion, with 3 TOPs per unit - as per Cerny's technical follow-up on the Pro - the 300 and 600 TOPs numbers are without sparsity, which would be a level comparison with the RDNA4/RX 9070's 1557 TOPs AFAIK.

In the event the Pro doesn't support sparsity, the comparison would be even more impressive for the Pro, depending on how many inferenced equation variables are zero. Sparsity would probably matter multitudes more for training than inferencing, IMO.
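A hedged sketch of one way to compare the quoted figures: normalise everything back to dense INT8 TOPs before comparing. The x2 factors for each halving of precision and for structured sparsity are the standard marketing multipliers, assumed here rather than measured throughput.

```python
def dense_int8_equiv(tops: float, precision_bits: int = 8, sparse: bool = False) -> float:
    """Convert a quoted TOPS figure to a dense-INT8-equivalent number,
    undoing the usual 2x-per-precision-halving and 2x-sparsity multipliers."""
    scale = (8 / precision_bits) * (2.0 if sparse else 1.0)
    return tops / scale

# RX 9070 XT quoted at 1557 TOPS (INT4, with sparsity)
print(dense_int8_equiv(1557, precision_bits=4, sparse=True))   # 389.25, matching the dense INT8 figure
# PS5 Pro quoted at 300 TOPS (INT8; sparsity unconfirmed by Sony)
print(dense_int8_equiv(300))                                   # 300.0
```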
 

winjer

Gold Member

AMD FSR 4, the latest version of the upscaling technology made by AMD, was co-developed with Sony. The news came in a tweet pushed by AMD itself, where it was added that this is just the beginning:

The collaboration is specifically on the AI models, then. It's the big AMD FSR 4 change, after all. FSR 1 was just a spatial upscaler, whereas FSR 2 switched to a temporal upscaler. Neither could possibly perform as well as an AI model, though, which has now been proven with the launch of FSR 4, described as a massive improvement over FSR 3. Of course, the trade-off is that AMD had to restrict it to the new Radeon RX 9000 Series, just like NVIDIA has been doing with DLSS and its GeForce RTX Series.
 

Lysandros

Member
RX 9070: 291 TOPS = 582.5 TOPS (w/ sparsity) *INT8
RX 9070 XT: 389 TOPS = 778.5 TOPS (w/ sparsity) *INT8
PS5 Pro: 300 TOPS = 600 TOPS (w/ sparsity) *INT8 (Sony never mentioned sparsity)

RX 9070 XT: 1557 TOPS (w/ sparsity) *INT4

Do you just compare INT4 and INT8?
I'm confused because different people claim that FSR4 uses INT4 or FP8.
Also, one journalist said that AMD uses INT4 numbers to make performance appear higher.
I thought AMD's figures were for INT8 without sparsity; that is somewhat misleading, then. I think few people have a real grasp of how Sony is achieving 300 TOPs via their custom solution despite Cerny's explicit explanation of the math. It seems the focus is TOPs efficiency/real throughput; there is a strong distinction to be made there versus stock AMD hardware in this context. As nearly always, it's more about efficiency than inflated theoretical numbers.

In the latest video, DF/Alex assumed that PS5 Pro's 300 TOPs is achieved via sparsity to emphasize the machine's lesser numbers compared to the new AMD GPUs, despite Sony not mentioning sparsity, as you said.
 

PaintTinJr

Member
I thought AMD's figures were for INT8 without sparsity; that is somewhat misleading, then. I think few people have a real grasp of how Sony is achieving 300 TOPs via their custom solution despite Cerny's explicit explanation of the math. It seems the focus is TOPs efficiency/real throughput; there is a strong distinction to be made there versus stock AMD hardware in this context. As nearly always, it's more about efficiency than inflated theoretical numbers.

In the latest video, DF/Alex assumed that PS5 Pro's 300 TOPs is achieved via sparsity to emphasize the machine's lesser numbers compared to the new AMD GPUs, despite Sony not mentioning sparsity, as you said.
Alex is really showing his arse there if a) he can't do the maths following Cerny's info to work out the 300 TOPs, and b) he thinks sparsity is as important for inference as it is for the regression maths in training, such that not using it for inferencing would be a big issue.

I would even hazard a guess that sparsity is more beneficial to standard T&L transformations - transposing geometry into projected camera space - than to the generative adversarial networks (comprising convolutional neural networks) used for upscaled game graphics.

Unlike in other forms of inferencing where the data might be heavily incomplete, the data being fed to the inference equations for this type of problem is all decided and controlled in advance, so why would developers even choose to use variables that don't contain data - for sparsity to be a meaningful saving?
 