I see a lot of bullshit flying around in all the threads about how RT is implemented in NV/AMD hardware and how one is superior to the other or vice versa.
I hope I can put all of these arguments to rest once and for all. (err, who am I kidding)
TL;DR NV and AMD solutions are almost exactly the same. Nothing to see here.
Now, if we actually open the Nvidia whitepaper on ray tracing, we can see (roughly) how it is built.
Each Turing SM (Streaming Multiprocessor) contains a piece of special silicon called the "RT core" (I'll call it an RT unit below). It sits close to the TEX fetch units, in fact just before them: it is on the VRAM->L1 cache path, right in front of the texture units.
And each group of 4 texture units has 1 RT unit serving it.
But what's the performance of an RT unit?
We do not know.
But we do know the texture unit performance: for a typical 2080 Ti in boost mode it's ~420 GTex/sec (272 texture units at ~1.545 GHz).
Because RT units sit on the path to the texture cache, they cannot possibly fetch from VRAM faster than that.
I suspect NV doesn't state actual max perf numbers simply because the intersection check itself is nearly instant (probably a handful of clocks) once the sample is loaded from memory.
How do I know it?
From the same presentation we can see that the RT core accelerates ray->AABB/triangle intersection tests within a BVH structure.
What's a BVH?
It's a tree of bounding boxes enclosing every "object"/"group of objects" in the scene; once a box encloses a sufficiently small object, the leaves of that subtree are the triangles the object is built from.
So essentially the BVH covers your entire scene, everything and every fucking triangle.
Which in turn means BVH structures are huge: a lot of memory, how much depends on how precise you want your effects to be, but still.
The size and maintenance of the BVH (you need to add new objects to the tree and restructure it whenever an object moves) is the first stumbling block of real-time raytracing.
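To make the structure concrete, here is a minimal software sketch of a BVH node and the ray-vs-box test. This is illustrative Python under my own simplifications, not any vendor's actual layout: real driver-built BVHs pack boxes and child pointers into compact GPU-friendly blocks, but the idea is the same.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified BVH node: inner nodes carry children,
# leaf nodes carry the triangles their box finally encloses.
@dataclass
class BVHNode:
    lo: tuple                                      # AABB min corner (x, y, z)
    hi: tuple                                      # AABB max corner (x, y, z)
    children: list = field(default_factory=list)   # inner node: child nodes
    triangles: list = field(default_factory=list)  # leaf node: triangle refs

def ray_hits_aabb(origin, inv_dir, lo, hi):
    """Classic "slab" test: the ray hits the box iff its entry/exit
    intervals along all three axes overlap. inv_dir is 1/direction."""
    tmin, tmax = 0.0, float("inf")
    for a in range(3):
        t1 = (lo[a] - origin[a]) * inv_dir[a]
        t2 = (hi[a] - origin[a]) * inv_dir[a]
        tmin = max(tmin, min(t1, t2))
        tmax = min(tmax, max(t1, t2))
    return tmin <= tmax
```

Note how tiny the test itself is, a few multiplies and compares per axis, which is exactly why fetching the node, not the math, ends up being the expensive part.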
So, to test a ray against the BVH, we ask the texture cache to load a small BVH "node" into our RT unit, check for intersection, then load the next node, and so on.
Nowhere does Nvidia mention any special cache for the BVH, or any cache inside the RT unit whatsoever, so until further notice we should assume no internal memory in RT units.
That's why each RT unit is effectively bottlenecked by how fast "textures" (in our case, pieces of the BVH tree) can be fetched from VRAM.
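The fetch-then-test loop described above can be sketched like this. All names here (`fetch_node`, `hits_box`, `hit_triangles`) are hypothetical stand-ins for the hardware's memory path and intersection logic, not real interfaces; the point is the data dependency: every step must fetch a node through the cache path before the cheap test can run.

```python
def trace(root_id, fetch_node, hits_box, hit_triangles):
    """Sketch of BVH traversal. Throughput is capped by how fast
    fetch_node can deliver nodes, not by the intersection tests."""
    closest = None                        # nearest hit distance found so far
    stack = [root_id]
    while stack:
        node = fetch_node(stack.pop())    # memory fetch: the limiting step
        if not hits_box(node["box"]):     # ray vs AABB: prune whole subtree
            continue
        if "tris" in node:                # leaf: test the actual triangles
            for t in hit_triangles(node["tris"]):
                if closest is None or t < closest:
                    closest = t
        else:                             # inner node: descend into children
            stack.extend(node["children"])
    return closest
```

Each iteration is serialized behind a memory access, so with no internal cache the RT unit can only go as fast as nodes arrive from the texture path.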
Now, we do have another number for the 2080 Ti: "10 Grays/sec". How was it calculated?
According to the same whitepaper, it's the best synthetic benchmark result they could achieve on primary-ray intersections against a specific curated set of BVHs.
If we look at the more realistic scenario of a multi-ray benchmark here, we can see that performance drops further, to 3-3.5 Grays/sec.
These are synthetic. Actual games will have even less performance available.
So what will happen in actual games?
Let's return to the whitepaper: the RT core is invoked by scheduling an instruction from the shader, and the result is returned to the shader engine (probably in a register).
That means "rasterization" isn't going anywhere: once we have the intersection, it's up to the shader itself to decide what to do with it, how to color the pixel, how to render the shadow, etc.
RT units accelerate BVH traversal and nothing else; all the usual shaders still have to run, burning plain unaccelerated FLOPS to render the final image.
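A rough sketch of that division of labor, with all names hypothetical: the shader issues a trace, the RT unit only answers "what did the ray hit?", and everything after that, shading, shadow decisions, writing the pixel, is ordinary shader ALU work.

```python
def shade_pixel(ray, rt_trace, materials):
    """rt_trace stands in for the hardware-accelerated part: it does BVH
    traversal and nothing else. Everything below it is regular shader math."""
    hit = rt_trace(ray)                  # accelerated: find what the ray hit
    if hit is None:
        return (0, 0, 0)                 # ray escaped the scene: background
    # From here on it's normal, unaccelerated shader FLOPS:
    mat = materials[hit["tri"]]
    occluder = rt_trace(hit["shadow_ray"])   # shadow rays loop back through RT
    light = 0.0 if occluder is not None else 1.0
    return tuple(c * light for c in mat)
```

Note that even the shadow test goes back through the RT unit for traversal, but deciding what the shadow *means* for the pixel is still the shader's job.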
What about AMD?
Let's check the AMD patent here.
What do we see? "Intersection engines" (IE) colocated with the texture cache units.
Each TEX unit gets an intersection engine that receives a BVH node from the cache and returns the result to the shader.
It's exactly the same path, but with 1 IE per 1 TEX unit.
The only real difference is that the "NV RT" unit sits before the cache, while the "AMD RT" engine sits after it.
So what about bottlenecks?
It's exactly the same story. For the XSeX RDNA2 GPU, the texel rate is 208 TEX units × 1.825 GHz = 379.6 GTex/sec.
And we have a number from MSFT: "380 billion BVH traversals per second".
Ring a bell? Yep. We are still limited by the ~380 GTex/sec.
The same as NV.
Can we compare 2080Ti to XSeX?
Yep, now we can.
We can roughly estimate the theoretical difference in max RT performance between the 2080 Ti and the XSeX: 420 vs 380 GTex/sec, i.e. 10 vs ~9 Grays/sec.
Pretty close. But again, actual in-game numbers will be much, much, much lower.
Probably to the point that there is no difference at all.
What about PS5?
Simple: 144 TEX units × 2.23 GHz = 321 GTex/sec (yes, that's a boost clock, but we used the boost clock for NV too).
Which puts it at a theoretical ~7.6 Grays/sec. Not bad, but lower than the other two.
For reference, NV states 6 Grays/sec for the 2070, so the PS5 still comes out ahead of that.
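Here is the back-of-the-envelope arithmetic from this post collected in one place. The TMU counts and clocks are the ones used above (272 × ~1.545 GHz for the 2080 Ti is the standard boost-spec figure), and the Grays/sec estimates simply scale Nvidia's 10 Grays/sec claim by texel rate, which only holds under the fetch-bound assumption argued earlier.

```python
# Estimated peak ray rates, assuming BVH traversal is fetch-bound and
# therefore scales with texel fetch rate. Nvidia's 10 Grays/sec claim
# for the 2080 Ti is used as the anchor point.

def texel_rate(tmus, clock_ghz):
    return tmus * clock_ghz               # GTex/sec

rtx2080ti = texel_rate(272, 1.545)        # ~420 GTex/sec at boost
xsex      = texel_rate(208, 1.825)        # 379.6 GTex/sec
ps5       = texel_rate(144, 2.23)         # ~321 GTex/sec

def est_grays(gtex):
    return 10.0 * gtex / rtx2080ti        # scale from the 2080 Ti anchor

print(round(est_grays(xsex), 1))          # ~9.0 Grays/sec
print(round(est_grays(ps5), 1))           # ~7.6 Grays/sec
```

These are ceilings for a synthetic best case; as noted above, real in-game numbers land far below all three.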
Questions?