
PS5 Pro Specs Leak are Real, Releasing Holiday 2024 (Insider Gaming)

onQ123

Member
So you agree it's software. Glad to hear it.

I also love how you conveniently ignored the fact that we were discussing massive stutters to 0 in Lords of the Fallen and huge stutters in Avatar's 30 and 40 fps modes that drop 10-20 fps in a second almost randomly. It was literally in the video that was posted. That is what's being discussed, not average performance of XSX vs PS5, where the slower clocks or wide design of the XSX HARDWARE might be keeping it from consistently outperforming the PS5.
And you're ignoring that Xbox Series X has a strange memory configuration that might be causing these stutters?
 

SlimySnake

Flashless at the Golden Globes
And you're ignoring that Xbox Series X has a strange memory configuration that might be causing these stutters?
Yes, because other games have had these same issues and they were fixed after some patches: Callisto, Star Wars, Control, pretty much every single launch game.

Poor optimization, nothing more. The XSX has more than enough power to run games at a decent resolution and framerate. Those drops to 0 fps are just bugs. Same shit as Skyrim on the PS3.

The PS5 and XSX literally have the same exact RDNA2 GPU and Zen 2 CPU. Both use GDDR. There is no ESRAM vs GDDR fiasco this gen. No Cell vs Xenon processor nonsense. No Nvidia vs ATI GPU delta. If it's DirectX issues then it should've affected the PC versions as well. This is just a very simple case of people not playing their own games before they are shipped.

XSX outperforms the PS5 in a lot of games. If these issues were a hardware thing then every single game would be affected. It's possible that the slower clocks and wide design are making the XSX perform roughly on par with the PS5 in some games, but those traversal stutters in Avatar and 0 fps bugs are not due to PS5's super secret Cerny IO sauce or higher clocks. They are just bugs. There is nothing for DF to investigate in those scenarios. That's for the devs to investigate.

Most of us are millennials on this board. We all should remember how tearing and texture filtering were massive issues in virtually every single UE3, Ass Creed and Ubisoft game on the PS3. Nothing worked until around 2010 when they all mysteriously went away. All of a sudden, AC games, EA games, and virtually every third party game were virtually identical to the X360 versions, which was supposedly the more powerful console. It turned out devs just got more familiar with the PS3 hardware, helped by all the work Sony engineers put in to train devs on how to code for the Cell. Hell, even ND were baffled by the tearing and shipped Uncharted 1 with severe screen tearing, only to be told by GG that the fix is rather simple, and it took them a week to add that in to Uncharted 2 or something silly like that. But that was the Cell and it was the early days of HD era game development. XSX architecture is anything but exotic, so these bizarre traversal stutters and 0 fps issues are indeed lazy devs.
I don't think you got the gist of my post. XSX is actually slightly outperforming PS5 in many of those UE5 games when it's compute / bandwidth limited. Interestingly PS5 is outperforming XSX when there is hardware RT involved, but that's another problem entirely (likely better tools on Sony SDK when there is hardware RT involved).

But when it's I/O limited (traversal stutters) suddenly PS5 is outperforming XSX in both UE5 and custom engines (like Avatar). Why isn't everyone suddenly doing 2 + 2 = 4 here? Why would PS5 have an impressive I/O advantage over XSX here? Mysterious or elephant in the room?
 

GermanZepp

Member
So Series X hardware is a software issue?
 

ChiefDada

Gold Member
Yeah... I'm confused why people are lying to themselves on this. Guys... the PS5 Pro will perform like a console that's 15 TFs with added/better ray tracing and machine learning with PSSR.

So 15tf on top of hw acceleration for compute work that teraflop metrics traditionally cover. So more like 50+TF equivalent.

Overall, it'll probably feel like a console that's 70% better than the Xbox Series X games with raytracing turned on (like Cyberpunk) or 30-40% better without raytracing. But don't expect a next-gen level jump here.

Considering base PS5 often performs better than Series X in RT games like Cyberpunk, I'd say you're coming in WAY too low with the "70% higher than Series X" comparison.
 

winjer

Gold Member
This is the alleged BVH8 Traversal Shader implementation the PS5 Pro will benefit from. I'm guessing it's a result of collaboration between AMD and Sony. I'm curious to see what it will be doing on the hardware level; Keplar already suggested it would be as performant as Lovelace.

RDNA2 already was doing a deep BVH.
ChipsandCheese noticed that it averaged a 7 level BVH structure, but sometimes going as much as 11 levels deep.
So I don't get why a BVH with 8 levels is anything special this time around.
 

Lysandros

Member
Interestingly PS5 is outperforming XSX when there is hardware RT involved, but that's another problem entirely (likely better tools on Sony SDK when there is hardware RT involved).
Not to dismiss the software side of things, there are also hardware facets to consider for this phenomenon. Ray bounces are calculated faster on PS5 due to the clock differential, CUs/intersection engines have substantially less GPU L1 cache amount/bandwidth to work with on XSX (BVH pressure on cache to be noted), intersection engines might be a bit 'freer' to process their workflow without affecting shader throughput on PS5 due to async favoring it, available RAM and real world CPU throughput etc.
 

IDWhite

Member
RDNA2 already was doing a deep BVH.
ChipsandCheese noticed that it averaged a 7 level BVH structure, but sometimes going as much as 11 levels deep.
So I don't get why a BVH with 8 levels is anything special this time around.

RDNA 2 is only capable of doing 4 box tests or one triangle test per cycle. So anything beyond 4 boxes at a node level is checked on separate cycles. Apparently RDNA 4 can do twice RDNA 2's box tests or triangle tests per cycle, so that's why they say BVH8 instead of BVH4.

This only describes part of the RT process, because cache size and configuration have a big role, as well as memory bandwidth. And there are a lot of issues to solve, like divergence and latency.
 

winjer

Gold Member
RDNA 2 is only capable of doing 4 box tests or one triangle test per cycle. So anything beyond 4 boxes at a node level is checked on separate cycles. Apparently RDNA 4 can do twice RDNA 2's box tests or triangle tests per cycle, so that's why they say BVH8 instead of BVH4.

This only describes part of the RT process, because cache size and configuration have a big role, as well as memory bandwidth. And there are a lot of issues to solve, like divergence and latency.

BVH8 refers to the numbers of levels that the BVH structure has. Not to the amount of rays or tests that can be done.
 

IDWhite

Member
BVH8 refers to the numbers of levels that the BVH structure has. Not to the amount of rays or tests that can be done.

You are completely wrong. BVH structures can have multiple levels, not only 4 or 8. Here BVH4 and BVH8 refer to how "deep" the RT units can go in a single cycle.
 

raul3d

Member
BVH8 refers to the numbers of levels that the BVH structure has. Not to the amount of rays or tests that can be done.
No, I think it refers to the number of branches. BVH4 has 4 branches and checks all 4 per cycle. BVH8 doubles that to 8 and checks all 8. This also prevents the tree from becoming too deep.
 

winjer

Gold Member
You are completely wrong. BVH structures can have multiple levels, not only 4 or 8. Here BVH4 and BVH8 refer to how "deep" the RT units can go in a single cycle.

I never said it has only 4 or 8 levels.
Read my post before, and you will see what I mean.
 

winjer

Gold Member
No, I think it refers to the number of branches. BVH4 has 4 branches and checks all 4 per cycle. BVH8 doubles that to 8 and checks all 8. This also prevents the tree from becoming too deep.

It's levels, not branches.
And RDNA2 could already make a variable number of levels and branches.
 

FireFly

Member
Not to dismiss the software side of things, there are also hardware facets to consider for this phenomenon. Ray bounces are calculated faster on PS5 due to the clock differential, CUs/intersection engines have substantially less GPU L1 cache amount/bandwidth to work with on XSX (BVH pressure on cache to be noted), intersection engines might be a bit 'freer' to process their workflow without affecting shader throughput on PS5 due to async favoring it, available RAM and real world CPU throughput etc.
It's four ray/box intersections per CU per clock, or one ray/triangle intersection per CU per clock.

So given the CU difference, the PS5's clock speed advantage does not give it a theoretical intersection testing rate advantage.
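
To put rough numbers on that, here is a minimal sketch of the peak-rate arithmetic. The 4 box tests per CU per clock comes from the post above; the CU counts and clocks (PS5: 36 CUs at up to 2.23 GHz, XSX: 52 CUs at 1.825 GHz) are the publicly listed specs and are not stated in this thread, so treat them as assumptions. This is theoretical peak only and says nothing about cache, bandwidth or divergence.

```python
def peak_box_tests(cus: int, clock_ghz: float, tests_per_cu_per_clock: int = 4) -> float:
    """Theoretical peak ray/box tests, in billions per second."""
    return cus * clock_ghz * tests_per_cu_per_clock


# Assumed public specs, not taken from this thread.
ps5 = peak_box_tests(cus=36, clock_ghz=2.23)
xsx = peak_box_tests(cus=52, clock_ghz=1.825)

print(f"PS5: {ps5:.2f} G box tests/s")
print(f"XSX: {xsx:.2f} G box tests/s")
print(f"XSX / PS5: {xsx / ps5:.2f}x")
```

Under those assumptions the wider GPU comes out ahead on paper despite the lower clock, which is the point being made above.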
 

IDWhite

Member
No, I think it refers to the number of branches. BVH4 has 4 branches and checks all 4 per cycle. BVH8 doubles that to 8 and checks all 8. This also prevents the tree from becoming too deep.
Boxes
I never said it has only 4 or 8 levels.
Read my post before, and you will see what I mean.
You said literally "BVH8 refers to the numbers of levels that the BVH structure has"

It's levels, not branches.

No, they are boxes, not levels.
 

winjer

Gold Member
Boxes

You said literally "BVH8 refers to the numbers of levels that the BVH structure has"

Yes, BVH8 refers to the number of levels the BVH structure has.
But the reason why I was referring to that, is because of the news someone posted that the Pro would have support for BVH8.
But that is pointless and probably very wrong, because even RDNA2 could already do more levels.

No, they are boxes, not levels.

What is this nonsense? As if a BVH could only have such a low number of bounding boxes.
A BVH has hundreds, even thousands of bounding volumes.
 

raul3d

Member
It is the branch factor or degree of the tree (number of children). Each child is a box, so to me branches is the same as boxes. See this random internet page:
https://psychopath.io/post/2017_08_03_bvh4_without_simd

Traversing the BVH is a recursive algorithm. It does not make sense to artificially restrict the levels (height) of the tree. Every developer can decide that on their own. But having instructions that can traverse wider trees makes a lot of difference.
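
A minimal sketch of why a wider tree ends up shallower, assuming a perfectly balanced tree with one primitive per leaf (real BVHs are not this tidy, and `min_depth` is just an illustrative helper, not anything from an SDK):

```python
def min_depth(leaves: int, branching: int) -> int:
    """Smallest depth d such that branching**d >= leaves (balanced tree)."""
    depth, capacity = 0, 1
    while capacity < leaves:
        capacity *= branching
        depth += 1
    return depth


for branching in (2, 4, 8):
    print(f"BVH{branching}: {min_depth(1_000_000, branching)} levels for 1M leaves")
```

With a branching factor of 8 a ray visits fewer levels than with 4, but each visited node has more child boxes to test, so the win depends on the hardware being able to test all children of a node at once.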
 

IDWhite

Member
Yes, BVH8 refers to the number of levels the BVH structure has.
But the reason why I was referring to that, is because of the news someone posted that the Pro would have support for BVH8.
But that is pointless and probably very wrong, because even RDNA2 could already do more levels.



What is this nonsense? As if a BVH could only have such a low number of bounding boxes.
A BVH has hundreds, even thousands of bounding volumes.

They are referring to box test per cycle, that's it.

BVH structures can have multiple levels as well as multiple boxes in each level, but you can only test a limited number of those per cycle. So RDNA 2 can only test 4 and RDNA 4 only 8 per cycle. All the subsequent boxes are tested on different cycles.
 

winjer

Gold Member
They are referring to box test per cycle, that's it.

BVH structures can have multiple levels as well as multiple boxes in each level, but you can only test a limited number of those per cycle. So RDNA 2 can only test 4 and RDNA 4 only 8 per cycle. All the subsequent boxes are tested on different cycles.

BVH and box test are different stages in the RT pipeline.
You can't say it's BVH8, while claiming it refers to ray-tests.
That is the wrong terminology.
 

Lysandros

Member
It's four ray/box intersections per CU per clock, or one ray/triangle intersection per CU per clock.

So given the CU difference, the PS5's clock speed advantage does not give it a theoretical intersection testing rate advantage.
I am referring to bounces per ray cast here, this doesn't scale with the number of intersection engines but the clocks. Anyway, this is only one aspect of overall RT performance.
 

IDWhite

Member
BVH and box test are different stages in the RT pipeline.
You can't say it's BVH8, while claiming it refers to ray-tests.
That is the wrong terminology.

You are mixing concepts. Bounding volume hierarchy (BVH) is not a stage, it is the name of a data structure that resides in memory. And the box test is only one step of many in the RT pipeline.

Ray tests can be done on boxes and triangles, so when we say BVH4 or BVH8, it is a reference to ray tests on boxes. Ray tests on triangles are a completely different measure and nomenclature.
 

winjer

Gold Member
You are mixing concepts. Bounding volume hierarchy (BVH) is not a stage, it is the name of a data structure that resides in memory. And the box test is only one step of many in the RT pipeline.

Ray tests can be done on boxes and triangles, so when we say BVH4 or BVH8, it is a reference to ray tests on boxes. Ray tests on triangles are a completely different measure and nomenclature.

At this point you are just making stuff up.
 

FireFly

Member
I am referring to bounces per ray cast here, this doesn't scale with the number of intersection engines but the clocks. Anyway, this is only one aspect of overall RT performance.
Why would the capacity to generate new rays (at the point of bounce) not scale with the number of CUs, if this work is being done in a shader? It seems you're implying that ray generation is performed by some kind of dedicated unit in the GPU, separate from the CUs.
 

West Texas CEO

GAF's Nicest Lunch Thief and Nosiest Dildo Archeologist
Explain yourself, because this non-argument post accusing someone without explanation doesn't give you any truth.
Your assumptions can only be correct if the gpu works with quads, but as you well know, such is not the case.
 

SlimySnake

Flashless at the Golden Globes
I don't think you got the gist of my post. XSX is actually slightly outperforming PS5 in many of those UE5 games when it's compute / bandwidth limited. Interestingly PS5 is outperforming XSX when there is hardware RT involved, but that's another problem entirely (likely better tools on Sony SDK when there is hardware RT involved).

But when it's I/O limited (traversal stutters) suddenly PS5 is outperforming XSX in both UE5 and custom engines (like Avatar). Why isn't everyone suddenly doing 2 + 2 = 4 here? Why would PS5 have an impressive I/O advantage over XSX here? Mysterious or elephant in the room?
PS5 has traversal stutters too though. Star Wars. Dead Space. Callisto. I don't think that's IO limited either. I think it's just whichever version got the most polish/time. The PS4 Pro and X1X didn't have the IO block and they never had this many issues with traversal stutter like we are seeing this gen. This is on devs. Avatar had the traversal stutter in a cave. There is literally a tiny room you traverse and the framerate drops to the mid teens. At least the level they showed off here was in the middle of the jungle, but again, a random drop from a locked 40 fps to 30 fps is a stutter, not a GPU or CPU or IO issue. It's just poor optimization and, frankly, a bug.

Dead Space on the PC is a massive stutter fest to this day. They never fixed it. I run it on a 3080, which we are all hoping the PS5 Pro can match in performance. Doesn't mean anything. I have a 7.5 GBps SSD. I'm using a Gen 4 CPU, GPU and SSD in a Gen 4 motherboard. My DDR4 can go up to 100 GBps. Way more than the PS5 SSD's transfer rate of 9 GBps. Or even 22 GBps with Kraken compression like they said. Avatar has no issues whatsoever. Alan Wake is reading 2-3 GBps of data from my SSD every second. No issues. Meanwhile, Dead Space is virtually unplayable and Callisto and Star Wars stutter at the same exact place every time.
 

raul3d

Member
Your assumptions can only be correct if the gpu works with quads, but as you well know, such is not the case.
What is going on here? And when did quads come into the picture?

A BVH is a data structure that groups geometry into bounding boxes and organizes them hierarchically in a tree. Tracing against these bounding boxes (traversing the tree) is a lot simpler than tracing against triangles and allows you to quickly discard geometry that your ray will not hit.
 

IDWhite

Member
Your assumptions can only be correct if the gpu works with quads, but as you well know, such is not the case.
And what are my assumptions? Modern GPUs can do ray/box and ray/triangle tests, and that's it. I'm not giving deep explanations of the purpose of those functions. I'm only saying that BVH8 is a reference to box tests per cycle.

I'm not sure if you understand what a quad is, but you can use built-in triangles.
 

octos

Member
What is going on here? And when did quads come into the picture?

A BVH is a data structure that groups geometry into bounding boxes and organizes them hierarchically in a tree. Tracing against these bounding boxes (traversing the tree) is a lot simpler than tracing against triangles and allows you to quickly discard geometry that your ray will not hit.
Exactly. I've been making my own BVH structures for game engines.

Simplified explanation:
It's basically a "collision accelerator".
Imagine you want to know if 2 objects are hitting each other, but each object is made of thousands of polygons (triangles btw).
It would be very inefficient to check if for each polygon of object 1, there's an intersection with each polygon of object 2.
If we put a big box around object 1, and then a big box around object 2, then we can very quickly check if box1 intersects box2 (especially if those boxes are axis aligned => AABB = axis aligned bounding box), and if not, then we know there is no collision and don't need to check further.

Now with BVH, it's the same idea but each big bounding box is made of smaller bounding boxes, so we keep checking recursively.
Overall, instead of doing a million checks, we end up with something like a logarithm of that (complexity goes from O(n^2) to something like O(n*log(n))), which is obviously way faster.
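
The idea above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any engine or GPU actually does it: it builds a binary BVH with a naive median split, and the names `AABB`, `Node`, `build` and `query` are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class AABB:
    lo: Vec3
    hi: Vec3

    def overlaps(self, other: "AABB") -> bool:
        # Axis-aligned boxes overlap only if their extents overlap on every axis.
        return all(self.lo[i] <= other.hi[i] and other.lo[i] <= self.hi[i]
                   for i in range(3))


def union(a: AABB, b: AABB) -> AABB:
    lo = tuple(min(a.lo[i], b.lo[i]) for i in range(3))
    hi = tuple(max(a.hi[i], b.hi[i]) for i in range(3))
    return AABB(lo, hi)


@dataclass
class Node:
    box: AABB
    children: List["Node"] = field(default_factory=list)
    item: Optional[int] = None  # set only on leaves


def build(items: List[Tuple[AABB, int]]) -> Node:
    """Build a binary BVH from a non-empty list of (box, id) pairs."""
    if len(items) == 1:
        box, item = items[0]
        return Node(box=box, item=item)
    total = items[0][0]
    for box, _ in items[1:]:
        total = union(total, box)
    # Naive split: sort by centroid along the longest axis and cut in half.
    axis = max(range(3), key=lambda i: total.hi[i] - total.lo[i])
    ordered = sorted(items, key=lambda it: it[0].lo[axis] + it[0].hi[axis])
    mid = len(ordered) // 2
    return Node(box=total, children=[build(ordered[:mid]), build(ordered[mid:])])


def query(node: Node, probe: AABB, hits: List[int]) -> int:
    """Collect ids of leaves overlapping probe; return how many box tests ran."""
    tests = 1
    if not node.box.overlaps(probe):
        return tests  # whole subtree discarded with a single test
    if node.item is not None:
        hits.append(node.item)
        return tests
    for child in node.children:
        tests += query(child, probe, hits)
    return tests


# 64 unit boxes spaced along the x axis; box i spans x in [2i, 2i + 1].
boxes = [(AABB((2.0 * i, 0.0, 0.0), (2.0 * i + 1.0, 1.0, 1.0)), i) for i in range(64)]
root = build(boxes)
hits: List[int] = []
tests = query(root, AABB((4.2, 0.2, 0.2), (4.8, 0.8, 0.8)), hits)
print(hits, tests)
```

The probe only touches one of the 64 boxes, and the traversal finds it with far fewer box tests than checking all 64 one by one, because every failed test throws away an entire subtree.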
 

Fafalada

Fafracer forever
Now with BVH, it's the same idea but
Not just the same idea, collision acceleration structures used by every physics engine are literally the same thing: it's collision geometry bucketed into bounding boxes (usually around 5-10 tris in leaf nodes) with the corresponding hierarchy.

That's also why rigid body based collision meshes are almost exclusively non-deformable/static. It's too expensive to recompute the acceleration structure (though modern GPUs are now up to the task, the most widely used physics engines haven't caught up yet).
 

Perrott

Member
The Marmolade (who's written for VGC in the past) is claiming that, at least to his knowledge, the PS5 Pro reveal is planned for September, within the context of a showcase, and that a State of Play is likelier to be the format of the much-rumored event happening this month.
 

ChiefDada

Gold Member
Digital Foundry has new and VERY INTERESTING info about PS5 Pro GPU specs via DF Direct early access. I will not post their video or slide as I acknowledge they have to make a living, but the confirmed specs are below:

1. 30 WGPs = 60 Active CUs
2. Configuration: 2 SEs / 4 SAs (8-7 8-7)
3. 2.35 GHz Max Boost Clock
4. GL2 Cache = 4 MB (Same as PS5)
5. GL1 Cache = 256 KB (PS5 = 128 KB)
6. GL0V Cache = 32 KB (PS5 = 16 KB) "Sony Specifically says this increase is to allow for better RT performance"

I'm still watching and will update you guys asap.
 