
Xbox Velocity Architecture - 100 GB is instantly accessible by the developer through a custom hardware decompression block

ToadMan

Member
No one designs games around SSDs; they target the CPU, GPU, and RAM.

I'd say it's the opposite way round - designers have built their games around the HDD transfer limitations in the past. They've tried to hide these compromises behind gameplay devices - elevators, corridors, walking sections that are unskippable, and so on.

The PS5 SSD solution means developers DON'T have to design around the storage transfer speeds, because it is so fast. Whether anyone except Sony 1st party is in a position to actually take advantage of this newfound freedom is the question.
 

psorcerer

Banned
I'm referring specifically to your claim that DirectStorage on PC will use 64K blocks.

Link me to that please if you can.

That's my guess.
It should be bigger than 4K and closer to actual SSD block size (64-128K) and should work with SFS 64K.
 
Last edited:
That's my guess.
It should be bigger than 4K and closer to actual SSD block size (64-128K) and should work with SFS 64K.
Ok. It didn't sound to me like you were guessing. But that's fine. It's just that details on DirectStorage are scant and I'm patiently waiting to find out more information about it, so I thought if you had heard some specifics from somewhere, I'd like to see it.
 

psorcerer

Banned
Ok. It didn't sound to me like you were guessing. But that's fine. It's just that details on DirectStorage are scant and I'm patiently waiting to find out more information about it, so I thought if you had heard some specifics from somewhere, I'd like to see it.

Judging by "100GB accessible to game developers in Velocity" I would guess it's the same thing as DS.
I.e. part of your SSD gets allocated with different blocks and games load into that space. Otherwise there is no way to get the speeds even comparable to XBSX, not to mention PS5.
You can test and compare your SSD right now with 4K and 64K blocks random read to see what's the difference in sustained bandwidth.
 
Judging by "100GB accessible to game developers in Velocity" I would guess it's the same thing as DS.
I.e. part of your SSD gets allocated with different blocks and games load into that space. Otherwise there is no way to get the speeds even comparable to XBSX, not to mention PS5.
You can test and compare your SSD right now with 4K and 64K blocks random read to see what's the difference in sustained bandwidth.
64K blocks are definitely far more performant.

I wonder if they can partition the drive that way, or whether you'd have to dedicate an entire drive to it?
 

psorcerer

Banned
64K blocks are definitely far more performant.

I wonder if they can partition the drive that way, or whether you'd have to dedicate an entire drive to it?

I think they can just use a file with 64K block alignment. At least if you benchmark it like that, multithreaded, you can max out your SSD for random reads.
I.e. sequential read ≈ random read of 64K-aligned blocks
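
For anyone who wants to try that comparison themselves, here is a rough sketch of the kind of test being described (my own illustration, not a tool from this thread). It issues block-aligned random reads at 4K and then at 64K from a large test file across several threads and reports sustained bandwidth; the file name and thread count are made up, and for honest numbers you would want to defeat the OS page cache (use a file much larger than RAM, or a dedicated tool like fio with direct=1):

```python
# Rough sketch of the 4K-vs-64K random-read comparison (illustrative only).
import os, time, random, threading

PATH = "testfile.bin"          # hypothetical multi-GB file on the SSD under test
THREADS = 8                    # multithreaded, as suggested, to keep the drive's queues busy
READS_PER_THREAD = 2000

def worker(block_size, file_size, out, idx):
    with open(PATH, "rb", buffering=0) as f:
        total = 0
        blocks = file_size // block_size
        for _ in range(READS_PER_THREAD):
            f.seek(random.randrange(blocks) * block_size)   # block-aligned random offset
            total += len(f.read(block_size))
        out[idx] = total

def run(block_size):
    file_size = os.path.getsize(PATH)
    out = [0] * THREADS
    ts = [threading.Thread(target=worker, args=(block_size, file_size, out, i))
          for i in range(THREADS)]
    t0 = time.perf_counter()
    for t in ts: t.start()
    for t in ts: t.join()
    dt = time.perf_counter() - t0
    print(f"{block_size // 1024}K random reads: {sum(out) / dt / 1e6:.0f} MB/s sustained")

run(4 * 1024)    # 4K-aligned random reads
run(64 * 1024)   # 64K-aligned random reads: typically much closer to sequential speed
```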
 

Three

Member
So I wonder how the armchair "developers" will react to the SFS PR buzzword. SSD will save RAM and improve visuals or not, you resident doubters?

It's a bit dumb of a completely random twitter user to try and advertise SFS as "increasing SSD performance" though. That's like a broadband provider saying you can increase our broadband performance by only downloading the essentials. You can't 'increase' performance; the performance is defined and the same as before. You can use what it provides efficiently, but you can do that with any other provider too.
You're cherry-picking an engineer's comments; why don't you include these from him?? You're using a false equivalency of SFS and SF, yet SF is only one aspect of SFS in the XSX and will have custom hw not available in other SF implementations!! On top of that, it has extra hardware for more efficient texture streaming.




There is no custom hardware that gets you better performance and the SF that provides 75% saving is present everywhere. Like somebody having their game running at 30fps on a 2tf machine doing some optimisation to get 60fps then saying "wow I've essentially made this machine 4tf!".

If there is some special tech that provides the improvement then these sorts of comments would make sense, but there isn't. This stuff is all present elsewhere.
He only stated the custom texture filters for Sampler feedback and not the other custom hardware that comprises other features of SFS. It makes sense because we'll find out in due time. But clearly you have a preferred system and unless the PS5 has an equivalent system, the gap between the systems in terms of I/O will be diminished to a considerable extent. A 2-3x multiplier for the SSD and RAM by cutting the amount of texture data loaded into RAM is no joke.


What you're not getting is that the 2x efficiency (not performance in the spec) exists already on other GPUs in SF and PRT. So you are being oversold some nonsense. The extra added to 'SFS' is to improve visual pop-in on misses and you still have no idea if other GPUs have this feature or not.
 
Last edited:
I think they can just use a file with 64K block alignment. At least if you benchmark it like this multithreaded you can max out your SSD for random read.
I.e. seq read = rand read of 64K aligned
Yea duh, I guess that makes sense. But DirectStorage HAS to be more than just that on the PC.
 

Thirty7ven

Banned
Most games are not made to cater for SSDs or faster streaming of assets... you only have to look at PCs, which have had these for years; they target GPU/CPU specs.

You will probably also find most third party games will not cater to this either, which accounts for 90% of a console's library.

So what? Things change, that’s the fun of it. Tech evolves, new results, new benchmarks.

It’s exciting. And you shouldn’t let your bias for your favorite brand diminish that. The point of consoles is that they are affordable, if you bought a PS5, a XSX and a Switch with a Pro controller on top, that would be barely above an iPhone 11 Pro or a good gaming laptop.
 
Last edited:

psorcerer

Banned
Yea duh, I guess that makes sense. But DirectStorage HAS to be more than just that on the PC.

Dunno. What else could they do? Dedicate a whole SSD to DS? I don't think it will gain more perf. Still the same block oriented controller. Still the same hardware.
They could decompress on the GPU rather than the CPU. And support unswizzling there too.
But that depends on hw vendors that do not want to support swizzled formats and direct cmd buffer formats to this day. I don't think it will change. It took MSFT 3 versions of DX to get from 9 to 12. And here we are talking about a change of similar magnitude.
 

Ascend

Member
I'm referring specifically to your claim that DirectStorage on PC will use 64K blocks.

Link me to that please if you can.

A MinMip map is paired with a tiled texture.
The typical usage pattern is for each texel in the MinMip map to correspond to a mip region of the paired resource. As the paired resource is likely to be a reserved resource, it is likely that a 64KB-worth-of-data-sized mip region would be used, or a small multiple thereof.




64KB is not required to be used, but it seems like that size is recommended.

There's also, from that same link:
Mip region constraints
Each dimension of a mip region is

  • a power-of-two number
  • greater than or equal to 4
  • less than or equal to half of the dimension of the most detailed mip of the paired texture.
The most fine-grained a mip region is allowed to be is 4x4 (four texels by four texels) in the paired 2D texture.

For example, for a 32x32 paired texture, mip regions of 16x16, 8x8, or 4x4 are allowed. Going with this example, asymmetric mip regions like 16x8 or 4x8 are also allowed. As another example, for a 100x100 paired texture, mip regions of 32x32, 16x16, 8x8, and 4x4 are allowed.
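
To make the quoted constraints concrete, here is a tiny sketch (my own, assuming nothing beyond the rules quoted above; the function names are not from the DirectX headers) that enumerates the mip region sizes allowed for a given paired texture:

```python
# Enumerate mip region sizes permitted by the quoted constraints (illustrative only).
def allowed_region_dims(texture_dim: int) -> list[int]:
    """Power-of-two values >= 4 and <= texture_dim // 2."""
    dims, d = [], 4
    while d <= texture_dim // 2:
        dims.append(d)
        d *= 2
    return dims

def allowed_mip_regions(width: int, height: int):
    """All (w, h) mip regions permitted for a width x height paired texture."""
    return [(w, h) for w in allowed_region_dims(width) for h in allowed_region_dims(height)]

print(allowed_region_dims(32))    # [4, 8, 16]     -> 4x4 .. 16x16 for a 32x32 texture
print(allowed_region_dims(100))   # [4, 8, 16, 32] -> matches the 100x100 example above
print((16, 8) in allowed_mip_regions(32, 32))      # True: asymmetric regions are allowed
```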
 
Last edited:

rntongo

Banned
If there is some special tech that provides the improvement then these sorts of comments would make sense, but there isn't. This stuff is all present elsewhere.

What you're not getting is that the 2x efficiency (not performance in the spec) exists already on other GPUs in SF and PRT. So you are being oversold some nonsense. The extra added to 'SFS' is to improve visual pop-in on misses and you still have no idea if other GPUs have this feature or not.

You know this is a lie? I didn't read the rest of what you were saying when I saw this:
What you're not getting is that the 2x efficiency (not performance in the spec) exists already on other GPUs in SF and PRT.
 
It's a bit dumb of a completely random twitter user to try and advertise SFS as "increasing SSD performance" though. That's like a broadband provider saying you can increase our broadband performance by only downloading the essentials. You can't 'increase' performance; the performance is defined and the same as before. You can use what it provides efficiently, but you can do that with any other provider too.

There is no custom hardware that gets you better performance and the SF that provides 75% saving is present everywhere. Like somebody having their game running at 30fps on a 2tf machine doing some optimisation to get 60fps then saying "wow I've essentially made this machine 4tf!".

This is an exceptionally picky quibble in a discussion comparing performance. If loading the same textures takes up half the bandwidth and half the memory as on another system, how is that functionally different from having twice the hardware? If it gets you the performance, who cares whether it's higher specs or just more efficient hardware?

What you're not getting is that the 2x efficiency (not performance in the spec) exists already on other GPUs in SF and PRT. So you are being oversold some nonsense. The extra added to 'SFS' is to improve visual pop-in on misses and you still have no idea if other GPUs have this feature or not.

Unless I’ve missed some official info, you are making lots of assumptions here:

”2X efficiency... exists already on other GPUs”
”SFS is to improve visual pop-in on misses”

You don’t know these things. You’re just speculating from marketing materials like the rest of us.
 

Panajev2001a

GAF's Pleasant Genius
”2X efficiency... exists already on other GPUs”
”SFS is to improve visual pop-in on misses”

You don’t know these things. You’re just speculating from marketing materials like the rest of us

He is starting, like others, from what their own engineers have said and applying a modicum of logic, while on the other side there is hidden magic fairy dust that, on top of the custom HW texture blending mode SFS has on top of PRT+... oops, SF (the only element anyone could name), delivers 2-3x or more memory bandwidth and storage improvements over PS5 or RDNA cards... why? Don't know...
 

Panajev2001a

GAF's Pleasant Genius
You know this is a lie? I didn't read the rest of what you were saying when I saw this:
What you're not getting is that the 2x efficiency (not performance in the spec) exists already on other GPUs in SF and PRT.

Still peddling this 2-3x bandwidth multiplier that going from SF to SFS must be delivering, or else the SSD bandwidth that was supposedly worthless now matters too much and, magically, out of the woodwork you get the 2-3x performance you needed to close the gap... how convenient ;)...
 
Last edited:
Is there mention on if the demo needed to be scaled down when running on the laptop?

“Our goal is that the graphic quality like this demo, we want to make it run 60FPS at next-gen consoles. But now we do not reach the goal. Now it is 30FPS. Our target is 60FPS, that is also why we can not release it now. And I can assure you that we can run this demo in our notebook, in editor , not cooked, it even can 40FPS. (Afterwards someone in BBS confirmed that the device is RTX2080 and 970EVO)”

They were running it unbaked in the engine editor, not an optimised final product like what was shown, so what the laptop was running was likely more taxing.
 
Last edited:
He is starting, like others, from what their own engineers have said and applying a modicum of logic, while on the other side there is hidden magic fairy dust that, on top of the custom HW texture blending mode SFS has on top of PRT+... oops, SF (the only element anyone could name), delivers 2-3x or more memory bandwidth and storage improvements over PS5 or RDNA cards... why? Don't know...

That very well may end up being the case but I think these discussions become a lot less fun when people take their own assumptions as gospel.

The truth is we don’t have the full picture and probably won’t for a little while. Here’s what we know right now:

1) Microsoft chose to put a “2X to 3X” performance improvement claim in their product description tied to a feature that (in name) is a variation on a DX12/RDNA2 feature.

2) The PS5 may or may not have the full PC/Xbox RDNA2 feature-set, including Sampler Feedback.

To take those facts and conclude that the PS5 (and other PRT implementations) has the same or a very similar performance improvement is a big assumption. That’s like seeing Unreal’s Nanite demo and saying “well people have been using LODs for decades so it’s nothing new.”

Until we see how SFS/SF works in the field (or if you get to experiment with it yourself) we’re not really able to make these kinds of claims for one side or the other.
 

Panajev2001a

GAF's Pleasant Genius
That very well may end up being the case but I think these discussions become a lot less fun when people take their own assumptions as gospel.

The truth is we don’t have the full picture and probably won’t for a little while. Here’s what we know right now:

1) Microsoft chose to put a “2X to 3X” performance improvement claim in their product description tied to a feature that (in name) is a variation on a DX12/RDNA2 feature.

2) The PS5 may or may not have the full PC/Xbox RDNA2 feature-set, including Sampler Feedback.

SF makes the use of PRT easier/more efficient, and SFS a bit more so when dealing with streaming systems. SFS is an optimisation of SF for texture streaming, and SF is an optimisation for PRT and managing texture prefetching/paging without developer intervention. This is what MS engineers have stated, not PS5 fanboys.

One thing is denying that SFS is more efficient than SF and PRT, which would be wrong; another thing is assuming the 2-3x bandwidth multiplier was meant as a generalised improvement relative to games already using PRT efficiently. This has been stated as such by nobody except people suddenly interested in getting close to or above that 5.5 GB/s uncompressed / 9 GB/s compressed figure that up until last week was only useful for meaningless 1-2 s faster loading times.
 
I said this before, but I am not convinced that storage bandwidth should be weighted equally as CPU/GPU raw render ability in the overall rendering process

This right here. They are apples to oranges overall.

Also, you triggered me into going into a wall of text, but I got some examples for this stuff. Those with Wall-of-Text-phobia have been warned :goog_biggrin:

......................

(NOTE: I use LOQ as "Level of Quality"; I don't necessarily mean it the same way as LOD, or Level of Detail, although there's some overlap. LOD is, per Wikipedia: "involves decreasing the complexity of a 3D model representation as it moves away from the viewer or according to other metrics such as object importance, viewpoint-relative speed or position. Level of detail techniques increase the efficiency of rendering by decreasing the workload on graphics pipeline stages, usually vertex transformations."

I'm using Level of Quality/LOQ as a means of marrying that with visualized parameter boundaries specified in virtualized game-world metrics, such as feet (FT), and arbitrarily stating different LOQs for various ranges of feet (FT). Just so things are clear on the terminology that follows.)

Maybe if people pictured it like a 2D camera system it would make a bit more sense. Like, in a lot of 2D games, you actually have a camera system (of sorts), because there are still objects and visuals being calculated outside of the immediate viewing frame, just possibly at a lower rate.

The frame window is the GDDR6 RAM, and let's say the data outside of it is what comes in from the SSD. Let's split that up into a few rings. The immediate ring around the main frame window's perimeter is the Level 1 priority data, it's the data to immediately replace and/or swap assets in RAM (if it needs to; otherwise I don't see why the SSD can't simply stream this through to the GPU if there's a means to do so which I believe is the case for both PS5 and XSX). So say the frame window is a "L1 cache", then this first "ring" would be the L2 (I am NOT saying these things are actually analogous with system memory caches, just saying that if you wanted to give a point of comparison to better visualize things, you could structure a visualization similar to system memory caches)

Let's say there are several other "rings" (consider these rectangular rings) around that first ring; all of these represent data on the SSD being streamed in at different levels of priority. And let's say the "rings" have different levels of quality (I'll just call it LOQ). The immediate LOQ (the one with the active frame window, representing the RAM) is LOQ 0/zero, the highest quality. And let's just arbitrarily say the "rings" surrounding that have a linear ranking from 1 through 8, 8 being the lowest quality level.

The proportion of each of the 1-8 LOQs would change depending on a lot of different factors; not every game will have the same demands, and this is influenced by things such as design style (linear level-based, open-world, etc.), graphical styles/techniques, game design choices, implementation of game mechanics and physics systems, etc. By and large, let's just take the general raw speeds on both PS5 and XSX to define a ratio of 2.25:1. In other words, for all the rings 1 through 8, PS5's would be 125% larger, but if we're talking, say, wide open spaces of thousands of feet in span, and you get an average of 1000 FT per "ring", on XSX LOQ 1 would be 1000 FT while on PS5 it'd be 2250 FT, just as an example. And this is assuming that the sizes/proportions per "ring" would stay the same/be even across the board (i.e. the sizes are a fixed constant of the same value for each LOQ).

The question would then become, at what "ring" level does depreciating LOQ ratings not have any perceivable impact on a player's experience? For example, if in the aforementioned example PS5's LOQ 1 "ring" is 2250 FT and XSX's LOQ 1 "ring" is 1000 FT, how much does that actually impact the player's experience in terms of immersion? Because let's also keep in mind, the LOQ 0/zero "ring", which is the one the RAM space occupies, that's the immediate visual data directly around the player (measured in terms of virtualized game real estate), and if we assume a 1000 FT LOQ ring is equivalent to, say, 10 GB of physical data on the SSD drive, then we can assume LOQ 0/zero can provide, let's say, 250 FT of the highest-level asset quality (I reduced it from 1000 FT because what's in RAM I would assume is also being run through intense physics and interactive simulations by the player, and also various other scripting logic, character model assets, enemy models, etc.).
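
To put rough numbers on that picture (using only this post's own arbitrary assumptions: the 2.25:1 raw-speed ratio, ~10 GB of on-disk data per 1000 FT ring, and 250 FT of top-quality assets resident in RAM), a quick back-of-the-envelope sketch:

```python
# Toy numbers only, taken from the assumptions in this post. None of this is real platform data.
GB_PER_1000_FT = 10.0
LOQ0_FT = 250                                # assumed "in RAM" radius around the player
XSX_LOQ1_FT = 1000
PS5_LOQ1_FT = round(XSX_LOQ1_FT * 2.25)      # 2250 FT, scaled by the 2.25:1 raw-speed ratio

def footprint_gb(ft: float) -> float:
    """Approximate on-disk footprint of a ring spanning `ft` feet."""
    return ft / 1000.0 * GB_PER_1000_FT

for name, loq1_ft in (("XSX", XSX_LOQ1_FT), ("PS5", PS5_LOQ1_FT)):
    print(f"{name}: top-quality assets out to ~{LOQ0_FT + loq1_ft} FT, "
          f"LOQ 1 ring alone ~{footprint_gb(loq1_ft):.1f} GB on disk")
# XSX: top-quality assets out to ~1250 FT, LOQ 1 ring alone ~10.0 GB on disk
# PS5: top-quality assets out to ~2500 FT, LOQ 1 ring alone ~22.5 GB on disk
```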

So now let's ask ourselves, is there really a perceivable (as in, during actual gameplay, seen by the average player) gulf in quality if one system can produce highest-quality and extremely high-quality asset texture quality in a virtualized game space of, say, 1250 FT at any one time (XSX), vs. 2500 FT at any one time (PS5), if specifying a cutoff at LOQ ring 1, and assuming this is an actual game with true logic and physics, AI, and enemies happening on-screen? Will the average gamer be able to tell some tree at a distance of 1250 FT from them has any glaring difference in visual texture quality than a tree literally a foot in front of them?

Chances are, probably not. Firstly, because what average gamer is going to be paying attention to that tree far off in the distance? More importantly, if that happens to be the case, you then have to ask: is it even worth having a high-quality asset texture model upwards of even 2500 feet away from the player if they will not be viewing said model at the given moment? Maybe you can get away with a lower-quality version of that model at that distance after all, and there is no perceivable impact to the visual immersive experience for the average player, considering most real-world gameplay scenarios. It does require a bit more work from the visual artists, I suppose, but I figure advancing toolsets would automate much of that type of stuff for them regardless.

If that happens to be the case, a developer can basically go "well, we really don't need high-quality texture assets for these objects at (n) distance, so let's make LOQs at 4 and lower (as in, 4, 5, 6, 7, and 8) 100 FT instead of 1000 FT." Okay, cool. Now that frees up a bigger range for high-quality texture assets in LOQs 1, 2, and 3. Now if that means LOQ 1 goes from covering 1000 FT to, say, 2250 FT, for the XSX that would effectively put its LOQ 1 at PS5's default LOQ 1, though it means the XSX had to drop the range of lower LOQ rings notably to do so.

The thing is, in many games (in fact I'd say the vast majority), this would be perfectly fine, because again just how far off into the distance are we really expecting MOST gamers to painstakingly pay attention to when it comes to details? More importantly, if you're familiar with classical painting techniques, you'll know that you can actually hint/indicate an impression of detail without actually providing much or even any detail in a given area. This is usually done by utilizing smart color pairings and value contrasts; in fact that's exactly what we saw in the UE5 demo in areas such as on the girl character's skin with the light coming down as she crawled through a cave passage.

Now, everything I just mentioned perfectly applies to PS5 as well and say, for example, a dev could lower the range on the lower LOQ "rings" to bump up the range on the first upper ones. So let's say it's by the same rate as the XSX example; that could then give the PS5's LOQ 1 "ring" a range of 3500 FT instead of 2250 FT. Again, for certain types of games and some real-world gameplay scenarios this can be beneficial, but there's always a point of diminishing returns because by and large players will focus on the details immediately in front of them and immediately around their avatar.

So I can actually also switch over to talking about this a bit in terms of simply increasing the level of detail on immediate (LOQ 0, LOQ 1) texture assets, as well. Because like I just said, not every game needs intense texture detail on objects super far-away (and in fact, sometimes you can get very comparable or superior results with smart utilization of various visual effects and low-quality model assets to give impressions of details that work just as well, and save a ton on the data streaming pipeline). Again, picture painting techniques from the Old Masters ;)

Now, the issue with this is, at some point you're going to hit a wall in terms of the limit you can push the texture asset quality being streamed in vs. what the system can ACTUALLY output in terms of resolution to the display device, because while the former is reliant on the SSD, the latter is reliant on the GPU (and if you want to go a step further, the fidelity of those two would be reliant on the CPU).
What's the point of trying to pump out a 16K texture asset to stream from storage when the game resolution is going to be "only" 4K (and that's just in some cases; we're probably looking closer at 1440p - 2160p for a good deal of next-gen games, tho I'd like 4K60 to be more of a standard personally)? The perceived level of difference to the average player between an 8K version of that texture and a 16K version of the same texture at an output resolution of barely 4K on a 4K screen, will likely be imperceptible. After all, if most people already have a hard time telling much of any immersion-breaking (or even immersion-irritating) difference between things like RE3 Remake on PS4 Pro vs. RE3 Remake on the One X, that argument should perfectly carry over to native texture resolution asset streaming that reaches a point where the returns rapidly diminish on most standard modern 4K displays, right?

Again this is where I'd like to focus on things like DLSS techniques for a moment, because at least IMO, if you had the choice between streaming through extremely large, super-high resolution 16K or even some 8K texture assets from storage into memory, or from storage through the GPU via some implementation of an AMD SSG-type setup (which I believe both systems are doing TBH), versus rolling with a lower-quality texture you can simply upscale to a higher resolution that appears pretty much like it were a native resolution anyway...I would think most would choose the latter. After all, it saves on game disc and game install space, file size, and that frees up the SSD data pipeline for moving through more unique data overall. In particular for artists who like to use some notable degree of programming to create their art, the latter approach would probably be their preferred one (granted it's not the only approach, of course).
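
For a feel of the disc and streaming cost being traded away here, a quick footprint estimate (my own numbers, assuming a block-compressed format at roughly 1 byte per texel, such as BC7, with a full mip chain):

```python
# Rough footprint math behind the "why stream a 16K texture for a 4K output" point.
# Illustrative only; not from any console spec.
def texture_mib(size: int, bytes_per_texel: float = 1.0, with_mips: bool = True) -> float:
    base = size * size * bytes_per_texel
    if with_mips:
        base *= 4 / 3            # a full mip chain adds roughly one third on top
    return base / (1024 ** 2)

for size in (4096, 8192, 16384):
    print(f"{size // 1024}K texture: ~{texture_mib(size):.0f} MiB")
# 4K: ~21 MiB, 8K: ~85 MiB, 16K: ~341 MiB -- each doubling of resolution quadruples
# the data that has to sit on disc and travel through the I/O pipeline
```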

I thought that was gonna be a shorter explanation than it turned out to be :p, goes to show how complicated this stuff is. It may not be a perfect explanation (is anything ever, truly?), but I had an interest in trying to visualize a breakdown of how PS5 and XSX's SSDs could operate on a basic level, with some practical examples, and considering the majority of real-time, general gameplay use-cases and gamer habits. And also, to show how you don't always need "more" to accomplish "more", depending on a variety of other factors. Maybe I'd like to elaborate on this in the future sometime, but I'm gonna wait until we get more system info from MS and Sony before doing that, so it's definitely gonna be well into August at the earliest.
 
Last edited:

geordiemp

Member
That very well may end up being the case but I think these discussions become a lot less fun when people take their own assumptions as gospel.

The truth is we don’t have the full picture and probably won’t for a little while. Here’s what we know right now:

1) Microsoft chose to put a “2X to 3X” performance improvement claim in their product description tied to a feature that (in name) is a variation on a DX12/RDNA2 feature.

2) The PS5 may or may not have the full PC/Xbox RDNA2 feature-set, including Sampler Feedback.

To take those facts and conclude that the PS5 (and other PRT implementations) has the same or a very similar performance improvement is a big assumption. That’s like seeing Unreal’s Nanite demo and saying “well people have been using LODs for decades so it’s nothing new.”

Until we see how SFS/SF works in the field (or if you get to experiment with it yourself) we’re not really able to make these kinds of claims for one side or the other.

I agree with you, there is something not right when the State of Decay loading in the MS demo from the Xbox studio resume took 11 seconds for what is, at most, 5 GB. That is not the 2.4 GB/s spec that is advertised, so something is slowing down the real-world performance.

If the new API / method gets MS back to the 2.4 GB/s raw then that's great for everyone, and if MS say it's faster it will be; the question is, faster than what baseline?
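
For reference, the arithmetic behind that concern, using the post's own guesses for data size and load time (neither figure is an official measurement):

```python
# Quick sanity check of the numbers above. Both inputs are the post's own guesses.
data_gb = 5.0        # assumed upper bound on data the State of Decay 2 load reads
load_s = 11.0        # load time quoted in the post
spec_gb_s = 2.4      # XSX raw SSD spec

effective = data_gb / load_s
print(f"effective ~{effective:.2f} GB/s vs {spec_gb_s} GB/s raw "
      f"({effective / spec_gb_s:.0%} of spec)")
# effective ~0.45 GB/s vs 2.4 GB/s raw (19% of spec)
```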
 
One thing is denying that SFS is more efficient than SF and PRT, which would be wrong; another thing is assuming the 2-3x bandwidth multiplier was meant as a generalised improvement relative to games already using PRT efficiently...

PRT, like any other technique, has its limitations, so is it really so impossible that SFS provides a 2-3X efficiency improvement on top of that? Without SF, the software is largely blind to which MIP, or which portion of a MIP, is actually needed. Maybe this extra info is actually that much more valuable compared to the approximations used in existing PRT techniques. I think it's actually quite possible that SF offers this much improvement. I'm not certain or anything, but I'm certainly not going to scoff at the idea.
 
“Our goal is that the graphic quality like this demo, we want to make it run 60FPS at next-gen consoles. But now we do not reach the goal. Now it is 30FPS. Our target is 60FPS, that is also why we can not release it now. And I can assure you that we can run this demo in our notebook, in editor , not cooked, it even can 40FPS. (Afterwards someone in BBS confirmed that the device is RTX2080 and 970EVO)”

They were running it unbaked in the engine editor, not an optimised final product like what was shown, so what the laptop was running was likely more taxing.

But more taxing of what, exactly?

Keep in mind, as you say, the APIs for the new consoles are not finished, nor are the SDKs nearly as mature as what would be on the laptop.

It does mean two things though. First, that the demo isn't using the PS5's full throughput, only that of the 970 EVO at most, and that from what we have heard the frame rate is locked at 30... so currently it may run closer to 40, but we don't know how close. And second, that it's within firing distance - and they EXPECT it to improve.

Sounds to me like something developed for years on Windows PC was now moving to the PS5, not fully utilizing its I/O, with a small framerate drop or a very similar framerate (not yet able to lock it to 60, but also only 40 on a 2080 mobile, a very competent GPU really, all told). This also means that because it can hit 40fps on a 970 EVO and, assuming it's at the same resolution (any confirmation of that?), the I/O isn't a bottleneck. So there should be no reason to think the PS5 would have an advantage here based on what's shown - but we don't know about total potential, only this demo, based on the info about the laptop they sort of "provided".

TLDR: An unfinished API/"drivers" (RDNA2 isn't out, so I am sure there will be more learning/tweaking/API updates from all sides) on a dev-kit-level system running an unfinished engine, not fully utilizing the bandwidth of the system, is essentially keeping up with a 2080 mobile with a fully established API and drivers on its original target platform.

Honestly, for me, that's pretty good.

We can play a game and say, assuming this is as good as performance would ever get: let's pretend it isn't a locked 30FPS and the PS5 is maxed, let's imagine the PS5 is pushing EXACTLY 30fps and that's the most it can do. It means the best you'll see from the XSX would be, let's say, 35 - or, along with the CPU being slightly higher, high 30s locked. (This is going on floating point ops. I have zero info on whether the GPU clock should matter, but I do not think it's GPU-clock dependent; the 2080 mobile is NOT a high-clocked chip, I think 1500 boost... mobile chip, of course it is.)

But we know it's early and that won't be the case. And we know both consoles will, once optimized for, be a lot nicer than a 2080 mobile. For one, BOTH of the new systems have a much better clock frequency and are in a much better cooling position, not to mention the bandwidth will be higher with less OS overhead.

I can't wait, man, this forum is going to be a mess when actual games come out. Remember the XBone/PS4 launch? My god, that was nuts. ALL THE NITPICKING over GRASS and stuff... with a 30 percent GPU difference. With half that, we're in for some interesting "comparisons" and arguments about "optimized" and "lazy developers" and stuff.

I am going to say, I like to think my vision is still sharp, but with all the desk work I do now that I'm not a field engineer, I am going to have a hard time telling the difference... or so I imagine.
 

Ascend

Member
SF makes the use of PRT easier/more efficient, and SFS a bit more so when dealing with streaming systems. SFS is an optimisation of SF for texture streaming, and SF is an optimisation for PRT and managing texture prefetching/paging without developer intervention. This is what MS engineers have stated, not PS5 fanboys.

One thing is denying that SFS is more efficient than SF and PRT, which would be wrong; another thing is assuming the 2-3x bandwidth multiplier was meant as a generalised improvement relative to games already using PRT efficiently. This has been stated as such by nobody except people suddenly interested in getting close to or above that 5.5 GB/s uncompressed / 9 GB/s compressed figure that up until last week was only useful for meaningless 1-2 s faster loading times.
I hadn't even heard of the 2x - 3x bandwidth multiplier before three days ago. I have said in the past that people are ignoring SFS but that it seems to be a great feature to help with loading/streaming. I didn't bother researching it until recently, because I didn't think there was any info on it out there. Apparently there is, and I don't see the 3x reduction in bandwidth usage as impossible.

The thing is, it is not guaranteed, because if you're up close to an object and that object is all you see, you will not be able to avoid loading the highest quality of the textures nor the full texture, which means SFS will basically give zero advantage in such a case, since there is nothing to 'discard', or rather avoid loading to RAM. But for far away objects that have extremely detailed textures, SFS will likely reduce the required bandwidth by quite a lot.

The PS5 will have its 8-9GB/s at all times, while the benefit of SFS is sort of situational, although calling it situational kind of downplays its capability a bit, since in the majority of cases/games, you won't be hugging walls constantly.
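
To illustrate that 'situational' point with toy numbers (the fractions below are invented for the sake of the example, not measurements of SFS or any engine):

```python
# Toy model: feedback-driven streaming only helps to the extent that a fraction of
# each texture's tiles/mip levels is actually needed for the current view.
def effective_multiplier(fraction_needed: float) -> float:
    """How much further a fixed SSD/RAM budget stretches if only `fraction_needed`
    of each texture is loaded instead of the whole thing."""
    return 1.0 / fraction_needed

scenarios = [
    ("hugging a wall (whole texture visible at full mip)", 1.00),
    ("typical mixed scene", 0.40),
    ("distant / grazing-angle objects", 0.25),
]
for label, frac in scenarios:
    print(f"{label}: ~{effective_multiplier(frac):.1f}x effective bandwidth/RAM")
# 1.0x, 2.5x, 4.0x -- i.e. a 2-3x average figure is plausible but never guaranteed
```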

I think I explained SFS quite well in this post;
 
Last edited:

rntongo

Banned
I agree with you, there is something not right when the State of Decay loading in the MS demo from the Xbox studio resume took 11 seconds for what is, at most, 5 GB. That is not the 2.4 GB/s spec that is advertised, so something is slowing down the real-world performance.

If the new API / method gets MS back to the 2.4 GB/s raw then that's great for everyone, and if MS say it's faster it will be; the question is, faster than what baseline?

We haven't seen any system use the XVA's hardware decompression nor the DirectStorage API. The SOD2 and Gears of War 5 demos show a consistent 4-5x improvement, even with bottlenecks, by just using the SSD. What's so hard to understand? The bottleneck could have been the XSX CPU having to do the decompression.

Here's a quote from The Coalition on top of what's been said by DF:
"With the Xbox Series X, out of the gate, we reduced our load-times by more than 4x without any code changes. With the new DirectStorage APIs and new hardware decompression, we can further improve I/O performance and reduce CPU overhead, both of which are essential to achieve fast loading."

Stop trying to frame it like the loading demos we've seen so far are what the actual performance will be. Digital Foundry clearly stated that the only part of the XVA that was being used was the SSD. So no SFS, no DirectStorage API and its use of the hardware decompression block. I think MSFT made a major mistake showing those loading demos, because some people are going to try and frame them as the final console performance.
 

rntongo

Banned
Just throwing this out there...

SF =/= SFS
PRT = primarily GPU & RAM work.

Honestly at this point it's intentional that they are ignoring this. They understand that SF on the XSX is going to use custom and exclusive hardware and that it's just part of SFS hardware in the GPU. They just want to shoot it down but it's absurd because it's like saying the 12 channels in the PS5 SSD don't do anything new.
 
Stop trying to frame it like the loading demos we've seen so far are what the actual performance will be. Digital Foundry clearly stated that the only part of the XVA that was being used was the SSD. So no SFS, no DirectStorage API and its use of the hardware decompression block. I think MSFT made a major mistake showing those loading demos, because some people are going to try and frame them as the final console performance.

There's no other logical conclusion to draw. Of course it's not the best it's going to be - that's a given. But when you DEMONSTRATE your machines and that's what you wheel out - that's what people think you're showcasing.

On one hand it won't matter much. The average consumer doesn't care about a loading-time video half a year before launch. On the other hand - if the video matters so little, why show something if you can't show something competitive?

From a curiosity standpoint, from a standpoint of INTEREST, the presentations and what's been shown so far have been good... From a "reveal" standpoint they've been very... mixed.
 

rntongo

Banned
That very well may end up being the case but I think these discussions become a lot less fun when people take their own assumptions as gospel.

The truth is we don’t have the full picture and probably won’t for a little while. Here’s what we know right now:

1) Microsoft chose to put a “2X to 3X” performance improvement claim in their product description tied to a feature that (in name) is a variation on a DX12/RDNA2 feature.

2) The PS5 may or may not have the full PC/Xbox RDNA2 feature-set, including Sampler Feedback.

To take those facts and conclude that the PS5 (and other PRT implementations) has the same or a very similar performance improvement is a big assumption. That’s like seeing Unreal’s Nanite demo and saying “well people have been using LODs for decades so it’s nothing new.”

Until we see how SFS/SF works in the field (or if you get to experiment with it yourself) we’re not really able to make these kinds of claims for one side or the other.

You summarised it well. MSFT has made a claim and they will have to prove it in real-world performance. On the other hand, as you said, some are taking the facts and concluding that the PS5 will have the same thing. That's absurd. It's like saying the XSX SSD has 6 priority levels in its SSD controller just because Cerny said the same for the PS5.

Basically they are the PS5 fan versions of "PCs have had SSDs since the early 2010s".
 

geordiemp

Member
We haven't seen any system use the XVA's hardware decompression nor the DirectStorage API. The SOD2 and Gears of War 5 demos show a consistent 4-5x improvement, even with bottlenecks, by just using the SSD. What's so hard to understand? The bottleneck could have been the XSX CPU having to do the decompression.

Here's a quote from The Coalition on top of what's been said by DF:
"With the Xbox Series X, out of the gate, we reduced our load-times by more than 4x without any code changes. With the new DirectStorage APIs and new hardware decompression, we can further improve I/O performance and reduce CPU overhead, both of which are essential to achieve fast loading."

Stop trying to frame it like the loading demos we've seen so far are what the actual performance will be. Digital Foundry clearly stated that the only part of the XVA that was being used was the SSD. So no SFS, no DirectStorage API and its use of the hardware decompression block. I think MSFT made a major mistake showing those loading demos, because some people are going to try and frame them as the final console performance.

It does not need hardware decompression to hit 2.4 GB/s raw.

You understand what raw means, don't you? It means uncompressed speed. If it's BCPack-compressed textures it should be over 4, right?

And you think a Zen 2 CPU with 16 threads at 3.6 GHz would slow down loading that much even if it had to help with decompression. Really? LOL

And this demo was shown on 16 March 2020, and you're telling me the system is not ready yet. Mmmm, OK.

Yes, MS did make a mistake showing really slow performance in March 2020 when Sony showed 1-second loading a whole year earlier in May 2019.

Maybe MS are working hard at it; seems a bit late, release is 5 months away, it should all be done by now.
 
Last edited:

rntongo

Banned
There's no other logical conclusion to draw. Of course it's not the best it's going to be - that's a given. But when you DEMONSTRATE your machines and that's what you wheel out - that's what people think you're showcasing.

On one hand it won't matter much. The average consumer doesn't care about a loading-time video half a year before launch. On the other hand - if the video matters so little, why show something if you can't show something competitive?

From a curiosity standpoint, from a standpoint of INTEREST, the presentations and what's been shown so far have been good... From a "reveal" standpoint they've been very... mixed.

I think that's where MSFT made a mistake. They should have been very clear about the limitations of the demos instead of telling a few people and leaving it to devs and journalists to explain.
 

rntongo

Banned
It does not need hardware decompression to hit 2.4 GB/s raw.

You understand what raw means, don't you? It means uncompressed speed. If it's BCPack-compressed textures it should be over 4, right?

And you think a Zen 2 CPU with 16 threads at 3.6 GHz would slow down loading even if it had to help with decompression. Really? LOL

And this demo was shown on 16 March, and you're telling me the system is not ready yet. Mmmm, OK.

Yes, MS did make a mistake showing really slow performance in March 2020 when Sony showed 1-second loading a whole year earlier in May 2019.

Maybe MS are working hard at it; seems a bit late, release is 5 months away, it should all be done by now.

Smdh!
If a game doesn't have access to the File I/O APIs it doesn't matter even if it was running on the PS5 SSD, it still would be bottlenecked. The hardware/software to eliminate the bottlenecks between the SSD and game code are not yet ready for use otherwise the devs would have had access to them. It's that simple.

Even on the PS5, games that were developed for the PS4 will have to change the file I/O code to utilize the SSD. If they do not, they have to use the CPU for decompression etc and you'll likely see a 10x improvement in loading and not the 100x being claimed by Sony. That's why the Coalition dev stated that once they have access to the APIs they will be able to reduce CPU overhead.

"
With the Xbox Series X, out of the gate, we reduced our load-times by more than 4x without any code changes. With the new DirectStorage APIs and new hardware decompression, we can further improve I/O performance and reduce CPU overhead, both of which are essential to achieve fast loading."
 

rntongo

Banned
This is an exceptionally picky quibble in a discussion comparing performance. If loading the same textures takes up half the bandwidth and half the memory as on another system, how is that functionally different from having twice the hardware? If it gets you the performance, who cares whether it's higher specs or just more efficient hardware?



Unless I’ve missed some official info, you are making lots of assumptions here:

”2X efficiency... exists already on other GPUs”
”SFS is to improve visual pop-in on misses”

You don’t know these things. You’re just speculating from marketing materials like the rest of us.

Panajev2001a was just lying when he said:

”2X efficiency... exists already on other GPUs”
”SFS is to improve visual pop-in on misses”

He thought we wouldn't realize.
 

Three

Member
You know this is a lie? I didn't read the rest of what you were saying when I saw this:
What you're not getting is that the 2x efficiency (not performance in the spec) exists already on other GPUs in SF and PRT.
So tell me then, how does it do this? Can you show me what tech there is in SFS over PRT that results in a 2-3x performance boost? How does it do this exactly?

The only people taking things as gospel are those not understanding or not asking how SFS would gain that 3x performance. The answer is it won't. The stated figures are in comparison to loading the whole texture. You've been able to do PRT since the PS2. Show how you can cut the streamed textures in half in comparison to PRT.
 

rntongo

Banned
So tell me then, how does it do this? Can you show me what tech there is in SFS over PRT that results in a 2-3x performance boost? How does it do this exactly?

The only people taking things as gospel are those not understanding or not asking how SFS would gain that 3x performance. The answer is it won't. The stated figures are in comparison to loading the whole texture. You've been able to do PRT since the PS2. Show how you can cut the streamed textures in half in comparison to PRT.

The MSFT engineer that offered details on SFS only spoke about SF as being one part of the hardware, and that even though it (SF) is a part of DX12U, the XSX will have custom hardware as well for SF. He said he cannot reveal much right now. Don't ask me what the other hardware is; just wait for more information on it.
 
Thank you, I just don't want to be lied to.

You and I both xD (and I would say the same for a lot of us here).

Ah, as for when we can expect more info on stuff like SFS, best bet is the Hot Chips presentation in...August. On the 17th. So literally three months from now at the latest (hopefully).

Kind of a long wait, especially considering the stuff going on nowadays, but it ought to be worth it. We've still got other stuff to look forward to as well, even if it's not going to be quite as tech-focused, like the PS5 event and the Xbox July showcase.
 
Last edited:

geordiemp

Member
Smdh!
If a game doesn't have access to the File I/O APIs it doesn't matter even if it was running on the PS5 SSD, it still would be bottlenecked. The hardware/software to eliminate the bottlenecks between the SSD and game code are not yet ready for use otherwise the devs would have had access to them. It's that simple.

Even on the PS5, games that were developed for the PS4 will have to change the file I/O code to utilize the SSD. If they do not, they have to use the CPU for decompression etc and you'll likely see a 10x improvement in loading and not the 100x being claimed by Sony. That's why the Coalition dev stated that once they have access to the APIs they will be able to reduce CPU overhead.

"
With the Xbox Series X, out of the gate, we reduced our load-times by more than 4x without any code changes. With the new DirectStorage APIs and new hardware decompression, we can further improve I/O performance and reduce CPU overhead, both of which are essential to achieve fast loading."

Well, let's hope you are right; we need both consoles to be able to stream in sub-frame timescales easily for 3rd party devs to move to engines that use high-end assets. That UE5 demo at least taught everyone that this week.

Still surprised MS are still working on this; it's due to release in 5-6 months and we expect SOME true next-gen games to be ready.
 
Last edited:
I agree with you, there is something not right when the State of Decay loading in the MS demo from the Xbox studio resume took 11 seconds for what is, at most, 5 GB. That is not the 2.4 GB/s spec that is advertised, so something is slowing down the real-world performance.

If the new API / method gets MS back to the 2.4 GB/s raw then that's great for everyone, and if MS say it's faster it will be; the question is, faster than what baseline?


Maybe there's some sort of bottleneck along the path that stops the SSD from reaching its peak?

The real question is whether it's a hardware issue or a software one.

If it's software I can definitely see it getting fixed. But if it's hardware, that's more difficult to do.
 
Last edited:
Still surprised MS are still working on this; it's due to release in 5-6 months and we expect SOME true next-gen games to be ready.

It is a little bit weird since it's such a huge feature for next gen and you would think it would be ready before the system launches so developers can optimize for it.

Microsoft showed off the retail unit a long time ago so you would think everything has been finalized with the hardware. Hopefully it's a software limitation that can be fixed.

We really need both systems to have really good I/O systems even if one ends up slightly slower.
 

rntongo

Banned
Well, let's hope you are right; we need both consoles to be able to stream in sub-frame timescales easily for 3rd party devs to move to engines that use high-end assets. That UE5 demo at least taught everyone that this week.

Still surprised MS are still working on this; it's due to release in 5-6 months and we expect SOME true next-gen games to be ready.

Honestly I'm surprised as well, but it seems MSFT is really putting a lot of weight into efficient texture compression and streaming. The last tweet I read, they were still working on the texture compression (Xbox Texture Compression); this was like a month or two ago. The hardware has been finalized for the longest time, so it must be related to the software APIs. I was honestly impressed with the PS5 SSD and all the custom hardware they built to eliminate bottlenecks, so I look forward to hearing more about the XSX hardware.
 

Three

Member
The MSFT engineer that offered details on SFS only spoke about SF as being one part of the hardware, and that even though it (SF) is a part of DX12U, the XSX will have custom hardware as well for SF. He said he cannot reveal much right now. Don't ask me what the other hardware is; just wait for more information on it.
The hardware he mentioned was for SFS. The efficiency saving was for SF (compared to loading the whole texture). The 2x-3x was from another random person. He didn't give the details.
 


Maybe there's some sort of bottleneck along the path that stops the SSD from reaching its peak?

The real question is whether it's a hardware issue or a software one.

If it's software I can definitely see it getting fixed. But if it's hardware, that's more difficult to do.

geordiemp is citing inaccurate load times on the SOD2 load; I rewatched that recently and counted 6.5 seconds raw load. That's a little over the provided 2.4 GB/s spec MS mentioned.

I'm sure they are still optimizing parts of their I/O, same as Sony, but that demo shouldn't be a cause for concern in all honesty, because the 11 second figure isn't reflective of what that demo actually performed at, not even close.

Well, let's hope you are right; we need both consoles to be able to stream in sub-frame timescales easily for 3rd party devs to move to engines that use high-end assets. That UE5 demo at least taught everyone that this week.

Still surprised MS are still working on this; it's due to release in 5-6 months and we expect SOME true next-gen games to be ready.

If it's mainly software-related API fixes and implementations, I don't see why it couldn't be further worked on even if the hardware side is mostly finalized. Sony is very much doing the same; optimizing the software stack even if hardware is pretty much finished and ready.

IIRC both systems are entering the wide-scale manufacturing phase very soon, so hardware work on them is practically completed.
 
Random read latency, seek latency (RAM lookup is not cheap), Jaguar decompressing stuff, probably just a single thread.

I think the issue here is that we are assuming that both eliminated all bottlenecks, or the same number of them.

It could be possible that one eliminated more bottlenecks than the other. Which means that one I/O system is probably more efficient. The real question is which one?

I personally believe that it's probably Sony, due to how extensively they talked about their I/O system and the elimination of bottlenecks. While I'm pretty sure Microsoft took steps to eliminate them, I don't think they did as good a job as Sony did. Otherwise they would have talked about their I/O system more in depth like Mark did.
 

psorcerer

Banned
The real question is which one?

Obviously Sony's one.
If we assume that everything mentioned in Sony's SSD patent is implemented, it's next-gen storage hardware.
Like, no hardware in the world currently has similar performance (for the task of gathering huge amounts of data from attached flash), including server/datacenter solutions that cost $100k or more.
 

MCplayer

Member
Terrible explanation

Memory is a precious commodity in computer systems. Advances in memory architecture have been a lot slower than, say, everything else in a computer. On top of that, bandwidth is an even more precious commodity.

You have your CPUs and GPUs, which process tons of information, but if you don't have fast enough memory and bandwidth, they will be sitting idle most of the time waiting to be fed information.

Say you have 2GB of storage to work with and you can transfer a file at 1GB/s from that storage to your CPU and GPU; it will take 2 seconds to transfer 2GB worth of data. This is where compression comes in. If you compress the file, you can fit more into the storage, essentially giving you twice the storage. You can now fit 4GB worth of data into only 2GB worth of storage, and therefore you can transfer 4GB worth of data to your CPU and GPU in the same time, giving them better utilization.

Compression comes at a cost of CPU performance, because in, say, a regular computer you have to dedicate a CPU core or two just to compressing and decompressing data. That is not ideal in a console, because you don't want to take away CPU cores that could be used for anything else. Here come dedicated ASICs whose sole purpose is to be really good at compressing and decompressing data. The CPU is free to do other things while the ASIC just keeps decompressing data all day, every day, without breaking a sweat.

To tie it to XSX and PS5:

The XSX has a 1TB SSD with a raw bandwidth of 2.4GB/s; they also have a decompression system that can handle 6GB/s. Meaning you can transfer 4.8GB/s worth of compressed data without breaking a sweat, because they have 1.2GB/s of headroom.

The PS5 has an 825GB SSD with a raw bandwidth of 5.5GB/s; they also have a decompression system that can handle 22GB/s. Meaning they can transfer 11GB/s worth of compressed data without breaking a sweat, because they have 11GB/s of headroom.

You always want to compress because you get more out of your storage and I/O.
Where did the 22GB/s pop up? I missed that.
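
For reference, the arithmetic in the quoted explanation boils down to this (a sketch using the quote's own 2:1 compression assumption; the 6GB/s and 22GB/s figures are the decompression blocks' quoted peak outputs, not sustained SSD speeds):

```python
# Effective throughput is capped by whichever is lower: the raw SSD speed times the
# assumed compression ratio, or the decompression block's quoted peak output.
def effective_gb_s(raw_gb_s: float, decomp_peak_gb_s: float, ratio: float = 2.0) -> float:
    return min(raw_gb_s * ratio, decomp_peak_gb_s)

print("XSX:", effective_gb_s(2.4, 6.0))     # 4.8 GB/s, leaving 6 - 4.8 = 1.2 GB/s of headroom
print("PS5:", effective_gb_s(5.5, 22.0))    # 11.0 GB/s, leaving 22 - 11 = 11 GB/s of headroom
```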
 