
Xbox Velocity Architecture - 100 GB is instantly accessible by the developer through a custom hardware decompression block

Ar¢tos

Member
Although these are early games shown for PlayStation 5, the difference between the two systems is very evident in their hardware and software choices. Most of Sony's game offerings show 4K 30 FPS or, at the other end of the spectrum, sub-4K 60. As some people have said, an SSD cannot replace a fast GPU or CPU.
30 or 60 fps has more to do with developer choice than processing power regarding next gen. It's not a 100 MHz CPU difference that is going to make a game 30 fps on one console and 60 fps on the other.
Sony 1st party wants to push visuals; if MS 1st party wants to push performance, it's down to choice, not hardware difference, since this gen that is less relevant than ever.

I never said it was; how are you reading what I said to imply that? Again, BC1-7 is like ASTC/ETC/S3TC/PVRTC... I get that: the GPU reads them and does not have to uncompress them (they stay compressed in the GPU cache too).

I was just making the assumption that when I write "GPU native format" you can fill in any of those formats wherever I write it :).



Yup, but you save GPU performance, as you are not decoding it in software, and memory, if you have temporary storage for the async compute job to decode blocks into (and there is less complexity in not having to sync with those GPU decompressor shader tasks).
You'd trade memory bandwidth between the GPU and main RAM (which XSX has plenty of) to maximise SSD I/O and get better compression rates / better effective SSD I/O. It would be a pretty smart tradeoff: a memory bandwidth cost paid for better effective SSD bandwidth. It would explain the better equivalent compressed data rates and how they can achieve them (2x the raw uncompressed bandwidth vs ~1.45-1.6x).
I am just trying to make sense of the uncompressed vs compressed SSD I/O numbers and how they were explained.

BTW, I am not suggesting BCPack HW decoder uncompresses to raw RGBA or anything crazy :).
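As a quick sanity check on the ratios above, here is the back-of-the-envelope in Python; the raw/compressed figures are the publicly quoted ones discussed in this thread, treated as nominal rather than measured:

Code:
# Rough effective-compression check using the figures quoted in this thread.
# These are nominal marketing numbers, not benchmarks.
consoles = {
    "XSX": {"raw_gbps": 2.4, "compressed_gbps": 4.8},
    "PS5": {"raw_gbps": 5.5, "compressed_gbps": 8.5},  # Sony quotes "8-9 GB/s typical"
}

for name, c in consoles.items():
    ratio = c["compressed_gbps"] / c["raw_gbps"]
    print(f"{name}: {c['raw_gbps']} GB/s raw -> {c['compressed_gbps']} GB/s effective "
          f"(~{ratio:.2f}x from compression)")
# XSX's ~2.0x vs the ~1.5-1.6x you'd expect from a general-purpose codec alone is
# the gap the BCPack discussion above is trying to explain.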
I can understand using GPU shaders to decompress, since GPUs are never under 100% use and it's a great use of idle resources, but won't latency have an impact?
 

Bernkastel

Ask me about my fanboy energy!
Although these are early games shown for PlayStation 5, the difference between the two systems is very evident in their hardware and software choices. Most of Sony's game offerings show 4K 30 FPS or, at the other end of the spectrum, sub-4K 60. As some people have said, an SSD cannot replace a fast GPU or CPU.
The Medium is 30 fps, though it does many impressive things utilizing XVA.
 

oldergamer

Member
Pointing out an angle that makes XSX BCPack look better is now console warring? Weird...
No, I'm not concerned with that, I never was to be honest. I wasn't looking for an "angle" to quote you. it was simple speculation. No matter if it's to the benefit of Xbox or PS5. You called my name out, and i replied. To me it seems like you only took interest when you saw some info specific to PS5. I don't see you jumping into other threads to tamp down crazy speculation on PS5 like you did here early on.

I am speculating based on hard data, benchmarks provided by Sony, MS (compressed vs uncompressed), Oodle engineers, and seeing how they are tied together...
Ah, so we are in agreement that you are speculating. Note, the only new info you have is what someone has written on twitter about what they now support for PS5. There's no new information on the xbox side. MS have yet to provide any official benchmarks. You're speculating, with more information, but it's still speculating. Nothing wrong with that, but you did jump down people's throats for doing the same, regardless of the new information you have now.


I am not trying to imagine some magical 2-3x increase in bandwidth based on the assumption that a comment without detail actually meant XYZ, because it would be so cool if it did, and then making up a baseline of comparison (SFS improving bandwidth usage, as well as memory usage, by more than 2-3x over PRT and software page management).
You keep going back to this BS "magical" nonsense. The same people on Twitter that you're basing the Xbox compression information on are the ones who mentioned the effective bandwidth savings. In my eyes, you are picking and choosing what you deem valid for speculation.
 
Some interesting discussions on XSX XvA (and the SSDs in general) from B3D. This comment in particular following a chain of responses between Ronaldo8 and Shifty Geezer seems to help put some things into perspective regarding what the SSDs will actually be contributing and how:

And because of that, I don't think the SSD will make that much difference in how something is rendered. It only reduces the size of the texture cache and the need to load everything as a package (multiple times). So you can use the memory more efficiently, but overall it doesn't take away the problem that you still must load ahead for things that might become visible in the next few frames. Texture tiling etc. and the high bandwidth just help here to reduce the texture-cache footprint. That's more or less all. It won't radically change how games get developed, but it offers a method to use the RAM a bit more efficiently.

So people assuming the PS5, for example, only needs texture data in the RAM for the exact scene on display are misunderstanding, because latency is too big an issue in that regard. Attempts to refer to the 22 GB/s theoretical peak compression speed is misguided, because only a small range of data will compress at that level (mainly some types of video and audio data), and you still have to account for loss of data integrity/quality as it would be lossy.

Additionally, while the 22 GB/s compressed theoretical peak speed is fast, NAND latency is still extremely bad compared to DRAM, which is slow compared to on-chip SRAM caches, which are slower than the registers. "Slow" in all these instances refers to latency. You can probably guess why the latency gets worse with each drop in the memory stack tier. Affording each NAND module its own channel helps in parallelizing the chunking/striping of data across a set of chips, but it's still dealing with the inherent technological limitations of NAND.

So you're going to have instances where on PS5 the texture footprint in RAM can be lower compared to XSX, but people need to keep in consideration just what that actually means in practice. That though is certainly beneficial for PS5 in terms of using the hardware more efficiently.
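To give a feel for why "slow" here is about latency rather than bandwidth, a small illustration with ballpark, order-of-magnitude access times (my own rough figures, not vendor specs):

Code:
# Ballpark access latencies per memory tier (order of magnitude only, assumed figures).
tiers_ns = {
    "GPU register": 1,
    "on-chip SRAM cache": 10,
    "GDDR6 DRAM": 300,
    "NVMe NAND read": 50_000,   # tens of microseconds
    "HDD seek": 10_000_000,     # ~10 ms
}

frame_ns = 1e9 / 60  # ~16.7 ms frame budget at 60 fps
for tier, ns in tiers_ns.items():
    print(f"{tier:>20}: {ns:>12,} ns "
          f"(~{frame_ns / ns:,.0f} serialized accesses per 60 fps frame)")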
 

Panajev2001a

GAF's Pleasant Genius
No, I'm not concerned with that, I never was to be honest. I wasn't looking for an "angle" to quote you. it was simple speculation. No matter if it's to the benefit of Xbox or PS5. You called my name out, and i replied. To me it seems like you only took interest when you saw some info specific to PS5. I don't see you jumping into other threads to tamp down crazy speculation on PS5 like you did here early on.
Feel free to point out similar crazy, rabid speculation like the examples discussed earlier in this thread that read more like free PR than anything... I like crazy reads, but it is this pursuit of balance, where both sides must be treated the same in any situation, that I object to.


Ah, so we are in agreement that you are speculating. Note, the only new info you have is what someone has written on twitter about what they now support for PS5. There's no new information on the xbox side. MS have yet to provide any official benchmarks.
Did XSX's 4.8 GB/s bandwidth figure when factoring in compression come out of thin air? Did the equivalent 8-9 GB/s figure for PS5 come out of thin air? Did we not have a base raw speed from the manufacturer for both consoles? Is data about zlib and Kraken compression public or not?

They went public with something they offer and that Sony financed to get early dibs on, and describing this technique and the general library (Oodle Texture, with and without use of BC7Prep) gave some more information to think about regarding how the extra efficiency came about for XSX. The info was likely there and smarter people had already figured it out, but the news made me think about it again, and the maths seem to make sense. Day-old news? Maybe, but I had not seen it posted...
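The zlib part, at least, is easy to poke at yourself; Kraken and BCPack are proprietary, so the snippet below (Python's bundled zlib on synthetic data) only shows how the ratio depends entirely on the data, it is not a stand-in for the console codecs:

Code:
import zlib, os

samples = {
    "zeros (trivially compressible)": bytes(1 << 20),
    "random (incompressible)": os.urandom(1 << 20),
    "repetitive text": b"the quick brown fox " * 50_000,
}

for name, data in samples.items():
    packed = zlib.compress(data, level=9)
    print(f"{name:>32}: {len(data)} -> {len(packed)} bytes "
          f"({len(data) / len(packed):.2f}x)")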

You're speculating, with more information, but it's still speculating. Nothing wrong with that, but you did jump down people's throats for doing the same, regardless of the new information you have now.
I disagree, like others; discussions became inflamed and more and more disingenuous. I think you are making a false equivalence here. I do not have anything against educated guesses / speculation. Fanboy drivel and/or astroturfing are another matter.

You keep going back to this BS "magical" nonsense. It's the same people that you're basing xbox compression information on from twitter that mentioned the effective bandwidth savings. In my eyes, you are picking and choosing what you deem valid for speculation.
The people being quoted are not the same, btw, but it is not about what the people said. Nobody specified the baseline for those numbers, and nobody besides a small group thinks that we are really talking about a 2-3x bandwidth and storage improvement delivered by XSX's implementation of SFS over a modern PRT-based virtual texturing solution.
Some people took the comments made and ran with them, misinterpreting them and becoming belligerent if the data or the conclusions were questioned.

My problem was the baseline for the improvement that people made up/speculated on, not that they speculated.
If you think what I said does not make sense, please say so; worst case, I will learn something :).

It is a false equivalence anyway in the sense that, forgive me for the exaggeration to make the metaphor clearer, yelling "fire" in a crowded theatre when there is none is not a display of freedom of speech in the way speaking at Hyde Park Corner is.
 

Panajev2001a

GAF's Pleasant Genius
Some interesting discussions on XSX XvA (and the SSDs in general) from B3D. This comment in particular following a chain of responses between Ronaldo8 and Shifty Geezer seems to help put some things into perspective regarding what the SSDs will actually be contributing and how:



So people assuming the PS5, for example, only needs texture data in the RAM for the exact scene on display are misunderstanding, because latency is too big an issue in that regard. Attempts to refer to the 22 GB/s theoretical peak compression speed is misguided, because only a small range of data will compress at that level (mainly some types of video and audio data), and you still have to account for loss of data integrity/quality as it would be lossy.

Additionally, while the 22 GB/s compressed theoretical peak speed is fast, NAND latency is still extremely bad compared to DRAM, which is slow compared to on-chip SRAM caches, which are slower than the registers. "Slow" in all these instances refers to latency. You can probably guess why the latency gets worse with each drop in the memory stack tier. Affording each NAND module its own channel helps in parallelizing the chunking/striping of data across a set of chips, but it's still dealing with the inherent technological limitations of NAND.

So you're going to have instances where on PS5 the texture footprint in RAM can be lower compared to XSX, but people need to keep in consideration just what that actually means in practice. That though is certainly beneficial for PS5 in terms of using the hardware more efficiently.

Thanks for the notes; do you mind linking the thread for those who want to follow it more easily?
 
Thanks for the notes; do you mind linking the thread for those who want to follow it more easily?

Yeah, sure thing. Here 'ya go.

You should also check out the PS5 and Next-Gen Console Technology threads there, too. They're all in the "Console Technology" forum section. Pretty good insights overall that I like using in addition to the stuff discussed here and on Era.

Ah yeah, just remembered. So yeah, you're going to have instances where PS5's texture footprint in RAM is lower vs. XSX, but in their case it's being done by swapping out chunks of data through the I/O block just overall more quickly. That's not to say the XSX doesn't have its own answer to such, though; while I'm sure there might be some type of texture upscaling technique available on PS5, I don't think that type of ML feature is as critical to Sony's design as it is to Microsoft's.

So there could be instances where devs utilize software-based upscaling techniques that are engine-specific, incurring some overhead cost. I don't know if MS's solution in terms of the ML features is comparable with Nvidia's DLSS 2.0, but it should be a very solid hardware-based solution in any case. Loading in lower-resolution textures that can be upscaled in real time by the GPU (while keeping the penalty small, preferably through customized fixed hardware similar to Nvidia's Tensor cores but not exactly like those) to give the desired resolution would seem pretty good, especially as the generation rolls along.
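For a rough sense of what shipping low-res textures and upscaling at runtime buys in storage and SSD traffic, assuming plain RGBA8 with no mip chain (illustrative numbers only, not MS's actual pipeline):

Code:
def rgba8_size_mib(width, height):
    """Uncompressed RGBA8 texture size in MiB, no mip chain (assumed format)."""
    return width * height * 4 / (1024 ** 2)

shipped = rgba8_size_mib(1024, 1024)   # what you store on disk and stream
target  = rgba8_size_mib(4096, 4096)   # what the GPU effectively samples after upscaling
print(f"shipped 1K: {shipped:.1f} MiB, target 4K: {target:.1f} MiB, "
      f"saving: {target / shipped:.0f}x on disk and on the SSD bus")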
 

Panajev2001a

GAF's Pleasant Genius
Yeah, sure thing. Here 'ya go.

You should also check out the PS5 and Next-Gen Console Technology threads there, too. They're all in the "Console Technology" forum section. Pretty good insights overall that I like using in addition to the stuff discussed here and on Era.

Ah yeah, just remembered. So yeah, you're going to have instances where PS5's texture footprint in RAM is lower vs. XSX, but in their case it's being done by swapping out chunks of data through the I/O block just overall more quickly. That's not to say the XSX doesn't have its own answer to such, though; while I'm sure there might be some type of texture upscaling technique available on PS5, I don't think that type of ML feature is as critical to Sony's design as it is to Microsoft's.

So there could be instances where devs utilize software-based upscaling techniques that are engine-specific, incurring some overhead cost. I don't know if MS's solution in terms of the ML features is comparable with Nvidia's DLSS 2.0, but it should be a very solid hardware-based solution in any case.
Thanks
 

M1chl

Currently Gif and Meme Champion
30 or 60 fps has more to do with developer choice than processing power regarding next gen. It's not a 100 MHz CPU difference that is going to make a game 30 fps on one console and 60 fps on the other.
Sony 1st party wants to push visuals; if MS 1st party wants to push performance, it's down to choice, not hardware difference, since this gen that is less relevant than ever.


I can understand using GPU shaders to decompress, since GPUs are never under 100% use and it's a great use of idle resources, but won't latency have an impact?
On the contrary, I would say that the CPU is never at 100% use, but the GPU very well could be. Maybe not in X1X where it's crippled by Jaguar, but that's another story.
 

Rikkori

Member
30 or 60 fps has more to do with developer choice than processing power regarding next gen. It's not a 100 MHz CPU difference that is going to make a game 30 fps on one console and 60 fps on the other.
Sony 1st party wants to push visuals; if MS 1st party wants to push performance, it's down to choice, not hardware difference, since this gen that is less relevant than ever.

Remember, dev choice is limited as well, by the audience's demands. Sure, in theory almost any game can be 60 fps, but how is that going to sell? So you can't simply choose 60 fps regardless of how that impacts the presentation. But when you have more GPU power, that choice becomes easier, because you have to sacrifice less visually in order to get there.
 

Lethal01

Member
So people assuming the PS5, for example, only needs texture data in the RAM for the exact scene on display are misunderstanding, because latency is too big an issue in that regard. Attempts to refer to the 22 GB/s theoretical peak compression speed is misguided, because only a small range of data will compress at that level (mainly some types of video and audio data), and you still have to account for loss of data integrity/quality as it would be lossy.

Additionally, while the 22 GB/s compressed theoretical peak speed is fast, NAND latency is still extremely bad compared to DRAM, which is slow compared to on-chip SRAM caches, which are slower than the registers. "Slow" in all these instances refers to latency.

I feel like the Unreal demo and comments from people at Epic show that the latency is low enough and the speed is fast enough to create bigger shifts than just using RAM a tiny bit better. Of course, I'm not saying that it's anywhere near fast enough to load anything you want by the next frame, but going from needing to load something 20 seconds in advance to needing to load it 1 second in advance will be a big shift in how games are made.
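That "20 seconds in advance vs 1 second in advance" point falls out of simple arithmetic: how early you must start loading is just working-set size over sustained read speed. A minimal sketch, with an assumed 2 GB working set and the drive figures quoted in this thread:

Code:
def time_to_swap_in(working_set_gb, stream_rate_gbps):
    """Seconds of advance notice needed to have a working set fully resident."""
    return working_set_gb / stream_rate_gbps

working_set_gb = 2.0  # assets for the area the player is about to enter (assumed size)
for label, rate in [("HDD ~0.1 GB/s", 0.1), ("XSX raw 2.4 GB/s", 2.4), ("PS5 raw 5.5 GB/s", 5.5)]:
    t = time_to_swap_in(working_set_gb, rate)
    print(f"{label:>17}: start loading ~{t:4.1f} s ahead for a {working_set_gb} GB working set")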
 

Ascend

Member
Some interesting discussions on XSX XvA (and the SSDs in general) from B3D. This comment in particular following a chain of responses between Ronaldo8 and Shifty Geezer seems to help put some things into perspective regarding what the SSDs will actually be contributing and how:



So people assuming the PS5, for example, only needs texture data in the RAM for the exact scene on display are misunderstanding, because latency is too big an issue in that regard. Attempts to refer to the 22 GB/s theoretical peak compression speed is misguided, because only a small range of data will compress at that level (mainly some types of video and audio data), and you still have to account for loss of data integrity/quality as it would be lossy.

Additionally, while the 22 GB/s compressed theoretical peak speed is fast, NAND latency is still extremely bad compared to DRAM, which is slow compared to on-chip SRAM caches, which are slower than the registers. "Slow" in all these instances refers to latency. You can probably guess why the latency gets worse with each drop in the memory stack tier. Affording each NAND module its own channel helps in parallelizing the chunking/striping of data across a set of chips, but it's still dealing with the inherent technological limitations of NAND.

So you're going to have instances where on PS5 the texture footprint in RAM can be lower compared to XSX, but people need to keep in consideration just what that actually means in practice. That though is certainly beneficial for PS5 in terms of using the hardware more efficiently.
I think the idea that the main benefit of the SSD is basically improved RAM efficiency was a given. That's why the SSD is both overrated and underrated at the same time. It's overrated because it cannot improve graphics beyond the capability of the GPU/CPU, but it's underrated because a limit that has been here a long time is suddenly gone. Less RAM needs to be used to achieve the same results. Or, from another perspective, with the same amount of RAM used you can get much better results. 16 GB of RAM might seem like a small jump for this generation, but in reality, combined with the SSD, it is an astronomical jump.

There's a reason why my posts have constantly been saying that the gap between the PS5 and XSX is not that big for the SSDs. Every developer is coming from developing for HDDs, and if you need to keep 1 second of data in RAM vs 2 seconds, for example, in practice that is going to matter very little for the design of games, because everyone was coming from at least 30 seconds of data in RAM. Although if you're talking 1 vs 2 seconds, you could theoretically be talking about a 2x RAM usage requirement for the XSX. In practice that is not realistic though, because there is a near zero chance that all assets used in this second will be completely unneeded in the next 3 seconds. The majority of RAM space will be occupied with assets that are going to be re-used a lot, just like in the UE5 tech demo.
The only cases where the PS5 will have the advantage are cases where the XSX would be RAM-limited in terms of capacity. And that is assuming that with XVA no data can be transferred from the SSD directly to the GPU, bypassing RAM. If it can, all this goes out the window, and the PS5 might end up with an advantage in only very specific situations, despite its faster SSD.

It is one of the reasons I am so interested in XVA. What it is actually doing will make all the difference in what the implications are in practice. But MS is very tight-lipped about it. What they did share suggests some things that can help offset the slower SSD speed, but we still have to confirm those, and no new information has really been available, other than the rumor that MS will have some mic-drop moments in July, whatever that means.
 
I feel like the Unreal demo and comments from people at Epic show that the latency is low enough and the speed is fast enough to create bigger shifts than just using RAM a tiny bit better. Of course, I'm not saying that it's anywhere near fast enough to load anything you want by the next frame, but going from needing to load something 20 seconds in advance to needing to load it 1 second in advance will be a big shift in how games are made.

We don't have any detailed technical notes on the Unreal demo, however, and with all respect to Tim Sweeney, he has a few public-relations reasons to frame some of his comments the way he has. To be fair, though, almost every developer speaking publicly on the systems is in something of the same boat, some more than others.

Faster asset streaming to/from RAM is, essentially, the same thing the poster I quoted was saying. I also think a big part of how much the SSD I/O of the next-gen systems gets flexed comes down to the game design in particular. If your game is only using a few GBs of assets for a prolonged period of time, and/or uses other methods to transform assets in real time besides streaming in new baked assets to RAM, then it's not going to rely on repeated throughput from the SSD the same way a game requiring more than those circumstances will. That's all.

But, at the very least we should be far enough to say that the faster SSDs will certainly help with more than just load times.

In practice that is not realistic though, because there is a near zero chance that all assets used in this second will be completely unneeded in the next 3 seconds. The majority of RAM space will be occupied with assets that are going to be re-used a lot, just like in the UE5 tech demo.

Great post, but wanted to highlight this part in particular. I think some people are under the assumption that this is how data management in RAM with the SSDs will normally work going forward. But that would seem like a one-off case only with very specific gameplay moments in just a few basic types of games. And when you break it down, very few games in general are going to need the type of asset streaming speed seen in the flying section of UE5 demo.

I'd also like to mention that even if you can load the highest-quality LODs for far-off distances, for certain types of art direction you might actually want lower-quality LODs for selective objects in the distance. Even more so if some of those objects bear no impact on player interaction.
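The usual way that kind of art-direction control shows up is a per-object LOD clamp on top of a distance-based mip choice; a minimal sketch where every name and threshold is made up for illustration:

Code:
import math

def select_lod(distance_m, base_lod_distance_m=10.0, max_lod=7, art_clamp=None):
    """Pick a mip/LOD level from distance: each doubling of distance drops one level.
    art_clamp lets the art direction force distant props to stay at a coarser LOD."""
    lod = max(0, int(math.log2(max(distance_m, base_lod_distance_m) / base_lod_distance_m)))
    if art_clamp is not None:
        lod = max(lod, art_clamp)       # never finer than the clamp
    return min(lod, max_lod)

for d in (5, 20, 80, 320):
    print(f"{d:>4} m -> LOD {select_lod(d)} "
          f"(clamped background prop: {select_lod(d, art_clamp=3)})")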
 

Panajev2001a

GAF's Pleasant Genius
Appreciate the rest of the post, especially considering that we are just hopping to a new medium and good use of SSD I/O requires a change in approach by the devs no matter how many APIs you throw at them... for now you would mostly get very short or no loading times, which is the lowest-hanging fruit.

that is assuming that with XVA no data can be transferred from the SSD directly to the GPU, bypassing RAM. If it can, all this goes out the window

I am not sure what you think would happen, or how it would offset an almost-double bandwidth advantage, if RAM were skipped for very large volumes of data, beyond the shaders often sitting idle as you deal with the additional latency for that large volume of data.
The PS5 has no tech that can magically close the gap with the XSX's TFLOPS advantage; the XSX just as likely has no tech to magically close the gap the other way :).
 
So people assuming the PS5, for example, only needs texture data in the RAM for the exact scene on display are misunderstanding, because latency is too big an issue in that regard. Attempts to refer to the 22 GB/s theoretical peak compression speed is misguided, because only a small range of data will compress at that level (mainly some types of video and audio data), and you still have to account for loss of data integrity/quality as it would be lossy.

I am not sure what you mean here; the exact scene on display as in what is currently being rendered in the frame, or the scene where the player is, which may or may not be entirely in the frame being rendered?

The comment comes from a user responding to a discussion between Shifty and another user about loading textures required for the frame from the SSD mid-frame using XSX's SFS. You can have the textures for the room you need to render in RAM and then, when needed, load the next room "instantly" from the SSD. It is not truly instant, because there is a huge difference between nanoseconds and microseconds; it means you cannot use the SSD as RAM and load from it mid-frame, as textures would miss and the frame would have to wait for them to resolve, so you use a cache. But updates from the SSD can be so fast that by the time you move the camera to a certain position it can load the chunks of the scene just before they are required for rendering, and because of that speed you can have smaller caches, relying on the speed to fill what is needed before it is required. So yes, you can keep only the texture data required for the room or scene the player is in (depending on the game); what you cannot do is load the textures required for rendering directly from the SSD during frame time, and you still have to resolve what will be needed off-camera. With an SSD this can be planned differently than in current-gen games.

The difference with the PS5 is that it is much faster than the XSX, but it will still require a cache (just like the XSX). If I remember correctly, there is a slide in Cerny's presentation with a cache box when he was speaking about streaming data from the SSD.
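A minimal sketch of the caching scheme described in this post: tiles are prefetched ahead of the camera and the renderer only samples what is already resident, never stalling on the drive. The class and names are hypothetical, a general pattern rather than either console's API:

Code:
from collections import OrderedDict

class TileCache:
    """Tiny LRU-style residency cache standing in for a streaming texture pool."""
    def __init__(self, capacity_tiles):
        self.capacity = capacity_tiles
        self.resident = OrderedDict()          # tile_id -> tile payload

    def prefetch(self, tile_id, load_tile):
        """Called ahead of time (e.g. from a predicted camera path), never mid-draw."""
        if tile_id in self.resident:
            self.resident.move_to_end(tile_id)
            return
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)  # evict the least recently used tile
        self.resident[tile_id] = load_tile(tile_id)

    def sample(self, tile_id):
        """At render time: use the tile if resident, otherwise fall back to a lower mip."""
        if tile_id in self.resident:
            self.resident.move_to_end(tile_id)
            return self.resident[tile_id]
        return "fallback-low-mip"              # never stall the frame waiting on the SSD

# Usage: prefetch the next room while rendering the current one.
cache = TileCache(capacity_tiles=4)
for tile in ["roomA/0", "roomA/1", "roomB/0"]:
    cache.prefetch(tile, load_tile=lambda t: f"pixels({t})")
print(cache.sample("roomB/0"), "|", cache.sample("roomC/7"))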
 

Ascend

Member
I am not sure what you think would happen, or how it would offset an almost-double bandwidth advantage, if RAM were skipped for very large volumes of data, beyond the shaders often sitting idle as you deal with the additional latency for that large volume of data.
The PS5 has no tech that can magically close the gap with the XSX's TFLOPS advantage; the XSX just as likely has no tech to magically close the gap the other way :).
Why are you all acting like you actually need to fully dump 16GB of RAM every two seconds, rather than every 3 seconds?

If you consider the fact that most data in RAM does not need to be dumped and reloaded constantly, the advantage of 'double bandwidth' is clearly diminished. All games are designed to reduce dumps and reloads as much as possible. It is not only smart design, it is key to allow things to run optimally and smoothly. That does not change while having an SSD. The PS5 can get away with more sloppy programming than the XSX due to its SSD bandwidth, that's for sure. But that is not something we want programmers to do.

Then there's the fact that most of what you need to render on-screen does not need to have the highest detail at all. If you move at high speeds, which inevitably will use a lot of motion blur, it's redundant to load the highest quality textures/mips for rendering everything. If it's a racer, have high detail on the car, and whatever is motion blurred, reduce texture quality. Boom, less streaming required. If you move slowly, you likely have enough time to load in whatever you want with a slower storage as well.

Even above that.... If the XSX has the ability to transfer data directly from the SSD to the GPU bypassing RAM, the implications are a lot bigger than you give it credit for. It might be 'only' 2.4 GB/s. This translates to 2.4 MB per ms. If you have a 60 fps game, that means you have the ability to transfer about 40MB per frame directly from the SSD. What does 40MB give you? A non-compressed 2K texture is 33MB. A non-compressed 4K texture is 67MB. If you'd want to transfer a full 4K texture, you'd need about 40% compression ratio, which the XSX can supposedly already achieve.
So the XSX SSD is capable of transferring a full 4K texture directly from the SSD to the GPU, if it can bypass RAM. Add in the fact that you're using tiling with sampler feedback rather than whole textures at full quality, and things add up quite quickly. And remember that every time you'd transfer from the SSD, that actually spares you from doing RAM operations. RAM operations to load a new texture with RAM already full would mean giving an instruction to dump something currently in RAM, waiting for the actual dumping and its confirmation, giving an instruction to load, waiting for the actual loading. And say the player immediately turns back around after that, you'd possibly even be dumping that new 4K texture data from RAM again, just so you can reload what you already had dumped.
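The per-frame arithmetic above, written out so the assumptions are explicit (decimal units, the raw and compressed XSX figures from this thread, and nothing else contending for the drive):

Code:
def bytes_per_frame(bandwidth_gb_s, fps):
    """How much the SSD can deliver inside one frame, ignoring latency and contention."""
    return bandwidth_gb_s * 1e9 / fps

for bw in (2.4, 4.8):        # XSX raw and typical-compressed figures from this thread
    per_frame_mb = bytes_per_frame(bw, 60) / 1e6
    print(f"{bw} GB/s @ 60 fps -> {per_frame_mb:.0f} MB per frame")
# 2.4 GB/s -> 40 MB/frame, 4.8 GB/s -> 80 MB/frame; an uncompressed 4K RGBA8
# texture (4096*4096*4 bytes ~= 67 MB) only fits within one frame at the compressed rate.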

The PS5 might have had less RAM usage and would not need to carry out all those operations and would simply load the texture into RAM. But the XSX has a way to compensate for that through its direct SSD to GPU data transfer. It's not magic. It's logic. But the fact that you want to dismiss this as magic says a lot. Again, as a disclaimer, this is all assuming that a transfer from SSD to GPU can happen while bypassing RAM, which has not been confirmed. If it cannot do that, only the last point is invalid. The first two still apply.
 
If you consider the fact that most data in RAM does not need to be dumped and reloaded constantly, the advantage of 'double bandwidth' is clearly diminished. All games are designed to reduce dumps and reloads as much as possible. It is not only smart design, it is key to allow things to run optimally and smoothly. That does not change while having an SSD. The PS5 can get away with more sloppy programming than the XSX due to its SSD bandwidth, that's for sure. But that is not something we want programmers to do.

Then there's the fact that most of what you need to render on-screen does not need to have the highest detail at all. If you move at high speeds, which inevitably will use a lot of motion blur, it's redundant to load the highest quality textures/mips for rendering everything. If it's a racer, have high detail on the car, and whatever is motion blurred, reduce texture quality. Boom, less streaming required. If you move slowly, you likely have enough time to load in whatever you want with a slower storage as well.

Even above that.... If the XSX has the ability to transfer data directly from the SSD to the GPU bypassing RAM, the implications are a lot bigger than you give it credit for. It might be 'only' 2.4 GB/s. This translates to 2.4 MB per ms. If you have a 60 fps game, that means you have the ability to transfer about 40MB per frame directly from the SSD. What does 40MB give you? A non-compressed 2K texture is 33MB. A non-compressed 4K texture is 67MB. If you'd want to transfer a full 4K texture, you'd need about 40% compression ratio, which the XSX can supposedly already achieve.
So the XSX SSD is capable of transferring a full 4K texture directly from the SSD to the GPU, if it can bypass RAM. Add in the fact that you're using tiling with sampler feedback rather than whole textures at full quality, and things add up quite quickly. And remember that every time you'd transfer from the SSD, that actually spares you from doing RAM operations. RAM operations to load a new texture with RAM already full would mean giving an instruction to dump something currently in RAM, waiting for the actual dumping and its confirmation, giving an instruction to load, waiting for the actual loading. And say the player immediately turns back around after that, you'd possibly even be dumping that new 4K texture data from RAM again, just so you can reload what you already had dumped.

The PS5 might have had less RAM usage and would not need to carry out all those operations and would simply load the texture into RAM. But the XSX has a way to compensate for that through its direct SSD to GPU data transfer. It's not magic. It's logic. But the fact that you want to dismiss this as magic says a lot. Again, as a disclaimer, this is all assuming that a transfer from SSD to GPU can happen while bypassing RAM, which has not been confirmed. If it cannot do that, only the last point is invalid. The first two still apply.

No. The SSD may be fast, but it is still very slow compared to RAM and to how it is used in the new consoles. In your scenario, your compressed 4K texture is being transferred during frame time; it is not already there to be used at the start of the frame. If your GPU is going to read a texture directly from the SSD and use it, it needs it completely, and it needs it for a fraction of the frame time when it is filling the triangles of the objects that use it. It cannot wait until the end of the frame to use it; in that case it will stall until the texture is resolved, so your frame now takes much more time. The difference in latency is too big to bypass RAM. The GPU needs the required texture in its entirety for the object(s) where it is going to be used, and also, what about its mipmaps? What if you need the texture in other parts of the frame for lighting calculations? Or what if you need to transfer other things from the SSD? Are you going to devote the next several frames to reading the same texture directly from the SSD over and over just to avoid storing a single 4K texture in a cache in RAM? Your 60 fps game will quickly become a 10 fps mess just because you don't want to use 60 MB of RAM! And whatever you save in RAM instructions is meaningless; the difference in latency already blows out of the water whatever time you saved, and now the RAM sits idle waiting for the SSD to resolve one texture. You may read textures from the SSD, but they are to be used in the next frames, not the current frame. You need to use a texture cache with the required textures already there by the time the GPU renders.
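Putting rough numbers on the stall being described, under simple assumptions: one uncompressed 4K RGBA8 texture, ~50 µs of NVMe access latency, the raw 2.4 GB/s figure, and no other traffic:

Code:
def fetch_time_ms(size_bytes, latency_us, bandwidth_gb_s):
    """Time to pull one blob off the drive: fixed access latency + transfer time."""
    return latency_us / 1000 + size_bytes / (bandwidth_gb_s * 1e9) * 1000

frame_ms = 1000 / 60
tex_4k = 4096 * 4096 * 4            # ~67 MB uncompressed RGBA8 (assumed format)
t = fetch_time_ms(tex_4k, latency_us=50, bandwidth_gb_s=2.4)
print(f"fetching it mid-frame takes ~{t:.1f} ms of a {frame_ms:.1f} ms frame budget")
# ~28 ms > 16.7 ms: the frame stalls, which is why the texture needs to be in a RAM
# cache *before* the frame that samples it, exactly as argued above.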

Yes, the XSX has a way to compensate for a slower SSD: it's called "using a bigger cache".


thicc_girls_are_best posted a link to a discussion on Beyond3D; it's not all about this subject, but they touch on it. I suggest you read it.

 

GODbody

Member
Even above that.... If the XSX has the ability to transfer data directly from the SSD to the GPU bypassing RAM, the implications are a lot bigger than you give it credit for.

Funnily enough, this was confirmed by Phil Spencer almost a year ago in an interview with a German website. (Site is in German but most modern browsers like Chrome should be able to translate it)

Article Here

Here is the translated quote

PC Games Hardware: Project Scarlett will have an SSD by default. As with the PC, this ensures shorter loading times, no question. But how could game developers use that for their titles? With PCs you can not automatically build on it, because in some cases there are still traditional hard drives.

Phil Spencer: Thanks to their speed, developers can now use the SSD practically as virtual RAM. The SSD access times come close to the memory access times of the current generation of consoles. Of course, the OS must allow developers appropriate access that goes beyond that of a pure storage medium. But then we will see how the address space will increase immensely - comparable to the change from Win16 to Win32 or in some cases Win64.

Of course, the SSD will still be slower than the GDDR6 RAM that sits directly on top of the die. But the ability to directly supply data to the CPU and GPU via the SSD will enable game worlds to be created that will not only be richer, but also more seamless. Not only in terms of pure loading times, but also in terrain mapping. A graphic designer no longer has to worry about when GDDR6 ends and when the SSD starts. I like that Mark Cerny and his team at Sony are also investing in an SSD for the PlayStation 5 ...
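The "practically as virtual RAM" phrasing maps onto ordinary memory-mapped files, where the OS exposes storage through the address space and pages it in on demand. A minimal PC-side analogy in Python, standard OS paging rather than a claim about how the Series X OS actually implements it:

Code:
import mmap, os, tempfile

# Create a 256 MiB file standing in for a large asset package on the SSD.
path = os.path.join(tempfile.gettempdir(), "fake_asset_package.bin")
with open(path, "wb") as f:
    f.truncate(256 * 1024 * 1024)

with open(path, "r+b") as f:
    view = mmap.mmap(f.fileno(), 0)      # the whole file appears as addressable memory
    view[128 * 1024 * 1024] = 0x42       # touching a byte pages it in from disk
    print("byte at +128 MiB:", hex(view[128 * 1024 * 1024]))
    view.close()
os.remove(path)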
 

oldergamer

Member
Funnily enough, this was confirmed by Phil Spencer almost a year ago in an interview with a German website. (Site is in German but most modern browsers like Chrome should be able to translate it)

Article Here

Here is the translated quote
Interesting quote. We totally missed this one. It sounds like the drive having memory-addressable space could be what XVA is. Also an interesting point about it coming close to the RAM speed of the PS4 and Xbox One; I didn't think of it that way previously. What are the ramifications, however?
 

Panajev2001a

GAF's Pleasant Genius
Funnily enough, this was confirmed by Phil Spencer almost a year ago in an interview with a German website. (Site is in German but most modern browsers like Chrome should be able to translate it)

Article Here

Here is the translated quote

This does not say anything about CPU and GPU reading directly from the SSD, quite the opposite to be fair. Just like virtual memory now on a PC does not allow the CPU to fetch data directly from the disk. Key words “practically as RAM” in what Spencer says.

What the system does / seems to do is give you a virtual address space that contains a portion of the SSD storage and manage swapping pages in and out for you (with hints from the devs for prefetching; see SFS's new instructions, for example). We shall see how developers take advantage of each console's strengths.

In 4 years or so, when first parties have their second wave of PS5 exclusives, we will see what can and does pay off.
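Conceptually, the SFS-style flow is: render with whatever mips are resident, record which tiles the sampler actually wanted, and use that record as the prefetch hint for the following frames. A toy sketch of that loop follows; the structure is entirely hypothetical and is not the D3D12 sampler-feedback API:

Code:
resident = {("rock_albedo", mip) for mip in range(4, 8)}   # only coarse mips loaded
requested_log = set()                                      # stands in for the feedback map

def sample(texture, ideal_mip):
    """Render-time sampling: never stall, but log what we *wanted*."""
    requested_log.add((texture, ideal_mip))
    best = min((m for t, m in resident if t == texture and m >= ideal_mip), default=None)
    return f"{texture}/mip{best}" if best is not None else "fallback"

# Frame N: the shader wanted mip 1 but only mip 4+ is resident.
print(sample("rock_albedo", 1))
# Between frames: the feedback drives I/O requests; pretend the tiles arrive from the SSD.
for tex, mip in sorted(requested_log - resident):
    resident.add((tex, mip))
print(sample("rock_albedo", 1))     # Frame N+k: the finer mip is now used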
 

Ascend

Member
No. The SSD may be fast, but it is still very slow compared to RAM and to how it is used in the new consoles. In your scenario, your compressed 4K texture is being transferred during frame time; it is not already there to be used at the start of the frame. If your GPU is going to read a texture directly from the SSD and use it, it needs it completely, and it needs it for a fraction of the frame time when it is filling the triangles of the objects that use it. It cannot wait until the end of the frame to use it; in that case it will stall until the texture is resolved, so your frame now takes much more time. The difference in latency is too big to bypass RAM. The GPU needs the required texture in its entirety for the object(s) where it is going to be used, and also, what about its mipmaps? What if you need the texture in other parts of the frame for lighting calculations? Or what if you need to transfer other things from the SSD? Are you going to devote the next several frames to reading the same texture directly from the SSD over and over just to avoid storing a single 4K texture in a cache in RAM? Your 60 fps game will quickly become a 10 fps mess just because you don't want to use 60 MB of RAM! And whatever you save in RAM instructions is meaningless; the difference in latency already blows out of the water whatever time you saved, and now the RAM sits idle waiting for the SSD to resolve one texture. You may read textures from the SSD, but they are to be used in the next frames, not the current frame. You need to use a texture cache with the required textures already there by the time the GPU renders.

Yes, the XSX has a way to compensate for a slower SSD: it's called "using a bigger cache".


thicc_girls_are_best posted a link to a discussion on Beyond3D; it's not all about this subject, but they touch on it. I suggest you read it.

Already did. I see nothing in your statements that invalidates what I have said. And yes, I already read that thread on Beyond3D.

Not that you're wrong, except for the latency part. Latency and read speeds are two very different things. Latency is how long it takes before the data can start to be read. Or to put it differently, how long it takes for the data to be accessed. This is high with HDDs, because they have seek times. The access time of an HDD is pretty much around 15 ms, or almost an entire frame. With an SSD, it's not that high at all. The access time of a typical SSD of respectable quality, is around the 0.03 ms mark. Still higher than RAM which is around 10ns, but definitely fast enough to be used for loading.

I used the 4K texture as a reference point for the amount of data that can be transferred. No smart developer is likely to actually load a full 4K texture from the SSD to the GPU.
 

Xdrive05

Member
Just dropping in to say I want to see Series X exclusives designed around the SSD to see how their storage technology compares to exclusives on PS5, games like Ratchet which are obviously designed around having massive amounts of data available to transition entire asset sets within like 1-2 seconds. Maybe it would be double the loading time on X for the same kind of transitions, and that would be noticeably worse I'm sure, but still a very good solution. Anyway, this would be a good opportunity for MS to slow the PS5 SSD hype and demonstrate that their solution works (nearly) as well. Unless it doesn't, in which case I guess they should keep leaning into their cross-gen designs and hope no one cares. Do something already, MS.
 

Ascend

Member
Funnily enough, this was confirmed by Phil Spencer almost a year ago in an interview with a German website. (Site is in German but most modern browsers like Chrome should be able to translate it)

Article Here

Here is the translated quote
I have actually seen that before. For some reason I thought I had posted this already, but I can't seem to find it..?
 

Leyasu

Banned
Just dropping in to say I want to see Series X exclusives designed around the SSD to see how their storage technology compares to exclusives on PS5, games like Ratchet which are obviously designed around having massive amounts of data available to transition entire asset sets within like 1-2 seconds. Maybe it would be double the loading time on X for the same kind of transitions, and that would be noticeably worse I'm sure, but still a very good solution. Anyway, this would be a good opportunity for MS to slow the PS5 SSD hype and demonstrate that their solution works (nearly) as well. Unless it doesn't, in which case I guess they should keep leaning into their cross-gen designs and hope no one cares. Do something already, MS.
3+ seconds on the XsX is far from a game breaker
 

Xdrive05

Member
3+ seconds on the XsX is far from a game breaker

Then MS needs to demonstrate that the difference really is that small in magnitude. For all we know, Sony's I/O solution is also way better than MS's beyond just the drive speed difference itself. MS could get out in front of this thing with gameplay of something first party that switches entire level assets within a couple of seconds like Ratchet shows. So far only Sony has brought the receipts.

Ultimately we need a 3rd party future where Series X and PS5 are the baseline platforms, and we need to know that MS’s Velocity architecture can come close to keeping up with Sony’s, so that all engines going forward can use loading solutions like we see in Ratchet.

Set aside that PC will be the actual lowest common denominator along the I/O dimension, which is a wild turn of events we haven’t seen since the N64 days, bizarrely. But I digress.

Ball is in MS’s court to make the score now. I hope they do!
 

oldergamer

Member
This does not say anything about CPU and GPU reading directly from the SSD, quite the opposite to be fair. Just like virtual memory now on a PC does not allow the CPU to fetch data directly from the disk. Key words “practically as RAM” in what Spencer says.

What the system does / seems to do is giving you a virtue address space that contains a portion of the SSD storage and manages swapping pages in and out for you (with hints from the devs for prefetching, see SFS’s new instructions for example). We shall see how developers take advantage of each console strength.
I'm not sure that what you think he meant is just that. Saying it's practically like RAM could have a different meaning. Not only that, but Spencer mentioned the OS allowing access and also specifically said:

"But the ability to directly supply data to the CPU and GPU via the SSD"

It wouldn't be direct if it had to go through system memory.
 

GODbody

Member
This does not say anything about CPU and GPU reading directly from the SSD, quite the opposite to be fair.

Ah, I don't think you read the full quote; it says quite specifically:

Of course, the SSD will still be slower than the GDDR6 RAM that sits directly on top of the die. But the ability to directly supply data to the CPU and GPU via the SSD will enable game worlds to be created that will not only be richer, but also more seamless. Not only in terms of pure loading times, but also in terrain mapping.

Virtual memory on PC does not currently allow for this, but I think this is where the DirectStorage API comes in, if it's a similar implementation to the GPUDirect Storage tool that Nvidia detailed.
 

Panajev2001a

GAF's Pleasant Genius
I'm not sure that what you think he meant is just that. Saying it's practically like RAM could have a different meaning. Not only that, but Spencer mentioned the OS allowing access and also specifically said:

"But the ability to directly supply data to the CPU and GPU via the SSD"

It wouldn't be direct if it had to go through system memory.

It is, from a programming-model point of view. Hence the "use practically as virtual RAM" quote.

It is a big shift from having to manage memory mapping and DMAs from the CPU side to make data available to the GPU address space; it is not a small change.
 

Panajev2001a

GAF's Pleasant Genius
Ah, I don't think you read the full quote; it says quite specifically:



Virtual memory on PC does not currently allow for this, but I think this is where the DirectStorage API comes in, if it's a similar implementation to the GPUDirect Storage tool that Nvidia detailed.

That is because the GPU on PC does not have access to the same virtual address space as the CPU does; but yes, in this model the CPU has direct access to the SSD on your Windows PC. How that happens, and how much more efficiently it can be done now, is where DirectStorage can help (in addition to the CPU and GPU sharing the same address space).

I read the full quote, and it is a carefully worded statement that does convey the clear intention of making the software model more usable and more efficient, but it does not directly state what you are saying. Then again, we shall see... both MS and Sony are talking about instantaneous access to the data on the SSD and having an I/O pipeline going from the SSD to the GPU caches, so they may both have this capability... I am not sure either does, though, and in order to use it to overcome the memory cache savings you get from about 2x the disk bandwidth, you would run into other problems (your shaders potentially wasting tons of cycles, unable to cover up the extra latency).
 

sinnergy

Member
In RC there was a small 1+ second load screen when he fell into the voids. On the XsX it would be around 3+ seconds. Nothing game breaking
If you look at the specs of the SSD alone, we don't know enough about that part of the Xbox architecture.

If you use low-res textures and upscale them with ML, you won't be able to tell the difference from the artist-authored ones, but you load much faster.
 

Bernkastel

Ask me about my fanboy energy!
In RC there was a small 1+ second load screen when he fell into the voids. On the XsX it would be around 3+ seconds. Nothing game breaking
And in The Medium (which is a much more demanding game), they instantly switch worlds.
What about games that take advantage of this technology?
Announced on Inside Xbox in May 2020, this Silent Hill-inspired game will have the player seamlessly switch between two worlds.
You should think of The Medium as Bloober Team's largest and most ambitious game to date. The team isn't creating one world, but two: a version of our own, and a reflection of it in the spirit realm. You'll be able to shift seamlessly between the two in The Medium with – Bloober promises – no discernible load times or impact to game performance and graphics, thanks to the power of the Xbox Series X.
And your comments about 3+ seconds in R&C don't have any basis.
 
Just dropping in to say I want to see Series X exclusives designed around the SSD to see how their storage technology compares to exclusives on PS5, games like Ratchet which are obviously designed around having massive amounts of data available to transition entire asset sets within like 1-2 seconds. Maybe it would be double the loading time on X for the same kind of transitions, and that would be noticeably worse I'm sure, but still a very good solution. Anyway, this would be a good opportunity for MS to slow the PS5 SSD hype and demonstrate that their solution works (nearly) as well. Unless it doesn't, in which case I guess they should keep leaning into their cross-gen designs and hope no one cares. Do something already, MS.
You realize the game has to warrant it right? Can't just make a game do that unless it makes sense.
 

sendit

Member
And in The Medium (which is a much more demanding game), they instantly switch worlds.

And your comments about 3+ seconds in R&C don't have any basis.

How did you come to the conclusion that The Medium is a more demanding game? Do you have the technical details of both games?
 
Already did. I see nothing in your statements that invalidates what I have said. And yes, I already read that thread on Beyond3D.

Not that you're wrong, except for the latency part. Latency and read speeds are two very different things. Latency is how long it takes before the data can start to be read. Or to put it differently, how long it takes for the data to be accessed. This is high with HDDs, because they have seek times. The access time of an HDD is pretty much around 15 ms, or almost an entire frame. With an SSD, it's not that high at all. The access time of a typical SSD of respectable quality, is around the 0.03 ms mark. Still higher than RAM which is around 10ns, but definitely fast enough to be used for loading.

I used the 4K texture as a reference point for the amount of data that can be transferred. No smart developer is likely to actually load a full 4K texture from the SSD to the GPU.

Still slow. It's a bad idea to compromise performance with the idea of loading directly into the GPU from the SSD; there is no gain that justifies the stalls you can quickly get, just to save that small amount of memory, and there is more than enough RAM. You need a cache in the middle, in RAM or in some special embedded memory, not a direct path. Latency is one thing and speed is another, but both are important: you can make the request very fast (latency; the seek time is very fast on an SSD), but then there is the actual transfer speed of what you need.

Memory latency is also the time between initiating a request for data and the beginning of the actual data transfer. On a hard disk drive, latency is the time it takes for the selected sector to come around and be positioned under the read/write head.


Your example of 4K textures is not bad; textures don't require super-fast data access because they are static data, but you still require certainty of access in time. Any other file you access on the SSD will make things worse and worse while you are reading a texture you can't afford to delay, so you would need to devote the SSD to streaming textures during frame time, and the data you can get during a frame is measured in megabytes, so it is not a good idea, when it is simply better to ensure the texture is available, at a known speed, in a cache prepared before the frame requires it. The advantage of the SSD is not to use it as RAM; it is to be able to stream data really fast. It allows a lot of things; it is so fast it can keep up with the speed at which the user plays and traverses the scenes, but not with the speed required mid-frame.
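On "latency is one thing and speed is another, but both are important": the effective throughput of a read is roughly size / (access latency + size / bandwidth), so lots of tiny scattered reads never see the headline GB/s. An illustration with assumed figures (50 µs access, 2.4 GB/s raw, reads treated as serial; real drives overlap requests):

Code:
def effective_gbps(read_size_bytes, latency_s=50e-6, bandwidth_gbps=2.4):
    """Effective throughput of one read: fixed access cost plus transfer time."""
    total_s = latency_s + read_size_bytes / (bandwidth_gbps * 1e9)
    return read_size_bytes / total_s / 1e9

for size in (4 * 1024, 64 * 1024, 1024 * 1024, 16 * 1024 * 1024):
    print(f"{size // 1024:>6} KiB reads -> ~{effective_gbps(size):.2f} GB/s effective")
# 4 KiB reads are latency-bound (far below 2.4 GB/s); multi-MiB reads approach it,
# which is why streaming systems batch texture tiles rather than poke at single texels.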
 

Bernkastel

Ask me about my fanboy energy!
How did you come to the conclusion that The Medium is a more demanding game? Do you have the technical details of both games?
Based on what??
Going by developer track record, you are saying that a barely known indie dev is making a "much more demanding game" than a proven AAA developer that has shown their tech skills several times in the past.
Are we seriously doing this? It's not even Insomniac's main project. Are we now calling R&C a AAA game? Looks like the hive found its next minor thing to argue about.
 

Ar¢tos

Member
Are we seriously doing this? It's not even Insomniac's main project. Are we now calling R&C a AAA game? Looks like the hive found its next minor thing to argue about.
This is the Medium trailer:
There is almost zero gameplay, and in the little there is, the closest thing to "loading a world" is basically a very limited wall changing color in front of the character at the 2-minute mark.
Are you seriously comparing that to R&C?
Are you here just to troll?
 

oldergamer

Member
Are we seriously doing this? It's not even Insomniac's main project. Are we now calling R&C a AAA game? Looks like the hive found its next minor thing to argue about.
You may have jumped the gun there a little. We really don't know how demanding that game is until we get a better look at it.
 

Andodalf

Banned
You may have jumped the gun there a little. We really don't know how demanding that game is until we get a better look at it.

They already said we're going back to New York, based in Harlem, but in snow. It's going to use the same map and most of the same assets, just at higher quality. It's not a full game; it's an expandalone, Lost Legacy-type title.
 
Funnily enough, this was confirmed by Phil Spencer almost a year ago in an interview with a German website. (Site is in German but most modern browsers like Chrome should be able to translate it)

Article Here

Here is the translated quote

Damn, that quote just threw another monkey wrench into this xD. I mean, reading it, is there really any other way to interpret it than a direct streaming solution from the 100 GB reserved NAND partition to the GPU through some altered form of GPUDirect Storage? Does this mean there are hardware modifications in the GPU for executeIndirect which handle addressing?

Now that I think about it, one of the AMD guys (some Indian lad, IIRC) had a LinkedIn page a while ago where they talked about the XSX APU, and they specifically mentioned ARM cores in there along with the expected x86 cores (x86-64 in AMD's case). Could they have been referring to co-processor cores on the GPU to facilitate extending executeIndirect functions to pull in data from the 100 GB partition of the SSD without the CPU necessarily needing to do so? Presumably in GPU-native format as well? I mean, it's all essentially a co-processor at the end of the day, like the ARM co-processor in the PS4 Pro (though that was for a different purpose and, in that case, wasn't implemented into the GPU directly, as could be the case here with MS).

It's not even that far-fetched; Nvidia's GPUs, for example, use some type of FPGA cores integrated into the GPU die for certain logic, which I assume would extend to handling GPUDirect Storage calls. MS could have just chosen ARM over FPGA because they're cheaper but still do what they'd need them to. There's also this quote from Ronaldo8 on B3D that's interesting and might fit into this speculation:

What's crazy is that we know those particular methods are actively being used by some MS studios (Ninja Theory?) thanks to an interview given by PlayFab's head honcho: https://venturebeat.com/2020/02/03/...ext-generation-of-games-and-game-development/

Of note is this particular nugget of information:

"Gwertzman: You were talking about machine learning and content generation. I think that’s going to be interesting. One of the studios inside Microsoft has been experimenting with using ML models for asset generation. It’s working scarily well. To the point where we’re looking at shipping really low-res textures and having ML models uprez the textures in real time. You can’t tell the difference between the hand-authored high-res texture and the machine-scaled-up low-res texture, to the point that you may as well ship the low-res texture and let the machine do it.

Journalist: Can you do that on the hardware without install time?

Gwertzman: Not even install time. Run time.

Journalist: To clarify, you’re talking about real time, moving around the 3D space, level of detail style?

Gwertzman: Like literally not having to ship massive 2K by 2K textures. You can ship tiny textures."

They highlighted the important parts; while it's basically referring to the DLSS-style ML texture upscaling features of the platform, it's interesting that Gwertzman stressed it can be done at runtime. That probably hints at some of the GPU capabilities Matt over on Era was suggesting, but it could also be hinting at customizations on the GPU to facilitate texture streaming by some co-processor in the GPU (working off extensions of executeIndirect, with the pipeline fashioned similarly to Nvidia's GPUDirect Storage but for different purposes; in Nvidia's case it's mainly useful because PCs are non-hUMA, while here the benefit could be very low, virtually non-existent abstraction-layer access of data by the GPU to/from the 100 GB partition of NAND storage).

I think this is all starting to piece itself together rather nicely now. It also fits what some of the Dirt 5 dev's comments were suggesting about mid-frame use of streamed texture data. Provided the NAND in question for the 100 GB partition is of good enough quality in terms of latency (which I would assume it is), if a car model (in Dirt 5's case) only needs a 5 MB texture file for a panel deformation, then even at 60 FPS with different textures each frame, that's still only 5 MB per frame, or 300 MB/s. That's easily within the SSD's limits, and we're talking about textures only being streamed a single time by the GPU, likely to work with the texture for a moment (in the local caches), dump it, then replace it with a new texture streamed in from the SSD. So replacing textures mid-frame, as the Dirt 5 team member described it, is perfectly feasible, especially if MS's implementation of XvA is what I'm starting to think it is based on these comments directly from people on the team (and after having seriously considered the alternatives).
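The Dirt 5 arithmetic above, generalized: the sustained bandwidth needed is just per-frame bytes times frame rate, which you can then compare against the raw figure. A quick check using the numbers from this post plus one heavier hypothetical case:

Code:
def required_mb_s(mb_per_frame, fps):
    """Sustained streaming rate needed if this much fresh data is consumed every frame."""
    return mb_per_frame * fps

budget_mb_s = 2400          # XSX raw 2.4 GB/s expressed in decimal MB/s
for label, mb, fps in [("5 MB panel texture", 5, 60), ("40 MB of fresh tiles", 40, 60)]:
    need = required_mb_s(mb, fps)
    print(f"{label}: {need:.0f} MB/s needed "
          f"({need / budget_mb_s:.0%} of the raw 2.4 GB/s budget)")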

It's starting to make me think: if the setup with these GPU modifications is what I think it is, could another thing I've been thinking of be possible? But I'll save going into that for some other time.

That is because the GPU on PC does not have access to the same virtual address space as the CPU does; but yes, in this model the CPU has direct access to the SSD on your Windows PC. How that happens, and how much more efficiently it can be done now, is where DirectStorage can help (in addition to the CPU and GPU sharing the same address space).

I read the full quote, and it is a carefully worded statement that does convey the clear intention of making the software model more usable and more efficient, but it does not directly state what you are saying. Then again, we shall see... both MS and Sony are talking about instantaneous access to the data on the SSD and having an I/O pipeline going from the SSD to the GPU caches, so they may both have this capability... I am not sure either does, though, and in order to use it to overcome the memory cache savings you get from about 2x the disk bandwidth, you would run into other problems (your shaders potentially wasting tons of cycles, unable to cover up the extra latency).

I think if Sony had this feature, Road to PS5 would've been the time to talk about it, no? After all, they spoke A LOT about the SSD in particular at that event, it easily took up the majority of the presentation time (audio being 2nd, and their variable frequency being 3rd). It was a conference presentation aimed at developers, after all. These are features developers would like to have heard if they were present, in a conference specifically targeting them.

So I'm inclined to believe Sony is achieving this through a quite different means: just raw speed throughput of the dedicated processor in the I/O block writing to/from RAM. Their goal is to see how to maximize the use of 16 GB memory (minus the reserve for the OS, so 14 GB) as a framebuffer as well as possible, meaning at any time if such and such data is needed it should be able to stream in to the RAM through the I/O block relatively instantly. That's their approach.

It seems more like MS are the ones who've taken an approach mirroring the functionality of AMD's SSG cards and Nvidia's GPUDirect Storage, although both of those work differently from what I've been speculating could be the case with MS's approach. At the very least, we can also theorize that MS's approach could be exclusively (or in addition to what I've speculated above) having the GPU able to read data directly from the 100 GB NAND partition on the SSD and place it in the 10 GB pool. Although that seems limited in scope and doesn't explain some of the capabilities we've already seen developers on the machine mention publicly. (Now, if it were possible for the GPU to do what I just mentioned, but only against the 4x 1 GB modules while the CPU etc. use the slower pool simultaneously, that could be interesting. Since the OS is managing the virtualized pool partitions anyway, it and the kernel could probably adjust the virtualization semi-dynamically, although there are no working examples of this in any system, console or PC, that I'm aware of, hence it's fringe speculation.)

Of the two, I've only really seen MS people or devs on XSX mention anything suggesting direct streaming from the SSD in a way that could bypass RAM. Although, even if this is possible, it will have obvious limitations: there is still the reality of SSD speeds being way too slow for the kind of repeated read accesses the GPU can make against RAM, so it will probably mainly be used for selective, single-time stream access of small chunks of data that the GPU writes into its caches, depending on how the workload assignments are handled (how the CUs are assigned data, basically). Still very useful, but it has its limitations.
 