
SSD and loading times demystified

There's a lot of confusion about why the SSD is so important for next-gen and how it will change things.
Here I will try to explain the main concepts.
TL;DR: a fast SSD is a game-changing feature, and this generation will be fun to watch!

It was working fine before, why do we even need that?
No, it wasn't fine, it was a giant PITA for anything other than small multiplayer maps or fighting games.
Let's talk some numbers. Unfortunately not many games have ever published their RAM pools and asset pools to the public, but some did.
Enter the Killzone: Shadow Fall demo presentation.
We have roughly the following:

Type                  Approx. size, %   Approx. size, MB
Textures              30%               1400
CPU working set       15%               700
GPU working set       25%               1200
Streaming pool        10%               500
Sounds                10%               450
Meshes                10%               450
Animations/Particles  1%                45

*These numbers are rounded sums of various much more detailed numbers presented in the article above.

We are interested in the "streaming pool" number here (but we will talk about the others too).
We have ~500MB of data that is loaded on the fly as the demo progresses.
The whole chunk of data that the game samples from (for that streaming process) is 1600MB.
The load speed of the PS4 drive is <50MB/sec for compressed data (<20MB/sec uncompressed), i.e. it will take >30 sec to load all of that, at least.
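The arithmetic behind that ">30 sec" can be sketched in a few lines. This is just the division spelled out, using the sizes and speeds quoted above; the function name is made up for illustration and seeks are ignored.

```python
def load_seconds(size_mb: float, throughput_mb_s: float) -> float:
    """Best-case sequential load time; ignores seeks entirely."""
    return size_mb / throughput_mb_s

STREAMING_SOURCE_MB = 1600    # data the demo streams from (figure from the post)
HDD_COMPRESSED_MB_S = 50      # upper bound quoted in the post
HDD_UNCOMPRESSED_MB_S = 20    # upper bound quoted in the post

print(load_seconds(STREAMING_SOURCE_MB, HDD_COMPRESSED_MB_S))    # 32.0
print(load_seconds(STREAMING_SOURCE_MB, HDD_UNCOMPRESSED_MB_S))  # 80.0
```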

It seems like it's not that big of a problem, and indeed for a demo it isn't. But what about the full game?
The game size is ~40GB and you have 6.5GB of usable RAM, so you cannot load the whole game even if you tried.
So what's left? We can either stream things in, or do a loading screen between each new section.
Let's try the easier approach first: a loading screen.
We have 6.5GB of RAM, and the resident set is ~2GB from the table above (GPU + CPU working set). We need to load 4.5GB each time. At 50MB/sec that's 90 seconds: pretty annoying, and it's the best case. Any time you need to load things non-sequentially, the drive has to seek and the time increases.
You can't go back, as it will re-load things, which means another loading screen.
You can't use more than 4.5GB of assets in a whole gameplay section, or you will need another loading screen.
It gets even more ridiculous if your levels are dynamic: left an item in the previous zone? Load time will increase (the item is not built into the game world, so we load the world, then seek for each item or item group on disk).
Remember Skyrim? Loading into each house? That's what will happen.
So, loading screens are easy, but if your game is not a linear, static, theme-park style attraction it gets ridiculous pretty fast.
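A rough model of the loading-screen case, including the seek penalty for dynamic items. The 4.5GB and 50MB/sec come from the text above; the 12 ms per seek and the item count are purely assumed values to show the shape of the problem, not measurements of any real drive.

```python
def load_screen_seconds(size_mb, throughput_mb_s, seeks=0, seek_ms=12):
    """Sequential transfer time plus a fixed cost per seek.

    seek_ms=12 is a hypothetical HDD seek latency, chosen only for illustration.
    """
    transfer = size_mb / throughput_mb_s
    seeking = seeks * seek_ms / 1000
    return transfer + seeking

# Static level: one sequential 4.5GB read.
print(load_screen_seconds(4500, 50))               # 90.0
# Dynamic level: same data plus 1000 scattered items, one seek each.
print(load_screen_seconds(4500, 50, seeks=1000))   # 102.0
```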

How do we stream then?
We have a chunk of memory (remember the 500MB) that's reserved for streaming things from disk.
With our 50MB/sec speed we fill it up every 10 sec.
So, every 10 sec we can have totally new data in RAM.
Let's do some metrics, for example: how much new shit can we show to the player in 1 min? Easy: 6*500 = 3GB
How much old shit does the player see each minute? Easy again: 1400+450+450+45 = ~2.5GB
So we have roughly 50/50 old to new shit on screen.
Reused monsters? Assets? Textures? NPCs? You name it. You have the 50/50 going on.

But PS4 has 6.5GB of RAM and we've used only 4.5GB so far, what about the other 2GB?
Excellent question!
The answer is: it goes to the old shit. Because even if we increase the streaming buffer to 1.5GB, it still does nothing to the 50MB/sec speed.
With the full 6.5GB we get to 6GB old vs 3GB new in 1 minute. Which is 2:1, old shit wins.

But what about 10 minutes?
Good, good. Here we go!
In 10 min we can get to 30GB new shit vs 6GB old.
And that, my friends, is how games worked last gen.
You as a player were introduced to new gaming moments very gradually.
Or there were some tricks they used: door-opening animations.
Remember Uncharted with all the "let's open that heavy door for 15 sec" moments? That's because new shit needs to load: players need to get to a new location, but we cannot load it fast.
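The old-vs-new numbers for last gen follow directly from the drive speed, since the amount of new data only depends on time and throughput. A small sketch using the 50MB/sec and 6GB resident figures from above (the function name is just for illustration):

```python
def new_data_gb(minutes, stream_mb_s=50):
    """How much fresh data the drive can deliver in the given time."""
    return stream_mb_s * 60 * minutes / 1000

RESIDENT_GB = 6  # "old" data sitting in RAM, per the post

for minutes in (1, 10):
    new = new_data_gb(minutes)
    print(f"{minutes:>2} min: {new:.0f}GB new vs {RESIDENT_GB}GB old")
```

Making the streaming buffer bigger does not change either line: only `stream_mb_s` does.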

So, what about SSDs then?
We will answer that later.
Let's ask something else.

What about 4K?
With 4K the "GPU working set" will grow 4x, at least.
We are looking at 1200*4 = 4.8GB of GPU data.
The CPU working set will also grow (everybody wants better scripts and physics, I presume?) but probably only 2x, to 700*2 = ~1.5GB
So overall the persistent memory will be well over 6GB, let's say 6.5GB.
That leaves us with ~5GB of free RAM on XSeX and ~8GB on PS5.

Stop, stop! Why does PS5 suddenly have more RAM?
That's simple.
XSeX RAM is divided into two pools (logically; physically it's the same RAM): 10GB and 3.5GB.
The GPU working set must use the 10GB pool (it's the memory set that absolutely needs the fast bandwidth).
So 10 - 4.8 = 5.2, which is ~5GB
The CPU working set will use the 3.5GB pool and we will have a spare 2GB there for other things.
We may load some low-frequency data there, like streaming meshes and stuff, but it will be hard to use every frame: accessing that data too frequently will lower the whole system bandwidth to 336GB/sec.
That's why MSFT calls the 10GB pool "GPU optimal".

But what about PS5? It also has some RAM reserved for the system, right? It should be ~14GB usable!
Nope, sorry.
PS5 has a 5.5GB/sec flash drive. That typically loads 2GB in 0.27 sec. Its write speed is lower, but not less than 5.5GB/sec raw.
What PS5 can do, and I would be pretty surprised if Sony doesn't do it, is save the system image to disk while the game is playing.
And thus give almost the full 16GB of RAM to the game.
A 2GB system image will load into RAM in <1 sec (save 2GB of game data to disk in 0.6 sec + load the system from disk in 0.3 sec). Why keep it resident?
But I'm staying on the safe side here. So it's ~14.5GB usable for PS5.

Hmm, can't MSFT essentially do that too?
Yep, they can. The speeds will be less sexy, but not more than ~3 sec, I think.
Why don't they do it? Probably they rely on the OS constantly running in the background for all the services it provides.
That's why I gave Sony 14.5GB.
But I have a hard time understanding why 2.5GB is needed: all the background services can run in a much smaller RAM footprint just fine, and UI stuff can load on demand.
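To keep the RAM budget straight, here is the same estimate written out in whole megabytes. All inputs are the post's own guesses (4x GPU growth, 2x CPU growth, 6.5GB persistent, 14.5GB usable on PS5), so treat the output as a back-of-the-envelope sketch and not a spec.

```python
GPU_WS_PS4_MB = 1200   # GPU working set from the Killzone table
CPU_WS_PS4_MB = 700    # CPU working set from the Killzone table

gpu_ws = GPU_WS_PS4_MB * 4   # assumed 4x growth for 4K
cpu_ws = CPU_WS_PS4_MB * 2   # assumed 2x growth

# XSeX: GPU data lives in the fast pool, CPU data in the slow pool.
XSX_FAST_POOL_MB = 10_000
XSX_SLOW_POOL_MB = 3_500
xsx_free_fast = XSX_FAST_POOL_MB - gpu_ws
xsx_spare_slow = XSX_SLOW_POOL_MB - cpu_ws

# PS5: one pool, using the post's rounded 6.5GB persistent set.
PS5_USABLE_MB = 14_500
PERSISTENT_MB = 6_500
ps5_free = PS5_USABLE_MB - PERSISTENT_MB

print(xsx_free_fast, xsx_spare_slow, ps5_free)   # 5200 2100 8000
```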

Can we talk about SSD for games now?
Yup.
So, let's get to the numbers again.
For XSeX we can divide the ~5GB of "free" RAM into 2 parts: resident and streaming.
Why two? Because typically you cannot load shit into a frame while that frame is rendering.
The GPU is so fast that every time you ask it "what exact memory location are you reading now?", it has to slow down to give you an answer.

But can you load things into other part while the first one is rendering?
Absolutely. You can switch "resident" and "streaming" part as much as you like, if it's fast enough.
Anyway, we got to 50/50 of "new shit" to "old shit" inside 1 second now!
2.5GB of resident + 2.5GB of streaming pool and it takes XSeX just 1 sec to completely reload the streaming part!
In 1 min we have 60:1 of new/old ratio!
Nice!
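The resident/streaming split described here is basically a ping-pong (double) buffer. A toy sketch of the idea, with sets of asset names standing in for real memory; the class and asset names are invented for illustration and say nothing about how any console SDK actually exposes this.

```python
class PingPongPool:
    """Two equal halves: the renderer reads one while the loader fills the other."""

    def __init__(self):
        self.halves = [set(), set()]
        self.render_idx = 0   # index of the half the renderer may read

    @property
    def resident(self):
        return self.halves[self.render_idx]

    @property
    def streaming(self):
        return self.halves[1 - self.render_idx]

    def load(self, assets):
        # The loader only ever writes to the half that is NOT being rendered.
        self.streaming.clear()
        self.streaming.update(assets)

    def swap(self):
        # Called between frames, once the load has finished.
        self.render_idx = 1 - self.render_idx


pool = PingPongPool()
pool.load({"forest_textures"})
pool.swap()
pool.load({"city_textures"})     # fills the idle half while the forest renders
print(sorted(pool.resident))     # ['forest_textures']
pool.swap()
print(sorted(pool.resident))     # ['city_textures']
```

With 2.5GB halves and a drive that refills one half per second, a minute gives 60 refills of 2.5GB against 2.5GB that stays put, which is where the 60:1 figure comes from.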

What about PS5 then? Is it just 2x faster and that's it?
Not really.
The whole 8GB of the RAM we have "free" can be a "streaming pool" on PS5.

But you said "we cannot load while frame is rendering"?
In XSeX, yes.
But in PS5 we have GPU cache scrubbers.
This is a piece of silicon inside the GPU that will reload our assets on the fly while GPU is rendering the frame.
It has full access to where and what GPU is reading right now (it's all in the GPU cache, hence "cache scrubber")
It will also never invalidate the whole cache (which can still lead to GPU "stall") but reload exactly the data that changed (I hope you've listened to that part of Cerny's talk very closely).

But its free RAM size doesn't really matter, we still have 2:1 of old/new in one frame, because the SSD is only 2x faster?
Yes, and no.
We do have only 2x faster rates (although the max rates are much higher for PS5: 22GB/sec vs 6GB/sec)
But the thing is, the PS5 GPU can render from 8GB of game data, and XSeX only from 2.5GB. Remember that we cannot render from the "streaming" part while it loads?
So in any given scene, potentially, PS5 can have 2x to 3x more details/textures/assets than XSeX.
Yes, XSeX will render it faster, higher FPS or higher frame-buffer resolution (not both, perf difference is too low).
But the scene itself will be less detailed, have less artwork.

OMG, can MSFT do something about it?
Of course they will, and they do!
What are the XSeX advantages? More ALU power (FLOPS), more RT power, more CPU power.
What MSFT will do: rely heavily on this power advantage instead of the artwork: more procedural stuff, more ALU used for physics simulation (remember, RT and lighting are physics simulation too, after all).
More compute and more complex shaders.

So what will be the end result?
It's pretty simple.
PS5: relies on more artwork and pushing more data through the system. Potentially 2x performance in that.
XSeX: relies more on in-frame calculations, procedural. Potentially 30% performance in that.
Who will win: dunno. There are pros and cons for each.
It will be a fun generation indeed. Much more fun than the previous one, for sure.

But what about this?

"Enter Xbox Velocity Architecture, which features tight integration between hardware and software and is a revolutionary new architecture optimized for streaming of in game assets. This will unlock new capabilities that have never been seen before in console development, allowing 100 GB of game assets to be instantly accessible by the developer. The components of the Xbox Velocity Architecture all combine to create an effective multiplier on physical memory that is, quite literally, a game changer."


How many games have you programmed in your life?
 

psorcerer

Banned
But what about this?

"Enter Xbox Velocity Architecture, which features tight integration between hardware and software and is a revolutionary new architecture optimized for streaming of in game assets. This will unlock new capabilities that have never been seen before in console development, allowing 100 GB of game assets to be instantly accessible by the developer. The components of the Xbox Velocity Architecture all combine to create an effective multiplier on physical memory that is, quite literally, a game changer."

Dunno, what about it?
I pointed that out on the day it was announced.
Without that my OP would be much more grim for XSeX.
 

hyperbertha

Member
Getting data exactly at the next frame seems pretty tricky.
It means you will either need to sync SSD to GPU every frame: bad performance.
Or live with a possibility of dropped LoD (you know, when next LoD switches too late and you have a visual pop-in).
How often that pop-in would occur depends solely on other things your game does.
Probably you can minimize that, but in a generic case: it's hard.
So from a practical standpoint, in most cases the 8 GB of new data is going to be what's going to be shown after a second of play time? Or can they selectively load data as it loads into the memory on a first come first served basis for instance whatever loads at 0.25th of a second?
 

psorcerer

Banned
Or can they selectively load data as it loads into the memory on a first come first served basis for instance whatever loads at 0.25th of a second?

Yep, something like that.
Or you can do a sync point each 4 frames or each 10 frames.
Or you can do it in a CPU thread, if you use CPU to submit draw calls to GPU. (RDNA2 can use GPU to feed itself without CPU involvement)
 

hyperbertha

Member
Yep, something like that.
Or you can do a sync point each 4 frames or each 10 frames.
Or you can do it in a CPU thread, if you use CPU to submit draw calls to GPU. (RDNA2 can use GPU to feed itself without CPU involvement)
A sync point would mean syncing the GPU and memory every 4 frames? Why 4 frames?
And the CPU thread solution seems promising but how much data can be passed in a single CPU thread?
 

psorcerer

Banned
A sync point would mean syncing the GPU and memory every 4 frames? Why 4 frames?
And the CPU thread solution seems promising but how much data can be passed in a single CPU thread?

In the end what you need is this: your GPU, which runs the current frame, needs to know what to render.
For example, it knows that the texture has MIP level 1 and it renders that.
Now the player comes closer. How do we know? We read input from the player (on the CPU) and place that input into some memory location: new player coordinates.
We can also calculate the distance to objects (it's pretty complex in itself, because the CPU cannot really know the locations of all objects, otherwise it would be ray tracing all over again on the CPU), so let's pretend it knows it somehow.
The CPU determines that the GPU needs to use MIP 0, so it gives a command to the SSD: load me MIP 0 and place it here (pointer to a texture object in RAM).
Then it needs to give a command to the GPU: render me this mesh using that texture with MIP 0.
But we don't know when the SSD will finish the load.
And it also doesn't know when GPU will finish the frame.
Or any other current job it has.
So the simplest method is to use a semaphore: the CPU places a special token into the GPU command buffer that says: when you reach this point in your rendering pipeline, call me, please.
The simplest semaphore would be: at the end of the frame.
Now when the GPU finishes the frame, the CPU gets called and checks if the texture is there.
If it's already loaded, the next frame will have that MIP 0 texture.
If not, it may sleep on the semaphore again.
Now, most of the time you don't want to place too many semaphores, because it means the GPU will call the CPU a lot instead of doing its normal job: rendering.
So you can place it each 2 frames, or 4 frames.
But.
You can also place it inside the frame. If you have total control of your pipeline, you can even do that in such a way that the GPU will not stall. For example: some GPU warp (group of threads) is waiting to fetch a texture from RAM, so you insert the semaphore there and another warp runs that calls you on the CPU.
This shit is complicated. I hope it makes sense.
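The trade-off between "sync every frame" and "sync every N frames" can be shown with a tiny frame-counting simulation. It is a deliberately simplified model, not real GPU code: the load is assumed to finish during a known frame, the CPU only checks at sync points, and the new MIP shows up on the frame after a successful check. The function name and parameters are made up for this sketch.

```python
def first_frame_with_new_mip(load_done_frame: int, sync_every: int) -> int:
    """Frame number on which the renderer first uses the newly loaded MIP.

    load_done_frame: frame during which the SSD finishes the load.
    sync_every: the CPU is called back at the end of every N-th frame.
    """
    frame = 0
    while True:
        frame += 1                               # GPU renders this frame
        at_sync_point = frame % sync_every == 0  # semaphore fires here
        if at_sync_point and frame >= load_done_frame:
            return frame + 1                     # next frame uses MIP 0


for n in (1, 2, 4):
    print(n, first_frame_with_new_mip(load_done_frame=5, sync_every=n))
# prints:
# 1 6
# 2 7
# 4 9
```

Fewer sync points mean fewer GPU-to-CPU callbacks, but the texture can sit in RAM unused for a few frames, which is exactly the late LoD switch (pop-in) discussed earlier in the thread.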
 

LordKasual

Banned
NXGamer's breakdown of the PS5 was EXTREMELY informative from a practical use situation, and OP's thread also helps put it into perspective.

Good work explaining this tech to everyone.

The problem with Sony's conference is that it was a consumer presentation that was almost 80% just meant for developers.

This is why the difference in reaction between consumers and developers is so ridiculously massive with PS5.
 

PocoJoe

Banned
Really interesting text, thanks!

I still fear that this "but it is only 2x faster in loading times, whatever" ignorance won't end.

I was a bit disappointed yesterday, but now I truly see Cerny's vision of what next gen should be.

It can't be about raw computational power gen after gen if we want the NEXT GEN feeling, and it looks like this is the next step (super fast SSD). We are already at the point where games won't look multiple times better, so adding 100 TFLOPS with a slow HDD would not do it.

This + 3D audio, can't wait to see and hear it.
 

Lone Wolf

Member
Would a "SSD of the Year" lyrics textblock help?
 

Jigsaah

Member
This shit is way over my head. What are you actually trying to say here. In a couple of sentences. My feeling is that you're saying it's a toss up? Based on pros and cons of each system's setup?
 

psorcerer

Banned
This shit is way over my head. What are you actually trying to say here. In a couple of sentences. My feeling is that you're saying it's a toss up? Based on pros and cons of each system's setup?

Let's say that in the grand scheme of next-gen gaming it's hard to predict where it will end up.
But I would say that for multiplatform games XSeX looks more straightforward.
On the other hand I don't think that Cerny is lying about the developer support, so maybe they already convinced everybody that PS5 is cool.
 

Pallas

Member
Xbox's Kinect was an embarrassment.

At least devs around the world are talking about the SSD in a positive way.

For gaming maybe but Kinect has better, more practical usages that are far from embarrassment.

If anything I’d say the SSD sauce is more like the Cell processor from the PS3, hyped up to deliver out of this world experience. Let’s hope it won’t be as difficult.
 

yurinka

Member
But what about PS5? It also has some RAM reserved for the system? It should be ~14GB usable!
Nope, sorry.
PS5 has a 5.5GB/sec flash drive. That typically loads 2GB in 0.27 sec. Its write speed is lower, but not less than 5.5GB/sec raw.
Congrats, awesome OP! But I think you forgot something: Cerny said game data in PS5 SSD and Bluray data will be compressed with Kraken and that the I/O system has different things to avoid bottlenecks including decompressing that data by dedicated hardware and placing it into memory, turning that 5.5GB/s into 8-9GB/s. So PS5 numbers will be even better.
 
Ayyee my bro, not trying to be a negative Nancy, just a realist. Jim felt the same as you though, always thought I brought the mood down. I was just trying to keep expectations in check.
The Kinect was pretty damn advanced for its time, but I don't think the SSD will end up like the Cell processor, because it's a technology that has existed for a decade now.

The PS5 just has a very unique type of SSD with six levels of prioritization to eliminate the bottlenecks.
 
Moving the system image from RAM to SSD (OS image to SSD) Did I understand that right? Because if that's so, OP wants to hibernate our consoles
 

psorcerer

Banned
Congrats, awesome OP! But I think you forgot something: Cerny said game data in PS5 SSD and Bluray data will be compressed with Kraken and that the I/O system has different things to avoid bottlenecks including decompressing that data by dedicated hardware and placing it into memory, turning that 5.5GB/s into 8-9GB/s. So PS5 numbers will be even better.

Not really. If you divide 2GB by 0.27 sec, you get more than 5.5GB/sec. And that's the number I do use. 🙂
 
Data per frame is much more important than fps and resolution; they are both 4K machines anyway. PlayStation's SSD is awesome and we all expected it, but fucking hell, all the complications happened because of the stupid low 16GB, and god knows it'll be a bottleneck. My preferred pool was 24-32GB at least, not 16.
 

psorcerer

Banned
Data per frame is much more important than fps and resolution; they are both 4K machines anyway. PlayStation's SSD is awesome and we all expected it, but fucking hell, all the complications happened because of the stupid low 16GB, and god knows it'll be a bottleneck. My preferred pool was 24-32GB at least, not 16.

There's never enough RAM.
And with the console price constraints even more so.
Need to deal with it.
 

Shmunter

Member
2nd Bonus round:

But, but I swapped HDD to SSD in my PS4 and nothing changed! What's going on?
You see. Let's return to that Killzone example.
We have that 500MB streaming buffer and we load it with new data in 10 sec, on HDD.
Now we swap in an SSD with 500MB/sec and load that buffer in 1 sec!
But, guess what, the game was not designed around that.
The game was designed to demand and use that data in the next 10 sec, not right now.
So, no matter how fast your SSD is, it will not change anything.
I would say in a properly designed game it will even make things worse.
Why? Because you used 500MB/sec of memory bandwidth right now for data that won't be needed until the game requires it 10 sec later: you wasted 500MB/sec for nothing instead of giving it to the GPU!

But how do I know if a game was designed around SSD?
Simple. It will not work on HDD, like at all.
Like giving you 1fps, 0.2fps, a complete slideshow.
Unless it behaves like that, it's not a game designed for SSD.
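One way to see why the drive swap changes nothing: what the player notices is not load time but stall time, i.e. how much longer the load takes than the lead time the game budgeted for it. A sketch with the numbers from the bonus round; the 1 sec lead time for an "SSD-designed" game is an assumed value for illustration.

```python
def stall_seconds(buffer_mb, drive_mb_s, lead_time_s):
    """Seconds the game has to wait if it requests `buffer_mb`
    `lead_time_s` seconds before it actually needs the data."""
    return max(0.0, buffer_mb / drive_mb_s - lead_time_s)

# Game designed around HDD: asks for data 10 sec ahead.
print(stall_seconds(500, 50, 10))    # 0.0  (HDD keeps up)
print(stall_seconds(500, 500, 10))   # 0.0  (SSD is no better)

# Game designed around SSD (assumed 1 sec lead time), run on HDD.
print(stall_seconds(500, 50, 1))     # 9.0  (slideshow territory)
```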

Games targeting ssd could still work on slower solutions by scaling the asset quality

E.g PS5 = full Rez textures
XsX = half rez textures
Hdd = a few pixels
 

yurinka

Member
Not really. If you divide 2GB by 0.27 sec, you get more than 5.5GB/sec. And that's the number I do use. 🙂
Mmmm... that is 7.4GB/s. Cerny mentioned loading 2GB in 0.27s from the SSD, but 7.4GB/s doesn't match what he said about 5.5GB/s (raw), which after decompression by the custom hardware decompressor turns into 8-9GB/s of data, and up to 22GB/s if the data is particularly well compressed.

I assume that since the decompressed rate varies depending on what type of data is compressed and how it is compressed, he may have chosen a conservative case, which would be ~7.5GB/s for that example, while the most typical case would be between 8 and 9GB/s.
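The numbers in this exchange line up once you treat the effective rate as raw rate times compression ratio. A quick check using only the figures quoted in the thread (5.5GB/s raw, 2GB in 0.27 sec, 22GB/s peak); the helper names are invented for this sketch.

```python
RAW_GB_S = 5.5   # raw SSD read speed quoted in the thread

def effective_gb_s(raw_gb_s, compression_ratio):
    """Data delivered to RAM per second after hardware decompression."""
    return raw_gb_s * compression_ratio

def implied_ratio(size_gb, seconds, raw_gb_s):
    """Compression ratio needed to deliver size_gb in the given time."""
    return size_gb / seconds / raw_gb_s

print(round(2 / 0.27, 1))                          # 7.4
print(round(implied_ratio(2, 0.27, RAW_GB_S), 2))  # 1.35
print(effective_gb_s(RAW_GB_S, 4))                 # 22.0
```

So "2GB in 0.27 sec" implies roughly 1.35:1 compression, the 8-9GB/s range implies about 1.5:1 to 1.6:1, and the 22GB/s peak needs 4:1.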
 

psorcerer

Banned
Mmmm... that is 7.4GB/s. Cerny mentioned loading 2GB in 0.27s from the SSD, but 7.4GB/s doesn't match what he said about 5.5GB/s (raw), which after decompression by the custom hardware decompressor turns into 8-9GB/s of data, and up to 22GB/s if the data is particularly well compressed.

I assume that since the decompressed rate varies depending on what type of data is compressed and how it is compressed, he may have chosen a conservative case, which would be ~7.5GB/s for that example, while the most typical case would be between 8 and 9GB/s.

Yup, it's kind of a safe bet.
Their target was 10GB/sec it seems. And it probably can reach 10GB/sec in real workloads.
 

yurinka

Member
Yup, it's kind of a safe bet.
Their target was 10GB/sec it seems. And it probably can reach 10GB/sec in real workloads.
Yes, because in the talk he also mentions 'roughly 100x faster than PS4 HDD'. And 100x faster would be these 10GB/s instead of the 8-9GB/s.

Edit: when watching that portion of the talk again, I see (starts at 7:29) that this "2GB read in 0.27 sec" was their target.
 
There's never enough RAM.
And with the console price constraints even more so.
Need to deal with it.
32GB was the minimum I would accept, plus the fast SSDs as compensation this gen, but 16GB is too low. I'd rather buy an expensive console and live with it for the next 6 years than buy a cheap 16GB one. What's the point of a PS5 anyway? It's just bogus!
 
I think "hibernate" will be there from day 1.
I want to do something like "suspend" when a game is running.
Mm, you can't do that. When you suspend, you are only powering off everything except the RAM.
Also, you can't remove the OS from RAM, everything else stops working. You always need the OS in RAM because the OS is the system administrator. Any program (games in this case) that's launched has to ask the OS for resources and other things.
 

psorcerer

Banned
Also, you can't remove the OS from RAM

You can. It's called "hypervisor".
If your OS runs as yet another software under the same hypervisor.
And hypervisors can have pretty small footprint in memory.
But I've already mentioned that it depends how many services you want to run in background.
If "everything" then obviously the whole OS runs all the time.

When you suspend, you are only powering off everything except the RAM

That's too literal a reading of what I meant.
 

hyperbertha

Member
In the end what you need to have is: your GPU, that runs current frame needs to know what to render.
For example, it knows that the texture has MIP level 1 and it renders that.
Now player comes closer. How do we know, we read input from player (on CPU) and place that input into some memory location: new player coordinates.
We also can calculate the distance to objects (it's pretty complex in itself, because CPU cannot really know the locations of all objects, otherwise it would be a ray tracing all over again on CPU), so let's pretend it knows it somehow.
It determines that GPU needs to use MIP 0, so it gives command to the SSD: load me a MIP 0 and place it here (pointer to a texture object in RAM)
Then it needs to give a command to GPU: render me this mesh using that texture with MIP 0.
But we don't know when the SSD will finish the load.
And it also doesn't know when GPU will finish the frame.
Or any other current job it has.
So the simplest method is to use semaphore: it places a special token into GPU command buffer that says: when you reach this point in your rendering pipeline, call me, please.
Simplest semaphore would be: at the end of the frame.
Now when GPU finishes the frame, CPU gets called, it checks if texture is there.
If it's already loaded, the next frame will have that MIP 0 texture.
If not, it may sleep on semaphore again.
Now, most of the time you don't want to place too many semaphores, because it means that the GPU will call the CPU a lot, instead of doing its normal job - rendering.
So you can place it each 2 frames, or 4 frames.
But.
You can also place it inside the frame. If you have total control of your pipeline, you can even do that in such a way that the GPU will not stall. For example: some GPU warp (group of threads) is waiting to fetch a texture from RAM, so you insert the semaphore there, and another warp runs that calls you on the CPU.
This shit is complicated. I hope it makes sense.
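
A minimal sketch of the end-of-frame check described above, written as a plain Python simulation rather than real GPU code. Everything here is an assumption made for illustration: the function name, the idea that the load "finishes during frame N", and the `check_every` knob standing in for "place the semaphore every 2 or 4 frames". No real graphics API is involved.

```python
def first_frame_with_mip0(load_done_frame: int,
                          check_every: int = 1,
                          max_frames: int = 64) -> int:
    """Return the index of the first frame rendered with MIP 0, or -1.

    load_done_frame: frame during which the SSD finishes loading MIP 0.
    check_every:     the GPU signals the CPU only at the end of every
                     check_every-th frame (fewer signals, less overhead).
    """
    current_mip = 1  # we start by rendering the lower-detail MIP 1
    for frame in range(max_frames):
        if current_mip == 0:
            return frame
        # ... the GPU renders this frame with current_mip ...
        signalled = (frame + 1) % check_every == 0
        if signalled and frame >= load_done_frame:
            # The CPU woke up, saw the texture in RAM, and tells the GPU
            # to use MIP 0 starting with the next frame.
            current_mip = 0
        # Otherwise the CPU goes back to sleep on the semaphore.
    return -1


print(first_frame_with_mip0(load_done_frame=2, check_every=1))
print(first_frame_with_mip0(load_done_frame=2, check_every=4))
print(first_frame_with_mip0(load_done_frame=4, check_every=4))
```

The trade-off shows up in the third call: checking less often costs fewer interruptions, but a load that lands just after a check has to wait for the next one before the sharper MIP appears.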
This really clears many things up.
From what I understand, the CPU knows which textures/assets need loading depending on player proximity, uses semaphores to know when to check for new data in RAM, and then tells the GPU to render that asset.
Also, semaphores have absolutely nothing to do with which assets currently need loading. They are just a command that gets executed every time the GPU ends a frame (for example) to notify the CPU, so the GPU can then render whatever the CPU has loaded into RAM from the SSD, correct? Is this the solution you expect most games to employ going forward?

Also, instead of proximity LOD, in the scenario where the player is turning his camera and new assets need to load in as he turns, the same technique can be used, right? For proximity, it's sensible that there are just a few assets loading in as the player moves, but when you turn the camera there's suddenly a whole truckload of new assets that you need to stream in. I watched the Horizon video, and Cyberpunk 2077 takes this to even more of an extreme, where it's loading in new assets not just for horizontal camera rotation but for vertical rotation too. So what you see when you turn the camera upwards is entirely new content just loaded in, discarding all the data that was present when you were looking straight. Do you think both these games use the solution mentioned above?
Also when there needs to be a huge number of assets that need streaming in one frame, how does it affect performance? I'm assuming it doesn't affect performance at all.

Also, I'd like a more in-depth explanation of the GPU cache scrubber present in the PS5. This seems to be the main key in its arsenal, and what most people here seem to be missing when they say things like ''but the xbox has SSD too''. Clearing this thing up could help with a lot of the issues here, I think.
 

HeadsUp7Up

Member
Thanks for this, now I’m hyped again for next gen!

Still on the fence about buying PS5 at release, though. Going to have to wait to hear reports about heat and noise first after it’s released. Still, this write-up has me anticipating getting both this year now.
 

psorcerer

Banned
Is this the solution you expect most games to employ going forward?

Dunno. MSFT SFS seems like a different solution.
The whitepaper on SFS is pretty long but in the end it's something like: load the whole texture, calculate which parts are used by sampling what GPU renders, unload all the parts that are not rendered, hence the "feedback" name.
PS5 solution is more "give developers control" and XSeX: "do the work for them". Which is kind of a recurring theme in DirectX vs GNM libraries both vendors use anyway.

but vertical rotation too

Yup. CP2077 has a lot of verticality, because it models a city, you need to do 3D space LoDs there.
But it's still a "current gen" game. I think they won't be using new tech too much.

Also I'd like a more in depth explanation of the GPU cache scrubber present in the PS5

We don't know exactly how it works. And probably won't know soon. The exact performance of these will be under NDA, as usual.
But in a nutshell it should be similar to this:
When GPU renders something, it uses the caches. I.e. data goes VRAM->cache->GPU registers.
Let's say we rendered an object with one texture. After the render is finished, some texels from the texture are still in the cache.
If we render it again, or render a different object with that texture, the cache will save us a trip to RAM.
But now, we want to load another texture instead of that one.
Without scrubbers nothing will tell the cache to reload the data from VRAM, and it will happily render the old texels from cache next time it is asked to do it.
That's why scrubber goes to all the caches that keep that texture data and marks these cache lines "invalid". Then the cache knows to reload it next time the texture is needed.
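
To make the stale-texel problem concrete, here is a toy model. It is emphatically not how the PS5 hardware works (as said above, nobody outside NDA knows the details): the class, its method names, and the dict standing in for VRAM are all invented for illustration. It only shows why something has to invalidate cache lines when a texture is replaced in memory.

```python
class ToyTextureCache:
    """Toy read-through cache in front of a dict standing in for VRAM."""

    def __init__(self, vram: dict):
        self.vram = vram
        self.lines = {}  # address -> cached texel

    def read(self, addr: int) -> str:
        if addr not in self.lines:       # miss: fetch the texel from VRAM
            self.lines[addr] = self.vram[addr]
        return self.lines[addr]          # hit: returned as-is, may be stale

    def scrub(self, addrs) -> None:
        """Invalidate the given addresses so the next read refetches them."""
        for addr in addrs:
            self.lines.pop(addr, None)


vram = {0: "old_texel"}
cache = ToyTextureCache(vram)

first = cache.read(0)      # miss, pulls "old_texel" into the cache
vram[0] = "new_texel"      # streaming overwrites the texture in memory
stale = cache.read(0)      # hit, still returns the old texel
cache.scrub([0])           # scrubber invalidates only the affected line
fresh = cache.read(0)      # miss again, now picks up the new texel

print(first, stale, fresh)
```

The point of scrubbing selected addresses, rather than flushing the whole cache, is that everything unrelated to the replaced texture stays cached.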
 

Shmunter

Member
You can. It's called "hypervisor".
If your OS runs as yet another software under the same hypervisor.
And hypervisors can have pretty small footprint in memory.
But I've already mentioned that it depends how many services you want to run in background.
If "everything" then obviously the whole OS runs all the time.



That's too literal a reading of what I meant.
Even then, services would take a small amount of RAM. The UX takes lots of RAM, and that can be swapped out as it’s assets, and swapped back in quickly from the SSD when needed. So there are savings.

But what this whole convo misses, and I think some analysts also missed, is that the game DVR video file sits in RAM. That’s the biggest allocation that is reserved as OS. I don’t believe you would use the SSD for this, as it’s recording all the time and only dumps chunks to storage once in a while. Unless SSD is OK for this sort of thing now??
 

Goliathy

Banned
But what about this?

"Enter Xbox Velocity Architecture, which features tight integration between hardware and software and is a revolutionary new architecture optimized for streaming of in game assets. This will unlock new capabilities that have never been seen before in console development, allowing 100 GB of game assets to be instantly accessible by the developer. The components of the Xbox Velocity Architecture all combine to create an effective multiplier on physical memory that is, quite literally, a game changer."


How many games have you programmed in your life?

Sounds good. I think that closes the gap compared to the PS5 SSD.
 