
Xbox Velocity Architecture - 100 GB is instantly accessible by the developer through a custom hardware decompression block

Panajev2001a

GAF's Pleasant Genius
I'm not sure what you mean. First, it wasn't specifically PRT, it was SFS and the specific features that the twitter user mentioned. I see you keep resorting to just saying PRT, but I don't see the point; if that is part of a larger feature there would be no need to single it out.

Second, I thought the guy on twitter that alluded to the effective 2x to 3x increase in memory/bandwidth worked for MS? Sure, MS hasn't officially said this yet, but they also said they would talk more about it later. If you are claiming that isn't what the twitter guy (I forget his name at the moment, forgive me) was actually saying, what do you think he was talking about?

My point was that the baseline was always the thing missing from the guy’s comments, from the original articles, and from the poster’s comments.

In the other comments the MS engineers made, in the tech presentation linked about Sampler Feedback itself (stating it was essentially a HW refinement of a feature already available and in use, hence why they also call it PRT+), and in what anyone (including the MS engineer in question) could cite as what SFS adds on XSX on top of “basic” SF (a HW blending mode to smooth out page-miss scenarios where the texture data still needs to be streamed in)... I do not see the basis for 2-3x or more I/O bandwidth and memory storage savings (reducing RAM consumption to up to 1/4th) on top of what you could get on RDNA1 or Polaris/Vega based GPUs that already implement a virtual texturing scheme.
It seems odd and frankly smells like snake oil if that were what was implied and not a misunderstanding.
 
Last edited:

ZywyPL

Banned
I think the whole 2-3x multiplier comes from misinterpretation:

As textures have ballooned in size to match 4K displays, efficiency in memory utilisation has got progressively worse - something Microsoft was able to confirm by building in special monitoring hardware into Xbox One X's Scorpio Engine SoC. "From this, we found a game typically accessed at best only one-half to one-third of their allocated pages over long windows of time," says Goossen. "So if a game never had to load pages that are ultimately never actually used, that means a 2-3x multiplier on the effective amount of physical memory, and a 2-3x multiplier on our effective IO performance."


So basically what Andrew Goossen said is the same as what Cerny said during his presentation - that up until now the consoles had to store entire levels, or at least huge parts of the gameplay, in RAM, and now thanks to SSDs they will be able to store only a small bit of the nearest, upcoming section and stream the remaining parts on the fly, when needed. In other words, with HDDs the consoles would have to fit an entire level within 13GB, and now they have 13GB for just a chapter/checkpoint, effectively expanding the overall size of the entire level.
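
A quick back-of-the-envelope way to see where the 2-3x figure comes from, plugging in the utilization numbers from the Goossen quote above (purely illustrative, not data from any real game):

```cpp
#include <cstdio>

int main() {
    // Goossen: games typically touch only one-half to one-third of the
    // texture pages they keep resident over long windows of time.
    const double utilization[] = {0.5, 1.0 / 3.0};
    const double residentGB = 10.0; // hypothetical RAM set aside for assets

    for (double u : utilization) {
        // If only the touched pages had to be resident (and moved over the
        // I/O path), the same RAM and bandwidth would stretch 1/u as far.
        std::printf("utilization %.2f -> effective multiplier %.1fx "
                    "(%.0f GB behaves like %.0f GB)\n",
                    u, 1.0 / u, residentGB, residentGB / u);
    }
    return 0;
}
```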
 

phil_t98

#SonyToo
Final Fantasy is a current gen game though. Not saying that 100GB will be needed the whole time but I would assume it’s a step up from this gen at the very least, considering the baseline of textures and such isn’t 1080p (XBO/base PS4).

Agree it's a current gen game but we are jumping from 100MB/s transfer speeds to at least 2.4GB/s, which is a massive jump.
 

oldergamer

Member

One of the examples, maybe not the best, but I think I was not exactly the condescending one in this scenario, albeit I can see how I was sounding less and less nice and calm with each additional page this went on. That is fair to say. I am not going to go through the thread with a fine-tooth comb though...

Yeah, no expectation of that on my part, the thread is too long for that.



I do not think there is a rule prohibiting re-exploring a point when it is a) not settled yet presented as a fact and b) the starting point for other discussion in the same thread which would build upon it. Speculative discussion should not be discouraged, on that we agree. Debate on it should not be either. This is not a place for or against any box, and I do not think it is a place to try to divert narratives or try to create new ones with sometimes disingenuous arguments: anyway, the beauty of free speech and open discussion and all that.

Sure agreed.


This is fair, I can at least re-read my posts and check on my tone. Right now at least I disagree with your assessment though; I still do not think it was so improper / nasty that it would disqualify the rest of what I was saying, but introspection does not hurt and I may be very wrong on this, so honestly apologies if so.
Ok let's forget about it, these things happen and I've done the same. I actually should change my name to grumpy oldergamer to be honest, my responses sometimes are just plain cranky. It's cool to move back to the tech aspects. I'm still curious about this 100GB of the SSD that is "instantly accessible" and whether it is indeed close to the video cards with an SSD attached.
 

Dodkrake

Banned
Nah, I am talking about power budget, which is the reason here.
Where did Mark/Sony say indefinitely btw?

Modern processors are complicated enough that seeing the on-screen frequency is not indicative. They can boost to 2.2GHz@100W but the rendered output may not be as good as when it is running at 2GHz@120W, etc.

Hence we use power/TDP. This is the limitation on both consoles, by their cost & design, so there is your cap.
Besides, if both consoles run at sustained peak, Series X will still deliver better results! :messenger_bicep:

1. In the road to PS5 presentation

Starts at 36:50



And here


There's enough power that both CPU and GPU can potentially run at their limits of 3.5GHz and 2.23GHz, it isn't the case that the developer has to choose to run one of them slower.

2. This is not the same as boost clocks on laptop and desktop CPUs

3. You don't know if it will deliver better results. In theory it will, but you have no proof of that, so either show it or stop spreading nonsense
 

Thirty7ven

Banned
I’ve now watched the cooling segment a couple of times. From what I understand, the default clocks of the PS5 are:

- CPU 3.5Ghz
- GPU 2.23GHz

Downclocking is then workload dependent. Because some rare, specific workloads will try to exceed TDP limits, the GPU will lower its frequency slightly, but if in that rare case the CPU is being underutilized, it will use SmartShift to send power to the GPU.

At least this is how I understand it.
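
For what it's worth, here is a toy sketch of that reading of it. Every name and number below is made up purely to show the "fixed power budget, variable clocks" idea; it is not anything Sony has documented:

```cpp
#include <algorithm>

// Hypothetical, heavily simplified model of a shared CPU+GPU power budget.
struct PowerBudget {
    double totalWatts;      // fixed cap set by the cooling/PSU design
    double cpuDemandWatts;  // what the current CPU workload wants
    double gpuDemandWatts;  // what the current GPU workload wants at max clock
};

// Returns the GPU clock for this workload. Real clock/power scaling is not
// linear and the real algorithm is not public; this only shows the shape:
// run at 2.23GHz in the common case, and shave a bit off only when the
// combined demand would bust the budget, using whatever power the CPU left over.
double gpuClockGHz(const PowerBudget& b, double maxClockGHz = 2.23) {
    const double gpuAllowance =
        b.totalWatts - std::min(b.cpuDemandWatts, b.totalWatts);
    if (b.gpuDemandWatts <= gpuAllowance)
        return maxClockGHz;                                   // typical case
    return maxClockGHz * (gpuAllowance / b.gpuDemandWatts);   // rare downclock
}
```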
 

longdi

Banned
1. In the road to PS5 presentation

2. This is not the same as boost clocks on laptop and desktop CPUs

3. You don't know if it will deliver better results. In theory it will, but you have no proof of that, so either show it or stop spreading nonsense

SmartShift will deliver better results for constrained systems. Same reason why you see SmartShift and Optimus only in laptops and not desktops. Full sustained load always wins. I reckon with SmartShift there are even overheads from having a power-monitoring counter running constantly.

Joining the dots on why Sony needed SmartShift: either they built a strangely compact system or they have reached the limits of how much power/cooling they have.

Which ties back to the surprisingly high GPU clocks and the murmurs that the Series X's big die surprised Sony.
 

THE:MILKMAN

Member
SmartShift will deliver better results for constrained systems. Same reason why you see SmartShift and Optimus only in laptops and not desktops. Full sustained load always wins. I reckon with SmartShift there are even overheads from having a power-monitoring counter running constantly.

Joining the dots on why Sony needed SmartShift: either they built a strangely compact system or they have reached the limits of how much power/cooling they have.

Which ties back to the surprisingly high GPU clocks and the murmurs that the Series X's big die surprised Sony.

Ultimately both Sony and Microsoft compromised. Consoles always have to. Microsoft outright stated they did in the DF article.

I'm pretty confident both will show great games despite any compromises and that is all we should care about.
 

Dodkrake

Banned
SmartShift will deliver better results for constrained systems. Same reason why you see SmartShift and Optimus only in laptops and not desktops. Full sustained load always wins. I reckon with SmartShift there are even overheads from having a power-monitoring counter running constantly.

Every single desktop, laptop and mobile device is constrained to a power envelope. That's why different PSU's exist for desktop computers. And this tech is new, rumored to be applied to desktop level components in the future.

Also, stop with the nonsense of "full sustained loads". Neither PS5 nor Xbox Series X will be running fully sustained loads. You never do, that's why the figures are theoretical maximums. The difference is that the PS5 will downclock / move power to another component, while the Xbox will keep its clock speed the same.

Desktop computers already do this and have done so for more than a decade.

Joining the dots on why Sony needed SmartShift: either they built a strangely compact system or they have reached the limits of how much power/cooling they have.

Explained in the video I quoted, which you seem to conveniently ignore because you are just trolling at this point.

Which ties back to the surprisingly high GPU clocks and the murmurs that the Series X's big die surprised Sony.

And I bet it didn't, those rumors were started by the Discord group and ignorant posters that don't have any clue how HW and SW development cycles work. Sony went with a different paradigm that matches with AMD's timeline for their APU's.

As for high clock speeds, we don't know how high RDNA2 can go. If Cerny's words are anything to go by, it looks like they needed to cap GPU frequency at 2.23GHz. This means it can go higher.

So, all in all, I want to apologize for the offtopic and will be concluding this argument. There's simply no point in beating a dead horse of rehashed bull dung.
 
Are some people actually belittling a college student/developer simply for putting out an idea some people don't like? I might not've agreed with everything the Crytek dev was saying but at least I acknowledged they had some good points in there and some of the things they mentioned did seem plausible (like PS5 being relatively easier to develop for, which was more or less his entire point).

So I'm gonna take a moment and summarize my thoughts on how PS5 and XSX's SSD I/O systems most likely work, given the evidence we have and some speculation on top of that.

PS5:

[Image: diagram of the PS5's custom I/O block from the Road to PS5 presentation]


This is the PS5's custom I/O block, shown in Road to PS5. Based on the diagram, it sends data from itself to the main system GDDR6 memory pool, meaning it shares the memory bus with the CPU and GPU to RAM. Which means, most likely, when the I/O block is sending or retrieving data from RAM, the CPU and GPU are not accessing RAM, similar to how when the CPU is accessing RAM the GPU has to wait and vice-versa (this same bus contention is present on XSX, since on APUs the memory bus is shared between the different components).

Most likely, the I/O block is not sending data through a direct link to the GPU; if that were the case it is a feature Cerny would have mentioned at Road to PS5, since it's a pretty critical feature to have. At the very least, that would have been a feature alluded to, so I'm now of the belief that is not a method PS5 is utilizing to transmit data from NAND through to other parts of the system. This is partly why the SSD is so fast at 5.5 GB/s for raw data and 8-9 GB/s for typical compressed data, and up to 22 GB/s for particularly "well-compressed" data; if the I/O block is sending and receiving to/fro RAM and the CPU/GPU can't access during that time, you would want the I/O block to finish its operations as fast as possible so those other components can access the bus quicker. This is beneficial even if the I/O block is only sending a few megabytes worth of data at any given time.

That's essentially the basics of how PS5's I/O system functions. The CPU still communicates with the I/O block to instruct other parts of the block what to do, but that's about it. Compression/Decompression etc. are all handled on the I/O block.
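
If that picture is right, the programming model would look roughly like the sketch below. To be clear, every type and function name here is invented for illustration (Sony's actual API is not public); the only point is that the CPU's job shrinks to "ask for an asset", while NAND reads, hardware decompression and the DMA into unified RAM are handled by the I/O block:

```cpp
#include <cstddef>
#include <cstdint>

// Everything below is hypothetical. It only sketches the flow described
// above: the CPU hands off a request, the I/O block does the rest.
struct IoRequest {
    uint64_t assetId;     // which asset to fetch (made-up ID)
    void*    destInRam;   // where in unified RAM the I/O block should land it
    size_t   size;        // decompressed size
    bool     done = false;
};

// Stubs standing in for "hand the request to the I/O block" and
// "poll its completion flag"; on real hardware these would not be CPU code.
void requestAsset(IoRequest& req) { req.done = true; }
bool assetReady(const IoRequest& req) { return req.done; }

void streamNextChunk(void* ramDestination) {
    IoRequest req{42, ramDestination, 8u * 1024u * 1024u};
    requestAsset(req);                       // CPU involvement ends here
    while (!assetReady(req)) { /* keep doing other frame work */ }
    // Data is now sitting in RAM already decompressed: no CPU-side memcpy
    // or inflate step, and the bus was only held by the I/O block.
}
```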

XSX:

GPU Work Creation – Xbox Series X adds hardware, firmware and shader compiler support for GPU work creation that provides powerful capabilities for the GPU to efficiently handle new workloads without any CPU assistance. This provides more flexibility and performance for developers to deliver their graphics visions.

Microsoft's solution is a system dubbed the Xbox Velocity Architecture, through which the game installation on the Xbox Series X's 1 TB NVMe SSD functions as an extension of the system RAM — “allowing 100 GB of game assets to be instantly accessible by the developer,” according to an Xbox Wire article.

From what we know, the XSX reserves 1/10th a core to handle some of the system I/O operations. We can assume this also involves reading and writing data from NAND to/from system memory. Obviously, when the CPU is doing this, the GPU can't access the memory because of the same bus contention stuff mentioned above. However, this 1/10th a CPU core is only sending and retrieving data to/from RAM; it's not compressing, decompressing or doing any extra work on it, because the system has other hardware to handle those tasks.

It can send and retrieve raw data at 2.4 GB/s, compressed data at 4.8 GB/s and certain compressed data at up to 6 GB/s. These are sustained numbers, meaning under max load this is the throughput they expect the majority of the time; there could be sparse peaks with higher numbers (the memory controller has the hardware to enable that), but those are likely to be edge cases.
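
To put those sustained figures (and the PS5 ones above) into per-frame terms, here's the simple arithmetic; the GB/s numbers are the quoted ones, the per-frame framing is just mine:

```cpp
#include <cstdio>

int main() {
    struct Rate { const char* label; double gbPerSec; };
    const Rate rates[] = {
        {"XSX raw",              2.4},
        {"XSX typical comp.",    4.8},
        {"XSX best-case comp.",  6.0},
        {"PS5 raw",              5.5},
        {"PS5 typical comp.",    9.0},
        {"PS5 best-case comp.", 22.0},
    };
    const double fps = 60.0;
    for (const Rate& r : rates)
        std::printf("%-22s %5.1f GB/s -> ~%6.1f MB per 60 fps frame\n",
                    r.label, r.gbPerSec, r.gbPerSec * 1000.0 / fps);
    return 0;
}
```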

The reason I pulled those two quotes is because MS, in posting them, essentially have indicated that their SSD has a direct feed link to the GPU, so the GPU can stream in data from the SSD with game code basically looking at it as an extension of RAM, similar to how older game consoles and microcomputers could treat ROM cartridges as extensions of their own system RAM. That way, no specific calls to the data on those cartridges had to be done, reducing overhead significantly in moving data from storage to memory since it's NOT going to memory in the first place.

MS seem to have allocated a 100 GB partition of the SSD to serve as the direct-feed access link to the GPU, which will be able to perform tasks on GPU-bound data in the partition without CPU intervention, utilizing advancements in features such as ExecuteIndirect, which was already present on the XBO. That's where the "instant" part comes in; there's no transfer of the data in that 100 GB partition to memory that the GPU has to wait on before operating on the data. The trade-off is that data being fed in through the NAND at 2.4 GB/s raw is magnitudes slower than data coming from 10 GB of GDDR6 at 560 GB/s, similar to how ROM carts were slower than the RAM in older cartridge game systems even if the ROM carts were able to be treated as extended (read-only) memory.
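
For reference, this is what the existing D3D12 ExecuteIndirect pattern looks like on PC: the GPU consumes draw arguments that another GPU pass wrote, with no CPU recording those draws. It's shown only as the kind of "GPU work creation" the quote above is talking about, not as actual XSX code (the console-side extensions aren't public):

```cpp
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Standard PC D3D12 ExecuteIndirect setup: the GPU consumes draw arguments
// that another GPU pass (e.g. a culling or streaming compute shader) wrote
// into a buffer, so the CPU never records those draws itself.
ComPtr<ID3D12CommandSignature> MakeDrawSignature(ID3D12Device* device)
{
    D3D12_INDIRECT_ARGUMENT_DESC arg = {};
    arg.Type = D3D12_INDIRECT_ARGUMENT_TYPE_DRAW;          // plain Draw() arguments

    D3D12_COMMAND_SIGNATURE_DESC desc = {};
    desc.ByteStride       = sizeof(D3D12_DRAW_ARGUMENTS);  // one draw per record
    desc.NumArgumentDescs = 1;
    desc.pArgumentDescs   = &arg;

    ComPtr<ID3D12CommandSignature> sig;
    device->CreateCommandSignature(&desc, nullptr, IID_PPV_ARGS(&sig));
    return sig;
}

void SubmitGpuGeneratedDraws(ID3D12GraphicsCommandList* cmdList,
                             ID3D12CommandSignature* sig,
                             ID3D12Resource* gpuWrittenArgs,   // filled by a GPU pass
                             ID3D12Resource* gpuWrittenCount,  // draw count, also GPU-written
                             UINT maxDraws)
{
    // The CPU only schedules this call; which and how many draws actually
    // run is decided entirely by what the GPU wrote into the two buffers.
    cmdList->ExecuteIndirect(sig, maxDraws, gpuWrittenArgs, 0, gpuWrittenCount, 0);
}
```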

--------------

This is basically what it seems the two systems are doing; PS5 wants to maximize data throughput from storage to RAM as quickly as possible given today's technological limits and use a mostly hardware-dominated approach, whereas XSX can sufficiently supply the RAM with data from storage, just not as fast as PS5. However, it has a scaled-down implementation of AMD's SSG cards in providing a direct access feed from the SSD to the GPU via a 100 GB partition, which is treated as extended memory by the system, and the GPU having modifications so that it can work with this data without CPU intervention.

That seems to be a general perspective on how the two systems are implementing their SSD I/O systems, and when you think about it they're a bit apples to oranges. It's a disservice to both to directly compare them because they are achieving fuller utilization of the system data pipeline through different means that are equally valid in their own ways and areas of efficiency.

Thanks for the explanation.

I thought it was odd, but with all the talk of "virtual RAM in the SSD", I started to think that maybe there was something more to it. No matter how one looks at it, it doesn't make sense. The assets always have to end up in the RAM, and constantly flushing the cache on purpose is stupid, you might as well not have it there then.

This isn't true. AMD's SSG cards have 2 TB of NAND storage on the card the GPU can directly access for streaming of asset data. The data has to be formatted a given way (after all, data on NAND cannot be addressed at the bit or byte level), but the tech is out there.

It would seem MS have taken a cut-down version of that and are utilizing it for XSX. It being "instantly accessible" is more due to the GPU having extended work done with features like executeIndirect that allow the GPU to work with streamed-in data without CPU interrupt to instruct it what to do with that data.
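
On the "NAND cannot be addressed at the bit or byte level" point, the constraint is the same one you already see with ordinary unbuffered reads on PC: offsets, sizes and buffers all have to be block-aligned. A plain Win32 sketch (nothing XVA-specific, path and block size are made up):

```cpp
#include <windows.h>
#include <malloc.h>

// Ordinary Win32 unbuffered read: with FILE_FLAG_NO_BUFFERING the offset,
// the read size and the destination buffer all have to be multiples of the
// sector size, i.e. the drive is addressed in whole blocks, never single
// bytes. Path and block size here are made up for illustration.
bool ReadAlignedBlock(const wchar_t* path, long long blockIndex,
                      void** outBuffer, DWORD blockSize = 4096)
{
    HANDLE file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                              OPEN_EXISTING,
                              FILE_FLAG_NO_BUFFERING | FILE_FLAG_SEQUENTIAL_SCAN,
                              nullptr);
    if (file == INVALID_HANDLE_VALUE) return false;

    void* buffer = _aligned_malloc(blockSize, blockSize);   // sector-aligned buffer

    LARGE_INTEGER offset;
    offset.QuadPart = blockIndex * static_cast<long long>(blockSize);
    SetFilePointerEx(file, offset, nullptr, FILE_BEGIN);

    DWORD bytesRead = 0;
    const BOOL ok = ReadFile(file, buffer, blockSize, &bytesRead, nullptr);
    CloseHandle(file);

    *outBuffer = buffer;                     // caller frees with _aligned_free
    return ok && bytesRead == blockSize;
}
```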
 
Last edited:

Thirty7ven

Banned
Which ties back to the surprisingly high GPU clocks and the murmurs that the Series X's big die surprised Sony.

Stop spreading fan fiction. The murmurs about MS surprising Sony came from Tom Warren, who said MS wanted to surprise Sony.

The only surprise is MS not caring about console form factor. Xbox One also has a larger die than PlayStation 4, in fact it’s the same size as XSX.
 

THE:MILKMAN

Member
The reason I pulled those two quotes is because MS, in posting them, essentially have indicated that their SSD has a direct feed link to the GPU, so the GPU can stream in data from the SSD with game code basically looking at it as an extension of RAM, similar to how older game consoles and microcomputers could treat ROM cartridges as extensions of their own system RAM. That way, no specific calls to the data on those cartridges had to be done, reducing overhead significantly in moving data from storage to memory since it's NOT going to memory in the first place.

Now this is all getting really technical, but I'm trying to use my own "logic" here: taking what you say (data goes directly from SSD > GPU), what is the point of SFS and the 2-3x saving if it doesn't even get parked in RAM but goes straight to the GPU?

I'm just getting bewildered and bogged down on all this technical stuff....
 

longdi

Banned
Every single desktop, laptop and mobile device is constrained to a power envelope. That's why different PSU's exist for desktop computers. And this tech is new, rumored to be applied to desktop level components in the future.

Also, stop with the nonsense of "full sustained loads". Neither PS5 nor Xbox Series X will be running fully sustained loads. You never do, that's why the figures are theoretical maximums. The difference is that the PS5 will downclock / move power to another component, while the Xbox will keep its clock speed the same.

Desktop computers already do this and have done so for more than a decade.


Explained in the video I quoted, which you seem to conveniently ignore because you are just trolling at this point.


And I bet it didn't, those rumors were started by the Discord group and ignorant posters that don't have any clue how HW and SW development cycles work. Sony went with a different paradigm that matches with AMD's timeline for their APU's.

As for high clock speeds, we don't know how high RDNA2 can go. If Cerny's words are anything to go by, it looks like they needed to cap GPU frequency at 2.23GHz. This means it can go higher.

So, all in all, I want to apologize for the offtopic and will be concluding this argument. There's simply no point in beating a dead horse of rehashed bull dung.

You are definitely mistaking things.

Firstly, SmartShift is not new per se. Such tech has always been in use in mobile/constrained systems. Your PC PSU doesn't have such limitations, as in your Windows/Mac need not have additional counters to shift power between CPU and GPU. Either your PC PSU is underpowered and the system shuts down, or your CPU/GPU can take as much power as they need.

Series X and PS5 will both downclock in low loads.

When MS says sustained, it usually means that when you stress the system at full load, it can run that max load at the reported max frequencies without downclocking.
 
Cache scrubbers allow granular cache kills, keeping cache hits high. The end. There's absolutely nothing bad about good cache invalidation.

Not sure why she thinks "The obvious solution would be flushing GPU caches when the SSD is read ".. The point is not to read the thing in the first place!

2 biggest issues in comp sci
1: Naming things
2: Cache Invalidation
3: Off-by-one errors



Actually the more I read...

she's talking nonsense.

"XVA is the better approach if you want to do processing on the data like I said above, as you can feed the CPU/GPU directly from the SSD, where as on PS5 you have to copy from SSD to RAM, then the CPU/GPU can read in the copied data from RAM and work with it "

Where is the data stored that you are reading? What the.! You always move to RAM!


"The obvious solution would be flushing GPU caches when the SSD is read, that way no matter what the GPU doesn't get a cache miss (it knows the cache is clear and to ignore cache and look in RAM) "

Flushing the cache is stupid, you will just get a bunch of misses then. A cache miss isn't just that you checked the cache and it was empty; the problem is the next part (checking the source). I mean, you are caching for a reason.
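
A toy model of the difference being argued about here. Real GPU caches are far more complicated than this, but it shows why a blanket flush is the blunt option while a "scrubber" style invalidation only kills the lines whose addresses the SSD just overwrote:

```cpp
#include <cstdint>
#include <unordered_map>

// Toy model only (real GPU caches are nothing this simple), but it shows the
// difference: flushAll() throws away every line, so everything misses
// afterwards, while scrubRange() only invalidates the lines whose addresses
// the SSD/DMA just overwrote and leaves the rest of the cache hot.
struct ToyCache {
    static constexpr uint64_t kLine = 64;          // bytes per cache line
    std::unordered_map<uint64_t, bool> lines;      // line address -> present

    void flushAll() { lines.clear(); }             // blunt: every later lookup misses

    void scrubRange(uint64_t base, uint64_t size) {// targeted invalidation
        for (uint64_t a = base & ~(kLine - 1); a < base + size; a += kLine)
            lines.erase(a);                        // drop only the stale lines
    }

    bool lookup(uint64_t addr) {                   // returns true on a hit
        const uint64_t line = addr & ~(kLine - 1);
        if (lines.count(line)) return true;
        lines[line] = true;                        // fill on miss (cost: trip to RAM)
        return false;
    }
};
```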

Supposedly there is the ability to read directly into cache. You do NOT have to copy to RAM first to consume data if it's in a GPU-friendly format.

She is absolutely correct on that point. The diagram Cerny provided makes it look like the SSD copies to RAM.

XVA is expected to copy to any consumer in the I/O stack.
 
Supposedly there is the ability to read directly into cache. You do NOT have to copy to RAM first to consume data if it's in a GPU-friendly format.

She is absolutely correct on that point. The diagram Cerny provided makes it look like the SSD copies to RAM.

XVA is expected to copy to any consumer in the I/O stack.
When I say RAM I mean VRAM directly. You don't push to a cache, these caches are tiny. You don't have anything reading GPU/CPU -> SSD into... what, L1 cache, L2 cache?

Seriously, that twitter account is a bluffer.
 
Last edited:
I'm not sure what you mean. First, it wasn't specifically PRT, it was SFS and the specific features that the twitter user mentioned. I see you keep resorting to just saying PRT, but I don't see the point; if that is part of a larger feature there would be no need to single it out.

Second, I thought the guy on twitter that alluded to the effective 2x to 3x increase in memory/bandwidth worked for MS? Sure, MS hasn't officially said this yet, but they also said they would talk more about it later. If you are claiming that isn't what the twitter guy (I forget his name at the moment, forgive me) was actually saying, what do you think he was talking about?

James Stanard. He is the engineer responsible for implementing BCPACK and SFS on the XSX and inside DX12U overall.

He is directing the team writing the code for the XTC or Xbox Texture Compressor, so other than Goossen, he is probably the foremost authority on the Xbox APIs and Hardware.

Please don't listen to Panajev2001a... listen to Stanard. Panajev2001a is on a Sony mitzvah. Lol
 
Last edited:
When I say RAM I mean VRAM directly. You don't push to a cache, these caches are tiny. You don't have anything reading GPU/CPU -> SSD into... what, L1 cache, L2 cache?

Seriously, that twitter account is a bluffer.

I think i read there was like 40 MB of l2 cache on XSX GPU.

It depends on what you are reading into the cache I guess.

But the xSX gpu can directly consume from the SSD without writing to VRAM first. That is something we know.
 
Last edited:

Nikana

Go Go Neo Rangers!
I think the whole 2-3x multiplier comes from misinterpretation:




So basically what Andrew Goossen said is the same as what Cerny said during his presentation - that up until now the consoles had to store entire levels, or at least huge parts of the gameplay, in RAM, and now thanks to SSDs they will be able to store only a small bit of the nearest, upcoming section and stream the remaining parts on the fly, when needed. In other words, with HDDs the consoles would have to fit an entire level within 13GB, and now they have 13GB for just a chapter/checkpoint, effectively expanding the overall size of the entire level.

Ding ding ding. We have a winner.
 
I think i read there was like 40 MB of l2 cache on XSX GPU.

It depends on what you are reading into the cache I guess.

But the xSX gpu can directly consume from the SSD without writing to VRAM first. That is something we know.
These caches are like, vertex buffers, shader cache, tiny parts of textures, there are a few of them, depending on how a job is scheduled to a CU or whatever. The SSD wouldn't really feed into this.

It's probably more like 4MB or so of cache; any bigger and you lose die space and performance (these are ultra-fast reads).
 
Last edited:

Journey

Banned
Stop spreading fan fiction. The murmurs about MS surprising Sony came from Tom Warren, who said MS wanted to surprise Sony.

The only surprise is MS not caring about console form factor. Xbox One also has a larger die than PlayStation 4, in fact it’s the same size as XSX.


Classic misleading falsehood. PS4 has a larger cluster of CUs, 20 CUs in total in the die vs 14 CUs in Xbox One's die, so the PS4 logic space in the die is 40% larger, and it's no coincidence that the PS4 is 40% more powerful. The reason why Xbox One's die is bigger is because they had to add ESRAM, which was actually detrimental to Xbox One both in performance (from losing room to add more CUs to the die) and in cost, making Xbox One more expensive to manufacture since ESRAM doesn't come cheap.

Saying Xbox One has a larger form factor suggests it doesn't matter, but one of, if not THE, most important factors when comparing the PS4 vs Xbox One is the difference in CU count.

This time the die size difference is all about the CUs. XSX has 45% more CUs than the PS5, an even bigger factor than PS4 over XBO. We would be having a completely different conversation this coming gen had the clock frequencies been similar, but Sony increasing the clock was the only way they could compete with the power difference, and it's one of the things you can adjust last minute, the other being RAM. In fact it was expected for Sony to do this as an answer; I can confidently bet that it was never Sony's intention to go with clocks as high as 2.23GHz from the beginning.
 
Last edited:
Ding ding ding. We have a winner.

It's not misinterpreted by the people talking about it. This is literally what we have been saying, with many, many quotes and references.

The multiplier was that you don't load the whole of anything. You only load what you need (in the frustum) when you need it.

It also goes further. You don't even load anything that won't be needed in the scene until just before you need it so you literally have less to discard or scrub.

If a whole texture is 10mb... you only load 3mb. Now you can get two more textures to fit in the memory space that would have been consumed by a single texture.

Now compress those textures 50%. Instead of 3 textures for the price of 1 you get 6 moving over the same pipeline.

Or you have to spend the money to double or triple the speed of the HW.
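
Running the numbers from that example (the 10MB/3MB/50% figures are the illustrative ones from the post, not measurements):

```cpp
#include <cstdio>

int main() {
    const double fullTextureMB  = 10.0;  // whole texture on disk
    const double neededTilesMB  = 3.0;   // only the tiles actually sampled
    const double compression    = 0.5;   // ~50% from the hardware decompressor

    // What actually travels over the pipe per texture once both tricks apply.
    const double perTextureCost = neededTilesMB * compression;   // 1.5 MB

    std::printf("Without either trick: 1 texture per %.0f MB moved\n", fullTextureMB);
    std::printf("Partial residency only: ~%.1f textures for the same cost\n",
                fullTextureMB / neededTilesMB);                  // ~3.3
    std::printf("Partial residency + compression: ~%.1f textures\n",
                fullTextureMB / perTextureCost);                 // ~6.7
    return 0;
}
```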
 
Last edited:
I haven’t read anything about this yet but the title made me do a double take.

4.8 GB/s and 100GB instantly do not mesh well. That's far from instant.

Now this is all getting really technical, but I'm trying to use my own "logic" here: taking what you say (data goes directly from SSD > GPU), what is the point of SFS and the 2-3x saving if it doesn't even get parked in RAM but goes straight to the GPU?

I'm just getting bewildered and bogged down on all this technical stuff....

Well, the GPU is still going to be accessing data from RAM, right ;) ?

So if something like SFS is being targeted for smarter utilization of asset management in RAM, that benefits the GPU with high-priority graphics data but if needed, GPU-optimized data on the SSD can still be streamed in for lower-priority graphics data (anything that's majority read-only, doesn't need bit or byte-level granularity in data alterability or access).

As for the 4.8 GB/s and 100 GB "instantly accessible" bit, well look at it this way: instant isn't referring to the bandwidth speed here. It's referring to the reduced overhead or steps the GPU has to go through in order to get that data (and with some GPU modifications, able to do so with reduced or no instruction reliance from the CPU). Normally if the GPU has to access the data from RAM, then it has to be transferred from the SSD by the CPU (in XSX's case; on PS5 the I/O block takes care of this after being instructed by the CPU to get X or Y data off storage), placed in the RAM, and then the GPU can access that data out of RAM as long as the CPU is not accessing the memory bus (since these are APUs, they share the pool of memory; this applies to both PS5 and XSX, and PS5 additionally has the I/O block share the bus).

If you can provide a way for specifically formatted data on storage to be directly streamed in by the GPU without CPU instruction calls, that basically removes the step of transferring the data from storage to RAM. The trade-off is the speed; data being streamed in to the GPU this way is magnitudes slower than the GPU getting the data from RAM. On the other hand, it gains 10x the amount of data to access (100 GB partition) compared to the amount of data in the GPU-optimized RAM pool the GPU can usually access (10 GB).

But in order for that to be of particular use, the GPU needs a means of having a bit of autonomy from the CPU in accessing this data, so that perhaps the GPU can access data in the 100 GB partition while the CPU is accessing data from the RAM pool. And going by some of MS's own statements plus work on things like executeIndirect, that is most likely what they have done in terms of modifications with XSX GPU. So the GPU can see the data in that 100 GB partition as an extension of RAM, but it's not actually RAM. Similar to how older microcomputers and cartridge systems could "see" ROM cartridges as extensions of memory even though ROM cartridges aren't actually RAM (slower, read-only, etc. Though they do have bit & byte-level accessibility and alterability like RAM).
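
A rough CPU-side analogy for "storage the code just sees as memory" is an ordinary memory-mapped file: you index into it like an array, the bytes still live on the drive, and they get paged in on demand far more slowly than RAM. Plain Win32 below, purely as an analogy for the trade-off described above, not how XVA is actually implemented (the file name is made up by whoever calls it):

```cpp
#include <windows.h>
#include <cstddef>
#include <cstdint>

// Rough analogy only: memory-map a (made-up) asset file read-only and index
// into it like an array. The bytes still live on the drive and are paged in
// on demand, far more slowly than RAM -- the same "looks like memory, priced
// like storage" trade-off described above. Plain Win32, not XVA.
const uint8_t* MapAssetsReadOnly(const wchar_t* path, size_t* outSize)
{
    HANDLE file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return nullptr;

    LARGE_INTEGER size = {};
    GetFileSizeEx(file, &size);
    *outSize = static_cast<size_t>(size.QuadPart);

    HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
    if (!mapping) { CloseHandle(file); return nullptr; }

    // From here on, assets[i] reads "straight from storage" with no explicit
    // load-into-RAM step in the calling code; the OS pages data in behind the
    // scenes. (Handles are intentionally left open for the mapping's lifetime.)
    const void* view = MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
    return static_cast<const uint8_t*>(view);
}
```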

Classic misleading falsehood. PS4 has a larger cluster of CUs, 20 CUs in total in the die vs 14 CUs in Xbox One's die, so the PS4 logic space in the die is 40% larger, and it's no coincidence that the PS4 is 40% more powerful. The reason why Xbox One's die is bigger is because they had to add ESRAM, which was actually detrimental to Xbox One both in performance (from losing room to add more CUs to the die) and in cost, making Xbox One more expensive to manufacture since ESRAM doesn't come cheap.

Saying Xbox One has a larger form factor suggests it doesn't matter, but one of, if not THE, most important factors when comparing the PS4 vs Xbox One is the difference in CU count.

The reason the XSX die is larger is because, just like the PS4 vs XBO, it has 45% more CUs, an even bigger factor than PS4 over XBO. We would be having a completely different conversation this coming gen had the clock frequencies been similar, but Sony increasing the clock was the only way they could compete with the power difference, and it's one of the things you can adjust last minute, the other being RAM. In fact it was expected for Sony to do this as an answer; I can confidently bet that it was never Sony's intention to go with clocks as high as 2.23GHz from the beginning.

I do honestly think Sony were looking for at least a 2 GHz clock on the GPU, since they decided from the get-go for a 36 CU GPU and that meant they could only get desired performance with high clocks, with a big focus on the cooling system. Maybe they even planned for variable frequency much earlier on (they could not test that on Ariel though since it was an RDNA1 chip, and at least two of the early Oberon revisions were using a fixed frequency setup, Cerny seemed to have suggested this himself).

But I kinda think 2.23 GHz in particular may not have been planned, or at least not planned for actual implementation, even if they knew it could technically be done. Maybe there was a decision the team had to make between higher Gbps GDDR6 chips or further pushing the clock. Higher clock would need even better cooling, they would have tested implementation of said cooling earlier on as something to decide on.

Figure the decision to implement the stronger cooling could've been preferred from a pricing POV vs. higher Gbps GDDR6 chips, and probably more friendly in fitting in with variable frequency power profiles and setup. But personally I do think their GPU clock is notably north of whatever RDNA2's upper sweetspot frequency limit is on 7nm DUV enhanced. We can even kind of use XSX's GPU clock as a reference for where the lower end of that new sweetspot might be, though maybe the sweetspot's expanded from a 100 MHz range to a 150 MHz range.

IMO 2.23 GHz would still be well north of that on the process the consoles are on.
 
Last edited:
This is not a private chat server amongst fanboys dedicated to hyping XSX and finding a way to put PS5 down (I think there is one of those, look for it apparently ;)), nor a platform for free or paid astroturfers, but a public thread on a multiplatform forum where a rather ludicrous unsubstantiated claim was made (and I happened to read it), peddled as truth based on nothing much, and I happened to disagree with it and reserve my right to discuss it.
Just pointing out the irony of the statement. Also, there is a difference between discussing and defending. This thread has been super interesting in regards to the speculated technical aspects of these machines, but make no mistake, there are several on here who are doing a bang-up job of defending their platform of choice, you being one of them. Honestly it detracts from the actual discussion because the informative posts get buried in console warring. You can obviously tell who is excited about the tech and who is wearing it as a badge of honor (which is weird, btw).
 

THE:MILKMAN

Member
Well, the GPU is still going to be accessing data from RAM, right ;) ?

I don't know because you suggested it bypasses the physical RAM so there is no data in there? Are you saying some data goes direct to the GPU while other data goes the traditional route to/through RAM? All sounds like it is complicating things.

Like I said this is all way over my head so if there are any programmers here with console/PC experience that could explain it like I'm 5, please do!
 
I don't know because you suggested it bypasses the physical RAM so there is no data in there? Are you saying some data goes direct to the GPU while other data goes the traditional route to/through RAM? All sounds like it is complicating things.

Like I said this is all way over my head so if there are any programmers here with console/PC experience that could explain it like I'm 5, please do!
You are always reading into system RAM; in a PS5, system RAM is unified. The big massive change here is SSD -> system RAM without touching the CPU, and some nice features around ensuring the L1 and L2 caches stay coherent (i.e. they don't store different versions of the same data) and are cleaned up (with some precision too, so the rest of our precious cache is left alone).

Reading SSD -> CPU is like saying you have a guy pouring gas directly into a car's engine block with a firehose, and actually it might be Diesel.
 
Last edited:

Panajev2001a

GAF's Pleasant Genius
Just pointing out the irony of the statement. Also, there is a difference between discussing and defending. This thread has been super interesting in regards to the speculated technical aspects of these machines, but make no mistake, there are several on here who are doing a bang-up job of defending their platform of choice, you being one of them. Honestly it detracts from the actual discussion because the informative posts get buried in console warring. You can obviously tell who is excited about the tech and who is wearing it as a badge of honor (which is weird, btw).

I was discussing a point because it, I felt, disagreed with reality and nonetheless was stated sometimes a bit aggressively as a pure fact. It was confusing at best and an attempt at creating a disingenuous narrative at worst. I tried my best to provide arguments though, maybe more in a more detailed way earlier on than after pages and pages of repeating the same thing perhaps.

Excited about tech or not is not the point. I can be excited about A, but it does not make sense for me to suddenly go out and try to make a case for something without either having something concrete to back it up with or being prepared for my excitement to be taken as factual evidence. So console warring is discussing overexcited fans and not taking their arguments as gospel, or accepting their argument and/or the evidence used to back it up as factual just because it is reiterated over and over and over without any other evidence?! Not sure what to say to that... I disagree with it, simple as that.
 
Are some people actually belittling a college student/developer simply for putting out an idea some people don't like? I might not've agreed with everything the Crytek dev was saying but at least I acknowledged they had some good points in there and some of the things they mentioned did seem plausible (like PS5 being relatively easier to develop for, which was more or less his entire point).

So I'm gonna take a moment and summarize my thoughts on how PS5 and XSX's SSD I/O systems most likely work, given the evidence we have and some speculation on top of that.

PS5:

[Image: diagram of the PS5's custom I/O block from the Road to PS5 presentation]


This is the PS5's custom I/O block, shown in Road to PS5. Based on the diagram, it sends data from itself to the main system GDDR6 memory pool, meaning it shares the memory bus with the CPU and GPU to RAM. Which means, most likely, when the I/O block is sending or retrieving data from RAM, the CPU and GPU are not accessing RAM, similar to how when the CPU is accessing RAM the GPU has to wait and vice-versa (this same bus contention is present on XSX, since on APUs the memory bus is shared between the different components).

Most likely, the I/O block is not sending data through a direct link to the GPU; if that were the case it is a feature Cerny would have mentioned at Road to PS5, since it's a pretty critical feature to have. At the very least, that would have been a feature alluded to, so I'm now of the belief that is not a method PS5 is utilizing to transmit data from NAND through to other parts of the system. This is partly why the SSD is so fast at 5.5 GB/s for raw data and 8-9 GB/s for typical compressed data, and up to 22 GB/s for particularly "well-compressed" data; if the I/O block is sending and receiving to/fro RAM and the CPU/GPU can't access during that time, you would want the I/O block to finish its operations as fast as possible so those other components can access the bus quicker. This is beneficial even if the I/O block is only sending a few megabytes worth of data at any given time.

That's essentially the basics of how PS5's I/O system functions. The CPU still communicates with the I/O block to instruct other parts of the block what to do, but that's about it. Compression/Decompression etc. are all handled on the I/O block.

XSX:





From what we know, the XSX reserves 1/10th a core to handle some of the system I/O operations. We can assume this also involves reading and writing data from NAND to/from system memory. Obviously, when the CPU is doing this, the GPU can't access the memory because of the same bus contention stuff mentioned above. However, this 1/10th a CPU core is only sending and retrieving data to/from RAM; it's not compressing, decompressing or doing any extra work on it, because the system has other hardware to handle those tasks.

It can send and retrieve raw data at 2.4 GB/s, compressed data at 4.8 GB/s and certain compressed data at up to 6 GB/s. These are sustained numbers, meaning under max load this is the throughput they expect the majority of the time; there could be sparse peaks with higher numbers (the memory controller has the hardware to enable that), but those are likely to be edge cases.

The reason I pulled those two quotes is because MS, in posting them, essentially have indicated that their SSD has a direct feed link to the GPU, so the GPU can stream in data from the SSD with game code basically looking at it as an extension of RAM, similar to how older game consoles and microcomputers could treat ROM cartridges as extensions of their own system RAM. That way, no specific calls to the data on those cartridges had to be done, reducing overhead significantly in moving data from storage to memory since it's NOT going to memory in the first place.

MS seem to have allocated a 100 GB partition of the SSD to serve as the direct-feed access link to the GPU, which will be able to perform tasks on GPU-bound data in the partition without CPU intervention, utilizing advancements in features such as ExecuteIndirect, which was already present on the XBO. That's where the "instant" part comes in; there's no transfer of the data in that 100 GB partition to memory that the GPU has to wait on before operating on the data. The trade-off is that data being fed in through the NAND at 2.4 GB/s raw is magnitudes slower than data coming from 10 GB of GDDR6 at 560 GB/s, similar to how ROM carts were slower than the RAM in older cartridge game systems even if the ROM carts were able to be treated as extended (read-only) memory.

--------------

This is basically what it seems the two systems are doing; PS5 wants to maximize data throughput from storage to RAM as quickly as possible given today's technological limits and use a mostly hardware-dominated approach, whereas XSX can sufficiently supply the RAM with data from storage, just not as fast as PS5. However, it has a scaled-down implementation of AMD's SSG cards in providing a direct access feed from the SSD to the GPU via a 100 GB partition, which is treated as extended memory by the system, and the GPU having modifications so that it can work with this data without CPU intervention.

That seems to be a general perspective on how the two systems are implementing their SSD I/O systems, and when you think about it they're a bit apples to oranges. It's a disservice to both to directly compare them because they are achieving fuller utilization of the system data pipeline through different means that are equally valid in their own ways and areas of efficiency.



This isn't true. AMD's SSG cards have 2 TB of NAND storage on the card the GPU can directly access for streaming of asset data. The data has to be formatted a given way (after all, data on NAND cannot be addressed at the bit or byte level), but the tech is out there.

It would seem MS have taken a cut-down version of that and are utilizing it for XSX. It being "instantly accessible" is more due to the GPU having extended work done with features like executeIndirect that allow the GPU to work with streamed-in data without CPU interrupt to instruct it what to do with that data.

The only quibble I have with your description is that the XSX has *some* separation between GPU and CPU accesses. While the CPU can see all 16 GB, the GPU can only see 10GB and those 10 GB are more or less reserved for it. At the very least there are 4 x 1GB modules (@224 GB/second) which never have contention because they are slotted entirely for the GPU. The PS5 does not have the same memory setup and all lanes are up for contention in its memory setup.
 
Last edited:

THE:MILKMAN

Member
You are always reading into system RAM; in a PS5, system RAM is unified. The big massive change here is SSD -> system RAM without touching the CPU, and some nice features around ensuring the L1 and L2 caches stay coherent (i.e. they don't store different versions of the same data) and are cleaned up (with some precision too, so the rest of our precious cache is left alone).

Reading SSD -> CPU is like saying you have a guy pouring gas directly into a car's engine block with a firehose, and actually it might be Diesel.

Yeah so currently a couple of ~GB's of data is parked/cached in RAM but with these new systems there is no need to park data and the data can flow back/forth within a frame or two?

I'll have to think about and look into the ability of skipping the CPU as I still can't get my head around it. Some of the PS5 I/O blocks tasks are currently carried out by the CPU maybe? My current understanding is the path of data from SSD to screen is the same as now but just 2 orders of magnitude faster with maybe a few innovations thrown in.
 
I was discussing a point because it, I felt, disagreed with reality and nonetheless was stated sometimes a bit aggressively as a pure fact. It was confusing at best and an attempt at creating a disingenuous narrative at worst. I tried my best to provide arguments though, maybe more in a more detailed way earlier on than after pages and pages of repeating the same thing perhaps.

Excited about tech or not is not the point. I can be excited about A, but it does not make sense for me to suddenly go out and try to make a case for something without either having something concrete to back it up with or being prepared for my excitement to be taken as factual evidence. So console warring is discussing overexcited fans and not taking their arguments as gospel, or accepting their argument and/or the evidence used to back it up as factual just because it is reiterated over and over and over without any other evidence?! Not sure what to say to that... I disagree with it, simple as that.

Honestly you are disagreeing with reality. I don't see why this has you up in arms so much. It's frankly a different poster than what I remember from B3D. Your assertions are out of touch with the factual statements from the engineers. The baseline you are looking for comes from the cost of transferring data on the bus (reductions from a texture map that's 8MB down to 2MB). That is the 2-3x multiplier.

Literally all the other stuff you have posted about a representative baseline is simply clouding the statement for your purpose. Whatever that purpose is.
 

Panajev2001a

GAF's Pleasant Genius
James Stanard. He is the engineer responsible for implementing BCPACK and SFS on the XSX and inside DX12U overall.

He is directing the team writing the code for the XTC or Xbox Texture Compressor, so other than Goossen, he is probably the foremost authority on the Xbox APIs and Hardware.

Please don't listen to Panajev2001a... listen to Stanard. Panajev2001a is on a Sony mitzvah. Lol

Listen to Stanard over me, why would I say otherwise? Listen to what he is actually saying though, not what you may want to hear him say. He explained, and we went over this several times, roughly what SFS brings to the table, and other tech talks posted here talked about Sampler Feedback itself. Do I believe that SFS will make implementing an efficient texture streaming solution easier? Yes.
We are not disagreeing on this, or on how each step improves over devs doing it in software: PRT bringing support for sparse textures / virtual texturing in HW, Sampler Feedback exposing data on how texture data is used and what is requested and discarded per frame, and SFS taking this information into account to help you automatically page in data you are likely to need in time and smooth things over if a texture does not get there on time, essentially.

What we are disagreeing on is this: the baseline for the comparison. I find it hard to believe that he is implying that SFS is increasing effective SSD I/O throughput and lowering memory needed in RAM to store textures to 1/4th of what is normally achieved compared to a game implementing a virtual texturing system (using tiled resources or not) unless we are talking about some pathological edge cases.
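
For anyone wondering what the "virtual texturing already available in HW" baseline looks like in practice, this is the PC-side D3D12 reserved (tiled) resources path: the texture has no backing memory up front and individual 64 KB tiles get mapped in only when needed. Minimal sketch of mapping one tile; real code batches many tiles and tracks residency:

```cpp
#include <d3d12.h>

// D3D12 reserved (tiled) resources are the "PRT / virtual texturing in HW"
// baseline being discussed: the texture is created without backing memory
// (via CreateReservedResource) and individual 64 KB tiles are mapped into a
// heap only when the streaming system decides they are needed.
void MapSingleTile(ID3D12CommandQueue* queue,
                   ID3D12Resource* reservedTexture,  // created as a reserved resource
                   ID3D12Heap* tilePool,             // heap supplying physical 64 KB tiles
                   UINT tileX, UINT tileY, UINT mip,
                   UINT heapTileOffset)
{
    D3D12_TILED_RESOURCE_COORDINATE coord = {};
    coord.X = tileX;  coord.Y = tileY;  coord.Z = 0;
    coord.Subresource = mip;

    D3D12_TILE_REGION_SIZE region = {};
    region.NumTiles = 1;                              // map exactly one tile

    D3D12_TILE_RANGE_FLAGS rangeFlags = D3D12_TILE_RANGE_FLAG_NONE;
    UINT rangeTileCount = 1;

    queue->UpdateTileMappings(reservedTexture,
                              1, &coord, &region,     // which part of the texture
                              tilePool,
                              1, &rangeFlags,
                              &heapTileOffset, &rangeTileCount,
                              D3D12_TILE_MAPPING_FLAG_NONE);
}
```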

It's not misinterpreted by the people talking about it. This is literally what we have been saying, with many, many quotes and references.

The multiplier was that you don't load the whole of anything. You only load what you need (in the frustum) when you need it.

It also goes further. You don't even load anything that won't be needed in the scene until just before you need it so you literally have less to discard or scrub.

If a whole texture is 10mb... you only load 3mb. Now you can get two more textures to fit in the memory space that would have been consumed by a single texture.

Now compress those textures 50%. Instead of 3 textures for the price of 1 you get 6 moving over the same pipeline.

Or you have to spend the money to double or triple the speed of the HW.

Sure, this is the point of streaming texture data in and out of VRAM in the first place to only store and display what you need. Virtual texturing being an extension of that idea (MegaTextures an extreme of it trying to make it cheaper on the rendering side and reduce draw calls overhead as a single giant texture would be used for everything in the scene and pieces of it would be streamed in and out).

The misunderstanding, unintentional or disingenuous, is about what the technique you described uses as its basis of comparison. To be three times faster than something else you need to define that something else and the conditions/details of the metric you are comparing. The disagreement here is not about how awesome XSX is (it is).
 

Panajev2001a

GAF's Pleasant Genius
Honestly you are disagreeing with reality. I don't see why this has you up in arms so much. It's frankly a different poster than what I remember from B3D. Your assertions are out of touch with the factual statements from the engineers. The baseline you are looking for comes from the cost of transferring data on the bus (reductions from a texture map that's 8MB down to 2MB). That is the 2-3x multiplier.

Literally all the other stuff you have posted about a representative baseline is simply clouding the statement for your purpose. Whatever that purpose is.

No, the baseline is important. If we are saying a 3-4x improvement over a game implementing an efficient virtual texturing / streaming mechanism vs a game that does not have one in place, we are talking about two completely different scenarios: in the former scenario my jaw is on the floor, while in the other I am only “just” happy and positive that the needle is moving forward and efficient texture streaming has a lower and lower barrier to entry.
 

BeardGawd

Banned
Panajev2001a if I'm not mistaken MS built a special chip into their most recent console (Xbox One X) to monitor it, and determined only a third to half of all texture pages loaded in memory are being used.

So the comparison being made with SFS is against that. So clearly, based off this data, PRT and SF have not been used on the whole in games (perhaps this is due to complexity, performance or quality issues).

SFS is looking to alleviate whatever that bottleneck is.
 
Yeah so currently a couple of ~GB's of data is parked/cached in RAM but with these new systems there is no need to park data and the data can flow back/forth within a frame or two?

I'll have to think about and look into the ability of skipping the CPU as I still can't get my head around it. Some of the PS5 I/O blocks tasks are currently carried out by the CPU maybe? My current understanding is the path of data from SSD to screen is the same as now but just 2 orders of magnitude faster with maybe a few innovations thrown in.

Well, within a frame or two depends on a lot, like how you optimize to read only what you see; game engines have done this before for textures and are now doing it for things like geometry, because I/O tech like this exists.

I wouldn't go crazy for SFS as the tiebreaker here, this is already being done in software by lots of engine teams. It will help, though, kinda in a background way. I wouldn't be able to say it suddenly triples your RAM bandwidth; by that measure UE5 does too.
 

T-Cake

Member
This doesn't make any sense. A 100GB partition for direct access by the GPU? So you have to keep moving GBs of data in and out of that partition every time you play a different game?
 
The only quibble I have with your description is that the XSX has *some* separation between GPU and CPU accesses. While the CPU can see all 16 GB, the GPU can only see 10GB and those 10 GB are more or less reserved for it. At the very least there are 4 x 1GB modules (@224 GB/second) which never have contention because they are slotted entirely for the GPU. The PS5 does not have the same memory setup and all lanes are up for contention in its memory setup.

Ah okay, I appreciate the clarification. So would that mean the XSX GPU can access that 4 GB of data even while the CPU is accessing its 6 GB of data from the 6x 2 GB modules?

That would be interesting if true, provided the hardware and OS have the provisions in place to let it facilitate seamlessly. I had already figured the CPU had access to all 16 GB with it needing to write incoming data to the 10 GB and 6 GB pools. So at very least when it's doing that, that would be a scenario where the GPU can't access RAM but with some GPU modifications it can still stream in data from the SSD while the CPU writes data to RAM (and I figure to avoid any potential errors the GPU is only able to read the incoming streamed data, not do write modifications, while the CPU writes contents to RAM. Assuming the CPU and GPU can receive streamed data from storage simultaneously, which is probably not the case so one or the other would be retrieving data from storage at a time).

This doesn't make any sense. A 100GB partition for direct access by the GPU? So you have to keep moving GBs of data in and out of that partition every time you play a different game?

It's possible they have a block of SLC NAND acting as the cache so yeah, it would probably either require a single game storing lots of its contents in that partition and then moving it around the drive when another game needs it, or maybe multiple games can have data in that partition and any given game accessing it knows which portion is "their" data on access calls, IDK.

This is kinda why we're waiting for more official information, to clear up speculation.

Each game has upto 100GBs to use as 'virtual ram' when they install.

Maybe this might be an approach they take, though it could limit game installs to up to only nine (and that's considering the user doesn't want to store non-game files on the drive), if each game is getting 100 GB. Because you probably still need some space for OS files and maybe to facilitate some other functions of the OS.

It would be an even more software-driven approach, though, and games would probably need to install their data on the drive in a way where graphics data is formatted in a way the GPU can easily access.
 

Panajev2001a

GAF's Pleasant Genius
Panajev2001a if I'm not mistaken MS built a special chip into their most recent console (Xbox One X) to monitor it, and determined only a third to half of all texture pages loaded in memory are being used.

So the comparison being made with SFS is against that. So clearly, based off this data, PRT and SF have not been used on the whole in games (perhaps this is due to complexity, performance or quality issues).

SFS is looking to alleviate whatever that bottleneck is.

I believe them, I believe they added this monitoring and they profiled games, but they do not go out and say they profiled games with an efficient texture streaming / virtual texturing solution and got those kinds of numbers. I am not seeing the evidence that would suggest that to be the case or even likely.

There is still value, as making developers’ lives easier is doing God’s work.
 

Journey

Banned
Well, the GPU is still going to be accessing data from RAM, right ;) ?

So if something like SFS is being targeted for smarter utilization of asset management in RAM, that benefits the GPU with high-priority graphics data but if needed, GPU-optimized data on the SSD can still be streamed in for lower-priority graphics data (anything that's majority read-only, doesn't need bit or byte-level granularity in data alterability or access).

As for the 4.8 GB/s and 100 GB "instantly accessible" bit, well look at it this way: instant isn't referring to the bandwidth speed here. It's referring to the reduced overhead or steps the GPU has to go through in order to get that data (and with some GPU modifications, able to do so with reduced or no instruction reliance from the CPU). Normally if the GPU has to access the data from RAM, then it has to be transferred from the SSD by the CPU (in XSX's case; on PS5 the I/O block takes care of this after being instructed by the CPU to get X or Y data off storage), placed in the RAM, and then can access that data out of RAM as long as the CPU is not accessing the memory bus (since these are APUs, they share the pool of memory; this is for both PS5 and XSX and PS5 additionally has the I/O block share the bus).

If you can provide a way for specifically formatted data on storage to be streamed in directly by the GPU without CPU instruction calls, that basically removes the step of transferring the data from storage to RAM first. The trade-off is speed: data streamed to the GPU this way arrives orders of magnitude slower than data the GPU pulls from RAM. On the other hand, it gains 10x the amount of data to access (the 100 GB partition) compared to the GPU-optimized RAM pool the GPU usually works out of (10 GB).
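
To put rough numbers on that trade-off, using figures already floating around this thread (~4.8 GB/s off the SSD after decompression, 560 GB/s for the GPU-optimal RAM pool), here's a purely illustrative sketch, not how either console actually schedules transfers:

```python
# Capacity-vs-bandwidth trade-off between the two pools discussed above.
# Numbers are the ones quoted in this thread; everything else is made up.
POOLS = {
    "GPU-optimal RAM":   {"size_gb": 10,  "gbps": 560.0},
    "SSD-backed 100 GB": {"size_gb": 100, "gbps": 4.8},   # after decompression
}

asset_gb = 64 / 1024  # a 64 MB texture chunk
for name, pool in POOLS.items():
    fetch_ms = asset_gb / pool["gbps"] * 1000
    print(f"{name}: {pool['size_gb']} GB addressable, "
          f"64 MB fetch ~{fetch_ms:.2f} ms at {pool['gbps']} GB/s")
```

So a fetch out of the SSD-backed pool is orders of magnitude slower per access, but the addressable pool is ten times larger.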

But for that to be particularly useful, the GPU needs some autonomy from the CPU in accessing this data, so that the GPU can perhaps read from the 100 GB partition while the CPU is working out of the RAM pool. And going by some of MS's own statements, plus work on things like ExecuteIndirect, that is most likely what they've done with the XSX GPU modifications. So the GPU can see the data in that 100 GB partition as an extension of RAM, even though it isn't actually RAM. It's similar to how older microcomputers and cartridge systems could "see" ROM cartridges as extensions of memory even though ROM cartridges aren't RAM (slower, read-only, etc., though they do offer byte-level addressability like RAM).
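
A rough PC-side analogy for "storage the program addresses as if it were memory" is a memory-mapped file: the bytes are indexed like RAM, but the OS pages them in from storage on demand. This is just an analogy of the concept, not a claim about how the XSX actually exposes that 100 GB partition to the GPU:

```python
# Memory-mapped file: storage that the program addresses like memory, with
# pages faulted in from disk on demand. Analogy only.
import mmap, os

with open("asset.bin", "wb") as f:          # dummy 1 MiB "asset"
    f.write(os.urandom(1 << 20))

with open("asset.bin", "rb") as f:
    view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    header = view[:16]                      # indexed like a byte array in memory
    print(header.hex())
    view.close()
```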



I do honestly think Sony was aiming for at least a 2 GHz GPU clock, since they settled on a 36 CU GPU from the get-go, which meant they could only reach the desired performance with high clocks and a big focus on the cooling system. Maybe they even planned for variable frequency much earlier on (they could not test that on Ariel, though, since it was an RDNA1 chip, and at least two of the early Oberon revisions were using a fixed-frequency setup; Cerny seemed to suggest this himself).

But I kinda think 2.23 GHz in particular may not have been planned, or at least not planned for actual implementation, even if they knew it could technically be done. Maybe the team had to decide between higher-Gbps GDDR6 chips or pushing the clock further. A higher clock would need even better cooling, and they would have tested that cooling earlier on as something to decide on.

I figure implementing the stronger cooling could have been preferable from a pricing POV versus higher-Gbps GDDR6 chips, and probably a better fit for the variable-frequency power profiles and setup. But personally I do think their GPU clock is notably north of whatever RDNA2's upper sweet-spot frequency is on 7nm DUV enhanced. We can even use XSX's GPU clock as a rough reference for where the lower end of that new sweet spot might be, though maybe the sweet spot has expanded from a 100 MHz range to a 150 MHz range.

IMO 2.23 GHz would still be well north of that on the process the consoles are on.


Oh I agree, they locked in at 36 CUs early on with the intent of pushing frequencies to a considerable 2 GHz; the 9.2 teraflop figure was no mistake. But I do believe they pushed harder once MS revealed their 12 TF figure. They HAD to.
 
Last edited:

Panajev2001a

GAF's Pleasant Genius
The only quibble I have with your description is that the XSX has *some* separation between GPU and CPU accesses. While the CPU can see all 16 GB, the GPU can only see 10 GB, and those 10 GB are more or less reserved for it. At the very least there are 4 x 1 GB modules (at 224 GB/s) which never have contention because they are slotted entirely for the GPU. The PS5 does not have the same memory setup; all lanes are up for contention there.

Trueblakjedi thicc_girls_are_teh_best regarding this, unless I missed some data again, the only description I read was that the CPU sees no performance distinction while accessing memory (the CPU runs at the slower speed against either pool, which saves bandwidth on the GPU-optimised pool and reduces contention further) while the GPU does see a difference (part of memory at full speed, part of it at a slower speed, so to speak)... rather than the GPU being unable to access the other / less optimal pool:
"Memory performance is asymmetrical - it's not something we could have done with the PC," explains Andrew Goossen "10 gigabytes of physical memory [runs at] 560GB/s. We call this GPU optimal memory. Six gigabytes [runs at] 336GB/s. We call this standard memory. GPU optimal and standard offer identical performance for CPU audio and file IO. The only hardware component that sees a difference in the GPU."

They said the CPU sees no difference in performance between the two pools while the GPU does, so to me it would seem the CPU is capped at 336 GB/s when reading and writing to RAM, which makes sense.

It would be interesting if devs went on record specifying more about how this is achieved and whether or not the GPU can access RAM beyond the 10 GB of fast memory. Assuming you are correct, it would mean the CPU would need to move data from the slow pool to the fast one for the GPU to see it, in the perhaps unlikely scenario that the GPU needed extra data beyond the 10 GB it has optimised access to.
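
For what it's worth, the 560, 336 and 224 GB/s figures all fall out of the widely reported chip layout (ten 14 Gbps GDDR6 chips on a 320-bit bus: four 1 GB and six 2 GB). Quick sanity-check arithmetic, assuming that layout is right:

```python
# Where the quoted bandwidth figures come from, assuming ten GDDR6 chips at
# 14 Gbps on a 320-bit bus (four 1 GB + six 2 GB). Illustrative arithmetic only.
gbps_per_pin = 14
bits_per_chip = 32
chip_bw = gbps_per_pin * bits_per_chip / 8      # 56 GB/s per chip

fast_pool_bw = 10 * chip_bw   # first 1 GB of all ten chips -> 10 GB "GPU optimal"
slow_pool_bw = 6 * chip_bw    # extra 1 GB on the six 2 GB chips -> 6 GB "standard"
gpu_only_bw  = 4 * chip_bw    # the four 1 GB chips alone (the 224 GB/s quoted above)

print(f"per chip: {chip_bw:.0f} GB/s")
print(f"10 GB GPU-optimal pool: {fast_pool_bw:.0f} GB/s")   # 560
print(f"6 GB standard pool:     {slow_pool_bw:.0f} GB/s")   # 336
print(f"four 1 GB chips:        {gpu_only_bw:.0f} GB/s")    # 224
```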
 
Last edited:
1. In the road to PS5 presentation

Starts at 36:50



And here




2. This is not the same as boost clocks on laptop and desktop CPUs

3. You don't know if it will deliver better results. In theory it will, but you have no proof of that, so either show it or stop spreading nonsense


Potentially, like running UNO or some game of that sort.
If a normal next-gen game could run both the CPU at 3.5 GHz and the GPU at 2.23 GHz, the clocks wouldn't have to be variable. They could be fixed.
But they aren't, because there isn't enough power, and there are other headroom issues.
 

FranXico

Member
Potentially, like running UNO or some game of that sort.
If a normal next-gen game could run both the CPU at 3.5 GHz and the GPU at 2.23 GHz, the clocks wouldn't have to be variable. They could be fixed.
But they aren't, because there isn't enough power, and there are other headroom issues.
The power consumption is being limited to make the thermal throughput predictable. That's the headroom issue.
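
A minimal sketch of that idea: fix the power budget and let the clock vary deterministically with workload activity, instead of fixing the clock and letting power (and heat) vary. The power model and every number below are completely made up for illustration; this is not Sony's actual algorithm:

```python
# Deterministic clock selection under a fixed power budget (illustrative only).
POWER_BUDGET_W = 200.0   # fixed budget (made-up number)
F_MAX_GHZ = 2.23         # clock cap

def power_estimate(activity, f_ghz):
    # crude model: power roughly ~ activity * f^3 (f * V^2, with V tracking f);
    # the 19.0 scale factor is arbitrary, chosen so only a worst-case load
    # exceeds the budget at the cap
    return 19.0 * activity * f_ghz ** 3

def clock_for(activity):
    f = F_MAX_GHZ
    while power_estimate(activity, f) > POWER_BUDGET_W:
        f -= 0.01        # shave the clock a little until the budget is met
    return round(f, 2)

for activity in (0.6, 0.9, 1.0):   # light, typical, pathological workload
    print(f"activity {activity}: {clock_for(activity)} GHz")
```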
 
Trueblakjedi thicc_girls_are_teh_best regarding this, unless I missed some data again, the only description I read was that the CPU sees no performance distinction while accessing memory (the CPU runs at the slower speed against either pool, which saves bandwidth on the GPU-optimised pool and reduces contention further) while the GPU does see a difference (part of memory at full speed, part of it at a slower speed, so to speak)... rather than the GPU being unable to access the other / less optimal pool:


They said the CPU sees no difference in performance between the two pools while the GPU does, so to me it would seem the CPU is capped at 336 GB/s when reading and writing to RAM, which makes sense.

It would be interesting if devs went on record specifying more about how this is achieved and whether or not the GPU can access RAM beyond the 10 GB of fast memory. Assuming you are correct, it would mean the CPU would need to move data from the slow pool to the fast one for the GPU to see it, in the perhaps unlikely scenario that the GPU needed extra data beyond the 10 GB it has optimised access to.

This is exactly correct sir.
 
No, the baseline is important. A 3-4x improvement measured against a game that already implements an efficient virtual texturing / streaming mechanism, versus against a game that does not have one in place, describes two completely different scenarios: in the former my jaw is on the floor, while in the latter I am only "just" happy and positive that the needle is moving forward and that efficient texture streaming has a lower and lower barrier to entry.

I think ultimately the benefit of SFS is the ability to peer into the texture, sample it, and then filter down to the exact page that's needed in the frustum. Other solutions apparently pull the entire mip. The PS5's SSD is so fast that it can pull and display the entire mip, and then the cache scrubbers discard it.

Instead, XVA just pulls the page and not the entire mip, compresses it further, and transports it just in time to the GPU. It isn't going to be 100 percent perfect, but in cases where it isn't, it will simply fall back to the next closest mip available. So you are right that this is in comparison to PRT, and it can be considered PRT+, plus HW that's special to the XSX.
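
Roughly, the loop being described (and spelled out in Stanard's quotes below) looks something like this. The Python is pure pseudo-logic with invented names and a heavily simplified page scheme, not real D3D12 and not the actual hardware path:

```python
# Sketch: sampler feedback drives per-page streaming, with graceful fallback
# to a coarser resident mip until the requested page arrives. Illustrative only.
feedback = set()   # (texture, mip, page) tuples the GPU sampled this frame

class StreamedTexture:
    def __init__(self, mip_count):
        self.mip_count = mip_count
        self.resident = {(mip_count - 1, 0)}   # only the coarsest mip up front

    def sample(self, mip, page):
        """Record feedback and return the mip level actually used."""
        feedback.add((self, mip, page))
        # walk toward coarser mips until we hit something resident
        for m in range(mip, self.mip_count):
            if (m, page >> (m - mip)) in self.resident:   # crude page remapping
                return m
        return self.mip_count - 1

def streaming_pass():
    """After the frame: pull in only the individual pages that were requested."""
    for tex, mip, page in feedback:
        tex.resident.add((mip, page))    # stands in for an SSD read of one page
    feedback.clear()

tex = StreamedTexture(mip_count=8)
print(tex.sample(mip=2, page=5))   # first frame: falls back to the coarse mip (7)
streaming_pass()
print(tex.sample(mip=2, page=5))   # next frame: the exact page is resident -> 2
```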

Stanard says this:

"One of the most overlooked elements is our new sampler feedback streaming. It allows us to elegantly stream individual texture pages (rather than whole mips) based on GPU texture fetches. Special filtering hardware allows graceful fallback to the resident mip levels. "

"We don't stream *from* virtual memory, but you can reserve virtual memory for very large textures and only commit the pages you have streamed in from SSD."

And just so you don't think he's trying to be misleading, here's what he said when asked why they don't reveal more:

"It's not about secrecy. All will be revealed. I just have to be really careful not to mis-speak, over-promise, cause offense, or steal anyone's thunder. This is an exciting time for reveals, and I don't want to disrupt the message. "
 
Last edited:

Ascend

Member
So my question now is, if the XSX SSD has 4 lanes to system memory (16 GB), at what bandwidth can the CPU/GPU access data directly from the 100 GB? I don't see how it could be the same 4 lanes if those components see it like system memory.
I don't think we have enough details on XVA to be able to answer that...

Where is the data stored that you are reading? What the.! You always move to RAM!
That question is irrelevant. It's like asking, where does the data in RAM go when you transfer it to the GPU? Yeah, where does it go? The answer is the same for the XSX transferring from SSD, because from this perspective the system sees the mapped 100 GB on the SSD as RAM, as has been postulated a gazillion times.

"The obvious solution would be flushing GPU caches when the SSD is read, that way no matter what the GPU doesn't get a cache miss (it knows the cache is clear and to ignore cache and look in RAM) "

Flushing the cache is stupid; you will just get a bunch of misses then. A cache miss isn't that you checked the cache and it was empty... the problem is the next part (checking the source). I mean, you are caching for a reason.
To me, it doesn't sound like you're disagreeing with her at all.

This Louise person is not a dev. Word Salad basically has confused a lot of you.
Can you copy paste direct quotes (without quote mining) and explain why what she is saying is wrong?

She's a self proclaimed developer. Developer mode on Xbox is free.
And it isn't on the Switch.

I keep seeing these arguments trying to defame the person, rather than explaining why what she is saying is incorrect.
 
These caches hold things like vertex buffers, shader data and tiny parts of textures; there are a few of them, depending on how a job is scheduled to a CU or whatever. The SSD wouldn't really feed into them.

It's probably more like 4 MB or so of cache; any bigger and you lose die space and performance (these are ultra-fast reads).

It wouldn't stream into those caches. But it can select a texture or texture page... not even a whole mip, and grab it.

No CPU involved.
 
Last edited:

cireza

Member
Your explanations make sense.

I understand the Series X addresses the 100 GB on the SSD as if it were RAM. However, if it does this without involving the CPU, and the data is compressed, then I suppose the load of decompressing the data has to be handled by the GPU, correct? Unless there is a dedicated component for this, of course, which might be the case?
 
Last edited:
It's possible they have a block of SLC NAND acting as the cache. So yeah, it would probably either require a single game storing a lot of its contents in that partition and then shuffling it around the drive when another game needs the space, or maybe multiple games can have data in that partition and any given game knows which portion is "their" data on access calls, IDK.

This is kinda why we're waiting for more official information, to clear up speculation.

Do you think we will learn more in July? Or will it wait until that MS tech talk/conference in August? Your theories seem very sound, am definitely curious to get the official explanation
 
Your explanations make sense.

I understand the Series X addresses the 100 GB on the SSD as if it were RAM. However, if it does this without involving the CPU, and the data is compressed, then I suppose the load of decompressing the data has to be handled by the GPU, correct? Unless there is a dedicated component for this, of course, which might be the case?

Interesting point. There could be three solutions:

1. The 100 GB is uncompressed.
2. The 100 GB is compressed and is decompressed by the HW block.
3. The GPU can consume compressed texture data.

I would bet on 2.
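
If it is option 2, the data path would look something like the sketch below, with zlib standing in for the dedicated hardware block (which uses its own formats, e.g. the BCPack/zlib decompression MS has mentioned); the function names are invented for illustration:

```python
# Sketch of option 2: assets stay compressed on the SSD and a decompression
# stage sits in the path before the GPU ever sees the bytes. zlib is only a
# software stand-in for the console's hardware block.
import zlib

def pack_for_ssd(raw_asset: bytes) -> bytes:
    """What the build/install step would store on disk."""
    return zlib.compress(raw_asset)

def decompression_block(compressed: bytes) -> bytes:
    """Stands in for the dedicated hardware decompressor."""
    return zlib.decompress(compressed)

def gpu_visible_fetch(on_disk: bytes) -> bytes:
    # SSD read -> hardware decompression -> bytes the GPU can consume,
    # with no CPU time spent on the decompression itself.
    return decompression_block(on_disk)

asset = bytes(range(256)) * 1024            # dummy 256 KB asset
stored = pack_for_ssd(asset)
assert gpu_visible_fetch(stored) == asset
print(f"on SSD: {len(stored)} bytes, delivered to GPU: {len(asset)} bytes")
```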
 
My issue with the analysis is AMD tech is not exclusive. They both can use the tech.



He certainly thinks so.


They both could, but that doesn't mean they both will. It all comes down to what priorities they focused on in their design of the APUs.

I'm just assuming these are the approaches Sony and MS have taken, mainly from what they've already shown and/or touched on publicly, and from some insights from other posters, some of whom know more than me about the nitty-gritty technical aspects of things. Just trying to piece together what seems most likely based on what we have so far.


Your explanations make sense.

I understand the Series X addresses the 100 GB on the SSD as if it were RAM. However, if it does this without involving the CPU, and the data is compressed, then I suppose the load of decompressing the data has to be handled by the GPU, correct? Unless there is a dedicated component for this, of course, which might be the case?

Gonna guess the data in the 100 GB partition is kept uncompressed; the CPU can maybe instruct the flash controller to transfer data into the 100 GB partition, and since that pool is seen as extended "RAM" by the GPU, it can read (and possibly write) data in that partition of NAND, but primarily use it for read-only operations (you don't want too many writes because of NAND endurance / wear levelling).

That's an aspect probably worth thinking about a bit more, tbh. I'm assuming the GPU wouldn't be able to operate with compressed data in that space unless it's decompressed first.

Do you think we will learn more in July? Or will it wait until that MS tech talk/conference in August? Your theories seem very sound, am definitely curious to get the official explanation

Well, now we're hearing MS might do a hardware event in June going into more of the system features, particularly Velocity Architecture. But I was under the impression it'd be a general hardware event with maybe Lockhart showing up? These rumors seem to change every day!

So really, it could be June, July or August at this point. August seems the most certain, and maybe we get something more next month before then. July is least likely, since that's their game showcase and they probably don't want to focus on hardware while doing that.

Interesting point. There could be three solutions:

1. The 100 GB is uncompressed.
2. The 100 GB is compressed and is decompressed by the HW block.
3. The GPU can consume compressed texture data.

I would bet on 2.

This seems pretty possible. Do you know if the GPU can operate on compressed data in the 100 GB portion? I'm only assuming it can't because, on PC, applications working with compressed archives usually have to at least partially decompress a file when you open it, and with certain things like photos, opening one from inside an archive prevents you from scrolling through the other pictures in the same folder, which I'm assuming is because the files are still technically compressed.

That might be a bad reference point on my end, tbh.
 