
Exploring The Complications Of Series X's Memory Configuration & Performance

How much are YOU willing to say recent Xbox multiplat perf is affected by the memory setup?

  • Very

    Votes: 18 9.8%
  • Mostly

    Votes: 32 17.5%
  • So/so

    Votes: 40 21.9%
  • Not really

    Votes: 41 22.4%
  • None

    Votes: 52 28.4%

  • Total voters
    183
  • Poll closed.

Yoboman

Member
Good post

This is the sort of analysis, maybe with some dev interviews, that you would expect from a site like Digital Foundry about Series X performance, especially after a year and a half of harping on about the power of 12 teraflops.

Instead, all they've given us is hand-waving about tools and suggestions that the devs didn't optimise enough
 
 

yamaci17

Member
Good post

This is the sort of analysis, maybe with some dev interviews, that you would expect from a site like Digital Foundry about Series X performance, especially after a year and a half of harping on about the power of 12 teraflops.

Instead, all they've given us is hand-waving about tools and suggestions that the devs didn't optimise enough
i mean, in the end let's be realistic, from a dev's perspective:

PS5 has like a 65% userbase share (I'm being optimistic for Xbox)
Xbox Series X + Series S have like a 35% share (as I said, optimistic numbers)

LOGICALLY... if you have 100 units of optimization time and I was a dev, 65 units would go to PlayStation and 35 units would go to Xbox Series X + S. It is only fair: that console has the bigger userbase. I'm not the owner of the console, nor do I have an obligation to provide "equal" optimization. If there were a third console brand that only has a 1% install base, would you allocate your time equally between the three? hell no. This is why older generations of GPUs get progressively worse performance. Take Kepler/Fermi/Maxwell GPUs: they're now being phased out. Why would any dev tailor their game for Maxwell or GCN 3/4/5 anymore? They have an extremely niche userbase. But take good notice of how much better Pascal (GTX 1000 series) has aged over the last 3 years. Even the latest Spider-Man and Uncharted ports (and many more PS ports) RUN better on Pascal than on GCN cards, despite those games being literally designed for GCN consoles. It is clear most devs give some importance to Pascal optimization, since it represents the biggest chunk of the PC install base, from the 1050 Ti to the 1080, with the 1060 still being a very popular GPU.

now, let's return to the initial discussion. 65 units of time go to PS5. plenty. then you have 35 units of time going to Xbox Series X... AND Series S.

that's where the fun comes in! you must now scale back your entire game (audio, physics, draw distance, resolution, textures and more) carefully enough that the game doesn't get destroyed or lose its essence on Series S, an even more niche userbase, all the while dealing with a more intricate, complex and less optimized API in DX12, with weird memory pools.

like literally ALL the odds are against Xbox Series X + S. Microsoft dug this grave themselves. only their own studios can do shit at this point. and I hope they do, so at least people can say these consoles are actually capable when devs exclusively code for them. other than that, the writing is on the wall. most 3rd party games are gonna struggle on Series X + Series S.

you can disagree with my logic. even if a dev were to give an EQUAL amount of time and care to both Xbox and PS5, the GIVEN time would still have to be DIVIDED between Xbox Series S and X. there's no magical button that scales a game back to Series S's capabilities. it simply does not exist.
 
If a game only uses 10 GB, then I'm assuming there isn't any issue. But if it's trying to use more, then yeah, that's going to be a headache. And I'm sure it's even more of a headache for the weaker S.

PS4 and PS5 have a unified pool of RAM for this reason. And PS5 made strides in removing bottlenecks for devs, not adding some.

I'm particularly worried about how Series S will hamstring things going forward. Alex's description of how a generic API is applied across two different hardware configs (which is how MS intended to streamline dev with the GDK and DX12U), plus the sheer number of options for implementing the same thing (or getting the same result) in DX12U that can impact performance, i.e. sometimes simply too much choice for devs, is a bit concerning.

It leads me to think that the only way teams working on more ambitious games can mitigate gimped performance impacting the Series X version is to take even more time out of their dev schedules to apply specific solutions for the Series S builds, which could require a lot of experimentation.

Or maybe, just maybe the PS5 is just as powerful as the XBSX.

Imo, there is nothing wrong with the XBSX hardware or software.

I think we let teraflops take over and stop us from appreciating how well engineered these consoles are.

Well, like Lysandros said, "powerful" can mean different things. For Series X it's theoretical peak compute, raw RAM bandwidth (on 10 GB of the memory) and a (barely) faster CPU. PS5's advantages are in quite a lot of other areas, some of which are more pertinent to actual gaming performance.

But I agree with you that both systems are well-engineered for their intended strengths. It just helps to also recognize that priorities were not necessarily the same for both cases.

"In terms of how the memory is allocated, games get a total of 13.5GB in total, which encompasses all 10GB of GPU optimal memory and 3.5GB of standard memory. This leaves 2.5GB of GDDR6 memory from the slower pool for the operating system and the front-end shell. From Microsoft's perspective, it is still a unified memory system, even if performance can vary. "In conversations with developers, it's typically easy for games to more than fill up their standard memory quota with CPU, audio data, stack data, and executable data, script data, and developers like such a trade-off when it gives them more potential bandwidth,"

You won't need more than 10GB for the GPU; first-party games will leverage Sampler Feedback Streaming, which will have a huge impact



Therefore Series X has much higher peak bandwidth and, according to DF, more RAM available to games. As games get more compute-heavy this gen, the wider architecture will prove its worth, just as in the PC GPU space where top-range cards are wider rather than narrower and faster-clocked.
We can see with Forza Motorsport that they are already pushing ahead with 4K 60fps and RT on track. It's just the start.

I think you're misunderstanding SFS's purpose. Sampler Feedback is meant to reduce the number of render passes when only portions of screen space using a target texture are updated. Sampler Feedback Streaming is designed to cut down the amount of texture data needing to be stored in memory when a texture will be utilized for only select portions of an object's rendered result.

In other words, for cases where the texture in question is only being partially used (in terms of the aspects of that texture) and by either one or a number of objects in the scene, SFS is designed to reduce the need for the entire texture to be loaded and stored in memory. But just THAT texture. Now, if it's a lot of textures that are otherwise each only having small portions of the actual texture being used for objects in that scene, theoretically SFS can start to make more of an impact.

However, like someone else said earlier in the thread, that isn't going to automatically produce some massive advantage for Series X where some of the issues I raised just completely disappear. You can have a situation where, say, Series X only needs half of an actual texture to be in memory, but it's not like the PS5 can't replicate similar access with its I/O subsystem. And it seems that in PS5's case, this is a lot easier to leverage for the time being.

You have to keep in mind that PS5's SSD subsystem is set up to mimic DRAM access patterns in terms of channels. More channels increase granularity of access, and there are facets to the compression schemes of both systems (when it comes to data) worth considering as well. But even excusing that, even if the presence of SFS leads to what you describe, it doesn't absolve the concerns around CPU-bound game data, because in scenarios where the system needs the CPU to access data for a decent portion of the time, Series X's total system bandwidth is dragged down by the slower pool.

In many of those scenarios it would still have overall bandwidth higher than PS5's, but not by a lot. Also worth considering: in any instance where portions of a texture currently not in memory (because the SFS map determined those portions weren't needed for an earlier scene) need to be loaded into memory later, at those points the load from SSD to memory still acts as the lowest point in the pipeline, and as more portions belonging to the same texture need to be stored in RAM, the benefits of SFS become further reduced as the game state progresses.

Also I'm not 100% sure on how SFS "samples" the texture but I'm almost sure that the full texture itself needs to either be (very briefly) placed in memory first (and the unneeded parts dumped out), or maybe some part of the system is using a FIFO buffer to read the data as it's being accessed to discard unneeded data before writing results to RAM. If you know anything specifically to that part please clarify for me.

Can the developer decide or control where the data goes? Because if you can't control it and you are getting different, random latency impacting your engine, I can imagine a lot of developers will give up the additional memory to make testing the engine easier.
It's already complicated enough that you don't need more complicated memory to manage. I wonder if only MS games might actually end up well optimized for XSX, especially if the sales numbers are behind the PS5.

Yeah, developers can decide where the data goes, and I'd suspect the kernel & OS assist in that to a small degree as well. They kinda have to have control over where that data goes; it would be a nightmare if a full 3.5 GB worth of GPU data went into the 6 GB pool, because that'd only mean 7 GB of GPU-bound data in the faster pool, and modern games would need a lot more than that.
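To make the "devs decide where it goes" point concrete, here's a toy model in Python of steering allocations between the two pools. It's purely illustrative (the pool names, function and sizes are my own, not the actual GDK allocation API), but it shows why overflow GPU data ends up in the slow pool once the fast 10 GB fills up:

```python
# Toy model of Series X's two memory pools -- NOT the real GDK API, just an
# illustration of why allocation placement matters. Sizes in GB, bandwidth in GB/s.
POOLS = {
    "gpu_optimal": {"bandwidth": 560, "free": 10.0},
    "standard":    {"bandwidth": 336, "free": 3.5},  # game-visible slice of the 6 GB pool
}

def place(alloc_gb, gpu_bound):
    """Prefer the fast pool for GPU-bound data, the slow pool for CPU/audio data."""
    order = ["gpu_optimal", "standard"] if gpu_bound else ["standard", "gpu_optimal"]
    for name in order:
        if POOLS[name]["free"] >= alloc_gb:
            POOLS[name]["free"] -= alloc_gb
            return name
    return None  # nothing fits -- stream from SSD instead

print(place(9.5, gpu_bound=True))   # textures/render targets -> 'gpu_optimal'
print(place(2.0, gpu_bound=False))  # CPU, audio, script data -> 'standard'
print(place(1.0, gpu_bound=True))   # overflow GPU data spills into the 336 GB/s pool
```

The last call is the problem case: once the fast pool is full, any further GPU-bound data can only live in the slower pool (or on the SSD), which is exactly the contention scenario discussed later in the thread.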
 

TonyK

Member
you can disagree with my logic. even if a dev were to give an EQUAL amount of time and care to both Xbox and PS5, the GIVEN time would still have to be DIVIDED between Xbox Series S and X. there's no magical button that scales a game back to Series S's capabilities. it simply does not exist.
Obviously no, but these developers are used to PC, where there are zillions of configurations. Games are done with that in mind. They don't do the best version possible and then, during the last three months of development, scale down the graphics. Game development doesn't work that way.
 

yamaci17

Member
Obviously no, but these developers are used to PC, where there are zillions of configurations. Games are done with that in mind. They don't do the best version possible and then, during the last three months of development, scale down the graphics. Game development doesn't work that way.
and? aren't most games on PC disastrous (per most console folk) ?
 

Lysandros

Member
I haven't trusted MS with hardware specs since last gen when they claimed the Xbox One had more bandwidth than the PS4

They said the Xbox One's bandwidth between the ESRAM and DDR3 could be combined to get 204 GB/s, except you can't sustain that speed every cycle with the small amount of ESRAM they put in the console
They actually combined read and write figures in a pretty fantastical claim; the real figure was 102 GB/s for that 'hefty' pool of 32 MB.
 

damidu

Member
wow, a dissertation. send it to Phil

i've been told here the issues are because of da toolzz (those should come along nicely towards the end of the generation, fingers crossed)
or was it sony fanboy devs?
it's one of those things
 

MarkMe2525

Gold Member
I don't know; if that were the case I would expect to hear more about it from the people developing the games. Of course, this is coming from someone who has no expertise on the subject.

John and Alex recently brought up an interesting point about how MS's API works. They were alluding to the fact that while the API does allow for more customization of certain processes on the platform, a lot can be "dragged and dropped", for lack of a better term.

Alex specifically brought up how RT is implemented on the system and mentioned how this approach may cause devs to go the route of least resistance. The speculation was that this can lead to inefficiencies.

Again, I have no idea and I am just parroting what was said on the recent DF direct.
 

Riky

$MSFT


All explained here, and the SFS multiplier works regardless of the complexity of the scene, so you're just moving a small percentage of the data rather than inefficiently trying to move all the data quickly, which would require a lot more optimisation and which we just haven't seen so far in any software, just slightly faster loading times.
Now that both SFS and Mesh Shaders are in the GDK, along with the first truly plumbed-in Tier 2 VRS in Forza, we're going to see large performance boosts. Series S will also benefit greatly from SFS and Mesh Shaders, even more so than from Tier 2 VRS, which is resolution dependent.
Forward-thinking hardware design that maximises efficiency.
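For anyone wondering what that multiplier claim amounts to in practice, the arithmetic is simple. A quick sketch follows, with the caveat that the ~2.5x figure comes from Microsoft's own "2x to 3x" marketing claim, and both the sampled fraction and the texture budget below are assumptions for illustration, not something measured in a shipping game:

```python
# Back-of-the-envelope for the claimed SFS memory/IO multiplier.
# Assumption (from MS's own ~2x-3x claim): only ~40% of loaded texture
# data is actually sampled in a typical scene.
sampled_fraction = 0.40            # assumed, not measured
multiplier = 1 / sampled_fraction  # 2.5x

texture_budget_gb = 7.0            # say 7 GB of the 10 GB pool holds textures (assumption)
effective_gb = texture_budget_gb * multiplier

print(f"{multiplier:.1f}x multiplier: {texture_budget_gb} GB of resident tiles "
      f"stands in for ~{effective_gb:.1f} GB of naively loaded textures")
```

Of course that only applies to the texture slice of memory, and only if residency in real games actually works out that way, which is the part we haven't really seen demonstrated outside of tech demos.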
 
Digital Foundry actually spoke with devs about this performance advantage and John said devs themselves were baffled by it. Some devs think it's the DirectX overhead, but there was no consensus.

Alex speculated that everyone is still using old DXR APIs. He also speculated that because the PS5 uses new RT APIs, devs had to do MORE work to get it up and running and ended up optimizing those PS5 APIs more compared to the DXR APIs. Odd reasoning considering devs have had 5 years of experience with RT APIs on PC, and it also doesn't reconcile with the fact that multiplatform game dev is done on PC first and then ported to consoles.

What's important is what devs have told them, which is that they don't fucking know. PS5 literally has some secret sauce in it that is making it perform better than its specs. Memory management was not brought up. In fact, John said no devs are complaining about the Xbox or the PS5.

Sometimes it just boils down to Mark Cerny am God.

Timestamped:

Mark Cerny is a master engineer, and for whatever reason he works with Sony on designing their hardware. If Sony are smart they'll keep him engineering their hardware until he's in the ground.

Isn't he one of those kids who graduated high school when he was like 10? Always made me wonder if JP is loosely based on him.
 
GPU is accessing memory for 66% of that time, and the CPU accesses memory for the other 33% and the audio accesses
It's way too high. The CPU will never need that much bandwidth. I believe some knowledgeable people did a worst-case calculation years ago and ended up with an average bandwidth of about 520 GB/s (instead of 560 GB/s, and that's ignoring the usual memory contention problems that also occur on PS5).

But that's the worst-case scenario (if the CPU uses all its available bandwidth). In most cases the CPU shouldn't need that much bandwidth thanks to the caches.
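One way to back into a number in that ballpark, assuming the CPU's worst case is somewhere around 48 GB/s and that it can only pull that from the slower pool (both assumptions for illustration, not the original calculation):

```python
# Rough worst-case contention estimate for Series X.
# Assumptions: CPU worst case ~48 GB/s, served entirely from the 336 GB/s pool,
# and the bus serves one client at a time.
fast_bw, slow_bw = 560, 336   # GB/s
cpu_demand = 48               # GB/s, assumed

cpu_share = cpu_demand / slow_bw     # fraction of each second the CPU occupies the bus
gpu_share = 1 - cpu_share

effective = gpu_share * fast_bw + cpu_share * slow_bw
print(f"CPU on the bus ~{cpu_share:.0%} of the time -> effective ~{effective:.0f} GB/s")
```

Tweak the assumed CPU demand a little and you land right around that ~520 GB/s figure; either way the hit from CPU traffic alone is fairly small.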
 

Yoboman

Member
i mean, in the end let's be realistic, from a dev's perspective:

PS5 has like a 65% userbase share (I'm being optimistic for Xbox)
Xbox Series X + Series S have like a 35% share (as I said, optimistic numbers)

LOGICALLY... if you have 100 units of optimization time and I was a dev, 65 units would go to PlayStation and 35 units would go to Xbox Series X + S. It is only fair: that console has the bigger userbase. I'm not the owner of the console, nor do I have an obligation to provide "equal" optimization. If there were a third console brand that only has a 1% install base, would you allocate your time equally between the three? hell no. This is why older generations of GPUs get progressively worse performance. Take Kepler/Fermi/Maxwell GPUs: they're now being phased out. Why would any dev tailor their game for Maxwell or GCN 3/4/5 anymore? They have an extremely niche userbase. But take good notice of how much better Pascal (GTX 1000 series) has aged over the last 3 years. Even the latest Spider-Man and Uncharted ports (and many more PS ports) RUN better on Pascal than on GCN cards, despite those games being literally designed for GCN consoles. It is clear most devs give some importance to Pascal optimization, since it represents the biggest chunk of the PC install base, from the 1050 Ti to the 1080, with the 1060 still being a very popular GPU.

now, let's return to the initial discussion. 65 units of time go to PS5. plenty. then you have 35 units of time going to Xbox Series X... AND Series S.

that's where the fun comes in! you must now scale back your entire game (audio, physics, draw distance, resolution, textures and more) carefully enough that the game doesn't get destroyed or lose its essence on Series S, an even more niche userbase, all the while dealing with a more intricate, complex and less optimized API in DX12, with weird memory pools.

like literally ALL the odds are against Xbox Series X + S. Microsoft dug this grave themselves. only their own studios can do shit at this point. and I hope they do, so at least people can say these consoles are actually capable when devs exclusively code for them. other than that, the writing is on the wall. most 3rd party games are gonna struggle on Series X + Series S.

you can disagree with my logic. even if a dev were to give an EQUAL amount of time and care to both Xbox and PS5, the GIVEN time would still have to be DIVIDED between Xbox Series S and X. there's no magical button that scales a game back to Series S's capabilities. it simply does not exist.
Sure. But that theory doesn't seem to stack up with recent history. Xbox One X was the lowest unit count last gen, a portion of a portion of the market. Yet that lower priority didn't stop devs pushing its technical advantages. And that was happening while dividing development resources between two PS4 models and two Xbox models as well.

PS5 may be easier to develop for, but this is not like the PS3 vs 360 gulf in ease of development.

Let's not forget the narrative was that Series X had a massive technology advantage, as big as or bigger than Xbox One X vs PS4 Pro. And DF were one of the main proponents of that propaganda. So some good reporting on how they got it so wrong should be expected.
 

GHG

Gold Member
Obviously no, but these developers are used to PC, where there are zillions of configurations. Games are done with that in mind. They don't do the best version possible and then, during the last three months of development, scale down the graphics. Game development doesn't work that way.

And do you know what PC games have? A settings menu. Devs are not specifically coding games to be compatible with zillions of hardware configs; it doesn't work like that. DirectX and Proton exist to take care of that for them.

Within reason, devs can do what they want with their PC releases in terms of optimisation, and we are seeing that with increasing frequency at the moment. There is no guarantee a game needs to run a particular way on any specific hardware spec (or combination of hardware). The minimum and recommended specs exist to serve as guidance for the end user, not a guarantee. They don't have that luxury when developing for consoles.
 

cireza

Member


All explained here, and the SFS multiplier works regardless of the complexity of the scene, so you're just moving a small percentage of the data rather than inefficiently trying to move all the data quickly, which would require a lot more optimisation and which we just haven't seen so far in any software, just slightly faster loading times.
Now that both SFS and Mesh Shaders are in the GDK, along with the first truly plumbed-in Tier 2 VRS in Forza, we're going to see large performance boosts. Series S will also benefit greatly from SFS and Mesh Shaders, even more so than from Tier 2 VRS, which is resolution dependent.
Forward-thinking hardware design that maximises efficiency.

As with many technologies, it could be great and effective. And it could probably salvage Series S in the long run.

However, the real question is: how likely are third-party developers to invest in such technology if it is specific to Xbox? Are they really going to change their engine and development stack to conform to this technique? I strongly doubt it. In current times, third parties rely on tools and technologies that are easily adaptable and scale to all configurations, because making a lot of platform-specific effort is simply out of the question.

It might be the greatest tech ever; if nobody uses it, it will be pointless. I guess some first-party games will, obviously, but I doubt it will spread outside of that spectrum.
 

Riky

$MSFT
As with many technologies, it could be great and effective. And it could probably salvage Series S in the long run.

However, the real question is: how likely are third-party developers to invest in such technology if it is specific to Xbox? Are they really going to change their engine and development stack to conform to this technique? I strongly doubt it. In current times, third parties rely on tools and technologies that are easily adaptable and scale to all configurations, because making a lot of platform-specific effort is simply out of the question.

It might be the greatest tech ever; if nobody uses it, it will be pointless. I guess some first-party games will, obviously, but I doubt it will spread outside of that spectrum.

Third parties will obviously take the path of least resistance, but I think in the long term the unified GDK will make a difference in next-gen-only games, of which so far we've seen hardly any.

Microsoft needs to make the integration as easy as possible for third parties, and that starts on PC. However, as with every generation, you're right that first-party software will set the standards.
 

diffusionx

Gold Member


All explained here, and the SFS multiplier works regardless of the complexity of the scene, so you're just moving a small percentage of the data rather than inefficiently trying to move all the data quickly, which would require a lot more optimisation and which we just haven't seen so far in any software, just slightly faster loading times.
Now that both SFS and Mesh Shaders are in the GDK, along with the first truly plumbed-in Tier 2 VRS in Forza, we're going to see large performance boosts. Series S will also benefit greatly from SFS and Mesh Shaders, even more so than from Tier 2 VRS, which is resolution dependent.
Forward-thinking hardware design that maximises efficiency.

two more weeks, Microbros
 

Hobbygaming

has been asked to post in 'Grounded' mode.


All explained here, and the SFS multiplier works regardless of the complexity of the scene, so you're just moving a small percentage of the data rather than inefficiently trying to move all the data quickly, which would require a lot more optimisation and which we just haven't seen so far in any software, just slightly faster loading times.
Now that both SFS and Mesh Shaders are in the GDK, along with the first truly plumbed-in Tier 2 VRS in Forza, we're going to see large performance boosts. Series S will also benefit greatly from SFS and Mesh Shaders, even more so than from Tier 2 VRS, which is resolution dependent.
Forward-thinking hardware design that maximises efficiency.

It's interesting that the Velocity Architecture uses the Xbox Series X SSD, which can load 2.4 GB/s of uncompressed data.

PS5's SSD can do 5.5 GB/s of uncompressed data and also has hardware decompression like Xbox.

I would wait it out before touting these features as a big advantage. Some of it could very well be marketing, which MS hasn't always been the most truthful with.
 
Sampler Feedback Streaming (SFS) was developed to combat the very issues described by the OP. The problem is that devs are just not using it. Along with a number of other next-gen technologies going unused or under-utilised (mesh shaders, VRS (some titles now use this), DirectML, etc.), this has contributed to a rather lacklustre generation thus far. Ultimately, Microsoft is responsible. First parties should be encouraged (or perhaps mandated) to utilise the more advanced features possible on the Series generation devices. As long as we are still living in this cross-gen hellscape we sadly won't see what the Xbox is capable of. Only basic increases to resolution and frame rate and a couple of extra effects seem to be the difference since last gen.

Phil Spencer, though better than his predecessors in many ways, has given us no system-selling games whatsoever. If we look at first-party titles released under his tenure, the situation is laughably bad. The reality is that Xbox has been woefully mismanaged since the early days, i.e. the OG Xbox and 360. Since then it has been a joke. It's sad as it's us who lose out.

Seriously, it's laughably pathetic that we've not seen ONE game use SFS or DirectML and we're 2.5 years after release!
 
Good post

This is the sort of analysis, maybe with some dev interviews, that you would expect from a site like Digital Foundry about Series X performance, especially after a year and a half of harping on about the power of 12 teraflops.

Instead, all they've given us is hand-waving about tools and suggestions that the devs didn't optimise enough

I've been skipping most things Digital Foundry related lately. They're mostly a fluff outlet for big gaming companies and never take Sony or MS to task. The only time I've seen them do it was after the abysmal Halo Infinite demo.
 

Hoddi

Member
Also I'm not 100% sure on how SFS "samples" the texture but I'm almost sure that the full texture itself needs to either be (very briefly) placed in memory first (and the unneeded parts dumped out), or maybe some part of the system is using a FIFO buffer to read the data as it's being accessed to discard unneeded data before writing results to RAM. If you know anything specifically to that part please clarify for me.
SFS textures are chunked into 64KB blocks. The parts of the texture that are visible onscreen only need those 64KB blocks read from disk and never the full texture (~350MB at 16k with BC7 compression). This also means that disk IO rates scale fairly linearly with rendering resolution where 4k needs 3-4x the disk throughput of 1080p because it needs to read more 64KB blocks.

There's a demo available for it here. It's easy to track disk reads in Windows' Resource Monitor if you want to give it a try.
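To put rough numbers on that (my own arithmetic, assuming BC7 at ~1 byte per texel, 64 KB tiles, and roughly one unique sampled texel per rendered pixel, so treat it as an illustration rather than figures from the demo):

```python
# Rough arithmetic behind SFS tile streaming.
# Assumptions: BC7 ~1 byte/texel, 64 KB tiles (256x256 texels), and about one
# unique sampled texel per rendered pixel.
TILE_BYTES = 64 * 1024
TEXELS_PER_TILE = TILE_BYTES        # at ~1 byte/texel

full_16k_bytes = 16384 * 16384 * 4 / 3   # top mip plus ~1/3 for the rest of the chain
print(f"Full 16K BC7 texture: ~{full_16k_bytes / 2**20:.0f} MB")

for name, w, h in [("1080p", 1920, 1080), ("4K", 3840, 2160)]:
    tiles = w * h / TEXELS_PER_TILE
    mb = tiles * TILE_BYTES / 2**20
    print(f"{name}: ~{tiles:.0f} tiles touched ≈ {mb:.1f} MB read from disk")
```

Which is where the roughly 4x scaling in disk reads from 1080p to 4K comes from.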
 

Amiga

Member
there is no problem with the memory setup of the XSX, it is good, PS5 is just better.

It's not just the RAM. The memory flow through the various caches and connectors from SSD -> CPU -> GPU makes a difference. On the PS5 the whole pipeline is optimized better than on the XSX.

Today, memory management is the crux of the fight between AMD/Intel/Nvidia.
 

knocksky

Banned
To really give you an answer I would have to check the tools, APIs and an extensive amount of documentation; if you're not a software developer I recommend you go start a bait console war thread instead.

You're expecting people to provide an answer with a bunch of gibberish and hypotheticals… here, full of 12-year-olds and man-children.

Lock this shit.
Don't forget the poll that was added, too. Because the reality is that this place isn't really stricken with fanboys; they are in fact all devs knowledgeable enough to give us the answers and make that poll actually mean something.
 
Oh yeah, Dead Space. While the software implementation on PS5 (at launch) looked terrible, I don't think anyone can say the XSX hardware implementation looked particularly good either. Combined with eye tracking in VR (i.e. "foveated rendering") it makes sense, but if you're doing it on a display where the user can look wherever they want without the game knowing about it, I'm not sure it'll ever be the silver bullet some make it out to be. Unless the drop in shading quality is just minuscule, and then I don't think the performance gains would be that substantial.
It is definitely worth it when implemented properly. I'm sure I read somewhere that it is typically worth ~10% perf improvement at little cost. Admittedly, however, it does depend on the title. For example, Doom Eternal features one of the few hardware-based Tier 2 VRS implementations.
 

nowhat

Gold Member


All explained here, and the SFS multiplier works regardless of the complexity of the scene, so you're just moving a small percentage of the data rather than inefficiently trying to move all the data quickly, which would require a lot more optimisation and which we just haven't seen so far in any software, just slightly faster loading times.
Now that both SFS and Mesh Shaders are in the GDK, along with the first truly plumbed-in Tier 2 VRS in Forza, we're going to see large performance boosts. Series S will also benefit greatly from SFS and Mesh Shaders, even more so than from Tier 2 VRS, which is resolution dependent.
Forward-thinking hardware design that maximises efficiency.

...riiiiight, just let me know when that translates into real world performance. You know. Where Series S curb-stomps PS5. This is what was promised to me.
 

LordOfChaos

Member
People forget that PlayStation’s OS is BSD based. Super stable and incredibly efficient. This has probably nothing to do with the original post but I am 100% not reading all of that.

Going back to PS4 vs XBO, this was a definite added factor, as not only was the PS4's OS fully native, but Xbox was also sort of trying to virtualize multiple operating systems.

That also turned around and meant Xbox could mess with clocks and BC more than Sony, where Sony went for more careful approaches, with only optional boost modes unless a developer added a patch, and no clock-bumped revisions like the One S. But it might still be that Sony's OS and bespoke GNM API are more efficient than the Xbox OS and DirectX.
 

Hobbygaming

has been asked to post in 'Grounded' mode.
Seriously, it's laughably pathetic that we've not seen ONE game use SFS or DirectML and we're 2.5 years after release!
Insomniac Games, to my knowledge, have been the only developers to use machine learning so far, and that was just for muscle deformation in Spider-Man: Miles Morales lol. it did look better than before though
 
At this point, I honestly don't see anything offered in the Velocity Architecture that will move the needle that much; we're coming up on year three. The XSX memory config makes absolutely zero sense from a performance standpoint. But it does make sense from a price standpoint: as many have pointed out, 20GB would have been the preferred setup and would have avoided this "split" memory configuration.
 


So, I'm not really going to get into the performance results between Series X and PS5 over the past few game releases, or even necessarily claim that this thread is a response to "why" the performance deltas have been popping up outside of Microsoft's favor. I think in most cases we can take perspectives like @NXGamer's, which focus on API differences, and even Digital Foundry's insights (specifically from Alex) on how MS's approach of a platform-agnostic API set in the GDK might prevent specific optimizations from being applied, because it can be a crapshoot for devs to figure out which option is best for their specific game (this was referenced in relation to Atomic Heart).

However, I do think it's worth talking a bit about Series X's memory situation, because I do think it plays a part in some of the performance issues that pop up. As we know, Series X has 16 GB of GDDR6 memory, but "split" into a 10 GB pool running at 560 GB/s and a 6 GB pool running at 336 GB/s. The 10 GB pool is referred to as "GPU-optimized", while the 6 GB pool is partially reserved for the OS, with the remaining 3.5 GB of that block being used for CPU and audio data.

Some clarification: the Series X memory is not physically "split". It's not one type of memory for the GPU and a completely different type of memory for the CPU, as was the case for systems like the PS3, or virtually all pre-7th gen consoles (outside of exceptions like the N64, which had its own issues with bad memory latency). It is also not "split" in the sense that the 10 GB and 6 GB are treated as separate virtual address spaces. AFAIK, game applications see the two as one virtual address space, though I suspect the kernel helps out there, given that the two pools of memory run at different bandwidths and therefore can't be completely considered hUMA (heterogeneous Unified Memory Architecture) in the same way the PS5's memory setup is.
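For what it's worth, the two bandwidth figures fall straight out of the chip layout: ten GDDR6 devices at 14 Gbps on a 320-bit bus, six of them 2 GB and four of them 1 GB. The first 10 GB interleaves across all ten chips; the remaining 6 GB lives only on the six larger chips. Quick arithmetic with those public figures:

```python
# Series X pool bandwidths from the documented chip layout.
GBPS = 14            # GDDR6 data rate per pin
BITS_PER_CHIP = 32   # each GDDR6 device is 32 bits wide

def pool_bw(chips):
    return chips * BITS_PER_CHIP * GBPS / 8   # GB/s

print(pool_bw(10))   # 10 GB span, striped across all ten chips -> 560.0 GB/s
print(pool_bw(6))    # upper 6 GB, only on the six 2 GB chips   -> 336.0 GB/s
```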

But now come the more interesting parts. The thing about Series X's advertised memory bandwidths is that they are only achieved if only THAT part of the total memory pool is accessed for the duration of a second. If, for a given slice of frame time, a game has to access data for the CPU or audio, then that's a portion of the second the 10 GB capacity is NOT being accessed, but rather the slower 336 GB/s portion of memory. Systems like the PS5 have to deal with this type of bus contention as well, since all the components share the same unified memory. However, bus contention becomes more of an issue for Series X due to its memory configuration. Since data for the GPU needs to sit within a certain physical address range (otherwise it's not possible for the GPU to leverage the peak 560 GB/s bandwidth), it creates a complication for memory access that a fully hUMA design (where the bandwidth is uniform across the entire memory pool) isn't afflicted with.

This alone means that, since very few processes in an actual game are 100% GPU-bound for a given set of consecutive frames over the course of a second, it's extremely rare that Series X ever sustains the full 560 GB/s bandwidth of the 10 GB capacity in practice. For example, for a given second let's say the GPU is accessing memory for 66% of that time, the CPU accesses memory for the other 33%, and the audio accesses it for 1% of that time. Effective memory bandwidth would theoretically be (2/3 × 560) + (1/3 × 336) ≈ 485 GB/s. However, that isn't taking into account any kernel or OS-side management of the memory. On that note I can't profess to knowing much, but I'd say that since, outside of a situation I'll describe in a bit, the Series X doesn't need to copy data from the 6 GB to the 10 GB (or vice versa), you aren't going to run into the kind of scenario you see on PC. So whatever kernel/OS overhead there is for managing this setup is minimal, and for this example effective bandwidth would be around the 485 GB/s mark.
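Spelled out, that example is just a time-weighted average of the two pool bandwidths. A quick sketch with the same assumed split (the 66/33/1 numbers are illustrative, as in the paragraph above, not measurements):

```python
# Time-weighted effective bandwidth for the 66% GPU / 33% CPU / 1% audio example.
FAST, SLOW = 560, 336   # GB/s

bus_time = [
    (0.66, FAST),   # GPU reading from the 10 GB "GPU optimal" span
    (0.33, SLOW),   # CPU data lives in the slower 3.5 GB slice
    (0.01, SLOW),   # audio data, same slice
]

effective = sum(share * bw for share, bw in bus_time)
print(f"Effective bandwidth ≈ {effective:.0f} GB/s")   # ≈ 484 GB/s, vs 560 peak and PS5's flat 448
```

Give or take rounding, that's the ~485 GB/s figure above.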

As you can see, though, that mixed result is notably less than the 560 GB/s of the 10 GB GPU-optimized memory pool; in fact it isn't much larger than PS5's 448 GB/s. Add to that the fact that on PS5, features like the cache scrubbers reduce the need for the GPU to hit main memory as often, and there is wholly dedicated hardware for enforcing cache coherency (IIRC the Series X has to leverage the CPU to handle at least a good portion of this, which nullifies much of its minuscule MHz advantage CPU-side anyway), and it becomes easy to picture how these two systems perform relatively on par in spite of some larger paper spec advantages for Series X.

What happens, though, when there's a need for more than 10 GB of graphics data? This is the kind of scenario where Series X's memory setup really shows its quirky problems, IMO. There are basically two options, both of which require some big compromises. Devs can either reserve a portion of the available 3.5 GB in the 6 GB pool for overflow graphics data to copy over (and overwrite) a portion of what's in the 10 GB pool (leaving less room for non-graphics data), or they can access that data from the SSD, which is much slower and has higher latency, and would still require a read and copy operation into memory, eating up a ton of cycles. Neither of these is optimal, but they become more likely if a game needs more than 10 GB of graphics data. Data can be compressed in memory and then decompressed by the GPU when it accesses it, and that helps provide some room, but this isn't exclusive to Series X, so in equivalent games across it and PS5 the latter still has an advantage due to its fully unified memory and uniform bandwidth across its entire pool.

Taking the earlier example and applying it here: if the GPU needs to access 11 GB of graphics data instead of 10, it has to use 1 GB in the other 6 GB pool to do this. That would reduce the capacity for CPU & audio data down to 2.5 GB. Again, take the same situation where for a given second the GPU accesses memory 66% of the time (but spends 25% of its access time on that extra 1 GB), the CPU 33% and audio 1%, and you get ((2/3 × 560) × 0.75) + ((1/3 × 336) × 1.4125) ≈ 438 GB/s. Remember, the GPU is now only accessing the 10 GB capacity for three quarters of its 66% of memory time; it spends 25% of its access time in the 6 GB capacity, so in reality the 6 GB capacity is being accessed a little over 40% of the total time in that second.
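Same model for the 11 GB case, with the assumption that a quarter of the GPU's access time now hits the spilled gigabyte in the slow pool (again, illustrative numbers rather than measurements):

```python
# Time-weighted model once 1 GB of GPU data has spilled into the slow pool.
# Assumed: GPU on the bus 66% of the second (25% of that in the slow pool),
# CPU 33%, audio 1%.
FAST, SLOW = 560, 336   # GB/s
gpu, cpu, audio = 0.66, 0.33, 0.01
gpu_slow = 0.25          # fraction of GPU access time spent on the spilled data

fast_share = gpu * (1 - gpu_slow)
slow_share = gpu * gpu_slow + cpu + audio

effective = fast_share * FAST + slow_share * SLOW
print(f"Effective bandwidth ≈ {effective:.0f} GB/s")
```

Depending on exactly how you weight the overlap you land somewhere between the ~438 GB/s above and ~447 GB/s, i.e. right around (or below) PS5's flat 448 GB/s, which is the point.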

Either way, as you can see, that's a scenario where due to needing more than 10 GB of graphics data, the total contention on the bus pulls overall effective bandwidth down considerably. That 438 GB/s is 10 GB/s lower than PS5's bandwidth and, again, PS5 has the benefit of cache scrubbers which can help (to a decent degree, though it varies wildly) reduce the GPU's need to access the RAM. PS5 also having beefier offloaded hardware for enforcing cache coherency helps "essentially" maximize its total system bandwidth usage (talking RAM and caches, here), as well. For any amount of data the GPU needs which is beyond the 10 GB capacity threshold, if that data is in the 6 GB pool, then the GPU will be bottlenecked by that pool's maximum bandwidth access, and that bottleneck is compounded the longer the GPU needs that particular chunk of data which is outside of the 10 GB portion.

As cross-gen games fade out and games become more CPU-heavy in terms of data needing to be processed, this might present a small problem for Series X relative to the PS5 as the generation goes on. But it's the increasing likelihood of more data being needed for the GPU which presents the bigger problem for Series X going forward. Technically speaking, Series X DOES have a compute advantage over PS5, but the problem its GPU could face isn't really tied to processing, but to data capacity. ANY amount of GPU-bound data that needs to sit outside of the 10 GB pool will essentially drag down total system bandwidth, in proportion to how frequently the GPU needs that additional information. Whether that's graphics data or AI or logic or physics (for the GPU to process via GPGPU), it doesn't change the complication.

There's only so much you can compress graphics data for the GPU to decompress when accessing it from memory; neither MS nor Sony can offer compression on that front beyond what AMD's RDNA2 hardware allows, and that benefit is shared by both PS5 and Series X, so entertaining it as a solution to the capacity problem (or rather, the capacity configuration problem) on Series X isn't going to work. There isn't really anything Microsoft could do to fix this outside of giving the Series X a uniform 20 GB of RAM. Even if that 20 GB were kneecapped to the same bandwidth as the current Series X, all of the problems I present in this post would go away. But they would still need games to be developed with the 16 GB setup in mind, effectively nullifying that approach.

Microsoft's other option would be to go with faster RAM modules but keep the same 16 GB setup they currently have. The problems would persist in practice, but their actual impact in terms of bandwidth would be nullified thanks to the sheer raw bandwidth increase. Again, Microsoft would still need games to be programmed with the current setup in mind, but this approach would be cheaper than increasing capacity to 20 GB, and while system components would access things the same as they do now, the bandwidth uptick would benefit total system performance on the memory side. Just as an example, the scenario I gave earlier resulting in ~438 GB/s would automatically increase to ~518 GB/s if MS fitted Series X with 16 Gbps chips (for a total peak system bandwidth of 640 GB/s over the current 560 GB/s).
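For reference, that hypothetical upgrade is just swapping the 14 Gbps chips for 16 Gbps parts on the same 320-bit bus, so the peak figures fall straight out (and the per-scenario numbers above would scale roughly in proportion):

```python
# Peak pool bandwidths if Series X used 16 Gbps GDDR6 on the same chip layout.
def pool_bw(chips, gbps):
    return chips * 32 * gbps / 8   # 32-bit chips, result in GB/s

for gbps in (14, 16):
    print(f"{gbps} Gbps: fast pool {pool_bw(10, gbps):.0f} GB/s, "
          f"slow pool {pool_bw(6, gbps):.0f} GB/s")
```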

Again, I am NOT saying that the current performance issues with games like Atomic Heart, Hogwarts, Wild Hearts etc. on Series X are due to the memory setup. What I'm doing here is illustrating the possible issues which either are or could arise with more multiplat releases going forward, as a result of how the Series X's memory setup functions, and the way it's been designed. These concerns become even more pronounced with games needing more than 10 GB of GPU-bound data (and for long periods of access time cycle-wise), a situation which will inevitably impact Series X in a way it won't impact PS5, due to Series X's memory setup.

So hopefully this serves as an informative post for those wanting to point to an explanation for any future multiplat performance results where the Series X results are less than expected (especially if they are lower than PS5's) and the RAM setup can be identified as a possible culprit (either exclusively or in combination with other things like API/tools overhead, efficiency, etc.). This isn't meant to instigate silly console wars; that said, seeing people like Colteastwood continue to propagate FUD about both consoles online is a partial reason I wanted to write this up. I did hint at other design differences between the two systems which provide some benefits to PS5 in particular, such as the cache scrubbers and the more robust enforcement of cache coherency, but I kept this to something both systems actually have: they both use RAM, at the same capacity, with the same module bandwidth (per chip). SSD I/O could also be a factor in gaming performance differences as the generation goes on, but that is a whole other topic (and in the case of most multiplats, at least for things like load times, the two systems have been very close in results).

Anyway, if there are other tech insights on the PS5 and Series systems you all would want to share to add on top of this, whether to explain what you feel could create probable performance advantages/disadvantages for PlayStation OR Xbox, feel free to share them. Just try to be as accurate as possible and, please, no FUD. There's enough of that on Twitter from influencers 😂
I actually clicked into one of colteastwoods YouTube videos and noped the fuck out quicker than I noped out of RE8 in VR.

The thumbnail and title actually made it seem like it was going to diss the xbox but when you start watching it, it’s just filled with so much delusional mumbo jumbo “talking points”.

I appreciate all the technical details on your post but I’m not so technical myself. However, from a qualitative point of view, it would seem to me that throughout all the recent console generations, the consoles with unified memory have been lauded and the ones with split memory architecture have been derided for being more complex and more difficult to work with, particularly at the beginning of the generation when devs are still learning the tools.

Basically, looking at recent history, it's kinda unlikely that their split memory isn't causing problems; at the very least it's highly possible, amongst other potential issues such as what Alex suggested. Which also makes sense, as Xbox runs on top of their hypervisor and that has got to incur some kind of performance penalty. How efficient is their hypervisor? I'm sure their kernel is closed source, so I guess we cannot know. Everything on UWP is basically virtualised and abstracted from the hardware. The DirectX API running through the hypervisor would also incur a penalty, unless I'm missing something and the hypervisor can be bypassed through DirectX.
 
SFS textures are chunked into 64KB blocks. The parts of the texture that are visible onscreen only need those 64KB blocks read from disk and never the full texture (~350MB at 16k with BC7 compression). This also means that disk IO rates scale fairly linearly with rendering resolution where 4k needs 3-4x the disk throughput of 1080p because it needs to read more 64KB blocks.

There's a demo available for it here. It's easy to track disk reads in Windows' Resource Monitor if you want to give it a try.

Thanks, I appreciate your clarification. So from your description, those textures are arranged into 64 KB portions on the storage itself. The texture is still stored as a contiguous asset but since it's in a format SFS can understand, the system can access the portions needed.

I can see the advantages, for sure. But there are probably some disadvantages as well. For example, any access to storage still incurs orders of magnitude more latency than access to RAM, which in turn incurs orders of magnitude more than data in the caches. If you're going to be using SFS for parts of a texture in a way that effectively results in the whole texture being used in a given scene (at some point), you're probably better off storing the whole texture in RAM anyway, and that's where the advantage of SFS would end (for that particular texture).

If that repeats for a group of unique textures, then you're just better off keeping that group of textures in RAM anyway, rather than incur the access latency penalty of the SSD. I can also see how, at least in theory, the concept of SFS might be at odds with how artists actually tend to make their texture assets; they aren't making them as 64 KB blocks, but at 4K-quality (or larger) sizes. So work on the artist's end stays the same in terms of load.

Meanwhile, managing the feedback stack on the programmer's end probably involves a bit of work, and that may explain why SFS isn't being readily used in a lot of games yet. Though I'm sure at least a few games have to be using it, whether on Xbox or PC. I'm still not convinced it totally resolves some of the concerns I listed in the OP, though, for the reasons I mention in this specific post.

Seriously, it's laughably pathetic that we've not seen ONE game use SFS or DirectML and we're 2.5 years after release!

I'm pretty sure Forspoken is utilizing it, at least on PC, since it's also using DirectStorage there. But really, if no game's using these, that's no one's fault but Microsoft's. They should have had their 1P games making use of these things instead of waiting and hoping for 3P to do so.

As with many technologies, it could be great and effective. And it could probably salvage Series S in the long run.

However, the real question is: how likely are third-party developers to invest in such technology if it is specific to Xbox? Are they really going to change their engine and development stack to conform to this technique? I strongly doubt it. In current times, third parties rely on tools and technologies that are easily adaptable and scale to all configurations, because making a lot of platform-specific effort is simply out of the question.

It might be the greatest tech ever; if nobody uses it, it will be pointless. I guess some first-party games will, obviously, but I doubt it will spread outside of that spectrum.

Don't think this is actually the case. When devs make games for say PS5, they have to use Sony's APIs, because Sony doesn't use Microsoft's APIs or their SDKs. The high-level language that devs might write their applications in runs on all the different platforms, that's true, but for closed boxes like consoles, typically you are using specific APIs and tools designed for that box by the platform holder.

Certain things, like the commercial viability of the platform versus how hard it is for the APIs and tools to extract optimal performance in a reasonable time frame, can impact the rate of adoption, but this is Xbox we're talking about here. It's not some little add-on like the Sega CD or some obscure dead end like the Jaguar. The brand still sells better than a lot of other consoles; in fact, only Sony and Nintendo consoles have generally outsold Microsoft ones. When you consider all of the platform holders that have been in the industry (Sega, NEC, Matsushita/Panasonic/3DO, Atari, FM Towns, Philips, Apple, etc.), the gap between even the best-selling consoles among some of them and the lower-selling Microsoft ones is pretty big.

If something's not being used, chances are it's because the opportunity cost is not worth the headaches in working with getting optimal performance. What you're suggesting, I think, is misguided because having APIs and tools that scale to a wide net of configurations actually seems like it causes problems in settling on very specific approaches that target one particular hardware spec to optimize for it to get the best performance. Which is something Digital Foundry were touching on in their latest podcast.

It's way too high. The CPU will never need that much bandwidth. I believe some knowledgeable people did a worst-case calculation years ago and ended up with an average bandwidth of about 520 GB/s (instead of 560 GB/s, and that's ignoring the usual memory contention problems that also occur on PS5).

But that's the worst-case scenario (if the CPU uses all its available bandwidth). In most cases the CPU shouldn't need that much bandwidth thanks to the caches.

Maybe it's too high, maybe not. It's not just about CPU tapping the RAM for data to calculate on the game's end. If the CPU needs to be used for helping along transfer of data from storage to RAM or vice-versa, that's also time spent occupying the bus. As it's doing so, the GPU won't be able to access RAM contents.

It's a bit more of a sticking point for Series systems because I don't think they have off-chip silicon with DMA to the RAM the way the PS5 does. If they did, I think we'd have learned of it at Hot Chips a few years ago. So at the very least the CPU in the Series consoles is more involved in that process.

Even so, maybe you're right that my example is too high on the CPU and audio's share of bus access, at least in terms of doing so for game data to be processed (and only processed, not moved in/out of RAM <-> storage). But my main intent was to look into situations where Series X needs access to more than 10 GB of data; in that case, any data in the slower pool would drag down effective bandwidth in proportion to the amount of time that data in the slower pool is being accessed.

The thing is, that is going to vary wildly from game to game, and even situation to situation, moment to moment. I'm just exploring a possibility that could become more regular as the generation goes on. The worst-case you bring up assumes the CPU is only occupying the bus for maybe 7.5% of a given second. But then say there's a situation a game needs 11 GB of data that is GPU-bound, and the last GB may need to be accessed say 33% of that time. It still creates a situation where total bandwidth is dragged down, and in that case, it'd be by a lot more than the CPU itself.

You probably still get a total bandwidth higher than PS5's (I got 485.2 GB/s), but how much does that counterbalance PS5 needing to access RAM less often because of the cache scrubbers? That it doesn't need the CPU to do as much heavy lifting for cache coherency because it has dedicated silicon that offloads the requirement? Stuff like this, I think should be considered as well.
 

yamaci17

Member
Sure. But that theory doesn't seem to stack up with recent history. Xbox One X was the lowest unit count last gen, a portion of a portion of the market. Yet that lower priority didn't stop devs pushing its technical advantages. And that was happening while dividing development resources between two PS4 models and two Xbox models as well.

PS5 may be easier to develop for, but this is not like the PS3 vs 360 gulf in ease of development.

Let's not forget the narrative was that Series X had a massive technology advantage, as big as or bigger than Xbox One X vs PS4 Pro. And DF were one of the main proponents of that propaganda. So some good reporting on how they got it so wrong should be expected.
let me stop you there!

PS4 and PS4 PRO DO scale between each other.

- 18 CUs to 36 CUs (perfect CU count alignment). This is so valuable for the PS API that Sony decided to go with the exact same 36 CU count for PS5. I gather even that small detail makes ports between PS4 Pro and PS5 easier. see, you do stuff to EASE things on the dev side. Xbox is doing the opposite (ESRAM, DDR3 RAM, split memory, a massive memory budget discrepancy between S and X, etc.)
- Both have equal amounts of memory: 8 GB GDDR5 to 8 GB GDDR5
- No split memory configurations
- Exact same crap hard disk and CPU (yes, a mere bump from 1.6 GHz to 2.2 GHz; this only ended up meaning a less stable 30 FPS for the base PS4 in certain games)

Even Xbox One has that very same 8 GB memory budget.

The one and ONLY reason Series S provides a massive challenge is the reduced memory budget. all consoles last gen had equal memory budgets (except One X, which had a 12 GB budget instead, and most likely that helped it get extra native 4K benefits in certain games)

you never skimp on RAM. Microsoft did. I do not know how it will reflect on the future, but 3 years from now I'm sure more and more devs will be complaining about it
 