
Xbox Velocity Architecture - 100 GB is instantly accessible by the developer through a custom hardware decompression block

Panajev2001a

GAF's Pleasant Genius
The term "majority" doesn't really provide any insight... and you know that. Lol

Sure, but you need a bit more than that to jump to the conclusion that they must be hiding the fact that they only spend 50.01% of the time at that clock speed under average game load. Unless the angle is: Xbox specs == believe and assume the best for everything you do not see specified/measured independently; Sony specs == take with two pinches of salt and assume the worst for everything you do not see specified/measured independently.

What they are saying, that under average load the console should be able to keep max clock speed say 99.9% of the time and drop it by a small percentage (2-3% or so) in some corner cases where the devs did not optimise it out pre-release, seems reasonable.

In most cases this talk about variable clocks gets air time as a vehicle to imply that PS5 performance is much lower than advertised under real-world scenarios... and you know that... lol.
 
The only people who would know are the engineers/game devs who have had hands-on time with the PS5. As of right now we have a few groups of people. We have the group that is hoping, yes trust me they are hoping and praying, that the PS5 cannot come anywhere close to sustaining its peak frequencies, which would mean its TF number is actually lower than 10, closer to 9. We have the other group stating that currently no GPU/CPU is ever used at 100% and that if there is any discrepancy it is minimal. This argument is happening in the same thread as the groups fighting over whether or not the Series X's optimizations bring its SSD speeds closer to the PS5's.

I honestly don't know why people argue about it. Even if facts come out, they won't believe them. There's always going to be something there that they can hold on to for hope. In the end, the games will pretty much look the same; they'll get there in their own way, but they'll get there. They just can't come to grips with the fact that each console has its advantages, each has its "thing", and they are unwilling to give the other any ground to stand on. But could one of them pull ahead? Yeah, it's possible. They both took risks by choosing the routes they chose. Between the two "camps", power and graphics have always been the hill to die on; it's never really been the games.

Logically, if someone states that a particular object or process does something a majority of the time, two questions arise:

1. Why not always?
2. When does it not run at max performance?

They become especially interesting when there is an opportunity to utilize a comparable solution that runs at a faster rate (TFLOPS-wise) with no variability of performance.

Majority means not always, but it can mean usually.

Sony's answers about the circumstances under which that variability would be expected to occur aren't readily available.

Their statements about performance at or near maximum belie the base statement that the system is variable in nature.

Is it variable or fixed? If it's variable, when and why? If it isn't variable, but as good as fixed, why didn't you call it fixed?

The answers to these questions are simply unclear.
 

ToadMan

Member
100 GB can be accessed at 4.8 GB/s, the rest of the SSD cannot be accessed so quickly.

This is incorrect in 2 ways.

First, the SSD transfer rate is 2.4 GB/s (it is the effect of decompression that potentially gives the 4.8 GB/s number; decompression is not the matter at hand here).

Second, 2.4 GB/s is the max transfer rate for the entire SSD, not just 100 GB of it.
 

Panajev2001a

GAF's Pleasant Genius
Logically, if someone states that a particular object or process does something a majority of the time, two questions arise:

1. Why not always?
2. When does it not run at max performance?

They become especially interesting when there is an opportunity to utilize a comparable solution that runs at a faster rate (TFLOPS-wise) with no variability of performance.

Majority means not always, but it can mean usually.

Sony's answers about the circumstances under which that variability would be expected to occur aren't readily available.

Their statements about performance at or near maximum belie the base statement that the system is variable in nature.

Is it variable or fixed? If it's variable, when and why? If it isn't variable, but as good as fixed, why didn't you call it fixed?

The answers to these questions are simply unclear.

This has been explained by Sony and it has been covered here ( https://www.neogaf.com/posts/258225582/ ), etc... sure, we do not have 100% of the data, but we are not in the dark with magic buzzwords nobody can even slightly comprehend being thrown at us...
 

Panajev2001a

GAF's Pleasant Genius
Do you know where that quote came from exactly? It was mentioned as part of the Eurogamer article earlier, but the one I read said 'full clock speed most of the time'. I might've missed it being posted here.

For the record, I imagine that the PS5 does run at peak the vast majority of the time, but if Cerny actually said 'vast majority' it would be helpful if you could provide a link.
Are we arguing "most" vs "majority" of the time semantics now :LOL: as if that made any difference. He could sign a statement that it stays up 99.98% of the time and people would still throw doubt and "concern" at it...
 
Sure, but you need a bit more than that to jump to the conclusion that they must be hiding the fact that they only spend 50.01% of the time at that clock speed under average game load. Unless the angle is: Xbox specs == believe and assume the best for everything you do not see specified/measured independently; Sony specs == take with two pinches of salt and assume the worst for everything you do not see specified/measured independently.

What they are saying, that under average load the console should be able to keep max clock speed say 99.9% of the time and drop it by a small percentage (2-3% or so) in some corner cases where the devs did not optimise it out pre-release, seems reasonable.

In most cases this talk about variable clocks gets air time as a vehicle to imply that PS5 performance is much lower than advertised under real-world scenarios... and you know that... lol.

Sure, but that ISN'T what THEY said. That's what you are saying in their defense.

And 99.9 is as good as fixed, yeah? This isn't a six sigma operation. This is gaming.

It's unclear. The question is whether that lack of clarity is by design or because they can't be arsed to specify beyond the public statement.
 

Panajev2001a

GAF's Pleasant Genius
Do you know something we don't?
Are the PS5's primitive shaders stronger than the Series X's mesh shaders?

I doubt Mark Sony paid money to deviate from RDNA2 when he could choose the bigger APU IP.

Oh Mark Sony did things you cannot even imagine and he may even have told Mark Cerny about it... 😲.
[attached images: UIh7bTr.png, hMK431r.png, ydvwN14.jpg]
 

ToadMan

Member
Logically, if someone states that a particular object or process does something a majority of the time, two questions arise:

1. Why not always?
2. When does it not run at max performance?

They become especially interesting when there is an opportunity to utilize a comparable solution that runs at a faster rate (TFLOPS-wise) with no variability of performance.

Majority means not always, but it can mean usually.

Sony's answers about the circumstances under which that variability would be expected to occur aren't readily available.

Their statements about performance at or near maximum belie the base statement that the system is variable in nature.

Is it variable or fixed? If it's variable, when and why? If it isn't variable, but as good as fixed, why didn't you call it fixed?

The answers to these questions are simply unclear.

The answers to this are clear in Cerny's presentation and his interviews.

They might not be clear to you, I understand, but they have been clearly explained to others.

The simple answer is that the max power (watts available to the GPU and CPU) is capped. If a game doesn't exceed max power, the CPU and GPU will run at 100% clock all of the time.

If, for example, the GPU exceeds its power budget while the CPU is running under its power budget, AMD's SmartShift will reallocate power to the GPU and both the CPU and GPU continue to run at 100% clock.

Finally, using the example above, if the GPU is making a power demand and the CPU is already using its max power, the GPU will be downclocked to keep power at its max allowable (actually the CPU would probably be downclocked, because devs would usually prefer that).

All of these scenarios are in the hands of the developers. They optimise their code for what they need to do. If they want full clocks, they work within the total power budget and they'll get full clocks all the time. If they optimise poorly, they risk their game getting a reduced clock at runtime.

This isn't a dark art; this type of solution already exists in other applications. It's new to consoles, not new to computer science.
 
This has been explained by Sony and it has been covered here ( https://www.neogaf.com/posts/258225582/ ), etc... sure, we do not have 100% of the data, but we are not in the dark with magic buzzwords nobody can even slightly comprehend being thrown at us...

Nearly everything in this thread has a formal published statement, video or document behind it, except how one should apply the multiplier.

You guys have argued for 20 pages about how the multiplier that a Microsoft engineer cites can be applied. Is it within the 4.8 or on top of the 4.8?

None of the other things that I have seen you attempt to debate are debatable.

PRT and SF are similar. SF and SFS are related but not the same. SFS and PRT are neither similar nor the same.

The documentation in DX12U and other places is clear about that.

I recognize your handle from Beyond3D, which I think is the only public technical forum on the web that's above and beyond DF, so I'm unsure why you are here debating. Lol
 
You've misunderstood what I meant. I am not claiming some false information is being given by MS or some betrayal. Taking it at face value would be accepting that the vast majority of that 2x-3x multiplier is the known method of SFS over transferring the whole texture, as is clearly stated by the spec and MS' PR material. I'm saying: why are some people in these threads, one particularly, then using that figure as some kind of XSX secret sauce? MS aren't the ones giving false info; it's just that some people here are trying hard to forcefully link that figure to some custom exclusive feature of the XSX. The reason is kind of clear too.

Here is a quote from the Eurogamer article reporting what Microsoft said to them on this subject:

As textures have ballooned in size to match 4K displays, efficiency in memory utilisation has got progressively worse - something Microsoft was able to confirm by building in special monitoring hardware into Xbox One X's Scorpio Engine SoC. "From this, we found a game typically accessed at best only one-half to one-third of their allocated pages over long windows of time," says Goossen. "So if a game never had to load pages that are ultimately never actually used, that means a 2-3x multiplier on the effective amount of physical memory, and a 2-3x multiplier on our effective IO performance."

A technique called Sampler Feedback Streaming - SFS - was built to more closely marry the memory demands of the GPU, intelligently loading in the texture mip data that's actually required with the guarantee of a lower quality mip available if the higher quality version isn't readily available, stopping GPU stalls and frame-time spikes. Bespoke hardware within the GPU is available to smooth the transition between mips, on the off-chance that the higher quality texture arrives a frame or two later. Microsoft considers these aspects of the Velocity Architecture to be a genuine game-changer, adding a multiplier to how physical memory is utilised.

Microsoft looked at a relatively modern SoC running modern games and came away realizing that, with more information about which mips/portions of mips of a texture were needed, features to aid texture streaming could close that gap and be 2-3X more efficient. It's a claim that they will have to prove, but it's not impossible, and based on this reporting it is clearly not referring to the general technique of texture streaming.

And regarding the availability of Sampler Feedback in Direct3D prior to now, according to a DX12 Ultimate preview article from fall 2019:

In this blog post, we will preview a suite of new DirectX 12 features, including DirectX Raytracing tier 1.1, Mesh Shader, and Sampler Feedback. We will briefly explain what each feature is and how it will improve the gaming experience. In subsequent weeks, we will publish more technical details on each feature along with feature specs. All these features are currently available in Windows 10 Insider Preview Builds (20H1)

Yes, Partial Resident Textures and texture streaming have been around for a while, but there are limits on how aggressive you can be with loading/not loading textures unless you know for sure which textures are needed. SFS is designed to solve that by providing feedback information about what is actually needed in memory. Before this feature, the software had to guess through approximations. The lack of accuracy in those techniques required more textures to be in memory than were actually required, hence the 1/2 to 1/3rd actual memory utilization.
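To illustrate where a 2-3x multiplier can come from, here is a toy simulation (Python; the tile size, grid size and access fraction are invented) comparing "stream the whole mip" against "stream only the tiles the GPU actually sampled, as reported by feedback":

```python
import random

# Toy model: the top mip of a large texture as a grid of 64 KiB tiles.
TILE_KIB = 64
GRID = 32                       # 32x32 tiles (invented)
ALL_TILES = GRID * GRID

# Pretend the sampler-feedback map says the GPU touched only a third of
# the tiles this frame (matching the 1/2 to 1/3 page usage MS measured).
sampled = set(random.sample(range(ALL_TILES), ALL_TILES // 3))

naive_kib = ALL_TILES * TILE_KIB         # load the whole mip up front
feedback_kib = len(sampled) * TILE_KIB   # load only what was sampled

print(f"naive: {naive_kib} KiB, feedback-driven: {feedback_kib} KiB, "
      f"multiplier: {naive_kib / feedback_kib:.1f}x")
```

The same saving applies both to resident memory and to the I/O needed to fill it, which is why the article quotes the multiplier for both.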
 

Panajev2001a

GAF's Pleasant Genius
Sure, but that ISN'T what THEY said. That's what you are saying in their defense.

And 99.9 is as good as fixed, yeah? This isn't a six sigma operation. This is gaming.

It's unclear. The question is whether that lack of clarity is by design or because they can't be arsed to specify beyond the public statement.

If you think about how they described the solution, you will see why providing a number now is perhaps less meaningful than the details they have given. Back to the link I quoted earlier.

You are free to think "they must be hiding something"...
 

quest

Not Banned from OT
Are we arguing "most" vs "majority" of the time semantics now :LOL: as if that made any difference. He could sign a statement that it stays up 99.98% of the time and people would still throw doubt and "concern" at it...
That would be great, if we got hard numbers beyond max clocks. "Most" and "vast" are marketing terms; neither is a hard number like the ones we have on the SSD, and you know that. It's been what, 2 months, and 0 new hard numbers on the variable clocks, just "trust us", like Microsoft in 2013 with the DRM. When a company won't give up numbers or facts, there is something up.
 

THE:MILKMAN

Member
I think arguing over which isn't being completely transparent is pointless. Both will only share what they want, ultimately, and no doubt both will continue to do so on certain issues.

So far only Microsoft seem to have talked about being transparent and open, yet they have themselves held back details about XVA, and curiously didn't show the internal SSD in their teardown but did show the external card.

In time we'll get most things answered one way or another.
 

cireza

Gold Member
This is incorrect in 2 ways.

First, the SSD transfer rate is 2.4 GB/s (it is the effect of decompression that potentially gives the 4.8 GB/s number; decompression is not the matter at hand here).

Second, 2.4 GB/s is the max transfer rate for the entire SSD, not just 100 GB of it.
So the quote that says "100 GB is accessible at 4.8 GB/s" is not really accurate.

Edit: actually I saw your previous post, which made things clearer, thanks.
 
Oh Mark Sony did things you cannot even imagine and he may even have told Mark Cerny about it... 😲.
[attached images: UIh7bTr.png, hMK431r.png, ydvwN14.jpg]


Interesting. I actually know Matt Hargett. Spoke to him a few weeks ago about something unrelated to gaming.

Seriously brilliant man and very down to earth as well. Thoughtful in his speech. I would take all his statements as very good insight.
 

Clear

CliffyB's Cock Holster
The term "majority" doesn't really provide any insight... and you know that. Lol

It kinda does, when no system runs at full resource occupancy for the majority of the time either!

Think about it: are games always as busy as they can be? Are the same number of enemies, particles, you name it, always active at once? Are draw distances always equal? Etc.

No, they aren't. There are peaks and troughs, often varying every few milliseconds.
 
It kinda does, when no system runs at full resource occupancy for the majority of the time either!

Think about it: are games always as busy as they can be? Are the same number of enemies, particles, you name it, always active at once? Are draw distances always equal? Etc.

No, they aren't. There are peaks and troughs, often varying every few milliseconds.

Utilization and availability are two different measurements.

Throughput and bandwidth or capacity are also different.

The PS5 has a maximum of 10.28 TFLOPS available, regardless of utilization, under the right power conditions. When those conditions aren't right it will have fewer TFLOPS available. This decision is made ms by ms.

The XSX has 12.15 TFLOPS available under all conditions.

This isn't hard to get. Sony is just unclear about what those conditions are.
 

geordiemp

Member
The term "majority" doesn't really provide any insight... and you know that. Lol

Do you think you need full power for every nanosecond of a cycle?

Here is activity and "how busy" for one frame of a big game. Then take 8 cores / 16 threads: there is time for speed and there is time for a nice nap; it makes no difference to throughput when you understand how things work.

So many people think in terms of 12 TF this, 3.6 GHz that; they don't see what really happens. Those are maximums, like the BHP of a car: you don't need it in the car park. For the odd time a road with a load of bumps comes along, slow down a bit (AVX) for that tiny fraction; better than getting hot under the collar.

So why don't all things work like this? They will; efficiency is important.

[attached image: 0JG23Ei.png]
 
If you think about how they described the solution, you will see why providing a number now is perhaps less meaningful than the details they have given. Back to the link I quoted earlier.

You are free to think "they must be hiding something"...

So on the one hand you ascribe numerical certainty to Sony's solution, which they themselves won't do.

Yet you are one of the most strident proponents of the view that MS' actual statements are marketing.

Then on the third hand you assert that I'm playing this silly game of "Sony must be hiding something", when I have never stated anything like that.

I simply said their numbers don't exist, and you inserting numbers for them simply won't do.

Now please go find the "hiding something" quote or insinuation you are referring to.

I'll wait.
 

Clear

CliffyB's Cock Holster
Utilization and availability are two different measurements.

Throughput and bandwidth or capacity are also different.

The PS5 has a maximum of 10.28 TFLOPS available, regardless of utilization, under the right power conditions. When those conditions aren't right it will have fewer TFLOPS available. This decision is made ms by ms.

The XSX has 12.15 TFLOPS available under all conditions.

This isn't hard to get. Sony is just unclear about what those conditions are.

The TFLOP count is an abstraction, you plank. It's literally the number of CUs x frequency. Literally change either and the value varies, because it disregards what the CUs are doing and what data they are processing.
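For reference, that formula with the published CU counts and clocks (peak FP32 = CUs x 64 ALUs per CU x 2 ops per clock for a fused multiply-add x frequency) is exactly where both headline numbers come from:

```python
def tflops(cus, clock_ghz, alus_per_cu=64, ops_per_clock=2):
    """Peak FP32 TFLOPS = CUs x ALUs/CU x 2 (FMA) x clock in GHz / 1000."""
    return cus * alus_per_cu * ops_per_clock * clock_ghz / 1000.0

print(tflops(36, 2.23))         # PS5 at its max clock    -> ~10.28
print(tflops(52, 1.825))        # XSX at its fixed clock  -> ~12.15
print(tflops(36, 2.23 * 0.98))  # PS5 with a 2% downclock -> ~10.07
```

Which is the point being made: the figure says nothing about what the CUs are actually doing, only how many of them there are and how fast they tick.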
 
Do you think you need full power for every nanosecond of a cycle?

Here is activity and "how busy" for one frame of a big game. Then take 8 cores / 16 threads: there is time for speed and there is time for a nice nap; it makes no difference to throughput when you understand how things work.

So many people think in terms of 12 TF this, 3.6 GHz that; they don't see what really happens. Those are maximums, like the BHP of a car: you don't need it in the car park.

So why don't all things work like this? They will; efficiency is important.

[attached image: 0JG23Ei.png]

You are answering a question no one asked. You are simply looking at utilization. Frequencies don't change with utilization unless by design.

So I'm not sure which thing you are debating.

The design of the PS5 is to change frequency based on power load, which is an analog for utilization. So?

All that means is that its frequency will change, and once its frequency changes, it's not a 10.28 TFLOPS machine.

This isn't hard. Why do you care what frequencies or utilization it operates at if you are able to play your favorite games and not hear a jet engine coming from your TV console? Lol
 

geordiemp

Member
The TFLOP count is an abstraction, you plank. It's literally the number of CUs x frequency. Literally change either and the value varies, because it disregards what the CUs are doing and what data they are processing.

Don't bother yourself, you can't teach some people; they just need simple numbers and have no idea how silicon, RAM, chips, boards or software work. It's pointless.

"Sustained" - yeah, it's funny; marketing to the idiots.
 
The TFLOP count is an abstraction, you plank. It's literally the number of CUs x frequency. Literally change either and the value varies, because it disregards what the CUs are doing and what data they are processing.

Thanks for the kind words. You have restated exactly what I stated.

Change the frequency and you don't have the same output.

I appreciate your kind, humble, civil tone. 😇
 

geordiemp

Member
You are answering a question no one asked. You are simply looking at utilization. Frequencies don't change with utilization unless by design.

So I'm not sure which thing you are debating.

The design of the PS5 is to change frequency based on power load, which is an analog for utilization. So?

All that means is that its frequency will change, and once its frequency changes, it's not a 10.28 TFLOPS machine.

This isn't hard. Why do you care what frequencies or utilization it operates at if you are able to play your favorite games and not hear a jet engine coming from your TV console? Lol

Oh my god, it's a 10.2-blah-blah machine.

It's hilarious. It's not a 10.2 BHP car, because you slowed down when you could to save petrol, as you had lots of time on your hands in the loading screen.....

Or for those multiple AVX instructions - or there is the Tempest engine...

You are answering a question no one asked. You are simply looking at utilization. Frequencies don't change with utilization unless by design.

Well done, you have described the PS5 design: frequencies change to save power when they can. It's not that hard.
 

Clear

CliffyB's Cock Holster
Thanks for the kind words. You have restated exactly what I stated.

Change the frequency and you don't have the same output.

I appreciate your kind, humble, civil tone. 😇

I'm sorry, I get frustrated at times when I see a genuinely interesting discussion getting constantly derailed by people who appear to lack any real understanding of the topic at hand. People like you.
 

Ascend

Member
Just so that everyone is on the same page...

The XSX SSD is not the same as a 'normal' SSD.
The 6 GB/s of the decompression block is not its 'max' performance.

On the hardware level, the custom NVMe drive is very, very different to any other kind of SSD you've seen before. It's shorter, for starters, presenting more like a memory card of old. It's also rather heavy, likely down to the solid metal construction that acts as a heat sink to handle silicon that consumes 3.8 watts of power. Many PC SSDs 'fade' in performance terms as they heat up - and similar to the CPU and GPU clocks, this simply wasn't acceptable to Microsoft, who believe that consistent performance across the board is a must for the design of their consoles.

The form factor is cute, the 2.4GB/s of guaranteed throughput is impressive, but it's the software APIs and custom hardware built into the SoC that deliver what Microsoft believes to be a revolution - a new way of using storage to augment memory (an area where no platform holder will be able to deliver a more traditional generational leap). The idea, in basic terms at least, is pretty straightforward - the game package that sits on storage essentially becomes extended memory, allowing 100GB of game assets stored on the SSD to be instantly accessible by the developer. It's a system that Microsoft calls the Velocity Architecture and the SSD itself is just one part of the system.

"Our second component is a high-speed hardware decompression block that can deliver over 6GB/s," reveals Andrew Goossen. "This is a dedicated silicon block that offloads decompression work from the CPU and is matched to the SSD so that decompression is never a bottleneck. The decompression hardware supports Zlib for general data and a new compression [system] called BCPack that is tailored to the GPU textures that typically comprise the vast majority of a game's package size."


And from that same article... What SFS is doing was not somehow already done in the past on prior consoles. It's not stated outright, but it is implied;

As textures have ballooned in size to match 4K displays, efficiency in memory utilisation has got progressively worse - something Microsoft was able to confirm by building in special monitoring hardware into Xbox One X's Scorpio Engine SoC. "From this, we found a game typically accessed at best only one-half to one-third of their allocated pages over long windows of time," says Goossen. "So if a game never had to load pages that are ultimately never actually used, that means a 2-3x multiplier on the effective amount of physical memory, and a 2-3x multiplier on our effective IO performance."

A technique called Sampler Feedback Streaming - SFS - was built to more closely marry the memory demands of the GPU, intelligently loading in the texture mip data that's actually required with the guarantee of a lower quality mip available if the higher quality version isn't readily available, stopping GPU stalls and frame-time spikes. Bespoke hardware within the GPU is available to smooth the transition between mips, on the off-chance that the higher quality texture arrives a frame or two later. Microsoft considers these aspects of the Velocity Architecture to be a genuine game-changer, adding a multiplier to how physical memory is utilised.

If you argue that the Xbox One (X) and PS4 (Pro) already had PRT and tiling, this should clarify that the Velocity Architecture in the XSX (particularly SFS) is different from what the Xbox One X (and therefore all previous consoles) were doing.
 

Redlight

Member
Are we arguing "most" vs "majority" of the time semantics now :LOL: as if that made any difference. He could sign a statement that it stays up 99.98% of the time and people would still throw doubt and "concern" at it...

I agree that 'most' and 'majority' mean pretty much the same thing; unfortunately that's not what we were discussing.

Just to be clear, you posted 'vast majority', which is a different thing altogether. 51% of the time could be 'most'; 'vast majority of the time' suggests near-constant speeds.

I simply asked if you could link to your source. Given the quote, should I assume that you have no source?

I think the PS5 will be at peak performance pretty much all of the time, but it might be best to refrain from making up your own Cerny quotes. :)
 

Andodalf

Banned
Just so that everyone is on the same page...

The XSX SSD is not the same as a 'normal' SSD.

It's exhausting to see so many people describe it as "just" an SSD. The truth is, it would be a high-quality NVMe SSD if you could buy it today, not even counting any of their optimizations or software. Those make it better than anything on the market right now for games. People act like it's a SATA SSD from 2012 with no DRAM cache.
 
If you argue that the Xbox One (X) and PS4 (Pro) already had PRT and tiling, this should clarify that the Velocity Architecture in the XSX (particularly SFS) is different from what the Xbox One X (and therefore all previous consoles) were doing.

I think almost more important than that is that, based on this Eurogamer/DF reporting, the claimed performance improvement of effectively 2X-3X for memory and storage I/O is relative to games running on an Xbox One X, not to games that don't utilize PRT/texture streaming.

To me the two most relevant unanswered questions about SFS and the performance claim are:

1) Can they actually achieve the 2-3X claim, or even close to it?
2) Will the PS5 have access to the same GPU hardware feature supporting this in RDNA2 (Sampler Feedback)?

We don't know the answer to either of those questions now, but this could potentially be a big change to the SSD conversation. Even if it's not 50-66% less I/O and RAM usage (the 2-3X), but something like 20-30%, that hypothetically would put the XSX much closer to the PS5 in terms of effective SSD performance, and put substantial distance between them in terms of effective graphics memory size and speed.

And if they can actually meet the full claimed improvement most of the time and the PS5 doesn't have the hardware to do it, that would be kind of wild in terms of platform differences. We'll have to see but I'm very interested to find out.
 
On the topic of SSD speeds in both consoles:

XSX: SSD read speed is 2.4 GB/s
PS5: SSD read speed is 5.5 GB/s

All those other numbers floating around already factor in compressed data.

Example:
If both consoles were to read 100 GB of data that has been compressed to 50 GB (50% compression):
XSX - 20.83 seconds to read 50 GB of compressed data and output 100 GB of data
PS5 - 9.09 seconds to read 50 GB of compressed data and output 100 GB of data

Another example with some of the numbers that have been reported:
If both consoles need to read 100 GB of textures that have been compressed by Kraken / BCPack:
XSX - BCPack compression 60% - 40 GB compressed data - 40 GB ÷ 2.4 GB/s = 16.7 seconds
PS5 - Kraken compression 30% - 70 GB compressed data - 70 GB ÷ 5.5 GB/s = 12.7 seconds

So in the time the XSX reads those 100 GB of data compressed by 60%, the PS5 could actually read 91.85 GB of Kraken-compressed data, which works out to about 131 GB of textures (91.85 ÷ 0.7).

KEEP IN MIND, HOWEVER, THAT THOSE COMPRESSION RATES ARE JUST ASSUMPTIONS
We have no idea how good either of those compression methods is at compressing textures.
Also, not all data is necessarily texture data.
Compression rates might therefore differ between different kinds of data.
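The arithmetic above generalizes neatly. Here is a small helper (Python; the compression ratios are the same assumptions as in the example) for playing with the numbers:

```python
def read_seconds(original_gb, raw_gbps, compressed_fraction):
    """Time to stream data whose on-disk size is original_gb * compressed_fraction."""
    return original_gb * compressed_fraction / raw_gbps

# 100 GB of textures, using the assumed ratios from the example above:
xsx = read_seconds(100, 2.4, 0.40)  # BCPack at 60% compression -> ~16.7 s
ps5 = read_seconds(100, 5.5, 0.70)  # Kraken at 30% compression -> ~12.7 s
print(f"XSX: {xsx:.1f} s, PS5: {ps5:.1f} s")

# Effective throughput in uncompressed data delivered per second:
print(f"XSX: {100 / xsx:.1f} GB/s, PS5: {100 / ps5:.1f} GB/s")  # ~6.0 vs ~7.9
```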
 

Deto

Banned
Different approach >> same result in the end


The interesting pattern I noticed:

all the accounts on ResetEra and here that were exposed in the FUD group disappeared, at least the "celebrities" who liked to pretend to be impartial... A while later the same pattern appears: new accounts with the same nicknames on ResetEra and here, spreading the same FUD that had already died two months ago - "fake clocks", "the SSD overheats".

The pattern of the same person who has the same nick on Discord, ResetEra and GAF, with the same narrative.
 

geordiemp

Member
On the topic of SSD speeds in both consoles:

XSX: SSD read speed is 2.4 GB/s
PS5: SSD read speed is 5.5 GB/s

All those other numbers floating around already factor in compressed data.

Example:
If both consoles were to read 100 GB of data that has been compressed to 50 GB (50% compression):
XSX - 20.83 seconds
PS5 - 9.09 seconds

We don't know how well mesh vertices in small tiles, like in that UE5 demo, will compress - you can't do lossy compression there.

Interesting times....
 

Three

Member
Here is a quote from the Eurogamer article reporting what Microsoft said to them on this subject:



Microsoft looked at a relatively modern SoC running modern games and came away realizing that, with more information about which mips/portions of mips of a texture were needed, features to aid texture streaming could close that gap and be 2-3X more efficient. It's a claim that they will have to prove, but it's not impossible, and based on this reporting it is clearly not referring to the general technique of texture streaming.

And regarding the availability of Sampler Feedback in Direct3D prior to now, according to a DX12 Ultimate preview article from fall 2019:



Yes, Partial Resident Textures and texture streaming have been around for a while, but there are limits on how aggressive you can be with loading/not loading textures unless you know for sure which textures are needed. SFS is designed to solve that by providing feedback information about what is actually needed in memory. Before this feature, the software had to guess through approximations. The lack of accuracy in those techniques required more textures to be in memory than were actually required, hence the 1/2 to 1/3rd actual memory utilization.
I'll quote sections of that article below. A few lines up from what you quoted.
"This wastage comes principally from the textures. Textures are universally the biggest consumers of memory for games. However, only a fraction of the memory for each texture is typically accessed by the GPU during the scene. For example, the largest mip of a 4K texture is eight megabytes and often more, but typically only a small portion of that mip is visible in the scene and so only that small portion really needs to be read by the GPU."

The solution to that is what? SF and PRT.
Which is then followed by:

"So if a game never had to load pages that are ultimately never actually used, that means a 2-3x multiplier on the effective amount of physical memory, and a 2-3x multiplier on our effective IO performance."

Now where do we get a custom technique or hardware from this? He is essentially describing SF or PRT, which is a part of SFS. Monitoring hardware on the Xbox One X (not Series X) showed them that figure in games, but PRT and SF are the solution to that problem, and SFS is an extension of it. So what do you think is in SFS, or what inefficiency do you think it overcomes in hardware, for it to be a 2x or 3x improvement on top of those, instead of PRT or SF just being where the majority of that efficiency comes from in SFS? Because your quote only explains exactly the same thing.

The only bespoke hardware mentioned was for smoothing the transition between mips on misses:

"Bespoke hardware within the GPU is available to smooth the transition between mips, on the off-chance that the higher quality texture arrives a frame or two later."

Exactly as Panajev2001a was saying.
 

Deto

Banned
Impressive: MS declares 4.8 GB/s, but it's actually more than 6 GB/s.

Sony says it is 9 GB/s, but it is fake because the SSD overheats.


I just wanted to tell MS that it is stupid to spend money on astroturfing internet forums.

But everyone knows that Greenberg is stupid and will continue to do so at least until the middle of next year.
 

BrentonB

Member
The above quotes from Eurogamer indicate that Microsoft's published speeds already take into account the use of Sampler Feedback. It is not being 'added on.' They noticed inefficient memory usage on the One X and came up with a solution. This efficiency, along with the use of an SSD in general, resulted in the data transfer speeds Microsoft has published. Those are the numbers. They will not change. Gaming will be better next gen.
 

Andodalf

Banned
1) Can they actually achieve the 2-3X claim, or even close to it?
2) Will the PS5 have access to the same GPU hardware feature supporting this in RDNA2 (Sampler Feedback)?

Can the PS5 achieve its claims, or even close to them?

Does the XSX actually have all the SSD features the PS5 talked about?

Why would you ask those questions one way but not the other way around? You can't have discourse if you assume one side is always lying and that the other is perfect and has every theoretical advantage, even ones they haven't talked about. The bolded clearly shows this is a leading question, and that you just assume anything they say is a lie. We have to assume honest intent, and we can only work off what we know or have very good reason to suspect. And yeah, it's very reasonable to say that the PS5 has SF, 100%, based on what we know about RDNA2 and DX12, IMO. The question is SFS, and to what extent it improves things.
 

oldergamer

Member
I think almost more important than that is that, based on this Eurogamer/DF reporting, the claimed performance improvement of effectively 2X-3X for memory and storage I/O is relative to games running on an Xbox One X, not to games that don't utilize PRT/texture streaming.

To me the two most relevant unanswered questions about SFS and the performance claim are:

1) Can they actually achieve the 2-3X claim, or even close to it?
2) Will the PS5 have access to the same GPU hardware feature supporting this in RDNA2 (Sampler Feedback)?

We don't know the answer to either of those questions now, but this could potentially be a big change to the SSD conversation. Even if it's not 50-66% less I/O and RAM usage (the 2-3X), but something like 20-30%, that hypothetically would put the XSX much closer to the PS5 in terms of effective SSD performance, and put substantial distance between them in terms of effective graphics memory size and speed.

I would add one question to that list:
3. Can the Sony NVMe setup actually live up to the stated performance level?

I still suspect that Sony's solution, even though it states 5.5 GB per second, is going to deliver far lower in practice, specifically if they are using 12 channels/lanes, one for each memory chip on the NVMe. That will impact performance more than I think people realize. In turn I expect the Xbox to punch above its weight a little, and Sony's brute-force approach to be more conservative. In the end, I suspect the difference between paper specs and usable performance on both drives will not be 50%. I'm expecting a smaller delta.
 

Ascend

Member
The above quotes from Eurogamer indicate that Microsoft's published speeds already take into account the use of Sampler Feedback. It is not being 'added on.' They noticed inefficient memory usage on the One X and came up with a solution. This efficiency, along with the use of an SSD in general, resulted in the data transfer speeds Microsoft has published. Those are the numbers. They will not change. Gaming will be better next gen.
How? 4.8 GB/s is about compression, not SFS.
 

oldergamer

Member
6 GB/s isn't a theoretical max. It is the stated MINIMUM decompression throughput of the hardware decompression block.

Much like the stated locked compute of the XSX CU array is 12.15.

Upon some reflection and reading through this thread and others, I actually think I understand why they chose a decompression rate of 6 GB/s... but it's simply random speculation, nothing more.
Why do you suspect they did that? To cover peak performance?
 
Those are the numbers. They will not change. Gaming will be better next gen.

It's great that we are discussing the technology and learning how it all works. However, the results will be what Microsoft gives us on the specification sheets. They will do everything possible to make the numbers higher, and what's in the specs is what people should go with.

I'm pretty sure that if they push those figures higher in the future, they won't hesitate to update the specifications.
 
A few lines up.
"This wastage comes principally from the textures. Textures are universally the biggest consumers of memory for games. However, only a fraction of the memory for each texture is typically accessed by the GPU during the scene. For example, the largest mip of a 4K texture is eight megabytes and often more, but typically only a small portion of that mip is visible in the scene and so only that small portion really needs to be read by the GPU."

The solution to that is what? SF and PRT.
Which is then followed by:

"So if a game never had to load pages that are ultimately never actually used, that means a 2-3x multiplier on the effective amount of physical memory, and a 2-3x multiplier on our effective IO performance."

Now where do we get a custom technique or hardware from this? He is essentially describing SF or PRT, which is a part of SFS. Monitoring hardware on the Xbox One X (not Series X) showed them that figure in games, but PRT and SF are the solution to that problem, and SFS is an extension of it. So what do you think is in SFS, or what inefficiency do you think it overcomes in hardware, for it to be a 2x or 3x improvement over others, instead of just addressing the problem with those? Because your quote only explains exactly the same thing.

The only bespoke hardware mentioned was for smoothing the transition between mips on misses:

"Bespoke hardware within the GPU is available to smooth the transition between mips, on the off-chance that the higher quality texture arrives a frame or two later."

Exactly as Panajev2001a was saying.

I think you're mixing up a few things here.

1) Not sure if you're saying this or not, but just in case: PRT and SF are not the same thing. You can use PRT to stream portions of textures in and out, and SF lets you make more accurate and performant decisions about when to do that.
2) SF is a new GPU hardware feature that was only exposed via DirectX late last year. It is currently only on Nvidia Turing (RTX) and will be in RDNA2 on PC when it comes out. It doesn't need to be bespoke to be relevant if it's not on the PS5 (which we have no confirmation of).
3) I'm using SF and SFS somewhat interchangeably, mostly because I don't believe it's known whether the PS5 has Sampler Feedback. I know some people infer this from the mention of RDNA2, but I just don't think we know.
4) I don't think it is particularly important how much secret sauce comes with that extra S in SFS. What's relevant is the performance claim and its context.
5) Like with VRS, I think Microsoft mentions these things in their marketing to compete with the PS5. Specifically pointing out a performance improvement that is also available on the PS5 would seem a little odd. At this point we don't know, though.
6) If a new GPU hardware feature allows for 2-3X memory and I/O efficiency on the PS5, why didn't Mark Cerny mention it in his developer presentation? Seems like that would be really cool and relevant to this whole topic of using the SSD to stream in graphics data. Again, we don't know, but it's something to ponder.


Stepping away from this high-stakes, high-emotion family feud for a second (I kid): even if Sampler Feedback is on both consoles, consider what that means for games in the future. These specs, while already impressive, become really incredible. Effectively 32-48 GB of memory! Effectively 1-1.5 TB/s of memory bandwidth! Effectively 15-30 GB/s of SSD throughput! Couple that with the efficiency improvements of RDNA2 vs GCN (DF showed something like a 1.5X to the TF spec vs framerate with RDNA1), plus things like VRS, and these consoles start to seem pretty powerful! I'm excited to see how games will look as time goes on.
 

FranXico

Member
Why do you suspect they did that? To cover peak performance?
I'm guessing that 16 - 10 = 6. The slower portion is fast enough for the SSD and decompressor to stream into as needed, while the larger, faster portion is updated more frequently and is filled by rasterization, used for the frame buffer, etc.

Just toying with the notion that maybe the split memory bandwidth actively serves a purpose.
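For context, here is the published Series X memory split that this guess is riffing on (figures from Microsoft's spec reveal):

```python
# Series X unified 16 GB GDDR6, split by bandwidth (published figures):
memory = {
    "gpu_optimal": {"size_gb": 10, "bandwidth_gb_s": 560},  # full-width access
    "standard":    {"size_gb": 6,  "bandwidth_gb_s": 336},  # ~2.5 GB of this reserved for the OS
}
```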
 

Andodalf

Banned
Exactly, they stated it was compressed.

SFS isn't compression. Compression is making data smaller while planning on restoring it: you're going to make it bigger again so it's usable. You might lose a bit on the way, but you aren't getting the linear advantage in memory savings that is described for SFS (2-3X for both memory and storage). Compression is squeezing 4.8 GB of data into 2.4. SFS is completely excluding textures it thinks won't come up.

"2.4 GB/s (Raw), 4.8 GB/s (Compressed, with custom hardware decompression block)"

That's the quote from the XSX web page. Nothing indicates SFS is in either number. SFS is not part of the hardware decompression and isn't in the raw figure, as we know that's the base throughput.
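Put another way, if the two effects are independent, they stack multiplicatively on the raw speed. This tiny sketch (Python; the SFS factor is an assumption, since Microsoft has only given the 2-3x range) shows why the published 4.8 figure can't already contain SFS:

```python
raw = 2.4               # GB/s, published raw SSD read speed
compression = 2.0       # ~2x hardware decompression -> the published 4.8 figure
sfs_factor = 2.5        # assumed 2-3x from loading only sampled texture pages

print(raw * compression)               # 4.8 GB/s: matches the spec sheet
print(raw * compression * sfs_factor)  # ~12 GB/s "effective", only if SFS stacks on top
```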
 

Ascend

Member
I think you're mixing up a few things here.

1) Not sure if you're saying this or not, but just in case: PRT and SF are not the same thing. You can use PRT to stream portions of textures in and out, and SF lets you make more accurate and performant decisions about when to do that.
2) SF is a new GPU hardware feature that was only exposed via DirectX late last year. It is currently only on Nvidia Turing (RTX) and will be in RDNA2 on PC when it comes out. It doesn't need to be bespoke to be relevant if it's not on the PS5 (which we have no confirmation of).
3) I'm using SF and SFS somewhat interchangeably, mostly because I don't believe it's known whether the PS5 has Sampler Feedback. I know some people infer this from the mention of RDNA2, but I just don't think we know.
4) I don't think it is particularly important how much secret sauce comes with that extra S in SFS. What's relevant is the performance claim and its context.
5) Like with VRS, I think Microsoft mentions these things in their marketing to compete with the PS5. Specifically pointing out a performance improvement that is also available on the PS5 would seem a little odd. At this point we don't know, though.
6) If a new GPU hardware feature allows for 2-3X memory and I/O efficiency on the PS5, why didn't Mark Cerny mention it in his developer presentation? Seems like that would be really cool and relevant to this whole topic of using the SSD to stream in graphics data. Again, we don't know, but it's something to ponder.


Stepping away from this high-stakes, high-emotion family feud for a second (I kid): even if Sampler Feedback is on both consoles, consider what that means for games in the future. These specs, while already impressive, become really incredible. Effectively 32-48 GB of memory! Effectively 1-1.5 TB/s of memory bandwidth! Effectively 15-30 GB/s of SSD throughput! Couple that with the efficiency improvements of RDNA2 vs GCN (DF showed something like a 1.5X to the TF spec vs framerate with RDNA1), plus things like VRS, and these consoles start to seem pretty powerful! I'm excited to see how games will look as time goes on.
Just in support of this...

 

oldergamer

Member
On the topic of SSD speeds in both consoles:

XSX: SSD read speed is 2.4 GB/s
PS5: SSD read speed is 5.5 GB/s

All those other numbers floating around already factor in compressed data.

Example:
If both consoles were to read 100 GB of data that has been compressed to 50 GB (50% compression):
XSX - 20.83 seconds to read 50 GB of compressed data and output 100 GB of data
PS5 - 9.09 seconds to read 50 GB of compressed data and output 100 GB of data

Another example with some of the numbers that have been reported:
If both consoles need to read 100 GB of textures that have been compressed by Kraken / BCPack:
XSX - BCPack compression 60% - 40 GB compressed data - 40 GB ÷ 2.4 GB/s = 16.7 seconds
PS5 - Kraken compression 30% - 70 GB compressed data - 70 GB ÷ 5.5 GB/s = 12.7 seconds

So in the time the XSX reads those 100 GB of data compressed by 60%, the PS5 could actually read 91.85 GB of Kraken-compressed data, which works out to about 131 GB of textures (91.85 ÷ 0.7).

KEEP IN MIND, HOWEVER, THAT THOSE COMPRESSION RATES ARE JUST ASSUMPTIONS
We have no idea how good either of those compression methods is at compressing textures.
Also, not all data is necessarily texture data.
Compression rates might therefore differ between different kinds of data.
Interesting post, btw. However, we should consider how performance per lane degrades when there is a single memory chip on a lane by itself. The Xbox uses 4 lanes and the PS5 uses 12. Not sure how you would factor that into those numbers.
 
Can the PS5 achieve its claims, or even close to them?

Does the XSX actually have all the SSD features the PS5 talked about?

Why would you ask those questions one way but not the other way around? You can't have discourse if you assume one side is always lying and that the other is perfect and has every theoretical advantage, even ones they haven't talked about. The bolded clearly shows this is a leading question, and that you just assume anything they say is a lie. We have to assume honest intent, and we can only work off what we know or have very good reason to suspect. And yeah, it's very reasonable to say that the PS5 has SF, 100%, based on what we know about RDNA2 and DX12, IMO. The question is SFS, and to what extent it improves things.

I'm really not intending to lead with my question. I'm sort of rooting for the SFS claims to be true, because at an engineering level I love the idea of an efficiency improvement like that having such a profound impact on memory and bandwidth. I was intentionally adding some measured skepticism to my own enthusiasm in an attempt to be fair to the other side of the debate.
 