martino
Member
Just in support of this...
Are you aware he is saying the advantages of it are already mostly there?
I know that. That's what I was alluding to with the 4.8 GB/s number MS has posted.
SFS isn't compression. Compression is making data smaller and planning on restoring it. You're going to make it bigger again so it's usable. You might lose a bit on the way, but you aren't getting the linear advantage in memory savings that is described for SFS (2-3x for both memory and storage). Compression is squeezing 4.8 of data into 2.4. SFS is completely excluding textures it thinks won't come up.
"2.4 GB/s (Raw), 4.8 GB/s (Compressed, with custom hardware decompression block)"
That's the quote from the XSX web page. Nothing indicates SFS is in either number. SFS is not a part of hardware decompression and isn't in the raw, as we know that's the base throughput.
And why do you come to that conclusion? The Eurogamer article clearly stated that on the Xbox One X, memory utilization efficiency is low. Considering that games have been using PRT for a while, it's quite safe to assume that what they discovered on the Xbox One X was with PRT, not without.
Hmm, I'm not sure I understand what that means. Using sampler feedback to trigger page reads? What does it achieve? Is this in the absence of a cache?
Are you aware he is saying the advantages of it are already mostly there?
If you are referring to Sampler Feedback vs Sampler Feedback Streaming, then you are partially right. I doubt there are any games right now that use this, since it is only a few months old, so "already mostly there" isn't quite true.
Interesting post btw. However, we should consider how performance per lane degrades when there is a single memory chip on a lane by itself. Xbox uses 4 lanes, and PS5 uses 12. Not sure how you would factor that into those numbers.
The tweet talks about PRT.
Well done, you have described the PS5 design: frequencies change to save power when they can. It's not that hard.
I'm sorry, I get frustrated at times when I see a genuinely interesting discussion getting constantly derailed by people who appear to lack any real understanding of the topic at hand. People like you.
The ability to do this does sound like something very new. Is this custom to the Xbox GPU?
That's a complicated question with two answers:
1) What I described was Sampler Feedback. We know more about that because it is in DX12 Ultimate, meaning it's on NVIDIA RTX cards and will be in RDNA2 GPUs from AMD.
Because console GPUs are somewhat custom, we don't know if this will be on both the XsX and the PS5, or just the XsX.
Because of the marketing from both sides on this and the VRS feature (lots of claims on one side and lots of silence from the other), I'm inclined to bet that the PS5 doesn't have it. But that's just a guess, we don't really know.
and
2) The other issue that some people get hung up on is the difference between Sampler Feedback and Sampler Feedback Streaming. Here's all we know about that:
- Microsoft uses two distinct terms in their marketing and documentation: SF and SFS.
- SF is a big part of SFS, but not all of it (it is in the name after all).
- SFS appears to be a banner over a hardware/software feature (the SF part at least is hardware)
- The performance claim of 2X-3X is only seen in the SFS XBox marketing, not in the DX12 marketing.
- There are some folks on twitter vaguely alluding to some additional hardware features in SFS vs SF, not enough to really draw strong conclusions (it's twitter)
How? 4.8GB/s is talking about compression, not SFS.
4.8 GB/s is not the "all in" calculation. Not sure where you are getting this from.
I'm saying that all of the efficiency gains made with SFS, in combination with the custom SSD, will result in the published transfer speeds. This is the 'all in' final calculation.
That's exactly what Microsoft have said it is.
SFS is sometimes known as PRT+
I think you're mixing up a few things here.
1) Not sure if you're saying this or not, but just in case, PRT and SF are not the same thing. You can use PRT to stream portions of textures in and out, and SF lets you make more accurate and performant decisions about when to do that.
2) SF is a new GPU hardware feature that was only exposed via DirectX late last year. It is currently only on NVIDIA Turing (RTX) and will be in RDNA2 on PC when it comes out. It doesn't need to be bespoke to be relevant if it's not on PS5 (which we have no confirmation of).
3) I'm using SF and SFS somewhat interchangeably, mostly because I don't believe it's known whether the PS5 has Sampler Feedback. I know some people infer this from the mention of RDNA2, but I just don't think we know.
4) I don't think it is particularly important how much secret sauce comes with that extra S in SFS. What's relevant is the performance claim and the context of it.
5) Like with VRS, I think Microsoft mentions these things in their marketing to compete with PS5. Specifically pointing out a performance improvement that is also available on the PS5 seems a little odd. At this point we don't know though.
6) If a new GPU hardware feature allows for 2-3X memory and I/O efficiency on the PS5, why didn't Mark Cerny mention it in his developer presentation? Seems like that would be really cool and relevant to this whole topic of using the SSD to stream in graphics data. Again, we don't know, but something to ponder.
Stepping away from this high-stakes, high-emotion family feud for a second (I kid), even if Sampler Feedback is on both consoles, consider what that means for games in the future. These specs, while already impressive, become really incredible. Effectively 32-48GB memory! Effectively 1TB-1.5TB/s memory bandwidth! Effectively 15-30GB/s SSD throughput! Couple that with the efficiency improvements in RDNA2 vs GCN (DF showed something like a 1.5X to the TF spec vs framerate with RDNA1) plus things like VRS and these consoles start to seem pretty powerful! I'm excited to see how games will look as time goes on.
Terminology
Use of sampler feedback with streaming is sometimes abbreviated as SFS. It is also sometimes called sparse feedback textures, or SFT, or PRT+, which stands for “partially resident textures”.
You know this is a lie? I didn't read the rest of what you were saying when I saw this:
What you're not getting is that the 2x efficiency (not performance in the spec) exists already on other GPUs in SF and PRT.
4.8 GB/s is with BCPack compression and no other bandwidth-saving measures. SFS and any other methods for saving bandwidth are in addition to 4.8 GB/s (just like they would be on PS4 if they support the same feature set).
This is true for all modern computers, but many manufacturers don't bother calling it variable. They just don't advertise clock speeds. Why does Sony?
Not all games or applications will require full power from the APU. Take backwards compatibility or Netflix as an example. The CPU will hardly be taxed by a PS4 game, and watching a video will hardly require anything from the system. In these cases it will be more efficient to lower clock speeds. This is especially useful when a game engine isn't yet fully optimized for PS5 and running full tilt will actually break a game.
No, the 2x~ 3x multiplier they refer to is the efficiency gained over the One X after they implemented SFS. That and their custom SSD solution mean that the total data transfer rate for the Series X will be 4.8 GB/s when the data is compressed. This will be a huge upgrade from current generation consoles.
'Sampler feedback' just standardises it in the DX API, something that was done before when determining which textures to load in games. The RTX cards got a driver update for DX12 sampler feedback. They didn't get some special custom hardware they didn't have before, and again this proves it is not XSX secret sauce, as the spec is known.
The way you know which ones to load has been done before in different ways.
The difference in committed memory is very high— 524,288 versus 51,584 kilobytes! About a tenth the space for this tiled resource-based, full-mip-chain-based texturing system. Although this demo comparison is a bit silly, it confirms something you probably suspected: good judgments about what to load next can mean dramatic memory savings. And even if you’re using a partial-mip-chain-based system, accurate sampler feedback can still allow you to make better judgments about what to load and when.
Suppose you are shading a complicated 3D scene. The camera moves swiftly throughout the scene, causing some objects to be moved into different levels of detail. Since you need to aggressively optimize for memory, you bind resources to cope with the demand for different LODs. Perhaps you use a texture streaming system; perhaps it uses tiled resources to keep those gigantic 4K mip 0s non-resident if you don’t need them. Anyway, you have a shader which samples a mipped texture using A Very Complicated sampling pattern. Pick your favorite one, say anisotropic.
The sampling in this shader has you asking some questions.
What mip level did it ultimately sample? Seems like a very basic question. In a world before Sampler Feedback there’s no easy way to know. You could cobble together a heuristic. You can get to thinking about the sampling pattern, and make some educated guesses. But 1) You don’t have time for that, and 2) there’s no way it’d be 100% reliable.
Where exactly in the resource did it sample? More specifically, what you really need to know is— which tiles? Could be in the top left corner, or right in the middle of the texture. Your streaming system would really benefit from this so that you’d know which mips to load up next. Yeah while you could always use HLSL CheckAccessFullyMapped to determine yes/no did-a-sample-try-to-get-at-something-nonresident, it’s definitely not the right tool for the job.
Direct3D Sampler Feedback answers these powerful questions.
That's incorrect. I'm not sure why this is hard to understand, but I think you are confused. BCpack compression is not the same feature as sampler feedback.
2.4 GB/s is uncompressed data
4.8 GB/s is with BCPack compression of texture data
Further bandwidth savings that can give a 2x to 3x "effective" increase in bandwidth come from using SFS and not loading unneeded parts of textures or geometry into memory. This isn't factored into the 4.8 GB/s because there's no way to know exactly what each game will get from this. More visually complicated titles will likely hit the 2x to 3x range. Simpler titles likely will not see a big benefit. Say you can fit one large texture into memory. If you only need 25% of that texture, you still have 75% of that space left for other textures.
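To put rough numbers on how the two savings stack, here is a minimal sketch. It assumes a very simple model that does not come from Microsoft: compression sets the delivered throughput, and SFS is treated as needing only some fraction of the texture data, so the "effective" figure is the delivered throughput divided by that fraction. The function name and the fractions are made up for illustration.

```python
def effective_bandwidth(delivered_gb_s, fraction_needed):
    """Bandwidth you would have needed to load everything, given that
    only `fraction_needed` of the texture data is actually loaded."""
    if not 0 < fraction_needed <= 1:
        raise ValueError("fraction_needed must be in (0, 1]")
    return delivered_gb_s / fraction_needed


COMPRESSED = 4.8  # GB/s with hardware decompression (published figure)

# Assumed fractions: loading 1/2 or 1/3 of the data lines up with a 2x-3x claim.
for fraction in (1.0, 1 / 2, 1 / 3):
    rate = effective_bandwidth(COMPRESSED, fraction)
    print(f"need {fraction:.0%} of the data -> {rate:.1f} GB/s effective")
```

The real gain depends on how much of each texture a given game actually samples, which is exactly why it can't be folded into one published number.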
Wait, are you saying the same thing I am now that i laid it out?
2.4 GB/s is uncompressed data
6+ GB/s is the decompressed output (zlib or BCPack)
"Our second component is a high-speed hardware decompression block that can deliver over 6GB/s," reveals Andrew Goossen.
Source?
The OFFICIAL DATA is up to 4.8 GB/s.
So can I talk about 22 GB/s on the PS5?
6 GB/s is in the new power of the cloud and the hidden GPU.
"Our second component is a high-speed hardware decompression block that can deliver over 6GB/s," reveals Andrew Goossen.
Close, but not quite. I'm saying that the efficiency gains from Sampler Feedback are in relation to the One X hard drive. It's an effective 2 - 3x increase in efficiency compared to that console.
Sony didn't come out and say they have RDNA 2 either, but there you go, they do. They didn't need to.
I never claimed Sampler Feedback is only in the Xbox, though I did suggest it may not be in the PS5. All Sony has to do is say they have SF (and VRS) and I will believe them.
...and you're wrong about Sampler Feedback being just a codifying of something that already existed. Unless you only mean it was already in RTX (your arguments are kind of fuzzy around that). I'm not sure how that actually came about, but clearly nvidia and MS were working together in development of upcoming new features in DX12U since they were already present in hardware in 2018. There's mention of variation in hardware implementation (probably between AMD and nvidia), but nvidia says that RTX cards are the only ones that support it from them.
Everyone from Anandtech to Microsoft to nvidia refers to Sampler Feedback as a new hardware feature. It was exposed in RTX for the first time by DX12U (even though the hardware support was obviously present).
It seems to me like you're just doing a lot of hand-waving, saying "yeah yeah people have done this before, there are a bunch of different ways..." Maybe you're right in that Sampler Feedback is not a big deal in practice, but that is clearly not what Microsoft is saying.
Here's what Microsoft says about "the way you know which ones to load".
Coming to DirectX 12— Sampler Feedback: some useful once-hidden data, unlocked - DirectX Developer Blog
devblogs.microsoft.com
If that's not clear, she is saying that in this extreme example of two different texture streaming systems, the one that uses Sampler Feedback uses 10 (ten) times less memory! This paragraph alone refutes what you're saying about this feature being JUST PRT in a new dressing.
Here's some more from that same article on why SF is different.
Although this demo comparison is a bit silly, it confirms something you probably suspected:
And even if you’re using a partial-mip-chain-based system, accurate sampler feedback can still allow you to make better judgments about what to load and when.
You are. You're telling me things that have nothing to do with the conversation. It doesn't matter if it's not full RDNA 2; the fact is, Sony and MS call it RDNA 2 and you can't say one is full RDNA 2 and the other is not.
I watched it, and again, it's IRRELEVANT.
Let me put it like this since you're clearly missing the point.
They're saying it will be 4.9 and sometimes hit 6.9 because PS5's variable frequency won't be able to maintain that number when needed.
It's the fact that they're saying 10.2TF is not the real number and that it will only hit that number "sometimes".
I don't have to explain this any further.
Right, all MS numbers are "sustained" but PS5's SSD numbers are not. You're really playing word games here, and it reminds me of when we had this discussion a few months ago and you were saying Cerny was assuming his numbers.
I don't have time for people who just want to keep moving the goalposts.
So can I speak at 22GB/s on the PS5?
6 GB/s isn't a theoretical max. It is the stated MINIMUM decompression throughput rate of the hardware decompression block.
Much like the stated locked compute of the XSX CU array is 12.15 TF.
Upon some reflection and reading through this thread and others, I actually think I understand why they chose a decompression rate of 6 GB/s... but it's simply speculation, nothing more.
Close, but not quite. I'm saying that the efficiency gains from Sampler Feedback are in relation to the One X hard drive. It's an effective 2 - 3x increase in efficiency compared to that console. I don't believe it will dramatically increase the loading speed of the Series X. The only numbers that Microsoft have made public are 2.4 GB/s and 4.8 GB/s. Those are the numbers I will refer to. I will be interested to see real-world tests.
That's not my argument.
So can I speak at 22GB/s on the PS5?
That's not my argument.
It's the NVMe's 2.4 GB/s data stream hitting the decompression block, which is then decompressed into 4.8 GB/s up to "more than 6GB/s".
It can deliver up to 6GB/s, but will it? I don't think so. There are only 4.8 GB of compressed data coming into the decompression block. I believe this design choice was made to ensure that there is always room for incoming data and that it will never be 'backed up'. Basically, the system can decompress more data than it actually needs to, so that nothing has to 'wait its turn.'
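The headroom idea can be put in numbers. A minimal sketch, assuming the block is fed at the SSD's raw 2.4 GB/s and can emit at most 6 GB/s (treating "over 6GB/s" as a flat 6 for simplicity); the function name and the list of compression ratios are hypothetical.

```python
def decompressed_output(raw_in_gb_s, compression_ratio, block_max_gb_s):
    """Output rate of a decompressor fed at raw_in_gb_s, capped by the block."""
    return min(raw_in_gb_s * compression_ratio, block_max_gb_s)


RAW_IN = 2.4     # GB/s coming off the SSD (published figure)
BLOCK_MAX = 6.0  # GB/s, assumed ceiling of the decompression block

# Compression ratio at which the block itself would become the bottleneck.
saturating_ratio = BLOCK_MAX / RAW_IN

for ratio in (1.0, 2.0, 2.5, 3.0):
    out = decompressed_output(RAW_IN, ratio, BLOCK_MAX)
    print(f"{ratio:.1f}:1 compression -> {out:.1f} GB/s out")
```

Under these assumptions, 2:1 data leaves the block below its ceiling, and only data compressed better than 2.5:1 would be limited by the block rather than by the SSD.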
PS5 wasn't mentioned in my post. LOL
It is exactly your argument.
Just apply the same argument to the PS5 and it magically reaches 22 GB/s
On the Topic of SSD speeds in both consoles:
XSX : SSD Read Speed is 2.4GB/s
PS5: SSD Read Speed is 5.5GB/S
All those other numbers that are floating around already count in compressed data.
Example:
If both consoles read 100 GB of data that has been compressed to 50 GB (50% compression):
XSX - 20.83 seconds to read 50 GB of compressed data and output 100 GB of data
PS5 - 9.09 seconds to read 50 GB of compressed data and output 100 GB of data
Another example with some of the numbers that have been reported:
If both consoles need to read 100 GB of textures that have been compressed by Kraken / BCPack:
XSX - BCPack compression 60% - 40 GB compressed data - 40 GB / 2.4 GB/s = 16.7 seconds
PS5 - Kraken compression 30% - 70 GB compressed data - 70 GB / 5.5 GB/s = 12.7 seconds
So in the 16.7 seconds the XSX needs to read its 40 GB (100 GB of data compressed by 60%), the PS5 could read 91.85 GB of Kraken-compressed data, which would be about 131 GB of textures (91.85 GB / 0.7).
KEEP IN MIND HOWEVER THOSE COMPRESSION RATES ARE JUST ASSUMPTIONS
We have no idea how good either of those compression methods is at compressing textures.
Also, not all necessary data is texture data.
Compression rates might therefore differ between different kinds of data structures.
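For anyone who wants to plug in their own guesses, the arithmetic above can be sketched like this. The raw speeds are the published ones; the compression savings are the same assumed values as in the examples, and the function name is made up.

```python
def read_time_seconds(uncompressed_gb, space_saved, raw_gb_s):
    """Time to read data from the SSD when it is stored compressed.

    space_saved is the fraction removed by compression (0.6 = 60% smaller).
    """
    compressed_gb = uncompressed_gb * (1 - space_saved)
    return compressed_gb / raw_gb_s


XSX_RAW, PS5_RAW = 2.4, 5.5  # GB/s, published raw read speeds

# First example: both formats assumed to save 50%.
print(read_time_seconds(100, 0.5, XSX_RAW))
print(read_time_seconds(100, 0.5, PS5_RAW))

# Second example: assumed 60% (BCPack) and 30% (Kraken) savings.
xsx_t = read_time_seconds(100, 0.6, XSX_RAW)
ps5_t = read_time_seconds(100, 0.3, PS5_RAW)

# Uncompressed data the PS5 could deliver in the XSX's time window.
ps5_in_window = xsx_t * PS5_RAW / (1 - 0.3)
print(xsx_t, ps5_t, ps5_in_window)
```

The last value comes out at roughly 131 GB, because the compressed amount has to be divided by 0.7 to get back to the uncompressed size.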
Sony didn't come out and say they have RDNA 2 either but there you go they do. They didn't need to.
There is no handwaving. SF is just a standardized way of doing this in the DX12 API now; you could write a similar but not identical thing in HLSL. There are examples of alternatives even in the spec. SF is just a standardized API feature. You could write it in the driver for most cards that support PRT.
...
Why do you think it's saying it's a bit silly? Because nobody just loads everything like that, and people have written good systems that do something similar already, especially on consoles.
...
You even have some examples of the alternatives in your quote so why are you saying it's handwaving?
It's NVMe's 2.4GB/s data stream hitting the decompression block which then decompressed into 4.8 GB/s to "more than 6GB/s".
Are you claiming the decompression block is built into the NVMe device?
PS5 wasn't mentioned in my post. LOL
PS fanboys made a big deal over a .5 TFLOPS difference last gen, but for whatever reason a 2 TFLOPS advantage is "insignificant". Yet that .5 TFLOPS difference manifested itself in framerate and resolution. So I suspect that a bigger gap this gen will show at least a similar difference. I think it will be greater because I'm not buying Sony's clock speed claims. If the damn thing spent "most of its time" at 2.23 GHz, then just say that's the clock speed and call it a day. PC manufacturers do that all the time. But I digress, back to storage...
There's nothing disingenuous about my post. The real world difference between Xbox Series X and PS5 storage solutions will be measured in milliseconds and not seconds. The end user won't notice the difference in game. There's nothing in that UE5 demo that couldn't be done in the Xbox Velocity Architecture. You're believing in a fantasy world where only Sony listened to developers and put significant investment in their storage solution while MS went to Best Buy and just slapped a random SSD in the box and called it a day.
Both companies focused significantly on storage and asset streaming. However, Microsoft didn't skimp on GPU to get there.
Indeed... And I pointed this out in another thread... Quoting:
Thank you for this. They've been trying to blur the difference between PRT and Sampler Feedback. I would advise not to argue with them.
I would add one question to that list.
3. Can the Sony NVMe setup actually live up to the stated performance level?
I still have my doubts about Sony's solution: even though it states 5.5 GB per second, I think it is going to deliver far less, specifically if they are using 12 channels/lanes, or one for each memory chip on the NVMe. That will impact performance more than I think people realize. In turn, I expect Xbox to punch above its weight a little, and Sony's brute-force approach to be more conservative. In the end, I suspect the difference between paper specs and usable performance on both drives will not be 50%. I'm expecting a smaller delta.
I have the feeling I still need to clarify something
Back to my example:
If both consoles need to read 100 GB of textures that have been compressed by Kraken / BCPack:
XSX - BCPack compression 60% - 40 GB compressed data - 40 GB / 2.4 GB/s = 16.7 seconds | 100 GB worth of uncompressed data divided by 16.7 seconds is 5.99 GB/s of effective (decompressed) throughput
PS5 - Kraken compression 30% - 70 GB compressed data - 70 GB / 5.5 GB/s = 12.7 seconds | 100 GB worth of uncompressed data divided by 12.7 seconds is 7.87 GB/s of effective (decompressed) throughput
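The same effective rates fall out of a one-line formula, raw speed divided by the fraction of data that remains after compression. A minimal sketch with the same assumed compression savings (the function name is hypothetical):

```python
def effective_throughput(raw_gb_s, space_saved):
    """Uncompressed GB delivered per second when reading compressed data."""
    return raw_gb_s / (1 - space_saved)


cases = (
    ("XSX, 60% saved", 2.4, 0.6),
    ("PS5, 30% saved", 5.5, 0.3),
    ("XSX, 50% saved", 2.4, 0.5),
)
for name, raw, saved in cases:
    print(f"{name}: {effective_throughput(raw, saved):.2f} GB/s")
```

Small differences from the 5.99 and 7.87 figures come from rounding the read times first. The 50% case lands on the published 4.8 GB/s, which suggests that number simply assumes 2:1 compression.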
Wait, so it's referring to the minimum decompression throughput rate? Has this been mentioned by any of the MS Xbox engineering team members?
Was under the impression it was theoretical max, but if I'm wrong I'm wrong. Or is this a personal analysis? Is there a way you can expand on this?
wow, we already have the magic and the xbox SSD is already delivering the same thing as the PS5.
what a delusion.
What a coincidence that all these accounts with these magic numbers deducted from the lollipop world are from this year.
If I were you, I would comment on YouTube and expect the Windows Central people to spread the FUD on Twitter