
Xbox Velocity Architecture - 100 GB is instantly accessible by the developer through a custom hardware decompression block

Ascend

Member
Copied from the patent;

ABSTRACT
Systems, methods, apparatuses, and software for graphics processing systems in computing environments are provided herein. In one example, a method of handling tiled resources in graphics processing environments is presented. The method includes establishing, in a graphics processing unit, a residency map having values determined from memory residency properties of a texture resource, and sampling from the residency map at a specified location to determine a residency map sample for the texture resource at the specified location, where the residency map sample indicates at least an initial level of detail presently resident and a smoothing component to reach a next level of detail.


That is going to take a while to fully understand. But it's patented, so no one else has this to this degree.
 

Thirty7ven

Banned

Thank you for this. Interesting thread, by the way, and more information after. Shifty Geezer chimes in too, and I know him from reading his stuff on RT some time ago. His input should be valued also.
 

Fafalada

Fafracer forever
The last one is the most important. It basically means that if you do things the same way in software you will get pop-ins.
You can adjust filtering in software, it just comes at a (shader) cost. It's something current-gen titles ran into with PRT too.

Anyway, it's peculiar that MS quotes page sizes, as that's granularity we already worked with in current gen just fine. I.e., for that efficiency increase I'd expect you'd need finer granularity, unless he's comparing to paging entire textures...
 

THE:MILKMAN

Member
You can adjust filtering in software, it just comes at a (shader) cost. It's something current-gen titles ran into with PRT too.

Anyway, it's peculiar that MS quotes page sizes, as that's granularity we already worked with in current gen just fine. I.e., for that efficiency increase I'd expect you'd need finer granularity, unless he's comparing to paging entire textures...

Can you elaborate on this, Fafalada? I also suspect some might be taking the following quote and applying the 2-3x multiplier to the full 13.5GB RAM and 2.4/4.8GB/s IO.

that means a 2-3x multiplier on the effective amount of physical memory, and a 2-3x multiplier on our effective IO performance.
 

Bernkastel

Ask me about my fanboy energy!
The hardware implementation of SFS in Xbox Series X for feedback streaming, and how Sampler Feedback differs from standard Partially Resident Textures (PRTs), is described in patent US10388058B2.

Graphics processing units (GPUs) include various internal hardware components, such as processing stages, memory elements, and other pipelined processing elements. There are various internal stages to process graphical data into rendered images. In many GPUs, these internal stages comprise a graphics pipeline that can take representations of scenes or user interfaces and render these into images for output. Among these stages are texture mapping stages that provide graphical details, surface textures, colors, or other elements for portions of rendered images. User content that is rendered by GPUs, such as video game content, is expected to continue growing in complexity over time, but graphics hardware constraints such as bandwidth and memory capacity are not expected to grow at a similar rate.
This patent provides improved efficiency in the usage of residency maps for GPUs. The layout of the residency maps here provides for viable hardware implementations, leading to faster and more efficient rendering of graphics for computing systems. The texture mapping process can be memory-intensive, so Partially Resident Textures (PRTs) are used to aid in this process. PRTs map texture data only to the portions of objects presently needed to be rendered, such as due to viewpoint characteristics of a user, obstruction by other objects, proximity to a viewer, or other factors. Mip mapping can also be included in this process to pre-compute a series or set of smaller and smaller texture representations to suit different levels of detail for the textures. This mip mapping can aid in anti-aliasing as well as provide less processor-intensive renderings.
Migrating elements of texture streaming implementations from mip-based streaming (i.e., loading entire levels of detail) to tile-based streaming and partial residency can be an effective mitigation to performance issues. Techniques using partial residency can allow content complexity to continue to grow without a corresponding increase in load times or memory footprint. Tiled resources can be improved so that these PRTs can be widely adopted while minimizing implementation difficulty and performance overhead for GPUs. These improvements include hardware residency map features and texture sample operations referred to herein as "residency samples", among other improvements.
The first enhancement includes a hardware residency map feature comprising a low-resolution residency map that is paired with a much larger PRT, and both are provided to hardware at the same time. The residency map stores the mipmap level of detail resident for each rectangular region of the texture. PRT textures are currently difficult to sample given sparse residency. Software-only residency map solutions typically perform two fetches of two different buffers in the shader, namely the residency map and the actual texture map. The primary PRT texture sample is dependent on the results of a residency map sample. These solutions are effective, but require considerable implementation changes to shader and application code, especially to perform filtering of the residency map in order to mask unsightly transitions between levels of detail, and may have undesirable performance characteristics. The improvements herein streamline the concept of a residency map and move it into a hardware implementation. This is the custom hardware portion that performs feedback streaming in XSX.
A second enhancement includes an enhanced type of texture sample operation called a "residency sample". The residency sample operates similarly to a traditional texture sample, except that the part of the texture sample that requests texture data from cache/memory and filters the texture data to provide an output value is removed from the residency sample operation. The purpose of the residency sample is to generate memory addresses that reach the page table hardware in the graphics processor but do not continue on to become full memory requests. Instead, the residency of the PRT at those addresses is checked and missing pages are non-redundantly logged and requested to be filled by the OS or a delegate. This describes how Sampler Feedback is an improvement over standard PRT.
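To make that concrete, here is a minimal CPU-side sketch of the lookup the abstract describes: a coarse grid holding the finest resident mip per texture region, sampled bilinearly so you get back both a resident level and a fractional smoothing term. All names here are made up for illustration; the real version lives in the GPU's sampler hardware, not in C++.

Code:
#include <algorithm>
#include <cmath>
#include <vector>

struct ResidencySample {
    float residentLod; // finest mip level presently resident here
    float smoothing;   // fractional component toward the next level
};

struct ResidencyMap {
    int w = 0, h = 0;          // one entry per rectangular texture region
    std::vector<float> minMip; // finest resident mip recorded per region

    // Bilinearly filter the four surrounding entries; this filtering is
    // where the patent's "smoothing component" comes from.
    ResidencySample sample(float u, float v) const {
        float x = std::clamp(u * w - 0.5f, 0.0f, float(w - 1));
        float y = std::clamp(v * h - 0.5f, 0.0f, float(h - 1));
        int x0 = int(x), y0 = int(y);
        int x1 = std::min(x0 + 1, w - 1), y1 = std::min(y0 + 1, h - 1);
        float fx = x - x0, fy = y - y0;
        float top = minMip[y0 * w + x0] * (1 - fx) + minMip[y0 * w + x1] * fx;
        float bot = minMip[y1 * w + x0] * (1 - fx) + minMip[y1 * w + x1] * fx;
        float m = top * (1 - fy) + bot * fy;
        return { std::floor(m), m - std::floor(m) };
    }
};

// The sampler then clamps a requested LOD so it never touches non-resident
// tiles (mip 0 = finest; larger = coarser). Software versions pay for this
// with two dependent shader fetches; the patent moves the lookup into the
// sampler itself.
float clampLod(float requestedLod, const ResidencySample& rs) {
    return std::max(requestedLod, rs.residentLod - rs.smoothing);
}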
 
I thought they got a new PR/Marketing team and strategy? I guess not, same ole Microsoft in action.
By August it will already be game over in terms of specs. Everyone will have pre-ordered and made up their minds. If you are getting a PC with the latest Nvidia/Intel/AMD, you have already made up your mind; if you are getting an XSX or PS5, nothing will change it either.

While people criticize Cerny, he did a brilliant thing giving that talk and laying out what they did for their SSD and I/O solution and how complex it is.
He made everything plain. It's not surprising that the SSD has taken over the discourse; if that talk didn't exist, it wouldn't have.
Of course, MS being MS will take forever to respond, and actually will never respond, knowing MS from their past behavior in tech (not just consoles). The corporates are still running the show.

If a real down-to-earth small company were in charge, they would have moved up parts of that Hot Chips presentation like yesterday and done what Cerny did, including a tech demo of their own specifically to outline their SSD/IO solution.

It's hilarious that people have to read between the lines on BCPack, SFS, and, heck, what the hell is DirectStorage anyway? Sure, we know the general description, but nothing tangible. When is it coming to PC? How will the PC market utilize it, and how will it improve the current crop of NVMe SSDs in PCs? Simple stuff. But it will take months to get approval due to red tape. Just goes to show you that the same leadership is still in charge, unfortunately.

You are literally describing MS's June, July, and August shows.

Be patient.
 

Panajev2001a

GAF's Pleasant Genius
Copied from the patent;

ABSTRACT
Systems, methods, apparatuses, and software for graphics processing systems in computing environments are provided herein. In one example, a method of handling tiled resources in graphics processing environments is presented. The method includes establishing, in a graphics processing unit, a residency map having values determined from memory residency properties of a texture resource, and sampling from the residency map at a specified location to determine a residency map sample for the texture resource at the specified location, where the residency map sample indicates at least an initial level of detail presently resident and a smoothing component to reach a next level of detail.


That is going to take a while to fully understand. But it's patented, so no one else has this to this degree.

Yes and no, plus the usual bit about something in patents not necessarily being in shipping HW... "In technology, everything worth a damn has already been patented... twice" is a quote whose origin I have lost track of by now... I keep attributing it to Simon F. of IMG Technology/PowerVR fame...

Still, yeah, XSX packs quite a bit of cleverly designed HW, and consoles' great intersection of custom software and HW stacks, and the innovation that puts in reach, is something I have always enjoyed learning about :).
 

Fafalada

Fafracer forever
Can you elaborate on this, Fafalada? I also suspect some might be taking the following quote and applying the 2-3x multiplier to the full 13.5GB RAM and 2.4/4.8GB/s IO.
What we care about is actual use. E.g., if PRT is used effectively, we are likely looking at less than 100MB (@4K) of memory used for a single frame, and we still get wholly unique detail on screen (like the UE5 demo). Likewise, the actual amount of transferred data per frame/second is what matters for bandwidth. If I need to refill up to 100MB/frame, that's what my I/O needs to be able to handle; no multiplication shortcuts past that.

Obviously, we're going from talking about GBs of texture cache with no PRT in current gen down to 100s (or maybe 10s) of MBs; that's why those handy multipliers are waved about. But yes, by themselves they are pretty meaningless. Applied to base numbers, that's just marketing material.
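To put rough per-frame numbers on that (back-of-the-envelope using the headline figures, nothing official):

Code:
#include <cstdio>

int main() {
    const double raw = 2.4, compressed = 4.8; // quoted XSX SSD rates, GB/s
    const double fps = 60.0;
    // What the drive can actually refill within one frame:
    std::printf("raw:        %.0f MB/frame\n", raw * 1000.0 / fps);        // ~40
    std::printf("compressed: %.0f MB/frame\n", compressed * 1000.0 / fps); // ~80
    // If a PRT-style streamer only ever touches the ~100MB of tiles visible
    // per frame, that budget is workable; multiplying the headline GB/s by
    // 2-3x doesn't change what the drive physically moves.
    return 0;
}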
 

Ascend

Member
You can adjust filtering in software, it just comes at a (shader) cost. It's something current-gen titles ran into with PRT too.

Anyway, it's peculiar that MS quotes page sizes, as that's granularity we already worked with in current gen just fine. I.e., for that efficiency increase I'd expect you'd need finer granularity, unless he's comparing to paging entire textures...
Unless I'm gravely mistaken, I think they are synchronizing the pages in memory to the tiles of the textures on the SSD, so to speak.
 

Ascend

Member
Thank you for this. Interesting thread, by the way, and more information after. Shifty Geezer chimes in too, and I know him from reading his stuff on RT some time ago. His input should be valued also.
The interesting thing is that the same user is describing what Kirby Louise said as a real possibility in this post.

One important difference between MS and Sony's I/O solution is MS's claim to be able to transfer data directly from the SSD to the GPU. The claim of 100 GB of NAND SSD being instantly available is brought to mind. The questions are then:
(1) What does the qualifier "instantly" mean in this context?
(2) What exactly is being made "available"?
(3) For what purpose?

The careless observer will just wave it away by saying that this is just good old virtual memory paging, and that it is not in fact a direct transfer of data from the SSD to the GPU. However, the idea of virtual paging does not stand up to scrutiny in this case.
Reason 1: There is nothing particularly instantaneous about virtual memory paging. It describes a tortuous circuit whereby the CPU has to acknowledge a page fault, look through the filesystem to find the requested page on the SSD, find an empty frame in main memory or evict a stale page to create one, and then swap in the correct page from the SSD. Yeah, nothing to brag about in terms of instantaneousness.
Reason 2: Phil Spencer, in an otherwise mundane interview in December 2019, drops an absolute bombshell: the SSD of the upcoming Xbox can be used as virtual RAM. Now this can either mean matching a page on the SSD to an address in the physical memory address space which remains unchanged (virtual memory paging), OR memory-mapping a portion of the SSD (100 GB of it) and adding it to the physical memory address space, contiguous with system RAM. Phil Spencer specifically mentions that the SSD will act virtually as RAM by significantly increasing the physical memory address space, comparing it to the 32-to-64-bit transition for good measure. Thus, it becomes highly probable that MS has succeeded in making a part of an NVMe SSD byte-addressable, which cuts down significantly on the CPU overhead associated with virtual memory, as the CPU likely can't differentiate between system RAM and the SSD.

This type of technology is not unprecedented in consoles; that's how the ROM cartridge of the good old NES functioned. Nowadays it finds an echo in a field far removed from gaming: big data and AI systems. The addressable SSD is what can be described as persistent memory, a technology now ubiquitous in dual-socket servers used for RDMA. Tom Talpey of Microsoft is actually a good source on the ongoing effort to develop a new filesystem API for persistent memory in memory mode. So much for the term of art 'instantaneous'.
Now what is this data available for? I speculate that it is available to be duplicated back to another portion of the physical memory space, which is system RAM (the CPU will view it just as a duplication of data from one RAM address to another), and/or for streaming of textures from the SSD to the GPU as part of SFS. One interesting result of this aspect of the XVA is that it doesn't actually require the use of coherency engines or GPU scrubbers.
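For anyone wondering what "memory-mapping a portion of the SSD" would even look like, the closest everyday PC analogy is POSIX mmap: the file's pages join the process's address space and the OS faults them in from the drive on first touch. This is strictly an analogy, not the XSX mechanism, and the file name below is made up.

Code:
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int fd = open("assets.pak", O_RDONLY); // hypothetical asset package
    if (fd < 0) return 1;
    struct stat st;
    if (fstat(fd, &st) != 0) return 1;
    void* p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) return 1;
    const unsigned char* data = static_cast<const unsigned char*>(p);
    // Touching a non-resident byte triggers a page fault; the kernel reads
    // that page from the SSD and resumes -- "the SSD as virtual RAM", with
    // exactly the latency the quoted post is debating.
    std::printf("first byte: %u\n", data[0]);
    munmap(p, st.st_size);
    close(fd);
    return 0;
}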
 

Tripolygon

Banned
XSX streams way less asset data than PS5, and this whole setup considerably reduces CPU/GPU overhead and RAM usage. By your logic we should go back to the days of Kutaragi and use flops as an absolute performance gauge for comparison.
And you have come to this conclusion how?

Streaming way less assets is something developers have been doing dating back to the beginning of game development. Current-gen consoles support PRT, or as Microsoft calls it, Tiled Resources. PS4 supported a more advanced implementation.
 
Last edited:

Panajev2001a

GAF's Pleasant Genius
And you have come to this conclusion how?

The pattern seems to be to just bring up the same argument every so many pages as if it were new... normally it is meant to bring up a reaction like “oh, people keep debating it, there must be merits on both sides... :/...“ without having to prove one’s position.
 

Deto

Banned
So now it's not just the speed or power of the SSD that triples, but the RAM too?

Wow, if you leave the Xbox fanboys here for at least another 6 months, they will start talking about the hidden APU, and how MS will update the software and it will magically become a 32C/64T CPU with 512GB of RAM and GDDR8 at 10TB/s...

Why do we only have these delusions on Xbox?

I don't see anyone talking about DLSS 2.0 on the Switch making it do 4K; I don't see anyone talking about magic software that will turn the PS5 into 30TF...

This topic reads like a flat-earther topic; the last thing that matters to them is reality and facts.
 
Last edited:

Andodalf

Banned
And you have come to this conclusion how?

Streaming way less assets is something developers have been doing dating back to the beginning of game development. Current-gen consoles support PRT, or as Microsoft calls it, Tiled Resources. PS4 supported a more advanced implementation.

Since they’ve been doing it forever, why is it so hard to believe that XSX has a new implementation?
 

TBiddy

Member
So now it's not just the speed or power of the SSD that triples, but the RAM too?

Wow, if you leave the Xbox fanboys here for at least another 6 months, they will start talking about the hidden APU, and how MS will update the software and it will magically become a 32C/64T CPU with 512GB of RAM and GDDR8 at 10TB/s...

Why do we only have these delusions on Xbox?

I don't see anyone talking about DLSS 2.0 on the Switch making it do 4K; I don't see anyone talking about magic software that will turn the PS5 into 30TF...

This topic reads like a flat-earther topic; the last thing that matters to them is reality and facts.

I've rarely seen a more impressive collection of strawmen gathered in one post. Nice one!
 

Three

Member
The interesting thing is that the same user is describing what Kirby Louise said as a real possibility in this post.

One important difference between MS and Sony's I/O solution is MS's claim to be able to transfer data directly from the SSD to the GPU. The claim of 100 GB of NAND SSD being instantly available is brought to mind. The questions are then:
(1) What does the qualifier "instantly" mean in this context?
(2) What exactly is being made "available"?
(3) For what purpose?

The careless observer will just wave it away by saying that this is just good old virtual memory paging, and that it is not in fact a direct transfer of data from the SSD to the GPU. However, the idea of virtual paging does not stand up to scrutiny in this case.
Reason 1: There is nothing particularly instantaneous about virtual memory paging. It describes a tortuous circuit whereby the CPU has to acknowledge a page fault, look through the filesystem to find the requested page on the SSD, find an empty frame in main memory or evict a stale page to create one, and then swap in the correct page from the SSD. Yeah, nothing to brag about in terms of instantaneousness.
Reason 2: Phil Spencer, in an otherwise mundane interview in December 2019, drops an absolute bombshell: the SSD of the upcoming Xbox can be used as virtual RAM. Now this can either mean matching a page on the SSD to an address in the physical memory address space which remains unchanged (virtual memory paging), OR memory-mapping a portion of the SSD (100 GB of it) and adding it to the physical memory address space, contiguous with system RAM. Phil Spencer specifically mentions that the SSD will act virtually as RAM by significantly increasing the physical memory address space, comparing it to the 32-to-64-bit transition for good measure. Thus, it becomes highly probable that MS has succeeded in making a part of an NVMe SSD byte-addressable, which cuts down significantly on the CPU overhead associated with virtual memory, as the CPU likely can't differentiate between system RAM and the SSD.

This type of technology is not unprecedented in consoles; that's how the ROM cartridge of the good old NES functioned. Nowadays it finds an echo in a field far removed from gaming: big data and AI systems. The addressable SSD is what can be described as persistent memory, a technology now ubiquitous in dual-socket servers used for RDMA. Tom Talpey of Microsoft is actually a good source on the ongoing effort to develop a new filesystem API for persistent memory in memory mode. So much for the term of art 'instantaneous'.
Now what is this data available for? I speculate that it is available to be duplicated back to another portion of the physical memory space, which is system RAM (the CPU will view it just as a duplication of data from one RAM address to another), and/or for streaming of textures from the SSD to the GPU as part of SFS. One interesting result of this aspect of the XVA is that it doesn't actually require the use of coherency engines or GPU scrubbers.
This is all very interesting, but it doesn't relate to the SSD bandwidth. This would actually hammer the SSD more, or at best the same, because you have things not even resident in memory. This would be a compute-overhead saving, not an SSD-bandwidth saving.
 
Last edited:

Tripolygon

Banned
Since they’ve been doing it forever, why is it so hard to believe that XSX has a new implementation?
I'm afraid you have lost me on where I said XSX couldn't have a new implementation. What XSX adds are texture filters that help with the LOD transition between mip levels when a page fault is detected. So I guess that's a new implementation.

What I'm saying is that this 2x to 3x multiplier due to selective asset or texture loading applies to every console and PC.

This statement here.
XSX streams way less asset data than PS5, and this whole setup considerably reduces CPU/GPU overhead and RAM usage. By your logic we should go back to the days of Kutaragi and use flops as an absolute performance gauge for comparison.
 
Last edited:

Andodalf

Banned
So now it's not just the speed or power of the SSD that triples, but the RAM too?

Wow, if you leave the Xbox fanboys here for at least another 6 months, they will start talking about the hidden APU, and how MS will update the software and it will magically become a 32C/64T CPU with 512GB of RAM and GDDR8 at 10TB/s...

Why do we only have these delusions on Xbox?

I don't see anyone talking about DLSS 2.0 on the Switch making it do 4K; I don't see anyone talking about magic software that will turn the PS5 into 30TF...

This topic reads like a flat-earther topic; the last thing that matters to them is reality and facts.

I'm not really replying to you, as you're clearly trolling, but I don't want any uninformed passers-by to be confused. The 2-3x multiplier was not made up by "Xbox fanboys"; it appeared in the very first Eurogamer article quoting Andrew Goossen. So it also isn't something new, as you imply. Literally the first time we got official specs, we heard about it.

"So if a game never had to load pages that are ultimately never actually used, that means a 2-3x multiplier on the effective amount of physical memory, and a 2-3x multiplier on our effective IO performance."

And saying no one has said ridiculous things on the Sony side is laughable. You had people convinced that the 8TF FP16 mode on PS4 Pro meant it was stronger than the XOX and that they had outplayed them. Now you have people saying an SSD means they can render more geometry at any one time. But you know this. You're not an idiot. You're just trying to mislead people and be loud to feel like you've won.



I'm afraid you have lost me on where I said XSX couldn't have a new implementation. What XSX adds are texture filters that help with the LOD transition between mip levels when a page fault is detected. So I guess that's a new implementation.

What I'm saying is that this 2x to 3x multiplier due to selective asset or texture loading applies to every console and PC.

This statement here.

So it can be a new implementation, but it has to be the exact same? Not saying the other hardware doesn't have an effective multiplier, but can XSX have a more efficient one? It's not like the tech has been perfected.
 
Last edited:

Three

Member
Since they’ve been doing it forever, why is it so hard to believe that XSX has a new implementation?
It's not hard to believe it's a 'new implementation'. What's hard to believe is that some people on forums are relating this method to a 2x-3x saving compared to other methods or to other hardware, without any logical input as to how. When the method MS themselves describe for getting rid of unneeded textures is not new in that regard, why are we doing that?

Edit: somebody beat me to it already.
 
Last edited:

Three

Member
So it can be a new implementation, but it has to be the exact same? Not saying the other hardware doesn't have an effective multiplier, but can XSX have a more efficient one? It's not like the tech has been perfected.

The implementation has already been described. People can make a very logical connection as to what that 2x or 3x implies, but they choose not to. What we end up with are people thinking MS has some exclusive secret sauce to get 2x efficiency compared to other methods, when we know that's not the case.
 

Ascend

Member
The implementation has already been described. People can make a very logical connection as to what that 2x or 3x implies, but they choose not to. What we end up with are people thinking MS has some exclusive secret sauce to get 2x efficiency compared to other methods, when we know that's not the case.
Aaaaand how do we know this?
 

Dodkrake

Banned
So now it's not just the speed or power of the SSD that triples, but the RAM too?

Wow, if you leave the Xbox fanboys here for at least another 6 months, they will start talking about the hidden APU, and how MS will update the software and it will magically become a 32C/64T CPU with 512GB of RAM and GDDR8 at 10TB/s...

Why do we only have these delusions on Xbox?

I don't see anyone talking about DLSS 2.0 on the Switch making it do 4K; I don't see anyone talking about magic software that will turn the PS5 into 30TF...

This topic reads like a flat-earther topic; the last thing that matters to them is reality and facts.

Easy. Originally, TF were the greatest thing ever and the SSD was only for cutting loading times. This was MS's original pitch. Then the PS5 came along with a blazing-fast SSD that, and I'm paraphrasing, can change the way game worlds are built (the Xbox SSD can also do this). Then suddenly, since MS understood that their custom IO and SSD doesn't hold a candle to Sony's, they started pushing the narrative that every single new SW-related tech would fix their system and multiply its output and capabilities tenfold.

Facts are, with what we have so far:
  1. Sony invested a lot more in Custom IO to reduce bottlenecks
  2. Microsoft preferred to invest in SW applications
  3. HW, unless the engineering is stupidly poor, will beat SW in most applications where the usage is similar
  4. Most techs being presented as "Xbox exclusives" have had similar implementations patented by Sony
What to expect, with what we have so far
  1. Xbox will have the edge in RT and resolution
  2. PS5 will have the edge in sound and asset streaming
  3. Third party games will probably look roughly the same
  4. Game design in first party games will likely improve way more in Sony's camp, as Xbox needs to scale game design to the Xbox One and PC
  5. MS will likely have an edge in graphics for their first party games (not by much)
My 2 cents.
 
Last edited:
I'm afraid you have lost me on where I said XSX couldn't have a new implementation. What XSX adds are texture filters that help with the LOD transition between mip levels when a page fault is detected. So I guess that's a new implementation.

Can you expound on this? Is this something like a replacement texture? So instead of showing nothing at all, which results in pop-ins, it'll show a lower-quality LOD instead?
 
Last edited:
D

Deleted member 775630

Unconfirmed Member
Easy. Originally, TF were the greatest thing ever and the SSD was only for cutting loading times. This was MS's original pitch. Then the PS5 came along with a blazing-fast SSD that, and I'm paraphrasing, can change the way game worlds are built (the Xbox SSD can also do this). Then suddenly, since MS understood that their custom IO and SSD doesn't hold a candle to Sony's, they started pushing the narrative that every single new SW-related tech would fix their system and multiply its output and capabilities tenfold.
You make it sound as if Microsoft doesn't talk with developers and did this only for load times. As if no one in their own studios would properly explain what is needed and what the possibilities are with a fast SSD.
 
Last edited by a moderator:

Panajev2001a

GAF's Pleasant Genius
I'm not really replying to you, as you're clearly trolling, but I don't want any uninformed passers-by to be confused. The 2-3x multiplier was not made up by "Xbox fanboys";

He made a reasonable statement, but people took it and ran away with it, inferring a console-warrish claim that lacked support. So the base statement was made by MS; the "2-3x over PRT/software-implemented virtual texturing schemes" was made by excited fans of the Xbox platform, so to speak.
 

Panajev2001a

GAF's Pleasant Genius
Aaaaand how do we know this?

Common sense, and the fact that MS never actually stated it (a baseline is essential to relative comparisons)? The likelihood of MS sitting quiet on a 200%-or-more improvement, exclusive to XSX, over the state of the art in texture streaming anywhere else = 0%.
 
Last edited:

Bernkastel

Ask me about my fanboy energy!
This is pure bullshit. It doesn't.
And you have come to this conclusion how?

Streaming way less assets is something developers have been doing dating back to the beginning of game development. Current-gen consoles support PRT, or as Microsoft calls it, Tiled Resources. PS4 supported a more advanced implementation.
This is your own personal opinion though... not seeing any support/proof for this honestly.
You can hype the strengths XSX has up without having to present unsubstantiated opinions as facts.
If PRT and SFS were the same thing, we wouldn't be having this discussion. We have already talked about patent US10388058B2, which describes PRT and its issues. The information has been added to the OP, so you don't have to search a 43-page thread.
Migrating elements of texture streaming implementations from mip-based streaming (i.e., loading entire levels of detail) to tile-based streaming and partial residency can be an effective mitigation to performance issues. Techniques using partial residency can allow content complexity to continue to grow without a corresponding increase in load times or memory footprint. Tiled resources can be improved so that these PRTs can be widely adopted while minimizing implementation difficulty and performance overhead for GPUs. These improvements include hardware residency map features and texture sample operations referred to herein as "residency samples", among other improvements.
The first enhancement includes a hardware residency map feature comprising a low-resolution residency map that is paired with a much larger PRT, and both are provided to hardware at the same time. The residency map stores the mipmap level of detail resident for each rectangular region of the texture. PRT textures are currently difficult to sample given sparse residency. Software-only residency map solutions typically perform two fetches of two different buffers in the shader, namely the residency map and the actual texture map. The primary PRT texture sample is dependent on the results of a residency map sample. These solutions are effective, but require considerable implementation changes to shader and application code, especially to perform filtering of the residency map in order to mask unsightly transitions between levels of detail, and may have undesirable performance characteristics. The improvements herein streamline the concept of a residency map and move it into a hardware implementation. This is the custom hardware portion that performs feedback streaming in XSX.
A second enhancement includes an enhanced type of texture sample operation called a "residency sample". The residency sample operates similarly to a traditional texture sample, except that the part of the texture sample that requests texture data from cache/memory and filters the texture data to provide an output value is removed from the residency sample operation. The purpose of the residency sample is to generate memory addresses that reach the page table hardware in the graphics processor but do not continue on to become full memory requests. Instead, the residency of the PRT at those addresses is checked and missing pages are non-redundantly logged and requested to be filled by the OS or a delegate. This describes how Sampler Feedback is an improvement over standard PRT.
We also know that Nvidia acknowledged Sampler Feedback (which is only a software enhancement, without any of the hardware implementation of SFS) as improving load times.
Sampler Feedback enables better visual quality, shorter load times, and less stuttering.
 

Tripolygon

Banned
So it can be a new implementation, but it has to be the exact same? Not saying the other hardware doesn't have an effective multiplier, but can XSX have a more efficient one? It's not like the tech has been perfected.
Sure, it is entirely possible that Microsoft has come up with a more efficient way to do virtual texturing, and reading through the documentation, it does sound efficient. But more efficient compared to what? To consider something more efficient, it has to be compared to something that came before, right?

So, more efficient compared to how Microsoft used to do it in DirectX 11 and 12?
More efficient compared to how the entire development community as a whole used to do it?
Can you expound on this? Is this something like a replacement texture? So instead of showing nothing at all, which results in pop-ins, it'll show a lower-quality LOD instead?
I couldn't give you a definite answer to that question, as we don't have the full details on how those texture filters work. But from the little information we have, my understanding is that yes, it will smoothly blend from a lower LOD to avoid a pop-in.
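As a sketch of what such a blend could look like (pure speculation on my part; the patent only says the sample returns a resident level plus a smoothing component), the sampler would clamp to the finest resident mip and ease toward the next one as its tiles arrive:

Code:
#include <algorithm>
#include <cmath>

// Hypothetical sketch: sampleMip stands in for a real texture fetch at an
// integer mip level. Blend between the two resident-safe mips so detail
// fades in over a few frames instead of popping when new tiles land.
template <typename SampleFn>
auto sampleSmoothed(float requestedLod, float residentLod, float smoothing,
                    SampleFn sampleMip) {
    // Never sample finer than what's resident (mip 0 = finest).
    float lod = std::max(requestedLod, residentLod - smoothing);
    int coarse = int(std::ceil(lod));
    int fine = std::max(coarse - 1, 0);
    float t = float(coarse) - lod; // 0 = fully coarse, 1 = fully fine
    auto a = sampleMip(coarse), b = sampleMip(fine);
    return a + (b - a) * t; // lerp toward the finer level
}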
 

Dodkrake

Banned
You make it sound as if Microsoft doesn't talk with developers and does this only for load time. That no one in their own studios would properly explain what is needed and what the possibilities are with a fast SSD.

I'm not sure where you read that, but I'm not going to explain myself.

#TopTip: Read the parenthesis right where you bolded.
 

Tripolygon

Banned
If PRT and SFS were the same thing, we wouldn't be having this discussion. We have already talked about patent US10388058B2, which describes PRT and its issues. The information has been added to the OP, so you don't have to search a 43-page thread.

We also know that Nvidia acknowledged Sampler Feedback (which is only a software enhancement, without any of the hardware implementation of SFS) as improving load times.
We wouldn't be having this discussion if you actually bothered to read the documentation.

Terminology
Use of sampler feedback with streaming is sometimes abbreviated as SFS. It is also sometimes called sparse feedback textures, or SFT, or PRT+, which stands for “partially resident textures”.
 
Last edited:
Reason 1: There is nothing particularly instantaneous about virtual memory paging. It describes a tortuous circuit whereby the CPU has to acknowledge a page fault, look through the filesystem to find the requested page on the SSD, find an empty frame in main memory or evict a stale page to create one, and then swap in the correct page from the SSD. Yeah, nothing to brag about in terms of instantaneousness.
Reason 2: Phil Spencer, in an otherwise mundane interview in December 2019, drops an absolute bombshell: the SSD of the upcoming Xbox can be used as virtual RAM. Now this can either mean matching a page on the SSD to an address in the physical memory address space which remains unchanged (virtual memory paging), OR memory-mapping a portion of the SSD (100 GB of it) and adding it to the physical memory address space, contiguous with system RAM. Phil Spencer specifically mentions that the SSD will act virtually as RAM by significantly increasing the physical memory address space, comparing it to the 32-to-64-bit transition for good measure. Thus, it becomes highly probable that MS has succeeded in making a part of an NVMe SSD byte-addressable, which cuts down significantly on the CPU overhead associated with virtual memory, as the CPU likely can't differentiate between system RAM and the SSD.

This dismisses virtual memory and then describes a worse version. Byte-addressable SSD? Virtual memory handles this and swaps it for you. Without swapping, you just have a hard drive and all the latency and speed overheads associated with it. So either it swaps, and is therefore virtual memory, or it doesn't, and it's disk-only and will always be slow. A CPU can look up an address table pretty damn quickly, and if the page is in RAM, boom, a win.

Stop bringing up that Kirby person, as a) they said nothing close to it, and b) they aren't a source.
 

Bernkastel

Ask me about my fanboy energy!
We wouldn't be having this discussion if you actually bothered to read the documentation.

Terminology
Use of sampler feedback with streaming is sometimes abbreviated as SFS. It is also sometimes called sparse feedback textures, or SFT, or PRT+, which stands for “partially resident textures”.
Sampler Feedback is only the software implementation (the second enhancement), without any of the hardware implementation (the first enhancement) that most of the patent focuses on. Even then, they called it SFT or PRT+, and Nvidia still talked about how it improves load times.
A second enhancement includes an enhanced type of texture sample operation called a "residency sample". The residency sample operates similarly to a traditional texture sample, except that the part of the texture sample that requests texture data from cache/memory and filters the texture data to provide an output value is removed from the residency sample operation. The purpose of the residency sample is to generate memory addresses that reach the page table hardware in the graphics processor but do not continue on to become full memory requests. Instead, the residency of the PRT at those addresses is checked and missing pages are non-redundantly logged and requested to be filled by the OS or a delegate.
This is basically Sampler Feedback.
 

Three

Member
Aaaaand how do we know this?
Because the method is described in the spec sheet for DX12, and it describes nothing that would result in a 2x-3x saving in bandwidth. Nothing people have mentioned here does either. MS themselves, and developers here and elsewhere, are clearly pointing to where that 2x-3x is coming from, but people are ignoring it completely.
 
Last edited:

Ascend

Member
Because the method is described in the spec sheet for DX12, and it describes nothing that would result in a 2x-3x saving in bandwidth. Nothing people have mentioned here does either. MS themselves, and developers here and elsewhere, are clearly pointing to where that 2x-3x is coming from, but people are ignoring it completely.
Yes. By loading only textures that are needed, as they are needed, with fine granularity.

But I guess what we do need to ignore is the fact that they also mentioned the XSX has custom hardware; that sampler feedback is a DX12U spec but sampler feedback streaming isn't; the fine-granularity part; and most importantly, that you couldn't be accurate at all in the past with similar methods (which basically killed the benefit of them).

This dismisses virtual memory and then describes a worse version. Byte-addressable SSD? Virtual memory handles this and swaps it for you. Without swapping, you just have a hard drive and all the latency and speed overheads associated with it. So either it swaps, and is therefore virtual memory, or it doesn't, and it's disk-only and will always be slow. A CPU can look up an address table pretty damn quickly, and if the page is in RAM, boom, a win.
That a disk is slower than RAM does not mean that transferring from disk to GPU would not be faster than transferring from disk to RAM to GPU. And that was the whole point.

Stop bringing up that Kirby person, as a) they said nothing close to it, and b) they aren't a source.
They said nothing close to it? How come both of these people bring up how the NES works as a reference?


As for that person not being a 'source'... obviously not an official one. Whether credible info is provided by this person or not is up for debate, although some people are very keen on stopping that debate completely. If what Ronaldo8 at Beyond3D said in this post was considered credible, and somehow that other one is considered complete nonsense because it aligns with what Kirby Louise said, then I don't know what else there is to talk about.

This thread was fine, but again we have people coming in here dictating what is and isn't allowed to be talked about. So yeah, another thread that was going fine with great sharing of information, ruined because of ego boosting.

I'm out.
 

Three

Member
If PRT and SFS were the same thing, we wouldn't be having this discussion. We have already talked about patent US10388058B2, which describes PRT and its issues. The information has been added to the OP, so you don't have to search a 43-page thread.

We also know that Nvidia acknowledged Sampler Feedback (which is only a software enhancement, without any of the hardware implementation of SFS) as improving load times.
"The first enhancement includes a hardware residency map feature comprising a low-resolution residency map that is paired with a much larger PRT, and both are provided to hardware at the same time. The residency map stores the mipmap level of detail resident for each rectangular region of the texture."

Understand this part of it, then realise it has nothing to do with a bandwidth saving. It may not even result in a memory saving compared to other methods, because it ALWAYS keeps the entire low-res texture in memory. This is, as everyone has described over and over again, a method to deal with the texture not being resident in memory within the frametime, by falling back to the low-res texture in memory. Everyone has covered this over and over again in this thread.

"Second enhancement: Instead, the residency of the PRT at those addresses is checked and missing pages are non-redundantly logged and requested to be filled by the OS or a delegate."

Again, does this relate to needing 2x or 3x less texture data? This is basically describing how Sampler Feedback is applied for Sampler Feedback Streaming. It does not describe how this will result in 2x-3x less texture data to begin with than games on engines like UE5 or id Tech 6. It's on a different layer of abstraction.
 
Last edited:
We also know that Nvidia acknowledged Sampler Feedback (which is only a software enhancement without any of the hardware implementation of SFS) as improving load times.

Sampler Feedback is not "only a software enhancement" as some keep claiming. Here's Nvidia again on Sampler Feedback:

DirectX Sampler Feedback, another new feature, lets engineers capture and record texture sampling info and locations, all done in hardware.


Sampler Feedback shares the same philosophy as Variable Rate Shading: work smarter to reduce GPU load and improve performance. It is enabled by a hardware capability in our GeForce RTX architecture called Texture Space Shading.


...and Microsoft...

...but it's [Sampler Feedback] an extension to it [the sampling hardware]. It's a GPU hardware feature that extends existing hardware designs and gets you something new out of what used to be that closed black box.



The quote is at 6:25.

And here's a quote from Microsoft that succinctly explains what Sampler Feedback does and how it makes texture streaming more efficient:

Sampler feedback solves this by allowing a shader to efficiently query what part of a texture would have been needed to satisfy a sampling request, without actually carrying out the sample operation. This information can then be fed back into the game’s asset streaming system, allowing it to make more intelligent, precise decisions about what data to stream in next. In conjunction with the D3D12 tiled resources feature, this allows games to render larger, more detailed textures while using less video memory.


Microsoft says very clearly that Sampler Feedback (part of SFS on XSX) lets you use "less video memory" in your texture streaming implementation. How much less? They gave us a number when talking about SFS on Xbox, but who knows, maybe that number is meaningless for answering that question. Or maybe it's not a meaningless number. All of those who are confidently saying they know probably don't know, unless they are engine developers playing with DX12_2.
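For the curious, here's a conceptual CPU-side mock-up of the kind of information a sampler feedback map records (made-up structures; in the real feature the GPU's sampler writes a MinMip feedback texture):

Code:
#include <algorithm>
#include <cstdio>
#include <vector>

struct FeedbackMap {
    int tx, ty;              // feedback grid, one cell per texture tile
    std::vector<int> minMip; // finest mip any sample actually asked for

    FeedbackMap(int x, int y) : tx(x), ty(y), minMip(x * y, 255) {}

    // Called per texture sample: log where and how detailed, fetch nothing.
    void record(float u, float v, float lod) {
        int cx = std::clamp(int(u * tx), 0, tx - 1);
        int cy = std::clamp(int(v * ty), 0, ty - 1);
        int& m = minMip[cy * tx + cx];
        m = std::min(m, int(lod));
    }
};

int main() {
    FeedbackMap fb(16, 16);
    fb.record(0.1f, 0.2f, 3.4f); // this tile was sampled around mip 3
    fb.record(0.1f, 0.2f, 1.0f); // later at mip 1 -> finest request wins
    // The streaming system diffs this map against what's resident and loads
    // only (tile, mip) pairs that were truly requested -- the "intelligent,
    // precise decisions" from the quote above.
    std::printf("tile(1,3) wants mip %d\n", fb.minMip[3 * 16 + 1]);
    return 0;
}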

Does PS5 have Sampler Feedback (an RDNA2 feature)? We don't know that either. All that we normal people have is claims from Microsoft and a wait to find out.
 
Last edited:

Three

Member
Yes. By loading only textures that are needed, as they are needed, with fine granularity.

But other methods have already been doing exactly this. What hasn't been described is where you think there was this 2x or 3x inefficiency in needed textures on previous GPU or IO hardware that is NOT related to the SSD throughput, but where "we overcome the limited SSD throughput with something special". What is that special something?

Remember we are not talking about game engines at this point.

People are actually pointing to things that hammer the SSD even more (i.e., needing even higher throughput), effectively trading bandwidth for memory.
 
Last edited:
But other methods have already been doing exactly this. What hasn't been described is where you think there was this 2x or 3x inefficiency in needed textures on previous GPU or IO hardware that is NOT related to the SSD throughput, but where "we overcome the limited SSD throughput with something special". What is that special something?

Remember we are not talking about game engines at this point.

The claim all boils down to the question "which textures do I need to render this image?" Is it too hard to believe that Sampler Feedback lets you answer that question more precisely? Let's say previous texture streaming systems let you cut your memory use by 2X. Are those systems perfect at answering that question? Have you looked at that?

Why couldn't it be reduced a further 2X by having a more accurate understanding of what texture data you need?
 

Bernkastel

Ask me about my fanboy energy!
"The first enhancement includes a hardware residency map feature comprising a low-resolution residency map that is paired with a much larger PRT, and both are provided to hardware at the same time. The residency map stores the mipmap level of detail resident for each rectangular region of the texture."

Understand this part of it, then realise it has nothing to do with a bandwidth saving. It may not even result in a memory saving compared to other methods, because it ALWAYS keeps the entire low-res texture in memory. This is, as everyone has described over and over again, a method to deal with the texture not being resident in memory within the frametime, by falling back to the low-res texture in memory. Everyone has covered this over and over again in this thread.

"Second enhancement: Instead, the residency of the PRT at those addresses is checked and missing pages are non-redundantly logged and requested to be filled by the OS or a delegate."

Again, does this relate to needing 2x or 3x less texture data? This is basically describing how Sampler Feedback is applied for Sampler Feedback Streaming. It does not describe how this will result in 2x-3x less texture data to begin with than games on engines like UE5 or id Tech 6. It's on a different layer of abstraction.
I was not even talking about multipliers, just that SFS is way more advanced than PRT and not the same thing, but here you go


"Texture gets a bigger multiplier"


These things are in the OP, so people don't have to search the whole thread for them.
 

Three

Member
The claim all boils down to the question "which textures do I need to render this image?" Is it too hard to believe that Sampler Feedback lets you answer that question more precisely? Let's say previous texture streaming systems let you cut your memory use by 2X. Are those systems perfect at answering that question? Have you looked at that?

Why couldn't it be reduced a further 2X by having a more accurate understanding of what texture data you need?

Science isn't something that relies on faith.

Either give credible evidence in the already-available spec, or please stop trying to get me to "believe because we don't know". Please stop spamming things which would result in more bandwidth being required instead of less, too.

I was not even talking about multipliers, just that SFS is way more advanced than PRT and not the same thing, but here you go


"Texture gets a bigger multiplier"


These things are in the OP, so people don't have to search the whole thread for them.

You keep appealing to authority with those, but they don't describe what you think they're describing.
SFS is streaming the textures you need; so is UE5, so is id Tech 6. The question you're not answering, because you can't, is how you believe this is 2x more efficient at determining what is and isn't needed compared to current hardware. The answer is: it isn't. That may upset you, but this number comes from previous-gen games on Xbox One, as clearly stated by MS. Previous-gen games held a lot more in memory because they had a slow-ass drive, so ALMOST ALL games and engines kept textures in memory that were not visible in the scene, because the player could turn or move faster than you could load them in.
 
Last edited:

THE:MILKMAN

Member
^ Bernkastel

Surely the 'stacking' is already accounted for in the 50% compression behind the 4.8GB/s figure? After all, on PS5 the percentage is 35%?

Also, Tripolygon's post showing PS4 docs talking about PRT Mapping Granularity sounds exactly like what SFS is...?

Edit: I also have to say James Stanard is very careful about how he answers questions.
 
Last edited:

oldergamer

Member
Facts are, with what we have so far:
  1. Sony invested a lot more in Custom IO to reduce bottlenecks
  2. Microsoft preferred to invest in SW applications
  3. HW, unless the engineering is stupidly poor, will beat SW in most applications where the usage is similar
  4. Most techs being presented as "Xbox exclusives" have had similar implementations patented by Sony
What to expect, with what we have so far
  1. Xbox will have the edge in RT and resolution
  2. PS5 will have the edge in sound and asset streaming
  3. Third party games will probably look roughly the same
  4. Game design in first party games will likely improve way more in Sony's camp, as Xbox needs to scale game design to the Xbox One and PC
  5. MS will likely have an edge in graphics for their first party games (not by much)
My 2 cents.

For the Facts:
1. That is not a fact
2. incorrect, and bullshit to be honest
3. yes
4. Incorrect.

what to expect:
1. Possibly, we don't know
2. Possibly, we don't know
3. Possibly, we don't know
4. Bullshit.
5. Possibly, we don't know
 
Science isn't something that relies on faith.

Either give credible evidence in the already-available spec, or please stop trying to get me to "believe because we don't know". Please stop spamming things which would result in more bandwidth being required instead of less, too.

Sampler feedback solves this by allowing a shader to efficiently query what part of a texture would have been needed to satisfy a sampling request, without actually carrying out the sample operation. This information can then be fed back into the game’s asset streaming system, allowing it to make more intelligent, precise decisions about what data to stream in next. In conjunction with the D3D12 tiled resources feature, this allows games to render larger, more detailed textures while using less video memory.

 

jimbojim

Banned
Surely the 'stacking' is already accounted for in the 50% compression behind the 4.8GB/s figure? After all, on PS5 the percentage is 35%?

Of course it is. Saying otherwise, that the 4.8 GB/s is without BCPack, is actually spreading FUD. Anyway, as has been said many times everywhere, BCPack is a texture-only, lossy compression method, while Kraken is a general-purpose, lossless compression method.
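For what it's worth, the two percentages line up with the headline throughputs (rough arithmetic on the quoted figures, nothing more):

Code:
#include <cstdio>

int main() {
    // effective_rate = raw_rate / (1 - compression_savings)
    std::printf("XSX: %.1f GB/s\n", 2.4 / (1.0 - 0.50)); // = 4.8 GB/s
    std::printf("PS5: %.1f GB/s\n", 5.5 / (1.0 - 0.35)); // ~8.5 GB/s, the "typical" Kraken figure
    return 0;
}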
 