Funnily enough, this was confirmed by Phil Spencer almost a year ago in an interview with a German website. (Site is in German but most modern browsers like Chrome should be able to translate it)
Article Here
Here is the translated quote
Damn, that quote just threw another monkey wrench into this xD. I mean, reading it, is there really any other way to interpret it than as a direct streaming solution from the 100 GB reserved NAND partition to the GPU, through some altered form of GPUDirect Storage? Does this mean there are hardware modifications in the GPU for ExecuteIndirect which handle the addressing?
Now that I think about it, one of the AMD engineers had a LinkedIn post a while ago where they talked about the XSX APU, and they specifically mentioned ARM cores in there alongside the expected x86 cores (x86-64 in AMD's case). Could they have been referring to co-processor cores on the GPU that extend ExecuteIndirect functionality to pull in data from the 100 GB partition of the SSD without the CPU necessarily needing to be involved? Presumably in GPU-native format as well? I mean, it's all essentially a co-processor at the end of the day, like the ARM co-processor in the PS4 Pro (though that served a different purpose and, in that case, wasn't implemented into the GPU directly, as could be the case here with MS).
It's not even that far-fetched; Nvidia's GPUs, for example, integrate some type of FPGA-style cores into the die for certain logic, which I assume would extend to handling GPUDirect Storage calls. MS could have simply chosen ARM over FPGA because ARM cores are cheaper but still do what they'd need them to. There's also this quote from Ronaldo8 on B3D that's interesting and might fit into this speculation:
What's crazy is that we know that those particular methods are actively being used by some MS studios (Ninja Theory?) thanks to an interview given by PlayFab's head honcho:
https://venturebeat.com/2020/02/03/...ext-generation-of-games-and-game-development/
Of note is this particular nugget of information:
"Gwertzman: You were talking about machine learning and content generation. I think that’s going to be interesting. One of the studios inside Microsoft has been experimenting with using ML models for asset generation. It’s working scarily well. To the point where we’re looking at shipping really low-res textures and having ML models uprez the textures in real time. You can’t tell the difference between the hand-authored high-res texture and the machine-scaled-up low-res texture, to the point that you may as well ship the low-res texture and let the machine do it.
Journalist: Can you do that on the hardware without install time?
Gwertzman: Not even install time. Run time.
Journalist: To clarify, you’re talking about real time, moving around the 3D space, level of detail style?
Gwertzman: Like literally not having to ship massive 2K by 2K textures. You can ship tiny textures."
They highlighted the important parts; while this is basically referring to DLSS-style ML texture upscaling on the platform, it's interesting that Gwertzman stressed it can be done at runtime. That probably hints at some of the GPU capabilities Matt over on Era was suggesting, but it could also hint at customizations on the GPU to facilitate texture streaming via a co-processor in the GPU (working off extensions of ExecuteIndirect, with the pipeline fashioned similarly to Nvidia's GPUDirect Storage but for different purposes; in Nvidia's case it's mainly useful because PCs are non-hUMA, whereas here the benefit could be a very low, virtually non-existent abstraction layer between the GPU and the 100 GB partition of NAND storage).
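To put rough numbers on the "ship tiny textures" idea, here's a back-of-the-envelope sketch. The assumptions are mine, not from the interview: uncompressed RGBA8 texels, and an ML model upscaling a 512x512 source to the 2K by 2K size Gwertzman mentions.

```python
# Storage math for shipping a low-res texture and ML-upscaling it at runtime.
# Assumption: uncompressed RGBA8 (4 bytes per texel), mip 0 only.

BYTES_PER_TEXEL = 4  # RGBA8

def texture_size_mb(width, height, bytes_per_texel=BYTES_PER_TEXEL):
    """Raw size of a single mip-0 texture in mebibytes."""
    return width * height * bytes_per_texel / (1024 ** 2)

hi_res = texture_size_mb(2048, 2048)  # what you'd normally ship
lo_res = texture_size_mb(512, 512)    # what you ship if ML uprezzes it

print(f"2K texture: {hi_res:.1f} MB, 512px source: {lo_res:.2f} MB, "
      f"savings: {hi_res / lo_res:.0f}x")
```

So even before block compression enters the picture, shipping the quarter-resolution source is a 16x reduction in raw texel data, which is the kind of win that would make the "not even install time, run time" remark matter.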
I think this is all starting to piece itself together rather nicely now. It also fits with what the Dirt 5 dev's comments were suggesting about mid-frame use of streamed texture data. Provided the NAND backing the 100 GB partition has good enough latency (which I would assume it does), if a car model (in Dirt 5's case) only needs a 5 MB texture file for a panel deformation, then even at 60 FPS with a different texture each frame, that's still only 5 MB/frame, or 300 MB/s. That's easily within the SSD's limits, and we're talking about textures only being streamed a single time by the GPU, which would likely work with a texture for a moment (in the local caches), dump it, then replace it with a new texture streamed in from the SSD. So replacing textures mid-frame, as the Dirt 5 team member described, is perfectly feasible, especially if MS's implementation of XvA is what I'm starting to think it is based on these comments directly from people on the team (and after having seriously considered the alternatives).
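The per-frame streaming math above can be sanity-checked against the Series X's published 2.4 GB/s raw SSD throughput; the 5 MB per-frame texture payload is the hypothetical from this paragraph, not a confirmed figure:

```python
# Sanity check: one small texture streamed per frame vs. raw SSD throughput.
# 2.4 GB/s raw is Microsoft's published XSX figure; 5 MB/frame is the
# hypothetical Dirt 5 example from the paragraph above.

RAW_SSD_GBPS = 2.4   # XSX raw SSD throughput, GB/s
texture_mb = 5       # hypothetical per-frame texture payload, MB
fps = 60

required_mbps = texture_mb * fps    # MB/s needed for one texture per frame
budget_mbps = RAW_SSD_GBPS * 1000   # MB/s available (decimal units)

print(f"needed: {required_mbps} MB/s, available: {budget_mbps:.0f} MB/s, "
      f"headroom: {budget_mbps / required_mbps:.0f}x")
```

Even at those numbers there's roughly 8x headroom on raw throughput alone, before compression, which is why the mid-frame replacement claim isn't bandwidth-limited; latency and scheduling would be the real constraints.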
I'm starting to think that, if these GPU modifications are what I believe they are, another thing I've been thinking about could be possible. But I'll save going into that for some other time.
It's just that the GPU on PC does not have access to the same virtual address space as the CPU; but yes, in this model the CPU has direct access to the SSD on your Windows PC. How that happens, and how it can be done more efficiently, is where DirectStorage can help (in addition to the CPU and GPU sharing the same address space).
I read the full quote, and it's a carefully worded statement that does convey the clear intention of making the software model more usable and more efficient, but it does not directly state what you are saying. Then again, we shall see... both MS and Sony are talking about near-instantaneous access to data on the SSD and have an I/O pipeline going from the SSD to the GPU caches, so they may both have this capability... I'm not sure either does, though, and in order to use it to overcome the memory/cache savings you'd get from about 2x the disk bandwidth, you would run into other problems (your shaders potentially wasting tons of cycles, unable to cover up the extra latency).
I think if Sony had this feature, Road to PS5 would've been the time to talk about it, no? After all, they spoke A LOT about the SSD at that event; it easily took up the majority of the presentation time (audio being 2nd, and their variable frequency approach 3rd). It was a conference presentation aimed at developers, after all. These are features developers would have liked to hear about, if present, in a conference specifically targeting them.
So I'm inclined to believe Sony is achieving this through quite different means: just the raw throughput of the dedicated processor in the I/O block writing to/from RAM. Their goal is to maximize use of the 16 GB of memory (minus the OS reserve, so roughly 14 GB) as effectively as possible, meaning that at any time, if some piece of data is needed, it can be streamed into RAM through the I/O block almost instantly. That's their approach.
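As a rough check on how "instant" that is, here's the time to repopulate the whole ~14 GB game-visible pool at the figures Cerny gave in Road to PS5 (5.5 GB/s raw, 8-9 GB/s typical compressed); the exact OS reserve is an assumption on my part:

```python
# How long it takes to refill the game-visible RAM pool from the PS5 SSD,
# using the throughput figures from the Road to PS5 presentation.
# The 14 GB game-visible figure matches the estimate above; the real OS
# reserve hasn't been confirmed.

GAME_RAM_GB = 14.0

refill_times = {
    label: GAME_RAM_GB / gbps
    for label, gbps in [("raw", 5.5), ("compressed (typical)", 8.5)]
}
for label, seconds in refill_times.items():
    print(f"{label}: {seconds:.1f} s to refill {GAME_RAM_GB:.0f} GB")
```

Call it two to two and a half seconds to turn over the entire pool, which is what makes the "RAM as a window onto the SSD" framing plausible without any GPU-direct path at all.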
It seems more like MS are the ones who've taken an approach mirroring the functionality of AMD's SSG cards and Nvidia's GPUDirect Storage, although both of those work differently from what I've been speculating MS's approach could be. At the very least, we can also theorize that MS's approach could be (exclusively, or in addition to what I've speculated above) having the GPU read data directly from the 100 GB NAND partition on the SSD and place it in the 10 GB pool. That seems limited in scope, though, and doesn't explain some of the capabilities developers on the machine have already mentioned publicly. Now, if it were possible for the GPU to do that against only the 4x 1 GB modules while the CPU etc. use the slower pool simultaneously, that could be interesting. Since the OS is managing the virtualized pool partitions anyway, it and the kernel could probably adjust the virtualization semi-dynamically, although there are no working examples of this in any system (console or PC) that I'm aware of, hence it's fringe speculation.
Of the two, I've only really seen MS people or devs on XSX mention anything suggesting direct streaming from the SSD in a way that could bypass RAM. Even if this is possible, though, it will have obvious limitations: SSD speeds are still far too slow for the kind of repeated read accesses the GPU performs against RAM, so it would probably mainly be used for selective, single-time streamed access of small chunks of data written into the GPU caches, depending on how the workloads are assigned (how the CUs are assigned data, basically). Still very useful, but it has its limitations.