…. Both GPU and CPU have access to the exact same 10 chips; BOTH can see, write to and read from all 13.5GB. It doesn't matter whether data is accessed by the CPU or the GPU, it matters whether that data needs to be accessed quickly or not. Sound file? Put it in the slow pool. BVH? Put it in the fast pool. CPU needs to access the BVH? It will access it in the fast pool. There is no copying from pool to pool. The data genuinely shared between GPU and CPU will sit in the 10GB, and it's extremely small. And no, the GPU doesn't need access to all the CPU data; it doesn't need access to over 90% of the CPU data.
….
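For reference, here is roughly what the placement model being claimed above would look like from a developer's side – a minimal C sketch, where alloc_with_hint and the POOL_FAST / POOL_STANDARD hints are entirely made-up stand-ins for whatever the real runtime would expose, not any documented API:

    /* Hypothetical illustration only: a placement hint steers an allocation to
     * the fast or the standard pool; nothing is ever copied between pools, and
     * CPU and GPU would both dereference the same address afterwards. */
    #include <stddef.h>
    #include <stdlib.h>

    typedef enum { POOL_FAST, POOL_STANDARD } pool_hint;

    /* Stand-in for the real allocator; here it just falls back to malloc. */
    static void *alloc_with_hint(size_t size, pool_hint hint)
    {
        (void)hint; /* a real allocator would pick the physical pool here */
        return malloc(size);
    }

    int main(void)
    {
        void *sound = alloc_with_hint(64u << 20, POOL_STANDARD); /* sound data: slow pool is fine */
        void *bvh   = alloc_with_hint(256u << 20, POOL_FAST);    /* BVH: bandwidth-critical       */
        /* CPU and GPU would both use these pointers directly, no pool-to-pool copy. */
        free(bvh);
        free(sound);
        return 0;
    }
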
I've had a look into the physical hardware side of things and I'm still not convinced – but I'm less inclined to fully rule out your claim, having looked extensively at the AMD Infinity Fabric material, the Zen2 floor plan, and the shot of the exposed XsX APU – all of which looks pretty inconclusive.
Conventionally a CPU core will try to offload its outer cache (L3 in this case) through the cache-attached memory controller to the pool of memory it is always scheduled to read from and write to, and that pool is also the one that offers the lowest latency because of wiring length – in this case the 6GB. It's done that way to avoid data-starving or blocking the cores by being unable to evict or refill the outer cache (and then cascading that copying through the cache hierarchy). The idea of data-starving and blocking the CPU and GPU at once just to avoid an asynchronous copy from the 6GB to the 10GB doesn't sound good for parallel processing, latency or utilisation (IMHO); especially when the L3 is typically dominated by data destined for the 6GB, and that majority would be stuck waiting for the portion destined for the 10GB to get the memory controller before it can itself be evicted to, or refilled from, the 6GB.
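To be concrete about what that asynchronous 6GB-to-10GB copy would look like, here's a rough C sketch of the staging pattern I'd expect instead: two malloc'd buffers stand in for the two pools, and the plain memcpy stands in for a transfer that on real hardware would be queued on a copy/DMA engine – all of which is assumption on my part, not anything documented for the XsX.

    #include <stdlib.h>
    #include <string.h>

    #define STAGE_SIZE (4u << 20) /* 4MB chunk, purely for illustration */

    int main(void)
    {
        /* Two heap buffers stand in for the two physical pools. */
        unsigned char *near_pool = malloc(STAGE_SIZE); /* the "6GB" side the CPU's L3 naturally spills to */
        unsigned char *fast_pool = malloc(STAGE_SIZE); /* the "10GB" side the GPU wants to read from      */
        if (!near_pool || !fast_pool)
            return 1;

        /* 1. The CPU builds the data in its near pool, so L3 evictions stay cheap. */
        memset(near_pool, 0xAB, STAGE_SIZE);

        /* 2. The data is copied into the fast pool before the GPU consumes it.
         *    On real hardware this copy would be queued on a copy engine and
         *    overlap other work, so neither CPU nor GPU stalls waiting on it.   */
        memcpy(fast_pool, near_pool, STAGE_SIZE);

        free(fast_pool);
        free(near_pool);
        return 0;
    }
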
(AFAIK) Conventionally the memory controllers are initialized by the low-level BIOS/UEFI firmware prior to bootstrapping – hence why a Raspberry Pi, or a laptop with an iGPU, fixes its memory split before bootstrapping – and during that initialization the controller(s) are set up to enqueue commands from the outer CPU cache. So it's not likely that a to-the-metal approach would let developers change how data gets to each memory pool.
But having said all that, the Infinity Fabric design of Zen2 and Vega – which I assume the XsX uses, or perhaps the early Infinity Architecture, given the XsX's old-style 60 CU GPU – looks like it might be able to decouple the memory controllers (for all chiplets) behind the fabric and make CPU access to the 10GB trivial; but the split bandwidths and differing bus widths suggest something hasn't been decoupled into that optimal setup. A normal Zen2 would have two 64-bit memory controller units (MCUs), one per L3 cache – and one L3 cache per 4C/8T module – and we know the XsX has a 192-bit bus width on the Zen2 side, so three MCUs (and logically three L3 caches). RDNA2 GPUs, meanwhile, will have either a 256-bit bus (4 MCUs, 2 per side) or a 384-bit bus (6 MCUs, 3 per side). Neither of those setups fits the XsX's 5 MCUs.
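Laying the bus-width arithmetic out explicitly (using only the figures above, and assuming the conventional 64 bits per MCU):

    #include <stdio.h>

    int main(void)
    {
        const int chips     = 10;  /* GDDR6 modules on the XsX               */
        const int chip_bus  = 32;  /* bits per module                        */
        const int mcu_width = 64;  /* conventional width of one MCU          */
        const int cpu_bus   = 192; /* bus width attributed to the Zen2 side  */

        int total_bus = chips * chip_bus;      /* 10 * 32 = 320 bits          */
        int total_mcu = total_bus / mcu_width; /* 320 / 64 = 5 MCUs in total  */
        int cpu_mcu   = cpu_bus / mcu_width;   /* 192 / 64 = 3 MCUs           */
        int gpu_mcu   = total_mcu - cpu_mcu;   /* leaves 2 on the GPU side if
                                                  nothing is decoupled        */

        printf("total %d-bit -> %d MCUs (CPU %d, GPU %d)\n",
               total_bus, total_mcu, cpu_mcu, gpu_mcu);
        return 0;
    }

That 3 + 2 split is the non-decoupled reading I'm worried about below.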
If the XsX's ten 32-bit chips are interfaced with 3 MCUs on the Zen2 side and another 5 MCUs in the GPU, then what you're claiming seems highly plausible, as the MCUs would likely manage the complexity. However, if the memory is connected to 3 MCUs on the Zen2 side and only 2 MCUs live in the GPU, then for latency reasons alone I would expect the CPU to pass all data to the GPU by copying to the 6GB and then to the 10GB. The picture of the exposed XsX chip has 5 bright silver units on its north edge – a cluster of 3 in the middle and one further out on each side. I think they could be the MCUs because they sit before the black moat, which I assume is the Infinity Fabric wiring out to all 10 GDDR6 modules, and that matches the Zen2-type design (AFAIK).