PaintTinJr
Member
But it isn't slower in reality IMO, because the bottleneck that is missing in the PS5 situation is that data the CPU modifies can be modified at full bandwidth with interleaved CPU/GPU access to the 16GB, and then used by the GPU at full bandwidth after the update.

No, the lanes from the APU to the RAM are shared. This means that when you access the RAM you can access either the fast RAM at 320 bits or the slow RAM at 192 bits, but only one of the two at any given time.
The difference in bus width is because some RAM chips are 2GB and some are 1GB, but these are the same RAM chips accessed over the same memory lanes.
On the PlayStation you can access all RAM at 256 bits, so it is slower than the 320-bit RAM of the XSX and faster than the 192-bit RAM, since both consoles run their RAM at the same clock.
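To put rough numbers on those bus widths, here is a minimal sketch of the peak bandwidth arithmetic. The 14 Gbit/s per-pin data rate is my assumption (the post only says both consoles run the RAM at the same clock), and the function name is made up for illustration.

```python
# Peak bandwidth = bus width (bits) * per-pin data rate (Gbit/s) / 8 bits per byte.
# ASSUMPTION: 14 Gbit/s per pin on both consoles; this figure is not in the post.
PIN_RATE_GBIT_S = 14


def peak_bandwidth_gb_s(bus_width_bits, pin_rate_gbit_s=PIN_RATE_GBIT_S):
    """Return the theoretical peak bandwidth in GB/s for a given bus width."""
    return bus_width_bits * pin_rate_gbit_s / 8


xsx_fast = peak_bandwidth_gb_s(320)  # XSX 10GB pool, all chips in parallel
xsx_slow = peak_bandwidth_gb_s(192)  # XSX 6GB pool, only the 2GB chips
ps5 = peak_bandwidth_gb_s(256)       # PS5, single unified 16GB pool

print(f"XSX fast: {xsx_fast} GB/s, XSX slow: {xsx_slow} GB/s, PS5: {ps5} GB/s")
```

These are theoretical peaks only; they say nothing about contention between CPU and GPU on the shared lanes.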
On the XsX, depending on which way round the access goes, there will be many times when the CPU has to modify data (prepare data for the GPU), and there is a performance cost there, one way or the other.
If the data is in the 10GB, the interleaved update reduces the GPU's effective memory throughput for its duration, which could have a further negative cost to GPU caching by reducing the aperture feeding or receiving cache data. The CPU updates would further reduce the GPU bandwidth to a split of 560GB/s and 336GB/s, with every process cycle used by the CPU costing proportionally more GPU throughput.

If the data is in the 6GB, then it needs to be copied after modification, and that interleaved update and data copy then have costs to the overall system bandwidth for the update and copy, and to the GPU caches for the copy.
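As a very rough illustration of why time spent on the slower pool costs proportionally more, here is a sketch of a simple time-sliced model. It assumes the shared lanes serve only one pool at a time, uses 560 GB/s and 336 GB/s as the two peak rates (320-bit and 192-bit at an assumed 14 Gbit/s per pin), and ignores caches, copies and scheduling entirely. The function name is hypothetical.

```python
def blended_bandwidth_gb_s(slow_time_fraction, fast_gb_s=560.0, slow_gb_s=336.0):
    """Time-weighted average bandwidth over a shared bus.

    slow_time_fraction is the share of bus time spent on slow-pool accesses.
    ASSUMPTION: a plain linear blend; real contention is more complicated.
    """
    if not 0.0 <= slow_time_fraction <= 1.0:
        raise ValueError("slow_time_fraction must be between 0 and 1")
    return (1.0 - slow_time_fraction) * fast_gb_s + slow_time_fraction * slow_gb_s


# Sweep a few fractions to see how the average falls as slow-pool time grows.
for fraction in (0.0, 0.25, 0.5, 1.0):
    print(fraction, blended_bandwidth_gb_s(fraction))
```

Under this toy model the average drops linearly from the fast-pool peak to the slow-pool peak as more bus time goes to the 6GB, so it is only a way to reason about the trade-off, not a measurement.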
Neither of those XsX scenarios can be optimized to match the PS5's unified setup, and that still doesn't even account for the scheduling of both console systems using RAM for the OS and CPU workloads, which will lower the GPU caches' opportunities to access RAM. In the PS5's case, though, the cache scrubbers and IO complex will help both sides of that predicament: for the GPU, RAM accesses will be reduced by the scrubbers, and the IO complex will DMA data in and out of RAM for both CPU and GPU needs with less CPU check-in involvement, reducing the times when CPU RAM access blocks GPU access.