GAF machine
The name I'm referring to is 'YAMAZAKI'. It's on the back of PS5's SSD controller, and it's the last name of Takeshi Yamazaki, an SIE engineer whom Ken Kutaragi put at the forefront of Emotion Engine and CELL Broadband Engine R&D. The significance of this is that the SSD controller is CELL-inspired: one of the original PS5 I/O complex + SSD patent applications filed by Hideyuki Saito (who devised the PS5's I/O complex + SSD data access request and address translation schemes, and also helped IBM test CELL's functional correctness) cites a patent in its 'Patent Citations' section (i.e., US8504736B2) for a file input/output scheduler (FIOS) that runs on CELL's SPUs. That FIOS application in turn cites an application for a CELL-accelerated NIC server board which Yamazaki co-invented (Sony/SIE used these server boards for their streaming services; the link to the intro of the server board whitepaper can be found in this related OP, embedded in the words 'network processing'). Although the application details the specific invention of data transfer from a CELL's SPE to a NIC, language in one of its entries encapsulates two defining characteristics of Yamazaki's SSD controller, which are:
[0004]
The present invention has been conceived in view of the above-described situation, and an object of the invention is to provide an information processing device, data transfer method and information storage medium that can commence data transfer to an I/O device immediately, and can stably exhibit data transfer performance.
Given this, and the harmonious operation of the Zen 2 + I/O complex + SSD controller system, I'd hazard a guess that Yamazaki was responsible for designing PS5's I/O complex too. There are so many similarities between the CELL + NIC server board and the Zen 2 + I/O complex + SSD controller system that it's impossible to ignore the overlap. I will attempt to summarize them from their respective applications, but these summaries get into the weeds, so those still reading, please bear with me:
CELL + NIC server board:
The CELL Broadband Engine is an "information processing device" consisting of (focusing on CPUs only) a main processor and sub-processors connected via an on-chip coherent bus (i.e., the EIB):
[0013]
As shown in FIG. 1, this information processing device 10 includes a main processor 12 and a plurality of sub-processors 24-1 to 24-n, and is constructed as an asymmetric multi-core processor... The main processor 12 and the plurality of sub-processors 24-1 to 24-n are all connected to a bus 22,...
Zen 2 + I/O complex + SSD controller system:
The Zen 2 + I/O complex (referred to as the "host unit") is part of an "information processing device" consisting of (focusing on CPUs only) a main CPU and sub-CPUs connected via a coherent bus:
[0019]
An information processing device 10 includes a host unit 12,
[0050]
The host unit 12 includes a main CPU 30, a sub-CPU 32, and a memory controller 34 connected together by a coherent bus 36.
Summary:
The CELL + NIC server board and Zen 2 + I/O complex patent applications both refer to their devices as an "information processing device" with main and sub-CPUs. CELL's PPE is its main processor (i.e., main CPU) and its SPEs are its sub-processors (i.e., sub-CPUs). Zen 2 is the I/O complex's main CPU (i.e., main processor), and the two I/O co-processors inside the I/O complex, along with the SSD controller outside the I/O complex, are sub-CPUs (i.e., sub-processors).
CELL + NIC server board:
The DMAC inside a CELL's SPE (sub-processor) is used to bypass the main processor (i.e., PPE) when accessing data stored in main memory (i.e. system memory):
[0015]
The sub-processors 24 (24-1 to 24-n) are ancillary program execution means containing local memory 24 a, a memory management section 24 b and a DMAC (Direct Memory Access Controller)... The DMAC 24 c is also a control unit for direct access to the main memory 14, without going via the main processor 12.
Zen 2 + I/O complex + SSD controller system:
The DMAC inside the I/O complex in conjunction with an adjacent I/O co-processor (specifically the one "dedicated to SSD I/O") is used to bypass the main CPU (Zen 2) when accessing the requested data stored in system memory:
[0068]
When the data passes the ECC check, the data in question is stored in a kernel area 70 of the system memory 14 reserved in advance by the sub-CPU 32, for example, by a DMAC not illustrated, and the sub-CPU 32 is notified to that effect (S44).
Summary:
The SPE (equipped with an internal DMAC) facilitates high-speed data transfers so that the main processor (i.e., PPE) doesn't have to. The I/O co-processor inside the I/O complex that's dedicated to SSD I/O, along with an adjacent DMAC, facilitates high-speed data transfers so that the main CPU (i.e., Zen 2) doesn't have to.
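The offload pattern both summaries describe — a sub-processor moving the bytes so the main processor only files a request and checks for completion — can be sketched in Python. All names here are illustrative: a worker thread stands in for the SPE/co-processor + DMAC pair, and the main thread stands in for the PPE/Zen 2.

```python
import threading
import queue

# Toy model of the offload pattern: a "sub-processor" worker services
# memory-copy requests so the "main processor" never touches the data itself.
transfer_queue = queue.Queue()

def sub_processor_dma_worker():
    # Stands in for an SPE or I/O co-processor driving a DMAC.
    while True:
        req = transfer_queue.get()
        if req is None:          # shutdown sentinel
            break
        src, dst = req
        dst[:] = src             # the actual data movement (the "DMA")
        transfer_queue.task_done()

worker = threading.Thread(target=sub_processor_dma_worker, daemon=True)
worker.start()

# The main processor only enqueues a transfer descriptor and moves on.
source = bytearray(b"game asset data")
destination = bytearray(len(source))
transfer_queue.put((source, destination))
transfer_queue.join()            # main CPU waits on completion, not on bytes
transfer_queue.put(None)
worker.join()
```

The point of the sketch is the division of labor: the main thread never copies data, it only submits a descriptor and synchronizes on completion, which is the property both patent summaries attribute to the SPE and the SSD I/O co-processor.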
CELL + NIC server board:
With CELL, the PPE (main processor) and SPE (sub-processor) page sizes are the same. These pages are strictly 4KB, 64KB, 1MB or 16MB in size (i.e., page granularity levels), and are grouped into an address translation table to be referred to by an SPE:
[0015]
The sub-processors 24 (24-1 to 24-n) are ancillary program execution means containing local memory 24 a, a memory management section 24 b...
[0014]
The address translation table is a table which associates logical addresses with physical addresses, and is made up of page groups of a specified size such as 4 KB. Therefore, the memory management section 12 a is provided with a memory for storing the necessary pages, of these pages,...
Zen 2 + I/O complex + SSD controller system:
With the Zen 2 + I/O complex + SSD controller system, the Zen 2 (main CPU), I/O co-processor (sub-CPU) and SSD controller (a sub-CPU outside the I/O complex) page sizes are the same. These pages are strictly 4KB, 64KB, 1MB or 16MB in size (i.e., page granularity levels), and are grouped into an address conversion table to be referred to by the SSD controller:
[0051]
Although it is not necessary for the main and sub-CPUs 30 and 32 to have the same instruction set architecture and operating system, the main and sub-CPUs 30 and 32 are connected by the coherent bus 36 and their page sizes are the same so that data stored in the system memory 14 can be shared between them.
[0062]
The flash controller 18 not only handles the data storage in question but also generates an address conversion table for associating a logical address of compressed data and a physical address of an area where the actual data is stored.
[0030]
A plurality of address spaces different in granularity level are defined in the address conversion table... It should be noted that there may be three or more granularity levels.
Summary:
CELL's SPE (sub-CPU) and PPE (main CPU) have the same page sizes, which are 4KB, 64KB, 1MB or 16MB (i.e., page granularity levels). The SPE refers to an address translation table to locate data. Zen 2 (main CPU), the I/O co-processor (sub-CPU inside the I/O complex) dedicated to SSD I/O, and the SSD controller (sub-CPU outside the I/O complex) have the same page sizes, which are 4KB, 64KB, 1MB or 16MB. The SSD controller refers to an address conversion table to locate data.
CELL + NIC server board:
The benefit of using CELL's SPE (sub-processor) to handle data transfers from system memory, rather than its PPE (main processor), is high-speed data transfer:
[0019]
According to the above described information processing device 10, since a single sub-processor 24-1 constituting a multi-core processor is allocated solely to data transfer, it is possible to implement data transfer at high speed and with low latency regardless of the operating state of the main processor 12.
Zen 2 + I/O complex + SSD controller system:
The benefit of using the I/O complex's SSD I/O co-processor (sub-CPU) to access system memory, and a flash controller (sub-CPU) to access data in flash memory, rather than Zen 2 (main CPU), is high-speed data transfer:
[0052]
The sub-CPU 32 divides a file read request issued by the main CPU 30 into read requests for data of a given size, storing the requests in the system memory 14. Thus, in the present embodiment, hardware other than the main CPU 30 handles the major part of data access to the flash memory 20, and the read unit is reduced to a finer one immediately after issuance of a file access request. This allows for parallel access to a plurality of NAND devices, thus providing a high transfer rate.
Summary:
Offloading read requests from the main CPU (i.e., the PPE or Zen 2) onto a sub-CPU (i.e., an SPE or a dedicated SSD I/O co-processor) speeds up data transfers.
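Entry [0052]'s request-splitting step — one big file read divided into fixed-size chunk requests that independent channels can then service in parallel — can be sketched like this (chunk size, names, and the thread-pool stand-in for NAND channels are all illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 64 * 1024  # one fixed read granularity (64KB here, for illustration)

def split_read(offset, length, chunk=CHUNK):
    """What the sub-CPU does: turn one large file read into fixed-size
    per-chunk requests that independent NAND channels can service."""
    requests = []
    end = offset + length
    while offset < end:
        requests.append((offset, min(chunk, end - offset)))
        offset += chunk
    return requests

def service(request, backing):
    # Stands in for one NAND channel answering one chunk request.
    off, ln = request
    return backing[off:off + ln]

backing_store = bytes(range(256)) * 1024          # 256KB of toy "flash"
requests = split_read(0, len(backing_store))
with ThreadPoolExecutor(max_workers=4) as pool:   # parallel "NAND channels"
    parts = list(pool.map(lambda r: service(r, backing_store), requests))
result = b"".join(parts)
```

The splitting itself is trivial; the payoff the patent describes is that once the read is chunked, every chunk is an independent request, so a controller with many NAND devices can keep all of them busy at once instead of serializing one big transfer.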
Two other similarities, not part of the two patent applications but worth mentioning because they show CELL's heavy influence on the I/O complex's design, are:
1) While the SPE is a type of CPU, it's also an accelerator in its own right. As some of you still reading may not know, it was used as an engine to accelerate decompression for PS3 (e.g., the Oodle beta and subsequent Oodle releases starting with 1.1.0, April 11, 2013). The Kraken decompression engine (i.e., the accelerator mentioned in entry [0081] of the I/O complex + SSD patent) is essentially a substitute SPE designed to chew on chunks of data the size of an SPE's 256KB local store. Entry [0111] of the CELL FIOS patent and Fabian Giesen's Kraken tweets practically say as much:
[0111]
By way of example, in the certain cell processor architectures, the SPE 206 have a local store capacity of 256 Kbytes. This is often sufficient memory space for the compressed input, the decompressed output and the code for the decompression algorithm
2) PS3's low-level graphics API (GCM) largely runs on SPUs and treats them as a form of graphics command processor/coherency engine. They can be synchronized with the RSX to give the end user tight control over when and which commands the RSX executes, via the placement and subsequent overwriting of "local stalls" in the command buffer. SPUs can also overwrite previously written data located in a range of addresses stored in VRAM (per section 2.1 Memory Allocation under "RSX local memory", continued here) before the RSX can copy the stale data out of VRAM into its caches. PS5's I/O co-processors, coherency engines and cache scrubbers automate and evolve this process from low-level GPU memory management to low-level GPU cache management. But the main takeaway here is that SPUs free up their main CPU (i.e., PPE) from having to keep tabs on and overwrite address ranges, just as PS5's I/O co-processors and coherency engines free up their main CPU (i.e., Zen 2) from doing the same.
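The 256KB budget entry [0111] describes in point 1 — compressed input, decompressed output, and the decompressor's code all sharing one local store — reduces to simple arithmetic. The example sizes below are made up purely to illustrate the constraint:

```python
LOCAL_STORE = 256 * 1024  # SPE local store size, per entry [0111]

def fits_local_store(compressed_len, decompressed_len, code_len):
    """Entry [0111]'s constraint: the compressed input, the decompressed
    output, and the decompression code must all coexist in 256KB."""
    return compressed_len + decompressed_len + code_len <= LOCAL_STORE

# A 64KB compressed chunk expanding to 128KB, with a 32KB decompressor,
# leaves 32KB of headroom (example sizes are hypothetical):
ok = fits_local_store(64 * 1024, 128 * 1024, 32 * 1024)
```

This is the sizing logic behind the "substitute SPE" framing above: a decompression engine built around chunks that respect this budget is, by construction, chewing on SPE-local-store-sized pieces of data.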
Extremely long story short, I think Yamazaki designed the I/O system not only to rapidly store and transfer data inside PS5, but also to rapidly feed an external compatibility adaptor in the service of PS5 backwards compatibility with PS1, 2 and 3 games (in this post, I raised the prospect of SIE releasing a CELL/RSX-based compatibility adaptor to run PS1, 2 and 3 games "on" PS5).
By this I mean that once a user has selected a previously downloaded PS1, 2 or 3 game (purchased from the PS Store in the past or rented from PS Now) to play from an SSD, Zen 2 tells the adaptor to ready itself for the selected game. Once ready, the adaptor notifies Zen 2, which then tells the SSD I/O co-processor inside the I/O complex to have the SSD controller read the selected legacy game's file data from an SSD and forward it in 4KB, 64KB, 1MB or 16MB chunks to the adaptor's internal flash storage via an Ethernet cable.
As the game file data arrives and is stored in the adaptor's flash storage, CELL quickly accesses it under a scheme similar to PS5's (i.e., using two or more priority levels) and processes it according to the legacy console it belongs to. This could result in quicker load times, because the data reads sent from PS5's SSD to the adaptor are already chunked in the 4KB, 64KB, 1MB or 16MB sizes the way CELL likes them. After the CELL/RSX-based adaptor renders a frame of the legacy game, the frame is sent via Ethernet cable to PS5's system memory, pulled out by Oberon, and either displayed as-is or spruced up/upscaled before being displayed.
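The "two or more priority levels" access scheme mentioned above could be sketched as a simple priority queue: urgent reads always jump ahead of bulk ones, while requests at the same level stay in order. Class and request names here are hypothetical:

```python
import heapq

class PriorityReadQueue:
    """Toy two-level read queue: priority 0 = urgent, 1 = bulk.
    Models the 'two or more priority levels' idea, nothing more."""

    def __init__(self):
        self._heap = []
        self._seq = 0            # tie-breaker keeps FIFO order within a level

    def submit(self, priority, request):
        heapq.heappush(self._heap, (priority, self._seq, request))
        self._seq += 1

    def next_request(self):
        return heapq.heappop(self._heap)[2]

q = PriorityReadQueue()
q.submit(1, "texture stream chunk")   # bulk
q.submit(0, "audio chunk")            # urgent: must not starve behind bulk
q.submit(1, "geometry chunk")
order = [q.next_request() for _ in range(3)]
```

The urgent audio read is serviced first even though it was submitted second, while the two bulk reads keep their submission order — the basic behavior any multi-priority I/O scheme needs.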
Likewise, a similar process would exist for disc-based PS1, 2 and 3 titles, in which PS5's disc drive reads the disc and sends signals containing game data to the adaptor. Once received, the adaptor extracts the game data from the signals and forwards it to its internal flash storage, to be accessed from there and processed according to the legacy console it belongs to.
How sweet would that be?