http://www.eet.com/semi/news/showArticle.jhtml?articleId=54200580
SAN FRANCISCO - The eagerly anticipated Cell processor from IBM, Toshiba and Sony leverages a multicore 64-bit Power architecture with an embedded streaming processor, high-speed I/O, SRAM and dynamic multiplier in an effort, the partners hope, to revolutionize distributed computing architectures.
Although the technical aspects of the design, which has been in the works for nearly four years, are tightly held, details are emerging in excerpts from papers to be released today for the 2005 International Solid-State Circuits Conference (see story, page 94), as well as in patent filings.
The highly integrated Cell device has been billed as a beefy engine for Sony's PlayStation 3, due to be demonstrated in May. But the architecture also addresses many other applications, including set-top boxes and mobile communications. Workstations fitted with the Cell architecture, a $2 billion endeavor, are already in the hands of game developers.
Five ISSCC papers from members of the 400-strong Cell processor team (see related story, "Best Development Teams," page 64) open peepholes onto a highly modular and hierarchical first-generation device implemented in 90-nanometer silicon-on-insulator (SOI) technology.
At root, the Cell architecture rests on two concepts: the "apulet," a bundle comprising a data object and the code necessary to perform an action upon it; and the "processing element," a hierarchical bundle of control and streaming processor resources that can execute any apulet at any time.
The apulets appear to be completely portable among the processing elements in a system, so that tasks can be doled out dynamically by assigning a waiting apulet to an available processing element. Scalability can be achieved by adding processing elements.
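The dispatch model described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Cell software: the `Apulet` class and `dispatch` function are hypothetical names, and processing elements are modeled only as integer ids that run each apulet to completion, one at a time. What it shows is that, because any element can execute any apulet, the scheduler never has to match a task to a particular element; it simply takes whichever one is idle.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Apulet:
    # Hypothetical model: a data object bundled with the code that acts on it.
    data: Any
    code: Callable[[Any], Any]


def dispatch(apulets: List[Apulet], num_elements: int) -> List[Tuple[int, Any]]:
    """Hand each waiting apulet to the next idle processing element.

    Returns (element_id, result) pairs in dispatch order. Elements are
    hypothetical integer ids; each runs its apulet synchronously.
    """
    if num_elements < 1:
        raise ValueError("need at least one processing element")
    waiting = deque(apulets)
    idle = deque(range(num_elements))  # ids of idle processing elements
    results = []
    while waiting:
        pe_id = idle.popleft()          # any idle element will do
        apulet = waiting.popleft()      # oldest waiting apulet
        results.append((pe_id, apulet.code(apulet.data)))
        idle.append(pe_id)              # element becomes idle again when done
    return results


jobs = [Apulet(data=n, code=lambda x: x * x) for n in range(5)]
print(dispatch(jobs, num_elements=2))
# [(0, 0), (1, 1), (0, 4), (1, 9), (0, 16)]
```

Because this toy runs apulets sequentially, adding elements only changes which id handles each task; in hardware the elements would run concurrently, which is where the scalability the article describes would come from.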
These ideas are not easily achieved. According to data from Paul Zimmons, a PhD graduate in computer science from the University of North Carolina at Chapel Hill, they require a highly intelligent way of dividing memory into protected regions called "bricks," careful attention to memory bandwidth and local storage, and massive bandwidth between processing elements, even those lying on separate chips.
At the top level, the architecture appears to be a pool of "cells," or clusters of perhaps four identical processing elements. All of the cells in a system (or, for that matter, in a network of systems) are apparently peers. According to one of the ISSCC papers on the Cell design, a single chip implements a single processing element. The initial chips are being built in 90-nm SOI technology, with 65-nm devices reportedly sampling.
Each processing element comprises a Power-architecture 64-bit RISC CPU, a highly sophisticated direct-memory access controller and up to eight identical streaming processors. The Power CPU, DMA engine and streaming processors all reside on a very fast local bus. And each processing element is connected to its neighbors in the cell by high-speed "highways." Designed by Rambus Inc. with a team from Stanford University, these highways, or parallel bundles of serial I/O links, operate at 6.4 GHz per link. One of the ISSCC papers describes the link characteristics, as well as the difficulties of developing high-speed analog transceiver circuits in SOI technology.
The streaming processors, described in another paper, are self-contained SIMD units that operate autonomously once they are launched.
They include a 128-kbyte local pipelined SRAM that sits between the streaming processor and the local bus, a 128-entry bank of 128-bit registers, and a bank of four floating-point and four integer execution units, which appear to operate in single-instruction, multiple-data mode from one instruction stream. Software controls data and instruction flow through the processor.
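To illustrate single-instruction, multiple-data operation on 128-bit registers, the Python sketch below treats each register as a 16-byte value holding four 32-bit floats and applies a single add across all four lanes at once. The four-lane, single-precision layout is an assumption (128 bits divided into 32-bit operands, which would match the four floating-point units); the helper names `pack`, `unpack` and `simd_add` are hypothetical.

```python
import struct

LANES = 4  # assumption: four 32-bit single-precision lanes per 128-bit register


def pack(values):
    """Pack four floats into a 16-byte (128-bit) register image."""
    return struct.pack("<4f", *values)


def unpack(reg):
    """Recover the four 32-bit lanes from a 16-byte register image."""
    return list(struct.unpack("<4f", reg))


def simd_add(a, b):
    """One 'instruction' applied lane by lane to two 128-bit registers."""
    return pack([x + y for x, y in zip(unpack(a), unpack(b))])


# A register file of 128 registers, each 128 bits (16 bytes) wide.
registers = [pack([0.0] * LANES) for _ in range(128)]
registers[0] = pack([1.0, 2.0, 3.0, 4.0])
registers[1] = pack([10.0, 20.0, 30.0, 40.0])
registers[2] = simd_add(registers[0], registers[1])
print(unpack(registers[2]))  # [11.0, 22.0, 33.0, 44.0]
```

In real hardware the four lanes would be computed in parallel by separate execution units; the list comprehension here only models the result.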
Another ISSCC paper describes a dynamic Booth double-precision multiplier designed in 90-nm SOI technology.
Performance estimates
The processing element's DMA controller appears to be designed so that any chip in a system can access any bank of DRAM in the cell through a band-switching arrangement. This would make all the processing resources appear to be a single pool under control of the system software.
Giving scale to the performance targets for the project, one of the ISSCC papers puts the speed of the streaming-processor SRAM at 4.8 GHz. That figure presumably reflects the rate at which 128-bit words move across the local bus within the processing element. When the Cell alliance was announced in 2001, Sony Computer Entertainment CEO Ken Kutaragi estimated the performance of each Cell processor (a collection of apparently four processing elements in the first implementation) at 1 teraflops.
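To put those two figures in proportion, here is the arithmetic under two assumptions the papers do not state: that the local bus moves one 128-bit word per 4.8-GHz cycle, and that the 1-teraflops estimate divides evenly among four processing elements.

```latex
\[
4.8\ \text{GHz} \times 128\ \text{bits} = 614.4\ \text{Gbit/s} = 76.8\ \text{Gbyte/s per local bus}
\]
\[
\frac{1\ \text{Tflops}}{4\ \text{processing elements}} = 250\ \text{Gflops per processing element}
\]
```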
But UNC's Zimmons has his doubts. "I believe that while theoretically having a large number of transistors enables teraflops-class performance, the PS3 [PlayStation 3] will not be able to deliver this kind of power to the consumer," he wrote in response to an e-mail query from EE Times. "The PS3 memory is rumored to be able to transfer around 100 Gbytes/second, which would mean it could process new data at roughly 25 Gflops (at 32 bits), far from the 1-Tflops number."
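Zimmons's 25-Gflops figure follows from simple division, assuming (as his estimate appears to) that each new 32-bit operand fetched from memory feeds one floating-point operation:

```latex
\[
\frac{100\ \text{Gbyte/s}}{4\ \text{bytes per 32-bit operand}} = 25 \times 10^{9}\ \text{operands/s} \approx 25\ \text{Gflops}
\]
\[
\frac{1\ \text{Tflops}}{25\ \text{Gflops}} = 40
\]
```

On that reading, memory bandwidth alone would cap throughput on fresh data at about one-fortieth of the quoted peak; work that repeatedly reuses data already held in the streaming processors' local SRAM would not be bounded in the same way.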
Sony's 300-mm fab at Nagasaki, Japan, will run the 65-nm process and IBM Corp.'s fab in East Fishkill, N.Y., the SOI line.