Official *** CELL processor announcements **** Thread

I know there's another thread, but it's devolved and I thought this one might provide more focus. If not, a mod can merge it with whichever of the other threads they feel is most appropriate.

Cell Processor Uses Rambus High Speed Interface Solutions

Monday February 7, 1:01 pm ET
XDR(R) DRAM and Redwood FlexIO(TM) Processor Bus Provide Unprecedented Bandwidth for Next-Generation Computer and Consumer Applications

SAN FRANCISCO--(BUSINESS WIRE)--Feb. 7, 2005-- Rambus Inc. (Nasdaq:RMBS - News), a leading developer of chip interface products and services, today revealed that the Cell processor incorporates Rambus's XDR memory and FlexIO(TM) processor bus interface solutions. Cell is the highly-anticipated advanced microprocessor developed by Sony Corporation, Sony Computer Entertainment, Toshiba Corporation and IBM. The memory and processor bus interfaces designed by Rambus account for 90% of the Cell processor signal pins, providing an unprecedented aggregate processor I/O bandwidth of approximately 100 gigabytes-per-second.

Rambus is scheduled to discuss the Cell interface clocking and circuit design at the International Solid State Circuits Society conference in San Francisco on February 9, 2005.

"The Cell processor, that has overwhelming computational power, demands another overwhelming data transfer capability between Cell and main memory system, and Input/Output systems. Rambus, underpinned by its expertise in latest memory technology, provided us with a clear solution that was absolutely the best match to Cell," said Ken Kutaragi, executive deputy president and COO, Sony Corporation, and president and Group CEO, Sony Computer Entertainment Inc. "I respect Rambus and all our team members that collaborated together for completing this challenging work with all the technology and enthusiasm they possess."

"We have been busy working with the Sony Group and Toshiba on the development of the Cell processor for the past couple of years and we're excited to see this advanced engineering effort become a reality," said Harold Hughes, chief executive officer at Rambus. "Our engineering teams have not only designed and developed the world's fastest memory and logic interfaces but we continue to help our customers integrate various system components which enable them to bring high-performance, high-value products to the market."

The Rambus XDR memory interface, capable of data rates of 3.2GHz to 8.0GHz, achieves data rate speeds that are an order of magnitude higher than today's mainstream PC memory systems while utilizing fewer DRAMs and fewer controller pins. FlexIO processor buses, formerly codenamed Redwood, are capable of running up to 6.4GHz data rates providing bandwidth more than four times faster than best-of-class processor buses available today. All Rambus high-speed interfaces are developed as complete solutions for high-volume, low-cost systems.

Sony and Toshiba signed a licensing agreement with Rambus in January 2003. Since then the engineering teams have worked closely to design and develop the high-bandwidth interface solutions necessary for next-generation computing and consumer devices.
 
The memory and processor bus interfaces designed by Rambus account for 90% of the Cell processor signal pins, providing an unprecedented aggregate processor I/O bandwidth of approximately 100 gigabytes-per-second.
 
RDRAM has always been very fast, but its popularity in PCs faded because it was so pricey compared to DRAM.
 
Maybe there was a leak, but I'm pretty sure we already knew the Rambus information didn't we? There have already been people talking about the max theoretical performance considering the memory speed.
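For anyone who wants to redo the theoretical-bandwidth math, here is a rough sketch. The 3.2 and 8.0 figures are the XDR data-rate range quoted in the release, read here as Gbit/s per pin. The channel count and the 32-bit channel width are assumptions for illustration only; the release does not state them, and the helper name is made up.

```python
def peak_bandwidth_gb_s(gbit_per_pin: float, data_pins: int) -> float:
    """Peak bandwidth in gigabytes per second for a parallel interface."""
    return gbit_per_pin * data_pins / 8  # 8 bits per byte

# Assumed configuration (NOT stated in the press release):
XDR_CHANNELS = 2        # assumption
PINS_PER_CHANNEL = 32   # assumption

total_pins = XDR_CHANNELS * PINS_PER_CHANNEL
xdr_low = peak_bandwidth_gb_s(3.2, total_pins)   # low end of the quoted range
xdr_high = peak_bandwidth_gb_s(8.0, total_pins)  # high end of the quoted range
print(f"XDR peak: {xdr_low:.1f} to {xdr_high:.1f} GB/s")
```

If those assumptions hold, memory alone accounts for only part of the roughly 100 GB/s aggregate quoted above, so the rest would have to come from the FlexIO side.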
 
McFly said:

Might as well put the whole thing here and save a click ;)

IBM, Sony, Sony Computer Entertainment Inc. and Toshiba Disclose Key Details of the Cell Chip
Monday February 7, 1:00 pm ET
Innovative Design Features Eight Synergistic Cores Together with Power Based Core, Delivers More Than 10 Times the Performance of the Latest PC Processors

SAN FRANCISCO--(BUSINESS WIRE)--Feb. 7, 2005-- At the International Solid State Circuits Conference (ISSCC) today, IBM, Sony Corporation, Sony Computer Entertainment Inc. (Sony and Sony Computer Entertainment collectively referred to as Sony Group) and Toshiba Corporation (Toshiba) for the first time disclosed in detail the breakthrough multi-core architectural design - featuring supercomputer-like floating point performance with observed clock speeds greater than 4 GHz - of their jointly developed microprocessor code-named Cell.

Source: IBM



A team of IBM, Sony Group and Toshiba engineers has collaborated on development of the Cell microprocessor at a joint design center established in Austin, Texas, since March 2001. The prototype chip is 221 mm², integrates 234 million transistors, and is fabricated with 90 nanometer SOI technology.

Cell's breakthrough multi-core architecture and ultra high-speed communications capabilities deliver vastly improved, real-time response for entertainment and rich media applications, in many cases 10 times the performance of the latest PC processors.

Effectively a "supercomputer on a chip" incorporating advanced multi-processing technologies used in IBM's sophisticated servers, Sony Group's computer entertainment systems and Toshiba's advanced semiconductor technology, Cell will become the broadband processor used for industrial applications to the new digital home.

Another advantage of Cell is to support multiple operating systems, such as conventional operating systems (including Linux), real-time operating systems for computer entertainment and consumer electronics applications as well as guest operating systems for specific applications, simultaneously.

Initial production of Cell microprocessors is expected to begin at IBM's 300mm wafer fabrication facility in East Fishkill, N.Y., followed by Sony Group's Nagasaki Fab, this year. IBM, Sony Group and Toshiba expect to promote Cell-based products including a broad range of industry-wide applications, from digital televisions to home servers to supercomputers.

Among the highlights of Cell released today:

* Cell is a breakthrough architectural design -- featuring eight synergistic processors and top clock speeds of greater than 4 GHz (as measured during initial hardware testing)
* Cell is a multicore chip capable of massive floating point processing
* Cell is OS neutral and supports multiple operating systems simultaneously

"Today's disclosure of the Cell chip's breakthrough architectural design is a significant milestone in an ambitious project that began four years ago with the creation of the IBM, Sony and Toshiba design lab in Austin, Texas," said William Zeitler, senior vice president and group executive, IBM Systems and Technology Group. "Today we see the tangible results of our collaboration: an open, multi-core, microprocessor that portends a new era in graphics and multi-media performance."

"Today, we are very proud to share with you the first development of the Cell project, initiated with aspirations by the joint team of IBM, Sony Group and Toshiba in March 2001," said Ken Kutaragi, executive deputy president and COO, Sony Corporation, and president and Group CEO, Sony Computer Entertainment Inc. "With Cell opening a doorway, a new chapter in computer science is about to begin."

"We are proud that Cell, a revolutionary microprocessor with a brand new architecture that leapfrogs the performance of existing processors, has been created through a perfect synergy of IBM, Sony Group and Toshiba's capabilities and talented resources," said Masashi Muromachi, corporate vice president of Toshiba Corporation and president & CEO of Toshiba's Semiconductor Company. "We are confident that Cell will provide major momentum for the progress of digital convergence, as a core device sustaining a whole spectrum of advanced information-rich broadband applications, from consumer electronics, home entertainment through various industrial systems."
 
gofreak said:
Is this thread just for press releases, or will we merge all Cell info into here?

I think anything official can certainly go in here, but stuff that's a year or even six months old will either be confirmed or trumped presumably.
 
From a B3D member who is at the conference:

Update from San Francisco :D

First post is on SPU, next will be on overall CELL.

Presentations haven't happened yet, but here is some stuff from the conference proceedings (which anyone can buy as of this morning):

On the SPU paper they can't seem to make up their mind on the name. It's called an SPU (streaming processor unit) and also an SPE (synergistic processor element), and then in the overall CELL paper the 8 little boxes in the block diagram are labelled SXU. Seriously, I didn't make that second one up. The last one I think actually refers to the interconnect mechanism to the rest of the chip.

The core area of one SPU/SPE (of which there are 8 on the chip) is 2.5 x 5.81 mm in 90nm.

Each SPU has 256KB local SRAM which is not part of system address space (referred to as "untranslated, unguarded and non-coherent"). There is a DMA unit per SPU to manage background transfers to/from system memory space (with MMU). There can be up to 16 pending DMA requests, each of up to 16KB.

Each SPU has 128 128-bit registers. The text says there are both seven and eight execution units per SPU (doesn't anyone proofread their papers anymore? :) ). There are fixed and floating point units, permute, some other stuff. Ask if you want details.

All data fetch and branch prediction is managed in software, i.e. you have to explicitly prefetch what you want when you want it, and for branches it mentions that "efficient S/W" manages branches by replacing branches with bitwise select instructions, arranging common case code to be inline, and inserting branch hint instructions.

They claim the SPU/SPE is programmable in C/C++ with intrinsics.

Clock rate ranges from 2-5 GHz over a voltage range 0.9-1.3v with power ranging from 1-11W.

Fredi
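To make the "replace branches with bitwise select" point concrete, here is a minimal sketch in plain Python that emulates unsigned 32-bit lanes. The function names are hypothetical and are not SPU intrinsics; the idea is only that a compare produces an all-ones or all-zeros mask, and the mask then picks bits from one of two inputs without a jump.

```python
MASK32 = 0xFFFFFFFF  # emulate unsigned 32-bit lanes with Python ints

def compare_gt(a: int, b: int) -> int:
    """All-ones mask if a > b, else all zeros (like a SIMD compare result)."""
    return -(a > b) & MASK32  # -(True) is -1, i.e. all bits set

def select_bits(if_clear: int, if_set: int, mask: int) -> int:
    """Take bits from if_set where mask is 1, from if_clear where mask is 0."""
    return ((if_clear & ~mask) | (if_set & mask)) & MASK32

def branchy_max(a: int, b: int) -> int:
    """Conventional version with a data-dependent branch."""
    if a > b:
        return a
    return b

def branchless_max(a: int, b: int) -> int:
    """Same result, computed with a compare mask and a bitwise select."""
    return select_bits(b, a, compare_gt(a, b))

# Apply the select lane by lane, as a 4-wide vector unit would do at once.
lanes_a = [1, 900, 7, 0xFFFFFFFF]
lanes_b = [5, 20, 7, 0]
result = [branchless_max(x, y) for x, y in zip(lanes_a, lanes_b)]
print(result)
```

Both paths are always computed and one is thrown away, which is the trade being made: a little wasted work in exchange for no mispredicted branch.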
 
All data fetch and branch prediction is managed in software, i.e. you have to explicitly prefetch what you want when you want it, and for branches it mentions that "efficient S/W" manages branches by replacing branches with bitwise select instructions, arranging common case code to be inline, and inserting branch hint instructions.
Is this good or bad?
 
Second part:

Ok, now for the CELL itself. All of the following info is from now publicly available conference proceedings:

- 8 SPU's/SPE's
- 1 64-bit PPU, dual threaded Power microprocessor (also referred to as PPE for power processor element and also a PXU because they can't seem to make up their minds what they want to call anything)
- Dual XDR channels for memory
- whole chip is 234M transistors, 200+ mm2 in 90nm

PPU/PPE/etc. has L1 and L2 of unknown size. L2 looks really big though (I'm guessing it's 512KB, but it might be 1MB).

I'll try to describe the block diagram in words. Use your imagination :)

There is an interconnect block called Element Interconnect Bus (EIB). On one side of this are the 8 SPU's hanging off it, each through their own load store/DMA unit. On the other side is the PPU (with its L1 and L2), dual XDR connection, and two non-coherent I/O interfaces (not sure what they do exactly).

The 8 SPU's can have a total of 128 DMA transfers outstanding.

256Gflops total (no, not 1Tflops). Peak number of course (i.e. take clock rate and multiply by # of floating point units).

Fredi
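The peak number and the DMA total above are simple products, sketched here. The SPU count and the 16 pending requests per SPU come from the posts; the 4 GHz clock and the 8 flops per cycle per SPU (a 4-wide single-precision multiply-add counted as two operations per lane) are assumptions chosen because they reproduce the 256 figure, not confirmed numbers.

```python
SPU_COUNT = 8              # from the proceedings summary
DMA_PENDING_PER_SPU = 16   # from the SPU post

CLOCK_GHZ = 4.0            # assumption: the ~4 GHz figure in the press release
FLOPS_PER_CYCLE = 8        # assumption: 4 lanes x multiply-add (2 flops each)

# Peak = units x clock x flops per unit per cycle; no real code sustains this.
peak_gflops = SPU_COUNT * CLOCK_GHZ * FLOPS_PER_CYCLE

# Outstanding DMA transfers across the whole chip.
total_dma_outstanding = SPU_COUNT * DMA_PENDING_PER_SPU

print(f"{peak_gflops:.0f} GFLOPS peak, {total_dma_outstanding} DMA transfers outstanding")
```

Change CLOCK_GHZ to see how directly the headline number tracks whatever clock the final part ships at.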
 
cellpic_1_.jpg
 
Ouch, you are right. For some reason I thought they meant 221mm x 221mm. LOL, my bad.
Don't worry, I almost typed as a reply that 221mm2 is around 5cm x 4cm, before I realized how wrong that sounds :PP
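For a sense of scale, a quick sketch of the die-area arithmetic. The 221 mm² die and the 2.5 x 5.81 mm SPU dimensions are the figures quoted earlier in the thread; treating the die as a perfect square is an assumption made only to get an intuitive side length.

```python
import math

DIE_AREA_MM2 = 221.0             # from the press release
SPU_W_MM, SPU_H_MM = 2.5, 5.81   # from the conference notes
SPU_COUNT = 8

# Assumption: a square die, just to get a feel for the size.
side_mm = math.sqrt(DIE_AREA_MM2)

spu_area_mm2 = SPU_W_MM * SPU_H_MM
spu_fraction = SPU_COUNT * spu_area_mm2 / DIE_AREA_MM2

print(f"square die side: {side_mm:.1f} mm")
print(f"one SPU: {spu_area_mm2:.2f} mm2, all 8: {spu_fraction:.0%} of the die")
```

So the chip is roughly a 1.5 cm square, and on these numbers the eight SPUs take up a bit over half of it.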
 
soundwave05 said:
So are these preliminary announcements more or less what people expected?

Nothing new really.
All as expected.
Now they have to confirm the 8MB*8 APUs embedded memory and show us some benchmarks.
A 3D demo would be cool, but I guess I'm asking too much at the moment.
 
soundwave05 said:
So are these preliminary announcements more or less what people expected?

It's about 4 times as much as the Xbox fanboys predicted and about 4 times less than the Sony fanboys predicted. Of course, it's still a possibility that a 2 or even 4 PE chip ends up inside the PS3.

Fredi
 
I will get back to the above post in a min...

from Beyond3D:

Update from San Francisco

First post is on SPU, next will be on overall CELL.

Presentations haven't happened yet, but here is some stuff from the conference proceedings (which anyone can buy as of this morning)

On the SPU paper they can't seem to make up their mind on the name. It's called an SPU (streaming processor unit) and also an SPE (synergistic processor element), and then in the overall CELL paper the 8 little boxes in the block diagram are labelled SXU. Seriously, I didn't make that second one up. The last one I think actually refers to the interconnect mechanism to the rest of the chip.

The core area of one SPU/SPE (of which there are 8 on the chip) is 2.5 x 5.81 mm in 90nm.

Each SPU has 256KB local SRAM which is not part of system address space (referred to as "untranslated, unguarded and non-coherent"). There is a DMA unit per SPU to manage background transfers to/from system memory space (with MMU). There can be up to 16 pending DMA requests, each of up to 16KB.

Each SPU has 128 128-bit registers. The text says there are both seven and eight execution units per SPU (doesn't anyone proofread their papers anymore?). There are fixed and floating point units, permute, some other stuff. Ask if you want details.

All data fetch and branch prediction is managed in software, i.e. you have to explicitly prefetch what you want when you want it, and for branches it mentions that "efficient S/W" manages branches by replacing branches with bitwise select instructions, arranging common case code to be inline, and inserting branch hint instructions.

They claim the SPU/SPE is programmable in C/C++ with intrinsics.

Clock rate ranges from 2-5 GHz over a voltage range 0.9-1.3v with power ranging from 1-11W.

Ok, now for the CELL itself. All of the following info is from now publicly available conference proceedings:

- 8 SPU's/SPE's
- 1 64-bit PPU, dual threaded Power microprocessor (also referred to as PPE for power processor element and also a PXU because they can't seem to make up their minds what they want to call anything)
- Dual XDR channels for memory
- whole chip is 234M transistors, 200+ mm2 in 90nm

PPU/PPE/etc. has L1 and L2 of unknown size. L2 looks really big though (I'm guessing it's 512KB, but it might be 1MB).

I'll try to describe the block diagram in words. Use your imagination

There is an interconnect block called Element Interconnect Bus (EIB). On one side of this are the 8 SPU's hanging off it, each through their own load store/DMA unit. On the other side is the PPU (with its L1 and L2), dual XDR connection, and two non-coherent I/O interfaces (not sure what they do exactly).

The 8 SPU's can have a total of 128 DMA transfers outstanding.

256Gflops total (no, not 1Tflops). Peak number of course (i.e. take clock rate and multiply by # of floating point units).
 
Is this good or bad?
It's what everyone assumed it would be since the first time we saw the patents. So neither good nor bad really.


[Old Skool]256Gflops TEH UNDERPOWERED[/Old Skool]
But but but... Deadmeat said it was 64GFlops at 1GHz.
 
*lends kleegamefan his bi-focals, and points up the thread* :P

Getting slow, old man ;)
 
The FlexIO technology will be used to connect the various chips on a Cell-based motherboard, according to Rich Warmke, marketing director of the memory interface division at Rambus. A multicore Cell processor, by contrast, will use its own internal bus to connect multiple cores. However, 90 percent of the Cell's external pins are connected to either the FlexIO or XDR interfaces, evidence that the Cell's design emphasizes moving application and/or 3D scene data around within main memory, Warmke said.

http://www.extremetech.com/article2/0,1558,1761407,00.asp

So a multi PE chip is planned at least.

Fredi
 
Excellent to see the patents so thoroughly realised (as far as I can see). The local SRAM per APU seems to have even got an upgrade.

Sigh of relief all round? :) I'm so glad I no longer have to say "well, if there's 8 APUs, and if it's clocked at 4GHz, and if there's an 8-flop instruction in there...." ;)
 
Is Deadmeat still alive?

Alive? Unknown, but his rotting corpse stinks up Opa Ages with its bitterness and bile too often.

(After all, he has been banned from just about everywhere else around the net, yet still gets mentioned everywhere he's left his past detritus; that thing's just gotta be undead.)
 
Deadmeat is still alive and well at the other "Age" forum. He predicted 72 GFlops for Cell, with 1.15 GHz clock speed. I don't think he has posted a reaction to these announcements though.
 
border said:
Deadmeat is still alive and well at the other "Age" forum. He predicted 72 GFlops for Cell, with 1.15 GHz clock speed. I don't think he has posted a reaction to these announcements though.

Undoubtedly he'll be able to explain it all away..;)
 
McFly said:
It's about 4 times as much as the Xbox fanboys predicted and about 4 times less than the Sony fanboys predicted. Of course, it's still a possibility that a 2 or even 4 PE chip ends up inside the PS3.

Fredi


Heh, the truth is always somewhere in the middle.
 
http://www.eet.com/semi/news/showArticle.jhtml?articleID=59301581

A key to the architecture is the so-called Synergistic Processor Element (SPE), which is a SIMD-based technology. "The SPE can issue up to two instructions per cycle to seven execution units organized in two execution pipelines," according to a paper presented by the developers of "Cell."

It is said to support multiple operating systems, such as Linux, real-time operating systems and guest operating systems for specific applications simultaneously.

Initial production of "Cell" microprocessors is expected to begin at IBM's 300-mm wafer fabrication facility in East Fishkill, N.Y., followed by Sony's Nagasaki Fab, this year.
 
Haha, Intel must go crazy after this news. They released additional details on its Montecito chip today at the conference. 1.72 Billion transistors at 90 nm.

A one PE Cell chip has only 234M transistors and is much faster than that Intel chip.

So what about a 6PE Cell chip in PS3? Yeah, not exactly a fair comparison, as you have to look at what is memory and what is logic, but still funny enough. :D

Fredi
 
Gotta feel for Deadmeat.

Not only did the Dreamcast die (and thus the DC2 as well), but Sega brought Virtua Fighter 4 to PS2 and PS2-only :lol :lol :lol
 
Fafalada said:
It's what everyone assumed it would be since the first time we saw the patents. So neither good nor bad really.
So hitting ambitious performance targets one sets for oneself is neither good nor bad? Damn, Faf, what does it take? ;)
 
Kaching said:
So hitting ambitious performance targets one sets for oneself is neither good nor bad? Damn, Faf, what does it take? ;)
You should read the question I was replying to ;) It was in regards to how the S|APUs execute code (in-order execution, no auto prefetch mechanism), not performance.

The performance numbers we're seeing are definitely good :)
 