How is the Cell Processor Different to the Xbox 360 Xenon CPU?

playXray

Member
Please excuse the ignorance of this question but until recently I thought the Cell was a bespoke CPU and the Xbox 360 CPU was an off-the-shelf PowerPC chip.

From what I've read, they are actually both kinda related, and the PPE architecture that the Cell is based on is actually used in the Xenon CPU too. I don't really understand any more than that however.

I always thought they were as different as an ARM CPU is from an x86 CPU - is that true (if it's an appropriate analogy)?
 
You're a few years late to this party, no?

A very basic distinction would be that the Xbox 360 CPU is 3 PowerPC cores and little else, while the Cell is 1 of those PPC cores and 7 smaller, specialized cores that aren't PowerPC but something custom.
 
IMO ARM and x86 are not that different from a programming perspective. Cell is different because it had multiple SPUs (think of them as small computers in their own right, each with its own RAM) to do work. You need to write a specific SPU program to run on each of them and communicate with the main CPU for I/O. Devs struggled to utilize the SPUs because you have to explicitly find ways to use them, and there are a lot more restrictions compared to plain code on the CPU. This architecture was unique in the 7th gen, but the PS2 actually had a similar structure with the VU1 and IOP coprocessors, so you can see the continuation in the design.
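Roughly, the model looks like this. This is only a minimal sketch: the dma_get()/dma_put() helpers are made-up stand-ins for the Cell SDK's real DMA intrinsics, implemented here as plain memcpy so the snippet runs on an ordinary PC.

Code:
#include <string.h>

/* Hedged sketch only. dma_get()/dma_put() are made-up stand-ins for the
 * Cell SDK's real DMA intrinsics; here they are plain memcpy so the sketch
 * runs anywhere. On a real SPU, main memory is never dereferenced directly;
 * every access is an explicit, sized transfer like these. */
static void dma_get(void *local, const void *main_mem, size_t size) { memcpy(local, main_mem, size); }
static void dma_put(void *main_mem, const void *local, size_t size) { memcpy(main_mem, local, size); }

#define CHUNK 1024                                /* floats we can afford to stage locally */

static float local_in[CHUNK], local_out[CHUNK];   /* the SPU's "own RAM" (local store) */

/* One SPU-style job: scale an array that lives in main memory by k. */
void spu_style_scale(const float *src, float *dst, unsigned n, float k)
{
    for (unsigned done = 0; done < n; ) {
        unsigned count = (n - done > CHUNK) ? CHUNK : (n - done);

        dma_get(local_in, src + done, count * sizeof(float));   /* pull a chunk in   */
        for (unsigned i = 0; i < count; i++)                    /* compute locally   */
            local_out[i] = local_in[i] * k;
        dma_put(dst + done, local_out, count * sizeof(float));  /* push results back */

        done += count;
    }
}

The point is that the SPU program never just dereferences main memory; every access is a transfer you have to schedule yourself.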
 
You're a few years late to this party, no?

A very basic distinction would be that the Xbox 360 CPU is 3 PowerPC cores and little else, while the Cell is 1 of those PPC cores and 7 smaller, specialized cores that aren't PowerPC but something custom.

That's a good summary, thank you.
 
I can speak more about Cell than I can about Xenon. I was starting my PhD in computer architecture at the time the academic papers on Cell were being published, so it was a topic of conversation.

It's true that Cell used the PPC ISA, but it was a fairly novel architecture compared to more traditional PPC processors. I would say the difference between Cell and other Power processors is actually greater than the difference between a small X86 (like Atom, for instance) and ARM processors.

The main thing to note about Cell was the "Synergistic Processing Elements," or SPEs. The SPEs were more like DSP cores / accelerators / mini-GPUs. They talked to the CPU over a separate bus using DMA, and they had software-controlled scratchpad memories instead of caches. This leads to a very different programming model than you see with cores that have hardware-managed, coherent L1 caches, and it requires more work from programmers to optimize.
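To make the "more work from programmers" point concrete, here's a rough sketch. Assumptions: the dma_get_async()/dma_wait() helpers are placeholders for the real asynchronous, tag-based MFC intrinsics (implemented synchronously here so the code runs anywhere), and n is a multiple of CHUNK for clarity. With a hardware-managed cache you just walk the array; with a software-managed scratchpad you stage data yourself, and to get performance you typically double-buffer so the next chunk's transfer overlaps the current chunk's compute.

Code:
#include <string.h>

/* Placeholders for the real asynchronous, tag-based MFC DMA intrinsics in
 * the Cell SDK; synchronous memcpy/no-op here so the sketch runs anywhere. */
static void dma_get_async(void *local, const float *src, unsigned count, int tag)
{ (void)tag; memcpy(local, src, count * sizeof(float)); }
static void dma_wait(int tag) { (void)tag; }

/* Cache-coherent core: the hardware stages the data for you. */
float sum_cached(const float *a, unsigned n)
{
    float s = 0.0f;
    for (unsigned i = 0; i < n; i++)
        s += a[i];                      /* caches + prefetcher hide main memory */
    return s;
}

/* SPE-style core: you stage the data yourself and double-buffer so the fetch
 * of chunk c+1 overlaps the compute on chunk c. Assume n % CHUNK == 0. */
#define CHUNK 1024
static float buf[2][CHUNK];             /* two local-store buffers */

float sum_spe_style(const float *a, unsigned n)
{
    float s = 0.0f;
    unsigned chunks = n / CHUNK;

    dma_get_async(buf[0], a, CHUNK, 0);              /* prime: start fetching chunk 0 */
    for (unsigned c = 0; c < chunks; c++) {
        unsigned cur = c & 1, next = cur ^ 1;
        if (c + 1 < chunks)                          /* kick off the next transfer ... */
            dma_get_async(buf[next], a + (c + 1) * CHUNK, CHUNK, next);

        dma_wait(cur);                               /* ... then finish and use chunk c */
        for (unsigned i = 0; i < CHUNK; i++)
            s += buf[cur][i];
    }
    return s;
}

On real hardware the payoff is that the DMA engine runs in parallel with the SPU; the cost is that every hot loop in your engine ends up restructured this way.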
 
Please excuse the ignorance of this question but until recently I thought the Cell was a bespoke CPU and the Xbox 360 CPU was an off-the-shelf PowerPC chip.

From what I've read, they are actually both kinda related, and the PPE architecture that the Cell is based on is actually used in the Xenon CPU too. I don't really understand any more than that however.

I always thought they were as different as an ARM CPU is from an x86 CPU - is that true (if it's an appropriate analogy)?

Long story short: IBM is what ties the two chips together. Sony, IBM, and Toshiba began the Cell design. When MS went looking for a chip, IBM showed them a version of the Cell, and MS specified some customizations, some of which even made it into the final Cell. There were teams working on both chips in the same building at the same time, which was known only to IBM senior designers and IBM management.
 
Ah... Technology of an age when new games were being developed... Now.. It is nothing more than re-makes and games that are half finished.
 
Cell: 1 PPU, the most traditional core, plus seven enabled SPUs (one reserved for the OS, so 6 for games). The SPUs are cores with a lot of traditional stuff pulled out (no prefetching, no cache, just a 256KB local memory to manage manually instead), but a lot of SIMD power for the time (single instruction, multiple data, aka large number crunching).

Xenon: Basically 3 PPUs, but with more vector registers than the PPU in Cell had. Much more traditional. Half the SIMD power on paper, but easier to tap into, since the traditional cores take care of a lot for the programmer.



The Cell was essentially an early attempt at being decent at what GPUs are very good at today: lots of number crunching in a "straight line," so to speak. Branchy, unpredictable work is where both suffer. And GPUs are several orders of magnitude faster than the Cell today, so we don't really need or want a CPU like that again. Sony bet on the CPU at a time when more and more work would eventually shift to the GPU.


To get more complicated than that it's a heck of a lot of reading.


http://www.redgamingtech.com/sony-playstation-3-post-mortem-part-1-the-cell-processor/
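To make the "straight line" vs. branchy distinction above concrete, a quick illustrative sketch (plain C, nothing Cell-specific, names made up): the first function is the kind of uniform array math that SIMD units and the SPUs eat up, while the second is the kind of data-dependent, pointer-chasing code where they (and GPUs) suffer compared to a traditional out-of-order core.

Code:
/* SIMD-friendly: the same arithmetic applied uniformly to big arrays, with
 * no data-dependent branches. A vector unit (or SPU) can chew through
 * several floats per instruction here. */
void mul_add(float *out, const float *a, const float *b, float k, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        out[i] = a[i] * k + b[i];
}

/* SIMD-hostile: data-dependent branching and pointer chasing. Every step
 * depends on an unpredictable load and branch, which is where the SPUs
 * (and GPUs) struggle next to a fat, branch-predicting CPU core. */
struct node { int value; struct node *left, *right; };

int count_odd(const struct node *t)
{
    if (!t) return 0;
    return (t->value & 1) + count_odd(t->left) + count_odd(t->right);
}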
 
Ah... Technology of an age when new games were being developed... Now.. It is nothing more than re-makes and games that are half finished.

Yup. All those remakes and unfinished games like Nier Automata, Hollow Knight, Nioh, Titanfall 2, and Persona 5.

Nothing ever finished or original. I await this industry finally crashing again like in 1983, with great anticipation.
 
Outside of architecture and structure, the investment Sony was making in the Cell may actually have helped the design of the 360 CPU too.

http://www.eurogamer.net/articles/sony-helped-design-360-processor
The authors of the book, 'The Race For a New Game Machine', are former IBM microchip designers David Shippy and Mickie Phipps. According to the Wall Street Journal's review, they worked on the processors for both machines, and say that IBM's design for core components of the Cell directly influenced the work the computing giant did for Microsoft on the 360's processor.

To add insult to injury, the Microsoft chip was commissioned later and delivered sooner than the Cell. The 360 hit the market first and established a lead over the PS3 that Sony is still struggling to crack. Although the book's authors claim both companies were winners in the end, the Wall Street Journal calls the debacle "one of [Sony's] greatest business failures".
 
A lot of people also attributed the PS3's worse performance in some multiplats to "lazy devs" not taking advantage of the Cell, but people seriously underestimated the explosion in code complexity it created.


Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef TRUE
#define TRUE 1
#endif

/* adjacency list of one vertex: how many neighbors it has and their indices */
typedef struct {
  unsigned  length;
  unsigned *neighbors;
} vertex_t;

/* the graph */
vertex_t * G;

/* number of vertices in the graph */
unsigned card_V;

/* root vertex (where the visit starts) */
unsigned root;

/* defined elsewhere in the original listing: they read the command line
   and fill in G, card_V, and root */
void parse_input( int argc, char** argv );
void graph_load( void );

int main(int argc, char ** argv)
{
  unsigned *Q, *Q_next, *marked;
  unsigned  Q_size=0, Q_next_size=0;
  unsigned  level = 0;

  parse_input(argc, argv);
  graph_load();

  Q      = (unsigned *) calloc(card_V, sizeof(unsigned));
  Q_next = (unsigned *) calloc(card_V, sizeof(unsigned));
  marked = (unsigned *) calloc(card_V, sizeof(unsigned));

  Q[0]    = root;
  Q_size  = 1;
  while (Q_size != 0)
    {
      /* scan all vertices in the current-level queue Q */
      unsigned Q_index;
      for ( Q_index=0; Q_index<Q_size; Q_index++ )
        {
          const unsigned vertex = Q[Q_index];
          const unsigned length = G[vertex].length;
          /* scan each neighbor of this vertex */
          unsigned i;
          for ( i=0; i<length; i++ )
            {
              const unsigned neighbor = G[vertex].neighbors[i];
              if ( !marked[neighbor] )
                {
                  /* mark the neighbor ... */
                  marked[neighbor]      = TRUE;
                  /* ... and enqueue it for the next level */
                  Q_next[Q_next_size++] = neighbor;
                }
            }
        }
      level++;
      /* swap Q and Q_next: the next level becomes the current one */
      unsigned * swap_tmp;
      swap_tmp    = Q;
      Q           = Q_next;
      Q_next      = swap_tmp;
      Q_size      = Q_next_size;
      Q_next_size = 0;
    }
  return 0;
}

60 lines of code on a general processor.

1200 lines of code to port to an SPU. "lazy" = had a budget to stay alive.
 
Yup. All those remakes and unfinished games like Nier Automata, Hollow Knight, Nioh, Titanfall 2, and Persona 5.

Nothing ever finished or original. I await this industry finally crashing again like in 1983, with great anticipation.

Only 5 games? Sad...

Let it burn. Let. It. Burn.
 
and fulfill the promise of 3 HDMI outs, 2 NICs, and ablative armour cases for the PS5.

What would we have even done with all those HDMIs? And two NICs? I've rarely seen PCs with two NICs, let alone needing two for a games console. Crazy days.
 
MS got to market first partially because they had a backup foundry ready to go when IBM's foundry screwed up. Sony did not, and lost more than 6 months.

Apparently, it cost MS a crap load of money though.


Which is also why the PS3, despite launching a year later, had a weaker GPU.

I always wonder what a different universe's 7th gen console looks like, where the interesting Cell is paired with the better GPU, Xenos, and unified memory.
 
A lot of people also attributed the PS3s worse performance in some multiplats to "lazy devs" not taking advantage of the Cell, but people seriously underestimated the explosion in code complexity it created.

60 lines of code on a general processor.

1200 lines of code to port to an SPU. "lazy" = had a budget to stay alive.

Got a source for that? I'm actually curious.
 
A lot of people also attributed the PS3s worse performance in some multiplats to "lazy devs" not taking advantage of the Cell, but people seriously underestimated the explosion in code complexity it created.

<snip>

60 lines of code on a general processor.

1200 lines of code to port to an SPU. "lazy" = had a budget to stay alive.

Seems like devs have eventually had to adapt to GPGPU, though, which has many of the same design patterns that SPEs did.

There is an argument that programmability of GPUs has improved in the last 10 years with better support for caches and coherence, but I think a lot of it comes down to Cell's model being a little too early for its time - that is to say that developers are now more comfortable with this type of programming model and there are better tools / middleware to support it.
 
Got a source for that? I'm actually curious.

http://www.drdobbs.com/parallel/pro...ecode/programming-the-cell-processor/30000173

http://www.drdobbs.com/parallel/programming-the-cell-processor/197801624?pgno=3

To illustrate the peculiarities of Cell programming, we use the Breadth-First Search (BFS) on a graph. Despite its simplicity, this algorithm is important because it is a building block of many applications in computer graphics, artificial intelligence, astrophysics, national security, genomics, robotics, and the like.

Listing One is a minimal BFS implementation in C. Variable G contains the graph in the form of an array of adjacency lists. G.length tells how many neighbors the i-th vertex has, which are in G.neighbors[0], G.neighbors[1], and so on. The vertex from which the visit starts is in variable root. A BFS visit proceeds in levels: First, the root is visited, then its neighbors, then its neighbors' neighbors, and so on. At any time, queue Q contains the vertices to visit in the current level. The algorithm scans every vertex in Q, fetches its neighbors, and adds each neighbor to the list of vertices to visit in the next level, Qnext. To prevent being caught in loops, the algorithm avoids visiting those vertices that have been visited before. To do so, it maintains a marked array of Boolean variables. Neighbors are added to Qnext only when they are not already marked, then they get marked. At the end of each level, Q and Qnext swap, and Qnext is emptied.

On a Pentium 4 HT running at 3.4 GHz, this algorithm is able to check 24-million edges per second. On the Cell, at the end of our optimization, we achieved a performance of 538-million edges per second. This is an impressive result, but came at the price of an explosion in code complexity. While the algorithm in Listing One fits in 60 lines of source code, our final algorithm on the Cell measures 1200 lines of code.


Seems like devs have eventually had to adapt to GPGPU, though, which has many of the same design patterns that SPEs did.

There is an argument that programmability of GPUs has improved in the last 10 years with better support for caches and coherence, but I think a lot of it comes down to Cell's model being a little too early for its time - that is to say that developers are now more comfortable with this type of programming model and there are better tools / middleware to support it.

In addition, the Cell didn't have great SDK support at the start, leaving devs pulling their hair out over a black box. IIRC, a game developer actually built a debugger for the SPUs because Sony hadn't provided one, and once that resource was shared, things got better.

GPU programming today has more robust frameworks that take away some of this complexity from the programmer.
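For what it's worth, the parallel is pretty direct. Here's a small, hedged sketch in OpenCL C (a C dialect; the kernel and argument names are made up, and the work-group size is assumed to be a power of two): a work-group explicitly stages its slice of global memory into fast on-chip __local memory, synchronizes, computes, and writes one result back. Structurally that's very close to an SPE pulling a chunk into its local store over DMA, crunching it, and pushing the result out.

Code:
/* Each work-group computes one partial sum of the input array.
 * The host is assumed to allocate one output float per work-group and to
 * pass the __local buffer size via clSetKernelArg. */
__kernel void partial_sums(__global const float *in,
                           __global float       *out,   /* one result per work-group */
                           __local  float       *tile,  /* on-chip scratch, like a local store */
                           unsigned              n)
{
    unsigned gid = get_global_id(0);
    unsigned lid = get_local_id(0);
    unsigned lsz = get_local_size(0);   /* assumed to be a power of two */

    /* stage: copy my element from global memory into local memory */
    tile[lid] = (gid < n) ? in[gid] : 0.0f;
    barrier(CLK_LOCAL_MEM_FENCE);

    /* compute: tree-reduce within the work-group, entirely in local memory */
    for (unsigned stride = lsz / 2; stride > 0; stride /= 2) {
        if (lid < stride)
            tile[lid] += tile[lid + stride];
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    /* write back: one store of the group's result to global memory */
    if (lid == 0)
        out[get_group_id(0)] = tile[0];
}

The big difference today is that the runtime, the occupancy model, and the debuggers do a lot of the babysitting that PS3 devs had to do by hand.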

Interesting tidbits
Xbox360: Other than the big-endian thing, it really smells like a PC --until you dug into it. The GPU is great --except that the limited EDRAM means that you have to draw your scene twice to comply with the anti-aliasing requirement? WTF! Holy Crap there are a lot of SIMD registers! 4 floats x 128 registers x 6 register banks = 12K of registers! You are handed DX9 and everything works out of the box. But, if you dig in, you find better ways to do things. Deeper and deeper. Eventually, your code looks nothing like PC-DX9 and it works soooo much better than it did before! The debugger is awesome! PIX! PIX! I Kiss You!


PS3: A 95 pound box shows up on your desk with a printout of the 24-step instructions for how to turn it on for the first time. Everyone tries, most people fail to turn it on. Eventually, one guy goes around and sets up everyone else's machine. There's only one CPU. It seems like it might be able to do everything, but it can't. The SPUs seem like they should be really awesome, but not for anything you or anyone else is doing. The CPU debugger works pretty OK. There is no SPU debugger. There was nothing like PIX at first. Eventually some Sony 1st-party devs got fed up and made their own PIX-like GPU debugger. The GPU is very, very disappointing... Most people try to stick to working with the CPU, but it can't handle the workload. A few people dig deep into the SPUs and, Dear God, they are fast! Unfortunately, they eventually figure out that the SPUs need to be devoted almost full time making up for the weaknesses of the GPU.
 
A lot of people also attributed the PS3s worse performance in some multiplats to "lazy devs" not taking advantage of the Cell, but people seriously underestimated the explosion in code complexity it created.

60 lines of code on a general processor.

1200 lines of code to port to an SPU. "lazy" = had a budget to stay alive.

Heh, I was always amused by the lazy dev nonsense. What people didn't realise was that the PS3 was significantly harder to code for, had crap documentation (at least initially), and, to top it all off, debugging all the SPEs was "difficult" at best.

I'm truly amazed it got the ports and performance it did over its lifetime.
 
Which is also why the PS3, despite launching a year later, had a weaker GPU.

I always wonder what a different universe's 7th gen console looks like, where the interesting Cell is paired with the better GPU, Xenos, and unified memory.

It's crazy how much money MS was throwing around back then.

Even crazier is that Xenon was actually gimped: the original spec was 4GHz, but IBM could not get it to run cool enough without a rework. IBM did hit 4GHz soon after, though.

Even crazier crazier is that Apple would have had a customized Cell too. But MS bumped them in line, and Apple didn't want to wait, so they started looking at Intel.
 
Which is also why the PS3, despite launching a year later, had a weaker GPU.

I always wonder what a different universe's 7th gen console looks like, where the interesting Cell is paired with the better GPU, Xenos, and unified memory.
I believe the original plan was 2 Cells and no GPU, as the Cell could be programmed to function as both. That would have been a beast of a machine, especially with the super-fast RAM.
 
Barf

Jesus

It was totally Sony's fault, and yet the abuse heaped upon multi-platform devs was crazy. "Lazy developers" in every other thread when a PS3 game wasn't running as well as on the 360. It took years, and what I expect was Sony letting the ICE Team share some of their technology, before multiplats recovered.
 
I believe the original plan was 2 Cells and no GPU, as the Cell could be programmed to function as both. That would have been a beast of a machine, especially with the super-fast RAM.


I think this is slightly urban-legend status; if there was a plan to have two Cells and no GPU, it was a very early drawing-board kind of deal. With no ROPs and such, the Cell wouldn't have made a great GPU, even if it was good at picking up the RSX's slack in some cases. And if they had added those, it would have been a very different end product.

The problem was that they poured all this R&D into the Cell and just went with a relatively off-the-shelf GPU, whereas Microsoft pushed the envelope with the first unified-shader GPU, eDRAM, shared memory, etc.

Even crazier crazier is that Apple would have had a customized Cell too. But MS bumped them in line, and Apple didn't want to wait, so they started looking at Intel.


I suspect Apple would have nixed it even if they had been first in line, given the above explosion in code complexity needed to get anything out of it. In addition, it was IBM's performance per watt not improving, and Intel's being so much better with the rise of laptops, that made them switch. The G5 actually had *worse* perf/watt than the G4.

I do wonder what a dual Cell Mac Pro would have been like though! For 2005, that would have been an insane number of CPU Gflops to play with.
 
It was totally Sony's fault, and yet the abuse heaped upon multi-platform devs was crazy. "Lazy developers" in every other thread when a PS3 game wasn't running as well as on the 360. It took years, and what I expect was Sony letting the ICE Team share some of their technology, before multiplats recovered.

There's a reason why Gabe Newell was utterly savage in his criticisms of the PS3 and Sony early on (and also why Valve never did an in-house PS3 port of The Orange Box and left it to EA to completely half-ass the PS3 version; it wasn't worth the effort). By the time they eventually made a PS3 compiler/exporter for Source, tools and documentation were way better.

I think all three console makers have learned the value of providing extensive documentation and dev toolsets by now, especially Nintendo, who also flubbed with the Wii U and then made the smart decision of letting Nvidia handle that side of things, since Nvidia already had a robust and mature set of tools and documentation for their hardware.
 
I think this is slightly urban-legend status; if there was a plan to have two Cells and no GPU, it was a very early drawing-board kind of deal. With no ROPs and such, the Cell wouldn't have made a great GPU, even if it was good at picking up the RSX's slack in some cases. And if they had added those, it would have been a very different end product.

The problem was that they poured all this R&D into the Cell and just went with a relatively off-the-shelf GPU, whereas Microsoft pushed the envelope with the first unified-shader GPU, eDRAM, shared memory, etc.




I suspect Apple would have nixed it even if they had been first in line, given the above explosion in code complexity needed to get anything out of it. In addition, it was IBM's performance per watt not improving, and Intel's being so much better with the rise of laptops, that made them switch. The G5 actually had *worse* perf/watt than the G4.

I do wonder what a dual Cell Mac Pro would have been like though! For 2005, that would have been an insane number of CPU Gflops to play with.

Well, IBM's goal for the PPC "Cell" core was 4GHz at 10 watts, and work started in 2001. In theory, a Cell-based Mac would have been an '04 product instead of them floundering with the G5 on its deathbed.
 
Well, IBM's goal for the PPC "Cell" core was 4GHz at 10 watts, and work started in 2001. In theory, a Cell-based Mac would have been an '04 product instead of them floundering with the G5 on its deathbed.

At 4GHz it ended up around 60 watts as far as I can tell. I don't know the Cell's share of the OG PS3's power consumption, but even if dropping 1GHz halved the wattage, that would be 3x their stated goal.

http://www.blachford.info/computer/Cell/Cell1_v2.html

Was 10 watts for just the PPU alone or something? Because for the whole 1+7 chip at 4GHz, that sounds like an unrealistic goal on the nodes they had at launch.

Now, maybe 30 watts would have been fine for Apple; laptop chips like the Core 2 Duo were up at that anyway. But it would still involve that increase in code complexity. It would be interesting to see the alternate reality with Cell Macs, though, for sure.


Instinctively, though, they probably would have flopped: lots of power on paper, but an order of magnitude harder to get it in the air than their Core-series PC brethren. Apple made the right choice with Intel.
 
2 NIC I can understand..but 3 HDMI out? Wtf...lmao.
Multiscreen gaming uses multiple video outputs. The Forza Motorsport series on the 360 supported multiscreen gaming.

Why would you use 2 NICs on a console? On a PC I can see the benefit when you're working with different VLANs, networks, or firewalls, or simply need the added bandwidth of a second NIC. But a console? I can't come up with a use case that wouldn't be covered by one NIC.
Thanks for the links.
 
At 4GHz it ended up around 60 watts as far as I can tell. I don't know the Cell's share of the OG PS3's power consumption, but even if dropping 1GHz halved the wattage, that would be 3x their stated goal.

http://www.blachford.info/computer/Cell/Cell1_v2.html

Was 10 watts for just the PPU alone or something? Because for the whole 1+7 chip at 4GHz, that sounds like an unrealistic goal on the nodes they had at launch.

Now, maybe 30 watts would have been fine for Apple; laptop chips like the Core 2 Duo were up at that anyway. But it would still involve that increase in code complexity. It would be interesting to see the alternate reality with Cell Macs, though, for sure.


Instinctively, though, they probably would have flopped: lots of power on paper, but an order of magnitude harder to get it in the air than their Core-series PC brethren. Apple made the right choice with Intel.


Looks like 10 watts was for the PPU alone. Apparently, Apple/IBM had even roadmapped 6GHz at somewhere in the area of 60-75 watts, possibly lower once IBM moved away from 90nm.

Agreed that Apple made the right choice in the long run.
 