The article at Ars Tech sums it up in four succinctly written paragraphs:
The burden of managing the cache has been moved into software, with the result that the cache design has been greatly simplified. There is no tag RAM to search on each access, no prefetch, and none of the other overhead that accompanies a normal L1 cache. The SPEs also move the burden of branch prediction and code scheduling into software, much like a VLIW design.
To sum up, IBM has sort of reapplied the RISC approach of throwing control logic overboard in exchange for a wider execution core and a larger storage area that's situated closer to the execution core. The difference is that instead of the compiler taking up the slack (as in RISC), a combination of the compiler, the programmer, some very smart scheduling software, and a general-purpose CPU do the kind of scheduling and resource allocation work that the control logic used to do.
Once the instructions are in the SPE, the SPE's control unit can issue up to two instructions per cycle, in-order. The SPE has a 128-entry register file (128-bits per entry) that stores both floating-point and integer vectors. As stated above, there are no rename registers. All loop unrolling is done by the programmer/compiler using this very large register file.
Note also that the register file has six read ports and two write ports. The SPEs can do forwarding and bypass the register file when necessary. The SPE has a DMA engine that handles moving data between main memory and the register file. This engine is under the control of the programmer as mentioned above. Each SPE is made of 21 million transistors: 14 million SRAM and 7 million logic. Finally, the instruction set for the SPEs is not VMX compatible or derivative, because its execution hardware doesn't support the range of instructions and instruction types that VMX/Altivec does.
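To make the "cache management moved into software" point concrete, here is a toy model in Python. It is only a sketch: `LocalStore`, `dma_get`, `dma_put` and the 8-word size are hypothetical names and numbers made up for illustration, not the Cell SDK. The point it shows is that nothing arrives in the fast storage unless the program explicitly asks for it, and the program has to chunk its work to fit.

```python
# Toy model of software-managed storage. All names and sizes here are
# hypothetical illustrations, not the real Cell/SPE programming interface.


class LocalStore:
    """Fixed-size scratch memory that the program must fill by hand."""

    def __init__(self, size):
        self.words = [0] * size
        self.transfers = 0  # number of explicit transfers issued

    def dma_get(self, main_memory, start, count):
        """Copy main_memory[start:start+count] into the local store."""
        if count > len(self.words) or start + count > len(main_memory):
            raise ValueError("transfer does not fit")
        self.words[:count] = main_memory[start:start + count]
        self.transfers += 1

    def dma_put(self, main_memory, start, count):
        """Copy the first count words of the local store back out."""
        if count > len(self.words) or start + count > len(main_memory):
            raise ValueError("transfer does not fit")
        main_memory[start:start + count] = self.words[:count]
        self.transfers += 1


def scale_in_chunks(main_memory, factor, store):
    """Scale every element, staging one chunk at a time through the store."""
    size = len(store.words)
    for start in range(0, len(main_memory), size):
        count = min(size, len(main_memory) - start)
        store.dma_get(main_memory, start, count)   # explicit fetch
        for i in range(count):                     # compute on local data only
            store.words[i] *= factor
        store.dma_put(main_memory, start, count)   # explicit write-back
```

With a hardware cache, `scale_in_chunks` would just be a loop over `main_memory`; here the chunking and both transfers are the program's job.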
To summarize:
1) Xenon will offer better online play, revolutionary schemes for downloading content to removable media, and generally a feature-rich gaming experience, compared to Sony's obvious ultimate vision of controlling format and delivery, which is closer to what you would imagine a scary, monopolistic company would want. Microsoft is empowering developers to make gamers happy and buy more games. It's the same strategy they use with other software development platforms, and it works. #1) Make the developers happy, #2) make the gamers happy.
2) Sony is heading in the same direction (proprietary technology, ignoring Western game companies) that Nintendo was 10 years ago. It's not going unnoticed, and you will see surprising defections to the MS camp because of this. Sony is doing nothing but hype, hype, hype and trying to position themselves for financial windfalls from royalties and from the poor Japanese who are going to pay their life savings away to own their 49,900 yen PS3.
3) Without a doubt MS has Sony beat in dev support, today, right now, and likely for the next 10 years. It's black and white. Sony doesn't seem to understand the concept of offering a helping hand. Defend them all you want; they make great games and have an awesome marketing/hype machine. However, even Nintendo has Sony beat in dev support, for crying out loud. The DS has development tools which poop on the PSP's, and this talk of open source is making me groan. Open source tools are made by the comic book dude from the Simpsons, who would rather scratch his butt than lift a finger to write a legible and useful piece of documentation.
4) My money is on Xenon to have more main memory, maybe even twice as much (something any game programmer will be crying with joy about), and additionally a larger general-purpose cache than PS3. It's going to flow naturally from the difference in memory bandwidth needs. Basically, Xenon will be kicking butt out of the box: you won't have to tell it what to do, it knows what to do, and it has plenty of space to do it in. It's a programmer-friendly architecture: what needs to be cached is cached, and this works well for anything pushing lots of data back and forth, which, you know, all freaking games do, especially living, breathing worlds like Fable and what you'll be seeing on Xenon.
Memory latency is being overblown into something that ties our hands behind our backs. All I want to do is store X and retrieve X when I need it, not store X, Y, Z because I know I'm going to need Z, Y, X in that order, for every object, for every possible combination of stores. It's not something as trivial as working it into the compiler. All games store and load data, BUT paying attention to order, temporality, and cache size should be ALL YOU NEED to do to optimize, not attempting to predict what's going to happen when you start pushing data through the CELL. It's turning the CPU into a chess partner whose moves you have to predict. While this may seem fun to some, you can have that. I just want something fast enough to do rigid body and AI at the same time, and that means something that is not bottlenecked by memory latency.
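Here is a rough way to picture the "chess partner" complaint, as a toy model in Python. Everything in it is an assumption made for illustration: `demand_misses` stands in for a small cache with LRU eviction that fetches on first touch, `staged_stalls` stands in for hand-staged storage where the program must name ahead of time what it will need at each step, and the capacities are arbitrary. It says nothing about real CELL or Xenon timings.

```python
from collections import OrderedDict


def _load(resident, key, capacity):
    """Bring key into the store, evicting the oldest entry if it is full."""
    if key in resident:
        return
    resident[key] = True
    if len(resident) > capacity:
        resident.popitem(last=False)


def demand_misses(accesses, capacity):
    """Cache-style: nothing is fetched until it is touched (LRU eviction)."""
    resident = OrderedDict()
    misses = 0
    for key in accesses:
        if key in resident:
            resident.move_to_end(key)  # mark as most recently used
        else:
            misses += 1
            _load(resident, key, capacity)
    return misses


def staged_stalls(accesses, plan, capacity):
    """Hand-staged: plan[i] is the key the program fetched ahead of step i."""
    resident = OrderedDict()
    stalls = 0
    for key, planned in zip(accesses, plan):
        _load(resident, planned, capacity)
        if key not in resident:
            stalls += 1  # guessed wrong, so wait for the transfer
            _load(resident, key, capacity)
    return stalls
```

In this model, a plan that matches the real access order never stalls, even with room for a single item, while a plan that guesses the order wrong stalls on every wrong guess. The cache version needs no plan at all, only a working set that fits.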
CELL is going to be nice for math-heavy computation and making Gran Turismo 5 look and feel great. Would it be good for the massive content games which Microsoft is brewing up? No. I'm seeing a divergence in not only architecture but games themselves. Western games are going to become all about what you can do, what you can experience, what you can change. Japanese games are going to continue to be all about the moment, all about the CG. That's fine and dandy, and I'm sure we'll be wowed by some new Zone of the Enders in Fall 2006 with lots of pretty robots and pixel effects. But long term, in the long run, the difference in architecture is not as important as the difference in game development support and things like mature online play support systems.
Having an OS would NOT abstract any of the CELL workload away. What you do with the data is purely a game-to-game, line-of-code-to-line-of-code decision. More importantly, is it possible to work CELL's architecture into middleware so that it's easy to make better-looking games? The answer is no. Without a doubt, middleware would mainly make porting from Xenon to PS3 easier, and short of a miraculous predict-the-future Turing machine embedded in CELL, there's no way to make it go fast without lots more work and non-stop line-to-line, game-to-game consternation.