Mrbob said:
I understand the basics of dual core and triple core processors, but Cell confuses me. You have one core processor with 7 SPEs. The SPEs are what confuse me. How exactly do they function with the main processor? How does one SPE alone handle things like physics when X360 games like CoD2 dedicate an entire core just to physics?
The principle behind the SPU design is this. Conventional cores have lots of logic, only some of which is directly related to executing code. A lot of the rest is basically there to help the programmer - cache control mechanisms, instruction reschedulers (for out-of-order execution) and hardware branch prediction - but it isn't core execution logic. The idea is to take some of these "programmer comforts" away, and rely on good programmer practice or ingenuity to do (or avoid) the kinds of things the hardware would handle in a conventional core, so that the core can be made smaller. And because the core is smaller, more of them fit on the die. They also tweaked the execution logic to favour floating point execution - which absolutely makes sense for a games processor. So given the right treatment by the programmer, an SPU can be as good as a conventional core, or better as we've seen in some cases (perhaps because of the memory model in Cell, which is oft overlooked in favour of "teh FLOPS!1"). The removal of that "comfort logic" is also what makes Cell more difficult to program for (by varying degrees depending on the developer), as I'm sure you've heard by now.
Basically, what all this means for games comes down to whether the SPUs are well suited to the more time-consuming tasks in a game. I would say yes, and judging by what developers have said they want to use more CPU power for, a lot of them seem to agree. We can't simply count the tasks that will or won't run well on SPUs; we have to look at their relative importance in terms of intensity and execution time. To take it to a bit of an extreme, if the SPUs were only good at one task of ten, but that one task took 90+% of execution time, it'd still be a win.
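To put rough numbers on that "one task of ten" example, here's a quick back-of-the-envelope sketch using Amdahl's law. The 6x speedup is a made-up figure purely for illustration, not a measured SPU result, and `overall_speedup` is just a name I picked.

```python
def overall_speedup(fraction_accelerated, task_speedup):
    """Amdahl's law: whole-program speedup when only part of the run time gets faster."""
    remaining = 1.0 - fraction_accelerated              # portion left at original speed
    accelerated = fraction_accelerated / task_speedup   # portion after the speedup
    return 1.0 / (remaining + accelerated)


# One task of ten that eats 90% of the execution time, sped up 6x (assumed figure).
big_task = overall_speedup(0.90, 6.0)

# The other nine tasks together (the remaining 10%), sped up by the same 6x.
small_tasks = overall_speedup(0.10, 6.0)
```

Under those assumptions, speeding up the single heavy task makes the whole program about 4x faster, while speeding up the nine light tasks gets you only about 1.09x. That's why the share of execution time matters more than the count of tasks.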
As for how the SPUs function with the main processor, it's really up to the developer. You can write a function and, instead of executing it on the PPE, hand the computation off to an SPU, which returns the results to the PPE. That's one model. Then, of course, there's a threading model, where the PPE spawns a thread on an SPU and lets it run. When the SPU is finished, the PPE spawns another on it, and so forth. Or the PPE could send a kernel over to the SPUs, get them started, and leave the SPUs to look after themselves from then on (they'd pull over their own tasks from memory, finish them, pull over another and so forth).
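Here's a toy sketch of two of those models: the function hand-off and the "SPUs look after themselves" one. Ordinary Python threads are standing in for SPUs, so this is not Cell SDK code and it ignores local store and DMA entirely. Every name in it (`integrate`, `offload_model`, `self_scheduling_model`, `NUM_WORKERS`) is hypothetical.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 6  # assumed number of worker cores; purely illustrative


def integrate(positions, velocities, dt):
    """Toy 'physics' job: advance each position by velocity * dt."""
    return [p + v * dt for p, v in zip(positions, velocities)]


def offload_model():
    """Hand-off model: the main thread gives one call to a worker and waits for the result."""
    with ThreadPoolExecutor(max_workers=1) as worker:
        future = worker.submit(integrate, [0.0, 1.0], [2.0, 4.0], 0.5)
        return future.result()  # main thread collects the result


def self_scheduling_model(jobs):
    """Self-scheduling model: workers pull their own jobs from a shared queue until it is empty."""
    pending = queue.Queue()
    for job_id, job in enumerate(jobs):
        pending.put((job_id, job))
    results = [None] * len(jobs)

    def worker_loop():
        while True:
            try:
                job_id, (pos, vel, dt) = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to pull, so this worker stops
            results[job_id] = integrate(pos, vel, dt)  # each job writes its own slot

    workers = [threading.Thread(target=worker_loop) for _ in range(NUM_WORKERS)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()  # main thread only checks back in once everything is done
    return results
```

The difference to notice is who does the scheduling. In `offload_model` the main thread decides what runs and waits on it, whereas in `self_scheduling_model` it only fills the queue and the workers keep themselves busy.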
Also, just to correct some of my earlier comments about the Transform & Lighting demo: that doesn't represent a "benched" maximum transform rate as I suggested earlier. The maximum would be a higher figure, since they must be including some lighting model for it to be TnL.