
Peter Hofstee talk on Cell

antipode

Member
H. Peter Hofstee is giving a technical presentation on Cell right now, so I thought I'd post some notes for those who are interested. Probably most of this will be stuff we already know.

-Host mentions in introduction "in the Playstation 3, hopefully coming out this year", Peter cracks a smile.

-agenda: power wall, memory/latency wall, multicore and specialization, DMA and microarchitecture distinctions, things that work and don't work well, things for academia to look at

-historical SPECint single-thread growth rate was 45%, but it is slowing dramatically

[shows a chart of the motivations for either multiple cores or specialized (non-homogeneous) cores]
-memory wall: asynchronous loads point to non-homogeneous cores
-efficiency wall: specialized functions point to non-homogeneous cores
-power wall: reducing transistor power is limited because oxide thickness, channel length and operating voltage no longer scale well, which points to multiple cores; reduced switching per function points to non-homogeneous cores

- ceiling in terms of power - have already hit ceiling in terms of watts that can fit in a traditional computer form factor, need multi-core to progress, don't want much more than 250W in a consumer box
- motivation since 2001 was to "Support an introduction in 2005/6 - Challenge: structure innovation such that 5yr schedule can be met"
-sharing workloads across the network an important design motivation
-non-homogeneous coherent chip multiprocessor allows an attack on the "Frequency Wall" - deliberately designed for 4 GHz, reduced to 3.2 GHz w/ low operating voltage because power efficiency increases greater than cubically - "also helps that we spent 400 mil in that regard" (gets laughs)
-streaming DMA attacks "memory wall"
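A rough first-order sketch of why backing off from 4 GHz to 3.2 GHz pays off so well. It assumes the textbook dynamic-power model (α is the switching activity factor, C the switched capacitance) and that supply voltage can be lowered roughly in proportion to frequency; both are simplifying assumptions, not figures from the talk:

```latex
\begin{aligned}
P_{\text{dyn}} &\approx \alpha C V^{2} f, \qquad V \propto f \;\Rightarrow\; P_{\text{dyn}} \propto f^{3} \\
\frac{P_{\text{dyn}}(3.2\,\text{GHz})}{P_{\text{dyn}}(4\,\text{GHz})} &\approx \left(\frac{3.2}{4}\right)^{3} = 0.8^{3} = 0.512
\end{aligned}
```

Under this model, giving up 20% of the clock roughly halves dynamic power. Leakage power, ignored here, also drops as voltage falls, which is one plausible reason the real gain could be better than cubic.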

-potentially a collision between mainstream OS functions and streaming app/games, managed by hypervisor to allow realtime guarantees
-most programmers will not ("and I don't see it in practice") use the LS alias available in main memory, since load balancing and scaling are difficult; instead they refer explicitly to the LS in the SPEs

-token-based mechanism guarantees bandwidth at memory and IO chokepoints to real-time OS functions that need it
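The notes only say "token-based", so here is a toy Python model of the general idea. Every name in it (TokenManager, simulate, the group names, the per-tick rates and the cap) is invented for illustration and is not IBM's actual resource-allocation hardware. Each group earns tokens at a fixed rate and may only issue a memory or I/O request while it holds one, so a greedy group cannot starve a real-time one:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TokenManager:
    """Toy token-based bandwidth allocator (hypothetical model, not IBM's hardware)."""
    rates: Dict[str, int]                 # tokens granted to each group per tick
    cap: int = 4                          # most tokens a group may bank
    tokens: Dict[str, int] = field(default_factory=dict, init=False)

    def __post_init__(self) -> None:
        self.tokens = {group: 0 for group in self.rates}

    def tick(self) -> None:
        # Refill every group's bucket at its configured rate, up to the cap.
        for group, rate in self.rates.items():
            self.tokens[group] = min(self.cap, self.tokens[group] + rate)

    def try_issue(self, group: str) -> bool:
        # A request may go out to memory/IO only if its group can spend a token.
        if self.tokens[group] > 0:
            self.tokens[group] -= 1
            return True
        return False


def simulate(ticks: int, demand: Dict[str, int], rates: Dict[str, int]) -> Dict[str, int]:
    """Count issued requests when each group attempts demand[group] requests per tick."""
    manager = TokenManager(rates)
    issued = {group: 0 for group in rates}
    for _ in range(ticks):
        manager.tick()
        for group, attempts in demand.items():
            for _ in range(attempts):
                if manager.try_issue(group):
                    issued[group] += 1
    return issued
```

For example, simulate(100, {"rt_os": 1, "game": 10}, {"rt_os": 1, "game": 3}) lets the real-time group issue all 100 of its requests, while the game, despite asking for 10 per tick, is held to 300.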
 
-Fundamental change for programmers - transition from demand-fetch to software-controlled prefetch - DMA lists to scatter and gather info from memory
-design "would have been flat-out impossible for a PC maker to pull off" because performance is not as important to them
-everything goes into GPRs - branches, links, compare results - because he doesn't like how in higher-frequency processors specialized registers bloat into stacks
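To make the DMA-list idea concrete, here is a minimal Python sketch that treats main memory and the local store as plain byte arrays. The function names are hypothetical (loosely modeled on the idea behind the MFC's list commands, not the real SDK calls), and real list elements carry extra fields such as a stall-and-notify flag that are left out. A get gathers scattered main-memory chunks into one contiguous local-store buffer; a put scatters them back:

```python
from typing import List, Tuple

# One list element: (main-memory address, size in bytes). Real MFC list
# elements carry more (e.g. a stall-and-notify flag); this toy omits it.
# Hypothetical helper names below; not the Cell SDK API.
DmaList = List[Tuple[int, int]]


def dma_list_get(main_mem: bytearray, ls: bytearray, ls_addr: int, dlist: DmaList) -> int:
    """Gather: copy each chunk from main memory into the local store back-to-back.
    Returns the local-store offset just past the last chunk."""
    for ea, size in dlist:
        ls[ls_addr:ls_addr + size] = main_mem[ea:ea + size]
        ls_addr += size
    return ls_addr


def dma_list_put(main_mem: bytearray, ls: bytearray, ls_addr: int, dlist: DmaList) -> int:
    """Scatter: copy consecutive local-store bytes out to each chunk in main memory."""
    for ea, size in dlist:
        main_mem[ea:ea + size] = ls[ls_addr:ls_addr + size]
        ls_addr += size
    return ls_addr
```

The point of the list form is that one command describes the whole gather or scatter, instead of the program touching each address and waiting on a cache miss.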

-Q from audience: "this reminds me of the 'CDC 6600' processor with lots of little processors around the central processor". A: "the SPEs aren't really so little (compared to central core)- remember that on compute-intensive tasks they will outperform a Pentium 4"

-fetching from main mem to local store is not so bad - similar to cache miss, but you can have many things in flight
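Why having many transfers in flight matters can be estimated with Little's law: to sustain a given bandwidth, bandwidth times latency worth of bytes must be outstanding at once. The 25.6 figure is the memory bandwidth from these notes; the 500 ns latency and 128-byte transfer size are purely illustrative assumptions, not numbers Hofstee gave:

```python
import math


def bytes_in_flight(bandwidth_gb_s: float, latency_ns: float) -> float:
    """Little's law: outstanding data = bandwidth x latency.
    GB/s times ns comes out directly in bytes (1e9 B/s * 1e-9 s)."""
    return bandwidth_gb_s * latency_ns


def transfers_needed(bandwidth_gb_s: float, latency_ns: float, transfer_bytes: int) -> int:
    """Number of transfers of `transfer_bytes` that must be outstanding at once."""
    return math.ceil(bytes_in_flight(bandwidth_gb_s, latency_ns) / transfer_bytes)


# 25.6 GB/s is from the talk notes; 500 ns and 128 B are illustrative assumptions.
print(transfers_needed(25.6, 500, 128))
```

Under those assumptions this comes out to 12,800 bytes, i.e. 100 transfers of 128 bytes outstanding to keep the memory bus busy, far more than a handful of pending cache misses.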
 
- 25.6 GB/s bandwidth to Rambus memory; 50 or 60 GB/s "depending on asynchronous or synchronous I/O configuration" to FlexIO

- "Focus on 32b float, or <= 32b integer","Typical code is double-buffered gather-compute-scatter"
-FFTW best result "about 100 GFlops"
-10-20x faster today: crypto, MPEG2/4, H.264, JPEG
-library-based scientific code needs no rewrites, nor do "Library..Device/API based Graphics and physics and sound" apps
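The double-buffered gather-compute-scatter pattern can be sketched in Python with a single-worker thread pool standing in for the SPE's DMA engine. That stand-in is an assumption for illustration: real SPE code would issue MFC gets and puts and wait on tag groups, not futures, and the helper names here are hypothetical. While block i is being computed, the fetch of block i+1 is already queued, and each result is handed back to the "DMA engine" to be written out:

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence

Block = List[float]


def double_buffered(src: Sequence[Block], compute: Callable[[Block], Block]) -> List[Optional[Block]]:
    n = len(src)
    dst: List[Optional[Block]] = [None] * n   # stand-in for the output region in main memory

    def dma_get(i: int) -> Block:
        # Stand-in for an asynchronous get of block i into a local-store buffer.
        return list(src[i])

    def dma_put(i: int, data: Block) -> None:
        # Stand-in for an asynchronous put of a finished block back to main memory.
        dst[i] = data

    # One worker = one "DMA engine" whose queue runs in FIFO order.
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending: Optional[Future] = dma.submit(dma_get, 0) if n else None
        for i in range(n):
            buf = pending.result()                    # wait until buffer i has landed
            if i + 1 < n:
                pending = dma.submit(dma_get, i + 1)  # start filling the other buffer
            dma.submit(dma_put, i, compute(buf))      # compute overlaps the next get
    # Leaving the with-block waits for all queued puts to finish.
    return dst
```

For example, double_buffered([[1.0, 2.0], [3.0]], lambda b: [2 * x for x in b]) returns [[2.0, 4.0], [6.0]]. Only two buffers are live at any time: the one being computed and the one being filled.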

-double-precision FP stalls because it's not fully pipelined out to 14 cycles; it "stutters at 7 cycles". He was surprised to find recently that really ambitious game worlds are now using double precision
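A back-of-envelope sketch of what that stutter costs. The assumptions come from commonly published Cell figures rather than from this talk: each SPE can issue a 4-wide single-precision fused multiply-add every cycle (8 flops), while a 2-wide double-precision FMA (4 flops) can issue only once every 7 cycles (reading "stutters at 7 cycles" that way), with a 3.2 GHz clock and 8 SPEs on the full die:

```python
def peak_gflops(clock_ghz: float, flops_per_issue: int, cycles_per_issue: int, n_spes: int) -> float:
    """Peak rate if every SPE issues one SIMD FMA every `cycles_per_issue` cycles."""
    return clock_ghz * flops_per_issue / cycles_per_issue * n_spes


# Assumed figures (see lead-in), not numbers quoted in the talk.
sp = peak_gflops(3.2, flops_per_issue=8, cycles_per_issue=1, n_spes=8)  # 4-wide SP FMA each cycle
dp = peak_gflops(3.2, flops_per_issue=4, cycles_per_issue=7, n_spes=8)  # 2-wide DP FMA every 7 cycles
print(f"SP peak ~{sp:.1f} GFLOPS, DP peak ~{dp:.1f} GFLOPS, ratio ~{sp / dp:.0f}x")
```

Under these assumptions that is about 204.8 versus about 14.6 GFLOPS, roughly a 14x gap between single- and double-precision peak.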
 
- shows Mercury development system, IBM blade

-future of Cell designs - may be a threat to "Windows" given the desire for immersive 3D interactivity in real time with distributed, device-agnostic, collaborative apps
- "new types of applications (often real-time) made possible by a dramatic jump in performance - E.g. gesture and emotion recognition"
 
-thinks emotion recognition (with a camera - frown or smile) is a promising way of interacting with a computer

-Q: How are game devs doing? A: The Sony side knows more about specific game development progress...
-US export restrictions on supercomputers - "at some point we had a question whether all our employees could be allowed into the printing lab" jokes
 
antipode said:
- ceiling in terms of power - have already hit ceiling in terms of watts that can fit in a traditional computer form factor, need multi-core to progress, don't want much more than 250W in a consumer box
Ceiling in terms of power? That's kinda bullshit... For a long time, PC processors were developed for computation power, not for power efficiency. So we got the PIV, Athlon 64 and such at over 100W.

Recently, they decided to look for better wattages. Result? The Pentium M is below 30W, and there are even very-low-voltage Pentium Ms that use 5W IIRC (didn't see them in retail, but didn't look either).

And as far as I know, we don't know the consumption of Cell precisely yet? Did I miss it?


Besides this, I think that if you look for computation power, that's probably the way to go (the XBox 360 processor shares the same philosophy, to a lesser degree; Intel's roadmap points to modified DSPs in future processors, etc.)
 
Heh, sorry - no it's not London or Fridayton, it's a small presentation at Stanford for faculty and students. The main talk is over, he's just taking questions from professors now. Overall he seems pretty confident in the architecture but a little bewildered at all the different applications people are coming up with and what that means for whether his current design decisions were right and what Cell 2 will look like. He pointed to the "Magic Mirror" (I think that was Toshiba's?) as an application he would never have imagined was possible when he started the project. It sounds like the games are also more ambitious in terms of what he thought necessary - specifically needing DP floating point to handle the number of objects in the game world.
 
Koren said:
Ceiling in terms of power? That's kinda bullshit... For a long time, PC processors were developed for computation power, not for power efficiency. So we got the PIV, Athlon 64 and such at over 100W.

Recently, they decided to look for better wattages. Result? The Pentium M is below 30W, and there are even very-low-voltage Pentium Ms that use 5W IIRC (didn't see them in retail, but didn't look either).

And as far as I know, we don't know the consumption of Cell precisely yet? Did I miss it?


Besides this, I think that if you look for computation power, that's probably the way to go (the XBox 360 processor shares the same philosophy, to a lesser degree; Intel's roadmap points to modified DSPs in future processors, etc.)

Yeah, your last sentence is the gist of what he intended, I think. He didn't give any absolute numbers for Cell, only that range. FWIW, Nvidia's David Kirk made a similar point last month at a talk here - his "biggest worry" was that graphics processors would reach an absolute ceiling in the wattage they could safely use within the next 5 years.
 
antipode said:
Heh, sorry - no it's not London or Fridayton, it's a small presentation at Stanford for faculty and students. The main talk is over, he's just taking questions from professors now. Overall he seems pretty confident in the architecture but a little bewildered at all the different applications people are coming up with and what that means for whether his current design decisions were right and what Cell 2 will look like. He pointed to the "Magic Mirror" (I think that was Toshiba's?) as an application he would never have imagined was possible when he started the project. It sounds like the games are also more ambitious in terms of what he thought necessary - specifically needing DP floating point to handle the number of objects in the game world.

Very cool. Thanks for providing this info.
 
No problem. Some of the stuff is probably confusing just because of my shorthand notes, not because it's difficult to explain...

BTW, I asked him about that stuff DCharlie was saying earlier about Sony reserving some of the 7 SPEs for the OS and not for games. He said he couldn't comment specifically on the Playstation 3 but could explain why it wasn't necessary or likely - they designed the Cell with the hypervisor able to partition access to bandwidth for all the SPEs, for real-time applications, so any OS wouldn't normally lock an entire SPE considering it wouldn't need the compute power. So I'm not sure if what DCharlie said is actually true.

It also sounds like the FlexIO that's going to connect the RSX to Cell has a lot of tricks up its sleeve - they're using some of them in the IBM Blade to connect multiple Cells together.
 
antipode said:
It also sounds like the FlexIO that's going to connect the RSX to Cell has a lot of tricks up its sleeve - they're using some of them in the IBM Blade to connect multiple Cells together.

:O

That's extremely interesting

puebla said:
what's happening on "Fridayton"??

Don't mean to talk for him, but I believe he was joking around with that
 
BlueTsunami said:
:O
Don't mean to talk for him, but I believe he was joking around with that
Oh my bad. I coulda sworn I saw some other posters refer to some big announcement from Sony in 1 or 2 days. I don't know if they were joking or not.
 
puebla said:
Oh my bad. I coulda sworn I saw some other posters refer to some big announcement from Sony in 1 or 2 days. I don't know if they were joking or not.

Please don't speak ill of the dead to me now :( MassiveAttack will be missed.
 
Diablos said:
is there anyone here willing to break this down for at least some of us?
Somebody is doomed. Exactly which company that is has yet to be ascertained. But rest assured - SOMEBODY is doomed.
 
bishoptl said:
Somebody is doomed. Exactly which company that is has yet to be ascertained. But rest assured - SOMEBODY is doomed.

What exactly are you trying to say? It seems like you are typing in code or something. :/
 
BlueTsunami said:
Then came back to life and his return is currently being waited for by hundred of millions

So your PS3s will die and levitate off this planet, not to be seen for a long time? That sucks :<
 
DopeyFish said:
Depends on the year

Well...considering there's still a June coming around this year, I would think you would be using that one.

but...

DopeyFish
leafs are confusing me
(Today, 12:46 AM)

I can understand why you couldn't figure that out
 
BlueTsunami said:
Well...considering there's still a June coming around this year, I would think you would be using that one.

but...

DopeyFish
leafs are confusing me
(Today, 12:46 AM)

I can understand why you couldn't figure that out

I'm sorry dude, but failing to indicate a year makes it hard to form any opinion as to how long it would be. Assume nothing. Question everything. Sort of like taking my tag without knowing what the context was and trying to insult me. Completely failed? Yes. Completely.

Anyways back on topic!

When processing core companies say their core makes things more immersive, etc... they're talking completely out of their ass. Why? Because it's software side, not hardware side.

It doesn't matter what processor is running in there, the software has to be there or it's nothing but a little piece of a silicon wafer.
 
DopeyFish said:
I'm sorry dude, but failing to indicate a year makes it hard to form any opinion as to how long it would be. Assume nothing. Question everything. Sort of like taking my tag without knowing what the context was and trying to insult me. Completely failed? Yes. Completely.

1) Being on GAF and seeing how it's unavoidable to know this information, I can honestly say that you knew what I meant. I like that you're playing coy, but I know who you are (and your role on this board), so it's not a surprise

2) I could care less about the context of your tag. Anyone who sees it (without knowing the context) thinks you're retarded because you get confused by leafs. So to anyone who doesn't know the context? Completely failed? No. Not completely.

DopeyFish said:
Anyways back on topic!

When processing core companies say their core makes things more immersive, etc... they're talking completely out of their ass. Why? Because it's software side, not hardware side.

It doesn't matter what processor is running in there, the software has to be there or it's nothing but a little piece of a silicon wafer.

1) Should have stayed on topic in the first place.

2) You don't seem to understand what IBM in its entirety really is. Good job on trying to downplay IBM though!

*BlueTsunami salutes you*
 
DopeyFish said:
When processing core companies say their core makes things more immersive, etc... they're talking completely out of their ass. Why? Because it's software side, not hardware side.

It doesn't matter what processor is running in there, the software has to be there or it's nothing but a little piece of a silicon wafer.

I hardly think it's that black and white. Specialized hardware enables specialized software, as well as the inverse. It's not a one-way street. Purpose tends to be suited.
 
Then came back to life and his return is currently being waited for by hundred of millions

Jesus is the PSP then ?

BTW, I asked him about that stuff DCharlie was saying earlier about Sony reserving some of the 7 SPEs for the OS and not for games. He said he couldn't comment specifically on the Playstation 3 but could explain why it wasn't necessary or likely - they designed the Cell with the hypervisor able to partition access to bandwidth for all the SPEs, for real-time applications, so any OS wouldn't normally lock an entire SPE considering it wouldn't need the compute power. So I'm not sure if what DCharlie said is actually true.

Well, it might be a question of how the PS3 actually works versus how it's possible to avoid locking a full SPE for OS functions.

Certainly, an OS could have its own thread rather than its own SPE (?)

My info is old, it could all be different by now, but we shall see.
 
DCharlie said:
Jesus is the PSP then ?

I would think so. Though, we know that this rebirth will be happening this year with all the hot games coming out for it.



DCharlie said:
Well, it might be a question of how the PS3 actually works versus how it's possible to avoid locking a full SPE for OS functions.

Certainly, an OS could have its own thread rather than its own SPE (?)

My info is old, it could all be different by now, but we shall see.

It sounds to me (though I'll probably be corrected by someone more knowledgeable) like the game gets the full hardware dedicated to it, and when you go to the OS (like you would the Blade for the Xbox360) it pauses the game and reallocates resources to the OS on the fly, then everything returns to normal when you return to the game?

I don't know how feasible that is, but I'm reading it that way. It's going to be cool, though, to see what is done with the hypervisor.
 
DCharlie said:
Jesus is the PSP then ?



Well, it might be a question of how the PS3 actually works versus how it's possible to avoid locking a full SPE for OS functions.

Certainly, an OS could have its own thread rather than its own SPE (?)

My info is old, it could all be different by now, but we shall see.

D.C., Hofstee said that it was not necessary, not that it was impossible ;).

Jokes aside, I hope that PLAYSTATION 3 OS guys are working hard and well.
 