are going to have much info interesting to a non-engineer. Section 1 of each of those two docs is a general overview that should be somewhat understandable by someone with a moderate amount of computer knowledge.
The rest of the docs are really targeted at people writing code for Cell chips. Interesting to quickly browse through if you aren't a software engineer, but 90 percent of the info in the pdfs is very low level.
Probably the most interesting thing to look at in the docs is the diagram on page 20 of CBE_Architecture. Note how the SPUs and PPEs are labeled 0..N and connected to the Element Interconnect Bus. The Broadband Engine in the PS3 is just the first of many Cell chips to come. The architecture of Cell is designed to scale at will by adding more and more cores.
Once you have migrated your code to one Cell chip, it should be trivial to move it to more powerful Cell chips with more cores in the future. The PS3 will not be upgraded over the life of the platform, but taking a wider view of what Sony and NVidia are planning on working on together beyond the PS3, Cell will become a common media platform that scales from the smallest to the largest computing devices.
Does this surprise anyone, from the language extensions doc (p. 20)?
-----------------
... Programmer-directed branch prediction is provided using an enhanced version of GCC's __builtin_expect function.... For dynamic prediction, the value argument can be either a compile-time constant or a variable....
Dynamic Prediction Example
cond2 = ... /* predict a value for cond1 */
...
cond1 = ...
if (__builtin_expect(cond1, cond2)) {
    foo();
}
cond2 = cond1; /* predict that next branch is the same as the previous */
------------------
I expect you can get much of the same effect via profiler guided optimization (which is in VC++ 2005) without having to add any language extensions.
This would also tend to be less effort for a dev to implement, because it's pretty much automatic and you can apply it over your entire program, not just the pieces you sit down to optimize.
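For comparison, here is roughly what a hand-written static hint looks like with stock GCC, i.e. the kind of annotation a profile-guided build would work out for you. This is only a sketch: the LIKELY/UNLIKELY macro names and the sum_non_negative function are made up for illustration and are not from the Cell docs.

```c
#include <assert.h>
#include <stddef.h>

/* Conventional wrappers around GCC's __builtin_expect. The macro names are
   illustrative only; other compilers simply see the plain condition. */
#if defined(__GNUC__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Sum the non-negative entries of an array. Negative entries are assumed
   (for this example) to be rare, so the skip branch is hinted as unlikely.
   The hint affects code layout only; the result is the same without it. */
long sum_non_negative(const int *values, size_t count)
{
    long total = 0;
    assert(values != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (UNLIKELY(values[i] < 0)) {
            continue;   /* assumed-rare path */
        }
        total += values[i];
    }
    return total;
}
```

The hint is fixed at compile time, so if the "rare" assumption is wrong for a given data set, the hint is wrong for the whole run. That is the gap a hint taking a run-time value is meant to cover.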
Erhm - the point is that branch hints are dynamic in hardware - if you can only issue static hints, no language extension would make dynamic ones happen.
Remember, these are CPUs with no branch predictor. I completely agree profiler guided optimization would be great to have, but this extension will IMO still be useful with or without it.
Anyway, having gone through the docs, I see no microarchitecture details, instruction latencies, etc. Yes, we've seen some numbers about that before, but there was also some conflicting info (on load/store in particular). I'd like to see the final word on it.
Remember, these are CPUs with no branch predictor. I completely agree profiler guided optimization would be great to have, but this extension will IMO still be useful with or without it.
Good point, it is a nice extension to have on a CPU with no branch predictor.
PGO is really nice. If your profiler runs are representative of real data sets your app is going to process, then the branch hints inserted by the compiler will tend to be right more of the time, plus the optimizer will be able to do nice things like automatically reorder your code to improve locality and all sorts of other goodies -- best of all it doesn't make me do any extra work.
Yeah, I agree that PGO is going to be what you want to use 99% of the time. I could see some useful game things this dynamic branch prediction could do that the static b.p. that (I presume) PGO is currently doing couldn't, though - like dynamically predicting whether a collision happens or not, based on whether a region is more or less sparse, or on a counter of the last few collisions. Even on-chip branch prediction can only get so close to that ideal.
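A rough sketch of the "counter of the last few collisions" idea, assuming a 2-bit saturating counter as the predictor. Everything here is hypothetical: the PREDICT wrapper, the CELL_DYNAMIC_HINTS switch, and the collision_predictor type are names invented for the example. Stock GCC documents the second argument of __builtin_expect as a compile-time constant, so by default the wrapper just evaluates the condition; only a compiler with the extension quoted above would take the variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper. Define CELL_DYNAMIC_HINTS (an assumed name) only when
   building with a compiler that accepts a variable as the expected value. */
#if defined(CELL_DYNAMIC_HINTS)
#define PREDICT(cond, guess) __builtin_expect((cond), (guess))
#else
#define PREDICT(cond, guess) ((void)(guess), (cond))
#endif

/* 2-bit saturating counter: 0..1 predicts "no collision", 2..3 predicts "collision". */
typedef struct {
    unsigned counter;
} collision_predictor;

static int predictor_guess(const collision_predictor *p)
{
    return p->counter >= 2;
}

static void predictor_update(collision_predictor *p, int collided)
{
    if (collided) {
        if (p->counter < 3) p->counter++;
    } else {
        if (p->counter > 0) p->counter--;
    }
}

/* Replays a recorded sequence of collision outcomes (nonzero = collided)
   through the hinted branch and returns how many guesses were correct. */
size_t count_correct_guesses(const int *outcomes, size_t count)
{
    collision_predictor p = {0};
    size_t correct = 0;
    assert(outcomes != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        int guess = predictor_guess(&p);      /* plays the role of cond2 */
        int collided = outcomes[i] != 0;      /* plays the role of cond1 */
        if (PREDICT(collided, guess)) {
            /* collision response would go here */
        }
        correct += (size_t)(guess == collided);
        predictor_update(&p, collided);       /* learn from the real outcome */
    }
    return correct;
}
```

The counter needs two misses in a row before it flips its guess, so a single stray collision in a sparse region doesn't throw the hint off. Whether the bookkeeping costs less than the mispredictions it avoids is something you would have to measure on the actual hardware.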
I wonder if PGO is going to be improved to insert dynamic branch prediction. I also wonder - is it useful to expect a variable is going to be equal to itself?
__builtin_expect(cond1,cond1)
I'm curious how this is implemented.