For the interested, IBM posted another paper on one of their remaining E3 demos, the Alias cloth solver. IGN reported on it at the time:
The next demo was based on a new cloth simulation algorithm being worked into Maya. Again using two Cell processors, the demo was able to run 16 separate simulations simultaneously. Each piece of cloth was defined by 300 vertices, but the real kicker with this demo is that the algorithm incorporated self-intersecting physics, keeping the cloth from flowing through itself. This sort of simulation is much more computationally intensive than simulating a cloth against another object.
The IBM paper is here:
http://www.research.ibm.com/cell/whitepapers/alias_cloth.pdf
The paper compares the performance of a 2.4 GHz Cell against a 3.6 GHz Pentium 4. Interesting points:
- One SPU at 2.4 GHz performs better than a P4 at 3.6 GHz.
- 8 SPUs provide about a 5x speedup over the P4. If you scaled the improvement by clock speed (3.6/2.4 = 1.5), it'd be more like 7.5x, i.e. close to linear scaling across the 8 SPUs.
- The PPE's VMX unit appears to perform very poorly, for whatever reason. Possibly because they were using the initial DD1 revision of Cell, which seems to have a weaker VMX unit than DD2..?
edit - Apparently the chips they were using are actually the second (DD2) revision of Cell, which makes the relative VMX performance puzzling.
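For anyone who wants to check the clock-normalization arithmetic in the speedup point, here's a quick sketch. It naively assumes performance scales linearly with clock speed, which real workloads (memory-bound ones especially) often don't, and the constant and function names are just my own labels, not anything from the IBM paper.

```python
# Clock-normalized speedup using the figures quoted above.
# Assumption: performance scales linearly with clock speed (a simplification).
CELL_CLOCK_GHZ = 2.4
P4_CLOCK_GHZ = 3.6
MEASURED_SPEEDUP_8_SPU = 5.0  # approximate 8-SPU speedup over the P4
NUM_SPUS = 8


def clock_normalized_speedup(speedup, cell_ghz, p4_ghz):
    """Scale a measured speedup as if both chips ran at the same clock."""
    return speedup * (p4_ghz / cell_ghz)


normalized = clock_normalized_speedup(MEASURED_SPEEDUP_8_SPU, CELL_CLOCK_GHZ, P4_CLOCK_GHZ)
# Fraction of ideal linear scaling (8x for 8 SPUs).
efficiency = normalized / NUM_SPUS

print(f"clock-normalized speedup: {normalized:.1f}x")
print(f"fraction of perfect {NUM_SPUS}x scaling: {efficiency:.0%}")
```

That works out to about 7.5x, or roughly 94% of a perfect 8x, which is why it reads as near-linear scaling.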