Cheers Fredi - these were the relevant bits that I could see. I don't know if any of this is new since Monday:
10.2 The Design and Implementation of a First-Generation CELL Processor
9:00 AM
D. Pham¹, S. Asano², M. Bolliger¹, M. Day¹, H. Hofstee¹, C. Johns¹, J. Kahle¹,
A. Kameyama³, J. Keaty¹, Y. Masubuchi², M. Riley¹, D. Shippy¹, D. Stasiak¹,
M. Wang¹, J. Warnock¹, S. Weitzel¹, D. Wendel¹, T. Yamazaki¹, K. Yazawa²
¹IBM, Austin, TX
²Sony, Tokyo, Japan
³Toshiba, Austin, TX
A CELL Processor is a multi-core chip consisting of a 64b Power
architecture processor, multiple streaming processors, a flexible IO
interface, and a memory interface controller. This SoC is implemented in
90nm SOI technology. The chip is designed with a high degree of
modularity and reuse to maximize the custom circuit content and achieve
a high-frequency clock-rate.
7.4 A Streaming Processing Unit for a CELL Processor
3:15 PM
B. Flachs¹, S. Asano², S. Dhong¹, P. Hofstee¹, G. Gervais¹, R. Kim¹, T. Le¹,
P. Liu¹, J. Leenstra³, J. Liberty¹, B. Michael, S. Mueller³, O. Takahashi¹,
Y. Watanabe², A. Hatakeyama⁴, H. Oh¹, N. Yano²
¹IBM, Austin, TX
²Toshiba, Austin, TX
³IBM, Boeblingen, Germany
⁴Sony, Austin, TX
The design of a 4-way SIMD streaming data processor emphasizes
achievable performance in area and power. Software controls data
movement and instruction flow, and improves data bandwidth and pipeline
utilization. The micro-architecture minimizes instruction latency and
provides fine-grain clock control to reduce power.
20.3 A Double-Precision Multiplier with Fine-Grained Clock-Gating Support for a First-Generation CELL Processor
9:30 AM
J. Kuang¹, T. Buchholtz², S. Dance², J. Warnock³, S. Storino², D. Wendel⁴
¹IBM, Austin, TX
²IBM, Rochester, MN
³IBM, Yorktown Heights, NY
⁴IBM, Böblingen, Germany
A double-precision multiplier for a 90nm SOI CELL processor is
presented. Dynamic Booth logic is designed for scalability and with noise,
leakage, and pulse-width variation tolerance. Static partial-product
compression is implemented with replicated bits for performance. The
design supports fine-grained clock gating domains for active power reduction.
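For anyone wondering what the "Booth logic" in that abstract does: below is a rough Python sketch of radix-4 Booth recoding, the textbook way of roughly halving the number of partial products in a multiplier. The abstract does not say which radix or encoding the CELL multiplier uses, so radix-4 is purely my assumption, and the function names are made up for illustration.

```python
# Illustrative sketch of radix-4 Booth recoding (textbook form).
# Assumption: radix-4 is NOT confirmed by the abstract; names are hypothetical.

def booth_radix4_digits(multiplier: int, bits: int) -> list[int]:
    """Recode a signed two's-complement multiplier into digits in {-2..2}.

    Digits are returned least significant first; digit i has weight 4**i.
    `multiplier` must fit in `bits` bits as a signed value.
    """
    if bits % 2:
        bits += 1  # pad to an even width; Python's >> sign-extends for us
    digits = []
    prev = 0  # the implicit bit to the right of bit 0
    for i in range(0, bits, 2):
        b0 = (multiplier >> i) & 1
        b1 = (multiplier >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(a: int, b: int, bits: int) -> int:
    """Multiply a by b by summing one partial product per Booth digit."""
    total = 0
    for i, d in enumerate(booth_radix4_digits(b, bits)):
        total += d * a * 4**i  # each partial product is 0, +-a or +-2a, shifted
    return total


if __name__ == "__main__":
    print(booth_radix4_digits(7, 4))   # two digits instead of four bits
    print(booth_multiply(13, -3, 4))
```

The point is that each digit only needs 0, ±a or ±2a, which are cheap to generate in hardware (shift and negate), so an n-bit multiplier produces about n/2 partial products for the compression tree instead of n.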
26.7 A 4.8GHz Fully Pipelined Embedded SRAM in the Streaming Processor of a CELL Processor
4:15 PM
T. Asano¹, T. Nakazato², S. Dhong³, A. Kawasumi², J. Silberman⁴,
O. Takahashi³, M. White³, H. Yoshihara⁵
¹IBM, Yasu, Japan
²Toshiba, Austin, TX
³IBM, Austin, TX
⁴IBM, Yorktown Heights, NY
⁵Sony, Austin, TX
A 6-stage fully pipelined embedded SRAM is implemented in a 90nm SOI
technology. The array uses a conventional 6-transistor memory cell and
sense amplifier to achieve the cycle time while minimizing the impact of
device variation. A sum-addressed pre-decoder allows partial activation for
power savings.
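To put the SRAM numbers in perspective, here is a quick back-of-envelope calculation. It assumes the array really is clocked at the 4.8GHz from the title and that each of the 6 pipeline stages takes exactly one cycle; the abstract states neither outright, so treat the result as a rough estimate.

```python
# Back-of-envelope timing for the embedded SRAM abstract.
# Assumptions (not stated in the abstract): the clock is the 4.8GHz from the
# title, and each of the 6 pipeline stages takes exactly one clock cycle.

CLOCK_HZ = 4.8e9
PIPELINE_STAGES = 6


def cycle_time_ps(clock_hz: float) -> float:
    """Clock period in picoseconds."""
    return 1e12 / clock_hz


def pipeline_latency_ns(clock_hz: float, stages: int) -> float:
    """Time for one access to pass through all stages, in nanoseconds."""
    return stages * 1e9 / clock_hz


if __name__ == "__main__":
    print(f"cycle time: {cycle_time_ps(CLOCK_HZ):.1f} ps")
    print(f"latency:    {pipeline_latency_ns(CLOCK_HZ, PIPELINE_STAGES):.2f} ns")
```

Under those assumptions that is roughly a 208 ps cycle and 1.25 ns from address in to data out, with a new access able to start every cycle since the array is fully pipelined.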
28.9 Clocking and Circuit Design for a Parallel I/O on a First-Generation CELL Processor
4:45 PM
K. Chang¹, S. Pamarti¹, K. Kaviani¹, E. Alon¹,², X. Shi¹, T. Chin¹, J. Shen¹,
G. Yip¹, C. Madden¹, R. Schmitt¹, C. Yuan¹, F. Assaderaghi¹, M. Horowitz¹,²
¹Rambus, Los Altos, CA
²Stanford University, Stanford, CA
A parallel I/O is integrated on a first-generation CELL processor in 90nm SOI
CMOS. A clock-tracking architecture suppresses reference jitter to achieve
6.4Gb/s/link operation at 21.6mW/Gb/s. SOI effects on analog circuits, in particular
high-speed receivers, are addressed to achieve a receiver sensitivity of
±12mV at 6.4Gb/s with BER < 10⁻¹⁴ measured using 7b PRBS data.
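Some quick arithmetic on the I/O figures, as a sketch. The per-link power is just the quoted rate times the quoted mW/Gb/s. The error interval treats the quoted BER as an upper bound on a steady error rate, and I am assuming "7b PRBS" means a standard maximal-length 7-bit sequence; neither of those readings is spelled out in the abstract.

```python
# Quick arithmetic on the parallel I/O abstract's figures.
# Assumptions: the BER bound is treated as a steady error rate, and "7b PRBS"
# is taken to be a maximal-length 7-bit sequence (period 2**7 - 1).

RATE_GBPS = 6.4
MW_PER_GBPS = 21.6
BER_BOUND = 1e-14


def link_power_mw(rate_gbps: float, mw_per_gbps: float) -> float:
    """Power drawn by one link at the given data rate, in milliwatts."""
    return rate_gbps * mw_per_gbps


def mean_seconds_between_errors(rate_gbps: float, ber: float) -> float:
    """Average time between bit errors on one link at the given BER."""
    return 1.0 / (rate_gbps * 1e9 * ber)


def prbs_period_bits(register_bits: int) -> int:
    """Period of a maximal-length PRBS from an n-bit shift register."""
    return 2**register_bits - 1


if __name__ == "__main__":
    print(f"power per link: {link_power_mw(RATE_GBPS, MW_PER_GBPS):.2f} mW")
    hours = mean_seconds_between_errors(RATE_GBPS, BER_BOUND) / 3600
    print(f"mean time between errors: at least {hours:.2f} hours")
    print(f"PRBS period: {prbs_period_bits(7)} bits")
```

So each link burns about 138 mW at full rate, and a BER below 10⁻¹⁴ works out to less than one bit error every four hours or so per link.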
edit - this might also be relevant (?):
20.1 An 8GHz Floating Point Multiply
8:30 AM
W. Belluomini, D. Jamsek, A. Martin, C. McDowell, R. Montoye,
T. Nguyen, H. Ngo, J. Sawada, I. Vo, R. Datta
IBM, Austin, TX
The implementation of the mantissa portion of a floating-point multiply
(54x54b) is described. The 0.124mm² multiplier is implemented using
limited switch dynamic logic and operates at speeds up to 8GHz in a 90nm
SOI technology. The multiplier dissipates between 150mW and 1.8W as it
scales between 2GHz and 8GHz.
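It is worth working out what those power figures mean per operation. The sketch below assumes the 150mW figure belongs to the 2GHz end and the 1.8W figure to the 8GHz end (the abstract implies this pairing but does not state it), and that the multiplier completes one multiply per cycle since it is pipelined.

```python
# Energy per cycle and power density for the 8GHz multiplier abstract.
# Assumptions: 150mW pairs with 2GHz and 1.8W with 8GHz; one multiply
# completes per clock cycle.

AREA_MM2 = 0.124
LOW = {"freq_hz": 2e9, "power_w": 0.150}
HIGH = {"freq_hz": 8e9, "power_w": 1.8}


def energy_per_cycle_pj(power_w: float, freq_hz: float) -> float:
    """Energy spent per clock cycle, in picojoules."""
    return power_w / freq_hz * 1e12


def power_density_w_per_mm2(power_w: float, area_mm2: float) -> float:
    """Power dissipated per unit area, in W/mm^2."""
    return power_w / area_mm2


if __name__ == "__main__":
    for name, point in (("2GHz", LOW), ("8GHz", HIGH)):
        pj = energy_per_cycle_pj(point["power_w"], point["freq_hz"])
        print(f"{name}: {pj:.0f} pJ per cycle")
    density = power_density_w_per_mm2(HIGH["power_w"], AREA_MM2)
    print(f"power density at 8GHz: {density:.1f} W/mm^2")
```

Under those assumptions, quadrupling the clock costs about three times the energy per multiply (75 pJ versus 225 pJ), presumably because the supply voltage has to rise to hit 8GHz, and the power density at the top end is around 14.5 W/mm².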