It's similar to the
SPU of a CELL...
CELL comprises two different types of cores (it's heterogeneous); the SPU is but one of them, and it would take at least a dozen Tempest Engines (each equivalent to eight SPUs) to run that urban landscape demo -- which means a PS3 didn't produce that RT'd imagery by itself. It was actually assisted over a network by seven QS20 CELL-based blades (14 CELL processors)...
One of the visualization architects responsible for the demo wrote on his blog:
Sounds sweet... Sweeney would be all over something like that. Years ago he voiced:
I'd like to think that if a "CGPU" of Sweeney's (and your) description is the future of rendering, then a CELL-based CGPU may be biding its time given that:
- CELL demonstrated it could trounce a top-of-the-line GPU at software-based RT
despite being significantly disadvantaged in terms of transistors and flops (in line with
Sweeney's circa '99 prediction for '06-7: "CPU's driving the rendering process"... "3D chips will likely be deemed a waste of silicon")
- the
PlayStation Shader Language (PSSL) is based on the same ANSI C standard that superheads from MIT
used to mask CELL's complexity
-
SPUs are programmable in C/C++, so SPU support for PSSL's C++ structs and members could be added with little or no hassle, which means
the benefits of PSSL would likely extend to a massive many-core CELL-based CGPU designed to
run shaders across a legion of SPUs and CUs using a single simple shader language
- entries [0015], [0016], [0017], [0052]
of this patent and entries [0017], [0018], [0033]
of this patent say that the described methods for backwards compatibility can be implemented on "new" processors in various ways
-
IBM's open-source customizable A2I processor core for SoC devices has a number of features (it can run in little-endian mode; it addresses x86 emulation) that make it a prime candidate to replace
CELL's PPE (which had instructions for translating little-endian data and addressed x86, PS1, PS2 and PSP emulation; A2I's LE mode would add PS4, PS5 and presumably PS6 emulation)
- FreeBSD (
PS4's OS is based on 9.0; the PS5's is presumed to be based on 12.0)
supports PowerPC; A2I is a PowerPC core with little-endian (x86) and big-endian (PPC) support; an A2I/CELL-based PS7 could run a little-endian OS carried over from an x86-based PS6, or run an entirely new one written with little- or big-endian byte ordering
- FreeBSD now
only supports LLVM's Clang compiler
- Clang/LLVM (frontend/backend compilers; SIE made the full
switch to the Clang x86 frontend during PS4 dev)
supports PowerPC (frontend), CELL
SPU (a backend courtesy of the Aerospace Corporation) and Radeon (backend)
- AMD's open-source Clang/LLVM-based compiler with
support for PowerPC offloading to Radeon under Linux can serve as a reference for an SIE Clang/LLVM-based compiler that supports PPC (A2I)/CELL (SPU) offloading to Radeon under FreeBSD (
Linux and FreeBSD were cut from the same cloth)
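For what it's worth, the "offloading" AMD's open Clang/LLVM toolchain does today is driven by standard OpenMP target directives. The sketch below is a minimal, generic example of that mechanism in C -- nothing SIE- or PSSL-specific, and the idea of a PPC (A2I)/SPU host feeding a Radeon this way under FreeBSD is purely my speculation:

```c
/* Illustrative only: standard OpenMP target offloading in C, the same
 * mechanism AMD's open Clang/LLVM toolchain uses to move loops from a
 * host CPU onto a Radeon GPU. A PPC (A2I)/SPU host feeding a Radeon this
 * way under FreeBSD is my speculation, not something any shipping
 * toolchain advertises. */
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 20)

int main(void)
{
    float *a = malloc(N * sizeof *a);
    float *b = malloc(N * sizeof *b);
    float *c = malloc(N * sizeof *c);
    if (!a || !b || !c)
        return 1;

    for (int i = 0; i < N; i++) {
        a[i] = (float)i;
        b[i] = 2.0f * (float)i;
    }

    /* The compiler outlines this loop for the GPU target and handles the
     * to/from copies described by the map clauses. */
    #pragma omp target teams distribute parallel for \
            map(to: a[0:N], b[0:N]) map(from: c[0:N])
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[123] = %f\n", c[123]); /* expect 369.0 */
    free(a);
    free(b);
    free(c);
    return 0;
}
```

Built with AMD's toolchain that's something like clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa; build it without an offload target and the region simply runs on the host CPU.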
Interestingly, some folks at Pixar seem to think an architecture that melds CPU and GPU characteristics would be the preferred option for path-tracers, and anticipate such a chip may appear by 2026:
IMO, the unfortunate thing about their future outlook is that
the pros and cons of the "CPU vs. GPU" debate were rendered moot over a decade ago. I recall Kutaragi saying that he expected CELL to morph into a sort of integrated CGPU at some juncture and wanted Sony to make a business of selling them:
Too bad SCE wasn't able to start up that business and become a successful vendor of CELL-based CGPUs back then. There's no telling how customer feedback from the likes of Pixar might've influenced future designs or what of those designs might've trickled down to PS consoles -- but all isn't lost. If SIE were to bring CELL out of "cryopreservation" for a CGPU, there are at least two coders (one of whom is an authority on CELL) who'd jump at the chance to help shape its feature set and topology...
-
Mike Acton (CELL aficionado, former Insomniac Games Engine Director, current Director of DOTS architecture at Unity;
career timestamped) had a few things he wanted to see implemented that would
bring CELL's capabilities closer to those of today's GPUs
- Michael Kopietz wants to
play with a "Monster!" -- a 728 SPU monster...
The 'Monsters Inc.' slide says SPEs can replace specialized hardware. I'm not much of a techie, but I presume SIE would skip dedicated ML hardware so the algorithms could benefit from the theoretical higher clock frequency, wide SPU parallelism and massive internal bandwidth of a "CELL2". A BCPNN study showed that
CELL was extremely performant vs. a top-tier x86 CPU from its era due mainly to the chip's internal bus
(EIB) bandwidth.
I think a CELL2 with buses broad and fast enough to shuttle a few TB/s of data around internally to hundreds of high-frequency SPEs, drastically accelerating ML (and other workloads in parallel), would be better than having dedicated ML hardware (and other dedicated hardware) constrained by GPU memory interface bandwidth limits and taking up space that could otherwise go to more TMUs or ROPs on the integrated GPU.
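To make that concrete, here's a rough sketch (mine, not from the linked material) of the dense multiply-accumulate loop most ML inference reduces to, written against IBM's published SPU C intrinsics -- the kind of kernel you'd tile across SPE local stores and feed over the EIB (which peaked at roughly 204.8 GB/s on the original CELL):

```c
/* Rough sketch (mine, not from the linked material) of the dense
 * multiply-accumulate loop most ML inference reduces to, written against
 * IBM's published SPU C intrinsics. Data would be tiled into each SPE's
 * 256 KB local store and streamed over the EIB. */
#include <spu_intrinsics.h>

/* acc += W_row * x for one row strip of a weight matrix already resident
 * in local store; 'cols' must be a multiple of 4 (vector float = 4 floats). */
void row_madd(const vector float *w_row, const vector float *x,
              vector float *acc, unsigned int cols)
{
    vector float sum = *acc;
    for (unsigned int i = 0; i < cols / 4; i++)
        sum = spu_madd(w_row[i], x[i], sum); /* 4-wide fused multiply-add */
    *acc = sum;
}
```

Each spu_madd is a 4-wide FMA -- 8 flops per cycle, which is where the original SPE's 25.6 GFLOPS at 3.2 GHz figure comes from; the horizontal reduction of sum is left out for brevity.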
The re-emergence of CELL would not only give SIE an opportunity to further strengthen its collaborative ties with Epic and possibly establish a new one with Unity, but also allow Sony to pick up where it left off with ZEGO (a.k.a.
the BCU-100, used to accelerate CG production workflows with Houdini Batch --
the Spiderman/GT mash-up for example) and become a seller of CGPUs to Pixar and other studios. I'm sure if SIE really wanted to, it could come up with something
in collaboration with all interested parties to satisfy creatives of all stripes in the game and film industries -- from Polyphony to Pixar...
The thought of a future PS console potentially running a first or third-party game engine that drives something like the above Pixar KPCNN in real-time intrigues me. My wild imagination envisions a many-core CELL2 juggling:
-
geometry processing (Nanite on SPUs in UE5's case)
- physically accurate expressions of face, hair, gestures, movement, collision, destruction, fluid dynamics, etc.
- primary ray casts
-
BVH traversal (video of scene
shown on the page) for multi-bounce path-tracing
-
SPU-based shading and pre/post-processing of
diffuse and specular data (Lumen on SPUs in UE5's case)
-
Monte Carlo random number generation for random sampling of rays
- filtering (denoising) the diffuse and specular ray data via two convolutional neural networks (see the convolution sketch a little further down; CELL
has a library for convolution functions,
consumes neural networks of all types and is
flexible in how it processes them)
while an integrated GPU from the previous gen (or a bare-spec, next-gen entry-level GPU) dedicates every single flop of compute it has to the trivial tasks of merging the diffuse/specular components of frames that were rendered and denoised by SPUs, then displaying the composited frame at native resolution (the GPU would only do these two tasks and provide for backwards compatibility; in BC mode
Super-Resolution could be
done by SPEs accessing the framebuffer to
work their magic on pixels)
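As for the denoising item above: stripped of the network around it, the work a kernel-predicting CNN hands the hardware is convolution over per-pixel feature buffers. A minimal single-channel 3x3 convolution in plain C looks like this -- illustrative only, neither Pixar's network nor the API of CELL's convolution library:

```c
/* Minimal single-channel 3x3 convolution in plain C -- the core operation
 * a convolutional denoiser applies (many times, over many feature
 * channels) to noisy diffuse/specular buffers. It only shows the
 * data-parallel shape of the work an SPE (or CU) would be handed. */
void conv3x3(const float *src, float *dst, int width, int height,
             const float kernel[9])
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            float sum = 0.0f;
            for (int ky = -1; ky <= 1; ky++) {
                for (int kx = -1; kx <= 1; kx++) {
                    /* clamp-to-edge addressing at the image border */
                    int sy = y + ky;
                    int sx = x + kx;
                    if (sy < 0) sy = 0;
                    if (sy >= height) sy = height - 1;
                    if (sx < 0) sx = 0;
                    if (sx >= width) sx = width - 1;
                    sum += src[sy * width + sx] * kernel[(ky + 1) * 3 + (kx + 1)];
                }
            }
            dst[y * width + x] = sum;
        }
    }
}
```

On a CELL-style design each SPE would DMA a tile of the noisy buffer into its 256 KB local store, run vectorized loops like this, and DMA the filtered tile back out -- the same tiling discipline SPU code has always required.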
Given that
George Lucas and Steven Spielberg used to discuss the future of CG with Kutaragi, I think a hybrid rendering system of this sort would've been front of mind for a Kutaragi in the era of CGPUs. With
SPUs acting as a second compute resource alongside the GPU for every
compute-intensive rendering workload, the GPU would be free to fly at absurdly high framerates. Kaz wants fully
path-traced visuals (timestamped) in native
4K at 240 fps for his GT series (for VR I guess); I suspect a CGPU with 728 enhanced SPEs at 91.2 GFLOPS per SPE (3x the ALUs clocked at 3.8 GHz) would give it to him. Maybe we'd even get that photorealistic army of screaming orcs
Kutaragi was talking about too (timestamped).
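For anyone checking my math on that 728-SPE figure, here's the back-of-the-envelope version, starting from the original SPE's published single-precision peak (8 flops per cycle, i.e. 25.6 GFLOPS at 3.2 GHz) and layering on my own assumptions (tripled ALUs, 3.8 GHz, 728 SPEs):

```c
/* Back-of-the-envelope check of the figures above, starting from the
 * original SPE's published peak (8 single-precision flops per cycle,
 * i.e. 25.6 GFLOPS at 3.2 GHz) and layering on assumed specs
 * (tripled ALUs, 3.8 GHz, 728 SPEs). None of this is an announced part. */
#include <stdio.h>

int main(void)
{
    const double flops_per_cycle = 8.0 * 3.0; /* 4-wide FMA, tripled ALUs */
    const double clock_hz        = 3.8e9;     /* assumed SPE clock        */
    const int    spe_count       = 728;       /* Kopietz's "monster"      */

    double per_spe_gflops = flops_per_cycle * clock_hz / 1e9;
    double total_tflops   = per_spe_gflops * spe_count / 1e3;

    printf("per SPE: %.1f GFLOPS\n", per_spe_gflops); /* 91.2  */
    printf("total  : %.1f TFLOPS\n", total_tflops);   /* ~66.4 */
    return 0;
}
```

That lands at roughly 66 TFLOPS of single-precision SPE compute before the Radeon side is even counted -- speculative, obviously, but it shows the 91.2 GFLOPS-per-SPE number isn't pulled out of thin air.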
Realistically though, the current regime has a good thing going with its "PC cycle incrementalism bolstered by PS3's post-mortem" approach to hardware; so it'll likely work with AMD to come up with a
Larrabee-like x86/Radeon design influenced by aspects of
CELL's EIB rings. So long as it's $399, can upscale to the native res of the day, run previous-gen games at ~60 fps and show a marginal increase in character behavior/world simulation over what we have today, it'll be met with praise from consumers.
Personally, I'd love to see them do more than just balance price and performance on a dusty 56-year-old model born of Moore's law. I'd be elated if they balanced the two on one of CELL's models, because
unlike the "modern" CPUs an AMD-sourced CGPU would descend from, CELL:
-
doesn't waste half its die area on cache; more of it goes to
ALU (in red) instead to help achieve
multiples of performance with greater power efficiency
-
was intended to outpace the performance of Moore's law and designed to break free of
the programming model the law gave rise to
- was designed to effectively challenge
Gelsinger's law and Hofstee's corollary (two bosom buddies of Moore's law that use cache size, hardware branch prediction, etc. to
limit traditional processor design performance gains to 1.4x (i.e. ~40%) despite getting 2x the transistors, and
drive power efficiency down by ~40% -- and they won't be going anywhere without fundamental changes in chip design)
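As a rough illustration of the rule of thumb that last bullet describes (doubling a conventional core's transistors buying only ~1.4x performance), here's a back-of-the-envelope sketch assuming performance scales with the square root of the transistor budget -- my simplification, not a formula from the linked material:

```c
/* Quick sketch of the scaling rule described above, using the common
 * simplification that conventional core performance grows roughly with
 * the square root of the transistor budget (an illustration, not a
 * formula from the linked sources). */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double transistors = 1.0;
    for (int doubling = 1; doubling <= 4; doubling++) {
        transistors *= 2.0;
        double perf = sqrt(transistors); /* ~1.4x gain per doubling */
        printf("%4.0fx transistors -> %.2fx perf, %.2fx perf per transistor\n",
               transistors, perf, perf / transistors);
    }
    return 0;
}
```

Perf per transistor (a crude stand-in for efficiency) keeps sliding with every doubling, which is the treadmill CELL was designed to step off.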
All things being equal (node, transistor count, size, modern instruction sets, iGPU) it seems to me that a CELL/Radeon CGPU would be way more performant and much more power efficient than a Larrabee-like x86/Radeon CGPU for what I presume would be similar costs. That means a lot more bang for my buck and I'm all for
that kind of "balance".
Hopefully the merits of CELL won't continue to go ignored.