
Tesla's Dojo and Japan's Fugaku share a lot of ideas with the Cell, 16 years later

LordOfChaos

Member




Let's see here...

-An in-order CPU with SMT commanding wide SIMD units, reducing complexity over out-of-order in favor of more transistors doing SIMD and other functions that make things fast
-No or not much cache, largely uses local memory, same idea as above
-No GPU in the mix, everything done on a CPU. GPUs just ended up being good at massively parallel compute, but you don't /need/ a GPU to do that if you're not a GPU design company to start with.
-Heavy focus on fabric bandwidth, a unit can do a job and quickly pass it off, do both a calculation and transfer in the same cycle

The world's top supercomputer, Fugaku, shares a lot of similar principles: there's no GPU in the mix, but the A64FX CPUs have a heavy focus on SIMD. A CPU-only system becoming the top supercomputer in the world is wild!

This is not, of course, a thread saying Cell was comparable in performance to this 16 years ago, but the ideas that seemed bizarre and complicated back then are now well represented in the world's top supercomputer, and in Tesla's Dojo, which may also have a shot up there at a targeted 1.1 exaflops.

Was it worth it in a game console? That's a whole different debate, and I'd tend to side with no. But until 2009 it was also powering another top supercomputer before development waned, and these ideas are back in vogue now. I wonder, if they had kept going with it, whether it could have been represented in something like these.

Just thought this was interesting watching AI day.
 

LordOfChaos

Member
My ps3 was going to connect to all ps3's worldwide to increase framerates.

What happened to that?


And your fridge and your toaster!


Some of the ideas were definitely Krazy, but the overall idea of the architecture did end up being the way top computers 16 years later massively scale, so there were some good ideas there too.
 
This is interesting stuff, but I dunno how well it actually does in the grand scheme of things.

D1 has comparable FP32 performance (22TF) vs Nvidia A100 (19.5TF FP32). They both also have similar BF16 performance (362TF for Dojo vs 316TF for A100).
Trouble is, A100 is already on the market and has been for ages and is gonna be replaced with H100 next year.
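
For a rough back-of-the-envelope view of those numbers, here is a minimal sketch. It only uses the peak figures quoted in this thread (paper specs, not measurements), and it assumes the 1.1 exaflop target mentioned earlier is counted in BF16 peak across D1 chips, which is an assumption on my part. The function names are made up for illustration.

```python
import math

# Peak figures as quoted in this thread, in TFLOPS. Paper numbers, not benchmarks.
PEAK_TFLOPS = {
    "D1": {"fp32": 22.0, "bf16": 362.0},
    "A100": {"fp32": 19.5, "bf16": 316.0},
}


def peak_ratio(metric):
    """How many times the A100's quoted peak the D1's quoted peak is."""
    return PEAK_TFLOPS["D1"][metric] / PEAK_TFLOPS["A100"][metric]


def chips_for_target(target_exaflops, per_chip_tflops):
    """Chips needed to reach a peak target (1 exaflop = 1e6 teraflops)."""
    return math.ceil(target_exaflops * 1e6 / per_chip_tflops)


if __name__ == "__main__":
    print(f"FP32 ratio D1/A100: {peak_ratio('fp32'):.2f}")
    print(f"BF16 ratio D1/A100: {peak_ratio('bf16'):.2f}")
    # Assumption: the 1.1 exaflop target is BF16 peak, summed over D1 chips.
    print("D1 chips for 1.1 EF:", chips_for_target(1.1, PEAK_TFLOPS["D1"]["bf16"]))
```

On paper that is a lead of roughly 13 to 15 percent per chip, and on the order of three thousand chips for the exaflop target, before any losses from real-world utilization.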
 

M1chl

Currently Gif and Meme Champion
And this sort of implies what? That it's a good solution for a really narrow set of applications?

Sony putting Cell in the PS3 is akin to MS bundling Kinect with X360/X1, both terrible technologies for gaming.

Also let's first see what they can do with it, because Musk is CEO of Vaporware.
 

LordOfChaos

Member
This is interesting stuff, but I dunno how well it actually does in the grand scheme of things.

D1 has comparable FP32 performance (22TF) vs Nvidia A100 (19.5TF FP32). They both also have similar BF16 performance (362TF for Dojo vs 316TF for A100).
Trouble is, A100 is already on the market and has been for ages and is gonna be replaced with H100 next year.

I think a bigger difference may show itself on usable flops, similar to the FSD 3 computer in the cars. The flops are a paper theoretical metric, the better tailored the hardware is to your model the better you can use them. Things like the 1 cycle fabric transfer latency are probably big helps to keeping the utilization up.
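
To put the usable-flops point in numbers, here is a minimal sketch. The peak figures are the ones quoted in this thread; the utilization values are purely hypothetical placeholders, since nobody here has measured either chip, and the function names are made up for illustration.

```python
def usable_tflops(peak_tflops, utilization):
    """Sustained throughput: paper peak scaled by the fraction actually kept busy."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak_tflops * utilization


def breakeven_utilization(peak_a, peak_b, util_b):
    """Utilization chip A needs in order to match chip B's usable throughput."""
    return usable_tflops(peak_b, util_b) / peak_a


if __name__ == "__main__":
    # BF16 peaks as quoted in the thread (TFLOPS).
    d1_peak, a100_peak = 362.0, 316.0
    # Hypothetical: suppose a given model keeps the A100 50% busy.
    a100_util = 0.5
    print("A100 usable TFLOPS:", usable_tflops(a100_peak, a100_util))
    print("D1 utilization needed to match:",
          round(breakeven_utilization(d1_peak, a100_peak, a100_util), 3))
```

The point of the sketch is just that the comparison is decided by peak times utilization, so hardware tailored to the model can come out ahead even with a similar paper spec.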

Nvidia also keeps cranking up what they charge for their training chips, so if you only match A100, that's not a bad spec either once you get to scale. This is the trend with cloud vendors like Amazon, Google, and Microsoft already, once you have to pay enough in hardware licensing fees you start to consider your own hardware.


And this sort of implies what? That it's a good solution for a really narrow set of applications?

Sony putting Cell in the PS3 is akin to MS bundling Kinect with X360/X1, both terrible technologies for gaming.

Also let's first see what they can do with it, because Musk is CEO of Vaporware.

Musk aside, Fugaku is the world's fastest supercomputer right now and is built on similar principles. And yeah, on a console it was somewhere between a mixed bag and a waste of effort, both things I already mentioned. It did live on in the PowerXCell 8i, which held a top supercomputer spot for a while, until 2009, but years later we're back to the idea in the top spot.
 
I think a bigger difference may show itself on usable flops, similar to the FSD 3 computer in the cars. The flops are a paper theoretical metric, the better tailored the hardware is to your model the better you can use them. Things like the 1 cycle fabric transfer latency are probably big helps to keeping the utilization up.

Nvidia also keeps cranking up what they charge for their training chips, so if you only match A100, that's not a bad spec either once you get to scale. This is the trend with cloud vendors like Amazon, Google, and Microsoft already, once you have to pay enough in hardware licensing fees you start to consider your own hardware.

Problem is, by the time this is ready at scale, H100 will exist and probably double A100's performance.
Radeon Instinct MI200 is already shipping to HPC clients with similar BF16 performance, but significantly better FP performance (current estimates at 42TF FP64 / 84TF FP32), and MI300 is due to launch H2 2022 as well.

And let us not forget Intel's Ponte Vecchio, which will also be out H2 2022.
 

LordOfChaos

Member
Problem is, by the time this is ready at scale, H100 will exist and probably double A100's performance.
Radeon Instinct MI200 is already shipping to HPC clients with similar BF16 performance, but significantly better FP performance (current estimates at 42TF FP64 / 84TF FP32), and MI300 is due to launch H2 2022 as well.

And let us not forget Intel's Ponte Vecchio, which will also be out H2 2022.

I mean, hardware will always leapfrog itself like this. They said they'd have it online next year; if H100 ships Q2 2022, when could they have a 1 exaflop system built out of it? This is a self-described gen 1 and there's no doubt more coming, and now they're not bound to the release cycles and margins of another company (sans the fab).

If they're a year behind but control their own silicon that's not the worst thing in the world, cloud vendors that get big enough all seem to tend towards making their own.
 

Goalus

Member
And this sort of implies what? That it's a good solution for a really narrow set of applications?

Sony putting Cell in the PS3 is akin to MS bundling Kinect with X360/X1, both terrible technologies for gaming.

Also let's first see what they can do with it, because Musk is CEO of Vaporware.
Is he?
His sense of time seems to be a bit off, but other than that I see a lot of real products.
 