
Next-Gen PS5 & XSX |OT| Console tEch threaD


Ar¢tos

Member
If "BOOM" is referring to the price, I agree. I wish that were true, but unless CELL2 was going to cost pennies, it's not happening.
I could see a CELL redesign without the limitations of the previous one, with an easy-to-use and efficient API to distribute tasks per SPU, for scientific calculation farms that have no need for GPUs. (so, not gonna happen!)
 

Lort

Banned
Only the PPE was PowerPC, and it had an extended instruction set.
The SPUs were RISC based with a custom instruction set.
The PPE can be emulated by a toaster, it's the SPUs that are the issue, and not even 2 X1Xs taped together would be able to match the SPUs' SIMD abilities.

What the hell are you talking about?

So misguided and wrong...

Ps3 230 gigaflops
Xbox 360 240 gigaflops
Xbox one x 6 teraflops


Also all PowerPC cpus are RISC based ( not that that means anything) and the xbox 360 had an extended instruction set with lots of SIMD commands that the PS3 did not.
 
Last edited:

Ar¢tos

Member
What the hell are you talking about?

So misguided and wrong...

Ps3 230 gigaflops
Xbox 360 240 gigaflops
Xbox one x 6 teraflops


Also all PowerPC cpus are RISC based ( not that that means anything) and the xbox 360 had an extended instruction set with lots of SIMD commands that the PS3 did not.
The PPE instruction set (PPEIS) is designed as a superset of the PowerPC Architecture instruction set (with a few additions and changes), which adds new vector/SIMD multimedia extensions (VSME) and their associated C/C++ intrinsics.
The SPEIS draws a similarity between itself and the PPE VSME because they both operate on SIMD vectors. But under the hood they are quite different instruction sets which need different compilers for processing programs designated to PPE and SPEs.



IBM has tons of public documentation on the cell also.
 

Ar¢tos

Member
Not sure why you quoted me there as you weren't disagreeing with anything I said.
What does the flop count of the X1X have to do with anything?
It could have 40Tflops and it still wouldn't be able to emulate the ps3, because it would still have a Jaguar CPU that couldn't emulate the CELL.
 

Fake

Gold Member
So even Naughty Dog's best coders couldn't tap the potential of the PS3.. so there was all this untapped potential ... apparently.

Alternatively in the real world .. even maxed out, the ps3 with all its SPU power was exactly 0 years ahead performance wise.
Doesn't change the fact they're different. Don't even need to count Naughty Dog. Look at Kingdom Hearts 1.5/2.5 for example: 1080p at 60 fps on base PS4 and native 4K at 60 fps on PS4 Pro.
Remaster and BC are different beasts.
 
Last edited:

Lort

Banned
What does the flop count of the X1X have to do with anything?
It could have 40Tflops and it still wouldn't be able to emulate the ps3, because it would still have a Jaguar CPU that couldn't emulate the CELL.

You seem a little confused. SIMD can be done by a CPU or GPU .. that is not the problem .. the difficulty in emulation has nothing to do with the SIMD capability at all. The difficulty in emulation (amongst other things) is the lack of latency between a write and a read of the tiny internal SPU memory.. the closest thing to that is ESRAM, and the Xbox One X just brute forced it when emulating the 360. The real reason is because no one cares to put money behind an emulator.

Naughty Dog claimed to be using almost all the SPU capability by the end of the gen, and easily covered all the PS3 Cell “power” by writing the tasks for a normal CPU and GPU and using the register memory to cover for the SPU internal memory. Note that the 360 had games that used the GPU for physics etc as in “async compute”.

The PS3 is exactly as powerful as it looks ... 230 gflops..

 
Last edited:

shark sandwich

tenuously links anime, pedophile and incels
LOL we are still paying the price for Cell nearly 15 years later. Expensive, extremely difficult to program, and now extremely difficult to emulate.

But hey, we got a handful of games that really unlocked its full potential several years into the generation.
 

Panajev2001a

GAF's Pleasant Genius
What the hell are you talking about?

So misguided and wrong...

Ps3 230 gigaflops
Xbox 360 240 gigaflops
Xbox one x 6 teraflops


Also all PowerPC cpus are RISC based ( not that that means anything) and the xbox 360 had an extended instruction set with lots of SIMD commands that the PS3 did not.

To call PowerPC RISC is a bit of a stretch considering the microcoded instructions it has, but sure... not sure I buy into those GameSpot rankings with pearls such as:
Released on March 4, 2000, Sony's PlayStation 2 used a 150MHz Graphics Synthesizer solution that offered 6.2 gigaflops of performance, which is 4.4x more than the Dreamcast.

GS has flexible fixed function logic; that quote is for the EE. Considering how strong the SPEs were and that both Xenos and RSX had programmable shaders, a count covering both CPUs and GPUs would be fair and would seem to be way north of that value.

Emulating the SPE's LS would be challenging but you have so much cache in modern Zen cores to do that (lockable cache lines). The register file is more challenging, and the mailboxes, the other synchronisation primitives, and the built-in DMAC would provide the hurdle, as you not only need to make it work, you need accuracy to boot. As console games get more complex you need to make sure you do not have small bugs.

We will see what they do with PS5, but I think we will see gradually unveiled PS3 emulation.
 

SonGoku

Member
from some people EUV is a massive cost increase and hardly worth it and others are super mega bullish
Got some links where more expensive is mentioned? I'm curious.
From what I've read it's supposed to bring cost reductions.

My understanding is as follows
  • If you designed a 7nm chip, the benefits of 7nm+ euv are not worth the redesign costs which is why tsmc claimed that most 7nm clients will migrate to 6nm which requires minimum retooling/investment.
  • If you design your chip on 7nm+ euv from the get go it shouldn't be any more expensive than 7nm in theory.
Ideally, we would need a brand new CPU architecture built just with gaming in mind. All existing ones have drawbacks. This would work for consoles, but it would make pc ports harder, since pcs need to do other things and would keep x86.
The R&D costs for such an enterprise are cost prohibitive for non CPU makers. If only ARM was interested in gaming, they could design gaming focused CPUs and license the IP.
It could have many clients with cloud streaming becoming all the rage: Sony, MS, Nintendo, Google, Nvidia.

My only concern is seeing how Apple custom CPUs surpass ARM designs.
without it you end up with Bayonetta for PS3.
Bayonetta happened because of RSX and it being outsourced to sega... If devs optimized their code for CELL it would have matched the 360
PG actually developed the PS3 version of their next game, Vanquish, to excellent results.
 
Last edited:
You could use the ACEs but their speed is too slow to not run into some issues. Unless you could split those instructions, which would not be easy. I thought they had 8 with one disabled for yield. But 6 sounds right. PS3 emulation requires 6 units capable of running double precision ops at 3.2GHz. The PPE will be very simple to emulate. The SPEs are a different beast. You still need to convert RISC to x86, which is up to Sony and relatively simple.
The bold part (FP64 performance) needs some context:

"For double-precision floating point operations, as sometimes used in personal computers and often used in scientific computing, Cell performance drops by an order of magnitude, but still reaches 20.8 GFLOPS (1.8 GFLOPS per SPE, 6.4 GFLOPS per PPE)."

For single-precision/FP32 (that's what we want in video games) it's 25.6 GFLOPS per SPU (a total of 153.6 GFLOPS for 6 SPUs).
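
For reference, that 25.6 GFLOPS per SPU figure falls straight out of the clock and the vector width. A quick back-of-the-envelope check (assuming one 4-wide FP32 multiply-add issued per cycle per SPU, which is how the peak number is usually quoted):

Code:
#include <stdio.h>

/* Rough peak FP32 throughput for the PS3's Cell SPUs.
   Assumes one 4-wide single-precision multiply-add per cycle per SPU. */
int main(void)
{
    const double clock_ghz   = 3.2;  /* SPU clock                       */
    const double lanes       = 4.0;  /* 128-bit vector = 4 x FP32       */
    const double ops_per_fma = 2.0;  /* a multiply-add counts as 2 ops  */
    const int    spus        = 6;    /* SPUs available to PS3 games     */

    double per_spu = clock_ghz * lanes * ops_per_fma;   /* 25.6 GFLOPS  */
    printf("per SPU: %.1f GFLOPS, %d SPUs: %.1f GFLOPS\n",
           per_spu, spus, per_spu * spus);              /* 153.6 total  */
    return 0;
}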

Are you sure the Zen 2 AVX2 FPU is not up to the task for that?

Also, let's not forget that MS/Sony have a partnership these days, which includes (among other things) making common dev tools for their platforms.

Who's to say that MS (with their 360 BC expertise) won't help Sony to develop a PS3 emulator? There's also RPCS3 from hobbyists and Sony's knowledge on the PS3 platform/innards.

RSX emulation shouldn't be that hard. The hard part IMHO would be to get a licence from Nvidia to emulate RSX tech on PS5. More of a legal than a tech issue if you think about it.

Last but not least, Sony will eventually have to replace proprietary PS3 blades that are no longer manufactured due to obsolete tech (Cell/RSX/XDR/GDDR3).

Unless of course they're willing to let the PS3 ecosystem die, which would be a shame from a game preservation and financial perspective...
 
Last edited:

xool

Member
.. the difficulty in emulation has nothing to do with the SIMD capability at all. The difficulty in emulation ( amongst other things) is the lack of latency between a write and a read of the tiny internal SPU memory..
One possibility is to :
  • Ignore transfers between main memory and cell SPE memory
  • Convert Cell memory access into direct access to main memory (similar to de-relativising memory access instructions)
  • Let the L2/L3 cache pick up the weight

Consequences: There'll be a different sort of latency - instead of a pause whilst memory is loaded into SPEs there will be latency as the new main memory addresses is/are first accessed from SPE code (and copied into L2 cache) - but subsequent accesses should be fast.

If the processor has more than 6 cores, and the SIMD capability exceeds that of cell, and the clock speed is comparable or better it might be workable.
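
Roughly what I mean, as a minimal sketch (the emulator structures and helper names here are made up for illustration): each emulated SPE's 256 kB local store is just an ordinary host buffer, so the host L1/L2 caches end up doing the work:

Code:
#include <stdint.h>
#include <string.h>

#define LS_SIZE (256 * 1024)            /* 256 kB local store per SPE */

/* Host-side backing store for one emulated SPE's local store. */
typedef struct {
    uint8_t local_store[LS_SIZE];       /* lives in plain host RAM    */
} spe_context_t;

/* Translate an SPU local-store address into a host pointer.
   Real SPU hardware wraps LS addresses at 256 kB, so mask, don't trap. */
static inline void *ls_ptr(spe_context_t *spe, uint32_t ls_addr)
{
    return &spe->local_store[ls_addr & (LS_SIZE - 1)];
}

/* Emulated 16-byte (quadword) load from the local store: the first touch
   pulls the line into the host cache, repeated accesses are then cheap. */
static inline void emu_lqd(spe_context_t *spe, uint32_t ls_addr, uint8_t out[16])
{
    memcpy(out, ls_ptr(spe, ls_addr & ~15u), 16);
}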

Emulating the SPE’s LS would be challenging but you have so much cache in modern Zen cores to do that (lockable cache lines)

I think this is close to what I was thinking above

while the register file is more challenging
Yes 128 SIMD registers is difficult. Even AVX-512 doesn't have that many .. ...thinking... no I'm stuck

The real reason is because no one cares to put money behind an emulator.
Yep. I really don't have any desire to revisit PS3 games - there are very few from that era I'd be interested in - and it's Splinter Cell: Blacklist - so much easier via the 360 version.

The work would be for MGS4 and Demon's Souls I guess.
 

Gamernyc78

Banned
Actually I do, I've studied CPU design. AMD and Intel x86 CPUs have a microarchitecture design that decodes x86 commands into RISC style instructions; out of order execution has been around since 1990. The DEC Alpha engineers joined AMD. Did you know any of that?

The design of the Cell sounded good in theory until you understand its crippling memory bandwidth issues, you can't get data in and out of the SPU; there's a reason every other processor prioritizes cache and mem IO as part of the CPU and GPU design, because without it you end up with Bayonetta for PS3.

The Cell failed at everything it was designed for. The custom PPC on the xbox 360 had a hardware dot product instruction, prefetch commands, could write from GPU back into the CPU and the GPU hosted the mem controller (not the CPU as in every other computer ever made).

The 360 was extremely smartly designed but lacked a legion of fans who would be prepared to flood the internet with ill-informed tech propaganda.

Yet almost all the major PS3 exclusives bested the 360 exclusives head to head in their respective genres. Uncharted 2/3 technically and graphically were on another level and looked better than Gears, Microsoft used to put out Halo bullshots to try to keep up with Killzone but would get demolished when the real games were released, God of War was on another level. The Last of Us ppl couldn't believe.

I mean it's great "you say" you have this knowledge of cpu design and we all know cell was limiting and difficult for third party devs to master but in the end we all know first party devs and their output is how ppl judge the power of a console :) There is no question the cell was putting in that work 😁

Also towards the end of the gen almost all the multiplats were performing equally on both consoles once the devs were able to utilize the Cell better.

The 360 only needed one fanboy to turn the propaganda wars around. And he succeeded.
 
Last edited:

Panajev2001a

GAF's Pleasant Genius
Got some links where more expensive is mentioned? I'm curious.
From what I've read it's supposed to bring cost reductions.

My understanding is as follows
  • If you designed a 7nm chip, the benefits of 7nm+ euv are not worth the redesign costs which is why tsmc claimed that most 7nm clients will migrate to 6nm which requires minimum retooling/investment.
  • If you design your chip on 7nm+ euv from the get go it shouldn't be any more expensive than 7nm in theory.

The R&D costs for such an enterprise are cost prohibitive for non CPU makers. If only ARM was interested in gaming, they could design gaming focused CPUs and license the IP.
It could have many clients with cloud streaming becoming all the rage: Sony, MS, Nintendo, Google, Nvidia.

My only concern is seeing how Apple custom CPUs surpass ARM designs.

Bayonetta happened because of RSX and it being outsourced to sega... If devs optimized their code for CELL it would have matched the 360
PG actually developed the PS3 version of their next game, Vanquish, to excellent results.

It is more the price to buy, configure, and run the equipment to make it possible than the manufacturing steps themselves.




Improvement jumps getting slower: https://www.anandtech.com/show/1272...-scaling-but-thin-power-and-performance-gains
 

psorcerer

Banned
In practice, if you were coding for the cell's SPE you would have known what you were doing and getting the vectorisation right, so the actual code optimisation shouldn't have been a huge issue. It's just that the system had a huge flaw/drawback if the data set+program exceed the small amount of memory available

Not even close. You can pipeline things between SPEs. You can scatter/gather from main RAM. Etc.
The biggest problem for top performance was TLB misses, AFAIK. SPEs still needed to access RAM in PPU pages.
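
For what it's worth, the usual way SPU code hid that main-RAM round trip was double-buffered DMA - fetch the next chunk while crunching the current one. A rough SPU-side sketch with the stock spu_mfcio.h intrinsics (the chunk size and the process_chunk callback are just placeholders):

Code:
#include <spu_mfcio.h>
#include <stdint.h>

#define CHUNK 16384                     /* 16 kB per DMA, well inside the 256 kB LS */

static uint8_t buf[2][CHUNK] __attribute__((aligned(128)));

/* Placeholder for whatever per-chunk work the SPU actually does. */
extern void process_chunk(uint8_t *data, unsigned size);

/* Stream 'total' bytes from effective address 'ea' through the SPU,
   prefetching chunk i+1 while chunk i is being processed. */
void stream_from_main_ram(uint64_t ea, unsigned total)
{
    unsigned i, n = total / CHUNK;

    mfc_get(buf[0], ea, CHUNK, 0, 0, 0);                  /* kick off the first DMA   */
    for (i = 0; i < n; i++) {
        unsigned cur = i & 1, nxt = cur ^ 1;
        if (i + 1 < n)                                    /* start the next transfer  */
            mfc_get(buf[nxt], ea + (uint64_t)(i + 1) * CHUNK, CHUNK, nxt, 0, 0);
        mfc_write_tag_mask(1 << cur);                     /* wait only on current tag */
        mfc_read_tag_status_all();
        process_chunk(buf[cur], CHUNK);                   /* compute overlaps the DMA */
    }
}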
 
About xool/psorcerer's conversation

 
Last edited:

xool

Member
Not even close. You can pipeline things between SPEs. You can scatter/gather from main RAM. Etc.
The biggest problem for top performance was TLB misses, AFAIK. SPEs still needed to access RAM in PPU pages.
I don't see how that additional information makes me wrong. My main point was the limitation of the 256kB memory actually accessible to SPEs:
  • Pipeline between SPEs - yes the high bandwidth ring bus was a huge part of the design, but doesn't help directly when a data set exceeds 256kB
  • Scatter/gather from main RAM - yes - and wait for it to finish (also size aligned to 2^n iirc) - even though scatter/gather is queued it still has to be transferred from main RAM - there's no big cache helping here. tldr - having to make a list of memory locations to fetch from, and then wait for that DDR RAM fetch to complete before you can even access the data is not a "feature"
  • "The biggest problem for top performance was TLB misses" - not seeing what TLB misses had specifically to do with Cell architecture .. and anyway - if true - the problem returns to the one I originally suggested - i.e. having to transfer memory from main RAM before it can be used
I mean I'm not even disagreeing - you added more information.

[edit - more] I'm confused about your TLB point -from https://crd-legacy.lbl.gov/~oliker/papers/CF06_cell.pdf is the sort of thing I am thinking of :
Additionally, in column layout, there is added pressure on the maximum tile size for large matrices, as each column within a tile will be on a different page resulting in TLB misses.

.. but this TLB problem is caused directly because the SPE RAM is limited to 256kB. If the SPE memory could store big matrices (or other data sets) the TLB problem goes away, I think.
 
Last edited:

Panajev2001a

GAF's Pleasant Genius
I don't see how that additional information makes me wrong. My main point was the limitation of the 256kB memory actually accessible to SPEs:
  • Pipeline between SPEs - yes the high bandwidth ring bus was a huge part of the design, but doesn't help directly when a data set exceeds 256kB
  • Scatter/gather from main RAM - yes - and wait for it to finish (also size aligned to 2^n iirc) - even though scatter/gather is queued it still has to be transferred from main RAM - there's no big cache helping here. tldr - having to make a list of memory locations to fetch from, and then wait for that DDR RAM fetch to complete before you can even access the data is not a "feature"
  • "The biggest problem for top performance was TLB misses" - not seeing what TLB misses had specifically to do with Cell architecture .. and anyway - if true - the problem returns to the one I originally suggested - i.e. having to transfer memory from main RAM before it can be used
I mean I'm not even disagreeing - you added more information.

[edit - more] I'm confused about your TLB point -from https://crd-legacy.lbl.gov/~oliker/papers/CF06_cell.pdf is the sort of thing I am thinking of :

.. but this TLB problem is caused directly because the SPE RAM is limited to 256kB. If the SPE memory could store big matrices (or other data sets) the TLB problem goes away, I think.

Sure, but in the eyes of a PS2 developer it was a monstrous upgrade:
  • super wide register file at 128x128 bits (vs 32x128 bits of VU1)
  • super fast unified scratchpad (256 KB for each SPE vs 32 KB for VU1, divided in two halves for code and data)
  • ability to access virtual memory and DMA directly (vs handling only physical addresses in the EE's DMAC, with the DMAC only controllable from main MIPS core code... VU1 could not really feed itself)
  • ability to pass data directly to other units and synchronise with them (chained/serial operation mode, message mailboxes, and the atomic sync fabric: ACU https://n4g.com/news/41514/ps3-cell-chip-spes-each-have-atomic-cache)

Lots of upgrades, but also a very complex setup to emulate correctly at full speed.

P.S.: I see that as a feature because at the time it meant giving SPE’s their own memory management and that was sorely lacking in the VU’s.
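
To make the mailbox part concrete, the PPE-SPE handshake was about this simple on the SPU side (a bare-bones sketch with the standard spu_mfcio.h mailbox intrinsics; the command values and run_job are made up):

Code:
#include <spu_mfcio.h>
#include <stdint.h>

/* Placeholder for whatever the job identified by 'cmd' actually is. */
extern void run_job(uint32_t cmd);

int main(void)
{
    for (;;) {
        uint32_t cmd = spu_read_in_mbox();   /* blocks until the PPE writes a word */
        if (cmd == 0)                        /* 0 = made-up "shut down" command    */
            break;
        run_job(cmd);
        spu_write_out_mbox(cmd);             /* tell the PPE this job is done      */
    }
    return 0;
}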
 
Last edited:

Panajev2001a

GAF's Pleasant Genius
So the Cell is still considered some secret sauce power house that no one really figured out?
Let.it.go.
No one is still going around claiming a second GPU in the X1 and this Cell nonsense is just as crazy. Devs had 8 years and 360 still had the best looking multiplats and exclusives.

Xbox 360 with the best looking exclusives? It was a great console, but you are reaching there. You are free to believe CELL had no redeeming quality if it helps you sleep better at night ;).
 
Last edited:

FranXico

Member

We don’t yet know if Nagoshi’s comments on the PS5 could spell danger for the Xbox Two – the Yakuza series has primarily been a PlayStation exclusive IP. So, Nagoshi presumably hasn’t had the same access to Xbox’s next-gen technology to compare the progress being made.

So, he knows it's a big upgrade coming from PS4. The usual, then.
 

StreetsofBeige

Gold Member
What the hell are you talking about?

So misguided and wrong...

Ps3 230 gigaflops
Xbox 360 240 gigaflops
Xbox one x 6 teraflops


Also all PowerPC cpus are RISC based ( not that that means anything) and the xbox 360 had an extended instruction set with lots of SIMD commands that the PS3 did not.
Are you sure?

I thought PS3 was 2 TF! lol
rsxbandwidth.jpg
 

Fake

Gold Member
Even if he spoke about Xbox Two, I'm more curious about the difference between PS4 and PS5. After his experience doing PS4, it should be easy for him to tell the gap.
 

xool

Member
Sure, but in the eyes of a PS2 developer it was a monstrous upgrade:
Yep.

I thought the 360 CPU had interesting parallels to the overall design of the PS2's CPU (though surely they don't share any DNA) .. replace MIPS (R5000 core?) and the VU0 and VU1 with three (practically) equivalent PowerPC cores, which are also vector SIMD feature rich, but this time are full cores .. also the third 360 CPU core being able to share some L2 cache with the GPU is reminiscent of using VU1 to feed the GPU (GS) with vertex data etc.

[conspiracy time] I wonder if there was some deliberate intention by MS to make 360 CPU seem familiar (to last gen's clear winner PS2), and at the same time - a lot easier to use ?

PS4 gen shows the 360 design "won" that argument.. (no flames please)

PS3 was just totally fresh innovation- almost everything in it seems re-invented, maybe the use of scratchpad RAM in VUs instead of proper L1 data/program cache is the only obvious carry over I can spot
 

Aceofspades

Banned
Xbox 360 with the best looking exclusives? It was a great console, but you are reaching there. You are free to believe CELL had no redeeming quality if it helps you sleep better at night ;).

Hell, even late gen multiplats were running better than or equal to the 360 versions. Even Rockstar praised the CELL on PS3 (GTA V was running better on PS3 than on 360).
 

Panajev2001a

GAF's Pleasant Genius
Actually I do, I've studied CPU design. AMD and Intel x86 CPUs have a microarchitecture design that decodes x86 commands into RISC style instructions; out of order execution has been around since 1990. The DEC Alpha engineers joined AMD. Did you know any of that?

The design of the Cell sounded good in theory until you understand its crippling memory bandwidth issues, you can't get data in and out of the SPU; there's a reason every other processor prioritizes cache and mem IO as part of the CPU and GPU design, because without it you end up with Bayonetta for PS3.

The Cell failed at everything it was designed for. The custom PPC on the xbox 360 had a hardware dot product instruction, prefetch commands, could write from GPU back into the CPU and the GPU hosted the mem controller (not the CPU as in every other computer ever made).

The 360 was extremely smartly designed but lacked a legion of fans who would be prepared to flood the internet with ill-informed tech propaganda.

Intel actually got most of the Alpha people (they tried to get the Alpha teams twice actually; only the last time were they able to keep them from resigning en masse), especially the EV7 and the EV8 ones, assigned them to revolutionise the Itanium core, and then killed their project off based on internal politics (thanks, ex Elbrus 2K lead... Moscow research center acquired by Intel in those years).

HW dot product was a nice to have, but meaningless in terms of where graphics were moving next (a horizontal operation in a world moving to vertical SIMD / scalar and SoA data layouts to increase throughput and allow wider SIMD widths).
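
To illustrate the horizontal vs vertical point in plain C (names made up, illustration only): the AoS version needs a horizontal sum per vector, while the SoA version does n dot products with nothing but lane-wise operations, which is exactly what maps onto wider SIMD:

Code:
/* AoS / "horizontal": one dot product at a time, summed across lanes. */
float dot_aos(const float a[4], const float b[4])
{
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3];
}

/* SoA / "vertical": components in separate arrays, every operation is
   lane-wise, so the loop vectorises to any SIMD width with no shuffles. */
void dot_soa(const float *ax, const float *ay, const float *az,
             const float *bx, const float *by, const float *bz,
             float *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = ax[i]*bx[i] + ay[i]*by[i] + az[i]*bz[i];
}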

Sure, a CELL v2 with say 512 KB of cache per SPE, where you could lock a portion of it, would have made it easier to code for, but for the time it would have blown the power and complexity budget. Data-layout-optimised, software-managed fibers (see the SPURS system) are something that actually survived the generation and where the industry was already going. Sure, it was not easy to code for, and some of RSX's bugs forced developers to have to face it head on earlier than people would have wanted or needed to.

A CELL CPU with Xenos would have been a very very interesting experiment ;).
 

Gamernyc78

Banned
So the Cell is still considered some secret sauce power house that no one really figured out?
Let.it.go.
No one is still going around claiming a second GPU in the X1 and this Cell nonsense is just as crazy. Devs had 8 years and 360 still had the best looking multiplats and exclusives.

In your little bubble called the greenverse lol, along with MisterXmedia.

We've got to call you out on your blatant bullshit, especially since Digital Foundry and Lens of Truth were always giving the graphics nod and graphical benchmarks to PS3 exclusives. That's when Microsoft were going hard with live action trailers instead of gameplay. In each respective genre Sony was spanking them....

FPS - consensus by professionals and gamers, it wasn't even close, Killzone 2/3 destroyed Halo visually.

TPS - Uncharted 2/3, people couldn't believe what they saw, the pinnacle of graphics last gen. Gears couldn't compete :)

TLOU-We won't even go there :)

Heavy Rain hmmmm

God of War - there wasn't even any competition in that genre

MLB the Show again no competition in exclusives :)

You get the point but if you don't, there are dozens of articles on PS3 exclusives setting the bar when all Xbox gamers could do was wonder how they pulled it off. If you don't believe me lol go back and look at articles about Killzone and Uncharted.

Stop the blatant trolling and bs. Even towards the end multiplats were already running on par to 360 versions.

I'll drop this again just in case you want to keep lying and talking nonsense. This isn't subjective it was facts all last Gen

 
Last edited:
In your little bubble called the greenverse lol, along with MisterXmedia.

We've got to call you out on your blatant bullshit, especially since Digital Foundry and Lens of Truth were always giving the graphics nod and graphical benchmarks to PS3 exclusives. That's when Microsoft were going hard with live action trailers instead of gameplay. In each respective genre Sony was spanking them....

FPS - consensus by professionals and gamers, it wasn't even close, Killzone 2/3 destroyed Halo visually.

TPS - Uncharted 2/3, people couldn't believe what they saw, the pinnacle of graphics last gen. Gears couldn't compete :)

TLOU-We won't even go there :)

Heavy Rain hmmmm

God of War - there wasn't even any competition in that genre

MLB the Show again no competition in exclusives :)

You get the point but if you don't, there are dozens of articles on PS3 exclusives setting the bar when all Xbox gamers could do was wonder how they pulled it off. If you don't believe me lol go back and look at articles about Killzone and Uncharted.

Stop the blatant trolling and bs. Even towards the end multiplats were already running on par to 360 versions.

I'll drop this again just in case you want to keep lying and talking nonsense. This isn't subjective it was facts all last Gen

Gamingbolt isn't facts. Of course this can be subjective, but Gears Judgement and Halo 4 had the best graphics of x360/ps3 era.
And hey, X1 had some games on par with ps4...so, eSRAM redeemed, right? :pie_eyeroll:
 

Gamernyc78

Banned
Gamingbolt isn't facts. Of course this can be subjective, but Gears Judgement and Halo 4 had the best graphics of x360/ps3 era.
And hey, X1 had some games on par with ps4...so, eSRAM redeemed, right? :pie_eyeroll:

Nope, it isn't subjective, PS3 games were doing things graphically that the Xbox 360 wasn't. We aren't just talking about 2006 lol, once Uncharted 2 came out it was over. Again, whole bunch of articles on that :) The overall consensus, even if you find fanboy outliers, is that PS3 exclusives were setting graphical bars and passing that mantle amongst each other.

Halo? Meh, never even close to Killzone.
Gears had the best graphics on 360 but in no way was it touching Uncharted. Not to say those games didn't look good, they just weren't PS3-exclusive good. Of course you are looking through Microsoft fan goggles so nothing I say or a myriad of articles will change your mind 😁
 
Last edited:

LordOfChaos

Member
Here's what closes the book for me.

To illustrate the peculiarities of Cell programming, we use the Breadth-First Search (BFS) on a graph. Despite its simplicity, this algorithm is important because it is a building block of many applications in computer graphics, artificial intelligence, astrophysics, national security, genomics, robotics, and the like.

Listing One is a minimal BFS implementation in C. Variable G contains the graph in the form of an array of adjacency lists. G.length tells how many neighbors the i-th vertex has, which are in G.neighbors[0], G.neighbors[1], and so on. The vertex from which the visit starts is in variable root. A BFS visit proceeds in levels: First, the root is visited, then its neighbors, then its neighbors' neighbors, and so on. At any time, queue Q contains the vertices to visit in the current level. The algorithm scans every vertex in Q, fetches its neighbors, and adds each neighbor to the list of vertices to visit in the next level, Qnext. To prevent being caught in loops, the algorithm avoids visiting those vertices that have been visited before. To do so, it maintains a marked array of Boolean variables. Neighbors are added to Qnext only when they are not already marked, then they get marked. At the end of each level, Q and Qnext swap, and Qnext is emptied.


Normal CPU:
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* ... */

/* the graph */
vertex_t * G;

/* number of vertices in the graph */
unsigned card_V;

/* root vertex (where the visit starts) */
unsigned root;

void parse_input( int argc, char** argv );

int main(int argc, char ** argv)
{
  unsigned *Q, *Q_next, *marked;
  unsigned  Q_size=0, Q_next_size=0;
  unsigned  level = 0;

  parse_input(argc, argv);
  graph_load();

  Q      =
          (unsigned *) calloc(card_V, sizeof(unsigned));
  Q_next =
          (unsigned *) calloc(card_V, sizeof(unsigned));
  marked =
          (unsigned *) calloc(card_V, sizeof(unsigned));

  Q[0] = root;
  Q_size  = 1;
  while (Q_size != 0)
    {
      /* scanning all vertices in queue Q */
      unsigned Q_index;
      for ( Q_index=0; Q_index<Q_size; Q_index++ )
      {
        const unsigned vertex = Q[Q_index];
        const unsigned length = G[vertex].length;
        /* scanning each neighbor of each vertex */
        unsigned i;
        for ( i=0; i<length; i++)
          {
            const unsigned neighbor =
              G[vertex].neighbors[i];
            if( !marked[neighbor] ) {
              /* mark the neighbor */
              marked[neighbor]      = TRUE;
              /* enqueue it to Q_next */
              Q_next[Q_next_size++] = neighbor;
            }
          }
      }
      level++;
      unsigned * swap_tmp;
      swap_tmp    = Q;
      Q           = Q_next;
      Q_next      = swap_tmp;
      Q_size      = Q_next_size;
      Q_next_size = 0;
    }
  return 0;
}

Becomes this to do on an SPE

On a Pentium 4 HT running at 3.4 GHz, this algorithm is able to check 24-million edges per second. On the Cell, at the end of our optimization, we achieved a performance of 538-million edges per second. This is an impressive result, but came at the price of an explosion in code complexity. While the algorithm in Listing One fits in 60 lines of source code, our final algorithm on the Cell measures 1200 lines of code.





Cell was an interesting, novel design choice, but ultimately gave no consideration to developer time and budgets, and that's why it failed outside of where Sony would fund a few close-knit studios with nearly unlimited budgets. Would it be interesting to see where a continuation of it would have gone in a universe simulator? Sure. But ultimately we live in a world where companies have to turn profits, and saving silicon budget on the things that make programming easier and shifting that to the programmers was not a winning strategy.
 
Last edited:

Ar¢tos

Member
Here's what closes the book for me.




Normal CPU:
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* ... */

/* the graph */
vertex_t * G;

/* number of vertices in the graph */
unsigned card_V;

/* root vertex (where the visit starts) */
unsigned root;

void parse_input( int argc, char** argv );

int main(int argc, char ** argv)
{
  unsigned *Q, *Q_next, *marked;
  unsigned  Q_size=0, Q_next_size=0;
  unsigned  level = 0;

  parse_input(argc, argv);
  graph_load();

  Q      =
          (unsigned *) calloc(card_V, sizeof(unsigned));
  Q_next =
          (unsigned *) calloc(card_V, sizeof(unsigned));
  marked =
          (unsigned *) calloc(card_V, sizeof(unsigned));

  Q[0] = root;
  Q_size  = 1;
  while (Q_size != 0)
    {
      /* scanning all vertices in queue Q */
      unsigned Q_index;
      for ( Q_index=0; Q_index<Q_size; Q_index++ )
      {
        const unsigned vertex = Q[Q_index];
        const unsigned length = G[vertex].length;
        /* scanning each neighbor of each vertex */
        unsigned i;
        for ( i=0; i<length; i++)
          {
            const unsigned neighbor =
              G[vertex].neighbors[i];
            if( !marked[neighbor] ) {
              /* mark the neighbor */
              marked[neighbor]      = TRUE;
              /* enqueue it to Q_next */
              Q_next[Q_next_size++] = neighbor;
            }
          }
      }
      level++;
      unsigned * swap_tmp;
      swap_tmp    = Q;
      Q           = Q_next;
      Q_next      = swap_tmp;
      Q_size      = Q_next_size;
      Q_next_size = 0;
    }
  return 0;
}

Becomes this to do on an SPE







Cell was an interesting, novel design choice, but ultimately gave no consideration to developer time and budgets, and that's why it failed outside of where Sony would fund a few close-knit studios with nearly unlimited budgets. Would it be interesting to see where a continuation of it would have gone in a universe simulator? Sure. But ultimately we live in a world where companies have to turn profits, and saving silicon budget on the things that make programming easier and shifting that to the programmers was not a winning strategy.
But even its short existence was beneficial; Cell programming was kinda proto-GPGPU and developers learned a lot from it.
 
Note that the 360 had games that used the GPU for physics etc as in “async compute”.
Interesting. Got a source? Which games did this?

AFAIK, Xenos GPU was a DX9/DX10 hybrid (DX9 shader feature set with a unified shader design a la DX10), but it still lacked GPU compute capabilities.

The best you could have on 360 was VMX128 (CPU SIMD).

So the Cell is still considered some secret sauce power house that no one really figured out?
Let.it.go.
No one is still going around claiming a second GPU in the X1 and this Cell nonsense is just as crazy. Devs had 8 years and 360 still had the best looking multiplats and exclusives.
I'll have to disagree with that.

Cell truly existed, while the 2nd MisterXmedia GPU never did. There's no comparison here.

360 had stellar games, but nothing topped this PS3 system seller back in 2009:



Whether we like it or not, Cell was the progenitor of many modern technologies (GPGPU, APUs).

XB1X follows the same philosophy with a beefed up GPU. Sony is like AMD in a sense, Kutaragi was way ahead of his time back in 2001 when he conceived the idea.

Intel actually got most of the Alpha people (they tried to get the Alpha teams twice actually; only the last time were they able to keep them from resigning en masse), especially the EV7 and the EV8 ones, assigned them to revolutionise the Itanium core, and then killed their project off based on internal politics (thanks, ex Elbrus 2K lead... Moscow research center acquired by Intel in those years).
Isn't AMD Athlon (K7) based on DEC Alpha? Even the bus (EV6) is the same.

It was the first time that AMD became "respectable" as a CPU company and surpassed Intel offerings (both P3 and P4) by a wide margin.

While the algorithm in Listing One fits in 60 lines of source code, our final algorithm on the Cell measures 1200 lines of code
Very interesting. I guess this must have affected compiled binary sizes and thus memory usage, right?

Cell only had 256MB of Rambus to work with. It was relatively memory starved compared to 2005 PCs (that had at least 1GB of RAM + x86/CISC code density).

The compute/flops advantage made it worth it, though.

But even its short existence was beneficial; Cell programming was kinda proto-GPGPU and developers learned a lot from it.
Very true. Programming wizards (like ND, Insomniac Games, Santa Monica, DICE etc.) that mastered the Cell architecture were also able to master current-gen consoles, despite the Jaguar deficit.
 
Last edited:

psorcerer

Banned
.. but this TLB problem is caused directly because the SPE RAM is limited to 256kB. If the SPE memory could store big matrices (or other data sets) the TLB problem goes away, I think.

Not really. It's caused by the really limited memory controller on the PPC.
It caused a lot of problems for the X360, like the famous LHS (load-hit-store) crap.

SPUs are a lot like CUs on modern GPUs, and their local storage is like the texture/vertex cache on modern GPUs.
And for 2007 having 256k cache per CU was pretty big even for top-end GPUs.
Size was not a problem.
 

Panajev2001a

GAF's Pleasant Genius
Yep.

I thought the 360 CPU had interesting parallels to the overall design of the PS2's CPU (though surely they don't share any DNA) .. replace MIPS (R5000 core?) and the VU0 and VU1 with three (practically) equivalent PowerPC cores, which are also vector SIMD feature rich, but this time are full cores .. also the third 360 CPU core being able to share some L2 cache with the GPU is reminiscent of using VU1 to feed the GPU (GS) with vertex data etc.

[conspiracy time] I wonder if there was some deliberate intention by MS to make 360 CPU seem familiar (to last gen's clear winner PS2), and at the same time - a lot easier to use ?

PS4 gen shows the 360 design "won" that argument.. (no flames please)

PS3 was just totally fresh innovation- almost everything in it seems re-invented, maybe the use of scratchpad RAM in VUs instead of proper L1 data/program cache is the only obvious carry over I can spot

I think PS4 has carryovers and obvious philosophy extensions, but Sony’s modus operandi back then was to pivot and invent a new architecture to fix the previous generation bottlenecks and push forward on the rest... so they changed quite a bit... see PS1 vs SNES, PS2 vs PS1, PS3 vs PS2, etc...

PSP was the first “closely resembling with some twists” design as it was meant to feel a bit like a PS2 architecture on the go ;).

I do agree with you that the Xbox 360 could be considered as PS2 inspired for the reasons you mentioned (and the EDRAM, although it was not the universal scratchpad as the GS eDRAM was...).
 

psorcerer

Banned
Cell was an interesting, novel design choice, but ultimately gave no consideration to developer time and budgets, and that's why it failed outside of where Sony would fund a few close-knit studios with nearly unlimited budgets. Would it be interesting to see where a continuation of it would have gone in a universe simulator? Sure. But ultimately we live in a world where companies have to turn profits, and saving silicon budget on the things that make programming easier and shifting that to the programmers was not a winning strategy.

If you want performance - you will need to change how things are programmed.
You cannot scale single thread performance anymore, and Cell was the pioneer architecture in teaching people how to do it.
Besides a lot of low level tech was included in the libraries later on. So even the learning curve was not that steep.
 