translated Hiroshige Goto article on PS3's Nvidia RSX

http://www.tinyurl.com/9k8f8

Hiroshige Goto's Weekly Overseas News
RSX, the graphics engine of the PLAYSTATION 3


- RSX and the G70 GPU

The GPU that NVIDIA developed with Sony Computer Entertainment (SCEI) for the next-generation PLAYSTATION 3 is the RSX (Reality Synthesizer). The RSX is presumed to be a twin of NVIDIA's latest high-end GPU, the GeForce 7800 GTX (G70).

NVIDIA Chief Scientist David B. Kirk has said of the two GPUs that "the shader architecture of the RSX is based on the G70." The differences between the RSX and the G70 appear to be limited to process technology, system bus, and memory interface width. The G70 is built on TSMC's 0.11 µm process, with PCI Express x16 and a 256-bit GDDR3 interface. The RSX is built on the Sony/Toshiba 90nm process, with FlexIO (Redwood) and a 128-bit memory interface. Beyond that, the shader configuration and the shader microarchitecture are presumed to be almost identical in the two GPUs.


RSX Block Diagram
kaigai_1a.gif

(This is not the official block diagram of the RSX but one based heavily on the G70.)





There are several reasons to think the shader configuration does not differ, but the strongest is the specification announced at E3. There, NVIDIA described the parallelism of the RSX's shaders as 136 shader operations/cycle.

This figure was presumably published to counter the specifications Microsoft announced for the ATI-developed Xbox 360 GPU (R500). Microsoft quotes the Xbox 360 GPU's shader throughput as 48 billion shader operations/sec, which works out to 96 shader operations/cycle when divided by the 500MHz clock. NVIDIA, competing on specifications, probably wanted to show a higher number.

However, "shader operations/sec" in ATI's terminology corresponds roughly to instructions/sec in NVIDIA's. NVIDIA normally counts the multiple operations issued within one instruction as operations/sec, and treats instruction-issue parallelism as a separate specification. The E3 figure appears to be a one-off use of ATI's terminology, adopted so that the two could be compared.

That being so, the RSX's 136 shader operations/cycle would be 136 instructions/cycle in NVIDIA's usual terminology. What does that figure mean? NVIDIA has explained that a G70-type vertex shader issues 2 instructions/cycle and a pixel shader issues 5 instructions/cycle. With the G70 configuration (8 vertex shaders, 24 pixel shaders), the whole GPU therefore comes to 8 X 2 + 24 X 5 = 136 instructions/cycle.

In other words, judging by instructions/cycle, the shader configuration of the G70 and the RSX can be presumed to be exactly the same. If the RSX has 8 vertex shader units and 24 pixel shader units, each with the same instruction-issue mechanism as in the G70, it arrives at this specification.
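
The per-cycle figures above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the unit counts and issue rates are the ones quoted in the article, the G70-style layout for the RSX is the article's presumption rather than a confirmed specification, and the function names are made up for illustration.

```python
# Sketch only: unit counts and issue rates are the article's figures;
# the function names are hypothetical.

def instructions_per_cycle(vertex_units, vertex_issue, pixel_units, pixel_issue):
    """Instructions issued per clock by all vertex and pixel shaders together."""
    return vertex_units * vertex_issue + pixel_units * pixel_issue


def per_cycle(per_second, clock_hz):
    """Convert a per-second rate into a per-clock rate."""
    return per_second / clock_hz


# G70-style configuration: 8 vertex shaders at 2 instructions/cycle,
# 24 pixel shaders at 5 instructions/cycle.
rsx_instructions = instructions_per_cycle(8, 2, 24, 5)

# Xbox 360 GPU: 48 billion shader operations/sec at 500MHz.
xbox360_ops = per_cycle(48_000_000_000, 500_000_000)

print(f"RSX (assumed G70 layout): {rsx_instructions} instructions/cycle")
print(f"Xbox 360 GPU: {xbox360_ops:.0f} shader operations/cycle")
```

The two results are only comparable if ATI's "operations" and NVIDIA's "instructions" really do mean roughly the same thing, as the article argues.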

In fact, many people in the GPU industry say the G70 and the RSX are almost the same architecture. One industry source said: "NVIDIA only finally signed its contract with SCEI around the summer of 2004. So NVIDIA did not have enough time to develop an architecture customized for SCEI."

The transistor counts also suggest there is almost no difference in the configuration of the two GPUs. The G70 is announced at 302 million (302M) transistors, and the RSX at 300 million (300M) or more. Given that PCI Express x16 is heavier to implement than FlexIO, and that the G70's DRAM controller is twice as wide, this difference is plausible.

- RSX shader performance is 28% higher

The G70's shader architecture builds on the GeForce 6800 (NV40), with greater parallelism inside the pixel shader in particular. For example, the NV40 has two vector units inside each pixel shader, but one of them could not perform a multiply-add at one-cycle throughput. In the G70, both can perform multiply-add operations. The figures below are block diagrams of the inside of the G70's shaders; the RSX is presumed to be almost the same.

G70 Vertex Shader
kaigai_2a.gif



G70 Pixel Shader
kaigai_3a.gif


G70 Pixel Shader (ROP)
kaigai_4a.gif


Even assuming the shader architecture of the G70 and the RSX is the same, their performance differs considerably, because the operating frequency differs. The current G70 product runs at 430MHz, whereas the RSX is planned to run at 550MHz. Because the RSX's clock is about 28% higher, its theoretical peak performance is about 1.28 times that of the G70.

According to the G70 specifications NVIDIA has disclosed, each vertex shader has a 4-way VLIW unit plus 1 scalar unit and can process 5 data elements in parallel. With multiply-add, that is 10 floating-point operations per cycle. With 8 vertex shaders, the calculated performance is as follows.

(4way+1scalar) X 2 FP operations (MADD) = 10 FP operations/cycle
10 FP operations X 8 Shader X 430MHz = 34.4GFLOPS

Each G70 pixel shader has two 4-way SIMD units, two scalar units called Mini-ALUs, and a 7-way FP16 normalize unit. The calculated performance is therefore as follows.

((4way X 2 Units + 2 scalar) X 2 FPoperations) + 7 normalize = 27 FP operations/cycle
27 FP operations X 24 Shader X 430MHz = 278.6GFLOPS

Assuming the same shader configuration and a 550MHz clock, the RSX's performance is as follows.

Vertex Shader
10 FP operations X 8 Shader X 550MHz = 44GFLOPS

Pixel Shader
27 FP operations X 24 Shader X 550MHz = 356.4GFLOPS

44GFLOPS + 356.4GFLOPS = 400.4GFLOPS

RSX Shader FP Performance
kaigai_5a.gif


The shader floating-point performance of the Xbox 360 GPU (R500) is 240GFLOPS, so on paper the RSX has about 1.67 times the peak shader floating-point performance. However, the architectures of the RSX and the Xbox 360 GPU are completely different, so real-world performance cannot simply be inferred from this number. It is no more than a rough measure of the amount of computation.
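
For anyone who wants to rerun the peak numbers with a different clock or unit count, here is a minimal Python sketch of the same calculation. The per-unit operation counts, unit counts, clocks and the 240GFLOPS Xenos figure are taken from the article; the constant and function names are hypothetical, and the identical shader layout for the RSX is the article's assumption.

```python
# Sketch only: per-unit operation counts, unit counts and clocks are the
# article's figures; the names below are hypothetical.

VERTEX_OPS_PER_CYCLE = (4 + 1) * 2          # (4-way + 1 scalar) x 2 for MADD = 10
PIXEL_OPS_PER_CYCLE = (4 * 2 + 2) * 2 + 7   # (two 4-way + 2 scalar) x 2, plus normalize = 27
VERTEX_SHADERS = 8
PIXEL_SHADERS = 24
XENOS_GFLOPS = 240.0


def peak_gflops(clock_mhz):
    """Return (vertex, pixel, total) peak shader GFLOPS at the given clock."""
    vertex = VERTEX_OPS_PER_CYCLE * VERTEX_SHADERS * clock_mhz / 1000
    pixel = PIXEL_OPS_PER_CYCLE * PIXEL_SHADERS * clock_mhz / 1000
    return vertex, pixel, vertex + pixel


g70 = peak_gflops(430)
rsx = peak_gflops(550)

for name, (vertex, pixel, total) in (("G70", g70), ("RSX", rsx)):
    print(f"{name}: vertex {vertex:.1f}, pixel {pixel:.1f}, total {total:.1f} GFLOPS")

print(f"RSX / G70:   {rsx[2] / g70[2]:.2f}")
print(f"RSX / Xenos: {rsx[2] / XENOS_GFLOPS:.2f}")
```

Because both chips are assumed to have the same shader layout, the RSX/G70 ratio is simply the clock ratio, 550/430. These are theoretical peaks and say nothing about how much of that throughput real workloads reach.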

- Bandwidth changes how the GPU is used

The biggest difference between the G70 and the RSX is the host bus architecture. The G70 uses PCI Express x16, whereas the RSX uses Rambus's FlexIO (Redwood). PCI Express x16 provides 4GB/sec in each direction, or 8GB/sec bidirectionally. FlexIO, by contrast, provides 20GB/sec downstream (Cell -> RSX) and 15GB/sec upstream (RSX -> Cell), which is far wider. It is about 4.5 times the bandwidth.
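
As a quick check of the host bus comparison, the sketch below adds up both directions of each link using the figures quoted above. The variable names are made up for illustration, and summing the two directions is an assumption about how the article arrived at its ratio; on that basis the result comes out slightly under the quoted 4.5 times, so the article appears to be rounding.

```python
# Sketch only: bandwidth figures are the article's; names are hypothetical.

PCIE_X16_PER_DIRECTION_GB = 4.0   # GB/sec each way
FLEXIO_DOWN_GB = 20.0             # Cell -> RSX
FLEXIO_UP_GB = 15.0               # RSX -> Cell

pcie_total = 2 * PCIE_X16_PER_DIRECTION_GB
flexio_total = FLEXIO_DOWN_GB + FLEXIO_UP_GB
ratio = flexio_total / pcie_total

print(f"PCI Express x16: {pcie_total:.0f} GB/sec in total")
print(f"FlexIO:          {flexio_total:.0f} GB/sec in total")
print(f"FlexIO / PCI Express x16: {ratio:.3f}")
```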

This extreme difference in bandwidth does not just mean more data can be transferred; it also affects the relationship between the CPU and the GPU, because the two processors can divide work between them more actively. Graphics fundamentally different from PC graphics, where the host bus is a limit, become possible. We would like to explain this in more detail another time.

FlexIO also has an implementation advantage: it delivers high bandwidth per unit of die area (the area of the silicon itself). It is not known how much area NVIDIA's PCI Express implementation occupies, but chipset vendor sources consistently point out that PCI Express x16 needs an enormous area. FlexIO takes 13.1 square mm even in the 96-bit-wide implementation on the Cell side, and since the interface is narrower on the RSX side, it is likely to be smaller still.

Incidentally, FlexIO is a parallel interface that can be configured in 8-bit units, with a specified transfer rate of 6.4Gbps. However, the PS3's 20GB/sec and 15GB/sec figures do not match this transfer rate. The numbers do work out at a transfer rate of 5Gbps with 32 bits downstream and 24 bits upstream. So FlexIO may have been lowered to 5Gbps in the product. In the Cell, the CPU operating frequency and the XDR DRAM transfer rate are currently synchronized, so lowering the CPU clock also lowers the XDR DRAM transfer rate. FlexIO, however, is thought to be asynchronous.
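
The reasoning about the per-pin rate can be reproduced with a short Python sketch. The 32-bit and 24-bit link widths and the 5Gbps rate are the article's inference, not published specifications, and the function name is hypothetical.

```python
# Sketch only: link widths and per-pin rates are the article's figures and
# inferences; the function name is hypothetical.

def link_gb_per_sec(width_bits, gbps_per_pin):
    """Bandwidth in GB/sec of a parallel link: pins x Gbit/sec per pin / 8."""
    return width_bits * gbps_per_pin / 8


for rate in (6.4, 5.0):
    down = link_gb_per_sec(32, rate)   # Cell -> RSX, 32 bits wide
    up = link_gb_per_sec(24, rate)     # RSX -> Cell, 24 bits wide
    print(f"{rate}Gbps per pin: {down:.1f} GB/sec down, {up:.1f} GB/sec up")
```

Only the 5Gbps case reproduces the announced 20GB/sec and 15GB/sec, which is why the article suspects the product runs below the 6.4Gbps specification.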

RSX Shader FP Performance
kaigai_6a.gif


- Memory architecture

Another difference is memory interface width. The G70 for the PC has a 256-bit GDDR3 interface. The RSX, by contrast, works out to 128 bits wide when calculated back from its memory bandwidth. That means the memory controller has been halved. Since memory capacity is 256MB, the DRAM is presumed to be four 512Mbit x32 chips.
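
The presumed chip configuration can be sanity-checked against both the capacity and the bus width. This is a minimal sketch: the chip count, density and I/O width are the article's presumption, the function name is hypothetical, and it assumes every chip drives its own data lines with none shared.

```python
# Sketch only: chip count, density and I/O width are the article's presumed
# values; the function name is hypothetical.

def memory_config(chips, density_mbit, io_width_bits):
    """Return (capacity in MB, bus width in bits), assuming each chip has
    its own dedicated data lines."""
    capacity_mb = chips * density_mbit // 8
    bus_width_bits = chips * io_width_bits
    return capacity_mb, bus_width_bits


rsx_capacity_mb, rsx_bus_bits = memory_config(chips=4, density_mbit=512, io_width_bits=32)
print(f"{rsx_capacity_mb}MB on a {rsx_bus_bits}bit interface")
```

Four such chips give 256MB on a 128-bit interface, consistent with both numbers in the paragraph above.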

The block diagram of the RSX's memory controller is an estimate and is not yet certain. In the NV40/G70 architecture, the memory controller is divided into 4 partitions. Each partition is connected to a 64-bit-wide DRAM controller. The ROPs are connected four at a time to a specific memory (controller) partition, and the partitions are also connected to each other. Rather than building a crossbar that connects all 16 ROPs to the memory controller, connecting the ROPs in groups of 4 appears to make the crossbar simpler and more efficient to design.
 
Sho Nuff said:
Nuh-uh, xbox is better, hur hur hur


Actually, in lots of ways, Xenos probably *is* better than RSX...

Knowing this, it's a good thing RSX can receive graphic assist from CELL according to SCEI and nVidia....
 
Doesn't look like anything new. Matter of fact, we've seen those diagrams before. As stated in the diagrams themselves, they are extrapolations based on G70. That's why it features 24PS + 8VS arrangement. I'll wait to see if one finds something new in the translation. Machine-babble sucks. PEACE.

EDIT: BTW, it's not a bad assumption based on the shader ops, which the article mentions.
 
Mupepe said:
can i get the short version please? Dammit I'm lazy.

until now...

PS3: Number T (Number X + Number Y + ... + Number Z) = Performance M
360: Number V (Number N + Number O + ... + Number L) = Performance H

I like to know the numbers, but I think it is toooooo early to draw conclusions from any of these or about how they will work together; we all know how the marketing is. I'm gonna wait until the LAUNCH DAY. Only one look at the best looking launch games is all I need to make a decision :)

PS: sorry techies :P
 
Kleegamefan said:
Actually, in lots of ways, Xenos probably *is* better than RSX...

Knowing this, it's a good thing RSX can receive graphic assist from CELL according to SCEI and nVidia....

And in a lot of ways RSX probably is better than Xenos. Better period? We'll see. There are too many blanks to fill in regarding performance on these chips, vs architecture, to come to conclusions, but if RSX is what we think it is, it looks very powerful.

I would not characterise Cell "co-processing" on graphics as some required booster to keep pace overall. Some of the things Cell could do for graphics on PS3 would be highly difficult if not out of the question for X360 to match exactly, if at all.
 
sangreal said:
Elaborate, please

Cell could simply do some things much faster vs X360's CPU. If Xenon found itself having to do similar things, at the very least the scale would have to be cut down. For example if you threw all of Cell into procedural vertex work, you could throw all of Xenon into that and probably not come close.

Beyond power, there's simply more headroom, and a greater granularity of power to play with. Dedicating a couple of SPEs for graphics work on Cell seems easier than dedicating a core on X360 from a resource allocation point of view.
 
gofreak said:
Cell could simply do some things much faster vs X360's CPU.

I don't disagree, but that isn't the same as the statement you made:
Some of the things Cell could do for graphics on PS3 would be highly difficult if not out of the question for X360 to match exactly

Either way, it works both ways.
 
gofreak said:
Dedicating a couple of SPEs for graphics work on Cell seems easier than dedicating a core on X360 from a resource allocation point of view.
How do you think games manage AI, animation, gameplay, physics, scripting, etc on one CPU today then?

In terms of CPU cycles, running multiple things on a single xenon core is no harder than running multiple things on any normal single core CPU.

Scheduling on multiple xenon cores is no harder than scheduling on a multi CPU SMP box, you have 3 physical cores and 6 physical threads to allocate. (And you have the NT thread scheduler to help you out, which manages far more complicated machines in the real world.)

If anything, it's SPUs that are going to be more complicated to schedule, because you have to set up DMAs for your code and data and have some sort of overlay system to transfer chunks of code into the SPUs as needed.
 
RSX should be quite a bit more powerful than xenos, except regarding FSAA.


Even with Xenos dedicating all its pipes to pixel shading, RSX can still outperform it on pixel shading.

The only area where the unified shaders help is with vertex shaders. E.g. for a vertex-only pass, Xenos outperforms RSX quite a bit. But then CELL is there as backup for such an event


Vertex Shader= 44GFlops

Use the CELL! -
 
mrklaw said:
RSX should be quite a bit more powerful than xenos, except regarding FSAA.


Even with Xenos dedicating all its pipes to pixel shading, RSX can still outperform it on pixel shading.

The only area where the unified shaders help is with vertex shaders. E.g. for a vertex-only pass, Xenos outperforms RSX quite a bit. But then CELL is there as backup for such an event

Xenos and RSX are vastly different architectures, yet you conclude that RSX is "quite a bit more powerful" because it outperforms it in pixel shading? If you're going to make such a sweeping statement, at least go more in depth with your reasoning. There is a lot more to a GPU than raw shading power.
 
I specifically mentioned only pixel shading. Xenos is likely to be better at FSAA and full-on vertex shading. Vertex shading is also something that CELL (and XeCPU) happen to be very suited for, so that's maybe not such a big advantage.

But pixel shading is a large part of what you'll see on screen next gen, and it's where Xenos may have a weakness compared to RSX. So I think my (admittedly woolly) comment is still valid
 
Well, since the people who know how much shading power a unified shading architecture has are not talking, it is difficult to gauge how much faster Vertex/Pixel shaders will be in RSX...

Xenos will have close to 100% efficiency with those unified shaders though....RSX won't be able to match that for sure.....

Then again, if it's fast enough and if the fragment buffers are effective enough, perhaps RSX won't need to match USA (unified shader architecture) efficiency...
 
sangreal said:
I don't disagree, but that isn't the same as the statement you made:

Whilst speed isn't a barrier to technically doing anything - technically you can do anything on any CPU if you have enough time - the realtime requirements of a game require a certain level of performance to make things realisable. Perhaps "viable" is a better word than "possible".

How do you think games manage AI, animation, gameplay, physics, scripting, etc on one CPU today then?

In terms of CPU cycles, running multiple things on a single xenon core is no harder than running multiple things on any normal single core CPU.

Yes, I was simplifying things, but the extra granularity on Cell helps (and there are more hardware threads too anyway). You don't have to dedicate execution units necessarily, though some tasks may be so demanding as to effectively require that anyway.

If anything, it's SPUs that are going to be more complicated to schedule, because you have to set up DMAs for your code and data and have some sort of overlay system to transfer chunks of code into the SPUs as needed.

This is more complex, but it can also have its advantages. Anywhere data access can be made explicit is right up their alley. For a lot of graphics work that you might consider on Cell I think it'd mesh quite well.

Kleegamefan said:
Xenos will have close to 100% efficiency with those unified shaders though....RSX won't be able to match that for sure.....

Utilisation may be a better word than efficiency. And utilisation can be higher on a fixed architecture in a closed box like PS3 than on fixed architectures in PCs, which is something to bear in mind when ATi compares to PC chips or PC games..
 
Every hardware topic here ends up with gofreak (no offense, you are very knowledgeable) defending the PS3 and not having much positive to say about the 360?
 
Kleegamefan said:
Xenos will have close to 100% efficiency with those unified shaders though....RSX won't be able to match that for sure.....

Where did this meme get started? Seriously? "Close to 100% efficiency?" I think not. Just because they are unified doesn't mean they map 1:1 with every workload, nor does it mean there aren't going to be inefficiencies in arbitrating the resources that just don't exist on a classical architecture. Do bubbles not exist in a unified pipeline?

Furthermore, the more general purpose you get, you generally trade-off computational ability per area.
 
If the RSX is basically a G70, why hasn't any silicon been produced? The G70 is already out for sale. So what I'm trying to say is: where did those 2 years of developing the RSX go if it was going to be just a slightly modified G70?
 
Can anyone tell me if this thing has shaders?

Yes it has shaders.

Have you seen the diagrams of the article?

If the RSX is basically a G70, why hasn't any silicon been produced? The G70 is already out for sale. So what I'm trying to say is: where did those 2 years of developing the RSX go if it was going to be just a slightly modified G70?

Because it is Sony that is going to manufacture the RSX, not TSMC.
 
Shorter Version said:
Vertex Shader
10 FP operations X 8 Shader X 550MHz = 44GFLOPS

Pixel Shader
27 FP operations X 24 Shader X 550MHz = 356.4GFLOPS

44GFLOPS + 356.4GFLOPS = 400.4GFLOPS

Xenos = 240GFLOPS

400.4/240 ~ 1.668

240/400.4 ~ .5994
 
Has anyone said that Xenos is "240 Gflop"?
240GFlop shader throughput has been stated by several in-depth Xenos articles - and at least some of them had access to info straight from ATI.

That said, there's a rather interesting leaked MS document floating around the web for the past month or two that claims a number lower than that :P
 
hadareud said:
Every hardware topic here ends up with gofreak (no offense, you are very knowledgeable) defending the PS3 and not having much positive to say about the 360?
It's already been pointed out that he is unbiased and any hint of him having any is laughable..... at least to some :p
 
sonycowboy said:
Well,

If the 360 has 1TFLOP and the CPU has 115.2 GFLOPS, then you'd think that Xenos is a good bit faster than just 240 GFLOPS.

That's programmable shader power, not total flops when you count everything, programmable or not.

Does anyone know where the supposed lower GFLOP figure for Xenos's shader array comes from? I saw a mangled translation from an article on B3D, but couldn't figure out what it was saying.

edit - ah, more "secret" MS docs floating around, haha. I need better contacts ;) Apparently it's the same as the info from this presentation though? http://www.watch.impress.co.jp/game/docs/20050520/x360_g.htm Those numbers match up?

Oh, and I'm sure I'm biased. Everyone is. Or at least I have preferences and/or different expectations for different things. But I think that's perfectly natural. I'm very impressed by X360, I'm just a bit more excited by PS3's potential in the long run. And I do think PS3 is a little more "marked" right now in terms of attempts to downplay it, which is the norm for any system which is perceived or "supposed" to be most powerful. Look at the technical to-ing and fro-ing Xbox had to endure last gen. So a combination of that and my (greater) excitement for it may lead me to "defend it". But that's just the way it is.

This is a purely technical thing also, btw. As my interest in the tech grew I found myself interested in games on two levels - technically and then just as games. Although I can't absolutely say my interest in one side does not influence the other, as far as the latter is concerned, I'd probably be a Nintendo fan more than anything (but I've been far more multiplatform since last gen) ;)
 
Oh Christ... not the FLOPS measurements again...

The real question here is: Can the Xenos live up to its hyped efficiency, and can the RSX actually hit its theoretical limits?

My answer to that is: I have no idea if the Xenos can live up to the claims, but I'm pretty sure the RSX can't (based on its architecture). But don't worry Sony fans, the PS3 is still a beast. :)
 