Xenon GPU - Unified shader > *(plus high res Ruby vid)

I mean, just think about the issue of legacy code, of backward compatibility.

If the PC space behaved exactly like the console space or even exactly bi-univocally as posited, would we really have the processor architectures and implementations we have today in the desktop PC space?

The desktop PC is a realm where feature and performance innovation basically had to be transparent to the application developer. We have seen how quickly non-transparent innovations such as MMX were adopted en masse in the desktop PC space ;). SSE fared better, but partly because it allowed an easier implementation than MMX, which caused issues when the programmer had to go back and forth between MMX and x87 FPU code, and partly because... surprise, surprise, multimedia application developers badly wanted more vector processing acceleration on the CPU side of things.

In the console realm we have massive fan wars discussing if BC should even be considered a worthy option or an utter waste of transistors satisfying poor fans (some have taken this position too). I do not see how you can avoid separating the two markets.

If PSOne developers had approached the transition from PSOne to PSTwo thinking "cool, so we can upgrade our assets but keep our code-base mostly the same, optimizing only a few time-critical loops and extending the reach of our solutions to bigger worlds, more detailed characters, more advanced A.I., etc..." without considering how many massive changes would be needed all around their development pipeline, you can imagine SCE's response being something like J.J. Jameson's response to Peter when he asked (Spider-Man 2 reference) to be paid in advance for his photography.

spiderman3_jonha_jameson.jpg
 
But, but a good part of the fragment processing power goes to rendering the images and applying the FX without problems at HD resolutions.

I have always seen the PS3 as designed for a True-HD environment and the 360 as designed for an HD environment, with the 360 being better off since it has a UMA configuration and has to render less than half as many pixels per frame as the PS3.

In terms of raw fragment processing power the RSX crushes the Xenos, but it has to drive a higher resolution.
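
As a rough check on the "less than half" pixel figure, here is a minimal sketch. It assumes "HD" means 1280x720 and "True-HD" means 1920x1080; the post does not name the resolutions, so both are assumptions made for illustration.

```python
# Assumed resolutions: the post gives no numbers, so 720p and 1080p
# stand in for "HD" and "True-HD".
def pixels_per_frame(width: int, height: int) -> int:
    """Number of pixels in one rendered frame."""
    return width * height

hd = pixels_per_frame(1280, 720)
true_hd = pixels_per_frame(1920, 1080)
ratio = hd / true_hd

print(f"HD: {hd} pixels, True-HD: {true_hd} pixels, ratio: {ratio:.3f}")
```

Under those assumptions the 720p frame has a bit under 45% of the pixels of the 1080p frame, which is consistent with "less than half".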
 
Besides developing hardware, ATI always helps developers by releasing tools, source code samples, etc. For example, we heard about a displaced subdivision surfaces algorithm developed by Vineet Goel of ATI Research Orlando. Are you helping Xbox 360 developers leverage the power of the Xenos GPU or is that a Microsoft responsibility?

Bob Feldstein: We have teamed with Microsoft to enable developers. We have had some members of the developer relation and tools teams work directly at developer events and assist in training Microsoft employees. We are ready to help at anytime but the truth is that we have been quite impressed by how Microsoft is handling developer relationships.

I like his reply to that question. Basically co-signs what everyone else says
 
Who the fuck really cares?

Some games look like utter dogshit, some look good; the same will be true for PS3.

And I thought console gamers were clever enough to not fight over transistors, polys/sec and memory bandwidth. I get enough of these wars between GeForce and Radeon fanbois in the PC area.
 
I thought we had this discussion already? Some people did some very basic sums last year, and although the Xenos is unified, the dedicated pipes in the RSX should be quicker, even when all the Xenos pipes are doing vertex work. So the RSX vertex pipes outperform a Xenos at full tilt, even when they still have their pixel pipes sitting idle.

You can't just say that unified is more efficient because there aren't any pipes left idle. You have to compare the performance of each pipe in each of its modes (vertex/pixel) with its traditional neighbour.
 
If RSX really ends up being less powerful (or same), and if we end up with games even on PS3 that come from good devs, but use no aniso filtering, have to compromise resolution, have vsync problems, etc - even in games that look unimpressive - both consoles fail at my next gen graphics criteria.

Xenos is an interesting piece of hardware on paper, but in real life so far it has performed lower than what paper specs would suggest. I mean, let's face it - is there anyone here who anticipated virtually all the flagship games would have no AF?
 
Marconelly said:
Xenos is an interesting piece of hardware on paper, but in real life so far it has performed lower than what paper specs would suggest. I mean, let's face it - is there anyone here who anticipated virtually all the flagship games would have no AF?

Don't forget the lack of any AA. From what I can tell, the 360 seems to be between a 6800 and a 7800 in performance. The PS3 isn't expected to be that great either.
 
Deg said:
Don't forget the lack of any AA. From what I can tell, the 360 seems to be between a 6800 and a 7800 in performance. The PS3 isn't expected to be that great either.

There are fairly sound explanations for why early software has often lacked AA despite MS's assurances, that suggest that for (some) titles going forward it shouldn't be such a problem.

I also don't know what the PS3 comment is suggesting. If you're throwing it in with your suggestion of "between 6800 and 7800 in performance", I'd have to disagree.
 
Panajev2001a said:
you can imagine SCE's response being something like J.J. Jameson's response to Peter when he asked (Spider-Man 2 reference) to be paid in advance for his photography.

spiderman3_jonha_jameson.jpg

Love the SM2 reference. Great movie :D
 
mrklaw said:
I thought we had this discussion already? Some people did some very basic sums last year, and although the Xenos is unified, the dedicated pipes in the RSX should be quicker, even when all the Xenos pipes are doing vertex work. So the RSX vertex pipes outperform a Xenos at full tilt, even when they still have their pixel pipes sitting idle.

What ?

48 Shader ALU's dedicated to Vertex processing at 500 MHz slower than 8 (likely) VS ALU's clocked at 550 MHz ????
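
The comparison being questioned here can be put as back-of-the-envelope arithmetic. This is a minimal sketch that multiplies ALU count by clock and nothing more: it assumes every ALU issues one operation per clock and ignores ALU width, setup limits and memory, and the figure of 8 vertex shader ALUs is the post's own "likely" guess rather than a confirmed spec.

```python
def peak_alu_cycles_per_second(alu_count: int, clock_mhz: float) -> float:
    """Naive peak: ALUs x clock, assuming one op per ALU per clock (an assumption)."""
    return alu_count * clock_mhz * 1e6

# All 48 unified ALUs assigned to vertex work, as in the post's scenario.
xenos_vertex = peak_alu_cycles_per_second(48, 500)
# 8 dedicated vertex shader ALUs at 550 MHz ("likely" per the post).
rsx_vertex = peak_alu_cycles_per_second(8, 550)

print(f"Xenos (all vertex): {xenos_vertex:.3g} ALU cycles/s")
print(f"RSX vertex units:   {rsx_vertex:.3g} ALU cycles/s")
print(f"ratio: {xenos_vertex / rsx_vertex:.2f}x")
```

On this crude measure the 48 ALUs come out roughly 5.5 times ahead, which is why the claim looks surprising; per-ALU capability would have to differ by more than that factor to reverse it.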
 
YellowAce said:
I'm running out of toilet paper.

Dude, you probably keep it in front of the big Flat screen in the spare room, you know... with all those giant pixels (no Blu-Ray yet ;)) you probably thought "crap" and realized a new place to store your tp.
 
Panajev2001a said:
What ?

48 Shader ALU's dedicated to Vertex processing at 500 MHz slower than 8 (likely) VS ALU's clocked at 550 MHz ????
I think the thing was that Xenos is limited in vertex throughput by its clock or something?
 
Panajev2001a said:
What ?

48 Shader ALU's dedicated to Vertex processing at 500 MHz slower than 8 (likely) VS ALU's clocked at 550 MHz ????

Yeah, I think he had vertex mixed up with pixel shading there. Xenos' 48 ALUs all tasked to PS vs. RSX's dedicated pipes (24, 32, whatever it ends up being) would be the comparison one might be looking for. And obviously a bench we'll never have any hard evidence for.
 
Marconelly said:
I think the thing was that Xenos is limited in vertex throughput by its clock or something?

There's a setup-limit, as in every gpu, but it wouldn't prevent you busying your ALUs with vertex shading in an arbitrary example, if you wanted.

For PS3 to match Xenos working only on vertex shading would require pulling in Cell, I think. Of course, that still leaves the pixel shaders available also.
 
Interviewer: So Mr. nVIDIA, what's better? Unified shaders or the traditional approach of vertex and pixel shader pipelines?

nVidia PR Guy: Oh....Traditional of course! There's an inherent inefficiency in that design that traditional pipelines always outperform.

Interviewer: Hmmm...Very interesting. So I guess that means we'll never see unified shaders on nVIDIA hardware, huh?

nVidia PR Guy: Yes, someday.

Interviewer: But, I thought you just said that it isn't efficient?

nVidia PR Guy: No, no, no...What I should have said was that ATI's version of unified shaders is inefficient...we'll be better, of course.

Interviewer: Of course.
 
3rdman said:
Interviewer: So Mr. nVIDIA, what's better? Unified shaders or the traditional approach of vertex and pixel shader pipelines?

nVidia PR Guy: Oh....Traditional of course! There's an inherent inefficiency in that design that traditional pipelines always outperform.

Interviewer: Hmmm...Very interesting. So I guess that means we'll never see unified shaders on nVIDIA hardware, huh?

nVidia PR Guy: Yes, someday.

Interviewer: But, I thought you just said that it isn't efficient?

nVidia PR Guy: No, no, no...What I should have said was that ATI's version of unified shaders is inefficient...we'll be better, of course.

Interviewer: Of course.

I believe their stance was that there were benefits in maintaining discrete units given the current vertex and pixel shading model (3.0/3.0+), not that it would never be viable or desirable. You could ask the same question of ATi and why they're not using unified shaders in their PC parts yet. A more obvious answer might be API support (?).

That aside, I think nVidia have learned the hard way that being early or first with new features isn't always the greatest of ideas, certainly in the PC space where there's little room for mess-ups on first implementations.
 
gofreak said:
I believe their stance was that there were benefits in maintaining discrete units given the current vertex and pixel shading model (3.0/3.0+), not that it would never be viable or desirable. You could ask the same question of ATi and why they're not using unified shaders in their PC parts yet. A more obvious answer might be API support (?).

That aside, I think nVidia have learned the hard way that being early or first with new features isn't always the greatest of ideas, certainly in the PC space where there's little room for mess-ups on first implementations.

Oh, I know...I was just paraphrasing an interview I once read from nVIDIA concerning their stance on unified shaders. In the end, if unified shaders are so inefficient compared to traditional pipelines, why switch over? Even nVIDIA has conceded that they are headed in that direction. That was the point I was trying to make. :P
 
It's kinda interesting reading about this little subdebate over logic. In an interview I just finished reading about the X1900 architecture over at B3D's main site (quite interesting by the way, I suggest you guys read it), I came across this answer when they asked about the tie in of 3:1 shader processors to pipes (48:16).

[Eric Demers] We talk to ISVs to get an idea of what they think will happen in the next few years. As well, we look at shaders for titles coming out next year or two. With both of those pieces of data, we can get a pretty good idea at what will be the newest titles at the introduction time. Those are definitely the types of titles we shoot for, much more so than the older titles (which people will play less of anyway by then). Having said that, there is some guesswork involved as well. But it's also a chicken and egg thing, in that ISVs will tell us what they are doing, but they will also be influenced to designing games with our new technology in mind. If we come out with 3:1 ALU:TEX ratio HW, then designers will tend to add more ALUs for next games, and so it's a mutually influenced evolution.

http://www.beyond3d.com/reviews/ati/r580/int/
 
3rdman said:
Oh, I know...I was just paraphrasing an interview I once read from nVIDIA concerning their stance on unified shaders. In the end, if unified shaders are so inefficient compared to traditional pipelines, why switch over? Even nVIDIA has conceded that they are headed in that direction. That was the point I was trying to make. :P

Fair enough, although the interviews I remember do specifically talk about the benefit of separation in a SM3.0/+ model, and specific optimisations for each type of processing in that model, which is a better "excuse" if you want to put it that way ;)

Regardless, I wonder how relevant it is in a closed box situation as we're facing with the consoles. One of the big motivators for unified shaders is the problem you face in the PC space where you have games with different mixes of vertex vs pixel instructions seeking to run on an equally varied set of gpus with different proportions of vertex vs pixel shaders in the hardware. That presents a challenge for utilisation and optimisation. A unified setup pretty much negates that. In a closed box, however, you should be automatically gaining on the utilisation side because you only have one chip to worry about, and its peculiarities and characteristics. Sure, the capability to "burst" one type of processing - or specifically, vertex processing - in some shader passes is nice, but again, with a closed box, you could find other ways to do that (that also leave you with a simultaneous capability to "burst" pixel shading).
 
3rdman said:
Oh, I know...I was just paraphrasing an interview I once read from nVIDIA concerning their stance on unified shaders. In the end, if unified shaders are so inefficient compared to traditional pipelines, why switch over? Even nVIDIA has conceded that they are headed in that direction. That was the point I was trying to make. :P
I remember the interview you are talking about, and in it he said that they keep evaluating unified shader designs, but so far their traditional designs are performing better. Unified shaders are technically more efficient in the percentage of shader utilization, but there's a difference between efficiency and raw performance/speed.
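
The difference between utilization and raw speed can be illustrated with made-up numbers. In this sketch both designs are hypothetical (they are not Xenos or RSX figures), and the unit counts, clock and utilization percentages are assumptions chosen only to show that the two measures can point in opposite directions.

```python
def effective_throughput(units: int, ops_per_unit_per_clock: float,
                         clock_mhz: float, utilization: float) -> float:
    """Ops per microsecond actually delivered: peak rate scaled by utilization."""
    return units * ops_per_unit_per_clock * clock_mhz * utilization

# Hypothetical "unified" design: fewer units, kept busy 95% of the time.
unified = effective_throughput(units=40, ops_per_unit_per_clock=1.0,
                               clock_mhz=500, utilization=0.95)
# Hypothetical "traditional" design: more units, busy only 70% of the time.
traditional = effective_throughput(units=60, ops_per_unit_per_clock=1.0,
                                   clock_mhz=500, utilization=0.70)

print(f"unified:     {unified:.0f} ops/us at 95% utilization")
print(f"traditional: {traditional:.0f} ops/us at 70% utilization")
```

With these invented figures the less efficient design still delivers more work per second, simply because it has more units to waste. Different numbers would flip the result, which is the whole point of the disagreement.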
 
Marconelly said:
I remember the interview you are talking about, and in it he said that they keep evaluating unified shader designs, but so far their traditional designs are performing better. Unified shaders are technically more efficient in the percentage of shader utilization, but there's a difference between efficiency and raw performance/speed.

That's comparing their traditional design to their own unified architecture. It could basically mean that their own unified shader implementation wasn't up to snuff compared to their traditional architecture, hence the "traditional is better" statement. It says nothing about unified shader architectures as a whole; ATi's implementation could be far more advanced than nVidia's. That could also be why ATi are already releasing their next PC architecture as a unified design.
 
Tenacious-V said:
That could also be why ATi are already releasing their next PC architecture as a unified design.

The R600 will be, but rumours are cropping up that the G80 will be unified also. We shall see.
 
Yeah, I got them the wrong way round. I thought the RSX pixel shaders (even though there may be fewer of them) could actually do more ops than all the unified shaders on Xenos?

Anyway, it's apples vs oranges and we'll have to wait for the games.
 
Tenacious-V said:
That's comparing their traditional design to their own unified architecture. It could basically mean that their own unified shader implementation wasn't up to snuff compared to their traditional architecture, hence the "traditional is better" statement.
Yeah, he never said otherwise in that interview.

It says nothing about unified shader architectures as a whole; ATi's implementation could be far more advanced than nVidia's.
Looking at the performance of their X1800, and especially X1900, that were developed and came out around the same time as Xenos, I find that difficult to believe.
 
Marconelly said:
Looking at the performance of their X1800, and especially X1900, that were developed and came out around the same time as Xenos, I find that difficult to believe.

Why would you find that hard to believe? Don't compare Xenos to the PC parts; it was a completely custom design for MS, conceived, developed and manufactured in the span of like 2 years, I believe. MS and ATi worked together to come up with what MS desired as a console GPU. If you want a true apples-to-apples comparison of unified vs traditional, you'll have to compare R600 to whatever its equivalent is when it arrives later this year, as that's ATi's full-tilt unified shader design from the ground up.
 
My lack of graphics coding experience shows here, but I don't see how API calls would keep ATI from going to a unified shader architecture (USA) right now. WGF standards don't come into play until Vista AFAIK, and yet we're hearing about two cards scheduled for this summer that should boast unified shaders. If nothing else, the front-end of the R600 and G80 could look identical to that of the current cards, yet feature all the benefits of unified shaders.

I think it's evident that from a performance standpoint, traditional pipes work better than the unified shaders in Xenos. I say this b/c now, two products later, ATI is still releasing traditional cards. If Xenos's unified architecture were the clear performance leader, I don't see why ATI would have ignored it to this point. Technology doesn't stand still, it keeps progressing, so why give NVidia a chance to scoop them?

My guess has always been that the advantages of USA have been countered by the limit on how many such pipes you can fit on a chip. We can look at Xenos packing 48 ALUs into 230M transistors, but then need to remember the eDRAM, which removed a lot of logic from the main die. As a PC product, I don't think a 48-ALU card at 500MHz would perform well enough against the 7800GTX. I'm assuming ATI and NVidia are gonna wait to put WGF support into the new USA cards, and hopefully be able to squeeze more mature unified pipes into those cards. Just my guess, b/c I don't see why else ATI wouldn't have used it for the X1900. PEACE.

P.S. Both NVidia and ATI need to come up with new naming conventions. The large numbers and ambiguous suffixes are completely worthless.
 
Vince said:
Ohh, see I totally agree with you here -- beyond dispute X360 games developed for X360 will kick ass (haha). My overarching point of contention is that the subset of tasks which are well suited to a unified shader, i.e. unbalanced/variant workloads, may not map all that well onto the set of tasks that a developer wants to utilize and run.



No, there is causation if you know how the programs scale in terms of computational complexity and functionality. There is also ample precedent in that accelerated vertex work predates fragment processing, yet it was purposely overlooked and surpassed by fragment shading. Current trends are reinforcing my point even further, as the SIMD arrays are being utilized in GPGPU work whereas the vertex shaders are sitting idle.

It's more akin to evolutionary selection due to morphology, which is conditional.

(Random observation because I'm bored at work)

Give me an I! Give me a N! Give me a T! Give me a P! What's that spell!

I-N-T-P

That being said...why can't more of you have the INTP gene? I think drohne might have it, too.

I love logic battles. :)
 
gofreak said:
Regardless, I wonder how relevant it is in a closed box situation as we're facing with the consoles. One of the big motivators for unified shaders is the problem you face in the PC space where you have games with different mixes of vertex vs pixel instructions seeking to run on an equally varied set of gpus with different proportions of vertex vs pixel shaders in the hardware. That presents a challenge for utilisation and optimisation. A unified setup pretty much negates that. In a closed box, however, you should be automatically gaining on the utilisation side because you only have one chip to worry about, and its peculiarities and characteristics. Sure, the capability to "burst" one type of processing - or specifically, vertex processing - in some shader passes is nice, but again, with a closed box, you could find other ways to do that (that also leave you with a simultaneous capability to "burst" pixel shading).
I don't get the closed box theory. You can't really design a game that way because you don't know what the user is going to be focusing on in his camera view. The bursts happen at different times during the game based on point of view. I don't see how a dev can code for that.
 
Pimpwerx said:
My lack of graphics coding experience shows here, but I don't see how API calls would keep ATI from going to a unified shader architecture (USA) right now. WGF standards don't come into play until Vista AFAIK, and yet we're hearing about two cards scheduled for this summer that should boast unified shaders. If nothing else, the front-end of the R600 and G80 could look identical to that of the current cards, yet feature all the benefits of unified shaders.

I think it's evident that from a performance standpoint, traditional pipes work better than the unified shaders in Xenos. I say this b/c now, two products later, ATI is still releasing traditional cards. If Xenos's unified architecture were the clear performance leader, I don't see why ATI would have ignored it to this point. Technology doesn't stand still, it keeps progressing, so why give NVidia a chance to scoop them?

My guess has always been that the advantages of USA have been countered by the limit on how many such pipes you can fit on a chip. We can look at Xenos packing 48 ALUs into 230M transistors, but then need to remember the eDRAM, which removed a lot of logic from the main die. As a PC product, I don't think a 48-ALU card at 500MHz would perform well enough against the 7800GTX. I'm assuming ATI and NVidia are gonna wait to put WGF support into the new USA cards, and hopefully be able to squeeze more mature unified pipes into those cards. Just my guess, b/c I don't see why else ATI wouldn't have used it for the X1900. PEACE.

P.S. Both NVidia and ATI need to come up with new naming conventions. The large numbers and ambiguous suffixes are completely worthless.


The last 2 gens of cards that ATI released have still been based on the 9700 architecture, though.

Maybe it's as simple as ATI not having enough resources to develop Xenos and a PC unified shader chip at the same time?
 
dorio said:
I don't get the closed box theory. You can't really design a game that way because you don't know what the user is going to be focusing on in his camera view. The bursts happen at different times during the game based on point of view. I don't see how a dev can code for that.

A dev can't code to take advantage of that in a unified system either. Your frame may be processed quicker if the instruction mix deviates from the "typical" case enshrined in fixed hardware - that's the advantage in a unified system - but you're still bound by the worst case.
 
gofreak said:
A dev can't code to take advantage of that in a unified system either. Your frame may be processed quicker if the instruction mix deviates from the "typical" case enshrined in fixed hardware - that's the advantage in a unified system - but you're still bound by the worst case.
But a dev doesn't have to code for that in a unified system, because the caching and the fact that you can marshal all processing units at will handle that automatically, in theory at least.
 
dorio said:
But a dev doesn't have to code for that in a unified system, because the caching and the fact that you can marshal all processing units at will handle that automatically, in theory at least.

When I say "code to take advantage of that", I mean, code so that when for example the proportion of pixel shading decreases, you could increase the amount of vertex processing dynamically for that frame. You can't do that. You'll get your frame back quicker in those cases, that's the benefit. On either architecture you are bound by your worst case in a scene.
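
The worst-case point can be sketched with a toy frame-time model. Everything here is an assumption for illustration: work is in arbitrary units, the fixed design has 8 vertex and 24 pixel units running in parallel, and the unified design has the same 32 units in one pool with perfect load balancing.

```python
def fixed_frame_time(vertex_work: float, pixel_work: float,
                     vertex_units: int, pixel_units: int) -> float:
    """Dedicated units run side by side; the slower side bounds the frame."""
    return max(vertex_work / vertex_units, pixel_work / pixel_units)

def unified_frame_time(vertex_work: float, pixel_work: float,
                       total_units: int) -> float:
    """Any unit can take any work (idealised: no scheduling overhead)."""
    return (vertex_work + pixel_work) / total_units

# Hypothetical frames as (vertex_work, pixel_work).
frames = {
    "typical (matches fixed ratio)": (80, 240),
    "vertex-heavy, same total":      (160, 160),
    "worst case, both heavy":        (160, 480),
}

for name, (v, p) in frames.items():
    fixed = fixed_frame_time(v, p, vertex_units=8, pixel_units=24)
    unified = unified_frame_time(v, p, total_units=32)
    print(f"{name}: fixed={fixed:.1f}, unified={unified:.1f}")
```

In this model the unified pool returns the atypical frame sooner, but the heaviest frame takes the same time on both designs, so a frame-rate target set by the worst case is the same either way.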
 
gofreak said:
A dev can't code to take advantage of that in a unified system either. Your frame may be processed quicker if the instruction mix deviates from the "typical" case enshrined in fixed hardware - that's the advantage in a unified system - but you're still bound by the worst case.

As is the case in general in computing.
 
StoOgE said:
Anything that can be expressed in any human language is a question of logic. Logic is the basis of all human language; you really can't escape it. To say something isn't a question of logic is absolutely stupid, and anyone from Wittgenstein to Chomsky would agree with that (for those who aren't into linguistics or the development of language, they are polar opposites on the nature of language but agree on the above comment; it's a basic truism. I could try to explain it, but it would take a lot longer than I'm willing to dedicate to this).

His argument begs the question. He assumes A is true (GPUs are built around x), then proves this by saying B (GPUs are good at x). The problem is he is trying to prove A, not B.

So his argument goes something like

B
A->B
therefore: A

A may be sufficient for B, but it is not necessary for B. A would need to be both necessary and sufficient for B in order to prove A. The problem is, if A is both necessary and sufficient for B, then B is also necessary and sufficient for A. So either A doesn't give us B, or A is sufficient for B, in which case B is also sufficient for A, and then we fall into the epistemic regress I already discussed.

Can't escape it; it's flawed logic. No amount of technical jargon is going to get out of it either: it's circular.
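
The argument form written out above (from B and A->B, conclude A) can be checked mechanically. This is a minimal sketch that enumerates the truth table and looks for rows where both premises hold but the conclusion fails; the helper names are made up for illustration.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication: p -> q is false only when p is true and q is false."""
    return (not p) or q

def counterexamples() -> list[tuple[bool, bool]]:
    """(A, B) assignments where B and A->B are both true but A is false."""
    return [
        (a, b)
        for a, b in product([False, True], repeat=2)
        if b and implies(a, b) and not a
    ]

print(counterexamples())  # [(False, True)]
```

A single counterexample (A false, B true) is enough to show the form is not valid on its own; if A and B were equivalent, as in the "necessary and sufficient" case, that row would be ruled out.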

Regardless of the logic at hand here, the axiomatic assumptions need to be laid out to define the nature of this beast in order to come to some solution. The nature of A and B are within the confines of their own space... let's say for a moment that they do not exist in a world along with other influences and variables. Thus, they won't be bi-conditional... there is neither a causation nor an effect, but rather a discrete problem and a discrete answer with the parameters and rules already set (example: 1 + 1 = 2... a discrete mathematical/computational problem with a discrete answer).

Therefore, within its isolated context, the nature of the current-day display (emanating from the CRT TVs of old), with the 2 dimensions on which the graphics will be output, is that of a grid array of pixels. Let's just say a pixel is a point in space (even though it's not). Given the nature of geometry and the fact that a vertex is the point/pixel in which 2 lines made of multiple pixels/points meet to form an angle, this necessitates that there will never be as many vertices on screen at once as there are number of pixels which can be displayed and manipulated. Therefore, as we all pretty much know already, there will always be more operations to do on pixels, because there are always more pixels on screen than vertices. There are no exceptions to this rule because the principles of geometry say that there aren't any, and we assume the principles of geometry to be true, especially in this well-established 2D space.

Therefore, when resolution increases, the number of pixels which can be displayed scales faster than the number of vertices, each of which has to be represented by the joining of two sides of a polygon, themselves lines made up of many pixels each. This, of course, would be only one pass in a cycle of many per frame output. The need for operations to handle pixels would increase multiplicatively because of the multiple passes needed per frame of output.

Thus, it would be logical for engineers to design a graphics solution with independent pipes to lean heavily on the fragment pipes end of things. The nature of the output medium has, since the beginning, forced the developer to develop around the hardware which was developed around the display medium itself.
 
Dr_Cogent said:
As is the case in general in computing.

Of course.

Aside from frame-to-frame instruction mix juggling, there's the opportunity for speed-ups in some passes that are vertex-heavy or could be vertex-heavy versus a fixed architecture also (depending on whether the rest of the fixed architecture - the pixel shaders - do actually NEED to be idle in that time, or if there are smarter ways to keep them busy in the meantime). But then it's a question of what the impact would be over the whole frame..

And that's just considering computational concerns. Tradeoffs elsewhere to accommodate the command-handling and threading logic may create barriers to better performance with other passes, for example (as we've seen in Inis's experience with the nFactor2 engine). Issues like that aren't immediately obvious or something you'd think of necessarily, but it seems that caveats do exist (though I admit that these may not necessarily be inherent in the concept of a USA, but the implementation we have here).
 
Given the nature of geometry and the fact that a vertex is the point/pixel in which 2 lines made of multiple pixels/points meet to form an angle, this necessitates that there will never be as many vertices on screen at once as there are number of pixels which can be displayed and manipulated.

Micro-polygons, and algorithms based on subdividing geometry into such entities, kinda pose a problem for what you call a "necessary" conclusion.

Just a minor thing/correction really, as in the PC space we are still very far from using REYES in real-time for complex games... so, restricting things to the modern PC gaming space, there are no problems with the theory you state after that paragraph IMHO.
 
Marconelly said:
I remember the interview you are talking about, and in it he said that they keep evaluating unified shader designs, but so far their traditional designs are performing better. Unified shaders are technically more efficient in the percentage of shader utilization, but there's a difference between efficiency and raw performance/speed.

I got crucified last night for saying the same thing...

Efficiency doesn't always = speed....

Brute force can beat efficiency if you have enough of it (Brute force, that is)
 