
Xenon GPU - Unified shader > *(plus high res Ruby vid)

jedimike

Member
Sorry if this was already posted. I went back a few pages and also did a search.

Tons of technical mumbo jumbo contained within.

Interview with ATI Vice Pres of Engineering

How many ATI engineers worked on the design of the Xenos GPU? Any statistics from this project you’d like to throw in?

Bob Feldstein: ATI had 175 engineers working on the Xenos GPU at the peak. These included architects, logic designers, physical design engineers, verification engineers, manufacturing test engineers, qualification engineers and management. The team was spread out between Orlando, Florida, Marlborough, Massachusetts and Toronto, Ontario. The team’s size varied during the life of the program – in fact, we still have about 20 engineers involved in various cost down projects.

When the Xbox 360 GPU features were unveiled, Nvidia expressed doubts about unified shader architecture, particularly about its performance. Do you think Nvidia’s comments are due to no Nvidia part, not even the RSX, having a unified shader architecture yet?

Bob Feldstein: Oh yes. Very much so.
 
I found this bit interesting:
The Xenos is made of two elements, the parent die, which is basically a shader core and also acts as the Northbridge; and the daughter die, which handles some functions traditionally executed inside a one-die GPU, like the FSAA, or alpha and Z logic.
What was the reason for these two dies to exist? Was it because of a physical constraint (difficulty in putting all these transistors into one single die) or an architectural need?


Bob Feldstein: There is no architectural reason for the two parts of the chip to exist on separate die. Instead, it was an economic decision. The daughter die that handles FSAA, alpha, stencil and Z contains a large array of dynamic memory. We have logic in this memory array and we call this combination Intelligent Memory. Because the Dynamic memory has a higher failure rate at manufacturing, it allows us to decouple the fallout from memory failures from the general fallout – and this saves us money overall. I believe that it would make sense in the future to combine the dies in a smaller geometry to save money.

This seems to be more about costs and not about performance....that is surprising...
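
To make the "decouple the fallout" point concrete, here is a rough sketch using a simple Poisson yield model. Every number in it (die areas, defect densities, cost units) is a hypothetical placeholder rather than a real Xenos figure, and the model ignores the extra cost of packaging two dies together.

```python
import math


def poisson_yield(area_mm2, defects_per_mm2):
    """Fraction of dies with zero defects under a simple Poisson model."""
    return math.exp(-area_mm2 * defects_per_mm2)


def cost_per_good_part(area_mm2, defects_per_mm2):
    """Silicon area paid for per working die (bad dies are thrown away)."""
    return area_mm2 / poisson_yield(area_mm2, defects_per_mm2)


# Hypothetical placeholder values, NOT real Xenos figures.
LOGIC_AREA, LOGIC_DEFECTS = 180.0, 0.002    # shader core (parent die)
MEMORY_AREA, MEMORY_DEFECTS = 70.0, 0.006   # eDRAM plus logic (daughter die)


def single_die_cost():
    # One big die: a defect anywhere scraps logic and memory together.
    combined_yield = (poisson_yield(LOGIC_AREA, LOGIC_DEFECTS)
                      * poisson_yield(MEMORY_AREA, MEMORY_DEFECTS))
    return (LOGIC_AREA + MEMORY_AREA) / combined_yield


def two_die_cost():
    # Two dies: a bad memory die no longer scraps a good logic die.
    return (cost_per_good_part(LOGIC_AREA, LOGIC_DEFECTS)
            + cost_per_good_part(MEMORY_AREA, MEMORY_DEFECTS))


if __name__ == "__main__":
    print(f"single die: {single_die_cost():.1f} area units per good chip")
    print(f"two dies:   {two_die_cost():.1f} area units per good chip")
```

In this toy model the split always comes out cheaper, because each die's failures only waste that die's silicon. Whether that survives real packaging and test costs is exactly the kind of thing only ATI would know.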


Let’s talk about the unified shader architecture. First, I’d like to know about its performance. I’m pretty sure a unified shader architecture makes things easier for developers, but is a unified shader pipeline as good (performance wise) as the current architecture seen in PC parts, that is, separate pixel and vertex processing units?

Bob Feldstein: The Unified Shader Architecture actually improves overall performance. To understand why, we need to look at what is unified.

In current architectures there are separate shader mechanisms with different instruction sets and different caching mechanisms. What ATI found is that, in real applications, one set of shaders, or the other, is often idle because either pixel or vertex processing dominates in a bursty manner. To handle the bursts, you need to build a lot of parallel shader resources – even though they will be idle as processing transitions from pixels to vertex.

The Unified Shaders combines the instruction sets, creates the right caching mechanisms and with a lot of other complication allows all the shader processors to be used for any problem. Thus, when pixel processing dominates, we can use all 48 shaders for pixel processing. When vertex processing dominates, we can use the 48 shaders for vertices. When the workload is some vertex and some pixel processing, we can mix the shader resources between the two programs.

He didn't really answer the question at all..."The Unified Shader Architecture actually improves overall performance"....improves performance compared to what?? nV2a? R480? RSX?......he doesn't answer this question and quickly shifts to the definition of a UMA....why???

Is it possible to measure the Xenos performance against a PC part? Or does the fact that they have different shader architectures, run with different operating systems, and system hardware, make a comparison too difficult?

Bob Feldstein: It really is difficult to measure the environments against each other. You would tend to write an app to run well in its intended environment. I would discourage trying to compare them in these manners. That being said, the console has certain advantages – most notably, the controlled environment. This is what allows us to overcome memory bandwidth bottlenecks with Intelligent Memory and make the 48 shader units operate at peak efficiency. A device could have an infinite number of shader ALUs, but without the memory bandwidth to feed the system all of the hardware would go for naught.


This is horseshit...there are standardized benchmark tests out there for stuff like fillrate, vertex rate, AA performance and many others as well....there are even some benchmarks out there for actual games, some of which reside both on PC and X360 (CoD2 and Quake 4 come to mind) and can easily be compared.....so I am supposed to believe ATI couldn't release benchmark results of Xenos against even ATI's own PC parts like R520 or R480???

Sorry, but ATI is *choosing* not to release those Xenos numbers...


The RSX has a 550 MHz clock speed. Does this 10% clock speed lead over the Xenos GPU necessarily mean that the PlayStation 3 GPU is more powerful than the Xbox 360 GPU? We won’t believe it until we see it, but if true, how is it possible that the PlayStation 3 can output two 1080p video streams simultaneously? That makes the RSX sound more powerful than the Xenos…

Bob Feldstein: No! These are inconsequential numbers that don’t reflect any reality concerning the system performance. The 1080p streams have no bearing on understanding system performance, and the clock speed means little.

Realize that the memory bandwidth is the bottleneck of graphics systems. ATI’s Intelligent Memory provides an almost infinite bandwidth path to memory – meaning that the Unified Shaders will never be stifled in getting work to do. The Sony processor is going to come up against memory bandwidth limitations constantly, negating any small clocking differential.

The Sony 1080p dual outputs are not an indication of performance – at best 1080p is an indication that Sony considered this resolution the sweet spot of the market. The use case of dual 1080p just shows that the RSX has a PC pedigree, and has been cobbled together with the console.


Not even gonna go there on this one...
 
Kleegamefan said:
He didn't really answer the question at all..."The Unified Shader Architecture actually improves overall performance"....improves performance compared to what?? nV2a? R480? RSX?......he doesn't answer this question and quickly shifts to the definition of a UMA....why???

Well, obviously if you had a GPU with 48 vertex and 48 pixel shaders, it'd be more powerful. I think what he means is that the unified architecture ends up being the most efficient use of 48 shader units.
 
Kleegamefan said:
I found this bit interesting:


This seems to be more about costs and not about performance....that is surprising...
That applies to it being a separate die, not to it being embedded RAM with attached logic, which is the reason why the 360 GPU will likely outperform the RSX in bandwidth.
 
IJoel said:
Well, obviously if you had a GPU with 48 vertex and 48 pixel shaders, it'd be more powerful. I think what he means is that the unified architecture ends up being the most efficient use of 48 shader units.


Everyone knows UMAs are more efficient than traditional vertex/pixel shaders....what we don't know is how fast it is in comparison......guess we won't know for sure till R600 (early 2007-ish)
 
Kleegamefan said:
Everyone knows UMAs are more efficient than traditional vertex/pixel shaders....what we don't know is how fast it is in comparison......guess we won't know for sure till R600 (early 2007-ish)

Well, in comparison to what? You could probably theoretically equate the performance of the 500MHz Xenos to a current GPU design running at 250MHz with 48 Pixel and 48 Vertex shaders, since Xenos supposedly can't do mixed processing during a cycle. Obviously this would also depend on how efficient their algorithm is.

And then it's not quite as straightforward as that, depending on the shader array distribution and the instruction processing and how it compares, but again, there's nothing out there that this can be compared to, unless you mean benchmarks. I doubt they'll release benchmarks before Sony/NVidia release such a thing, if they do at all.
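
The 500MHz vs 250MHz comparison can be written out as simple arithmetic. This sketch assumes every unit is equally fast per clock and that the unified pool splits its time between vertex and pixel work; both are simplifications, and the function name is made up for illustration.

```python
def unit_cycles_per_second(units, clock_mhz, time_share=1.0):
    """Shader-unit cycles per second available to one kind of work."""
    return units * clock_mhz * 1_000_000 * time_share


# Unified: 48 units at 500 MHz, spending half their time on each job.
unified_pixel = unit_cycles_per_second(48, 500, time_share=0.5)
unified_vertex = unit_cycles_per_second(48, 500, time_share=0.5)

# Dedicated: 48 pixel + 48 vertex units at 250 MHz, each always on its own job.
dedicated_pixel = unit_cycles_per_second(48, 250)
dedicated_vertex = unit_cycles_per_second(48, 250)

# With an 80/20 pixel-heavy split the unified pool can shift capacity,
# while the dedicated pixel units stay capped at their fixed share.
unified_pixel_heavy = unit_cycles_per_second(48, 500, time_share=0.8)

if __name__ == "__main__":
    print("even split matches:", unified_pixel == dedicated_pixel)
    print("pixel-heavy advantage:", unified_pixel_heavy > dedicated_pixel)
```

So the equivalence only holds for an even split; once the mix is lopsided, the unified pool has more capacity for whichever job dominates, under these assumptions.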
 
IJoel said:
Well, in comparison to what? You could probably theoretically equate the performance of the 500MHz Xenos to a current GPU design running at 250MHz with 48 Pixel and 48 Vertex shaders, since Xenos supposedly can't do mixed processing during a cycle. Obviously this would also depend on how efficient their algorithm is.


I hope you are not implying a UMA ALU is just as fast as a vertex/pixel ALU at doing vertex/pixel work, because that is not the case...

The upside of UMAs are widely covered but the downside of a general purpose UMA ALU is it is NOT as fast at doing pixel work as a dedicated pixel shader or as fast at doing vertex work as a dedicated vertex shader...

Nobody seems to want to talk about this...
 
Kleegamefan said:
I hope you are not implying a UMA ALU is just as fast as a vertex/pixel ALU at doing vertex/pixel work, because that is not the case...

The upside of UMAs are widely covered but the downside of a general purpose UMA ALU is it is NOT as fast at doing pixel work as a dedicated pixel shader or as fast at doing vertex work as a dedicated vertex shader...

Nobody seems to want to talk about this...
No one wants to talk about it because the other guy hasn't released their product. Why start a pissing contest when you don't know what you're aiming at?

In the end though all those numbers are irrelevant. As the generation advances people, thank god, talk about games and sales numbers of course :) .
 
Kleegamefan said:
I hope you are not implying a UMA ALU is just as fast as a vertex/pixel ALU at doing vertex/pixel work, because that is not the case...

The upside of UMAs are widely covered but the downside of a general purpose UMA ALU is it is NOT as fast at doing pixel work as a dedicated pixel shader or as fast at doing vertex work as a dedicated vertex shader...

Nobody seems to want to talk about this...

Hmm... It really all depends on the types of operations performed per cycle, in order to determine whether it's 'as fast' or not.

"Current ATI hardware is able to perform two 3 wide vector and two scalar operations per cycle in the pixel pipe alone. The vertex pipeline of R420 is 6 wide and can do one vector 4 and one scalar op per cycle. If we look at straight up processing power, this gives R420 the ability to crunch 158 components (30 of which are 32bit and 128 are limited to 24bit precision). The Xbox GPU is able to crunch 240 32bit components in its shader units per clock cycle. Where this is a 51% increase in the number of ops that can be done per cycle (as well as a general increase in precision), we can't expect these 48 pipelines to act like 3 sets of R420 pipelines. All things being equal, this increase (when only looking at ops/cycle) would be only as powerful as a 24 piped R420." From: http://www.anandtech.com/video/showdoc.aspx?i=2453&p=7

That might speak better to what you were referring to.
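
The numbers in that quote can be reproduced with a few lines. The 16 pixel pipes for R420 are not stated in the quote (they are inferred from 128 / 8), and the one vec4 plus one scalar op per Xenos ALU is likewise an assumption that happens to give the quoted 240; the "24 piped" scaling at the end is just one reading of the last sentence.

```python
# Per-clock component counts as described in the AnandTech quote.
R420_PIXEL_PIPES = 16   # assumption: not in the quote, inferred from 128 / 8
R420_VERTEX_PIPES = 6
XENOS_ALUS = 48

PIXEL_COMPONENTS_PER_PIPE = 2 * 3 + 2 * 1   # two vec3 ops + two scalar ops
VERTEX_COMPONENTS_PER_PIPE = 4 + 1          # one vec4 op + one scalar op
XENOS_COMPONENTS_PER_ALU = 4 + 1            # assumption: vec4 + scalar per ALU


def r420_components(pixel_pipes=R420_PIXEL_PIPES,
                    vertex_pipes=R420_VERTEX_PIPES):
    """Total components per clock for an R420-style split design."""
    return (pixel_pipes * PIXEL_COMPONENTS_PER_PIPE
            + vertex_pipes * VERTEX_COMPONENTS_PER_PIPE)


def xenos_components():
    """Total components per clock for 48 unified ALUs."""
    return XENOS_ALUS * XENOS_COMPONENTS_PER_ALU


if __name__ == "__main__":
    r420 = r420_components()
    xenos = xenos_components()
    print("R420 components/clock:", r420)
    print("Xenos components/clock:", xenos)
    print(f"increase: {xenos / r420 - 1:.1%}")
    # Hypothetical R420 scaled up 1.5x: 24 pixel pipes, 9 vertex pipes.
    print("scaled R420 components/clock:", r420_components(24, 9))
```

The scaled-up figure lands within a few components of the Xenos total, which seems to be where the "24 piped R420" comparison comes from.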
 
What were the goals and challenges that ATI faced in developing the Xbox 360 GPU?

Bob Feldstein: The challenges included creating on schedule a platform that can live for five years without enhancement.
Microsoft’s aggressive performance specifications for the system forced ATI to once again think outside the box –in this case, the PC market. After making the breakthrough that we needed by thinking of this product as a console product only, the innovations -- Intelligent Memory, Unified Shader, Modeling Engine -- came more easily. Then the architecture team had to come through in record time to stay ahead of an aggressive implementation team.

Next Xbox in 5 years folks :)
 
Kleegamefan said:
I hope you are not implying a UMA ALU is just as fast as a vertex/pixel ALU at doing vertex/pixel work, because that is not the case...

The upside of UMAs are widely covered but the downside of a general purpose UMA ALU is it is NOT as fast at doing pixel work as a dedicated pixel shader or as fast at doing vertex work as a dedicated vertex shader...

Nobody seems to want to talk about this...

Well take into consideration idle cycles. Yes a dedicated pixel pipeline and dedicated vertex pipeline would be best at what it respectively does, but a UMA eliminates the idle cycles when the workload is primarily vertex oriented or pixel oriented.

Take a typical video card with, say, 16 pixel pipelines (PP) and 8 vertex shaders (VS). When the load is primarily pixel work you'd utilize all 16 PP while the 8 VS sit idle; in the reverse case, the 8 VS are utilized and the 16 PP sit there doing nothing. During combinations of the two, the pipelines switch back and forth and there is never full utilization of all of them at one time; some are always sitting idle. With a UMA, even if the units are slightly slower at the job at hand, 48 is 48. You have 48 for one, 48 for the other, and all of them are being utilized at all times regardless of whether you are pixel or vertex heavy. That is why he stresses its efficiency so much.

This is also why PCs are switching over to this architecture beginning with the switch to Windows Vista. R600 will be fully unified. UMA is the way to go.
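
A toy model of the idle-cycle argument, for anyone who wants to play with the numbers. To isolate the effect it compares a fixed 8 VS + 16 PP design against a unified pool of the same 24 units (not 48), assumes the fixed stages overlap perfectly, and uses made-up workloads. The `per_unit_speed` knob is a hypothetical penalty for a unified ALU being slower per clock, not a measured value.

```python
def fixed_clocks(vertex_work, pixel_work, vs_units=8, ps_units=16):
    """Clocks to finish a batch when each unit type only does its own job.

    Assumes the two stages overlap perfectly, so the slower one sets the pace.
    """
    return max(vertex_work / vs_units, pixel_work / ps_units)


def unified_clocks(vertex_work, pixel_work, units=24, per_unit_speed=1.0):
    """Clocks to finish the same batch when any unit can take any work item.

    per_unit_speed < 1.0 models a unified ALU that is slower per clock
    than a dedicated one (a hypothetical penalty).
    """
    return (vertex_work + pixel_work) / (units * per_unit_speed)


def utilization(vertex_work, pixel_work, total_units, clocks):
    """Fraction of available unit-clocks that did useful work."""
    return (vertex_work + pixel_work) / (total_units * clocks)


if __name__ == "__main__":
    # Made-up workloads, in "work items" per batch.
    workloads = {"balanced": (800, 1600), "pixel heavy": (100, 2300)}
    for name, (v, p) in workloads.items():
        f = fixed_clocks(v, p)
        u = unified_clocks(v, p)
        slow = unified_clocks(v, p, per_unit_speed=0.8)
        print(f"{name}: fixed {f:.1f} clocks "
              f"({utilization(v, p, 24, f):.0%} busy), "
              f"unified {u:.1f}, unified with 20% penalty {slow:.1f}")
```

In this model the unified pool only wins when the mix is lopsided. On a perfectly balanced load a per-unit penalty makes it slower, which is the tradeoff being argued about in this thread.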
 
Kleegamefan said:
He didn't really answer the question at all..."The Unified Shader Architecture actually improves overall performance"....improves performance compared to what?? nV2a? R480? RSX?......he doesn't answer this question and quickly shifts to the definition of a UMA....why???

Yes, he answered the question as to "why" it improves performance.

The question wasn't about what makes it "faster" than anything else. He is explaining the design philosophy behind the system and why they chose that path.

It's all about efficiency.
 
another very good piece right here:

The interface to the system’s memory is 128-bit. Isn’t this a bottleneck considering the bandwidth-intensive tasks performed in the GPU? Why was a 128-bit bus selected when PC parts already implement 256-bit buses in their high-end editions?

Bob Feldstein: Excellent question because it gets to the heart of what is right in the system design. We have a great deal of internal memory in the daughter die referred to above. We actually use this memory as our back buffer. In addition, all anti-aliasing resolves, Z-Buffering and Alpha Blending occur within this internal memory. This means our highest bandwidth clients (Z, Alpha and FSAA) occur internally to the chip and don’t need to access main memory. This makes the 128 bit interface to system memory, and the ensuing bandwidth, more than enough for our needs because we are offloading the bandwidth hogs to internal memory.
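
For scale, the raw bandwidth of a memory bus is just width times transfer rate. The sketch below assumes 1.4 billion transfers per second (700 MHz GDDR3, double data rate), a commonly reported figure for the 360 that is not stated in the interview, so treat it as an assumption.

```python
def bus_bandwidth_gb_s(bus_width_bits, transfers_per_second):
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return (bus_width_bits / 8) * transfers_per_second / 1e9


# Assumed transfer rate: 700 MHz GDDR3, double data rate (not from the interview).
ASSUMED_TRANSFERS_PER_SECOND = 1.4e9

if __name__ == "__main__":
    for width in (128, 256):
        gb_s = bus_bandwidth_gb_s(width, ASSUMED_TRANSFERS_PER_SECOND)
        print(f"{width}-bit bus: {gb_s:.1f} GB/s")
```

At the same transfer rate a 256-bit bus simply doubles the figure. His argument is that the Z, alpha and FSAA traffic never touches this bus at all, so the narrower one is enough.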
 
http://66.102.7.104/search?q=cache:...ied+Shader+performance&hl=en&client=firefox-a

The difference between equivalent configurations for non-unified and unified shader architectures shows that the unified architecture can use the larger shader pool to shade vertices at a faster rate.
The improvement is small though, ranging from a 1% to an 8% depending on the configuration. The reason is that the frames in our trace are mostly limited by fragment shading.

Another reason is that the same configuration is used for the geometry stages in both architectures and is currently limited to a throughput of 1 vertex and 1 triangle per cycle. The vertex data fetch from memory may also become a bottleneck for not properly aligned or interleaved streams.
 
Kleegamefan said:

Haha... i was reading that the other day. I was looking for it to quote it.

From the same document:
The evaluated unified shader architecture proves to be 15% to 30% more efficient, in terms of area, with a 2% to 7% improvement in performance when compared with a similar non-unified architecture.

Of course, it's all moot because it comes down to actual operations/cycle and how efficient the compiler and their algorithm are at feeding instructions to the pipeline.
 
The idea behind it is actually quite simple. You don't want your resources sitting idle doing nothing for you when you can instead take advantage of them more efficiently in order to create a greater output. The same idea is behind many design decisions in computing.
 
IJoel said:
Haha... i was reading that the other day. I was looking for it to quote it.

From the same document:


Of course, it's all moot because it comes down to actual operations/cycle and how efficient the compiler and their algorithm are at feeding instructions to the pipeline.
What hardware are they using to reach those conclusions?
 
Again, I am aware of the *efficiency* advantage of UMAs and I have a decent understanding of the theory behind UMAs in Xenos.....

AFAIK, a (as in one, single) vertex shader is better/faster than a (singular) UMA ALU at vertex ops clock for clock (efficiency depending on the effectiveness of vertex buffer, of course)....

The same is true of a dedicated pixel shader vs a UMA ALU....

The tradeoff of a dedicated pixel/vertex shader is efficiency/flexibility because they will not always match the needs of every game or app and so one will be left idle waiting for the other...

UMAs eliminate this problem but the downside is they are jacks of both trades (vertex, pixel), masters of neither, and so are slower than a traditional pixel/vertex shader....this is probably why Xenos has 48 ALUs, to make up for this deficit...

Again, the overall efficiency of UMAs will also help with the speed deficit too, but with no benchmark shader tests to draw from, we do not know exactly what "UMA benefit" we are getting in Xenos....

MS/ATI propaganda says UMA are "better" but to what extent, exactly?


This is all I am asking...
 
Everyone just do KLEE a favor and say something like "yeah dude. RSX is definitely more powerful than Xenos, Traditional shaders ftw!" so that he can feel better and go to bed happy.
 
Kleegamefan said:
This is all I am asking...

And what kind of answer can we give you? Logic dictates that the degree of benefit of unified shaders is directly dependent on the characteristics of the software. Nobody here knows how much slower the unified shaders are compared to traditional shaders, so why argue about it? Wait a while and let the games speak for themselves -_-
 
Shogmaster said:
Everyone just do KLEE a favor and say something like "yeah dude. RSX is definitely more powerful than Xenos, Traditional shaders ftw!" so that he can feel better and go to bed happy.


We will see if this nets you the desired effect Shog...
 
Tenacious-V said:
Well take into consideration idle cycles. Yes a dedicated pixel pipeline and dedicated vertex pipeline would be best at what it respectively does, but a UMA eliminates the idle cycles when the workload is primarily vertex oriented or pixel oriented.

Take a typical video card with, say, 16 pixel pipelines (PP) and 8 vertex shaders (VS). When the load is primarily pixel work you'd utilize all 16 PP while the 8 VS sit idle; in the reverse case, the 8 VS are utilized and the 16 PP sit there doing nothing. During combinations of the two, the pipelines switch back and forth and there is never full utilization of all of them at one time; some are always sitting idle. With a UMA, even if the units are slightly slower at the job at hand, 48 is 48. You have 48 for one, 48 for the other, and all of them are being utilized at all times regardless of whether you are pixel or vertex heavy. That is why he stresses its efficiency so much.

This is also why PCs are switching over to this architecture beginning with the switch to Windows Vista. R600 will be fully unified. UMA is the way to go.
Hmmm, one'd imagine there would be some way to get the idle VSs/PSs to do some work without resorting to unification. There should be a way to decouple what they're doing, i.e., have the VSs/PSs handle their work ahead of time while the other half finishes its job. Say you split the screen into multiple areas: all VSs speed through one section, leaving the PSs to do their work on that one, and then the VSs immediately start the next batch/portion (while the PSs work on the previous section), and so on, with both busy virtually all the time. Would something like that be unviable? As for unified, I'd heard there was a necessary management h/w overhead (and difficulty implementing optimally) with unification, which'd take away die space.
 
Kleegamefan said:
MS/ATI propaganda says UMA are "better" but to what extent, exactly?


This is all I am asking...
Don't hold your breath waiting for these console makers to benchmark their products. It didn't happen last gen and it probably won't happen this gen.
 
Kleegamefan said:
Oh, thanks luv :D

See? You're happier already!

MISSION:
fc29.jpg
 
Klee! RSX's traditional shaders are definitely better than Xenos' crappy Unified shaders. ATI gave MS their failed R400 experiment with a fancy new name.

Good night!
 
YellowAce said:
Klee! RSX's traditional shaders are definitely better than Xenos' crappy Unified shaders. ATI gave MS their failed R400 experiment with a fancy new name.

Good night!


Damn you YellowAce, damn you all to HECK!!!


Well, you win this round Mr. Shog!!!!





*whisks off into the night air*
 
some of which reside both on PC and X360 (CoD2 and Quake 4 come to mind) and can easily be compared

Wouldn't games have to be programmed from the beginning to fully support either a unified pipeline or the separate units? Just imagine if a game was developed exclusively for the 360 and ported to PC...you'd have the same issue.
 
Fight for Freeform said:
Wouldn't games have to be programmed from the beginning to fully support either a unified pipeline or the separate units? Just imagine if a game was developed exclusively for the 360 and ported to PC...you'd have the same issue.

*fingers in ears*

LA-LA-LA-LA-LA-LA-LAAAA.....I CANT HEAR YOU 'CAUSE I'M NOT HERE!!!
 
YellowAce said:
Klee! RSX's traditional shaders are definitely better than Xenos' crappy Unified shaders. ATI gave MS their failed R400 experiment with a fancy new name.

Good night!


it'll be fun to watch RSX choke on complex HDTV visuals with AA and everything else, with only a 128-bit bus to support it, and NO HIGH-BANDWIDTH EDRAM to take the load off.
 
Kleegamefan said:

You forgot the important part:

The difference between equivalent configurations for non-unified and unified shader architectures shows that the unified architecture can use the larger shader pool to shade vertices at a faster rate. The improvement is small though, ranging from a 1% to an 8% depending on the configuration. The reason is that the frames in our trace are mostly limited by fragment shading.

If the frames they tested were limited by pixel shaders, then duh, unified shading isn't going to help you much relative to a standard GPU architecture today. Most standard PC GPUs are heavy on the pixel shaders and light on the vertex shaders because most PC games are heavy on the pixel shaders and light on the vertex shaders.

Because 360 is a closed console platform, developers will find uses for the huge vertex shading throughput enabled by a unified shading architecture.

This is another reason why Xenos is a good example of a purpose-built console GPU, not a derivative of a current PC GPU.
 
Xenos is, so far, the most impressive piece of next-gen technology IMO.
more than Cell CPU, RSX GPU or Xenon CPU


the Revolution controller might be even more interesting though :D
 
aaaaa0 said:
You forgot the important part:



If the frames they tested were limited by pixel shaders, then duh, unified shading isn't going to help you much relative to a standard GPU architecture today. Most standard PC GPUs are heavy on the pixel shaders and light on the vertex shaders because most PC games are heavy on the pixel shaders and light on the vertex shaders.

Because 360 is a closed console platform, developers will find uses for the huge vertex shading throughput enabled by a unified shading architecture.

This is another reason why Xenos is a good example of a purpose-built console GPU, not a derivative of a current PC GPU.
Kleegamefan "but but butt"

baaa.gif
 
ThirdEye said:

X1900 $695.99> Xenon GPU $299-399.99 or ps3 gpu=? $$$>

I'll take the 360 or ps3 over that 600 dollar video card any day. Hell, that 695 card almost cost as much as a 360 and ps3 combined if ps3 launches at 399...or take a revolution and either 360 or ps3 and it will be the same. Consoles FTW
 
One other interesting point they try to make in the paper is that they claim the unified architecture is about 30% more efficient for the same silicon die area as the non-unified architecture.

Extrapolating the numbers naively, this implies that the 235 million transistor unified shader core in Xenos should perform roughly as well as the 300 million transistor traditional shader core in RSX -- but Xenos has a 90 million transistor daughter die packed with ROPs and embedded memory on top of it.
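
The naive extrapolation is just a multiplication. This sketch takes the paper's 15% to 30% area-efficiency range at face value and treats transistor count as a stand-in for die area, which is a big simplification; the function name is made up.

```python
def equivalent_traditional_transistors(unified_millions, area_efficiency_gain):
    """Transistor budget (in millions) a non-unified core would need for
    similar throughput, taking the paper's area-efficiency figure at face value."""
    return unified_millions * (1 + area_efficiency_gain)


XENOS_SHADER_CORE_M = 235  # figure quoted in the post above

if __name__ == "__main__":
    for gain in (0.15, 0.30):
        equiv = equivalent_traditional_transistors(XENOS_SHADER_CORE_M, gain)
        print(f"{gain:.0%} gain -> about {equiv:.1f} million transistors")
```

Only the top of the paper's range gets close to the 300 million figure; at the low end the comparison looks less favorable.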
 
aaaaa0 said:
If the frames they tested were limited by pixel shaders, then duh, unified shading isn't going to help you much relative to a standard GPU architecture today. Most standard PC GPUs are heavy on the pixel shaders and light on the vertex shaders because most PC games are heavy on the pixel shaders and light on the vertex shaders.

Because 360 is a closed console platform, developers will find uses for the huge vertex shading throughput enabled by a unified shading architecture.

Your argument is incoherent; you posit state A, use it to justify your argument of why this paper is irrelevant, and then posit that state A isn't universal.

The fact is that Fragment Processing is and will continue to scale to higher processing requirements than per-Vertex computation. The higher fragment processing capability is for a computational reason. It's not an inverse chicken-egg situation (which your argument requires) in which the existence of a bias towards fragment shading resources led developers to use them more -- that's asinine.

As the paper shows, the Unified Architecture basically gains an advantage when the workload is unbalanced and variable. You, by your own argument, state that the target is a "closed console platform," which inherently allows for extremely fine-tuned and invariant conditions. Not exactly the situation one in your position would want to rest an argument on.
 
Vince said:
Your argument is incoherent; you posit state A, use it to justify your argument of why this paper is irrelevant, and then posit that state A isn't universal.

The fact is that Fragment Processing is and will continue to scale to higher processing requirements than per-Vertex computation. The higher fragment processing capability is for a computational reason. It's not an inverse chicken-egg situation (which your argument requires) in which the existence of a bias towards fragment shading resources led developers to use them more -- that's asinine.

Every time KLEE gets out-argued on anything PS3 related, you just happen to pop up. Coincidence? Or are you Batman to his Commissioner Gordon?

KLEE just rang you up on the Vince-phone, didn't he? :lol
 
Vince said:
The fact is that Fragment Processing is and will continue to scale to higher processing requirements than per-Vertex computation. The higher fragment processing capability is for a computational reason.

PC games have traditionally been pixel heavy and vertex light. This is pretty easy to check with a performance analyzer.

PC gamers like to run at high resolutions with high anisotropy and high levels of AA. All of these stress fillrate and pixel shading, which is why PC GPUs like the G70 and R520 are heavy on the pixel shaders and light on the vertex shaders. ATI and NVidia keep building their GPUs this way because that's what wins benchmarks ON PC GAMES.

Vince said:
It's not an inverse chicken-egg situation (which your argument requires) in which the existence of a bias towards fragment shading resources led developers to use them more -- that's asinine.

Ugh. Have you ever written a single line of game code?

Real world example:

PS2 was great at alpha blending, but lousy at per pixel effects. So developers went nuts with the alpha blending. Result: MGS2.

Xbox, not so great at alpha blending, but pretty good at per pixel effects. So developers go nuts with per-pixel effects. Result: Splinter Cell.

The point is, software written for closed platforms like consoles tends to end up being molded by the hardware in the box. This is just a fact of life, and as a developer you should know this.

I completely expect the same kinds of things to happen on PS3 and xbox 360 this generation. A good console developer molds his software to the strengths of the hardware he's tasked with.
 
aaaaa0 said:
Real world example:

PS2 was great at alpha blending, but lousy at per pixel effects. So developers went nuts with the alpha blending. Result: MGS2.

Xbox, not so great at alpha blending, but pretty good at per pixel effects. So developers go nuts with per-pixel effects. Result: Splinter Cell.

Which are both *gasp* console games! Try comprehending my argument before posting next time douche.... I specifically was talking about the PC (which is where we can see SM3 utilization): the reason you see fragment processing more heavily biased in both ATI and nVidia's hardware is that it scales faster than per-vertex work. In no way does the bias towards devoting area towards fragment shading imply anything BUT the fact that it's a more resource heavy task and scales extremely fast.

Some real world examples on the PC of how the IHVs don't control what developers utilize:

TruForm, 3Dc, NV Occlusion Query, etc, etc..


Yes, he edited to make it comprehensible!
aaaaa0 said:
PC gamers like to run at high resolutions with high anisotropy and high levels of AA. All of these stress fillrate and pixel shading, which is why PC GPUs like the G70 and R520 are heavy on the pixel shaders and light on the vertex shaders. ATI and NVidia keep building their GPUs this way because that's what wins benchmarks ON PC GAMES.

PC games, typically, run bounded at a resolution which necessitates output of around as many pixels as 1080p, which is decidedly not that alien to next generation consoles. The bias towards fragment shading is more fundamental than just to win benchmarks; look at how the tasks scale.

PC Games - the good ones - are a better indication of where resources are typically wanted and devoted than a console game for the reasons you stress so adamantly: they are openly developed without regard for performance levels, but rather for the effects achievable. On the console, you have conditions in which you stress what the machine is good at, not what a developer would desire.

And, amazingly, on the PC we see this trend towards more biased fragment, arithmetic, computational ability. nVidia's analysis of over 1,000 of the most common fragment programs led towards the current doubling of MADD/clock and similar per-frame analysis also led towards the current vertex to fragment division of area.
 
Vince said:
Which are both *gasp* console games! Try comprehending my argument before posting next time douche....

Is "douche" really necessary?

I specifically was talking about the PC (which is where we can see SM3 utilization): the reason you see fragment processing more heavily biased in both ATI and nVidia's hardware is that it scales faster than per-vertex work.

How do you mean it scales faster? Do you mean the demand for pixel shading scales faster, or that it's easier to scale up pixel shaders in your design?

Because the first depends on the rendering techniques you choose, and the second I don't believe is true.

In no way does the bias towards devoting area towards fragment shading imply anything BUT the fact that it's a more resource heavy task and scales extremely fast.

IHVs build their cards to win benchmarks. If the benchmarks are heavy on fill and pixel, then IHVs will build cards heavy on fill and pixel. It's as simple as that.
 