WiiU "Latte" GPU Die Photo - GPU Feature Set And Power Analysis

We've never seen confirmation that Latte is in fact VLIW5 as far as I know. Could be an early GCN like design for all we know. Just like Xenos was some weird in-between thing.

This has been my theory since I started posting in here again: if it started out as VLIW, it was pushed toward more of a cluster design through the schedulers. Scheduling was VLIW5's biggest weakness; it often left only 4 of the 5 shader slots in use, and it would frequently drop to only 3.
 
That's what I was thinking. The constant "RAM bandwidth is crap" claims from some people, when it has never been shown to be an issue, are so frustrating.

Numbers are a talking point; people can compare them. The problem is that no one wants to try to understand anything. (A very general statement, not actually aimed at everyone or anyone in particular, just some of the people who argue over those points.)
 

The worst is people trying to make clock-speed-based claims. It's like people forgot what happened in the PC world over the last 10 years, with clock speed in itself becoming a meaningless metric.
 

Isn't GCN basically VLIW4 with better compute capabilities (oversimplified obviously)? Perhaps Latte is VLIW4 with the necessary customizations, or even somewhere in between VLIW4 and GCN.

Let's say it is VLIW4; then 160 shaders perform more or less like 200 VLIW5 shaders. I could be totally wrong, I'm just going by what I've read in the past about the VLIW5 -> VLIW4 -> GCN changes.
 

GCN is very different from VLIW4. http://www.anandtech.com/show/4455/amds-graphics-core-next-preview-amd-architects-for-compute/3 is a pretty good page on that.

Yes, VLIW4 does perform close to that, which is why the HD 5870 with 1600 ALUs and the HD 6950 with 1440 ALUs performed extremely close to each other. VLIW4 can just keep all four ALUs fed, while VLIW5 has a very big problem when it comes to firing all 5 units, especially continuously.
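For what it's worth, here's a rough back-of-envelope sketch in Python of why that works out. The slot-utilization numbers are purely illustrative assumptions on my part, not measured figures:

Code:
# Effective throughput = stream processors x average issue-slot utilization.
# The utilization values below are illustrative guesses, not specs.
def effective_sps(stream_processors, slot_utilization):
    return stream_processors * slot_utilization

vliw5 = effective_sps(200, 3.5 / 5)   # 200 VLIW5 SPs, ~3.5 of 5 slots busy -> 140.0
vliw4 = effective_sps(160, 3.6 / 4)   # 160 VLIW4 SPs, ~3.6 of 4 slots busy -> 144.0

print(vliw5, vliw4)   # roughly a wash

So 160 well-fed VLIW4 shaders landing in the same ballpark as 200 typically-fed VLIW5 shaders isn't a crazy claim.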
 
Before we had the Durango efficiency figure, that was a guessing game. However, Durango also has 3x as much eDRAM, an even more modern feature set and a more efficient architecture, and it only achieves 66% better efficiency from its ALUs, which makes this much less of a guessing game. We have to assume Nintendo basically overcame all of those negatives and achieved ~50% better efficiency on VLIW5 in order to just edge out Xenos with 160 ALUs. I'd assume that is possible, but you are still dealing with bad dev tools and unfinished hardware when it comes to launch titles like AC3 and CoD... It does seem a bit of a stretch, so I don't think a more custom design is out of the question. A developer has already said as much.

You can't rely on that figure too much; not every scenario is ALU limited. You wouldn't need to improve hardware efficiency all that much if memory thrashing or API limitations were your main bottlenecks. Besides, I don't think those efficiency figures refer to the GPU within the system as a whole, but to the GPUs themselves. For example, ceteris paribus, current nVidia GPUs will have a higher throughput of their rated FLOPS than AMD GPUs.
 

Right, but in this case, what is the difference? Obviously Microsoft is not saying that Durango performs 66% better than Xenos; they are saying that its ALUs perform 66% better than the ones in Xenos thanks to changes made to the GPU as a whole, not just to the ALUs themselves. That is the thing we are trying to get at, right? With all the differences between Xenos and GPU7, Wii U would need to keep those 160 ALUs fed 50% more efficiently than Xenos. Durango already gets 66% better than Xenos, and obviously, considering its architecture, Wii U should fall somewhere between the two.

So unless I am missing something, it makes sense to compare these two using Durango's efficiency numbers with the knowledge that it is GCN at its best. (embedded SRAM, dual engine, SIMD... everything taken into account)
 

This has been talked about on Beyond3D.

Well, the Wii U has a 10% clock advantage, so it probably has 10% more fill and texturing. Or maybe more; it's likely that, as well as having faster TMUs, the Wii U has more efficient TMUs. The 360's texture cache is apparently quite small, and memory latency is almost certainly lower on the Wii U.

So the Wii U is probably just an awful lot more efficient: the shaders will be bottlenecked less on either end than on the 360 (assuming ROP BW isn't an issue), the logic feeding the shaders will be better, VLIW5 is probably better than the 360's Vec4 + 1, and the GPU won't idle whenever it's copying data out from eDRAM to main memory. And thanks to early Z test it's probably actually working on fewer pixels too.

And maybe it's more efficient on small polys too - higher ROP to shader ratio and all that.

There are a lot of things that, when combined, could strongly stack up in favour of the Wii U GPU just being a hell of a lot more efficient. What was that figure that went round NeoGAF for Xenon vs Durango shader efficiency, 53% vs 100% or something? Possibly that particular example is bollocks, but the Wii U only needs to get 35% more work done per clock from its SIMD units over the length of a frame and it's past the 360...
That shader number came from the leaked docs, I believe. I've seen it posted before in one of the Xbox threads. I'm on my phone and it's hard to search.


http://beyond3d.com/showthread.php?t=60501&page=198

I am almost 99% sure it's a 160 ALU part at this point. I would be shocked if it were something else. Everything I have found points directly to it.
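To make the 35% figure in the quoted Beyond3D post concrete, here's the arithmetic as a quick sketch (assuming the usual 500 MHz Xenos / 550 MHz Latte clocks and the 160 ALU theory):

Code:
# Peak shader lane-cycles per second for each GPU (theoretical upper bound).
xenos_lanes = 48 * 5      # 48 ALUs, each issuing a vec4 + scalar (5 lanes)
latte_lanes = 160         # assuming the 160 ALU theory
xenos_clock = 500e6       # Hz
latte_clock = 550e6       # Hz, the ~10% clock advantage

ratio = (xenos_lanes * xenos_clock) / (latte_lanes * latte_clock)
print(round(ratio, 2))    # ~1.36 -> Latte's lanes need ~35% more useful work per clock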
 
So unless I am missing something, it makes sense to compare these two using Durango's efficiency numbers with the knowledge that it is GCN at its best. (embedded SRAM, dual engine, SIMD... everything taken into account)
We don't know that it's GCN at its best, and we don't know how much (if any) of that efficiency improvement is derived from things like the SRAM, how much from the shader cores going from first-gen unified shaders to GCN, and how much from changes to ROP and TMU ratios, etc. But my main point is that ALU throughput isn't the only thing to consider here, though it is certainly a large one. That's why I mentioned the (ED)RAM and API changes, because there are situations where shader power isn't going to be the bottleneck, or where software inefficiency may trump hardware efficiency. That, and a potential efficiency improvement for a theoretical performance measure isn't the most solid of bases to build on. But I agree that GPU7 shader efficiency is likely to fall somewhere between the two.
 

I don't really care about the ALU count so much as what 160 ALUs would have to do... The 35% in the quote plus the 10% overclock only gets to ~237 Xenos ALUs; it would have to be a bit past that, which is why I say 50% more efficient. That would mean Wii U's ALUs reach ~264 Xenos ALUs, leaving it with about 1/10th of Xenos' ALU resources for the GamePad while matching 360 games on the TV. This is the bare minimum IMO, roughly 10% better overall performance from the ALUs in Wii U, just because it has to have something extra for the GamePad when games are topping out Xenos' ALU resources.

Also, to A more normal bird: I'm figuring that the ALUs would stall while those things are stalling, so overall we are talking about the same thing. TMUs, ROPs, cache and the API all have to be at a certain level of efficiency not to bog down the Wii U's GPU. That is what I'm getting at with my numbers; it is not purely that each ALU can simply handle 50% more code by itself, but that the entire GPU allows the ALUs in GPU7 to outperform the ALUs in Xenos by 50%. If that is VLIW5, it's very impressive. If it is GCN, it is probably a bit bogged down still thanks to the bad dev tools... I think when we get a few years in, GPU7's efficiency might actually get closer to 60% better than Xenos, but that is just because we should allow some growth from the hardware, especially since no developer at launch was pushing the Wii U.
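Just to show where the ~237 and ~264 figures come from, a quick sketch (assuming the 10% clock advantage scales straight across, which is itself an assumption):

Code:
# "Xenos-ALU equivalents": 160 assumed Latte ALUs, scaled by a per-ALU
# efficiency gain over Xenos and by the 550/500 MHz clock ratio.
def xenos_equivalents(latte_alus, efficiency_gain, clock_ratio=550 / 500):
    return latte_alus * efficiency_gain * clock_ratio

print(round(xenos_equivalents(160, 1.35)))   # ~238, the "35% better" case
print(round(xenos_equivalents(160, 1.50)))   # 264, the "50% better" case
# Against Xenos' 240 lanes (48 x 5), 264 leaves roughly a tenth spare for the GamePad.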
 
The line of thinking seems to be that, if the Wii U version of a GPU-heavy game outperforms the Xbox 360 version, it needs at least as many GFLOPS. And to get there, one would need a certain number of shader units. Makes sense, right?

Except it doesn't, because traditional GPUs are apparently quite inefficient, reportedly mostly as a result of branching issues, which can reduce the overall real-world performance of a GPU by as much as ~85%, and stalls during texture reads, which can take hundreds or thousands of cycles. That's the problem with GFLOPS figures: they're highly theoretical and nowhere near the actual performance you'll get under real workloads. So essentially, if Nintendo managed to eliminate just one of those bottlenecks, the GFLOPS comparison becomes pretty much meaningless.
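As a toy illustration of how peak and delivered GFLOPS drift apart (every number here is made up for illustration, not a measurement of any real GPU):

Code:
# Toy model: peak FLOPS discounted by SIMD branch divergence and by cycles
# spent stalled on texture/memory reads. All inputs are illustrative guesses.
def delivered_gflops(peak_gflops, lane_usefulness, stall_fraction):
    # lane_usefulness: average fraction of SIMD lanes doing useful work
    # stall_fraction:  fraction of cycles lost waiting on memory
    return peak_gflops * lane_usefulness * (1.0 - stall_fraction)

print(delivered_gflops(240.0, 0.6, 0.4))   # 86.4 "real" GFLOPS
print(delivered_gflops(240.0, 0.6, 0.0))   # 144.0 once the stalls are gone

Remove one big bottleneck and the delivered number jumps, which is exactly why the headline GFLOPS comparison stops meaning much.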

Perhaps, before, I was underestimating how efficient Latte's SP could be compared to Xenos'.
We've never seen confirmation that Latte is in fact VLIW5 as far as I know. Could be an early GCN like design for all we know. Just like Xenos was some weird in-between thing.
True. If Nintendo was serious about efficiency, it is very unlikely for Latte to have the same VLIW5 architecture as the original r700.
 
It's been a long time since I looked at Xenos, but I thought it only had 48 ALUs.



*XENOS has 48 ALUs that are 16-way, and are grouped into 3 arrays of SIMD ALUs. Each ALU can co-issue a Vector4 and a scalar instruction simultaneously, essentially a "5D" operation per cycle (basically 2 Vec4 and 2 scalar instructions per cycle per ALU). The ALUs process everything in FP32 precision with no internal partial-precision requirements for FP16. Additionally, each of the 48 ALUs contains additional logic that performs all the pixel shader interpolation calculations. ATI suggests that this basically equates to an extra 33% of pixel shader computational capacity.

http://arstechnica.com/civis/viewtopic.php?f=22&t=381592

Yes, looks like it was...
 
This is where you sometimes hear 255 GFLOPS for Xenos: it could do 192 instructions per clock plus another ~63 pixel shader calculations per clock.
http://hardforum.com/showpost.php?p=1037111496&postcount=5
evolucion8 said:
That prediction is a fad. AMD's VLIW underutilization has always been an issue since its introduction with the HD 2x00 series. A good example of this: the Radeon HD 6870 has 1120 stream processors which, in reality, are 224 stream processors where every processor is capable of executing up to 5 instructions per clock, but only as long as they are from the same thread, and that's the catch.

Compare it to the scalar architecture of, for example, the GTX 460, which has 334 stream processors, where each processor is capable of executing one instruction per clock regardless of whether it's from the same thread or not. nVidia's approach in terms of performance is quite predictable and based on thread-level parallelism, while AMD's VLIW5 approach is oriented toward instruction-level parallelism, which means it requires compiler tricks to maximize its execution resources. So it sounds quite outstanding that the GTX 460 is almost able to keep up with the HD 6870 and its sheer amount of 1120 stream processors, but on the other hand you could also say it's a feat seeing the HD 6870, with its 224 stream processors, slightly outperform the GTX 460 with its 334. AMD's VLIW approach under maximum utilization is able to smoke anything that nVidia currently offers, but that's something that would only happen in rare, ideal circumstances; I will explain that below.

nVidia's approach is usually better as it requires little software optimization, but it also means the chip die will be much larger, as their shader processors are much fatter and consume far more power. AMD's approach is to accommodate much smaller stream processors to increase parallelism, which is easy in terms of hardware implementation but requires good software to work; graphics rendering is so parallel, though, that it explains why AMD has been trading blows for a while with a much smaller chip, especially since 2008.

But even with that, AMD knew that their VLIW5 performance wouldn't scale linearly forever with the increase in stream processors. So they did two little experiments. One of them is the HD 6870, which has lots of tweaks at the hardware level that allow its dual command queue processor to take more control of the shader resources compared to the HD 5000 series, which also explains why the HD 6870 performs so close to the HD 5870 while having 34% fewer stream processors and a smaller die. It is a feat.

Their second test is their Cayman GPU, which moved the design from VLIW5 to VLIW4, a move that clearly shows they're moving toward a more TLP-oriented design than previous generations, which were more ILP-oriented. That's because, according to AMD, they only saw an average of 3/4, or 60%-75%, utilization on their VLIW5 design in the best-case scenario, something that only happened 60% of the time, showing that a lot of hardware sat idle on the die. So instead of adding 4 little stream processors and one fat processor for special tasks, which certainly used a lot of space and idled a lot of the time, it made more sense to remove that fat processor and increase the computing performance of the remaining 4, making them equal in everything.

So in some circumstances where AMD can achieve maximum utilization of execution resources with a VLIW5 design (something very rare), it should be faster than VLIW4, as happened in the Civilization 5 compute tests. But if you couple the VLIW4 design with Barts' tweaks, you can achieve much better utilization of the VLIW4 resources than AMD managed on the HD 5000 series. That explains why the HD 6970, while it only has 1536 stream processors (as it has 24 SIMD engines), is faster than the HD 5870, which has 1600 stream processors (20 SIMD engines). The issue here is that Cayman was supposed to be a 28nm product, so other performance-enhancing features weren't added, as they would make the chip bigger than it already is. So think of this: if Cayman used the VLIW5 design, it would have 1920 stream processors, which also explains why it isn't much faster than the HD 5870.

So the secret sauce would be mixing Barts' optimizations (95% of the HD 5870's performance with 34% fewer stream processors), an increased processor count (from the HD 5870's 20 SIMDs to the HD 6970's 24 SIMDs), tessellation enhancements, a bigger VRAM size and some core clock bumps, which can give you between 20%-60% higher performance depending on how much advantage the software can take of Cayman's approach. So I do think that in the future, both GPU vendors with their different approaches will remain close for a while; nVidia's path is safer but also more expensive, but they have deeper pockets than AMD. I can't wait to see what nVidia/AMD will offer on their 28nm process!!

Hope that helps you clear it up.
 
It's been a long time since I looked at Xenos, but I thought it only had 48 ALUs.

http://arstechnica.com/civis/viewtopic.php?f=22&t=381592

Yes, looks like it was...
That quote was originally posted by somebody who'd seen the specs but did not quite know what he was looking at. For instance, "16-way ALUs" looks like a misinterpretation of the 16-ALU SIMDs (3 SIMDs * 16 ALUs = 48 ALUs), as Xenos' ALUs are not 16-way by any stretch of the imagination.
 

"AMD's VLIW underutilization always has been an issue since its induction on the HD 2x00 series. A good example of this. The Radeon HD 6870 has 1120 stream processors which in reality, are 224 stream processors in which every processor in there, is capable to execute up to 5 instructions per clock, but as far as it is from the same thread, and that's the catch."

The Xenos GPU, IIRC, has 48 "stream processors", an architecture allowing up to 5 operations per clock; basically very early VLIW5 stuff. Not sure about the way they are arranged, whether there are 3 SIMDs or not. The end result, however, I believe is correct.
 
The Wii U architecture being far more efficient than the 360's should not be a controversial position at this point. The 360 used a first-gen unified shader architecture; the Wii U uses AMD's fourth iteration of the technology. The design is not only going to be better at keeping the execution resources fed, but the more advanced shader model will offer certain shortcuts not available on the older hardware.
 

Such as?
 
We've never seen confirmation that Latte is in fact VLIW5 as far as I know. Could be an early GCN like design for all we know. Just like Xenos was some weird in-between thing.

I think anything beyond VLIW4 is highly unlikely. Nintendo is very risk-averse, and probably unwilling to try out an early GCN design. AMD isn't going to spend significant resources on a design just for Nintendo unless paid large sums, and I don't see Nintendo spending much on a completely custom design considering how much cost they've cut out of other areas. Latte is probably based around an existing AMD design, tweaked to play nicely with an eDRAM/logic process.

AMD had GCN taped out mid-2011, way too late for Nintendo.

Nintendo isn't going to release information so a lack of confirmation doesn't mean anything.

As far as launch goes, Nintendo sitting out the DirectX 9.0 era really put their teams behind the tech curve. I suspect they vastly underestimated the manpower needed to develop games not just in HD, but with advanced shaders.
 
From what I'm seeing, the GPU would need to be HD 6XXX based, or using tech from that generation, to achieve performance with 160 shaders that exceeds the Xenos GPU. So the only way this 160 shader theory would work is if the GPU were only a notch behind the PS4/Xbox 3's GPU "technologically". This would, in turn, make the GPU even more next-gen than we thought it was before.

The efficiency needed just doesn't seem to exist in the previous ATI/AMD GPU generations. I've yet to see even one concrete detail that could explain this being achieved.

I'm going to try filling in the letters with some of the theoretical positions that have been estimated. Maybe looking at it will yield more understanding.


EDIT: I used BG Assassin's initial guesswork to fill in some of the unidentified blocks.

wiiudieblockswip.jpg
 
/\ I do not agree with anything you post. As usual, you have nothing to back up anything you post.

The WiiU architecture being far more efficient than the 360s should not be a controversial position at this point. The 360 used a first gen unified shader architecture. WiiU uses AMD's fourth iteration of the technology. The design is not only going to be better at keeping the execution resources fed, but the more advanced shader model will offer certain shortcuts not available on the older hardware.

Yep agreed.

The more I look, the more positive I am that it's a 160 ALU part.

Found this about Xenos.

We found out a while ago that it's actually 216 for Xenos, since the scalar op is only one FP operation as opposed to 2, as was initially thought. That makes the total FLOPs per ALU, per cycle, 9 rather than 10.
http://beyond3d.com/showpost.php?p=749084&postcount=57
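Both figures fall straight out of the ALU math. A quick sketch, assuming the standard 500 MHz Xenos clock:

Code:
# Xenos peak shader FLOPS under the two interpretations discussed above.
alus = 48
clock_hz = 500e6
old_estimate = alus * 10 * clock_hz   # vec4 MADD (8) + scalar MADD (2) = 10 FLOPs/clk
revised      = alus * 9  * clock_hz   # scalar slot counted as 1 FLOP   = 9 FLOPs/clk

print(old_estimate / 1e9, revised / 1e9)   # 240.0 vs 216.0 GFLOPS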
 

You mean like the developers' comments I quoted in my previous post? Or the official photographic comparisons in the one before that? Or BGassassin's analysis for the one above?

You ignoring/disregarding them due to them contradicting your argument does not mean they aren't there.

I do not care if you choose to ignore my post but don't use a lie as grounds for it. I am not you. Don't get salty with me just because I keep pointing out the flaws in all of your arguments.




Back on topic: there is a spot I marked with questions that is disregarded in the labeling even though it is sectioned off. It appears to be logic, and Nintendo stated that there was no wasted silicon, I believe.

After thinking about it for a moment, could this possibly be the 8-bit CPU responsible for converting TEV code?
 

You mean the quote about lighting? Lol, the one you label as confirming shader improvements. It's just so silly.

A selection of in-game screenshots taken from the Wii U version of Most Wanted. High-res PC textures are the headline addition, but Criterion has also improved night-time lighting after employing new staff who previously worked in the motion picture business. In contrast to the other versions, the game begins at night perhaps to highlight the difference.


They hired a staff member who brought things over from the motion picture business. Guess that guy's name is "shader."

Official photos show nothing about shader performance, and it's just silly. But I am sure you think you can just look at a photo and tell that a 160 ALU part couldn't do that. It's crazy...
 
Also, to A more normal bird: I'm figuring that the ALUs would stall while those things are stalling, so overall we are talking about the same thing. TMUs, ROPs, cache and the API all have to be at a certain level of efficiency not to bog down the Wii U's GPU. That is what I'm getting at with my numbers; it is not purely that each ALU can simply handle 50% more code by itself, but that the entire GPU allows the ALUs in GPU7 to outperform the ALUs in Xenos by 50%. If that is VLIW5, it's very impressive. If it is GCN, it is probably a bit bogged down still thanks to the bad dev tools... I think when we get a few years in, GPU7's efficiency might actually get closer to 60% better than Xenos, but that is just because we should allow some growth from the hardware, especially since no developer at launch was pushing the Wii U.

Yeah, the ALUs would stall, but what if they were at less than 100% utilisation prior and maintaining a solid 60/30fps? We have no context for the 66% figure. If you had a game that didn't push Xenos to the limit FLOPs-wise but had to reduce IQ and texture res due to memory and tiling concerns, and ported it to the Wii-U, those bottlenecks would be gone and visuals could be improved. The ALUs would be fed more efficiently, but only in one particular scenario, and it wouldn't have to improve by whatever percentage is required to take you from 176 GFLOPS to 240. I also strongly doubt that the 66% figure for Durango includes software efficiency, because there would be great difficulty in making a comparison and any that could be made would probably have little relevance to an in-game scenario; accordingly, we don't know if Wii-U titles are stalling due to the CPU/inefficient code or if they're breezing through functions hampered by the SM3.0 featureset of the PS3/360.

Without access to detailed benchmarks and game code, or explicit comments by those with first-hand knowledge, this efficiency debate is just fumbling in the dark, especially when you include the nebulous Durango figure. All we can go by is what we see in-game: if it's a 160 ALU part, the improvements in GPU efficiency and other parts of the system are enough for it to generally keep pace with and perhaps exceed Xenos; if it's a 320 ALU part, then it's likely to be 50% to 100% more capable than Xenos.
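For reference, the raw peak numbers behind those two scenarios (a sketch using the commonly cited clocks; "capable" here means peak ALU throughput only, before any of the efficiency caveats above):

Code:
# Peak programmable-shader GFLOPS, counting a MAD as 2 FLOPs per lane per clock.
def peak_gflops(lanes, clock_ghz):
    return lanes * 2 * clock_ghz

xenos     = peak_gflops(240, 0.50)   # 48 ALUs x 5 lanes @ 500 MHz -> 240
latte_160 = peak_gflops(160, 0.55)   # 160 ALU theory   @ 550 MHz -> 176
latte_320 = peak_gflops(320, 0.55)   # 320 ALU theory   @ 550 MHz -> 352

print(latte_160 / xenos, latte_320 / xenos)   # ~0.73 and ~1.47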

You mean the quote about lighting? Lol, the one you label as confirming shader improvements. It's just so silly.

They hired a staff member who brought things over from the motion picture business. Guess that guy's name is "shader."

Official photos show nothing about shader performance, and it's just silly. But I am sure you think you can just look at a photo and tell that a 160 ALU part couldn't do that. It's crazy...

Others and I have said this repeatedly, but krizzx just ignores it and assumes that every single improvement is due to increased power/better tech, even with no evidence to suggest it.
 
I think when we get a few years in, GPU7's efficiency might actually get closer to 60% better than Xenos, but that is just because we should allow some growth from the hardware, especially since no developer at launch was pushing the Wii U.

That Criterion guy said that they had gotten pretty much everything they could out of the Wii U with their NFS:MW port.
 
Others and I have said this repeatedly, but krizzx just ignores it and assumes that every single improvement is due to increased power/better tech, even with no evidence to suggest it.

That is a lie. I am going by the developers' own comments. I did not ignore anything, as you provided nothing substantial to ignore. You simply outright dismissed what was written and wrote it off as something else. You refuse to provide any explanations backing up your own claims or for why the data I posted is false.

I will happily back up what I say, such is the reason I post quotes and photos, but if you refuse to take the words of actual known developers who have worked on the hardware and professional analysts, then saying anything more or trying to provide more evidence would simply be a wasted effort. So, I stopped wasting my time. Nothing more. I'm not going to keep arguing with you about this.


They directly correlated their improvements in the game with the strength of the Wii U, and I will take their professional, experienced words over yours unless you provide evidence to the contrary, which you didn't.

Now, will you cease with the personal attacks? I'm not here to fight a fan war.
Have I somehow still not made it clear that the 8-bit CPU has nothing to do with TEV code, or do you just lack reading comprehension?

I thought that was what you said Marcan had said? If it is not, then that is no problem. I don't have money invested in parts being for one thing or another. No reason to get offensive about it. I haven't read every word of every post in the thread, so I probably missed that one.

What is the 8-bit CPU for then?
 

I would quote the many times I clarified and even Marcan's original quote, but whatever...

It's for converting the video format.
 

Okay, understood.

Though, it still doesn't answer the question about the unmarked component on the GPU. Could that be where it is placed? I also noticed that the size and shape of the unmarked component matches K perfectly.
 

Based on position, not likely. I do not agree with most of the bg labelling, but I am too tired for another long analysis. Perhaps tomorrow.
 
Yeah, the ALUs would stall, but what if they were at less than 100% utilisation prior and maintaining a solid 60/30fps? We have no context for the 66% figure. If you had a game that didn't push Xenos to the limit FLOPs-wise but had to reduce IQ and texture res due to memory and tiling concerns, and ported it to the Wii-U, those bottlenecks would be gone and visuals could be improved. The ALUs would be fed more efficiently, but only in one particular scenario, and it wouldn't have to improve by whatever percentage is required to take you from 176 GFLOPS to 240. I also strongly doubt that the 66% figure for Durango includes software efficiency, because there would be great difficulty in making a comparison and any that could be made would probably have little relevance to an in-game scenario; accordingly, we don't know if Wii-U titles are stalling due to the CPU/inefficient code or if they're breezing through functions hampered by the SM3.0 featureset of the PS3/360.

Without access to detailed benchmarks and game code, or explicit comments by those with first-hand knowledge, this efficiency debate is just fumbling in the dark, especially when you include the nebulous Durango figure. All we can go by is what we see in-game: if it's a 160 ALU part, the improvements in GPU efficiency and other parts of the system are enough for it to generally keep pace with and perhaps exceed Xenos; if it's a 320 ALU part, then it's likely to be 50% to 100% more capable than Xenos.

I agree. I'm still working under the assumption that GPU7 produces at least 10% better performance than Xenos, and honestly I think showing how much more "efficient" Durango's GPU "is" over Xenos gives us a baseline for what (we assume is) a GCN architecture can do in a console setting. We roughly know how much more efficient GCN is than VLIW5 (a lot), so from there we can take things a bit further. Remember it is all assumptions and guesswork; we don't really have much solid information to go on. If you want answers, you have to remember they only come in proximity to reality when it comes to this product.

So yes, while you are right that ALU efficiency might not be "THAT" high, the chip as a whole should allow GPU7 to outperform Xenos by ~50% in efficiency, plus a 10% faster clock speed, to achieve an overall 10% or better increase over Xenos as a whole.
 

I'm seeing a strange split here. We are going from 160 being slightly better to 320 only being 50% better. No one finds that odd? I'm probably just over-analyzing something.
 
I just proposed an architectural change that would work very well with the 160 SU theory: Thread interleaving. Running 320 or 640 concurrent threads on 160 shader units. See this presentation, starting at page 31: http://s08.idav.ucdavis.edu/fatahalian-gpu-architecture.pdf

If that's even necessary with all the embedded memory available. Stalls typically occur during VRAM reads after all, and with ultra low latency local storage, there shouldn't be all that many stalls in the first place compared to traditional GPUs.
Interesting. Are the stalls for a typical GPU so bad that reducing them would lead to a possible 200-400% increase in performance? That sounds a bit high. I will have to check that PDF later when I get a chance.
 

If I'm understanding it right, the 200-400% increase comes from each ALU doing 2 to 4 instructions apiece per clock... similar to Xenos, but I don't think we should take that sort of thing seriously until we see games that are doing so much more than Xenos that 1.5-4x the 360 is reasonable to conclude. Doubling the threads would be fairly close to actually just having 320 ALUs in the GPU, if I'm not mistaken. While it is possible and shouldn't be ruled out, we should wait to see something that looks like it is doing that much more.
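A crude way to picture what interleaving buys you; the cycle counts and latency below are arbitrary illustrative values, not Latte or Xenos figures:

Code:
# Crude latency-hiding model: one shader unit, N threads interleaved.
# While one thread waits on a memory read, the unit works on the others.
def alu_utilization(threads, compute_cycles, memory_latency):
    busy = threads * compute_cycles
    return min(1.0, busy / (compute_cycles + memory_latency))

for n in (1, 2, 4, 8):
    print(n, "threads:", round(alu_utilization(n, 40, 300), 2))
# 1: 0.12, 2: 0.24, 4: 0.47, 8: 0.94 -- more threads in flight hide the same
# latency; lower-latency local memory gets you there with far fewer threads.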
 

I still have to read that link, but I assume the system comes with the usual issues of concurrency like resource locking, context switching etc? Basically a compromise? Will read later.

FakeEdit: Also, is this thread interleaving sort of what GCN does?
 
That is a lie. I am going by the developers' own comments. I did not ignore anything, as you provided nothing substantial to ignore. You simply outright dismissed what was written and wrote it off as something else. You refuse to provide any explanations backing up your own claims or for why the data I posted is false.

I will happily back up what I say, such is the reason I post quotes and photos, but if you refuse to take the words of actual known developers who have worked on the hardware and professional analysts, then saying anything more or trying to provide more evidence would simply be a wasted effort. So, I stopped wasting my time. Nothing more.

They directly correlated their improvements in the game with the strength of the Wii U, and I will take their professional, experienced words over yours unless you provide evidence to the contrary, which you didn't.

Now, will you cease with the personal attacks? I'm not here to fight a fan war.

I haven't made a single personal attack on you, only stated that you believe something without evidence. If you can find a single statement claiming that the improvements to night-time lighting in Most Wanted were only made possible by the extra power of the Wii-U, I will gladly retract my statement. I'm not disputing that the nighttime lighting was improved, but there are issues with assuming that it was due to hardware capability:
-The game was already available for PC, where there would have been ample power for any such changes.
- If this was a technological and not an artistic advancement, why was only the nighttime lighting and shading improved?

The same applies to Deus-Ex: there were improvements made to the game no doubt (shadow res and AA I believe were explicitly mentioned) but the bulk of the lighting and shading improvements were backported from the DLC. The DLC had better lighting than the main game, but the systems it was released for remained the same. This is not without precedent. Both Half-Life 2 and The Witcher 2 on PC received patches that improved their lighting and shading after their console ports were released. This is obviously not because the 360/PS3 had more power to allow for it, but because as time goes on and more money and labour is invested improvements to these elements can be made. Those are examples where more technologically advanced systems were backported; a refinement and tweaking of variables in an existing system would require less time and effort.
 

http://www.penny-arcade.com/report/...-wii-u-has-graphic-improvements-new-game-play

the Wii U hardware allowed the team to feature a new lighting system, improved fog, improved shadows, and antialiasing.
 
I've been doing analysis on the bg comparison, and I'm sure that Latte's T2 is the block directly to the lower right of O on "Brazos?".
I'm not sure if T1 is the same block, though. Latte's W1 and W2 appear almost identical to the two side-by-side components in the lower right corner, though I will lean towards the one BG has labeled as W.
bg's D looks spot on to me. I want to say that Latte's F is the one directly above the one marked O on "Brazos?" and E is the one directly under it.
U1 matches the component that is directly above W on "Brazos?".
 

No quote for NFS? Anyway, here you go:

http://www.polygon.com/2013/3/22/4136048/deus-ex-human-revolution-directors-cut-wii-u-pax-east
"After the release we continued to work with the game engine to improve some stuff because we knew we were doing DLC," he said. "In missing link, the visuals are better. We took all of that experience from Missing Link and added it to the entire game when working on the Wii U version."

That's a direct quote from the same source as the Penny Arcade article. EDIT: I should also note that the fog, shading and lighting are what was changed for the DLC on all systems. Shadow map resolution and AA I'm not sure about and could well be better on the Wii-U version than they are in the DLC for PS3 and 360.
 

I never argued that there weren't some improvements made in the past because of the DLC, but the dev did clearly state that there have been enhancements made "because" of the Wii U hardware. What else he said in the article does not displace that. You can continue to dismiss/redirect/ignore it if you wish.

"the Wii U hardware allowed the team to feature a new lighting system, improved fog, improved shadows, and antialiasing."

I'm not arguing this anymore.
 

That quote is from the editor, not the developer.

You'd do better to quote the whole thing
I asked Pedneault if the game had seen any improvements thanks to the Wii U's hardware. New controls were neat, but did the game actually run any better? Pedneault was cagey about details and wouldn't get into specifics, but said that the Wii U hardware allowed the team to feature a new lighting system, improved fog, improved shadows, and antialiasing.

“Right now, this is the best-looking Deus Ex,” Pedneault said. “It's even sharper than the PC version.” While the game does look noticeably nicer than it did on 360 and PS3, I couldn't compare it to a high-end PC, and when I pressed Pendeault to be specific about performance differences, he would only say that the team has worked on adjusting the game's engine, and that the Wii U hardware “helped” with that task.
 

"It's even sharper than the PC version." "While the game does look noticeably nicer than it did on 360 and PS3." I thought it was redundant and didn't see a need to quote two paragraphs. He still says the same thing: that it shows noticeable improvement over the 360/PS3, which means it isn't just old DLC enhancements. Otherwise it would look exactly the same.
 
I've been doing analysis on the bg comparison, and I'm sure that Latte's T2 is the block directly to the lower right of O on "Brazos?".
I'm not sure if T1 is the same block, though. Latte's W1 and W2 appear almost identical to the two side-by-side components in the lower right corner, though I will lean towards the one BG has labeled as W.
bg's D looks spot on to me. I want to say that Latte's F is the one directly above the one marked O on "Brazos?" and E is the one directly under it.
U1 matches the component that is directly above W on "Brazos?".

This should help you.

vqJGqpn.jpg


GMC stands for Graphic Memory Controller. CP is Command Processor. This picture is why I was kind of torn on D because it lists D as a part of the graphics instead of video. And the coloring doesn't completely line up with some of the actual blocks. I matched the Fs by looking at and counting those larger SRAM blocks.
 

Thanks. This is definitely helpful.

I'm almost certain that T2 is the "Northbridge", but that would be an abnormal place for it. Then there is still T1, which looks noticeably different but mostly the same. It would have no need for two of them, would it, unless T1 is something different?

I'm also going with E for the "Graphics Memory Controller" with 95% certainty. If you turn it left 90 degrees, it matches nearly perfectly with the Brazos pic on the left.

F is Latte's "Video Engine".

W matches the PCI-express controller, but we know that the Wii U has no PCI-express. I don't know what it could be in Latte.

At this point I'm willing to bet Latte is based heavily on Brazos, which means it at least has some HD 6XXX tech in it. It was released in 2011, right in line with the Wii U announcement, and there was also that big rumor about the Wii U having an AMD embedded GPU in the 6000 series.
Brazos (Fusion) platform (2011)

Single- or dual-core 64-bit AMD Fusion APU, C-Series (Ontario) and E-Series (Zacate), with the following:
made on a 40 nm CMOS process
support for DDR3 1333 MHz memory
9W (C-Series) or 18W (E-Series) TDP
Mobility Radeon HD 6xxx GPU on a 40 nm process
Cedar graphics core with 80 SPs
DirectX 11
UVD 3

Also, there are undeniably a lot of duplicate components in Latte. Cayman features similar duplicate components, and it's also HD 6XXX. There are 5 duplicate components in Latte; coincidentally, there are 5 duplicate components in Cayman: http://images.anandtech.com/doci/4061/Cayman block diagram.png

I found a block diagram for Llano, though Latte honestly looks a lot more like Brazos at this point.


I think this just might be a breakthrough in the GPU analysis.
 
I never argued that there weren't some improvements made in the past because of the DLC, but the dev did clearly state that there have been enhancements made "because" of the Wii U hardware. What else he said in the article does not displace that. You can continue to dismiss/redirect/ignore it if you wish.

"the Wii U hardware allowed the team to feature a new lighting system, improved fog, improved shadows, and antialiasing."

I'm not arguing this anymore.

"It's even sharper than the PC version." "While the game does look noticeably nicer than it did on 360 and PS3." I thought it was redundant and didn't see a need to quote it all. He still says that it shows improvement over the 360/PS3, which means it isn't just getting the same benefits. There would be no point in the statement otherwise.

Of course it shows improvement over the PS3/360 version; they're now using the lighting and shading from the DLC, which was a clear step up from the original game. The dev never explicitly stated that the Wii-U's power allowed for those improvements; that's an editor's assumption of context. I'm not ignoring it or redirecting it, I'm not a fanboy with an agenda, it's the truth. As for "sharper than the PC version", here's what I take that to mean: they optimised the FXAA and applied a sharpening filter (check the screens if you doubt it), both of which are negligible in terms of performance. There's no way on earth it has higher-resolution textures or renders at a higher res than the game running on PC, so it's really a meaningless statement.

EDIT: Overlooked this the first time, "While the game does look noticeably nicer than it did on 360 and PS3" is a quote from the article writer, not the developer.
 
If I'm understanding it right, the 200-400% increase comes from each ALU doing 2 to 4 instructions apiece per clock... similar to Xenos, but I don't think we should take that sort of thing seriously until we see games that are doing so much more than Xenos that 1.5-4x the 360 is reasonable to conclude. Doubling the threads would be fairly close to actually just having 320 ALUs in the GPU, if I'm not mistaken. While it is possible and shouldn't be ruled out, we should wait to see something that looks like it is doing that much more.


What I was specifically replying to was wsippel's statement that Latte may not need to worry about stalls due to the processor having all of that eDRAM. If that is the case, then instead of the ALUs doing 2 to 4 instructions apiece per clock with interleaving, the Wii U's GPU would not need that power boost and would run at the same or a better level.

If this info is correct, the boost of power due to Latte having additional eDRAM banks is very significant. Sounds a bit too good, IMO.

It should be noted that we still don't officially know how the Wii U can handle rendering two (and eventually three) different complex 3D screens. Going by Nintendo's past systems, there was extra hardware to assist with these types of extra tasks: the DS had an additional 2D graphics core to handle the second screen, and the 3DS had its GPU clock speed doubled to handle rendering in 3D. Knowing what we know now, it seems unlikely that the Wii U was designed to tackle such tasks with brute force. Bgassassin's dual-GPU/tessellation-unit idea and wsippel's interleaving theory are good ideas that go along with this.
 
I never argued that there weren't some improvements made in the past because of the DLC, but the dev did clearly state that there have been enhancements made "because" of the Wii U hardware. You can continue to dismiss/redirect/ignore it if you wish.

"the Wii U hardware allowed the team to feature a new lighting system, improved fog, improved shadows, and antialiasing."

I'm not arguing this anymore.

That's Fox News-level selective reading. That quote does not imply these enhancements were only made possible because of the Wii U hardware; it simply states that the Wii U is powerful enough to implement these improvements. It doesn't in any way imply that these improvements would be impossible to implement on PC, 360 or PS3.
 
I really think it's time to close this thread... It's getting sad.
I'm actually curious: what is there left to get about the GPU now?

If the console were a lot more powerful than the PS3/360, we'd probably see more developer interest.

It's kinda hard to get excited about something likely built on a 45nm process that uses very few watts. It's capable of some better tricks, but in raw power it's not doing much within that envelope. And that's probably what leaves developers disappointed or not caring.

If people want to keep digging then more power to them I suppose. But I think Nintendo's intentions for this thing are quite clear.
 