WiiU "Latte" GPU Die Photo - GPU Feature Set And Power Analysis

Internal development is way behind on the Wii U. Not sure what point you are making.

Just looking at how the system is designed, it's very clear BC was one of the main design goals. To say otherwise is just silly...

The reason they don't market playing Wii games on the Wii U is most likely that they are trying to show the Wii U is a "new" console, since so many people think it's just an add-on controller.



http://www.nintendolife.com/news/2013/05/nintendo_sends_direct_wii_u_marketing_message_to_wii_owners

That is some fact skewing there. The things you are trying to correlate do not follow from each other.

People mistaking the Wii U for the Wii has absolutely nothing to do with its backwards compatibility or development. It is because the console case itself looks so similar and it still has the name "Wii" in it. Even if you ripped BC out of the Wii U and put in parts that surpassed the PS4/Xbox3, people would still mistake it for an add-on for the Wii, because most people judge books by their covers. That is the main purpose of marketing.
 
That is some fact skewing there. The things you are trying to correlate do not follow from each other.

People mistaking the Wii U for the Wii has absolutely nothing to do with its backwards compatibility or development. It is because the console case itself looks so similar and it still has the name "Wii" in it. Even if you ripped BC out of the Wii U and put in parts that surpassed the PS4/Xbox3, people would still mistake it for an add-on for the Wii.

I don't think that was his point. What he said was that if they showed the Wii U playing Wii games, it would be even more confusing than it already is, not that it's the primary reason people are confused.
 
That is some fact skewing there. The things you are trying to correlate do not follow from each other.

People mistaking the Wii U for the Wii has absolutely nothing to do with its backwards compatibility or development. It is because the console case itself looks so similar and it still has the name "Wii" in it. Even if you ripped BC out of the Wii U and put in parts that surpassed the PS4/Xbox3, people would still mistake it for an add-on for the Wii.

We were talking about marketing; I never said that is the main reason why people think the Wii U is an add-on.

Their marketing has been trying to show that the Wii U is a new console. Pushing playing Wii games wouldn't be smart marketing.

Edit: beaten by a second lol
I don't think that was his point. What he said was that if they showed the Wii U playing Wii games, it would be even more confusing than it already is, not that it's the primary reason people are confused.
Yes, that was the point.
 
We were talking about marketing; I never said that is the main reason why people think the Wii U is an add-on.

Their marketing has been trying to show that the Wii U is a new console. Pushing playing Wii games wouldn't be smart marketing.

Possibly, but what is the relevance? Has Nintendo done that?

I honestly think it would depend on how they do it. In fact, if they advertised "backwards" compatibility while also showing its next-gen games, people may well view the system as something "forward" for it to need to go backwards. They could also advertise the fact that it's the only next-gen console with backwards compatibility.

Though honestly, I think they should just remove Wii from the name and call it the Nintendo U. That would relieve the majority of the confusion.
 
Possibly, but what is the relevance? Has Nintendo done that?

I honestly think it would depend on how they do it. In fact, if they advertised "backwards" compatibility while also showing its next-gen games, people may well view the system as something "forward" for it to need to go backwards. They could also advertise the fact that it's the only next-gen console with backwards compatibility.

Though honestly, I think they should just remove Wii from the name and call it the Nintendo U. That would relieve the majority of the confusion.

Yup, I agree. They definitely shouldn't have used the Wii name.
 
This thread is getting derailed again. Please keep the conversation more about tech and less about dead-horse arguments about marketing.

What improvements do you see them making, and wouldn't the end result be the same? More power up front, or more time spent learning to squeeze everything out of it but a good, solid machine in the end? If so, I'd prefer the up-front power.

We'll see soon I guess. If Nintendo doesn't come out with a graphical showcase for Wii U, I won't be surprised but I'll be disappointed for sure.

Iwata said that they will have to deal with Wii U's "underpowered" stigma, so Nintendo probably has something to disprove that.

The up-front-power route would probably have led to an R700-based processor with more raw power but less efficiency. The customizations Nintendo made were probably what made it possible for Latte to have features beyond DX10.1, and we are still figuring out what other changes were made.

Neither IBM nor Freescale had any chips in their portfolio that made sense for Apple. The high-end Power chips were too expensive, the low end wasn't powerful enough, and there was pretty much nothing off-the-shelf in between. Console manufacturers sign multi-year, high-volume contracts, so IBM was willing to cook something up. Apple couldn't and wouldn't do that.
Thanks for the summary of that situation.
 
This thread is getting derailed again. Please keep the conversation more about tech and less about dead-horse arguments about marketing.



Iwata said that they will have to deal with Wii U's "underpowered" stigma, so Nintendo probably has something to disprove that.

The up-front-power route would probably have led to an R700-based processor with more raw power but less efficiency. The customizations Nintendo made were probably what made it possible for Latte to have features beyond DX10.1, and we are still figuring out what other changes were made.

Yeah, it is a really custom design considering 30-40% of the SPUs' size is unaccounted for. That is something a lot of people want to write off, but if it has nothing to do with fitting in more shaders and letting on-chip memory help feed them (maybe through the extra 1MB of SRAM), then there is something very weird going on inside the SPUs that does not relate to off-the-shelf R700 designs.
 
Yeah, if Nintendo is actually paying $100 for that MCM they are getting ripped off.

That product is being made by a Japanese manufacturer iirc, meaning the price is likely set in yen. That would make the Wii U's MCM price drop from $100 USD (if it was even that high at launch) to ~$65 USD.

The price of the Wii U in Japan is 26,250 yen, which was about $340 USD for the basic unit at launch and is about $265 USD today. All Japanese components for the device are likely contracted, so Nintendo is saving a great deal of money on each Wii U made currently. They are also taking in the equivalent of about 30,441 yen on each basic model sold in the US.

Anyone who thinks the hardware is still selling at a loss doesn't understand this situation. At least this is my understanding of what is going on.
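To put rough numbers on that, here's a quick Python sketch using only the figures quoted above; the implied exchange rates fall out of those numbers rather than coming from any official source.

# Back-of-the-envelope check of the yen figures above.
# Exchange rates are implied by the post's own numbers, not looked up.
JP_PRICE_YEN = 26250          # Japanese basic-model price
US_SALE_YEN_EQUIV = 30441     # claimed yen value of a US basic-model sale

launch_rate = JP_PRICE_YEN / 340    # ~77 yen/USD if 26,250 yen was $340 at launch
current_rate = JP_PRICE_YEN / 265   # ~99 yen/USD if it is $265 now

print(f"implied launch rate:  {launch_rate:.0f} yen/USD")
print(f"implied current rate: {current_rate:.0f} yen/USD")
print(f"US basic-model sale at current rate: ${US_SALE_YEN_EQUIV / current_rate:.0f}")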
 
Not that it's always a bad thing. The real problem is, Nintendo chose the stupidest time to focus on "efficiency" in the face of PS4/720.
They could be thinking way ahead: Getting rid of their "home console" line altogether and fusing it with their handheld division.
 
They could be thinking way ahead: Getting rid of their "home console" line altogether and fusing it with their handheld division.
By using the Wii U as a sacrificial scapegoat? The same console that's supposed to carry them for the next 4 years or so?

Not really seeing the logic in that.

Their handhelds are also doing ok so I'm not sure what the rush is for.
 
I know that fixed function 'secret sauce' has become kind of a laughable subject so please bear with me a little here.

There appears to be consensus here that there is a large portion of the GPU that has unknown function. Could be fixed function, could be nothing, could be something obvious that we haven't thought of, etc.

But playing Devil's advocate for fixed function in light of the current conversation here, let's consider a few things.

Tessellation: We know that the Wii U is capable of it, but don't yet know how. Shin'en says their next game will make use of it.

Advanced lighting: We don't know how far advanced lighting on the Wii U extends, but we have seen some impressive lighting so far, even though the Wii U has no AAA games released or even fully announced yet. It is so common among the effects where the Wii U has excelled so far that it is quite interesting.

Depth of Field: Yet another effect that we continue to see (after the first wave of crap ports, that is) in generous portions.

So... again, I realize that fixed function at this point seems technologically counterproductive, but bear with me. It's all about what Nintendo may have been going for.

Remember before the Wii U was released, and Nintendo went on and on about how easy the console would be to develop for, and how that was central to their custom design, etc.? It's easy to forget or write off considering what the reality of the last 6 months has been, but that is what they were initially saying, correct?

What if there are fixed functions on the GPU silicon for lighting, depth of field, tessellation, and perhaps more - AND - the large size of the stream processors is due to a series of trigger systems amidst the shaders which allow not only easier isolation of specific pixels, but also an easy on/off 'switch', if you will, that can be built into game engines? That could potentially eliminate a lot of extra coding if you have the proper tools, while keeping the GPU more power efficient, as well as making the smaller number of shaders seem more realistic.

Tessellation could potentially be very thorough, and depth of field and lighting would be as easy as they have appeared to be so far on the console.

In any case, please excuse any ignorance of mine. Just throwing a thought process out there.
 
They could be thinking way ahead: Getting rid of their "home console" line altogether and fusing it with their handheld division.

You are thinking of this the wrong way: push Wii U tech into the next DS, and evolve Wii U tech for the successor (notice I didn't say replace).

This lets them build a clear development path for future handhelds without having to drop the ball on them, and also lets them better support their home consoles, because they could nearly double their output by focusing on one architecture.

Making 2 games on 2 different architectures requires more resources, a much larger team and starting from scratch on both platforms.

Making 2 games on 1 architecture requires only a small increase in team size (monsters, AI, and game logic can all stay roughly the same; even the general world created for the game could be the same in many instances). You'd also be able to build one game and add different content at the tail end of development to produce two separate games instead. Think of OoT and Majora's Mask, or Galaxy 1 and 2.
 
You are thinking of this the wrong way: push Wii U tech into the next DS, and evolve Wii U tech for the successor (notice I didn't say replace).

This lets them build a clear development path for future handhelds without having to drop the ball on them, and also lets them better support their home consoles, because they could nearly double their output by focusing on one architecture.

Making 2 games on 2 different architectures requires more resources, a much larger team and starting from scratch on both platforms.

Making 2 games on 1 architecture requires only a small increase in team size (monsters, AI, and game logic can all stay roughly the same; even the general world created for the game could be the same in many instances). You'd also be able to build one game and add different content at the tail end of development to produce two separate games instead. Think of OoT and Majora's Mask, or Galaxy 1 and 2.

That's what I imagined earlier (can't remember which thread). Make a 20nm SoC out of the Wii U tech (or slightly enhanced) for the next handheld (the 64-bit DDR interface is also a perfect fit for this) and bump the specs for the next home console on the same architecture (8 CPU cores, more GPU cores (~2TF), faster and more RAM, etc.). It would make things a lot easier for engine development etc.
 
That's what I imagined earlier (can't remember which thread). Make a 20nm SoC out of the Wii U tech (or slightly enhanced) for the next handheld (the 64-bit DDR interface is also a perfect fit for this) and bump the specs for the next home console on the same architecture (8 CPU cores, more GPU cores (~2TF), faster and more RAM, etc.). It would make things a lot easier for engine development etc.

I would imagine that they would use something smaller than 20nm; 10nm should be ready by 2016, and the DS4 isn't going to show up until 2017 or 2018 IMO. But yes, this is exactly what I'm saying: build the home console and convert it to the next handheld until you can't anymore. Iwata said they are exploring this more or less anyway when he said they will be moving to one architecture in their next cycle. It could also mean building the Wii U down into a phone at some point, because the reality is there is just too much money to be made, and if they could somehow get Android running on their Wii U hardware, they could just have Samsung or another manufacturer build their handheld device.

I've even thought they could use the + as their home button, like the iPhone's circle or Samsung's rectangle, put 2 shoulder buttons on the side and a stylus on the bottom, and you'd have a compelling game device.

ArchangelWest:

Tessellation is fixed function; that is the way it is done even on PC hardware right now. As for what you are saying, it is possible that they have added some fixed-function hardware inside the SPUs, I believe, but it is more likely that these shader units are just significantly enlarged... maybe something crazy happened and they combined the 5th shader into every other shader in VLIW5, so it is a hybrid of VLIW5 and VLIW4, meaning that VLIW5's problem of the 5th shader often being useless has been corrected by unifying the shaders like in GCN (which was one of the points I was trying to make earlier). This would allow them to move from instruction-level parallelism (ILP) to thread-level parallelism (TLP), which would be a focus if Nintendo really did want to create a GPGPU-centric GPU.

VLIW relies on scheduling the shader's work ahead of time, which is why VLIW5 often didn't even use all 4 of the base shaders and rarely used the 5th shader. VLIW is rather complex, and it is hard to predict performance thanks to issues like the compile-time scheduling and, in VLIW5's case, not all of the shaders being created equal.
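To illustrate that scheduling problem with a toy example (this is just a sketch of the idea, not a model of any real compiler or chip): a VLIW5 bundle can only issue as many operations per cycle as the compiler managed to pack together, while a hardware scheduler could fill idle slots with independent work from other threads.

import random

random.seed(0)

# Each "instruction" gets 1-5 independent scalar ops, standing in for how
# much ILP the compiler could pack into a single VLIW5 bundle.
ilp_per_bundle = [random.randint(1, 5) for _ in range(10000)]

SLOTS = 5  # 4 simple ALUs plus the special 5th unit in VLIW5

# Compile-time (VLIW) model: one bundle per cycle, unused slots are wasted.
vliw_cycles = len(ilp_per_bundle)
vliw_util = sum(ilp_per_bundle) / (vliw_cycles * SLOTS)

# Hardware-scheduled model: idle slots get filled with independent work
# from other threads, so slots rarely go unused.
total_ops = sum(ilp_per_bundle)
hw_cycles = -(-total_ops // SLOTS)   # ceiling division
hw_util = total_ops / (hw_cycles * SLOTS)

print(f"VLIW (compile-time packing): {vliw_util:.0%} slot utilization")
print(f"Hardware-scheduled:          {hw_util:.0%} slot utilization")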

What I think the extra space could be is added logic to make all 5 shaders in a VLIW group more equal, like VLIW4, plus a scheduler inside the SPU that lets the hardware do the work itself. This corrects the weaknesses of VLIW and makes all 20 ALUs in an SPU far more efficient; it is also largely something they would have had to do to address TEV anyway, I believe, and should solve the GPGPU issues VLIW5 (and 4) had.

This would account for the extra space and would give the 160 ALUs a noticeable boost. Compared to plain VLIW5, for instance, it would be similar to having 200 ALUs, and it should offer even greater performance thanks to having the scheduler in hardware, which would keep code from stalling in VLIW.

This isn't out-of-order execution, mind you; it can't jump around the code, but it could select different code, pixels or values to work on rather than waiting around for jobs to finish that only utilize 3 or 4 shaders.

I guess this sounds a lot like GCN at this point. The main difference, of course, is that they would still be locked to groups of 5 shaders in series of 4 rather than groups of 16 shaders in series of 4. Of course, it's likely that 4 VLIW SPUs still work as 1 SIMD (iirc), which means that you'd have 2 "CUs" that are 80 ALUs wide rather than 2 that are 64 shaders wide. However, if there is a scheduler in each SPU, that means you could have more elements for each thread than GCN, basically wider lanes than GCN has. The trade-off is die space; schedulers take up room, but there is plenty more space in these SPUs than in R700 SPUs, meaning this is some ridiculous custom R700 chip that has become some sort of hybrid GCN/VLIW thing.

Edit: I just looked it up and Cayman had 44-cycle latency. This is completely against the design logic of the Wii U and is a major reason they might have done something very similar to what I suggest above. (Cayman is VLIW4, which has lower latency than VLIW5 chips iirc.)
GCN has a much lower cycle latency, and so would this, possibly even lower than GCN since there are more schedulers and a wider series of elements (20 shaders vs 16).
 
I would imagine that they would use something smaller than 20nm; 10nm should be ready by 2016, and the DS4 isn't going to show up until 2017 or 2018 IMO. But yes, this is exactly what I'm saying: build the home console and convert it to the next handheld until you can't anymore. Iwata said they are exploring this more or less anyway when he said they will be moving to one architecture in their next cycle. It could also mean building the Wii U down into a phone at some point, because the reality is there is just too much money to be made, and if they could somehow get Android running on their Wii U hardware, they could just have Samsung or another manufacturer build their handheld device.

I've even thought they could use the + as their home button, like the iPhone's circle or Samsung's rectangle, put 2 shoulder buttons on the side and a stylus on the bottom, and you'd have a compelling game device.

ArchangelWest:

Tessellation is fixed function; that is the way it is done even on PC hardware right now. As for what you are saying, it is possible that they have added some fixed-function hardware inside the SPUs, I believe, but it is more likely that these shader units are just significantly enlarged... maybe something crazy happened and they combined the 5th shader into every other shader in VLIW5, so it is a hybrid of VLIW5 and VLIW4, meaning that VLIW5's problem of the 5th shader often being useless has been corrected by unifying the shaders like in GCN (which was one of the points I was trying to make earlier). This would allow them to move from instruction-level parallelism (ILP) to thread-level parallelism (TLP), which would be a focus if Nintendo really did want to create a GPGPU-centric GPU.

VLIW relies on scheduling the shader's work ahead of time, which is why VLIW5 often didn't even use all 4 of the base shaders and rarely used the 5th shader. VLIW is rather complex, and it is hard to predict performance thanks to issues like the compile-time scheduling and, in VLIW5's case, not all of the shaders being created equal.

What I think the extra space could be is added logic to make all 5 shaders in a VLIW group more equal, like VLIW4, plus a scheduler inside the SPU that lets the hardware do the work itself. This corrects the weaknesses of VLIW and makes all 20 ALUs in an SPU far more efficient; it is also largely something they would have had to do to address TEV anyway, I believe, and should solve the GPGPU issues VLIW5 (and 4) had.

This would account for the extra space and would give the 160 ALUs a noticeable boost. Compared to plain VLIW5, for instance, it would be similar to having 200 ALUs, and it should offer even greater performance thanks to having the scheduler in hardware, which would keep code from stalling in VLIW.

This isn't out-of-order execution, mind you; it can't jump around the code, but it could select different code, pixels or values to work on rather than waiting around for jobs to finish that only utilize 3 or 4 shaders.

I guess this sounds a lot like GCN at this point. The main difference, of course, is that they would still be locked to groups of 5 shaders in series of 4 rather than groups of 16 shaders in series of 4. Of course, it's likely that 4 VLIW SPUs still work as 1 SIMD (iirc), which means that you'd have 2 "CUs" that are 80 ALUs wide rather than 2 that are 64 shaders wide. However, if there is a scheduler in each SPU, that means you could have more elements for each thread than GCN, basically wider lanes than GCN has. The trade-off is die space; schedulers take up room, but there is plenty more space in these SPUs than in R700 SPUs, meaning this is some ridiculous custom R700 chip that has become some sort of hybrid GCN/VLIW thing.

Edit: I just looked it up and Cayman had 44-cycle latency. This is completely against the design logic of the Wii U and is a major reason they might have done something very similar to what I suggest above. (Cayman is VLIW4, which has lower latency than VLIW5 chips iirc.)
GCN has a much lower cycle latency, and so would this, possibly even lower than GCN since there are more schedulers and a wider series of elements (20 shaders vs 16).


I'd say you are a bit too optimistic there. 20nm won't be ready until 2014, then it'll be another 2.5-3 years (i.e. 2016 or 2017) until the next process (16nm?) is ready. We probably won't see 10nm until 2020+.
 
I'd say you are a bit too optimistic there. 20nm won't be ready until 2014, then it'll be another 2.5-3 years (i.e. 2016 or 2017) until the next process (16nm?) is ready. We probably won't see 10nm until 2020+.

http://www.brightsideofnews.com/new...mce28099s-16nm-finfet-process-technology.aspx

Still, ARM needs to compete on the high end as well, and here is where today's announcement fits perfectly. The company announced that it has completed the first tape-out using the 16nm FinFET Technology at Taiwan Semiconductor Manufacturing Company (TSMC). The Cortex IP selected for the task is the upcoming 64-bit ARMv8 architecture, dubbed Cortex-A57.

http://www.eetimes.com/electronics-...ce-to-14-nm-as-IBM-waits-for-EUV?pageNumber=1

K.H. Kim, executive vice president of Samsung's foundry business, said the company will run 14 nm test shuttles for select customers in April and September. It has 14 nm IP partnerships with ARM, Synopsys and Analog Bits.

Meanwhile, Samsung is converting its Austin, Texas fab from memory to logic, expecting the first 28 nm wafers from it this year. It also will produce 20 and 14 nm wafers starting as early as the end of 2014 in a new fab in Korea.


The 20 nm node is the first to require double patterning, a cost adder. The 14 nm node is essentially a 20 nm process with FinFETs, another cost, said IBM’s Patton.

“It’s not a true shrink, but when you get to 10 nm it is a true shrink and I expect significant cost benefits,” Patton said. “Is there still a cost benefit [with 14 nm]? Absolutely, it just won’t be as large as it was historically,” he added.

It's sort of weird what they are doing to try and keep up with Intel right now, but by 2016 they should be down to 10nm and by then it will actually be a true die shrink unlike some of the messy stuff we are seeing now.
 
... while I do follow technology and whatnot, I am not a hardware engineer or a game programmer, so I am largely building off the work of others. Regardless, I've spent a fair amount of my leisure time comparing the die shot, reading the analysis of others, and trying to test different theories in an unbiased fashion.

Hello Mr Fourth Storm, I appreciate your efforts! And for the record, I don't have any great experience of hardware engineering either. But I do now work in an evil globo-mega-corp which has a lot of those folks. I managed to get a couple of them to look at the die shots and have a chat about my questions. Unfortunately, they confirmed that it really is impossible to tell very much at all from just those shots. They did, however, have some general points on interpretation, the primary one being that with a hand layout, all bets are off with respect to comparisons against other dies, and even within the same layout! That is, every hand layout is really a mix of automatic (the majority) and manual placement, with the "same" logic varying in density/structure within the same die depending on positioning, heat, clock speed and the importance of those factors. So if you'll forgive my selective quoting...


In the case of the SPU count, this early post over on beyond3D was one of the first to make me raise an eyebrow and question how the shader blocks could possibly hold more than 20 shaders each. Emphasis is mine.



http://beyond3d.com/showpost.php?p=1702908&postcount=4495

Since Gipsel posted this, it was concluded that the SRAM in the SPU blocks is not dual ported. Also, it seems like each SRAM block holds 4kB and not 8kB. I arrived at this by comparing the SRAM blocks to the smaller ones on the bottom of Latte identified by Marcan (check the OP for that image) as 2kB. The SRAM blocks used as GPRs for the shaders are exactly twice as long as those 2kB blocks. Other than that, they appear identical, so a differing density seems highly unlikely (unlike the SRAM used in the 1MB pool of texture cache in the upper left of the chip - that appears to be more dense, and with such a large amount necessary, it's unsurprising).

Their appearance at this scale is a bit meaningless. A near doubling of density could look pretty much the same to the naked eye. Also given the difference in location and application I think it's perhaps mistaken to assume that a difference in density is highly unlikely.


As to why they are the size they are, we can really only guess, but there are a few factors which may come into play:

a) We've assumed perfect scaling from the 55nm RV770, which is usually not the case.
b) Renesas' 40nm process may be less dense than TSMC's (which is known for being incredibly dense). They may lose some density in making the process eDRAM friendly.
Very true. However, the achieved scaling can be better or worse depending on the factors involved and the capabilities of the process, e.g. advances in leakage prevention enabling a much higher dynamic power density. Moreover, this is not a simple die shrink, as the hand layout attests. Similarly, the logic/function density can be dramatically increased by targeting a specific clock speed. Consumer GPUs are typically required to operate at a wide range of clock speeds to allow product differentiation, boost effective yields, etc. If you know you are running at clock speed X and no higher, then (particularly if X is lower than the speed of the original design) the function density gains can be very substantial.

c) There may be extra logic in the shader blocks that runs the shim layer (the compatibility layer that performs translation), as Marcan described it. The 8-bit CPU he mentioned is specifically for converting the Wii video output to the format now used by Radeons. There is other logic on there to handle TEV instruction translation. I don't know exactly where it is, but it could very well be right there in the shader blocks.
d) Other small tweaks could have been implemented that make the shaders somewhat larger. DirectX 11 SPUs have some additional logic in there to support the new features of the API, so perhaps Nintendo added something analogous for whatever features above DirectX 10.1 they decided to include.

Yup, that could very well be the case. I guess all I'm saying is that trying to draw firm conclusions via size comparisons is a bit flawed. Or to put it another way, the size reductions/density increases needed to support 40 shaders per block are entirely feasible.
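For reference, here's what point a) amounts to numerically - a quick sketch of the ideal 55nm-to-40nm shrink, which real processes only approach:

# Ideal (perfect) area scaling between process nodes: area scales with the
# square of the linear feature size. Real-world scaling is usually worse.
old_node_nm = 55.0   # RV770 (55nm)
new_node_nm = 40.0   # Latte (Renesas 40nm)

linear_shrink = new_node_nm / old_node_nm        # ~0.73
ideal_area_factor = linear_shrink ** 2           # ~0.53

print(f"linear shrink:     {linear_shrink:.2f}x")
print(f"ideal area factor: {ideal_area_factor:.2f}x (i.e. ~{(1 - ideal_area_factor) * 100:.0f}% smaller)")
# A block that was, say, 10 mm^2 at 55nm would ideally be ~5.3 mm^2 at 40nm;
# in practice it lands somewhere between that and its original size.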
 
That product is being made by a Japanese manufacturer iirc, meaning the price is likely set in yen. That would make the Wii U's MCM price drop from $100 USD (if it was even that high at launch) to ~$65 USD.

The price of the Wii U in Japan is 26,250 yen, which was about $340 USD for the basic unit at launch and is about $265 USD today. All Japanese components for the device are likely contracted, so Nintendo is saving a great deal of money on each Wii U made currently. They are also taking in the equivalent of about 30,441 yen on each basic model sold in the US.

Anyone who thinks the hardware is still selling at a loss doesn't understand this situation. At least this is my understanding of what is going on.

Do we know for sure that Renesas is making the MCM in Japan? I thought it was TSMC.
 
I guess I should do a TL;DR since some people are asking me for one.

Why 20 ALUs per SPU from the R700 series doesn't make sense:

Space: 20 ALUs would only take up ~60% of the SPU space. Of course, there is another post above this one that talks about density being very hard to judge with hand-placed designs like this, which also puts 40 ALUs per SPU within reason, especially if the clock frequency was known beforehand (which could be why it is an odd clock, given Nintendo's history).

Latency: R700 latency is high, ~70 to 80 cycles iirc; even Northern Islands (the HD 6000 series) has 44-cycle latency, which is pretty bad when looking at the rest of the Wii U's design, especially when GCN's is much, much lower. This is a bottleneck that Nintendo would have worked down, but the problem is in the architecture.

Architecture: VLIW5 is split into 5 shaders, 4 of which do much the same thing, plus a special 5th shader in the pipeline that is designed to handle bigger tasks of a different nature. That 5th shader is something they got rid of in VLIW4, while increasing what the other 4 shaders could do to accommodate its removal. VLIW is also very hard to write for; you have to predict what is available at any time or the elements (the different shaders) won't be used, often leaving only 3 or 4 elements (shaders/ALUs) in use per cycle. This is extremely inefficient, but there is an answer to it.

Proposal: Nintendo has a custom GPU that made all 5 shaders in a VLIW grouping more equal so that they can all handle the same tasks when asked, and also moved schedulers right into the hardware, with a scheduler inside each 20- or 40-ALU block (SPU). This would allow it to work much the same way that GCN does (at least to some extent): instead of 64-ALU Compute Units (4 groups of 16 ALUs), you'd have 80-ALU units (4 groups of 20 ALUs set up as VLIW5, though each shader would be closer to VLIW4). This gives you a very good compute shader layout, something that can really be used as a GPGPU.

What this would mean is 2 or 4 large custom CUs built on a VLIW architecture that could even have come from R700, but with updated shader units and schedulers on or near the SPUs. Efficiency and latency would also be very good and work well with this design.

For those looking for some sort of "what does the scouter say about his power level" number, the very basic TL;DR is 176 GFLOPS (~2.5 GCN CUs @ 550MHz) or 352 GFLOPS (~5 GCN CUs @ 550MHz) from a very custom VLIW architecture.
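For anyone wanting to see where those two headline numbers come from, here's the arithmetic, assuming the usual convention of 2 FLOPs (one multiply-add) per ALU per cycle and 8 SPU blocks on the die:

CLOCK_GHZ = 0.55            # Latte GPU clock, 550 MHz
FLOPS_PER_ALU_CYCLE = 2     # one multiply-add counted as 2 FLOPs
GCN_CU_ALUS = 64            # a GCN Compute Unit is 64 ALUs wide

for alus_per_spu in (20, 40):
    total_alus = alus_per_spu * 8                         # 8 SPU blocks assumed
    gflops = total_alus * FLOPS_PER_ALU_CYCLE * CLOCK_GHZ
    cu_equiv = gflops / (GCN_CU_ALUS * FLOPS_PER_ALU_CYCLE * CLOCK_GHZ)
    print(f"{alus_per_spu} ALUs/SPU -> {total_alus} ALUs -> {gflops:.0f} GFLOPS (~{cu_equiv:.1f} GCN CUs @ 550MHz)")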
 
Estimate the size of the chip and divide the wafer area by it to get dies per wafer. The only thing missing is yields, which should be pretty good for Sony since it's pretty much a known design.

Now, the motherboard would be a lot tougher to figure out.

I would be surprised if it cost Sony more than $50 per SoC to make, since its size should be under 300mm². They may have to pay AMD a couple of dollars per chip as well.
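Here's a sketch of that kind of estimate. Every input (wafer price, die size, yield) is an assumption for illustration only, not leaked information:

import math

WAFER_DIAMETER_MM = 300
WAFER_COST_USD = 5000       # assumed 28nm wafer price, illustrative only
DIE_AREA_MM2 = 300          # assumed PS4 SoC size from the post above
YIELD = 0.80                # assumed fraction of good dies

wafer_area = math.pi * (WAFER_DIAMETER_MM / 2) ** 2
# Crude gross-die estimate with a simple edge-loss correction term.
gross_dies = wafer_area / DIE_AREA_MM2 - math.pi * WAFER_DIAMETER_MM / math.sqrt(2 * DIE_AREA_MM2)
good_dies = gross_dies * YIELD

print(f"gross dies per wafer: {gross_dies:.0f}")
print(f"good dies per wafer:  {good_dies:.0f}")
print(f"cost per good die:    ${WAFER_COST_USD / good_dies:.0f}")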

$50 per SoC sounds awfully optimistic. That would make it drastically cheaper than AMD APUs that are absolute hot garbage by comparison. Yes, I know you can't compare retail to wholesale, etc., but the gap would be extremely massive. I guess we won't know more until something leaks, or we get a teardown and BOM.

Also, the $100 estimate for the Wii U MCM came from one of the guys at Chipworks. He didn't really give fully fleshed-out reasoning. He could very well be wrong.
 
$50 per SoC sounds awfully optimistic. That would make it drastically cheaper than AMD APUs that are absolute hot garbage by comparison. Yes, I know you can't compare retail to wholesale, etc., but the gap would be extremely massive. I guess we won't know more until something leaks, or we get a teardown and BOM.

Also, the $100 estimate for the Wii U MCM came from one of the guys at Chipworks. He didn't really give fully fleshed-out reasoning. He could very well be wrong.

Really, it should cost less than $40, but I'm sure Sony is paying AMD something. Chips do not cost much to make as long as yields are good.

Now, if Sony had terrible yields the price would be a little higher.
 
These threads go in cycles.

Anything that even remotely sounds like something negative towards the hardware gets thrown back as being the fault of someone other than AMD/Nintendo. It's a one-sided discussion. Everything that is slightly pessimistic is generally thrown out by the most active posters. Selective memories, and all that.

In reality we would see superfluous improvements in certain aspects of the games, like we have with NFSMW and its taking advantage of the more recent feature set and RAM in the Wii U.

There are plenty of people on both sides of the river, so don't even try to act like only Nintendo fans are biased here and the rest of you lot are trying to bring reason to the heathens.

When I'm at work reading NeoGAF, I hate Nintendo for the Wii U. When I am at home playing my Wii U, I am so in love with it. Love the OS, the feel, the games, everything.

The Wii not being in HD is not why I felt off playing Wii games. It was the lack of a positive community. Now I can play Nintendo games on an always-connected system with other Nintendo fans. That's why I love the Wii U, slow CPU and hard-to-develop-for GPU or not.

From what I gather reading NeoGAF, a lot of people want to see Nintendo drown, and they dismiss good news and spread bad news even if the bad news is a flat-out lie.

Hey, are you the same Scrawnton from NW?
 
I guess I should do a TL;DR since some people are asking me for one.

Why 20 ALUs per SPU from the R700 series doesn't make sense:

Space: 20 ALUs would only take up ~60% of the SPU space. Of course, there is another post above this one that talks about density being very hard to judge with hand-placed designs like this, which also puts 40 ALUs per SPU within reason, especially if the clock frequency was known beforehand (which could be why it is an odd clock, given Nintendo's history).

Latency: R700 latency is high, ~70 to 80 cycles iirc; even Northern Islands (the HD 6000 series) has 44-cycle latency, which is pretty bad when looking at the rest of the Wii U's design, especially when GCN's is much, much lower. This is a bottleneck that Nintendo would have worked down, but the problem is in the architecture.

Architecture: VLIW5 is split into 5 shaders, 4 of which do much the same thing, plus a special 5th shader in the pipeline that is designed to handle bigger tasks of a different nature. That 5th shader is something they got rid of in VLIW4, while increasing what the other 4 shaders could do to accommodate its removal. VLIW is also very hard to write for; you have to predict what is available at any time or the elements (the different shaders) won't be used, often leaving only 3 or 4 elements (shaders/ALUs) in use per cycle. This is extremely inefficient, but there is an answer to it.

Proposal: Nintendo has a custom GPU that made all 5 shaders in a VLIW grouping more equal so that they can all handle the same tasks when asked, and also moved schedulers right into the hardware, with a scheduler inside each 20- or 40-ALU block (SPU). This would allow it to work much the same way that GCN does (at least to some extent): instead of 64-ALU Compute Units (4 groups of 16 ALUs), you'd have 80-ALU units (4 groups of 20 ALUs set up as VLIW5, though each shader would be closer to VLIW4). This gives you a very good compute shader layout, something that can really be used as a GPGPU.

What this would mean is 2 or 4 large custom CUs built on a VLIW architecture that could even have come from R700, but with updated shader units and schedulers on or near the SPUs. Efficiency and latency would also be very good and work well with this design.

For those looking for some sort of "what does the scouter say about his power level" number, the very basic TL;DR is 176 GFLOPS (~2.5 GCN CUs @ 550MHz) or 352 GFLOPS (~5 GCN CUs @ 550MHz) from a very custom VLIW architecture.

Plausible theory. Sounds good!
 
Thought this info might be useful; I don't remember seeing this posted here before. There is a user over at wiiuforums named Alex Atkin UK, and he noticed something with the latest update:

Dramatic improvement in performance!



I was able to get an average of 2.5MB/s from the USB adapter downloading to an HDD with the proxy enabled. Without the proxy it fluctuates wildly but still seems much faster than before; I suspect it may average around the same, but it's harder to measure when it fluctuates so much. Either way it's a big improvement.



Downloading to internal memory is still garbage at an average of about 400KB/s; it did peak a few times at 1MB/s but then seemed to stall for a second or two (while it saved the data to the storage, no doubt). As previously suspected, it seems the internal storage has very slow write speeds, which bottlenecks the download. It's annoying, as if I had known the built-in memory would be this bad I might not have bought the Wii U Deluxe. This is particularly relevant as saving to a cheap USB memory stick performs almost as well as an external HDD, doing between 1-2MB/s even while playing LEGO City Undercover at the same time, which is excellent compared to the PS3/Xbox 360, which slow down to 500KB/s when background downloading.



One of the big problems with the Wii U is that the downloads seem to be split into many smaller files, which slows things down a lot, as the nature of TCP/IP is that each connection starts off slow and gradually speeds up as it detects you are capable of handling it. So it's far better to download one big file rather than several smaller ones.



The new background installing seems to be a huge problem too now, as when it's stalling the Wii U becomes unresponsive when trying to move between sections of the UI. I just tried to view Download Management and it just sat there on the loading screen for about a minute. The same thing happened when I tried to go into settings earlier; I had to wait several minutes for any response at all, and the home menu just locked up.



There is one good thing though: background downloading/installing doesn't seem to affect actual retail disc playback at all. Clearly, downloading to the internal storage while also trying to read the OS from internal storage is a bad idea, while in a disc-based game you aren't really doing anything with internal storage (except when saving the game), so things work smoothly.



Netflix still doesn't seem to work very well for me, though; the PS3 remains the most reliable device for playing back Netflix in top quality.
Here is another post:
To summarise, wired used to perform worse than wireless but the OS update 3.0 fixed that problem. Wired is now recommended.



However, the biggest speed limit is the internal memory of the Wii U. If you use an external USB HDD then you will get faster speeds from WiFi and USB Ethernet. My latest test from USB Ethernet was:



Internal Memory: ~3Mbit (400KB/s)

External HDD: ~ 20Mbit (2.5MB/s)



The Wii U is limited to around 20Mbit no matter what you do, however, and it seems easier to maintain this speed if you use a local proxy server to buffer the download, similar to what you used to have to do on the PS3 to get the best speeds off PSN (and still have to do on the PS Vita).



In my tests above I had the proxy server cache the entire FIFA demo by downloading it once to the Wii U with the proxy enabled. I then wiped it from the Wii U and downloaded again (checking the server log to show that it registered as a cache-hit), so it came directly from the proxy server on the LAN.



This is as fast as it could go; if the Wii U weren't acting as a bottleneck it should have been able to do ~100Mbit. In fact, downloading from the Nintendo servers directly to a PC I can get around 30Mbit, still faster than the Wii U.
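As a quick unit check on those figures (Mbit here means megabits per second; divide by 8 for megabytes per second):

def mbit_to_MB(mbit_per_s):
    """Convert megabits per second to megabytes per second."""
    return mbit_per_s / 8

for label, mbit in [("Internal memory", 3), ("External HDD / Wii U ceiling", 20),
                    ("PC from the same servers", 30), ("LAN link", 100)]:
    print(f"{label:30s} ~{mbit:3d} Mbit/s = {mbit_to_MB(mbit):.2f} MB/s")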

Strange that the internal memory, which is supposed to be eMMC NAND (comparable to an SSD, right?), is slower than USB 2.0. Now I have a question: since an SSD is basically flash memory, doesn't that mean that as it fills up, let's say 28 of 30GB full, it performs slower?
 
Thought this info might be useful; I don't remember seeing this posted here before. There is a user over at wiiuforums named Alex Atkin UK, and he noticed something with the latest update:


Here is another post:


Strange that the internal memory, which is supposed to be eMMC NAND (comparable to an SSD, right?), is slower than USB 2.0. Now I have a question: since an SSD is basically flash memory, doesn't that mean that as it fills up, let's say 28 of 30GB full, it performs slower?
If those bandwidths are true, that is absolutely appalling. They seem too low to be real. I'd expect 10x that from USB and at least the same from flash.
 
If those bandwidths are true, that is absolutely appalling. They seem too low to be real. I'd expect 10x that from USB and at least the same from flash.

Well, it's going to cap at about 30MB/s read/write because of USB 2.0. Kind of sucks that Nintendo won't let us download games to SD cards. Does the Wii U support SDHC speeds? There is a SanDisk one that does about 95MB/s read.
 
http://www.brightsideofnews.com/new...mce28099s-16nm-finfet-process-technology.aspx



http://www.eetimes.com/electronics-...ce-to-14-nm-as-IBM-waits-for-EUV?pageNumber=1



It's sort of weird what they are doing to try and keep up with Intel right now, but by 2016 they should be down to 10nm and by then it will actually be a true die shrink unlike some of the messy stuff we are seeing now.


First tape-out ≠ mass production

As I said, 20nm will only become available at TSMC in 2014, and for obvious reasons 16nm will need another ~2.5-3 years. So there is no chance at all of 10nm being ready (again: actual production, not some PR stunt or TSMC roadmap BS) in 2015, '16 or '17.


Also see this concerning tape out and mass production:
http://www.extremetech.com/computin...rtex-a57-tape-out-chip-launching-no-time-soon

"When asked how much volume TSMC would be doing on 16nm in 2015, Chang responded “I think it will be very small.” And that tracks perfectly with what we’ve previously seen from ARM/TSMC collaborations. The first Cortex-A15 built on 20nm with TSMC taped out in 2011. If Chang’s remarks are accurate, we might see such a chip come to market in 2014 or 2015 — three or four years after tape-out."
 
If those bandwidths are true, that is absolutely appalling. They seem too low to be real. I'd expect 10x that from USB and at least the same from flash.

The speed while using external USB is probably being limited by the network adapter. But the speed while using internal flash... appalling doesn't even begin to cover it...
 
They probably didn't bother with a decent memory controller.
How indecent would a controller need to be to get this kind of performance?

You do realize that the process of moving around DRM'd data can be a bit more involved than reading and writing raw blocks, right? Pretty much every piece of data that enters and circulates the system gets decrypted & verified, only to then be re-signed and re-encrypted. Now, I'm not saying all that needs to be slow, but the bottleneck could be as unrelated to the flash controller & eMMC speeds as you could possibly imagine.

Heck, the PS3 is not exactly a data-transfer champ when it comes to juggling DRM'd data, and that machine has an industry-standard HDD and an entire SPE dedicated to the purpose.
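As a toy illustration of how per-byte crypto work can come to dominate transfer times on a modest processor, here's a sketch; every throughput number in it is a made-up assumption, not a measurement of the Wii U's hardware:

# Toy model: every byte written to storage passes through several per-byte
# processing stages (decrypt, verify, re-encrypt) before the raw flash write.
# If the stages run back to back on the same core, their per-MB times add up.
# All throughput figures below are illustrative assumptions.
stage_throughput_mb_s = {
    "download":    12.0,   # network link
    "decrypt":      1.0,   # slow crypto on a small core (assumed)
    "verify":       2.0,   # hashing / signature check (assumed)
    "re-encrypt":   1.0,   # re-signing for local storage (assumed)
    "flash write": 15.0,   # raw eMMC write speed (assumed)
}

total_time_per_mb = sum(1.0 / t for t in stage_throughput_mb_s.values())
print(f"effective throughput: {1.0 / total_time_per_mb:.2f} MB/s")
# ~0.38 MB/s here, even though the raw flash write alone could do 15 MB/s.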
 
How indecent would a controller need to be to get this kind of performance?

You do realize that the process of moving around DRM'd data can be a bit more involved than reading and writing raw blocks, right? Pretty much every piece of data that enters and circulates the system gets decrypted & verified, only to then be re-signed and re-encrypted. Now, I'm not saying all that needs to be slow, but the bottleneck could be as unrelated to the flash controller & eMMC speeds as you could possibly imagine.

Heck, the PS3 is not exactly a data-transfer champ when it comes to juggling DRM'd data, and that machine has an industry-standard HDD and an entire SPE dedicated to the purpose.

The flash version of the PS3 has terrible write speeds as well.
 
The speed while using external USB is probably being limited by the network adapter. But the speed while using internal flash... appalling doesn't even begin to cover it...

Maybe it's because the OS lives on the internal storage, and performing two actions at once, such as reading the OS and downloading or playing a game from that same storage, is not the most optimal way to do it?

Clearly, downloading to the internal storage while also trying to read the OS from internal storage is a bad idea, while in a disc-based game you aren't really doing anything with internal storage (except when saving the game), so things work smoothly.
 
I guess I should do a TL;DR since some people are asking me for one.

Why 20 ALUs per SPU from the R700 series doesn't make sense:

Space: 20 ALUs would only take up ~60% of the SPU space. Of course, there is another post above this one that talks about density being very hard to judge with hand-placed designs like this, which also puts 40 ALUs per SPU within reason, especially if the clock frequency was known beforehand (which could be why it is an odd clock, given Nintendo's history).

Latency: R700 latency is high, ~70 to 80 cycles iirc; even Northern Islands (the HD 6000 series) has 44-cycle latency, which is pretty bad when looking at the rest of the Wii U's design, especially when GCN's is much, much lower. This is a bottleneck that Nintendo would have worked down, but the problem is in the architecture.

Architecture: VLIW5 is split into 5 shaders, 4 of which do much the same thing, plus a special 5th shader in the pipeline that is designed to handle bigger tasks of a different nature. That 5th shader is something they got rid of in VLIW4, while increasing what the other 4 shaders could do to accommodate its removal. VLIW is also very hard to write for; you have to predict what is available at any time or the elements (the different shaders) won't be used, often leaving only 3 or 4 elements (shaders/ALUs) in use per cycle. This is extremely inefficient, but there is an answer to it.

Proposal: Nintendo has a custom GPU that made all 5 shaders in a VLIW grouping more equal so that they can all handle the same tasks when asked, and also moved schedulers right into the hardware, with a scheduler inside each 20- or 40-ALU block (SPU). This would allow it to work much the same way that GCN does (at least to some extent): instead of 64-ALU Compute Units (4 groups of 16 ALUs), you'd have 80-ALU units (4 groups of 20 ALUs set up as VLIW5, though each shader would be closer to VLIW4). This gives you a very good compute shader layout, something that can really be used as a GPGPU.

What this would mean is 2 or 4 large custom CUs built on a VLIW architecture that could even have come from R700, but with updated shader units and schedulers on or near the SPUs. Efficiency and latency would also be very good and work well with this design.

For those looking for some sort of "what does the scouter say about his power level" number, the very basic TL;DR is 176 GFLOPS (~2.5 GCN CUs @ 550MHz) or 352 GFLOPS (~5 GCN CUs @ 550MHz) from a very custom VLIW architecture.

Excellent. This is what I like to see. Progress. Now if we could just get some of this for the CPU.

Where exactly is the GPGPU logic housed on Latte? My understanding of GPGPU is very basic. I'm only aware of what it can achieve, not how.
 
Excellent. This is what I like to see. Progress. Now if we could just get some of this for the CPU.

The CPU isn't such a mystery and has already more or less been figured out.

Basically, it's a very efficient CPU. It's stronger than the 360 CPU in some areas and weaker in others. It's comparable to the PS4 CPU core for core, but there are only 3 cores.
 
The CPU isn't such a mystery and has already more or less been figured out.

Basically, it's a very efficient CPU. It's stronger than the 360 CPU in some areas and weaker in others. It's comparable to the PS4 CPU core for core, but there are only 3 cores.

That isn't saying anything. That's just describing the chip.

Saying that some things are good and some things are bad just cancels itself out. It does not gauge any form of real-world capability. I'm looking for details, and a lot of details about the CPU are still largely unknown, like what actually is better and what is worse. Just the other day we were made aware that it may have more registers than the previous CPUs in the series.

There are only 5 things that are really known for certain about the CPU.
1. It's in the PPC750 series.
2. Its active clock speed in Wii U mode is 1.24 GHz.
3. It's an out-of-order CPU that processes multiple instructions per cycle.
4. It locks 2 cores and lowers its clock in Wii mode.
5. It supports voltage stepping (known because of 4).

The components have also been labeled, but beyond that it is unknown how capable it is.
 
The speed while using external USB is probably being limited by the network adapter. But the speed while using internal flash... appalling doesn't even begin to cover it...

Data transfers are limited by the Starbuck ARM processor as well, which is responsible for encrypting and decrypting signed data on the fly, just like the ARM processor on the original Wii. Hopefully the summer update makes dramatic improvements in its performance.
 
That isn't really saying anything.

Saying that some things are good and some things are bad just cancels itself out. It does not gauge any form of real-world capability. I'm looking for details, and a lot of details about the CPU are still largely unknown, like what is better and what is worse. Just the other day we were made aware that it may have more registers than the previous CPUs in the series.

There are only 5 things that are really known for certain about the CPU.
1. It's in the PPC750 series.
2. Its active clock speed in Wii U mode is 1.24 GHz.
3. It's an out-of-order CPU.
4. It locks 2 cores and lowers its clock in Wii mode.
5. It supports voltage stepping (known because of 4).

We know it has paired singles. We also know two of the cores have 512KB of L2 cache and one has 2MB. That's about it.
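Paired singles operate on two 32-bit floats at once, so if you assume one paired-single multiply-add per core per cycle (the usual way these peak figures get counted, not a confirmed spec), the arithmetic looks like this:

CORES = 3
CLOCK_GHZ = 1.24                 # Wii U mode clock from the list above
FLOPS_PER_CYCLE_PER_CORE = 4     # 2 floats x (multiply + add), assumed

peak_gflops = CORES * CLOCK_GHZ * FLOPS_PER_CYCLE_PER_CORE
print(f"Espresso theoretical peak: ~{peak_gflops:.1f} single-precision GFLOPS")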
 
We know it has paired singles. We also know two of the cores have 512KB of L2 cache and one has 2MB. That's about it.

Yeah, I added that the components have been labeled while you were quoting this. We know what they are, but not how well they work.

Actually, comparing what is written about the CPU on the first page with what is written about the GPU, we seem to know more about the GPU than we do about the CPU. I wonder if Criterion would answer if someone asked them just what is better and what is worse when using the CPU. I don't see why that would be NDA'd.
 
Data transfers are limited by the Starbuck ARM processor as well, which is responsible for encrypting and decrypting signed data on the fly, just like the ARM processor on the original Wii. Hopefully the summer update makes dramatic improvements in its performance.

Would it be fair to say that, since the basic Wii U console is left with 3GB out of 8 and the Deluxe with 25 out of 32, the basic model would be a little slower and could present more problems than the Deluxe model, taking into consideration that flash memory becomes slower as it fills up? Ever since I bought the Wii U, I have not written anything to the system memory, only to my USB drive, and the system seems fast to me; I've barely had any problems with it.
 
That isn't saying anything. That's just describing the chip.

Saying that some things are good and some things are bad just cancels itself out. It does not gauge any form of real-world capability. I'm looking for details, and a lot of details about the CPU are still largely unknown, like what actually is better and what is worse. Just the other day we were made aware that it may have more registers than the previous CPUs in the series.

There are only 5 things that are really known for certain about the CPU.
1. It's in the PPC750 series.
2. Its active clock speed in Wii U mode is 1.24 GHz.
3. It's an out-of-order CPU that processes multiple instructions per cycle.
4. It locks 2 cores and lowers its clock in Wii mode.
5. It supports voltage stepping (known because of 4).

The components have also been labeled, but beyond that it is unknown how capable it is.

6. The pipeline has 4 stages.
 
First tape-out ≠ mass production

As I said, 20nm will only become available at TSMC in 2014, and for obvious reasons 16nm will need another ~2.5-3 years. So there is no chance at all of 10nm being ready (again: actual production, not some PR stunt or TSMC roadmap BS) in 2015, '16 or '17.


Also see this concerning tape out and mass production:
http://www.extremetech.com/computin...rtex-a57-tape-out-chip-launching-no-time-soon

"When asked how much volume TSMC would be doing on 16nm in 2015, Chang responded “I think it will be very small.” And that tracks perfectly with what we’ve previously seen from ARM/TSMC collaborations. The first Cortex-A15 built on 20nm with TSMC taped out in 2011. If Chang’s remarks are accurate, we might see such a chip come to market in 2014 or 2015 — three or four years after tape-out."

What does tape out mean? That they can fabricate at that size, but they're not ready for mass production?
 
Would it be fair to say that, since the basic Wii U console is left with 3GB out of 8 and the Deluxe with 25 out of 32, the basic model would be a little slower and could present more problems than the Deluxe model, taking into consideration that flash memory becomes slower as it fills up? Ever since I bought the Wii U, I have not written anything to the system memory, only to my USB drive, and the system seems fast to me; I've barely had any problems with it.

I'm in the same boat. I have the deluxe model, but I have used an external USB drive for just about everything. The OS is loaded into the internal flash, and it seems much faster after the spring update. I would wager the slowness of the internal storage has a lot to do with a poorly tuned operating system and the ARM decoder. Software should be able to improve the performance of both by quite a bit, as it already has in the spring update.
 
I was leaning towards the lack of information on real world performance and capability for the CPU.

I wish we had better data, but we probably won't get what we want until a developer spills the beans anonymously or homebrew actually picks up steam on the console. We might get lucky with the former, but overall interest in the system will have to pick up quite a bit for the latter to take place, especially with how easy Nintendo is making it for small devs to monetize their development anyway.
 