I don't really care what Wii U's ALU count is. I only cared originally because I wondered how it stacked up to Xenos, but Xenos was so poor in efficiency that once I dug into its performance and found it was almost half as powerful as I thought, it was clear that Wii U's bare minimum spec would win in performance, so I found little reason to care beyond that. All the people throwing around 160 ALUs only do so because the majority of people following these threads have taken that number as fact, when in reality it is a poor guess based on the performance of one small-budget game that wasn't trying to push the console in any real way. I still think 320 ALUs is more likely. One reason fourth storm decided it wasn't is that he heard from someone (a rumor) that Latte was actually a 45nm process chip, but I find that very odd, since no AMD GPU has ever been 45nm afaik, and even the R700 was produced at 40nm and was found to be significantly more efficient. Nintendo's design also argues against 45nm, which makes very little sense when the MCM was needed to hit more exact performance than anything Nintendo has made in the past.
Just catching up with this thread. Man oh man, it's getting a bit crazy in here!
Anyway, z0m, I don't think 45nm vs 40nm is as big a deal as the switch to a different manufacturer, namely Renesas, who to my knowledge have never manufactured a Radeon part before. Add the fact that they had to put the blocks on an eDRAM-friendly process, and that might also account for lower density.
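For a rough sense of scale on the node question, here is a back-of-the-envelope sketch. It assumes ideal scaling, where the area of a block goes with the square of the feature size. Real processes rarely hit that, and it says nothing about the manufacturer or eDRAM effects mentioned above, so treat it as an upper bound on the node difference alone. The function name is just something I made up for illustration.

```python
def ideal_area_ratio(old_nm: float, new_nm: float) -> float:
    """Area of a block on the old node relative to the new node.

    Assumption: perfect scaling, i.e. area is proportional to the
    square of the feature size. Real shrinks are usually less clean.
    """
    return (old_nm / new_nm) ** 2


if __name__ == "__main__":
    ratio = ideal_area_ratio(45, 40)
    # Under the ideal-scaling assumption, the same logic on 45nm would
    # take this many times the area it takes on 40nm.
    print(f"45nm vs 40nm ideal area ratio: {ratio:.3f}")
```

So even in the ideal case the node gap by itself is a modest factor, which is why I'd look at the other density factors first.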
Another thing I learned recently regarding the Brazos size comparisons is that the ALUs in that design lack double-precision floating-point capability. We don't know if Latte has this or not, but considering that Iwata called it a "GPGPU," we can at least entertain the possibility that it might, as double-precision math is an important aspect of many OpenCL applications. If Latte's shaders do support it, that could also contribute to their larger relative size.
No, my biggest reasons for rejecting anything other than 160 shaders at this point are the register banks within the shaders, the TMU count, and the TDP.
On a different note, I actually took to studying the die a bit more last night (I have no idea why it gives me such entertainment) while using Brazos as a reference. One of the things that has bothered me is the V block on Latte. Up until now, I have identified it as the Global Data Share and pegged it at 64 KiB. I didn't find this ID entirely satisfactory, however. The main reason is that having no memory controller of any sort adjacent to the DDR3 interface would be quite unusual. I also found it suspicious that the block housed two symmetric SRAM groups rather than one large pool.
Anyway, when I got to looking at Brazos again, I realized that even though the GPU supposedly has a 64 KiB GDS, I couldn't for the life of me locate it. That much memory should be easy to find. I finally concluded that the only place it could be located is within the Shader Export block. Some GCN documentation I looked at (this part of which we can probably apply to previous designs as well) states that export is the path to GDS.
So I now believe Latte's GDS is to be found within the Shader Export block, namely Block P. I am less confident about the size of the memory store now, but it's probably somewhere between 16 KiB (as in R700) and 64 KiB (as in everything later than R700). It also makes sense that this block is adjacent to the Q blocks, which I believe are the Local Data Shares.
Meanwhile, Block R is probably the North Bridge, as it seems to resemble other NB designs and features some dual-ported SRAM, which I figure might be useful in such a block. It's also adjacent to the eDRAM and whatever kind of channel/bus thingy is going on in the middle of it. Block V would then be some type of DMA engine or an additional memory controller for the DDR3 memory, if the NB is meant specifically to handle access to the eDRAM. Maybe something like the separate "MAC" blocks described in the patent for Flipper's "enhanced memory controller." Block O might be a hub designed to arbitrate memory accesses from the UVD, Display Controller, DSP, and other less bandwidth-intensive components separate from the graphics pipeline.
Finally, I think the 3 MB MEM 0 is probably being used by Nintendo as a cache of sorts for intercommunication between the CPU and GPU (conveniently the same size as the CPU's L2), and the Front Side Bus hardware is most likely Block A.