Fourth Storm
Member
Your assumptions here are a bit off. A 160-ALU part at 550 MHz on 40nm, designed in a bubble, would be ~10 watts, not to mention it's an MCM and a low-power 40nm process is extremely likely.
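For what it's worth, here's a rough back-of-envelope sketch of that kind of estimate, assuming dynamic power scales roughly with ALU count x frequency x voltage squared, scaled down from a hypothetical larger 40nm VLIW5 part. The reference figures are placeholders I picked for illustration, not measured numbers.

```python
# Back-of-envelope power scaling for a 160-ALU VLIW5 part at 550 MHz on 40nm.
# Assumes dynamic power scales roughly with ALU count * frequency * voltage^2.
# The reference figures below are illustrative placeholders, not measured data.

ref_alus = 320         # hypothetical 40nm VLIW5 reference part
ref_clock_mhz = 650
ref_voltage = 1.10
ref_core_watts = 25.0  # assumed core-only power for that reference part

target_alus = 160
target_clock_mhz = 550
target_voltage = 0.95  # a low-power 40nm process would allow a lower voltage

scale = ((target_alus / ref_alus)
         * (target_clock_mhz / ref_clock_mhz)
         * (target_voltage / ref_voltage) ** 2)

print(f"Estimated core power: {ref_core_watts * scale:.1f} W")
# -> roughly 8 W with these placeholder inputs, i.e. in the ~10 W ballpark
```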
Another problem is the document: final hardware didn't exist in time for launch, so stand-in hardware had to be used to get the launch games done. Wii U's dev kits were R700-based from early on, and it made sense to keep coding around that. These games were "designed in a bubble," meaning the code they wrote had to work on the dev kits that existed at the time. VLIW5 code works beautifully on VLIW4 and even better on GCN; if you want to test this, pull a demo designed for VLIW5 and run it on a newer card, and even at the same specs it will run better on the later designs. Your assumptions here are misguided and they don't fit with the reality we see in the die photo.
Did you take a look at the GCN memory setup? I think the major problem with using R700 as the final specs is that the document was written for the dev kits as received. I talked to a source around that time and he mentioned that the document kept being updated, which points to a changing environment. I haven't had the chance to speak to him since September, but it's likely that by November the final hardware no longer resembled what was in the dev kit. That could also be a strong reason why NFS was so easily impressive while launch software failed to do anything beyond last-gen consoles: the dev kits were simply not close enough to final hardware.
PS: that would also explain the fps drops we see in COD: Black Ops 2.
From what I've read, the main advantage of VLIW4 over VLIW5 is that VLIW4 simply allowed AMD to fit more shader cores on the design. A low-SPU VLIW4 design should not have any great advantage over VLIW5, and for the same number of VLIW units, VLIW5 actually has higher theoretical peak performance. There is a reason AMD went back to it before finally switching to GCN.
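To put a quick number on that theoretical-performance point (my own arithmetic, nothing from the document): at the same clock and the same number of VLIW units, the fifth lane gives VLIW5 25% more peak FLOPS. VLIW4 only wins if its smaller units let you pack more of them into the same die area.

```python
# Peak-throughput comparison at equal VLIW unit count and clock (illustrative).
# Each ALU is counted as one fused multiply-add (2 flops) per cycle at peak.

def peak_gflops(units, lanes_per_unit, clock_mhz, flops_per_alu=2):
    return units * lanes_per_unit * flops_per_alu * clock_mhz / 1000

clock = 550  # MHz
units = 32   # same number of VLIW units in both layouts (32 x 5 = 160 ALUs)

print("VLIW5:", peak_gflops(units, 5, clock), "GFLOPS")  # 176.0
print("VLIW4:", peak_gflops(units, 4, clock), "GFLOPS")  # 140.8
# VLIW4 only pulls ahead if the smaller units let you fit more of them.
```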
Devs working on PC games would have a more difficult time optimizing for VLIW5 because they are likely working through DirectX and have to ensure the code runs on a variety of cards from both AMD and Nvidia. If early Wii U games were optimized to actually keep all five ALUs in each VLIW5 unit fed (which is certainly possible, considering ports were likely already optimized for the vec4+scalar config of the 360), a switch to VLIW4 could really throw a monkey wrench into that code unless the shader count was jacked up.
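Here's a toy sketch of that "keeping the ALUs fed" point, nothing more than a greedy in-order packing model I made up for illustration: code hand-packed into vec4+scalar pairs issues in one slot per pair on a 5-lane unit, while on a 4-lane unit the leftover scalar spills into an extra slot unless the compiler re-packs the instruction stream.

```python
# Toy issue-slot model: greedily pack ops (given as lane widths) into VLIW
# bundles of a fixed width, in order. Ignores dependencies, transcendental
# co-issue rules, and everything else a real shader compiler has to handle.

def cycles_needed(op_widths, lanes):
    cycles, free = 0, 0
    for width in op_widths:
        if width > free:       # op doesn't fit, start a new bundle
            cycles += 1
            free = lanes
        free -= width
    return cycles

# A 360-style hand-packed stream: a vec4 op paired with a scalar op each step.
shader = [4, 1] * 8

print("VLIW5 slots:", cycles_needed(shader, 5))  # 8: vec4 + scalar co-issue
print("VLIW4 slots:", cycles_needed(shader, 4))  # 16 in order; compiler
                                                 # re-packing claws some back
```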
Edit: As for a GCN setup, that would look much, much different from what we are seeing on Latte. The amount of SRAM in the compute units, for one, would be significantly higher.
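For reference, here's a rough per-CU SRAM tally based on the figures I recall from AMD's public GCN architecture material; treat the numbers as ballpark, the point is just the order of magnitude.

```python
# Approximate SRAM inside a single GCN compute unit (figures recalled from
# AMD's public GCN architecture docs; treat as ballpark, not exact).
per_cu_kb = {
    "vector register file (4 SIMDs x 64 KB)": 256,
    "local data share (LDS)": 64,
    "L1 vector data cache": 16,
    "scalar register file": 8,
}

total = sum(per_cu_kb.values())
print(f"~{total} KB of SRAM per CU")  # ~344 KB, before the instruction and
                                      # scalar caches shared between CUs
# Multiply that by even a handful of CUs and the SRAM blocks would be far
# larger than what the shader blocks on the Latte die photo appear to hold.
```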