That's what I imagined earlier (can't remember which thread). Make a 20nm SoC out of the Wii U tech (or slightly enhanced) for the next handheld (the 64-bit DDR interface is also a perfect fit for this) and bump the specs for the next home console on the same architecture (8 CPU cores, more GPU cores (~2TF), faster and more RAM, etc.). That would make things a lot easier for engine development and so on.
I would imagine they'd use something smaller than 20nm; 10nm should be ready by 2016, and the DS4 isn't going to show up until 2017 or 2018 IMO. But yes, this is exactly what I'm saying: build the home console and convert it into the next handheld until you can't anymore. Iwata said they are exploring this, more or less, when he said they will be moving to one architecture in their next cycle. It could also mean scaling Wii U down into a phone at some point, because the reality is there is just too much money to be made, and if they could somehow get Android running on their Wii U hardware, they could just have Samsung or another manufacturer build their handheld device.
I've even thought they could use the + button as their home button, like the iPhone's circle or Samsung's rectangle; put two shoulder buttons on the side and a stylus on the bottom, and you have a compelling game device.
ArchangelWest:
Tessellation is fixed-function; that is how it's done even on current PC hardware. As for what you are saying, it is possible that they have added some fixed-function hardware inside the SPUs, I believe, but it is more likely that these shader units are simply much larger... maybe something crazy happened and they merged the 5th shader's capabilities into every other shader in VLIW5, making it a hybrid of VLIW5 and VLIW4, so that VLIW5's problem of the 5th shader often being useless has been corrected by unifying the shaders like in GCN (which was one of the points I was trying to make earlier). This would allow them to move from instruction-level parallelism (ILP) to thread-level parallelism (TLP), which would be a focus if Nintendo really did want to create a GPGPU-centric GPU.
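To make the ILP-vs-TLP point concrete, here's a toy Python model (my own simplification, with made-up numbers, not how the real scheduler behaves): a VLIW5 issue slot can only pack independent ops from one thread per cycle, while a TLP-style design fills its lanes from whichever threads are ready.

```python
# Toy model of the ILP-vs-TLP distinction. Assumptions (mine): a VLIW5
# unit issues up to 5 *independent* ops from ONE thread per cycle, while
# a TLP-style scheduler fills its lanes with ops from DIFFERENT threads.

def vliw5_cycles(threads):
    """Each cycle packs up to 5 independent ops from a single thread."""
    cycles = 0
    for ilp, ops in threads:            # ilp = independent ops available per cycle
        per_cycle = min(ilp, 5)         # leftover slots are simply wasted
        cycles += -(-ops // per_cycle)  # ceiling division
    return cycles

def tlp_cycles(threads, lanes=5):
    """Each cycle fills lanes with ops drawn from any ready thread."""
    total_ops = sum(ops for _, ops in threads)
    return -(-total_ops // lanes)

# Four threads, each with 100 ops but only ~3 independent ops per cycle:
threads = [(3, 100)] * 4
print(vliw5_cycles(threads))  # 136 cycles: 2 of 5 slots idle every cycle
print(tlp_cycles(threads))    # 80 cycles: lanes stay full across threads
```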
VLIW relies on the compiler scheduling the shader's work ahead of time, and that static scheduling is the weak point: it's why VLIW5 often didn't even fill all 4 of the base shader slots and rarely used the 5th. VLIW is rather complex, and performance is hard to predict thanks to issues like the compile-time scheduler and, in VLIW5's case, not all of the shaders being created equal.
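Here's a rough sketch of that compile-time packing problem, with a made-up op list (the slot model is simplified from the real ISA): any dependency chain forces ops into separate bundles, so slots sit empty no matter how clever the compiler is.

```python
# The compiler can only bundle ops whose inputs are already computed,
# so a serial chain a -> b -> c can never share a bundle.

ops = {                                   # op: set of ops it depends on
    "a": set(), "b": {"a"}, "c": {"b"},   # serial dependency chain
    "d": set(), "e": set(),               # independent filler work
}

bundles, done = [], set()
while len(done) < len(ops):
    # Pack up to 5 ops whose dependencies are all satisfied.
    ready = [o for o in ops if o not in done and ops[o] <= done]
    bundle = ready[:5]
    bundles.append(bundle)
    done.update(bundle)

for i, b in enumerate(bundles):
    print(f"cycle {i}: {b} ({5 - len(b)} of 5 slots empty)")
# cycle 0: ['a', 'd', 'e'] (2 of 5 slots empty)
# cycle 1: ['b'] (4 of 5 slots empty)
# cycle 2: ['c'] (4 of 5 slots empty)
```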
What I think the extra space could be is added logic to make all 5 ALUs in a VLIW group equal, like VLIW4, plus a scheduler inside each SPU so the hardware can do the scheduling work itself. That would correct the weaknesses of VLIW and make all 20 ALUs in an SPU far more efficient; it is largely something they would have had to do to address the TEV anyway, I believe, and it should solve the GPGPU issues that VLIW5 (and 4) had.
This would account for the extra space and would give the 160 ALUs a noticeable boost. Compared to stock VLIW5, for instance, it would be similar to having 200 ALUs, and it should offer even greater gains on top of that thanks to the hardware scheduler keeping code from stalling in the VLIW pipeline.
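Quick back-of-envelope on that "160 ≈ 200" claim, using slot-utilization figures I'm assuming purely for illustration (not measured numbers):

```python
# If stock VLIW5 averages ~3.4 of 5 slots busy and the hypothetical
# reworked SPU keeps ~4.25 of 5 busy, the throughput ratio works out
# the same as adding 40 more stock ALUs. Both utilization figures are
# illustrative assumptions.

physical_alus = 160
stock_util    = 3.4 / 5    # typical VLIW5 slot occupancy (assumed)
improved_util = 4.25 / 5   # with per-SPU hardware scheduling (assumed)

effective = physical_alus * improved_util / stock_util
print(f"~{effective:.0f} stock-VLIW5-equivalent ALUs")  # -> ~200
```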
This isn't out-of-order execution, mind you; it can't jump around within a thread's code, but it could select different code, pixels, or values to work on rather than waiting around for jobs that only utilize 3 or 4 shaders to finish.
I guess this sounds a lot like GCN at this point. The main difference, of course, is that they would still be locked to groups of 5 shaders in series of 4 rather than groups of 16 in series of 4. It's also likely that 4 VLIW SPUs still work as one SIMD (iirc), which means you'd have 2 "CUs" that are 80 ALUs wide rather than 2 that are 64 shaders wide. However, if there is a scheduler in each SPU, you could have more elements per thread than GCN, basically wider lanes than GCN has. The trade-off is die space; schedulers take up room, but there is plenty more extra space in these SPUs than in R700 SPUs, meaning this would be some ridiculous custom R700 chip that has become a sort of GCN/VLIW hybrid.
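Just to lay out the arithmetic behind the 2×80 vs 2×64 comparison (the grouping itself is the "iirc" speculation above, so treat the layout as an assumption):

```python
# 8 SPUs of 20 ALUs, grouped 4 SPUs per SIMD, gives 2 "CU"-like blocks
# of 80 lanes each, vs GCN's 64-lane CUs. Pure arithmetic on the post's
# numbers; the grouping is speculative.

spus, alus_per_spu, spus_per_simd = 8, 20, 4
simds = spus // spus_per_simd          # 2 "CU"-like blocks
lanes = alus_per_spu * spus_per_simd   # 80 lanes each
print(simds, "SIMDs x", lanes, "lanes =", simds * lanes, "ALUs")  # 2 x 80 = 160
print("GCN comparison: 2 CUs x 64 lanes =", 2 * 64, "ALUs")       # 128
```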
Edit: I just looked it up, and Cayman had a 44-cycle latency. That is completely against the design logic of the Wii U and is a major reason they might have done something very similar to what I suggest above. (Cayman is VLIW4, which has a lower latency than VLIW5 chips, iirc.)
GCN has a much lower cycle latency, and so would this, possibly even lower than GCN, since there are more schedulers and a wider series of elements (20 shaders vs. 16).
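For a sense of why latency matters here, a rough latency-hiding calculation; Cayman's 44 cycles is from above, while the other latencies and the 4-cycle issue cadence are placeholders I'm assuming for the comparison, not real figures:

```python
# The fewer cycles a result takes, the fewer wavefronts the scheduler
# must keep resident to hide the wait.

def wavefronts_to_hide(latency_cycles, issue_cycles=4):
    # Assuming a wavefront issues over ~4 cycles, roughly
    # latency / issue time wavefronts must be in flight to hide it.
    return -(-latency_cycles // issue_cycles)

# Cayman's 44 is from the post; the other figures are placeholders.
for name, lat in [("Cayman (VLIW4)", 44), ("GCN (approx.)", 4), ("hybrid guess", 8)]:
    print(f"{name}: ~{wavefronts_to_hide(lat)} wavefronts in flight")
# Cayman (VLIW4): ~11 wavefronts in flight
# GCN (approx.): ~1 wavefronts in flight
# hybrid guess: ~2 wavefronts in flight
```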