I think the bolded is what most people are hoping for. Or, I suppose, that the handheld CPU situation will be "good enough" to allow Nintendo to develop games that work well across both form factors without feeling inherently gimped on the handheld.
There's also an argument to be made (at least if they go with 14LPP) for lots of A53s at a relatively low clock speed. Assuming the claimed 15% power reduction over 14LPE holds, you could run 12 cores at 1GHz and consume a little under a watt to hit 27,600 DMIPS (in a pretty small die area, at that). You're depending on developers to parallelise pretty heavily, and losing out a little on performance with code that can't be parallelised, but in a situation where every gain in perf/W saves you money in development costs down the line it's worth keeping in mind.
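For reference, the 27,600 DMIPS figure works out to 2.3 DMIPS/MHz per A53 core. A quick sketch of that arithmetic follows; the per-MHz rating is back-derived from the numbers above rather than read off a spec sheet, the function name is made up for illustration, and perfect scaling across cores is assumed.

```python
# Back-derive the per-core rating from the figures in the post:
# 27,600 DMIPS / (12 cores * 1000 MHz) = 2.3 DMIPS/MHz per A53.
A53_DMIPS_PER_MHZ = 27_600 / (12 * 1000)


def total_dmips(cores, clock_mhz, dmips_per_mhz=A53_DMIPS_PER_MHZ):
    """Aggregate DMIPS, assuming throughput scales linearly with cores and clock."""
    return cores * clock_mhz * dmips_per_mhz


print(round(A53_DMIPS_PER_MHZ, 2))    # 2.3
print(round(total_dmips(12, 1000)))   # 27600
```

Code that can't be spread across cores only ever sees the single-core figure, which is the trade-off mentioned above.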
Yeah, I decided to take a look at this. I'm using A57 as a stand-in for Jaguar, since the latter isn't really a popular core design and the two perform about the same. 7 A57 cores clocked at 1.6GHz would have roughly the same processing power as 17 A53 cores clocked at 1GHz (based on A57 cores outperforming A53 cores by 56%). Now, I'm not trying to hit that mark with the handheld. It is possible with A53 cores, but you would need about 18 of them. That is still only 9mm^2 at 14nm, so it's doable but a bit insane, even if the power envelope would be 1.6 watts.
My problem with this approach is the parallelised code: it would require a lot of work to port a title. Honestly, I don't think these CPUs need to be 1:1. I think the handheld can get away with less CPU performance while only having superficial downgrades.
I did the math, and 4 A72 cores @ 1.4GHz + 4 A53 cores @ 1GHz would perform about as well as 15 A53 cores @ 1GHz. This CPU would consume 1.4 watts and have an area of 10mm^2 at 14nm. The benefit of this design is that code that can't be parallelised doesn't have to be, while still getting a decent level of performance that should fit inside a decent power envelope.
I know it's a funny way to do it, and admittedly it is a rough estimate based on this article:
http://www.gizmochina.com/2015/03/27/huawei-reveals-kirin-930-uses-enhanced-cortex-a53e-cores/ "It could have used the Cortex A57 processor which is almost 56% faster than A53."
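To show where the "17 A53 cores" number comes from, here is a minimal sketch of the conversion. It assumes the 56% from the quote is a per-clock advantage and that performance scales linearly with clock speed and core count, both of which are simplifications; the function name is just for illustration.

```python
A57_OVER_A53 = 1.56  # "almost 56% faster", taken from the article quoted above


def a53_equivalents(cores, clock_ghz, per_clock_ratio=1.0, base_clock_ghz=1.0):
    """Number of A53 cores at base_clock_ghz with the same aggregate throughput.

    Assumes linear scaling with clock and core count (a rough approximation).
    """
    return cores * (clock_ghz / base_clock_ghz) * per_clock_ratio


equiv = a53_equivalents(7, 1.6, A57_OVER_A53)
print(f"{equiv:.2f}")  # 17.47

# Per-core area and power implied by the 18-core figures (9mm^2, 1.6 watts).
print(9 / 18)              # 0.5 mm^2 per core
print(round(1.6 / 18, 3))  # 0.089 watts per core
```

The result lands between 17 and 18, which is why the estimate moves from "17 cores" to "about 18 of them".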
As for 4 CUs, I'd love to see them, since it would make a console and handheld much easier to pair, but it really comes down to Polaris's power draw. If they can get it low enough, it would be worth it even if it meant knocking down some clocks on the CPU side. If Polaris can draw, say, 1.2 watts for 4 CUs @ 400MHz at 14nm LPP, it could be perfect for the handheld, and would still be reasonable with the CPU I outlined above. The one big benefit here is that if Nintendo hits 1 TFLOP on the console, the console would only have about 5x the performance of the handheld. That's pretty reasonable considering the resolution we expect on the handheld, and it would still leave some room for added effects / draw distance and hit parity with XB1. (This is assuming they use Polaris in the console, which I know you don't expect, but it isn't outside the realm of possibility.)
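The "about 5x" figure can be sanity-checked with a short sketch. It assumes GCN-style CUs (64 shader lanes each, with one fused multiply-add counted as 2 FLOPs per lane per clock), which is how AMD's GCN parts are normally rated. Whether a handheld part would actually be configured this way is a guess, and the 1 TFLOP console is the hypothetical from above.

```python
LANES_PER_CU = 64         # GCN compute unit: 4 SIMDs x 16 lanes
FLOPS_PER_LANE_CLOCK = 2  # one fused multiply-add counts as two FLOPs


def gflops(cus, clock_mhz):
    """Peak single-precision GFLOPS for a GCN-style GPU."""
    return cus * LANES_PER_CU * FLOPS_PER_LANE_CLOCK * clock_mhz / 1000


handheld = gflops(4, 400)
console = 1000.0  # hypothetical 1 TFLOP console

print(handheld)                      # 204.8
print(round(console / handheld, 2))  # 4.88
```

So 4 CUs @ 400MHz is roughly 205 GFLOPS peak, and the console would sit a little under 5x above it.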
Thinking about the CPU some more, it could be possible to do 2 A72 cores @ 1.4GHz + 8 A53 cores @ 1GHz, which is equivalent to about 13 A53 cores @ 1GHz. That could still help with code that can't be parallelised, and would only take up 8mm^2 @ 14nm.
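The A72-to-A53 conversion factor isn't spelled out above, so here is a sketch that back-solves it from the two mixed layouts. It assumes linear scaling with clock and core count, and the function name is made up for illustration.

```python
def implied_big_core_ratio(total_a53_equiv, big_cores, big_clock_ghz,
                           little_cores, little_clock_ghz=1.0):
    """Per-clock advantage of the big core implied by an A53-equivalent total."""
    big_share = total_a53_equiv - little_cores * little_clock_ghz
    return big_share / (big_cores * big_clock_ghz)


print(round(implied_big_core_ratio(15, 4, 1.4, 4), 2))  # 1.96 (4 A72 + 4 A53)
print(round(implied_big_core_ratio(13, 2, 1.4, 8), 2))  # 1.79 (2 A72 + 8 A53)
```

The two layouts imply slightly different factors, which suggests the totals are rounded. A per-clock factor of about 1.96 would give roughly 15.0 and 13.5 A53 equivalents respectively, in line with "about 15" and "about 13".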