I've always taken the high clock speeds in the PS3 and 360 as relics of the 'more clocks is better' Pentium 4 mentality. We later saw faster chips from Intel that ran at lower clock speeds.
I know I'm mixing two different companies here, but I've assumed that some of the mentalities are the same.
P4 was an extreme case of clock-chasing, to the extent that its performance/power ratio went down the drain (performance scales roughly linearly with clock, while power draw grows much faster: roughly with the cube of the clock once the voltage has to rise along with it). We are not looking at that with Power7 - it has both the clock *and* the wide-issue, out-of-order capabilities, while staying at sane power-draw levels. It can still reach higher clocks than the 476, which allows the same amount of work to be done by fewer cores. Yes, numerous 476s doing the same work would have the overall power-efficiency advantage (which is very important for supercomputers with gazillions of cores), but extracting from them the same performance as from fewer-but-faster cores is a non-trivial task. We're not dealing with GPU threads here; these are old-fashioned, human sweat-and-tears CPU threads we're talking about.
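A minimal sketch of why perf/watt collapses when chasing clocks, using the first-order CMOS dynamic-power model P ≈ αCV²f. The assumption that voltage rises in proportion to clock is mine and only holds roughly within a chip's operating range; performance is idealized as linear in clock (compute-bound code). The helper names are illustrative.

```python
# First-order CMOS dynamic power: P ~ alpha * C * V^2 * f.
# ASSUMPTION (illustrative): supply voltage must rise roughly in proportion
# to clock frequency, so V/V0 = f/f0 and therefore P/P0 = (f/f0)^3.


def relative_power(clock_ratio: float) -> float:
    """Dynamic power relative to baseline when the clock is scaled by clock_ratio."""
    voltage_ratio = clock_ratio  # assumed V proportional to f
    return voltage_ratio ** 2 * clock_ratio


def relative_performance(clock_ratio: float) -> float:
    """Idealized compute-bound code: performance scales linearly with clock."""
    return clock_ratio


def perf_per_watt(clock_ratio: float) -> float:
    """Performance per unit of dynamic power, relative to baseline."""
    return relative_performance(clock_ratio) / relative_power(clock_ratio)


if __name__ == "__main__":
    for ratio in (1.0, 1.5, 2.0):
        print(
            f"clock x{ratio}: perf x{relative_performance(ratio):.2f}, "
            f"power x{relative_power(ratio):.2f}, perf/W x{perf_per_watt(ratio):.3f}"
        )
```

Under these assumptions, doubling the clock costs about 8x the dynamic power for 2x the performance, i.e. a quarter of the perf/watt.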
Generally speaking, high clocks are not bad per se - it's the sacrifices and the sheer power-draw cost (and the associated TDP, etc.) that take balancing. Let's take Xenon - a clocky CPU with massive vector units. Its metric ton of high-clocked ALUs is a factor to be reckoned with. It does nothing for the general-purpose computing case (where massively superscalar, OOE cores are kings), but then it's not a general-purpose CPU to begin with. Or at least not as general-purpose as your notebook's Core-based CPU. Of course, Cell still makes Xenon look ordinary.
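Why high-clocked vector ALUs matter can be shown with the usual back-of-the-envelope peak-FLOPS product: cores × clock × SIMD lanes × FLOPs per lane per cycle (2 when a fused multiply-add issues every cycle). The two configurations below are hypothetical; the first is loosely Xenon-like in shape, but treat every number as illustrative, not as a datasheet value.

```python
from dataclasses import dataclass


@dataclass
class CpuConfig:
    """Hypothetical CPU description for a peak-FLOPS estimate."""
    name: str
    cores: int
    clock_ghz: float
    simd_lanes: int      # floats processed per vector instruction
    flops_per_lane: int  # 2 if a fused multiply-add issues every cycle, else 1

    def peak_gflops(self) -> float:
        # peak = cores * clock (GHz) * lanes * FLOPs per lane per cycle
        return self.cores * self.clock_ghz * self.simd_lanes * self.flops_per_lane


if __name__ == "__main__":
    clocky = CpuConfig("clocky, wide SIMD", cores=3, clock_ghz=3.2,
                       simd_lanes=4, flops_per_lane=2)
    modest = CpuConfig("low clock, scalar FPU", cores=3, clock_ghz=1.6,
                       simd_lanes=1, flops_per_lane=2)
    for cpu in (clocky, modest):
        print(f"{cpu.name}: {cpu.peak_gflops():.1f} GFLOPS peak")
```

With these made-up numbers the clocky design peaks at about 76.8 GFLOPS against 9.6 for the low-clock scalar one. Peak figures say nothing about how much of that throughput real code actually sustains.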
I've been assuming that a thread on 476FP is going to get more instructions done per cycle than on something like the Xenon. I think that's why the 476FP can get away with lower clock speeds.
Sure, IPC is one metric. FLOPs is another. Xenon's strength is not so much in the former as in the latter, while it still manages to be an 'ordinary design' CPU. The problem is how much effort it takes to get FLOPs out of the CPU when you need them (and it just so happens that games occasionally need some close at hand). Cell used to make all this look easy - you just put a portion of those SPUs to work on your pressing task. Well, it turned out not to be so trivial. A massive amount of R&D went into engines capable of scaling work with the number of SPUs, more so than with Xenon threads. Some devs succeeded, others not so much (hey, not everybody can be DICE). An old-fashioned, high-performing single core is still the easiest way to get a job done. Core multiplicity is here not because we like it, but because we've hit roadblocks with single-core performance. Well, Power7 is the cream of the crop in single-core performance.
My understanding is probably flawed somewhere here.
There's nothing wrong with your understanding; it's just that I think the balance Nintendo seeks will lean them more toward higher-performance cores than toward twice as many lower-clocked cores (both options subject to pricing, of course). There's another factor - Nintendo cannot afford to underperform Xenon at the single-thread level. Which means they need that metric ton of clocky ALUs. A Power7 core has them; a 476 core, not so much.
You're right that we need to think about yields for the processor that Nintendo wants. If the Power7 solution has a much lower surface area than a 476FP solution, then they're going to be able to get better yields, and vice versa.
The fact is that I haven't run across enough data on completed solutions like the "Axxia ACP3448" to say with any certainty how that would play out; I've just been running with the assumption that the simpler cores with lower clock speeds would get better yields. If a completed 476FP solution is bigger than a Power7, then that gets thrown out the window.
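The area side of the yield argument can be made concrete with the textbook first-order Poisson defect model, Y = exp(-A·D0), plus the common dies-per-wafer approximation. Both are standard simplifications substituted in here; the defect density and die areas are hypothetical, not data on Power7 or 476FP parts. Clock speed does not appear in this model at all: defect-limited yield depends on area, while speed binning is a separate effect it ignores.

```python
import math

# ASSUMPTIONS: textbook first-order models only. Defect density and die
# areas used below are hypothetical, not data for any real chip.


def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of dies with zero killer defects: Y = exp(-A * D0)."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Common approximation: wafer area / die area, minus an edge-loss term."""
    d = wafer_diameter_mm
    gross = math.pi * (d / 2) ** 2 / die_area_mm2
    edge_loss = math.pi * d / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)


def good_dies_per_wafer(die_area_mm2: float, defects_per_mm2: float,
                        wafer_diameter_mm: float = 300.0) -> float:
    """Expected number of defect-free dies per wafer."""
    return dies_per_wafer(die_area_mm2, wafer_diameter_mm) * poisson_yield(
        die_area_mm2, defects_per_mm2)


if __name__ == "__main__":
    d0 = 0.002  # hypothetical: 0.2 defects per cm^2
    for area in (100.0, 150.0):  # hypothetical die sizes in mm^2
        print(f"{area:.0f} mm^2: yield {poisson_yield(area, d0):.1%}, "
              f"~{good_dies_per_wafer(area, d0):.0f} good dies per 300 mm wafer")
```

A larger die loses twice: fewer candidates fit on the wafer, and each one is more likely to catch a defect.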
Basically, the cost metric we need here (disregarding most of what we talked about in the earlier paragraphs) is how many 476 cores at half the clock of Power7 (which is roughly how the two designs fare on the same process) it takes to match the latter's single-core performance. While the 476 has the cumulative power-draw advantage, I don't expect it to have one in silicon area. Just like you, though, I don't have hard data on that.
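That core-count question can be framed with Amdahl's law, which is a model substituted in here rather than anything from the thread. The 0.5 clock ratio comes from the post; the per-clock performance ratio (`ipc_ratio`) and the parallel fractions are hypothetical inputs. The model ignores synchronization and memory-contention costs, so it flatters the many-core option.

```python
from typing import Optional

# Amdahl's-law sketch. ASSUMPTIONS: identical small cores, no synchronization
# or memory-contention overhead, and a hypothetical per-clock performance
# ratio (ipc_ratio) between a small core and the fast reference core.


def relative_time(n_cores: int, parallel_fraction: float, core_speed: float) -> float:
    """Run time on n small cores; one fast reference core takes 1.0.

    core_speed is a small core's single-thread speed relative to the fast core.
    """
    serial = (1.0 - parallel_fraction) / core_speed
    parallel = parallel_fraction / (n_cores * core_speed)
    return serial + parallel


def cores_to_match(parallel_fraction: float, clock_ratio: float = 0.5,
                   ipc_ratio: float = 1.0, max_cores: int = 64) -> Optional[int]:
    """Smallest number of small cores at least as fast as one fast core.

    Returns None if no count up to max_cores gets there (e.g. because the
    serial part alone already runs slower than on the fast core).
    """
    core_speed = clock_ratio * ipc_ratio
    for n in range(1, max_cores + 1):
        if relative_time(n, parallel_fraction, core_speed) <= 1.0:
            return n
    return None


if __name__ == "__main__":
    for p in (1.0, 0.95, 0.9, 0.75, 0.4):
        print(f"parallel fraction {p:.2f}: {cores_to_match(p)} half-clock cores")
```

With equal per-clock performance, a perfectly parallel workload needs two half-clock cores, 90% parallel code needs three, and 40% parallel code never catches up (`None`), because its serial part alone already takes longer than on the fast core.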