The Cortex A9 supports up to 4 MB of L2 cache. The configuration in Tegra 3, which was discussed here, has 1 MB.
Yes, but I said ON-DIE L2 cache. What the Cortex A9 supports is connecting a second chip of cache through the dedicated bus and memory controller it has for that purpose.
Having the L2 cache on a different die means a "lot" more cycles to access it (the electrical signal has to travel from the main die to the daughter one), and even if the difference is a mere 10 cycles per access, at the end of the day it will hurt performance enough for the A9 to be trounced by the Espresso in nearly every sense of the word.
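To put a rough number on that penalty, here's a quick sketch. The 10 extra cycles is the figure I used above; the L1 miss counts are made-up values just for illustration, not measurements of the A9 or any other chip, and the function name is my own.

```python
def extra_cycles_per_1000_accesses(l1_misses_per_1000, extra_l2_cycles):
    """Extra cycles paid per 1000 memory accesses because L2 is slower.

    Only accesses that miss L1 go on to L2, so only those pay the penalty.
    """
    return l1_misses_per_1000 * extra_l2_cycles


EXTRA_L2_CYCLES = 10  # the "mere 10 cycles" from the post

# Hypothetical L1 miss counts per 1000 accesses (assumed, not measured).
for misses in (20, 50, 100):
    cost = extra_cycles_per_1000_accesses(misses, EXTRA_L2_CYCLES)
    print(f"{misses} L1 misses per 1000 accesses -> {cost} extra cycles")
```

The total cost scales directly with how often the code misses L1, so cache-unfriendly workloads are the ones that would feel it most.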
It's no surprise that nearly half of the die area in the Espresso is dedicated to caches. That and its ultra-short pipeline give it a decisive edge over the A9, no matter how you look at it.
Maybe an A9 with 4 MB of L2 cache AND a SIMD-centric workload could perform better, but not having on-die L2 cache is a HUGE drawback for sure.
Of course, on a mobile phone you won't have that extra chip with cache, and the one in Tegra 3 is still only 1 MB shared between 4 cores. That works out to 256 KB of cache per core, which is half the amount of L2 cache the "tiny" cores have, and 8 times less cache than the Espresso's main core.
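For anyone who wants to check the numbers, here's the arithmetic spelled out. The sizes are the ones quoted in this post (1 MB shared on Tegra 3, 512 KB for the small Espresso cores, 2 MB for the main one). Splitting the shared cache evenly is a simplification on my part, since a shared L2 isn't statically divided between cores.

```python
KB = 1024
MB = 1024 * KB

# Tegra 3: 1 MB of L2 shared by 4 cores (even split is an assumption).
tegra3_l2_total = 1 * MB
tegra3_cores = 4
tegra3_l2_per_core = tegra3_l2_total // tegra3_cores

# Espresso figures as quoted in the post.
espresso_small_core_l2 = 512 * KB
espresso_main_core_l2 = 2 * MB

print(tegra3_l2_per_core // KB)                      # 256 (KB per core)
print(espresso_small_core_l2 // tegra3_l2_per_core)  # 2 (small core has 2x)
print(espresso_main_core_l2 // tegra3_l2_per_core)   # 8 (main core has 8x)
```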
But what's most important is that a given design is made to achieve certain results. In the case of an ultra-customized design like the one in the WiiU, these huge caches may have an impact that goes beyond just increasing CPU performance.
I mean, maybe the Espresso's performance doesn't benefit as much from having one core with 2 MB of L2 cache as it would from having 4 cores with 512 KB each instead.
But since what matters is the overall performance of the system, it's possible that those 2 MB of L2 cache serve not only to increase the performance of that particular core but also to hugely reduce accesses to main memory, thus increasing main RAM performance (RAM performance drops further from its theoretical peak the more seeks you do in it).
All in all, it's pretty obvious to me that a vanilla A9 without an L2 memory chip won't come even close to Espresso's performance, and even if it has one, that wouldn't make much of a difference because of the huge disadvantage of having your cache on a separate die (unless you have the huge 8 MB one, in which case it could compensate a bit).
But what's most important is that in the WiiU design, the CPU that makes the most sense is Espresso. Even clearly superior CPUs wouldn't be nearly as efficient in that design as the Espresso is, and that's all that matters here, because you won't find an Espresso CPU in a laptop. They're only found in the WiiU.