That prediction is a fad. VLIW underutilization has been an issue for AMD ever since the architecture was introduced with the HD 2x00 series. A good example: the Radeon HD 6870 has 1120 stream processors, but in reality those are 224 stream processors, each capable of executing up to 5 instructions per clock, but only as long as those instructions come from the same thread and don't depend on each other, and that's the catch.
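To make the catch concrete, here is a toy Python sketch of how one thread's instructions might get packed into 5-wide bundles. It is not AMD's actual shader compiler: `pack_bundles`, `utilization` and the two instruction lists are made up for illustration, and the only rule modeled is that an instruction can't share a bundle with something it depends on.

```python
def pack_bundles(deps, width=5):
    """Greedily pack instructions (in program order) into VLIW bundles.

    deps[i] is the set of earlier instruction indices whose results
    instruction i needs. Hypothetical toy model, not a real compiler pass.
    """
    bundles = []     # bundles[b] = instruction indices issued together
    bundle_of = {}   # instruction index -> bundle it landed in
    for i, needed in enumerate(deps):
        # Must issue strictly after every instruction it depends on.
        b = max((bundle_of[j] + 1 for j in needed), default=0)
        # Skip bundles that are already full.
        while b < len(bundles) and len(bundles[b]) >= width:
            b += 1
        if b == len(bundles):
            bundles.append([])
        bundles[b].append(i)
        bundle_of[i] = b
    return bundles


def utilization(bundles, width=5):
    """Fraction of issue slots that hold a real instruction."""
    return sum(len(b) for b in bundles) / (len(bundles) * width)


# Five independent instructions: all fit in a single bundle.
independent = [set() for _ in range(5)]
# Five instructions where each one needs the previous result.
chain = [set(), {0}, {1}, {2}, {3}]

print(utilization(pack_bundles(independent)))  # 1.0
print(utilization(pack_bundles(chain)))        # 0.2
```

Same instruction count, same hardware, but the dependent chain leaves 4 of every 5 slots empty. Real shader code sits somewhere between those two extremes.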
Compare it to a scalar architecture like, for example, the GTX 460, which has 336 stream processors, each capable of executing one instruction per clock regardless of whether it comes from the same thread or not. That makes nVidia's approach quite predictable in terms of performance, and it's based on Thread Level Parallelism (TLP). AMD's VLIW5 approach leans toward Instruction Level Parallelism (ILP), which means it requires compiler tricks to keep its execution resources busy. So it sounds quite outstanding that the GTX 460 is almost able to keep up with the HD 6870 and its sheer 1120 stream processors, but on the other hand you could also say it's a feat that the HD 6870, with its 224 stream processors, is able to slightly outperform the GTX 460 with its 336. Under maximum utilization, AMD's VLIW approach is able to smoke anything nVidia currently offers, but that only happens in rare, ideal circumstances, as I will explain below.
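A back-of-the-envelope way to look at that comparison, as a Python sketch. It takes 336 cores for the GTX 460 (its published CUDA core count) and 224 five-wide units for the HD 6870, and it only counts instructions per clock. Clock speeds, memory bandwidth, texture units and everything else are deliberately ignored, so treat it as an illustration of the argument and not as a benchmark. The function and constant names are my own.

```python
VLIW_UNITS = 224     # HD 6870: 1120 stream processors / 5 slots each
VLIW_WIDTH = 5
SCALAR_CORES = 336   # GTX 460, one instruction per core per clock


def vliw_ops_per_clock(avg_slots_filled):
    """Instructions per clock for the VLIW chip, given how many of the
    5 slots the compiler manages to fill on average (1.0 to 5.0)."""
    return VLIW_UNITS * avg_slots_filled


# Average slots the compiler must fill just to match the scalar chip
# per clock (clock speed differences not considered).
breakeven_slots = SCALAR_CORES / VLIW_UNITS

print(vliw_ops_per_clock(VLIW_WIDTH))  # best case, every slot filled
print(vliw_ops_per_clock(1.0))         # worst case, one slot per bundle
print(breakeven_slots)
```

Per clock, the VLIW chip ranges from 224 to 1120 instructions depending entirely on the compiler, while the scalar chip sits at 336 no matter what. That spread is the "predictable vs. needs compiler tricks" trade-off in numbers.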
nVidia's approach is usually better because it requires little software optimization, but it also means the chip die will be much larger, as their shader processors are much fatter and consume far more power. AMD's approach is to pack in many much smaller stream processors to increase parallelism, which is easy in terms of hardware implementation but requires good software to work well. Graphics rendering is so parallel, though, that it explains why AMD has been trading blows for a while with its much smaller chips, especially since 2008.
But even so, AMD knew that their VLIW5 performance wouldn't scale linearly forever as stream processors were added. So they did two little experiments. The first is the HD 6870, which has lots of hardware-level tweaks that allow its Dual Command Queue processor to take more control of the shader resources compared to the HD 5000 series. That explains why the HD 6870 performs so close to the HD 5870 while having 30% fewer stream processors (1120 vs. 1600) and a smaller die, which is a feat.
Their second experiment is the Cayman GPU, which moved the design from VLIW5 to VLIW4, a move that clearly shows they're heading toward a more TLP-oriented design than previous generations, which were more ILP-oriented. According to AMD, they only saw an average of about 3.4 out of 5 slots in use, or 60%-75% utilization, on their VLIW5 design in the best case scenario, something that only happened 60% of the time, showing that a lot of hardware was sitting idle on the die. So instead of having 4 little stream processors plus one fat processor for special tasks, which certainly used a lot of die space and idled much of the time, it made more sense to remove that fat processor and boost the computing capability of the remaining 4, making them equal in everything.
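Here is a rough Python sketch of why dropping the fifth slot helps utilization. The `sample` list is invented, not measured data: I picked it so it averages 3.4 independent ops per bundle, which lands inside the 60%-75% range. The model is also naive. A group too wide for the unit is simply split across extra bundles, nothing is repacked, and the extra slots VLIW4 spends on the special-function work the fat processor used to do are ignored.

```python
import math


def slot_utilization(ops_per_bundle, width):
    """Fraction of issue slots doing useful work.

    ops_per_bundle lists how many independent ops the compiler found
    for each bundle. Groups wider than `width` are split over several
    bundles (naive assumption, no repacking of leftovers).
    """
    ops = sum(ops_per_bundle)
    bundles = sum(math.ceil(n / width) for n in ops_per_bundle)
    return ops / (bundles * width)


sample = [5, 4, 3, 3, 2]  # illustrative only: 17 ops, average 3.4

print(slot_utilization(sample, 5))  # VLIW5
print(slot_utilization(sample, 4))  # VLIW4
```

With this sample, VLIW5 uses 17 of 25 slots (68%) and VLIW4 uses 17 of 24 (about 71%), while needing one slot fewer per unit. The only bundle that gets slower is the rare fully packed one, which is the same trade AMD made.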
So in the circumstances where AMD can keep all the execution resources of a VLIW5 design busy (something very rare), it should be faster than VLIW4, as happened in the Civilization 5 compute tests. But if you couple the VLIW4 design with the Barts tweaks, you can achieve much better utilization of the VLIW4 resources than AMD was able to achieve on the HD 5000 series. That explains why the HD 6970, while it only has 1536 stream processors (across 24 SIMD engines), is faster than the HD 5870 with its 1600 stream processors (20 SIMD engines). The issue here is that Cayman was supposed to be a 32nm product, so other performance-enhancing features weren't added, as they would have made the chip even bigger than it already is. So think of this: if Cayman used the VLIW5 design, it would have 1920 stream processors. That also explains why it isn't much faster than the HD 5870.
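The stream processor counts above all come from the same multiplication, sketched here in Python. The 16 VLIW units per SIMD engine is not stated in the post; it is inferred from the numbers themselves (1536 / 24 / 4 = 16 and 1600 / 20 / 5 = 16), so take it as an assumption.

```python
UNITS_PER_SIMD = 16  # assumption, inferred from the counts quoted above


def stream_processors(simd_engines, vliw_width):
    """Marketing 'stream processor' count for a given configuration."""
    return simd_engines * UNITS_PER_SIMD * vliw_width


print(stream_processors(24, 4))  # HD 6970 (Cayman, VLIW4)
print(stream_processors(20, 5))  # HD 5870 (VLIW5)
print(stream_processors(24, 5))  # hypothetical Cayman built as VLIW5
```

So Cayman has 20% more SIMD engines than the HD 5870 but slightly fewer stream processors on paper, which is why the headline count alone says very little across the two designs.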
So the secret sauce would be mixing the Barts optimizations (95% of the HD 5870's performance with 30% fewer stream processors), a higher processor count (from the HD 5870's 20 SIMDs to the HD 6970's 24), tessellation enhancements, more VRAM and some core clock bumps, which can give you 20%-60% higher performance depending on how much advantage the software can take of Cayman's approach. So I do think both GPU vendors, with their different approaches, will remain close for a while. nVidia's path is safer but also more expensive, but they have deeper pockets than AMD. I can't wait to see what nVidia/AMD will offer on the 28nm process!!