The problem with newer processes is not yields per se but the cost of wafers in general. Even if you had 100% working chips, the cost of each one would still be very high. That's the reason both IHVs are waiting even though both processes have been used for production of smaller chips since last year: the cost of wafers goes down over time, making the production of bigger chips economically sound. This isn't about yields as much as it is about wafer costs.
Wafer costs are certainly an issue, but yields are most definitely the major sticking point for sub-20nm FinFET processes. If it were just wafer costs rather than yields, we wouldn't be seeing mobile SoCs for some time, as they're a highly competitive, low-margin business. They're the first out of the gate on Samsung's 14nm and TSMC's 16nm, though, because they're small dies, and on a low-yield process that makes them a lot more viable than larger dies, even if they don't command anywhere near the price per mm². If yields weren't an issue on 14nm, then big server CPUs like IBM's POWER9 would be the very first chips off the production lines, as they effectively command the highest revenue per wafer of anything you'll come across once yields are out of the equation. With yields in the equation, though, they're not due for another year, because with poor yields dies that large just aren't an option, regardless of how much you can charge for them.
Intel is worth looking at as well. They're up against the same physical limits as Samsung, GlobalFoundries and TSMC, they're using largely the same 193nm lithography methods, and they seem to be having exactly the same yield issues. Intel has no problem selling small (sub-100mm², although I haven't found a confirmed figure) 14nm FinFET dies in the form of Core i3 processors at a $117 tray price, but they're unable to put out 8-core Xeons, which may be three to four times the size but sell for ten times the price. This wouldn't be the case if wafer costs, rather than yields, were their primary problem.
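To see why die size dominates when defects are the problem, here's a rough sketch using the classic Poisson yield model. The defect density and die areas below are purely illustrative assumptions, not published figures for any of these processes:

```python
import math

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Classic Poisson yield model: Y = exp(-A * D)."""
    area_cm2 = die_area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * defects_per_cm2)

# Illustrative defect density for an immature node (pure assumption).
D = 0.5  # defects per cm^2

for name, area_mm2 in [("mobile SoC", 100), ("mid-size GPU", 300),
                       ("large GPU", 450), ("big server CPU", 650)]:
    y = poisson_yield(area_mm2, D)
    print(f"{name:>15} ({area_mm2:3d} mm^2): yield ~ {y:.0%}")
```

With those made-up numbers the 100mm² SoC yields around 60% of dies while the 650mm² server die yields under 5%. At perfect yields the big die at ten times the price wins on revenue per wafer, but at these yields it loses by an order of magnitude.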
Even on the A9X there are hints of this. Looking at the benchmarks of the new 10" iPad Pro today, GPU performance has dropped 35% compared to the 13" model, which is a lot more than would be expected even with a reduced clock speed to accommodate the smaller battery. It seems possible (although it's very difficult to confirm) that Apple are actually binning their A9X dies by disabling GPU cores to increase yields. This would be pretty unusual for a sub-150mm² die, but points to particularly low yields on TSMC's 16nm process.
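As a back-of-the-envelope illustration of why that kind of binning helps, here's the same Poisson model with a salvage bin. The per-cluster area and defect density are pure guesses; only the ~147mm² die size and 12-cluster GPU are reported figures for the A9X:

```python
from math import comb, exp

D = 0.5 / 100.0        # defects per mm^2 (0.5/cm^2, assumed as above)
n_cores = 12           # A9X has a 12-cluster GPU
core_mm2 = 5.0         # area per GPU cluster (pure guess)
rest_mm2 = 147.0 - n_cores * core_mm2  # A9X measures ~147 mm^2

p_core = exp(-D * core_mm2)  # probability one GPU cluster is defect-free
p_rest = exp(-D * rest_mm2)  # probability the rest of the die is defect-free

perfect = p_rest * p_core ** n_cores
one_bad = p_rest * comb(n_cores, 1) * (1 - p_core) * p_core ** (n_cores - 1)

print(f"fully working dies:           {perfect:.0%}")
print(f"sellable with 1 cluster off:  {perfect + one_bad:.0%}")
```

With these assumptions, fusing off a single bad cluster lifts sellable dies from roughly 48% to 63%, which is exactly the kind of gain that would tempt a company to bin even a sub-150mm² die on a struggling process.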
Both G92 and RV790 (and GT200B and some other GPUs) were pure "die shrinks", though, as 55nm was just that relative to 65nm: you could take the existing 65nm design and produce it on 55nm, enjoying the size and power benefits straight away. It isn't going to work that way with 28->14/16; they'll have to redesign the chips to produce them on these new lines.
Well, that's not really true. Even on a single-node jump within the same fab, a straight die shrink is not quite trivial.
Of course this is rather beside the point anyway, as Polaris, Vega and Pascal are updated architectures designed for FinFET nodes in the first place. My point was rather that if we're trying to estimate the die size of a chip on 14nm there are two factors:
(a) The shrink down to 14nm from 28nm, which will obviously decrease the size
and
(b) Architectural changes, which will typically increase the size of the chip
For (b) we have pretty much zero evidence, but for (a) we do have evidence in the form of similar ICs (i.e. other GPUs) which are direct (or near-direct) die shrinks, as this eliminates the architectural variable. Obviously there's a wide margin of error to be applied from one die shrink to the next, but it's the only hard evidence we have of what kind of scaling we might expect from 14nm.
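As a rough sketch of what factor (a) alone might look like: the foundries' 14/16nm FinFET nodes are commonly cited as offering around 2x the density of 28nm (the metal layers are 20nm-class, so it's nowhere near the naive (28/14)² = 4x the names suggest). The 2x figure is an assumption here, and real designs rarely shrink by the full logic-density gain:

```python
# Known 28nm GCN die sizes (mm^2) and a hypothetical straight shrink,
# assuming ~2x density from 28nm -> 14/16nm FinFET.
known_28nm_dies = {"Pitcairn": 212, "Tonga": 359, "Hawaii": 438}
density_gain = 2.0  # assumed, and optimistic for SRAM/analog/IO

for name, area in known_28nm_dies.items():
    print(f"{name}: {area} mm^2 @ 28nm -> ~{area / density_gain:.0f} mm^2")
```

That puts a straight Hawaii-class shrink somewhere above 219mm², before factor (b) adds anything back on top and before accounting for the parts of the die (IO, analog) that barely shrink at all.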
The issue here is that I don't see much benefit in them keeping the same unit counts for P10 and V10, as the chips will have to be heavily redesigned anyway for both the new production process and the updated GCN4 architecture. Why would they keep the same number of CUs/SPs while doing all that? It just seems like a missed opportunity, as even a modest increase in the number of SIMDs would grant them performance wins across the board, in addition to whatever the optimizations bring (which may well be load-specific and not show up in all games on the market).
I should have been clearer on that; I didn't mean they would use another 44CU part, I was just using it as an example. Judging by the videocardz.com report a 40CU part seems the most likely for Polaris 10, but obviously it could be anywhere around that. (AMD have thus far always used multiples of 4 CUs on all but their entry-level GCN dies, though, so I'd tend to assume they'll do the same again. This is due to the way CUs share cache, although that could in theory change with Polaris/Vega.)
I think it's quite relevant, as it shows the sweet spot for the last several generations. Unless 14nm is completely different from the several previous generations for some reason, I don't see why it should result in different chip sizes at first. So expecting ~300mm² is actually based on how it has gone historically, while expecting ~200mm² is something rather new that hasn't really happened often before (RV670 and G92 are the only two dies on a new process line which were smaller than 200mm², I believe).
The evidence suggests that 14nm/16nm are different, though. For Intel, it's obviously a slow-maturing node, and is their first node to be stretched over 3 years rather than 2. For Samsung/GF/TSMC we're seeing small-die mobile SoCs long before CPUs, GPUs or server chips, which we've never seen on any previous node.
That's the issue right there. We know that there will be two Polaris GPUs and two Vega GPUs. If both Vega GPUs use HBM2, then both of them should be faster than both Polaris GPUs - otherwise it makes no sense. So either both Vegas sit above P10 - which is possible if P10 is only at Hawaii's level of performance - or one Vega won't actually use HBM2 and won't really provide any perf/watt increase over Polaris.
The second option seems more likely to me and that's why I'm saying that we shouldn't read too much into this slide as it is pure marketing.
Why leave one of the Vega GPUs until 2017 if it's less powerful than Polaris 10? And, for that matter, why would they call two of them Polaris and two of them Vega if the two Polaris aren't related and the two Vega aren't related? Logically there's something the two Polaris chips have in common and something the two Vega chips have in common. The most obvious would be use of GDDR5(X) on Polaris and HBM2 on Vega, but it could be that Polaris are being made on GF's 14nm process and Vega are being made on TSMC's 16nm process.
I'd imagine the most likely scenario is:
Polaris 11: Desktop 470/470X, mid-range and ultra-thin laptop, GDDR5, somewhere between Tonga and Hawaii in performance
Polaris 10: Desktop 480/480X, high-end laptop, GDDR5(X), somewhere between Hawaii and Fiji in performance
Vega 11: Desktop 490/490X, HBM2, somewhere above Fiji in performance
Vega 10: Desktop Fury, HBM2, somewhere above Vega 11 in performance
This obviously assumes that Vega 10 doesn't arrive until quite a bit after Vega 11, as a counter to GP200 based cards.
Of course, even if we know the number of CUs, that still leaves quite a bit of variance in potential performance. Despite the increase in raw computational power, 980Ti and Fury X are only about 20% faster than 390X at 1440p. A 40CU part could match them if AMD manages to squeeze an extra 30% performance out of Polaris over GCN1.2 from architectural improvements and clock increases, which isn't completely impossible, although I'd keep my expectations in check.
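For what it's worth, the arithmetic behind that 30% figure checks out. A quick sketch (the 20% lead is from the benchmarks above, the 40CU count is the rumoured one):

```python
r390x_cus = 44          # known 390X CU count
polaris_cus = 40        # rumoured Polaris 10 CU count
target = 1.20           # 980Ti/Fury X lead over 390X at 1440p

# Per-CU uplift needed from clocks + architecture to close the gap
# with fewer CUs:
required = target * r390x_cus / polaris_cus
print(f"required per-CU gain: {required:.2f}x (~{required - 1:.0%})")
# -> 1.32x, i.e. roughly the "extra 30%" mentioned above
```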