A GPU of the same size as GK104 or Tahiti made on 14/16nm should be able to beat both GM200 and Fiji from a purely theoretical point of view. Yields have nothing to do with this, as a) we don't know them, and b) it's retail pricing that decides whether yields are fine or not, and we don't know that either.
You're correct that we don't know yields, but we can infer a lot from what we do know, which is the timescale of confirmed product launches on 14nm/16nm, plus likely pricing information. The point of my post is that all the publicly confirmed information we have points to very slow yield growth and a process that will remain expensive, even for relatively small dies, for the rest of the year.
Mobile chips are using a different version of the 14nm process.
No, the A9 and Samsung's Exynos 7420 (used in the S6) are made on the older 14LPE process. The Exynos 8890 and the Snapdragon 820 are made on the newer 14LPP process, and are being used in currently shipping phones like the S7. The Snapdragon 820 in particular is the first 14LPP chip mass-produced by Global Foundries, as far as I'm aware.
We don't know enough about Zen yet to make any kind of prognosis here. But if AMD is planning to sell them for $1000 they're delusional.
Of course I'd expect AMD to undercut Intel on price with Zen. My point, though, was that they could undercut Intel even if the cost of an 8-core die pushed the price all the way up to $800 or so. The cost of a quad-core chip will have to be under $300 to compete with consumer-level i7s, but the quad-core die will be truly tiny (likely sub-100mm²).
POWER CPUs are for supercomputers only these days, and their cost there is largely a non-issue; i.e. they're a bad reference point for consumer-level GPU chips. You can use POWER9 chips as a reference for a possible GP100 introduction to the same HPC market this year, however.
That last part is exactly my point. IBM can afford to push out large-die POWER chips pretty much straight out of the gate on new processes, because they're heavily binned and command large margins. POWER8 was the first chip on their/GF's 22nm node, POWER7+ was the first on their 32nm node, POWER7 was the first on their 45nm node, and so on.
Yet here we are with the process they'll be using for POWER9 (Global Foundries' 14nm) already starting to push out chips, while IBM's new chip is a year or more away. This isn't a design delay, either, as the three-year wait between POWER8 and POWER9 will actually be the longest ever between generations of IBM's server chips.
So that's a data point on the expected yields of the 14nm node: IBM don't expect a heavily binned, large 14nm die (potentially around 600mm²) to be profitable at several thousand dollars apiece until 2017. This may not seem relevant to smaller 300-400mm² dies, but it again paints a picture of a very slowly maturing node.
You're assuming that Polaris will be just a shrink of GCN3. It won't be, not from an architectural point of view, and not from a production point of view either, as GloFo's 14LPE is not a shrink process relative to TSMC's 28HP. There are just too many variables to guess at right now, so this is mostly pointless.
You're right that it's not a simple shrink, and different kinds of chips will scale differently. We do, however, have a couple of cases where AMD and Nvidia have either produced a direct die shrink of a GPU or produced an almost identical die on a smaller process, which we can use as a baseline for our expectations.
The first is Nvidia's G92, which was first produced on a 65nm process and later shrunk to a 55nm process. The die was 324mm² on 65nm and 260mm² on 55nm, so that's a 19.8% reduction in size over a single-node jump (28nm to 14nm is of course a two-node jump). The second is AMD's RV790, a 282mm² die on 55nm. It was replaced by Juniper, which wasn't a direct die shrink, but shared essentially the same VLIW5 shader architecture and ALU configuration (although it did have a narrower 128-bit GDDR5 interface, which would have saved some die space). Juniper was 170mm² on a 40nm process, so that was a 39.7% shrink (probably closer to 35% accounting for the smaller memory interface).
So, for a single-node jump, assuming no architectural changes, we have a die shrink range from about 20% to about 35%. Compounding that for a two-node jump puts our expectation in the range from about 36% to about 58%, or let's say 47% (±11).
Still assuming no architectural changes for the moment, this would place a straight shrink of Hawaii (438mm² on 28nm) at around 232mm² (±48mm²).
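To make the arithmetic explicit, here's a quick back-of-the-envelope sketch of that extrapolation (the die sizes are the public figures quoted above; squaring the per-node scaling for a two-node jump is my assumption, not a rule):

```python
# Back-of-the-envelope die shrink extrapolation (areas in mm^2).
g92_65nm, g92_55nm = 324.0, 260.0        # G92 direct shrink, 65nm -> 55nm
hawaii_28nm = 438.0                      # Hawaii die size on 28nm

low = 1 - g92_55nm / g92_65nm            # ~19.8% per node (G92)
high = 0.35                              # Juniper's ~39.7%, adjusted down for
                                         # its narrower memory interface

# Assume a two-node jump simply compounds the per-node scaling:
two_node_low = 1 - (1 - low) ** 2        # ~36%
two_node_high = 1 - (1 - high) ** 2      # ~58%
mid = (two_node_low + two_node_high) / 2 # ~47%

print(f"straight Hawaii shrink: ~{hawaii_28nm * (1 - mid):.0f} mm^2 "
      f"(range {hawaii_28nm * (1 - two_node_high):.0f}-"
      f"{hawaii_28nm * (1 - two_node_low):.0f} mm^2)")
# -> ~234 mm^2, in a ~185-282 mm^2 band
```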
Once you account for architectural improvements, though, the die size is more likely to go up than down. Attempting to estimate those changes would be pure speculation, so I'll leave it at this:
Based on the evidence available to us, a 44CU Polaris die would likely exceed 200mm², perhaps by a large margin. It would certainly outperform Hawaii, although to what extent is impossible to predict.
All I can be sure of is that porting Hawaii to 14LPE should give AMD a die of ~200mm² - which seems a bit on the low side from my point of view, as all AMD GPU dies on a new production process for the last several years have been bigger than that. They usually start a new process with a die of approximately 300mm², which would be analogous to a chip of ~640mm² on the 28HP process - pretty close to how large Fiji is.
Hence my possible disappointment with P10 if it ends up being a Hawaii "port" and falls somewhere between Hawaii and Fiji in performance. I pretty much expect P10 to be a "port" of Fiji with higher clocks and an optimized architecture driving it above Fiji in performance, not below it.
What AMD usually starts with on a new process is irrelevant, as this isn't 28nm, or 40nm, or any older process. If it were, we'd already have a full range of Polaris/Vega and Pascal GPUs on store shelves.
It would be a disaster if AMD waits until 2017 to introduce something faster than Fury X. In any case, I think we shouldn't read too much into a marketing slide measuring some unknown perf/watt.
That "marketing slide" is pretty much the only piece of confirmed information we have (i.e. not rumour) on the timescale for the rollout of AMD's new GPUs. And that slide appears to show Polaris releasing in late 2016 and Vega in early/mid 2017.
(Regarding the perf/W, I would imagine the higher figure for Vega is largely due to its use of HBM2 compared to more power-hungry GDDR5(X) for Polaris).
Well, we can't really talk about yields because we don't know anything about them. Yields are highly product-related anyway, so it's pointless to talk about general "process yields" as there is no such thing.
There very much is such a thing as process yields. Different chips will have slightly different yields depending on their design (for example 1mm² of SRAM will be less likely to develop a fault than 1mm² of custom ALU logic), but general improvements in the manufacturing process over the life of a node help all dies in fairly equal measure.
The main difference in yield between products comes from binning strategies, which are employed on all but the smallest of dies. Speed binning increases yields by a relatively small amount (it's more generally used to squeeze higher profits out of the best dies), whereas binning with disabled functional blocks (i.e. CPU cores or GPU ALUs) increases yields by a larger amount. Block-level binning will of course be used on Polaris, Vega and Pascal, but it has no greater benefit than it did on their 28nm products, or on AMD's Zen processors and APUs, or on Intel's forthcoming 14nm Xeons, or on IBM's POWER9. If slow yield growth is preventing those kinds of chips from releasing in a timely fashion, then we can expect the same slow yield growth to affect AMD and Nvidia's plans.
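To illustrate what "process yields" means in practice, here's a sketch using the standard negative-binomial yield model; the defect densities and clustering factor below are invented illustrative numbers, not figures for any real node:

```python
# Negative-binomial die yield: Y = (1 + A * D0 / alpha) ** -alpha
#   A     = die area (cm^2)
#   D0    = process defect density (defects/cm^2), falls as a node matures
#   alpha = defect clustering factor (commonly taken to be around 1-3)

def die_yield(area_mm2: float, d0: float, alpha: float = 2.0) -> float:
    return (1 + (area_mm2 / 100.0) * d0 / alpha) ** -alpha

# Hypothetical defect densities for an immature vs. a maturing node:
for d0 in (0.5, 0.2, 0.05):
    print(f"D0={d0:>4}: 200mm^2 -> {die_yield(200, d0):>4.0%}, "
          f"600mm^2 -> {die_yield(600, d0):>4.0%}")
# A falling D0 lifts yields for every die on the node at once - which is
# what makes "process yield" a meaningful notion, even though bigger dies
# feel defect density more strongly.
```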
While you are definitely correct that wafer prices for newer processes keep rising, essentially pushing bigger chips further into the future before they become feasible to produce, what you're missing is that 200mm² isn't a typical starting point on a new process. It may be that both AMD and NV have decided to use the first generation of 14/16nm chips for mobile solutions only, but I personally find this rather unlikely as their key market is dGPUs. Thus the wait for the ability to produce a GPU better than the top dog of the previous generation has most likely already happened - and it's probably the reason both of them passed on the 20nm process for GPUs.
Their key market is desktop GPUs, but laptop GPUs command significantly higher margins and are highly dependent on perf/W. If AMD's claim of a 2.5x improvement in power efficiency over GCN1.2 is true, then a mobile Polaris 11 chip matching their current Tonga-based M395X could be squeezed into a 50W envelope, not much more than the M370X used in the current MacBook Pro. And of course they could put out Polaris 11 based mobile GPUs that substantially outperform any current laptop chips at around the 100W envelope.
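For what it's worth, the arithmetic behind those figures, taking the 2.5x claim at face value; the M395X power figure below is my rough assumption, not an official number:

```python
# Iso-performance power scaling from a claimed perf/W multiplier.
m395x_power = 125.0   # Tonga-based M395X board power, rough assumption (watts)
claimed_gain = 2.5    # AMD's claimed perf/W improvement over GCN1.2

print(f"M395X-level performance at ~{m395x_power / claimed_gain:.0f} W")        # ~50 W
print(f"~{100.0 / m395x_power * claimed_gain:.1f}x M395X at a 100 W envelope")  # ~2x
```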
Now let's say Nvidia waits until a 980Ti-beater is feasible before releasing any Pascal chips. Why would AMD wait as well, if they could release a 400M line which could gain them significant market share in a profitable market before Nvidia's FinFET GPUs arrive? Ditto for Nvidia, although I suppose they have less incentive to move early as they already have a larger share of the laptop market.
This means that we should expect chips bigger than 200mm² from the start, as only such chips will be able to beat Fiji/GM200 - which is crucial for them, as it allows them to price the resulting products accordingly. When you talk about high end and mid range you seem to be thinking solely in terms of chip sizes, while these things are not even related - the market segment of a product is decided by the chip's performance, not its size.
Thus I'd argue that building a 300mm² chip and using it for a card which will retail for $500-700 while performing better than current $500-700 solutions is actually a better way to handle the economics of the new production process than a ~200mm² chip which won't beat the current high-end card and will therefore have to be sold in the $300-500 bracket.
It's important to note that die costs aren't linear with respect to size; in fact, on an immature node they're closer to exponential. Not only is it entirely possible for a 200mm² die to be financially viable for a $350 product well before a 300mm² die is viable for a $500+ product, but if those 200mm² dies are sold as both desktop and mobile GPUs, the average selling price per chip increases substantially (i.e. even if Polaris 10 isn't actually viable for a $350 desktop card, it may be worth entering mass production on it if a large proportion of the dies sell as laptop GPUs for twice as much).
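Here's a sketch of why cost per good die grows super-linearly with area on an immature node; the wafer cost and defect densities are invented placeholders, and the yield model is the same negative-binomial one as above:

```python
import math

def die_yield(area_mm2: float, d0: float, alpha: float = 2.0) -> float:
    # Negative-binomial yield model, as in the earlier sketch.
    return (1 + (area_mm2 / 100.0) * d0 / alpha) ** -alpha

def dies_per_wafer(area_mm2: float, wafer_mm: float = 300.0) -> int:
    # Standard gross-die estimate with an edge-loss correction term.
    return int(math.pi * (wafer_mm / 2) ** 2 / area_mm2
               - math.pi * wafer_mm / math.sqrt(2 * area_mm2))

wafer_cost = 8000.0  # hypothetical 14/16nm wafer cost, USD
for d0 in (0.5, 0.1):                # immature vs. maturing node
    for area in (200.0, 300.0):
        good_dies = dies_per_wafer(area) * die_yield(area, d0)
        print(f"D0={d0}, {area:.0f}mm^2: ~${wafer_cost / good_dies:.0f} per good die")
# At D0=0.5 the 300mm^2 die costs over twice as much per good unit as the
# 200mm^2 die; as D0 falls the gap shrinks back towards the 1.5x area ratio.
```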
It's also rather important for either IHV not to undershoot in a segment when producing a new GPU on 14/16nm, because that would mean the competitor gets a year or so of complete market domination in that segment. And I'm pretty sure that NV will try to beat the 980Ti with GP104, as this is the only way I see for them to handle the process node transition without losing a lot of margin.
So yeah, I understand what you're saying but I expect a better future. P10 being slower than Fiji will be very disappointing. Same for GP104 being slower than GM200.
The only way I could see Nvidia entering the market with high-end cards substantially sooner than AMD do is if:
(a) TSMC's 16nm process yields make large dies affordable well before they are on GF's 14nm process
and
(b) AMD are dropping TSMC altogether irrespective of their 16nm yields.
The first isn't completely impossible (see A9X being fabbed by TSMC rather than Samsung, although as discussed previously it's still an expensive chip). The second, though, I would find very strange. AMD have been a long-time customer of TSMC, and would have kept themselves very well informed on the expected yield growth on their new process. If it was going to offer substantially better yields than 14nm then AMD would be using it, particularly for their larger dies (hence my speculation that Vega may be on TSMC's 16nm). Even if 14LPP offers better performance/Watt for their CPUs, APUs and small/mobile GPUs, AMD have shown they're willing to shop around between fabs, and if their high-end desktop GPUs were able to get better yields on TSMC, that's where they'd go.