"designed to perform culling of triangles before they hit the geometry processor. Effectively, what this means is that it runs through the triangles (also known as primitives) as they hit the GPU and tests them to see if they're actually going to be visible on the screen or not (with a variety of tests), then throws out the triangles that aren't going to be on screen (i.e. discards them or culls them).
Now, this is a good thing pretty much regardless of the type of game. Attempting to render triangles that aren't actually going to end up in the final image is a waste of GPU resources, so avoiding that work is a clear win. One of EA's Frostbite developers actually just gave a talk on this at GDC (PDF link, very technical), where they describe the software-based culling methods used in Frostbite. In their test scene, they could throw out over 75% of triangles before they hit the geometry engine, resulting in an almost 20% performance improvement. AMD have also recently released their own software-based culling solution, GeometryFX, which, like Frostbite's approach, runs as software on the GPU's shader units. Hence, they're obviously interested in the problem of triangle culling, so it wouldn't be surprising if they developed a hardware unit to perform it more efficiently."
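To give a rough idea of what these culling passes actually check, here's a minimal CPU-side sketch in C++ of the usual per-triangle rejection tests (frustum, backface/zero-area, and small-primitive culling). The structure and function names are my own simplification for illustration; Frostbite and GeometryFX implement this sort of filtering as compute shaders on the GPU, and the exact criteria they use will differ.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

// One vertex of a triangle in homogeneous clip space (after the vertex shader).
struct ClipVert { float x, y, z, w; };

using Tri = std::array<ClipVert, 3>;

// Frustum test: if all three vertices lie outside the same clip plane, the
// triangle cannot intersect the view volume and will never be rasterized.
static bool outsideFrustum(const Tri& v) {
    auto allOutside = [&](auto outsidePlane) {
        return outsidePlane(v[0]) && outsidePlane(v[1]) && outsidePlane(v[2]);
    };
    return allOutside([](const ClipVert& p) { return p.x < -p.w; }) ||
           allOutside([](const ClipVert& p) { return p.x >  p.w; }) ||
           allOutside([](const ClipVert& p) { return p.y < -p.w; }) ||
           allOutside([](const ClipVert& p) { return p.y >  p.w; }) ||
           allOutside([](const ClipVert& p) { return p.z < -p.w; }) ||
           allOutside([](const ClipVert& p) { return p.z >  p.w; });
}

// Backface / zero-area test: twice the signed area of the triangle in
// normalized device coordinates. A triangle facing away from the camera (or a
// degenerate one with no area) contributes nothing to the final image.
// A real implementation also has to handle vertices with w <= 0 (triangles
// crossing the camera plane), which is skipped here for brevity.
static bool backfacingOrDegenerate(const Tri& v) {
    float x0 = v[0].x / v[0].w, y0 = v[0].y / v[0].w;
    float x1 = v[1].x / v[1].w, y1 = v[1].y / v[1].w;
    float x2 = v[2].x / v[2].w, y2 = v[2].y / v[2].w;
    float area2 = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0);
    return area2 <= 0.0f;               // assumes counter-clockwise front faces
}

// Small-primitive test: if the triangle's screen-space bounding box is so
// narrow that it cannot cover any pixel centre, it will never produce a pixel.
static bool tooSmallToRasterize(const Tri& v, float screenW, float screenH) {
    float minX = 1e30f, minY = 1e30f, maxX = -1e30f, maxY = -1e30f;
    for (const ClipVert& p : v) {
        float px = (p.x / p.w * 0.5f + 0.5f) * screenW;  // NDC -> pixel coords
        float py = (p.y / p.w * 0.5f + 0.5f) * screenH;
        minX = std::min(minX, px); maxX = std::max(maxX, px);
        minY = std::min(minY, py); maxY = std::max(maxY, py);
    }
    // If the box rounds to the same pixel column or row, no sample is covered.
    return std::round(minX) == std::round(maxX) ||
           std::round(minY) == std::round(maxY);
}

// A triangle can be discarded if it fails any one of the cheap tests above.
static bool shouldDiscard(const Tri& tri, float screenW, float screenH) {
    return outsideFrustum(tri) ||
           backfacingOrDegenerate(tri) ||
           tooSmallToRasterize(tri, screenW, screenH);
}
```

In the software approaches, tests like these run over batches of triangles in a compute shader and the survivors are written to a compacted index buffer; a "primitive discard accelerator" would presumably perform equivalent checks in fixed-function hardware before the geometry engine ever sees the triangles.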
This was great to hear, but I asked whether it would be possible to get this feature set on 28nm GPUs, since that process is more established and cheaper than 14nm, and doesn't carry the same yield risk.
"Well, from a purely theoretical perspective, any "Polaris exclusive" feature could be adapted to a 28nm process. Work on Polaris had likely been going on for about a year or so when work on NX started, so it is in theory possible that they said to AMD "Hey, we like this primitive discard accelerator thing, can we have it on our planned 28nm chip?". The issue with this is that Nintendo would have had to fork over quite a lot of extra R&D dollars to get a functional block "back-ported" to 28nm, compared to components from existing GCN 1.2 chips, which were already ready to go for TSMC's 28nm process. The other issue is the assumption that the primitive discard accelerator is single functional block that can be just pulled wholesale out of Polaris. It could be an integral part of the geometry processor or command processor, or the manner in which it operates could depend on the newer geometry or command processors in Polaris. This would mean that you'd need to port the bulk of Polaris's improvements back to 28nm, or do a substantial amount of redesign work on the primitive discard accelerator to get it to work in a GCN 1.2-era chip (either of these would add substantially more R&D cost).
It's impossible to say how much it would cost them in the scheme of things, but it does seem like an unusual added expense over just taking existing GCN 1.2 tech, which would still be a generation ahead of the competition."
So while it's not confirmed, this quote makes it highly likely that Nintendo is using a 14nm Polaris GPU in the NX, given how expensive it would be to take the primitive discard accelerator (or any other part of the Polaris feature set) and put it on a 28nm chip.
If Nintendo does go with 14nm, it is very likely they will be taking a loss on each NX sold, but we don't know the details of AMD's dealings with Nintendo. One source told me "AMD may have been desperate enough for the business to offer Nintendo, say, the first year's supply at a fixed price, to reduce Nintendo's yield risk. They may also have specifically pushed Nintendo towards a 14nm Polaris-based chip, as it would allow Nintendo to also use a Polaris-based 14nm chip for the handheld. This would not only be a big business win for AMD in itself (Nintendo's handhelds typically sell a lot more than their home consoles), but it would also be a big PR win for them, by showing their ability to compete in the ultra-low TDP sector, a market in which they've gained pretty much no traction up until now."