How large of an L3 cache would you think necessary? If they went with the same cache configuration as Parker, L1 and L2 are both 2MB.
I honestly don't know. Part of the issue is that I can't actually find official confirmation of the GPU L2 cache size (which is the important one in this case) for TX1 or Parker. I seem to recall that TX1 had a 512KB GPU L2 cache, but that isn't something I can find any hard data on. My gut says that an L3 victim cache of somewhere between 2MB and 4MB would probably do the job (Apple uses a 4MB L3 victim cache for the same purpose in its SoCs), but it's not something you can really tell without testing, and short of a TX1 or Parker customised with a large L3 victim cache that can be fractionally disabled, there's no way for us to actually test it.
(For what it's worth, the CPU L1 and L2 caches on Parker aren't both 2MB. The L1s depend on the core type, with the Denver cores each getting a 128KB I-cache and 64KB D-cache, and the A57s getting 48KB I / 32KB D each. There's then a 2MB L2 shared between the Denver cores and a separate 2MB L2 shared among the A57s.)
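As a rough sanity check on that 2MB-4MB guess, here's the kind of back-of-envelope I'm working from (the resolutions and the RGBA8 colour format are just my own assumptions for illustration, not confirmed specs for any of these chips):

```c
#include <stdio.h>

int main(void)
{
    /* Back-of-envelope only: resolutions and the RGBA8 (4 bytes/pixel)
     * colour format are assumptions, not confirmed specs. */
    const struct { const char *name; int w, h; } targets[] = {
        { "720p colour target",  1280,  720 },
        { "1080p colour target", 1920, 1080 },
    };
    const double bytes_per_pixel = 4.0; /* RGBA8 */

    for (int i = 0; i < 2; i++) {
        double mb = targets[i].w * targets[i].h * bytes_per_pixel
                    / (1024.0 * 1024.0);
        printf("%s: %.1f MB\n", targets[i].name, mb);
    }
    return 0;
}
```

A 720p colour target comes out around 3.5MB and a 1080p one around 7.9MB, so a 2MB-4MB victim cache would cover a 720p target outright and a decent chunk of a 1080p one, which is roughly the working set you'd want it catching.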
Well, Nvidia already has an API named NVAPI, so the N could have just been added because it's Nintendo's version.
https://developer.nvidia.com/nvapi
From my brief reading of this, it seems to be more a PC-oriented API designed to be complementary to DX/OGL/etc. Things like thermal management, driver initialisation, display configuration and overclocking controls aren't entirely relevant in the console space.
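To give a flavour of it, this is roughly the minimal usage the NVAPI SDK is built around (just a sketch using the publicly documented entry points from NVIDIA's nvapi.h, with error handling trimmed; you'd build it against the NVAPI SDK on Windows):

```c
#include <stdio.h>
#include "nvapi.h"   /* from NVIDIA's NVAPI SDK; link against nvapi64.lib */

int main(void)
{
    /* Driver-level initialisation and GPU enumeration: the kind of plumbing
     * a PC application needs, but which a fixed console environment
     * wouldn't expose in this form. */
    if (NvAPI_Initialize() != NVAPI_OK)
        return 1;

    NvPhysicalGpuHandle gpus[NVAPI_MAX_PHYSICAL_GPUS];
    NvU32 count = 0;
    if (NvAPI_EnumPhysicalGPUs(gpus, &count) != NVAPI_OK || count == 0)
        return 1;

    NvAPI_ShortString name;
    NvAPI_GPU_GetFullName(gpus[0], name);
    printf("GPU 0: %s\n", name);

    /* From here the SDK's thermal, clock and display-configuration entry
     * points (e.g. NvAPI_GPU_GetThermalSettings) are the main attraction,
     * i.e. exactly the PC-centric functionality mentioned above. */

    NvAPI_Unload();
    return 0;
}
```

It's all about poking at the driver and the hardware from user space on a PC, rather than anything you'd build a console graphics API on top of.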
Hmm... Maybe the HBM stack is for Xavier? I was reading the Drive PX2 specs, and it has 8GB of 128-bit LPDDR4 for system memory and 4GB of GDDR5 for graphics memory.
HBM is probably overkill for Switch, so the other option is higher LPDDR4 bandwidth (so 128-bit, two chips) and more cache.
____
16nm for the GPUs of the Slim consoles makes sense, and likely for PS4 Pro.
Yes, Xavier would be the most likely suspect. It is worth noting that the article claimed the chip was "in production", and Xavier isn't, although it's possible this was misinterpreted or mistranscribed by the author. It would be an unusual option for Xavier, though, in that GDDR5(X) could give them the same bandwidth and substantially more capacity at a lower cost, and I wouldn't expect them to be so heavily power-constrained as to choose HBM2 on the basis of power savings.
The target render Nvidia showed of the Xavier board also had what clearly looked like RAM modules sitting next to the SoC (although of course one can read too much into early target renders like this).
Alternatively, it's technically possible that it's for a GPU, but given the capacity and bandwidth it would only be suitable for an entry-level card, and Nvidia would have to tape out a completely different die (probably something equivalent to a GP107) with an HBM interface just to be able to use it for a single, slightly more power-efficient laptop GPU. This would be an extremely unusual move for Nvidia, as they typically have a variety of SKUs per die across both laptop and desktop in order to take advantage of binning (the most power-efficient dies going to laptops) and to reduce design costs and inventory risk. They wouldn't be able to do this here (there's no point selling it for desktops if it's just a more expensive version of the 1050 Ti), and in any case it's unlikely that they'd be able to justify the increased cost of an HBM-powered card versus its GDDR5 equivalent, as the power savings would be relatively small (especially as you're comparing to binned GP107s in laptops).
The last option (unless there's any other super-secret SoC due soon) is the Nintendo Switch. We know it's in production, we know Nintendo likes esoteric memory, and we have it on good authority that Switch will have 4GB of RAM. As you say, though, it would be overkill for a device of that performance level (you'd potentially be looking at bandwidth equivalent to PS4 Pro, despite being at best 1/5th as powerful and using a considerably more bandwidth-efficient architecture), and it would almost certainly be very expensive for a device which we've been told is targeting a "surprisingly low" price point.
Effectively it has to be one of the above three. Xavier seems the least unlikely (although on the surface it wouldn't have been something I would have predicted), and I'd actually argue that a GPU with 4GB of HBM2 is more unlikely than Switch using it. For Switch, at least, it would appear to be the only feasible way of hitting 200GB/s+ of bandwidth in a portable form factor, if they were for some reason to decide that's what they want.
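To put some rough numbers on the bandwidth comparisons above (the bus widths and data rates here are my own assumptions for illustration: 1.6-2.0 Gbps/pin for a first-generation HBM2 stack, PS4 Pro's reported 256-bit GDDR5 at 6.8 Gbps, and LPDDR4-3200 on a doubled-up 128-bit bus; none of these are confirmed specs for the devices in question):

```c
#include <stdio.h>

/* Peak theoretical bandwidth: bus width (bits) x per-pin data rate (Gbps) / 8. */
static double peak_gbs(double bus_bits, double gbps_per_pin)
{
    return bus_bits * gbps_per_pin / 8.0; /* GB/s */
}

int main(void)
{
    printf("One HBM2 stack (1024-bit, 1.6-2.0 Gbps/pin): %.0f-%.0f GB/s\n",
           peak_gbs(1024, 1.6), peak_gbs(1024, 2.0));
    printf("PS4 Pro GDDR5 (256-bit, 6.8 Gbps/pin):       %.1f GB/s\n",
           peak_gbs(256, 6.8));
    printf("2x LPDDR4-3200 (128-bit total):              %.1f GB/s\n",
           peak_gbs(128, 3.2));
    return 0;
}
```

A single HBM2 stack lands at roughly 205-256 GB/s, right in PS4 Pro territory (~218 GB/s), while even the 128-bit LPDDR4 option mentioned earlier tops out around 51 GB/s, which is why HBM2 on Switch looks like such overkill unless 200GB/s+ really is the target.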