How close are we talking though? Will those cost savings be worth potentially handicapping the system?
It's pretty difficult to say precisely (there are an awful lot of variables), but just taking TSMC's public comments on the processes can give us a reasonable idea. On their 20nm process (they only have one, compared to I believe six 28nm processes), they claim:
"TSMC's 20nm process technology can provide 30 percent higher speed, 1.9 times the density, or 25 percent less power than its 28nm technology."
The problem is that they don't specify which of their 28nm processes they're comparing to. You'd assume they'd give themselves the most favourable comparison, which would be 28LP, but that would actually make 20nm less power efficient than 28HPC (25% less power than 28LP versus more than 30% less), let alone 28HPC+, as they say:
"Compared with TSMC's 28LP, 28HPC provides 10% smaller die size and more than 30% power reduction at all levels of speed"
If they were instead choosing the comparison least favourable to themselves and comparing to 28HPC (which became available at about the same time, afaik), then there still wouldn't be a whole lot in it, as 28HPC+ reportedly has 25% lower leakage than 28HPC. Although this is only leakage, not dynamic power, for a low-clocked mobile chip like Switch's, leakage can constitute quite a large proportion of power draw.
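To put rough numbers on those two baseline scenarios, here's a back-of-the-envelope sketch in Python. The 0.75 and 0.70 figures come straight from the TSMC quotes above. The leakage share is a made-up parameter (I don't know the real figure for Switch's chip), and I'm assuming that all the claims are quoted at equal speed and that 28HPC+ changes nothing except leakage.

```python
# Relative power figures taken from the TSMC claims quoted above.
P_20NM_VS_BASELINE = 0.75    # "25 percent less power than its 28nm technology"
P_28HPC_VS_28LP = 0.70       # "more than 30% power reduction", so 0.70 at most
LEAKAGE_CUT_HPC_PLUS = 0.25  # 28HPC+ reportedly has 25% lower leakage than 28HPC


def power_20nm_vs_28hpc_if_baseline_is_28lp():
    """20nm power relative to 28HPC, assuming the 20nm claim is against 28LP."""
    return P_20NM_VS_BASELINE / P_28HPC_VS_28LP


def power_28hpc_plus_vs_28hpc(leakage_share):
    """28HPC+ power relative to 28HPC.

    leakage_share is a HYPOTHETICAL fraction (0..1) of total power that is
    leakage; dynamic power is assumed to be unchanged between the two.
    """
    return 1.0 - LEAKAGE_CUT_HPC_PLUS * leakage_share


def power_20nm_vs_28hpc_plus_if_baseline_is_28hpc(leakage_share):
    """20nm power relative to 28HPC+, assuming the 20nm claim is against 28HPC."""
    return P_20NM_VS_BASELINE / power_28hpc_plus_vs_28hpc(leakage_share)


if __name__ == "__main__":
    print(power_20nm_vs_28hpc_if_baseline_is_28lp())
    print(power_20nm_vs_28hpc_plus_if_baseline_is_28hpc(0.4))
```

With a 28LP baseline, 20nm comes out at about 7% more power than 28HPC. With a 28HPC baseline and a purely illustrative 40% leakage share, 20nm comes out at about 17% less power than 28HPC+, and that gap shrinks as the leakage share goes up.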
Then, looking at TSMC's 16FF+ description, they claim:
"TSMC's 16FF+ (FinFET Plus) technology can provide above 65 percent higher speed, around 2 times the density, or 70 percent less power than its 28HPM technology. Comparing with 20SoC technology, 16FF+ provides extra 40% higher speed and 60% power saving."
Here they actually give us the 28nm process they're comparing to, and it's two generations behind 28HPC+. It also works out to a 25% power reduction from 28HPM to 20nm (16FF+ draws 30% of 28HPM's power and 40% of 20SoC's, and 0.30 / 0.40 = 0.75), so it may be that 28HPM is what they're comparing 20nm to above as well. If this is the case, then the actual power consumption difference between 20nm and 28HPC+ could be near-trivial.
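The 25% figure is just the two 16FF+ power claims chained together. As a quick sanity check in Python (the function name is mine, and it assumes both percentages are quoted at the same speed):

```python
def implied_power_ratio(saving_vs_old_pct, saving_vs_new_pct):
    """Power of the 'new' node relative to the 'old' node.

    Both savings are for one reference process (here 16FF+), measured
    against the old node (28HPM) and the new node (20SoC) respectively.
    """
    reference_vs_old = 100 - saving_vs_old_pct  # 16FF+ as a % of 28HPM power
    reference_vs_new = 100 - saving_vs_new_pct  # 16FF+ as a % of 20SoC power
    return reference_vs_old / reference_vs_new


if __name__ == "__main__":
    # 70% less power than 28HPM, 60% less power than 20SoC
    ratio = implied_power_ratio(70, 60)
    print(f"20SoC draws {ratio:.0%} of 28HPM's power")
```

Plugging in 70 and 60 gives 30 / 40 = 0.75, i.e. 20SoC at 75% of 28HPM's power, which lines up with the "25 percent less power" in the 20nm claim.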
It's also worth keeping in mind that the above comparisons will, for the most part, be between the median chips manufactured on each process (i.e. if you line up all the chips from a wafer from best to worst, you're comparing the middle ones). One of the big benefits of mature nodes like 28nm is that there's much less variability from one die to the next. For Nvidia or Intel this isn't that big of a deal, as they bin their dies across a number of SKUs, lowering clock speeds or disabling cores on the cheaper models to use as many dies as possible. For Nintendo it would be quite important, though, as they want to use as many dies as possible from each wafer for a single product, so they have to work around the clock speeds and power consumption of the worst-performing dies.
In this scenario, if Nintendo aims to use, say, 95% of functioning dies, then a mature 28nm process may actually give them better performance than the 20nm process, simply because the lower process variability means that a 28nm die at the 5th percentile could actually perform better than a 20nm die at the 5th percentile. Of course they could use fewer of the 20nm dies to push the performance floor up, but that's just ratcheting up the cost on an already expensive node.
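Here's a toy illustration of that percentile argument in Python. Every number in it is invented purely to show the shape of the effect (I have no real distribution data for either node), and it assumes the achievable clock across dies is roughly normally distributed, which is a simplification.

```python
from statistics import NormalDist

# HYPOTHETICAL distributions of achievable clock per die, normalised so the
# median 28nm die = 1.00. The 20nm node is given a better median but a wider
# spread, standing in for a less mature process.
mature_28nm = NormalDist(mu=1.00, sigma=0.03)
early_20nm = NormalDist(mu=1.05, sigma=0.08)


def clock_floor(dist, usable_fraction):
    """Clock that `usable_fraction` of functioning dies can reach or beat.

    Using 95% of dies means designing around the die at the 5th percentile.
    """
    return dist.inv_cdf(1.0 - usable_fraction)


if __name__ == "__main__":
    for usable in (0.50, 0.80, 0.95):
        floor_28 = clock_floor(mature_28nm, usable)
        floor_20 = clock_floor(early_20nm, usable)
        print(f"use {usable:.0%} of dies: 28nm {floor_28:.3f}, 20nm {floor_20:.3f}")
```

With these made-up inputs the 20nm node wins on the median die but loses once you insist on using 95% of dies, which is the situation described above. Different inputs could easily flip the result, so treat it as a picture of the mechanism and not an estimate.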
The only area where 20nm would have a clear benefit over 28nm is density, where a given chip will be quite a lot smaller on 20nm than 28nm (roughly half the area, going by the 1.9x density claim). This is very valuable in smartphones, where every cubic millimetre is precious, but I honestly don't expect it to be that big of a deal for Switch (in fact a larger die helps cooling, as the larger surface area dissipates heat more effectively).