Can somebody explain to me how multiple sources have confirmed the dev kit specs in the OP to be correct, and yet people are still expecting the final unit to be 1.5-2x as powerful? Or even expecting it to outperform the watercooled Tegra Parker chip? Stop clinging to that 1 TFLOPS rumor; it's obviously referring to FP16, and the only counterarguments I've seen assume that the writer of that article has real technical knowledge. Judging by the article itself, that's clearly not the case. We were finally getting realistic, then this poorly written article came along and both sides got ridiculous.
Based on the rumors, we are almost certainly getting 512 GFLOPS docked. Expecting more is setting yourself up for disappointment. Nintendo likely chose this target because there's already a chip available that matches it for development, while aiming higher would have made an early 2017 release hard to hit (and the probable original late 2016 target even harder, which is likely why they couldn't go with Pascal in the first place). It's possible that it's a bit faster than that, but not 50% faster and certainly not 100%. Set your expectations at 256-384 GFLOPS on the go and 512 docked. Anything more is gravy.
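For anyone wondering where these numbers come from, here's a rough back-of-the-envelope sketch. It assumes the dev kit GPU is a Tegra X1-style Maxwell part with 256 CUDA cores, each doing one fused multiply-add (counted as 2 FLOPs) per clock, and that "docked" means roughly 1000 MHz. The core count and clocks are my assumptions for illustration, not confirmed final specs. It also shows why a "1 TFLOPS" figure lines up with FP16: packed half precision doubles the theoretical peak.

```python
# Assumption: Tegra X1-style GPU with 256 CUDA cores (not a confirmed final spec).
CUDA_CORES = 256
FLOPS_PER_CORE_PER_CYCLE = 2  # one fused multiply-add counts as 2 FLOPs


def peak_gflops(clock_mhz, fp16=False):
    """Theoretical peak GFLOPS at a given GPU clock in MHz."""
    gflops = CUDA_CORES * FLOPS_PER_CORE_PER_CYCLE * clock_mhz / 1000
    # Packed FP16 runs two half-precision ops per lane, doubling the peak.
    return gflops * 2 if fp16 else gflops


def clock_for(target_gflops):
    """GPU clock in MHz needed to reach a target FP32 peak."""
    return target_gflops * 1000 / (CUDA_CORES * FLOPS_PER_CORE_PER_CYCLE)


if __name__ == "__main__":
    print(f"Docked FP32 @ 1000 MHz: {peak_gflops(1000):.0f} GFLOPS")
    print(f"Docked FP16 @ 1000 MHz: {peak_gflops(1000, fp16=True):.0f} GFLOPS")
    for target in (256, 384):
        print(f"{target} GFLOPS FP32 needs ~{clock_for(target):.0f} MHz")
```

Under those assumptions, 512 GFLOPS FP32 docked becomes about 1 TFLOPS at FP16, and 256-384 GFLOPS on the go would correspond to roughly 500-750 MHz in portable mode.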
To ease some of the disappointment: between FP16 giving a boost in efficiency, Nintendo's customization likely centering on increased memory bandwidth, and the usual advantages of console optimization, what we end up with will likely be able to match or exceed this in real-world use:
https://www.youtube.com/watch?v=aEHhOmlyhJQ&t=395s
Is that not good enough for what it is? Some games would run at sub-HD, but it's really not terrible. Who knows, maybe they'll even be able to add 100-200 MHz to it and squeeze out something better.
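To put that 100-200 MHz in perspective, here's a quick sketch of what a clock bump would buy. Same assumptions as before, none of them confirmed: a 256-core Tegra X1-style GPU at 2 FLOPs per core per clock, starting from a hypothetical 1000 MHz docked clock.

```python
# Assumptions: 256 CUDA cores, 2 FLOPs/core/cycle (FMA), 1000 MHz baseline.
CUDA_CORES = 256
FLOPS_PER_CORE_PER_CYCLE = 2
BASE_CLOCK_MHZ = 1000  # hypothetical docked clock


def peak_gflops(clock_mhz):
    """Theoretical FP32 peak in GFLOPS at a given clock in MHz."""
    return CUDA_CORES * FLOPS_PER_CORE_PER_CYCLE * clock_mhz / 1000


def bumped(bump_mhz):
    """Return (new peak GFLOPS, fractional gain over the baseline)."""
    new = peak_gflops(BASE_CLOCK_MHZ + bump_mhz)
    return new, new / peak_gflops(BASE_CLOCK_MHZ) - 1


if __name__ == "__main__":
    for bump in (100, 200):
        new, gain = bumped(bump)
        print(f"+{bump} MHz: {new:.1f} GFLOPS (+{gain:.0%})")
```

So even a 200 MHz bump is about a 20% gain in theoretical peak, nowhere near the 50-100% some people are hoping for.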
Edit: More examples:
https://www.youtube.com/watch?v=jxAeIl-JX48 Note that most of these are running at medium settings, and that the 940M has only 14.4 GB/s of available memory bandwidth.
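For reference, peak memory bandwidth is just bus width in bytes times transfers per second. The sketch below assumes the 940M in those videos uses a 64-bit DDR3 interface at 1800 MT/s, which is one configuration that gives the 14.4 GB/s figure; the exact memory setup varies by laptop.

```python
def bandwidth_gbs(bus_width_bits, transfer_rate_mts):
    """Peak memory bandwidth in GB/s: bytes per transfer x megatransfers/s / 1000."""
    return bus_width_bits / 8 * transfer_rate_mts / 1000


if __name__ == "__main__":
    # Assumed 940M configuration: 64-bit bus, DDR3 at 1800 MT/s.
    print(f"940M (assumed 64-bit @ 1800 MT/s): {bandwidth_gbs(64, 1800):.1f} GB/s")
```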