Nintendo Switch Dev Kit Stats Leaked? Cortex A57, 4GB RAM, 32GB Storage, Multi-Touch.

Didn't the same already occur for Nvidia? Nvidia got out of the 20nm business because it was shit, just as AMD did... why would they sell it to Nintendo and let them tinker with it and put out a product that would reflect badly on them?
They would put out a cheap product with gorgeous graphics and it wouldn't reflect badly on them. Then they release a 16nm Pascal pocket version of the Switch in a couple of years.
 
People will be wowed even with the worst-case scenarios we are talking about, because Nintendo, Monolith Soft, etc. optimize the shit out of their games, and the tools/APIs will be top notch. Even in the worst case the games will blow away any Android or iOS device.

They might be wowed if they are used to the 3DS, but not if they're used to console gaming. In fact they will laugh and not buy it.
 
They would put out a cheap product with gorgeous graphics and it wouldn't reflect badly on them. Then they release a 16nm Pascal pocket version of the Switch in a couple of years.

I am less inclined to believe they would take this approach when the Switch has to succeed before it can be die-shrunk and sold again as an improved product. Nintendo could just as easily have made it a condition of winning the deal that Nvidia have Pascal ready in time to accommodate Nintendo's need for better battery life.

I would think both Nintendo and Nvidia would put their best feet forward when it came to the Switch. Which is why I think the tools and dev kit hardware are final but the GPU could be changed to accommodate the efficiency of Pascal.
 
Didn't the same already occur for Nvidia? Nvidia got out of the 20nm business because it was shit, just as AMD did... why would they sell it to Nintendo and let them tinker with it and put out a product that would reflect badly on them?

I don't know for certain whether Nvidia has a wafer supply agreement in place for 20nm. If they do, it could be one of the reasons Nintendo decided to stick with Maxwell, because it would come considerably cheaper if Nvidia were looking to get rid of them.

I'm personally hoping for 16nm. It would be a much better product for everyone.

 
AMD had to pay $33M in penalties to cancel their 20nm chips

From what I understand, the $33M was more of a write-off on having gone through the design & tape-out of 20nm products (APUs, if I recall correctly, but maybe a GPU-line too?), so it was a sunk cost for products that never made it to volume production.

Anyways. :p
 
Is it possible for Nvidia to engineer in 3 FP16 units instead of 2 FP16 (or 1 FP32), or is that completely impossible? I know they changed the FP16 in Pascal

Edit: sorry, should say Nvidia and not isis :/
 
I don't even know why flops are even being debated. It's not like most of us know how that translates into any kind of development environment. It's basically just a comparison for the sake of a comparison to the other consoles, without knowing what limits, if any, these numbers put on ports.
 
I don't even know why flops are even being debated. It's not like most of us know how that translates into any kind of development environment. It's basically just a comparison for the sake of a comparison to the other consoles, without knowing what limits, if any, these numbers put on ports.

In one of the most basic senses, FLOPS translates to raw potential performance - and since this is a dev kit thread listing rumoured specs, I think it's more than expected that this kind of talk will be present.

Besides, it's much better than people comparing processor architecture when most people can't even equate that to performance in any sort of sense - at least this gives you a standard set of numbers to work with (even if not completely indicative of real-world performance).
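
Just to ground the numbers people throw around: peak FLOPS is usually derived as shader cores × 2 FLOPs per core per clock (a fused multiply-add counts as two ops) × clock speed. Here's a minimal sketch using the stock Tegra X1's public figures (256 CUDA cores, roughly 1 GHz peak GPU clock); the Switch's actual core count and clocks are not confirmed, so treat this as an upper-bound illustration only:

```cpp
#include <cstdio>

int main() {
    // Peak-throughput estimate: cores * FLOPs-per-core-per-clock * clock.
    // Figures are the stock Tegra X1's public specs, not confirmed Switch specs.
    const double cuda_cores    = 256.0;  // shader cores in the TX1
    const double flops_per_fma = 2.0;    // one FMA = 2 floating-point ops
    const double clock_ghz     = 1.0;    // ~1 GHz peak for the stock TX1

    double fp32_gflops = cuda_cores * flops_per_fma * clock_ghz;  // FP32 peak
    double fp16_gflops = fp32_gflops * 2.0;  // TX1 Maxwell can pack 2x FP16

    std::printf("FP32 peak: %.0f GFLOPS\n", fp32_gflops);  // ~512
    std::printf("FP16 peak: %.0f GFLOPS\n", fp16_gflops);  // ~1024
    return 0;
}
```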
 
From what I understand, the $33M was more of a write-off on having gone through the design & tape-out of 20nm products (APUs, if I recall correctly, but maybe a GPU-line too?), so it was a sunk cost for products that never made it to volume production.

Anyways. :p

That's correct. The bigger issue was renegotiating their wafer supply agreements.

I find it hard to believe both Nintendo and Nvidia agreed this was the way to go in light of this lol...

"Nvidia deeply unhappy with TSMC, claims 20nm essentially worthless"
https://www.extremetech.com/computing/123529-nvidia-deeply-unhappy-with-tsmc-claims-22nm-essentially-worthless

That was two years before the Tegra X1 was released, so they obviously found it suitable enough for an SOC design. For dedicated GPUs it was a dumpster fire.
 
In one of the most basic senses, FLOPS translates to raw potential performance - and since this is a dev kit thread listing rumoured specs, I think it's more than expected that this kind of talk will be present.

Besides, it's much better than people comparing processor architecture when most people can't even equate that to performance in any sort of sense - at least this gives you a standard set of numbers to work with (even if not completely indicative of real-world performance).
That's fair, and this is a discussion forum. I just think it's odd when most people really lack the technical expertise to gather anything relevant from these numbers.
 
Is it possible for an Isis to engineer in 3 FP16 units instead of 2 FP16 (or 1 FP32) or is that completely impossible? I know they changed the FP16 in Pascal

The CUDA cores on the Tegra chips can do one FP32 instruction or 2x FP16 instructions per clock cycle. They altered the desktop Pascal chips so that the cores were designed around FP32 specifically, and only had one 2xFP16-capable core per block of 128 cores.

There are 256 CUDA cores on the Tegra X1. If that same exact chip were shrunk down to 16nm from 20nm, they could fit 384 cores in the same amount of die space.
 
That's correct. The bigger issue was renegotiating their wafer supply agreements.



That was two years before the Tegra X1 was released, so they obviously found it suitable enough for an SOC design. For dedicated GPUs it was a dumpster fire.

Everything I've read about Nvidia's experience with 20nm has been bad. So let's just hope whatever Nintendo is customizing makes it better lol...
 
Is it possible for an Isis to engineer in 3 FP16 units instead of 2 FP16 (or 1 FP32) or is that completely impossible? I know they changed the FP16 in Pascal

You could get triple rate if they implemented a separate FP16 set of shaders alongside the FP32 shaders, but that is just more die space.

The way the double-pumped FP16 works is by packing two FP16 values together and performing two (identical) ops through the single FP32 path.
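
For anyone curious what that packing looks like in code, here's a minimal CUDA sketch using the standard __half2 type and __hfma2 intrinsic from cuda_fp16.h - this is just the generic CUDA mechanism for 2x FP16, not anything specific to Nintendo's toolchain:

```cpp
#include <cuda_fp16.h>

// Each thread handles a pair of FP16 values packed into one 32-bit __half2
// register, so a single __hfma2 instruction performs the same fused
// multiply-add on both halves - two FP16 ops down the one FP32-wide path.
__global__ void fma_half2(const __half2* a, const __half2* b,
                          const __half2* c, __half2* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = __hfma2(a[i], b[i], c[i]);  // (a*b + c) on both packed halves
    }
}
```

The catch is exactly what's described above: both halves are subject to the same operation, so the gain only materialises where the shader actually has pairs of identical FP16 ops to vectorise.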
 
That's fair, and this is a discussion forum. I just think it's odd when most people really lack the technical expertise to gather anything relevant from these numbers.

Well if you need a little background for that, this Eurogamer article might help. It's not talking about the Switch, but it'll give you some of the reasoning behind how and what FLOPS have to do with power and why people talk about it (as well as some console numbers to compare to).

Even just reading the first bit will go a long way, but if you're not too lost, reading the whole thing will definitely help.
 
The CUDA cores on the Tegra chips can do one FP32 instruction or 2x FP16 instructions per clock cycle. They altered the desktop Pascal chips so that the cores were designed around FP32 specifically, and only had one 2xFP16-capable core per block of 128 cores.

There are 256 CUDA cores on the Tegra X1. If that same exact chip were shrunk down to 16nm from 20nm, they could fit 384 cores in the same amount of die space.

This is incorrect. 16nmFF doesn't have any notable density advantage over 20nm, since 16nmFF is effectively 20nm with FinFETs, just branded as 16nm because reasons. This is why I'm not really entertaining the idea of a third SM.
 
I don't remember if it was this thread or the Venture Beat one, but someone was asking if the 40% speed or 60% power saving is due to Maxwell > Pascal or simply due to 20nm>16nm (or were they claiming 28nm > 16nm?). Well according to this from TSMC:

Comparing with 20SoC technology, 16FF+ provides extra 40% higher speed and 60% power saving. By leveraging the experience of 20SoC technology, TSMC 16FF+ shares the same metal backend process in order to quickly improve yield and demonstrate process maturity for time-to-market value.

So the 40% more performance or 60% more power efficiency is due to the die shrink from 20nm to 16nm only, meaning a Maxwell chip at 16nm would gain those same advantages.
 
OS tasks, background tasks for it and for the games, etc... why waste any time slice from the big cores when you do not need to? You are better off offering games more deterministic performance and letting the OS do its thing with the LITTLE ones.

I agree it makes a lot of sense to put smaller cores in for less intensive tasks. And it depends on the CPU in question: the Tegra X1 is a heterogeneous setup, so the OS can switch threads back and forth between the big and little cores. It'd be neat if that's what they go with in the Switch.
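
Purely as an illustrative sketch of the idea (standard Linux pthread affinity, nothing to do with whatever API the Switch OS actually exposes), pinning a background worker onto the little cores so the game keeps the big ones to itself could look like this, assuming a hypothetical layout with CPUs 0-3 as the big cores and CPUs 4-7 as the LITTLE ones:

```cpp
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

// Hypothetical core layout: CPUs 0-3 = big cores (game threads),
// CPUs 4-7 = LITTLE cores (OS/background work). Restrict the given worker
// thread to the LITTLE cluster so it never steals a time slice from the game.
static void pin_to_little_cores(pthread_t worker)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu = 4; cpu <= 7; ++cpu)
        CPU_SET(cpu, &set);
    pthread_setaffinity_np(worker, sizeof(cpu_set_t), &set);
}
```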
 
Well, it doesn't rule out a custom chip fabbed on 20nm, but it does rule out a bog-standard Jetson TX1 like the one the OP details.
The leak doesn't state the dev kits are off the shelf Jetson boards, that's just how it was interpreted by people as an attempt to dismiss the legitimacy of the leak.
 
I believe this question has already been asked before, but asking again: what kind of power is necessary to run Xbox One games at 720p? Assuming that most of the Xbox One games run at 1080p.
 
I don't remember if it was this thread or the Venture Beat one, but someone was asking if the 40% speed or 60% power saving is due to Maxwell > Pascal or simply due to 20nm>16nm (or were they claiming 28nm > 16nm?). Well according to this from TSMC:



So the 40% more performance or 60% more power efficiency is due to the die shrink from 20nm to 16nm only, meaning a Maxwell chip at 16nm would gain those same advantages.

Yes, if you shrunk Maxwell you would get the same performance improvements at better efficiency based on that logic (I am not suggesting that you are suggesting they are doing this). What doesn't make sense is shrinking Maxwell when Pascal is essentially the same exact thing. Why would Nvidia die-shrink Maxwell when they had already planned to do, AND HAVE done, the same with Pascal?

It makes even less sense when Nvidia essentially hit 16nm for the first time with Pascal... how could they be thinking about shrinking Maxwell for Nintendo before they had the ability to with Pascal?

It makes more sense that they either customized 20nm or the timing for 16nm Pascal made sense for them.
 
Yes, if you shrunk Maxwell you would get the same performance improvements based on that logic (I am not suggesting that you are suggesting they are doing this). What doesn't make sense is shrinking Maxwell when Pascal is essentially the same exact thing. Why would Nvidia die-shrink Maxwell when they had already planned to do, AND HAVE done, the same with Pascal?

It makes even less sense when Nvidia essentially hit 16nm for the first time with Pascal... how could they be thinking about shrinking Maxwell for Nintendo before they had the ability to with Pascal?

It makes more sense that they either customized 20nm or the timing for 16nm Pascal made sense for them.

The way I see it, the starting point for this console was the Tegra X1. Nintendo and Nvidia took a TX1 and did a number of customizations to that chip to arrive at the custom SoC confirmed by Nvidia. One of those customizations could very well be a die shrink to a 16nm process, in which case, since the TX1 has Maxwell architecture, the custom chip would also be based on Maxwell architecture, because Pascal architecture has some minor differences that Nintendo isn't interested in.

This way, while the final chip is similar in architecture to Pascal, since Pascal itself is almost the same as a 16nm Maxwell, it's still technically just based on Maxwell because it's not 100% Pascal, as it has been customized.

Meaning, the whole Maxwell/Pascal thing is almost entirely semantics in the case of a custom chip.
 
The leak doesn't state the dev kits are off the shelf Jetson boards, that's just how it was interpreted by people as an attempt to dismiss the legitimacy of the leak.

If it walks like a duck, quacks like a duck, and looks like a duck...

...clearly it's an alligator.
 
The leak doesn't state the dev kits are off the shelf Jetson boards, that's just how it was interpreted by people as an attempt to dismiss the legitimacy of the leak.

It's interpreted that way because it makes sense on multiple levels. If hardware that targets your desired performance envelope and is close to the architecture you are working on is already commercially available, why not use it until the complexities of your architecture are finalized? It saves on much of the cost of producing devkits and is already well documented, and allows developers insight as to how well their games will perform on final hardware. This is all in addition to the specifications lining up, of course.

That isn't to say that there weren't people trying to dismiss the leak, of course.
 
Can someone tell me what the consensus is now? haha

Is it still 16nm Maxwell?

We have no idea. We just have devkit info, and it's mostly out of date in terms of performance; the only other detail we have is that the final kits as of November were with Maxwell.
 
because Pascal architecture has some minor differences that Nintendo isn't interested in.

I'd expect them to be interested in this... With limited memory/memory bandwidth, wouldn't this be helpful?

[Image: PascalEdDay_FINAL_NDA_1463156837-008.png]
 
I'd expect them to be interested in this...

[Image: PascalEdDay_FINAL_NDA_1463156837-008.png]

With limited memory/memory bandwidth, wouldn't this be helpful?
That's a reiteration of what's already in TX1. What might be in Switch is anybody's guess.
 
I'd expect them to be interested in this... With limited memory/memory bandwidth, wouldn't this be helpful?

[Image: PascalEdDay_FINAL_NDA_1463156837-008.png]

Correct me if I am wrong but isn't this already in the Tegra X1? Or some parts of it (if not all of it)? If the chip for the Switch is based on the X1, it would already have all/most of these advantages.

Edit: Seems I was somewhat right, ^ blu above beat me.
 
Correct me if I am wrong but isn't this already in the Tegra X1? Or some parts of it (if not all of it)? If the chip for the Switch is based on the X1, it would already have all/most of these advantages.

Edit: Seems I was somewhat right, ^ blu above beat me.

"New to Pascal is a mix of improved compression modes and new compression modes. 2:1 compression mode, the only delta compression mode available up through the 3rd generation, has been enhanced with the addition of more patterns to cover more scenarios, meaning NVIDIA is able to 2:1 compress blocks more often."

"To put all of this in numbers, NVIDIA pegs the effective increase in memory bandwidth from delta color compression alone at 20%. The difference is of course per-game, as the effectiveness of the tech depends on how well a game sticks to patterns (and if you ever create a game with random noise, you may drive an engineer or two insane), but 20% is a baseline number for the average. Meanwhile for anyone keeping track of the numbers over Maxwell 2, this is a bit less than the gains with NVIDIA’s last generation architecture, where the company claimed the average gain was 25%."

http://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/8
 
There's nothing custom about the spec listed, hence my statement. Such a claim is based on empty supposition.

This is Nvidia; who says a custom part can't be remarkably similar to a pre-existing one? Plus, the specs in the OP are quite general; there could have been changes that aren't captured by them. RSX, for example, had a larger texture cache compared to G70.
 
I believe this question has already been asked before, but asking again: what kind of power is necessary to run Xbox One games at 720p? Assuming that most of the Xbox One games run at 1080p.
Going from 720p to 1080p requires 2.25x the GPU power for rendering alone, so a game running on the XB1 at 1080p could roughly run at 720p on a GPU close to 600 GFLOPS (the XB1 is 1.31 TFLOPS). There are also the architecture differences, CPU, and memory setup to consider, but the 2x FP16 of Maxwell/Pascal's architecture and the rumoured stronger CPU would give the Switch a performance boost beyond the documented FP32 GFLOPS in comparisons to the other systems.

Having said that, I don't know if most multiplatform games ran @ 1080p on the XB1.
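
A quick back-of-the-envelope check of those numbers (treating required GPU power as scaling linearly with pixel count, which is only a rough approximation):

```cpp
#include <cstdio>

int main() {
    // Pixel-count ratio between 1080p and 720p.
    const double px_1080p = 1920.0 * 1080.0;
    const double px_720p  = 1280.0 *  720.0;
    const double ratio    = px_1080p / px_720p;     // = 2.25

    // Naively scale the XB1's 1.31 TFLOPS down to a 720p target.
    const double xb1_gflops  = 1310.0;
    const double needed_720p = xb1_gflops / ratio;  // ~582 GFLOPS

    std::printf("1080p/720p pixel ratio: %.2f\n", ratio);
    std::printf("Rough 720p equivalent: %.0f GFLOPS\n", needed_720p);
    return 0;
}
```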
 
"New to Pascal is a mix of improved compression modes and new compression modes. 2:1 compression mode, the only delta compression mode available up through the 3rd generation, has been enhanced with the addition of more patterns to cover more scenarios, meaning NVIDIA is able to 2:1 compress blocks more often."
That's incorrect in the case of the TX1 - see page 15 of its whitepaper, which has its own enhancements vs desktop GM parts. What is really new is the 4:1 DCC block, which was indeed absent from TX1's Maxwell (which has a 4:1 constant-color compression block). That new block scheme also allows for some combinations of 2:1 and 4:1 which were uncompressed in the old scheme.

ed: OK, I just checked the GM204 whitepaper, and the Delta Color Compression description is word for word identical to the one in the TX1 whitepaper - 2:1 DCC, 8:1 and 4:1 constant-color are all present in the GM204 as well as in the TX1. So if Anand was referring strictly to the DCC then yes - only 2:1 DCC was available before, and with the GP-series parts they also got 4:1 DCC. The rest is constant-color compression blocks, which carried over from Maxwell verbatim.
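
For anyone wondering what delta colour compression actually does, here's a toy sketch of the general idea - purely illustrative, not Nvidia's actual hardware scheme: keep one anchor pixel per block and store the other pixels as small deltas from it, and only compress the block when every delta fits the reduced bit budget. That's also why smooth gradients compress well and random noise doesn't:

```cpp
#include <cstdint>

// Toy 2:1-style check for an 8-pixel block of 8-bit grey values: pixel 0 is
// the anchor, and the block is considered compressible only if every other
// pixel's delta from the anchor fits in 4 signed bits (-8..7). Stored that
// way it takes 8 + 7*4 = 36 bits instead of the raw 64.
static bool block_compresses(const uint8_t px[8])
{
    for (int i = 1; i < 8; ++i) {
        int delta = static_cast<int>(px[i]) - static_cast<int>(px[0]);
        if (delta < -8 || delta > 7)
            return false;  // a delta doesn't fit; fall back to uncompressed
    }
    return true;
}
```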
 
"So it's kinda like a tegra x1?"

"It's custom" -Nvidia

"customized tegra x1??"

"It's custom" -Nvidia

"But tegra x1 right"?

"..." -Nvidia
 
This is incorrect. 16nmFF doesn't have any notable density advantage over 20nm, since 16nmFF is effectively 20nm with FinFETs, just branded as 16nm because reasons. This is why I'm not really entertaining the idea of a third SM.

20nm was planar and didn't support 3D transistors, and it also had a lot of static leakage in comparison to a FinFET design. They're not the same gains Intel made from 22nm to 14nm, but it's not like they don't exist. Fin length, thickness, and pitch have all been reduced compared to 20nm. It will yield a small improvement in density.

The Tegra X1 on 20nm planar was 121mm^2. Apple's A10 Fusion has 6 GPU cores and is 125mm^2 on 16nm FinFET+. The Apple chip is pretty tightly packed, though.
 
The leak doesn't state the dev kits are off the shelf Jetson boards, that's just how it was interpreted by people as an attempt to dismiss the legitimacy of the leak.

Go read the spec sheet on the Nvidia page for the Jetson TX1. It's the same verbiage, word for word.
 
Go read the spec sheet on the Nvidia page for the Jetson TX1. It's the same verbiage, word for word.

Well, almost; there are slight differences with regard to storage and multi-touch input. However, if the final hardware is a TX1 variant then you'd expect them to be similar or near identical in a general description.
 
Will Switch likely have the new texture compression format that has been under development for a few years now? Won't that save bandwidth better than S3TC?
 
Well, almost; there are slight differences with regard to storage and multi-touch input. However, if the final hardware is a TX1 variant then you'd expect them to be similar or near identical in a general description.

A multi-touch screen and storage have nothing to do with the SoC.

The one thing we've known about the Switch since before it was the NX is that it's custom hardware.
 
20nm was planar and didn't support 3D transistors, and it also had a lot of static leakage in comparison to a FinFET design. They're not the same gains Intel made from 22nm to 14nm, but it's not like they don't exist. Fin length, thickness, and pitch have all been reduced compared to 20nm. It will yield a small improvement in density.

The Tegra X1 on 20nm planar was 121mm^2. Apple's A10 Fusion has 6 GPU cores and is 125mm^2 on 16nm FinFET+. The Apple chip is pretty tightly packed, though.

I didn't say that there were no gains, but they're negligible enough that even TSMC doesn't consider it worth mentioning. A10 Fusion having 6 GPU cores doesn't have any bearing on anything either since it's a completely different architecture. On top of that, Apple's A8 has four GPU clusters in an 89mm^2 SoC on 20nm.

Edit: Here, to make it a bit more clear:

TSMC's 16FF+ (FinFET Plus) technology can provide above 65 percent higher speed, around 2 times the density, or 70 percent less power than its 28HPM technology. Comparing with 20SoC technology, 16FF+ provides extra 40% higher speed and 60% power saving. By leveraging the experience of 20SoC technology, TSMC 16FF+ shares the same metal backend process in order to quickly improve yield and demonstrate process maturity for time-to-market value.

http://www.tsmc.com/english/dedicatedFoundry/technology/16nm.htm

16nmFF = 2x 28nm

TSMC's 20nm process technology can provide 30 percent higher speed, 1.9 times the density, or 25 percent less power than its 28nm technology. TSMC 20nm technology is the manufacturing process behind a wide array of applications that run the gamut from tablets and smartphones to desktops and servers.

http://www.tsmc.com/english/dedicatedFoundry/technology/20nm.htm

20nm = 1.9x 28nm

That's only a 5% difference.
 
A multi-touch screen and storage have nothing to do with the SoC.

The one thing we've known about the Switch since before it was the NX is that it's custom hardware.

Which could mean something radically different, superficially identical, or anything in between in comparison to existing Nvidia SoCs like the TX1. The use of the term custom doesn't rule out any of these possibilities.
 