Switch 2 CPU bottleneck issues: Digital Foundry

First of all, DF just makes these BS videos because they have nothing else to talk about and need to push out content.

Secondly, Nintendo should have used a 4nm fab.
 
What should it be compared to then? It's closest to the performance of the base PS4. I'm also surprised it's running so well and that CDPR got it running in 7 weeks. But I also expect the experience to be sub-par.
It's closer to the PS4 Pro.

They both even run Cyberpunk at a similar resolution and frame rate.
 
Let's be real. The Switch 2 is going to have a bunch of third-party games on it that run and look like crap: low resolutions, low framerates, a mixture of both, and low-quality textures in certain areas. A lot of people are going to be playing the Switch 2 hooked up to their TV, which means these flaws are much more noticeable, which in turn means you're not getting a quality experience compared to the competition. If you wanna ignore those flaws because you play it in handheld mode, then fine, but anyone expecting this machine to perform miracles was fooling themselves.
The majority aren't buying a Switch 1/2 to play 3rd party games. The only reason to buy one is to play Nintendo exclusives and 3rd party games on the go, albeit at lesser settings…and that's ok.
 
The majority aren't buying a Switch 1/2 to play 3rd party games. The only reason to buy one is to play Nintendo exclusives and 3rd party games on the go, albeit at lesser settings…and that's ok.

Ya, I mean, you buy a Nintendo system for Nintendo games, and third-party games are just a bonus. All that matters is that the Nintendo games look and run well on this machine. I agree with you.
 
DLSS infers high-resolution images from lower-resolution images. It frees up the GPU pipeline when you're trying to get higher-res images, but the CPU bottleneck will still stand.
What would probably be used in a CPU-bottleneck scenario is frame generation, which infers the "in-between fake frames" from two different "real frames".

The thing is... both cases work way better when you have at least a real 1080p in the GPU pipeline and more than 30 fps.

Every single time I tried frame gen + an AI upscaler to generate a final 1080p from a 30 fps base (the lower end of the spectrum, which looks like it will be the case here), it generated a ton of ghosting, banding, artifacts, and shitty IQ in general.

From what I've seen so far, AI upscaling and frame gen were made for, and work best at, upscaling 1440p images to 4K and 60 fps to 120.
Not 720p 30 fps images to 1080p 60 fps.
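To put rough numbers on that, here is a minimal sketch. It's just pixel and frame-time arithmetic using the resolutions and framerates mentioned in the post above, not a model of how DLSS or frame generation actually work internally:

```python
# How much real information the upscaler / frame generator has to work with
# at each end of the spectrum described above. Plain arithmetic only.

def megapixels(width, height):
    return width * height / 1e6

scenarios = {
    "720p -> 1080p (handheld-style case)":     (1280, 720, 1920, 1080),
    "1440p -> 4K (typical desktop DLSS case)": (2560, 1440, 3840, 2160),
}

for label, (iw, ih, ow, oh) in scenarios.items():
    src = megapixels(iw, ih)
    dst = megapixels(ow, oh)
    print(f"{label}: {src:.2f} MP real -> {dst:.2f} MP output "
          f"({dst / src:.2f}x upscale)")

# Frame generation interpolates a frame between two real frames; at a lower
# base framerate the gap between real frames is longer, so more motion has
# to be guessed across it.
for real_fps in (30, 60):
    gap_ms = 1000 / real_fps
    print(f"{real_fps} fps base: generated frame bridges a {gap_ms:.1f} ms gap")
```

Both cases are the same 2.25x upscale factor, but the 720p source has roughly a quarter of the real pixels to infer from, and a 30 fps base doubles the motion gap the generated frame has to bridge, which lines up with the ghosting and artifacts described above.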
 


community-ken-jeong.gif
 
I don't think the new 3rd-party games are going to run well on Switch 2.

And I think Capcom will need to perform miracles for MH Wilds to be able to run at all on Switch 2.
 
The majority aren't buying a Switch 1/2 to play 3rd party games. The only reason to buy one is to play Nintendo exclusives and 3rd party games on the go, albeit at lesser settings…and that's ok.
The majority does buy 3rd parties though... you gotta remember we are not the majority on 'gaf. Even myself, who has a higher-end console, will still buy a lot of smaller 3rd-party games that might be ideal for handheld... like Hades 2.
 
DF has only ever been good for one thing: what's the framerate?

Everything else in their videos I skip past, because it all reads like nitpick this and nitpick that. Shit that doesn't fucking matter when you're playing the game, especially when they zoom in 4 times to show "look at the edges here compared to this version".

As someone who's played Cyberpunk 2077 on a Steam Deck, the Switch 2 version sounds kinda impressive since it has a 40 fps performance mode. The Steam Deck always seemed like it could barely handle 20-30 fps in that game.
 
DF is getting some major hate the last few weeks

Because they are speculating and making strong statements far too much without having the final console and games in hand.

It would be better for them to simply say "we don't know yet" or shut the fuck up.

There's plenty of stuff that's actually come out recently that they can sink their teeth into, where we could get much more valuable, useful and actionable information.

They've gone full tabloid newspaper in the last few months.
 
Isn't the small memory bandwidth also going to be a big bottleneck? People are comparing it to a PS4 Pro, but the Pro had much more memory bandwidth.

That's like comparing a Dragster with a McLaren P1 on the Nurburgring.

There's such a paradigm shift in GPU occupancy, cache and memory handling between 2012's AMD GCN architecture and Ampere that I'm not even sure where I would begin. Not to mention that the Jaguar cores were bandwidth-hungry compared to ARM processors, which are designed with mobile memory in mind to begin with.

AMD GCN's cache and memory handling was so bad that almost the entirety of the RDNA project was about fixing it. It has an anemic front end: the geometry engines and rasterisers can't spit out vertices and pixels fast enough to saturate the cores. Shit occupancy too; the CUs just can't stay occupied and are full of stalls. It's like having a giant pool and filling it with a garden hose. That's why the PS4 went overkill on bandwidth: the hose diameter and valve didn't get any bigger, but with that much pressure, any time the cores aren't stalled they're sure to get data ASAP.

GCN could only issue an instruction every 4 cycles per SIMD (a SIMD16 unit takes 4 cycles to run a 64-wide wavefront), while Kepler could issue an instruction every cycle.
GCN also had geometry pipeline stalls on any context-switch instruction (which Vega tried to fix).

Even the infamous Vega, with its ridiculous bandwidth and memory bus width, had only 4 geometry engines for 4096 cores. Tahiti, which the PS4 is based on, is 2 per 2048, an equivalent ratio.
To give an idea, Kepler is basically the foundation for how the basic SM building blocks are divided, a layout that carried forward all the way to the modern day, and back then it had one PolyMorph engine (the geometry engine equivalent) per 48 CUDA cores, i.e. one per SM. Then 1 per 128 CUDA cores in Pascal, etc. Nowhere near GCN's bonkers idea of trying to feed 1024 cores with 1 (quick ratio comparison below).
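Purely to visualise those ratios, a tiny sketch. The core and geometry-engine counts are simply the figures quoted in the post, reproduced as-is rather than re-verified:

```python
# Cores fed per geometry unit, using the figures quoted in the post above
# (reproduced as-is, purely for comparison).
configs = {
    "GCN Tahiti (PS4-class)": (2048, 2),  # (shader cores, geometry engines)
    "GCN Vega 64":            (4096, 4),
    "Kepler (per SMX)":       (48, 1),    # PolyMorph engine per SM, per the post
    "Pascal (per SM)":        (128, 1),
}

for name, (cores, geo_units) in configs.items():
    print(f"{name}: {cores // geo_units} cores per geometry unit")
```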

GCN was a compute monster; it handled large work sizes with long durations well (the big pool), but very few game workloads fall into that category. Simple geometry wasn't saturating the geometry pipeline (it sat idle), and its handling of simultaneous commands created huge buffers, basically kneecapping parallelism. The larger GPU on the PS4 also meant that the SE:CU ratio (shader engines vs compute units) would fill slower, preferring longer-running waves, which is again the antithesis of most gaming workloads.

RDNA's whole point was to undo the consequences of years of trying to make GCN work.

A shitload happened between Kepler → Maxwell → Pascal → Volta → Turing → Ampere

Ampere especially was a paradigm shift in Nvidia's architecture: concurrent raster/RT/ML, asynchronous compute to keep the GPU near full occupancy, asynchronous memory copies to reduce global memory traffic and hide data-copy latency, etc. And that's without even going into each generation's individual improvements.

For Switch 2 bandwidth:

The T239 in the Switch 2 follows the usual Ampere ratio of ~25 GB/s per TFLOP, which leaves ~25 GB/s remaining for the CPU, more than plenty for the ARM A78.

With the estimated TFLOPS from the T239 leaks:

Handheld: 1.7 TFLOPS * 25 + ~25 GB/s for CPU = 67.5 GB/s → DF estimated 68.26 GB/s
Docked: 3.1 TFLOPS * 25 + ~25 GB/s for CPU = 102.5 GB/s → DF estimated 102.4 GB/s

More examples of Ampere's ~25 GB/s per TFLOP:

3060 @ 12.74 TFLOPS with 360 GB/s → 28.25 GB/s per TFLOP
3070 @ 20.31 TFLOPS with 448 GB/s → 22.1 GB/s per TFLOP
3080 @ 29.77 TFLOPS with 760 GB/s → 25.5 GB/s per TFLOP
3090 @ 35.58 TFLOPS with 936 GB/s → 26.3 GB/s per TFLOP

It's being fed with bandwidth exactly in line with what modern Nvidia architectures need.
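For anyone who wants to reproduce that estimate, here is a minimal sketch of the back-of-envelope math, assuming the ~25 GB/s-per-TFLOP rule of thumb and the leaked T239 TFLOPS figures quoted above (neither is an official spec):

```python
GBPS_PER_TFLOP = 25.0    # approximate Ampere ratio, from the 30-series examples above
CPU_RESERVE_GBPS = 25.0  # rough allowance for the A78 CPU cluster

def estimate_bandwidth(tflops):
    # GPU share scales with compute throughput; add a fixed CPU reservation.
    return tflops * GBPS_PER_TFLOP + CPU_RESERVE_GBPS

for mode, tflops, df_estimate in [("Handheld", 1.7, 68.26), ("Docked", 3.1, 102.4)]:
    print(f"{mode}: ~{estimate_bandwidth(tflops):.1f} GB/s (DF estimate: {df_estimate} GB/s)")

# Sanity-check the ~25 GB/s per TFLOP ratio on the desktop Ampere cards:
for card, tflops, bw in [("RTX 3060", 12.74, 360), ("RTX 3070", 20.31, 448),
                         ("RTX 3080", 29.77, 760), ("RTX 3090", 35.58, 936)]:
    print(f"{card}: {bw / tflops:.1f} GB/s per TFLOP")
```

The handheld and docked numbers land within a couple of GB/s of the DF estimates, which is the whole point: the ratio is a heuristic, not a measurement.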
 
DF is getting some major hate the last few weeks
Personally, it's less them and more everyone who has been treating their guesses as truth. Even DF says these are assumptions at best, but when it gets posted here it gets passed along as fact rather than assumption.

That's not really on DF as much as it is on the people passing the info along.
 
????????

No one said that. They're talking about the ballpark visual / performance level they saw in their hands on. They're actually surprised CDPR allowed a heavier part of the game to be demoed instead of a lighter, emptier area without any gunfights.

I swear some folks here have severe DF derangement syndrome. It's incredibly easy to just not open a DF thread guys. Try it sometime.
The bolded part of my post was not about Digital Foundry. It was about people in general complaining that the Switch 2 isn't the next console to compete with the PS5/Series S and X.

In hindsight, I should have worded that better in my post, I suppose.

And I like some of the guys in DF, just not that cuck fuck Alex.
 
That's like comparing a Dragster with a McLaren P1 on the Nurburgring.

[...]

It's being fed with bandwidth exactly in line with what modern Nvidia architectures need.
IIRC, LPDDR5X is also very low-latency memory compared to GDDR5. Does it make any substantial difference in closing the gap versus the Xbox and PS machines?
 
God, it's so much fun to watch insecure Nintendo fangirls throwing tantrums.
Can't wait for the meltdowns at the inevitable 20-fps-dipping DF reviews.
 
IIRC, LPDDR5X is also very low-latency memory compared to GDDR5.

LPDDR5X is still "high" latency, but lower than an equivalent DDR5 SODIMM since it's soldered, and thus also lower than GDDR5. Latency mainly impacts the CPU, but the ARM A78 (and other ARM cores) have basically been built for this from the beginning: they're designed around LPDDR, with very specific memory subsystems and a slew of data prefetchers to cope with irregular access patterns. The cut-down desktop Zen 2 processors we saw in many products, including laptops, did comparatively badly with LPDDR5; those CPUs are very sensitive to timings.

Does it make any substantial difference in closing the gap versus the Xbox and PS machines?

Compared to previous console gens? There's nothing on Jaguar that would outclass A78C.

Nvidia's Grace CPU superchip is paired with LPDDR5x, crazily enough, for hardware where prices are not a concern.
 