Switch 2 CPU bottleneck issues: Digital Foundry

First of all, DF just makes these BS videos because they have nothing else to talk about and need to push out content.

Secondly, Nintendo should have used a 4nm fab.
 
What should it be compared to then? It's closest to the performance of the base PS4. I'm also surprised it's running so well and that CDPR got it running in 7 weeks. But I also expect the experience to be sub-par.
It's closer to the PS4 Pro.

They both even run Cyberpunk at a similar resolution and frame rate.
 
Let's be real. The Switch 2 is going to have a bunch of third-party games on it that run and look like crap: low resolutions, low framerates, a mixture of both, and low-quality textures in certain areas. A lot of people are going to be playing the Switch 2 hooked up to their TV, which means these flaws are much more noticeable, which in turn means you're not getting a quality experience compared to the competition. If you wanna ignore those flaws because you play it in handheld mode then fine, but anyone expecting this machine to perform miracles was fooling themselves.
The majority aren't buying a Switch 1/2 to play 3rd party games. The only reason to buy one is to play Nintendo exclusives and 3rd party games on the go, albeit at lesser settings…and that's ok.
 
The majority aren't buying a Switch 1/2 to play 3rd party games. The only reason to buy one is to play Nintendo exclusives and 3rd party games on the go, albeit at lesser settings…and that's ok.

Yeah, I mean you buy a Nintendo system for Nintendo games, and third-party games are just a bonus. All that matters is that the Nintendo games look and run well on this machine. I agree with you.
 
DLSS infers high-resolution images from lower-resolution images. It frees up the GPU pipeline when you are trying to get higher-res images. The CPU bottleneck will still stand.
What would probably be used in a CPU-bottleneck scenario is frame generation, which infers the "mid" fake frames from two different "real" frames.

The thing is... both cases work much better when you have at least a real 1080p in the GPU pipeline and more than 30 fps.

Every single time I tried frame gen + AI upscaling to generate a final 1080p at 30fps (the lower end of the spectrum, which looks like it will be the case here), it produced a ton of ghosting, banding, artifacts, and shitty IQ in general.

From what I've seen so far, AI upscaling and frame gen in general were made for, and work best at, upscaling 1440p images to 4K and 60 fps to 120.
Not 720p 30fps images to 1080p 60 fps.
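To put rough numbers on why the low end of the spectrum is harder, here's a back-of-the-envelope sketch in Python using only the resolutions and framerates mentioned above; the "reconstruction budget" framing is my own simplification, not anything DLSS or frame gen actually exposes.

```python
# Rough arithmetic on how much real information reconstruction has to work with in the
# two scenarios above. Simplified: DLSS/frame gen also use motion vectors and temporal
# history, not just raw pixel counts.

def reconstruction_budget(base_res, base_fps, out_res, out_fps):
    rendered_px_per_s = base_res[0] * base_res[1] * base_fps   # pixels actually shaded
    output_px_per_s = out_res[0] * out_res[1] * out_fps        # pixels shown on screen
    return {
        "rendered_Mpx_per_s": rendered_px_per_s / 1e6,
        "output_Mpx_per_s": output_px_per_s / 1e6,
        "output_to_rendered_ratio": output_px_per_s / rendered_px_per_s,
        "gap_between_real_frames_ms": 1000 / base_fps,  # motion a generated frame must bridge
    }

# Low end: 720p30 reconstructed to 1080p60
print(reconstruction_budget((1280, 720), 30, (1920, 1080), 60))
# High end: 1440p60 reconstructed to 4K120
print(reconstruction_budget((2560, 1440), 60, (3840, 2160), 120))
```

The output-to-rendered ratio is the same 4.5x in both cases, but the 720p30 path starts from roughly an eighth of the real pixels per second, and its generated frames have to bridge 33 ms of motion instead of 17 ms, which is where the ghosting and artifacting described above tends to come from.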
 
 
I don't think the new 3rd-party games are going to run well on Switch 2.

And I think Capcom will need to perform miracles for MH Wilds to run at all on Switch 2.
 
The majority aren't buying a Switch 1/2 to play 3rd party games. The only reason to buy one is to play Nintendo exclusives and 3rd party games on the go, albeit at lesser settings…and that's ok.
The majority does buy 3rd parties though... you gotta remember we are not the majority on 'gaf. Even I, with a higher-end console, will still buy a lot of smaller 3rd-party games that might be ideal for handheld... like Hades 2.
 
DF has only ever been good for one thing: what's the framerate?

Everything else in their videos I skip past, because it all reads like nitpick this, nitpick that. Shit that doesn't fucking matter when you're playing the game, especially when they zoom in 4 times to show "look at the edges here compared to this version".

As someone who's played Cyberpunk 2077 on a Steam Deck, the Switch 2 version sounds kinda impressive since it has a 40fps performance mode. The Steam Deck always seemed like it could barely handle 20-30fps in that game.
 
DF is getting some major hate the last few weeks

Because they are speculating and making far too many strong statements without having the final console and games in hand.

It would be better for them to simply say "we don't know yet" or shut the fuck up.

There's plenty of stuff that's actually come out recently that they can sink their teeth into, where we could get much more valuable, useful and actionable information.

They've gone full tabloid newspaper in the last few months.
 
Last edited:
Isn't the small memory bandwidth also going to be a big bottleneck? People are comparing it to a PS4 Pro, but the Pro had much more bandwidth.

That's like comparing a Dragster with a McLaren P1 on the Nürburgring.

There's such a paradigm shift in GPU occupancy, cache and memory handling between the 2012 AMD GCN architecture and Ampere that I'm not even sure where I would begin. Not to mention that the Jaguars were bandwidth hungry compared to ARM processors, which are made with mobile memory in mind to begin with.

AMD GCN's cache and memory were so bad that almost the entirety of the RDNA project was about fixing them. It has an anemic front end: the geometry engines and rasterisers can't spit out vertices and pixels fast enough to saturate the cores. Shit occupancy: the CUs just can't stay occupied, full of stalls. It's like having a giant pool and filling it with a water hose. That's why PS4 went overkill on bandwidth: the hose diameter and valve did not get bigger, but there's so much pressure that any time the cores aren't stalled they are sure to get data ASAP.

GCN could issue an instruction every 4 cycles per SIMD (a SIMD16 needs 4 cycles to complete a 64-wide wavefront) while Kepler issued one instruction every cycle.
GCN had geometry pipeline stalls on any context-switch instructions (which Vega tried to fix).

Even the infamous Vega, with its ridiculous bandwidth and memory bus width, had 4 geometry engines for 4096 cores. Tahiti, which PS4 is based on, is 2 per 2048, an equivalent ratio.
To give an idea, Kepler basically laid the foundation for how the basic SM building blocks are divided, which carried forward all the way to modern designs, and back then it had one PolyMorph engine (the geometry engine equivalent) per 48 CUDA cores, one per SM. Then 1 per 128 CUDA cores in Pascal, etc. Nowhere near GCN's bonkers idea of trying to feed 1024 cores with 1.

GCN was a compute monster: it handled large work sizes with long durations well (big pool), but very few game workloads fall into that category. Simple geometry was not saturating the geometry pipeline (it sat idle), and it had simultaneous bit commands that created huge buffers, basically kneecapping parallelism. The larger GPU on PS4 also meant that the SE:CU ratio (shader engines vs compute units) would fill slower, preferring longer-running waves, which is again the antithesis of most gaming workloads.

RDNA's whole point was to fix the consequences of years of trying to make GCN work.

A shitload happened between Kepler → Maxwell → Pascal → Volta → Turing → Ampere.

Ampere especially was a paradigm shift in Nvidia architecture: concurrent raster/RT/ML, asynchronous compute to keep the GPU near full occupancy, asynchronous memory copy to reduce global memory traffic and hide data-copy latency, etc. And that's without even going into each generation's individual improvements.

For Switch 2 bandwidth:

T239 on Switch 2 respects the usual ~25 GB/s per TFLOP of the entire Ampere lineup, which leaves ~25 GB/s remaining for the CPU, and that is more than plenty for an ARM A78.

With the estimated TFLOPS from the T239 leaks:

Handheld: 1.7 TFLOPS * 25 + ~25 GB/s for CPU = 67.5 GB/s → DF estimated 68.26 GB/s
Docked: 3.1 TFLOPS * 25 + ~25 GB/s for CPU = 102.5 GB/s → DF estimated 102.4 GB/s

More examples of Ampere's ~25 GB/s per TFLOP:

3060 @ 12.74 TFLOPS for 360 GB/s → 28.25 GB/s/TFLOP
3070 @ 20.31 TFLOPS for 448 GB/s → 22.1 GB/s/TFLOP
3080 @ 29.77 TFLOPS for 760 GB/s → 25.5 GB/s/TFLOP
3090 @ 35.58 TFLOPS for 936 GB/s → 26.3 GB/s/TFLOP

It's being fed with bandwidth exactly according to modern Nvidia architectures' needs.
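The ratio check is simple to reproduce. Here's a minimal Python sketch using only the figures quoted in this post; the flat ~25 GB/s CPU reservation and the ~25 GB/s/TFLOP average are this post's own rules of thumb, not official numbers.

```python
# GB/s per TFLOP for the Ampere cards listed above, plus the T239 bandwidth estimate.
ampere_cards = {
    "RTX 3060": (12.74, 360.0),
    "RTX 3070": (20.31, 448.0),
    "RTX 3080": (29.77, 760.0),
    "RTX 3090": (35.58, 936.0),
}

for name, (tflops, bandwidth_gbs) in ampere_cards.items():
    print(f"{name}: {bandwidth_gbs / tflops:.2f} GB/s per TFLOP")

# T239 estimate: GPU TFLOPS * ~25 GB/s/TFLOP + ~25 GB/s reserved for the CPU.
GBS_PER_TFLOP = 25.0    # rough Ampere average from the cards above
CPU_RESERVE_GBS = 25.0  # rule of thumb from this post, not an official figure

for mode, tflops, leaked_gbs in [("Handheld", 1.7, 68.26), ("Docked", 3.1, 102.40)]:
    estimate = tflops * GBS_PER_TFLOP + CPU_RESERVE_GBS
    print(f"{mode}: estimated {estimate:.1f} GB/s vs leaked {leaked_gbs} GB/s")
```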
 
DF is getting some major hate the last few weeks
Personally it's less them and more everyone who has been treating their guesses as truth. Even DF says these are assumptions at best, but when it gets posted here it gets passed along as fact rather than assumption.

That's not really on DF as much as it is on the people passing the info along.
 
????????

No one said that. They're talking about the ballpark visual / performance level they saw in their hands on. They're actually surprised CDPR allowed a heavier part of the game to be demoed instead of a lighter, emptier area without any gunfights.

I swear some folks here have severe DF derangement syndrome. It's incredibly easy to just not open a DF thread guys. Try it sometime.
The bolded part in my post was not about Digital Foundry. It was about people in general complaining that the Switch 2 isn't the next console to compete with the PS5/Series S and X.

In hindsight, I should have worded that better in my post, I suppose.

And I like some of the guys in DF, just not that cuck fuck Alex.
 
For Switch 2 bandwidth: T239 respects the usual ~25 GB/s per TFLOP of the Ampere lineup, which leaves ~25 GB/s for the CPU. [...] It's being fed with bandwidth exactly according to modern Nvidia architectures' needs.
IIRC, LPDDR5X is also very low-latency memory compared to GDDR5. Does it make any substantial difference in reducing the gap versus the Xbox and PS machines?
 
God, it's so much fun to watch insecure Nintendo fangirls throwing tantrums.
Can't wait for the meltdowns at the inevitable 20fps-dipping DF reviews.
 
IIRC, LPDDR5X is also very low-latency memory compared to GDDR5,

LPDDR5X is still "high" latency, but lower than equivalent DDR5 SODIMMs when soldered, and thus also lower than GDDR5. Latency mainly impacts the CPU, but the ARM A78 (and other ARM cores) have basically been built for this from the beginning. They're designed around LPDDR, with very specific memory subsystems and a slew of data prefetchers to cope with irregular access patterns. The cut-down desktop Zen 2 processors we saw in many products, including laptops, did comparatively badly with LPDDR5; those CPUs are very sensitive to timings.

does it make any substantial difference in reducing the gap versus the Xbox and PS machines?

Compared to previous console gens? There's nothing on Jaguar that would outclass A78C.

Nvidia's Grace CPU superchip is paired with LPDDR5x, crazily enough, for hardware where prices are not a concern.
 
Mario Kart World seems pretty solid, with 24 online players racing together and using abilities on screen, plus the open world with some NPCs. So I'm wondering how much of the CPU Mario Kart World is already using. Is it very close to full power, or is there still a decent amount to spare? That would give me an idea of how impressive their future Zelda and Metroid games could end up looking.
 
First of all, DF just makes these BS videos because they have nothing else to talk about and need to push out content.

Secondly, Nintendo should have used a 4nm fab.
Yep, this is the main problem here. The second problem being the very limited bandwidth (if we compare it against the PS4).

CPU-limited games will eat bandwidth, and memory contention gets worse specifically in CPU-limited games. The third problem being that developers know how to optimize for those weak Jaguar CPUs.
 
You Nintendo fans really are terrible at this power-comparison thing lol. Comparing Split Fiction 😆. Compare a more demanding current-gen title and let's see how it goes.

Best not to though, as Nintendo isn't in this power war, and neither should you be, for good reason.
 
You Nintendo fans really are terrible at this power-comparison thing lol. Comparing Split Fiction 😆. Compare a more demanding current-gen title and let's see how it goes.

Best not to though, as Nintendo isn't in this power war, and neither should you be, for good reason.
Split Fiction is split screen and doesn't even run on PS4. You think they wouldn't release it to a 100-million userbase if they could?? Haha, why, because of the cartoony artstyle?
 
Split Fiction is split screen and doesn't even run on PS4. You think they wouldn't release it to a 100-million userbase if they could?? Haha, why, because of the cartoony artstyle?
How does it run Cyberpunk? Like shit. That's why making comparisons with the PS5/PS5 Pro is going to make you guys look silly on more demanding titles.
 
I agree with comparing it to those handhelds, but with these specs, it's much more appropriate to compare it to the XB1 than the XBSS.


I understand the argument wrt clocks, but it's much closer to a Series S than it is to last gen consoles, based on GPU arch alone.

The Xbone and PS4 cannot run Cyberpunk Phantom Liberty, even at 30 fps.

While DLSS is no magic sauce that's all of a sudden gonna handle CPU-intensive workloads, it will still alleviate GPU tasks, which is helpful for a 30 fps target.

Apart from fighting games, esports shooters, remasters and a few other games, most of the releases on Switch 2 will be 30 fps and the Nintendo audience doesn't mind that.

You will realize this when current-gen-only games or the cringeworthy "impossible ports" start dropping on Switch 2.

Do remember that The Matrix Awakens demo, a current-gen-only UE5 tech demo, ran on this console and was shown to journos behind closed doors at some trade show last year or the year before (I don't remember exactly, so don't quote me on this, but I think it was Gamescom).
 
*shows footage of a game that clearly looks better than the PS4 version of the same game*

DF: Switch 2 is comparable to PS4.
There are people who understand what a CPU bottleneck is, and people who don't...

I wouldn't expect more than a PS4 Pro in a mobile form, just with a few new tricks.

Nintendo is limiting the power of the chip, so even a ROG Ally should be much better (especially CPU-wise). Don't expect wonders from that device. DLSS can't do much with a low base resolution and limited compute power.

Nintendo's own games will look good. Their art style doesn't need many tiny details and effects to shine.
 
Digital Foundry don't have any real expertise or qualifications that would warrant taking their "analysis" seriously. Why people think they are some sort of authority on the subject matter is wild to me.
 
Damn it, no bottleneck please. I want the full power and much better graphics than Switch 1.
The HW is not crap. I am disappointed they did not base it on a newer nVIDIA GPU family with stronger RT support, but it is not crap.

It is just a hybrid that is focused on not selling at a loss at launch and that needs to keep the same overall gameplay experience whether you are docked or running on a battery between your hands, so you are limited power-wise there. You are not going to run your CPU at 4 GHz docked and 1 GHz in handheld mode (per the clocks below, handheld mode actually runs about 100 MHz higher than docked, ~1.1 GHz vs ~1.0 GHz, but I bet that is related to power the OS needs in that mode that it does not need when docked). Scaling GPU performance up and down between the two modes, yes; scaling CPU performance that affects gameplay / core systems, less likely.

As much as one wants to hate the Jaguar CPU family, it is still 1.6 GHz across the PS4's 8 Jaguar cores vs ~1.0 GHz across Switch 2's 8 ARM cores: quite a large clockspeed gap with the same number of cores. Compare it to PS4 Pro and you get 2.1 GHz vs ~1.0 GHz. So, close to the PS4 CPU-wise (we need to see more performance data on A78C vs Jaguar), maybe… but maybe not at the PS4 Pro target (it is over a 2x delta).

On the GPU side, docked, it is a newer architecture and I will assume it is efficient at reducing bandwidth needs, but we do have quite a gap.

Technical Specifications:
CPU: Arm Cortex-A78C
8 cores
Unknown L1/L2/L3 cache sizes
GPU: Nvidia T239, Ampere (RTX 30 series architecture)
1 Graphics Processing Cluster (GPC)
12 Streaming Multiprocessors (SM)
1536 CUDA cores
6 Texture Processing Clusters (TPC)
48 Gen 3 Tensor cores
12 Gen 2 ray-tracing cores
RAM: 12 GB LPDDR5 (some/all units will have LPDDR5X chips)
Two 6 GB chips

Power Profiles:
Handheld:
CPU @ 1100.8 MHz
GPU @ 561 MHz, 1.72 TFLOPs peak
RAM @ 2133 MHz, 68.26 GB/s peak bandwidth
Docked:
CPU @ 998.4 MHz
GPU @ 1007.3 MHz, 3.09 TFLOPs peak
RAM @ 3200 MHz, 102.40 GB/s peak bandwidth
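As a sanity check, the peak TFLOPS and bandwidth figures above follow directly from the listed clocks. Here's a quick Python sketch; the 128 CUDA cores per SM, 2 FLOPs per core per clock, and the 128-bit LPDDR5 bus are standard Ampere/LPDDR5 assumptions on my part, not something taken from the leak itself.

```python
# Derive the peak figures in the power profiles above from the listed clocks.
# Assumptions (standard Ampere / LPDDR5, not from the leak itself):
#   - 12 SMs x 128 CUDA cores, 2 FLOPs per core per clock (FMA)
#   - 128-bit (16-byte) memory bus, double data rate on the listed RAM clock
CUDA_CORES = 12 * 128
FLOPS_PER_CORE_PER_CLOCK = 2
BUS_BYTES = 16

def gpu_tflops(clock_mhz: float) -> float:
    return CUDA_CORES * FLOPS_PER_CORE_PER_CLOCK * clock_mhz * 1e6 / 1e12

def bandwidth_gbs(ram_clock_mhz: float) -> float:
    # LPDDR5 transfers twice per clock, e.g. 2133 MHz -> 4266 MT/s
    return 2 * ram_clock_mhz * 1e6 * BUS_BYTES / 1e9

print(f"Handheld: {gpu_tflops(561):.2f} TFLOPS, {bandwidth_gbs(2133):.2f} GB/s")
print(f"Docked:   {gpu_tflops(1007.3):.2f} TFLOPS, {bandwidth_gbs(3200):.2f} GB/s")
```

That reproduces the 1.72 / 3.09 TFLOPS and 68.26 / 102.40 GB/s figures listed above.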

In handheld mode they are targeting a 1080p screen (overkill resolution for a handheld for me, but 🤷‍♂️), so 1.7 TFLOPS on a newer nVIDIA architecture vs 1.84 TFLOPS on the base PS4 spells good news, but you are then limited by bandwidth (68 GB/s on Switch 2 vs 176 GB/s, both having to share with the CPU too) as well as clockspeed (561 MHz vs 800 MHz).

In docked mode we are again in the PS4 Pro ballpark with a much higher clockspeed (beating the PS4 Pro's GPU clocks a bit too), but there is still a large bandwidth gap that could limit the GPU's ability to flex: 102.40 GB/s vs 217.6 GB/s (and the PS4 Pro was bandwidth-limited even at that speed).

One can admit this nVIDIA chip has lower bandwidth needs, but again, this is quite a large gap.
 
68 GB/s on Switch 2 vs 176 GB/s, both having to share with the CPU too

but there is still a large bandwidth gap that could limit the GPU's ability to flex: 102.40 GB/s vs 217.6 GB/s

The PS4 Pro, though, was meant to push resolutions above 1080p, up to 4K I think.
So this should be good enough for 1080p gaming. I am definitely expecting it to match the base PS4, and even sit slightly above it, at 1080p. Which is pretty good for a device this size and form factor.
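One crude way to put numbers on that: divide each machine's peak bandwidth by the pixels of the display resolution it targets. A rough Python sketch follows; "bandwidth per output pixel" is my own simplification and ignores internal render resolution, CPU sharing, compression, and the GCN-vs-Ampere efficiency differences discussed earlier in the thread.

```python
# Peak memory bandwidth per output pixel for the display target of each machine.
# Crude metric: ignores internal render resolution, CPU sharing, and compression.
machines = {
    "PS4 Pro (4K target)":       (217.6, (3840, 2160)),
    "Switch 2 docked (1080p)":   (102.4, (1920, 1080)),
    "Switch 2 handheld (1080p)": (68.26, (1920, 1080)),
}

for name, (gbs, (w, h)) in machines.items():
    kb_per_pixel = gbs * 1e9 / (w * h) / 1e3
    print(f"{name}: {kb_per_pixel:.1f} KB/s of peak bandwidth per output pixel")
```

On that (admittedly rough) measure, docked Switch 2 at 1080p has roughly twice the bandwidth per displayed pixel that the PS4 Pro had for its 4K targets, which is the gist of the point above; the earlier argument that Ampere needs less bandwidth per TFLOP than GCN points in the same direction.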
 