Support NeoGAF

Lysandros · Apr 13, 2025

It exceeded my expectations wildly, to an almost illogical degree. I predicted PS2+ level performance, but what we got is objectively somewhere between a PS5 Pro and a PS6: the most impressive graphics I've seen this generation by far. Nintendo engineers are truly magical.

Mozza · Apr 13, 2025

Just watching a game like Cyberpunk running on the system in both handheld and docked modes, after such a short development and optimization period, has exceeded my expectations by a wide margin. Add to this a fully open-world Mario Kart game that also looks insane, and I'm pretty happy.

Even more impressive is that the system itself is so thin and costs only £395 in the UK. It's so much better than the other bricks-with-screens gaming handhelds and the Steam Deck.

dgrdsv · Apr 19, 2025

Lysandros said:
Do you happen to know how Nvidia Ampere Teraflops are calculated?

FLOPS are always calculated the same way, unless someone starts using something other than FP32 math for them, which would generally be the wrong thing to do.
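For reference, the usual peak-FP32 arithmetic is: FP32 lanes x 2 FLOPs per clock (a fused multiply-add counts as a multiply plus an add) x clock. A minimal Python sketch follows. The RTX 3080 figures (68 SMs, 128 FP32 lanes per SM, ~1.71 GHz boost) are public specs used only as an illustration; the "Turing-style" part is hypothetical, with the same SM count and clock but 64 FP32 lanes per SM, to show where Ampere's doubled figure comes from.

```python
def peak_fp32_tflops(sm_count: int, fp32_lanes_per_sm: int, clock_ghz: float) -> float:
    """Peak FP32 TFLOPS = SMs x FP32 lanes per SM x 2 FLOPs (one FMA) x clock."""
    flops_per_lane_per_clock = 2  # one fused multiply-add counts as two FLOPs
    gflops = sm_count * fp32_lanes_per_sm * flops_per_lane_per_clock * clock_ghz
    return gflops / 1000.0

# GeForce RTX 3080 (Ampere): 68 SMs x 128 FP32 lanes, ~1.71 GHz boost.
ampere = peak_fp32_tflops(68, 128, 1.71)
# Hypothetical Turing-style part: same SMs and clock, 64 FP32 lanes per SM.
turing_style = peak_fp32_tflops(68, 64, 1.71)

print(round(ampere, 2))        # 29.77
print(ampere / turing_style)   # 2.0
```

The formula counts every FP32-capable lane, which is exactly why the Ampere figure doubles even though one of the two SIMDs is shared with INT work.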

Lysandros said:
Are double-rate FP32 calculations counted the same way? I keep reading that 'Ampere compute figures are inflated', but couldn't find a definitive answer after doing a bit of research.

No, Ampere's FP32 rate is a "true" math rate in the sense that each SM partition has two SIMDs capable of FP32 math, while Turing had only one such SIMD.
This is different from RDNA 3's VOPD (dual-issue), where the additional FP throughput comes from extra (repurposed?) ALUs inside the same SIMDs.
The difference comes down to how readily FP instructions can be extracted from the instruction stream, and which types of FP instructions qualify.
Ampere and later can run the same FP math on both SIMDs, and that math can come from two independent warps/waves (out of those scheduled to run on the SM).
RDNA 3's VOPD can run only a limited subset of FP instructions at double rate, and both instructions have to come from the same warp/wave that is running on the SIMD in the current clock.
This leaves few opportunities for such double-rate issue, so it rarely happens in practice.
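To make the same-wave restriction concrete, here is a toy Python model of VOPD-style pairing. It is a sketch under loudly simplified assumptions: the opcode whitelist is hypothetical (the real VOPD list and its register-bank rules are more involved), and the only dependency check is that the second instruction doesn't read or overwrite the first one's result. Two adjacent instructions from one wave issue together only when both checks pass; otherwise one instruction issues alone.

```python
from dataclasses import dataclass

# Hypothetical, simplified whitelist: the real VOPD opcode list and its
# register-bank constraints are more involved than this.
ELIGIBLE_OPS = {"fmac", "mul", "add", "mov"}

@dataclass(frozen=True)
class Instr:
    op: str
    dst: str
    srcs: tuple

def can_pair(a: Instr, b: Instr) -> bool:
    """True if two adjacent instructions from the same wave may dual-issue."""
    if a.op not in ELIGIBLE_OPS or b.op not in ELIGIBLE_OPS:
        return False
    # b must not consume a's result, and both must not write the same register.
    return a.dst not in b.srcs and a.dst != b.dst

def issue_cycles(stream: list) -> int:
    """Greedy in-order issue: pair i with i+1 when allowed, else issue i alone."""
    cycles, i = 0, 0
    while i < len(stream):
        if i + 1 < len(stream) and can_pair(stream[i], stream[i + 1]):
            i += 2
        else:
            i += 1
        cycles += 1
    return cycles

independent = [
    Instr("mul", "v0", ("v1", "v2")),
    Instr("add", "v3", ("v4", "v5")),
    Instr("fmac", "v6", ("v7", "v8", "v6")),
    Instr("mov", "v9", ("v10",)),
]
dependent = [
    Instr("mul", "v0", ("v1", "v2")),
    Instr("add", "v3", ("v0", "v5")),
    Instr("fmac", "v6", ("v3", "v8", "v6")),
    Instr("mov", "v9", ("v6",)),
]
print(issue_cycles(independent))  # 2
print(issue_cycles(dependent))    # 4
```

A dependency chain inside one wave gets no pairing at all in this model, whereas the Ampere approach described above can take the second instruction from a different warp.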
The "inflated" nature of Ampere flops is different: the extra flops are easy to use, but in practice they are excessive for gaming code (which tends to hit bottlenecks other than pure flops), and they have to be shared with other math types.
The latter is the main reason why the gains from Turing to Ampere were lower than you'd expect from comparing flops alone: Turing had half the FP rate but the same INT rate, and INT math takes 1/4 to 1/3 of typical gaming code execution.
On Ampere and later, this INT workload has to run on the SIMD that also handles FP work, which means that SIMD can't do FP while it's doing INT. So in practice the FP32 doubling delivers less than 2x, because the shared SIMD still has to run INT math, which takes a sizeable chunk of overall execution time.
If you run a purely FP workload, Ampere and later easily hit their peak FP throughput figures. But games are never a purely FP workload.
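A back-of-the-envelope way to see why the INT share matters is an idealized model of one SM partition, sketched in Python below. The assumptions are mine, not measurements: the workload is purely ALU-bound, each SIMD retires one unit of work per unit of time, Turing has one FP32-only and one INT32-only SIMD, and Ampere has one FP32-only SIMD plus one that does FP32 or INT32. Memory, caches and every other bottleneck real games hit are ignored.

```python
def turing_time(int_frac: float) -> float:
    """Time to finish 1.0 unit of work when FP and INT run on separate SIMDs."""
    return max(1.0 - int_frac, int_frac)

def ampere_time(int_frac: float) -> float:
    """Time with one FP-only SIMD and one shared FP/INT SIMD.

    INT work must go to the shared SIMD and FP fills the rest, so with both
    SIMDs busy the floor is 0.5, unless the INT work alone exceeds that.
    """
    return max(0.5, int_frac)

def ampere_speedup(int_frac: float) -> float:
    """Idealized Ampere-over-Turing ALU speedup for a given INT share."""
    return turing_time(int_frac) / ampere_time(int_frac)

for f in (0.0, 0.25, 1 / 3):
    print(f"INT share {f:.2f}: {ampere_speedup(f):.2f}x")
# INT share 0.00: 2.00x
# INT share 0.25: 1.50x
# INT share 0.33: 1.33x
```

Even in this idealized model, the full 2x only appears for a pure-FP stream; at the 1/4 to 1/3 INT share mentioned above it shrinks to roughly 1.5x to 1.33x, before any non-ALU bottleneck is counted.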

Did the Switch 2's graphics capabilities meet your expectations?

Over my expectations

In line with my expectations

Under my expectations