I'd disagree with you there. Not all GPUs can handle machine learning. If that were the case, we'd see more than just specific cards with the ability to support the feature. You could say the same about the CPU: you can handle the math there too, but without dedicated hardware handling it, it's not fast. Also, didn't the Sony engineer who had his private messages posted for all to see confirm months back on Twitter that it doesn't have dedicated hardware for this?
Don't be obtuse. GPUs are much better than CPUs at DL computation for obvious reasons, but that doesn't mean they're particularly good at it in absolute terms. For anything other than inference with the simplest DL models, they're actually pretty bad, which is why you haven't seen a single instance of DL inference running in GPU compute during the current gen.
Rapid Packed Math for INT ops isn't "dedicated hardware for DL/ML", contrary to the bollocks MS's marketing team has been peddling.
Integer math is used throughout the GPU graphics rendering and compute pipelines, for all sorts of workloads. RPM for INT is an acceleration of generalised integer compute at lower precisions, exactly analogous to what RPM does for FP16 on the floating-point side.
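To make that concrete, here's what packed low-precision integer math looks like at the instruction level. This is a minimal CUDA sketch using NVIDIA's __dp4a intrinsic as an analogy (the XSX's own APIs aren't public, and the kernel and buffer names here are made up): four INT8 values are packed into each 32-bit register and a dot-product-plus-accumulate happens in one instruction. It's more low-precision ops per clock through the existing ALUs, not a separate ML engine.

```cuda
// Minimal sketch: packed INT8 dot products via __dp4a (needs nvcc -arch=sm_61 or newer).
// Each 32-bit int holds four packed INT8 lanes; __dp4a does a 4-wide dot product
// plus accumulate in a single instruction -- the same "packing" idea as RPM for INT.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void dot_int8_packed(const int* a, const int* b, int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = __dp4a(a[i], b[i], 0);  // (a0*b0 + a1*b1 + a2*b2 + a3*b3) + 0
    }
}

int main()
{
    const int n = 256;
    int *a, *b, *out;
    cudaMallocManaged(&a, n * sizeof(int));
    cudaMallocManaged(&b, n * sizeof(int));
    cudaMallocManaged(&out, n * sizeof(int));
    for (int i = 0; i < n; ++i) {
        a[i] = 0x01010101;  // four INT8 lanes, each = 1
        b[i] = 0x02020202;  // four INT8 lanes, each = 2
    }
    dot_int8_packed<<<(n + 127) / 128, 128>>>(a, b, out, n);
    cudaDeviceSynchronize();
    printf("out[0] = %d\n", out[0]);  // four lanes of 1*2 -> 8
    cudaFree(a); cudaFree(b); cudaFree(out);
    return 0;
}
```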
If you're looking for dedicated hardware for DL/ML, you need something specifically targeted at the dense-data matrix math of DL computation. You want highly parallel cores with short execution pipelines, more registers, cache and shared memory. You want features like sparse data compression to improve effective memory bandwidth usage, and lots and lots of bandwidth to main memory. You want Tensor cores.
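For contrast, this is roughly what programming actual dedicated matrix hardware looks like. A minimal CUDA WMMA sketch (kernel name and fill values are placeholders; real DL code tiles thousands of these per layer, which is where the headline Tensor core figures come from): a single warp hands a 16x16x16 half-precision multiply-accumulate straight to the Tensor cores.

```cuda
// Minimal sketch: one warp issues a 16x16x16 FP16 matrix-multiply-accumulate
// to the Tensor Cores via the WMMA API (needs nvcc -arch=sm_70 or newer).
#include <cuda_runtime.h>
#include <cuda_fp16.h>
#include <mma.h>
#include <cstdio>

using namespace nvcuda;

__global__ void tensor_core_mma(const half* A, const half* B, float* C)
{
    // One warp cooperatively owns one 16x16 output tile.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);
    wmma::load_matrix_sync(a_frag, A, 16);            // leading dimension 16
    wmma::load_matrix_sync(b_frag, B, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);   // D = A*B + C on Tensor Cores
    wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);
}

int main()
{
    half  *A, *B;
    float *C;
    cudaMallocManaged(&A, 16 * 16 * sizeof(half));
    cudaMallocManaged(&B, 16 * 16 * sizeof(half));
    cudaMallocManaged(&C, 16 * 16 * sizeof(float));
    for (int i = 0; i < 16 * 16; ++i) {
        A[i] = __float2half(1.0f);
        B[i] = __float2half(1.0f);
    }
    tensor_core_mma<<<1, 32>>>(A, B, C);  // exactly one warp
    cudaDeviceSynchronize();
    printf("C[0] = %.1f\n", C[0]);        // all-ones inputs -> every output element is 16
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```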
Tensor cores on Nvidia's flagship cards provide well over 230 TFLOPS of floating-point performance. Even with 2x or 4x RPM on top of the XSX's 12 TFLOPS, it isn't coming close. And on those Nvidia cards that 230+ TFLOPS is effectively free, because it's dedicated hardware: your DL computation isn't even touching the CUDA cores, so you still have 30+ TFLOPS of CUDA core performance left for all your other game rendering workloads.
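Rough numbers, using only the figures already in this post (treating 2x and 4x purely as packing multipliers, and ignoring that those packed INT ops also steal ALU time from rendering):

\[
12\ \text{TFLOPS} \times 2 = 24\ \text{TOPS}, \qquad
12\ \text{TFLOPS} \times 4 = 48\ \text{TOPS}, \qquad
\frac{230\ \text{TFLOPS}}{48\ \text{TOPS}} \approx 4.8
\]

So even in the best packing case the dedicated hardware still has roughly five times the raw throughput, and it runs alongside the CUDA cores rather than competing with them.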
This is the problem with your perspective on this subject, and with the many others who push this ignorant narrative. You mistakenly believe faster low-precision integer performance is somehow going to be a game-changer and make the XSX ridiculously performant for DL computation. In reality, you're almost certainly looking at a low single-digit percentage speed-up at best, and that's a speed-up over a baseline of DL compute performance that's already pathetic compared to products with actual dedicated hardware providing the meaningful level of performance this kind of computation requires, e.g. Tensor cores or TPUs.