Loxus
Member
Chill guys, that's a 4% difference. Likely to protect sources.
The 16-bit 67 TF figure from the leak doesn't match either; that would have been 2.18GHz.
But it's like you said, likely to protect sources.
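For anyone who wants to check the 2.18GHz figure, here's a rough sketch of the arithmetic. The 60 CU count and the per-lane factor of 8 (FMA x dual issue x packed FP16) are my assumptions, not something stated in this thread, so treat the function and its parameter names as illustrative only.

```python
def implied_clock_ghz(tflops, compute_units,
                      lanes_per_cu=64, flops_per_lane_per_clock=8):
    """Clock in GHz needed to reach a given peak TFLOPS figure."""
    flops_per_clock = compute_units * lanes_per_cu * flops_per_lane_per_clock
    return tflops * 1e12 / flops_per_clock / 1e9

# ASSUMPTION: 60 CUs; 8 FLOPs per lane per clock =
# 2 (fused multiply-add) x 2 (dual issue) x 2 (packed 16-bit).
clock = implied_clock_ghz(67.0, compute_units=60)
print(f"{clock:.2f} GHz")
```

Under those assumptions the 67 TF number works out to roughly the 2.18GHz quoted above. Change the CU count or the per-lane factor and the implied clock moves with it.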
For both inference in addition to training?
And if so, is that because that V_DUAL_DOT2ACC_F32_BF16 capability has been brought into WMMA on the RDNA4 ISA?
From reading, I assumed that on RDNA3 those two dual-issue, 16-bit (RPM) capable instructions couldn't be used with WMMA, based on what the article stated. That would mean inference on RDNA3 still gets better performance from V_DUAL_DOT2ACC_F32_BF16, which is both dual issue and RPM, than from WMMA instructions, which are RPM only. Or did I misunderstand, or is the article incorrect about dual issue with RPM being restricted to those two V_DUAL instructions?
For comparison, the PS5 GoW Ragnarok ML AI inference upscaler on pages 48-49 of the PDF does a performance optimisation that sounds very similar to a manual implementation of dot2acc.
Well... if you really wanna go down this rabbit hole... read this too.
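To make the dot2acc idea concrete, here's a minimal sketch of what a "dot product of two pairs, accumulate" operation does, as I understand the instruction naming. It's only a semantic model in plain Python floats: the real instruction works on packed 16-bit pairs in registers with a 32-bit accumulator, and rounding is ignored here. The function names are mine.

```python
def dot2acc(acc, a_pair, b_pair):
    """acc + a0*b0 + a1*b1: one multiply-accumulate over a packed pair."""
    return acc + a_pair[0] * b_pair[0] + a_pair[1] * b_pair[1]

def dot(xs, ys):
    """Dot product built from dot2acc steps, two elements at a time."""
    assert len(xs) == len(ys) and len(xs) % 2 == 0
    acc = 0.0
    for i in range(0, len(xs), 2):
        acc = dot2acc(acc, (xs[i], xs[i + 1]), (ys[i], ys[i + 1]))
    return acc

weights = [1.0, 2.0, 3.0, 4.0]   # hypothetical filter weights
inputs = [0.5, 0.5, 2.0, 1.0]    # hypothetical activations
result = dot(weights, inputs)
print(result)
```

A manual version of this in a shader would be the same loop with the pair packed into one register, which is presumably what the optimisation in the PDF resembles.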
I tried putting a PS5 "pro" PC build together, and no matter what I do I can't get it under ~$900, so if they can come in at around $600 that would be quite good.
Part List - AMD Ryzen 7 5700X, Radeon RX 7700 XT, Montech AIR 903 BASE ATX Mid Tower - PCPartPicker
Now, of course the PC is a much better proposition overall, but you always have to pay a bit more.
7700 XT is not a good proxy. Based on leaks, the Pro should have much better RTRT performance. If PSSR matches XeSS, then it will blow FSR3 out of the water as well. You're looking at something more like a ~7900 GRE.
It could also be that that white paper is old. If the PS5 Pro was scheduled to launch last year, it would (like the slim) no doubt have been on 6nm. Going from 6nm to 5nm or even 4nm can make a world of difference.
Secondly, I have never taken the leak as gospel, more as something just pointing us in the general direction. E.g., with the OG PS5 leaks, not a single one gave an accurate clock of 2.2GHz. Instead, we just got 2GHz. I don't know why we expect everything said in this leak to be 100% accurate.
So are you thinking we could get a spec increase if it's on a smaller node?
Most people thought PS5 was going to be like 9TF or whatever based on the earlier leaks
Well... if you really wanna go down this rabbit hole... read this too.
BF16 is one of the supported data types of RDNA3, and that falls right into their WMMA feature set. Int8 and Int4 are also supported.
I take it from my previous responses you agree that dual issue with rapid packed math (RPM) FP16 is actually possible on RDNA3 for a useful PSSR inference implementation on the Pro, getting the peak performance of 67TF/s (assuming custom RDNA3), yes?
Guarantee Concord was going to be used to advertise this, saying you can play at 120fps.
Seeing how CPU limited the PC version was, very unlikely they could do anywhere near 120 on console.
At native res sure, but with PSSR? I think it's possible.
That "early days" information was probably true until the XSX's specs got leaked, and Sony decided to bump the PS5's specs to 10.28TF. But TF isn't a good metric anymore, and certainly not for the PS5 Pro.
Based on how power efficient the PS5 is with the subsystems and boost clock it has, the time Sony spent developing its liquid metal thermal interface material (effectively planned delidding of the AMD APUs), and how there's no value in them leaving more power headroom in the under-250-watt target range the product sits in, the idea that they reacted to the XSX specs doesn't make sense IMO.
This leak is more in line with the PS4 Pro.
With the PS5, if I remember correctly, the 2GHz came from an AMD intern who mistakenly put a lot of info on GitHub.
MLiD and RGT got hardly anything right, which is why I'm still kinda scratching my head that MLiD actually got PS5 Pro documents.
So, there are people in GAF that have no idea how games work.
I'm curious to see how the consoles will navigate the feature set for a mid-gen refresh. I know Sony will go for a strong focus on ray tracing, and likely hardware-accelerated upsampling. Similar to what they did with PS4 Pro, but fortunately next-gen upscaling techniques like DLSS 2/3 and FSR 2/3 are significantly better than their predecessors, so I expect some pretty cool things in that regard.
I had looked at that RDNA3 WMMA article, but it doesn't mention dual issue from what I gleaned, and your previously linked article says dual issue and RPM at the same time is only possible with those two instructions. So unless dual issue gets added for WMMA in the custom RDNA3 of the Pro and in RDNA4, a DLSS equivalent (inference) on RDNA3 and RDNA4 AMD GPUs would have the least performance cost when using either of the V_DUAL_DOT2ACC instructions, IMO, because they do dual issue and RPM at the same time. Do you agree with that assessment, or have I missed something?
I could be wrong, but it might be a case where the AI Accelerators need to utilize the SIMD32, or vice versa, to work.
Which is where Dual-Issue comes in.
Each Vector Unit now has two SIMD32s, compared to the one in RDNA1 and 2.
SIMD32 (Float/ INT / Matrix)
SIMD32 (Float/ Matrix)
I figure that's the meaning of Dual-Issue.
One SIMD32 running normal Float operations and the other, Matrix operations at the same time.
When one or both SIMD32s are doing Matrix operations, they utilize the AI Accelerators for higher throughput.
You can look at the RDNA3 CU and Vector Unit diagrams for a better understanding.
Yes, you clearly. No game since Crysis has been CPU limited to the extent you seem to think modern games are. Yes, you might get a 5/10/15% difference in the 1% lows, but it's not going to be the difference between a game hitting 60fps and 30fps with a modern GPU and an 8-core, 16-thread CPU.
goner?
Not the same meaning as in the US
So if I put a Ryzen 7 3700X in my rig, A Plague Tale will run at 30fps on my 4090? What about Cyberpunk? You're wrong, Crysis was a unique title in that it couldn't utilise multiple threads and cores on a CPU.
No. But if you put a Ryzen 3600 (which is better than what the PS5 mounts), yes. So I’m not quite wrong. Timestamped for your viewing pleasure:
I think $599 without a drive is a very real possibility
This is from the same article.
In the first article Mr.Phoenix linked:
Microbenchmarking AMD’s RDNA 3 Graphics Architecture
Editor’s Note (6/14/2023): We have a new article that reevaluates the cache latency of Navi 31, so please refer to that article for some new latency data.
chipsandcheese.com
The text associated with the diagram 2/5 of the way down the page makes the formal comparison between RDNA2 and RDNA3, explains why combining RPM and dual issue isn't straightforward even on RDNA3, and gives insight into the limitations of using dual issue on RDNA3 even without the benefits of RPM.
"A RDNA 3 VOPD instruction is encoded in eight bytes, and supports two sources and one destination for each of the two operations. That excludes operations that require three inputs, like the generic fused multiply add operation. Dual issue opportunities are further limited by available execution units, data dependencies, and register file bandwidth."
I played that game with a 2070S and a 3700X, and I had significantly better performance in that area, at over 70 fps.
DF's machine has a memory latency of 90ns, which is insanely high for a Zen 2 CPU. And it really hurts performance.
I could be wrong here, but isn't the latency higher on PS5 due to it using GDDR memory? Conversely, though, doesn't this give higher bandwidth?
Yes, it's at 140ns. And it's not exactly because it's using GDDR6. It's because the memory controller is tweaked for high bandwidth, not latency.
To make things worse, the Zen 2 CPU on consoles only has 4+4MB of L3, while the desktop version has 16+16MB.
So cache misses are more frequent on consoles, causing more accesses to that slow memory.
Though consoles have other advantages that help to claw back some performance.
Would be nice if there was a cache increase on PS5 Pro as well, as I assume this would give some performance increase without any compatibility issues?
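To put rough numbers on why a smaller L3 plus slower memory compounds, here's a back-of-the-envelope average access time sketch. Only the 90ns and 140ns memory latencies come from this thread; the L3 hit latency and both miss rates are made-up placeholders, so the outputs are illustrative and not measurements.

```python
def average_access_ns(l3_hit_ns, l3_miss_rate, memory_ns):
    """Average access time: hit cost plus miss rate times the memory penalty."""
    return l3_hit_ns + l3_miss_rate * memory_ns

L3_HIT_NS = 10.0            # HYPOTHETICAL L3 hit latency
DESKTOP_MISS_RATE = 0.10    # HYPOTHETICAL, larger 16+16MB L3
CONSOLE_MISS_RATE = 0.25    # HYPOTHETICAL, smaller 4+4MB L3

desktop = average_access_ns(L3_HIT_NS, DESKTOP_MISS_RATE, 90.0)    # 90ns from the thread
console = average_access_ns(L3_HIT_NS, CONSOLE_MISS_RATE, 140.0)   # 140ns from the thread
print(f"desktop ~{desktop:.0f} ns, console ~{console:.0f} ns")
```

The point is just that the two effects multiply: more misses, and each miss costs more. A bigger cache on the Pro would attack the miss rate side of that.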
I paid 1149 guilders back in November 2000. If you take inflation into account, that PS2 cost 900 euro in today's money.
BTW for reference:
PS3 - 600 euro in 2007 - 891 euro in 2024
PS4 - 400 euro in 2013 - 525 euro in 2024
PS5 - 500 euro in 2020 - 593 euro in 2024
I know, right? But hey, we could play DVDs.
That's how I convinced my mom to get me an Xbox.
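If you want to see what those inflation-adjusted prices imply, here's a quick sketch that backs out the cumulative and average yearly inflation from the figures above. The guilder conversion uses the fixed official rate of 2.20371 NLG per euro; the helper names are mine, and the yearly rate is just a geometric average, not actual CPI data.

```python
NLG_PER_EUR = 2.20371  # fixed official guilder-to-euro conversion rate

# console: (launch price in euro, 2024 price from the post, launch year)
prices = {
    "PS2": (1149 / NLG_PER_EUR, 900, 2000),
    "PS3": (600, 891, 2007),
    "PS4": (400, 525, 2013),
    "PS5": (500, 593, 2020),
}

def cumulative_factor(launch, adjusted):
    """How many times more expensive the adjusted price is."""
    return adjusted / launch

def average_yearly_rate(launch, adjusted, years):
    """Geometric average yearly inflation implied by the two prices."""
    return (adjusted / launch) ** (1 / years) - 1

for name, (launch, adjusted, year) in prices.items():
    factor = cumulative_factor(launch, adjusted)
    rate = average_yearly_rate(launch, adjusted, 2024 - year)
    print(f"{name}: x{factor:.2f} overall, ~{rate * 100:.1f}% per year")
```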
Not really looking forward to it because it wasn't available to get when it first came out, just like the Steam Deck.
But this time might be different.
So are you thinking we could get a spec increase if it's on a smaller node?
Most definitely... well, one of two things. Either they increase the clocks of the APU, since going from 6nm down to 4nm will definitely give them the headroom while keeping the same cooling solution they originally designed for the Pro at 6nm. Or they keep the clocks the same and shrink the cooler. These console peeps tend to always go with the latter. So we can only hope.
I take it from my previous responses you agree that dual issue with rapid packed math (RPM) FP16 is actually possible on RDNA3 for a useful PSSR inference implementation on the Pro, getting the peak performance of 67TF/s (assuming custom RDNA3), yes?
I agree in theory, but have my reservations. From my understanding of the doc, RDNA3's implementation of this tech is just convoluted. While there are two sets of 32 ALUs in each CU, they are not identical in what data types they can handle. Then there is the matter of dual-issue operations seemingly being limited to FP16, BF16, INT8, etc. Basically, as it stands now, you can do one FP32 operation simultaneously with as many as 2 FP16/BF16 or 4 INT8 instructions on the only other set of 32 ALUs in the CU that supports all those data types.
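For what it's worth, the VOPD limits from the chipsandcheese text quoted earlier (two sources and one destination per op, so no three-input ops, plus data dependencies) can be sketched as a toy pairing check. This is only a model of that one paragraph. The `Op` type and `can_dual_issue` are hypothetical names, the dependency check is deliberately conservative, and the real rules also involve execution units and register file bandwidth, which are not modelled.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Op:
    name: str
    dst: str
    srcs: Tuple[str, ...]

MAX_SRCS_PER_OP = 2  # from the quoted text: two sources, one destination

def can_dual_issue(x: Op, y: Op) -> bool:
    """Toy check: could x and y share one dual-issue instruction?"""
    # Three-input ops (e.g. a generic fused multiply-add) do not fit.
    if len(x.srcs) > MAX_SRCS_PER_OP or len(y.srcs) > MAX_SRCS_PER_OP:
        return False
    # Conservative dependency check: no shared destination, and neither
    # op may read the register the other one writes.
    if x.dst == y.dst or x.dst in y.srcs or y.dst in x.srcs:
        return False
    return True

mul = Op("mul", "v0", ("v1", "v2"))
add = Op("add", "v3", ("v4", "v5"))
fma = Op("fma", "v6", ("v7", "v8", "v9"))
dep = Op("add", "v10", ("v0", "v4"))   # reads the result of mul

print(can_dual_issue(mul, add), can_dual_issue(mul, fma), can_dual_issue(mul, dep))
```

Even in this simplified form you can see why the compiler doesn't find pairs as often as the peak TF number suggests.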
Though consoles have other advantages that help to claw back some performance.
In the case of PS5, you now have dedicated hardware to manage coherency and a good amount of SRAM integrated on the SoC.
Keppler said PS5 Pro is like AMD 7700 with better RT performance.
Sounds good enough.
Mix that low-level SDK with PSSR and we are good to go.
I was saying this from day one, yet people are going 4070... 4070S... 4080 next week?
It will be above the 7700 XT when RT comes into play, but 95% of console games so far have no RT at all (other than software), so in the majority of games it will be around that level.
PSSR also doesn't change performance, "only" image quality.
But using PSSR, if they decide to run the game at a lower resolution and then use PSSR to upscale it, wouldn't that extra headroom allow higher frame rates?
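On the lower-resolution question, the headroom is easy to ballpark from pixel counts alone. The two resolutions below are hypothetical examples I picked, not anything from the leak, and frame time doesn't scale perfectly with pixel count (the upscaler itself also costs some GPU time), so read this as an upper bound on the saving rather than a frame rate prediction.

```python
def pixel_ratio(render_res, output_res):
    """Fraction of output pixels that are actually shaded before upscaling."""
    render_w, render_h = render_res
    out_w, out_h = output_res
    return (render_w * render_h) / (out_w * out_h)

# HYPOTHETICAL example: render internally at 1440p, upscale to 4K.
ratio = pixel_ratio((2560, 1440), (3840, 2160))
print(f"{ratio:.0%} of the native 4K pixel count")
```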