We added support for AGS WMMA intrinsics through VK_KHR_cooperative_matrix and VK_KHR_shader_float8,
which is enough to support FSR4.
Note that these shaders are tightly coded for AMD GPUs with some implementation defined behavior
(particularly around matrix layouts), and they will not necessarily work on other GPU vendors.
There is also a quite hacky emulation path of this which relies on int8 and float16 cooperative matrix support,
which can run on older GPUs at significant performance cost (and some cost to theoretical correctness).
Note that the default "official" build of vkd3d-proton only exposes this feature when the native
VK_KHR_shader_float8 is properly supported, i.e. RDNA4+ only.
The emulation path is available when building from source with the appropriate build flags.
The decision to not include this emulation path by default is over my pay grade.
The aim is to be able to ship FSR4 in a more proper way in Proton.