An update on the whole RT situation, based on
a post by Dictator at Beyond3D, with some of my own thoughts mixed in.
The main difference between nVidia's RT and AMD's RT is that nVidia's RT cores handle both the BVH intersection tests and the ray traversal, whereas AMD's RAs (Ray Accelerators) handle only the BVH intersection tests, and the ray traversal is done on the CUs.
AMD's implementation has the advantage that you could write your own traversal code to be more efficient and optimize it on a per-game basis. The drawback is that the DXR API is apparently a black box, which prevents developers from writing such code on PC, a limit the consoles do not have. AMD can still change the traversal code in their drivers, which means working with developers becomes increasingly important.
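To make the intersection/traversal split a bit more concrete, here is a rough Python sketch. It is not how AMD's drivers or anyone's hardware actually work; `Node`, `ray_hits_box` and `traverse` are names I made up. The box test stands in for the fixed-function part, and the loop around it stands in for the part that would be ordinary, tweakable shader code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    lo: tuple                 # AABB min corner (x, y, z)
    hi: tuple                 # AABB max corner (x, y, z)
    children: list = field(default_factory=list)  # empty for a leaf
    prim: str = ""            # hypothetical primitive id stored in a leaf


def ray_hits_box(origin, direction, lo, hi):
    """Slab test: stands in for the fixed-function intersection work."""
    t_near, t_far = 0.0, float("inf")
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this slab: it must start inside it.
            if o < l or o > h:
                return False
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def traverse(root, origin, direction):
    """Stack-based traversal loop: stands in for the programmable part."""
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        if not ray_hits_box(origin, direction, node.lo, node.hi):
            continue
        if node.children:
            stack.extend(node.children)  # visit order is a tunable policy
        else:
            # Simplification: a leaf box hit counts as a candidate hit,
            # with no ray/triangle test.
            hits.append(node.prim)
    return hits


left = Node((0, 0, 0), (1, 1, 1), prim="A")
right = Node((2, 0, 0), (3, 1, 1), prim="B")
root = Node((0, 0, 0), (3, 1, 1), children=[left, right])

print(traverse(root, (0.5, 0.5, -1.0), (0.0, 0.0, 1.0)))  # ['A']
```

Things like the visit order, early-outs or the stack layout all live in `traverse`, which is the kind of thing you could tune per game if you had access to that loop.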
nVidia's implementation has the advantage that the traversal is hardware-accelerated, so whatever you throw at it is bound to perform relatively well. That comes at the cost of programmability, which doesn't matter much to them for now because, as mentioned before, DXR is a black box. It also saves them from having to keep writing per-game driver optimizations.
That doesn't mean that nVidia's approach is necessarily better overall, but in the short term it is bound to be, because little optimization is required. Apparently developers like Sony's traversal code on the PS5 as is, so maybe something similar will end up in the AMD drivers down the line, if Sony is willing to share it with AMD.
I hinted at this a while back: on the AMD cards, the number of CUs dedicated to RT is variable. There is an optimal balancing point somewhere in how the CUs are divided between RT and rasterization, and that point changes on a per-game and per-setting basis.
For example, if you only have RT shadows, maybe 20 CUs dedicated to the traversal are enough and the rest handle the rasterization portion. Both would output around 60 fps, so they balance out. But if you have many RT effects, having a bunch of CUs for rasterization and only a few for RT makes little sense. The RT portion will output only 15 fps while the rasterization portion does 75 fps, and that unbalanced distribution leaves all those rasterization CUs idling once they are done, because they have to wait for the RT to finish anyway.
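Here is that balancing idea as a toy Python model. Everything in it is an assumption on my part: an 80 CU card, throughput scaling linearly with CU count, a fixed split between RT and rasterization, and per-CU rates picked only so the two splits echo the 15/75 and 60/60 figures above. None of it comes from a measurement, and real GPUs schedule work far more dynamically than this.

```python
# Toy model only: linear scaling and a static CU split are assumptions.
TOTAL_CUS = 80           # hypothetical card size
RT_FPS_PER_CU = 3.0      # made-up rate for the RT traversal work
RASTER_FPS_PER_CU = 1.0  # made-up rate for the rasterization work


def frame_rate(rt_cus, total_cus=TOTAL_CUS):
    """Return (rt_fps, raster_fps, delivered_fps) for a given split."""
    raster_cus = total_cus - rt_cus
    rt_fps = rt_cus * RT_FPS_PER_CU
    raster_fps = raster_cus * RASTER_FPS_PER_CU
    # The slower side dictates the frame rate; the other side idles.
    return rt_fps, raster_fps, min(rt_fps, raster_fps)


def best_split(total_cus=TOTAL_CUS):
    """Brute-force the RT CU count that maximizes the delivered frame rate."""
    return max(range(1, total_cus), key=lambda rt: frame_rate(rt, total_cus)[2])


print(frame_rate(5))    # (15.0, 75.0, 15.0)
print(best_split())     # 20
print(frame_rate(20))   # (60.0, 60.0, 60.0)
```

With only 5 CUs on RT, the rasterization side could do 75 fps but the frame still lands at 15 fps. Moving to the 20/60 split brings both sides to 60 fps. Change the per-CU rates (a different game or different settings) and the best split moves with them.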
AMD's approach makes sense, because it has to cater to both the consoles and the PC. nVidia's approach also makes sense, because for them, only the PC matters.