The Tflops are useless, it is necessary to evaluate how the system works as a whole.
8 DCU-16 CU hanging from the same RDNA2 branch is not a good deal.
AMD doesn't have any GPUs for now (leaked) RDNA2 has more than 5 DCUs hanging from the same L1. Although they have added L0 it does not solve it or AMD would have put more DCUs hanging. In fact 80 Cu is 2 times 40 CU and never exceeds 5 DCU per branch.
For me, taking advantage of those DCUs above 10 is requiring a low level programming with great care with caches. It's a headache.
However, it is only a speculation of mine, although with foundation.
TFlops are of no use if there is no control and there are no Hits in the caches, since they will remain waiting most of the time. That is why it is necessary to see how the complete system performs.