So if a card uses brute force to get 12 TFLOPS, then it's still better than a card that uses a more efficient way to get 10 TFLOPS?
Not necessarily. The opposite can happen: NVIDIA cards do more in games with 9 TFLOPS than AMD cards do with 12-13 TFLOPS.
It all depends on how well the hardware is optimized to use these units and how the software is coded to make it easy for the hardware to use them.
Now, expecting to use 100% of the SPs in a card and reach the max TFLOPS output is utopian.
It is not possible to keep 100% of the SPs busy all the time. Sometimes the game doesn't even need all of these units, and there are bottlenecks all the time. For example, if the data doesn't arrive in time from RAM, that chunk of SPs will sit idle until the data reaches the registers, so that alone already keeps the GPU from reaching its max TFLOPS.
There are a lot of other situations like that.
TFLOPS is the metric that tells you the max processing power a GPU can have. It is like the max speed your car shows on the speedometer: can you actually reach the speed shown on the dial? In my cars most show 220 or 250 km/h, but even in optimal situations (on a race track with all the best conditions) I could never reach more than 205 or 210 km/h.
It is the same here. Peak performance in TFLOPS is basically impossible to achieve, so the GPU that can put more of its TFLOPS to use in games (that is what efficiency means) will be the better architecture for games.
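
To put rough numbers on this, here is a small Python sketch. The peak formula (units x 2 FLOPs per cycle for a fused multiply-add x clock) is the usual way the spec-sheet TFLOPS figure is derived. Everything else is made up for illustration: the two cards, their shader counts and clocks, and especially the utilization percentages are hypothetical, not measurements of any real NVIDIA or AMD product.

```python
def peak_tflops(shader_units: int, clock_ghz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical peak: every SP issues one FMA (counted as 2 FLOPs) every cycle."""
    # units * FLOPs/cycle * GHz gives GFLOPS; divide by 1000 for TFLOPS
    return shader_units * flops_per_cycle * clock_ghz / 1000.0


def effective_tflops(peak: float, utilization: float) -> float:
    """What is left once SPs sit idle part of the time (utilization in 0..1)."""
    return peak * utilization


# Hypothetical cards, NOT real products
card_a = peak_tflops(shader_units=4096, clock_ghz=1.5)  # the "brute force" card
card_b = peak_tflops(shader_units=2560, clock_ghz=1.8)  # the "efficient" card

# Assumed average utilization in games, purely illustrative
eff_a = effective_tflops(card_a, 0.60)
eff_b = effective_tflops(card_b, 0.85)

print(f"Card A: peak {card_a:.2f} TFLOPS, effective {eff_a:.2f} TFLOPS")
print(f"Card B: peak {card_b:.2f} TFLOPS, effective {eff_b:.2f} TFLOPS")
```

With these assumed numbers, card A wins on the spec sheet but card B gets more actual work done, which is the whole point: the TFLOPS figure is the number on the speedometer, not the speed you really reach.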