The 40CU card is clocked very high (1750-1900 MHz), likely way past its perf/watt sweet spot.
A 64CU card can be clocked lower and achieve greater perf/watt. 7nm yields can also factor into higher-than-needed voltages.
| 64CUs (72 CUs - 8 disabled) - Frequency (MHz) | Teraflops |
|---|---|
| 1500 | 12.28 |
| 1580 | 12.94 |
| 1600 | 13.1 |
| 1650 | 13.5 |
 
1600-1650MHz is out of the question imo.
As for the number of CUs, I don't believe it will have 72.
The current Shader Engine can have 20 CUs...
So you have two paths that AMD can take to increase CUs:
1) Up to 4 Shader Engines:
You can have 4 SEs with 16, 18 or 20 CUs each, but not odd numbers.
In theory you can have these options: 4x14 (56), 4x16 (64) or 4x20 (80), etc... multiples of 8.
2) Continue with 2 Shader Engines:
If you stick with 2 Shader Engines then you can have: 2x14 (28), 2x16 (32), 2x18 (36), 2x20 (40)... multiples of 4. Can you have more than 20 CUs per SE?
Navi 20 will probably use that 4 SE configuration (curiously enough, it was the hardware limit of GCN, so maybe it is a limit here too) and I expect the same for PS5/next Xbox.
48, 56, 64, 72 or 80 CUs are the options.
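
A quick sketch of how those totals fall out of the Shader Engine layout. It assumes CUs per SE have to be an even number, as argued above. The helper name `cu_options` and the 12-20 CUs-per-SE range are my own assumptions for illustration (20 is the per-SE figure mentioned above, 12 is just a convenient lower bound).

```python
def cu_options(shader_engines, cus_per_se=range(12, 21)):
    """Total CU counts for a given number of Shader Engines.

    Assumption: only even CU counts per SE are valid.
    The default 12-20 range is illustrative, not a known hardware limit.
    """
    return [shader_engines * n for n in cus_per_se if n % 2 == 0]


print(cu_options(4))  # [48, 56, 64, 72, 80] -> steps of 8
print(cu_options(2))  # [24, 28, 32, 36, 40] -> steps of 4
```

With 4 SEs every step adds 2 CUs to each engine, so the totals move in multiples of 8. With 2 SEs they move in multiples of 4.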
Using the most probable clock:
48CUs @ 1.5GHz = 9.22TFs (7.68TFs with 8 disabled)
56CUs @ 1.5GHz = 10.75TFs (9.22TFs with 8 disabled)
64CUs @ 1.5GHz = 12.29TFs (10.75TFs with 8 disabled)
72CUs @ 1.5GHz = 13.82TFs (12.29TFs with 8 disabled)
80CUs @ 1.5GHz = 15.36TFs (13.82TFs with 8 disabled)
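
For anyone who wants to check the math, here is a minimal sketch of the usual peak FP32 formula: CUs x 64 shaders per CU x 2 FLOPs per clock (one fused multiply-add) x clock. The function and constant names are mine, and the 64 shaders per CU figure is assumed to carry over from existing GCN/RDNA parts.

```python
SHADERS_PER_CU = 64   # stream processors per CU (assumed, as on GCN/RDNA)
FLOPS_PER_CLOCK = 2   # one fused multiply-add counts as 2 FLOPs


def teraflops(cus, clock_ghz):
    """Theoretical peak FP32 throughput in TFLOPs."""
    return cus * SHADERS_PER_CU * FLOPS_PER_CLOCK * clock_ghz / 1000


for cus in (48, 56, 64, 72, 80):
    full = teraflops(cus, 1.5)
    cut = teraflops(cus - 8, 1.5)  # same chip with 8 CUs disabled
    print(f"{cus}CUs @ 1.5GHz = {full:.2f}TFs ({cut:.2f}TFs with 8 disabled)")
```

Running it prints one line per CU count in the same layout as the list above. Swap the clock for 1.58, 1.6 or 1.65 to check other frequencies.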
To be fair, with the high power draw of RDNA I can only see the bolded being an option in an APU for consoles.