I haven't checked this thread for a few days, so I'm not sure if the discussion above is still relevant. I must be at least 10 pages behind, so I'm posting this blind
There are likely a few factors at play in why a GPU can't be clocked as high as a CPU. Gonna 'try' explaining some seemingly unrelated stuff first cos I believe it'll be relevant. This info is for anybody who's interested in some 'simple' basics. Apologies for being a smartarse

but if it irritates sircaw then it's worth it
Every conductor has resistance that converts some of the electrical energy flowing through it to heat, and in return the heat raises the resistance of the conductor. Alternating current also gets 'impeded': a changing current generates a changing magnetic field at right angles to the flow, and that field induces a back-EMF which opposes the change (long story short, it pushes back on the changing current). So the higher the frequency, the higher the impedance; the longer the conductor, the more resistance and impedance, and the more wasted heat, which further increases the resistance
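To put rough numbers on that, here's a minimal Python sketch using the standard textbook formulas R = rho * L / A for resistance, X_L = 2 * pi * f * L for inductive reactance and P = I^2 * R for heat. The function names and the example wire dimensions are made up for illustration, they are not real chip figures:

```python
import math

COPPER_RESISTIVITY = 1.68e-8  # ohm * metre, approximate value at room temperature


def wire_resistance(resistivity_ohm_m, length_m, area_m2):
    """DC resistance of a uniform conductor: R = rho * L / A."""
    return resistivity_ohm_m * length_m / area_m2


def inductive_reactance(frequency_hz, inductance_h):
    """Opposition to AC caused by inductance alone: X_L = 2 * pi * f * L."""
    return 2 * math.pi * frequency_hz * inductance_h


def resistive_heat(current_a, resistance_ohm):
    """Power burned off as heat in a resistance: P = I^2 * R."""
    return current_a ** 2 * resistance_ohm


# Hypothetical wires: same cross-section, one twice as long as the other.
area = 1e-12  # square metres (assumed)
short_wire = wire_resistance(COPPER_RESISTIVITY, 1e-3, area)
long_wire = wire_resistance(COPPER_RESISTIVITY, 2e-3, area)
print("long / short resistance:", long_wire / short_wire)

# Hypothetical inductance: the reactance grows in step with frequency.
for f in (1e9, 2e9, 4e9):
    print(f"{f / 1e9:.0f} GHz ->", inductive_reactance(f, 1e-9), "ohms")
```

Doubling the length doubles the resistance, and doubling the frequency doubles the reactance, which is the 'longer and faster means more loss' point in numbers.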
For these reasons the industry packs everything in as close as possible. Smaller distances mean less resistance and less impedance, so less heat gets generated/wasted and the chip can run at much higher frequencies with less power (power is the product of current and voltage). There's a limit on how small they can go before they start hitting up against the laws of physics: get too small and quantum tunnelling will cause a shit load of problems
Transistors in modern digital electronics are MOSFETs (metal-oxide-semiconductor field-effect transistors), commonly connected in complementary pairs called CMOS (one FET for pulling the output high '1' and the other for pulling it low '0'). The more refined FinFET works on a similar principle. FETs configured this way draw most of their power (and generate most of their heat) when switching states, with a smaller amount lost to leakage. The state change is NOT instant: because of 'parasitic capacitance' the switching takes time to settle, and that contributes to the 'propagation delay'
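The 'power only when switching' point is usually written as the first-order approximation P = a * C * V^2 * f (activity factor, switched capacitance, supply voltage, clock frequency). That formula isn't from anything Cerny said, it's just the standard textbook estimate, and every number below is an assumed placeholder rather than a measured chip value:

```python
def dynamic_power(activity, capacitance_f, voltage_v, frequency_hz):
    """First-order CMOS switching power: P = a * C * V^2 * f.

    activity      -- fraction of the capacitance that switches each cycle (0..1)
    capacitance_f -- total switched capacitance in farads
    voltage_v     -- supply voltage in volts
    frequency_hz  -- clock frequency in hertz
    """
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Assumed placeholder values, purely for comparing ratios.
base = dynamic_power(activity=0.2, capacitance_f=1e-9, voltage_v=1.0, frequency_hz=2e9)
faster = dynamic_power(activity=0.2, capacitance_f=1e-9, voltage_v=1.0, frequency_hz=4e9)
busier = dynamic_power(activity=0.4, capacitance_f=1e-9, voltage_v=1.0, frequency_hz=2e9)
higher_v = dynamic_power(activity=0.2, capacitance_f=1e-9, voltage_v=1.2, frequency_hz=2e9)

print("2x clock    ->", faster / base, "x power")
print("2x activity ->", busier / base, "x power")
print("1.2x volts  ->", higher_v / base, "x power")
```

Power scales linearly with the clock and with how much of the chip is switching at once, and with the square of the voltage. Leakage is left out of this sketch on purpose.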
The FETs are arranged as logic gates (AND, OR, NOT, XOR etc.) and flipflops (a flipflop is a latch that can be set, reset or toggled)
CPU/GPU registers, cache, and SRAM are made up of flipflops as memory. DRAM is different: each bit is made from 1 FET and 1 capacitor. The capacitor is the memory that holds the electrical charge (voltage), which needs regular refreshing due to leakage. DRAM has much higher memory density than SRAM, which needs 6 FETs to make up its 1-bit memory flipflop. DRAM is slower mainly cos of the heavily multiplexed interface needed to access substantially more memory
The CPU/GPU ALUs, control units, memory controllers, and all the other parts that make up the compute unit cores etc. are made from combinations of logic gates and flipflops. As mentioned, the FinFETs inside the GPU and CPU are the same. There are 2 reasons I can think of why the CPU can be clocked higher than the GPU:
1) This one is gonna be obvious. The average number of transistors (FETs) getting flipped (switched) 'concurrently' will be far greater on the GPU than on the CPU because of the sheer amount of parallel work the GPU does. The CPU has 8 general purpose cores running in parallel, whereas the GPU has 64 cores per compute unit, so in total about 2300 to 3300 specialised cores all running in parallel (if ever fully utilised). That means the GPU is gonna run a lot hotter if it's at the same clock as the CPU!
2) This one is a reason why GPU logic cannot keep up with higher frequencies. The highest clock rate is limited by the maximum 'propagation delay' of the longest chain of 'combinational logic' in the GPU. Propagation delay is the time it takes for a logic gate to settle its output after its inputs have changed (this ain't instant, as explained above). These delays add up with every gate in series before the final result is ready, full adders or multiplexers being examples. I would infer that the ALUs in the GPU's rendering pipelines have very deep logic chains, a lot longer than the ones in CPUs! Hence Cerny's statement that the 'GPU can be clocked higher but the logic won't keep up'
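Both reasons can be turned into rough numbers. A minimal Python sketch: the compute unit counts (36 and 52) are my assumption for where the 2300 to 3300 range comes from, and the gate delays and chain lengths are made-up values just to show the relationship, not real GPU or CPU figures. It also ignores flipflop setup time and clock skew:

```python
def total_shader_cores(compute_units, cores_per_cu=64):
    """Reason 1: total parallel cores = compute units * cores per compute unit."""
    return compute_units * cores_per_cu


def max_clock_hz(gate_delays_s):
    """Reason 2: the clock period can't be shorter than the slowest logic chain.

    gate_delays_s -- propagation delay in seconds of each gate in series
    Returns the upper bound f_max = 1 / (sum of the delays).
    """
    return 1.0 / sum(gate_delays_s)


# Assumed compute unit counts that give the 2300 to 3300 range.
print("36 CUs ->", total_shader_cores(36), "cores")
print("52 CUs ->", total_shader_cores(52), "cores")

# Hypothetical chains: every gate takes 20 ps, only the depth differs.
short_chain = [20e-12] * 10  # 10 gates in series
deep_chain = [20e-12] * 25   # 25 gates in series

print(f"short chain: {max_clock_hz(short_chain) / 1e9:.1f} GHz max")
print(f"deep chain:  {max_clock_hz(deep_chain) / 1e9:.1f} GHz max")
```

Same transistors, same per-gate delay, but the deeper chain caps out at a much lower clock, which is the 'logic won't keep up' idea.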
I think the words Cerny used to explain that the GPU and CPU can be cooled equally easily were 'thermal equilibrium'?