One compute unit in GCN has 4 vector units, and each has 16 processing elements.
Those processing elements are the shader cores.
Try the AMD OpenCL Programmer's Guide; it really cleared some things up for me when I had my interm. I could have swapped some of the terms, since it has been about 3 to 4 months since I last looked into it.
This is what VGLeaks said (the part that I don't understand):
"Shader cores
12
Instruction issue rate
12 SCs * 4 SIMDs * 16 threads/clock = 768 ops/clock
FLOPs
768 ops/clock * (1 mul + 1 add) * 800 MHz = 1.2 TFLOPS"
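For anyone who wants to check the quoted arithmetic, here is a rough sketch in Python. The numbers come straight from the VGLeaks quote; the variable names are just made up for illustration.

```python
# Numbers taken from the quoted VGLeaks text; names are illustrative only.
shader_cores = 12
simds_per_core = 4
threads_per_clock_per_simd = 16
clock_hz = 800_000_000          # 800 MHz
flops_per_op = 2                # 1 mul + 1 add

ops_per_clock = shader_cores * simds_per_core * threads_per_clock_per_simd
flops = ops_per_clock * flops_per_op * clock_hz

print(ops_per_clock)            # 768
print(flops / 1e12)             # roughly 1.23, which the quote rounds to 1.2 TFLOPS
```

So the multiplication itself works out; whether 16 is the right per-SIMD number is a separate question.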
L1 Cache
"64-way L1 cache of 16 KB, composed of 256 64-byte cache lines"
(so 256 lines long)
"SIMD executes a vector instruction on 64 threads at once in lockstep."
"On each clock cycle, the scheduler considers one of the four SIMDs, iterating over them in a round-robin fashion. Most instructions have a four cycle throughput, so each SIMD only needs attention once every four clocks."
On VGLeaks they are using the formula 12 * 4 * 16 threads/clock.
12 stands for the shader cores, 4 is the number of SIMDs in every shader core, and 16 is the threads each SIMD should do per clock.
But why 16?!?! They are saying that a SIMD executes 64 threads/clock.
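One possible way to read the two quotes together, as a sketch only: if a 64-thread vector instruction takes four cycles on one SIMD (the "four cycle throughput" from the quote), then each SIMD averages 16 threads per clock. The names below are hypothetical, and this assumes the four cycles apply to the instruction being counted.

```python
# Assumption: one 64-thread vector instruction occupies a SIMD for 4 cycles,
# as the quoted "four cycle throughput" suggests. Names are illustrative.
threads_per_instruction = 64
cycles_per_instruction = 4
simds_per_core = 4

# Average threads processed per clock by a single SIMD.
threads_per_clock_per_simd = threads_per_instruction // cycles_per_instruction

# Round-robin: which SIMD the scheduler considers on each of 8 clocks.
schedule = [clock % simds_per_core for clock in range(8)]

print(threads_per_clock_per_simd)   # 16
print(schedule)                     # [0, 1, 2, 3, 0, 1, 2, 3]
```

Under that reading, "64 threads at once" describes the size of the group an instruction covers, and 16 is what one SIMD gets through per clock.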
And why on GCN is the L1 not 64-way, and not 256 long but 64?
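The cache numbers can be checked the same way. This is a rough sketch using the usual set-associative relation (lines = sets * ways). The 16 KB, 64-byte and 64-way figures are from the VGLeaks quote; the 4-way case is only an assumption added for comparison, not something the quote says.

```python
# Figures from the quoted text; names are illustrative only.
cache_bytes = 16 * 1024
line_bytes = 64

total_lines = cache_bytes // line_bytes   # how many cache lines fit in 16 KB

def sets_for(ways: int) -> int:
    """Number of sets when total_lines are grouped into `ways` lines per set."""
    return total_lines // ways

print(total_lines)     # 256
print(sets_for(64))    # 4  -> 64-way as quoted gives 4 sets
print(sets_for(4))     # 64 -> assumed 4-way layout gives 64 sets
```

So 256 is the total number of lines either way, and a 64 could show up as the number of sets if the cache were 4-way rather than 64-way.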
Please, if you have an answer, help me.
Is the VGLeaks text wrong? Is their math formula wrong?
I know that MisterX is talking about this too, and I'm
NOT supporting his theories, but does this VGLeaks formula sound crazy or not?