I agree with bgassassin that we should be able to identify most blocks on Latte using the Radeon architecture as a reference. However, my interpretation differs quite a bit from his. Here's how I'd place things.
Command Processor: Block C
Setup Engine: I'd have block G as the Vertex Assembler with the tessellator included within. H would be the Geometry Assembler, L the Rasterizer and K the Hierarchical Z. I may have mixed up a few of these blocks, but I generally see this area as the "front end" of the graphics pipeline.
Instruction Cache and Constant Cache: Block D. I am pretty confident in these. The Constant Cache (which is helpful in GPGPU functionality) can be pretty significant in recent chips (up to 128 kB), so I see that as the SRAM pool on the right of this block.
Thread Dispatch: Block I. It looks a lot like the ones in Tahiti. Perhaps smaller than the one on RV770, but then again, it's dealing with far fewer threads.
Interpolators: J Blocks. 32 in total, distributed across the 4 blocks. If Latte is based on the R700 series, you can't leave these out. Not as flexible as having interpolation done by shaders, but having fixed silicon should decrease the shaders' load compared to DX11 chips.
Local Data Shares: Block M. 16 kB for each of the 2 SIMD cores.
SIMD cores: N Blocks. These have been covered pretty well, so no explanation should be needed.
Shader Export: Block P. A lot of SRAM on there for buffers and whatnot, I would think.
Video Output Heads: Q Blocks. I just explained why I concluded this a few posts back.
Global Data Share: Block V. 64 kB. Similar to the block found on Llano in a similar position relative to shaders, TMUs, etc. Upgraded from standard R700 for better compute.
TMUs: T blocks. Again, I've already explained why I think this, but it also makes sense for them to be close to the DDR3 I/O.
L1 caches: S blocks. 8 kB each.
L2 caches: U blocks. 32(?) kB each. Seem to resemble the L2 on RV770 to an extent. I'm only counting the long SRAM banks as actual storage space. There should be a fair amount of other logic/SRAM for cache functionality, so it's hard to say how much is for actual texture data.
ROPs: W blocks. They look like the ROPs on RV770, and their positioning next to L2 and DDR3 I/O makes sense.
North Bridge: Block B? Just kinda guessing with this one but it would be logical for it to be around the CPU interface.
South Bridge/DSP: Block X. Pretty obvious. Marcan helped us with this one.
ARM926 core: Block Y. Ditto. Thanks to Marcan.
Media Decode: Block F. A similar block is also found on Llano. Block E may be related to this or related to the caches in Block D.
Blocks O and R are kind of a tossup, but they may have something to do with moving data to and from the 32 MB eDRAM. Call them "move engines" or whatever - who knows how they work really or how efficient they are. Block A seems to be related to the 1 MB eTC for Wii mode and contains the cache tags and possibly allows interaction with the CPU.
As you can see, this covers all blocks and is pretty standard stuff. Going by this, I don't think we need to draw up imaginative theories to explain Latte. It's a well-designed chip that meets Nintendo's needs. Whether we agree with their use of available technology is another story.
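For anyone who wants to double-check the "covers all blocks" claim, here is a rough Python sketch that transcribes the list above into a lookup table and reports any letter left unassigned. It assumes the die shot labels run A through Y with no gaps, which is only inferred from the letters mentioned in this post. The names PROPOSED, ALL_BLOCKS and unassigned are made up for illustration. The last two constants just work out the per-block figures quoted above.

```python
import string

# Block letter -> proposed function, transcribed from the list above.
# A trailing "?" marks guesses the post itself flags as uncertain.
PROPOSED = {
    "A": "1 MB eTC tags / CPU interaction (Wii mode)?",
    "B": "North Bridge?",
    "C": "Command Processor",
    "D": "Instruction Cache and Constant Cache",
    "E": "Related to Media Decode or to the Block D caches?",
    "F": "Media Decode",
    "G": "Vertex Assembler (with tessellator)",
    "H": "Geometry Assembler",
    "I": "Thread Dispatch",
    "J": "Interpolators",
    "K": "Hierarchical Z",
    "L": "Rasterizer",
    "M": "Local Data Shares",
    "N": "SIMD cores",
    "O": "eDRAM data movement?",
    "P": "Shader Export",
    "Q": "Video Output Heads",
    "R": "eDRAM data movement?",
    "S": "L1 caches",
    "T": "TMUs",
    "U": "L2 caches",
    "V": "Global Data Share",
    "W": "ROPs",
    "X": "South Bridge/DSP",
    "Y": "ARM926 core",
}

# Assumption: the die shot is labeled A..Y with no gaps (25 labels).
ALL_BLOCKS = string.ascii_uppercase[:25]


def unassigned(mapping, labels=ALL_BLOCKS):
    """Return the block letters in `labels` that have no entry in `mapping`."""
    return [block for block in labels if block not in mapping]


# Per-block figures implied by the numbers quoted in the list.
INTERPOLATORS_PER_J_BLOCK = 32 // 4  # 32 interpolators over 4 J blocks
TOTAL_LDS_KB = 16 * 2                # 16 kB for each of the 2 SIMD cores

if __name__ == "__main__":
    print("Unassigned blocks:", unassigned(PROPOSED))
    print("Interpolators per J block:", INTERPOLATORS_PER_J_BLOCK)
    print("Total LDS (kB):", TOTAL_LDS_KB)
```

An empty "Unassigned blocks" list only means every letter has a guess attached, not that the guesses are right.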