I completely agree with you. I think the ESRAM will be used as something of a software-controlled cache (kind of like an L3 equivalent).
According to vgleaks, on the CPU an L1 miss costs around 17 cycles and an L2 miss costs around 144-160 cycles. Since the GPU runs at half the clock, the same miss costs about half as many GPU cycles. So basically, if the GPU misses in both caches, it stalls for 80 or so cycles waiting on data from the DDR3.
Acert on Beyond3d mentioned that he heard the SRAM latency is around 16-20 cycles. If true, that's 4-5x lower latency than DDR3, and it will definitely reduce the number of stalls in cases where the data cannot be prefetched into the L2.
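To make the arithmetic explicit, here is a rough Python sketch using only the rumoured figures above. The function names are mine, and it assumes the 16-20 cycle SRAM figure is counted in GPU cycles, which the rumour doesn't actually say.

```python
# Rumoured figures from the post; none of these are confirmed specs.
CPU_L2_MISS_CYCLES = (144, 160)  # CPU cycles for an L2 miss, per vgleaks
SRAM_CYCLES = (16, 20)           # rumoured SRAM latency (assumed to be GPU cycles)
CLOCK_RATIO = 2                  # CPU clock / GPU clock, as stated in the post


def gpu_miss_cycles(cpu_cycles, clock_ratio=CLOCK_RATIO):
    """Express the same wall-clock latency in GPU cycles."""
    return cpu_cycles / clock_ratio


def latency_ratio_range(slow_cycles, fast_cycles):
    """Smallest and largest ratio between two (min, max) latency ranges."""
    lowest = min(slow_cycles) / max(fast_cycles)
    highest = max(slow_cycles) / min(fast_cycles)
    return lowest, highest


# DDR3 miss latency as seen by the GPU: 72 to 80 GPU cycles.
ddr3_gpu_cycles = tuple(gpu_miss_cycles(c) for c in CPU_L2_MISS_CYCLES)

# Using the "80 or so" figure gives the 4-5x quoted above.
print(latency_ratio_range((80, 80), SRAM_CYCLES))

# Using the full 72-80 range, the low end drops to 3.6x.
print(latency_ratio_range(ddr3_gpu_cycles, SRAM_CYCLES))
```

So the 4-5x holds if you take the 80-cycle end of the range. At the 72-cycle end it is closer to 3.6-4.5x.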
My theory is that this design is driven by tile-based rendering, where the main DDR3 RAM will act mainly as a read-only buffer for the inputs (textures, vertices, etc.) and as a write-only target for the final rendered tile.
If the tiles are sized appropriately, then I think all the intermediate data should stay in the SRAM and not need to be written back to main memory. The move engines are then used for all the tile data prefetch from main DDR3 and the final write back to DDR3. You could have a triple-buffered scheme, where you're moving data for the next tile into the SRAM, processing the current tile, and writing the previously processed tile back to main memory, all in parallel. That way all the data movement is hidden and the GPU can just process continually.
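The triple-buffered scheme can be sketched as a simple schedule. This is only my illustration of the idea, not how Durango actually works: the stage names, the three SRAM "slots", and the assumption that every stage takes one equal time step are all hypothetical.

```python
def triple_buffer_schedule(num_tiles, num_slots=3):
    """Return, per time step, which tile each stage handles and in which slot.

    Hypothetical model: in one step, the move engines load tile N,
    the GPU processes tile N-1, and the move engines store tile N-2.
    A tile keeps the same SRAM slot (tile % num_slots) for all three stages.
    """
    schedule = []
    # Two extra steps drain the pipeline after the last tile is loaded.
    for step in range(num_tiles + 2):
        stages = {}
        for name, lag in (("load", 0), ("process", 1), ("store", 2)):
            tile = step - lag
            if 0 <= tile < num_tiles:
                stages[name] = (tile, tile % num_slots)
        schedule.append(stages)
    return schedule


# Each entry maps stage -> (tile index, SRAM slot).
for step, stages in enumerate(triple_buffer_schedule(4)):
    print(step, stages)
```

With three slots, the tiles in flight during any step always sit in different slots, so the load and store never touch the buffer the GPU is working on. From step 2 until the last tile has been loaded, all three stages are busy at once, which is the "data movement is hidden" case.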
In the 360, the ROPs were on the EDRAM die. From the leaked specs and what supposed insiders have mentioned, the ROPs on Durango are not on the ESRAM, so there is no separate internal bandwidth between the ROPs and the memory like the 360 had.