This reminds me of AMD's issues with OpenCL and the Blender Cycles rendering engine. It just destroys your computer's memory when you hack it to get it to work. The Blender devs can't easily fix it because that would require undesirable modifications to the code (maintaining separate CUDA / OpenCL branches). AMD has discussed ways to fix the OpenCL compiler, so the issue may be resolved before this new architecture is ever developed (if it ever gets big, I should say; this could be pipe dreaming). And of course, with large scenes even CUDA suffers from this problem. In that event, having full access to internal memory, which should be 16 GB as standard on high-performance rigs in a few more years (according to Steam, 4-8 GB is standard now), will be huge, just wonderful for graphics performance. And it could indicate why AMD isn't terribly concerned about chopping OpenCL's compiler up to fit the GPU-RAM / CPU-RAM model (which, as Blender shows, NVIDIA / CUDA is better at doing).
Note: take this rambling with a grain of salt; corrections to my observations are welcome.