Parallelized emulation is incredibly difficult, if not impossible. To run everything at once, you need as many cores in the host system as there are discrete threads in the emulated system. If you have fewer, then at least one emulated thread has to sit and wait for a core to free up before it can get results from another, related thread. If you have more, the extra cores don't help: splitting one emulated thread across two cores just means the second core can't do anything until it gets results from the first, so the thread might as well keep running on the same core.
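To put rough numbers on that, here is a toy model, not a real emulator. It assumes every emulated thread has to reach a sync point before any of them can start the next step, and that one emulated thread takes one host time slice per step. The function name host_slices_needed and the figures of 6 emulated threads and 1000 steps are made up for illustration.

```python
import math


def host_slices_needed(emulated_threads: int, host_cores: int, sync_steps: int) -> int:
    """Host time slices needed to run `sync_steps` lockstep steps.

    Assumption (hypothetical model): all emulated threads must finish a step
    before any of them may begin the next one, and each emulated thread
    occupies one host core for one slice per step.
    """
    if emulated_threads < 1 or host_cores < 1 or sync_steps < 0:
        raise ValueError("thread and core counts must be >= 1, steps >= 0")
    # Only `host_cores` emulated threads can run at once, so each step
    # takes this many rounds before everyone has reached the sync point.
    rounds_per_step = math.ceil(emulated_threads / host_cores)
    return sync_steps * rounds_per_step


if __name__ == "__main__":
    for cores in (2, 4, 6, 8, 12):
        print(cores, "host cores ->", host_slices_needed(6, cores, 1000), "slices")
```

Under those assumptions, 2 cores need 3000 slices, 4 cores need 2000, and 6, 8, and 12 cores all need 1000. Past the number of emulated threads, more cores buy nothing in this model.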
Extra cores come in handy when you have to do things like emulate sound processors and vector units, but in the case of going from 6 SPUs to 400 shader units... that just doesn't work. I can't think of any way to make it work. Even if devs went back to the source code, they'd probably have to completely rewrite everything unless they used something like OpenCL.