It is faster in throughput. But throughput isn't what we're after here. We're looking for latency.
OK, so when you're encoding you have three types of frames: I-frames, P-frames, and B-frames. An I-frame is a complete representation of an image, like a JPEG of the scene. A P-frame is predictive and one-way: it encodes motion relative to previous frames only. B-frames are bidirectional: they encode the motion of pixels relative to both previous and future frames.
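The dependencies above can be sketched as a toy model (the names and structure here are illustrative, not any real codec's API):

```python
# What each frame type needs before it can be decoded.
FRAME_DEPS = {
    "I": [],                  # self-contained, like a JPEG of the scene
    "P": ["past"],            # predicted from earlier frames only
    "B": ["past", "future"],  # predicted from earlier AND later frames
}

def decodable(frame_type, have_past, have_future):
    """Can this frame be decoded, given which references have arrived?"""
    deps = FRAME_DEPS[frame_type]
    if "past" in deps and not have_past:
        return False
    if "future" in deps and not have_future:
        return False
    return True
```

The point to notice: a B-frame is the only type that is stuck waiting on a frame from the *future*, which is exactly why it's a problem for streaming.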
You're streaming, so B-frames are immediately useless: a B-frame can't be decoded until a future frame has arrived. Throw them out. But now you have the problem that motion estimation normally looks ahead across several frames. x264 has a low-latency mode where it looks ahead using only the next frame. Keep in mind it's pulling from the frame buffer, so any VBLANK timings are mostly irrelevant at this point, and the latency can get pretty low on a decent setup.
Now, normally decoding can only start once a complete frame has been received. But we're after low latency, so software encoders do slicing: the frame is split into a number of self-contained slices that can be decoded independently, so as soon as a slice is received it can be decoded. This reduces latency again, because decoding pipelines with the transfer instead of waiting for a full frame and then hurrying to decode it.
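A rough back-of-the-envelope for what that pipelining buys you, assuming each slice's decode time fits inside the next slice's transmit time (both numbers below are made up for illustration):

```python
def frame_latency_ms(transmit_ms, decode_ms, slices=1):
    """Receive+decode latency contribution of one frame.

    Without slicing: wait for the whole frame, then decode all of it.
    With N self-contained slices, slice k decodes while slice k+1 is
    still arriving, so after the last byte lands only the LAST slice's
    decode time remains (assuming per-slice decode <= per-slice transmit).
    """
    if slices == 1:
        return transmit_ms + decode_ms
    return transmit_ms + decode_ms / slices
```

For example, with 16 ms of transmission and 8 ms of decode, a single-slice frame costs 24 ms while four slices cost 18 ms: the frame's arrival time is fixed, but almost all of the decode now hides behind it.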
The last step is making the bitrate and encoding latency consistent. I-frames are a full image smack in the middle of all this prediction: they screw with your bitrate, and they take the longest to encode. So what x264 did was implement periodic intra refresh. Instead of a full I-frame every 25 (or so) frames, a column of intra-coded blocks sweeps across the frame, one column per frame, so the "I-frame" is spread continuously throughout the stream. You get an almost perfectly smooth bitrate! The result is that you can run the stream with a minimal amount of buffering, which again reduces latency.
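A toy bit-cost model shows the shape of the difference (the 10:1 intra-vs-inter cost ratio is made up purely for illustration; in real x264 this behavior is the `--intra-refresh` option):

```python
def bits_per_frame(n_frames, cols, keyframe_interval,
                   intra_bits_per_col=10, inter_bits_per_col=1,
                   use_intra_refresh=False):
    """Toy model: intra-coded columns cost 10x inter-coded ones."""
    out = []
    for f in range(n_frames):
        if use_intra_refresh:
            # One column per frame is intra-coded, sweeping across the
            # frame and wrapping around; the rest are cheap inter columns.
            bits = intra_bits_per_col + (cols - 1) * inter_bits_per_col
        else:
            if f % keyframe_interval == 0:
                bits = cols * intra_bits_per_col   # full I-frame spike
            else:
                bits = cols * inter_bits_per_col   # cheap P-frame
        out.append(bits)
    return out

spiky  = bits_per_frame(50, cols=20, keyframe_interval=25)
smooth = bits_per_frame(50, cols=20, keyframe_interval=25,
                        use_intra_refresh=True)
```

In the spiky schedule every 25th frame is an order of magnitude bigger than its neighbors, so the receiver must buffer deep enough to absorb those spikes. The intra-refresh schedule emits the same number of bits per frame every frame, which is what lets the buffer shrink.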
And none of these tricks are supported by the hardware encoder implementations. So you're stuck with the old-school multi-frame lookahead, more latency, and so on.