There are two basic approaches to face mocap.
The first method is to place small optical markers on the face and have the system treat them like the larger markers on the body, tracking their movement in 3D space. This translation data is then used directly to drive the bones in the face rig.
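A minimal sketch of the "markers drive bones" idea, assuming each face bone is paired with exactly one tracked marker and that a neutral (rest) frame of marker positions was recorded. The function name `drive_bones`, the mapping `bone_for_marker` and the single `scale` factor are hypothetical, chosen only for illustration:

```python
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]

def drive_bones(neutral: Dict[str, Vec3],
                frame: Dict[str, Vec3],
                bone_for_marker: Dict[str, str],
                scale: float = 1.0) -> Dict[str, Vec3]:
    """Hypothetical marker-to-bone mapping: each bone's translation offset is
    its marker's displacement from the neutral pose, times a uniform scale."""
    offsets: Dict[str, Vec3] = {}
    for marker, bone in bone_for_marker.items():
        rest = neutral[marker]
        current = frame[marker]
        # Raw displacement in capture space, scaled to the rig's proportions.
        offsets[bone] = tuple(scale * (c - r) for c, r in zip(current, rest))
    return offsets

neutral = {"lip_corner_L": (3.0, 0.0, 1.0)}
frame = {"lip_corner_L": (3.5, 0.25, 1.0)}
print(drive_bones(neutral, frame, {"lip_corner_L": "bone_lip_corner_L"}))
```

Note that the only retargeting knob here is a plain scale on raw millimeters, which hints at why this approach struggles when the character's face differs a lot from the actor's.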
The problem is that the human face is far more complex, and we rely on that complexity to read facial expressions. You can understand body language well enough without proper muscle deformations, tendons and so on, but on the face you need all the folds and wrinkles, and the soft tissue pushing and pulling over the bones.
However, you cannot place enough marker dots to capture all that, and an optical mocap system probably couldn't handle that many anyway. Face markers also mean you need a gazillion cameras on your mocap stage, so that every marker is always seen by at least two of them.
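The "at least two cameras" requirement comes from geometry: a single camera only sees a 2D dot, which tells you the marker lies somewhere along a ray, not how far away it is. A second ray pins down the depth. Below is a sketch of the classic two-ray midpoint method, assuming calibrated cameras that already convert each 2D detection into a world-space ray (origin plus direction). `triangulate_midpoint` is a hypothetical name; real systems typically solve over every camera that sees the marker rather than just two.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]

def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def triangulate_midpoint(o1: Vec3, d1: Vec3, o2: Vec3, d2: Vec3) -> Vec3:
    """Estimate a marker's 3D position from two camera rays (origin o, direction d)
    as the midpoint of the shortest segment connecting the two rays."""
    w0 = tuple(p - q for p, q in zip(o1, o2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is ambiguous")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = tuple(o + s * k for o, k in zip(o1, d1))
    p2 = tuple(o + t * k for o, k in zip(o2, d2))
    return tuple((x + y) / 2.0 for x, y in zip(p1, p2))

# Two cameras looking at a marker at (1, 2, 3).
print(triangulate_midpoint((0.0, 0.0, 0.0), (1.0, 2.0, 3.0),
                           (4.0, 0.0, 0.0), (-3.0, 2.0, 3.0)))
```

With noisy real-world rays the two closest points no longer coincide, and the midpoint is a simple compromise between them.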
An interesting development here is Mova Contour, which applies fluorescent makeup to the face to get thousands of tracking points and therefore much higher fidelity. However, it requires the actor to sit still, so it cannot be used for performance capture, where body and face are recorded simultaneously.
You can of course add extra elements on top of the bone-based face rig, like helper bones to properly control the eyelids or the inside of the lips, and you can probably derive their movement from the tracked markers. But ultimately you only capture a small subset of the skin surface, so the deformations will suffer. This method also breaks down when the CG character's face differs significantly from the actor's. Just look at the creepy human-goblin.
It's also very hard to animate by hand on top of this, as every bone has to be manipulated individually, and you cannot reuse any work because it's all applied as relative offsets on top of the mocap data.
The other approach is to track the face, then try to interpret what the actor is doing and generate some kind of metadata that can drive a facial rig built around facial expressions. So instead of tracking how many millimeters the jaw or an eyelid moves, you're trying to get a percentage value for "jaw opener" or "lower eyelid raiser".
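Here is how such percentage values could drive a rig, sketched with a standard linear blendshape model: every expression channel stores per-vertex offsets from the neutral mesh, and the deformed face is the neutral mesh plus the weighted sum of those offsets. The channel names `jaw_opener` and `lower_eyelid_raiser` just mirror the examples above, and `evaluate_blendshapes` is a hypothetical helper, not any particular engine's API.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]

def evaluate_blendshapes(neutral: List[Vec3],
                         deltas: Dict[str, List[Vec3]],
                         weights: Dict[str, float]) -> List[Vec3]:
    """Linear blendshapes: result = neutral + sum(weight * per-vertex delta)."""
    result = [list(v) for v in neutral]
    for name, w in weights.items():
        w = min(max(w, 0.0), 1.0)  # clamp the channel to 0..100%
        for i, d in enumerate(deltas[name]):
            for k in range(3):
                result[i][k] += w * d[k]
    return [tuple(v) for v in result]

neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
deltas = {
    "jaw_opener": [(0.0, -2.0, 0.0), (0.0, 0.0, 0.0)],
    "lower_eyelid_raiser": [(0.0, 0.0, 0.0), (0.0, 0.5, 0.0)],
}
# 50% jaw opener, 100% lower eyelid raiser.
print(evaluate_blendshapes(neutral, deltas,
                           {"jaw_opener": 0.5, "lower_eyelid_raiser": 1.0}))
```

The capture side only has to deliver the weights dictionary; all the folds and wrinkles live in the sculpted deltas.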
The upside is that you don't need to track the deformations on the actor's face; you build them into the rig yourself, so the tracking doesn't need such fine granularity. You can also track pupil dilation, and the markers only need to be painted on, which is less obtrusive for the actor. And you can use a single head-mounted camera, so the mocap stage can be somewhat simpler.
However, head-mounted face cameras can get in the way in some situations, you need all sorts of electronic equipment to sync them with the body mocap and the voice recording, and you have to worry about batteries and such. Still, it's less expensive than buying another 20-50 cameras.
On the other hand, you do have to build the deformations into the face rig yourself, and that can take some work. You need either talented artists to sculpt blendshapes or multi-talented riggers to build the expressions with a bone rig; alternatively, you can scan the actors' expressions, but that requires a significant investment. But you only have to build these deformations once and can then reuse the same blink, smile or whatever. This also helps keep a character's facial performance consistent.
The capture, however, requires special software called a solver that can recognize facial expressions. This is why it took so long for solutions to appear: solvers had to be built on a lot of research and need fast hardware. There are also very few of them on the market, which is probably why 343 wrote their own.
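Real solvers are largely image-based and far more sophisticated than anything that fits here, so treat this strictly as a toy. Assume the painted markers move linearly with the expression weights: each expression at 100% produces a known marker displacement (one column of a matrix B, which would have to be recorded or derived from the rig). Recovering the weights from an observed displacement d then becomes a box-constrained least-squares problem, solved below with projected gradient descent. The linear model and the name `solve_weights` are assumptions for illustration, not how any particular studio's solver works.

```python
from typing import List

def solve_weights(basis: List[List[float]], observed: List[float],
                  iterations: int = 500) -> List[float]:
    """Toy expression solver: find weights w in [0, 1] minimising
    0.5 * ||B w - d||^2 by projected gradient descent.
    basis[i][j] = displacement of marker coordinate i for expression j at 100%."""
    m, n = len(observed), len(basis[0])
    # Step 1/L with L = squared Frobenius norm of B, an upper bound on the
    # largest eigenvalue of B^T B, which keeps each descent step stable.
    lipschitz = sum(b * b for row in basis for b in row)
    step = 1.0 / lipschitz if lipschitz > 0 else 0.0
    w = [0.0] * n
    for _ in range(iterations):
        residual = [sum(basis[i][j] * w[j] for j in range(n)) - observed[i]
                    for i in range(m)]
        grad = [sum(basis[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step, then project each weight back into 0..100%.
        w = [min(max(w[j] - step * grad[j], 0.0), 1.0) for j in range(n)]
    return w

# Two expressions observed through three marker coordinates.
basis = [[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
print(solve_weights(basis, [0.3, 1.0, 0.0]))
```

The clamping matters: an actor pulling a face slightly beyond what the rig's 100% pose covers should saturate the channel rather than produce a 130% expression.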
Or you can use a human to do the "capture", as ND does. Their system, by the way, is a bone-based rig augmented with blendshapes that has predefined facial expression poses for the bones, so technically it's the second approach. The only difference is that the PS3 doesn't have enough memory for a full blendshape rig, so they replicate its workings with bones instead.
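The bone-based variant of the expression approach can be sketched the same way: each predefined expression pose stores an offset per bone, and the animator (or a solver) only sets weights. This is not ND's implementation; `pose_bones` is a hypothetical helper, and for brevity it assumes translation-only poses, whereas real bone rigs usually store rotations too.

```python
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]

def pose_bones(rest: Dict[str, Vec3],
               expression_poses: Dict[str, Dict[str, Vec3]],
               weights: Dict[str, float]) -> Dict[str, Vec3]:
    """Blend predefined expression poses on a bone rig: each bone ends up at its
    rest position plus the weighted sum of its offsets in all active poses."""
    result = {bone: list(pos) for bone, pos in rest.items()}
    for expr, w in weights.items():
        for bone, offset in expression_poses[expr].items():
            for k in range(3):
                result[bone][k] += w * offset[k]
    return {bone: tuple(pos) for bone, pos in result.items()}

rest = {"jaw": (0.0, 0.0, 0.0), "lip_corner_L": (2.0, 1.0, 0.0)}
poses = {
    "smile": {"lip_corner_L": (0.5, 0.5, 0.0)},
    "jaw_open": {"jaw": (0.0, -1.0, 0.0), "lip_corner_L": (0.0, -0.5, 0.0)},
}
print(pose_bones(rest, poses, {"smile": 1.0, "jaw_open": 0.5}))
```

Each expression here costs one offset per bone rather than one per vertex as in a blendshape target, which is the kind of saving that makes this attractive when memory is tight.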
http://beyond3d.com/showpost.php?p=1752931&postcount=67