The lip syncing is superb! Best real-time virtual acting I've ever seen, by a long chalk. Do we know if this is baked in from the English capture, or can they use alternative soundtracks and match up to those?
Looking at the research on the Debevec site, I can't really tell: maybe it's a flexible facial rig that can be animated manually, but it's just too perfectly synced, and so much is baked into the animated textures. Combining phonemes and emotional expressions would take a huge effort. Also note how the eye reflections are completely static, because they're baked into the textures as well!
After digesting it a bit, I think it works with two layers of deformation:
- the first is relatively standard image-based mocap: markers drawn on the actor's face, recorded together with the audio, drive the lip sync and the expressions on the base polygonal model
- the second animates (blends between versions of) the color + normal maps, which have a few dozen presets for expressions and phonemes, to add finer detail like wrinkles; these were recorded in the "training" phase of the capture session using stereo photography and the light stage, and they're driven by certain poses of the geometry or by the marker-based mocap
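If that second layer works the way I'm guessing, the texture blending boils down to a weighted sum of the preset maps, with weights driven by the mocap pose. A minimal sketch (the function name, shapes, and weights are all my assumptions, not anything from the demo):

```python
import numpy as np

def blend_maps(presets, weights):
    """Weighted blend of pre-captured texture maps.

    presets: (N, H, W, C) array of N preset maps (color or normal)
    weights: (N,) blend weights, e.g. derived from the mocap pose
    """
    weights = np.asarray(weights, dtype=np.float32)
    weights = weights / weights.sum()  # normalize so the result stays in range
    # sum over the preset axis -> one (H, W, C) blended map
    return np.tensordot(weights, presets, axes=1)

# toy example: two 2x2 single-channel "maps", one all 0s, one all 1s
presets = np.zeros((2, 2, 2, 1), dtype=np.float32)
presets[1] = 1.0
out = blend_maps(presets, [0.25, 0.75])  # every texel ends up at 0.75
```

For actual normal maps you'd also have to re-normalize the blended vectors per texel, which is part of why I'd expect this to cost real processing time.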
Also, the normal maps are probably 1/4 resolution, which is why we don't see any skin pore detail. The only thing they really need to change the color map for is blinking, but they need new normals for a lot of the expressions, and they have to blend the maps whenever various expressions and phonemes are combined. That probably takes a lot of processing, hence the simple lighting and shading.
So it's very unlikely that they could just feed in a sound file and get automatic lip sync. But they don't have to record the entire performance in full detail, only a set of predetermined expressions, and then use "standard" mocap for the actual performance.
As for the rest, my first impression was that they were going for a stylised look.
I'd say it's more about lacking the hardware resources to use more than one light... the animated faces must take up an insane amount of memory! One of the reasons such tech wasn't really used before is that it comes with a lot of compromises.
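As a back-of-envelope check on that memory claim (every number below is an assumption on my part, just to get an order of magnitude):

```python
# Rough memory estimate for the preset texture library; all figures assumed.
presets    = 36                # "a few dozen" expression/phoneme presets
color_res  = 1024              # color map side length, assumed
normal_res = color_res // 4    # ~1/4-res normals, per my guess above
bytes_px   = 4                 # RGBA8, uncompressed

color_bytes  = presets * color_res**2 * bytes_px    # 144 MiB
normal_bytes = presets * normal_res**2 * bytes_px   #   9 MiB
total_mb = (color_bytes + normal_bytes) / 2**20
print(f"~{total_mb:.0f} MiB uncompressed")          # ~153 MiB
```

Even with texture compression that's a big chunk of a console-era memory budget, which fits with the single light and simple shading.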
Actually, I wouldn't be surprised if the game had to enter a "conversation" mode from normal gameplay, and maybe even load for a while.