@Laa-Yosh You can see Remedy's facial scanning rigs (medium, high quality) in this vid.
This is actually quite interesting because I haven't really seen any combination like this before.
They have basically three techs:
- Photogrammetry rig with a lot of cameras for standard 3D capture. This is mostly for geometry: the base head model with normal maps, plus standard facial blendshapes, where you get geometry, normal maps, and diffuse maps that can be used to show blood flow changes when the skin is strongly stretched or compressed (see the sketch after this list).
- Performance capture rig, probably one or more head-mounted cameras, to capture the facial performance together with the body performance and voiceover, so everything stays in sync. This is also quite common.
- Photogrammetry rig for 4D capture. This is where it gets a bit tricky. This rig basically produces a stream of 3D head scans at 60fps, so you get a very good but somewhat rough animated head (rough particularly because it only uses 6 cameras), good for capturing extreme deformations and such. What I don't get is how they combine the results of this with the above two.
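To make the blendshape part concrete, here's a minimal sketch of how static blendshapes and their matching maps are typically combined at runtime. Everything here (array layouts, function names) is my own assumption for illustration, not Remedy's actual pipeline:

```python
import numpy as np

def evaluate_face(neutral, shape_deltas, weights):
    """Evaluate a blendshape face: the neutral-pose vertices plus a
    weighted sum of per-expression vertex deltas.

    neutral:      (V, 3) neutral-pose vertex positions
    shape_deltas: (S, V, 3) per-shape offsets from the neutral pose
    weights:      (S,) animator- or solver-driven weights in [0, 1]
    """
    return neutral + np.tensordot(weights, shape_deltas, axes=1)

def blend_maps(neutral_map, expression_maps, weights):
    """Blend per-expression diffuse (blood flow) or wrinkle normal maps
    with the same weights that drive the geometry, so the texture
    changes stay in sync with the stretching/compression of the skin."""
    maps = np.asarray(expression_maps, dtype=np.float32)
    deltas = maps - neutral_map  # (S, H, W, C) differences from neutral
    return neutral_map + np.tensordot(np.asarray(weights, np.float32), deltas, axes=1)
```

The key point is that geometry and maps share the same weights, which is why the photogrammetry session captures both at once.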
The usual method is to use the photogrammetry data to build a face rig with blendshapes and wrinkle maps, either manually or by writing some complex custom code to automatically extract the final data. So you either need some good artists to process the results or a really good coder to write your tools (it involves stuff like optical flow and such). But in the end you get the rig, and then you use a facial capture solver to process the headcam data and drive the rig with it. There are off-the-shelf tools (like Faceware) and custom in-house solutions (used for example by 343i) to do this.
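At its core, the solver step is a fitting problem: each frame, find the blendshape weights that best reproduce the tracked facial points. A heavily simplified sketch of that idea follows; real solvers like Faceware's are far more involved (optical flow, priors, temporal smoothing), so treat the names and the regularization here as my assumptions:

```python
import numpy as np

def solve_frame_weights(B, target, neutral, reg=1e-3):
    """Fit blendshape weights so the rig best matches one frame of
    tracked 3D points, via regularized linear least squares.

    B:       (3P, S) matrix, each column a flattened blendshape delta
             sampled at the P tracked points
    target:  (3P,) flattened tracked point positions for this frame
    neutral: (3P,) flattened neutral-pose positions of the same points
    reg:     Tikhonov regularization keeping the weights stable
    """
    S = B.shape[1]
    A = B.T @ B + reg * np.eye(S)
    b = B.T @ (target - neutral)
    w = np.linalg.solve(A, b)
    # production solvers also smooth over time and enforce valid ranges
    return np.clip(w, 0.0, 1.0)
```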
Utilizing 4D capture data is a lot less straightforward and has to involve custom tools. It requires multiple cameras, so the subject usually has to sit in a chair and move very little, which means a complete performance capture is not possible: you have to record the body movement separately and then somehow sync the results of the two captures. That can easily create a mismatch between body movement and facial performance.
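For the syncing part, one common trick (assuming both sessions recorded a shared audio reference, like a clap or the dialogue take itself) is to cross-correlate the two audio tracks and shift one recording by the resulting offset. A rough sketch, not any particular studio's method:

```python
import numpy as np

def find_offset_seconds(audio_a, audio_b, sample_rate):
    """Estimate the time offset between two recordings by finding the
    lag that maximizes the cross-correlation of their audio tracks."""
    corr = np.correlate(audio_a, audio_b, mode="full")
    lag = int(np.argmax(corr)) - (len(audio_b) - 1)
    return lag / sample_rate  # shift audio_b by this to align with audio_a
```

In practice shared timecode is the cleaner solution, and even a perfect sync doesn't fix the deeper problem that the two performances are separate takes.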
The most obvious example is LA Noire, where they streamed the deformations of the head mesh, plus one set of color and normal textures per frame, from disc, basically replaying the actor's recorded performance. Obviously this wasn't flexible at all; it was probably nearly impossible to edit the results in any way. They used a fairly standard multi-camera setup that captured both geometry and texture data.
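Conceptually, the runtime side of that approach is nothing more than streaming and replaying raw per-frame data; no rig is being driven at all. The file layout below is entirely made up, it just illustrates why editing such a capture afterwards is nearly impossible:

```python
import numpy as np

def stream_performance(path, vertex_count, frame_count):
    """Replay a baked facial performance: read one full set of vertex
    positions per frame straight from disc (textures would stream the
    same way) and hand it to the renderer unchanged."""
    frame_bytes = vertex_count * 3 * 4  # float32 x, y, z per vertex
    with open(path, "rb") as f:
        for _ in range(frame_count):
            buf = f.read(frame_bytes)
            yield np.frombuffer(buf, dtype=np.float32).reshape(-1, 3)
```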
Another method is MOVA capture; their tech was used mostly in VFX and CG, like Benjamin Button by Digital Domain or the Halo 2 Anniversary cinematics by Blur. MOVA uses UV-light-responsive make-up to create on the order of 10K tracking points and generate a relatively high-res animated polygon mesh. There are then many possible ways to use this mesh to drive your final head model, but it's still not easy to change or edit the performance.
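One simple way a dense captured mesh like MOVA's can drive a differently built head model is a wrap-style binding: bind each vertex of the final model to the nearest point of the capture mesh in a shared rest pose, then let it follow that point every frame. A crude sketch (production wrap deformers bind to triangles with proper local frames; this just snaps to nearest vertices):

```python
import numpy as np
from scipy.spatial import cKDTree

def bind_wrap(capture_rest, head_rest):
    """Bind each head-model vertex to its closest capture-mesh vertex
    in the shared rest pose, remembering the rest-pose offset."""
    tree = cKDTree(capture_rest)
    _, driver_idx = tree.query(head_rest)
    offsets = head_rest - capture_rest[driver_idx]
    return driver_idx, offsets

def apply_wrap(capture_frame, driver_idx, offsets):
    """Deform the head model by replaying the capture mesh's motion:
    each bound vertex follows its driver point plus its rest offset."""
    return capture_frame[driver_idx] + offsets
```

Even with a binding like this, the motion itself is still baked capture data, which is why editing the performance remains hard.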
Edit: as far as I know they're also not mobile, so clients have to travel to their studio for the capture sessions.
So why and how would Remedy use 4D capture? They already have a performance capture session and they also capture a lot of data for the face model and probably the blendshapes of the rig. It's also really hard and not too practical to try to get the actor to repeat the previously recorded full performance, especially while sitting in a chair.
My best guess would be that they're capturing additional data for the face rig, to get even better results and details that go beyond simple static blendshapes (basically elemental static facial expressions). It's quite intriguing and possibly a completely new approach, so I'm pretty interested to find out more about it.
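If I had to guess at the mechanics: once you have a blendshape fit for each frame, the residual between the 60fps scan and the rig's reconstruction is exactly the detail that static shapes can't represent (sliding skin, extreme deformations), and those residuals could be turned into pose-driven correctives. A purely speculative sketch:

```python
import numpy as np

def corrective_from_scan(scan, neutral, shape_deltas, weights):
    """Compute the per-vertex residual a 4D scan leaves after the best
    static-blendshape fit; collecting these per frame would give the
    training data for pose-driven corrective shapes.

    scan, neutral: (V, 3) scanned / neutral vertex positions
    shape_deltas:  (S, V, 3) blendshape deltas
    weights:       (S,) solved weights for this frame
    """
    reconstruction = neutral + np.tensordot(weights, shape_deltas, axes=1)
    return scan - reconstruction  # what static blendshapes miss
```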