Laa-Yosh talk on head scanning wanting input (questions, not models)

Laa-Yosh

I know it's super unfair but I'd like to ask for some input - I'll do a presentation on Thursday here in Hungary about how we're using scanning on our character work, and I kinda wonder about the perspectives of outsiders.
No plans to release the material online, but I promise I'll do a short summary for you guys here (without the images, sorry). So, anyways, if there's anything about scanning you're curious about, ask here - I'll make sure to answer here as well. Thanks!
 
How long does it take to scan a face?
Is the model forced to stay as is?
Do you make the model perform certain actions?

What software do you use to scan? What other software can do the job? For a hobbyist? For professionals?

What type of equipment is required? Professional equipment vs do it yourself? Can you even do it yourself?

What is the usual type of processing required post scanning the model? What are common issues?

Are certain heads/faces/colours more prone to issues than others?



 
Excellent. What would be your definition and implementation of a Greek profile?
 
I've personally always wondered how much else besides the immediate outside surface of a person's head gets captured today, but never took the time to research it myself. Things like the subsurface layers of the skin, if there are systems that also scan skin thickness, or different types of tissue like muscle, fat, veins, etc. Also, how other parts of the head are handled, like their teeth, iris, facial hair distribution... If there are guys that take models of an actor's teeth like a dentist, or x-rays for their skull's profile. If there are industry-standard ways of measuring different properties of someone's hair, like albedo, thickness, curliness...
The use of various extreme poses for blend shapes also has me curious. More specifically, how those poses are chosen, and how it's made sure actors actually hit the decided pose. How are those specific facial set-ups communicated and verified...
I guess those are interesting topics I've always been curious about, hope that helps you make your talk...
 
First of all, thanks for all the questions everyone :) I'll try to answer as best as I can.

I'll also focus on stereo photogrammetry for the following reasons:
- it's the most scalable, flexible, portable approach
- thus it is the most commonly used
- and it's also what I'm most familiar with.

The theory is that if you take pictures of a subject from multiple angles, under the same conditions, then you can use software to identify tracking points and triangulate their positions in 3D space, generating a point cloud which can be remeshed into a polygonal model. Obviously, the more images you have and the more consistent the shooting conditions are, the more points you can identify.
It's cheap because all it requires are digital cameras and a fast computer.
However, it works a lot better if the lighting is as even as possible; and because live subjects can't stay completely still, you'll need to take all the pictures at once. So, a high end system will require many cameras, a proper studio setup, and advanced control systems to sync all cameras and lights and such.
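
To make the triangulation idea concrete, here's a minimal two-camera sketch using OpenCV - purely illustrative, not any studio's pipeline. The projection matrices and matched points are made-up placeholders; a real solver matches thousands of points across dozens of images automatically.

```python
# Minimal two-view triangulation sketch with OpenCV (illustrative only).
# A real photogrammetry package matches thousands of points across dozens
# of images; here two calibrated cameras and three hand-matched points
# stand in for the whole process.
import numpy as np
import cv2

# 3x4 projection matrices (intrinsics * [R|t]); values are made up,
# in practice they come from camera calibration.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])                 # reference camera
P2 = np.hstack([np.eye(3), np.array([[1.0], [0.0], [0.0]])])  # camera offset along X

# Matched 2D feature points in each image (2xN arrays).
pts1 = np.array([[0.10, 0.25, 0.40],
                 [0.20, 0.35, 0.15]])
pts2 = np.array([[0.35, 0.55, 0.70],
                 [0.20, 0.35, 0.15]])

# Triangulate: returns 4xN homogeneous coordinates.
pts4d = cv2.triangulatePoints(P1, P2, pts1, pts2)
pts3d = (pts4d[:3] / pts4d[3]).T  # Nx3 Euclidean points - the "point cloud"

print(pts3d)  # three recovered 3D points (millions in a real scan)
```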

How long does it take to scan a face?

Theoretically, all it takes is the time to take a single picture.
The overhead is a bit more complicated - if you have a scan rig with all the cameras properly set up, then it only takes a few minutes to get the talent in the scan area and prepare the system.
However, calibrating all the cameras may take considerable time, especially if you want to set up everything to a unique layout. You may also want to brief the talent, and maybe apply some marker paint on their faces.
Processing the data may take a few hours per scan, depending on the hardware.

If you use the same rig for face and body scans as well, then changing the camera setup can take a few hours.

Generally, a session of a neutral scan and 50-100 facial expressions, including briefing the actor, can be completed in about 4-6 hours; data processing, however, can take a few days altogether.

Is the model forced to stay as is?

Not sure what you mean... Usually for face scans, the actor has to sit completely still for the entire session. Most studios use some sort of head rest to make that easier.

Do you make the model perform certain actions?

If you want to build a face rig from the scans, then you'll absolutely want the actor to go through a range of expressions.
For body scans, it's not that common to go through Range Of Movement scans, but it certainly helps.

What software do you use to scan? What other software can do the job? For a hobbyist? For professionals?

The two major apps I know about are Agisoft PhotoScan and CapturingReality.
Agisoft is the most common; it's very robust and highly customizable. CR is faster, but it has an algorithm that tends to enhance the scan data, which can create more detailed but less realistic results.
Agisoft has a cheap home user license and a pretty expensive pro license. CR is only available as a pro app as far as I know. Obviously Agisoft is the one for hobbyists.
There are lots of other free apps, but they're not as advanced.
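
As a rough idea of what a batch run looks like, here's a sketch against the Agisoft PhotoScan Pro Python API (1.x era). Treat it as an outline rather than copy-paste code, since exact parameter names vary between versions, and the file paths are made up.

```python
# Rough outline of a batch photogrammetry run in Agisoft PhotoScan Pro's
# Python API (1.x era). Parameter names vary between versions and the
# paths are made up, so treat this as a sketch rather than copy-paste code.
import glob
import PhotoScan

doc = PhotoScan.app.document
chunk = doc.addChunk()

# Load all images from one scan session.
chunk.addPhotos(glob.glob("/scans/session_01/*.jpg"))

# Find tracking points across the images and solve the camera positions.
chunk.matchPhotos(accuracy=PhotoScan.HighAccuracy)
chunk.alignCameras()

# Densify the point cloud and remesh it into polygons.
chunk.buildDenseCloud(quality=PhotoScan.HighQuality)
chunk.buildModel(surface=PhotoScan.Arbitrary)

# Project the source photos back onto the mesh as a texture.
chunk.buildUV()
chunk.buildTexture()

doc.save("/scans/session_01/head_scan.psz")
```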

What type of equipment is required? Professional equipment vs do it yourself? Can you even do it yourself?

Photogrammetry can work even with a single camera, so it's the cheapest approach :)
The only downside is that taking 100+ photographs of any subject under the same conditions can be complicated - even for static objects the lighting can change, and you'll also have to be careful about the camera position for each image.
A live subject can't stay still for even a single second, so a single camera won't work there.

High end systems are usually built with 30-200 DSLR cameras, even studio lighting with flashes synced to the cameras, and a lot of wiring to control the cameras and flashes and to download the data quickly. Mind you, this can mean hundreds of 10+ megapixel images per scan, so the system has to be really fast. At Digic we have an advanced system with custom wiring, software and such - but practically all studios have to develop this stuff on their own, too.

Still, scanning stuff like trees, sculptures or everyday objects is easily manageable on a pretty low budget. Also, environment scans (rocks, trees, buildings, etc.) are usually done with a single camera even for high-end uses.

What is the usual type of processing required post scanning the model? What are common issues?

Scan data is basically a triangulated point cloud with messy UVs - useless for anything on its own :)

You usually want to clean up the scan manually in a sculpting app like ZBrush, and for that you need a more usable model built of quad polygons. Even ZBrush has auto-remeshing tools to build that mesh, but there are a lot of other solutions as well.
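
For a feel of the point-cloud-to-surface step, here's a hedged sketch using the open-source Open3D library - a stand-in for what the commercial tools do internally, not the actual studio pipeline, and the file paths are made up.

```python
# Hedged sketch of the point-cloud-to-surface step using the open-source
# Open3D library - a stand-in for what commercial tools do internally,
# not the actual studio pipeline. File paths are made up; quad retopology
# would still happen in ZBrush or similar afterwards.
import open3d as o3d

pcd = o3d.io.read_point_cloud("/scans/head_raw.ply")

# Drop stray points (hair wisps, background noise) before meshing.
pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

# Poisson surface reconstruction needs per-point normals.
pcd.estimate_normals()
mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=9)

o3d.io.write_triangle_mesh("/scans/head_meshed.ply", mesh)
```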

Common issues with photogrammetry are:
- Hair and fur can't be scanned, and they also mess up the surrounding scan data. Human subjects should be shaved, and the hair should be covered by a cap.
- Deep recesses can't be scanned, like nostrils or ear canals, or the inner mouth and teeth. These have to be manually fixed with digital sculpting.

Are certain heads/faces/colours more prone to issues than others?

Facial hair is a no-go, same for body hair. Without really diffuse lighting, the texture data won't be clean either.
Shiny objects and anything without a surface pattern will also mess up photogrammetry scans. Still, you can use structured light with a few projectors to compensate for this.
 
I've personally always wondered how much else besides the immediate outside surface of a person's head gets captured today, but never took the time to research it myself. Things like the subsurface layers of the skin, if there are systems that also scan skin thickness, or different types of tissue like muscle, fat, veins, etc. Also, how other parts of the head are handled, like their teeth, iris, facial hair distribution... If there are guys that take models of an actor's teeth like a dentist, or x-rays for their skull's profile. If there are industry-standard ways of measuring different properties of someone's hair, like albedo, thickness, curliness...

Standard photogrammetry won't get you anything beyond a simple mesh and the photo textures generated from projecting the captured images on the model.

There is of course the Lightstage developed by Paul Debevec, which is a much more complicated system.
It's basically a combination of a photogrammetry rig, a geodesic sphere rig of computer-controlled LED lights, and some structured light projectors.
Structured light means that you project a set of predefined patterns onto the surface, which the software can then analyze.
Lightstage can also measure reflectance and record data like specularity or translucency, to near-perfect values.

Lightstages are very expensive and immobile though; as far as I know there are only a few of them.
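
To illustrate the "predefined patterns" part, here's a tiny sketch that generates binary Gray-code stripe patterns, a classic structured light approach - the resolution is arbitrary, and a real system pairs this with projector-camera calibration and decoding software.

```python
# Generate Gray-code stripe patterns for structured light (sketch only).
import numpy as np

width, height, n_bits = 1024, 768, 10   # 2**10 = 1024 distinguishable columns

cols = np.arange(width)
gray = cols ^ (cols >> 1)               # binary-reflected Gray code per column

patterns = []
for bit in range(n_bits - 1, -1, -1):
    stripe = ((gray >> bit) & 1).astype(np.uint8) * 255  # one row of stripes
    patterns.append(np.tile(stripe, (height, 1)))        # repeat down the image

# A camera pixel's on/off sequence across the 10 projected patterns uniquely
# identifies which projector column it sees, giving dense correspondences
# for triangulation even on featureless or shiny surfaces.
```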

The use of various extreme poses for blend shapes also has me curious. More specifically, how those poses are chosen, and how it's made sure actors actually hit the decided pose. How are those specific facial set-ups communicated and verified...

For facial expressions, the common ground is FACS - the Facial Action Coding System - published in the 1970s by psychologists Ekman and Friesen, originally in order to describe the facial expressions of patients. The base set of Facial Action Units is about 45 elemental movements, but the possible combinations are in the thousands.
http://www.cs.cmu.edu/~face/facs.htm

For scanning, the usual approach is to scan 40-50 elemental expressions, 5-20 combined expressions, and then about a dozen or more complex expressions like Happiness, Anger, Surprise etc. to get a good range of the combinations. This is usually enough to build a robust facial rig with blendshapes.
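
As a toy illustration of the elemental/complex split, here's a sketch in Python - the AU names and the two expression recipes below are standard FACS material, but the mapping structure itself is just for illustration.

```python
# Toy illustration of the elemental/complex split in FACS: Action Units
# are scanned one by one, and complex expressions are (roughly) standard
# combinations of them. AU names and the two recipes below are standard
# FACS material; the dictionary structure is just for illustration.
action_units = {
    1:  "Inner Brow Raiser",
    2:  "Outer Brow Raiser",
    4:  "Brow Lowerer",
    5:  "Upper Lid Raiser",
    6:  "Cheek Raiser",
    12: "Lip Corner Puller",
    26: "Jaw Drop",
}

# Commonly cited AU combinations for two complex expressions.
complex_expressions = {
    "happiness": [6, 12],
    "surprise":  [1, 2, 5, 26],
}

for name, aus in complex_expressions.items():
    print(name + ":", ", ".join(action_units[au] for au in aus))
```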
 
I think this is the latest version of Debevec's system:
(image: mike_Paul.jpg)


And here's the research page, with LOTS of stuff:
http://www.pauldebevec.com

A good walkthrough of the process:
http://www.ted.com/talks/paul_debevec_animates_a_photo_real_digital_face
 
Thanks yosh. I'll dive into your links later at home. But from what you've said, there's still a lot of hand-made artist work behind these models, even with photogrammetry. I'm assuming things like specular, roughness, and subsurface transmittance/color are mostly guesstimated by talented artists on the more modest setups...
As for the elemental expressions, my question was more with regard to how the specific desired expressions are communicated to the actors precisely. Or, how do you explain to a non-3D-facial-rigger to lift muscle X and Y while keeping muscle Z relaxed? Is there more leeway in the interpretation/execution of those poses than I'm assuming, or are actors just very good at this stuff?
Also, are multiple variations of the skin textures used for differences in how stretched/tense the skin is? I mean, besides the large wrinkles, on a skin-pore level, is that stuff taken into account, or is it not that noticeable?
 
Thanks yosh. I'll dive into your links later at home. But from what you've said, there's still a lot of hand-made artist work behind these models, even with photogrammetry. I'm assuming things like specular, roughness, and subsurface transmittance/color are mostly guesstimated by talented artists on the more modest setups...

Yeah, most studios are not using measured data and hand-paint those maps instead. However, there's actually very little variance there; the oiliness of the skin that drives roughness and reflectivity is quite similar in all people, as is the thickness of the facial tissue (of course there are some differences based on the amount of fat and skin type, but those are easy to account for).

As for the elemental expressions, my question was more with regard to how the specific desired expressions are communicated to the actors precisely. Or, how do you explain to a non-3D-facial-rigger to lift muscle X and Y while keeping muscle Z relaxed? Is there more leeway in the interpretation/execution of those poses than I'm assuming, or are actors just very good at this stuff?

This is actually the hardest part, for quite a few reasons. There's a certain learning curve to understanding what each FACS AU actually does, and it takes some practice to be able to reproduce them. So most scans are inaccurate and take some manual work to fix...
The complex expression scans can help a lot here. The artist compares the elemental scans and the complex scans, mixes them, augments them, and also watches a lot of the head cam footage to see how it works in movement. So you still need really good artists most of the time :)

Also, are multiple variations of the skin textures used for differences in how stretched/tense the skin is? I mean, besides the large wrinkles, on a skin-pore level, is that stuff taken into account, or is it not that noticeable?

This is certainly a real-life phenomenon, and it can help to get through that last 1% of realism in completely realistic CG humans. There are actually a few factors:
- Skin microstructure changes with stretch and compression. Pores and tiny wrinkles can get deeper or rougher. However, this can be automated by multiplying the displacement map based on the change of overall area per polygon; this is usually called a stress map (see the sketch below).
- Stretching or compression can change the blood flow in the tiny capillaries of the skin tissue and cause the skin to redden or whiten. This can also be driven automatically, or manually. In real life there's also a slight delay. As far as I know, the fairies in Maleficent used this to its full extent.
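
Here's a minimal sketch of the stress map idea mentioned above - a per-triangle area comparison between neutral and deformed meshes driving a displacement multiplier. The mesh data and the exact deepening rule are illustrative assumptions, not any particular studio's formula.

```python
# Minimal sketch of a stress map: compare per-triangle surface area
# between the neutral and deformed mesh, and use the ratio to scale the
# pore/wrinkle displacement map. The deepening rule here is illustrative.
import numpy as np

def triangle_areas(verts, tris):
    """Area of each triangle; verts is Nx3 positions, tris is Mx3 indices."""
    a, b, c = verts[tris[:, 0]], verts[tris[:, 1]], verts[tris[:, 2]]
    return 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)

def stress(rest_verts, deformed_verts, tris):
    """Per-triangle area ratio: > 1 where skin is stretched, < 1 where compressed."""
    return triangle_areas(deformed_verts, tris) / triangle_areas(rest_verts, tris)

def displacement_multiplier(stress_values, strength=1.0):
    """Compression deepens pores and fine wrinkles; stretching flattens them."""
    return 1.0 + strength * (1.0 / stress_values - 1.0)
```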

We've experimented with the blood flow stuff but it was barely visible, so we usually don't bother with it.
I guess it's a lot more obvious in games, where the wrinkles are only present in the normal maps and thus they're not affected by lighting as much; here, changing the color texture can be much more dramatic.
 
Also, about not getting everything from the base scans - high-end studios like ILM or Weta have really advanced custom software for facial performance capture, which can also help to fine-tune the basic expressions. Unfortunately - and obviously - these are industry secrets, considering the huge advantage they offer and the giant R&D investment, so very little is actually known about how this stuff works...
 
Thanks again. That answers most of my questions, and gives me some places to read further. Appreciate it.
 
I wonder if there is something like FACS for head/face composition... let me explain:

FACS uses some base expressions, which can then be combined to get a wide variety of realistic-looking facial expressions.

I am wondering if you can scan a few human faces, maybe faces identified as 'basic', and then combine those basic faces to generate a whole bunch of new, but still realistic, faces?

As an application, I am thinking about scenes where you have a large crowd and don't want everyone to have the same face - people in a stadium, an AC crowd, an army, etc.
 
I wonder if there is something like FACS for head/face composition... let me explain:
FACS uses some base expressions, which can then be combined to get a wide variety of realistic-looking facial expressions.
I am wondering if you can scan a few human faces, maybe faces identified as 'basic', and then combine those basic faces to generate a whole bunch of new, but still realistic, faces?
As an application, I am thinking about scenes where you have a large crowd and don't want everyone to have the same face - people in a stadium, an AC crowd, an army, etc.

This is actually an interesting idea that I wanted to get back to.

Production budgets are always limited, while ambitions are always through the roof, so there's obviously significant demand for any solution that could simplify character asset creation. There are actually quite a few different off-the-shelf tools for building faces that are or were available on the market; and it seems that from time to time someone in the Blender community comes up with some sort of procedural human head (or entire body) creation tool.

The thing is that none of these tools have been able to provide reasonably good results. Human faces (and bodies) are incredibly complex, but at the same time the differences can be really subtle - so a few sliders blending between presets seem to create results that are either too similar, or simply not realistic enough. We've processed about 16 head scans for the background gangsters in AC:Syndicate, and I couldn't pick even a single facial feature that was reasonably similar between the subjects (things like jaw, eyes, nose, mouth, cheeks, ears, etc.).
Strangely enough, I still think that there is a limited number of variations of each feature, based simply on observation of real people. The easiest example is to compare parents and their children: you can almost always find similar features if you look at the faces closely enough. It's probably also the reason why you can find a lot of celebrity doubles - there'd always be differences based on things like the amount of fat or how the skin has aged, but still, the only explanation for such close likeness has to be a similarity in DNA. It's just that the number of 'components' must be really high, and so the possible combinations are just too many.
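
For reference, here's a naive sketch of the slider-style blending such tools attempt. It assumes several scanned heads registered to the same topology (so vertex i means the same anatomical spot on every head); the arrays are random placeholders, and this is exactly the kind of approach that tends to land between "too similar" and "not realistic enough".

```python
# Naive sketch of slider-style face blending: K scanned heads registered
# to the same topology are blended as offsets from the mean face.
# Arrays are random placeholders standing in for real registered scans.
import numpy as np

K, N = 16, 5000                       # 16 scans, 5000 vertices each
scans = np.random.rand(K, N, 3)       # stand-in for registered head scans
mean_face = scans.mean(axis=0)

def blend_face(weights):
    """Blend scan deltas from the mean; weights is a length-K array."""
    deltas = scans - mean_face                     # K x N x 3 offsets
    w = np.asarray(weights).reshape(-1, 1, 1)
    return mean_face + (w * deltas).sum(axis=0)

# One slider per scan - an even mix of all 16 scans.
new_face = blend_face(np.full(K, 1.0 / K))
```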

Also, I am aware of at least one reasonably good implementation of such a 'face generator' - but it was created by Weta for Tintin, and it was about building a lot of highly caricatured faces. And even with that non-realistic goal in mind, they still had to build in a lot of additional logic, like marking certain areas of the face that had to stay intact even after blending the components together. I can't find the white paper right now as I have other stuff to do ;) but it's probably available online somewhere.

But in the end, at least for realistic human faces, it's still a lot more efficient to build a pipeline that reuses models, UVs, blendshapes and other rigging data, and just scan a lot of different faces; then add even more variation with facial hair, which can really transform the look of a face. At the last step, you have to carefully manage the density and appearance of 'clones' in the final shots, to make sure they're hard to catch :)
 
I just find a photogrammetry system scarier than a dentist's chair :oops:

Hehehe, that's actually funny and at the same time not that far from the truth :)

Our first system had a relatively small number of cameras (36), which meant that if we wanted to do detailed scans of body parts, we had to rearrange and re-calibrate them all. This meant that the talent had to stay completely still in the same pose for like 10-15 minutes, and having been the volunteer test subject, I can tell you that it's surprisingly exhausting and somewhat painful as well ;)
Of course, most of the big scan studios have two separate rigs, one for heads and one for full bodies, each with enough cameras (100+) to cover reasonably generic setups, so they don't have to recalibrate for each shoot. This is obviously what we're shooting for as well.

However, there are still a few parts that might be unpleasant and thus scary ;)

The main weakness of photogrammetry is that any kind of hair will cause artifacts on the model, as the system gets confused by tracking points that are not actually on the underlying surface. It's not a big deal for head scans, as you can cover the hair with a cap, and asking the talent to arrive clean-shaven isn't such a big deal either.
But for body scans, this means a (near) full shave, which I eventually decided to do in the name of science and company advancement and all... It gets kinda unpleasant after a week or so, but nothing is as bad as all the s*** you have to take from friends and co-workers, which continues well after all the hair has grown back... ;)

The other part is the FACS facial expression scans. Most actors - even in what you could call "B"-level TV roles - are pretty busy; and when you hire someone from a modeling agency, you usually pay by the hour. So there's very little time to get the talent familiar with the Action Units to start with; and as I might have mentioned, many of the elemental movements are actually really hard to do without practice (and some people can't reproduce them even after considerable time). I'm not sure how well it comes through, but most of the time the people in the studio tend to settle for near-hits relatively quickly, instead of trying to get the talent to do things right. At least the complex expressions are usually a good secondary source that the blendshape artists can turn to when the AU scans aren't good enough.

As for the scan sessions themselves, I think they're not that unpleasant, scary or uncomfortable. There are so many cameras that your mind sort of fails to register that you're being recorded; and you also have to be completely alone in the space, as anything other than the rig would mess up the scans. Most studios have at least one online camera though, so the operators can see what you do and deliver instructions when necessary, and there are speakers so you can hear them. It's also practical to have a 'digital mirror' - basically a screen that displays the feed from that online camera - so the talent can see what the body or face pose looks like. If anything, it should actually be somewhat more relaxing for models than a typical photo shoot, where you can see the photographer and all the rest of the crew while you do your work.
 