Mocap/facial animation methods and design decisions (spawned from the L.A. Noire thread)

Laa-Yosh

Mod: thread spawned from the L.A. Noire game thread. The topic branches from the motion capture methods used by Rockstar in this title and their promises of realistic virtual acting. What are the current methods and limits of realtime character animation, what pitfalls do they face, and what new techniques are being introduced?

That's kinda underwhelming... after all the talk and uncanny valley stuff, it should at least look as good as any other character-driven game, but it's more like The Godfather games...
 
That's kinda underwhelming... after all the talk and uncanny valley stuff, it should at least look as good as any other character-driven game, but it's more like The Godfather games...

The faces do look weird. But given their unusual composition (can't describe it better than wax-like), do you think there may be some trick up their sleeve in terms of animation? They've done a bit of talking about a revolution in character faces, I think.
 
A lot of them talk about revolutionary tech and whatever, and then it usually ends up being the video-based mocap from ImageMetrics, outsourced to somewhere for processing...

GTA4 ended up with relatively good facial animation, and Red Dead Redemption looks promising too, but AC2 was a mix and Heavy Rain doesn't look that good either. Mass Effect 2 is really the only game that's been impressive with regard to facial animation recently, especially as it's very likely not using face mocap.
 
A lot of them talk about revolutionary tech and whatever, and then it usually ends up being the video-based mocap from ImageMetrics, outsourced to somewhere for processing...

The GI article I noted above, IIRC, mentions that it's all done in-house at a new studio: 30 cameras, filmed with the voice recorded at the same time to match the lips, and processed on site.

I'll pull the article out after work and see if I can type up the bits of interest.
 
What happens if they want to localize the scenes (e.g., into different languages)?

The faces do look weird. But given their unusual composition (can't describe it better than wax-like), do you think there may be some trick up their sleeve in terms of animation? They've done a bit of talking about a revolution in character faces, I think.

I remember they showed the face of an old man, which was much more interesting than the latest screens above. It would be nice if they could give another shot of a facial close-up (or at least head + upper torso).
 
Unless they give that up, they will still need to do a lot of clean-up, based on Laa-Yosh's previous description of mocap work. I think that's why he mentioned the outsourcing part in his post above.

If they do different takes, then the visual performance (and data!) for different languages will be different. Does the game have enough space to store multiple languages?

Also want to see their control scheme. The characters may move in a twitchy manner if a traditional control scheme is used (or people may complain about input lag).
 
Unless they give that up, they will still need to do a lot of clean-up, based on Laa-Yosh's previous description of mocap work. I think that's why he mentioned the outsourcing part in his post above.

I'm re-reading now and will post details in a few minutes.
 
Here is what the article says:

Surrounding the actress (in full '40s make-up and hair) are 32 cameras that capture her image from every possible angle. Once up and running, the actress interacts with the director through a monitor positioned directly in front of her head, with crucial line prompts and cues from the director. Once completed, every detail of the performance - dialogue, expression, eye movement, even make-up details like black eyes or burns - is directly pipelined into the game with no involvement from the animators.

In this way L.A. Noire represents a total break from conventional game development and animation. Instead of recording dialogue, animating and performing motion capture as separate steps of the process, Team Bondi (using technology developed by sister company Depth Analysis) is capturing human performances just as a filmmaker would - except instead of generating movie footage, they come away with fully animated 3D models. It's a tremendously advanced process - Depth Analysis says their Australian facility is equipped to store 200 terabytes of capture data - but one that allows them to work more quickly than traditional animation techniques. ... "We can mass produce... we can produce 20 minutes of footage a day and it's seamless - I don't even have character artists or animators working with me."
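
To be clear, this next part is just me reading between the lines, but if "no animators" and "200 terabytes" are both true, the performance is presumably stored as a sequence of captured face meshes (plus per-frame textures) and simply played back, rather than as a rig driven by animation curves. A rough Python sketch of what I mean; every name here is made up for illustration:

# Speculative sketch, not Team Bondi's actual pipeline: a performance
# stored as one reconstructed mesh + texture per frame, played back
# directly with no animator-authored curves or blend shapes.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class FaceFrame:
    vertices: List[Tuple[float, float, float]]  # positions reconstructed from the 32-camera rig
    texture_id: int  # per-frame texture; this is how black eyes / burns would come along for free

@dataclass
class CapturedPerformance:
    fps: float
    frames: List[FaceFrame]  # a full mesh every frame, hence the huge storage figure

    def sample(self, time_sec: float) -> FaceFrame:
        # Playback is just a lookup into the recording.
        index = min(int(time_sec * self.fps), len(self.frames) - 1)
        return self.frames[index]

If it works anything like that, it would also explain the localization question above: a second language means a second recorded performance, not just a new audio track.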
 
Hmm... doesn't answer the question directly though. We'll find out soon enough.

EDIT: Laa-Yosh, it looks like they are saying no human effort is needed to clean up the raw data points.
 
There is an "unprecedented amount of spoken lines": a script of around 2,000 pages (noting the average hour-long TV show has around 50 pages, so that's roughly 40 episodes' worth of dialogue).
Yea, I'll guess that multiple languages are screwed. :D




The characters may move in a twitchy manner if a traditional control scheme is used (or people may complain about input lag).

I know it's a BJ piece, but here is their impression after seeing it in motion:

GI:
Seeing side-by-side comparisons of the actors with their in-game likenesses, it's clear McNamara's technical team (staffed mostly by Depth Analysis) is treading new ground in terms of facial animation in games. At first it's almost eerie. From the slightest raise of an eyebrow, the facial models are virtually indistinguishable from the real thing. ...Since physical performance and dialogue are done at the same time, lip-synching problems are non-existent, allowing the player to finally react to the characters as real actors in a way that even games like Uncharted 2 or Mass Effect haven't achieved.
 
EDIT: Laa-Yosh, it looks like they are saying no human effort is needed to clean up the raw data points.

They are not using traditional motion capture with raw data points and the little white balls :D. They are filming character actors and creating fully fleshed 3D models of them directly in the game world.
 
They are not using traditional motion capture with raw data points and the little white balls :D. They are filming character actors and creating fully fleshed 3D models of them directly in the game world.

Ha ha, yes, the "creating fully fleshed 3D models" part may introduce noise, which their automated system will have to deal with.

I know it's a BJ piece, but here is their impression after seeing it in motion:

What's a BJ piece?

The paragraph doesn't talk about the control scheme though. I think life-like animation brings a new level of believability and performance to the table (e.g., MLB The Show, Modern Warfare or Heavenly Sword).

Uncharted 1 & 2 solve a different problem. They have animation blending so that the developers don't have to create tons of animations for all the different actions. If L.A. Noire's facial capture mechanism works, then it should add on top of whatever they already have. Even in U2, the character may feel twitchy when you control Nathan.
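
Side note: blending at its simplest is just per-joint interpolation between two poses. A toy Python sketch, with all names mine and nothing to do with Naughty Dog's actual engine:

# Toy two-pose blend; real engines interpolate quaternion rotations per
# joint, often across whole trees of animations, but the idea is the same.
def blend_pose(pose_a, pose_b, w):
    """Each pose maps joint name -> (rx, ry, rz) rotation in degrees.
    w = 0.0 returns pose_a, w = 1.0 returns pose_b."""
    return {
        joint: tuple(a + w * (b - a) for a, b in zip(pose_a[joint], pose_b[joint]))
        for joint in pose_a
    }

# Ramping w from 0 to 1 over a few frames eases an idle pose into a run
# cycle without anyone authoring a dedicated transition animation.
idle = {"knee_l": (0.0, 0.0, 0.0)}
run = {"knee_l": (45.0, 0.0, 5.0)}
print(blend_pose(idle, run, 0.25))  # {'knee_l': (11.25, 0.0, 1.25)}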

Killzone 2 is the one with more refined and organic animation. :)
 
What's a BJ piece?

The paragraph doesn't talk about the control scheme though. I think life-like animation brings a new level of believability and performance to the table (e.g., MLB The Show, or Heavenly Sword).

Uncharted 1 & 2 solve a different problem. They have animation blending so that the developers don't have to create tons of animations for all the different actions. If L.A. Noire's facial capture mechanism works, then it should add on top of whatever they already have. Even in U2, the character may feel twitchy when you control Nathan.

Killzone 2 is the one with more refined and organic animation. :)

Yea... this speaks mainly to dialogue scenes, as the game is based around investigating and interrogating subjects and trying to infer their honesty or intentions while speaking with them. So the facial and speaking stuff is KEY to the game. As for actual animation of the full bodies in the game world? Not sure.

What's a BJ piece?

A fluff piece (not critical; more informative and friendly).
 
EDIT: Laa-Yosh, it looks like they are saying no human effort is needed to clean up the raw data points.

That would be a first in the history of any mocap as far as I know... Then again, a system with no need for manual work is possible, but I wonder about the quality it can produce.

As for the article, well, too bad the images don't appear to be as good; maybe they're from a far earlier build of the game. I'd really like to see some footage of all this.
 
They are not using traditional motion capture with raw data points and the little white balls :D. They are filming character actors and creating fully fleshed 3D models of them directly in the game world.

Again, that would really be unprecedented, but when can we see it in motion then?
 
Yea... this speaks mainly to dialogue scenes, as the game is based around investigating and interrogating subjects and trying to infer their honesty or intentions while speaking with them. So the facial and speaking stuff is KEY to the game. As for actual animation of the full bodies in the game world? Not sure.

No wonder the screens look "normal". None of them showcase their tech (no facial close-ups).

We'll have to see how well they apply the tech. With a face cam, they should be able to capture all the subtle facial expressions (like Avatar).

Heavenly Sword's face mocap tech was used to create theatrical over-acting. The final experience is emotionally rather powerful. The actors could always exaggerate their expressions and movements to drop facial hints and signal body language. Would love to see an L.A. Noire video!


That would be a first in the history of any mocap as far as I know... Then again, a system with no need for manual work is possible, but I wonder about the quality it can produce.

As for the article, well, too bad the images don't appear to be as good; maybe they're from a far earlier build of the game. I'd really like to see some footage of all this.

I read a Popular Science (or Scientific American) article on Avatar on the plane. It claimed that the Weta tech is also automated to create convincing (no uncanny valley effect) facial expressions for the aliens -- based on just one face cam. James Cameron insisted that they rework the technology; otherwise the movie wouldn't work.
 
BTW, I'm not an advocate for the game or the tech, just happy to share some info not yet released on the internet about an interesting possible technical advancement, and one that I look forward to interacting with if it works as well as they imply.
 
Ah, I think it could work in L.A. Noire's context. For other, action-oriented games (e.g. U2 has platforming and a lot of weapon animations, like dropping/lobbing grenades in different postures and angles) to use similar tech, they may need to do a lot more work. The cutscenes would benefit from even more realistic animation.

The multilingual facial expression work would be interesting too.
 
Patsu, don't believe everything you read in magazines, not even Cinefex - it's usually very, very far from the truth for various reasons. The best info usually comes from smaller, more 'insider' websites' reports, and of course the people working on the movie, as it's a small industry after all.


Avatar's facial tech had the following components:

1. A face cam tracking a certain number of markers placed on the most important facial landmarks.

2. An image analysis software that interprets the marker data, using a set of previously recorded reference positions for the most basic elements of facial expressions (this is the Facial Action Coding System; its components are things like outer brow raiser, lip funneler, blink, etc.). The result is a kind of metadata describing the intensity of each of these ~50-60 Action Units.
The software can be 'trained': if its interpretation of an expression is wrong, it can memorize the manual corrections and apply them on its own the next time. However, it's unable to capture tongue movement or pretty much anything happening inside the mouth, anything that's covered up (by a hand, or Jake's head when kissing Neytiri), and it's still a mechanical system, unable to get the intentions and emotions on its own. Which is why they had up to 10 HD camera operators on the mocap stage who filmed everything from multiple angles to provide additional reference for the animators.
(Also, the marker data can first be analyzed in real time to directly drive a far simpler rig in Motionbuilder and provide some feedback for the director on the virtual set, but it's not the same as the final, high-detail stuff.)

3. The actual detailed face rig in Maya, which models the elemental Action Units using blend shapes, plus additional blend shapes to fix the results of combinations of AUs (like smiling and talking at the same time). The face geometry is extremely complex (20-30K quad polygons at least) to allow the actual modeling of facial wrinkles. Up to a few thousand (!) blend shapes are required, and they're all created manually, sometimes using various references like photos and scan data. (See the toy sketch right after this list for how the AU intensities and blend shapes fit together.)
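
To make the link between 2. and 3. concrete, here's a toy sketch (Python/numpy; all names and numbers are mine, nothing like the actual production code): the per-frame AU intensities from the solver are just weights, and the rig adds the corresponding shape deltas on top of the neutral mesh, with corrective deltas kicking in for problematic AU combinations.

import numpy as np

def evaluate_face(neutral, au_shapes, correctives, au_weights):
    """Toy FACS blendshape evaluation (illustrative only).

    neutral:     (N, 3) vertex positions of the neutral face
    au_shapes:   AU name -> (N, 3) delta from neutral for that Action Unit
    correctives: (AU, AU) pair -> extra (N, 3) delta fixing their combination
    au_weights:  AU name -> solved intensity in [0, 1] for the current frame
    """
    result = neutral.copy()
    for au, w in au_weights.items():
        result += w * au_shapes[au]
    # Corrective shapes fire in proportion to both contributing AUs,
    # e.g. lip corner puller + jaw drop needs its own fix-up delta.
    for (au1, au2), delta in correctives.items():
        result += au_weights.get(au1, 0.0) * au_weights.get(au2, 0.0) * delta
    return result

# Tiny example with a one-vertex "face":
neutral = np.zeros((1, 3))
au_shapes = {"jaw_drop": np.array([[0.0, -1.0, 0.0]]),
             "lip_corner_puller": np.array([[0.5, 0.2, 0.0]])}
correctives = {("jaw_drop", "lip_corner_puller"): np.array([[0.0, 0.1, 0.0]])}
print(evaluate_face(neutral, au_shapes, correctives,
                    {"jaw_drop": 1.0, "lip_corner_puller": 0.5}))

Scale that up to ~50-60 AUs and a few thousand hand-sculpted correctives and you can see where the manual work goes.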

The general approach is based on the blendshape system for Gollum created by Bay Raitt (because of its more extreme expressions, that character required even more corrective shapes). The main difference is that there was no face mocap for Gollum, so all the animation was created by hand, using the FACS system to break down the expressions; an industry first, AFAIK.

Here's Neytiri's face:
[image: neytiriwiresmall.jpg]


Expression studies in Zbrush:
http://www.zbrushcentral.com/showthread.php?t=079195
(these are not the actual high-res models used in the movie, but conceptual art from Stan Winston Studios used as reference to create the final versions at Weta)

You can check more about the system here in the feature "Creating the World of Pandora"
http://movies.yahoo.com/movie/1809804784/trailer
When they show Jake's CG face, you can even read the names of the blendshapes on the right side of the Maya GUI.


Also, I kinda like to / have to research this stuff, as I've been responsible for our facial rigs for years now, and it's all based on a subset of FACS and (when necessary) corrective shapes as well. From what I've gathered, facial mocap is an extremely complex issue and no one has really cracked it yet to work without human interaction...
 