Project Natal was announced at Microsoft's E3 press conference. The technology uses a 3D camera, and the SDK for it seems to support face recognition, voice recognition and full-body motion scanning in real time.
Discuss the technology and its potential applications here.
Things we know about the technology:
Eurogamer interview:
Eurogamer: How does Project Natal work with Burnout?
Alex Kipman: Essentially we do a 3D body scan of you. We graph 48 joints in your body and then those 48 joints are tracked in real-time, at 30 frames per second. So several for your head, shoulders, elbows, hands, feet...
Say I'm tracking a wrist, which is what I do for Burnout. I can look at that on a single frame and I can see what direction, acceleration and confidence I have for that joint. Why is that interesting? Because it allows me to not only know where you are, but to know where you're going to be. This is how we do the directing and the predictive behaviour.
If you think about swinging a baseball bat, by the time you're halfway done with the swing, I know not only where you're going to end but when you're going to end. There are very precise and predictable ways so you can have that immediate payoff of my baseball bat hitting the baseball.
- http://www.eurogamer.net/articles/e3-post-natal-discussion-interview
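To make the "know where you're going to be" part concrete, here is a minimal sketch of how a per-joint prediction could work if you have position, velocity and acceleration for a joint. This is not Microsoft's actual method or anything from the SDK; the `JointSample` type, its field names and the constant-acceleration assumption are all mine, purely for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class JointSample:
    """Hypothetical per-joint data for one frame (names are assumptions)."""
    position: Vec3      # metres
    velocity: Vec3      # metres per second
    acceleration: Vec3  # metres per second squared
    confidence: float   # 0.0 (no trust) to 1.0 (full trust)


def predict_position(sample: JointSample, dt: float) -> Vec3:
    """Extrapolate the joint dt seconds ahead, assuming constant acceleration.

    Uses p(t + dt) = p + v*dt + 0.5*a*dt^2 on each axis.
    """
    return tuple(
        p + v * dt + 0.5 * a * dt * dt
        for p, v, a in zip(sample.position, sample.velocity, sample.acceleration)
    )


# Example: a wrist moving along x while accelerating along z.
wrist = JointSample(
    position=(0.0, 1.0, 0.0),
    velocity=(1.0, 0.0, 0.0),
    acceleration=(0.0, 0.0, 2.0),
    confidence=0.9,
)
predicted = predict_position(wrist, 0.5)
```

A real system would presumably weight the prediction by the confidence value and use a better motion model for something like a bat swing, but the basic idea of extrapolating from the current frame is the same.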
The 3D sensor itself is a pretty incredible piece of equipment providing detailed 3D information about the environment similar to very expensive laser range finding systems but at a tiny fraction of the cost. Depth cameras provide you with a point cloud of the surface of objects that is fairly insensitive to various lighting conditions allowing you to do things that are simply impossible with a normal camera.
- http://www.3dvsystems.com/technology/tech.html
WRT precision - it's a time-of-flight camera, which means they emit an IR light ray and measure the time for it to return; they probably scan the room with the ray, and they can control it at will; one of the materials said they scan the entire scene in 5 frames; if they want to detect only hands, for example, they can focus mostly on the area around where the hands were last frame, and receive increased precision in return.
- http://forum.beyond3d.com/showpost.php?p=1299174&postcount=146
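For anyone wondering what "measure the time for it to return" means in numbers, here is a small sketch of the basic time-of-flight relation, distance = c * t / 2 (the light travels to the object and back). The function names are mine and the 3 m example distance is just an illustrative living-room figure, not a spec of the sensor.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def distance_from_round_trip(round_trip_seconds: float) -> float:
    """Distance to the surface, given the time for light to go out and back."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


def round_trip_for_distance(distance_m: float) -> float:
    """Inverse relation: how long the light takes to reach distance_m and return."""
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_PER_S


# A player standing 3 m away: the round trip is on the order of 20 nanoseconds.
t = round_trip_for_distance(3.0)
d = distance_from_round_trip(t)
```

The takeaway is that centimetre-level depth differences correspond to round-trip differences of tens of picoseconds, which is why concentrating the measurements on a smaller region could plausibly buy extra precision.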
My own general summary based on what I know so far:
The camera can scan its full field of view in 5 frames, generating a point cloud similar to the way environments can be digitally scanned by laser these days. It uses this to find the body. Then it focusses on the body and generates a new point cloud which, instead of wasting a lot of time and points on the whole field of view, now shows mostly the body. This information is mapped to a predefined skeletal framework that has 48 joints, to determine the body posture of the player in 3D. From here on, it will keep focussing solely on the body and compare each new 'frame' / point cloud with the previous one, to determine several parameters for the joints, like position, speed and direction of movement, and also how accurate the software in the camera doing the analysis thinks this information is ('confidence' per the Eurogamer interview). It apparently gives back this information at 30 frames per second (also per the Eurogamer interview).
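The "compare each new frame with the previous one" step in my summary can be sketched with a simple finite difference. This is only my guess at the general idea, not how the camera software actually does it; the function name and the use of metres are assumptions, and the only number taken from the interview is the 30 frames per second.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

FRAME_DT = 1.0 / 30.0  # seconds between frames at 30 fps (per the interview)


def joint_motion(prev: Vec3, curr: Vec3, dt: float = FRAME_DT) -> Tuple[float, Vec3]:
    """Estimate speed (m/s) and unit direction of a joint from two frames."""
    delta = [c - p for p, c in zip(prev, curr)]
    dist = math.sqrt(sum(d * d for d in delta))
    if dist == 0.0:
        # Joint did not move between frames: no meaningful direction.
        return 0.0, (0.0, 0.0, 0.0)
    direction = tuple(d / dist for d in delta)
    return dist / dt, direction


# A hand that moved 5 cm between two consecutive frames.
speed, direction = joint_motion((0.0, 0.0, 0.0), (0.03, 0.04, 0.0))
```

Differencing two noisy frames like this amplifies the noise, which is probably part of why a per-joint confidence value is reported alongside the motion data.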