We present KinectFusion, a system that takes live depth data from a moving depth camera and creates high-quality 3D models in real time. The system allows the user to scan a whole room and its contents within seconds. As the space is explored, new views of the scene and objects are revealed, and these are fused into a single 3D model. The system continually tracks the 6DOF pose of the camera and rapidly builds a volumetric representation of arbitrary scenes.
Our technique for tracking is directly suited to the point-based depth data of the Kinect sensor and requires no feature extraction or feature tracking. Once the 3D pose of the camera is known, each depth measurement from the sensor can be integrated into a volumetric representation. We describe the benefits of this representation over mesh-based approaches. In particular, the representation implicitly encodes predictions of the geometry of surfaces within the scene, which can be extracted readily from the volume. As the camera moves through the scene, new depth data can be added to or removed from this volumetric representation, continually refining the acquired 3D model. We describe novel GPU-based implementations for both camera tracking and surface reconstruction. These take two well-understood methods from the computer vision and graphics literature as a starting point and define new instantiations designed specifically for parallelizable GPGPU hardware. This allows for interactive real-time rates that have not previously been demonstrated.
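To make the volumetric fusion step concrete, the following is a minimal CPU-side sketch of integrating a single depth map into a signed-distance volume using a per-voxel weighted running average. It assumes a truncated signed-distance grid and simple pinhole intrinsics; all names (Voxel, Intrinsics, fuse_depth_map) and the synthetic depth map in main are illustrative assumptions, not the system's actual API.

```cpp
// Sketch: fuse a depth map (meters) into a signed-distance volume.
// Assumes a truncated signed-distance representation and a weighted
// running average per voxel; names and conventions are illustrative.
#include <cmath>
#include <vector>

struct Voxel {
    float tsdf   = 1.0f;  // truncated signed distance, in [-1, 1]
    float weight = 0.0f;  // accumulated integration weight
};

struct Intrinsics { float fx, fy, cx, cy; };  // pinhole camera model

// `world_to_cam` is a rigid transform stored row-major as 12 floats (R | t).
void fuse_depth_map(std::vector<Voxel>& volume, int dim, float voxel_size,
                    const float* world_to_cam, const Intrinsics& K,
                    const std::vector<float>& depth, int width, int height,
                    float trunc_dist)
{
    for (int z = 0; z < dim; ++z)
    for (int y = 0; y < dim; ++y)
    for (int x = 0; x < dim; ++x) {
        // Voxel center in world coordinates (volume centered at the origin).
        float wx = (x - dim / 2) * voxel_size;
        float wy = (y - dim / 2) * voxel_size;
        float wz = (z - dim / 2) * voxel_size;

        // Transform into the camera frame.
        const float* T = world_to_cam;
        float cxm = T[0]*wx + T[1]*wy + T[2]*wz  + T[3];
        float cym = T[4]*wx + T[5]*wy + T[6]*wz  + T[7];
        float czm = T[8]*wx + T[9]*wy + T[10]*wz + T[11];
        if (czm <= 0.0f) continue;  // behind the camera

        // Project into the depth image.
        int u = static_cast<int>(std::lround(K.fx * cxm / czm + K.cx));
        int v = static_cast<int>(std::lround(K.fy * cym / czm + K.cy));
        if (u < 0 || u >= width || v < 0 || v >= height) continue;

        float d = depth[v * width + u];
        if (d <= 0.0f) continue;  // missing measurement

        // Signed distance along the ray, truncated to [-1, 1].
        float sdf = d - czm;
        if (sdf < -trunc_dist) continue;  // far behind the observed surface
        float tsdf = std::fmin(1.0f, sdf / trunc_dist);

        // Weighted running average: each new view refines the estimate.
        Voxel& vox = volume[(z * dim + y) * dim + x];
        float w_new = 1.0f;
        vox.tsdf   = (vox.tsdf * vox.weight + tsdf * w_new) / (vox.weight + w_new);
        vox.weight = vox.weight + w_new;
    }
}

int main() {
    const int dim = 64, w = 64, h = 48;
    std::vector<Voxel> volume(dim * dim * dim);
    std::vector<float> depth(w * h, 1.5f);              // synthetic flat wall at 1.5 m
    Intrinsics K{50.f, 50.f, w / 2.f, h / 2.f};
    float identity_Rt[12] = {1,0,0,0, 0,1,0,0, 0,0,1,0}; // camera at the origin
    fuse_depth_map(volume, dim, 0.05f, identity_Rt, K, depth, w, h, 0.1f);
    return 0;
}
```

In the full pipeline, this per-voxel update would run in parallel on the GPU, with each integrated frame refining the surface estimate stored in the volume.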
We demonstrate the interactive possibilities enabled when high-quality 3D models can be acquired in real time, including: extending multi-touch interactions to arbitrary surfaces; advanced features for augmented reality; real-time physics simulations of the dynamic model; and novel methods for segmentation and tracking of scanned objects.