By actively scanning I mean: if you were to raise your hand, would the headset produce a new depth map of your hand for every rendered frame, so that it could accurately clip the parts of the AR scene obstructed by it? All of the stage demos they've shown have AR objects mapped onto forward-facing static surfaces (I'd lump your ball/couch example in with those), or the moving robot with tracking LEDs, and none have shown AR objects being occluded by physical objects. Obviously the orientation/position of the HMD is updated frame by frame in real time, but I'm still very doubtful of the speed and precision of the environment mapping, which is the primary hurdle AR faces compared to VR.

This might sound like nitpicking, but robustness and completeness become pretty crucial when you move from carefully chosen demo rooms to real-world offices or homes, and considering the great pains Microsoft took to orchestrate the camera work and presenter positions in the stage demos, I'm guessing they felt it was important to hide this if it is indeed a limitation. However cool it might sound to have virtual objects or screens that stay visible regardless of foreground physical objects, it's incredibly destructive to the perception of the virtual and physical objects existing in the same space, and pretty annoying when your eyes are battling back and forth trying to converge on a particular depth (akin to trying to watch a TV screen with glossy reflections on it).
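For what it's worth, the per-frame clipping I'm describing is conceptually simple once you actually have a fresh depth map every frame; the hard part is producing that map fast and accurately enough. Here's a minimal sketch of the compositing step, just to make the idea concrete (the function name and the numpy framing are mine, not anything the actual device exposes):

```python
import numpy as np

def composite_with_occlusion(virtual_rgb, virtual_depth, sensor_depth):
    """Hide virtual pixels that sit behind real-world geometry.

    virtual_rgb:   (H, W, 3) rendered AR layer for this frame
    virtual_depth: (H, W) depth of each virtual pixel, in meters
    sensor_depth:  (H, W) per-frame depth map of the real scene
                   (e.g. your raised hand), reprojected into the
                   same view as the render
    """
    # A virtual pixel is visible only if nothing physical is closer.
    occluded = sensor_depth < virtual_depth
    out = virtual_rgb.copy()
    out[occluded] = 0.0  # clipped: the real object wins
    return out

# Toy frame: a virtual screen at 2 m, with a "hand" at 0.5 m covering
# the lower-left quarter of the view.
H, W = 4, 4
virtual_rgb = np.ones((H, W, 3))
virtual_depth = np.full((H, W), 2.0)
sensor_depth = np.full((H, W), 10.0)
sensor_depth[2:, :2] = 0.5  # the hand
print(composite_with_occlusion(virtual_rgb, virtual_depth, sensor_depth)[..., 0])
```

The depth test itself is one comparison per pixel; the open question is whether the sensing and meshing pipeline can feed it an up-to-date, hand-resolution depth map at render rate.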
edit: Someone's more comprehensive overview of the tech demo:
http://doc-ok.org/?p=1223
So: no occlusion of AR content by the physical environment, either static or dynamic. Additive display only (no opaque virtual objects, and nothing darker than the physical background behind it). Fixed/single focus plane.
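The "additive only" point is worth spelling out: a see-through display like this can only add light on top of whatever the real world is already putting into your eye, so it physically can't render black or anything darker than the background. A rough illustration (my own simplification, not a model of the actual optics):

```python
import numpy as np

# With an additive see-through display, what you perceive is roughly
# the real background plus whatever the display emits.
def perceived(background, emitted):
    return np.clip(background + emitted, 0.0, 1.0)

dark_room = np.array([0.05, 0.05, 0.05])
bright_wall = np.array([0.9, 0.9, 0.9])
black_virtual_cube = np.array([0.0, 0.0, 0.0])  # "black" emits nothing

print(perceived(dark_room, black_virtual_cube))    # indistinguishable from the room
print(perceived(bright_wall, black_virtual_cube))  # the wall shines straight through
```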