Rather funny video, though I think it highlights some of the problems associated with voice controls. See, with a remote you get some sort of tactile feedback or an instant response that tells you the button you pressed has registered and will be executed. With voice commands, it's not quite as easy. How do you give that sort of feedback, especially if the commands are not limited to single words but can range to more complex phrases?
There is always going to be some latency involved, because while you're still talking the machine has to allow for the possibility that the command isn't complete yet. Sure, it's probably like a search engine: once the mechanism knows you're in the process of issuing a command, the more you say and the more it matches, the closer it narrows things down to the one command you are likely trying to trigger. That lack of tactile feedback, however, is one of the reasons why I don't see voice commands being very ergonomic. Especially given that the mechanism has to tell whether you're talking to yourself (some people do that), talking to other people, or issuing commands.
Sure, if you have to say "Xbox" at the beginning of every phrase or command, the context is somewhat clear, but it can also become tiresome. At least, I found it tiresome hearing him repeat "Xbox" all the time in that video, and it would feel as if you're talking to a dog that isn't quite adept at the commands you are trying to give. Also, the more error-prone the system is, the more inclined you are as a user to put on some kind of "Kinect voice" that may not fit well with the vision of controlling your living room of the future.
Having said that, I wonder if they ever gave much thought to implementing an on-screen display that pops up whenever you're issuing a command, something like an assistant. I.e. when you start issuing a command such as "Xbox go to ...", it shows what it has heard so far and also gives you a quick glance at what could follow to complete the command, similar to the dynamic auto-complete that most search engines or Wikipedia have. At least this would give you feedback while you speak that your Xbox is listening, and some sense of which commands were understood correctly.
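To make the pop-up idea a bit more concrete, here's a rough Python sketch of how such an overlay could narrow things down word by word. Everything in it is made up for illustration: the command list, the function names `suggest` and `overlay_text`, and the wording of the overlay. I have no idea how the actual Xbox handles this, so take it as a sketch under those assumptions.

```python
# Hypothetical command list, purely for illustration;
# the real Xbox vocabulary is not something I know.
COMMANDS = [
    "xbox go to home",
    "xbox go to music",
    "xbox go to video",
    "xbox play",
    "xbox pause",
]


def suggest(heard_so_far, commands=COMMANDS):
    """Return every known command that starts with the words heard so far."""
    heard = heard_so_far.lower().split()
    # Compare whole words, so a half-recognised word matches nothing.
    return [c for c in commands if c.split()[:len(heard)] == heard]


def overlay_text(heard_so_far, commands=COMMANDS):
    """Build the line a (hypothetical) on-screen assistant would show."""
    heard = heard_so_far.lower().split()
    matches = suggest(heard_so_far, commands)
    if not matches:
        return "Not understood: " + heard_so_far
    # Only one candidate left and it is fully spoken: safe to execute.
    if len(matches) == 1 and matches[0].split() == heard:
        return "Running: " + matches[0]
    # Otherwise keep listening and show what could still follow.
    return "Listening: " + heard_so_far + " ... (" + " | ".join(matches) + ")"


if __name__ == "__main__":
    for phrase in ["Xbox", "Xbox go to", "Xbox go to music", "Xbox dance"]:
        print(overlay_text(phrase))
```

The point of the overlay line is that you see right away whether the box is still listening, has picked a command, or didn't understand you at all, which is roughly the feedback a button press gives you for free.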
On the other hand, you need screen space to show this information, and it might not be practical to show this pop-up at all times (for instance while playing a game). So I again wonder if voice recognition is the best path into the future, or if it isn't just a bit too clumsy, error-prone and complex for everyday use.