You brought up good, valid points in the rest of your post, but in the true tradition of forums, I'm only going to answer one small part of it.
MS has two technologies running in the Kinect to deal with the scenarios you posit: echo cancellation and beamforming.
...
bkilian, many thanks for taking the time to explain. I did actually think about echo cancellation, but was unaware of the beamforming methods. Highly fascinating tech for sure.
Just to explain why I'm skeptical and why my posts may come across as a tad too critical: the tech guy in me actually loves the stuff about Kinect. From a technology standpoint, I'm deeply impressed by how the tech has evolved and what potential it offers. Personally, I also prefer it to the Move on PlayStation, since that depends on a piece of hardware (the Move controller) helping out the camera. That's purely personal preference though - and I do admit, both approaches have different strengths and weaknesses.
The reason I'm a bit skeptical is more from a market perspective, as in how Kinect is included in the business approach of Xbox One. I do agree that voice control offers some advantages over the inherent disadvantages of a remote (like switching from a game to a specific channel), but at the same time, I don't think that's such a huge advantage - and that advantage (if you can call it that) still hinges on how well it actually works. Not every living room is the same - and I'm sure there are some living rooms where Kinect will work more effectively and some where it won't. The question is: on average, how well does it work? People won't arrange their living room around Kinect - it has to be the other way around. Most people have a living room that is already set up in a certain way. The more casual the user, the less optimal the set-up will probably be. How well will it work?
I'm not sure if this is an issue in the States, but in Europe (or at least my country) most modern apartments/houses have bigger living rooms with open kitchens. That means the noise level has generally increased, and because the room is bigger, the distance to wherever the mic is located is probably greater. Add to that the fact that as people buy bigger screens, they are more likely to move further away from the screen than closer. I'm sure this is to some degree solvable (or the effect minimized) by the techniques you mentioned, but the challenge doesn't stop there:
There's probably a large difference between how well a command is understood and how well arbitrary strings/words are translated into correct letters and spelling on the screen. That's why I'm assuming that the basic interface will work well: with 20 to 100-odd voice commands, the input is predictable to some degree and the error margin is smaller. It gets interesting as soon as we are talking about strings that are not basic commands. "Channel X" is already a bigger challenge (where "channel" is a fixed command and X a string) - or "program Y", where "program" again is a fixed, hard-coded command and Y is a string like "Dr. House" where the program itself is perhaps only named "House", which might also match other programs running at the same time on multiple channels. These are just some examples of the challenges awaiting. I'm sure I could think of more complex ones. Then there is the question of whether these programs run through some Xbox TV app or something - and how voice commands are linked with dynamic content. If I am to take a guess, it all comes down to how well the technology translates spoken strings into correctly spelled words, which are then matched against the TV guide to see what fits, with the highest-scoring match being displayed (like a modern movie scraper, for instance). Sounds great in theory, but how well will it work, especially if there are multiple hits etc. - or, more crucially, if the strings are misunderstood, misspelt and not found?
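To make that guess a bit more concrete, here is a minimal Python sketch of the scraper-style matching I have in mind. This is purely my assumption of how such a lookup could work, not anything Microsoft has described: `rank_titles`, the `guide` list and the 0.6 threshold are all made up for illustration, and the similarity score comes from the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def rank_titles(heard, guide, threshold=0.6):
    """Return (title, score) pairs scoring at or above threshold, best first.

    `heard` is the text the speech recognizer produced; `guide` is a list
    of program titles. Both are hypothetical inputs for this sketch.
    """
    heard_norm = heard.lower().strip()
    scored = []
    for title in guide:
        # ratio() is 1.0 for identical strings and 0.0 for nothing in common.
        score = SequenceMatcher(None, heard_norm, title.lower()).ratio()
        if score >= threshold:
            scored.append((title, score))
    # Highest similarity first, like a scraper picking its best hit.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up guide data, only to show the ambiguity problem.
guide = ["House", "House Hunters", "Full House", "Doctor Who"]
matches = rank_titles("Dr. House", guide)
not_found = rank_titles("zzzz", guide)  # nothing similar enough, so empty
```

With this toy guide, "Dr. House" ranks "House" first, but "Full House" also clears the threshold, which is exactly the multiple-hits problem. A badly misheard string comes back with no matches at all, and then the system has to decide what to do next.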
Also, surely, there must be some rules. I doubt you can simply just talk to your Xbox and it will sort out the rest. I don't think this will be possible because the Xbox is a machine without any form of intelligence - thus, context is difficult to put into commands. So you might have rules or boundaries where a command needs to be spoken in a specific order if you want it to work, at least for higher accuracy.
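As a rough illustration of what I mean by rules, here is a small Python sketch of a fixed-order command grammar. The wake word, the three verbs and the name `parse_command` are my own assumptions for the example, not the actual Xbox One command set.

```python
import re

# Hypothetical grammar: wake word first, then one fixed verb, then free text.
COMMAND = re.compile(r"^xbox,? (?P<verb>watch|channel|program) (?P<target>.+)$")


def parse_command(utterance):
    """Return (verb, target) if the utterance follows the grammar, else None."""
    match = COMMAND.match(utterance.lower().strip())
    if match is None:
        return None
    return match.group("verb"), match.group("target")


in_order = parse_command("Xbox, program Dr. House")
out_of_order = parse_command("Put Dr. House on, Xbox")
```

The first utterance follows the expected order and parses; the second means the same thing to a human but is rejected because the words are in the wrong order. That is the kind of boundary I would expect users to run into.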
As I already mentioned - I find this tech extremely fascinating from a technical point of view - but more to the point, if this doesn't work well on a daily basis for the casual consumer, people will stop using it. *If* the gamepad has to be used (or will be, purely out of convenience) because voice control isn't that accurate in real-world usage, then surely that defeats the whole point of having it in the first place: people will instinctively grab the gamepad instead of fumbling with voice commands that on average don't get the desired results. It's only a good replacement if it's a full replacement.
I actually think the prospect of being able to install an Xbox controller app on your smartphone or tablet (any OS, mind you) is far more exciting than any promising voice-recognition tech. But has Microsoft even suggested that something like this is in the works? I haven't heard anything, although IMO this would be a brilliant move, probably easy to realize technically - and a whole lot more resource-friendly too. Although not as exclusive as voice commands.
I see why Microsoft is betting on it. It's their ace, their joker card so to speak, offering something that is not yet on the market to this extent. But going by how they demonstrated it, it's also in their interest to make you think it works flawlessly. And judging by the euphoric responses from a few in this topic, I think a lot of people are focusing on the "potential", the endless possibilities, looking through rose-tinted glasses with a strong belief that it'll just work as portrayed in so many science-fiction movies. I think once it comes out, some people will perhaps be very disappointed, simply because the expectations of what it can do will far exceed what it actually does. And reality will set in on how limited it ends up being, simply because the complexity of voice recognition is huge and a whole lot less predictable, unless you set certain boundaries or rules on how to use it. At that point it becomes a glorified super remote control with some inherent flaws along the way. Or I might be all wrong and this will turn out to be the biggest thing since the invention of the IR remote. But when thinking of the technical challenges, I somehow doubt it.
Of course, having said that, even if it falls short of expectations, whatever the results, the tech and research done will still be hugely fascinating and impressive.
Now, to another point I wanted to raise - Kinect in games. Is anyone really excited about this? Core gamers?
I bought the PS EyeToy back on the PS2 and was fascinated by the minigames and the challenge of moving your hands in front of a screen. That was when I still played my games in front of a small TV in a small room where I could set it up. Then I got bored with it. I got bored with moving my hands and my head to actually play a game. It was good exercise, but it wasn't what I bought a console for, not "gaming" in the sense I am used to and want to play. It's a fun toy for when you have people over and want to play some party games, but on the whole - I just prefer to sit or lie on my couch and play games comfortably.
Now, I know the Wii is a huge success, but despite this, I really wonder how often these games get played. Sure, they had a lot of sales, but all this shows is that everyone bought something that was considered cool and new. Just like I bought EyeToy and count as a +1 sale, I'm sure many people did the same with the Wii. It's cheap, small and refreshingly new. But does it actually get used a lot? I'm willing to bet that games get played far more on X360 and PS3 - and without looking, I would also think that the ratio of software sales to userbase is higher on these platforms than on Wii.
So, while the world (well, Microsoft and Nintendo) is betting on these new casual games, I'm just not too sure. Sure, they get sold, but do people really want them as much as they want consoles for playing core-oriented games? I guess that could be irrelevant for Microsoft, because a sale is a sale, regardless of whether it's used a lot or not. As a gamer though, I just don't want my games to move in that direction. I think it's a short-lived prospect, one that could do more harm than good. I'd rather we split them up: if people want to buy these kinds of games, go ahead, get a Wii or get the add-on. But let those of us who want to and prefer it that way play games with a controller (and only a controller). On the same note, even if Kinect has something like 20 million sales - how many were bundled? And how many bought it simply because it was a fascinating accessory and a portal to new and refreshing games? And how many still use it at all?
I'm not sure, but I do think tablets and smartphones are where this casual market is heading. I'm not sure Kinect or Move is the future. They may introduce some exciting features in core games, but on the whole, I don't think the tech (specific to games) is that revolutionary a concept. It's refreshing and new, but it gets old and tiresome quickly.