Am I understanding correctly that it is crucial for the loudness at which you issue your commands to be calibrated properly? From previous posts, I was under the impression that this tech is good enough to work well regardless of whether I am sitting in the same spot where I calibrated it or moving around the living room, and regardless of the room itself, given that living rooms vary in size from small to large and have different ambient noises. It might be easy to filter out sounds going through the Xbox, but what about other noises, such as a noisy road outside?
I guess that's something you can't say, but judging by that video, the commands don't work well enough even with what I take to be reasonable calibration in a more or less quiet room. What about less optimal circumstances (e.g. large room, more ambient noise, people moving around)?
I'm not disputing that the tech can work well, but I question how often this is the case in the average living room out there. How well was this tested? Did MS test it in the field (e.g. employees' homes), or were the tests confined to offices where these conditions are simulated to some degree?
No, the loudness of the audio coming out of the speakers has to be lower than the volume of the calibration tones, or echo reduction performance degrades very quickly. They use beamforming to reduce the effect of ambient noise, but the louder your room is with sounds not generated by the Xbox, the more your performance will degrade.
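For anyone wondering what beamforming buys you, here is a minimal sketch of delay-and-sum, the simplest textbook form of it. I am not claiming this is what the Kinect pipeline does; the tones, the four-mic layout, and the per-mic delays below are all made up for illustration. The idea is to time-align the channels for the direction the voice comes from, so the voice adds up while sound from other directions partly cancels.

```python
import math

def delay_and_sum(channels, delays):
    """Advance each channel by its steering delay (in samples), then average."""
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(usable)
    ]

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# Hypothetical test signals: a slow "voice" tone and a faster "noise" tone.
def voice(t):
    return math.sin(2 * math.pi * t / 50)

def noise(t):
    return math.sin(2 * math.pi * t / 8)

VOICE_DELAYS = [0, 1, 2, 3]  # arrival delay at each mic, in samples (assumed)
NOISE_DELAYS = [0, 3, 6, 9]  # noise arrives from a different direction (assumed)
N = 400

# Each mic hears both sources, each shifted by that source's arrival delay.
mics = [
    [voice(i - dv) + noise(i - dn) for i in range(N)]
    for dv, dn in zip(VOICE_DELAYS, NOISE_DELAYS)
]

steered = delay_and_sum(mics, VOICE_DELAYS)
clean = [voice(i) for i in range(len(steered))]

noise_before = rms([m - c for m, c in zip(mics[0], clean)])
noise_after = rms([s - c for s, c in zip(steered, clean)])
print(f"noise level at one mic:    {noise_before:.3f}")
print(f"noise level after steering: {noise_after:.3e}")
```

The noise cancels almost completely here only because I picked a single tone and delays that line up perfectly out of phase. Real room noise is broadband and comes from many directions, so the real-world reduction is partial, which is why a louder room still hurts.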
We used to get reports of people complaining about bad speech performance, and when we investigated, we would get a story like "Well, I saw it said to calibrate loudly, but it was late at night and I didn't want to wake the baby". Like I said, inability to follow simple directions.
There was definitely testing done in real living rooms. There are at least two official setups in a house Microsoft owns for testing purposes. There are three or four specially built rooms on campus configured as living rooms of different sizes, each with two or three independent audio systems set up. On top of that, there were hundreds, if not thousands, of users in the Beta using the product in their actual homes, with it sending back telemetry all the time.
As an aside, the MS offices are basically the absolute worst case for echo reduction code: they're small, so the echo return is right on the heels of the original sound, and they have stark walls and glass, so they produce multiple echo returns for every noise. The tech works significantly better in real environments.
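To put rough numbers on "right on the heels", here is a quick back-of-the-envelope calculation. The wall distances are my own guesses for a small office and a typical living room, not measurements of any actual test room, and it only looks at a single straight-out-and-back reflection.

```python
SPEED_OF_SOUND_M_S = 343.0  # dry air at roughly 20 °C

def echo_delay_ms(wall_distance_m):
    """Round-trip time for sound to reach a wall and come back, in ms."""
    return 2.0 * wall_distance_m / SPEED_OF_SOUND_M_S * 1000.0

office = echo_delay_ms(1.5)       # assumed small-office wall distance
living_room = echo_delay_ms(4.0)  # assumed living-room wall distance
print(f"office: {office:.1f} ms, living room: {living_room:.1f} ms")
```

With those assumed distances the office echo comes back in under 10 ms, while the living-room echo takes more than twice as long, and soft furniture absorbs more of it along the way.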
Until just a few years ago, multichannel echo reduction was considered a theoretical impossibility. Physics limits how well you can do it, and it can never be perfect. The Kinect audio pipeline actually surprised a number of audio researchers, because, if calibrated correctly, it manages to almost reach the theoretical best you can do most of the time. Unfortunately, it's like Prius fuel economy: it's so finely tuned that any disruption in its calibration can have a surprisingly large effect on performance.
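A toy model shows why small calibration errors matter so much. This is not the Kinect algorithm, just the simplest possible canceller I can write down: assume the echo is the speaker signal scaled by one gain, and the canceller subtracts the speaker signal scaled by whatever gain it measured at calibration. The function name and the gain values are made up for the example.

```python
import math

def suppression_db(true_gain, calibrated_gain):
    """Echo suppression in dB for a one-tap canceller.

    The leftover echo is proportional to the gap between the real echo
    gain and the gain measured during calibration.
    """
    residual = abs(true_gain - calibrated_gain)
    if residual == 0:
        return math.inf  # perfect match, nothing left over
    return 20 * math.log10(abs(true_gain) / residual)

TRUE_GAIN = 0.5  # hypothetical echo path gain

one_percent_off = suppression_db(TRUE_GAIN, TRUE_GAIN * 0.99)
ten_percent_off = suppression_db(TRUE_GAIN, TRUE_GAIN * 0.90)
print(f"1% calibration error:  {one_percent_off:.1f} dB of suppression")
print(f"10% calibration error: {ten_percent_off:.1f} dB of suppression")
```

In this model, going from a 1% error to a 10% error costs about 20 dB of suppression, which means ten times more echo amplitude leaking into the speech recognizer.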