I've always done this. Personally, I wouldn't mind if the calibrations were more involved. When I set up MacScribe and Mac Dictate, those took about 15-80 minutes to learn my voice. I'm pretty impressed at how accurate the 360 voice control can be at times, but man, do I wish there were profiles tied to accounts, lol. My son uses voice controls ALL THE TIME in his room, and sometimes the living room 360 will pick it up and perform the action he stated.
I'm impressed. I have trouble getting it to recognise anything I say. (Comes from having an en-us account and an en-za accent.) The way we model it, it uses similar technology to, say, a Bayesian spam filter: you feed in enough examples of people saying the right phonemes, and it uses those to statistically map a person's speech to those phonemes. With enough samples and enough processing, it gets scarily accurate, but it does need to be tuned to a specific accent.
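To make the spam-filter analogy concrete, here's a toy naive Bayes classifier over discretized "acoustic features." This is not the actual Kinect pipeline (real recognizers use HMMs/GMMs over continuous audio), and all the feature and phoneme names are made up; it just shows the kind of statistics involved, and why the training accent matters.

```python
# Toy illustration of the Bayesian-spam-filter analogy: given labeled
# examples of (discretized acoustic features, phoneme) pairs, estimate
# P(feature | phoneme) and pick the most likely phoneme for new input.
# Feature names like "low_f1" are invented for this sketch.
from collections import Counter, defaultdict
import math

class NaivePhonemeClassifier:
    def __init__(self):
        self.phoneme_counts = Counter()             # how often each phoneme appears
        self.feature_counts = defaultdict(Counter)  # feature counts per phoneme

    def train(self, samples):
        """samples: iterable of (features, phoneme) pairs."""
        for features, phoneme in samples:
            self.phoneme_counts[phoneme] += 1
            for f in features:
                self.feature_counts[phoneme][f] += 1

    def classify(self, features):
        total = sum(self.phoneme_counts.values())
        best, best_score = None, float("-inf")
        for phoneme, count in self.phoneme_counts.items():
            # log prior + log likelihood, with add-one smoothing so that
            # unseen features don't zero out a candidate
            score = math.log(count / total)
            n = sum(self.feature_counts[phoneme].values())
            vocab = len(self.feature_counts[phoneme]) + 1
            for f in features:
                score += math.log((self.feature_counts[phoneme][f] + 1) / (n + vocab))
            if score > best_score:
                best, best_score = phoneme, score
        return best

clf = NaivePhonemeClassifier()
clf.train([(("low_f1", "high_f2"), "iy"),
           (("high_f1", "low_f2"), "aa"),
           (("low_f1", "high_f2"), "iy")])
print(clf.classify(("low_f1", "high_f2")))  # -> iy
```

Since the probability tables come entirely from the training samples, a model trained mostly on en-us speech will systematically mis-score an en-za speaker's vowels, which is the tuning problem described above.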
We specifically didn't want to have to include personalized voice tuning, since that adds a barrier to entry that would scare away a lot of users (and, in general, a user will adapt his speech to the system much faster than the system could adapt to his speech).
We also had ideas about limiting voice input to the direction of the currently active skeleton, but that broke some important scenarios and would have required titles that only wanted voice to also spin up the skeleton pipeline. We played around with doing Voice ID too, which would have let people sign in by voice (think Mission Impossible) and could have been used to limit commands to the currently signed-in user (by voiceprint). But there are only so many hours in the day, and not every awesome idea can get the resources it needs.
If you opt in to the "help improve speech" stuff (in your profile), we use the data gathered from that to fine-tune the models and make them more accurate. Voice collection (in the previews) is used by the test team to verify the model quality. We post-process the data and feed it through the pipeline to test how many false rejects and false accepts we get. It's also used to help tune the models and give us a bigger sample of rooms for echo cancellation testing.
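The offline scoring step described above can be sketched as a simple counting exercise: replay recorded utterances through the model, then tally false accepts (an out-of-grammar utterance triggered a command) and false rejects (a valid command was missed). The function name and the trial data below are invented for illustration, not the actual test harness.

```python
# Minimal sketch of scoring a recognizer on replayed, labeled trials.
# Each trial is (should_accept, was_accepted): the ground-truth label
# and what the model actually did.

def score_trials(trials):
    false_accepts = sum(1 for should, was in trials if was and not should)
    false_rejects = sum(1 for should, was in trials if should and not was)
    n_negative = sum(1 for should, _ in trials if not should)
    n_positive = sum(1 for should, _ in trials if should)
    return {
        "false_accept_rate": false_accepts / n_negative if n_negative else 0.0,
        "false_reject_rate": false_rejects / n_positive if n_positive else 0.0,
    }

# Made-up results: 4 valid commands (1 missed), 4 invalid utterances (1 triggered).
trials = [(True, True), (True, False), (True, True), (True, True),
          (False, False), (False, True), (False, False), (False, False)]
print(score_trials(trials))  # -> {'false_accept_rate': 0.25, 'false_reject_rate': 0.25}
```

Tuning the models then becomes a matter of trading these two rates off against each other across many recorded rooms and speakers.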
Interesting side note: identical twins can fool Kinect ID 100% of the time, but they never fooled Voice ID. Also, our Kinect ID testers used Nixon, Reagan, and Clinton masks during testing so different testers could test the same profile.