Poll: How accurate is XB1 voice control?

How accurate is XB1 voice control for you?


  • Total voters
    32
What device do you own where the remote control isn't reliable? Low batteries excepted, that isn't normal, and stuff like that drives me nuts. NUTS! But I admit to having a low tolerance for things not responding.

No problems with texts, though. Practice makes perfect!

EDIT: I do recall a JVC DVD player that used a tiny remote powered by one of those circular watch batteries. I tolerated it because it played back DivX from CDs or DVDs, but yeah - the range on that thing was pretty poor.

My Samsung TV, my Motorola DVR, my Toshiba TVs and various DVD players all have a less than 100% success rate at performing the requested command, for various reasons (low battery, crap buttons, line-of-sight issues, input issues).
 
What does surprise me is the acceptance of imperfection in the implementation. How many of us would long tolerate a remote control or controller that demonstrated a mere 5% failure rate?

Because most people understand that voice control is a more complex endeavor than a simple remote. And voice control requires training and huge data sets to get better. Releasing a v.1 of voice that's foolproof is practically an impossible task.

Plus, the 95% isn't tied to commands across the board but to more specific commands. It's more like having a couple of "less than perfect" buttons on an otherwise reliable remote.

Furthermore, regardless of how great most simple remotes are when it comes to registering inputs, all it takes is a person standing here or an object placed there to turn a simple operation into an arm-waving and body-stretching exercise. So a remote that doesn't respond isn't something general users aren't used to experiencing. Users tend to be more forgiving with these types of devices than with something like keyboards, where we are conditioned to expect 100% all the time.
 
Of course, the gold standard really is "Is using my voice faster than using the controller?" For most actions, it absolutely is, especially if your controller is currently off from watching a movie (and I think it takes 3-5 seconds from turning on the controller before it will respond to button presses). So for tasks like switching back and forth between apps, snapping apps to the side, searching and using the game DVR, a controller will often be slower, even if the Kinect fails picking up your voice once.
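The "still faster even with one miss" claim can be checked with a bit of arithmetic. This is only a rough sketch: it assumes you repeat a command until it registers, and the 2 seconds per spoken attempt and 2 seconds per controller action are made-up placeholder timings. Only the 95% figure and the 3-5 second wake-up estimate come from this thread.

```python
def expected_voice_seconds(success_rate: float, seconds_per_attempt: float) -> float:
    """Average time when a command is repeated until it is recognized.

    With independent attempts, the expected number of tries is 1 / success_rate.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return seconds_per_attempt / success_rate


def controller_seconds(wake_seconds: float, action_seconds: float) -> float:
    """Time to wake a sleeping controller and then perform the action."""
    return wake_seconds + action_seconds


# Hypothetical timings: 2 s per spoken attempt, 2 s per controller action.
voice = expected_voice_seconds(success_rate=0.95, seconds_per_attempt=2.0)
pad = controller_seconds(wake_seconds=3.0, action_seconds=2.0)

print(f"voice: {voice:.2f} s, controller: {pad:.2f} s")
```

Under those assumed timings, even a 50% hit rate averages 4 seconds per command, which still beats a controller that has to wake up first. If the controller is already on, the comparison gets much closer.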

I'm just wondering if the software will improve such that new hardware isn't needed to pass that 95% clearance even for difficult situations. As impressive as Kinect v2 is, I hope the hardware is good enough for 8-10 years of software improvements (keep in mind that Kinect v1 only lasted 3 years, though it may have been seen as a beta product in some instances).
 
Because most people understand that voice control is a more complex endeavor than a simple remote. And voice control requires training and huge data sets to get better. Releasing a v.1 of voice that's foolproof is practically an impossible task.
Hmmm.. I doubt that "most people" really understand the complexities. And agreed on the second but this is Kinect 2 ;-)

Furthermore, regardless of how great most simple remotes are when it comes to registering inputs, all it takes is a person standing here or an object placed there to turn a simple operation into an arm-waving and body-stretching exercise.

IR remotes have been around since the 70s; I doubt there are many people who aren't by now used to the basic limitations of the technology, i.e. you need line of sight or, for a lot of remotes these days, something to bounce the IR beam off. The differentiator is that the technology is predictable, and the problem with voice is that it isn't. Ever-so-slight volume or inflection variations can mean it'll work one time and fail the next. Then throw in environmental factors. It's amazing it works as well as it does!

Predictive text and those Swype-like interfaces make plenty of mistakes, plus editing them after the fact is often awkward.
That's true of voice input too. So isn't that a double standard?

Predictive text and auto-correct are not great but you can turn them off. I use the basic iPhone keyboard and before that the BlackBerry and never had problems with either unless, of course, I hit the wrong key. That's a user failure rather than the technology, i.e. I don't hit a 'W' and get an 'E'. Generally I want technology to be predictable: if you know what the device will do for any given input, you're less likely to get frustrated. But that could just be me. I never 'feel lucky' with Google ;-)

Of course, the gold standard really is "Is using my voice faster than using the controller?" For most actions, it absolutely is, especially if your controller is currently off from watching a movie (and I think it takes 3-5 seconds from turning on the controller before it will respond to button presses).
Yup and these are the only circumstances when I use Siri and 9/10 times this is: "set timer to 10 minutes", "set calendar appointment on Thursday at four pm for dentist". Both are reliable (for me) and both are quicker than using the touch UI.
 
I wonder if the reason for that is that "Xbox On" lacks access to the cloud solution. I wonder how Kinect VC would work with no internet access.
I tried Xbox One without a connection and voice control works the same way.

I tried every single command I could, many of them new to me (there are a lot of commands I didn't know about), including commands related to troubleshooting the network and things like that. They worked really well.

As for Xbox Snap, I hadn't had a single problem with that command until I began to say it more quickly, which I started doing pretty recently (I used to speak to Kinect more slowly). If I space out the Xbox from Snap it works without a hitch.

Spacing out commands like that works without a hitch, and everything goes smoothly for me.

I gotta try Portuguese, which is a language I like, and I understand about 95% of it in real life if a native speaker is talking to me. The problem is my accent, and also that I know what the words mean for the most part but can't say them, because I use other terms and get confused.

I lived in Portugal for 5 months and I think I should live there longer in order to learn those words and use them on a daily basis.
 
Predictive text and auto-correct are not great but you can turn them off. I use the basic iPhone keyboard and before that the BlackBerry and never had problems with either unless, of course, I hit the wrong key. That's a user failure rather than the technology, i.e. I don't hit a 'W' and get an 'E'.
The keys are very small. It's easy to hit a wrong key. That's a limit of the tech, which led to the invention of Swype-style keypads that solve the issue of small keys but with some inaccuracy.

An alternative would be an on-screen keyboard with four large direction keys to move and a select key. That'd be 100% accurate but terribly slow.

Same with voice input. You could ask the user to type in the name of the programme they are looking for, and hope they know how to spell it accurately enough to find it. Or you can provide a voice input that's not 100%, just like the shortcuts they use every day on mobiles.

Yup and these are the only circumstances when I use Siri and 9/10 times this is: "set timer to 10 minutes", "set calendar appointment on Thursday at four pm for dentist". Both are reliable (for me) and both are quicker than using the touch UI.
And that's what users of Kinect are telling us they are experiencing. For most stuff they want to do, there's very high accuracy, and if it misses, a second attempt scores a hit which is still faster than many console-based keypad entries.

I don't see any logic to your '5% should be unacceptable' argument. I'm one who doubted the worth of Kinect's voice input, mostly because I don't want to talk to my console, but I'll freely admit that if it works, it works (perhaps helped by the XB1 UI being a bit terrible when not using voice, by all accounts :p).
 
Talking to your console is indeed weird.

But so is playing video games in your 40's, at least according to some.
 
The keys are very small. It's easy to hit a wrong key. That's a limit of the tech, which led to the invention of Swype-style keypads that solve the issue of small keys but with some inaccuracy.
The keys are small, but if somebody buys a phone with a tiny keyboard, then it's not unrealistic to expect to have to be more careful typing. Phones are not magical, I don't care what Steve Jobs said. I don't expect miracles; I do expect consistent predictability.

I don't see any logic to your '5% should be unacceptable' argument. I'm one who doubted the worth of Kinect's voice input, mostly because I don't want to talk to my console, but I'll freely admit that if it works, it works (perhaps helped by the XB1 UI being a bit terrible when not using voice, by all accounts :p).
You misunderstand me; please read my initial post again. I didn't say it was unacceptable, I said I found it interesting that people would accept even a 5% failure rate in their control device. But as I said, this is one of those low tolerance things for me.

But so is playing video games in your 40's, at least according to some.
I'm 42 and I still encounter the odd person who thinks video games are for kids, but it's clear that the last 20 years of video games have passed them by and they still think video games are like Sonic, Mario Kart or Wolfenstein. Mercifully I'm not the oldest of my game-playing friends! And the nice thing about being 42 is that I've been working so long that I can buy all the games I want. It's just a shame I no longer have the time to play them!

Bless them.
 
You misunderstand me, and please read my initial post again I didn't say it was unacceptable
You found it surprising people consider the fault rate acceptable. You expected the opposite. The opposite of acceptable is unacceptable. Ergo, you expected people to find it unacceptable, which you wouldn't think if you consider the failure rate acceptable. You wouldn't say, "I think 5% failure rate is acceptable, but I'm surprised if anyone finds 5% failure acceptable." ;)

I said I found it interesting that people would accept even a 5% failure rate in their control device. But as I said, this is one of those low tolerance things for me.
And as I say, people accept low(er) accuracy every day in other devices. Hell, my keyboard+typing probably doesn't hit 90% accuracy! Near 100% accuracy on the most important and most used functions is very good and usable.
 
I think about the only devices that are basically as foolproof as you can get are hardware PC keyboards and mice. But even they aren't perfect.

Just about every other device comes with its own particular set of issues. Universal and cable STB remotes have incomplete or imperfect mapping of controls. Touch screens in general have had a ton of issues that have been resolved through maturation of the technology.

Kinect and voice recognition in general have their own particular set of issues; the inability to recognize commands due to vocalization errors is the biggest one. Furthermore, it's unlikely that voice commands will be as reliable as a hardware keyboard and mouse in the near future. The push to expand the vocabulary of voice solutions will trump any desire to push accuracy to 99-100%, given that the two are inversely related.
 
You found it surprising people consider the fault rate acceptable. You expected the opposite. The opposite of acceptable is unacceptable. Ergo, you expected people to find it unacceptable, which you wouldn't think if you consider the failure rate acceptable. You wouldn't say, "I think 5% failure rate is acceptable, but I'm surprised if anyone finds 5% failure acceptable." ;)

I'm more than happy to say a 5% failure rate is unacceptable. To me. It's why I barely use Siri except for a scant few commands that it gets right every time, but I'm quirky like this; it's a subjective thing. But to use a remote control which missed 1 command in every 20? Yikes! I had a Sky STB remote where the '1' button needed a lot of pressure to register; that lasted about a day before I ordered a new one.

However my surprise was at the acceptance of the poll results where the figures are above a rate of 5%. There are [currently] no 100% votes (kudos for honesty), twelve 90% votes, four 75-80% votes and one 50% vote.
 
It's pretty compelling when it works.
I have issues which I'm sure are largely due to my accent: I can't get Xbox pause to work unless I pronounce pause with a very bad American accent. Despite that, I use it as my Netflix client of choice.
 
I should state my surprise isn't limited to Kinect. Siri is the same. Folks now seem to be more accepting of technology that doesn't work quite as well as demonstrated. Given the propensity of people to complain about anything, stuff like this throws me a curveball.
 
Why is it surprising? It's easy to like something that makes your life easier even if it's not 100% accurate. It's that 75-90% of the time when it works that makes it feel all magical & makes you feel all goodie inside. There's never that feeling when a remote, mouse or keyboard works. You always expect it to work. I would probably say the same regarding Kinect games too. When it works, it's the future.

Tommy McClain
 
I should state my surprise isn't limited to Kinect. Siri is the same. Folks now seem to be more accepting of technology that doesn't work quite as well as demonstrated. Given the propensity of people to complain about anything, stuff like this throws me a curveball.

There are lots of situations where slightly less accurate means more than twice as fast to accomplish.

To go back to my example of muting sound on my console/TV when I get a phone call from someone.

"Xbox, mute"

versus.

Look for remote. Walk over to it and pick it up. Find the mute button. Press button.

As I'm not always sitting on my couch an arm's length away from the remote.

Same thing for turning the console on and off.

"Xbox On" and "Xbox, Turn Off"

Are significantly faster than either manually pressing the button on the console or powering on the controller to accomplish the same task. And that goes for a lot of the commands.

"Xbox, Switch" to switch between snapped and active app is significantly faster than using the controller.

http://support.xbox.com/en-US/xbox-one/kinect/voice-commands

Most of those commands are significantly faster than doing them via controller. There are a few exceptions though. "Xbox, Go Home" is arguably just as fast as pushing the Xbox button on the controller. Assuming the controller is on. If it isn't, as when watching a movie for example, then "Xbox, Go Home" is significantly faster.

I should make a separate post for the following, but don't feel like it. :p

For those having trouble with "Xbox On". Note the deliberate lack of a comma. It is like that on the Xbox site as well. That denotes that unlike normal "Xbox, ..." commands there should not be a deliberate pause between Xbox and On.

For all other Xbox commands while the machine is on, you will greatly increase your accuracy if you put a slight pause (as if there was a comma there. :)) between Xbox and the command.

That's because "Xbox" is to signal the console to start accepting voice commands. Once in that mode commands don't require deliberate pauses between words. If you don't have a pause you may have started saying a command milliseconds before the system is ready to start processing voice commands.

As I've said before, I think that's what trips people up the most with "Xbox On". After they've used voice commands for a bit, they are conditioned to put a bit of a pause after Xbox, which then makes "Xbox On" less reliable, since the console is expecting "Xbox On" and not "Xbox, On". And that's all because "Xbox On" is a system level command just like "Xbox".

In other words, the console is always listening for either "Xbox On" or "Xbox" and nothing else. Only when it hears "Xbox" while running will it then accept user voice commands, which is everything else.

Just say "Xbox On" as you would in a normal sentence without a pause and it should work 100% of the time. Assuming it doesn't have a problem with your accent. :)
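The two-stage listening described above can be pictured as a small state machine. This is only a toy model of that description, not how the console actually implements it; the class, state names and return strings are all made up for illustration, and each call to `hear` stands for one unbroken utterance (a pause splits the speech into separate calls).

```python
from enum import Enum, auto


class State(Enum):
    OFF = auto()        # standby: only "xbox on" is recognized
    IDLE = auto()       # running: only the keyword "xbox" is recognized
    LISTENING = auto()  # keyword heard: the full command set is accepted


class VoiceModel:
    """Toy model of the listening behaviour described above (not real Xbox code)."""

    def __init__(self) -> None:
        self.state = State.OFF

    def hear(self, phrase: str) -> str:
        phrase = phrase.strip().lower()
        if self.state is State.OFF:
            # "Xbox On" has to arrive as one unbroken phrase.
            if phrase == "xbox on":
                self.state = State.IDLE
                return "powered on"
            return "ignored"
        if self.state is State.IDLE:
            if phrase == "xbox":
                self.state = State.LISTENING
                return "listening"
            return "ignored"
        # LISTENING: whatever comes next is treated as the command.
        self.state = State.OFF if phrase == "turn off" else State.IDLE
        return f"command: {phrase}"


console = VoiceModel()
print(console.hear("Xbox On"))  # one phrase, no pause
print(console.hear("Xbox"))     # keyword, then a short pause
print(console.hear("Switch"))   # the actual command
```

In this model, saying "Xbox, On" with a pause to a console in standby arrives as "Xbox" followed by "On", and both are ignored, which matches the trouble people report.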

Regards,
SB
 
It's pretty compelling when it works.
I have issues which I'm sure are largely due to my accent: I can't get Xbox pause to work unless I pronounce pause with a very bad American accent. Despite that, I use it as my Netflix client of choice.

I find Netflix's layout maybe isn't best suited for voice control, but mixing voice and gesture control is very good. I just don't like saying "next page" over and over again, or whatever the command is. I find scrolling with the gesture is easier. Voice control is good for pausing and everything like that in the video player.
 
I'd like to point out a bug, that I think is worth mentioning.

When I was first playing around with snapping apps, I'd say, "Xbox, snap Skype" and it kept snapping Skydrive instead. It was only when I went to try to snap Skype using the controller UI that I realized you can't snap Skype. To me, that voice command should have failed, not given me a false positive. There are definitely some places where they need to fix things.

I'd like to see more feedback when issuing commands. Kinect should show me what it heard when it fails. Show me something like, "Heard: Xbox snap <unintelligible>". That way I can start to figure out which particular words I say in a way it doesn't understand. They show the text for successful commands. Should be easy enough to show some text for failures as well.
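Both suggestions, failing instead of guessing and showing what was heard, come down to a stricter match threshold plus a feedback message. Here is a minimal sketch of the idea. It uses plain string similarity from Python's `difflib` as a stand-in for whatever confidence score a real recognizer produces, and the app list, function names and message wording are all hypothetical.

```python
import difflib
from typing import List, Optional

# Hypothetical list of snappable apps; Skype is deliberately absent.
SNAPPABLE_APPS = ["internet explorer", "music", "skydrive", "tv"]


def resolve_app(heard: str, apps: List[str], cutoff: float) -> Optional[str]:
    """Return the best-matching app name, or None if nothing clears the cutoff."""
    matches = difflib.get_close_matches(heard.lower(), apps, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def snap(heard: str, cutoff: float) -> str:
    """Snap the matched app, or report what was heard when nothing matches."""
    app = resolve_app(heard, SNAPPABLE_APPS, cutoff)
    if app is None:
        # Report what was heard instead of guessing.
        return f'Heard: "Xbox snap {heard}" - no snappable app by that name'
    return f"Snapping {app}"


print(snap("Skype", cutoff=0.6))  # lenient: picks the nearest name
print(snap("Skype", cutoff=0.8))  # strict: rejects and reports
```

With the lenient cutoff, "Skype" resolves to "skydrive", the same kind of false positive described above. The strict cutoff turns it into a visible failure. The trade-off is that a stricter threshold also rejects more genuine commands from people with accents, so the right value is a judgment call.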
 
Yeah that would be cool if there was an option you could enable in settings to have the Xbox display what it thought you said.

Regards,
SB
 
I get some misses, but it is me and not the Kinect. However, I was not expecting it to be so (think of a word here) more involved to get to where I wanted to go. VC should be more natural than this, and my wife and son for example are really put off by it.

We do really notice how awful (for lack of a better word) Kinect V1 was with missing commands. I hear my wife in the other room yelling at her 360, and then yelling at me (not really).

Cortana integration should be interesting, and I go off-topic on wondering where the home automation for Kinect is at. If I could control my Nest t-stat today that would be...mind blown! I want v-crib controlled by my X1 and Kinect. ;)
 