Poll: How accurate is XB1 voice control?

How accurate is XB1 voice control for you?


Total voters: 32
You just say Xbox, not Xbox One, and I don't know about you, but it takes about half a second in a normal cadence, even with a slight pause, for me to say Xbox and then be ready for part 2 or the command. Slowing it down to a full second, i.e. speaking only one word per second, feels highly unnatural.

I have no idea where you pulled the 2-3 seconds, or worse yet 5 seconds, from. I can guess.

BTW, for fun, in that 5 second window I was able to comfortably speak aloud "I have no idea where you pulled the 2-3 seconds, or worse yet 5 seconds, from. I can guess."

:)
 
Actually, now that I think of it - the Samsung remote that my mum uses has a considerably higher failure rate than what I am used to. Although, that is pretty much down to a low battery, and I think the buttons, which are rather big (and cheap), don't always register nicely when you press them. The transmitter also seems to lag a bit more compared to mine, which could be down to the battery or the quality (or the fact that it most likely has been thrown around and dropped a few times).

Bear in mind, my own remote isn't anything high-tech, and the IR remotes I've been using all work flawlessly in that - yes, I can state - they are pretty close to 99%, and if there is a failure, it's usually down to user error (e.g. me pressing a button too quickly after another one or being too far away).

I think I would notice, though I accept your point as well that you tend to overlook these things when they do occur. On the other hand, I suspect this is also down to it being a rather small annoyance - namely because the feedback is instant. Using voice controls isn't - because the command 'xbox. one. do whatever' already requires around 4-5+ seconds (I'm guessing here, as Xbox. One. already takes around 2-3 seconds), so it not registering means a higher loss in time, ergo a more noticeable annoyance. Also, I suspect there is a small gap between when you complete the command and when it's executed on the system (after all, the Xbox also needs to figure out that the command is completed?).

Fair enough if people are happy to tolerate this. I'd still be quite interested to know, when these failures do occur, how many retries are required (on average) until the device does the command as requested. Surely, this has to be a lot more complex (and therefore error-prone) than simply re-pressing a button on a remote?

I'm happy to say that if I were forced to use voice controls for a very specific task - like on a device to play games and only to play games - I'd be happy to give it a much higher tolerance. However, within the context of my living room and controlling my AV systems, I wouldn't really put up with it, as I'm used to a flawlessly working remote. The benefits would need to significantly outweigh the cons, but even then, for some things, I prefer simplicity and reliability over complex and less reliable methods - especially for inherently simple commands (start, stop, play, pause, volume controls, channel numbers, menu navigation etc).

Given that XB1 is now a released product, there is no longer any point in generating a subjective opinion based on an imaginary device with performance metrics you arbitrarily created in your head.

Maybe you should try out the real device before you nix it off your list of potential purchases for your living room.
 
You just say Xbox, not Xbox One, and I don't know about you, but it takes about half a second in a normal cadence, even with a slight pause, for me to say Xbox and then be ready for part 2 or the command. Slowing it down to a full second, i.e. speaking only one word per second, feels highly unnatural.

I have no idea where you pulled the 2-3 seconds, or worse yet 5 seconds, from. I can guess.

BTW, for fun, in that 5 second window I was able to comfortably speak aloud "I have no idea where you pulled the 2-3 seconds, or worse yet 5 seconds, from. I can guess."

:)

I'd actually be quite curious to hear you say that sentence within the 5 seconds you claim, comfortably, and given the topic, in a way that a voice recognition device such as Kinect would be able to accurately identify. :p

In fact, stating a simple word like the number "twenty-one" takes approximately a second, and you want to claim that a full sentence such as the one above, with 20 words, spaces and even commas, can be said within 5? Sure. :rolleyes:

You are right about the command only including "Xbox [Pause]" though, so that's roughly 2 seconds (including the pause). When I said 4-5+ seconds, I was taking a general complex command (nothing specific). If you want to argue, however, that the longest command can be spoken within 2 seconds while retaining a reasonable accuracy, be my guest. It doesn't, however, reflect the countless videos I've seen that demonstrate voice controls.

In the end it doesn't really matter, as you managed to pretty much ignore the general point of my post, which is that a voice command inherently takes longer than pressing a simple button - and in the case of a failure or a mis-hit, re-pressing a simple button probably happens instinctively, while a voice command needs to be repeated in full.

As a non-owner, I'd still find it interesting to know if failures result in multiple repeats of the same commands etc. That would seem like quite a relevant question to me if we are to assess how workable / likable / sufficient a hit rate of such a device is. In that sense, a remote might be able to get away with a 90% success rate - but voice recognition software might be looked at more critically even if it retains the same success rate, if the commands that fail need to be retried multiple times. Is this accounted for?
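
To put rough numbers on the retry question, here is a minimal sketch of the arithmetic. Everything in it is an assumption made for illustration, not a measurement: the per-attempt times (0.3 s for a button press, 1.5 s for a spoken command), the 90% first-try success rate, and the idea that a retry may succeed less often than a first attempt. The function names are made up for this example.

```python
from typing import Optional


def expected_attempts(p_first: float, p_retry: Optional[float] = None) -> float:
    """Expected number of attempts until a command succeeds.

    p_first: chance the first attempt works.
    p_retry: chance each later attempt works (defaults to p_first,
             i.e. failures are independent of each other).
    """
    if p_retry is None:
        p_retry = p_first
    if not (0 < p_first <= 1 and 0 < p_retry <= 1):
        raise ValueError("probabilities must be in (0, 1]")
    # One attempt always happens; with probability (1 - p_first) we then
    # need on average 1 / p_retry further attempts.
    return 1 + (1 - p_first) / p_retry


def expected_seconds(seconds_per_attempt: float, p_first: float,
                     p_retry: Optional[float] = None) -> float:
    """Average time spent per command, counting repeated attempts."""
    return seconds_per_attempt * expected_attempts(p_first, p_retry)


if __name__ == "__main__":
    # All figures below are hypothetical inputs, not measured values.
    scenarios = {
        "remote, 90% every try": (0.3, 0.9, None),
        "voice, 90% every try": (1.5, 0.9, None),
        "voice, 90% first try, 50% on retries": (1.5, 0.9, 0.5),
    }
    for name, (secs, p1, p2) in scenarios.items():
        print(f"{name}: {expected_seconds(secs, p1, p2):.2f} s per command")
```

Under these assumed inputs, the same headline success rate costs more with voice in two ways: each failed attempt wastes a longer command, and if failures cluster (the retry rate is lower than the first-try rate), the average number of attempts goes up as well.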
 
I'd actually be quite curious to hear you say that sentence within the 5 seconds you claim, comfortably, and given the topic, in a way that a voice recognition device such as Kinect would be able to accurately identify. :p

Do you have some slow drawl or something? It takes me about 4 seconds to say that sentence.

You don't need to speak slowly or deliberately for Kinect to understand you; you just need a slight pause between "Xbox" and the issued command. My normal speaking cadence works just fine.
 
I'd actually be quite curious to hear you say that sentence within the 5 seconds you claim, comfortably, and given the topic, in a way that a voice recognition device such as Kinect would be able to accurately identify.
I said it in 5 in a clear voice, and four with normal voice. You are making claims that Kinect requires a difference in speech pattern that's somewhat different to that described here by actual users. A beat pause between 'Xbox' and 'command' is the main difference between voice input and normal speech (short of a gabble).

Maybe someone here with Kinect can upload a video of normal use? That'd put to rest guesswork and assumptions from those who haven't used it, and back up personal claims here with hard evidence. It'd be good to see those for whom it doesn't work too, to see the difference if any in input and voice.
 
I think we're going in circles here. If voice was truly atrocious, truly awful, then it would disappear as an input method as quickly as mobile 3D displays did on smartphones.
 
I said it in 5 in a clear voice, and four with normal voice. You are making claims that Kinect requires a difference in speech pattern that's somewhat different to that described here by actual users. A beat pause between 'Xbox' and 'command' is the main difference between voice input and normal speech (short of a gabble).

Maybe someone here with Kinect can upload a video of normal use? That'd put to rest guesswork and assumptions from those who haven't used it, and back up personal claims here with hard evidence. It'd be good to see those for whom it doesn't work too, to see the difference if any in input and voice.

http://www.ign.com/videos/2013/12/14/xbox-one-how-fast-we-did-50-voice-commands

This has been posted before.

The XB1 is fluid in its response to the vast majority of his commands. And he is quickly moving through the UI in a way that's in no way normal for a typical user and that would require heavy use of gamepad button combinations to come close to pulling off, which 80% of XB1 users wouldn't bother to learn.
 
Lol, this sidebar has brought back some fond memories. I'm from Mississippi (so don't accuse me of having a faster than normal speech cadence!), and when we played yardball as kids (i.e., football in someone's yard), we often agreed on a "Mississippi count" before the guy on defense assigned to the QB could rush. That translates to five seconds, at which time it's game on and the QB better get rid of it or take off running (because, like most kids I would assume, we didn't have O and D lines, just receivers, defenders, QB, and the guy trying to kill him).

I wonder what is in common use by kids in other parts of the country, as I assume this is a common way to play.

Down here, in a normal speech cadence (not rushed, as that was the source of many a fight!, and not slow obviously, as you didn't want to give the QB any gifts), the QB defender would begin at the ball snap...

One Mississippi,
Two Mississippi,
Three Mississippi,
Four Mississippi,
Five Mississippi,
Rush!!

Now, relevant to this thread, look at the above words, in number and complexity, routinely spoken by schoolkids in 5 seconds.

Compare that to Phil's claim that it takes an equivalent amount of time to issue an xb1 command.

Ridiculous.
 
http://www.ign.com/videos/2013/12/14/xbox-one-how-fast-we-did-50-voice-commands

This has been posted before.

The XB1 is fluid in its response to the vast majority of his commands. And he is quickly moving through the UI in a way that's in no way normal for a typical user and that would require heavy use of gamepad button combinations to come close to pulling off, which 80% of XB1 users wouldn't bother to learn.

The vast majority of those were around 1.5 sec. A few closer to 2, many right at 1. And he pauses just a bit more than is absolutely necessary.
 
I wonder what is in common use by kids in other parts of the country, as I assume this is a common way to play.

Down here, in a normal speech cadence (not rushed, as that was the source of many a fight!, and not slow obviously, as you didn't want to give the QB any gifts), the QB defender would begin at the ball snap...

One Mississippi,
Two Mississippi,
Three Mississippi,
Four Mississippi,
Five Mississippi,
Rush!!

I'm from the North-East Ohio area and we also used a Five Mississippi Rush count for Yard Ball. I never would have thought it was used elsewhere. :LOL:
 
I said it in 5 in a clear voice, and four with normal voice. You are making claims that Kinect requires a difference in speech pattern that's somewhat different to that described here by actual users. A beat pause between 'Xbox' and 'command' is the main difference between voice input and normal speech (short of a gabble).

Well, I definitely can't, and I did attempt it a few times. 6 seconds is the closest I can get (with proper punctuation, commas), unless I start to rush the sentence. I mean, sure, you/I/anyone could actually read it within 3 seconds if you really wanted to, but that isn't exactly the point, is it?

Fair enough, if 5 seconds is doable and in a way that a device such as Kinect could properly and accurately identify. If, however, it can't accurately identify it because it's spoken too rushed, then it isn't really relevant in the context of this thread. If I understand correctly, it should be reasonably easy to test this using a browser and entering a phrase like that as a search term?

Anyway, as I already pointed out earlier - the point wasn't to make it sound as if voice commands take that long or that you need to change your voice pattern to use it - though, to some degree, I suspect you would do it instinctively when encountering a command or phrase that has a higher failure rate or doesn't work in the voice pattern you are used to speaking in. As for the above sentence, it really doesn't matter to me if I am right/wrong about the 5 seconds - I definitely can't, but as I said, it doesn't really matter, as I was pointing out something entirely different, and whether a command takes 2 or 5 seconds doesn't really make a difference to that point (in that I imagine that tolerance levels are different when you are merely re-pressing buttons that went unregistered versus repeating commands that weren't identified).

In that sense - the interesting question to me is: even if a hit rate of 90% is accurate on Xbox One - is it enough for general wide acceptance? IMO, this poll should distinguish more between basic and complex commands. As demonstrated in the IGN video above, most commands work flawlessly - but with voice commands, the possibilities of what commands you could potentially integrate are limitless. How much does the success rate suffer once you get to complex phrases, like looking for dynamic content (like TV program names) or actual search phrases in the internet browser?
 
My voice was a normal speed voice, about the same as the IGN video. Look at that IGN video. It's not a gabble, it's not a drawl, and it works.

In fact what I'd like to see is someone do a comparison between button interface and Kinect. That would put paid to all this speculation and guesswork! ;)
 
My voice was a normal speed voice, about the same as the IGN video. Look at that IGN video. It's not a gabble, it's not a drawl, and it works.

In fact what I'd like to see is someone do a comparison between button interface and Kinect. That would put paid to all this speculation and guesswork! ;)

For anything that takes one button press only and no stick movement, then the button press will likely win if you are holding the controller and the controller is on.

For anything that takes more than that, voice control wins.

If the controller is off, then it's not even a contest. :)

Regards,
SB
 
For anything that takes one button press only and no stick movement, then the button press will likely win if you are holding the controller and the controller is on.

For anything that takes more than that, voice control wins.

If the controller is off, then it's not even a contest. :)

Regards,
SB

Nothing takes one button press though.

To leave a game is a button press. Then to move to your selected next activity, if it's on the home screen, is *navigation* plus a button press. Versus "Xbox, go to <app>".

Never mind a button press to pause a game, then find/grab the remote, change input, then navigate to your desired channel. Versus "Xbox, watch TV", then navigate. I haven't tested out just navigating straight from an activity to a channel yet. There may be efficiency in that if that capability exists.
 
For the most part. There is one action that is a single button press, though. And I've used it in my examples before. :)

Pressing the Xbox Button is the same as saying "Xbox, Go Home" when in a game/app.

Regards,
SB
 
Nothing takes one button press though.

To leave a game is a button press. Then to move to your selected next activity, if it's on the home screen, is *navigation* plus a button press. Versus "Xbox, go to <app>".

Never mind a button press to pause a game, then find/grab the remote, change input, then navigate to your desired channel. Versus "Xbox, watch TV", then navigate. I haven't tested out just navigating straight from an activity to a channel yet. There may be efficiency in that if that capability exists.

This may be true for those kinds of situations, but then there are others where voice controls are inherently more difficult, for instance when a user doesn't exactly know what he wants; e.g. he doesn't know which channel to watch, because he doesn't know what's on. In this case, he wouldn't be calling out "Xbox watch Sky", [no, not interested in that], "Xbox watch CNN", [no, not interested in that either], "Xbox watch Fox", etc. In that sense, at least speaking for myself, I would prefer to scroll through an on-screen guide or simply iterate through channels in sequential order until I find something that I want to watch. This would be one of the cases, such as lying on a couch or eating with one hand (and chewing), where it would be beneficial to just use a remote and press a button.

Sure, you can always say - use voice when it's beneficial, use a remote when not and get the best of both worlds, which I wouldn't be disagreeing with.

BTW: For the record, no one ever said voice controls aren't quicker. They certainly are, when they work. There's also no argument that voice controls offer a means to do more complex actions than you could by simply pressing a few buttons. I guess this is also not the right place to argue the benefits of voice controls over buttons - this thread/poll is after all on accuracy. For the sake of the argument though - I'm sure you could be quicker in controlling your smartphone using your voice (assuming a perfectly working system), but would anyone actually want that? It'd be like giving your phone to a friend and telling him what to do while not being allowed to do it yourself. Sure, it's great for some things, but is it actually beneficial and convenient for the majority of things?
 
This may be true for those kinds of situations, but then there are others where voice controls are inherently more difficult, for instance when a user doesn't exactly know what he wants; e.g. he doesn't know which channel to watch, because he doesn't know what's on. In this case, he wouldn't be calling out "Xbox watch Sky", [no, not interested in that], "Xbox watch CNN", [no, not interested in that either], "Xbox watch Fox", etc. In that sense, at least speaking for myself, I would prefer to scroll through an on-screen guide or simply iterate through channels in sequential order until I find something that I want to watch. This would be one of the cases, such as lying on a couch or eating with one hand (and chewing), where it would be beneficial to just use a remote and press a button.
Are you being serious? :???: When has anyone channel hopped since the invention of electronic programme guides? If you don't know what to watch, you won't channel hop but you'll say, "Xbox Show Guide." Channel hopping is also one of the weaker functions of voice control, like volume control, but the system solves that by having a far better solution than any other. What you can do is say, "Xbox watch Fox," and have the TV change to the right channel. You won't have to memorise the channel number for all your TV channels, nor look them up (something I do when seeing what to watch, I check an online guide and see what the channel number is), nor open the guide and use up/down until you find the channel you want and can select it. You won't have to manage your channel listings to select favourites. It's just there, what you want when you ask for it, selected by name. There's no more natural, intuitive solution as long as the voice recognition is accurate enough (it has some issues with some channels, it seems).

Sure, it's great for some things, but is it actually beneficial and convenient for the majority of things?
People have gone on record about when voice control is better. They come home from work and can get the console on while they take off their coat and man-bag. They can change channel or music while they hold a sticky bun.

The use cases can be argued about, and are very subjective, and whether people want voice control or not is a different discussion (being held in a few threads simultaneously). I've said before that I'm not overly in favour. But the point is, and the reason this poll was created, that voice control works well on XB1. It's not a busted, awkward, shoehorned-in solution. It's not struggling with a <60% success rate, as posters like Zed claimed by extrapolating from one YouTube video. And most XB1 users aren't repeating themselves in perpetual frustration, wasting moments of their life wrestling with clumsy voice control when they could just zip through the UI with a controller.

There are quite a few people thinking that Kinect Voice control is cumbersome and inaccurate and people (will) have to struggle with it. I can understand where that viewpoint may come from, but it's clearly outdated (for Americans at least). That opinion should be updated to a more accurate understanding. VC does work well enough for everyday use without being a frustrating PITA. There's no need to look at voice control on phones and PCs to try and understand if Kinect works or not. "Siri is crap in everyday use, ergo Kinect voice control is crap," is logic reliant on a redundant analogy. We have real use showing it does work. Speculation can give way to actual, experimental knowledge. Any discussions involving the topic of VC should be carried out with the understanding that it does work well on XB1 without the constant cyclic arguments about how VC sucks without reference to real-world data.

I for one have changed my estimation of VC and the value it brings to XB1 given user feedback. Speculation has given way to an informed opinion and my assumptions were wrong. Still not gonna get one though; ain't gonna be talking to my CE devices! :D
 
Are you being serious? :???: When has anyone channel hopped since the invention of electronic programme guides? If you don't know what to watch, you won't channel hop but you'll say, "Xbox Show Guide." Channel hopping is also one of the weaker functions of voice control, like volume control, but the system solves that by having a far better solution than any other. What you can do is say, "Xbox watch Fox," and have the TV change to the right channel. You won't have to memorise the channel number for all your TV channels, nor look them up (something I do when seeing what to watch, I check an online guide and see what the channel number is), nor open the guide and use up/down until you find the channel you want and can select it. You won't have to manage your channel listings to select favourites. It's just there, what you want when you ask for it, selected by name. There's no more natural, intuitive solution as long as the voice recognition is accurate enough (it has some issues with some channels, it seems).

Errr, I actually channel hop, more so because sadly not all electronic guides I have encountered are as good and fast as the one the Xbox One offers. And of course, I wouldn't expect everyone to want to read guides or understand at a glance from a title or description what is on. Channel hopping is easy in that sense: you judge what you see and choose whether you stay with it or progress to the next channel.

But yeah, my point wasn't well formulated. It was more in response to the argument "nothing takes one button press though" - which is correct, but there are cases where you press a button in succession (like to scroll through a guide or a list) and where it's probably more convenient to use buttons than to use your voice to repeat a basic function like scrolling down over and over. The fastest way isn't always the most convenient.

Anyway, this isn't a particularly important point we need to pursue. There are cases where voice controls are more convenient (the ones Blakjedi pointed out) - there are others where they are less so. No argument there.

We have real use showing it does work. Speculation can give way to actual, experimental knowledge. Any discussions involving the topic of VC should be carried out with the understanding that it does work well on XB1 without the constant cyclic arguments about how VC sucks without reference to real-world data.

Absolutely. I myself am actually surprised at how well it works. I do think it's important, however, that we distinguish what works well and what doesn't. As an example: I think it's clear that basic commands and navigation in the context of the Xbox One work flawlessly and accurately enough. What about more complex voice input that isn't linked to a specific command (where an approximation can be used) but where a phrase needs to be identified correctly without context? E.g. dictating a phrase that should be entered into a search bar in the browser is a lot more complex to get right than for the software to identify a basic navigational command that can only be 1 possibility out of maybe 50. I'm not asking this as a hater or a fan of a different machine, but from the viewpoint of an interested potential buyer who would like to understand the complex mechanics behind Kinect. I think this is quite relevant in light of this topic and the future uses for VC in general.

If we knew the potential limits of VC, it would be easier to gauge the potential of the technology. I'm aware this talk actually goes beyond the context of Xbox One being viewed as a game console, and goes further into how well placed it is for future uses and functions that might be more relevant to living room usage etc.
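
The difference between "1 possibility out of maybe 50" and free dictation can be illustrated with a toy sketch. This is not how Kinect works internally (nothing in this thread describes that); it only assumes that some recognizer has already produced a slightly wrong text transcript, and uses Python's standard `difflib` to snap it to the nearest entry in a small, closed command list. The command list and the `snap_to_command` helper are hypothetical and exist only for this example.

```python
import difflib
from typing import Optional

# Hypothetical closed command list, using phrases mentioned in this thread.
KNOWN_COMMANDS = [
    "xbox go home",
    "xbox watch tv",
    "xbox show guide",
    "xbox watch fox",
]


def snap_to_command(heard: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest known command, or None if nothing is close enough.

    With a closed list, an approximate transcript is good enough: we only
    have to pick the nearest candidate, not get every word right.
    """
    matches = difflib.get_close_matches(
        heard.lower(), KNOWN_COMMANDS, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


if __name__ == "__main__":
    # A slightly garbled transcript still lands on the intended command.
    print(snap_to_command("xbox wach tv"))
    # A free-form search phrase has no nearby candidate to fall back on,
    # so every word would have to be recognized correctly on its own.
    print(snap_to_command("xbox search for cheap flights to oslo"))
```

The design point is only that a small, fixed set of commands leaves room for error correction, while open-ended dictation does not, which is why the two cases could plausibly have very different success rates.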
 
The presence of voice control doesn't mean the remote has to be thrown in the trash. Voice isn't a replacement device for a remote just as a mouse isn't a replacement for a keyboard.

Voice doesn't have to show an advantage in every single circumstance to have value and improve how we interact with our consoles.
 