Project Natal: MS Full Body 3D Motion Detection

It wasn't particularly accurate, but I checked out a couple more YouTube vids, including an official MS demo, and the problems are still there. It shouldn't be a problem for voice selection from a list, where the options are limited. But we are a long way from confident voice input, because a computer has zero comprehension and can't self-correct according to context. Give the user a small enough vocabulary and it'll do fine, but then you're creating a barrier that Natal is supposed to overcome. It's supposed to be natural, and getting people to consciously change their speech is anything but!
 

Still, if most games stick to dialog trees (for example) or selection menus (sports games?), then it's not a problem, and it breaks down the controller barrier that little bit more.

Although I'm sure Molyneux wants something closer to a natural speech control mechanism, I just can't imagine any type of game that would need natural speech recognition, rather than just voice recognition for limited phrases/keywords or intonation (pseudo emotion/intention interpretation). The only exception might be adventures like the old Infocom or Sierra games where you type in a sentence, and even those were really only looking for keywords.

It'll also be interesting to see how much MS has improved voice recognition with the new engine in Windows 7.

That will give us a closer idea of what they might be thinking of with regards to Natal.

Regards,
SB
 
Even Sierra and Infocom games had really really simple grammar, mostly of a <verb> <object> or <verb> <direct object> <preposition> <indirect object> type.
 

That's what I meant. They encouraged people to type in actual sentences but were only parsing for particular words or phrases.

Once people figured that out, many of them stopped typing sentences and just used keywords.

I'd imagine games would treat "natural speech" the same way, just parsing what's being said for relevant keywords/phrases.
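To make the keyword idea concrete, here is a minimal Python sketch of a Sierra/Infocom-style parser applied to a speech transcript. It assumes the recognizer has already turned speech into text, and the vocabularies (`VERBS`, `OBJECTS`, `PREPOSITIONS`) and the `parse` function are hypothetical, made up for illustration. Every word outside the known vocabulary is simply ignored, which is why a full polite sentence and a bare "open door with key" come out the same.

```python
import re
from typing import Optional, Tuple

# Hypothetical vocabularies, made up for illustration.
VERBS = {"take", "open", "look", "drop", "use"}
OBJECTS = {"lamp", "door", "key", "mailbox"}
PREPOSITIONS = {"with", "in", "on"}

Parse = Tuple[Optional[str], Optional[str], Optional[str], Optional[str]]


def parse(transcript: str) -> Parse:
    """Keyword spotting: return (verb, direct object, preposition, indirect object).

    Words outside the known vocabularies are ignored; missing slots are None.
    """
    words = re.findall(r"[a-z]+", transcript.lower())
    verb = next((w for w in words if w in VERBS), None)
    prep = next((w for w in words if w in PREPOSITIONS), None)
    objects = [w for w in words if w in OBJECTS]
    direct = objects[0] if objects else None
    # Only treat a second object as indirect if a preposition links them.
    indirect = objects[1] if prep and len(objects) > 1 else None
    return verb, direct, prep, indirect


print(parse("Could you please open the door with the key?"))  # ('open', 'door', 'with', 'key')
print(parse("open door with key"))                            # ('open', 'door', 'with', 'key')
```

The trade-off is the one described above: filler words cost nothing, but anything outside the vocabulary is silently dropped rather than understood.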

As that YouTube video linked earlier shows, however, if you speak at an uneven speed or too quickly, the program/processor can't keep up with or reliably parse what you are saying. I'm not sure whether the Windows 7 voice engine can improve on that aspect.

Regards,
SB
 
I'll be amazed if they can. Humans don't interpret sounds 100% accurately either, but relate the sound to context to interpret meaning. Heck, homophones sound identical, so there's no way to distinguish them by sound alone! And to have context, you'll need a very sophisticated language lookup, one for every language you're targeting. And it's going to need to cope with accents etc. I'd love to see some test results with some of the more extreme British accents, and I'm not just talking about thick Scottish or Irish or West Country accents ("oo ar, my luvver!"). The kids in Surrey don't pronounce their consonants properly. And they stick in random words that they don't mean. And they shun the rules of English grammar.

For reasonable use, it will need the users to...talk...to...the...computer...with...respect. Which might be a very good thing, and undo the damage of texting! :D
 
In my view...

Using voice to play games? It's not entertaining and usually slow (it serializes the entire experience and introduces dead time while you wait for the computer to respond). You already have a controller in your hands, too! In fact, I prefer gesture control to voice control, but that's just me. I can "talk to" (more like shout at) NPCs, but it will only make me more aware of the technology's limitations rather than immerse me in the game.

Using voice to control the UI? Workable, but some tasks don't map well to speech (e.g., Googling for words that sound similar). It's slow, too! Good when you have dirty hands, provided your console is on perpetually. I think home automation, not just controlling a media player, would be cool here.

If I were MS, I'd focus on gesture control, since the camera can see in the dark and does 3D imaging!

EDIT: There are very specific use cases that may make voice control shine:
* Voice-based games (Karaoke, Jeopardy, ...)
* Voice-based applications (Chatting, training crowd cheers in sports games, training someone to sound like you, acting in a virtual world, humming a melody to search for a song, ...)
... but I don't think anyone should shoehorn the tech into every app (like talking to NPCs :()
 
How well does the voice command work in that Tom Clancy game, and how complex is it? I have it sitting on my desk and haven't tried it yet.
 

It works pretty well 2/3 of the time. Sometimes it seems to get really confused by standard phrases. At PAX last year, they showed it recognizing orders given by one team member's parrot.

Then sometimes, for no discernible reason, it won't recognize a basic command, misinterprets it, or doesn't respond at all. And this is with a mic virtually right next to your mouth.
 
For reasonable use, it will need the users to...talk...to...the...computer...with...respect. Which might be a very good thing, and undo the damage of texting! :D

That would be a wonderful thing. As a former prolific letter writer, I've often been dismayed by the gradual decline of actual writing skills, not to mention the lack of thought put into a piece when someone does bother to write something.

I thought it was bad when letters were slowly being supplanted by e-mail. But with texting, it has degenerated extremely rapidly.

And now even so-called news and scientific journals are suffering.

/me mourns the loss of the art of letter writing.

Regards,
SB
 
Actually, I bet we'd see the opposite. Texting shorthand didn't happen because they wanted to annoy us old folks. Maybe we'd see spoken shorthand.
 
patsu said:
Using voice to play games ? It's not entertaining and usually slow (It will serialize the entire experience, introduce dead wait/space, while waiting for the computer to respond). You already have a controller in your hands too!

Indeed... We tried it on the PS2 w/ Operator's Side, and it wasn't all that great. Granted, the voice model was a little too granular and revolved around small, immediate commands. The concept IMO was pretty cool; it just wasn't executed well.

Taito also did a mediocre PS2 soccer title that supported commanding teammates with verbal commands...
 
Actually, I bet we'd see the opposite. Texting shorthand didn't happen because they wanted to annoy us old folks. Maybe we'd see spoken shorthand.
:oops: Who'd have thought Sierra and Lucasarts would have predicted the future of the modern language? :oops: Sorry...


Who...think...future...modern...language. Must...go...work.
 
I think Sierra style is more like:

- who modern language?
- go work

I don't mind, though. I'd pay 100 euros to have Sierra-style adventures back on consoles with the free speech interface, even if it meant just typing on my Bluetooth keyboard or keypad add-on!
 

I know it's just for laughs, but I would actually say that's pretty accurate (considering it's built into the OS)

I would think it'd certainly be accurate enough for fairly simple commands.
I can imagine the use of Natal being:

Sit down, turn on the system. Auto login by face.

"show recently released demos"
"download demo 2"
"play my music by coldplay"
"show inbox"
"open 3"
"delete" (confirm?) "confirm"
"pause music"
"play geometry wars 2" (confirm?) "confirm"

...It would make for quite an interesting UI, provided there is a way to show the expected and available commands so you could easily get used to the system.
The question is how much it hits the CPU.
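A rough Python sketch of the command list above, with a "help" phrase to show what's available and a confirm step for risky actions like deleting. The `VoiceMenu` class and its phrases are hypothetical; it works on already-recognized text and only accepts exact phrases, which is the limited-vocabulary approach rather than real speech understanding.

```python
from typing import Dict, Optional


class VoiceMenu:
    """Toy voice-command dispatcher (hypothetical): exact phrases only,
    with a confirmation step for destructive commands."""

    def __init__(self, commands: Dict[str, bool]):
        # Maps each accepted phrase to whether it needs confirmation first.
        self.commands = commands
        # Destructive command waiting for "confirm", if any.
        self.pending: Optional[str] = None

    def hear(self, utterance: str) -> str:
        # Normalize case and collapse repeated whitespace.
        phrase = " ".join(utterance.lower().split())
        if self.pending is not None:
            action, self.pending = self.pending, None
            # Anything other than "confirm" cancels, so a misheard word fails safe.
            if phrase == "confirm":
                return f"done: {action}"
            return f"cancelled: {action}"
        if phrase == "help":
            # List the available commands so users can learn the vocabulary.
            return "available: " + ", ".join(sorted(self.commands))
        if phrase not in self.commands:
            return "not recognized, say 'help' for the list"
        if self.commands[phrase]:
            self.pending = phrase
            return f"say 'confirm' to {phrase}"
        return f"done: {phrase}"


menu = VoiceMenu({"show inbox": False, "pause music": False, "delete message": True})
print(menu.hear("Show  inbox"))      # done: show inbox
print(menu.hear("delete message"))   # say 'confirm' to delete message
print(menu.hear("confirm"))          # done: delete message
print(menu.hear("open 3"))           # not recognized, say 'help' for the list
```

The design choice here is that anything other than "confirm" cancels the pending action (and that utterance is consumed), so a misrecognized word after "delete" does nothing destructive.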
 
The game console voice experience! ;)

I would think it'd certainly be accurate enough for fairly simple commands.
I can imagine the use of Natal being:

Sit down, turn on the system. Auto login by face.

"show recently released demos"
360: Takes you to the Force Unleashed page on Live
"download demo 2"
360: Does nothing
"play my music by coldplay"
360: Starts playing a random song from your entire library
"show inbox"
360: Does nothing
"open 3"
360: Tries to launch Halo 3 (error: insert disc!)
"delete" (confirm?) "confirm"
360: deletes your Halo 3 save game (argh!)
"pause music"
360: Pauses music (yay!)
"play geometry wars 2" (confirm?) "confirm"
360: are you sure?
 
Seems like voice needs more work. That video even made me frustrated just by watching it.

That area of research has been stuck at this stage for the past decade or so. I have little doubt we'd see hate messages for Milo all over the Internet if MS releases it without any breakthrough.

It's more fruitful to work with a limited vocab. The longer the phrase, the more accurate the recognition.

I know it's just for laughs, but I would actually say that's pretty accurate (considering it's built into the OS)

I would think it'd certainly be accurate enough for fairly simple commands.
I can imagine the use of Natal being:

Sit down, turn on the system. Auto login by face.

"show recently released demos"
"download demo 2"
"play my music by coldplay"
"show inbox"
"open 3"
"delete" (confirm?) "confirm"
"pause music"
"play geometry wars 2" (confirm?) "confirm"

...It would make for quite an interesting UI, provided there is a way to show the expected and available commands so you could easily get used to the system.
The question is how much it hits the CPU.

You need to add correction and repetition to your use case above. If there is background noise or other people talking, there may be interference.

It's much, much easier to mix different inputs for general use (gesture, controller and speech, whatever comes naturally). In fact, grabbing a controller, keyboard or mouse is usually quicker if your task list is long.

That guy was an idiot, though. He kept talking to himself and trying to use commands that don't exist.

One can't help it. Beyond a certain point, you get frustrated and start to curse, or wonder whether you should just turn off the computer and do something more productive. The developers will have to do a lot of testing and tuning to make the program work reliably and consistently. The experience varies from person to person, too.
 
Does anyone think the Natal team might be able to borrow some tech from Microsoft Sync for Ford cars? I've never used it, but it seems to have decent voice recognition for picking songs from your music collection or calling people from a phone book.
 
Yep! You can already find applications like that today. Natal should be no exception.

Things like looking up items from a finite list (songs, names, available options in a help desk menu, numbers, letters) are fine. The longer the names, the better. A free-form, open-ended conversation would be tough (you can tell from obonicus' video link).
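As a rough illustration of why a finite list is the easy case, this Python sketch snaps a possibly misheard transcript to the closest entry using the standard library's `difflib.get_close_matches`, and gives up when nothing is close enough. String similarity on text is only a stand-in (an assumption for illustration) for what a real recognizer does acoustically with a constrained grammar; the `SONGS` list and the `pick` function are hypothetical.

```python
import difflib
from typing import List, Optional

# Hypothetical sample library, for illustration only.
SONGS = ["Clocks", "Yellow", "Viva la Vida", "The Scientist", "Fix You"]


def pick(heard: str, options: List[str], cutoff: float = 0.6) -> Optional[str]:
    """Snap a (possibly misheard) transcript to the closest option, or None."""
    # Compare case-insensitively but return the option as originally written.
    by_lower = {option.lower(): option for option in options}
    matches = difflib.get_close_matches(heard.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None


print(pick("viva la vita", SONGS))   # Viva la Vida
print(pick("order a pizza", SONGS))  # None
```

With an open-ended request there is no list to snap to, so the same approach just returns nothing, which matches the point about free-form conversation.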

It'd work well if the car is relatively quiet.

For karaoke and cellphone use, the user's mouth is usually at about the same distance and position relative to the mic, so there is less variation in those applications too.

The developers will have to provide an illusion to cover these limitations. There will be mistakes. As long as the consequence is recoverable, it should be acceptable.
 