Will we see successful Voice Recognition any time soon?

Naboomagnoli

Newcomer
I searched for any similar threads but couldn't see any - forgive me if there are..

It's mainly on the back of some poor reviews for TalkMan that I have come to wonder about this sort of thing. I put this down to the fact that while the concept is admirable - allowing people across the world to hold simple conversations, and encourage people to learn new languages to some extent - the PSP and a bog-standard microphone is hardly the best setup for a universal translator. In thinking about how else this concept could be applied in the longer run I wondered about the PS3 (solely because TalkMan is on PSP, so please don't consider X360 excluded from this or start any fires).

If Sony are looking at creating a global network of PS3s and being a part of the increasing blending of different cultures, wouldn't a successful translator applied to voice blogs and voice communication be a massive step forwards?

Obviously the ability to translate languages is a different kettle of fish, so voice recognition itself should probably be the first aspect to nail. It has been attempted in a number of games over the years with limited or no success (AFAIK), and judging by my own attempts with the voice recognition in Word, it can be a real disaster. However, this is the beginning of the next gen, with motion-sensing controllers, cameras that promise to give us Minority Report controls, and processors that can do in real time what takes conventional processors minutes to render. Surely there's room for voice recognition amid all this power and progression of old concepts?

Consider a squad-based game such as SWAT: it has a list of specific commands (red team breach bang & clear, blue team provide cover, etc.) and as such lends itself very well to the idea of having a limited library of phrases to understand, thus improving the success rate. Plus, having pre-determined commands would make translation pretty damn easy. With further installments and imitations, phrase libraries could grow, recognition success rates would improve, and different genres of game would take up the idea and progress it further.
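To make the "limited library of phrases" idea concrete, here is a minimal Python sketch. It assumes some speech engine has already produced a rough text transcript, and uses the standard library's difflib to snap that transcript to the nearest phrase in a small, fixed library, rejecting anything that isn't close. The phrases and command IDs are made up for illustration; real engines usually constrain the vocabulary during recognition itself (e.g. with a grammar), which this after-the-fact matching only approximates.

```python
import difflib

# Hypothetical SWAT-style phrase library: each canonical phrase maps to a
# command ID. Phrases and IDs are illustrative, not taken from any game.
PHRASES = {
    "red team breach bang and clear": "RED_BREACH_BANG_CLEAR",
    "blue team provide cover": "BLUE_PROVIDE_COVER",
    "red team hold position": "RED_HOLD_POSITION",
    "blue team fall back": "BLUE_FALL_BACK",
}


def match_command(transcript, cutoff=0.75):
    """Snap a raw, possibly misheard transcript to the closest known phrase.

    Returns the command ID, or None when nothing in the library scores at
    least `cutoff` on difflib's 0.0-1.0 similarity ratio.
    """
    text = " ".join(transcript.lower().split())  # normalise case and spacing
    hits = difflib.get_close_matches(text, list(PHRASES), n=1, cutoff=cutoff)
    return PHRASES[hits[0]] if hits else None
```

Because the library is tiny, a slightly garbled "red team breach bang & clear" still lands on the right command, while unrelated speech is rejected instead of being forced onto the nearest order.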

So, my question is: How far are we from seeing a game with pre-determined command phrases successfully employ voice recognition, and what is needed to get there, make it standard in similar games, and in the long run go beyond to the ultimate goal of viable word-for-word translation in games?
 
Before I engage in your discussion and answer your questions, I have a question of my own. Would you consider the SOCOM series’ implementation of voice recognition successful? If so, or if not, to what degree?
 
Gradthrawn said:
Before I engage in your discussion and answer your questions, I have a question of my own. Would you consider the SOCOM series’ implementation of voice recognition successful? If so, or if not, to what degree?

Haven't played it :p

I've only recently been able to play online on my PS2, after I won a PStwo signed by Fumito Ueda, and since online is apparently SOCOM's greatest strength I've held back on buying a copy thus far.

By no means do I know the full history of voice recognition in games - I've posted this thread to learn rather than to really inform ;). Perhaps I could have done some research, but for the most part my experience of voice recognition is not first-hand. How (and since when) exactly does SOCOM use it, and what is your opinion on its successes/shortcomings? How would you see it evolving over the next 5 years or so?
 
Click on "micro" in my sig (there are a few more reports on the web).

If they indeed deliver what they say (you can see more or less the same thing in Windows Vista; see the MS site), then it will probably be very soon. This is also why I wanted a standard mic in EVERY console, and none delivered :devilish: :devilish: :devilish: :devilish:.

Edit: here is the Vista site; it is a bit better than I recalled.

Also, they already call it speech recognition, although I doubt UE3 would have it too.
 
pc999 said:
Click on "micro" in my sig (there are a few more reports on the web).


Thanks for the info, and I fully agree that mics should be standard, or at least available from launch and supported as if standard. Speech recognition could be a bigger innovation than things like motion-detecting cameras or multi-video chat, because it's far quicker and more intuitive to say what you mean than to convert what you want to say into a number of mouse clicks in a menu or a series of button combinations. You could also have a language conversion setting for online play, so that people who speak English hear exactly what you said, whereas people who don't speak English hear your order converted into a basic command in their language.
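The language conversion setting works because the recognizer only has to resolve your speech to a command, not translate it word for word. A minimal Python sketch of the routing, assuming a hypothetical command ID has already been recognized; the phrase tables and their translations are illustrative placeholders.

```python
# Hypothetical per-command phrase tables; translations are illustrative.
LOCALIZED = {
    "FLANK_LEFT": {
        "en": "Flank left!",
        "fr": "Contournez par la gauche !",
        "de": "Links flankieren!",
    },
    "HOLD_POSITION": {
        "en": "Hold position!",
        "fr": "Tenez la position !",
        "de": "Position halten!",
    },
}


def render_for_listener(command_id, spoken_text, speaker_lang, listener_lang):
    """Choose what one listener hears for a recognized command.

    Listeners who share the speaker's language get exactly what was said;
    everyone else gets the canned phrase for the command in their own
    language, falling back to English when no translation exists.
    """
    if listener_lang == speaker_lang:
        return spoken_text
    phrases = LOCALIZED[command_id]
    return phrases.get(listener_lang, phrases["en"])
```

The free-form wording only ever reaches same-language teammates; everyone else gets a pre-written phrase, which is why a fixed command set makes this kind of "translation" so much easier than general translation.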

The Unreal example sounds fantastic, especially with the ability to actually describe a place (the cavern, the tower etc) and have the teammate know where you mean. Excellent stuff. I just hope the end product is of a good standard though, since they all sound good on paper. I remember watching adverts of people using Word, talking very quickly and their words being perfectly transcribed on-screen, when I could literally say "arse" and have "elbow" come up onscreen.
 
Naboomagnoli said:
By no means do I know the full history of voice recognition in games - I've posted this thread to learn rather than to really inform ;). Perhaps I could have done some research, but for the most part my experience of voice recognition is not first-hand. How (and since when) exactly does SOCOM use it, and what is your opinion on its successes/shortcomings? How would you see it evolving over the next 5 years or so?

If you ever decide to buy SOCOM, don't bother with the first two; go straight to 3. The checkpoint system alleviates much of the unnecessary frustration (for the single-player mode, that is).

To answer your question, the SOCOM series has used voice recognition since the very first game. Commands/voice commands are mapped to a button on the controller (O, I think). Voice commands can be intermixed with commands selected via the controller. You can give commands without bringing up the command menu (voice only), or you can bring up the command menu and use the selections as a guide for what to say. The commands are multi-tiered, meaning there are always at least 2 parts to them (and often 3): "who," "what," and often "where." To give you an idea of how it works, I've broken down the process below:
  • Tap "O" - The command menu comes up.
    • You can select a command with the controller.
    • You can read off one of the commands listed.
    • The first commands available concern who you want to give an order to.
  • Press and Hold "O" - The command menu does not appear.
    • The game readies itself for a complete voice command (all necessary components)
For example, your entire (4-man) squad is called Fireteam. Fireteam has 2 subgroups, and each subgroup has a name (I can't recall the names). So, a typical command might look like this:

Pure Voice:
  1. Hold "O"
  2. Say "Fireteam"
  3. Say "Move to"
  4. Say "Crosshairs"
  5. Release "O"
or

Menu Guided:
  1. Tap "O" - Command menu comes up
  2. Say "Fireteam" - submenu for available actions comes up
  3. Say "Move to" - submenu for available checkpoints (along with crosshairs) comes up
  4. Say "Crosshairs" - all menus close
In the case of using voice only, I've found that breaking the commands up (e.g. "Fireteam," then "Go to," etc.) nets the most reliable results. However, saying it as one sentence, "Fireteam move to crosshairs," is suitable once you've adapted yourself to what the game's speech recognition wants. As you can probably already tell, there's plenty of room for improvement. Voice recognition should, ideally, adapt itself to you, not the other way around. However, I do believe such adaptive technologies all require some degree of training (as with MS Office's speech recognition). This could present a problem when trying to implement it in a game that is supposed to be pick-up-and-play.
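The who/what/where structure described above behaves like a tiny grammar: at each step only a handful of words are valid. A minimal Python sketch under that assumption, taking text from a hypothetical recognizer, accepts an order either one tier per utterance (the menu-guided or broken-up style) or as one sentence. The vocabularies, including the subgroup names "alpha" and "bravo" and the checkpoint targets, are placeholders, not SOCOM's actual command set.

```python
# Hypothetical vocabularies. "alpha"/"bravo" stand in for the subgroup names
# the post couldn't recall; "checkpoint one/two" stand in for map targets.
WHO = ["fireteam", "alpha", "bravo"]
WHAT = ["move to", "go to", "hold position", "fire at will"]
NEEDS_WHERE = {"move to", "go to"}
WHERE = ["crosshairs", "checkpoint one", "checkpoint two"]


class TieredCommand:
    """Collects a who -> what -> (where) order from one or more utterances."""

    def __init__(self):
        self.parts = {}

    def expected(self):
        """Return the next tier's name and its valid words, or (None, [])."""
        if "who" not in self.parts:
            return "who", WHO
        if "what" not in self.parts:
            return "what", WHAT
        if self.parts["what"] in NEEDS_WHERE and "where" not in self.parts:
            return "where", WHERE
        return None, []

    def feed(self, utterance):
        """Consume as many tiers as the utterance contains, left to right.

        Returns the finished (who, what, where) tuple, or None if more input
        is needed. Raises ValueError on a word not valid for the current tier.
        """
        text = " ".join(utterance.lower().split())
        while text:
            tier, options = self.expected()
            if tier is None:
                break  # words after a complete order are ignored
            # Try longer phrases first so one that is a prefix of another
            # can't steal the match.
            for option in sorted(options, key=len, reverse=True):
                if text == option or text.startswith(option + " "):
                    self.parts[tier] = option
                    text = text[len(option):].strip()
                    break
            else:
                raise ValueError(f"expected a {tier} word, heard {text!r}")
        if self.expected()[0] is None:
            return self.parts["who"], self.parts["what"], self.parts.get("where")
        return None
```

Feeding "Fireteam", "Go to", "Crosshairs" one at a time and feeding "Fireteam move to crosshairs" in one go produce the same order, which mirrors the two input styles described above. Restricting each step to a few words is also why the broken-up style tends to be more reliable.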

However, I think it could be cleverly intertwined, such that it's almost transparent. A training level would be the perfect place. An even better solution, I think, would be to have voice recognition training built right into the OS, attached to each user's profile and accessible to any application (game or otherwise), so that the user would only have to do it once for all software. When a user copies their profile to removable media, their info, saved games, and voice recog. settings go with them. So, even if they play SOCOM 10 at a friend's house, or a game they haven't played before that uses voice recog., the game will adjust to them seamlessly. I don't think that's going to happen, not this generation at least. What I do see happening, in SOCOM at least, is a speech recognition algorithm that is, at a minimum, faster and notably more reliable. I find SOCOM's current voice recognition quite suitable. Far from perfect, I wouldn't even say great, but for the most part good enough to get the job done.
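The "train once, use everywhere" idea is essentially per-user state owned by the system rather than by each game. A minimal Python sketch, assuming a hypothetical OS service (ProfileStore) and a made-up VoiceProfile layout, shows a profile being exported, carried to another console, and picked up there without retraining. It only illustrates the proposed architecture, not any shipping system API.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class VoiceProfile:
    """Hypothetical per-user speech data from one OS-level training session.

    Field names are illustrative; a real recognizer would store its own
    (likely opaque) adaptation parameters.
    """
    user: str
    trained_phrases: int = 0
    adaptation: dict = field(default_factory=dict)


class ProfileStore:
    """Hypothetical system service every game queries instead of training."""

    def __init__(self):
        self._profiles = {}

    def save(self, profile):
        self._profiles[profile.user] = profile

    def get(self, user):
        return self._profiles.get(user)

    def export_profile(self, user):
        """Serialize a profile so it can travel on removable media."""
        return json.dumps(asdict(self._profiles[user]))

    def import_profile(self, blob):
        """Rebuild a profile on another console and make it available."""
        profile = VoiceProfile(**json.loads(blob))
        self.save(profile)
        return profile
```

Usage: the home console calls `export_profile("player1")`, the friend's console calls `import_profile(blob)`, and any game there simply asks the store for the active user's profile.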

Over the next 5 years, I could see it potentially evolving to integrate facial or, more likely, gesture recognition as well. Facial recog. could possibly augment speech recognition. Gesture recognition, however, is probably a lot more likely, and would allow simple gestures, like the ones the SEALs do in the movies :p , to be used in place of commands. Or maybe something akin to Manhunt, where giving voice commands when an enemy is nearby will alert them, but using gestures will not (in Manhunt, if you had the microphone plugged in, noise would be detected by the enemies).
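For the Manhunt-style mechanic, the core is just measuring how loud the mic input is and turning that into a hearing radius. A minimal Python sketch; the dBFS threshold, the radius rule (doubling for every 6 dB above the threshold), and all names are assumptions, not how Manhunt actually did it.

```python
import math


def rms_dbfs(samples):
    """Loudness of mic samples in [-1.0, 1.0], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def enemies_alerted(samples, player_pos, enemies, threshold_db=-30.0, base_radius=10.0):
    """Return the names of enemies close enough to hear the player.

    Assumed rule: input quieter than threshold_db is ignored; louder input
    grows the hearing radius, doubling it for every 6 dB above threshold.
    """
    level = rms_dbfs(samples)
    if level < threshold_db:
        return []
    radius = base_radius * 2 ** ((level - threshold_db) / 6.0)
    return [name for name, pos in enemies.items()
            if math.dist(player_pos, pos) <= radius]
```

Whispering keeps the player hidden, while shouting an order reaches enemies further away; gestures would bypass this check entirely.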

Naboomagnoli said:
So, my question is: How far are we from seeing a game with pre-determined command phrases successfully employ voice recognition, and what is needed to get there, make it standard in similar games, and in the long run go beyond to the ultimate goal of viable word-for-word translation in games?

So, I guess I would say we already have games with pre-determined command phrases successfully implemented, to a degree. I don't think it's going to take much more than additional system resources and R&D for it to be truly successful, probably by the 3rd generation of next-gen games. As for the final part of your question, I have no idea. I haven't even heard much in the way of attempts to do that in a game, nor do I have a clue as to the resources such a process would take. That will be something interesting to research.
 
I haven't played SOCOM either, but from what you just described the game recognizes commands by limiting the available options. Your bank does the same when you call and get a voice prompt. It gives you a short list of words you can use to access further menus. This type of system works well in a military simulation game because that's when it's natural to bark out commands clearly with a standard set of phrases.
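The bank-menu comparison can be made concrete: the recognizer only ever has to choose among the handful of words valid at the current menu node. A minimal Python sketch with a made-up menu; "balance" is accepted at the top level but ignored once you're inside the payments submenu.

```python
# Hypothetical phone-banking menu: at each node only the child keywords are
# "live", so the recognizer picks among a few words, not a whole dictionary.
MENU = {
    "main": {"balance": "balance", "payments": "payments", "agent": "agent"},
    "payments": {"new payment": "new_payment", "cancel": "cancel_payment", "back": "main"},
}


def active_vocabulary(node):
    """Words the recognizer listens for at this node (empty at a leaf)."""
    return list(MENU.get(node, {}))


def step(node, heard):
    """Follow the child named by `heard`, or stay put (and re-prompt)
    when the word isn't valid at this node."""
    return MENU.get(node, {}).get(heard.strip().lower(), node)
```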

Such a system would not be as immersive in a more casual game setting. Perhaps you're playing the role of a gangster and want to use voice commands to communicate with your allies. It would not seem natural to have to say "Posse, follow, me" when you wanted to move about. Ideally you could use slang and jargon and have the game recognize that "Hey guys, come on" meant that you wanted your group to follow you.
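Going from "Posse, follow, me" to "Hey guys, come on" means mapping many casual phrasings onto one intent. A deliberately naive Python sketch using keyword spotting with hypothetical phrase lists; real natural-language systems use statistical models rather than hand-written lists, so treat this only as an illustration of the idea.

```python
import re

# Hypothetical intent lexicon: several casual phrasings per squad order.
INTENTS = {
    "FOLLOW": ["come on", "follow me", "with me", "let's go", "this way"],
    "WAIT": ["wait here", "hold up", "stay put", "chill"],
    "ATTACK": ["get him", "take him out", "light 'em up", "open fire"],
}


def classify(utterance):
    """Return the intent with the most phrase hits, or None if there are none.

    Punctuation is stripped and phrases must match on word boundaries, so
    "chill" does not fire on "chilling". Ties go to the intent listed first.
    """
    text = " ".join(re.findall(r"[a-z']+", utterance.lower()))
    scores = {
        intent: sum(bool(re.search(rf"\b{re.escape(p)}\b", text)) for p in phrases)
        for intent, phrases in INTENTS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```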

As far as seeing a more prolific voice implementation in games goes, I think Sony, Microsoft, and Nintendo have all partnered with the right company to do so.

http://www.pcmag.com/article2/0,1895,1915071,00.asp

IBM has demonstrated a couple of voice recognition technologies. The first is a command interface with which you can tune the radio in your car. The second is a translator application that can listen to English and translate it into spoken Chinese.

I definitely think we'll be seeing something like this soon. Especially with the recent Nintendo patent about voice to text in games.
 
I think I read something about UE3.0 improving on the voice recognition that UE2.0/UT2k4 had implemented. I'll try to find that. But I would love to see that type of stuff improved in all these UE3.0 games. (Rainbow 6, Brothers in Arms, UT2k7, GOW, etc.) And more FPS games in general.

kinda OT, has there been any reports/news about a next gen Socom?
 
Rainbow Six had a couple of versions with voice commands iirc, also I've read one or two upcoming Nxt Gen releases (RB6, BiA?) will have it as well.
 
Bad_Boy said:
I think I read something about UE3.0 improving on the voice recognition that UE2.0/UT2k4 had implemented. I'll try to find that. But I would love to see that type of stuff improved in all these UE3.0 games. (Rainbow 6, Brothers in Arms, UT2k7, GOW, etc.) And more FPS games in general.
Bad_Boy said:

kinda OT, has there been any reports/news about a next gen Socom?


Zipper is currently at work on what I presume to be the last SOCOM for the PS2. And as far as I know, there have been no reports or news either. I'm rather surprised by this, since SOCOM has been SCE's centerpiece for its online community on the PS2. One would think they would want to use the next-gen SOCOM to do the same for the PS3 from day 1, since it will be online from day 1.
 
Gradthrawn said:


Zipper is currently at work on what I presume to be the last SOCOM for the PS2. And as far as I know, there have been no reports or news either. I'm rather surprised by this, since SOCOM has been SCE's centerpiece for its online community on the PS2. One would think they would want to use the next-gen SOCOM to do the same for the PS3 from day 1, since it will be online from day 1.

Zipper has more than one team, right? So they can be working on SOCOM PS2 and a PS3 game?
 
Kittonwy said:
Zipper has more than one team, right? So they can be working on SOCOM PS2 and a PS3 game?

AFAIK, they are a small developer. Besides, even if they do have more than one team, the other would probably be working on the PSP version, Fireteam Bravo 2. :p One can always hope, though (I sure am).
 
I heard a rumour that, what with Sony acquiring Zipper, they are helping Sony out with the online interface. I had never considered that it could involve speech recognition until I found out that SOCOM used such technology already.
It'd be pretty simple to apply techniques similar to SOCOM's, saying "open" and "photo album" and "left", "right" or "rotate" to go through albums and see photos the right way up. I suppose Blade Runner's one step too far though!

You could also "open" "web browser", or "play" "disc" to run whatever's in the drive. Of course, with a 20-60GB hard drive, it'd take very little space to pre-record a voice sample of you saying the name of an HDD film, or of a musical artist, or a downloaded game. "Play" "Anchorman", "open" "Holiday photos".. This interests me more than the oft-talked up Minority Report controls everyone's promised for years.
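A rough Python sketch of the "Play" "Anchorman" idea. Instead of the pre-recorded voice samples suggested, it assumes (hypothetically) a recognizer that returns text, and builds the second-word vocabulary from whatever titles are on the hard drive, matched fuzzily with the standard library's difflib. The verb list and titles are examples only.

```python
import difflib

VERBS = {"play", "open"}  # hypothetical first words of every media command


def parse_media_command(utterance, library, cutoff=0.6):
    """Split "<verb> <title>" and snap the title to the closest item in the
    user's own library (films, albums, photo folders on the HDD).

    Returns (verb, exact library title), or None if either part is unknown.
    """
    words = utterance.strip().split(maxsplit=1)
    if len(words) != 2 or words[0].lower() not in VERBS:
        return None
    verb, spoken_title = words[0].lower(), words[1].lower()
    by_lower = {title.lower(): title for title in library}
    hits = difflib.get_close_matches(spoken_title, list(by_lower), n=1, cutoff=cutoff)
    return (verb, by_lower[hits[0]]) if hits else None
```

Since the vocabulary is rebuilt from the drive contents, adding a film or photo folder automatically makes its name "sayable" without any extra setup.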
 
Man, you just got me really excited with all that stuff Zipper could bring to the interface. Not going to get my hopes/expectations up, but I would love some features like that. Probably sounds pretty lazy (I mean c'mon, we got wireless controllers, all it takes is one or two buttons to do what you want)... but those types of features would be a step in the right direction for some cool innovations in voice recognition. I mean, if the interface can be voice activated, I would imagine developers would try to focus on that for their games as well.

It definitely has potential though; I just hope Sony recognizes the things that are possible, even more so than last gen.
 
Voice recognition will only be accepted when it obeys you on the first command.
We're used to input devices with immediate, prompt action and virtually error-free operation.
As soon as you have to repeat "Open TV" more than once, it gets more annoying than pushing a button or two on the remote.
With a physical input device you have a better grasp of why the command isn't getting through; with voice commands you can't be sure if it's because you are mumbling, there's too much ambient noise, the mic isn't connected, or the machine just can't understand. It becomes frustrating very quickly if the implementation isn't good.
For voice recognition to work reliably, you'd either need a relatively quiet room (not happening with game or media interfaces) or a very directional mic, preferably positioned close to your mouth (not good, because who likes to wear headphones/mic most of the time?).
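The complaint about not knowing why a command failed suggests the system should check the likely causes in order, tell the user which one it hit, and only act when it's confident. A minimal Python sketch; the inputs (mic status, speech and noise levels in dBFS, a recognizer confidence score) and every threshold are assumptions for illustration.

```python
from enum import Enum


class Outcome(Enum):
    EXECUTE = "execute"
    NO_MIC = "no microphone detected"
    TOO_QUIET = "didn't hear anything; speak up or move the mic closer"
    TOO_NOISY = "too much background noise"
    UNSURE = "didn't catch that; please repeat"


def triage(mic_connected, speech_db, noise_db, confidence,
           min_speech_db=-40.0, min_snr_db=10.0, min_confidence=0.85):
    """Decide whether to act and, if not, report the most likely reason.

    Checks run from the most basic cause (no mic) to the least; all
    threshold values are assumed, not taken from any real product.
    """
    if not mic_connected:
        return Outcome.NO_MIC
    if speech_db < min_speech_db:
        return Outcome.TOO_QUIET
    if speech_db - noise_db < min_snr_db:  # signal-to-noise ratio in dB
        return Outcome.TOO_NOISY
    if confidence < min_confidence:
        return Outcome.UNSURE
    return Outcome.EXECUTE
```

Refusing to act below the confidence threshold trades a few "please repeat" prompts for far fewer wrong actions, which matters if the system has to obey on the first command.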
 
I can see the future of PS3 gaming clearly now...

You've got a left thumbstick, 4 right buttons, and 4 shoulder buttons to control your character.
You've got tilt detection to control the camera, and motion detection to control your familiar's position
You've got EyeToy observing your position for spell casting
You've got voice recognition to control your party

This'll give such delightful advice in game guides as...

"When encountering the Malaquy Warriors in the Dungeon of Krim, equip your fire sword. Using left, right, left, right, on the left stick to gain momentum, use Baneful Strike repeatedly (Square, Tri, Tri, Circle, X) while alternating left and right sidestepping with L2 and R2, blocking with L1 when a warrior gets through. You can tip the controller to better see the action, and when a Boss Warrior comes, flick the controller towards him to send your dragon familiar to intercept. When grouped together, you can add a fireball by dipping your head in a side-to-side figure of 8 motion, and issue the commands 'Flank' and 'Push' to have your teammates cut down on the influx of enemies so you can deal with them more comfortably."

Then we'll have the combat moves list of Tekken, including
Cabalistic Flurry : O, O, X, O+X, L1+X, tip 45% left, lift left leg, shout 'do-dah-hicky', X+wave left hand+nod head+move controller up and down+sing '3 blind mice'+L2+R1
 
:LOL: Actually, already with the "Wiimote" and its rumoured mic (and to some degree the PS3 tilt controller), gaming FAQs will be facing quite a challenge, even though gamers might have it easier control-wise (especially on the Wii).
 
Shifty Geezer said:
I can see the future of PS3 gaming clearly now...


:p Cheeky scamp..


How about this instead (in a single player squad based war game):

You could gesture to a squad bot to go in a certain direction with your hand, while saying "find cover over there". You could peer up over a wall with your tilt controls/analogue stick, and do some recon. As enemies approach, you could gesture to your team of bots that there are 3 enemies, and that they're coming this way. As you see an enemy shadow approaching your cover, you shout "GOGOGO!" and all open fire at once. You could then press the grenade button and actually throw an imaginary grenade using the EyeToy, and then get back behind cover. Shout "Flank to the left!" and your bots would try and get around the enemy while you lay cover fire down.

Once the battle plays out and the gunshots have faded away, you'd turn to your teammates and say "Everyone ok?" to which your team-mates would give you their status and you'd know who's hurt and who's low on ammo etc. One of your teammates, a gibbering wreck shaking with fear, says nothing. You say "on your feet soldier, we've made it. Our base is nearby". He looks up. You offer your hand out (which the EyeToy reads). The teammate, visibly reassured, grabs your hand and lifts himself up onto his feet.
"Squad, move out!"

Not saying this level of involvement should be standard, but it'd be nice to see it done where it'd work well..
 
Actually... if this is the case...

I prefer to watch you play Socom PS3 "live" :D

+ + + + + + +

Other than the occasional dream of guiding the PS3 through a Google search by voice, I have very low expectations of voice recognition in general.
 
Anyone called Microsoft's technical support? Their voice recognition is unprecedented and damn near human. It's spooky....
 