Old Discussion Thread for all 3 motion controllers

Status
Not open for further replies.
... microphone stuff ...

The advantage Natal has is that the array microphone can isolate sounds from individual players. Combined with player facial recognition, etc., in theory there shouldn't be many issues with disruption, noise, etc.

Microsoft have a pretty decent track record in speech, all things considered.
 
The PSEye also has an array of mics (4?) that should help isolate sounds.
SingStar actually has speech recognition as an option, and it works surprisingly well. I haven't tested it in a noisy environment, or with several people, but it recognizes even my accented English very well.
I rarely use the speech recognition in SingStar, though, as using a controller is just faster.
So speech recognition is really not a function I see myself using. A remote or a controller is usually more responsive and faster.
 
The advantage Natal has is that the array microphone can isolate sounds from individual players. Combined with player facial recognition, etc., in theory there shouldn't be many issues with disruption, noise, etc.

Microsoft have a pretty decent track record in speech, all things considered.

It's still not easy. I am fairly impressed with SingStar in that it recognises artists and titles pretty quickly without any training, but it takes a few seconds to process your command. But then it probably has to check against a far larger database than most games would need. Speech is inherently laggy, because it takes some time to say something.

We'll have to wait and see. It would certainly be good to be able to add to the motion tracking.
 
Which wouldn't be a problem for, say, adventure games. Or educational games. Or perhaps in the Team management screen of a sports simulation.

Using the sports simulation, even with lag it may be much faster to just call out the name of the player you want, rather than scrolling to him and selecting him. Just an example. :)

Then again, if I had to try pronouncing some of the names of players in the NHL... :D

Regards,
SB
 
The biggest improvement that could be made to sports games and management interfaces is to do away with it completely! I play FIFA to play FIFA, not to rearrange a team of professional athletes who can't manage to stay in shape for two matches on the trot.

So, uh-hum, your suggestions for audio interfaces are excellent examples of how speech recognition could ease things like player searches. Just, please, developers, understand the difference between a sports sim and a sports management sim, and don't force us to do both if we don't want to!
 
The advantage Natal has is that the array microphone can isolate sounds from individual players. Combined with player facial recognition, etc., in theory there shouldn't be many issues with disruption, noise, etc.

Microsoft have a pretty decent track record in speech, all things considered.

But it won't know who to give priority to (e.g., if the kid talks first, then I may be ignored; although sometimes he may be the legit user while daddy is the idiot who's trying to tease him). Plus not all noises come from humans, visible or not. All the surrounding sounds will permeate/stack on top of each other even if the mic tries to pick up from one area.

One of the best uses for voice-driven UI is when I don't have to walk all the way to the TV/console to issue a command (e.g., play music). So the camera may not even see me. If I had to go to the TV and face the camera, I might as well use gestures or a remote for a more accurate hit. [size=-2]I use RemotePlay for this right now though.[/size]

I'm sure it'd be better than cellphone speech recognition, but the success rate needs to be very high and consistent. Not sure how accurate the mic array is though. May need to filter out background noise/music.


Which wouldn't be a problem for, say, adventure games. Or educational games. Or perhaps in the Team management screen of a sports simulation.

Using the sports simulation, even with lag it may be much faster to just call out the name of the player you want, rather than scrolling to him and selecting him. Just an example. :)

Then again, if I had to try pronouncing some of the names of players in the NHL... :D

Yep, picking an item out of a known library is one of the most successful voice-input use cases. Something like Scribblenauts or Heavy Rain should work too.
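
To illustrate why picking from a known library is forgiving: the recognizer only has to land close to one of a few valid entries, not transcribe free speech. A minimal sketch, assuming the recognizer hands back a rough text guess; the function name, the roster, and the 0.6 cutoff are made up for illustration and are not how SingStar or Natal actually work.

```python
import difflib

def pick_from_library(heard, library, cutoff=0.6):
    """Return the library entry closest to the recognized text, or None.

    `heard` is the (possibly misspelled) text a recognizer produced;
    `cutoff` is a hypothetical similarity threshold between 0 and 1.
    """
    # Compare case-insensitively, but return the entry as originally written.
    lowered = {item.lower(): item for item in library}
    matches = difflib.get_close_matches(
        heard.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None

# Illustrative roster only.
roster = ["Ovechkin", "Crosby", "Datsyuk", "Kovalchuk"]
print(pick_from_library("ovetchkin", roster))
```

The smaller the list, the further off the guess can be and still resolve to the right entry, which is roughly why a team roster is an easier target than open dictation.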

SingStar works amazingly well, but then again, they kinda cheated because the mic is right beside the player's mouth. So the voice input characteristics can be tuned rather accurately (more consistent acoustic parameters).

EDIT: I remember MS signed the Scribblenauts developer for a game? If so, we will probably see some sort of Scribblenauts for Natal.

A good way to make general voice input more worthwhile is to allow titles to export their top-level macros to the Natal hardware unit (at the Dashboard level). So the user can utter a voice command even when the 360 is off, and have the console boot up, start the right game and, for instance, go all the way to joining an MP session waiting for a game. I suspect this is why Natal has a separate power supply.

If they have this kind of structure laid out, then the user can also use gestures or the controller to choose these high level macros (like AppleScript !)
 
I just have to wonder how Natal voice recognition could be used reliably?
I know this board is English, but keep in mind that loads of people don't speak English at all.

My father is blind and uses professional French speech-to-text and text-to-speech tools. The text-to-speech works really well; the speech-to-text is not bad, but not perfectly reliable. So I try to imagine a game where I have to repeat a voice command several times... I hope it's not COD6, because you're dead before the system understands you, or your teammates are already trapped because the system has trouble understanding you. Useless. Or maybe we just have to put long delays between enemy moves and gunfire ;-)

On the other hand, speech recognition in the latest Microsoft Office Communication Suite may work fine in English, but in other languages it is a pure mess. We have the voicemail voice-to-text feature activated on our business system, and it is really funny to see what the speech-to-text engine produces. Every week we send each other voice-to-text messages to find the weirdest results. It's funny, but frankly we decided not to activate this option for our customers!

I really fear your hopes for reliable speech recognition are too high for the technology MS will deliver. I can't even imagine the difficulty of implementing clever and useful voice commands.

As for the gesture commands you are thinking about, I wonder how you can think people will enjoy doing the movements you imagined for the basketball game!
Even if Natal can perfectly understand (should I say guess?) approximate gestures and map them to the command the player wants to mimic, it will often be clumsy and unnatural. Imagine you mimic running (in a basketball game or a maze game) and you want to turn 180°. Clap your hands? Jump? Shout "go back"? Fall on your @$$? What about other actions you may have to do at the same time?
Gestures have to be easy to do and repeat, and they must not tire the player too fast; otherwise the player will have fun for 5 minutes (or even less for couch potatoes) and leave the system powered off until a friend comes to look at the wonder a few weeks or months later...

Another example: driving in a car game, simulator or not. What will the system do if I need to wipe my dirty nose? Sound the horn? Do nothing? Pause the game because, as my driving teacher told me all the time, my hands have to be on the steering wheel?
OK, my example is maybe weird, but what about all the everyday gestures that could interfere with Natal? How many false movements can the system ignore? I really want to test this because the technology amazes me, but I really think Natal alone is not the right solution. Natal technology should be coupled with a Wiimote/Move-like device, because all day long humans touch buttons, turn wheels, and hold sticks/pens, but we never hold our arms up or wave them in the air for fun, at least not unless we're playing air guitar ;-) (by the way, air guitar contests are funny to watch, but a really annoying sport to do yourself, no?)

Really, I prefer to wait for E3 to see what MS will finally deliver, because for the moment I think Natal/Wave, without any device in hand, will be totally useless for anything other than really simple games (whack-the-worm, anyone? Huh, how fun future gaming will be)
 
Heh heh, I know what exactly you're talking about.


More than a decade ago, I set up a dedicated voice input Mac (with DSP card) to play. The room had 2-3 people, but they were mostly away. I had a software agent running in the background observing my actions (via AppleEvents and other system events). The system was also hooked up as a fax and telephone (via the Telephony API) so that I could convert them into digital form. Even with me alone, it had trouble recognizing the right phrases consistently.

The software agent could recognize some of my basic usage patterns but they could also get in the way and became pretty annoying/mundane. Turned off most of the bells and whistles in 2 weeks. Turned off all of them in 1-2 months. It became just a normal Mac. ^_^

I did remember annotating a birthday song for the Mac to sing to a friend (It's lame but funny anyway).


Today, I still try speech recognition once in a while; it hasn't advanced much. >_<
In a nutshell, the user has to train themselves at the same time to coax performance out of the system.


Still, I think there are pockets of scenarios where they are useful.

Besides picking items from a list, I think natural interfaces can be great as a common macro system to bypass layers of dialog boxes. The applications/system has to be designed in certain way though. But once those macros are written up, I expect them to be available to a regular controller, or remotely say... via RemotePlay. The "new" thing here is the system and application design, not merely the natural interfaces. :)


EDIT: I think some of this AI guesswork is also useful in media searching. Right now, I have to go through the entire library one by one manually (using some sort of metadata such as date). The "guessing" system in Photo Gallery is very helpful because it presents the library based on its content. I only need to go through a subset of the photos based on the guesses.
 
Joy Ride becomes Project Natal launch game:
http://www.computerandvideogames.com/article.php?id=249663

Microsoft's arcade racer Joy Ride has been reworked as a Natal launch title, CVG can reveal.

According to a senior retail source, the title will be available on Natal from day one later this year - and will no longer be free-to-play.

In development at first party studio BigPark, the avatar-based game was unveiled as a free-to-play Xbox Live Arcade title at last year's E3.
 
Another good area to get into: accessibility
http://kotaku.com/5555740/vi-fit-is-wii-sports-for-the-blind

A video game research project at the University of Nevada, Reno, is creating Wii Sports-based PC games that don't require eyesight to play.

The two games in the VI Fit line play much like their Wii Sports counterparts. Both VI Tennis and VI Bowling mimic their respective sports through use of the Wii remote. The only difference is that instead of seeing where the tennis ball is coming from or visually lining up a strike, blind players hear and feel the games through use of sound and vibrotactile cues.

...
 
We gave up toying with the brain wave controller (Excuse: Too busy at work). We kinda felt insulted by this:
http://kotaku.com/5555690/bring-on-the-monkey-mind+powered-robots

Researchers at the University of Pittsburgh have trained monkeys to manipulate robotic appendages using the power of their minds, once again proving one solid scientific fact: monkeys kick ass.


Would be funny if people can play Resident Evil 10 with their cats and dogs.
[size=-2]Your sidekick will raise one of his legs near a tire or a tree, or maybe grab one of the zombies' limbs and run. Old habits die hard.[/size]
 
Some news on different Natal demos...

Sunday Parade said:
The first game we tried was a mixture of dodgeball and handball. I swatted my hand and my little red avatar smacked the ball down the court, where it popped balloons and then bounced back.
...
We next tried an obstacle course where you have to dart under trees and jump over logs.
...
The final demo (these may not be the games that are actually released) had us in a red raft, bouncing along a river as we bent and jumped to control our craft. This was my favorite, a real head-rush inducer.

Here's some video:

rafting game... 0:00 to 0:07
Ricochet... 0:25 to 0:29 & 0:45 to 1:11


http://www.parade.com/news/2010/06/06-how-i-became-an-avatar.html

Rafting game looks fun.

Tommy McClain
 
But it won't know who to give priority to (e.g., if the kid talks first, then I may be ignored; although sometimes he may be the legit user while daddy is the idiot who's trying to tease him). Plus not all noises come from humans, visible or not. All the surrounding sounds will permeate/stack on top of each other even if the mic tries to pick up from one area.

One of the best uses for voice-driven UI is when I don't have to walk all the way to the TV/console to issue a command (e.g., play music). So the camera may not even see me. If I had to go to the TV and face the camera, I might as well use gestures or a remote for a more accurate hit. [size=-2]I use RemotePlay for this right now though.[/size]

I'm sure it'd be better than cellphone speech recognition, but the success rate needs to be very high and consistent. Not sure how accurate the mic array is though. May need to filter out background noise/music.

How accurate is the voice recognition in Ford (Microsoft) Sync? I've heard good things about it, but have never had hands-on experience with it, though my next car will probably have it (2012 Ford Focus). Heck, it's enough of a feature that people buy the car because of Sync. Rather than being a bonus extra, it's considered THE purchasing decider.

It has to deal with road noise, music, and people chattering while still performing voice recognition.

That might give us an idea of how capable Natal's voice recognition might be.

Regards,
SB
 
The driver position is known though. I doubt it will work well if a baby is wailing in the car, or heavy metal is playing through the deck.

In an average case, probably like how our PC performs. I believe Sync has other features that are not speech recognition (e.g., text-to-speech, MP3 player integration).
 
However, Natal has an array microphone. Array mics allow you to separate the recorded audio based on direction and background ambiance. In theory, as it knows where you are in the room, it should be able to detect who is speaking.
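
For anyone wondering how an array can favour one direction at all, the textbook idea is delay-and-sum: sound from a given spot reaches each mic at a slightly different time, so you shift the channels to line up that arrival and average them. A toy sketch only, assuming the per-mic delays (in whole samples) are already known; nothing here reflects what Natal or the PSEye actually do, and all names and signals are invented for the example.

```python
def delay_and_sum(channels, delays):
    """Shift each channel back by its delay (in samples) and average them.

    channels: list of sample lists, one per microphone.
    delays:   assumed arrival delay of the target sound at each microphone.
    """
    # Only keep the samples every channel still has after shifting.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(length)
    ]

# Toy signal: the "voice" reaches mic 1 two samples later than mic 0.
voice = [1, 0, -1, 0] * 4
mic0 = list(voice)
mic1 = [0, 0] + voice

steered = delay_and_sum([mic0, mic1], [0, 2])    # aligned on the speaker
unsteered = delay_and_sum([mic0, mic1], [0, 0])  # no alignment at all

print(steered[:4])
print(unsteered[2:6])
```

With the right delays the two copies of the voice add up; with the wrong ones this particular toy signal cancels out. The hard part in a real living room is estimating those delays and coping with echoes, which the sketch ignores.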

That rafting/ricochet video is rather hilarious. There is no way any system could track them :mrgreen:
 
If the acoustic environment of the car is known beforehand (e.g., enclosed cockpit of known car model), the Sync people can tune the hardware and firmware for it during config.

Although Natal and PSEye could detect the location of the speaker(s) in real time, they are in an unknown location with unknown acoustic parameters. They need to self adjust/calibrate since the experts won't be around to help.

I could be wrong though (Was exploring with an audio engineer to implement echo cancellation on a PC as a startup idea -- when Skype was first introduced).
 
mmm. Well, you can make some guesses. :)

That Sports Champions video really demonstrates how difficult it is to get an intuitive game mechanic from a motion control system. The number of on-screen prompts and help/option dialogs was pretty mad, although the pointing seemed to work well.

It'll be especially difficult for titles like that to distinguish themselves from Wii software. I wouldn't expect many people could see a difference.

The Move Party demo at the end was clearly more enjoyable for the people playing. The simpler actions with clearer directions, more intuitive goals and the 'novel' virtual mirror interface clearly help enormously. Less abstraction, less confusion, more fun.
 