Kinect technology thread

:oops: Does Tellme have its own speech recognition tech? They may have licensed it from someone else. They're better known as a telephony service company, one of those phone response service providers.

The speech recognition part is not convincing because people already know where the tech is today.

Skimming is the interesting one; it fits the 3D body tracking model very well (and no tactile feedback is needed, either!).
 
That's a good demo, explaining a lot more stuff. The AI is described as being a marvellous piece buried away in MS's dusty vaults. Speech recognition uses "TellMe", from a company MS acquired; that's one to look up. The skimming stones event has me rubbing my chin. What if the player doesn't try to skim? What if they don't even know how? The actual 'dialogue' was decidedly shonky, without any indication that anything said is recognised. Molyneux repeats that Milo exists 'in the cloud' and will learn vocabulary etc. over time, but so far, short of a keyword here or there, I don't see any evidence of progression in language recognition.

Edit: Really what's needed is someone to 'play' it who isn't trying to demo it but is just exploring Milo's responses. I imagine that no matter what you do in the skimming stones episode, Milo will come out of it having skimmed stones, and later he'll still say, "we had fun skimming stones." In which case it's not terribly interactive.

It's a tutorial, and the cloud stuff will be activated WHEN the game has launched.

As an intro/tutorial/demo, it was impressive, and besides, they've got a long way to go before launch, if it launches at all.
 
*nod* *nod* As a tutorial for Kinect, I think it would work wonderfully. This is one area where Sony can learn from MS. PS Move should be packaged around the experience like this, in addition to being sold on specs or on individual game titles.

It helps to tide things over and extend the concept if the early games have holes in them. It also helps to set the identity of the system. I guess this is why Wii has Mii and other apps/services around the Wiimote. This is why proper integration with the XMB or PlayStation Home is important for Sony too. Otherwise, Move's just a gadget.
 
It's a tutorial...
That doesn't change the nature of the tech or what Molyneux is talking about. He states at the beginning that they use tricks, but they're tricks that work. So the question is: to what extent has the technology really progressed? Is there an intelligent being there, as Molyneux describes, or is it very far from being that? The proof would be if different responses can be got from the experience. If at the end of the day they have a set of situations that the player has to conform to, it's not really a learning platform but a join-the-dots where you get the experience they wanted. E.g. the skimming: you have to show Milo how to skim, and you have to have a fun experience that, after you've encouraged him, he'll reflect upon.

Maybe that is all tutorial and the rest of the game is advanced AI, but at the moment nothing has been shown that looks like progress in the field to me. And, relevant to this thread rather than a Milo discussion, voice recognition is a touted (but very under-shown) feature of Kinect that so far seems limited to standard commands. Let's see some voice recognition in a room with background noise!
 

Ha ha, he didn't say what the AI is all about, but sometimes, for the advancement of technology, the application and promotion of a technology matter more than the technology itself.

TED is a PR platform for Kinect.
 
... over a cellphone or a speakerphone? Kinect's environment is more like the latter. In the worst case, you're in a crowd or at a party.
 
Yeah... speaker phone in a relatively quiet cockpit or lab. Would be difficult in a noisy party or a cockpit with loud heavy metal playing.

EDIT: Rewatched the video. :no: Not at the rate the demoer was speaking at TED.

EDIT 2: Found Tellme's improvement to their speech recognition solution:
http://techcrunch.com/2009/04/28/te...-speech-recognition-and-its-own-voip-network/

Another set of technologies can break up sentences into their constituent parts so that if the software doesn’t understand something it can ask for only the piece of missing information instead of repeating the entire question. For instance, if you say you want to fly from New York to San Francisco on Wednesday, and it got everything but the day, it would only ask you what day you want to fly instead of making you repeat your entire itinerary.
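
To make the "ask only for the missing piece" idea concrete, here's a rough Python sketch of slot filling for the flight example in that excerpt. The slot names, prompts and functions (`next_prompt`, `fill_slot`) are made up for illustration; this is not Tellme's actual system or API, just the general technique.

```python
# Hypothetical slots and prompts for the flight-booking example; not Tellme's API.
REQUIRED_SLOTS = ("origin", "destination", "day")

PROMPTS = {
    "origin": "Where are you flying from?",
    "destination": "Where are you flying to?",
    "day": "What day do you want to fly?",
}


def next_prompt(understood):
    """Return the question for the first slot still missing, or None when done.

    `understood` maps slot names to values the recognizer was confident about;
    anything it failed to recognize is absent or None.
    """
    for slot in REQUIRED_SLOTS:
        if understood.get(slot) is None:
            return PROMPTS[slot]
    return None


def fill_slot(understood, slot, value):
    """Add the answer to the re-asked question, keeping everything already understood."""
    updated = dict(understood)
    updated[slot] = value
    return updated


if __name__ == "__main__":
    # The recognizer caught the cities but not the day.
    heard = {"origin": "New York", "destination": "San Francisco", "day": None}
    print(next_prompt(heard))  # asks only about the day
    heard = fill_slot(heard, "day", "Wednesday")
    print(next_prompt(heard))  # None: itinerary complete
```

The point of keeping `understood` around between turns is that only the failed slot gets re-asked, rather than the whole question.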

TellMe is adding a few other technology improvements under the hood as well, such as statistical models for predicting the next word you are going to say to narrow down the possibilities and acoustic modeling that adapts to your accent or speech pattern.
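
The "statistical models for predicting the next word" mentioned there are, in their simplest textbook form, n-gram language models. Below is a bigram sketch in Python trained on a toy corpus I made up; a real recognizer would use far more data and smoothing, and nothing here is claimed to be what Tellme actually ships.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows each other word (<s> marks sentence start)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev_word, k=3):
    """Return up to k likely next words with maximum-likelihood probabilities."""
    followers = counts.get(prev_word.lower())
    if not followers:
        return []  # unseen context: a real model would back off or smooth here
    total = sum(followers.values())
    return [(word, n / total) for word, n in followers.most_common(k)]


if __name__ == "__main__":
    # Toy corpus, invented for illustration.
    corpus = [
        "fly from new york to san francisco",
        "fly to new york on wednesday",
        "fly to boston on friday",
    ]
    model = train_bigrams(corpus)
    print(predict_next(model, "fly"))
```

Roughly speaking, a recognizer combines probabilities like these with its acoustic scores, so acoustically similar but improbable word sequences can be pruned early.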

Finally, consumers will begin to see some of this speech-to-text technology in Windows Mobile 6.5.
 
Yeah... speaker phone in a relatively quiet cockpit or lab. Would be difficult in a noisy party or a cockpit with loud heavy metal playing.
There's no need to be unfair! I'm thinking of a few voices at the same time: does the mic array actually work? That's something I ask of PSEye too, which introduced one and hasn't used it. We haven't had a single demo of working voice isolation.
 
The PSEye and Kinect mic arrays are used for echo cancellation, multi-directional voice location, and background noise suppression. They are meant for a single speaker only. I remember MS also claimed single-speaker support for Kinect.

Neither can recognize speech in a noisy environment, and neither will track multiple human speakers. A headset might give more predictable results.
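
For what "multi-directional voice location" means with a mic array: a common textbook approach is to cross-correlate two mics to find the time difference of arrival, then convert that delay into an angle. The Python sketch below assumes a hypothetical two-mic pair 10 cm apart sampled at 16 kHz; I don't know the actual PSEye or Kinect array geometry or algorithm, so treat it as an illustration of the idea only.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C


def estimate_delay(mic_a, mic_b, max_lag):
    """Lag (in samples) by which mic_b trails mic_a, via brute-force cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            mic_a[i] * mic_b[i + lag]
            for i in range(len(mic_a))
            if 0 <= i + lag < len(mic_b)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(lag_samples, sample_rate, mic_spacing):
    """Convert a delay into an arrival angle (degrees from broadside) for a far-field source."""
    path_difference = SPEED_OF_SOUND * lag_samples / sample_rate
    ratio = max(-1.0, min(1.0, path_difference / mic_spacing))  # clamp noise-induced overshoot
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    rng = random.Random(0)
    voice = [rng.uniform(-1.0, 1.0) for _ in range(400)]
    true_lag = 3
    mic_a = voice
    mic_b = [0.0] * true_lag + voice[:-true_lag]  # same voice, 3 samples later
    # 0.1 m / 343 m/s * 16 kHz is about 4.7 samples, so +/-5 covers every direction.
    lag = estimate_delay(mic_a, mic_b, max_lag=5)
    print(lag, delay_to_angle(lag, 16000, 0.10))
```

With one dominant talker the correlation peak is sharp; with several people talking there are several competing peaks, which is one way to see why these arrays are pitched as single-speaker devices.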
 
It's not a case of tracking more than one at once, but that a voice should be detectable against the background noise of other people talking, where the other voices are just background noise. E.g. you're watching a film and discussing it, and while the discussion carries on, the person in charge calls out some instructions to the XB360 to stop/rewind/whatever the film. These array mics should be able to isolate that voice and pass the vocal commands on to the interpreter, where a single mic would just get all the voices muddled together. It's this voice isolation that hasn't been shown. And indeed, echo cancellation didn't even make an appearance on PSEye when chatting; its mic array seems completely useless. Until someone actually demos working sound isolation, I remain, from experience, skeptical.
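
The voice isolation being asked for is usually done with beamforming. Its simplest form is delay-and-sum: shift each mic's signal so the target talker's voice lines up, then average, so the target adds coherently while voices from other directions partly cancel. Here's a Python sketch with whole-sample delays and a made-up four-mic scene; real arrays (and whatever PSEye or Kinect actually do) use fractional delays, adaptive weights and much more, so this only shows the principle.

```python
import random


def delay_and_sum(channels, delays):
    """Steer an array at one talker by aligning and averaging the mic channels.

    channels[k][t] is mic k's sample at time t; delays[k] is how many samples
    later the target voice reaches mic k than it reaches the reference mic.
    """
    n = min(len(ch) for ch in channels)
    out = []
    for t in range(n):
        total = 0.0
        for ch, d in zip(channels, delays):
            if 0 <= t + d < len(ch):
                total += ch[t + d]  # undo this mic's arrival delay for the target
        out.append(total / len(channels))
    return out


def delayed(signal, d):
    """Signal arriving d samples later (zero-padded at the start)."""
    return [0.0] * d + signal[: len(signal) - d]


def demo(n=2000, seed=1):
    """Return (single-mic error, beamformed error) for a target plus one interfering voice."""
    rng = random.Random(seed)
    target = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    other = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    target_delays = [0, 2, 4, 6]  # hypothetical: target on one side of the array
    other_delays = [6, 4, 2, 0]   # interfering talker on the opposite side
    channels = [
        [a + b for a, b in zip(delayed(target, dt), delayed(other, do))]
        for dt, do in zip(target_delays, other_delays)
    ]
    beam = delay_and_sum(channels, target_delays)
    window = range(10, n - 10)  # skip edges where some channels run out
    single_err = sum((channels[0][t] - target[t]) ** 2 for t in window)
    beam_err = sum((beam[t] - target[t]) ** 2 for t in window)
    return single_err, beam_err


if __name__ == "__main__":
    print(demo())
```

In this toy scene, four mics cut the interfering voice's power to roughly a quarter (about 6 dB), which suggests a small array alone buys only limited suppression against a room full of chatter.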
 
The adaptive acoustic echo cancellation stuff I saw worked somewhat, but had a small delay at the beginning (while it adjusted itself). As I understand it, it's a heavily patented area. We had to pay or abort our plan.
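
That start-up delay is what you'd expect from an adaptive filter: it begins knowing nothing about the speaker-to-mic echo path and has to learn it. A textbook normalized LMS (NLMS) echo canceller in Python shows the effect. The echo path, step size and tap count are invented for illustration, it ignores double-talk detection, and I'm not claiming it's what any of the products (or patents) mentioned actually use.

```python
import random


def nlms_echo_canceller(far_end, mic, taps=16, mu=0.5, eps=1e-6):
    """Subtract an adaptively estimated speaker echo from the mic signal (NLMS).

    far_end: samples sent to the loudspeaker (the reference).
    mic: what the microphone hears (echo of far_end plus any local speech).
    Returns the residual, i.e. the mic signal with the estimated echo removed.
    """
    weights = [0.0] * taps  # estimated echo path; starts at zero, hence the start-up delay
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        norm = eps + sum(h * h for h in history)  # keeps the step size level-independent
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual


def convolve(signal, path):
    """Simulate the room: each output sample is a weighted sum of recent inputs."""
    return [
        sum(path[k] * signal[t - k] for k in range(len(path)) if t - k >= 0)
        for t in range(len(signal))
    ]


if __name__ == "__main__":
    rng = random.Random(2)
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
    echo_path = [0.6, 0.0, 0.3, -0.2]   # hypothetical speaker-to-mic echo path
    mic = convolve(far_end, echo_path)  # no local talker, so ideally everything cancels
    res = nlms_echo_canceller(far_end, mic)
    early = sum(e * e for e in res[:200])
    late = sum(e * e for e in res[-500:])
    print(f"residual energy, first 200 samples: {early:.3f}, last 500: {late:.2e}")
```

The residual is large for the first few hundred samples and then collapses once the weights match the echo path; a larger `mu` converges faster but is noisier once a local talker or real room noise is present.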

The PSEye descriptions mentioned noise suppression, not noise cancellation, so I assume it doesn't work 100%.

The voice location may be fine for one speaker, but would be limited for a party game system like PSEye and Kinect. The environment is expected to be noisy.


All in all, I agree PSEye was under-supported. Only a handful of PSEye games were released. Right now, both Kinect and PSEye are part of the "next gen" controllers though.

EDIT: Actually, if there is a money making VoIP phone service on consoles, we may see some of these techs on PSEye and Kinect. Too bad it's a low margin business right now.

Not sure speech recognition will be useful in gaming. At this point, I think it's too slow and relatively unreliable. So far it's only been used when our hands are busy with other activities.
 
Not sure how far away from Kinect Kudo is, but his fingers are clearly visible in the depth image in this video:

http://www.gamersyde.com/stream_kinect_gc_kinect_tech_demo-16675_en.html
His fingers flicker in and out as they fall between sampling points, and he's standing twice as close to the camera as the typical playing distance. There's enough there to recognise large hand gestures, like an open hand, a closed fist, and spread fingers, but not enough detail to do individual finger tracking at normal play distance (though a foot from the camera would work if the setup can focus that closely).
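
Some back-of-the-envelope arithmetic on why distance matters so much here: the width of scene covered by one depth pixel grows linearly with distance. The Python below assumes the 320x240 depth resolution asked about next in the thread and the commonly quoted ~57° horizontal field of view for Kinect's depth camera; both are assumptions for the sketch, and it ignores lens distortion and depth noise.

```python
import math


def pixel_footprint(distance_m, fov_deg=57.0, pixels_across=320):
    """Width in metres that a single depth pixel spans at the given distance.

    fov_deg and pixels_across are assumptions (roughly 57 degrees, 320 columns).
    """
    scene_width = 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)
    return scene_width / pixels_across


if __name__ == "__main__":
    for d in (1.0, 2.0, 3.0):
        print(f"{d:.1f} m: about {pixel_footprint(d) * 1000:.1f} mm per pixel")
```

Under those assumptions a pixel spans roughly 7 mm at 2 m, so a finger about 1.5 to 2 cm across covers only two or three pixels, and the gaps between fingers even fewer; halving the distance halves the footprint, which fits fingers showing up only at close range.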
Anyway, do we know for a fact that now the depth resolution is really just 320x240?
Yes.
 
Not surprised at all. I mean, do people really believe it was going to launch without the ability to work while seated?

Let's say a splinter faction of gamers uses this kind of stuff to make another splinter faction feel bad, and then you have the extreme ones who really believe what they hear. :rolleyes:
 
Not surprised at all. I mean, do people really believe it was going to launch without the ability to work while seated?
There's a valid reason to question it, because resolving the human form in a pose other than standing face-on is hard, and gets harder the further you deviate from that standing pose. Slouching on a couch makes it hard to identify key points in a depth image, and we've all seen the noise in Kinect skeletons where it's had issues resolving other cases. MS also released a video background removal title that didn't work, and a pitch recognition title in Lips that didn't work.

Personally I consider it just as unrealistic to take it as read that the technology will work nigh faultlessly with seated players, as it is to take it as read that it won't work at all well with seated users. The info we have doesn't provide a clearly predictable case either way.
 

Well, consider it unrealistic all you want, but I'd rather be optimistic than pessimistic all the time, especially about this hardware.
 