The speech recognition part is not convincing because people already know where the tech is today.
Skimming is the interesting one; it fits the 3D body tracking model very well (no tactile feedback needed, either!).
That's a good demo, explaining a lot more stuff. The AI is described as being a marvellous piece buried away in MS's dusty vaults. Speech recognition is using "TellMe" from a company MS acquired - that's one to look up. The skimming stones event has me rubbing my chin. What if the player doesn't try to skim? What if they don't even know how? The actual 'dialogue' was decidedly shonky, without any indication that anything said is recognised. Molyneux repeats that Milo exists 'in the cloud' and will learn vocabulary etc. over time, but so far short of a keyword here or there, I don't see any evidence of progression in language recognition.
Edit: Really, what's needed is someone to 'play' it who isn't trying to demo it but is just exploring Milo's responses. I imagine that no matter what you do in the skimming stones episode, Milo will come out of it having skimmed stones, and later he'll still say, "we had fun skimming stones." In which case it's not terribly interactive.
That doesn't change the nature of the tech or what Molyneux is talking about. He states at the beginning that they use tricks, but they're tricks that work. So the question is, to what extent has the technology really progressed? Is there an intelligent being there as Molyneux describes, or is it very far from being that? The proof would be if different responses can be got from the experience. If at the end of the day they have a set of situations that the player has to conform to, it's not really a learning platform but a join-the-dots where you get the experience they wanted. e.g. The skimming - you have to show Milo how to skim, and you have to have a fun experience that, after you've had to encourage him, he'll reflect upon. It's a tutorial...
Maybe that is all tutorial and the rest of the game is advanced AI, but at the moment nothing has been shown that looks like progress in the field to me. And, relevant to this thread rather than a Milo discussion, the voice recognition is a touted (but very under-demonstrated) point of Kinect that so far seems to be standard commands. Let's see some voice recognition in a room with background noise!
Another set of technologies can break up sentences into their constituent parts so that if the software doesn’t understand something it can ask for only the piece of missing information instead of repeating the entire question. For instance, if you say you want to fly from New york to San Francisco on Wednesday, and it got everything but the day, it would only ask you what day you want to fly instead of making you repeat your entire itinerary.
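The slot-filling idea described above can be sketched in a few lines. This is a toy illustration of the concept, not TellMe's actual implementation; the slot names and the keyword 'parser' are invented for the example:

```python
# Toy slot-filling dialogue: the itinerary is split into slots, and only
# the missing slots are asked for, instead of repeating the whole question.
# Illustrative sketch only, not any real speech system's API.

SLOTS = ["origin", "destination", "day"]

def parse_utterance(text):
    """Crude keyword 'parser' standing in for real speech understanding."""
    slots = {}
    text = text.lower()
    if "from new york" in text:
        slots["origin"] = "New York"
    if "to san francisco" in text:
        slots["destination"] = "San Francisco"
    for day in ("monday", "tuesday", "wednesday", "thursday", "friday"):
        if day in text:
            slots["day"] = day.capitalize()
    return slots

def missing_slots(slots):
    return [s for s in SLOTS if s not in slots]

slots = parse_utterance("I want to fly from New York to San Francisco")
print(missing_slots(slots))  # only 'day' is missing, so ask just for the day
```

The point is that the dialogue manager re-prompts per slot, so one misheard word costs one short question rather than a full repeat.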
TellMe is adding a few other technology improvements under the hood as well, such as statistical models for predicting the next word you are going to say to narrow down the possibilities and acoustic modeling that adapts to your accent or speech pattern.
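The "predicting the next word" part is essentially a statistical language model. A toy bigram version (my illustration on a made-up corpus, not TellMe's actual model) looks like:

```python
# Toy bigram language model: given the previous word, rank likely next
# words so a recognizer can narrow down its hypotheses.
# Illustrative sketch only; real systems use far larger models.
from collections import Counter, defaultdict

corpus = (
    "fly from new york to san francisco on wednesday "
    "fly from new york to boston on friday "
    "fly from san francisco to new york on monday"
).split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word, k=2):
    """Most likely next words after `word`, best first."""
    return [w for w, _ in bigrams[word].most_common(k)]

print(predict_next("new"))  # 'york' dominates after 'new' in this corpus
```

With such a model, the recognizer only has to distinguish between a handful of plausible next words instead of the whole vocabulary.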
Finally, consumers will begin to see some of this speech-to-text technology in Windows Mobile 6.5.
There's no need to be unfair! I'm thinking a few voices at the same time - does the mic array actually work? Something I ask of PSEye too, which introduced it and hasn't used it. We've had not a single demo of working voice isolation.

Yeah... speaker phone in a relatively quiet cockpit or lab. Would be difficult in a noisy party or a cockpit with loud heavy metal playing.
It's not a case of tracking more than one at once, but that a voice should be detectable given background noise of other people talking, where the other voices are just background noise. e.g. You're watching a film and discussing it, and while the discussion carries on, the person in charge calls out some instructions to XB360 to stop/rewind/whatever the film. These array mics should be able to isolate that voice and pass the vocal commands on to the interpreter, where a single mic would just get all the voices muddled together. It's this voice isolation that hasn't been shown. And indeed, echo cancellation didn't even make an appearance on PSEye when chatting - its mic array seems completely useless. Until someone actually demos working sound isolation, I remain, from experience, skeptical.

The PSEye and Kinect mic arrays are used for echo cancellation, multi-directional voice location, and background noise suppression. They are meant for a single speaker only. I remember MS also claimed single speaker for Kinect.
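For reference, the basic trick an array mic uses to favour one direction is delay-and-sum beamforming: delay each microphone's signal so that sound from the target direction lines up, then sum, which reinforces the target while uncorrelated noise partially averages out. A minimal two-mic sketch (my own illustration; the signal and the inter-mic delay are made up for the example):

```python
# Delay-and-sum beamforming sketch: align and sum two mic signals so a
# source arriving with a known inter-mic delay is reinforced while
# independent noise at each mic partially cancels. Illustration only.
import math
import random

random.seed(0)
N = 1000
DELAY = 5  # target source reaches mic2 five samples after mic1 (assumed)

# Target "voice": a sine wave; each mic also picks up independent noise.
voice = [math.sin(0.1 * n) for n in range(N)]
mic1 = [voice[n] + random.gauss(0, 1) for n in range(N)]
mic2 = [(voice[n - DELAY] if n >= DELAY else 0.0) + random.gauss(0, 1)
        for n in range(N)]

# Beamform: advance mic2 by the known delay, then average the two mics.
beam = [(mic1[n] + mic2[n + DELAY]) / 2 for n in range(N - DELAY)]

def snr(signal, reference):
    """Rough SNR: reference power over residual (noise) power."""
    err = [s - r for s, r in zip(signal, reference)]
    p_ref = sum(r * r for r in reference) / len(reference)
    p_err = sum(e * e for e in err) / len(err)
    return p_ref / p_err

print(snr(mic1[:N - DELAY], voice[:N - DELAY]))  # single mic
print(snr(beam, voice[:N - DELAY]))              # beamformed: noticeably higher
```

Averaging two mics halves the power of independent noise, so the SNR roughly doubles; a real array steers this delay electronically toward the speaker, which is exactly the isolation the posts above are asking to see demonstrated.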
Neither can recognize speech in a noisy environment, and neither will track multiple human speakers. The headset might give more predictable results.
Sitting in a chair: Definitely works.
Sitting on the floor: Works.
Reclining while facing the Kinect: Works.
Reclining with the Kinect at our side: We couldn't get this to work, but we've been told that it will by launch.
Using another human as a coffee table (should have taken photos!): Kinect recognized the person behind the human coffee table trying to control the movie.
His fingers flicker in and out as they lie in between sampling points, and he's standing twice as close to the camera as typical playing distance. There's enough there to recognise large hand gestures, like open hand, closed fist, and spread fingers, but not enough detail to do individual finger tracking (at normal play distance, though a foot from the camera would work if the setup can focus that closely).

Not sure how far away from Kinect Kudo is, but his fingers are clearly visible in the depth image in this video:
http://www.gamersyde.com/stream_kinect_gc_kinect_tech_demo-16675_en.html
Yes.

Anyway, do we know for a fact that the depth resolution is really just 320x240?
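As a back-of-the-envelope check of what a 320x240 depth image could resolve, assuming roughly a 57-degree horizontal field of view (the commonly cited figure for Kinect) and a typical adult finger width:

```python
# Rough angular-resolution estimate: how wide is one depth pixel at a
# given distance, and how many pixels does a finger span? Assumes a
# 57-degree horizontal FOV and a 320-pixel-wide depth image.
import math

FOV_H_DEG = 57.0
PIXELS_H = 320
FINGER_WIDTH_M = 0.018  # ~18 mm, a typical adult finger (assumed)

def pixel_size_m(distance_m):
    """Width of one depth pixel at the given distance (metres)."""
    view_width = 2 * distance_m * math.tan(math.radians(FOV_H_DEG / 2))
    return view_width / PIXELS_H

for d in (0.3, 1.0, 2.0):  # about a foot away, 1 m, typical play distance
    px = pixel_size_m(d)
    print(f"{d:.1f} m: {px * 1000:.1f} mm/pixel, "
          f"finger spans {FINGER_WIDTH_M / px:.1f} pixels")
```

Under these assumptions a finger spans only a couple of pixels at 2 m (consistent with the flickering fingers described above) but well over ten pixels a foot from the camera, which would explain why fingers are clearly visible in a close-up demo.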
Not surprised at all. I mean, do people really believe that it was going to be launched without the ability to work while seated?
There's a valid reason to question it, because resolving the human form in a pose other than standing face on is hard, and gets harder the more deviant from that standing pose you get. Slouching on a couch makes it hard to identify key points in a depth image, and we've all seen the noise in Kinect skeletons where it's had issues resolving other cases. MS also released a video background removal title that didn't work, and a pitch recognition title in Lips that didn't work.
Personally I consider it just as unrealistic to take it as read that the technology will work nigh faultlessly with seated players, as it is to take it as read that it won't work at all well with seated users. The info we have doesn't provide a clearly predictable case either way.