Kinect technology thread

The video review showed a pretty good hand-driven pointer system. It looks comparable to Move in MAG, which I experienced last night. If Kinect is tracking subtle movements that well, that's quite an amazing achievement for first-generation tech.

From what I can see it is only using the x/y position of the user's hand rather than the orientation of the user's arm. It's a subtle difference, but I would imagine it has a significant impact; I can't imagine using Move in MAG, for instance, if it were only using the x/y position of the ball for pointing. Kinect seems to use a similar control method to the menu selection in Beat Sketchers, and it works well in that setting.
 
From what I can see it is only using the x/y position of the user's hand rather than the orientation of the user's arm. It's a subtle difference, but I would imagine it has a significant impact; I can't imagine using Move in MAG, for instance, if it were only using the x/y position of the ball for pointing. EDIT: Move seems to use a similar control method to the menu selection in Beat Sketchers, and it works well in that setting.

It seems that way, but apparently it's still 3D for Beat Sketcher (e.g. the spray can covers a wider area with lower density if you move forward or backward).
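Presumably the depth reading feeds something like a spray-cone model, where standing further back widens the painted circle but thins the paint out. A toy Python illustration (the cone angle and units are invented, not anything from the game):

import math

def spray_footprint(distance_mm, cone_half_angle_deg=10.0, paint_per_frame=1.0):
    """Radius and relative density of a virtual spray cone at a given distance.

    Standing further back widens the painted circle but spreads the same
    amount of paint over a larger area, so density drops with the square of
    the distance. The cone angle is an invented value.
    """
    radius = distance_mm * math.tan(math.radians(cone_half_angle_deg))
    area = math.pi * radius ** 2
    return radius, paint_per_frame / area

for d in (500, 1000, 2000):   # hand 0.5 m, 1 m, 2 m from the virtual canvas
    r, density = spray_footprint(d)
    print(f"{d} mm: radius {r:.0f} mm, relative density {density:.2e}")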

Anyway, yeah, the hand tracking for pointing purposes seems decent. However, I really hate the hold-to-select; if they could replace that with an option to snap your fingers or something instead, that would be much better for me personally. I'd be able to go through the menus in a snap (or two, or three), literally!
 
From what I can see it is only using the x/y position of the user's hand rather than the orientation of the user's arm.
Oh, yes, you're quite possibly right, and that's not terribly exciting. It does seem to be working across only a small area of the screen, though, which would mean very low resolution for positioning. Can anyone test this, maybe by keeping their hand in the same place and moving their arm left and right behind it, changing the pointing angle but not the displacement?
 
One of the major key ingredients of the experience is machine learning. Machine learning in our world is defining a world of probabilities. Machine learning, particularly our kind, which is probabilistic, is not really about what you know, it's about what you don't know.

It's about being able to look at the world and not see duality, zeroes and ones, but to see infinite shades of grey. To see what's probable. You should imagine that, in our machine learning piece of the brain, which is just one component of the brain, pixels go in and what you get out of it is a probability distribution of likelihood.

So a pixel may go in and what comes out of it may be - hey, this pixel? 80 per cent chance that this pixel belongs to a foot. Sixty per cent chance it belongs to a head, 20 per cent chance that it belongs to the chest. Now this is where we chop the human body into the 48 joints which we expose to our game designers. What you see is infinite levels of probability for every pixel and if it belongs to a different body part.

That operation is, as you can imagine, a highly, highly parallelisable operation. It's the equivalent of saying, pixel in, work through this fancy maths equation and imagine you get a positive number, a positive answer, you branch right, you get a negative answer you branch left. Imagine doing this over a forest of probabilities. This is stuff where you'll get a thousand times performance improvement if you put it on the GPU rather than the CPU.

GPUs are machines designed for these types of operations. The core of our machine learning algorithm, the thing that really understands meaning, and translates a world of noise to the world of probabilities of human parts, runs on the GPU.

This sounds like he's saying that a highly probability-based, branch-heavy piece of code is better suited to a GPU than a general-purpose CPU?
 
It sounds like he's saying it's "a highly, highly parallelisable operation" and that "GPUs are machines designed for these types of operations".
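For anyone curious what the "positive answer, branch right; negative answer, branch left... over a forest" evaluation might look like in code, here is a minimal Python sketch. The depth-difference feature, the hand-built two-level trees, the thresholds and the three toy part labels are all invented for illustration; the real classifier presumably uses far deeper trees, many more body parts, and runs per pixel on the GPU as Kipman describes.

import numpy as np

# Toy stand-ins for the body parts behind the 48 joints mentioned in the quote.
PARTS = ["head", "chest", "foot"]

class Node:
    """One split node: compare a depth-difference feature against a threshold."""
    def __init__(self, offset=None, threshold=None, left=None, right=None, leaf_probs=None):
        self.offset = offset          # (dy, dx) probe offset in pixels
        self.threshold = threshold    # split threshold on the depth difference
        self.left = left              # subtree for a negative answer
        self.right = right            # subtree for a positive answer
        self.leaf_probs = leaf_probs  # probability distribution if this is a leaf

def feature(depth, y, x, offset):
    """Depth difference between the pixel and a nearby probe pixel."""
    dy, dx = offset
    py = int(np.clip(y + dy, 0, depth.shape[0] - 1))
    px = int(np.clip(x + dx, 0, depth.shape[1] - 1))
    return depth[py, px] - depth[y, x]

def evaluate_tree(node, depth, y, x):
    """Walk one tree: positive answer -> branch right, negative -> branch left."""
    while node.leaf_probs is None:
        if feature(depth, y, x, node.offset) > node.threshold:
            node = node.right
        else:
            node = node.left
    return node.leaf_probs

def classify_pixel(forest, depth, y, x):
    """Average the leaf distributions over the whole forest of trees."""
    probs = np.mean([evaluate_tree(tree, depth, y, x) for tree in forest], axis=0)
    return dict(zip(PARTS, probs))

def leaf(p):
    return Node(leaf_probs=np.array(p))

# Hand-built two-level trees with invented offsets and thresholds.
tree1 = Node((0, 5), 50.0, left=leaf([0.8, 0.15, 0.05]), right=leaf([0.1, 0.2, 0.7]))
tree2 = Node((-5, 0), 30.0, left=leaf([0.6, 0.3, 0.1]), right=leaf([0.05, 0.25, 0.7]))

depth = np.full((240, 320), 2000.0)   # fake flat depth map (values in mm)
depth[100:200, 140:180] = 1500.0      # a blob closer to the camera
print(classify_pixel([tree1, tree2], depth, 150, 160))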
 
It does seem to be working across only a small area of the screen, though, which would mean very low resolution for positioning. Can anyone test this, maybe by keeping their hand in the same place and moving their arm left and right behind it, changing the pointing angle but not the displacement?

Thinking about it, I don't think tracking the trajectory from the whole arm would actually be much different. If the elbow is in a fixed position you will be moving your hand in x/y anyway to control the pointer, so they could just use the x/y of the hand and apply some form of acceleration for the same effect. For laser-style pointing like Move/Wii it is all in the wrist. Child of Eden is a good example: looking at that, it could very well be using just the x/y of the hand or the orientation of the whole forearm, but the result would be much the same, to the point that you can't tell which it's doing.
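To make the "x/y of the hand plus some form of acceleration" idea concrete, here is a tiny Python sketch of how a pointer could be driven from a tracked hand position. The interaction-box size, gain curve and smoothing factor are all made-up values, not anything from the actual SDK or middleware.

def hand_to_pointer(hand_x, hand_y, prev, screen_w=1920, screen_h=1080,
                    box=(0.3, 0.7, 0.2, 0.8), gain=1.5, smooth=0.5):
    """Map a normalised hand position (0..1 in camera space) to screen pixels.

    box    -- sub-region of camera space mapped to the full screen, so a small
              hand movement sweeps the whole display (invented values)
    gain   -- simple "acceleration" that pushes positions away from the centre
    smooth -- exponential smoothing against the previous pointer position
    """
    x0, x1, y0, y1 = box
    # Normalise within the interaction box and clamp to [0, 1].
    nx = min(max((hand_x - x0) / (x1 - x0), 0.0), 1.0)
    ny = min(max((hand_y - y0) / (y1 - y0), 0.0), 1.0)
    # Crude acceleration around the box centre.
    ax = min(max(0.5 + (nx - 0.5) * gain, 0.0), 1.0)
    ay = min(max(0.5 + (ny - 0.5) * gain, 0.0), 1.0)
    # Smooth against the previous pointer to hide tracking jitter.
    px = smooth * prev[0] + (1 - smooth) * ax * screen_w
    py = smooth * prev[1] + (1 - smooth) * ay * screen_h
    return px, py

# Example: hand slightly right of centre, pointer previously at screen centre.
print(hand_to_pointer(0.55, 0.5, prev=(960, 540)))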
 
This sounds like he's saying that a highly probability-based, branch-heavy piece of code is better suited to a GPU than a general-purpose CPU?
That's not how I understand it. Creating those probability values is massively parallel number crunching, comparing large datasets for similarities, but the decision making, the final branches, can be handled on the CPU. I would like to know what exactly they're doing on the GPU, but I doubt we'll ever hear that!

I also want to know how much local-machine or 'cloud' learning is actually going on. The YouTube vid a few posts above shows Kinect throwing a wobbly over someone's legs being occluded by a short camera tripod. Is this something it'll later adapt to? How can it, without some degree of correction? Experience-based learning needs its decisions marked as successful or not, otherwise the system doesn't know to change. In this case, Kinect is placing the leg bent to the left when the user's legs are straight down. What is going to tell Kinect that was the wrong choice, so that next time it keeps the guy's two legs below him?
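On the "number crunching on GPU, decisions on CPU" split: one plausible reading is that the per-pixel probability maps come out of a massively parallel pass, and a much cheaper reduction then turns them into joint proposals. A toy version of that second step in Python, a probability-weighted centroid per body part, might look like the sketch below; the arrays, threshold and units are invented, and a real system would use something more robust than a plain centroid.

import numpy as np

def joint_proposals(prob_maps, depth, min_prob=0.3):
    """Turn per-pixel body-part probabilities into one 3D proposal per part.

    prob_maps -- dict of part name -> (H, W) probability image (output of the
                 parallel classification pass)
    depth     -- (H, W) depth image in millimetres
    Returns a dict of part name -> (x, y, z) weighted centroid, or None if
    the part never exceeds min_prob anywhere in the frame.
    """
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    joints = {}
    for part, p in prob_maps.items():
        weights = np.where(p >= min_prob, p, 0.0)
        total = weights.sum()
        if total == 0:
            joints[part] = None
            continue
        joints[part] = (float((xs * weights).sum() / total),
                        float((ys * weights).sum() / total),
                        float((depth * weights).sum() / total))
    return joints

# Tiny fake example: a 'head' blob near the top of a 120x160 frame.
depth = np.full((120, 160), 2200.0)
head_prob = np.zeros_like(depth)
head_prob[10:30, 70:90] = 0.9
print(joint_proposals({"head": head_prob}, depth)["head"])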

One area where his comparison with a human brain falls down, and a pet hate of mine when computer people talk up their AI, is that the computer lacks any comprehension and can only do comparisons with data. It's still fundamentally a conventional data-analysis system, only with more computations and more reference data. Unless they have modelled self-evaluation and self-correction, its capacity to learn and adapt won't be as great as Kipman has described.
 
Well... jumping usually involves your feet leaving the floor, which Kinect should be able to see, doesn't it?

Well, keep in mind that article is also about children. Depending on the child, some jump fairly high (a few inches) while quite a few barely get an inch off the ground when jumping in place. I'd imagine that could get tricky, especially if the camera is mounted above a large-screen TV.

Regards,
SB
 
Well... jumping usually involves your feet leaving the floor, which Kinect should be able to see, doesn't it?

In today's Kinect session that I attended, they showed some dev tech demos with a kind of picture-in-picture app showing what the various cameras saw, and you could select more views. The sensor was on a table, but this was a classroom table, not a living-room one, so it was waist-high, I'd guess. The cameras have a reasonably wide FOV, wider than I expected at least, and do pick up the floor. One thing they did have to do was lower the window blinds so it wasn't facing the (strong) sunlight coming through the windows.

But I don't think you really need the floor. Just track the body's displacement relative to the joints. If the body shifts upwards without the knees having bent first, you can assume the person jumped. I'm only guessing; they didn't go into much detail on how exactly the software works, but the various cameras and algorithmic passes I saw tell me they have plenty of processing involved. It's not just a simple contrast-based, silhouette edge-finder. The real-time skeletal tracking was particularly impressive.
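That "track the body's displacement relative to the joints" idea is easy to sketch as a check over a short window of skeleton frames. Everything below, the joint heights, window and threshold, is guessed for illustration; it tracks the ankles against their own recent baseline rather than an explicit floor plane, and it isn't how the actual software reports jumps.

def detect_jump(ankle_heights, rise_mm=60):
    """Toy jump check: did both ankles rise well above their recent minimum?

    ankle_heights -- list of (left_y, right_y) ankle heights in mm over a
                     short window (larger = higher), oldest first.
    Flags a jump when both feet end the window clearly above the lowest
    position they held earlier in it, i.e. both feet left their baseline.
    """
    if len(ankle_heights) < 2:
        return False
    lefts = [left for left, _ in ankle_heights]
    rights = [right for _, right in ankle_heights]
    return (lefts[-1] - min(lefts[:-1]) > rise_mm and
            rights[-1] - min(rights[:-1]) > rise_mm)

# Example window: feet near the floor, then both ankles ~100 mm higher.
window = [(80, 82), (78, 80), (85, 84), (180, 178)]
print(detect_jump(window))  # True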
 
I'm not sure about the floor. I think it was mentioned somewhere that Kinect has to see the floor for perspective or positioning or something.
 
So could you make a Kinect with two PS Eyes (one with the IR filter removed) and an IR projector, since the rest is in the software?
 
The physical hardware is the least complex part of Kinect. So, in theory anyone could easily replicate the hardware. Getting that built would, IMO, get you about 5% of the way to a working Kinect module. :p In contrast, Move is probably quite a bit more complex on the hardware front, but I'd be willing to bet the software still makes up greater than 50% of the complexity of the whole thing.

Regards,
SB
 
Is writing software for the IR camera much more complicated than image recognition software many companies (incl. Sony) already have? It seems to be an easier problem since both the color space and the resolution are smaller.
 
Is writing software for the IR camera much more complicated than image recognition software many companies (incl. Sony) already have?
The 3D skeleton tracking is. Unlike face recognition with its easy targets, finding the hands and legs when they're crossed or in front of another bit of body is very hard. Kinect represents several years' work on this very task alone, and it still has issues. No one but a boy-wonder genius in a Hollywood film, who interfaces with computers solely by typing and never uses a mouse, will be able to get a home-made depth camera up and running with full skeleton tracking without a considerable amount of effort and time; unless MS are plain lying about the effort they went to! :p
 
The 3D skeleton tracking is. Unlike face recognition with its easy targets, finding the hands and legs when they're crossed or in front of another bit of body is very hard. Kinect represents several years' work on this very task alone, and it still has issues. No one but a boy-wonder genius in a Hollywood film, who interfaces with computers solely by typing and never uses a mouse, will be able to get a home-made depth camera up and running with full skeleton tracking without a considerable amount of effort and time; unless MS are plain lying about the effort they went to! :p

Heh, it may be possible for someone to do as a university research study for their PhD if you constrain it to a set height, limb size, weight, etc. I believe things like that had been done to a limited extent prior to MS embarking on this process, which was mentioned in the big Kinect writeup. But again, the key there was that it was extremely limited to either a set plane (distance), set area, set body type, set range of motions, etc. And that tracking was easily lost and could only be re-acquired by standing in a certain spot with a certain body stance for a set amount of time.

The beauty of MS's solution isn't that it can do skeleton tracking (although that in itself is pretty significant) but that it can do it with a variety of body shapes, heights, etc., all at varying distances, all the while easily tracking multiple skeletons and seamlessly adding and subtracting people as they step into and out of the camera's FOV. And never once do you have to stand in a certain place, or hold a certain body position, for Kinect to start tracking you. Well, except when it glitches out with certain people. It'll be interesting to see how rarely or how often it glitches once millions of people have access to it.

As was mentioned in one of those Kinect pieces, it must have been absolutely amazing the first time they had some random Joe step in front of Kinect and it automatically started tracking their skeleton.

Regards,
SB
 
Yep, the software seems to be the key to this. Being able to tell a position in a 3D volume without using an accessory like the Wiimote or Move, and with only a single stationary "sensor", is a hard problem to solve. Doing that for more than one person, allowing them to "switch places", bump into one another, gracefully handling people exiting and re-entering the line of sight, other people walking in front, all of it in real time... it's mighty impressive.
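The "switch places / step in and out of view" part is essentially a data-association problem: each new frame's detected skeletons have to be matched to the people already being tracked. A bare-bones nearest-centroid matcher in Python (the IDs, distance threshold and centroids are all invented here) conveys the idea; a real tracker would need much more than this to survive people genuinely crossing or occluding each other.

import math
from itertools import count

_next_id = count(1)

def associate(tracked, detections, max_dist=400.0):
    """Match this frame's skeleton centroids to existing person IDs.

    tracked    -- dict of id -> (x, y, z) centroid from the previous frame (mm)
    detections -- list of (x, y, z) centroids detected this frame
    Returns an updated dict; unmatched detections get fresh IDs, unmatched
    IDs are dropped (that person has left the field of view).
    """
    updated, used = {}, set()
    for pid, prev in tracked.items():
        # Pick the closest unclaimed detection for this person, if any.
        best, best_d = None, max_dist
        for i, det in enumerate(detections):
            if i in used:
                continue
            d = math.dist(prev, det)
            if d < best_d:
                best, best_d = i, d
        if best is not None:
            updated[pid] = detections[best]
            used.add(best)
    for i, det in enumerate(detections):
        if i not in used:                    # someone new stepped into view
            updated[next(_next_id)] = det
    return updated

# Two people are detected, then both shift slightly; the IDs should follow them.
frame1 = associate({}, [(-600, 0, 2500), (600, 0, 2500)])
frame2 = associate(frame1, [(500, 0, 2400), (-500, 0, 2600)])
print(frame1, frame2, sep="\n")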

What the cameras themselves showed didn't look out of the ordinary, but I was very intrigued when they began showing the same cameras after the algorithmic passes and the new information they were extracting.
 
http://www.gamasutra.com/view/news/...l_Launch_Amid_Celebration_Questions_Alike.php

Critical consensus also seems to hold that the device recognizes input much less than perfectly; in a review with a headline claiming Kinect "sacrifices the controller on the altar of accuracy," PC World called the tech "incomplete and frequently crude, with all the promise of something amazing, but only partial delivery."

A Joystiq reviewer found Kinect had trouble recognizing his face when he had his glasses on, and he also felt the device requires an unreasonable amount of living room space to play games. The site's sum verdict was harsh: "For all the talk of revolutionizing the Xbox 360 experience and making gaming more natural/ accessible, it's bordering on absurd how broken Kinect is when it comes to something as simple as working in your home."

Looks like it performs exactly as I predicted when the device was announced, but that's not to say it won't succeed.
Fitness and party games are enough (look at the Wii: apart from a few first-party games, that's all it had!), as these don't really need to read your movements accurately.
$500 million of marketing at launch guarantees a huge launch, but at this stage it's too hard to see how it will pan out in six months' time.
 