Kinect technology thread

Plus, Move is $99 for seemingly much less technology and a much lower BOM, so Kinect's price isn't outlandish relative to the competition.

Not that I don't strongly wish MS had given Kinect a good deal more fidelity. I'm assuming they had good reasons.
 
Also, when you see the IR dot pattern, it looks like finger tracking won't be possible even with a higher-resolution camera unless the hand is held close to the IR projector. The interval between dots (the bright ones, anyway) looks to be something like a finger's width, so you couldn't rely on a finger being reasonably covered by the sample pattern. I think you'd need a higher-resolution sample pattern to accompany the higher-resolution camera, along with the processing overhead that would entail. We don't know how the algorithm scales, so we don't know whether 4x the resolution would require a little more processing, 4x the processing, or an exponentially larger amount.
Not sure how far he is from the Kinect in this photo, but:

[Image: kinect view hands.jpg]



Light-wise it seems enough to see fingers... But that got me thinking: the camera can't always see the dots that come through the gaps between fingers to know they really are gaps, and there isn't much space in there, so maybe their algorithm just treats that as a reading error and interprets the whole hand as a uniform mass beyond a certain distance.
 
Not sure how far he is from the Kinect in this photo, but:

Light-wise it seems enough to see fingers... But that got me thinking: the camera can't always see the dots that come through the gaps between fingers to know they really are gaps, and there isn't much space in there, so maybe their algorithm just treats that as a reading error and interprets the whole hand as a uniform mass beyond a certain distance.
Yes, that seems to be the current resolution when you look at how 'blobby' the depth images can appear. That hand is pretty close to the projector, from the looks of it.

I looked at one of the latest vids in HD, which had a T-shirt on a chair as a frame of reference. If a hand is taken to be about half that size, the dots offered pretty limited resolution for identifying individual finger positions. If you consider the number of projected dots, and that the resolution is 320x240, the depth detection is using quite a lot of dots per pixel sample. So basically, instead of looking at the dots, think of a projected 320x240 grid, and you'll see that each pixel of the IR camera takes in quite a large area - a few square centimetres at a distance (someone could actually calculate this. Maybe I will?!). There will be a good reason for this. In fact, I think it was already raised in this thread that MS said they evaluated a higher-resolution camera and found that getting usable accuracy improvements required a much more expensive camera and much more processing, so given the type of input they wanted, the low-res camera provided the same experience with lower overheads and is the efficient choice.
 
Also, when you see the IR dot pattern, it looks like finger tracking won't be possible even with a higher-resolution camera unless the hand is held close to the IR projector. The interval between dots (the bright ones, anyway) looks to be something like a finger's width, so you couldn't rely on a finger being reasonably covered by the sample pattern. I think you'd need a higher-resolution sample pattern to accompany the higher-resolution camera, along with the processing overhead that would entail. We don't know how the algorithm scales, so we don't know whether 4x the resolution would require a little more processing, 4x the processing, or an exponentially larger amount.

Yes, I didn't forget about R&D costs (any idea how much we're looking at? Apparently they spent more on marketing than R&D). But MS justified paring back the hardware by saying they were trying to get the best price for consumers; the fact that it costs them £34 to make means they were just talking BS (as were all the analysts and commentators who said MS would be making very little money on each unit).

Plus, don't forget that Kinect technology wasn't developed exclusively for Xbox; it's part of MS's push into future interface technology (like Surface) and will be implemented on desktops in the future, so the R&D costs would have been amortised accordingly.

And finger tracking would be possible with a hi-res camera at closer distances - that was the whole premise behind that cancelled Ubisoft finger-fighter game.

And the camera might be more efficient at 320x240, but having the option of a higher resolution at the expense of processing cost would be a tradeoff developers could make if they wanted the higher fidelity.

Plus, the original spec had onboard processing to extract the skeleton from the depth map; surely an onboard processor can't cost that much if the final device uses only single-digit percentages of Xbox CPU time. What kind of processor are we looking at that could do this?
Not sure how far he is from the Kinect in this photo, but:

[Image: kinect view hands.jpg]



Light-wise it seems enough to see fingers... But that got me thinking: the camera can't always see the dots that come through the gaps between fingers to know they really are gaps, and there isn't much space in there, so maybe their algorithm just treats that as a reading error and interprets the whole hand as a uniform mass beyond a certain distance.

But aren't the dots just the output from the IR projector, and so don't correspond to what is being seen by the depth cam?

It would be good if someone could calculate the resolving power of a 320x240 vs a 640x480 depth cam at various distances from the camera.

This might help:
http://www.flickr.com/photos/randrporter/4749126317/sizes/o/
 
Plus, the original spec had onboard processing to extract the skeleton from the depth map; surely an onboard processor can't cost that much if the final device uses only single-digit percentages of Xbox CPU time. What kind of processor are we looking at that could do this?

Except in this case, there are hints that it's more about exploiting the massively parallel processing advantages of the GPU, and not so much about using anything from the CPU. In which case, they may be getting an order of magnitude more performance for their skeletal-tracking algorithm on the GPU than they would from a cheap dedicated CPU in Kinect.

That said, it does appear that for Kinect 2.0, something like AMD's upcoming Fusion APUs might be a perfect fit.

Regards,
SB
 
It would be good if someone could calculate the resolving power of a 320x240 vs a 640x480 depth cam at various distances from the camera.

This might help:
http://www.flickr.com/photos/randrporter/4749126317/sizes/o/

I hope the people who have an IR cam can capture video/images from roughly where the Kinect is positioned (or, better, capture the direct feed/raw image from the IR cam on the Kinect itself), and perhaps test whether using a higher-resolution cam adds more precision or not.

I want to see what the Kinect IR cam sees (not the processed depth image).

Those DF guys better do it or else...
 
Yes, that seems to be the current resolution when you look at how 'blobby' the depth images can appear. That hand is pretty close to the projector, from the looks of it.

I looked at one of the latest vids in HD, which had a T-shirt on a chair as a frame of reference. If a hand is taken to be about half that size, the dots offered pretty limited resolution for identifying individual finger positions. If you consider the number of projected dots, and that the resolution is 320x240, the depth detection is using quite a lot of dots per pixel sample. So basically, instead of looking at the dots, think of a projected 320x240 grid, and you'll see that each pixel of the IR camera takes in quite a large area - a few square centimetres at a distance (someone could actually calculate this. Maybe I will?!). There will be a good reason for this. In fact, I think it was already raised in this thread that MS said they evaluated a higher-resolution camera and found that getting usable accuracy improvements required a much more expensive camera and much more processing, so given the type of input they wanted, the low-res camera provided the same experience with lower overheads and is the efficient choice.

I don't think MS had much of a choice if they were to maintain compatibility with older 360 models (which are bound to USB 2.0)...

A few pages back I posted a video about a Canesta depth-sensing camera. Its resolution was even lower than the one generated by PrimeSense (320x200), but the depth image seemed to have a higher DOF and precision than the current solution. MS also bought 3DV, which develops ToF cameras as well. But allegedly time-of-flight cameras are not very cheap, so maybe they wouldn't have been able to launch this year at a competitive price.

And increasing the resolution of the cameras just isn't an option unless they used a faster bus than USB 2.0, which would make it incompatible with non-slim models.
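As a quick back-of-envelope check on that USB 2.0 point (assuming roughly 16 bits per depth sample, 8-bit Bayer RGB, 30 fps, and about 280 Mbit/s of usable high-speed bandwidth, none of which are confirmed Kinect figures), a 640x480 depth feed plus RGB already eats most of the bus, and anything much higher clearly wouldn't fit uncompressed:

```python
# Back-of-envelope: does an uncompressed depth + RGB stream fit in USB 2.0?
# All figures are assumptions for illustration, not confirmed Kinect specs.

USB2_USABLE_MBIT = 280  # practical high-speed throughput; theoretical max is 480 Mbit/s

def stream_mbit(width, height, bits_per_pixel, fps=30):
    """Raw bandwidth of an uncompressed video stream in Mbit/s."""
    return width * height * bits_per_pixel * fps / 1e6

depth_640  = stream_mbit(640, 480, 16)   # ~147 Mbit/s (assuming 16 bits per depth sample)
rgb_640    = stream_mbit(640, 480, 8)    # ~74 Mbit/s (8-bit Bayer before demosaicing)
depth_1280 = stream_mbit(1280, 960, 16)  # ~590 Mbit/s

print(f"640x480 depth + RGB: {depth_640 + rgb_640:.0f} Mbit/s "
      f"(fits in USB 2.0: {depth_640 + rgb_640 < USB2_USABLE_MBIT})")
print(f"1280x960 depth alone: {depth_1280:.0f} Mbit/s "
      f"(fits in USB 2.0: {depth_1280 < USB2_USABLE_MBIT})")
```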

Yes, I didn't forget about R&D costs (any idea how much we're looking at? Apparently they spent more on marketing than R&D). But MS justified paring back the hardware by saying they were trying to get the best price for consumers; the fact that it costs them £34 to make means they were just talking BS (as were all the analysts and commentators who said MS would be making very little money on each unit).

Plus, don't forget that Kinect technology wasn't developed exclusively for Xbox; it's part of MS's push into future interface technology (like Surface) and will be implemented on desktops in the future, so the R&D costs would have been amortised accordingly.

And finger tracking would be possible with a hi-res camera at closer distances - that was the whole premise behind that cancelled Ubisoft finger-fighter game.

And the camera might be more efficient at 320x240, but having the option of a higher resolution at the expense of processing cost would be a tradeoff developers could make if they wanted the higher fidelity.

Plus, the original spec had onboard processing to extract the skeleton from the depth map; surely an onboard processor can't cost that much if the final device uses only single-digit percentages of Xbox CPU time. What kind of processor are we looking at that could do this?


But aren't the dots just the output from the IR projector, and so don't correspond to what is being seen by the depth cam?

It would be good if someone could calculate the resolving power of a 320x240 vs a 640x480 depth cam at various distances from the camera.

This might help:
http://www.flickr.com/photos/randrporter/4749126317/sizes/o/
I dunno about the marketing budget outweighing R&D... Kinect was greenlit in 2007, had about 3 years of R&D, and the team needed almost a miracle (or to throw huge amounts of cash at making the miracle happen) to get this working in so little time. Didn't Nintendo even say that they were shown the tech back in 2007, but it was in such a state that it simply wasn't working? Their massive machine-learning datacentre alone must have cost quite a few bucks.

The camera is capable of finger tracking at closer distances. MS has even stated so, citing a tech demo as proof of concept.

And everything points to the camera not having been downgraded at all. It's still a 640x480 depth camera; it's the processing done by PrimeSense that supposedly outputs a smaller image. The guy who wrote the open driver for it even said that both feeds are 640x480, though I believe the depth image is upscaled.

I hope the people who have an IR cam can capture video/images from roughly where the Kinect is positioned (or, better, capture the direct feed/raw image from the IR cam on the Kinect itself), and perhaps test whether using a higher-resolution cam adds more precision or not.

I want to see what the Kinect IR cam sees (not the processed depth image).

Those DF guys better do it or else...

Probably never going to happen (us getting access to the IR image without any processing). The chip that handles that is inside the camera, and without its processing all you would have is a black-and-white picture. It's their processing that transforms that into a depth picture, and I'm sure they keep that secret closely guarded. After all, their tech trades a small amount of processing power to perceive depth using off-the-shelf sensors, which costs way less than their competitors' alternatives. It may not be a huge advantage now (except maybe for the lower price point), but in the future, when not constrained by the data bus, they will probably be able to deliver much higher resolutions than the competition.
 
It would be good if someone could calculate the resolving power of a 320x240 vs a 640x480 depth cam at various distances from the camera.

This might help:
http://www.flickr.com/photos/randrporter/4749126317/sizes/o/
I've crunched the numbers and come up with a usable ratio. The horizontal length (l) of the projected area is 1.086 x the distance (d) from the camera, since tan(28.5°) = l/2d. The sample resolution is thus 320 pixels over that length, or 320/1.086d per unit length. Given square pixels, the samples per unit area will be that figure squared, though I'm not sure the pixels are square.

Thus at 5 feet from the camera, each square inch of play area occupies about 25 pixels on the IR camera (or roughly 4 pixels per square centimetre for those whose measurement system isn't straight from the Dark Ages :p). At ten feet the resolution per axis halves, quartering the area resolution, so about 6 pixels per square inch, or roughly one pixel per square centimetre.

Using a 640x480 camera would give 4x the area resolution, so about 100 px/sq inch at 5' and 25 px/sq inch at 10'.
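For anyone who wants to poke at those numbers, here's a minimal sketch of the same calculation (assuming the ~57° horizontal FOV implied by the tan(28.5°) figure above, and square pixels); it reproduces the ~25 and ~6 px per square inch estimates, and the 4x jump for 640x480:

```python
import math

# Sketch of the estimate above: depth-camera samples per square inch of the scene
# at a given distance, assuming a ~57 degree horizontal FOV (the tan(28.5 deg)
# figure) and square pixels.

HALF_FOV_DEG = 28.5

def px_per_sq_inch(h_pixels, distance_inches):
    width = 2 * distance_inches * math.tan(math.radians(HALF_FOV_DEG))  # ~1.086 * d
    per_inch = h_pixels / width      # pixels per linear inch at that distance
    return per_inch ** 2             # pixels per square inch, assuming square pixels

for feet in (5, 10):
    d = feet * 12
    print(f"{feet:2d} ft: 320x240 -> {px_per_sq_inch(320, d):5.1f} px/sq in, "
          f"640x480 -> {px_per_sq_inch(640, d):5.1f} px/sq in")
```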
 
I want to see what the Kinect IR cam sees (not the processed depth image).
It'll look very much like a screen of static. The projected pattern will keep its proportions because it's more or less perspective-aligned with the camera; if the perspective is different, it's only slightly so. Someone looking at the projection will not discern any contours of the surfaces the pattern falls on. TBH I don't understand how they derive distance measurements from it!
 
I've crunched the numbers and come up with a usable ratio. The horizontal length (l) of the projected area is 1.086 x the distance (d) from the camera, since tan(28.5°) = l/2d. The sample resolution is thus 320 pixels over that length, or 320/1.086d per unit length. Given square pixels, the samples per unit area will be that figure squared, though I'm not sure the pixels are square.

Thus at 5 feet from the camera, each square inch of play area occupies about 25 pixels on the IR camera (or roughly 4 pixels per square centimetre for those whose measurement system isn't straight from the Dark Ages :p). At ten feet the resolution per axis halves, quartering the area resolution, so about 6 pixels per square inch, or roughly one pixel per square centimetre.

Using a 640x480 camera would give 4x the area resolution, so about 100 px/sq inch at 5' and 25 px/sq inch at 10'.

Great, thanks for doing that.

So it seems that at roughly 4 pixels per square cm at 5 feet, the camera can track fingers (though it might have some problems with kids' fingers).

5 feet away isn't actually that close, further than I thought.

But a 640x480 camera would have no problem tracking fingers (even children's fingers) at 5 feet and would be usable further out (say, when sitting on the couch to control movies or the dashboard) for adults.

It's a real pity MS didn't go with the higher-res depth cam and onboard processing (and hadn't pared back the number of joints tracked), as Kinect would be far more versatile with finger tracking - it allows for fine control that's simply impossible otherwise (and without the use of buttons).

For example, the interface seen in Minority Report relies on finger tracking to work, and MS could have replicated a practically identical experience (you wouldn't even need the finger lights!) for Kinect had they simply gone with a 640x480 depth camera (I mean, how much more does it cost than a 320x240 cam?).

Especially seeing as MS has the know-how from Surface, which itself was an inspiration for the UI seen in the film.
 
Maybe Kinect calculates distance in a similar way to how an SLR's phase-detection autofocus works? I.e. two sensors that read light/rays with different wavelengths (not sure that's really needed) from slightly different angles to the subject (those dots), calculate where those rays intersect, and so figure out the distance. Not a very precise way to do it, but it is very fast and I believe good enough for a device like Kinect.
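That is essentially triangulation, and it's also how structured-light setups like this are usually described as recovering depth: the projector and camera sit a known baseline apart, so how far a dot appears shifted sideways in the camera image encodes how far away the surface is. A toy sketch of the relation (the baseline, focal length, and disparities here are made-up illustrative numbers, not Kinect's actual parameters):

```python
# Toy triangulation: depth from the sideways shift (disparity) of a projected dot
# between the projector's and the camera's viewpoints. Numbers are illustrative only.

BASELINE_M = 0.075   # assumed projector-to-camera separation, in metres
FOCAL_PX   = 580.0   # assumed focal length, in pixel units

def depth_from_disparity(disparity_px):
    """Classic pinhole relation: depth = baseline * focal_length / disparity."""
    return BASELINE_M * FOCAL_PX / disparity_px

for disp in (30.0, 20.0, 10.0):
    print(f"dot shifted {disp:4.1f} px -> depth ~{depth_from_disparity(disp):.2f} m")
```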
 
Great, thanks for doing that.

So it seems that at roughly 4 pixels per square cm at 5 feet, the camera can track fingers (though it might have some problems with kids' fingers).

5 feet away isn't actually that close, further than I thought.

But a 640x480 camera would have no problem tracking fingers (even children's fingers) at 5 feet and would be usable further out (say, when sitting on the couch to control movies or the dashboard) for adults.

It's a real pity MS didn't go with the higher-res depth cam and onboard processing (and hadn't pared back the number of joints tracked), as Kinect would be far more versatile with finger tracking - it allows for fine control that's simply impossible otherwise (and without the use of buttons).

For example, the interface seen in Minority Report relies on finger tracking to work, and MS could have replicated a practically identical experience (you wouldn't even need the finger lights!) for Kinect had they simply gone with a 640x480 depth camera (I mean, how much more does it cost than a 320x240 cam?).

Especially seeing as MS has the know-how from Surface, which itself was an inspiration for the UI seen in the film.
That doesn't seem impossible for Kinect...

Take a look at this:


This was done by a single person in just a few days...
 
That doesn't seem impossible for Kinect...

Take a look at this:


This was done by a single person in just a few days...

Yes, except he's using his hands and not his fingers, so the range of gestures possible is far more limited. I mean, he's only moving pictures around and zooming in and out.

The Minority Report UI allows a far more complex range of interactions.

If anything, that video shows how effective simple manipulation is with Kinect, and how much more would be possible with just 4x the resolving power of the current cam.
 
Understandable considering he has a glove-like device on his hands with LEDs lit up on the end of his fingers.

Tommy McClain

As I mentioned before, Kinect would let you do this without needing the LED glove - theoretically even now, if you get close enough to the camera for it to resolve fingers.

A 640x480 res depth camera could have made this possible at a comfortable sitting or playing distance.
 
As I mentioned before, Kinect would let you do this without needing the LED glove - theoretically even now, if you get close enough to the camera for it to resolve fingers.

A 640x480 res depth camera could have made this possible at a comfortable sitting or playing distance.

I think that implementation would be great with an HTPC and a widescreen TV for the living room.

Something like voice-activated web pages and search, with gesture-controlled scrolling and selection or clicks on hyperlinks. I know I'd pay money for a browser like that.
 
Yes, except he's using his hands and not his fingers, so the range of gestures possible is far more limited. I mean, he's only moving pictures around and zooming in and out.

The Minority Report UI allows a far more complex range of interactions.

If anything, that video shows how effective simple manipulation is with Kinect, and how much more would be possible with just 4x the resolving power of the current cam.

I still think that would be possible... They could use the depth data to track where the hand is, and once they know that, the RGB camera can give more insight into how the hand is gesturing.

I mean, it already does something of that sort, right? Right now, from your couch, it watches for a specific gesture to FF/rewind during video playback...

In Minority Report he does all those fancy gestures, but in the end he's choosing between a bunch of video feeds, pausing, FF/rewinding, and zooming between them. It's that incredible display that seems further from reality than the scheme used to control it. And he's not really that far from the screen either :p
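On the depth-first idea above, here's a minimal sketch of what such a pipeline could look like (plain NumPy with made-up frames and thresholds, and a naive nearest-blob heuristic; not anything Kinect actually does):

```python
import numpy as np

# Toy pipeline: use the depth map to find the nearest blob (assumed to be the hand),
# then hand its bounding box in the RGB frame to a finer gesture/finger analysis step.
# Frame sizes, thresholds and the nearest-blob heuristic are all illustrative.

def hand_roi(depth_mm, rgb, band_mm=150):
    valid = depth_mm > 0                            # 0 = no depth reading
    nearest = depth_mm[valid].min()                 # closest valid depth in the frame
    mask = valid & (depth_mm < nearest + band_mm)   # everything within ~15 cm of it
    ys, xs = np.nonzero(mask)
    return rgb[ys.min():ys.max() + 1, xs.min():xs.max() + 1]  # RGB crop to analyse

# Fake frames just to show the call:
depth = np.random.randint(800, 4000, (240, 320)).astype(np.uint16)
rgb   = np.random.randint(0, 256, (240, 320, 3)).astype(np.uint8)
print(hand_roi(depth, rgb).shape)
```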
 
The "hacks" are getting cooler...


I was able to do this exact thing with the Kinect Viewer that was available during the beta. It's the tool where you can record your body/gestures & upload to MS. I believe they use that for updating the algorithm.

Tommy McClain
 
The "hacks" are getting cooler...


I was able to do this exact thing with the Kinect Viewer that was available during the beta. It's the tool where you can record your body/gestures & upload to MS. I believe they use that for updating the algorithm.

Tommy McClain

Really cool.
 