Kinect technology thread

That's what the calibration phase is for, but yeah, regular cameras have limitations. Still, it strikes me as just a slightly better camera. In theory it can work in worse conditions, but it seems far more effective to go with some kind of marker rather than trying to plan for all of this in the camera.
 
But with the time-of-flight camera using IR, they don't have to plan for anything. It just builds a depth map, if you want to call it that. The raw data is already in three dimensions and makes no assumptions; the only interpretation comes when you have to recognize the human form and track it. That's ignoring other sources that emit enough IR to interfere with the camera.

The only question is whether it would be easier to recognize and track a human form from raw 3D data or from a 2D image. I'd hazard a guess that it's easier with 3D. For one, based on the depth info, a person will simply pop out. There are complications with multiple people, obstructing objects, occluded body parts, etc., but I'd think those might all be easier to solve in 3D than in 2D. There may be some types of movement that would be very easy to do in 2D, like head tracking.
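To illustrate the "pop out" point: once you have a depth map, pulling out the person is little more than keeping the pixels inside a sensible distance band. A minimal sketch (the array name and the near/far limits are made up for the example, not anything Kinect actually uses):

```python
import numpy as np

def extract_foreground(depth_mm, near=800, far=3500):
    """Keep pixels inside a plausible 'player' distance band.

    depth_mm: 2D array of depth readings in millimetres (0 = no reading).
    near/far: invented working range for the example, in millimetres.
    """
    valid = depth_mm > 0                             # drop pixels with no depth reading
    in_range = (depth_mm > near) & (depth_mm < far)  # keep what sits in the band
    return valid & in_range

# With a 320x240 depth frame, the player shows up as the connected blob of
# in-range pixels - something a plain 2D image can't give you without making
# assumptions about colour and contrast.
depth_frame = np.zeros((240, 320), dtype=np.uint16)  # placeholder frame
mask = extract_foreground(depth_frame)
```

Try doing the same from a single RGB frame and you're straight into background modelling and lighting assumptions.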
 
Is there anything more than rumors telling us it doesn't have any processors? I don't think anyone has leaked internal shots yet, have they?

Yes, the E3 2009 Microsoft materials mention the processor, the later materials do not. All other elements of the spec are the same.
 
Gosh at the low-resolution depth camera. I suppose it's 16-bit per channel instead of 8-bit, and I guess that makes it a robotics part rather than a consumer item (where else are 16-bit cameras used?). This definitely makes finger tracking impossible! It may also explain the reduced skeleton complexity, as there won't be enough info to track more points reliably. But you don't need more points IMO; 20 is enough to get the limbs and body position.

Indeed, but IR to my eyes seems a bit of a waste when you can do it the exact way I stated. Sure, IR is likely more accurate in measuring distance, so smaller changes can be picked up, but I don't see anything greatly revolutionary.
In theory, but Sony's solution shows optical measurement needs sufficient contrast to be accurate, a problem they couldn't solve without throwing in a light source. Consider trying to measure the size of a hand reaching towards the screen against a pale top: working out which fraction of a pixel is the hand and which isn't becomes nigh impossible. I dare say Kinect and Move are the only practical solutions given current tech, when you factor in that these companies have been chasing the idea of motion controls for a long time and trying out all the options. The only other alternative is a TOF camera. Anyone got ideas why this hasn't been used? In the early days of Natal discussion, some were suggesting it shouldn't be that expensive.
 
Yes, the E3 2009 Microsoft materials mention the processor, the later materials do not. All other elements of the spec are the same.

As I understand it from PrimeSense's documentation:
1) they manipulate an IR light source to make it do what they want (pulsing/modulating/whatever).
2) the IR camera output is read, looking for aspects of the pattern sent by the light source. This produces a 3D depth map?
3) they then turn the depth map into skeletal models.

1 & 2 are "time sensitive/high bandwidth"? So I assume some type of controller/processor is inside the camera.
3 is the bit that requires "black magic" and I think MS do that inside the 360?


In terms of the 320/640 camera, DF have some comments:
http://www.eurogamer.net/articles/digitalfoundry-kinect-spec-blog-entry

In our gamescom demo, presumably using something closer to the original reference design, Kudo Tsunoda expressed reservations that hand and finger tracking would work consistently with the camera simply because human beings come in all different sorts of shapes and sizes. There would be no way to ensure accurate tracking of a child's fingers, for example.


Therefore, for the sake of reliability, the emphasis would shift to tracking the whole body and at that point the need for the VGA depth map was less apparent, although clearly tracking more subtle movements does become more challenging. The lower-resolution depth map also reduces the amount of data being beamed across USB, and decreases processing overhead too.
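For a rough sense of what that lower-resolution depth map saves over USB, here's my own back-of-the-envelope sum (assuming 16 bits per depth sample at 30 fps, uncompressed):

```python
def depth_stream_rate_mb(width, height, bits_per_sample=16, fps=30):
    """Raw depth-map bandwidth in MB/s, assuming no compression."""
    bytes_per_frame = width * height * bits_per_sample // 8
    return bytes_per_frame * fps / 1e6

print(depth_stream_rate_mb(320, 240))   # ~4.6 MB/s for the reported depth map
print(depth_stream_rate_mb(640, 480))   # ~18.4 MB/s for a VGA depth map
```

Presumably the RGB stream and the microphones share the same USB connection, which would be where the pressure comes from.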

Something I haven't seen mentioned is interference: sunlight/reflected sunlight/fluorescent lighting & 'pressing the volume button for the telly' - not sure if anyone knows whether that would be a problem?
 
The only other alternative is a TOF camera. Anyone got ideas why this hasn't been used? In the early days of Natal discussion, some were suggesting it shouldn't be that expensive.

That was ruled out due to the increased time to acquire an image. You'd have to send a pulse of light and time how long it took to reach the target and reflect back. With the system in Natal, all it does is look at the intensity (or whatever) of the reflected light, so half the travel distance is cut.

So instead of having a strobe where you do pulse, measure, pulse, measure, you just have one continuous stream of input with no need to time pulses, etc.

Regards,
SB
 
That was ruled out due to the increased time to acquire an image. You'd have to send a pulse of light and time how long it took to reach the target and reflect back. With the system in Natal, all it does is look at the intensity (or whatever) of the reflected light, so half the travel distance is cut.
The added time is nanoseconds! We're talking the speed of light here.
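To put a number on it (simple speed-of-light arithmetic, with a made-up 5 m living-room distance):

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def round_trip_ns(distance_m):
    """Time for a light pulse to reach the target and bounce back, in nanoseconds."""
    return 2 * distance_m / SPEED_OF_LIGHT * 1e9

print(round_trip_ns(5.0))   # ~33 ns for a player 5 metres away
```

At 30 fps a frame lasts about 33 ms, so the light's travel time is roughly a millionth of the frame budget.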
 
That was ruled out due to the increased time to acquire an image. You'd have to send a pulse of light and time how long it took to reach the target and reflect back. With the system in Natal, all it does is look at the intensity (or whatever) of the reflected light, so half the travel distance is cut.
You got it wrong. The light travels an identical distance in both cases.

For a TOF camera, the refresh rates can be much faster as it reads frame-at-a-time, while Natal is pixel-at-a-time.
 
You got it wrong. The light travels an identical distance in both cases.
With a TOF camera, you have to use the same photons emitted for that frame, so you have to send the photons at the beginning of the frame and wait for them to bounce back. For Kinect, as I understand it from the term 'pattern' used to describe the visioning process, the scene is irradiated with a constant pattern of IR light, so there'll be reflected photons any time the camera wants to sample. But as I say, all this is purely academic, as the time difference is too small to affect anything.

What could slow things down is if the IR pattern is strobed or swept across the scene, needing a transmission phase, which could add some milliseconds to the sampling rate of the Kinect solution, but that's actually arguing in the wrong direction! I'm still scratching my head as to why TOF wasn't used; I guess cost.
 
I thought it was a time-of-flight camera, otherwise how are they measuring depth? Don't you need to know when the light was pulsed to know how long it took to be reflected back? If you were constantly flooding the room with IR, you'd always have reflection, but how would you make any kind of depth measurement?
 
It's definitely not TOF in Kinect. What I read was that a pattern is shone into the room, and deviations in this pattern give a sense of depth. Think of a grid being projected onto a person's face and how it'd form contours. Rereading the Digital Foundry interview, the PrimeSense guy says they 'bathe the scene in near IR light' in what they call their 'Light Coding' method. As it uses standard components, it's basically an algorithmic solution. The nature of their IR light is unknown, but that may be part of the reason for the low depth resolution, as it may take several sensor pixels to determine one 3D point. Maybe. It's just a standard CMOS sensor though, whereas TOF would need a sensor to be designed and fabbed for the job.
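If the pattern reading is right, the depth extraction is essentially triangulation: a projected feature seen by a camera offset from the projector shifts sideways by an amount that depends on how far away the surface is. A toy sketch of the relationship (the focal length and baseline numbers are invented for the example, not Kinect's actual figures):

```python
def depth_from_disparity(disparity_px, focal_length_px=580.0, baseline_m=0.075):
    """Classic triangulation: depth = focal_length * baseline / disparity.

    disparity_px: how far (in pixels) a projected feature has shifted from
                  where it sits in a reference image taken at a known depth.
    focal_length_px, baseline_m: invented example values, not real specs.
    """
    if disparity_px <= 0:
        return float('inf')   # no measurable shift = effectively at infinity
    return focal_length_px * baseline_m / disparity_px

print(depth_from_disparity(20))   # ~2.2 m for a 20-pixel shift in this toy setup
```

That would also fit the point about several sensor pixels going into one 3D point, since you need a recognisable chunk of the pattern to measure the shift at all.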
 
Where did this information come from that it wasn't time of flight?

Previously, most time-of-flight systems used phase changes to calculate the distance between source and target. They required a laser and only measured one pixel at a time, since there was a single detector.

Not too long ago they started developing area-scan cameras that give you z information. These use pulsed light, with the sensor being modulated with an analog signal (see http://en.wikipedia.org/wiki/Time-of-flight_camera and http://en.wikipedia.org/wiki/File:TOF-Kamera-Prinzip.jpg). These are new sensors and have only really been around for a few years (though the theory has been known for longer). The accuracy in x, y and z of these TOF cameras is identical to the Kinect sensor's.

There is no pattern matching of light sources (i.e. there isn't a grid). It is just an IR light source being pulsed at a known frequency (again, see the image I linked).
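For the phase-change variety, the standard continuous-wave TOF relationship looks like this (a sketch only; the 20 MHz modulation frequency is just an illustrative pick, not a figure from any spec):

```python
import math

SPEED_OF_LIGHT = 299_792_458.0   # metres per second

def distance_from_phase(phase_shift_rad, mod_freq_hz=20e6):
    """Continuous-wave TOF: distance = c * phase_shift / (4 * pi * f_mod)."""
    return SPEED_OF_LIGHT * phase_shift_rad / (4 * math.pi * mod_freq_hz)

print(distance_from_phase(math.pi / 2))   # ~1.9 m for a 90-degree phase shift
print(SPEED_OF_LIGHT / (2 * 20e6))        # ~7.5 m unambiguous range at 20 MHz
```

The unambiguous range is half the modulation wavelength, which is why these systems pick a modulation frequency to match the room sizes they expect.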
 
Interesting. I can't find the details you mentioned about patterns. All I can find is the PrimeSense interview on DF where they say they bathe the room in IR light, but there aren't any details about how they create the depth map.

From the PrimeSense website:

"PrimeSense technology for acquiring the depth image is based on Light Coding™. Light Coding works by coding the scene volume with near-IR light. The IR Light Coding is invisible to the human eye. The solution then utilizes a standard off-the-shelf CMOS image sensor to read the coded light back from the scene. PrimeSense’s SoC chip is connected to the CMOS image sensor, and executes a sophisticated parallel computational algorithm to decipher the received light coding and produce a depth image of the scene. The solution is immune to ambient light, and works in any indoor environment."
 
Where did this information come from that it wasn't time of flight?
When I spawned this thread, I went to great lengths (okay, I grabbed a few useful posts ;)) to include info about what we already knew, so people joining the conversation could quickly get up to speed without us having to cover old ground every time a new visitor comes along. Hence there are useful links in the first posts which recap what we've already learnt.

Specifically, in post 2, Digital Foundry interviewed PrimeSense and came up with...

The two PrimeSense men are also very keen to point out...all of the video capture and depth perception hardware within Natal comes from them, and only from them.
"PrimeSense isn't just the provider of the 3D technology in Project Natal... it's the sole provider," says Maizels proudly. "Project Natal is much more than a 3D sensing device, but PrimeSense is the only company responsible for the 3D."
...
"PrimeSense is using proprietary technology that we call Light Coding. It's proprietary. No other company in the world uses that," Adi Berenson says proudly.
"Most of our competitors are using a variety of methods that can be aggregated into one technique that's called 'time of flight'...Our methodology is nothing like that. What PrimeSense did is an evolution in terms of 3D sensing. We use standard components and the cost of the overall solution and the performance in terms of robustness, stability and no lag suits consumer devices."
Light Coding on the other hand does what it says on the tin: light very close to infrared on the spectrum bathes the scene. What PrimeSense calls "a sophisticated parallel computational algorithm" deciphers the IR data into a depth image.
 
Where did this information come from that it wasn't time of flight?

From the PrimeSense website:

"The PrimeSensor is based on PrimeSense’s patent pending Light Coding™ technology. PrimeSense is not infringing any depth measurement patents such as patents related to Time-of-Flight or Structured Light – these are different technologies, which have not yet proven viable for mass consumer market."
 
Interesting. I can't find the details you mentioned about patterns. All I can find is the PrimeSense interview on DF where they say they bathe the room in IR light, but there aren't any details about how they create the depth map.
I'm quite sure I read about it somewhere, but even if not, that seems a very likely explanation. They aren't using any timing. Failing that, they could always determine distance from the IR source as proportional to intensity, but then you have to worry about different materials reflecting different amounts of IR back. That's something a projected pattern is much more robust at dealing with.
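To show why raw intensity is such a shaky depth cue, here's a toy model (it assumes an ideal point source, simple inverse-square falloff and made-up reflectance values, nothing more):

```python
def received_intensity(distance_m, reflectance, emitted=1.0):
    """Toy model: returned IR falls off with the square of distance,
    scaled by however much IR the material happens to reflect."""
    return emitted * reflectance / distance_m ** 2

# A dark jumper at 1.5 m returns about the same intensity as a pale wall at 3 m,
# so intensity alone can't tell distance apart from material.
print(received_intensity(1.5, reflectance=0.2))   # ~0.089
print(received_intensity(3.0, reflectance=0.8))   # ~0.089
```

A projected pattern sidesteps that, because you're measuring where the light lands rather than how much of it comes back.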
 
I'm quite sure I read about it somewhere, but even if not, that seems a very likely explanation. They aren't using any timing. Failing that, they could always determine distance from the IR source as proportional to intensity, but then you have to worry about different materials reflecting different amounts of IR back. That's something a projected pattern is much more robust at dealing with.

In my quote above they say they do not infringe on Structured Light (projected patterns) patents. They say their tech is different. I have no idea what this thing is anymore.
 
New Scientist Magazine has an article on it. Unfortunately you have to pay to read it. :( Though I was able to read a comment on the article from the author who had this to say...

Hi Jason,

Actually that's not the case. Kipman says: "Our IR does not pulse and it is not based on a TOF system (which usually pulses). Our light source is constant much like you would expect a projection system to work in a conference room."

Hope that helps,

Colin

That was in response to a reader who said Natal was TOF.

View the comments here.

BTW, here are some other sites that reported on that same article...

Kotaku: Natal Recognizes 31 Body Parts, Uses Tenth Of Xbox 360 "Computing Resources"

TechRadar: Microsoft: Natal consumes 15 per cent of Xbox CPU power

IGN: Everything We Know About Project Natal

Tommy McClain
 
In my quote above they say they do not infringe on Structured Light (projected patterns) patents. They say their tech is different. I have no idea what this thing is anymore.

This has been posted on another forum - and it can be looked up on uspto.gov with TIFF images, if you've got a way of displaying them.

The patent: http://www.faqs.org/patents/app/20100118123

(The referenced patent seems easier to read) http://www.wipo.int/pctdb/en/wo.jsp?wo=2007043036&IA=IL2006000335&DISPLAY=DESC

My amateurish interpretation is "it's like structured light, except they shine a changing pattern of dots instead of a grid".
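If that's right, recovering depth would be a matching exercise: take a small window of dots from the live IR image, find how far it has shifted against a stored reference image of the same pattern captured at a known distance, then triangulate from the shift. A brute-force toy version (window size and search range are arbitrary; the real thing presumably happens inside the PrimeSense chip):

```python
import numpy as np

def window_shift(live, reference, y, x, win=9, max_shift=30):
    """Find the horizontal shift of a dot-pattern window by brute-force matching.

    live, reference: 2D arrays of IR intensities; 'reference' is the pattern
    as seen on a flat surface at a known distance.
    Returns the shift (in pixels) with the lowest sum of absolute differences.
    """
    h = win // 2
    patch = live[y - h:y + h + 1, x - h:x + h + 1].astype(int)
    best_shift, best_cost = 0, float('inf')
    for s in range(max_shift):
        x0 = x - h - s
        if x0 < 0:
            break                                          # ran off the left edge
        cand = reference[y - h:y + h + 1, x0:x0 + win].astype(int)
        cost = np.abs(patch - cand).sum()                  # sum of absolute differences
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift   # feed into a triangulation step to turn the shift into depth
```

That would also explain why a changing pattern of dots works as well as a grid: all you need is enough local texture in the window to get an unambiguous match.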
 
Shrug. c't did a DIY patterned 3D scanner in the '90s... dunno what's patented, but certainly not the basic concept of projecting a raster to get 3D data.
 