Kinect technology thread

But then the technology isn't hamstrung by outdated USB technology, as the article said.
Then it's hamstrung by artificial limits instead.

They said the exact same thing with the Vision-camera. :-/
 
I don't get the USB excuse they used in that article for why the Kinect is so limited.
The PS3 can also have several things hooked up via USB in addition to the camera, which is itself connected via USB, and that camera manages twice the framerate, or twice the resolution at the same framerate. :-/
Isn't it the same USB standard in both machines?

But Kinect also has to send a depth image and sound. According to bkilian in this very thread, they get 5 sound streams from the device and send 6 to it (for the noise cancellation).

Also, there's no confirmation that Kinect on 360 doesn't use all the available bandwidth; it's speculation by DF (speculation based on a lot of measured data, I agree, but still speculation).

Going by the PrimeSense reference design (which Kinect does seem to follow by the book), Kinect should be able to output a 640*480 @ 60 fps depth image. So yes, it is limited, but it seems limited because it already uses all the sustained bandwidth it can get: 30 times per second it has to fetch a 640*480 32-bit image, a 320*240 16-bit image and 5 sound streams (along with whatever extra data Kinect sends, like positional data for the sound streams) from the device, and also send 6 sound streams to it.
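A back-of-the-envelope sketch of those streams, using the sizes quoted above. The audio format is an assumption on my part (16-bit 16 kHz mono per stream, a commonly cited Kinect figure), not something stated in the thread:

```python
# Rough estimate of Kinect's sustained USB traffic from the stream
# sizes mentioned above. Audio format (16-bit 16 kHz mono) is assumed.

def stream_mbps(width, height, bits_per_pixel, fps):
    """Raw bandwidth of an uncompressed video stream in megabits/s."""
    return width * height * bits_per_pixel * fps / 1e6

color = stream_mbps(640, 480, 32, 30)   # 640*480 32-bit color @ 30 fps
depth = stream_mbps(320, 240, 16, 30)   # 320*240 16-bit depth @ 30 fps
audio = 11 * (16_000 * 16 / 1e6)        # 5 up + 6 down mono streams (assumed format)

total = color + depth + audio
print(f"color {color:.0f} Mbps, depth {depth:.0f} Mbps, audio {audio:.1f} Mbps")
print(f"total ~{total:.0f} Mbps vs USB 2.0's 480 Mbps signaling rate")
```

At these figures the raw streams land around 335 Mbps, already in the ballpark of what USB 2.0 sustains in practice, which fits the point that Kinect may simply be using all the sustained bandwidth it can get.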
 
A nice presentation from Gamefest:

http://www.microsoft.com/download/en/details.aspx?id=27977

Shows how MS achieved finger tracking with Kinect (it's not just hand gestures, it's actual finger tracking, using a 320*240 depth image).

They actually can project where your finger is pointing.

From the slides they seem to have ongoing research on this, and using a higher-res image would greatly benefit them, but it already seems to work well enough to be implemented in upcoming games.
 
I don't get the USB excuse they used in that article for why the Kinect is so limited.
The PS3 can also have several things hooked up via USB in addition to the camera, which is itself connected via USB, and that camera manages twice the framerate, or twice the resolution at the same framerate. :-/
Isn't it the same USB standard in both machines?

If I remember well (maybe I'm wrong), it's more a bandwidth limit shared across all the 360's USB ports and HDD than a limit of the USB port Kinect uses. A big part of that USB/HDD bandwidth needs to be reserved for the HDD (for good streaming from installed games) and USB WiFi, and the DVD drive is probably on the same controller too. So Kinect only gets a constant, capped bandwidth so it doesn't interfere with the others.
 
From what I remember, it's a limitation of the particular implementation of USB in the Xbox 360, not a limitation of the USB specification.
 
Kinect 2 so accurate it can lip read
Am I the only one who thinks that lip-reading will be done purely from the good old 2D video stream and not from the 3D depth map? To get lip reading to work you don't really need anything much better than the cameras in phones.
 
I dunno about that, I'd be very impressed if they get kinect reading lips at a range of 10' with a 640x480 camera.
 
Who said they should stay with that lousy 0.3MPix camera? Even my laptop's built-in thingy has 4x more pixels than that.

To get a depth map good enough for lip-reading they'd need several orders of magnitude more samples from the IR camera they currently use. With a plain old RGB camera it's trivial; not so with the method they use to get the depth samples, and to actually read those samples they'd still need a very high-resolution camera capable of resolving the IR dots they project.

Basically, what I'm saying is there's no real point in building that detailed a depth map for lip-reading when you can do it just as easily (or probably even more easily) with image-based lip reading. Especially because the depth difference between the lips, mouth and the rest of the face is so tiny that it would be a nightmare to figure out lip movement from the depth map alone.
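To put numbers on the resolution worry raised earlier in the thread, here's a rough sketch. The ~57° horizontal field of view is Kinect's published spec; the 10-foot (~3 m) distance comes from the post above, and the ~50 mm mouth width is my own assumption:

```python
# How much detail does a VGA camera see at living-room range?
# Assumes Kinect's ~57 degree horizontal FOV and ~3 m (10 ft) distance.

import math

fov_deg = 57.0        # horizontal field of view (published Kinect spec)
distance_m = 3.0      # ~10 feet, as in the post above
h_pixels = 640        # VGA width

scene_width = 2 * distance_m * math.tan(math.radians(fov_deg / 2))
mm_per_pixel = scene_width / h_pixels * 1000
mouth_width_mm = 50   # assumed typical mouth width
mouth_pixels = mouth_width_mm / mm_per_pixel

print(f"~{mm_per_pixel:.1f} mm per pixel; a ~{mouth_width_mm} mm mouth spans ~{mouth_pixels:.0f} pixels")
```

A mouth spanning only about ten pixels across is why lip-reading at that range with a VGA camera sounds so ambitious.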
 
You mentioned phones; the front camera on the iPhone 4, for instance, is VGA.

I'm not sure how useful depth would be, although I suspect it might help when the target isn't directly facing the camera. I don't expect they'd use it alone; it would be there to assist the 2D camera.
 
Well, the difference is that you don't "FaceTime" with your phone across the room. You hold it next to your face, at a distance of maybe 50 cm.

But... resolution isn't everything. A high framerate will also help and is probably very much needed to "read lips".
 
I dunno about that, I'd be very impressed if they get kinect reading lips at a range of 10' with a 640x480 camera.

Zoom Lens

In line with this thought, I was thinking it would be more useful to have a zoom lens rather than uber resolution.

Higher res takes more bandwidth and more processing power which leads to higher costs and lag.

If the camera had a zoom, it could zoom in on the hands, a face, a foot, whatever the case may be. It could also zoom out to allow a wider viewing angle.

Granted, the cheapest zoom lens camera I found was $60... and this is without the motors and logic necessary to automatically zoom and track a face/hand/foot/person/group ... so maybe not such a good idea. :???:
 
Yeah, it'd be way cheaper to provide a much higher-resolution camera than a proper optical zoom. Not to mention the issues of trying to have Kinect focus on a moving person from a distance while zoomed in on their face. It's just not going to happen.
 
Zoom Lens
Good in theory, but in practice it'll be easier to just use a high-resolution camera (if those 8-12 MP things can be called that) that is static and doesn't have extra optics or, even worse, move itself around to find its target. Also, zoom would mean only a single person could be tracked at a time, and it would probably require a second camera for depth-buffer generation. Way too complex.
 
Good in theory, but in practice it'll be easier to just use a high-resolution camera (if those 8-12 MP things can be called that) that is static and doesn't have extra optics or, even worse, move itself around to find its target. Also, zoom would mean only a single person could be tracked at a time, and it would probably require a second camera for depth-buffer generation. Way too complex.

Agreed, too complex/costly/limited.

But the second camera part is already there.


For HD camera:
Microsoft LifeCam Studio 1080p HD Webcam roughly $60 at retail (2MP)

The iPhone 4S has an 8 MP camera on the back, but I have no idea what its video capabilities are; I'd assume 1080p at 30 fps. I doubt the camera in that phone costs more than the webcam above, but I could be wrong, as it's in a $600 device.

Either way, we are talking about roughly 7 times more bandwidth, data, and detail in going with a 1080p video feed.
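A quick sanity check of the "roughly 7 times" figure, as pure pixel arithmetic (same frame rate and bit depth assumed):

```python
# Ratio of pixel counts: 1080p vs the current VGA feed.
ratio = (1920 * 1080) / (640 * 480)
print(ratio)  # 6.75
```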

Bandwidth savings could be had by smartly culling irrelevant data from the video stream. Doing so would enable Kinect 2 (or Kinect HD) to be compatible with existing XB360 hardware.

In fact, perhaps with the success of Kinect, MS will recognize the device is worthy of real investment and put guts on board which can generate accurate 3d skeletal information and just send that info down the usb.

The room itself is static, and if the camera is high res, it could just get texture and model info of the player by doing a full Hi-Res scan and mapping the info to a full 3d model in the 360/720.

Then all they need to do is map the skeletal info to the 3d model of the person playing.

For uses other than full body motion gaming, they will be sampling a much smaller area and so they can cull useless data outside of that which is being tracked (hand/face etc).
 
Either way, we are talking about roughly 7 times more bandwidth, data, and detail in going with a 1080p video feed.
USB 3 has 5 Gbps BW. Uncompressed 1080p30 video requires < 1.5 Gbps. High quality h.264 encoded streams can be 40 Mbps or less (maximum bitrate of BRD).
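Those figures check out arithmetically (assuming 24 bits per pixel for the uncompressed stream):

```python
# Uncompressed 1080p30 at 24 bits/pixel vs USB 3.0's 5 Gbps signaling rate.
bits_per_frame = 1920 * 1080 * 24
gbps_1080p30 = bits_per_frame * 30 / 1e9
print(f"{gbps_1080p30:.2f} Gbps")  # ~1.49 Gbps, comfortably under 5 Gbps
```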

The room itself is static, and if the camera is high res, it could just get texture and model info of the player by doing a full Hi-Res scan and mapping the info to a full 3d model in the 360/720.
Well it's not quite as easy as that, as something like Kung Fu Live demonstrates. But MS's 3D scanning tech is pretty awesome, and they could definitely have a player turn around and scan them in to build up a model. That's very exciting.

For things like lip-reading, facial recognition of a standard video stream will be fine, although as others say, higher framerate would probably be more beneficial. I also don't see the point in lip-reading outside of better voice recognition, so there'll be sound to map to, meaning only the basic mouth shapes should be needed to match with the sounds. And of course smile/frown detection, which wants full facial tracking. This is something Sony could have demo'd on PSEye given their past research, but they aren't doing anything of the sort. High res 3D wouldn't need more than 720p60 I reckon, but they'd want better depth resolution.
 
I don't know that lip reading is necessarily the goal, I think that comment was more to highlight what might be available with the advanced product (although it might mean not having to shout at your xbox if you have less than quiet conditions). Reading facial expressions or finger tracking would certainly open up a variety of control options and interactivity.
 
USB 3 has 5 Gbps BW. Uncompressed 1080p30 video requires < 1.5 Gbps. High quality h.264 encoded streams can be 40 Mbps or less (maximum bitrate of BRD).

Sure, USB 3 eliminates any concern about bandwidth savings. But I was trying to come up with ways they could keep the new Kinect 2 working with the old USB 2 XB360.

Well it's not quite as easy as that, as something like Kung Fu Live demonstrates. But MS's 3D scanning tech is pretty awesome, and they could definitely have a player turn around and scan them in to build up a model. That's very exciting.

The 3D model of the person wouldn't be necessary for all games/interactions, but giving Kinect 2 enough processing power to decipher the images itself and send just the skeletal information to the console would make USB 2 plenty of bandwidth, and thus keep it compatible with both the XB720 and the XB360.
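The skeletal-data suggestion is easy to size up. A sketch, assuming the 20-joint skeleton the Kinect SDK exposes, three 32-bit floats per joint, two tracked players and 30 Hz (the player count and float encoding are my assumptions, not from the thread):

```python
# Payload estimate for sending only skeletal tracking data over USB.
joints = 20                  # joints in the Kinect SDK skeleton
bytes_per_joint = 3 * 4      # x, y, z as 32-bit floats (assumed encoding)
players = 2                  # assumed number of tracked players
fps = 30

kbps = joints * bytes_per_joint * players * fps * 8 / 1000
print(f"~{kbps:.0f} kbit/s")  # a tiny fraction of USB 2.0's 480 Mbit/s
```

Roughly 115 kbit/s, which is why on-device tracking would trivially fit the old bus.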

For things like lip-reading, facial recognition of a standard video stream will be fine, although as others say, higher framerate would probably be more beneficial. I also don't see the point in lip-reading outside of better voice recognition, so there'll be sound to map to, meaning only the basic mouth shapes should be needed to match with the sounds. And of course smile/frown detection, which wants full facial tracking. This is something Sony could have demo'd on PSEye given their past research, but they aren't doing anything of the sort. High res 3D wouldn't need more than 720p60 I reckon, but they'd want better depth resolution.

Better depth resolution would indeed be nice, but it would be even more expensive: the projector array used for tracking would have to be redesigned, yet another high-resolution camera installed, and even more processing power dedicated to tracking/decoding the depth camera on top of the higher-res RGB camera.

I'm not expecting it.
 