Kinect technology thread

I still shake my head at the tilt motor. That thing will never scale down in cost. Stepper motors are done. No process shrinks or other semi-reliable advances will make them cheaper.

Going with the initially pitched camera resolution OTOH would have solved the same problem, while avoiding all that weight and power. All the while processing performance per Watt/per mm² still advances at a steady pace. And you wouldn't even need to process the full camera frame. Just pick the hot rectangle where all the movement happens, but don't do it mechanically. Pile on the functional benefits of higher local resolution (fingers! etc), and you really have to wonder what happened there.
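The "hot rectangle" idea is just software region-of-interest selection: find where the motion is, then process only that crop. A toy sketch (frame sizes, threshold, and padding are made-up illustrative values, nothing Kinect-specific):

```python
import numpy as np

def motion_roi(prev, curr, thresh=25, pad=16):
    """Bounding box of inter-frame motion (hypothetical sketch, not Kinect code).

    prev, curr: greyscale frames as 2-D uint8 arrays.
    Returns (top, bottom, left, right) slice bounds, or None if nothing moved.
    """
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16)) > thresh
    ys, xs = np.nonzero(diff)
    if ys.size == 0:
        return None
    h, w = curr.shape
    return (max(int(ys.min()) - pad, 0), min(int(ys.max()) + 1 + pad, h),
            max(int(xs.min()) - pad, 0), min(int(xs.max()) + 1 + pad, w))

def crop_roi(frame, roi):
    """Process only the hot rectangle instead of the full frame."""
    t, b, l, r = roi
    return frame[t:b, l:r]
```

The point being: the expensive per-pixel work only runs on the crop, so a higher-res sensor doesn't mean proportionally more processing.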

It's such a weird decision. I don't even believe this is a net cost reduction right now, let alone with a bit of chip integration further down the line. Departments not talking to each other? Engineers asleep under the desk?
 
It's such a weird decision. I don't even believe this is a net cost reduction right now, let alone with a bit of chip integration further down the line. Departments not talking to each other? Engineers asleep under the desk?
I don't want to take this too far OT, but going back a step regarding my comment, and MS's engineering in general: are there examples of MS engineering that's high quality, rather than the seemingly bolted-together bits of peripherals that I associate with them? How much of Kinect's hardware design is by necessity, and how much is bad design?

More on topic, does anyone know if the IR emissions have the same angle of projection as the cameras have FOV, such that the camera would see effectively a 2D pattern overlaid, or are the IR emissions at a different angle so that the pattern changes with distance? If the latter, anyone know what the spread is?
 
I don't want to take this too far OT, but going back a step regarding my comment, and MS's engineering in general: are there examples of MS engineering that's high quality, rather than the seemingly bolted-together bits of peripherals that I associate with them? How much of Kinect's hardware design is by necessity, and how much is bad design?
I was dancing around this myself ..
The last piece of Microsoft hardware I purchased was the HD-DVD addon drive, and that was offensively shoddy. That might have been excessive penny-pinching all the same, but it raises the question of how much hardware development capacity they actually have in house. Maybe they can't do better, period, or maybe they don't want to spend the cash.

edit:
More on topic, does anyone know if the IR emissions have the same angle of projection as the cameras have FOV, such that the camera would see effectively a 2D pattern overlaid, or are the IR emissions at a different angle so that the pattern changes with distance? If the latter, anyone know what the spread is?
Can't say, but here's a visualization of Kinect's IR pattern done with some game's pack-in goggles :D
 
Can't say, but here's a visualization of Kinect's IR pattern done with some game's pack-in goggles :D
Yeah, I saw that and it explains a lot. If we knew how the pattern is projected, we could tell whether the 3D aspect comes from displacement (highly likely) or some voodoo maths. And if the projected pattern is at a different angular projection to the cameras' FOV, is it more or less? As someone suggested elsewhere, could the emitter be a limiting factor on the play area?
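For what it's worth, the usual structured-light model is plain displacement: projector and IR camera sit a known baseline apart, and a dot's sideways shift relative to a stored reference pattern gives depth by triangulation. A sketch with guessed constants (the real baseline and focal length aren't public as far as I know):

```python
def depth_from_shift(pixel_shift, baseline_m=0.075, focal_px=580.0, ref_depth_m=2.0):
    """Structured-light triangulation sketch; all constants are illustrative guesses.

    With the projector and IR camera a baseline apart, a dot calibrated at its
    reference position for a surface at ref_depth_m shifts sideways by
    `pixel_shift` pixels when the surface moves. Similar triangles give
        depth = b*f / (b*f / ref_depth + shift)
    so a shift of 0 returns the reference depth, positive shift means closer.
    """
    bf = baseline_m * focal_px
    return bf / (bf / ref_depth_m + pixel_shift)
```

With these numbers, a shift of about 22 pixels corresponds to moving from 2m to 1m, which shows why the dots only need to be resolved to a few pixels of accuracy.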
 
So you guys feel that they could have done better with the design, or Shifty are you saying that Sony would have done a better job of designing the hardware? In what way exactly? are you talking about size or what? I want to understand what you are trying to say before I respond.
 
I wouldn't disparage someone else's engineering before you understand how the product actually works. To be honest, that looks like a fairly compact PCB design, and it doesn't look like there's a lot of fat to be trimmed. As for the motor, maybe they didn't want to go with a wide-angle lens because of image distortion. The fish-eye effect would probably cause some issues with interpreting that projected pattern, so in terms of processing it might be easier to work with a flat image and reposition the camera.

Back when the price came out, people were estimating outrageously cheap BOMs, saying it was overpriced. Now they're looking at the tear downs, seeing how complex it really is, and questioning whether it's over-designed. All the while, none of us really understands how it works, so making judgement calls on the component level seems a little ridiculous.
 
I wouldn't disparage someone else's engineering before you understand how the product actually works. To be honest, that looks like a fairly compact PCB design, and it doesn't look like there's a lot of fat to be trimmed. As for the motor, maybe they didn't want to go with a wide-angle lens because of image distortion. The fish-eye effect would probably cause some issues with interpreting that projected pattern, so in terms of processing it might be easier to work with a flat image and reposition the camera.

Back when the price came out, people were estimating outrageously cheap BOMs, saying it was overpriced. Now they're looking at the tear downs, seeing how complex it really is, and questioning whether it's over-designed. All the while, none of us really understands how it works, so making judgement calls on the component level seems a little ridiculous.

My thoughts exactly.
 
Does Kinect use the RGB camera for any motion tracking, or is it only used for identification, sign on, and augmented reality types of games, and only the IR is used for skeletal tracking, etc? One way to test this would be to see if there's any performance degradation/extra lag between playing in a well lit room vs pitch black. Can anyone cover the RGB camera lens with something and test if there's any difference?
 
It's such a weird decision. I don't even believe this is a net cost reduction right now, let alone with a bit of chip integration further down the line. Departments not talking to each other? Engineers asleep under the desk?

I think at this point given how new it is, they decided to play it safe by adding tilt just in case to allow for unforeseen room situations, and unforeseen product uses. If after two years of Kinect product they determine that tilt isn't needed, or that it was overkill for typical rooms, then they can drop it in Kinect 2 that will ship with the next gen console. Or maybe just keep manual tilt if it's determined that it is only used once during initial configuration. At this point there are too many unknown variables to know if it's needed for sure, and the software is too immature to say if software can replace mechanical tilt. I wouldn't be surprised if they are sending back all kinds of info on how tilt is being used, to help make that determination in a few years.
 
I think at this point given how new it is, they decided to play it safe by adding tilt just in case to allow for unforeseen room situations, and unforeseen product uses. If after two years of Kinect product they determine that tilt isn't needed, or that it was overkill for typical rooms, then they can drop it in Kinect 2 that will ship with the next gen console. Or maybe just keep manual tilt if it's determined that it is only used once during initial configuration. At this point there are too many unknown variables to know if it's needed for sure, and the software is too immature to say if software can replace mechanical tilt. I wouldn't be surprised if they are sending back all kinds of info on how tilt is being used, to help make that determination in a few years.

Actually the motorised tilt has a practical application in Video Kinect. If you stand while chatting, it will tilt up, and if you crouch or lie down while chatting (for whatever reason) it will tilt to accommodate you. This tilt mechanic could also be used in games.
 
iFixit's teardown of the Kinect hardware reveals some interesting tidbits about this thing.

1: it draws 12 fucking watts of power. Holy smokes! It has to be actively fan-cooled or else it dies.

2: it has 2 internal heatsinks/heatspreaders: a smaller one for the SoC controller, and a larger one acting as a mounting frame for the cameras and IR projector, along with a tiny Peltier device (!) strapped on to cool said projector!

3: The onboard microcontroller features a separate 512Mbit (64MB) DDR2 DRAM IC. Holy smokes again, that's twice the RAM my first PC had back in 1997... Higher bandwidth too! This thing could run the original Unreal no problem! :LOL:

4: 3 circuit boards: the main board, plus a small one and an even smaller one stacked underneath it. The motor and associated gears look damn flimsy, and are located in the base, not in the body like I wrongly assumed.

5: 4 microphones, but you folks probably knew that already. Facing downwards for some reason, but OK...

6: There's a MEMS device included as well (damn, this thing has everything but the proverbial kitchen sink! :LOL:), iFixit speculates for tilt sensing purposes. Seems logical, methinks.

I wonder how this thing will hold up, seeing its power draw and need for active cooling. My concerns are internal dust buildup, and the longevity of the mechanical components. Tiny fans like the one in the Kinect aren't well-known for their awesome durability - quite the opposite in fact. If it fails, the Kinect is likely to die a fiery death, unless there's a thermocouple hidden away in there as well which shuts the thing off if temps get too high. There's no RPM sensor wire for the fan, so the hardware can't detect failure that way.

I'm also wondering how well the dust seals for the cameras will hold up in a forced ventilation environment. Even a small amount of leakage will cause dust buildup on the optics and sensor IC. Perhaps MS counts on planned obsolescence to render any such concerns moot... or else the Kinect just isn't designed to live long enough for this to be a problem. The solid-state IR projector is one component that doesn't strike me as having a terribly long lifespan. The YouTube video showing the IR lamp in action makes me think it's quite bright (and the cooling requirements imply as much), and thus likely to wear out as well...

Hm, perhaps I'm over-analyzing things. In any case, it's a quite impressive piece of kit from a purely hardware geek standpoint. Fun trivia fact: googling the SoC's stated model code to try and find a datasheet shows only hits based on the iFixit teardown. :LOL: It's probably some kind of ARM-based device (and if not, it's almost certainly a MIPS core), but clock speed and any other details are sketchy at best.
 
No, I don't think so. That processor will be responsible for turning the array of IR dots into a depth map. The component that was going to do the skeleton tracking is not present, leaving that to XB360.

So many chips! One can't help but feel if this were Sony's baby, they'd all be integrated onto a single die IC. :mrgreen:
Sony engineers do amazing board design, but in this case, I doubt they could have reduced the chip count by much. Also, the chip responsible for the 3d point cloud is the PrimeSense PS1080-A2, on the smaller of the two mainboards. There's another entire board with a Marvell AP102. I have pointed out before that people had made assumptions when we said we'd moved skeleton processing to the console. But skeleton processing isn't all the system is doing.
I was dancing around this myself ..
The last piece of Microsoft hardware I purchased was the HD-DVD addon drive, and that was offensively shoddy. That might have been excessive penny-pinching all the same, but it raises the question of how much hardware development capacity they actually have in house. Maybe they can't do better, period, or maybe they don't want to spend the cash.
I'm sad that you thought the HD DVD addon was shoddy. In what way? Also, note that it wasn't built by us, but Toshiba.
I think at this point given how new it is, they decided to play it safe by adding tilt just in case to allow for unforeseen room situations, and unforeseen product uses. If after two years of Kinect product they determine that tilt isn't needed, or that it was overkill for typical rooms, then they can drop it in Kinect 2 that will ship with the next gen console. Or maybe just keep manual tilt if it's determined that it is only used once during initial configuration. At this point there are too many unknown variables to know if it's needed for sure, and the software is too immature to say if software can replace mechanical tilt. I wouldn't be surprised if they are sending back all kinds of info on how tilt is being used, to help make that determination in a few years.
Trust me, if there was any good way for us to have been able to avoid putting the motor in, we would have done it, but we wanted to support multiple mounting heights and people from 4' to over 6', and it couldn't be done with current tech without a tilt motor.

I wouldn't be too sure that we'd update the hardware with more capabilities over time, for the same reason we haven't made the XBox faster or put more memory or more EDRAM into it. The only way we could do it is if it had a fallback behaviour so current games would still work, which adds to the cost. And new games would have to have a fallback so they worked with older sensors. It becomes untenable very quickly.
 
I still shake my head at the tilt motor. That thing will never scale down in cost. Stepper motors are done. No process shrinks or other semi-reliable advances will make them cheaper.

Going with the initially pitched camera resolution OTOH would have solved the same problem, while avoiding all that weight and power. All the while processing performance per Watt/per mm² still advances at a steady pace. And you wouldn't even need to process the full camera frame. Just pick the hot rectangle where all the movement happens, but don't do it mechanically. Pile on the functional benefits of higher local resolution (fingers! etc), and you really have to wonder what happened there.

It's such a weird decision. I don't even believe this is a net cost reduction right now, let alone with a bit of chip integration further down the line. Departments not talking to each other? Engineers asleep under the desk?

That's a good idea, but wouldn't they lose the angles, like how it points down for soccer games?
 
Depending on the FOV of the camera (and whether that can change/zoom), using a 4X higher resolution sensor may not be a suitable alternative to being able to tilt/rotate the camera.
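Some rough numbers on that trade-off, using the published Kinect figures (~43° vertical FOV, ±27° tilt) and assuming constant pixels-per-degree, which a real wide-angle lens wouldn't quite give you:

```python
import math

# Published Kinect figures (~43 deg vertical FOV, +/-27 deg tilt each way).
# The pixel numbers are illustrative, and the linear pixels-per-degree
# scaling is an approximation that ignores lens projection effects.
v_fov = 43.0        # vertical field of view, degrees
tilt = 27.0         # mechanical tilt range each way, degrees
rows = 480          # depth image height, pixels

# A fixed camera must cover the whole mechanically swept range in one frame.
fixed_fov = v_fov + 2 * tilt                        # 97 degrees
rows_needed = math.ceil(rows * fixed_fov / v_fov)   # same pixels/degree
print(fixed_fov, rows_needed)  # 97.0 1083
```

So a fixed sensor would need roughly 2.3x the rows (and a matching wide lens) to keep the same angular resolution over the full tilt range, before even considering distortion.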
 
So you guys feel that they could have done better with the design, or Shifty are you saying that Sony would have done a better job of designing the hardware? In what way exactly? are you talking about size or what? I want to understand what you are trying to say before I respond.
At three PCBs and some 12-ish ASICs, the whole design doesn't appear (at a cursory glance) to fit with engineering K.I.S.S.

I wouldn't disparage someone else's engineering before you understand how the product actually works.
Yes, I suppose it's easy to be an arm-chair engineer, and one without qualifications to boot! :mrgreen: But from comparisons with other devices, and a little know-how, I'm not seeing why this thing needs so much stuff relative to other devices. It's not about trimming fat, but consolidating. Again though, I wasn't presenting a carefully considered viewpoint, just reacting that it's a world apart from the elegance of Sony PCBs.

Back when the price came out, people were estimating outrageously cheap BOMs, saying it was overpriced. Now they're looking at the tear downs, seeing how complex it really is, and questioning whether it's over-designed. All the while, none of us really understands how it works, so making judgement calls on the component level seems a little ridiculous.
But guessing and discussing is part of the fun of tech. Otherwise why the hell are any of us on this board?! Why do people get so upset when others wonder about things?

Sony engineers do amazing board design, but in this case, I doubt they could have reduced the chip count by much.
I understand different chips from different manufacturers, but I'm sure a custom ASIC could roll most of those components into a couple of chips. Although TBH I don't know if Sony still have the means to produce their own custom components. But given the expectation to sell millions of these, is it really more economical to source 12 different components and fit them into a relatively large, hot device, than to design something for the job that would achieve it in a simpler package? Was that option explored and the economics just didn't justify it?

Also, the chip responsible for the 3d point cloud is the PrimeSense PS1080-A2, on the smaller of the two mainboards. There's another entire board with a Marvell AP102. I have pointed out before that people had made assumptions when we said we'd moved skeleton processing to the console. But skeleton processing isn't all the system is doing.
I don't suppose anyone will ever tell us the details of this. :( We know there's the 2D image, and clearly Kinect has to be doing on-board processing to control the motors. Oh, I suppose it could be getting instruction from the 360. All the info we have to go on is that the 360 receives the image data, audio stream and point-cloud. I'll take your word for it that there's more info than those! I suppose there's room data, with the camera informing the 360 whereabouts it's looking. You can't really complain though when people try to join the dots and get it all wrong when they aren't aware they're missing some!

Trust me, if there was any good way for us to have been able to avoid putting the motor in, we would have done it, but we wanted to support multiple mounting heights and people from 4' to over 6', and it couldn't be done with current tech without a tilt motor.
You won't be able to answer this, but what is wrong with a higher-res camera and wider lens FOV? Is it harder to accommodate the 3D spatial interpretation with wide-angle distortion? Or the cost of the cameras? Or the speed of the depth processing?
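To flesh out what I mean by the distortion question: a wide lens presumably forces a per-pixel undistortion step into the depth pipeline, something like inverting a simple Brown-Conrady radial model (the model choice and coefficients here are my assumptions, not anything from the actual hardware):

```python
def undistort_point(xd, yd, k1, k2):
    """Invert a simple Brown-Conrady radial distortion model (sketch only;
    the coefficients and the model choice are assumptions, not Kinect specs).

    (xd, yd) are normalised distorted coordinates; k1, k2 are radial
    coefficients. A few fixed-point iterations recover the undistorted point,
    since distorted = undistorted * (1 + k1*r^2 + k2*r^4) has no closed-form
    inverse. This is the sort of extra per-pixel work a wide-angle lens
    would force on the depth pipeline.
    """
    x, y = xd, yd
    for _ in range(5):
        r2 = x * x + y * y
        scale = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / scale, yd / scale
    return x, y
```

Doing that (or a lookup-table equivalent) for every pixel of every frame isn't free, which might be part of why a narrow lens plus a motor won.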

Also, let's be clear about this most of all: I certainly wasn't knocking the engineering, not on any serious level. It was, as I said, a tongue-in-cheek comment. If all that stuff is necessary, there's no possible alternative, and my ideas of something more like a minimalist work-of-art with a lone black IC dead centre of a PCB with Art Nouveau tracks swirling elegantly around are completely out of line with what is possible, then I happily accept that. Out of curiosity though, I am interested in the design decisions and thought processes, and in understanding the differences between Kinect and similar devices. Are there no cost considerations at all and this is the best anyone could do? Or maybe we are seeing engineering sadly held back as a rough compromise thanks to the prosaic limits of real-world economics?

Edit: Flicking through the teardown again, I've just noticed the two camera CCDs are actually different parts. This blows my theory of using the same component for both tasks to save costs out of the water, and again leads to head scratching! If PrimeSense are to be believed, their system works with off-the-shelf components, making for very cheap systems - one of their major selling points - so what's specific about the Microsoft part that another CCD couldn't do the job? I notice the centre one (optical, I believe) has a much larger aperture, as alluded to by the cone between the camera and case.
 
Trust me, if there was any good way for us to have been able to avoid putting the motor in, we would have done it, but we wanted to support multiple mounting heights and people from 4' to over 6', and it couldn't be done with current tech without a tilt motor.

I was just about to say, they can't have thrown this in lightly! It must have been actually necessary - I'm not going to speculate on how much it adds in terms of cost, but I'm sure it's substantial enough to care. And to be honest, I can totally see that it is, adding a lot of user friendliness as well as extending its range, which is important. I've just been playing The Fight, and I'm playing with my legs pressed into the couch because I just don't have enough room otherwise (it's 1.6m between the TV and the couch and I'm 1.92m).

I wouldn't be too sure that we'd update the hardware with more capabilities over time, for the same reason we haven't made the XBox faster or put more memory or more EDRAM into it. The only way we could do it is if it had a fallback behaviour so current games would still work, which adds to the cost. And new games would have to have a fallback so they worked with older sensors. It becomes untenable very quickly.

I agree that's totally unlikely. But I do presume that there were some time constraints that prevented you from going too far in optimising the chipsets, and much like the current Slim is much more integrated than the launch 360 I'm willing to bet we'll see some changes over time. Does the current unit still have active cooling? I think it may be possible to reach at least a point where that can be done passively? I don't expect much more than that though.
 
I agree that's totally unlikely. But I do presume that there were some time constraints that prevented you from going too far in optimising the chipsets, and much like the current Slim is much more integrated than the launch 360 I'm willing to bet we'll see some changes over time. Does the current unit still have active cooling? I think it may be possible to reach at least a point where that can be done passively? I don't expect much more than that though.

He's talking about improving performance. Of course they will do things to lower the cost or even aesthetic changes, but once you release an actual add-on you're basically locked in on performance. You still have to make all the games with the lowest common denominator hardware in mind, so performance improvements are problematic, unless you can do them in software.
 
At three PCBs and some 12-ish ASICs, the whole design doesn't appear (at a cursory glance) to fit with engineering K.I.S.S.

Yes, I suppose it's easy to be an arm-chair engineer, and one without qualifications to boot! :mrgreen: But from comparisons with other devices, and a little know-how, I'm not seeing why this thing needs so much stuff relative to other devices. It's not about trimming fat, but consolidating. Again though, I wasn't presenting a carefully considered viewpoint, just reacting that it's a world apart from the elegance of Sony PCBs.
And if you compare the launch PS3 to the current one, or the launch 360 to the current one, you will also see a huge decrease in complexity and cost. It's all about timing. Sure, you could design an ASIC to do all the work, but then halfway through the development cycle you decide to do something different, and now you have to redesign the ASIC. First versions of a lot of products tend to be less efficient in design than followup versions, to give the greatest development flexibility.
I understand different chips from different manufacturers, but I'm sure a custom ASIC could roll most of those components into a couple of chips. Although TBH I don't know if Sony still have the means to produce their own custom components. But given the expectation to sell millions of these, is it really more economical to source 12 different components and fit them into a relatively large, hot device, than to design something for the job that would achieve it in a simpler package? Was that option explored and the economics just didn't justify it?
I'm guessing the timing didn't justify it. I'm sure there will be revisions of the hardware, just like there have been revisions of the 360 hardware.
You won't be able to answer this, but what is wrong with a higher-res camera and wider lens FOV? Is it harder to accommodate the 3D spatial interpretation with wide-angle distortion? Or the cost of the cameras? Or the speed of the depth processing?
I don't know the specifics, but I would guess it's a limitation introduced by the PrimeSense tech. I also don't know why the two different CCD parts; it could have been due to IR performance (the RGB one doesn't have to be as sensitive), or due to wanting a wider FOV for the RGB camera for Video Kinect (don't know if that's true or not either).

Put it this way, the XBox org is _very_ careful about controlling costs. We don't have the advantages of ridiculous markup that the Office and Windows orgs have. But we have to balance BOM with engineering development costs too. During development and in the early life of a product, the BOM is a small percentage of the total costs associated with the device. You'll want to cost reduce it as soon as you can, of course, but the advantages of having an "off the shelf" solution for development can often outweigh the BOM tax associated with it.

Then again, I don't think anyone expected us to be able to sell 5 million of the things in 2 months, so I certainly hope we're at least covering BOM costs :)
 
I don't suppose anyone will ever tell us the details of this. :( We know there's the 2D image, and clearly Kinect has to be doing on-board processing to control the motors. Oh, I suppose it could be getting instruction from the 360. All the info we have to go on is that the 360 receives the image data, audio stream and point-cloud. I'll take your word for it that there's more info than those! I suppose there's room data, with the camera informing the 360 whereabouts it's looking. You can't really complain though when people try to join the dots and get it all wrong when they aren't aware they're missing some!

I wouldn't necessarily sell the audio processing short on this. It not only has to correlate the source of sound from 4 microphones with discrete people as detected and processed by the camera, but also, combined with some sort of image recognition, follow the speaker. I would assume that this would all be done on the camera, as the system also has to control who the camera is looking at depending on who is talking. Again, depending on the program.
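The mic-array side of that is classic time-difference-of-arrival work: cross-correlate pairs of microphones, and the lag of the correlation peak gives the direction of the speaker. A two-mic toy version (the spacing and sample rate are made-up values, not Kinect's):

```python
import numpy as np

def tdoa_angle(sig_a, sig_b, fs=16000, mic_spacing=0.05, c=343.0):
    """Direction of arrival from two mics via time difference (toy sketch;
    spacing and sample rate are illustrative values, not Kinect's).

    Cross-correlate the channels to find the lag (in samples) of mic B
    relative to mic A, convert to seconds, then to an angle via
    sin(theta) = c * dt / spacing, with c the speed of sound in m/s.
    """
    corr = np.correlate(sig_b, sig_a, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_a) - 1)  # samples B lags behind A
    dt = lag / fs
    s = float(np.clip(c * dt / mic_spacing, -1.0, 1.0))
    return float(np.degrees(np.arcsin(s)))
```

With four mics you get multiple pairwise delays, so the array can localise in a plane rather than just left/right, which fits the speaker-following behaviour described above.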

Since the camera can also be controlled to some extent by developers - camera aiming for feet in the soccer mini-game, for example - it would seem reasonable that some image processing is done on the device itself: isolating not only different people, but also specific body parts. I imagine this could be used to simplify what's being done on the console itself.

In other words, person specific data is sent to the console for the skeleton to be built rather than a mass of image data and the console has to then not only sort what is a person but also build the skeleton. Another possibility is that the basic skeleton is built on the camera itself, but all the probability calculations for potential locations (fine tuning) of the skeletal points are done on the console where the GPU would be more efficient at crunching through massively parallel sets of data.

I have a feeling many of us sold the processing done by Kinect short when word came out that certain features were going to be processed on the console. It wouldn't surprise me at all if we were to learn that all the console does is crunch numbers for probability calculations for skeletal points.

For the people that have used the video chat with Kinect. Does the system actually attempt to keep the speaker centered as much as possible?

Regards,
SB
 
Put it this way, the XBox org is _very_ careful about controlling costs. We don't have the advantages of ridiculous markup that the Office and Windows orgs have. But we have to balance BOM with engineering development costs too. During development and in the early life of a product, the BOM is a small percentage of the total costs associated with the device. You'll want to cost reduce it as soon as you can, of course, but the advantages of having an "off the shelf" solution for development can often outweigh the BOM tax associated with it.

That's an interesting point that often gets lost when people start talking about the cost of materials for a product. And a great explanation of why first generation hardware is generally a bit of a mess when compared with later revisions (assuming a long enough product life cycle).

Regards,
SB
 