Old Discussion Thread for all 3 motion controllers

The Sony system suggests more than a purely visual technique to me, simply from looking at the resolution of the PS Eye, the distance from the camera to the ball, the size of the ball, and the depth accuracy they were getting.
 
If the ball on top of the controller is emitting its own light then it would be extremely easy and accurate to track using an IR-sensitive camera (which I believe the PS Eye is). They should be able to track the controller on the x/y axes accurately using just this. Tracking of depth on the z axis would be done using the ultrasound component alone. The orientation of the controller would be tracked in the same way as the Wii's MotionPlus, using gyroscopes. Using these three components together gives you all you need for 1:1 tracking and positioning in 3D space.

This is my view on exactly how it works, anyone think otherwise?

EDIT:
The first result on Google for whether the PS Eye is IR sensitive was a post by Shifty on this very forum: http://forum.beyond3d.com/showpost.php?p=1124532&postcount=26
Can I ask you to confirm this, Shifty? I'm assuming that when you talk about IR filters you mean on the software side and not some sort of colour-gel-type filter needing to be placed physically in front of the camera?
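
For illustration, here's a minimal Python sketch of how those three readings might be combined, assuming a simple pinhole camera model. The FOV and resolution figures are the published PS Eye specs; everything else is made up for the sketch:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at room temperature

def depth_from_ultrasound(round_trip_s):
    # One-way distance (z) from an ultrasonic ping's round-trip time.
    return round_trip_s * SPEED_OF_SOUND / 2.0

def xy_from_pixels(px, py, z, fov_h_deg=56.0, width=640, height=480):
    # Back-project the ball's pixel position onto the plane at depth z,
    # assuming a simple pinhole camera model and the PS Eye's specs.
    focal_px = (width / 2) / math.tan(math.radians(fov_h_deg) / 2)
    x = (px - width / 2) * z / focal_px
    y = (py - height / 2) * z / focal_px
    return x, y
```

Orientation would then come straight off the gyroscopes, leaving the camera and ultrasound to pin down position.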
 
The patents certainly suggest that the device is using ultrasound for depth (z) and visual cues for horizontal and vertical movement (x/y). However, an interview with Kaz Hirai at E3 seems to suggest that they've since dropped the ultrasound and now use visual tracking alone, with the size of the ball providing depth:


http://audioboo.fm/boos/27700-e3-final-day

The only thing I can think of now is that the precision comes from knowing that what you are tracking is a perfect circle. In that case, what the software needs to do is:

1) find the circle in the image
2) map the detected circle to a perfect circle of a known size to determine x/y/z.

I think that, given a digital input image, you don't actually need very high resolution to get very precise input, as luminance is going to help also?

If Kaz has it right (you'd think so, but stranger things have happened), it's interesting and it has a number of implications.
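
As a rough sketch of those two steps in Python (using OpenCV purely for illustration; the one-inch ball size and all the detector parameters are assumptions, not anything Sony has confirmed):

```python
import math
import cv2  # OpenCV, used here purely as an illustration

FOV_H_DEG, WIDTH_PX, HEIGHT_PX = 56.0, 640, 480     # published PS Eye specs
BALL_DIAMETER_M = 0.0254                            # assumption: a one-inch ball
FOCAL_PX = (WIDTH_PX / 2) / math.tan(math.radians(FOV_H_DEG) / 2)

def track_ball(gray_frame):
    # Step 1: find the circle in the image.
    circles = cv2.HoughCircles(gray_frame, cv2.HOUGH_GRADIENT,
                               dp=1, minDist=50, param1=100, param2=30,
                               minRadius=2, maxRadius=60)
    if circles is None:
        return None
    cx, cy, r = circles[0][0]                       # strongest candidate
    # Step 2: known physical size -> depth; back-project the centre -> x/y.
    z = BALL_DIAMETER_M * FOCAL_PX / (2.0 * r)
    x = (cx - WIDTH_PX / 2) * z / FOCAL_PX
    y = (cy - HEIGHT_PX / 2) * z / FOCAL_PX
    return x, y, z
```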
 
The patents certainly suggest that the device is using ultrasound for depth (z) and visual cues for horizontal and vertical movement (x/y). However, an interview with Kaz Hirai at E3 seems to suggest that they've since dropped the ultrasound and now use visual tracking alone, with the size of the ball providing depth:


http://audioboo.fm/boos/27700-e3-final-day

The only thing I can think of now is that the precision comes from knowing that what you are tracking is a perfect circle. In that case, what the software needs to do is:

1) find the circle in the image
2) map the detected circle to a perfect circle of a known size to determine x/y/z.

I think that, given a digital input image, you don't actually need very high resolution to get very precise input, as luminance is going to help also?

If Kaz has it right (you'd think so, but stranger things have happened), it's interesting and it has a number of implications.

If you apply an IR filter you probably don't have to bother trying to find the circle in the image; you would just look for an IR source that falls within an expected size range and then measure its diameter to determine depth.
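
Something like this minimal numpy sketch, say (the intensity threshold and size range are arbitrary assumptions):

```python
import numpy as np

def find_ir_source(ir_frame, intensity_thresh=200, min_diam=3, max_diam=60):
    # With a physical IR-pass filter the ball should be the only bright
    # region, so a plain threshold replaces circle detection entirely.
    ys, xs = np.nonzero(ir_frame > intensity_thresh)
    if xs.size == 0:
        return None
    diam = xs.max() - xs.min() + 1           # apparent diameter in pixels
    if not (min_diam <= diam <= max_diam):
        return None                          # reject lamps, reflections, etc.
    return float(xs.mean()), float(ys.mean()), int(diam)  # centroid + size
```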
 
True, optical finger recognition will solve the problems with gloves (hot, smelly, uncomfortable), while gloves solve the problems with optical finger recognition (actually being able to detect the position of the fingers in all cases) :smile:

If you really want finger recognition (and I can't see a reason why you would...
A-ha! That is where my argument trumps yours in the middle ground. I'm thinking the applications where finger detection is useful are situations where purely optical will work. Specifically, giving instructions through hand gestures: fingers closed, open palm to camera; spread hand; counting fingers; pointing. Really fine finger detection is probably beyond the point of any useful gaming application, and is the domain of professional applications, e.g. the guitar-playing detection is useless for games if the players haven't spent years learning the guitar! There's no need for that level of finesse in a console interface (this is setting me up for a fail - what's the betting the future becomes tiny finger motions!)

Can I ask you to confirm this, Shifty? I'm assuming that when you talk about IR filters you mean on the software side and not some sort of colour-gel-type filter needing to be placed physically in front of the camera?
An actual physical filter. All the software gets is an RGB value. It can't determine what caused the pixel to light up. The CCD is sensitive to IR. Plug in your PS Eye/EyeToy and shine a TV remote at it, and you'll see the black bulb pulse white through the camera.

Thinking about it, especially with 120 Hz capture, they could exploit this and track a pulsating point. If it alternates between black and white every frame, it'll be a target point for the camera to follow.
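
As a quick sketch of the idea (the difference threshold is chosen arbitrarily), simple frame differencing alone would isolate such a point:

```python
import numpy as np

def pulse_mask(prev_frame, curr_frame, diff_thresh=100):
    # A marker flashing on/off every frame under 120 Hz capture produces
    # the largest frame-to-frame intensity swing in the image; the
    # slower-moving background mostly cancels out in the difference.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > diff_thresh                # True where the point pulses
```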
 
A-ha! That is where my argument trumps yours in the middle ground. I'm thinking the applications where finger detection is useful are situations where purely optical will work. Specifically, giving instructions through hand gestures: fingers closed, open palm to camera; spread hand; counting fingers; pointing. Really fine finger detection is probably beyond the point of any useful gaming application, and is the domain of professional applications, e.g. the guitar-playing detection is useless for games if the players haven't spent years learning the guitar! There's no need for that level of finesse in a console interface (this is setting me up for a fail - what's the betting the future becomes tiny finger motions!)

I agree with you in general. In principle, though, I think you cannot make a gesture with one hand where a finger's position cannot be deduced from the position of the rest of the hand. So eventually it should always be possible.

One of the first things I thought of when I saw Natal in action was that a great testing application to find the limits/strengths of the device would be trying to teach it sign language.
 
The patents certainly suggest that the device is using ultrasound for depth (z) and visual cues for horizontal and vertical movement (x/y). However, an interview with Kaz Hirai at E3 seems to suggest that they've since dropped the ultrasound and now use visual tracking alone, with the size of the ball providing depth:


http://audioboo.fm/boos/27700-e3-final-day

The only thing I can think of now is that the precision comes from knowing that what you are tracking is a perfect circle. In that case, what the software needs to do is:

1) find the circle in the image
2) map the detected circle to a perfect circle of a known size to determine x/y/z.

I think that, given a digital input image, you don't actually need very high resolution to get very precise input, as luminance is going to help also?

If Kaz has it right (you'd think so, but stranger things have happened), it's interesting and it has a number of implications.

Ha ha, liolio is right!

If it's using only the bulb and the PS Eye, the controller should be cheap, but then it won't work outside line of sight. The thing is, it may be doable using the PSP too :p !

Some interesting images from Titanio again:
[patent diagram image]
(http://www.neogaf.com/forum/showpost.php?p=16237604&postcount=1503)

[patent diagram image]
(http://www.neogaf.com/forum/showpost.php?p=16237641&postcount=1505)

They seem to be about a year old (from the USPTO), so things may have changed.


Arwin said:
One of the first things I thought of when I saw Natal in action was that a great testing application to find the limits/strengths of the device would be trying to teach it sign language.

Yap! And "puppeteering", I suppose... like using fingers to strangle and fly-kick enemies :). The use is rather esoteric but very expressive (!). I originally wanted to track guitar fingers but was running into trouble thinking about the contact/trigger issue. That's why I thought about using ribbons (instead of 3D finger detection). I could tie the ribbons to the strings to detect vibration instead of tracking the hand motion.
 
If Kaz has it right (you'd think so, but stranger things have happened), it's interesting and it has a number of implications.

It would be very interesting if they are doing that. My back-of-the-envelope calculations (using the best possible characteristics of the PS Eye: FOV 56 degrees, resolution 640 pixels) show that a ball with a diameter of one inch would take up 8 pixels at its widest point at a distance of 1.91 m from the camera. It would take up 7 pixels at its widest point at 2.18 m.

27 cm of depth for a one-pixel change in diameter?

At 3 m it would cover 5.1 pixels at its widest point; at 4 m, 3.8 pixels (1.3 pixels per metre of depth). I just don't see the resolution here to generate 1:1 depth from visuals alone.
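
For what it's worth, those numbers check out under a simple pinhole model; here's a quick Python verification (assuming the same one-inch ball and the PS Eye specs above):

```python
import math

FOV_H_DEG, WIDTH_PX = 56.0, 640                     # PS Eye specs
BALL_DIAM_M = 0.0254                                # one-inch ball
FOCAL_PX = (WIDTH_PX / 2) / math.tan(math.radians(FOV_H_DEG) / 2)  # ~601.8 px

def apparent_diameter_px(distance_m):
    # Pinhole model: apparent size falls off as 1/distance.
    return BALL_DIAM_M * FOCAL_PX / distance_m

for d in (1.91, 2.18, 3.0, 4.0):
    print(f"{d:.2f} m -> {apparent_diameter_px(d):.1f} px")
# 1.91 m -> 8.0 px, 2.18 m -> 7.0 px, 3.00 m -> 5.1 px, 4.00 m -> 3.8 px
```

So whole-pixel measurements really would give only coarse depth steps at living-room distances.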

A-ha! That is where my argument trumps yours in the middle ground. I'm thinking the applications where finger detection is useful are situations where purely optical will work. Specifically, giving instructions through hand gestures: fingers closed, open palm to camera; spread hand; counting fingers; pointing.

So I guess we are talking about actual finger recognition (me), and using fingers to create recognisable shapes, but not holding anything or using two hands together (you). If you hold your hand out in a specific way in front of the camera, it's just pattern matching, isn't it?

BTW, fair enough if that is the extent of the "finger recognition" you are talking about. I tend to think of "finger recognition" in the more general sense.

I agree with you in general. In principle, though, I think you cannot make a gesture with one hand where a finger's position cannot be deduced from the position of the rest of the hand. So eventually it should always be possible.

Imagine the camera sees something like this. Are the fingers outstretched or are they in a fist? (You can just about tell from this image, but Googling images of particular hand positions is quite difficult ;)).
 
So I guess we are talking about actual finger recognition (me), and using fingers to create recognisable shapes, but not holding anything or using two hands together (you). If you hold your hand out in a specific way in front of the camera, it's just pattern matching, isn't it?

BTW, fair enough if that is the extent of the "finger recognition" you are talking about. I tend to think of "finger recognition" in the more general sense.
With respect to Natal, the origin of this discussion, the question is whether the 48 joints include fingers, and the consensus is they don't - that the hand is represented as a joint in the centre of the palm or somesuch, and the position of the fingers is irrelevant. I think the definition of 'finger recognition' as 'can we spot any fingers, and what shape do they make?' is apt. Joint-level finger tracking is going to require some contact sensors, I imagine, or some insane image-recognition methods. I still can't think of any use for that level of tracking though. A virtual piano could be managed just with placement of a digit in an area of the FOV, for example. Perhaps the main use for true digit tracking would be in VR, with correct virtual object placement, graphic occlusion, and finger interaction with a creature?
 
It would be very interesting if they are doing that. My back-of-the-envelope calculations (using the best possible characteristics of the PS Eye: FOV 56 degrees, resolution 640 pixels) show that a ball with a diameter of one inch would take up 8 pixels at its widest point at a distance of 1.91 m from the camera. It would take up 7 pixels at its widest point at 2.18 m.

27 cm of depth for a one-pixel change in diameter?

At 3 m it would cover 5.1 pixels at its widest point; at 4 m, 3.8 pixels (1.3 pixels per metre of depth). I just don't see the resolution here to generate 1:1 depth from visuals alone.

That's why I originally thought they couldn't get rid of the ultrasound mechanism. The SD camera resolution may not be fine enough. If Kaz is wrong, I'd imagine someone from Sony would correct him soon.
 
With respect to Natal, the origin of this discussion, the question is whether the 48 joints include fingers, and the consensus is they don't - that the hand is represented as a joint in the centre of the palm or somesuch, and the position of the fingers is irrelevant. I think the definition of 'finger recognition' as 'can we spot any fingers, and what shape do they make?' is apt. Joint-level finger tracking is going to require some contact sensors, I imagine, or some insane image-recognition methods. I still can't think of any use for that level of tracking though. A virtual piano could be managed just with placement of a digit in an area of the FOV, for example. Perhaps the main use for true digit tracking would be in VR, with correct virtual object placement, graphic occlusion, and finger interaction with a creature?

If they could track the curvature of your fingers, or your thumb, they'd be able to determine wrist rotation. The lack of finger tracking isn't a big minus for me. I think if everything is working they should be able to make enough cool stuff without it.

I'm still interested in knowing if they can detect a bent wrist.
 
Let's say they get all this working and delivered at affordable prices.

So instead of pretending to play guitar with a fake guitar in Guitar Hero (which itself is a lame concept), you'd be playing air guitar?

Or shadow-boxing in some future versions of Fight Night?

The popularity of the Wii notwithstanding, and no matter how much more immersive this may be, people flailing about in front of their TVs are going to look ridiculous.
 
Well, it is playing videogames. We gamers have perfected the art of looking stupid in front of TVs (though there are different degrees of stupidity).
 
Let's say they get all this working and delivered at affordable prices.

So instead of pretending to play guitar with a fake guitar in Guitar Hero (which itself is a lame concept), you'd be playing air guitar?

Or shadow-boxing in some future versions of Fight Night?

The popularity of the Wii notwithstanding, and no matter how much more immersive this may be, people flailing about in front of their TVs are going to look ridiculous.

Uh-huh... I am thinking of it as a UI and gaming extension: if the system can track fingers accurately, you may not need to flail your entire body to look ridiculous. One may be able to express a lot more actions using just two hands. While the systems may not be able to handle it reliably (especially at longer range), I think both the PS Eye and Project Natal can track them under very specific conditions.

For very subtle uses that require more than just motion (e.g., guitar playing), we will need a more sophisticated mechanism.
 
Sony's tech demos should be using the ultrasonic controller technology (complemented by the PS Eye + colour LED). A pure SD camera-based solution may not be able to track the player's movement so quickly and so accurately. Not to mention that in the archer demo, one hand is behind (obscured by) the other. An optical solution will require some trickery to pinpoint the exact location of both hands.
With an ultrasonic solution, sound can be even more easily occluded. When I tried Gametrak's demo, they kept saying "don't think of it like a Wiimote" when you'd try small tilting, accelerometer-inputting motions, but whenever it became unresponsive, it was because I wasn't pointed directly at the sensor bars on the sides of the screen. I think the best solution would be gyroscopic + visual. When out of line of sight, the game can still track the rate and vector of motion.
 
I think a lot of people are interested in the technology but ultimately people playing air guitar is just sad.
 
It would be very interesting if they are doing that. My back-of-the-envelope calculations (using the best possible characteristics of the PS Eye: FOV 56 degrees, resolution 640 pixels) show that a ball with a diameter of one inch would take up 8 pixels at its widest point at a distance of 1.91 m from the camera. It would take up 7 pixels at its widest point at 2.18 m.

27 cm of depth for a one-pixel change in diameter?

At 3 m it would cover 5.1 pixels at its widest point; at 4 m, 3.8 pixels (1.3 pixels per metre of depth). I just don't see the resolution here to generate 1:1 depth from visuals alone.

I think they can do significantly better than that, considering that silhouette boundary pixels carry much more information than a single (binary) bit, thanks to inherent low-pass filtering.
Under ideal lighting and colouring conditions you can get practically perfect readings.
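
A toy example of that sub-pixel idea in Python (the intensity values are made up):

```python
import numpy as np

def subpixel_width(row, background=0.0, peak=255.0):
    # Edge pixels are only partially covered by the ball, so their
    # intensity sits between background and peak. Summing the normalised
    # intensities recovers the fractional coverage that a hard threshold
    # (one bit per pixel) throws away.
    coverage = np.clip((np.asarray(row, dtype=float) - background)
                       / (peak - background), 0.0, 1.0)
    return float(coverage.sum())

# e.g. subpixel_width([0, 77, 255, 255, 255, 179, 0]) -> ~4.0 pixels
```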

My guess is that the bulb is a bit of a compromise based on avoiding patent coverage.

Which patent?
 
I think a lot of people are interested in the technology but ultimately people playing air guitar is just sad.

:LOL: It could get worse! If the gestures can be tracked accurately and quickly, I am actually thinking about controlling two players at the same time in FIFA using my left and right hands. Just do a running-man gesture with both hands, and have them pass a virtual ball between the two hands. I think I could do a banana kick or bicycle kick using the fingers too (simulating leg movement). :p

I find the control scheme for sports games like MLB 2009 much, much too hard (too many button combinations to remember). Don't know about FIFA.
 