PlayStation 4 (codename Orbis) technical hardware investigation (news and rumours)

Kinect2 in the 2010 PDF is just a "dual camera HD" too, not Kinect1 technology.
You can make a depth map like Kinect1's with a dual camera.

Yeah, but I assumed that would be two depth cameras, which would give a much better view of the room and the players by capturing point clouds from two angles, yielding far more complete 3D data.
 
Kinect2 in the 2010 PDF is just a "dual camera HD" too, not Kinect1 technology.
You can make a depth map like Kinect1's with a dual camera.

Do they add an infrared channel to the RGB camera? Seems trivial, and an obvious cost-cutting measure.

Cheers
 
You only need an angle between the two; they can be mounted on either side of the TV. Image analysis can triangulate the detected edges and features. There's also nothing preventing Sony from projecting an infrared grid.

This is mocap done with 4 PS Eye cameras. With 2 it would still work but would give an incomplete model (just like Kinect, it would be a messy "front" point cloud):
http://www.youtube.com/watch?v=7ssb0ZN1MSA
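To make the triangulation point concrete: for a rectified stereo pair, depth falls straight out of disparity. A minimal sketch; the focal length, baseline and disparity values below are made-up assumptions, not anything Sony has announced:

```python
# Depth from disparity for a rectified stereo pair: Z = f * B / d
# f = focal length in pixels, B = baseline (camera separation) in metres,
# d = horizontal disparity in pixels between the two views.

def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Distance in metres to a feature matched in both views."""
    if disparity_px <= 0:
        raise ValueError("feature must be matched in both views with positive disparity")
    return focal_px * baseline_m / disparity_px

# Hypothetical TV-mounted pair: 1 m baseline, ~700 px focal length at 720p.
print(depth_from_disparity(focal_px=700.0, baseline_m=1.0, disparity_px=280.0))  # ~2.5 m
```

The wider the baseline (e.g. cameras at opposite edges of the TV), the more disparity per metre of depth, hence better depth resolution for the same pixel accuracy.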
 
PS3 3D games ask for your screen size when you configure the 3D side of things.
If the two cameras sit some distance apart, they may simply ask roughly how far apart you placed them.

However, there is one thing that is very hard to emulate: focus. Specifically, I mean displaying an image that your eyes can freely focus on at different distances, so that your eye (one eye is enough) can tell which objects are farther away and which are closer to you.

Too bad this is pretty much undoable with current displays.
 
Yes, you can do markerless mocap with two or more PS Eyes. There are professional systems doing that. It's better with markers, though; they could use some high-contrast cloth wrist and ankle bands to make it flawless and as easy on the processing as the Move: no need for any smoothing, 120Hz, zero lag.
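Something like the Move's colour-blob tracking is cheap per frame. A minimal sketch with OpenCV; the HSV range is an arbitrary assumption for a bright green band, not a real Move/PS Eye parameter:

```python
import cv2
import numpy as np

# Track a high-contrast colour band by HSV thresholding, the same basic idea
# as tracking the Move's glowing sphere. The range below is an arbitrary
# guess for a saturated green band.
LOWER = np.array([45, 120, 120])
UPPER = np.array([75, 255, 255])

def track_band(frame_bgr):
    """Return the (x, y) centroid of the matching blob, or None if absent."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER, UPPER)
    m = cv2.moments(mask)
    if m["m00"] == 0:
        return None
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])
```

Run that on both cameras and triangulate the two centroids and you get a 3D marker position with almost no smoothing needed.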

Do you know what the computing load would be for such a 120Hz, zero-lag setup?
 
Do you know what the computing load would be for such a 120Hz, zero-lag setup?
Ah, I didn't think that far. It's probably insane, because the software version can't even do it in real time on a recent PC CPU. I'd be curious how much they'd be able to do with 4 CUs. I guess they would need markers to help ease the processing :???:
 
You only need an angle between the two; they can be mounted on either side of the TV. Image analysis can triangulate the detected edges and features. There's also nothing preventing Sony from projecting an infrared grid.

This is mocap done with 4 PS Eye cameras. With 2 it would still work but would give an incomplete model (just like Kinect, it would be a messy "front" point cloud):
http://www.youtube.com/watch?v=7ssb0ZN1MSA

Wow! But it runs at 2-3 frames per second on a fast PC, so without dedicated hardware it won't be possible.
 
That's how Sony's current consumer 3D camcorders do it, though.

Yeah, I vaguely remember they have a patent for keeping both lenses close to each other for close-up 3D shots (without losing focus).

Ideally the unit is light enough to mount on a motorized platform.

Would be interesting to learn the technical details. Basically, they need the GPU or a dedicated hardware implementation to run at least 30 times faster. I'm still very keen to see an update on the Magic Mirror demo.

So far, no consumer device seems to use ultrasonic sensors for mapping the environment except Aibo. Sony filed a few ultrasonic-sensor-based gaming patents, and that's the last we saw of 'em.

I wonder how far away you need to stand for it to track fingers in this version.

If this rumour is true, at least the 3D YouTube announcement will get a proper follow-up? "Everyone" will have a 3D cam for gaming-related videos, at least. It feels so lonely being the only one with a 3D Bloggie.
 
EyeToy is:
320x240 60fps
8-bit
1 microphone
1 camera

PS Eye is:
640x480 60fps, 320x240 120fps
10-bit
more sensitive
less compression
4 microphones
1 camera

TwinEyes?
1280x720 60fps, 640x360 120fps, 320x180 240fps?
12-bit?
more sensitive?
less compression?
4 microphones?
2 cameras?
 
If you time the cameras to run at 60fps on alternating timings, would you get 120fps response on movement that is seen by both cameras?

Audio positioning/tracking would be better with two mics in the camera on the left and two in the one on the right of the TV.

Cameras can be really small and light these days, and I'm interested in seeing how Sony proposes to attach them to the TV. It would be great to have them at either edge of the screen; there are many advantages to that.

I'm also wondering whether it is feasible to NOT filter out infrared, as the PS Eye does (or perhaps make it optional).

Not expecting too much, though. ;) But I certainly hope for stereo recording, and I really hope for a decent improvement in quality. 720p would be almost a disappointment in that respect, but who knows what you can do with two of them.
 
Wow! But it runs at 2-3 frames per second on a fast PC, so without dedicated hardware it won't be possible.
Maybe that's what those extra 4 CUs are for? Maybe they're reserved for depth processing instead of being available to the title. It's the kind of processing a CU would be absolutely perfect for.
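Stereo depth estimation really is a natural CU workload: every pixel's disparity search is independent. A toy CPU sketch of the inner kernel (block size and search range are arbitrary assumptions); a GPGPU version would run this per pixel in parallel:

```python
import numpy as np

def sad_disparity(left, right, y, x, max_disp=64, block=7):
    """Brute-force SAD block match for one pixel of a rectified grey pair.
    The per-pixel independence of this search is exactly why depth maps
    suit a bank of compute units."""
    h = block // 2
    patch = left[y - h:y + h + 1, x - h:x + h + 1].astype(np.int32)
    best_cost, best_d = None, 0
    for d in range(min(max_disp, x - h) + 1):
        cand = right[y - h:y + h + 1, x - d - h:x - d + h + 1].astype(np.int32)
        cost = int(np.abs(patch - cand).sum())
        if best_cost is None or cost < best_cost:
            best_cost, best_d = cost, d
    return best_d  # convert to depth with Z = f * B / d
```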
 
EyeToy is:
320x240 60fps
8-bit
1 microphone
1 camera

PS Eye is:
640x480 60fps, 320x240 120fps
10-bit
more sensitive
less compression
4 microphones
1 camera

TwinEyes?
1280x720 60fps, 640x360 120fps, 320x180 240fps?
12-bit?
more sensitive?
less compression?
4 microphones?
2 cameras?

The latest Exmor R supports an HDR photo mode. I don't know if they will use that unit here.


Maybe that's what those extra 4 CUs are for? Maybe they're reserved for depth processing instead of being available to the title. It's the kind of processing a CU would be absolutely perfect for.

Most likely, I'd guess. They need to do it at 60Hz, given Move's requirements to date.
 
If you time the cameras to run at 60fps on alternating timings, would you get 120fps response on movement that is seen by both cameras?
Nah, that'd screw up the depth perception. You'd be comparing elements from different time slices.
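A back-of-envelope example of why staggered exposures hurt; all numbers here are illustrative assumptions. Anything moving sideways between the two time-offset exposures shows up as extra disparity, i.e. phantom depth:

```python
# Assumed geometry: ~700 px focal length, 0.2 m baseline.
focal_px, baseline_m = 700.0, 0.2
true_disp = 56.0               # true disparity for a hand at ~2.5 m

speed_px_per_s = 600.0         # hand sweeping across the frame
offset_s = 1.0 / 120.0         # half a 60 Hz frame between exposures
phantom = speed_px_per_s * offset_s  # 5 px of motion-induced disparity

print(focal_px * baseline_m / true_disp)             # ~2.50 m, correct
print(focal_px * baseline_m / (true_disp + phantom)) # ~2.30 m, wrong
```

A 20 cm depth error from ordinary hand speed, and it flips sign when the motion reverses.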

Audio positioning/tracking would be better with two mics in the camera on the left and two in the one on the right of the TV.
The mic array in the PS Eye was a real disappointment. I hope Sony have sorted that out. 2+2, as you say, with considerable separation should make audio extraction more accurate, although beyond a certain distance it shouldn't make a difference.

Cameras can be really small and light these days, and I'm interested in seeing how Sony proposes to attach them to the TV. It would be great to have them at either edge of the screen; there are many advantages to that.
Yeah, cameras in tablets are insanely small. Sony could go with really small clip-on housings sitting on top of the TV, or a thin bar on the bottom (it should be mountable in either place, regardless).

http://www.sony.net/SonyInfo/News/Press/201010/10-137E/
Very capable tiny Exmor R cameras as of Q1 2011. Pretty ideal even back then. I'm really hoping for good things here. It's an area where Sony actually leads the world.

I'm also wondering whether it is feasible to NOT filter out infrared, as the PS Eye does (or perhaps make it optional).
I've considered the same, using IR LEDs for tracking, although it may be sensitive to IR noise. I don't really know how that would work out.

Maybe that's what those extra 4 CUs are for? Maybe they're reserved for depth processing instead of being available to the title. It's the kind of processing a CU would be absolutely perfect for.
If they don't have to do a full 3D build every frame, maybe. It's perhaps important to track face and hands more accurately.

Most likely, I'd guess. They need to do it at 60Hz, given Move's requirements to date.
Hand tracking and face tracking need to be fast and accurate, but a full 3D model can make various compromises. I don't anticipate 60fps full-frame 3D reconstruction, as that really is asking a lot! If it were that easy, I'd expect Hollywood to have amazing full-scene 3D capture setups; it'd make compositing a zillion times faster and easier with automatic mattes based on depth.
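The depth-matte idea itself is simple enough to sketch; a toy example assuming you already have a per-pixel depth map (the resolution and distances are arbitrary):

```python
import numpy as np

def depth_matte(depth_m, near=0.5, far=2.0):
    """Binary foreground matte from a per-pixel depth map in metres:
    keep everything between `near` and `far`, key out the rest."""
    return ((depth_m > near) & (depth_m < far)).astype(np.uint8) * 255

# Toy scene: a subject at ~1.5 m in front of a wall at ~4 m.
depth = np.full((720, 1280), 4.0)
depth[200:600, 500:780] = 1.5
matte = depth_matte(depth)  # white exactly where the subject is
```

The hard part, of course, is producing a depth map that's clean enough at the edges, which is why film still uses green screens.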
 
Hand tracking and face tracking need to be fast and accurate, but a full 3D model can make various compromises. I don't anticipate 60fps full-frame 3D reconstruction, as that really is asking a lot! If it were that easy, I'd expect Hollywood to have amazing full-scene 3D capture setups; it'd make compositing a zillion times faster and easier with automatic mattes based on depth.

Dr. Marks has insisted on 60Hz tracking (at the minimum) in every iteration. If on-the-fly 3D data is too much to handle, drop down to 2D.

They can use 3D data for captured objects in, say, LBP, PS Home or some AR games.
 
Dr. Marks has insisted on 60Hz tracking (at the minimum) in every iteration. If on-the-fly 3D data is too much to handle, drop down to 2D.
Ah, my meaning perhaps wasn't clear. Keep the 3D tracking of face and hands at 60 fps; that can be done with simpler algorithms than full 3D reconstruction. Full 3D is only needed for AR, object scanning and other such fancy things, and it doesn't need to be 100% accurate at 60 fps. You could scan a room prior to playing EyePet 2, and as long as nothing is moved, that scan stays valid. You could also use partial reconstruction of just the areas that change, meaning you process only a fraction of the whole screen.
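Partial reconstruction could be as simple as re-running stereo only on tiles that moved. A minimal sketch; the tile size and threshold are arbitrary assumptions:

```python
import numpy as np

def changed_tiles(prev, curr, tile=32, thresh=12.0):
    """Return (row, col) indices of tiles whose mean absolute difference
    between consecutive greyscale frames exceeds `thresh`. Only these
    tiles need to go back through the expensive stereo matcher."""
    h, w = prev.shape
    dirty = []
    for ty in range(0, h - tile + 1, tile):
        for tx in range(0, w - tile + 1, tile):
            a = prev[ty:ty + tile, tx:tx + tile].astype(np.float32)
            b = curr[ty:ty + tile, tx:tx + tile].astype(np.float32)
            if np.abs(b - a).mean() > thresh:
                dirty.append((ty // tile, tx // tile))
    return dirty
```

In a living-room scene only the player usually moves, so the dirty set should be a small fraction of the frame.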
 
The mic array in the PS Eye was a real disappointment. I hope Sony have sorted that out. 2+2, as you say, with considerable separation should make audio extraction more accurate, although beyond a certain distance it shouldn't make a difference.
This would be very hard to do, as I understand mic-array algorithms. To make them work, you need to know in detail how far the mics are from each other, and no two mic spacings should be a multiple of another (prime relationships would be absolutely best). When we made any tiny change to the mics in Kinect, including changing case materials, mic distances, etc., we had to re-run the full optimization pipeline that generates the magic numbers for the filters. That took multiple days on a supercomputer cluster, because things like beamforming are highly reliant on the positioning of the mics.
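To see why the geometry matters so much, here is a minimal delay-and-sum beamformer sketch (not Kinect's actual pipeline; the mic positions and sample rate are assumptions). The steering delays fall straight out of the mic coordinates, so any change to the spacing changes every filter:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def delay_and_sum(signals, mic_x, angle_rad, fs=16000):
    """Steer a linear mic array toward `angle_rad` by delaying each channel
    according to its position `mic_x` (metres) and summing.
    signals: array of shape (n_mics, n_samples)."""
    out = np.zeros(signals.shape[1])
    for sig, x in zip(signals, mic_x):
        delay = int(round(fs * x * np.sin(angle_rad) / SPEED_OF_SOUND))
        out += np.roll(sig, -delay)
    return out / len(mic_x)
```

Real systems add per-frequency filters on top of this, which is where the days of cluster-time optimisation go.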

My prediction: this "PlayStation Eyes" device will be a single enclosure with both cameras and the four mics.
 
Just want to point out that you can get very detailed depth/movement information, both relative and absolute, from a single camera. I can't explain how it works, but it was one of my good mates' physics honours projects (I've played with it), built with nothing but a single generic webcam and an FPGA. It has been the basis for something like six PhDs; he now works at a company that's jointly funded by the US DoD and the Australian DoD.
 
You only need an angle between the two; they can be mounted on either side of the TV. Image analysis can triangulate the detected edges and features. There's also nothing preventing Sony from projecting an infrared grid.

This is mocap done with 4 PS Eye cameras. With 2 it would still work but would give an incomplete model (just like Kinect, it would be a messy "front" point cloud):
http://www.youtube.com/watch?v=7ssb0ZN1MSA

After checking out the information they have available: their algorithm cannot calculate depth info from just 2 cameras. The cameras off to the side are required to get accurate volumetric information (including depth). The minimum supported setup is 3 cameras arranged in a rough half-circle around the subject.

Also, as someone else mentioned, this requires a full PC with GPGPU acceleration. Even with GPGPU acceleration, the per-frame processing times are as follows:

Intel Core i7 quad core at 4 GHz:

Radeon HD 5750 requires 2.5 seconds per frame.
Radeon HD 5870 requires 1.1 seconds per frame.
Nvidia GTX 480 requires 0.65 seconds per frame.

And that is using all of the GPGPU resources available.
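Putting numbers on the gap, using the per-frame times quoted above:

```python
# Required speedup to reach 60 Hz from the quoted per-frame times.
for gpu, sec in [("HD 5750", 2.5), ("HD 5870", 1.1), ("GTX 480", 0.65)]:
    print(f"{gpu}: {1 / sec:.2f} fps today, needs {60 * sec:.0f}x for 60 Hz")
# GTX 480: 1.54 fps today, needs 39x for 60 Hz
```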

Needless to say, neither of the next-gen consoles will be able to achieve anything like the results in that video using just standard 2D RGB cameras, at least not without significant additional compute resources.

You'll likely still need the Move sensor's relative positioning ability, combined with a rough positional guide from the 2 cameras, to get any meaningful depth information. Although with the high-resolution cameras they may finally be able to do away with the large coloured balls.

That's how Sony's current consumer 3D camcorders do it, though.

Which works fine for recording 3D. It doesn't work so well for calculating depth, neither accurately nor with low enough processing requirements to run in real time on Orbis, even if all system resources were devoted to it.

Regards,
SB
 
I don't know what Sony will use, but this is done with a stereoscopic HD camera (up to 1080p) plus an FPGA in near real time: https://www.youtube.com/watch?v=pydla1fPfBw
(The video appears to show 720p processing @ 60Hz, according to their site.)

If Sony use similar tech, they will need to make it (even more) low-cost.

I guess we'll see if Sony's "third pillar" professional imaging division is worth their salt.
 