Stereoscopic 3D using Reprojection

Hi,

At SIGGRAPH 2011, Crytek and Disney explained their approaches to generating stereoscopic images using reprojection. A short explanation with their slides can be found here (translated from Japanese to English).

Together with a friend I've developed a gather-based reprojection approach. The results and performance are very similar to Crytek's, although the underlying math seems to be different.
A tech demo, shader code and a paper (with a thorough explanation, performance measurements and a quality comparison) are available on my website: http://www.marries.nl/projects/stereoscopic-3d/
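
To give a rough idea of what "gather-based" means here: for every pixel of the new eye's image, the shader searches along a horizontal segment of the source image for the sample whose own disparity would land it on that pixel. A very simplified GLSL sketch (not the actual shader from the demo; the uniform names, the linear-depth assumption and the fallback handling are illustrative only, the paper has the full derivation):

#version 130

uniform sampler2D srcColor;    // rendered source-eye colour
uniform sampler2D srcDepth;    // matching linear view-space depth (assumption)
uniform float eyeSeparation;   // interaxial distance, in texture-coordinate units
uniform float convergence;     // focal plane distance
uniform float maxDisparity;    // horizontal search range in texture coordinates
uniform int   numSteps;        // samples taken along the search line

in  vec2 uv;
out vec4 fragColor;

// Horizontal parallax a source sample at depth z gets for this eye.
float disparity(float z)
{
    return 0.5 * eyeSeparation * (1.0 - convergence / z);
}

void main()
{
    vec4  best    = texture(srcColor, uv);   // fallback: the unshifted pixel
    float bestErr = 1e9;

    // Gather: walk along a horizontal segment of the source image and keep
    // the candidate whose own disparity maps it closest onto this pixel.
    // (A real implementation also has to resolve occlusions and holes.)
    for (int i = -numSteps; i <= numSteps; ++i)
    {
        vec2  tap = uv + vec2(float(i) / float(numSteps) * maxDisparity, 0.0);
        float z   = texture(srcDepth, tap).r;
        float err = abs(tap.x + disparity(z) - uv.x);
        if (err < bestErr)
        {
            bestErr = err;
            best    = texture(srcColor, tap);
        }
    }
    fragColor = best;
}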

I'm very interested in your opinion on the general idea of using reprojected images for stereoscopic 3D. Do you think reprojection is a good alternative to rendering everything twice?
I'm also hoping to receive some feedback on my approach for reprojection, and I'm curious how you think it compares to the other reprojection algorithms out there.

Any feedback is greatly appreciated!
 
Hi. I've read the linked article but find it difficult to follow, partly because of the translation, and partly my relative ignorance of the terminology. Could you tell me where the information for the 'missing' pixels comes from after the depth shifting of the pixels?

I noticed that Crysis 2 looked very flat indeed, and sometimes very near objects had ghosting round the edges. Is this because there is missing information? Does your solution suffer from this artifact?
 
I have researched reprojection-based techniques in our recent game projects. I feel that reprojection-based techniques will play a major part in future graphics engines, since the cost of generating a single screen pixel rises all the time. Most pixels on the screen do not change that much from frame to frame (especially in a 60 fps game). It would be a huge waste if we cannot find efficient ways to reuse this already-calculated data.

Halfway through our last game project (Trials HD), our engine rendered only 20 true frames per second. To achieve 60 fps, we rendered all the objects without any lighting or material processing 60 times a second. This pass used two different rendered views (future and past) and determined the correct sample to use by means of the reprojected depth coverage. The result was surprisingly good (we could actually achieve 120+ fps with the final code), but there were some ghosting artifacts and we disliked the 20 fps shadow update rate (we could have separated shadowing from lighting, but that has its own problems). So the final game was released without the reprojection technology. We of course had to remove some dynamic lights to make the game run at 60 fps without reprojection, but we still managed to make it one of the most graphically intense XBLA games at the time.
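
Roughly, the per-pixel selection in that pass can be sketched like this (illustrative GLSL only, not our actual shader; the uniform names and the simple depth test are made up for the example):

#version 130

uniform sampler2D pastColor;    // previous fully shaded (20 fps) frame
uniform sampler2D pastDepth;
uniform sampler2D futureColor;  // next fully shaded frame
uniform sampler2D futureDepth;
uniform mat4  currentToPast;    // current view space -> clip space of that frame
uniform mat4  currentToFuture;
uniform float depthTolerance;   // allowed mismatch before a sample is rejected

in  vec3 viewPos;               // from the cheap 60 fps geometry-only pass
out vec4 fragColor;

// Reproject the current surface point into another frame; alpha < 0 means
// the point was covered (occluded) there and the sample cannot be used.
vec4 fetchIfVisible(mat4 reproject, sampler2D colorTex, sampler2D depthTex)
{
    vec4  clip = reproject * vec4(viewPos, 1.0);
    vec3  ndc  = clip.xyz / clip.w;
    vec2  uv   = ndc.xy * 0.5 + 0.5;
    float expected = ndc.z * 0.5 + 0.5;         // depth this point should have there
    float stored   = texture(depthTex, uv).r;
    if (abs(stored - expected) > depthTolerance)
        return vec4(0.0, 0.0, 0.0, -1.0);       // covered by something else
    return vec4(texture(colorTex, uv).rgb, 1.0);
}

void main()
{
    vec4 past   = fetchIfVisible(currentToPast,   pastColor,   pastDepth);
    vec4 future = fetchIfVisible(currentToFuture, futureColor, futureDepth);

    if (past.a > 0.0 && future.a > 0.0)
        fragColor = vec4(mix(past.rgb, future.rgb, 0.5), 1.0);  // both usable: blend
    else if (past.a > 0.0)
        fragColor = vec4(past.rgb, 1.0);
    else if (future.a > 0.0)
        fragColor = vec4(future.rgb, 1.0);
    else
        fragColor = vec4(0.0, 0.0, 0.0, 1.0);   // covered in both: needs a fallback
}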

I have also been toying with stereoscopic reprojection. I personally think interleaving the eye update (odd frames left, even frames right) and using two input textures/depth buffers as the reprojection source (the other eye, and the last frame of the same eye) would provide pretty good quality. There should be far fewer occluded (missing) areas when you can select from two images. Of course you need a fast update rate (60 fps) in your game to make the last frame as correct as possible (moving shadows are the most notable offender).
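
Sketched very roughly in GLSL (again illustrative only; this version assumes the surface positions of the eye being reconstructed are available, e.g. from a cheap depth-only prepass):

#version 130

uniform sampler2D otherEyeColor;  // this frame, the eye that was truly rendered
uniform sampler2D otherEyeDepth;
uniform sampler2D prevSelfColor;  // last frame, the eye we are reconstructing
uniform sampler2D prevSelfDepth;
uniform mat4  toOtherEye;         // view space -> other eye's clip space (stereo offset only)
uniform mat4  toPrevFrame;        // view space -> last frame's clip space (camera motion)
uniform float depthTolerance;

in  vec3 viewPos;                 // surface position for the eye being reconstructed
out vec4 fragColor;

// Try to fetch this surface point from a source frame; reject it if the
// stored depth says the point was occluded there.
bool fetchVisible(mat4 M, sampler2D colorTex, sampler2D depthTex, out vec3 rgb)
{
    vec4 clip = M * vec4(viewPos, 1.0);
    vec3 ndc  = clip.xyz / clip.w;
    vec2 uv   = ndc.xy * 0.5 + 0.5;
    rgb = texture(colorTex, uv).rgb;
    return abs(texture(depthTex, uv).r - (ndc.z * 0.5 + 0.5)) < depthTolerance;
}

void main()
{
    vec3 rgb;
    if (fetchVisible(toOtherEye, otherEyeColor, otherEyeDepth, rgb))
        fragColor = vec4(rgb, 1.0);            // same moment in time: preferred source
    else if (fetchVisible(toPrevFrame, prevSelfColor, prevSelfDepth, rgb))
        fragColor = vec4(rgb, 1.0);            // same eye, previous frame
    else
        fragColor = vec4(0.0, 0.0, 0.0, 1.0);  // missing in both: fill or re-render
}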
 
Hi. I've read the linked article but find it difficult to follow, partly because of the translation, and partly my relative ignorance of the terminology. Could you tell me where the information for the 'missing' pixels comes from after the depth shifting of the pixels?

I noticed that Crysis 2 looked very flat indeed, and sometimes very near objects had ghosting round the edges. Is this because there is missing information? Does your solution suffer from this artifact?

The missing pixels are filled by repeating the background. As you suspected, this causes a sort of ghosting. The repeat of the background is not "correct", of course, but it's the least noticeable option. Retrieving the correct pixels would be too costly and would defeat the purpose of reprojection. The artifacts from the background repeat are the biggest disadvantage of reprojection.
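
To make that a bit more concrete: one way to fill a hole is to walk out of it to both sides and copy the colour of whichever edge is farther away, so the background gets stretched over the gap rather than the foreground. A simplified GLSL sketch (illustrative only, not the shader from the demo; it assumes the depth buffer was cleared to 1.0 wherever no pixel landed):

#version 130

uniform sampler2D holeColor;   // reprojected image, with holes
uniform sampler2D holeDepth;   // matching depth, 1.0 where nothing landed (assumption)
uniform vec2 texelSize;        // 1.0 / resolution
uniform int  maxSteps;         // widest hole we expect to fill

in  vec2 uv;
out vec4 fragColor;

void main()
{
    if (texture(holeDepth, uv).r < 1.0)
    {
        fragColor = texture(holeColor, uv);     // not a hole: keep the pixel
        return;
    }

    // Walk left and right until we leave the hole.
    vec2 left = uv, right = uv;
    for (int i = 0; i < maxSteps; ++i)
    {
        if (texture(holeDepth, left ).r >= 1.0) left.x  -= texelSize.x;
        if (texture(holeDepth, right).r >= 1.0) right.x += texelSize.x;
    }

    // Repeat the *background* edge (the larger depth), not the foreground one;
    // stretching the foreground over the gap would be far more noticeable.
    float zl = texture(holeDepth, left ).r;
    float zr = texture(holeDepth, right).r;
    fragColor = (zl >= zr) ? texture(holeColor, left) : texture(holeColor, right);
}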

I think the reason Crysis 2 looks flat is the depth settings used (eye distance and focal plane distance). Have you tried increasing the depth in the settings?
I don't think the reprojection technique itself causes the flatness, but this is very hard to judge objectively because depth perception is very hard to measure. In the reprojection tech demo on my site you can change all the settings and create a really intense depth effect.
Maybe the reason Crysis 2 uses conservative depth settings is that it makes the artifacts smaller.
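
For reference, with the usual off-axis stereo setup the screen parallax of a point at depth z is roughly

    parallax(z) = eyeSeparation * (1 - focalPlaneDistance / z)

so points on the focal plane get zero parallax, nearer points come out of the screen, and distant points approach the full eye separation. The width of a reprojection hole at a depth edge is the parallax difference between the near and the far surface, which is why conservative settings flatten the image and shrink the artifacts at the same time.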

My solution has results similar to Crysis 2 (though I'm not sure if Crytek's technique allows the focal plane distance to be changed). So my solution also has a background repeat which causes similar artifacts.
I have also been thinking about the dual reprojection approach sebbbi explained. In that approach, the missing information can be obtained by reprojecting the previous frame.

If you compare reprojection (with the missing pixel artifacts) to the alternative of rendering twice (with a halved resolution, lower LOD, etc), what do you prefer?
Personally, I prefer the missing pixel artifacts. I think they are barely noticeable when viewing them in S3D on a stereoscopic display.
 
I have researched reprojection-based techniques in our recent game projects. I feel that reprojection-based techniques will play a major part in future graphics engines, since the cost of generating a single screen pixel rises all the time. Most pixels on the screen do not change that much from frame to frame (especially in a 60 fps game). It would be a huge waste if we cannot find efficient ways to reuse this already-calculated data.

I totally agree with your view on reprojection, sebbbi. Reusing rendered data can greatly improve quality/performance.
It's very interesting to hear about your experiments. I'm considering investigating reprojection for games in general as my master's thesis topic.

Halfway through our last game project (Trials HD), our engine rendered only 20 true frames per second. To achieve 60 fps, we rendered all the objects without any lighting or material processing 60 times a second. This pass used two different rendered views (future and past) and determined the correct sample to use by means of the reprojected depth coverage. The result was surprisingly good (we could actually achieve 120+ fps with the final code), but there were some ghosting artifacts and we disliked the 20 fps shadow update rate (we could have separated shadowing from lighting, but that has its own problems).
You need to wait for the next (future) frame to be rendered, which adds latency. Was that delay a problem, in your experience?

I have also been toying with stereoscopic reprojection. I personally think interleaving the eye update (odd frames left, even frames right) and using two input textures/depth buffers as the reprojection source (the other eye, and the last frame of the same eye) would provide pretty good quality. There should be far fewer occluded (missing) areas when you can select from two images. Of course you need a fast update rate (60 fps) in your game to make the last frame as correct as possible (moving shadows are the most notable offender).
The stereoscopic reprojection technique you explained is very interesting. I've been thinking about the same approach, but I'm really worried about each eye alternating between a true and a reprojected image. Doesn't that cause severe eye strain?
 
I think the reason Crysis 2 looks flat is the depth settings used (eye distance and focal plane distance). Have you tried increasing the depth in the settings?
Yes, this is with the depth set to 100% in game. To me it appears as though the gun and hands are in stereo 3D, with the HUD in the foreground and the rest of the scene on a background plane. If this had been my only exposure to stereo 3D gaming, I wouldn't have bought the LCD shutter glasses 10 years ago, or replaced them with two NVIDIA 3D Vision kits in recent years. It just doesn't 'wow' me.

Maybe the reason Crysis 2 uses conservative depth settings is that it makes the artifacts smaller.
I thought that too.

If you compare reprojection (with the missing pixel artifacts) to the alternative of rendering twice (with a halved resolution, lower LOD, etc), what do you prefer?
Personally, I prefer the missing pixel artifacts. I think they are barely noticeable when viewing them in S3D on a stereoscopic display.
Having not seen the reduced resolution and LOD in comparison, I couldn't tell you. I don't have these issues using the 3D Vision system; as you're no doubt aware, it's full resolution, with no LOD adjustment and no missing information. I see your point though, that something must be sacrificed to keep the same performance, if performance is an issue.

I can't help but be dissatisfied with the only 3D game using reprojection I've seen so far; I would love to see a demo of your work and perhaps change my mind about it. If it could get a lot closer to regular stereo using two images, it could be a massive win for making stereo gaming more popular. What I've seen with Crysis, though, reminds me of 2D-to-3D conversion movies; they're bad for the industry, as they will probably turn people off 3D if that's the majority of what they're exposed to.
 
The reprojected 3D in Crysis is pretty weak to me.
The severely limited depth and convergence settings lead to a very flat image, and a lot of that is because the artifacts would become obnoxious if they were pushed much further.
In principle the concepts of reprojection or image interpolation are interesting, but the current solutions don't seem acceptable to me.
At the end of the day, 2x just doesn't seem like a big multiplier in processing terms. If the industry does decide 3D is important and designs content within its restrictions, I don't see that it would dramatically impact graphics, and I wonder just how much investment is warranted in these types of techniques.
 
Well, if some magic sauce is found if/when 3D is popular, then whoever uses it could have maybe 1.5 times the graphical horsepower for their app compared to the competition. I just don't know how 'faking' the missing pixels is going to end up with any sort of nice result, but then I'm not the one doing the research.
 
I can't help but be dissatisfied with the only 3D game using reprojection I've seen so far; I would love to see a demo of your work and perhaps change my mind about it.

You can download a demo from my website: http://www.marries.nl/projects/stereoscopic-3d/
All stereo parameters can be changed inside the demo. I'm really curious which settings you prefer!

I hope the demo works for your S3D setup. If it doesn't work, please PM/e-mail me.
 
Sadly the consumer level 3DVision system only works with DirectX applications, and your demo is OpenGL only.

I read portions of your paper with interest. Did you actually implement the hybrid version? If so I have a few questions:

  • Did it completely remove the missing pixel problem?
  • What was the performance penalty compared to the old brute-force method and the standard reprojection method?

When rendering two completely distinct views I often see the same point in space with a different colour in each eye, due to shading, reflection or whatever, as the light is reflected with different energy in different directions. This is true to real life. Using reprojection are we losing that? Reading your paper it looked to me (a layman!) that colour data was calculated before the horizontal shift.
 
Sadly the consumer level 3DVision system only works with DirectX applications, and your demo is OpenGL only.

Unfortunately we couldn't support consumer-level NVIDIA 3D Vision because, if I remember correctly, it's impossible to supply self-rendered stereoscopic images to NVAPI (at least with the public API). The demo does work on most 3D TVs and passive displays.
Too bad you cannot test the demo properly!
Too bad you cannot test the demo properly!

I read portions of your paper with interest. Did you actually implement the hybrid version? If so I have a few questions:

  • Did it completely remove the missing pixel problem?
  • What was the performance penalty compared to the old brute-force method and the standard reprojection method?
We haven't implemented the hybrid version, unfortunately. (Hybrid: only render close objects twice, use reprojection for the rest.)
It won't completely remove the missing pixel problem (otherwise the reprojection wouldn't be saving any work), but the missing pixels probably won't be noticeable because the affected area is usually only 1-2 pixels wide at some distance.
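
To make the idea a bit more concrete: the split could hypothetically be done by letting the reprojection pass fill only the pixels beyond some depth threshold, and then rendering the close objects properly for that eye in a second geometry pass. A rough GLSL sketch of the first pass (we haven't implemented this; the naive shift below just stands in for the real gather-based reprojection):

#version 130

uniform sampler2D srcColor;
uniform sampler2D srcDepth;     // linear view-space depth (assumption)
uniform float nearThreshold;    // anything closer than this is rendered twice instead
uniform float eyeSeparation;
uniform float convergence;

in  vec2 uv;
out vec4 fragColor;

void main()
{
    float z = texture(srcDepth, uv).r;

    // Close geometry: leave the pixel for the true second-eye render pass.
    if (z < nearThreshold)
        discard;

    // Distant geometry: cheap horizontal shift as a stand-in for the real
    // gather-based reprojection; errors there are only 1-2 pixels wide anyway.
    float parallax = 0.5 * eyeSeparation * (1.0 - convergence / z);
    fragColor = texture(srcColor, uv + vec2(parallax, 0.0));
}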

When rendering two completely distinct views I often see the same point in space with a different colour in each eye, due to shading, reflection or whatever, as the light is reflected with different energy in different directions. This is true to real life. Using reprojection are we losing that? Reading your paper it looked to me (a layman!) that colour data was calculated before the horizontal shift.
Yes indeed, you will lose that. But those differences are very small, if I'm correct. I think it's not noticeable.
 
Hmm, you say it's not noticeable, but I notice it! Out of interest, do you have much experience gaming in stereo? I've been using the nVidia stuff for the better part of 10 years. Perhaps I've become hard to please, but I think stereo gaming is already at enough of a disadvantage without limiting the experience even further.

Judging by the reaction of stereo gamers to Crysis 2, few were pleased at all with what we were offered. Literally the only benefit was the performance, but most would rather have 'proper' 3D. Of course this is hearsay, but perhaps you could run a survey as part of your research. I don't know if you'd like the results though!

By rendering everything twice we appear to get horizontal supersampling, akin to 2x1 SSAA, as the images are combined by the brain. This acts like FSAA, and combined with very moderate MSAA it looks wonderful. I think shifting the pixels with reprojection would lose this benefit too?

Perhaps as an enthusiastic PC gamer I'm the wrong audience for this work. I care about quality enough to want the best options available, and I'm willing to spend the money to make sure the performance is there for it. For consoles this may well be the best solution, I don't know.

Going back to recreating the lost data: surely it's better to render just the missing parts (from a mask? - work out what isn't there, then draw it) than to render the entire scene twice, as per the brute-force method?

I'm sorry if I sound overly negative, I have great respect for your work and wish you all the best with it.
 
Don't worry about having criticism. I posted here to receive such feedback!

A user study would indeed be very interesting. That would make it clear what is noticeable and what isn't. I'm especially interested in results for the "hardcore 3D users".

About the points having different colors in each eye, did you experience that with the 3D Vision system? I checked some documentation but I can't find anything about taking the changed camera position into account for shading.

For recreating the missing data, I think using temporal data (as sebbbi explained) is very promising.
The current solutions available are probably not yet suited for experienced stereo 3D PC gamers. For the consoles with their limited resources, I still think it's a good option (with the correct settings, of course).
 
About the points having different colors in each eye, did you experience that with the 3D Vision system? I checked some documentation but I can't find anything about taking the changed camera position into account for shading.

I'm not sure why it wouldn't, being that the scene is rendered from two unique positions. Camera location surely has an effect on colour data unless the shader is isotropic.

If you look at the section "How 3D Vision Automatically Fits In" - you'll see that there is a pixel shader stage right at the end, per eye.
 
Having said all that... I've just reread the document and come across this statement, under the heading "Wrong is Right"

A subtle aspect of rendering in stereoscopic is that what is correct is not always right.
Sometimes it is better to reduce eye strain than to be physically correct. This is
particularly true for users that play with high values for separation.
One example of this is with very strong, very tight specular highlights. To be physically
accurate, the application would need to be aware of the actual view vector for each eye,
and then compute the specular highlights accordingly. In testing, NVIDIA has found
that using a unified camera that matches the specular highlight in both eyes reduces
eyestrain and feels better. As an added bonus, this method requires no additional effort
on the part of the developer.

The last sentence makes me wonder if my earlier assumption was correct.
 
That quote seems to confirm my presumption. The camera location is implicitly changed by transforming the vertex positions after the vertex shader. This does not affect any cameraPosition/viewVector variables which are used to calculate a specular highlight.
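
To illustrate with a generic Blinn-Phong snippet (nothing 3D Vision specific, just an example): as long as the view vector comes from a single cameraPosition uniform, the highlight comes out identical in both eyes, regardless of how the vertex positions were shifted per eye.

#version 130

uniform vec3  cameraPosition;  // one "unified" camera, not a per-eye position
uniform vec3  lightDir;        // normalized direction towards the light
uniform vec3  lightColor;
uniform float shininess;

in  vec3 worldPos;
in  vec3 worldNormal;
out vec4 fragColor;

void main()
{
    vec3 N = normalize(worldNormal);
    vec3 V = normalize(cameraPosition - worldPos);  // identical in both eyes
    vec3 H = normalize(V + lightDir);
    float spec = pow(max(dot(N, H), 0.0), shininess);

    // Both eyes get exactly the same highlight, which is what NVIDIA's
    // "unified camera" recommendation amounts to.
    fragColor = vec4(lightColor * spec, 1.0);
}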

Maybe you've seen the different colors in each eye in a game which uses a different API?
Anyway, it's very hard to know for sure. During my research I've experienced numerous times that the human visual system can be very treacherous. The eyes tend to see what they want to see. This is especially true for depth perception because it is very intangible. A decent user study could answer such perceptual questions.
 