Motion compensation and reusing last frame image data

sebbbi

Veteran
Our current/previous generation graphics rendering technology supports full screen motion blur. This is implemented by storing the last frame's full WVP matrix of each object after rendering, and then, during the next g-buffer render pass, storing the screen space xy motion difference in each pixel (one 8888 render target stores screen space motion.xy and normal.xy). In the scene post process pass, we take 7 samples of the input image along the motion vector for each output pixel to generate the motion blurred result. This technique provides a pretty good result and makes the game feel much more fluid.
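As a rough illustration, a minimal HLSL sketch of such a gather pass could look like the code below. The sampler names, the motion vector packing and the tap spacing are my assumptions, not the exact shader we ship:

// Post process motion blur: average 7 taps along the per-pixel motion vector.
sampler2D sceneTex;   // rendered scene color
sampler2D motionTex;  // 8888 RT: motion.xy in rg (biased), normal.xy in ba

float4 MotionBlurPS(float2 texCoord : TEXCOORD0) : COLOR0
{
    // Unpack motion from [0,1] storage back to a signed screen space offset.
    float2 motion = tex2D(motionTex, texCoord).rg * 2.0 - 1.0;

    float4 color = 0;
    [unroll]
    for (int i = -3; i <= 3; i++)
    {
        // Taps spread along the motion vector, centered on the current pixel.
        color += tex2D(sceneTex, texCoord + motion * (i / 3.0));
    }
    return color / 7.0;
}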

If you want to check out this technology, you can get the free demo version of our Trials 2 Second Edition on Steam:
http://www.steampowered.com/v/index.php?area=app&AppId=16600

But now to the real topic.

I bought a brand new 100 Hz 1920x1080p television set and watched some Blu-ray movies on it. The image feels very lifelike and the motion is extra smooth because of the very well implemented motion compensation system built into the TV. The HDTV motion compensation works very well even without having access to any real data about the scene structure. The animation in a graphics engine is much easier to predict, as we have much more information to use (compared to a single color bitmap). And there are no problems with UI/HUD overlays either, as it's easy to motion compensate only the real scene, and render the UI/HUD over the estimated image every frame. So I had to start experimenting.

My first experiment was to use the same motion vectors I use in my motion blur implementation to provide the information to the motion compensation system. The motion compensation post process shader just samples one texel per output screen pixel by using a shifted texture coordinate (finalTexCoord = texCoord - timeDifference * motionVector). This is basically free compared to the work needed for the real scene rendering. The result is good overall if the estimation time period is very short (high real frame rate), but has some noticeable graphical glitches at object edges where adjacent pixels are moving at vastly different speeds (in the screen space coordinate system). Also, this technique does not improve the input latency at all. 15 fps looks like 60 fps to an outsider (with 4 motion compensated frames for each real frame), but when you play the game yourself the controls feel as lagged as when playing a 15 fps game.
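A minimal sketch of that compensation pass, assuming the motion vectors are read from the same 8888 render target as above (names and packing are again my guesses):

sampler2D realFrameTex;  // last rendered real frame
sampler2D motionTex;     // per-pixel screen space motion stored during the real frame
float timeDifference;    // time since the real frame, in real frame units

float4 MotionCompensatePS(float2 texCoord : TEXCOORD0) : COLOR0
{
    float2 motionVector = tex2D(motionTex, texCoord).rg * 2.0 - 1.0;
    // One tap per output pixel: shift backwards along the stored motion vector.
    float2 finalTexCoord = texCoord - timeDifference * motionVector;
    return tex2D(realFrameTex, finalTexCoord);
}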

Good:
- Motion compensation shader is basically free
- Easy 2-4 times improvement to the frame rate
- Looks very convincing on the majority of the pixels

Bad:
- Does not improve input latency at all (60 fps with 4x motion compensation still feels like 15 fps when you control the game)
- Visible graphical glitches on object edges moving at different speeds (or in different directions)
- Requires a high real frame update rate to minimize the artifacts. But quadrupling 60 fps -> 240 fps does not offer any real benefit.

Time for the second experiment.

In Trials 2 SE, we have an (optional) simple forward rendering alternative to our deferred renderer, so that players with business laptops and low end graphics cards can also play the game. The simple forward rendering system doesn't have any real time lighting, shadow calculation, post processing or fancy material system, and runs the game at 600+ fps on my computer (the deferred shading version runs at 50-60 fps). However, it renders all the same geometry as the deferred shading path (excluding shadow geometry of course). The system has a really low CPU overhead, as we are basically using the same highly optimized animation, culling and scene management system as we used in our previous release: Warhammer 40K Squad Command for PSP/NDS. A PC has several orders of magnitude more CPU power than the PSP and Nintendo DS, so the scene management and culling are basically free. This means that I can render the whole scene geometry (except shadows) with a simple pixel shader almost as many times as I like without noticeably decreasing the game performance.

As with my previous experiment, I borrowed stuff from my motion blur implementation again. I modified the system to update the objects' last frame matrices only when a "real" frame is rendered. The real frame is rendered to a separate render target, so we can access it during the motion compensation frames.

In the motion compensation frames, I render the whole geometry again (with updated game logic info if available). But instead of using the object materials, every object is textured with a single texture: the last rendered real frame. Texture coordinates are calculated by using the stored object "last frame" matrices (which are updated only during the real frame rendering). So basically each object vertex is mapped to the real frame image, and these texture coordinates are used during the motion compensation frames.
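A hedged sketch of how that vertex mapping could look in HLSL (the matrix and struct names are my own; the real implementation may pack things differently):

float4x4 worldViewProj;          // current frame WVP of this object
float4x4 lastFrameWorldViewProj; // WVP stored when the last real frame was rendered

struct VS_OUT
{
    float4 pos       : POSITION;  // where the vertex is on screen right now
    float4 cacheClip : TEXCOORD0; // clip space position of the same vertex in the real frame
};

VS_OUT MotionCompVS(float4 pos : POSITION)
{
    VS_OUT o;
    o.pos       = mul(pos, worldViewProj);
    o.cacheClip = mul(pos, lastFrameWorldViewProj);
    return o;
}

// In the pixel shader the interpolated clip space position becomes a texture
// coordinate into the cached real frame (perspective divide done per pixel):
// float2 cacheUV = cacheClip.xy / cacheClip.w * float2(0.5, -0.5) + 0.5;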

With this technique the scene is rendered very rapidly, and the player commands are processed and visible without any delays. The only thing updating slowly is the texture mapped to this geometry. Usually, however, the object surface texture stays pretty much identical from frame to frame (with specular highlights being the highest frequency change to the pixel colors). We can update the texture, for example, once every 16 frames, and most of the surfaces look identical to the real rendering. And we get a 16x improvement to the frame rate.

However, there is a "slight" problem. We are using the last frame image, and when the camera and the objects move, pixels that were previously hidden behind objects become visible. We do not have surface colors stored for these pixels at all; the stored texture has another object rendered on top of the mapped area. However, we can easily detect these pixels in the motion compensation shader by storing the rendered image depth in the texture alpha channel (low precision is enough). If the calculated input texture coordinate (using the last frame matrix) produces a different z-coordinate than the value stored in the alpha channel, the texture pixel belongs to another object.
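A minimal sketch of that test, assuming the cache stores post-projection depth in its alpha channel (the 0.01 threshold is just a guess):

sampler2D cacheTex;  // rgb = cached real frame color, a = cached depth

// Returns 1 if the cached texel really belongs to this surface, 0 if another
// object was rendered on top of it in the cached frame.
float CacheHit(float2 cacheUV, float surfaceDepth)
{
    float cachedDepth = tex2D(cacheTex, cacheUV).a;
    return abs(cachedDepth - surfaceDepth) < 0.01 ? 1.0 : 0.0;
}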

Now we can first render the scene using any number of different camera and object matrices and store the results to textures (color in rgb and depth in the a channel). Then we can sample these textures in the motion compensation shader repeatedly. For each pixel we only use the texture maps that see that pixel. Each screen pixel that is visible in at least one of these textures is rendered correctly. If the textures cover all surfaces, we can animate the scene any way we like, and it will be rendered correctly for any number of frames (barring any texture animation or lighting changes). We also get an additional benefit: most of the pixels are visible in more than one input texture. The colors of the visible texels are added together and the result is divided by the number of visible texels. This is basically supersample antialiasing for the textures (and it also antialiases pixel shader effects, unlike hardware MSAA, which only antialiases geometry).

Rendering the scene from x viewports and with different object matrices to get all the surfaces visible is of course impractical, and there is no need to store all that surface data, as we are updating the source texture(s) frequently (at 15 fps if we want 60 fps and use 4x motion compensation). If we can perfectly predict the world state at the next real frame rendering (one real frame in the future), we can render the scene using those object and camera matrices to the texture, and after 4 motion compensated frames the predicted texture aligns perfectly with the rendered screen. Every rendered pixel can be found in that texture by the motion compensation shader. We also need to remember the last rendered texture, otherwise the surface pixels that become hidden during the 4 frame motion compensation period would be inaccessible. But this is not a problem: we can easily render the new predicted texture to one render target on odd frames and to another on even frames. The last texture is always available in the other render target, and there is no extra work needed.

If the texture refresh rate is high enough (every 4 frames for example), and the future object and camera matrices are predicted correctly, every rendered pixel is always visible in one of these 2 textures. The scene is rendered perfectly. However, if the prediction fails badly, some of the pixels that become hidden or visible during these 4 motion compensated frames might be rendered incorrectly. This does not happen that often in my test case (Trials 2 SE). I also implemented the system so that the motion compensation shader uses the pixel from the last on-screen frame if neither of the textures contains a visible texel for the rendered pixel.
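Putting the pieces together, the resolve pass for the motion compensated frames could look roughly like this sketch. It samples both cache pages (the predicted one and the previous one), rejects texels whose stored depth disagrees, averages the valid samples, and falls back to the last on-screen frame when neither page sees the surface. All names and the depth threshold are assumptions:

sampler2D cachePredicted;  // scene rendered with the predicted next frame matrices (rgb + depth in a)
sampler2D cachePrevious;   // the previous cache page (rgb + depth in a)
sampler2D lastScreen;      // the last frame that was actually shown on screen

float4 ResolvePS(float4 predClip : TEXCOORD0,  // this pixel projected with the predicted matrices
                 float4 prevClip : TEXCOORD1,  // this pixel projected with the previous matrices
                 float2 screenUV : TEXCOORD2) : COLOR0
{
    float2 uv0 = predClip.xy / predClip.w * float2(0.5, -0.5) + 0.5;
    float2 uv1 = prevClip.xy / prevClip.w * float2(0.5, -0.5) + 0.5;
    float4 c0 = tex2D(cachePredicted, uv0);
    float4 c1 = tex2D(cachePrevious, uv1);

    // Depth test against the value stored in the alpha channel (see above).
    float hit0 = abs(c0.a - predClip.z / predClip.w) < 0.01 ? 1.0 : 0.0;
    float hit1 = abs(c1.a - prevClip.z / prevClip.w) < 0.01 ? 1.0 : 0.0;

    float hits = hit0 + hit1;
    if (hits > 0.0)
    {
        // Surfaces seen by both pages get two samples averaged: free 2x supersampling.
        return float4((c0.rgb * hit0 + c1.rgb * hit1) / hits, 1.0);
    }
    // Neither cache page sees this surface: reuse the last on-screen pixel.
    return tex2D(lastScreen, screenUV);
}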

Good:
- Reduces input latency to 1/2 (2x motion compensation) or 1/4 (4x motion compensation). The game feels really fluid.
- Looks very convincing. The real geometry is rendered and animated perfectly every frame. Only the textures mapped to the geometry can contain minor color issues.
- Around 3x frame rate improvement on 4x motion compensation.
- Provides free 2x SSAA to all polygon surfaces that are visible on both (current and next) real frames (not on object edges). This nicely reduces shimmering caused by some pixel shader effects.
- This system is fully compatible with hardware MSAA. The deferred renderer is only used to render the surface textures.

Bad:
- Requires prediction of future object matrices (one real frame = 4 motion compensated frames ahead)
- Minor visible graphical glitches on pixels that become visible/hidden if the future matrices are not predicted perfectly
- Lighting/specular highlight and texture animation update rates are not improved
- For a constantly smooth frame rate this technique requires that the real scene rendering can be split into 2-4 separate parts (in our case: g-buffer rendering, lighting/shadowing 1, lighting/shadowing 2, post process effects). Otherwise there will be visible stuttering every 2nd/4th frame when the real frame is rendered to the buffer.

I don't have a demo ready yet (that I am allowed to publish), but if I get this feature polished (and all the menu modifications done) before the Trials 2 SE v1.08 launch, I will include it in the game.

This technique is in no way perfect yet. But with these promising results, I predict many games will use similar techniques in the future as surface texture calculation becomes more and more expensive (global illumination, etc). All posts about similar techniques you have implemented in your own projects are naturally very welcome.
 
I don't understand this bit:
"
- Easy 2-4 times improvement to the frame rate


Bad:
- Does not improve input latency at all (60 fps with 4x motion compensation still feels like 15 fps when you control the game)"

First you say it improves fps by 2-4x, then you say it feels like it reduces fps by 4x.
Why do you need motion compensation anyway?
And if you do, why not use the exact same technique video does?

As for the "still feels like 15 fps when you control the game", I guess it's because you're also predicting input, so why not have a system in place that discards motion compensation if the input is not as predicted?
 
I don't understand this bit:
"
- Easy 2-4 times improvement to the frame rate


Bad:
- Does not improve input latency at all (60 fps with 4x motion compensation still feels like 15 fps when you control the game)"

First you say it improves fps by 2-4x, then you say it feels like it reduces fps by 4x.

As for the "still feels like 15 fps when you control the game", I guess it's because you're also predicting input, so why not have a system in place that discards motion compensation if the input is not as predicted?

You misunderstood me. The input latency stays the same with the first technique. The game runs at 15 fps and the motion compensation makes it look like 60 fps. However, if the player presses a button, there is at least a 1/15 s delay before the button press is visible on the screen. So this basically does not help make controlling the game feel smoother, only the movement look smoother.

In the first technique I store the motion vectors during the real frame rendering, and the motion vectors are identical in all of the motion compensation frames. In this example the motion vectors and the frame are updated only every 4th frame (15 times per second).

Motion compensation can of course be discarded, but then the game just displays the last frame and we are back at 15 frames per second. If we have a game with one hundred moving enemies on screen, there are always one or two of them that have been mispredicted. Those pixels show minor errors, not the whole scene.

why not use the exact same technique video does

Because videos do not have to deal with input lag. When motion compensating video, you have access to both this frame and the next frame when you generate the motion compensated frames between them. The latency has no meaning. Even a 5 second latency would not cause any noticeable problems (just delay the sound stream by 5 seconds and everything works perfectly). In a game, video-style motion compensation with 4 motion compensated frames would cause an additional 4 frames of input lag, and essentially make the game unplayable.

why do you need motion compensation anyway?

This technique is not really motion compensation in the same sense as with videos. We are actually optimizing the rendering process by reusing data from previous frames, so we have to render less every frame. Additionally, we get 2x SSAA for free. Who wouldn't want a 3x frame rate (both visual and control) without any noticeable rendering errors?
 
Hm... I admit, my first thought was all those folks saying that Crysis felt smoother than it should despite the framerate... Could it be possible that the Crytek folks have done something similar (if not the same)?

I might have missed it (long read!) but I don't suppose increasing the tick rate would help input latency, would it?
 
To make it more clear:
- The first technique listed is not something I would ever release. It's just there for reference.
- The second technique is the newest one, and the one I am developing further in the future.
- The next frame object/camera matrix prediction is not needed for sequences without player control (for example FMV sequences shown by the game engine). In these cases you can render 4 frames ahead, and get 3x fps and no prediction errors at all.
 
Interesting to hear of at least one other person using frame recirculation in a real renderer; I'm also sampling from the previously drawn frame (verts have both current and previous frame positions). A lot of conventionally minded rendering people think it is crazy. However, there is a lot of ability to factor work out across many frames this way. One downside is that fetching texels from a previous framebuffer usually (or always) means that you are fetching from a linear texture (not as texture cache friendly).

I'm only using the results of the previous frame, not rendering to any kind of surface cache (sounds like you are doing a surface cache?).

Filling in data which was hidden or offscreen in the previous frame can usually be hidden by motion blur. I also use a lower mipmap of the previous framebuffer (I generate the mipmaps for other reasons as well) to fill in holes in the rendering (simple diffusion).
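If I understand the hole filling right, a minimal sketch of that fallback would be something like this (the mip level and names are arbitrary assumptions on my part):

sampler2D prevFrameTex;  // previous frame with a mip chain generated

float4 FillHole(float2 uv)
{
    // Sample a lower mip so neighbouring colors bleed into the disoccluded
    // pixel, giving a crude diffusion-style fill.
    return tex2Dlod(prevFrameTex, float4(uv, 0.0, 3.0));
}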

Are you doing depth aware motion blur?
 
My first experiment was to use the same motion vectors I use in my motion blur implementation to provide the information to the motion compensation system. The motion compensation post process shader just samples one texel per output screen pixel by using a shifted texture coordinate (finalTexCoord = texCoord - timeDifference * motionVector). This is basically free compared to the work needed for the real scene rendering. The result is good overall if the estimation time period is very short (high real frame rate), but has some noticeable graphical glitches at object edges where adjacent pixels are moving at vastly different speeds (in the screen space coordinate system). Also, this technique does not improve the input latency at all. 15 fps looks like 60 fps to an outsider (with 4 motion compensated frames for each real frame), but when you play the game yourself the controls feel as lagged as when playing a 15 fps game.

Good:
- Motion compensation shader is basically free
- Easy 2-4 times improvement to the frame rate
- Looks very convincing on the majority of the pixels

Your first experiment would work great for in-game rendered cutscenes, which are by nature non-interactive... very cool idea :D.

Well, unless these

"- Visible graphical gliches on object edges moving at different speeds (or at different direction)
- Requires high real frame update rate to minimize the artifacts"

were too noticeable I guess.

I am sure if you made the decision to keep the first experiment only for future reference there was a more than valid reason :).
 
Sebbbi,
Could this be relevant to your work? Accelerating Real-Time Shading with Reverse Reprojection Caching (presented last year at Graphics Hardware 2007)

Interesting... this technique is very similar to the one I have implemented. The reverse reprojection system is identical to mine, but they have implemented the cache miss handling (a pixel becoming visible that was not visible in the last frame) very differently (the cache miss detection, however, is also identical to mine).

FWIW I did ask the author(s) if they'd considered doing bidirectional "motion" (akin to the B frames in MPEG) but they hadn't. I was thinking it would be good for temporal AA.

The system I have implemented basically does both temporal AA and 2x SSAA on every surface (but not on the object edges).
 
Interesting to hear of at least one other person using frame recirculation in a real renderer; I'm also sampling from the previously drawn frame (verts have both current and previous frame positions). A lot of conventionally minded rendering people think it is crazy. However, there is a lot of ability to factor work out across many frames this way.

I find it quite strange that the majority of graphics engine developers haven't even considered reusing the last frame's pixel information. Most of the pixels stay almost the same (they only move or rotate according to the new object and camera matrices). By using the last frame data, we can easily achieve a major performance boost (and gain extra quality from the additional pixel samples we have available).

I'm only using the results of the previous frame, not rendering to any kind of surface cache (sounds like you are doing a surface cache?).

Yes, "surface cache" (or "surface cache prediction") is a very good description of the technique. I try to predict what surface pixels are visible in the next real rendered frame. I render the scene to a cache using the predicted matrices. When these surface pixels become visible, I already have the pixels ready for use in the cache. There are no cache misses this way (or at least this removes vast majority of them). If the prediction is correct, the predicted frame also becomes the real frame after the prediction period, so I don't have to render the real frame at all to the cache. At this point next prediction is made (there are always 2 cache paged used, and each surface pixel can be found at least on either one).
 
The problem with re-using the pixels from the last frame is that it is not AFR-friendly, thus multi-GPU performance will suffer a lot (if any gain is managed at all) as a result. If you want your technique to scale well with multi-GPUs you'd be better off implementing a method that is self-contained within your frame.
 
The problem with re-using the pixels from the last frame is that it is not AFR-friendly, thus multi-GPU performance will suffer a lot (if any gain is managed at all) as a result. If you want your technique to scale well with multi-GPUs you'd be better off implementing a method that is self-contained within your frame.

I have a Radeon 3870 x2 in my development computer, and I am seeing huge performance gains from this technique. Just one of the cache textures is updated every 4th frame (the other one becomes the last one, so it's already transferred to both chips). A cache texture is just a single back buffer sized 8888 texture without a separate z-buffer. Copying this texture to the other chip doesn't consume that much bandwidth. Latency can of course be an issue if you need it right away, but that's not really a problem, since you can predict a little bit further into the future and use the old textures for one extra frame to hide the latency.
 
I have a Radeon 3870 x2 in my development computer, and I am seeing huge performance gains from this technique.
The performance you are getting may not be due to Crossfire at all. Do you have a way to disable Crossfire to assess the amount of scaling you are getting?

Copying this texture to the other chip doesn't consume that much bandwidth.
Usually the main issue with multi-GPU is not so much the amount of data to transfer between GPUs, but the synchronization that is required to transfer this data (i.e. stalling one GPU). Thus even a 1x1 RT transfer can reduce scaling considerably when done at an inopportune time.
 
Usually the main issue with multi-GPU is not so much the amount of data to transfer between GPUs, but the synchronization that is required to transfer this data (i.e. stalling one GPU). Thus even a 1x1 RT transfer can reduce scaling considerably when done at an inopportune time.

Yes, but the synchronization is not an issue, as there is enough time to get the data into the second GPU's memory before it's needed (we can always render one extra frame using the old buffer if we just predict 1/4 of a real frame further ahead). However, as there is currently no way to tell the API that the buffer should be moved to the second GPU's memory as soon as it's complete, this all depends on how well the driver detects the scenario. As these cache buffers are always used by both GPUs, the driver should notice after a few frames that their data should always be duplicated to both GPUs' memory whenever it's ready. If the automatic SLI/Crossfire optimization systems do not catch this, the manufacturer will optimize it by hand (if the game is relevant enough). These are the cases that make the driver release notes say "+30% performance in game xxxx in SLI mode".

Update:
By jittering the shadow map coordinates (or the light coordinate), it's possible to create 2 sample soft shadows (or a shadow border quality increase) with this technique. Also, if we accumulate data into an accumulation buffer instead of just calculating an average of the 2 texture buffers, the soft shadow quality can be improved considerably. This technique can also be used to dramatically improve the quality and performance of screen space ambient occlusion and other real time diffuse global illumination estimation techniques.
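As a hedged sketch of the shadow idea: give each cache page a slightly different jitter when it samples the shadow map, so that the resolve pass averaging the two pages effectively produces a 2 sample soft shadow. The sampler, bias and jitter names below are assumptions:

sampler2D shadowMapTex;   // depth rendered from the light's point of view
float4x4  lightViewProj;
float2    jitterOffset;   // a different small offset for each cache page

// Used while rendering a cache page: one jittered shadow test per page.
float ShadowTerm(float3 worldPos)
{
    float4 lightClip = mul(float4(worldPos, 1.0), lightViewProj);
    float2 uv     = lightClip.xy / lightClip.w * float2(0.5, -0.5) + 0.5;
    float  depth  = lightClip.z / lightClip.w;
    float  stored = tex2D(shadowMapTex, uv + jitterOffset).r;
    return (depth - 0.002 < stored) ? 1.0 : 0.0;  // constant bias is a guess
}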
 