Realtime frame interpolation: upscaling 30fps to 60fps

Introduction: The problem these days is that, in order to achieve graphical fidelity matching current industry standards, developers most often have to sacrifice framerate.

Now what if you could take 30fps (or lower) and use motion interpolation to produce a picture that could be made to look like 60fps? Our friends at Digital Foundry recently wrote an article discussing this very thing. It seems this sort of thing can be achieved on current console hardware nearly for "free".

http://www.eurogamer.net/articles/digitalfoundry-force-unleashed-60fps-tech-article

Although it exhibits some artifacting, it is a lot less noticeable in motion, and with iteration the artifacting can probably be reduced.
http://images.eurogamer.net/assets/articles//a/1/2/3/9/2/6/4/interpolation_prototype.jpg.jpg

Slides:
http://and.intercon.ru/rtfrucvg_html_slides/ (Dmitry Andreev)
http://and.intercon.ru/releases/talks/rtfrucvg/

Movies:
360 Demo HD
http://www.megaupload.com/?d=EQUOQ9RW (177MB)

Prototype HD
http://www.megaupload.com/?d=2QN7NZ3N (111MB)

Both SD
http://www.megaupload.com/?d=UR37CHZP (90MB)
 
So I guess I'll get this started...

One of the things I was thinking was that this actually wouldn't improve input latency at all, because you'd essentially need the current frame and a future frame to build an interpolated frame. However, it seems they actually use the front buffer directly, meaning there is an update your eye will perceive sooner, thus improving latency.

The most simple and efficient solution is to do the interpolation in-place, during the current frame, while the previous one is on screen. This way the previous front buffer can be mapped as a texture (X360) and used for interpolation directly. In terms of latency there is something interesting going on. I said that there is no extra latency, which is true. But if you think about it, latency is actually reduced. Because we get the new visual result 16.6 ms earlier.
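
To put rough numbers on that claim (mine, not Andreev's), here is a tiny timeline calculation for a 30fps render feeding a 60Hz display:

```cpp
#include <cstdio>

// Back-of-the-envelope timeline for the scheme quoted above. Frame N stays
// on screen while frame N+1 renders; the interpolated image is flipped at
// the vsync in between, so new visual data appears one refresh (~16.6 ms)
// sooner than waiting for frame N+1 itself.
int main()
{
    const double refresh = 1000.0 / 60.0;  // display refresh period, ms
    const double render  = 1000.0 / 30.0;  // game render period, ms

    for (int n = 1; n <= 3; ++n)
    {
        double realFlip   = n * render;          // real frame n becomes visible
        double interpFlip = realFlip - refresh;  // interpolated flip before it
        std::printf("frame %d: interpolated image at %5.1f ms, real frame at %5.1f ms\n",
                    n, interpFlip, realFlip);
    }
    return 0;
}
```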
 
My concern with these systems is always what happens when they go out of their baseline operating parameters (i.e. sub 30fps). Credit where credit is due, he does seem to have examined this:

According to Andreev, if the game drops below 30FPS the PS3 is still able to carry out the "flip" at the right point between the two frames. However, on Xbox 360, Microsoft's TCRs - the technical rules which dictate what you can and can't do with its hardware - insist that all calls to the graphics hardware go through their own APIs, and there isn't an equivalent system in place on DirectX.

This sounds like if the framerate dropped to 25fps, the effective result of the algorithm would be 50fps. What that would look like in motion is anyone's guess - as the framerate dropped, the error in the approximation would increase significantly (the estimation period is about 20% longer at 25fps than at 30fps). Similarly, tearing would destroy the illusion completely, so it wouldn't be viable.
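
For reference, the arithmetic behind that 20% figure (my numbers, not from the article):

```cpp
#include <cstdio>

// At 25fps the gap the interpolator has to bridge grows from ~33.3 ms to
// 40 ms, i.e. the estimation period is ~20% longer, giving motion-vector
// errors more time to accumulate per generated frame.
int main()
{
    const double at30 = 1000.0 / 30.0;   // ms between real frames at 30fps
    const double at25 = 1000.0 / 25.0;   // ms between real frames at 25fps
    std::printf("estimation period: %.1f ms -> %.1f ms (+%.0f%%)\n",
                at30, at25, (at25 / at30 - 1.0) * 100.0);
    return 0;
}
```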

It does sound completely contradictory to the way things have progressed this generation. Why on the one hand say you're unwilling to make the graphical quality sacrifice to make your game run at 60fps, then simultaneously promote an algorithm which introduces artefacts into the output in order to give the illusion of a higher framerate? The way this gen has gone, I'd be surprised if someone somewhere wasn't thinking "I wonder if we can render at 15fps and interpolate to 30fps?!"
 
So how would a framerate analysis work in a game that uses this technique? Would it register 60 unique frames? Also, it would be nice to see this used on a 15fps game upscaled to 30 to see how well it can manage such a low framerate. If it's not really noticeable apart from the artifacts (which, according to the article, can be reduced if the game is designed with this technique in mind), it could mean a major boost in graphics this gen.
 
What I don't get is how it fills in the additional in-between frame without causing some lag. The 120Hz TVs that do something similar have some latency, because they first have to see the next frame before they can add the in-between one. You can see this at stores where they have the same movie running on multiple TVs simultaneously.

The sound is synchronized across all of them, but you will notice that the 120Hz TVs that smooth out the framerate have a slight lag compared to the usual 60Hz TVs.

With this technology, though, the action is completely 1:1. Motion starts from the same frame at the exact same time and ends on the same frame at the exact same time, unlike the above-mentioned example.

It's like their technology can predict what's going to happen before it actually happens. And real-time visuals are more unpredictable than recorded movies - camera positions and object placement can take infinite values. So how do they pull it off?

Does it read values in nanoseconds to know how to fill in the in-between frame so that the human eye can't notice the lag? :p
 
The assumption is that deferred shading is used, with most of the rendering time spent in shading and the motion vectors from the z-only pre-pass being available much earlier (and used for the in-between frame). Which is a decent assumption on present-generation consoles, but it's not an assumption which will hold in the future.
 
MfA said:
but it's not an assumption which will hold in the future.
Not necessarily an issue though - a great many 60fps games already run interpolated game logic at half framerate (the same delay this method would give if motion vectors were a frame late), and I've yet to see people complain about latency in any of them.

Anyway, at least this gen, these kinds of techniques could offer a nice way of working with stereoscopic rendering without killing the framerate.
 
The assumption is that deferred shading is used, with most of the rendering time spent in shading and the motion vectors from the z-only pre-pass being available much earlier (and used for the in-between frame). Which is a decent assumption on present-generation consoles, but it's not an assumption which will hold in the future.
Lighting, shadow map rendering and post-processing will be much more complex in future games. Games will start to include sophisticated real-time global illumination techniques (at a minimum, really sophisticated screen-space ambient occlusion with color bleeding). Many games will also utilize virtual texturing, and thus need to render a page request texture. Reusing last-frame data will be more important than ever.

A simple reverse reprojection pass (transform geometry by the last frame's matrix to get the last frame's position as a texture coordinate, use the current matrix to transform it to the correct position, then sample the texel at the last frame's position and output it) is really fast. The vertex shader is faster than a standard vertex shader, since the vertex format is really slim (no normals, tangents, bitangents, texcoords, lighting stuff, etc). The pixel shader is just a simple texture fetch from the last frame's texture (or two textures, choosing the correct one, in our approach). During Trials HD production we experimented with various techniques like this, and we could easily reach frame rates of 150+ with slightly more complex content than we shipped the game with. The final game didn't have any frame interpolation technology, since we got it to run at a vsync-locked 60 fps without it.
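
To make the shape of that pass concrete, here is a minimal CPU-side sketch of the two stages as described above - my own types, matrix conventions and function names, not Trials HD code:

```cpp
// Vertex stage: transform the position with the *current* view-projection
// to land on the correct output pixel, and with the *previous* frame's
// view-projection to get a texture coordinate into last frame's image.
// Pixel stage: a single fetch from the last-frame color buffer.

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };                // row-major

static Vec4 Mul(const Mat4& M, const Vec4& v)
{
    return Vec4{
        M.m[0][0]*v.x + M.m[0][1]*v.y + M.m[0][2]*v.z + M.m[0][3]*v.w,
        M.m[1][0]*v.x + M.m[1][1]*v.y + M.m[1][2]*v.z + M.m[1][3]*v.w,
        M.m[2][0]*v.x + M.m[2][1]*v.y + M.m[2][2]*v.z + M.m[2][3]*v.w,
        M.m[3][0]*v.x + M.m[3][1]*v.y + M.m[3][2]*v.z + M.m[3][3]*v.w };
}

struct VSOut
{
    Vec4  clipPos;        // where the pixel goes this frame
    float prevU, prevV;   // where the same surface point was last frame
};

// "Vertex shader": only a position is needed, which is why it is cheaper
// than a normal vertex shader (no normals, tangents, texcoords, etc.).
VSOut ReprojectVertex(const Vec4& worldPos,
                      const Mat4& currViewProj,
                      const Mat4& prevViewProj)
{
    VSOut out;
    out.clipPos = Mul(currViewProj, worldPos);

    Vec4 prevClip = Mul(prevViewProj, worldPos);
    // Perspective divide, then map NDC [-1,1] to texture space [0,1]
    // (Y flipped for the usual texture convention).
    out.prevU =  0.5f * (prevClip.x / prevClip.w) + 0.5f;
    out.prevV = -0.5f * (prevClip.y / prevClip.w) + 0.5f;
    return out;
}

// "Pixel shader": copy the color this surface point had last frame.
struct Color { float r, g, b; };

Color ReprojectPixel(const Color* lastFrame, int width, int height,
                     float u, float v)
{
    int x = static_cast<int>(u * (width  - 1) + 0.5f);
    int y = static_cast<int>(v * (height - 1) + 0.5f);
    if (x < 0 || y < 0 || x >= width || y >= height)
        return Color{0, 0, 0};    // point was outside last frame's view
    return lastFrame[y * width + x];
}
```

Disoccluded points (surface points that were hidden last frame) are the main failure case for any reprojection like this and need separate handling.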

There is no additional latency, assuming you run your game logic at 60 frames per second (the reverse reprojection pass uses its own frame's data; it does not need to predict anything). Many games run their game logic at 120 fps already (physics engines tend to break if the step size is too long). 60 fps game logic should not be a burden for most games. Graphics rendering is often the bottleneck.

Anyway, at least this gen, these kinds of techniques could offer a nice way of working with stereoscopic rendering without killing the framerate.
Agreed 100%. The stereo images contain such a high amount of identical surface pixel data that it would just be silly not to reuse it.
 
Lighting, shadow map rendering and post-processing will be much more complex in future games.
I bloody well hope geometry gets a fuckton more complex in future games as well ... I am getting sick of identifying polygons.
 
I bloody well hope geometry gets a fuckton more complex in future games as well ... I am getting sick of identifying polygons.
Tessellation keeps the performance loss quite a bit lower than simply bumping up the polygon count. With tessellation you get more polygons where you need them, and fewer polygons where you don't. Basically, the higher the view distance, the more performance gain tessellation gives you versus a brute-force approach. The current polygon count in games would be enough for future games if the (screen-space) polygon density were evenly distributed.

Also you have to account for the fact that shadow map triangle complexity rises as well (you often render more polygons to shadow maps than you do to the final rendered output).
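
As a rough illustration of the "more polygons where you need them" point (my own formulation, not from sebbbi's post), a distance-based tessellation factor could be computed along these lines:

```cpp
#include <algorithm>
#include <cmath>

// Pick a tessellation factor per patch so that on-screen triangle density
// stays roughly constant, instead of paying a fixed geometry cost
// regardless of view distance.
float PatchTessFactor(float patchWorldSize,    // patch edge length in world units
                      float distanceToCamera,  // from eye to patch center
                      float pixelsPerTriangle, // target screen-space density
                      float screenHeightPx,
                      float verticalFovRadians)
{
    // Approximate projected size of the patch edge in pixels.
    float projectedPx = (patchWorldSize / distanceToCamera)
                      * (screenHeightPx / (2.0f * std::tan(verticalFovRadians * 0.5f)));

    // Subdivide until each generated edge covers roughly pixelsPerTriangle pixels.
    float factor = projectedPx / pixelsPerTriangle;
    return std::clamp(factor, 1.0f, 64.0f);    // typical hardware tess factor range
}
```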

[Update]

My prediction about the future is that games will rely more and more on pixel-based techniques that have a constant (average) performance hit and memory footprint per outputted pixel. Vertex-based techniques scale poorly and result in fluctuating, unpredictable frame rates.

Personally, the system I would prefer would be fully virtual textured (the texture memory footprint would be constant - around 4x the screen resolution), and all surfaces would have per-pixel displacement streamed from the virtual texture. A highly advanced version of parallax occlusion mapping would be used on all surfaces to create the detail from the per-pixel displacement map. The new DX11 hardware supports conservative oDepth output from the pixel shader (hi-z culling works with oDepth output), and this makes proper parallax occlusion mapping much more efficient to implement (and results in pixel-perfect silhouettes and pixel-perfect depth output for deferred lighting).

The best thing about per-pixel displacement mapping (parallax occlusion mapping, etc.) is that the performance hit does not rise when your scene complexity rises. It only increases the processing you do for each final image pixel by a constant factor (something that vertex-based technologies can never achieve). With a proper texture streaming system (virtual texturing) you can have highly detailed displacement maps everywhere, since the memory bottleneck disappears. Basically you can have unlimited geometry detail everywhere (if you have enough HDD space or use procedural techniques). The added displacement map detail does not decrease rendering performance at all, since the hardware mipmapping (and the virtual texture streaming system) makes sure that data of the correct precision is always used. This basically solves the geometry LOD issue.

Representing geometry with super-high-polygon meshes takes a huge amount of memory, since the positional data of each point needs to be stored (often in floating point formats) along with the polygon connectivity data. The more polygons we have, the more space and bandwidth we waste and the more meaningless work we do (processing polygons that are less than one pixel in size, transforming vertices that are occluded or otherwise culled, etc.). This generation of console hardware was not powerful enough to do per-pixel parallax occlusion mapping everywhere. The next generation surely will be, and I think that many developers will go that route. According to our experiments, we do not even need a real polygon count as high as we currently use to produce geometry detail that looks absolutely stunning.
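
For anyone unfamiliar with the technique, here is a minimal sketch of the core parallax occlusion mapping ray march - my own code and conventions, with a stand-in procedural height fetch in place of the streamed virtual texture:

```cpp
#include <cmath>

struct Float2 { float x, y; };
struct Float3 { float x, y, z; };

// Stand-in height fetch: a real renderer would sample the per-pixel
// displacement map streamed in via virtual texturing.
static float SampleHeight(Float2 uv)
{
    return 0.5f + 0.5f * std::sin(uv.x * 40.0f) * std::sin(uv.y * 40.0f);
}

// Core parallax occlusion mapping loop: march along the view ray in
// tangent space, stepping through the height field until the ray dips
// below the surface, then use that UV to fetch the final surface detail.
Float2 ParallaxOcclusionUV(Float2 uv, Float3 viewDirTangent, float heightScale)
{
    const int numSteps = 32;   // more steps = fewer stair-step artifacts
    // UV offset per step, projected from the tangent-space view direction.
    Float2 deltaUV = { viewDirTangent.x / viewDirTangent.z * heightScale / numSteps,
                       viewDirTangent.y / viewDirTangent.z * heightScale / numSteps };
    const float stepDepth = 1.0f / numSteps;

    float rayDepth     = 0.0f;
    float surfaceDepth = 1.0f - SampleHeight(uv);   // height stored as 0..1

    for (int i = 0; i < numSteps && rayDepth < surfaceDepth; ++i)
    {
        uv.x -= deltaUV.x;                          // step "into" the surface
        uv.y -= deltaUV.y;
        rayDepth    += stepDepth;
        surfaceDepth = 1.0f - SampleHeight(uv);
    }
    return uv;   // a production version would interpolate the exact hit and
                 // also output its depth (the conservative oDepth mentioned above)
}
```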
 
So one of the issues with the implementation in the demo is that it uses a quarter-resolution pass (640x360), so the artifacts are ugly. Like he said, if they had added this early in development they could have had a better implementation at full resolution.

The most basic algorithm is this: sample the current pixel, if it's a character, then sample four pixels around it with some delta region offset and accumulate only the ones that don't have the character, and normalize the result of accumulation at the end. If there are no such pixels around, then leave the current one as is. It could further be improved by allowing only patches with more horizontal elements than vertical to move horizontally and vice versa.

Our current implementation, shown in the demo, runs at quarter resolution in 3 fixed passes.
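
A literal transcription of that basic pass into code might look like the following - my own pixel representation and naming, with no claim to match Andreev's actual implementation:

```cpp
#include <vector>

struct Pixel { float r, g, b; bool isCharacter; };

// The basic artifact fix as quoted above: for pixels flagged as belonging
// to the character, average the surrounding non-character pixels at some
// offset 'delta'; if none are found, leave the pixel as it is.
Pixel FillCharacterPixel(const std::vector<Pixel>& img, int width, int height,
                         int x, int y, int delta)
{
    const Pixel& center = img[y * width + x];
    if (!center.isCharacter)
        return center;                       // not a character pixel: untouched

    const int offsets[4][2] = { {  delta, 0 }, { -delta, 0 },
                                { 0,  delta }, { 0, -delta } };
    Pixel acc{ 0.0f, 0.0f, 0.0f, false };
    int   count = 0;
    for (const auto& o : offsets)
    {
        int sx = x + o[0], sy = y + o[1];
        if (sx < 0 || sy < 0 || sx >= width || sy >= height)
            continue;
        const Pixel& s = img[sy * width + sx];
        if (s.isCharacter)
            continue;                        // accumulate background pixels only
        acc.r += s.r; acc.g += s.g; acc.b += s.b;
        ++count;
    }
    if (count == 0)
        return center;                       // nothing usable nearby: keep as is

    acc.r /= count; acc.g /= count; acc.b /= count;   // normalize the accumulation
    return acc;
}
```

This omits the directional refinement mentioned in the quote (letting patches with more horizontal than vertical elements move only horizontally, and vice versa).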
 
Thanks sebbbi. That's a very interesting read. I take it the golden keyboard you got from the Trials HD sales must be serving you well?
 
I take it the golden keyboard you got from the Trials HD sales must be serving you well?
Heh... This is actually my third Microsoft Natural Ergonomic Keyboard 4000... the last ones died when I poured Battery Energy Drink over them :(. We actually had a huge pyramid built from the energy drink cans, but it got removed to make room for two new programmers a few months ago :)
 
Sebbbi:
That was an amazing post! Thanks for writing it, you rock. :D

Edit:
Voting thread 5 stars for Sebbbi's contribution.
 
Heh... This is actually my third Microsoft Natural Ergonomic Keyboard 4000... the last ones died when I poured Battery Energy Drink over them :(. We actually had a huge pyramid built from the energy drink cans, but it got removed to make room for two new programmers a few months ago :)

Awwww. Obviously the new programmers need to replace the cans. I take it you can arrange a 'crunch' for them as payment for messing up the stack? :p

Anyway, my suggestion is get a spill proof keyboard. My Microsoft wireless entertainment keyboard 7000 is resistant to pretty much everything.
 
Awwww. Obviously the new programmers need to replace the cans. I take it you can arrange a 'crunch' for them as payment for messing up the stack? :p

Anyway, my suggestion is get a spill proof keyboard. My Microsoft wireless entertainment keyboard 7000 is resistant to pretty much everything.

Has Microsoft come out with a new 'Natural' that is spill proof? The old Natural ones would totally die with only a few drops of water let in.
 
It is a very interesting technique, especially if you consider some of the research in image processing - there's this content aware fill or whatever:
http://www.youtube.com/watch?v=NH0aEp1oDOI

On the other hand, I'm a bit worried by these trends. There's been a relatively steady increase in the image quality of realtime 3D graphics ever since the release of the 3dfx Voodoo - we went from 16-bit, bilinear-filtered, relatively low-res rendering to 32-bit with HDR, anisotropic filtering, higher resolutions and so on.
But first we started to see upscaled images on the consoles, and now this frame interpolation is another case that may hurt final image quality. I'm a little worried about this trend being broken...

Sebbbi, very informative and interesting stuff there! One question though, isn't parallax occlusion mapping only a 'fake' effect, in that it can't modify object silhouette edges? Can it be used to create, for example, large spikes on a dragon?
 
Has Microsoft come out with a new 'Natural' that is spill proof? The old Natural ones would totally die with only a few drops of water let in.

Unfortunately, it appears they haven't. They seem to be making their keyboards spill resistant as they update the range - maybe they just haven't gotten around to the Natural line yet?
 
Hijacking the thread, but I still dream of a natural keyboard with a trackpoint in the middle of the split, so I don't have to move my hands off the keyboard for daily work.
 