Neural Supersampling, Facebook Researchers [2020]

Facebook developing an AI-assisted supersampling technique for real-time rendered content

https://www.roadtovr.com/facebook-d...ring-performance-high-resolution-vr-headsets/

Facebook has developed an AI-assisted method for supersampling real-time rendered content, something that could become an integral addition to games coming to future generations of high-resolution VR headsets.


There’s an ongoing arms race between display technology and GPUs, and adding VR into the mix only underscores the disparity. It’s not so much a question of putting higher-resolution displays in VR headsets; those panels are out there, and there’s a reason many manufacturers aren’t throwing in the latest and greatest in their headsets. It’s really more about hitting a smart balance between the display resolution and the end user’s ability to adequately render that VR content and have it look good. That’s the basics anyway.

That’s why Facebook is researching AI-assisted supersampling in a recently published paper, dubbed ‘Neural Supersampling for Real-time Rendering’. Using neural networks, Facebook researchers have developed a system capable of inputting low-resolution images and obtaining high-resolution output suitable for real-time rendering. This, they say, restores sharp details while saving computational overhead.

Comparison images: 10.png, 11.png, 4.png, 5.png

PDF link to paper describing how it works:
https://research.fb.com/wp-content/uploads/2020/06/Neural-Supersampling-for-Real-time-Rendering.pdf

Direct Link to video of technique in action:
https://research.fb.com/wp-content/...ersampling-for-RealTime-Rendering_vid.mp4.zip
 
This sounds seriously good. I hope they can get it to work outside the lab too.

Facebook claims its neural network is state of the art, outperforming all other similar algorithms, which is why it's able to achieve 16x upscaling. What makes this possible is the inherent knowledge of the depth of each object in the scene; it would not be anywhere near as effective with flat images.
A new neural network developed by Facebook’s VR/AR research division could enable console-quality graphics on future standalone headsets.
https://uploadvr.com/facebook-neural-supersampling/
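
For what it's worth, a rough sketch of what that depth-aware input looks like in practice: the network is fed low-resolution rendered attributes (color plus auxiliary buffers such as depth and motion vectors) rather than a flat image. The exact channel layout, names, and the 400x225 resolution (taken from the paper's Spaceship example quoted further down) are illustrative assumptions, not the paper's actual code.

```python
# Illustrative only: assemble a per-pixel feature tensor of rendered attributes.
# Channel layout and names are assumptions; the paper defines its own network inputs.
import numpy as np

h, w = 225, 400                              # low-res input, 400x225 as in the Spaceship example
color  = np.zeros((h, w, 3), np.float32)     # rendered RGB
depth  = np.zeros((h, w, 1), np.float32)     # per-pixel scene depth
motion = np.zeros((h, w, 2), np.float32)     # screen-space motion vectors

# This 6-channel tensor, not a flat image, is what an upsampling network would see.
net_input = np.concatenate([color, depth, motion], axis=-1)
print(net_input.shape)                       # (225, 400, 6)
```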
 
Neural SuperSampling Is a Hardware Agnostic DLSS Alternative by Facebook

Includes link to the Facebook paper.
https://wccftech.com/neural-supersampling-is-a-hardware-agnostic-dlss-alternative-by-facebook/

I know all this stuff is early research, but I hate that there are no hardware specs given.
Something like: currently able to achieve real time on an AMD RX 580 using X amount of resources.
It's nice that it can do a 16x upscale to 2160p in real time, but is that on an RTX 2080 Ti?
 
I know all this stuff is early research, but I hate that there are no hardware specs given.
It's given in the paper; here it goes:
"After training, the network models are optimized with Nvidia TensorRT [2018] at 16-bit precision and tested on a Titan V GPU"
1920×1080 - 24.42 ms
1920×1080 "fast" version - 18.25 ms
That's an order of magnitude slower than DLSS 2.0 on a 2080 Ti (DLSS 2.0 actually takes ~1.5 ms for 1080p to 4K temporal upscaling on an RTX 2080 Ti), and the 2080 Ti is itself slower than a Titan V at FP16 inference.
They are obviously comparing their method with DLSS 1.0, but if you take a look at the provided video you will notice that the temporal stability sucks, which is expected for 16x scaling.
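
A quick back-of-the-envelope check on those numbers (the 90 Hz VR frame budget is my own reference point, not something from the paper):

```python
# Timings quoted above: the paper's Titan V numbers vs. the rough DLSS 2.0 figure.
titan_v_ms      = 24.42                 # paper: 1080p output, TensorRT FP16, Titan V
titan_v_fast_ms = 18.25                 # paper: "fast" 1080p variant
dlss2_ms        = 1.5                   # rough DLSS 2.0 figure on an RTX 2080 Ti (1080p -> 4K)

budget_90hz_ms = 1000 / 90              # ~11.1 ms per frame at 90 Hz (typical VR target)
print(titan_v_ms / dlss2_ms)            # ~16x slower than the DLSS 2.0 figure
print(titan_v_fast_ms / dlss2_ms)       # ~12x for the fast variant
print(titan_v_ms / budget_90hz_ms)      # the upsampling pass alone is ~2.2x a 90 Hz frame budget
```
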
Not sure about generalization either; they include their test scenes in the training data set and say "although including the test scenes into training datasets seems to always improve the quality".
Of course it will improve the quality due to overfitting, but it will also likely worsen generalization.
 
Not quite apples-to-apples comparisons here.

Facebook is doing a 4x4 upscale here, vs Nvidia's 2x2.

Their results are very good considering how low a resolution they are coming from; it's remarkable that it can get something so close to the reference.

Rendering Efficiency. We take the Spaceship scene as a representative scenario to demonstrate how the end-to-end rendering efficiency can be improved by applying our method. We render on an Nvidia Titan RTX GPU using the expensive and high-quality ray-traced global illumination effect available in Unity. The render pass for a full resolution image takes 140.6ms at 1600 × 900. On the other hand, rendering the image at 400 × 225 takes 26.40ms, followed by our method, which takes 17.68ms (the primary network) to upsample the image to the target 1600 × 900 resolution, totaling to 44.08ms. This leads to an over 3× rendering performance improvement, while providing high-fidelity results.
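
Just to sanity-check the quoted Spaceship numbers (the values are from the paper, the arithmetic is mine):

```python
full_res_ms = 140.6               # 1600x900 render with ray-traced GI
low_res_ms  = 26.40               # 400x225 render
network_ms  = 17.68               # primary network, 400x225 -> 1600x900
total_ms    = low_res_ms + network_ms
print(total_ms)                   # 44.08 ms, matching the paper
print(full_res_ms / total_ms)     # ~3.2x end-to-end speedup
```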

2x2 is reasonable in terms of what you can achieve by inference; in a simplistic view you're only asking it to guess three of every four pixels.

4x4 is a great deal more reliant on the network to guess what the results should be; simplistically, you're asking it to infer fifteen of every sixteen pixels.
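
Putting that in numbers (simple pixel counting, nothing from the paper beyond the scale factors):

```python
# Fraction of output pixels actually rendered vs. inferred for each upscale factor.
for factor in (2, 4):
    rendered = 1 / (factor * factor)
    print(f"{factor}x{factor}: {rendered:.1%} rendered, {1 - rendered:.1%} inferred")
# 2x2: 25.0% rendered, 75.0% inferred
# 4x4: 6.2% rendered, 93.8% inferred
```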

What Facebook accomplished here is pretty massive; most networks wouldn't compare, let alone at the speed at which this runs.


4.3.8 Discussion with DLSS. While Nvidia’s DLSS [Edelsten et al. 2019] also aims for learned supersampling of rendered content, no public information is available on the details of its algorithm, performance or training datasets, which makes direct comparisons impossible. Instead, we provide a preliminary ballpark analysis of its quality performance with respect to our method, however, on different types of scenes. Specifically, we took the AAA game “Islands of Nyne” supporting DLSS as an example, and captured two pairs of representative screenshots, where each pair of screenshots include the DLSS-upsampled image and the full-resolution image with no upsampling, both at 4K resolution. The content is chosen to be similar to our Spaceship and Robots scene in terms of geometric and materials complexity, with metallic (glossy) boxes and walls and some thin structures (railings, geometric floor tiles). For copyright reasons, we cannot include the exact images in the paper. Instead, we computed the PSNR and SSIM of the upsampled images after masking out mismatched pixels between the upsampled and the full-resolution images due to dynamic objects, plot the numerical quality as a distribution, and add our results quality to the same chart. Our results were computed on the test dataset from our Unity scenes (600 test frames per scene), reported as a box and whisker chart in Figure 11. While it is not a direct comparison (and generally it is impossible to compare the methods on the same scene), we believe this experiment can suggest that the quality ballparks of our method and DLSS are comparable
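
A rough sketch of the masked comparison the quoted section describes: mask out pixels that differ because of dynamic objects, then compute PSNR over the rest. The masking rule, threshold, and toy resolution below are my own assumptions; the paper does not spell out its exact procedure.

```python
import numpy as np

def masked_psnr(full_res, upsampled, mask, data_range=1.0):
    """PSNR computed only over the pixels where mask is True."""
    diff = (full_res - upsampled)[mask]
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

# Toy stand-ins for the two captures (DLSS-upsampled vs. full-resolution render).
full_res  = np.random.rand(270, 480, 3).astype(np.float32)
upsampled = np.random.rand(270, 480, 3).astype(np.float32)

# Hypothetical dynamic-object mask: drop pixels whose difference is implausibly large.
static = np.abs(full_res - upsampled).max(axis=-1) < 0.5
mask = np.broadcast_to(static[..., None], full_res.shape)

print(masked_psnr(full_res, upsampled, mask))
```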

That being said, this video here is total witchcraft: https://forum.beyond3d.com/posts/2136620/

Haha. Unfortunately I have no clue about its render time.
 
Yeah, this tech is pretty amazing. I wonder how perceptible the difference between their output and the reference would be when viewed through a headset.

All of these NN based upscaling techniques are really breaking new ground. It's crazy to think that they're really just getting started too.
 
Indeed, the reason you're seeing such fast improvements is that computer vision using neural networks is considered solved. Getting it to run as fast and as close to real time as possible, with the smallest footprint and the cheapest retraining costs, is the new game here.

I believe NLP is considered solved as well, but it's going to be a really long time before it gets anywhere near real-time speeds on a single device.
 
Their results are very good considering how low a resolution they are coming from
I don't think low input resolution matters.
For the accumulation algorithm, it doesn't matter whether you accumulate samples from 4x lower res or 2x, etc.; it doesn't hit performance at all.
The network is applied at the very end of the pipeline and dominates the execution time. Hence, what matters is inference resolution, and I am pretty sure the inference resolution for 1080p to 4K DLSS 2.0 is not lower than for 270p to 1080p in the Facebook research paper.
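
Output pixel counts behind that inference-resolution argument (simple arithmetic, not from the paper):

```python
dlss_out  = 3840 * 2160           # DLSS 2.0 infers at 4K output resolution
paper_out = 1920 * 1080           # the paper's network infers at 1080p output
print(dlss_out / paper_out)       # 4.0 -- DLSS handles 4x more output pixels, yet runs far faster
```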

I've read the paper further and noticed that there is no temporal loss function, which also explains the bad temporal coherency in the example video.
Once they add it, the image will become blurrier, but likely much more stable, without the wobbling effect.
Also, it seems the network has learned some directional blur and uses it to make edges smoother, but this also adds to the wobbling effect.
I wonder whether a temporal loss function would force the network to forget the directional blur strategy, which should make the static image rougher.
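
For illustration, a minimal sketch of the kind of temporal loss being discussed here: backward-warp the previous upsampled frame with motion vectors and penalize disagreement with the current output. Everything in it (function names, the flow convention, the 0.1 weight) is an assumption for illustration, not the paper's method.

```python
import torch
import torch.nn.functional as F

def backward_warp(prev, flow):
    """Warp prev (N,C,H,W) using a pixel-space flow (N,2,H,W) from current to previous frame."""
    _, _, h, w = prev.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(prev.device)   # (2,H,W), x then y
    coords = base.unsqueeze(0) + flow                             # where each pixel was last frame
    # normalize to [-1, 1] as grid_sample expects
    gx = coords[:, 0] / (w - 1) * 2 - 1
    gy = coords[:, 1] / (h - 1) * 2 - 1
    grid = torch.stack((gx, gy), dim=-1)                          # (N,H,W,2)
    return F.grid_sample(prev, grid, mode="bilinear", align_corners=True)

def temporal_loss(curr_out, prev_out, flow, weight=0.1):
    """Penalize disagreement between the current output and the warped previous output."""
    return weight * F.l1_loss(curr_out, backward_warp(prev_out, flow))

# Usage sketch with dummy tensors at the paper's 1080p output resolution.
curr = torch.rand(1, 3, 1080, 1920)
prev = torch.rand(1, 3, 1080, 1920)
flow = torch.zeros(1, 2, 1080, 1920)   # zero motion: the loss reduces to plain L1
print(temporal_loss(curr, prev, flow).item())
```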
 