Machine Learning to enhance game image quality

Alucardx23

I want to dedicate this thread to how machine learning can greatly improve image quality at a relatively low performance cost. We can start with Google's RAISR. Here are some of Google's claims:

-High Bandwidth savings
"By using RAISR to display some of the large images on Google+, we’ve been able to use up to 75 percent less bandwidth per image we’ve applied it to."

-So fast it can run on a typical mobile device
"RAISR produces results that are comparable to or better than the currently available super-resolution methods, and does so roughly 10 to 100 times faster, allowing it to be run on a typical mobile device in real-time."

-How it works
"With RAISR, we instead use machine learning and train on pairs of images, one low quality, one high, to find filters that, when applied to selectively to each pixel of the low-res image, will recreate details that are of comparable quality to the original. RAISR can be trained in two ways. The first is the "direct" method, where filters are learned directly from low and high-resolution image pairs. The other method involves first applying a computationally cheap upsampler to the low resolution image and then learning the filters from the upsampled and high resolution image pairs. While the direct method is computationally faster, the 2nd method allows for non-integer scale factors and better leveraging of hardware-based upsampling.

For either method, RAISR filters are trained according to edge features found in small patches of images (brightness/color gradients, flat/textured regions, etc.), characterized by direction (the angle of an edge), strength (sharp edges have a greater strength) and coherence (a measure of how directional the edge is). Below is a set of RAISR filters, learned from a database of 10,000 high and low resolution image pairs (where the low-res images were first upsampled). The training process takes about an hour."

[Image: set of RAISR filters learned from low/high resolution image pairs]
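To make the training recipe in the quote a bit more concrete, here is a minimal Python sketch of the second ("upsample first, then learn filters") method. It is heavily simplified: patches are hashed by gradient angle only, whereas RAISR also uses strength, coherence and sub-pixel position, and all function and variable names here are my own, not Google's.

Code:
import numpy as np
from scipy.ndimage import zoom

PATCH = 7          # filter footprint (7x7), applied to the cheaply upsampled image
ANGLE_BUCKETS = 8  # quantized gradient orientations used as the hash

def bucket_of(patch):
    # Hash a patch by its dominant gradient angle (real RAISR also uses
    # strength and coherence derived from the structure tensor).
    gy, gx = np.gradient(patch)
    angle = np.arctan2(gy.sum(), gx.sum()) % np.pi
    return int(angle / np.pi * ANGLE_BUCKETS) % ANGLE_BUCKETS

def train_filters(pairs, scale):
    # pairs: list of (low_res, high_res) grayscale float images, where
    # high_res.shape == low_res.shape * scale. Slow pure-Python loops, for clarity.
    r = PATCH // 2
    ata = np.zeros((ANGLE_BUCKETS, PATCH * PATCH, PATCH * PATCH))
    atb = np.zeros((ANGLE_BUCKETS, PATCH * PATCH))
    for lo, hi in pairs:
        up = zoom(lo, scale, order=1)                   # cheap (bilinear) upsampler
        for y in range(r, hi.shape[0] - r):
            for x in range(r, hi.shape[1] - r):
                window = up[y - r:y + r + 1, x - r:x + r + 1]
                b = bucket_of(window)
                v = window.ravel()
                ata[b] += np.outer(v, v)                # accumulate least-squares terms
                atb[b] += v * hi[y, x]
    # One least-squares filter per bucket.
    return np.stack([np.linalg.lstsq(ata[b], atb[b], rcond=None)[0]
                     for b in range(ANGLE_BUCKETS)])

def apply_filters(lo, scale, filters):
    # Cheap upsample first, then refine each pixel with its bucket's learned filter.
    r = PATCH // 2
    up = zoom(lo, scale, order=1)
    out = up.copy()
    for y in range(r, up.shape[0] - r):
        for x in range(r, up.shape[1] - r):
            window = up[y - r:y + r + 1, x - r:x + r + 1]
            out[y, x] = filters[bucket_of(window)] @ window.ravel()
    return out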

Comments:
We are talking about a machine-learning model that learns the best way to upscale images, based on a database of thousands of image pairs compared at different resolutions. As an example, say developer X is targeting 1080p/60 fps on PS4 hardware: in theory you could let the model spend hours or days comparing frames of your game rendered at 720p versus 1080p, and it would get better and better at finding a custom upscaling method that reconstructs a convincing 1080p image from a 720p framebuffer (a sketch of how such a training set could be captured follows below).

I have seen several examples of AA methods that work wonders on one game but don't work as well on others, since a lot depends on the game's aesthetics. This means that with this approach every game could have its own custom AA filters that no "one size fits all" AA technique can compete with at the same performance level.
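A hypothetical sketch of how a developer could build such a per-game training set, reusing train_filters() from the sketch above. The capture folder, file format and the idea of downscaling captured 1080p frames to stand in for real 720p renders are all assumptions; ideally you would render the same frames at both resolutions.

Code:
import glob
import numpy as np
from PIL import Image

def load_gray(path, size=None):
    img = Image.open(path).convert("L")        # grayscale keeps the sketch simple
    if size is not None:
        img = img.resize(size, Image.BILINEAR)
    return np.asarray(img, dtype=np.float32) / 255.0

pairs = []
for path in glob.glob("captures/1080p/*.png"):  # frames captured during playtests
    hi = load_gray(path)                        # 1920x1080 "ground truth" frame
    lo = load_gray(path, size=(1280, 720))      # stand-in for the 720p framebuffer
    pairs.append((lo, hi))

# filters = train_filters(pairs, scale=1.5)     # bake per-game filters at build time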

RAISR Upscaling examples:
[Image: RAISR upscaling example (eyes)]

[Image: RAISR upscaling example (horse head)]

[Image: RAISR upscaling example (photo comparison)]


Source material:

Saving you bandwidth through machine learning

https://blog.google/products/google-plus/saving-you-bandwidth-through-machine-learning/

Enhance! RAISR Sharp Images with Machine Learning

https://research.googleblog.com/2016/11/enhance-raisr-sharp-images-with-machine.html
 
Magic Pony is another company that uses neural networks to improve image quality, but they are more focused on video.

Artificial Intelligence Can Now Design Realistic Video and Game Imagery
https://www.technologyreview.com/s/...-now-design-realistic-video-and-game-imagery/

"The company has developed a way to create high-quality videos or images from low-resolution ones. It feeds example images to a computer, which converts them to a lower resolution and then learns the difference between the two. Others have demonstrated the feat before, but the company is able to do it on an ordinary graphics processor, which could open up applications. One example it’s demonstrated uses the technique to improve a live gaming feed in real time."

Example of video stream improvement:
[Image: comparison of original and enhanced video stream]

 
I'm thinking about the possible application to old SD content. If you could do it live you'd never lose any information.
 
If it works well enough on video, you could broadcast at a quarter of the resolution, allowing for less compression and greater overall clarity. Imagine YouTube videos actually looking good!
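Rough arithmetic behind that idea: at a fixed bitrate, a quarter-resolution stream gives the encoder roughly four times as many bits per pixel, so each pixel is compressed far less before being upscaled on the client. The 1080p / 6 Mbps / 30 fps numbers below are illustrative assumptions, not measurements.

Code:
bitrate_bps    = 6_000_000      # example streaming bitrate
fps            = 30
full_pixels    = 1920 * 1080    # 1080p frame
quarter_pixels = 960 * 540      # quarter resolution (half width, half height)

bpp_full    = bitrate_bps / (full_pixels * fps)
bpp_quarter = bitrate_bps / (quarter_pixels * fps)
print(f"bits/pixel at 1080p: {bpp_full:.3f}")
print(f"bits/pixel at 540p:  {bpp_quarter:.3f} (~{bpp_quarter / bpp_full:.0f}x more)")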
 
As somebody who performs daily "magic" via a server farm, my first question is: how much computational power is required to do this in real time? This may well be a solution for a one-time run over old SD footage to make it available to everyone in HD, but how does this work in real time with millions of clients?
 
My question is more about the "learning" part: how much processing power is required to do this quickly enough to allow a real-time implementation in a videogame?

Would the entire process not introduce far too much latency?
 
The learning part would be trained beforehand by whoever makes the software; that would probably take weeks. Training currently takes anywhere between 10,000 and 100,000 images for image recognition software, depending on how accurate you need it to be, and that would probably hold for this as well; it can all be done at development time. The inference + rendering part is currently possible in under a second, but even that would probably be too much for a low-latency real-time system like a game. Right now, if you tried to do this on a GPU, it would likely cost more performance than directly rendering the image at the higher resolution. There are dedicated hardware solutions coming out for things like this, such as Google's TPU. These are much more efficient than GPUs at these specific calculations, and Google claims 10x efficiency. We might see dedicated silicon for machine learning inside PCs and consoles in the future.
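To give a feel for the cost side, here is a back-of-the-envelope estimate for a small three-layer convolutional upscaler evaluated at 1080p. The layer sizes and the 6 TFLOPS figure are illustrative assumptions, and real utilization would be well below the ideal number.

Code:
out_pixels = 1920 * 1080
layers = [          # (input channels, output channels, square kernel width)
    (1, 32, 5),
    (32, 32, 3),
    (32, 1, 3),
]
flops_per_pixel = sum(2 * cin * cout * k * k for cin, cout, k in layers)
flops_per_frame = flops_per_pixel * out_pixels
gpu_flops = 6e12                       # ~6 TFLOPS fp32, a 2017-era gaming GPU
ms_per_frame = flops_per_frame / gpu_flops * 1000
print(f"{flops_per_pixel} FLOPs/pixel, {flops_per_frame / 1e9:.1f} GFLOPs/frame")
print(f"~{ms_per_frame:.1f} ms/frame at ideal utilization")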
 
These are much more efficient than GPUs at these specific calculations, and Google claims 10x efficiency. We might see dedicated silicon for machine learning inside PCs and consoles in the future.
Google's 10x claim is versus the K80, a GPU based on the 2012 Kepler architecture. Everybody knows that Kepler wasn't the best GPU for compute. Also, Maxwell added double-rate fp16 and Pascal added 4x-rate uint8 operations. Google's TPU does uint8 inference only.

This is Nvidia's recent response to Google's TPU claims:
https://www.extremetech.com/computi...nge-googles-tensorflow-tpu-updated-benchmarks

Vega is going to support double-rate fp16 and 4x-rate uint8 operations. It will be interesting to see whether some games start using machine-learning-based upscaling / antialiasing techniques, as consumer GPUs will soon have all the features required for fast inference. Nvidia currently limits double-rate fp16 and 4x-rate uint8 to professional GPUs. Intel already has double-rate fp16 on all consumer-grade iGPUs (not sure about 4x-rate uint8).
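To illustrate what "4x rate uint8" buys for inference: instructions like dp4a do a four-element 8-bit dot product with 32-bit accumulation in a single operation, so a quantized network packs four multiply-accumulates where one fp32 MAC would go. Below is a small numpy emulation of the idea; the simple symmetric quantization scheme is my own assumption, not any particular framework's.

Code:
import numpy as np

def quantize(x, bits=8):
    # Simple symmetric quantization: float vector -> int8 values plus one scale.
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale).astype(np.int8), scale

def dp4a(a4, b4, acc):
    # Emulate dp4a: dot product of four int8 pairs accumulated into an int32.
    return acc + int(np.dot(a4.astype(np.int32), b4.astype(np.int32)))

weights     = np.random.randn(64).astype(np.float32)
activations = np.random.rand(64).astype(np.float32)

qw, sw = quantize(weights)
qa, sa = quantize(activations)

acc = 0
for i in range(0, 64, 4):              # one dp4a instruction per four elements
    acc = dp4a(qw[i:i + 4], qa[i:i + 4], acc)

print("int8 result:", acc * sw * sa)   # rescale back to float
print("fp32 result:", float(np.dot(weights, activations)))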
 
My question is more about the "learning" part: how much processing power is required to do this quickly enough to allow a real-time implementation in a videogame?
Most neural networks do not learn while doing their job (called inference). The common way is to train the network first and then use it. You would store the trained network in the game package (on disk). At runtime you only do inference (no training).
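A minimal sketch of that offline/runtime split, reusing the hypothetical train_filters()/apply_filters() and the `pairs` list from the sketches earlier in the thread; the file names and `frame_720p` (the game's current low-resolution framebuffer) are placeholders.

Code:
import numpy as np

# Offline, at development time: train and bake the result into the game package.
filters = train_filters(pairs, scale=1.5)
np.save("game_assets/upscale_filters.npy", filters)

# At runtime, on the player's machine: load the baked filters, inference only.
filters = np.load("game_assets/upscale_filters.npy")
frame_1080p = apply_filters(frame_720p, scale=1.5, filters=filters)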
 
Thanks for the responses gents, but I must admit I'm still lost...

How do you train the neural network on the difference between two images that haven't even been generated yet?

If the image, i.e. each frame of the videogame (the final display frame), is only generated at runtime, then what are you using to train the neural network beforehand?

I feel like I'm missing something important here.
 
How do you train the neural network on the difference between two images that haven't even been generated yet?

If the image, i.e. each frame of the videogame (the final display frame), is only generated at runtime, then what are you using to train the neural network beforehand?
A neural network works a bit like the human brain. You can read text, recognize a logo, a car, a building or a friend you know, even when you see them from different angles, at different distances or in different lighting conditions. You easily recognize variations of the same logo, even the first time you see them; you don't need to learn every single case separately. A very important thing in training is to avoid over-fitting. Over-fitting means that the network can only recognize exactly the training set. Instead you want a more generic network that can recognize things that share properties and patterns with the training set. The network is trained with lots of different data to ensure that it figures out generic rules and patterns instead of just memorizing a few examples. After training, you test the network with a separate data set that wasn't used in training to ensure that it gives the right results.

For example, a line-antialiasing network could learn how to estimate exact lines from a grid of 1/0 values. Such a network could learn common neighborhood patterns to calculate the exact position and direction of the line at each point. It is important that the network learns generic rules & patterns instead of remembering every single image. This allows both smaller networks (fewer neurons) and makes them more generic (applicable to images not in the training set).
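A hypothetical sketch of how training data for that line-antialiasing example could be generated: random lines rasterized hard (the 1/0 grid the network sees) paired with an analytic coverage approximation (the smooth result it should learn to produce). This is my own illustration of the idea, not a description of any shipped method.

Code:
import numpy as np

def make_line_pair(size=32):
    # One random line through a small tile: the hard 1/0 rasterization is the
    # "jaggy" input, the analytic coverage approximation is the smooth target.
    theta = np.random.uniform(0, np.pi)
    nx, ny = np.cos(theta), np.sin(theta)           # unit normal of the line
    c = np.random.uniform(-size / 2, size / 2)      # line offset from tile center
    ys, xs = np.mgrid[0:size, 0:size]
    d = (xs - size / 2) * nx + (ys - size / 2) * ny + c   # signed distance in pixels
    aliased     = (d <= 0).astype(np.float32)             # hard 1/0 edge
    antialiased = np.clip(0.5 - d, 0.0, 1.0)              # approximate pixel coverage
    return aliased, antialiased

# dataset = [make_line_pair() for _ in range(100_000)]    # inputs/targets to train on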
 
Sebbbi, thanks for the explanation. I think where I was getting stuck was in understanding exactly what the neural networks were being trained to see, and the scope of what they would be asked to do when applied to a new image.

Your antialiasing example is what really made it click in my mind.

I'm definitely interested to see the kind of results this produces.

Google's 10x claim is versus the K80, a GPU based on the 2012 Kepler architecture. Everybody knows that Kepler wasn't the best GPU for compute. Also, Maxwell added double-rate fp16 and Pascal added 4x-rate uint8 operations. Google's TPU does uint8 inference only.

This is Nvidia's recent response to Google's TPU claims:
https://www.extremetech.com/computi...nge-googles-tensorflow-tpu-updated-benchmarks

Vega is going to support double-rate fp16 and 4x-rate uint8 operations. It will be interesting to see whether some games start using machine-learning-based upscaling / antialiasing techniques, as consumer GPUs will soon have all the features required for fast inference. Nvidia currently limits double-rate fp16 and 4x-rate uint8 to professional GPUs. Intel already has double-rate fp16 on all consumer-grade iGPUs (not sure about 4x-rate uint8).

I wonder whether existing GPUs are really the best-suited hardware for this kind of application. Perhaps someone could come up with some fixed-function hardware to bolt onto the end of a GPU to accelerate these kinds of inference-based techniques?
 
I wonder whether existing GPUs are really the best-suited hardware for this kind of application. Perhaps someone could come up with some fixed-function hardware to bolt onto the end of a GPU to accelerate these kinds of inference-based techniques?
New 8/16-bit packed math instructions are a big improvement. But you are most likely right: in the long run, GPUs will be replaced by ASICs. However, GPUs capable of fast inference will be available in consumer devices in a few months. Neural network ASICs will be integrated into various consumer electronics, such as digital cameras, but I doubt we'll see general-purpose ASICs in PCs and consoles soon. Gaming devices, however, always have GPUs, and I am certain that games will use them to run simple inference tasks pretty soon. At first it will be like "Hairworks" and "VXGI", but eventually it will scale down.
 
Google's 10x claim is versus the K80, a GPU based on the 2012 Kepler architecture. Everybody knows that Kepler wasn't the best GPU for compute. Also, Maxwell added double-rate fp16 and Pascal added 4x-rate uint8 operations. Google's TPU does uint8 inference only.

This is Nvidia's recent response to Google's TPU claims:
https://www.extremetech.com/computi...nge-googles-tensorflow-tpu-updated-benchmarks

Vega is going to support double-rate fp16 and 4x-rate uint8 operations. It will be interesting to see whether some games start using machine-learning-based upscaling / antialiasing techniques, as consumer GPUs will soon have all the features required for fast inference. Nvidia currently limits double-rate fp16 and 4x-rate uint8 to professional GPUs. Intel already has double-rate fp16 on all consumer-grade iGPUs (not sure about 4x-rate uint8).

The 4x-rate uint8 operations may be a bit murky on consumer Pascal, as initial reports/reviews mention the GTX 1080 having no int8/dp4a, but Scott Gray tested this on the Nvidia dev forums and found it was fully supported, with performance of around 33-36 TOPS; he also verified the fp16 behaviour.
This sort of makes sense because the Tesla P4 (professional segment) uses GP104 like the GTX 1070/GTX 1080 and also supports int8/dp4a.
But maybe this is a CUDA thing *shrug*.
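For what it's worth, the 33-36 TOPS figure lines up with simple arithmetic, assuming one dp4a (8 int8 ops) per CUDA core per clock on GP104; this is my own back-of-the-envelope check, not Scott Gray's methodology.

Code:
cuda_cores   = 2560               # GTX 1080 (GP104)
ops_per_dp4a = 8                  # 4 int8 multiplies + 4 adds into an int32 accumulator
for clock_ghz in (1.607, 1.733):  # published base / boost clocks
    tops = cuda_cores * clock_ghz * 1e9 * ops_per_dp4a / 1e12
    print(f"{clock_ghz:.3f} GHz -> {tops:.1f} int8 TOPS")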

Cheers
 
[Image: slide from Nvidia's SIGGRAPH AI rendering research presentation]


"Nvidia researchers used AI to tackle a problem in computer game rendering known as anti-aliasing. Like the de-noising problem, anti-aliasing removes artifacts from partially-computed images, with this artifact looking like stair-stepped “jaggies.” Nvidia researchers Marco Salvi and Anjul Patney trained a neural network to recognize jaggy artifacts and replace those pixels with smooth anti-aliased pixels. The AI-based solution produces images that are sharper (less blurry) than existing algorithms."

Nvidia uses AI to create 3D graphics better than human artists can
https://venturebeat.com/2017/07/31/...te-3d-graphics-better-than-human-artists-can/


I think we can start to predict that dedicated AI hardware will become a common thing for games.

Intel puts Movidius AI tech on a $79 USB stick
https://www.engadget.com/2017/07/20/intel-movidius-ai-tech-79-dollar-usb-stick/

"With the Compute Stick, you can convert a trained Caffe-based neural network to run on the Myriad 2, which can be done offline. Ultimately, the device will help bring added AI computing power right to a user's laptop without them having to tap into a cloud-based system. And for those wanting even more power than what a single Compute Stick can provide, multiple sticks can be used together for added boost."

 
It doesn't look much better than any old (and blurry) post-process AA.
EDIT: ...so far. I hope they continue the research.
 