Support for Machine Learning (ML) on PS5 and Series X?

Transcoding is often a two-step process: the initial step is decoding to an intermediate uncompressed format, and the second step is encoding into the desired format. Crunch seems to have the ability to avoid that intermediate step. The initial compression (CRN) format is designed to allow conversion directly from the compressed format itself. Or at least that’s my impression from the writings of Geldreich, the creator of crunch.

You are absolutely right, which is why I compared BCPACK to it. Not aware if the same applies to Kraken with RDO encoding (Oodle Texture), though I guess it must.
 
Yes. RDO doesn't compress the textures outside of BCn compression. It just arranges the BCn data more effectively, allowing the load to restore the original BCn. Presently we have:

Raw image -> BCn data on disc -> load BCn -> BCn texture in RAM ready to use
Raw image -> BCn data -> RDO'd data on disc -> load RDO data -> BCn texture in RAM ready to use
Raw image -> JPEG data on disc -> load JPEG -> decompress to image data -> turn into BCn -> BCn texture in RAM ready to use

Where 'JPEG' can be any data format including PNG for lossless quality.
 

Instead of choosing quality or compression factor as a single determiner of output, RDO weighs both quality and compression ratio. In other words, RDO will choose a higher compression ratio when the loss of quality is minimal, or less compression when the loss of quality would be too high.

RDO comes from the video codec world, where rate refers to bit rate and distortion refers to loss of quality.

RDO uses the original texture image but happens after block compression and before the lossless compression pass.
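To make the trade-off concrete, here is a minimal sketch of the idea (plain Python; the candidate list and lambda values are made up for illustration and have nothing to do with any real encoder's API):

```python
# Lagrangian rate-distortion cost: the encoder scores each candidate encoding of a
# block by distortion + lambda * rate and keeps the cheapest one. A small lambda
# favours quality, a large lambda favours compression.

def rd_cost(distortion, rate, lam):
    return distortion + lam * rate

def pick_encoding(candidates, lam):
    # candidates: list of (distortion, rate_in_bits) pairs for one block
    return min(candidates, key=lambda c: rd_cost(c[0], c[1], lam))

# Example: near-lossless but large vs. slightly worse but much smaller encodings.
candidates = [(0.5, 128), (2.0, 64), (12.0, 32)]
print(pick_encoding(candidates, lam=0.01))  # -> (0.5, 128): quality wins
print(pick_encoding(candidates, lam=0.5))   # -> (12.0, 32): compression wins
```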
 
More confirmation that leveraging ML for texture reconstruction is being actively pursued by the relevant division (Playfab Azure AI). From the horse's mouth (https://gamerant.com/microsoft-xbox-cloud-gaming-experimental-tech-machine-learning/):

"We have a research project in our studios where they built a texture decompression and compression algorithm, where they trained a machine learning model up on textures. They can take a big texture, shrink it down till it’s really ugly, and then use the machine learning model to decompress it again in real time. It’s not exactly the same, but it looks really realistic. It’s almost creepy how it works."

This aligns well with the aforementioned patent: https://patents.justia.com/patent/20200105030
 
DLSS 2.0 is calculated using tensor cores, but is it an int8 or fp16 calculation?
Tensor cores support mixed precision. That's the key component you need from a feature perspective to increase the speed of calculation and limit the reduction in accuracy. DLSS 2.0 networks could range from int4 to FP32 if required.
 
But is there public knowledge of the type of calculation used in DLSS 2? There's quite interesting info on the wiki:
"First 2.0 version, also referenced as version 1.9, using an approximated AI of the in-progress version 2.0 running on the CUDA shader cores and specifically adapted for Control."
 
I have never seen their solution out in the public space. Typically in our space, if you're going to run high-performance AI computing, you're going to either use a library or run an optimizer that will go through your network and reduce the weights of the nodes as required. I don't think you can just peek at the DLSS 2.0 weights. I haven't tried or seen anything in this domain at least, but I can only suspect the weights to range from int4 to fp32. And that will change with the quality settings and how much you need to upscale by.

CUDA cores also support mixed precision on RTX cards. But I don't know if they go as low as int4 or even int8. They do support FP16 -> FP32 mixed.

You can always run any network using FP32 hardware as long as it doesn't require FP64 ;)
I couldn't tell you the gain from reducing weights as it's going to differ for each network. The idea that you can take an FP32 network and drop all weights to int4 while retaining a high degree of accuracy is unlikely, and there is a high probability of overflow errors. So you should expect there to be a variation of weights across the network, but only the people developing it would know how much is saved by the reductions.
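Purely as an illustration of what "reducing the weights" looks like, here's a toy NumPy sketch of symmetric post-training quantization to int8. This is not NVIDIA's tooling; real optimizers (e.g. TensorRT) quantize per channel with calibration data, handle activations, and so on.

```python
import numpy as np

# Toy symmetric post-training quantization of FP32 weights to int8.
def quantize_int8(w):
    scale = np.max(np.abs(w)) / 127.0   # map the largest weight magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256).astype(np.float32) * 0.1
q, scale = quantize_int8(w)
err = np.abs(dequantize(q, scale) - w).max()
print(f"scale={scale:.6f}, worst-case weight error={err:.6f}")
# Outlier weights inflate 'scale' and therefore the error on every other weight,
# which is one reason you can't blindly push everything down to int4.
```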
 
Got the info thanks to Locuza: it's int8. Source: https://www.pcgameshardware.de/Nvid...warrior-5-DLSS-Test-Control-DLSS-Test-1346257
"Nvidia's developer tool Nsight allows you to analyze games and their frames in order to expose performance hogs. The possibilities go far beyond seeing, for example, which effect requires which computing time. The tool also shows how well the GPU is being used and what resources are being used. Anyone who runs a DLSS 2.0 game through Nsight will see several things here: DLSS 2.0 is applied at the end of a frame and causes data traffic on the tensor cores (INT8 operations)."
 
Interesting. Sounds like they used QAT (quantization-aware training).

This is pretty cutting-edge stuff for someone in my shoes. I guess I was expecting Nvidia to have some crazy stuff at int4 levels as well as int8 and above. Hmm... I guess I was also expecting them not to quantize everything to int8.

I'll have to check it out myself. I suppose that is the shock for me, but perhaps it's not a shock. I don't know what is more shocking, I guess: the fact that it's all int8, and thus amazing, or that it's all int8 and some actual mixing of precision would have resulted in better outputs.
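For anyone wondering what QAT actually involves, here's a rough, framework-free sketch of the core trick (fake quantization in the forward pass). This is only the idea, not NVIDIA's actual training pipeline:

```python
import numpy as np

# Quantization-aware training inserts a "fake quantize" step into the forward pass:
# values are rounded to the int8 grid and immediately dequantized, so the network
# trains against the rounding error it will see at inference time.
def fake_quant_int8(x):
    scale = np.max(np.abs(x)) / 127.0 + 1e-12
    return np.round(x / scale).clip(-127, 127) * scale

x = np.random.randn(8).astype(np.float32)
w = np.random.randn(8).astype(np.float32) * 0.05
print(float(x @ w))                                    # full-precision result
print(float(fake_quant_int8(x) @ fake_quant_int8(w)))  # what the quantized net sees
# In backprop the rounding is treated as identity (straight-through estimator),
# so the weights learn to compensate for the quantization error.
```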
 
Apparently Insomniac uses machine learning for muscle deformation on the new PS5 Miles Morales suit:
https://support.insomniac.games/hc/...6532-Version-1-09-PS4-1-009-PS5-Release-Notes
https://docs.zivadynamics.com/zivart/introduction.html

[Image: New Spider-Man Miles Morales update adds sleek Advanced Tech Suit]
 
I'm not understanding whether they used ML to pre-bake the deformations offline, or if they are using ML in real-time on PS5.
 
In the Ziva real-time documentation, the Basic Workflow describes offline training though:


  • Go to “Motion”->”Load Animation of Skeleton / Extra Parameters (FBX)” to load a skeleton animation for your character. This skeleton animation must correspond to the mesh animation you provided in the previous step. Note that the mesh and skeleton animations are paired together in a row of the asset-management window.

  • Repeat the above process as many times as needed to load multiple FBX/ABC training pairs, each one into a separate row of the multi-asset view.

  • Go to “Rig”->”Train”. A dialog box will appear with a range of parameters. Click “Train” in this dialog box to train the ZivaRT rig. A dialog will present once the training has finished.

  • Go to “Rig”->”Save” to export the trained ZivaRT character rig to disk as a .zrt file, to be used by one of the player applications (Maya or UE4).
 
I would argue there is rarely a case to be made for real-time AI *learning* in animation or the rendering pipeline.

You can train an AI, then take the model from the training results, and bake that model into your animation, resolution upscaling, etc.

Most aspects of gaming you want to be deterministic in your end product anyway.
 
That is what I read, hence the confusion about the nature of the real-time mention.
 

Last I heard "Online Neural Network Training" is still an unattainable Holy Grail save for some non-practical academic implementations.
Almost all Neural Network Training happens offline, i.e. you don't run inference on a neural network that is in the training process.


What the Ziva documentation describes seems to me like the standard workflow of feeding a neural network with representative data. In this case, they need to feed the Neural Network with skeleton poses as input and photogrammetry of people in the same poses as desired output.
This seems like a very good area to apply ML to, since the alternative would be to do musculoskeletal modeling and simulation in real time (which is neither feasible nor effective for videogames), just to determine the muscle bumps according to poses.


Regardless, the output of training this neural network is.. a trained neural network that feeds data for geometry deformation. So while the training is obviously done offline, the console needs to be running machine learning inference in real-time, to deform the geometry according to the different poses.
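To put that offline-training / runtime-inference split in concrete terms, here's a deliberately tiny sketch of the shape of such a system (pure NumPy with random stand-in data; it has nothing to do with ZivaRT's actual model or file format): a small network is fitted offline to (pose, vertex offsets) pairs, and at runtime the console only evaluates a couple of small matrix multiplies per pose.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Offline (tool side): fit a tiny 2-layer MLP mapping joint rotations -> vertex offsets.
poses   = rng.standard_normal((1000, 30)).astype(np.float32)   # e.g. 10 joints x 3 angles
offsets = rng.standard_normal((1000, 300)).astype(np.float32)  # e.g. 100 verts x xyz (stand-in data)

W1 = rng.standard_normal((30, 64)).astype(np.float32) * 0.1
W2 = rng.standard_normal((64, 300)).astype(np.float32) * 0.1
for _ in range(200):                            # crude gradient descent, offline only
    h    = np.maximum(poses @ W1, 0.0)          # ReLU hidden layer
    grad = (h @ W2 - offsets) / len(poses)      # dLoss/dPrediction
    dW2  = h.T @ grad
    dW1  = poses.T @ ((grad @ W2.T) * (h > 0))
    W2  -= 0.1 * dW2
    W1  -= 0.1 * dW1
# The trained W1/W2 are what would get baked into the shipped asset (the ".zrt"-style file).

# --- Runtime (console side): per pose, inference is just two small matmuls.
def deform(pose):
    return np.maximum(pose @ W1, 0.0) @ W2      # vertex offsets to apply to the mesh

print(deform(rng.standard_normal(30).astype(np.float32)).shape)   # (300,)
```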


This is pretty cool tech. Having a NN determine anatomically correct geometry could be pretty big.
For example, Michelangelo's statues often stand out for the ridiculous detail he showed in human anatomy.



Just imagine that eventually we'll get a NN that provides this kind of anatomy detail, and all the animators need to do is move the skeleton parts around to get the desired output.
Cool stuff.
 
Regardless, the output of training this neural network is.. a trained neural network that feeds data for geometry deformation. So while the training is obviously done offline, the console needs to be running machine learning inference in real-time, to deform the geometry according to the different poses.
To sort of add to this: typically, trained models are fairly light to run. Anything significant a GPU is going to be doing NN-wise will most likely be vision-, upsampling-, or AA-related, due to the number of points the NN has to account for.

But the whole idea that PS5 can't do ML because it doesn't have inference extensions is a poor take. We've been doing it on GPUs since Kepler without tensor cores, without mixed precision, and it runs on some really weak IoT devices. The mixed-precision dot products are nice to have, as they will shave off some cycles, but I believe this particular aspect of ML support has been overstated in terms of its performance benefit.
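For what it's worth, here's what those packed dot products amount to, emulated in Python (this mimics the idea of a dot4 INT8 instruction, not any specific GPU ISA): four 8-bit multiplies accumulated into a 32-bit value in one operation, versus the same work done as four separate FP32 multiply-adds.

```python
import numpy as np

# Emulate a "dot4" style INT8 operation: four int8 x int8 products accumulated
# into an int32. Hardware without the instruction computes the same result,
# it just spends more ALU ops / cycles doing it.
def dp4a(a4, b4, acc=0):
    a4 = np.asarray(a4, dtype=np.int32)  # widen before multiplying to avoid int8 overflow
    b4 = np.asarray(b4, dtype=np.int32)
    return acc + int(a4 @ b4)

a = np.array([ 12,  -7, 100, 3], dtype=np.int8)
b = np.array([-50,  20,   1, 9], dtype=np.int8)
print(dp4a(a, b))                                        # -613, one packed op's worth of work
print(int(a.astype(np.float32) @ b.astype(np.float32)))  # same answer via FP32 math
```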
 
Yes, and to be clear, I do not think this case of "PS5 running ML inference" serves as any proof whatsoever of whether or not the PS5's ALUs can run dot4 INT8 / dot8 INT4 operations.
For all I know, this NN could be so light that it's running on a small percentage of one CPU core, and running it on the GPU could just bring unnecessary dev time with little to no performance benefit.
 
Off-topic: some cool info about Ziva tools.

How Ziva Dynamics uses AI to enhance CGI visual effects (VFX)


Ziva Dynamics: Powering the Animation Revolution
Artificial intelligence and machine learning breathe life into computer-generated characters in “Pacific Rim: Uprising*”.


https://www.intel.com/content/www/us/en/customer-spotlight/stories/ziva-dynamics-customer-story.html


"
Traditionally, animations are created by first building the characters and then animating them frame by frame. Any deformations and other dynamics are achieved by manually shot-sculpting their body shapes to achieve shot-specific desired results. Not only are these processes time-consuming and expensive, but there are so many variables in how characters move based on physics and size, their underlying anatomical structure, and more. As a result, if something doesn’t look quite right, it means that artists have to go back to the drawing board and determine which layer is off.

That’s where Ziva Dynamics comes in. Its flagship product, ZIVA VFX*, is an advanced simulation technology that mimics the physics of any material, which allows characters to move, flex, jiggle, and stretch organically."


Ziva adding real world physics to People, Sharks, Pigs and even Sofas.

https://www.fxguide.com/fxfeatured/...physics-to-people-sharks-pigs-and-even-sofas/
 