Support for Machine Learning (ML) on PS5 and Series X?

Machine learning has become quite popular, and I suspect we will see 'machine learning' attached to many, many things. The first Half-Life and Halo could have been marketed as 'machine AI' if they had pushed the marketing a bit more (their AI was advanced for the time).
??? No
 
Thinking about ML, does XSX (and XSS) actually have enough performance to do ML-based upscaling?

XSX has, what, half the INT4 throughput of an RTX 2060? And XSS a third of XSX?

I know that the lower-performance the machine, the more it benefits from ML upscaling, but surely you need 'x' amount of performance to do the ML upscaling within a decent time frame, or else it holds up the pipeline?
 

Too many unknown variables involved to say definitively. The 2060 Super can convert 720p to 1440p in 1.26ms. Even if the XSX took four times as long, what could it do with the remaining ~11ms at 720p while targeting 60fps? Probably more than it could do at native 1440p with the full 16.7ms.
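A rough back-of-the-envelope version of that budget (the 1.26ms comes from the DF figure above; the 4x slowdown is just an assumed worst case, not a measurement):

```python
# Rough frame-budget check for ML upscaling at 60fps.
# The 1.26ms figure is from the post above; the 4x slowdown is an assumed worst case.
frame_budget_ms = 1000 / 60            # ~16.7ms per frame at 60fps
upscale_2060s_ms = 1.26                # 720p -> 1440p on an RTX 2060 Super (figure quoted above)
assumed_slowdown = 4                   # hypothetical: XSX takes 4x as long on its shader cores

upscale_xsx_ms = upscale_2060s_ms * assumed_slowdown     # ~5.0ms
render_budget_ms = frame_budget_ms - upscale_xsx_ms      # ~11.6ms left to render the 720p frame

print(f"Upscale: ~{upscale_xsx_ms:.1f}ms, rendering budget left: ~{render_budget_ms:.1f}ms")
```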
 
Would the cost of upscaling impact rendering (compute) performance on AMD hardware if the ML upscale uses the same compute units as the rendering does? Comparing AMD to Nvidia hardware for DLSS-like upscaling really is an unknown, because Nvidia has tensor cores to do the work without taking compute performance away from the rendering. That's not the case on AMD.
 
Plenty of unknowns. But:
The frame needs to be fully rendered at the point of upscaling.
Maybe the next frame could be impacted? But this would just need to be factored in as a fixed cost (see the sketch below).

The XS consoles don't have anywhere close to tensor-core performance, but Intel seems to believe it's possible using dp4a on general GPUs.
If they push it as a general solution and it couldn't work on a reasonable selection of cards, it would look bad for them at that point.

@Dictator covered some of the performance requirements in terms of DLSS in a Switch 2.0 video, and in another video whose topic I can't remember.
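As a sketch of that ordering (every function and timing below is a made-up placeholder, not an engine API), the upscale pass sits after the low-res frame is complete and is budgeted as a fixed cost:

```python
# Made-up frame loop to show where a fixed-cost ML upscale pass would sit at 30fps.
FRAME_BUDGET_MS = 33.3    # 30fps target
UPSCALE_COST_MS = 4.0     # assumed fixed cost of the ML upscale pass

def render_low_res() -> float:         # render the scene at the internal resolution
    return 24.0                        # pretend this took 24ms

def render_ui_and_post() -> float:     # UI / post-processing at output resolution
    return 3.0                         # pretend this took 3ms

def run_frame() -> float:
    elapsed = render_low_res()         # the frame must be fully rendered...
    elapsed += UPSCALE_COST_MS         # ...before the ML upscale can run on it
    elapsed += render_ui_and_post()
    return elapsed

print(f"Frame time: {run_frame():.1f}ms of a {FRAME_BUDGET_MS}ms budget")
```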
 
But Intel seems to believe it's possible using dp4a on general GPUs.
If they push it as a general solution and it couldn't work on a reasonable selection of cards, it would look bad for them at that point.

Intel stated it would work on general GPUs, but they haven't guaranteed it would work fast enough to actually be usable.

I've re-watched Alex's video on Switch Pro with DLSS, and it may be that XSS simply doesn't have the ML performance for upscaling.

XSX may be OK with it, but I have doubts that XSS will be, which is a shame as it's the machine that needs it the most.
 
Intel stated it would work on general GPUs, but they haven't guaranteed it would work fast enough to actually be usable.
Agreed, but I suspect it will be serviceable.
Even if it means having to lower settings that you wouldn't need to lower on Arc, that lowered setting would still be better than going without XeSS. But again, this is nothing more than a gut feeling.
XSX may be OK with it, but I have doubts that XSS will be, which is a shame as it's the machine that needs it the most.
I agree it needs it more.
Not because it's a bad machine, but because the lower your base resolution is, the better the upscaler needs to be, as it has less pixel data to work with.

I don't know if going from 540p to 1080p would give a native-1080p-like image; I need to check his video again.
From 720p, I suspect it would be OK.

Right now we really don't know.
I would describe myself as just hopeful.
 
XSS has enough power, especially if we're talking 33.3ms frames. 16.6ms will be tighter, as there is very little time to do anything at 16.6ms.
 

Can I ask what you're basing that on? Comparing the raw ML specs for both Series consoles with an RTX 2060:

RTX 2060 (non-Super)

98 TOPS for 8-bit integer operations
196 TOPS for 4-bit integer operations

Series-X

49 TOPS for 8-bit integer operations
97 TOPS for 4-bit integer operations

Series-S

17 TOPS for 8-bit integer operations
34 TOPS for 4-bit integer operations

On raw specs, an RTX 2060 has roughly 2x the ML throughput of Series-X and 5.7x that of Series-S.
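For what it's worth, the ratios implied by those figures, just dividing the numbers above (the 5.7x is the same ratio rounded down):

```python
# Ratios implied by the TOPS figures listed above.
rtx2060 = {"INT8": 98, "INT4": 196}
series_x = {"INT8": 49, "INT4": 97}
series_s = {"INT8": 17, "INT4": 34}

for prec in ("INT8", "INT4"):
    print(f"{prec}: RTX 2060 is {rtx2060[prec] / series_x[prec]:.1f}x Series-X "
          f"and {rtx2060[prec] / series_s[prec]:.1f}x Series-S")
```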
 
But it comes down to how much you can save doing ML upscaling vs. going native. If you use 33.3ms for native 1440p or 1080p or whatever, how many ms will you save, if any, by doing ML upscaling? And if that saving is 1ms or 15ms, what kind of cool other thing can you squeeze into that slot? I mean, what is the point if you use 33.3ms for one frame natively and the exact same time to do it with upscaling?
 

But the upscaling itself is not 'free'.

Let's take the DF example where it took an RTX 2060 Super 1.26ms to upscale from 720p to 1440p.

An RTX 2060 Super has about 12% more throughput than a regular RTX 2060.

So, dumb math time: 1.26ms + 12% ≈ 1.43ms on a regular RTX 2060.

A regular RTX 2060 has 5.7x more raw ML performance than Series-S, so that same upscale from 720p to 1440p would take around 8.1ms.

Now there's a lot that's wrong with my math, as it's not perfect and we don't know how running ML tasks on RDNA2 affects the rest of the pipeline, so it may end up being even more in the real world.

But it looks like 60fps is off the cards unless you want to spend half your frame budget just on upscaling, and then there's the question of whether developers would be able to budget for or absorb 8ms+ of upscale time, not in a cross-gen game but in an actual 30fps next-generation one.
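That dumb math, written out (the scaling factors are the assumptions above, and we still don't know how RDNA2 behaves in practice):

```python
# Back-of-the-envelope scaling of the DF figure, using the same assumptions as the post above.
upscale_2060_super_ms = 1.26       # 720p -> 1440p on an RTX 2060 Super (DF figure)
super_advantage = 1.12             # assumed: 2060 Super ~12% more throughput than a plain 2060
tops_ratio_2060_vs_xss = 5.7       # raw ML spec ratio, RTX 2060 vs Series-S

upscale_2060_ms = upscale_2060_super_ms * super_advantage     # ~1.4ms
upscale_xss_ms = upscale_2060_ms * tops_ratio_2060_vs_xss     # ~8ms, in line with the 8.1ms above

for fps in (30, 60):
    budget_ms = 1000 / fps
    share = upscale_xss_ms / budget_ms
    print(f"{fps} fps: ~{upscale_xss_ms:.1f}ms upscale out of a {budget_ms:.1f}ms budget ({share:.0%})")
```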
 
Can I ask what you're basing that on? Comparing the raw ML specs for both Series consoles with an RTX 2060:

RTX 2060 (non-Super)

98 TOPS for 8-bit integer operations
196 TOPS for 4-bit integer operations

Series-X

49 TOPS for 8-bit integer operations
97 TOPS for 4-bit integer operations

Series-S

17 TOPS for 8-bit integer operations
34 TOPS for 4-bit integer operations

On raw specs, an RTX 2060 has roughly 2x the ML throughput of Series-X and 5.7x that of Series-S.
The depth of the network is what determines how long the inference is going to take; flops only determine how quickly you get through that network. If you can complete the frame in 16.6ms and you had another 16.6ms to upscale, that would be plenty. The performance profile of RTX machines with tensor cores is aimed at upscaling while keeping FPS high. That doesn't necessarily have to be the goal here, so you can aim for a lower upscale resolution and a lower frame rate.

Typically, upscaling algorithms can run fairly well at 30fps on much weaker hardware than what's found in a Series S.

This one runs on an Intel iGPU... rather, this one does not; the older one did. They redid the presentation, it seems. I'll find the older one.


Edit: the original video (16:00 onward).
2nd edit: it was not using an Intel iGPU, lol. It's not clear what they used, but it was much weaker than the 2080 Ti above.
https://on-demand.gputechconf.com/s...-gpu-inferencing-directml-and-directx-12.html
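One way to picture the "network size vs. throughput" point is a toy model where the upscale time is just the work in the network divided by the sustained throughput. The op count and the 50% efficiency below are invented for illustration; only the peak INT8 TOPS figures come from the numbers quoted earlier in the thread:

```python
# Toy model: upscale time ~= work in the network / sustained throughput.
# The 0.1 tera-ops per frame and 50% efficiency are invented; peak TOPS are from the thread.
def upscale_time_ms(work_teraops: float, peak_tops: float, efficiency: float = 0.5) -> float:
    return work_teraops / (peak_tops * efficiency) * 1000

model_work = 0.1                     # hypothetical tera-ops per upscaled frame
for name, peak_int8_tops in (("Series X", 49), ("Series S", 17)):
    ms = upscale_time_ms(model_work, peak_int8_tops)
    print(f"{name}: ~{ms:.1f} ms of a 33.3 ms (30 fps) frame")
```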
 
Intel stated it would work on general GPUs, but they haven't guaranteed it would work fast enough to actually be usable.

I've re-watched Alex's video on Switch Pro with DLSS, and it may be that XSS simply doesn't have the ML performance for upscaling.

XSX may be OK with it, but I have doubts that XSS will be, which is a shame as it's the machine that needs it the most.

Hard to know, because the tensor cores weren't designed with just DLSS in mind, and neither is the number of tensor cores that appears on each chip.
 
DirectML just needs Kepler or Radeon 7000 series or newer, and it also has a CPU fallback. It's vendor agnostic too.

I just don't think it's fully ready yet, but we should see it at some point in the future on the consoles. Kepler is a 10-year-old GPU.
 
https://www.gtplanet.net/how-gran-t...olyphony-digital-and-sony-ais-new-technology/
To help speed up the process, Sophy controls 20 cars on track at the same time. The results are fed into servers with NVIDIA V100 or A100 chips, server-grade GPUs designed to process artificial intelligence and machine learning data.

It is important to note this type of computing power is only needed to “create” Sophy, not run it. The machine learning process eventually results in “models” which can then be executed on more modest hardware.

“The learning of Sophy is parallel processed using compute resources in the cloud, but if you are just executing an already learned network, a local PS5 is more than adequate,” Kazunori Yamauchi explained. “The asymmetry of this computing power is a general characteristic of neural networks.”
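A minimal sketch of that training/inference asymmetry (every number here is invented; the extra cost of backpropagation is just a common rule of thumb):

```python
# Invented numbers, purely to illustrate why training needs server GPUs but inference doesn't.
forward_pass_cost = 1.0          # relative cost of one forward pass (one in-game decision)
backprop_overhead = 2.0          # rule of thumb: backward pass costs roughly 2x the forward pass
training_steps = 5_000_000       # hypothetical number of learning updates done in the cloud

training_cost = training_steps * forward_pass_cost * (1 + backprop_overhead)
inference_cost = forward_pass_cost

print(f"Training ~= {training_cost / inference_cost:,.0f}x the cost of a single inference pass")
```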
 
DirectML just needs Kepler or Radeon 7000 series or newer, and it also has a CPU fallback. It's vendor agnostic too.

I just don't think it's fully ready yet, but we should see it at some point in the future on the consoles. Kepler is a 10-year-old GPU.

The question is how efficiently these operations can run on GPUs.

Of course, DirectML has support for all GPUs, generally speaking. However, it really depends on the GPU whether it makes sense to run neural networks on it while handling other heavy tasks such as gaming.

Hardware like RDNA2 has support for lower-precision ML operations, INT8 and INT4, which provide a significant speedup compared to FP16. It makes sense to run gaming-related neural networks at lower precision, because these kinds of networks usually don't need higher precision, and the resulting speedup is crucial when gaming.

Older hardware like Maxwell, RDNA1, GCN etc. doesn't support these lower-precision ML operations, so on those cards you have to use FP16, which is significantly slower, and then the question becomes whether the hit to gaming performance from running these deep neural networks concurrently is worth the gain you get from them. The short answer is no. If the answer were yes, we would have seen neural networks used for gaming purposes before the next-gen consoles and Turing launched.

RTX GPUs even have dedicated ML hardware that accelerates these INT8/INT4 instructions. The throughput of the tensor cores for these kinds of tasks is not only much higher than that of the shader cores, they also don't impact gaming performance as much as they would on, say, RDNA2 without dedicated ML hardware. That is why DLSS upscaling is so extremely fast (0.8ms on a 2060 at 1080p); without tensor cores, running on the shader cores, it would be much slower.
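As a rough illustration of the precision point, here is how a fixed network's runtime would scale using the Series X figures quoted earlier, assuming FP16 throughput is about half the INT8 rate (an assumption; only the 49/97 TOPS numbers come from the thread):

```python
# Illustrative only: relative inference time for the same network at different precisions,
# assuming each step down in precision roughly doubles throughput on RDNA2-class hardware.
xsx_int8_tops = 49
xsx_int4_tops = 97
xsx_fp16_tops = xsx_int8_tops / 2      # assumption: FP16 at ~half the INT8 rate

fp16_time = 1.0                                          # normalise FP16 inference time to 1.0
int8_time = fp16_time * xsx_fp16_tops / xsx_int8_tops    # ~0.5x
int4_time = fp16_time * xsx_fp16_tops / xsx_int4_tops    # ~0.25x

print(f"Relative inference time  FP16: {fp16_time:.2f}  INT8: {int8_time:.2f}  INT4: {int4_time:.2f}")
```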
 