Machine Learning: WinML/DirectML, CoreML & all things ML

DegustatoR · May 24, 2023

DirectML Unlocks New AI Silicon for Adobe - DirectX Developer Blog

DirectML is ushering in a new wave of machine learning integration capabilities on emerging AI silicon. At the Windows AI Breakout BUILD session, we showcased how ONNX Runtime, powered by DirectML, enables Adobe’s Premiere Pro to leverage Intel’s next generation platform,

devblogs.microsoft.com

Jay · May 29, 2023

TopSpoiler said:
Random-Access Neural Compression of Material Textures | Research

The continuous advancement of photorealism in rendering is accompanied by a growth in texture data and, consequently, increasing storage and memory demands. To address this issue, we propose a novel neural compression technique specifically designed for material textures. We unlock two more...

research.nvidia.com

Sounds like what Xbox said they were working on,
back when everyone thought it was image reconstruction.
Haven't heard anything about it since, though, so who knows if they're still working on it.

Deleted member 2197 · Aug 1, 2023

July 31, 2023

Stable Diffusion Performance - NVIDIA GeForce VS AMD Radeon

Stable Diffusion is seeing more use for professional content creation work. How do NVIDIA GeForce and AMD Radeon cards compare in this workflow?

www.pugetsystems.com

Stable Diffusion Performance - NVIDIA RTX vs Radeon PRO

Stable Diffusion is seeing more use for professional content creation work. How do NVIDIA RTX and Radeon PRO cards compare in this workflow?

www.pugetsystems.com

Deleted member 2197 · Dec 17, 2023

December 15, 2023

Stable Diffusion Benchmarks: 45 Nvidia, AMD, and Intel GPUs Compared

Which graphics card offers the fastest AI performance?

www.tomshardware.com

Clearly, this look at FP16 compute doesn't match our actual performance much at all. That's because optimized Stable Diffusion implementations will opt for the highest throughput possible, which doesn't come from GPU shaders on modern architectures. That brings us to the Tensor, Matrix, and AI cores on the various GPUs.
...
It's interesting to see how the above chart showing theoretical compute lines up with the Stable Diffusion charts. The short summary is that a lot of the Nvidia GPUs land about where you'd expect, as do the AMD 7000-series parts. But the Intel Arc GPUs all seem to get about half the expected performance — note that my numbers use the boost clock of 2.4 GHz rather than the lower 2.0GHz "Game Clock" (which is a worst-case scenario that rarely comes into play, in my experience).

DegustatoR · Feb 1, 2024

Introducing Neural Processor Unit (NPU) support in DirectML (developer preview) - DirectX Developer Blog

With the release of DirectML 1.13.1 and the ONNX Runtime 1.17, we are excited to announce developer preview support for NPU acceleration in DirectML, the machine learning platform API for Windows. This developer preview enables support for a subset of models on new Windows 11 devices with Intel®...

devblogs.microsoft.com

Davros · Feb 29, 2024

So I can haz a question: take, for example, Nvidia DLSS (Deep Learning Super Sampling). From what I understand, my PC is using an algorithm created on Nvidia supercomputers using ML,
but when, if ever, will the learning take place on my computer (e.g. I use some future version of DLSS and the more I use it the better it gets because it's learning)?

Kaotik · Feb 29, 2024

Davros said:
So I can haz a question: take, for example, Nvidia DLSS (Deep Learning Super Sampling). From what I understand, my PC is using an algorithm created on Nvidia supercomputers using ML,
but when, if ever, will the learning take place on my computer (e.g. I use some future version of DLSS and the more I use it the better it gets because it's learning)?

Never, not current technologies anyway. It doesn't learn on your computer, it gets taught by NVIDIA and you just run it.

Davros · Feb 29, 2024

Kaotik said:
it gets taught by NVIDIA and you just run it.

As stated in my post

but this is the important bit

Kaotik said:
Never, not current technologies anyway

DegustatoR · Feb 29, 2024

You would have to run the software in "learning" mode, which in the case of DLSS presumably means something like 64x SSAA. Do you want to do that?

Deleted member 2197 · Feb 29, 2024

Davros said:
So I can haz a question: take, for example, Nvidia DLSS (Deep Learning Super Sampling). From what I understand, my PC is using an algorithm created on Nvidia supercomputers using ML,
but when, if ever, will the learning take place on my computer (e.g. I use some future version of DLSS and the more I use it the better it gets because it's learning)?

If your question is only about DLSS: not currently, but no one knows what the direction will be over the next 5-10 years. I wouldn't be surprised to see LLMs running locally in automobiles, making minute changes based on real-time data, so why not on the PC?

If you are talking about "training" in general on your PC and have an RTX card, you can already run local LLMs against data residing on your PC today.

Davros · Feb 29, 2024

Well my question was related to games in general and using ML / A.I

DegustatoR · Feb 29, 2024

Games are applications where the user expects predictable results, which isn't something you get from a learning NN, where results may change because of said learning, and not always in a good way.

pcchen · Feb 29, 2024

Currently the problem with training locally is not just that it's very costly, but also that it can be hard to do quality control on the training results.
Take DLSS as an example. The models NVIDIA trains on its supercomputers have to be verified before shipping. That is, they'll run tests on many games to make sure that the new training results are still good. Otherwise, you could have something that's good with the new training data but somehow worse in other scenarios.
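The verification step described above could be sketched as a simple regression gate. This is only an illustration: the function names, the per-game quality scores and the tolerance are all hypothetical, and nothing here reflects how NVIDIA actually validates DLSS models.

```python
from typing import Dict, List


def regressions(
    old_scores: Dict[str, float],
    new_scores: Dict[str, float],
    tolerance: float = 0.0,
) -> List[str]:
    """Return the titles where the new model scores worse than the old one.

    A title missing from new_scores counts as a regression.
    """
    return sorted(
        title
        for title, old in old_scores.items()
        if new_scores.get(title, float("-inf")) < old - tolerance
    )


def should_ship(
    old_scores: Dict[str, float],
    new_scores: Dict[str, float],
    tolerance: float = 0.0,
) -> bool:
    """Ship only if no tested title got worse beyond the tolerance."""
    return not regressions(old_scores, new_scores, tolerance)


# Placeholder scores for two hypothetical titles (not measurements).
old = {"game_a": 0.90, "game_b": 0.85}
new = {"game_a": 0.95, "game_b": 0.80}  # better on one title, worse on the other

failed = regressions(old, new)
ok_to_ship = should_ship(old, new)
```

A model trained on the user's own machine would skip this gate entirely, which is the quality-control problem in a nutshell.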

Davros · Mar 16, 2024

Quick question : If something has the ML performance of 300TOPS and 67 TFLOPs of 16-bit floating point (I have no idea what that means) is that impressive ?

pcchen · Mar 16, 2024

Davros said:
Quick question : If something has the ML performance of 300TOPS and 67 TFLOPs of 16-bit floating point (I have no idea what that means) is that impressive ?

It depends on what these numbers mean, though. For example, a 4090 has 82.6 TFLOPS of peak FP16 performance (non-tensor). When using tensor cores, it's 330 TFLOPS with FP16 accumulate and 165 TFLOPS with FP32 accumulate. The TOPS number likely refers to FP8/INT8 performance, and the 4090 has 660 TOPS with tensor cores (both FP8 with FP16 accumulate and INT8).
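As a quick ratio check, here are the figures from the question next to the 4090 figures quoted above. Which 4090 number each one should be compared against is an assumption (67 TFLOPS against non-tensor FP16, 300 TOPS against INT8/FP8 tensor throughput), so treat the result as a rough orientation only.

```python
# Peak RTX 4090 figures as quoted in this post.
RTX_4090 = {
    "fp16_shader_tflops": 82.6,
    "fp16_tensor_fp16_acc_tflops": 330.0,
    "fp16_tensor_fp32_acc_tflops": 165.0,
    "int8_fp8_tensor_tops": 660.0,
}

# Figures from the question; the mapping to the units above is an assumption.
QUESTION = {"fp16_tflops": 67.0, "ml_tops": 300.0}


def ratio(a: float, b: float) -> float:
    """Return a / b rounded to two decimals."""
    return round(a / b, 2)


fp16_vs_4090 = ratio(QUESTION["fp16_tflops"], RTX_4090["fp16_shader_tflops"])
tops_vs_4090 = ratio(QUESTION["ml_tops"], RTX_4090["int8_fp8_tensor_tops"])

print(f"FP16: {fp16_vs_4090}x of a 4090, TOPS: {tops_vs_4090}x of a 4090")
```

If the 67 TFLOPS turned out to be a tensor-style number instead, the fair comparison would be against 330 or 165 TFLOPS and the ratio would look very different.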

So basically these numbers could mean a lot of things, and whether that's impressive really depends on what exactly they are. The amount of memory and the memory bandwidth are also important in some applications. For example, when doing LLM inference, a smaller model that fits inside the 4090's 24GB of memory will run much faster than on an M3 Max, but a larger model requiring more than 24GB will run faster on an M3 Max with, say, 64GB or 128GB of memory.
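The memory argument can be sketched with a back-of-the-envelope model. It assumes single-stream decoding is bound by reading every weight once per token, and it ignores the KV cache, activations and OS overhead. The bandwidth figures (1008 GB/s and 400 GB/s) and the two model sizes are illustrative assumptions, not numbers from this thread.

```python
from typing import Optional


def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (weights only)."""
    return params_billion * bytes_per_param


def decode_tokens_per_s_upper_bound(
    model_gb: float, mem_gb: float, bandwidth_gb_s: float
) -> Optional[float]:
    """Rough ceiling on tokens/s if each token reads all weights once.

    Returns None when the weights do not fit in memory; offloading to
    slower memory is not modelled here.
    """
    if model_gb > mem_gb:
        return None
    return bandwidth_gb_s / model_gb


# Assumed device figures for illustration.
GPU_4090 = {"mem_gb": 24.0, "bandwidth_gb_s": 1008.0}
M3_MAX = {"mem_gb": 64.0, "bandwidth_gb_s": 400.0}

small = weights_gb(13, 1.0)  # hypothetical 13B model at 8 bits per weight -> 13 GB
large = weights_gb(70, 0.5)  # hypothetical 70B model at 4 bits per weight -> 35 GB

for name, size in (("small", small), ("large", large)):
    on_4090 = decode_tokens_per_s_upper_bound(size, **GPU_4090)
    on_m3 = decode_tokens_per_s_upper_bound(size, **M3_MAX)
    print(name, size, on_4090, on_m3)
```

In this simplified model the small case favours the card with more bandwidth, while the large case simply does not fit in 24GB and only the larger memory pool can run it without offloading.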

Newguy · Mar 16, 2024

Davros said:
Quick question : If something has the ML performance of 300TOPS and 67 TFLOPs of 16-bit floating point (I have no idea what that means) is that impressive ?

In absolute terms it's not a world beater, but given the general constraints and value of the package as a whole it's impressive.

However, there are some asterisks here. The FP16/FP32 numbers are doubled with dual issue, and I don't know how widespread or useful that is; maybe it's easy to do and will be common, but I don't know if that's the case right now. Then, as pcchen says, bandwidth is potentially problematic: apparently 576GB/s on the Pro vs 448GB/s, which is 1.29x. That's ~1.3x the bandwidth for 3.26x the flops with dual issue, or about 0.4x the bandwidth per flop for the shaders. On top of that, we don't know how nicely the NPU (or whatever they're going to call it) will play with the rest of the system with respect to bandwidth: how much bandwidth is required to exploit its performance, whether contention will reduce the bandwidth available to the GPU and CPU, and whether the GPU and NPU can run concurrently, although I presume they will.

If everything works out well, next gen could be a very exciting time, when both will have the extra number crunching. Devs are good at finding interesting and unexpected ways to get the most out of hardware, so this is exciting even if it's limited.
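The bandwidth-per-flop arithmetic above, spelled out. All inputs are the rumoured figures quoted in this post, not confirmed specs. The "without dual issue" case simply halves the FLOPS uplift, on the assumption that dual issue is what doubles the number.

```python
# Figures as quoted in the post (rumoured, unconfirmed).
bandwidth_new_gb_s = 576.0
bandwidth_old_gb_s = 448.0
flops_ratio_dual_issue = 3.26

bandwidth_ratio = bandwidth_new_gb_s / bandwidth_old_gb_s

# Bandwidth available per unit of compute, relative to the older part.
per_flop_dual_issue = bandwidth_ratio / flops_ratio_dual_issue

# Assumption: without dual issue the FLOPS uplift is half as large.
per_flop_single_issue = bandwidth_ratio / (flops_ratio_dual_issue / 2)

print(f"bandwidth:                        {bandwidth_ratio:.2f}x")
print(f"bandwidth per flop (dual issue):  {per_flop_dual_issue:.2f}x")
print(f"bandwidth per flop (single issue): {per_flop_single_issue:.2f}x")
```

So code that actually reaches the dual-issue peak would have roughly 0.4x the bandwidth per flop it had before, while code that never dual-issues is closer to 0.8x.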

Davros · Mar 16, 2024

Thanks everyone?

Remij · Mar 16, 2024

Was thinking.. could AI (ML) be trained to optimize asset loading and streaming during gameplay in video games? Surely it can become better at predicting player patterns and inputs... so realistically devs should be able to allow for finer-grained loading and streaming of assets through much better algorithms designed to do specifically that, right?

What about another scenario.. Let's say you're on your PS6/XSX2 dashboard, and you've got your library of installed games and whatever else. What if, as you hovered over an installed game, it started loading in the background before you even pressed the button.. cutting down the perceived startup time even further. I know this doesn't particularly require ML.. just saying lol
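A minimal sketch of the prediction idea, using plain transition counts rather than a trained network (so it is a simple stand-in for ML, not an ML model). The class name, the zone names and the notion of "zones" as the unit of streaming are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Hashable, List


class NextAssetPredictor:
    """Counts observed transitions and predicts the most frequent successors."""

    def __init__(self) -> None:
        self._counts: DefaultDict[Hashable, Counter] = defaultdict(Counter)

    def observe(self, current: Hashable, following: Hashable) -> None:
        """Record that the player went from `current` to `following`."""
        self._counts[current][following] += 1

    def predict(self, current: Hashable, top_k: int = 1) -> List[Hashable]:
        """Return up to top_k successors of `current`, most frequent first."""
        if current not in self._counts:
            return []
        return [item for item, _ in self._counts[current].most_common(top_k)]


predictor = NextAssetPredictor()
for nxt in ("forest", "cave", "forest"):
    predictor.observe("hub", nxt)

# Candidates to start streaming while the player is still in the hub.
to_prefetch = predictor.predict("hub", top_k=2)
```

The dashboard case is the same pattern with hover events as the trigger: whatever the predictor (or the cursor) points at gets loaded speculatively, at the cost of wasted I/O when the guess is wrong.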

Davros · Mar 16, 2024

I asked the question of whether ML was ever going to be performed on the PC and was told no, but we are to believe (if the rumours are true) that it will be done on the PS5. Why the discrepancy?

Kaotik · Mar 16, 2024

Davros said:
I asked the question of whether ML was ever going to be performed on the PC and was told no, but we are to believe (if the rumours are true) that it will be done on the PS5. Why the discrepancy?

It's already 'being done' on PC, depending on your definition of machine learning (they're doing inference). Could you be more specific?
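To make the training vs inference distinction concrete, here is a toy linear model in plain Python. It is only a sketch: the weights, inputs and learning rate are made up, and a real upscaler is vastly larger, but the split is the same. Inference only reads the shipped weights, while a training step produces new ones and needs a known target to compare against.

```python
from typing import List, Sequence


def infer(weights: Sequence[float], inputs: Sequence[float]) -> float:
    """Inference: read the weights, compute an output, change nothing."""
    return sum(w * x for w, x in zip(weights, inputs))


def train_step(
    weights: Sequence[float],
    inputs: Sequence[float],
    target: float,
    lr: float = 0.01,
) -> List[float]:
    """Training: one gradient-descent step on squared error, returning new weights."""
    error = infer(weights, inputs) - target
    return [w - lr * error * x for w, x in zip(weights, inputs)]


shipped = (0.5, -0.25)  # hypothetical weights "taught" elsewhere and shipped frozen
sample = [2.0, 4.0]

before = infer(shipped, sample)  # what the end user's machine does
updated = train_step(shipped, sample, target=1.0)  # what the vendor does offline
after = infer(updated, sample)
```

Running `infer` any number of times leaves `shipped` untouched, which is why using the feature more does not make it better on its own.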
