Machine Learning: WinML/DirectML, CoreML & all things ML

tuna · Aug 29, 2024

pcchen said:
Anway, this seems to me that Jen-Sun Huang's "every pixel could be AI generated in 10 years" comment is no longer a pipe dream.

It still is.

pcchen · Aug 30, 2024

tuna said:
It still is.

Technically it's already demonstrated. Every pixel in that video was AI generated, thus it's been done.

raytracingfan · Jan 9, 2025

DavidGraham said:
New faster material rendering using Neural Rendering.

NVIDIA Showcases Real-Time Neural Materials Models, Offering Up To 24x Shading Speedup

NVIDIA unveils a new real-time neural materials models approach, offering a huge 12-24x speed in shading performance vs traditional methods.

wccftech.com

This will be released as RTX Neural Materials.

NVIDIA RTX Neural Rendering Introduces Next Era of AI-Powered Graphics Innovation | NVIDIA Technical Blog

NVIDIA today unveiled next-generation hardware for gamers, creators, and developers—the GeForce RTX 50 Series desktop and laptop GPUs. Alongside these GPUs, NVIDIA introduced NVIDIA RTX Kit…

developer.nvidia.com

This is even more interesting than DLSS 4 in my opinion because it makes neural rendering a core, integral part of the graphics pipeline, not something applied at the end like DLSS.

MfA · Jan 9, 2025

In a way it might be better to apply it at the end. Having golden samples which must be displayed makes things hard for AI, it's a direct cause of instability.

Maybe it's time for deferred hallucination. Have a subsampled G-buffer as a mere suggestion to the neural renderer, then let the Deep Learning Hallucinating Renderer make something up (for framegen there would also be frame time and view matrix and the G-buffer might be no-sampled).

PS. not being sarcastic.

Dictator · Jan 13, 2025

Perhaps this should be another thread, but with "Cooperative Vectors" API coming to Direct X, devs can now access tensor cores for normal shading/compute shaders and thus leverage quick ML performance for games.
Enabling Neural Rendering in DirectX: Cooperative Vector Support Coming Soon

What are Cooperative Vectors, and why do they matter?
Cooperative vector support will accelerate AI workloads for real-time rendering, which directly improves the performance of neural rendering techniques. It will do so by enabling multiplication of matrices with arbitrarily sized vectors, which optimize the matrix-vector operations that are required in large quantities for AI training, fine-tuning, and inferencing. Cooperative vectors also enable AI tasks to run in different shader stages, which means a small neural network can run in a pixel shader without consuming the entire GPU. Cooperative vectors will enable developers to seamlessly integrate neural graphics techniques into DirectX applications and light up access to AI-accelerator hardware across multiple platforms. Our aim is to provide game developers with the cutting-edge tools they need to create the next generation of immersive experiences.

Intel, AMD, and QUALCOMM support is due as the blog mentions - after having seen the Neural Rendering demos I am 100% onboard for this and would love to see how devs can manage to use this type of feature. Even if it just means getting better universal upscaling in the mid-term or other smaller things.

TopSpoiler · Jan 15, 2025

https://twitter.com/x/status/1876835383842857461

I don't know if this means real-time training or pre-training.

TopSpoiler · Jan 22, 2025

Vulkan Cooperative Matrix Multiply (2022)

https://www.khronos.org/assets/uploads/developers/presentations/Cooperative_Matrix_May22.pdf

This presentation will provide some insight into DirectX Cooperative Vectors.

MfA · Jan 25, 2025

Deepseek has just flipped the script. It used to be believed that MoE could not compete for the highest end frontier models, yet they are competing ... at over an order of magnitude lower computational complexity, due to the combination of MoE, native fp8 and multi-token prediction. They also went a large way towards solving KV cache memory consumption (and Tencent took it further).

This is bringing the potential of very potent language models running off SSD much closer. Deepseek V3 is obviously still a little too complex for that, but something the size of Aria gets close (potent in its own right, but less architecturally adventurous than Deepseek V3). A MoE model specifically designed for SSD would be trained a bit differently too, by restriction on the amount of new experts per tokens instead of simple top-K gating. Pre-gating using outputs from a layer earlier is also an option, so expert selection and layer computation can be pipelined ... though not essential since these MoE models are so fast to compute, compute is almost irrelevant.

I think MoE is going to invade everything soon, not just language models but image gen and rendering too. Dense is dead, everything is going to get ~10x cheaper to run.

DavidGraham · Jan 25, 2025

MfA said:
Deepseek has just flipped the script. It used to be believed that MoE could not compete for the highest end frontier models, yet they are competing ... at over an order of magnitude lower computational complexity, due to the combination of MoE, native fp8 and multi-token prediction

It was revealed that DeepSeek has been trained on a cluster of 50k H100s, does that still count as lower computational complexity?

According to Wang, when it comes to the Chinese accessing NVIDIA's advanced GPUs, "the reality is yes and no. You know the Chinese labs, they have more H100s than, than people think." He added and shared that his "understanding is that DeepSeek has about fifty thousand H100s." Wang outlined, "they can't talk about obviously because it is against the export controls that United States has put in place." He also thinks that "they have more chips than other people expect."

Chinese AI Lab DeepSeek Has 50,000 NVIDIA H100 AI GPUs, Says AI CEO

According to Scale AI CEO and founder Alexandr Wang, Chinese AI lab DeepSeek behind the popular R1 model has access to 50,000 NVIDIA H100 GPUs.

wccftech.com

MfA · Jan 25, 2025

DavidGraham said:
It was revealed that Deepseek has been trained on a cluster of 50k H100s, does that still count as lower computational complexity?

The inference has MoE, native fp8 and multi-token prediction. It's open weights, it's factual.

Whether they can really use fp8 matmul and sparse experts at training time is academic for actually using the models. Though I choose to believe they can.

TopSpoiler · Feb 7, 2025

NVIDIA today released several SDKs for neural rendering as part of the RTX Kit.

The interesting part about Neural Texture Compression is that it provides a fallback method depending on the ML capabilities of the GPU.
RTX 40 and higher GPUs can use realtime sampling directly from NTC-compressed textures.
Otherwise, the NTC textures are transcoded to BCn format at load time and the classic sampling method is used. This does not save VRAM and results in quality loss.

GPU for NTC decompression on load and transcoding to BCn:

Minimum: Anything compatible with Shader Model 6

Recommended: NVIDIA Turing (RTX 2000 series) and newer.

GPU for NTC inference on sample:

Minimum: Anything compatible with Shader Model 6 (will be functional but very slow)

Recommended: NVIDIA Ada (RTX 4000 series) and newer.

Dampf · Feb 7, 2025

TopSpoiler said:
NVIDIA today released several SDKs for neural rendering as part of the RTX Kit.

Otherwise, the NTC textures are transcoded to BCn format at load time and the classic sampling method is used. This does not save VRAM and results in quality loss.

No, it still uses the new neural rendering format, but it's just quite slow without Ada+ hardware. I was playing around with the demo a little bit on my 2060 laptop. The closer you get to the textures, the more performance it costs. But memory savings are huge.

Interestingly, DLSS reduces performance in this demo. But I was expecting that since the tensor cores would have less compute available for NTC then.

Pjotr · Feb 7, 2025

Hmm I don't know... I had my doubt with this approach because it mandated dlss to do the filtering. And it works mostly but when you zoom in you still see noise, especially on the logo. I wonder how this will do with game materials that could become more complex. I also wonder how it will look when upscaled in performance mode.
But apart from that I also see some limitations to the approach that I think could limit the adaptation in the near term at least:
It compresses textures in bundles with the assumption that they will all be used together. I don't think this will always be true in game, artists often share some textures across multiple materials. Especially with shader graphs this is made easier. Also textures can be swapped programmatically. This will make it hard for cooking/packaging tools to see which textures need to be compressed together and you will probably have artists to manually annotate this. All this will lead to lower memory savings than what nVidia claims now.
Right now there is a fallback for cards that have a lower performance. This fallback does not produce the same results, especially under magnification (see the logo in the demo). Neither result is wrong as an artist will tweak the textures to get the desired effect, the problem is you now have to choose if you want the desired effect be in the fallback or the new method.
The big reliance on dlss will make this system perform differently on systems with a different upscalers. Already in the demo there is more noise when switching to TAA. So it might be a deal-breaker on consoles till they get the same quality upscalers as dlss.

Pjotr · Feb 10, 2025

TopSpoiler said:
NVIDIA today released several SDKs for neural rendering as part of the RTX Kit.

The interesting part about Neural Texture Compression is that it provides a fallback method depending on the ML capabilities of the GPU.
RTX 40 and higher GPUs can use realtime sampling directly from NTC-compressed textures.
Otherwise, the NTC textures are transcoded to BCn format at load time and the classic sampling method is used. This does not save VRAM and results in quality loss.

So I just noticed that while the SDK for neural materials in not out yet the underlying library for neural shading is out: https://github.com/NVIDIA-RTX/RTXNS
It has a few examples but nothing really interesting that shows the benefits, the examples are just there to get you started to train your own shaders. You can read more about the examples here: https://github.com/NVIDIA-RTX/RTXNS/blob/main/docs/QuickStart.md
I compiled it on my system but like I said the examples are too basic to show any real benefit. For example the first example of the Disney-BRDF is so basic that the neural shader is slower than the full shader. But what the example does show is the quality loss of the neural approximation which some might find interesting.

TopSpoiler · Mar 20, 2025

Intel Co-Presents Cooperative Vectors with Microsoft at Game Developer Conference 2025

This technical deep dive was written by Jonathan Dupuy, and is a joint work with Anis Benyoub, Laurent Belcour, Mariusz Merecki, and Thomas Chambon as part of their research efforts at Intel Labs. Highlights: Microsoft will showcase Cooperative Vectors on DirectX during the Advanced...

community.intel.com

... With this new feature, developers can unleash the full power of the XMX units on modern Intel GPUs. Cooperative Vector support will be available on discrete GPUs, such as Intel® Arc™ (A series and B series), and built-in Intel® Arc™ GPUs in Intel® Core™ Ultra Processors (Series 2).

Intel is excited to be joining Microsoft on stage to demonstrate the benefits of Cooperative Vectors on Intel products. Anis Benyoub from Intel’s Graphics Research team will explain how this new DirectX feature enabled a 10x gain in inference performance for Neural Block Texture Compression.

Machine Learning: WinML/DirectML, CoreML & all things ML

Moderator

What are Cooperative Vectors, and why do they matter?​

What are Cooperative Vectors, and why do they matter?