Game development presentations - a useful reference

NVIDIA released a paper covering Floating Point and IEEE 754 Compliance for NVIDIA GPUs.

The key points covered are the following:

Use the fused multiply-add operator.
The fused multiply-add operator on the GPU has high performance and increases the accuracy of computations. No special flags or function calls are needed to gain this benefit in CUDA programs. Understand that a hardware fused multiply-add operation is not yet available on the CPU, which can cause differences in numerical results.
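As a minimal illustration (kernel and buffer names here are made up, not from the paper): nvcc normally contracts a multiply followed by an add into a single FMA, and fmaf()/__fmaf_rn() request the fused, single-rounded operation explicitly.

```cpp
// Illustrative CUDA kernel. nvcc normally contracts a * x[i] + y[i] into one
// FMA instruction; fmaf() asks for the fused, single-rounded operation
// explicitly, and __fmaf_rn() does the same with round-to-nearest.
__global__ void axpy_fma(const float* x, const float* y, float a, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = fmaf(a, x[i], y[i]);  // one rounding step instead of two
}
```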

Compare results carefully.
Even in the strict world of IEEE 754 operations, minor details such as organization of parentheses or thread counts can affect the final result. Take this into account when doing comparisons between implementations.
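A tiny host-side example of why grouping matters (values chosen purely to make the effect visible): float addition is not associative, so a GPU reduction that parenthesizes a sum differently from a serial CPU loop can legitimately produce a different result.

```cpp
// Host-side illustration: float addition is not associative, so grouping matters.
#include <cstdio>

int main()
{
    float a = 1.0e8f, b = -1.0e8f, c = 1.0f;
    float left  = (a + b) + c;  // 1.0f: the large terms cancel first
    float right = a + (b + c);  // 0.0f: c is absorbed into b before the cancellation
    printf("left = %g, right = %g\n", left, right);
    return 0;
}
```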

Know the capabilities of your GPU.
The numerical capabilities are encoded in the compute capability number of your GPU. Devices of compute capability 2.0 and later are capable of single and double precision arithmetic following the IEEE 754 standard, and have hardware units for performing fused multiply-add in both single and double precision.
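For reference, a short host-side sketch of checking this at runtime through the CUDA runtime API:

```cpp
// Query the compute capability of each visible device via the CUDA runtime API.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s, compute capability %d.%d\n",
               dev, prop.name, prop.major, prop.minor);
        // Compute capability 2.0+ => IEEE 754 single/double arithmetic and
        // hardware FMA in both precisions, per the paper's summary.
    }
    return 0;
}
```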

Take advantage of the CUDA math library functions.
These functions are documented in the CUDA C++ Programming Guide [7]. The math library includes all the math functions listed in the C99 standard [3] plus some additional useful functions. These functions have been tuned for a reasonable compromise between performance and accuracy. We constantly strive to improve the quality of our math library functionality. Please let us know about any functions that you require that we do not provide, or if the accuracy or performance of any of our functions does not meet your needs. Leave comments in the NVIDIA CUDA forum or join the Registered Developer Program and file a bug with your feedback.
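As a quick illustration (the kernel and buffer names are invented), a few of the documented CUDA Math API functions in device code:

```cpp
// Device-side sketch using a few CUDA Math API functions beyond plain C99
// (norm3df, sinpif, rsqrtf are documented math library functions).
__global__ void math_demo(const float3* p, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float len   = norm3df(p[i].x, p[i].y, p[i].z);  // length of (x, y, z), computed robustly
        float phase = sinpif(0.25f * i);                // sin(pi * x) without multiplying by pi
        out[i] = phase * rsqrtf(len + 1.0f);            // fast reciprocal square root
    }
}
```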

 
I am going to do a practical summary, a "sort of interesting tidbits" rundown, of all those NVIDIA presentations.

Advances in RTX - Full Session Replay
-Path Tracing is going to be available in a lot more games. NVIDIA is working with many developers who want their games to look their best, and who are therefore choosing to implement path tracing.

-NVIDIA is working with Bethesda on RTX Neural Faces (with early demos on Starfield).

Path Tracing Nanite in Nvidia Zorah
RTX Mega Geometry doesn't support World Position Offset (WPO), skinned meshes, or tessellated Nanite just yet. NVIDIA is working to add all of these features, which could be why Mega Geometry is not yet part of DXR.

Scale Up Ray Tracing in Games with RTX Mega Geometry
-Alan Wake 2 uses mesh shaders; its geometry consists mainly of meshlets, but the developer converted it to clustered geometry to get higher performance with Mega Geometry.

-Overall, meshlets took 6 ms to build the BVH and ray trace the scene, while clusters took 5 ms.
 
RTX Mega Geometry doesn't support World Position Offset (WPO), skinned meshes, or tessellated Nanite just yet. NVIDIA is working to add all of these features, which could be why Mega Geometry is not yet part of DXR.
My understanding of what was said is that the NvRTX branch of UE5 hasn't added support for these features yet, not that Mega Geometry in its current state is incapable of supporting these features.
 
More GDC 2025 presentations ...

Crossing the Uncanny Valley With RTX Neural Face Rendering (featuring a demo on Starfield).


-Neural Faces took 7 ms for inference and rendering in Starfield at native 1440p on a 4090, using 1.2 GB of VRAM with an FP16 autoencoder. FP8/FP4 should be much faster and require far less VRAM. All of this is early work.

-The tech works purely on screen-space data for now, with no awareness of animation or 3D space. NVIDIA is working to integrate more data into the model.

Creating Next-Gen Agents in Krafton’s inZOI

-inZOI needed a small language model running on device because the cloud would be too slow (the game can run at 5x speed and the AI has to be very responsive) and too costly (servers running 24/7).

-llama.cpp/GGML is used to run the model on Windows; DirectML was considered, but other options worked much better.

-NVIDIA uses CUDA in Graphics (CiG) to quickly switch contexts without latency or execution bubbles.

-RTX exclusive.


Achieving AI Teammates in 'NARAKA: BLADEPOINT' PC Version

-On-device inference for companion AI, voice-controlled by the player. The companions are also able to chat freely with the player.

-CUDA in Graphics (CiG) is used to maintain good performance.

-There is a cloud version available, but with higher latency and fewer capabilities.

 
 
Per-frame textures based on performance capture are... interesting. At least that is what it reads like. But it also sounds like a ridiculous amount of memory that would have trouble scaling to real time... and scaling to arbitrary frame rates?
 
Not a presentation but I read the book Behind the Scenes at PlayStation by Masayuki Chatani. Totally pointless book that tells us nothing new. There is ZERO actual behind the scenes info.
 
Every now and then we get asked what a beginner-friendly website is for learning graphics programming. We’d love to recommend GPUOpen of course, but the truth is, the main target audience for GPUOpen is intermediate or advanced graphics programmers. For someone who just started to dive into the world of graphics, there are surely other websites more suitable for them.

As with so many things, there is no one right way to get into graphics. It mostly depends on pre-existing knowledge, how you like to learn, personal preference, available hardware, etc. Hence, this guide is more a collection of websites that we think are useful for beginners, along with a small discussion weighing the pros and cons of the websites and what they teach.

 
AMD - Using Neural Networks for Geometric Representation

Introduction

Monte Carlo ray tracing is a cornerstone of physically based rendering, simulating the complex transport of light in 3D environments to achieve photorealistic imagery. Central to this process is ray casting, which computes intersections between rays and scene geometry. Due to the computational cost of these intersection tests, spatial acceleration structures such as bounding volume hierarchies (BVHs) are widely employed to reduce the number of candidate primitives a ray must test against.
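To make that cost concrete, here is a minimal ray/AABB slab test of the kind a BVH traversal repeats at every visited node (an illustrative sketch, not AMD's implementation):

```cpp
// Minimal ray vs. axis-aligned bounding box slab test, the per-node work a BVH
// traversal repeats many times per ray. Purely illustrative, not AMD's code.
__device__ bool intersect_aabb(float3 o, float3 inv_d, float t_max,
                               float3 box_min, float3 box_max)
{
    float t0x = (box_min.x - o.x) * inv_d.x, t1x = (box_max.x - o.x) * inv_d.x;
    float t0y = (box_min.y - o.y) * inv_d.y, t1y = (box_max.y - o.y) * inv_d.y;
    float t0z = (box_min.z - o.z) * inv_d.z, t1z = (box_max.z - o.z) * inv_d.z;

    float t_near = fmaxf(fmaxf(fminf(t0x, t1x), fminf(t0y, t1y)), fminf(t0z, t1z));
    float t_far  = fminf(fminf(fmaxf(t0x, t1x), fmaxf(t0y, t1y)), fmaxf(t0z, t1z));
    return t_near <= t_far && t_far >= 0.0f && t_near <= t_max;
}
```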


Despite decades of research and optimization, BVH-based ray tracing still poses challenges on modern hardware, particularly on Single-Instruction Multiple-Thread (SIMT) architectures like GPUs. BVH traversal is inherently irregular: it involves divergent control flow and unpredictable memory access patterns. These characteristics make it difficult to fully utilize the parallel processing power of GPUs, which excel at executing uniform, data-parallel workloads. As a result, even with the addition of specialized ray tracing hardware, such as RT cores, the cost of BVH traversal remains a bottleneck in high-fidelity rendering workloads.
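A stripped-down, stack-based traversal skeleton shows where the irregularity comes from: each thread follows its own data-dependent path through the tree, so branches and memory accesses diverge across a warp (sketch only, reusing the intersect_aabb helper from above; the Node layout is invented for illustration):

```cpp
// Sketch of stack-based BVH traversal. The loop trip count, branch outcomes
// and node addresses all depend on the individual ray, which is what makes
// this workload irregular on SIMT hardware.
struct Node {
    float3 box_min, box_max;   // node bounds
    int    left, right;        // child indices; -1 marks "no child" (leaf)
};

__device__ int count_visited_nodes(const Node* nodes, float3 o, float3 inv_d, float t_max)
{
    int stack[64];
    int sp = 0, visited = 0;
    stack[sp++] = 0;                                         // push root
    while (sp > 0) {                                         // iteration count differs per ray
        const Node n = nodes[stack[--sp]];                   // scattered, data-dependent load
        if (!intersect_aabb(o, inv_d, t_max, n.box_min, n.box_max))
            continue;                                        // divergent branch
        ++visited;
        if (n.left >= 0)  stack[sp++] = n.left;
        if (n.right >= 0) stack[sp++] = n.right;
    }
    return visited;
}
```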


In contrast, neural networks, especially fully connected networks, offer a regular and predictable computational pattern, typically dominated by dense matrix multiplications. These operations map well to GPU hardware, making neural network inference highly efficient on SIMT platforms. This contrast between the irregularity of BVH traversal and the regularity of neural network computation raises an intriguing question: Can we replace the BVH traversal in ray casting with a neural network to better exploit the GPU’s architecture?
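For contrast, the core of fully connected inference is just a dense matrix-vector product with fixed loop bounds, so every thread in a warp does the same amount of work on predictably laid-out data (a naive sketch; real inference would go through cuBLAS or tensor cores):

```cpp
// Naive fully connected layer: out = relu(W * in + b). Every thread runs the
// same loop with the same trip count over contiguous memory, which is why this
// maps so well to SIMT hardware (production code would use cuBLAS/tensor cores).
__global__ void fc_relu(const float* W, const float* in, const float* b,
                        float* out, int in_dim, int out_dim)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < out_dim) {
        float acc = b[row];
        for (int k = 0; k < in_dim; ++k)
            acc = fmaf(W[row * in_dim + k], in[k], acc);   // uniform, predictable accesses
        out[row] = fmaxf(acc, 0.0f);                       // ReLU
    }
}
```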


This idea is beginning to gain traction as researchers explore alternative spatial acceleration strategies that leverage learned models. In this post, we dive into the motivation behind this approach, examine the challenges and opportunities it presents, and explore how our invention, Neural Intersection Function, might reshape the future of real-time and offline ray tracing.


 
Using a neural network only for intersection calculations would be a criminally poor use of resources. Let the network prefilter too.

Thinking there is value in sticking close to Monte-Carlo/PBR purity is an expensive fantasy. Everything in real time is a hack and purity will often have negative gains.
 
Intel is working on its own neural denoiser and its own version of Mega Geometry's Partitioned TLAS.

Performance and image quality are proportional to the number of rays at each stage of the path tracing. To save on compute and memory traffic we use 1spp and 1 ray on every bounce. Due to the stochastic nature of path tracing, the rendered image has significant noise. Each pixel is determined by a single random light path, causing extreme fluctuations in brightness and color, especially in complex lighting scenarios such as indirect illumination, caustics, soft shadows, etc. To remove noise and reconstruct details, we use our spatiotemporal joint neural denoising and supersampling model.
The large-scale open-world scene featured in Jungle Ruins poses further challenges for path tracing due to its geometric complexity. Millions of dynamic mesh instances need to be animated, which requires updating the acceleration structures prior to ray tracing. The two-level acceleration structures defined by modern ray tracing APIs do not scale well with this complexity. While the animation of the foliage can be efficiently amortized at the per-mesh level (BLAS), the high number of instances makes a full update of the top-level acceleration structure (TLAS) prohibitively costly. To this end, we demonstrate a solution that partitions the TLAS into subsets (AS fragments) that can be updated independently at a fraction of the cost of a global TLAS update.
 