Game development presentations - a useful reference

OlegSH · Jun 14, 2024

trinibwoy said:
I tried watching the presentation but unfortunately it was hard to hear/follow as the presenter was speaking far too quickly.

Honestly, it wasn't the most interesting presentation, as much of its content repeated other optimization guides and threading talks from other events.
The highlights for me personally were the notes on sampler feedback on AMD, which requires sparse sampling for performance, and on the special care needed on AMD when scalarizing bindless resource accesses.
Unfortunately, it's yet another presentation with zero confirmation of Lurkmass's theories about the superior binding model on AMD.

Lurkmass · Jun 18, 2024

OlegSH said:
Honestly, it wasn't the most interesting presentation, as much of its content repeated other optimization guides and threading talks from other events.
The highlights for me personally were the notes on sampler feedback on AMD, which requires sparse sampling for performance, and on the special care needed on AMD when scalarizing bindless resource accesses.
Unfortunately, it's yet another presentation with zero confirmation of Lurkmass's theories about the superior binding model on AMD.

You think needing a waterfall loop for divergent resource access is somehow the end of the world on AMD, when D3D12 had to rip out a far more powerful feature like GPU timeline descriptor creation/copying because the other hardware vendors couldn't keep up? AMD could easily implement NV's binding model, but the same is not true the other way around, and there are alternatives like Guerrilla Games' loose tiling technique to eliminate resource access divergence. There are no good alternatives to fully bindless GPU descriptor synthesis/copies and pointers everywhere ...
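To make the "waterfall loop" concrete, here is a minimal Python emulation of roughly what gets emitted when a shader indexes descriptors non-uniformly (e.g. an index marked with HLSL's `NonUniformResourceIndex`) on hardware that needs a uniform descriptor per access. It assumes a 6-lane wave for readability, and `fetch` is a hypothetical stand-in for the actual texture or buffer access. On the GPU this runs inside the shader using wave intrinsics such as `WaveReadLaneFirst`; the point it shows is that the cost is one iteration per distinct index in the wave, which is why keeping indices coherent helps.

```python
from typing import Callable, List, Sequence, Tuple


def waterfall(indices: Sequence[int],
              fetch: Callable[[int, int], float]) -> Tuple[List[float], int]:
    """Emulate a scalarization ("waterfall") loop for one wave.

    indices[lane] is the descriptor index each lane wants to access.
    Returns the per-lane results and the number of loop iterations.
    """
    results: List[float] = [0.0] * len(indices)
    active = [True] * len(indices)
    iterations = 0
    while any(active):
        # Take the index held by the first active lane (what WaveReadLaneFirst
        # does in HLSL); it is now uniform across the wave.
        uniform_index = indices[active.index(True)]
        for lane, idx in enumerate(indices):
            # Every active lane wanting the same descriptor is served by this
            # one uniform access, then drops out of the loop.
            if active[lane] and idx == uniform_index:
                results[lane] = fetch(uniform_index, lane)
                active[lane] = False
        iterations += 1
    return results, iterations


# A 6-lane "wave" touching three distinct descriptors needs three iterations.
results, iterations = waterfall([3, 3, 7, 3, 7, 1],
                                lambda idx, lane: float(idx * 100 + lane))
print(iterations)  # 3
```

A fully uniform wave finishes in a single iteration, so the loop only costs extra when indices actually diverge.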

Even 4A Games, in their "Plans and Wishes" slide, believe that AMD's model is desirable ...

Direct Descriptors manipulation would be extremely helpful.

Potato Head · Jun 18, 2024

Your own link says GPU timeline descriptor creation/copying was removed 10 years(!) ago because “the real utility of GPU timeline descriptor updates is questionable.”

“Questionable utility” does not equal “far more powerful” in any sense of those terms that I can comprehend.

Lurkmass · Jun 18, 2024

Potato Head said:
Your own link says GPU timeline descriptor creation/copying was removed 10 years(!) ago because “the real utility of GPU timeline descriptor updates is questionable.”

“Questionable utility” does not equal “far more powerful” in any sense of those terms that I can comprehend.

Well, things are different now, since developers have been explicitly asking for it to be exposed again. It's more powerful than the current SM6.6-style dynamic resource binding model. I know Insomniac Games' engine would appreciate having that feature, since they have a very free-form descriptor model on consoles ...

Potato Head · Jun 18, 2024

Lurkmass said:
Well, things are different now, since developers have been explicitly asking for it to be exposed again. It's more powerful than the current SM6.6-style dynamic resource binding model. I know Insomniac Games' engine would appreciate having that feature, since they have a very free-form descriptor model on consoles ...

From your first link:

A massive foot-gun
While this is a ridiculously powerful feature, it’s also an equally ridiculous foot-gun. The requirements on debug infrastructure are extreme. Be warned!

Sounds like a terrible idea. Which devs have asked for it? Anyone besides Valve, who may want it in Proton for the Steam Deck? Any game devs?

Lurkmass · Jun 18, 2024

Potato Head said:
From your first link:

Sounds like a terrible idea. Which devs have asked for it? Anyone besides Valve, who may want it in Proton for the Steam Deck? Any game devs?

The lack of debugging tool support doesn't negate the utility of the feature, and any game doing lots of descriptor copying, like Marvel's Spider-Man, will benefit in terms of CPU overhead ... (I've seen it requested in other communities as well)

The main takeaway is that AMD isn't the one preventing you from doing NURI (non-uniform resource indexing), as was originally implied here, while Nvidia is the one willingly blocking more powerful bindless functionality like GPU-side descriptor copying from being standardized. I'll go one better and point out that AMD even exposes a D3D12 driver extension for raw pointers in shaders (real bindless) in broad daylight, but can you say the same for NVAPI yet?

trinibwoy · Jun 18, 2024

Raw pointers sound awesome. Are there any applications using them with good performance on any GPU arch? In cases like this it helps to have an actual use case where the better tech is producing superior results.

Lurkmass · Jun 18, 2024

trinibwoy said:
Raw pointers sound awesome. Are there any applications using them with good performance on any GPU arch? In cases like this it helps to have an actual use case where the better tech is producing superior results.

It's more of a "programmer-oriented" feature than a "performance-oriented" one, since it opens up the ability to build more interesting data structures like linked lists. Microsoft hears feature requests to have it added to D3D12 all the time ...
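As an illustration of the "programmer-oriented" point, here is a small Python emulation of a linked list whose nodes refer to each other by raw address rather than by descriptor index. Everything in it (the hypothetical `FakeGpuMemory` class, the node layout, the base address) is an assumption; it only shows the shape of data structure that pointers in shaders make natural, not any vendor's extension API.

```python
import struct

# Hypothetical node layout: a float payload followed by a 64-bit address of
# the next node, with address 0 meaning "null".
NODE = struct.Struct("<fQ")


class FakeGpuMemory:
    """A flat byte buffer standing in for GPU memory addressed by raw pointers."""

    def __init__(self, size: int, base_address: int = 0x1000) -> None:
        self.mem = bytearray(size)
        self.base = base_address  # fake base address; 0 stays reserved for null
        self.used = 0             # bump-allocator offset

    def alloc_node(self) -> int:
        if self.used + NODE.size > len(self.mem):
            raise MemoryError("fake GPU buffer exhausted")
        address = self.base + self.used
        self.used += NODE.size
        return address

    def store(self, address: int, value: float, next_address: int) -> None:
        NODE.pack_into(self.mem, address - self.base, value, next_address)

    def load(self, address: int):
        return NODE.unpack_from(self.mem, address - self.base)


def push_front(memory: FakeGpuMemory, head: int, value: float) -> int:
    # The new node points at the old head; its own address becomes the new head.
    node = memory.alloc_node()
    memory.store(node, value, head)
    return node


def walk(memory: FakeGpuMemory, head: int) -> list:
    # Follow raw addresses until the null pointer; no descriptor lookups involved.
    values = []
    while head != 0:
        value, head = memory.load(head)
        values.append(value)
    return values


memory = FakeGpuMemory(size=1024)
head = 0
for v in (1.0, 2.0, 3.0):
    head = push_front(memory, head, v)
print(walk(memory, head))  # [3.0, 2.0, 1.0]
```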

If you want to extract every bit of CPU performance without subscribing to pointers, then GPU timeline descriptor copying is another powerful bindless abstraction for that case. Some comments from the author of the Khronos blog post below:

Copying descriptors on GPU timeline? Why not
Since descriptors are just memory now, there is nothing stopping us from doing descriptor updates on the GPU timeline. Combining this with GPU-driven rendering is exceptionally powerful.

Improving a fully bindless design - a-la Shader Model 6.6
With descriptor indexing as-is, it’s already possible to reach a design where every resource is accessed by a uint index; the VkDescriptorSetLayout system does not change after all. We do this in vkd3d-proton already for example.

The main win of descriptor buffers for this kind of design is that it’s now far more efficient to shuffle descriptors around. We can also copy descriptors on the GPU timeline. I expect we’ll see some interesting innovation here.
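The "descriptors are just memory" argument can be sketched in a few lines. The Python below emulates a descriptor heap as a byte buffer and a descriptor update as a byte copy. `DESC_SIZE`, the slot layout and the culling output are assumptions for illustration, not the actual layout used by `VK_EXT_descriptor_buffer` or any driver; the point is only that once a descriptor is plain bytes, a GPU pass can do the copy as easily as the CPU.

```python
from typing import List, Tuple

# Hypothetical descriptor size; real sizes are implementation-defined and are
# reported by the driver per descriptor type when descriptor buffers are used.
DESC_SIZE = 32


def copy_descriptors(src: bytes, dst: bytearray,
                     remap: List[Tuple[int, int]]) -> None:
    """Copy whole descriptors from src slots to dst slots.

    Because a descriptor is just DESC_SIZE bytes of memory, the "update" is a
    plain byte copy, which a compute shader could perform as well as the CPU.
    """
    for src_slot, dst_slot in remap:
        s, d = src_slot * DESC_SIZE, dst_slot * DESC_SIZE
        dst[d:d + DESC_SIZE] = src[s:s + DESC_SIZE]


# Source heap with 4 descriptors; descriptor i is filled with byte value i.
src_heap = b"".join(bytes([i]) * DESC_SIZE for i in range(4))

# Suppose a (hypothetical) GPU culling pass found objects 2 and 0 visible and
# wants their descriptors packed contiguously for the following draws.
visible = [2, 0]
packed = bytearray(len(visible) * DESC_SIZE)
copy_descriptors(src_heap, packed, [(obj, i) for i, obj in enumerate(visible)])
print(packed[0], packed[DESC_SIZE])  # 2 0
```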

There's other potential with a bindless design as well: reducing the number of PSO permutations. Consider the case where we have two near-identical PSOs that share the exact same set of static states and shaders but differ only in the set of resources they access. Bindless specifically lets us reuse the same PSO with different sets of resources, eliminating the need to compile/generate any redundant PSOs!
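A minimal sketch of the PSO-reuse argument, assuming a toy PSO cache: the pipeline key holds only shaders and static state, while the resources a draw reads travel as bindless heap indices. All type and field names are hypothetical, and "compiling" is just a counter standing in for the expensive step.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PsoKey:
    # Everything that actually has to be compiled into the pipeline.
    vertex_shader: str
    pixel_shader: str
    blend_state: str


@dataclass(frozen=True)
class Draw:
    pso: PsoKey
    # Bindless: indices into one big descriptor heap, passed as plain constants.
    resource_indices: Tuple[int, ...]


class PsoCache:
    def __init__(self) -> None:
        self.pipelines: Dict[PsoKey, str] = {}
        self.compiles = 0

    def get(self, key: PsoKey) -> str:
        if key not in self.pipelines:
            self.compiles += 1  # stands in for an expensive PSO compile
            self.pipelines[key] = f"pipeline#{self.compiles}"
        return self.pipelines[key]


def record(draws: List[Draw], cache: PsoCache) -> List[Tuple[str, Tuple[int, ...]]]:
    # Resource indices never enter the cache key, so draws that differ only in
    # the resources they read share one pipeline.
    return [(cache.get(d.pso), d.resource_indices) for d in draws]


key = PsoKey("mesh_vs", "lit_ps", "opaque")
cache = PsoCache()
commands = record([Draw(key, (10, 11)), Draw(key, (42, 43))], cache)
print(cache.compiles)  # 1
```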

chris1515 · Jul 1, 2024

chris1515 · Jul 2, 2024

chris1515 · Jul 12, 2024

Facial Rig and AI

chris1515 · Jul 13, 2024

https://twitter.com/x/status/1811781987935617414

The Snapdragon X Elite’s Adreno iGPU

Qualcomm is no stranger to integrated graphics. Their Adreno GPU line has served through many generations of Snapdragon cell phone SoCs. But Qualcomm was never content to stay within the cell phone…

chipsandcheese.com

Pjotr · Jul 17, 2024

https://twitter.com/x/status/1812882591428587755

A pretty cool presentation about adding a custom mesh renderer to Unreal Engine that works alongside UE's own mesh renderer. Because UE is a general-purpose solution, there is a lot of bloat, even in something as simple as a static mesh. This gets really noticeable in big worlds with lots of meshes. In this presentation they talk about how they added a 'fast path' for static meshes that supports only the features they need. This reduced both memory usage and stutters. Meshes that can't take the fast path fall back to UE's vanilla renderer.
Another cool thing they did is require their art team to use only a handful of materials and derive only material instances from them. So no authoring of material shader graphs by the artists. This way they had fewer PSOs/shaders to compile.
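The two ideas above (a lean fast path with a fallback, and a restricted set of parent materials) can be sketched as a small Python model. The names and the fast-path criteria are hypothetical; this is not the presenters' code or any Unreal Engine API, just an illustration of why instances of a few parent materials keep the shader/PSO count down.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class MaterialInstance:
    parent: str                 # one of the few master materials artists may use
    params: Tuple[float, ...]   # parameter overrides only, no new shader graph


@dataclass(frozen=True)
class Mesh:
    name: str
    is_static: bool
    uses_unsupported_feature: bool  # anything the fast path does not implement
    material: MaterialInstance


def split_paths(meshes: List[Mesh]) -> Tuple[List[Mesh], List[Mesh]]:
    # Lean fast path for plain static meshes; everything else falls back to
    # the engine's general-purpose renderer.
    fast, fallback = [], []
    for m in meshes:
        (fast if m.is_static and not m.uses_unsupported_feature else fallback).append(m)
    return fast, fallback


def shaders_to_compile(meshes: List[Mesh]) -> Set[str]:
    # Instances reuse their parent's shaders, so only distinct parents add
    # shader/PSO permutations.
    return {m.material.parent for m in meshes}


rock = MaterialInstance("Opaque", (0.8, 0.1))
wall = MaterialInstance("Opaque", (0.3, 0.9))
leaves = MaterialInstance("Foliage", (0.5,))
meshes = [
    Mesh("rock", True, False, rock),
    Mesh("wall", True, False, wall),
    Mesh("tree", False, True, leaves),
]
fast, fallback = split_paths(meshes)
print(len(fast), len(fallback), sorted(shaders_to_compile(meshes)))  # 2 1 ['Foliage', 'Opaque']
```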

Remij · Jul 18, 2024

Pjotr said:
https://twitter.com/x/status/1812882591428587755

A pretty cool presentation about adding a custom mesh renderer to Unreal Engine that works alongside UE's own mesh renderer. Because UE is a general-purpose solution, there is a lot of bloat, even in something as simple as a static mesh. This gets really noticeable in big worlds with lots of meshes. In this presentation they talk about how they added a 'fast path' for static meshes that supports only the features they need. This reduced both memory usage and stutters. Meshes that can't take the fast path fall back to UE's vanilla renderer.
Another cool thing they did is require their art team to use only a handful of materials and derive only material instances from them. So no authoring of material shader graphs by the artists. This way they had fewer PSOs/shaders to compile.

Very cool. Thanks for posting it!

chris1515 · Jul 20, 2024

https://twitter.com/x/status/1814344972507746408

Hair Mesh Rendering - Cem Yuksel

cemyuksel.com

A scene with 100 characters, each with a unique hair model comprising 100 thousand strands, rasterized in only 2 ms on an NVIDIA RTX 4090 GPU (with 8x MSAA) using our real-time hair rendering method with hair meshes and our level-of-detail techniques. All 100 hair mesh models in this scene fit in 1.7 MB (between 13 KB and 21 KB per model).

A good use of mesh shaders.

Hair meshes are known to be effective for modeling and animating hair in computer graphics. We present how the hair mesh structure can be used for efficiently rendering strand-based hair models on the GPU with on-the-fly geometry generation that provides orders of magnitude reduction in storage and memory bandwidth. We use mesh shaders to carefully distribute the computation and a custom texture layout for offloading a part of the computation to the hardware texture units. We also present a set of procedural styling operations to achieve hair strand variations for a wide range of hairstyles and a consistent coordinate-frame generation approach to attach these variations to an animating/deforming hair mesh. Finally, we describe level-of-detail techniques for improving the performance of rendering distant hair models. Our results show an unprecedented level of performance with strand-based hair rendering, achieving hundreds of full hair models animated and rendered at real-time frame rates on a consumer GPU.
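A heavily simplified Python sketch of the core hair-mesh idea, for readers unfamiliar with it: the hair volume is a stack of triangle layers (a column of prisms), and a strand is generated on the fly by applying the same barycentric coordinates to every layer. The linear per-layer interpolation and the data layout are assumptions made for brevity; the paper's method additionally distributes this work across mesh shaders, uses texture units, applies procedural styling and coordinate frames, and adds LOD, none of which is shown here.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


def strand_points(layers: Sequence[Triangle],
                  bary: Tuple[float, float, float]) -> List[Vec3]:
    """Generate one strand's control points from a stack of triangle layers.

    layers[0] is the triangle on the scalp; later entries are the same triangle
    extruded outward. Applying the same barycentric coordinates to every layer
    yields one point per layer, so in this sketch a whole strand is described
    by the shared layers plus three numbers.
    """
    b0, b1, b2 = bary
    if abs(b0 + b1 + b2 - 1.0) > 1e-6:
        raise ValueError("barycentric coordinates must sum to 1")
    return [
        tuple(b0 * a[i] + b1 * b[i] + b2 * c[i] for i in range(3))
        for a, b, c in layers
    ]


scalp = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
tip = ((0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(strand_points([scalp, tip], (0.5, 0.25, 0.25)))
# [(0.25, 0.25, 0.0), (0.25, 0.25, 1.0)]
```

Storing a few coarse layers and generating strands from them is what lets the storage stay small compared to storing every strand explicitly.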

MfA · Jul 21, 2024

Lurkmass said:
The lack of debugging tool support

Seems more like a language issue than a debugging issue. The compiler needs to understand lifetimes and scheduling to prevent use-after-free (UAF) bugs and race conditions by holding the programmer's hand.

Get it right by design, not by chasing intermittent bugs.

cheapchips · Jul 26, 2024

Pjotr said:
https://twitter.com/x/status/1812882591428587755

A pretty cool presentation about adding a custom mesh renderer to Unreal Engine that works alongside UE's own mesh renderer. Because UE is a general-purpose solution, there is a lot of bloat, even in something as simple as a static mesh. This gets really noticeable in big worlds with lots of meshes. In this presentation they talk about how they added a 'fast path' for static meshes that supports only the features they need. This reduced both memory usage and stutters. Meshes that can't take the fast path fall back to UE's vanilla renderer.
Another cool thing they did is require their art team to use only a handful of materials and derive only material instances from them. So no authoring of material shader graphs by the artists. This way they had fewer PSOs/shaders to compile.

Here's the video of the presentation

Metricity · Jul 31, 2024

HPG 2024 Streams
Day 1
Day 2
Day 3

Deleted member 2197 · Jul 31, 2024

NVIDIA researchers used NVIDIA Edify, a multimodal architecture for visual generative AI, to build a detailed 3D desert landscape within a few minutes in a live demo at SIGGRAPH’s Real-Time Live event on Tuesday.

DavidGraham · Aug 2, 2024

EA's presentation of ray-traced GIBS (Global Illumination Based on Surfels).

One area of innovation was the use of hardware ray tracing for indirect lighting with Global Illumination Based on Surfels (GIBS) technology.

GIBS is a proprietary EA technology that leverages hardware ray tracing for indirect lighting.
This turnkey solution allows artists to realize their vision, as GIBS requires no pre-computation, no special meshes, and no unique UV sets.
Artists brought 150+ stadiums to life in EA SPORTS College Football 25 with the help of innovative technology like GIBS.

This dynamic global illumination system is based on surfels, which is an abbreviated term for surface elements. Surfels are disk-shaped primitives that spawn on geometric shapes within a scene.

When combined, surfels approximate a surface, and they cache indirect lighting information. Merging hardware ray tracing with surfels that spawn across geometric surfaces on the fly allows the scene to accumulate and cache irradiance.
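A toy Python model of the surfel idea described above, to show how a surfel can cache ray-traced irradiance and how a shading point reads it back. This is not EA's GIBS implementation: the scalar irradiance, the running-average update, the linear distance falloff and the "spawn where coverage is low" rule are all assumptions chosen for brevity.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def distance(a: Vec3, b: Vec3) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class Surfel:
    position: Vec3
    normal: Vec3             # unit normal of the disk
    radius: float
    irradiance: float = 0.0  # cached indirect light (scalar for brevity)
    samples: int = 0

    def add_sample(self, traced_irradiance: float, max_history: int = 32) -> None:
        # Running mean of ray-traced samples; once the history cap is reached it
        # behaves like an exponential moving average, so lighting can change.
        self.samples = min(self.samples + 1, max_history)
        self.irradiance += (traced_irradiance - self.irradiance) / self.samples


def gather(point: Vec3, normal: Vec3, surfels: List[Surfel]) -> Tuple[float, float]:
    """Blend cached irradiance from surfels covering `point`.

    Returns (irradiance, coverage). Low coverage marks where a new surfel
    would be spawned and ray-traced.
    """
    total = weight_sum = 0.0
    for s in surfels:
        d = distance(point, s.position)
        facing = dot(normal, s.normal)
        if d < s.radius and facing > 0.0:
            w = (1.0 - d / s.radius) * facing  # closer, better-aligned surfels count more
            total += w * s.irradiance
            weight_sum += w
    return (total / weight_sum if weight_sum > 0.0 else 0.0), weight_sum


s = Surfel(position=(0.0, 0.0, 0.0), normal=(0.0, 0.0, 1.0), radius=1.0)
for sample in (1.0, 3.0):
    s.add_sample(sample)
print(gather((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), [s]))  # (2.0, 1.0)
print(gather((5.0, 0.0, 0.0), (0.0, 0.0, 1.0), [s]))  # (0.0, 0.0)
```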

“Ray-tracing operations are performed exactly where they’re needed, on the surfaces in the scene, which is a good fit for global illumination.” – Henrik Halén (Principal Software Engineer, SEED)

“Simply put, GIBS is runtime ray-traced lighting for stadiums. This means our artists don’t have to bake lighting into the stadium.” – Richard Burgess-Dawson (Sr. Art Director, College Football)

“Through the exceptionally close collaboration between our rendering engineers and the Frostbite and SEED teams, we advanced GIBS performance to achieve our target of 60 frames per second across target platforms such as PS5, XBSX, and XBSS.” – Ishaan Singh (Technical Director - Rendering, College Football)

GIBS Lighting Technology in EA SPORTS™ College Football 25

Global Illumination Based on Surfels (GIBS) is innovative EA technology. See how it leverages ray tracing and lights up EA SPORTS™ College Football 25.

www.ea.com
