Game development presentations - a useful reference

I tried watching the presentation, but unfortunately it was hard to follow as the presenter was speaking far too quickly.
Honestly, it wasn't the most interesting presentation as its content repeated many other optimization guides and threading talks from other events.
The highlights for me personally were the notes on sampler feedback, which needs sparse sampling on AMD to perform well, and the special care AMD needs around scalarization for bindless resource access.
Unfortunately, yet another presentation with zero confirmation of Lurkmass's theories about the superior binding model on AMD :rolleyes:
 
You think needing a waterfall loop for divergent resource access is somehow the end of the world on AMD, when D3D12 had to rip out a far more powerful feature like GPU timeline descriptor creation/copying because the other hardware vendors couldn't keep up? AMD could easily implement NV's binding model but the same is not true the other way around, and there are alternatives like Guerrilla Games' loose tiling technique to eliminate resource access divergence. There are no good alternatives to fully bindless GPU descriptor synthesis/copies and pointers everywhere ...
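For anyone unfamiliar with the pattern being argued about, here's a rough CPU-side sketch of what a waterfall loop does: the wave picks one not-yet-handled lane's index, makes it wave-uniform, and lets every matching lane execute with it. The wave size and the sampleResource stand-in are invented for illustration; real shaders do this with wave intrinsics (readFirstLane and friends).

```cpp
#include <array>
#include <cstdint>
#include <cstdio>

// CPU-side sketch of the "waterfall" scalarization pattern used for
// divergent bindless resource access. Illustrative only.
constexpr int kWaveSize = 8;

float sampleResource(uint32_t resourceIndex, int lane) {
    // Stand-in for a texture fetch through a bindless descriptor.
    return float(resourceIndex) + 0.001f * float(lane);
}

void waterfallWave(const std::array<uint32_t, kWaveSize>& laneResourceIndex) {
    std::array<bool, kWaveSize> done{};   // per-lane "has executed" mask
    bool anyPending = true;
    while (anyPending) {                  // one iteration per distinct index
        anyPending = false;
        // readFirstLane(): the first not-yet-done lane's index becomes the
        // wave-uniform (scalar) index for this iteration.
        uint32_t uniformIndex = 0;
        for (int lane = 0; lane < kWaveSize; ++lane)
            if (!done[lane]) { uniformIndex = laneResourceIndex[lane]; break; }
        // Every pending lane whose index matches executes with the scalar descriptor.
        for (int lane = 0; lane < kWaveSize; ++lane) {
            if (done[lane]) continue;
            if (laneResourceIndex[lane] == uniformIndex) {
                printf("lane %d -> %f\n", lane, sampleResource(uniformIndex, lane));
                done[lane] = true;
            } else {
                anyPending = true;        // handled in a later iteration
            }
        }
    }
}

int main() {
    waterfallWave({3, 3, 7, 3, 7, 1, 1, 3});  // 3 distinct indices -> 3 iterations
}
```

The cost is one loop iteration per distinct index in the wave, which is why coherent access patterns (or techniques like loose tiling) keep it cheap in practice.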

Even 4A Games in their "Plans and Wishes" slide believes that AMD's model is desirable ...
  • Direct Descriptors manipulation would be extremely helpful.
 
Your own link says GPU timeline descriptor creation/copying was removed 10 years(!) ago because “the real utility of GPU timeline descriptor updates is questionable.”

“Questionable utility” does not equal “far more powerful” in any reasonable reading of those terms.
 
Well, things are different now, since developers have been explicitly asking for it to be exposed again. It's more powerful than the current SM6.6-style dynamic resource binding model. I know Insomniac Games' engine would appreciate having that feature, since they have a very free-form descriptor model on consoles ...
 

From your first link:

A massive foot-gun

While this is a ridiculously powerful feature, it’s also an equally ridiculous foot-gun. The requirements on debug infrastructure are extreme. Be warned!

Sounds like a terrible idea. Which devs have asked for it? Anyone besides Valve, who may want it in Proton for the Deck? Like, any game devs?
 
The lack of debugging tool support doesn't negate the utility of the feature, and any game like Marvel's Spider-Man doing lots of descriptor copying will benefit in terms of CPU overhead ... (I've seen it asked for in other communities as well.)

The main takeaway is that AMD isn't the one preventing you from doing NURI (non-uniform resource indexing), as was originally implied here, while Nvidia is the one willingly blocking more powerful bindless functionality like GPU-side descriptor copying from being standardized. I'll go one better and point out that AMD even exposes a D3D12 driver extension for raw pointers in shaders (real bindless) in broad daylight, but can you say the same for NVAPI yet?
 
Raw pointers sound awesome. Are there any applications using them with good performance on any GPU arch? In cases like this it helps to have an actual use case where the better tech is producing superior results.
 
It's more of a "programmer-oriented" feature than a "performance-oriented" one, in that it opens up the ability to build more interesting data structures like linked lists. Microsoft hears feature requests all the time to have it added to D3D12 ...
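To make the linked-list point concrete, here's a small CPU-side sketch of the classic GPU pattern that raw pointers make natural: atomic head insertion into a pre-allocated node pool, the same shape as per-pixel linked lists for OIT. All names here are illustrative; with real pointer support this lives in the shader instead of C++.

```cpp
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <vector>

// A node holding a real pointer to the next element. With index-based
// emulation (the status quo in HLSL) every link costs an extra buffer
// indirection; with pointers it is a plain address.
struct Node {
    float payload;
    Node* next;
};

struct LinkedListPool {
    std::vector<Node> nodes;              // pre-sized pool, as on a GPU
    std::atomic<uint32_t> allocCount{0};
    std::atomic<Node*> head{nullptr};

    explicit LinkedListPool(size_t capacity) : nodes(capacity) {}

    // Lock-free push: each "thread" bumps the allocator, then atomically
    // swaps the list head -- a single compare-exchange on an address.
    void push(float value) {
        Node* n = &nodes[allocCount.fetch_add(1)];
        n->payload = value;
        n->next = head.load(std::memory_order_relaxed);
        while (!head.compare_exchange_weak(n->next, n)) {}
    }
};

int main() {
    LinkedListPool pool(16);
    pool.push(1.0f); pool.push(2.0f); pool.push(3.0f);
    for (Node* n = pool.head.load(); n; n = n->next)
        printf("%f\n", n->payload);
}
```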

If you want to extract every bit of CPU performance without subscribing to pointers then GPU timeline descriptor copying functionality is another powerful bindless abstraction for that case. Some author comments from the Khronos blogpost below:

Copying descriptors on GPU timeline? Why not

Since descriptors are just memory now, there is nothing stopping us from doing descriptor updates on the GPU timeline. Combining this with GPU-driven rendering is exceptionally powerful.

Improving a fully bindless design - a-la Shader Model 6.6

With descriptor indexing as-is, it’s already possible to reach a design where every resource is accessed by a uint index; the VkDescriptorSetLayout system does not change after all. We do this in vkd3d-proton already for example.

The main win of descriptor buffers for this kind of design is that it’s now far more efficient to shuffle descriptors around. We can also copy descriptors on the GPU timeline. I expect we’ll see some interesting innovation here.
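To make the "descriptors are just memory" point concrete, here's a hedged sketch of what a GPU-timeline descriptor copy can look like with VK_EXT_descriptor_buffer: shuffling descriptors becomes an ordinary buffer copy recorded into a command buffer. Every handle and offset is assumed to be created elsewhere (the source with TRANSFER_SRC usage, the destination with descriptor-buffer plus TRANSFER_DST usage); this is just one way to express it.

```cpp
#include <vulkan/vulkan.h>

// Records a copy of packed descriptor bytes between two descriptor buffers.
// Because the copy runs on the GPU timeline, an earlier GPU-driven pass in
// the same submission could have produced or compacted these descriptors.
void recordDescriptorShuffle(VkCommandBuffer cmd,
                             VkBuffer stagingDescriptors,  // CPU-written via vkGetDescriptorEXT
                             VkBuffer liveDescriptorHeap,  // bound via vkCmdBindDescriptorBuffersEXT
                             VkDeviceSize srcOffset,
                             VkDeviceSize dstOffset,
                             VkDeviceSize descriptorBytes) {
    VkBufferCopy region{};
    region.srcOffset = srcOffset;
    region.dstOffset = dstOffset;
    region.size = descriptorBytes;   // e.g. N * the driver-reported descriptor size
    vkCmdCopyBuffer(cmd, stagingDescriptors, liveDescriptorHeap, 1, &region);

    // Make the transfer writes visible before shaders read the heap.
    VkMemoryBarrier2 barrier{VK_STRUCTURE_TYPE_MEMORY_BARRIER_2};
    barrier.srcStageMask  = VK_PIPELINE_STAGE_2_COPY_BIT;
    barrier.srcAccessMask = VK_ACCESS_2_TRANSFER_WRITE_BIT;
    barrier.dstStageMask  = VK_PIPELINE_STAGE_2_ALL_GRAPHICS_BIT;
    barrier.dstAccessMask = VK_ACCESS_2_DESCRIPTOR_BUFFER_READ_BIT_EXT;
    VkDependencyInfo dep{VK_STRUCTURE_TYPE_DEPENDENCY_INFO};
    dep.memoryBarrierCount = 1;
    dep.pMemoryBarriers = &barrier;
    vkCmdPipelineBarrier2(cmd, &dep);
}
```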

There's other potential with a bindless design to reduce the number of PSO permutations as well. Consider the case where we have two near-identical PSOs that share the exact same set of static states and shaders but differ only in the set of resources accessed between them. Bindless lets us reuse a single PSO with different sets of resources, eliminating the need to compile/generate redundant PSOs!
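A minimal D3D12-flavored sketch of that PSO-reuse idea, assuming the shader fetches everything through a bindless index passed as a root constant (root parameter slot 0 and the material indices are invented for illustration):

```cpp
#include <d3d12.h>

// Two draws that differ only in the resources they read share one PSO:
// the shader looks its textures/buffers up through a bindless index
// (e.g. via ResourceDescriptorHeap in SM6.6) instead of baked bindings.
void drawTwoMaterialsOnePso(ID3D12GraphicsCommandList* cmd,
                            ID3D12PipelineState* sharedPso,
                            UINT indexCountA, UINT indexCountB) {
    cmd->SetPipelineState(sharedPso);        // compiled once, reused for both

    cmd->SetGraphicsRoot32BitConstant(0, /*materialIndex=*/7, 0);
    cmd->DrawIndexedInstanced(indexCountA, 1, 0, 0, 0);

    // Same static states, same shaders -- only the index changes, so no
    // second PSO ever has to be compiled.
    cmd->SetGraphicsRoot32BitConstant(0, /*materialIndex=*/42, 0);
    cmd->DrawIndexedInstanced(indexCountB, 1, 0, 0, 0);
}
```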
 

A pretty cool presentation about adding a custom mesh renderer to Unreal Engine that works alongside UE's mesh renderer. Because UE is a general solution, there is a lot of bloat, even in something as simple as a static mesh. This gets really noticeable in big worlds with lots of meshes. In this presentation they talk about how they added a 'fast-path' for static meshes that only supports the features they need. This way they both improved memory usage and reduced stutters. Meshes that can't take the fast-path fall back to UE's vanilla renderer.
Another cool thing they did: they required their art team to use only a handful of materials and only derive material instances from them, so no authoring of material shader graphs by the artists. This way they had fewer PSOs/shaders to compile.
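The dispatch side of that fast-path/fallback split boils down to something like the sketch below; the types and the eligibility test are hypothetical, not UE's actual classes.

```cpp
#include <vector>

// Hypothetical mesh description: only the features the custom renderer
// cares about for its eligibility decision.
struct Mesh { bool skinned = false; bool usesCustomDepth = false; bool simpleMaterial = true; };

struct FastStaticMeshRenderer { void draw(const Mesh&) { /* lean, minimal-feature path */ } };
struct VanillaUeRenderer      { void draw(const Mesh&) { /* full-featured engine path */ } };

// Only meshes that stay inside the supported feature subset take the
// fast path; everything else falls back to the vanilla renderer.
bool eligibleForFastPath(const Mesh& m) {
    return !m.skinned && !m.usesCustomDepth && m.simpleMaterial;
}

void renderScene(const std::vector<Mesh>& meshes,
                 FastStaticMeshRenderer& fast, VanillaUeRenderer& vanilla) {
    for (const Mesh& m : meshes) {
        if (eligibleForFastPath(m)) fast.draw(m);
        else                        vanilla.draw(m);
    }
}
```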
 

Very cool. Thanks for posting it!
 



A scene with 100 characters, each with a unique hair model comprising 100 thousand strands, rasterized in only 2 ms on an NVIDIA RTX 4090 GPU (with 8x MSAA) using our real-time hair rendering method with hair meshes and our level-of-detail techniques. All 100 hair mesh models in this scene fit in 1.7 MB (between 13 KB and 21 KB per model).

A good use of mesh shaders
Hair meshes are known to be effective for modeling and animating hair in computer graphics. We present how the hair mesh structure can be used for efficiently rendering strand-based hair models on the GPU with on-the-fly geometry generation that provides orders of magnitude reduction in storage and memory bandwidth. We use mesh shaders to carefully distribute the computation and a custom texture layout for offloading a part of the computation to the hardware texture units. We also present a set of procedural styling operations to achieve hair strand variations for a wide range of hairstyles and a consistent coordinate-frame generation approach to attach these variations to an animating/deforming hair mesh. Finally, we describe level-of-detail techniques for improving the performance of rendering distant hair models. Our results show an unprecedented level of performance with strand-based hair rendering, achieving hundreds of full hair models animated and rendered at real-time frame rates on a consumer GPU.
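To illustrate why on-the-fly generation gives orders-of-magnitude storage savings, here's a generic guide-strand interpolation sketch (not the paper's exact algorithm): only a few guide curves per patch are stored, and each rendered strand is synthesized in the shader by blending them plus a small procedural per-strand offset.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

Vec3 lerp(const Vec3& a, const Vec3& b, float t) {
    return {a.x + t * (b.x - a.x), a.y + t * (b.y - a.y), a.z + t * (b.z - a.z)};
}

// Bilinearly blend the four stored guide curves of a quad patch at (u, v),
// then add a tiny pseudo-random curl so synthesized strands differ.
// Storage is 4 curves per patch instead of thousands of strands.
Vec3 strandPoint(const std::vector<Vec3> guides[4], float u, float v,
                 int vertexAlongStrand, uint32_t strandSeed) {
    const Vec3 g01 = lerp(guides[0][vertexAlongStrand], guides[1][vertexAlongStrand], u);
    const Vec3 g23 = lerp(guides[2][vertexAlongStrand], guides[3][vertexAlongStrand], u);
    Vec3 p = lerp(g01, g23, v);
    const float phase = float(strandSeed) * 0.61803f + float(vertexAlongStrand) * 0.35f;
    p.x += 0.002f * std::sin(phase);   // small procedural variation per strand
    p.z += 0.002f * std::cos(phase);
    return p;
}
```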
 
The lack of debugging tool support
Seems more like a language issue than a debugging issue. The compiler needs to understand lifetimes and scheduling to prevent use-after-free and race conditions by holding the programmer's hand.

Get it right by design, not by chasing intermittent bugs.
 


Here's the video of the presentation

 
EA's presentation on ray-traced GIBS (Global Illumination Based on Surfels).

One area of innovation was the use of hardware ray tracing for indirect lighting with Global Illumination Based on Surfels (GIBS) technology.
  • GIBS is a proprietary EA technology that leverages hardware ray tracing for indirect lighting.
  • This turnkey solution allows artists to realize their vision, as GIBS requires no pre-computation, no special meshes, and no unique UV sets.
  • Artists brought 150+ stadiums to life in EA SPORTS College Football 25 with the help of innovative technology like GIBS.

This dynamic global illumination system is based on surfels, which is an abbreviated term for surface elements. Surfels are disk-shaped primitives that spawn on geometric shapes within a scene.

Surfels approximate a surface when combined and cache indirect lighting information. Merging hardware ray tracing with surfels spawning across geometric surfaces on-the-fly allows the scene to accumulate and cache irradiance.
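As a rough illustration of that caching idea (the general surfel-GI pattern, not EA's actual GIBS implementation), a surfel can be as simple as a disk that folds each new ray-traced sample into a running average:

```cpp
#include <cstdint>

struct Vec3f { float x, y, z; };

// A disk-shaped surface element that caches indirect lighting.
// Field names and the blend-factor clamp are illustrative.
struct Surfel {
    Vec3f position;      // disk center on the surface
    Vec3f normal;        // disk orientation
    float radius;        // sizing policy lives elsewhere
    Vec3f irradiance;    // cached indirect lighting
    uint32_t sampleCount;
};

// Fold one new ray-traced irradiance sample into the surfel's cache.
// An exponential moving average converges fast early on, yet keeps
// responding when the lighting changes.
void accumulate(Surfel& s, const Vec3f& newSample) {
    ++s.sampleCount;
    const float alpha = 1.0f / float(s.sampleCount < 64 ? s.sampleCount : 64);
    s.irradiance.x += alpha * (newSample.x - s.irradiance.x);
    s.irradiance.y += alpha * (newSample.y - s.irradiance.y);
    s.irradiance.z += alpha * (newSample.z - s.irradiance.z);
}
```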

“Ray-tracing operations are performed exactly where they’re needed, on the surfaces in the scene, which is a good fit for global illumination.” – Henrik Halén (Principal Software Engineer, SEED)

“Simply put, GIBS is runtime ray-traced lighting for stadiums. This means our artists don’t have to bake lighting into the stadium.” – Richard Burgess-Dawson (Sr. Art Director, College Football)

“Through the exceptionally close collaboration between our rendering engineers and the Frostbite and SEED teams, we advanced GIBS performance to achieve our target of 60 frames per second across target platforms such as PS5, XBSX, and XBSS.” – Ishaan Singh (Technical Director - Rendering, College Football)

 