Game development presentations - a useful reference

I tried watching the presentation but unfortunately it was hard to hear/follow as the presenter was speaking far too quickly.
Honestly, it wasn't the most interesting presentation as its content repeated many other optimization guides and threading talks from other events.
The highlights for me personally were the special care notes about the sampler feedback on AMD, which requires sparse sampling for performance, as well as the special care needed on AMD for the scalarization for bindless.
Unfortunately, yet another presentation with zero confirmation of Lurkmass's theories about the superior binding model on AMD:rolleyes:
 
You think needing a waterfall loop for divergent resource access is somehow the end of the world on AMD, when D3D12 had to rip out a far more powerful feature like GPU timeline descriptor creation/copying because the other hardware vendors couldn't keep up? AMD could easily implement NV's binding model, but the same is not true the other way around, and there are alternatives like Guerrilla Games' loose tiling technique to eliminate resource access divergence. There are no good alternatives to fully bindless GPU descriptor synthesis/copies and pointers everywhere ...

Even 4A Games in their "Plans and Wishes" slide believes that AMD's model is desirable ...
  • Direct Descriptors manipulation would be extremely helpful.
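
For readers unfamiliar with the term, the "waterfall loop" mentioned above can be emulated on the CPU. This is only a rough sketch of the idea, not any vendor's actual code generation: the function name `waterfall_loop` and the `sample` callback are hypothetical, and a Python list stands in for the lanes of one wave.

```python
def waterfall_loop(resource_indices, sample):
    """Emulate a waterfall loop over the lanes of one wave.

    resource_indices: per-lane resource index (may diverge between lanes).
    sample: hypothetical callback (resource_index, lane) -> value.
    Returns (per-lane results, number of loop iterations).
    """
    results = [None] * len(resource_indices)
    active = set(range(len(resource_indices)))
    iterations = 0
    while active:
        # Take the index held by the first active lane. On real hardware this
        # would be a "read first lane" style operation producing a scalar.
        first_lane = min(active)
        uniform_index = resource_indices[first_lane]
        # Every lane that wants the same resource runs in this iteration.
        matching = {lane for lane in active
                    if resource_indices[lane] == uniform_index}
        for lane in matching:
            results[lane] = sample(uniform_index, lane)
        # Those lanes are finished; the rest wait for a later iteration.
        active -= matching
        iterations += 1
    return results, iterations


# Eight lanes touching three distinct resources need three iterations.
indices = [3, 3, 7, 3, 7, 1, 1, 3]
results, iterations = waterfall_loop(indices, lambda res, lane: (res, lane))
```

The loop count equals the number of distinct indices in the wave, so a fully uniform wave pays for one iteration only. That is also why techniques that reduce divergence shrink the cost.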
 
Your own link says GPU timeline descriptor creation / copying was removed 10 years(!) ago due to “the real utility of GPU timeline descriptor updates is questionable.”

“Questionable utility” does not equal “far more powerful” in any way that I can comprehend those terms.
 
Well, things are different now, since developers have been explicitly asking for it to be exposed again. It's more powerful than the current SM6.6-style dynamic resource binding model. I know Insomniac Games' engine would appreciate having that feature, since they have a very free-form descriptor model on consoles ...
 

From your first link:

A massive foot-gun

While this is a ridiculously powerful feature, it’s also an equally ridiculous foot-gun. The requirements on debug infrastructure are extreme. Be warned!

Sounds like a terrible idea. Which devs have asked for it? Anyone besides Valve who may want it in Proton for the deck? Like any game devs?
 
The lack of debugging tool support doesn't negate the utility of the feature, and any game like Marvel's Spider-Man that does lots of descriptor copying will benefit in terms of CPU overhead ... (I've seen it asked for in other communities as well)

The main takeaway is that AMD isn't the one preventing you from doing NURI (non-uniform resource indexing), as was misleadingly suggested here, while Nvidia is the one willingly blocking more powerful bindless functionality like GPU-side descriptor copying from being standardized. I'll do you one better: AMD even exposes a D3D12 driver extension for raw pointers in shaders (real bindless) in broad daylight. Can you say the same for NVAPI yet?
 
Raw pointers sound awesome. Are there any applications using them with good performance on any GPU arch? In cases like this it helps to have an actual use case where the better tech is producing superior results.
 
It's more of a "programmer oriented" feature than a "performance oriented" one, in that it opens up the capability to build more interesting data structures like linked lists. Microsoft gets requests all the time to add it to D3D12 ...

If you want to extract every bit of CPU performance without subscribing to pointers, then GPU timeline descriptor copying is another powerful bindless abstraction for that case. Some author comments from the Khronos blog post below:

Copying descriptors on GPU timeline? Why not

Since descriptors are just memory now, there is nothing stopping us from doing descriptor updates on the GPU timeline. Combining this with GPU-driven rendering is exceptionally powerful.

Improving a fully bindless design - a-la Shader Model 6.6

With descriptor indexing as-is, it’s already possible to reach a design where every resource is accessed by a uint index; the VkDescriptorSetLayout system does not change after all. We do this in vkd3d-proton already for example.

The main win of descriptor buffers for this kind of design is that it’s now far more efficient to shuffle descriptors around. We can also copy descriptors on the GPU timeline. I expect we’ll see some interesting innovation here.
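
To make the "descriptors are just memory" point from the quoted post concrete, here is a toy model. Everything in it is an assumption for illustration: `DescriptorHeap`, its methods, and the 32-byte `DESCRIPTOR_SIZE` are made up, and real descriptors are opaque blobs whose size is implementation-defined. It only shows that once descriptors live in a plain buffer, copying one is a memory copy and a shader-side handle can be a plain uint index.

```python
DESCRIPTOR_SIZE = 32  # bytes per descriptor; assumed value, not a real constant


class DescriptorHeap:
    """Hypothetical flat buffer of fixed-size, opaque descriptors."""

    def __init__(self, capacity):
        self.memory = bytearray(capacity * DESCRIPTOR_SIZE)

    def write(self, slot, payload):
        # A descriptor is treated as an opaque blob of exactly one slot.
        if len(payload) != DESCRIPTOR_SIZE:
            raise ValueError("payload must be exactly one descriptor in size")
        offset = slot * DESCRIPTOR_SIZE
        self.memory[offset:offset + DESCRIPTOR_SIZE] = payload

    def read(self, slot):
        # What indexing the heap by a uint amounts to in this model.
        offset = slot * DESCRIPTOR_SIZE
        return bytes(self.memory[offset:offset + DESCRIPTOR_SIZE])

    def copy(self, dst_slot, src_slot, count=1):
        # Copying descriptors is a plain memory copy, so in principle any
        # agent that can write the buffer (CPU or GPU) could perform it.
        size = count * DESCRIPTOR_SIZE
        src = src_slot * DESCRIPTOR_SIZE
        dst = dst_slot * DESCRIPTOR_SIZE
        self.memory[dst:dst + size] = self.memory[src:src + size]


heap = DescriptorHeap(capacity=8)
heap.write(0, bytes([0xAB]) * DESCRIPTOR_SIZE)  # pretend texture descriptor
heap.copy(dst_slot=5, src_slot=0)               # "shuffle" it to slot 5
```

The foot-gun raised earlier in the thread shows up here too: nothing in the model stops a copy from overwriting a slot that something else is still reading.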

There's also potential with a bindless design to reduce the number of PSO permutations. Consider the case where we have two near-identical PSOs that share the exact same set of static states and shaders and differ only in the set of resources accessed. Bindless lets us reuse one PSO with different sets of resources, thus eliminating the need to compile/generate any redundant PSOs!
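
The PSO argument above can be put as a counting exercise. This is a toy model of the claim as stated, not of any real API: the key functions, the shader/state strings and the material tuples are all hypothetical, and it assumes an engine where the resource set is baked into the pipeline key unless bindless is used.

```python
def pso_key_bound(shader, static_state, resources):
    # Assumed non-bindless engine: the resource set is part of the PSO key.
    return (shader, static_state, resources)


def pso_key_bindless(shader, static_state, resources):
    # Bindless: resources are reached through indices at draw time,
    # so they no longer contribute to the PSO key.
    return (shader, static_state)


def count_psos(key_fn, draws):
    # Number of distinct pipelines that would have to be compiled.
    return len({key_fn(*draw) for draw in draws})


# Three draws with identical shaders and static state, different textures.
draws = [
    ("opaque_vs+ps", "depth_on/blend_off", ("albedo_a", "normal_a")),
    ("opaque_vs+ps", "depth_on/blend_off", ("albedo_b", "normal_b")),
    ("opaque_vs+ps", "depth_on/blend_off", ("albedo_c", "normal_c")),
]

bound_count = count_psos(pso_key_bound, draws)        # one PSO per material
bindless_count = count_psos(pso_key_bindless, draws)  # one shared PSO
```

In this model the bound path compiles three pipelines and the bindless path compiles one. How much of that saving a real engine sees depends on how its pipeline keys are actually built.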
 