Direct3D feature levels discussion

If graphics programmers want major API redesigns, then the onus is on them to push the very limits of graphics programming, even if it makes IHVs uncomfortable!

Barriers were recently refactored to be more explicit. Barrier APIs are more complex these days because some hardware in particular requires a more fine-grained resource state tracking model, with AMD being one of the lower common denominators there ...
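For anyone who hasn't looked at the new model, here's a minimal sketch of what one explicit transition looks like with D3D12's enhanced barriers; the command list (ID3D12GraphicsCommandList7) and renderTarget variables are assumptions for illustration:

```cpp
// Enhanced barriers: transition a texture from render-target writes to
// pixel-shader reads. The app spells out sync scopes, access, and layout
// explicitly instead of relying on implicit resource-state promotion.
D3D12_TEXTURE_BARRIER barrier = {};
barrier.SyncBefore   = D3D12_BARRIER_SYNC_RENDER_TARGET;    // producer stage
barrier.SyncAfter    = D3D12_BARRIER_SYNC_PIXEL_SHADING;    // consumer stage
barrier.AccessBefore = D3D12_BARRIER_ACCESS_RENDER_TARGET;
barrier.AccessAfter  = D3D12_BARRIER_ACCESS_SHADER_RESOURCE;
barrier.LayoutBefore = D3D12_BARRIER_LAYOUT_RENDER_TARGET;
barrier.LayoutAfter  = D3D12_BARRIER_LAYOUT_SHADER_RESOURCE;
barrier.pResource    = renderTarget;
barrier.Subresources.IndexOrFirstMipLevel = 0xFFFFFFFF;     // all subresources

D3D12_BARRIER_GROUP group = {};
group.Type             = D3D12_BARRIER_TYPE_TEXTURE;
group.NumBarriers      = 1;
group.pTextureBarriers = &barrier;
cmdList7->Barrier(1, &group);
```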

If the current resource binding model and "buffer zoo" concept stick out like a sore thumb, then we absolutely need more recent examples like Ghost of Tsushima to push the boundaries of bindless design and show the consequences for laggards such as Nvidia when more powerful bindless API designs like shader resource tables are emulated on inferior designs. Another big reason why bindless hasn't taken off is the fact that major ISVs such as Epic Games with Unreal Engine have been dragging their feet for nearly a decade on implementing this feature ...
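Shader resource tables are a console-side design, but for reference, the closest widely available expression of bindless on PC is SM 6.6 dynamic resources, where the shader indexes the descriptor heap directly and the CPU only binds an index. A minimal sketch, with materialTextureIndex being a hypothetical per-draw value:

```cpp
// Root signature flag that lets SM 6.6 shaders index the CBV/SRV/UAV
// descriptor heap directly via ResourceDescriptorHeap[].
D3D12_VERSIONED_ROOT_SIGNATURE_DESC rootSig = {};
rootSig.Version = D3D_ROOT_SIGNATURE_VERSION_1_1;
rootSig.Desc_1_1.Flags =
    D3D12_ROOT_SIGNATURE_FLAG_CBV_SRV_UAV_HEAP_DIRECTLY_INDEXED;

// Per draw: no per-resource binding, just a heap index as a root constant.
cmdList->SetGraphicsRoot32BitConstant(0, materialTextureIndex, 0);

// HLSL side, for reference:
//   Texture2D tex = ResourceDescriptorHeap[drawConstants.materialTextureIndex];
//   float4 color  = tex.Sample(linearSampler, uv);
```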

Work Graphs could be really helpful for expressing dependencies between different rendering passes if you want to avoid the complexity of using barriers in some cases, and its mesh nodes extension has potential for simplifying PSO management ...
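For context, launching a work graph on the app side looks roughly like the sketch below; it assumes the state object and backing memory were already created (CreateStateObject, GetWorkGraphMemoryRequirements), and EntryRecord is a hypothetical input record layout:

```cpp
// Bind the work graph program; the GPU schedules the node network itself,
// so the app doesn't insert barriers between producer and consumer nodes.
D3D12_SET_PROGRAM_DESC program = {};
program.Type = D3D12_PROGRAM_TYPE_WORK_GRAPH;
program.WorkGraph.ProgramIdentifier = programId;   // from ID3D12StateObjectProperties1
program.WorkGraph.Flags = D3D12_SET_WORK_GRAPH_FLAG_INITIALIZE;
program.WorkGraph.BackingMemory = backingMemory;   // sized per GetWorkGraphMemoryRequirements
cmdList10->SetProgram(&program);

// Feed one record to the entry node from the CPU.
struct EntryRecord { uint32_t gridSize[3]; };      // hypothetical record layout
EntryRecord record = { { 64, 64, 1 } };
D3D12_DISPATCH_GRAPH_DESC dispatch = {};
dispatch.Mode = D3D12_DISPATCH_MODE_NODE_CPU_INPUT;
dispatch.NodeCPUInput.EntrypointIndex     = 0;
dispatch.NodeCPUInput.NumRecords          = 1;
dispatch.NodeCPUInput.pRecords            = &record;
dispatch.NodeCPUInput.RecordStrideInBytes = sizeof(EntryRecord);
cmdList10->DispatchGraph(&dispatch);
```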

The way I see it, better API designs aren't going to sprout into existence by simply waiting out the stalemate. Developers need to do a better job of scaring IHVs into submission to extract whatever concessions they want out of them. Largely confining the ISV bullying to Intel, or to a lesser extent AMD, won't drive much impetus for API changes because there's a lack of pressure to motivate the biggest IHV to do better in some of these aspects ...
 
@Lurkmass I'm looking forward to Sebastian Aaltonen's upcoming blog post. I've been watching his commentary about the buffer "zoo". My understanding from his tweets is that if a hypothetical DX13 dropped support for Pascal, they could actually make a very lightweight API. The newer Nvidia hardware from Turing onward should have the capability he expects.
 
Ultimately, it's up to graphics programmers to decide whether or not to fall in line with IHVs, because IHVs can't design hardware around hypothetical usage patterns that don't exist. Thus I'm skeptical about selling the idea that we can massively improve API design, which is in and of itself subjective ...

Whilst Turing did implement a faster hardware path for bindless constant buffer views, the buffer zoo still very much exists in their HW designs, as there's still a performance penalty compared to using bound constant buffer views ...
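To spell out the two paths being compared, a sketch of the API side (cbvHeapIndex is a hypothetical root constant; neither call is a benchmark by itself):

```cpp
// "Bound" CBV: the buffer address sits directly in the root signature,
// which maps naturally onto the dedicated constant-buffer hardware path.
cmdList->SetGraphicsRootConstantBufferView(
    1, constantBuffer->GetGPUVirtualAddress());

// "Bindless" CBV: only an index is bound; the shader fetches the view
// through the descriptor heap (e.g. ResourceDescriptorHeap[idx] in SM 6.6).
cmdList->SetGraphicsRoot32BitConstant(0, cbvHeapIndex, 0);
```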
 

Why wouldn’t Nvidia voluntarily adopt the more elegant and performant design without being coerced by ISVs? Are the required hardware changes complicated and/or expensive?
 
I would imagine the driver side changes would also be incredibly complex and time consuming.

I was thinking there must be some benefit to the “buffer zoo”, otherwise why would any ISV willingly go down that path in a new game?

This sounds like a classic performance vs flexibility tradeoff. ISVs should just choose flexibility every time even if it makes Nvidia look bad. I agree with Lurkmass.
 
The buffer zoo might not be a deliberate choice so much as a result of architectural decisions that were made in the past.

I don't know if the performance vs flexibility choice is that clear cut.
 
Ultimately, it's up to graphics programmers to decide whether or not to fall in line with IHVs, because IHVs can't design hardware around hypothetical usage patterns that don't exist. Thus I'm skeptical about selling the idea that we can massively improve API design, which is in and of itself subjective ...

Whilst Turing did implement a faster hardware path for bindless constant buffer views, the buffer zoo still very much exists in their HW designs, as there's still a performance penalty compared to using bound constant buffer views ...

This is the post I was referring to. Outside my pay grade.

Looking forward to the full blog post he's working on.
 
Why wouldn’t Nvidia voluntarily adopt the more elegant and performant design without being coerced by ISVs? Are the required hardware changes complicated and/or expensive?
Cause it's elegant only from a graphics programmer's perspective and isn't elegant at all from a h/w complexity and cost perspective?
I think that we're well past the moment where spending transistors on programmer-side QOL was a good idea.
 

My guess is that it’s more elegant from a hardware perspective too, but it’s slower. Slower because, without the explicit API hints, the hardware can’t optimize as well for specific use cases.

It seems like the right thing to do long term. We’re constantly hearing from developers that they want more freedom to program generic hardware and APIs, but the results so far haven’t been great. If DirectX 12 was bad, I shudder to think what people will do with generic pointers to GPU memory. The horror.
 
Why wouldn’t Nvidia voluntarily adopt the more elegant and performant design without being coerced by ISVs? Are the required hardware changes complicated and/or expensive?
Perhaps, but if an API design has consistently sucked for the past several years with little to no sign of improvement in that area, what other means do ISVs have to force the issue besides making good on their threat to 'misuse' current APIs to the detriment of that specific hardware vendor? If we suppose that IHVs are rational actors that mostly look out for their own self-interest, and that developers were able to make AMD and Intel cave on ray tracing and GPU-driven rendering (ExecuteIndirect) respectively, then it's not out of the realm of possibility that they can change Nvidia's stance and work towards getting rid of the buffer zoo ...

Historically, the best way to get the attention of an IHV is to make benchmarks so that ISVs can dictate hardware design changes for themselves ...
This is the post I was referring to. Outside my pay grade.

Looking forward to the full blog post he's working on.
The buffer zoo on NV hardware still hasn't been entirely eliminated when we look at sebbbi's perftest application. Divergent indexed access to constant buffers (cbuffer{float4} load linear) is roughly 30x slower than a randomly accessed typed buffer on Ampere, and not using vertex buffers just means that you're wasting fixed-function vertex fetch hardware that comes for free over there ...

You can pretend that the buffer zoo doesn't exist on NV HW, but that may not be ideal for performance ...
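To make the "zoo" concrete: the same buffer memory can be exposed to shaders through several different view types in D3D12, and on some hardware each maps to a different load path. A sketch; device, buffer, and the descriptor handles are assumed to exist:

```cpp
// One buffer, three different "species" of view over the same bytes.
// On AMD these all lower to roughly the same buffer loads; on NV they
// can hit different hardware paths with different performance.

// 1. Constant buffer view (fast path when access is uniform).
D3D12_CONSTANT_BUFFER_VIEW_DESC cbv = {};
cbv.BufferLocation = buffer->GetGPUVirtualAddress();
cbv.SizeInBytes    = 65536;                       // must be 256-byte aligned
device->CreateConstantBufferView(&cbv, cpuHandle);

// 2. Typed buffer SRV (format conversion on load).
D3D12_SHADER_RESOURCE_VIEW_DESC typed = {};
typed.Format                  = DXGI_FORMAT_R32G32B32A32_FLOAT;
typed.ViewDimension           = D3D12_SRV_DIMENSION_BUFFER;
typed.Shader4ComponentMapping = D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING;
typed.Buffer.NumElements      = 65536 / 16;       // float4 elements
device->CreateShaderResourceView(buffer, &typed, cpuHandle2);

// 3. Raw (ByteAddressBuffer) SRV (no format, plain address math).
D3D12_SHADER_RESOURCE_VIEW_DESC raw = typed;
raw.Format              = DXGI_FORMAT_R32_TYPELESS;
raw.Buffer.Flags        = D3D12_BUFFER_SRV_FLAG_RAW;
raw.Buffer.NumElements  = 65536 / 4;              // 32-bit elements
device->CreateShaderResourceView(buffer, &raw, cpuHandle3);
```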
 

It’s a valid strategy, but only one with acceptable consequences when you’re sabotaging products with 0-10% market share.

What’s not clear from this discussion is whether there are still any benefits to specialized memory paths. Is AMD’s streamlined buffer implementation just as fast as Nvidia’s custom paths, or did they trade some speed for flexibility? If there’s no downside, it seems reasonable that Nvidia will follow suit eventually.
 
Which design is faster or slower is up for interpretation ...

Special memory spaces can potentially be faster if the program in question exhibits the optimal memory access patterns for them, but they can also be much slower than global/generic/general memory spaces when pathological access patterns are observed ...

AMD not having any special memory spaces means that the programmer doesn't have to think about applying memory access pattern optimizations for the differing types of buffers. I guess some graphics programmers prefer the sigh of relief of a convenient programming paradigm where users don't have to worry about losing out on performance, either from not making use of a special hardware path or from using it the wrong way, because it simply doesn't exist!
 