DirectX Developer Day 2020

There is nothing anywhere implying that "DXR 1.0 is deprecated", for AMD or any other vendor, nor that 1.1 is a "fast path" for AMD RT. 1.1 is an extension of 1.0 with additional options which may provide more efficiency depending on where and how they are used. It's entirely possible that some h/w will be more efficient doing things through 1.1 in a place where other h/w will be more efficient with 1.0 - and it may just as well be the opposite for the same respective h/w in some other case.

It's basically impossible to say something like what he has said before seeing this h/w in action on your own code.
Not knowing if you're right or wrong doesn't mean you're colorblind. The post itself is neutral: it comments on AMD's side based on what the poster knew, or thought he knew, from what MS and AMD had talked about (and it's at least not completely wrong based on the quick glimpse I've gotten at these - I haven't actually sat down to watch the presentations or read papers on it yet), and for NVIDIA it specifically mentions he has no idea whether there's a similar "fast path" (real or not) or even a need for one. It doesn't try to paint one competitor in a better light than the other, or vice versa, and it doesn't downplay either one.
People need to focus less on who's "supporting" whom or what, ignore who's saying something until they've actually read what it is they're saying, and focus on the matter at hand, not the persons talking about it.
 
I'm not convinced 1.1 is a lasting step forward, because it does not address things that are still missing. It may prevent a global HW reordering solution while still not giving the option to implement one in software.

If it comes to that in the future then we just deprecate the API and make a new ray tracing API that matches the hardware better. If OpenGL - which was effectively deprecated in its entirety - or previous versions of Direct3D that have stopped being used in any real capacity are anything to go by, then I don't see a real reason why compatibility must be absolutely guaranteed, beyond legacy purposes for applications that are no longer maintained by their authors.

No API truly has to keep compatibility, because 'cruft' will inevitably build up, and at that point an abstraction becomes obsolete as older hardware falls out of favour. Deferred contexts didn't survive the transition from D3D11 to D3D12, as an example. CUDA, for another example, deprecated implicit warp-synchronous programming with the release of Volta/CUDA 9.0 because it could potentially cause undefined behaviour on other sets of hardware, and instead introduced cooperative groups on the software side and independent thread scheduling on the hardware side. Newer versions of CUDA, starting from version 11.0, deprecate Kepler-based hardware and the API will be rendered incompatible with it.

For now, DXR 1.1 is here to stay for at least a couple of years, since some hardware maps better to it than to DXR 1.0.

Not knowing if you're right or wrong doesn't mean you're colorblind. The post itself is neutral: it comments on AMD's side based on what the poster knew, or thought he knew, from what MS and AMD had talked about (and it's at least not completely wrong based on the quick glimpse I've gotten at these - I haven't actually sat down to watch the presentations or read papers on it yet), and for NVIDIA it specifically mentions he has no idea whether there's a similar "fast path" (real or not) or even a need for one. It doesn't try to paint one competitor in a better light than the other, or vice versa, and it doesn't downplay either one.
People need to focus less on who's "supporting" whom or what, ignore who's saying something until they've actually read what it is they're saying, and focus on the matter at hand, not the persons talking about it.

Thank you, at least one person here gets it! I was indeed mostly commenting about how AMD hardware works.

Bindless shaders with dynamic divergent branching will produce very ugly codegen on AMD hardware. By offering inline ray tracing with a single shader, AMD hardware is able to use subgroup operations to mask out the inactive lanes when the shader takes a divergent path, and this is relatively fast for them. With dynamic and divergent shader program indexing, AMD hardware has no choice but to store multiple shader programs, which can quickly get out of hand with many shaders, and it's especially bad if a warp/wave has to start doing dynamic divergent branching across multiple function calls. Doing ray tracing with multiple shaders makes it very easy to spill into memory this way, which will cause slowdowns on AMD hardware.
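To make the distinction concrete, here is a minimal, hedged sketch of what inline ray tracing (DXR 1.1's RayQuery) looks like from a compute shader. The resource bindings and primary-ray setup are hypothetical placeholders; the point is only that traversal and hit handling live inside one ordinary shader, so per-lane divergence is handled by the execution mask instead of by switching shader programs.

Code:
RaytracingAccelerationStructure Scene : register(t0);
RWTexture2D<float4> Output            : register(u0);

[numthreads(8, 8, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    RayDesc ray;
    ray.Origin    = float3(id.xy, -1.0f);     // placeholder primary-ray setup
    ray.Direction = float3(0.0f, 0.0f, 1.0f);
    ray.TMin      = 0.0f;
    ray.TMax      = 1e30f;

    RayQuery<RAY_FLAG_FORCE_OPAQUE> query;
    query.TraceRayInline(Scene, RAY_FLAG_NONE, 0xFF, ray);

    // With opaque triangles only, traversal completes with no candidate processing.
    while (query.Proceed()) { }

    float4 color = float4(0, 0, 0, 1);
    if (query.CommittedStatus() == COMMITTED_TRIANGLE_HIT)
    {
        // Hit vs. miss is just a branch within this one shader.
        float t = query.CommittedRayT();
        color.rgb = saturate(t / 100.0f).xxx;
    }
    Output[id.xy] = color;
}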

One hard fact that an Nvidia engineer specializing in the CUDA stack revealed in public about ray tracing on their hardware is that independent thread scheduling gives warps/waves a program counter per lane, so it's cheap for their hardware to change shader programs in the case where a warp/wave has lanes that each want a different shader program, which is mainly caused by the divergent nature of dispatching rays.

That is my low-level understanding so far of how each hardware vendor currently handles ray dispatching in the case of diverging ray paths per lane, and it serves as my main argument for why DXR 1.0 and its introduction of shader tables should be deprecated altogether: its API practices are arguably less cross-vendor friendly from a performance perspective. It's currently cheaper on most hardware to take different branches in a single shader than to store multiple shader programs and switch to a different one whenever a lane happens to want a different shader program than the other lanes.
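As a hedged illustration of that "different branches in a single shader" point: the material IDs and shading functions below are hypothetical, but they show that only one shader program needs to be resident for the wave, and per-lane divergence is handled by the execution mask on the branch rather than by switching to a different shader program, as a DXR 1.0 hit-group dispatch through a shader table would.

Code:
// One program for every lane; divergence is just a masked branch.
float3 ShadeDiffuse(float t)  { return float3(t, 0.0f, 0.0f); }
float3 ShadeMetal(float t)    { return float3(0.0f, t, 0.0f); }
float3 ShadeEmissive(float t) { return float3(0.0f, 0.0f, t); }

float3 ShadeHit(uint materialId, float hitT)
{
    switch (materialId)   // may diverge between lanes, but it's still one shader program
    {
        case 0:  return ShadeDiffuse(hitT);
        case 1:  return ShadeMetal(hitT);
        default: return ShadeEmissive(hitT);
    }
}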

Now it'd be great if people here would stop and actually listen to what the experts had to say instead of being defensive by bringing up their fanboy BS ...
 
You do realize that some of the people in this thread are experts who code for AMD and NV hardware for a living, right?

Regards,
SB

Those people will come to an understanding, as they know exactly what I'm talking about, but the others who are obviously here to white-knight for an IHV and spew invalid BS like "inline ray tracing can't handle complex shaders and thus doesn't have visual parity with their favourite vendor" are absolutely in no place to make technical arguments here. There's a totally different mindset between being a fanboy and being a developer with the technical know-how ...
 
Those people will come to an understanding, as they know exactly what I'm talking about, but the others who are obviously here to white-knight for an IHV and spew invalid BS like "inline ray tracing can't handle complex shaders and thus doesn't have visual parity with their favourite vendor" are absolutely in no place to make technical arguments here. There's a totally different mindset between being a fanboy and being a developer with the technical know-how ...

If you don't want to get labeled as the one who is viewed less favorably, I'd suggest not using inflammatory language and instead trying to have technical discourse.

Wording like "spew invalid BS" doesn't contribute to the conversation. Instead saying something is invalid due to X reasons and then providing documentation will get you much farther. Or you believe Y is happening instead of X and again linking to documentation or at least citing information from a reputable source will help your case.

Throwing around the fanboy label is more likely to get you labeled as one instead of getting people to try to understand what you are saying.

Regards,
SB
 
If you don't want to get labeled as the one who is viewed less favorably, I'd suggest not using inflammatory language and instead trying to have technical discourse.

This is technical discourse - correcting false statements - but in the end it's a personal choice whether or not to embrace it, and some people just choose not to, so there's no room for further dialogue at that point.
 
Now it'd be great if people here would stop and actually listen to what the experts had to say instead of being defensive by bringing up their fanboy BS ...
...which is a bit hard if you accuse us of doing so ;)

But thanks for the explanation, and agreed.
(My comments on the topic come from the assumption of avoiding different material shaders in the first place, and from thinking about how a future reordering unit would fit into current APIs - less about which single codepath would suit both vendors better overall.)
 
...which is a bit hard if you accuse us of doing so ;)

But thanks for the explanation, and agreed.
(My comments on the topic come from the assumption of avoiding different material shaders in the first place, and from thinking about how a future reordering unit would fit into current APIs - less about which single codepath would suit both vendors better overall.)

I take absolute pride in my "no compromise approach" to educational discussions like these, even if it does mean coming across as confrontational.

It's really harmful for misconceptions to spread around, especially when a profession depends on precise technical knowledge, and getting the wrong idea will just raise unnecessary issues in the future. Outright wrong thinking shouldn't be encouraged like this, because it's a massive waste of valuable time on everyone's part.

Ray reordering hardware doesn't really seem to be on the agenda to standardize, so it might be pretty far out into the future to deal with.

In the near future, I'm placing my hopes on Microsoft standardizing fixed-function ray-box (ray-AABB) intersection testing for DXR 1.2! The new consoles already have this functionality available in the form of extensions, and I've also heard from other developers that Nvidia hardware has this too. Who else would be interested in hardware accelerated ray tracing for geometric representations other than triangles?

DXR's current limitation is that it only exposes fixed-function ray-triangle intersection tests, but so much more potential could be explored with accelerated ray-box tests.
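For reference, here's the kind of ray-box slab test that such fixed-function hardware would accelerate; today this math runs as plain ALU code, for example inside a DXR intersection shader for procedural AABB primitives. This is just a hedged sketch with illustrative parameter names, not any vendor's actual implementation.

Code:
// Standard slab test: returns true and the entry distance if the ray hits the AABB.
// invDir is 1.0 / ray direction, precomputed by the caller.
bool RayIntersectsAABB(float3 origin, float3 invDir, float3 boxMin, float3 boxMax,
                       float tMin, float tMax, out float tHit)
{
    float3 t0 = (boxMin - origin) * invDir;
    float3 t1 = (boxMax - origin) * invDir;
    float3 tNear = min(t0, t1);
    float3 tFar  = max(t0, t1);
    float tEnter = max(max(tNear.x, tNear.y), max(tNear.z, tMin));
    float tExit  = min(min(tFar.x, tFar.y), min(tFar.z, tMax));
    tHit = tEnter;
    return tEnter <= tExit;
}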
 
Bindless shaders with dynamic divergent branching will produce very ugly codegen on AMD hardware. By offering inline ray tracing with a single shader, AMD hardware is able to use subgroup operations to mask out the inactive lanes when the shader takes a divergent path, and this is relatively fast for them. With dynamic and divergent shader program indexing, AMD hardware has no choice but to store multiple shader programs, which can quickly get out of hand with many shaders, and it's especially bad if a warp/wave has to start doing dynamic divergent branching across multiple function calls. Doing ray tracing with multiple shaders makes it very easy to spill into memory this way, which will cause slowdowns on AMD hardware.

If true this limitation wouldn’t be specific to raytracing shaders. How do developers deal with this today on AMD hardware?
 
If true this limitation wouldn’t be specific to raytracing shaders. How do developers deal with this today on AMD hardware?

Sure, it doesn't necessarily have to be specific to ray tracing shaders, but given that bindless shaders (not to be confused with bindless 'resources', which is a different subject altogether) are only available with them on current PC APIs, it's practically the only case where this presents an issue.

Bindless shaders essentially act as function pointers in disguise, but on GPUs their address can dynamically diverge for each lane during execution! Traditionally, PC APIs don't allow you to use bindless shaders or function pointers in your vertex/fragment/compute shaders, so this is never an issue in practice with the traditional graphics pipeline: developers aren't given access to the feature, so they have to formulate their shader programs with this restriction in mind regardless.

Shader languages and shader intermediate representations haven't really evolved much with respect to pointers, so their programming model is highly restrictive and outdated compared to a CPU using standard C++. Even GPU languages that are advanced in terms of feature set, like the CUDA kernel language, feel awkward compared to standard C++.

On consoles it's possible to use function pointers, but it's expected that most developers will do the smart thing and optimize the address to be uniform so that all lanes in a wave execute the same function call, and this can be fine in terms of performance.
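HLSL has no function pointers, so on the PC side the closest analogue of keeping the call target uniform is scalarizing a divergent index before the expensive work: each loop iteration runs with a wave-uniform value, so all active lanes take the same path. A hedged sketch follows (ShadeMaterial and its parameters are hypothetical placeholders; requires Shader Model 6.0 wave intrinsics).

Code:
float3 ShadeMaterial(uint materialId, float2 uv) { return float3(uv, (float)materialId); }

float3 ShadeScalarized(uint materialId, float2 uv)
{
    float3 result = 0.0f;
    for (;;)
    {
        // Value from the first still-active lane; uniform across the wave this iteration.
        uint uniformId = WaveReadLaneFirst(materialId);
        if (uniformId == materialId)
        {
            result = ShadeMaterial(uniformId, uv); // called with a wave-uniform argument
            break;                                 // this lane is done and leaves the loop
        }
    }
    return result;
}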
 
Because DXR 1.0 is a lie on their hardware and I made no mention of any other vendors in relation to performance until you and others got defensive about it.
This was your parlance:
Whatever it is that Nvidia taught developers with DXR Tier 1.0, it's probably lies and it does not match how ray tracing works on consoles.
You came out and called NVIDIA's practices a lie and started implying some kind of conspiracy theory against RT on consoles; that was your initiative, no one else's.

DXR 1.1 is the future otherwise Microsoft would've called it DXR 1.0 version B instead but nope DXR 1.1 signifies an absolute step forward in comparison to DXR 1.0
I respectfully disagree. When Microsoft wants to introduce a major step they will call it DXR 2.0 or DXR Ultimate or whatever, not simply 1.1.

they fully expect programmers to adopt it's latest standard instead of adopting an outdated standard.
I guess we shall see about that.
and for NVIDIA it specifically mentions he has no idea whether there's a similar "fast path" (real or not) or even a need for one. It doesn't try to paint one competitor in a better light than the other, or vice versa, and it doesn't downplay either one.
That's not true at all given the quote I mentioned above. Please pay attention to the conversation more closely.
 
This was your parlance:

You came out and called NVIDIA's practices a lie and started implying some kind of conspiracy theory against RT on consoles; that was your initiative, no one else's.

It just meant that what techniques Nvidia 'taught' didn't 'apply' on consoles ...

I respectfully disagree. When Microsoft wants to introduce a major step they will call it DXR 2.0 or DXR Ultimate or whatever, not simply 1.1.


I guess we shall see about that.

Sure, you can disagree, but between 1.0's "shader tables" and 1.1's "inline ray tracing" it's pretty clear where the foundations will lie for the next generation in terms of programming model.
 
That's not true at all given the quote I mentioned above. Please pay attention to the conversation more closely.
I was talking about his first post in the thread, as I had previously specifically said, which is where things started rolling after someone read into it something it really didn't say. Maybe I'm not the one not paying attention ;)
 
Sure, it doesn't necessarily have to be specific to ray tracing shaders, but given that bindless shaders (not to be confused with bindless 'resources', which is a different subject altogether) are only available with them on current PC APIs, it's practically the only case where this presents an issue.

Bindless shaders essentially act as function pointers in disguise, but on GPUs their address can dynamically diverge for each lane during execution! Traditionally, PC APIs don't allow you to use bindless shaders or function pointers in your vertex/fragment/compute shaders, so this is never an issue in practice with the traditional graphics pipeline: developers aren't given access to the feature, so they have to formulate their shader programs with this restriction in mind regardless.

Shader languages and shader intermediate representations haven't really evolved much with respect to pointers, so their programming model is highly restrictive and outdated compared to a CPU using standard C++. Even GPU languages that are advanced in terms of feature set, like the CUDA kernel language, feel awkward compared to standard C++.

On consoles it's possible to use function pointers, but it's expected that most developers will do the smart thing and optimize the address to be uniform so that all lanes in a wave execute the same function call, and this can be fine in terms of performance.

It is very frustrating seeing no progress on the compute side of things.
From the outside it looks like the game industry is only interested in shiny robots, increasing details, and doing low-level optimizations, treating GPUs as a one-way brute-force machine not intended to generate work on its own efficiently, manage its memory dynamically, or do anything useful other than coloring pixels.

I would be happy enough if I had conditional command buffer execution like Mantle has, but neither DX nor VK allows barriers to be managed within their conditional options, so it's useless. Who cares.
Task shaders could be generalized to work with compute, but of course it's only about pushing more tiny triangles. Why would one want to call subroutines and pass data through on-chip memory efficiently? Surely there is no need for such an essential thing in games.
Same with the various RT shaders - obviously there is HW support for fine-grained scheduling and dispatch. But gamedevs surely are too dumb to make use of this outside of RT, so there shouldn't be any urgent need to expose it.

It's nice to see there is progress on the API side at all, but the current state of general compute feels stuck and is starting to look ridiculous. I do not request full C++ support, but there should at least be more than nothing... :(
 
It just meant that what techniques Nvidia 'taught' didn't 'apply' on consoles ...

But ask yourself why that is. Remember, 1.0's purpose was to make it easier for developers to hit acceptable levels of performance without needing to know how the underlying hardware worked. For instance, there is nothing in DXR that mandates the underlying acceleration structure be a BVH. It could technically be anything. It's still too early for everyone (IHVs and ISVs) to agree on a generic acceleration structure for RT. In the early stages you need the black box. Unless of course you're an isolated platform that only has one architecture to optimize for. Then of course you know exactly how the underlying hardware works! For these platforms (cough consoles cough), the need for a black box is significantly reduced and flexibility is more desired.

I love you "black box truthers" btw. Yes MS is artificially limiting your flexibility in compute shaders because [some reason I never understood]. I got bad news for you, the underlining hardware just isn't that good (scheduling) or doesn't match up well enough with other architectures (barriers) to be more generic. You think that's real callable shaders you're getting in DXR? hahahahahahahahaha...surprise it's just a really big uber shader created for you in the background. You can do this already yourself. You don't though because usually you care about maximizing your performance. Which of course is the crux of the matter, you want callable shaders and more flexibility in CS? Sure, how much performance you willing to give up for them? MS doesn't include callable shaders in CS because people like you would actually use them instead of VERY sparingly using them in isolated use cases on an isolated group of GPUs (ray tracing). Then you'd wonder why everything is so slow. Must be the "PC APIs"... :D [hint: MS can only expose what the hardware allows them too...]

I just want to note that it's natural for black boxes to have progressions and evolutions. When we evolve and change an API it doesn't mean the previous versions were bad!!! DXR 1.0 was a good abstraction for its time; it brought non-trivial real-time ray tracing to video games for the first time. Pretty good! :mrgreen: DXR 1.1 is extending that reach and integrating RT into places where 1.0 couldn't (easily) go before. There are use cases for 1.0. There are use cases for 1.1. And there will be use cases for [next version of DXR]...:D
 
We should just focus on building a time machine and then go back and do it right the first time. Evolution and reality are so overrated. Who needs 1nm, or transistors at all, when we could have quantum computers and dinosaurs after inventing the time machine.
 
MS doesn't include callable shaders in CS because people like you would actually use them instead of VERY sparingly using them in isolated use cases on an isolated group of GPUs (ray tracing).
You have no idea what I would do, so why project your false assumptions onto me in this tone?

It may well be that RTX creates uber shaders under the hood with jump tables - or whatever else. I guess they have at least some additional options here that are hidden from me. Personally I can NOT create huge shaders, because occupancy would drop in practice.

Feel free to also explain the situation with mesh shaders, which boil down to indirect dispatch and taking care of data flow without a need for prerecorded barriers from the CPU side, for workloads that might never end up doing any work at all. <- that's what I want.

And finally, Mantle has the option to skip over sections of prerecorded command buffers, including barriers. Which obviously works in a hardware-friendly way - otherwise we would not see it in a vendor API? How old is Mantle?
It is not about black boxes being needed initially to get things moving. So your black box / RT API defense addresses none of my arguments.


So those are my points (again). I did not request C++ parity or total freedom. Read my post again.
If you are happy with things as they are, and offend people who try to think ahead - be happy with remaining stuck here.
I'll keep mentioning my requests and complaints in the hope that things will improve. It's not about criticizing anyone's beloved APIs, vendors, or other unrelated parties.
 
Feel free to also explain the situation with mesh shaders, which boil down to indirect dispatch and taking care of data flow without a need for prerecorded barriers from the CPU side, for workloads that might never end up doing any work at all. <- that's what I want.

This is exactly my point! How many GPUs support mesh shaders? Turing and RDNA2? MS is the bad guy for not supporting a feature that at this very moment only ONE architecture from ONE IHV supports? Really? You realize mesh shaders won't be fully utilized in game engines (i.e. be the baseline) for like another 5 years, right?

And finally, Mantle has the option to skip over sections of prerecorded command buffers, including barriers. Which obviously works in a hardware-friendly way - otherwise we would not see it in a vendor API? How old is Mantle?
It is not about black boxes being needed initially to get things moving. So your black box / RT API defense addresses none of my arguments.

Another great example: the barrier API in Mantle was fundamentally incompatible with Nvidia's hardware and (especially) Intel's hardware (they have very different barrier granularity). You think MS/IHVs made barriers harder on DX12/Vulkan for the lulz (you hate barriers on DX12? you'll just love love love Vulkan then)? Given the current state of ALL hardware at the time, this was the compromise that had to be forged (and if you're Vulkan you compromise even more for tilers).

You have no idea what I would do, so why project your false assumptions onto me in this tone?

What I mean is, callable shaders are not ready for primetime (the general market). Not even close! This is not to say you couldn't come up with a compelling use case for callable shaders. However, given the current state of hardware, I find it extremely unlikely you'll come up with a compelling use case that's practical/feasible for the general market to use. Don't take it personally, I don't think anyone can! I mean, look at ray tracing! MS/NV/ISVs/etc. have spent an enormous effort on making those "callable shaders" as fast as possible on a very small range of GPUs (only the "latest and greatest" GPUs). And guess what? People still complain about the performance...:D
 