AMD made a mistake letting NVIDIA drive Ray Tracing APIs

You're right, the HW depends on the data structure, and we can't modify this structure. But we can modify the data.
If we know the specs, we can convert our data to the specified format the HW understands. Then we can do:
* Build the BVH offline, stream it, and convert it to the GPU format (see the sketch below). (Due to storage costs, that's maybe not always practical for the whole BVH. But then we can build only the lowest levels of the tree on the client as a compromise.)
* Ideally, apply changes to only parts of the BVH. E.g. if a cluster of a model changes its geometry due to LOD.
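To illustrate the first point, a minimal sketch of the load-time conversion, assuming a made-up serialized format on our side; the actual HW node layout is exactly the undocumented part, so the packing function below is just a placeholder:

#include <cstdint>
#include <cstring>
#include <vector>

struct OfflineBvh4Node                     // our own serialized node format (an assumption, not any existing API)
{
    float    boundsMin[4][3];              // one AABB per child of the BVH4 node
    float    boundsMax[4][3];
    uint32_t children[4];                  // child node index, or primitive index for leaves
    uint32_t childIsLeafMask;
};

struct HwBvhNode { uint8_t packed[64]; };  // placeholder for the undocumented vendor layout

// The function we cannot write today: pack our node into the layout the HW traversal
// units expect. With a released spec this becomes a straight data conversion.
HwBvhNode packNodeForHw(const OfflineBvh4Node& n)
{
    HwBvhNode out = {};
    std::memcpy(out.packed, &n, sizeof(out.packed));   // dummy packing, for illustration only
    return out;
}

std::vector<HwBvhNode> convertStreamedBvh(const std::vector<OfflineBvh4Node>& streamed)
{
    std::vector<HwBvhNode> out;
    out.reserve(streamed.size());
    for (const OfflineBvh4Node& n : streamed)
        out.push_back(packNodeForHw(n));   // in practice a compute pass at load/stream time
    return out;
}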

So we eliminate most of the cost of BVH building in the background during gameplay,
and we can implement fine grained LOD such as Nanite.
We may even get slightly better tracing performance from a high quality offline build (but maybe also slightly worse if our source data isn't ideal for every unique chip out there).
We also achieve feature parity with consoles, if that matters.

To make it work and avoid crashes, we must know every detail of the data structure the HW expects, plus ideally some optimization guide on what is and isn't ideal for the HW.
Reading your response is precisely why I think they did not go this route. It goes against the point of having DXR; you may as well let every vendor have their own RT extension and just bolt that onto DirectX if that is the case. The idea of having a single call, with the drivers doing the work for you, is the point of all of DXR, and the performance costs are the compromise for letting developers write code once and deploy it on multiple cards.

Developers are having challenges with DX12 today, and that's considered low level. I can't imagine developers having to manage the acceleration structure for a variety of GPUs in a single family, let alone doing this for several vendors across mobile, console and desktop.

I think if LOD is a critical component, LOD will be handled the DX way: single code, multi deploy, and it will be up to hardware vendors to ensure their drivers function well enough to get good performance out of it. I can't imagine any developers actually wanting to take on this task unless they only had a single target (i.e. PS5). A multiplatform deployment sounds excruciatingly painful.
 
So we eliminate most of the cost of BVH building in the background during gameplay,
and we can implement fine grained LOD such as Nanite.
Constructing, streaming, and caching a BVH for billions of micropolygons isn't efficient anyway, whether offline or online. I think the best approach is to provide a common API that allows developers to rebuild & refit a small portion of a BLAS, or to link a specific leaf node to another TLAS?
 
Reading your response is precisely why I think they did not go this route. It goes against the point of having DXR; you may as well let every vendor have their own RT extension and just bolt that onto DirectX if that is the case. The idea of having a single call, with the drivers doing the work for you, is the point of all of DXR, and the performance costs are the compromise for letting developers write code once and deploy it on multiple cards.
That's why the whole thing would be optional.
The driver still can manage all BVH for you, and if you don't need to look under the blackbox, you won't.
I assume only very few engines would go the route of custom BVH management. But they would have an advantage over the competition, so with time, utilizing the feature would become more widespread.
And in times when most games are made using U engines, it's actually enough if only a few engines tackle it, sadly.

It's not great, but keep in mind: it's the only way to get traceable LOD. Otherwise we are stuck with discrete LOD till the end of time. Discrete LOD is no solution, and after Nanite it's already outdated.
The situation is now much worse than it would have been if they had included specifications or standards at launch. But that's their fault. They messed it up; now they have to clean it up.
The whole idea of generating the BVH in-game is silly. I'd excuse them for not thinking about LOD, but there should have been options to stream BVH from the start. RT is expensive enough - why do expensive BVH builds if they can be precomputed? Such streaming could have remained blackboxed, and then we would already have a base format the driver converts to the HW layout, which would be a start towards dynamic BVH.

If the blackbox isn't lifted, the only option becomes a standardized LOD solution, which IHVs would have to support in driver software. E.g. a 'Nanite' free for everybody to use.
But we don't have such a standard. Nanite can only do LOD per model, so it would require splitting a large terrain into thousands of unique models. This would cause cracks between the models, and it would take too much storage and memory anyway. That's why UE5 games still use lo-fi heightmaps, decorated with some rock models on top. Nanite is great, but it's no general solution yet.
To work out solutions that work for us, the APIs must actually allow us to do so.

So you see, it sucks, but there is no good alternative. The only alternatives are to accept stagnation on the LOD problem, to simply not use RT, or to accept inaccuracy and redundant costs like Epic does.

Texture compression formats are actually a good example. It started with many vendor-specific formats, but MS enforced standards (iirc) and licensing issues were resolved.
The technical complexity isn't worse for a BVH structure. And I'm not even requesting a standard. I would be happy just to know the individual formats IHVs use. I can't be more modest than that.

Developers are having challenges with DX12 today, and that's considered low level. I can't imagine developers having to manage the acceleration structure for a variety of GPUs in a single family
I have to do all this for my own BVH anyway, so I have experience and can imagine the effort involved.
Regarding the increase in complexity, we can compare this to two things:
Moving from rasterization (point lights, fake falloff, Phong shading -> PBS, TAA) to raytracing (Monte Carlo integration, importance sampling, PBS, still fake falloff, ReSTIR, denoising).
Moving from discrete LOD to Nanite.
The effort is much smaller than either of these. Devs can do it.
Anyone who thinks BVH is black magic and requires rocket science skills is wrong. It's just a tree with a bounding volume for each node. A blackbox does not mean there must be a can of worms under it.
I would also say that mastering low level gfx APIs as a whole is much harder and covers many more details as well.

LOD will be handled the DX way: single code, multi deploy
I could elaborate on why general LOD is impossible to solve that way.
If we try to enforce a standard, it will be bad and short-sighted.
 
Constructing, streaming, and caching a BVH for billions of micropolygons isn't efficient anyway, whether offline or online. I think the best approach is to provide a common API that allows developers to rebuild & refit a small portion of a BLAS, or to link a specific leaf node to another TLAS?
My conclusion is that the only practical option is to precompute offline, but only the top levels; otherwise it takes too much storage.
For example, we have 20 levels for a BVH4, which would be enough for a surface of 1M * 1M nodes / triangles.
If we store only 18 levels, that's 260k * 260k leaf nodes at the cut - much less, because node count grows exponentially with depth.
A single workgroup can easily build the 2 (or more) bottom levels without needing barriers between the levels. We also avoid the problem of not saturating the GPU for the top levels, and a high quality tree isn't needed for the bottom levels either.
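Quick sanity check of those numbers (just a throwaway calculation, nothing vendor-specific):

#include <cstdint>
#include <cstdio>

int main()
{
    for (int depth : { 18, 20 })
    {
        const uint64_t side   = 1ull << depth;   // 2^depth leaves per side of the surface grid
        const uint64_t leaves = side * side;     // 4^depth leaves in total for a BVH4
        std::printf("depth %d: %llu x %llu grid, %llu leaves\n",
                    depth, (unsigned long long)side, (unsigned long long)side,
                    (unsigned long long)leaves);
    }
    // depth 20: ~1.05M x 1.05M grid, ~1.1e12 leaves
    // depth 18: ~262k x 262k grid,   ~6.9e10 leaves -> 16x less to precompute and store,
    // and the 2 missing bottom levels are cheap to build per workgroup at load/stream time.
    return 0;
}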

But that's not the only thing we can do. I'm working on much higher compression levels. It should work, but so far it has already taken 3 * the dev time I initially estimated... I hope I'll ever get done with that crap ;)
 
LOD will be handled the DX way
I just thought of something which would actually work:
Add an option for local updates to the BVH blackbox.
If a cluster on the mesh changes detail, we issue a call to rebuild the BVH within the bounding box of that cluster.

So we don't have to rebuild the whole BLAS just because one cluster changes.
It will still do much more work than actually needed, but it might work well enough eventually. And the blackbox remains intact.
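Purely to illustrate the idea, here is a hypothetical shape such an extension could take - this is not an existing DXR or driver API, every name below is made up:

#include <cstdint>

struct Aabb { float mn[3], mx[3]; };

// Hypothetical request: "the geometry inside this box changed detail, rebuild only there".
struct BlasLocalRebuildRegion
{
    Aabb     bounds;          // bounding box of the cluster that changed its LOD
    uint32_t geometryIndex;   // which geometry inside the BLAS is affected
};

// Hypothetical driver entry point: rebuild only the BLAS subtrees overlapping the given
// regions instead of rebuilding the whole BLAS because one cluster changed.
void RebuildAccelerationStructureRegions(void* blas /* opaque BLAS handle */,
                                         const BlasLocalRebuildRegion* regions,
                                         uint32_t regionCount)
{
    // A driver would find the affected subtrees, rebuild them against the updated vertex
    // data, and refit the ancestor bounds up to the root. The blackbox stays intact;
    // only this narrower contract is exposed.
    (void)blas; (void)regions; (void)regionCount;
}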

That's really not too bad, I think.
But it's also much more work for IHVs to implement than just releasing a pdf about their specs.
 
That's why the whole thing would be optional.
The driver still can manage all BVH for you, and if you don't need to look under the blackbox, you won't.
I think there's a reason we went away from the days of 3DFX Glide and S3 Metal and the dozens of audio codecs. If you want an optional code path working at that level, companies like Epic could demand it, and maybe enough developers could demand it that an extension gets released, but I don't see how it could be integrated into the DirectX standard.
But it's also much more work for IHVs to implement than just releasing a pdf about their specs.
Pretty sure this is much more work for IHVs than just providing a PDF. Releasing documentation and saying "you're on your own" is very different from supporting teams for success. It's going to backfire spectacularly, and there are going to be vast differences in performance between vendors depending on how much the developer team was supported by each vendor.

It's really bad from a business perspective, for something that can later be solved by working with vendors and MS to develop the hardware and software together, bypassing the need for everyone to get into the guts of it.

DXR 2.0 is coming around the corner, and AI/ML technologies have dramatically improved performance throughout, significantly more than hand-tuned, optimized coding. Between having access to BVH data and hand-tuning it vs. a vendor creating silicon hardware and a driver stack on a long-term roadmap of what's coming, the odds are heavily in favour of the vendor winning.

Sorry. I love the romanticism behind it, but you've got no chance against what's coming in 8 years. Nvidia could offer full support and access to all their black box hardware, and 30 years of hand tuning by CDPR wouldn't approach the performance level they have with Overdrive in Cyberpunk 2077.
 
I don't know what's coming in 8 years. But if you do, we're all ears...

With DLSS 3 about seven-eighths of pixels are AI generated today. Or 87.5% of all pixels are AI generated.
Eventually they will move closer to 99% within 8 years.
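For reference, the arithmetic behind that figure (assuming 1080p -> 4K upscaling, i.e. 4 * the pixels, with every second frame interpolated):
1 - 1 / (4 * 2) = 7/8 = 87.5% of output pixels not produced by the traditional pipeline.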

So I ask why we are wasting years focusing on optimizing the 1%.
 
With DLSS 3 about seven-eighths of pixels are AI generated.
Eventually they will move closer to 99%.
But how does this matter for anything i've discussed?

You mean AI will completely replace any traditional rendering tech, plus gameplay programming as well, and finally any software engineering?
So there's no more need to work on any of those open problems, and i could just quit, wait and see for the singularity to happen?

Guess we'll find out. Personally i think we'll always need traditional engineering to have precise control.
 
But how does this matter for anything i've discussed?

You mean AI will completely replace any traditional rendering tech, plus gameplay programming as well, and finally any software engineering?
So there's no more need to work on any of those open problems, and i could just quit, wait and see for the singularity to happen?

Guess we'll find out. Personally i think we'll always need traditional engineering to have precise control.
Well the concept is whether it makes sense to accept inefficiencies in programming as we go forward. Instead of needing to do LOD everywhere let AI handle it. It’s approximate, imperfect, but it’s enough to fool people and enough to run at high speed as well.
You can focus your attention on building a game instead of needing to optimize everything for every single vendor.
 
Well the concept is whether it makes sense to accept inefficiencies in programming as we go forward. Instead of needing to do LOD everywhere let AI handle it. It’s approximate, imperfect, but it’s enough to fool people and enough to run at high speed as well.
So that's how AI handles the visual LOD problem.
But how does AI handle the memory problem? E.g. we're on top of a mountain and look down. There is Los Santos, Night City, and some others to see. How can AI show those things if they don't fit in memory because we have no LOD?
Let me guess: There is a trained model of all those things seen from any point, at any distance, and this model is robust enough so AI can simulate - no, imagine and invent - all the stuff like car accidents, pedestrians going mad, prostitutes stripping in a club, and it all fits on a floppy disk and runs on a potato. It also feels real. Nobody has 6 fingers in AI Los Santos.

> You can focus your attention on building a game instead of needing to optimize everything for every single vendor.
Why should i waste time to build anything? AI will do it for me. I just type 'Make AI Los Santos' and done.

Anyway, i won't optimize BVH for all vendors, since that's sadly not possible.
 
So that's how AI handles the visual LOD problem.
But how does AI handle the memory problem? E.g. we're on top of a mountain and look down. There is Los Santos, Night City, and some others to see. How can AI show those things if they don't fit in memory because we have no LOD?
Let me guess: There is a trained model of all those things seen from any point, at any distance, and this model is robust enough so AI can simulate - no, imagine and invent - all the stuff like car accidents, pedestrians going mad, prostitutes stripping in a club, and it all fits on a floppy disk and runs on a potato. It also feels real. Nobody has 6 fingers in AI Los Santos.

> You can focus your attention on building a game instead of needing to optimize everything for every single vendor.
Why should i waste time to build anything? AI will do it for me. I just type 'Make AI Los Santos' and done.

Anyway, i won't optimize BVH for all vendors, since that's sadly not possible.
It will be up to the vendors to figure out the best next places for AI to intervene. You are trying to solve it using your head, which is awesome. But several data scientists are looking at the render pipeline and saying: here is where we can insert more AI to do more, e.g. let AI figure out how to light the scene. You only need to provide it some structure for it to do the work.

 
It will be up to the vendors to figure out the best next places for AI to intervene.
Why just vendors? Because only the largest mega corps can afford to experiment with AI? (serious question)

Regarding the examples you give, I think you know as much about gfx programming and its problems as I know about AI.
That's not said to argue, but from your questions and responses I feel like the LOD problem is still not understood.
Also, the two videos you gave as examples do not address any problems games actually have, where AI might then be the game-changing solution.
Imo, better examples would be things like NeRFs, neural radiance caches, and so on.

Actually I think neither of us has the knowledge to tell the other they'd better give up because the 'competing' field will solve the related problems better.
To do this, we would both need some expertise in both of those fields, which isn't the case, so we end up with reactions based on second-hand knowledge or prejudice.

Regardless, I'll tell you what I think ML can do for games and what my expectations are. Just to make clear I'm not against ML only because I personally would not currently pay for tensor cores if playing games is my intent.

The first application seems to be procedural content generation to save costs. I'd have high hopes here, but I expect it to be more specific than something like Stable Diffusion. I'm thinking about generating rocks, aging effects, image synthesis, etc.
A very interesting runtime application is to replace traditional animation and mocap with procedural animation. ML has shown very impressive results here. Really promising.
And finally, the killer application for ML in games is simply AI itself. The goal is to get dynamic NPCs, no longer relying on static data such as animation and recorded speech. Dynamic and adaptive stories. A game changer. Industry veterans often say this won't work due to lack of control, but I'd rather believe in new things than in game design concepts of the past.

ML will surely find some more applications in gfx too, but we already know how to do gfx, and it works.
The NV fanclub of course thinks differently here, but that's obviously because gfx research is what NV has the most experience with, so naturally they look for applications there.
It's just not really that interesting. Better gfx is no longer an argument for making games better. People want games that are actually fun to play. The times when gfx sold games are over.
So I really think there are better opportunities for AI, e.g. as said above - mainly problems where we don't already know how to solve them.
 
So I ask why we are wasting years focusing on optimizing the 1%.

That 1% needs to be high quality all the more, not aliased.

PS: as for AI, NVIDIA dropped the ball a bit on inference ... it's important not to follow them too slavishly there either. 4 bits is the new 16, but for weights only ... inference engines need mixed-precision vector-matrix multipliers (i.e. a 16-bit vector times a 4-bit array).
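A tiny scalar sketch of one row of such a mixed-precision multiply - just an illustration; the nibble packing and per-row scale here are my own assumptions, and real inference engines do this in hardware/SIMD with fp16 activations rather than float:

#include <cstdint>
#include <vector>

// One row of a W4A16-style vector-matrix multiply: 16-bit activations (shown as float
// here for portability), weights stored as two signed 4-bit values per byte.
float dotW4A16(const std::vector<float>& act,        // activations (fp16 in a real engine)
               const std::vector<uint8_t>& wPacked,  // packed 4-bit weights, (act.size()+1)/2 bytes
               float rowScale)                       // per-row dequantization scale (assumption)
{
    float sum = 0.0f;
    for (size_t i = 0; i < act.size(); ++i)
    {
        const uint8_t nibble = (i & 1) ? (wPacked[i / 2] >> 4) : (wPacked[i / 2] & 0xF);
        const int w = int(nibble) - 8;                // map [0,15] to signed [-8,7]
        sum += act[i] * float(w);                     // accumulate at higher precision
    }
    return sum * rowScale;                            // rescale once per row
}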
 
feel like the LOD problem is still not understood
Even with Nanite, the LOD problem is not solved yet: characters, dynamic objects, shadows, lights, major terrain pieces, etc. all still exhibit the traditional pop-in issue. I play Fortnite regularly, and whenever you are parachuting down over the island, major pieces of geometry/terrain suddenly appear in front of you. It's done due to memory constraints and terrain issues, but it is there nonetheless. The other elements look jarring next to them as well.
 

With DLSS 3 about seven-eighths of pixels are AI generated today. Or 87.5% of all pixels are AI generated.
Eventually they will move closer to 99% within 8 years.

So I ask why we are wasting years focusing on optimizing the 1%.

None of these pixels in DLSS3 are purely "AI generated"; some network moves accumulated data around based on a set of parameters, and the initial rendering that feeds it the data is all "traditional".
The video is also just library compression, trading off runtime generation cost, precomputed library cost, and reconstruction error for some data compression. That isn't good at all for the temporal coherence of a 4D lightfield, aka animated 3D rendering - you'd get flickering everywhere. I wish I could find it: there's a meme floating around Twitter about the new "AI reconstructed" enhancement of the black hole picture, comparing it to an AI reconstruction of what is clearly just Obama blurred out, being reconstructed into a totally unrecognizable, almost white man.

Heck, even NeRFs aren't neural at all. They're just volume rendering where the volumes are constructed by neural nets, but the rendering is "traditional". AI bros just have a hard time not attaching "neural" to anything and everything all the time.

Think of it this way: there is no way to beat entropy. A thing, anything, of complication X can either be precomputed to minimum size X, or generated at runtime with minimum complication X, but that's it. There's no magic way around minimum complication X unless P = NP, which would break thermodynamics and physics as we know it. All you can do is trade off error, which is to say "make X less complex", which is to say less detailed. AI often just amounts to "precompute a lot of Ys, store them somewhere, and then assume input X is similar to the group of Ys we precomputed". If Y contains all possible inputs X, then it's friggin' huge; if it doesn't, it's smaller but incorrectly "reconstructs" X.
 
HIP RT 2.0 exposes an API that is supported by RDNA3 only. AMD obviously has the freedom to support extended ray tracing capabilities without being bound by DXR/VKRT.
/** \brief Traversal hint.
 *
 * An additional information about the rays for the traversal object.
 * It is taken into account only on AMD Navi3x (RDNA3) and above.
 */
enum hiprtTraversalHint
{
    hiprtTraversalHintDefault        = 0,
    hiprtTraversalHintShadowRays     = 1,
    hiprtTraversalHintReflectionRays = 2
};

 
Why just vendors? Because only the largest mega corps can afford to experiment with AI? (serious question)
So there's a bit of a multifaceted answer to this one. R&D in machine learning is largely driven by your data sets (corpus) and your training iteration speed. Both of these become cost-prohibitive when the data sets are massive and the time to train a network is extremely high. Some of our state-of-the-art machine learning models cost to the tune of 30K to train, and there will be many iterations of this to fine-tune a model. So assuming costs are OK, the next challenge is actually productizing AI, which is very different from just being able to train models. For mega corporations who need to see a return on AI investments, it's important that they can package and sell their technologies; otherwise R&D just becomes a deep hole of costs.

This is generally why we don't see game developers take this on. It's something that would be provided by a 3rd party service, whose product is sold to the industry rather than being a one-off application.
Regarding the examples you give, I think you know as much about gfx programming and its problems as I know about AI.
Correct. I am a data scientist, and I've written very little graphics programming and published only a tiny indie title.
Actually I think neither of us has the knowledge to tell the other they'd better give up because the 'competing' field will solve the related problems better.
For me, I'm not trying to ask you to give up. But a big part of my job is to look at trends. The trend is that we continue to push more into AI, not less, making it likely that work in this area will keep expanding into more computationally intensive sectors.
The first application seems to be procedural content generation to save costs. I'd have high hopes here, but I expect it to be more specific than something like Stable Diffusion. I'm thinking about generating rocks, aging effects, image synthesis, etc.
A very interesting runtime application is to replace traditional animation and mocap with procedural animation. ML has shown very impressive results here. Really promising.
And finally, the killer application for ML in games is simply AI itself. The goal is to get dynamic NPCs, no longer relying on static data such as animation and recorded speech. Dynamic and adaptive stories. A game changer. Industry veterans often say this won't work due to lack of control, but I'd rather believe in new things than in game design concepts of the past.
Generally, from a game design perspective, unless you are building an organic world, you want control over the AI in order to curate a story. If the idea is to eventually make a Westworld, I suppose you may be right, but I don't know how much we'll be seeing in terms of AI for AI's sake. When I think about 8K resolution, I see the next story for Nvidia being 1080p or 1440p -> 8K, with frame generation in between. I see them looking at how AI can provide more than a single bounce, re-lighting a scene after it's been rendered, then upscaled, then interpolated.

That's where I think they're going, and I have a hard time believing that consoles won't follow suit. At the end of the day, there are power and cost limits, so somewhere along this timeline AI is going to provide a better return on cost.
 
None of these pixels in DLSS3 are purely "AI generated"; some network moves accumulated data around based on a set of parameters, and the initial rendering that feeds it the data is all "traditional".
I'm not saying these things will be.
But if you do the math, 7/8ths already is with DLSS3. There are obvious inputs required. And if we move from 1080p to 8K, that's already 15/16ths not created by the standard compute and graphics pipeline; double it for frame interpolation and we're at 31/32, or roughly 97%, now augmented through AI and bypassing a lot of traditional rendering. To me, add in AI to solve some more components, and you're getting ever closer to 99%, even though the developers are spending all their time building out this perfect 1080p image.

And that's where I'm going with this. They need to focus on building this perfect image and less on getting it to run at 60fps and 4K native. They can pour all of their work into 1080p30, and AI will push it to 4K60. To me, that's where I think developers need to be pushing - and with respect to this discussion: are we at the stage _TODAY_ where we need to solve the black box today? Why can't it come in the following generations of RT? Are we truly at the point yet where we cannot progress further without it? If the answer is no, then I feel like we're putting the cart before the horse.
 
But if you do the math, 7/8ths already is with DLSS3.
Would you feed your self-driving car's AI with the upscaled, hallucinated feed of a cheap, super low-res camera? (It's not about the AI cascade here, even if that's also a fun topic :))
Sometimes you can get away with plausibility (which is just noise we can't identify as such), but sometimes you need to, or must, pass information through the whole system unmodified. In general, looking at an up-hallucinated image you can't say anymore which part of it is original information. You get some bounds maybe, that what you see is within this and that bound of the original information. But you can't use this image for anything but looking at it anymore, as all the real information has been torched.
I'm really curious what happens when the AI corpora start to contain AI generated outputs ... like the internet ... . I once jokingly predicted the Ministry of Truth ... not sure if it needs to be dystopian though. (Oops, back to the cascade, this one even a reinforcing loop :devilish:)

Just saying that the models having their heyday currently are not the type of model you need to solve entropically correct re-representation.
 