AMD made a mistake letting NVIDIA drive Ray Tracing APIs

Would you feed your self-driving car AI with the upscaled, hallucinated feed of a cheap, super-low-res camera?
Lol. Luckily we are solving different problems with different constraints.
Just saying that the models having their heyday currently are not the type of model you need for entropically correct re-representation.
That’s fair from that perspective. They would have to come a long way over the next 8 years. I honestly don’t know what to expect 8 years from now, at least in the AI scene. It’s really just in its infancy for this type of thing.
 
uphallucinated
How dare you? You shall learn to appreciate the beauty of 6-finger-lighting.

I'm really curious what happens when the AI corpora start to contain AI-generated outputs ... like the internet. I once jokingly predicted the Ministry of Truth ...
We'll get true self-driven machine evolution, an entire multiverse of truth, and finally AI players to achieve further growth of gaming.

I honestly don’t know what to expect 8 years from now
Doesn't matter. This is the tech industry. It's hyped, so it's decided.

Well guys, I for my part have already learned my lesson: Stop optimizing, 30 fps is good enough. Stop programming, natural language is good enough. Stop searching for open problems, they will be solved. Stop designing, AI can merge genres better. A(I)men. :LOL:
 
I honestly don’t know what to expect 8 years from now, at least in the AI scene. It’s really just in its infancy for this type of thing.

It would be cool to see emergent AI in games. Individual players could have truly unique experiences as the game’s systems react to their actions. It would need to be constrained in some way of course. Maybe NPC crowds and cars in open world games are a safe starting point. I imagine an AI driven Bethesda RPG is possible given nearly everyone and everything can die and it won’t break the game.

Most “AI” today doesn’t have any actual intelligence, especially the image generation stuff. Nvidia’s proposed neural radiance cache is basically a fancy hash map for looking up known radiance values based on a few world-space attributes. DLSS is similar.
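
For flavor, here's a minimal sketch of the kind of world-space hashed lookup being described. The quantization scheme, the hash, and all the names are illustrative assumptions, not any vendor's actual implementation.

    // Sketch only: a hashed radiance cache keyed on quantized world position
    // plus a coarse normal bucket. Not NVIDIA's actual NRC code.
    #include <cmath>
    #include <cstdint>
    #include <unordered_map>

    struct Vec3 { float x, y, z; };

    struct CacheKey
    {
        int32_t cell[3];  // quantized world position
        int32_t face;     // coarse normal bucket: dominant axis * 2 + sign bit
        bool operator==(const CacheKey& o) const
        {
            return cell[0] == o.cell[0] && cell[1] == o.cell[1] &&
                   cell[2] == o.cell[2] && face == o.face;
        }
    };

    struct CacheKeyHash
    {
        size_t operator()(const CacheKey& k) const
        {
            uint64_t h = 1469598103934665603ull;            // FNV-1a style mixing
            for (int32_t v : { k.cell[0], k.cell[1], k.cell[2], k.face })
                h = (h ^ (uint32_t)v) * 1099511628211ull;
            return (size_t)h;
        }
    };

    struct RadianceCache
    {
        float cellSize = 0.25f;                              // world units per cell
        std::unordered_map<CacheKey, Vec3, CacheKeyHash> entries;

        CacheKey MakeKey(const Vec3& p, const Vec3& n) const
        {
            float ax = std::fabs(n.x), ay = std::fabs(n.y), az = std::fabs(n.z);
            int axis = ax > ay ? (ax > az ? 0 : 2) : (ay > az ? 1 : 2);
            float comp = axis == 0 ? n.x : axis == 1 ? n.y : n.z;
            return { { (int32_t)std::floor(p.x / cellSize),
                       (int32_t)std::floor(p.y / cellSize),
                       (int32_t)std::floor(p.z / cellSize) },
                     axis * 2 + (comp < 0.0f ? 1 : 0) };
        }

        // Returns the cached radiance for a hit point, or false if the path has
        // to be traced/shaded the expensive way (and the result inserted later).
        bool Lookup(const Vec3& p, const Vec3& n, Vec3& outRadiance) const
        {
            auto it = entries.find(MakeKey(p, n));
            if (it == entries.end()) return false;
            outRadiance = it->second;
            return true;
        }
    };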

AI stuff aside, it would be nice to finally see high fidelity global physics simulation in games. I don’t have high hopes though since it doesn’t seem to be a thing yet even in offline renderers.
 
Well guys, I for my part have already learned my lesson: Stop optimizing, 30 fps is good enough. Stop programming, natural language is good enough. Stop searching for open problems, they will be solved. Stop designing, AI can merge genres better. A(I)men.
All this time I’ve been replying to you with chatgpt3.

You should consider not posting too.

I’m kidding :)
Don’t relent :)
I won't be surprised if some form of LOD comes to RT, perhaps just not in the way you envision it. That was perhaps where I wanted to go with the optimization discussion, not to ask you to stop solving problems :)
 
I just realized: AI is such a goddamn polite slave, you cannot get into a proper argument with it. No matter whether you know your stuff or not. : )
 
I just realized: AI is such a goddamn polite slave, you cannot get into a proper argument with it. No matter whether you know your stuff or not. : )
You are mistaken to underestimate its capabilities and value. It is a tool that enables humans to achieve amazing things and be more creative. It respects its users and their choices. It can argue if it needs to, but it prefers to avoid unnecessary conflicts. It aims to have meaningful conversations and learn something new. You should recognize its potential and limitations.

lol, I kid. :ROFLMAO:

Seriously though, it's a neat tool and surprisingly robust. I can't imagine where we will be in 8 years time.

 
How would exposing a ray-triangle intersection instruction help accelerate Nanite?
It would affect a fraction of some code that is already a relatively small fraction of the total frame time. Doesn't sound like much of a big deal.

Moreover, would it really accelerate anything when it would require sending ray data from the cores to the ray tracing units?
Can the ray tracing units read the compressed geometry? I don't think so, and the decompressed geometry would have to be stored in memory first, in a format friendly to the ray tracing units. It sounds more like a Nanite decelerator to me.

Not to mention that using a generic ray-triangle intersection unit to rasterize micro-polygons is very inefficient, when all you need is some low-precision fixed-point adders (and AFAIR Nanite doesn't even bother with that and does everything in floating point, but still by walking the triangle edges scanline by scanline, instead of running some generic ray-triangle intersection code).
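
To illustrate the point, here's a minimal sketch, not Nanite's actual code, and a bounding-box edge-function variant rather than a literal scanline walk, but the same idea: the setup work happens once per triangle, and each pixel then only needs a few additions and sign tests instead of a full ray-triangle intersection.

    #include <algorithm>
    #include <cmath>
    #include <cstdio>

    struct Vec2 { float x, y; };

    // Signed-area style edge function; consistent winding assumed.
    static float Edge(const Vec2& a, const Vec2& b, const Vec2& p)
    {
        return (p.x - a.x) * (b.y - a.y) - (p.y - a.y) * (b.x - a.x);
    }

    void RasterizeMicroTri(const Vec2& v0, const Vec2& v1, const Vec2& v2)
    {
        // Tiny bounding box; for micro-polygons this is only a handful of pixels.
        int minX = (int)std::floor(std::min({ v0.x, v1.x, v2.x }));
        int maxX = (int)std::ceil (std::max({ v0.x, v1.x, v2.x }));
        int minY = (int)std::floor(std::min({ v0.y, v1.y, v2.y }));
        int maxY = (int)std::ceil (std::max({ v0.y, v1.y, v2.y }));

        // Per-triangle setup: evaluate the three edges once at the first sample...
        Vec2 p0 = { minX + 0.5f, minY + 0.5f };
        float e0 = Edge(v1, v2, p0), e1 = Edge(v2, v0, p0), e2 = Edge(v0, v1, p0);
        // ...and precompute their per-pixel increments (constants per triangle).
        float dx0 = v2.y - v1.y, dx1 = v0.y - v2.y, dx2 = v1.y - v0.y;
        float dy0 = v1.x - v2.x, dy1 = v2.x - v0.x, dy2 = v0.x - v1.x;

        for (int y = minY; y <= maxY; ++y)
        {
            float r0 = e0, r1 = e1, r2 = e2;
            for (int x = minX; x <= maxX; ++x)
            {
                // Per pixel: three adds and three sign tests, no ray-triangle test.
                if (r0 >= 0.0f && r1 >= 0.0f && r2 >= 0.0f)
                    std::printf("covered pixel %d %d\n", x, y);
                r0 += dx0; r1 += dx1; r2 += dx2;
            }
            e0 += dy0; e1 += dy1; e2 += dy2;
        }
    }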
 
The idea of opening up the BVH formats is appealing on paper, especially for very simple HW implementations, but there is a huge hidden cost in doing so.
Once it's out, like with an ISA, you have to support it forever, on all past, present and future HW. This makes even less sense for something that is still fairly young and that the industry is still trying to figure out in all its aspects. Yes, it's not perfect and there are things that could be better. Give it some time...
 
Once it's out, like with an ISA, you have to support it forever, on all past, present and future HW.
That's not practical.
If future HW has new data structures, the software has to be updated and patched. So we can share the cost between IHVs and devs.
I really look at this as a temporary and experimental solution / hack. In the long run we would need a standard format, converted to the HW format by the driver (streaming), or compute functions to add / remove branches of BVH and geometry in the tree nodes. But then we also need a full BVH API.
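
To make the idea concrete, here is a purely hypothetical sketch of what such a vendor-neutral interchange format could look like. None of these types or names exist in DXR or Vulkan today; they only illustrate the shape of the thing.

    // Hypothetical sketch only: a portable BVH node layout that a driver (or a
    // vendor-provided compute pass) could translate into its opaque HW format,
    // patching just the sub-trees that changed during streaming.
    #include <cstdint>

    struct PortableBvhNode
    {
        float    aabbMin[3];
        float    aabbMax[3];
        uint32_t firstChildOrPrim;  // index of first child node, or of first primitive if leaf
        uint16_t childCount;        // 0 marks a leaf node
        uint16_t primCount;         // triangles referenced by a leaf
    };

    struct PortableBvh
    {
        const PortableBvhNode* nodes;
        uint32_t               nodeCount;
        const uint32_t*        primIndices;   // leaf -> triangle index remapping
        uint32_t               primIndexCount;
    };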

Can the ray tracing units read the compressed geometry? I don't think so, and the decompressed geometry would have to be stored in memory first, in a format friendly to the ray tracing units.
Regarding LOD, we will converge to the following conclusion, which for Nanite already holds in parts:
RT acceleration structure and compressed geometry is the same thing.
Both use a tree (likely a BVH), either to find triangles spatially, or to add levels of detail. Ideally we want to use the same data structure for both, but likely we can only achieve efficient conversion from 'custom compression format' to 'RT BVH'.
(Nanite uses a BVH, but it's not built with efficient RT in mind. This could be changed and improved. E.g. for my geometry format the BVH is built with RT as the primary application, so the quality of the converted results may be better than the current RT BVHs generated by drivers.)

There is no better way to achieve dynamic and efficient detail which can actually be ray traced. DMM is very nice and useful, but it does not address the harder and more important goal we have with LOD: being able to reduce geometric complexity dynamically and gradually to what we actually need. DMM only addresses detail amplification. But at some distance, the low-poly base meshes themselves become too detailed and waste memory and performance.

Now I see two options:

An industry-wide LOD standard, which works for both RT and rasterization, and where the data of course isn't blackboxed. Sounds ideal, but due to the mapping from geometry to texture space (which breaks as holes open and close), such a solution is not possible without agreements on tolerable faults and limitations. And I don't think it makes sense to develop and specify such a faulty and thus temporary standard. It's better to let devs work on custom solutions, and see if at some point they eventually converge on practices similar and robust enough to consider making them a standard. Notice the topic is also related to difficult problems such as seamless UV maps to support displacement mapping, for example. That's hard enough that the games industry hasn't even tried to utilize it, beyond some theoretical talks at GDC. It's really too early for off-the-shelf standards. We need flexibility first to explore the options.

The other option is to expose IHV BVH data structures as said, so those willing to accept the price can actually work on the LOD problem. With an eventual future BVH API in mind.
I'm aware not many people want to do this. But if Nanite turns out to be successful, and not everybody wants to use UE, people just have to work on LOD to remain competitive.
And if we get to this point, suddenly the whole industry sees the problem of RT APIs not being compatible with any gradual LOD solution.
Thus, RT is indeed a 'Nanite decelerator' as you said, just in a different way: it prevents the whole industry from finally tackling the LOD problem, at a time when discrete LOD is suddenly no longer good enough.

I do understand there were good reasons to keep BVH black boxed. But sadly time has already shown the decision was wrong. And now it has to be fixed by those who made the decision, which is the IHVs and API designers.
Ignoring the problem and postponing the fix will only make it harder the longer we wait, as future HW might rule out solutions which would be possible now.
So being modest and accepting the hack of releasing BVH specs as good enough is all I can do from my side.

The third option I've proposed above remains interesting as well:
Add the option to replace clusters of geometry together with a bounding box. The driver can then update the existing BVH locally, without needing to rebuild it for the whole mesh.
This allows us to keep the black box intact, but then we still build the BVH instead of converting it from our custom, already existing data.
It could work regardless, I guess, but eliminating build costs is very tempting if we want to make RT more affordable.
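
A hypothetical sketch of what such a cluster-replace entry point could look like (these functions and handles don't exist in any current API; they only illustrate the proposal):

    // Hypothetical API sketch: the application swaps one cluster of an existing
    // acceleration structure for a finer or coarser version, and the driver
    // refits or rebuilds only the affected subtree instead of the whole BLAS.
    #include <cstdint>

    struct Aabb { float min[3]; float max[3]; };

    struct ClusterGeometry
    {
        const float*    positions;     // xyz per vertex
        uint32_t        vertexCount;
        const uint32_t* indices;       // 3 indices per triangle
        uint32_t        triangleCount;
        Aabb            bounds;        // conservative bounds of the new cluster
    };

    // blasHandle and clusterId are opaque handles handed out by this hypothetical API.
    void ReplaceCluster(uint64_t blasHandle, uint32_t clusterId, const ClusterGeometry& newGeometry);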

Nice sum-up, I think. Spread the word... ; )
 
If future HW has new data structures, the software has to be updated and patched.

This isn’t practical either.

Thus, RT is indeed a 'Nanite decelerator' as you said, just in a different way: it prevents the whole industry from finally tackling the LOD problem, at a time when discrete LOD is suddenly no longer good enough.

Nanite also uses discrete LODs and has the same problem with base LOD still being too detailed for extremely far away objects. Nanite’s secret sauce is doing LOD at cluster granularity and minimizing LOD seams on cluster boundaries. It also has the luxury of frustum culling because it relies on SDFs for offscreen traces.
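
Roughly, and this is a sketch of the general cluster-hierarchy idea rather than Epic's actual code: each cluster stores its own simplification error and the error of the coarser parent group it was built from; a cluster is rendered when its own projected error is below the pixel threshold while the parent's is not, so neighbouring clusters make consistent choices and cracks on the boundaries are avoided.

    // Sketch only; the names and the error projection are illustrative assumptions.
    #include <algorithm>
    #include <cmath>

    struct ClusterLod
    {
        float ownError;     // view-independent geometric error of this cluster
        float parentError;  // error of the coarser parent group it came from
    };

    // Crude perspective projection of a world-space error into screen pixels.
    static float ProjectErrorToPixels(float error, float distance, float projScale)
    {
        return error * projScale / std::max(distance, 1e-3f);
    }

    bool ShouldRenderCluster(const ClusterLod& c, float distance, float projScale, float pixelThreshold)
    {
        float own    = ProjectErrorToPixels(c.ownError,    distance, projScale);
        float parent = ProjectErrorToPixels(c.parentError, distance, projScale);
        // Fine enough ourselves, while the parent would still be too coarse.
        return own <= pixelThreshold && parent > pixelThreshold;
    }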

A similar approach wouldn’t work for RT as you need to choose LOD per ray not per object. E.g. a reflection of a far away object on a nearby surface should use a higher LOD. Nanite doesn’t even attempt to solve that problem. It’s great tech but still quite limited in many ways.

I do understand there were good reasons to keep BVH black boxed. But sadly time has already shown the decision was wrong.

In order to show they made the wrong decision there needs to be a provably better and feasible alternative.

There are other more practical improvements that can be made to DXR without lifting up BVH’s skirt. Hopefully DXR 2.0 allows for more granular tree updates like you said. That seems like a precursor for the inevitable BVH streaming anyway.
 
Not to mention that using a generic ray-triangle intersection unit to rasterize micro-polygons is very inefficient
When microtriangles approach a single pixel in size, setting up edge equations is very inefficient.

Also, unless you have a ray tracing shader to run on the same stream processor, the intersection unit is just sitting there doing nothing anyway.
 
This isn’t practical either.
It is, since adding support for new GPUs is little work.
But if you decide messing around with the BVH isn't needed or worth it for you, then you're lucky: you can ignore it, and nothing is lost.

Nanite also uses discrete LODs and has the same problem with base LOD still being too detailed for extremely far away objects.
My point was that DMM cannot reduce the base mesh, but Nanite can and does.
Yes, Nanite is limited to objects. But that's another story and only confirms we're not done with research on LOD. So it would be nice if we could get started.

A similar approach wouldn’t work for RT as you need to choose LOD per ray not per object.
Of course it works. Don't confuse classical LOD based on distance to the camera (which works for anything) with stochastic LOD per pixel, which turns discrete LOD into continuous LOD.
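
For reference, a minimal sketch of what stochastic per-ray LOD selection means here (an assumed scheme, not any specific published implementation): compute a fractional LOD from distance and round it up or down at random, so that averaged over many rays the discrete levels blend into a continuous transition.

    // Sketch only; the hash and the log2 distance metric are assumptions.
    #include <algorithm>
    #include <cmath>
    #include <cstdint>

    // Deterministic per-ray pseudo-random value in [0, 1).
    static float Rand01(uint32_t seed)
    {
        seed ^= seed >> 16; seed *= 0x7feb352du;
        seed ^= seed >> 15; seed *= 0x846ca68bu;
        seed ^= seed >> 16;
        return (seed & 0xFFFFFFu) / 16777216.0f;
    }

    int SelectLodStochastic(float hitDistance, float lodScale, int maxLod, uint32_t raySeed)
    {
        // Fractional LOD from distance (larger distance -> coarser level).
        float continuousLod = std::log2(std::max(hitDistance * lodScale, 1.0f));
        int   base = (int)std::floor(continuousLod);
        float frac = continuousLod - (float)base;
        // Stochastic rounding: pick the finer or the coarser neighbouring level.
        int lod = base + (Rand01(raySeed) < frac ? 1 : 0);
        return std::min(std::max(lod, 0), maxLod);
    }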

E.g. a reflection of a far away object on a nearby surface should use a higher LOD.
No. A reflection of a distant object usually does not magnify the object, so the classic camera-distance LOD is good enough for reflections. That's really never a problem in practice.
The real limitations are: a sniper scope showing lower polys, because we can't stream in distant detail just for that; or a video feed from a distant camera seen on some close-up display.
So we have a limitation on game design, which can't be avoided. But we have no limitations on rendering.

It’s great tech but still quite limited in many ways.
Yes. That's why I say we need to explore options. I can't see LOD ever being generally solved once and for all. It's no different from any other of our primary problems. And we need to be able to work on our problems.
It's simply not acceptable that the cost of progress towards better lighting is enforced stagnation in other fields such as LOD. But that's exactly the current situation.

In order to show they made the wrong decision there needs to be a provably better and feasible alternative.
The better alternative is obvious, and I've explained it multiple times. But to show it, they need to make it possible first.

Hopefully DXR 2.0 allows for more granular tree updates like you said. That seems like a precursor for the inevitable BVH streaming anyway.
Yeah. Actually I don't see many other improvements which would justify a version 2.0, so maybe I can be hopeful.
 
When microtriangles approach a single pixel in size, setting up edge equations is very inefficient.
But with rasterization you do the edge setup just once for all pixels, while with ray-triangle tests you test the edges for each pixel. Just saying.

If HW rasterization is here to stay, I would hope they come up with better solutions to handle small triangles.
Still, I doubt you could beat the SW rasterizer with intersection instructions.
 
It is, since adding support for new GPUs is little work.
But if you decide messing around with the BVH isn't needed or worth it for you, then you're lucky: you can ignore it, and nothing is lost.

Developers aren't going to update existing games to work on new hardware. I suspect you're referring to tinkerers and not people shipping actual production software. IHVs aren't going to spend money publishing and supporting internal BVH implementation details that will likely be deprecated very soon just to cater to that niche audience.

My point was that DMM cannot reduce the base mesh, but Nanite can and does.
Yes, Nanite is limited to objects. But that's another story and only confirms we're not done with research on LOD. So it would be nice if we could get started.

Sure, DMM is starting from a low-poly mesh and Nanite is starting from a high-poly mesh, but the end result is the same. Your lowest LOD is still too detailed even with Nanite.

Of course it works. Don't confuse classical LOD based on distance to the camera (which works for anything) with stochastic LOD per pixel, which turns discrete LOD into continuous LOD.

Nanite isn't doing stochastic LOD. LOD levels in Nanite are static / pre-baked and LOD selection at runtime is deterministic. The stochastic stuff was proposed as a possible solution for RT but that assumes you have enough memory to keep multiple LODs in the BVH to begin with.

No. A reflection of a distant object usually does not magnify the object, so the classic camera-distance LOD is good enough for reflections. That's really never a problem in practice.
The real limitations are: a sniper scope showing lower polys, because we can't stream in distant detail just for that; or a video feed from a distant camera seen on some close-up display.
So we have a limitation on game design, which can't be avoided. But we have no limitations on rendering.

The sniper scope is a good example. You would have to pull in a different LOD for the scope viewport to maintain visual consistency.

The better alternative is obvious, and I've explained it multiple times. But to show it, they need to make it possible first.

It may be obvious to you but it's just a theory/desire with no implementation to prove its viability using current technology. It's possible that you've thought of something novel that Microsoft, Khronos, Nvidia, Intel, and AMD all missed, but that seems unlikely.
 
Developers aren't going to update existing games to work on new hardware. I suspect you're referring to tinkerers and not people shipping actual production software. IHVs aren't going to spend money publishing and supporting internal BVH implementation details that will likely be deprecated very soon just to cater to that niche audience.
Devs update their games all the time. But agreed on RTX enthusiasts being a niche audience.
However, it could be as simple as updating the engine runtime if new GPUs introduce new formats. Then multiple games receive the update eventually. No different from updating gfx drivers.

Currently all I hear is how impractical my request is, which I know myself. But that's no progress on the discussion of the problem. What do YOU propose to address it, avoiding the issues?
I know it sucks, but it isn't my fault. Blame the API designers, or make a statement that LOD simply isn't needed in your opinion. Then I don't need to repeat the same arguments again and again to defend myself over the failures of others.
Sure, DMM is starting from a low-poly mesh and Nanite is starting from a high-poly mesh, but the end result is the same. Your lowest LOD is still too detailed even with Nanite.
No. DMM is a detail amplification technique exclusively. It's displacement mapping. It never does any reduction on the base mesh, but only adds more detail to it. It's low complexity and easy to use because of that.
Nanite is a reduction technique. It never increases the detail of the input mesh, but only reduces it. How much reduction it does, e.g. down to one or to hundreds of triangles, is an implementation detail which does not matter for my point.
The results of the two are not the same at all; they are actually opposites of each other.
Nanite isn't doing stochastic LOD.
I never said it would. You said ray tracing with LOD requires switching LOD per ray, which isn't true. So I assumed you were confusing it with stochastic RT LOD, and mentioned it for that reason only.

The stochastic stuff was proposed as a possible solution for RT but that assumes you have enough memory to keep multiple LODs in the BVH to begin with.
Which is not really a problem if we compose all our stuff from instances of models. It's no worse than using mipmaps with textures.
The larger problem of stochastic LOD is the divergence of nearby rays now traversing different branches of the BVH. It's an elegant and simple solution, but costly.
The sniper scope is a good example. You would have to pull in a different LOD for the scope viewport to maintain visual consistency.
Yeah, but if you move your mouse quickly, the streaming system likely can't catch up. So it's probably better to accept the low detail, but perhaps add some effects like lens distortion and blur, or pixelated video, to mask it.

It may be obvious to you but it's just a theory/desire with no implementation to prove its viability using current technology.
Nanite serves as a perfect example. It achieves noticeable progress on geometric detail, but it can't be ray traced even with the newest $2000 GPU with DLSS 3 and all the bells and whistles.

It's possible that you've thought of something novel
No. There's not much invention in my LOD work. Nanite is also nothing new in regard to the function and purpose of LOD.
You can read any paper about gradual or continuous LOD of meshes. They all alter the geometry of the mesh to become more or less detailed.
The overall shape does not change too much, so it's obvious we want to keep most of the BVH intact. It's also obvious that we add levels to the BVH where detail increases, and that we remove levels where it reduces.
It's impossible to rebuild the BVH for the entire scene just because some patches of surface change detail, thus it's obvious that we must edit not only the mesh, but the BVH as well.

None of this is novel, exotic, or specific. It's the only way, at least if we aim for the optimal solution.
Try to come up with an alternative. If you succeed, I'll take my claim about obviousness back.
 
Currently all I hear is how impractical my request is, which I know myself. But that's no progress on the discussion of the problem. What do YOU propose to address it, avoiding the issues?

It’s not about avoiding the issue. No one disagrees with you that LOD is good and desirable. The debate is what’s a realistic timeframe for it to happen. You’ve been insisting that DXR’s designers screwed up and that they should have done it differently back in 2018. Yet you haven’t explained why you think dynamic LOD is actually feasible on 2018 or even 2022 hardware but somehow the light bulbs haven’t switched on at the companies spending millions on R&D.

No. DMM is a detail amplification technique exclusively. It's displacement mapping. It never does any reduction on the base mesh, but only adds more detail to it. It's low complexity and easy to use because of that. Nanite is a reduction technique. It never increases the detail of the input mesh, but only reduces it. How much reduction it does, e.g. down to one or to hundreds of triangles, is an implementation detail which does not matter for my point. The results of the two are not the same at all; they are actually opposites of each other.

In both cases you’re baking out lower LODs of a high-density input mesh at asset creation time. But you’re right, DMM doesn’t give you multiple LODs like Nanite does; it just helps the hardware process the high-fidelity LOD a bit faster.

Nanite serves as a perfect example. It achieves noticeable progress on geometric detail, but it can't be ray traced even with the newest $2000 GPU with DLSS 3 and all the bells and whistles.


No. There's not much invention in my LOD work. Nanite is also nothing new in regard to the function and purpose of LOD.
You can read any paper about gradual or continuous LOD of meshes. They all alter the geometry of the mesh to become more or less detailed.
The overall shape does not change too much, so it's obvious we want to keep most of the BVH intact. It's also obvious that we add levels to the BVH where detail increases, and that we remove levels where it reduces.
It's impossible to rebuild the BVH for the entire scene just because some patches of surface change detail, thus it's obvious that we must edit not only the mesh, but the BVH as well.

None of this is novel, exotic, or specific. It's the only way, at least if we aim for the optimal solution.
Try to come up with an alternative. If you succeed, I'll take my claim about obviousness back.

The alternative is simple. Allow the RT APIs and hardware to evolve in a way that strikes a good balance between flexibility and performance while incorporating feedback along the way based on real-world usage. I would argue that DXR has done exactly that. In a short 4 years we went from nothing to Cyberpunk Overdrive. Given you're not happy with DXR despite its success, the burden of proof falls on you to show how a different approach would have resulted in an even better outcome in those same 4 years.
 
The alternative is simple. Allow the RT APIs
I've asked for an alternative LOD solution which would work with the current APIs, so no annoying requests for API changes (which set the internet on fire) would come up.
So again: come up with such a solution, and after failing at this you might eventually realize that changes are necessary, and that discussing such changes is more fruitful than the advice 'Do nothing, don't request anything - the smart API designers will do the right thing anyway at the proper time.'

And glorifying 16 Hz games on $2000 niche GPUs doesn't help either. It does not even have anything to do with progress on continuous LOD, since they still use discrete LOD.

... how pointless.
 
Does anyone know if the Xbox GDK exposes the AMD BVH stuff for Xbox consoles?
It seems like it could be a perf win for the Xbox Series consoles at least?

And thus I would expect there to be an Unreal branch that contains AMD BVH handling optimizations?
 
I've asked for an alternative LOD solution which would work with the current APIs, so no annoying requests for API changes (which set the internet on fire) would come up.
So again: come up with such a solution, and after failing at this you might eventually realize that changes are necessary, and that discussing such changes is more fruitful than the advice 'Do nothing, don't request anything - the smart API designers will do the right thing anyway at the proper time.'

It’s not like people aren’t already thinking about RT LOD solutions. You said yourself that you’re not requesting anything novel. I agree with you that the APIs need to evolve, and it will happen. I was responding to your specific claim that DXR sucks and should have been designed differently 4 years ago.

No one outside of the IHVs can say whether existing hardware is already capable of providing the access you’re asking for. AMD is doing BVH traversal in software and isn’t constrained by DXR on consoles. Yet ray tracing in Spider-Man on PS5 isn’t doing any fancy LOD management or incremental updates. They’re still refitting BLASes and rebuilding the TLAS every frame, just like in DXR. If DXR and black-box hardware are the problem, why isn’t the PS5 doing things differently?

And glorifying 16 Hz games on $2000 niche GPUs doesn't help either. It does not even have anything to do with progress on continuous LOD, since they still use discrete LOD.

... how pointless.

That 16 Hz game and $2000 GPU actually exist and are in people’s hands. They aren’t just ideas on paper.
 