GART: Games and Applications using RayTracing

When ALL vendors go RT, there's not much left to 'hate' on anymore. Even the PS5 has got ray tracing and just about any AAA game on it has it; even if limited, it's there and it will be admired and used. Same for other platforms.
 
It's not silly just because this time NV is finally more successful.

It actually is quite silly to compare PhysX and DXR. One is owned and controlled by Nvidia. The other isn’t.

DXR is Microsoft's API standard, and I still think it was NV who proposed it, based on OptiX. The industry has to use it on PC and Xbox, but that does not prove they are happy with it. Otherwise Xbox would not have more flexibility than we have on PC, which indicates the standard is not good enough.

It’s the first version of the api. Given how quickly DXR has made it into shipping games it would seem this first attempt was more than good enough to get the ball rolling. It’s kinda meaningless to point out that the first version of something isn’t perfect.
 
It actually is quite silly to compare PhysX and DXR. One is owned and controlled by Nvidia. The other isn’t.
The comparison was not about DXR, but about overpriced first-gen RTX GPUs, showcasing a feature which required GPUs at a price point previously reserved for the Titan to run smoothly.
Now I see the 2060 launched for 350, not much more expensive than the 3060. But the 2060 was 6.5 TF, while I got a 10.5 TF GPU a year later for 220.
Something looked wrong here: pay more for a feature that halves the framerate but does not look twice as good. It felt ridiculous to many - a marketing gag, introduced too early to make real sense yet.
Large scale fluid simulations are ridiculous too and still have not made it into games. To me the comparison makes sense, even if just one feature found adoption.
It’s the first version of the api. Given how quickly DXR has made it into shipping games it would seem this first attempt was more than good enough to get the ball rolling. It’s kinda meaningless to point out that the first version of something isn’t perfect.
No. DirectX is not the first version of an API. It has been learned over the years that flexibility is important. But this was totally ignored, and it is now much harder to add afterwards, if it's possible at all.
The price of 'getting going with just something' is high and has long-standing consequences, already hurting progress on RT. See UE5, which suffers because it is innovative.
 
What I mean is: they can't apply the same LOD mechanism Nanite uses to raytraced geometry, because the BVH is black-boxed.
Rebuilding the whole BVH constantly just because a patch of surface changes detail is not practical; keeping the BVH at constant high detail does not fit into memory; keeping it at constant low detail loses RT's accuracy advantage and again becomes inefficient at distance.
What we need is the ability to access and modify the BVH, even if different vendors use different data structures.
As a bonus, this opens up reusing precomputed custom hierarchies (e.g. Nanite's BVH) by converting them into the HW format at very low cost, removing the cost of building the BVH from the graphics driver completely.

Currently DXR has no support for any LOD mechanism. Standard discrete LODs do not count because they are no solution at all: you can't use them for larger models like terrain or architecture without noticeable popping and discontinuities.
Besides visibility, LOD is one of the two key problems in computer graphics. So far the games industry has not really tried to solve it; UE5 is the first serious attempt. The shortsighted DXR design hinders progress in this direction, and imo there is no excuse to justify such a failure.
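
To make "access and modify the BVH" concrete: below is a purely hypothetical sketch of what a vendor-agnostic entry point could look like. Nothing here exists in DXR or any driver today; EngineBvhNode, VendorBvhFormat and ConvertToHardwareBVH are made-up names, only meant to illustrate handing a precomputed hierarchy (e.g. Nanite's clusters) to the driver for cheap repacking instead of a full rebuild from raw triangles.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, vendor-agnostic BVH node as an engine might store it offline.
// None of this exists in DXR today; it only illustrates the idea.
struct EngineBvhNode {
    float    boundsMin[3];
    float    boundsMax[3];
    uint32_t firstChildOrPrim; // index of first child, or first primitive if leaf
    uint32_t childCount;       // 0 marks a leaf
};

// Hypothetical descriptor a driver could expose, describing its internal node
// layout (branching factor, node size, bounds quantization, ...).
struct VendorBvhFormat {
    uint32_t maxChildrenPerNode;
    uint32_t nodeSizeInBytes;
    bool     quantizedBounds;
};

// Hypothetical conversion: walk the precomputed hierarchy once and emit nodes in
// the vendor layout, instead of the driver rebuilding a BVH from raw triangles.
// Here the "conversion" is just a copy pass; a real driver would repack/quantize.
std::vector<uint8_t> ConvertToHardwareBVH(const std::vector<EngineBvhNode>& src,
                                          const VendorBvhFormat& fmt)
{
    std::vector<uint8_t> blob;
    blob.reserve(src.size() * fmt.nodeSizeInBytes);
    for (const EngineBvhNode& n : src) {
        const uint8_t* raw = reinterpret_cast<const uint8_t*>(&n);
        blob.insert(blob.end(), raw, raw + sizeof(EngineBvhNode));
        // Padding up to the vendor node size stands in for repacking/quantizing.
        if (fmt.nodeSizeInBytes > sizeof(EngineBvhNode))
            blob.insert(blob.end(), fmt.nodeSizeInBytes - sizeof(EngineBvhNode), uint8_t{0});
    }
    return blob;
}
```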
 
I do not view it at all as a failure or anything. It is just the beginning of an API - an API which can evolve as hardware and techniques do too. I find it rather unsurprising that a compute based triangle renderer has quirks that do not support a seamless integration with DXR hardware and software acceleration. Especially since discrete LOD and hardware triangle rasterization have been the de facto standard for nearly 30 years; nearly every game uses them for the primary view (easily 95+% of produced games). And the API was designed in 2017-2018 when UE5 did not even exist yet (which is only one engine and one solution to triangle density and LOD; there can of course be others). Can you imagine the hardware that must exist to allow for both serious hardware acceleration (4-10x) and programmability at the same time? Not sure that happens often, if at all, in the graphics accelerator space.

I feel like we need to wait a bit for DXR to mature, and for more exotic ideas like Nanite to actually be proven viable for shipping and development, before I get upset that I cannot get perfect acceleration in UE5 yet.
 

Devs complained about DXR's lack of flexibility before the release of the Turing cards, right after the presentation of the technology at GDC 2018 and SIGGRAPH 2018.

http://deadvoxels.blogspot.com/2018/08/some-thoughts-re-siggraph-2018.html

More complaints from inside the blog post:

- There's some minor allowance for LOD-ish things via ray-test flags, but what are the implications of even using this feature? How much more incoherent do I end up if my individual rays have to decide LOD? Better yet, my ray needs to scan different LODs based on distance from ray origin (or perhaps distance from camera), but those are LODs *in the BVH*, so how do I limit what the ray tests as the ray gets further away? Do I spawn multiple "sub-rays" (line segments along the ray) and give them different non-overlapping range cutoffs, each targeting different LOD masks? Is that reasonable to do, or devastatingly stupid? How does this affect my ray-intersection budget? How does this affect scheduling? Do I fire all LODs' rays for testing at the same time, or do I only fire them as each descending LOD's ray fails to intersect the scene?
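
To picture the "sub-rays with non-overlapping range cutoffs" question from the quote, here is one way the host-side bookkeeping could look. This is only an illustration of the idea being questioned, not a recommendation; the LOD switch distances and the convention that LOD i instances carry mask bit (1 << i) in the TLAS are assumptions. Each segment would then be traced with its own TMin/TMax and InstanceInclusionMask, which is exactly where the coherence and budget questions above come in.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One ray segment restricted to a single LOD band (illustration only).
struct RaySegment {
    float   tMin;
    float   tMax;
    uint8_t instanceMask; // matched against InstanceInclusionMask at trace time
};

// Split a ray into non-overlapping t-ranges, one per LOD band. Assumed inputs:
// lodSwitchDistances[i] is where LOD i hands over to LOD i+1 (distance along
// the ray), and LOD i's instances carry mask bit (1 << i) in the TLAS.
std::vector<RaySegment> BuildLodSegments(const std::vector<float>& lodSwitchDistances,
                                         float rayTMin, float rayTMax)
{
    std::vector<RaySegment> segments;
    float start = rayTMin;
    for (size_t lod = 0; lod <= lodSwitchDistances.size(); ++lod) {
        float end = (lod < lodSwitchDistances.size())
                        ? std::min(lodSwitchDistances[lod], rayTMax)
                        : rayTMax;
        if (end > start)
            segments.push_back({start, end, static_cast<uint8_t>(1u << lod)});
        start = end;
        if (start >= rayTMax)
            break;
    }
    return segments;
}
```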

https://aras-p.info/blog/2018/03/21/Random-Thoughts-on-Raytracing/

Black Box Raytracing
The API, as it is now, is a bit of a “black box” one.

  • What acceleration structure is used, what are the pros/cons of it, the costs to update it, memory consumption etc.? Who knows!
  • How is scheduling of work done; what is the balance between lane utilization vs latency vs register pressure vs memory accesses vs (tons of other things)? Who knows!
  • What sort of “patterns” the underlying implementation (GPU + driver + DXR runtime) is good or bad at? Raytracing, or path tracing, can get super bad for performance at divergent rays (while staying conceptually elegant); what and how is that mitigated by any sort of ray reordering, bundling, coalescing (insert N other buzzwords here)? Is that done on some parts of the hardware, or some parts of the driver, or DXR runtime? Who knows!
  • The “oh we have BVHs of triangles that we can traverse efficiently” part might not be enough. How do you do LOD? As Sebastien and Brian point out, there’s quite some open questions in that area.
There’s been a massive work with modern graphics APIs like Vulkan, D3D12 and partially Metal to move away from black boxes in graphics. DXR seems to be a step against that, with a bunch of “ohh, you never know! might be your GPU, might be your driver, might be your executable name lacking a quake3.exe” in it.

It probably would be better to expose/build whatever “magics” the upcoming GPUs might have to allow people to build efficient tracers themselves. Ability to spawn GPU work from other GPU work; whatever instructions/intrinsics GPUs might have for efficient tracing/traversal/intersection math; whatever fixed function hardware might exist for scheduling, re-scheduling and reordering of work packets for improved coherency & memory accesses, etc. etc.



I can find tons of other blogs and tweets from graphics developers complaining about the black-boxed BVH right after the presentation of DXR and raytracing. The flexibility exists on consoles but is not useful for multiplatform games because of DXR on PC.
 
The flexibility exists on consoles but is not useful for multiplatform games because of DXR on PC.
You are overstating the flexibility of console ray tracing with regard to the topic that JoeJ and I are talking about. It is still all about the hardware-rasterised triangle on console.
On console the traversal is programmable, since it is not HW accelerated on AMD, but you're still writing traversal code for the purpose of testing against hardware-rasterised triangles. A Nanite/compute rasteriser still does not fit in there easily in a similar way. That is where the massive speed-up is.
 
Since when can consoles test hits against arbitrary primitive types that are not triangles and also get the HW speed-up? That is what the acceleration is about. The HW acceleration is doing the exact same thing there; just the traversal is programmable, since it is not HW accelerated on AMD. But you're still writing traversal to test against hardware-rendered triangles.

The flexibility is about the BVH, not about the HW-accelerated ray intersection, which works like on PC. Like on PC, you can test against bounding boxes using HW acceleration and use a custom shader when the primitive you hit is not a triangle.

Devs don't complain about ray intersection acceleration, but about flexibility around the BVH.
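
For reference, this is the part DXR on PC does expose: a BLAS geometry can be declared as procedural AABBs, so the hardware traverses the boxes and an HLSL intersection shader decides what was actually hit inside each box. A minimal sketch of the C++ side (the buffer address and count are placeholders):

```cpp
#include <d3d12.h>

// Describe a BLAS geometry as procedural AABBs instead of triangles. Rays hit
// the boxes via the fixed-function hardware, and an intersection shader reports
// the actual primitive hit inside each box.
D3D12_RAYTRACING_GEOMETRY_DESC MakeProceduralGeometry(
    D3D12_GPU_VIRTUAL_ADDRESS aabbBuffer, // buffer of D3D12_RAYTRACING_AABB (placeholder)
    UINT aabbCount)
{
    D3D12_RAYTRACING_GEOMETRY_DESC geom = {};
    geom.Type  = D3D12_RAYTRACING_GEOMETRY_TYPE_PROCEDURAL_PRIMITIVE_AABBS;
    geom.Flags = D3D12_RAYTRACING_GEOMETRY_FLAG_OPAQUE;
    geom.AABBs.AABBCount           = aabbCount;
    geom.AABBs.AABBs.StartAddress  = aabbBuffer;
    geom.AABBs.AABBs.StrideInBytes = sizeof(D3D12_RAYTRACING_AABB);
    return geom;
}
```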
 
I can find tons of other blogs and tweets from graphics developers complaining

Yeah, from the usual suspects (people that love consoles). I can find a ton where people aren't that negative about PC RT, too.
If I think it's worth the time, I'll start spamming such tweets, links and screendumps as well.

The flexibility exists on consoles but is not useful for multiplatform games because of DXR on PC.

That statement is just wrong. If anything is limiting progress now, it's the consoles.
RTX is ’flexible’ too. Besides, there's RDNA2 on PC as well, in its true form with all its features/power (23+ TF).

UE5 is apparently optimized for PS5, yet it seems to run quite well on RTX hardware…
 
RTX is ’flexible’ too. Besides, there's RDNA2 on PC as well, in its true form with all its features/power (23+ TF).
That's why I'm looking forward to a title that relies heavily on RT on the one hand and is optimized for the additional optimization paths RDNA2 offers on the other, to see what this difference will mean for performance in practice.
 
an API which can evolve as hardware and techniques do too.
That's irrelevant. All current GPUs build and maintain the BVH entirely in compute and on the CPU, so there is no reason to black-box it at all. By opening it up, the developer takes the risk that future GPUs will need updates to support changes in their BVH format. It's then up to the developer to take that risk or to keep the current approach of leaving it all to the driver. Most would have decided for the latter of course, because they have no true LOD system yet. But if they want to compete with UE5 visuals, this will change now, so it would be nice if DXR were ready.
I find it rather unsurprising that a compute based triangle renderer has quirks that do not support a seamless integration with DXR hardware and software acceleration.
The compute rasterizer is also totally irrelevant. It receives attention because we are used to focusing on 'realtime rendering', but that's the wrong context.
Nanite's big invention is not about rasterization. Usually, if you want to support fine-grained LOD over a mesh, you get problems with discontinuities, requiring you to stitch different detail levels with some extra triangles to fix the seams. This is very complicated and would add runtime costs for updating the mesh.
Nanite completely avoids this problem by keeping cluster boundaries at higher resolution to prevent cracks, and then, in the next hierarchical step that reduces detail further, moving the previous boundary to the interior of the joined cluster so the 'too high' boundary detail from the previous step can be reduced there. That's really brilliant work.
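
Purely as a reading aid, the offline build described above roughly has this shape. The Cluster type and the three callbacks are hypothetical stand-ins (this is not Epic's code): group neighbouring clusters, simplify each group's interior while keeping its outer boundary fixed so adjacent groups still meet without cracks, then split the result into the next, coarser level's clusters and repeat.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Stand-in cluster type; not Epic's data structure.
struct Cluster {
    std::vector<uint32_t> triangles;
    std::vector<uint32_t> lockedBoundaryEdges;
};

// Conceptual shape of a Nanite-style offline cluster LOD build. The three
// callbacks are hypothetical stand-ins for the real grouping, locked-boundary
// simplification and re-splitting steps; only the loop structure is the point.
// Assumes simplification reduces the cluster count at every level.
std::vector<std::vector<Cluster>> BuildClusterHierarchy(
    std::vector<Cluster> finestLevel,
    const std::function<std::vector<std::vector<Cluster>>(const std::vector<Cluster>&)>& groupNeighbours,
    const std::function<Cluster(const std::vector<Cluster>&)>& simplifyInteriorKeepOuterBoundary,
    const std::function<std::vector<Cluster>(const Cluster&)>& splitIntoClusters)
{
    std::vector<std::vector<Cluster>> levels;
    levels.push_back(std::move(finestLevel));
    while (levels.back().size() > 1) {
        std::vector<Cluster> coarser;
        // Group neighbouring clusters. Each group's outer boundary stays locked,
        // so adjacent groups still meet without cracks at this level...
        for (const std::vector<Cluster>& group : groupNeighbours(levels.back())) {
            // ...while the interior - including boundaries that were locked one
            // level below - is free to be reduced now.
            Cluster merged = simplifyInteriorKeepOuterBoundary(group);
            std::vector<Cluster> split = splitIntoClusters(merged);
            coarser.insert(coarser.end(), split.begin(), split.end());
        }
        levels.push_back(std::move(coarser));
    }
    return levels; // finest to coarsest; at runtime pick clusters by screen error
}
```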
But all of that is just context. What matters for us is: all of this is offline and precomputed; the runtime cost is zero. Traversing and rasterizing for display after that has guaranteed minimized cost, and it's even adaptive, so we can adjust to performance targets and various HW at runtime.
This is exactly what we want for raytracing too (and for any software in general). Preventing this, and ignoring the need for it - especially in such a performance-hungry feature - was a big mistake. I'm not willing to compensate for this with more powerful hardware just because we use big, bloated legacy PCs.
Now you say the API will improve, just wait. But there is a risk in waiting too: with the BVH becoming an increasing problem and limitation, they just build a HW builder unit and claim innovation again. Great, problem solved.
But then the problem isn't solved at all, just its symptoms were delayed, and we have another HW unit we have to feed with work. So we won't invest in the proper solution in software anymore at all. Case closed, opportunity missed.
Either they fix this API limit quickly (they won't), or they'll never fix it (most likely).
And the API was designed in 2017-2018 when UE5 did not even exist yet
This API is just the trivial RT API we have known for decades from offline raytracing, OptiX being just one very similar example. It was no big challenge to design it.
LOD is nothing new either. Remember Shiny's Messiah, which did the same thing. It just was no game changer yet because it had to be done on the CPU and needed to upload updated vertex indices to the GPU each frame.
But it was obviously known all along that LOD has to be addressed. After DXR was presented as a surprise, Brian Karis' comment on Twitter was 'how to do LOD?' - he had already worked on that for years at this time, and other professionals had other serious complaints.
So I ask you: why the fuck did they seemingly pull off the raytracing revolution in secret, not asking devs for feedback? Why did they think the simple and easy-to-use practices from offline rendering would work for realtime games too?
Whatever the answer is, it is no excuse for the failure.
 
Can you imagine the hardware that must exist to allow for both serious hardware acceleration (4-10x) and programmability at the same time?
Any existing RTX or RDNA2 GPU!
Like I said, they build the BVH in compute - pure software, but black-boxed.
I do not request 'traversal shaders', which only AMD could support. All I want is access to the BVH data structure, which is pretty much the most important thing in raytracing.

I feel like we need to wait a bit for DXR to mature, and for more exotic ideas like Nanite to actually be proven viable for shipping and development, before I get upset that I cannot get perfect acceleration in UE5 yet.
But LOD is no 'exotic' idea, and it never was. It is just an open problem we need to address. I doubt we'll ever solve it in a way that serves all needs. Nanite is just a very good example, but it still has restrictions.

I'll give you a different example: animated foliage. There is a heavy cost in refitting the BVH for each individually animated plant.
Proposed solution: extend the bounding boxes so they still bound the triangles under animation. Then you never need to refit, you just need to transform the boxes like vertices. Problem solved. But because the BVH is black-boxed and the API has no support for transforming it, this is impossible.
Downside: the boxes are larger, so tracing becomes more expensive. So we want to mix and match refit and transform based on some heuristic, e.g. for distant plants the larger boxes are a win because fewer rays hit them.
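
One simple way to realize the 'extend the boxes' idea, sketched under assumptions: the maximum displacement (e.g. wind sway amplitude) is an authored or precomputed per-asset value, and the rest-pose bounds are padded by it once, so the box bounds every animated pose and only the instance transform ever moves it - no refit.

```cpp
#include <algorithm>
#include <cfloat>
#include <vector>

struct Float3 { float x, y, z; };
struct Aabb   { Float3 min, max; };

// Build a conservative box for an animated foliage patch: take the rest-pose
// bounds and pad them by the largest displacement the vertex animation (e.g.
// wind sway) can ever produce. The box then bounds every animated pose, so it
// never needs a refit; only the instance transform moves it.
// maxDisplacement is an assumed, authored or precomputed per-asset value.
Aabb ConservativeFoliageBounds(const std::vector<Float3>& restPoseVerts,
                               float maxDisplacement)
{
    Aabb box{{ FLT_MAX,  FLT_MAX,  FLT_MAX},
             {-FLT_MAX, -FLT_MAX, -FLT_MAX}};
    for (const Float3& v : restPoseVerts) {
        box.min = {std::min(box.min.x, v.x), std::min(box.min.y, v.y), std::min(box.min.z, v.z)};
        box.max = {std::max(box.max.x, v.x), std::max(box.max.y, v.y), std::max(box.max.z, v.z)};
    }
    // Pad uniformly by the maximum sway so the box stays valid under animation.
    box.min = {box.min.x - maxDisplacement, box.min.y - maxDisplacement, box.min.z - maxDisplacement};
    box.max = {box.max.x + maxDisplacement, box.max.y + maxDisplacement, box.max.z + maxDisplacement};
    return box;
}
```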

Nothing of this is exotic or new, believe me.
 
So I ask you: why the fuck did they seemingly pull off the raytracing revolution in secret, not asking devs for feedback? Why did they think the simple and easy-to-use practices from offline rendering would work for realtime games too?
Whatever the answer is, it is no excuse for the failure.
Do you remember the reveal of DXR? It was demo'd on production engines (Northlight, 4A, Frostbite, Unreal) - it is not like this just popped out of nowhere and devs were not consulted at all.

One thing I do know for an absolute fact is that some of the key people responsible for MS's API development quite literally only learned that Nanite and Lumen in UE5 existed at all on the day it was presented to the press on PS5. UE5 was actually kept a secret from nearly everyone (but Sony, essentially). UE4 projects at MS, for example, had no visibility at all into UE5 up until it was publicly announced.
 
Any existing RTX or RDNA2 GPU!
Like I said, they build the BVH in compute - pure software, but black-boxed.
I do not request 'traversal shaders', which only AMD could support. All I want is access to the BVH data structure, which is pretty much the most important thing in raytracing.
Genuine question: how is this handled on the GTX side of things, which also supports DXR via the driver?
 
Genuine question: how is this handled on the GTX side of things, which also supports DXR via the driver?

Probably the same: the BVH is black-boxed. The problem is not RTX or RDNA 2, it is the DXR API. If consoles had Nvidia GPUs, the BVH would be just as flexible as with AMD GPUs.


DXR is a brute-force solution.
 
Most of the current RT workloads are developed with a focus on what Nvidia hardware can and cannot do.
I don't believe that is true at all; even in RDNA2-optimised implementations like RE8, RDNA2 takes a bigger hit.

We've had many console-optimised RT implementations - Doom Eternal, Call of Duty Cold War, Watch Dogs Legion, The Medium - and RDNA2 still falls behind Turing.

Not to mention, once you start increasing the complexity of RT, or use multiple RT effects, the gap gets wider.

The 6700XT is a bit ahead at 1080p, a bit behind at 2160p; the 2070S is a 539€ card, the 6700XT 480€.
Wrong metric; you don't compare based on price, which is arbitrary and subject to the competitive landscape of its respective time. You compare based on technical specs. The 6700XT is a 3070-ish level GPU; the fact that it falls behind a 2070S is telling enough, same for the 6800. The 6800XT is barely faster than a 3060Ti, and the 6900XT is either equal to a 2080Ti or barely faster. That is just pathetic scaling.
 