NVidia Ada Speculation, Rumours and Discussion

Rootax · Aug 7, 2021

Dictator said:
Gonna need the receipts - as all internal documents I have seen or personal anecdotes from developers contradict this. The game implementations as well contradict in terms of performance and visual quality.

Did you read the 4a blog on their implementation? It was a matter of finding tons of quality compromises essentially.

They stated that they're doing "custom traversal and ray caching and use direct access to BLAS leaf triangles that would not be possible on the PC".

"In terms of ray tracing, how would you characterize the different functions of PlayStation 5 from Xbox Series X and both consoles from the newly released PC graphics cards of the RTX 3000 series? In your opinion, should we expect many next-generation games with ray tracing overall?

What I can say for sure now is that PlayStation 5 and Xbox Series X are currently running our code at roughly the same performance and resolution.
The NV 3000 series is not comparable, it is in different leagues in terms of RT performance. AMD’s hybrid ray tracing approach is inherently different in its ability, especially for diverging rays. On the plus side, it’s more flexible and there are tons of (probably undiscovered) approaches to customizing it for specific needs, which is always good for consoles and, ultimately, console gamers. At 4A Games, we already do custom traversal and ray caching and use direct access to BLAS leaf triangles that would not be possible on the PC."

JoeJ · Aug 7, 2021

trinibwoy said:
Surely they will share their RT exploits, one day…

I'm not sooo optimistic about that. But would be nice.

trinibwoy said:
However they also acknowledge that passing control from the HW intersection unit to a traversal shader would be a significant performance loss. So it’s far from obvious that flexibility would be a net win on today’s hardware.

Yeah, did just mention this above myself. Beside that it increases memory access divergence, which is the main performance problem we have.
To be clear once more: I don't request traversal shaders. HW acceleration means to pay some price in terms of flexibility. Fine with that. It would be nice to have, it's nice to discuss it's options, but i don't request or need it. If someone else does, i'll listen.
I only request to access BVH data.

trinibwoy said:
Is there some RT workload that would be faster on today’s consoles than brute force hardware traversal? Maybe. But if the end result is completely unusable performance anyway then it’s kinda pointless.

I don't think software traversal can beat HW traversal in any practical case.
But i surely can improve performance with custom BVH, built and tailored to my specific needs, which the driver knows nothing about. By doing so, i'll prevent runtime spikes from BVH builds when streaming in new stuff. My performance will be more stable than the current state of the art.

JoeJ · Aug 7, 2021

Dictator said:
Gonna need the receipts - as all internal documents I have seen or personal anecdotes from developers contradict this. The game implementations as well contradict in terms of performance and visual quality.

Did you read the 4a blog on their implementation? It was a matter of finding tons of quality compromises essentially.

You observe existing samples. I talk about potential future improvements. Completely different things.
Did read the blogs, yep.

CarstenS · Aug 7, 2021

Thanks again for taking the time to explain your position in such detail, JoeJ. I can see now, where you're coming from and why you might have a different perspective.

trinibwoy · Aug 7, 2021

Rootax said:
They stated that they're doing "custom traversal and ray caching and use direct access to BLAS leaf triangles that would not be possible on the PC".

"In terms of ray tracing, how would you characterize the different functions of PlayStation 5 from Xbox Series X and both consoles from the newly released PC graphics cards of the RTX 3000 series? In your opinion, should we expect many next-generation games with ray tracing overall?

What I can say for sure now is that PlayStation 5 and Xbox Series X are currently running our code at roughly the same performance and resolution.
The NV 3000 series is not comparable, it is in different leagues in terms of RT performance. AMD’s hybrid ray tracing approach is inherently different in its ability, especially for diverging rays. On the plus side, it’s more flexible and there are tons of (probably undiscovered) approaches to customizing it for specific needs, which is always good for consoles and, ultimately, console gamers. At 4A Games, we already do custom traversal and ray caching and use direct access to BLAS leaf triangles that would not be possible on the PC."

The point on divergence is interesting. Most RT implementations in games today are doing a single coherent bounce (shadow rays) or are coherent reflection rays. Temporal accumulation is all the rage for diffuse GI. I wonder what 4a was trying to do where divergence was a problem.

DavidGraham · Aug 7, 2021

Rootax said:
They stated that they're doing "custom traversal and ray caching and use direct access to BLAS leaf triangles that would not be possible on the PC".

But even with that, performance was barely equal to an RTX 2060 Super with heavy compromises, not to mention it was a real headache to implement all of that despite the fixed nature of consoles!

due to the specific way we utilize Ray Tracing on consoles, we are tied to exact formats for the acceleration structures in our BVH: formats that were changing with almost every SDK release. Constant reverse-engineering became an inherent part of the work-cycle.

The first frames running on actual devkits revealed that we had been somewhat overly optimistic in our initial estimations of the expected Ray Tracing performance .. We halved per-pixel ray-count ..

https://www.4a-games.com.mt/4a-dna/...upgrade-for-playstation-5-and-xbox-series-x-s

Rootax · Aug 7, 2021

DavidGraham said:
But even with that, performance was barely equal to an RTX 2060 Super with heavy compromises, not to mention it was a real headache to implement all of that despite the fixed nature of consoles!

https://www.4a-games.com.mt/4a-dna/...upgrade-for-playstation-5-and-xbox-series-x-s

Oh sure, it was just to show that even 4a said consoles api are more flexibles. I don't know why it's so hard to believe for some...

Performances wise, it's hard to évaluate, as we would need the same hardware with the consoles api / rt " flexibility" vs a pc dxr environnement...

trinibwoy · Aug 7, 2021

Rootax said:
Oh sure, it was just to show that even 4a said consoles api are more flexibles. I don't know why it's so hard to believe for some...

Performances wise, it's hard to évaluate, as we would need the same hardware with the consoles api / rt " flexibility" vs a pc dxr environnement...

Nobody doubts that console apis are more flexible. The debate is whether that flexibility produces a better/faster end result.

Kaotik · Aug 7, 2021

trinibwoy said:
Nobody doubts that console apis are more flexible. The debate is whether that flexibility produces a better/faster end result.

If they didn't, why would the dev bother utilizing them? You could run bog standard DX12 on Xbox if you wanted to, but since for example 4a did do the time to go closer to metal and utilize possibilities DX doesn't allow (over and over again, one might add) it clearly produces faster and/or better end results.

trinibwoy · Aug 7, 2021

Kaotik said:
If they didn't, why would the dev bother utilizing them? You could run bog standard DX12 on Xbox if you wanted to, but since for example 4a did do the time to go closer to metal and utilize possibilities DX doesn't allow (over and over again, one might add) it clearly produces faster and/or better end results.

Huh? Of course all else equal flexibility is better. But all else isn’t equal. If Xbox had full hardware traversal would devs still utilize the flexible option?

The discussion is slow + flexible vs faster and not flexible. Not slow + flexible vs slow.

pjbliverpool · Aug 7, 2021

It'd certainly be interesting to equalise settings in MEE and compare performance between the consoles and the 6600XT/6700XT.

Svensk Viking · Aug 7, 2021

trinibwoy said:
Huh? Of course all else equal flexibility is better. But all else isn’t equal. If Xbox had full hardware traversal would devs still utilize the flexible option?

The discussion is slow + flexible vs faster and not flexible. Not slow + flexible vs slow.

I do not get why there even is resistance to what JoeJ is saying. JoeJ says he only wants the option, he does not promote that everyone suddenly are going to be forced to use it.
If DXR were updated with that flexibility, and perhaps even supported on all current DXR GPUs, why would anyone complain?

To be honest though, I do think the majority of any profession would take the easy way out for as long as they can. Relying on magical products to solve or postpone issues instead of working harder/smarter is simply so much more convenient.

CarstenS · Aug 7, 2021

This reminds me a bit of the HTnL situation before we had programmable shaders. Chicken and Egg?

edit: I'm wondering if Vulkans Ray Queries could use hardware intersection checks but leave the traversal to compute shaders on Nvidia hardware.

trinibwoy · Aug 7, 2021

Svensk Viking said:
I do not get why there even is resistance to what JoeJ is saying. JoeJ says he only wants the option, he does not promote that everyone suddenly are going to be forced to use it.
If DXR were updated with that flexibility, and perhaps even supported on all current DXR GPUs, why would anyone complain?

Depends on which of the things he’s saying that you’re referring to. Literally nobody thinks that flexibility is bad. That is not the problem with what he’s been saying.

It’s unsubstantiated stuff like “at some point, the console becomes faster no matter what's the difference in HW power.” There is no indication whatsoever that flexibility of current consoles will ever compensate for their significant deficit of raw RT performance. It would be great if this was demonstrably true but he’s making such claims with no theory or evidence to back it up. It’s all hand waving.

Svensk Viking · Aug 7, 2021

trinibwoy said:
Depends on which of the things he’s saying that you’re referring to. Literally nobody thinks that flexibility is bad. That is not the problem with what he’s been saying.

It’s unsubstantiated stuff like “at some point, the console becomes faster no matter what's the difference in HW power.” There is no indication whatsoever that flexibility of current consoles will ever compensate for their significant deficit of raw RT performance. It would be great if this was demonstrably true but he’s making such claims with no theory or evidence to back it up. It’s all hand waving.

Each to his own but I find him explaining his side of the argument well.

Perhaps I do interpret this discussion too negatively, but I have gotten the impression that there is this camp claiming "no, DXR is perfectly fine as is, leave it be". Whereas opening it up would not hurt anyone and the guys having complained about it suddenly would have the restrictions lifted and could get working on providing the evidence to the end users.

OlegSH · Aug 7, 2021

JoeJ said:
If you have open world, you stream in new stuff during gameplay, and generating BVH cost is not zero.

The cost depends on how the BVH is generated. If it's generated well in advance (which should be the case for BVH streaming too) in async compute and amortized across many frames, then it's practically free for static objects and we were talking exactly about how utilize HW better with async just a few pages ago.
Also, traversing hierarchy of LODs in every frame for tons of patches in case of Nanite, creating requests and fetching them from SSD would not be free either.

JoeJ said:
If you want continuous LOD either for detail or efficiency reasons, amount of static geometry becomes zero, and reusing the same geometry for RT becomes impossible.

Nanite stores geometry in its own format and who knows whether it's compatible with RT at all (there are likely some attributes stored per LOD patch rather than per vertex), so I am not sure whether the continuous LOD can help with memory footprint efficiency at all.
If the compact Nanite geometry representation is compatible with RT then it probably worth adding support for it to the BVH builder and problem is solved - no need to store 2 sets of meshes and small memory footprints, continuous LODs can be converted to a few discrete LODs for RT on the fly.

JoeJ said:
Proxy workaround is the only option, but that's a hack not a solution.

I expect there will be a lot of kitbashing in games with Nanite, having a hard time imagining good RT perf with that even if BVH patching was available and free perf wise, so the proxy workaround seems to be the only viable solution at the moment, automatic geometry merging and clustering would also help a lot and hopefully they will be able to implement it.

JoeJ said:
Stuck in the constant LOD middle age with some discrete LODs for characters?

Characters will stay in this middle age for a while with Nanite too, but what's the issue with discrete LODs for static geometry?
With RT, you can change these LODs when triangles are subpixel so that nobody would even notice that these LODs exist.

trinibwoy · Aug 7, 2021

Svensk Viking said:
Each to his own but I find him explaining his side of the argument well.

Perhaps I do interpret this discussion too negatively, but I have gotten the impression that there is this camp claiming "no, DXR is perfectly fine as is, leave it be". Whereas opening it up would not hurt anyone and the guys having complained about it suddenly would have the restrictions lifted and could get working on providing the evidence to the end users.

Why would anyone think that DXR is perfect, it’s literally the first version

This debate started months ago with suggestions that DXR 1.0 is too rigid and will hurt progress in the long run. My view on this is that usable performance required rigid data structures and constrained apis to make hardware acceleration feasible today.

If DXR 1.0 allowed everyone to define their own BVH model there would be no real-time RT because hardware acceleration wouldn’t work. Hardware acceleration depends on well defined contracts and data models. If you need that level of flexibility just code your RT pipeline from scratch using compute and deal with the performance implications.

Programmable traversal is more interesting. RDNA proves that it’s possible to support custom traversal with usable performance. However the trade off is that peak achievable RT performance on RDNA is much lower than black box traversal found in Turing/Ampere. We can speculate why DXR didn’t include traversal shaders on day one. The obvious conclusion is that Nvidia’s hardware implementation doesn’t support it and therefore it got axed. Won’t be surprising at all if future DXR versions are supported on RDNA and not Turing/Ampere.

If there are camps in this debate it would seem one side is arguing that prioritizing performance was the right decision for version 1.0 while the other side is saying peak performance is less important and they should’ve prioritized flexibility instead. I think looking at actual implementations of RT in PC and console games today we already know which side is right.

Rootax · Aug 7, 2021

trinibwoy said:
Nobody doubts that console apis are more flexible. The debate is whether that flexibility produces a better/faster end result.

Some do here....

JoeJ · Aug 7, 2021

OlegSH said:
The cost depends on how the BVH is generated. If it's generated well in advance (which should be the case for BVH streaming too) in async compute and amortized across many frames, then it's practically free for static objects and we were talking exactly about how utilize HW better with async just a few pages ago.

And what if you already have enough async compute work before you add RT?
Work is never free, no matter if async or not.
But if you add new features, and can hide the extra work with async, you basically admit you did not utilize the HW before that point

DegustatoR · Aug 7, 2021

trinibwoy said:
Programmable traversal is more interesting. RDNA proves that it’s possible to support custom traversal with usable performance. However the trade off is that peak achievable RT performance on RDNA is much lower than black box traversal found in Turing/Ampere. We can speculate why DXR didn’t include traversal shaders on day one. The obvious conclusion is that Nvidia’s hardware implementation doesn’t support it and therefore it got axed. Won’t be surprising at all if future DXR versions are supported on RDNA and not Turing/Ampere.

Turing and Ampere can do RT in the exact same fashion as RDNA2. Would lead to some halving of performance but if somebody thinks this is good ...

NVidia Ada Speculation, Rumours and Discussion

Rootax

JoeJ

JoeJ

CarstenS

Moderator

trinibwoy

Meh

DavidGraham

Rootax

trinibwoy

Meh

Kaotik

Drunk Member

trinibwoy

Meh

pjbliverpool

B3D Scallywag

Svensk Viking

CarstenS

Moderator

trinibwoy

Meh

Svensk Viking

OlegSH

trinibwoy

Meh

Rootax

JoeJ

DegustatoR

Similar threads