> Yes. Inline disables any reordering or material sorting in HW, so we should use it only with care.
Inline RT doesn't disable anything when it's done via the API (the SER way) or manually.
> UberShader seems weakly defined here maybe
The UberShader name speaks for itself - it's just a big shader. Inlining produces big shaders.
> Looking at the Quake RTX code back then, I definitely had this impression of classical RT, and simplicity > performance. The whole approach just ignores how GPUs work. They are not single threaded, and treating them as such performs poorly.
Quake RTX has SW material sorting, UE 4 and 5 have it, and all other major engines with RT support have it too, so there is no single-threaded anything. It's up to the programmer whether to sort or not (the API doesn't impose restrictions on that, and many devs do sorting in SW). Whether it pays off obviously depends on the programmer's skills as well as runtime stats: the sort itself must be fast, and the perf hit without sorting must be bad enough to compensate for the sorting overhead.
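To make the software sorting idea above concrete, here is a minimal CPU-side sketch in C++. It is not taken from Quake RTX or UE; the `Hit` struct and both function names are hypothetical, and a group of consecutive hits is treated as a stand-in for a warp. It sorts hits by material ID and counts how many groups would still have to shade more than one material.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical hit record: which ray hit which material.
struct Hit {
    std::uint32_t rayIndex;
    std::uint32_t materialId;
};

// Counts groups of `warpSize` consecutive hits that contain more than one
// material, i.e. groups whose threads would diverge when shading.
std::size_t countDivergentWarps(const std::vector<Hit>& hits, std::size_t warpSize) {
    assert(warpSize > 0);
    std::size_t divergent = 0;
    for (std::size_t begin = 0; begin < hits.size(); begin += warpSize) {
        const std::size_t end = std::min(begin + warpSize, hits.size());
        for (std::size_t i = begin + 1; i < end; ++i) {
            if (hits[i].materialId != hits[begin].materialId) {
                ++divergent;
                break;  // one mismatch is enough to mark this group
            }
        }
    }
    return divergent;
}

// Software material sort. Stable, so rays keep their relative order
// within each material.
void sortHitsByMaterial(std::vector<Hit>& hits) {
    std::stable_sort(hits.begin(), hits.end(),
                     [](const Hit& a, const Hit& b) { return a.materialId < b.materialId; });
}
```

Comparing `countDivergentWarps` before and after `sortHitsByMaterial` gives a rough feel for the trade-off mentioned above: the sort only pays off if the divergence it removes costs more than the sort itself.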
> Inline RT doesn't disable anything when it's done via API (SER way) or manually.
As I understand it, inline RT means tracing rays from any shader, likely compute. It returns the result immediately to the same thread, so there is no longer any way HW could implement reordering or material binning.
> What's the benefit of SER in the scenario of inline tracing? I assume it has no effect at all. Am I wrong with that?
SER is just an API (that exploits Ada HW capabilities) that you can use in any inline RT shader type to sort something by "key": you can sort hits by material IDs, or you can sort basically anything else with it. The API does allow that, you can read here how it's done.
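As a rough illustration of "sort by key", the following C++ sketch reorders thread indices by the low bits of an arbitrary per-thread key using a stable counting sort. This is only a CPU-side analogy under my own assumptions; `reorderByKey` and its parameters are hypothetical names, and nothing here reflects how the hardware or the actual SER API implements reordering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns thread indices ordered by the low `keyBits` bits of each key
// (stable counting sort). Hypothetical CPU-side analogy, not a GPU API.
std::vector<std::size_t> reorderByKey(const std::vector<std::uint32_t>& keys, unsigned keyBits) {
    assert(keyBits > 0 && keyBits <= 16);  // keep the bucket table small
    const std::size_t bucketCount = std::size_t{1} << keyBits;
    const std::uint32_t mask = static_cast<std::uint32_t>(bucketCount - 1);

    // Histogram pass: offsets[b + 1] counts the keys that fall into bucket b.
    std::vector<std::size_t> offsets(bucketCount + 1, 0);
    for (std::uint32_t key : keys) ++offsets[(key & mask) + 1];

    // Prefix sum turns the counts into the start offset of each bucket.
    for (std::size_t b = 0; b < bucketCount; ++b) offsets[b + 1] += offsets[b];

    // Scatter pass: place each thread index into its bucket, in input order.
    std::vector<std::size_t> order(keys.size());
    for (std::size_t i = 0; i < keys.size(); ++i) order[offsets[keys[i] & mask]++] = i;
    return order;
}
```

The key can be a material ID, but also anything else that predicts which code path a thread will take, which is the point made above.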
> The UberShader name speaks for itself - it's just a big shader. Inlining produces big shaders.
'Big shader' does not tell much, so that's no definition of what it means.
> Quake RTX has SW materials sorting
OK, I missed this from looking at the code, if so. Maybe they added it later, or I was wondering about some other things.
> SER just makes it way easier for devs to implement such features and get good performance out of them.
How? Idk what SER exactly is. NVAPI seems not public.
> SER is just an API (that exploits Ada HW capabilities) that you can use in any inline RT shader type to sort something by "key": you can sort hits by material IDs, or you can sort basically anything else with it. The API does allow that, you can read here how it's done.
Very interesting - missed this before, thanks!
> The SER API is public, you can download the SER SDK and use it right away:
Is there something out on DMM as well?
> That is one thing that is enjoyable in Portal RTX from my perspective: things in reflections have visually similar quality to things in the primary view. GI, material responses, reflections in their own right... I cannot wait to see more games getting to that level.
> It is nice to see after playing Fortnite, where reflections are... not very good looking in the base setup they have. I really wish Epic allowed hit lighting in Fortnite as an option.
FYI, RTX Remix uses Primary Surface Replacement for mirror reflection and refraction. Diffuse reflection is done through secondary-bounce sampling of the GI pass.
Very interesting - missed this before, thanks!
I see my assumptions are pretty right but it can do more than just that. Good stuff.
Question about granularity is still open. Is the reordering happening across the whole chip or local to a SM?
> The ReorderThread function is only available in the ray generation shader. I have no idea if it could be callable from arbitrary material shaders.
The whitepaper suggests setting up a ray tracing pipeline for games with Ray Queries as an easy path to integrate SER (not sure what the hard path would be). As far as I understand, this can be just a raygen ubershader without indexing into the shader table for closest hit shaders.
> The white paper is a little vague on that point. There are a few references to sorting and moving thread context "across the GPU" but that can mean anything. If I had to make a wild guess, the sorting may be localized to a GPC, as a single SM doesn't have enough active threads to make sorting useful.
Makes sense. SM is not enough, whole chip would probably already diminish the wins, even more so if there is a chiplet future for NV too.
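The granularity argument can be played with in a toy model. The C++ sketch below sorts material IDs only inside fixed-size windows, standing in for a reordering scope limited to some local pool of threads, and then counts how many warp-sized groups still mix materials. The window and warp sizes and both function names are hypothetical, and this says nothing about what the actual hardware does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sorts material IDs independently inside each window of `windowSize` entries,
// a toy stand-in for reordering that is limited to a local pool of threads.
void sortWithinWindows(std::vector<std::uint32_t>& materialIds, std::size_t windowSize) {
    assert(windowSize > 0);
    for (std::size_t begin = 0; begin < materialIds.size(); begin += windowSize) {
        const std::size_t end = std::min(begin + windowSize, materialIds.size());
        std::sort(materialIds.data() + begin, materialIds.data() + end);
    }
}

// Counts groups of `warpSize` consecutive entries that mix more than one material.
std::size_t countMixedWarps(const std::vector<std::uint32_t>& materialIds, std::size_t warpSize) {
    assert(warpSize > 0);
    std::size_t mixed = 0;
    for (std::size_t begin = 0; begin < materialIds.size(); begin += warpSize) {
        const std::size_t end = std::min(begin + warpSize, materialIds.size());
        for (std::size_t i = begin + 1; i < end; ++i) {
            if (materialIds[i] != materialIds[begin]) {
                ++mixed;
                break;  // one mismatch is enough to mark this group
            }
        }
    }
    return mixed;
}
```

With a window no larger than one warp, sorting cannot make any warp more coherent. Larger windows help only once they hold enough threads per material to fill whole warps, which is one way to read the guess that a single SM is too small a scope.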
BTW: Computerbase has Raytracing numbers from the Callisto Protocol: https://www.computerbase.de/2022-12...bschnitt_rdna_3_in_aktuellen_neuerscheinungen
A 4090 loses 100 FPS in 4K and it performs worse than Cyberpunk with Psycho RT settings on a 4090...
> Computerbase has Raytracing numbers from the Callisto Protocol: https://www.computerbase.de/2022-12...bschnitt_rdna_3_in_aktuellen_neuerscheinungen
> A 4090 loses 100 FPS in 4K and it performs worse than Cyberpunk with Psycho RT settings on a 4090...
It's a single threaded RT implementation. It's limited by the CPU on NVIDIA GPUs, not by the GPU, as the utilization of NVIDIA GPUs is sub 80%. The CPU overhead on NVIDIA GPUs is much larger than on AMD GPUs in this game.
> BTW: Computerbase has Raytracing numbers from the Callisto Protocol: https://www.computerbase.de/2022-12...bschnitt_rdna_3_in_aktuellen_neuerscheinungen
> A 4090 loses 100 FPS in 4K and it performs worse than Cyberpunk with Psycho RT settings on a 4090...
Getting a big lol about how the RX 6800 XT is just behind the 3090 Ti in a game with 3 ray tracing effects. Truly, the signs of a very representative title in ray tracing.