AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

At about 1 trillion rays/s, how much sorting and over what proportion of the rays is "shuffling" going to be meaningful? Seems to me that the cache hierarchy is the right place to worry about coherence...
IIRC, research on advanced reordering in GPU HW gave a speedup of about 2x for incoherent test cases. As long as we are mostly interested in coherent rays (shadows, sharp reflections), I think it's not worth it yet. But as time moves on we might get it, and then inline tracing becomes deprecated.

However, the really interesting thing about "AMD prefers inline tracing", and the assumption that inline tracing was added for AMD / consoles, is this:
It fits the TMU patent, where the traversal outer loop is handled by regular shader cores. And if that's what they finally ended up using, traversal shaders would be possible.
So my hope that AMD exposes them via extensions is not totally dead yet.
 
Also it is already possible to support stochastic LOD in DXR at full speed:
https://developer.nvidia.com/blog/implementing-stochastic-lod-with-microsoft-dxr/
This is very limited. The ray has to select its LOD at launch, but here we don't know the distance it travels until it hits something. Can't remember precisely, but my conclusion was this is usable e.g. for characters, but can't be extended to the full scene.
What we want is to switch LOD after the ray has travelled some distance from the camera. At that point the ray is still in the air, mid-traversal, so there is no callback or shader stage where we could do this.
To fully emulate stochastic LOD like in the Intel paper, we could only divide the ray into segments, which requires restarting a full trace at each boundary (sketched below).
Traversal shaders would solve this, but they still lack the option to share upper BVH levels across discrete LODs, which might be interesting too.
Lacking any other idea of what traversal shaders would be good for, implementing such LOD switches in HW instead could be enough, and faster.
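
For illustration, here is a minimal host-side C++ sketch of that segment-restart emulation. All names (TraceSegment, PickLodMask, the boundary list) are hypothetical placeholders, not DXR API; in real DXR this logic would sit in an HLSL ray-generation shader, with the trace call being TraceRay or an inline RayQuery.

```cpp
// Hedged sketch: emulating distance-based LOD switching by splitting a ray
// into segments and restarting a full trace at each LOD boundary.
// TraceSegment() and PickLodMask() are made-up placeholders, not DXR API.
#include <algorithm>
#include <cstdint>
#include <vector>

struct Ray { float origin[3]; float dir[3]; float tMin; float tMax; };
struct Hit { bool valid; float t; };

// Give each LOD level one bit of the 8-bit instance inclusion mask,
// so a trace only sees instances belonging to the selected LOD.
uint8_t PickLodMask(int lodLevel) { return uint8_t(1u << lodLevel); }

// Stub standing in for a real trace call (TraceRay / RayQuery in HLSL).
Hit TraceSegment(const Ray& /*ray*/, uint8_t /*instanceMask*/) { return Hit{ false, 0.0f }; }

Hit TraceWithLodSegments(Ray ray, const std::vector<float>& lodBoundaries)
{
    // N boundaries along the ray give N+1 segments; each segment is traced
    // against the geometry of exactly one LOD level.
    float segmentStart = ray.tMin;
    int   segmentCount = int(lodBoundaries.size()) + 1;

    for (int lod = 0; lod < segmentCount; ++lod)
    {
        float segmentEnd = (lod < segmentCount - 1)
                         ? std::min(ray.tMax, lodBoundaries[lod])
                         : ray.tMax;
        if (segmentStart >= segmentEnd)
            break;

        Ray segment  = ray;
        segment.tMin = segmentStart;
        segment.tMax = segmentEnd;

        Hit hit = TraceSegment(segment, PickLodMask(lod));
        if (hit.valid)
            return hit;              // hit inside this segment, we're done

        segmentStart = segmentEnd;   // otherwise restart the full trace at the boundary
    }
    return Hit{ false, 0.0f };
}
```

The cost described above is visible right in the loop: every LOD boundary a ray crosses without hitting anything pays for another full traversal from the TLAS root.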
 
If you place "transparent quads" as "LOD curtains" in a scene, you get a distance query. Or a transparent LOD sphere? Or nested transparent LOD spheres?
 
I'm thinking LOD curtains in a moderately dense grid are statically placed. This means coarser levels of the TLAS don't need to keep being updated to move the curtains with the camera. Instead there's simply "too many" curtains. The ray payload doesn't change in size as it traverses curtains, but obviously there's increased latency for the entire time of flight for a ray, much of which will be cached.

So the payback for using curtains is low-density meshes (or billboards?) in distant, finer levels of the TLAS. Presumably it's possible to keep the BVH's size approximately constant while supporting a low rate of BVH LOD updates.

I don't know how the update for selected "distant" BVH nodes works, in order to keep dense distant meshes at "low quality", but improve meshes as they get closer to the camera (and the converse for meshes that move away from the camera). I'm struggling to find any meaningful content on BVH update strategies.
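
For what it's worth, one naive strategy (purely a sketch of my own, nothing AMD or DXR prescribes) is to re-evaluate each instance's LOD from its camera distance with a hysteresis band, so an instance only swaps its BLAS, and only then dirties the TLAS, once the camera has moved clearly past a threshold. Instance, Blas and the lodDistances table below are made-up illustration types.

```cpp
// Hedged sketch of a distance-driven per-instance LOD update with hysteresis.
// The hysteresis band keeps instances from ping-ponging between LODs near a
// threshold, which keeps the TLAS update rate low.
#include <cmath>
#include <vector>

struct Blas {};                                   // placeholder for a bottom-level AS

struct Instance {
    float       position[3];
    int         currentLod = 0;
    const Blas* blas       = nullptr;
};

// Distance thresholds between LOD levels, plus the hysteresis band.
const float lodDistances[] = { 20.0f, 60.0f, 150.0f };
const float hysteresis     = 5.0f;
const int   lodCount       = int(sizeof(lodDistances) / sizeof(lodDistances[0])) + 1;

int SelectLod(float distance, int currentLod)
{
    for (int lod = 0; lod < lodCount - 1; ++lod)
    {
        // Bias each threshold in favour of keeping the current LOD.
        float threshold = lodDistances[lod] + (lod >= currentLod ? hysteresis : -hysteresis);
        if (distance < threshold)
            return lod;
    }
    return lodCount - 1;
}

// Returns true if any instance changed LOD, i.e. the TLAS needs an update.
bool UpdateLods(std::vector<Instance>& instances,
                const std::vector<std::vector<Blas>>& lodBlases,   // [instance][lod]
                const float camera[3])
{
    bool tlasDirty = false;
    for (size_t i = 0; i < instances.size(); ++i)
    {
        Instance& inst = instances[i];
        float dx = inst.position[0] - camera[0];
        float dy = inst.position[1] - camera[1];
        float dz = inst.position[2] - camera[2];
        float dist = std::sqrt(dx * dx + dy * dy + dz * dz);

        int lod = SelectLod(dist, inst.currentLod);
        if (lod != inst.currentLod)
        {
            inst.currentLod = lod;
            inst.blas       = &lodBlases[i][lod];
            tlasDirty       = true;               // rare, thanks to the hysteresis band
        }
    }
    return tlasDirty;
}
```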

It sounds like a fun optimisation problem. Console devs are going to have a great time exploring this stuff (PS5, mostly, assuming PC is something they can ignore).
 
A few bits about RT performance and FidelityFX "SuperResolution"


I found this little bit more interesting:

“For example, our Ryzen 5000 [CPUs] use 2D packaging, but nevertheless the concept is the same thing, where of course our core complex uses 7-nanometer but our I/O die [uses] 12-nanometer...That works okay to a certain level, but suddenly you need high-bandwidth connections between your various pieces, and that's what 3D technology can potentially bring.”

That was accompanied by a slide showing >10x bandwidth density. If that somehow translates into 10x (or more) the interchip bandwidth that they can currently get, would that make multi-GPU chiplets more feasible? Obviously this is more likely related to an upcoming Zen CPU or possibly a Zen CPU paired up with a discrete GPU chip, but would it be enough to allow them to start thinking about GPU chiplets?

Regards,
SB
 
No need to "guess" what AMD has to offer in terms of software support:

For upscaling:
https://www.amd.com/en/technologies/radeon-software-fidelityfx#CONTRAST-ADAPTIVE-SHARPENING
https://github.com/GPUOpen-Effects/FidelityFX-CAS

No surprise here: the contrast adaptive sharpening delivers just the quality we are used to, and isn't really suitable for targeting a higher resolution. Single frame in and out, no temporal accumulation, no integration with MSAA resolve, no integration with TAA, no depth buffer for better distinction.

As a result, visual quality is as limited as ever. It's nowhere near what NVidia can do with their fully integrated DLSS 2.0 solution; it simply doesn't have enough input to achieve the same IQ in (almost) still scenes.
Still, 1440p to 4K is reasonable according to comments in the code. Just don't expect anything surpassing the original DLSS 1.0, with all the quirks that one had.
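
Roughly, the per-pixel idea behind CAS, as a simplified single-channel C++ sketch (my own reading of the general technique, not the actual FidelityFX kernel, and the constants are purely illustrative): estimate local contrast from the neighbourhood min/max and sharpen less where contrast is already high.

```cpp
// Hedged, simplified sketch of contrast-adaptive sharpening on one channel.
// Not the FidelityFX-CAS kernel; just the general idea: derive a per-pixel
// sharpening weight from the local min/max, so already-sharp edges are not
// over-sharpened, then apply a small unsharp-style filter. Single frame in
// and out, exactly as described above.
#include <algorithm>
#include <vector>

float SharpenPixel(const std::vector<float>& img, int width, int height, int x, int y)
{
    auto at = [&](int px, int py) {
        px = std::clamp(px, 0, width - 1);
        py = std::clamp(py, 0, height - 1);
        return img[py * width + px];
    };

    // Cross-shaped neighbourhood around the centre pixel.
    float c = at(x, y);
    float n = at(x, y - 1), s = at(x, y + 1), w = at(x - 1, y), e = at(x + 1, y);

    // Local contrast estimate from the neighbourhood min/max.
    float mn = std::min({ c, n, s, w, e });
    float mx = std::max({ c, n, s, w, e });
    float contrast = mx - mn;

    // Negative lobe weight that shrinks towards zero as contrast grows,
    // so high-contrast edges receive little extra sharpening.
    float weight = -0.125f * (1.0f - contrast);     // illustrative constant only

    // Normalised unsharp-style blend of centre and neighbours.
    float result = (c + weight * (n + s + w + e)) / (1.0f + 4.0f * weight);
    return std::clamp(result, 0.0f, 1.0f);
}
```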


For the denoiser:
https://www.amd.com/en/technologies/radeon-software-fidelityfx#DENOISER
The example looks like something around 1-2 samples per pixel on the input side, used for global illumination only, and reconstructed with a full G-buffer.

No public source code yet, but the quality doesn't exactly appear to surpass the first iteration of what NVidia had to offer either.
Keep in mind that NVidia only achieved proper IQ in combination with DLSS; a denoiser with only a single frame's worth of input can only do so much.
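
Roughly the kind of reconstruction that description implies, as a C++ sketch (my own guess at the general approach, not AMD's actual filter; EdgeWeight and its falloff constants are invented for illustration): a spatial blur over the noisy GI signal whose weights collapse wherever the G-buffer normal or depth disagrees with the centre pixel, so the filter doesn't bleed across geometry edges.

```cpp
// Hedged sketch of a G-buffer-guided spatial filter over a noisy ~1-2 spp
// GI signal. A guess at the general approach, not AMD's denoiser.
#include <algorithm>
#include <cmath>
#include <vector>

struct GBufferTexel { float normal[3]; float depth; };

// Weight falls off quickly across normal creases and depth discontinuities.
float EdgeWeight(const GBufferTexel& a, const GBufferTexel& b)
{
    float nDot = a.normal[0] * b.normal[0] + a.normal[1] * b.normal[1] + a.normal[2] * b.normal[2];
    float normalWeight = std::pow(std::max(nDot, 0.0f), 32.0f);
    float depthWeight  = std::exp(-std::abs(a.depth - b.depth) * 8.0f);
    return normalWeight * depthWeight;
}

float DenoisePixel(const std::vector<float>& noisyGi,
                   const std::vector<GBufferTexel>& gbuffer,
                   int width, int height, int x, int y, int radius)
{
    const GBufferTexel& centre = gbuffer[y * width + x];
    float sum = 0.0f, weightSum = 0.0f;

    // Plain box footprint; a real filter would likely be separable or à-trous for speed.
    for (int dy = -radius; dy <= radius; ++dy)
    for (int dx = -radius; dx <= radius; ++dx)
    {
        int px = x + dx, py = y + dy;
        if (px < 0 || px >= width || py < 0 || py >= height)
            continue;

        float w = EdgeWeight(centre, gbuffer[py * width + px]);
        sum       += w * noisyGi[py * width + px];
        weightSum += w;
    }
    return weightSum > 0.0f ? sum / weightSum : noisyGi[y * width + x];
}
```

With only one frame of input, all a filter like this can do is trade noise for blur inside each surface, which is exactly why the temporal accumulation in DLSS-style pipelines matters so much.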

All in all, not bad, but still about a man-year behind in R&D on the software end. And there doesn't exactly appear to be much manpower behind it either, TBH.
 
Holy crap on a cracker :confused:

Look at that LED matrix behind the GPU. Also, is that backplate display built in, or something we can buy separately? :runaway:
(around 4:45 if you don't want to watch the whole thing)

edit: it's a pico-projector he built into the rig
 