AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

There's a new 3dmark ray tracing feature test, just in time.

The 3DMark DirectX Raytracing feature test is available now. You'll need Windows 10, 64-bit with the May 2020 Update (version 2004) and a graphics card with drivers that support DirectX Raytracing Tier 1.1 to run the test.
According to Dave Oldcorn, this mode (DXR Tier 1.1) is the one AMD prefers.
 
It's a "synthetic" test, I guess. It's designed to isolate ray tracing performance. Seems to be path traced.
https://s3.amazonaws.com/download-aws.futuremark.com/3dmark-technical-guide.pdf
Just finished reading this feature test section.
So it's a noisy depth-of-field effect with 12 relatively coherent rays per pixel (the default setting), kept coherent by CPU-side sorting (which isn't possible for other effects), tons of instancing in the video above, and likely a relatively shallow BVH due to the heavy instancing (the thin-lens setup is sketched below).
To be honest, this doesn't look representative of real games such as Minecraft RTX, where there are 0.5-1 rays on average for a given effect, the BVH occupies up to several gigabytes of memory, and rays are incoherent.
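
For anyone wondering why DOF rays can be coherent at all: with a thin-lens camera, every sample for a given pixel converges on the same point on the focal plane, so ray directions only differ by the lens offset, which is what makes them easy to sort/bin on the CPU. A rough standalone sketch of the idea in C++ (my own illustration with invented values, not 3DMark's implementation):

// Minimal thin-lens depth-of-field ray generation (generic sketch, not 3DMark's code).
// All samples for a pixel aim at the same point on the focal plane, which is why
// primary DOF rays stay relatively coherent and can be sorted/binned on the CPU.
#include <cmath>
#include <cstdio>
#include <random>

struct Vec3 { float x, y, z; };
struct Ray  { Vec3 origin, dir; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / len, v.y / len, v.z / len};
}

int main() {
    const float apertureRadius  = 0.05f;  // lens radius; larger -> blurrier, noisier DOF
    const float focalDistance   = 10.0f;  // distance at which objects are in focus
    const int   samplesPerPixel = 12;     // the feature test's default sample count

    std::mt19937 rng(42);
    std::uniform_real_distribution<float> uni(0.0f, 1.0f);

    // Pinhole direction for one pixel (camera at origin, looking down +Z).
    Vec3 pixelDir = normalize({0.1f, 0.05f, 1.0f});
    // Point on the focal plane that every lens sample converges on.
    Vec3 focusPoint = {pixelDir.x * focalDistance, pixelDir.y * focalDistance, pixelDir.z * focalDistance};

    for (int s = 0; s < samplesPerPixel; ++s) {
        // Sample a point on the lens disc.
        float r   = apertureRadius * std::sqrt(uni(rng));
        float phi = 2.0f * 3.14159265f * uni(rng);
        Vec3 origin = {r * std::cos(phi), r * std::sin(phi), 0.0f};

        // All sample rays point at the same focus point -> small angular spread -> coherence.
        Ray ray = {origin, normalize(sub(focusPoint, origin))};
        std::printf("sample %2d: dir = (%.3f, %.3f, %.3f)\n", s, ray.dir.x, ray.dir.y, ray.dir.z);
    }
    return 0;
}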
 
Nice! Even if it's not perfect, it's a useful data point.
Sure, but the bench looks like something that should be limited by ray-triangle intersection performance alone.
The scene is completely static, so there's no need to rebuild the BVH for dynamic geometry, which is an essential part of RT.
Scene complexity also looks relatively low: you could pack the scene into a few AABBs, while in reality RDNA2 has 4 ray/box intersection units per CU for a reason; ray-AABB tests should dominate execution time, since scene complexity in real games is high.
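
For reference, the box test those units accelerate is the standard slab test; BVH traversal runs many of these per ray before a single ray/triangle test is reached. A minimal CPU-side sketch (the textbook version, not AMD's hardware implementation):

// Standard slab test for ray vs. axis-aligned bounding box (generic sketch).
// Deep or complex BVHs mean many box tests per ray, which is why box-test
// throughput matters so much for real scenes.
#include <algorithm>
#include <cstdio>

struct Vec3 { float x, y, z; };

bool rayIntersectsAABB(Vec3 origin, Vec3 invDir, Vec3 boxMin, Vec3 boxMax) {
    // Distances to the two slabs on each axis (invDir = 1 / direction, precomputed per ray).
    float tx1 = (boxMin.x - origin.x) * invDir.x, tx2 = (boxMax.x - origin.x) * invDir.x;
    float ty1 = (boxMin.y - origin.y) * invDir.y, ty2 = (boxMax.y - origin.y) * invDir.y;
    float tz1 = (boxMin.z - origin.z) * invDir.z, tz2 = (boxMax.z - origin.z) * invDir.z;

    float tmin = std::max({std::min(tx1, tx2), std::min(ty1, ty2), std::min(tz1, tz2)});
    float tmax = std::min({std::max(tx1, tx2), std::max(ty1, ty2), std::max(tz1, tz2)});

    // Hit if the entry point is before the exit point and the box is not behind the ray.
    return tmax >= std::max(tmin, 0.0f);
}

int main() {
    Vec3 origin = {0.0f, 0.0f, 0.0f};
    Vec3 invDir = {1.0f / 0.577f, 1.0f / 0.577f, 1.0f / 0.577f};  // ray roughly along (1,1,1)
    bool hit = rayIntersectsAABB(origin, invDir, {1.0f, 1.0f, 1.0f}, {2.0f, 2.0f, 2.0f});
    std::printf("hit = %s\n", hit ? "true" : "false");
    return 0;
}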
The rays are coherent too, whereas secondary rays are almost always quite divergent.
This test looks completely useless unless someone adds the same DOF implementation to real games (which still looks like a waste of performance, since rasterisation will be much faster for this effect).
I wish they had simply added a number of knobs, such as BVH depth, ray type (primary, secondary), ray divergence, and the number of skinned and static models in the scene; that would have made for a much better synthetic test.
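
To make that concrete, something along these lines is what I have in mind; every name below is invented for illustration and corresponds to nothing 3DMark actually exposes:

// Hypothetical knob set for a more useful synthetic RT test (illustrative only;
// none of these names correspond to actual 3DMark settings).
#include <cstdint>

enum class RayType { Primary, SecondaryReflection, SecondaryDiffuse, Shadow };

struct SyntheticRtConfig {
    std::uint32_t raysPerPixel       = 12;     // sample count per pixel
    RayType       rayType            = RayType::Primary;
    float         rayDivergence      = 0.0f;   // 0 = fully coherent, 1 = hemisphere-random
    std::uint32_t maxBvhDepth        = 16;     // cap on acceleration-structure depth
    std::uint32_t staticInstances    = 10000;  // amount of instanced static geometry
    std::uint32_t skinnedModels      = 0;      // dynamic geometry forcing per-frame BVH updates
    bool          rebuildBvhPerFrame = false;  // full rebuild vs. refit of the acceleration structures
};

int main() {
    SyntheticRtConfig cfg;              // defaults roughly mirror what the feature test appears to do
    cfg.rayType       = RayType::SecondaryDiffuse;
    cfg.rayDivergence = 0.8f;           // e.g. a divergent-secondary-ray stress case
    (void)cfg;
    return 0;
}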
 

True, but synthetic tests can have their uses as long as you don't assume they'll represent game performance. The problem with using only gaming benchmarks is that it can be hard to extrapolate game performance across generations, like when the consoles shift to new minimum specs. New features generally get leveraged, and it's hard to predict what the outcome will be on released PC hardware. For example, UE5 is going to behave very strangely compared to just about any other game.
 
Sure, but RT is a complex thing; it's a whole pipeline with millions of nuances.
A good synthetic test for RT should have tons of knobs to play with scene configurations, materials, effects, etc., not just the number of rays; it isn't tessellation.
For now it's a bad synthetic test (since it doesn't represent any real-world RT configuration), and its configuration is likely skewed towards one of the vendors; otherwise I don't know why they would productize the benchmark at all.
 

You're right, of course. However, ray tracing performance is determined by so many factors that any synthetic benchmark won't necessarily predict in-game performance.

I would rather have simple feature tests that tease out raw triangle and box intersection throughput as a baseline to help us understand the hardware. That’s how we did it for fillrate and texturing.

Then maybe layer on other tests that focus on ray divergence, instancing and more complex BVHs.
 
It's not exclusive to RDNA; it goes back all the way to the first GCN.
Is there any indication which window sizes (PCIe BAR size) are supported by GCN, and whether these are reconfigurable after booting?

ROCm drivers support BAR sizes of >4 GB at least on GFX8 (Polaris, GCN4) and GFX9 (Vega, GCN5), but those are the only two GPU architectures officially supported by ROCm so far, with limited unofficial support for GFX7 (Hawaii, GCN2).
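
One indirect way to see the current CPU-visible VRAM window from user space is to look at the Vulkan memory heaps: at least on AMD's drivers, the heap backing the DEVICE_LOCAL + HOST_VISIBLE memory types corresponds to the BAR aperture (typically 256 MB without resizable BAR, close to full VRAM with it). A rough sketch with minimal error handling:

// Rough way to see the CPU-visible VRAM window (the PCIe BAR aperture) through Vulkan:
// find memory types that are both DEVICE_LOCAL and HOST_VISIBLE and print the size of
// the heap backing them. Multiple types can share a heap, so the same heap may be
// printed more than once. Sketch only; compile with -lvulkan.
#include <cstdio>
#include <vector>
#include <vulkan/vulkan.h>

int main() {
    VkApplicationInfo app{VK_STRUCTURE_TYPE_APPLICATION_INFO};
    app.apiVersion = VK_API_VERSION_1_1;
    VkInstanceCreateInfo ici{VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO};
    ici.pApplicationInfo = &app;
    VkInstance instance;
    if (vkCreateInstance(&ici, nullptr, &instance) != VK_SUCCESS) return 1;

    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, nullptr);
    std::vector<VkPhysicalDevice> gpus(count);
    vkEnumeratePhysicalDevices(instance, &count, gpus.data());

    for (VkPhysicalDevice gpu : gpus) {
        VkPhysicalDeviceProperties props;
        vkGetPhysicalDeviceProperties(gpu, &props);
        VkPhysicalDeviceMemoryProperties mem;
        vkGetPhysicalDeviceMemoryProperties(gpu, &mem);

        for (uint32_t i = 0; i < mem.memoryTypeCount; ++i) {
            VkMemoryPropertyFlags f = mem.memoryTypes[i].propertyFlags;
            if ((f & VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT) && (f & VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT)) {
                VkDeviceSize heapSize = mem.memoryHeaps[mem.memoryTypes[i].heapIndex].size;
                std::printf("%s: CPU-visible VRAM heap = %llu MiB\n",
                            props.deviceName, (unsigned long long)(heapSize >> 20));
            }
        }
    }
    vkDestroyInstance(instance, nullptr);
    return 0;
}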


All modern x86 CPUs have automatic cache coherence via snooping built into their PCIe controllers. You can see in the Vulkan GpuInfo database that the memory types backed by system memory (those without the DEVICE_LOCAL bit) all have the HOST_COHERENT bit set, meaning any GPU writes to system memory are automatically coherent with the CPU.
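
If you want to verify that on your own machine rather than trusting GpuInfo, a quick enumeration of the memory types and their flags shows it (sketch only, minimal error handling):

// Enumerate every Vulkan memory type and print its property flags. On typical discrete
// GPUs the types backed by system memory (no DEVICE_LOCAL) all report HOST_COHERENT,
// i.e. GPU writes to them are snooped and need no manual flush/invalidate from the CPU.
// Sketch only; compile with -lvulkan.
#include <cstdio>
#include <vector>
#include <vulkan/vulkan.h>

int main() {
    VkApplicationInfo app{VK_STRUCTURE_TYPE_APPLICATION_INFO};
    app.apiVersion = VK_API_VERSION_1_1;
    VkInstanceCreateInfo ici{VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO};
    ici.pApplicationInfo = &app;
    VkInstance instance;
    if (vkCreateInstance(&ici, nullptr, &instance) != VK_SUCCESS) return 1;

    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, nullptr);
    std::vector<VkPhysicalDevice> gpus(count);
    vkEnumeratePhysicalDevices(instance, &count, gpus.data());

    for (VkPhysicalDevice gpu : gpus) {
        VkPhysicalDeviceMemoryProperties mem;
        vkGetPhysicalDeviceMemoryProperties(gpu, &mem);
        for (uint32_t i = 0; i < mem.memoryTypeCount; ++i) {
            VkMemoryPropertyFlags f = mem.memoryTypes[i].propertyFlags;
            std::printf("type %u (heap %u): %s%s%s\n",
                        i, mem.memoryTypes[i].heapIndex,
                        (f & VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT)  ? "DEVICE_LOCAL "  : "",
                        (f & VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT)  ? "HOST_VISIBLE "  : "",
                        (f & VK_MEMORY_PROPERTY_HOST_COHERENT_BIT) ? "HOST_COHERENT " : "");
        }
    }
    vkDestroyInstance(instance, nullptr);
    return 0;
}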
Going the other way, CPU access to GPU memory on AMD GPUs is always considered coherent, but not automatically in hardware. Instead, the kernel-mode driver explicitly flushes/invalidates the GPU's "host data path" (HDP) caches every time a command buffer is submitted from user space.
This type of driver-assisted coherence is just a fallback and incurs significant overhead. Truly heterogeneous unified memory architecture (UMA) is only possible with AMD APUs or supercomputer systems like the Nvidia DGX-2 and the upcoming HPE/Cray El Capitan, since those use proprietary interconnects (Infinity Fabric / NVLink) that support hardware cache coherence with atomic memory access.
 