Next gen lighting technologies - voxelised, traced, and everything else *spawn*

Actually, I may be wrong about them building BVH on CPU. I remember them speaking about research into dedicated BVH h/w during GDC'19 but that doesn't mean that they aren't building them on CUDA cores now. Here's the presentation on this:
 
Actually, I may be wrong about them building BVH on CPU. I remember them speaking about research into dedicated BVH h/w during GDC'19 but that doesn't mean that they aren't building them on CUDA cores now. Here's the presentation on this:
It may be game-dependent: Metro Exodus seems to build the BVH on the GPU, while Battlefield V seems to use the CPU.
 
I have to mention, what World of Tanks is doing reminds me a lot of NVIDIA's Hybrid Frustum Traced Shadows (HFTS), used in games like Watch Dogs 2, Battlefront 2 and The Division.
No, World of Tanks is classical RT. HFTS bins triangles into a regular grid (with shadow-map-like dimensions, each texel storing a list of triangles). HFTS needs no BVH at all, but it is restricted to all rays sharing the same origin.
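To illustrate the binning idea (not HFTS itself, which works on exact frustum tests): triangles are bucketed into a regular light-space grid, one candidate list per texel, so a shadow query only walks its texel's list. The grid size and triangle layout here are made up for the sketch:

```python
# Sketch of grid binning in light space: project each triangle's 2D
# bounding box onto a regular grid (like a shadow map) and append the
# triangle index to every texel the box overlaps. No BVH is needed
# because all shadow rays share one origin (the light).
GRID = 4   # hypothetical 4x4 shadow-map grid covering [0, 1) x [0, 1)

def bin_triangles(triangles, grid=GRID):
    """triangles: list of three (x, y) verts in light space, in [0, 1)."""
    texels = [[[] for _ in range(grid)] for _ in range(grid)]
    for idx, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        x0, x1 = int(min(xs) * grid), int(max(xs) * grid)
        y0, y1 = int(min(ys) * grid), int(max(ys) * grid)
        for y in range(y0, min(y1, grid - 1) + 1):
            for x in range(x0, min(x1, grid - 1) + 1):
                texels[y][x].append(idx)   # conservative: bbox overlap
    return texels

tri = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.2)]   # falls entirely in texel (0, 0)
grid = bin_triangles([tri])
print(grid[0][0])   # -> [0]
```

A real implementation would clip triangles against texel bounds instead of using the bounding box, but the per-texel list structure is the point.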
 
Tried it myself, 1440p and everything on ultra with a Vega 56 and a Ryzen 2700. The benchmark does not show average fps, just a score of 8303.
But with tanks on screen, fps drops to 30. With RT off I get about 80 in such scenes.
 
Complete World of Tanks RT benchmarks for Pascal, Turing, Polaris, Vega and Navi; things to note:

- Vega is far behind: a Vega 64 is slower than a GTX 1660/1070, even at 4K! The Radeon VII is barely faster than a 1070 too.
- Navi is miles faster than Vega: about 50% faster comparing a Vega 64 vs. a 5700 XT, or 20% faster comparing a Radeon VII vs. a 5700 XT.
- Turing generally does better than Pascal and Navi: a regular 2060 is as fast as a 5700 XT or a GTX 1080.
- Polaris is 20% behind Pascal.

https://wccftech.com/world-of-tanks...d-polaris-vega-navi-pascal-and-turing-tested/
https://gamegpu.com/mmorpg-/-онлайн-игры/world-of-tanks-encore-rt-test-gpu-cpu
 
Vega is far behind: a Vega 64 is slower than a GTX 1660/1070, even at 4K! The Radeon VII is barely faster than a 1070 too.
- Navi is miles faster than Vega: about 50% faster comparing a Vega 64 vs. a 5700 XT, or 20% faster comparing a Radeon VII vs. a 5700 XT.
That's expected, because GCN is bad with random access. Navi fixed this with the new cache hierarchy, but it seems NV still has the edge here, assuming it's the very dominant factor for this interesting benchmark.

BTW, according to https://github.com/sebbbi/perftest Navi seems even faster with random access than with linear?!? But maybe that's just noise. (Random still uses small offsets in his tests, smaller than what RT would cause.)

Navi:
StructuredBuffer<float>.Load uniform: 9.047ms 1.395x
StructuredBuffer<float>.Load linear: 5.461ms 2.310x
StructuredBuffer<float>.Load random: 4.722ms 2.672x

GCN3:
StructuredBuffer<float>.Load uniform: 12.653ms 2.770x
StructuredBuffer<float>.Load linear: 8.913ms 3.932x
StructuredBuffer<float>.Load random: 35.059ms 1.000x
 
assuming it's the very dominant factor for this interesting benchmark.
Why should it be dominant here?

With the High preset, all shadows in the benchmark are hard. That means all rays are shot towards an infinitely small light source; these rays must be pretty coherent, and so are the memory accesses.
With maximum settings, there is a small penumbra on the distant parts of shadows. Rays are now sent towards a random point inside a small area light, yet these are still pretty coherent too, because the area light (the Sun) is still pretty small.
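To make the coherence point concrete, here is a sketch (the distance and radius are illustrative stand-ins, not values from the game): rays aimed at random points on a small distant disc diverge by at most the disc's angular radius, so neighbouring shadow rays stay nearly parallel.

```python
import math
import random

def sun_shadow_dir(sun_dist=100.0, sun_radius=0.46):
    """Unit direction from a surface point at the origin to a random
    point on a distant disc light in the plane z = sun_dist. With these
    numbers the angular radius is atan(0.46 / 100) ~ 0.26 degrees,
    similar to the real Sun's disk."""
    r = sun_radius * math.sqrt(random.random())   # uniform over the disc
    phi = random.uniform(0.0, 2.0 * math.pi)
    x, y = r * math.cos(phi), r * math.sin(phi)
    n = math.sqrt(x * x + y * y + sun_dist * sun_dist)
    return (x / n, y / n, sun_dist / n)

# Worst-case angle between a sampled ray and the disc centre direction:
worst = max(math.degrees(math.acos(sun_shadow_dir()[2])) for _ in range(1000))
print(f"max divergence: {worst:.3f} degrees")   # bounded by ~0.26 degrees
```

Because every ray in a wavefront points within a fraction of a degree of the same direction, the BVH nodes they touch overlap heavily, which is why even penumbra rays stay memory-coherent.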
 
Thanks this is what I was curious about
I believe it really shines a light on why we need hardware RT right now: simple RT shadows, accelerated by both the CPU and GPU and limited to certain models, still cut fps by more than half on the latest GPUs. When you compare that to what's possible with hardware RTX (whole-scene RT GI and AO, whole-scene RT shadows, whole-scene RT reflections), the benefit of specialized RT hardware cannot be overstated.
 
Why should it be dominant here?
Even if threads often fetch similar nodes, the nodes are still scattered randomly (or, more likely, in Morton order).
While it's hard to say how much of the frame cost is RT, we can see GCN suffers more from turning RT on than NV GPUs do. (People also say NV is faster in Radeon Rays, but I've never found benchmarks.)

Matches my experience of the 1070 coming closer to the Fury X in shaders where access is random.
Interestingly, GCN also suffers from successive random access, like with the nodes here.
And the worst case: you surely know that threads accessing memory with large power-of-two strides is very slow, but the same is true if all threads write a nice linear range and then skip a large power-of-two stride before the next instruction.
I think these successive cases are not documented (?) and I don't know the technical reason, but changing list sizes from 256 to 257 gave me a 2x speedup in a shader that really seemed ALU-limited, for example. (NV was unaffected by such changes.)
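As a side note on the Morton order mentioned above: Morton (Z-order) codes interleave the bits of the coordinates, so spatially nearby BVH nodes tend to land near each other in memory, which softens (but does not remove) the random-access cost. A minimal 2D sketch:

```python
def part1by1(x):
    """Spread the low 16 bits of x so a zero bit sits between each bit."""
    x &= 0x0000FFFF
    x = (x | (x << 8)) & 0x00FF00FF
    x = (x | (x << 4)) & 0x0F0F0F0F
    x = (x | (x << 2)) & 0x33333333
    x = (x | (x << 1)) & 0x55555555
    return x

def morton2d(x, y):
    """Interleave bits of x and y: neighbours in 2D get nearby codes."""
    return part1by1(x) | (part1by1(y) << 1)

# x = 3 (binary 011) and y = 5 (binary 101) interleave to 0b100111 = 39:
print(morton2d(3, 5))   # -> 39
```

Sorting nodes by this key walks the plane in the recursive "Z" pattern, so a ray descending the tree tends to touch nodes that are also close in address space.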
 
The BVH was being done on the CPU, isn't that what is accelerated on RTX?

If so, what part of the pipeline is being so heavily affected on the GPUs?
 
The BVH was being done on the CPU, isn't that what is accelerated on RTX?

If so, what part of the pipeline is being so heavily affected on the GPUs?
No. BVH trees are created on the CPU on RTX too (PowerVR does this on their RT unit though, IIRC). Turing RT cores accelerate ray intersection / hit testing on the BVHs the CPU built.
 