DXR performance - CPU cost

BVH TLAS is built on CPU, this isn't new. Thus a game running completely CPU limited will show worse results with RT than without.

So this is probably what I'm experiencing here. It would make a lot of sense because it's independent of the resolution or the number of rays.

Edit: It's a very interesting problem. I basically have a pc with a mid range cpu that doesn't seem to be working too hard and a gpu sitting there at 50% all because of one particular operation that hangs the whole thing. I'm really curious to see how much this can be optimized, or if that's somewhat hampered by the api and driver. Seems like watch dogs legion does a better job, and it has one of the better DXR implementations.
 
I wonder if I can capture this with RenderDoc or Very Sleepy. RenderDoc isn't really a profiler, I don't think. Very Sleepy is a profiling tool for CPU, but I wouldn't really be able to sync it on a frame-by-frame basis. I'm assuming if I compare RT on to RT off, I should be able to see the bottleneck.
 
I forgot about PIX. It seems to have some support for D3D11On12, so it might work.

It's very lackluster. I suggest you look for a true DX12 title; this gluing together of different APIs isn't really a joy to dig into.
 
640x480 low, no RT: 137 fps, 27% gpu load
640x480 low, contact shadows RT: 123 fps, 34% gpu load
640x480 low, reflections RT: 93 fps, 35% gpu load
640x480 low, diffuse lighting RT: 119 fps, 33% gpu load
640x480 low, transparent reflections RT: 93 fps, 33% gpu load
640x480 low, ray traced debris RT: 135 fps, 26% gpu load
640x480 low, Full RT: 87 fps, 44% gpu load
640x480 low, Full RT + DLSS performance: 86 fps, 41% gpu load

Clearly CPU limited at 640x480 low. The hit for ray tracing varies by what's enabled, but in any case gpu load increases only a small amount and the bottleneck is elsewhere. Full RT only hits 44% of GPU used.
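
To put the numbers above on an additive scale, here is a rough sketch (not from the original posts) that converts fps to frame time and shows how many milliseconds each RT option adds over the no-RT baseline. The helper names `frame_time_ms` and `added_cost_ms` are made up for illustration; the fps values are copied from the list above.

```python
# Frame rates (fps) copied from the 640x480 low results above.
results = {
    "no RT": 137,
    "contact shadows RT": 123,
    "reflections RT": 93,
    "diffuse lighting RT": 119,
    "transparent reflections RT": 93,
    "ray traced debris RT": 135,
    "Full RT": 87,
    "Full RT + DLSS performance": 86,
}


def frame_time_ms(fps):
    """Convert a frame rate to the time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def added_cost_ms(results, baseline="no RT"):
    """Extra frame time of each setting relative to the baseline setting."""
    base = frame_time_ms(results[baseline])
    return {
        name: frame_time_ms(fps) - base
        for name, fps in results.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, cost in added_cost_ms(results).items():
        print(f"{name}: +{cost:.2f} ms per frame")
```

Full RT works out to roughly 4.2 ms of extra frame time here. That number by itself doesn't say whether the extra time is spent on the CPU or the GPU; it just makes the per-effect costs easier to compare than raw fps.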

In another area I can hit 240fps at 640x480 low with no RT. Turn on all RT effects and it only drops to 220. Goes from 8% gpu used to 65%. I can hit about 170 fps at 1080p ultra with all RT effects on here, pushing GPU to 99%.

Very weird. So the more complex scene has an interesting bottleneck to figure out. I know the worst case areas have more transparent surfaces etc.
 
I don’t think GPU load is a reliable indicator of a CPU bottleneck. Just thinking out loud: RT is slow, and while the GPU is waiting for RT to finish, the rest of the chip sits idle. So the utilization percentages are probably really bad with RT.
Are your 1080p benches the same?

I think if you go to Alex’s favourite hallway of death and benchmark RT there, you should get proper scaling with resolution etc.
 
1080p is the same behaviour but slower. Hit about 126 fps with gpu load well below 99%, then I turn on RT and it drops to about 75 and utilization is even worse.

It could be that it's so cache unfriendly that VRAM accesses are stalling it all of the time. I would still have thought ray tracing at 640x480 would solve that, but maybe the problem is the slowest pixel in the framebuffer. So if one pixel takes a long time to process because it hits an any-hit shader or something slow, the whole frame is slow.

Oh, I tried PIX but it drops me from 170 to 60 fps in a simple scene, so I'm not sure how useful profiling would be.
 
Yea. I think if we want to determine a CPU bottleneck with your GPU, we’re going to have to try different CPUs with your test setup. That would be definitive. I think 640x480 with low settings and no RT is probably a CPU bottleneck. But once RT is enabled I think it’s a GPU bottleneck.

I recall when I tried RTX on Pascal. It would run fine at first until I hit actual ray tracing bits, and then it would crawl to 1-7 fps from 30 fps. So I think if you want to bench and determine the connection between CPU and RT, you need to find an RT-heavy area and start swapping different CPUs.
 
What about dropping CPU clock speed by 500 MHz or 1 GHz to see if you're CPU limited?
 
Yah, I was going to play with Ryzen Master and see if that's an easy route, but for some reason my Ryzen Master has an install issue, so I don't know when I'll get around to fixing it.
 
Ok, so 640x480 with low preset and DLSS ultra, so internal res is less than 320x240 (DLSS has no issue keeping up with these frame rates). Ryzen 3600X with RTX 3080. Memory is 3200 with timings from an XMP profile (not great).

At stock CPU clock and no RT we get 142 fps

upload_2020-12-16_19-29-34.png

At 3 GHz CPU clock and no RT we get 119 fps, so yes confirmed CPU limited

upload_2020-12-16_19-29-51.png

Then at stock clock and full RT we get 89 fps. Only 42% gpu use. Are we CPU limited?

upload_2020-12-16_19-30-19.png

At 3GHz cpu clock and full RT we get ... 74 fps. Looks like we're cpu limited. CPU clocked dropped by

upload_2020-12-16_19-30-35.png

Ok, maybe that cpu clock change was severe, so let's try 3950 MHz, so about 10% lower clock. Lose what looks like 6-9 fps depending on when I capture both images.

upload_2020-12-16_19-30-51.png
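
One rough way to read the stock vs 3 GHz results (a sketch added for illustration, not part of the original test) is to compare how much of the frame rate survives the downclock with and without RT. The function name `fps_retained` is hypothetical; the fps values are the ones reported above.

```python
def fps_retained(fps_stock, fps_downclocked):
    """Fraction of the stock-clock frame rate that remains after downclocking."""
    return fps_downclocked / fps_stock


# Values reported above: stock clock vs 3 GHz, 640x480 low with DLSS.
no_rt = fps_retained(142, 119)    # no RT
full_rt = fps_retained(89, 74)    # full RT

if __name__ == "__main__":
    print(f"no RT:   {no_rt:.1%} of stock fps kept at 3 GHz")
    print(f"full RT: {full_rt:.1%} of stock fps kept at 3 GHz")
```

Both cases keep roughly 83-84% of their frame rate, so the run with RT on loses about as much from the slower clock as the run with RT off. That is what you would expect if both are CPU limited; a purely GPU-bound case should barely react to the CPU clock.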
 
For good measure I just tested 1080p high with high RT and DLSS quality and it behaves the same way. If I lower my cpu clock by 400 MHz I lose fps. Increasing cpu clock increases gpu utilization.

upload_2020-12-16_22-47-2.png

upload_2020-12-16_22-47-30.png
 
Inline ray tracing should be kept to simple cases, where you’re only looking to leverage a little bit of ray tracing. Otherwise the suggestion is to go for the full (non-inline) call if you’re doing something complex. Inline has a performance implication, but what stands to be gained is not needing to send additional calls or require additional scheduling.

I think RT reflections will likely need the non inline version.

But looking at what Chris is doing, a DXR path tracer, he does not seem affected by inline. And other comments indicate seeing no performance gain, so I don’t know. It might just be Chris’ method of setting up his path tracer.
 