DXR performance - CPU cost

Discussion in 'Rendering Technology and APIs' started by Scott_Arm, Dec 15, 2020.

  1. Scott_Arm

    Scott_Arm Legend

    I'm really curious about DXR performance. The only game I have to test right now is Control. In certain rooms my fps takes a huge hit, I'm assuming because there are more RT effects in action in those scenes. You can see that as I switch from Native to DLSS Quality to DLSS Balanced and to DLSS Performance, my frame rate does not go up but my GPU utilization goes down. I thought that by dropping the internal resolution I'd be casting fewer rays and would remove an RT core bottleneck. It looks like the real performance hit is a fixed cost elsewhere.

    I think I'm CPU limited, but I'm not sure. When I check actual thread utilization, it doesn't look like that's the case. Is this a memory latency issue? Or just something happening at the driver level that bottlenecks each frame but doesn't last long enough to show up in CPU monitoring? I'm very curious. Nsight doesn't work with this game.
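A quick illustration of why overall CPU utilization can look fine while one thread is the limit: monitoring tools usually report the average across all hardware threads, and a Ryzen 5 3600X has 12 (6 cores, 2 threads each). The per-thread numbers below are hypothetical, not measurements from this system.

```python
def total_utilization(per_thread_busy: list[float]) -> float:
    """Average busy fraction across all hardware threads (the 'total CPU' figure)."""
    return sum(per_thread_busy) / len(per_thread_busy)


# Hypothetical 12-thread CPU: one render/driver thread pegged, the rest lightly loaded.
threads = [1.0] + [0.15] * 11
print(f"total: {total_utilization(threads):.0%}, busiest thread: {max(threads):.0%}")
# prints "total: 22%, busiest thread: 100%"
```

Even a per-thread view can understate this, since the OS may migrate the busy thread between cores and spread its load across several graphs.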

    I have an RTX 3080 with a Ryzen 3600X. I'm going to replace the CPU whenever Ryzen 6000 comes out. I knew I'd be CPU limited in a whole bunch of games, but the behaviour here just looked weird to me.

    Approximately 75 fps across the board.

    [Screenshots: the same scene at Native, DLSS Quality, DLSS Balanced and DLSS Performance]

    For good measure here's one in 640x480 with DLSS performance mode. I gain about 7 fps.

    [Screenshot: 640x480 with DLSS Performance]
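Converting the reported numbers to frame times makes the point more concrete. Using the approximate figures from the post (75 fps, and about 7 fps more at 640x480), cutting the resolution removes only about 1.1 ms per frame, so roughly 12 ms of each frame goes to work that does not shrink with resolution.

```python
def frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


baseline = frame_time_ms(75)      # ~75 fps reported across all DLSS modes
low_res = frame_time_ms(75 + 7)   # ~7 fps gained at 640x480 with DLSS Performance
saved = baseline - low_res
print(f"{baseline:.2f} ms -> {low_res:.2f} ms, saved {saved:.2f} ms")
# prints "13.33 ms -> 12.20 ms, saved 1.14 ms"
```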
     
  2. iroboto

    iroboto Daft Funk Legend Subscriber

    I’m not sure about DXR costs but DLSS costs are the same regardless. So everything else can go faster but you’re going to be limited by the neural network.
     
  3. Scott_Arm

    Scott_Arm Legend

    It’s not the dlss. I can hit my 138 fps cap in most areas with dlss quality.
     
  4. digitalwanderer

    digitalwanderer Dangerously Mirthful Legend

    So I need to upgrade my brain to get better performance in Control? Damn, that's pretty hardcore!
     
  5. Scott_Arm

    Scott_Arm Legend

    It has to be BVH build or refit at the driver level. It’s the only thing that would remain constant regardless of the internal render resolution. If it’s single-threaded or maybe cache-unfriendly, it could really bottleneck.

    Maybe expensive context switches between user and kernel mode as it’s built or refitted? Maybe some issues with copying data from RAM to VRAM? It would be interesting if Resizable BAR helped here. Some CPU scaling tests with the game set to 640x480 with DLSS Performance mode would go a long way. Not sure what the easiest way to lower my CPU clock would be.
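For reference, a BVH refit keeps the tree's topology and only recomputes bounding boxes from the leaves up, so its cost scales with the number of nodes, not with pixels or rays. The sketch below is a generic CPU-side illustration with made-up types (`AABB`, `Node`, `refit`); it is not how any particular driver or Control implements it. In DXR the equivalent update is requested on the GPU with `D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_PERFORM_UPDATE`.

```python
from dataclasses import dataclass
from typing import Optional

Vec3 = tuple[float, float, float]


@dataclass
class AABB:
    lo: Vec3
    hi: Vec3

    def union(self, other: "AABB") -> "AABB":
        """Smallest box enclosing both boxes."""
        return AABB(
            tuple(min(a, b) for a, b in zip(self.lo, other.lo)),
            tuple(max(a, b) for a, b in zip(self.hi, other.hi)),
        )


@dataclass
class Node:
    box: AABB
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prim: Optional[int] = None  # set on leaves: index into the primitive box list


def refit(node: Node, prim_boxes: list[AABB]) -> tuple[AABB, int]:
    """Recompute boxes bottom-up without changing the tree shape.

    Returns the node's new box and how many nodes were visited.
    """
    if node.prim is not None:
        node.box = prim_boxes[node.prim]
        return node.box, 1
    left_box, left_n = refit(node.left, prim_boxes)
    right_box, right_n = refit(node.right, prim_boxes)
    node.box = left_box.union(right_box)
    return node.box, left_n + right_n + 1


# Two primitives; the second one moves/deforms between frames.
boxes = [AABB((0, 0, 0), (1, 1, 1)), AABB((2, 0, 0), (3, 1, 1))]
root = Node(AABB((0, 0, 0), (0, 0, 0)),
            left=Node(boxes[0], prim=0), right=Node(boxes[1], prim=1))
boxes[1] = AABB((2, 0, 0), (5, 1, 1))
root_box, visited = refit(root, boxes)
print(root_box.hi, visited)  # (5, 1, 1) 3
```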
     
  6. Ethatron

    Ethatron Regular Subscriber

    BVH building is on the GPU, although there's considerable bookkeeping of data on the CPU side. It essentially needs to maintain two distinct scene "submissions" simultaneously.
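A rough illustration of that "two submissions" bookkeeping, assuming a simple engine where each object carries a frustum-culling result and a hypothetical `rt_visible` flag (all names here are made up). The key difference is that ray-traced reflections and shadows can hit geometry outside the camera's view, so the ray-tracing list cannot simply reuse the raster pass's frustum culling.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    in_view: bool      # result of frustum culling for the raster pass
    rt_visible: bool   # hypothetical flag: relevant to reflections/shadows
    transform: tuple   # flattened 3x4 world matrix


def build_frame_lists(objects: list[SceneObject]):
    """Walk the scene once per frame and fill both submissions."""
    raster_draws: list[str] = []
    rt_instances: list[tuple[str, tuple]] = []
    for obj in objects:
        if obj.in_view:
            raster_draws.append(obj.name)
        # Not gated on in_view: an off-screen object can still appear in a reflection.
        if obj.rt_visible:
            rt_instances.append((obj.name, obj.transform))
    return raster_draws, rt_instances


IDENTITY = (1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0)
scene = [
    SceneObject("desk", in_view=True, rt_visible=True, transform=IDENTITY),
    SceneObject("window_behind_camera", in_view=False, rt_visible=True, transform=IDENTITY),
    SceneObject("particle_fx", in_view=True, rt_visible=False, transform=IDENTITY),
]
draws, instances = build_frame_lists(scene)
print(len(draws), len(instances))  # 2 2
```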
     
  7. Scott_Arm

    Scott_Arm Legend

    Well, if this fixed cost of enabling ray tracing is on the GPU side, it should get even worse if I lower my GPU clock. Maybe I’ll play with that tonight.
     
  8. Ethatron

    Ethatron Regular Subscriber

    It should be a regular compute shader. Someone raised the idea that there might be hardware support for BVH construction, but to me that sounds very exotic, inflexible, and not really necessary. It can be asynchronous and low priority, and as such a bit difficult to measure. I guess Nsight can tell you how that is set up and how it works.
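Driver BVH builders aren't public, but one well-known way to build a BVH in compute shaders is the linear BVH (LBVH) approach: give each primitive a 30-bit Morton code from its centroid, sort by that code, then form the hierarchy over the sorted list. The sketch below shows only the Morton-code and sort steps, on the CPU for readability; it assumes centroids are already normalized to [0, 1] and omits the hierarchy step. A GPU version would run each step as parallel passes (e.g. a radix sort).

```python
def expand_bits(v: int) -> int:
    """Spread the low 10 bits of v so two zero bits separate each original bit."""
    v &= 0x3FF
    v = (v | (v << 16)) & 0x030000FF
    v = (v | (v << 8)) & 0x0300F00F
    v = (v | (v << 4)) & 0x030C30C3
    v = (v | (v << 2)) & 0x09249249
    return v


def morton3d(x: float, y: float, z: float) -> int:
    """30-bit Morton code for a point in the unit cube (x, y, z in [0, 1])."""
    def quantize(c: float) -> int:
        return min(max(int(c * 1024.0), 0), 1023)

    return ((expand_bits(quantize(x)) << 2)
            | (expand_bits(quantize(y)) << 1)
            | expand_bits(quantize(z)))


def morton_order(centroids: list[tuple[float, float, float]]) -> list[int]:
    """Primitive indices sorted so that spatially close primitives end up adjacent."""
    return sorted(range(len(centroids)), key=lambda i: morton3d(*centroids[i]))


centroids = [(0.9, 0.9, 0.9), (0.1, 0.1, 0.1), (0.12, 0.1, 0.1), (0.88, 0.9, 0.9)]
print(morton_order(centroids))  # [1, 2, 3, 0]
```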
     
  9. Scott_Arm

    Scott_Arm Legend

    I tried to use Nsight on Control, but it says it's incompatible because it's using a D3D11ON12 API layer. My thought is I can go back to the scene where I took the screen caps and set my resolution to 640x480 with DLSS Performance, so the internal resolution is something like 320x240. Turn off ray tracing and see what I hit in terms of fps. I'll be CPU limited, but then I can turn on ray tracing and see what the frame-time difference is. Then I can play with my GPU clock and see if that changes. Playing with my CPU clock seems more annoying. Maybe I can do that easily with Ryzen Master.

    Edit: I should also check whether it matters how many ray tracing effects I turn on. I have a feeling it won't matter, but maybe I'm wrong.

    Edit2: I also know this will vary scene to scene. There are a couple of rooms I'm aware of where I seem to hit this kind of frame cap, but in other places I can easily hit the fps cap I've set for gsync (138) with everything maxed out.
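The plan above can be turned into a small calculation. This sketch assumes fps is recorded with RT off and on at the stock GPU clock and again at a reduced core clock; it converts each pair to the RT frame-time overhead and reports how much of that overhead follows the GPU clock. The function names and example numbers are hypothetical, not measurements from Control. Lowering the core clock doesn't lower the memory clock, so memory-bound GPU work may not scale fully either.

```python
def rt_overhead_ms(fps_rt_off: float, fps_rt_on: float) -> float:
    """Extra milliseconds per frame caused by enabling ray tracing."""
    return 1000.0 / fps_rt_on - 1000.0 / fps_rt_off


def gpu_scaling_fraction(overhead_stock_ms: float, overhead_slow_ms: float,
                         clock_ratio: float) -> float:
    """How much of the RT overhead follows the GPU core clock.

    clock_ratio = stock_clock / reduced_clock (> 1). A result near 0.0 means the
    overhead ignored the clock (points at CPU/driver work); near 1.0 means it grew
    exactly with 1/clock (points at GPU work).
    """
    growth = overhead_slow_ms / overhead_stock_ms
    return (growth - 1.0) / (clock_ratio - 1.0)


# Hypothetical readings at a 20% core-clock reduction (clock_ratio = 1 / 0.8 = 1.25).
stock = rt_overhead_ms(fps_rt_off=300, fps_rt_on=180)
slow = rt_overhead_ms(fps_rt_off=290, fps_rt_on=175)
fraction = gpu_scaling_fraction(stock, slow, 1.25)
print(f"{stock:.2f} ms vs {slow:.2f} ms, GPU-scaling fraction {fraction:.2f}")
```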
     
    Last edited: Dec 15, 2020
  10. iroboto

    iroboto Daft Funk Legend Subscriber

    Why not just remove DLSS from the picture entirely for this benchmark? I've been following along, and I get why it seems weird that changing resolution with RT on isn't impacting performance. But enabling DLSS makes the issue harder to diagnose.

    The challenge is that DLSS will never run any faster; it's a fixed cost. The only way it's going to run faster is to increase the GPU clock speed so the network completes sooner. Quality, Balanced and Performance are different networks, but each one of those has a fixed cost regardless of how low you bring the resolution. The speed of processing that network could be your frame-rate limiter once you get to a high enough fps. If everything else is already under 1 ms, and your neural networks take a hypothetical 4, 5 and 6 ms (Performance, Balanced, Quality) to complete, you're sort of stuck: you can't get more speed unless you get rid of it.
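The argument above as a toy model: if CPU and GPU work overlap (the CPU prepares the next frame while the GPU renders the current one), frame time is roughly the larger of the two, and a fixed DLSS cost caps fps no matter how cheap the rest of the GPU work gets. The 4/5/6 ms costs are the hypothetical numbers from the post, not measured DLSS timings, and the fully overlapped pipeline is a simplifying assumption.

```python
# Hypothetical per-frame DLSS costs from the post (ms), not measurements.
DLSS_MS = {"performance": 4.0, "balanced": 5.0, "quality": 6.0}


def frame_ms(cpu_ms: float, other_gpu_ms: float, dlss_ms: float) -> float:
    """Toy pipelined model: the slower of the CPU and GPU sets the frame time."""
    return max(cpu_ms, other_gpu_ms + dlss_ms)


def fps(ms: float) -> float:
    return 1000.0 / ms


for mode, cost in DLSS_MS.items():
    gpu_bound = fps(frame_ms(cpu_ms=0.0, other_gpu_ms=1.0, dlss_ms=cost))
    cpu_bound = fps(frame_ms(cpu_ms=13.3, other_gpu_ms=1.0, dlss_ms=cost))
    print(f"{mode:>11}: ceiling {gpu_bound:.0f} fps, with a 13.3 ms CPU frame {cpu_bound:.0f} fps")
```

In this model the DLSS cost only becomes the limit once the CPU frame time drops below the GPU time.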
     
    Last edited: Dec 15, 2020
  11. Scott_Arm

    Scott_Arm Legend

    The further I can push the resolution down, the more confident I am that I'll be CPU limited. I'll play around with it though. There could be some frame rate where the tensor cores can no longer keep up. Not sure. Not a bad experiment in itself. I'll see how low the game will let me set the resolution, and then see if DLSS starts to have a negative impact at extremely high frame rates.

    Edit: I may also grab Ghostrunner or something so I have a second game to experiment with.
     
  12. iroboto

    iroboto Daft Funk Legend Subscriber

    I think if your goal is to find the breaking point of being CPU limited, I would definitely consider removing DLSS, as it will bottleneck performance eventually. It can't run any faster regardless of the resolution you set. The only things that affect DLSS computation times are (a) the computational power, i.e. tensor cores, clock rate and bandwidth, and (b) the size of the network.
    Since you can't change (b) with resolution, you're left with only (a).
     
  13. Malo

    Malo Yak Mechanicum Legend Subscriber

    Aren't Tensor cores on their own clock domain?
     
  14. iroboto

    iroboto Daft Funk Legend Subscriber

    I just assumed they ran at the same clock rate as everything else.
     
  15. Scott_Arm

    Scott_Arm Legend

    I do not know ...
     
  16. Scott_Arm

    Scott_Arm Legend

    Black Ops Cold War seems like it has the same problem running at 360p:



    Watch Dogs Legion does not seem to have this problem.



    Seems like they're obviously doing different things from game to game, but whatever Control and Cold War do leads to more severe frame-time penalties under particular conditions that aren't resolved by lowering resolution.
     
  17. Scott_Arm

    Scott_Arm Legend

    @Man from Atlantis In Control, if I compare RT on/off at 640x480 in the scene I posted above, I would not be surprised if I'm losing 100 fps.
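A 100 fps drop can mean very different per-frame costs depending on where it starts, because fps is the reciprocal of frame time. The RT-off baseline at 640x480 isn't given, so the baselines below are hypothetical.

```python
def added_ms(fps_before: float, fps_after: float) -> float:
    """Per-frame cost (ms) implied by dropping from fps_before to fps_after."""
    return 1000.0 / fps_after - 1000.0 / fps_before


for before in (175, 250, 400):
    print(f"{before} -> {before - 100} fps: +{added_ms(before, before - 100):.2f} ms per frame")
# prints +7.62, +2.67 and +0.83 ms respectively
```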
     
  18. manux

    manux Veteran

    Extra CPU cost could be partly tied to deciding what data to use for BVH building. Depending on how the engine is built, the data might be readily available, or it might have to be mined from traditional data structures if the engine wasn't designed with RT in mind. Some assets for RT might also be different from those used for rasterization. Collecting this data and feeding it to BVH building could consume significant CPU time even if the BVH building itself were done with compute.
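One concrete piece of that per-frame gathering is deciding, object by object, how much acceleration-structure work is needed. The sketch below follows a common rule of thumb for two-level structures like DXR's: rigid motion only changes the instance transform in the TLAS, deforming meshes (e.g. skinned characters) need a BLAS update/refit, and changed topology needs a full BLAS rebuild. The `BvhAction` enum and `classify` function are hypothetical; real engines use their own heuristics, for example periodically rebuilding a refitted BLAS to keep its quality up.

```python
from enum import Enum


class BvhAction(Enum):
    REUSE = "no BLAS work"
    UPDATE_INSTANCE = "new transform in the TLAS instance list"
    REFIT_BLAS = "update BLAS bounds in place"
    REBUILD_BLAS = "build BLAS from scratch"


def classify(transform_changed: bool, vertices_changed: bool,
             topology_changed: bool) -> BvhAction:
    """Pick the cheapest acceleration-structure work that keeps the object correct."""
    if topology_changed:      # triangle count/indices changed: old tree is invalid
        return BvhAction.REBUILD_BLAS
    if vertices_changed:      # e.g. skinning: same triangles, new positions
        return BvhAction.REFIT_BLAS
    if transform_changed:     # rigid motion: BLAS is reused, only the instance moves
        return BvhAction.UPDATE_INSTANCE
    return BvhAction.REUSE


# Hypothetical frame: (name, transform_changed, vertices_changed, topology_changed)
frame = [("wall", False, False, False), ("door", True, False, False),
         ("enemy", True, True, False), ("shattered_glass", False, True, True)]
for name, *flags in frame:
    print(f"{name}: {classify(*flags).name}")
```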
     
    Last edited: Dec 15, 2020
  19. DegustatoR

    DegustatoR Veteran

    The BVH TLAS is built on the CPU; this isn't new. Thus a game that is completely CPU limited will show worse results with RT than without.
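For context on the CPU side of a top-level structure in DXR: the application typically writes one `D3D12_RAYTRACING_INSTANCE_DESC` (64 bytes) per instance into a buffer every frame, and that buffer is then referenced by `BuildRaytracingAccelerationStructure`. The sketch below packs that layout with Python's `struct` to show the per-instance work; the helper names and BLAS addresses are made up, and the bit-field packing assumes the low-bits-first order MSVC uses for the C struct.

```python
import struct

# D3D12_RAYTRACING_INSTANCE_DESC, 64 bytes:
#   FLOAT Transform[3][4];                                     48 bytes, row-major 3x4
#   UINT  InstanceID : 24, InstanceMask : 8;                    4 bytes
#   UINT  InstanceContributionToHitGroupIndex : 24, Flags : 8;  4 bytes
#   D3D12_GPU_VIRTUAL_ADDRESS AccelerationStructure;            8 bytes (BLAS address)
INSTANCE_DESC = struct.Struct("<12fIIQ")
assert INSTANCE_DESC.size == 64


def pack_instance(transform: tuple, instance_id: int, mask: int,
                  hit_group_offset: int, flags: int, blas_address: int) -> bytes:
    """Pack one instance; the first bit-field in each pair occupies the low bits."""
    if len(transform) != 12:
        raise ValueError("expected a flattened 3x4 matrix")
    word0 = (instance_id & 0xFFFFFF) | ((mask & 0xFF) << 24)
    word1 = (hit_group_offset & 0xFFFFFF) | ((flags & 0xFF) << 24)
    return INSTANCE_DESC.pack(*transform, word0, word1, blas_address)


def build_instance_buffer(instances: list[tuple]) -> bytes:
    """Per-frame CPU work: one 64-byte record per instance in the scene."""
    return b"".join(pack_instance(*inst) for inst in instances)


IDENTITY = (1.0, 0.0, 0.0, 0.0,
            0.0, 1.0, 0.0, 0.0,
            0.0, 0.0, 1.0, 0.0)
# Hypothetical BLAS GPU addresses; real ones come from the D3D12 resources.
scene = [(IDENTITY, i, 0xFF, 0, 0, 0x1_0000_0000 + i * 0x1000) for i in range(1000)]
buffer = build_instance_buffer(scene)
print(len(buffer))  # 64000
```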
     