DXR performance - CPU cost

Discussion in 'Rendering Technology and APIs' started by Scott_Arm, Dec 15, 2020.

  1. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,759
    Likes Received:
    6,893
    I'm really curious about DXR performance. The only game I have to test right now is Control. In certain rooms my fps takes a huge hit, I'm assuming because there are more RT effects in action in those scenes. You can see that as I switch from Native to DLSS Quality to DLSS Balanced and to DLSS Performance, my frame rate does not go up but my gpu utilization goes down. I thought that by dropping the internal resolution I'd be casting fewer rays and would remove an RT core bottleneck. It looks like the real hit to performance is a fixed cost elsewhere.

    I think I'm cpu limited, but I'm not sure. When I check actual thread utilization, it doesn't look like that's the case. Is this a memory latency issue? Or is it just something happening at the driver level that bottlenecks each frame but doesn't last long enough to show up in the cpu monitoring? I'm very curious. Nsight doesn't work with this game.

    I have an RTX 3080 with a Ryzen 3600X. I'm going to replace the cpu whenever the Ryzen 6000 comes out. I knew I'd be cpu limited in a whole bunch of games, but the behaviour here just looked weird to me.

    Approximately 75 fps across the board.

    upload_2020-12-14_20-56-35.png upload_2020-12-14_20-56-48.png

    upload_2020-12-14_20-57-1.png upload_2020-12-14_20-57-17.png

    For good measure here's one in 640x480 with DLSS performance mode. I gain about 7 fps.

    upload_2020-12-14_20-58-22.png
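
A rough way to put a number on "fps stays flat while resolution drops" is sketched below. The fps values are the approximate ones quoted in this post (about 75 fps in every mode, roughly 7 fps more at 640x480); the function names and the 25% threshold are assumptions made for illustration, not an established rule.

```python
def fps_spread(samples):
    """Relative spread between the slowest and fastest setting."""
    lowest = min(samples.values())
    highest = max(samples.values())
    return (highest - lowest) / lowest


def is_resolution_independent(samples, threshold=0.25):
    """True when dropping resolution barely moves the frame rate.

    The threshold is an arbitrary assumption: a GPU-bound game would be
    expected to gain far more than this across such a wide range of
    internal resolutions.
    """
    return fps_spread(samples) < threshold


# Approximate figures from the post, not precise measurements.
samples = {
    "native": 75,
    "dlss_quality": 75,
    "dlss_balanced": 75,
    "dlss_performance": 75,
    "640x480_dlss_performance": 82,
}

print(f"spread: {fps_spread(samples):.1%}")
print("resolution independent:", is_resolution_independent(samples))
```

A small spread across such different internal resolutions suggests the limit sits somewhere that does not scale with pixel count, though it does not say where.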
     
  2. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    13,021
    Likes Received:
    15,765
    Location:
    The North
    I’m not sure about DXR costs but DLSS costs are the same regardless. So everything else can go faster but you’re going to be limited by the neural network.
     
  3. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,759
    Likes Received:
    6,893
    It’s not the dlss. I can hit my 138 fps cap in most areas with dlss quality.
     
  4. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    18,163
    Likes Received:
    2,775
    Location:
    Winfield, IN USA
    So I need to upgrade my brain to get better performance in Control? Damn, that's pretty hardcore!
     
  5. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,759
    Likes Received:
    6,893
    It has to be BVH build or refit at the driver level. It’s the only thing that would remain constant regardless of the internal render resolution. If it’s single threaded or maybe cache unfriendly it could really bottleneck.

    Maybe there are expensive context switches between user and kernel mode as it’s built or refitted? Maybe some issues with copying data from RAM to VRAM? It would be interesting to see if resizable BAR helped here. Some cpu scaling tests with the game set at 640x480 with performance mode DLSS would go a long way. Not sure what the easiest way to lower my cpu clock would be.
     
  6. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    921
    Likes Received:
    356
    BVH building is on the GPU, although there's considerable book-keeping of data on the CPU side. It essentially needs to maintain two distinct scene "submissions" simultaneously.
     
  7. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,759
    Likes Received:
    6,893
    Well, if this fixed cost of enabling ray tracing is on the gpu side, it should get even worse if I lower my gpu clock. Maybe I’ll play with that tonight.
     
    BRiT likes this.
  8. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    921
    Likes Received:
    356
    It should be a regular compute shader. Someone raised the idea that there might be hardware support for BVH construction, but to me that sounds very exotic and inflexible, and not really necessary. It can be asynchronous and low priority, and as such a bit difficult to measure. I guess Nsight can tell you how that is set up and works.
     
    iroboto and BRiT like this.
  9. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,759
    Likes Received:
    6,893
    I tried to use Nsight on Control but it says it's incompatible because it's using a D3D11On12 API layer. My thought is I can go back to that scene where I took the screen caps and set my resolution at 640x480 with performance DLSS so the internal resolution is something like 320x240. Then turn off ray tracing and see what I hit in terms of fps. I'll be cpu limited, but then I can turn on ray tracing and see what the frame time difference is. Then I can play with my gpu clock and see if that changes. Playing with my cpu clock seems more annoying. Maybe I can do that easily with Ryzen Master.

    Edit: I should also check whether it matters how many ray tracing effects I turn on. I have a feeling it won't matter, but maybe I'm wrong.

    Edit2: I also know this will vary scene to scene. There are a couple of rooms I'm aware of where I seem to hit this kind of frame cap, but in other places I can easily hit the fps cap I've set for gsync (138) with everything maxed out.
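
For the RT on/off comparison described in this post, it helps to compare frame times rather than fps, because the same fps drop means different costs at different frame rates. A minimal sketch, with hypothetical helper names and made-up example numbers (no measurements from the thread are used):

```python
def frame_time_ms(fps):
    """Convert a frame rate into the time spent on one frame."""
    return 1000.0 / fps


def rt_cost_ms(fps_rt_off, fps_rt_on):
    """Extra milliseconds per frame after enabling ray tracing."""
    return frame_time_ms(fps_rt_on) - frame_time_ms(fps_rt_off)


# Hypothetical numbers, only to show that fps deltas are not linear:
# losing 100 fps from 200 costs 5 ms per frame ...
print(round(rt_cost_ms(200, 100), 2))
# ... while losing 25 fps from 100 costs about 3.33 ms per frame.
print(round(rt_cost_ms(100, 75), 2))
```

If the millisecond cost stays roughly constant while the GPU clock changes, that would point away from the GPU as the source of the fixed cost.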
     
    #9 Scott_Arm, Dec 15, 2020
    Last edited: Dec 15, 2020
    BRiT likes this.
  10. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    13,021
    Likes Received:
    15,765
    Location:
    The North
    Why not just remove DLSS from the picture entirely for this benchmark? I've been following along, and I get why it seems weird that changing resolution with RT on isn't impacting performance. But enabling DLSS makes the issue harder to diagnose.

    The challenge is that DLSS will never run any faster; it's a fixed cost, and the only way to make it run faster is to increase the GPU clock speed so the network completes sooner. So quality, balanced, and performance are different networks, but each one has a fixed cost regardless of how low you bring the resolution. The speed of processing that network could be your frame rate limiter once you get to a high enough fps. If everything else is already under 1 ms, and your neural networks take a hypothetical 4, 5, and 6 ms (p, n, q) to complete, you're sort of stuck when it comes to obtaining more speed unless you get rid of it.
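
The arithmetic behind that argument can be sketched as follows, using the hypothetical 4, 5, and 6 ms figures from this post. The model assumes the fixed pass runs serially after the rest of the frame with no overlap, which is a simplification, and the function names are made up for illustration.

```python
def max_fps(render_ms, fixed_ms):
    """Frame rate when a fixed-cost pass follows the rest of the frame."""
    return 1000.0 / (render_ms + fixed_ms)


def fps_ceiling(fixed_ms):
    """Upper bound on fps as the rest of the frame shrinks toward 0 ms."""
    return 1000.0 / fixed_ms


# Hypothetical per-mode costs from the post, in milliseconds.
for fixed_ms in (4.0, 5.0, 6.0):
    at_1ms = max_fps(1.0, fixed_ms)
    ceiling = fps_ceiling(fixed_ms)
    print(f"fixed {fixed_ms} ms: {at_1ms:.0f} fps at 1 ms render, ceiling {ceiling:.0f} fps")
```

Under these assumptions, lowering the resolution only shrinks `render_ms`, so the frame rate approaches the ceiling but cannot pass it.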
     
    #10 iroboto, Dec 15, 2020
    Last edited: Dec 15, 2020
  11. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,759
    Likes Received:
    6,893
    The lower I can push the resolution down, the more confident I am that I'll be cpu limited. I'll play around with it though. There could be some frame rate where the tensor cores can no longer keep up. Not sure. Not a bad experiment in itself. I'll see how low the game will let me set the resolution, and then see if dlss starts to have a negative impact at extremely high frame rates.

    Edit: I may also grab Ghostrunner or something so I have a second game to experiment with.
     
    BRiT likes this.
  12. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    13,021
    Likes Received:
    15,765
    Location:
    The North
    I think if your goal is to find the breaking point of being CPU limited, I would definitely consider removing DLSS, as it will bottleneck performance eventually. It can't run any faster regardless of the resolution you set it at. The only things that affect DLSS computation times are (a) the computational power, i.e. tensor cores, clock rate, bandwidth, and (b) the size of the network.
    Since you can't change (b) with resolution, you're left with only (a).
     
    PSman1700, DavidGraham and Scott_Arm like this.
  13. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    8,274
    Likes Received:
    4,702
    Location:
    Pennsylvania
    Aren't Tensor cores on their own clock domain?
     
  14. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    13,021
    Likes Received:
    15,765
    Location:
    The North
    I just assumed they ran at the same clock rate as everything else.
     
  15. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,759
    Likes Received:
    6,893
    I do not know ...
     
  16. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,759
    Likes Received:
    6,893
    Black Ops Cold War seems like it has the same problem running at 360p:



    Watch Dogs Legion does not seem to have this problem.



    Seems like they're obviously doing different things game to game, but whatever Control and Cold War do leads to more severe frame time penalties under particular conditions that aren't resolved by lowering resolution.
     
    Man from Atlantis and BRiT like this.
  17. Man from Atlantis

    Regular

    Joined:
    Jul 31, 2010
    Messages:
    920
    Likes Received:
    734
  18. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,759
    Likes Received:
    6,893
    @Man from Atlantis In Control, if I compare RT on/off at 640x480 in the scene I posted above, I would not be surprised if I'm losing 100 fps.
     
  19. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    2,796
    Likes Received:
    1,978
    Location:
    Earth
    The extra cpu cost could be tied to deciding what data to use for BVH building. Depending on how the engine is built, the data might be readily available, or it might have to be mined from traditional data structures if the engine wasn't designed with RT in mind. Some assets for RT might also be different from what is used for rasterization. Collecting this data and feeding it to BVH building could consume significant cpu time even if the BVH building itself was done with compute.
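
As a toy illustration of that idea, the sketch below walks a scene each frame to collect the instance list that a top-level acceleration structure build would consume. Every name here is hypothetical and this is not the DXR API or any engine's code; it only shows that the CPU-side work depends on object count and not on render resolution.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    transform: tuple   # stand-in for a 3x4 world matrix
    blas_id: int       # which bottom-level structure the object uses
    in_rt_scene: bool  # whether the object participates in ray tracing


def gather_tlas_instances(objects):
    """Collect (blas_id, transform) pairs for the ray-traced scene.

    Runs on the CPU every frame; note that no resolution parameter
    appears anywhere in this function.
    """
    return [(o.blas_id, o.transform) for o in objects if o.in_rt_scene]


identity = (1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0)
scene = [
    SceneObject("desk", identity, blas_id=0, in_rt_scene=True),
    SceneObject("lamp", identity, blas_id=1, in_rt_scene=True),
    SceneObject("ui_overlay", identity, blas_id=2, in_rt_scene=False),
]

instances = gather_tlas_instances(scene)
print(len(instances), "instances gathered")
```

In a real engine the per-object work would be far heavier than a list comprehension, but the scaling argument is the same.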
     
    #19 manux, Dec 15, 2020
    Last edited: Dec 15, 2020
    Man from Atlantis likes this.
  20. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    2,206
    Likes Received:
    1,601
    Location:
    msk.ru/spb.ru
    BVH TLAS is built on CPU, this isn't new. Thus a game running completely CPU limited will show worse results with RT than without.
     
    PSman1700 likes this.