Draw Calls

Discussion in 'Beginners Zone' started by DavidGraham, Feb 18, 2012.

  1. silent_guy

    Veteran

    Joined:
    Mar 7, 2006
    Messages:
    3,425
    When the CPU constructs a draw call, the GPU executes a different one in parallel, right? So a GPU shouldn't be slowed down by draw call overhead. With PCs getting ever more CPU cores, will draw call overhead really be an issue within the next couple of years? It's not as if games right now are making 100% use of all the CPU power, and I don't think this is going to change. Or am I missing something (which is very likely)?

    I guess the first question really is: is the GPU often put into idle mode only because of draw call overhead (so not because there is no work to be done)? If the answer to that is 'yes', then the rest doesn't need to be answered...
     
  2. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,340
    In most situations the driver/CPU is multiple draw calls ahead of the GPU so yes, they work in parallel. The GPU is only slowed down if it's starved for work.
     
  3. Rodéric

    Rodéric a.k.a. Ingenu
    Moderator Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,476
    Location:
    Planet Earth.
    No, because devs are already optimising to minimise draw calls?

    I think people want to know why, and whether, they need to put extra effort into minimising draw calls.
     
  4. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    544
    Location:
    Slovenia
    Draw calls are primarily a CPU problem, not a GPU problem. And strictly speaking it's not the number of draw calls that is the problem; it's the amount of state switching and figuring out where D3D/OpenGL resources actually are in hardware.
    If you have lots of vertex/index buffers then the CPU will have to translate API handles into actual hardware addresses all the time. This isn't even a CPU problem that you could solve by having more cores or more threads; it depends a lot on memory latency.

    I did some tests a while ago... Basically it goes from one draw-primitive call up to 100k draw-primitive calls, with a total budget of 15M triangles that stays the same throughout the entire run. Same texture, same shader, just flipping vertex and index buffers on each draw-primitive call and uploading some constants.
    This is D3D 11: https://static.slo-tech.com/52734.jpg
    And this is multithreaded D3D 11 vs NV proprietary OpenGL extensions: https://static.slo-tech.com/52736.jpg
     
  5. Rodéric

    Rodéric a.k.a. Ingenu
    Moderator Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,476
    Location:
    Planet Earth.
    Quite interesting.
     
  6. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Location:
    /
    What are the axes? The Y axis seems to be millions of triangles/sec; not sure what you have for X.
     
  7. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Location:
    /
    It's usually more complicated than that. The app usually kicks out any data structures that the driver might be using, so the more things you change, the more lookups, and the more CPU stalls. There are other things going on as well, but this is one of them.
     
  8. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    544
    Location:
    Slovenia
    X is the number of draw-primitive calls. Y is millions of triangles per second.
    Sorry for the mess-up (it's quite early here :)).
     
  9. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Location:
    /
    And what is X?
     
  10. Dominik D

    Regular

    Joined:
    Mar 23, 2007
    Messages:
    780
    Location:
    Wroclaw, Poland
    Both have been labeled already. ;) There's a constant TRI-per-frame count (15M) but the number of draw calls per frame changes (from 1 to 100k, meaning anywhere from 15M down to 150 TRIs per draw call). X is the number of draw calls per frame; Y is the number of TRIs rendered in one second, across all the frames that fit into that second.
     
  11. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    544
    Location:
    Slovenia
    Do note the GTX 580 is limited quite a bit, first by its 1-triangle-per-clock limit and second by NVIDIA's decision to keep index buffers in system memory, so they have to be shipped over PCI Express.
    But what I found really interesting is how fast the Radeon, which does not have any of these "peak limiting" factors, falls off a cliff.
     
  12. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    12,924
    I have a noob question:
    when a draw call is issued for an object, is that it, or does it have to be issued for every frame that the object remains on screen?
     
  13. Richard

    Richard Mord's imaginary friend
    Veteran

    Joined:
    Jan 22, 2004
    Messages:
    3,508
    Location:
    PT, EU
    Unless you're drawing to secondary surfaces for special effects (image-based reflections, certain low-grade shadowing, etc.) that you might want to persist over a couple or a handful of frames (for performance reasons), you issue all your commands every frame.

    Multi-GPU rendering may not apply, see your doctor for advisement

    I'm surprised OGL (plus NV extensions) is so efficient here. I'm assuming you're using the latest OGL ICD, correct? Having said that, there are no games using over 10K calls right now (correct?), or such constant geometry/lighting/shading setup. EDIT: btw, I'm not pointing out flaws in your methodology or choice of variables to test; I'm trying to understand how this data relates to the real world™
     
  14. silent_guy

    Veteran

    Joined:
    Mar 7, 2006
    Messages:
    3,425
    Thanks for posting this. Very interesting.

    I saw your reply on Timothy Lottes' blog.

    I went through the Nvidia presentation about the extension that replaces OpenGL names with GPU addresses. You commented that it's a good thing that Kepler has bindless textures. Does this mean that this particular NV extension isn't useful anymore, or are there still cases where it is?

    The way I think it works is that it's now up to the GPU to do the conversion between a GL name and an address, instead of having the CPU driver do it?

    Also: does DX work the same way as OpenGL, in that there is a translation between names and GPU address too? If so, I assume that Kepler would fix the same problem for DX as well?

    Finally, this has nothing to do with the virtual textures of AMD, right? Does AMD already have bindless textures?

    Thanks!
     
  15. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Location:
    /
    I think that extension is now backed by hw features.
    Not unless DX evolves to have this feature. The whitepaper says so.

     
  16. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    544
    Location:
    Slovenia
    Of course it's still useful. The two existing extensions handle vertex/index buffers and constant buffers; the new extension allows handling textures in the same way.

    That's always been up to the GPU driver. But with these extensions you can ask the driver to give you a resource's address in GPU virtual address space (basically a pointer) once, and then use it every time you switch resources.

    Yes, DX is the same. But this is not accessible from DX at all; DX is a completely fixed API, you don't have extensions. None of this bindless stuff does anything for DX, at least until DX 12 or something like that.

    Yes, that's a completely different thing. Is there any documentation for that extension, though?

    It's not that OpenGL is much more efficient. Without the bindless extensions it's pretty much the same deal as DX.

    Yeah, that's where things get a bit tricky. Modern games can get to 10K calls, but they have a more complex setup (they change textures, shaders, ...). But neither do they render 16M triangles (I was off by 1M earlier :)).

    Some more graphs may be interesting...
    If you're wondering how things improved over the years: http://static.slo-tech.com/52728.jpg
    The triangle count is a bit lower in this case (500k triangles), since the 7600 GS seems to have a problem with 32-bit indices. This is a logarithmic scale, since the numbers drop so sharply so fast it's not even funny. :) The 7600 GS (AGP) runs on an AthlonXP 2800+; the GTX 580 and Radeon 6950 are on a Q9550.
     


  • About Beyond3D

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.