Draw Calls

When the CPU constructs a draw call, the GPU executes a different one in parallel, right? So a GPU should not be slowed down by draw call overhead. With PCs getting ever more CPU cores, will draw call overhead really be an issue within the next couple of years? It's not as if games right now are making 100% use of all the CPU power. I don't think this is going to change. Or am I missing something (which is very likely)?

I guess the first question really is: is the GPU often put into idle mode only because of draw call overhead (so not because there is no work to be done)? If the answer to that is 'yes', then the rest doesn't need to be answered...
 
In most situations the driver/CPU is multiple draw calls ahead of the GPU, so yes, they work in parallel. The GPU is only slowed down if it's starved for work.
 
I guess the first question really is: is the GPU often put into idle mode only because of draw call overhead (so not because there is no work to be done)? If the answer to that is 'yes', then the rest doesn't need to be answered...

No, because devs are already optimising for minimal draw calls?

I think people want to know why/if they need to put extra effort into optimising to minimise draw calls.
 
When the CPU constructs a draw call, the GPU executes a different one in parallel, right? So a GPU should not be slowed down by draw call overhead. With PCs getting ever more CPU cores, will draw call overhead really be an issue within the next couple of years? It's not as if games right now are making 100% use of all the CPU power. I don't think this is going to change. Or am I missing something (which is very likely)?
Draw calls are primarily a CPU problem, not a GPU problem. And strictly speaking it's not the number of draw calls that is the problem; it's the amount of state switching and figuring out where D3D/OpenGL resources actually are in hardware.
If you have lots of vertex/index buffers then the CPU will have to translate API handles to actual hardware addresses all the time. This isn't even a CPU problem that you could solve by having more cores or more threads; it depends a lot on memory latency.
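To make the "translate API handles to hardware addresses" cost concrete, here is a purely conceptual sketch of the kind of lookup a driver has to do on every bind. This is not real driver code; every name in it is invented for illustration, and real drivers do far more than this.

```cpp
// Purely illustrative model of per-bind driver bookkeeping -- not real driver
// code; ResourceTable, GpuAddress, etc. are made-up names for the example.
#include <cstdint>
#include <unordered_map>

using ApiHandle  = std::uint32_t;  // what the app sees (a D3D/GL "name")
using GpuAddress = std::uint64_t;  // where the resource actually lives

struct ResourceTable {
    std::unordered_map<ApiHandle, GpuAddress> map;  // handle -> hardware address

    // Every bind forces a lookup: a dependent pointer chase whose cost is set
    // by memory latency, not by how many cores are available.
    GpuAddress resolve(ApiHandle h) const { return map.at(h); }
};

void bind_buffers(const ResourceTable& table, ApiHandle vb, ApiHandle ib)
{
    GpuAddress vbAddr = table.resolve(vb);  // one lookup per buffer switch
    GpuAddress ibAddr = table.resolve(ib);
    // ... the driver would then patch these addresses into the command stream ...
    (void)vbAddr; (void)ibAddr;
}
```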

I did some tests a while ago... Basically it goes from one draw primitive call to 100k draw primitive calls, with a total budget of 15M triangles that stays the same throughout the entire run. Same texture, same shader; just flipping vertex and index buffers on each draw primitive call and uploading some constants.
This is D3D 11: https://static.slo-tech.com/52734.jpg
And this is multithreaded D3D 11 vs NV proprietary OpenGL extensions: https://static.slo-tech.com/52736.jpg
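For readers who want to picture the test loop, here is a rough D3D11-flavoured reconstruction of what "same texture, same shader, flip buffers, upload constants, draw" looks like. It is my own sketch under those assumptions, not the actual benchmark code; ctx, the buffer arrays and the vertex stride are placeholders.

```cpp
// Hedged reconstruction of the described test loop (not the original code).
// Assumes 'ctx' is a valid ID3D11DeviceContext*, 'vbs'/'ibs' are pre-created
// buffer arrays, 'cb' is a constant buffer, and shaders/texture are already bound.
#include <d3d11.h>

void run_frame(ID3D11DeviceContext* ctx,
               ID3D11Buffer* const* vbs, ID3D11Buffer* const* ibs,
               ID3D11Buffer* cb, UINT drawCalls, UINT indicesPerDraw)
{
    const UINT stride = sizeof(float) * 8;  // placeholder vertex layout
    const UINT offset = 0;

    for (UINT i = 0; i < drawCalls; ++i)
    {
        // Flip vertex and index buffers on every draw call...
        ctx->IASetVertexBuffers(0, 1, &vbs[i], &stride, &offset);
        ctx->IASetIndexBuffer(ibs[i], DXGI_FORMAT_R32_UINT, 0);

        // ...and upload some constants (e.g. a per-draw transform).
        float constants[16] = { /* per-draw data */ };
        ctx->UpdateSubresource(cb, 0, nullptr, constants, 0, 0);

        ctx->DrawIndexed(indicesPerDraw, 0, 0);
    }
}
```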
 
Draw calls are primarily a CPU problem, not a GPU problem. And strictly speaking it's not the number of draw calls that is the problem; it's the amount of state switching and figuring out where D3D/OpenGL resources actually are in hardware.
If you have lots of vertex/index buffers then the CPU will have to translate API handles to actual hardware addresses all the time. This isn't even a CPU problem that you could solve by having more cores or more threads; it depends a lot on memory latency.

I did some tests a while ago... Basically it goes from one draw primitive call to 100k draw primitive calls, with a total budget of 15M triangles that stays the same throughout the entire run. Same texture, same shader; just flipping vertex and index buffers on each draw primitive call and uploading some constants.
This is D3D 11: https://static.slo-tech.com/52734.jpg
And this is multithreaded D3D 11 vs NV proprietary OpenGL extensions: https://static.slo-tech.com/52736.jpg

What are the axes? The y axis seems to be million triangles/sec; not sure what you have for X.
 
When the CPU constructs a draw call, the GPU executes a different one in parallel, right? So a GPU should not be slowed down by draw call overhead. With PCs getting ever more CPU cores, will draw call overhead really be an issue within the next couple of years? It's not as if games right now are making 100% use of all the CPU power. I don't think this is going to change. Or am I missing something (which is very likely)?

I guess the first question really is: is the GPU often put into idle mode only because of draw call overhead (so not because there is no work to be done)? If the answer to that is 'yes', then the rest doesn't need to be answered...

It's usually more complicated than that. The apps usually kick out any data structures that the driver might be using, so: the more things you change -> more lookups -> more CPU stalls. There are more things going on as well, but this is one of them.
 
What are the axes? The y axis seems to be million triangles/sec; not sure what you have for X.
X is the number of draw primitive calls. Y is millions of triangles per second.
Sorry for the mess-up (it's quite early here :)).
 
Both have been labeled already. ;) There's a constant TRIs-per-frame count (15M), but the number of draw calls per frame changes (from 1 to 100k -- which means anywhere from 15M down to 150 TRIs per draw call). X is the number of draw calls per frame; Y is the number of TRIs rendered in one second, across all the frames that went into that 1-second span.
 
Do note the GTX 580 is limited quite a bit, first by its 1 triangle per clock limit and second by the decision on NV's side to keep index buffers in system memory, so they have to be shipped over PCI Express.
But what I found really interesting is how fast the Radeon, which does not have either of these "peak limiting" factors, falls off a cliff.
 
I have a noob question: when a draw call is issued for an object, is that it, or does it have to be issued for every frame that object remains on screen?
 
Unless you're drawing to secondary surfaces for special effects (image-based reflections, certain low-grade shadowing, etc.) that you might want to persist over a couple or a handful of frames (for performance reasons), you issue all your commands per frame.
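To spell that out, the usual structure is the toy loop below: everything that should be visible is re-submitted each frame, and only those special-case render targets carry over. Names and structure are invented for illustration, not tied to any real engine or API.

```cpp
// Generic per-frame loop sketch; every name below is a placeholder.
#include <cstdio>

struct Scene { int visibleObjects = 3; };

void draw_object(const Scene&, int i) { std::printf("  draw call for object %d\n", i); }
void present_frame()                  { std::printf("present\n"); }

int main()
{
    Scene scene;
    for (int frame = 0; frame < 2; ++frame)  // pretend the game runs two frames
    {
        std::printf("frame %d\n", frame);
        // Nothing "sticks" from the previous frame: every visible object gets
        // its state set up and its draw call(s) issued again, every frame.
        for (int i = 0; i < scene.visibleObjects; ++i)
            draw_object(scene, i);
        present_frame();
    }
}
```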


X is the number of draw primitive calls. Y is millions of triangles per second.
Sorry for the mess-up (it's quite early here :)).

I'm surprised OGL (plus NV extensions) is so efficient here. I'm assuming you're using the latest OGL ICD, correct? Having said that, there are no games using over 10K calls right now (correct?), or such a constant geometry/lighting/shading setup. EDIT: btw, I'm not pointing out flaws in your methodology or your choice of variables to test, I'm trying to understand how this data relates to the real world™
 
Draw calls are primarily a CPU problem, not a GPU problem. And strictly speaking it's not the number of draw calls that is the problem; it's the amount of state switching and figuring out where D3D/OpenGL resources actually are in hardware.
If you have lots of vertex/index buffers then the CPU will have to translate API handles to actual hardware addresses all the time. This isn't even a CPU problem that you could solve by having more cores or more threads; it depends a lot on memory latency.

I did some tests a while ago... Basically it goes from one draw primitive call to 100k draw primitive calls, with a total budget of 15M triangles that stays the same throughout the entire run. Same texture, same shader; just flipping vertex and index buffers on each draw primitive call and uploading some constants.
This is D3D 11: https://static.slo-tech.com/52734.jpg
And this is multithreaded D3D 11 vs NV proprietary OpenGL extensions: https://static.slo-tech.com/52736.jpg
Thanks for posting this. Very interesting.

I saw your reply on Timothy Lottes' blog.

I went through the Nvidia presentation about the extension that replaces OpenGL names with GPU addresses. You commented that it's a good thing that Kepler has bindless textures. Does this mean that this particular NV extension isn't useful anymore, or are there still cases where it is?

The way I think it works is that it's now up to the GPU to do the conversion between a GL name and an address, instead of having the CPU driver do it?

Also: does DX work the same way as OpenGL, in that there is a translation between names and GPU address too? If so, I assume that Kepler would fix the same problem for DX as well?

Finally, this has nothing to do with the virtual textures of AMD, right? Does AMD already have bindless textures?

Thanks!
 
Thanks for posting this. Very interesting.

I saw your reply on Timothy Lottes' blog.

I went through the Nvidia presentation about the extension that replaces OpenGL names with GPU addresses. You commented that it's a good thing that Kepler has bindless textures. Does this mean that this particular NV extension isn't useful anymore, or are there still cases where it is?
I think that extension is now backed by hw features.
The way I think it works is that it's now up to the GPU to do the conversion between a GL name and an address, instead of having the CPU driver do it?

Also: does DX work the same way as OpenGL, in that there is a translation between names and GPU address too? If so, I assume that Kepler would fix the same problem for DX as well?
Not unless DX evolves to have this feature. The whitepaper says so.

Finally, this has nothing to do with the virtual textures of AMD, right? Does AMD already have bindless textures?

Thanks!
 
I went through the Nvidia presentation about the extension that replaces OpenGL names with GPU addresses. You commented that it's a good thing that Kepler has bindless textures. Does this mean that this particular NV extension isn't useful anymore, or are there still cases where it is?
Of course it's still useful. The two existing extensions handle vertex/index buffers and constant buffers. The new extension allows handling textures in the same way.

The way I think it works is that it's now up to the GPU to do the conversion between a GL name and an address, instead of having the CPU driver do it?
That's always been up to the GPU driver. But with these extensions you can ask the GPU driver to give you the resource's address in the GPU virtual address space (basically a pointer) once, and then use it every time you switch resources.
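If it helps to see what "get the address once, reuse it on every switch" looks like, here is a minimal sketch against the NV_shader_buffer_load / NV_vertex_buffer_unified_memory entry points (assuming GLEW for the prototypes). Treat it as an assumption-laden illustration rather than a recipe: it skips error handling and the vertex-format setup (glVertexAttribFormatNV etc.), and the buffer objects are assumed to be created and filled elsewhere.

```cpp
// Minimal illustration of the bindless-buffer idea, assuming the
// NV_shader_buffer_load and NV_vertex_buffer_unified_memory extensions are
// available. 'vbo'/'ibo' are pre-existing, filled buffer objects.
#include <GL/glew.h>

GLuint64EXT vboAddr = 0, iboAddr = 0;

void query_addresses_once(GLuint vbo, GLuint ibo)
{
    // One-time: make the buffers resident and fetch their GPU virtual addresses.
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glMakeBufferResidentNV(GL_ARRAY_BUFFER, GL_READ_ONLY);
    glGetBufferParameterui64vNV(GL_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV, &vboAddr);

    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
    glMakeBufferResidentNV(GL_ELEMENT_ARRAY_BUFFER, GL_READ_ONLY);
    glGetBufferParameterui64vNV(GL_ELEMENT_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV, &iboAddr);
}

void draw_with_addresses(GLsizeiptr vboSize, GLsizeiptr iboSize, GLsizei indexCount)
{
    // Per draw: hand the GPU the raw addresses instead of rebinding by name,
    // so the driver skips the handle-to-address lookup on every switch.
    glEnableClientState(GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV);
    glEnableClientState(GL_ELEMENT_ARRAY_UNIFIED_NV);

    glBufferAddressRangeNV(GL_VERTEX_ATTRIB_ARRAY_ADDRESS_NV, 0, vboAddr, vboSize);
    glBufferAddressRangeNV(GL_ELEMENT_ARRAY_ADDRESS_NV, 0, iboAddr, iboSize);

    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, nullptr);
}
```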

Also: does DX work the same way as OpenGL, in that there is a translation between names and GPU address too? If so, I assume that Kepler would fix the same problem for DX as well?
Yes, DX is the same. But this is not accessible from DX at all. DX is a completely fixed API; you don't have extensions. None of this bindless stuff does anything for DX. At least not until DX 12 or something like that.

Finally, this has nothing to do with the virtual textures of AMD, right? Does AMD already have bindless textures?
Yes, that's a completely different thing. Is there any documentation for that extension, though?

I'm surprised OGL (plus NV extensions) is so efficient here. I'm assuming you're using the latest OGL ICD, correct?
It's not that OpenGL is much more efficient. Without the bindless extensions it's pretty much the same deal as DX.

Having said that, there are no games using over 10K calls right now (correct?), or such a constant geometry/lighting/shading setup.
Yeah, that's where things get a bit tricky. Modern games can get to 10K calls, but they have a more complex setup (they change textures, shaders, ...). But neither do they render 16M triangles (I was off by 1M earlier :)).

Some more graphs may be interesting...
If you wonder how things have improved over the years: http://static.slo-tech.com/52728.jpg
The triangle count is a bit lower in this case (500k triangles), since the 7600 GS seems to have a problem with 32-bit indices. This is a logarithmic scale, since the numbers drop so sharply, so fast, it's not even funny. :) The 7600 GS (AGP) runs on an Athlon XP 2800+; the GTX 580 and Radeon 6950 are on a Q9550.
 