Draw Calls

Discussion in 'Beginners Zone' started by DavidGraham, Feb 18, 2012.

  1. silent_guy

    Veteran

    Joined:
    Mar 7, 2006
    Messages:
    3,425
    When the CPU constructs a draw call, the GPU executes a different one in parallel, right? So a GPU shouldn't be slowed down by draw call overhead. With PCs getting ever more CPU cores, will draw call overhead really be an issue within the next couple of years? It's not as if games right now are making 100% use of all the CPU power, and I don't think this is going to change. Or am I missing something (which is very likely)?

    I guess the first question really is: is the GPU often put into idle mode only because of draw call overhead (so not because there is no work to be done)? If the answer to that is 'yes', then the rest doesn't need to be answered...
     
  2. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,340
    In most situations the driver/CPU is multiple draw calls ahead of the GPU so yes, they work in parallel. The GPU is only slowed down if it's starved for work.
     
  3. Rodéric

    Rodéric a.k.a. Ingenu
    Moderator Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,476
    Location:
    Planet Earth.
    No, because devs are already optimising to minimise draw calls?

    I think people want to know why, and whether, they need to put extra effort into minimising draw calls.
     
  4. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    544
    Location:
    Slovenia
    Draw calls are primarily a CPU problem, not a GPU problem. And strictly speaking it's not the number of draw calls that is the problem; it's the amount of state switching and figuring out where D3D/OpenGL resources actually are in hardware.
    If you have lots of vertex/index buffers then the CPU will have to translate API handles into actual hardware addresses all the time. This isn't even a CPU problem that you could solve by having more cores or more threads; it depends a lot on memory latency.

    I did some tests a while ago... Basically it goes from one draw-primitive call up to 100k draw-primitive calls, with a total budget of 15M triangles that stays the same throughout the entire run. Same texture, same shader, just flipping vertex and index buffers on each draw-primitive call and uploading some constants.
    This is D3D 11: https://static.slo-tech.com/52734.jpg
    And this is multithreaded D3D 11 vs NV proprietary OpenGL extensions: https://static.slo-tech.com/52736.jpg
     
  5. Rodéric

    Rodéric a.k.a. Ingenu
    Moderator Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,476
    Location:
    Planet Earth.
    Quite interesting.
     
  6. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Location:
    /
    What are the axes? The Y axis seems to be millions of triangles/sec; not sure what you have for X.
     
  7. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Location:
    /
    It's usually more complicated than that. The app usually kicks out any data structures that the driver might be using, so the more things you change, the more lookups, and the more CPU stalls. There are other things going on as well, but this is one of them.
     
  8. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    544
    Location:
    Slovenia
    X is the number of draw-primitive calls. Y is millions of triangles per second.
    Sorry for the mess-up (it's quite early here :)).
     
  9. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Location:
    /
    And what is X?
     
  10. Dominik D

    Regular

    Joined:
    Mar 23, 2007
    Messages:
    780
    Location:
    Wroclaw, Poland
    Both have been labeled already. ;) There's a constant TRI-per-frame count (15M) but the number of draw calls per frame changes (from 1 to 100k, meaning anywhere from 15M down to 150 TRIs per draw call). X is the number of draw calls per frame; Y is the number of TRIs rendered in one second, across all the frames that fit into that second.
     
  11. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    544
    Location:
    Slovenia
    Do note the GTX 580 is limited quite a bit, first by its 1-triangle-per-clock limit and second by NVIDIA's decision to keep index buffers in system memory, so they have to be shipped over PCI Express.
    But what I found really interesting is how fast the Radeon, which does not have any of these "peak limiting" factors, falls off a cliff.
     
  12. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    12,924
    I have a noob question:
    when a draw call is issued for an object, is that it, or does it have to be issued for every frame that the object remains on screen?
     
  13. Richard

    Richard Mord's imaginary friend
    Veteran

    Joined:
    Jan 22, 2004
    Messages:
    3,508
    Location:
    PT, EU
    Unless you're drawing to secondary surfaces for special effects (image-based reflections, certain low-grade shadowing, etc.) that you might want to persist over a couple or a handful of frames (for performance reasons), you issue all your commands every frame.

    Multi-GPU rendering may not apply, see your doctor for advisement

    I'm surprised OGL (plus NV extensions) is so efficient here. I'm assuming you're using the latest OGL ICD, correct? Having said that, there are no games using over 10K calls right now (correct?), or such constant geometry/lighting/shading setup. EDIT: btw, I'm not pointing out flaws in your methodology or choice of variables to test; I'm trying to understand how this data relates to the real world™
     
  14. silent_guy

    Veteran

    Joined:
    Mar 7, 2006
    Messages:
    3,425
    Thanks for posting this. Very interesting.

    I saw your reply on Timothy Lottes' blog.

    I went through the Nvidia presentation about the extension that replaces OpenGL names with GPU addresses. You commented that it's a good thing that Kepler has bindless textures. Does this mean that this particular NV extension isn't useful anymore, or are there still cases where it is?

    The way I think it works is that it's now up to the GPU to do the conversion between a GL name and an address, instead of having the CPU driver do it?

    Also: does DX work the same way as OpenGL, in that there is a translation between names and GPU address too? If so, I assume that Kepler would fix the same problem for DX as well?

    Finally, this has nothing to do with the virtual textures of AMD, right? Does AMD already have bindless textures?

    Thanks!
     
  15. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Location:
    /
    I think that extension is now backed by hw features.
    Not unless DX evolves to have this feature. The whitepaper says so.

     
  16. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    544
    Location:
    Slovenia
    Of course it's still useful. The two existing extensions handle vertex/index buffers and constant buffers; the new extension allows handling textures in the same way.

    That's always been up to the GPU driver. But with these extensions you can ask the driver to give you a resource's address in GPU virtual address space (basically a pointer) once, and then use it every time you switch resources.

    Yes, DX is the same. But this is not accessible from DX at all; DX is a completely fixed API, you don't have extensions. None of this bindless stuff does anything for DX, at least until DX 12 or something like that.

    Yes, that's a completely different thing. Is there any documentation for that extension, though?

    It's not that OpenGL is much more efficient. Without the bindless extensions it's pretty much the same deal as DX.

    Yeah, that's where things get a bit tricky. Modern games can get to 10K calls, but they have a more complex setup (they change textures, shaders, ...). But neither do they render 16M triangles (I was off by 1M earlier :)).

    Some more graphs may be interesting...
    If you're wondering how things improved over the years: http://static.slo-tech.com/52728.jpg
    The triangle count is a bit lower in this case (500k triangles), since the 7600 GS seems to have a problem with 32-bit indices. This is a logarithmic scale, since the numbers drop so sharply so fast it's not even funny. :) The 7600 GS (AGP) runs on an AthlonXP 2800+; the GTX 580 and Radeon 6950 are on a Q9550.
     


  • About Beyond3D

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.