Old 18-Feb-2012, 04:57   #1
DavidGraham
Senior Member
 
Join Date: Dec 2009
Posts: 1,026
Draw Calls

Why are draw calls on the PC more expensive than on the Xbox 360 (for example)? Developers can issue more of them on the Xbox than on the PC, which sounds pretty strange considering that PCs have far more GPU and CPU horsepower.

Old 18-Feb-2012, 07:52   #2
sebbbi
Senior Member

Join Date: Nov 2007
Posts: 1,289

On consoles you can write commands directly to the GPU ring buffer, in a format that the GPU hardware understands natively. It's just a few lines of code to add a single draw call.

The PC has both user-space and kernel-space drivers that process the draw calls. More than one piece of software can be adding GPU commands simultaneously, so the driver must synchronize and save/restore GPU state accordingly (a single mutex lock is already over 1000 cycles). The commands must be translated by the driver into a format understood by the particular GPU (there are many different manufacturers and GPU families), and the commands and modified data must be sent over a standardized external bus to the GPU. On the Xbox, for example, the GPU and CPU share the same memory, so nothing needs to be sent over a relatively slow bus.

On consoles you can also edit GPU resources without locking them if you are sure the GPU is not currently using them. On the PC everything must be properly synchronized, and all commands and resource references must be validated (software cannot be allowed to crash the GPU or to modify or access another program's data). PC drivers also manage GPU memory automatically (paging resources in and out based on usage). Depending on the allocator/cache algorithms used, this can be relatively expensive as well.
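
To illustrate the console path, here is a minimal sketch, assuming a hypothetical ring-buffer layout and packet encoding (every real GPU family defines its own):

[code]
#include <cstdint>
#include <cstddef>

// Hypothetical packet headers; real GPUs each define their own encoding.
enum : uint32_t { PKT_SET_INDEX_BUFFER = 0x10, PKT_DRAW_INDEXED = 0x20 };

struct RingBuffer {
    uint32_t* base;     // memory the GPU command processor reads from
    size_t    writePos; // CPU write offset, in dwords
    size_t    sizeDw;   // total size, in dwords

    void write(uint32_t dw) {
        base[writePos] = dw;
        writePos = (writePos + 1) % sizeDw; // real code must also respect the GPU read pointer
    }
};

// "Just a few lines of code to add a single draw call":
void drawIndexed(RingBuffer& rb, uint64_t ibGpuAddr, uint32_t indexCount) {
    rb.write(PKT_SET_INDEX_BUFFER);
    rb.write(static_cast<uint32_t>(ibGpuAddr));
    rb.write(static_cast<uint32_t>(ibGpuAddr >> 32));
    rb.write(PKT_DRAW_INDEXED);
    rb.write(indexCount);
    // No mutex, no validation, no translation: the GPU consumes these dwords as-is.
}
[/code]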

Old 18-Feb-2012, 17:32   #3
3dcgi
Senior Member

Join Date: Feb 2002
Posts: 2,200

Part 1 of the series linked below expands on how the application and the drivers interact under DirectX.
http://fgiesen.wordpress.com/2011/07...ne-2011-index/

Old 18-Feb-2012, 22:07   #4
swaaye
Entirely Suboptimal

Join Date: Mar 2003
Location: WI, USA
Posts: 7,283

sebbbi, that sure sounds like a mess. Oh, the ugliness of a general-purpose, expandable system.

Old 18-Feb-2012, 22:35   #5
hoho
Senior Member

Join Date: Aug 2007
Location: Estonia
Posts: 1,218

At least up until DX11, OpenGL had far smaller draw-call overhead than DX. Not sure how things are now.

Also, having to feed the GPU a command stream in a specific format isn't all that fun any more once you have to deal with more than a couple of different GPUs, or even different versions of the same GPU core. Consoles can allow that kind of "ugliness" because they use fixed hardware for years and have no incompatibility problems.

Old 19-Feb-2012, 07:22   #6
pcchen
Moderator

Join Date: Feb 2002
Location: Taiwan
Posts: 2,485

If GPUs become more mature, it may be possible to have a somewhat fixed hardware "instruction set" for them, and that would allow a much leaner driver stack on the PC.

There are, of course, some problems. The obvious one is who gets to design this "instruction set." Most hardware 'standards' develop from a single product by a single company: the product becomes very popular and is then used as a de facto standard (and may later become a real industry standard). It's really hard to create a new standard out of nothing, and design by committee doesn't work. Microsoft is another possible candidate, but they probably don't understand the underlying hardware architecture well enough to produce a good design.

Another way is to design an "intermediate code" that software translates into hardware commands. But then this is not very different from a command buffer, and it's probably not going to bring much of a performance advantage.
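
As a sketch of what that translation layer would mean (the opcode names and packet encodings below are entirely made up), it is little more than a loop, which is why it ends up looking like just another command buffer:

[code]
#include <cstdint>
#include <vector>

// Hypothetical portable command stream...
enum class Cmd : uint32_t { BindTexture, Draw };
struct CmdEntry { Cmd op; uint32_t arg; };

// ...and one vendor's translator into its native packets.
void translateForVendorX(const std::vector<CmdEntry>& in, std::vector<uint32_t>& out) {
    for (const CmdEntry& c : in) {
        switch (c.op) {
        case Cmd::BindTexture:
            out.push_back(0xA100); // vendor X's "set texture" packet header
            out.push_back(c.arg);
            break;
        case Cmd::Draw:
            out.push_back(0xA200); // vendor X's "draw" packet header
            out.push_back(c.arg);  // vertex count
            break;
        }
    }
}
[/code]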

There are other problems too. For example, since the driver would no longer be able to do the safekeeping work, the hardware would have to. Basically you'd want the GPU to be like a CPU, with all the security modes and controls. Personally I think this is a good thing: as GPUs get more flexible and GPGPU spreads, there will be more security problems to deal with, so it's probably better done in hardware anyway.

Old 19-Feb-2012, 17:32   #7
ERP
Moderator

Join Date: Feb 2002
Location: Redmond, WA
Posts: 3,669

Modern GPU hardware is already a pretty close copy of the API.
Translating isn't really the issue.

The predominant problem is having to deal with multiple processes sharing the GPU. This will not change short of adding hardware to the GPU to enable fast context switches, which at some point may be justifiable.

DX11 and the Win7 driver model remove a lot of the superfluous driver overhead that used to exist. Plus you finally get command buffers, and state isn't global in the same way.
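
For example, a sketch of the D3D11 state-object approach (error handling omitted): the heavyweight validation moves from draw time to creation time.

[code]
#include <d3d11.h>

// D3D10/11 replaces D3D9's many global SetRenderState toggles with a few
// immutable state objects that are created and validated once, up front.
ID3D11RasterizerState* CreateSolidNoCullState(ID3D11Device* device) {
    D3D11_RASTERIZER_DESC desc = {};
    desc.FillMode        = D3D11_FILL_SOLID;
    desc.CullMode        = D3D11_CULL_NONE;
    desc.DepthClipEnable = TRUE;

    ID3D11RasterizerState* state = nullptr;
    device->CreateRasterizerState(&desc, &state); // validated here, not per draw
    return state;
}

// Per frame, binding it is then a single cheap call:
//     context->RSSetState(state);
[/code]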

You still have the stupid stuff in PC drivers: fixes and workarounds for poorly optimized or broken game code, which eat a lot of CPU since they involve analyzing everything going to the GPU. Of course devs are then forced to work around those workarounds, which the driver writers in turn have to detect and fix...

And of course new PC GPUs are optimized to run last year's best tech.

Old 19-Feb-2012, 17:44   #8
pcchen
Moderator

Join Date: Feb 2002
Location: Taiwan
Posts: 2,485

Quote:
Originally Posted by ERP View Post
Modern GPU hardware is already a pretty close copy of the API.
Translating isn't really the issue.

The predominant problem is having to deal with multiple processes sharing the GPU. This will not change short of adding hardware to the GPU to enable fast context switches, which at some point may be justifiable.
My point is: yes, you'd want the GPU to handle these things, including memory protection and more. That probably requires a small CPU on the GPU to do some of the task-management work.

Then we could standardize these commands so that applications could send them to the GPU directly, without any extra overhead. Older applications could still be handled by drivers, while newer applications would access the GPU more directly.

Of course, on a normal desktop OS you probably still can't let applications access the GPU directly, as that involves memory-mapped I/O, which needs to happen in kernel mode. Still, the overhead should be much lower than what we have now.

Old 20-Feb-2012, 18:35   #9
Dominik D
Member

Join Date: Mar 2007
Location: Wroclaw, Poland
Posts: 701

Quote:
Originally Posted by pcchen View Post
If GPUs become more mature, it may be possible to have a somewhat fixed hardware "instruction set" for them, and that would allow a much leaner driver stack on the PC.
This is not going to happen.
1. HW from different vendors is too dissimilar for a common ISA. Not to mention that instructions are not everything GPUs process: there's state involved as well, and it's entirely HW-specific.
2. Part of what goes into the buffer is so tied to the hardware that it may be covered by patents (not that I know anything about patents, just assuming this may very well be the case).
3. Even for the actual code, there's huge variation in what gets encoded in the command buffer and how, depending on the HW you're feeding.

Old 21-Feb-2012, 16:19   #10
pcchen
Moderator

Join Date: Feb 2002
Location: Taiwan
Posts: 2,485

Quote:
Originally Posted by Dominik D View Post
This is not going to happen.
1. HW from different vendors is too dissimilar for a common ISA. Not to mention that instructions are not everything GPUs process: there's state involved as well, and it's entirely HW-specific.
2. Part of what goes into the buffer is so tied to the hardware that it may be covered by patents (not that I know anything about patents, just assuming this may very well be the case).
3. Even for the actual code, there's huge variation in what gets encoded in the command buffer and how, depending on the HW you're feeding.
Well, it's not likely in the immediate future, but never say never.
HW from different vendors is probably going to be a moot point, as the number of important GPU vendors in the x86 space is now down to three, and they probably all have cross-licensing deals, so patents aren't a serious problem. Advances in GPGPU are also bringing GPUs from different vendors closer together. It probably won't happen for a few years yet, but at least it's not technically impossible, and with enough incentive they might want to do it.

But that brings us to the main point: is there enough incentive for the IHVs to do this? Right now I don't see it happening, as there is no strong demand for very-high-performance desktop graphics.

Old 07-Mar-2012, 18:23   #11
darkblu
Senior Member

Join Date: Feb 2002
Posts: 2,642

Quote:
Originally Posted by pcchen View Post
If GPUs become more mature, it may be possible to have a somewhat fixed hardware "instruction set" for them, and that would allow a much leaner driver stack on the PC.

There are, of course, some problems. The obvious one is who gets to design this "instruction set." Most hardware 'standards' develop from a single product by a single company: the product becomes very popular and is then used as a de facto standard (and may later become a real industry standard). It's really hard to create a new standard out of nothing, and design by committee doesn't work. Microsoft is another possible candidate, but they probably don't understand the underlying hardware architecture well enough to produce a good design.

Another way is to design an "intermediate code" that software translates into hardware commands. But then this is not very different from a command buffer, and it's probably not going to bring much of a performance advantage.

There are other problems too. For example, since the driver would no longer be able to do the safekeeping work, the hardware would have to. Basically you'd want the GPU to be like a CPU, with all the security modes and controls. Personally I think this is a good thing: as GPUs get more flexible and GPGPU spreads, there will be more security problems to deal with, so it's probably better done in hardware anyway.
While I agree with most of the points you bring up, I'm not sure the (in)efficiency of today's desktop draw call is all that related to the GPU ISA per se. State models, buffer sync/lock mechanisms and the like are much more relevant to the subject than what the compiler outputs. With the advent of binary-program APIs you could even consider the compile stage (near) zero-cost these days, and that would not change the cost of a draw call much. I mean, draw calls were a factor to reckon with long before GPU ISAs were a topic of conversation.
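
For reference, a sketch of the binary-program path via glGetProgramBinary/glProgramBinary (core in GLES 3.0; the disk-caching plumbing is omitted):

[code]
#include <GLES3/gl3.h>
#include <vector>

// Save a linked program's binary so later runs can skip compilation.
std::vector<unsigned char> SaveBinary(GLuint program, GLenum& formatOut) {
    GLint length = 0;
    glGetProgramiv(program, GL_PROGRAM_BINARY_LENGTH, &length);
    std::vector<unsigned char> blob(length);
    glGetProgramBinary(program, length, nullptr, &formatOut, blob.data());
    return blob; // persist this together with formatOut
}

bool LoadBinary(GLuint program, GLenum format, const std::vector<unsigned char>& blob) {
    glProgramBinary(program, format, blob.data(), (GLsizei)blob.size());
    GLint ok = GL_FALSE;
    glGetProgramiv(program, GL_LINK_STATUS, &ok); // fails if the driver/GPU changed
    return ok == GL_TRUE;                         // fall back to compiling from source
}
[/code]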

Old 08-Mar-2012, 03:07   #12
3dcgi
Senior Member

Join Date: Feb 2002
Posts: 2,200

I assumed pcchen wasn't referring to the shader ISA, but rather to a theoretical command-buffer "ISA".

Old 08-Mar-2012, 10:20   #13
darkblu
Senior Member

Join Date: Feb 2002
Posts: 2,642

Quote:
Originally Posted by 3dcgi View Post
I assumed pcchen wasn't referring to the shader ISA, but rather to a theoretical command-buffer "ISA".
On a second read, I think that must be the case. My bad.

Old 16-Mar-2012, 22:14   #14
homerdog
hardly a Senior Member

Join Date: Jul 2008
Location: still camping with a mauler
Posts: 4,352

So what can be done to reduce the cost of draw calls on PC? I know instancing is used to reduce the number of draw calls needed, but is there a way to actually reduce the amount of CPU time needed per call?
Keep in mind my understanding of these things could not even be called "beginner". More like "ignorant spectator".
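
For context on the instancing point, a D3D11-flavored sketch (buffer/shader setup and the per-instance data plumbing are omitted):

[code]
#include <d3d11.h>

// N separate draws: the per-call CPU overhead is paid N times.
void DrawNaive(ID3D11DeviceContext* ctx, UINT indexCount, UINT objectCount) {
    for (UINT i = 0; i < objectCount; ++i) {
        // per-object constant updates omitted
        ctx->DrawIndexed(indexCount, 0, 0);
    }
}

// One instanced draw: the per-call cost is paid once; per-object data
// (transforms etc.) comes from a second vertex stream or is fetched in
// the vertex shader via SV_InstanceID.
void DrawInstanced(ID3D11DeviceContext* ctx, UINT indexCount, UINT objectCount) {
    ctx->DrawIndexedInstanced(indexCount, objectCount, 0, 0, 0);
}
[/code]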

Old 17-Mar-2012, 14:29   #15
Arwin
Now Officially a Top 10 Poster

Join Date: May 2006
Location: Maastricht, The Netherlands
Posts: 14,875

All I know is that Crytek pointed out what they considered the main remaining weaknesses in DirectX 11 and said they were working with Microsoft to sort them out. I don't know what became of that, actually; I would have expected to hear something about it by now, but maybe I just missed it.

Old 19-Mar-2012, 11:45   #16
Dominik D
Member

Join Date: Mar 2007
Location: Wroclaw, Poland
Posts: 701

Quote:
Originally Posted by homerdog View Post
So what can be done to reduce the cost of draw calls on PC?
Draw-call cost depends on the HW. Some things have to be translated for a given card, and that imposes extra CPU cost per call. One could imagine that modern hardware may not support certain topologies (triangle fans, I'd guess, are something most cards don't support directly; perhaps some support only plain triangles or only lists). But it's not really the draw call itself that kills you; it's the (unnecessary) state changes between draw calls, and the stuff that has to be translated. Pretty much every piece of modern HW out there emulates the fixed-function pipeline in the driver, so that's extra CPU cost for you. Unusual texture formats may require some processing. There's a lot happening beyond draw calls, and there are plenty of things you can do to minimize CPU usage.
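
One common way to keep those state changes down is to sort draw items by a packed state key, so identical state binds only once per run of draws. A sketch, with all names hypothetical:

[code]
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical draw item: a key packing shader/texture/blend state together.
struct DrawItem {
    uint64_t stateKey; // e.g. shaderId<<40 | textureId<<16 | blendMode
    int      meshId;
};

// Sort by state key so consecutive items share state; bind state only when
// the key changes, instead of before every single draw call.
void Submit(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) { return a.stateKey < b.stateKey; });

    uint64_t current = ~0ull;
    for (const DrawItem& it : items) {
        if (it.stateKey != current) {
            // BindState(it.stateKey);  // hypothetical: the only expensive part
            current = it.stateKey;
        }
        // Draw(it.meshId);             // hypothetical draw submission
    }
}
[/code]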

Old 19-Mar-2012, 14:37   #17
Rodéric
a.k.a. Ingenu

Join Date: Feb 2002
Location: Carnon Plage, France.
Posts: 2,885

Make GPUs standard just like CPUs, thank you.

I haven't investigated the cost of a draw call in a while; a lot happens under the hood for sure, but we've been used to minimizing them since D3D9...
A draw call basically gathers all the states and checks their validity/consistency before filling the command stream.

Can anyone working on drivers explain how this works in D3D10/11?
(I know the runtime does a lot because of NV, but I'd be curious to see how much is left to the driver. It could "just" be transcoding the D3D command stream into a GPU-specific one.)

Old 19-Mar-2012, 15:09   #18
darkblu
Senior Member

Join Date: Feb 2002
Posts: 2,642

Quote:
Originally Posted by Rodéric View Post
Make GPUs standard just like CPUs, thank you.
CPUs are not standard either ;) Moreover, a proper level of abstraction beats standardized hw most of the time (e.g. high-level programming languages vs. assembly, etc.).

Quote:
I haven't investigated the cost of a draw call in a while; a lot happens under the hood for sure, but we've been used to minimizing them since D3D9...
A draw call basically gathers all the states and checks their validity/consistency before filling the command stream.

Can anyone working on drivers explain how this works in D3D10/11?
(I know the runtime does a lot because of NV, but I'd be curious to see how much is left to the driver. It could "just" be transcoding the D3D command stream into a GPU-specific one.)
I can't speak for D3D, but I have observations from GLES. The driver there has two (or two-and-a-half) major functions:

1. state tracking
2. shader compilation (normally depending on both client shaders and active state)
3. interfacing with the kernel mem allocators for buffer objects management and related fences/syncs.

The last of those does not really belong in there, as it could be taken out of the driver and into a bog-standard "GPU buffer API" or, if you wish, a "DMA-coherent buffer API", perhaps even in flavors based on whether the device is MMU-equipped (so it can "comprehend" page tables) or not.

That said, we can optimize draw calls all we want, but they will never be 'free': they'll always cost CPU cycles, whether in housekeeping or in CPU/GPU rendezvous mechanisms.
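
As a sketch of what function 1, state tracking, can look like (the shadowed state and the packet emitters below are hypothetical):

[code]
#include <cstdint>

// Hypothetical shadowed GL-style state: the driver keeps a CPU-side copy and
// flushes only what actually changed when a draw call arrives.
struct ShadowState {
    uint32_t boundTexture = 0;
    bool     blendEnabled = false;
    bool     dirty        = false;
};

struct Tracker {
    ShadowState shadow;

    void setTexture(uint32_t tex) {
        if (shadow.boundTexture != tex) { shadow.boundTexture = tex; shadow.dirty = true; }
        // redundant binds are filtered out here, costing only a compare
    }

    void draw() {
        if (shadow.dirty) {
            // EmitStatePackets(shadow); // hypothetical: write HW state to the command buffer
            shadow.dirty = false;
        }
        // EmitDrawPacket();             // hypothetical draw packet
    }
};
[/code]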

Old 19-Mar-2012, 16:31   #19
Dominik D
Member

Join Date: Mar 2007
Location: Wroclaw, Poland
Posts: 701

Quote:
Originally Posted by Rodéric View Post
Make GPUs standard just like CPUs, thank you.
Sure. Who's going to create the de facto standard the way Intel's x86 is? I vote for PowerVR to lead in this space. :>

Old 20-Mar-2012, 01:38   #20
3dcgi
Senior Member

Join Date: Feb 2002
Posts: 2,200

Quote:
Originally Posted by Dominik D View Post
Draw-call cost depends on the HW. Some things have to be translated for a given card, and that imposes extra CPU cost per call. One could imagine that modern hardware may not support certain topologies (triangle fans, I'd guess, are something most cards don't support directly; perhaps some support only plain triangles or only lists). But it's not really the draw call itself that kills you; it's the (unnecessary) state changes between draw calls, and the stuff that has to be translated. Pretty much every piece of modern HW out there emulates the fixed-function pipeline in the driver, so that's extra CPU cost for you. Unusual texture formats may require some processing. There's a lot happening beyond draw calls, and there are plenty of things you can do to minimize CPU usage.
AMD hardware supports triangle fans, though I would like to see future APIs drop support for any primitive type that isn't a list.

Old 20-Mar-2012, 02:17   #21
silent_guy
Senior Member

Join Date: Mar 2006
Posts: 2,166

When the CPU constructs one draw call, the GPU executes a different one in parallel, right? So the GPU should not be slowed down by draw-call overhead. With PCs getting ever more CPU cores, will draw-call overhead really be an issue within the next couple of years? It's not as if games right now are making 100% use of all the available CPU power, and I don't think that is going to change. Or am I missing something (which is very likely)?

I guess the first question really is: is the GPU often left idle purely because of draw-call overhead (and not because there is no work to be done)? If the answer to that is 'yes', then the rest doesn't need to be answered...

Old 20-Mar-2012, 03:31   #22
3dcgi
Senior Member

Join Date: Feb 2002
Posts: 2,200

In most situations the driver/CPU is multiple draw calls ahead of the GPU, so yes, they work in parallel. The GPU is only slowed down if it's starved for work.
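
One rough way to observe that is to count how many submitted frames the GPU hasn't reached yet. A sketch using D3D11 event queries (in real code the queries would be pooled rather than created every frame):

[code]
#include <d3d11.h>
#include <deque>

// Drop an event query in after each frame's submission; the ones that haven't
// completed yet approximate how far ahead the CPU is. If the count is usually
// zero, the GPU is starving for work.
std::deque<ID3D11Query*> inFlight;

void EndFrame(ID3D11Device* dev, ID3D11DeviceContext* ctx) {
    D3D11_QUERY_DESC desc = { D3D11_QUERY_EVENT, 0 };
    ID3D11Query* q = nullptr;
    dev->CreateQuery(&desc, &q);
    ctx->End(q); // signals when the GPU reaches this point in the stream
    inFlight.push_back(q);

    // Retire queries the GPU has already passed.
    while (!inFlight.empty() &&
           ctx->GetData(inFlight.front(), nullptr, 0,
                        D3D11_ASYNC_GETDATA_DONOTFLUSH) == S_OK) {
        inFlight.front()->Release();
        inFlight.pop_front();
    }
    // inFlight.size() ~= frames of work still queued on the GPU
}
[/code]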

Old 20-Mar-2012, 10:07   #23
Rodéric
a.k.a. Ingenu

Join Date: Feb 2002
Location: Carnon Plage, France.
Posts: 2,885

Quote:
Originally Posted by silent_guy View Post
I guess the first question really is: is the GPU often left idle purely because of draw-call overhead (and not because there is no work to be done)? If the answer to that is 'yes', then the rest doesn't need to be answered...
No, because devs are already optimising for a minimal number of draw calls?

I think people want to know why, and whether, they need to put extra effort into optimising to minimise draw calls.

Old 21-Mar-2012, 14:45   #24
MDolenc
Member

Join Date: May 2002
Location: Slovenia
Posts: 421

Quote:
Originally Posted by silent_guy View Post
When the CPU constructs one draw call, the GPU executes a different one in parallel, right? So the GPU should not be slowed down by draw-call overhead. With PCs getting ever more CPU cores, will draw-call overhead really be an issue within the next couple of years? It's not as if games right now are making 100% use of all the available CPU power, and I don't think that is going to change. Or am I missing something (which is very likely)?
Draw calls are primarily a CPU problem, not a GPU problem. And strictly speaking it's not the number of draw calls that is the problem; it's the amount of state switching and figuring out where D3D/OpenGL resources actually live in hardware.
If you have lots of vertex/index buffers, the CPU has to translate API handles into actual hardware addresses all the time. This isn't even a CPU problem you can solve by adding more cores or more threads; it depends a lot on memory latency.

I did some tests a while ago... Basically it goes from one draw-primitive call up to 100k draw-primitive calls, with a total budget of 15M triangles that stays the same throughout the entire run. Same texture, same shader; just flipping vertex and index buffers on each draw-primitive call and uploading some constants.
This is D3D11: https://static.slo-tech.com/52734.jpg
And this is multithreaded D3D11 vs NV proprietary OpenGL extensions: https://static.slo-tech.com/52736.jpg
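
For the curious, the inner loop of a test like this looks roughly as follows: a single-threaded sketch, with resource creation, shaders, and timing omitted, and cb assumed to be a D3D11_USAGE_DYNAMIC constant buffer:

[code]
#include <d3d11.h>

// 15M triangles split over `drawCalls` draws, alternating between two
// vertex/index buffer pairs and refreshing a constant buffer each call.
void RunBatch(ID3D11DeviceContext* ctx, UINT drawCalls,
              ID3D11Buffer* vb[2], ID3D11Buffer* ib[2], ID3D11Buffer* cb) {
    const UINT totalTris   = 15000000;
    const UINT trisPerCall = totalTris / drawCalls;
    const UINT stride = sizeof(float) * 3, offset = 0;

    for (UINT i = 0; i < drawCalls; ++i) {
        UINT which = i & 1; // flip buffers every call to defeat redundant-bind filtering
        ctx->IASetVertexBuffers(0, 1, &vb[which], &stride, &offset);
        ctx->IASetIndexBuffer(ib[which], DXGI_FORMAT_R32_UINT, 0);

        D3D11_MAPPED_SUBRESOURCE mapped;
        if (SUCCEEDED(ctx->Map(cb, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped))) {
            *static_cast<UINT*>(mapped.pData) = i; // "uploading some constants"
            ctx->Unmap(cb, 0);
        }
        ctx->DrawIndexed(trisPerCall * 3, 0, 0);
    }
}
[/code]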

Old 21-Mar-2012, 16:26   #25
Rodéric
a.k.a. Ingenu

Join Date: Feb 2002
Location: Carnon Plage, France.
Posts: 2,885

Quite interesting.