Draw Calls

Why are draw calls on the PC more expensive than on the Xbox 360 (for example)? Developers can do more of them on the Xbox than on the PC, which sounds pretty strange considering that PCs have way more GPU and CPU horsepower.
 
On consoles you can write commands directly to the GPU ring buffer, in a format that the GPU hardware understands. It's just a few lines of code to add a single draw call.
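To make that concrete, here is a minimal sketch of what such a submission can look like. The opcodes, packet layout and ring-buffer helper are invented for illustration only; real console command formats are hardware-specific, but the idea is the same: write a few 32-bit words straight into memory the GPU reads.

```cpp
#include <cstdint>

struct RingBuffer {
    uint32_t* write_ptr;   // CPU-visible, GPU-readable memory
    void push(uint32_t word) { *write_ptr++ = word; }
};

// Hypothetical opcodes, for illustration only.
enum : uint32_t { OP_SET_VB = 0x10, OP_SET_SHADER = 0x11, OP_DRAW = 0x20 };

void draw_mesh(RingBuffer& rb, uint32_t vb_addr, uint32_t shader_id,
               uint32_t vertex_count)
{
    rb.push(OP_SET_VB);     rb.push(vb_addr);
    rb.push(OP_SET_SHADER); rb.push(shader_id);
    rb.push(OP_DRAW);       rb.push(vertex_count);
    // No kernel transition, no validation, no state tracking:
    // the GPU front-end consumes these words as-is.
}
```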

The PC has both user-space and kernel-space drivers that process the draw calls. More than one piece of software can be adding GPU commands simultaneously, and the driver must synchronize and store/restore the GPU state accordingly (a single mutex lock is already over 1000 cycles). The GPU commands must be translated by the driver into a format understood by the GPU (many different manufacturers and GPU families). The commands and modified data must be sent over a standardized external bus to the GPU. On the Xbox, for example, the GPU and CPU share the same memory and nothing needs to be sent over a relatively slow bus.

On consoles you can also edit GPU resources without locking them if you are sure that the GPU is not currently using them. On PC everything must be properly synchronized, and all commands and resource references must be validated (software cannot be allowed to crash the GPU or modify/access data of other programs). PC drivers also automatically manage GPU memory allocation (moving resources in and out based on usage). Depending on the allocator/cache algorithms used, this can also be relatively expensive.
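For contrast, here is roughly what a "properly synchronized" resource update looks like on PC through D3D11's Map/Unmap (a minimal sketch, with buffer creation and error details omitted). Everything behind Map() -- CPU/GPU synchronization or buffer renaming, validation, residency tracking -- is the driver work described above.

```cpp
#include <d3d11.h>
#include <cstring>

void UpdateConstants(ID3D11DeviceContext* ctx, ID3D11Buffer* cb,
                     const void* data, size_t size)
{
    D3D11_MAPPED_SUBRESOURCE mapped = {};
    if (SUCCEEDED(ctx->Map(cb, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped)))
    {
        memcpy(mapped.pData, data, size);  // write into driver-managed memory
        ctx->Unmap(cb, 0);                 // the driver decides when/how it reaches the GPU
    }
}
```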
 
sebbi, that sure sounds like a mess. Oh, the uglies of a general-purpose, expandable system.
 
At least until DX11, OpenGL had far smaller draw call overhead than DX. Not sure how things are now.

Also, having to feed the GPU with a command stream in a specific format isn't all that fun any more once you have to deal with more than a couple of different GPUs, or even versions of the same GPU core. Consoles can allow that kind of "ugliness" as they use fixed hardware for years and have no problems with incompatibility.
 
If GPUs become more mature, it may be possible to have a somewhat fixed hardware "instruction set" for GPUs, and it'd be possible to have a much leaner driver stack on PC.

There are, of course, some problems. The obvious one is: who gets to design this "instruction set"? Most hardware 'standards' developed from a single product by a single company, which became very popular and was then used as a "de facto" standard (and may have become a real industry standard). It's really hard to make a new standard out of nothing, and design by committee doesn't work. Microsoft is another possible candidate, but they probably don't understand enough about the underlying hardware architecture to make a good design.

Another way is to design an "intermediate code" which is translated by software into hardware commands. But then this is not very different from a command buffer, and probably not going to bring much performance advantage.
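A toy sketch of that idea, with all token names invented for illustration: the application emits abstract commands and a thin vendor-supplied translator turns them into native packets. That per-command loop is essentially what a driver's command-buffer back-end already runs, which is why the gain would be small.

```cpp
#include <cstdint>
#include <vector>

enum class Token : uint32_t { BindShader, BindVertexBuffer, Draw };

struct Cmd { Token op; uint32_t arg; };

// One translator per GPU family, supplied by the vendor.
struct Translator {
    virtual void emit(const Cmd& c, std::vector<uint32_t>& native) = 0;
    virtual ~Translator() = default;
};

void translate(const std::vector<Cmd>& ir, Translator& t,
               std::vector<uint32_t>& native)
{
    for (const Cmd& c : ir)
        t.emit(c, native);   // same per-command software cost as a driver
}
```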

There are other problems too. For example, since the driver would no longer be able to do the safekeeping work, the hardware would have to. Basically you'd want the GPU to be like a CPU, with all the security modes and controls. Personally I think this is a good thing: with GPUs getting more flexible and with GPGPU there will be more security-related problems, so it's probably better done in hardware anyway.
 
Modern GPU hardware is already a pretty close copy of the API.
Translating isn't really the issue.

The predominant problem is having to deal with multiple processes sharing the GPU. This will not change, short of adding hardware to the GPU to enable fast context switches which at some point may be justifiable.

DX11 and the Win7 driver model remove a lot of the superfluous driver overhead that existed. Plus you finally get command buffers, and state isn't global in the same way.
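For reference, those D3D11 command buffers look like this in minimal form (error handling and the recorded draw code are omitted): record on a deferred context, possibly on another thread, then play the result back on the immediate context.

```cpp
#include <d3d11.h>

void RecordAndSubmit(ID3D11Device* device, ID3D11DeviceContext* immediate)
{
    ID3D11DeviceContext* deferred = nullptr;
    device->CreateDeferredContext(0, &deferred);

    // ... record state changes and draw calls on 'deferred' here ...

    ID3D11CommandList* cmdList = nullptr;
    deferred->FinishCommandList(FALSE, &cmdList);   // bake the recorded commands

    immediate->ExecuteCommandList(cmdList, FALSE);  // submit on the main thread

    cmdList->Release();
    deferred->Release();
}
```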

You still have the stupid stuff in the PC drivers, fixes/workarounds for poorly optimized or broken game code, which eats a lot of CPU since it involves analyzing everything going to the GPU. Of course devs are forced to work around these, which the driver writers then have to detect and fix...

And of course new PC GPUs are optimized to run last year's best tech.
 
Modern GPU hardware is already a pretty close copy of the API.
Translating isn't really the issue.

The predominant problem is having to deal with multiple processes sharing the GPU. This will not change, short of adding hardware to the GPU to enable fast context switches which at some point may be justifiable.

My point is: yes, you'll want the GPU to handle these things, including memory protection and others. That probably needs a small CPU on the GPU to do some task management work.

Then we can standardize these commands so applications will be able to send commands to the GPU directly, without any extra overhead. You can handle older applications with drivers, and newer applications will be able to access the GPU more directly.

Of course, on a normal desktop OS you probably still can't let applications access the GPU directly, as it involves memory-mapped I/O and that needs to be in kernel mode. However, the overhead should be much less than what we have now.
 
If GPUs become more mature, it may be possible to have a somewhat fixed hardware "instruction set" for GPUs, and it'd be possible to have a much leaner driver stack on PC.
This is not going to happen. 1. HW from different vendors is too dissimilar for a common ISA. Not to mention that instructions are not everything GPUs process: there's some state involved, which is entirely HW-specific. 2. Part of what goes into the buffer is so tied to the hardware that it may be covered by patents (not that I would know anything about patents, just assuming this may very well be the case). 3. Even for the actual code there's a huge variation in what you want encoded in the command buffer and how, which depends on the HW you're feeding.
 
This is not going to happen. 1. HW from different vendors is too dissimilar for a common ISA. Not to mention that instructions are not everything GPUs process: there's some state involved, which is entirely HW-specific. 2. Part of what goes into the buffer is so tied to the hardware that it may be covered by patents (not that I would know anything about patents, just assuming this may very well be the case). 3. Even for the actual code there's a huge variation in what you want encoded in the command buffer and how, which depends on the HW you're feeding.

Well, it's not likely in the immediate future, but never say never ;)
HW from different vendors is probably going to be a moot point, as the number of important GPU vendors in the x86 space is now only three, and they probably all have cross-licensing deals, so patents are not a serious problem. Advances in GPGPU also bring GPUs from different vendors closer. It's probably not going to happen for a few years, but at least it's not technically impossible, and if there's enough incentive they may want to do it.

But that brings us to the main point: is there enough incentive for IHVs to do that? Right now I don't see it happening, as there is really no strong demand for very high performance desktop graphics.
 
If GPUs become more mature, it may be possible to have a somewhat fixed hardware "instruction set" for GPUs, and it'd be possible to have a much leaner driver stack on PC.

There are, of course, some problems. The obvious one is: who gets to design this "instruction set"? Most hardware 'standards' developed from a single product by a single company, which became very popular and was then used as a "de facto" standard (and may have become a real industry standard). It's really hard to make a new standard out of nothing, and design by committee doesn't work. Microsoft is another possible candidate, but they probably don't understand enough about the underlying hardware architecture to make a good design.

Another way is to design an "intermediate code" which is translated by software into hardware commands. But then this is not very different from a command buffer, and probably not going to bring much performance advantage.

There are other problems too. For example, since the driver would no longer be able to do the safekeeping work, the hardware would have to. Basically you'd want the GPU to be like a CPU, with all the security modes and controls. Personally I think this is a good thing: with GPUs getting more flexible and with GPGPU there will be more security-related problems, so it's probably better done in hardware anyway.
While I agree with most of the points you bring up, I'm not sure the (in)efficiency of today's draw calls on the desktop is that much related to the GPU ISA per se. State models, buffer sync/lock mechanisms and such are much more relevant to the subject than what the compiler outputs. With the advent of binary-program APIs you could even consider the compiler stage to be (near) zero-cost these days, and that would not change the cost of a draw call much. I mean, draw calls were a factor to reckon with long before GPU ISAs were a topic of conversation.
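As one example of such binary-program APIs, here is a rough sketch using OpenGL's glGetProgramBinary/glProgramBinary (GL 4.1 / GLES 3.0). It only illustrates the idea; a real shader cache would also key on driver version and re-link from source if the reload fails.

```cpp
#include <GLES3/gl3.h>
#include <vector>

std::vector<unsigned char> SaveProgram(GLuint prog, GLenum* formatOut)
{
    GLint len = 0;
    glGetProgramiv(prog, GL_PROGRAM_BINARY_LENGTH, &len);
    std::vector<unsigned char> blob(len > 0 ? (size_t)len : 0);
    if (len <= 0) return blob;

    GLsizei written = 0;
    glGetProgramBinary(prog, len, &written, formatOut, blob.data());
    blob.resize(written);
    return blob;   // store on disk; the next run skips the compiler entirely
}

void LoadProgram(GLuint prog, GLenum format, const std::vector<unsigned char>& blob)
{
    glProgramBinary(prog, format, blob.data(), (GLsizei)blob.size());
}
```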
 
I assumed pcchen wasn't referring to the shader ISA, but rather a theoretical command buffer "ISA".
 
So what can be done to reduce the cost of draw calls on PC? I know instancing is used to reduce the number of draw calls needed, but is there a way to actually reduce the amount of CPU time needed per call?
Keep in mind my understanding of these things could not even be called "beginner". More like "ignorant spectator". :)
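For readers following along, the instancing mentioned above looks like this in D3D11 terms (a minimal sketch; the per-instance vertex buffer and input layout setup are omitted).

```cpp
#include <d3d11.h>

void DrawTrees(ID3D11DeviceContext* ctx, UINT indexCount, UINT treeCount)
{
    // One draw call renders 'treeCount' copies of the mesh, instead of
    // issuing 'treeCount' separate draw calls.
    ctx->DrawIndexedInstanced(indexCount, treeCount, 0, 0, 0);
}
```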
 
All I know is that Crytek had pointed out the main weaknesses they considered to still be in DirectX 11, and that they were working with Microsoft to sort them out. I don't know what has become of that, actually; I would have expected to have heard something about it, but maybe I just missed it.
 
So what can be done to reduce the cost of draw calls on PC?

Draw call cost depends on the HW. Some things have to be translated for a given card, and this imposes extra CPU cost per call. One could imagine that modern hardware may not support certain topologies (triangle fans would be something I guess most cards don't support directly; perhaps some support just plain TRIs or just lists). But it's not really the draw call that kills you, it's the (unnecessary) state changes between draw calls and the stuff that has to be translated. Pretty much every modern piece of HW out there simulates the fixed pipeline in the driver, so that's extra CPU cost for you. Weird texture formats may require some processing. There's a lot happening beyond draw calls, and there are lots of things you can do to minimize CPU usage.
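One common application-side way to act on that, sketched below with an arbitrary key layout: sort the frame's draws by their expensive state (shader, then material) so that redundant state changes between draw calls mostly disappear.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct DrawItem {
    uint64_t sortKey;   // e.g. [shader:16][material:16][depth:32]
    // ... mesh, constants, etc.
};

inline uint64_t MakeKey(uint16_t shader, uint16_t material, uint32_t depth)
{
    return (uint64_t(shader) << 48) | (uint64_t(material) << 32) | depth;
}

void SubmitSorted(std::vector<DrawItem>& items)
{
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) { return a.sortKey < b.sortKey; });
    // Walk the sorted list, bind shader/material only when the key's upper
    // bits change, and issue the draws in between with no state traffic.
}
```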
 
Make GPUs standard just like CPUs, thank you.

I haven't investigated the cost of a draw call in a while, and a lot happens under the hood for sure, but we have been used to minimizing them since D3D9...
A draw call basically gathers all the states and checks their validity/consistency before filling the command stream.

Anyone working on drivers care to explain how that works in D3D10/11?
(I know the runtime does a lot because of NV :p, but I'd be curious to see how much is left to the driver. Could "just" be transcoding the D3D command stream into a GPU-specific one.)
 
Make GPUs standard just like CPUs, thank you.
CPUs are not standard either ; ) Moreover, a proper level of abstraction beats standardized HW most of the time (e.g. high-level programming languages vs assembly, etc.).

I haven't investigated the cost of a draw call in a while, and a lot happens under the hood for sure, but we have been used to minimizing them since D3D9...
A draw call basically gathers all the states and checks their validity/consistency before filling the command stream.

Anyone working on drivers care to explain how that works in D3D10/11?
(I know the runtime does a lot because of NV :p, but I'd be curious to see how much is left to the driver. Could "just" be transcoding the D3D command stream into a GPU-specific one.)
I can't speak for D3D, but I have observations from GLES - the driver there has two (or two-and-a-half) major functions:

1. state tracking
2. shader compilation (normally depending on both client shaders and active state)
3. interfacing with the kernel memory allocators for buffer object management and the related fences/syncs.

The last one of those does not really belong in there, as it can be taken out of the driver and into a bog standard "GPU buffer API", or if you wish, a "DMA-coherent buffer API", perhaps even in flavors based on whether the device is MMU-equipped (so it can "comprehend" page tables) or not.
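A rough sketch of what the state-tracking part (point 1) boils down to, with all opcode/register numbers invented for illustration: shadow the last value sent and only emit hardware commands when something actually changes.

```cpp
#include <cstdint>

struct StateCache {
    uint32_t blendState = ~0u;
    uint32_t depthState = ~0u;

    void SetBlend(uint32_t hwValue, uint32_t*& cmdStream) {
        if (hwValue != blendState) {        // filter redundant changes
            *cmdStream++ = 0x2001;          // hypothetical "set blend reg" opcode
            *cmdStream++ = hwValue;
            blendState = hwValue;
        }
    }
    void SetDepth(uint32_t hwValue, uint32_t*& cmdStream) {
        if (hwValue != depthState) {
            *cmdStream++ = 0x2002;          // hypothetical "set depth reg" opcode
            *cmdStream++ = hwValue;
            depthState = hwValue;
        }
    }
};
```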

That said, we can optimize draw calls all we want, but they will never be 'free' - they'll always cost CPU cycles, whether in housekeeping or in CPU/GPU rendezvous mechanisms.
 
Draw call cost depends on the HW. Some things have to be translated for a given card, and this imposes extra CPU cost per call. One could imagine that modern hardware may not support certain topologies (triangle fans would be something I guess most cards don't support directly; perhaps some support just plain TRIs or just lists). But it's not really the draw call that kills you, it's the (unnecessary) state changes between draw calls and the stuff that has to be translated. Pretty much every modern piece of HW out there simulates the fixed pipeline in the driver, so that's extra CPU cost for you. Weird texture formats may require some processing. There's a lot happening beyond draw calls, and there are lots of things you can do to minimize CPU usage.
AMD hardware supports triangle fans, though I would like to see future APIs drop support for any primitive type that isn't a list.
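Dropping fans would be cheap to live with, since a fan converts to a list with a trivial index rewrite (triangle i = {v0, v(i+1), v(i+2)}), roughly like this:

```cpp
#include <cstdint>
#include <vector>

std::vector<uint32_t> FanToList(uint32_t fanVertexCount)
{
    std::vector<uint32_t> indices;
    if (fanVertexCount < 3) return indices;
    indices.reserve((fanVertexCount - 2) * 3);
    for (uint32_t i = 0; i + 2 < fanVertexCount; ++i) {
        indices.push_back(0);      // fan centre vertex
        indices.push_back(i + 1);
        indices.push_back(i + 2);
    }
    return indices;
}
```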
 