Multidraw/ExecuteIndirect is starting to arrive

iroboto

I've been following this conversation on Twitter; even @sebbbi is taking part.

Context:
The number of draw calls in Wolf 2 is over 30K per frame, and the responses that follow are about reducing that number with multidraw/ExecuteIndirect. This is neat because it sheds some light on things we didn't know before.

a) Sony never lets us confirm what's in GNM, but I think it's safe(r) to say that GNM supports some form of ExecuteIndirect/multidraw. The beating around the bush about this style of draw call setup being available on console but not on PC is, at the very least, an admission that similar technologies are at play here.

b) ExecuteIndirect was demoed by Andrew L here at B3D, originally as a CPU-saving technology. Demo here (http://www.dsogaming.com/news/direc...proves-performance-greatly-reduces-cpu-usage/)

But we see in Andrew's demo that the move to ExecuteIndirect nets massive fps boosts while simultaneously reducing the load on the CPU, and thus saves power as well.

Since that demo, it wasn't known when developers would finally start taking advantage of it, until now.

But it leaves an open question: what is the actual impact in a game scenario as opposed to a tech demo?
Following their Twitter conversation, we can see that they compressed the 30K draw calls down. So the CPU is still being used extensively to merge and cull the draw calls (before submission), and only then are they submitted to the GPU.

So what are the actual savings here? I assume that in the tech demo each asteroid had its own draw call and no batching was done, so the savings are multiplicative. But is this true in a game scenario? If they've merged, culled and batched, and then make a handful of submissions via multidraw, what are the CPU savings then?
 
The biggest problem with ExecuteIndirect/Multidraw is that DirectX 11 doesn't support it. If you design your renderer around ExecuteIndirect, you are tied to DirectX 12. This means you can't support Windows Vista/7/8 customers.

Tiago and Axel are from id Software. Currently id Software is the only big AAA studio embracing Vulkan & OpenGL instead of DirectX. They can use the newest features, because Vulkan and OpenGL support Windows 7. They don't need to design around DirectX 11 limitations like all the other AAA cross platform developers.

Multidraw/ExecuteIndirect have their upsides and downsides. Most GPUs can pack together multiple instances (DrawInstanced) in a single vertex shader wave/warp, but can't pack together multiple draws. Multidraw is thus no more efficient for the GPU than submitting multiple real draws (with identical state), and it has the limitation that you can't change state between these draws. Nvidia recently introduced a new multidraw extension for Vulkan that allows changing state and bindings between draw calls: https://developer.nvidia.com/device-generated-commands-vulkan.
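
To make the "no state change between draws" limitation concrete, here is a minimal CPU-side model in Python. It is a sketch, not real graphics API code: `DrawCmd` mirrors the five fields of an indexed indirect draw command, while `Draw`, `state_key`, `multi_draw` and `batch_by_state` are hypothetical names used only for illustration. Draws are grouped by state, and each group becomes one multidraw submission that still executes as separate draws.

```python
from collections import namedtuple
from itertools import groupby

# Mirrors the five fields of an indexed indirect draw command.
DrawCmd = namedtuple(
    "DrawCmd",
    "index_count instance_count first_index vertex_offset first_instance",
)
# Hypothetical: a draw plus an opaque key for its pipeline state and bindings.
Draw = namedtuple("Draw", "state_key cmd")


def multi_draw(state_key, cmds, log):
    """Model one multidraw submission: state is bound once, then every
    command still runs as its own draw."""
    log.append(("bind", state_key))
    for cmd in cmds:
        log.append(("draw", cmd.index_count, cmd.first_index))


def batch_by_state(draws):
    """Group draws that share a state key into multidraw batches."""
    ordered = sorted(draws, key=lambda d: d.state_key)
    return [
        (key, [d.cmd for d in group])
        for key, group in groupby(ordered, key=lambda d: d.state_key)
    ]


draws = [
    Draw("opaque", DrawCmd(36, 1, 0, 0, 0)),
    Draw("alpha_test", DrawCmd(12, 1, 36, 0, 1)),
    Draw("opaque", DrawCmd(60, 1, 48, 0, 2)),
]

log = []
batches = batch_by_state(draws)
for key, cmds in batches:
    multi_draw(key, cmds, log)

submissions = len(batches)                                # CPU-side API calls
gpu_draws = sum(1 for entry in log if entry[0] == "draw")  # work the GPU still does
```

In this model the CPU makes two submissions instead of three, but the GPU still processes three draws, which is the point made above: the saving is on the submission side, not in GPU execution.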

IMHO the biggest advantage of indirect draws and multidraw (with an indirect draw count argument) is the ability to prepare the draws solely on the GPU side. This allows late culling on the GPU, based on GPU-known data (such as the depth buffer). This is especially important for shadow map rendering, as you can achieve dramatic cost savings by doing fine-grained culling of shadows based on currently visible surfaces (depth buffer = all visible surface pixels). Ulrich (from the AC:Unity team) and I both talk about our shadow rendering optimizations in our Siggraph 2015 presentation: http://advances.realtimerendering.c...siggraph2015_combined_final_footer_220dpi.pdf. Not many cross-platform games are using optimizations like this yet... most likely because of the PC graphics API issues discussed above.
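
As a rough illustration of preparing draws on the GPU side, the sketch below models in plain Python what a culling compute pass could do: test each object against the depth buffer and append the survivors to an indirect argument buffer, with the final count standing in for the indirect draw count. This is a simplified occlusion test, not the shadow culling from the presentation, and the function name, the object layout and all numbers are assumptions for illustration.

```python
def cull_to_indirect_buffer(objects, depth, max_draws):
    """CPU model of a culling pass that writes indirect draw arguments."""
    args = [None] * max_draws  # stands in for the indirect argument buffer
    count = 0                  # stands in for the GPU-written draw count
    for obj in objects:        # a compute shader would run one thread per object
        x0, y0, x1, y1 = obj["rect"]
        farthest = max(
            depth[y][x] for y in range(y0, y1) for x in range(x0, x1)
        )
        # Occluded only if every covered pixel already holds a nearer surface.
        if obj["nearest_depth"] <= farthest and count < max_draws:
            args[count] = obj["cmd"]  # an atomic append in a real shader
            count += 1
    return args, count


# 4x4 depth buffer: a near wall (0.25) on the left half, empty sky (1.0) on the right.
depth = [[0.25, 0.25, 1.0, 1.0] for _ in range(4)]

# cmd = (index_count, instance_count, first_index, vertex_offset, first_instance)
objects = [
    {"name": "behind_wall", "rect": (0, 0, 2, 2), "nearest_depth": 0.5,
     "cmd": (36, 1, 0, 0, 0)},
    {"name": "in_the_open", "rect": (2, 0, 4, 2), "nearest_depth": 0.5,
     "cmd": (60, 1, 36, 0, 1)},
    {"name": "partly_hidden", "rect": (1, 1, 3, 3), "nearest_depth": 0.5,
     "cmd": (12, 1, 96, 0, 2)},
]

args, count = cull_to_indirect_buffer(objects, depth, max_draws=len(objects))
```

Only the first `count` entries of `args` would be consumed by the multidraw. Since both the arguments and the count are produced without the CPU ever seeing the depth buffer, no readback is needed.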
 
Guess I'm going to be that guy.
For big AAA games, just how many people are using Vista/7/8?

At some point support will need to be dropped, and I'm wondering if we're close to that point, especially if it's holding engine development back and the consoles already support it.
Is it more beneficial to design the engine around consoles and limit OS support on PC, or the other way around?

From what you have said, there doesn't seem to be any way to shim or software-emulate it on DX11.
 
Wolfenstein 2 uses Vulkan exclusively on PC; OGL isn't available in the game.
Now that glslang can translate most HLSL shaders to feed into SPIR-V, I wonder if more developers will look into Vulkan, or if it's still too early?
 
I have a feeling I'm going to get a lot of stuff wrong in my post, but please bear with me, sebbbi ;)

On the point about being tied to DirectX 12 versus supporting other APIs: I recognize that this is probably tied more to financial cost than anything. I'm assuming every API you support is just a lot more labour. What is the cost of building, say, a Vulkan pipeline, a DX11 pipeline, and, because you really needed the CPU savings (you did your math up front), an ExecuteIndirect path on DX12 for Xbox? What are we looking at, ballpark? Can each pipeline be built in parallel by different render programmers? Or are there dependencies and constraints in the engine such that a feature like ExecuteIndirect messes up the other pipelines, so you need to wait for all APIs to support it before moving forward?

Thanks for bringing up the state-change limitation, as I once again totally forgot that Vulkan does not support changing state between multidraw draws without the Nvidia extension. I guess this is in vain, lol. I guess that also means GNM is not guaranteed to have this feature either. Dammit, I want to know.
As for Xbox One/X, however, as I understand it ExecuteIndirect does support some form of state changes between draws, unless I'm wrong.

I'll have to read the part about GPU-side culling again, thanks Sebbbi.
 
If you need GPU data from the current frame to make the culling decisions, then you need to make them on the GPU side. You simply can't stall the GPU, send the data to the CPU, make the decisions on the CPU side, and send new draws. This would be a huge GPU stall in the middle of the frame. If you design your shadow rendering pipeline in a way that it needs precise visibility data in order to generate the draw calls, then you need to run it on the GPU. You can't simply have a CPU fallback, unless slideshow performance (running CPU and GPU in lockstep) is good enough for you. Obviously you could write a completely different shadow mapping algorithm for the other APIs, and a completely different CPU-based scene setup & culling pipeline, but that's a lot of work and lots of extra maintenance. You don't want multiple paths because that adds inertia to future changes. It is always more important to make people in the team more efficient in their job than to add some fancy tech that makes things slightly faster on some platform.
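
A back-of-the-envelope model shows why the lockstep fallback hurts. The Python sketch below compares a pipelined frame (CPU prepares the next frame while the GPU renders, culling stays on the GPU) with a lockstep frame (GPU renders depth, the CPU reads it back and culls, then the GPU renders the rest). The function names and all millisecond figures are made-up assumptions, and the model ignores driver and queueing latency.

```python
def frame_time_pipelined(cpu_ms, gpu_depth_ms, gpu_cull_ms, gpu_rest_ms):
    """CPU and GPU overlap, so the slower of the two sets the frame time."""
    return max(cpu_ms, gpu_depth_ms + gpu_cull_ms + gpu_rest_ms)


def frame_time_lockstep(cpu_ms, gpu_depth_ms, readback_ms, cpu_cull_ms, gpu_rest_ms):
    """Every step waits for the previous one, so the costs add up."""
    return cpu_ms + gpu_depth_ms + readback_ms + cpu_cull_ms + gpu_rest_ms


# Hypothetical per-frame costs in milliseconds.
pipelined = frame_time_pipelined(
    cpu_ms=10.0, gpu_depth_ms=2.0, gpu_cull_ms=0.5, gpu_rest_ms=12.0
)
lockstep = frame_time_lockstep(
    cpu_ms=10.0, gpu_depth_ms=2.0, readback_ms=1.0, cpu_cull_ms=3.0, gpu_rest_ms=12.0
)
```

With these assumed numbers the pipelined frame costs 14.5 ms and the lockstep frame 28 ms, because in lockstep neither processor can hide the other's work.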

Of course there are many ways to implement multidraw-style rendering without actually using multidraw. This is one of them: https://forum.beyond3d.com/threads/...ching-and-index-buffering.57591/#post-1900656. Our Siggraph presentation presented one other way (strip clustering). If you need to choose between these two, choose the one described in the B3D thread. Strip clustering isn't very good (especially on older AMD GPUs).

If you have multidraw, I would use a hybrid technique with 16-bit indices written by a compute shader (saves 50% of index read & write bandwidth). Pack a small number of clusters into each draw to avoid overloading the CP and to avoid partial wave problems.
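
The 50% figure and the cluster packing can be checked with a little arithmetic. In the Python sketch below, the cluster size (64 triangles), the number of visible clusters (1000) and the packing factor (8 clusters per draw) are hypothetical values picked for illustration, as are the helper names.

```python
def index_bytes(num_indices, bits_per_index):
    """Bytes needed to store an index buffer of the given width."""
    return num_indices * bits_per_index // 8


def pack_clusters(cluster_ids, clusters_per_draw):
    """Split the visible cluster list into draws of a few clusters each."""
    return [
        cluster_ids[i:i + clusters_per_draw]
        for i in range(0, len(cluster_ids), clusters_per_draw)
    ]


TRIANGLES_PER_CLUSTER = 64  # hypothetical cluster size
VISIBLE_CLUSTERS = 1000     # hypothetical culling result
CLUSTERS_PER_DRAW = 8       # hypothetical packing factor

num_indices = VISIBLE_CLUSTERS * TRIANGLES_PER_CLUSTER * 3
bytes_32 = index_bytes(num_indices, 32)
bytes_16 = index_bytes(num_indices, 16)
saving = 1 - bytes_16 / bytes_32

draws = pack_clusters(list(range(VISIBLE_CLUSTERS)), CLUSTERS_PER_DRAW)

# A 16-bit index can address at most 65536 vertices relative to the draw's
# base vertex, so the clusters packed into one draw must fit in that range.
worst_case_vertices = CLUSTERS_PER_DRAW * TRIANGLES_PER_CLUSTER * 3
fits_16_bit = worst_case_vertices <= 65536
```

Here the index data shrinks from 768,000 to 384,000 bytes, and the 1000 clusters become 125 draws instead of 1000, so the command processor sees far fewer draws while each draw stays small enough for 16-bit indices.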
The Frostbite presentation mentioned that Xbox One has a multidraw similar to the new Nvidia Vulkan extension, allowing shader/state changes between draws. This is better than standard DX12 ExecuteIndirect or OpenGL MultiDraw. Obviously I can't talk about any more details than were revealed in their presentation.
 
This was massively insightful, thank you. I don't think I spent enough time understanding basic rendering; I should have understood the order in which things need to be rendered. Now that you put it so clearly, I can see why GPU-side dispatch matters so much.
 
Only 30% of Steam customers have Windows 10:
http://store.steampowered.com/hwsurvey/directx/

Windows 7 is still 65% of the player base. Dropping it would be financial suicide. Support for Windows XP and Vista, however, can now safely be dropped from games.
Is that player base for modern AAA games, or just overall?

At work so can't view the link.
But the percentage of legacy operating systems must be dropping among people who actually play the heavy-duty AAA games. If you start on the engine now, by the time the game is finished (let's say 2 years, fast-tracked), would there still be a high enough percentage of legacy users to justify holding engine development back?

At what point would it be possible to 'cut them loose' and still be DX12-based, not Vulkan-based?
 
A whole gen. These things take a lot of time. That's why exclusives tend to be outliers when it comes to what they can do with the title.
 
As Jay says, the Steam survey figure covers total Steam devices. What proportion of people interested in AAA titles are still on Win 7? You need data from one of the big pubs about its users' systems, such as the average machine playing COD or PUBG.
 
The Steam survey is actually extremely misleading right now because of the influx of PUBG players from China (for example, Windows 7 grew ~22% over the last month while Windows 10 lost ~17%). I think it's safe to say that the survey is not accurately portraying trends at the moment! Still, the point stands: a studio is probably losing almost half (~40%?) of its potential market by dropping Windows 7. A little too early for most...
 