Asynchronous Compute : what are the benefits?

Ok, so you want to Venn diagram the PC hardware game market: Circle 1 - people who game on their PC, Circle 2 - people who have PCs capable of 'pushing the tech envelope' and Circle 3 - people who want to run the type of games in Circle 2.

Out of curiosity, how large would you estimate the modern gaming PC market to be?
 
That makes sense, yes. It certainly makes more sense to segment the PC gaming market than it does to lump it all into one big market, because that's not a true reflection of reality. It's purely anecdotal, but as an example of what I'm talking about, I have three PCs in this house, all with Steam installed, and only one is used for gaming. I installed Steam on the other two for streaming purposes.

With regards to the size of the modern gaming PC market (i.e. PCs capable of playing the latest generation of console games at minimum or higher settings) I really couldn't hazard a guess tbh, but it should be easy enough to work out from the Steam survey.

It's only a rough estimate, but I'd say any DX11-capable PC with at least 1 GB of VRAM and a Sandy Bridge based i3 with 4 GB of system memory would fall into the category of "capable of playing current generation console games at minimum settings", so whatever proportion of the Steam survey that spec makes up could be considered the target market for console devs. Anyone who's running less than that probably isn't interested in modern console ports (on PC) anyway.
 
The problem isn't the current console ports. It is the future console ports that use async compute. Actually, I assume that even the most advanced PC could hold back the adoption of compute that needs tight integration between GPU and CPU, which brings me to my second point....
Devs, please utilize the iGPU in Kaveri in tandem with the dGPU. When I bought this new PC, the choice was between Kaveri and an i3 with a dGPU. I'm hoping that one day the iGPU can be used as a kind of co-processor (exclusively for compute). Basically, I'm buying this PC and betting that in the future it will be used properly, versus an i3 with a dGPU that should have better performance at a similar price to Kaveri. Don't let me down...
 
I actually thought about that too. Like so many others, I have two GPUs in my system and only one is used at any given time. The thought did cross my mind that the second GPU should be used, if needed, as a co-processor.
 
I completely agree that that would be the ideal situation, i.e. use the iGPU for any compute work that requires tight integration between the CPU and GPU. It certainly sounds like DX12/Vulkan would allow that. However, even without that, given the power of modern PC CPUs there's (maybe) also the option of simply running whatever the consoles are relying on async compute for on the CPU itself. There's probably enough headroom there (especially once the new APIs land) to not worry about needing async compute at all.
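For what it's worth, DX12's explicit multi-adapter support does expose the iGPU and the dGPU as separate devices, so an engine could in principle put a compute-only queue on the integrated part. Below is a minimal, hypothetical C++ sketch of that idea; treating the integrated GPU as simply "the second hardware adapter enumerated" is an assumption for brevity, and a real engine would inspect the adapter descriptions instead.

```cpp
#include <d3d12.h>
#include <dxgi1_4.h>
#include <wrl/client.h>
#pragma comment(lib, "d3d12.lib")
#pragma comment(lib, "dxgi.lib")

using Microsoft::WRL::ComPtr;

// Hypothetical sketch: find a second hardware adapter (assumed here to be the
// iGPU) and create a compute-only queue on it, while the dGPU keeps the
// graphics queue. Selecting the adapter by enumeration order is a simplification.
bool CreateComputeQueueOnSecondAdapter(ComPtr<ID3D12Device>& outDevice,
                                       ComPtr<ID3D12CommandQueue>& outQueue)
{
    ComPtr<IDXGIFactory4> factory;
    if (FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&factory))))
        return false;

    ComPtr<IDXGIAdapter1> adapter;
    UINT hardwareAdapters = 0;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i)
    {
        DXGI_ADAPTER_DESC1 desc = {};
        adapter->GetDesc1(&desc);
        if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE)
            continue;                                   // skip WARP / software adapters
        if (++hardwareAdapters == 2 &&                  // assumption: #2 is the iGPU
            SUCCEEDED(D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_11_0,
                                        IID_PPV_ARGS(&outDevice))))
        {
            D3D12_COMMAND_QUEUE_DESC queueDesc = {};
            queueDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;   // compute-only queue
            return SUCCEEDED(outDevice->CreateCommandQueue(&queueDesc,
                                                           IID_PPV_ARGS(&outQueue)));
        }
    }
    return false;
}
```

Whether this ever pays off depends on the same transfer costs discussed later in the thread: results computed on the iGPU still have to end up wherever the consumer of the data lives.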
 
I have two GPUs in my system and only one is used at any given time. The thought did cross my mind that the second GPU should be used, if needed, as a co-processor.
If they are both NVIDIA GPUs then in certain games (PhysX) you can use both: one for graphics, one for physics.
 
There is a method that lets you mix AMD (for graphics) and NVIDIA (for physics), but it involves a lot of messing around.
A similar fix for Intel and NVIDIA may exist, but I'm guessing the awkwardness of getting it to work would make it too unappealing.
 
There seems to be a common misconception that compute shaders are mainly used for non-graphics (GPGPU) tasks. This is not true. Compute shaders are mainly used for graphics rendering in current games. Only a few games use compute shaders to perform non-graphics tasks.

Compute shader based lighting has become the norm in deferred rendered games. Battlefield 3 (2011) was one of the first games using compute shader based (tiled) lighting, but many PS3 games (such as Uncharted / TLoU) and even some Xbox 360 games (such as the Trials series games) had already used similar tiled deferred lighting pipelines (but with pixel shaders or with SPUs). Compute shaders allow better bandwidth usage compared to the old deferred rendering techniques.
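To make the idea concrete, here is a rough CPU-side C++ reference of the per-tile light binning such a tiled lighting pass performs; on the GPU this runs as a compute shader with one threadgroup per 16x16 pixel tile. The PointLight layout and the screen-space circle test are simplifying assumptions, not any particular engine's code.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One list per 16x16 pixel tile, holding the indices of the lights that touch it.
// A compute shader builds the same lists in groupshared memory, then shades every
// pixel of the tile against only that short list, which is where the bandwidth
// saving over classic one-pass-per-light deferred shading comes from.
struct PointLight { float x, y, radius; };   // already projected to screen space

constexpr int kTileSize = 16;

std::vector<std::vector<uint32_t>> BinLightsToTiles(
    const std::vector<PointLight>& lights, int width, int height)
{
    const int tilesX = (width  + kTileSize - 1) / kTileSize;
    const int tilesY = (height + kTileSize - 1) / kTileSize;
    std::vector<std::vector<uint32_t>> tileLists(tilesX * tilesY);

    for (int ty = 0; ty < tilesY; ++ty)
        for (int tx = 0; tx < tilesX; ++tx)
        {
            const float minX = float(tx * kTileSize), maxX = minX + kTileSize;
            const float minY = float(ty * kTileSize), maxY = minY + kTileSize;

            for (uint32_t i = 0; i < lights.size(); ++i)
            {
                // Distance from the light centre to the closest point of the tile.
                const float cx = std::clamp(lights[i].x, minX, maxX);
                const float cy = std::clamp(lights[i].y, minY, maxY);
                const float dx = lights[i].x - cx, dy = lights[i].y - cy;
                if (dx * dx + dy * dy <= lights[i].radius * lights[i].radius)
                    tileLists[ty * tilesX + tx].push_back(i);
            }
        }
    return tileLists;
}
```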

Compute shaders also provide a big boost for many post process effects. Most kernels (such as blur/bloom, depth of field, etc) can be efficiently executed using compute shaders. A rough estimate for a modern game is that around 50% of the frame time is spent on lighting and post processing. This means that at least 50% of a modern game's rendering work is done in compute shaders. Most of this compute work can be executed asynchronously (overlapping g-buffer and shadow map rasterization work).

Asynchronous compute will be mostly used for rendering related tasks. It will improve the graphics quality and the frame rate. Some games will use asynchronous compute for non-graphics related tasks. However, as we have already seen, compute shaders have been most successfully used for graphics processing. Asynchronous compute will make compute shaders even more useful for rendering purposes, meaning that there will not be many free GPU cycles to spare for other purposes. This is especially true on PC, since the data transfer from CPU memory to GPU memory and back is expensive, and has high latency. Rendering related compute work doesn't need to be transferred to CPU memory at all.
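For readers wondering what that overlap looks like at the API level, here is a minimal D3D12-flavoured sketch (not sebbbi's code): the lighting/post-processing compute runs on a separate compute queue while the graphics queue carries straight on with rasterizer-heavy shadow map work. The command list, fence and queue names are placeholders, and resource barriers and per-frame bookkeeping are omitted.

```cpp
#include <d3d12.h>
#pragma comment(lib, "d3d12.lib")

// Assumes all command lists are already recorded and closed.
void SubmitFrame(ID3D12CommandQueue*        gfxQueue,
                 ID3D12CommandQueue*        computeQueue,
                 ID3D12GraphicsCommandList* gbufferList,   // rasterizes the g-buffer
                 ID3D12GraphicsCommandList* shadowList,    // rasterizes shadow maps
                 ID3D12GraphicsCommandList* lightingList,  // tiled lighting / post FX (compute)
                 ID3D12Fence*               fence,
                 UINT64&                    fenceValue)
{
    // 1. G-buffer pass on the graphics queue.
    ID3D12CommandList* gbuffer[] = { gbufferList };
    gfxQueue->ExecuteCommandLists(1, gbuffer);
    gfxQueue->Signal(fence, ++fenceValue);                 // "g-buffer is done"

    // 2. The graphics queue moves straight on to the rasterizer-heavy shadow maps.
    ID3D12CommandList* shadows[] = { shadowList };
    gfxQueue->ExecuteCommandLists(1, shadows);

    // 3. The compute queue waits only for the g-buffer, then runs the ALU-heavy
    //    lighting and post-processing asynchronously, overlapping step 2 and
    //    soaking up execution units the rasterization work leaves idle.
    computeQueue->Wait(fence, fenceValue);
    ID3D12CommandList* lighting[] = { lightingList };
    computeQueue->ExecuteCommandLists(1, lighting);
}
```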
 
Asynchronous compute will make compute shaders even more useful for rendering purposes, meaning that there will not be many free GPU cycles to spare for other purposes.
I'm not sure if you're saying this is a bad thing or a good thing?

Rendering related compute work doesn't need to be transferred to CPU memory at all.
So non-rendering related compute work does need to be transferred to CPU memory?
 
That's much clearer, thanks sebbbi.

So just to confirm my understanding of the current situation: DX11 doesn't support async compute (but does support synchronous compute) while both Mantle and DX12 do support async compute. And on the hardware side, both GCN 1.0+ and Fermi(?)+ support async compute - through Mantle/Vulkan or DX12, but at present, no Intel GPU supports it.

Did I get anything wrong?
 
I'm not sure if you're saying this is a bad thing or a good thing?

I took that to mean there won't be much GPU time left for non-rendering based compute tasks, i.e. GPGPU, i.e. offloading CPU work onto the GPU, i.e. the types of tasks that require tight HSA-style integration between CPU and GPU.
 
I'm not sure if you're saying this is a bad thing or a good thing?
For game renderers, it's great.
Compute on a GPU works best on things that most closely resemble graphics, which in this case is more graphics.

So non-rendering related compute work does need to be transferred to CPU memory?
The context was the discrete PC space, where the GPU's memory is separated from the rest of the system by physical distance and serious latency. There are quite a few general compute benchmarks where discrete cards that outweigh APUs by an order of magnitude in raw compute get beaten, because the wait time for bus transfers is so long that the GPU could have had a compute time of zero and it wouldn't have made a difference.
If there is no bus transfer, what non-rendering workload is there where nothing but the GPU ever needs to look at the data?
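A quick back-of-envelope illustration of that point; the PCIe bandwidth and latency figures below are illustrative assumptions, not measurements.

```cpp
#include <cstdio>

int main()
{
    // 8 MB of simulation data sent to the dGPU and read back over PCIe.
    const double bytes       = 8.0 * 1024 * 1024;
    const double pcieBytesPs = 12.0e9;    // assumed effective PCIe 3.0 x16 throughput, bytes/s
    const double fixedLatMs  = 0.04;      // assumed ~20 us submission/readback latency each way

    const double transferMs = 2.0 * bytes / pcieBytesPs * 1000.0 + fixedLatMs;
    std::printf("round-trip transfer ~%.2f ms\n", transferMs);   // ~1.44 ms

    // Even if the discrete card's compute time were literally zero, the job
    // already costs ~1.4 ms; an APU sharing memory with the CPU never pays this.
    return 0;
}
```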
 
I'm not sure if you're saying this is a bad thing or a good thing?


So non-rendering related compute work does need to be transferred to CPU memory?

Yes, you need to synchronise. In Ubisoft's cloth physics compute shader they use a long compute shader with few synchronisation points.

And it is probably the reason why, in Sony's physics library, the gameplay physics stays on the CPU.
 
I'm not sure if you're saying this is a bad thing or a good thing?
I just wanted to clarify that the GPU is nowadays a flexible computation machine. If there are extra cycles left over from rasterization work, they can be used to run compute shader based rendering tasks as well as some non-graphics related work. This is a good thing for graphics programmers, because it allows us to utilize the GPU better (= better graphics quality / frame rate). However, I wanted to point out that IF the graphics programmers use all the GPU resources, there are no GPU resources left for other purposes. This is often fine, since gameplay programmers are not skilled in GPU programming (which requires tasks to be split into tens of thousands of threads to be efficient). Graphics programmers understand how the GPU works and how to optimize code for it. I personally prefer that graphics programmers move graphics related tasks from the CPU (such as viewport and occlusion culling) to the GPU, instead of pushing the gameplay programmers to find some gameplay tasks that could be offloaded to the GPU. That would require two-way communication and synchronization (see below).
So non-rendering related compute work does need to be transferred to CPU memory?
Obviously, if you use GPGPU to offload some simulation work (such as physics or ocean simulation) from the CPU to the GPU, you need to copy that data back to the CPU in order to update the CPU data structures. Otherwise the game logic cannot know where the objects are. When you do processing solely related to rendering (lighting, post processing, occlusion culling, etc.), you don't need to copy the data back, since there is no reason for the CPU to know about data that is only needed by the GPU to render the scene.
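In D3D12 terms, that copy-back path looks roughly like the sketch below (resource creation, state barriers and error handling are omitted, and the buffer, fence and event names are placeholders). A buffer that only feeds lighting or post-processing simply never takes this trip.

```cpp
#include <windows.h>
#include <d3d12.h>
#include <cstring>
#pragma comment(lib, "d3d12.lib")

void ReadBackSimulationResult(ID3D12CommandQueue*        queue,
                              ID3D12GraphicsCommandList* copyList,          // open for recording
                              ID3D12Resource*            simResultGpu,      // DEFAULT heap, GPU-only
                              ID3D12Resource*            simResultReadback, // D3D12_HEAP_TYPE_READBACK
                              ID3D12Fence*               fence,
                              UINT64&                    fenceValue,
                              HANDLE                     fenceEvent,
                              void*                      cpuDst,
                              size_t                     byteCount)
{
    // 1. GPU copies the result into CPU-visible memory (crosses the PCIe bus on a dGPU).
    copyList->CopyResource(simResultReadback, simResultGpu);
    copyList->Close();
    ID3D12CommandList* lists[] = { copyList };
    queue->ExecuteCommandLists(1, lists);

    // 2. The CPU has to wait for the GPU before touching the data; this round
    //    trip is the latency that keeps gameplay-critical physics on the CPU.
    queue->Signal(fence, ++fenceValue);
    fence->SetEventOnCompletion(fenceValue, fenceEvent);
    WaitForSingleObject(fenceEvent, INFINITE);

    // 3. Now the gameplay code can read the object positions back.
    void*       mapped    = nullptr;
    D3D12_RANGE readRange = { 0, byteCount };
    simResultReadback->Map(0, &readRange, &mapped);
    std::memcpy(cpuDst, mapped, byteCount);
    D3D12_RANGE noWrite = { 0, 0 };
    simResultReadback->Unmap(0, &noWrite);
}
```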
 