Asynchronous Compute: what are the benefits?

Sebbbi, since GPUs are wider and more parallel than CPUs, would it not mean that they would benefit more from multiple async compute queues than a CPU?
 
Multiple compute queues are a bit like hyperthreading. Going from 1 to 2 threads (independent instruction streams) helps the most. Modern Intel CPUs (Haswell, Broadwell, etc.) have 2-way hyperthreading. Some IBM and Sun CPUs have 4-way and 8-way hyperthreading, but that doesn't give them much additional advantage. The same is true for GPU asynchronous compute.

But isn't MIMD better for things like raytracing?
 
Sebbbi, since GPUs are wider and more parallel than CPUs, would it not mean that they would benefit more from multiple async compute queues than a CPU?
GPUs are wider, but each command processor command (draw/compute kernel invocation) is basically a parallel_for_each (it is as wide as the work item count). Also, the GPU can run multiple kernels simultaneously from the same command queue if these kernels have no dependencies on each other (roughly, this matches ILP in superscalar CPU execution). Basically, a single command stream can fill the GPU in a similar way to how a single command stream (thread) can fill a CPU core.
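
To make that concrete with a rough CPU-side sketch (my own illustration, not from the post above): a single compute dispatch behaves like a parallel_for over every work item, so one command stream can already spread work across the whole machine.

```cpp
// Rough CPU-side analogy only (not GPU code): one "dispatch" expressed as a
// parallel_for over all work items. The runtime spreads the items across all
// cores, much like one kernel invocation spreads its threads across the GPU.
// Requires C++17 <execution>; on GCC/Clang link against TBB (-ltbb).
#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

int main()
{
    std::vector<float> items(1 << 20);            // the "work item count"
    std::iota(items.begin(), items.end(), 0.0f);

    // A single command stream issuing a single wide dispatch.
    std::for_each(std::execution::par_unseq, items.begin(), items.end(),
                  [](float& x) { x = x * 0.5f + 1.0f; });
}
```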

Asynchronous compute is useful in the same kinds of cases where hyperthreading is useful.

The first use case is stalls. A queue needs to wait for something (for example, the end of the raster operations and a ROP cache flush before the following post-process shader can start sampling that render target as a texture). On the CPU side, a core can stall, for example, if it waits for some work to finish on a mutex or semaphore. In this scenario one command stream is stalled while the other runs at full rate. Hyperthreading / asynchronous compute keeps the CPU / GPU fed (it doesn't need to idle), unless of course both instruction streams stall at the same time.
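
As a minimal sketch of what such a setup looks like in an explicit API (my own D3D12 illustration; the queue and fence names are assumptions, not anything from the thread): the fence expresses only the one dependency, so while one queue sits in a GPU-side wait the other keeps feeding the execution units.

```cpp
// Minimal D3D12 sketch (illustration only; link against d3d12.lib):
// a direct (graphics) queue plus an independent compute queue. The fence
// expresses a single cross-queue dependency, and the Wait() is resolved on
// the GPU timeline, so the other queue keeps running while one of them stalls.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

int main()
{
    ComPtr<ID3D12Device> device;
    D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device));

    D3D12_COMMAND_QUEUE_DESC directDesc = {};
    directDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;      // graphics + compute
    ComPtr<ID3D12CommandQueue> directQueue;
    device->CreateCommandQueue(&directDesc, IID_PPV_ARGS(&directQueue));

    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;    // async compute queue
    ComPtr<ID3D12CommandQueue> computeQueue;
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    // ... ExecuteCommandLists() on both queues would go here ...

    // The compute queue signals when its work is done; the direct queue waits
    // for that value on the GPU, while the compute queue is never blocked.
    computeQueue->Signal(fence.Get(), 1);
    directQueue->Wait(fence.Get(), 1);
}
```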

The second use case is a workload that is bound by resource limits or fixed function hardware. On the GPU, there is a fixed maximum primitive setup rate, maximum fill rate, maximum texture filtering rate, etc. Bandwidth and LDS work memory are also limited. When any of these things is the bottleneck, the shader cannot run at maximum speed, meaning that some instruction slots go unused. Asynchronous compute can use these instruction slots (if that shader has different bottlenecks). Similarly, on the CPU side the instruction stream might have execution bubbles for various reasons: the instruction mix might overutilize some CPU execution ports, or there might be cache misses (memory bottleneck). Hyperthreading can fill these small bubbles with instructions from the other thread.
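
A toy CPU-side illustration of the "different bottlenecks" point (my own sketch; whether the two threads actually share an SMT core is up to the OS scheduler): one workload is dominated by cache misses, the other by dependent arithmetic, so their pipeline bubbles can interleave.

```cpp
// Toy illustration only: two workloads with different bottlenecks. On an
// SMT-enabled core, the ALU-bound thread can fill the pipeline bubbles left
// by the cache-missing, memory-bound thread, which is the CPU analogue of
// pairing shaders whose limits differ (bandwidth vs. ALU, etc.).
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <random>
#include <thread>
#include <vector>

// Memory-bound: random pointer chasing, dominated by cache misses.
uint64_t chase(const std::vector<uint32_t>& next, int steps)
{
    uint32_t i = 0;
    uint64_t sum = 0;
    for (int s = 0; s < steps; ++s) { i = next[i]; sum += i; }
    return sum;
}

// ALU-bound: a long dependent arithmetic chain, almost no memory traffic.
uint64_t crunch(long steps)
{
    uint64_t x = 1;
    for (long s = 0; s < steps; ++s)
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
    return x;
}

int main()
{
    std::vector<uint32_t> next(1 << 24);                   // far larger than cache
    std::iota(next.begin(), next.end(), 0u);
    std::shuffle(next.begin(), next.end(), std::mt19937{42});

    volatile uint64_t a = 0, b = 0;
    std::thread memBound([&] { a = chase(next, 20'000'000); });
    std::thread aluBound([&] { b = crunch(200'000'000L); });
    memBound.join();
    aluBound.join();
}
```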

Obviously the CPU and GPU are quite different, but both hyperthreading and asynchronous compute give the execution units more options (more TLP) to fill the execution pipelines with a steady instruction stream.
 
That chart tells us Battlefield 4 is already using these features on PS4, and I must confess I am left a bit let down by this information, because I am having a pretty sizeable problem deciding which platform to invest in for future DICE games on a very, very, very tight budget:
save to upgrade my 2008 video card in a year's time, or keep my PS4.



By the above statement you can tell I am implying that BF4's PS4 performance is not really ideal (framerate, aliasing issues).
It is a launch title, but on an architecture that was well known right off the bat. From that chart, this 2013 title is using hardware features most 2015 games are oblivious to.



It shatters my (wallet-biased) hope that BF4 was not the pinnacle of what the PS4 can do for this brand. But it seems the game delivered in many ways, and perhaps it is the pinnacle.
 
It shatters my (wallet-biased) hope that BF4 was not the pinnacle of what the PS4 can do for this brand. But it seems the game delivered in many ways, and perhaps it is the pinnacle.
It's still cross platform. Wait for Battlefront on PS4 for judgement.
 
Cross generation surely?

Cross generation or not, async shader usage or not, BF4 was DICE's first attempt at development on the PS4, a platform that had just launched at the time and thus still had fairly immature dev tools.

There's simply no way that BF4 can be considered the very best that DICE can do on the PS4 platform. I'll literally eat my sock if that turns out to be the case.
 
Is their second attempt better?
Hardline? No. But they may not have been given the tools; it may have been just up to Visceral to get the job done, which, from a frame rate perspective, as I understand it they did fix up. BioWare put in their own graphical upgrades, but those are still a far cry from DICE's new features. Once again, one code base across five platforms is a ton of work.

I think SW:BF will be the proper deliverable to keep an eye out for.
 
Cross generation or not, async shader usage or not, BF4 was DICE's first attempt at development on the PS4, a platform that had just launched at the time and thus still had fairly immature dev tools.

There's simply no way that BF4 can be considered the very best that DICE can do on the PS4 platform. I'll literally eat my sock if that turns out to be the case.

Agreed. While I imagine DICE got a head start in delving deeper and exposing additional performance thanks to Mantle on the PC and the lower level API of the PS4, I doubt BF4 represents anything more than a good first step.

Hardline's performance is probably due to DICE not wanting to pour additional resources into improving their current DX11/PS4 path, and to Hardline releasing too early to take advantage of Frostbite's transition to DX12.
 
Once Xbox 360 and PS3 drop off, DICE can focus specifically on compute shaders.
Asynchronous compute shaders might be in full swing, but I doubt it; from our understanding, pre-GCN and many Nvidia cards do not support async compute shaders. So there will be a limit on how far they want to go with that, unless they want to split the PC platform. I am curious as to how they will attempt to handle this on PC.
 
Once Xbox 360 and PS3 drop off, DICE can focus specifically on compute shaders.
Asynchronous compute shaders might be in full swing, but I doubt it; from our understanding, pre-GCN and many Nvidia cards do not support async compute shaders. So there will be a limit on how far they want to go with that, unless they want to split the PC platform. I am curious as to how they will attempt to handle this on PC.


Speak of the devil :eek: http://www.neogaf.com/forum/showthread.php?t=1025359
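
For what it's worth, as far as I know D3D12 has no direct "async compute supported" capability bit, so one common heuristic on PC (sketched below in Vulkan purely as an illustration; none of this is from the thread) is to look for a compute-only queue family, i.e. a queue family with the COMPUTE bit set and the GRAPHICS bit clear.

```cpp
// Sketch only: probing for a dedicated (compute-only) queue family with
// Vulkan, a common heuristic for hardware that can actually run compute
// work alongside graphics. Being allowed to create a compute queue is not
// the same thing as the hardware executing it concurrently.
// Link against the Vulkan loader (e.g. -lvulkan / vulkan-1.lib).
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

int main()
{
    VkApplicationInfo app = { VK_STRUCTURE_TYPE_APPLICATION_INFO };
    app.apiVersion = VK_API_VERSION_1_0;
    VkInstanceCreateInfo ici = { VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO };
    ici.pApplicationInfo = &app;
    VkInstance instance;
    if (vkCreateInstance(&ici, nullptr, &instance) != VK_SUCCESS) return 1;

    uint32_t gpuCount = 0;
    vkEnumeratePhysicalDevices(instance, &gpuCount, nullptr);
    std::vector<VkPhysicalDevice> gpus(gpuCount);
    vkEnumeratePhysicalDevices(instance, &gpuCount, gpus.data());

    for (VkPhysicalDevice gpu : gpus) {
        uint32_t famCount = 0;
        vkGetPhysicalDeviceQueueFamilyProperties(gpu, &famCount, nullptr);
        std::vector<VkQueueFamilyProperties> fams(famCount);
        vkGetPhysicalDeviceQueueFamilyProperties(gpu, &famCount, fams.data());

        bool computeOnlyFamily = false;
        for (const VkQueueFamilyProperties& f : fams)
            if ((f.queueFlags & VK_QUEUE_COMPUTE_BIT) &&
                !(f.queueFlags & VK_QUEUE_GRAPHICS_BIT))
                computeOnlyFamily = true;

        std::printf("dedicated compute queue family: %s\n",
                    computeOnlyFamily ? "yes" : "no");
    }
    vkDestroyInstance(instance, nullptr);
}
```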
 
As I understood it, Hardline's smoother framerate came at a visual cost (which, by the way, is the correct choice to make).
The maps are smaller, their complexity is diminished, and the trademark Battlefield destruction is mostly absent.

Not sure if the engine was tweaked at all, or if the sliders were just turned down (again, thank god they chose framerate over spectacle, although it doesn't feel like Battlefield).
 
Is their second attempt better?

Hardline was developed by Visceral Games, not by the same development team that worked on BF4.

DICE provided the engine and tools for the game, but that's it. DICE is no more responsible for BF: Hardline's visual fidelity than Epic Games is for the plethora of games developed on UE3/4.

So in answer to your question, we haven't seen DICE's second attempt yet. And equally, BF: Hardline is Visceral's first attempt at a game for current gen consoles, and one built on an engine that they didn't even develop themselves (a situation that always comes with its own set of issues).
 
In the context of what Asynchronous Compute means in this thread, as a label for providing the GPU with the capability of handling independent compute streams: no.

"Obviously a major difference is the absence of SPUs in the hardware, so parallelism models have shifted to utilise the three cores," Porter affirms. "We started with the PC framework as reference here: SPU job deployment is modelled as a synchronous process on that platform, with the actual task functionality identical to the PS3. That's obviously a lot slower, but provided the hook for a custom job manager on Vita. Over time, we've migrated most of the original SPU jobs onto the Vita cores and returned back to a fully asynchronous model."

This is describing the porting process from the PS3 to the Vita, where there were no SPUs but more than one standard core. The job system was ported over in a conservative form that handled jobs sequentially. This was then developed into a parallel system where jobs could be executed across the CPU cores independently, as had been done on the PS3.

The capability to run multiple jobs or threads without them being constrained by one another's sequencing is not a GPU invention.
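
To make the job-model distinction concrete, here is a minimal sketch (my own, with invented names; not the job manager described in the quote above) of the same job list run synchronously on one thread versus asynchronously across the available cores.

```cpp
// Minimal sketch only (invented names): the same job list run two ways.
// runSynchronous() models the conservative port, where jobs execute one
// after another on the calling thread. runAsynchronous() models the final
// system, where independent jobs are spread across the CPU cores and are
// not constrained by one another's sequencing.
#include <functional>
#include <future>
#include <vector>

using Job = std::function<void()>;

void runSynchronous(const std::vector<Job>& jobs)
{
    for (const Job& job : jobs)
        job();                                   // one at a time, in order
}

void runAsynchronous(const std::vector<Job>& jobs)
{
    std::vector<std::future<void>> pending;
    for (const Job& job : jobs)                  // each job may run on any core
        pending.push_back(std::async(std::launch::async, job));
    for (std::future<void>& f : pending)
        f.get();                                 // join; completion order is irrelevant
}

int main()
{
    std::vector<Job> jobs;
    for (int i = 0; i < 8; ++i)
        jobs.push_back([i] {
            volatile long acc = 0;
            for (long k = 0; k < 1'000'000; ++k) acc += i;
        });

    runSynchronous(jobs);
    runAsynchronous(jobs);
}
```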
 