Asynchronous compute: what are the benefits?

ISS uses asynchronous compute to process particles. The compute shader is dispatched to the GPU by the CPU. Asynchronous compute doesn't come for free on the CPU or GPU. There's nothing to suggest there's a huge amount of idle GPU time that can easily be exploited with asynchronous compute. Any significant amount of processing done in a compute shader, synchronously or asynchronously, will be processing time unavailable to other shaders/algorithms. If you were to do AI on the GPU, that's GPU time unavailable for graphics rendering. The great thing about GPGPU is that you can do things that traditional pixel/vertex shaders cannot, and it should allow for some more efficient rendering (or so I've read).

There can actually be quite a bit of "idle" time on a GPU, at least if you look at the resources used by compute shaders. Even if you ignore rendering phases where the ALUs aren't heavily used to begin with (for instance, depth-only rendering for shadow maps), there's typically quite a bit of time where the GPU has to sync/stall in order to allow subsequent rendering passes to run in lock-step. Async compute offers a convenient way of executing shaders that bypass all of that syncing (hence the "async" part of its name), which allows you to "fill up" that idle time with compute jobs. I don't really want to go into too many specifics due to NDA, but my friends at Q-Games are a bit more cavalier and have shared some of their profiling data in these slides (see slide 83).

Obviously it depends quite a bit on what kinds of compute jobs you're running and what else is happening concurrently on the GPU. However, it certainly isn't as cut and dried as "running async compute shaders always takes away processing time from graphics", if that's what you're suggesting.
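
To make that concrete with a PC-style API (the consoles use their own low-level APIs, so this is only an analogy): in D3D12 an async compute "pipe" is simply a second command queue created next to the graphics queue. Below is a rough, untested sketch; the variable names are made up and nothing in it is taken from a shipped engine.

// Minimal sketch, assuming the Windows SDK headers and d3d12.lib are
// available; no error handling, names are illustrative only.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
#pragma comment(lib, "d3d12.lib")
using Microsoft::WRL::ComPtr;

int main()
{
    ComPtr<ID3D12Device> device;
    D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device));

    // The usual graphics queue that draw/dispatch command lists go to.
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    ComPtr<ID3D12CommandQueue> gfxQueue;
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));

    // A second, compute-only queue. Work submitted here has no implicit
    // ordering against the graphics queue, so the hardware is free to
    // schedule it into the bubbles where the graphics passes are syncing.
    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    ComPtr<ID3D12CommandQueue> computeQueue;
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

    // Any ordering you actually need between the two queues is expressed
    // explicitly with an ID3D12Fence (Signal on one queue, Wait on the other).
    return 0;
}

Very roughly, the ACEs and compute command processors discussed later in this thread are the hardware front-ends that consume queues like computeQueue above.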
 
I think the contentious issue is more the idea that using async compute to "offload" work from the CPU can be done without using resources that could also be used, via async, for graphics (or anything else).
 
I think the contentious issue is more the idea that using async compute to "offload" work from the CPU can be done without using resources that could also be used, via async, for graphics (or anything else).

It doesn't really matter. If the main bottleneck is the CPU, then your GPU resources are going to waste, producing a very pretty slideshow.
 
GPGPU doesn't help if the CPU and bandwidth are the bottleneck. The CPU eats up a big part of the available bandwidth when under stress, so the GPU starves because the rest of the bandwidth (about 100 GB/s) is needed for graphics calculations.
That is why I said "barring the bottlenecks".
But if they can actually use the GPU to do a CPU task, then with the CPU being used less there would be more bandwidth for the GPU, because there is less contention from the CPU.
 
I hope compute-centric game engines will be there for the 2015 holidays... The wait is long...

It is sad to see two GPU-centric consoles not showing their potential.
 
That is why I said "barring the bottlenecks".
But if they can actually use the GPU to do a CPU task, then with the CPU being used less there would be more bandwidth for the GPU, because there is less contention from the CPU.

I thought one of their experiments was to see if they can make GPU scheduling and jobs more autonomous? If successful, that should free up some of the CPU dependency (but not all).

I hope compute-centric game engines will be there for the 2015 holidays... The wait is long...

It is sad to see two GPU-centric consoles not showing their potential.

One of the questions in my mind is how Sony sees and positions itself in the PS4 developer ecosystem.

During the PS3 era, Mark Cerny at first was thinking of keeping his Cell expertise to "themselves". He saw it as a competitive advantage over other studios. They later changed position when they found out everyone was having serious trouble: if Sony didn't help out, the PS3 would suffer as a platform.

So now that the PS4 is easy to develop for, and the low-level tools have been available since day 1, will they keep their approach to themselves? Or will they share the more advanced techniques? :)

Granted, cross-platform developers should be very familiar with AMD tech.
 
With PS4, Xbox One and PC (Mantle, DX12) all supporting asynchronous compute, there should be no shortage of knowledge about algorithms well suited to GPUs.
 
http://m.neogaf.com/showthread.php?t=1009066&page=1

Very good thread; Dylan Cuthbert gives some clarification about async compute.

some more clarifications (just on terminology)

To oversimplify a bit, the compute units (CUs) are the things that run your shader. So whether you are doing graphics shaders or compute, they are all executing on the compute units. The async compute pipes are there to give you a way to supply more work to the GPU. Imagine your graphics work looks like this:
  1. write to a texture
  2. wait for it to finish
  3. use texture as input for the next shader

In a naive implementation, that second shader can't run until the first finishes. That means towards the end there will be a lot of the GPU hardware sitting around idle until the previous shader finishes. The compute pipes give you a way of supplying more work to the GPU that can fill in the gaps left by graphics.

Sorry if that's all a bit basic and oversimplified, but I'm not sure what level of detail is best to post.

Regarding CPU sync, we have almost none. We use the CPU to kick GPU work and the CPU never stalls waiting for the GPU during our "frame". We do have some weirdness where an end-of-pipe interrupt wakes up our vsync thread, which sleeps waiting for vsync and then writes a label allowing the GPU to continue, but that's just because work after the label may write to the current display buffer and we want to flip away from it before continuing. We basically tried to make everything as async as possible to avoid render and main thread involvement of any kind and minimize stalls.

Finally, while you can build command buffers and kick them from the GPU if you're clever about it, that is a topic for another day.

They are early adopters of async compute (see the sketch below for roughly how the pattern he describes maps onto a PC API).

http://m.neogaf.com/showpost.php?p=156110542
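
To illustrate the "write a texture, wait, read the texture" pattern he describes and the "CPU only kicks work, it never stalls" part, here is a hedged, D3D12-flavoured sketch of just the submission and synchronisation side. The split into earlyGfx / asyncCompute / lateGfx lists and the function name are hypothetical, the recording of the command lists themselves is omitted, and queue/fence creation is as in the earlier sketch.

// Sketch only: cross-queue ordering is expressed with a GPU-side fence so
// that independent compute work can overlap the graphics passes.
#include <windows.h>
#include <d3d12.h>

void SubmitFrame(ID3D12CommandQueue* gfxQueue,
                 ID3D12CommandQueue* computeQueue,
                 ID3D12Fence* fence, UINT64& fenceValue,
                 ID3D12CommandList* const* earlyGfx, UINT numEarlyGfx,    // passes that don't need the compute results
                 ID3D12CommandList* const* asyncCompute, UINT numCompute, // e.g. a particle update
                 ID3D12CommandList* const* lateGfx, UINT numLateGfx)      // passes that consume the compute results
{
    // 1. Kick the async compute work. Nothing waits for it yet, so the GPU
    //    can slot these dispatches into gaps the graphics passes leave.
    computeQueue->ExecuteCommandLists(numCompute, asyncCompute);
    computeQueue->Signal(fence, ++fenceValue);

    // 2. Kick the graphics passes that are independent of the compute work.
    //    The "write texture -> wait -> read texture" dependencies live as
    //    barriers inside these lists; those are the bubbles being filled.
    gfxQueue->ExecuteCommandLists(numEarlyGfx, earlyGfx);

    // 3. Before the passes that consume the compute results, make the
    //    graphics queue wait on the fence GPU-side. The CPU never blocks
    //    here; it just records the dependency and moves on.
    gfxQueue->Wait(fence, fenceValue);
    gfxQueue->ExecuteCommandLists(numLateGfx, lateGfx);
}

On the consoles the same shape is expressed through their own APIs and labels rather than D3D12 fences, but the idea of keeping the CPU out of the wait is presumably the same.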
 
The Xbone GPU is not optimized for async compute like the PS4 GPU is.
???
You mean because of the ACEs?
They don't really matter. They might squeeze a bit more out of the GPU, but the 2 command processors do something similar. They just increase the chances that you can use resources that might otherwise sit unused, so you can use the GPU more efficiently.

What really worries me is that if the GPU is used more efficiently, it will also consume more power, which in turn increases the heat. And the PS4 is already hot and loud enough.
I don't even know if it can deliver enough power for the GPU, CPU and memory if everything is under real pressure. The PS4 already uses ~140 W; how high can it go?
 
The Xbone GPU is not optimized for async compute like the PS4 GPU is.

I think that depends on what task you're attempting to do.
I'm curious as to what you defined as an optimized async compute experience for PS4. If it's just the larger number of ACE queues, then I don't think that necessarily means Xbox is not optimized for async compute (wrt its own profile) - but I can't fault the idea that more queues would therefore mean more async compute (for PS4).

edit: nvm - hmm, this is a different approach - time to see what MS did, or didn't.
  • "First, we added another bus to the GPU that allows it to read directly from system memory or write directly to system memory, bypassing its own L1 and L2 caches. As a result, if the data that's being passed back and forth between CPU and GPU is small, you don't have issues with synchronization between them anymore. And by small, I just mean small in next-gen terms. We can pass almost 20 gigabytes a second down that bus. That's not very small in today’s terms -- it’s larger than the PCIe on most PCs!
  • "Next, to support the case where you want to use the GPU L2 cache simultaneously for both graphics processing and asynchronous compute, we have added a bit in the tags of the cache lines, we call it the 'volatile' bit. You can then selectively mark all accesses by compute as 'volatile,' and when it's time for compute to read from system memory, it can invalidate, selectively, the lines it uses in the L2. When it comes time to write back the results, it can write back selectively the lines that it uses. This innovation allows compute to use the GPU L2 cache and perform the required operations without significantly impacting the graphics operations going on at the same time -- in other words, it radically reduces the overhead of running compute and graphics together on the GPU."
  • Thirdly, said Cerny, "The original AMD GCN architecture allowed for one source of graphics commands, and two sources of compute commands. For PS4, we’ve worked with AMD to increase the limit to 64 sources of compute commands -- the idea is if you have some asynchronous compute you want to perform, you put commands in one of these 64 queues, and then there are multiple levels of arbitration in the hardware to determine what runs, how it runs, and when it runs, alongside the graphics that's in the system."
 
I think that depends on what task you're attempting to do.
I'm curious as to what you defined as an optimized async compute experience for PS4. If it's just the larger number of ACE queues, then I don't think that necessarily means Xbox is not optimized for async compute (wrt its own profile) - but I can't fault the idea that more queues would therefore mean more async compute (for PS4).

It is not only the 8 ACEs; there are also the volatile bits preventing cache thrashing, and the Onion+ bus bypassing the GPU caches for synchronisation.

All GCN GPUs are pretty good at async compute, like in the PS4 and XB1.

Async compute is useful on PC too.
 
This time there is no exotic hardware; the "secret sauce" is the same on PS4 and XB1, and it works for PC too. It is a good thing...
 
???
You mean because of the ACEs?
They don't really matter. They might squeeze a bit more out of the GPU, but the 2 command processors do something similar. They just increase the chances that you can use resources that might otherwise sit unused, so you can use the GPU more efficiently.

What really worries me is that if the GPU is used more efficiently, it will also consume more power, which in turn increases the heat. And the PS4 is already hot and loud enough.
I don't even know if it can deliver enough power for the GPU, CPU and memory if everything is under real pressure. The PS4 already uses ~140 W; how high can it go?
Are you saying an engineering company like Sony is in the business of wasting APU space on optimizations that "don't really matter" (especially 4x the number of async compute pipelines)? That doesn't seem logical.

These consoles are quite quiet and a LOT cooler than last-gen launch consoles. The GPU isn't going to draw more power than its max (judged from theoretical max performance). Async compute is just making the GPU more capable of meeting and staying closer to that max.

Having 4x more opportunities to get something worked on gives you a much better chance of getting GPU time back. It just makes sense.
 
???
You mean because of the ACEs?
They don't really matter. They might squeeze a bit more out of the GPU, but the 2 command processors do something similar. They just increase the chances that you can use resources that might otherwise sit unused, so you can use the GPU more efficiently.

What really worries me is that if the GPU is used more efficiently, it will also consume more power, which in turn increases the heat. And the PS4 is already hot and loud enough.
I don't even know if it can deliver enough power for the GPU, CPU and memory if everything is under real pressure. The PS4 already uses ~140 W; how high can it go?

Because the console shipped with this in mind, its cooling and power profile are rated for it.
 
Are you saying an engineering company like Sony is in the business of wasting APU space on optimizations that "don't really matter" (especially 4x the number of async compute pipelines)? That doesn't seem logical.
You quoted me out of context. "That doesn't really matter" was related to the statement "The Xbone GPU is not optimized for async compute like the PS4 GPU is."
Yes, the ACEs will help to use the GPU more efficiently. But the 2 compute command processors will do similar things. So you just can't say that this one is more optimized than that one.
 
You quoted me out of context. "That doesn't really matter" was related to the statement "The Xbone GPU is not optimized for async compute like the PS4 GPU is."
Yes, the ACEs will help to use the GPU more efficiently. But the 2 compute command processors will do similar things.

Xbox One has two ACEs. From what I understand, the ACEs are just relabelled as Compute Command Processors. MS did claim that they customized them to be better somehow, likely at scheduling, but the number of available compute queues is only 16. But I'm unsure at this moment how many queues are actually required as games continue to evolve.
 