Asynchronous Compute: what are the benefits?

Graphics workloads leave the GPU with quite a lot of spare cycles. If these are used for compute, you effectively get them for free.

Only if they consistently leave the GPU with spare cycles every frame, which doesn't happen, because when it does the graphics programmers quickly munch up that remaining ALU anyway. Realistically, the amount of spare ALU bounces around from frame to frame in your typical game, but you have to account for the worst-case scenario to maintain a stable framerate. So if there are 3ms free on one frame, you can't just assume you will have 3ms of GPU to use for GPGPU every frame, because a few frames later the GPU may suddenly be maxed out again. GPGPU just means that the graphics coders now have to share ALU with the non-graphics coders, whereas in the past we greedily had it all to ourselves.

EDIT: I think part of the confusion could be from the 360 days, where you could slot various workloads onto its GPU for free if they were non-competing workloads. So on ALU-heavy shaders we could shove in workloads that were sampler-heavy and get some stuff almost for free. Hence why I would combine an ALU-heavy post-process step with a sampler-heavy post-process step into the same shader, and they would live in relative harmony on the GPU. In this case though, GPGPU and resolution are both heavily competing for the same resource, namely ALU, which is why they clash.
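
A rough sketch of the budgeting argument above, with hypothetical numbers and names: gate optional compute work on the worst recent GPU frame time, not on whatever happened to be free last frame.

```cpp
// Minimal host-side sketch (all names and numbers are assumptions): only
// schedule optional GPGPU work if the *worst* GPU frame time in a recent
// window still leaves headroom, instead of trusting last frame's spare ms.
#include <algorithm>
#include <deque>

struct GpuBudget {
    std::deque<float> history;                   // measured GPU frame times, in ms
    static constexpr size_t kWindow = 120;       // ~2 seconds at 60 fps
    static constexpr float  kFrameBudgetMs = 16.6f;

    void record(float gpuMs) {
        history.push_back(gpuMs);
        if (history.size() > kWindow) history.pop_front();
    }

    // Headroom based on the worst case in the window: 3 ms free this frame
    // does not mean 3 ms free every frame.
    float worstCaseHeadroomMs() const {
        float worst = 0.0f;
        for (float t : history) worst = std::max(worst, t);
        return std::max(0.0f, kFrameBudgetMs - worst);
    }

    bool canRunOptionalComputeMs(float costMs) const {
        return worstCaseHeadroomMs() >= costMs;
    }
};
```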
 
The important question is: What is GPGPU?

Is animation GPGPU? Is occlusion culling GPGPU? Is scene setup (matrix & constant buffer update) GPGPU? Is texture transcoding GPGPU? Traditionally games have been doing all of these tasks purely on the CPU (and many still do), but modern engines can do all of these on the GPU (in the compute pipeline). You can free up at least two whole CPU cores for gameplay code if you move these graphics engine tasks to the GPU.

Is lighting or particle rendering GPGPU if I do them in a compute shader (not using any fixed function graphics hardware)? What about raytraced reflections or octree traversals (traditional CPU tasks)?
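
To make "traditional CPU tasks moved into the compute pipeline" concrete, here is a minimal CUDA-style sketch (not from any engine mentioned here; the data layout and names are assumptions) of per-object frustum culling of bounding spheres run as a GPU job instead of on a CPU core:

```cpp
#include <cuda_runtime.h>

struct float4s { float x, y, z, w; };   // xyz = sphere center, w = radius
// planes: 6 frustum planes as (nx, ny, nz, d), normals pointing inward.

__global__ void cullSpheres(const float4s* __restrict__ bounds,
                            const float4s* __restrict__ planes,
                            unsigned int* __restrict__ visible,
                            int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count) return;

    float4s s = bounds[i];
    unsigned int inside = 1;
    for (int p = 0; p < 6; ++p) {
        float4s pl = planes[p];
        float dist = pl.x * s.x + pl.y * s.y + pl.z * s.z + pl.w;
        if (dist < -s.w) { inside = 0; break; }   // fully outside this plane
    }
    visible[i] = inside;   // consumed by the CPU or by an indirect draw path
}

// launch example:
// cullSpheres<<<(count + 255) / 256, 256>>>(bounds, planes, visible, count);
```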
 
Only if they consistently leave the GPU with spare cycles every frame, which doesn't happen, because when it does the graphics programmers quickly munch up that remaining ALU anyway. Realistically, the amount of spare ALU bounces around from frame to frame in your typical game, but you have to account for the worst-case scenario to maintain a stable framerate. So if there are 3ms free on one frame, you can't just assume you will have 3ms of GPU to use for GPGPU every frame, because a few frames later the GPU may suddenly be maxed out again. GPGPU just means that the graphics coders now have to share ALU with the non-graphics coders, whereas in the past we greedily had it all to ourselves.

EDIT: I think part of the confusion could be from the 360 days, where you could slot various workloads onto its GPU for free if they were non-competing workloads. So on ALU-heavy shaders we could shove in workloads that were sampler-heavy and get some stuff almost for free, etc. In this case though, GPGPU and resolution are both heavily competing for the same resource, namely ALU, which is why they clash.
No, the 'confusion' comes from AMD and Sony's discussion of async compute and the known applications, including games that profiled their graphics workloads and found that async compute was able to fit in between them. E.g. slide 82: over 1/7th (5ms) of the frame time saved.

You seem to be confusing the issue with general GPGPU and are missing the 'async' part and the ACEs.
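
The 'async' part is the key: independent work submitted on a separate queue (an ACE on these consoles) that the hardware can overlap with gaps in the graphics queue. As a loose analogy only, not console code, CUDA streams express the same two-independent-queues idea:

```cpp
#include <cuda_runtime.h>

__global__ void graphicsLikeKernel(float* a, int n) {   // stand-in for the graphics workload
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = a[i] * 0.5f + 1.0f;
}

__global__ void asyncComputeKernel(float* b, int n) {   // independent compute job
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) b[i] = b[i] * b[i];
}

int main() {
    const int n = 1 << 20;
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    cudaStream_t gfxQueue, computeQueue;        // two independent queues, loosely
    cudaStreamCreate(&gfxQueue);                // analogous to the graphics queue
    cudaStreamCreate(&computeQueue);            // plus an ACE-fed compute queue

    // Work in different streams has no ordering between streams, so the GPU is
    // free to run the compute job in whatever gaps the other workload leaves.
    graphicsLikeKernel<<<n / 256, 256, 0, gfxQueue>>>(a, n);
    asyncComputeKernel<<<n / 256, 256, 0, computeQueue>>>(b, n);

    cudaDeviceSynchronize();
    cudaFree(a); cudaFree(b);
    return 0;
}
```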
 
1080p definitely needs AA when viewed on large screens. 50"+ TVs are dirt cheap nowadays, and people buy them for the same living rooms that used to have 28" CRTs. 1080p on a modern big-screen TV produces roughly the same pixel size as 480p did some years ago on an average TV set.

Post AA is not enough for 1080p, not even on a small computer monitor. Edge crawling is still too distracting, and it's an even bigger problem nowadays since geometry complexity has increased (more small details and more draw distance -> more high-contrast edges).

I'd say that (max) four geometry subsamples per pixel is a good compromise when combined with a smart custom resolve. Sampling doesn't need to be brute force. You don't need the same amount of sampling information on every screen location (not always even a single sample per pixel). It's the high contrast areas that matter. I personally feel that every console game should output at native 1080p. Scalers always cause image quality degradation. But I want to emphasize that this doesn't mean that the game should brute force sample everything at the same frequency (fixed distribution of ~2M samples per frame). Also throwing away all the work done in the previous frames is stupid. Game developers should definitely learn from the video codecs. 1080p wouldn't be possible if the video was uncompressed and no data was reused (every pixel stored again for every frame at full quality).

So I believe 1080p is still the way to go (even at 60 fps).
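
Not sebbbi's actual resolve, but as one illustration of a "smart custom resolve" over four geometry subsamples, here is a hedged sketch that down-weights very bright subsamples to reduce high-contrast edge flicker; the kernel, names and weighting are assumptions:

```cpp
#include <cuda_runtime.h>

struct float3s { float x, y, z; };

__device__ float luma(float3s c) { return 0.299f * c.x + 0.587f * c.y + 0.114f * c.z; }

// Four subsamples per pixel, laid out as subsamples[pixel * 4 + s].
__global__ void customResolve4x(const float3s* __restrict__ subsamples,
                                float3s* __restrict__ resolved,
                                int pixelCount)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= pixelCount) return;

    float3s sum = {0.0f, 0.0f, 0.0f};
    float wSum = 0.0f;
    for (int s = 0; s < 4; ++s) {
        float3s c = subsamples[p * 4 + s];
        float w = 1.0f / (1.0f + luma(c));   // weight bright outliers down
        sum.x += c.x * w; sum.y += c.y * w; sum.z += c.z * w;
        wSum += w;
    }
    float3s out = {sum.x / wSum, sum.y / wSum, sum.z / wSum};
    resolved[p] = out;
}
```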

+1 Great post sebbbi.
 
No, the 'confusion' comes from AMD and Sony's discussion of async compute and the known applications, including games that profiled their graphics workloads and found that async compute was able to fit in between them. E.g. slide 82: over 1/7th (5ms) of the frame time saved.

You seem to be confusing the issue with general GPGPU and are missing the 'async' part and the ACEs.

Async or not, how does that remove the clash between resolution and GPGPU workloads? I don't follow. Whatever that presentation lets you do, you can do more of it if you drop resolution. Hence the conundrum: at some point the choice has to be made regardless, either 1080p or better visuals. Those two goals compete for the same resources, meaning one or both have to be compromised regardless of if, when or how GPGPU is used.


The important question is: What is GPGPU?

Is animation GPGPU? Is occlusion culling GPGPU? Is scene setup (matrix & constant buffer update) GPGPU? Is texture transcoding GPGPU? Traditionally games have been doing all of these tasks purely on the CPU (and many still do), but modern engines can do all of these on the GPU (in the compute pipeline). You can free up at least two whole CPU cores for gameplay code if you move these graphics engine tasks to the GPU.

Is lighting or particle rendering GPGPU if I do them in a compute shader (not using any fixed function graphics hardware)? What about raytraced reflections or octree traversals (traditional CPU tasks)?

That's the thing: for this gen it seems to me the money shot is to use GPGPU to free up CPU cores, not to increase resolution.
 
30 fps (or 'uncapped' 40 fps) and limiting some aspects of simulation complexity, probably. :(
There's still async and direct GPGPU, as well as remote compute processing ;)

Though the former is supported by more platforms than the latter, is likely easier to code, is more reliable, and is easier to deploy.
 
I truly believe Naughty Dog, SSMS, and a few others will skillfully prove that certain (maybe all) CPU tasks can work perfectly within the GPU environment without hurting GPU performance... or reducing the quality of assets within the graphics pipeline. We shall see...
 
Aren't the CUs supposed to compensate for the CPU bottleneck once devs get the hang of it? That's the impression I got from Cerny's comments.
 
I truly believe Naughty Dog, SSMS, and a few others will skillfully prove that certain (maybe all) CPU tasks can work perfectly within the GPU environment without hurting GPU performance... or reducing the quality of assets within the graphics pipeline. We shall see...

This doesn't sound right. Regardless of how many things are suitable for computing on the GPU, you're always going to be taking GPU time away from graphics. It sounds like the bigger win will come from moving away from traditional pixel/vertex shaders to more efficient compute shader algorithms that free up GPU time, so that there is spare capacity left on the GPU for work offloaded from the CPU.
 
This doesn't sound right. Regardless of how many things are suitable for computing on the GPU, you're always going to be taking GPU time away from graphics. It sounds like the bigger win will come from moving away from traditional pixel/vertex shaders to more efficient compute shader algorithms that free up GPU time, so that there is spare capacity left on the GPU for work offloaded from the CPU.

From my understanding, slotting CPU-centric tasks into proper time slices (GPU idle time) does work quite well, without affecting basic GPU needs.
 
If you have GPU time to do (traditionally) CPU stuff, you have GPU time to do graphics stuff.

That's it. If you think anything else, it's a misunderstanding. Time is time.

The hand you use to eat nachos is a hand you can't use to scratch your balls.
 
If you have GPU time to do (traditionally) CPU stuff, you have GPU time to do graphics stuff.

That's it. If you think anything else, it's a misunderstanding. Time is time.

The hand you use to eat nachos is a hand you can't use to scratch your balls.
You can. You probably shouldn't, but you definitely can. Unless you're such a ferocious eater that you eat non-stop and your hand has no downtime to do something else.

Anyway, PS4 does have more CUs. From what I see, graphics-wise (in GTA5 and ACU), PS4 wasn't 40% prettier than X1, so it could probably afford more GPU utilization.
 
You're confusing available resources on PS4 with how good you think XBone is.

You can't see blast processing in action, and allocate it a percentage.
 
From my understanding, slotting CPU-centric tasks into proper time slices (GPU idle time) does work quite well, without affecting basic GPU needs.

Example? Honest question. I have seen this brought up for two years and have yet to see where it actually works. Can it help offset the very weak CPUs that power current-gen hardware? Sure, but the idea that physics, AI, and game-world logic will make big progress thanks to GPGPU programming is still in its infancy. GPUs still suck at branching code, and the current-gen consoles have weak CPUs. Expect prettier versions of last-gen games, and little more.
 
Sure, but the idea that physics, AI, and game-world logic will make big progress thanks to GPGPU programming is still in its infancy.

Read the Infamous Second Son PDF and the tech articles about how Sucker Punch achieved the awesome particle physics (among other things) in ISS. There are also some great articles on Resogun... and I'm pretty sure The Tomorrow Children is using async compute as well.
 
Read the Infamous Second Son PDF and the tech articles about how Sucker Punch achieved the awesome particle physics (among other things) in ISS. There are also some great articles on Resogun... and I'm pretty sure The Tomorrow Children is using async compute as well.

ISS uses asynchronous compute to process particles. The compute shader is dispatched to the GPU by the CPU. Asynchronous compute doesn't come for free on the CPU or GPU. There's nothing that suggests there's a huge amount of idle GPU time that can easily be exploited with asynchronous compute. Any significant amount of processing done in a compute shader, synchronously or asynchronously, will be processing time unavailable to other shaders/algorithms. If you were to do AI on the GPU, that's GPU time unavailable for graphics rendering. The great thing about GPGPU is you can do things that traditional pixel/vertex shaders cannot, and it should allow for some more efficient rendering (or so I've read).
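
To ground the "doesn't come for free" point: the CPU still builds and submits the dispatch, and the frame eventually has to synchronize on its result. A minimal sketch of an asynchronously submitted particle step (nothing here is Sucker Punch's code; the struct and names are assumptions):

```cpp
#include <cuda_runtime.h>

struct Particle { float px, py, pz, vx, vy, vz; };

__global__ void integrateParticles(Particle* p, int n, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    p[i].vy -= 9.81f * dt;             // gravity
    p[i].px += p[i].vx * dt;
    p[i].py += p[i].vy * dt;
    p[i].pz += p[i].vz * dt;
}

// CPU-side cost: building the arguments, launching, and later waiting on the
// result before the particles can be rendered.
void updateParticlesAsync(Particle* devParticles, int n, float dt,
                          cudaStream_t computeStream)
{
    integrateParticles<<<(n + 255) / 256, 256, 0, computeStream>>>(devParticles, n, dt);
    // ...the frame must eventually synchronize (event/fence) on computeStream,
    // which is part of the non-zero cost of "async".
}
```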
 
You're confusing available resources on PS4 with how good you think XBone is.

You can't see blast processing in action, and allocate it a percentage.
I'm commenting more on the nachos thing, which is clearly a wrong analogy, since you can do what you said you can't do. Of course, whether you can do it and whether you should do it are two different things.

Anyway, my comment was more about the ACU case, where the devs clearly leave some GPU resources unused on PS4. Maybe if they could utilize those extra resources (barring bottlenecks in other parts), it could help the game achieve a constant 30fps on PS4. Yes, the GPU might be bad at the things they are trying to do on the CPU, but it's sitting there probably doing nothing. Might as well use it.
 
Example? Honest question. I have seen this brought up for two years and have yet to see where it actually works. Can it help offset the very weak CPUs that power current-gen hardware? Sure, but the idea that physics, AI, and game-world logic will make big progress thanks to GPGPU programming is still in its infancy. GPUs still suck at branching code, and the current-gen consoles have weak CPUs. Expect prettier versions of last-gen games, and little more.

Not all algorithms are GPGPU-friendly, but raycasting for visibility (KZ SF slide*), pathfinding, cloth physics, collision detection (Ubisoft slide) and particle physics are GPU-friendly, and probably other algorithms too. But before transferring these algorithms from CPU to GPU, the devs need to move all those draw calls from CPU to GPU and give more room to other calculations on the CPU...

*Guerrilla wants to move this calculation to the GPU. It is a bottleneck in KZ SF because it is calculated on the CPU. From the Cerny interview, raycasting is useful for sound calculations too...
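
As a hedged illustration of raycasting for visibility as a data-parallel job (not Guerrilla's implementation; the layout and names are assumptions), one visibility ray per thread against an axis-aligned box via the standard slab test:

```cpp
#include <cuda_runtime.h>
#include <math.h>

struct Ray  { float ox, oy, oz, dx, dy, dz; };   // origin + direction
struct AABB { float minx, miny, minz, maxx, maxy, maxz; };

// One visibility ray per thread; hit[i] = 1 if the ray reaches the box.
__global__ void rayVsBox(const Ray* __restrict__ rays, AABB box,
                         unsigned char* __restrict__ hit, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count) return;
    Ray r = rays[i];

    // Standard slab test (assumes direction components are non-zero).
    float tx1 = (box.minx - r.ox) / r.dx, tx2 = (box.maxx - r.ox) / r.dx;
    float ty1 = (box.miny - r.oy) / r.dy, ty2 = (box.maxy - r.oy) / r.dy;
    float tz1 = (box.minz - r.oz) / r.dz, tz2 = (box.maxz - r.oz) / r.dz;

    float tmin = fmaxf(fmaxf(fminf(tx1, tx2), fminf(ty1, ty2)), fminf(tz1, tz2));
    float tmax = fminf(fminf(fmaxf(tx1, tx2), fmaxf(ty1, ty2)), fmaxf(tz1, tz2));

    hit[i] = (tmax >= fmaxf(tmin, 0.0f)) ? 1 : 0;
}
```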
 
Last edited:
I'm commenting more on the nachos thing, which is clearly a wrong analogy, since you can do what you said you can't do. Of course, whether you can do it and whether you should do it are two different things.

Anyway, my comment was more about the ACU case, where the devs clearly leave some GPU resources unused on PS4. Maybe if they could utilize those extra resources (barring bottlenecks in other parts), it could help the game achieve a constant 30fps on PS4. Yes, the GPU might be bad at the things they are trying to do on the CPU, but it's sitting there probably doing nothing. Might as well use it.
GPGPU doesn't help if the CPU and bandwidth are the bottleneck. The CPU eats up a big part of the available bandwidth when under stress, so the GPU starves because the rest of the bandwidth (about 100GB/s) is needed for graphics calculations.
 
In the Ubisoft slide they give an explanation of collision detection on CPU and GPU. On the CPU you use a bounding box and do some early rejection of vertices with branches to do less calculation. On the GPU you brute-force it and do the calculation for all vertices. The final result is that it goes an order of magnitude faster on the GPU... You don't need branches on the GPU if you can brute-force it and still get a performance gain. The constraints are that you need to work on big data sets with parallelisable algorithms...
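
A minimal sketch of that contrast (the Ubisoft code itself isn't shown here, so both versions are assumptions for illustration): the CPU path branches out early using a bounding sphere, while the GPU path simply tests every vertex with no early-out:

```cpp
#include <cuda_runtime.h>

struct Vec3 { float x, y, z; };

__host__ __device__ inline float distSq(Vec3 a, Vec3 b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// CPU style: branchy early rejection against a bounding sphere first.
int collideCpu(const Vec3* verts, int count, Vec3 meshCenter, float meshRadius,
               Vec3 sphereCenter, float sphereRadius)
{
    float reach = meshRadius + sphereRadius;
    if (distSq(meshCenter, sphereCenter) > reach * reach)
        return 0;                       // whole mesh rejected, no per-vertex work
    int hits = 0;
    for (int i = 0; i < count; ++i)
        if (distSq(verts[i], sphereCenter) <= sphereRadius * sphereRadius) ++hits;
    return hits;
}

// GPU style: brute force, one thread per vertex, no early-out branch.
__global__ void collideGpu(const Vec3* __restrict__ verts, int count,
                           Vec3 sphereCenter, float sphereRadius,
                           unsigned char* __restrict__ hit)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count) return;
    hit[i] = distSq(verts[i], sphereCenter) <= sphereRadius * sphereRadius ? 1 : 0;
}
```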

The consoles are only one year old. And as some devs like sebbbi have said on the forum: be patient, the end of 2015, and then 2016 and 2017, will be interesting...
 