Asynchronous Compute: what are the benefits?

Right.

Right.

Right.

Wrong as written. If you're using all 1.84 TFLOPs for graphics, there's no space left for compute, which led to the old argument and many, many posts. If you aren't using all 1.84 TF for graphics work (because the graphics don't tap all resources all the time), you can slot some compute in there and get more from the GPU than otherwise.

Long story short: the jar is an ALU from the PS4 GPU, the golf balls are the graphics pipeline code, and everything else is the compute code.


The jar/ALU is filled with all the golf balls/graphics pipeline code it can carry, so when it comes to the graphics pipeline it's full, and that's what 1.84 TFLOPS worth of ALUs/jars is going to get you using the graphics pipeline. But it's not all that 1.84 TFLOPS worth of ALUs/jars can get you, because there is still room for compute/chocolate milk: "there is always room for chocolate!"
 
No, because it depends on the workload. You talk as though (for example) a 1.84TFLOP GPU will only ever be able to achieve, say, 1.5TFLOPs of graphics-related throughput. And that's not the case.

If you're not bottlenecked by other parts of the pipeline it's possible to use ALL of the 1.84 TFLOPs for graphics shaders leaving absolutely nothing left for compute shaders. In that case, async compute will gain you no extra usage from your shader array. Conversely you may be bottlenecked by another part of the pipeline on your graphics operations and thus have a full 1.5TFLOPs available to spend on compute. The reality is going to vary greatly by workload and likely millisecond to millisecond.

The bottom line is you can't use "all 1.84TF for graphics, then some more for compute". You can use as much of the 1.84TF for graphics as your workload will allow given the rest of the pipeline's bottlenecks, and IF there's anything left over, you can use that for compute - assuming the same bottlenecks don't apply to that too, of course.
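To put the same point in toy form, here's a minimal Python sketch with made-up per-millisecond utilisation numbers (not measurements from any real game or GPU): only whatever the graphics workload leaves idle in a given slice is available to async compute, and a fully saturated slice leaves nothing.

```python
# Toy illustration only: hypothetical per-millisecond ALU utilisation figures,
# not measurements of any real GPU or game.

PEAK_TF = 1.84

# Fraction of the shader array the graphics workload keeps busy in each
# 1 ms slice (bottlenecks elsewhere in the pipeline cause the dips).
graphics_utilisation = [1.0, 0.85, 0.6, 1.0, 0.7, 0.9]

for ms, busy in enumerate(graphics_utilisation):
    spare = PEAK_TF * (1.0 - busy)
    print(f"slice {ms}: graphics {PEAK_TF * busy:.2f} TF, "
          f"left over for async compute {spare:.2f} TF")

# In slices where graphics already saturates the ALUs (busy == 1.0),
# async compute gets nothing extra.
```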

Either way it's a moot point. All modern systems can do this. From what I've heard the devs on this board state recently, the extra compute queues of the PS4 over other systems should have a pretty minimal impact on the efficiency of this for games.
 
... so when it comes to the graphics pipeline it's full, and that's what 1.84 TFLOPS worth of ALUs/jars is going to get you using the graphics pipeline. But it's not all that 1.84 TFLOPS worth of ALUs/jars can get you, because there is still room for compute/chocolate milk: "there is always room for chocolate!"
Yeah, no... you're digging yourself deeper here. I think your understanding is actually wrong. It is completely possible for a "graphics only" workload to use the entire throughput of the GPU. Adding async compute to such a case would result in no speedup (they would be run one after another). Only in cases where the graphics stuff is leaving some units idle does async compute make it faster. Now there are definitely certain common places where this is the case, but it is fairly architecture and renderer dependent.
 
If you're not bottlenecked by other parts of the pipeline it's possible to use ALL of the 1.84 TFLOPs for graphics shaders leaving absolutely nothing left for compute shaders.

What Asynchronous Compute (AC) brings is a better GPU utilization rate. What the previous graphs show is that just by "turning on" AC, you easily gain ~25% in GPU efficiency.

From what I've heard the devs on this board state recently, the extra compute queues of the PS4 over other systems should have a pretty minimal impact on the efficiency of this for games.

In a perfect world where 100% of the GPU TFLOPs are used by games, you are right. Show me such a game. In a more realistic scenario, AC should improve the GPU's utilization rate in most (if not all) cases, which would help stabilize the game's average framerate, and particularly its minimum fps, which in turn would allow for better AA, resolution or framerate.
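To illustrate why that shows up mostly in the minimum fps, here's a rough sketch with invented numbers (the 75% graphics utilisation figure is an assumption for illustration, not a measured value): a compute job that can hide in the idle gaps costs the heaviest frames much less than running it serially would.

```python
# Invented frame timings, purely to illustrate the argument above.
frames_graphics_ms = [28.0, 30.0, 36.0, 29.0, 40.0]  # graphics work per frame
compute_ms = 6.0                                      # e.g. a particle compute job
graphics_utilisation = 0.75                           # assumed: 25% of ALU idle

for g in frames_graphics_ms:
    serial = g + compute_ms                           # compute run after graphics
    idle_alu_time = g * (1.0 - graphics_utilisation)
    # Async compute can hide in those idle gaps; only the overflow adds frame time.
    overlapped = g + max(0.0, compute_ms - idle_alu_time)
    print(f"graphics {g:.0f} ms: serial {serial:.1f} ms -> async {overlapped:.1f} ms")
```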

AC is perfect for particles. In most recent multiplatform games, big framerate drops mostly occur during heavy particle scenes, as in Watch_Dogs.

Infamous SS, which uses AC only for its particle effects (and there are tons of particle effects in almost every scene), is really impressive in how many (and how many different types of) particles it displays in some scenes, almost always at a 30+ fps framerate.
 
AC is perfect for particles. In most recent multiplatform games, big framerate drops mostly occur during heavy particle scenes, as in Watch_Dogs.
This has nothing to do with the particle simulation and everything to do with the fill rate... And you can address that with tiled rendering (in compute or otherwise) but async has nothing to do with that.

I'm starting to agree with Shifty here... I don't think anything is to be gained by continuing this discussion. People don't even want to understand.
 
I think you guys are missing each other's points somewhat.

OnQ is trying to say that the way 1.84TF is classically used by most game engines today would leave gaps. But most would consider a game running on such a GPU to be using 1.84TF... it's all a black box to us playing the game, and the perceived reality is that it takes X TF to create X amount of Wow on the screen. But by filling the gaps we can now have X*1.25 Wow on the screen.

1.84 old paradigm TF = 1.84 Wow units
...
1.84 new paradigm TF = 1.84 * 1.25 = 2.3 WU

Therefore 2.3 WU requires 2.3 old-paradigm TF. We could say the PS4 punches above its weight. Once everyone makes async common and games everywhere utilize 100% of the GPU, we can consider the PS4 to be just performing at the expectations of 1.84TF again.
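For what it's worth, that's just this arithmetic (the 1.25 factor being the ~25% figure quoted earlier in the thread, not a general rule):

```python
# The arithmetic from the post above; 1.25 is the thread's quoted ~25% gain.
peak_tf = 1.84
utilisation_gain = 1.25

wow_old = peak_tf                      # 1.84 old-paradigm TF = 1.84 WU
wow_new = peak_tf * utilisation_gain   # 1.84 * 1.25 = 2.3 WU

print(f"old paradigm: {wow_old:.2f} WU")
print(f"new paradigm: {wow_new:.2f} WU "
      f"(what {wow_new:.2f} old-paradigm TF would have produced)")
```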
 
Quick question. Compute tasks aren't running all the time, are they? I've been told that they happen when less traditional GPU tasks are being done. Does that mean physics can just start and stop depending on GPU load XD
 
This has nothing to do with the particle simulation and everything to do with the fill rate... And you can address that with tiled rendering (in compute or otherwise) but async has nothing to do with that.

I'm starting to agree with Shifty here... I don't think anything is to be gained by continuing this discussion. People don't even want to understand.

It's as if this whole thread, because of its title, were de facto wrong, as if the Cerny-hyped-then-must-be-wrong Asynchronous Compute couldn't possibly bring any benefits to 3D engine efficiency and videogame gfx.
 
Quick question. Compute tasks aren't running all the time, are they? I've been told that they happen when less traditional GPU tasks are being done. Does that mean physics can just start and stop depending on GPU load XD

Probably depends on the programmer and the tasks at hand. As far as I know, on paper, they can mix the jobs together wave after wave. The h/w doesn't care, except for that one ACE that's reserved by the system for the OS UI -- if my memory still serves me.
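Very roughly, you can picture that "wave after wave" mixing like the toy round-robin below (hypothetical queue names and job lists; the real ACEs and hardware arbitration are far more involved than this):

```python
from collections import deque

# Toy model only: one graphics queue plus a couple of compute queues feeding
# wavefronts to the shader array in round-robin order. Queue/job names invented.
queues = {
    "GFX":  deque(["gfx_wave_0", "gfx_wave_1", "gfx_wave_2", "gfx_wave_3"]),
    "ACE0": deque(["particles_0", "particles_1"]),
    "ACE1": deque(["cloth_sim_0"]),
}

while any(queues.values()):
    for name, q in queues.items():
        if q:
            print(f"{name}: issue {q.popleft()}")
```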
 
OnQ is trying to say that the way 1.84TF is classically used by most game engines today would leave gaps.
That statement doesn't make any sense. And yeah, if it's a "black box" and you don't understand the computational efficiency that is being gained, fine. But if you want to actually understand it, you have to shed yourself of those naive notions... and frankly, to do that you're going to have to not come at this from the point of view of being a fanboy trying to justify hardware "punching above one's weight" or other such silliness.

Does async compute allow improved GPU efficiency in some cases? Sure. But so do lots of other things, as that is ultimately the whole point of a hardware architecture. The silliness starts when you guys try to assign general numbers to "how much more fastness does that make it?? Is that now 1.38726x better than this other console NYAAAA?". That is completely hardware architecture *and* workload dependent, and for the love of god just rid yourself of that kind of thinking if you want to engage in a technology discussion.

It's as if this whole thread, because of its title, were de facto wrong, as if the Cerny-hyped-then-must-be-wrong Asynchronous Compute couldn't possibly bring any benefits to 3D engine efficiency and videogame gfx.
That sentence doesn't make any sense to me either... except grammatically this time.
 
Yeah, I'm shooting this thread in the head now before it has more of a chance at really hurting innocent brain cells.
 
The jar/ALU is filled with all the golf balls/graphics pipeline code it can carry, so when it comes to the graphics pipeline it's full, and that's what 1.84 TFLOPS worth of ALUs/jars is going to get you using the graphics pipeline. But it's not all that 1.84 TFLOPS worth of ALUs/jars can get you, because there is still room for compute/chocolate milk: "there is always room for chocolate!"
Which is what I described in my conversation summary. 1.84 TF of ALU provides 10 VGUs of graphics and still has room to provide some flops of compute. Although I did write TF of compute where I should have written GF or just flops, failing to account for the performance scale.
 
Resolution is one of many tick-box features one can play with. If devs really wanted to, they could scale back on shadow detail, LOD, lighting, or whatever parts of the graphics engine they were willing to sacrifice if they really needed a constant 1080p without affecting the framerate.
The fact that we get 1080p with tearing and sub-30fps gameplay is, in my humble opinion, just a matter of bad decisions in the game-making process.

Thanks, saved me typing this :yes:

The conundrum there is that shifting CPU code to the GPU will require that they forgo the 1080p bullet point.

Nope. Both Microsoft and Sony have spoken about GPGPU, and both agree there is ample spare ALU resource in which to slot compute tasks around traditional GPU workloads.
 
No, on PS4 I don't believe that to be honest.

Nope. Both Microsoft and Sony have spoken about GPGPU, and both agree there is ample spare ALU resource in which to slot compute tasks around traditional GPU workloads.

They can publicly say whatever they like, but GPGPU isn't free. It can free up substantial CPU time, but it still comes at a cost on the GPU side, unless they have figured out how to create a GPU with unlimited ALU.
 
They can publicly say whatever they like, but GPGPU isn't free. It can free up substantial CPU time, but it still comes at a cost on the GPU side, unless they have figured out how to create a GPU with unlimited ALU.
Nobody is claiming that GPGPU is free. What both companies are saying is that the 1.8 teraflops of available ALU in the PS4's GPU and the 1.31 teraflops of available ALU in the Xbox GPU include ample spare ALU above what the GPUs will be using to render graphics.
 
Nobody is claiming that GPGPU is free. What both companies are saying is that the 1.8 teraflops of available ALU in the PS4's GPU and the 1.31 teraflops of available ALU in the Xbox GPU include ample spare ALU above what the GPUs will be using to render graphics.

Sure, if they drop resolution and/or scale back other visual effects. Those same ALUs used for GPGPU also happen to have their workload dramatically increased as resolution increases. GPGPU will eventually pay dividends because they were stuck with particularly crappy CPUs: Intel wasn't affordable, there was no suitable ARM alternative, and no one was willing to drop 2 billion to create something custom. But given that console ALU is already being exhausted just one year in, as seen by the difficulty these boxes have even maintaining proper framerates at the long-standard 1080p resolution, it's a pipe dream to believe that they can suddenly shove entirely new ALU workloads onto these already taxed GPUs and maintain 1080p. Unless they forgo visual features or accept unstable framerates, of course, which given the market value of the 1080p bullet point may indeed prove to be a viable option, or will become one once they realize they are CPU-screwed.
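The resolution point, at least, is simple arithmetic (back-of-the-envelope sketch: it assumes per-pixel ALU cost stays roughly constant, which real renderers only approximate):

```python
# Back-of-the-envelope: assumes per-pixel shading cost is roughly constant,
# which real renderers only approximate.
resolutions = {"900p": 1600 * 900, "1080p": 1920 * 1080}

base = resolutions["900p"]
for name, pixels in resolutions.items():
    print(f"{name}: {pixels:,} pixels, "
          f"~{pixels / base:.2f}x the pixel-bound ALU work of 900p")

# 1080p is ~1.44x the pixels of 900p, so any ALU headroom you were counting on
# for GPGPU shrinks as resolution goes up.
```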
 
Nope. Both Microsoft and Sony have spoken about GPGPU, and both agree there is ample spare ALU resource in which to slot compute tasks around traditional GPU workloads.
There is of course some spare ALU (because no GPU code is perfect)... but what makes you think I wouldn't want to use this spare ALU myself to run my lighting, post processing and animation code (etc) as async compute instead of giving it away to gameplay programmers? :)
 
There is of course some spare ALU (because no GPU code is perfect)... but what makes you think I wouldn't want to use this spare ALU myself to run my lighting, post processing and animation code (etc) as async compute instead of giving it away to gameplay programmers? :)

Because you're a nice guy? ;)
 
They can publicly say whatever they like, but GPGPU isn't free. It can free up substantial CPU time, but it still comes at a cost on the GPU side, unless they have figured out how to create a GPU with unlimited ALU.
Nobody is claiming that GPGPU is free. What both companies are saying is that the 1.8 teraflops of available ALU in the PS4's GPU and the 1.31 teraflops of available ALU in the Xbox GPU include ample spare ALU above what the GPUs will be using to render graphics.
Graphics workloads leave the GPU with quite a lot of spare cycles. If these are used for compute, you effectively get them for free. It's just a significant optimisation. However, those compute cycles could be used for something other than AI, such as compute-based rendering, so there is a graphical cost relative to the maximal output.

Or rather, there are two perspectives:
1) Relative to current graphics techniques, GPGPU AI can be obtained for free.
2) Relative to the maximum graphical output possible, GPGPU AI would come with a cost, taking GPU resources away from prettifying the game.
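
The difference between those two perspectives is just a choice of baseline; a few lines of arithmetic make that obvious (the 10% "spare cycles" figure below is arbitrary, for illustration only):

```python
# Arbitrary illustrative split: assume graphics leaves ~10% of the ALU idle
# and GPGPU AI consumes exactly that slack.
peak_tf = 1.84
graphics_tf = peak_tf * 0.90   # what current graphics techniques actually use
gpgpu_ai_tf = peak_tf * 0.10   # AI slotted into the remaining gaps

print(f"graphics {graphics_tf:.2f} TF + GPGPU AI {gpgpu_ai_tf:.2f} TF "
      f"= {peak_tf:.2f} TF peak")

# 1) Baseline = what current graphics techniques already use: the AI costs nothing.
print(f"cost vs current techniques: {graphics_tf + gpgpu_ai_tf - peak_tf:.2f} TF")
# 2) Baseline = maximum possible graphical output: those flops could have gone
#    to compute-based rendering instead.
print(f"cost vs maximum graphical output: {gpgpu_ai_tf:.2f} TF")
```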
 