Asynchronous Compute: what are the benefits?

onQ

Veteran
Asynchronous Compute seems to be the biggest customization Sony made to the PS4 GPU hardware, yet I haven't seen much talk about it.

What do you think will be the biggest benefits of having an asynchronous compute architecture in a console?
 
It's one of the things I'm most excited about. I'm really curious about Knack, and how much of the particle physics is utilized in gameplay. Right now it seems Knack's pieces collide with enemies when you do a special move, which is cool, but I wonder what else they have up their sleeve. Havok uses GPGPU for particles as well.


I can't wait to see what other algorithms devs come up with this generation. It's way more interesting to me than the prettier graphics I've been chasing since my Voodoo Banshee days lol.
 
Asynchronous Compute seems to be the biggest customization Sony made to the PS4 GPU hardware, yet I haven't seen much talk about it.
I don't believe that is a customization. GCN wasn't introduced with Asynchronous Compute Engines years before the PS4 for no reason.
Sony did drive certain optimizations for job dispatch and cache behavior when using coherent memory. None of the disclosures fundamentally changes the nature of GPU compute, although they do seem to be targeted at reducing queueing delay at the front end and some very serious overheads related to cache behavior at the other end.
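As a very rough illustration of what coherent, shared CPU/GPU memory buys you (using CUDA managed memory purely as a stand-in; this is not Sony's API, and the PS4's coherent path works differently under the hood), both processors can work on the same buffer without explicit copies:

Code:
#include <cstdio>
#include <cuda_runtime.h>

// GPU reads and writes the shared buffer in place.
__global__ void scale(float* data, int n, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= k;
}

int main() {
    const int n = 1 << 20;
    float* data = nullptr;
    cudaMallocManaged(&data, n * sizeof(float));  // one pointer, visible to CPU and GPU

    for (int i = 0; i < n; ++i) data[i] = 1.0f;   // CPU writes, no explicit upload

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);
    cudaDeviceSynchronize();                      // wait before the CPU touches it again

    printf("data[0] = %f\n", data[0]);            // CPU reads the GPU's result in place
    cudaFree(data);
    return 0;
}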


What do you think will be the biggest benefits of having an asynchronous compute architecture in a console?
The Jaguar cores are not computational monsters, and they can't utilize all that memory bandwidth.
The advantage for loads that fit the CUs well, and possibly even some that don't (if the CPU section is already overburdened), is that a significant amount of peak computational ability is available with hopefully modest impact in a design that offers little other alternative.
 
I don't believe that is a customization. GCN wasn't introduced with Asynchronous Compute Engines years before the PS4 for no reason.

The other GCN GPUs only have 2 ACEs for running 4 compute jobs at a time; the PS4 GPU has been customized to have 8 ACEs for running 64 compute jobs at a time.
 
Asynchronous Compute isn't defined by queue count.
The question is "is there compute, and can it run out of lockstep with the CPU?"

Sony has advanced the concept by making the process of using it more streamlined, and if the APIs and tool chains are robust, it is potentially the other party besides Microsoft bringing to an AMD device the innovation of a software platform that can actually use the hardware.
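To make "out of lockstep with the CPU" concrete, here is a minimal sketch using CUDA as a stand-in (the console toolchains aren't public, so treat it as an analogy only): the kernel launch returns immediately, the CPU goes off and does its own work, and the two only meet again at an explicit sync point.

Code:
#include <cstdio>
#include <cuda_runtime.h>

__global__ void compute_job(float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = sinf(i * 0.001f);  // stand-in for some compute workload
}

int main() {
    const int n = 1 << 22;
    float* d_out;
    cudaMalloc(&d_out, n * sizeof(float));

    // The launch is asynchronous: this call returns to the CPU almost immediately.
    compute_job<<<(n + 255) / 256, 256>>>(d_out, n);

    // The CPU is free to do unrelated work while the GPU grinds away.
    double cpu_side = 0.0;
    for (int i = 1; i <= 1000000; ++i) cpu_side += 1.0 / i;

    // Only here do the CPU and GPU re-synchronize.
    cudaDeviceSynchronize();

    printf("CPU result %f computed while the GPU ran its job\n", cpu_side);
    cudaFree(d_out);
    return 0;
}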
 
The other GCN GPUs only have 2 ACEs for running 4 compute jobs at a time; the PS4 GPU has been customized to have 8 ACEs for running 64 compute jobs at a time.

I think it's just that the way you worded the title and the first post makes it seem like asynchronous compute is unique to the PS4, which it is not. I think you're both in agreement about what the customization is.
 
It allows more compute jobs to be available at once for the compute front end to pick from.
This means fewer cases where commands are backed up behind queue entries they aren't dependent on that just happen to be in the same queue.

I suppose that at 64 queues Sony really hopes there will be way more jobs running concurrently than is done at present.
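A hedged sketch of the head-of-line problem being described, again using CUDA streams as a rough stand-in for hardware queues (queue counts and scheduling details obviously differ on the consoles): two independent jobs pushed into one queue serialize, while giving each its own queue lets the short one be picked up without waiting behind the long one.

Code:
#include <cuda_runtime.h>

__global__ void long_job(float* a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float x = a[i];
    for (int k = 0; k < 10000; ++k) x = x * 1.0001f + 0.5f;  // lots of work
    a[i] = x;
}

__global__ void short_job(float* b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) b[i] = b[i] * 2.0f + 1.0f;                    // a little work
}

int main() {
    const int n = 1 << 20;
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    cudaStream_t q0, q1;
    cudaStreamCreate(&q0);
    cudaStreamCreate(&q1);

    // Same queue: short_job sits behind long_job even though they are independent.
    long_job<<<4096, 256, 0, q0>>>(a, n);
    short_job<<<16, 256, 0, q0>>>(b, n);

    // Separate queues: the front end can pick up short_job as soon as resources allow.
    long_job<<<4096, 256, 0, q0>>>(a, n);
    short_job<<<16, 256, 0, q1>>>(b, n);

    cudaDeviceSynchronize();
    cudaStreamDestroy(q0);
    cudaStreamDestroy(q1);
    cudaFree(a);
    cudaFree(b);
    return 0;
}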
 
Didn't they also mention a far more fine-grained priority system, as one of the customisations?
 
It's been so long, and Sony's messaging has been relatively content-free lately, that I can't remember.
Of the big three customizations Cerny mentioned, the compute one did mention prioritization and arbitration in hardware.
However, it wasn't clear if the prioritization scheme was something Sony asked for, or if it's something Sony's front-end customization simply relies on or exposes.
 
The indications are that the hardware has the capability, but no analysis of the released Bonaire cards mentions it being exposed.

That might be one of the announcements coming up.
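For a flavour of what queue-level prioritization looks like in an API that already exposes it (CUDA stream priorities here, purely as an analogy; it says nothing about what Sony's hardware or SDK actually provides), work in a high-priority queue is preferred by the scheduler over bulk background work:

Code:
#include <cuda_runtime.h>

__global__ void background_job(float* a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = a[i] * 0.999f + 0.001f;   // bulk, throughput-oriented work
}

__global__ void urgent_job(float* b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) b[i] = b[i] * 2.0f;              // small job we want serviced quickly
}

int main() {
    const int n = 1 << 20;
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    int least, greatest;
    cudaDeviceGetStreamPriorityRange(&least, &greatest);

    cudaStream_t low, high;
    cudaStreamCreateWithPriority(&low,  cudaStreamNonBlocking, least);     // low priority
    cudaStreamCreateWithPriority(&high, cudaStreamNonBlocking, greatest);  // high priority

    background_job<<<4096, 256, 0, low>>>(a, n);  // the scheduler can deprioritize this...
    urgent_job<<<64, 256, 0, high>>>(b, n);       // ...in favour of the high-priority queue

    cudaDeviceSynchronize();
    cudaStreamDestroy(low);
    cudaStreamDestroy(high);
    cudaFree(a);
    cudaFree(b);
    return 0;
}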
 
I think it's just that the way you worded the title and the first post makes it seem like asynchronous compute is unique to the PS4, which it is not. I think you're both in agreement about what the customization is.

Basically what I'm asking is what are some of the benefits of being able to run lots of smaller compute jobs on a console?


I think it should be good for things like AI and animation, since it could break them down into more jobs while still having the computing power of GPGPU parallel processing.
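Purely as a generic sketch of the kind of data-parallel AI/animation update people have in mind (nothing to do with any actual PS4 title or middleware), here is one thread per agent doing a trivial "seek the target" step; wide, regular work like this is what maps well onto GPU compute:

Code:
#include <cuda_runtime.h>

struct Agent { float x, y, vx, vy; };

// One thread per agent: trivial steering toward a shared target.
__global__ void update_agents(Agent* agents, int n, float tx, float ty, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    Agent a = agents[i];
    float dx = tx - a.x, dy = ty - a.y;
    float len = sqrtf(dx * dx + dy * dy) + 1e-6f;
    a.vx += (dx / len) * dt;
    a.vy += (dy / len) * dt;
    a.x  += a.vx * dt;
    a.y  += a.vy * dt;
    agents[i] = a;
}

int main() {
    const int n = 65536;
    Agent* d_agents;
    cudaMalloc(&d_agents, n * sizeof(Agent));
    cudaMemset(d_agents, 0, n * sizeof(Agent));   // start everyone at the origin, at rest

    update_agents<<<(n + 255) / 256, 256>>>(d_agents, n, 100.0f, 50.0f, 0.016f);
    cudaDeviceSynchronize();

    cudaFree(d_agents);
    return 0;
}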
 
The benefit is that there are more compute resources, which is ideally a pretty generic thing that means the chip can do more stuff.

One of the more specific examples Sony has given is actually using compute to provide better culling ahead of the graphics pipeline in a manner similar to how the SPEs were sometimes used in the PS3.
A fair amount of the GPU compute capability is there so that the platform doesn't regress massively relative to Cell.
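To make the culling example a bit more concrete, here is a hedged sketch of the general technique only (not Sony's implementation, and simplified to a single plane rather than a full frustum): a compute pass tests object bounds and compacts the survivors into a list the graphics pipeline would then draw.

Code:
#include <cuda_runtime.h>

struct Sphere { float x, y, z, r; };

// Simplified visibility test: keep objects on the front side of one plane.
// A real frustum cull would test all six planes.
__global__ void cull(const Sphere* objs, int n,
                     float px, float py, float pz, float pd,
                     int* visible, int* visible_count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    const Sphere s = objs[i];
    float dist = s.x * px + s.y * py + s.z * pz + pd;
    if (dist > -s.r) {
        int slot = atomicAdd(visible_count, 1);  // compact survivors into a list
        visible[slot] = i;                       // the renderer then draws only these
    }
}

int main() {
    const int n = 1024;
    Sphere* d_objs;  cudaMalloc(&d_objs, n * sizeof(Sphere));
    int* d_visible;  cudaMalloc(&d_visible, n * sizeof(int));
    int* d_count;    cudaMalloc(&d_count, sizeof(int));
    cudaMemset(d_objs, 0, n * sizeof(Sphere));   // placeholder data; a game would fill this in
    cudaMemset(d_count, 0, sizeof(int));

    cull<<<(n + 255) / 256, 256>>>(d_objs, n, 0.0f, 0.0f, 1.0f, 1.0f, d_visible, d_count);
    cudaDeviceSynchronize();

    cudaFree(d_objs); cudaFree(d_visible); cudaFree(d_count);
    return 0;
}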

If a workload doesn't need tight synchronization with the CPU, has very high data parallelism, has low complexity, has good arithmetic density, has a coarse granularity that prevents divergence from ruining SIMD efficiency, and doesn't rely too heavily on straight-line speed, it's a good candidate for the GPU.
If it's complex, relies on straight-line speed, doesn't thrash the cache, and fits narrow SIMD better, hopefully Jaguar isn't too embarrassing.

If it requires high straight-line FP speed and fits an in-order pipeline with a rather exotic local store, you better hope it's some kind of encoding or decoding thing that can be offloaded, because that's something Cell is good at.


As for why there's not too much discussion on it, it's because aside from "more graphics", Sony hasn't really given a strong indication on how well its GPU compute scheme will work, or what it will wind up doing besides graphics.

There's a hope that someday people might get around to implementing audio wave tracing or physics (probably fluid or non-rigid body physics). It's not fully described, the full range of tools Sony hopes to have someday for this doesn't exist, and most devs for the first wave of games are using their GPUs for graphics.
 
Thanks 3dilettante for your patient and insightful responses, as ever.

Could you give us a view on how desktop CPUs might hold up in the types of GPGPU tasks the PS4 will be running? You specifically mentioned culling as one benefit, which as you say Cell was particularly quick at. Do you think desktop CPUs have caught up in that regard yet, or is the only response to GPGPU at present more GPGPU?
 
Proper fluid dynamics are something I'd love to see done. No games have really got that right yet. I remember a screensaver that came with the Radeon 8500 that actually started to make me feel seasick after a while!
 
Could you give us a view on how desktop CPUs might hold up in the types of GPGPU tasks the PS4 will be running? You specifically mentioned culling as one benefit, which as you say Cell was particularly quick at. Do you think desktop CPUs have caught up in that regard yet, or is the only response to GPGPU at present more GPGPU?

FWIW, GPGPU is a concept; it's not referring to a type of hardware.

On both the console and the PC, the software is offloading floating-point-heavy computations from the CPU to the GPU, because the GPU does these types of operations faster.
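As a trivial, hedged illustration of that offload idea (CUDA here, but any GPU compute API would look much the same): the floating-point-heavy loop moves from a single CPU core into a kernel spread across thousands of GPU threads.

Code:
#include <cuda_runtime.h>

// CPU version: one core walks the whole array.
void saxpy_cpu(int n, float a, const float* x, float* y) {
    for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
}

// GPU version: the same math, but one lightweight thread per element.
__global__ void saxpy_gpu(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *d_x, *d_y;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMemset(d_x, 0, n * sizeof(float));   // placeholder data
    cudaMemset(d_y, 0, n * sizeof(float));

    saxpy_gpu<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
    cudaDeviceSynchronize();

    cudaFree(d_x);
    cudaFree(d_y);
    return 0;
}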
 
Could you give us a view on how desktop CPUs might hold up in the types of GPGPU tasks the PS4 will be running?
The rough recommendation of 4 CUs for compute would give Orbis 410 GFLOPs from the GPU and 102.4 GFLOPs from the Jaguar cores.
A Sandy Bridge K processor could put out about half that total, all on the CPU.
If we assume this is a gaming rig, there's a GPU that I'm not going to include--although a huge chunk of what is GPGPU for one is going to be doable enough on the other.
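For reference, those peak figures come from the usual back-of-the-envelope arithmetic, assuming the commonly reported clocks of roughly 800 MHz for the GPU and 1.6 GHz for the Jaguar cores (so treat them as estimates, not measurements):

4 CUs x 64 ALUs x 2 FLOPs per FMA x 0.8 GHz ≈ 409.6 GFLOPs
8 cores x 8 FLOPs per cycle (128-bit SIMD) x 1.6 GHz = 102.4 GFLOPs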

For things that do very well on the GPU, Orbis could in theory do very well. The high peak FLOPs tends to be severely underutilized outside of the GPU-preferred subset, so I'd want evidence that Sony's tweaks have actually done enough to make GPU compute that much better than current APUs for things that aren't already a GPU strong point.

Orbis has to fall back to the Jaguar cores for single-threaded or complex workloads, which a modern desktop quad core from Intel can curb stomp easily, possibly with performance to spare to beat the GPU in areas where GPUs typically face-plant.
Since a gaming rig is very likely to have a discrete card, it's a lot of brute force to overcome no matter how elegant Sony's solution turns out to be.


You specifically mentioned culling as one benefit, which as you say Cell was particularly quick at. Do you think desktop CPUs have caught up in that regard yet, or is the only response to GPGPU at present more GPGPU?
At this point, most things Cell was good at have been brute-forced by the evolution of desktop cores, especially if you count the very latest Intel chips.
The SPE work was much more important for the PS3 because RSX needed the extra help.
Modern GPUs are simply massively more powerful and capable of doing more on their own.

There may be additional customizations that have enhanced this for the Orbis GPU, but a big chunk of the gains from using compute for graphics work is something inherent to having a modern GPU.
The case where GPGPU can be taken more seriously for non-graphics work is the case that AMD, Sony, and Microsoft need to make.
Falling back on a good CPU (and for a gaming rig, better silicon and several hundred extra Watts of power) has been the safe bet for years.
 
Paints a rather grim picture for next-gen performance...
 