Digital Foundry Article Technical Discussion Archive [2013]

I believe asynchronous compute refers to the ability to interleave compute tasks alongside normal rendering. Before the ACEs, your GPGPU task went through the main graphics command system, which I suppose could result in inefficiencies.
That's roughly the same as what I said above, expressed from a different angle, I think. Maybe I should add that on the older architectures the command processor basically got flushed when a compute task arrived (at least that was my impression from time to time).
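To make the general idea concrete (a sketch of my own, not anything from the article, and CUDA rather than a console API): the ACEs are a hardware scheduling feature, but the basic notion of independent compute work overlapping other GPU work, instead of being serialized behind it in one command queue, can be illustrated with CUDA streams. All names and workloads below are made up.

#include <cuda_runtime.h>

// Stand-in for "graphics-like" work.
__global__ void shadeKernel(float* pixels, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) pixels[i] = pixels[i] * 0.5f + 0.1f;
}

// Stand-in for an independent compute job.
__global__ void simulateKernel(float* state, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) state[i] += 0.016f * state[i];
}

int main() {
    const int n = 1 << 20;
    float *pixels, *state;
    cudaMalloc(&pixels, n * sizeof(float));
    cudaMalloc(&state,  n * sizeof(float));

    // Work submitted to different streams may overlap on the GPU instead of
    // being serialized one after the other in a single queue.
    cudaStream_t gfx, compute;
    cudaStreamCreate(&gfx);
    cudaStreamCreate(&compute);

    dim3 block(256), grid((n + block.x - 1) / block.x);
    shadeKernel<<<grid, block, 0, gfx>>>(pixels, n);
    simulateKernel<<<grid, block, 0, compute>>>(state, n);

    cudaDeviceSynchronize();  // wait for both streams to finish

    cudaStreamDestroy(gfx);
    cudaStreamDestroy(compute);
    cudaFree(pixels);
    cudaFree(state);
    return 0;
}

Whether two such submissions actually overlap depends on the hardware's front-end schedulers, which is exactly the part the extra ACEs are meant to improve on GCN.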
 
This is a gross oversimplification and tbh is wrong. There can be many hurdles to GPGPU and "spawning many threads" is not the biggest one (in fact, imo I wouldn't even consider that a hurdle...). Perhaps I am misinterpreting you.
Parallelisation. A common theme I noticed among developer interviews, particularly when talking about the PS3 and getting use out of the SPUs, was the hurdle of breaking down a large task into smaller jobs that could be run in parallel. An awful lot of Sony first and second party studios also did lengthy presentations on approaching the problem of parallelising code. GPGPU has the same hurdle.
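Reduced to its simplest shape, the decomposition looks like this (a sketch of my own, assuming the per-item work really is independent):

// Serial: one thread of execution walks every element.
void scaleSerial(float* data, int n, float k) {
    for (int i = 0; i < n; ++i) data[i] *= k;
}

// Parallel: each GPU thread owns exactly one element. This only works because
// element i never reads or writes element j; the real hurdle is restructuring
// the code so that this independence actually holds.
__global__ void scaleParallel(float* data, int n, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= k;
}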
 
I think "async" refers to the fact that compute on a gpu is not as serialized as it is on a cpu.

Even some workloads that need to be serialized could, in theory, be run on a GPU asynchronously using speculation and transactional memory.
 
The main problem with GPGPU is random memory access. You basically get none, as it's a stream processor and its main advantage in massive parallelization is that you don't try to jump around in memory, you have to work on predictable datasets with no dependencies.

It's the same reason why raytracing large scenes on a GPU is hard. The hardware is designed around the assumption that you don't need to do such things. So whenever you start to need it, the performance will drop significantly. This is why only certain types of tasks can benefit from GPGPU and why it's not possible to just simply port any kind of code to it.
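A CUDA-flavoured sketch of the access-pattern point (my own example; the arithmetic is trivial on purpose so the memory behaviour dominates):

// Coalesced: adjacent threads read adjacent elements, so the memory system
// can service a whole warp with a few wide transactions.
__global__ void addCoalesced(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + 1.0f;
}

// Scattered: each thread reads through an index table, so adjacent threads
// hit unrelated cache lines. Same arithmetic, far lower effective bandwidth.
__global__ void addScattered(const float* in, const int* idx, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[idx[i]] + 1.0f;
}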
 
The main problem with GPGPU is random memory access. You basically get none, as it's a stream processor and its main advantage in massive parallelization is that you don't try to jump around in memory, you have to work on predictable datasets with no dependencies.
Of course one can do random, or better let's say arbitrary, memory accesses for each individual work item on GPUs. It usually works at least as well as (I would say often even better than) random memory accesses on CPUs. The much more fundamental problem is divergence of control flow for work items within a vector (or the fact that they are basically always executed in lockstep, which restricts the kind of control flow and synchronisation possible [GCN offers a not very well performing workaround for some of the restrictions]), i.e. anything that breaks the SPMD paradigm.
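A minimal sketch of that divergence point (my own example; expensiveA/expensiveB are hypothetical stand-ins for real per-item work):

__device__ float expensiveA(float x) { return sinf(x) * x; }
__device__ float expensiveB(float x) { return cosf(x) + x; }

// Divergent: within one warp/wavefront some lanes take the 'if' and some the
// 'else', so the hardware executes both paths with lanes masked off.
__global__ void divergent(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i % 2 == 0) data[i] = expensiveA(data[i]);
    else            data[i] = expensiveB(data[i]);
}

// Uniform: the condition is the same for every lane in a warp (it depends on
// the block, not the lane), so only one path is executed per warp.
__global__ void uniform(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (blockIdx.x % 2 == 0) data[i] = expensiveA(data[i]);
    else                     data[i] = expensiveB(data[i]);
}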
 
The main problem with GPGPU is random memory access. You basically get none, as it's a stream processor and its main advantage in massive parallelization is that you don't try to jump around in memory, you have to work on predictable datasets with no dependencies.

It's the same reason why raytracing large scenes on a GPU is hard. The hardware is designed around the assumption that you don't need to do such things. So whenever you start to need it, the performance will drop significantly. This is why only certain types of tasks can benefit from GPGPU and why it's not possible to just simply port any kind of code to it.

Carmack said something very quickly in passing yesterday, something to the effect that GPGPU compute is not always as big a win on many tasks as the raw flops count would indicate. There could be some improvement on those tasks, but it's not necessarily enormous.

I should find it and quote him better, I suppose.
 
What can compute shaders be used for? Also, why use compute shaders over a CPU if compute shaders are slower?

The idea of compute was to be able to do anything you wanted to do on the GPU without having to force your algorithm to fit within the constraints of the graphics pipeline.

Compute shaders will be slow at specific things; iirc they don't handle complex/random access patterns as well as CPUs do.
 
The idea of compute was to be able to do anything you wanted to do on the GPU without having to force your algorithm to fit within the constraints of the graphics pipeline.

Compute shaders will be slow at specific things; iirc they don't handle complex/random access patterns as well as CPUs do.
While this is generally true, I would contest the random access pattern part. I would actually bet that a performance-class GPU (let's say Pitcairn, Tahiti even more so) will beat the crap out of any CPU (edit: maybe a bit strong, but it will be faster) in a parallel pointer chasing benchmark on large buffers. GPUs suck on latency, but the throughput is still higher with more work items in flight. So if the task is large enough, random access patterns definitely hurt, but GPUs can sustain a higher throughput than CPUs (which also don't like random patterns).
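For reference, the kind of kernel I mean by parallel pointer chasing (a sketch under the assumption that each work item gets its own chain of indices through one large buffer):

// Each thread follows its own chain of indices. Every load depends on the
// previous one, so a single chase is pure latency; the GPU only wins on
// throughput because tens of thousands of chases are in flight at once.
__global__ void pointerChase(const int* next, int* last, int n, int steps) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int cur = i;
    for (int s = 0; s < steps; ++s)
        cur = next[cur];            // dependent, effectively random load
    last[i] = cur;                  // store the result so the loop isn't optimized away
}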
 
What can compute shaders be used for? Also, why use compute shaders over a CPU if compute shaders are slower?

Because they are massively parallel in nature. While some things will be slower, especially if there's a branch miss, you have so many concurrent threads in flight that overall it is still going to come out ahead if it is a parallelizable task.

A CPU will always do better (at least currently) in highly serial tasks with lots of branches. And likely even without many branches.

A complicated AI, for example, with lots of branches due to having to make frequent decisions, wouldn't necessarily be suited to GPU compute. Running 100 or more AIs with less complex decision making, however, might be better suited to the GPU.
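To put the second case in code terms, a rough sketch (one hypothetical agent per thread; all names and numbers are made up for illustration):

struct Agent { float x, y, tx, ty, speed; };

// One thread per agent; each makes a simple, data-driven decision. A single
// complicated, branch-heavy AI would not map well onto this shape.
__global__ void updateAgents(Agent* agents, int n, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    Agent a = agents[i];
    float dx = a.tx - a.x, dy = a.ty - a.y;
    float dist = sqrtf(dx * dx + dy * dy);
    if (dist > 1e-3f) {             // cheap branch with a trivially short divergent path
        a.x += a.speed * dt * dx / dist;
        a.y += a.speed * dt * dy / dist;
    }
    agents[i] = a;
}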

Regards,
SB
 
Oh, so is that why developers and GPU engineers have been trying to move physics over to GPUs? That would be a good fit for it, right?
 
Oh, so is that why developers and GPU engineers have been trying to move physics over to GPUs? That would be a good fit for it, right?

I sat through a presentation by Havok engineers on what they are and are not doing on GPU.
Basically, generic solvers were not a good fit for the GPU: 70x the flops resulted in a 50% improvement in performance.
Things like particle physics are much faster on the GPU, which is pretty much what Nvidia PhysX also does on the GPU.

A lot of it comes down to the data structures that need to be walked, and how much interaction between threads there is.
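Particle physics fits because each particle integrates independently. A sketch of the kind of kernel involved (simplified, gravity-only, all names hypothetical):

struct Particle { float x, y, z, vx, vy, vz; };

// Each particle is integrated on its own: no shared data structure to walk and
// no interaction between threads, which is why it maps so cleanly to the GPU.
__global__ void integrate(Particle* p, int n, float dt, float g) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    p[i].vy -= g * dt;        // gravity only; no collisions or constraints here
    p[i].x  += p[i].vx * dt;
    p[i].y  += p[i].vy * dt;
    p[i].z  += p[i].vz * dt;
}

A generic rigid-body solver, by contrast, has to walk contact data and share results between bodies, which is presumably where that 70x-flops-for-50% figure comes from.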
 
Imagine if Hexadrive redid Bayonetta..
1080P 60fps locked PS3 version?

The difference is massive. So it turns out, you CAN really screw up badly if you don't use the hardware properly. Badly as in, more than 50%! In many instances the difference in the amount of pixels pushed per second is more than quadruple.
 
Was that ever in doubt?

In the case of ZOE2, I actually thought it was physically impossible to have the same density of effects as on PS2, given that VRAM bandwidth per pixel is an order of magnitude lower on PS3.

It turned out you can work around it, as demonstrated by the now excellent ZOE2 port. Never have I seen a PS3 game with that many transparency effects, let alone with a full-res framebuffer, 1280*1080, MLAA, and 60fps (mostly) as well.
 
In the case of ZOE2, I actually thought it was physically impossible to have the same density of effects as on PS2, given that VRAM bandwidth per pixel is an order of magnitude lower on PS3.
But did you ever question the notion, "you CAN really screw up badly if you don't use the hardware properly"?
 