Digital Foundry Article Technical Discussion Archive [2013]

I believe asynchronous compute refers to the ability to interleave compute tasks alongside normal rendering. Before the ACEs, your GPGPU task went through the main graphics command system, which I suppose could result in inefficiencies.
That's roughly the same as what I said above, expressed from a different angle, I think. Maybe I should add that on the older architectures the command processor basically got flushed when a compute task arrived (at least that was my impression from time to time).
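To make the general idea concrete (a sketch of my own, not anything from the article, and CUDA rather than a console API): the ACEs are a hardware scheduling feature, but the basic notion of independent compute work overlapping other GPU work, instead of being serialized behind it in one command queue, can be illustrated with CUDA streams. All names and workloads below are made up.

#include <cuda_runtime.h>

// Stand-in for "graphics-like" work.
__global__ void shadeKernel(float* pixels, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) pixels[i] = pixels[i] * 0.5f + 0.1f;
}

// Stand-in for an independent compute job.
__global__ void simulateKernel(float* state, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) state[i] += 0.016f * state[i];
}

int main() {
    const int n = 1 << 20;
    float *pixels, *state;
    cudaMalloc(&pixels, n * sizeof(float));
    cudaMalloc(&state,  n * sizeof(float));

    // Work submitted to different streams may overlap on the GPU instead of
    // being serialized one after the other in a single queue.
    cudaStream_t gfx, compute;
    cudaStreamCreate(&gfx);
    cudaStreamCreate(&compute);

    dim3 block(256), grid((n + block.x - 1) / block.x);
    shadeKernel<<<grid, block, 0, gfx>>>(pixels, n);
    simulateKernel<<<grid, block, 0, compute>>>(state, n);

    cudaDeviceSynchronize();  // wait for both streams to finish

    cudaStreamDestroy(gfx);
    cudaStreamDestroy(compute);
    cudaFree(pixels);
    cudaFree(state);
    return 0;
}

Whether two such submissions actually overlap depends on the hardware's front-end schedulers, which is exactly the part the extra ACEs are meant to improve on GCN.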
 
This is a gross oversimplification and tbh is wrong. There can be many hurdles to GPGPU and "spawning many threads" is not the biggest one (in fact, imo I wouldn't even consider that a hurdle...). Perhaps I am misinterpreting you.
Parallelisation. A common theme I noticed among developer interviews, particularly when talking about the PS3 and getting use out of the SPUs, was the hurdle of breaking down a large task into smaller jobs that could be run in parallel. An awful lot of Sony first and second party studios also did lengthy presentations on approaching the problem of parallelising code. GPGPU has the same hurdle.
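Reduced to its simplest shape, the decomposition looks like this (a sketch of my own, assuming the per-item work really is independent):

// Serial: one thread of execution walks every element.
void scaleSerial(float* data, int n, float k) {
    for (int i = 0; i < n; ++i) data[i] *= k;
}

// Parallel: each GPU thread owns exactly one element. This only works because
// element i never reads or writes element j; the real hurdle is restructuring
// the code so that this independence actually holds.
__global__ void scaleParallel(float* data, int n, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= k;
}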
 
I think "async" refers to the fact that compute on a gpu is not as serialized as it is on a cpu.

Even some workloads that need to be serialized could, in theory, be run on a GPU asynchronously using speculation and transactional memory.
 
The main problem with GPGPU is random memory access. You basically get none, as it's a stream processor and its main advantage in massive parallelization is that you don't try to jump around in memory, you have to work on predictable datasets with no dependencies.

It's the same reason why raytracing large scenes on a GPU is hard. The hardware is designed around the assumption that you don't need to do such things. So whenever you start to need it, the performance will drop significantly. This is why only certain types of tasks can benefit from GPGPU and why it's not possible to just simply port any kind of code to it.
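A CUDA-flavoured sketch of the access-pattern point (my own example; the arithmetic is trivial on purpose so the memory behaviour dominates):

// Coalesced: adjacent threads read adjacent elements, so the memory system
// can service a whole warp with a few wide transactions.
__global__ void addCoalesced(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + 1.0f;
}

// Scattered: each thread reads through an index table, so adjacent threads
// hit unrelated cache lines. Same arithmetic, far lower effective bandwidth.
__global__ void addScattered(const float* in, const int* idx, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[idx[i]] + 1.0f;
}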
 
The main problem with GPGPU is random memory access. You basically get none, as it's a stream processor and its main advantage in massive parallelization is that you don't try to jump around in memory, you have to work on predictable datasets with no dependencies.
Of course one can do random, or better let's say arbitrary, memory accesses for each individual work item on GPUs. It usually works at least as well as (I would say often even better than) random memory accesses on CPUs. The much more fundamental problem is divergence of control flow for work items within a vector (or the fact that they are basically always executed in lockstep, which restricts the kind of control flow and synchronisation possible [GCN offers a not very well performing workaround for some of the restrictions]), i.e. anything that breaks the SPMD paradigm.
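A minimal sketch of that divergence point (my own example; expensiveA/expensiveB are hypothetical stand-ins for real per-item work):

__device__ float expensiveA(float x) { return sinf(x) * x; }
__device__ float expensiveB(float x) { return cosf(x) + x; }

// Divergent: within one warp/wavefront some lanes take the 'if' and some the
// 'else', so the hardware executes both paths with lanes masked off.
__global__ void divergent(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i % 2 == 0) data[i] = expensiveA(data[i]);
    else            data[i] = expensiveB(data[i]);
}

// Uniform: the condition is the same for every lane in a warp (it depends on
// the block, not the lane), so only one path is executed per warp.
__global__ void uniform(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (blockIdx.x % 2 == 0) data[i] = expensiveA(data[i]);
    else                     data[i] = expensiveB(data[i]);
}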
 
The main problem with GPGPU is random memory access. You basically get none, as it's a stream processor and its main advantage in massive parallelization is that you don't try to jump around in memory, you have to work on predictable datasets with no dependencies.

It's the same reason why raytracing large scenes on a GPU is hard. The hardware is designed around the assumption that you don't need to do such things. So whenever you start to need it, the performance will drop significantly. This is why only certain types of tasks can benefit from GPGPU and why it's not possible to just simply port any kind of code to it.

Carmack said something very quickly in passing yesterday, something to the effect that GPGPU compute is not always as big a win on many tasks as the raw flops count would indicate. There could be some improvement on those tasks, but it's not necessarily enormous.

I should find it and quote him better, I suppose.
 
What can compute shaders be used for? Also, why use compute shaders over a CPU if compute shaders are slower?

The idea of compute was to be able to do anything you wanted to do on the GPU without having to force your algorithm to fit within the constraints of the graphics pipeline.

Compute shaders will be slow at specific things; iirc they don't handle complex/random access patterns as well as CPUs do.
 
The idea of compute was to be able to do anything you wanted to do on the GPU without having to force your algorithm to fit within the constraints of the graphics pipeline.

Compute shaders will be slow at specific things; iirc they don't handle complex/random access patterns as well as CPUs do.
While this is generally true, I would contest the random access pattern part. I would actually bet that a performance-class GPU (let's say Pitcairn, Tahiti even more so) will beat the crap out of any CPU (edit: maybe a bit strong, but it will be faster) in a parallel pointer chasing benchmark on large buffers. GPUs suck on latency, but the throughput is still higher with more work items in flight. So if the task is large enough, random access patterns definitely hurt, but GPUs can sustain a higher throughput than CPUs (which also don't like random patterns).
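For reference, the kind of kernel I mean by parallel pointer chasing (a sketch under the assumption that each work item gets its own chain of indices through one large buffer):

// Each thread follows its own chain of indices. Every load depends on the
// previous one, so a single chase is pure latency; the GPU only wins on
// throughput because tens of thousands of chases are in flight at once.
__global__ void pointerChase(const int* next, int* last, int n, int steps) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int cur = i;
    for (int s = 0; s < steps; ++s)
        cur = next[cur];            // dependent, effectively random load
    last[i] = cur;                  // store the result so the loop isn't optimized away
}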
 
What can compute shaders be used for? Also, why use compute shaders over a CPU if compute shaders are slower?

Because they are massively parallel in nature. While some things will be slower, especially if there's a branch miss, you have so many concurrent threads in flight that overall it is still going to come out ahead if it is a parallelizable task.

A CPU will always do better (at least currently) in highly serial tasks with lots of branches. And likely even without many branches.

A complicated AI, for example, with lots of branches due to having to make frequent decisions, wouldn't necessarily be suited to GPU compute. Running 100 or more AIs with less complex decision making, however, might be better suited to the GPU.
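To put the second case in code terms, a rough sketch (one hypothetical agent per thread; all names and numbers are made up for illustration):

struct Agent { float x, y, tx, ty, speed; };

// One thread per agent; each makes a simple, data-driven decision. A single
// complicated, branch-heavy AI would not map well onto this shape.
__global__ void updateAgents(Agent* agents, int n, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    Agent a = agents[i];
    float dx = a.tx - a.x, dy = a.ty - a.y;
    float dist = sqrtf(dx * dx + dy * dy);
    if (dist > 1e-3f) {             // cheap branch with a trivially short divergent path
        a.x += a.speed * dt * dx / dist;
        a.y += a.speed * dt * dy / dist;
    }
    agents[i] = a;
}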

Regards,
SB
 
Oh, so is that why developers and GPU engineers have been trying to move physics over to GPUs? That would be a good fit for it, right?
 
Oh, so is that why developers and GPU engineers have been trying to move physics over to GPUs? That would be a good fit for it, right?

I sat through a presentation by Havok engineers on what they are and are not doing on GPU.
Basically, generic solvers were not a good fit for the GPU: 70x the flops resulted in a 50% improvement in performance.
Things like particle physics are much faster on the GPU, which is pretty much what Nvidia PhysX also does on the GPU.

A lot of it comes down to the data structures that need to be walked, and how much interaction between threads there is.
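Particle physics fits because each particle integrates independently. A sketch of the kind of kernel involved (simplified, gravity-only, all names hypothetical):

struct Particle { float x, y, z, vx, vy, vz; };

// Each particle is integrated on its own: no shared data structure to walk and
// no interaction between threads, which is why it maps so cleanly to the GPU.
__global__ void integrate(Particle* p, int n, float dt, float g) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    p[i].vy -= g * dt;        // gravity only; no collisions or constraints here
    p[i].x  += p[i].vx * dt;
    p[i].y  += p[i].vy * dt;
    p[i].z  += p[i].vz * dt;
}

A generic rigid-body solver, by contrast, has to walk contact data and share results between bodies, which is presumably where that 70x-flops-for-50% figure comes from.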
 
Imagine if Hexadrive redid Bayonetta..
1080P 60fps locked PS3 version?

The difference is massive. So it turns out, you CAN really screw up badly if you don't use the hardware properly. Badly as in, more than 50%! In many instances the difference in the amount of pixels pushed per second is more than quadruple.
 
Was that ever in doubt?

In the case of ZOE2, I actually thought it was physically impossible to have the same density of effects as on PS2, given that VRAM bandwidth per pixel is an order of magnitude lower on PS3.

It turned out you can work around it, as demonstrated by the now excellent ZOE2 port. Never have I seen a PS3 game with that many transparency effects, let alone with a full-res framebuffer, 1280*1080, MLAA, and 60fps (mostly) as well.
 
In the case of ZOE2, I actually thought it was physically impossible to have the same density of effects as on PS2, given that VRAM bandwidth per pixel is an order of magnitude lower on PS3.
But did you ever question the notion, "you CAN really screw up badly if you don't use the hardware properly"?
 