Again. Asynchronous has NOTHING to do with concurrency. You can have a fully asynchronous interface that executes sequentially and gets no performance gain.
It hasn't been handled super well. And as press, I'm still trying to do a better job of communicating concurrency vs. async, in part because of these misconceptions.
> Again. Asynchronous has NOTHING to do with concurrency. You can have a fully asynchronous interface that executes sequentially and gets no performance gain.
Supporting Asynchronous Compute, from a technical point of view, does not imply anything about performance improvements.
Would have been more useful if they had called it Concurrent Compute.
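To make the async-vs-concurrency distinction concrete, here's a CPU-side analogy in Python (not GPU code). The same asynchronous interface (submit() hands back a Future immediately) runs jobs strictly one after another on a single worker, and only with more workers can they actually overlap. The job duration and worker counts are arbitrary illustrative choices.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_job(name, log, duration=0.05):
    """Stand-in for a job: record when it starts and when it ends."""
    start = time.perf_counter()
    time.sleep(duration)                       # pretend to do work
    log.append((name, start, time.perf_counter()))
    return name

def submit_all(executor, n_jobs):
    """Asynchronous interface: submit() returns a Future right away,
    so the caller never blocks while enqueueing work."""
    log = []
    futures = [executor.submit(run_job, f"job{i}", log) for i in range(n_jobs)]
    for f in futures:
        f.result()                             # block only when results are needed
    return sorted(log, key=lambda entry: entry[1])   # order by start time

def any_overlap(log):
    """True if any two jobs ran at the same time. With jobs sorted by start
    time, checking neighbouring pairs is sufficient."""
    return any(nxt[1] < cur[2] for cur, nxt in zip(log, log[1:]))

# Same asynchronous interface, one worker: execution is strictly sequential.
with ThreadPoolExecutor(max_workers=1) as serial:
    serial_log = submit_all(serial, 4)

# Same interface, four workers: now the jobs can actually run concurrently.
with ThreadPoolExecutor(max_workers=4) as concurrent_pool:
    concurrent_log = submit_all(concurrent_pool, 4)

print("sequential run overlapped:", any_overlap(serial_log))   # False
print("4-worker run overlapped:", any_overlap(concurrent_log))
```

Nothing about the interface changed between the two runs; only the execution resources did.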
32 is the number of compute dispatches Maxwell/Pascal can keep track of at a time.
> This was my understanding as well: with Maxwell the partitioning was static at draw-call boundaries, but with a fixed-latency pipeline I would expect it to be possible to develop a heuristic that, at the very least, doesn't lead to performance loss. Why does MDolenc's bench scale with 32 though? Why 32?
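Taking the 32-dispatch figure at face value, here's a toy model of why a benchmark could scale in steps of 32. It assumes (my assumption, not something measured in this thread) that dispatches beyond the 32 in flight simply wait for the next batch, and that every batch costs about the same made-up time. Total time then follows ceil(n / 32) instead of growing smoothly with n.

```python
import math

SLOTS = 32  # dispatches Maxwell/Pascal can track at once, per the post above

def batches_needed(n_dispatches, slots=SLOTS):
    """Batches needed if only `slots` dispatches can be in flight at a time
    (assumption: extra dispatches wait for the next batch)."""
    return math.ceil(n_dispatches / slots)

def modeled_time_ms(n_dispatches, batch_time_ms=1.0, slots=SLOTS):
    """Modeled total time; batch_time_ms is a hypothetical per-batch cost."""
    return batches_needed(n_dispatches, slots) * batch_time_ms

# Time stays flat from 1 to 32 dispatches, then steps up at 33 and again at 65.
for n in (1, 31, 32, 33, 64, 65, 96):
    print(n, batches_needed(n), modeled_time_ms(n))
```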
I assume you're talking about variations in the latency part? Yeah, that's pretty much the worst possible case for Maxwell. There are only 128 draw calls that keep a 980 Ti busy for about 22 ms, and it's extremely light on the geometry side (games will do thousands of draw calls in less time). Nevertheless, it shows that it can interrupt graphics rendering with high-priority compute.
> MDolenc, I am getting large variations with the async compute test. Initially I thought it was because my browser was open, so I closed it, but I still had problems. Appears sporadic, lol.
Anyway, I managed to get a log of one of the runs in which it worked properly.
It turns out all the work on the compute queue is from dwm (the Windows Desktop Window Manager).
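Back-of-the-envelope numbers for MDolenc's test (128 draw calls keeping a 980 Ti busy for about 22 ms), assuming the time is spread evenly across the draws:

```latex
\bar{t}_{\text{draw}} \approx \frac{22\ \text{ms}}{128} \approx 0.17\ \text{ms} \approx 172\ \mu\text{s per draw}
```

For comparison, 1,000 draws in the same 22 ms would average 22 µs each, and games do more draws in less time, so each draw in this test is unusually long. One plausible reading is that this is what makes it a worst case for hardware whose graphics/compute partitioning is tied to draw-call boundaries, as suggested for Maxwell earlier in the thread.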
Would there be any reason to use Asynchronous Compute if not looking for performance improvements?
> Again. Asynchronous has NOTHING to do with concurrency. You can have a fully asynchronous interface that executes sequentially and gets no performance gain.
I vaguely recall posts saying that all modern GPUs can execute different graphics shaders on the same CU/SM/EU or whatever in parallel. If that's true and you stuck to using just graphics shaders, couldn't you increase HW utilization on all brands by running more than one in parallel? Asked another way, what is it about compute shaders that you cannot do from a graphics shader?
Thanks! The text at https://www.khronos.org/registry/vulkan/specs/1.0/xhtml/vkspec.html#fundamentals-queueoperation seems to imply that even commands submitted to the same queue can theoretically execute in parallel. Does this not work out very well in practice?
> Nothing. Compute shaders cannot access geometry resources, but you could run the equivalent of a compute shader in the graphics queue using simple proxy geometry. The advantage lies in having independent queues.
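My reading of that spec section is that, without explicit synchronization, commands submitted to the same queue may execute in arbitrary order relative to each other and/or concurrently. A toy Python model of that freedom, ignoring resource limits entirely: everything between two barriers runs at once, and a barrier makes later commands wait for all earlier ones. The command names, durations and the BARRIER marker (standing in for a pipeline barrier) are hypothetical.

```python
from dataclasses import dataclass

BARRIER = "barrier"  # hypothetical marker standing in for a pipeline barrier

@dataclass
class Span:
    name: str
    start: float
    end: float

def schedule_queue(commands):
    """Toy model of one queue with unlimited resources: commands between two
    barriers all run concurrently; a barrier makes later commands wait until
    every earlier command has finished. Returns (spans, total_time)."""
    spans = []
    segment_start = 0.0   # when commands after the latest barrier may begin
    latest_end = 0.0      # when everything submitted so far has finished
    for cmd in commands:
        if cmd == BARRIER:
            segment_start = latest_end
            continue
        name, duration = cmd
        end = segment_start + duration
        spans.append(Span(name, segment_start, end))
        latest_end = max(latest_end, end)
    return spans, latest_end

queue = [("drawA", 4.0), ("dispatchB", 3.0), BARRIER, ("drawC", 2.0)]
spans, total = schedule_queue(queue)
for s in spans:
    print(s.name, s.start, s.end)   # drawA 0.0 4.0 / dispatchB 0.0 3.0 / drawC 4.0 6.0
print(total)                        # 6.0 (fully serial would take 9.0)
```

How much of that freedom real drivers and hardware actually use is up to them, which is one way to read the point that the advantage lies in having independent queues.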
Sorry to be dense, but I don't understand: do you mean that the GPU driver is in some sense a third party that cannot be relied on to run commands submitted to the same queue in parallel?
> You mean relying on a third party to secure your business? Never.
Correct. AMD doesn't provide official console support. I say official because people talk at conferences, etc. The console vendor devrel people are very knowledgeable, so they handle devrel just fine.
> AMD isn't targeting the consoles specifically, I believe.
Consoles certainly helped adoption of async compute.
> Would Async Compute be integral to the development and design of games we are seeing on PC without the current consoles that AMD controls?
Compute shaders bypass the graphics pipeline, allowing work to launch as fast as possible with as little overhead as possible. Creating work groups that execute with access to shared local memory is also useful at times.
> ...what is it about compute shaders that you cannot do from a graphics shader?
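A rough illustration of what work groups plus shared local memory buy you: the invocations in one group cooperate on a single result, separated by barriers. This is a sequential Python emulation of a shared-memory tree reduction, not real GPU code, and the work group size of 8 is an arbitrary choice.

```python
def workgroup_reduce(values, workgroup_size=8):
    """Sequential emulation of one compute work group summing its inputs
    through shared local memory with a tree reduction."""
    if workgroup_size < 1 or workgroup_size & (workgroup_size - 1):
        raise ValueError("workgroup_size must be a power of two")
    if len(values) > workgroup_size:
        raise ValueError("at most one value per invocation")

    # Step 0: each invocation (local id `lid`) writes one value into shared memory.
    shared = [values[lid] if lid < len(values) else 0 for lid in range(workgroup_size)]

    stride = workgroup_size // 2
    while stride > 0:
        # Barrier semantics: every invocation reads the previous step's array,
        # and the new array replaces it only once the whole step is done.
        shared = [shared[lid] + shared[lid + stride] if lid < stride else shared[lid]
                  for lid in range(workgroup_size)]
        stride //= 2
    return shared[0]  # invocation 0 ends up holding the group's total

print(workgroup_reduce([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
print(workgroup_reduce([5, 7, 9]))                 # 21
```

The log2(group size) steps, each needing every invocation to see the others' writes, is exactly the kind of cooperation shared local memory and group barriers exist for.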
What is still "wrong" with Quantum Break on Nvidia?
> What went wrong (and still is) with Quantum Break for Nvidia hardware?
GCN, definitely. Others, well, who knows.
> I vaguely recall posts saying that all modern GPUs can execute different graphics shaders on the same CU/SM/EU or whatever in parallel.
It's very simple: sometimes even when you have all these choices about what shader types to run on the ALUs, there's not enough work to keep all of the ALUs busy. For example, you can't have 2 pixel shaders running at the same time, because a draw call requires at most one of each type of shader (hull, domain, vertex, geometry, pixel). The best you can get is that as one draw call completes, the next draw call starts, so for a brief time there could be an overlap (it's unclear if any GPUs can overlap like this).
> If that's true and you stuck to using just graphics shaders, couldn't you increase HW utilization on all brands by running more than one in parallel? Asked another way, what is it about compute shaders that you cannot do from a graphics shader?
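A simplified occupancy model of the "not enough work to keep all of the ALUs busy" point, with made-up numbers: if the current draw can only supply part of the wave slots the GPU can hold, the rest sit idle unless independent work (for example compute on another queue) is there to fill them. Register and shared-memory limits, which also cap occupancy in practice, are ignored, and the slot count is hypothetical.

```python
def utilization(graphics_waves, capacity, compute_waves=0):
    """Fraction of the GPU's wave slots that are busy.

    graphics_waves: waves the current draw call can supply right now
    capacity:       wave slots the GPU can hold at once
    compute_waves:  independent compute waves available to fill idle slots
    """
    busy_graphics = min(graphics_waves, capacity)
    idle = capacity - busy_graphics
    busy_compute = min(compute_waves, idle)   # compute only uses leftover slots
    return (busy_graphics + busy_compute) / capacity

CAPACITY = 40  # hypothetical slot count, not a real GPU spec
print(utilization(24, CAPACITY))                    # 0.6
print(utilization(24, CAPACITY, compute_waves=10))  # 0.85
print(utilization(24, CAPACITY, compute_waves=30))  # 1.0
```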
I assume all can overlap work from multiple draws.
> ...best you can get is that as one draw call completes, the next draw call starts, so for a brief time there could be an overlap (it's unclear if any GPUs can overlap like this).
Any developer is free to talk to AMD about their hardware, and since there's a lot of overlap between PC and console, AMD indirectly provides console support. But the consoles are ultimately not AMD products, so the console vendors have their own support teams.
> So when a game is developed initially on the console and ported to PC (the original point of Mantle), AMD only provides support to the major AAA studios on the PC side?
I do get that and agree; in fact, I mentioned overlap in response to Jawed.
> I assume all can overlap work from multiple draws.
That's another casualty of the async compute shit storm and those nice YouTube animations of how work is processed in the GPU: the idea that draw calls go through the GPU like little ducklings. It's simply wrong.
> It's very simple: sometimes even when you have all these choices about what shader types to run on the ALUs, there's not enough work to keep all of the ALUs busy. For example, you can't have 2 pixel shaders running at the same time, because a draw call requires at most one of each type of shader (hull, domain, vertex, geometry, pixel). The best you can get is that as one draw call completes, the next draw call starts, so for a brief time there could be an overlap (it's unclear if any GPUs can overlap like this).
That depends on whether the blend mode and render targets are commutative, doesn't it? The vertex-based shaders are trivial to pipeline or even to execute out of order between draw calls, but the pixel/fragment shaders can't overlap unless the blend operation itself is independent of the order of execution.
> That draw calls go through GPU like little ducklings. It's simply wrong.
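A small Python check of the blend-order point, using a single colour channel and exact fractions: additive blending gives the same result whichever of two draws lands first, while classic "over" alpha blending (src * alpha + dst * (1 - alpha)) does not. Exact arithmetic is used on purpose; with floating point, even additive blending is not bit-exact under reordering, because float addition is not associative.

```python
from fractions import Fraction as F

def additive(dst, src):
    """Additive blend: dst + src."""
    return dst + src

def over(dst, src, alpha):
    """Classic alpha blend: src * alpha + dst * (1 - alpha)."""
    return src * alpha + dst * (1 - alpha)

background = F(0)

# Additive: draw A (adds 1/2) then draw B (adds 1/4), versus B then A.
a_then_b = additive(additive(background, F(1, 2)), F(1, 4))
b_then_a = additive(additive(background, F(1, 4)), F(1, 2))
print(a_then_b, b_then_a)            # 3/4 3/4

# "Over": a half-transparent layer with channel value 1 and one with value 0.
one_then_zero = over(over(background, F(1), F(1, 2)), F(0), F(1, 2))
zero_then_one = over(over(background, F(0), F(1, 2)), F(1), F(1, 2))
print(one_then_zero, zero_then_one)  # 1/4 1/2
```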
GPUs can overlap two draw calls (graphics) just as well as they can overlap two dispatch calls (compute).
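A small scheduling sketch of that claim, contrasting the "ducklings" picture (the next draw waits until the previous one has fully drained) with letting the next draw's waves start on whatever ALU groups free up. The wave durations and the 4 units are invented for illustration, and ordering constraints such as blending are not modelled; the same picture applies to back-to-back dispatches.

```python
import heapq

def makespan(draws, units, overlap):
    """Finish time when each draw's waves are greedily placed on `units` ALU
    groups. overlap=False: a draw may not start until the previous draw has
    completely finished. overlap=True: the next draw's waves start on any
    unit as soon as it frees up."""
    free_at = [0.0] * units          # min-heap of times at which each unit is free
    heapq.heapify(free_at)
    draw_start = 0.0                 # earliest time the current draw may begin
    finish = 0.0
    for waves in draws:
        draw_end = draw_start
        for duration in waves:
            t = max(heapq.heappop(free_at), draw_start)
            heapq.heappush(free_at, t + duration)
            draw_end = max(draw_end, t + duration)
        finish = max(finish, draw_end)
        if not overlap:
            draw_start = draw_end    # serialize: wait for the whole draw to drain
    return finish

draws = [[4.0, 1.0, 1.0, 1.0],   # draw A: one long wave, three short ones
         [2.0, 2.0, 2.0, 2.0]]   # draw B
print(makespan(draws, units=4, overlap=False))  # 6.0
print(makespan(draws, units=4, overlap=True))   # 5.0
```

The gain comes entirely from draw A's tail: in the serialized case three units sit idle while its one long wave finishes.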