Asynchronous Compute : what are the benefits?

Karamazov · Apr 9, 2015

OK, thanks for the explanation!

iroboto · Apr 21, 2015

A little late in sharing this bit of information, but I think I've finally figured out where that 14+4 CU line of thinking came from.

Reading very thoroughly through the leaked Xbox SDK, its description of async compute usage says that by default only 4 CUs are set aside for processing the job. You must alter the parameter to use additional CUs at your own discretion. I'm unsure whether you can use fewer CUs.

This leads to an interesting factoid: the Xbox One can dispatch at most 3 async jobs simultaneously and the PS4 4, assuming of course that you cannot use fewer than 4 CUs per job.
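The arithmetic behind that factoid can be sketched in a few lines of Python. This only illustrates the reasoning in the post, assuming each job reserves a fixed number of whole CUs. The function name `max_concurrent_jobs` and its parameters are hypothetical and do not come from either console's SDK.

```python
def max_concurrent_jobs(total_cus: int, cus_per_job: int = 4) -> int:
    """Upper bound on async jobs in flight if every job reserves
    `cus_per_job` whole CUs (the assumption made in the post)."""
    if cus_per_job <= 0:
        raise ValueError("cus_per_job must be positive")
    # Floor division: a partial group of CUs cannot host another job.
    return total_cus // cus_per_job


if __name__ == "__main__":
    # CU counts as used in this thread: 12 for Xbox One, 18 for PS4.
    for name, cus in (("Xbox One", 12), ("PS4", 18)):
        print(f"{name}: {max_concurrent_jobs(cus)}")
```

With the default of 4 CUs per job this gives 3 for the Xbox One and 4 for the PS4. If a smaller reservation were allowed, the bound would rise accordingly.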

Arwin · May 19, 2015

Just came across this:
http://www.dualshockers.com/2014/09...w-childrens-developer-to-save-5-ms-per-frame/

It doesn't seem like that much of an improvement, but you don't get something like this very often. It also seems to be an interesting game in other ways:

http://www.dualshockers.com/2014/09...lighting-technology-with-new-wip-screenshots/

Karamazov · May 19, 2015

So does that mean that async compute is a tech that could be "patched" into current engines/games, or is it something that needs to be there from the start?

DrJay24 · May 19, 2015

iroboto said:
A little late in sharing this bit of information, but I think I've finally figured out where that 14+4 CU line of thinking came from.

Reading very thoroughly through the leaked Xbox SDK, its description of async compute usage says that by default only 4 CUs are set aside for processing the job. You must alter the parameter to use additional CUs at your own discretion. I'm unsure whether you can use fewer CUs.

This leads to an interesting factoid: the Xbox One can dispatch at most 3 async jobs simultaneously and the PS4 4, assuming of course that you cannot use fewer than 4 CUs per job.

I'm having a hard time following what you are trying to say. How did you make the leap from the XB1 SDK to something about the PS4? How do the 8 compute pipelines with 8 queues in the PS4 GPU translate to a discrete number of CUs for compute?

iroboto · May 19, 2015

DrJay24 said:
I'm having a hard time following what you are trying to say. How did you make the leap from the XB1 SDK to something about the PS4? How do the 8 compute pipelines with 8 queues in the PS4 GPU translate to a discrete number of CUs for compute?

When you submit an async compute job to the system, per the Xbox SDK the number of CUs leveraged for a job is 4 by default, unless you change it to a larger number. This is what is written in the SDK.

The async controllers look and wait for availability to insert work into the CUs. But each job (in the Xbox case) will block off at least 4 CUs for the task.

If the two GPUs are similar in this respect (default CU reservation), then the default reservation is 4 CUs on both, which coincidentally lines up with all the hubbub about 14+4 a long time ago.

Globalisateur · May 19, 2015

iroboto said:
When you submit an async compute job to the system, per the Xbox SDK the number of CUs leveraged for a job is 4 by default, unless you change it to a larger number. This is what is written in the SDK.

The async controllers look and wait for availability to insert work into the CUs. But each job (in the Xbox case) will block off at least 4 CUs for the task.

If the two GPUs are similar in this respect (default CU reservation), then the default reservation is 4 CUs on both, which coincidentally lines up with all the hubbub about 14+4 a long time ago.

12 CUs (Xbox One) = 4×3
18 CUs (PS4) = 14 + 4 = 3×4 + 4 + 2

You'd better not use 4 as the default CU reservation on PS4...
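To make the leftover explicit, here is a minimal Python sketch of that decomposition. It assumes CUs are handed out in fixed-size reservations and that whatever does not fill a whole reservation is left over. `split_into_reservations` is a hypothetical helper written for this illustration, not an SDK call.

```python
def split_into_reservations(total_cus: int, reservation: int = 4) -> list[int]:
    """Split a GPU's CUs into full reservations plus any remainder.

    Returns the sizes of each group, e.g. [4, 4, 4, 4, 2] for 18 CUs.
    """
    if reservation <= 0:
        raise ValueError("reservation must be positive")
    full_groups, leftover = divmod(total_cus, reservation)
    groups = [reservation] * full_groups
    if leftover:
        # These CUs cannot host a job under a fixed-size reservation.
        groups.append(leftover)
    return groups


if __name__ == "__main__":
    print(split_into_reservations(12))  # Xbox One CU count used in the thread
    print(split_into_reservations(18))  # PS4 CU count used in the thread
```

For 12 CUs every group is full, while for 18 CUs the last group holds only 2 CUs, which is the remainder being pointed at above.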

forumaccount · May 20, 2015

Karamazov said:
So does that mean that async compute is a tech that could be "patched" in current engines/games or is it something that needs to be there from start ?

There's no reason it couldn't be patched in, although patches are not really the right time to be doing that sort of thing unless there's a huge, previously unnoticed performance issue on ship.

dobwal · May 20, 2015

iroboto said:
When you submit an async compute job to the system, per the Xbox SDK the number of CUs leveraged for a job is 4 by default, unless you change it to a larger number. This is what is written in the SDK.

The async controllers look and wait for availability to insert work into the CUs. But each job (in the Xbox case) will block off at least 4 CUs for the task.

If the two GPUs are similar in this respect (default CU reservation), then the default reservation is 4 CUs on both, which coincidentally lines up with all the hubbub about 14+4 a long time ago.

I'm guessing that's because in GCN, CUs are grouped in fours, since each group shares an instruction cache and a scalar data cache.

psorcerer · May 20, 2015

iroboto said:
Xbox One can only at most dispatch 3 Async jobs simultaneously and PS4 4

There is still no public info that you can use compute without calling DX11/12 on Xone (i.e. without API overhead).
It is possible on PS4.

deanos · Jun 18, 2015

I'm pretty sure there was a page 16 already and some talk about Dreams. What happened?

Shifty Geezer · Jun 18, 2015

Sorry, I should have posted a link. I moved the Dreams tech discussion here as it's not about async compute but is a more general rendering algorithm.

Starx · Sep 3, 2015

Instead of 8 ACEs, there are 4 ACEs in Fury!!

Clukos · Sep 3, 2015

Is that why it doesn't see as much improvement in the Dx12 Ashes demo as the 290x/390x? That's weird.

chris1515 · Sep 3, 2015

Starx said:
Instead of 8 ACEs, there are 4 ACEs in Fury!!

And 2 HWS, an evolution of the architecture

chris1515 · Sep 3, 2015

Clukos said:
Is that why it doesn't see as much improvement in the Dx12 Ashes demo as the 290x/390x? That's weird.

The compute scheduler architectures are different. The only thing we know is that they have 2 HWS, another sort of scheduler that is "smarter" than an ACE.

Starx · Sep 3, 2015

I found this:

All newer GCN 1.2 cards have this configuration. There are 4 core ACEs. The two HWS units can do the same work as 4 ACEs, so this is why AMD refers to 8 ACEs in some presentations. The HWS units are just smarter and can support more interesting workloads, but AMD doesn't talk about these right now. I think it has something to do with the HSA QoS feature. Essentially the GCN 1.2 design is not just an efficient multitasking system, but also good for multi-user environments.

Most GPUs are not designed to run more than one program, because these systems are not optimized for latency. They can execute multiple GPGPU programs, but executing a game while a GPGPU program is running won't give you good results. This is why HSA has a graphics preemption feature. These GCN 1.2 GPUs can prioritize all graphics tasks to provide low-latency output. QoS is just one level further. It can run two games, or a game and a GPGPU app, simultaneously for two different users, and the performance/experience will be really good with these HWS units.

http://forums.anandtech.com/showpost.php?p=37656793&postcount=204

Allandor · Sep 3, 2015

Starx said:
I found this:

http://forums.anandtech.com/showpost.php?p=37656793&postcount=204

Sounds a bit like the queues used in the Xbox One GPU. It would make sense to have specific hardware for the different "VMs"/programs that use the GPU.

iroboto · Sep 3, 2015

Clukos said:
Is that why it doesn't see as much improvement in the Dx12 Ashes demo as the 290x/390x? That's weird.

ACEs don't increase performance. They just determine how many async threads you can hold. Each async thread grabs 4 CUs as written above, so the number of CUs divided by 4 is the number of concurrent threads the GPU can operate on.

In this case, 16 threads. Each ACE holds 8 threads.
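The distinction between how many threads can be held and how many can run at once can be sketched as a toy model in Python. It encodes only the mental model from this post (8 threads held per ACE, 4 CUs grabbed per running thread) and is not a description of verified hardware behaviour. The `GpuModel` class and its field names are hypothetical, and the 64-CU figure for Fury is an assumption chosen to match the "16 threads" above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GpuModel:
    """Toy model of the post's reasoning, not of real scheduling hardware."""
    name: str
    compute_units: int
    aces: int
    threads_per_ace: int = 8   # "Each ACE holds 8 threads"
    cus_per_thread: int = 4    # default reservation discussed in the thread

    @property
    def held_threads(self) -> int:
        # How many async threads the ACEs can keep queued.
        return self.aces * self.threads_per_ace

    @property
    def concurrent_threads(self) -> int:
        # How many can actually occupy CUs at the same time.
        return self.compute_units // self.cus_per_thread


if __name__ == "__main__":
    fury = GpuModel("Fury (assumed 64 CUs)", compute_units=64, aces=4)
    more_aces = replace(fury, aces=8)
    for gpu in (fury, more_aces):
        print(gpu.aces, gpu.held_threads, gpu.concurrent_threads)
```

Under these assumptions, doubling the ACE count doubles the number of threads held (32 to 64) but leaves the concurrent count at 16, which is the point being made about ACEs not increasing performance by themselves.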

chris1515 · Sep 3, 2015

It is more about efficiency. The only known game with more than 2 queues is The Tomorrow Children, and it only uses 3 queues (ACEs), far from 7 (one is reserved for the OS).
