A little late in sharing this bit of information, but I think I've finally figured out where that 14+4 CU line of thinking came from.
Reading very thoroughly through the leaked Xbox SDK, its documentation on async compute states that by default only 4 CUs are set aside to process a job. You must alter a parameter, at your own discretion, to use additional CUs. I'm unsure whether you can use fewer CUs.
This leads to an interesting factoid: if you cannot use fewer than 4 CUs per job, the Xbox One can dispatch at most 3 async jobs simultaneously and the PS4 at most 4 (12 CUs / 4 = 3 versus 18 CUs / 4 = 4.5, rounded down to 4).
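The arithmetic behind those numbers can be sketched as follows. This assumes the thread's model that every async job blocks off a fixed 4 CUs and can't go lower, and uses the consoles' active CU counts (12 on Xbox One, 18 on PS4). The function name `max_concurrent_jobs` is hypothetical, not anything from the SDK.

```python
# Hypothetical model from the post above: each async job reserves a fixed
# block of CUs (4 by default, per the reading of the Xbox SDK), and a job
# cannot use fewer CUs than that.
DEFAULT_CUS_PER_JOB = 4


def max_concurrent_jobs(total_cus: int, cus_per_job: int = DEFAULT_CUS_PER_JOB) -> int:
    """Number of whole jobs that fit when every job blocks off cus_per_job CUs."""
    if cus_per_job <= 0:
        raise ValueError("cus_per_job must be positive")
    # Integer division: a partial block of CUs can't host another job.
    return total_cus // cus_per_job


# Active CU counts of the two console GPUs.
CONSOLES = {"Xbox One": 12, "PS4": 18}

for name, cus in CONSOLES.items():
    print(f"{name}: {cus} CUs -> {max_concurrent_jobs(cus)} async jobs")
```

Raising `cus_per_job` (the "use additional CUs" option) only lowers the job count further, e.g. 18 CUs at 6 CUs per job leaves room for 3 jobs.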
I'm having a hard time following what you are trying to say. How did you make the leap from the XB1 SDK to something about the PS4? How do the 8 compute pipelines (with 8 queues each) in the PS4 GPU translate to a discrete number of CUs for compute?
When you submit an async compute job, according to the Xbox SDK the number of CUs leveraged for that job is 4 by default, unless you change it to a larger number. This is what is written in the SDK.
The async controllers watch for available CUs into which they can insert work. But each job (in the Xbox case) will block off at least 4 CUs for the task.
If the two GPUs are similar in this respect, the PS4's default CU reservation would also be 4 CUs, which coincidentally matches all the hubbub about 14+4 a long time ago.
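A toy model of that reservation behaviour might look like the sketch below. It is purely illustrative: the `CUPool` class and its methods are hypothetical, it assumes a fixed 4-CU block per job as described above, and it ignores graphics work sharing the same CUs and everything else real hardware schedulers do.

```python
from collections import deque


class CUPool:
    """Toy model (hypothetical): an async controller that dispatches a queued
    job only when enough free CUs exist to block off its full reservation."""

    def __init__(self, total_cus: int, cus_per_job: int = 4):
        self.free = total_cus
        self.cus_per_job = cus_per_job
        self.pending = deque()  # jobs waiting for CUs to become available
        self.running = []       # jobs currently holding a CU reservation

    def submit(self, job: str) -> None:
        self.pending.append(job)
        self._dispatch()

    def finish(self, job: str) -> None:
        # Completing a job returns its whole reservation to the pool.
        self.running.remove(job)
        self.free += self.cus_per_job
        self._dispatch()

    def _dispatch(self) -> None:
        # Insert waiting work only while a full block of CUs is available.
        while self.pending and self.free >= self.cus_per_job:
            self.free -= self.cus_per_job
            self.running.append(self.pending.popleft())


pool = CUPool(total_cus=12)  # Xbox One-sized GPU
for i in range(5):
    pool.submit(f"job{i}")
print(pool.running)        # ['job0', 'job1', 'job2']
print(list(pool.pending))  # ['job3', 'job4']
pool.finish("job0")
print(pool.running)        # ['job1', 'job2', 'job3']
```

The fourth and fifth jobs sit in the queue until a running job frees its 4-CU block, which is the "waiting for availability" behaviour described in the post.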
So does that mean async compute is a technique that could be "patched" into current engines/games, or is it something that needs to be there from the start?
Instead of 8 ACEs, there are 4 ACEs in Fury!!
Is that why it doesn't see as much improvement in the DX12 Ashes demo as the 290X/390X? That's weird.
All newer GCN 1.2 cards have this configuration. There are 4 core ACEs. The two HWS units can do the same work as 4 ACEs, which is why AMD refers to 8 ACEs in some presentations. The HWS units are just smarter and can support more interesting workloads, but AMD doesn't talk about these right now. I think it has something to do with the HSA QoS feature. Essentially, the GCN 1.2 design is not just an efficient multitasking system; it is also good for multi-user environments.
Most GPUs are not designed to run more than one program, because these systems are not optimized for latency. They can execute multiple GPGPU programs, but running a game while a GPGPU program is running won't give you good results. This is why HSA has a graphics preemption feature. These GCN 1.2 GPUs can prioritize all graphics tasks to provide low-latency output. QoS goes one level further: it can run two games, or a game and a GPGPU app, simultaneously for two different users, and the performance/experience will be really good with these HWS units.
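The idea of prioritizing graphics tasks over GPGPU work can be illustrated with a simple priority queue. This is only a sketch: the `PriorityDispatcher` class and the two priority levels are hypothetical, and real graphics preemption interrupts work that is already executing, whereas this toy only reorders tasks that are still waiting.

```python
import heapq
import itertools

# Hypothetical priority levels: a lower number is dispatched first.
GRAPHICS, COMPUTE = 0, 1


class PriorityDispatcher:
    """Toy scheduler: queued graphics tasks always jump ahead of queued GPGPU
    tasks, mimicking the idea of prioritizing graphics for low-latency output."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # FIFO tie-break within one priority

    def submit(self, task: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._order), task))

    def next_task(self):
        # Returns None when nothing is queued.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


d = PriorityDispatcher()
d.submit("physics sim", COMPUTE)
d.submit("frame N draw", GRAPHICS)
d.submit("video encode", COMPUTE)
d.submit("frame N+1 draw", GRAPHICS)
order = [d.next_task() for _ in range(4)]
print(order)  # ['frame N draw', 'frame N+1 draw', 'physics sim', 'video encode']
```

The counter keeps tasks of equal priority in submission order, so compute work is delayed but not reordered among itself.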
Sounds a bit like the queues used in the Xbox One GPU. It would make sense to have specific hardware for the different "VMs"/programs that use the GPU.
ACEs don't increase performance; they just determine how many async threads you can hold. Each async thread grabs 4 CUs, as written above, so the number of CUs divided by 4 is the number of concurrent threads the GPU can operate on.
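That "hold versus run" distinction can be sketched as below. It encodes this post's model, not a confirmed hardware description: it assumes 8 queues per ACE (the GCN figure), the 4-CU-per-job reservation from this thread, 4 ACEs for Fury X as stated earlier in the thread (the HWS units are not modeled), and the desktop CU counts of 44 for the R9 290X and 64 for the R9 Fury X. The `Gpu` class is hypothetical.

```python
from dataclasses import dataclass

QUEUES_PER_ACE = 8  # each GCN ACE manages 8 queues
CUS_PER_JOB = 4     # default reservation assumed in this thread


@dataclass
class Gpu:
    name: str
    aces: int
    cus: int

    def jobs_held(self) -> int:
        # In this model, ACE count only sets how many async jobs can be queued.
        return self.aces * QUEUES_PER_ACE

    def jobs_running(self) -> int:
        # Actual concurrency is capped by CUs, not by ACEs.
        return min(self.jobs_held(), self.cus // CUS_PER_JOB)


for gpu in (Gpu("R9 290X", aces=8, cus=44), Gpu("R9 Fury X", aces=4, cus=64)):
    print(f"{gpu.name}: holds {gpu.jobs_held()}, runs {gpu.jobs_running()}")
```

Under this model the 290X holds 64 jobs but runs 11, while the Fury X holds 32 and runs 16, so the ACE count never becomes the limiting factor on either card.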