Yes, I'm honestly curious what the benefits of running multiple compute kernels in parallel really are (à la AMD's >2 ACEs). It is beneficial if you cannot overlap an independent graphics workload and you have multiple independent compute workloads to run, but I'm not sure how important that is in practice. And if anyone starts talking about numbers of hardware queues and ACEs and whatever else, you can pretty safely ignore that as marketing/fanboy nonsense that adds more confusion than useful information.
Certainly a lot depends on the workload, the developer, *and* the API's ability to expose that parallelism in the first place. OpenGL ES 3.1's memory barrier mechanism is quite bad at this, and that will hide some of the benefit of HW support for parallel graphics and compute. I think DX11 would have the same problem (or worse), while DX12/Vulkan/Mantle should all do a much better job of it, but I don't know enough about the APIs other than GLES and Vulkan to be certain.
Another thing to consider is that if you already have enough parallelism in one workload, then running a second one at the same time risks thrashing your cache, and arbitration between the two may also be non-trivial. Again, I have never done any performance analysis of GCN, so I don't know how well they handle that, but it's certainly something that I expect will benefit from gradual improvement between hardware generations.
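To make the cache point a bit more concrete, here is a toy sketch, and only that: it is not a model of GCN or of any real GPU cache. It assumes a fully associative LRU cache of 64 lines and two made-up workloads that each loop over 48 lines, so each fits on its own but the pair does not. All the names (`lru_misses`, `workload`, `CAPACITY`) and all the sizes are invented for illustration.

```python
from collections import OrderedDict


def lru_misses(accesses, capacity):
    """Count misses for an access trace on a fully associative LRU cache."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for addr in accesses:
        if addr in cache:
            cache.move_to_end(addr)  # hit: mark as most recently used
        else:
            misses += 1
            cache[addr] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


def workload(name, working_set, passes):
    """Hypothetical workload: loops over its own working set several times."""
    return [(name, line) for _ in range(passes) for line in range(working_set)]


CAPACITY = 64  # assumed cache size in lines, chosen only for the example
a = workload("A", 48, 10)
b = workload("B", 48, 10)

# Back to back: each workload has the whole cache to itself while it runs.
sequential = lru_misses(a + b, CAPACITY)

# Concurrent: accesses alternate A, B, A, B, ... and compete for the same lines.
interleaved = lru_misses([x for pair in zip(a, b) for x in pair], CAPACITY)

print(f"sequential:  {sequential} misses out of {len(a) + len(b)} accesses")
print(f"interleaved: {interleaved} misses out of {len(a) + len(b)} accesses")
```

With these particular numbers the back-to-back run only takes the 96 compulsory misses, while the interleaved run misses on all 960 accesses, because the combined working set of 96 lines cycles through a 64-line LRU cache. Real hardware is nowhere near this pathological, but it shows why overlapping a second workload is not automatically a win.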