I wasn't thinking of a specific market, more of a value-add to existing markets, gaming included.
AMD seemed to think there was value in adding a custom audio accelerator in the form of a Tensilica DSP; however, I would have thought a small x86 core (Bobcat, for example) on every GPU they manufactured would have been a better fit. They could have recommended that developers use it to offload audio processing initially.
I thought about this a few years ago, with two primary motivations: 1) merging the volumes for APU and low-end discrete silicon in the face of rising costs and AMD's declining share in both, and 2) providing a serial resource for certain heterogeneous compute situations where the data-transfer overhead of discrete cards and the poor serial performance of the GPU made even the strongest GPUs unappealing.
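To make point 2 concrete, here is a toy latency model in Python. Every rate, size, and bandwidth figure is a hypothetical placeholder, not a measurement of any AMD part, and the function names are made up for illustration. It only shows how bus transfer plus a weak serial path can sink a discrete GPU that wins comfortably on the parallel portion, and how a small serial core sitting next to the GPU's memory would change the comparison.

```python
# Toy latency model for point 2. All figures are hypothetical placeholders,
# not measurements of any real CPU, GPU, or bus.

def host_time(serial_ops, parallel_ops, cpu_serial_rate, cpu_parallel_rate):
    """Whole task stays on the host CPU, so nothing crosses the bus."""
    return serial_ops / cpu_serial_rate + parallel_ops / cpu_parallel_rate


def discrete_gpu_time(serial_ops, parallel_ops, bytes_moved, bus_bandwidth,
                      gpu_serial_rate, gpu_parallel_rate):
    """Data crosses the bus, and the serial part runs on the GPU's weak serial path."""
    transfer = bytes_moved / bus_bandwidth
    return transfer + serial_ops / gpu_serial_rate + parallel_ops / gpu_parallel_rate


def gpu_with_local_core_time(serial_ops, parallel_ops,
                             core_serial_rate, gpu_parallel_rate):
    """Hypothetical small serial core on the card: data stays in local memory."""
    return serial_ops / core_serial_rate + parallel_ops / gpu_parallel_rate


# Hypothetical workload and hardware figures (ops, ops/s, bytes, bytes/s).
SERIAL_OPS, PARALLEL_OPS, BYTES_MOVED = 1e8, 1e10, 2e9

host = host_time(SERIAL_OPS, PARALLEL_OPS,
                 cpu_serial_rate=3e9, cpu_parallel_rate=1e11)
discrete = discrete_gpu_time(SERIAL_OPS, PARALLEL_OPS, BYTES_MOVED,
                             bus_bandwidth=1.6e10,
                             gpu_serial_rate=3e8, gpu_parallel_rate=1e12)
local_core = gpu_with_local_core_time(SERIAL_OPS, PARALLEL_OPS,
                                      core_serial_rate=1.5e9,
                                      gpu_parallel_rate=1e12)

for name, t in [("host CPU", host), ("discrete GPU", discrete),
                ("GPU + local serial core", local_core)]:
    print(f"{name:>24}: {t * 1000:.1f} ms")
```

With these made-up numbers the discrete GPU is 10x faster than the host on the parallel portion yet loses overall (roughly 468 ms vs. 133 ms), because the transfer and the serial portion dominate; the hypothetical on-card core removes the transfer and recovers serial speed (roughly 77 ms). Different numbers can easily flip the result; the point is only which terms dominate.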
However, for point 1, it seems the justification never materialized. Perhaps this is partly due to the extra complexity, or because costs did not rise to that level, or because the general slowdown in process nodes plus rebranding was the response to the problem. Another possibility, related to Jaguar or another core, is that x86 in particular might have drawn special attention under AMD's WSA (wafer supply agreement), which might have levied extra payments or restricted which foundries low-end GPUs could be routed to.
The scenario for point 2 has generally just been ignored, perhaps to the detriment of AMD's HSA adoption.
There are scenarios where some of this does occur, but typically they are in the phone realm, where AMD is a non-player, rather than in the power and cost ranges where it is stuck.
The DSP audio block may have been more important to AMD at the time because of its co-development with the consoles, which could leverage the customizability of Tensilica's solutions and sidestep GCN's inadequate latency and synchronization handling at the time. That seems likely given how little impact the discrete-card and other PC implementations had, and how little attention the PC space or AMD itself gave them.
The shift to GPGPU audio processing with GFX8 and later comes at least in part from AMD working out its high-priority queue and CU reservation features. However, this does not map perfectly onto what the DSP block offered. Perhaps the DSP-based TrueAudio was more limited than AMD wants, or the GPGPU method is something of an over-engineered solution for the same kinds of load. If the DSP blocks really did satisfy the desired workload, then a CU, or the possibly larger number of CUs that forms the minimum reservation granularity, represents much more silicon, much more power, and much more context to manage. A CU is probably also less amenable to customization.
Perhaps AMD is hoping its more generic interconnect will allow custom DSPs to show up without needing an architectural carve-out like the one TrueAudio received.
Especially when they can dedicate a specific number of CUs to this (as they have shown in the past), something like virtualization inside the chip.
Isn't that somewhat less virtualized than the old method, where software submitted a workload and the GPU figured out the rest without exposing any details?
I don't know whether AMD can afford to allocate resources to that right now, but if I'm not mistaken, Naples is already an SoC (well, an MCM made up of SoCs) with lots of PCIe lanes and two 10Gb Ethernet links, so it seems fairly well suited to that job, provided its power curve looks good at low clocks.
The CCX structure, which creates what looks like SMP on a single die, also seems to indicate a focus on more modest instancing per CCX rather than a more ambitious core arrangement.