I didn't think audio processing required that much CPU. I recall it barely using a few percent of the CPU back in the days of AC97-class hardware. I'm not sure exactly where I saw that, but I recall it being somewhere around the era of ESS Technology, before Creative came to dominate the market. Obviously CPU performance has increased since then, so audio shouldn't require a larger share now. Or has the cost been ratcheted up several fold by heavier effects?
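To put a rough number on the "few percent" intuition, here's a back-of-envelope sketch. Every figure is an assumption for illustration (voice count, ops per sample, core throughput), not a measurement:

```python
# Back-of-envelope cost of plain voice mixing vs. one modern CPU core.
# All figures below are assumed for illustration, not measured.
VOICES = 128          # simultaneous voices (assumed)
RATE = 48_000         # samples per second
OPS_PER_SAMPLE = 10   # mix + gain + pan per voice per sample (rough guess)

CORE_OPS = 3.5e9 * 4  # one 3.5 GHz core retiring 4 FLOPs/cycle (assumed)

audio_ops = VOICES * RATE * OPS_PER_SAMPLE   # ~61 MFLOP/s
cpu_fraction = audio_ops / CORE_OPS
print(f"{cpu_fraction:.2%} of one core")     # well under 1%
```

Under those assumptions, straightforward mixing lands at a fraction of a percent of a single core; it's the fancier effects (convolution reverb, per-source HRTF filtering) that multiply the per-sample cost.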
My question revolves around what could be done with the remaining fraction of the Tempest CU's throughput once the system-controlled 3D audio wavefront takes its cut. In the case of audio, many of the use cases discussed so far have had low consumption.
However, even if a developer did have an audio pipeline that only needed a small amount of throughput, does Tempest's form of compute justify the effort to leverage it, particularly if a CPU path already exists?
If audio does get a boost in throughput, it would hit a ceiling more quickly on Tempest than on the CPU. Being forced to straddle Tempest and the CPU once the workload overwhelms Tempest is a bigger headache and source of complexity than staying in the CPU or GPU pool, where capacity is orders of magnitude higher.
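For a sense of scale on that ceiling, a quick comparison using publicly quoted ballpark figures (these are assumptions, not benchmarks: Tempest was described as having roughly the SIMD grunt of PS4's eight Jaguar cores, often estimated around 100 GFLOPS, while the PS5 GPU is quoted at ~10.28 TFLOPS):

```python
# Rough headroom comparison between the Tempest CU and the GPU pool.
# Both figures are assumed ballparks from public statements, not benchmarks.
TEMPEST_GFLOPS = 100.0    # ~eight Jaguar cores' worth of SIMD (assumed)
GPU_GFLOPS = 10_280.0     # PS5 GPU quoted at ~10.28 TFLOPS

headroom_ratio = GPU_GFLOPS / TEMPEST_GFLOPS
print(f"GPU pool has ~{headroom_ratio:.0f}x Tempest's peak throughput")
```

Roughly two orders of magnitude, which is why a workload that outgrows Tempest has so much more room to breathe in the GPU pool.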
The CPU would be the most flexible in terms of programmability, and there would be a higher probability that resources would be available to develop and maintain code on the CPU than on a niche, non-GPU, DMA-based compute unit.
Audio loads on the GPU would have more total throughput to work with, which may be enough to justify the more constrained programming model.
Tempest has none of those throughput advantages, has similar batching requirements, little existing infrastructure or developer pool, and adds a DMA-management burden for the programmer.
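To make the DMA-management point concrete, here's the generic shape of the extra bookkeeping a DMA-fed unit imposes versus a plain in-memory call. All names are illustrative; this is not any real Tempest or platform API, just the ping-pong staging pattern sketched sequentially (real hardware would overlap transfer and compute):

```python
# Generic illustration of the extra work a DMA-fed compute unit imposes,
# versus a direct CPU call. Purely hypothetical names; not a real API.
BLOCK = 256  # samples per DMA transfer (assumed)

def cpu_path(samples, gain=0.5):
    # CPU/GPU pool: the kernel just reads memory directly.
    return [s * gain for s in samples]

def dma_path(samples, gain=0.5):
    # DMA pool: the programmer must split the stream into blocks, stage
    # each block into local memory, run the kernel on the copy, and drain
    # the results back out -- modeled here with two ping-pong buffers.
    out = []
    local = [None, None]  # two staging buffers (ping-pong)
    for i, start in enumerate(range(0, len(samples), BLOCK)):
        buf = i % 2
        local[buf] = samples[start:start + BLOCK]   # "DMA in"
        processed = [s * gain for s in local[buf]]  # compute on local copy
        out.extend(processed)                       # "DMA out"
    return out
```

Same result either way, but the DMA path drags in block sizing, buffer rotation, and transfer scheduling that the direct path simply doesn't have; that's overhead a team has to write, debug, and maintain for what may be a tiny slice of throughput.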
Is there a value-add or hook to using Tempest that provides a benefit beyond its small, rounding-error-level contribution of throughput?
If the question came down to whether a game could scale back some effect by 2%, find 1% of slack time somewhere on the CPU or GPU, or develop for a tiny compute unit with limited throughput and an incompatible programming model, how often would the last option be the winner?