Next Generation Hardware Speculation with a Technical Spin [post E3 2019, pre GDC 2020] [XBSX, PS5]

"The AMD chip also includes a custom unit for 3D audio that Cerny thinks will redefine what sound can do in a videogame." It's described as a custom unit on the AMD chip. That could literally be anything from an entirely separate bespoke block, to modifications on the GPU front-end to further improve on TrueAudio Next.
A 'unit' generally refers to a block on the die rather than modifications to existing blocks. RT enhancements to the CUs would be called 'hardware RT' but not an 'RT unit'. However, the 'custom unit' wording doesn't come from Cerny's mouth, so we can't be sure he described it as such and that it wasn't just a (mis)interpretation by the author.

Audiokinetic will definitely be responsible, at least in part, for developing a Wwise plugin for the PS5 SDK, whether it's a custom processor or the GPU doing some of the complex computation. They're a software company, not a hardware company.
But why would Sony care to acquire them if their solution is the same as AMD's and the XBSX's? Why not just leave them to develop and supply TrueAudio Next support on their own?
 
They did add an external 3D audio chip for PSVR despite the cost sensitivity of the product, so at least we can see the intent of putting more specialized silicon to the task. And they own wwise now, this can't be a coincidence. I think they want to offer the entire tool chain, not just add hardware and hope devs will use it. It's the audio middlewares that would be responsible for the hardware support.
 

Is there any info on how PSVR's 3D audio works, besides it being an object-based format like Dolby Atmos or the ambisonic formats? It also worked on their wireless headset, but only on that headset, for unknown reasons.
 
AMD is continually updating their SDK. They dropped the DSPs, and I don't know how they made that decision. I'm not up to date on current audio DSPs or how they'd compare in terms of flexibility and overall performance. 8 CUs on an RDNA GPU is a lot of math performance to compete with.
I can think of a few reasons that could have contributed to dropping the DSPs.
One is that they were licensed IP AMD needed to pay for. Their die area didn't seem to be much, but it was non-zero and AMD was trying to get its GPUs on 16nm to be as small as they could.
In the PC space in particular, it was abundantly clear that, compared to the consoles, this DSP silicon had seen all but zero uptake since TrueAudio was introduced.

With later versions of GCN, the significant latency problem was at least partially addressed, and that problem might have been a major motivator for the DSP block in the consoles, where in-game sound processing often didn't factor into the DSP block's use cases anyway. The PS4 used it more for asset decode, and the Xbox One's SHAPE block had multiple other uses like handling sound input.
For the PC space, which didn't care at all about TrueAudio, a fully CU-based TrueAudio Next might have had enough latency improvement to not be rejected outright while also not costing much in silicon in case the uptake was as abysmal as it was for the original.

Whether the next-gen consoles bring in DSP hardware again may depend on their ambitions in that space versus how far AMD's hardware has come.
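
For a rough sense of scale on the "8 CUs is a lot of math performance" point quoted above, here's a quick back-of-envelope sketch in Python; the clock speeds and the small-DSP MAC rate are assumptions for illustration, not vendor figures.

```python
# Rough peak-throughput comparison: 8 RDNA CUs vs. one small audio DSP.
# The clock speeds and the DSP MAC rate below are assumptions for illustration.

CU_COUNT = 8
CU_FMA_LANES = 64          # FP32 FMA lanes per RDNA CU
GPU_CLOCK_GHZ = 1.8        # assumed game clock
DSP_MACS_PER_CLOCK = 4     # assumed audio-DSP-class figure: 4x 32-bit MACs per clock
DSP_CLOCK_GHZ = 0.6        # assumed embedded DSP clock

gpu_gflops = CU_COUNT * CU_FMA_LANES * 2 * GPU_CLOCK_GHZ   # FMA counts as 2 ops
dsp_gflops = DSP_MACS_PER_CLOCK * 2 * DSP_CLOCK_GHZ

print(f"8 CUs:       ~{gpu_gflops:.0f} GFLOPS peak")
print(f"Single DSP:  ~{dsp_gflops:.1f} GFLOPS peak")
print(f"Ratio:       ~{gpu_gflops / dsp_gflops:.0f}x")
```

Peak numbers ignore how hard each design is to keep fed, of course, which is where the divergence and occupancy concerns below come in.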

Can you get the revolution in audio that you want with a tiny DSP? Is that assumption actually true?
Does it need to be just one DSP? The current gen didn't limit itself to one, or at least I don't think Microsoft's solution did.
Part of Sony's hope for an HSA-based sound engine was for an audio pipeline that could be fully programmable and support arbitrary combinations of effects and sources on the fly.
DSPs are fully programmable, being small CPU cores with specialized extensions. Their tendency to be significantly narrower than Wave64 or Wave32 is part of their appeal for a more general sound solution. Arbitrary combinations mean divergence for the CUs, and too many different instruction streams start raising concerns about occupancy or Icache thrashing.
DSP blocks can also maintain more specialized cache or storage hierarchies, and can often switch out tasks relatively quickly versus the wavefront launch model.
Granted, AMD has a patent for a more DSP-like handling of the CU, which might be its play for providing an in-house option for a block that it was forced to use Tensilica DSPs for initially.

Perhaps the latency improvements and CPU upgrade could give enough of an improvement such that Sony could employ a hybrid CPU/GPU model.
This wasn't done on the PS4 due to the GPU's latency, and it wasn't done with the DSP because its asynchronous API would have created additive latency if the engine switched between CPU and DSP.
Maybe the latest measures with the CUs might help keep the additive latency down, and Sony's hope for a fully general GPU-side audio engine might recede to the effects offered by TrueAudio Next.
 
...
Does it need to be just one DSP? The current gen didn't limit itself to one, or at least I don't think Microsoft's solution did
...

I only phrased it as a tiny DSP because I was responding to the suggestion that you could shrink the gpu 10% and replace it with a tiny DSP that would outperform what TrueAudio Next could provide with that 10%.

I'm not saying a DSP is a must-have, but it's a solid addition as you get more bang per buck. Reduce the GPU by 10%, add a tiny DSP, and get 3D audio as a standard feature and USP for your platform while saving a few bucks versus a larger GPU.

Can you get the revolution in audio that you want with a tiny DSP? Is that assumption actually true? ...

I was genuinely asking the question. I don't know how big of an audio processor you'd need to do the same kind of real-time convolution reverb that an RDNA gpu can do across 2-8 CUs, as an example. I really don't have the knowledge of DSPs, or what's available on the market to get a sense of comparison.
 
I only phrased it as a tiny DSP because I was responding to the suggestion that you could shrink the gpu 10% and replace it with a tiny DSP that would outperform what TrueAudio Next could provide with that 10%.

It's a good question as to what the area is for a contemporary audio DSP. The vendor AMD used for TrueAudio was Tensilica, which was acquired in 2013. There's not much discussion about the area investment needed for those cores these days, perhaps related to how widely they can vary based on process node, clock speed, and selected features.
There are last-decade examples of compact versions of these VLIW DSPs potentially coming in under 0.05 mm² at TSMC 40nm.
More current versions that might have something like 2 or 4 MAC operations per clock don't have many references, although I did find one for a HiFi4 DSP implementation that might come in at ~0.3 mm² at 28nm.
It's dealing with audio processing for hearing aids, so I'm not sure if there are implications such as tuning for area or power versus performance: https://www.ims.uni-hannover.de/fil...aeten/Tensilica_Day/2019/td19_karrenbauer.pdf
The marketing for the HiFi4 gives it, among other things, 4 32-bit MAC operations per clock, or 8 at lower precision, but it might clock several times lower than a CU depending on how far the GPU is being pushed.

A CU has 64 FMA lanes versus a DSP that I'll, for the sake of argument, give 1/16 the ability, so naively that would need 16 DSPs, or about 4.8 mm², for the equivalent number of lanes. That's before roughly 2-3 node shrinks, however, so at least in terms of unit count it might scale down to ~1.2 mm² or lower.
(edit: At one point I had misread the graph and used a 0.4 mm² starting point, which yielded 1.6 mm². Perhaps a conservative estimate, with all the overhead of many cores, could use that.)

Clock speeds might lag, and there's an unknown amount of additional area for connecting them. However, estimates from die shots of RDNA GPUs put a WGP at ~4.5 mm², or ~2.25 mm² per CU.
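
To make that arithmetic explicit, here's the same back-of-envelope as a small Python sketch; every input is just the rough figure quoted above, and the shrink factor in particular is an assumption, so treat the output as illustrative only.

```python
# Back-of-envelope area comparison: HiFi4-class DSPs vs. one RDNA CU.
# All inputs are the rough figures from this post, not vendor specifications.

CU_FMA_LANES = 64          # FP32 FMA lanes per RDNA CU
DSP_MACS_PER_CLOCK = 4     # HiFi4-class: 4x 32-bit MACs per clock
DSP_AREA_MM2_28NM = 0.3    # ~0.3 mm^2 reference implementation at 28nm
SHRINK_FACTOR = 4.0        # assumed density gain over ~2-3 node shrinks
CU_AREA_MM2 = 2.25         # ~4.5 mm^2 per WGP on RDNA die shots -> ~2.25 mm^2 per CU

dsps_per_cu = CU_FMA_LANES // DSP_MACS_PER_CLOCK      # 16 DSPs to match lane count
area_28nm = dsps_per_cu * DSP_AREA_MM2_28NM           # ~4.8 mm^2 before shrinking
area_shrunk = area_28nm / SHRINK_FACTOR               # ~1.2 mm^2 after naive scaling

print(f"DSPs to match one CU's lane count: {dsps_per_cu}")
print(f"Area at 28nm: {area_28nm:.1f} mm^2, naively shrunk: {area_shrunk:.1f} mm^2")
print(f"One RDNA CU for comparison: {CU_AREA_MM2:.2f} mm^2")
```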

In terms of processing grunt, the CU could win out, and it does have other resources besides just the ALUs. On the other hand, it does give over a significant amount of area and power to functions not important to this workload, and it needs to batch 4-8 times as much work to utilize the width of its hardware. Perhaps for things like reverb, this isn't as important (some kinds of reverb were one category Sony did say might be fine with the GPU in 2013), but perhaps more free-form engines or varied scenarios might put more burden on audio programmers to get good results.
How much of that width is utilized isn't clear from the benchmark data given on the Steam page for TrueAudio, and I have questions about the specifics of the CPU figure (and whether the addition of CPU multithreading to Steam Audio later that year would have affected the comparison at all).

I think it would be an interesting comparison: whether a raft of DSPs could better fit scenarios with more varied effects or sources, versus ones where large amounts of similar computation could let a CU win out. I'm not sure either the CUs or the DSPs are much different at a low level in how easily they can be programmed, and it's not clear how well they can be targeted by intermediate layers. Tensilica/Cadence did try to make their products readily targetable for C/C++ programming, but perhaps not at this level of abstraction or for this purpose. AMD has more visibility in the APIs devs might be familiar with, although its efforts at software support for RDNA thus far have been unsatisfactory.

Another area of comparison is the execution model for CUs and programmable cores like DSPs. CUs typically handle individual tasks with full kernel launches, with the arbitration, spin-up, and export phases sometimes taking measurable amounts of time. I'm not sure what the DSPs use, although such cores can often take on smaller tasks and context switch more readily than the CUs can.


I was genuinely asking the question. I don't know how big of an audio processor you'd need to do the same kind of real-time convolution reverb that an RDNA gpu can do across 2-8 CUs, as an example. I really don't have the knowledge of DSPs, or what's available on the market to get a sense of comparison.
There's not much for a general audience to review, since the customers for these DSPs are SoC designers or manufacturers. There may be other factors that would go against licensed DSPs, such as whether the licensing would reflect having dozens of independent DSPs on-die, and whether there is additional complexity in programming them or linking them to the memory subsystem. Some of the benefits of licensed IP, like pre-validated elements and help with design, might not matter as much for a company like AMD. It might not rule out a more streamlined CU with a task-processor execution model, as was alluded to in an AMD patent.
 
@3dilettante Your posts are always so informative. Thank you for the detailed response.

I'm very curious to see what they came up with. One strength of the GPU solution is that it can scale to whatever a particular game needs by reserving more of the GPU. If they made a custom audio processor, the capability would be fixed, so they'd have to aim high. Going by your hypothetical processor, they could probably do something significant with 5-10 mm², or in that ballpark.

If I had an AMD GPU, I'd probably install Unity or UE and see if I could play around with Steam Audio and profile it.
 
Found an actually effective 3D audio example.


I wish the people doing these types of demos would actually show what settings they're using. I'm assuming this is 1st-order ambisonics. It's a good demonstration of direct audio and the benefit you can get from a more sophisticated audio format/engine capable of more accurate panning. Just that improvement alone would be a huge benefit to a lot of games.

It doesn't demonstrate sound occlusion, semi-occlusion, room acoustics, distance, indirect sound, etc. I'm really curious how heavily they'll get into those things. Occlusion, semi-occlusion, and direct sound through materials should be the easiest. I guess reverb is a necessity for correct perception of distance in open spaces. Games typically use some kind of cheap attenuation filter, and I guess those can be very hard to hand-tune so that they're accurate over a wide range of distances. Some static environmental sounds could just be recorded with ambisonic microphones or pre-calculated (I think) for cheaper playback.

It really comes down to having dynamic environments where moving objects can affect sound, having moving sound sources, etc., and how deep they really want to get into it. Steam Audio can't currently

Here's partial occlusion without sound propagation (indirect sound)

Here's partial occlusion with sound propagation

It doesn't sound totally realistic to me, but there's no doubt that propagation is an improvement. It looks like there are options to bake both propagation and reverb for static environments. I'd like to see a demo of a more complex environment, like maybe a large manufacturing facility with open space, distance, and lots of moving objects and sound sources.

This is also very cool.
http://media.steampowered.com/apps/...io_2.0-beta.17_embreedynamicgeometry.webm?v=1

 
I always assumed that the processing, placement, trajectory (casting?) etc. of audio would remain in the CPU/GPU domain (now bolstered further by hardware RT), and that the "3D Audio" functionality mentioned by Cerny referred to HRTF or HRTF-like functionality for transferring the audio soundscape to the listener via headphones (and, to a lesser degree, various speaker setups), akin to a virtual mannequin head or the "Cetera algorithm".

Whether that is achieved by a dedicated audio DSP, some additional functionality in the APU, or an advanced software layer, I'm not sure.

But the missing link in recent years hasn't really been the creation of a complex and solid soundscape but conveying that soundscape to the player by converting it to something convincing for the inner structures of our ears and the way our brain processes sound.

The dream for me, is the virtual barbershop on a grander scale, with greater verticality and fidelity, in a virtual environment...any virtual environment.
 
@Mitchings I may be wrong, but my understanding is that the conversion of the ambisonic "channels" down to stereo with HRTF is one of the computationally cheap parts that can easily be done on the CPU. The expensive computations are convolution effects like reverb. I should probably play around with Unity and see how much CPU Steam Audio chews up just for direct sounds.
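A rough way to see why I expect that split: per-second MAC counts for a first-order ambisonic binaural decode versus a naive time-domain convolution reverb. All the parameters in this sketch (HRIR length, IR length, source count) are made-up illustrative values, not figures from Steam Audio or any SDK.

```python
# Per-second MAC estimates: first-order ambisonic binaural decode vs. convolution reverb.
# All parameters are illustrative assumptions, not figures from any engine.

SAMPLE_RATE = 48_000
AMBI_CHANNELS = 4          # first-order ambisonics (W, X, Y, Z)
HRIR_TAPS = 512            # assumed HRIR length per channel for the binaural decode
REVERB_IR_SECONDS = 2.0    # assumed impulse-response length for convolution reverb
NUM_SOURCES = 16           # assumed number of sources convolved individually

# Binaural decode: each ambisonic channel convolved with a short HRIR for each ear.
decode_macs = SAMPLE_RATE * AMBI_CHANNELS * HRIR_TAPS * 2

# Naive time-domain convolution reverb: one long IR per source.
reverb_taps = int(REVERB_IR_SECONDS * SAMPLE_RATE)
reverb_macs = SAMPLE_RATE * NUM_SOURCES * reverb_taps

print(f"Binaural decode:           {decode_macs / 1e9:6.2f} GMAC/s")
print(f"Naive time-domain reverb:  {reverb_macs / 1e9:6.2f} GMAC/s")
# FFT-based partitioned convolution cuts the reverb cost by orders of magnitude,
# which is the kind of batched work TrueAudio Next pushes onto reserved CUs.
```

Even allowing that real engines use FFT-based convolution rather than the naive version, the gap between the decode step and the reverb/propagation work is the point.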
 
I'm still trying to figure out how Dolby Atmos for Headphones compares to the ambisonic formats. It would be very hard for Sony to force devs to use a proprietary format, and ambisonics seems to be the growing standard for VR.
Not so proprietary this time... and cheaper than Atmos.
https://www.sony.com/electronics/360-reality-audio
https://www.sony.net/Products/360RA/licensing/
Distribution format
The 360 Reality Audio Music Format was designed around being optimized for music distribution. In an effort to avoid the challenges of proprietary technology, Sony has been partnering with Fraunhofer IIS, part of Europe’s largest organization for applied research, to ensure the format complies with MPEG-H 3D Audio standard, an open audio standard.
https://www.iis.fraunhofer.de/en/ff/amm/broadcast-streaming/mpegh.html
Immersive sound offers cinema-like realism
The system may transmit immersive sound with additional front and rear height speaker channels or the Higher-Order Ambisonics sound field technology, improving today’s surround sound broadcasts and streams to provide a truly realistic and immersive audio experience on par with the latest cinema sound systems.
 
That looks like a music format, not really how they'd do real-time audio for games. Still cool, but getting adoption of that stuff is near impossible. Unless Apple Music can play it and Spotify can play it, it's pretty much a dead product.
 
Cerny had interviews back in 2013, maybe 2012, stating RT was considered for the PS4 but devs shut the idea down, as they'd need to develop new tools and alter workflows too much from what they were doing. With RT being a consideration for the PS4, MS would've been pretty sure of RT in the PS5 for several years now. IIRC, Sony has RT patents dating fairly far back as well. Not too sure of that one, though.

As for RT with NV/MS, wasn't the launch pretty bumpy and claimed by many to be rushed? Poor driver maturity at launch compared to typical NV drivers, no software support, and poor performance from anything outside the top-spec card.

I wonder, are those PS4 (and even PS3) RT stories accurate, or was it just too slow to be practical? I mean, the tools and workflow problem exists no matter when you introduce it, and early RT hardware couldn't have been powerful enough to be a total revolution.
But I admit MS was probably not really surprised. It's just as likely they had the same plans independently; I was just speculating.

I think the critique among devs about RTX being rushed likely has the same reasons it had for me: it appeared they had planned a revolution secretly for a long time, without notifying anyone or asking for feedback. They upended many plans, and some investments or research became obsolete, which could have been prevented.
The execution itself, though, with the launch games, driver, and API support, was not rushed but done pretty well, if you ask me. Sure, there was a lot of ranting about 'bad performance', but hey, it's RT, and it worked for the first time. Looking back it's easy to see what could have been done better, but overall it was a successful start.
 