Technical investigation into PS4 and XB1 audio solutions *spawn

How could an audio reverb possibly consume 500-800MB/s when CD-quality audio streams in and of themselves require only ~0.17MB/s? You're not going to have 1000+ streams going simultaneously, obviously. That'd sound like utter chaos.
 
How could an audio reverb possibly consume 500-800MB/s when CD-quality audio streams in and of themselves require only ~0.17MB/s? You're not going to have 1000+ streams going simultaneously, obviously. That'd sound like utter chaos.

Reverb is based on 1,000,000+ reflections in real life. We already have pro reverb units pushing 500-930MB/s+. But highly optimized algorithms, which are still far ahead of current game reverbs in quality, will typically use 12MB/s.
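For a sense of scale, here's a quick back-of-envelope sketch (numbers purely illustrative, assuming 32-bit float samples at 48kHz and a naive unpartitioned FIR convolution): streaming one dry voice is trivial, but a brute-force convolution reverb re-reads its entire impulse response for every output sample, which is where the enormous bandwidth figures come from.

```python
# Back-of-envelope: why reverb bandwidth dwarfs plain stream bandwidth.
# All numbers are illustrative assumptions, not measurements.
RATE = 48_000          # samples per second
BYTES = 4              # 32-bit float samples

# One dry voice streamed from memory:
stream_mbps = RATE * BYTES / 1e6          # ~0.19 MB/s

# A naive FIR convolution with a 2-second impulse response reads
# every IR tap for every output sample:
IR_SECONDS = 2.0
taps = int(RATE * IR_SECONDS)             # 96,000 taps
naive_mbps = RATE * taps * BYTES / 1e6

print(f"dry stream: {stream_mbps:.2f} MB/s")
print(f"naive 2s convolution: {naive_mbps / 1e3:.1f} GB/s")
```

FFT-based partitioned convolution cuts that naive figure down by orders of magnitude, which is why the optimized algorithms mentioned above can get by on ~12MB/s.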
 
No doubt that having a dedicated audio chip pays off, if only to save CPU resources along with HD space.

http://www.eurogamer.net/articles/2014-03-11-why-the-pc-version-of-titanfall-is-a-48gb-install

"Yeah, basically when you download the game or the disc itself, it's a lot smaller than that," Baker replied when asked about the PC version's 48GB install.

"We have audio we either download or install from the disc, then we uncompress it. We probably could have had audio decompress off disc but we were a little worried about min spec and the fact that a two-core machine would dedicate a huge chunk of one core to just decompressing audio
 
I don't think they divided resources well or planned ahead for the PC. They are probably letting you download/install every language including Esperanto(!) all in one go; otherwise it would be gigabytes of sound per level, which doesn't make sense.
 
I'd like to see some profile captures first. >_>

Shame they couldn't just make it an optional install.

Sounds fishy. /poor AlNets processor pre-90s
 
I had read somewhere else that it was indeed all languages, and that picking one before install would have been a better solution. The whole game sounds rushed. I guess EA didn't want to give it a few extra months to iron out the performance issues.
 
The audio of Killer Instinct next gen.

http://www.polygon.com/2014/3/21/5532182/killer-instinct-next-gen-audio

Figuring out what Killer Instinct's brand essence was started, perhaps unsurprisingly, with its series-defining announcer. Originally voiced by Rare producer Chris Sutherland — who's still with the UK-based developer — Double Helix thought to re-record him and be done with it. But after 17 years, Sutherland's voice had changed enough to cause difficulties. To assist with recapturing the essence of the original, Rare sent DAT recordings from the original Killer Instinct's voiceover to Double Helix, which were then re-processed for the 2013 release.
 
A question for bkilian, :smile2: if he would be so kind to reply... Granted, SHAPE is amazing and whatnot, but I wonder why some features you mentioned didn't make the final cut, like the reverb technologies you've talked about at times, or the Fourier transform.

Was it because of a tight budget (money or transistors) or was it because you considered those technologies completely unnecessary for the console? Just curious...
 
No doubt that having a dedicated audio chip pays off, if only to save CPU resources along with HD space.

Even in the case of a minimum-spec dual core with DMA doing its job, the resources needed to move uncompressed sound offer no sensible justification for this.

The Codec Benchmark, for example, shows how little CPU lossless compression uses; combined with the bitrate reduction, it ultimately frees cycles. FLAC is not the best case in decompression speed, but its presence is ubiquitous in the recording industry now. Some low-end CPUs can decode FLAC at 7MHz!

EDIT:
Here are some recent graphs from the FLAC site.
[Image: all-tracks-decode.png]


EDIT2: An OpenCL FLAC encoder; it's really fast, but maybe someone can engineer the code for decoding in game engines. http://www.cuetools.net/wiki/FLACCL

EDIT3: The speed/size trade-off of TAK is very impressive. Hydrogenaudio TAK 2.3.0
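To put some rough numbers on the storage argument: assuming a typical ~0.55 lossless compression ratio (an assumption, not a measured figure for any particular game), shipping FLAC instead of raw PCM roughly halves the footprint of a game's audio assets.

```python
# Illustrative storage comparison: raw 16-bit/44.1kHz stereo PCM vs
# losslessly compressed audio. The 0.55 ratio is an assumed typical
# FLAC figure, not a measurement.
PCM_RATE = 44_100 * 2 * 2            # bytes/s for stereo 16-bit PCM
FLAC_RATIO = 0.55                    # assumed typical lossless ratio

hours = 10                           # hours of audio assets (example)
pcm_gb = PCM_RATE * 3600 * hours / 1e9
flac_gb = pcm_gb * FLAC_RATIO
print(f"PCM: {pcm_gb:.2f} GB, FLAC: {flac_gb:.2f} GB")
```

The saved gigabytes come at the cost of a decode step, but as the benchmarks above show, that step is cheap even on weak CPUs.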
 
Even in the case of a minimum-spec dual core with DMA doing its job, the resources needed to move uncompressed sound offer no sensible justification for this.

The Codec Benchmark, for example, shows how little CPU lossless compression uses; combined with the bitrate reduction, it ultimately frees cycles. FLAC is not the best case in decompression speed, but its presence is ubiquitous in the recording industry now. Some low-end CPUs can decode FLAC at 7MHz!

EDIT:
Here are some recent graphs from the FLAC site.
[Image: all-tracks-decode.png]


EDIT2: An OpenCL FLAC encoder; it's really fast, but maybe someone can engineer the code for decoding in game engines. http://www.cuetools.net/wiki/FLACCL

EDIT3: The speed/size trade-off of TAK is very impressive. Hydrogenaudio TAK 2.3.0

How badly would it thrash the cache?
 
Reverb is based on 1,000,000+ reflections in real life. We already have pro reverb units pushing 500-930MB/s+.
A million reflections, how? The speed of sound is what, 340m/s or something like that. Each bounce would be a third of a millimetre long...? :)

Anyway, modern hardware caches should bring any reverb way, way down in bandwidth usage no matter how ridiculous, as it's just the same audio stream being worked over in multiple passes on top of itself.
 
A million reflections, how? The speed of sound is what, 340m/s or something like that. Each bounce would be a third of a millimetre long...? :)

Anyway, modern hardware caches should bring any reverb way, way down in bandwidth usage no matter how ridiculous, as it's just the same audio stream being worked over in multiple passes on top of itself.

Audio sources are, for the most part, omnidirectional: sound is transmitted in all directions. You can simulate this using rays (cones), which reflect, refract and scatter off geometry, spawning more rays/cones. The end result is integrated at the listening position, where you then need to do lots of convolutions to get the final output.

Cheers
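The convolution step at the listening position can be sketched in a few lines (a minimal offline version using NumPy/SciPy; a real engine would use low-latency partitioned FFT convolution, and the impulse response here is just synthetic exponentially decaying noise, not a measured room):

```python
import numpy as np
from scipy.signal import fftconvolve

# Minimal convolution reverb sketch: convolve a dry signal with an
# impulse response (IR). The IR stands in for the integrated result
# of all those reflected/scattered rays at the listener.
rate = 48_000
dry = np.zeros(rate)
dry[0] = 1.0                                 # a single click (unit impulse)

# Synthetic IR: exponentially decaying noise, ~1 s long.
rng = np.random.default_rng(0)
ir = np.exp(-np.linspace(0.0, 8.0, rate)) * rng.standard_normal(rate)

wet = fftconvolve(dry, ir)[:rate]            # reverberated click
```

Since the dry signal is a unit impulse, the wet output is just the IR itself; with real material, every input sample smears a scaled copy of the IR into the output, which is exactly the "lots of convolutions" cost described above.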
 
A million reflections, how? The speed of sound is what, 340m/s or something like that. Each bounce would be a third of a millimetre long...? :)
I don't think the length of the bounce matters at all, as long as there's enough of a pressure wave to make the sound audible. But even then, you're talking about a linear stream, whereas sound spreads out as a spherical pressure wave. Stand in front of a 2m×2m wall and you'll get bounces off every little piece of that wall. If we count it as one bounce per square millimetre (pretty inaccurate, as you'd still be ignoring the roughness of the surface), that's 4 million reflections per 'sample'. Then there'll be reflections of those reflections from whatever's behind you. And reflections off the floor. And reflections off yourself back towards the wall and then back towards you...

True audio reverb is like true light reflection. The 'correct' way to solve it is raytracing the audio paths, but that's too computationally expensive, so we use hacks. The more individual reverbs we can calculate, the closer we get to reality, so I guess there's reason to calculate loads of reverbs instead of the old-school single environment reverb applied to the sound-effect channel.
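The counting in that thought experiment is easy to reproduce (purely illustrative grid counting on a 2m×2m wall; real scattering is continuous, not a grid):

```python
# How many reflection points a 2 m x 2 m wall yields at different
# patch sizes. Illustrative only: real surfaces scatter continuously.
side_m = 2.0
for patch_mm in (10.0, 1.0):                 # 1 cm and 1 mm patches
    per_side = side_m * 1000 / patch_mm      # patches along one edge
    print(f"{patch_mm:g} mm patches: {per_side ** 2:,.0f} reflection points")
```

At 1cm patches the wall gives 40,000 points; the 4-million figure corresponds to 1mm patches, and second-order bounces multiply it further.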
 
A question for bkilian, :smile2: if he would be so kind to reply... Granted, SHAPE is amazing and whatnot, but I wonder why some features you mentioned didn't make the final cut, like the reverb technologies you've talked about at times, or the Fourier transform.

Was it because of a tight budget (money or transistors) or was it because you considered those technologies completely unnecessary for the console? Just curious...
No idea, I wasn't part of the decision making. I know our audio guy fought for a dedicated DSP to do reverb and FFT, but the speech team jealously guarded their two cores.
 
I don't think the length of the bounce matters at all, as long as there's enough of a pressure wave to make the sound audible. But even then, you're talking about a linear stream, whereas sound spreads out as a spherical pressure wave. Stand in front of a 2m×2m wall and you'll get bounces off every little piece of that wall. If we count it as one bounce per square millimetre (pretty inaccurate, as you'd still be ignoring the roughness of the surface), that's 4 million reflections per 'sample'. Then there'll be reflections of those reflections from whatever's behind you. And reflections off the floor. And reflections off yourself back towards the wall and then back towards you...

True audio reverb is like true light reflection. The 'correct' way to solve it is raytracing the audio paths, but that's too computationally expensive, so we use hacks. The more individual reverbs we can calculate, the closer we get to reality, so I guess there's reason to calculate loads of reverbs instead of the old-school single environment reverb applied to the sound-effect channel.

Since I am much less informed on the mechanics of audio and sound than I am about light and rendering, I've wondered for some time, without an answer, about voxel cone tracing for audio "global illumination".

Say, if an engine already has a simplified voxel representation of the scene, cascaded or an octree or whatnot, used for GI and such, could that be leveraged in some way for audio? Could material properties used for rendering, like roughness, translucency or metallicness, be extrapolated to guess sound refraction properties? Could the audio code hitchhike on the same cones the rendering engine would already be using for its GI, doing some of the audio at the same time with less overhead/fewer additional rays?
 
I'd expect so. No reason not to, except perhaps the directionality of audio being different. But as an audio approximation it may well work.

Oh, another concern would be time of flight. Ray tracing works because the travel time of a light ray is effectively zero given the distances viewed in game. Audio would have to factor in the speed of sound. A reverberation could originate from in front of you (a gunshot) and then ricochet off a wall to your right after you've turned 90 degrees, meaning it'd be decoupled from the visual evaluations. Maybe the sound could be calculated riding along with the light tracing, then delayed and played?
 
Nice point. You could limit the length of cones, and re-inject them on the next frame too.
 
I'd expect so. No reason not to, except perhaps the directionality of audio being different. But as an audio approximation it may well work.

Oh, another concern would be time of flight. Ray tracing works because the travel time of a light ray is effectively zero given the distances viewed in game. Audio would have to factor in the speed of sound. A reverberation could originate from in front of you (a gunshot) and then ricochet off a wall to your right after you've turned 90 degrees, meaning it'd be decoupled from the visual evaluations. Maybe the sound could be calculated riding along with the light tracing, then delayed and played?

Also, sometimes truly realistic audio may even be deemed "unrealistic", because we are all used to Hollywood/game realism. For example, a corridor may channel a sound so that what we hear comes from the right, while the source is actually on our left, behind a padded wall. Implement that and more than half of your audience will think it's a bug :) A gunshot from 300 meters away should arrive almost a second late, but I think people would just think your audio is out of sync!


I just want the lightning to flash first with no sound at all, and the thunder to come a bit later. I've yet to see that in a game.
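The delays being discussed here are just distance divided by the speed of sound, and they land right in the perceptible range:

```python
# Time-of-flight delay between seeing an event and hearing it.
SPEED_OF_SOUND = 343.0  # m/s in air at ~20 degrees C

def sound_delay(distance_m: float) -> float:
    """Seconds of lag between the visual and the audio of an event."""
    return distance_m / SPEED_OF_SOUND

print(f"gunshot at 300 m: {sound_delay(300):.2f} s late")
print(f"lightning 1 km away: {sound_delay(1000):.2f} s late")
```

A 300m gunshot lags by almost 0.9s, and lightning a kilometre away by nearly 3s, so a game implementing this would have to fight the player's instinct that delayed audio is a sync bug.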
 
Say, if an engine already has a simplified voxel representation of the scene, cascaded or an octree or whatnot, used for GI and such, could that be leveraged in some way for audio? Could material properties used for rendering, like roughness, translucency or metallicness, be extrapolated to guess sound refraction properties? Could the audio code hitchhike on the same cones the rendering engine would already be using for its GI, doing some of the audio at the same time with less overhead/fewer additional rays?

The light data would only be usable if the sound source is in close proximity to the light sources. Most light calculations are also done in screen space (a field of view of around 80 degrees), not the full 360 degrees that would be required for an immersive audio environment.
 