Technical investigation into PS4 and XB1 audio solutions *spawn

Based on other replies earlier in the thread, seems like the answer is "yes" to doing 3D audio on the CPU.

I'm of the belief that none of this stuff will ever get used unless there's total cross platform support, that the target will always be the least common denominator. But both consoles and PC all seem capable enough. So theoretically in a cross platform title, a dev could just use middleware that transparently handles everything in the most optimal way? So:

Xbox One - decompression/mixing/basic processing on SHAPE, 3D audio/high-end reverb on APU

PS4 - decompression on dedicated chip, mixing/basic processing/3D audio/high-end reverb on APU

PC with AMD - decompression/mixing/basic processing on CPU, 3D audio/high-end reverb on TrueAudio DSP

PC with Nvidia - decompression/mixing/basic processing on CPU, 3D audio/high-end reverb on CPU or GPGPU

Does that seem plausible, or am I missing something?
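
To make the idea concrete, here is a minimal sketch of how such middleware might route each stage per platform. The table just restates the split proposed above; `ROUTING` and `unit_for` are hypothetical names, not part of any real middleware API.

```python
# Hypothetical routing table: platform -> processing stage -> hardware unit.
# The assignments restate the proposal in this post; they are not confirmed
# by any vendor documentation.
ROUTING = {
    "Xbox One": {
        "decompression": "SHAPE",
        "mixing": "SHAPE",
        "basic processing": "SHAPE",
        "3D audio": "APU",
        "high-end reverb": "APU",
    },
    "PS4": {
        "decompression": "dedicated chip",
        "mixing": "APU",
        "basic processing": "APU",
        "3D audio": "APU",
        "high-end reverb": "APU",
    },
    "PC with AMD": {
        "decompression": "CPU",
        "mixing": "CPU",
        "basic processing": "CPU",
        "3D audio": "TrueAudio DSP",
        "high-end reverb": "TrueAudio DSP",
    },
    "PC with Nvidia": {
        "decompression": "CPU",
        "mixing": "CPU",
        "basic processing": "CPU",
        "3D audio": "CPU or GPGPU",
        "high-end reverb": "CPU or GPGPU",
    },
}


def unit_for(platform, stage):
    """Return the unit that would run `stage` on `platform` in this sketch."""
    return ROUTING[platform][stage]


print(unit_for("Xbox One", "mixing"))
print(unit_for("PC with AMD", "3D audio"))
```

The game would only ever ask for a stage by name; the per-platform choice stays inside the middleware.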
 
OpenAL supports hardware acceleration of audio if you have a DSP, and there's a software fallback via OpenAL Soft if you don't.
OpenAL Soft is an LGPL-licensed, cross-platform, software implementation of the OpenAL 3D audio API. It's forked from the open-sourced Windows version available originally from the SVN repository at openal.org.
OpenAL provides capabilities for playing audio in a virtual 3D environment. Distance attenuation, doppler shift, and directional sound emitters are among the features handled by the API. More advanced effects, including air absorption, occlusion, and environmental reverb, are available through the EFX extension. It also facilitates streaming audio, multi-channel buffers, and audio capture.
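
As an illustration of the distance attenuation mentioned above, here is a small Python sketch of the inverse-distance-clamped model as I understand it from the OpenAL 1.1 specification. The function and parameter names are my own; in OpenAL itself these are per-source properties (reference distance, rolloff factor, max distance), and the defaults below are placeholders chosen for the example.

```python
def inverse_distance_clamped_gain(distance, reference_distance=1.0,
                                  rolloff_factor=1.0, max_distance=100.0):
    """Gain for a source at `distance` under an inverse-distance-clamped model.

    The distance is clamped to [reference_distance, max_distance] first, so
    the gain never exceeds 1.0 and stops falling beyond max_distance.
    """
    d = max(distance, reference_distance)
    d = min(d, max_distance)
    return reference_distance / (
        reference_distance + rolloff_factor * (d - reference_distance)
    )


for d in (0.5, 1.0, 2.0, 3.0):
    print(d, inverse_distance_clamped_gain(d))
```

With the placeholder defaults, doubling the distance from the reference distance halves the gain.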

Hopefully someone will port it to use the GPU or the AMD DSP.
 
I wouldn't count Nvidia out for long, it doesn't take much for them to match AMD and MS.

PS4 though, seems they missed the boat with dynamic sounds. Will likely be canned stuff for them.
 
So at the end of a second, if you have a convoluted enough audio graph, you have mixed 4000 voices of 48000 samples in that second. So #1 in your example above.
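
Just to put numbers on that, a quick back-of-the-envelope calculation using only the figures quoted above (4000 voices, 48000 samples per second). The per-voice time budget is my own derived figure and assumes every voice is touched once per output sample.

```python
VOICES = 4000
SAMPLE_RATE_HZ = 48_000

# Total voice-samples mixed in one second.
voice_samples_per_second = VOICES * SAMPLE_RATE_HZ

# Time between output samples, and the slice of it available per voice
# if each voice is processed once per sample period.
sample_period_us = 1_000_000 / SAMPLE_RATE_HZ
budget_per_voice_ns = sample_period_us * 1000 / VOICES

print(voice_samples_per_second)  # 192000000
print(round(sample_period_us, 2))
print(round(budget_per_voice_ns, 2))
```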

I assume the same is true for all the other modules in SHAPE. I'm a bit curious regarding the thought process behind the inclusion of both EQ/CMP and FLT/VOL, since they both perform similar functions if we go by the information from VGLeaks. Any particular reason why MS didn't include a convolution module instead of FLT/VOL, as an example?
 
So in your opinion it should be easier for a game to get "3D" sound with a pair of headphones (any special requirements?) compared to a properly calibrated HT 5.1 setup?

I know I'm late, and you didn't ask for my opinion, but I find this subject interesting.

Short version: It's complicated. Speakers get us semi-decent 3D soundscapes, and use hardware that many people already own. Headphones have the potential to get us all the way to "photo-realistic" soundscapes, but require some additional hardware and software that aren't as common. Non-existent, even.

Long version: It's complicated. Headphones are a much "easier" (cheaper, simpler, more self contained) route to a realistic soundscape than a sparse speaker array could ever hope to be. Speakers (and the rooms they inhabit) just add too many complications (and wires!) to the equation.

But surround sound speakers work great, right? Stuff that's supposed to be behind you really sounds like it's behind you. Up and down? Not so much. You can even turn your head slightly to further localize sounds. Also, it works (kinda) for everybody in the room. And you don't have to wear anything on your head. You can see why it's a popular way to go.

On the other hand, with headphones, you don't have to worry about the in-room interactions of several speakers. And you don't have to slice a perfectly good spherical soundscape into 5 or 7 irregularly sized wedges, use panning between each wedge-center, and then hope those wedges roughly correspond to wherever the user has actually placed his speakers. Let's not even think about level and EQ issues between the speakers. With headphones, you only have to worry about two channels of audio, and those two channels directly map to two "virtual ears" that have been virtually duct taped to the virtual camera which conveys the game graphics. It's all very simple, conceptually. You just figure out what each ear should be hearing, based on its location and orientation. Speaker location doesn't come into it.
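
For anyone wondering what "panning between each wedge-center" looks like in practice, here is a rough sketch of pairwise constant-power panning. The speaker azimuths are an assumed 5.0 layout (front at 0 degrees, angles increasing to the right), and real engines may use different pan laws, so treat this as an illustration only.

```python
import math

# Assumed speaker azimuths in degrees (0 = front, positive = to the right).
SPEAKERS = {"C": 0.0, "R": 30.0, "RS": 110.0, "LS": 250.0, "L": 330.0}


def pan_gains(source_az_deg):
    """Constant-power gains for the two speakers bracketing the source."""
    az = source_az_deg % 360.0
    ordered = sorted(SPEAKERS.items(), key=lambda kv: kv[1])
    gains = {name: 0.0 for name in SPEAKERS}
    for i, (name_a, az_a) in enumerate(ordered):
        name_b, az_b = ordered[(i + 1) % len(ordered)]
        width = (az_b - az_a) % 360.0   # angular size of this wedge
        offset = (az - az_a) % 360.0    # how far into the wedge the source is
        if offset <= width:
            frac = offset / width
            gains[name_a] = math.cos(frac * math.pi / 2)
            gains[name_b] = math.sin(frac * math.pi / 2)
            break
    return gains


# A source directly behind the listener lands in the wide rear wedge and is
# split equally between the two surround speakers.
print(pan_gains(180.0))
```

In this assumed layout the rear wedge spans 140 degrees while the front wedges span 30, which is the "irregularly sized wedges" problem in a nutshell.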

Are there problems? Yes. Two main ones: Head tracking, and good Head Related Transfer Functions (HRTF).

You need head tracking, so when you turn your real head, the soundscape doesn't just move with it. With actual speakers, moving your head "Just Works" to a large degree. With headphones, it doesn't. The world "feels inside your head", and to some degree that's because it's clearly moving with your head. This destroys a big chunk of the realism headphones can convey. Luckily, head tracking is enjoying a bit of a resurgence lately, what with Kinect, and due to the hubbub around the Oculus Rift. If (head-mounted) VR catches on, the audio side of things will just happen. However, I haven't seen any efforts to attach a cheap inertial tracker to a pair of headphones and then pass the calculated head orientation info back into a game. (I guess the TrackIR sort of does this, again as a side effect of doing a video thing.) That'll probably never happen "just for audio", since it would require so much cooperation between hardware and software makers. So, either head-mounted (visual) VR happens, and gets us "virtual audio" as almost a side effect, or we will probably just have to continue to go without.
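
The core of the head-tracking fix is tiny. Here is a yaw-only sketch, assuming the tracker reports head yaw in degrees in the same frame as the source azimuth (positive to the right); the function name is made up for the example, and a real implementation would handle pitch and roll too.

```python
def relative_azimuth(source_az_deg, head_yaw_deg):
    """Direction of the source relative to where the head is facing.

    Result is wrapped to [-180, 180): negative means the source is to the
    listener's left, positive to the right.
    """
    return (source_az_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


# Source straight ahead in the world; listener turns the head 90 degrees to
# the right, so the source should now be heard on the left.
print(relative_azimuth(0.0, 90.0))  # -90.0
```

Without this step the renderer keeps using the unrotated azimuth, which is exactly why the world seems glued to your head.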

HRTF is some sort of filtering magic that makes things sound like "they're coming from behind you", or "coming from above". We all use this magic when navigating through the real world, and out there the filtering happens as a result of sound diffracting around our heads, and the shape of our outer ears. Since not everyone has the same ear/head shape, HRTFs aren't one-size-fits-all either. But I think they're probably pretty close. We can probably get by with a shared HRTF. Like I said, the whole "cocking your head to figure out where a sound is coming from" thing works pretty well on its own. If we get head tracking working well, having a perfect HRTF may not be as crucial.
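
The "filtering magic" boils down to convolving the mono source with one impulse response per ear. Below is a toy sketch of that. The two impulse responses are invented for illustration (a real HRIR set is measured and has many more taps per direction); they only mimic the near ear hearing the sound earlier and louder than the far ear.

```python
def convolve(signal, kernel):
    """Direct-form convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


# Made-up impulse responses for a source off to the listener's right.
HRIR_RIGHT = [1.0, 0.3]            # near ear: arrives first, louder
HRIR_LEFT = [0.0, 0.0, 0.5, 0.2]   # far ear: delayed and attenuated


def render_binaural(mono):
    """Return (left, right) headphone signals for a mono source."""
    return convolve(mono, HRIR_LEFT), convolve(mono, HRIR_RIGHT)


left, right = render_binaural([1.0, 0.0, -1.0])
print(left)
print(right)
```

A moving source, or a turning head, means switching or interpolating between impulse-response pairs for different directions.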
 
I assume the same is true for all the other modules in SHAPE. I'm a bit curious regarding the thought process behind the inclusion of both EQ/CMP and FLT/VOL, since they both perform similar functions if we go by the information from VGLeaks. Any particular reason why MS didn't include a convolution module instead of FLT/VOL, as an example?
Cost, I suspect. Convolution is a high-bandwidth, memory-heavy effect. I know folks wanted something like that, but it just didn't pan out with the budgets (memory/bandwidth/transistor) they had. As for why they chose the individual effects they did, your guess is as good as mine. Probably better. Some of it was because XAudio2 supports them, but other than that I have no clue. Maybe games use a lot of EQ-type stuff, and they wanted to make it easier than chaining three SVFs together?
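
A rough feel for why convolution is expensive, assuming naive direct-form convolution at 48 kHz with 4-byte taps (all three are my assumptions, not figures from the SHAPE design). FFT-based convolution cuts the arithmetic considerably, but the impulse response still has to be stored and streamed.

```python
SAMPLE_RATE_HZ = 48_000


def direct_convolution_macs_per_second(ir_seconds, sample_rate_hz=SAMPLE_RATE_HZ):
    """Multiply-accumulates per second for one channel, direct form."""
    taps = int(ir_seconds * sample_rate_hz)
    return taps * sample_rate_hz


def ir_bytes(ir_seconds, sample_rate_hz=SAMPLE_RATE_HZ, bytes_per_tap=4):
    """Storage for one impulse response at the assumed tap size."""
    return int(ir_seconds * sample_rate_hz) * bytes_per_tap


# A one-second reverb tail on a single channel.
print(direct_convolution_macs_per_second(1.0))  # 2304000000
print(ir_bytes(1.0))                            # 192000
```

That is per voice or per bus, so the cost multiplies quickly once several sources need their own reverb.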
 
It could be that the best we'll get is virtualizing channels with an outboard processor, like we have now. But that would at least make head tracking viable without support from the game, and with enough channels, it could be more than a crude approximation.

7.1 is still just a flat plane, but HDMI 2.0 supports 32 channels, so there's a chance we'll eventually get some discrete height channels added to the mix. Can the Xbox audio block support more than 7.1 though?
 
It could be that the best we'll get is virtualizing channels with an outboard processor, like we have now. But that would at least make head tracking viable without support from the game, and with enough channels, it could be more than a crude approximation.

Quite possibly. I use one of those outboard processors myself. One of those Astro Mixamp thingies. It just seems like such a waste of effort to start with a (presumably) "full 3D" soundscape, then slice it up into discrete channels, and then smush them back together into a "continuous" soundscape, to which you then apply an HRTF. I mean, it's better than nothing, but it seems so inelegant.

7.1 is still just a flat plane, but HDMI 2.0 supports 32 channels, so there's a chance we'll eventually get some discrete height channels added to the mix. Can the Xbox audio block support more than 7.1 though?

Perhaps not as easily. I dunno. Aren't there some alternate 7.1 schemes that give up rear channels to add some height? Maybe that would work. You would need head tracking on the phones to make that worthwhile I bet. Or not. For me, the HRTF cues never seem to work very well. (It's hard to tell front from back, but I haven't had the chance to try out up/down cues very much.) Even the famous barbershop recording doesn't do much for me.

Although I did run across a YouTube video that (for me) had really convincing back/front and up/down cues. It was an Oculus Rift demo. I wonder if I can find that....

Here it is.

Maybe it's just a placebo effect coming from the visual cues, but it works quite well on me. Make sure to bump the video quality up to 720p so you get better sound. And turn off any HRTF stuff on your outboard gear.

I'd really like to try a game that uses whatever algorithm they're using in that vid.
 
It sure is inelegant, but it does work relatively well. I find that in first-person games, being able to correlate the rear sounds with my POV turning helps quite a bit. Head tracking would add a nice touch; I've heard that DTS Headphone:X is going down that path, along with custom HRTFs and headphone correction... but that could take years. They're even claiming it's 11.1, but that's matrixed from 7.1, so it would add yet another layer of inelegance.

To me it just feels like the 3D audio days are so far behind us that there's a whole generation of gamers that have never heard it and don't know what they're missing, so they're not asking for it. Like you can barely find a good YouTube demo of A3D, because YouTube didn't even exist back then! The virtual surround works well enough that people think that's the best it can get.
 
I think you misunderstood what I said. What I was really saying was that due to in-order execution, pipeline stalls, and other design elements of the XCPU, its supposed 100 GFLOPS is really only about 20 GFLOPS when you profile real code. The Hot Chips presentation pegged the SHAPE block at 18G ops (they can't use flops, it's an integer pipeline :)). Creative would have called it an 18000 MIPS chip, compared to their X-Fi's 10000 MIPS. When you include all the housekeeping the ACP does, it adds up quite quickly.

When we verified the functions of the chip, we compared the output of our 32-bit float reference blocks to the output of the chip using the same input. The outputs are exactly bit-equivalent. That's not really a surprise, since the blocks were designed from our reference pipeline.
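
For readers unfamiliar with what "bit-equivalent" implies, here is a small sketch of that kind of check. It is not the actual verification harness, which isn't described here; the function names are hypothetical. The point is that the 32-bit patterns are compared rather than the numeric values, so even 0.0 versus -0.0 counts as a mismatch.

```python
import struct


def float32_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x as an int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bit_equivalent(reference, device):
    """True only if both outputs match bit for bit as 32-bit floats."""
    return len(reference) == len(device) and all(
        float32_bits(a) == float32_bits(b) for a, b in zip(reference, device)
    )


print(bit_equivalent([0.5, -1.0], [0.5, -1.0]))  # True
print(bit_equivalent([0.0], [-0.0]))             # False, although 0.0 == -0.0
```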

Out of interest, do those 18G ops include the Xbox Media Decoder, or is it purely the mixing/filtering hardware?
 
Well, I'm glad my Christmas present that year helped keep you employed! LOL

Thanks! Aw, Christmas time at Creative. Those were the days I wanted to just kill myself. LOL You didn't happen to call support, did you? That's what I did. Would have been funny if we actually talked. LOL BTW, most of my time there was spent supporting their whole product line (not just soundcards & CD-ROM drives) in OS/2, Windows NT & Windows 95 (beta & 4 months after it shipped). Good times.

What's really sad is that I'm pretty sure I've had all those cards in my PCs over the years.

What's more sad is having every one of those cards at the exact same time and running benchmarks on them for a couple days & then putting them back in their boxes so they could collect dust on a shelf for a couple years only for them to later all get ruined by a flood in a storage unit years after that. Yeah, I wasn't too happy about that. LOL

Tommy McClain
 
With Kinect being standard, could we see head tracking for audio positioning being a big deal in non-motion controlled games like FPS? It wouldn't be just for headphones either, but headphones could benefit much more by its use. Could this also be a big feature for the rumored Fortaleza glasses? Maybe that's why Kinect & the audio block/Shape have such a big presence in XB1?

Tommy McClain
 
I remember that time very fondly. I worked for Creative during the SB16-SBAWE64 time-frame. A great time to be in the business & a gamer. ;)

Tommy McClain
The EMU8000 is one of my favourite chips ever. I listened to many MIDI files with a Sound Blaster card back then. The quality isn't up to today's standards but that was a fine synthesizer for its time.

I remember I burnt the motherboard of my first PC when I installed the Sound Blaster AWE32 soundcard... I was so anxious, so excited to try it that I forgot to remove the screwdriver from inside the case.

Luckily for me the soundcard survived.

I wouldn't count Nvidia out for long, it doesn't take much for them to match AMD and MS.

PS4 though, seems they missed the boat with dynamic sounds. Will likely be canned stuff for them.
By dynamic sounds you mean 3D sound? I wouldn't write them off yet. I kinda agree with Microsoft engineers, the PS4 has some extra CUs it might not need in some cases in order to keep things balanced.

Aside from that, Sony are an audio & video company, so they know how to build their own libraries and they could use a couple of spare CUs to produce sound and help the CPU out. Just a thought....
 
Oh wow. You were lucky. I think I remember quite a few people ruining/breaking motherboards by pushing their cards too hard into the slot. There was nothing we could do. As for the AWE32, it was a great card. I got a free one for passing one of their internal promotion tests. My Wave Blaster daughtercard fit right on it. :) Got the Wave Blaster for my SB16 before the AWE32. That was a pretty cool duo. My last SB card was shortly after I left the company: SB Live! Gold with the EMU10K1. Had it for a long time. I think I ended up giving it away. LOL My first one was a SB 2.0. Kinda wish I kept all my cards. :|

Tommy McClain
 
Thanks! Aw, Christmas time at Creative. Those were the days I wanted to just kill myself. LOL You didn't happen to call support did you? That's what I did. Would have been funny if we actually talked. LOL BTW, most of my time there was spent supporting their whole product line(not just soundcards & CDROM drives) in OS/2, Windows NT & Windows 95(beta & 4 months after it shipped). Good times.



What's more sad is having every one of those cards at the exact same time and running benchmarks on them for a couple days & then putting them back in their boxes so they could collect dust on a shelf for a couple years only for them to later all get ruined by a flood in a storage unit years after that. Yeah, I wasn't too happy about that. LOL

Tommy McClain

Nah, didn't call support. Don't want to make you feel old, but that was my Christmas gift when I was 12.

I bought a ton of sound cards during that time. I was the computer nerd at my school; getting a 2x CD burner improved my social life with the ladies tenfold, lol.
 
HRTF is similar for many people, but not all. For best results, I think a library of transfer functions would be needed with a setup process allowing a best match to be made on a per gamerID basis.

Maybe even more importantly, good HRTF emulation relies on in-canal transducers so that the physical HRTF of your head and pinnae are bypassed and thus not superposed on what you are trying to emulate. There is a wide variety of headphones/earbuds out there, and for a game to really be convincing in emulated 3D sound via headphones, assuming the HRTF is a reasonable match for a large proportion of users, some standardization in headphone design is necessary.

If HRTF demo material hasn't worked well for you, try another headphone style and see if that makes a difference.

You could say a developer could just take whatever they get with some average HRTF and whatever headphones gamers use, figuring the result would still be an improvement over current status quo... but then you can say the same thing about the variety of soundbars, 5.1, 7.1 etc surround setups gamers are liable to have.

Both approaches require some level of standardization. Personally, I'd rather go for physical speakers. Headphones will never recreate the visceral impact that comes from good full-range loudspeakers; loudspeakers do not require HRTF or head tracking to work; headphones, for me, are a bit uncomfortable after a while; and headphones reduce the social aspect of gaming unless you are an online-only kind of lone gamer. Not to mention that Kinect might be useful for surround loudspeaker setup and optimization if MS were smart about it (such as requesting that the user complete a setup process where Kinect is placed at the gamer's head position facing the display to map speaker locations, distances, levels, and frequency response, and, if MS were really clever, also calculating the ETC (energy-time curve) for each channel and compensating for some room interaction effects).
 
Doesn't the Kinect already do that from its own POV during the setup process, so it can calculate speaker distance, delay, etc. in order for its noise cancellation to work properly? Once they're doing that, it's not too much of a stretch to ask gamers to move it to their seat to fire off a few more impulses.
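
The distance and delay part of that is simple arithmetic once you have the time each test impulse takes to arrive. A sketch, assuming a speed of sound of about 343 m/s (dry air near 20 °C) and made-up arrival times; I don't know how Kinect's setup actually does this internally.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 degrees C


def speaker_distance_m(arrival_delay_s):
    """Distance implied by the time an impulse takes to reach the mic."""
    return SPEED_OF_SOUND_M_S * arrival_delay_s


def alignment_delays_s(distances_m):
    """Extra delay per speaker so all arrivals line up with the farthest one."""
    farthest = max(distances_m.values())
    return {
        name: (farthest - d) / SPEED_OF_SOUND_M_S
        for name, d in distances_m.items()
    }


# Hypothetical measured arrival times, in seconds.
measured = {"L": 0.010, "C": 0.008, "R": 0.010}
distances = {name: speaker_distance_m(t) for name, t in measured.items()}
print(distances)
print(alignment_delays_s(distances))
```

In this example the centre speaker is closer, so it gets delayed by about 2 ms to match the left and right.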
 
Thanks, cleared a few things up for me. The Onkyo receiver I have now actually supports "height" for the speakers. It's pretty interesting that after we had sound from behind, we now have sound from above :)
My HT is too small and the wall is 100% covered by my PJ, so I just settle for 5.1 (Audyssey corrected).

I am considering this as my next headset for gaming. It seems to offer value for money, even though it's from Sony.

http://www.digitaltrends.com/gaming...views/sony-pulse-review-ps3-elite-edition/#/5
 
What do you mean "even though"? Sony makes some damn good electronics. :)
 