Technical investigation into PS4 and XB1 audio solutions *spawn

I'm not using the term FLOPS since they don't represent any workload in real life scenarios. It would be highly appreciated if you could post any audio DSP related measurements - it could be x taps of convolution (favors wide SIMD), recursive structures (allpass, biquad, FDN etc.) or anything else related to audio processing. This way we could very easily compare between platforms: SHAPE, x86, ADSP, CU etc.
I don't have those benchmarks. I have not had access to the hardware in almost a year. Suffice it to say that the vector cores are at least 3-wide SIMD, they have custom instructions especially for accelerating MEC and speech pipelines, they have a quad MAC, and they're running at high clocks.

BDTI is a company that does performance benchmarks for DSPs using 12 functions: Real Block FIR, Two-Biquad IIR, Viterbi Decoder, Single-Sample FIR, Vector Dot Product, Control, Complex Block FIR, Vector Add, 256-Point FFT, LMS Adaptive FIR, Vector Maximum and Bit Unpack.
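
For anyone who wants to post comparable numbers, here is a rough sketch of what two of those kernels compute (a real block FIR and a two-biquad IIR cascade). This is not BDTI's code; the function names and the plain-Python style are my own assumptions, and it only pins down the math, not the performance. The FIR inner loop is independent per output sample, which is why it favors wide SIMD, while the biquad carries state from one sample to the next.

```python
def block_fir(samples, taps):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n-k], with zero history."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:  # samples before the block start count as zero
                acc += h * samples[n - k]
        out.append(acc)
    return out


def biquad_df1(samples, b0, b1, b2, a1, a2):
    """Direct Form I biquad:
    y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
    """
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x  # shift input history
        y2, y1 = y1, y  # shift output history (the loop-carried dependency)
        out.append(y)
    return out


def two_biquad_iir(samples, section_a, section_b):
    """Two biquad sections in series; each section is (b0, b1, b2, a1, a2)."""
    return biquad_df1(biquad_df1(samples, *section_a), *section_b)
```

Timing these over a fixed block size (for example with `time.perf_counter`) would give per-platform numbers, but a port to each platform's native SIMD or DSP intrinsics is what would actually be worth comparing.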

The ADSP 21369 scores 2050 on their benchmark. The Tensilica core used by MS scores >6000 on the same benchmark, at half the clock rate MS will be running. (Higher is better). The X1 audio block has two of them.
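
To put rough numbers on that, here is a back-of-envelope sketch. It assumes the score scales linearly with clock, which is my assumption and not something BDTI states, and it treats ">6000" as a floor. The helper names are made up for illustration.

```python
ADSP_21369_SCORE = 2050        # score quoted above
TENSILICA_SCORE_FLOOR = 6000   # quoted as ">6000", so this is a lower bound
CLOCK_RATIO = 2.0              # benchmark ran at half the clock MS will use


def scale_score(score, clock_ratio):
    """ASSUMPTION: benchmark score scales linearly with clock rate."""
    return score * clock_ratio


def relative(score, baseline):
    """How many times the baseline score this score represents."""
    return score / baseline


# Per-core ratio at the benchmarked clock, then at the assumed shipping clock.
as_benchmarked = relative(TENSILICA_SCORE_FLOOR, ADSP_21369_SCORE)
per_core_scaled = relative(scale_score(TENSILICA_SCORE_FLOOR, CLOCK_RATIO),
                           ADSP_21369_SCORE)
print(round(as_benchmarked, 2), round(per_core_scaled, 2))
```

Treat the result as an upper-level estimate only: it says nothing about whether the two scores were measured under the same conditions.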

The fixed function stuff is harder to quantify. The numbers you've seen are about as much as you'll get. Because it's fixed function, it doesn't matter what order of polyphase SRC or how many bands the EQ has. You get what you get. This makes it a lot less useful than a general purpose DSP, but you get a ton of functionality at low cost and low power.
 
I can find the 32bit float benchmark number for the ADSP-21369, but I can't find the 32bit float benchmark for the Tensilica chip - I would appreciate a link. I could only find a simulated 16bit fixed-point benchmark number, which is obviously not the correct one.

Would game developers have full access to the Tensilica DSP?
 
Yeah, hard to do, because the Tensilica cores are configurable. This one is configured similarly, although there will still be differences. The 16/32 bit difference may also explain why their result is about 2x what I expected. I do know that the MS vector cores have full 32bit float vector engines, because that's what the speech pipeline uses.

As far as I know, game developers do not have access to the 4 DSP cores. They are all system managed. They have access to codec algorithms running on the cores, and full access to the fixed function hardware. Much to the audio team's chagrin, the speech team bogarted the two vector cores. I know there was some internal pressure to force the speech team to give up some of their CPU so that developers could use it, but I have no idea if anything ever materialized from that.
 
Yeah - the two benchmarks are not very comparable, since the ADSP benchmark is based on a verified test on actual hardware in 32bit float. The Tensilica number comes from a simulated benchmark, not run on any hardware, and is based on 16bit integer. Furthermore, the Tensilica test had 11 customized instructions specifically designed for this simulated benchmark to improve performance compared to other 16bit fixed-point DSPs.
 
Well, yes, that's the point. The MS cores have a bunch of customized instructions specifically to accelerate audio workloads in general, and the MS speech pipeline in particular. The scalar core has customized instructions to accelerate codec functions. Like I've said multiple times before, this is not a general purpose DSP to be used in high end mixing stations. It's a game console audio engine with a focus primarily on reducing power consumption and offloading audio workloads from the CPU. There's an entire core dedicated just to managing the other cores, doing housekeeping on the audio graph, moving memory around, and keeping the fixed function hardware well utilized, so the CPU doesn't have to (something no desktop audio card supports, as far as I know).
 
So bkilian, on the question of environmental audio modeling (or EAX5 if you like, as that is my benchmark),
it seems you're saying:
can Shape do EAX5 - no
can the XB1 as a whole do EAX5 - yes

Am I right?
It seems capable because bkilian pointed out before that Shape is an order of magnitude more capable than the best Creative X-Fi (this was long before the recent discussion on the capabilities of the audio block), so a version for Shape would be EAX50 or more, to be precise.

If the GPU of the Xbox One is from the Sea Islands family, then Shape would be from the Solaris Islands :LOL:, if that even existed, which it doesn't afaik.
 
Is bkilian talking about more capable as a whole, or more capable of environment modeling?
If my memory isn't failing me, Shape doesn't do reverb. I do know it doesn't do wave tracing, so any environmental modeling will either be faked/precomputed or done somewhere else.
It also doesn't do HRTF.
It also strikes me as strange: if Shape is so powerful, why did he state that doing reverb was expensive?

ps:
Is anyone with an Asus Xonar prepared to run some tests?
 
Sorry, I should have said "consumer sound cards". The X1 is not a professional mixing station, it's a game console.
And the mix buffers have 128 physical buffers, but can be used with over 4000 virtual buffers per audio frame. Think of them as registers that can hold an entire audio frame. The 21369 has 32, much smaller ones. The SRC can process 512 channels per audio frame, and the XMA decoder can decode 512 channels per audio frame.

The clock speed of the audio block is twice that of a 21369, and the fixed function blocks were calculated, per the Hot Chips presentation, to be 18 GOPS equivalent. The 21369 is 2.4 GFLOPS. If you assume the scalar Tensilica cores are about the same power per clock as a 21369, and use the 15.4 GFLOPS value for the two vector cores, you're talking 23 21369s equivalent for the whole audio block. How much did that 12 core sound card cost again? I found an 8 core one for something like $1500. Let's change my statement to "the Xbox One audio block is far more powerful than any sound card you can buy for less than or equal to the price of an entire Xbox One."

I believe you heavily underestimate the power of the X1 audio block.

:oops: :D That was beautiful, can't wait to see/hear what the devs do with all this audio hardware.
 
Unfortunately, devs only have access to a small part of it. Most of it is reserved for Kinect processing. As a bonus though, it means devs don't have to ask the question "do I have the resources to spare for adding Kinect?" like they did in the last console. Kinect is free(*). _Not_ using it is leaving processing power on the table. I hope this encourages them to be more liberal in their Kinect integration this time around.

(*) For certain values of "Kinect". I believe there are some features that devs can hook in to that require memory/processing on their part. Speech is not one of them.
 
What are the chances, now that the XB1 has good audio, that MS may create a DX EAX-type API? (for both PC and XB1)
Would there be enough _spare_ processing capacity to process it on the audio block?
Either way I think it would be a progressive move.
 
Even then I think it's going to be something special when the right devs start thinking up ideas.

I want to see where voice control/recognition can go in games, & with extra hardware to make it easier for the devs & the console, I'm sure we will see some nice things.

I think the next step is to actually make an A.I. co-processor for a console so video games can really put NUI to good use & let the A.I. respond back naturally in every game. But I guess that could also be done by using the Cloud & having a really large database of A.I. interactions.
 
I've got a question for bkilian, if he knows...

Can the fancy speech processing and recognition "tricks" that the hardware does in conjunction with the Kinect also be done from the headset inputs? Is it physically possible to do that, if MS chose to allow it?

I ask, because on the 360 you can't do most of the voice stuff unless you have a Kinect hooked up. I assumed that was for business reasons, not technical ones, but I could be wrong.

Since MS has come up with an alternate way to encourage Kinect usage this coming gen, I'm hoping that headset mics can be used for more than just game-chat this time around.

And heck, since I've got you here, do you have any Kinect-mounting recommendations for best audio quality? MS has been pretty vague. Does the Kinect like to be set on an audio-reflective surface? (like a boundary mic) Or does it prefer to be "floating", up and away from such reflectors? Similarly, is it better to be close to, or far from, the wall behind the TV? Is it recommended to decouple the Kinect from any vibration sources? (Like my TV which has fans and color-wheels and such, before we even start talking about the big-ass speakers that sit to either side.)

I know a lot of blood, sweat and tears went into making the Kinect(2) relatively immune to crappy room acoustics, but I figure there's no reason not to make life as easy as possible for it.

I'm thinking... cover the wall with egg cartons, and then suspend the Kinect from bungee cords in front of that. Just gotta clear it with the wife! ;)
 
You want it as far away from your center speaker as you can get, and not in an enclosed area. The rest is normal open mic type stuff. The less echo you can get in your room, the better it'll work.

And yes, you can do speech reco using the headset mic. It requires you to retrain the speech database because the audio pipeline will be different, which is one of the reasons it wasn't done in the 360. I don't know the plans for the X1.
 
I assumed the voice quality on the 360's standard headset mic was too low for it to effectively be used for voice recognition, but it occurs to me I don't know how the audio fidelity impacts that kind of thing.
 
I've set up a few PCs for voice operation. In every case the user had a low-quality microphone, and it worked quite well despite the fact that no one completed the training.
 
VGLeaks details (not much) PS4 audio processor: http://www.vgleaks.com/playstation-4-audio-processor-acp/

Seems pretty barebones after all.

audio1-600x360.jpg
 
Yep, pretty basic. But probably still more than enough, considering what Garageband can do on the iPad.
 
This is just a repeat of previous vgleaks rumors, what's new in there?

The slide looks so fake, for an official presentation it should have mentioned fine grain buzzword compute things.
 