Xbox One (Durango) Technical hardware investigation

Status
Not open for further replies.
Honestly, I usually play with stereo headphones nowadays, so for me even a cheap DVD player chip would be enough.

I think you're missing most of what a sound card/audio processor actually does. It isn't just about how many channels can be output in the final master, it's how many sounds can be included in the final master and how many different effects and filters can be applied without using too much CPU performance. There seems to be a lot of audio processing done in software for games like Battlefield 3. They have a sort of fake high-dynamic range audio, with all kinds of processing to roll off sound over distance, alter sound based on positioning and add effects like muffling sound after you've been near an explosion. It would be interesting to know how much a dedicated hardware solution would benefit a game like that, how many more sounds they could support, and whether there are any new effects they could add. From what I was reading yesterday, there are all kinds of effects they could have just based on positioning. This audio block is supposed to support the XAudio2 API. It's just a question of whether custom DSP effects are done on the audio block or on the CPU. Still, there is a lot of work that's been offloaded onto the audio block from the CPU, and for a much larger number of sounds concurrently.

Eg. You could have one game that masters to 7.1, but can only put 64 sounds (voices) into the final master, with little to no audio processing besides some volume adjustment. You could have another game that masters to stereo sound but can put 128 sounds into the final master with each sound processed for volume adjustments, environmental effects and filtering.
 

Well, you are right. I would like to be able to enjoy a high-quality 7.1 sound system. But in practice, enjoying something like that is very difficult in 95% of cases. I mean, I am not against having a super sound chip inside Durango. I simply would have preferred that super-duper chip to go toward other things I could have enjoyed more (more CUs would maybe have been enough).
The PS3 (is this also comparing systems?) is a very capable sound system, and the best quality I have been able to enjoy from it is Dolby 5.1, which was much better than simple stereo. Still, I usually play at night and can rarely indulge in my multi-channel joy.
 

I think you're still missing this a little bit. This is as beneficial to stereo as it is for 7.1. Use Battlefield 3 as an example of absolutely incredible audio, even played in 2 channel. How much better can it get this gen? I'd argue there's still a long way to go.
 

Think of the sounds you might hear in a shooter. In a basic audio engine you would hear the sound of guns when they fire. A slightly more advanced engine would alter the loudness of the gun sound based on how close to the player the object generating the sound is. The next improvement might be to put a filter over all of the sounds based on the environment the scene is taking place in (outdoors, in a corridor, etc). From there you could have individual objects in the environment between the object generating the sound and the player modify the sound. Then maybe you start modelling the sound of the individual bullets as they strike objects in the environment.

This increasing complexity (if not these specific examples) is the kind of improvement you would expect from developers having increased audio processing capabilities available to them. The advantage of having dedicated audio hardware is that, up to the limit of the performance of the dedicated hardware, none of these improvements take performance away from the other game systems. It's simply up to the developer to determine how much programming effort they want to put into the game audio.
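
As a rough illustration of the first few steps in that progression (distance-based loudness, then an environment or occlusion filter), here is a minimal Python sketch. The function names, the inverse-distance rolloff model and the cutoff frequencies are all assumptions chosen for illustration, not anything a particular engine is known to use.

```python
import math

def distance_gain(distance_m, ref_distance_m=1.0, rolloff=1.0):
    """Inverse-distance gain, clamped to 1.0 inside the reference distance (assumed model)."""
    d = max(distance_m, ref_distance_m)
    return ref_distance_m / (ref_distance_m + rolloff * (d - ref_distance_m))

def one_pole_lowpass(samples, cutoff_hz, sample_rate=48000):
    """One-pole low-pass filter, a cheap stand-in for 'muffling' by walls or environment."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out

def render_gunshot(samples, distance_m, occluded):
    """Apply distance loudness, then a darker filter if something blocks the path."""
    gain = distance_gain(distance_m)
    cutoff_hz = 1500.0 if occluded else 12000.0  # hypothetical cutoffs
    return [gain * s for s in one_pole_lowpass(samples, cutoff_hz)]
```

Every extra stage like this runs per sound, per audio frame, which is why the cost grows quickly as the engine gets more ambitious.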
 
Absolutely. Screwing around... for Science!
There was some research into using parametric speakers to deliver a stereo mix directly to your ears without you wearing headphones, driven by Kinect head sensing. It could deliver a different audio mix to each player.

Mine uses 40 kHz transducers and then modulates the normal waveform onto the ultrasonic one. When done properly, the sound just seems to arrive in your head without having come through the air.
A video in which an advertising company takes advantage of this effect. They have their beam set pretty wide. Mine I have set to about 3 feet wide at 30 feet. With some more DSP playing around, I could actually have it appear in a bubble by using interference from two beams.
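
The "modulate the normal waveform onto the ultrasonic one" step can be sketched as plain amplitude modulation. This is only a minimal sketch under assumptions: the 192 kHz sample rate, the modulation depth and the function name are made up for illustration, and a real parametric speaker typically needs extra preprocessing to keep distortion down, which is not shown here.

```python
import math

def am_modulate(audio, carrier_hz=40000.0, sample_rate=192000, depth=0.8):
    """Amplitude-modulate audio (values in [-1, 1]) onto an ultrasonic carrier.

    The sample rate must be well above twice the carrier frequency.
    """
    out = []
    for n, x in enumerate(audio):
        carrier = math.sin(2.0 * math.pi * carrier_hz * n / sample_rate)
        out.append((1.0 + depth * x) * carrier)
    return out
```

The audible signal ends up in the envelope of the 40 kHz carrier; the air itself does the demodulation along the beam.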

Something like this?

Augmented reality is powerful, but to build a real virtual world you'll want characters who appear to speak to you with location-specific audio. Impossible? Nope; the technology already exists for specialists.

"The effect of 3D sound is astonishing," says Tuyen Pham, CEO of A-Volute: 3D Sound Projects. "You're in the axis of the speaker and you hear sound; you move your head a little bit and the sound disappears. The hardware components in our technology are still too expensive to be used in consumer products, though with mass production having a device on top of or under your TV set can be something accessible for a major brand."

Microsoft are that major brand, and Microsoft Research has already demonstrated its own directional sound prototype - a rack of 16 speakers all working together to 'project' sound into a small area. Coupled with Kinect head-tracking, the speakers could project audio only you can hear, making headphones a thing of the past. Might the tech be intended for the next Xbox? We put the idea to Pham, who has some ideas about how it could work.

"The consumer device looks like a 10x20cm panel with a thickness of one centimetre. It could be placed anywhere, and I think the application of such technology for game consoles is clear. However, after checking with our legal department I can't disclose that. We're working with a gaming company but the information I could give is under NDA..."
 
Well, the Xbox 360 was capable of real-time 5.1 sound processing using as many as 3 hardware threads. So a chip capable of processing that would be enough for most people.
See, I could make a resistor DAC capable of outputting 5.1 sound. I'm pretty sure it wouldn't be good enough for you (And 6 printer ports seems overkill anyway :))
What we're talking about is the mixing that happens to every sound in the game _before_ it gets the final mix to output.
Let's take a racing game, for instance: We have noise generated by the engine. We have the sounds from each of the tires, filtered by the cockpit, and 3d positioned. We have the wind noise. We have the crowd noise, one or more per crowd, dopplered, filtered and 3d positioned. Then we have noise from each car on the track, dopplered, filtered and 3d positioned.

If done right, this can be mixed into a stereo or a 5.1 or 7.1 mix and will be very immersive. A lot more "bang for your buck" than an extra CU (or 10% more graphics, essentially). People underestimate the importance of audio in games, especially AAA games.
Let's say you're doing a 5.1 end mix, for a single crowd noise, per audio frame, you need a SRC (doppler) -> Filter (Cabin) -> Equalizer (Cabin) -> 5 volume settings (3d positioning, ignore the bass) -> 6 Mixers (one per channel)
And you need to do that for _every_ sound in the game, at a minimum. I'm even ignoring compressor effects and DSP effects. The cabin filter is more likely a convolution filter with an impulse, which is expensive. Some effects are cheaper in the frequency domain, but then you need a Fourier and inverse Fourier, so you don't bother unless you have a ton of filters that would benefit from it.
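
To make that per-voice chain concrete, here is a minimal Python sketch of SRC -> filter -> per-channel volume -> mix for a 5.1 bus. It is an illustration under assumptions: linear interpolation stands in for the SRC, a one-pole low-pass stands in for the cabin filter, the equalizer stage is left out, and every name and constant is hypothetical.

```python
import math

CHANNELS = ("FL", "FR", "C", "LFE", "SL", "SR")

def resample_linear(samples, ratio):
    """Linear-interpolation SRC; ratio > 1 raises pitch (a crude doppler shift)."""
    out, pos = [], 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        pos += ratio
    return out

def lowpass(samples, cutoff_hz, sample_rate=48000):
    """One-pole low-pass standing in for the cabin filter."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out

def process_voice(bus, samples, doppler_ratio, cabin_cutoff_hz, gains):
    """Run one voice through the chain and accumulate it into the 6-channel bus.

    gains maps channel name to volume; leaving out "LFE" ignores the bass.
    """
    filtered = lowpass(resample_linear(samples, doppler_ratio), cabin_cutoff_hz)
    for ch in CHANNELS:
        g = gains.get(ch, 0.0)
        if g == 0.0:
            continue
        dst = bus[ch]
        for n in range(min(len(dst), len(filtered))):
            dst[n] += g * filtered[n]

# One audio frame: every active voice is processed into the same bus.
bus = {ch: [0.0] * 256 for ch in CHANNELS}
process_voice(bus, [0.1] * 300, 1.02, 2000.0, {"FL": 0.7, "FR": 0.3, "C": 0.2})
```

The inner loops run once per voice, per channel, per sample, so the cost scales with the number of voices even before any convolution or compressor is added.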

There's a reason there are companies that make their entire living just providing audio engines for games (like Audiokinetic Wwise or FMOD). Game sound is not just playing back an MP3.
 

You made me understand a lot of things; I'll pay more attention to sound when I play a game. And I'm really enchanted, I can't wait to hear this powerful audio hardware with a driving game or an open-field battle with hundreds of amazing sounds.
 

I'm kind of curious. I found some notes for CryEngine that say they spend about 2ms (on how many threads?) of every 16ms of processing time on audio, which I'm assuming would be typical even with a 30 fps game. That's not a huge amount of time, and they're most likely doing other things in parallel. It would be interesting to know how many sounds they can handle in their audio graph for their final mix.

If a racing type game can have a large number of voices per car, how exactly did the 360 handle processing that on the CPU without totally monopolizing all of the processing time? Are some voices eliminated from the mix based on positions of the cars (you only hear the cars nearest to you, or within a certain distance)? I'm just curious because it seems like there would be too many sounds to process within the limited time you had based on the theoretical 256 voices per core with some basic linear SRC, filtering and volume per channel.
 

How many Jaguar cores would you guesstimate are needed to emulate the audio hardware in Durango (if it even could)? How about the more taxing things the 360 used to do in software, what's the Jaguar equivalent?

I wonder if MS will be including stereo headphone jacks in the controllers this time around? Seems like a good way to guarantee they get their money's worth, with people experiencing the benefits of quality sound without relying on them having higher-end stereo systems or dedicated gaming headsets.
 

It would be really nice if they did; the Roku 3 remote control has a headphone jack and I like it.

Kinect mic + headphone jacks in controller will be good for me.
 
How many Jaguar cores would you guesstimate are needed to emulate the audio hardware in Durango (if it even could)?

It seems that in the devkit Microsoft used an 8-core CPU to emulate the SHAPE chip, but the CPU was not enough to reach the power of SHAPE.

I don't know which CPU it was, but it seems that even the whole Jaguar isn't a match for SHAPE.

Interference: The alpha kits had an extra 8 CPU cores to emulate the sound block but even that wasn't enough processing power so they never implemented the emulation
That's also where some early rumours of Durango having a 16 core CPU came from.
 
Damn, this is a long post. For those not interested in esoteric audio geekery, it's safely skippable.

I wonder if Bkillian had a hand in designing the audio capabilities of Durango. :) If so I'll have to send him a virtual pint of the best beverage he likes if audio in games improves significantly this generation. :)
Designing? No. Validating designs? Some. The guy in charge of design is a hardcore audio guy, who's been doing audio for xbox since the original xbox. He's also the guitarist in a band, and designs his own effects pedals for fun. Among the team, there were a number of musicians (Drummers, guitarists, the guy who wrote most of the Dragonball Z cartoon's music), electrical engineers with deep audio knowledge, who implement the filter algorithms, and folks like me, with a degree in Physics.

How many Jaguar cores would you guesstimate are needed to emulate the audio hardware in Durango (if it even could)? How about the more taxing things the 360 used to do in software, what's the Jaguar equivalent?
From all reports, an 8-core Jaguar is about equivalent to the Xenon in floating point performance, about 102GF or so. So a 360 core is equivalent to about 2.67 Jaguar cores, reportedly. Which, when you think about it, is quite remarkable. In pure floating point performance, doing something highly streamable and optimized, which audio generally is, the 360 would _still_ give a current generation CPU a run for its money. You can see why the designers figured software audio was feasible. Consider that the DSP in a Creative Z or X-Fi is a 400 MHz, 10000 MIPS part, so the 360 CPU is theoretically 10 times the power.
As to emulation equivalents for unannounced products, I really can't say.
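
The arithmetic behind the 2.67 figure can be worked through in a few lines. This sketch assumes the post's approximate 102 GFLOPS total for both chips and Xenon's three cores; the variable names are just for illustration, and setting FLOPS against MIPS for the DSP is a loose comparison at best.

```python
total_gflops = 102.0                 # approximate figure quoted for both CPUs
xenon_cores, jaguar_cores = 3, 8

per_xenon_core = total_gflops / xenon_cores      # 34.0 GFLOPS per 360 core
per_jaguar_core = total_gflops / jaguar_cores    # 12.75 GFLOPS per Jaguar core
jaguar_cores_per_xenon_core = per_xenon_core / per_jaguar_core  # 8 / 3, about 2.67

dsp_gips = 10.0                      # 10000 MIPS DSP mentioned in the post
cpu_vs_dsp = total_gflops / dsp_gips # about 10.2, the "10 times" in the post
```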

I'm kind of curious. I found some notes for CryEngine that say they spend about 2ms (on how many threads?) of every 16ms of processing time on audio, which I'm assuming would be typical even with a 30 fps game. That's not a huge amount of time, and they're most likely doing other things in parallel. It would be interesting to know how many sounds they can handle in their audio graph for their final mix.

If a racing type game can have a large number of voices per car, how exactly did the 360 handle processing that on the CPU without totally monopolizing all of the processing time? Are some voices eliminated from the mix based on positions of the cars (you only hear the cars nearest to you, or within a certain distance)? I'm just curious because it seems like there would be too many sounds to process within the limited time you had based on the theoretical 256 voices per core with some basic linear SRC, filtering and volume per channel.
I suspect Crytek are talking about total processing in a 16ms frame, so audio is taking up 12.5% of their total processing budget. That's reasonable for an FPS. Different games have different audio requirements, and games that do a lot of reverb and filtering will use a lot more resources for audio than others. I bring up car racing games because they tend to be one of the worst offenders. These guys love simulation: they simulate the tire dynamics, they simulate the car's momentum, and they simulate as much realistic audio as they can fit into their budget. Yes, they tend to have to make allowances, doing things like you suggested. For instance, you only need the sound from the closest two crowds; all the others can be blended into a general background noise. Same for cars. All the cars behind you going at roughly the same speed can be folded into one, same for those in front. Only those passing you, and those going at a significantly faster or slower rate, need individual attention. Graphics does similar things with mipmapping and reducing distant object triangle counts.
Game programming is all about budgets and compromises. For Crytek, I suspect they just have some canned environmental sounds that they can then use to give you the illusion of a 3d environment, then they just take care of the exceptions and they're good.
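
The "closest two crowds, fold the rest into background" compromise can be sketched like this. It is a minimal illustration assuming 2D positions and a plain nearest-first ranking; the function name, the tuple layout and the example data are hypothetical, and a real engine would also weigh loudness and relative speed.

```python
import math

def split_voices(sources, listener, max_individual=2):
    """Return (individual, folded): the nearest sources get full processing,
    the rest are meant to be blended into one background bed."""
    def dist(src):
        (x, y) = src[1]
        return math.hypot(x - listener[0], y - listener[1])
    ranked = sorted(sources, key=dist)
    return ranked[:max_individual], ranked[max_individual:]

crowds = [("A", (0, 10)), ("B", (0, 100)), ("C", (5, 0)), ("D", (300, 0))]
individual, folded = split_voices(crowds, listener=(0, 0))

audio_share = 2.0 / 16.0  # 2ms of a 16ms frame, the 12.5% budget discussed above
```

Here crowds C and A are processed individually, while B and D would share a single canned background voice.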
 

Thanks again for the comments to my questions.

Wow, I never would have guessed Xenon would be that strong in any type of computation relative to an 8-core Jaguar. Xenon is so old.
 
In pure floating point performance, doing something highly streamable and optimized, which audio generally is, the 360 would _still_ give a current generation CPU a run for its money.

A dual core perhaps, if you're measuring peak theoretical performance against current generation desktop CPUs. Modest quad cores more than double Xenon's peak rate these days.
 
I was thinking mobile CPUs, like what's in ultrabooks, but yes, current high end desktop CPUs are well above the 360 performance. It's still a remarkable feat considering it's been 7 years since that CPU was released.
 