Predict: The Next Generation Console Tech - AUDIO

*sigh* You're one of _those_ people... There are no feedback frequencies, because there will always be a low-pass filter inserted below Nyquist to ensure no aliasing occurs. The sampling theorem guarantees that the input frequencies and output frequencies will be identical as long as there are no frequencies above Nyquist (half your sampling rate).
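A quick way to see the folding that the anti-aliasing filter prevents: sample a tone above Nyquist, and its samples become indistinguishable from a (phase-flipped) tone below Nyquist. A toy sketch in plain Python, assuming a 44.1 kHz sample rate:

```python
import math

FS = 44100           # sample rate (Hz); Nyquist is FS / 2 = 22050 Hz

# Sample a 30 kHz tone -- above Nyquist. Its samples are identical to those
# of a phase-flipped 14.1 kHz tone, because 44100 - 30000 = 14100:
# the out-of-band energy "folds back" below Nyquist as an alias.
f_in = 30000
f_alias = FS - f_in  # 14100 Hz

for n in range(32):
    s_in = math.sin(2 * math.pi * f_in * n / FS)
    s_alias = -math.sin(2 * math.pi * f_alias * n / FS)
    assert abs(s_in - s_alias) < 1e-9   # sample-for-sample the same signal
```

This is exactly why the filter has to run *before* sampling: once the 30 kHz tone has been sampled, no amount of processing can tell it apart from the 14.1 kHz alias.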
And 64-bit is completely unnecessary. 24-bit is about 144 dB of dynamic range. That's the difference between the softest sound you can think of in a soundproof room, and standing right behind a fighter jet as it takes off. 32 bits would be in the range of 192 dB; a sound that loud would pretty much instantly implode your head. The average room has a noise floor of about 50 dB on a calibrated SPL meter, anything under that will be practically inaudible, and a good half of the 16-bit range is below that.
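Those figures are just 20·log10(2^bits), i.e. roughly 6.02 dB per bit of linear PCM (ignoring dither, which adds a little). A quick sanity check:

```python
import math

# Theoretical dynamic range of linear PCM: 20*log10(2^bits) ~= 6.02 dB/bit.
def dynamic_range_db(bits: int) -> float:
    return 20 * math.log10(2 ** bits)

assert round(dynamic_range_db(16)) == 96    # CD quality
assert round(dynamic_range_db(24)) == 144   # soundproof room to fighter jet
assert round(dynamic_range_db(32)) == 193   # far past anything survivable
```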

I did mean 64-bit at the mixing stage, where it gives you almost unlimited mixing headroom on the dBFS scale.

Anyway, most DAWs still do 32 bits and that's quite fine.

Personally, 44.1 kHz and 24 bits (dynamics) would be fine.





A lot of this is pseudoscience. In real tests, users chose louder lower quality sounds over softer high quality ones, simply because we prefer louder sounds in general. I agree that the better quality an audio track is, the better in general, but there is no definitive science determining what the crossover is, since it's different for everyone. Most AAA game audio today has quality easily comparable to high quality movie mixes.

Whatever it is, it's real; that is why every pro spends as much as they can, because people notice it.

We generate a room impulse today with the current Kinect (that's what audio calibration does), for use with echo reduction. We also do user tracking for beamforming.

Good to know (I haven't had the chance to try it), hope it gets even better in the future ;)


Dude, I do audio _for a living_. I work with, and have worked with, leaders in the field of audio engineering. I have _done_ ABX testing, and that's why I know 100% that I cannot reliably tell the difference between 192 kbps MP3 and CD quality. I've tried. I am by no means a "golden ear", although my frequency range is higher than average (I can still hear 18 kHz, which at my age is pretty good :)). That's why, for me, I know that I would get no boost at all, so any investment is worthless.

The advice was general, not just for you; I didn't mean to offend.

Anyway, some just can't hear it; I would try it nevertheless...

This is not of any great concern. Current audio tech works just fine, and if there's actually any problem here, it's been dealt with satisfactorily.


Are you talking about samples here, or during the mixing stage? Because claiming that 64-bit float samples are the only "ideal" solution is completely ludicrous, of course.

64-bit for mixing, and the CPU cost is so small (if even noticeable on any modern CPU) that they should just do it out of pride.

Anyway, why not develop it into a full DJ or music production app to be used with a latency-free Kinect 2.0? It could certainly be quite popular!

Yeah, but we can't have every console owner be forced to license synth software and instrument banks worth potentially thousands of dollars just to get good-sounding MIDI audio. ;) I think what Shifty was saying is that any standard, reasonably-priced general MIDI instrument bank is going to sound pretty crap playing orchestral music (which is true, btw.)

It would be such a small investment for someone like MS or Sony, really some of the best audio apps/synths are made by very few guys in about 2 years, it would be a drop of water in the ocean.

Not what I was thinking, although it could be interesting.
You got some independent studies-linkage to post, proving that claim? :)

That seems like placebo or (self-)indoctrination to me TBH. Again, some independent studies to back up this vague claim?

Most of them are inconclusive, as far as I know.

But IMO you can take the word of so many musicians and music/audio producers or those who do music/audio for a living...


A Kinect reading would treat everything scanned as hard surfaces, as even a 3D camera can't properly identify fabrics and foam pillows and so on. It wouldn't create an accurate representation. Better than nothing perhaps, but you seem like such a stickler for details (64-bit floats... indeed!), I'm surprised you'd settle for setting the bar this low! :)

Sometimes the difference between nothing and a little is a world of difference.

Again, independent, proper studies rarely back up such claims. It's mostly just in the heads of the owners of such Hi-Fi gear, not anything tangibly measurable in the real world. Your language indicates as much, btw, like when you speak of the need to "train" yourself to hear differences and so on. I doubt this is even possible.

I would have a hard time believing that someone can run 40 miles or get some of the scores I saw in Tetris, but they can, and we saw it. This is harder to prove.

Anyway, how do you explain that musicians learn to tune their guitars by ear? Just a small example.

You don't need to actively train for it, but you get used to it and then you notice it very easily; that means you are actually used to hi-fi sound.

Well, good for you! Honestly. I too love getting exactly the gear that I want. :D

I said I DO NOT HAVE the gear I would like.
 
Hmm. I edit audio in Audacity in 32-bit floats, and it doesn't sound terrible. Or even in any way perceptibly bad. So I'm not sure what you count as less than ideal, unless ideal is a scientific ideal well beyond the typical human's perception. ;)

It should sound perfect, unless you are mixing a few dozen tracks at once.

This is somewhat tangential. By 'cheap' I was referring to synthetic orchestras. Even the good stuff (Vienna) that can sound very convincing can sound very artificial, and that's the quality of many game scores. For synthetic instruments, and even some simulated natural instruments (I own TruePianos and it is very impressive), modern synths are superb, yes, but that's where I said 'limited', because synthetic sounds only take you so far in musical styles.

Within the confines of a MIDI or mod track in a game, sophisticated wave modelling or massive sample sets aren't an option. A MIDI module will play off a tiny soundfont with exceptionally fake-sounding orchestral instruments. A mod will be limited to the small sample set used to construct it, with its looped sample regions. The quality you can hit can sound good, there's no denying, but it won't do for epic scores in the same way it'll suffice for a puzzler or shooter on a handheld.

Overall I agree.

This argument seems counterproductive to me. What you're saying is Joe Public can't tell the difference between $200 and $20,000 hardware until they are trained. Then, once trained, the cheap stuff they used to be happy with now makes them miserable, and they have to invest in expensive stuff to get enjoyment from their audio. Isn't that an argument to keep everything lo-fi and everyone happy?


It doesn't really need a big investment; like I said, the difference between nothing and a little is a world of difference, and most of the time what we have is nothing.

A cheap, "old" Creative card (really a drop of silicon in a modern SoC) gives better sound than the CPU-based audio of almost any of today's games.


Aren't most core gamers able to play for hours at a time? I don't see the evidence for gamers getting tired of terrible sounds. On the 8-bit machines with their infinitely repetitive tunes, young gamers still played for hours at a time!

Wouldn't you prefer Sniper with even better audio? You said it yourself.

That is what I meant by useful FX for gaming.



I was playing Sniper Elite yesterday and the real problem wasn't audio sample quality or the repetitious nature of the explosion samples playing in the background (I'm sure they're reusing the same sound over and over, but I haven't noticed, unlike repeating textures and models), but naff positional audio (on my monitor's cheap stereo speakers). It's not possible to identify where an enemy is, or how far away. I want positional audio, on headphones if necessary, to make immersion realistic. That's the glaring area that needs to be advanced IMO. Higher audio quality seems a waste of processing and resources when it doesn't sound in any way bad with what we've got.


That sounds to me like what I also want!

That is something that is easy to get.

There are a number of useful things they could do to enhance gaming.

All of them would benefit from higher-quality audio IMO :LOL:
 
It would be such a small investment for someone like MS or Sony, really some of the best audio apps/synths are made by very few guys in about 2 years, it would be a drop of water in the ocean.
I don't understand where you're going with this. Do you think every console should come with 16 GB of orchestral samples installed and a hardware synth for playing MIDI music using internal sounds? Or that they should implement a softsynth in hardware and force every dev to use that instead of the masses of audio gear that's available to them?

That sounds to me like what I also want!

That is something that is easy to get.

There are a number of useful things they could do to enhance gaming.

All of them would benefit from higher-quality audio IMO :LOL:
They wouldn't benefit from higher than 44.1 kHz 16 bit mono (lossy compressed) samples, or higher than the existing audio output qualities. There's no need for higher source audio quality than can be achieved - only more sophisticated processing of the audio data.
 
Are you talking about only sound effects or also music ?
Because when it comes to music you may really want HiFi, although that would depend on the game.
(A game about music would want it, a few others with orchestrated tracks might also want that, but games with only background music likely don't need it)
 
The sound effects. Music should be stored at decent quality, and with the storage available now that can be high quality, although if it needs to be fitted into RAM or streamed at a lower bitrate because the drive is otherwise occupied streaming graphical assets, it may need to be kept compressed. But I've yet to hear a game soundtrack that sounds like it's compressed, so I'm fine for them to carry on with whatever they do now.
 
It should sound perfect, unless you are mixing a few dozen tracks at once.
32 bit float has 24 bits of mantissa (23+1 bits). It's capable of storing all 24 bit integer values without any precision loss. Also it is capable of storing all 24 bit integer values without any precision loss if you multiply (or divide) them by 2^x (x a signed integer in roughly [-126, 127]). This is important for audio processing. Floating point doesn't downgrade sound quality when you increase or decrease the volume (even very silent voices are still kept in top quality). This is why 32 bit float is in many cases better than 32 bit integer.

You can add 256x 16 bit sound tracks together (assuming all have same mixing volume) to a 32 bit floating point and you will not lose a single bit of data. With 24 bit integer processing, you can add the same 256 sound tracks (lossless), but if you do any volume adjustments, you will lose precision.

I have been personally wondering how 16 bit floats (halfs) would sound compared to 16 bit integers. Floats do not have as big problems with volume adjustments (mixing), but 16 bit float has only 11 (10+1) bits of mantissa. It would likely cause audible problems...
 
I don't understand where you're going with this. Do you think every console should come with 16 GB of orchestral samples installed and a hardware synth for playing MIDI music using internal sounds? Or that they should implement a softsynth in hardware and force every dev to use that instead of the masses of audio gear that's available to them?

Never said that.

But something like this should be standard

http://www.creative.com/soundblaster/technology/eax_advanced_hd/

(Although some resynthesis tech could do wonders to make variations on sounds, any sounds, like blasts, shots and death screams...)

They wouldn't benefit from higher than 44.1 kHz 16 bit mono (lossy compressed) samples, or higher than the existing audio output qualities. There's no need for higher source audio quality than can be achieved - only more sophisticated processing of the audio data.

Some things would really benefit from stereo, mainly the music; anything beyond 44.1 kHz 24-bit would be pointless IMO too, like I think I said before.

My initial post was meant to be almost all about FX.

Positional audio is all about good reverbs; powerful shots/blasts are more intense if not overcompressed (or put an expander to work).

Good FX would give detail back to the audio: convert it to 44.1 kHz / 24-bit, process it, and output at that quality, even if the original is not.

A good DSP could do wonders for sound: sound positioning, virtual surround (either headphones or virtual 5.1 with Kinect's help), giving detail back to the sound, and (as Kinect seems to already do) better sound overall by correcting perceived problems.

Sure, better source material would be even better, but even an inexpensive DSP integrated in a SoC would be really great compared to what we have.
 
I've not been too impressed with surround systems. They rarely correlate with the division of the screen and the seating so they sound like disconnected sounds just happening in the environment and not part of the movie/game. Unless you're sat dead centre in a properly set-up 5.1 system, you'll never be centralised in the audio, and that'll still never fit in with the screen anyway.

Windows 7 has native room correction calibration using a mic. It'll adjust speaker phase and amplitude according to your desired position (where you put the mic).
It works fairly well and you don't really need to be sat dead centre anymore.
Not to mention all the room correction systems found in fairly common A/V receivers like Audyssey, MCACC and YPAO.

I could see a console having this built-in. In fact, I don't see why a firmware update wouldn't allow this on current consoles, using a USB mic.

The best audio IMO is physiologically correct binaural over headphones. The ultimate immersion would be a stereoscopic display like Sony's with stereo headphones and binaural acoustic processing. In a first-person game that'd place the player exactly in the centre of the action and give amazing spatial awareness. I don't know how well 5.1 headphones perform as I've never tried them, but technically they are a clumsy solution! That'd be an area for next-gen to improve in sound processing, except headphones aren't popular and would be spatially discrete from the display positioned some distance in front of the gamer. It's also kinda irrelevant to the choice of hardware and topic at hand, because audio processing is readily doable on the CPU. There's no need for a custom processor, especially for an extremely niche experience.

Because we have only two ears! ;) 5.1 allows placement of the source sound, but you have to average sounds that come between the speaker positions, and you can't place sounds behind the speaker positions. To do this requires cheats. Well, if you're going to model sound placement, why not do it for just 2 speakers, for the two audio sensors that the brain uses? 5.1 audio is like trying to generate 3D images with 10 images and a lenticular lens instead of using two images, one over each eye. The ability of binaural audio to model spaces and especially up-close sounds (what could be more immersive than a jungle shooter with mozzies buzzing aggravatingly around your head?!) is pretty much perfect, in the same way a stereoscopic headset is leaps and bounds a more immersive 3D experience than looking at a screen some distance away.

Contrary to popular belief, sound perception doesn't come from the eardrums alone. Much less localization perception. The "tactile" nerves in our outer ear can detect vibrations within the hearing spectrum, and they react differently depending on the source localization. That's mostly why we have these weird pieces of thin cartilage and skin with a particular form (which is different in each person).

Furthermore, the outer ear also dampens a soundwave before it reaches the eardrums, and that dampening (which is sort of a pre-equalizer) also changes according to the localization of the sound source relative to the head.

Anyways, perception of sound depends on:
1 - Latency of the waves between the eardrums (the most obvious)
2 - Amplitude difference between the waves that reach the eardrums (filled with liquid and soft tissue, our head is a fantastic sound dampener lol)
3 - Tactile nerve perception in the outer ear
4 - Pre-equalization made by the outer ear
5 - "Visceral feeling" caused by high-amplitude, low-frequency soundwaves (also sensed by our somatosensory system, or tactile nerves)


Shifty Geezer, as you can see, a properly-placed 4/6-driver headphone set can cover points 1, 2, 3 and 4 (some models with integrated vibration can even cover point 5 to a small degree). A 2-speaker headphone could only cover points 1 and 2.

Sure, many "virtual surround" tricks also try to emulate point 4, but they're made using a mock-up head+ears model which doesn't correspond to anyone's ears in particular. It'll sound "unnatural" no matter how you put it.

I'm not saying multi-driver headphones will sound the best. Most often they won't, because they don't use good quality drivers and/or op-amps and/or DACs (in the case of USB solutions). There's also the problem that multi-driver headphones force the manufacturer to use smaller drivers, which handle low-frequency waves a lot worse.
But they're certainly the best approach for perceiving the location of sound sources in small form-factor, IMHO.

I don't think head-tracking is useful for anything but a niche. In FPSs and 3rd-persons, we're supposed to always be looking at the middle of the center monitor, even if we have a multi-monitor setup. The monitor's image is supposed to replicate the character's POV, so there's little logic in looking around.
I could see head-tracking being useful for racing/aerial/space sims, but the people who would go the extent of caring about that are so few that I don't know if it's worth it.
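Points 1 and 2 in the list above (interaural time and level differences) are also the easiest cues to synthesize even for plain two-driver headphones. A crude sketch using Woodworth's ITD approximation; the head radius and the flat "shadow" gain here are rough stand-ins of my own, nothing like a measured, per-person HRTF:

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air
HEAD_RADIUS = 0.0875     # m, an average-ish head; real HRTFs are per-person
FS = 44100               # sample rate

def interaural_cues(azimuth_deg: float):
    """Return (far-ear delay in seconds, far-ear gain) for a source azimuth.

    Uses Woodworth's ITD approximation, r/c * (theta + sin(theta)), plus a
    made-up frequency-independent level drop standing in for head shadowing.
    """
    az = math.radians(azimuth_deg)
    itd = (HEAD_RADIUS / SPEED_OF_SOUND) * (az + math.sin(az))  # seconds
    far_gain = 1.0 - 0.3 * abs(math.sin(az))                    # crude ILD
    return itd, far_gain

# A source 90 degrees to the right reaches the left ear roughly 0.66 ms late:
itd, gain = interaural_cues(90.0)
print(f"ITD {itd * 1000:.2f} ms ({itd * FS:.1f} samples), far-ear gain {gain:.2f}")
```

Delaying and attenuating one channel like this gives convincing left/right placement; the front/back and elevation cues (points 3 and 4) are exactly what this toy model cannot capture, which is where real HRTF processing comes in.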





Now here's how I think gaming audio will evolve during this next generation:
- It won't.


It's been pretty much stagnant for 10 years since the XBox 1 used the nForce APU for Dolby Digital encoding and everything has stopped ever since.
Then again, it makes little sense to put effort/storage space into 24-bit/96KHz audio samples, since gamers usually don't care much about sound.

Decent speakers need to be big and use large drivers with large resonance cases. The typical consumer doesn't like big speakers.
The typical consumer prefers to pay 1200€ on a Bose setup with tiny satellites, with all the low and mid-range frequencies coming out of the subwoofer, than to pay 700€ for good floor-standing speakers + bookshelf satellites and an A/V receiver, with the latter sounding way, way better than the former.

Anything more than compressed+lossy 5.1 16-bit/44KHz is overkill for >99% of the console gamers, so why bother?
Plus, good sound is extremely hard to market. We can't show good sound in a TV ad, web page ad or superbowl ad.. It's the same reason why the 3DS was so difficult to market too.


Regarding equalizer/reverb effects, I think since Creative decided to pretty much copy/paste all of EAX 5's functionality into OpenAL EFX, there might not be much more work to do from there, to be honest.
Of course, a developer will have to dedicate time/money to use EFX, but if applied correctly the end result should be really good, and light on next-gen systems.
 
Ear fatigue, in my opinion, is caused by a combination of listening level being high enough to damage your hearing, poor quality recording with noisy treble, clipping amplifiers and over-driven speakers. It really doesn't have anything to do with the recorded format.

There may be some people with "golden ears" but they would be rare, and most of us have already done significant damage to our hearing and have hearing loss. Concerts, clubs, headphones, workplace noise, movies, car audio or whatever. I doubt I have anywhere near perfect hearing anymore.

What I do know is the only way to reliably evaluate stereo equipment is by reading quality measurements.
 
But IMO you can take the word of so many musicians and music/audio producers or those who do music/audio for a living...
That's an appeal to authority fallacy. Even professionals are often wrong or confused on topics that are heavily subjective - such as the perception of sound for example.

Sometimes the difference between nothing and a little is a world of difference.
I don't believe room compensation - especially a quick-and-dirty hack implementation done with a kinect camera - does "a world" of difference.

Anyway, how do you explain that musicians learn to tune their guitars by ear? Just a small example.
I would assume they do it with practice. Like when learning to play the instrument in the first place. :)

You don't need to actively train for it, but you get used to it and then you notice it very easily; that means you are actually used to hi-fi sound.
Again, I'd be more inclined to believe that reasoning if there were independent studies backing it up rather than vague handwaving and assurances along the line of "trust me on this, I'm right"... :)

I said I DO NOT HAVE the gear I would like.
Oh, well, that's too bad then. Good luck with your quest to obtain the stuff...
 
I would assume they do it with practice. Like when learning to play the instrument in the first place. :)

Also, you can hear out-of-tune instruments in recordings on CD, or mp3. Tuning is well within the range of hearing for basically everyone that doesn't have massive hearing impairment.
 
hifi can be incredibly affordable these days.
forget about expensive amps. you don't need the wattage, and you don't need a 200 euros or more budget.
modern, smart tech cost peanuts and have incredible frequency response, low distortion, perfect clarity and low noise. those are the "t-amps". they may be 2x12W or 2x30W max, but you can get loud sound out of them, down to the bass.

buy the cheapest cables.
forget about $500 CD players, other high end sources, preamps and all : a low end or midrange sound card does the job.
you now only need a pair of speakers: passive, 8-ohm things, with the sensitivity (in dB) you wish to pay for.

sooner than later you can be limited by your room - small flat or bizarre, cramped house doesn't cut it.
active monitors can be another way of getting good sound.
 
Anyway, how do you explain that musicians learn to tune their guitars by ear? Just a small example.

You don't need to actively train for it, but you get used to it and then you notice it very easily; that means you are actually used to hi-fi sound.

I'm not sure how this correlates, but perfect pitch has little to nothing to do with being used to listening to hi-fi sound.

Yes, perfect pitch is trainable. It's just training your brain to memorize frequencies with a certain level of precision, just like you memorized how red, green and blue look like.

It doesn't mean that people with perfect pitch are better at discerning hi-fi than the others, though.


I can also tell you that not all is good in having perfect pitch. Some musicians know this and avoid it on purpose, for example.


I don't believe room compensation - especially a quick-and-dirty hack implementation done with a kinect camera - does "a world" of difference.

Have you ever tried using a room compensation system?

I had an Onkyo 674 + 7.1 Wharfedale speakers with Audyssey 2EQ, and now I have a Panasonic 921 + 5.0 Jamo speakers with MCACC, both in less-than-ideal rooms. Both were calibrated using a microphone placed in a less-than-ideal hearing position.

They make lots of difference to me.
 
I don't believe room compensation - especially a quick-and-dirty hack implementation done with a kinect camera - does "a world" of difference.
you don't do it with the camera, you do it with the mics. You can get a very accurate room impulse using a source with a good frequency range, and then using cross correlation and some other math to determine a "perfect" impulse. After that, you have to determine the position of the speakers and the desired listening position. We've done some proof of concept work using the assumption that the center speaker is generally in the same general location as the camera, and then calculating the angle and distance of the other speakers from that, it's fun stuff, but not important enough for us to spend a lot of time on.
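A toy version of that cross-correlation step, with a made-up three-tap "room" (direct path plus two reflections) standing in for a real measurement. With a white-noise test signal, the autocorrelation is approximately a delta, so cross-correlating the mic signal with the test signal recovers the impulse response directly; pure Python, tiny sizes:

```python
import random

random.seed(1)
N, L = 4096, 64                     # test-signal length, impulse length (toy sizes)
x = [random.gauss(0.0, 1.0) for _ in range(N)]      # white-noise test signal
h_true = [0.0] * L
h_true[0], h_true[20], h_true[45] = 1.0, 0.5, 0.25  # direct path + two reflections

# What the microphone records: the test signal convolved with the room's response.
y = [sum(h_true[j] * x[n - j] for j in range(L) if 0 <= n - j < N)
     for n in range(N + L - 1)]

# Cross-correlate mic signal with the known test signal and normalise by the
# signal energy; for white noise, Rxy[m] / (x . x) approximates h[m].
energy = sum(v * v for v in x)
h_est = [sum(y[n + m] * x[n] for n in range(N)) / energy for m in range(L)]

# The three arrivals (and hence speaker distance, from the direct-path lag)
# stand well clear of the estimation noise:
peaks = sorted(range(L), key=lambda m: -h_est[m])[:3]
assert sorted(peaks) == [0, 20, 45]
```

Real systems use sweeps or longer noise bursts and "some other math" (regularised deconvolution) rather than raw correlation, but the principle is the same: the lag of the direct-path peak gives you distance, and the tail gives you the room.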
 
Shucks, I should have asked years ago :p Honest questions, not really aimed to bait, but more inquisitive as I know you have first hand experience working with console audio and developer issues. Thanks for the feedback you provided. :D
 
Shucks, I should have asked years ago :p Honest questions, not really aimed to bait, but more inquisitive as I know you have first hand experience working with console audio and developer issues. Thanks for the feedback you provided. :D
I don't generally interact directly with the developers. Mostly when I get it, it's been filtered through ATG, our developer support group. And even then, we usually get feedback only from the bigger developers and the middleware providers like Wwise. So it's almost double-filtered. We do interact directly with the developers sometimes, when we want to make sure a new API really solves a problem or provides real utility; it's sometimes easy to get sidetracked since we live so much further down the stack.
 
you don't do it with the camera, you do it with the mics. You can get a very accurate room impulse using a source with a good frequency range, and then using cross correlation and some other math to determine a "perfect" impulse. After that, you have to determine the position of the speakers and the desired listening position. We've done some proof of concept work using the assumption that the center speaker is generally in the same general location as the camera, and then calculating the angle and distance of the other speakers from that, it's fun stuff, but not important enough for us to spend a lot of time on.

To add the obvious (for you, but maybe not for others), IIRC Kinect has 4 microphones, 3 on the left relatively close to each other and another far to the right. This array, and particularly the distance to the fourth mic, makes Kinect more suitable for auto-calibration tasks than the vast majority of microphones used with such calibration systems (where you can sometimes be lucky if it is even stereo).
 
LA Noire used uber-low-bitrate MP3 for its music, radio and ambient sounds, but in gameplay it doesn't sound bad at all. Weird.

I just hope all next-gen games will have proper audio for every environment. Like BF3, which makes dialogue and gun sounds echo indoors, and not echo outdoors. (I think Binary Domain did this too.)
 
LA Noire used uber-low-bitrate MP3 for its music, radio and ambient sounds, but in gameplay it doesn't sound bad at all. Weird.

I just hope all next-gen games will have proper audio for every environment. Like BF3, which makes dialogue and gun sounds echo indoors, and not echo outdoors. (I think Binary Domain did this too.)

Real gunshot sounds do in fact echo outdoors, it just depends on where outside you are. Ever gone hunting before?
 