Hearing and Sample Rates
Let’s look at some of the other factors concerning higher sample rates. The most obvious one is the debated question of whether sound above the commonly quoted 20 kHz limit of human auditory perception is significant. There are tests showing that some younger people can hear sine waves in air up to 24 kHz if reproduced loudly enough, which alone suggests a need for higher sampling rates, at least for some people. Other tests show that bone conduction provides perception out to 90 kHz, although the sound is often heard as being between 8 and 16 kHz. This suggests to some that a distortion process is at work, and to others that human hearing has not been sufficiently studied to conclude that ultrasonic sounds are irrelevant. Some also point out that we are concerned with sound in air, not direct bone conduction, as sound in air is how we listen to music. There have also been studies of the possibility of sensing ultrasonic frequencies with our skin. Certainly, ultrasonics at high levels can cause dizziness and nausea, and thus there are workplace limits on how high ultrasonic sound levels can legally be.
There has also been plenty of anecdotal evidence, and some tests that are only semi-scientific but compelling nonetheless. Rupert Neve does a test in which he changes sine waves to square waves with high fundamentals, and people can hear the difference when in theory they should not be able to, as the only difference is in harmonics that lie above the commonly accepted audible range. He also tells a story of Geoff Emerick correctly pointing out a couple of improperly terminated channels just by listening to the console output, when the differences were only a few dB down at around 50 kHz. In both cases, there may be other distortions at work that explain the differences heard, but the results remain interesting.
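To see why the sine versus square wave test is striking, it helps to list where a square wave's energy actually falls. The following is a minimal sketch, assuming an ideal square wave (odd harmonics only, each at 1/n the level of the fundamental) and an 8 kHz fundamental chosen purely for illustration; the account above says only that the fundamentals were high. The function names are hypothetical.

```python
def square_wave_harmonics(fundamental_hz, max_hz):
    """List (frequency_hz, relative_amplitude) for an ideal square wave.

    An ideal square wave contains only odd harmonics (1, 3, 5, ...),
    and harmonic n has 1/n the amplitude of the fundamental.
    """
    harmonics = []
    n = 1
    while n * fundamental_hz <= max_hz:
        harmonics.append((n * fundamental_hz, 1.0 / n))
        n += 2  # skip even harmonics, which are absent
    return harmonics


def split_by_audibility(harmonics, audible_limit_hz=20_000):
    """Separate harmonics at or below the assumed audible limit from those above it."""
    audible = [h for h in harmonics if h[0] <= audible_limit_hz]
    ultrasonic = [h for h in harmonics if h[0] > audible_limit_hz]
    return audible, ultrasonic


if __name__ == "__main__":
    # Assumed example: 8 kHz fundamental, harmonics listed up to 50 kHz.
    audible, ultrasonic = split_by_audibility(square_wave_harmonics(8_000, 50_000))
    print("audible:", audible)
    print("ultrasonic:", ultrasonic)
```

With these assumed values, the only component below 20 kHz is the fundamental itself, so everything that distinguishes the square wave from the sine sits in the ultrasonic range.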
It has also been pointed out that a trumpet with a Harmon mute has a harmonic near 50 kHz that is close in amplitude to the fundamental, so the argument that the upper harmonics are too low in level to matter is not entirely accurate. So it seems that there is sometimes significant energy above 20 kHz, and ultrasonics may in some way be perceptible to humans, or possibly have some effect on what’s in the audible band. The jury may still be out, but it seems reasonable to make some effort to leave a margin of safety in our chosen sample rate. With these things in mind, many people adopt a “better safe than sorry” attitude and shoot for the sky where sample rates are concerned. One problem with this approach is that storage space and the available transfer rate from a storage medium have practical limits, which reduces the number of audio channels or the amount of other related data (pictures, video, text, etc.) that can be included, and more DSP power (thus more money) must be spent dealing with these large data requirements. It makes little sense to waste available resources for no reason.
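The storage and transfer cost is easy to put numbers on. Here is a rough sketch for uncompressed PCM, where the data rate is simply sample rate times bit depth times channel count. The bit depths and channel counts below are assumptions chosen for illustration, and the function names are hypothetical.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Raw data rate of uncompressed PCM audio, in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels // 8


def megabytes_per_minute(sample_rate_hz, bits_per_sample, channels):
    """Storage needed for one minute of audio, in megabytes (10**6 bytes)."""
    return pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels) * 60 / 1_000_000


if __name__ == "__main__":
    # Assumed formats for comparison: (sample rate in Hz, bits, channels).
    formats = [(44_100, 16, 2), (48_000, 24, 2), (96_000, 24, 2), (192_000, 24, 2)]
    for rate, bits, channels in formats:
        print(rate, bits, channels,
              pcm_bytes_per_second(rate, bits, channels), "bytes/s,",
              round(megabytes_per_minute(rate, bits, channels), 2), "MB/min")
```

Because the relationship is linear, doubling the sample rate doubles both the storage and the transfer rate needed per channel, which is the trade-off against channel count described above.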
Even if you discount the contested evidence on human perception of ultrasonic frequencies, ensuring coverage of the entire population still means covering a 24 kHz bandwidth according to the studies, plus leaving room for gentler filter slopes and a bit of space to ensure that the filters won’t have audible artifacts due to ripple. At the very least, you still need a sample rate of 60 to 64 kHz according to most studies and industry task groups. Interestingly, the committee on sample rate in the 1970s had suggested a 60 kHz sample rate, but for practical reasons having to do with the technology available at the time, the 44.1 and 48 kHz rates were settled upon. The last advantage of a higher sample rate, mentioned in the second installment of this series, is that you gain flexibility with noise-shaped dither: you can put more dither energy higher in the spectrum, thus improving low-level detail in the critical bands.
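The arithmetic behind the 60 to 64 kHz figure can be sketched as follows. This is a simplified model, assuming the filter must pass everything up to the audio bandwidth and reach full attenuation by the Nyquist frequency (half the sample rate); real converter designs may place the stopband differently. The function names are hypothetical.

```python
def transition_band_hz(sample_rate_hz, passband_hz):
    """Room left for the filter slope between the passband edge and Nyquist."""
    return sample_rate_hz / 2 - passband_hz


def minimum_sample_rate_hz(passband_hz, transition_hz):
    """Lowest sample rate that fits the passband plus a given transition band."""
    return 2 * (passband_hz + transition_hz)


if __name__ == "__main__":
    # Passband edges are assumptions: 20 kHz for the conventional limit,
    # 24 kHz for the wider bandwidth discussed above.
    for rate, passband in [(44_100, 20_000), (48_000, 20_000),
                           (60_000, 24_000), (64_000, 24_000)]:
        print(rate, "Hz with", passband, "Hz passband leaves",
              transition_band_hz(rate, passband), "Hz for the filter slope")
```

Under this model, 44.1 kHz leaves only about 2 kHz for the filter to roll off above a 20 kHz passband, while 60 and 64 kHz leave 6 and 8 kHz above a 24 kHz passband, which is the room for gentler slopes that the argument relies on.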