Yeah, it's interesting to think about this, especially in relation to spatial audio, spatial location, and spatial recognition.
In real life...
- If there is one sound source it's pretty easy to identify where it is and in good detail. For example, a person talking to you in a room.
- If there are two sound sources, you can still easily locate them, but now your brain will filter out one or the other to an extent if you focus on one or the other.
- If there are many sound sources in a room, say 30 people at a gathering, it becomes almost impossible to reliably locate any one sound source (person speaking).
- Effectively, at this point, additional sound sources just become "noise."
- This goes for anything. Vehicles, gunshots, animals, footsteps, etc.
Think of the rain example. It's cool to model each raindrop. But you won't be able to spatially locate any of the raindrops. Is it a good use of resources then to model each raindrop? It's still cool, but no, it's probably not a good use of resources.
Is there a better way to simulate how enveloping the sound of rain is such that it's omnipresent around you but you can't spatially locate any of it? I don't know.
At what point does having multiple distinct audio sources go from something identifiable to something that is simply processed out as "noise" by a person's brain? After all, this filtering presumably evolved so the brain can locate the sounds it identifies as important while shoving anything unimportant into background noise.
Thinking about it, 32 sound sources is reasonable, but it'd be nice to have more. OTOH, over a thousand sound sources doesn't really accomplish anything that likely couldn't be done as well with far fewer sound sources combined with audio shaping.
At a gathering with ~30 people, it's already just a sea of "noise," with the only identifiable sounds (voices) being people in my immediate proximity that I'm looking at.
Likewise, out at the ranch. One cow or horse running around is easily identified and located. 10 horses running around? Very difficult to try to identify the sound from one horse's set of hooves. 20 horses? It's just a rolling bit of noise, almost like thunder.
But those are all examples with sounds that are very similar. What about sounds that are different? How many different sounds can you have and still have them be distinct, identifiable, and locatable? No idea.
With a music band, you can identify and likely locate each player and instrument (assuming it's not being amplified through speakers).
With a full symphony orchestra? I can identify groups of instruments and the general area of a group of instruments but there's no hope of identifying a single instrument or musician unless they are doing something different.
OK, so an instrument/musician playing differently in the orchestra is still identifiable and locatable? So what if we had everyone in the orchestra playing a bit differently than everyone else? We're back to not being able to locate any specific sound source, or to the whole thing coming across as noise, because there are too many differing sources of sound that are all distinct from each other.
So, today I sat near a major thoroughfare (road) in my city. If there are a lot of cars passing by (tens), giving each vehicle its own sound source would sound essentially identical to having just one sound source representing the volume of vehicles passing by.
There are exceptions, of course. A dump truck going by was identifiable through the sea of vehicle noise. A car honk stood out. So in this case, modeling with a generic traffic sound source and then individual sources for outstanding sources of sound would work just as well as using individual sources for each object.
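To make the traffic idea a bit more concrete, here's a rough Python sketch of how I imagine it could work. This is only an illustration, not how any actual engine or console does it: the `Source` class, `split_sources`, the `max_voices` budget, and all the dB numbers are made up for the example. The one thing I'm relying on is that uncorrelated sources add by power, so the leftover cars can be folded into a single "bed" at their combined level.

```python
import math
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    level_db: float  # level at the listener in dB (illustrative numbers only)


def combined_level_db(levels_db):
    """Combined level of several uncorrelated sources, summed by power."""
    if not levels_db:
        return float("-inf")  # no sources means silence
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


def split_sources(sources, max_voices):
    """Keep the loudest sources as individual voices; merge the rest into one bed.

    Returns (voices, bed_db). Hypothetical helper, assumed for this sketch.
    """
    ranked = sorted(sources, key=lambda s: s.level_db, reverse=True)
    voices = ranked[:max_voices]
    bed_db = combined_level_db([s.level_db for s in ranked[max_voices:]])
    return voices, bed_db


# Ten ordinary cars plus the two things that actually stood out to me.
traffic = [Source(f"car{i}", 60.0) for i in range(10)]
traffic += [Source("dump truck", 80.0), Source("car horn", 85.0)]

voices, bed_db = split_sources(traffic, max_voices=2)
print([v.name for v in voices], round(bed_db, 1))
```

With these made-up numbers, the horn and the dump truck stay as individually positioned sources, and the ten 60 dB cars collapse into one traffic bed at about 70 dB. A real system would presumably rank on more than raw level (distance, how different the sound is, what the player is looking at), but loudness alone is enough to show the split.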
All of this just to say that more than 32 would be good, but 1000 is more than needed, IMO.
I'm certainly interested in hearing what a game sounds like if a developer attempts to implement hundreds of simultaneous sources of sound. I don't think it would noticeably stand out versus something using, say, 50-100 simultaneous sources of sound.
What I would prefer WRT hardware-accelerated sound in the next generation of consoles isn't MORE sounds, but better audio modeling and processing. That includes spatial location, occlusion, reflection, Doppler, reverberation, material modeling, etc.
Regards,
SB