Kayvan Barin
I posted an article like this on this site several years ago, but I was not able to follow the thread at the time. Now I am posting something similar.
As you know, it is currently assumed that humans perceive 3D because they have 2 eyes and 2 ears. That is why we have stereo audio systems and binocular video systems (2 channels). Nowadays, for richer content, audio is trending toward more and more channels. My claim is that a 3-channel system, based on the geometry of 3D space, would be the ideal 3D system; I explain why next. Later I will comment on why we have 2 ears and 2 eyes.
Consider a mono audio system (1 channel). The sound is distributed along a line from the speaker to you, and the volume and clarity of the sound are the only cues to where the sound source lies on that line. Now consider a stereo system (2 channels). Besides being near or far, the source can also move left and right. In other words, the sound is distributed in a plane defined by 2 axes: one from your head to the left speaker and another from your head to the right speaker. Now if we have 3 speakers fed by 3 separate signals (3 channels), and we place them in a triangular formation in front of us, the sound gains an up-and-down dimension in addition to near/far and left/right. In other words, it occupies a 3D space defined by a spatial coordinate system of 3 axes, each running from one speaker to you. I believe 3 channels is the best 3D configuration. Fewer than 3 channels would give poorer content, and more than 3 channels would give too many cues, which would be tiring.
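The idea of using the three speaker directions as the axes of a 3D coordinate system has a well-known counterpart in spatial audio: vector base amplitude panning (VBAP). VBAP places a virtual source inside a triangle of three loudspeakers by writing the source direction as a weighted sum of the three speaker direction vectors and using the weights as speaker gains. The post does not mention VBAP; it is swapped in here only to illustrate the geometry. The Python below is a minimal sketch under assumed conventions (listener at the origin, x to the right, y forward, z up); `vbap_gains` and the example speaker positions are hypothetical.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v: Vec) -> Vec:
    n = math.sqrt(dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)


def triple(a: Vec, b: Vec, c: Vec) -> float:
    # Scalar triple product = determinant of the 3x3 matrix with columns a, b, c.
    return dot(a, cross(b, c))


def vbap_gains(speakers: List[Vec], source: Vec) -> List[float]:
    """Gains for 3 speakers so their weighted directions point at `source`.

    Hypothetical helper: listener at the origin, x right, y forward, z up.
    """
    l1, l2, l3 = (normalize(s) for s in speakers)
    p = normalize(source)
    det = triple(l1, l2, l3)
    if abs(det) < 1e-12:
        # Coplanar speakers span only a plane, not 3D space.
        raise ValueError("speaker directions are coplanar")
    # Cramer's rule for p = g1*l1 + g2*l2 + g3*l3.
    g = [triple(p, l2, l3) / det,
         triple(l1, p, l3) / det,
         triple(l1, l2, p) / det]
    # Constant-power normalization: sum of squared gains equals 1.
    # A negative gain means `source` lies outside the speaker triangle.
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]
```

For example, with speakers at (-1, 1, 0), (1, 1, 0) and (0, 1, 1), a source aimed straight at the top speaker gets gains of about (0, 0, 1), while a source slightly below it is shared among all three. A negative gain signals that the requested direction lies outside the triangle; a practical panner would clip it or pick a different speaker triplet.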
Here is a simple example. When you are watching TV, a stereo sound system gives you only left and right cues relative to the display. A 3-channel audio system would also give you up and down cues relative to the display. But more than 3 channels, such as a speaker behind you, is too much: you would lose the comfort of sitting in your living room, watching and hearing through the window that is the TV.
The same would apply to 3-channel video. With 2-channel video, you have to keep your head rigid to keep the 3D cues intact. With 3-channel video, tilting your head would give you even more 3D cues.
In optical object recognition systems like Kinect (Microsoft's motion-sensing input device for its video game console), one can detect the most 3D information with 3 cameras in a triangular formation. Kinect is used in some methods to measure the 3D shape of objects, but the device has to be moved around the object to capture its shape, which is calculated from what 2 cameras see. That shape should instead be available from a single snapshot taken by 3 cameras, with fewer calculations. There may also be applications for this configuration in sound and voice recognition.
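One standard way to turn several camera views into a 3D point from a single snapshot is ray triangulation: each camera contributes a ray from its optical center toward the point, and the point is estimated as the location that minimizes the sum of squared distances to all rays. This is not how Kinect works internally and not a method from the post; it is a minimal Python sketch that assumes calibrated cameras, known ray directions, and a point already matched across the images (none of which is shown). `triangulate` and the camera positions are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def det3(m: Sequence[Sequence[float]]) -> float:
    # Determinant of a 3x3 matrix given as rows.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a: List[List[float]], b: List[float]) -> Vec:
    # Solve a x = b with Cramer's rule.
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("rays do not determine a unique point")
    x = []
    for k in range(3):
        ak = [row[:] for row in a]
        for i in range(3):
            ak[i][k] = b[i]
        x.append(det3(ak) / d)
    return (x[0], x[1], x[2])


def triangulate(centers: List[Vec], directions: List[Vec]) -> Vec:
    """Least-squares point closest to rays c_i + t * d_i (hypothetical helper).

    Solves sum_i (I - u_i u_i^T) x = sum_i (I - u_i u_i^T) c_i,
    where u_i is the unit direction of ray i.
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for c, d in zip(centers, directions):
        n = math.sqrt(sum(v * v for v in d))
        u = [v / n for v in d]
        for i in range(3):
            for j in range(3):
                # Entry (i, j) of the projector onto the plane normal to u.
                m = (1.0 if i == j else 0.0) - u[i] * u[j]
                a[i][j] += m
                b[i] += m * c[j]
    return solve3(a, b)
```

Each camera adds one more term to the sums, so a third camera in the triangle adds redundancy to the estimate. With a single camera the system is singular and the depth along the ray cannot be recovered, which the sketch reports as a `ValueError`.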
About having 2 ears and 2 eyes: please remember that our heads are almost always in motion. If our heads were absolutely stationary, I believe we would need 3 ears and 3 eyes to hear and see in 3D. But since our heads are always moving, and we have a complex brain that calculates 3D coordinates from moving eyes and ears, 2 ears and 2 eyes are enough. Maybe 2 ears and 2 eyes are all that evolution could muster so far.
I do not think we would be able to manage this form of 3D (3 channels) with headphones and glasses as we do now. Some modification would be necessary. Perhaps 3 stationary speakers and a display that emits light in 3 directions would be needed (a 3D video system without glasses).
Thanks for your attention,
Kayvan Barin