In a world where chip technology improves exponentially, acoustic engineering isn't so simple, and it remains a huge hurdle for telephony and video conferencing. Fraunhofer IIS (inventors of MPEG-1 Layer 3, AKA MP3) seeks to tackle this challenge and showed off some of its research and upcoming products at VON.x 2008.
The first demonstration I saw was echo cancellation technology that prevents sound coming out of a speaker from re-entering the microphone. Echo is one of the most annoying things about PC telephony applications like Skype, since Skype lacks good echo cancellation technology. [Update 3/20/2008 - Recent versions of Skype now have very good echo cancellation on Windows, Mac, and Linux. I still experienced some problems because the clients on the other end were using older versions of Skype.] When you connect to someone who is using a speaker and microphone, you can often hear yourself talking a split second after you speak, and it's incredibly annoying. With Fraunhofer's echo cancellation technology, that problem virtually disappeared.
Now, I'm fully aware that Skype is a free application, but that hasn't stopped open source solutions like Asterisk from offering licensed technology; a user pays $10 for the G.729 codec, for example. I'd gladly pay a little money for some good echo cancellation software.
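To give a feel for what an echo canceller does, here is a minimal sketch of a textbook adaptive filter (normalized LMS, or NLMS). This is not Fraunhofer's or Skype's method, which I have no details on; it only illustrates the general idea of learning how the speaker signal leaks into the microphone and subtracting that estimate. The function names, the toy echo path, and the parameter values are all my own assumptions.

```python
import random

def nlms_echo_cancel(far_end, mic, taps=16, mu=0.5, eps=1e-6):
    """Subtract an adaptively estimated echo of far_end from mic (generic NLMS)."""
    weights = [0.0] * taps   # running estimate of the speaker-to-mic echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    cleaned = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate            # what remains after removing the estimate
        power = sum(h * h for h in history) + eps
        step = mu * error / power            # step size normalized by input power
        weights = [w + step * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned

def simulate_echo(far_end, path):
    """Toy room: the mic hears the far-end signal filtered by a short echo path."""
    return [
        sum(path[k] * far_end[n - k] for k in range(len(path)) if n - k >= 0)
        for n in range(len(far_end))
    ]

def energy(samples):
    return sum(s * s for s in samples)

rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(4000)]  # noise stands in for speech
echo_path = [0.0, 0.0, 0.6, 0.0, -0.3, 0.1]              # hypothetical delays and gains
mic = simulate_echo(far_end, echo_path)                  # echo only, no local talker
cleaned = nlms_echo_cancel(far_end, mic)

# Fraction of echo energy left over the last 1000 samples, after the filter has adapted
print(energy(cleaned[-1000:]) / energy(mic[-1000:]))
```

In this toy setup the leftover echo shrinks to a tiny fraction once the filter has adapted. A real product also has to cope with both parties talking at once, much longer room echoes, and background noise, none of which this sketch attempts.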
The other cool demonstration was the discrete multichannel sound separation technology. Normally, when multiple microphones in a room are mixed down and transmitted over a single audio channel, the sound is blurred. But when the sound from each microphone and each person is sent in its own channel and played back from its own speaker, you can clearly hear each person even when several speak at the same time. The downside, of course, is that each audio channel uses a separate 64 Kbps stream, but that may not be a problem since it's dwarfed by the video stream.
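The bandwidth cost is simple arithmetic, sketched below. The 64 Kbps per channel comes from the demo; the channel count of 5 and the 2000 Kbps video stream are my own illustrative assumptions, not figures from Fraunhofer.

```python
def total_audio_kbps(channels, per_channel_kbps=64):
    """Total audio bandwidth when every talker gets a dedicated channel."""
    return channels * per_channel_kbps

def audio_share(channels, video_kbps, per_channel_kbps=64):
    """Fraction of the combined audio plus video bandwidth taken by audio."""
    audio = total_audio_kbps(channels, per_channel_kbps)
    return audio / (audio + video_kbps)

# Hypothetical call: 5 discrete audio channels next to a 2000 Kbps video stream
print(total_audio_kbps(5))                         # 320
print(round(audio_share(5, video_kbps=2000), 3))   # 0.138
```

Under those assumptions the audio takes roughly 14% of the total, so whether it is truly "dwarfed" depends on how much bandwidth the video gets.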
If you can't spare the bandwidth and only want to use a single 64 Kbps audio stream, Fraunhofer has another technology that can separate each person onto their own channel by marking the stream with a few identifiers. Once that's done, each person can be moved from one sector to another in a graphical interface, shown below, so that their voice comes out of the corresponding speaker. While it wasn't as pure as the discrete channel solution, it sounded almost as good because each person's voice had its own dedicated speaker. Just the act of using a physically different speaker cone per voice seems to have a huge impact on quality.
While this technology demo used 5 speakers, there's no reason it can't be made to work with the more typical stereo speaker setup. I'd love to see audio conferencing bridges incorporate this technology so that multiple sound sources are marked for, and played back from, separate speakers.
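As a rough idea of how separated voices might be placed across two speakers, here is a minimal sketch using constant-power panning, a standard audio mixing technique. It assumes the voices have already been separated into one mono signal per talker, which is the hard part Fraunhofer's technology handles, and every function name and position here is hypothetical rather than anything shown in the demo.

```python
import math

def pan_gains(position):
    """Constant-power (left, right) gains for a position from -1.0 (left) to +1.0 (right)."""
    angle = (position + 1.0) * math.pi / 4.0   # map the position onto 0..pi/2
    return math.cos(angle), math.sin(angle)

def spread_positions(count):
    """Space talkers evenly from hard left to hard right."""
    if count == 1:
        return [0.0]
    return [-1.0 + 2.0 * i / (count - 1) for i in range(count)]

def mix_to_stereo(talkers):
    """Mix equal-length mono sample lists (one per talker) into left and right channels."""
    length = len(talkers[0])
    left = [0.0] * length
    right = [0.0] * length
    for samples, position in zip(talkers, spread_positions(len(talkers))):
        gain_left, gain_right = pan_gains(position)
        for n, sample in enumerate(samples):
            left[n] += gain_left * sample
            right[n] += gain_right * sample
    return left, right

# Three hypothetical talkers, each represented by a short constant signal
talkers = [[1.0, 1.0], [0.5, 0.5], [0.25, 0.25]]
left, right = mix_to_stereo(talkers)
print(left, right)
```

With two speakers the voices share cones rather than each getting its own, so I wouldn't expect the separation to be as clean as the 5-speaker demo, but each talker would at least appear to come from a distinct direction.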