Zoom acquires real-time translation startup Kites GmbH

Zoom will use Kites' technology to offer multi-language translation capabilities for Zoom users.

Zoom said on Tuesday that it has signed a deal to acquire Kites, a German startup that's developed a real-time machine translation (MT) platform. Kites' team of 12 research scientists will join Zoom's engineering team as the company works to improve meeting productivity with multi-language translation capabilities for Zoom users.

"We are continuously looking for new ways to deliver happiness to our users and improve meeting productivity, and MT solutions will be key in enhancing our platform for Zoom customers across the globe," said Velchamy Sankarlingam, president of product and engineering at Zoom. "With our aligned missions to make collaboration frictionless -- regardless of language, geographic location, or other barriers -- we are confident Kites' impressive team will fit right in with Zoom."

Kites was founded in 2015 by Dr. Alex Waibel and Dr. Sebastian Stüker, faculty members of the Karlsruhe Institute of Technology. Zoom said Stüker and the rest of the team will continue to work out of Karlsruhe, Germany, while Waibel will take on a role as a Zoom Research Fellow, advising on Zoom's MT research and development.

The field of speech-to-text technology has had its share of challenges. In 2017, for instance, Google launched a highly anticipated new pair of wireless earbuds that boasted an exclusive real-time translation feature. The pitch was that Pixel Buds could recognize speech in one language, translate the words to another language on a user's phone, and then read the translated sentence aloud. 

However, early reviews of the product revealed that the technology struggled to recognize speakers' words, especially when they spoke in complicated sentences or with an accent. The underlying issue is that recognizing human speech is difficult, no matter how sophisticated the artificial intelligence.

Kites claims its technology can translate spontaneously spoken language with minimum latency and maximum accuracy. The company says that for conversational speech, its system has an error rate of about 5%, compared with a human translation error rate of about 5.5%.