Truly bilingual speakers are an asset to any business. Unfortunately, few people are truly bilingual. Microsoft Translator aims to deliver synthesised multilingual communications from a monolingual human voice.
At TechFest 2012 earlier this year, Microsoft demoed a project from Microsoft Research Asia, which aims to turn a monolingual speaker's voice into multilingual voice output using machine-based text-to-speech (TTS) synthesis.
Microsoft describes how this 'monolingual into multi-lingual' method works:
Out of a speaker’s monolingual recordings, our algorithm can render speech sentences of different languages for building mixed-coded, bilingual TTS systems. We have recordings of 26 languages which are used to build our TTS of corresponding languages. By using the new approach, we can synthesize any mixed language pair out of the 26 languages.
Frank Soong, Principal Researcher at Microsoft Research Asia, demonstrated the concept of TTS synthesis.
He used the example of a TTS system for an American driving a car in Beijing. The TTS voice is English, but it is also trained to speak Mandarin using the same English speech data. The key directions are given in English, but the landmarks and street names are spoken in Chinese.
The translator uses a reference speaker, in this case a Chinese speaker, to get the frequency, tone and modulation of the voice. The voice is then 'warped' or equalised between the reference Chinese human speaker and the English multilingual 'machine' voice.
The English-language database is then broken down into pieces of five milliseconds each. The voice pieces closest to the trajectory of the 'warped' Chinese sentence are selected. The best sequence of pieces is then calculated, and the pieces are concatenated into the final speech.
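To make the 'closest pieces, best concatenation' idea concrete, here is a minimal sketch in Python. It is not Microsoft's algorithm: it is a generic unit-selection search using dynamic programming, and the feature vectors, the Euclidean distance, the join penalty and the function names (split_into_pieces, select_pieces) are all assumptions made for illustration. Only the five-millisecond piece length comes from the demo.

```python
import math

FRAME_MS = 5  # piece length mentioned in the demo


def split_into_pieces(samples, sample_rate):
    """Cut a list of audio samples into consecutive 5 ms pieces.
    Trailing samples that do not fill a whole piece are dropped."""
    size = sample_rate * FRAME_MS // 1000
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_pieces(target, database, penalty=1.0):
    """Return one database index per target frame, minimising the distance
    to the target trajectory plus a join cost between neighbouring pieces.
    'penalty' is a hypothetical tuning knob, not a value from the demo."""
    n, m = len(target), len(database)
    if n == 0 or m == 0:
        return []

    def join_cost(i, j):
        # Pieces that were adjacent in the original recording join for free.
        if j == i + 1:
            return 0.0
        return penalty + distance(database[i], database[j])

    # cost[j]: cheapest total cost of any path ending in database piece j
    cost = [distance(target[0], database[j]) for j in range(m)]
    back = []  # back[t - 1][j]: best predecessor of piece j at frame t
    for t in range(1, n):
        new_cost, pointers = [], []
        for j in range(m):
            best_i = min(range(m), key=lambda i: cost[i] + join_cost(i, j))
            pointers.append(best_i)
            new_cost.append(
                cost[best_i] + join_cost(best_i, j) + distance(target[t], database[j])
            )
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path back from the last frame to the first.
    j = min(range(m), key=lambda k: cost[k])
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path


# Toy example: one-dimensional 'features' standing in for real voice features.
database = [[0.0], [1.0], [2.0], [3.0], [10.0]]
warped_target = [[1.1], [2.1], [2.9]]
print(select_pieces(warped_target, database))  # prints [1, 2, 3]
```

In this toy run the search picks three pieces that sit next to each other in the database, because they are both closest to the target and free to join. A real system would work on far richer acoustic features and a much larger database, and would need a faster search than this one, which compares every pair of pieces at every frame.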
At about 19 minutes 30 seconds in the video there is a cool talking head of Rick Rashid's boss, Craig Mundie, speaking in English, then Mandarin, using his own voice with the same timbre and intonation.
Although this is still a prototype, this machine-based 'Babel Fish' brings great opportunities for businesses that speak only one language. Businesses could have the opportunity to break into new markets around the globe without the overhead of human-based translation services.
Train the synthesiser once and reproduce your training video in any one of 26 languages. Apply it to your audio communications for multilingual reach.
And even if it is only applied to car navigation systems, then it is a step in the right direction -- whatever language the directions happen to be in...