Google Cloud on Tuesday announced the general availability of its Cloud Text-to-Speech API, which lets developers add natural-sounding speech to their devices or applications. The API also now offers a feature that optimizes the speech for specific kinds of playback hardware. Google has also added several new WaveNet voices to the API, bringing natural-sounding speech to more languages and a wider variety of voices.
Google first announced Text-to-Speech in March, illustrating how the company has been able to leverage technology from its acquisition of DeepMind. The AI company created WaveNet, a deep neural network for generating raw audio. WaveNet voices are more natural-sounding than standard text-to-speech voices.
At its debut, Text-to-Speech offered WaveNet voices only in US English. With the additions, the API now supports 26 WaveNet voices across US English, UK English, Australian English, French, German, Dutch, Italian, Korean and Japanese.
Including Standard voices, the Text-to-Speech API offers a total of 56 voices in 14 languages and variants.
Google's Text-to-Speech API competes with Amazon Web Services' Polly, which lists 54 available voices.
Meanwhile, the Google API also now offers a beta version of Audio Profiles, a feature that helps developers optimize synthesized voices for specific playback hardware, such as phone lines, headphones or speakers. With this feature, Text-to-Speech shifts the audio into the frequency range the target hardware can reproduce. Phone lines, for instance, are bandwidth-limited to exclude bass and treble frequencies.
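For a sense of how a developer might select an Audio Profile, here is a minimal Python sketch that builds the JSON body for a synthesis request. It makes no network call. The `effectsProfileId` field, the profile identifiers and the `en-US-Wavenet-D` voice name are assumptions drawn from Google's public API documentation rather than from the announcement, and should be checked against the current reference. The helper `build_synthesize_request` and the `AUDIO_PROFILES` table are hypothetical names used only for illustration.

```python
import json

# Assumed mapping from playback hardware to Audio Profile identifiers.
# Verify these identifiers against Google's current documentation.
AUDIO_PROFILES = {
    "phone": "telephony-class-application",
    "headphones": "headphone-class-device",
    "speaker": "small-bluetooth-speaker-class-device",
}


def build_synthesize_request(text, hardware, voice_name="en-US-Wavenet-D"):
    """Return a JSON-serializable body for a text:synthesize request."""
    if hardware not in AUDIO_PROFILES:
        raise ValueError(f"unknown hardware type: {hardware!r}")
    return {
        "input": {"text": text},
        # The language code is fixed to US English to match the default voice.
        "voice": {"languageCode": "en-US", "name": voice_name},
        "audioConfig": {
            "audioEncoding": "LINEAR16",
            # The profile is passed as a list of identifiers.
            "effectsProfileId": [AUDIO_PROFILES[hardware]],
        },
    }


if __name__ == "__main__":
    body = build_synthesize_request("Your order has shipped.", "phone")
    print(json.dumps(body, indent=2))
```

A developer would send this body, with credentials, to the API's `text:synthesize` endpoint. Changing the second argument to `"headphones"` or `"speaker"` swaps the profile without touching the text or the voice.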