Google Cloud Platform launches text-to-speech service to compete with AWS Polly

The new service from Google Cloud Platform highlights how it is leveraging models and technology from the search giant's DeepMind subsidiary.
Written by Larry Dignan, Contributor

Google Cloud outlined Cloud Text-to-Speech, a machine learning service that uses a model from Google's DeepMind subsidiary to generate raw audio.

With the move, developers will get broader access to the technology that turns text into natural-sounding speech in Google Assistant, Search, Maps and other products.

According to Google, Cloud Text-to-Speech can be used to power call center voice response systems, enable Internet of Things devices to talk and convert text-based media into spoken formats.

Google Cloud Text-to-Speech allows customers to choose from 32 different voices in 12 languages. Customers can also adjust pitch, speaking rate, volume gain and audio format.
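For developers, those options map onto fields in a synthesis request. The following is a minimal sketch of how such a request body might be assembled, assuming the field names and value ranges of the service's v1 REST interface as commonly documented (`input`, `voice`, `audioConfig`, `speakingRate`, `volumeGainDb`). The helper `build_synthesize_request` and the example voice name are illustrative, not part of Google's SDK, and the limits should be checked against the current documentation.

```python
import json

# Assumed limits for the audioConfig fields; verify against the current docs.
SPEAKING_RATE_RANGE = (0.25, 4.0)      # 1.0 is the voice's normal speed
PITCH_RANGE = (-20.0, 20.0)            # semitones relative to the default
VOLUME_GAIN_DB_RANGE = (-96.0, 16.0)   # decibels relative to the default


def _check(name, value, bounds):
    """Raise ValueError if value falls outside the inclusive bounds."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return value


def build_synthesize_request(text, language_code="en-US", voice_name=None,
                             pitch=0.0, speaking_rate=1.0,
                             volume_gain_db=0.0, audio_encoding="MP3"):
    """Hypothetical helper: build the JSON body for a text synthesis call."""
    voice = {"languageCode": language_code}
    if voice_name is not None:
        voice["name"] = voice_name
    return {
        "input": {"text": text},
        "voice": voice,
        "audioConfig": {
            "audioEncoding": audio_encoding,
            "pitch": _check("pitch", pitch, PITCH_RANGE),
            "speakingRate": _check("speaking_rate", speaking_rate,
                                   SPEAKING_RATE_RANGE),
            "volumeGainDb": _check("volume_gain_db", volume_gain_db,
                                   VOLUME_GAIN_DB_RANGE),
        },
    }


if __name__ == "__main__":
    body = build_synthesize_request(
        "Your call is important to us.",
        voice_name="en-US-Wavenet-A",  # example voice name; availability may vary
        pitch=2.0,
        speaking_rate=1.25,
    )
    print(json.dumps(body, indent=2))
```

The sketch only builds and validates the payload. Sending it would additionally require an authenticated HTTP POST to the service's synthesize endpoint, which is omitted here.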

The primary competition for Google Cloud Text-to-Speech will be Amazon Web Services' Polly, which offers 47 voices. Polly is likewise used in call centers and applications.

The rollout of the service also highlights how Google is leveraging DeepMind technology for Google Cloud Platform. The DeepMind technology used in Cloud Text-to-Speech is called WaveNet. A year ago, WaveNet would create raw audio waveforms from scratch using a neural network trained on speech samples.

When given text, WaveNet would generate speech from scratch one sample at a time for accuracy.

But with an update, WaveNet now runs on Google Cloud's TPU infrastructure and can generate raw waveforms 1,000 times faster than before. That combination of fidelity and speed allows WaveNet to create more human-sounding audio.
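To see why sample-at-a-time generation is inherently slow, consider a toy illustration of the idea. This is not WaveNet: the "model" here is a hypothetical fixed formula standing in for a trained neural network, the 16,000 samples-per-second rate is an assumption, and real systems predict a probability distribution conditioned on the input text rather than a single deterministic value. The sketch only shows the sequential dependency, where each new sample needs the ones before it.

```python
import math

SAMPLE_RATE = 16_000   # assumed samples per second, for illustration only
TONE_HZ = 440.0        # the toy "model" can only produce this one tone
OMEGA = 2 * math.pi * TONE_HZ / SAMPLE_RATE


def toy_next_sample(history):
    """Hypothetical stand-in for a trained network.

    Predicts the next sample from the previous two using the identity
    sin(n*w) = 2*cos(w)*sin((n-1)*w) - sin((n-2)*w).
    """
    return 2 * math.cos(OMEGA) * history[-1] - history[-2]


def generate(num_samples):
    """Produce audio one sample at a time, feeding outputs back in as inputs."""
    samples = [0.0, math.sin(OMEGA)]  # two seed samples to start the loop
    while len(samples) < num_samples:
        # Each step depends on earlier outputs, so steps cannot run in parallel.
        samples.append(toy_next_sample(samples))
    return samples[:num_samples]


if __name__ == "__main__":
    one_second = generate(SAMPLE_RATE)
    print(f"{len(one_second)} sequential steps for one second of audio")
```

Under these assumptions, a single second of audio takes 16,000 dependent steps, which is the kind of bottleneck a faster generation method has to remove.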
