Google Cloud Platform launches text-to-speech service to compete with AWS Polly

The new service from Google Cloud Platform highlights how it is leveraging models and technology from the search giant's DeepMind subsidiary.

Written by Larry Dignan, Contributor March 27, 2018 at 8:00 a.m. PT

Google Cloud outlined Cloud Text-to-Speech, a machine learning service that uses a model from Google's DeepMind subsidiary to analyze raw audio.

With the move, developers will get broader access to the natural-sounding text-to-speech technology used in Google Assistant, Search, Maps and other Google products.

According to Google, Cloud Text-to-Speech can be used to power call center voice response systems, enable Internet of Things devices to talk, and convert text-based media into spoken formats.

Google Cloud Text-to-Speech allows customers to choose from 32 different voices in 12 languages. You can also customize for pitch, speaking rate, volume gain and format.
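
To make those options concrete, here is a minimal sketch of how a request body with a voice and audio settings might be assembled. The helper `build_synthesis_request` is hypothetical, and the field names (`languageCode`, `speakingRate`, `volumeGainDb`, `audioEncoding`) are an assumption based on the service's public REST interface; check them against Google's current documentation before relying on them. The sketch only builds JSON and makes no network call.

```python
import json


def build_synthesis_request(text, language_code="en-US", voice_name=None,
                            pitch=0.0, speaking_rate=1.0,
                            volume_gain_db=0.0, audio_encoding="MP3"):
    """Build a JSON-serializable body for a speech synthesis request.

    Hypothetical helper. Field names are assumed from the public REST API
    and are not taken from the article.
    """
    if not text:
        raise ValueError("text must not be empty")

    # Voice selection: a language code, plus an optional specific voice name.
    voice = {"languageCode": language_code}
    if voice_name is not None:
        voice["name"] = voice_name

    return {
        "input": {"text": text},
        "voice": voice,
        # The four customizations mentioned above: pitch, speaking rate,
        # volume gain and output format.
        "audioConfig": {
            "audioEncoding": audio_encoding,
            "pitch": pitch,
            "speakingRate": speaking_rate,
            "volumeGainDb": volume_gain_db,
        },
    }


if __name__ == "__main__":
    body = build_synthesis_request(
        "Your call is important to us.",
        speaking_rate=1.25,
        pitch=-2.0,
    )
    print(json.dumps(body, indent=2))
```

A developer would then send a body like this to the service with credentials attached; that step is left out here because it depends on account setup.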

The primary competition for Google Cloud Text-to-Speech will be Amazon Web Services' Polly, which offers 47 voices. Polly is also used in call centers and applications.

The rollout of the service also highlights how Google is leveraging DeepMind technology for Google Cloud Platform. The DeepMind technology used in Cloud Text-to-Speech is called WaveNet. A year ago, WaveNet created raw audio waveforms from scratch using a neural network trained on speech samples.

When given text, WaveNet would generate speech from scratch one sample at a time for accuracy.
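
The "one sample at a time" idea can be illustrated with a toy loop in which each new audio sample is computed from the samples that came before it. This is only a sketch of sample-by-sample generation, not WaveNet: the neural network is replaced here by a fixed two-tap linear rule that happens to produce a sine tone, and the sample rate, tone frequency and function names are all illustrative assumptions.

```python
import math

SAMPLE_RATE = 16000   # samples per second (illustrative assumption)
TONE_HZ = 440.0       # frequency of the toy tone (illustrative assumption)
W = 2 * math.pi * TONE_HZ / SAMPLE_RATE


def toy_predictor(history):
    """Stand-in for the neural network: predict the next sample from the
    previous two using x[n] = 2*cos(W)*x[n-1] - x[n-2], which yields a sine."""
    return 2 * math.cos(W) * history[-1] - history[-2]


def generate_autoregressive(num_samples, predict_next, seed):
    """Generate audio one sample at a time, feeding each output back in."""
    samples = list(seed)
    while len(samples) < num_samples:
        samples.append(predict_next(samples))
    return samples[:num_samples]


if __name__ == "__main__":
    # Seed with the first two samples of sin(W * n).
    seed = [0.0, math.sin(W)]
    audio = generate_autoregressive(SAMPLE_RATE, toy_predictor, seed)
    print(len(audio), "samples generated for one second of audio")
```

Because every sample waits on the ones before it, one second of audio at this sample rate takes 16,000 sequential steps, which suggests why generation speed matters for an approach like this.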

But with an update, WaveNet now runs on Google Cloud's TPU infrastructure and can generate raw waveforms 1,000 times faster than before. The combination of fidelity and speed lets WaveNet produce more human-sounding audio.
