Google Cloud updates AI-powered speech tools for enterprises

Google's Speech-to-Text and Text-to-Speech products are getting more voices, more languages and lower prices.

Written by Stephanie Condon, Senior Writer Feb. 21, 2019 at 10:52 a.m. PT

Google Cloud on Thursday announced it's updating its Text-to-Speech products with more voices and more languages. Google has also improved the quality of its Speech-to-Text transcription tools and is bringing some of their features into general availability. The updates should help developers build intelligent voice applications that can reach millions more people and function more effectively.

For Text-to-Speech, Google has roughly doubled the number of voices available since its last update in August. It has added support for seven new languages or variants, all in beta: Danish, Portuguese (Portugal), Russian, Polish, Slovak, Ukrainian and Norwegian Bokmål. The product now supports a total of 21 languages.

Across those new languages, Google has added 31 new WaveNet voices and 24 new standard voices. Google says it now supports a total of 106 voices.

WaveNet is a deep neural network for generating raw audio, which creates voices that are more natural-sounding than standard text-to-speech voices. The technology was created by DeepMind, the AI company Google acquired in 2014.

"Thanks to unique access to WaveNet technology powered by Google Cloud TPUs, we can build new voices and languages faster and easier than is typical in the industry," Google product manager Dan Aharon said in a blog post.

Google's primary competition for Text-to-Speech services is Amazon Web Services' Polly, which, according to its website, currently offers 58 voices.

In addition to adding new voices, Google is making its Text-to-Speech Device Profiles feature generally available. This lets customers optimize audio playback for different types of hardware, such as headphones for media applications like podcasts.
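
As a rough illustration of how a device profile might be requested, the sketch below builds the JSON body for the Text-to-Speech REST method `text:synthesize`. It assumes the `effectsProfileId` field and the `headphone-class-device` profile name as described in Google's public documentation. The helper name `build_synthesis_request`, the sample text and the voice name are placeholders, and the request is only constructed, not sent.

```python
import json


def build_synthesis_request(text, profile="headphone-class-device"):
    """Build a JSON-serializable body for a text:synthesize call.

    Hypothetical helper for illustration; nothing is sent over the network.
    """
    return {
        "input": {"text": text},
        # Placeholder voice name; available names vary by language.
        "voice": {"languageCode": "en-US", "name": "en-US-Wavenet-D"},
        "audioConfig": {
            "audioEncoding": "MP3",
            # Device profile applied to the synthesized audio (assumed name).
            "effectsProfileId": [profile],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_synthesis_request("Welcome to the show."), indent=2))
```

Swapping the `profile` argument for another profile name would, under the same assumptions, target a different class of playback hardware.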

Meanwhile, for Speech-to-Text, Google is bringing into general availability premium models for video and enhanced phone, which were rolled out in beta last year. The video model, which is based on technology similar to what YouTube uses for automatic captioning, now has 64 percent fewer transcription errors, Google announced. The enhanced phone model now has 62 percent fewer errors.
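
For readers wondering how a premium model is selected, the sketch below builds a recognition config of the kind the Speech-to-Text REST API accepts. It assumes the `model` and `useEnhanced` fields and the model names `video` and `phone_call` from Google's public documentation. The helper `build_recognition_config` and its `kind` labels are hypothetical, and the language code is a placeholder.

```python
def build_recognition_config(kind, language_code="en-US"):
    """Return a config dict selecting one of the premium models.

    `kind` is a label made up for this sketch: "video" or "enhanced_phone".
    """
    if kind == "video":
        # Premium video model (assumed model name).
        return {"languageCode": language_code, "model": "video"}
    if kind == "enhanced_phone":
        # Enhanced variant of the phone-call model (assumed field names).
        return {
            "languageCode": language_code,
            "model": "phone_call",
            "useEnhanced": True,
        }
    raise ValueError(f"unknown model kind: {kind!r}")


if __name__ == "__main__":
    print(build_recognition_config("enhanced_phone"))
```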

Google was able to improve the models by requiring customers who used the premium services to share usage data via data logging. Starting now, customers can use the enhanced phone model without opting into data sharing, while those who opt in will pay a lower rate. Prices are also lower for all premium video model customers, and those who opt into data sharing will get an additional discount.

Google is also announcing the general availability of multi-channel recognition, which helps the Speech-to-Text API distinguish between multiple audio channels. This is useful in scenarios involving multiple people, such as meeting analytics.
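
A minimal sketch of what multi-channel recognition could look like from a developer's side follows. It assumes the `audioChannelCount` and `enableSeparateRecognitionPerChannel` config fields and the per-result `channelTag` field from Google's public REST documentation. The helper names and the sample response are invented for illustration and are not real API output.

```python
from collections import defaultdict


def build_multichannel_config(channels=2, language_code="en-US"):
    """Config dict asking for each audio channel to be recognized separately."""
    return {
        "languageCode": language_code,
        "audioChannelCount": channels,
        "enableSeparateRecognitionPerChannel": True,
    }


def transcripts_by_channel(response):
    """Group the top transcript of each result by its channel tag."""
    grouped = defaultdict(list)
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            tag = result.get("channelTag", 0)
            grouped[tag].append(alternatives[0]["transcript"])
    return dict(grouped)


if __name__ == "__main__":
    # Made-up response shaped like the assumed REST output.
    sample = {
        "results": [
            {"channelTag": 1, "alternatives": [{"transcript": "Shall we start?"}]},
            {"channelTag": 2, "alternatives": [{"transcript": "Yes, go ahead."}]},
        ]
    }
    print(transcripts_by_channel(sample))
```

Grouping by channel in this way is one simple route to the per-speaker view that meeting analytics would need, provided each participant is recorded on a separate channel.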
