Move over Siri, Alexa: Google's offline voice recognition breakthrough cuts response lag

Google paves the way for Siri and Alexa working without the internet.

Written by Liam Tung, Contributing Writer. March 13, 2019 at 6:37 a.m. PT

If you're one of the few people who own a Google Pixel phone, you'll soon be able to experience voice recognition without the internet.

Google has announced the rollout of "an end-to-end, all-neural, on-device speech recognizer to power speech input in Gboard", the company's keyboard with Google Search baked in.

The technology could give Google an edge over Siri and Alexa in convincing people to talk to machines. Phones and home speakers could deliver answers faster, because on-device recognition cuts the latency that comes with sending a request to a remote server and waiting for a response.

The company has enabled on-device voice recognition by shrinking a machine-learning model so that it can run on a phone rather than handing off the job to a server in the cloud.

Google researchers detailed the on-device technique in a paper published on arXiv.org in November called 'Streaming End-to-end Speech Recognition For Mobile Devices'.

According to Google researchers, the model works at the character level: as the user speaks, it outputs each word one character at a time, much as an expert human transcriber would type.

Beyond low-latency speech recognition, Google wanted its system to exploit "on-device user context", such as the user's list of contacts, song names from music apps that the user might be referring to, and location.

To achieve the on-device intelligence, Google employed a recurrent neural network transducer (RNN-T), aided by a technique called 'connectionist temporal classification' (CTC) that is used for training neural networks. The approach gives machines a more efficient way to interpret speech.
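
As a rough illustration of the idea behind CTC, and not Google's implementation, a CTC-style network emits one label per audio frame, including a special "blank" label, and the final text is obtained by merging consecutive repeats and then dropping the blanks. The sketch below shows only that collapsing step. The blank symbol `_` and the function name `ctc_collapse` are assumptions made for this example.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen for this sketch


def ctc_collapse(frame_labels):
    """Collapse per-frame labels into text.

    Step 1: merge runs of identical consecutive labels.
    Step 2: remove the blank symbol.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != BLANK)


# One label per audio frame; the blank between the two "l" runs
# keeps the double letter from being merged into one.
print(ctc_collapse("hh_e_ll_lo__"))
```

The blank label is what lets the model represent repeated letters and frames in which no new character is spoken.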

Google explains that a speech-recognition engine would normally depend on a search graph that can be 2GB in size, which would be onerous to store on a device.

Instead, it trained a neural network that is just 450MB in size and provides the same accuracy as a client-server setup. Not content with that, the Google researchers shrank the model to just 80MB.
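
To put those figures in proportion, the short calculation below works out the reduction factors. It is a back-of-the-envelope sketch that assumes 2GB is treated as 2,000MB; the helper name `reduction_factor` is made up for this example.

```python
# Sizes reported in the article, in megabytes.
SEARCH_GRAPH_MB = 2000  # assumption: 2GB taken as 2,000MB
TRAINED_MODEL_MB = 450
SHRUNK_MODEL_MB = 80


def reduction_factor(before_mb, after_mb):
    """Return how many times smaller the 'after' size is than the 'before' size."""
    return before_mb / after_mb


print(f"search graph -> trained model: {reduction_factor(SEARCH_GRAPH_MB, TRAINED_MODEL_MB):.1f}x")
print(f"trained model -> shrunk model: {reduction_factor(TRAINED_MODEL_MB, SHRUNK_MODEL_MB):.1f}x")
print(f"search graph -> shrunk model:  {reduction_factor(SEARCH_GRAPH_MB, SHRUNK_MODEL_MB):.1f}x")
```

On these numbers, the final model is roughly 25 times smaller than the search graph it replaces.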

"Our new all-neural, on-device Gboard speech recognizer is initially being launched to all Pixel phones in American English only," Google researchers said.

"Given the trends in the industry, with the convergence of specialized hardware and algorithmic improvements, we are hopeful that the techniques presented here can soon be adopted in more languages and across broader domains of application."

Google compares the server-side speech recognizer (left) with the on-device recognizer (right) as both recognize the same spoken sentence.

Image: Akshay Kannan/Elnaz Sarbar/Google
