Alice, the making of: Behind the scenes with the new AI assistant from Yandex

Did you ever wonder what it's like to build an AI personal assistant, or to bridge the language gap? Hint: There's big data and machine learning involved.
Written by George Anadiotis, Contributor


Today, another AI assistant is joining the party with Alexa, Google Assistant, Siri, Viv, and the gang. Her name is Alice, and she comes from Russia. Yandex, the Russian internet giant, has big plans for the future and Alice is a key part of those.


Recently, Yandex celebrated its 20 years in Moscow, and the celebration was an opportunity to visit Yandex HQ, converse with some of its top minds, and get the lowdown on what's cooking and how things work behind the scenes.

Like most high-tech vendors these days, Yandex is using tons of data and advanced machine learning (ML) to develop its products and services. In this case, there is the additional twist of locality and language that Yandex has to cater for, and looking at how Google and Microsoft alumni do it at Yandex illuminates the state of the art.


The AI assistant party just got a new entry: Alice, from Russia's Yandex. (Image: Yandex)

World, meet Alice

When ZDNet visited Moscow in late September, it was not just another day in the office for the nearly 3,000 people working in the massive Yandex HQ. It's not every day that a company celebrates 20 years, and Yandex is alive and kicking, dominating in a market that includes Russia and its peripheral countries.

Those were the days of the final sprint for Alice's release, too, but Yandex people were feeling confident enough about her already to showcase her to a special guest: the Russian President Vladimir Putin. Admittedly, that is more than the usual stress of releasing new products, but it all worked out in the end.

If anything, as Misha Bilenko jokingly commented during our chat the next day, you don't want to miss a product release deadline after you've promised it to Putin. Bilenko joined Yandex as head of Machine Intelligence and Research (MIR) after a long stint at Microsoft, and has been heavily involved in the making of Alice, among other things.

That speaks volumes on the building blocks used to create Alice, but Bilenko was definitely not the only one involved. Currently, Alice integrates Yandex services such as Search, News, Weather, Music, and Maps. Alice is available in the Yandex Search app on iOS and Android. There is a beta version for Windows, and Yandex Browser and other Yandex products will soon follow.

Denis Filippov, head of Yandex Speech Technologies, said that Alice provides advanced digital functionality to accomplish tasks with a single tool by centralizing a number of market-leading products. Filippov is in charge of SpeechKit, Yandex's proprietary speech recognition toolkit that Alice's voice recognition and synthesis rely on.

Filippov, however, points out that Alice is built on a stack of search, speech, and dialogue technologies: Voice activation, speech recognition, text-to-speech, natural language understanding, entity recognition, dialog management, contextual support, search, object answers, and others.

Most of these technologies are based on deep learning (DL), so a vast amount of training data is needed to train them to superior quality. For Filippov, however, this is not a problem: "We have a great source of data since Yandex has the most popular search, geo, and taxi services in Russia and dozens other mobile apps. We also use Yandex Toloka, our crowdsourcing platform, to collect training data."

As for the future? Ultimately, Yandex wants Alice to become a basic platform to organize interaction between people and devices on all possible surfaces such as smartphones, desktops, smart homes, cars, and any others, Filippov said. Sounds Google-ish. But what about voice data retention and processing -- will that be Google-ish, too?

Filippov said that voice requests to Alice are processed by Yandex servers in the cloud: "We retain some of them to widen our training set data to provide our users with better speech recognition quality. It is crucial for us to provide the highest level of privacy to our users, so we retain completely anonymous voice data without any associations with users' accounts."

Filippov added that Alice works as a part of other Yandex apps and can't be exploited to control any other smartphone features, in response to concerns about recently uncovered vulnerabilities in other assistants. Ultimately, though, how does Alice rank compared to other assistants? How does one even begin to make such comparisons?

One way of doing that would be to come up with a way of measuring IQ, as was recently done in a comparative test. Filippov, however, seems to favor another metric: WER (Word Error Rate). "We wanted Alice to interact with users more like a human, so that users don't need to adapt their requests," he said.
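For readers unfamiliar with the metric, WER is conventionally computed as the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not Yandex's evaluation code; the function name and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative sketch: tokenization is a plain whitespace split,
    which is an assumption, not how any production system does it.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("turn on the lights", "turn on the light"))
```

A lower score is better, and because insertions count as errors the value can exceed 1.0 when the recognizer outputs far more words than were spoken.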

"In developing Alice, we leveraged our speech technologies, which currently provide the world's most accurate Russian language recognition. Based on WER measurements, Alice demonstrates near-human levels of speech recognition accuracy. Alice uses a hybrid dialog technology with context support, it is a mix of goal-oriented and general conversation models.


"For general conversation Alice was trained not only with predefined answers, which is a common approach for virtual assistants at the moment, but also we went further and rolled out a neural network conversation model, which was trained on tremendously huge amounts of text dialogs from the internet.

"Alice, as all Yandex products, is a user-centric product, so we use some general set of metrics to evaluate it on the high level like daily active audience, users retention, requests per user, and others.

"Russian language offers a unique set of challenges with its grammatical complexities and reliance on tone of voice. Yandex's focus and expertise in the Russian language allowed us to train Alice to have a superior understanding of users and their various accents."

From Russian to English and back again

That's all well and good, but if you're not a Russian speaker, what's it to you? At this point, not much, admittedly. But that may well change in the not so distant future, if David Talbot and his team have anything to say about that. People are already using Yandex's translation combined with OCR, for example, so making the voice connection does not seem far-fetched.

Talbot is leading Yandex's Machine Translation unit (MTU), after having spent about a decade working on machine translation at Google. His team's work at this point is not focused on spoken word, but things like natural language processing and entity recognition are both core to their work and part of Alice's building blocks.

So, if you're hoping to use Alice in English in the future, Talbot's team may have to be even bigger and busier than it already is now. Thirty people may sound like a lot, like a startup within a corporation, but getting to know their herculean work may leave you wondering whether that's even enough.

Talbot and his team had just returned from an international workshop on machine translation when we had our discussion, and that was a good opportunity to get an inside view on the latest developments in the field as well as what is used in practice.

Talbot only joined Yandex recently, but the team he now leads has been working since 2011, initially triggered by fixing misspelled user queries. Since that is in some sense close to being a translation problem, they were the ones to call when Yandex decided to make the non-Russian web transparent for its users.

This is MTU's mission -- to enable Yandex users to interact transparently with any part of the web in their native language. English-born Talbot, a Russian speaker himself, said that although this provides a specific focus on their translation work, MTU uses the techniques that everyone else in the field is using. MTU takes pride in claiming to be the best around when it comes to translation to and from Russian.

And what might these techniques be? A whole lot of ML and DL, basically. Talbot explains there have been huge changes in the field in the last couple of years, owing mostly to progress in DL: "Statistical models have dominated for a long time, but now neural machine translation is the thing."


Yandex's translation service may be purpose and context-specific, but the challenges it faces and the solutions it adopts are universal

Talbot explained how classical, statistical ML models for translation work by breaking down documents into smaller chunks and eventually phrases, and are mostly based on huge tables of phrases and translations from language to language. DL models by contrast work on the entire document, or at least much larger parts of it.
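To make the phrase-table idea concrete, here is a deliberately tiny sketch. The table entries, the names, and the greedy longest-match lookup are all assumptions for illustration; real statistical systems hold millions of entries and score competing segmentations with probabilities rather than taking the first match.

```python
# Hypothetical toy phrase table: English word tuples -> Russian phrases.
PHRASE_TABLE = {
    ("good", "morning"): "доброе утро",
    ("thank", "you"): "спасибо",
    ("good",): "хороший",
    ("morning",): "утро",
}
MAX_PHRASE_LEN = max(len(key) for key in PHRASE_TABLE)


def translate_by_phrases(sentence: str) -> str:
    """Translate by greedily matching the longest known phrase at each position."""
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest possible chunk first, then shrink it.
        for n in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in PHRASE_TABLE:
                output.append(PHRASE_TABLE[chunk])
                i += n
                break
        else:
            # No entry covers this word: pass it through unchanged.
            output.append(words[i])
            i += 1
    return " ".join(output)


if __name__ == "__main__":
    print(translate_by_phrases("Good morning"))
```

Note how "good morning" is translated as a unit instead of word by word, which is the strength of the phrase-level approach, while nothing in the lookup ever sees the rest of the document.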

As a result, each approach has its strengths and weaknesses. The statistical approach works well on a phrase-to-phrase level, but the overall result is more fluid with DL. The problem, Talbot said, is that when DL goes wrong, it goes really wrong -- it can produce gibberish.

On the other hand, Talbot noted, "We're asking these machine translation algorithms to do what any human translator would object to: To translate isolated phrases out of context."

So, what is the solution to this dilemma?

Currently, MTU uses both approaches and a classifier to determine which result looks better for a given text. Essentially, Talbot said, this works as a fallback option for cases when the DL translation goes wrong.
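The selection step can be pictured roughly as follows. This is a sketch under loud assumptions: MTU's classifier is a trained model whose features are not described here, so a hand-written check for obviously degenerate output (hypothetical length-ratio and repetition thresholds) stands in for it.

```python
def looks_degenerate(source: str, translation: str) -> bool:
    """Hypothetical stand-in for a trained classifier.

    Flags output that is empty, wildly longer or shorter than the
    source, or dominated by repeated words. Thresholds are invented.
    """
    src_words = source.split()
    out_words = translation.split()
    if not out_words:
        return True
    length_ratio = len(out_words) / max(len(src_words), 1)
    if length_ratio < 0.3 or length_ratio > 3.0:
        return True
    distinct_ratio = len(set(out_words)) / len(out_words)
    return distinct_ratio < 0.5


def choose_translation(source: str, neural_output: str, statistical_output: str) -> str:
    """Prefer the neural result, falling back to the statistical one if it looks broken."""
    if looks_degenerate(source, neural_output):
        return statistical_output
    return neural_output


if __name__ == "__main__":
    print(choose_translation(
        "the cat sat on the mat",
        "кот кот кот кот кот кот",
        "кот сидел на коврике",
    ))
```

The design choice worth noticing is that both systems always run, and the decision is made per text after the fact, which is what makes the statistical output usable as a safety net.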

In the future, MTU is hoping to address this in more sophisticated ways, including providing more context. Instead of letting each algorithm work independently and then keeping one result in the end, MTU is working on ways to combine them more organically, such as feeding the phrase tables to the DL algorithm.

Other approaches have to do with developing different DL architectures. As Talbot explained, DL-based machine translation is a hard optimization problem, which, among other things, means that -- depending on your initial settings and configuration -- you may end up developing different neural networks that seem to produce the same result but work in different ways. Combining an array of these neural networks is another possible optimization.
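One common way to combine several networks is to average the probabilities each assigns to the next output word and pick the winner. Whether MTU combines its networks this way was not discussed, so treat the sketch below, including its function name and toy numbers, as an assumption.

```python
def ensemble_next_word(distributions):
    """Average per-model next-word probabilities and return the top word.

    `distributions` is a list of dicts mapping candidate words to the
    probability one model assigns them (a hypothetical interface).
    """
    if not distributions:
        raise ValueError("need at least one model distribution")
    totals = {}
    for dist in distributions:
        for word, prob in dist.items():
            totals[word] = totals.get(word, 0.0) + prob
    # A word missing from a model's dict counts as probability 0 for that model.
    averaged = {word: total / len(distributions) for word, total in totals.items()}
    best = max(averaged, key=averaged.get)
    return best, averaged


if __name__ == "__main__":
    models = [
        {"house": 0.6, "home": 0.4},
        {"house": 0.3, "home": 0.7},
        {"house": 0.5, "home": 0.5},
    ]
    print(ensemble_next_word(models))
```

In this toy run the first model alone would pick "house", but the averaged vote goes to "home", illustrating how networks that disagree can correct one another.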

And then there's the brute-force approach: More data = better training = better results. But that's not that simple even if you are Yandex. Indexing the entire web, or close, is a good foundation. But then there are a number of 'gotchas.' What is needed for MTU is pairs of documents that exist both in Russian and in English, for which the translation is actually a good one.

Without a good heuristics strategy, this is an impossible task. Imagine having to find every document on the web and juxtapose it with every other document, and then assessing whether the one is a translation of the other, and then assessing translation quality. Even with heuristics, this is really hard and requires vast compute and storage resources.
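To see why heuristics matter, consider a sketch that avoids the all-pairs comparison by only pairing documents whose URLs differ in nothing but a language marker, then discarding pairs whose lengths are wildly mismatched. Both heuristics, the URL pattern, and the threshold are assumptions for illustration; the heuristics MTU actually uses were not described.

```python
import re
from collections import defaultdict

# Hypothetical convention: the language appears as a path segment, e.g. /ru/ or /en/.
LANG_MARKER = re.compile(r"(?<=/)(ru|en)(?=/)")


def url_key(url: str) -> str:
    """Replace the first language path segment with a wildcard."""
    return LANG_MARKER.sub("*", url, count=1)


def candidate_pairs(docs, max_length_ratio=2.0):
    """Return (ru_url, en_url) pairs that might be translations of each other.

    `docs` is an iterable of (url, lang, text) tuples. Documents are
    bucketed by url_key, so only documents in the same bucket are compared.
    """
    buckets = defaultdict(lambda: {"ru": [], "en": []})
    for url, lang, text in docs:
        if lang in ("ru", "en"):
            buckets[url_key(url)][lang].append((url, text))

    pairs = []
    for group in buckets.values():
        for ru_url, ru_text in group["ru"]:
            for en_url, en_text in group["en"]:
                longer = max(len(ru_text), len(en_text))
                shorter = max(min(len(ru_text), len(en_text)), 1)
                # A translation should be roughly as long as its source.
                if longer / shorter <= max_length_ratio:
                    pairs.append((ru_url, en_url))
    return pairs


if __name__ == "__main__":
    docs = [
        ("https://example.com/ru/about", "ru", "О компании " * 10),
        ("https://example.com/en/about", "en", "About us " * 10),
        ("https://example.com/en/news", "en", "News"),
    ]
    print(candidate_pairs(docs))
```

Even after such filtering, every surviving pair would still need to be checked for translation quality, which is where the heavy compute comes in.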

Perhaps it would be simpler to apply some domain knowledge, which, in this case, comes down to linguistics. That's already been done actually, Talbot said. For a while that was the main strategy in machine translation, up to the point where ML approaches became dominant. Interestingly, in what mirrors the situation in many domains, the zeal for ML made people overlook domain knowledge initially.


Many methods were applied to increase the quality of results of ML approaches, some of them resembling what MTU is now trying for its DL approach. When these plateaued, however, it was time for a return to linguistics, which turned out to bring significant improvement.

History may well repeat itself. Talbot noted that despite some people having already tried using linguistics with DL approaches, the gains were modest. However, as soon as other methods plateau again, a return to linguistics will be very likely.

It's all big data to me

Dealing with projects of this magnitude takes deep expertise and really big data compute and storage infrastructure. Yandex is among the few organizations that have access to this kind of data and expertise, and its approach is interesting, somewhat idiosyncratic, and not all that well known beyond its confines. We will return to examine the approach, the infrastructure, and the applications further in the near future.
