Facebook's Applied Machine Learning team detailed on Wednesday how it plans to use a technique called multilingual embeddings to scale AI tools on the social network to more languages and to ship AI-powered products in new languages faster.
More than half of Facebook's users speak a language other than English, and more than 100 languages are used on the social network. That variety makes it difficult for Facebook to offer AI tools such as recommendations and M suggestions consistently across all the languages it supports.
In a blog post, Facebook said multilingual embeddings are 20 to 30 times faster, measured by overall latency, than other approaches to natural language processing (NLP) text classification. The social network called multilingual embeddings a better way to scale NLP across many languages.
Facebook said that with multilingual embeddings, words from every language are represented in the same vector space, and words with similar meanings (regardless of language) sit close together. In the past, Facebook had to either collect a separate, large set of training data for each language, or collect large amounts of English data to train an English classifier and then rely on translation to extend it to other languages.
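To make the "same vector space" idea concrete, here is a minimal Python sketch. The vectors below are hypothetical, hand-picked 3-D values chosen for illustration; they are not Facebook's embeddings, which the blog post summarized here does not publish. Real word embeddings are learned from large text corpora and typically have hundreds of dimensions. The sketch only shows that, once words from several languages share one space, cosine similarity finds a word's closest counterpart in another language.

```python
import math

# Hypothetical toy embeddings, keyed by (language code, word).
# All languages share one 3-D space; similar meanings get nearby vectors.
EMBEDDINGS = {
    ("en", "cat"):     [0.90, 0.10, 0.00],
    ("en", "car"):     [0.00, 0.20, 0.95],
    ("en", "happy"):   [0.10, 0.90, 0.10],
    ("es", "gato"):    [0.88, 0.12, 0.05],
    ("es", "feliz"):   [0.12, 0.85, 0.15],
    ("fr", "voiture"): [0.05, 0.15, 0.92],
}


def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word_key, target_lang):
    """Return the (lang, word) key in target_lang closest to word_key."""
    query = EMBEDDINGS[word_key]
    candidates = [key for key in EMBEDDINGS if key[0] == target_lang]
    return max(candidates, key=lambda key: cosine(query, EMBEDDINGS[key]))


print(nearest(("es", "gato"), "en"))  # -> ('en', 'cat')
```

No dictionary lookup or translation step is involved: the Spanish "gato" lands next to the English "cat" purely because their vectors are close.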
Facebook explained how having words from different languages appear close together helps text classification:
In order to make text classification work across languages, then, you use these multilingual word embeddings with this property as the base representations for text classification models. Since the words in the new language will appear close to the words in trained languages in the embedding space, the classifier will be able to do well on the new languages too. Thus, you can train on one or more languages, and learn a classifier that works on languages you never saw in training.
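The train-on-one-language, classify-another idea in that quote can be sketched in a few lines of Python. This is an illustrative toy, not Facebook's model: the aligned vectors are hypothetical hand-picked values, a text is represented by averaging its word vectors, and a simple nearest-centroid classifier stands in for the production classifiers, which the post summarized here does not describe. The classifier sees only English examples, then labels Spanish text it never saw during training.

```python
import math
from collections import defaultdict

# Hypothetical aligned 3-D embeddings: translations have nearby vectors.
EMB = {
    # English (training language)
    "cat": [0.90, 0.10, 0.00], "dog": [0.85, 0.15, 0.05],
    "car": [0.00, 0.20, 0.95], "train": [0.05, 0.10, 0.90],
    "my": [0.30, 0.30, 0.30], "the": [0.30, 0.30, 0.30],
    # Spanish (never used for training)
    "gato": [0.88, 0.12, 0.05], "perro": [0.83, 0.17, 0.02],
    "coche": [0.02, 0.18, 0.93], "tren": [0.04, 0.12, 0.91],
    "mi": [0.30, 0.30, 0.30], "el": [0.30, 0.30, 0.30],
}


def embed(text):
    """Represent a text as the average of its known word vectors."""
    vectors = [EMB[w] for w in text.lower().split() if w in EMB]
    if not vectors:
        raise ValueError(f"no known words in {text!r}")
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def train(examples):
    """Nearest-centroid classifier: store one mean text vector per label."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append(embed(text))
    return {label: [sum(d) / len(vs) for d in zip(*vs)]
            for label, vs in by_label.items()}


def predict(centroids, text):
    """Pick the label whose centroid is most similar to the text vector."""
    vec = embed(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Train on English only.
english_data = [("my cat", "animals"), ("the dog", "animals"),
                ("my car", "transport"), ("the train", "transport")]
model = train(english_data)

# Classify Spanish text the model never saw.
for spanish in ["mi gato", "el perro", "mi coche", "el tren"]:
    print(spanish, "->", predict(model, spanish))
```

Because "gato" sits near "cat" and "tren" near "train" in the shared space, the centroids learned from English transfer to Spanish without any Spanish training data, which is the property the quote describes.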
In initial testing, Facebook has seen multilingual embeddings perform better for English, German, French, and Spanish. Facebook said that as the project continues to scale, the team will try new techniques for languages where it doesn't have large amounts of data.
Facebook has also used multilingual embeddings across its ecosystem in other ways, including its Integrity systems that detect policy-violating content and classifiers that support features like event recommendations.
Facebook said multilingual embeddings are typically more accurate, which "should mean people have better experiences using Facebook in their preferred language." Going forward, the company is working on ways to capture nuances of cultural context across languages.