X

Innovation

Home Innovation Artificial Intelligence

Facebook AI open sources multilingual machine translation model

The model, M2M-100, is trained on 2,200 language directions and can be more accurate because it doesn't use English as a go-between language in translations.

Written by Larry Dignan, Contributor Oct. 19, 2020 at 8:00 a.m. PT

special feature

Managing AI and ML in the Enterprise

The AI and ML deployments are well underway, but for CXOs the biggest issue will be managing these initiatives, and figuring out where the data science team fits in and what algorithms to buy versus build.

Facebook AI is open sourcing M2M-100, a multilingual machine translation model (MMT) that can translate any pair of 100 languages without relying on English.

The MMT is thought to be more accurate because it doesn't have to use English as a go-between. Typically, models have been English-centric. So translating Chinese to French or Chinese to Spanish would require a translation into English before a final destination.

Facebook argues that direct translations between languages capture more meaning and outperform English-centric systems by 10 points on the BLEU metric.

Primers: What is AI? | What is machine learning? | What is deep learning? | What is artificial general intelligence?

M2M-100 is trained on 2,200 language directions. Facebook said it will also release the model, training and evaluation setup for M2M-100 for other researchers.

Facebook has been using machine translation across its network but building separate AI models for each language and task didn't sale. After all, Facebook runs 20 billion translations every day on Facebook News Feed.

To train the MMT model, Facebook had to curates quality sentence pairs across multiple languages without English. There are more translations to English than direct between languages. Ultimately, Facebook built an MMT dataset of 7.5 billion sentence pairs across 100 languages. From there, Facebook narrowed pairs down to high quality and high data pairs. Statistically rare translation pairs were avoided.

Here's a summary of the M2M-100 dataset and model.

Related:

Editorial standards

Show Comments

Related

Robot vacuums and mops

The best robot mops you can buy: Expert tested

Roborock S7 Max Ultra

The flagship Roborock S7 Mav Ultra robot vacuum mop is still $500 off after Prime Day

Levoit Vital 200S

The best Levoit air purifier I've tested is still 20% off after Prime Day