Alibaba neural network defeats humans in global reading test

Chinese tech giant's research unit says its deep neural network model is the first to beat humans on the Stanford Question Answering Dataset, but the latest rankings list it as joint leader alongside Microsoft.
Written by Eileen Yu, Senior Contributing Editor

Alibaba says its deep neural network model has outscored humans in a global reading test, paving the way for the underlying technology to reduce the need for human input.

The Chinese tech giant's research unit, the Institute of Data Science and Technologies (IDST), said its deep-learning model attained an Exact Match score of 82.44 on the Stanford Question Answering Dataset (SQuAD), above the 82.304 previously recorded for human performance on the same test.

SQuAD comprises more than 100,000 question-and-answer pairs drawn from more than 500 Wikipedia articles. Participants build machine-learning models to answer the questions, and SQuAD evaluates each submitted model by running it on its test set.
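Exact Match, the metric behind the 82.44 and 82.304 figures, gives a model credit for a question only when its answer matches one of the reference answers after light text normalization; the score is the percentage of questions answered this way. The sketch below follows the normalization used in SQuAD's public v1.1 evaluation script as we understand it (lowercasing, stripping punctuation and the articles "a", "an" and "the", then collapsing whitespace). The function names and sample questions are illustrative assumptions, not code from SQuAD, Alibaba or Microsoft.

```python
import re
import string
from collections.abc import Iterable

# Assumption: mirrors the SQuAD v1.1-style normalization; names are hypothetical.
PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, ground_truths: Iterable[str]) -> int:
    """Return 1 if the normalized prediction equals any normalized reference, else 0."""
    pred = normalize_answer(prediction)
    return int(any(pred == normalize_answer(gt) for gt in ground_truths))


def exact_match_score(predictions: dict[str, str],
                      references: dict[str, list[str]]) -> float:
    """Percentage of questions whose prediction exactly matches a reference answer.

    A question with no prediction is scored against an empty string.
    """
    if not references:
        return 0.0
    hits = sum(exact_match(predictions.get(qid, ""), answers)
               for qid, answers in references.items())
    return 100.0 * hits / len(references)
```

For example, with references `{"q1": ["the water cycle", "Water cycle"], "q2": ["gravity"]}` and predictions `{"q1": "Water cycle.", "q2": "precipitation"}`, `exact_match_score` returns `50.0`: the first answer matches once case, punctuation and the article are stripped, while the second does not match at all. Because partial overlaps earn nothing under this metric, scores in the low 80s are a demanding bar.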

Participants have included universities, research institutions and technology vendors such as Tencent, Google, IBM, Microsoft, Samsung, Tel-Aviv University and South Korea's Kangwon National University. A handful had submitted models multiple times over the past year, including Microsoft Research Asia, whose previous score of 82.136 was recorded on December 17, 2017, while Alibaba's previous score of 79.199 was recorded on December 28, 2017.

In its statement Monday, the Chinese vendor said it was the first to surpass humans in the test, but SQuAD listed it as joint leader alongside Microsoft Research Asia, which scored a higher 82.65. The SQuAD leaderboard dated Microsoft's entry January 3, 2018, and Alibaba's January 5, 2018.

A spokesperson for Alibaba explained that these dates indicated when each model was submitted. He told ZDNet that SQuAD officially registered Alibaba's test results on January 11, 2018, a day ahead of Microsoft's, which gave the Chinese vendor the distinction of being "first" to surpass human scores.

According to Alibaba, its neural network model was based on the Hierarchical Attention Network, which, it explained, reads "from paragraphs to sentences to words" to identify phrases that could contain potential answers. The underlying technology was previously used during its Singles Day shopping festival to respond to customer inquiries.

The company had said its AI-powered customer service chatbot, Dian Xiaomi, was used to support its online merchants and served an average of 3.5 million users daily across its Taobao and Tmall platforms.

Commenting on the SQuAD score, Si Luo, chief scientist of natural language processing at Alibaba IDST, said: "That means objective questions such as 'what causes rain' can now be answered with high accuracy by machines. We believe the technology underneath can be gradually applied to numerous applications such as customer service, museum tutorials, and online responses to medical inquiries from patients, decreasing the need for human input in an unprecedented way."

Luo said Alibaba would be "sharing our model-building methodology" with the community and planned to apply the technology to support its customers in the near future.


SQuAD's current ranking
