Science writing can now be automated (sort of)

Uh-oh. But, like, I can totally do other stuff.

Science writers are rhetorical middlemen. We read technical papers, talk to academics and professionals, and translate dense technical jargon into information that's consumable by a broader audience. 

In other words, we're like an Instagram filter for talented scientists who produce incredibly heavy, usually boring material to explain their work. Unfortunately for us, it seems someone took the analogy to heart. 

A team of scientists from MIT and the Qatar Computing Research Institute has developed a neural network that reads scientific papers and spits out a nice one- or two-sentence summary in plain English. Though still in a preliminary iteration, the neural network improves on existing natural language processing strategies by using a new approach to correlating long strings of data, called a rotational unit of memory (RUM). A neural net like this could quickly (and, uh, cheaply) scan a large number of papers to deduce what they're all about.

Okay, so I'm not updating my resume yet. But the automated writing is clearly on the wall. Machines can understand academic texts and distill from context the essential takeaways.

The project was launched while the researchers were working on an unrelated neural network project aimed at developing AI that could solve difficult physics problems. The researchers had the insight that their RUM approach could also address natural language processing, still a major challenge for computers.

Compared with those of a conventional neural network, the results from the team's RUM language processing are striking. Below are two summaries of the same scientific paper. The first was generated by a traditional neural net that uses a technique known as long short-term memory (LSTM) to remember and correlate long strings of data found in natural language.

"Baylisascariasis," kills mice, has endangered the allegheny woodrat and has caused disease like blindness or severe consequences. This infection, termed "baylisascariasis," kills mice, has endangered the allegheny woodrat and has caused disease like blindness or severe consequences. This infection, termed "baylisascariasis," kills mice, has endangered the allegheny woodrat.

And now a summary of the same paper generated by the team's RUM AI. 

Urban raccoons may infect people more than previously assumed. 7 percent of surveyed individuals tested positive for raccoon roundworm antibodies. Over 90 percent of raccoons in Santa Barbara play host to this parasite.

According to an MIT spokesperson, the system represents each word as a vector in multidimensional space. Each subsequent word swings this vector in some direction. By rotating vectors to represent strings of words rather than relying on LSTM, the system essentially remembers every word within the larger context of a sentence. The final vector or set of vectors traces a unique path through this virtual space, and that path is correlated with other sentence vectors to find patterns and derive meaning.
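
For the curious, here is a toy Python sketch of the vector-swinging idea. It is not the team's RUM code: the three-word vocabulary, the rotation angles, and the `encode` helper are all made up for illustration. Each word rotates a running 3-D state vector, and because rotations in three dimensions generally do not commute, the same words in a different order can land the vector somewhere else, while its length stays the same.

```python
import math

# Hypothetical toy vocabulary (not from the paper): each word rotates the
# state in the plane of the two coordinates other than `axis`, by `angle`
# radians. The axes and angles are invented for illustration.
WORD_ROTATIONS = {
    "raccoons": (0, math.pi / 2),
    "infect": (1, math.pi / 2),
    "people": (2, math.pi / 2),
}


def rotate(vec, axis, angle):
    """Rotate a 3-D vector in the plane spanned by the two non-`axis` coordinates."""
    i, j = [k for k in range(3) if k != axis]
    c, s = math.cos(angle), math.sin(angle)
    out = list(vec)
    out[i] = c * vec[i] - s * vec[j]
    out[j] = s * vec[i] + c * vec[j]
    return out


def encode(sentence, start=(1.0, 1.0, 0.0)):
    """Swing the state vector once per word and return where it ends up."""
    state = list(start)
    for word in sentence.split():
        axis, angle = WORD_ROTATIONS[word]
        state = rotate(state, axis, angle)
    return state


def norm(vec):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vec))


if __name__ == "__main__":
    for text in ("raccoons infect people", "people infect raccoons"):
        final = encode(text)
        print(text, [round(x, 3) for x in final], round(norm(final), 3))
```

Running it prints a different final vector for each word order, with the same length both times. A real network learns its rotations from data and works in far more dimensions, so treat this as a cartoon of the mechanism and nothing more.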

"RUM helps neural networks to do two things very well," says Preslav Nakov, a senior scientist at the Qatar Computing Research Institute, HBKU. "It helps them to remember better, and it enables them to recall information more accurately."

That's great news for science. For science writers it's a little disconcerting.