The Oak Ridge National Laboratory (ORNL), home of the Titan supercomputer and a research team dedicated to Vice President Joe Biden's Cancer Moonshot program, is experimenting with how deep learning can be used to improve cancer research.
Specifically, ORNL researchers applied new deep learning techniques to automate how information is extracted from cancer pathology reports documented across a nationwide network of cancer registry programs. These registry programs collect demographic and clinical information related to the diagnosis, treatment, and history of cancer cases in the US, and are used by physicians as a consultation tool for broad cancer surveillance.
Using a dataset of 1,976 pathology reports, ORNL researchers trained a deep-learning algorithm to multitask, which in this context means it was made to carry out two different but closely related information-extraction tasks simultaneously. In the first task the algorithm aimed to identify the primary location of the cancer, and in the second it aimed to identify on which side of the body the cancer was located.
With this method, in which a neural network was made to understand not only the meaning of words but also the contextual relationships between them, the algorithm performed substantially better than methods that did not exploit related information.
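To make the multitask idea above concrete, here is a minimal sketch in plain Python. It is not the ORNL model: the four toy reports, the label sets, the embedding size, and the simple averaging of word vectors are all assumptions invented for illustration. It shows only the general pattern of one shared representation feeding two output layers, one for cancer site and one for side of the body, so that errors from both tasks update the same word vectors.

```python
import math
import random

# Hypothetical toy reports (invented for illustration, not ORNL data).
REPORTS = [
    ("mass in left breast", "breast", "left"),
    ("right breast carcinoma", "breast", "right"),
    ("left lung nodule", "lung", "left"),
    ("tumor in right lung", "lung", "right"),
]
SITES = ["breast", "lung"]
SIDES = ["left", "right"]
VOCAB = sorted({w for text, _, _ in REPORTS for w in text.split()})
DIM = 8  # assumed embedding size

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Shared word embeddings feed two task-specific output layers ("heads").
emb = {w: [rng.uniform(-0.5, 0.5) for _ in range(DIM)] for w in VOCAB}
heads = {"site": rand_matrix(len(SITES), DIM),
         "side": rand_matrix(len(SIDES), DIM)}

def encode(text):
    """Shared representation: mean of the embeddings of known words."""
    words = [w for w in text.split() if w in emb]
    if not words:
        return words, [0.0] * DIM
    h = [sum(emb[w][d] for w in words) / len(words) for d in range(DIM)]
    return words, h

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def head_probs(name, h):
    return softmax([sum(w * x for w, x in zip(row, h)) for row in heads[name]])

def train_step(text, site, side, lr=0.3):
    """One SGD step on the summed cross-entropy loss of both tasks."""
    words, h = encode(text)
    targets = {"site": SITES.index(site), "side": SIDES.index(side)}
    grad_h = [0.0] * DIM
    loss = 0.0
    for name, target in targets.items():
        probs = head_probs(name, h)
        loss -= math.log(probs[target])
        for k, row in enumerate(heads[name]):
            err = probs[k] - (1.0 if k == target else 0.0)
            for d in range(DIM):
                grad_h[d] += err * row[d]  # accumulate before updating the weight
                row[d] -= lr * err * h[d]
    # Gradients from both tasks flow into the same shared embeddings.
    for w in words:
        for d in range(DIM):
            emb[w][d] -= lr * grad_h[d] / len(words)
    return loss

def predict(text):
    _, h = encode(text)
    site_p, side_p = head_probs("site", h), head_probs("side", h)
    return SITES[site_p.index(max(site_p))], SIDES[side_p.index(max(side_p))]

for _ in range(600):
    for text, site, side in REPORTS:
        train_step(text, site, side)

for text, _, _ in REPORTS:
    print(text, "->", predict(text))
```

A single-task baseline would train a separate copy of the embeddings for each head. The shared version lets what the model learns for one task inform the other, which is the kind of related information the article describes.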
"Intuitively this makes sense because carrying out the more difficult objective is where learning the context of related tasks becomes beneficial," said Georgia Tourassi, director of the Health Data Sciences Institute at ORNL. "Humans can do this type of learning because we understand the contextual relationships between words. This is what we're trying to implement with deep learning."
According to Tourassi, the development of automated data tools could give medical researchers and policymakers a detailed view of the US cancer population, which potentially could reveal overlooked avenues in cancer research and accelerate the development of promising therapies.
"Today we're making decisions about the effectiveness of treatment based on a very small percentage of cancer patients, who may not be representative of the whole patient population," Tourassi said. "Our work shows deep learning's potential for creating resources that can capture the effectiveness of cancer treatments and diagnostic procedures and give the cancer community a greater understanding of how they perform in real life."
ORNL is part of a strategic computing partnership between the US Department of Energy and the National Cancer Institute. ORNL's Titan supercomputer is the nation's fastest computer, capable of 20 million billion calculations per second.