Cancer therapies depend on dizzying amounts of data: Here's how it's getting sorted in the cloud

Innovators in the field of genomics are using the cloud, machine learning and other technologies to get a grip on the vast amounts of cancer-related data being produced -- with potentially life-saving results.
Written by Stephanie Condon, Senior Writer

Cancer patients and their doctors have more information about the disease and its treatment than ever before, and the information available continues to grow at a dizzying rate. All that information, however, isn't useful if people can't make sense of it all. 

Think about a lung cancer patient, for instance, who might receive an early diagnosis through a screening program that produces a computed tomography (CT) image. As their diagnosis and treatment plan advance, their care team will bring in data sources like MRI and molecular imaging, pathology data -- which is increasingly digitized -- and genomics information. 

"All of this, honestly, is a very difficult challenge for the care teams themselves as they're thinking about how to best care for and treat these patients," Louis Culot, GM of genomics and oncology informatics at Philips, said during an Amazon Web Services virtual event for the health industry. 

"In oncology now, or in any medical discipline, this matters because the treatment matters, the intervention matters," Culot said. "We don't just want data for data's sake. What action could care team members take based on information?"

To get a better grip on all of this data, innovators have turned to tools like cloud computing and machine learning -- with potentially life-saving results. At this week's AWS event, Culot walked through Philips' partnership with The University of Texas MD Anderson Cancer Center, which aims to help doctors bring together all of their data to create personalized care plans for patients. 

Satnam Alag, SVP of software engineering at Grail, explained how his company is using the cloud and machine learning to develop a system that can screen patients for dozens of different types of cancer at once, rather than one at a time. 

It's hard to overstate the impact of improved cancer screenings and treatments. In 2020, there were more than 19 million cases of cancer globally, Alag noted, and nearly 10 million deaths. It's estimated that one in three men and one in four women are likely to get cancer during their lifetimes.

"Will I or a family member be diagnosed with cancer? Where is it in my body? Can it be cured? Or is it going to kill me? These are common questions that many of us share," Alag said. 

Thankfully, as we collect more data points to study cancer, scientists are also developing new treatment options at a rapid clip. Advances in molecular profiling have helped scientists identify different categories and subcategories of cancer, along with different potential therapies. In 2009, the US FDA approved eight anticancer drugs, Culot noted; by 2020, that number had grown to 57. On top of that, there are now about 1,500 clinical trials currently open to cancer patients. 

"In general, there are now literally hundreds of possible therapies or therapy combinations, which can be used to treat cancer," Culot said. "So we have this double challenge, right? How do we pull together all this data to get a better picture of the patient? And then with that view, what does it all mean in terms of best treatment?"

To tackle that problem, the doctors at MD Anderson developed the Precision Oncology Decision Support (PODS) system -- an evidence-based tool that helps doctors assess relevant information such as the latest in drug development and clinical trials, as well as patient responses to treatments. This helps them develop personalized treatment plans.


In 2020, MD Anderson partnered with Philips and AWS to make the system available to doctors and practitioners around the globe. 

The system could only exist in the cloud, Culot noted, for a number of reasons. There's an enormous amount of data to store and huge amounts of data processing that needs to happen. At the same time, the system needs to be a secure and compliant multi-tenant system for practitioners around the globe. 

Perhaps most critically, the cloud enables truly personalized treatment plans, Culot noted, by allowing doctors to collaborate and combine their data. 

"People talk about cancer as a big data problem, but it's also what I call a small-n problem," Culot said. He gave the example of a patient who learns he has Stage 4 lung cancer with specific mutations. 

"You wind up subsetting and subsetting these populations so even the biggest health care institutions sometimes only have a few patients that meet the criteria we're trying to learn from," he said. "To be able to combine data -- de-identified, in a compliant way -- so we can learn from it, is enabled through these cloud-based ecosystems."

Similarly, Satnam Alag of Grail said the cloud was imperative for the development of Galleri, the company's multi-cancer early detection test. The test is designed to detect more than 50 types of cancers as a complement to single-cancer screening tests.

"Leveraging the power of genomics and machine learning needs a lot of computation," Alag said. "Very large amounts of data need to be collected and scaled." 

From a single blood draw, the Galleri test uses DNA sequencing and machine learning algorithms to analyze pieces of DNA in a patient's bloodstream. The test looks specifically for the cell-free DNA (cfDNA) that tumors shed into the blood, which can indicate what kind of cancer is present and where in the body it originated. 

"Instead of only screening for individual cancers, we need to screen individuals for cancer," Alag said. "And this is now possible thanks to two big technology revolutions that have happened over the last 20 years. First, the power of genomics -- it is now possible to sequence the complete DNA... generating terabytes of data cost effectively within a few days. Second, is the huge amount of innovation in machine learning. We now have the know-how to be able to build complicated, deep learning models with tens of millions of parameters."
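Grail's production models are proprietary, but the basic idea Alag describes -- treating signals extracted from cfDNA fragments as features for a classifier that predicts whether cancer is present and which type it is -- can be sketched with a toy multi-class model. Everything below is an illustrative assumption: the synthetic "methylation score" features, the class labels, and the simple softmax classifier stand in for the terabyte-scale data and deep learning models with tens of millions of parameters that Alag mentions.

```python
import numpy as np

# Illustrative only: synthetic per-sample "methylation scores" standing in
# for real cfDNA features. Class labels here are hypothetical.
rng = np.random.default_rng(0)

classes = ["healthy", "cancer_type_a", "cancer_type_b"]
n_per_class, n_features = 50, 5

# Well-separated Gaussian clusters stand in for class-specific signal.
means = np.array([[0.0] * n_features,
                  [3.0] * n_features,
                  [-3.0] * n_features])
X = np.vstack([rng.normal(m, 1.0, size=(n_per_class, n_features))
               for m in means])
y = np.repeat(np.arange(len(classes)), n_per_class)

def softmax(z):
    # Numerically stable softmax over class scores.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Multinomial logistic regression trained by batch gradient descent.
W = np.zeros((n_features, len(classes)))
b = np.zeros(len(classes))
onehot = np.eye(len(classes))[y]
for _ in range(300):
    p = softmax(X @ W + b)
    grad = p - onehot          # gradient of cross-entropy loss
    W -= 0.1 * (X.T @ grad) / len(X)
    b -= 0.1 * grad.mean(axis=0)

pred = np.argmax(softmax(X @ W + b), axis=1)
accuracy = (pred == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

The toy model separates three cleanly clustered classes; the hard part in practice, as the article notes, is that real cfDNA signal is faint and high-dimensional, which is why large sequencing datasets and far bigger models are needed.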
