US National Cancer Institute shares troves of research data via Google Cloud

The institute is using Google Cloud's BigQuery to connect researchers around the world to a wide collection of cancer datasets.
Written by Stephanie Condon, Senior Writer

After using Google Cloud's BigQuery to quickly analyze massive genomic and proteomic datasets, senior researchers from the US National Cancer Institute are now making their tools and resources available to the broader research community. By using the cloud, the researchers hope to speed up potentially life-saving cancer research while keeping their data secure and compliant with national and international rules, Google said Thursday.

The data in question comes specifically from the Institute for Systems Biology-Cancer Gateway in the Cloud (ISB-CGC) -- part of the US National Cancer Institute's Cloud Resources. The NCI created Cloud Resources so that scientists wouldn't have to download and store extremely large datasets.

With the ISB-CGC on Google Cloud, two researchers developed a set of Google BigQuery user-defined functions (UDFs) to perform statistical tests on breast cancer data. With the UDFs, an analysis that would have taken days with an on-premises program finished in just minutes.

The researchers -- Dr. Kawther Abdilleh, lead bioinformatics scientist at General Dynamics Information Technology, and Dr. Boris Aguilar, a senior research scientist at ISB -- have now made their UDFs available to other researchers via BigQuery.

"We are spreading the message of the cost-effectiveness of the cloud," Abdilleh said in a statement. "With Google Cloud's BigQuery, we've successfully demonstrated that researchers can inexpensively analyze large amounts of data and do so faster than ever before."
