Artificial intelligence has a number of potentially significant applications in healthcare, from diagnosing patients to developing life-saving drugs. Yet training robust neural networks for healthcare applications is easier said than done. Training a neural network requires huge amounts of quality data, but in the health sector, patient data must remain secure and private, which has limited the size of the datasets researchers can work with.
This week, researchers from Nvidia and King's College London are debuting a new method for training neural networks that could get around this major roadblock. At the MICCAI medical imaging conference in Shenzhen, China, they'll be presenting their research into building a privacy-preserving federated learning system for medical imaging analysis.
"We hope it will be a big step to enabling precision medicine at a large scale," Nicola Rieke, senior research scientist at Nvidia, told ZDNet.
Federated learning is a learning paradigm based on decentralized data. Rather than relying on data pooled together in a single location, an algorithmic model is trained in multiple iterations at different sites. In the healthcare sector, this offers a degree of privacy for hospitals and other organizations that want to pool their resources to train a deep learning model without actually sharing their data or letting it out of their possession.
The Nvidia and King's College researchers took a client-server federated approach, with a centralized server maintaining a global deep neural network. Under this approach, each participating hospital would receive a copy of that global network to train on its own dataset.
Once the model was trained locally for a couple of iterations, the participants would send their updated version back to the centralized server. The server would then aggregate the contributions from all of the participants to create a "consensus model." The new consensus model would be shared again with participants, and the training would continue.
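The paper's training code isn't reproduced here, but the train-locally-then-average loop described above can be sketched in a few lines of numpy. The three "hospitals", their data, and the linear stand-in for a real network are all hypothetical; this only illustrates the consensus-model mechanic, not the actual system.

```python
import numpy as np

def local_update(weights, data, lr=0.1):
    """Simplified local training: nudge the weights toward the mean
    of this site's private data (a stand-in for real SGD epochs)."""
    return weights + lr * (data.mean(axis=0) - weights)

def aggregate(updates):
    """Server-side step: average all participants' weights into a
    single consensus model."""
    return np.mean(updates, axis=0)

# Three hypothetical hospitals, each holding data it never shares.
rng = np.random.default_rng(0)
site_data = [rng.normal(loc=m, size=(50, 4)) for m in (0.0, 1.0, 2.0)]

consensus = np.zeros(4)  # global model held by the server
for _ in range(50):
    # Each site trains its own copy of the consensus model...
    updates = [local_update(consensus.copy(), d) for d in site_data]
    # ...and only updated weights travel back; the consensus is
    # redistributed and the next round begins.
    consensus = aggregate(updates)
```

The raw MRI data never leaves a site; only model weights are exchanged, which is the privacy property federated learning provides before any further protections are added.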
However, as Rieke explained, recent research has shown that through model inversion, it'd still be possible to infer information about the datasets used to train the model. The researchers set out to create a federated learning system that tackled this privacy vulnerability.
The first step they took was to communicate only part of a model update from the participants back to the centralized server. The researchers found they could hide up to 90 percent of the model and still aggregate a consensus model with performance levels comparable to those achieved with a centralized learning system.
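One simple way to realize "communicate only part of an update" is to send just the largest-magnitude weight changes and zero out the rest. The function below is an illustrative sketch of that idea, not the researchers' implementation; the 10 percent sharing fraction mirrors the article's figure of hiding up to 90 percent of the model.

```python
import numpy as np

def partial_update(new_w, old_w, fraction=0.1):
    """Share only the largest `fraction` of weight changes;
    the remaining entries of the update stay on-site (zeroed)."""
    delta = (new_w - old_w).ravel()
    k = max(1, int(fraction * delta.size))
    # Indices of the k largest-magnitude changes.
    keep = np.argsort(np.abs(delta))[-k:]
    shared = np.zeros_like(delta)
    shared[keep] = delta[keep]
    return shared.reshape(new_w.shape)

rng = np.random.default_rng(1)
old = rng.normal(size=(10, 10))
new = old + rng.normal(scale=0.1, size=(10, 10))
shared = partial_update(new, old, fraction=0.1)
# 90 percent of the update never leaves the hospital.
```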
The research team then went one step further to obscure the data by injecting random noise. With the particular dataset they were working with, they found that they could obscure 40 percent of the model and inject noise, while still achieving the same level of performance.
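Noise injection of this kind typically means perturbing each shared value with random (often Gaussian) noise, so an individual contribution reveals less while the averaged signal survives aggregation. A minimal sketch, with an assumed noise scale not taken from the paper:

```python
import numpy as np

def noisy_update(delta, noise_scale=0.01, rng=None):
    """Add Gaussian noise to a shared update so the underlying
    training data is harder to reconstruct via model inversion."""
    rng = rng or np.random.default_rng(2)
    return delta + rng.normal(scale=noise_scale, size=delta.shape)

delta = np.full(1000, 0.5)       # a toy weight update
noisy = noisy_update(delta)
# Every entry is perturbed, but the mean is nearly unchanged,
# so aggregation across sites still recovers the signal.
```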
To conduct this research, the team used brain tumor segmentation data from the BraTS 2018 dataset, which contains MRI scans of 285 patients with brain tumors. The BraTS dataset includes data from 13 institutions. One of the benefits of the federated learning system employed for this study is that institutions could drop in or out of the model training process without impacting its progress.
"You would have less data available" if an organization dropped out, Rieke explained, "but others can continue to work together on this one global model."
Encouraging collaborative training efforts could have a clear impact on the advancement of healthcare AI. To understand the need for a large dataset, it helps to consider the number of medical images a physician must review to be considered an expert in their field. A medical professional might read around 15,000 cases a year; over a 15-year career, that amounts to some 225,000 cases. Most open health datasets offer nowhere near that many images.
"The really big challenge to healthcare AI is to build this robust, generalizable model," said Abdul Hamid Halabi, director of healthcare at Nvidia.