The MedPerf benchmark takes AI models and sends them to clinicians who hold the data; the clinicians then report back how the model performed against that data. That means the AI programs' developers can effectively test against private datasets they would otherwise never have access to, says the group, while clinicians get to see whether AI can provide answers about their patients' health by making predictions on the data. Because the models travel to the data rather than the reverse, the data never leaves the clinicians' secure facilities.
"This approach aims to catalyze wider adoption of medical AI, leading to more efficacious, reproducible and cost-effective clinical practice, with ultimately improved patient outcomes," notes the group in the paper, "Federated benchmarking of medical artificial intelligence with MedPerf," published in the journal Nature Machine Intelligence.
The paper was written by lead author Alexandros Karargyris of the University of Strasbourg, France, and 76 other contributors, representing more than 20 companies, including Nvidia and Microsoft, and 20 academic institutions and nine hospitals across 13 countries and five continents.
The initial use of MedPerf in sample benchmark tests has been in radiology and surgery, note Karargyris and team. But, they write, the platform "can easily be used in other biomedical tasks such as computational pathology, genomics, natural language processing (NLP), or the use of structured data from the patient medical record."
Said David Kanter, the executive director of MLCommons, in an emailed statement, "Medical AI is essential for the potential impact it will have on everyone across the planet, and I'm especially proud of the broad community engagement we've seen with MedPerf -- researchers, hospitals, technologists, and more.
"MedPerf has been a huge community effort, and we are excited to see it grow and flourish going forward, ultimately improving medical care for everyone," Kanter said.
MedPerf's platform consists of MLCubes, a method of creating secure application containers akin to Docker. The platform uses three MLCubes: one to prepare the data, one to host the model, and a third to evaluate the model's output against the benchmark.
As described by Karargyris and team in the article,
The model MLCube contains a pretrained AI model to be evaluated as part of the benchmark. It provides a single function, infer, which computes predictions on the prepared data output by the data preparation MLCube. In the future case of API-only models, this would be the container hosting the API wrapper to reach the private model.
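The three-container flow can be sketched in plain Python, with ordinary functions standing in for the containers. This is a hedged illustration only: the function names (`prepare`, `evaluate`), the toy "model," and the data shapes are invented for clarity, while the real MLCube interface defines such tasks as container entry points rather than in-process calls. Only `infer` is named in the paper.

```python
# Hypothetical sketch of MedPerf's three-MLCube benchmark flow.
# Each function stands in for one container; names other than
# `infer` are illustrative, not MedPerf's actual API.

def prepare(raw_records):
    """Data-preparation MLCube (illustrative): normalize a site's
    raw records into the format the benchmark expects."""
    return [{"id": r["id"], "pixels": r["scan"]} for r in raw_records]

def infer(prepared):
    """Model MLCube: the single `infer` function the paper describes,
    computing predictions on the prepared data. Here a trivial
    stand-in 'model' flags any scan whose mean intensity exceeds 0.5."""
    return {
        case["id"]: sum(case["pixels"]) / len(case["pixels"]) > 0.5
        for case in prepared
    }

def evaluate(predictions, labels):
    """Evaluation MLCube (illustrative): score predictions against
    ground truth; only summary metrics would leave the clinical site."""
    correct = sum(predictions[k] == labels[k] for k in labels)
    return {"accuracy": correct / len(labels)}

# The raw scans and labels stay at the site; only metrics are reported.
raw = [{"id": "a", "scan": [0.9, 0.8]}, {"id": "b", "scan": [0.1, 0.2]}]
labels = {"a": True, "b": False}
metrics = evaluate(infer(prepare(raw)), labels)
print(metrics)  # {'accuracy': 1.0}
```

The key property the sketch preserves is the data-flow direction: the model's predictions and the final metrics move out of the site, while the patient data used as input never does.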
MedPerf also collaborated with Hugging Face, the popular repository of AI models. "The Hugging Face Hub can also facilitate automatic evaluation of models and provide a leaderboard of the best models based on benchmark specifications," they write.
Another partner is Sage Bionetworks, which develops the Synapse platform for data sharing that has been used in crowd-sourced data challenges. "Several ad-hoc components required for MedPerf-FeTS integration were built upon the Synapse platform," note the authors. "Synapse supports research data sharing and can be used to support the execution of community challenges."
The MedPerf approach has already been tested in a challenge organized by multiple academic institutions, the Federated Tumor Segmentation (FeTS) Challenge, in which neural nets are tasked with identifying brain tumors -- specifically, gliomas -- in MRI images. The FeTS 2022 challenge, in which MedPerf took part, spanned 32 participating sites on six continents.
"Furthermore, MedPerf was validated through a series of pilot studies with academic groups involved in multi-institutional collaborations for the purposes of research and development of medical AI models," the authors said.
The MedPerf team expects to expand the platform to many more participants, declaring, "We are currently working on general purpose evaluation of healthcare AI through larger collaborations."
The paper describes MedPerf as now past an initial "proof-of-concept" stage and in the midst of a transition from alpha to beta. Next steps include opening up the benchmarking tasks to outside participants generally.
Part of the paper is a call for parties in medicine to step up and contribute, including "healthcare stakeholders to form benchmark committees that define specifications and oversee analyses," and "Data owners (for example, healthcare organizations, clinicians) to register their data in the platform (no data sharing required)."