IBM on Thursday unveiled a new, open-source toolkit designed for developers and data scientists who want to help spot trends in the ongoing COVID-19 pandemic. Built on developer-friendly Jupyter notebooks, the toolkit is designed to kick-start in-depth analysis. For instance, a user could analyze county-level data in the US to find correlations between poverty levels and infection rates.
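To make the poverty-versus-infection example concrete, here is a minimal sketch of the kind of correlation check such a notebook might run. It is not taken from IBM's toolkit: the `pearson` helper, the field names `poverty_pct` and `cases_per_100k`, and the county rows are all hypothetical placeholders, and the numbers are made up purely to exercise the code.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder rows; NOT real county statistics.
counties = [
    {"county": "A", "poverty_pct": 8.0, "cases_per_100k": 120.0},
    {"county": "B", "poverty_pct": 12.5, "cases_per_100k": 180.0},
    {"county": "C", "poverty_pct": 17.0, "cases_per_100k": 150.0},
    {"county": "D", "poverty_pct": 21.0, "cases_per_100k": 260.0},
]

r = pearson(
    [row["poverty_pct"] for row in counties],
    [row["cases_per_100k"] for row in counties],
)
print(f"Pearson r between poverty and infection rate: {r:.2f}")
```

A coefficient near 1 or -1 would indicate a strong linear relationship and a value near 0 a weak one. Either way, correlation alone does not establish cause, and a real analysis would need actual county data and controls for population, testing rates, and timing.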
"IBM and our team believe in the importance of democratizing technology, activating developers with the most up-to-date datasets and tools, which can help policymakers make the most informed decisions for citizens' well-being," Frederick Reiss, chief architect for IBM's Center for Open Source Data and AI Technologies, wrote in a blog post.
Because the data sets change daily, the notebooks download them each time they run. Moreover, the license terms of the data sets prohibit commercial entities from redistributing the data.
To help users keep their notebooks up to date with the latest information, IBM has also created data processing pipelines. For instance, as illustrated in the image below, a user could build a pipeline for county-level time series data for the United States. Each box represents a Jupyter notebook. A user can click on the arrow in the toolbar above the workflow to ship the entire set of notebooks to the cloud. From there, all the notebooks run on Kubeflow Pipelines, and the results are saved to the cloud provider's object storage.
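The idea of a pipeline in which each box is a notebook can be sketched as a small dependency graph. The example below is only an illustration of running steps in dependency order using Python's standard library; it does not use the Kubeflow Pipelines API, and the notebook file names and the `run_order` helper are hypothetical, not names from IBM's toolkit.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each key is a notebook, each value is the set of
# notebooks that must finish before it can start.
pipeline = {
    "download_data.ipynb": set(),
    "clean_data.ipynb": {"download_data.ipynb"},
    "time_series.ipynb": {"clean_data.ipynb"},
    "analyze.ipynb": {"time_series.ipynb"},
}


def run_order(dependencies):
    """Return the notebooks in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


for step, notebook in enumerate(run_order(pipeline), start=1):
    # A real pipeline runner would execute the notebook here.
    print(f"step {step}: {notebook}")
```

Expressing the workflow as a graph rather than a fixed script means a new notebook can be added by declaring what it depends on, and the ordering is recomputed automatically.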
"It's important to note that the underlying data for COVID-19 changes on a daily basis," Reiss wrote. "As you build your own analysis, you'll want to update the results of your own notebooks frequently."