Data about the COVID-19 pandemic is being aggregated and prepped at a rapid clip as tech vendors build a stack of analysis tools for amateur epidemiologists and data science wonks alike.
Here's the upshot: This novel coronavirus outbreak may be the most visualized ever.
The first data analysis dashboard and aggregation tool appeared shortly after COVID-19 broke out in China. The dashboard, courtesy of Johns Hopkins University, has become a go-to data source because it aggregates and visualizes data from WHO, CDC, ECDC, NHC, DXY, 1point3acres, Worldometers.info, BNO, state and national government health departments, and local media reports.
Johns Hopkins also published the underlying data on GitHub. Since that dashboard launched on January 23, COVID-19 has become arguably the most visualized pandemic data set. While data sets have been available from a variety of sources, the latest efforts focus on providing clean data that is ready for analysis.
Here's a tour of the various efforts:
- Tableau is taking the Johns Hopkins data and publishing a starter dashboard. Its contribution centers on preparing the data, making it available in various formats, and providing a visualization template.
- Esri is applying its mapping and geolocation expertise to COVID-19 tracking. Esri has also localized COVID-19 case data and combined it with bed availability data from Definitive Healthcare. The dashboard, which uses Esri's ArcGIS Business Analyst software, gives a snapshot of preparedness at the county level.
- Facebook and Carnegie Mellon anonymized user data to track COVID-19 symptoms across the US.
- Open-source data sets have also been helpful. Researchers and writers at The Atlantic are pulling together data from numerous sources using open-source software.
- GitHub hosts a series of novel coronavirus data sets, as do data.world and Kaggle; the latter also has competitions, forecasts and visualizations.
- Reddit's Data is Beautiful community highlights a bevy of visualizations from hobbyists and data scientists. Our World in Data also has a strong overview of COVID-19 research and data.
- Snowflake, a cloud data platform, announced that data services firm Starschema has listed a free data set that aims to be a single source of truth for COVID-19 incidence and mortality. The data can be augmented with population density and geolocation data.
- IBM has aggregated COVID-19 data and integrated it into The Weather Channel app, which will combine weather data with local novel coronavirus cases. Through the app, IBM's subsidiary can deliver relevant COVID-19 data to its 300 million monthly active users. IBM's visualizations echo efforts from Google and Microsoft Bing, which aim to bring COVID-19 data to the masses.
- ESO, a data software company that focuses on EMS, fire and hospital first responders, is tracking response data across the US. The data set covers responses from the pre-hospital stage through the hospital and is collected from 2,600 EMS agencies across the US, excluding California.
- The Institute for Health Metrics and Evaluation has a data set that looks at hospital bed use and the need for intensive care beds and ventilators due to COVID-19.
- C3.ai has created a unified data lake of all publicly available COVID-19 data sets. The data lake will be available April 13, with an update on May 15 that adds more data sets.
Data sets that will be aggregated into the C3.ai data lake include:
- Johns Hopkins University: COVID-19 Data Repository
- The Atlantic: COVID Tracking Project
- The New York Times: COVID-19 Data in the United States
- nCoV-2019 Data Working Group: Epidemiology Data
- MOBS Lab: COVID-19 Situation Report
- World Health Organization: Daily Situation Reports
- European Centre for Disease Prevention and Control: Worldwide Situation Updates
- University of Montreal: COVID-19 Image Data Collection
- National Center for Biotechnology Information Virus Database
- COVID-19 Open Research Dataset (CORD-19)
- Data Science for COVID-19: South Korea Dataset
- Indian Ministry of Health & Family Welfare: COVID-19 India
- Italian Civil Protection Department (Dipartimento della Protezione Civile): Coronavirus Emergency
- Data Science for COVID-19 Indonesia Initiative
- Kaiser Health: US Hospital ICU Beds
- HealthData.org: US Hospital Capacity
- Environmental Protection Agency: US Air Quality
- New York ISO: Electricity Load Data
- US Census Bureau: Population Data
- IEEE: COVID-19 Tweets Dataset
- University of Washington: COVID-19 Projections
- Kaiser Family Foundation: Social Distancing Policies