Tracking the real US coronavirus testing numbers with open source

We not only don't know how many people have coronavirus in the US, but we also don't even know how many have been tested. So, researchers, using open-source tools, are digging out the real numbers for us.
Written by Steven Vaughan-Nichols, Senior Contributing Editor

Want to know something scary? We really don't even know how many people have been tested for the coronavirus, never mind how many have it. Despite the Trump administration's promise of millions of tests and President Donald Trump's claims that anyone can get tested for COVID-19, it's clear there still aren't enough tests available.

Fortunately, researchers and Atlantic writers are pulling together data from numerous sources and using open-source software to give us the most accurate possible numbers on those tested, those found to be ill, and those who haven't gotten it. 

Isn't this the job of the gutted Centers for Disease Control and Prevention (CDC)? Yes. But, with insufficient resources, thanks to Trump's CDC budget cuts, it's no longer trying.

The CDC US coronavirus site states: 

"CDC is no longer reporting the number of persons under investigation (PUIs) that have been tested, as well as PUIs that have tested negative. Now that states are testing and reporting their own results, CDC's numbers are not representative of all testing being done nationwide." 

If it's not tracking the numbers, then who is? Open-source developers and allies -- with The Covid Tracking Project.

The project collects information from all 50 US states, the District of Columbia, and five other US territories to provide the most comprehensive testing data it can collect for the novel coronavirus, SARS-CoV-2. It attempts to include positive and negative results, pending tests, and total people tested for all of them. 

It's not perfect. With the CDC no longer sharing complete testing data, each state and region is reporting in its own way.

The developers explain:

"The information is patchy and inconsistent, so we're being transparent about what we find and how we handle it -- the spreadsheet includes our live comments about changing data and how we're working with incomplete information."

They also have Best Practices suggestions on how the data should be reported. Hopefully, the state health departments will follow their guidelines. 

The program itself uses Ruby to crawl state websites and update the project's coronavirus (COVID-19) testing data. The API is wrapped in R, a statistical computing language.

If you want to see the data for yourself, it's available both as raw data in a Google spreadsheet and, via an API, in JSON and CSV.
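For readers who want to work with the JSON or CSV output, here is a minimal Python sketch of how the two formats could be read into the same shape. The field names (`state`, `positive`, `negative`, `pending`, `death`) and the sample values are assumptions made up for illustration, not the project's documented schema or real figures. Because the developers describe the information as patchy, the sketch keeps missing values as `None` rather than treating them as zero.

```python
import csv
import io
import json

# Hypothetical field names, assumed for illustration only.
COUNT_FIELDS = ("positive", "negative", "pending", "death")


def to_count(value):
    """Convert a raw value to int, keeping gaps (None or blank) as None."""
    if value is None or value == "":
        return None
    return int(value)


def normalize(record):
    """Reduce one raw record to a state code plus integer-or-None counts."""
    row = {"state": record.get("state", "")}
    for field in COUNT_FIELDS:
        row[field] = to_count(record.get(field))
    return row


def load_json(text):
    """Parse a JSON array of per-state records."""
    return [normalize(record) for record in json.loads(text)]


def load_csv(text):
    """Parse CSV text whose header row names the fields."""
    return [normalize(record) for record in csv.DictReader(io.StringIO(text))]


# Placeholder records for a made-up state "XX"; these are not real figures.
sample_json = '[{"state": "XX", "positive": 10, "negative": 90, "pending": null, "death": 1}]'
sample_csv = "state,positive,negative,pending,death\nXX,10,90,,1\n"

json_rows = load_json(sample_json)
csv_rows = load_csv(sample_csv)
```

Both loaders return the same list of dictionaries, so whatever analysis comes next doesn't need to care which format the data arrived in.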

Now, more than ever, we need real hard data as we face an increasingly dangerous and uncertain world. Thanks are owed to these individuals for standing up to do the work when our government fails us.

And what are the results? As of the afternoon of March 19, there have been 103,945 total COVID-19 tests reported. Of those, 11,723 people were infected, 89,197 were not infected, 3,025 are still awaiting results, and 160 have died.
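As a quick sanity check on those figures, the following Python sketch adds up the positive, negative, and pending counts and works out the share of completed tests that came back positive. The class and property names are hypothetical, chosen for this example; only the four numbers come from the March 19 figures above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestingSnapshot:
    """One point-in-time set of reported testing counts."""
    positive: int
    negative: int
    pending: int
    deaths: int

    @property
    def total(self) -> int:
        # Total tests reported = completed tests plus those awaiting results.
        return self.positive + self.negative + self.pending

    @property
    def positive_share_of_completed(self) -> float:
        # Share of completed (non-pending) tests that were positive.
        completed = self.positive + self.negative
        return self.positive / completed if completed else 0.0


# Figures reported for the afternoon of March 19.
march_19 = TestingSnapshot(positive=11_723, negative=89_197, pending=3_025, deaths=160)

print(f"Total tests reported: {march_19.total:,}")
print(f"Positive share of completed tests: {march_19.positive_share_of_completed:.1%}")
```

The three categories do sum to the reported 103,945, and about 11.6% of completed tests were positive. That share describes only the people who managed to get tested, so it says little about how widespread the infection is in the population as a whole.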
