Software Carpentry for the people

Big data is pervasive, but the skills to use it aren't. Software Carpentry is programming literacy for big data. And important little data.

Thanks to the confluence of Moore's Law, faster algorithms, scale-out storage and faster networks, big data is becoming pervasive. But how do we train more people to use it?

Take a group of scientists, say, geneticists, working on a few hundred terabytes of sequencer data. The time to results is increasingly dominated by the time it takes to write and test the software required to gather and analyze the data.

Google and Amazon don't have this problem, because they hire computer science PhDs. Geneticists - or other scientists - don't want to be computer scientists, but now they need programming skills to do their jobs.

That's the topic of a great paper by Greg Wilson, Software Carpentry: Lessons Learned. Greg works at Mozilla and has been working on this for 15 years.

The idea is Software Carpentry, a program that trained over 4300 scientists last year to use computers more efficiently: program design, scripting, version control, testing, and other unsexy but basic software skills.

This is important for productivity and correctness.

Getting it wrong
Several years ago two Harvard economists - Reinhart and Rogoff - reported that a national debt over 90 percent of GDP crushed economic growth. In response, during the Great Recession, policy makers forced countries to cut spending and debt, inflicting pain on millions of people.

But these smart Harvard economists were wrong: their spreadsheet didn't analyze all the data and, when corrected, the effect disappeared. In the meantime millions of eager and expensively educated young people are unemployed, a grievous waste of lives, talent and potential, thanks to a bad spreadsheet.
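
As a minimal sketch of the kind of check that basic testing skills make routine, here is a scripted average that refuses to run when rows are missing. Everything in it is an assumption made up for illustration: the country labels, the growth figures and the average_growth helper are hypothetical, and this is not the economists' data or method.

```python
# Hypothetical illustration: the labels and growth figures below are invented
# for this sketch and are NOT the Reinhart-Rogoff data.
from statistics import mean

growth_by_country = {
    "A": 2.0,
    "B": -1.0,
    "C": 3.0,
    "D": 4.0,
    "E": 2.0,
}


def average_growth(rows, expected_count):
    """Average the growth figures, refusing to run on an incomplete selection."""
    if len(rows) != expected_count:
        raise ValueError(f"expected {expected_count} rows, got {len(rows)}")
    return mean(list(rows.values()))


# The full data set passes the completeness check.
full = average_growth(growth_by_country, expected_count=5)

# Simulate a spreadsheet range that stops two rows short.
truncated = dict(list(growth_by_country.items())[:3])
try:
    average_growth(truncated, expected_count=5)
    caught = False
except ValueError:
    caught = True
```

The point is not the arithmetic but the habit: a script states how many rows it expects and fails loudly when the selection is short, where a spreadsheet range silently averages whatever it was given.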

Getting software right
The Software Carpentry training materials are online, designed to be delivered in a two-day training. The goal is to give smart people who don't want to be computer scientists the skills needed to make them productive and transparent in a computer-pervasive age.

The Storage Bits take
Big data makes it possible to know things that could only be guessed at 10 years ago - and to know them NOW, rather than 5 years from now. But we don't have enough computer scientists and statisticians to write the code needed to make sense of all the data we can now gather, store and analyze.

Software Carpentry is a pragmatic response to a real problem. We need all the help we can get.

Comments welcome, as always. Got any good software horror stories?