In a very different kind of "big data" talk from the other SXSW 2013 big data presentations, Splunk developer Ed Hunsinger explained how he gathered the personal data produced by everything from sleep machines to Foursquare and turned his everyday habits into informative data visualizations.
By turns funny (his dismal rate of catbox cleaning) and incredibly useful (his rate of maintaining 'inbox zero'), Hunsinger's Playing With Your Own Big Data held the audience in rapt attention as the big data dev showed what he has done with his employer's tools, specifically Splunk Storm.
Every day we continue to increase our own personal data footprints, sometimes actively and sometimes without even realizing it.
Big companies are already collecting your data and using it to make more money. Why not collect this data (and more) and use it to your own benefit?
Hunsinger looked at his everyday life and grabbed data from everything he could, like Foursquare, Twitter and more.
He acquired as many devices as he could find to measure his sleep, such as the Zeo Sleep Machine and the Fitbit wireless activity tracker.
When data wasn't easy to get from devices, APIs or apps, he wrote custom Python scripts, which he collected and shared on GitHub.
Then in his all-too-short SXSW presentation, Hunsinger showed some of his own big data, visualized in charts and graphs — the results of cleaning up his data and dumping it into Splunk Storm (and sometimes the desktop version, Splunk).
Examples in Playing With Your Own Big Data included:
- Hunsinger's weight measured by Withings, which showed his average weight over time.
- His location with Google Latitude and his check-ins via Foursquare.
- The unread vs. read email inbox count (Hunsinger's quest for 'inbox zero'), tracked with a custom Python script that sends the counts to Splunk Storm.
- @edrabbit Tweets, for which he showed a chart of number of tweets per week, pointing out the large spikes of increased use over the duration of SXSW.
- Hunsinger's own custom iPhone web app for tracking things manually.
- His humorous graph of a Blue Angels noise detection system that he built with a Raspberry Pi, a USB mic, and Splunk Storm along with friend Greg Albrecht (the Blue Angels are the United States Navy's flight demonstration squadron, known for ultra-loud practice flyovers in San Francisco during October's Fleet Week).
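For a sense of what the inbox-zero tracker in the list above might involve, here is a minimal sketch. It is not Hunsinger's actual script: it assumes a mailbox reachable over IMAP, the host and credentials are placeholders, and the function names and the key=value line format are my own inventions. It only produces a log line; how that line gets into Splunk Storm is left out.

```python
import imaplib
from datetime import datetime, timezone


def format_event(total, unread, when=None):
    """Render one inbox sample as a timestamped key=value log line.

    The field names here are illustrative, not a Splunk requirement.
    """
    when = when or datetime.now(timezone.utc)
    read = total - unread
    return f"{when.isoformat()} source=inbox total={total} unread={unread} read={read}"


def count_inbox(host, user, password):
    """Return (total, unread) message counts for INBOX over IMAP.

    host, user and password are placeholders supplied by the caller.
    """
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        # readonly=True so that checking the counts never marks mail as read
        _, data = conn.select("INBOX", readonly=True)
        total = int(data[0])
        # UNSEEN is the standard IMAP search key for unread messages
        _, data = conn.search(None, "UNSEEN")
        unread = len(data[0].split())
        return total, unread
    finally:
        conn.logout()
```

Run on a schedule, something like `print(format_event(*count_inbox(host, user, password)))` would append one sample per run to a log, which is the kind of time series a read-vs-unread chart needs.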
Hunsinger told me he didn't have enough time in the SXSW talk to include everything he's tracking, and that he's also logging:
(...) my bike rides with RunKeeper, sleep patterns with Zeo, music with last.fm, my heart rate through an exercise heart rate monitor (I'm really hoping to get a basis soon), my posture with a LUMOback and steps walked with a Fitbit.
Getting his data into Splunk Storm wasn't as easy as hitting "export" in an app and "import" in Splunk Storm, but Hunsinger explained the process in a way that makes it seem surprisingly easy for anyone to try.
To make the cool charts in his Playing With presentation, Ed used different kinds of hardware for data collection, along with the APIs and export functionality of various services. He told me:
I used provided APIs or export functionality with the different services to get my data out. This is the tricky part as some services are hesitant to release their data.
But, they all use different formats, so I had to clean up the data for dumping into Splunk Storm.
Basically I wrote a bunch of Python scripts (http://github.com/edrabbit) that fetch and/or clean the data so I can import it into Splunk Storm.
The need to write my own sanitation scripts points to a bigger picture. Most of these services want to keep you silo'ed at the expense of the customer.
My hope is that these companies can figure out a business model that lets them release the data to the user and then the best software out there for working with that data can win.
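To make the cleanup step Hunsinger describes a little more concrete, here is a rough sketch of the general idea: take exports in different formats and turn each record into the same timestamped key=value line. This is not code from his GitHub repository. The CSV and JSON layouts, the column names, the source labels, and the assumption that all timestamps are in UTC are hypothetical, chosen only for illustration.

```python
import csv
import io
import json
from datetime import datetime, timezone


def to_event(ts, source, fields):
    """Render one record as 'ISO-timestamp source=... key=value ...'."""
    pairs = " ".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{ts.astimezone(timezone.utc).isoformat()} source={source} {pairs}"


def from_csv(text, source):
    """Hypothetical CSV export with a 'date' column like '2013-03-10 07:30'."""
    for row in csv.DictReader(io.StringIO(text)):
        # Assumption: the export's dates are already in UTC
        ts = datetime.strptime(row.pop("date"), "%Y-%m-%d %H:%M")
        yield to_event(ts.replace(tzinfo=timezone.utc), source, row)


def from_json(text, source):
    """Hypothetical JSON export: a list of objects with an epoch 'timestamp'."""
    for rec in json.loads(text):
        ts = datetime.fromtimestamp(rec.pop("timestamp"), tz=timezone.utc)
        yield to_event(ts, source, rec)
```

Once every service's data comes out in one shape, a single import path can handle all of it, which is presumably why the cleaning scripts matter more than any individual export button.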
It's true that most people feel uneasy knowing that nearly everything they do is measured and collected into data sets, and that the companies doing the collecting and analyzing may know more about their personal habits than they do.
I'm hoping Ed Hunsinger's self-experiments can start to change this imbalance.
If we can make getting to know our own data fun and easy, and use it to better our lives, that's worth a lot.
It could also raise awareness about what everyone's been giving away, and why they might want to care about what's done with their own big data.
Or, at least it might give Ed's wife a week off from catbox duty.
Disclosure: I occasionally cat-sit for Mr. and Mrs. Hunsinger.