Digging for gold in DataSift's Twitter archive

DataSift is unlocking two years of Twitter archive for pay-as-you-go analysis using its big data tools, opening up a social media goldmine to enterprises and entrepreneurs
Written by Phil Wainewright, Contributor

Knowledge has long been seen as a source of power. Today, big data is its fuel. Harnessing that raw resource will bring significant rewards to those who are the first to uncover the richest seams of knowledge.

DataSift's launch today of its Historics service, which allows customers to mine Twitter's past two years of global tweets for hidden trends and effects, opens up a rich new field of data. There's concern in some quarters, too, about the privacy implications of probing long-forgotten tweets for meaning. DataSift will have to allay fears about individuals being targeted by the service, while at the same time making the case for the value of analyzing such a massive resource of aggregate expression.

One of the most compelling illustrations of how valuable that collective data could be comes from this chart (see picture: source DataSift) mapping Twitter sentiment against stock price in the wake of BlackBerry maker RIM's announcement last month that its co-CEOs were resigning. As expected, RIM's stock price (shown by the blue line) dropped when markets opened following the news. But what's interesting is to look at the green line, which shows the volume of tweets commenting on the story in a positive light, and to see what happens after that line shows a sudden surge in positive sentiment. Five minutes later, the stock price hits a bottom and starts a recovery. A later fall back to the same support level again results in a second bounce as sentiment remains positive.

Of course, this is just one dataset, so the correlation could be completely random. But the idea of analyzing Twitter sentiment to forecast market movements is already an established science. The difference now is that DataSift's Historics service, when it launches publicly in a couple of months' time, will allow anyone to put the theory to the test on a pay-as-you-go basis. You just have to pick a selection of news events from the past two years, map Twitter sentiment against stock price movements and, if you find a correlation, you have the makings of a trading algorithm that might make a fortune by closing and opening your trading positions those few minutes faster. Well, maybe it won't be quite so easy, but the point is that this was never even an option before unless you had your own massive big data resources.
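For readers who want a feel for what putting the theory to the test might involve, here is a minimal sketch in Python. It assumes you have already obtained two per-minute series for a single news event: a count of positive tweets and the minute-by-minute change in stock price. The function names (pearson, lagged_correlation, best_lag) are hypothetical and have nothing to do with DataSift's own interface, and the sketch only measures correlation at different time lags. It says nothing about whether a correlation found in the past would hold up in live trading.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("series must have the same length")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(sentiment, price_change, lag):
    """Correlate sentiment at minute t with price change at minute t + lag."""
    if len(sentiment) != len(price_change):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(sentiment) - 1:
        raise ValueError("lag must leave at least two overlapping points")
    if lag == 0:
        return pearson(sentiment, price_change)
    return pearson(sentiment[:-lag], price_change[lag:])


def best_lag(sentiment, price_change, max_lag):
    """Return (lag, correlation) for the lag with the highest correlation."""
    scores = {
        lag: lagged_correlation(sentiment, price_change, lag)
        for lag in range(max_lag + 1)
    }
    lag = max(scores, key=scores.get)
    return lag, scores[lag]
```

Run over a selection of events, a lag that keeps coming out on top with a strong correlation would be the kind of pattern worth investigating further; a lag that shows up once, as in the RIM chart, could easily be chance.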

Most of DataSift's customers will be using the Historics resource for rather more mundane purposes, such as measuring responses to past product launches and finding out which tactics worked best and worst. The most valuable information coming out of Twitter is what's being tweeted today, but having access to a two-year archive allows companies and entrepreneurs to find patterns from the past to help them make more sense of today's tweet stream.

As well as its unique access to the archive (which grew out of the early relationship between Twitter and retweeting service Tweetmeme), DataSift offers big data resources so that its customers can cut straight to the analysis. Information Age reports that the Historics service "sits on a Hadoop cluster with over half a petabyte's worth of storage." DataSift adds context such as sentiment measures, link content, Klout ratings, gender and location, and provides an interface as well as an API to help customers filter the precise information they need out of Twitter's 250 million daily tweets. "You need to build what DataSift has built to consume that," newly appointed CMO Tim Barker told me in a phone call yesterday. "[Customers] typically want a drink or a gallon of the firehose, not the whole piece."

Barker spent five years as VP of EMEA marketing at Salesforce.com after it acquired Koral Software, which he co-founded. He says the jump back to join a new start-up last week was motivated by the untapped potential he sees in DataSift. "I really was blown away by the technology they've created," he told me, "but also by the lack of public awareness ... I really see them as a diamond in the rough."

What's fascinating is that even Barker himself admits he has no way of knowing what that potential really is. "I don't know what is even possible with this technology." He sees the company carving out a role as a "social data platform" that, rather than offering a finished application, will instead provide a platform on which customers and partners will build new capabilities. "We've focused on having the combination of enterprise and entrepreneur. To make this technology easily accessible and consumable is critical." If big data from social media becomes a gold rush, then DataSift aims to provide the picks and shovels prospectors will use to seek their fortunes.
