Big data ascends the learning curve

Analysis of Twitter postings throughout 2012 shows that interest in big data is booming, but the emphasis for now is on learning more about the topic

With several big data-themed events coming up this month — among them the global multi-city Big Data Week series — the level of social media chatter around this topic is likely to surge. One vendor that's well placed to track that buzz is DataSift, whose stock-in-trade is analyzing Twitter data to identify trends. In preparation for EuroCloud UK's own big data meetup tomorrow (see my previous post for details), DataSift's Tim Barker sent me the infographic below (click on the image to enlarge). This presents several findings from its analysis of every tweet throughout 2012 that mentioned big data.

Infographic of Big Data tweets 2012

The main message from the analysis is that the world is still learning about big data, and that probably means we're in the early adopter phase and not yet at the peak of the notorious hype cycle. Here's a quick rundown of some of the most striking findings:

  • After staying almost flat from Q1 to Q2, tweets about big data surged 25% in both Q3 and Q4.
  • 72% of tweets included links, showing the conversations were mostly about sharing information resources.
  • The most shared articles of the year were largely explaining, exploring or 'mythbusting' the topic.
  • Hadoop ensured that Apache was the most mentioned vendor, but MongoDB developer 10gen took a strong second place.
  • IBM's active content marketing strategy doubtless helped it beat out HP, Teradata, Oracle and EMC for mentions.
  • Splunk's IPO will have helped its visibility. Conversely, HP's spat over Autonomy generated the most negative tweets on big data.
  • Japan's habitual preference for build-your-own solutions gave Cloudera prominence over rival vendors in that market.
  • Splunk outperformed in the US, DataSift in the UK, SAP in Germany and (why?) IBM in France.

DataSift maintains what it claims is Europe's biggest Hadoop cluster (comment in Talkback below if you know otherwise!). It says that every tweet is stored along with an average of 72 data items relating to it, and that it stores around four terabytes of data every single day. In its analysis of 2012 tweets on big data, it found over 2 million interactions involving just under one million authors. Peak activity was just over 3000 tweets an hour, almost one per second, but that's just a drop in the ocean of 8000+ items per second that DataSift monitors in total.
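
To put that "drop in the ocean" in proportion, here is a minimal back-of-the-envelope sketch in Python. It uses only the two rates quoted above and treats "just over 3000" and "8000+" as exactly 3000 and 8000, which is an assumption made for the sake of the arithmetic; the constant and function names are illustrative and not anything DataSift publishes.

```python
# Figures quoted in the post, rounded: "just over 3000" and "8000+"
# are treated as exact values here (an assumption for illustration).
PEAK_BIG_DATA_TWEETS_PER_HOUR = 3000
TOTAL_ITEMS_PER_SECOND = 8000
SECONDS_PER_HOUR = 3600


def per_second(per_hour):
    """Convert an hourly rate into a per-second rate."""
    return per_hour / SECONDS_PER_HOUR


def share_of_stream(per_hour, total_per_second):
    """Fraction of the total monitored stream that an hourly rate represents."""
    return per_second(per_hour) / total_per_second


peak_rate = per_second(PEAK_BIG_DATA_TWEETS_PER_HOUR)
peak_share = share_of_stream(PEAK_BIG_DATA_TWEETS_PER_HOUR, TOTAL_ITEMS_PER_SECOND)

print(f"{peak_rate:.2f} big data tweets per second at peak")
print(f"{peak_share:.4%} of everything monitored")
```

On those rounded figures, the peak works out to roughly 0.83 tweets per second, or about one hundredth of one percent of the total stream.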