Big data: definitions and applications
Big data is commonly characterised by three vectors: volume, variety and velocity. Volume refers to the sheer amount of data; variety refers to its 'polystructured' nature (i.e. a mixture of structured, semi-structured and unstructured data such as text, audio and video); and velocity refers to the rate at which it is generated and analysed (which in some applications needs to be in real time, or near real time). Big data is not generally amenable to analysis in traditional SQL-queried relational database management systems (RDBMSs), which are primarily designed to handle smaller and more predictable flows of structured data. In particular, performance can suffer as the size or user population of an RDBMS grows. A variety of scalable database tools and techniques have therefore evolved, the best known being Apache's open-source Hadoop distributed data processing system (whose ecosystem includes the HBase database and Hive data warehouse system). A related set of non-relational databases goes under the NoSQL banner, leading examples being DynamoDB (Amazon), MongoDB, Neo4j, Couchbase and Cassandra (Apache).
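To make the 'variety' point concrete, here is a minimal sketch in Python using only the standard library. It contrasts a fixed-schema relational table queried with SQL against a handful of JSON records whose fields differ from one record to the next. The table name, field names and sample values are all invented for illustration, and the in-memory SQLite database merely stands in for an RDBMS; none of this is drawn from the products named above.

```python
import json
import sqlite3

# Structured data: every row fits one fixed schema and is queried with SQL.
# (Hypothetical "orders" table; SQLite stands in for a relational database.)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 4.5)],
)
totals = dict(
    conn.execute("SELECT customer, SUM(total) FROM orders GROUP BY customer")
)

# Semi-structured data: each JSON record carries its own set of fields,
# so no single table definition describes all of them.
# (Hypothetical records: a text review, a video upload, a sensor reading.)
documents = [
    '{"user": "alice", "text": "great service", "tags": ["review"]}',
    '{"user": "bob", "video_url": "clip.mp4", "duration_s": 42}',
    '{"sensor": "t-17", "reading": {"temp_c": 21.5}}',
]
parsed = [json.loads(doc) for doc in documents]
field_sets = [sorted(doc) for doc in parsed]

print(totals)
for fields in field_sets:
    print(fields)
```

The relational query works because every row has the same columns. The three JSON records share no common shape, which is the kind of data that document-oriented stores are intended to accommodate.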
There is also a relatively new job description, that of the data scientist, whose role is to orchestrate often disparate big data sources, perform analyses using the most appropriate tools, and present the results in digestible form (as dashboards, for example) to decision-makers. Data scientists are currently in short supply, however: a skills gap that leaves many organisations with few options other than to pay expensive consultancy rates or remain data-rich but information-poor. Consequently, there is much activity and interest in the area of 'self-service' big data analysis tools that can be used by non-specialists, and in bridging the two worlds of big data: internet-centric Hadoop/NoSQL and enterprise-centric SQL/RDBMS.
There are myriad kinds of big data that could deliver value if properly orchestrated. In the EMC/IDC study mentioned earlier, four classes are highlighted in addition to traditional transactional data in enterprise data warehouses: surveillance footage (useful in crime, retail and military applications, for example); data from embedded and medical devices (for real-time epidemiological studies, for example); information from entertainment and social media (mining the wisdom — or otherwise — of the crowds on multiple topics); and consumer images (if tagged and analysed when uploaded to public websites). To these we would add the increasing amounts of data generated by all manner of sensors in the fast-developing Internet of Things.
Big data in business today
If, as IDC and EMC estimate, there are millions of terabytes of usable data available for big data analysis today, has such analysis actually become part of the everyday fabric of business? A recent survey from Steria's Business Intelligence Maturity Audit (biMA), entitled 'Are European Companies Ready for Big Data?', gives a clue as to the current state of play in Europe.
When asked about the biggest business intelligence (BI) challenges facing them, respondents ranked big-data-related issues — pertaining to data velocity, volume and variety (shaded red, below) — the lowest:
Note that third in the list of challenges is 'internal competencies insufficient': that's a skills gap in the well-established field of business intelligence, not to mention the relatively new and less familiar area of big data analytics.
The BI data volumes in Steria's survey also suggest a low prevalence of big data activity, with only 16 percent of companies reporting volumes of more than 50TB in their analytical databases:
When asked to rank the relevance of big data, only 23 percent of respondents scored it positively (marked in red, below), compared to 51 percent who were cool on the idea (marked in blue):
Despite this moderate showing, Steria's respondents saw a wide range of potential benefits from big data, even if no single 'killer application' is apparent in this survey:
Although it's only one survey (see our own ZDNet/TechRepublic big data survey for another take), this Steria/biMA report tends to support the overall picture described earlier: there are plenty of potential benefits in big data, but it's not yet delivering value day in, day out, to ordinary businesses.
The big data market
The chances are, though, that big data will take its place in the mainstream of IT activities in due course. That's certainly the view of analyst firm IDC, which in March 2012 forecast big data to become a $17 billion business by 2015, with a compound annual growth rate (CAGR) of 39.4 percent over the preceding five years (since updated to $23.8 billion by 2016, with a CAGR of 31.7 percent).
Not surprisingly, the storage sector — servicing large-scale Hadoop clusters and other similar systems — shows the biggest forecast growth rate (61.4%), with servers bringing up the rear (27.3%). According to IDC, big data storage will account for 6.8 percent of the entire worldwide storage market by 2015.
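For readers who want to sanity-check forecasts of this kind, the short Python sketch below applies the standard compound annual growth rate relationship, end = start × (1 + CAGR)^years, to IDC's March 2012 figures quoted above. It assumes that 'the preceding five years' means 2010 to 2015. The function names are ours, and the implied 2010 base is a back-calculation, not a figure taken from the IDC report.

```python
def implied_start_value(end_value: float, cagr: float, years: int) -> float:
    """Work backwards from an end value and a CAGR to the starting value."""
    return end_value / (1.0 + cagr) ** years


def compound_annual_growth_rate(
    start_value: float, end_value: float, years: int
) -> float:
    """Standard CAGR formula: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# IDC's March 2012 forecast as quoted in the article: $17 billion by 2015
# at a 39.4 percent CAGR. Assumption: the five-year window is 2010-2015.
forecast_2015_bn = 17.0
quoted_cagr = 0.394
window_years = 5

base_2010_bn = implied_start_value(forecast_2015_bn, quoted_cagr, window_years)
round_trip = compound_annual_growth_rate(
    base_2010_bn, forecast_2015_bn, window_years
)

print(f"Implied 2010 market size: ${base_2010_bn:.2f} billion")
print(f"CAGR recovered from start and end values: {round_trip:.1%}")
```

Under that assumption, the forecast implies a 2010 market of a little over $3 billion, growing more than fivefold in five years. The same two functions can be applied to the storage and server growth rates, provided a base-year figure is available.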