Big data: An overview

Summary: Data is being generated about the activities of people and inanimate objects on a massive and increasing scale. We examine how much data is involved, how much might be useful, what tools and techniques are available to analyse it, and whether businesses are actually getting to grips with big data.

Big data: definitions and applications

Big data is commonly characterised by three vectors: volume, variety and velocity. Volume refers to the sheer amount of data; variety refers to its 'polystructured' nature (i.e. a mixture of structured, semi-structured and unstructured data such as text, audio and video); and velocity refers to the rate at which it is generated and analysed (which in some applications needs to be in real time, or near real time).

Big data is not generally amenable to analysis in traditional SQL-queried relational database management systems (RDBMSs), which are primarily designed to handle smaller and more predictable flows of structured data. In particular, performance can suffer as the size or user population of an RDBMS grows. A variety of scalable database tools and techniques have therefore evolved, the best known being Apache's open-source Hadoop distributed data processing system (which includes the HBase database and Hive data warehouse system). A related set of non-relational databases goes under the NoSQL banner, leading examples being DynamoDB (Amazon), MongoDB, Neo4j, Couchbase and Cassandra (Apache).

[Image: Hadoop logo]
Hadoop: the elephant in the Big Data room

There is also a relatively new job description, that of the data scientist, whose role is to orchestrate often disparate big data sources, perform analyses using the most appropriate tools, and present the results to decision-makers in digestible form (as dashboards, for example). Data scientists are currently in short supply, however: a skills gap that leaves many organisations with few options other than to pay expensive consultancy rates or remain data-rich but information-poor. Consequently, there is much activity and interest in the area of 'self-service' big data analysis tools that can be used by non-specialists, and in converging the two strands of the database world: internet-centric Hadoop/NoSQL and enterprise-centric SQL/RDBMS.

There are myriad kinds of big data that could deliver value if properly orchestrated. In the EMC/IDC study mentioned earlier, four classes are highlighted in addition to traditional transactional data in enterprise data warehouses: surveillance footage (useful in crime, retail and military applications, for example); data from embedded and medical devices (for real-time epidemiological studies, for example); information from entertainment and social media (mining the wisdom — or otherwise — of the crowds on multiple topics); and consumer images (if tagged and analysed when uploaded to public websites). To these we would add the increasing amounts of data generated by all manner of sensors in the fast-developing Internet of Things.

Big data in business today

If, as IDC and EMC estimate, there are millions of terabytes of usable data available for big data analysis today, has it actually become part of the everyday fabric of business? A recent survey from Steria's Business Intelligence Maturity Audit (biMA), entitled Are European Companies Ready for Big Data?, gives a clue as to the current state of play in Europe.

When asked about the biggest business intelligence (BI) challenges facing them, respondents ranked big-data-related issues — pertaining to data velocity, volume and variety (shaded red, below) — the lowest:

[Chart: biggest BI challenges as ranked by survey respondents]
Source: Steria/biMA, 2013. Survey period: Nov 2012-Jan 2013 • Participants: 668 • Countries: Germany/Austria/Switzerland (47%), France (18%), Great Britain (13%), Scandinavia (10%), Poland (8%) • Industries: IT (22%), manufacturing (18%), public sector (13%) • Company size (employees): <250 (27%), 251-2,500 (28%), 2,501-10,000 (23%), >10,000 (23%)

Note that third in the list of challenges is 'internal competencies insufficient': that's a skills gap in the well-established field of business intelligence, let alone in the relatively new and less familiar area of big data analytics.

The BI data volumes in Steria's survey also suggest a low prevalence of big data activity, with only 16 percent of companies reporting volumes of more than 50TB in their analytical databases:

[Chart: BI data volumes in respondents' analytical databases]
Source: Steria/biMA, 2013

When asked to rank the relevance of big data, only 23 percent of respondents scored it positively (marked in red, below), compared to 51 percent who were cool on the idea (marked in blue):

[Chart: relevance of big data as rated by respondents]
Source: Steria/biMA, 2013

Despite this moderate showing, Steria's respondents saw a wide range of potential benefits from big data, even if no single 'killer application' is apparent in this survey:

[Chart: potential benefits of big data seen by respondents]
Source: Steria/biMA, 2013

Although it's only one survey (see our own ZDNet/TechRepublic big data survey for another take), this Steria/biMA report tends to support the overall picture described earlier: there are plenty of potential benefits in big data, but it's not yet delivering value day in, day out, to ordinary businesses.

The big data market

The chances are, though, that big data will take its place in the mainstream of IT activities in due course. That's certainly the view of analyst firm IDC, which in March 2012 forecast big data to become a $17 billion business by 2015 with a compound annual growth rate (CAGR) of 39.4 percent over the preceding five years (since updated to $23.8 billion by 2016 with a CAGR of 31.7 percent):

[Chart: IDC big data revenue forecast]
Source: Worldwide Big Data Technologies and Services: 2012-2015 Forecast (IDC, 2012)
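To make the growth arithmetic concrete, here is a minimal Python sketch of how a CAGR relates a starting value to an end value. It assumes annual compounding over exactly five years (2010 to 2015); the function names are our own, chosen for illustration, and the only inputs taken from the article are the $17 billion forecast and the 39.4 percent CAGR. On those assumptions, the implied 2010 market size comes out at a little over $3 billion.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values, as a fraction."""
    return (end_value / start_value) ** (1 / years) - 1


def implied_start(end_value, rate, years):
    """Starting value implied by an end value and a CAGR (rate as a fraction)."""
    return end_value / (1 + rate) ** years


# Figures quoted in the article: $17bn by 2015, CAGR of 39.4 percent.
# Assumption: five annual compounding periods, 2010 to 2015.
FORECAST_2015_BN = 17.0
QUOTED_CAGR = 0.394
YEARS = 5

base_2010_bn = implied_start(FORECAST_2015_BN, QUOTED_CAGR, YEARS)
print(f"Implied 2010 market size: ${base_2010_bn:.2f}bn")

# Round trip: recomputing the CAGR from the implied base recovers the quoted rate.
print(f"Recomputed CAGR: {cagr(base_2010_bn, FORECAST_2015_BN, YEARS):.1%}")
```

The same two functions can be applied to the updated forecast ($23.8 billion by 2016 at 31.7 percent), provided the number of compounding years is known.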

Not surprisingly, the storage sector — servicing large-scale Hadoop clusters and other similar systems — shows the biggest forecast growth rate (61.4%), with servers bringing up the rear (27.3%). According to IDC, big data storage will account for 6.8 percent of the entire worldwide storage market by 2015.

About

Charles has been in tech publishing since the late 1980s, starting with Reed's Practical Computing, then moving to Ziff-Davis to help launch the UK version of PC Magazine in 1992. ZDNet came looking for a Reviews Editor in 2000, and he's been here ever since.

Talkback

6 comments
  • Riding high in the Hype

    I certainly have no problem agreeing that Big Data is coming close to the Peak of Inflated Expectations. That is why businesses need to accurately plan and research before they leap into a half-baked project.

    I wanted to share a video that I think can be helpful for your readers that deals with planning and executing a Big Data program. (http://www.youtube.com/watch?v=Ow76L0IEZNY) This video is based on TEKsystems research and delivers the message in a cute way through multiple sci-fi references. It gives more realistic expectations of how to begin to approach a Big Data initiative, backed up by research from leaders in the industry.
    TechGuy1313
  • was the America's Cup the first use-case of big data in sport?

    This is an interesting, well-researched article with good data points. On the subject of how big data will be used and what value it will add, there is an interesting theory that Oracle just won the America's Cup by using big data. Below is a blog post on this topic (which I contributed to).
    http://bit.ly/1hjKy7G
    jamesr26
  • What the Gartner hype cycle misses out

    Of course certain hyped things prove to be nothing but hype and after the trough of disillusionment disappear without trace.

    Big data is probably one of those.

    Of course no one likes to talk about the computer industry getting something absolutely and fundamentally wrong.
    jorwell
  • What the Gartner hype cycle misses out

    Of course no one likes to talk about the computer industry getting something absolutely and fundamentally wrong.
    Big data is probably one of those.
    a418887065
  • Big Data - Businesses are not ready yet

    Considering that most businesses are not even ready for traditional-style BI, I think Big Data and the open source technologies orbiting it have a long way to go. However, the path is set and sooner or later Big Data will become the industry norm.

    Our company http://www.sqiar.com/ helps small and medium-sized organizations take their first steps into BI with confidence. We provide consultancy in proprietary and open source technologies.
    BizzIntelSQIAR
  • Big Data

    I guess these are merely favorable..I am just astounded by their services and functions too..Check 'em http://www.flashpanel.com
    MiltonKer