Unstructured data: the elephant in the Big Data room

Summary: Unstructured data is still ungovernable, a new survey finds. But enterprises are going to have to reckon with information in its many new forms.


Unstructured data is still ungovernable, a new survey finds. But enterprises are going to have to reckon with new forms of information in its many forms.

As part of my work with Unisphere Research, a division of Information Today, Inc., I helped conduct a new survey that finds unstructured data is growing at a faster clip than relational data -- driving the "Big Data" explosion. Thirty-five percent of respondents say unstructured information has already surpassed or will surpass the volume of traditional relational data in the next 36 months. Sixty-two percent say this is inevitable within the next decade. The survey gathered input from 446 data managers and professionals who are readers of Database Trends and Applications magazine, and was underwritten by MarkLogic.

Many organizations are becoming overwhelmed by the volume of unstructured information -- audio, video, graphics, social media messages -- that falls outside the purview of their "traditional" databases. Organizations that do get their arms around this data will gain a significant competitive edge. Most survey respondents acknowledge that unstructured information is growing out of control and driving the big data explosion. Ninety-one percent say unstructured information already lives in their organizations, but many aren't sure what to do about it.

A segment of companies, 16%, has already made unstructured data part of its business offerings.

There is growing concern across the business technology landscape about organizations' inability to effectively tap these new resources. Estimates released last month show that this year the Digital Universe (every electronically stored piece of data or file out there) will reach 1.2 million petabytes, or 1.2 zettabytes. That's up from a measly 800,000 petabytes in 2009. In a recent interview with MIT's Sloan Management Review, K. Ananth Krishnan of Tata Consultancy Services described what's at stake for businesses that fail to leverage their growing unstructured data stores:

"We are only looking at what we have in our data warehouses, it’s not going to be enough for us to get the insights that we need. If you’re a retailer and you were not using all the information you could to judge your customers’ buying patterns, then the retailer across the street probably will, and they’ll steal your customers. That’s the realization, I think, that drove a lot of people to think that they should be capturing much, much more."

In terms of technologies and governance, organizations don't feel they're ready for all this data. Many companies don't understand how to handle unstructured data and throw old technologies at the problem. With relational databases, companies are trying to tackle today's information challenges with 30-year-old technology.

The Unisphere/MarkLogic survey also found that 86% of respondents say unstructured data is important to their organization, yet only 11% have clear policies and procedures in place for managing it. In addition, 80% of respondents expect the amount of unstructured data to rise in the next three years, but only 24% believe their current infrastructure will be able to manage it adequately.

We still have a long way to go, Krishnan says in the Sloan Management Review article. Technology that can grasp this variety of data and pull insights out of it is still on the cutting edge:

"There are still loads of things that we can’t do. There is a whole aspect of computing which PhD students are working on, which is basically trying to understand text. Understand sentiment. A five-year-old child can say in 30 seconds whether Mom or Dad is angry, or happy, or whatever. Sense the mood in the room. A computer program still has a hard time figuring that out.... The analysis of text, the analysis of video, the analysis of audio — it works a lot better in James Bond movies. In real life, it is extremely hard from a fundamental computer science perspective to understand all that information."

Management awareness of the existence of this data, let alone of how it can benefit the business, is where we need to start. The Unisphere/MarkLogic survey finds that 40% of managers are unaware of the extent to which unstructured data exists in their organizations, and only 45% of organizations are moderately or strongly committed to leveraging this resource. That creates a competitive advantage for the companies that do.

Even organizations with higher concentrations of unstructured information struggle with corporate awareness of this data. They devote most of their resources to managing the smaller, structured portion of their data, while moving unstructured information into special-purpose databases or content management systems.

Cross-posted at the CBS SmartPlanet Business Brains site.

  • This issue is easily fixed; stop spying on your customers.

    So, having stalked us online for all this time, now the sales-weasels, marketing-drones, and their criminal-executive-offices don't know how to process all the illicit tracking data and invasive profiles that they have built on us?

    Awwww... I feel so sad for them.

    Once upon a time companies sold goods and services to valued customers without the need of tracking-cookies, web-bugs, pop-up/pop-under, flash advertisements, extensive personal profiling, and the inevitable criminal misconduct which comes with them. Corporate malfeasance now tops the list of dangers to the populace.

    TJX exposed 45+ million credit card numbers and private identity data to criminals... and was not even punished. That's *millions* of people victimized by just one such data collecting corporation.

    Hey, here's an idea, stop collecting and storing personal data on us!

    Here's a bold new marketing plan too: how about you build quality products and charge a fair price for them? That way you won't need to keep coming up with new ways to trick people into buying your foreign slave-labor-made junk...

    and it would solve your unstructured data storage and mining issues too.

    Just a suggestion [shrug] not that anyone in the board-room is listening to reason.

    • RE: Unstructured data: the elephant in the Big Data room


      You know they tracked you when you wrote this?
      • Of course...

  • RE: Unstructured data: the elephant in the Big Data room

    I can vouch for an entire herd of elephants in the CAD/CAM (Computer Aided Design and Manufacturing) sector. While there are PDM (Product Data Management) tools for larger corporations, it can be slim pickings for smaller businesses. And that does not cover the issues of legacy formats, design techniques (like first- or third-angle projection drawings), or just having a decent set of backups.
    Matthew A. Sawtell
  • RE: Unstructured data: the elephant in the Big Data room


    Ironically, the key to managing all this unstructured data is "more data", or, to be more specific, metadata. Media companies around the world have been grappling with this for a while (with limited success), but it is not just an issue there. Every company is dealing with media for marketing, learning and development, etc. Metadata standards are emerging from many sources, and harmonizing them is going to be a challenge.
  • RE: Unstructured data: the elephant in the Big Data room

    Useless article full of second-hand drivel and statistics so stupid only an MBA could make sense of them. Says nothing about real-world solutions. Assumes big data is being used to drive business solutions and not technical solutions. The real experts in these fields (not the author) know data has to be combed, aggregated, filtered, correlated, compared, fixed, patched, evaluated, re-correlated, validated, and integrated before it can even come close to improving your products, and they don't write whiny articles about it.
  • RE: Unstructured data: the elephant in the Big Data room

    The costs of managing these legacy systems are getting in the way: too much of the budget goes to maintenance, and not enough is left over for new development and technologies.
  • RE: Unstructured data: the elephant in the Big Data room

    Big Data are: complexity of big data solutions; difficulty of applying valid statistical models to the data; and having limited insight into the meaning of the data.