Unstructured data: the elephant in the Big Data room

Unstructured data is still ungovernable, a new survey finds. But enterprises are going to have to reckon with information in its many new forms.
Written by Joe McKendrick, Contributing Writer

As part of my work with Unisphere Research, a division of Information Today, Inc., I helped conduct a new survey that finds unstructured data is growing at a faster clip than relational data -- driving the "Big Data" explosion. Thirty-five percent of respondents say unstructured information has already surpassed or will surpass the volume of traditional relational data in the next 36 months. Sixty-two percent say this is inevitable within the next decade. The survey gathered input from 446 data managers and professionals who are readers of Database Trends and Applications magazine, and was underwritten by MarkLogic.

Many organizations are becoming overwhelmed with the volumes of unstructured information -- audio, video, graphics, social media messages -- that fall outside the purview of their "traditional" databases. Organizations that do get their arms around this data will gain a significant competitive edge. A majority of survey respondents acknowledge that unstructured information is growing out of control and is driving the big data explosion: 91% say unstructured information already lives in their organizations, but many aren’t sure what to do about it.

A segment of companies, 16%, have made unstructured data part of their actual business offerings.

There is growing concern across the business technology landscape about organizations' inability to effectively tap these new resources. Estimates released last month show that this year the Digital Universe -- meaning every electronically stored piece of data or file out there -- will reach 1.2 million petabytes, or 1.2 zettabytes. That’s up from a measly 800,000 petabytes in 2009. In a recent interview with MIT's Sloan Management Review, K. Ananth Krishnan of Tata Consultancy Services described what's at stake for businesses that fail to leverage their growing unstructured data stores:

"We are only looking at what we have in our data warehouses, it’s not going to be enough for us to get the insights that we need. If you’re a retailer and you were not using all the information you could to judge your customers’ buying patterns, then the retailer across the street probably will, and they’ll steal your customers. That’s the realization, I think, that drove a lot of people to think that they should be capturing much, much more."

In terms of technologies and governance, organizations don't feel they're ready for all this data. Many companies don’t understand how to handle unstructured data and throw old technologies at the problem. With relational databases, companies are attempting to use 30-year-old technology to tackle today’s information challenges.

The Unisphere/MarkLogic survey also found that 86% of respondents admit that unstructured data is important to their organization, yet only 11% have clear procedures and policies in place for managing it. In addition, 80% of respondents know the amount of unstructured data will rise in the next three years, but only 24% believe their current infrastructure will be able to adequately manage it.

We still have a long way to go, Krishnan says in the Sloan Management Review article. Technology that can grasp and pull insights out of this variety of data is still on the cutting edge:

"There are still loads of things that we can’t do. There is a whole aspect of computing which PhD students are working on, which is basically trying to understand text. Understand sentiment. A five-year-old child can say in 30 seconds whether Mom or Dad is angry, or happy, or whatever. Sense the mood in the room. A computer program still has a hard time figuring that out.... The analysis of text, the analysis of video, the analysis of audio — it works a lot better in James Bond movies. In real life, it is extremely hard from a fundamental computer science perspective to understand all that information."

Management awareness of the existence of this data, let alone how it can benefit the business, is where we need to start. The Unisphere/MarkLogic survey finds 40% of managers are unaware of the extent to which unstructured data exists in their organization, and only 45% of organizations are moderately or strongly committed to leveraging this resource. That creates a competitive advantage for those companies that are.

Even organizations with higher concentrations of unstructured information struggle with corporate awareness of this data. They devote most of their resources to managing the smaller portion of their data, while moving unstructured information into special-purpose databases or content management systems.

Cross-posted at CBS SmartPlanet Business Brains site.
