There have been some interesting discussions lately about the growing chasm between the vast quantities of information that companies are storing and how much of it is successfully transformed into actionable knowledge. As the raw information in enterprises continues to grow exponentially -- due to the rapid growth in sensors, connected devices, rich media, social media, and even the Internet of Things -- companies are rapidly coming to understand less and less of what they have and what it means.
It's not a minor issue. The total information a company stores is usually its second most valuable asset, after its people. A powerful analogy would be a company that was forced to hire workers at an exponential rate yet was only allowed to tap into fewer and fewer of them in a productive way. While this is actually a genuine problem for very large companies, workers themselves are generally autonomous and self-activating, whereas data just sits inert in databases until it's queried, analyzed, and put to good use. And these days, sitting inert is exactly what's happening as raw data continues to pile up in IT systems and data warehouses.
Knowledge is where the value is being created in business today, and it has been the leading source of economic power for several decades now. Many of the most interesting and intrinsically valuable new businesses are ones that are fundamentally powered, almost directly, by the total sum of their information. Google is the canonical early example of this, taking the sum of the observable data on the Web and making it navigable in its entirety better than anyone else. This produced in a single stroke one of the most valuable companies in history. Recent tech IPOs from firms like Pandora and LinkedIn are similarly based on the lock-in that they possess via vast pools of data, which they create value from by controlling and mining them strategically (the music genome and professional social network connections, respectively).
Big Data: A Response to Data Overload?
Stores of business data are only as good as the methods for extracting them and putting them to work. It's no surprise that data-centric companies tend to have more depth in this area than companies where information technology is not a core competency. The rise of Big Data, which is much more than just the immense volume of raw information pooling inside most organizations, has already become a signature challenge in industries where rich data is standard fare. These include health care, government, retail, manufacturing, and telecom.
It's certainly not that traditional businesses don't understand this, at least to a point. Analytics and business intelligence have recently been hot topics for the reasons outlined above, yet Gartner reports that 70% to 80% of business intelligence implementations fail these days. The cause:
A combination of poor communication between IT and the business, the failure to ask the right questions or to think about the real needs of the business, means most business intelligence (BI) projects fail to deliver, the firm says.
In other words, all the usual challenges of CoIT. The problem, of course, is that it's hard to know up front what the right questions are, as clear as they might be in hindsight. Companies today are typically caught in what I call the Big Data "shallows", meaning they can't tap into the data they have very quickly, they don't have much reach into their data silos, they can't analyze it very well, and their means for generating meaningful and high-impact insights are limited, infrequent, and immature. While the good news at least is that most traditional companies want to improve this state of affairs, they either don't sufficiently understand the strategic implications, can't find a way forward, don't have the organizational willingness, or all three.
To further underscore this point, Tim O'Reilly recently stated in a Google+ conversation:
Companies that have massive amounts of data without massive amounts of clue are going to be displaced by startups that have less data but more clue.
The key word here is displaced, as in no longer relevant and just living on the fumes of former market share. Is this too drastic a pronouncement? Is Big Data competency really that disruptive? One could look at AOL or the publishing industry (increasingly displaced by social media and content farms), Netflix and the opening of its rental data to competing algorithms, or Goldcorp and its opening up and crowdsourcing of prospecting data to attain industry leadership, along with numerous other significant examples. McKinsey's new Big Data report contains a great many additional case studies across a large number of industries.
As has been pointed out before when it comes to organizing for new methods, IT is a force multiplier that is driving the leaders and laggards further apart. Big Data is another rung up the IT capability ladder that, for now at least, requires very clear business vision, technical competence, and a willingness to experiment in order to succeed. In other words, it's not for the partially committed, which is one reason there is a lot of interest these days in creating data startups: they have these traits ingrained in their DNA. Their entire business is built upon the premise that the ability to strategically build, control, and wield massive datasets quickly and effectively will create market-leading growth and value through best-in-class open access. This is in stark contrast to the view that Big Data capabilities might be "nice-to-have" but not business critical, a view that traditional organizations often hold, if they are even aware of Big Data.
Big Data: Much More Than A Big Pile of Information
Given that the term itself is simplistic, it should be understood that there are at least three significant aspects of Big Data that make it unique, beyond "an order of magnitude more data beyond what you have now", which was one early definition.
Instead, Big Data refers to the integrated employment of the following capabilities:
- Fast Data. Recognizing that traditional methods for moving, processing, and querying data were not sufficient, the Big Data industry has created an entirely new set of techniques -- and adapted some of those that already existed -- so that organizations can process the full universe of information that they possess in enough time to get inside the windows of key business processes and critical decision trees. Thus, Fast Data techniques provide the ability to 'see' all (or enough, anyway) of what you know in a short enough time to actually do something with what you've learned. In my research, Fast Data techniques have, at least so far, grown exponentially faster at approximately the rate of Moore's Law, just barely keeping up with the growth in Big Data volume.
- Big Analytics. This is where the qualitative differences between traditional business databases and Big Data become more apparent. Where Fast Data is about new techniques to process and transform raw information considerably faster than ever before, Big Analytics is about turning information into knowledge using a combination of existing and new approaches. As you can see from the moving parts visual above, some of the classic players in analytics are in use here, including MATLAB, SAS, and R. But some of the most interesting aspects of Big Data can be found in relatively new entrants such as Apache Hive and Mahout, the latter of which brings automated machine learning to bear on finding hidden trends and otherwise unthought-of or unconsidered ideas. In fact, an entire industry is growing up around smart information management systems that will "not rely on users dreaming up smart questions to ask computers; rather, they will automatically determine if new observations reveal something of sufficient interest to warrant some reaction, e.g., sending an automatic notification to a user or a system about an opportunity or risk."
- Deep Insight. The powerful yet unfocused tools of Big Analytics are not sufficient to reap the rewards of Big Data. That requires taking the sum of the information at hand, applying analytic processes to it, and finally generating new knowledge and insights using a specific, situated method. Insight must be in the domain of the business to be useful, and this part of Big Data is where the technology is connected to ground truth in a feedback loop. That is, the tools of Big Analytics are just tools by themselves. It's not until they are directed at deriving a particular type of result that they are actually useful in a business context. Insights must also be connected to specific objectives (examples depicted in the moving parts visual above) in order to have high levels of impact.
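To make the idea of a system that decides for itself whether a new observation is "of sufficient interest" a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how Mahout or any product mentioned here works: the function names, the sample order counts, the five-observation window, and the three-standard-deviation threshold are all hypothetical choices made for the example.

```python
from statistics import mean, stdev


def flag_unusual(history, observation, threshold=3.0):
    """Return True if `observation` lies more than `threshold` standard
    deviations from the mean of `history` (a simple z-score check)."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past values identical: any different value is unusual.
        return observation != mu
    return abs(observation - mu) / sigma > threshold


def monitor(stream, window=5, threshold=3.0):
    """Compare each observation with the `window` values that preceded it
    and return a list of (index, value) pairs worth a notification."""
    alerts = []
    history = []
    for i, value in enumerate(stream):
        if len(history) >= window and flag_unusual(history[-window:], value, threshold):
            alerts.append((i, value))
        history.append(value)
    return alerts


# Hypothetical daily order counts; the spike on day 6 is the "opportunity or risk".
daily_orders = [100, 102, 98, 101, 99, 100, 180, 101]
alerts = monitor(daily_orders)
for day, value in alerts:
    print(f"Notify analyst: day {day} had {value} orders")
```

Nobody had to think of the question "did orders spike on day 6?" in advance; the check runs on every observation and only surfaces the ones that stand out. Real systems would use far richer models, but the shape of the loop (observe, compare with what is known, react) is the same.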
In this way, Big Data realizes a useful, end-to-end business vision well beyond a simple, big lake of data into which everything is thrown. It's like the Einstein quote many throw around these days on this topic: "Information is not knowledge". The often very limited and fragmented approaches to data warehouses and business intelligence are about to be replaced wholesale with methods that turn all of a company's information into more immediately actionable collective intelligence for a much larger audience, internally and externally. This will be a road to enlightenment for some, while for others failure will ultimately lead down the path to sunset.
The key lesson is this: Data is fundamentally strategic to a global economy where the largest value is now based on knowledge work. Those that invest in and use their Big Data in a strategic way that's commensurate with the value it holds will have the most opportunities in the future.
Big Data: A Tool for the Line Worker?
Does all this seem like a lot to master for the average business? It can be. But help is on the way. For one, we'll see the domain of Big Data expertise become accessible via the cloud through packages that pull all of the above together in a form as turnkey as possible. Examples of full-service Big Data offerings include IBM's Smart Analytics Cloud and Birst, though there are many others. The second is that setting up Big Data using an Apache stack is no longer rocket science, and technologies like Hadoop are maturing enough for most organizations to be able to apply them.
I believe that the ultimate challenge in the end is putting enough useful Big Data capabilities into the hands of the largest number of workers. The organizations that figure out this part will reap corresponding rewards. In order to have enough impact, Big Data capabilities must be as easy to use as Google search. So that is the Big Data holy grail: the combination of this vision with ready access for those who need it, while they work.
In other words, Big Data + Usability + Broad Access = Scalable Competitive Advantage. Miss any of these, and you'll miss much of the opportunity in this fast growing field.
To close the "clue gap", wise organizations will seek to get out of the Big Data shallows with all three aspects of Big Data while simultaneously delivering capabilities that are usable by and within reach of every worker. If Big Data requires a new set of high priests (i.e., extensively trained experts) to use, it will become a niche story. Fortunately, I see that as increasingly unlikely.
Do you see opportunities to apply Big Data to your organization, or does it feel too distant and complex?