Big data technology is exciting, innovative and genuinely powerful. It can absolutely take Enterprise analytics to the next level...but not yet.
In Global 1000 organizations, and numerous smaller companies, skill sets and best practices have been building for years around Business Intelligence (BI) technology and for decades around relational database management systems (RDBMSes). The products in these categories have superior tooling, manageability and fault tolerance. They offer user interfaces designed for non-developers. They are repositories for carefully crafted data models, refined over the years, representing unparalleled investment.
Meanwhile, Hadoop is typically used at the command line, controlled by imperative MapReduce code that is usually written in Java, on top of a file system (HDFS) that depends on a single, vulnerable NameNode. Some browser-based tooling is emerging, and technologies like Hive provide a primitive connection layer for BI tools, but we’re still at a 1990s-era level of sophistication. This stuff is not Enterprise-ready yet. It’s not even close.
Getting to insights
Turning data into information has long been a struggle in many enterprises, largely because of the politics and logistics of acquiring, sharing and cleansing data. Big Data won’t fix this any more than Business Intelligence did. Making data “bigger” only increases the surface area of the data to be governed and arguably makes its analysis more complex.
One thing Big Data has on its side is a more flexible and agile approach to schema: the schema can be defined at query or analysis time (an approach often called schema-on-read), which removes some of the complexity and bureaucracy of curating the data up front. But the tooling for managing unstructured data is relatively immature, and data specialists in the enterprise are not conceptually accustomed to it.
The long-term potential for Big Data here is good, as it should shorten innovation cycles. But in the near term things just aren’t that actionable yet.
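To make the schema-on-read idea concrete, here is a minimal Python sketch. It assumes raw, clickstream-style log lines stored untouched; the field layout and the `read_with_schema` helper are hypothetical, chosen only for illustration. Structure is imposed when a query runs, so two analyses can read the same raw data with different schemas.

```python
from typing import Callable, Dict, List

# Hypothetical raw records, stored exactly as they arrived (no schema enforced on write).
RAW_LINES = [
    "2012-05-01T10:00:00,user42,/products/widget,200",
    "2012-05-01T10:00:05,user17,/checkout,500",
    "2012-05-01T10:01:12,user42,/checkout,200",
]

# A schema maps column names (in positional order) to converter functions.
Schema = Dict[str, Callable[[str], object]]


def read_with_schema(lines: List[str], schema: Schema) -> List[dict]:
    """Apply a schema at read time: split each raw line and convert its fields.

    Fields beyond the schema's length are simply ignored.
    """
    names = list(schema)
    records = []
    for line in lines:
        parts = line.split(",")
        records.append({name: schema[name](parts[i]) for i, name in enumerate(names)})
    return records


# One analysis only cares about who visited which page...
visits_schema: Schema = {"timestamp": str, "user": str, "path": str}
# ...another interprets the fourth field as an integer HTTP status code.
status_schema: Schema = {"timestamp": str, "user": str, "path": str, "status": int}

visits = read_with_schema(RAW_LINES, visits_schema)
errors = [r for r in read_with_schema(RAW_LINES, status_schema) if r["status"] >= 500]
```

Nothing is rejected or reshaped at load time. The trade-off is that every reader has to know, and agree on, how to interpret the raw fields, which is one reason data quality and governance stay hard.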
The small business use case, and strategy
If I’m a quick-serve entrepreneur owning five McDonald’s franchises in a mid-sized city, it’s not obvious how I could take advantage of Hadoop and MapReduce to get more customers and more visits. If I’m a large Web company, a major financial services firm, a manufacturing concern or a major retailer, with big, continuous streams of clickstream, market or sensor data, then the appeal of Big Data is much more straightforward.
I do think that smaller businesses should begin developing their Big Data strategies now, though. Even they will have substantial clickstream data if they’re online (and most are), and even brick-and-mortar operations can start to accumulate volumes of in-store video recordings (which can reveal shopping habits, store layout effectiveness and product affinities). Data can help everyone, and when you stop throwing it away, it becomes Big Data. But mining it has to get easier or small and medium businesses won’t be able to make their move.
Big Data’s ROI
In the Internet world, Big Data can pay off in increased eyeballs, session lengths and corresponding monetization. In the manufacturing world it can pay off in reduced or eliminated downtime of assembly lines (through predictive analytics on equipment breakdowns). In the Financial Services world, Big Data can lead to better, more effective, and therefore more lucrative, trading strategies. Media companies can sell more ad impressions. E-Commerce concerns can sell more product.
But each of these companies has something in common that the average Enterprise business unit may not have: the ROI for them is demonstrable enough and big enough to get them over the barriers to entry. Will the line-of-business teams in the Enterprise have the smarts, the budget, or the appeal to bring in the Hadoop specialists, statisticians and data scientists necessary to attain compelling ROI? Probably not. Off-the-shelf products and professional services will have to deliver Big Data value that is good enough, cheap enough and mature enough for these customers to buy in.
Impact on broader IT strategy
Big Data definitely has the potential to transform organizations’ overall IT strategy. That’s because Big Data is about more than Big Data per se. For example, Hadoop’s use of direct-attached storage and commodity hardware is very disruptive to the common enterprise deployment of storage networks and expensive servers and appliances.
Hadoop may also cause enterprises to emphasize Java skills more and SQL skills less, among other shifts in skillset priorities. The clustering approach used by Hadoop may accelerate adoption of hybrid on-premise/cloud strategies too: it’s easier to push data to on-premise servers, but the elastic nature of the cloud may be more effective in addressing intermittent demand for extremely large clusters.
Math, statistics and data modeling skills are needed, and there’s a shortage of these. Universities are only now addressing this problem with degree programs in analytics and data science. Java programming skills, as I mentioned above, will be highly useful, even for jobs that are data-oriented and not developer positions. What may be most important though, and most difficult to find, are individuals who combine these tech skills with strong industry domain expertise. That’s the winning formula, and it may be very hard to recruit people who fit it.
Which industries benefit?
Again, Internet-focused organizations, as well as media, financial services, online retail and manufacturing are the industries that may have the most to gain. Supply chain companies, be they purveyors of parts and components, or distributors, can certainly be added to the list. So too can healthcare, be it in the areas of research, hospital management or payer/insurance operations. Marketing organizations across industries can get great benefit from Big Data.
I think every organization has Big Data…it’s just that some don’t monitor it, and many do not retain it. Those that can and do are in the best position to derive advantage from Big Data. Those that don’t will have to evaluate the costs and benefits of changing their operations model in order to become data-driven.
Big Data and the Cloud
The commodity hardware and add-more-as-needed clustering approach of Hadoop has huge affinity to the cloud computing model. In general, elasticity is a feature of both. On the other hand, upstream bandwidth is still a limiting factor for Big Data in the cloud – it’s much easier to stream new data and maintain cloud databases (including Hadoop Distributed File System files) than it is to migrate that data en masse and establish the databases in the first place. This is yet another area where things will change, and barriers will melt away…eventually.
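A back-of-the-envelope calculation shows why the initial migration, not the ongoing feed, is the bottleneck. The Python sketch below assumes a hypothetical 100 TB data set, a hypothetical 50 GB daily increment and a sustained 100 Mbit/s upstream link; these are illustrative figures, not measurements, and `transfer_days` is a helper introduced only for this example.

```python
def transfer_days(num_bytes: float, link_bits_per_sec: float, efficiency: float = 1.0) -> float:
    """Days needed to push num_bytes over a link.

    efficiency is the fraction of the nominal bandwidth actually achieved
    (1.0 means the link runs flat out the whole time).
    """
    seconds = (num_bytes * 8) / (link_bits_per_sec * efficiency)
    return seconds / 86_400  # seconds per day


TB = 10**12  # decimal terabyte, in bytes
GB = 10**9   # decimal gigabyte, in bytes
LINK = 100e6  # hypothetical 100 Mbit/s upstream link

# Hypothetical figures for illustration only.
initial_load_days = transfer_days(100 * TB, LINK)
daily_increment_hours = transfer_days(50 * GB, LINK) * 24

print(f"Initial 100 TB load: {initial_load_days:.1f} days")
print(f"Daily 50 GB increment: {daily_increment_hours:.1f} hours")
```

Under these assumptions the bulk load takes about 92.6 days of continuous transfer, while each daily increment takes about 1.1 hours. Real throughput usually falls below the nominal rate (the `efficiency` parameter), which only stretches the initial load further.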
Big Data Challenges
Data quality is a very big challenge. So too is the broader question of data governance. In both cases, the prevalence of unstructured data can make data integration difficult and curb success rates. The still modest level of maturity in many Big Data technologies is a potential pitfall as well. Lots of companies are only in the R&D phase with Big Data because of this. The technology will have to become much more accessible, and defensive project management skills more widespread, before Big Data can truly go mainstream.
CEOs and CFOs
I think lots of CEOs understand Big Data at a high level and therefore they want it. But their management teams have to understand Big Data more granularly, and execute on Big Data initiatives. At the risk of being repetitive, I don’t think we’re there yet. Big Data has to get easier for non-specialists to partake of, and for managers to understand fully, before it becomes pervasive.
At many companies, the Business Intelligence buying decision lies in the CFO suite. And if Big Data is the successor to BI, then it might stand to reason that CFOs will maintain this control. But Big Data project leaders will more likely come from IT and the line-of-business corners of the Enterprise. Being hands-on with the technology and being acquainted first-hand with the data are likely prerequisites for project success. And financial data is relatively modest in volume, too: maybe there are petabyte-scale general ledgers out there, but I haven’t come across them yet. So CFOs seem unlikely to be the business decision makers on the Big Data front.
Five year forecast
Big Data may be at the top of its hype cycle right now (or it might not be), but it’s definitely not a fad. In my experience, very little around data is. Whether it’s line-of-business app development and corresponding transactional database needs, or dimensional analysis, or the kind of predictive analytics and other insights to be gleaned from Big Data, we’re talking about useful, important technology.
Typically, a new data technology starts out as innovative and ground-breaking, then becomes mainstream and mission-critical, and eventually commoditizes, but it doesn’t often just fizzle out and go away. There’s no question in my mind that Big Data is going to be big and mainstream in the Enterprise, in the future. That future may or may not be within the five-year time horizon, depending on whether Big Data can get past its fragmented, cottage industry phase in that time interval.
Technology has to get very mature – even a bit boring – before it enjoys truly widespread Enterprise adoption and deployment. Big Data will get there, but it’s got to overcome several hurdles first.
What do you think? Is Hadoop ready for full-on corporate adoption? Leave a comment and let me know.