Big Data's enterprise-readiness

Summary: Last week, ZDNet Editor-in-Chief Larry Dignan and I debated the enterprise-readiness of Big Data. I believe such readiness is still a ways off, for a variety of reasons. I reiterate my debate points in essay form here.


Big data technology is exciting, innovative and genuinely powerful. It can absolutely take Enterprise analytics to the next level...but not yet.

In Global 1000 organizations, and numerous smaller companies, skill sets and best practices have been building for years around Business Intelligence (BI) technology and for decades around relational database management systems (RDBMSes). The products in these categories have superior tooling, manageability and fault tolerance. They offer user interfaces designed for non-developers. They are repositories for carefully crafted data models, refined over the years, representing unparalleled investment.

Meanwhile, Hadoop is typically used at the command line, controlled by imperative MapReduce code that must be written in Java, atop a file system (HDFS) governed by a single, vulnerable NameNode. Some browser-based tooling is emerging, and technologies like Hive provide a primitive connection layer for BI tools, but we’re still at a 1990s-era level of sophistication. This stuff is not Enterprise-ready yet. It’s not even close.
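To make concrete what "imperative MapReduce code" means, here is a minimal pure-Python sketch of the MapReduce programming model. This is not the Hadoop Java API, just an illustration of the map/shuffle/reduce pattern a Hadoop developer has to express by hand:

```python
from collections import defaultdict

def map_phase(line):
    # Mapper: emit an intermediate (word, 1) pair for every word in the line
    for word in line.split():
        yield (word.lower(), 1)

def reduce_phase(key, values):
    # Reducer: collapse all counts emitted for one word into a total
    return (key, sum(values))

def mapreduce(lines):
    # Shuffle: group intermediate pairs by key, as Hadoop does between phases
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_phase(line):
            groups[key].append(value)
    return dict(reduce_phase(k, v) for k, v in sorted(groups.items()))

counts = mapreduce(["big data is big", "data is data"])
print(counts)  # {'big': 2, 'data': 3, 'is': 2}
```

Even this toy version shows the point: the developer writes procedural plumbing for a word count that a BI tool or SQL engine would express in one declarative line.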

Getting to insights
Turning data into information has long been a struggle, largely because of the politics and logistics of acquiring, sharing and cleansing data in the enterprise. Big Data won’t fix this any more than Business Intelligence did. Making data “bigger” only increases the surface area of the data to be governed and arguably makes its analysis more complex.

One thing Big Data has on its side is a more flexible and agile approach to schema, allowing it to be defined at query/analysis time, thus removing some of the complexity and bureaucracy in curating the data. But the tooling for managing unstructured data is relatively immature, and data specialists in the enterprise are not conceptually accustomed to it.
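The "schema at query/analysis time" idea can be sketched in a few lines of Python. In this hypothetical example, raw records are stored as-is, with no schema enforced on the way in, and each query imposes only the structure it needs on the way out:

```python
import json

# Raw records land as-is; nothing is validated at load time ("schema on write"
# would have rejected or forced cleanup of the second and third records)
raw_records = [
    '{"user": "ann", "page": "/home", "ms": 120}',
    '{"user": "bob", "page": "/cart"}',  # missing field: still stored
    'not even json',                     # malformed: still stored
]

def query_avg_latency(records):
    # Schema on read: interpret each record only when the question is asked,
    # skipping records that do not fit the shape this particular query needs
    latencies = []
    for rec in records:
        try:
            row = json.loads(rec)
        except json.JSONDecodeError:
            continue  # a different query might still use this record
        if "ms" in row:
            latencies.append(row["ms"])
    return sum(latencies) / len(latencies) if latencies else None

print(query_avg_latency(raw_records))  # 120.0
```

The flexibility is real, but so is the trade-off the paragraph above describes: every query now carries its own parsing and data-quality logic, which is exactly the discipline enterprise data specialists are not yet accustomed to.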

The long-term potential for Big Data here is good, as it should shorten innovation cycles. But in the near term, things just aren’t that actionable yet.

The small business use case, and strategy
If I’m a quick-serve entrepreneur owning five McDonald’s franchises in a mid-sized city, it’s not obvious how I could take advantage of Hadoop and MapReduce to get more customers and more visits. If I’m a large Web company, a major financial services firm, a manufacturing concern or a major retailer, with big, continuous streams of clickstream, market or sensor data, then the appeal of Big Data is much more straightforward.

I do think that smaller businesses should begin developing their Big Data strategies now though. Even they will have substantial clickstream data if they’re online (and most are) and even brick-and-mortar operations can start to accumulate volumes of in-store video recordings (which can reveal shopping habits, store layout effectiveness and product affinities). Data can help everyone, and when you stop throwing it away it becomes Big Data. But mining it has to get easier or small and medium businesses won’t be able to make their move.

Big Data’s ROI
In the Internet world, Big Data can pay off in increased eyeballs, session lengths and corresponding monetization. In the manufacturing world it can pay off in reduced or eliminated downtime of assembly lines (through predictive analytics on equipment breakdowns). In the Financial Services world, Big Data can lead to better, more effective, and therefore more lucrative, trading strategies. Media companies can sell more ad impressions. E-Commerce concerns can sell more product.

But each of these companies has something in common that the average Enterprise business unit may not have: the ROI for them is demonstrable enough and big enough to get them over the barriers to entry. Will the line-of-business teams in the Enterprise have the smarts, the budget, or the appeal to bring in the Hadoop specialists, statisticians and data scientists necessary to attain compelling ROI? Probably not. Big Data value through off-the-shelf products and professional services has to get good enough, cheap enough and mature enough for these customers to buy in.

Impact on broader IT strategy
Big Data definitely has the potential to transform organizations’ overall IT strategy.  That’s because Big Data is about more than Big Data per se. For example, Hadoop’s use of direct-attached storage and commodity hardware is very disruptive to the common enterprise deployment of storage networks and expensive servers and appliances.

Hadoop may also cause enterprises to emphasize Java skills more and SQL skills less, among other shifts in skillset priorities. The clustering approach used by Hadoop may accelerate adoption of hybrid on-premise/cloud strategies too: it’s easier to push data to on-premise servers, but the elastic nature of the cloud may be more effective in addressing intermittent demand for extremely large clusters.

Skills Shortage
Math, statistics and data modeling skills are needed, and there’s a shortage of these. Universities are only now addressing this problem with degree programs in analytics and data science. Java programming skills, as I mentioned above, will be highly useful, even for jobs that are data-oriented and not developer positions. What may be most important though, and most difficult to find, are individuals who have these tech skills in combination with strong industrial domain expertise. That’s the winning formula, and it may be very hard to recruit people who fit with it.

Which industries benefit?
Again, Internet-focused organizations, as well as media, financial services, online retail and manufacturing, are the industries that may have the most to gain. Supply chain companies, be they purveyors of parts and components, or distributors, can certainly be added to the list. So too can healthcare, be it in the areas of research, hospital management or payer/insurance operations. Marketing organizations across industries can get great benefit from Big Data.

I think every organization has Big Data…it’s just that some don’t monitor it, and many do not retain it. Those that can and do are in the best position to derive advantage from Big Data. Those that don’t must evaluate the costs and benefits of changing their operating model in order to become data-driven.

Big Data and the Cloud
The commodity hardware and add-more-as-needed clustering approach of Hadoop has huge affinity to the cloud computing model. In general, elasticity is a feature of both. On the other hand, upstream bandwidth is still a limiting factor for Big Data in the cloud – it’s much easier to stream new data and maintain cloud databases (including Hadoop Distributed File System files) than it is to migrate that data en masse and establish the databases in the first place. This is yet another area where things will change, and barriers will melt away…eventually.

Big Data Challenges
Data quality is a very big challenge. So too is the broader question of data governance. In both cases, the prevalence of unstructured data can make data integration difficult and curb success rates. The still-modest maturity of many Big Data technologies is another potential pitfall; lots of companies remain in the R&D phase with Big Data because of it. The technology will have to become much more accessible, and defensive project management skills more widespread, before Big Data can truly go mainstream.

CEOs and CFOs
I think lots of CEOs understand Big Data at a high level and therefore they want it. But their management teams have to understand Big Data more granularly, and execute on Big Data initiatives. At the risk of being repetitive, I don’t think we’re there yet. Big Data has to get easier for non-specialists to partake of, and for managers to understand fully, before it becomes pervasive.

At many companies, the Business Intelligence buying decision lies in the CFO suite. And if Big Data is the successor to BI, then it might stand to reason that CFOs will maintain this control. But Big Data project leaders will more likely come from IT and the line-of-business corners of the Enterprise. Being hands-on with the technology and being acquainted first-hand with the data are likely prerequisites for project success. And data for financials is relatively discrete too – maybe there are petabyte-scale general ledgers out there, but I haven’t come across them yet.  So CFOs seem unlikely to be the business decision makers on the Big Data front.

Five year forecast
Big Data may be at the top of its hype cycle right now (or it might not be), but it’s definitely not a fad. In my experience, very little around data is. Whether it’s line-of-business app development and corresponding transactional database needs, or dimensional analysis, or the kind of predictive analytics and other insights to be gleaned from Big Data, we’re talking about useful, important technology.

Typically, a new data technology starts out as innovative and ground-breaking, then becomes mainstream and mission-critical, and eventually commoditizes, but it doesn’t often just fizzle out and go away. There’s no question in my mind that Big Data is going to be big and mainstream in the Enterprise, in the future. That future may or may not be within the five-year time horizon, depending on whether Big Data can get past its fragmented, cottage industry phase in that time interval.

Technology has to get very mature – even a bit boring – before it enjoys truly widespread Enterprise adoption and deployment. Big Data will get there, but it’s got to overcome several hurdles first.

Talk back
What do you think?  Is Hadoop ready for full-on corporate adoption?  Leave a comment and let me know.


About Andrew Brust

Andrew J. Brust has worked in the software industry for 25 years as a developer, consultant, entrepreneur and CTO, specializing in application development, databases and business intelligence technology.


  • Serengeti Helps Enterprise Respond to the Big Data Challenge

    Andrew, good overview. It may be worth pointing out that VMware already has an open source solution to a big part of this problem with the recently announced Project Serengeti. It enables rapid deployment of standardized Apache Hadoop clusters on an existing virtual platform, using spare machine cycles, with no need to purchase additional hardware or software.

    Here is a short overview of how Serengeti Helps Enterprise Respond to the Big Data Challenge

  • Making Hadoop (and related projects) an enterprise-viable data platform

    Hi Andrew,

    Good topic. At Hortonworks, our focus is on enabling Apache Hadoop and related projects to come together into an enterprise-viable data platform. This means focusing on making the component projects as well as the overall data platform easy to use, consume, and operate, as well as enhancing the platform's ability to capture, process, and exchange data. Enabling and integrating with the ecosystem at every layer of the stack (Applications, Business Tools, Development Tools, Data Movement & Integration, Data Management Systems, Systems Management, and Infrastructure) helps drive the APIs and integration points that are important to accelerate enterprise adoption and ensure the largest and most vibrant market opportunity.

    In my Hadoop Summit presentation, I covered many of these points.

    Since Apache Hadoop (and related projects) are open source, it is also important to strike a proper balance between Community Innovation and Enterprise Stability.

    Clearly communicating what's ready for broad enterprise use versus what's ready for community / tech savvy early adopters is critically important since most enterprises are not tech savvy, early adopters.

    Successful enterprises focus on the "ilities", and when they adopt new technologies, they try to do so in a way that maximizes their existing investments and skills. Doing a wholesale replacement of data center architectures and components, for example, is not an option.

    Having worked at JBoss, Red Hat, SpringSource, and VMware, I'm convinced (as it sounds like you are) that Hadoop's open source future is bright. With that said, there remains a lot of hard work and innovation ahead to ensure this technology integrates well with enterprise data architectures and delivers on its promise.

    Fun times!
  • Cost of Big Data

    Great piece, Andrew. I agree that oftentimes the cost of implementing a Big Data strategy (specifically the hardware) can prohibit companies from investing in Big Data technologies. That is why we have developed new software that stores Big Data more efficiently, enabling companies to confidently choose cheaper hardware that uses up to 50% less energy. I would be interested to hear what you think about our technology.
  • I think you're completely missing it...

    What if you could take all of the Excel spreadsheets sitting on your file shares and SharePoint lists, convert them to plain text or XML, drop them into a repository where the data could be rearranged into key-value pairs and transformed into structured data that exposed new relationships in your ERP and CRM systems? What if you could then feed those relationships back into your highly modeled OLAP system and gain a valuable new perspective for the C-level executives that was previously visible only to a few accountants? That is the real value of big data and Hadoop, but apparently the marketing folks in most companies can only show word count and social media examples.

    Some of us who have been in the IT industry for the past 15 to 20 years can see how to pull extreme value out of big data and Hadoop right now. The only skills we need are some basic Linux admin, a basic knowledge of Java programming, some basic SQL query knowledge, and a good feel for the holes in our company's data relationships. Then it's just a call to HortonWorks for a week of training for some of the more experienced staff, and we're off and running. The problem is that everyone is too focused on the social media examples to see the real value.

    And regardless of what the hardware vendors might say, racking a few old Dell R900 servers that you just replaced with new VM hosts is a great option. In fact, you could take a look at eBay and grab a whole rack of off-lease commodity servers for a few thousand dollars. Actually, pick up two or three racks in case a few servers fail. You can get a few racks for the same price as one new, half-full high-performance blade chassis. Put the name node on one of your new VM hosts for HA/DR, and you're all set. This is actually much easier than it looks; you just have to be willing to invest a little in your IT staff rather than relying on a magic-bullet system.
  • Big Data or "Democratization" of Data?

    "Big Data" means control and inflated costs.

    The latter means data is devalued and nobody can live off it (since food, shelter, and keeping up with the technologies needed to stay competitive are not free, oopsie), but a supply-side economy always assumes workers and customers earn their money by growing big magical trees (or get money from the same Fed that bails out or subsidizes the Big companies at every turn...)