Four questions CIOs should ask about Big Data

Hortonworks' VP of Corporate Strategy attacks CIOs' Big 4 questions on Big Data.
Written by Andrew Brust, Contributor

This guest post is from Shaun Connolly, VP Corporate Strategy at Hortonworks. Shaun has also held VP posts at VMware, SpringSource, Red Hat and JBoss, and was a Director at HP.

By Shaun Connolly

It’s hard to believe that it’s been more than seven years since Apache Hadoop was founded. Its initial focus was to store and process the data on the Internet in a simple, scalable and economically feasible way. Those were humble beginnings for an open source technology that now finds itself in 2013 very much at the center of next-generation big data architectures.

Over the years, Hadoop has continued to mature from the domain of a small number of web monsters (including Yahoo!) to a technology that has crossed the chasm onto a large number of CIOs' agendas across mainstream enterprises. The rise of "Enterprise Hadoop" offers a refreshing opportunity for companies to benefit from a data platform that provides a compelling combination of technology, economic and business benefits.

Mainstream enterprise CIOs commonly ask four questions as they think about Big Data and how Hadoop fits in.

Q1. Big Data and Hadoop certainly have big hype, but what does it all really mean?

Hadoop, as the de facto technology for the Big Data era, is essentially predicated on two things.

  • First, it is about efficiency.  Hadoop provides a modern platform for storing and processing data in a way that enables companies to derive value from all of their data in ways that have previously not been possible, with an economic model that is orders of magnitude more cost effective, and in a manner that leverages existing investments and skills.
  • Second, it is about opportunity.  Since Hadoop can scale in a way that makes sense both technically and economically, it makes it possible to build analytic applications using new types of data that are able to drive 20% more productivity or unlock new revenue streams for today’s forward-thinking enterprises.

From an efficiency perspective, Hadoop as a data platform is designed to run on low-cost commodity hardware without the specialized, expensive hardware of high-end RDBMS or HPC systems. Moreover, Hadoop not only unlocks new data storage and processing capabilities, but also gives businesses the opportunity to assess their overall data architecture and adopt a best-of-breed approach, one that focuses their Hadoop systems, traditional application databases and data warehouses on the workloads each is best suited for. Additionally, there are crucial enterprise requirements such as management, monitoring, data security and high availability, which the Hortonworks distribution of Hadoop (the Hortonworks Data Platform) incorporates to ensure the enterprise viability of the platform.

From an opportunity perspective, Hadoop unlocks the ability to refine and explore huge data sets spanning new and existing data sources at ever-increasing scale. Use cases range from the well-understood analysis of Web clickstream and social sentiment data to the emerging advanced analysis of machine, sensor and geo-location data that is being produced at a prodigious rate. With Hadoop, enterprises now have the opportunity to move beyond simple post-transaction analysis of data and embrace an architecture capable of blending data across transactions, interactions, and observations so that business results can, for example, be predicted pre-transaction.

Q2. Overhauling the data center doesn't sound like fun. What is the reality of an implementation?

Many organizations with Hadoop experience say they adopted Hadoop for its extreme scalability, exploratory analytics, low cost, and support for multi-structured data. Therefore, you might start by making a business case for Hadoop based on these drivers and the targeted analytic applications Hadoop can enable.

While your first cluster may begin as its own silo, you should also think of Hadoop within the scope of your larger data architecture, making integration with business intelligence, data warehouses, and analytics a follow-on priority. 

Moreover, beware of the hype. There is much talk of the death of the Enterprise Data Warehouse. As much as that grabs a headline, the reality is not so extreme. From the beginning, our vision at Hortonworks has been focused on enabling a next-generation data architecture that seamlessly integrates both existing and new data systems (spanning application databases, data warehouses, Hadoop and others) in a way that unlocks new business value while preserving existing investments.

For that reason we’ve focused on creating deep and strategic integrations with partners such as Microsoft, Teradata, Rackspace and others, with the explicit goal of integrating Hadoop with existing data center technologies in a way that just works. While there is still more work to be done, we’ve made great progress on integrating Hadoop up and down the stack with:

a) Analytics and Business Intelligence tools such as Excel, Tableau, MicroStrategy, Business Objects, and SAS,

b) Data Systems and Data Integration Tools from vendors including Teradata, Microsoft, Informatica, IBM, and Talend,

c) Management Platforms including Microsoft System Center and Active Directory, and Teradata Viewpoint, and

d) Infrastructure Platforms such as Windows, Linux, VMware, Azure, Amazon Web Services, Rackspace OpenCloud, and OpenStack.

By focusing on the hard work of integrating Hadoop with commonly used platforms and tools, we aim to accelerate Hadoop’s adoption and success within the mainstream enterprise market.

Q3. Overhauling the team skills doesn't sound like fun either. How do I cope with that?

While Hadoop is a new platform, it provides familiar ways for Developers, Data Workers, and System Administrators to tap into and harness its power. For example, Developers familiar with Java, .NET and scripting languages such as Python or Pig have tools, SDKs and APIs to work with Hadoop. Data Workers familiar with SQL can leverage Hive (Hadoop’s data warehouse system) to query and interact with Hadoop data in a familiar way. And System Administrators tasked with operating Hadoop clusters have a range of choices, including using the Apache Ambari Web Console, integrating directly with the Apache Ambari management and monitoring REST APIs, or relying on pre-integrated, familiar experiences via Teradata Viewpoint, Microsoft System Center or other third-party management solutions.
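As a rough illustration of what "integrating directly with the Ambari REST APIs" can look like for an administrator, here is a minimal Python sketch that builds (but does not send) a request to list the clusters an Ambari server manages. The host name and credentials are hypothetical placeholders, and the port (8080) and the `/api/v1/clusters` path are assumptions based on Ambari's commonly documented defaults; check them against the version you deploy.

```python
import base64
import urllib.request


def build_ambari_request(host, user, password, port=8080):
    """Build a GET request for the Ambari cluster listing.

    Assumptions: the server listens on `port` over plain HTTP and
    exposes its REST API under /api/v1 with HTTP Basic authentication.
    """
    url = f"http://{host}:{port}/api/v1/clusters"
    # HTTP Basic auth: base64 of "user:password"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    # Hypothetical host and credentials, for illustration only.
    req = build_ambari_request("ambari.example.com", "admin", "secret")
    print(req.get_method(), req.full_url)
    # To actually send it against a reachable server:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

The request is only constructed here, so the sketch runs without a cluster; sending it returns a JSON document that monitoring scripts or existing management tools could consume.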

Hadoop, like any new platform, comes with a learning curve, so Developers, Data Workers, and System Administrators should invest in training to get the most from it. They should also approach this learning experience confident that they are building important skills that will make them more valuable to their team and company.

Q4. Sounds like a job for next year. When is the time right for implementation? 

According to Merv Adrian at Gartner, who keynoted at Hadoop Summit 2013 in June, 30% of enterprises are already using Big Data, with a further 34% planning implementations in the next 12-24 months.

From a Hadoop perspective, our experience with customers is that almost all of the Global 1000 have deployed Hadoop or have a clear plan for deploying it. Moreover, a majority of mainstream enterprises are enacting Big Data strategies. We've found that many enterprises that have had success with Hadoop started off by deploying targeted proofs of concept designed to spot and prove out their business opportunity. Once the initial use case was deployed, the race began on other use cases aimed at driving further competitive edge or operational efficiency.

Our advice: given the open source nature of Hadoop, a focused proof of concept provides a low barrier to entry and the fastest path to initial success. Understanding and unlocking the value across all of your data is a key success factor of Big Data projects, and that process is better started sooner rather than later.
