If you're a CIO, you should be taking Hadoop seriously. A technology that few people knew about five years ago is now poised to be a major player in enterprise architectures.
For all the buzz around Hadoop, most traditional data warehouse users are only now becoming aware of what the new technology makes possible. Many continue to pigeonhole Hadoop as useful only for web companies, or for manipulating "polystructured" data before loading it into a data warehouse.
The reality is that Hadoop is an incredible opportunity for most enterprises, both large and small.
For example, at the Hadoop Summit in Europe, Alasdair Anderson, Global Head of Architecture for HSBC Global Banking and Markets, gave a presentation on the theme of "Enterprise Integration of Disruptive Technologies."
The bank needed a single data platform that could provide 360-degree views of clients, operations and products. Until then, the team had been struggling with a complex, "brittle" architecture built on more than 150 source systems, 900 ETL jobs, 3 data warehouses, and 15 data marts.
The resulting system was expensive and too slow to meet the needs of the business: changes took months or even years to make. The team concluded that they needed a different way of doing things, one that would support more agile, parallel streams of development without being disruptive.
HSBC decided to try Hadoop, with the work done in Guangzhou, China. The project was a big success:
- Hadoop was installed and operational in a single week
- The 18 RDBMS data warehouses and marts were ported to Hadoop in 4 weeks
- The time it took to run an existing batch job dropped from 3 hours to 10 minutes
- New data sources could be included, such as information about financial derivatives stored as PDF files
Some Hadoop proponents assume that it's just a question of time before Hadoop gains the extra features that would enable it to take over all enterprise needs. But Anderson was careful to point out that the analytics needs of the project were different from more traditional data warehousing.
The project focused on fast-moving, "agile information" that typically requires several iterations of analysis. He explained that other parts of the business, such as the retail banking division, might not have the same needs.
Over time, he believes the two kinds of systems will be complementary, rather than Hadoop "ripping and replacing" the existing data warehouse: "We're not going to get rid of our relational databases, but we are going to examine each project to choose the right tool for right job, driven by price."
It's clear that the future is about using the best of old and new technologies, and that analytics professionals have to stay on top of new developments. As Anderson puts it, "I've decided the world has changed and I've decided to change my career with it."
SAP will redistribute and support various Hadoop distributions, including the Hortonworks Data Platform, Cloudera Enterprise, and MapR. SAP customers in fields as diverse as football and genetics are already using Hadoop in their organizations. To find out more, visit the SAP Big Data website.
For more detailed technical information about how Hadoop can be integrated with traditional information architectures, check out the CIO Guide on Big Data: How to Use Hadoop With Your SAP Software Landscape.
[A version of this post first appeared on the Business Analytics Blog]