To address Big Data challenges in a cost-effective way, many organizations are turning to Hadoop, an open-source framework. Hadoop enables applications to run across large arrays of nodes, accessing petabytes' worth of data. Hadoop, now employed at 37% of sites in a recent Forrester Research survey, owes its popularity to initial implementations at several of the big Web companies that faced the Big Data onslaught well before the rest of the business world.
Organizations that are becoming adept at service-oriented approaches to problems may be in a prime position to bring the Hadoop framework into their operations as well. I recently had the opportunity to join John Akred, data and platforms lead at Accenture Technology Labs, and Julianna DeLua, Enterprise Solution Evangelist for Big Data at Informatica, for a panel discussion on Hadoop’s role in the emerging "Data as a Platform" paradigm. The session was part of the Hadoop Tuesdays webinar series, sponsored by Informatica and Cloudera.
Data as a Platform, supported by Hadoop, addresses concerns that SOA practitioners have had for years. As Akred put it, many enterprises have spent too long trying to untangle increasingly complex spaghetti architectures built on point-to-point data integration. “They get to the point where, when they want to introduce a new product or make a change, they have to touch 30 different systems,” he said. “That has real consequences in the marketplace for enterprises and their ability to adjust to market conditions and succeed.”
Rather than organizing data stores around individual applications, which are then awkwardly integrated as new applications come along, the Data as a Platform approach maintains data as a cross-enterprise resource. Akred explained: "You can build an enterprise that looks more like an Amazon and Netflix, where those companies are well known for using service-oriented architectures to manage the complexity of their application infrastructure and using these new emerging technologies to handle the data at scale."
Here's how a Hadoop-enabled SOA is architected, according to Akred:
"We take the data infrastructure layer, and take data stores like Hadoop, and the existing enterprise systems that give that data valuable context and integrate those at the data layer. And we abstract that integrated data platform from the consuming applications via service-oriented data access patterns. So we're exposing our enterprise data platform to the enterprise via services rather than direct query access.
"Service-oriented architectures aren’t new. But realizing the abstraction of the applications from the data layers via service-oriented architectures has not been easy, and in many cases enterprises end up essentially implementing point-to-point interfaces over service-oriented architectures. When you get to the data platform view, it's really important to build well-known web services that enable data access, so that application developers no longer have to understand the performance characteristics and implementation of a database."
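Akred describes this pattern at the architecture level only. As a rough, hypothetical illustration of "services rather than direct query access," the Python sketch below puts a small service contract between a consuming application and two interchangeable data stores. Every name in it (Customer, CustomerStore, SqliteCustomerStore, InMemoryCustomerStore, CustomerDataService, get_customer) is invented for this example; an in-memory SQLite database stands in for an existing enterprise system and a plain dict stands in for data held in Hadoop. No Hadoop API is used or implied.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


# Hypothetical names throughout; SQLite and a dict are stand-ins for real
# enterprise systems and Hadoop-resident data. No Hadoop API is used.
@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str
    region: str


class CustomerStore(Protocol):
    """Whatever backing store the data platform team chooses."""

    def fetch_customer(self, customer_id: int) -> Optional[Customer]: ...


class SqliteCustomerStore:
    """Stand-in for an existing relational enterprise system."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
        )

    def add(self, customer: Customer) -> None:
        self._conn.execute(
            "INSERT INTO customers VALUES (?, ?, ?)",
            (customer.customer_id, customer.name, customer.region),
        )

    def fetch_customer(self, customer_id: int) -> Optional[Customer]:
        row = self._conn.execute(
            "SELECT id, name, region FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return Customer(*row) if row else None


class InMemoryCustomerStore:
    """Stand-in for data landed in another store, such as Hadoop."""

    def __init__(self, customers: Dict[int, Customer]) -> None:
        self._customers = customers

    def fetch_customer(self, customer_id: int) -> Optional[Customer]:
        return self._customers.get(customer_id)


class CustomerDataService:
    """Service contract that consuming applications call instead of running queries."""

    def __init__(self, store: CustomerStore) -> None:
        self._store = store

    def get_customer(self, customer_id: int) -> dict:
        customer = self._store.fetch_customer(customer_id)
        if customer is None:
            return {"status": "not_found", "customer_id": customer_id}
        return {
            "status": "ok",
            "customer_id": customer.customer_id,
            "name": customer.name,
            "region": customer.region,
        }


if __name__ == "__main__":
    acme = Customer(1, "Acme Corp", "EMEA")
    relational = SqliteCustomerStore()
    relational.add(acme)
    # The consuming code is identical whichever store sits behind the service.
    for service in (
        CustomerDataService(relational),
        CustomerDataService(InMemoryCustomerStore({1: acme})),
    ):
        print(service.get_customer(1))
```

The point of the sketch is that the consuming loop depends only on the response shape of get_customer, so the platform team could move data between stores without touching consumers. A real deployment would more likely expose such a contract over HTTP or messaging than as an in-process call.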
A Data as a Platform approach also helps open enterprise data silos to both internal and external data consumers, Akred adds. "In many organizations, opening up your enterprise silos to the rest of the enterprise is still a pretty adventurous goal in and of itself. But some organizations are embracing opening data outside their enterprise."
(Full disclosure: I am compensated for my role as Hadoop Tuesdays series moderator.)