This morning, open source software and infrastructure provider Red Hat announced its Big Data strategy. ZDNet's Steven J. Vaughan-Nichols covered the news earlier today.
The very occurrence of Red Hat's announcement, as well as its multiple facets, marks a new phase for Big Data: one where it has become a matter of mainstream IT infrastructural and app dev concern.
It ain't just Linux
Red Hat Enterprise Linux (RHEL) is arguably Raleigh, North Carolina-based Red Hat's flagship product, but the operating system arena is not by any means its only focus. Red Hat also has big irons in the storage, cloud and developer fires, and its Big Data strategy announcement addressed all three of these.
Big Data is now a relevant factor in the entire enterprise software stack.
One could argue that the crux of Red Hat's Big Data manifesto focuses on the hybrid cloud. Red Hat's Big Data narrative entails customers working on Big Data pilots/proofs-of-concept in the public cloud today, with the need to put those projects into production in the on-premises, private cloud in the near future.
I'm not sure if this narrative is quite as universal as Red Hat would have us believe, but the motivation Red Hat derives from it is nonetheless laudable: to make certain that Big Data projects can move seamlessly from the public cloud environment to the private cloud, or vice-versa, without "re-tooling."
What defines the strategy?
In order for that roundtrip to be possible, and in an environment built on Red Hat Enterprise Linux, Red Hat Storage, JBoss Middleware and the OpenShift cloud platform (as well as the OpenStack cloud platform overall), Red Hat announced the following initiatives:
- It will move governance of its Red Hat Storage Hadoop adapter, which makes Red Hat Storage compatible with the Hadoop Distributed File System (HDFS), to the Apache Software Foundation. This could pave the way for the adapter to be integrated into major Hadoop distributions. And given that Red Hat Storage uses a commodity-hardware-based distributed file system that maintains Hadoop's hallmark data locality, such an outcome wouldn't be unreasonable.
- In order for Red Hat Storage to work effectively in the public cloud, Red Hat will pursue engineering to make Red Hat Storage accommodate multiple tenants.
- Red Hat will fully support the JBoss Middleware Apache Hive Connector, allowing developers on its Java stack to work in a familiar, SQL query-oriented coding environment when working against Hadoop.
- Red Hat will enhance JBoss to interoperate with MongoDB and other (unnamed) NoSQL databases. It will also support the Open Data Protocol (OData), an open framework (originally developed at Microsoft, but now progressing toward status as an OASIS standard) for exposing data sources as RESTful Web Services in JSON and AtomPub formats.
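To give a feel for what "exposing data sources as RESTful Web Services" looks like from the consumer side, here is a minimal sketch of how a client might build an OData query URL. The service root, the `Orders` entity set and the `Total` property are hypothetical, made up purely for illustration; only the `$filter`, `$top` and `$format` query options and the `gt` operator come from OData itself. Nothing here reflects how JBoss will actually surface its endpoints.

```python
from urllib.parse import quote


def odata_url(service_root, entity_set, filter_expr=None, top=None, fmt="json"):
    """Build an OData query URL from a few common system query options.

    service_root and entity_set are placeholders supplied by the caller;
    this helper is illustrative and not part of any Red Hat or JBoss API.
    """
    options = []
    if filter_expr is not None:
        # Percent-encode the filter expression so spaces survive in the URL.
        options.append("$filter=" + quote(filter_expr, safe=""))
    if top is not None:
        # $top limits how many entities the service returns.
        options.append("$top=" + str(int(top)))
    # $format asks the service for a JSON (or Atom) representation.
    options.append("$format=" + fmt)
    return service_root.rstrip("/") + "/" + quote(entity_set) + "?" + "&".join(options)


if __name__ == "__main__":
    # Hypothetical endpoint: the five first orders with a total above 100.
    print(odata_url("https://example.invalid/odata", "Orders", "Total gt 100", top=5))
```

The point of the sketch is simply that an OData consumer needs nothing more exotic than an HTTP client and a URL, which is much of the protocol's appeal for developers who are not Hadoop or NoSQL specialists.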
Who the strategy involves
Red Hat also announced it will be forging hardware and software partnerships with an eye toward developing a full ecosystem around its Big Data approach. One deliverable from these partnerships will be reference architectures that the company said could be used as "cookbooks" by enterprises to build out Big Data infrastructure with greater assurance of success.
What it means
Red Hat rightly pointed out that the majority of Big Data projects are built on open source software (including Linux, Hadoop, and various NoSQL databases) and so it's fitting that such an important company in the open source world as Red Hat would announce its Big Data strategy.
What's especially significant here is that Red Hat is also an enterprise software company, and it articulated a strategy aimed at making Big Data part of the mainstream enterprise stable of tools and technologies. It's a big step in the maturation process for Big Data technology. That maturation will seemingly figure heavily in the tech world in 2013.