OpenStack hooks up with Hadoop to bring big data to the cloud

The new OpenStack release, dubbed 'Juno,' has lots of fixes, better support for hot upgrades, and now hooks up with Hadoop.
Written by Steven Vaughan-Nichols, Senior Contributing Editor

OpenStack, the open-source cloud backed by a who's who of technology companies, has just released its latest version: the big data-friendly OpenStack 2014.2, dubbed Juno.

The new OpenStack doesn't have many new features, but it is improved. Jonathan Bryce, the OpenStack Foundation's executive director, characterizes Juno as a more mature version of the earlier Icehouse codebase, with ten times as many bugs fixed as new features added.

That said, Juno does include one major new feature: Sahara, which enables users to run Hadoop big data applications in OpenStack. Sahara also supports Apache Spark, the open-source in-memory computing framework that works hand-in-glove with Hadoop. At the same time, OpenStack Trove, the cloud's database service, now includes a new clustering application programming interface (API) with initial support for clusters of the MongoDB NoSQL database management system (DBMS).

Sahara was first known as Savanna when it was started by the major OpenStack integration company, Mirantis. This marriage of one of the most popular cloud programs with one of the best-supported big data programs is likely to give both even more market acceptance.

As Mark McLoughlin, a consulting software engineer at Red Hat and a member of OpenStack's Board of Directors, said in an e-mail:

With each OpenStack release, we see a growing community and rapidly maturing software. Many of the improvements in this release continue to address key use cases and features which will enable OpenStack to extend its reach further into the enterprise. Red Hat is particularly proud to have supported the addition in Juno of the Sahara Hadoop and Spark cluster provisioning service — showing that OpenStack is expanding up the stack to add direct support for big data applications.

Another noteworthy change is that Juno's Swift object storage moves up to Swift 2.0 and its storage policies. These give cloud administrators far more control over OpenStack's backend storage options.

OpenStack's Neutron networking service also now has better support for IPv6. With the Internet slowly but surely running out of IPv4 addresses, this was a mandatory improvement. Nova also includes the first support for network function virtualization (NFV), but this is still beta, perhaps alpha, technology.

For those who just want to keep OpenStack up and running even while maintaining it, the biggest news is the OpenStack Foundation's claim that:

The Juno release contains numerous updates and enhancements that make it easier to build, operate, scale and upgrade OpenStack clouds. Compute components allow easier upgrades with less impact to the applications users are running, and include an additional driver for managing bare metal hardware directly. There were also significant updates to metering and monitoring capabilities that provide faster and more efficient performance.

To help keep OpenStack going when the going gets tough, OpenStack Compute (Nova) has a new rescue mode that "enables booting from alternate images with the attachment of all local disks. Also, per-network settings are now allowed by improved nova-network code; scheduling updates to support scheduling services and extensibility; and internationalization updates."

In short, unless you're planning on combining your OpenStack cloud with Hadoop big data, this isn't the most exciting release. What's more important than new features, however, is that this is the most production-ready OpenStack release to date. I think most cloud administrators will be happier about that than about any bright and shiny features.
