EMC has integrated the open-source Hadoop Distributed File System into its EMC Isilon scale-out storage system, to help it make products that can organise massive unstructured datasets.
The Isilon OneFS 6.5 operating system has been updated to work directly with the Hadoop Distributed File System (HDFS), which adds redundancy to the storage architecture and makes it easier to combine with the Hadoop-based Greenplum HD analytics software, EMC said on Tuesday.
"We look at Hadoop as evolving to become the leading unstructured analytics platform," Nick Kirsch, director of product management for EMC Isilon, told ZDNet UK. "While organisations are interested in Hadoop... [at the moment it means] taking on an untested and unproven storage infrastructure layer."
Hadoop has a single point of failure for data storage in HDFS: the NameNode, which stores and manages metadata. If the server hosting the NameNode goes down, the filesystem becomes unavailable. A number of Hadoop vendors, including MapR, whose technology EMC licenses, have worked to eliminate this problem.
EMC has dealt with the single point of failure in HDFS not by tweaking Hadoop, but by making changes to the Isilon OneFS 6.5 operating system. With the changes, the NameNode protocol is implemented in OneFS itself, so Hadoop runs on top of OneFS's scale-out, multi-node NAS storage.
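The difference between the two designs can be illustrated with a toy model. The sketch below is not HDFS or OneFS code: every class and method name is hypothetical, and it assumes only what is described above, namely that one layout keeps metadata on a single server while the other lets any node in the cluster answer metadata requests.

```python
class NodeDown(Exception):
    """Raised when a request reaches a server that is offline."""


class MetadataServer:
    """Hypothetical server that maps file paths to block locations."""

    def __init__(self, name, metadata):
        self.name = name
        self.metadata = metadata  # file path -> list of block identifiers
        self.online = True

    def lookup(self, path):
        if not self.online:
            raise NodeDown(self.name)
        return self.metadata[path]


class SingleNameNodeCluster:
    """Single-NameNode layout: one server holds all the metadata."""

    def __init__(self, metadata):
        self.namenode = MetadataServer("namenode", metadata)

    def lookup(self, path):
        # No fallback exists: if the NameNode is down, the lookup fails.
        return self.namenode.lookup(path)


class ScaleOutCluster:
    """Assumed scale-out layout: any node can answer a metadata request."""

    def __init__(self, metadata, node_count):
        # Simplification: every node shares the same metadata mapping.
        self.nodes = [
            MetadataServer(f"node-{i}", metadata) for i in range(node_count)
        ]

    def lookup(self, path):
        # Try each node in turn and skip the ones that are offline.
        for node in self.nodes:
            try:
                return node.lookup(path)
            except NodeDown:
                continue
        raise NodeDown("all nodes")
```

In this model, taking the only metadata server offline makes every lookup on `SingleNameNodeCluster` fail, while `ScaleOutCluster` keeps answering as long as at least one node is online.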
"We've turned the HDFS storage layer into a protocol instead of a file system," Kirsch said. "Hadoop is storing files natively into OneFS and hence the high availability characteristics of OneFS are now available to Hadoop."
By doing this, the company has created a tool that can combine the scalable, multi-source Hadoop storage framework with the mature Greenplum analytics technology, according to IDC storage systems analyst Benjamin Woo.
The capability is available as a free maintenance update to the OneFS 6.5 operating system, which all Isilon customers can download.
However, the number of companies that actually use Hadoop is limited, and those that list themselves on the Hadoop user page, such as Facebook, Amazon, Twitter and Adobe, have vast technical resources at their disposal.
"The market for HDFS is minimal at this point," Woo told ZDNet UK. "There are many proof-of-concepts going on."