It's news that may cause double-takes in the world of Big Data. MapR announced Tuesday that the file system in its Converged Data Platform Hadoop distribution has been chosen by SAP as the cloud storage technology for HANA, SAP IQ, and similar data workloads.
You read that right: MapR File System (MapR-FS), the company's drop-in replacement for the Hadoop Distributed File System (HDFS), has been selected by a major software company for general-purpose cloud storage, or at least purposes beyond Hadoop and Spark.
It was always special
MapR's file system was its original differentiator in the Hadoop market: unlike standard HDFS, which is optimized for reads and allows a file to be written only once, MapR-FS fully supports the read-write semantics of a conventional file system. That still doesn't explain why SAP would use it for broader purposes, of course.
But Vikram Gupta, Senior Director of Product Management at MapR, explained to me that, far from being just an enhanced version of HDFS, MapR-FS was in fact implemented as a standard file system from the get-go. After the core file system was developed, an HDFS-compatible interface was built on top of it, allowing MapR to swap it into its Hadoop distro as a replacement for generic HDFS.
Meanwhile, the full file system is still in there, and additional interfaces for NFS and POSIX sit on top of it as well. This allows different file system clients to treat MapR-FS differently, while all of them physically read and write data in the same place. That's important for companies that wouldn't want to use standard HDFS to store the "gold" copies of their data, but also don't want to pay the twin penalties of data movement and duplication.
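That multi-interface design amounts to an adapter pattern: one storage engine, several protocol-specific views. Here's a minimal, hypothetical sketch (not MapR's actual code; all class and method names are invented) showing how an append-only HDFS-style client and a random-access POSIX-style client can share the same underlying bytes:

```python
# Illustrative sketch only: one storage engine exposed through
# protocol-specific adapters, so different clients read and write
# the same underlying data.

class StorageEngine:
    """Hypothetical core store: file name -> mutable byte buffer."""
    def __init__(self):
        self.files = {}

    def write(self, name, offset, data):
        buf = self.files.setdefault(name, bytearray())
        buf[offset:offset + len(data)] = data

    def read(self, name, offset, length):
        buf = self.files.get(name, bytearray())
        return bytes(buf[offset:offset + length])

class HDFSAdapter:
    """HDFS-style view: write-once/append-only semantics."""
    def __init__(self, engine):
        self.engine = engine

    def append(self, name, data):
        end = len(self.engine.files.get(name, b""))
        self.engine.write(name, end, data)

class POSIXAdapter:
    """POSIX/NFS-style view: random-access read-write."""
    def __init__(self, engine):
        self.engine = engine

    def pwrite(self, name, offset, data):
        self.engine.write(name, offset, data)

    def pread(self, name, offset, length):
        return self.engine.read(name, offset, length)

engine = StorageEngine()
hdfs = HDFSAdapter(engine)
posix = POSIXAdapter(engine)

hdfs.append("log", b"hello world")
posix.pwrite("log", 0, b"HELLO")   # an in-place update HDFS alone can't do
print(posix.pread("log", 0, 11))   # b'HELLO world'
```

The point of the sketch is the shape, not the details: because both adapters hit the same engine, no data is copied or moved between the "Hadoop view" and the "file server view."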
Apparently, that's important to SAP too.
Distributed storage, and elastic infrastructure
Of course, HDFS itself (and thus MapR-FS' HDFS-like functionality) has features that make it work well in the cloud. First, it's a distributed file system, allowing multiple physical disks to be federated into a single storage volume. This allows for geo-distribution and mixing and matching of different drive types (for example, flash storage, SSD and spinning disks) into the system. That, in turn, allows for a storage hierarchy where data of different "temperatures" can be stored on different media. For example, frequently accessed data could be stored in flash while more archival, historical data could be kept on cheaper, spinning media.
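The "temperature" idea reduces to a simple policy: route data to a tier based on how often it's accessed. A toy sketch (thresholds and tier names are invented for illustration, not taken from any product):

```python
# Hypothetical temperature-based tiering policy: map a file's access
# rate to a storage tier. Thresholds are made up for illustration.

TIERS = [
    (100, "flash"),  # hot: 100+ reads/day
    (10, "ssd"),     # warm: 10-99 reads/day
    (0, "hdd"),      # cold/archival: everything else
]

def pick_tier(reads_per_day):
    """Return the cheapest tier whose threshold the access rate meets."""
    for threshold, tier in TIERS:
        if reads_per_day >= threshold:
            return tier
    return "hdd"

print(pick_tier(500))  # flash
print(pick_tier(40))   # ssd
print(pick_tier(2))    # hdd
```

A real system would track access counts per file or block and migrate data between tiers in the background; the policy function is the core of it.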
HDFS and MapR-FS also have redundancy built in, keeping multiple replicas of each file, with each replica stored on a separate physical drive. This makes both file systems resilient to the failure of any one drive: bad drives can quickly be removed from the storage cluster, and new drives added in their place just as easily. That ability to add and remove disks on the fly provides the elasticity that cloud computing demands.
It all makes sense now
If Microsoft and Amazon swap their own blob storage into their Hadoop services as a substitute for HDFS, then why can't SAP go the other way, taking a storage system from a customized Hadoop distribution and using it as a more mainstream file store?
And this MapR-SAP deal isn't a one-off either. When I spoke with Gupta, he was adamant that additional, similar deals would be pursued with other licensees. MapR really sees its Converged Data Platform as just that: a platform. And one that transcends Hadoop, Spark and maybe even traditional "data" itself.