MapR automates data tiering for the hybrid cloud

MapR's latest product feature addresses a familiar data management issue that will become a sleeper for those embracing the cloud: how do you tier data between on-premise clusters and the cloud?
Written by Tony Baer (dbInsight), Contributor

The cloud promises more operational simplicity for implementing big data projects, but that doesn't mean enterprises can bypass the cost and management issues they face when onboarding new applications and data sets. While Ovum expects that cloud will account for most greenfield big data implementations by 2019 (not much over a year from now), that still leaves plenty of organizations that will be juggling on-premise and cloud deployments.

MapR's new Orbit Cloud Suite addresses organizations planning to manage or tier data storage for hybrid on-premise/cloud deployments. It supports two-way movement of data, from the on-premise cluster to the cloud and back.

MapR Orbit Suite takes advantage of the global namespace capability that's already built into MapR's proprietary file system (recently rebranded MapR-XD). In plain English, that means you can keep track of metadata for data even if that data is stored on a separate cluster; that's a capability that open source Apache HDFS lacks. The metadata can cover everything from file and/or table name to data type, security permissions, and so on. The new Orbit Suite feature automates what would have previously required complex manual coding.

With the new Orbit Suite, metadata management extends to data kept in cloud object storage systems; initially, AWS S3 and Azure Blob Storage are supported, with Google Cloud Storage to come later. That leads to a capability that has long been a staple of classic information lifecycle management: storage tiering. Tiering means storing each piece of data in the most cost-effective place.

Traditionally, this would have meant moving aging data off local disk to higher capacity near-line or offline archival storage. But storage tiering today is a far more complex balancing act because there are so many new options. At the high end, in-memory, SSD flash, and (soon) NVRAM storage are making it economical to use silicon for demanding applications. Hadoop's HDFS added options for active archiving use cases, where data that ordinarily would have been disposed of or shunted off to the archive is kept available for big data analytics thanks to scale-out commodity hardware and cheap disk. Now add cloud storage to the equation, providing options that are even cheaper than HDFS. It's an equation that many Amazon EMR customers already balance.
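
To make the balancing act concrete, here is a minimal sketch of an age-based tiering rule. It is illustrative only and is not MapR's policy engine or API: the tier names, the `FileInfo` record, the `choose_tier` function, and the 30-day and 365-day thresholds are all hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical record describing one file's access history."""
    path: str
    days_since_access: int


def choose_tier(f: FileInfo, hot_days: int = 30, warm_days: int = 365) -> str:
    """Pick a storage tier from how recently the file was accessed.

    The thresholds and tier names are assumptions for illustration,
    not values taken from any vendor's product.
    """
    if f.days_since_access <= hot_days:
        return "ssd"                 # recently used: keep on fast silicon
    if f.days_since_access <= warm_days:
        return "cluster_disk"        # active archive on commodity disk
    return "cloud_object_store"      # cold data: cheapest tier


files = [
    FileInfo("/data/clicks/today.parquet", 1),
    FileInfo("/data/clicks/last_quarter.parquet", 120),
    FileInfo("/data/clicks/2014.parquet", 1100),
]

# Build a placement plan: file path -> chosen tier.
plan = {f.path: choose_tier(f) for f in files}
print(plan)
```

A real policy would likely weigh more than age, such as retrieval cost, data egress fees, and where the compute that reads the data is running.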

Once you have moved data to cloud storage, the new Orbit Suite also allows you to provision compute clusters using native APIs, initially for AWS and Azure, with Google Cloud support to come later. Compute provisioning is a separate option, so customers who want to move data to cheaper storage but don't yet need to spin up compute can skip it. Additionally, the new Orbit Suite offering rounds out MapR's Edge IoT preprocessing offering by adding the option to move data in real time to the cloud, not just to the on-premise cluster.

Managed cloud services are supposed to provide operational simplicity, but the richness of the options that they offer will throw new choices and complexities into the mix. With its new Orbit Suite offering, MapR is addressing what will become one of many sleeper issues for those embracing hybrid cloud strategies.
