MapR brings tight integration with Kubernetes

MapR, the erstwhile Hadoop distribution vendor, is now delivering Apache Spark and Drill in forms able to run natively on Kubernetes, with no Hadoop required.

Written by Andrew Brust, Contributor April 2, 2019 at 5:00 a.m. PT

MapR has spent years turning its Data Platform into much more than a Hadoop distribution, focusing on unique innovations like its file system, database, streaming platform and more. While Hadoop and YARN have been there, running behind the scenes, they've become an implementation detail rather than the platform's emphasized technology.

Today, MapR is taking that Hadoop-independent approach one major step further, advancing it from marketing emphasis to architectural reality. The company is announcing that its implementations of Apache Spark and Apache Drill can now run directly on Kubernetes (K8s), independent of Hadoop. In addition, MapR-XD, the company's file system, can serve as a K8s persistent volume, via a CSI (Container Storage Interface) plug-in.

MapR's Spark/Drill-on-Kubernetes deployment models for private, public and hybrid cloud.
Credit: MapR

KISS for K8s

Beyond providing mere K8s compatibility, MapR is trying to simplify the experience for its customers by providing K8s Operators and a higher-level MapR abstraction called a "tenant." The latter is designed to provide an easier way to deploy whole apps and associate specific users/groups with them; it abstracts away Kubernetes namespaces, pods and operators.

The implementation works on on-premises Kubernetes clusters, or those in the cloud -- whether they be running on cloud providers' IaaS virtual machines or their managed Kubernetes services (like Amazon Elastic Container Service for Kubernetes, Azure Kubernetes Service or Google Kubernetes Engine). The storage layer can be deployed across VM disks or cloud storage services, though MapR says the latter can be significantly more expensive.

While this MapR offering does not represent a K8s-compatible implementation of the entire MapR platform, it does nonetheless enable arguably the most popular BI, data engineering and machine learning workloads to run directly on K8s. Furthermore, the company expects to bring more Data Platform components over to the K8s-native deployment model.

Hadoop's retreat

With the merger of Cloudera and Hortonworks on the one hand, and MapR hurtling down the Hadoop-independent fast lane on the other, it sure feels like Hadoop is on its back elephant foot. Spark, cloud object stores and Kubernetes have directly challenged Hadoop components MapReduce, HDFS and YARN, respectively. Meanwhile, cloud Hadoop services like Amazon EMR, Microsoft HDInsight and Google Cloud Dataproc are still popular -- including as platforms on which to run Spark.

The rumors of Hadoop's death may be greatly exaggerated, but it's not exactly the big data magic tonic and elixir it was drummed up to be, say, eight years ago. Big data's here to stay and so is machine learning. But when it comes to deploying and implementing the technologies that do the work, there's a growing diversity of options and lots of competition.