MapR, DataStax offer options for container persistence

If it's lack of options for persistence that's keeping you from using containers, maybe it's time you reconsider.
Written by George Anadiotis, Contributor

Containers are great; having to reinvent persistence to use them, not so much. But that is changing.

You're not the only one concerned about container persistence: 82 percent of respondents in a recent survey said they'd deploy containers within two years if storage challenges could be resolved. That is a legitimate concern, as containers can offer great versatility and efficiency compared to virtualization, but being forced to work with stateless apps or to implement hacks to work with state management in containers poses severe limitations.

It also poses great opportunities for solutions that can offer a way forward to app developers, which is why MapR announced its PACC initiative to address this gap. As Andrew Brust wrote last week, "the new MapR Persistent Application Client Container (PACC) allow converged apps in Docker containers to see running MapR clusters, just as if the apps were deployed directly to a virtual machine or physical server."

This means PACC apps will be able to persist their state on MapR-FS, MapR-DB, or MapR Streams. Somewhat puzzlingly, MapR's PACC announcement features praise from Mesosphere's VP of product marketing Edward Hsu: "MapR's recent innovation to make its platform available in a containerised setup expands the already broad set of backing data services that work on DC/OS. We're excited to see MapR embracing containers and together explore the opportunities that this move opens up."

Puzzlingly, because this seems to clash with Mesosphere's own Container 2.0 stack, marketed as Container + State and implemented as an alliance with Confluent, DataStax, and Lightbend. The first two are MapR's competitors in streaming and storage, respectively.

So how does PACC stack up against Container 2.0, and why would Mesosphere partner with a competitor?

Apples and oranges?

"While these may sound like similar solutions, they actually solve different problems," says Dale Kim, MapR senior director, industry solutions.

Kim further says:

"The MapR solution focuses on application deployment, not server deployment. Any organisation that is struggling with getting the full value out of containers due to trade-offs with existing storage options now has a seamless way of deploying all types of applications, including stateful applications.

The MapR solution is even useful for stateless applications that generate data like log files. With container-based deployments, organisations can deploy containers on any node, in any data center, even across the cloud, because the MapR solution does not require storage-specific, or even MapR-specific, provisioning on the servers that will run containers.

Contrast this to the DataStax solution, which is mostly about deploying Cassandra servers in containers. It does not address the issue of adding a persistence tier to existing applications, and requires an application rewrite. While attractive for test and development environments where the agility of standing up new isolated database clusters is useful, it is not an ideal configuration for production environments due to the heavyweight nature of containerised servers.

The overhead of shutting down and restarting a containerised server offsets the benefits of having them virtualised, especially since database servers in production environments ideally use dedicated resources and need to be stable. The administrative overhead of separate silos, especially for modern applications that include multiple data stores, also remains a problem."

Mesosphere in the middle

Carefully chosen words, aimed directly at DataStax but not at all at Mesosphere. And how could they be, since Mesosphere worked with MapR to enable PACC? Do the two companies see this as a co-opetitive relationship?

Tobi Knaup, Mesosphere's CTO, says "this is a partnership that both makes MapR technologies easier to use and expands the set of storage options that work with Mesosphere DC/OS. DC/OS has several integrations available with SAN-based external storage vendors, and with PACC, we now have a filesystem-level storage option for users."

What about integration? How did the two vendors go about it? "No framework integration was necessary, because the administration of PACC is done via DC/OS Container Orchestration," says Knaup.

"Our solution requires no specific integration with DC/OS, as these two technologies work seamlessly together since they address different functions in the stack," concurs Kim, adding that "this is also why MapR works with other orchestration systems like Kubernetes out of the box."

While both parties are careful not to step on each other's toes, there's something going on here. On the one hand, attracting more vendors to its ecosystem is great for Mesosphere, and MapR is a major player in the Hadoop world. On the other hand, MapR offers much more in terms of persistence than filesystem-level storage, so in that sense it competes with DataStax as the persistence layer of the Container 2.0 stack.


DataStax vs. MapR -- Not?

So what does DataStax have to say about that?

"For developers, tools like Hadoop are beyond the 'cool project' stage and so there will obviously be something new to replace this as the next big thing, and you could argue Docker is that thing. However, just joining up Docker and Hadoop does not make things easier to run in a cloud environment," says Patrick McFadin, chief evangelist for Apache Cassandra at DataStax.

McFadin continues:

"As for the use of containers in general, a customer isn't going to make a product decision based on something like a deployment option. It will help developers as they try new systems and learn about how the product will work for them. In that case, end users still evaluate on features and how they solve a business problem. So, the MapR deal is not competitive to DataStax.

Now, software containers are going to grow in importance, and this kind of approach will be more common over time. Shipping a converged data platform on its own won't solve the problem of how to manage huge volumes of data coming into the business, or that different sets of data will need different handling methods in order to make the most out of it.

New tools like graph databases can and should be used alongside data lakes based on the data models that companies want their app information to be in. Managing that as part of a converged approach to data is where I see the biggest problem for companies to solve."

Mirror mirror on the wall, who's the fairest platform of them all?

Quantium, a data analytics company focused on the retail, financial services, and media markets, has been running containers in production on MapR for over two years, and has worked closely with MapR from the beginning.

"Moving from a non-containerised to a containerised world has been challenging, but today we are able to do things we would never be able to do with a non-containerised architecture," says Gerard Paulke, Quantium enterprise architect.

Paulke continues:

"Things like multi-tenancy and being able to support multiple environments on the same hardware suddenly become easy. The introduction of the new PACC tools and pre-built, certified container images add an extra level of security and allow interacting with MapR FS, Streams and DB easier than ever.

We looked at other options but none were as mature, straight forward or cost effective as running containers on MapR. SAN/NAS technology was cost prohibitive and didn't fit into our strategy of distributed, scalable infrastructure. There were no other distributed filesystems with POSIX support that would allow us to run containers alongside Hadoop analytics workloads.

By going with PACC, we are now able to securely allow application containers the ability to store their state (logs, configuration, etc) on MapR's distributed file system. This also means we can increase utilisation of our data center by running application and analytics workloads alongside each other. It enables us to leverage the investment in our analytics platform.

Another huge benefit is the ability for containers to migrate to other nodes on hardware failure while maintaining application state on the MapR filesystem. By utilising MapR volumes, we get built in backups through the use of snapshots, and replications of application state across clusters / data centers."


Of course, we have to keep in mind this comes straight from a happy MapR customer. Doubtless, DataStax or any other vendor for that matter will easily produce happy customer stories to back up their position if asked to. So where does the truth lie? As usual, somewhere in the middle.

Containers are a big deal for app deployment, and persistence is a big deal for containers. In that sense, MapR seems to have the lead currently in terms of persistence options for containers.

On the other hand, deployment options are probably not the primary criterion for most buyers when it comes to big data product decisions, which is why DataStax chooses to emphasize what it sees as its strengths, such as graph processing.

The bigger picture is that products like MapR and DataStax are turning into platforms vying for market share, and deployment options are one of the areas in which such platforms compete. The more container persistence options there are, the more the ecosystem benefits.
