Mesos co-creator: Where the cluster manager came from and where it's heading

With a datacenter operating system based on Mesos due in 2015, the cluster manager's co-creator and Mesosphere chief architect Benjamin Hindman sets out the technology's background and strengths.
Written by Toby Wolpe, Contributor
Benjamin Hindman: Unlike YARN, Mesos wants to run everything.
Image: Mesosphere
Cluster manager Apache Mesos has already secured a serious role at high-profile users such as Twitter, Airbnb, HubSpot, Groupon, eBay and OpenTable. On top of that, startup Mesosphere this month announced an ambitious datacenter operating system based on the software.

Yet some remain unclear where Mesos might fit into their infrastructure and where it sits against resource managers such as YARN and configuration management and orchestration tools like Puppet and Chef.

Mesosphere is a major contributor to the Mesos open-source project and describes the technology as enabling users to manage virtualised and non-virtualised datacenters as if they were one large machine, by creating a single elastic pool of resources from which all applications can draw.

Benjamin Hindman, who left Twitter in September to become Mesosphere's chief architect, is the co-creator of Mesos. The cluster manager emerged from his collaboration with peers in the AMPLab at UC Berkeley while he was working on parallel computing.

"What we were trying to do is figure out a better way to run things like Hadoop on our clusters. We had a couple of specific goals: we wanted to run multiple Hadoops at the same time and we wanted to run other things - Hadoop and, say, MPI [Message Passing Interface] - all on the same cluster resources," Hindman said.

Cluster managers had existed for some time. PBS - the portable batch system - dates from 1991 and was widely used.

"But the interface that cluster managers gave you was, 'Hey, here's a specification that describes how to run my job' and then the cluster manager goes and runs the job," he said.

"We recognised what we really wanted was an API, an interface, that was not, 'Here go and run the job' but, 'Can I get some resources to run the job? OK, run this job for me, tell me when the job finished, tell me when the job failed', so that you could programmatically build something against your cluster."

That programmable interface was the main difference between existing cluster managers and what Hindman and his colleagues were aiming to build.
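The offer-and-callback model Hindman describes can be sketched as a toy Python interface. All names here are hypothetical illustrations of the idea, not the real Mesos API: a scheduler asks for resources, launches work against an offer, and is notified when the task finishes or fails.

```python
# Toy sketch of an offer-and-callback cluster API (hypothetical names,
# not the real Mesos interfaces).

class Offer:
    """A bundle of resources the cluster manager is willing to hand out."""
    def __init__(self, host, cpus, mem_mb):
        self.host, self.cpus, self.mem_mb = host, cpus, mem_mb

class ToyCluster:
    """Stands in for the cluster manager: hands out offers, runs tasks."""
    def __init__(self, offers):
        self.offers = list(offers)

    def request_resources(self):
        # "Can I get some resources to run the job?"
        return self.offers.pop(0) if self.offers else None

    def launch(self, offer, task, on_finished, on_failed):
        # "Run this job for me, tell me when it finished, tell me when it failed."
        try:
            result = task(offer)
            on_finished(result)
        except Exception as exc:
            on_failed(exc)

events = []
cluster = ToyCluster([Offer("node1", cpus=4, mem_mb=8192)])
offer = cluster.request_resources()
cluster.launch(
    offer,
    task=lambda o: f"ran on {o.host} with {o.cpus} cpus",
    on_finished=lambda r: events.append(("finished", r)),
    on_failed=lambda e: events.append(("failed", str(e))),
)
print(events[0])  # -> ('finished', 'ran on node1 with 4 cpus')
```

The point of the callbacks is that a framework like Hadoop or Marathon can be written *against* the cluster, reacting to outcomes programmatically, rather than handing the cluster manager a static job specification.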

"The idea is you can take something like Hadoop and you can program it against Mesos and then you can build other distributed systems, other applications, directly on Mesos as well: that's like [services scheduler] Marathon, that's like Chronos [distributed cron], Spark and a bunch of others as well," he said.

The relationship between Apache Mesos and what Mesosphere produces is like that between the Unix-like open-source Darwin operating system underlying Apple's OS X and OS X itself, according to Hindman.

"The distributed operating system is really what you could imagine is the stuff you'd want to put on top of that open-source kernel that takes it to the next level for enterprise customers," he said.

Mesos and the YARN resource-management layer share similar ideas because of their common origin in the same Yahoo-sponsored Berkeley lab.

"When Hadoop started to do a bit of an iteration from Hadoop MapReduce to YARN, where MapReduce is just a component of that, they borrowed a lot of the ideas from Mesos to realise that but they're still very different," Hindman said.

"Mesos itself is focused on not just running data analytics but also on running long-lived services, web apps - things that we considered at Twitter to be tier-zero services. It's really important that those things don't go down."

At Twitter, services such as data pipelines and data analytics tend to be classed as tier one rather than tier zero.

"If they fail, at least people can still tweet. You might just not find out how many people favourited your tweet because you're not running the analytics in the background and those types of things," he said.

"So YARN and Mesos are similar but Mesos has been going after a more general, 'We want to run everything, not just your analytics but also your web services'. In fact, a big amount of the work in Mesos right now is running your stateful services as well - so your databases, your key-value stores and ultimately your distributed file systems as well."

Hindman said the principal distinction between Mesos and popular tools such as Puppet and Chef is that the latter are declarative and machine-oriented.

"You tend to say with things like Puppet and Chef what you want to run on this box is this stuff. You store that on, say, a centralised server. When the box comes up, it connects back to the centralised server, finds out what it's supposed to be running and it runs that stuff," he said.

"It's still a very node-specific perspective. The Mesos abstraction is not. It's top down, application-specific where resources are just an abstraction - not machines but resources that you might be able to use."

As a result, multi-tenancy with tools such as Puppet or Chef requires manual configuration: specifying, box by box, which machine is to run Hadoop, the Hadoop Distributed File System (HDFS), or Redis [data structure server].

"Whereas on top of Mesos, the Redis application connects, the Hadoop application connects, the HDFS application connects. They get allocated resources and they could all come from the exact same box. Mesos takes care of isolating them, running them on the box and so on," Hindman said.

"So it's an inverse of the abstractions of things like Puppet and Chef and the real reason you end up doing something like that is it's far easier to get things like multi-tenancy done because we can control it and therefore we can increase the utilisation much more easily."
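That inversion, where applications draw from a shared pool instead of being pinned to machines, can be illustrated with a toy allocator. Names and the first-fit policy are illustrative assumptions, not Mesos internals; the point is that two tenants can be placed on the same box by the pool rather than by an operator.

```python
# Toy illustration of the inverted abstraction: applications request
# resources from a shared pool; the pool, not per-machine config,
# decides placement (hypothetical names, not Mesos code).

class Box:
    def __init__(self, name, cpus):
        self.name, self.free_cpus = name, cpus

class Pool:
    def __init__(self, boxes):
        self.boxes = boxes

    def allocate(self, app, cpus):
        # First-fit: give the app cpus from any box with enough room.
        for box in self.boxes:
            if box.free_cpus >= cpus:
                box.free_cpus -= cpus
                return (app, box.name, cpus)
        return None  # no capacity anywhere

pool = Pool([Box("box1", cpus=8)])
a = pool.allocate("redis", cpus=2)
b = pool.allocate("hadoop", cpus=4)
print(a, b)  # both tenants land on box1; multi-tenancy falls out for free
```

Because placement is centralised, the allocator can pack workloads together, which is exactly why Hindman says utilisation is easier to raise under this model.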

However, while utilisation is obviously valuable, Hindman argues an even stronger reason for using Mesos is how easily it deals with faults.

"When you do this node-specific thing and that machine actually dies, somebody gets paged and then they've got to go and rewrite the configuration for a different machine and have that other machine start doing that stuff and bring it up," Hindman said.

"When you do something like Mesos and that machine dies, the applications on top - things like the Marathon framework and Hadoop - can find out that those resources have now gone and they can just reschedule those applications for someplace else. That's pretty significant from an operations perspective."
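The rescheduling behaviour Hindman describes can be sketched in a few lines. This is a deliberately simplified toy (task names, node names and the pick-the-first-healthy-node policy are all assumptions): when a node is lost, its tasks are simply placed elsewhere, with no operator rewriting per-machine configuration.

```python
# Toy sketch of framework-driven rescheduling on node loss
# (hypothetical names and policy, not real Mesos framework logic).

placements = {"analytics": "node1", "web": "node2"}
healthy = ["node2", "node3"]

def reschedule(placements, lost_node, healthy):
    """Move every task that was on the lost node to the first healthy node."""
    return {
        task: (healthy[0] if node == lost_node else node)
        for task, node in placements.items()
    }

placements = reschedule(placements, "node1", healthy)
print(placements)  # analytics moves to node2 automatically; web is untouched
```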

Mesos does generate an event when a machine goes down. At Twitter, Hindman's team set up alerts on lost machines, governed by a series of thresholds.

"If 100 machines are lost, maybe we should page somebody. If 10 machines are lost, everything's going to be rescheduled - not a problem. You have visibility, of course, if you want it but you tend to put thresholds in place. But in the Puppet and Chef world it's much more likely that if a single machine went down, you'd be alerted," he said.
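That threshold logic amounts to a very small decision function. The numbers below are the illustrative ones from Hindman's example, not a recommendation:

```python
# Toy version of threshold-based paging: small losses are rescheduled
# silently; only large ones wake a human (threshold is illustrative).

def should_page(machines_lost, page_threshold=100):
    """Page only when losses reach the threshold; below it, the
    applications reschedule themselves and nobody is woken up."""
    return machines_lost >= page_threshold

print(should_page(10))   # False: everything reschedules, not a problem
print(should_page(100))  # True: page somebody
```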

"You'd probably set up some sort of redundancy when you're using something like Puppet and Chef to run things on multiple machines. But everything can be much more automated in a Mesos world. It removes the burden quite a bit."

Just over a week ago Mesosphere unveiled its plans to take that automation a stage further. The company intends to go to general availability early next year, possibly in the first quarter, with what it calls the Mesosphere datacenter operating system, or DCOS.

"What we've done with DCOS is the way that you run your software, your distributed applications, in your datacenter and cloud, is based on technology that gives you the same kinds of primitives and abstractions that an operating system would give you on a single machine - except across all the machines and across all the resources in the datacenter," Hindman said.

"It's one of the things that make Mesos a really interesting fit because we're really abstracting the machines to resources. We can really categorise those resources and say things like, 'These resources are unsafe, high latency because they're far away, SLAs are quite low because clouds fail all the time'.

"We can introduce resources of this type and applications can start taking advantage of them. This is foreshadowing a direction we're interested in going, which is effectively creating a market economy for resources."
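Categorising resources in this way can be pictured as offers that carry attributes which applications filter on. The attribute names and filtering policy below are purely hypothetical, a sketch of the idea rather than anything in DCOS:

```python
# Toy sketch of categorised resource offers (attribute names are
# hypothetical): offers advertise properties such as latency and
# reliability, and applications take only what they can tolerate.

offers = [
    {"host": "onprem1", "latency": "low", "reliable": True},
    {"host": "cloud1", "latency": "high", "reliable": False},
]

def acceptable(offer, need_reliable):
    # A tier-zero service insists on reliable resources; batch work may not.
    return offer["reliable"] or not need_reliable

tier_zero = [o["host"] for o in offers if acceptable(o, need_reliable=True)]
batch = [o["host"] for o in offers if acceptable(o, need_reliable=False)]
print(tier_zero, batch)  # -> ['onprem1'] ['onprem1', 'cloud1']
```

Once offers are described this richly, pricing them differently is a small step, which is the "market economy for resources" direction Hindman alludes to.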
