In a perfect world, DevOps (the culture of integrating software development and its day-to-day operation in the data center) should be seamless, with Devs and Ops working side by side in harmony and understanding each other's playing field, options and consequences. In the real world, that is easier said than done.
Part of it may have to do with culture and workplace collaboration, but there's no denying that there are also real issues that impede the ideal of seamless collaboration. It all comes down to being able to see the greater picture, and analytics may be able to lend a helping hand there.
A new initiative announced today by Pepperdata aims to tackle this exact issue, building on an open source project by LinkedIn and starting with the Hadoop world but promising to expand its reach above and beyond.
'They need to understand what they're doing to us'
Pepperdata, a vendor specializing in Big Data performance, today announced it is expanding its product portfolio with Pepperdata Application Profiler. The Profiler, based on LinkedIn's Dr. Elephant open source solution, aims to provide Hadoop and Spark developers with easy-to-understand recommendations for improving job performance. Ash Munshi, Pepperdata's CEO, provided an interesting backstory to the new product's birth.
Pepperdata's current line of products includes the Analyzer, the Optimizer and the Enforcer. The Analyzer collects and analyzes a wide array of process, operating system and hardware metrics to provide an overview of cluster utilization, while the Optimizer and the Enforcer build on those to optimize cluster resource utilization and enforce user-provided policies regarding job prioritization.
All of these are apparently aimed at the Ops side of DevOps -- the people in charge of monitoring and optimizing the deployment and performance of applications. So what's the deal with the Profiler -- what is it out to achieve, and why?
As Munshi explains, "We did it because we had Ops come to us and say, it's great that you're helping us get the picture of what's going on in our clusters, but we need to get that across to developers. They need to understand what they're doing to us".
Jesse Escobedo, Senior Systems Engineer at Rubicon Project, attests:
Having the appropriate visibility and insight into our Big Data applications is extremely important when delivering detailed reports to our clients and meeting our SLA. We challenged Pepperdata to come up with a solution to profile our applications before going to production that would help us maintain our SLA to our customers as we introduce new applications.
Sounds like a plea born of frustration? Maybe. The idea is that seeing is believing: give Devs access to the metrics for the jobs their code spawns in the cluster, along with feedback and suggestions on how to optimize those jobs, and they can make code changes that should result in better performance.
Sounds like a job for a tool that automatically gathers metrics, runs analysis on them, and presents them in a simple way for easy consumption, with the goal of improving developer productivity and increasing cluster efficiency by making it easier to tune jobs. That tool should provide insights on how a job performed, and then use the results to make suggestions about how to tune the job to make it perform more efficiently.
Dr. Elephant and Mr. Profiler
That is actually the description of what Dr. Elephant does, so turning to it was the obvious choice. "Pepperdata listened to us and quickly understood the problem we were trying to address. Working together we helped Pepperdata evaluate the usefulness and potential for integrating the open source Hadoop job performance monitoring technology, Dr. Elephant, directly into the Pepperdata dashboard," said Escobedo.
So, what's the big deal about taking an open source project and integrating it in your proprietary stack? And why would LinkedIn, or anyone else for that matter, care that much about that?
Obviously, that's good news for Pepperdata users. With the new integrated product in its stack, they get more than the sum of its parts, as Munshi points out:
Using Dr. Elephant on its own, Devs can get information that says 'this part of your job is slow.' The problem is that Devs don't understand the context: what else is running at this time, what resources jobs may be contesting for and so on. Integrated in our stack, they get the full picture in one place -- plus they save the hassle of installing yet another DB and UI to do that.
What about the rest of the world then? Dr. Elephant has been running as an open source project from LinkedIn for about a year now, so it did not take that long for it to be picked up. LinkedIn's engineers say that "Dr. Elephant is very popular at LinkedIn, where people love it for its simplicity. Like a family doctor, it is always on call and solves around 80 percent of the problems through simple diagnosis".
"When we approached LinkedIn, Dr. Elephant was a project used mainly internally and by a couple of relatively small customers. LinkedIn people were very excited by our interest in the project, and now we're in deep collaboration with them," said Munshi.
"We won't just stay at this level, but we intend to develop Dr. Elephant further and contribute back to its core codebase -- we will be a good open source citizen. We also want to turn it to an Apache project," he added.
Carl Steinbach, Senior Staff Software Engineer at LinkedIn, sounds excited indeed:
We created Dr. Elephant to help Hadoop and Spark users understand, analyze and improve the performance and efficiency of their applications. Pepperdata is well-positioned to make significant contributions to this project in terms of new features, new use cases and the ability to reach new users.
Dr. Elephant works by analyzing Hadoop and Spark jobs using a set of pluggable, configurable, rule-based heuristics that provide insights on how a job performed. Pepperdata aims to extend and expand those heuristics, making Dr. Elephant bigger, better, faster, more.
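To make the idea of pluggable, rule-based heuristics more concrete, here is a minimal sketch in Python. It is not Dr. Elephant's actual code or API; the metric names, thresholds and advice strings are all hypothetical and chosen purely for illustration. Each heuristic is a small rule that looks at a job's metrics and returns a severity plus a tuning suggestion, and new rules can be added to the list without touching the analysis loop.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Finding:
    heuristic: str
    severity: str  # "none", "moderate" or "severe"
    advice: str


# A heuristic is any callable that maps job metrics to a Finding.
Heuristic = Callable[[Dict[str, float]], Finding]


def make_threshold_heuristic(name: str, metric: str, moderate: float,
                             severe: float, advice: str) -> Heuristic:
    """Build a configurable rule that grades one metric against two thresholds."""
    def check(metrics: Dict[str, float]) -> Finding:
        value = metrics.get(metric, 0.0)
        if value >= severe:
            severity = "severe"
        elif value >= moderate:
            severity = "moderate"
        else:
            severity = "none"
        return Finding(name, severity, advice if severity != "none" else "")
    return check


# Hypothetical rules; metric names and thresholds are assumptions.
HEURISTICS: List[Heuristic] = [
    make_threshold_heuristic(
        "mapper_skew", "mapper_skew_ratio", 2.0, 4.0,
        "Input is unevenly split across mappers; consider repartitioning."),
    make_threshold_heuristic(
        "gc_overhead", "gc_time_fraction", 0.10, 0.20,
        "Tasks spend a large share of time in GC; consider more task memory."),
]


def analyze(metrics: Dict[str, float],
            heuristics: List[Heuristic] = HEURISTICS) -> List[Finding]:
    """Run every heuristic and keep only the ones that flagged a problem."""
    findings = [h(metrics) for h in heuristics]
    return [f for f in findings if f.severity != "none"]


if __name__ == "__main__":
    job_metrics = {"mapper_skew_ratio": 5.0, "gc_time_fraction": 0.05}
    for finding in analyze(job_metrics):
        print(f"[{finding.severity}] {finding.heuristic}: {finding.advice}")
```

Extending the set of heuristics, in this sketch, amounts to appending another rule to the list.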
Profiler supports Spark and MapReduce on Cloudera, Hortonworks, MapR, IBM and Apache Hadoop distributions. It is currently available in early access and will be generally available in Q2 2017.
This seems like a win-win-win combination: Pepperdata and its users win by getting a new piece in their stack, LinkedIn wins by getting a new major contributor and exposure for their project, and the rest of the world wins by eventually getting more functionality in the open source Dr. Elephant. What's not to like?
Pepperdata, however, has more plans in store for Profiler. The next steps in its evolution will be to go from inspecting deployed jobs to predicting the impact of a deployment before it happens, and even to automatically restructuring code.
Pepperdata already touches upon automation in the Optimizer product, where Machine Learning algorithms are used to optimize cluster resource utilization, and it wants to go full circle from the Ops to Devs side.
Furthermore, Pepperdata aims to break beyond the Hadoop world. Currently its products work with YARN, Hadoop's resource manager, but the company wants to expand its offering to also work with the likes of Mesos or Kubernetes. "Our goal is to figure out how to work with them as well, it is strategic for us and it's where we're headed next," said Munshi.
Speaking of strategy, Munshi's own onboarding seems to be part of that too. Munshi, formerly Yahoo's CTO, joined Pepperdata as its CEO in August 2016, succeeding Pepperdata co-founder Sean Suchter, who has now become the startup's CTO. Pepperdata has also just set up what it calls an "All-Star Technology Advisory Board to Guide Future Innovation and Growth."
Pepperdata was co-founded in 2012 by Sean Suchter and Chad Carson, both of whom previously held management positions at Microsoft and Yahoo. It boasts clients like Comcast, Philips Wellcentive, and Zillow.
It has raised a little over $20 million in funding from investors including Wing Venture Capital, Signia Venture Partners, and Citi Ventures, which Munshi has said should be enough to go on, so the company is in no rush to raise money or sell to a bigger company at this point in time.
With such big plans, however, you never know: "At the end of the day, every company is at sale for the right price."