Staying agile: data-driven IT operations

Would you like to have an end-to-end picture of your IT operations, but find yourself lost in translation among myriad monitoring solutions and metrics? Your monitoring should be as agile as your operations, and OpsDataStore says it can help you get there.
Written by George Anadiotis, Contributor

You got your agile application development methodology, and your DevOps, continuous integration and release. You got your combination of bare metal and virtualization, private, hybrid, and public cloud.

Great. If you got them right, your application development cycle is quick and adaptive, which means time to market is short, and your deployment options are flexible and elastic, which means you can provision effectively.

The flip side of that is complexity and opaqueness. It means you have a polyglot development environment with many moving parts, and a multitude of heterogeneous testing and deployment environments whose numerous virtualization layers are constantly being reconfigured.

So how do you keep track of your end-to-end IT operations? With a multitude of monitoring solutions: application performance management, operations and network management, and so on.

The problem is that each of these solutions, as great as it may be in what it does, only gives you a part of the bigger picture and lives in its own silo. So if you want to know how the latest fix in your application influenced server utilization, or get an idea of where bottlenecks causing downtime occur, you need to get a bunch of people with their laptops in a room and put their collective data and brains to work in an ad-hoc way.

This is the problem OpsDataStore has identified and is out to solve.


Agility also means complexity. How do you monitor your end-to-end IT operations? Image: OpsDataStore

The big picture

Bernd Harzog, OpsDataStore founder and a performance management industry veteran, was ideally positioned to identify and act upon the issue. After building and selling a performance management solution to Citrix in 2004, Harzog spent the following years working as a performance management (PM) consultant for the likes of UBS and Credit Suisse.

Harzog was proficient with solutions like New Relic, AppDynamics, and Dynatrace, and helped his clients choose the best solutions for their needs, set them up, and make the most of them. Harzog was beyond certified -- he had non-disclosure agreements (NDAs) with most PM vendors because of his intimate knowledge of their products.

He was, as he puts it, "perhaps the only person in the world with this kind of knowledge of what all of these competing vendors were doing and how they were doing it."

Still, when things went wrong, as they often do, fixing them was no easy feat even for Harzog. Because each of these PM solutions focused on only part of the stack, and many of them competed with each other, integration was simply non-existent. This is precisely what Harzog decided to address with OpsDataStore, laying out and implementing a strategy to tackle each of the associated challenges.

Challenge #1: huge space. Despite vendor efforts to expand their offering as much as possible (with Cisco's acquisition of AppDynamics being the latest example), the IT operations stack ranging from application development to infrastructure and networking monitoring is huge.

OpsDataStore decided early on that there was no point in trying to cover it all. What it did instead was strike deals with as many players as possible, in order to be able to collect and integrate their data and metrics.

Challenge #2: vendor access. Some vendors in this space, like AppDynamics, are open about their metrics and even have documented APIs that third parties can use. Others are more opaque, so special permissions and partnerships had to be in place in order to work with them.

Harzog's reputation and relationships in the space definitely helped there, and as a result OpsDataStore has partnered with many key players.

Challenge #3: big data. What OpsDataStore needed to do sounds like a standard big data scenario: ingest, integrate, and reuse data from a variety of sources. That does not make it simple, but OpsDataStore was able to put together the right team to make this happen.

Data is collected from all sources and kept in OpsDataStore's platform, where users can query it using SQL, consume it via Kafka, or explore it through customizable dashboards in Birst, Qlik, or Tableau.
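To make the SQL route concrete, here is a minimal sketch of the kind of cross-silo question raised earlier: did a slow transaction coincide with a busy server? It uses Python's built-in sqlite3 purely as a stand-in. The table and column names (app_metrics, infra_metrics, response_ms, cpu_pct) are hypothetical, because the article does not describe OpsDataStore's actual schema or SQL engine.

```python
import sqlite3

# Hypothetical schema standing in for integrated application and infrastructure data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_metrics (ts INTEGER, host TEXT, txn TEXT, response_ms REAL);
CREATE TABLE infra_metrics (ts INTEGER, host TEXT, cpu_pct REAL);
""")
conn.executemany("INSERT INTO app_metrics VALUES (?, ?, ?, ?)", [
    (0, "web-1", "checkout", 120.0),
    (300, "web-1", "checkout", 480.0),
    (300, "web-2", "checkout", 130.0),
])
conn.executemany("INSERT INTO infra_metrics VALUES (?, ?, ?)", [
    (0, "web-1", 35.0),
    (300, "web-1", 92.0),
    (300, "web-2", 40.0),
])

# Join the two "silos" on host and five-minute timestamp, keeping slow transactions only.
rows = conn.execute("""
SELECT a.ts, a.host, a.txn, a.response_ms, i.cpu_pct
FROM app_metrics AS a
JOIN infra_metrics AS i ON a.host = i.host AND a.ts = i.ts
WHERE a.response_ms > 300
ORDER BY a.ts, a.host
""").fetchall()
print(rows)  # [(300, 'web-1', 'checkout', 480.0, 92.0)]
```

Once both kinds of data live in one place, the "bunch of people with laptops in a room" exercise collapses into a single join.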

Harzog is clear: "We're not a monitoring vendor, we're an integration vendor. And if you ask, why us, simple: we're the only ones with the right business model, vendor relationships, and platform to make it work."


A graph object model that updates itself with new instances and relationships? When dealing with a mess of a data integration environment, that helps. Image: OpsDataStore

Under the hood

Business strategy aside, however, is there anything exceptional in how OpsDataStore works under the hood? OpsDataStore is built on Cassandra, Spark, Kafka, Akka, and EXASOL. It has a Service Provider Interface (SPI) that is used to integrate both raw data and metrics from the vendors it works with. When dealing with such a variety of sources and data, schema and API evolution is a serious consideration.
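The article does not document the SPI itself. As a rough illustration of the general idea behind a service provider interface, the hypothetical sketch below defines a contract (Connector) that each vendor plug-in implements. This lets a platform collect metrics and topology without knowing vendor internals. All names here (Connector, MetricSample, Relationship, register, schema_version) are invented for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSample:
    source: str     # vendor/tool the sample came from (hypothetical field)
    entity_id: str  # e.g. a host, VM or transaction name
    name: str       # metric name as the vendor defines it
    ts: int         # Unix timestamp in seconds
    value: float


@dataclass(frozen=True)
class Relationship:
    parent_id: str
    child_id: str
    kind: str       # e.g. "runs_on"


class Connector(ABC):
    """Hypothetical plug-in contract that a vendor integration would implement."""
    schema_version = 1  # lets a platform detect plug-ins built against an older contract

    @abstractmethod
    def collect_metrics(self, since_ts: int) -> list[MetricSample]: ...

    @abstractmethod
    def collect_topology(self) -> list[Relationship]: ...


class FakeHypervisorConnector(Connector):
    """Toy connector returning canned data instead of calling a real vendor API."""

    def collect_metrics(self, since_ts):
        return [MetricSample("hypervisor", "vm-7", "cpu.usage", since_ts, 63.5)]

    def collect_topology(self):
        return [Relationship("host-3", "vm-7", "runs_on")]


REGISTRY: dict[str, Connector] = {}


def register(name: str, connector: Connector) -> None:
    # A real platform would run its inspection/certification checks here.
    if not isinstance(connector, Connector):
        raise TypeError("connector must implement the Connector interface")
    REGISTRY[name] = connector


register("hypervisor", FakeHypervisorConnector())
samples = REGISTRY["hypervisor"].collect_metrics(since_ts=1_700_000_000)
```

The point of such a contract is that vendors, rather than the platform, can write and maintain the plug-in. That matches the "do it yourselves" strategy Harzog describes.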

OpsDataStore has handled the integration with a select few vendors itself; its mid- to long-term strategy is to leverage that success to motivate vendors to undertake at least part of the integration themselves.

"We just tell them, we can do it in a year, or you could do it yourselves in much less," says Harzog. Get the SPI, plug in, get inspected and certified by OpsDataStore: get with the program.

But that's only part of the solution. The other part is OpsDataStore's highly sophisticated, patent-pending graph Object Data Model. This model updates entities and relationships from all the sources OpsDataStore connects to in near real time, every five minutes.

Harzog is particularly proud of this, and with good reason: "we do this automatically, continuously, dynamically and deterministically -- no statistics involved. Nobody else does this."
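OpsDataStore's patent-pending model isn't described in detail, so the following is only a toy illustration of what "deterministic, no statistics" can mean. On each refresh cycle (every five minutes, per the article; the scheduler is omitted here), the graph is rebuilt from the relationships the sources explicitly report, and the differences are returned. The class and entity names are hypothetical.

```python
class TopologyGraph:
    """Toy graph of entities and relationships, rebuilt from source reports each cycle."""

    def __init__(self) -> None:
        # Each edge is (parent_id, child_id, kind), e.g. ("host-1", "vm-7", "runs_on").
        self.edges: set[tuple[str, str, str]] = set()

    def refresh(self, reported: dict[str, set[tuple[str, str, str]]]):
        """Replace the edge set with the union of what every source reports right now.

        Deterministic: identical reports always yield the identical graph, with no
        statistical inference. Returns (added, removed) so changes are visible.
        """
        current = set().union(*reported.values())
        added, removed = current - self.edges, self.edges - current
        self.edges = current
        return added, removed

    def children(self, parent_id: str) -> list[str]:
        return sorted(child for p, child, _ in self.edges if p == parent_id)


graph = TopologyGraph()
graph.refresh({
    "hypervisor": {("host-1", "vm-7", "runs_on")},
    "apm": {("vm-7", "checkout-svc", "hosts")},
})
# On the next five-minute cycle the hypervisor reports that vm-7 moved to host-2.
added, removed = graph.refresh({
    "hypervisor": {("host-2", "vm-7", "runs_on")},
    "apm": {("vm-7", "checkout-svc", "hosts")},
})
print(added)    # {('host-2', 'vm-7', 'runs_on')}
print(removed)  # {('host-1', 'vm-7', 'runs_on')}
```

Because the graph only reflects what sources assert, a VM migration shows up as one removed edge and one added edge rather than as a probability.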

OpsDataStore also uses statistical methods to automatically calculate time-of-day and day-of-week baselines for every metric state. "You can call this machine learning if you want, although we don't do predictions at the moment," says Harzog. Again, the idea is for users to be able to consume this data from OpsDataStore and feed it into their own apps if need be.
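The article doesn't say which statistics OpsDataStore uses. Here is a minimal sketch under three stated assumptions. Each baseline covers one (day of week, hour of day) slot in UTC. It is built from the mean and population standard deviation of past samples. A reading is flagged when it deviates by more than a hypothetical k = 3 standard deviations.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean, pstdev


def baseline(samples):
    """samples: iterable of (unix_ts, value). Returns {(weekday, hour): (mean, stdev)}."""
    buckets = defaultdict(list)
    for ts, value in samples:
        t = datetime.fromtimestamp(ts, tz=timezone.utc)
        buckets[(t.weekday(), t.hour)].append(value)  # weekday(): Monday == 0
    return {slot: (mean(vals), pstdev(vals)) for slot, vals in buckets.items()}


def is_anomalous(ts, value, base, k=3.0):
    """Flag a reading that deviates more than k standard deviations from its slot."""
    t = datetime.fromtimestamp(ts, tz=timezone.utc)
    slot = (t.weekday(), t.hour)
    if slot not in base:
        return False  # no history for this time slot yet
    m, s = base[slot]
    return abs(value - m) > k * s


WEEK = 7 * 24 * 3600
mon_9am = 1704099600  # 2024-01-01 09:00 UTC, a Monday
history = [(mon_9am + i * WEEK, v) for i, v in enumerate([40.0, 50.0, 60.0])]
base = baseline(history)
print(base[(0, 9)])                                   # mean 50.0, stdev ~8.165
print(is_anomalous(mon_9am + 3 * WEEK, 95.0, base))   # True
print(is_anomalous(mon_9am + 3 * WEEK, 55.0, base))   # False
```

Bucketing by weekday and hour is what lets a Monday-morning peak count as normal while the same load at 3 a.m. on a Sunday would not.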

What about integration on the semantic layer? OpsDataStore takes the middle road there: "We simply tag all metrics to keep track of provenance. So if for example Intel and VMware define server utilization differently -- and they do, actually -- we don't do anything about this, we just preserve the information and let the users decide how to handle this," explains Harzog.
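Here is a minimal sketch of the provenance approach Harzog describes. It assumes each metric reading simply carries a source tag, so readings of the same metric from different tools are kept side by side rather than merged. The TaggedMetric type and the numbers are made up for illustration; they are not actual Intel or VMware figures or definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedMetric:
    name: str       # the metric name, e.g. "server.utilization"
    source: str     # provenance tag: which tool reported this value
    entity_id: str  # the server, VM or service the value describes
    value: float


def by_source(metrics, name, entity_id):
    """Collect readings of one metric for one entity, grouped by provenance.

    Values from different sources are deliberately not averaged or reconciled,
    because each source may define the metric differently.
    """
    grouped = {}
    for m in metrics:
        if m.name == name and m.entity_id == entity_id:
            grouped.setdefault(m.source, []).append(m.value)
    return grouped


# Illustrative values only.
readings = [
    TaggedMetric("server.utilization", "intel", "host-3", 71.0),
    TaggedMetric("server.utilization", "vmware", "host-3", 58.0),
    TaggedMetric("server.utilization", "vmware", "host-4", 12.0),
]
print(by_source(readings, "server.utilization", "host-3"))
# {'intel': [71.0], 'vmware': [58.0]}
```

Deciding which definition to trust is then left to the consumer, as Harzog says.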

OpsDataStore recently released its latest version, 1.2, which offers an impressive new feature: automated root cause detection. "Nobody has ever been able to tie application and transaction behavior to infrastructure behavior, if for no other reason than that nobody has had access to all that data before. We do, so we can offer this. You can now do things like define utilization alerts and correlate them with metric alerts," says Harzog.
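The article doesn't explain how version 1.2 performs root cause detection. As a deliberately naive illustration of what correlating utilization alerts with metric alerts can mean, the sketch below pairs alerts that fire close together in time on related entities. The Alert type, the runs_on mapping, and the 600-second window are all hypothetical. Co-occurrence in time and topology is a hint, not proof of causation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    entity_id: str  # host or application component that raised the alert
    kind: str       # e.g. "utilization" or "response_time"
    ts: int         # Unix timestamp in seconds


def correlate(infra_alerts, app_alerts, runs_on, window_s=600):
    """Pair each application alert with infrastructure alerts on its own host.

    runs_on maps an application component to the host it runs on (the kind of
    relationship a topology graph provides). Two alerts are paired only if they
    hit related entities and fire within window_s seconds of each other.
    """
    pairs = []
    for app in app_alerts:
        host = runs_on.get(app.entity_id)
        for infra in infra_alerts:
            if infra.entity_id == host and abs(app.ts - infra.ts) <= window_s:
                pairs.append((app, infra))
    return pairs


runs_on = {"checkout-svc": "host-3", "search-svc": "host-5"}
infra_alerts = [Alert("host-3", "utilization", 1000), Alert("host-5", "utilization", 9000)]
app_alerts = [Alert("checkout-svc", "response_time", 1240),
              Alert("search-svc", "response_time", 1300)]
pairs = correlate(infra_alerts, app_alerts, runs_on)
for app, infra in pairs:
    print(f"{app.kind} alert on {app.entity_id} coincides with "
          f"{infra.kind} alert on {infra.entity_id}")
```

Only the checkout-svc alert is paired. The search-svc alert's host did raise a utilization alert, but more than two hours later, outside the window.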


Automated root cause identification is no longer a pipe dream, promises OpsDataStore. Image: OpsDataStore

Going forward

So, is OpsDataStore unique? "Our only real competition is the way people and organizations have traditionally been doing things," says Harzog. And it looks like he just might be right. For a company barely two years old with a total headcount of 10, OpsDataStore's achievements are rather impressive: partnerships with key players like Intel and VMware, and clients like Navis.

Navis, a part of Cargotec Corporation, provides technology for managing the movement of cargo through terminals, and is representative of the types of clients OpsDataStore is currently aiming at: "anyone with over 5000 servers." This also dictates some of OpsDataStore's technical decisions -- most prominently, the lack of support for Mesos and public clouds.

"There is not enough demand from our clients," explains Harzog. "Mesos may be wildly popular, and we also use it ourselves internally, but not in our client base. Same goes for public cloud: we have direct connections to our clients, and at this time they don't trust the public cloud to run their business. They just don't. Because they don't believe they can get the reliability and performance they need from public clouds.

"But we do support private clouds, co-location, and hybrid clouds. As things evolve, we will follow. We are already discussing with clients that will be moving some of their workloads to the public cloud in a cloud-bursting, multi-cloud strategy, and we will support them as this happens. Our goal is to always enable them to get the big end-to-end picture."
