Big Data meets Complex Event Processing: AccelOps delivers a better architecture to attack the data center monitoring and analytics problem

You need to nail the real-time analytics problem, and this has to be the centerpiece of any monitoring or management platform going forward.
Written by Dana Gardner, Contributor

Listen to the podcast. Find it on iTunes/iPod. Read a full transcript or download a copy. Sponsor:AccelOps. Connect with AccelOps: Linkedin, Twitter, Facebook, RSS.

The latest BriefingsDirect podcast discussion centers on how new data and analysis approaches are significantly improving IT operations monitoring, as well as providing stronger security.

The conversation examines how AccelOps has developed technology that correlates events with relevant data across IT systems, so that operators can gain much better insights faster, and then learn as they go to better predict future problems before they emerge. That's because advances in big data analytics and complex events processing (CEP) can come together to provide deep and real-time, pattern-based insights into large-scale IT operations.

Here to explain how these new solutions can drive better IT monitoring and remediation response -- and keep those critical systems performing at their best -- is Mahesh Kumar, Vice President of Marketing at AccelOps. The discussion is moderated by Dana Gardner, Principal Analyst at Interarbor Solutions. [Disclosure: AccelOps is a sponsor of BriefingsDirect podcasts.]

Here are some excerpts:

Gardner: Is there a fundamental change in how we approach the data that’s coming from IT systems in order to get a better monitoring and analysis capability?

Kumar: The data has to be analyzed in real-time. By real-time I mean in streaming mode before the data hits the disk. You need to be able to analyze it and make decisions. That's actually a very efficient way of analyzing information. Because you avoid a lot of data sync issues and duplicate data, you can react immediately in real time to remediate systems or provide very early warnings in terms of what is going wrong.

The challenges in doing this streaming-mode analysis are scale and speed. The traditional approaches with pure relational databases alone are not equipped to analyze data in this manner. You need new thinking and new approaches to tackle this analysis problem.

Gardner: Also for issues of security, offeners are trying different types of attacks. So this needs to be in real-time as well?

Kumar: You might be familiar with advanced persistent threats (APTs). These are attacks where the attacker tries their best to be invisible. These are not the brute-force attacks that we have witnessed in the past. Attackers may hijack an account or gain access to a server, and then over time, stealthily, be able to collect or capture the information that they are after.

These kinds of threats cannot be effectively handled only by looking at data historically, because these are activities that are happening in real-time, and there are very, very weak signals that need to be interpreted, and there is a time element of what else is happening at that time. This too calls for streaming-mode analysis.

If you notice, for example, someone accessing a server, a database administrator accessing a server for which they have an admin account, it gives you a certain amount of feedback around that activity. But if on the other hand, you learn that a user is accessing a database server for which they don’t have the right level of privileges, it may be a red flag.

You need to be able to connect this red flag that you identify in one instance with the same user trying to do other activity in different kinds of systems. And you need to do that over long periods of time in order to defend yourself against APTs.

Gardner: It's always been difficult to gain accurate analysis of large-scale IT operations, but it seems that this is getting more difficult. Why?

Kumar: If you look at trends, there are on average about 10 virtual machines (VMs) to a physical server. Predictions are that this is going to increase to about 50 to 1, maybe higher, with advances in hardware and virtualization technologies. The increase in density of VMs is a complicating factor for capacity planning, capacity management, performance management, and security.

In a very short period of time, you have in effect seen a doubling of the size of the IT management problem. So there are a huge number of VMs to manage and that introduces complexity and a lot of data that is created.

Cloud computing

Cloud computing is another big trend. All analyst research and customer feedback suggests that we're moving to a hybrid model, where you have some workloads on a public cloud, some in a private cloud, and some running in a traditional data center. For this, monitoring has to work in a distributed environment, across multiple controlling parties.

Last but certainly not the least, in a hybrid environment, there is absolutely no clear perimeter that you need to defend from a security perspective. Security has to be pervasive.

Given these new realities, it's no longer possible to separate performance monitoring aspects from security monitoring aspects, because of the distributed nature of the problem. ... So change is happening much more quickly and rapidly than ever before. At the very least, you need monitoring and management that can keep pace with today’s rate of change.

The basic problem you need to address is one of analysis. Why is that? As we discussed earlier, the scale of systems is really high. The pace of change is very high. The sheer number of configurations that need to be managed is very large. So there's data explosion here.

Since you have a plethora of information coming at you, the challenge is no longer collection of that information. It's how you analyze that information in a holistic manner and provide consumable and actionable data to your business, so that you're able to actually then prevent problems in the future or respond to any issues in real-time or in near real-time.

You need to nail the real-time analytics problem, and this has to be the centerpiece of any monitoring or management platform going forward.

Advances in IT Gardner: So we have the modern data center, we have issues of complexity and virtualization, we have scale, we have data as a deluge, and we need to do something fast in real-time and consistently to learn and relearn and derive correlations.

It turns out that there are some advances in IT over the past several years that have been applied to solve other problems that can be brought to bear here. You've looked at what's being done with big data and in-memory architectures, and you've also looked at some of the great work that’s been done in services-oriented architecture (SOA) and CEP, and you've put these together in an interesting way.

Kumar: Clearly there is a big-data angle to this.

Doug Laney, a META and a Gartner analyst, probably put it best when he highlighted that big data is about volume, the velocity or the speed with which the data comes in and out, and the variety or the number of different data types and sources that are being indexed and managed.

For example, in an IT management paradigm, a single configuration setting can have a security implication, a performance implication, an availability implication, and even a capacity implication in some cases. Just a small change in data has multiple decision points that are affected by it. From our angle, all these different types of criteria affect the big data problem.

Couple of approaches

There are a couple of approaches. Some companies are doing some really interesting work around big-data analysis for IT operations.

They primarily focus on gathering the data, heavily indexing it, and making it available for search, thereby derive analytical results. It allows you to do forensic analysis that you were not easily able to with traditional monitoring systems.

The challenge with that approach is that it swings the pendulum all the way to the other end. Previously we had a very rigid, well-defined relational data-models or data structures, and the index and search approach is much more of a free form. So the pure index-and-search type of an approach is sort of the other end of the spectrum.

What you really need is something that incorporates the best of both worlds and puts that together, and I can explain to you how that can be accomplished with a more modern architecture. To start with, we can't do away with this whole concept of a model or a relationship diagram or entity relationship map. It's really critical for us to maintain that.

I’ll give you an example. When you say that a server is part of a network segment, and a server is connected to a switch in a particular way, it conveys certain meaning. And because of that meaning, you can now automatically apply policies, rules, patterns, and automatically exploit the meaning that you capture purely from that relationship. You can automate a lot of things just by knowing that.

If you stick to a pure index-and-search approach, you basically zero out a lot of this meaning and you lose information in the process. Then it's the operators who have to handcraft these queries to have to then reestablish this meaning that’s already out there. That can get very, very expensive pretty quickly.

Our approach to this big-data analytics problem is to take a hybrid approach. You need a flexible and extensible model that you start with as a foundation, that allows you to then apply meaning on top of that model to all the extended data that you capture and that can be kept in flat files and searched and indexed. You need that hybrid approach in order to get a handle on this problem.

Gardner: Why do you need to think about the architecture that supports this big data capability in order for it to actually work in practical terms?

Kumar: You start with a fully virtualized architecture, because it allows you not only to scale easily, ... but you're able to reach into these multiple disparate environments and capture and analyze and bring that information in. So virtualized architecture is absolutely essential.

Auto correlate

Maybe more important is the ability for you to auto-correlate and analyze data, and that analysis has to be distributed analysis. Because whenever you have a big data problem, especially in something like IT management, you're not really sure of the scale of data that you need to analyze and you can never plan for it.

Think of it as applying a MapReduce type of algorithm to IT management problems, so that you can do distributed analysis, and the analysis is highly granular or specific. In IT management problems, it's always about the specificity with which you analyze and detect a problem that makes all the difference between whether that product or the solution is useful for a customer or not.

A major advantage of distributed analytics is that you're freed from the scale-versus-richness trade-off, from the limits on the type of events you can process. If I wanted to do more complex events and process more complex events, it's a lot easier to add compute capacity by just simply adding VMs and scaling horizontally. That’s a big aspect of automating deep forensic analysis into the data that you're receiving.

I want to add a little bit more about the richness of CEP. It's not just around capturing data and massaging it or looking at it from different angles and events. When we say CEP, we mean it is advanced to the point where it starts to capture how people would actually rationalize and analyze a problem.

The only way you can automate your monitoring systems end-to-end and get more of the human element out of it is when your CEP system is able to capture those nuances that people in the NOC and SOC would normally use to rationalize when they look at events. You not only look at a stream of events, you ask further questions and then determine the remedy.

No hard limits

To do this, you should have a rich data set to analyze, i.e. there shouldn’t be any hard limits placed on what data can participate in the analysis and you should have the flexibility to easily add new data sources or types of data. So it's very important for the architecture to be able to not only event on data that are is stored in in traditional models or well-defined relational models, but also event against data that’s typically serialized and indexed in flat file databases.

Gardner: What's the payoff if you do this properly?

Kumar: It is no surprise that our customers don’t come to us saying we have a big data problem, help us solve a big data problem, or we have a complex event problem.

Their needs are really around managing security, performance and configurations. These are three interconnected metrics in a virtualized cloud environment. You can't separate one from the other. And customers say they are so interconnected that they want these managed on a common platform. So they're really coming at it from a business-level or outcome-focused perspective.

What AccelOps does under the covers, is apply techniques such as big-data analysis, complex driven processing, etc., to then solve those problems for the customer. That is the key payoff -- that customer’s key concerns that I just mentioned are addressed in a unified and scalable manner.

An important factor for customer productivity and adoption is the product user-interface. It is not of much use if a product leverages these advanced techniques but makes the user interface complicated -- you end up with the same result as before. So we’ve designed a UI that’s very easy to use, requires one or two clicks to get the information you need; a UI-driven ability to compose rich events and event patterns. Our customers find this very valuable, as they do not need super-specialized skills to work with our product.

Key metrics What we've built is a platform that monitors data center performance, security, and configurations. The three key interconnected metrics in virtualized cloud environments. Most of our customers really want that combined and integrated platform. Some of them might choose to start with addressing security, but they soon bring in the performance management aspects into it also. And vice versa.

And we take a holistic cross-domain perspective -- we span server, storage, network, virtualization and applications. What we've really built is a common consistent platform that addresses these problems of performance, security, and configurations, in a holistic manner and that’s the main thing that our customers buy from us today.

Free trial download

Most of our customers start off with the free trial download. It’s a very simple process. Visit www.accelops.com/download and download a virtual appliance trial that you can install in your data center within your firewall very quickly and easily.

Getting started with the AccelOps product is pretty simple. You fire up the product and enter the credentials needed to access the devices to be monitored. We do most of it agentlessly, and so you just enter the credentials, the range that you want to discover and monitor, and that’s it. You get started that way and you hit Go.

The product then uses this information to determine what’s in the environment. It automatically establishes relationships between them, automatically applies the rules and policies that come out of the box with the product, and some basic thresholds that are already in the product that you can actually start measuring the results. Within a few hours of getting started, you'll have measurable results and trends and graphs and charts to look at and gain benefits from it.

Gardner: It seems that as we move toward cloud and mobile that at some point or another organizations will hit the wall and look for this automation alternative.

Kumar: It’s about automation and distributed analytics and about getting very specific with the information that you have, so that you can make absolutely more predictable, 99.9 percent correct of decisions and do that in an automated manner. The only way you can do that is if you have a platform that’s rich enough and scalable and that allows you to then reach that ultimate goal of automating most of the management of these diverse and disparate environments.

That’s something that's sorely lacking in products today. As you said, it's all brute-force today. What we have built is a very elegant, easy-to-use way of managing your IT problems, whether it’s from a security standpoint, performance management standpoint, or configuration standpoint, in a single integrated platform. That's extremely appealing for our customers, both enterprise and cloud-service providers.

I also want to take this opportunity to encourage those of your listening or reading this podcast to come meet our team at the 2011 Gartner Data Center Conference, Dec. 5-9, at Booth 49 and learn more. AccelOps is a silver sponsor of the conference.

Listen to the podcast. Find it on iTunes/iPod. Read a full transcript or download a copy. Sponsor:AccelOps. Connect with AccelOps: Linkedin, Twitter, Facebook, RSS.

You may also be interested in:

Editorial standards