Understanding the performance of the modern data center

Virtual Instruments Q&A: Understanding how apps co-exist in the data center is complicated but important.

It's been a hectic time for infrastructure performance management company Virtual Instruments: a merger with storage performance analytics company Load DynamiX in March last year was followed by the acquisition of hybrid cloud and virtualization management company Xangati in November.

Founded in 2008, Virtual Instruments has nearly 500 customers, including enterprises, cloud service providers and storage vendors; the privately held company is headquartered in San Jose, California. ZDNet talked to Virtual Instruments CEO Philippe Vincent to see how the company is doing.

ZDNet: Walk us through the background of your company and the reasons for the merger.
Vincent: I was the CEO of Load DynamiX and then we merged the companies -- Load DynamiX and Virtual Instruments -- in April of last year, then we acquired Xangati in October of last year.

Now the largest of those companies is Virtual Instruments, but the other two supply important components to the mix as well as some vitally important IP.

All three of these companies were based in Silicon Valley literally within three miles of each other. Now we have combined all the companies and people into one campus and as of a few months ago, we have got our entire product team in one place.

Can you talk about where you came from?
The three companies that make up Virtual Instruments are separate but united. Virtual Instruments was a company focused on overall infrastructure performance management. That was because they saw themselves instrumenting various product areas: storage environments and storage networking environments.

So, rather than being focused on particular pieces of hardware such as switches or storage arrays, they saw themselves going after the broader space of infrastructure performance management.

Our speciality now is what you could call our north/south axis. You have your apps and middleware running on top of your infrastructure, and that vertical stack is the north/south axis. Within the infrastructure you've got your east/west axis, which is nodes talking to each other. So the strength of Virtual Instruments was the instrumentation and analytics of that north/south axis. That was what VI was all about.

Naturally we have some particular strengths. Financial services is one of our strongest verticals, and that is where we see most of the mission-critical apps. They are very complex and highly interdependent.

We also attract healthcare, particularly in the US where healthcare organisations generally spend a lot of money on IT.

We also have a fairly large representation in telcos and they also tend to be very large and complex.

Finally, cloud service providers are our fastest-growing segment. Some of them have been customers for quite a few years, and it has become our largest segment as well.

Now when we looked to take our business to the next stage, we wanted to be innovative. For example, we used Load DynamiX to simulate the workloads customers experience on a regular basis, creating synthetic traffic designed to mimic the I/O path of real workloads.
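To make "synthetic traffic that mimics the I/O path of real workloads" a little more concrete, here is a minimal Python sketch. It is not Load DynamiX's implementation. It assumes a workload can be summarised by three things: a read/write ratio, a block-size mix and a working-set size. The `WorkloadProfile` and `generate_ops` names are hypothetical. The sketch draws a reproducible stream of I/O operations from that profile.

```python
import random
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    """Hypothetical summary of an observed workload (field names are illustrative)."""
    read_ratio: float              # fraction of operations that are reads, 0.0 to 1.0
    block_sizes: dict[int, float]  # block size in bytes -> relative probability
    working_set_bytes: int         # size of the address range the workload touches


def generate_ops(profile: WorkloadProfile, count: int, seed: int = 0):
    """Yield (op, offset, size) tuples drawn from the profile's distributions.

    A fixed seed makes the synthetic stream reproducible, so the same
    workload can be replayed against different storage configurations.
    """
    rng = random.Random(seed)
    sizes = list(profile.block_sizes)
    weights = list(profile.block_sizes.values())
    for _ in range(count):
        op = "read" if rng.random() < profile.read_ratio else "write"
        size = rng.choices(sizes, weights=weights)[0]
        # Offsets are aligned to the block size and stay inside the working set.
        offset = rng.randrange(0, profile.working_set_bytes - size + 1, size)
        yield op, offset, size
```

You can run a what-if in the spirit Vincent describes by changing one field and regenerating the stream, for example raising `read_ratio` or shifting the block-size mix. A real tool would also model timing, queue depth and the actual I/O path, and this sketch ignores all of those.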

If you look at that business, the product was used by two constituencies. The first is pretty much everybody who builds storage systems or database servers. Any company or enterprise that uses those products knows that they have been tested, that they work and that they are reliable. Cisco, IBM, they all use the equipment.

Now that was Load DynamiX, our initial business. But how to move forward with that? I was hired by the board of Load DynamiX to say: 'That's OK, but there is a market beyond that'.

We can help our customers use our performance validation to make better decisions. That is something that started for us in 2014.

Now, among the early users who adopted the product, we realised that the use case was very simple. They would start with the workloads they had, simulate and replay them, twiddle all the knobs and run what-ifs.


Vincent: "We take care of both pieces - the application and the infrastructure."

Photo: Colin Barker

Then they would plug in all sorts of storage solutions they wanted to investigate. They could then make their decisions, and they could also use the results to drive the change management process.

Now we did all of this with our Load DynamiX customers and they said: 'It's great that you can help us simulate our workloads but could you tell us what our workloads are?'

They did not have that data, so we started to become a monitoring business, giving them the ability to search those workloads. That's Load DynamiX in a nutshell.

Other companies cover the east/west axis, and their focus is really the top of the IP network. We have the depth that comes from the storage network, and on top of that we need to make sense of that data.

If you are having a problem with your array, then it's probably a good idea to see where that problem is coming from -- the causation.

Now causation has been in the product for some years but what we are adding now is the ability to see the infrastructure pretty much in an end-to-end way. Our focus now is on extending that understanding to the applications inside of our software.

Typically, there is the APM [Application Performance Management] space, and three or four vendors own it. They include CA and IBM, and nowadays we have the newcomers as well.

They are the generation that is winning right now. But when you look at the infrastructure side, instead of an infrastructure performance management platform, you see a collection of monitoring solutions in silos.

So nowadays you are typically looking at a customer who has several generations of infrastructure in their environment, each with its own monitoring tool. Instead of one monitoring tool, they have dozens of them to monitor all those different silos. And to add to the confusion, besides sitting in silos, those tools come from different vendors. That is not working.

What we want to do is reinvent that space with a platform approach that allows you to get the capability and flexibility across the environments.

But to do that you need to be able to bring multiple environments together?
Yes, and with the expertise collected by our original three companies, we think we have the capability to do that. Some of the pieces are integrated already, and the rest will be integrated by early next year.


Now with Load DynamiX and Xangati we have gained two other capabilities: Resource Contention Modelling and Workload Modelling. What we are trying to do here is help the customer manage the interaction between the thousands of infrastructure services they are running today and the thousands of application services those infrastructure services support.

If we look at Resource Contention Modelling, that is basically the principle of asking, for every node of an infrastructure network: who is using you, and how are they contending for your resources?

From there we can be proactive and look at how, say, VM number 15 is starting to affect the performance of others. So you might be saying: 'I have these 16 VMs that I am writing to, and this is how they are contending for resources. By the way, VM number 15 is starting to hog the resources, and this is affecting the performance.'
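Resource contention modelling, as Vincent describes it, asks each node who is using it and how hard. A small Python sketch can illustrate that question. It is an assumption-laden illustration, not Virtual Instruments' method. It takes per-interval (node, consumer, IOPS) samples and computes each consumer's share of a node's load. It then flags as a "hog" any consumer using more than a hypothetical `hog_factor` times its fair share, where fair share is 1/n for n consumers. The function name `contention_report` is also hypothetical.

```python
from collections import defaultdict


def contention_report(samples, hog_factor=3.0):
    """Summarise contention on each shared node for one sampling interval.

    samples: iterable of (node, consumer, iops) observations.
    Returns {node: (total_iops, shares, hogs)}, where shares maps each
    consumer to its fraction of the node's load and hogs lists consumers
    whose share exceeds hog_factor times the fair share 1/n.
    """
    load = defaultdict(lambda: defaultdict(float))
    for node, consumer, iops in samples:
        load[node][consumer] += iops  # who is using this node, and how much

    report = {}
    for node, consumers in load.items():
        total = sum(consumers.values())
        if total == 0:
            report[node] = (0.0, {}, [])
            continue
        shares = {c: v / total for c, v in consumers.items()}
        threshold = hog_factor / len(consumers)
        hogs = sorted(c for c, s in shares.items() if s > threshold)
        report[node] = (total, shares, hogs)
    return report


if __name__ == "__main__":
    # Sixteen VMs writing to one array; VM 15 suddenly issues far more I/O.
    samples = [("array-1", f"vm{i:02d}", 2000.0 if i == 15 else 100.0)
               for i in range(1, 17)]
    total, shares, hogs = contention_report(samples)["array-1"]
    print(total, hogs)  # 3500.0 ['vm15']
```

The sketch scores a single interval. Doing this "in real time" would mean re-running the calculation over a sliding window as new samples arrive.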

And it's doing all this in real time?
It is doing it on a real-time basis and modelling the contention for the resources.

The other thing we do is look at the I/O. Think about a storage array. Cisco tracks the average number of VMs per compute node, and I think they say it is seven and a half VMs per compute node.

Well, when you get down to the storage array, the array might be seeing 20 hosts. Multiply that by the number of VMs per host, and it is easy to see that the storage environment might be supporting 100 to 200 VMs. The contention for storage resources is even greater than for compute, and storage is not nearly as flexible as it might be.
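Here is a quick worked check of those figures. It assumes that every one of the 20 hosts carries the quoted average of 7.5 VMs:

```latex
N_{\text{VMs}} = N_{\text{hosts}} \times \frac{\text{VMs}}{\text{host}} = 20 \times 7.5 = 150
```

That result of 150 sits in the middle of the 100 to 200 range Vincent cites. Hosts carrying more or fewer VMs than the average would push the total toward either end of that range.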

So, in Load DynamiX, we offer the ability to monitor the effect of all of those I/Os while we keep tracking the storage system. We call that Workload Modelling and it is something that we are building into our systems.

But we think the big problem today is that the space is broken, and it is broken because it is basically legacy. I was talking with a customer a few months ago who is using dozens of components in their infrastructure. As is typical of so many systems today, each of those components offers a very narrow view of a single component.

To address that, we created a new visualization for our customers. When we think about our customer, we see a single person, or a small group, managing a massive data center. That's probably true of so many of our customers: a single user or small group working with tens of thousands of infrastructure services.

And these are multi-generational infrastructure services: storage systems, custom firewalls, network switches and hyperconverged systems.

And on top of that the managers have all of these pieces that came from different vendors. It's really a miracle that they are managing with all of that.

What's even more complicated is that that stuff is now delivering thousands, or tens of thousands of application services over which they have no control.

Some of our customers have application development teams that are larger than many software companies, and they are building thousands of applications. And those apps are changing constantly.

Now, in general, the technology is becoming very robust but there is still a feeling that the modern infrastructure is not capable of giving you a proper context.

If we look at infrastructure monitoring today, you have one solution that gives you visibility into the business here and another that gives you visibility there. You might have a few thousand tools across that estate, and every one of them gives you a partial view of the elephant, but none of them lets you see all of it.

Customers today use performance management tools to cover the current deficiencies of the application itself. They will buy partial solutions to deal with these day-to-day problems.

So our approach is something we call App-Centric IPM. We take care of both pieces -- the application and the infrastructure.

What's an example of that?
To take a common problem, let's look at the 'noisy neighbor'. Think about hundreds of thousands of apps running on the top layer. Among all those apps, there is one that is very important to you. That could be the app you run your business on, so it is vital.

To take a real-life example, one of our customers, a big telco, was launching a new iPad app. They were signing up a lot of customers who wanted to use it, so it was a busy time for them.

Now imagine you are one of those customers. You will be checking the application, checking the product history, the system will be signing you up, giving you a phone number in case your network didn't sign you up and then loading the applications.

Now somewhere along that service there was a database using a storage system at maximum load. It was running just fine until another, less important application started its backup process at an unexpected time. The storage system went down and took both applications down with it. That's what happens as a result of multi-tenancy in modern apps.

Now, traditional monitoring systems can't help. The problem occurs because large portions of the infrastructure don't know that the others exist. So when an app goes down, it becomes very hard to track down who, or what, is responsible.

As long as you are in silos, you cannot really get to causation, and that is exactly why our new system is app-centric.
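The telco incident suggests why a cross-silo view matters. Once you know which applications share a storage system, you can rank the co-tenants by how much their I/O rose during an incident. The Python sketch below illustrates that idea under stated assumptions. It is not Virtual Instruments' App-Centric IPM. The `placement` map, the per-interval IOPS histories and the `suspect_neighbors` name are all hypothetical.

```python
def suspect_neighbors(app, placement, iops_history, incident, baseline):
    """Rank co-tenants of `app` by how much their I/O rose during an incident.

    app: the application whose performance degraded.
    placement: {app_name: storage_system}, the cross-silo mapping.
    iops_history: {app_name: [iops per interval]}.
    incident, baseline: (start, end) interval index ranges, end exclusive.
    Returns co-tenant names whose mean IOPS increased, largest increase first.
    """
    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    storage = placement[app]
    suspects = []
    for other, system in placement.items():
        if other == app or system != storage:
            continue  # only apps sharing the same storage system can be noisy neighbors
        series = iops_history[other]
        before = mean(series[baseline[0]:baseline[1]])
        during = mean(series[incident[0]:incident[1]])
        suspects.append((during - before, other))
    return [name for delta, name in sorted(suspects, reverse=True) if delta > 0]
```

Comparing incident-window averages against a baseline is deliberately crude. It points at candidates such as an unexpected backup job. Establishing causation would still need end-to-end path data.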

The whole idea of the app-centric design is to allow us to anticipate what is happening.
