How to get the measure of the cloud

Imad Mouline, chief technology officer at web performance firm Gomez, talks about ways of ensuring cloud projects measure up

In February, a consortium of service providers, vendors, government bodies and consultants started work on a set of measurements designed to make it easier for businesses to compare the security features offered by cloud-computing providers.

As the Common Assurance Metric (CAM) project gets under way, ZDNet UK talks to Imad Mouline, chief technology officer at Gomez, a specialist web application and site performance firm, about the wider issues of cloud service comparison and measurement.

Q: Are you seeing many companies moving to the cloud?
A: We are seeing a lot of customers asking what their cloud strategy should be. Speaking with many chief information officers and chief technology officers, I know there seems to be some requirement in companies to come up with a cloud strategy — at the very least to be able to defend why that company is not moving to the cloud.

But what about organisations really moving to the cloud?
In terms of the people actually taking the plunge and doing something about it, they tend to be the smaller companies and start-ups that find the time-to-market capabilities of the cloud incredibly appealing. In larger organisations, there is certainly a lot of experimentation — whether it's in development, quality assurance or skunkworks projects.

But I am seeing far fewer mission- or even business-critical applications being deployed into the so-called public clouds. I don't actually know of any.

That perception obviously excludes some of the traditional software-as-a-service [SaaS] tools. Under some definitions, SaaS still counts as the cloud. But in terms of the more recent type of cloud offering — the infrastructure as a service, the platform as a service — I really couldn't point to anything mission-critical.

What are the main issues facing organisations in terms of measuring cloud performance?
There is this real lack of transparency about performance. Unfortunately, most of the cloud providers aren't helping because they don't necessarily come out with well-defined service level agreements [SLAs] — or their SLAs do not have real teeth. They are not punitive enough.

Why establish an SLA just to get a £1,000 credit at the end of the month? That is not the goal of an SLA. The goal is to make the service provider pay attention to the agreement and ensure the processes and redundancy infrastructure live up to the terms.

So what is the best approach for organisations considering a move to the cloud?
My best advice to people is this: before you choose a provider, sit down to write a strategy, or talk about pricing or performance, you need to understand why you might move to the cloud. There are good reasons and some not-so-good ones.

You might be a small start-up deciding that you do not have the funding [to do a project on premises], so why not go to a model where you can spend operating cash and quickly grow if you need to? That would be a fantastic reason.

Others might go to the cloud because they see that the real promise of the cloud is elasticity. Let's say I have a traffic pattern for my web application with a peak-to-trough ratio of 50 or 100, and the peak only happens 10 percent of the time. Why should I build capacity in-house? Again, that would be a fantastic reason for going to the cloud.
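The arithmetic behind that argument can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption that is not from the interview: load sits at one of two levels, the peak or the trough, and in-house capacity is sized for the peak. The function name is hypothetical.

```python
def peak_provisioned_utilisation(peak_to_trough: float, peak_fraction: float) -> float:
    """Average utilisation of capacity sized for the peak.

    Assumes a two-level load model: load equals the peak for
    `peak_fraction` of the time and the trough for the rest.
    """
    trough_load = 1.0                          # trough taken as one unit of load
    peak_load = peak_to_trough * trough_load   # capacity bought in-house
    average_load = (peak_fraction * peak_load
                    + (1 - peak_fraction) * trough_load)
    return average_load / peak_load


# The ratios and the 10 percent figure come from the example in the text.
for ratio in (50, 100):
    share = peak_provisioned_utilisation(ratio, 0.10)
    print(f"peak-to-trough {ratio}: about {share:.1%} of capacity used on average")
```

Under those assumptions, capacity built for the peak would sit idle for roughly nine-tenths of the time, which is the case for renting it on demand instead.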

Still others might be thinking of deploying an application on Google App Engine or Amazon EC2 just because, hey, Google and Amazon are probably the best-performing, most stable applications out there. That may not always be true, though, because there are additional services and pieces of infrastructure that Amazon and Google deploy that are not necessarily made available to their cloud offerings.

If you are interested in elasticity, there are two components: one is raw capacity. I need to be able to get 100 machines whenever I want them. No questions asked. The second component is the velocity: how quickly do you need those 100 machines and how much notice do you need to give the system? Do you need those 100 machines in 10 minutes or seven days?

So ask yourself what your goal is and write down your overall criteria, especially when it comes down to performance. Then test according to those performance issues. If you first define why — what's of interest to you — then you can determine which supplier is right for you by tying your selection criteria to those goals.

You need to be assured that when you go into production any time you want, day or night, at the notice you've given, you'll be able to get that capacity. But now multiply that capacity by all the consumers of the cloud and ask yourself whether they will all be able to get it.

Is there really that much spare capacity? Is the demand for the cloud so diverse that it really fulfils the overall promise that anybody can get as much capacity as they want any time they want it and it will all balance out?

So not everyone will be able to get the capacity they need from the cloud?
The cloud is the ultimate shared environment. So you should keep a closer eye than ever on your application or portion of your application that's running in it, because it is, by definition, a shared environment.

As much as virtualisation can supposedly give you a slice of the computing power and bandwidth, the cloud remains fundamentally shared. So at some point, someone else running in that big cloud environment might have an impact on your performance.

Consequently, you must monitor carefully and make sure your provider is delivering on your goals. Ultimately, not only is monitoring incredibly important, but setting SLAs based on those goals is also key.

Again, whether these SLAs can happen now with all providers or only a handful is open to question, but they are what it will take for more chief information officers to be comfortable moving applications to the public cloud.

If your main criterion is to be able to deliver your web application to end users anywhere in the world in 500ms for the home page, you need to be able to establish criteria from both a monitoring and an SLA perspective that will hold the provider accountable for that.
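A goal like that can be turned into a check that monitoring data either passes or fails. The Python sketch below is one possible shape for it, not a description of any vendor's tooling. The 500ms target comes from the example above; the function names, the 95 percent threshold and the sample timings are all made up for illustration.

```python
def compliance_by_location(samples_ms: dict[str, list[float]],
                           target_ms: float = 500.0) -> dict[str, float]:
    """Share of home-page loads at each location that met the target time."""
    return {
        location: sum(t <= target_ms for t in timings) / len(timings)
        for location, timings in samples_ms.items()
    }


def breaching_locations(samples_ms: dict[str, list[float]],
                        target_ms: float = 500.0,
                        required_share: float = 0.95) -> list[str]:
    """Locations where too few loads met the target (threshold is an assumption)."""
    shares = compliance_by_location(samples_ms, target_ms)
    return sorted(loc for loc, share in shares.items() if share < required_share)


# Hypothetical measurements taken from the edge of the network, in milliseconds.
samples = {
    "London": [320, 410, 480, 450],
    "Sydney": [510, 620, 470, 530],
}
print(compliance_by_location(samples))
print(breaching_locations(samples))
```

Writing the criterion down this precisely is what makes it usable in an SLA: both sides can run the same calculation over the same measurements.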

Are providers accommodating when it comes to these more stringent SLAs?
Not all of them. The cloud market has been filled with all kinds of innovation lately and is moving very fast. But what you are starting to see is some cloud providers coming out with a differentiated offering.

The whole idea of being able to provide cloud services for enterprises to deploy business- or mission-critical applications is finally starting to get some attention. So I think you'll see tiered offerings emerge.

Tiered offerings are priced according to the level of service you are expecting. So you might still see incredibly cheap cloud services, at, say, three cents per instance-hour, but with very loose SLAs.

But a guarantee of a month's credit if your cloud infrastructure and services fall below 99 percent does not do you much good if you're running a business-critical application and your reputation has been left in tatters.
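To put a figure on what a 99 percent threshold tolerates, the short Python calculation below converts an availability percentage into permitted downtime. It assumes the 99 percent refers to availability over a 30-day month, which the interview does not spell out; the function name and the stricter comparison levels are illustrative.

```python
def allowed_downtime_hours(availability_percent: float,
                           days_in_month: int = 30) -> float:
    """Hours of downtime per month that still satisfy the stated availability."""
    hours_in_month = days_in_month * 24
    return hours_in_month * (1 - availability_percent / 100)


# 99 percent is the figure from the text; the stricter levels are for comparison.
for level in (99.0, 99.9, 99.99):
    print(f"{level}% availability allows about "
          f"{allowed_downtime_hours(level):.2f} hours of downtime a month")
```

On those assumptions, a provider could be unavailable for more than seven hours in a month without breaching a 99 percent commitment.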

So, you might see at the high end much more stringent SLAs that are measured by a third party such as Gomez from the edge of the network, rather than from within the cloud itself or the provider's datacentres.

You mention measuring from the edge of the network for an accurate picture of cloud performance. What other techniques could you use?
First, why measure from the edge of the network? Because that is where your users are going to get the content. And because in the cloud, as opposed to traditional hosting environments, you are getting implicit as well as explicit services. The explicit benefit is elasticity. That's what's written down.

But you're also getting an implicit service, in that the datacentre will be connected to the internet and will have great peering relationships with the major ISPs around the world. You don't see that being written down, but it's an implicit service that you're buying.

In traditional hosting, you might demand an additional circuit or redundancy. But you're buying more or less a package from the cloud provider — an implicit service — and you have to make sure it is working well for you, not just the explicit one.

As for other techniques, you might want to benchmark how quickly you can ramp up and whether you can do that at any time. You should test regularly and not wait until you really need to ramp up.
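One way to keep such ramp-up tests honest is to record each trial and compare it with the goal set in advance, covering both capacity and velocity. The Python sketch below assumes a goal of 100 machines within 10 minutes, borrowed from the earlier example; the record fields, function names and trial results are hypothetical, and actually provisioning the machines is left to whatever tooling the provider offers.

```python
from dataclasses import dataclass


@dataclass
class RampTrial:
    hour_of_day: int          # when the request was made (0-23)
    delivered: int            # machines actually obtained
    minutes_to_ready: float   # time until the last machine was usable


def meets_goal(trial: RampTrial, needed: int = 100,
               within_minutes: float = 10.0) -> bool:
    """A trial passes only if both capacity and velocity goals were met."""
    return trial.delivered >= needed and trial.minutes_to_ready <= within_minutes


def failing_hours(trials: list[RampTrial]) -> list[int]:
    """Hours of the day at which at least one trial missed the goal."""
    return sorted({t.hour_of_day for t in trials if not meets_goal(t)})


# Made-up results from trials run at different times of day.
trials = [
    RampTrial(hour_of_day=3, delivered=100, minutes_to_ready=6.5),
    RampTrial(hour_of_day=14, delivered=100, minutes_to_ready=12.0),  # too slow
    RampTrial(hour_of_day=20, delivered=80, minutes_to_ready=9.0),    # too few
]
print(failing_hours(trials))
```

Running trials at different hours matters because the test is whether capacity is there at any time, not just when the shared environment happens to be quiet.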

If you're buying into platform as a service because of the underlying services, the storage APIs, the image transformation and user authentication, you should be testing those individual functions as part of your evaluation and monitoring.

Are there things cloud providers could do to make measurement easier?
There are really two things. The first is to do with portability, because that's always a concern in the background.

And then there are common metrics for performance. What does uptime mean? Does it simply mean I can ping your overall cloud or launch an instance or is it that I can successfully use the login API? What does that mean and in what way can it be measured? Common metrics would make it a lot easier to compare all these different providers and have transparent SLAs.
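The effect of those competing definitions can be shown with a small Python sketch. It assumes a monitor that runs three checks on every probe: a ping, an instance launch and a call to the login API. The class, field names and probe results are invented for illustration and do not describe any provider's actual reporting.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    ping_ok: bool     # the cloud endpoint answered a ping
    launch_ok: bool   # a new instance could be launched
    login_ok: bool    # the login API call succeeded


def uptime(probes: list[Probe], check: str) -> float:
    """Fraction of probes that passed the named check (ping, launch or login)."""
    passed = sum(getattr(p, f"{check}_ok") for p in probes)
    return passed / len(probes)


# Ten made-up probes: all answer a ping, nine can launch, eight can log in.
probes = (
    [Probe(True, True, True)] * 8
    + [Probe(True, True, False)]
    + [Probe(True, False, False)]
)

for check in ("ping", "launch", "login"):
    print(f"uptime by {check}: {uptime(probes, check):.0%}")
```

The same ten probes give three different uptime figures depending on the definition chosen, which is why two SLAs quoting the same percentage may not be comparable.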

Is it in providers' interests to have more transparency and common measures?
If providers want more enterprise-class applications to move to the cloud, then it is in their best interest to do so.

Until providers can offer SLAs with teeth, most of the applications deployed, certainly to the public cloud, will be sandbox-type applications — experiments, not mission-critical applications.