Is uptime the wrong metric for cloud service-level agreements?

Service level agreements, or SLAs, are the glue that holds cloud computing engagements together. SLAs were pretty straightforward in the days of traditional data center computing: guarantee me this much uptime from your system or application.

With cloud, things aren't so straightforward. ZapThink's Jason Bloomberg recently described the challenge of establishing SLAs for cloud services. First, because there are private, public and hybrid clouds, there are, accordingly, three contexts for cloud SLAs. For software vendors, SLAs are embodied in end-user license agreements (EULAs), which focus more on restricting what end users may do. Managed hosting providers talk about uptime (such as 99.9%), with prices adjusted for each "9." Then there are SLAs for internal services provided by IT, which vary widely depending on capabilities and the hardware used, and tend to be expressed as service credits.
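To make the "price per 9" idea concrete, here is a minimal sketch that converts an uptime percentage into the downtime it permits. It assumes a 30-day month and a 365-day year as measurement windows. Real SLAs define their own windows and exclusions, such as scheduled maintenance.

```python
# Minimal sketch: convert an uptime percentage into allowed downtime.
# Assumption: a 30-day month and a 365-day year as measurement windows;
# actual SLAs define their own windows and exclusions.

MINUTES_PER_MONTH = 30 * 24 * 60    # 43,200 minutes
MINUTES_PER_YEAR = 365 * 24 * 60    # 525,600 minutes


def allowed_downtime_minutes(uptime_percent: float, window_minutes: int) -> float:
    """Return the minutes of downtime permitted within the window."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return window_minutes * (100 - uptime_percent) / 100


if __name__ == "__main__":
    for uptime in (99.0, 99.9, 99.99, 99.999):
        monthly = allowed_downtime_minutes(uptime, MINUTES_PER_MONTH)
        yearly = allowed_downtime_minutes(uptime, MINUTES_PER_YEAR)
        print(f"{uptime}%: {monthly:.2f} min/month, {yearly:.1f} min/year")
```

Under these assumptions, 99.9% allows about 43 minutes of downtime a month. "Five nines" (99.999%) allows roughly 5.3 minutes a year, so each extra "9" cuts the permitted downtime tenfold.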

The problem, Jason says, is that these contexts tend to collide, creating all sorts of confusion among cloud service consumers.

Ultimately, Jason believes, none of the three SLA contexts discussed above is suitable for cloud engagements. Cloud SLAs should not be based on uptime guarantees:

"Elasticity is even more important than reliability. Remember, when working with the cloud you must plan for and expect failure; it is the cloud’s ability to automatically recover from such failures that compensates for the cloud’s underlying shortcomings. How fast your cloud can scale up, its ability to do so regardless of the demand, its ability to deprovision instances even more rapidly, and in particular its ability to recover automatically from failure, are the characteristics you’re really paying for."

In cloud engagements -- be they internal or external -- focus more on how well the cloud deals with unexpected events, Jason adds. "After all, these are the characteristics of the cloud that make it a cloud."
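Jason doesn't prescribe how to measure these characteristics. One hypothetical way an SLA could capture "ability to recover automatically from failure" is as a cap on time-to-recover per incident. The record format (failure detected, service restored), the function names, the 5-minute target and the 95% threshold below are all illustrative assumptions. None of them comes from any provider's actual SLA.

```python
# Hypothetical sketch: judge an SLA on recovery time rather than raw uptime.
# Assumption: each incident is a (failure_detected, service_restored) pair
# of datetimes; the target and threshold are illustrative only.
from datetime import datetime, timedelta


def recovery_times(incidents):
    """Return how long each incident took to recover."""
    return [restored - failed for failed, restored in incidents]


def recovery_sla_met(incidents, target: timedelta, required_fraction: float = 0.95) -> bool:
    """True if at least required_fraction of incidents recovered within target."""
    times = recovery_times(incidents)
    if not times:
        return True  # no failures in the measurement period
    within = sum(1 for t in times if t <= target)
    return within / len(times) >= required_fraction


if __name__ == "__main__":
    base = datetime(2011, 1, 1)  # arbitrary example timestamps
    incidents = [
        (base, base + timedelta(minutes=2)),
        (base + timedelta(days=3), base + timedelta(days=3, minutes=4)),
        (base + timedelta(days=9), base + timedelta(days=9, minutes=12)),
    ]
    times = recovery_times(incidents)
    mean = sum(times, timedelta()) / len(times)
    print("mean time to recover:", mean)
    print("SLA met:", recovery_sla_met(incidents, target=timedelta(minutes=5)))
```

Because recovery is judged per incident, one slow recovery (12 minutes here) causes the 95% target to be missed even though total downtime is only 18 minutes. An uptime percentage alone would not reveal that pattern.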

  • Why IT sucks...

    Non-IT people will read an article like this and see a focus on diminishing expectations rather than improving service. This paradigm isn't isolated to IT, but I haven't read any articles saying the airline industry should condition passengers to accept more lost bags rather than improve its processes so that fewer bags are lost. No, uptime is not the wrong metric for SLAs. It just shouldn't be the only metric. How the cloud deals with downtime is an additional metric to be used, not a replacement. Stop making us look bad.
    • Agreed

      While I do care about how the cloud service recovers from failure, I'm even more concerned with how much failure I should expect. Uptime should not go away just because you're calling your datacenter a cloud now. I still want my core services to be up 99.999% of the time.