Cloud outages: Why storage is the real villain

Amazon's EC2 suffered a four-day outage in April but the cloud itself is not at fault, says Lori MacVittie

Written by Lori MacVittie, Contributor, May 29, 2011 at 1:00 a.m. PT

Critics seize gleefully on every cloud outage as an indictment of the technology itself, but the true blame lies elsewhere, says Lori MacVittie.

Amazon's outage in April drew a lot of coverage — with most of it unfairly focused on cloud computing. Yet if the culprit wasn't the cloud for Amazon Web Services, what was to blame?

The detailed technical descriptions of what went wrong clearly indicate the source of the real issue. Unfortunately it's an issue that's rarely discussed, let alone in the context of cloud computing. So what is it?

As my storage-minded colleague Don MacVittie explained in a recent blog post, many people have had much to say about the Amazon outage, but what it really brings to the fore is a problem with storage, not a problem with cloud. Yes, we have a problem with storage, and cloud computing compounds it.

The issue of availability

While application and network virtualisation have enabled architectures designed for failure — that is, supportive of failover — storage virtualisation has not.

The underlying problem is that storage virtualisation is about aggregating resources to expand the capacity of the entire storage network, not individual files. Storage virtualisation controllers, unlike application delivery controllers, do not provide failover.

If a resource such as a file system becomes unavailable, it's unavailable. There's no backup, no secondary, no additional copies of that file system to which the storage virtualisation controller can redirect users. Storage virtualisation products just aren't designed with redundancy in mind, and redundancy is critical for enabling availability of resources.

Redundancy is critical, but it's not the only technological feature required. Interfaces to storage must also be normalised across redundant resources. A common interface allows transparent failover from one resource to another in the event of failure, making it possible to take advantage of redundancy.
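
To make the idea concrete, here is a minimal Python sketch of a normalised interface with failover behind it. Every name in it (`Store`, `InMemoryStore`, `FailoverStore`) is hypothetical and invented for illustration; it does not describe any real storage virtualisation product.

```python
from __future__ import annotations

from typing import Protocol


class Store(Protocol):
    """Hypothetical normalised interface shared by every storage resource."""

    def read(self, path: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for one storage resource; `available` simulates an outage."""

    def __init__(self, files: dict[str, bytes], available: bool = True) -> None:
        self.files = files
        self.available = available

    def read(self, path: str) -> bytes:
        if not self.available:
            raise ConnectionError("storage resource unavailable")
        return self.files[path]


class FailoverStore:
    """Presents one interface and redirects reads to the next redundant copy."""

    def __init__(self, replicas: list[Store]) -> None:
        self.replicas = replicas

    def read(self, path: str) -> bytes:
        last_error: Exception | None = None
        for replica in self.replicas:
            try:
                return replica.read(path)
            except ConnectionError as err:
                last_error = err  # this copy is down; try the next one
        raise ConnectionError("all replicas unavailable") from last_error
```

Because the client only ever calls `read` on the `FailoverStore`, it never needs to know which redundant resource answered. That transparency is only possible when every resource exposes the same interface.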

Multiple copies of the application

Applications, for example, especially in cloud-computing environments, generally take advantage of the ubiquitous nature of HTTP. Availability of applications is made possible by the existence of multiple copies of the application and because all clients are accessing the application via HTTP. If one instance fails, the same protocol can seamlessly interact with a secondary or tertiary copy of the application.

Storage virtualisation products have addressed the problem of normalised interfaces by acting as a go-between, a proxy, to provide a single interface to clients while managing the complexity of heterogeneous storage systems in the background.

But the protocols used to manage storage resources internal to the storage architecture are not always the ones used to access storage resources across physically disparate environments, such as between the datacentre and a cloud-computing environment.

Every provider presents its own service interface, requiring storage virtualisation products to use customised access methods to integrate such services into the enterprise architecture.

For intra-provider redundancy, this situation is not a problem, but for inter-provider redundancy, it becomes a very serious drawback, as generally only the most popular provider services are supported.
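
The cost of those customised access methods can be sketched in a few lines of Python. Both provider clients below are hypothetical, with made-up method names chosen only to show two incompatible service interfaces; the adapters are the per-provider glue a storage virtualisation product would have to write and maintain for each service it supports.

```python
class ProviderAClient:
    """Hypothetical provider SDK that addresses data by bucket and key."""

    def __init__(self):
        self._objects = {}

    def put_object(self, bucket, key, body):
        self._objects[(bucket, key)] = body

    def get_object(self, bucket, key):
        return self._objects[(bucket, key)]


class ProviderBClient:
    """Hypothetical provider SDK that addresses data by a single path."""

    def __init__(self):
        self._blobs = {}

    def upload(self, path, data):
        self._blobs[path] = data

    def download(self, path):
        return self._blobs[path]


class ProviderAAdapter:
    """Maps the common read/write interface onto provider A's calls."""

    def __init__(self, client, bucket):
        self.client = client
        self.bucket = bucket

    def write(self, name, data):
        self.client.put_object(self.bucket, name, data)

    def read(self, name):
        return self.client.get_object(self.bucket, name)


class ProviderBAdapter:
    """Maps the common read/write interface onto provider B's calls."""

    def __init__(self, client, container):
        self.client = client
        self.container = container

    def write(self, name, data):
        self.client.upload(f"{self.container}/{name}", data)

    def read(self, name):
        return self.client.download(f"{self.container}/{name}")
```

Every additional provider needs another adapter of this kind, which is why only the most popular services tend to get one. A standard service interface would make the adapters unnecessary.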

Redundancy and interfaces

What storage services need, particularly in cloud-computing environments, is the ability to provide for failover — whether across environments or internal to the environment. Storage virtualisation products must take the next step towards availability and, ultimately, true storage-as-a-service.

That step means making storage services available in a more standards-oriented way to enable inter-cloud and ultimately inter-architecture compatibility.

Normalised interfaces would make it possible for storage virtualisation systems typically deployed in large enterprises to take advantage of external storage without the complexity of custom integrations and without depending on vendor whim or on which providers happen to be popular.

But first, storage virtualisation products must implement failover capabilities. They must be able not only to create tiers of data across environments, as they do now, but also to replicate data and fail over from one environment to another to assure availability of those storage services, especially for mission-critical files.
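
The dependency between replication and failover can be shown with a small Python sketch. The `Site` class and both functions are hypothetical illustrations, not a description of how any product or provider works: a read can only fail over to a second environment if a copy was replicated there beforehand.

```python
class Site:
    """Hypothetical storage environment, such as a datacentre or a cloud."""

    def __init__(self, name):
        self.name = name
        self.files = {}
        self.online = True


def replicate(path, data, sites):
    """Write the file to every online site; return the names holding a copy."""
    stored = []
    for site in sites:
        if site.online:
            site.files[path] = data
            stored.append(site.name)
    return stored


def read_with_failover(path, sites):
    """Return (site name, data) from the first online site that has the file."""
    for site in sites:
        if site.online and path in site.files:
            return site.name, site.files[path]
    raise LookupError(f"no online copy of {path}")
```

If a file is written to only one site, `read_with_failover` has nowhere to go once that site drops offline, which is the single point of failure in miniature.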

Single point of failure

Storage virtualisation products need to support redundancy in a manner similar to the network and application redundancy that has enabled highly available architectures to date. Without the ability to support redundancy, and thus failover, storage-as-a-service remains a single point of failure that, as evidenced by the Amazon outage, can be disastrous.

Once redundancy and subsequently high availability are achieved internal to the storage architecture, then it becomes possible to include external storage-as-a-service as part of that architecture. It is at that point that standard interfaces become imperative to providing customers with the choice and flexibility of services.

But first and foremost, we have to recognise that the real issues with storage-as-a-service are caused by inadequacies in storage technology, not by the concept itself. We have to address those problems rather than lay responsibility at the feet of cloud computing.

Lori MacVittie is responsible for application services education and evangelism at application delivery firm F5 Networks. Her role includes producing technical materials and participating in community-based forums and industry standards organisations. MacVittie has extensive programming experience as an application architect, as well as in network and systems development and administration.
