Lightning takes down Amazon cloud

The latest disruption to a high-profile cloud-computing service follows outages from other providers, including Google and Salesforce.com
Written by Andrew Donoghue, Contributor

Online book seller and cloud-computing provider Amazon.com has blamed the latest outage to hit its Elastic Compute Cloud service on a lightning strike at one of its datacentres.

In a statement posted on the forums section of Amazon's web-services site, the online retailer addressed concerns from some US customers who said their use of the EC2 service had been disrupted at around 6:30pm US Pacific Time on Wednesday. "A lightning storm caused damage to a single Power Distribution Unit (PDU) in a single Availability Zone. While most instances were unaffected, a set of racks does not currently have power, so the instances on those racks are down," the company said.

The disruption lasted around four hours, during which time Amazon asked any affected customers to use alternative parts of the network. "Users with affected instances can launch replacement instances in any of the US Region Availability Zones or wait until their instance(s) are restored," Amazon said.

The company later said the problem had been confined to one "availability zone" and that the outage was localised. "We would like to reconfirm that this issue was limited to the single Availability Zone where this power issue occurred, and that a very small percentage of instances in that AZ were affected; this was not a generalised service issue," Amazon said.

Despite acknowledging that Amazon had dealt with the issue fairly efficiently, one user was concerned that a single lightning strike was able to bring down the service, if only in a limited way. "I was under the impression that your architecture had more resiliency built into it. Yes we can use multiple availability zones to help with a single point of failure, but I thought that even within a single availability zone there was not a single point of failure for hardware/power," the user posted to the Amazon forum.

The EC2 service provides customers with virtual access to Amazon's computing infrastructure, using virtual machines that can be created using the Xen virtualisation platform. First launched in a limited beta in August 2006, the EC2 service went fully live in October 2008.

Not including the latest issue, the service has suffered two major disruptions during that time, in October 2007 and February 2008. In June 2008, Amazon's main retail site suffered an outage, which the company blamed on the complexity of its own systems.

A series of outages at other online or cloud-computing services over recent months, including Google's Gmail and other applications, has led some critics to question whether the cloud approach to computing is really capable of providing the resilience required by enterprise users.

In mid-May, Google services were hit by an outage which apparently affected one in 10 of its users. In January this year, software-as-a-service (SaaS) pioneer Salesforce.com experienced an outage that disrupted all its customers for approximately an hour.
