Amazon hardware glitch disrupts EC2 and retail sites

Failures in an Irish datacentre, not attacks by web activists, caused outages in Amazon's retail sites and cloud services in Europe, according to the company.
Written by Tom Espiner, Contributor

Amazon suffered two hardware failures in its European network on Sunday, leading to widespread disruption to its business services and retail sites.

European Elastic Compute Cloud (EC2) and other cloud services were affected for up to two hours, Amazon said on Monday in an explanatory note on its service health dashboard.

The problem arose after a network device at Amazon's datacentre in Ireland failed, the company said. While services were being shifted to another device, that too gave out.

"The first device was nearly repaired but, unfortunately, was still unusable when the second device independently failed," said Amazon in a statement. "Because full redundancy had not been restored to this part of our network, this second failure resulted in an interruption of connectivity for the EU EC2 API servers."

Amazon Web Services's EC2 had increased latency and error rates, while Amazon business messaging customers had difficulty connecting to Simple Notification Service (SNS) and Simple Queue Service (SQS). Amazon's SimpleDB database was also affected by connectivity issues, according to the company.

In addition, Amazon retail sites in a number of countries were affected by the hardware failures on Sunday, according to security firm Netcraft.

"Amazon.co.uk, Amazon.de, Amazon.fr, Amazon.it and Amazon.at suffered approximately half an hour of downtime at around 21:15 GMT," Netcraft said in a blog post on Sunday, pointing out that the retail sites are hosted at the Irish datacentre.

Amazon stressed that web activists from the Anonymous group were not the cause of the outages. "The brief interruption to our European retail sites last night was due to hardware failure in our European datacentre network and not the result of a DDoS [distributed denial-of-service] attempt," the company said.

Anonymous abandoned a DDoS strike on Amazon on Thursday after saying in a post on Twitter that it did not have enough people to launch an attack. The attack had been planned in reprisal for Amazon ending its hosting of WikiLeaks servers.

Andy Buss, an analyst at Freeform Dynamics, said that the Amazon disruption shows that businesses should take outages in cloud services into account when assessing new services.

"Cloud technology is new, and should get better over time," Buss told ZDNet UK. "But there are failure points, and this is a risk. There are times when applications are not available, and this should be built into the SLA [service level agreement]."

Buss said that the risk of service outages could dissuade businesses from adopting cloud services, but pointed out that such outages also occur within organisations that use their own datacentres and software.
