
Lightning strike zaps EC2 Ireland

By Phil Wainewright | August 8, 2011, 12:47am PDT

Summary: A lightning strike last night knocked out servers at Amazon’s only European data center and the provider has warned some of those affected face delays of up to two days before they get back online.

Amazon has told its EC2 customers in Europe that some of them could face outages of 24 to 48 hours as the cloud provider struggles to recover from a lightning strike that disrupted power supplies to its Dublin, Ireland data center. It took three hours to recover the first of the affected instances yesterday evening European time (midday Pacific time), and after almost 12 hours a quarter still remained offline, with knock-on effects slowing their likely recovery. From Amazon’s status page (12:08am PDT update):

“Due to the scale of the power disruption, a large number of EBS servers lost power and require manual operations before volumes can be restored. Restoring these volumes requires that we make an extra copy of all data, which has consumed most spare capacity and slowed our recovery process. We’ve been able to restore EC2 instances without attached EBS volumes, as well as some EC2 instances with attached EBS volumes. We are in the process of installing additional capacity in order to support this process both by adding available capacity currently onsite and by moving capacity from other availability zones to the affected zone. While many volumes will be restored over the next several hours, we anticipate that it will take 24-48 hours until the process is completed. In some cases EC2 instances or EBS servers lost power before writes to their volumes were completely consistent. Because of this, in some cases we will provide customers with a recovery snapshot instead of restoring their volume so they can validate the health of their volumes before returning them to service. We will contact those customers with information about their recovery snapshot.”

The outage struck servers in one of three availability zones in the EU-WEST-1 region, but recovery efforts have had knock-on effects on capacity in the other two zones. Relational Database Service (RDS) is also badly affected. EU-WEST-1 is Amazon’s only region in Europe, which means that customers who have to keep their data within Europe for data protection compliance have no failover available to another Amazon location.
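To make the data-residency constraint concrete, here is a minimal sketch of how a customer might list failover targets. The zone names, the `Zone` type and the `failover_candidates` helper are all hypothetical and chosen only for illustration; the article does not say which zone failed, and this is not an AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    name: str      # hypothetical zone label, not taken from the article
    region: str
    healthy: bool


def failover_candidates(zones, failed_zone, allowed_regions):
    """Return healthy zones, other than the failed one, in regions where the data may live."""
    return [
        z for z in zones
        if z.healthy and z.name != failed_zone and z.region in allowed_regions
    ]


# Assumed layout: three zones in the single European region, plus one US zone.
zones = [
    Zone("eu-west-1a", "eu-west-1", healthy=False),
    Zone("eu-west-1b", "eu-west-1", healthy=True),
    Zone("eu-west-1c", "eu-west-1", healthy=True),
    Zone("us-east-1a", "us-east-1", healthy=True),
]

# A customer bound to Europe can only fail over to the two surviving local zones.
eu_only = failover_candidates(zones, "eu-west-1a", {"eu-west-1"})

# A customer without that constraint also has the other region available.
anywhere = failover_candidates(zones, "eu-west-1a", {"eu-west-1", "us-east-1"})
```

Under these assumptions the Europe-bound customer is left with the two remaining zones of the same region, which is exactly where the recovery effort is already consuming spare capacity.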

How the outage happened, from Amazon’s status page history:

“We understand at this point that a lightning strike hit a transformer from a utility provider to one of our Availability Zones in Dublin, sparking an explosion and fire. Normally, upon dropping the utility power provided by the transformer, electrical load would be seamlessly picked up by backup generators. The transient electric deviation caused by the explosion was large enough that it propagated to a portion of the phase control system that synchronizes the backup generator plant, disabling some of them. Power sources must be phase-synchronized before they can be brought online to load. Bringing these generators online required manual synchronization. We’ve now restored power to the Availability Zone and are bringing EC2 instances up. We’ll be carefully reviewing the isolation that exists between the control system and other components. The event began at 10:41 AM PDT with instances beginning to recover at 1:47 PM PDT.”
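The phase-synchronization requirement in Amazon's statement can be illustrated with a rough sketch. The function names and the tolerance values below are assumptions made for the example, not details of Amazon's control system; a real synchronizer would also compare voltage magnitude before connecting a generator.

```python
def phase_difference_deg(a_deg, b_deg):
    """Smallest signed angle from b to a, in degrees, in the range [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def can_close_breaker(gen_freq_hz, bus_freq_hz, gen_phase_deg, bus_phase_deg,
                      max_freq_diff_hz=0.1, max_phase_diff_deg=10.0):
    """Decide whether a generator is close enough to the bus to be connected.

    The default tolerances are illustrative assumptions only.
    """
    freq_ok = abs(gen_freq_hz - bus_freq_hz) <= max_freq_diff_hz
    phase_ok = abs(phase_difference_deg(gen_phase_deg, bus_phase_deg)) <= max_phase_diff_deg
    return freq_ok and phase_ok


# Nearly matched frequency, phases 5 degrees apart across the 0/360 wrap: acceptable.
in_sync = can_close_breaker(50.0, 50.02, 358.0, 3.0)

# Same frequency but 180 degrees out of phase: must not be connected.
out_of_phase = can_close_breaker(50.0, 50.0, 90.0, 270.0)
```

When the automatic system that performs this kind of check is disabled, as the statement describes, operators have to bring each generator into step by hand, which is why the switchover was not seamless.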

In what seems to be a typical pattern when Amazon experiences large-scale outages, its customers have been complaining of insufficient information coming out to help them recover. “With AWS it is more a process of figuring it out through trial and error with little or poor feedback from Amazon,” wrote one poster to a thread about the outage on its discussion boards. “I hope they get the remaining instances up but from their service dashboard it says 24-48 hours. This can totally ruin my company.”


Since 1998, Phil Wainewright has been a thought leader in cloud computing as a blogger, analyst and consultant.

Disclosure

Phil Wainewright

Phil Wainewright's work as an independent consultant brings him into direct or indirect business relationships with several of the companies that he writes about, or their competitors. Phil is committed to maintaining the independent and opinionated stance that his writings are well known for and does not enter into contracts that would limit his freedom of expression in any way. However it is important in the interests of full disclosure to inform readers of those relationships so they can form their own judgement.


Biography

Phil Wainewright

Since 1998, Phil Wainewright has been a thought leader in cloud computing as a blogger, analyst and consultant. He founded pioneering website ASPnews.com, and later Loosely Coupled, which covered enterprise adoption of web services and SOA. As CEO of strategic consulting group Procullux Ventures, he has developed an evaluation framework to help ISVs and enterprises select cloud platforms, and advises US and European vendors on messaging, positioning and go-to-market. His newest role as an industry advocate is vice-president of EuroCloud.

46 Comments


RE: Lightning strike zaps EC2 Ireland
cheaphostingreviews 25th Aug
Excellent info! I got good knowledge from this posting. Really a great pleasure to read this posting. I bookmarked this.
Sounds like you had a great time with great people. That's the way to live! Thanks for the info, too.
cheap hosting reviews
0 Votes
+ -
Second Strike?
johnfenjackson@... 8th Aug
Without thinking too deeply it occurs to me that:

1. "EBS servers lost power before writes to their volumes were completely consistent" so Amazon's power outage contingency and system design is worse than my amateur provisions.

2. Like many incumbent global technology corporations the constraints (differing national laws, privacy, need to keep shareholders happy and make a huge profit, stay with current architectures, complexity of virtualisation ...) are so onerous that the company is more or less painted into a corner.
I felt the same way about Windows Home Server. Instead of a trivial addition of iSCSI target to (say) Windows Ultimate and a simple control panel (say Storage Server Lite) ... M$ insisted on a new box and a new OS and a new file system which they botched twice.

3. One plan to cripple the West in 2020 would not be to destroy the Internet but to take out the cloud datacentres. Ironically the most likely sites to survive would be P2P trackers.

4. THE CLOUD appears to have all the complexity and stability of THE FINANCIAL SYSTEM :(

Perhaps when 'lightning' has struck from the clouds sufficiently many times ... people will come to realise that lightning only comes via the clouds!!
0 Votes
+ -
@johnfenjackson@...

And what does it have to do with Windows Home Server again?
@Knix96
"4. THE CLOUD appears to have all the complexity and stability of THE FINANCIAL SYSTEM"

Just who do you think this was designed by/for?
0 Votes
+ -
Unlike your usual BS posts
William Farrell 8th Aug
@johnfenjackson@...
I actually thought you were going to write an on-topic, thoughtful reply for a change.

Instead you completely botched the opportunity presented to you and (once again) used it as a way to take a dig at MS in some unconnected manner.
0 Votes
+ -
RE: Lightning strike zaps EC2 Ireland
BlueCollarCritic 8th Aug
@johnfenjackson@...

“4) THE CLOUD appears to have all the complexity of the FINANCIAL SYSTEM”

Actually if the cloud were like the financial system then it would

a) Change its available drive space (normally growing and rarely decreasing) not based on activity or need but the whims of some elite banksters in another country operating thru a front that we might call the FEDERAL CLOUD RESERVE

b) No matter how often the cloud were down it would always be reported in the media as being up and some would even report it as running better than normal

c) All in all the productivity of the cloud would go down as the costs to use it would go up unless you were part of the inside “silver lining” group in which case your cloud productivity would increase equal to the reduction of everyone else’s.
0 Votes
+ -
RE: Lightning strike zaps EC2 Ireland
VillarrealAndy 8th Aug
I just paid $22.87 for an iPad2-64GB and my girlfriend loves her Panasonic Lumix GF 1 Camera that we got for $38.76; they're arriving tomorrow by UPS. I will never pay such expensive retail prices in stores again. Especially when I also sold a 40 inch LED TV to my boss for $675 which only cost me $62.81 to buy. Here is the website we use to get it all from, BidsOut. com
0 Votes
+ -
@VillarrealAndy Oh, yes, yes, please - I also want the Mercedes for $5.20 and the blow up doll for $0.99. This message is illustrative of what is wrong with the Internet - For The Morons, By The Morons. Anybody who uses BidsOut deserves to be fleeced.
0 Votes
+ -
It's that newfangled lightning thingy. Like anything new, it will take a while to figure out what to do.
0 Votes
+ -
@DKFlorida lol - and as we now know, lightning never strikes twice, so it can safely be disregarded from any DR plans in the future.
0 Votes
+ -
@ejhonda

That reminds me of a 300-foot-deep water well we have in which the steel casing runs 111 feet to bedrock. The electrician bonded/grounded/earthed the casing as required per the NEC. The pump was submerged and rested at 280 feet. The pump and controls were hit SEVEN times by lightning. Upon the installation of the eighth pump and motor assembly I instructed the maintenance techs to cut the ground wire to the casing, as the electrician refused to do so, as he should have. The eighth pump has been in operation for 18 years now with zero problems.

Apparently, the well casing provided the low-impedance path for the power grid in that location. Perhaps it isn't safe in all respects, but it's far better than spending a week erecting a derrick and pulling it all out of the ground and replacing it. I might add the electrician was required to assist in this endeavor, so when he saw the 111 ft ground rod disconnected and no more burnouts his protests quickly diminished.

Don't get me wrong, grounding IS totally necessary and critical for safety. But when the utility company isn't doing its part, drastic measures must be taken.
0 Votes
+ -
Lightning? Really?
cabdriverjim 8th Aug
Didn't Ben Franklin solve this problem a few hundred years ago? It's really not that difficult to design a system to isolate lightning from the equipment. Yet I see time and time again people running data centers who fail to do so. (Sadly, myself included, no one listened to my warnings that we needed proper ground buses and mere weeks after it went online nearby lightning, not even a direct strike, knocked out our switches and welded all the ports on our $20,000 carrier grade router. sigh.)
0 Votes
+ -
@cabdriverjim

And, most likely, those fools are either, 1) no longer with the company, or 2) promoted to manglement.

Let me say this, for my current boss' predecessor; it was a career ending move when the owner learned that he was warned repeatedly about the inadequacy of the UPS capacity, and ignored the advice.

You do not take chances in the lightning capital of the US. One good strike cost him his job.
0 Votes
+ -
Redundancy, redundancy, redundancy. Should I say it again? It's foolish to rely on one medium for your backups. This is why I'm looking forward to using iCloud, since I will retain the data on my computers as well.
0 Votes
+ -
If this had been properly configured and distributed the Amazon users would never have noticed, as the data would have been distributed pan-globally in mirrored servers.

I wonder how many "clouds" don't have a lining at all!
0 Votes
+ -
@PC Ferret

Amazon users do have the option to set that up. The problem is that there is only one DC in Europe, and some users, by law, must keep their data in Europe.
0 Votes
+ -
What, no UPS? Most places have rack UPS to cover the switchover to the generators, as that usually takes a few seconds. Someone promised them an instant switchover?
0 Votes
+ -
@bobdavis321
Yes I agree, it's basic to have some form of UPS, even if it's to help clean up any noise on the mains supply.
0 Votes
+ -
If You read the article
sboverie 8th Aug
@bobdavis321
You would have read that the back up generator should have started and taken over for the loss of power, but the lightning strike caused more damage to the circuitry that switches power.
0 Votes
+ -
@bobdavis321 : Sounds like they did have rack UPS, but it took longer for generators to sync than allowed for. Meanwhile, the system kept on accepting new transactions to write to disk. If it had started turning them down when power was lost, and simply updated HD writes in progress to the disk, it wouldn't have lost data. (or had 'inconsistent' data)
0 Votes
+ -
RE: Lightning strike zaps EC2 Ireland
paddydefies@... Updated - 8th Aug
Even for customers that require a European regulatory compliance (to have data inside Europe), with a multi-AZ failover setup they could have avoided a complete outage. For others, a failover setup in a different region can address most of these situations in the future too. Rather than complaining too much about AWS, it would make better business sense to design your cloud strategy for failure.
http://www.cloud.8kmiles.com
0 Votes
+ -
And God sayeth...
Zorched 8th Aug
"Thou shalt not trust the cloud for thou art at the mercy of the whims of my hand."

...or something like that, if I was actually a religious person.
0 Votes
+ -
RE: Lightning strike zaps EC2 Ireland
james.vandamme 8th Aug
@Zorched "There is only darkness and distress;
even the sun will be darkened by clouds. "
Isaiah 5:30
FTFY.
0 Votes
+ -
I agree with cabdriverjim that the grounding was completely
screwed up. I have been in many telecom companies and you always need a common bus and to megohm your grounds
to prevent that sort of thing. Amazon's not too bright - things like this are just a fact of poor planning!!
0 Votes
+ -
What Amazon should be doing is what they do in many carrier datacenters. You see a storm coming, then you spin up your generators and sync phase, and run them idle, then your cutover is quicker. Sure it takes extra diesel fuel or natural gas to do this, but it's safer, and it's an effective method that many large DCs use that I've worked with in the past.
0 Votes
+ -
Homer says - D'oh.

Amazon's AWS is quickly becoming a joke.

Don't they have huge f-c-u-k off UPS/Generator backup capacity, for this exact scenario of screwed up local power ?

Guessing from the text, it is local power supplies disrupted by the lightning strike, as opposed to there being a black scorch-mark on the AWS datacentre.

I think they need to sit down and re-evaluate, as this is a second, very public, disaster in 6 months. Perhaps they need to have more, smaller datacentres; perhaps they could put a containerised one in each retail DC they have globally. Having only one AWS DC for the entire local European EC2 seems insanity, even if theoretically it can fail over to one in another continent.
0 Votes
+ -
I smell three things:
1. Absent or improper overvoltage protection for Amazon's power supply system. Didn't someone tell these guys to hire professionals?
2. Absent or insufficient UPS backup power. Like one of my teachers used to say, you can't use a $1000 backup on $30000 equipment (this is just an anecdote).
3. Sufficient backup measures are present and Amazon's just hiding something.
0 Votes
+ -
The good and bad of the 'cloud'
dbjohnso67 8th Aug
An enterprise-level infrastructure team worth their pay would have had D.R. plans at the ready should this scenario present itself and would have enacted them within hours. To all those who had the financial resources and talent and were still down for an extended amount of time: shame on you.

Those medium-sized companies just glad to be "on the cloud" and not need those 50 servers running in-house in their undersized, overheated "offices/closets" probably did not have the $$, resources or manpower to ensure a good D.R. strategy, let alone put it into action, and they end up being the big losers.

So I think the allure of the "cloud" to the medium guy is the problem zone. It can be very enticing because of the sense of safety/security they feel from using one of the “big guys” (Amazon, Google, Microsoft) for their infrastructure, fully knowing they could probably never achieve quite that S.L.A. on their own.
0 Votes
+ -
@Dean.Johnson@... This is exactly why I have kept all of my customers so far off of the cloud. Any sort of failure like this can ruin a small business and chances are there is something in the AWS TOS that accounts for this. A lot of times when talking about these products I have heard people say "We can just sue them if something goes wrong." I am sure that suing Amazon would take more time and resources than most anyone could afford. Something like this can take days to sort out while if you had your main servers onsite it should take a competent IT team a few hours to recover (unless your power company is lazy). I use the cloud for backup and for web sites but would never rely on it for a company's only copy of their data.
0 Votes
+ -
*sigh* so sad....been warning about this since I heard the term "cloud". At least we are now seeing proof that the cloud is NOT safe....trust yourself to handle your data and NOT someone else that is charging for the service and will cut you loose as soon as there is a problem.
0 Votes
+ -
@ColdFusion_z

Couldn't agree more. To add: It's a shame we're building a society of dependents and failing to instill self-sufficiency. All too often when things go wrong fingers point away from one's self and toward another.
0 Votes
+ -
@WayneC369

I keep a folder of stories like this one, to have on hand, when some dimwit MBA wanna-be starts spouting "the cloud". In a way, we have been down this road before.

My former boss ignored many warnings that the UPS just did not have the capacity to handle a complete power failure. One day, his (and our) luck ran out. We went down hard, and for a few days. The owner was pissed. After we got back up, the inquisition began. Fortunately for us peed-ons, we documented everything. The boot came swiftly, and without mercy (NO severance pay).
An enterprise-level infrastructure team worth their pay would have had D.R. plans at the ready should this scenario present itself and would have enacted them within hours. To all those who had the financial resources and talent and were still down for an extended amount of time: shame on you.

Those medium-sized companies just glad to be "on the cloud" and not need those 50 servers running in-house in their undersized, overheated "offices/closets" probably did not have the $$, resources or manpower to ensure a good D.R. strategy, let alone put it into action, and they are the big losers in this case.

So I think the allure of the "cloud" to the medium guy is the problem zone. It can be very enticing because of the false sense of TOTAL safety/security they feel from using one of the “big guys” (Amazon, Google, Microsoft) for their infrastructure, fully knowing they could probably never achieve quite that S.L.A. on their own.
A few more black eyes and the cloud's going the way of the dodo due to lack of trust.

Isn't the phase control system supposed to (inasmuch as possible) instantly switchover to backup generators, while separate UPS's regulate voltage and handle the load during the switchover? I'd really love to know which vendors supplied the equipment. (Phil, can you provide this information?) It sounds to me like the phase control system and UPS's have software/hardware bugs or the people responsible for designing the datacenter connected to the utility company in such a way that the electrical backup systems were left vulnerable.
0 Votes
+ -
How to set up a Cloud Data Center:

Step 1.) Throw up services as quickly as possible, without regard to redundancy and quality of service.

Step 2.) Discover that existing customers and applications do not allow backfill of redundancy and QoS without unacceptable downtime to implement.

Step 3.) Wait for the inevitable.
0 Votes
+ -
@kmarsh@...

... you forgot the Teflon coated terms of service agreement. Make sure you have that in place before you begin collecting $$$.
0 Votes
+ -
Step 1.) Throw up services and sign up customers as quickly as possible, without regard to redundancy and Quality of Service, planning to backfill later.

Step 2.) Discover that existing customers, applications and architecture do not allow backfilling redundancy and QoS without unacceptable downtime.

Step 3.) Wait for the inevitable.
0 Votes
+ -
The CLOUD push is just one more attempt at centralization of everything.

ONE WORLD CURRENCY - Centralization of monetary power
ONE WORLD GOVERNMENT - Centralization of political power
THE CLOUD - Centralization of information power

Centralization is only good for the few at the top just like a pyramid scheme.
0 Votes
+ -
Anyone who sets up a network-active-only system for their business deserves to go out of business. All my factory systems are network connected but are designed to run without a network. There's too much that can go wrong that is out of my control. At one company I worked at, our internet downtimes were mainly caused by backhoes or heavy rain.
0 Votes
+ -
RE: Lightning strike zaps EC2 Ireland
chris.lindley@... 8th Aug
...dear God, I need a sign. Will my business ever be safe from you? Please strike a Rackspace datacentre and show me if they are the chosen ones...
0 Votes
+ -
Cloud-to-cloud lightning is commonly known as sheet lightning! LOL! Who says cloud computing is really safe? That's why I like my data on the ground and protected with lightning rods. It means seldom, if ever, delays. Fortunately it's only two days.

http://en.wikipedia.org/wiki/Lightning
0 Votes
+ -
One more example of a "Failsafe" system. When, oh when, will our society ever learn that NOTHING works properly over the long haul and stop piling all their eggs into the newest shiny basket.....?

EAR
0 Votes
+ -
Did anyone get that lightning?
Dukhalion 9th Aug
Has it been questioned yet? What were its reasons for behaving like that, a bad childhood perhaps? Is there a mugshot of it, or did it get away before anybody had a chance to do anything? And don't you people feel any shame for giving all the blame to the lightning? It was just doing its job.
0 Votes
+ -
RE: Lightning strike zaps EC2 Ireland
tanhansing@... 9th Aug
These hackers are getting more and more resourceful. They could now mobilize mother nature to get the same job done!!! :)
0 Votes
+ -
There is NOTHING that beats your own data backups! Cloud or no cloud. This is a good lesson for the cheapskates. At least with your own backups you can still run. Then later update your cloud storage. Cheap cloud may make your numbers look good, but lost data makes those numbers worthless.
0 Votes
+ -
I’m no electrician and it certainly sounds serious, but I didn’t read anything about alternative electric grid access with the same or an alternative power supplier, so that if one transformer or the entire grid is lost you have access to a backup before needing to start a backup generator; a common practice with tier-3 data centers in the States. I’m not sure I understand why it takes 24 to 48 hours to restore instances, but it probably has something to do with something less than enterprise-class storage behind the servers.
0 Votes
+ -