Lightning strike zaps EC2 Ireland

Summary: A lightning strike last night knocked out servers at Amazon's only European data center and the provider has warned some of those affected face delays of up to two days before they get back online.

Amazon has told its EC2 customers in Europe some of them could face outages of as long as 24 to 48 hours as the cloud provider struggles to recover from a lightning strike that disrupted power supplies to its Dublin, Ireland data center. It took 3 hours to recover the first of the affected instances last evening European time (midday Pacific time) and after almost 12 hours a quarter still remained offline, with knock-on effects slowing their likely recovery time. From Amazon's status page (12:08am PDT update):

"Due to the scale of the power disruption, a large number of EBS servers lost power and require manual operations before volumes can be restored. Restoring these volumes requires that we make an extra copy of all data, which has consumed most spare capacity and slowed our recovery process. We've been able to restore EC2 instances without attached EBS volumes, as well as some EC2 instances with attached EBS volumes. We are in the process of installing additional capacity in order to support this process both by adding available capacity currently onsite and by moving capacity from other availability zones to the affected zone. While many volumes will be restored over the next several hours, we anticipate that it will take 24-48 hours until the process is completed. In some cases EC2 instances or EBS servers lost power before writes to their volumes were completely consistent. Because of this, in some cases we will provide customers with a recovery snapshot instead of restoring their volume so they can validate the health of their volumes before returning them to service. We will contact those customers with information about their recovery snapshot."

The outage struck servers in one of three availability zones in the EU-WEST-1 region, but recovery efforts have had knock-on effects to capacity in the other two zones. Relational Database Service (RDS) is also badly affected. EU-WEST-1 is Amazon's only data center in Europe, which means that customers who have to keep their data within the European region for data protection compliance have no available failover to another Amazon location.
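
The constraint described above can be sketched in a few lines of Python. This is only an illustration: the `Zone` class, the `jurisdiction` tag, the zone labels and the `failover_targets` helper are all hypothetical, and which zone was actually struck is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    name: str          # availability zone label (illustrative)
    region: str        # provider region, e.g. "eu-west-1"
    jurisdiction: str  # hypothetical data-residency tag, e.g. "EU"
    healthy: bool


def failover_targets(zones, home, allowed_jurisdictions, same_region):
    """Return healthy zones other than `home` that satisfy the residency rule.

    same_region=True keeps only zones in the home region;
    same_region=False keeps only zones in other regions.
    """
    return [
        z for z in zones
        if z.healthy
        and z.name != home.name
        and z.jurisdiction in allowed_jurisdictions
        and (z.region == home.region) == same_region
    ]


# Assumed layout: one struck zone, two surviving zones, one non-EU region.
home = Zone("eu-west-1a", "eu-west-1", "EU", healthy=False)
zones = [
    home,
    Zone("eu-west-1b", "eu-west-1", "EU", healthy=True),
    Zone("eu-west-1c", "eu-west-1", "EU", healthy=True),
    Zone("us-east-1a", "us-east-1", "US", healthy=True),
]

in_region = failover_targets(zones, home, {"EU"}, same_region=True)
cross_region = failover_targets(zones, home, {"EU"}, same_region=False)

print([z.name for z in in_region])     # surviving zones in the same region
print([z.name for z in cross_region])  # empty: no second EU region to fail over to
```

With an EU-only residency rule, the only candidates are the two surviving zones in the same region, which is exactly where recovery work was eating into capacity.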

How the outage happened, from Amazon's status page history:

"We understand at this point that a lightning strike hit a transformer from a utility provider to one of our Availability Zones in Dublin, sparking an explosion and fire. Normally, upon dropping the utility power provided by the transformer, electrical load would be seamlessly picked up by backup generators. The transient electric deviation caused by the explosion was large enough that it propagated to a portion of the phase control system that synchronizes the backup generator plant, disabling some of them. Power sources must be phase-synchronized before they can be brought online to load. Bringing these generators online required manual synchronization. We've now restored power to the Availability Zone and are bringing EC2 instances up. We'll be carefully reviewing the isolation that exists between the control system and other components. The event began at 10:41 AM PDT with instances beginning to recover at 1:47 PM PDT."
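
The requirement that power sources be phase-synchronized before taking load can be illustrated with a simple check. The sketch below is not Amazon's control system, which the status update does not describe; the `Source` class, the `in_sync` helper and the tolerance values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    voltage: float    # RMS volts
    frequency: float  # hertz
    phase_deg: float  # phase angle in degrees


def phase_difference_deg(a, b):
    """Signed difference a - b wrapped into the range [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def in_sync(gen, bus, max_phase_deg=10.0, max_freq_hz=0.2, max_volt_frac=0.05):
    """True when the generator is close enough to the bus to be connected.

    The default tolerances are placeholders, not values from any standard.
    """
    phase_ok = abs(phase_difference_deg(gen.phase_deg, bus.phase_deg)) <= max_phase_deg
    freq_ok = abs(gen.frequency - bus.frequency) <= max_freq_hz
    volt_ok = abs(gen.voltage - bus.voltage) <= max_volt_frac * bus.voltage
    return phase_ok and freq_ok and volt_ok


bus = Source(voltage=400.0, frequency=50.0, phase_deg=0.0)
aligned = Source(voltage=402.0, frequency=50.05, phase_deg=355.0)  # 5 degrees behind
shifted = Source(voltage=400.0, frequency=50.0, phase_deg=90.0)    # a quarter cycle off

print(in_sync(aligned, bus))
print(in_sync(shifted, bus))
```

A generator that fails a check like this has to be adjusted before it is connected, which is the step Amazon says had to be done by hand once the automatic phase control was disabled.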

In what seems to be a typical pattern when Amazon experiences large-scale outages, its customers have been complaining of insufficient information coming out to help them recover. "With AWS it is more a process of figuring it out through trial and error with little or poor feedback from Amazon," wrote one poster to a thread about the outage on its discussion boards. "I hope they get the remaining instances up but from their service dashboard it says 24-48 hours. This can totally ruin my company."

Topics: Amazon, Data Management, Storage

Phil Wainewright

About Phil Wainewright

Since 1998, Phil Wainewright has been a thought leader in cloud computing as a blogger, analyst and consultant.

Talkback

46 comments
  • Second Strike?

    Without thinking too deeply it occurs to me that:

    1. "EBS servers lost power before writes to their volumes were completely consistent" so Amazon's power outage contingency and system design is worse than my amateur provisions.

    2. Like many incumbent global technology corporations the constraints (differing national laws, privacy, need to keep shareholders happy and make a huge profit, stay with current architectures, complexity of virtualisation ...) are so onerous that the company is more or less painted into a corner.
    I felt the same way about Windows Home Server. Instead of a trivial addition of iSCSI target to (say) Windows Ultimate and a simple control panel (say Storage Server Lite) ... M$ insisted on a new box and a new OS and a new file system which they botched twice.

    3. One plan to cripple the West in 2020 would not be to destroy the Internet but to take out the cloud datacentres. Ironically the most likely sites to survive would be P2P trackers.

    4. THE CLOUD appears to have all the complexity and stability of THE FINANCIAL SYSTEM :-(

    Perhaps when 'lightning' has struck from the clouds sufficiently many times ... people will come to realise that lightning only comes via the clouds!!
    jacksonjohn
    • When you Said M$ you lost some respect!

      @johnfenjackson@...

      And what does it have to do with Windows Home Server again?
      Viper589
      • Just who do you think this was designed by/for?

        @Knix96
        "4. THE CLOUD appears to have all the complexity and stability of THE FINANCIAL SYSTEM"

        Just who do you think this was designed by/for?
        PC Ferret
    • Unlike your usual BS posts

      @johnfenjackson@...
      I actually thought you were going to write an on-topic, thoughtful reply for a change.

      Instead you just completely botched the opportunity presented to you and (once again) used it as a way to take a dig at MS in some unconnected manner.
      William Farrell
    • RE: Lightning strike zaps EC2 Ireland

      @johnfenjackson@...

      "4) THE CLOUD appears to have all the complexity of the FINANCIAL SYSTEM"

      Actually if the cloud were like the financial system then it would

      a) Change its available drive space (normally growing and rarely decreasing) not based on activity or need but the whims of some elite banksters in another country operating thru a front that we might call the FEDERAL CLOUD RESERVE

      b) No matter how often the cloud were down it would always be reported in the media as being up and some would even report it as running better than normal

      c) All in all the productivity of the cloud would go down as the costs to use it would go up unless you were part of the inside "silver lining" group in which case your cloud productivity would increase equal to the reduction of everyone else's.
      BlueCollarCritic
  • RE: Lightning strike zaps EC2 Ireland

    It's that new fangled lightning thingy. Like anything new, it will take awhile to figure out what to do.
    DKFlorida
    • RE: Lightning strike zaps EC2 Ireland

      @DKFlorida lol - and as we now know, lightning never strikes twice, so it can safely be disregarded from any DR plans in the future.
      ejhonda
      • RE: Lightning strike zaps EC2 Ireland

        @ejhonda

        That reminds me of a 300-foot-deep water well we have in which the steel casing runs 111 feet to bedrock. The electrician bonded/grounded/earthed the casing as required per the NEC. The pump was submerged and rested at 280 feet. The pump and controls were hit SEVEN times by lightning. Upon the installation of the eighth pump and motor ass'y I instructed the maintenance techs to cut the ground wire to the casing as the electrician refused to do so, as he should've. The eighth pump has been in operation for 18 years now with zero problems.

        Apparently, the well casing provided the low impedance path for the power grid in that location. Perhaps it isn't safe in all respects, but it's far better than spending a week erecting a derrick and pulling it all out of the ground and replacing it. I might add the electrician was required to assist in this endeavor, so when he saw the 111 ft ground rod disconnected and no more burn outs his protests quickly diminished.

        Don't get me wrong, grounding IS totally necessary and critical for safety. But, when the utility company isn't doing its part drastic measures must be taken.
        WayneC369
  • Lightning? Really?

    Didn't Ben Franklin solve this problem a few hundred years ago? It's really not that difficult to design a system to isolate lightning from the equipment. Yet I see time and time again people running data centers who fail to do so. (Sadly, myself included, no one listened to my warnings that we needed proper ground buses and mere weeks after it went online nearby lightning, not even a direct strike, knocked out our switches and welded all the ports on our $20,000 carrier grade router. sigh.)
    cabdriverjim
    • RE: no one listened to my warnings

      @cabdriverjim

      And, most likely, those fools are either, 1) no longer with the company, or 2) promoted to manglement.

      Let me say this, for my current boss' predecessor; it was a career ending move when the owner learned that he was warned repeatedly about the inadequacy of the UPS capacity, and ignored the advice.

      You do not take chances in the lightning capital of the US. One good strike cost him his job.
      fatman65535
  • RE: Lightning strike zaps EC2 Ireland

    Redundancy, redundancy, redundancy. Should I say it again? It's foolish to rely on one medium for your backups. This is why I'm looking forward to using iCloud, since I will retain the data on my computers as well.
    Mike Van Horn
  • Is The Cloud Really Ready for Primetime?

    If this had been properly configured and distributed, the Amazon users would never have noticed, as the data would have been distributed pan-globally in mirrored servers.

    I wonder how many "clouds" don't have a lining at all!
    PC Ferret
    • RE: Lightning strike zaps EC2 Ireland

      @PC Ferret

      Amazon users do have the option to set that up. The problem is that there is only one DC in Europe, and some users, by law, must keep their data in Europe.
      zoredache
  • RE: Lightning strike zaps EC2 Ireland

    What, No UPS? Most places have rack UPS to cover the switchover to the generators as that usually takes a few seconds? Someone promised them an instant switchover?
    bobdavis321
    • RE: Lightning strike zaps EC2 Ireland

      @bobdavis321
      Yes I agree, it's basic to have some form of UPS, even if it's to help clean up any noise on the mains supply.
      Dave51
    • If You read the article

      @bobdavis321
      You would have read that the back up generator should have started and taken over for the loss of power, but the lightning strike caused more damage to the circuitry that switches power.
      sboverie
    • RE: Lightning strike zaps EC2 Ireland

      @bobdavis321 : Sounds like they did have rack UPS, but it took longer for generators to sync than allowed for. Meanwhile, the system kept on accepting new transactions to write to disk. If it had started turning them down when power was lost, and simply updated HD writes in progress to the disk, it wouldn't have lost data. (or had 'inconsistent' data)
      meski.oz@...
  • RE: Lightning strike zaps EC2 Ireland

    Even customers that require European regulatory compliance (to have data inside Europe) could have avoided a complete outage with a multi-AZ failover setup. For others, a failover setup in a different region can address most of these situations in the future too. Rather than complaining too much about AWS, it would make better business sense to design your cloud strategy for failure.
    http://www.cloud.8kmiles.com
    paddydefies@...