How Amazon ruined my Christmas

Summary: Netflix's Christmas outage is yet another reminder that downtime happens at the worst possible time and the cloud is not inherently resilient.

TOPICS: Outage, Cloud

Here's how Amazon ruined my Christmas: After devouring a lovely rib roast with a porcini-spinach stuffing (recipe here in case your stomach is now growling), we all curled up on the couch with hot cocoa and turned on Netflix streaming to watch classic Christmas movies (and past Doctor Who Christmas Specials)... only to get an error message. That's right, in case you missed it, Netflix was down on Christmas Eve and Christmas Day in North America for many users due to issues with Amazon's Elastic Load Balancing (ELB) service in the US East region. Notably, this is at least the third time that issues with the ELB service have caused problems for Netflix; each time, the company has made improvements to prevent a recurrence.

You might be thinking that "ruin" is a strong word to describe what happened to me (and many others) on Christmas Eve, but I use it to illustrate a point: Even though this particular outage was probably not the most severe (in duration or number of customers impacted), it may well be the most costly for Netflix. Why? Because of TIMING. I've been saying for a while that the timing and duration of an outage are more meaningful indicators of availability performance and impact than counting "nines" (99.99%, 99.999%, etc.). If this same outage had occurred just a day or two earlier, the impact would have been significantly different. And unfortunately for Netflix, because of the timing, this is an outage that many customers will remember.
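To make the "nines" concrete, here is a rough sketch that converts an availability percentage into a yearly downtime budget. The function name and the 365-day year are assumptions made for illustration; nothing here comes from Netflix or Amazon.

```python
# Illustrative only: translate an availability target ("nines") into the
# number of minutes of downtime it permits per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% -> {allowed_downtime_minutes(pct):.1f} minutes/year")
```

The budget says how many minutes a service may be down, but nothing about when those minutes fall, which is exactly the gap that timing exposes.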

I write this not to be punitive towards Amazon or Netflix (or any of the other services that experienced downtime on the 24th/25th), but as a reminder/cautionary tale that:

  • Downtime will happen at the worst possible time. When designing continuity plans, it's prudent to hope for the best, but plan for the worst. Since the universe tends to be cruel and somewhat random, you may experience an outage at the worst possible time. Any calculations on the costs of downtime must account for this.
  • The cloud is not inherently resilient. Netflix is one of the most mature implementations of cloud resiliency that I have seen, and they still experience outages. You are responsible for resiliency of the applications you deploy in the cloud, not your cloud provider. If you architect your applications to be able to withstand the loss of systems or sites (Netflix, for example, uses chaos monkeys and gorillas for this), you will be much more able to withstand failures from the cloud provider.
  • Don't take away my Doctor Who Christmas specials. Seriously, don't do it.
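The first point above, that downtime cost calculations must account for timing, can be sketched as a simple multiplication. Everything in this example is hypothetical: the function name, the baseline cost per hour, and the demand multipliers are invented placeholders, not figures from Netflix or Amazon.

```python
# Illustrative only: the same outage duration costs more when it lands on a
# peak-demand day. All figures below are made-up placeholders.

def outage_cost(duration_hours: float,
                baseline_cost_per_hour: float,
                demand_multiplier: float) -> float:
    """Estimate outage cost, scaled by how busy the service is at that time."""
    return duration_hours * baseline_cost_per_hour * demand_multiplier


# Hypothetical inputs: a 2-hour outage at a baseline cost of 1,000 per hour.
ordinary_day = outage_cost(2, 1000.0, demand_multiplier=1.0)
peak_holiday = outage_cost(2, 1000.0, demand_multiplier=5.0)

print(f"Ordinary day: {ordinary_day:.0f}")
print(f"Peak holiday: {peak_holiday:.0f}")
```

Planning against the peak multiplier rather than the average is one way to "plan for the worst"; the hard part in practice is estimating the multiplier.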


Talkback

41 comments
  • Linux servers break down when it matters the most

    Anyone surprised?
    LBiege
    • of course IT issues happen

      So what is the point of this article? Isn't it obvious that these things happen? Maybe as a writer and someone not involved with IT this surprises her, but it is part of the IT world. Wake up. Didn't it occur to you there are many other ways of getting movies besides Netflix? Hello? Or is this just an article based on a dumb argument?
      FinFineman
      • So, is that your response

        When your power goes out?
        baggins_z
        • Errr....

          UPS and generator. Any business that relies heavily on their servers to work, must have them or don't open for business.
          Gisabun
    • Linux?

      You're right, if they had used Windows servers it could break down daily..... Having run lots of Windows and Linux servers, I'll take the Linux boxes any day. Not only are the Linux servers more stable, but the tools to diagnose issues when they do occur are far better.
      tarapup
      • Really?

        We have something like 40 Windows Servers at my site and they almost never go down. I'm not saying the Linux ones would either, but this isn't 1998 and Windows has gotten a lot more stable.
        slickjim
      • Errr....

        I managed various servers - Windows and Linux. The only time a Windows server went down was with a hardware problem - not an OS problem. Don't need to have third party diagnostics when modern servers include them.
        Gisabun
  • Doctor who

    I grabbed the Doctor Who Christmas special via BitTorrent. My Christmas was great.
    Some Internet Dude
  • What is wrong with you?

    This "story" is more befitting of an insipid personal blog than a technology web site. You have contributed absolutely nothing of worth or value to society, you selfishly ignorant waste of sentience.

    Par for the modern course.
    Theodore Juices
    • What is wrong with YOU?

      This "post" is more befitting of a monkey flinging its poo than any sort of insightful commentary. You've contributed absolutely nothing of worth or value to this talkback, much less society, you trolling douchebag.

      There, I fixed it for you troll.
      athynz
      • LOL

        I hardly ever agree with anything you write, Pete, but let me just say: "Bravo".
        Hallowed are the Ori
  • Amazon/Netflix is not a monopoly; many users have competing systems to keep

    ... their Christmas going. For example, there is Apple's iTunes ecosystem, Microsoft's, and Google's.
    DDERSSS
    • Yes, however

      One pays per month for Netflix service - as well as Hulu+ BTW - and some or most of those people do not use iTunes, Google Play, or other streaming services that do not use Amazon's ELB. In fact there are MORE people that use Netflix as opposed to iTunes.
      athynz
  • Prime

    What's even funnier is my Amazon Prime video worked just fine those days. Conspiracy theories would suggest it was not an accident. But then again, what company in their right mind would rely on a competitor to handle their business? Netflix, if this is not a wake-up call, I don't know what is.
    7_USA
    • To dingdong 7_USA,

      I'm reasonably certain that Amazon IT people didn't wake up Christmas morning and sigh, 'Gee, I'd like to make the cloud dysfunctional today, only for Netflix.' Especially since I'm sure any service contract provides for reimbursement when the system goes down.

      Granted, this article is stupid but your conclusion from it is even more daft. Prime costs Amazon more money than it makes from the annual subscription fee. I am a Prime customer. Netflix costs a bit more money per year, for less value added, but for people who don't want Prime (and many want both), it's a good fit, Netflix and Amazon. I'd be more interested as a Prime customer IN Netflix because it's part of the Amazon ecosystem. Amazon would know that, it's a logical conclusion.

      So if anything, Amazon would go out of its way to provide superior service TO Netflix.

      If you can't see that, then I suppose the Google Chrome and gmail outages are conspiracies in your mind, too. :)
      brainout
  • Welcome to the cloud :)

    My uptime is 99.999999999999999999; I've got Dr Who on DVD and two players.

    Why do you think it's called the Cloud? It's because it can rain on your parade whenever it wants to, and you can't do a thing about it.
    Anono Mouser
    • LOL

      "Why do you think it's called the Cloud? It's because it can rain on your parade whenever it wants to..."

      You win the internets, my friend.
      dsf3g
    • Oh yeah? Well I got...

      ...FINGER PUPPETS! I don't need to rely on that monopolistic "power company" for my Holiday Who Entertainment when I can re-enact it with time-tested finger puppets. Even better, I just have to change the felt outfit for the Dr. Who puppet and voila! Zombie Dick Clark just in time for New Year's Eve.
      jvitous
  • Disingenuous Author and Title

    [quote] You are responsible for resiliency of the applications you deploy in the cloud, not your cloud provider. If you architect your applications to be able to withstand the loss of systems or sites (Netflix, for example, uses chaos monkeys and gorillas for this), you will be much more able to withstand failures from the cloud provider.[/quote]

    Your words. If this is your belief, why are you slandering Amazon in your title when it should read How Netflix ruined my Christmas?
    Huckleseed
    • Excuse me, libel.

      We non-lawyer types easily mix the two.
      Huckleseed