A less than merry Christmas for Netflix

Summary: Yet another Amazon cloud failure; will anyone notice?

TOPICS: Cloud, Data Centers

If your Christmas plans involved a serious overdose of Christmas-themed movies via your Netflix subscription, I’m sure you were disappointed to settle for whatever movies were available on cable or over the air. Netflix experienced a back-end failure that shut the service down from Christmas Eve through Christmas Day.

Once again, the outage wasn’t specifically Netflix’s fault, other than in its choice of back-end service provider. Netflix is hosted on the Amazon cloud, and the outage was the result of problems with the Amazon service. Amazon reports that the problem was with its Amazon Web Services Elastic Load Balancing service in the US-East-1 region.

Netflix is one of the most visible customers of Amazon’s cloud back-end services, is likely the one that generates the most traffic, and has a very sophisticated content delivery network (CDN). In short, it’s a very high-profile customer, and this is the second time in the last six months that an Amazon failure has brought down Netflix service delivery. Netflix has done what it can to ameliorate the effects of back-end problems by deploying its own CDN servers at hub ISPs throughout its service area, but the Amazon cloud failures have still shut down its service for extended periods.

Though the timing makes it unlikely that business customers noticed, the scale of the failure, which brought down the Netflix service, needs to be taken into consideration as businesses begin to transition more real-time, 24/7 business applications to cloud back ends. The question remains whether these failures will be taken as a reflection on the cloud services industry as a whole or will be laid at the feet of problematic design issues within Amazon Web Services.


Talkback

49 comments
  • it failed because Netflix is proprietary

    and would not open source their code, so the community can fix it.
    LlNUX Geek
    • Wrong

      Anyone who's looked at Linux with unbiased eyes knows those clowns couldn't fix a piece of toast.
      JohnnyRenoNV
      • Well,

        It's not so much a matter of being unable to fix stuff, which they are quite adept at; it's getting them to do it. How many other Linux users here have tried to build a cross compiler by following the instructions, only to have it fail after an hour's wait, and then perhaps been ignored or shunned by GCC developers when they reported the problem?
        Subsentient
        • Re: Well

          The strength of open source is that able people can fix the code without waiting on others. We have fixed open source products a few times without waiting for our vendor. Anytime vendors are involved, you're collecting reports and going back and forth with them, wasting hours, days, and yes, sometimes weeks.

          With HP-UX one year, I was on a call for over 20 hours to repair a development box after a patch of theirs made the system unusable. The only reason I even called them was that I was told to by a "superior," as that was why we paid so much for support. Every time they had a shift change, we started over. When the "superior" came back into work the next morning, I got permission to end the call and restore service my way (less than 15 minutes). Needless to say, I ended up being their boss a few years later. Our dependency on software vendors in mission-critical spaces dropped substantially.

          Planning is everything. Customers don't care why a service is down. The service provider is responsible for their service in the eyes of the customer. A company pointing the finger at anyone else for their service being down is a sign of immaturity and customers see that.

          Netflix's service being down is their fault. They take money from their customers to provide a service; how they provide that service is their responsibility. The fact that Netflix did not plan, or did not make sure that their provider (Amazon) had an adequate plan, is Netflix's problem.
          sys_engineer
      • Re: those clowns couldn't fix a piece of toast.

        That "piece of toast" is running rings around Apple and Microsoft as they desperately flounder around, trying to hold on to their dwindling market share.
        ldo17
    • Yeah right

      And of course you would have been online on Christmas Eve to fix it. Give me a break from the open source "we can fix everything and make a better world" nonsense.
      gbouchard99@...
    • Got some facts to back up your assertions?

      Look kid,
      Have you bothered to do a scrap of analysis to prove your position and the huge problems it would face? Not only is the problem beyond the scope of open source, the people probably don't even have the tools or the willingness to fix the problem for free. If I had the skills to fix Netflix's problem, I would want to be paid for my work.
      Then there's the huge security risk, not only of theft but of attacks on other computers.
      Richardbz
    • no, the real problem is

      That the Amazon cheapskates have based their infrastructure on Linux, which may be OK for simple websites but which royally fails when used at cloud scale. Should have used a real operating system like Windows or Solaris.
      honeymonster
      • Sorry it can have viruses

        and zero day exploits galore?
        T1Oracle
      • Uhm...

        To each their own, but Solaris is an uncertain OS with a terrible kernel and Windows is as fit to run a server as a duck is to fly a 747.
        Subsentient
      • Please

        The Microsoft comment requires no response.

        Solaris. That is a dead O/S at this point for many reasons. I remember a bug back in the Sun days for Java on Solaris that nailed us. The funny thing is that the bug had been known about for a few years and fixed on Windows, and if memory serves, on Linux as well. But Solaris...no. The bug hit us in one of Sun's utilities that used Java to manage...get this...Solaris.
        Oracle has only made the situation worse. Believe it or not, they offer a class to their customers on how to better navigate Oracle's support. Ridiculous. If they have to offer a class on using their support system, they might want to rethink how their support actually works. Aside from Oracle's database, the Fortune 150 company that I work for has been moving away from Oracle's products. We do have a plan to move away from their database if their costs get out of line or the product is no longer viewed as helpful. We have already moved away from Data Guard for our RAC environments and replaced it with in-house code and disk cabinet based replication.
        sys_engineer
    • That's funny (not)

      I could have sworn that you're constantly harping about how Linux is the "only" server used by *all* corporations to run their network & Internet servers...which would, theoretically, include "cloud" servers.

      Not to mention that the problem was with *Amazon's* servers... you know, the company that uses the *Android* version of Linux for its tablets.
      spdragoo@...
      • Android

        To a person who wants their device tweakable, it is a nightmare. Its CLI tools are horrible and it doesn't even have an /etc/passwd file. The filesystem layout and configuration of an Android device is a literal abomination. It is NOT Linux. Just the kernel. And even that is far, far from vanilla or even distro-patched Linux. Oh, and I hate Java.
        Subsentient
    • Netflix isn't the problem. Amazon is.

      Reading the article may help a bit...
      notme403@...
    • Or break it as usual....

      And then have the "community" blame someone else for their failings.
      bin00010111
  • Amazon's streaming video worked fine...

    Suspicious since Netflix was hosted on Amazon's servers...
    Noah Bershatsky
    • My thought as well

      I was wondering the same thing. Why would Amazon, selling a product at almost no profit (AWS), mess up one of their major competitors on a night they knew people would want to use Netflix? On the same night many would be playing with their shiny new Kindles. All while keeping Amazon's competitive service running like a top.
      Bruizer
      • Amazon new, Netflix old.

        Heh... my wife and mom had great fun with their Kindles. I cancelled my Netflix account months ago when their software "upgrade" rendered it unusable to me.
        JohnnyRenoNV
        • Was the "upgrade" issue just on 1 device?

          Although they had an issue with the Wii app a few months ago, it was straightened out fairly quickly -- haven't had problems since then.

          Nor did it affect using their website, using the iOS app on an iPod, or using it on a Roku, etc.

          Maybe Amazon just didn't want a competitor's service to run well on their product...
          spdragoo@...
        • Then you're not very smart

          The next upgrade probably fixed it.
          T1Oracle