Google gets a cold and the world gets pneumonia

Summary: Well, maybe not pneumonia, but at least a nasty case of bronchitis.

TOPICS: Google

Gmail went down on Monday. Not for a particularly long time. 33 minutes from outage to complete resolution, in fact. Late risers on the west coast probably wouldn't even have known about it if not for panicking tech pundits from the east coast. To hear Wired talk about it, this portends the end of the world as we know it. OK, they weren't quite that over-the-top, but they, like many news outlets, had some very dramatic sound bites about the issue.

I'm not dismissing this outage, by the way. I live, eat, and breathe Google and the Gmail outage (caused by a bad update to their load-balancing software) had ripple effects across many related services (including the Chrome browser for users who, like me, choose to sync data across their various services). This isn't a small thing and, in fact, leads to the title of this article.

For Google, it was a hiccup. A bit of bad software rolls out, doesn't work, and gets rolled back. For the millions of people who rely on Google to get their jobs done, to enable important (and sometimes critical) business and personal communications, to write and calculate and advertise and sell, even a minor blip is cause for concern. As one analyst put it in the aforementioned Wired article,

“Imagine a scenario where you can’t even open your Android phone or you can’t get phone calls on Google Voice. It’s not just your browser.”

Given the market penetration of Android and projected domination of the mobile space, this sounds like a nightmare scenario. One wrong move from Google and all of our phones, tablets, Chromebooks, browsers, and communication tools go dead, assuming we've bought into the whole Google ecosystem (and many of us have). Doctors don't get urgent messages, stocks don't get traded, teenagers around the world stop texting for half an hour...you get the idea.

In reality, it's also a pretty damned unlikely scenario. For one thing, problems like those encountered Monday are rare. For another, Google's business model relies on the trust of its users, so Google has the ultimate vested interest in ensuring problems like these don't happen.

Let's also keep in mind that Google detected the problem via its own monitoring software within 21 minutes and took action 7 minutes later. Just a few minutes after that, the bad update was rolled back from its production servers. There aren't many IT departments that can claim that sort of response time for on-premises communication and collaboration software. All users had to do was tweet about the Gmail outage for half an hour and they were back up and running.

Yes, there are risks involved in putting all of your IT eggs in one basket, whether that basket is in Mountain View, Redmond, Seattle, or somewhere else. What's the alternative, though? Several disparate systems from several vendors, requiring either separate federation systems or countless user logins? Or expensive, highly redundant on-premises solutions? Even Microsoft and its partners are doing a healthy business selling hosted solutions because they generally save time and money.

Whether your system of choice comes from Google, Microsoft, Amazon, Apple, or sits in your own datacenter, someday it's going to go down. Service providers strive for "five nines" or 99.999% uptime. That's a great goal, but even that goal (a stretch for many) implies that some downtime is inevitable.
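
To put "five nines" in concrete terms, here is a rough back-of-the-envelope sketch in Python. The function names are made up for illustration and the 365-day year is a simplifying assumption; only the 99.999% target and the 33-minute outage figure come from this article.

```python
# Back-of-the-envelope availability arithmetic (illustrative only).
# Assumption: a 365-day year; leap years are ignored.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_budget_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year that an availability target allows."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def availability_percent(downtime_minutes: float) -> float:
    """Yearly availability implied by a given total of downtime minutes."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


# Downtime allowed per year by three, four, and five nines.
for target in (99.9, 99.99, 99.999):
    budget = downtime_budget_minutes(target)
    print(f"{target}% uptime allows about {budget:.1f} minutes of downtime per year")

# Availability for a year whose only downtime is one 33-minute outage.
print(f"One 33-minute outage in a year: {availability_percent(33):.4f}% uptime")
```

Under those assumptions, five nines leaves a little over five minutes of downtime a year, so a single 33-minute outage would, on its own, put a service somewhere between four and five nines for that year.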

Google's success means that even that tiny amount of downtime has wide-ranging, worldwide effects and commensurate headlines and Twitter outrage. However, it's important to keep this in perspective. When a plane crashes, it makes headlines for days. Hundreds of people might die at once. And yet 3000 people die every day worldwide in car accidents, very few of which we ever hear about. It's a matter of scale that makes front-page news.

Is the scale of Google or Amazon reason enough to avoid the cloud? Not at all. The conveniences and cost savings make occasional downtime an extremely reasonable risk for the majority of businesses and individuals. The key is managing panic when things do go wrong, as well as demanding that cloud providers (the big guns in particular) continue to innovate and offer better reliability at better prices than we can achieve ourselves.

Christopher Dawson

About Christopher Dawson

Chris Dawson is a freelance writer, consultant, and policy advocate with 20 years of experience in education, technology, and the intersection of the two.

Talkback

24 comments
  • you address some good points

    You addressed some very good points that I was thinking about yesterday when I read another article that pretty much just bashed Google for allowing this to happen. "How dare they!" It's technology, people. We're on the cutting edge every day in every way and we can't expect services like these that run for hundreds of millions of people worldwide to be there every second of every day. Things like this happen. I understand businesses have deadlines and such, but for the average user, there is no reason a 30-minute outage once a decade is going to ruin your life. If you were waiting until exactly 8:30am Dec 11, 2012 to send the most important email of your life and it wouldn't go through... well, maybe your plan should have included a backup text message.
    ukjb
  • Yes, but...

    You should be this understanding when it is RIM.
    Sure, their outage was longer but it was a hardware issue, not a bad software patch that could be rolled back.
    Susan Antony
  • Perspective

    Google recognizes an issue in 21 minutes, determines a fix in 7 minutes, and the problem is resolved 4 minutes later. The world is coming to an end.

    Office 365 goes down for between 6 and 9 1/2 hours, twice in one month, and there is hardly a murmur.

    Managing expectations can be a challenge.
    allenfalcon
    • Try something like

      Google recognizes an issue in 21 minutes, determines a fix in 7 minutes, and the problem is resolved 24 hours later.

      that's typical of Google. You can't point to one issue and ignore all the others while claiming "see how great they are". If what you described was typical of them, then yeah.

      But it's not
      William Farrel
  • Makes sense, but explain that to an executive

    that was sold on Google cockiness and "up time". Granted, being in IT, we know these things happen, but when you take away functionality to save $'s and it's down for 40 minutes, it's a dance to explain.
    ScanBack
    • Where an "executive" is someone who reached their level of incompetence

      Sure, staying employed in IT means giving a reacharound to "executives".

      But since they are now paying as little as $35K for overworked system administrators, largely so there is someone else in the organization to blame and threaten for their crappy decisions, can we stop pretending IT is a place where there are good jobs worth trying to keep? Because those days are gone.
      undertoad@...
      • People seem to happily choose such a level of pay

        Where's the problem?
        HypnoToad72
        • I might have forgotten to add the sarcasm tag...

          Unless I had forgotten...
          HypnoToad72
    • They're better off with Google than the alternatives

      Here in West Virginia, Suddenlink (a cable ISP) lost its email servers on a Friday morning, and it took until Monday before people STARTED getting their email back up. Then they had to restore backups so that email and contacts stored on their servers returned. For us (a nonprofit that uses Suddenlink as an ISP and email service), we STILL are having problems sending group emails. So while Google can't claim 100% up-time (show me who can), as the author points out, they still beat the hell out of everyone else. Even your own IT department.
      big red one
  • Google Down?

    Since I don't depend on Google very much myself, I didn't even know anything was wrong. I am more upset when my ISP Cox goes down. I am not a business, so can always find something else to do. Obviously a lot of people have more invested in Google than I do.
    rgeiken@...
    • Or what they believed they would get

      This is what people want, so hopefully they won't argue.
      HypnoToad72
  • Remember it's free

    Since it's a non-enterprise, zero-$ resource, anyone relying on a free service with no SLAs is using it at their own risk. Perhaps if you are relying on this for business you should invest in proper infrastructure (Exchange, Domino, etc.). Otherwise you might just as well claim you base your business connectivity on Hotmail too. Sounds a little stupid to me.
    Kingotch
    • No its not free, the price is scanning through everything you write and

      selling that info to whoever will pay.
      Johnny Vegas
      • the conspiracy theory mockery is starting to get old.

        People like you like to paint a picture of a nefarious weasel of a big brother reading everyone's secrets and selling them to the highest bidder like a snotty little boy reading his sister's diary at recess.
        NO! Get your head out of the sand!
        They have automated computers to sift through your stuff picking up on keywords to sell you stuff... And you know what? That is a completely reasonable trade-off for a free email and web search service that happens to be better than all the rest IMO, not to mention all the rest of their free services.
        ukjb
    • Most businesses using Gmail...

      ... probably aren't using the free version.
      bhartman36
  • Summer Wars

    Kind of reminds me of an SF anime set in a world where everything is tied into a cloud system called OZ. All is well until a nasty virus takes control.

    The issue is allowing Google to put all IT needs into a single basket; that's the problem with centralization. I keep a lot of redundancy and decentralize my needs. For example, my domain name is with a separate company from the one I use for web hosting. I will be getting a separate email service for the domain registrant address and to serve as a backup if I suddenly need to transfer my email accounts to a different company.
    Richardbz
  • Bah

    You're exactly right in describing the whole thing as a hiccup.

    For the vast majority of users, it meant a whole half-hour of unaccustomed peace and quiet they had to find a way to fill or a minor inconvenience to be ridden out. The "lost productivity" people always scream about was in large part avoided by a little creative reshuffling of schedules and tasks. Some sales were put off a little while or went to a competitor who happened to be using a different provider, which will no doubt go down one of these days and send a few sales back. And the world went on. (Anyone relying on Google--or any other single service provider, without any fallback plan for outages--for true life-or-death matters needs to have their head examined in the first place.)

    How the hell did some of these people manage--or how would they have managed, if they're too young to remember--before there was an internet and cell phones? The world doesn't stop turning because of a few missed messages.
    Ginevra
  • Real service companies do rollouts of new service software like this to a

    fraction of their servers, wait and monitor their behavior for long enough to be certain they're functioning properly, then if yes rolling it out to an additional fraction, etc. until it's done. Google should have seen this well before it was on enough servers to impact anyone. It should have been discovered by the time it had reached around 20% of their servers and then those should have been removed from rotation and rolled back while the 80% still online with the previous version still have plenty of headroom to handle normal daily traffic with no impact.
    Johnny Vegas
  • No system is foolproof.

    Murphy's law. Sooner or later, anything that can go wrong will. You can never eliminate all risks. You just have to decide how much risk-reduction you're willing to pay for. Google seems to me to have gotten the risk of down time about as low as anybody's going to get it. Life is a terminal illness. If you can't afford down time, pay for a backup system. Otherwise, when the system is down, relax and read a book.
    daniel1948x
  • Remember "America Offline"?

    AOL went down for a day and the business world went haywire.

    In this case, I'm pretty sure that at the very least, Google will fix Chrome so that it doesn't crash if Google services go down again.
    John L. Ries