Google: 'At scale, everything breaks'

Summary: Distinguished Google Fellow Urs Hölzle discusses the challenges of scaling Google's infrastructure, coping with cascading failures and the rise of flash storage in the modern datacentre.

[…] your own solution. Maybe you can do your own solution, but you [might not be able to] justify the software engineering effort and then the ongoing maintenance, instead of staying within an open-source system.

In your role, what is the most captivating technical problem you deal with?
I think the big challenges haven't changed that much. I'd say that it's dealing with failure, because at scale everything breaks no matter what you do and you have to deal reasonably cleanly with that and try to hide it from the people actually using your system.

There are two big reasons why MapReduce and Hadoop are really popular. One is that they get rid of your parallelisation problem, because they parallelise automatically and do load-balancing automatically across machines. The second is that if you have a large computation, they deal with failures. So if one of [your] machines dies in the middle of a 10-hour computation, you're fine; the recovery just happens.
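
To make the failure-handling point concrete, here is a minimal Python sketch of MapReduce-style task re-execution. It is not Hadoop's or Google's implementation: `run_with_retries`, `word_count_task` and the simulated `failure_rate` are hypothetical names for illustration. Because each map task is a pure function of its immutable input split, a task whose "machine" fails can simply be run again, and the final result is unaffected.

```python
import random

def word_count_task(split):
    """Map step: count words in one input split. Pure function, no side effects."""
    counts = {}
    for word in split.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def run_with_retries(splits, failure_rate=0.3, max_attempts=10, seed=0):
    """Run one map task per split, re-running any task whose simulated machine dies.

    failure_rate and max_attempts are illustrative knobs, not real Hadoop settings.
    """
    rng = random.Random(seed)
    results = {}
    attempts = 0
    for i, split in enumerate(splits):
        for _ in range(max_attempts):
            attempts += 1
            if rng.random() < failure_rate:
                # Simulated machine failure: discard partial work and retry.
                continue
            results[i] = word_count_task(split)
            break
        else:
            raise RuntimeError(f"task {i} failed {max_attempts} times")
    # Reduce step: merge the per-split counts into one total.
    total = {}
    for counts in results.values():
        for word, n in counts.items():
            total[word] = total.get(word, 0) + n
    return total, attempts
```

With a seeded random generator the run is deterministic; raising `failure_rate` increases the number of attempts but does not change the answer.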

I think the second challenge is dealing with stateful, mutable data. MapReduce is easy because you present it with a number of files, it computes over them and, if things go wrong, you can just run it again. But Gmail, IM and other stateful services have very different security, [uptime and data-loss] implications.
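
The contrast between re-running a batch job and retrying a stateful operation can be sketched as follows. This is a toy illustration under stated assumptions, not how Gmail works: `Mailbox`, `deliver_naive` and `deliver_idempotent` are hypothetical. Recomputing from immutable input always gives the same answer, whereas blindly retrying a mutation can duplicate state; tagging each operation with a unique id is one common way to make such retries safe.

```python
def recompute(inputs):
    """Batch-style job: the output depends only on immutable inputs, so re-running is safe."""
    return sorted(inputs)

class Mailbox:
    """Hypothetical stateful service: each delivery mutates stored state."""

    def __init__(self):
        self.messages = []
        self._seen_ids = set()

    def deliver_naive(self, msg_id, body):
        # Not idempotent: retrying after an ambiguous failure stores a duplicate.
        self.messages.append((msg_id, body))

    def deliver_idempotent(self, msg_id, body):
        # Idempotent: a retried delivery with an already-seen id is a no-op.
        if msg_id in self._seen_ids:
            return
        self._seen_ids.add(msg_id)
        self.messages.append((msg_id, body))
```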

We still use tapes, even in this age, because they're actually a very cost-effective last resort for Gmail. The reason we put them in is not physical data loss; it's that once in a blue moon you will have a bug that destroys all copies of the online data, and your only protection is to have something that is not connected to the same software system, so you can go back and restore from it.
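
A minimal sketch of why a copy outside the online system helps: replication faithfully applies a buggy write to every replica, while an earlier snapshot held elsewhere (standing in here for tape) is untouched. `ReplicatedStore`, `buggy_cleanup` and the in-memory snapshot are hypothetical stand-ins, not a description of Google's backup pipeline.

```python
import copy

class ReplicatedStore:
    """Hypothetical store that applies every operation to all online replicas."""

    def __init__(self, n_replicas):
        self.replicas = [dict() for _ in range(n_replicas)]

    def apply(self, op):
        # The same operation, including a buggy one, reaches every replica.
        for replica in self.replicas:
            op(replica)

    def snapshot(self):
        # Deep copy held outside the replication path (a stand-in for tape).
        return copy.deepcopy(self.replicas[0])

    def restore(self, snapshot):
        # Rebuild every replica from the offline copy.
        self.replicas = [copy.deepcopy(snapshot) for _ in self.replicas]

def buggy_cleanup(replica):
    # Meant to delete only expired items; because of a bug it deletes everything.
    replica.clear()
```

Usage: write some data, take a snapshot, apply `buggy_cleanup`, observe that every replica is empty, then call `restore` with the snapshot.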

The last challenge we're seeing is to use commodity hardware but actually make it work in the face of rapid innovation cycles. For example, take a new HDD (hard-disk drive). There's a lot of market pressure to get it out because you want to be the first one with a 3TB drive, and there's a lot of cost pressure too. But how do you actually make these drives reliable?

As a large-scale user, we see all the corner cases, and we find bugs in virtually every piece of hardware we use, even hardware that is already shipping.

If you use the same operating system, like Linux, run the same computation on 10,000 machines and see 100 of them fail every day, you're going to say, "Wow, this is wrong." But if you ran it on a single machine yourself, that's a one-percent failure rate per day, so roughly three times a year you'd have to replace your server. You probably wouldn't make the effort to debug it; you'd think it was a random fluke, or you'd start debugging and find it wasn't happening any more.
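
The arithmetic behind the 10,000-machine example, using only the figures quoted above and assuming failures are spread evenly across machines and days: 100 failures a day across 10,000 machines is a 1% daily failure rate per machine, which over 365 days works out to about 3.65 failures per machine per year, in line with the rough "three times a year".

```python
# Figures from the interview; even spread of failures is an assumption.
fleet_size = 10_000
failures_per_day = 100

# Per-machine probability of failing on a given day.
daily_failure_rate = failures_per_day / fleet_size

# Expected failures for one machine over a year.
failures_per_machine_per_year = daily_failure_rate * 365

print(f"per-machine daily failure rate: {daily_failure_rate:.0%}")
print(f"expected failures per machine per year: {failures_per_machine_per_year:.2f}")
```

The same rate that looks like an obvious problem at fleet scale (100 failures every day) looks like an occasional fluke on a single machine.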

It seems you want all your services to speak to each other. But surely this introduces its own problems of complexity?
Automation is key, but it's also dangerous. If you have a bug, automation can shut down all your machines automatically. That's what makes it so challenging: you want uniformity and automation, but at the same time you can't really automate everything without lots of safeguards, or you get into cascading failures.
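
One kind of safeguard against runaway automation can be sketched as a cap on how much of the fleet a single automated action may take down. The function `drain_machines`, the exception `SafeguardTripped` and the 5% `max_fraction` threshold are all hypothetical, chosen for illustration rather than taken from Google's tooling.

```python
class SafeguardTripped(Exception):
    """Raised when an automated action exceeds its allowed blast radius."""

def drain_machines(candidates, fleet_size, max_fraction=0.05):
    """Return the machines approved for shutdown, refusing runaway requests.

    candidates is a list of machine names; max_fraction is an illustrative limit.
    """
    limit = int(fleet_size * max_fraction)
    if len(candidates) > limit:
        # A buggy health check that flags most of the fleet stops here
        # and waits for a human, instead of cascading into an outage.
        raise SafeguardTripped(
            f"refusing to drain {len(candidates)} machines; limit is {limit}"
        )
    return list(candidates)
```

The design choice is to fail loudly and stop: a small, plausible request goes through automatically, while an implausibly large one is treated as a likely bug.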

Complexity is evil in the grand scheme of things because it lets bugs lurk that you see only once every two or three years, but when you do see them it's a big story because they have a large, cascading effect.

Keeping things simple and yet scalable is actually the biggest challenge. It's really, really hard. Most things don't work that well at scale, so you need to introduce some complexity, but you have to keep it down.

Have you looked into some of the emerging hardware, such as PCIe-linked flash?
We're not a priori excluding anything and we're playing with things all the time. I would expect PCIe flash to become a commodity because it's a pretty good way of exposing flash to your operating system. But flash is still tricky because the durability is not very good.

I think these all have the promise of deeply affecting how applications are written, because if you can afford to put most of your data on a storage medium like this rather than on a disk with a moving head, then a factor of a thousand [in access time] makes a huge difference. But these technologies are not yet close enough to disk in terms of storage cost.
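
To see why a factor of a thousand matters, here is a back-of-the-envelope comparison of serial random reads. The latencies are assumed round numbers (about 10 ms per random read on a disk with a moving head and about 10 µs on flash), picked to match the factor quoted above; real devices vary, and concurrent I/O would change the absolute times.

```python
# Assumed, illustrative latencies (round numbers, not measurements).
DISK_RANDOM_READ_S = 10e-3    # ~10 ms: seek plus rotational delay
FLASH_RANDOM_READ_S = 10e-6   # ~10 microseconds

def time_for_random_reads(n_reads, latency_s):
    """Serial time for n independent random reads at a fixed per-read latency."""
    return n_reads * latency_s

n = 1_000_000
disk_s = time_for_random_reads(n, DISK_RANDOM_READ_S)
flash_s = time_for_random_reads(n, FLASH_RANDOM_READ_S)

print(f"disk:  {disk_s / 3600:.1f} hours")
print(f"flash: {flash_s:.0f} seconds")
print(f"speed-up: {disk_s / flash_s:.0f}x")
```

Under these assumptions, a workload that takes hours of random reads on disk finishes in seconds on flash, which is the kind of gap that changes how an application would be designed.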

Jack Clark has spent the past three years writing about the technical and economic principles that are driving the shift to cloud computing. He's visited data centers on two continents, quizzed senior engineers from Google, Intel and Facebook on the technologies they work on and read more technical papers than you care to name on topics f...
