Forty-six percent of Google's SRE principles apply directly to your enterprise -- what about the rest?

Here's what you can learn from Google's Site Reliability Engineering book.
Written by Forrester Research, Contributor

When Google published its Site Reliability Engineering (SRE) book -- a detailed look at how it keeps production systems running -- Forrester started getting a lot of questions. "Should I do this in my enterprise IT shop?" "I'm no unicorn -- can I even do these things?" And perhaps most important: "What parts of the book are relevant?"

To answer these questions, we broke SRE down into 24 principles spread across six categories: service delivery, feature velocity, automation, monitoring, reliability, and architecture. We then spoke with clients implementing SRE about their objectives, successes, and setbacks. We also talked with vendors guiding customers' implementations -- including Google, to get its take.

What we found is that you can apply most of Google's advice -- with some tweaking. To sum up the findings:

  • Forty-six percent of the principles in the book work out of the box -- they're sound advice for any IT organization. This includes creating SLOs (service level objectives) that augment SLAs (service level agreements), implementing error budgets, and monitoring the four "golden signals" (latency, traffic, errors, and saturation). Do these today. Your customers will thank you.
  • Fifty percent of the principles are good advice -- but you'll need to tweak them for your enterprise. This includes balancing tickets between operations and development, writing your own APIs to automate processes, and bringing down production systems to test resiliency. None of this is bad advice per se, but your mileage may vary if you don't adapt it to your enterprise.
  • A small share -- 4 percent -- covers advice you should not execute. This mostly has to do with load balancing. Google's approach is not invalid, but Google faces geographical architecture challenges that your enterprise probably does not.
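For readers new to the term, an error budget is simply the amount of failure an SLO leaves room for. Here is a minimal sketch of the arithmetic, assuming a request-based availability SLO measured over a fixed window. The function names and the sample numbers are illustrative only; they do not come from Google's book or from the Forrester report.

```python
def error_budget(slo_target: float, total_requests: int) -> int:
    """Failed requests the SLO tolerates over the measurement window.

    slo_target is a fraction, e.g. 0.999 for "99.9 percent of requests succeed".
    """
    return round((1 - slo_target) * total_requests)


def budget_remaining(slo_target: float, total_requests: int,
                     failed_requests: int) -> float:
    """Fraction of the error budget still unspent.

    1.0 means nothing has been spent; a negative value means the SLO
    has been breached for this window.
    """
    budget = (1 - slo_target) * total_requests
    if budget == 0:
        # A 100 percent target leaves no budget to spend.
        return 0.0
    return 1 - failed_requests / budget


if __name__ == "__main__":
    # Hypothetical service: 99.9 percent SLO, one million requests in the window.
    print(error_budget(0.999, 1_000_000))
    print(budget_remaining(0.999, 1_000_000, 250))
```

With a 99.9 percent target and one million requests, the budget works out to 1,000 failed requests. A team that has used 250 of them still has about three quarters of its budget left to spend on releases and experiments.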

In the end, most of the concepts can be applied with some tweaking. Focus on the service delivery, feature velocity, and automation concepts in the book. Focus less on the architecture sections, as Google's challenges likely don't mirror your own.

--By Chris Gardner, senior analyst

To learn more about applying Google's SRE approach to your infrastructure, download the report here [subscription required].
