Google's three rules

They roll out new applications for millions of users with surprising speed, especially compared to corporate IT. They build data centers with hundreds of thousands of servers - and millions of disk drives - and run it all on free software.

Costly corporate kit, like RAID arrays and 15k FC drives, isn't used. Yet they do more work in an hour than most companies do in a year.

Google's IT capabilities are a modern wonder of the world. Underneath the complexity though are just three simple rules. Rules that no enterprise data center (EDC) would ever think of following.

I'm attending the Google scalability conference (see a short version of the agenda here) in Bellevue, WA tomorrow, which got me thinking about the Google rules of IT.

This is not your father's data center

How does a Google data center differ from an EDC? Other than using electricity, in just about every way that matters.

Cheap

The key to Google's competitive strategy is that they have the cheapest compute, network and storage (CNS) in the industry. Free or home-made software. Mass produced - by Intel, these days - servers-on-a-board with network, storage and energy-efficient dual-core processors. SATA drives. Unmanaged 48 port switches.

EDCs don't care about cost. They focus on uptime. The low-volume hardware they buy is reliable and very expensive. As a result, EDC services grow far more slowly than Moore's Law. EDCs are nursing homes for aging apps, not hotbeds of innovation.

Embrace failure

Cheap also means things break. And when you've got hundreds of thousands of servers, lots of things break every day. Get over it. Google expects failure and builds recovery into the software layer that connects the cheap kit.

The EDC buys low-volume kit that tries to engineer-out failure. Google gets uptime by building failover on top of the hardware, not into it. Today's data center guys break out in hives just thinking about it. Twenty years from now everyone will do it that way, but not today.
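What that looks like in practice is simple: if a cheap box doesn't answer, ask another box that holds the same data. Here's a minimal Python sketch of the idea - the replica addresses and failure rate are invented for illustration, and this is obviously not Google's actual code:

```python
import random

# Hypothetical replica set: any copy of the data can serve the request.
REPLICAS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

class ReplicaDown(Exception):
    """Raised when a replica fails to respond."""

def fetch_from(replica: str, key: str) -> str:
    # Stand-in for a real RPC; simulate flaky cheap hardware.
    if random.random() < 0.2:
        raise ReplicaDown(replica)
    return f"value-for-{key}@{replica}"

def reliable_fetch(key: str) -> str:
    """Failover in software: try replicas until one answers.

    The hardware is allowed to fail; this layer hides it.
    """
    for replica in random.sample(REPLICAS, len(REPLICAS)):
        try:
            return fetch_from(replica, key)
        except ReplicaDown:
            continue  # that box is dead or slow; move on
    raise RuntimeError("all replicas failed")

print(reliable_fetch("user:42"))
```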

Architect for scale

This is the flip side of cheap. Google hired some of the best minds in the business to architect for scale. They have multiple 8,000-node clusters that they've talked about, and I wouldn't be surprised if they've got some up to 16,000 nodes.

Architecting for scale leverages cheap CNS to give Google the lowest-cost growth as well. Competitors such as Yahoo, who rely more on standard EDC products, can do the same things as Google, but it costs them about 10x in capital expense and several times the operations expense.

Fast growing applications play to Google's strength.
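To make "architect for scale" concrete, here's a toy sketch of the scale-out pattern: hash-partition keys across a large cluster and replicate each shard on a few cheap machines. The node count and replication factor below are assumptions for the example, not Google's published figures:

```python
import hashlib

NODES = 8_000        # one of the cluster sizes mentioned above
REPLICATION = 3      # each shard lives on a few cheap machines

def nodes_for(key: str) -> list:
    """Map a key to its home node plus replicas.

    Hash-partitioning spreads keys evenly, so adding cheap nodes
    adds capacity almost linearly - the essence of scale-out.
    """
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    home = h % NODES
    return [(home + i) % NODES for i in range(REPLICATION)]

print(nodes_for("www.example.com/index.html"))  # three consecutive node ids
```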

The Storage Bits take

If Google has it all figured out, why the scalability conference? Good question. I think they'd like more scalability: 40,000-node clusters; 4 million processor data centers; exabyte storage. This isn't just about gluing the bits together to get work done, either. They want lower power consumption, cheaper hardware, faster protocols and better software.

This is more than first-mover advantage. The faster they can grow, the greater their cost advantage over smaller, less nimble competitors. Their ROI brings them cheap capital, which increases their ability to invest in new businesses and more capacity. The higher their volumes, the cheaper growth becomes. A perfect storm.

Google is not invincible, by any means. Their marketing is pathetic. The concentration of power in the hands of three largely untried individuals means a major cock-up is only a question of when, not if. The stagnant share price puts pressure on management to increase returns by cutting back on costly perks. Google's purpose-built infrastructure is also relatively inflexible: they can't just paste on ACID transaction processing.

But that is all in the future. Tomorrow I'm looking forward to hearing about the latest in scalable systems from the industry's leading innovators.

Comments welcome.

Talkback

  • why not?

    why can't they just paste on ACID transactions?

    it'd probably take a little work, but some sort of transaction processor they can gut and use over their infrastructure seems a doable thing given the brains they've got there...
    CWButler
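For what CWButler is gesturing at, the durability half of a transaction processor can be surprisingly small. A minimal sketch, assuming a single node and a hypothetical log file - real systems would add locking, isolation and recovery replay on top of infrastructure like Google's:

```python
import json
import os

LOG = "wal.log"  # hypothetical write-ahead log file

def apply_writes(writes: dict) -> None:
    for key, value in writes.items():
        print(f"applied {key} = {value}")

def commit(txn_id: str, writes: dict) -> None:
    """Durably log intent before applying - the A and D in ACID.

    On restart, any logged-but-unapplied transaction can be replayed,
    so a crash mid-write never leaves half a transaction visible.
    """
    record = json.dumps({"txn": txn_id, "writes": writes})
    with open(LOG, "a") as f:
        f.write(record + "\n")
        f.flush()
        os.fsync(f.fileno())  # force to disk before acknowledging
    apply_writes(writes)       # safe: the log survives a crash here

commit("t1", {"balance:alice": 90, "balance:bob": 110})
```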
  • Search vs. Enterprise Apps

    I think what Google does with their data centers is masterful. But let's be honest about the limitations of Google's approach. Google's main product is Search. Search lends itself to unreliable clusters. In particular, there is no need to maintain reliable state on the server side. Also, the result of a worst case scenario is that the user just re-runs their query. Enterprise applications like ERP can't run on that basis.
    johndoe445566
    • Enterprise application, especially ERP

      Can run just like that. Think about it for a minute: what is ERP? It's really just a sophisticated query about enterprise resource planning. Queries and data collation are what a search engine does. After all, what you're really looking for in ERP is your critical path, which is basically just a very complex query. Once the critical path is worked down until it's no longer the critical path, you simply go back and get the next critical path item. It's not that big of a leap from a search engine to ERP after all.
      maldain
    • Yep...Application-Specific

      Yes, exactly right. Google's architecture is perfectly suited to the requirements of the type of application they run. Performance and availability are paramount, but maintaining consistency and state are not, so they can forgo many of the elements that dictate typical enterprise data center design. Which is not to take away from what they've built; we just shouldn't go ga-ga over their approach and try to make it work where it isn't appropriate.
      tpbishop1
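As an aside, the critical path maldain mentions really is computable as a query over a task graph - it's the longest path through the dependency DAG. A small sketch, with an invented four-task project:

```python
from graphlib import TopologicalSorter

# Hypothetical project: task -> (duration, prerequisites)
TASKS = {
    "design": (3, []),
    "order":  (2, ["design"]),
    "build":  (5, ["order"]),
    "test":   (2, ["build", "design"]),
}

def critical_path(tasks):
    """Longest path through the dependency DAG - the critical path."""
    graph = {t: set(deps) for t, (_, deps) in tasks.items()}
    finish, parent = {}, {}
    for t in TopologicalSorter(graph).static_order():
        duration, deps = tasks[t]
        finish[t] = max((finish[d] for d in deps), default=0) + duration
        parent[t] = max(deps, key=finish.get) if deps else None
    # walk back from the latest-finishing task
    t, path = max(finish, key=finish.get), []
    while t is not None:
        path.append(t)
        t = parent[t]
    return list(reversed(path)), max(finish.values())

print(critical_path(TASKS))  # (['design', 'order', 'build', 'test'], 12)
```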
  • finally someone sees sense

    And Google did it.
    Most corporations are too afraid to take the risk of such an idea, especially in software development. Now if it does not have the right components and the right vendors, then it won't be used.
    This means that many enterprises end up using the same products or buying third party, whilst not realising that they have probably compromised some of their competitive edge too.
    Programming languages are much the same - many corps are too anxious to specify it has to be C++ or C# etc., whereas had they used Java or Perl or something else for parts of the task in question, it could have been more aptly suited to that task, developed quicker and delivered solutions sooner.
    Yes, "maintainability" I hear you cry, but let's face it - programmers are two a penny and in a large enough enterprise, you will find someone with the right skills.
    It could also be said that this is a good reason for using Open Source and could be seen to quell some of the myths.
    ismoore
  • Google will surprise everyone

    Excellent review of another of Google's strengths.
    Google will dominate and eventually replace much of domestic US Internet best-effort traffic with a nationwide broadband distribution network managed by and via these centers and their nationwide fiber connections. In short: delivering true broadband data links to local service providers (connected to the local data center via fiber) to deliver hosted applications to its customers - mutual benefits.

    Jacomo
    jim.aimone9
    • Google and 700 MHz

      Watch what these folks do with the new 700 MHz auction scheduled in Feb 2008. My prediction is they will bid direct or with a team and win a major share of the nationwide spectrum.
      This will round out their effort to own the networks end to end by allowing them to gain access to the "last mile" using this new wireless spectrum.
      jim.aimone9
  • Workload Matters

    The Google workload is very different from the typical enterprise workload. It is lots of very low-value transactions. If something fails, don't worry - the user will repeat the query. Tell that to Amazon or your brokerage firm. All credit to Google for creating an architecture that fits the load. Trying to move that architecture to a high-value transaction environment is a much tougher problem.

    dave
    davea_hm
    • You are correct

      Which is why Amazon is actually more interesting from an internet-scale business perspective.

      Robin
      R Harris
  • It works and not just for Google

    I use the same strategy at my home biz. More, cheaper systems with failover and shared resources. Most people at home have 1 or 2 computers. We have 3 laptops, all cheap as hell.
    I have 16 computers, no kids crying over who gets to use what; they all have multiple choices. Seems the 8-year-old prefers Linux, the 12-year-old wants XP for playing Sims, and the 16-year-old uses only Windows 2000 Pro for stability and the ability to use MSN (otherwise she boots Linux to do her homework).
    The wife is a Mac freak.
    I use them all and they all dual boot to Linux with Mozix so I can run my cluster apps overnight to put the latest scientific numbers on the desks of my clients first thing in the morning.
    Eventually everyone will be doing something like this. My customers are all changing from proprietary Unix and Windows to Linux clusters on one hand for speed and stability, and on the other hand are moving business apps to virtual systems. It's the future, get used to it.
    sysop-dr
  • Good ole Google

    Very important systems such as ERP and the like DON'T have to run on expensive resilient hardware/software. In my opinion the problem is SQL. SQL needs all sorts of expensive disk arrays and hardware, as well as software, to get it to a point where you could lose a box and still be OK. I have seen search engines (information management tools) outperform SQL by a HUGE factor in some cases, and yet everyone still uses SQL. Google don't use SQL and they are miles ahead of anyone in terms of storage capacity and cluster performance. Why? SQL. An old technology that is FAR too expensive to implement, even when it's free. I'm telling you, please believe me: RDBMS were invented in the '70s, and although the hardware has improved, the same underlying principles remain. What happened to hypermedia systems?

    IBM, for example, have spent millions of dollars and a lot of time making their RDBMS system an "XML engine" of sorts. It's slow, it's resource intensive - why would anyone bother?!

    There are companies out there developing XML engines which will run free-text (Google-like) queries and structured (XPath/SQL-style) queries on massive records and massive data sets. No relationships. No real management of the data. And it outperforms SQL even on simple records, even on MASSIVE data sets. (The example I have seen was 200,001 records, each record say 65K in size, and you could search the entire data set, free text or structured, on a standard server - dual single-core, 4 GB of RAM, SCSI disk - in MILLISECONDS! The same data in SQL, some searches can take minutes!)

    The world is blind and crazy; take a leaf out of the Google book.
    cjs1
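The millisecond free-text numbers cjs1 cites are plausible because engines of that style use an inverted index: each query term is a hash lookup into a posting list rather than a table scan. A toy sketch, with an invented three-document corpus standing in for the 200,001-record data set:

```python
from collections import defaultdict

# Invented three-document corpus standing in for a huge record set.
DOCS = {
    1: "google runs cheap commodity servers",
    2: "enterprise data centers buy expensive raid arrays",
    3: "cheap servers fail so software handles failover",
}

# Build the inverted index once: term -> set of matching doc ids.
index = defaultdict(set)
for doc_id, text in DOCS.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(query: str) -> set:
    """AND-search: intersect the posting lists of each query term.

    Each term is a hash probe into the index, not a table scan,
    which is why this style of engine stays fast at scale.
    """
    postings = [index[term] for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

print(search("cheap servers"))  # {1, 3}
```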
  • Google Lucky

    Google's business model allows them the flexibility of using cheaper hardware. If someone's search fails, oh well. Systems like banking and financial don't have that luxury, and by gov't regulations are not allowed to have that luxury. If we could move more client-based apps to the Internet, the opportunity would present itself to use the cheaper data center technology, but as long as we have these client-based apps we are pretty much SOL.
    wademan
    • Not really

      The fact is that as long as you don't lose the transaction, and you can recover from failure - especially if you can do it rapidly - then it's not a problem. My company employs a similar business model for our retail outlets. We have servers that cost us about 300 bucks to build; we install Linux on them and run inexpensive thin clients on the counter tops and in the offices. We service about 40 users per server, giving them access to the net, office software and our point-of-sale software. This model works extremely well in that our total outlay for hardware and software in an average store is about 2,000, excluding the DSL hookup. That makes it a clean, inexpensive and reliable setup, and if something does fail we can have them fully recovered and back up in less than 24 hours, thanks to FedEx. The only problem is they will have to re-enter any transactions that took place while their system was down. Our whole structure is set up so that no single failure, hardware or software, will put us out of business.
      maldain
      • Lucky You

        To have the luxury of 24 hours to have a store "fully recovered"

        For my financial clients I have a 100% SLA: if they experience as little as a minute of downtime, they don't have to pay me.

        My retail clients have a 99% SLA, meaning anything more than 3.65 days of downtime in a year and they get a full refund of my support service charges.

        With those sorts of penalties in place, I'd like to see Google run their datacentres the same way!
        Dominick-Murphy
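For reference, the downtime an availability SLA allows is simple arithmetic - 3.65 days a year corresponds to 99%, while 99.9% allows under nine hours:

```python
# Allowed downtime per year implied by an availability SLA.
HOURS_PER_YEAR = 365 * 24  # 8,760

for sla in (0.99, 0.999, 0.9999):
    allowed_hours = HOURS_PER_YEAR * (1 - sla)
    print(f"{sla:.2%} SLA -> {allowed_hours:6.2f} hours/year")

# 99.00% SLA ->  87.60 hours/year  (about 3.65 days)
# 99.90% SLA ->   8.76 hours/year
# 99.99% SLA ->   0.88 hours/year
```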
    • why lucky?

      My experience here in the UK is that most banks are desperate to move most middle- to back-office stuff to web-based/client-server solutions. It is only the high-end front office where this becomes an issue.
      There is no regulatory reason not to, as a failure is a failure. In my previous pre-sales roles in financial, the first question was "can it run in a browser?"
      Client-server offers one thing multiple apps on one PC don't - none of the complex installation problems. If someone's PC fails, it can be replaced quickly without having to reinstall a variety of different applications depending on the user's usage, not to mention the problems of clashes between installed applications, etc. (I worked for one company that took three years to package all apps onto XP from NT - who wants that headache?)
      Providing that resilience is built in, what is the problem?
      ismoore
  • The danger behind "Nobody got fired for buying from IBM"

    Many big corporations are scared to buy from small companies. If they take some risk, they will get innovative products, a company that will do everything possible to support them (not a support person in a faraway land), and they will be able to shape the product to satisfy their business needs.

    Too many big corporations stay away from innovative products and go with old technology from "trusted" companies.

    At this rate US corporations will lose the edge to emerging countries, because those countries are willing to try anything to get a competitive edge.
    venkats2000
  • It is the only way to go

    When I first came to my company they were running on Sun hardware. The initial cost of the servers was $17,000, with a maintenance contract of $7,000 per year. When I came here and did a search on our servers, they had depreciated down to $720.00.

    $720.00, and still paying $7,000 maintenance.

    We went to white-box servers and SATA drives, and for the same cost we now have triple redundancy with no need for support contracts; if a server goes, just throw it away and pop in a new one.
    Uptime for us in the last 3 years is near perfect and we have no headaches.

    Plus we are always on the cutting edge; tossing a server that cost less than $2,500 is painless.

    Also we never have the need for tape backups, which saves a lot of time and money as well.

    Bottom line: 8 servers for the price of one Sun box... no-brainer in my book. Most Sun people will then reply, "yeah, but the Sun box will last 12 years" - and yes, they would be correct - except why in God's name would I want an ancient 12-year-old server in my network?
    mames1701
    • Re: It is the only way to go

      "Most Sun people will then reply, yea but the Sun box will last 12 years...and yes they would be correct..except why in God's name would I want an ancient 12 year old server in my network."

      Because your must-have application was last updated by the vendor when that server was new and it won't run on anything else.
      troidus
      • A good reason to NOT buy boxed software with no source code.

        Then you are at the mercy of the vendor.
        DonnieBoy
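The arithmetic in "It is the only way to go" roughly checks out: the purchase price alone buys nearly seven white boxes, and the first year's maintenance contract pays for almost three more. Using only the figures quoted in that comment:

```python
# Back-of-the-envelope numbers quoted in the comment above.
sun_purchase = 17_000   # Sun server purchase price ($)
sun_support = 7_000     # annual maintenance contract ($/year)
whitebox = 2_500        # white-box server, no support contract ($)

print(f"White boxes per Sun box (purchase only): {sun_purchase / whitebox:.1f}")
print(f"Extra boxes the support contract buys each year: {sun_support / whitebox:.1f}")
# White boxes per Sun box (purchase only): 6.8
# Extra boxes the support contract buys each year: 2.8
```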
  • Hard to replicate

    I agree that Google's infrastructure is one of its key strategic advantages. I was awed some years ago when I first read about Google developers' dynamic ability to build clusters of thousands of machines, upon which they could test their latest tweak to some search algorithm or other. Like I said, impressive.

    I'm not sure I agree with Harris that Google's infrastructure is inflexible. Indeed, from what I've read about Google's distributed file system, I would venture to say that they've got a very flexible architecture upon which to build a variety of applications. (And unless they've had to modify it drastically for Google Checkout, I'm guessing that they're not having too much trouble layering ACID transactions on top of it, either.)

    The real problem with Google's architecture is how hard it appears to be to copy, especially if you don't have the economies of scale that Google does. Small- to mid-size companies just don't have the resources to program their own BIOS, build their own hardware, or create their own distributed file system.

    Now, if anybody does have examples of small companies that have replicated Google's architecture in interesting ways, I'd be very interested in hearing about them.
    smithkl42