The magic that makes Google tick

Summary: Google's vice-president of engineering was in London this week to talk to potential recruits about just what lies behind that search page. ZDNet UK snuck in to listen.

The process
Obviously it would be impractical to run the algorithm over every page for every query, so Google breaks the problem down.

When a query comes in to the system, it is sent to index servers, which contain an index of the Web. This index maps each word to every page that contains that word. For instance, the word 'Imperial' points to a list of documents containing that word, and similarly for 'College'. For a search on 'Imperial College', Google performs a Boolean 'AND' operation on the two lists to get a list of what Hölzle calls 'word pages'.
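A minimal sketch of the lookup described above, assuming a toy in-memory index in which each word maps to a bare set of page IDs. The function names `build_index` and `and_query` and the sample pages are hypothetical; a real index server stores far more per entry than this.

```python
from collections import defaultdict

def build_index(pages: dict[int, str]) -> dict[str, set[int]]:
    """Map each lower-cased word to the set of page IDs that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return dict(index)

def and_query(index: dict[str, set[int]], query: str) -> set[int]:
    """Return the IDs of pages containing every word in the query (Boolean AND)."""
    postings = [index.get(word, set()) for word in query.lower().split()]
    if not postings:
        return set()
    # Start from the shortest list so intermediate results stay small.
    postings.sort(key=len)
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
    return result

pages = {
    1: "Imperial College London",
    2: "The Imperial War Museum",
    3: "King's College London",
}
index = build_index(pages)
print(sorted(and_query(index, "Imperial College")))  # [1]
```

'Imperial' points to pages 1 and 2, 'College' to pages 1 and 3, and the AND of the two lists leaves only page 1.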

"We also consider additional data, such as where in the page does the word occur: in the title, the footnote, is it in bold or not, and so on."
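The article does not say how much weight position or formatting carries, so this sketch uses invented weights purely to illustrate the idea. Each occurrence of a word on a page is recorded together with the field it appears in, and a hit in the title counts for more than one in a footnote. `FIELD_WEIGHTS`, `score` and the sample hits are all hypothetical.

```python
# Hypothetical weights: the article says where a word appears matters,
# but gives no numbers, so these values are invented for illustration.
FIELD_WEIGHTS = {"title": 3.0, "bold": 2.0, "body": 1.0, "footnote": 0.5}

def score(hits: list[tuple[str, str]], query_words: set[str]) -> float:
    """Score one page from its (word, field) hits: each occurrence of a
    query word adds the weight of the field it appears in."""
    return sum(
        FIELD_WEIGHTS.get(field, 1.0)  # unknown fields count as plain body text
        for word, field in hits
        if word in query_words
    )

query = {"imperial", "college"}
page_a = [("imperial", "title"), ("college", "title"), ("college", "body")]
page_b = [("imperial", "footnote"), ("college", "body"), ("london", "bold")]

print(score(page_a, query))  # 7.0
print(score(page_b, query))  # 1.5
```

Both pages contain both words, so both survive the AND step. Position data is what lets the page with the words in its title rank above the one that mentions 'Imperial' only in a footnote.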

Each index server indexes only part of the Web, as the whole Web will not fit on a single machine, certainly not the type of machine that Google uses. Google's index of the Web is distributed across many machines, and the query is sent to many of them; Google calls each one a shard (of the Web). Each shard works on its part of the problem.
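The sharded layout can be sketched as a scatter-gather. Every shard holds the index for its own subset of pages, the query goes to all shards in parallel, and the partial answers are merged. This sketch assumes pages are assigned to shards by page ID modulo the shard count, a hypothetical rule chosen for simplicity. `IndexShard`, `build_shards` and `search_all` are illustrative names, and threads stand in for separate machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

class IndexShard:
    """One index server: an inverted index over its own slice of the pages."""

    def __init__(self) -> None:
        self.index: dict[str, set[int]] = defaultdict(set)

    def add(self, page_id: int, text: str) -> None:
        for word in text.lower().split():
            self.index[word].add(page_id)

    def search(self, words: list[str]) -> set[int]:
        # Boolean AND over this shard's posting lists only.
        postings = [self.index.get(w, set()) for w in words]
        return set.intersection(*postings) if postings else set()

def build_shards(pages: dict[int, str], num_shards: int) -> list[IndexShard]:
    shards = [IndexShard() for _ in range(num_shards)]
    for page_id, text in pages.items():
        # Hypothetical partitioning rule: page ID modulo the shard count.
        shards[page_id % num_shards].add(page_id, text)
    return shards

def search_all(shards: list[IndexShard], query: str) -> set[int]:
    words = query.lower().split()
    # Scatter: send the query to every shard at once.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: s.search(words), shards))
    # Gather: merge the shards' partial answers.
    return set().union(*partials)

pages = {
    1: "Imperial College London",
    2: "The Imperial War Museum",
    3: "King's College London",
    4: "Imperial College Union",
}
shards = build_shards(pages, num_shards=3)
print(sorted(search_all(shards, "Imperial College")))  # [1, 4]
```

Because each page lives on exactly one shard, a shard can evaluate the whole AND query for its own pages locally. The final result is then simply the union of the shards' answers.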

Google computes the top 1,000 or so results, and those come back as document IDs rather than text. The next step uses document servers, which hold a copy of the Web as crawled by Google's spiders. Again, the Web is chopped up so that each machine holds one part of it. When a match is found, it is passed to the ad server, which matches the ads and produces the familiar results page.
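The two-stage lookup, ranked IDs first and page text second, might be sketched as follows. The scores are assumed to come from an earlier ranking step, and the document servers are modelled as plain dictionaries split by a hypothetical ID-modulo rule. The ad server step is left out. `top_k`, `build_doc_servers` and `fetch_documents` are illustrative names.

```python
import heapq

def top_k(scores: dict[int, float], k: int = 1000) -> list[int]:
    """Keep only the k best document IDs, highest score first; no page text yet."""
    return heapq.nlargest(k, scores, key=scores.__getitem__)

def build_doc_servers(pages: dict[int, str], num_servers: int) -> list[dict[int, str]]:
    """Split the crawled copy of the Web so each document server holds one part."""
    servers: list[dict[int, str]] = [{} for _ in range(num_servers)]
    for doc_id, text in pages.items():
        # Hypothetical placement rule: document ID modulo the server count.
        servers[doc_id % num_servers][doc_id] = text
    return servers

def fetch_documents(servers: list[dict[int, str]], doc_ids: list[int]) -> list[str]:
    """Turn ranked IDs back into page text by asking the server that owns each one."""
    return [servers[doc_id % len(servers)][doc_id] for doc_id in doc_ids]

pages = {
    1: "Imperial College London",
    2: "The Imperial War Museum",
    3: "King's College London",
    4: "Imperial College Union",
}
scores = {1: 7.0, 4: 5.5, 3: 1.5}  # assumed output of the index servers
servers = build_doc_servers(pages, num_servers=2)

ranked = top_k(scores, k=2)
print(ranked)                            # [1, 4]
print(fetch_documents(servers, ranked))  # ['Imperial College London', 'Imperial College Union']
```

Carrying only IDs through the ranking stage keeps the data passed between machines small. Page text is fetched only for the documents that make the cut.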

Google's business model works because all this is done on cheap hardware, which allows it to run the service free of charge to users and charge only for advertising.

