The Bing back-end: More on Cosmos, Tiger and Scope

Microsoft is working on increasing the performance and reliability of its Bing search engine via its Tiger, Cosmos and Scope work in its Online Services datacenters.
Written by Mary Jo Foley, Senior Contributing Editor

As part of its recent Microsoft Research roadshow, Microsoft officials talked up "Tiger," Bing's next-generation index-serving platform. Tiger, jointly developed by Microsoft Research and Microsoft's Search Technology Center in Asia, uses solid-state disk technology to improve Bing's search performance and relevance.

(According to LiveSide, Tiger's rollout began in August of this year and should be complete by year end.)

But there's more to Binging than just the index server. Another key component of Microsoft's search service is "Cosmos." Until recently, Cosmos was one of those Microsoft codenames that Softies hated to mention publicly. I've been tracking it since 2007 or so, but couldn't get Microsoft execs to say much about it. Recently, however, Microsoft posted a number of new job openings for Bing that mentioned Cosmos, Scope and the coming indexing platform.

Cosmos is the cloud storage and computational engine that powers all of Microsoft's Online Services, including Bing. Scope is the parallel querying capability/language for Cosmos. Here's a description of Cosmos from one recent Microsoft job post:

"At the heart of Bing is Cosmos. We support the Online Services Division analyzing petabytes of data every day, operating at high scale and with high availability for data mining and business intelligence applications. As part of Cosmos, we build a highly parallel querying capability (called SCOPE) that allows front-end customers to focus on solving problems as if they are using a single machine."

Cosmos is helping Microsoft perform data analysis on "large clusters of tens of thousands of machines," some of its job postings say.

The evolving Bing crawling/indexing system (where Tiger fits in) is part of what another Microsoft job post describes as Bing's "new system for web search backend that will be orders of magnitude larger and faster than anything that currently exists."

The indexing pipeline team is in charge of the back-end processing for Bing Web search. The indexing team has set some lofty goals for itself, as described in yet another job post:

"We aim to redefine real-time search and push the envelope on how fast any page anywhere in the world can be indexed. We write software from the ground-up, running across thousands of servers, managing petabytes of data. Our software has to reliably reprocess billions of web documents every day, ensuring that every document gets crawled, joined with the appropriate datasets and then indexed with the correct features. We are chartered with complicated problems such as finding, crawling, processing and serving any interesting and emerging web page in a matter of seconds; It doesn’t matter if it’s a new New-York times article, an posting to Facebook or an update to someone’s personal blog, we want that page in the index the moment it’s available. The problems we have to address everyday range from designing major new infrastructure pieces to debugging web-page(s) not correctly showing up in the search results."

Unsurprisingly, given that online search is powered by online ads, the Cosmos, Tiger and Scope folks also are going to contribute to what's going on with adCenter. From the job posting above:

"We’ll be helping to build the underlying infrastructure which will power the real-time processing and delivery of Advertisements to all AdCenter properties (including Bing). Hence contributing directly to the bottom-line success of Online Services Division."

Speaking of Cosmos, veteran Microsoft platform architect Pat Helland -- who has spent the last couple of years working on Cosmos -- is leaving the company, he announced publicly on September 30 in a blog post. Helland said he is moving to San Francisco to be closer to family and does not have a new job lined up. From his post:

"For almost two years, I've worked on Cosmos, some of the plumbing for Bing. It stores hundreds of petabytes of data on tens of thousands of computers.  Large scale batch processing using Dryad with a high-level language called SCOPE on top of it. Working on this team (with some amazing colleagues and friends) has been one of the highlights of my career."
