A new whitepaper that Microsoft researchers are set to present at a conference next month sheds more light on Microsoft's back-end cloud infrastructure.
The paper, entitled "SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets," details a new declarative scripting language that is optimized for storing and analyzing massive data sets (like search logs and click streams) that are key to cloud-scale service architectures. SCOPE, or Structured Computations Optimized for Parallel Execution, is the name of the language.
According to the paper -- which Microsoft is slated to present at the VLDB 2008 conference in late August -- SCOPE doesn't require explicit parallelism, but is "amenable to efficient parallel execution" across large clusters. SCOPE resembles SQL, but with C# extensions, the paper says.
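To make the "no explicit parallelism" idea a bit more concrete, here is a rough Python sketch. It is not SCOPE syntax and is not taken from the paper; the log rows, the function names (count_partition, count_queries) and the partitions parameter are all made up for illustration. The point is only that the caller asks for "counts per query" and the splitting, parallel work and merging happen behind the scenes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical search-log rows: (query, clicked_url). Illustrative data only.
LOG = [
    ("weather", "a.example"),
    ("news", "b.example"),
    ("weather", "c.example"),
    ("maps", "d.example"),
    ("news", "e.example"),
    ("weather", "f.example"),
]


def count_partition(rows):
    """Partial aggregate: count queries within one partition of the log."""
    return Counter(query for query, _url in rows)


def count_queries(rows, partitions=3):
    """Return the number of occurrences of each query.

    The caller states what it wants (counts per query). How the rows are
    split, processed in parallel and merged is hidden in here, which is
    an assumption-laden stand-in for what a query optimizer would decide.
    """
    # Split the rows into roughly equal partitions.
    chunks = [rows[i::partitions] for i in range(partitions)]
    # Count each partition independently, in parallel.
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partials = list(pool.map(count_partition, chunks))
    # Merge the partial counts; addition makes the merge order-independent.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


result = count_queries(LOG)
```

In a SQL-like language, the equivalent request would be a single grouped count over the log, with the system (rather than the script author) choosing how to spread the work across machines.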
I found the new whitepaper via a blog link from Greg Linden, an employee of Microsoft's Live Labs. Linden blogged:
"Scope is similar to Yahoo's Pig, which is a higher level language on top of Hadoop, or Google's Sawzall, which is a higher level language on top of MapReduce. But, where Pig focuses on and advocates a more imperative programming style, Scope looks much more like SQL."
Reading through the paper, I noticed an explanation of how SCOPE fits in with Cosmos, Microsoft's back-end storage layer that currently powers Live Search and other Microsoft services. The SCOPE whitepaper sheds more light on what Cosmos is and how it works. From the paper:
"Microsoft has developed a distributed computing platform, called Cosmos, for storing and analyzing massive data sets. Cosmos is designed to run on large clusters consisting of thousands of commodity servers. Disk storage is distributed with each server having one or more direct-attached disks."
(A loosely-coupled aside: I wonder if Pat Helland's decision to move to the SQL team at Microsoft has any connection to all of this. Helland's expertise is in big-picture strategy around transactional and parallel processing, as well as service-oriented architectures.)
Microsoft's future strategies and products increasingly seem to be converging. More teams are thinking about parallel/distributed/multicore computing, with the experimental Windows successor code-named Midori being just the most recent of many examples. More Microsoft products are seemingly being designed with modeling in mind from the get-go.
Maybe Chief Software Architect Ray Ozzie's campaign to "drive alignment" across the various Microsoft product groups is finally taking root.... Or maybe it's simply that cloud computing, to be truly scalable, must be built to work across increasingly large networks of distributed systems. Or maybe it's a little of both....