Microsoft is making available for download the first release of a new piece of cloud analytics technology developed by its eXtreme Computing Group that is known as Project Daytona.
Microsoft describes Daytona as "an iterative MapReduce runtime for Windows Azure" that is meant to support data analytics and machine-learning algorithms which can scale to hundreds of server cores for analyzing distributed data. The 1.0 "technical preview" download is under a non-commercial-use license.
"Using Daytona, a user can submit a model, such as a data-analytics or machine-learning algorithm, written as a map-and-reduce function to the Daytona service for execution on Windows Azure. The Daytona runtime will coordinate the execution of the map-and-reduce tasks that implement the algorithm across multiple Azure virtual machines."
Here's part of a poster from Microsoft Research's TechFest 2011 showcase that mentions Daytona:
According to the poster, the MapReduce-Daytona combination makes use of the compute and storage services built into Azure.
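To make the "iterative MapReduce" idea concrete, here is a minimal sketch in Python of a map function and a reduce function being run repeatedly by a driver until the result stops changing. It is not Daytona's API: the function names, the one-dimensional clustering task and the in-process driver are all assumptions for illustration, and everything runs on a single machine rather than across Azure virtual machines.

```python
from collections import defaultdict


def map_point(point, centroids):
    """Map step (hypothetical): emit (index of nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: abs(point - centroids[i]))
    return nearest, point


def reduce_cluster(points):
    """Reduce step (hypothetical): average the points assigned to one centroid."""
    return sum(points) / len(points)


def run_iteration(data, centroids):
    """One map-and-reduce pass: group points by key, then reduce each group."""
    groups = defaultdict(list)
    for point in data:
        key, value = map_point(point, centroids)
        groups[key].append(value)
    # A centroid with no assigned points is carried over unchanged.
    return [
        reduce_cluster(groups[i]) if groups[i] else centroids[i]
        for i in range(len(centroids))
    ]


def iterative_mapreduce(data, centroids, max_iters=20, tol=1e-9):
    """Driver: repeat map-and-reduce passes until the centroids stop moving."""
    for _ in range(max_iters):
        updated = run_iteration(data, centroids)
        if all(abs(a - b) <= tol for a, b in zip(updated, centroids)):
            return updated
        centroids = updated
    return centroids


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    print(iterative_mapreduce(data, [0.0, 5.0]))
```

In a real runtime the driver loop is the part that would coordinate tasks across machines and cache data between passes. The sketch only shows the shape of the computation.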
MapReduce is Google's framework/programming model for large data sets distributed across clusters of computers. It is somewhat akin to Microsoft's Dryad, which is now known by its official name of LINQ to HPC. LINQ to HPC enables developers to write data-intensive apps using Visual Studio and the LINQ programming model and to deploy those apps to clusters running HPC Server 2008 R2. Microsoft released Beta 2 of LINQ to HPC on July 12. Microsoft officials had said that the company planned to roll LINQ to HPC into SP2 of Windows HPC Server 2008 R2, but seemingly decided against doing so.
Update: Surajit Chaudhuri is now Managing Director of the eXtreme Computing Group, reporting to Research chief Rick Rashid.
More Updates (July 18): Now that Microsoft has officially launched the Daytona tech preview, Roger Barga, an architect with the eXtreme Computing Group's cloud futures unit, has provided me with a few additional details.
Microsoft Research is downplaying the MapReduce elements of Daytona. The official position is that Daytona is technology meant to demonstrate "next-generation analytics in the cloud," and that it isn't all about Google's MapReduce. While data analytics and machine learning are what MapReduce is optimized for, the original MapReduce technology dates back to the 60s and 70s, Barga said, before iteration and caching in the cloud existed.
The team opted to use MapReduce instead of Dryad/LINQ to HPC in order to save time. If they had gone with Dryad, they would have had to spend a lot of cycles decoupling it from SQL Server, Windows HPC Server, etc., Barga said. He also noted that Microsoft may end up delivering Daytona under a non-commercial open-source license (if that's what the academic community wants), and using MapReduce rather than Dryad would make it "easier to flip the bits" to open source.
Microsoft expects to take what it learns from Daytona back inside the company and apply it to Dryad and possibly also Cosmos, Barga said. Cosmos is a petabyte-scale store and computation platform, plus the ecosystem of tools and libraries, upon which Microsoft's Bing search engine is built. Some of the Azure storage elements have taken their cues from Cosmos, as well.