Just days after announcing plans to support the Hadoop big-data framework, Microsoft rolled out the near-final version of its own Hadoop (and Google MapReduce) competitor, known as LINQ to HPC (codenamed "Dryad").
Microsoft Chairman Bill Gates first publicly mentioned Dryad, a Microsoft Research project, in 2006. The company has been taking steps, especially recently, to move Dryad from a research effort to a commercial one.
Microsoft’s stated longer-term goal is to combine LINQ to HPC and its parallel-programming tool stack to create an abstraction layer that will allow users to access compute resources, whether on multicore PCs, on servers, or in the cloud. Microsoft officials have said that LINQ to HPC will be key to helping the company “turn the cloud into a supercomputer.”
(InfoQ cofounder Alex Popescu blogged a comparison of Dryad and Hadoop earlier this year.)
LINQ to HPC was originally expected to be part of the second service pack for Microsoft's HPC Pack 2008 R2-based clustering system. Instead, Microsoft is integrating the LINQ to HPC runtime and making it a component of Service Pack (SP) 3 for HPC Pack 2008 R2 (which is now at the Release Candidate milestone).
Here's the latest plan for LINQ to HPC, as outlined in an October 17 blog post on the Windows HPC Team Blog describing the coming SP3:
"The HPC Pack service pack is an update to the same Windows HPC cluster software that you know and love, with improvements to basic functionality & stability and a few additional new features such as the integration of the Linq to HPC runtime (previously released as a beta add-on), enhancements to our Windows Azure bursting scenarios by reducing the number of ports you have to open in your firewall (services now use 443 instead of a multiple ones), and the ability to install the HPC Pack software on a server not dedicated to your cluster (e.g. a team file server) for use in a manner similar to the Workstation Node functionality previously available."
HPC Pack 2008 R2 is the operating system for "a cluster of servers that includes a head node, and one or more compute nodes (on-premise or Azure-based)," according to a description on the Microsoft Connect testing site.
The HPC team also announced on October 17 the availability of the release candidate for the Windows Azure Scheduler software development kit (SDK).
Windows Azure Scheduler for Parallel Applications "is a solution that enables you to deploy applications in a scalable, high-performance computing (HPC) infrastructure in Windows Azure," the aforementioned blog post noted. The Scheduler allows users to schedule, submit and monitor HPC jobs that use the Message Passing Interface (MPI), service-oriented architecture (SOA), or LINQ to HPC applications, Microsoft officials said.
While on the topics of big data and Azure, here are a couple more related links of potential interest:
- Microsoft Senior Technical Specialist Buck Woody has a good blog post on the real meaning of big data.
- Alchemy Solutions has released a new version of its NeoKicks product, which takes mainframe CICS/COBOL/DB2/VSAM-based applications and migrates those mainframe workloads to Windows Azure with what are said to be minimal changes to the application code and data. Microsoft is touting this as potentially aiding companies "that wish to break free of the high cost of mainframes while getting virtually unlimited scale and improved price/performance over their prior environment."