Microsoft will offer its own distribution of the Hadoop data processing framework for Windows Server and Azure.
The move, announced by Microsoft's corporate vice president, Ted Kummert, at the PASS Summit on Wednesday, will see Microsoft work with Yahoo spinoff Hortonworks on developing a distribution of Hadoop tuned for its software. The company also plans to contribute code back to the open-source community.
"The next frontier is all about uniting the power of the cloud with the power of data to gain insights that simply weren’t possible even just a few years ago," Kummert said. "Microsoft is committed to making this possible for every organization, and it begins with SQL Server 2012."
By adopting Hadoop, Microsoft follows in the footsteps of other major vendors that have recognised the importance of the software. IBM, EMC, Dell and Oracle have all integrated Hadoop on some level with their products, while Intel, HP and Hadoop's father, Google, have all evaluated the software for large-scale computing.
"Hortonworks and Microsoft share a common vision of making Apache Hadoop easier to use and consume," Hortonworks wrote in a blog on Wednesday. "Microsoft’s commitment to Apache Hadoop further broadens the Apache Hadoop ecosystem, which is essential to accelerating its adoption in the enterprise."
A preview of the Hadoop distribution for Windows Azure, Microsoft's platform-as-a-service cloud, will be available by the end of 2011, followed by a preview for Windows Server in 2012.
Hadoop is a data processing framework that Yahoo developed in 2005 after Google published a whitepaper detailing its MapReduce processing and Google File System storage technologies. Yahoo has been working on the open-source technology ever since, along with other contributors, and spun off its Hadoop development wing into a new company named Hortonworks in June.
A slew of web companies use Hadoop, including Facebook, Twitter and Rackspace, along with companies such as the New York Times and eBay.
Hortonworks isn't the only Hadoop developer out there: Cloudera develops its own distribution and tied up with Dell in August, while stealth Hadoop developer MapR was tapped by EMC in May to power the Hadoop distribution for its Greenplum analytics.
No mention was made of Dryad, Microsoft's own Hadoop-like processing framework, based on its Cosmos technology.
"As surprising as it is to see Microsoft planning to offer MapReduce based upon open source rather than upon the internally developed and heavily used Cosmos platform, it's even more surprising that they hope to contribute changes back to the open source community," James Hamilton, a distinguished engineer with Amazon Web Services, wrote in his blog.
Microsoft also released software that automates loading Hadoop datasets into SQL Server 2008 R2 and the Parallel Data Warehouse and managing them there, as well as moving data in the other direction. Some licence terms apply for the SQL Connector for Apache Hadoop.