At last, big-data fans, we've got some word of the seemingly-missing-but-not-forgotten Windows Server implementation of Hadoop promised by Microsoft and Hortonworks.
I'd started wondering whether Microsoft's repeated "no comments" about the project's whereabouts -- the most recent of which I received just a couple weeks ago, at the end of September 2012 -- meant Microsoft had decided to go cloud-only with Hadoop. But it turns out the Windows Server version of the Microsoft-Hortonworks Hadoop implementation is still around, and is just in private preview.
A quick refresher as to what's going on with Microsoft and Hadoop.
In the fall of 2011, Microsoft announced it was partnering with Hortonworks to create both Windows Azure and Windows Server implementations of the Hadoop big data framework. At that time, Microsoft officials committed to providing a Community Technology Preview (CTP) test build of the Hadoop-based service for Windows Azure before the end of calendar 2011 and a CTP of the Hadoop-based distribution for Windows Server some time in 2012. A month after announcing the Hortonworks partnership, Microsoft dropped plans to make its own big data alternative, codenamed Dryad.
In late December 2011, Microsoft posted a video on its Channel 9 site that provided updated information about the company's Hadoop plans. According to that video, which Microsoft subsequently pulled from Channel 9, the company planned to make Hadoop on Windows Azure generally available in March 2012, and Hadoop for Windows Server generally available in June 2012.
Since then, Microsoft officials have stayed silent on any new timetables for the Hadoop for Azure and Hadoop for Windows Server offerings. Until late September 2012, that is.
A slide deck from the "24 Hours of PASS" event from Denny Lee, Technical Principal Program Manager for SQL Business Intelligence Group, made its way to the Web recently. Lee, according to his bio, is "one of the original core members of Microsoft Hadoop on Windows and Azure (code name: Isotope) and had helped bring Hadoop into Microsoft."
A few of the interesting slides from Lee's deck from his September 21, 2012 presentation:
Hadoop on Azure is still in preview, as Lee's slide says. (The latest publicly acknowledged build was the second Community Technology Preview release.) But now we know that the Windows Server version is in private preview, according to Lee's deck. I'm not sure how long it's been in private preview, and have never found any testers who've claimed to have been part of the preview for it.
Also, there's seemingly a new deliverable on the roadmap: an "on-demand" dedicated Hadoop cluster in the cloud, which seems to be some kind of hybrid between the two (best I can tell). Anyone know any more about this?
Microsoft officials have been saying for a while that it wasn't just the Hadoop framework that Microsoft planned to support. There are lots of other related components in the works, like the Excel Hive Add-in, Sqoop, Apache Pig, Hive ODBC and more, as this slide notes. I'm assuming the features listed below the beige bar are the ones that will be in the Windows Server version of the Hadoop implementation, and those above the bar are the ones in the Azure version.
Hadoop for Windows Server includes an interactive console, remote-desktop support, and other related elements, as this slide seems to indicate.
The O'Reilly Strata Conference plus Hadoop World are on tap for late October in New York City. Maybe Microsoft and Hortonworks will share more about their Windows Azure and Windows Server Hadoop plans and progress then (even though there aren't many Softies listed as speakers)?