Microsoft's Hadoop roadmap reveals new big data deliverables

A Microsoft roadmap slide reveals additional details about the scope of the work the company is doing to integrate Hadoop with Windows Azure and Windows Server.
Written by Mary Jo Foley, Senior Contributing Editor

When it comes to big data, Microsoft has more in the works than just Windows Azure and Windows Server versions of the Hadoop big-data framework. The company is working on a number of supplementary tools and technologies that it plans to roll out aggressively in the first half of this year, according to a roadmap shared with me by one of my contacts.

Microsoft announced last fall that the company was partnering with Hortonworks to create Windows Azure and Windows Server Hadoop distributions. Company officials also shared via a short video on Microsoft's Channel 9 that Microsoft was looking beyond those distributions themselves and was working on ways to integrate bidirectionally with the Hadoop file system and Hadoop tools like Sqoop and Flume. Microsoft officials have used the codename "Isotope" to refer to the coming suite of Microsoft products and utilities that will support Hadoop on Windows Azure and Windows Server.

According to the roadmap slide below, Microsoft is planning to deliver the final version of Hadoop on Azure on March 30. (We knew this was supposed to happen some time in March, but didn't have the exact target date until now.) At the same time, Microsoft also is planning to launch a tech preview of Hadoop on Windows Server, the roadmap says, with the final version of that offering targeted for release on June 29. (Again, we knew the target was June for Hadoop on Windows Server, but didn't know the specific date.)

The exact delivery dates aren't all that interesting to me, as they're probably still somewhat subject to change (given that this slide is dated December 2011). Here's what I found to be far more interesting in this slide:

See that mention of "BigTop" under the Hadoop on Azure GA (general availability) item? That's a really surprising blast from the past -- at least for this Microsoft codename watcher. "BigTop" was a Microsoft project I last wrote about back in 2004. BigTop, from what I was told, was all about helping developers create a set of loosely coupled, distributed operating-system components in a relatively rapid way. Last I had heard, Microsoft killed its BigTop incubation effort, but it seems at least some of the technologies from it are alive and well if it's mentioned as a part of the current roadmap.

Update: Thanks to reader Andrew Bayer (@abayer), who alerted me to the "other" BigTop -- and the far more likely reference on this slide. There's an Apache BigTop that's all about interoperability tests and packaging. I'm 99% sure this is the BigTop meant by the slide and not the old Microsoft BigTop.

Also under the Hadoop on Azure GA item is a .Net/Common Language Runtime (CLR)/C# framework for Hadoop programming. I'm hearing second-hand that there's no real C# support there yet, beyond a single sample which can be used in a limited capacity. So it will be interesting to see how/if this materializes by March.

On the Hadoop on Windows Server side, there's another mention of Active Directory (AD) integration -- something Microsoft execs alluded to in the aforementioned Channel 9 Isotope video. The roadmap slide also notes System Center (SC) integration is coming, too. And by the time the product is generally available in late June, it looks like support for the "R" statistical graphics and computation language will be incorporated, as will availability of "secure HDFS" (Hadoop Distributed File System).

I've asked Microsoft officials if they've got any more details to share about their Hadoop plans. No word back so far.
