Microsoft's getting into the Hadoop game, and people are skeptical. Can Microsoft really embrace open source technology? And if it can, will it end up co-opting it somehow, or will it truly play nice? Would you even want to run Hadoop on the Windows operating system? Why bother? Why care?
Microsoft's Hadoop distribution, which it is building in partnership with Hortonworks, includes the core Hadoop components, HDFS and MapReduce, plus a lot more. Microsoft's also throwing in Hive, Pig, Mahout, Sqoop, HedWig, Pegasus and HBase (including the last of these is no small thing for the creator of SQL Server). The distribution can be installed on-premises on Windows Server or in the cloud on customers' Windows Azure "roles" (virtual machines).
Perhaps the best option, though, is a browser-based provisioning interface that stands up an entire Hadoop cluster in just a few mouse clicks. Once the cluster is running, you can use Microsoft's Remote Desktop software to connect directly to the head node, open a command prompt, and hack around with Hadoop and all those components. But the interactive console offers an even better way: a command-line interface that gives you access, all in one place, to:
- HDFS commands
- JAR-file-based MapReduce jobs
- Basic charting (bar, pie and line graphs)
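For readers new to MapReduce, here is a rough idea of what a typical JAR-based job computes. This is a minimal, local sketch in plain Python of the classic word-count job, split into the map, shuffle and reduce phases that Hadoop runs across a cluster. It does not use Hadoop, HDFS or Microsoft's console at all; the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper (hypothetical name): emit (word, 1) for each whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over in-memory input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    sample = ["Hadoop on Azure", "Hadoop on Windows Server"]
    print(word_count(sample))
    # {'azure': 1, 'hadoop': 2, 'on': 2, 'server': 1, 'windows': 1}
```

On a real cluster, the mapper and reducer would be packaged in a JAR and the framework would handle the shuffle, spreading the work across nodes; the local version just makes the data flow easy to follow.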
There's more, too, like an ODBC driver for Hive that effectively connects Excel and most of the Microsoft Business Intelligence stack to Hadoop. But that's fodder for a separate post... or seven.
Microsoft's Hadoop offering should become generally available before too long. In the meantime, if you'd like to apply for an invitation to the beta, create an account on Microsoft Connect and then fill out the special survey.