Microsoft’s Big Data Plans: Acknowledge, Embrace, Integrate

Microsoft had its Worldwide Partner Conference in Toronto this week, and over 16,000 people were there. So was Big Data.
Written by Andrew Brust, Contributor

Microsoft held its annual Worldwide Partner Conference (WPC) in Toronto this week. Although the event is held in North America, it is the only such event all year, anywhere, and attendance is truly international. Microsoft said that this year, over 16,000 people from 156 countries attended WPC. It is, by any measure, a big show, and this year Microsoft had a lot to say about Big Data. Whether in keynotes, breakout sessions or invite-only roundtables, the message was there.

First, the facts: Microsoft has been working with Hortonworks to build a distribution of Hadoop for Windows Azure, its cloud platform, and for Windows Server. Right now the distribution is available only as a cloud service, in a by-invitation beta that just entered its third release. It includes Hadoop itself, Hive, Pig, HBase, Sqoop, Mahout and Carnegie Mellon’s “Pegasus” graph mining system.

The Hadoop bone’s connected to the SQL bone
What’s interesting about Microsoft’s Big Data approach is that the company sees Hadoop as a part of its overall data platform. Maybe that’s why Microsoft Chief Operating Officer Kevin Turner called out the company’s Big Data strategy during his keynote address, saying “we’re going big in Big Data.” Turner also mentioned that Microsoft’s SQL Server is now the leader in the relational database market, in terms of units sold, and continues to grow. And despite the many technological differences between relational databases and a distributed computation system like Hadoop, Microsoft sees the open source Big Data technology fitting right into its enterprise data strategy.

There are components in Microsoft’s Hadoop distribution that help reconcile it with Enterprise technology.  For example, the distribution includes a very powerful browser-based console, providing a GUI for running MapReduce jobs; a JavaScript-based command console that also accommodates Pig and HDFS commands; and an interactive Hive console as well.  Microsoft’s distribution also allows MapReduce code itself to be written in JavaScript (rather than Java) and provides an ODBC driver for Hive, facilitating connectivity to Hadoop from Excel and most of the Microsoft Business Intelligence stack.
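For readers who have not seen a MapReduce job, the programming model behind those consoles can be sketched in a few lines. The example below is a minimal in-memory word count written in Python, not Microsoft's JavaScript API (which is not detailed here). The function names `map_words`, `reduce_counts` and `run_job` are hypothetical, chosen only for illustration, and the grouping step merely stands in for the shuffle that a real Hadoop cluster performs across machines.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum all the counts collected for a single word."""
    return word, sum(counts)


def run_job(lines):
    """In-memory stand-in for a cluster: map, group by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for word, count in map_words(line):
            grouped[word].append(count)  # the "shuffle": group values by key
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


# Hypothetical sample input, used only to exercise the sketch.
sample = ["big data big plans", "data platform"]
result = run_job(sample)
print(result)
```

On a real cluster the map and reduce functions look much the same; what changes is that the framework distributes the input, runs the map step in parallel and moves the grouped pairs between machines.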

Stay on message
Where the tech goes, so goes the partner messaging. In a session on the opportunities brought to Independent Software Vendors (ISVs) by SQL Server, Microsoft’s Director, SQL Server Partner Marketing, Bob Baker, specifically mentioned Microsoft’s Hadoop efforts and those very tie-ins to the BI stack. And it’s not just about the data platform, either. In a roundtable discussion I attended with a key member of Microsoft’s Big Data team, it became quite clear that Microsoft sees the technology fitting into its entire data center and cloud product strategy.

Bing Data
Why would Microsoft be so bullish on technology that is open source, Java-based and largely Linux-facing in pedigree? Most likely it’s because Microsoft runs Bing. By some counts, Bing and Yahoo Search (which is Bing-powered) together have about 30% search market share and Turner announced in his keynote that Bing is now leading Google in search relevance.

While I’m not exactly sure who’s measured that or how, the fact is that Bing is a Big Data hotbed. In fact, according to one Microsoft Big Data team member I spoke with, Bing’s data corpus is now 250 petabytes (PB), and is growing at 8 PB per month. With that amount of data, it’s no wonder that Bing has used Hadoop to great advantage. And given that Microsoft’s President for the Server and Tools Business (STB), Satya Nadella, was previously vice president of R&D for Microsoft’s Online Services Division (which includes Bing), and that SQL Server falls under the STB organization, the Hadoop-SQL Server friendship isn’t so strange after all.

The Big Data Tidal Wave
Microsoft is not a Big Data company per se.  It’s not venture-funded, it’s not a startup and its main business model is certainly not built around open source.  Microsoft is a software company, and Microsoft’s take is that Big Data and Hadoop are increasingly part of the Enterprise software landscape.  As it did back in the 1990s with TCP/IP and the Internet itself, Microsoft is embracing Hadoop, integrating it and making it accessible to business users.  That matter-of-fact approach to Hadoop and Big Data is likely to become the norm throughout the Enterprise world.  For over 16,000 attendees of the Microsoft Worldwide Partner Conference, that approach is the norm in their world starting now.
