I've mentioned before that I've done a lot of work with Microsoft. Recently, I was visiting the company's corporate campus in Redmond, Washington, for the Global Summit of its Most Valuable Professionals program, in which I participate. While I was on campus, which happened to be the week before O'Reilly's big data-focused Strata conference (Microsoft is a big sponsor of Strata), I took the opportunity to sit down with Kamal Hathi, Microsoft's Director of Program Management for BI in its Data Platform Group.
It's not just about Strata, either. Hathi is gearing up to deliver the keynote address at the PASS Business Analytics Conference in Chicago next month, so his mind is well immersed in the strategic big data questions that matter to the software giant that employs him.
Redmond's big data worldview
My goal was to find out how Redmond views the worlds of big data, analytics, and business intelligence, and what motivates those views. What I found is that Microsoft sees big data mostly through two lenses: that of its business intelligence sensibility, formed over more than a decade in that market; and that of its other lines of business, including online services, gaming, and cloud platforms.
This dual perspective makes Microsoft both an old-school mega-vendor contender in the BI market and a modern-day customer of analytics technologies. And because Microsoft has had to use its own BI tools for big data analyses, it has been forced to make the two work together, to confront the mismatch between them, and to work out how best to productize a solution to that mismatch.
I mentioned last week's Strata conference, and it really is germane to my conversation with Hathi, because Microsoft made three key announcements there, all of which tie into the ideas Hathi and I discussed. Those announcements are as follows:
Version 2 of its SQL Server Parallel Data Warehouse product is complete, with Dell and HP standing by to take orders now for delivery of appliances this month. PDW v2 includes PolyBase, which integrates PDW's Massively Parallel Processing (MPP) SQL query engine with data stored in Hadoop.
Microsoft released a preview of its "Data Explorer" add-in for Excel. Data Explorer can import data from a variety of sources, including Facebook and the Hadoop Distributed File System (HDFS), and it imports data from the web much more adeptly than Excel can on its own. It can import from conventional relational data sources as well. All data imported by Data Explorer can be added to PowerPivot data models and then analyzed and visualized in Power View.
Hortonworks, Microsoft's partner in all things Hadoop, has released a beta of its own Windows distribution, the Hortonworks Data Platform (HDP) for Windows. This more "vanilla" HDP for Windows will coexist with Microsoft's HDInsight distribution of Hadoop, which is itself based on the HDP for Windows code base.
As I said, these announcements tie into the ideas Hathi discussed with me, but I haven't told you what they are yet. Hathi explained to me that Microsoft's strategy for "Insights" (the term it typically applies to BI and analytics) is woven around a few key pillars: "democratization", cloud, and in-memory. I'll try now to relay Hathi's elaboration of each pillar.
"Democratization" is a concept Microsoft has always seen as key to its own value proposition. It rests on the idea that new areas of technology, in their early stages, are typically served by small, pure-play specialist companies whose products are sometimes quite expensive. In addition, the skills required to take advantage of these technologies are usually in short supply, driving costs up even further. Democratization disrupts this exclusivity with products that are often less expensive, integrate more easily into the corporate datacenter and, importantly, are accessible to mainstream information workers and developers using the skills they already have.
In the case of Hadoop, which is based on Apache Software Foundation open-source projects, democratization is less about cost savings and much more about datacenter integration and skill set accessibility. The on-premises version of Microsoft's HDInsight distribution of Hadoop will integrate with Active Directory, System Center, and other back-end products; the Azure cloud-based version integrates with Azure storage and with the Azure SQL Database offering.
In terms of skill set accessibility, Microsoft's integration of Excel/PowerPivot with Hadoop through Hive and ODBC means that any Excel user who even aspires to power user status will be able to analyze big data on her own, using a spreadsheet tool that has been familiar for decades.
The other thing to keep in mind is that HDInsight runs on Windows Server rather than Linux. Given that a majority of Intel-based servers run Windows and that a majority of corporate IT personnel are trained on it, providing a Hadoop distribution that runs there, in and of itself, enlarges the Hadoop tent.