Microsoft's big data strategy: Democratize, in-memory, and the cloud

Summary: Microsoft wants to enable everyone to work with big data, using Excel, SQL Server, and current skill sets. With a solid stack, albeit one vulnerable in predictive analytics and mobility, can Microsoft become the McDonald's of big data "insights"?

I've mentioned before that I've done a lot of work with Microsoft. Recently, I was visiting the company's corporate campus in Redmond, Washington, for the Global Summit of its Most Valuable Professionals program, in which I participate. As I was on campus, and it was the week before O'Reilly's big data-focused Strata conference, of which Microsoft is a big sponsor, I took the opportunity to sit down with Microsoft's Director of Program Management for BI in its Data Platform Group, Kamal Hathi.

It's not just about Strata, either. Hathi is gearing up to deliver the keynote address at the PASS Business Analytics Conference in Chicago next month, so his mind is pretty well immersed in the strategic questions around big data that matter to the software giant that employs him.

Redmond's big data worldview

My goal was to find out how Redmond views the worlds of big data, analytics, and business intelligence, and what motivates those views, too. What I found is that Microsoft sees big data mostly through two lenses: that of its business intelligence sensibility, formed over more than a decade in that market; and that of its other lines of business, including online services, gaming, and cloud platforms.

This pairing makes Microsoft's analytics profile a blend of old-school mega-vendor BI market contender and modern-day customer of analytics technologies. And because Microsoft has had to use its own BI tools in the service of big data analyses, it's been forced to find a way to make them work together, to ponder the mismatch between the two, and to work out how best to productize a solution to that mismatch.

Strata reveals

I mentioned last week's Strata conference, and that really is germane to my conversation with Hathi, because Microsoft made three key announcements, all of which tie into the ideas Hathi and I discussed. Those announcements are as follows:

  • Version 2 of its SQL Server Parallel Data Warehouse product is complete, with Dell and HP standing by to take orders now for delivery of appliances this month. PDW v2 includes PolyBase, which integrates PDW's Massively Parallel Processing (MPP) SQL query engine with data stored in Hadoop.

  • Microsoft released a preview of its "Data Explorer" add-in for Excel. Data Explorer can be used to import data from a variety of sources, including Facebook and Hadoop's Distributed File System, and can import data from the web much more adeptly than can Excel on its own. Data Explorer can import from conventional relational data sources as well. All data imported by Data Explorer can be added to PowerPivot data models and then analyzed and visualized in Power View.

  • Hortonworks, Microsoft's partner in all things Hadoop, has released a beta of its own distribution of the Hortonworks Data Platform (HDP) for Windows. This more "vanilla" HDP for Windows will coexist with Microsoft's HDInsight distribution of Hadoop, which is itself based on the HDP for Windows code base.

As I said, these announcements tie into the ideas Hathi discussed with me, but I haven't told you what they are yet. Hathi explained to me that Microsoft's strategy for "Insights" (the term it typically applies to BI and analytics) is woven around a few key pillars: "democratization", cloud, and in-memory. I'll try now to relay Hathi's elaboration of each pillar.

Democratization

"Democratization" is a concept Microsoft has always seen as key to its own value proposition. It's based on the idea that new areas of technology, in their early stages, typically are catered to by smaller pure play, specialist companies, whose products are sometimes quite expensive. In addition, the skills required to take advantage of these technologies are usually in short supply, driving costs up even further. Democratization disrupts this exclusivity with products that are often less expensive, integrate more easily in the corporate datacenter and, importantly, are accessible to mainstream information workers and developers using the skills they already have.

In the case of Hadoop, which is based on Apache Software Foundation open-source software projects, democratization is less about the cost savings aspect and much more about datacenter integration and skill set accessibility. The on-premises version of Microsoft's HDInsight distribution of Hadoop will integrate with Active Directory, System Center, and other back-end products; the Azure cloud-based version integrates with Azure cloud storage and with the Azure SQL Database offering as well.

In terms of skill set accessibility, Microsoft's integration of Excel/PowerPivot and Hadoop through Hive and ODBC means any Excel user who even aspires to power user status will be able to analyze big data on her own, using the familiar spreadsheet tool that has been established for decades.

The other thing to keep in mind is that HDInsight runs on Windows Server, rather than Linux. Given that a majority of Intel-based servers run Windows and that a majority of corporate IT personnel are trained on it, providing a Hadoop distribution that runs there, in and of itself, enlarges the Hadoop tent.

Talkback

9 comments
  • Microsoft fails at big solutions

    They need constant care, feeding, lots of servers, lots of expense, and lots of staff to manage them.

    Microsoft has no clue about truly "Big Data".
    itguy10
    • More make believe, I see. Time to change your diapers, itty

      Your fear of MS made you poop your pants.

      Again.
      William Farrel
      • Well, ...

        to post the kind of crap you did, you cannot be far behind.

        Perhaps it is time to grow up?
        D.T.Long
        • Guess who is talking about growing up...

          I think you are even worse than Cloggedbottom at times....so you advising others about growing up is a bit over ambitious.
          Owlll1net
    • Microsoft has a clue about profit...

      What irks me is corporate hijacking of words, right down to "democratize" -- try "corporatize", as in corporatism, instead...
      HypnoToad72
    • Where's your next gig?

      I like your comedy act. And just like all good comedians, we really don't take them seriously.
      happyharry_z
    • Thanks

      Thanks for sharing a nice view on Big Data. Microsoft is investing a lot in this, and their solution can be found at http://www.techbubbles.com/microsoft/microsoft-big-data-solution/
      kalyanms1
  • Democratize

    I call bulls**t. I thought we put this sad word to bed.
    happyharry_z
  • MS achilles heel

    Automation.

    They suck at it, big time. It is nice I can fire up a Hadoop Cluster at Azure in 10 minutes (over 2x what it takes at AWS) and have it at my disposal. But getting real use out of Big Data means automating the crap out of everything. Data is constantly being procured (pushed, pulled), processed (cleaned and joined), and presented. There is a lot of movement and cadence here. This demands automated systems that do the heavy lifting (movement, joins, etc.), getting data "ready" for realistic usage (whether automated or manual).

    Also, your talk of having BI folks operate on large scale data misses one huge problem. Moving it. Unless your BI tools can operate remotely, there is no way that is going to work. Even then, do you want BI folks launching 1TB by 1TB joins on your cluster? My guess is no.

    The future is exactly at the heart of what you identify as MS's other weak spot: Machine Learning / Data Science / Predictive Analytics. Or as I call it, the death of BI. The best way to do BI is not at all; push meaningful data to people and processes, letting them do the work for you. This really links back to Automation as well: ML is really just automating BI.

    Where is MS's answer here? IMHO, Amazon is light years ahead with EMR, S3, RedShift, etc. Google is in great shape with BigQuery.
    mobile_manny