Microsoft's big data strategy: Democratize, in-memory, and the cloud

Summary: Microsoft wants to enable everyone to work with big data, using Excel, SQL Server, and current skill sets. With a solid stack, albeit one vulnerable in predictive analytics and mobility, can Microsoft become the McDonald's of big data "insights?"

TOPICS: Big Data

Big data in the cloud; big data and the cloud

The cloud isn't just about the Azure version of Hadoop. Rather, it's about ease of provisioning (itself another democratizing benefit) and access to public datasets. Cloud-based Hadoop clusters can be built by filling out a web-based form, clicking a submit button, and waiting about 10 minutes. And once that cluster is up, it has access to data sources that are themselves cloud-based. Yes, on-premises clusters can get to that cloud-based data too. But the effort and fixed infrastructure required to do that work on-premises are more significant.

Overall, Microsoft wants its big data technologies to scale from the desktop to the cloud. Hathi likened the goal to having a "flying car," permitting you to go from ground-based to airborne (or desktop to private/public cloud) without having to switch to a plane. Writing that up makes it sound fairly corny, but Hathi was in earnest. It's just too inefficient to make trained information workers and database specialists change to a plane (Hadoop and MapReduce?) just because they want to work with really large datasets.

In-memory

In-memory is another very important area for Microsoft in the analytics world. Starting with the release of SQL Server 2008 R2 and its companion PowerPivot add-ins for Excel and SharePoint, Microsoft has seized on the value of in-memory technology. With the release of SQL Server 2012, the columnar engine in PowerPivot was brought to the company's Analysis Services product and even to its relational database in the form of special columnstore indexes.

The next version of SQL Server (currently referred to as SQL Server "14") will enhance columnstore indexes and will introduce a second in-memory database engine, code-named "Hekaton," designed for transactional workloads. Hekaton not only keeps data in memory, but also turns the database stored procedures that query and manipulate that data into fast, compiled, native code.

Hekaton will bring big performance gains for certain workloads. And if you think those transactional workloads don't impact analytics, think again. As it turns out, Hekaton should be very beneficial for certain data extract, transform and load (ETL) applications as well.

Predicting predictive analytics

I got a lot of good information and insight (in the plain-English sense) chatting with Kamal Hathi, but a few things still concerned me. The biggest thing on my mind was the topic of predictive analytics, sometimes called machine learning, or data mining. Microsoft added a data mining engine to its Analysis Services product all the way back in 2000. That engine was enhanced rather dramatically with the release of SQL Server 2005, and brought the kind of democratization with it that Microsoft seeks to bring now for big data overall.

But since 2005, not much has been done with SQL Server Data Mining. It's still a very useful product, and it still ships with SQL Server, indicating that it's still important to Microsoft, even if the company hasn't significantly invested in it for eight years. But at this point in the market, many of Microsoft's competitors are in the predictive analytics game and Microsoft has fallen way behind. So what does it plan to do about it?

Hathi was a bit cagey with me in his answer. In other words, he didn't tell me much of substance. But he assured me that predictive analytics is "super important" to Microsoft and that we will see progress on this front from the company. He indicated something similar for the territory of streaming data/complex event processing (CEP). Right now, the only offering Microsoft has in that arena is StreamInsight, a rather raw CEP engine, geared mostly to developers, that ships with SQL Server.

Prognosis

As I've written previously, Microsoft had excellent showings in Gartner's latest magic quadrants on data warehousing and business intelligence. In the last few years, the company has revamped almost its entire analytics stack, and has embraced big data and Hadoop, despite their being so Linux- and open-source-oriented. If Microsoft could just get its mobile BI (i.e., data visualization for major smartphone and tablet platforms) story going, it would find itself in a fantastic position as the big data and enterprise BI worlds converge.

Disclosure: I'll be speaking at the PASS Business Analytics Conference event in Chicago next month myself.

Andrew Brust

About Andrew Brust

Andrew J. Brust has worked in the software industry for 25 years as a developer, consultant, entrepreneur and CTO, specializing in application development, databases and business intelligence technology.

Talkback

9 comments
  • Microsoft fails at big solutions

    They need constant care, feeding, lots of servers, lots of expense, and lots of staff to manage them.

    Microsoft has no clue about truly "Big Data".
    itguy10
    • More make believe, I see. Time to change your diapers, itty

      Your fear of MS made you poop your pants.

      Again.
      William Farrel
      • Well, ...

        to post the kind of crap you did, you cannot be far behind.

        Perhaps it is time to grow up?
        D.T.Long
        • Guess who is talking about growing up...

          I think you are even worse than Cloggedbottom at times....so you advising others about growing up is a bit over ambitious.
          Owlll1net
    • Microsoft has a clue about profit...

      What irks me is corporate hijacking of words, right down to "democratize" -- try "corporatize", as in corporatism, instead...
      HypnoToad72
    • Where's your next gig?

      I like your comedy act. And just like all good comedians, we really don't take them seriously.
      happyharry_z
    • Thanks

      Thanks for sharing a nice view on big data. Microsoft is investing a lot in this, and their solution can be found at http://www.techbubbles.com/microsoft/microsoft-big-data-solution/
      kalyanms1
  • Democratize

    I call bulls**t. I thought we put this sad word to bed.
    happyharry_z
  • MS achilles heel

    Automation.

    They suck at it, big time. It is nice that I can fire up a Hadoop cluster at Azure in 10 minutes (over 2x what it takes at AWS) and have it at my disposal. But getting real use out of Big Data means automating the crap out of everything. Data is constantly being procured (pushed, pulled), processed (cleaned and joined), and presented. There is a lot of movement and cadence here. This demands automated systems that do the heavy lifting (movement, joins, etc.), getting data "ready" for realistic usage (whether automated or manual).

    Also, your talk of having BI folks operate on large-scale data misses one huge problem: moving it. Unless your BI tools can operate remotely, there is no way that is going to work. Even then, do you want BI folks launching 1TB by 1TB joins on your cluster? My guess is no.

    The future is exactly at the heart of what you identify as MS's other weak spot: Machine Learning / Data Science / Predictive Analytics. Or as I call it, the death of BI. The best way to do BI is not at all; push meaningful data to people and processes, letting them do the work for you. This really links back to automation as well: ML is really just automating BI.

    Where is MS's answer here? IMHO, Amazon is light years ahead with EMR, S3, RedShift, etc. Google is in great shape with BigQuery.
    mobile_manny