Big data in the cloud; big data and the cloud
The cloud isn't just about the Azure version of Hadoop. Rather, it's about ease of provisioning (itself another democratizing benefit) and access to public datasets. Cloud-based Hadoop clusters can be built by filling out a web-based form, clicking a submit button, and waiting for about 10 minutes. And once that cluster is up, it has access to data sources that are themselves cloud-based. Yes, on-premises clusters can get to that cloud-based data too. But the effort and fixed infrastructure required to do that work on-premises are more significant.
Overall, Microsoft wants its big data technologies to scale from the desktop to the cloud. Hathi likened the goal to having a "flying car," permitting you to go from ground-based to airborne (or desktop to private/public cloud) without having to switch to a plane. Writing that up makes it sound fairly corny, but Hathi was in earnest. It's just too inefficient to make trained information workers and database specialists change to a plane (Hadoop and MapReduce?) just because they want to work with really large datasets.
In-memory is another very important area for Microsoft in the analytics world. Starting with the release of SQL Server 2008 R2 and its companion PowerPivot add-ins for Excel and SharePoint, Microsoft has seized on the value of in-memory technology. With the release of SQL Server 2012, the columnar engine in PowerPivot was brought to the company's Analysis Services product and even to its relational database in the form of special columnstore indexes.
The next version of SQL Server (currently referred to as SQL Server "14") will enhance columnstore indexes and will introduce a second in-memory database engine, code-named "Hekaton," designed for transactional workloads. Hekaton not only keeps data in memory, but also turns the database stored procedures that query and manipulate that data into fast, compiled, native code.
Hekaton will bring big performance gains for certain workloads. And if you think those transactional workloads don't impact analytics, think again. As it turns out, Hekaton should be very beneficial for certain data extract, transform and load (ETL) applications as well.
Predicting predictive analytics
I got a lot of good information and insight (in the plain-English sense) chatting with Kamal Hathi, but a few things still concerned me. The biggest thing on my mind was the topic of predictive analytics, sometimes called machine learning or data mining. Microsoft added a data mining engine to its Analysis Services product all the way back in 2000. That engine was enhanced rather dramatically with the release of SQL Server 2005, and brought with it the kind of democratization that Microsoft now seeks to bring to big data overall.
But since 2005, not much has been done with SQL Server Data Mining. It's still a very useful product, and it still ships with SQL Server, indicating that it's still important to Microsoft, even if the company hasn't significantly invested in it for eight years. But at this point in the market, many of Microsoft's competitors are in the predictive analytics game and Microsoft has fallen way behind. So what does it plan to do about it?
Hathi was a bit cagey with me in his answer. In other words, he didn't tell me much of substance. But he assured me that predictive analytics is "super important" to Microsoft and that we will see progress on this front from the company. He indicated something similar for the territory of streaming data/complex event processing (CEP). Right now, the only offering Microsoft has in that arena is StreamInsight, a rather raw CEP engine, geared mostly to developers, that ships with SQL Server.
As I've written previously, Microsoft had excellent showings in Gartner's latest Magic Quadrants on data warehousing and business intelligence. In the last few years, the company has revamped almost its entire analytics stack, and has embraced big data and Hadoop, despite their being so Linux- and open-source-oriented. If Microsoft could just get its mobile BI (i.e., data visualization for major smartphone and tablet platforms) story going, it would find itself in a fantastic position as the big data and enterprise BI worlds converge.
Disclosure: I'll be speaking at the PASS Business Analytics Conference in Chicago next month myself.