Last week, Microsoft held its annual Connect() event in New York City, at an event space right at the mouth of the Holland Tunnel. Connect() tends to be focused on Visual Studio and the application development stack. But just as the Holland Tunnel joins a hip part of Manhattan to Jersey City, NJ, Connect() tied together the dev stack announcements with a bunch of announcements around the Microsoft Data Platform.
Microsoft had two huge announcements around SQL Server, arguably the Data Platform's component tied most closely to the developer world. But it also had announcements in the worlds of Big Data and analytics, specifically around Azure Data Lake, R Server, HDInsight and Apache Kafka.
Let's get relational
Let's start with the stuff pertaining to SQL Server, my first love among Microsoft's data products. And if I loved it before, there's an even greater attraction now. The reasons: (1) the first version of the product to run on Linux is now in public preview as part of SQL Server "vNext" Community Technology Preview 1 (CTP 1) and (2) almost all the cool features of the product that were exclusive to the Enterprise Edition are now, with the release of SQL Server 2016 Service Pack 1 (SP1), available in every edition, including Standard, Web, Express and, with some footnotes, even LocalDB, the application-embedded version of the product.
As someone who was involved early on in the private preview of SQL Server on Linux, I'm really glad that everyone can check it out now. While the product contains only the core relational engine, and not broader components like Reporting Services, Analysis Services or Integration Services, it is nonetheless a full-fledged implementation of the product, and can run either "on the metal" or in a Docker container. It's compatible with SUSE, Ubuntu and Red Hat Enterprise Linux (RHEL) distributions.
Close the Windows
The most remarkable thing about SQL Server on Linux is how, once it's installed and working, it's not remarkable at all. In other words, while there are some Linux command line tools for the product which are clearly unique, interacting with the server from an application, a BI tool or even a Windows-based tool like SQL Server Management Studio is practically indistinguishable from working with the Windows version. It's almost a letdown.
But the big difference is that developers who are targeting Linux servers can work with SQL Server now. And even developers using Macs can run it locally, without a LAN or Internet connection, by running it in a Docker container. This makes SQL Server more competitive with Oracle, to be sure. But it also makes it more competitive with open source relational databases like MySQL and PostgreSQL.
You get columnstore, and you get in-memory, and you get PolyBase!
Of course, SQL Server, even on Linux, isn't open source. But free versions do exist. Specifically, SQL Server Express and LocalDB are both free products. While they impose memory restrictions and other constraints, they work well where smaller databases are needed. The problem with these editions, and even their older, paid siblings like Web Edition and Standard Edition, is that Microsoft has kept most of its latest breakthrough SQL Server technologies out of them.
Cool features like columnstore indexes (which turn SQL Server into a column store database, enabling data warehouse, data mart and hybrid transactional-analytics implementations), memory-optimized tables (an in-memory transactional database technology) and PolyBase (which allows data in Hadoop and Azure Blob Storage to be queried and joined as if it were located in SQL Server tables) have been off limits to non-Enterprise customers. This has inhibited their adoption among developers, Independent Software Vendors (ISVs -- who need to build their applications to work on Standard Edition for those customers who have it) and, by extension, the entire ecosystem.
That's a thing of the past now, though. With Microsoft's announcement last week of the general availability (GA) of SQL 2016 SP1, virtually all features are available in all editions. Are there a few exceptions? Yes, but they're logical, based on how those editions are deployed. And Microsoft is also being very plain and transparent about the exceptions, summarizing them in this blog post, as well as the one I linked to in the third paragraph of this post.
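For readers who haven't worked with a column store, here is a minimal conceptual sketch in Python of why that layout suits analytic queries. It is not how SQL Server implements columnstore indexes; the sample table and the names (rows, to_columns, total_row_store, total_column_store) are made up purely for illustration.

```python
# Conceptual illustration only: row-oriented vs column-oriented layout.
# The data and function names are hypothetical, not a SQL Server API.

# Row store: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "East", "amount": 120.0},
    {"order_id": 2, "region": "West", "amount": 80.0},
    {"order_id": 3, "region": "East", "amount": 200.0},
    {"order_id": 4, "region": "West", "amount": 50.0},
]


def to_columns(table):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in table:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_row_store(table):
    """Sum 'amount' by visiting every full row, touching all fields."""
    return sum(row["amount"] for row in table)


def total_column_store(columns):
    """Sum 'amount' by reading only the one column the query needs."""
    return sum(columns["amount"])


columns = to_columns(rows)
print(total_row_store(rows), total_column_store(columns))
```

Both functions return the same total; the difference is that the column-oriented version reads only the values the aggregate needs, which is the general idea that makes column stores attractive for data warehouse workloads.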
Microsoft hearts developers
So we can start to see a path where developers on their Macs and Linux servers, used to working with free, open source tools, can also code against SQL Server and its most advanced features, without it costing them money, and without needing a machine (even a virtual one) running Windows. We're not there yet, but when SQL Server vNext on Linux and the feature/licensing policies in SQL 2016 SP1 converge, we'll be damned close.
As a longtime Microsoft ecosystem professional, that gives me optimism. In an age when working with data has come to mean working with Linux and open source, this is a good turn of events.
"R with you is really a Kafkaesque experience..."
A similarly positive development occurred when Microsoft decided to release a Linux version of its cloud-based Hadoop distribution, HDInsight. It meant that companies in the Hadoop ecosystem, virtually all of whom (including my employer, Datameer) are Linux-focused, could partner and integrate with HDInsight.
This has helped HDInsight become a full-fledged Hadoop distribution, offering specialized cluster types not just for generic Hadoop work, but also for working with Apache HBase, Storm and Spark. And, just announced last week, a new cluster type for working with Apache Kafka, which is oriented toward streaming data, is in public preview.
And when Microsoft acquired Revolution Analytics, which had become the principal commercial entity behind the open source R programming language for statistics and machine learning, that was good too. That company's Revolution R Enterprise (RRE) product, now sold as Microsoft R Server, was noteworthy in its ability to run on a server, or scale across entire clusters, running in a distributed, in-database mode, rather than running standalone on a local PC or Mac.
SQL Server 2016 includes integration of the R Server technology in the form of SQL Server R Services. And while that's not yet part of SQL Server on Linux, another integration is Linux-based: R Server for HDInsight, which is integrated with Apache Spark, running on HDInsight. That product had been in preview for some time, and last week went into GA.
The GA version is tuned to work with Spark 2.0, can access data stored in Apache Hive or in Parquet format in HDFS directly, and can also access data in Microsoft's HDFS-compatible storage service called Azure Data Lake Store (ADLS).
Data Lake, and more
The latter, though based on Azure Blob Storage, provides even more robust fault tolerance and has no limit on file sizes. It too has gone GA, as has its companion query service, Azure Data Lake Analytics (ADLA). The combination of ADLS and ADLA lets you do Big Data work using U-SQL, a SQL-like query language that is extensible using Microsoft .NET and C#. Plus, ADLA jobs are run on-demand, rather than requiring a dedicated cluster, which brings a Platform as a Service sensibility to Hadoop, upon which ADLA runs.
That's about all there is, but it's quite a lot. From SQL Server relational technology to R, Hadoop, Spark, Kafka and various integrations between them, on both Windows and Linux, Microsoft, if nothing else, has data and analytics passion, big time. When you add things like Cognitive Services and Power BI on top of all this, Microsoft has a sprawling, formidable wall of data technology that's integrated, open and cross-platform.