Last week Microsoft released R Server (MRS) 9.0. Ironically, I was so busy working with R Server for a conference workshop I presented at the end of the week that I had to wait until this week to cover it. The upside: now I can provide some context, in addition to the raw facts.
Those facts are still important though, so let's first run them down, then we can discuss their significance.
Microsoft adds its own stuff
First off, Microsoft announced a new release of MRS, its server-based (and cluster-based) version of the R programming language that came into its possession when it acquired Revolution Analytics last year. It's the Revolution R Enterprise (RRE) product that became MRS.
Also read: Microsoft acquires Revolution Analytics
The new release goes beyond a mere re-branding of the RRE product, though. That's because MRS 9 features a package called "Microsoft ML" (machine learning), which the company describes as a collection of "best-of-breed ML algorithms that have been battle-tested by Microsoft on a variety of [its] products."
Microsoft ML includes improved logistic regression and new "fast linear learner," "fast boosted decision tree" and "fast random forest" algorithms. It also includes GPU (graphics processing unit)-accelerated Deep Neural Networks (DNNs) and a One-Class Support Vector Machine.
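To give a feel for how these fit into existing R Server workflows, here's a hedged sketch of training one of the new "fast" algorithms. The function names (rxFastTrees, rxPredict) come from Microsoft's MicrosoftML and RevoScaleR documentation, but the formula, columns and data sets here are invented for illustration; this requires an MRS 9 installation and won't run on plain open source R.

```r
# Sketch only: requires Microsoft R Server 9.0 with the MicrosoftML package.
# The data sets (trainData, testData) and column names are hypothetical.
library(MicrosoftML)

# "Fast boosted decision tree" -- a binary classification example
model <- rxFastTrees(defaulted ~ income + age + balance, data = trainData)

# Score new rows; rxPredict returns predicted labels and probabilities
scores <- rxPredict(model, data = testData)
```

The point is that the new algorithms plug into the same formula-and-data-frame idiom R users already know, so swapping in a "battle-tested" learner is a one-line change.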
While I can't claim the expertise to appreciate each of these improvements, I can say that GPU-based deep learning algorithms are a big thing in AI/machine learning right now and inclusion of that technology in R Server is exciting. Given everything going on at Microsoft with Cognitive Services, the Bot Framework, and Bing Predicts, inclusion of algorithms that have been "battle-tested" in Redmond would seem to bode well for the MRS product.
Microsoft ML is now available in the MRS implementations on Windows and in SQL Server. Microsoft says it will be available on Linux and in HDInsight (its cloud-based Hadoop and Spark service) "in the new year."
Beyond the inclusion of Microsoft ML, MRS 9 allows R models to be exposed as Web services, which is the same way Azure Machine Learning works. Microsoft points out that, with the help of Swagger, the open source API description framework, this Web service deployment means models built in R can be used from virtually any programming language and platform.
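As a rough sketch of what that deployment looks like: MRS 9's operationalization feature is driven from R via the mrsdeploy package (remoteLogin and publishService are its documented functions). The server URL, credentials, model and input/output schema below are all hypothetical placeholders.

```r
# Sketch only: uses the mrsdeploy package that ships with MRS 9.
# The endpoint URL, credentials, and the trained 'model' object are hypothetical.
library(mrsdeploy)

remoteLogin("http://localhost:12800",
            username = "admin", password = "<password>")

# Publish an R function wrapping a trained model as a versioned web service.
# The service also exposes a Swagger (OpenAPI) document, which is what lets
# clients in virtually any language generate code to call it.
api <- publishService(
  "scoreService",
  code = function(income, age) {
    predict(model, data.frame(income = income, age = age))
  },
  model   = model,
  inputs  = list(income = "numeric", age = "numeric"),
  outputs = list(score = "numeric"),
  v = "v1.0.0"
)
```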
Furthermore, with MRS 9, Microsoft says models that are trained in one environment can even be moved to, and scored in, other environments, and that new active-active high availability features in MRS 9 mean these models can be scaled for high-demand production use.
In addition, MRS 9 now integrates with Spark 2.0, and adds support for the Ubuntu distribution of Linux. The combination of these two enhancements means greater compatibility with more Hadoop distributions (which include Spark), beyond Microsoft's own. And with even greater Hadoop/Spark versatility in mind, MRS 9 also adds the ability to read data directly into R data frames from Hive tables and Parquet files sitting on all those Hadoop clusters.
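Here's a hedged sketch of that Hive/Parquet access, using the RxHiveData and RxParquetData data sources Microsoft documents for MRS 9 on Spark. The database, table and file paths are invented; this assumes MRS 9 running on a Spark cluster.

```r
# Sketch only: requires MRS 9 on a Spark 2.0 cluster; names and paths are hypothetical.
library(RevoScaleR)

cc <- rxSparkConnect()  # establish a Spark compute context

# New in MRS 9: data sources backed by Hive tables and Parquet files
hiveDS    <- RxHiveData(query = "SELECT * FROM sales_db.orders")
parquetDS <- RxParquetData(file = "/data/telemetry/events.parquet")

# Pull a data source into a local R data frame
orders <- rxDataStep(inData = hiveDS)

rxSparkDisconnect(cc)
```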
What's the impact?
So that's the news. Now let's look at why it's significant, especially in a server version of R and especially one that's integrated with a mainstream, relational database system like SQL Server.
Since R has traditionally been a tool used by individual statisticians and data scientists, from individual desktop computers, on an ad hoc basis, a server version of R, all by itself, is important. A server version of R allows code written in it to be run centrally, either on a beefy server, or on a cluster of servers, so that it can properly scale and serve multiple users. This was Revolution's differentiator when it was an independent company, and it's a differentiator for Microsoft now.
We R embedded
R Server can even run as part of a database system. In fact, before Microsoft acquired Revolution, the erstwhile RRE product was available in distributed versions that integrated with Teradata and Vertica MPP data warehouse systems.
The Teradata version is still available, but for customers on the Microsoft stack, Redmond has worked to integrate R with a number of its own products, including Azure Machine Learning (an integration that was in place pre-acquisition), Power BI and, more recently, SQL Server and HDInsight, Microsoft's cloud Hadoop and Spark service.
The SQL Server integration is especially important, as I saw first-hand while teaching my SQL Server Live! workshop last week. Since trained R models can be stored in SQL Server database tables, and R code can be embedded in its stored procedures, any developer acquainted with coding against SQL Server can easily add predictive analytics to her applications.
Why? Because running a prediction becomes merely a matter of calling a stored procedure, passing it values for the model's "features" (i.e. input variables) and getting back the predicted value for the "label" (output variable) in the stored procedure's output.
The mechanics of calling a stored procedure, passing input parameter values and getting back an output parameter value are super familiar to an entire generation of application developers. Integration of R Server into SQL Server means that predictive analytics capabilities are now available to all of them.
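To make that concrete, here's a hedged sketch of the pattern in T-SQL (with the R scoring logic embedded), built on SQL Server's documented sp_execute_external_script procedure. The table, column and model names are hypothetical; the assumption is that a trained model was previously serialized into a varbinary(max) column.

```sql
-- Sketch only: table, column, and model names are hypothetical.
-- Assumes SQL Server 2016+ with R Services and a model serialized into dbo.Models.
CREATE PROCEDURE dbo.PredictChurn @customerId INT
AS
BEGIN
    DECLARE @model VARBINARY(MAX) =
        (SELECT model_object FROM dbo.Models WHERE model_name = 'churn');

    DECLARE @inputQuery NVARCHAR(MAX) =
        N'SELECT income, age, balance FROM dbo.Customers WHERE customer_id = '
        + CAST(@customerId AS NVARCHAR(20));

    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
            model <- unserialize(as.raw(model_bin))
            OutputDataSet <- rxPredict(model, InputDataSet)',
        @input_data_1 = @inputQuery,
        @params = N'@model_bin VARBINARY(MAX)',
        @model_bin = @model;
END
```

From the application's point of view, this is just another stored procedure call: pass in the feature values (here, implicitly, via the customer ID), get back a result set with the prediction.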
The scale doesn't lie
I once wrote an editorial here called "Data Scientists Don't Scale," in which I argued that if predictive analytics technology was relegated to a high priesthood of data scientists, then it wouldn't benefit Enterprise organizations, because there aren't enough of those priests to go around.
Also read: Data Scientists Don't Scale
But a deployment of R-based predictive models into Enterprise developers' bread-and-butter tool chains? That's another matter. That does scale, both in the skillset sense that I meant and the more literal sense of productionalized use of the technology.
On that latter point, Microsoft's Corporate VP for Data, Joseph Sirosh, has already demonstrated as much: by deploying R models into SQL Server, the combination can accommodate 1 million predictions per second. And that was based on the previous release of R Server, without the new "fast" algorithms added with the inclusion of Microsoft ML.
ML for the masses
In other words, by including predictive analytics technology in an Online Transactional Processing (OLTP) database, Microsoft is making predictive analytics itself an OLTP workload. And while OLTP workloads may seem mundane, bringing cutting edge technology into them is anything but.
Now add to that the ubiquity of R's integration into other Microsoft tools and products, including SQL Server Reporting Services, Power BI and even Visual Studio -- Microsoft's core integrated development environment (IDE) -- and what you've got is a very earnest and credible attempt to make predictive analytics and machine learning truly pervasive.
Although the R language itself, and its underpinnings in statistics, may be a bit of a leap for the average C# or T-SQL developer, I can tell you, as a very rusty C#/T-SQL developer myself, that it's still quite feasible.
But it goes beyond that. Because once a developer on-boards core R skills and sees them become applicable in domain after domain, through so many data-related products and technologies in the Microsoft stack, something really just...pops.
And then the real impact becomes apparent: the data science stuff is so much more powerful when it's in the hands of mainstream developers and database professionals. That's when it gets ubiquitous. That's when it gets normalized. And that's when the flywheel really gets going.