Microsoft expands its commitment to Apache Spark big-data framework

Microsoft is supporting Apache Spark with its HDInsight, Cortana Intelligence Suite, Power BI and Microsoft R Server software and services.
Written by Mary Jo Foley, Senior Contributing Editor

Microsoft is upping its commitment to the open-source Apache Spark big-data processing engine.


At this week's Spark Summit in San Francisco, Microsoft officials will be talking up Microsoft's support for Spark with its HDInsight, Cortana Intelligence Suite, Power BI and Microsoft R Server deliverables.

Here's a quick list of what's new as of today, June 6:

Spark for Azure HDInsight is now generally available. Microsoft took the wraps off the public preview of Spark for HDInsight -- with HDInsight being Microsoft's cloud version of the Hadoop big-data framework -- a year ago.

R Server for HDInsight will be generally available later this summer. Previously in public preview, R Server for HDInsight will include Spark integration in both the HDInsight on-premises and cloud flavors.

A quick R refresher: R is a programming language that can be used in big-data statistics, predictive modeling and machine learning. Microsoft last April completed its acquisition of Revolution Analytics, the maker of a distribution of the R programming language for statistical computing and predictive analytics, for an undisclosed amount.

Microsoft previously announced that the company is integrating the commercial R distribution into SQL Server 2016 in the form of SQL Server R Services. Microsoft made SQL Server 2016 generally available last week, on June 1.

R Server for Hadoop on-premises will support both Microsoft R and native Spark execution frameworks and be available in June. "Combining R Server with Spark gives users the ability to run R functions over thousands of Spark nodes letting you train your models on data 1000x larger and 100x faster than was possible with open source R and nearly 2x faster than Spark's own MLLib," according to Microsoft's blog post.

Power BI support for Spark Streaming is now available. The previously announced Spark support in Power BI has been expanded to cover Spark Streaming scenarios.

Even though many, including at least some affiliated with Spark, consider Spark a head-to-head competitor to Hadoop, Microsoft is positioning the two as complementary in many cases (as seen from the announcements above). However, Microsoft Research also is working on Prajna/OneNet, a project that's about building a distributed functional-programming platform for those wanting to build cloud services that make use of big-data analytics in some ways that are similar to what Spark provides.
