Microsoft is upping its commitment to the open-source Apache Spark big-data processing engine.
At this week's Spark Summit in San Francisco, Microsoft officials will be talking up Microsoft's support for Spark with its HDInsight, Cortana Intelligence Suite, Power BI and Microsoft R Server deliverables.
Here's a quick list of what's new as of today, June 6:
R Server for HDInsight will be generally available later this summer. Previously in public preview, R Server for HDInsight will include Spark integration in both the HDInsight on-premises and cloud flavors.
A quick R refresher: R is a programming language that can be used in big-data statistics, predictive modeling and machine learning. Microsoft last April completed its acquisition of Revolution Analytics, the maker of a distribution of the R programming language for statistical computing and predictive analytics, for an undisclosed amount.
R Server for Hadoop on-premises will support both Microsoft R and native Spark execution frameworks and be available in June. "Combining R Server with Spark gives users the ability to run R functions over thousands of Spark nodes letting you train your models on data 1000x larger and 100x faster than was possible with open source R and nearly 2x faster than Spark's own MLLib," according to Microsoft's blog post.
Power BI support for Spark Streaming is now available. The previously announced Spark support in Power BI has been expanded to cover Spark Streaming scenarios.
Even though many, including at least some affiliated with Spark, consider Spark a head-to-head competitor to Hadoop, Microsoft is positioning the two as complementary in many cases (as seen from the announcements above). However, Microsoft Research is also working on Prajna/OneNet, a project to build a distributed functional-programming platform for developers who want to build cloud services that use big-data analytics in ways similar to what Spark provides.