I haven't yet written much about the "R" programming language or the vast number of libraries for it, but it's a very important language and platform. R is all over the Big Data world, and yet it's not exclusively a Big Data technology. At its core, R is an open source programming language for mathematical and statistical analysis. And so, on the one hand, R competes with SAS and IBM's SPSS, the two commercial software packages that have traditionally filled the space. But it also competes with various standalone data mining and predictive analytics products. That's a lot of surface area to cover.
For programmers and data scientists, too, R is more than just a no-license-cost replacement for SAS and SPSS. It really is a data science tool. R's popularity in both academic and commercial circles, combined with its extensibility, has resulted in a booming ecosystem of libraries that adapt the general-purpose R language to various specialty areas. Big Data is chief among these, resulting in the RHadoop project, along with its rmr library, which allows the R language to be used for Hadoop MapReduce code. The companion rhdfs and rhbase libraries allow that code to operate on raw HDFS files or on data in HBase, Hadoop's frequent NoSQL companion.
Value add
In the Hadoop world, companies like Cloudera make full, free open source Hadoop distributions available, and also offer paid commercial products, like Cloudera Manager, under Enterprise licenses. R is no different: Palo Alto-based Revolution Analytics offers both an open source distro and commercial enhancements that include proprietary extensions. Together, these are packaged as a product called Revolution R Enterprise, version 6 of which is being released today.
What's new?
Revolution R Enterprise, version 6, includes version 2.14.2 of the core open source R product. The latter, in turn, includes a byte code compiler for developers' source code, providing enhanced performance over the purely interpreted code architecture used in previous versions.
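To give a feel for what byte code compilation looks like from the developer's side, here is a minimal sketch using the `compiler` package that ships with open source R. The function names `sum_squares` and `sum_squares_compiled` are made up for illustration, and any speedup depends on the workload and the R version. More recent R releases compile functions automatically, so the gap may be small or absent there.

```r
library(compiler)  # bundled with open source R

# A deliberately loop-heavy function; the name is illustrative only.
sum_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + i * i
  }
  total
}

# cmpfun() returns a byte-compiled version of the same function.
sum_squares_compiled <- cmpfun(sum_squares)

# Both versions must agree; only the execution path differs.
stopifnot(identical(sum_squares(1000), sum_squares_compiled(1000)))

# Timings are machine-dependent, so no expected numbers are shown.
print(system.time(for (k in 1:200) sum_squares(10000)))
print(system.time(for (k in 1:200) sum_squares_compiled(10000)))
```

The compiled function is called exactly like the original, so existing code does not need to change to benefit.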
Revolution R Enterprise has thus far offered RevoScaleR, which supports multi-core and multi-node scale-out, in sharp contrast to the single-threaded, in-memory architecture of open source R. The Enterprise product also includes RevoDeployR, which offers a RESTful service interface to R libraries, opening up the capabilities of those libraries to Java, .NET and other development platforms.
The release of Revolution R Enterprise, version 6, adds several new enterprise features. Among them:
- Support for Linux-based computing grids, using IBM's Platform LSF
- Support for Microsoft Azure Burst Mode, the on-premises/cloud hybrid implementation of Windows High-Performance Computing (HPC). This joins the support for on-premises Windows HPC that existed in version 5.
- Support for generalized linear models on RevoScaleR
- Direct access to ODBC data sources as well as flat files, SAS and SPSS files, all without staging into intermediate XDF files.
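For readers who haven't met generalized linear models, mentioned in the list above, here is a small sketch of one in plain open source R. It uses the built-in `glm()` function and the `mtcars` sample dataset, not RevoScaleR, whose own interface isn't shown here. The choice of variables and the 3,000 lb example car are assumptions made purely for illustration.

```r
# Logistic regression is one common generalized linear model.
# mtcars ships with R: am is 1 for a manual transmission,
# and wt is the car's weight in thousands of pounds.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Fitted intercept and slope on the log-odds scale.
print(coef(fit))

# Predicted probability of a manual transmission for a hypothetical 3,000 lb car.
p <- predict(fit, newdata = data.frame(wt = 3.0), type = "response")
print(p)
```

Open source R fits this model in memory on a single thread, which is the limitation a scale-out implementation is meant to address.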
Back to the dRawing boaRd
R can be used for mathematics, machine learning, predictive analytics and more. With a big ecosystem and a brand new enterprise version, R also aims to be a first-class Big Data tool, and one with the ability to draw data from a variety of sources.
If it seems prudent to you to learn more about R, then check out the beginner tips section of the Revolutions blog. The blog is written by David Smith, the very person who briefed me on Revolution R Enterprise v6. David was very clear and patient with me, and I suppose as Revolution Analytics' VP of Marketing and Community, he should be. But a gentle demeanor coupled with techie chops is a pretty rare thing. So take a look at his posts and videos, and get up the learning curve quickly.