Revolution Analytics targets R language, platform at growing need to handle 'big data' crunching challenges

By Dana Gardner | August 4, 2010, 9:27am PDT

Summary: The latest version of Revolution R Enterprise comes complete with an add-on package called RevoScaleR, a framework for multi-core processing of large data sets. With RevoScaleR, Revolution Analytics targets some of the largest levels of capacity and performance for analyzing big data.

Revolution Analytics is working to revolutionize big data analysis with better crunching tools and an updated platform that brings the open source R statistics language to some of the largest data sets.

The company is betting its new big data scalability platform will help R transition from a research and prototyping tool to a production-ready platform for such enterprise applications as quantitative finance and risk management, social media, bioinformatics, and telecommunications data analysis.

The latest version of Revolution R Enterprise comes complete with an add-on package called RevoScaleR, a framework for multi-core processing of large data sets. With RevoScaleR, Revolution Analytics is targeting some of the highest levels of capacity and performance for analyzing big data, the company said.

“With RevoScaleR, we’ve focused on making analytical models not just scale to the big data sets, but run the analysis in a fraction of the time compared to traditional systems,” says David Smith, vice president of Community and Marketing at Revolution Analytics. “For example, the FAA publishes a data set that contains every commercial airline take off and landing between 1987 and 2008. That’s more than 13 gigabytes of data. By analyzing that data, we can figure out the likelihood of airline delays in one second.”

A rows-and-columns approach

One second to analyze 13 GB of data should turn some heads, because the same analysis takes 300 seconds with traditional methods. Under the hood, RevoScaleR depends on rapid-fire access to data. For example, RevoScaleR uses XDF, a new binary big data file format with an interface to the R language that offers high-speed access to arbitrary rows, blocks, and columns of data.

“The NoSQL movement was all about going from relational databases to a flat file on a disk that offers fast access by columns. A lot of the technology that’s behind things like Twitter and Facebook takes this approach,” Smith said. “We’ve taken that one step further to develop a system that accesses the database by rows and columns at the same time, which is really well-attuned to doing these statistical computations.”
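
One way to picture access "by rows and columns at the same time" is a layout that splits a table into blocks of rows and stores each column contiguously inside every block. The Python sketch below is purely illustrative and is not the XDF format, whose internals the company has not described here. The class name `BlockColumnStore`, its methods, and the flight-delay column names are all hypothetical.

```python
from array import array


class BlockColumnStore:
    """Toy in-memory table: split into row blocks, columnar inside each block.

    Illustrative only; this is NOT the XDF format or any RevoScaleR API.
    """

    def __init__(self, column_names, block_rows=4):
        self.column_names = list(column_names)
        self.block_rows = block_rows
        self.blocks = []  # each block maps column name -> array of doubles
        self.n_rows = 0

    def append_rows(self, rows):
        for row in rows:
            if self.n_rows % self.block_rows == 0:
                # The current block is full (or none exists): start a new one.
                self.blocks.append({name: array("d") for name in self.column_names})
            block = self.blocks[-1]
            for name, value in zip(self.column_names, row):
                block[name].append(value)
            self.n_rows += 1

    def read(self, columns, start, stop):
        """Return rows [start, stop) restricted to `columns` (assumes start >= 0).

        Only the blocks that overlap the row range are visited, and within
        each block only the requested columns are sliced.
        """
        stop = min(stop, self.n_rows)
        out = []
        if start >= stop:
            return out
        first_block = start // self.block_rows
        last_block = (stop - 1) // self.block_rows
        for b in range(first_block, last_block + 1):
            block = self.blocks[b]
            base = b * self.block_rows
            lo = max(start, base) - base
            hi = min(stop, base + self.block_rows) - base
            cols = [block[name][lo:hi] for name in columns]
            out.extend(zip(*cols))
        return out


# Made-up data: ten rows of (departure delay, arrival delay, distance).
store = BlockColumnStore(["dep_delay", "arr_delay", "distance"], block_rows=4)
store.append_rows([(i, 2 * i, 100 + i) for i in range(10)])
subset = store.read(["arr_delay"], 3, 6)  # rows 3..5, one column only
```

In this toy layout, a statistical routine that needs two columns out of dozens can skip the rest, and one that needs only a slice of rows can skip whole blocks. That is the general appeal of combining both access paths.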

RevoScaleR also relies on a collection of the most common statistical algorithms optimized for big data, including high-performance implementations of summary statistics, linear regression, binomial logistic regression, and crosstabs. Data reading and transformation tools let users interactively explore and prepare large data sets for analysis. The package is also extensible, so expert R users can develop and extend their own statistical algorithms.
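
To make "optimized for big data" concrete: many such algorithms can be written so that they see only one chunk of rows at a time and keep a handful of running totals. The Python sketch below shows that general idea for a one-predictor linear regression and column means. It is an assumption about the style of algorithm, not RevoScaleR's actual implementation, and the function names `fit_line_in_chunks` and `chunked` are hypothetical.

```python
def fit_line_in_chunks(chunks):
    """Least-squares fit of y = intercept + slope * x, one chunk at a time.

    Only five running totals are kept, so memory use does not grow with
    the number of rows. (Raw sums like these can lose precision on badly
    scaled data; a production implementation would be more careful.)
    """
    n = 0
    sx = sy = sxx = sxy = 0.0
    for chunk in chunks:
        for x, y in chunk:
            n += 1
            sx += x
            sy += y
            sxx += x * x
            sxy += x * y
    denom = n * sxx - sx * sx
    if n < 2 or denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope, sx / n, sy / n


def chunked(pairs, size):
    """Yield lists of at most `size` items; stands in for reading blocks from disk."""
    buf = []
    for pair in pairs:
        buf.append(pair)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf


# Synthetic data on the exact line y = 3 + 2x, fed in chunks of 100 rows.
data = ((float(x), 3.0 + 2.0 * x) for x in range(1000))
intercept, slope, mean_x, mean_y = fit_line_in_chunks(chunked(data, 100))
```

Because each chunk contributes only to the running totals, the same loop works whether the data holds a thousand rows or far more than fits in memory.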

Integrating Hadoop

Because it is based on open-source R technology, Revolution R Enterprise plays well with other modern big data architectures. It can draw on sources such as Hadoop, NoSQL or key-value databases, relational databases, and data warehouses. These products can be used to store, regularize, and do basic manipulation on very large data sets, while Revolution R Enterprise now provides advanced analytics.

“Together, Hadoop and R can store and analyze massive, complex data,” says Saptarshi Guha, developer of the popular RHIPE R package that integrates the Hadoop framework with R in an automatically distributed computing environment. “Employing the new capabilities of Revolution R Enterprise, we will be able to go even further and compute big data regressions and more.”
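
The division of labor described here, with a distributed store doing per-partition work and an analytics layer combining the results, resembles the map-and-reduce pattern that Hadoop popularized. The single-process Python sketch below builds a small crosstab that way. It does not use Hadoop or RHIPE; the flight records, the function names, and the 15-minute delay threshold are assumptions made for illustration.

```python
from collections import Counter
from functools import reduce

DELAY_THRESHOLD_MINUTES = 15  # assumed cutoff for calling a flight "delayed"


def map_partition(records):
    """Map step: count (carrier, delayed?) pairs within one partition."""
    return Counter(
        (carrier, delay > DELAY_THRESHOLD_MINUTES) for carrier, delay in records
    )


def reduce_counts(left, right):
    """Reduce step: merge two partial crosstabs by adding their counts."""
    return left + right


# Two made-up partitions of (carrier, delay in minutes) records.
partitions = [
    [("AA", 5), ("AA", 30), ("UA", 0)],
    [("UA", 45), ("AA", 20), ("UA", 10)],
]

crosstab = reduce(reduce_counts, map(map_partition, partitions), Counter())
```

Each partition can be counted independently, on a different machine if need be, and only the small partial tables have to travel to wherever the final merge runs.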

The new RevoScaleR package will be delivered as part of Revolution R Enterprise 4.0, which will be available for 32- and 64-bit Microsoft Windows in the next 30 days. Support for Red Hat Enterprise Linux (RHEL 5) is planned for later this year.

BriefingsDirect contributor Jennifer LeClaire provided editorial assistance and research on this post. She can be reached at http://www.linkedin.com/in/jleclaire and http://www.jenniferleclaire.com.

Dana Gardner is president and principal analyst at Interarbor Solutions, an enterprise IT analysis, market research, and consulting firm.
