Virtually Speaking

Dan Kusnetzky, Paula Rooney and Ken Hess

RainStor: bringing the power of Hadoop to corporate developers

By Dan Kusnetzky | January 18, 2012, 3:21am PST

Summary: Hadoop has become a central tool for Big Data applications. RainStor has added SQL access, better compression and the ability to reduce the number of systems in a Hadoop cluster to improve performance and reduce overall costs of using Hadoop.

John Bantleman, CEO, and Deirdre Mahon, VP of marketing, of RainStor, introduced me to some enhancements the company was just about to announce. The goal was to make Hadoop easier to use for corporate developers, improve the performance of Hadoop and also dramatically reduce the number of systems needed to process Hadoop-based analytics.

Before we get into what RainStor had to say, let’s take a moment to look at Hadoop.

What is Hadoop?

Hadoop is a set of Apache open source projects that has attracted quite a bit of interest recently. Hadoop is mentioned almost every time the catchphrase “Big Data” is discussed. It has had a strong impact on organizations needing to analyze huge volumes of rapidly changing data.

The Apache foundation describes Hadoop in the following way:

The Apache™ Hadoop™ project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.

The project includes these subprojects:

Other Hadoop-related projects at Apache include:

  • Avro™: A data serialization system.
  • Cassandra™: A scalable multi-master database with no single points of failure.
  • Chukwa™: A data collection system for managing large distributed systems.
  • HBase™: A scalable, distributed database that supports structured data storage for large tables.
  • Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
  • Mahout™: A scalable machine learning and data mining library.
  • Pig™: A high-level data-flow language and execution framework for parallel computation.
  • ZooKeeper™: A high-performance coordination service for distributed applications.
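
The "simple programming model" in the Apache description refers to MapReduce. As a rough illustration only, the sketch below imitates its three stages (map, shuffle, reduce) in a single Python process by counting words. It does not use Hadoop or any Hadoop API; the function names and the sample input are hypothetical and chosen purely for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, standing in for the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all counts for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, not taken from the article.
    print(word_count(["big data big clusters", "data everywhere"]))
```

In a real cluster the map and reduce functions would run in parallel on many machines, with the framework handling the grouping step and any machine failures; this sketch only shows the shape of the model.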

What did RainStor have to say?

RainStor claims to have added “the first enterprise database running natively on Hadoop.” Furthermore, the company states that its product enables faster, more flexible analytics on multi-structured data, without the need to move data out of the Hadoop Distributed File System (HDFS) environment.

RainStor has added the following enhancements to the Hadoop environment:

  • RainStor has added compression technology that can reduce the size of Hadoop data sets by up to 40 times. According to RainStor, the compressed multi-structured data set running on HDFS improves overall processing efficiency and reduces the size of clusters by 50 to 80 percent. This one factor, the company points out, would significantly lower operating costs.
  • The company has provided SQL access to Hadoop so that it can be used alongside the more traditional MapReduce access mechanism. RainStor claims performance improvements of 10 to 100 times for analytic applications.
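
To put RainStor's figures in concrete terms, the short Python sketch below applies them to a made-up deployment. The 100 TB data set and the 100-node cluster are assumptions chosen only for illustration, and the 40x ratio is the vendor's stated best case ("up to"), so the results are an upper bound on savings rather than a prediction.

```python
import math


def compressed_size_tb(raw_tb, ratio):
    """Size of a data set after compression by the given ratio (e.g. 40 for 40x)."""
    return raw_tb / ratio


def remaining_nodes(nodes, reduction_pct):
    """Nodes left after shrinking a cluster by a percentage, rounded up to whole machines."""
    return math.ceil(nodes * (100 - reduction_pct) / 100)


if __name__ == "__main__":
    raw_tb = 100   # hypothetical raw data set size, in terabytes
    nodes = 100    # hypothetical cluster size

    # Best case claimed by RainStor: up to 40x compression.
    print(compressed_size_tb(raw_tb, 40))   # 2.5

    # Claimed cluster reduction of 50 to 80 percent.
    print(remaining_nodes(nodes, 50))       # 50
    print(remaining_nodes(nodes, 80))       # 20
```

Under these assumptions, 100 TB would shrink to as little as 2.5 TB, and a 100-node cluster would need between 20 and 50 nodes.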

If your organization is using Hadoop or thinking about using Hadoop for business analytics, it would be worth the time to talk with RainStor.

Daniel Kusnetzky is a distinguished analyst and the founder of the Kusnetzky Group LLC.

Disclosure

Dan Kusnetzky

The Kusnetzky Group LLC is an independent technology industry research firm that focuses on system software, virtualization and cloud computing technology.

Dan's opinions are based upon research, personal experiences and actual use of technology. They are not based upon the relationships the company may or may not have with suppliers, end user organizations, the media, consultants or other analysts.

Dan's research is available on a subscription basis through the Kusnetzky Group LLC. Dan's attendance at industry events or at client meetings may be sponsored by the client. Clients may provide hardware or software for testing prior to the publication of analysis that includes that product. Clients may also provide shirts, jackets, coffee cups, folders, backpacks, pens and other event tchotchkes. While nice, these don't affect Dan's opinions or insight about those clients or their products.

Biography

Dan Kusnetzky

Daniel Kusnetzky, Analyst and Founder of Kusnetzky Group LLC, is responsible for research, publications, and operations. Mr. Kusnetzky has been involved with information technology since the late 1970s. Mr. Kusnetzky has been responsible for research operations at the 451 Group; corporate and marketing strategy for Open-Xchange; system software and virtualization research at IDC; and program and product management at Digital Equipment Corporation. Today, Mr. Kusnetzky focuses on system software, virtualization technology and cloud computing.
