Linux and Open Source

Steven J. Vaughan-Nichols & Paula Rooney

Pentaho open sources big data code, licenses Kettle project under Apache 2.0

By Paula Rooney | February 9, 2012, 10:57am PST

Summary: Pentaho has open sourced some of the big data assets in its Kettle open source project — and moved its entire Kettle Data Integration Platform to Apache 2.0 — in order to capture more of the booming Hadoop and NoSQL business.

One top BI player recently open sourced some of its data integration software and licensed the entire Kettle 4.3 release under Apache 2.0 to position itself well as a big data player.

Pentaho, a longstanding open source business intelligence applications player, notes that Hadoop and several top NoSQL databases are licensed under Apache. Pentaho’s Kettle open source project, otherwise known as Pentaho Data Integration Community Edition, is devoted to “operationalizing” big data.

Some of the big data capabilities in Kettle that will be open sourced include “the ability to input, output, manipulate and report on data using the following Hadoop and NoSQL stores: Cassandra, Hadoop HDFS, Hadoop MapReduce, Hadapt, HBase, Hive, HPCC Systems and MongoDB,” the company announced.

Traditional relational databases and data tools are insufficient for handling big datasets.

One exec had this to say about the open source move:

“In order to obtain broader market adoption of big data technology including Hadoop and NoSQL, Pentaho is open sourcing its data integration product under the free Apache license. This will foster success and productivity for developers, analysts and data scientists giving them one tool for data integration and access to discovery and visualization,” said Matt Casters, founder and chief architect of Pentaho’s Kettle Project.

Paula Rooney is a Boston-based writer who has followed the tech industry for almost two decades.

Disclosure

Paula Rooney

Paula Rooney owns no stock in the companies that she covers. She holds a 401K that is managed by Morgan Stanley.

Biography

Paula Rooney

Paula Rooney has covered the software and technology industry for more than 20 years, starting with semiconductor design and mini-computer systems at EDN News and later focused on PC software companies including Microsoft, Lotus, Oracle, Red Hat, Novell and other open source and commercial software companies for CRN and PCWeek. She received a silver award from the American Society of Business Publication Editors in 2005 for her profile on Linus Torvalds and edited and co-authored "Partnering With Microsoft," a book about Microsoft's channel published by CMP Publishing in 2004. Rooney graduated from the Columbia University Graduate School of Journalism in 1997. In her off time, she enjoys scuba diving, sailing, sun worshipping, running, reading, surfing (the net) and hanging out with her family. She resides on the shores of Scituate, Massachusetts.

3 Comments

RE: Pentaho open sources big data code, licenses Kettle project under Apache 2.0
H-M 13th Feb
This is great news. I learned HPCC Systems is also developing plugins that allow users to spray fixed width or delimited files from within a Kettle job to a Thor cluster and also let you execute ECL on a Thor cluster from within a Kettle job. This integration can really allow for powerful data ETL capabilities. Learn more at hpccsystems.com
Pentaho is very competitive with Actuate
Dietrich T. Schmitz * Your Linux Advocate 9th Feb
Nice move.
I've got version 3.x installed on one of my Linux systems (it's LGPLv2). Now that I have a small Cassandra cluster setup (most definitely NOT big data), I may give version 4.x a test drive.
