
Pentaho version 8: Here's what's new and improved

Hitachi Vantara announces version 8.0 of Pentaho, the company's BI, data integration and data science stalwart.

Pentaho, a product that originally launched over a decade ago as an open source business intelligence package, will soon be available in a version 8.0 release.

Pentaho existed as an independent company for more than a decade, until it was acquired by Hitachi Data Systems (HDS) in 2015. HDS integrated Pentaho into its own offerings and services implementations, but otherwise left most things running as they had been before the acquisition. That changed last month, when Hitachi announced it was combining Pentaho, HDS and Hitachi Insight Group (the unit responsible for the Lumada IoT platform) into a single new division called Hitachi Vantara.


While Pentaho as a distinct company has now been phased out, the Pentaho product and brand have not in any way been withdrawn. To make that point crystal clear, the Pentaho World conference kicks off today in Orlando, Florida. And at that event, the first new version of the Pentaho suite in the Hitachi Vantara era, Pentaho 8.0, is being announced, with general availability to follow next month.

What's new
Although BI/analytics is still an important part of Pentaho, the suite now spans well beyond that, and includes data integration and data mining (in the form of the Data Science Pack). In fact, it's the Pentaho Data Integration (PDI) component that features most prominently in this new release. Hitachi Vantara's Arik Pelkey, Senior Director, Pentaho Product Marketing, and Anand Rao, Senior Pentaho Product Marketing Manager, filled me in on the details.

New features in Pentaho 8.0 break down into three major areas. These are, in Hitachi Vantara's own words: improving connectivity to streaming data sources for real-time data processing; optimizing processing resources; and boosting team productivity. Let's take these in order.

Streaming colors
On the streaming data side, Pentaho is adding support for two juggernaut Apache Software Foundation projects: Kafka and Spark. Kafka is supported as a source of streaming data via a new connector, while Spark and Spark Streaming are used to process that data.

Furthermore, the Adaptive Execution Layer (AEL) feature that was added in Pentaho 7.1 will be used for real-time processing, allowing streaming data workflows to be designed once and then run against either Pentaho's own Kettle data integration engine or Spark. Pelkey and Rao explained to me that a common pattern may emerge where Kettle is used in development and test, with Spark being used in production. Support for other engines is anticipated, as the architecture diagram below makes evident.


Pentaho 8 Adaptive Execution

Credit: Hitachi Vantara

Resourceful management
On the processing resource management side, Hitachi Vantara is adding a scale-out architecture, allowing the Kettle engine to be deployed to a cluster of container-based worker nodes rather than to a single server. Worker nodes will not distribute the execution of an individual job, but they can run multiple jobs in parallel. Pelkey and Rao explained to me that Kettle and Spark worker nodes can be overlaid on the same physical cluster.

Adaptive Execution is now certified compatible with Hortonworks Hadoop/Spark clusters, in addition to the Cloudera clusters that were supported in the previous release. There's more Apache goodness in this release too: Pentaho 8.0 adds support for Apache Knox for cluster authentication (which makes sense, since Hortonworks is the major commercial entity behind that project), as well as for the Apache Avro and Apache Parquet file formats.

And more
In addition, the Data Explorer component of Pentaho Data Integration, which allows visualization of data as it's being prepared and transformed, now supports filtering, which was not available in the previous version. The company's press release also says that Pentaho 8.0 improves repository usability and makes application auditing easier.

Given Pentaho's pre-big data origins, it's notable that its platform now supports several major open source big data technologies and standards, for both data at rest and streaming data. Its applicability to enterprise BI, data science and the Internet of Things, along with its corporate integration with two previously separate Hitachi business units focused on those spaces, makes it a very different product from what it was at its inception.

Pentaho has evolved with the industry; we'll see whether its absorption into Hitachi Vantara gives that evolution even greater velocity.