Pentaho, a product that originally launched over a decade ago as an open source business intelligence package, will soon be available in a version 8.0 release.
Pentaho existed as an independent company for more than a decade, until it was acquired by Hitachi Data Systems (HDS) in 2015. HDS integrated Pentaho into its own offerings and services implementations, but otherwise left most things running as they had been before the acquisition. That changed last month, when Hitachi announced it was combining Pentaho, HDS and Hitachi Insight Group (the unit responsible for the Lumada IoT platform) into a single new division called Hitachi Vantara.
While Pentaho as a distinct company has now been phased out, the Pentaho product and brand have not in any way been withdrawn. To make that point crystal clear, the Pentaho World conference kicks off today in Orlando, Florida. And at that event, the first new version of the Pentaho suite in the Hitachi Vantara era, Pentaho 8.0, is being announced, with general availability to follow next month.
What's new

Although BI/analytics is still an important part of Pentaho, the suite now spans well beyond that, and includes data integration and data mining (in the form of the Data Science Pack). In fact, it's the Pentaho Data Integration (PDI) component that features most prominently in this new release. Hitachi Vantara's Arik Pelkey, Senior Director, Pentaho Product Marketing, and Anand Rao, Senior Pentaho Product Marketing Manager, filled me in on the details.
New features in Pentaho 8.0 break down into three major areas. These are, in Hitachi Vantara's own words: improving connectivity to streaming data sources for real-time data processing; optimizing processing resources; and boosting team productivity. Let's take these in order.
Streaming colors

On the streaming data side, Pentaho is adding support for two juggernaut Apache Software Foundation projects: Kafka and Spark. Kafka is supported as a source for streaming data via a new connector, while Spark and Spark Streaming are used to process such data.
Furthermore, the Adaptive Execution Layer (AEL) feature that was added in Pentaho 7.1 will be used for real-time processing, allowing streaming data workflows to be designed once and then run against either Pentaho's own Kettle data integration engine or Spark. Pelkey and Rao explained to me that a common pattern may emerge where Kettle is used in development and test, with Spark being used in production. Support for other engines is anticipated, as is made evident in the architecture diagram below.
Resourceful management

On the processing resource management side, Hitachi Vantara is adding a scale-out architecture, allowing the Kettle engine to be deployed to a cluster of container-based worker nodes, rather than a single server. Worker nodes will not execute individual jobs in a distributed fashion, but they can be used to execute multiple jobs in parallel. Pelkey and Rao explained to me that Kettle and Spark worker nodes can be overlaid on the same physical cluster.
Adaptive Execution is now certified compatible with Hortonworks Hadoop/Spark clusters in addition to Cloudera clusters, which were supported in the previous release. There's more Apache goodness in this release too: Pentaho 8.0 adds support for Apache Knox for cluster authentication (which makes sense, since Hortonworks is the major commercial entity behind that project), as well as for the Apache Avro and Apache Parquet file formats.
And more

In addition, the Data Explorer component of Pentaho Data Integration, which allows visualization of data as it's being prepared and transformed, now supports filtering functionality that was not available in the previous version. The company's press release also explains that Pentaho 8.0 adds improved repository usability and easier application auditing.
Considering Pentaho's pre-Big Data origins, it is notable that its platform now supports several major open source big data technologies and standards, for both data at rest and streaming data. Its applicability to enterprise BI, data science and the Internet of Things, along with its corporate integration with two previously separate Hitachi business units focused on those spaces, makes it a very different product from what it was at its inception.
Pentaho has evolved with the industry; we'll see whether its absorption into Hitachi Vantara gives that evolution even greater velocity.