Strata NYC brings announcements from MapR, Pentaho, Zoomdata, and more

The annual NYC appearance of Strata + Hadoop World is this week, and the summer Big Data news drought is more than over.
Written by Andrew Brust, Contributor

August is usually a slow news month, and September tends to provide the antidote. This year is no exception as Strata + Hadoop World kicks off in New York City this week. A number of announcements are breaking this morning; several others were slip-streamed into the news pipeline yesterday. I'll summarize both days' announcements here.

Let's start with today's news, which includes announcements from MapR, the Open Data Platform Initiative (ODPi), BlueData, Cazena, and Bitwise.

ODPi wants to make the Hadoop world compatible
Today, ODPi is announcing its new ODPi Interoperable Compliance Program and the inclusion in that program of applications from SAS, WANdisco, Syncsort, DataTorrent, IBM, Pivotal, and Xavient. The Interoperable Compliance Program essentially certifies applications as compatible with Hadoop platforms that in turn conform to ODPi's Runtime Specification.

Version 2 of the runtime spec is also being announced. While Version 1 comprised a synchronized stack consisting of YARN, MapReduce, and the Hadoop Distributed File System (HDFS), Version 2 is a superset of the first version, adding Apache Hive and Hadoop Compatible File System (HCFS) support.

Hive is an application that sits on top of MapReduce and YARN, so its addition marks the Runtime Spec's graduation from including only Hadoop core components to including an application that is built on top of them but is nonetheless part of every Hadoop distribution. And adding HCFS to the spec means that MapR and Microsoft HDInsight, which implement HDFS over their own proprietary file systems, could theoretically join ODPi. To be clear, MapR and Microsoft have not joined ODPi, but now, in theory, they could.

MapR and microservices
Speaking of MapR, that company is announcing support for a microservices architecture in its Hadoop distribution. MapR microservices are built on top of the MapR DB, MapR FS (file system), and MapR Streams technologies that were already part of the distro. But now there's explicit support for microservices-specific volumes, monitoring (cluster-wide), and microservices for A/B and multivariate testing.

MapR is also announcing that it will issue a series of "Converged Application Blueprints," the first of which will focus on high-speed streaming data for trades in the financial markets. The blueprints consist of sample apps with source code, architecture guides, shared expertise, and best practices.

Cazena's data lake is Azure blue
Cazena, which offers cloud "as a Service" implementations for data marts and data lakes, is announcing that the latter service is now available on Microsoft's Azure cloud platform. Previously, Cazena's Data Mart-as-a-Service was available on Azure and both services were available on Amazon Web Services. Now both will be available on Azure as well. Data Lake-as-a-Service is based on Cloudera Enterprise running on Azure Infrastructure-as-a-Service resources. That said, Cazena is your single point of contact for the service; you don't need to be an Azure customer to use it, although you certainly can be.

BlueData gets robust on security, networking, storage
BlueData, whose EPIC platform makes short work of deploying Docker containerized Hadoop and Spark clusters, is announcing its new fall release, which provides automated setup of Kerberos on its clusters; automated management of LDAP/Active Directory users and groups; integration with Linux Privileged Access Management (PAM) tools like BeyondTrust PowerBroker and FoxT ServerControl; and enhanced virtual networking and storage support.

ETL stops being a dirty word
Bitwise, a company that is new to me, is announcing the release of its Hydrograph product, which provides an Extract, Transform, and Load (ETL) platform implemented natively on Hadoop. Under the hood, Hydrograph uses Cascading to generate the MapReduce jobs that execute the ETL work. Bitwise says its architecture is pluggable, however, and that adoption of Apache Spark and Apache Flink for job execution could come at a later time. The company says the product might eventually go open source as well.

Yesterday, not so far away
There were several announcements at Strata yesterday as well. On the funding front, Kinetica (which offers an in-memory database accelerated by GPUs) announced its raise of $6 million in acceleration financing and Podium Data (which offers a data lake management platform) announced its $9.5 million Series A round, led by Malibu Ventures.

On the new release front, Pentaho announced new capabilities for its PDI data integration platform. These include deeper Apache Spark integration; enhanced Kerberos integration; Apache Sentry integration; Apache Kafka support; and support for the Avro and Parquet file formats. Pentaho has also added over 30 transformation steps for Hadoop, HBase, JSON, XML, Vertica, and Greenplum.

Maana offers what it calls its Knowledge Platform, which combines AI, semantic search, and straight analytics to generate business recommendations and integrate them into line-of-business applications. Maana 2017 is adding Knowledge Assistants and Knowledge Applications to the platform. The former allow for optimization of supply chain, call center, accounts receivable, predictive maintenance, and other domains. The latter facilitate time series analyses; semantic similarity searches within instances, cases, events, and records; and extraction of knowledge from unstructured documents.

Lightbend announced its new Fast Data Platform (FDP) -- which leverages Apache Kafka, Spark, Flink, Mesosphere DC/OS, OpsClarity, and Lightbend's own technologies -- to make it easier for developers to build streaming data applications.

Finally, Zoomdata announced partnerships and integrations with Cloudera for Cloudera Enterprise; Google for BigQuery; and Teradata for Teradata Database and Teradata Appliance for Hadoop, including both the Cloudera and Hortonworks editions. Zoomdata also announced the availability of a 30-day trial edition of the product on the Amazon Web Services Marketplace.

More tomorrow?
All this news, and it's only Tuesday. Meanwhile, the main conference days at Strata are Wednesday and Thursday. Will there be even more news? I'm not sure, but I will be sitting down for over a dozen vendor briefings on those two days. So even if the news is over, the analysis has barely begun.
