Three Big Data partnerships on deck

Today's a big day for Big Data partnerships. Cloudera-Syncsort, Cisco-Pentaho and Connotate-CrowdSource are all announcing link-ups today.
Written by Andrew Brust, Contributor

The world of Big Data is rich in partnerships, and sometimes their announcements cluster together. That's exactly what's happening today, as three major partnerships are being announced this morning.

Syncsort ETL goes CDH
Syncsort, a veteran data integration company, is announcing today that its flagship DMExpress data integration product is now certified to run with Cloudera's Distribution including Apache Hadoop version 4 (CDH4). Syncsort's product will aid the process of integrating data from enterprise databases and data warehouse appliances with the Hadoop Distributed File System (HDFS). The open source Hadoop stack component Sqoop serves this purpose as well, but it is a rather bare-bones import-export framework, whereas DMExpress is highly sophisticated in terms of user interface, manageability and performance. Certifying DMExpress with Cloudera's nearly omnipresent Hadoop distro makes all kinds of sense, especially since DMExpress' certification on the Hortonworks Data Platform (HDP) Hadoop distro was announced back in June.



Open Source BI on Cisco UCS
Meanwhile, open source Business Intelligence (BI) provider Pentaho is announcing today that, together with Cisco, it will offer its software on Cisco's UCS (Unified Computing System) server hardware. UCS is a data center-ready product set that combines compute, storage and networking. UCS customers will now have the option to include BI technology in that turn-key stack as well. I should point out that Pentaho's suite also includes data integration technology and is certified on Cloudera's CDH4, too.



Connotations of Crowdsourcing
The final piece of today's partnership triad has a twist, as it addresses the manual labor that is sometimes required to make Big Data clean and accurate. Connotate, a vendor of automated Web data collection solutions, is partnering with CrowdSource to get that manual labor done more systematically. CrowdSource is an Amazon Web Services Mechanical Turk Partner. In case you're not familiar with it, Mechanical Turk ("MTurk") is an API-driven Web service for the orderly request and provisioning of human intelligence tasks (HITs): tasks that computers can't really do, at least not yet. Automated collection of data from the Web sometimes requires human auditing and/or correction, and the Connotate-CrowdSource partnership looks like an interesting way to make such exacting work more feasible at scale.

Moving data into and out of Hadoop, getting BI and data integration software stood up quickly in the data center, and orchestrating audit-dependent data collection efficiently: the common thread in all of these partnerships is the simplification of the complex. If Big Data processing is to be reliable, repeatable and thus mission critical, then more partnerships like these will be needed.


(Handshake thumbnail image by Aidan Jones, CC BY-SA 2.0)
