The world of Big Data is rich in partnerships, and sometimes their announcements cluster together. That's exactly what's happening today, as three major partnerships are being announced this morning.
Syncsort ETL goes CDH
Open Source BI on Cisco UCS
Meanwhile, Open Source Business Intelligence (BI) provider Pentaho is announcing today that, together with Cisco, it will offer its software on the latter firm's UCS (Unified Computing System) server hardware. UCS is a data center-ready product set that combines compute, storage and networking. Now UCS customers will have the option to include BI technology in that turn-key stack as well. I should point out that Pentaho's suite also includes data integration technology and is certified on Cloudera's CDH4, too.
Connotations of Crowdsourcing
The final piece of today’s partnership triad has a twist, as it addresses the issue of the manual labor that is sometimes required to make Big Data clean and accurate. Automated Web data collection solutions vendor Connotate is partnering with CrowdSource to get that manual labor done more systematically. CrowdSource is an Amazon Web Services Mechanical Turk Partner. If you didn’t know, "MTurk" is an actual API-driven Web service for the orderly request and provisioning of human intelligence tasks (HITs) – i.e. tasks that computers can’t really do, at least not yet. Automated collection of data from the Web can sometimes require human auditing and/or correction. The Connotate-CrowdSource partnership looks like an interesting solution to make such exacting work more feasible at scale.
Moving data into and out of Hadoop; getting BI and data integration software quickly stood up in the data center; and orchestrating audit-dependent data collection efficiently. The common thread in all of these partnerships is simplification of the complex. If Big Data processing is to be reliable, repeatable and thus mission critical, then more partnerships like these are going to be needed.