From mainframes to Hadoop clusters
Syncsort, a company that started out in the mainframe space with high-speed sorting technology in the 1960s, and which expanded to the ETL (Extract, Transform and Load) cum Data Integration space in the last decade, was involved in both of these Hadoop Summit phenomena. On June 14, the company announced a partnership with Hortonworks, which brings Hadoop/HDP to Syncsort's DMExpress Data Integration platform.
This partnership brings Hadoop and the Hadoop Distributed File System (HDFS) into the corporate data workflow and does it in such a way that Enterprise data practitioners can apply their current skill sets to work with it. Syncsort's graphical user interface for developing data integration flows (pictured) can now be used to front-end Hadoop. Even if the native Hadoop skillset does one day become mainstream in the Enterprise, that won't happen until Hadoop becomes usable from the skillsets that are mainstream there today.
Not just a driver
Another noteworthy thing about this partnership is the engineering behind the integration. Syncsort went well beyond garden-variety Hive driver connectivity; instead, DMExpress integrates tightly with HDFS and Hadoop's MapReduce Framework. DMExpress can work directly with HDFS files and its highly optimized sort routines can be integrated into MapReduce jobs. The latter capability results from Syncsort's own open source code that links a pluggable sort infrastructure into the Map and Reduce tasks orchestrated by Hadoop.
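To make the "pluggable sort" idea concrete, here is a minimal sketch in plain Python. It is not Syncsort's code and it does not use Hadoop's API; every name in it (`run_job`, `default_sorter`, `reverse_sorter` and so on) is hypothetical. It only illustrates the general shape: the framework runs the map step, hands the intermediate key/value pairs to whatever sort routine has been plugged in, and then feeds the ordered groups to the reduce step.

```python
from itertools import groupby
from typing import Callable, Iterable, List, Tuple

# Hypothetical types for this sketch: intermediate records are (key, value) pairs.
KV = Tuple[str, int]
Sorter = Callable[[List[KV]], List[KV]]


def default_sorter(pairs: List[KV]) -> List[KV]:
    """Stand-in for a framework's built-in sort: order pairs by key."""
    return sorted(pairs, key=lambda kv: kv[0])


def reverse_sorter(pairs: List[KV]) -> List[KV]:
    """Stand-in for a third-party sort plugged in instead of the default."""
    return sorted(pairs, key=lambda kv: kv[0], reverse=True)


def run_job(
    records: Iterable[str],
    mapper: Callable[[str], List[KV]],
    reducer: Callable[[str, List[int]], KV],
    sorter: Sorter = default_sorter,
) -> List[KV]:
    """Toy map -> sort -> reduce pipeline with a swappable sort step."""
    # Map phase: each input record yields zero or more (key, value) pairs.
    mapped = [kv for record in records for kv in mapper(record)]
    # Sort phase: delegated entirely to the plugged-in routine.
    ordered = sorter(mapped)
    # Reduce phase: equal keys are adjacent after sorting, so group and reduce.
    return [
        reducer(key, [value for _, value in group])
        for key, group in groupby(ordered, key=lambda kv: kv[0])
    ]


def word_mapper(line: str) -> List[KV]:
    """Emit (word, 1) for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def sum_reducer(key: str, values: List[int]) -> KV:
    """Add up the counts collected for one key."""
    return (key, sum(values))


if __name__ == "__main__":
    lines = ["a b a", "b c"]
    print(run_job(lines, word_mapper, sum_reducer))
    print(run_job(lines, word_mapper, sum_reducer, sorter=reverse_sorter))
```

The mapper and reducer never change when the sorter is swapped, which is the point of making the sort step pluggable: a faster or differently tuned sort can be dropped in without rewriting the job itself.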
One more thing
I could end this post right here, because the Hortonworks partnership is significant and stands on its own. But I'm going to continue, because today the plot thickens.
I keep saying that Massively Parallel Processing (MPP) data warehouse appliances are Big Data technology too. Hadoop does not Big Data make.
Well, it would appear that Syncsort might agree, because the company is today announcing a partnership with Greenplum, makers of a prominent MPP appliance (and a unit of EMC). This partnership means that Greenplum is a certified data endpoint for DMExpress and that, suddenly, Hadoop, MPP and transactional databases are peers in the realm of data movement. To top it all off, Syncsort has also joined Greenplum's "Catalyst" developer program, which will allow technologists at both companies to collaborate on interesting solutions.
Worlds collide, and both survive
I asked Keith Kohl, Syncsort's Director of Product Management for Data Integration, what his company's reasoning was in partnering with both a Hadoop player like Hortonworks and an MPP player like Greenplum, especially given the Enterprise storage pedigree of Greenplum's parent company.
Kohl's answer was pretty matter-of-fact: Syncsort's customers are using both MPP and Hadoop technology, side by side. Kohl gave a specific example of one such customer (comScore), but explained that the scenario is rather pervasive. Corporate clients are integrating Hadoop into their data environments but they are doing so in a way that allows them to leverage their infrastructure and skill set investments in their data warehouse platforms.
The notions of "Big Data" and "One Version of the Truth" needn't be at cross-purposes. Kohl says that risk, skill sets and market maturity are each important drivers of this trend. My take: disruption shouldn't be underestimated, but the imperative of rip-and-replace shouldn't be overstated.