Version 8 of Dell Software's SharePlex product, which heretofore has provided replication services between Oracle databases, can now replicate from Oracle to Apache Hadoop, bringing near real-time OLTP data refresh to the big data platform. What's more, Hadoop is only the first of several new data stores to be supported.
Now that Dell has acquired Quest Software and made it part of its $1.5 billion software unit, the data acquisition and analysis tools that Quest brings to the table have strategic importance beyond their value as mere IT operations tools. Database replication is important for fault tolerance, branch operations, and more. But Dell has clearly decided that such an engine can and should also be used to move data between platforms, to aid in data integration and analysis.
Dell has effectively added an open output pipeline to SharePlex, implemented using a Java Message Service (JMS) queue. It then created a connector that subscribes to the queue and pushes the extracted data to Hadoop using Sqoop, which leverages Hadoop's MapReduce engine to move data in and out of the Hadoop Distributed File System (HDFS).
As the diagram below shows, developers can also build their own custom apps that subscribe to the JMS queue and push the data into other platforms, or they can use existing data integration tools to do the same.
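To make the custom-subscriber idea a little more concrete, here is a minimal Python sketch of the subscribe-and-forward pattern. It is not Dell's connector and it does not use JMS: the standard library's `queue.Queue` stands in for the JMS queue, a plain list stands in for the target platform, and the JSON record shape (`op`, `table`, `row`) is purely hypothetical, since the actual SharePlex message format isn't described here.

```python
import json
import queue
import threading

# Marker that tells the subscriber to stop reading (an assumption of this
# sketch, not part of any SharePlex or JMS contract).
SENTINEL = None


def subscribe(source, sink):
    """Read change records from `source` and hand each one to `sink`.

    `source` is any object with a blocking get() method; `sink` is any
    callable that accepts one decoded record. Returns the number of
    records forwarded.
    """
    forwarded = 0
    while True:
        message = source.get()      # blocks until a message arrives
        if message is SENTINEL:
            break
        record = json.loads(message)  # hypothetical JSON payload
        sink(record)
        forwarded += 1
    return forwarded


def run_demo():
    changes = queue.Queue()  # in-memory stand-in for the JMS queue
    target = []              # stand-in for "some other platform"
    counts = []

    # Run the subscriber on its own thread, as a separate app would.
    worker = threading.Thread(
        target=lambda: counts.append(subscribe(changes, target.append))
    )
    worker.start()

    # Simulate the replication engine publishing two change records.
    changes.put(json.dumps(
        {"op": "INSERT", "table": "ORDERS", "row": {"ID": 1, "TOTAL": 9.5}}
    ))
    changes.put(json.dumps(
        {"op": "UPDATE", "table": "ORDERS", "row": {"ID": 1, "TOTAL": 12.0}}
    ))
    changes.put(SENTINEL)

    worker.join()
    return counts[0], target


if __name__ == "__main__":
    count, rows = run_demo()
    print(count, "records forwarded")
    for row in rows:
        print(row)
```

In a real deployment, the `source` would presumably be a JMS client session and the `sink` would write to whatever store you're targeting; the point of the sketch is only that the subscriber needs to know nothing about Oracle, just the queue and the record format.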
If you're not up for the custom solution, though, fear not. Dell will be adding its own connectors. According to Dell Software's Darin Bartik (executive director of product management for information management solutions) and John Whittaker (director, marketing), Microsoft's SQL Server is on the shortlist here. The logos in the diagram may or may not indicate other databases to be directly supported in the future.
Dell Software has a partnership with Cloudera, so we can expect that SharePlex's Hadoop connectivity will work very well with that company's Hadoop distro. However, Bartik and Whittaker assured me that Hortonworks' and MapR's distributions are supported, too.
With the release of SharePlex 8, Dell Software is taking big data convergence a step further than we've become used to seeing. Not only are the worlds of big data and BI colliding, but now, relational databases and OLTP are part of the equation as well.