Extract, Transform and Load (ETL) tools have constituted an important Enterprise software category for a long time. Once commanding huge standalone licensing fees, ETL capabilities are now built into various database products, reducing costs for customers of those database platforms.
Talend's DI platform
But Los Altos, California-based Talend, a specialist in Data Integration (a classier moniker than ETL, I suppose), offers an open source distribution of its platform. As a result, Talend has a vibrant ecosystem that regularly produces community-developed connectors, on top of the more than 450 connectors that ship natively with the product.
And the Talend platform includes more than just plain ETL: Master Data Management (MDM), Data Quality, Business Process Integration (BPI), and even an Enterprise Service Bus (ESB) are part of the mix.
Get your Hadoop on
Talend version 5.3 now features a graphical mapper for building Apache Pig data transformation scripts visually, rather than coding the data flows by hand in Pig Latin, the component's own language. That makes an important Hadoop stack component a bit more analyst-friendly.
Talend 5.3 can also generate native Java MapReduce code, which allows data transformations to run right on the Hadoop cluster. That avoids burdensome data movement, and it makes general-purpose SQL and import/export tools, like Hive and Sqoop respectively, unnecessary.
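For readers who haven't seen the MapReduce pattern up close, here is a minimal sketch of the three phases (map, shuffle, reduce) in plain Java. To be clear, this is not Talend's generated code and it does not use the Hadoop API; the class name `MapReduceSketch`, the "region,amount" record layout, and the sample data are all made up for illustration. On a real cluster, Hadoop handles the shuffle step and spreads the map and reduce work across nodes.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory illustration of the MapReduce pattern.
// Not Talend output and not the Hadoop API.
public class MapReduceSketch {

    // Map phase: parse one "region,amount" line and emit a (region, amount) pair.
    static Map.Entry<String, Long> map(String line) {
        String[] fields = line.split(",");
        return new SimpleEntry<>(fields[0].trim(), Long.parseLong(fields[1].trim()));
    }

    // Shuffle phase: group emitted values by key.
    // (Hadoop performs this step itself between map and reduce.)
    static Map<String, List<Long>> shuffle(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse all values for one key into a single total.
    static long reduce(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // Runs the three phases in sequence over in-memory input.
    static Map<String, Long> run(List<String> lines) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.add(map(line));
        }
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, List<Long>> e : shuffle(pairs).entrySet()) {
            totals.put(e.getKey(), reduce(e.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Prints {east=17, west=5}
        System.out.println(run(Arrays.asList("east,10", "west,5", "east,7")));
    }
}
```

The point of the pattern is that `map` and `reduce` each look at only one record or one key at a time, which is what lets a cluster run them next to the data instead of moving the data to the tool.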
NoSQL, no peace!
Talend 5.3 also adds to its NoSQL connectivity capabilities. While the prior release could connect to HBase, Cassandra, and MongoDB, v5.3 adds support for Couchbase, CouchDB, and Neo4j. This provides coverage for the most popular NoSQL platforms (aside from proprietary offerings like Amazon Web Services' DynamoDB). It also means Talend has connectivity to databases across all four major NoSQL categories (key-value stores, document stores, wide column stores, and graph databases).
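The four-category claim can be checked with a quick lookup table. The sketch below is mine, not Talend's: the class name `NoSqlCoverage` is hypothetical, and the category assigned to each store reflects how these products are commonly classified rather than anything the release states. Couchbase in particular is often described as both a key-value and a document store; it is filed under key-value here as an assumption.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lookup table; category assignments are common classifications,
// not taken from Talend documentation.
public class NoSqlCoverage {

    enum Category { KEY_VALUE, DOCUMENT, WIDE_COLUMN, GRAPH }

    // Stores named in the article, mapped to an assumed primary category.
    static Map<String, Category> stores() {
        Map<String, Category> m = new LinkedHashMap<>();
        m.put("HBase", Category.WIDE_COLUMN);     // supported before v5.3
        m.put("Cassandra", Category.WIDE_COLUMN); // supported before v5.3
        m.put("MongoDB", Category.DOCUMENT);      // supported before v5.3
        m.put("Couchbase", Category.KEY_VALUE);   // new in v5.3; assumption, also a document store
        m.put("CouchDB", Category.DOCUMENT);      // new in v5.3
        m.put("Neo4j", Category.GRAPH);           // new in v5.3
        return m;
    }

    // True when every category has at least one store mapped to it.
    static boolean coversAllCategories() {
        return EnumSet.copyOf(stores().values()).equals(EnumSet.allOf(Category.class));
    }

    public static void main(String[] args) {
        System.out.println(coversAllCategories());
    }
}
```

Under these assumed assignments, the key-value and graph categories are covered only by the v5.3 additions.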
Will loyalties shift?
Whether Enterprises will go "cold turkey" from standalone DI tools, like those from Informatica, or from sophisticated, bundled ETL tools, like Microsoft's SQL Server Integration Services, remains to be seen. But there's a lot to be said for graphical tools that sit on top of Hadoop, native MapReduce code generation, connectivity across the major NoSQL data stores, and the option to work with an open source distribution of a product before standardizing on it.