When the company was founded just over a decade ago, Talend became one of the first of a wave of second-generation data integration providers aiming to pick up where the Informaticas and IBMs of the world left off. While Informatica pioneered the data integration space by expanding from ETL through a series of acquisitions, Talend primarily bootstrapped its way to building out its suite (although the master data management capability was acquired). More importantly, Talend introduced open source to data integration.
Talend, the rebel, is now challenged by a new generation of rebels born in the big data world, including players such as Paxata, Trifacta, Tamr, Alation, Zaloni and others. The overlap comes in areas such as data preparation, where the newcomers take a different approach: they target the business user rather than the DBA and, more often than not, leverage machine learning and Spark to guide the user in wrangling the data.
And so Talend, like most other incumbents in the data integration space, has responded with its own data preparation solution, with its own spin. Talend Data Preparation is the first such tool to be powered by Apache Beam, which we reviewed earlier in the year. It was part of the Winter release, which also added refinements such as role-based environments (or skins) for developing integrations and transformations, setting permissions, running tests, and staging to production. For now, Talend Data Preparation is available on premises, but the company announced before an audience of customers this week in New York that it would add it to its AWS-based cloud offering in the fall.
Talend's use of Beam is no coincidence: the company is one of the project's contributors. The significance goes beyond bragging rights to another black-box technology. By using Beam, Talend is extending data preparation to real-time streaming, and it could conceivably provide yet another form of portability, since Beam pipelines are designed to run on multiple execution engines. Beam is also central to the future of Talend Data Fabric, a meta-hub for orchestrating the transformation of data between cloud and on-premises sources and targets.
Like most of its rivals, Talend is gradually extending machine learning to the core of its integration suite. It has used Spark to turbocharge data transformation/ETL processes, which pays off for complex, high-volume jobs. It has also recently extended machine learning to data quality and de-duplication. We'd like to see Talend apply machine learning to make master data management not only more scalable but also more flexible.
Talend promotes the fact that its tools are the same on premises and in the cloud, although the admin console for its managed offering is different. For now, Talend Integration Cloud is available through AWS. Given Talend's skin in the game with the Apache Beam project (which grew out of the SDKs for Google Cloud Dataflow), we expect that a multi-cloud future is in the offing.
With the cloud and machine learning, Talend's journey has come full circle. By pioneering open source in the data integration space, Talend led the way for a new generation of providers that challenged Informatica and IBM from the bottom up. Today, providers that emerged from the world of Hadoop and the cloud have spurred incumbents like Talend to rethink their approaches, using machine learning to provide more guided experiences. In the process, Talend's audience has expanded to business analysts and a new breed of DataOps developers. Exhibit A is its Winter 2017 release, which introduced distinct user interfaces targeted at administrators, developers, and business analysts. With machine learning assistance, the beauty will hopefully be more than skin deep.