With last year's $112 million Trillium acquisition now under its belt, Syncsort has released the first integration with its DMX-h Hadoop data integration tool. The integration brings Trillium Discovery (data profiling) and Trillium Quality (data cleansing) into DMX-h, letting users profile and cleanse data as part of the workflow for moving data from mainframes or other sources to Hadoop. It comes on the heels of the March rollout of Trillium Precise, a cloud-based data-as-a-service offering for validating and enriching customer records.
The acquisition, which closed at the end of last year, filled a gap in Syncsort's data sorting and migration product lineup by adding data quality and customer data verification. Syncsort started out providing efficient sorting utilities for mainframe batch processing, then applied that approach to ETL processing for other targets, eventually extending to Hadoop.
Trillium was previously a business unit of Harte-Hanks, whose business originated with direct mail services. Not surprisingly, Trillium developed a core competence in identifying and cleansing customer names and addresses, not only in North America but across a wide range of geographies. While Trillium ultimately expanded its data quality focus beyond name and address cleansing, under Harte-Hanks the company was slow to make the transition from data warehousing to big data.
Prior to the acquisition, Trillium had a short-lived partnership with Unifi for a cloud-based data preparation service for big data. And although Syncsort, under private ownership, has been no stranger to acquisitions, for data preparation the company is, for now, more likely to partner than to build or buy the capability.
Given that Syncsort closed the Trillium acquisition less than six months ago, it's not surprising that the DMX-h/Trillium integration is more of a loosely coupled linkage between two discrete products. DMX-h users can insert data profiling steps into the workflow by clicking a button that pops up the Trillium tooling. Within Trillium, the data set can be profiled and customer records matched.
While DMX-h previously had limited data profiling capabilities (such as identifying whether a column is a date, numeric, or string field), Trillium profiles data in far more detail. For instance, Trillium provides counts of patterns, "metaphones" (words with similar pronunciation), sound-based (phonetic) indexing, and masked records. It can also infer data type, degree of precision, and min/max ranges, and discover dependencies between fields.
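The article does not describe how Trillium implements these checks, so the following is only a minimal Python sketch of the generic profiling techniques named above, not Trillium's or Syncsort's code. It assumes a column arrives as a list of strings (with None for nulls), uses American Soundex as a simpler stand-in for the Metaphone family of phonetic algorithms, and all function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Soundex digit for each consonant; vowels plus H, W and Y get no code.
_SOUNDEX_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _letter in _letters:
        _SOUNDEX_CODES[_letter] = _digit

# Hypothetical rule: a value is numeric if it is an optionally signed decimal.
_NUMBER = re.compile(r"-?\d+(\.\d+)?")


def soundex(name: str) -> str:
    """American Soundex key: names that sound alike (Robert/Rupert) share a key."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    key = [letters[0]]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _SOUNDEX_CODES.get(c, "")
        if code and code != prev:
            key.append(code)
        if c not in "HW":  # H and W do not separate duplicate codes; vowels do
            prev = code
    return ("".join(key) + "000")[:4]


def phonetic_index(names: list) -> dict:
    """Group values by Soundex key, e.g. to surface candidate duplicate customers."""
    index = defaultdict(list)
    for name in names:
        index[soundex(name)].append(name)
    return dict(index)


def pattern_mask(value: str) -> str:
    """Letters become 'A', digits become '9', other characters are kept."""
    return "".join("A" if c.isalpha() else "9" if c.isdigit() else c for c in value)


def profile_column(values: list) -> dict:
    """Null/blank count, distinct count, inferred type, min/max, precision, patterns."""
    present = [v.strip() for v in values if v is not None and v.strip()]
    numeric = bool(present) and all(_NUMBER.fullmatch(v) for v in present)
    if numeric:
        nums = [float(v) for v in present]
        low, high = min(nums), max(nums)
        # Precision = the most digits seen after the decimal point.
        precision = max(len(v.partition(".")[2]) for v in present)
    else:
        # Strings compare lexicographically, so min/max are alphabetical bounds.
        low, high = (min(present), max(present)) if present else (None, None)
        precision = None
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "inferred_type": "numeric" if numeric else "string",
        "min": low,
        "max": high,
        "precision": precision,
        "patterns": Counter(pattern_mask(v) for v in present),
    }


def determines(rows: list, lhs: str, rhs: str) -> bool:
    """True if column lhs functionally determines rhs (each lhs value maps to one rhs value)."""
    seen = {}
    for row in rows:
        key, value = row[lhs], row[rhs]
        if key in seen and seen[key] != value:
            return False
        seen[key] = value
    return True
```

For example, `phonetic_index(["Robert", "Rupert", "Rubin"])` groups Robert and Rupert under the key R163, and `profile_column(["02134", "02134-1234", None, "10001"])` reports one null and two pattern groups (`99999` and `99999-9999`), the kind of signal that flags a ZIP code column with mixed formats.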
With this release, Syncsort is targeting two use cases. The obvious one is Customer 360, which leverages Trillium's customer data enrichment capabilities. This release only skims the surface; under the hood, Trillium Discovery also has business rules capabilities for adding intelligence to the process. There are also opportunities for adding integrations with Trillium Precise.
Syncsort's other target use case is data lake governance. This release adds the ability to conduct trend analysis of data quality issues in the data populating the data lake. In the long run, we believe that Syncsort will need to own a data preparation capability as it gets more serious about data lake governance.
Left unsaid is support for machine learning and Spark, capabilities that could go beyond rules to provide a more flexible approach to governing the quality of data loaded into the data lake. These would likely require a future acquisition. By targeting data lake governance, Syncsort comes up against Informatica and Talend, both of which have broader suites that also encompass functions such as master data management. But by planting its stake in the ground just months after closing the acquisition, Syncsort is making clear that Trillium is core to its data lake governance roadmap.