Podium Data becomes Qlik Data Catalyst

After Qlik's acquisition of Podium Data last year, the data catalog product has been re-branded and freed of its dependencies on Hadoop.


After the acquisition of Podium Data by Qlik last July, we knew a couple of things: consolidation in the analytics market was continuing, and interest in data governance/data management was increasing, even for self-service BI vendors like Qlik.


What we didn't know, though, was what would happen to the Podium team and product post-acquisition. But we do now. The Podium team is apparently intact, now constituting a distinct Enterprise Data Management team at Qlik, with its own P&L. Paul Barth, Podium's erstwhile CEO, is now Managing Director of that group, and carries that P&L responsibility. And in a discussion with both Barth and Joe DosSantos, Qlik's new Global Head of Enterprise Data Strategy (and formerly a Podium Data customer at TD Bank), I found out how the product has evolved as well.

Nomenclature.new()

Podium Data's formerly eponymous product is now Qlik Data Catalyst. Along with that name change, Qlik is announcing a new release of the product. The new version number is 4.0, in keeping with the Podium version numbering rather than being reset to 1.0.

Data Catalyst 4.0, as you might expect, is being brought more into the Qlik fold. Its wizards have been streamlined but also re-branded and restyled for consistency with the Qlik look and feel. Data Catalyst also features more direct integration with the Qlik Sense product, including a button on the Data Catalyst side that generates a visualization for a catalog data set on the Qlik Sense side in one...uh...qlik.

But the improvements go beyond mere conformance with the mother ship. There's also a change in architecture. Podium Data was born of the Big Data world and had corresponding dependencies on Hadoop. Qlik Data Catalyst, meanwhile, continues to support Hadoop but no longer requires it. Processing jobs, which previously executed on Hadoop, can now run locally on the Data Catalyst server, and data storage, which was previously limited to HDFS (the Hadoop Distributed File System), can now use the server's Linux file system instead.

Cloud catalyst?

But Data Catalyst can also use Amazon S3 for storage and its jobs can run on an Amazon EC2 virtual machine (or an EMR cluster, if the classic Hadoop architecture is preferred). Data Catalyst can also publish to Amazon Redshift. When it does so, Data Catalyst publishes just the metadata, by creating a Redshift external table that, in turn, uses Redshift Spectrum to link to the data in S3.
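To make the "metadata only" point concrete, here is a minimal sketch of the kind of Redshift Spectrum DDL such a publish step could produce. It is not Data Catalyst's actual output: the `CatalogDataset` class, the `spectrum_ddl` function, the `spectrum_catalog` schema and the S3 bucket are all hypothetical, and the sketch assumes the external schema was already created with Redshift's `CREATE EXTERNAL SCHEMA` command and that the data is stored as delimited text files.

```python
from dataclasses import dataclass


@dataclass
class CatalogDataset:
    """Hypothetical catalog entry for a delimited data set stored in S3."""
    schema: str                     # Redshift external schema (assumed to exist already)
    table: str
    columns: list[tuple[str, str]]  # (column name, Redshift column type)
    s3_location: str                # S3 prefix holding the data files
    delimiter: str = ","


def spectrum_ddl(ds: CatalogDataset) -> str:
    """Build a CREATE EXTERNAL TABLE statement. Only table metadata is registered
    in Redshift; the rows stay in S3 and are read through Spectrum at query time."""
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in ds.columns)
    return (
        f"CREATE EXTERNAL TABLE {ds.schema}.{ds.table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{ds.delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{ds.s3_location}';"
    )


# Example: a hypothetical "orders" data set sitting in S3.
orders = CatalogDataset(
    schema="spectrum_catalog",
    table="orders",
    columns=[("order_id", "BIGINT"), ("customer_id", "BIGINT"), ("amount", "DECIMAL(12,2)")],
    s3_location="s3://example-bucket/catalog/orders/",
)
print(spectrum_ddl(orders))
```

Because the statement only describes columns, format and location, no data is copied into Redshift; queries against `spectrum_catalog.orders` would scan the files in S3 directly.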

There's a lot of Amazon Web Services (AWS) goodness in there, but not much for fans of Microsoft Azure or Google Cloud Platform (GCP). Barth and DosSantos tell me that's consistent with customer demand, but that the team is evaluating support for Azure and GCP in the future.

Supporting the relationship

Beyond the decoupling from Hadoop and the new AWS-friendliness, Data Catalyst has also gained a new appreciation for data relationships. This shows up in two forms: first, as users add tables to the catalog, any table that is related to them (as a peer, child or parent) by virtue of the database's schema/metadata will be identified and recommended for inclusion.
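The metadata-driven recommendation can be sketched as a simple walk over foreign-key metadata. This is an illustration under stated assumptions, not the product's logic: the `FOREIGN_KEYS` list and the `recommend_related` function are hypothetical, and "peer" is assumed here to mean a table that shares a parent with the table being added.

```python
# Hypothetical foreign-key metadata, as (child_table, parent_table) pairs.
FOREIGN_KEYS = [
    ("orders", "customers"),
    ("order_lines", "orders"),
    ("invoices", "customers"),
]


def recommend_related(
    table: str,
    fks: list[tuple[str, str]],
    cataloged: frozenset[str] = frozenset(),
) -> dict[str, set[str]]:
    """Recommend parent, child and peer tables for a newly added table,
    skipping any table that is already in the catalog."""
    parents = {p for c, p in fks if c == table}
    children = {c for c, p in fks if p == table}
    # Assumed definition: peers are other tables that reference one of our parents.
    peers = {c for c, p in fks if p in parents and c != table}
    return {
        "parents": parents - cataloged,
        "children": children - cataloged,
        "peers": peers - cataloged,
    }
```

For example, adding `orders` would surface `customers` as a parent, `order_lines` as a child and `invoices` as a peer; if `customers` were already cataloged, only the child and peer would be recommended.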

Second, beyond metadata-based relationships, Data Catalyst will also "infer" relationships based on identified/classified data elements that the selected table has in common with other tables, and add them to the catalog. Data Catalyst employs a Drools rules-based engine to drive its data classification, as opposed to a machine learning-based approach such as that taken by Waterline Data and Io-Tahoe.
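A rough idea of how rules-based classification can feed relationship inference is sketched below. It uses plain Python regular expressions rather than the product's rules engine, and the `RULES` table, the 80% match threshold and the function names are all hypothetical: each column is labeled by the first rule that most of its sampled values satisfy, and any two tables sharing a label are proposed as related.

```python
import re
from itertools import combinations
from typing import Optional

# Hypothetical classification rules: label -> pattern a value must fully match.
RULES = {
    "EMAIL": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),
    "US_ZIP": re.compile(r"\d{5}(-\d{4})?"),
}


def classify_column(values: list[str], threshold: float = 0.8) -> Optional[str]:
    """Return the first rule label matched by at least `threshold` of the sampled values."""
    if not values:
        return None
    for label, pattern in RULES.items():
        hits = sum(1 for v in values if pattern.fullmatch(v))
        if hits / len(values) >= threshold:
            return label
    return None


def classify_table(sample: dict[str, list[str]]) -> dict[str, str]:
    """Map each column name to its data-element label, dropping unclassified columns."""
    labels = {col: classify_column(vals) for col, vals in sample.items()}
    return {col: lab for col, lab in labels.items() if lab is not None}


def infer_relationships(tables: dict[str, dict[str, list[str]]]) -> list[tuple[str, str, str]]:
    """Propose (table_a, table_b, label) for every pair of tables sharing a classified element."""
    classified = {name: set(classify_table(s).values()) for name, s in tables.items()}
    proposals = []
    for a, b in combinations(sorted(classified), 2):
        for label in sorted(classified[a] & classified[b]):
            proposals.append((a, b, label))
    return proposals


# Hypothetical sampled column values for three tables.
tables = {
    "crm_contacts": {"contact_email": ["a@x.com", "b@y.org"], "zip": ["02139", "10001"]},
    "web_signups": {"email": ["c@z.net", "d@w.io"], "plan": ["free", "pro"]},
    "stores": {"postal_code": ["94105", "60601-1234"], "name": ["SF", "CHI"]},
}
print(infer_relationships(tables))
```

Here `crm_contacts` would be linked to `web_signups` through an email column and to `stores` through a ZIP code column, even though no foreign key connects them; a machine learning-based classifier would replace the hand-written `RULES` with learned models.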

Acquired, but not subsumed

As I pointed out at the beginning of this post, the Podium Data team is alive and well at Qlik. And the Data Catalyst product, though re-branded and integrated with Qlik Sense, is something Qlik will, according to Barth and DosSantos, sell as a distinct product, including to customers who may use BI products and platforms from Qlik competitors. They say Qlik doesn't just want an add-on for its BI platform; it wants to be seen as a credible provider of data management/governance solutions.

And given the simultaneous push for companies to be data-driven and data regulation-compliant, that BI/data governance combination makes lots of (Qlik?) sense.