Informatica brings serverless compute to Data Integration Cloud

The highlight of the spring release is the introduction of serverless computing. The new release also adds more services aided by machine learning, and uses graph computing to help organizations build a single view of customer master data.
Written by Tony Baer (dbInsight), Contributor
Credit: Informatica

A couple of years after Informatica unveiled its microservices-based, second-generation Intelligent Cloud Service, the company's latest quarterly release has finally caught the serverless bug. Serverless is among a bundle of new features that add capabilities for managing data pipelines and integrating streaming.

Serverless computing is a natural fit for data ingestion and integration processes because they often run in batch and, depending on the mix of sources, can have highly variable resource consumption profiles. The guiding notion for serverless is eliminating the need to provision "just-in-case" capacity to handle spikes; instead, the system automatically adjusts provisioning based on traffic. The new serverless option autoscales and has built-in high availability and recovery. Customers can still use server-based options for more predictable, long-running workloads.
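To make the idea of adjusting provisioning to traffic concrete, here is a minimal sketch of a scaling rule. It is not Informatica's autoscaler, whose internals the release does not describe; the function name, the queue-depth signal, and the worker limits are all assumptions chosen for illustration.

```python
import math


def desired_workers(pending_tasks, tasks_per_worker, min_workers=0, max_workers=10):
    """Hypothetical scaling rule: size the worker pool to the current backlog.

    pending_tasks    -- tasks waiting to run (the assumed traffic signal)
    tasks_per_worker -- how many tasks one worker is assumed to handle
    min_workers / max_workers -- assumed floor and ceiling on the pool size
    """
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    needed = math.ceil(pending_tasks / tasks_per_worker)
    # Clamp to the configured limits so a spike cannot scale without bound.
    return max(min_workers, min(max_workers, needed))


# An idle period, a moderate batch, and a spike that hits the ceiling.
for backlog in (0, 120, 5000):
    print(backlog, desired_workers(backlog, tasks_per_worker=50))
```

With a floor of zero, nothing is provisioned while the backlog is empty, which is the "no just-in-case capacity" point the paragraph above makes.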

While serverless simplifies life for users by having the system automatically provision resources, the downside is that costs can be unpredictable. As part of the new serverless option, Informatica offers a calculator that applies machine learning to profile new workloads and estimate their costs, based on whether customers prioritize performance (with parallel processing) or cost (with processing going through a single node).
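The performance-versus-cost tradeoff can be illustrated with a small back-of-the-envelope model. This is only a sketch: Informatica's calculator is described as using machine learning to profile workloads, whereas the code below swaps in a simple Amdahl's-law style estimate. The class, the function, the per-node rate, and the workload numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    compute_hours: float      # assumed total work, in single-node hours
    parallel_fraction: float  # assumed share of the work that can run in parallel (0..1)


def estimate(profile, nodes, rate_per_node_hour):
    """Return (elapsed_hours, cost) for running the workload on `nodes` nodes.

    The serial part runs at single-node speed; the parallel part is split
    evenly across nodes. Every node is billed for the full elapsed time.
    """
    serial = profile.compute_hours * (1 - profile.parallel_fraction)
    parallel = profile.compute_hours * profile.parallel_fraction / nodes
    elapsed = serial + parallel
    cost = elapsed * nodes * rate_per_node_hour
    return elapsed, cost


# Hypothetical workload: 10 node-hours of work, 80% of it parallelizable.
profile = WorkloadProfile(compute_hours=10.0, parallel_fraction=0.8)
cost_first = estimate(profile, nodes=1, rate_per_node_hour=2.0)         # single node
performance_first = estimate(profile, nodes=4, rate_per_node_hour=2.0)  # parallel
print("cost-first:", cost_first)
print("performance-first:", performance_first)
```

Under these assumed numbers, the four-node run finishes in less than half the time but costs more, because the nodes are also billed while the serial portion runs.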

With serverless, Informatica is stealing a page from cloud-based services that have already made serverless the staple for ETL and integration offerings based on data pipelines. Among them are AWS Glue, Azure Data Factory, Google Cloud Data Fusion, and even Databricks, which added a serverless option.

A related feature applies machine learning to help organizations rationalize their data pipelines. Because cloud-based, low-code/no-code tools make it almost too easy to build pipelines, customers can easily accumulate a bewildering array of one-offs. Informatica's new tool introspects the pipelines, scanning data sources, operations, and targets to identify which pipelines use similar transformation patterns, and guides users in building configurable templates that reduce proliferation and make the pipelines more maintainable.
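As a rough illustration of spotting pipelines that share a transformation pattern, the sketch below groups pipelines by the sequence of operation types they apply, ignoring which sources and targets they touch. This is an exact-match heuristic, not the machine-learning approach the article attributes to Informatica, and the pipeline dictionary layout and names are assumed for the example.

```python
from collections import defaultdict


def pattern_signature(pipeline):
    """Sequence of operation types, ignoring source- and target-specific details."""
    return tuple(step["op"] for step in pipeline["steps"])


def template_candidates(pipelines):
    """Group pipeline names by signature; keep only patterns used more than once."""
    groups = defaultdict(list)
    for pipeline in pipelines:
        groups[pattern_signature(pipeline)].append(pipeline["name"])
    return {sig: names for sig, names in groups.items() if len(names) > 1}


# Hypothetical one-off pipelines: two differ only in source and target.
pipelines = [
    {"name": "orders_eu", "source": "db_eu", "target": "dw",
     "steps": [{"op": "filter"}, {"op": "join"}, {"op": "aggregate"}]},
    {"name": "orders_us", "source": "db_us", "target": "dw",
     "steps": [{"op": "filter"}, {"op": "join"}, {"op": "aggregate"}]},
    {"name": "clickstream", "source": "events", "target": "lake",
     "steps": [{"op": "parse"}, {"op": "filter"}]},
]
print(template_candidates(pipelines))
```

Here the two order pipelines would be candidates for a single template parameterized by source, while the clickstream pipeline stays a one-off.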

For streaming ingestion, Informatica has added a new capability that scans the Kafka repository to track data lineage, just as it already does for database and file sources. For data prep, Informatica's cloud service can recommend joins. The visual integration designer for Informatica's cloud ETL service has in turn stolen a page from data prep by recommending transformation operations based on scanning sources and targets.

Among the incremental updates is the addition of de-duplication capabilities to the data quality services that were introduced last year. While de-duplication is hardly new to Informatica, previously it was only available on-premises or as part of bring-your-own-license (BYOL) support for running Informatica Data Quality on Amazon EC2 or other cloud infrastructure services. The catalog has been enhanced with a selection of views for data engineers, business analysts, and data scientists, through menus that allow users to select logical or physical views of metadata. The catalog has also been expanded beyond the usual roster of database sources to crawl metadata from cloud services such as Microsoft Power BI, Qlik Sense, AWS Glue, Google Cloud, Snowflake, and other sources.

Rounding out the spring release is the exposure of customer master data through an underlying graph database, which provides a more intuitive way of representing and exploring customer relationships. The new release is now available on AWS and Azure, and in beta on Google Cloud.
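To show why a graph is a natural fit for exploring customer relationships, here is a minimal sketch that stores relationships as edges and walks outward from one customer. It uses plain Python dictionaries rather than any graph database, and the entities, relationship labels, and function names are invented for illustration; the article does not say which graph engine Informatica uses.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (entity, relation, entity) triples."""
    graph = defaultdict(set)
    for a, relation, b in edges:
        graph[a].add((relation, b))
        graph[b].add((relation, a))
    return graph


def related(graph, start, max_hops):
    """Breadth-first search: map each entity within max_hops of start to its distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _, neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    distance.pop(start)
    return distance


# Hypothetical customer master data relationships.
edges = [
    ("Alice", "household", "Bob"),
    ("Bob", "employee_of", "Acme"),
    ("Acme", "subsidiary_of", "Globex"),
]
graph = build_graph(edges)
print(related(graph, "Alice", max_hops=2))
```

In a relational model, each extra hop would typically mean another join; in a graph, widening the search is just a larger hop limit.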
