Semantics and virtualization, data integration and data governance on the way to the cloud

It's official: data analytics is swiftly moving to the cloud. AtScale is facilitating this, and CEO Chris Lynch shares his insights on the hows and whys, in light of the unveiling of AtScale's Adaptive Analytics 2020.1.

Written by George Anadiotis, Contributor Jan. 15, 2020 at 6:06 a.m. PT

AtScale is a data virtualization provider for analytics. What this means is that AtScale provides an abstraction layer that enables its users to access the underlying data stores it supports in a streamlined way. Last time we covered AtScale, in November 2016, AtScale had just broken out of the Hadoop box. Today, AtScale announced what it dubs a leap in multi-cloud and hybrid cloud analytics, data platform flexibility and time-to-analysis with the launch of its Adaptive Analytics 2020.1 platform release.

ZDNet connected with AtScale CEO Chris Lynch to discuss the data and analytics market and AtScale's positioning in it.

A cloud migration path that doesn't require a wholesale rip and replace

The announcement of AtScale Adaptive Analytics 2020.1 emphasizes secure, self-service analysis while speeding up performance of underlying data stores. According to AtScale's Cloud Data Warehouse Benchmark Report, AtScale reduces compute costs by 10x, improves query performance by 12.5x and enhances user concurrency by 61x.

Conceptually, the way AtScale achieves those results is the same as it has been since 2016: design, cache, query. Additional enhancements in AtScale 2020.1 include a virtual cube catalog for simplified management of data assets and granular policy control that integrates natively with existing enterprise data catalog offerings.
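To make the "design, cache, query" idea more concrete, here is a minimal sketch in Python. It is not AtScale's implementation: the `sales` table, the `MODEL` dictionary and the helper functions are all hypothetical, and an in-memory SQLite database stands in for the underlying data store. It only shows the general pattern of defining a logical model, materializing an aggregate, and answering a query from that aggregate when it covers the request.

```python
import sqlite3

# Hypothetical raw fact table standing in for an underlying data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("EU", "A", 10.0), ("EU", "B", 5.0), ("US", "A", 7.5), ("US", "A", 2.5)],
)

# 1. Design: a logical model naming one measure and its dimensions (assumed shape).
MODEL = {"table": "sales", "measure": "amount", "dimensions": ["region", "product"]}


# 2. Cache: materialize an aggregate table over a chosen subset of dimensions.
def build_aggregate(conn, model, dims):
    name = "agg_" + "_".join(dims)
    cols = ", ".join(dims)
    conn.execute(
        f"CREATE TABLE {name} AS "
        f"SELECT {cols}, SUM({model['measure']}) AS total "
        f"FROM {model['table']} GROUP BY {cols}"
    )
    return name


# 3. Query: use an aggregate that covers the requested dimension, else the raw table.
def total_by(conn, model, dim, aggregates):
    for dims, name in aggregates.items():
        if dim in dims:
            sql = f"SELECT {dim}, SUM(total) FROM {name} GROUP BY {dim} ORDER BY {dim}"
            return name, conn.execute(sql).fetchall()
    sql = (
        f"SELECT {dim}, SUM({model['measure']}) FROM {model['table']} "
        f"GROUP BY {dim} ORDER BY {dim}"
    )
    return model["table"], conn.execute(sql).fetchall()


aggregates = {("region",): build_aggregate(conn, MODEL, ["region"])}
source, rows = total_by(conn, MODEL, "region", aggregates)
print(source, rows)
```

In this toy version, a query by region is served from the pre-built aggregate, while a query by product falls back to the raw table. A real system would presumably decide which aggregates to build based on observed query patterns, which this sketch does not attempt.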

The data and analytics landscape today, however, is different, and this is what drove the exchange with Lynch. Previously on ZDNet, Lynch stated that he aims to make sure that the percentage of AtScale customers running the product on Hadoop will become proportionate to Hadoop's share of the analytics market, relative to other platforms.

It's official: Hadoop is legacy, says AtScale CEO Chris Lynch

Our own feeling was that today, most existing Hadoop on-premise deployments are legacy. Some are migrating to the new Cloudera in the cloud. Some to the new Cloudera distribution on-premise. Some to Databricks. And some to data warehouses in the cloud. Lynch confirmed that a sea change in customers' data platform choices is taking place:

"We continue to book new business and expansions on Hadoop environments. However, that's the exception to the rule. We are not seeing new customers looking to deploy on Hadoop (or Teradata or Oracle for that matter) on a net-new basis. Instead, we see the cloud data warehouses like Snowflake, Google BigQuery and Amazon Redshift (in that order) being the most popular target data platforms for AtScale.

Like on-premise Hadoop, we see customers who are interested in deploying AtScale on legacy data warehouses like Teradata and Oracle but with the intention of holding steady or declining those platform investments and focusing new workloads in the public cloud.

Since AtScale virtualizes those data platform choices, our customers really like the fact that they can leverage their existing investments in on-premise data platforms (Hadoop, Teradata, Oracle) while migrating to the cloud without disrupting their downstream users. We give our customers a migration path that doesn't require a wholesale rip and replace".

Lynch went on to add that in terms of cloud data warehouses, they are seeing tremendous traction from Snowflake, with Google BigQuery a distant second, Amazon Redshift a distant third and Azure SQL/DW not really on the radar. Google BigQuery and Snowflake are really popular with retailers, at the expense of Amazon Redshift. An in-depth analysis of the performance of those offerings is available in AtScale's Cloud Data Warehouse Benchmark Report.

Semantics and virtualization, data integration and data governance

Hearing these insights from the trenches was an affirmation. But useful as this may be, this is but a snapshot; tools and preferences change. As far as AtScale is concerned, however, a crucial point is whether its middleware semantic layer can address only these specific solutions, or whether it can also be applied to others.

Lynch said that from the beginning, AtScale's platform was architected to talk to any data platform that exposes a JDBC interface. Hadoop was chosen as the first go-to-market target, but by no means was the product restricted to that platform:

"To be a universal semantic layer, you have to talk to any data platform and allow connections from any data consumer (BI, AI or custom application). We did just that. In this release, we extended our virtualization capabilities to allow blending of data from multiple data platforms in one virtual cube.

Our expansion to other data platforms is just a normal course of work for us based on customer demand. For example, in this release, we've added support for legacy data warehouses like Teradata, Oracle, SQL Server and DB2 based on customer requests".
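As a rough illustration of what blending data from multiple platforms behind one logical view can look like, here is a minimal Python sketch. It is an assumption-laden toy, not AtScale's virtual cube: two in-memory SQLite databases stand in for two separate data platforms, and the `orders` and `customers` tables, the `make_store` helper and the `revenue_by_segment` function are all hypothetical.

```python
import sqlite3


def make_store(ddl, insert_sql, rows):
    """Create an in-memory database standing in for one data platform."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Hypothetical platform 1: a cloud warehouse holding order facts.
warehouse = make_store(
    "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 20.0), (2, 5.0), (1, 10.0)],
)

# Hypothetical platform 2: a legacy store holding customer attributes.
legacy = make_store(
    "CREATE TABLE customers (customer_id INTEGER, segment TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "retail"), (2, "wholesale")],
)


def revenue_by_segment(orders_conn, customers_conn):
    """Blend the two stores: aggregate facts at the source, join in the middle layer."""
    totals = orders_conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    ).fetchall()
    segments = dict(
        customers_conn.execute("SELECT customer_id, segment FROM customers").fetchall()
    )
    blended = {}
    for customer_id, total in totals:
        segment = segments.get(customer_id, "unknown")
        blended[segment] = blended.get(segment, 0.0) + total
    return blended


result = revenue_by_segment(warehouse, legacy)
print(result)
```

The consumer of `revenue_by_segment` never needs to know that orders and customers live in different systems, which is the general point of a virtualization layer. How a production system plans and optimizes such cross-platform queries is well beyond this sketch.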

For us, this is the point where we needed to open up the discussion to include what we consider bona fide semantics, too. The AtScale Adaptive Analytics 2020.1 platform release comprises three pillars: a multi-source intelligent data model, a virtual cube catalog, and self-optimizing query acceleration structures.

The first two looked to us like a one-to-one match for data integration and data governance, respectively, and Lynch confirmed that data integration is a primary use case for AtScale, and data governance is a close second.

Once you have a universal semantic layer that includes intelligent virtualization and is optimized for analytics, Lynch said, many use cases open up. Primarily, he went on to add, with a universal semantic layer, customers can enforce data access policies and security consistently regardless of where the data is stored or who is querying it.

We would not argue with that. What we wondered, however, is whether knowledge graph technology, which offers the same benefits, but also happens to be standardized, as opposed to AtScale's patented technology, could be an alternative.

Lynch replied that AtScale has built and patented a graph-based query optimizer and planner, and that this has been a key technology differentiator for the company from day one. That may be true, but it does not address what we feel is the most important aspect of knowledge graph standardization: the data models per se. With AtScale, as with any proprietary implementation, data models and mappings are proprietary, too.

As far as future plans are concerned, although Lynch did not share specific details, he hinted at upcoming innovations in deployment strategies and autonomous model creation and data discovery.

Our takeaways from it all are rather clear. AtScale, for its part, continues to execute on the path it has carved for itself, apparently with success. By doing so, it gives us valuable insights into what the data and analytics landscape looks like today. And, we might add, a hint as to where it's going: semantics and virtualization, data integration and data governance.