Google BigQuery Omni connects customers to data in AWS and Azure

The new multi-cloud analytics option from Google Cloud, announced at the company's Cloud Next virtual conference, allows BigQuery to run in other clouds, query data there, and shuttle the results back to the mother ship.

At its Next '20 "OnAir" virtual event today, Google Cloud is announcing BigQuery Omni, a new service -- perhaps better characterized as a new modality -- that allows the BigQuery data warehouse service to query data domiciled in other clouds. The offering has the potential to shake up the cloud data warehouse market, blurring the lines not only between data warehouse and data lake, but also between Google Cloud Platform (GCP), Microsoft Azure and Amazon Web Services (AWS). And on that note, Google says BigQuery Omni is available for AWS now (albeit in private alpha), with access to Azure "coming soon."

Google has made clear that Omni isn't just a big branding exercise about connectors to S3 and ADLS (Amazon's Simple Storage Service and Azure Data Lake Storage, respectively), either. Instead, BigQuery Omni actually involves BigQuery clusters physically running in the cloud on which the remote data resides. As such, Omni does have the potential to be a proverbial game changer.

It's a K8s world and we're just running in it

BigQuery Omni is enabled by Google Cloud's Anthos technology, which allows Google Cloud Platform (GCP) services to run on other clouds by deploying the software of the service in question as containers orchestrated by Kubernetes (K8s). In the case of BigQuery Omni, even the clusters on non-Google clouds run as multi-tenant services, so customers needn't worry about provisioning K8s clusters in their own accounts. Although identity and access management (IAM) permissions on the remote cloud need to be set up, BigQuery Omni is still serverless, just like BigQuery "classic." The diagram in the figure below illustrates this architecture, in the AWS remote cloud use case, at a high level.

BigQuery Omni high-level architecture

Credit: Google Cloud

The benefit here is that the query -- or part of the query -- involving Avro, CSV, JSON, ORC, or Parquet data on the remote cloud is executed there and only the results of that query -- if that -- need to be transferred back to GCP. This, of course, is more cost-efficient than moving all the source data back to Google's cloud and querying it there, since egress charges from the remote cloud provider will apply only to the result set.
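To put rough numbers on that argument, here is a minimal back-of-the-envelope sketch in Python. It assumes a flat per-gigabyte egress rate, and the rate, the data sizes, and the function names are all hypothetical values chosen for illustration, not published pricing from any cloud provider and not part of any BigQuery API.

```python
def egress_cost_usd(gigabytes: float, rate_per_gb: float) -> float:
    """Cost of moving `gigabytes` out of the remote cloud at a flat per-GB rate."""
    if gigabytes < 0 or rate_per_gb < 0:
        raise ValueError("gigabytes and rate_per_gb must be non-negative")
    return gigabytes * rate_per_gb


def compare_egress(source_gb: float, result_gb: float, rate_per_gb: float) -> dict:
    """Compare shipping all source data home with shipping only the result set."""
    move_everything = egress_cost_usd(source_gb, rate_per_gb)
    move_results_only = egress_cost_usd(result_gb, rate_per_gb)
    return {
        "move_everything": move_everything,
        "move_results_only": move_results_only,
        "savings": move_everything - move_results_only,
    }


if __name__ == "__main__":
    # Hypothetical inputs: 10 TB of source data, a 5 GB result set,
    # and an assumed egress rate of $0.09 per GB.
    costs = compare_egress(source_gb=10_000, result_gb=5, rate_per_gb=0.09)
    for label, usd in costs.items():
        print(f"{label}: ${usd:,.2f}")
```

Real egress pricing is usually tiered and varies by region, so treat this as a way to reason about the order of magnitude rather than as a billing estimate. The point it makes is the one above: the smaller the result set relative to the source data, the more of the egress bill disappears.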

There is no free lunch, though. Even in more physically constrained scenarios, federated queries are complex and expensive. When the federation essentially involves different data warehouse clusters in separate data centers, let alone separate clouds, that complexity only increases. It is for that reason that the result set from an Omni query needs to be persisted to the remote cloud's storage layer and left there, or physically copied back to native BigQuery storage on GCP for further querying, processing and blending with data that was in BigQuery storage initially.

Keep your enemies closer

Google is introducing Omni in recognition of the fact that many customers have data on multiple clouds. While the company hasn't said so, it may also acknowledge that many customers have a majority of their data on AWS and/or Azure, given GCP's #3 position in the market. With BigQuery Omni, Google Cloud can accept the data gravity situation for what it is, and still entice customers to use its services -- like BigQuery and Looker -- to query, process and visualize that data.

Omni, and Anthos more generally, are smart realpolitik moves, and less capitulatory than competitive: customers who find GCP services attractive may then decide to land new data there...and maybe even move old data there from AWS and Azure. In other words, while data gravity begets service usage, the reverse can be true as well.

Tables turn

In many ways, offerings like Google Anthos and Azure Arc -- which allow services from one cloud to run on another, or resources on one cloud to be managed from another -- are akin to the old days when vendors acknowledged tech "heterogeneity" at their customer sites and invested in interoperability. Accommodating customers' multi-cloud strategies is thus consistent with precedent, but Kubernetes has the power to commoditize cloud services, or at least give the customer the illusion that it can. Of course, in the arena of negotiating cloud services contracts, such perception may be effective reality, so cloud providers need to be vigilant.

Customers, on the other hand, should celebrate the innovation here: containers in general, and Kubernetes in particular, are creating a kind of universal emulation environment that is starting to make cloud services fungible. Moving up 10,000 feet or so, this is a breakthrough that should drive new strategic thinking. While you'll probably always have a cloud of primary residence, Kubernetes and technologies based on it are making it easier, cheaper and more advantageous to have a second home. Now you'll just need to run two or more houses, and deal with paying taxes in multiple jurisdictions.