Google launches serverless Spark, AI workbench, new data offerings at Cloud Next

Google Cloud BigQuery, Vertex AI, Spanner all see important enhancements. Tableau partnership and Google Earth Engine further enable analytics scenarios on Google Cloud.

While the cloud has been great for data and analytics -- given its limitless storage and compute capacity -- it has also caused a real regression in productivity for data professionals. The reason for this, simply put, is that the major cloud providers have hurled numerous data platforms at the market and left it to customers to pick the right combination of services, then integrate them. Say what you will about the old guard enterprise software behemoths, but they did spare their customers much of the "assembly required" experience that the cloud hyperscalers impose today.

Perhaps it's fitting, then, that Gerrit Kazmaier, until recently the data and analytics-focused Executive Vice President at SAP, is Google Cloud's newly minted Vice President & General Manager, Databases, Analytics & Looker. SAP is an enterprise software company if ever there were one. And whether there's a causal phenomenon at play, or it's just coincidentally apropos, Kazmaier briefed ZDNet on a number of new capabilities, announced today at Google's Cloud Next '21 digital event, that bring enterprise software-style "turnkey" operation to Google Cloud's data platform.

At the vertex of AI and analytics

The first big reveal from Google Cloud is a new offering within its Vertex AI service called Vertex AI Workbench. The Workbench is essentially a managed notebook experience that serves as an IDE (integrated development environment) for machine learning and AI work. It ties together Vertex AI's core components (like its training and prediction services) along with key components of the data platform like BigQuery, Dataproc and Dataplex.

This is the very kind of integration that has largely been missing from cloud analytics environments. Putting it all together helps data scientists, machine learning engineers and data engineers avoid having to change gears and lose their trains of thought, jumping from service to service. Having multiple services' UIs open in different browser tabs isn't integration; making an array of services available within the context of another, complementary one is.

Omni, present

Another of Google Cloud's big announcements today is the general availability (GA) of BigQuery Omni, which allows BigQuery users to get at data they have in Amazon Web Services (AWS) or Microsoft Azure. This is achieved by running instances of BigQuery in those competing clouds, performing the queries there and marshalling the results back to the Google Cloud home base. I wrote about Omni in detail when it was launched in preview in July of 2020.

Also read: Google BigQuery Omni connects customers to data in AWS and Azure

Kazmaier told ZDNet that customers including Electronic Arts and Johnson & Johnson have been using BigQuery Omni to great advantage. It's clear, from this and other announcements, that BigQuery is central to Google's "data cloud" strategy. Providing BigQuery access to data stored in other clouds is a must-have for Google, and GA of Omni is an important milestone.

Up with Spark, down with servers

The next announcement is one that is highly complementary to the others: an autoscaling, serverless implementation of Apache Spark, called Spark on Google Cloud, available as a preview service. Spark has become a ubiquitous commodity environment across the industry for all kinds of analytics, data engineering and machine learning workloads. Yes, cloud providers have built serverless Spark services for themselves; for example, data flows in Azure Data Factory execute on Spark clusters that customers never have to provision themselves, and code generated by AWS Glue does likewise. But using Spark to execute a particular step in most data and AI pipelines has required the explicit provisioning of a Spark cluster, and dealing with the latency required for the cluster to spin up.

Also read: Azure Data Factory v2: Hands-on overview

With the serverless Spark on Google Cloud, much as with BigQuery itself, customers simply submit their workloads for execution and Google Cloud takes care of the rest, executing the jobs without requiring the customer to size, or even think about, a discrete Spark cluster. The service will be integrated into -- you guessed it -- BigQuery, Dataproc, Dataplex and Vertex AI, allowing users of those services to leverage Spark without the burden of infrastructure provisioning and management.

Of Cloud (Spanner) and (Google) Earth

Next up: Google has implemented a PostgreSQL interface atop Cloud Spanner, its geographically distributed relational database service. While not an implementation of Postgres itself (something that is available on Cloud SQL), this offering allows code that uses Postgres' SQL dialect and wire protocol to work on Spanner. Compare this offering to the Postgres interface on AWS' Aurora database service or Azure Database for PostgreSQL Hyperscale. In both those cases, as with the Spanner Postgres interface, cloud-hosted, horizontally scaled databases are available to those with Postgres skillsets. The Spanner Postgres offering is available in preview.

And here's some more integration: 50+ petabytes of Google Earth data available to users of BigQuery, Google Cloud's ML technologies and Google Maps. The service, called Google Earth Engine, is being launched in preview.

Looker here

In case you forgot, Google Cloud owns Looker now. Heck, the Looker name is even in Kazmaier's title. And while, yes, Looker is a BI front-end in its own right, it seems Google sees just as much value in the LookML modeling language, with which Looker can define semantic models that make data more easily analyzed by BI users. To that end, Google's Connected Sheets technology, which allows users of Google Sheets to query data in BigQuery, will become compatible with LookML, something Google Cloud says it will release in preview form by the end of this year.

Beyond Connected Sheets, though, Google is announcing a partnership with Salesforce's Tableau that will soon provide that very popular business intelligence platform with access to Looker semantic models, via LookML, as well. While other industry players like Databricks, Informatica, Trifacta, Fivetran and Collibra will also be spotlight partners at Cloud Next, this partnership with Tableau is unprecedented and very interesting. It shows that Google Cloud knows it can't be a dominant data cloud provider without enlisting the help of partners from across the analytics world. It also shows, again, that Google pursued the Looker acquisition as much for Looker's back-end data modeling capabilities as for its front-end data visualization and dashboard capabilities.

Also read: Salesforce-Tableau, other BI deals flow; the tally's now five in a row

Hooking stuff together?

Bemoaning the relative lack of integration of cloud services that has existed up till now is no mere gripe. For customers to do the integration and hack through all the complexity is a ton of work, incurring a ton of risk and expense along with it. Microsoft has been addressing the integration vacuum with Azure Synapse Analytics and, one could argue, AWS has tried to do so with its Lake Formation offering.

Also read: Azure Synapse Analytics combines data warehouse, lake and pipelines

With today's announcements from Google Cloud, all three major cloud providers recognize the criticality of integrating their services. That's good, but all three also have a long way to go before their data and analytics offerings are simple to use, fully rationalized and seamlessly integrated. Eventually, though, the hyperscalers will be able to say, with legitimacy, that the cloud is the new enterprise stack.

Post updated on October 12th at 7:03pm ET to remove Wayfair from the list of customers using BigQuery Omni. Although Wayfair is a BigQuery customer, it has not adopted Omni.