
Google Cloud Platform begins filling out its data stack

With release of three core database services, the Google cloud is beginning to address the enterprise story.

While Google Cloud Platform (GCP) paraded out a series of executive endorsements at its NEXT conference last spring, a comment from one of the participants shed valuable light on GCP's appeal. Coke's CTO pointed to GCP's advanced "NoOps" that enabled it to - literally - stitch together the Happiness Flag quilt from over 3 million photos contributed worldwide. The quilt was the star of Coke's digital marketing campaign for the 2014 World Cup. But the ability to rapidly mobilize viral content is not necessarily among the qualities that CIOs or CTOs would associate with the more mundane jobs of running, say, SAP or Oracle.

That's the narrative that Google wants to start changing with its rollout of a trio of enterprise data platforms today. Google is traversing the well-worn path that Amazon paved a decade ahead of it, figuring out ways to expose, monetize, and deliver the technologies that it has been using to run its core business.

Today's announcements encompass Cloud SQL, Google's managed MySQL service; Cloud Datastore, the JSON NoSQL database that has underpinned the Google App Engine platform-as-a-service (PaaS); and Cloud BigTable, the public cloud release of the data platform that originally inspired HBase.

Cloud SQL resembles Amazon Aurora in that both are fully managed implementations of MySQL that are adapted for the cloud. That means replication for fault tolerance across three zones or instances, and in Aurora's case, two copies of data in each zone. Both keep data encrypted and run on high-performance SSD Flash drives. The major differences are that Aurora is more mature (Cloud SQL is only now emerging from two betas) and that each service offers optimized integrations with other components or services specific to its provider's stack. Aurora is MySQL-compatible, in that it works with MySQL APIs, whereas Google claims that Cloud SQL is a more vanilla version of MySQL, minus a few features such as super-privileged user roles that would otherwise play havoc with backups.

Cloud Datastore is the counterpart to Amazon DynamoDB and Microsoft DocumentDB. While DynamoDB predates Cloud Datastore in commercial release, Google first surfaced the database as part of the Google App Engine back in 2008. What's new is that the GCP release decouples the database from the App Engine, although it still supports use of server-side scripts from it. Otherwise, each of these engines has similar specs, such as offering tunable consistency. But they differ on ACID transaction support: Cloud Datastore groups multiple operations in a single transaction that either succeeds or fails as a unit; DynamoDB relies on the application for ACID support; and DocumentDB's ACID guarantees are limited to data inside a single partition.
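To make the all-or-nothing idea concrete, here is a minimal sketch of a transaction that groups several operations so that either all of them take effect or none do. It is a toy in-memory store written for illustration only; `ToyStore`, `run_in_transaction`, and `transfer` are hypothetical names and are not part of the Cloud Datastore, DynamoDB, or DocumentDB APIs.

```python
import copy


class ToyStore:
    """Hypothetical in-memory key-value store with all-or-nothing transactions."""

    def __init__(self):
        self.data = {}

    def run_in_transaction(self, operations):
        # Apply the operations to a private copy so that a failure
        # partway through leaves the committed data untouched.
        staged = copy.deepcopy(self.data)
        try:
            for op in operations:
                op(staged)
        except Exception:
            return False  # nothing is committed
        self.data = staged  # every change becomes visible at once
        return True


def transfer(src, dst, amount):
    """Build an operation that moves `amount` from `src` to `dst`."""
    def op(d):
        if d[src] < amount:
            raise ValueError("insufficient balance")
        d[src] -= amount
        d[dst] += amount
    return op


store = ToyStore()
store.data = {"alice": 100, "bob": 20}

# A single valid transfer commits.
ok = store.run_in_transaction([transfer("alice", "bob", 30)])

# The second step of this transaction fails, so the first step is discarded too.
failed = store.run_in_transaction(
    [transfer("alice", "bob", 30), transfer("bob", "alice", 500)]
)
```

In this sketch the second call returns `False` and the balances stay as they were after the first call, which is the behavior an application would otherwise have to build for itself on a database that leaves ACID support to the application.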

Of course, no discussion of JSON cloud databases is complete without mentioning MongoDB. While there are several third parties that already offer MongoDB as a service, Mongo itself is now getting its hands dirty with its Atlas cloud service. The obvious differentiator is that, for the large MongoDB installed base, there is no new database to learn in going to the cloud. The caveat is that while the MongoDB database platform is fairly mature, its cloud service is brand new and still ramping up.

While Cloud Datastore is targeted at up to a few terabytes of data, when you go higher than that, it's time to move on to HBase, Cassandra, or Cloud BigTable, the Google wide-column database that inspired the other two. The capabilities are similar aside from Cloud BigTable's ability to do atomic, single-row operations. This is a case of coopetition: BigTable shares the same API as HBase, and you can get access to HBase through Google's Cloud DataProc service, which includes the rest of the Hadoop stack. Therefore, the key differentiator for Cloud BigTable is the managed aspect of the service.

Rounding out the announcements are the latest version of Cloud Storage, Google's object storage answer to Amazon's S3 and Microsoft's Azure BLOB storage, and the beginnings of a managed SQL Server 2016 service that for now covers Standard and Web Editions (Enterprise Edition will come later).

As Amazon S3 has become a de facto standard for bulk storage of data in the cloud, Google designed its object store to use the same APIs to encourage migrations. New in this release are the ability to encrypt data on ingest while restricting the key to the client (a security measure that prevents someone from hacking into the Google cloud to decipher the data), and performance enhancements that reduce or eliminate the 3 - 5 second latency of retrieving data from archive storage, making that data accessible to analytics.

As these are the first production releases of Google's data platforms (they were previously available through public beta), everything has to be viewed as a work in progress. While Google is putting the technology building blocks in place, it is still in the process of building the delivery and support channels. Today, Google is known for advanced technology - specifically its NoOps approach that self-manages operations in a black box. It is also known for advanced AI/machine learning capabilities where, unlike the mainstream of the industry, it is starting at the far end of the pool with deep learning. But when it comes to the heartbeat workloads that run enterprises, the burden of proof is on Google to show that its NoOps approach is just as mature for humdrum workloads as it is for stitching together complex global digital marketing campaigns.