AWS re:Invent 2018 Postmortem: Surprise! Hybrid enters the agenda

With all the hype, one would think that machine learning would have dominated the spotlight at re:Invent. Maybe it did, but for us, AWS's announced preview of a hybrid cloud offering stole our attention.
Written by Tony Baer (dbInsight), Contributor

It is all too easy to get jaded by the scale of AWS's annual re:Invent extravaganza: 50,000-plus attendees jamming multiple hotels the length of the Vegas strip, from Mandalay Bay on the south to the Wynn and Encore on the north. Not to mention the stream of announcements. My colleagues have already provided wall-to-wall coverage over the past week, from partner announcements to machine learning, IoT, data lake "formation," to Werner Vogels' official goodbye to Oracle.

At the analyst preview event on the day before the announcements started going live, one of the questions was whether the 60+ announcements being delivered before the group signified a slowdown from the 80 or so announcements delivered last year. AWS's answer? The analyst relations group took a Sharpie to edit down the list.

AWS expected that the headline would be machine learning. It polled the analysts before the event, and machine learning is what we wanted, too. Now hold that thought.

It's hard to ignore that there is an arms race over AI from which no technology vendor is exempt. Consider AI, or machine learning, as the newest enterprise computing checkbox item. Regardless of whether your organization is ready to adopt AI, you probably don't want to buy from a vendor that is behind the curve.


AI and machine learning are races for which no good deeds go unpunished. Google has the reputation for being the AI company, but by virtue of its dominance of the cloud computing market, Amazon Web Services is the place where most AI workloads go into production. While Google created TensorFlow, arguably the most popular machine learning and deep learning framework, AWS rolled out stats showing that 85 percent of all TensorFlow workloads are running on AWS.

While there will continue to be debates on whether AWS contributes to or consumes from the community (it is open sourcing a new compiler for deep learning jobs), its core message is all about production. Yes, there were some announcements aimed at building the community. Opening a new section of AWS Marketplace specifically for machine learning models will provide a clearer path for developers and creators to monetize their ML algorithms.

But the direction of AWS's ML announcements centered on reducing cost and the headaches associated with data wrangling and making the process easier and more frictionless. Ovum's research shows that data wrangling easily hogs the bulk of the data scientist's time. If you can automate enough steps and get decent support from data engineers, maybe you can get that burden reduced to about half the data scientist's time.

At one end of the spectrum, the agenda was about the arms race for crunching ML workloads -- a new Amazon EC2 P3dn instance promises 4x the network bandwidth and twice the memory of the existing P3s, targeting larger, more complex models. At the other end, Amazon Elastic Inference was announced to help customers use compute infrastructure more efficiently, scaling inference acceleration up and down to minimize consumption of precious GPU resources. Additionally, AWS is introducing a custom chip -- AWS Inferentia -- as a more economical alternative to GPUs.


Focusing on process, AWS is adding a new service to Amazon SageMaker targeting the ordeal of labeling training data. Amazon SageMaker Ground Truth provides three options: mobilizing on-demand workforces through Mechanical Turk, using a private workforce, or relying on automated labeling.

OK, that's just a sampling of the AI and ML announcements that AWS made last week. But let's get back to our core point: while AI dominated the spotlight, AWS's announcement of a private preview of a new hybrid cloud offering proved the sleeper. AWS Outposts packages a rack of AWS compute and storage for installation in the customer's own data center, where customers can run many of the same services that are offered in the AWS cloud. The initial announcement did not specify which services would run on Outposts or which compute and storage instances would be supported. It will offer two options: one for packaging your workloads using the familiar VMware control plane you already use on premises, or the native EC2 environment that you are using in the AWS cloud. That makes Outposts the most significant (and logical) extension of the AWS-VMware relationship since it was announced a couple of years ago.


Given that there are five categories, 16 instance families, and 44 instance types of AWS infrastructure, we'd be totally shocked if AWS offered every single permutation of EC2 instance on Outposts.

What's important is that Outposts is not a private cloud offering, but instead, a hybrid cloud. It is an extension of the VPC that the customer is already running in the nearest AWS region. The idea is being able to choose which AWS workloads run inside the firewall.

In our mind, we have this totally ridiculous image of an AWS Snowmobile 18-wheeler making roundtrips -- carting off a hundred petabytes of data from some customer to the nearest Availability Zone (AZ), and then taking a chunk of that AZ and delivering it back to some other customer. OK, we're just making that one up.


Outposts is not the first move for AWS outside the confines of its AZs: AWS Greengrass for remote IoT use cases and AWS Snowball Edge for preprocessing of data were the first forays. And at re:Invent, AWS added to this family with a more compute-heavy Snowball Edge designed specifically for bringing ML out to the edge.

AWS's move to hybrid doesn't come in a vacuum. While enterprises are moving more workloads to the cloud, there are a host of legal and practical reasons why at least some workloads and data will always remain on-premises. As a pure cloud provider, this is the one place where AWS has been at a competitive disadvantage. Microsoft already offers a limited replica of Azure services with Azure Stack, and in the database realm, offers several options for SQL Server customers to have their workloads managed or transitioned to the cloud. Meanwhile, Oracle offers a completely managed cloud with Cloud at Customer; IBM in turn offers IBM Cloud Private (ICP), software that implements IBM Cloud services on hardware of the customer's choosing.

Re:Invent was hardly short on database and storage announcements. AWS, the cloud platform provider with purpose-built databases, has added a couple more. Amazon Timestream is a new time series database that, as the name implies, is optimized for time series data. While time series data is hardly new, IoT has pushed the urgency for more efficient storage and retrieval of time series data to the front burner. InfluxData and Timescale are among a series of new time series database providers whose introductions have invaded our inbox over the past year. Among cloud providers, AWS has fired the warning shot.
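Why do time series workloads merit purpose-built storage? Because samples arrive in timestamp order, a store that keeps them append-ordered can answer a time-range query with two binary searches and a contiguous scan. A toy sketch of that idea (purely illustrative; this is not Timestream's API, which was still in preview):

```python
from bisect import bisect_left, bisect_right

# Toy time series store: readings arrive in timestamp order, so a
# time-range query reduces to two binary searches over a sorted list.
class TinySeries:
    def __init__(self):
        self.timestamps = []  # monotonically increasing epoch seconds
        self.values = []

    def append(self, ts, value):
        if self.timestamps and ts < self.timestamps[-1]:
            raise ValueError("out-of-order sample")
        self.timestamps.append(ts)
        self.values.append(value)

    def range(self, start, end):
        # O(log n) to locate the window, then a contiguous slice.
        lo = bisect_left(self.timestamps, start)
        hi = bisect_right(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))

s = TinySeries()
for t, v in [(100, 1.0), (160, 1.2), (220, 0.9), (280, 1.1)]:
    s.append(t, v)
print(s.range(150, 250))  # [(160, 1.2), (220, 0.9)]
```

A general-purpose relational table gives you none of this for free, which is the gap that Timestream, InfluxData, and Timescale are all chasing.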

Under the heading of buzzword compliance, AWS has introduced Amazon Quantum Ledger Database (QLDB), which, contrary to its branding, is an immutable ledger database, not a quantum computing database. But at least it got your attention. It's targeted at one form of use case: the need for a centralized, immutable ledger where there is a central trusted authority. For use cases that require decentralized trust and genuinely need blockchain, AWS is introducing Amazon Managed Blockchain, which, technically, is not a database.
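The centrally trusted ledger idea is easy to illustrate: each entry carries a hash that chains in its predecessor's hash, so any later tampering is detectable on verification. A minimal sketch (our own illustration, not QLDB's actual API):

```python
import hashlib
import json

# Illustrative only, not QLDB's API: a centralized, append-only ledger
# where each entry's hash chains in the previous entry's hash.
def entry_hash(prev_hash, payload):
    blob = prev_hash + json.dumps(payload, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

class Ledger:
    def __init__(self):
        self.entries = []  # list of (payload, digest) pairs

    def append(self, payload):
        prev = self.entries[-1][1] if self.entries else ""
        self.entries.append((payload, entry_hash(prev, payload)))

    def verify(self):
        # Recompute the whole chain; any edited payload breaks it.
        prev = ""
        for payload, digest in self.entries:
            if entry_hash(prev, payload) != digest:
                return False
            prev = digest
        return True

ledger = Ledger()
ledger.append({"txn": 1, "amount": 50})
ledger.append({"txn": 2, "amount": -20})
intact = ledger.verify()  # True: chain is consistent
# Tamper with the first payload without recomputing its hash:
ledger.entries[0] = ({"txn": 1, "amount": 500}, ledger.entries[0][1])
tampered = ledger.verify()  # False: tampering is detected
```

The point of the central-authority framing is that no consensus protocol is needed; one writer appends, and anyone can audit the chain.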


AWS has also introduced several extensions to its existing database platforms. Amazon Aurora MySQL adds a Global Database feature that allows updates in a single region to be replicated rapidly across other regions, extending Aurora's in-region log-based replication to cross-region replication. In turn, Amazon DynamoDB has added ACID transaction support that, in essence, batches the updates associated with a single interaction; consistency is strong within a region and eventual across regions. And there is a new On-Demand option, targeted at applications with unpredictable or infrequent usage, where customers pay per request.
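To make the DynamoDB transaction support concrete, here is a sketch of a TransactWriteItems request that atomically creates an order and debits an account. Table and attribute names are hypothetical, and the request is only constructed, never sent; with boto3 you would pass it to `client.transact_write_items(**request)`:

```python
# Sketch only: builds a DynamoDB TransactWriteItems request locally.
# Table and attribute names are hypothetical; nothing is sent to AWS.
def build_transfer(order_id, account_id, amount):
    return {
        "TransactItems": [
            {   # Create the order, but only if it doesn't already exist...
                "Put": {
                    "TableName": "Orders",
                    "Item": {
                        "OrderId": {"S": order_id},
                        "Amount": {"N": str(amount)},
                    },
                    "ConditionExpression": "attribute_not_exists(OrderId)",
                }
            },
            {   # ...and debit the account in the same all-or-nothing unit.
                "Update": {
                    "TableName": "Accounts",
                    "Key": {"AccountId": {"S": account_id}},
                    "UpdateExpression": "SET Balance = Balance - :amt",
                    "ConditionExpression": "Balance >= :amt",
                    "ExpressionAttributeValues": {":amt": {"N": str(amount)}},
                }
            },
        ]
    }

request = build_transfer("o-123", "a-456", 25)
```

If either condition fails -- the order already exists, or the balance is insufficient -- the whole transaction is rejected, which is exactly the batched, single-interaction semantics described above.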

But going back to that elusive theme of machine learning: one of the advantages of the cloud is that it is supposed to simplify computing. But as one size does not fit all, AWS has grown its portfolio of EC2 instances and managed services over the years to the point where choosing the right mix can be daunting. A helpful move in that direction is a new intelligent tiering storage class for S3 that uses machine learning to determine whether your data can be moved to a cooler, cheaper tier. But we're awaiting the day when AWS finally unveils a machine learning tool that analyzes your workload and data to recommend the best combination of EC2 instances.
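Opting existing data into the new tier is a lifecycle configuration away. A sketch, with a hypothetical bucket prefix, that only builds the configuration locally (with boto3 you would hand it to `s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=lifecycle_config)`):

```python
# Sketch only: a lifecycle rule transitioning objects into the new
# S3 INTELLIGENT_TIERING storage class. The "analytics/" prefix is
# hypothetical; nothing here touches AWS.
lifecycle_config = {
    "Rules": [
        {
            "ID": "tier-analytics-data",
            "Filter": {"Prefix": "analytics/"},
            "Status": "Enabled",
            "Transitions": [
                # Once transitioned, S3 watches access patterns and
                # shuffles each object between frequent- and
                # infrequent-access tiers on its own.
                {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}
            ],
        }
    ]
}
```

New objects can skip the lifecycle step entirely by being uploaded with the `INTELLIGENT_TIERING` storage class in the first place.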

That's where we'll briefly direct your attention to another startup that we discovered on the re:Invent expo floor: Accelerite is about to introduce a new tool for analytics that introspects your data and recommends, for instance, whether Amazon Athena, Redshift, or EMR would be your most economical option. Stay tuned. We'll dish out the dirt after they release their product next month.
