The company most associated with "the cloud," in the minds of the general public, is Amazon, which happens to be the world's largest e-retailer. Today, Amazon is the world's largest provider of computing services accessible through the web from globally distributed servers in highly automated data centers.
Here's what we mean by that: When a computing service is made available to you anywhere in the world through the web, on servers with functions that have been leased by a publisher, software producer, or other private customer, there's a 34 percent chance (according to analyst firm Synergy Research Group) that this service is being hosted on Amazon's cloud. There's only a 15 percent chance that it's hosted on Microsoft's Azure cloud.
- In 2018, AWS delivered most of Amazon's operating income
- Top cloud providers 2019: AWS, Microsoft Azure, Google Cloud; IBM makes hybrid move, Salesforce dominates SaaS
What Amazon AWS does, generally speaking
Before we spout forth a fountain of fabulous terms such as "cloud infrastructure" and "virtual machine," let's try to explain what this AWS thing does, in terms even a CEO could understand.
Up until the mid-2000s, software was a thing you installed on your hard drive. It was intellectual property that you were granted the license to use, and either the entirety of that license was paid for up front, or it was subscribed to on an annual "per-seat" basis. A corporate network (a LAN) introduced the astounding technical innovation of moving that hard drive into a room full of other hard drives; otherwise, the principal idea was not much different. (Microsoft thrived in this market.)
The first truly brilliant idea that ever happened in corporate LANs was this: An entire computer, including its processor and installed devices, could be rendered as software. Sure, this software would still run on hardware, but being rendered as software made it expendable if something went irreparably wrong. You simply restored a backup copy of the software, and resumed. This was the first virtual machine (VM).
Now you could install the applications you needed to run on the virtual machine rather than a physical machine. Being virtual meant it could run anywhere, and soon, it became possible to relocate a VM between processors without noticeably affecting how the application on the VM was running.
If a business could install a web server on a virtual machine, it could attain the freedom to run those web servers anywhere it was practical to do so, rather than from the headquarters basement. The first great cloud service providers — Amazon among them — built their business models around hosting the virtual machines that ran customers' web servers, and selling that hosting for a time-based fee rather than an expensive license. Customers only paid for what they used, thus making high-quality service feasible for small and medium-sized businesses for the first time.
What "infrastructure" and "service" mean in the public cloud context
In any global economic system, the term infrastructure refers to the layers of services and support systems upon which the more visible components of the economy are based. Amazon Web Services, as was evident from the division's original name, enables websites to be hosted remotely. Since its inception, though, AWS has grown into the world's principal provider of virtual infrastructure -- the operating systems, hypervisors, service orchestrators, monitoring functions, and support systems upon which the economy of the public cloud is based.
We use the word "service" quite a bit in this article, though it's important that we use it intentionally rather than a word that just means "stuff" or "things", or the way "content" is used to refer to what you read on a website. AWS provides the following principal services:
- Software-as-a-Service (SaaS). On your PC, your web browser has become a rendering vehicle for applications delivered to you from a cloud provider. On your smartphone, its operating system can perform that same role, and the result can be an app whose core functionality exists in the cloud rather than being installed on the device. In both cases, the core of the software runs on the server while portions of the application execute on your device (the client), with the internet serving as the delivery medium.
- Platform-as-a-Service (PaaS). When your intention is to deliver software to your customers through the cloud, it becomes more practical to use tools that are located in the cloud to build that software effectively on-site ("cloud-native applications"). It also becomes feasible to optimize the billing model for that software -- for instance, by charging only for the customers' use of specific functions built on the cloud platform.
- Object data storage. Although it's fair to say Amazon did not create the cloud, it's equally fair to say it did create the market for bulk data storage and delivery. By this, we mean not just files but the mountains of structured and unstructured data that may constitute a database, or may not yet have coalesced into a database. The impetus for this part of the cloud revolution was AWS charging for the space that data actually consumed, rather than the volumes or hard drives that contain it. Now, AWS offers a variety of data service options best suited for the different ways that customers intend to use cloud-based data.
- Game of Clouds: Lock-In is Coming
- Why AWS re:Invent is arguably more important than Amazon's Black Friday, Cyber Monday bonanza
- Security is the no. 1 IT barrier to cloud and SaaS adoption (TechRepublic)
Putting the Amazon cloud to use
Let's first be very clear about what a cloud platform is. You've already read more definitions of "cloud" than there are clouds (in the sky), but here, we're talking about the operating system that reformulates multiple servers into a cohesive unit. For a group of computers anywhere in the world to be one cloud, the following things have to be made feasible:
- They must be able to utilize virtualization (the ability for software to perform like hardware) to pool together the computing capability of multiple processors and multiple storage devices, along with those components' network connectivity, into single, contiguous units. In other words, they must collect their resources so they can be perceived as one big computer rather than several little ones.
- The workloads that run on these resource pools must not be rooted to any physical location. That is to say, their memory, databases, and processes -- however they may be contained -- must be completely portable throughout the cloud.
- The resource pools that run these workloads must be capable of being provisioned through a self-service portal. This way, any customer who needs to run a process on a server may provision the virtual infrastructure (the pooled resources for processing and other functions) needed to host and support that process, by ordering it through the web.
- All services must be made available on a per-use basis, usually in intervals of time consumed in the actual functioning of the service, as opposed to a one-time or renewable license.
The US National Institute of Standards and Technology (NIST) declared that any cloud service provider (CSP) to which the US Government would subscribe must, at a minimum, provide these four capabilities.
If NIST had the opportunity to add a fifth component, given the vast amount of history that has taken place in the few short years of the public cloud's prominence, it would probably be support. AWS may be a public cloud, but it is also a managed service. That means it's administered to deliver particular service levels which are explicitly spelled out in the company's service-level agreements (SLA).
How do you get started with AWS?
It surprises some to learn that an AWS account is not an Amazon account with extra privileges. It's a security account that centralizes the access you're given to AWS services, and associates that access with a billable address. Not a shipping address, like a destination for goods ordered from Amazon.com, but rather an identity, like the login you may use for Windows.
There are ways you can use this AWS account to launch yourself into the AWS space without much, or quite likely without any, monetary investment. For the first year of each account, AWS sets aside 750 hours of free usage per month (also known as "the entire month") of a Linux- or Windows-based t2.micro virtual machine instance, which is configured like a single-CPU PC with 1 GB of RAM. Using that instance as a virtual server, you're free to set up an instance of an Amazon RDS relational database with up to 20 GB of storage, plus another 5 GB of standard S3 object storage. (You'll see more about these basic services momentarily.)
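The parenthetical about 750 hours being "the entire month" comes down to simple arithmetic, sketched here (the 750-hour figure is from the free tier described above; everything else is just calendar math):

```python
# Sketch: why 750 free hours per month covers one always-on t2.micro instance.
# The 750-hour allotment is from the free tier described above.

HOURS_IN_LONGEST_MONTH = 31 * 24  # 744 hours

def free_tier_covers_month(free_hours: float = 750.0) -> bool:
    """Return True if the monthly free-hour allotment exceeds a full month."""
    return free_hours >= HOURS_IN_LONGEST_MONTH

print(free_tier_covers_month())  # True: 750 > 744
```

In other words, a single free-tier instance can run around the clock all month without exhausting the allotment.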
Where can you learn how to use AWS?
From time to time, AWS convenes an online, half-day streaming conference to teach newcomers about how these services work. At the time of this writing, the next edition of the company's "AWSome Day" was scheduled for April 9, 2019, from 12:00 noon to 4:30 pm Eastern Time.
That online conference may give you a shove in the general direction of what you think you might need to know. If you have a particular business goal in mind, and you're looking for professional instruction, AWS sponsors instructional courses worldwide that are conducted in training centers with professional instructors, and streamed to registered students. For example:
- Migrating to AWS teaches the principles that organizations would need to know to develop a staged migration from their existing business applications and software to their cloud-based counterparts.
- AWS Security Fundamentals introduces the best practices, methodologies, and protocols that AWS uses to secure its services, so that organizations following specific security regimens can incorporate those practices into their own methods.
- AWS Technical Essentials gives an IT professional within an organization a more thorough introduction to Amazon services, and the security practices around them, with the goal being to help that admin or IT manager build and deploy those services that are best suited to achieving business objectives.
- Oracle's Ellison: No way a 'normal' person would move to AWS
- Cloud customers pairing AWS and Microsoft Azure, according to Kentik
How affordable is AWS really?
AWS' business model was designed to shift expenses for business computing from capital expenditures to operational expenditures. Theoretically, a commodity in which costs are incurred monthly, or at least more gradually, is more sustainable.
But unlike a regular expense such as electricity or insurance, public cloud services tend to spawn more public cloud services. Although AWS clearly divides expenses into categories pertaining to storage, bandwidth usage, and compute cycle time, these categories are not the services themselves. Rather, they are the product of the services you choose, and by choosing more and incorporating more of these components into the cloud-based assets you build on the AWS platform, you "consume" these commodities at a more rapid rate.
AWS has a clear plan in mind: It draws you into an account with a tier of no-cost service with which you can comfortably experiment with building a web server, or launching a database, prior to taking those services live. Ironically, it's through this strategy of starting small and building gradually that many organizations are discovering they hadn't accounted for just how great an operational expense the public cloud could become -- particularly with respect to data consumption.
Cost control is feasible, however, if you take the time to thoroughly train yourself on the proper and strategic use of the components of the AWS platform, before you begin provisioning services on that platform. And the resources for that cost control training do exist, even on the platform itself.
- Cloud cost control becoming a leading issue for businesses
- Cloud cost control also a challenge for small businesses and freelancers
What do Amazon's Web services do?
Selling the services that a computer performs is nearly as old a business as selling computers themselves. Certainly Amazon did not invent that either. The time-sharing systems of the 1960s made it possible for universities and institutions to recoup the enormous costs of acquiring systems, at a time before tuition revenues could have accomplished that by themselves. But at just the right time, Amazon very keenly spotted the one type of service that almost every business could utilize: the ability to run a virtual server that runs websites.
Elastic Compute Cloud
The product name for the first automated service that AWS performs for customers is Amazon Elastic Compute Cloud (EC2). This is the place where AWS pools its virtual resources into instances of virtual machines, and stages those instances in locations chosen by the customer to best suit its applications.
Originally, the configurations of EC2 instances mimicked those of real-world, physical servers. You chose an instance that best suited the characteristics of the server that you'd normally have purchased, installed, and maintained on your own corporate premises, to run the application you intended for it. Today, an EC2 instance can be almost fanciful, configured like no server ever manufactured anywhere in the world. Since virtual servers comprise essentially the entire web services industry now, it doesn't matter that there's no correspondence with reality. You peruse AWS' very extensive catalog, and choose the number of processors, local storage, local memory, connectivity, and bandwidth that your applications require. And if that's more than in any real server ever manufactured, so what?
You then pay for the resources that instance uses, literally on a per-second basis. If the application you've planned is very extensive, like a multi-player game, then you can reasonably estimate what your AWS costs would be for delivering that game to each player, and calculate a subscription fee you can charge that player that earns you a respectable profit.
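The estimation described above is straightforward arithmetic. A minimal sketch, assuming a hypothetical hourly rate (not a quoted AWS price), shows how per-second metering translates into a per-session cost:

```python
# Sketch: estimating EC2 cost at per-second granularity.
# The hourly rate is a hypothetical placeholder, not an actual AWS price.

def estimate_instance_cost(seconds_used: int, hourly_rate_usd: float) -> float:
    """Convert an hourly on-demand rate into a charge for seconds consumed."""
    per_second_rate = hourly_rate_usd / 3600.0
    return round(seconds_used * per_second_rate, 6)

# A 90-minute game session on an instance at a hypothetical $0.10/hour:
session_cost = estimate_instance_cost(seconds_used=90 * 60, hourly_rate_usd=0.10)
print(session_cost)  # 0.15
```

From a per-session figure like this, a publisher can work backward to a subscription price that clears a margin.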
- AWS rolls out new EC2 high memory instances, tailored for SAP HANA
- AWS' Z1d compute instance aims to be 'fastest in public cloud'
Elastic Container Service
Virtual machines gave organizations a way to deliver functionality through the internet without having to change the way their applications were architected. They still "believe" they're running in a manufactured server.
In recent years, a new vehicle for packaging functionality has come about that is far better suited to cloud-based delivery. It was called the "Docker container," after the company that first developed an automated mechanism for deploying it on a cloud platform (even though its name at the time was dotCloud). Today, since so many parties have a vested interest in its success, and also because the English language has run out of words, this package is just called a container.
AWS' way to deliver applications through containers rather than virtual machines is Elastic Container Service (ECS). Here, the business model can be completely different than for EC2.
Because a containerized application (sorry, there's no other term for it) may use a variable amount of resources at any particular time, you may opt to pay only for the resources that application does use, at the time it requests them. As an analogy, think of it like this: Instead of renting a car, you lease the road and pay for the gasoline consumed with each engine revolution, the oxygen burned with each ignition of a cylinder, and the carbon dioxide produced by the catalytic converter. With ECS, you're renting the bandwidth and paying for the precise volume of data consumed and the cycles required for processing, for each second of your application's operation. Amazon calls this pricing model Fargate, referring to the furthest possible point in the delivery chain where the "turnstile" is rotated and where charges may be incurred.
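Fargate-style metering charges compute and memory independently, per second of task runtime. A minimal sketch of that billing shape, with placeholder rates that are purely hypothetical:

```python
# Sketch of Fargate-style metering: charges accrue per vCPU-second and per
# GB-second of memory. Both rates below are hypothetical placeholders.

def fargate_cost(vcpus: float, memory_gb: float, seconds: int,
                 vcpu_rate: float = 0.000011,    # hypothetical $/vCPU-second
                 mem_rate: float = 0.0000012) -> float:  # hypothetical $/GB-second
    """Meter compute and memory separately for each second of task runtime."""
    return round(seconds * (vcpus * vcpu_rate + memory_gb * mem_rate), 4)

# A container task sized at 0.5 vCPU and 1 GB of memory, running for one hour:
print(fargate_cost(vcpus=0.5, memory_gb=1.0, seconds=3600))  # 0.0241
```

The point of the model is visible in the function signature: cost tracks the task's declared resources and its actual runtime, not a server you've reserved.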
- How Amazon and VMware are building one cloud for both platforms
- At re:Invent, AWS launches Private Marketplace for the enterprise
One very important service that emerges from the system that makes ECS possible is called Lambda, and for many classes of industry and academia, it's already significantly changing the way applications are being conceived. Lambda advances a principle called the serverless model, in which the cloud server delivers the functions that an application may require on a per-use basis only, without the need for pre-provisioning.
For instance, if you have a function that analyzes a photograph and isolates the portion of it that's likely to contain the image of a human face, you can stage that function in Amazon's cloud using the serverless model. You're not being charged for the VM or the container hosting the function, or any of the resources it requires; rather, AWS places its "turnstile" at the point where the function renders its result and terminates. So you're charged a flat fee for the transaction.
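The shape of such a function is simple: Lambda invokes a handler you name, passing it an event payload, and the charge attaches to that invocation. A sketch follows, with the face-detection logic replaced by a hypothetical stand-in that just returns a placeholder region:

```python
# Sketch of a Lambda-style handler. Lambda invokes the function with an event
# payload and charges per invocation. The detection logic is a hypothetical
# stand-in; a real function would analyze the referenced photo.

def handler(event: dict, context=None) -> dict:
    """Entry point the serverless platform would call with the request payload."""
    photo_id = event.get("photo_id", "unknown")
    # Real code would run face detection here; we return a placeholder region.
    return {"photo_id": photo_id,
            "face_region": {"x": 0, "y": 0, "w": 64, "h": 64}}

# Invoking the handler locally, the way a unit test would:
result = handler({"photo_id": "img-001"})
print(result["photo_id"])  # img-001
```

Note that nothing in the handler provisions or references a server; the platform supplies the event and collects the result, and that round trip is the billable unit.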
Although Amazon may not have had the idea for the serverless model, Lambda has advanced that model considerably. Now developers are reconsidering the very nature of application architecture, with the end result being that an entirely new economy may emerge around fluid components of functionality, as opposed to rigid, undecipherable monoliths.
- What serverless architecture really means, and where servers enter the picture
- To be a microservice: How smaller parts of bigger applications could remake IT
Simple Cloud Storage Service
As we mentioned before, one of Amazon's true breakthroughs was the establishment of S3, its Simple Storage Service (the word "Cloud" has since been wedged into the middle of its name). For this business model, Amazon places "turnstiles," if you will, at two points of the data exchange process: when data is uploaded, and when it's transacted by means of a retrieval call or a database query. So both input and output incur charges.
AWS does not charge customers by the storage volume, or in any fraction of a physical device consumed by data. Instead, it creates a virtual construct called a bucket, and assigns that to an account. Essentially, this bucket is bottomless; it provides database tools and other services with a means to address the data contained within it. By default, each account may operate up to 100 buckets, though that limit may be increased upon request.
Once data is stored in one of these buckets, the way AWS monetizes its output from the bucket depends upon how that data is used. If a small amount of data is stored and retrieved not very often, AWS is happy not to charge anything at all. But if you've already deployed a web app that has multiple users, and in the course of using this app, these users all access data stored in an S3 bucket, that's likely to incur some charges. Database queries, such as retrieving billing information or statistics, will be charged very differently from downloading a video or media file.
If AWS were to charge one flat fee for data retrieval -- say, per megabyte downloaded -- then with the huge difference in scale between a spreadsheet's worth of tabular data and a 1080p video, no one would want to use AWS for media. So S3 assumes that the types of objects that you'll store in buckets will determine the way those objects will be used ("consumed") by others, and AWS establishes a fee for the method of use.
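The consequence of metering storage, requests, and transfer separately is easiest to see side by side. A minimal sketch, with all rates as hypothetical placeholders rather than published S3 prices:

```python
# Sketch of S3's metering model: storage, requests, and data transfer out are
# each billed separately. All three rates are hypothetical placeholders.

def s3_monthly_cost(stored_gb, get_requests, transfer_out_gb,
                    storage_rate=0.023,       # hypothetical $/GB-month stored
                    request_rate=0.0000004,   # hypothetical $/GET request
                    transfer_rate=0.09):      # hypothetical $/GB transferred out
    """Sum the three meters that an S3-style service bills independently."""
    return round(stored_gb * storage_rate
                 + get_requests * request_rate
                 + transfer_out_gb * transfer_rate, 2)

# Same storage and request volume; tabular data vs. media-scale downloads:
print(s3_monthly_cost(stored_gb=50, get_requests=10_000, transfer_out_gb=1))
print(s3_monthly_cost(stored_gb=50, get_requests=10_000, transfer_out_gb=500))
```

The two buckets hold the same amount of data and field the same number of requests, yet the media-heavy one costs many times more, which is exactly the usage-sensitive behavior described above.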
- AWS rolls out new security feature to prevent accidental S3 data leaks by Catalin Cimpanu
- Amazon Web Services launches fully-managed AWS Backup by Asha McLean
AWS Database services
Here's where Amazon adds a third turnstile to the data model: by offering database engines capable of utilizing the data stored in S3 buckets. An AWS database engine is a specialized instance type: a VM image in which the database management system is already installed.
For relational data -- the kind that's stored in tables and queried using SQL -- AWS offers MariaDB (open source), Microsoft SQL Server, MySQL (open source), Oracle DB, PostgreSQL (open source), and Amazon's own Aurora. Any application that can interface with a database in one of these formats, even if it wasn't written for the cloud to begin with, can be made to run with one of these services. Alternatively, AWS offers DynamoDB for use with less structured key/value stores, DocumentDB for document-oriented data such as the long-form content in a content management system, and ElastiCache for dealing with high volumes of data in-memory.
Standing up a "big data" system, such as one based on the Apache Hadoop or Apache Spark framework, is typically a concentrated effort on the part of any organization. Though neither invokes the phrase, both Spark and Hadoop are effectively operating systems, enabling servers to support clusters of coordinated data providers as their core functionality. So any effort to leverage the cloud for a big data platform must involve configuring the applications running on these platforms to recognize the cloud as their storage center.
AWS approaches this issue by enabling S3 to serve as what Hadoop and Spark engineers call a data lake -- a massive pool of not-necessarily-structured, unprocessed, unrefined data. Originally, data lakes were "formatted," to borrow an old phrase, using Hadoop's HDFS file system. Some engineers have since found S3 actually preferable to HDFS, and some go so far as to argue S3 is more cost-effective. Apache Hadoop now ships with its own S3 connector, enabling organizations that run Hadoop on-premises to leverage cloud-based S3 instead of their own on-premises storage.
In a big data framework, the operating system clusters together servers, with both their processing and local storage, as single units. So scaling out processor power means increasing storage; likewise, tending to the need for more space for data means adding CPUs. AWS' approach to stationing the entire big data framework in the cloud is not to replicate Spark or Hadoop nodes as unaltered virtual machines, but instead to deploy a somewhat different framework that manages Hadoop or Spark applications while enabling S3-based data lakes to scale independently. AWS calls this system EMR (Elastic MapReduce), and it's made considerable inroads, capitalizing on Amazon's success in substituting for HDFS.
- We interrupt this revolution: Apache Spark changes the rules of the game by Scott M. Fulton, III
Amazon Kinesis Data Analytics
Kinesis leverages AWS' data lake components to stand up an analytics service -- one that evaluates the underlying patterns within a data stream or a time series, makes respectable forecasts, and draws apparent correlations as close to real-time as possible. So if you have a data source such as a server log, machines on a manufacturing or assembly line, a financial trading system, or in the most extensive example, a video stream, Kinesis can be programmed to generate alerts and analytical messages in response to conditions that you specify.
The word "programmed" is meant rather intentionally here. Using components such as Kinesis Streams, you do write custom logic code to specify those conditions that are worthy of attention or examination. By contrast, Kinesis Data Firehose can be set up with easier-to-explain filters that can divert certain data from the main stream, based on conditions or parameters, into a location such as another S3 bucket for later analysis.
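The Firehose-style filtering described above amounts to partitioning a stream by a condition: matching records are diverted to a side destination (say, another S3 bucket) while the rest flow onward. A minimal sketch of that logic, with a hypothetical predicate and sample records:

```python
# Sketch of a Firehose-style filter: records matching a condition are diverted
# to a side destination while the rest continue down the main stream.
# The predicate and the sensor readings below are hypothetical.

def split_stream(records, predicate):
    """Partition a stream of records by a filter condition."""
    diverted, passed = [], []
    for record in records:
        (diverted if predicate(record) else passed).append(record)
    return diverted, passed

readings = [{"sensor": "a", "temp": 71}, {"sensor": "b", "temp": 103},
            {"sensor": "c", "temp": 68}]
alerts, normal = split_stream(readings, lambda r: r["temp"] > 100)
print(len(alerts), len(normal))  # 1 2
```

In the managed service, the predicate is expressed as configuration rather than code, but the data flow it produces is the same.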
- Cloud Computing: Why this airline just went 'all in' on AWS by Steve Ranger
Amazon Elastic Container Service for Kubernetes
As Microsoft so often demonstrated during its reign as the king of the operating system, if you own the underlying platform, you can give away parts of the territory that floats on top of it, secure in the knowledge that you own the kingdom to which those islands belong.
In founding the market for virtualization, VMware set about to relocate the seat of power in the data center kingdom to the hypervisor. And in drawing most of the map for the public cloud market, Amazon tried to relocate it to the EC2 instance. Both efforts have yielded success. But Kubernetes, as an open source orchestrator of container-based applications, sought to plant a bomb beneath that seat of power, by effectively democratizing the way new classes of applications were created and deployed. It was Google's idea, though with just the right dosage of benevolence, Docker would step aside, bowing graciously, and even Microsoft would contribute to the plan.
AWS' managed Kubernetes service, called EKS and launched in July 2018, represents Amazon's concession to the tides of history, at least for this round. The previous July, Amazon joined the Cloud Native Computing Foundation -- the arm of the Linux Foundation that oversees development of the Kubernetes orchestrator.
This way, EKS can provide management services over the infrastructure supporting a customer's Kubernetes deployment, comparable to what Google Cloud and Azure offer. The provisioning of clusters can happen automatically. That last sentence doesn't have much meaning unless you've read tens of thousands of pages of Kubernetes documentation, the most important sentence from which is this: You pick a containerized application and tell EKS to run it; EKS configures the resources that application requires, you sign off on them, and it runs the app.
So if you have, say, an open source content management system compiled to run in containers, you just point EKS to the repository where those containers are located and say "Go." If all the world's applications could be automated in exactly this way, we would be living in a very different world.
- What Kubernetes really is, and how orchestration redefines the data center by Scott M. Fulton, III
VMware Cloud on AWS
The way VMware manages virtualized infrastructure for its vSphere customers, and the way Amazon manages its cloud infrastructure for AWS, are fundamentally different. However, the beauty of virtualization is that it's possible to configure a platform to run another platform (like the way Windows 10 runs Linux apps).
It took engineers from Amazon and VMware several years of working together before they could emerge with a system allowing VMware's NSX network virtualization layer (which enables the underlying resources of all servers in an enterprise network to be perceived as a single infrastructure) to be supported in AWS' proprietary cloud infrastructure. We don't exactly know how this works, but for now, it appears that it does work after all.
So with a product that VMware calls VMware Cloud on AWS, an existing vSphere environment may provision resources from the Amazon public cloud as necessary, including under policy-based automation rules. This way, compute power, storage, and to some extent database functionality may be brought into an enterprise network like fluids being trucked in to save an organization in crisis. And these dynamic resources may then be de-provisioned when they're no longer needed.
In recent months, Amazon has expanded its working relationship with VMware, enabling AWS for the first time to offer deploying its own server hardware on customers' premises. Certainly that's one way to bring the cloud closer to the enterprise.
AWS needed a means to compete against Microsoft's Azure Stack, which gives Azure customers the means to run Microsoft services in their own data centers the same way Azure would. That's easy enough for Microsoft, since much of Azure Stack is based on Windows Server already. Amazon's servers, by contrast, are clustered by an entirely different beast. AWS Outposts, as Amazon calls it, gives large enterprises a way to let that beast in through their back door, at least part-way, leasing them Amazon servers exclusively for their own use.
- VMware ramps up VMware Cloud on AWS, updates Cloud Foundation by Stephanie Condon
- AWS Outposts brings AWS cloud hardware on-premises by Asha McLean
- How Amazon's DeepLens seeks to rewire the old Web with new AI by Scott M. Fulton, III