What is Kubernetes? How orchestration redefines the data center
In a little over four years' time, the project born from Google's internal container management efforts has upended the best-laid plans of VMware, Microsoft, Oracle, and every other would-be king of the data center. So just what is it that changed everything?
The purpose of Kubernetes is not immediately obvious to anyone whose concept of a data center was formed in the era when the operating system was the platform upon which all software depended. Kubernetes is the product of a massive, ongoing realignment of the software resources that collectively make up a network application. That realignment is centered around the workload: a job performed by one or more applications, or one or more services, across a multitude of processors.
A workload is a job -- for example, managing a supply chain, overseeing logistics, tracking inventory, facilitating a securities market. Kubernetes has become the modern era's job control system.
"You can think of Kubernetes as a platform for application patterns," explained Google software engineer Janet Kuo, during a tutorial session at the KubeCon 2017 conference. "The patterns make your application easy to deploy, easy to run, and easy to keep running."
The declining virtue of virtual machines
There is a growing class of data center infrastructure that is geared to concentrate on the health and well-being of workloads rather than that of the servers. Whether they're physical processors or virtual machines, servers may fail. The impact of that failure upon the availability and functionality of these workloads should be less than minimal -- they should not appear impacted at all.
Up until 2016, the open source community had come up with a handful of methodologies for orchestrating workloads for maximum availability. In a very short period of time, Kubernetes became the choice of enterprises that had made investments in open source. The reasons why could constitute an entire book, and if written well enough, it could be adapted to one of those art theater movies that win over the critics but never the Oscars.
Here, perhaps, is the only reason that matters: Google's early move to spur the Linux Foundation's establishment of the Cloud Native Computing Foundation (CNCF) gave Kubernetes enough time to organically nurture a following among the broadest group of people. The entire open-source business model revolves around the value of support. Enterprises that no longer desire to be locked into a single vendor (which, admittedly, is not everyone) appreciate the newfound value of pluralism in a support system. A group of vendors acting, if not always in concert, then at least with some modicum of coordination toward the same goal, is superior to a single vendor leading a monopolistic platform in no particular direction.
Why Kubernetes matters now
Kubernetes is not owned by any one company, although it's based on a project called Borg that was originally developed internally at Google, and Google is often perceived to be the de facto leader of the Kubernetes development community. That said, Microsoft has retooled its server system philosophy around Kubernetes, and hired several of its principal creators. As an open-source project, Kubernetes is governed by the Cloud Native Computing Foundation (CNCF), which operates as part of the Linux Foundation.
Google originally designed Borg to suit its own internal purposes. So it's more than fair to use Google's search engine itself as an example: The basic job of hunting for entries that match a search query is conducted by hundreds, perhaps thousands, of individual services that share the responsibility. I'd say "countless," but that would be not only wrong but contrary to the whole point of Kubernetes: The orchestrator does keep count of all the services and components that make up the active job, or jobs, throughout the network.
There is no better term at present, sadly, for the container in which these distributed pieces of programs are contained than "container." (For a while, we called these things "Docker containers" to distinguish them from "Tupperware containers," but today, Docker comprises only a part of the container ecosystem; plus, there's more than one format of container.) If you're familiar with a ZIP file, which uses compression to bundle several files together into one, then you already understand quite a bit about modern software containers. Container images are packaged in much the same spirit, though not in the ZIP format itself: Their files are stacked in layers of tar archives, typically compressed with gzip. Those files are made up of just the executable elements and data the program would need to run, without having to look someplace else in the network. One of those elements may be the user-space portion of a small operating system -- a miniaturized version of Linux, typically, or from Microsoft, a tiny cousin of Windows called Nano Server -- while the kernel itself is typically shared with the host.
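To make the packaging idea concrete, here is a minimal sketch in Python that bundles a couple of files into a single gzip-compressed tar archive, which is the general shape of one layer in a container image. The function names, file paths, and file contents are invented for illustration, and a real image carries metadata and multiple layers that this sketch leaves out.

```python
import io
import tarfile


def build_layer(files: dict[str, bytes]) -> bytes:
    """Pack a mapping of path -> contents into one gzip-compressed tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for path, data in sorted(files.items()):
            info = tarfile.TarInfo(name=path)
            info.size = len(data)  # tar needs the size before the bytes are written
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def list_layer(layer: bytes) -> list[str]:
    """Return the paths stored in an archive produced by build_layer."""
    with tarfile.open(fileobj=io.BytesIO(layer), mode="r:gz") as archive:
        return archive.getnames()


# Hypothetical contents: one executable and the data it needs to run.
layer = build_layer({
    "app/search_worker": b"#!/bin/sh\necho ranking results\n",
    "app/index.dat": b"cached page index",
})
print(list_layer(layer))
```

Everything the program needs travels inside that one archive, which is why a container can be started on any node without fetching its dependencies from elsewhere.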
A program that was written for this method of containerized deployment, such as one that responds to search queries, could look through an index of cached Web pages for an entry that hasn't been selected yet, examine the semantic context of that entry for matches against the content of the query, rank the result, and register it in a list for later collection and retrieval. The program would then terminate. This is one of the characteristics of a distributed service that makes it so different from a PC application: It fulfills a request and then stops. It knows it's part of a much broader job, so once it fulfills its function, it ceases to exist. Software engineers borrow a concept from modern philosophy to describe this aspect: Ephemeralism. Unlike a GUI-based application, which spends most of its cycles waiting for a response from its user, an ephemeral service fulfills its function and then expires.
In a containerized network (again, so sorry, but there's no better word), programs are run in isolation from one another. Even though they may share the same processor and memory space, the host operating system outside the containers maintains their separation. (This joint dependency is exploitable in principle, and container-escape vulnerabilities have been found and patched over the years.) Communication between containers can take place only through a software-defined network. A more sophisticated SDN will give these containers network addresses strategically, taking into account how they will be collected together to perform a common job.
What it means to orchestrate workloads
Here is where orchestration enters the picture. Unlike "container," "orchestration" is a term that perfectly describes the role Kubernetes plays. While some have illustrated this concept using an orchestra conductor, there's a big difference between a conductor and an orchestrator, both in music and in distributed applications. The act of orchestration lays out the patterns for individual applications to work together, in concert with one another -- like instruments in a band. While the composer produces the software's original pattern, including its melodic line and rhythm (the term for assembling a software container actually is composition), the orchestrator makes the piece audible.
"This is why I call Kubernetes a 'composable platform,'" explained Brian Gracely, director of product strategy for Red Hat, during a recent company webinar. "There is somewhat of a framework of what it should look like -- and some of this comes from the Kubernetes community, some of it comes from years and years of experience across the community, of how to go about deploying applications."
The orchestrator's principal job is to maintain the operating status of the applications under its trust. In another era, that job was entrusted to the operating system. But that was when the platform was a single processor with a single bank of memory and dedicated storage devices. Today, there isn't much to materially link a containerized service with any broader context of an application. (With the most sophisticated of these architectures, utilized by huge cloud services such as Netflix, no such link exists at all.) Indeed, it is the orchestrator that takes the functionality and the work product of all these services, organizes them through some form of a manifest, and comes up with some semblance of an application. Change the manifest, and you might have a different application altogether.
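The manifest idea can be sketched in a few lines of Python. This is an illustration of the general pattern only, not Kubernetes' actual manifest format (real manifests are typically written in YAML) or its internal logic; the `reconcile` function and the service names are hypothetical. The point is that the orchestrator compares what the manifest asks for with what is actually running, and derives the corrective actions from the difference.

```python
def reconcile(manifest: dict[str, int], running: dict[str, int]) -> list[str]:
    """Compare desired replica counts with observed ones and list corrective actions."""
    actions = []
    # Look at every service named in either the manifest or the running set.
    for service in sorted(set(manifest) | set(running)):
        desired = manifest.get(service, 0)
        observed = running.get(service, 0)
        if observed < desired:
            actions.append(f"start {desired - observed} x {service}")
        elif observed > desired:
            actions.append(f"stop {observed - desired} x {service}")
    return actions


# Hypothetical desired state (the manifest) and observed state (the cluster).
manifest = {"query-parser": 2, "ranker": 3}
running = {"query-parser": 2, "ranker": 1, "legacy-report": 1}

for action in reconcile(manifest, running):
    print(action)
```

Edit the `manifest` dictionary and the same loop produces a different set of running services, which is the sense in which changing the manifest changes the application.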
There is nothing structurally unique that distinguishes Kubernetes from any other type of application. It is not a virtual machine. Its orchestrator runs on an operating system. When running, it maintains a cluster of nodes, which is a more abstract way of referring to servers that may be physical or virtual. On each of these nodes are pods of containers. And on each node runs an agent called the kubelet, which manages functions on behalf of the orchestrator for the node to which it's assigned. But even that is a program like any other.
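The hierarchy described here (a cluster holds nodes, a node holds pods, a pod holds containers) can be modeled as a rough sketch with a few Python data classes. The class names, the `report` and `status` methods, and the node and pod names are all assumptions made for illustration; they are not Kubernetes' own object model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    containers: list[str]  # names of the containers grouped in this pod


@dataclass
class Node:
    name: str
    pods: list[Pod] = field(default_factory=list)

    def report(self) -> dict[str, list[str]]:
        """Stand-in for the per-node agent: summarize what this node is running."""
        return {pod.name: pod.containers for pod in self.pods}


@dataclass
class Cluster:
    nodes: list[Node] = field(default_factory=list)

    def status(self) -> dict[str, dict[str, list[str]]]:
        """Collect every node's report into one cluster-wide view."""
        return {node.name: node.report() for node in self.nodes}


# Hypothetical two-node cluster running three pods.
cluster = Cluster(nodes=[
    Node("node-a", [Pod("search-1", ["ranker", "log-shipper"])]),
    Node("node-b", [Pod("search-2", ["ranker", "log-shipper"]),
                    Pod("cache-1", ["cache"])]),
])
print(cluster.status())
```

Nothing in the model says whether a node is a physical server or a virtual machine, which mirrors the abstraction the orchestrator itself works with.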
So Kubernetes is not like Hadoop, which truly remodels the structure of applications running in servers. Still, the distributed model that this orchestrator brings about is dramatically different from the one that prevailed up until 2016. Deployment models don't change with the times like fashion, cuisine, or political platforms. If we're being honest, Kubernetes' sudden rise to prominence is not on account of some suddenly realized need among all the world's enterprises to fling little bits of applications across the cloud. Kubernetes was the product of Google's need to make its globally accessible workloads manageable across tens of thousands of nodes. Very few other organizations in the world resemble Google, or would have Google's data center profile. Not every company runs its own search engine -- which, if you think about it, is why Google exists.
The appeal of distributed systems
So why, exactly, does Kubernetes or container orchestration have any appeal whatsoever to enterprises? The reasons for its true appeal have less to do with the workloads themselves, and much more to do with the development and deployment model around them:
Continuity -- When an application is composed of granular components, it becomes much easier to evolve that application granularly by updating and improving those components individually. The orchestrator can make appropriate adjustments in response to how those individual changes impact the workload as a whole. No longer do feature improvements to applications have to be implemented in massive overhauls -- which, more often than not, negatively impact their usability. The concept of continuous integration and continuous delivery (CI/CD, with the "D" often standing for "deployment") can be much more easily automated by a platform that's designed from the outset to comprehend deployment itself in smaller, more manageable steps.
Resilience -- Kubernetes maintains active replicas of container groups, called replica sets, for the express purpose of maintaining uptime and responsiveness in the event that any container or container grouping (what Kubernetes calls a pod) fails. This means a data center does not have to replicate the entire application, and trigger a load balancer to switch over to the secondary application should the primary one fail. In fact, a plurality of pods in a replica set are typically running at any one time, and the orchestrator's job is to maintain that plurality throughout the lifespan of the application.
Scalability -- The big payoff for organizations that orchestrate distributed workloads using Kubernetes is the built-in ability for workloads to multiply through the system as necessary -- to scale up and back down again, according to policy set in advance. To minimize the possibility for chaos, Kubernetes groups related containers together as pods. A service called the autoscaler can be set to automatically add replicas of a pod, which the scheduler then places on available nodes, when it determines that demand on those pods exceeds a target level of resource utilization, and to remove replicas again when demand subsides.
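The scaling policy can be illustrated with a short Python sketch. The Kubernetes documentation describes the Horizontal Pod Autoscaler's core calculation as the current replica count multiplied by the ratio of the observed metric to the target metric, rounded up. The function name, parameter names, and default bounds below are assumptions for illustration, and the real autoscaler also applies a tolerance band and stabilization rules that are omitted here.

```python
import math


def desired_replicas(current: int, observed_util: float, target_util: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Scale the replica count in proportion to observed vs. target utilization."""
    if target_util <= 0:
        raise ValueError("target_util must be positive")
    # Core ratio: more load than the target means more replicas, and vice versa.
    proposed = math.ceil(current * observed_util / target_util)
    # Keep the result inside the bounds set by policy (hypothetical defaults).
    return max(min_replicas, min(max_replicas, proposed))


# Four replicas running at 90% utilization against a 60% target: scale up.
print(desired_replicas(4, observed_util=90, target_util=60))
# The same four replicas at 15% utilization: scale back down.
print(desired_replicas(4, observed_util=15, target_util=60))
```

The minimum and maximum bounds are what keep a burst of traffic, or a quiet night, from pushing the replica count somewhere the policy never intended.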
Is Kubernetes the platform, or something else?
There continues to be some uncertainty over whether Kubernetes is a platform in the way that Microsoft Windows is a platform or VMware vSphere is a platform -- a complete provider for all the services and resources that hosted software requires to run efficiently. Undeniably, Kubernetes is an "engine," the main element providing the power to a distributed software system. Yet Kubernetes does not provide all of those services and resources itself, just as Windows' predecessor MS-DOS didn't originally provide its own hard disk optimizer or backup procedure.
But as many users would assert, as the effective engine, Kubernetes is the center of a platform that may be composed of any number of services capable of working in tandem. Some would say the purpose of today's CNCF is to maintain, marshal, and nurture a plurality of other independent, open-source projects -- for instance, monitoring systems such as Prometheus, log data managers such as Fluentd (not a typo), and trusted content authenticators such as Notary -- that may collectively comprise a platform. At the time of this writing, CNCF has certified 59 distributions, many of them commercial, featuring the orchestrator along with other CNCF tools or their vendors' own respective tools.
"You'll see that Kubernetes doesn't provide all these things," said Red Hat's Gracely. "They're all areas where the community is, through different vendors, through open source add-on projects, giving the marketplace a lot of options, giving them choice, giving them pluggability for these different elements, and allowing companies to ultimately decide, within this broader framework, how do I build the best platform for what we want to do, pick the best pieces that make sense for us, but still have it all be interoperable and supportable?"
Yet as Gracely's comment itself demonstrates, if the product of any of these collections is indisputably a platform, and Kubernetes is the facilitator at the center of it, then all of these results should be "Kubernetes platforms." Red Hat's OpenShift is one prominent example; the 2.0 edition of Rancher is another.
Whither the monolith?
Whether Kubernetes is perceived by data center managers and CIOs as a platform or as an engine is not an esoteric or trivial matter. If the orchestrator is to continue to make headway in the enterprise, it can't afford to be treated as a lab experiment, or one of those crazy tools the developers love but no one else understands. "Engine" implies the need for a complete chassis (or, to borrow a phrase from my other gig, a "new stack"), and thus gives some evaluators the impression that it's incomplete by design.
A platform must provide the enterprise with the hope that it could soon host all of its applications, not just the funky ones with the curly-Q's and the microservices. For this reason, the CNCF has been presenting Kubernetes as a platform capable of hosting both old and new applications by way of containerization, even when the benefits of transferring old applications from virtual machines to containers have yet to be assessed.
One of the defining characteristics of pre-containerization-era applications, compared to distributed models, is that they cannot readily be decomposed, subdivided, or scaled. Modern developers call such applications monolithic. During a recent Open Source Summit, CNCF Executive Director Dan Kohn touted the virtues of a transition model for monoliths called "lift-and-shift."
He defined it as "the concept that you can actually take any piece of software ever written, and you can wrap it in a container. We've been trained to think of containers as these very svelte things that just have the barest number of libraries, and exactly the minimal software that's needed to run. But if you have an eight-gigabyte Java application, you can wrap a container around that. Just the act of containerizing it, actually does create some value for you, even before you've moved it to the cloud."
Bunching up at the starting gate
The bargain Kohn and other Kubernetes proponents are proposing to enterprises is that a platform based around, or integrated with, Kubernetes will at least be able to support pre-existing applications -- albeit in a different context or motif -- at the same time it's being trusted to usher in this completely new and seemingly alien architecture.
Now you can use Docker to compose a container, and any of several containers-as-a-service (CaaS) platforms to host that container on their own Kubernetes clusters. In each case, the container replaces the VM as the unit of consumption, so you no longer have to stand up your own virtual infrastructure on your side of the cloud just to run applications.
This is where the Kubernetes revolution will pay off in spades. As of now, there's a very healthy market in the delivery of cloud-based resources just for hosting applications, and not the virtualized operating systems on which they're installed. It's so healthy and so sudden a marketplace that VMware had no other option but to join it. As new, non-monolithic applications are spawned, nurtured, and brought to maturity in the public cloud, the sign of this market's success will be how soon enterprises will stop worrying about whether or how to implement lift-and-shift.