With Wednesday evening's official third-quarter release of Kubernetes version 1.16 into general availability comes an intriguing new question: Could every element of an enterprise data center's infrastructure -- not just those newfangled containers, but virtual machines, "big data" platforms, and machine learning frameworks -- eventually be orchestrated by Kubernetes, a product originally born out of Google's need to make order out of chaos?
We've already seen what short work Kubernetes made of Docker Inc. Since the very day that company sought to cement a position of prominence in the open-source community, the Cloud Native Computing Foundation (CNCF), a cooperative venture of the Linux Foundation, built a bigger podium for itself, and took charge of the container movement.
In the time since, Kubernetes has woven itself into nearly every open source and commercial platform project dealing with server-side software deployment and management. Red Hat retooled its OpenShift platform around Kubernetes before that company's $34 billion acquisition by IBM, which closed last July. (Red Hat produces a prominent enterprise Linux platform, but IBM already had one.) Mesosphere, whose DC/OS platform was built around the established Apache Mesos workload orchestrator, and which had a strong partnership going with Microsoft, backed away from Mesos so fast it had to change its name to D2iQ.
And at VMworld 2019 in late August, in what history may yet regard as the denouement of Kubernetes' magnum opus, VMware announced it had begun a project to completely retool vSphere, the world's most prominent virtualization platform, as a Kubernetes-based system that also hosts Kubernetes environments. In 1998, the creators of Linux knew their project had reached critical mass the moment Oracle announced it had obtained certification from Red Hat to be a software supplier for Red Hat's enterprise operating system. Kubernetes may already have reached that point.
Could its footprint in the data center actually grow from here? With version 1.16, an influential component of the orchestrator has officially graduated from beta to general availability. This component could soon make it possible for the services users need from the public cloud -- databases, personal locators, streaming media, automobile drivers, and multinational corporation accounting systems -- to assemble themselves automatically, perhaps without the user even noticing what's going on. While Kubernetes has now firmly established its place as the orchestrator for environments using "containers" -- the poorly named, shrink-wrapped units of functionality being juggled about in server clusters -- a newly enshrined capability will give it the power to orchestrate practically everything else.
The future is in the CRDs
"I think the aspiration of Kubernetes -- and I guess it's more likely than not that it will occur -- is that Kubernetes will become boring, in the sense that it will just get built-in everywhere, and be kind of the assumed default, in the same way that Linux is," remarked Dan Kohn, CNCF's executive director, in an interview with ZDNet.
This new capability is what VMware engineers were referring to when they announced a project to put Kubernetes at the heart of their vSphere virtualization platform. Custom Resource Definitions (CRDs) enable Kubernetes to serve as the orchestrator for other things, including jobs for "big data" environments such as Apache Spark and Hadoop. (Indeed, last year the startup BlueData launched a project with the goal of configuring Kubernetes to manage Hadoop and Spark clusters.)
"Kubernetes has concepts like Custom Resource Definitions, controllers, and operators, that allow us to add new object types to Kubernetes," explained Jared Rosoff, VMware's senior director of product management for its recently announced Project Pacific, during a VMworld 2019 session earlier this month. "So we said, what if we used that aspect of Kubernetes to build a new kind of cloud platform? What if the way we managed Kubernetes clusters, virtual machines [VM], serverless applications, and databases, was using that same desired-state pattern?"
What is Rosoff talking about here? The Kubernetes orchestrator has already borrowed one critical element from the realm of configuration management, particularly the declarative style of tools such as Puppet and Ansible: It allows an application to specify what resources it needs from the data center infrastructure that will host it. This specification is a declaration, which is why we say Puppet and others use the declarative style. Rather than spelling out an explicit set of instructions for how to assemble and provision these resources, as a conventional script would, the application merely lists its requirements and trusts the orchestrator to make as many of those resources available as it can.
In Kubernetes' case, it's like being able to create an entire server network simply by writing the script for it. Imagine if a scriptwriter could assemble a screenplay and trust some back-end service provider to make a movie out of it.
But this is a feature Kubernetes has had from the beginning (which, let's face it, is not all that long ago). CRDs, which become official components with this release, break wide open the entire concept of what it is that Kubernetes assembles, and for what purpose. They make it feasible for the orchestrator to provision first-generation VMs, replacing a several-dozen-step process on the traditional vSphere console, one that usually requires IT operators with VMware-certified skills, with something more resembling a grocery list.
"What if, when I wanted a Kubernetes cluster, I wrote a Kubernetes-style, declarative state document that said, 'I would like there to be a Kubernetes cluster that has five nodes, running version 1.15?'" continued Rosoff. "What if, when I wanted a VM, I wrote a Kubernetes-style, declarative state document that said, 'I want to run this appliance image with this many CPUs, and this much memory?' What if, when I wanted a database, I could say, 'I would like there to be a MySQL database instance, with this much RAM and this version of MySQL?' We can do this in Kubernetes, and we have done it."
For open source developers who contribute to the Kubernetes project, the CRD concept has been around a little while. CRDs first appeared publicly as a beta component in version 1.7, just over two years ago. Back then, Google engineer Tim Hockin explained to me in an interview for The New Stack that he and his fellow contributors perceived Kubernetes as evolving into a "hub of ecosystems." In other words, it didn't always have to be about Docker-style containers, as long as "resources" were defined abstractly. Kubernetes could become a "resource orchestrator," and it could be up to each data center to decide what "resource" means in its own context.
From Jared Rosoff's description, that would appear to be pretty much the world that's coming to fruition, at least at VMware. In Kubernetes' architecture, components called controllers monitor the active state of the system, each tasked with keeping that state as close as possible to the "desired state" declared for the workloads it oversees. In other words, a controller tends to the needs of each workload. Those needs are usually specified in terms of available resources -- components that workloads need to fulfill their functions. If Kubernetes were managing a Hadoop cluster rather than a container cluster, the definition of "resources" would have to be rewritten for the new context. That's what a CRD does.
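To make the idea concrete, here is a minimal sketch of what a CRD registers, written as Python dictionaries that mirror the YAML manifests Kubernetes consumes. The group `db.example.com`, the kind `MySQLDatabase`, and its `memory` and `mysqlVersion` fields are hypothetical, modeled on Rosoff's MySQL example. The two helper functions are simplified, illustrative stand-ins for checks the Kubernetes API server performs, not its actual code.

```python
# Sketch: a CustomResourceDefinition (CRD) teaching the Kubernetes API a new
# resource type. Plain dicts stand in for YAML manifests. The group
# "db.example.com" and kind "MySQLDatabase" are hypothetical names.

CRD = {
    "apiVersion": "apiextensions.k8s.io/v1",  # the CRD API that went GA in 1.16
    "kind": "CustomResourceDefinition",
    "metadata": {"name": "mysqldatabases.db.example.com"},  # <plural>.<group>
    "spec": {
        "group": "db.example.com",
        "scope": "Namespaced",
        "names": {
            "plural": "mysqldatabases",
            "singular": "mysqldatabase",
            "kind": "MySQLDatabase",
        },
        "versions": [
            {
                "name": "v1alpha1",
                "served": True,   # exposed through the API
                "storage": True,  # the version persisted in etcd
                "schema": {
                    "openAPIV3Schema": {
                        "type": "object",
                        "properties": {
                            "spec": {
                                "type": "object",
                                "properties": {
                                    "memory": {"type": "string"},
                                    "mysqlVersion": {"type": "string"},
                                },
                            }
                        },
                    }
                },
            }
        ],
    },
}

# A desired-state document in the spirit of Rosoff's example: "a MySQL
# database instance, with this much RAM and this version of MySQL."
DATABASE = {
    "apiVersion": "db.example.com/v1alpha1",
    "kind": "MySQLDatabase",
    "metadata": {"name": "orders-db"},  # hypothetical instance name
    "spec": {"memory": "4Gi", "mysqlVersion": "8.0"},
}


def crd_name_is_valid(crd: dict) -> bool:
    """A CRD's metadata.name must take the form <plural>.<group>."""
    spec = crd["spec"]
    return crd["metadata"]["name"] == f"{spec['names']['plural']}.{spec['group']}"


def is_instance_of(obj: dict, crd: dict) -> bool:
    """Simplified check that obj's apiVersion and kind name a served
    version of the resource type this CRD defines."""
    spec = crd["spec"]
    group, _, version = obj["apiVersion"].partition("/")
    served = {v["name"] for v in spec["versions"] if v["served"]}
    return (
        group == spec["group"]
        and version in served
        and obj["kind"] == spec["names"]["kind"]
    )


if __name__ == "__main__":
    print(crd_name_is_valid(CRD), is_instance_of(DATABASE, CRD))  # True True
```

The CRD itself only describes the new type and its schema. Nothing acts on a `MySQLDatabase` object until a custom controller is watching for it.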
For custom resources to be usable in this new context, custom controllers must be added. The Kubernetes community calls these specialized components operators, and yes, that's confusing, because people in the IT department are also called "operators." But as recently updated Kubernetes documentation asserts, that's the point: Operators in the orchestrator effectively do the work that people would be called upon to do in a non-automated environment.
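The reconciliation pattern an operator follows can be sketched in a few lines. This is a hypothetical, simplified operator for a MySQL database: the `DatabaseState` type, the action strings, and the `apply` step are invented for illustration. A real controller would watch the Kubernetes API for changes and call database or infrastructure APIs rather than simulating them in memory.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class DatabaseState:
    """What a hypothetical MySQL operator can observe about one instance."""
    exists: bool
    memory_gi: int = 0
    mysql_version: str = ""


def reconcile(desired: DatabaseState, observed: DatabaseState) -> List[str]:
    """Diff desired state against observed state and return the actions
    needed to close the gap; an empty list means nothing to do."""
    if not desired.exists:
        return ["delete"] if observed.exists else []
    if not observed.exists:
        return ["create"]
    actions = []
    if observed.mysql_version != desired.mysql_version:
        actions.append(f"upgrade:{desired.mysql_version}")
    if observed.memory_gi != desired.memory_gi:
        actions.append(f"resize:{desired.memory_gi}Gi")
    return actions


def apply(action: str, desired: DatabaseState, observed: DatabaseState) -> DatabaseState:
    """Simulate one action, the kind of task a DBA would otherwise perform."""
    if action == "create":
        return desired
    if action == "delete":
        return DatabaseState(exists=False)
    if action.startswith("upgrade:"):
        return replace(observed, mysql_version=desired.mysql_version)
    if action.startswith("resize:"):
        return replace(observed, memory_gi=desired.memory_gi)
    raise ValueError(f"unknown action {action!r}")


def run_until_converged(
    desired: DatabaseState, observed: DatabaseState, max_rounds: int = 10
) -> Tuple[DatabaseState, List[str]]:
    """The control loop: observe, diff, act, and repeat until no diff remains."""
    log: List[str] = []
    for _ in range(max_rounds):
        actions = reconcile(desired, observed)
        if not actions:
            return observed, log
        for action in actions:
            observed = apply(action, desired, observed)
            log.append(action)
    raise RuntimeError("did not converge")


if __name__ == "__main__":
    desired = DatabaseState(exists=True, memory_gi=8, mysql_version="8.0")
    observed = DatabaseState(exists=True, memory_gi=4, mysql_version="5.7")
    final, log = run_until_converged(desired, observed)
    print(log)  # ['upgrade:8.0', 'resize:8Gi']
```

Because `reconcile` compares states rather than replaying a fixed script, running it again after the database has converged produces no actions. That is what lets a controller repeat its loop safely.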
"I think the expectation probably is that the innovation and excitement will move up the stack," remarked the CNCF's Dan Kohn, addressing the issue of what happens when Kubernetes becomes both ubiquitous and boring. "It could be service mesh, or it could be some of the authentication technologies, or this whole area of operators that people are excited about -- a lot of the things your database administrator does, upgrades, security, tuning, you can build that into software, and have the operator run the database for you, almost the way you would have hired a DBA to before."
When Kubernetes automated the task of deploying containers to a managed platform, the job of manually deploying those containers barely existed. IT departments had yet to fully master the skill before vendors such as CoreOS (now part of Red Hat, which is now part of IBM) and Pivotal (soon to be part of VMware, which is already part of Dell) automated it for them. But database admins are all too familiar with the relative lack of automation in their everyday tasks, which may be part of the reason for the lull in enthusiasm for big data in recent months. Kohn is suggesting that the same new capability that had been treated as a nice upgrade for one area of the data center could -- with the help of CRDs -- trigger revolutions elsewhere on the campus, including the way enterprise admins provision services for employees.
At a CNCF-sponsored conference in Shanghai this past June, Pivotal engineers Ed King and Sam Gunaratne painted a fictional, though vivid, picture of Kubernetes using CRDs to automate something it doesn't typically automate: the lights, locks, and thermostat in a household. No, this isn't the orchestration of a container-based IoT controller, but the use of the Kubernetes API to automate turning lights on, locking doors, and turning the heat up or down at night. It's not out of the question -- not if the custom controller (what other Kubernetes engineers call an "operator") is directly tied to these in-home appliances.
In this hypothetical world, a manifest could declare the homeowner's desired brightness state for two light bulbs. A controller dedicated to lights could compare the present state to the desired state, and then conjure the sequence of events necessary to turn extinguished bulbs on, if that's the state the manifest declares for them.
Here's where things get interesting: If controllers in this hypothetical system had the means to contact one another, the controller for the thermostat might be able to share details such as the time of day with the light controller. The light controller could then be equipped with decision-making tools about when it may be best to apply that desired state. Unlike the typical one-to-one relationship between a controller and an appliance in IoT, here the orchestrator would act as a smart hub.
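A toy version of that hypothetical household might look like the following. It is not the Pivotal engineers' code: the `Hub`, `ThermostatController`, and `LightController` classes and the 6 p.m. rule are assumptions made for illustration, with the hub standing in for the orchestrator through which controllers share state.

```python
from typing import Dict, Optional


class Hub:
    """Hypothetical shared store through which controllers publish
    observations for one another (standing in for the orchestrator)."""

    def __init__(self) -> None:
        self._facts: Dict[str, int] = {}

    def publish(self, key: str, value: int) -> None:
        self._facts[key] = value

    def read(self, key: str) -> Optional[int]:
        return self._facts.get(key)


class ThermostatController:
    """Shares the time of day it already tracks for its heating schedule."""

    def __init__(self, hub: Hub) -> None:
        self.hub = hub

    def observe_clock(self, hour: int) -> None:
        self.hub.publish("hour", hour)


class LightController:
    """Moves bulbs toward the manifest's desired state, using the shared
    hour to decide *when* switching a bulb on is appropriate."""

    EVENING_STARTS = 18  # hypothetical policy: switch on only after 6 p.m.

    def __init__(self, hub: Hub, desired: Dict[str, str]) -> None:
        self.hub = hub
        self.desired = desired

    def reconcile(self, observed: Dict[str, str]) -> Dict[str, str]:
        hour = self.hub.read("hour")  # None if no thermostat has reported yet
        new_state = dict(observed)
        for bulb, want in self.desired.items():
            if observed.get(bulb) == want:
                continue  # already in the desired state
            if want == "on" and hour is not None and hour < self.EVENING_STARTS:
                continue  # defer: still daytime, per the shared context
            new_state[bulb] = want
        return new_state


if __name__ == "__main__":
    hub = Hub()
    thermostat = ThermostatController(hub)
    lights = LightController(hub, desired={"porch": "on", "kitchen": "off"})
    bulbs = {"porch": "off", "kitchen": "on"}

    thermostat.observe_clock(14)  # mid-afternoon
    bulbs = lights.reconcile(bulbs)
    print(bulbs)  # {'porch': 'off', 'kitchen': 'off'}

    thermostat.observe_clock(19)  # evening
    bulbs = lights.reconcile(bulbs)
    print(bulbs)  # {'porch': 'on', 'kitchen': 'off'}
```

The light controller never talks to the thermostat directly. Each controller reads and writes shared state, which is the hub-style relationship described above rather than a one-to-one link to a single appliance.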
If all that sounds too banal for a tech publication to get excited about, multiply the same logic in your mind by the difference in scale between a couple of household lights and a city's traffic light system. Custom controllers whose automation includes the ability to share states could result in smarter, smoother traffic patterns. The same logic that streamlined Nokia's telephony platform could be leveraged to streamline traffic flows downtown.
"This completely transforms the way we think about the developer/user experience of interacting with our cloud," said VMware's Rosoff. "If you think about what the alternative is here, it's things like OpenStack. OpenStack doesn't feel like the future of our data centers."
In case you're wondering, the bell Jared Rosoff just rang for the OpenStack hybrid cloud platform is the same one that tolled for Mesos.