Your next operating system: Your datacenter

It's time to start thinking about a new operating system layer: your entire datacenter. From networking to compute to storage, it's all one big computer now.
Written by Simon Bisson, Contributor
With a new Windows release now just around the corner, we're all thinking about desktop and mobile operating systems. But desktop OSes are only the tip of today's computational iceberg, with much of the work we used to do on our PCs moving into the cloud, and onto ever-growing fleets of disaggregated compute, memory, storage, and networking.

Two events last week put this into some perspective, with cloud software and infrastructure beginning to open up and expose the APIs and features we need to have a cloud-scale OS.

First was the Open Compute Project's 2015 US Summit, a gathering of engineers from across the industry, sharing their work on open hardware standards for cloud-scale services.

Founded by Facebook, OCP has grown to encompass much of the at-scale hardware industry, along with proprietary and open virtualization software providers. It's a fascinating convergence of ideas and technologies, where competition is left behind as engineers try to solve the problems that all cloud providers have.

That openness is reminiscent of USENIX and other events back in the late 80s and early 90s, where academic, government, and commercial ISPs were working to take what had been a niche network and expand it to global scale. Sometimes it is better for everyone to cooperate and share the lessons they've learned, so no one is reinventing the wheel. That's why Facebook is not just opening its Wedge switch architecture (and its Six Pack enclosure), it's also opening up some of its switching firmware - including customized Broadcom drivers.

Compute is key to this cloud-scale future, and new hardware from HP builds on Open Compute reference designs from companies like Microsoft. Offering cloud datacenter-class hardware to all buyers makes a lot of sense, though it does require a commitment to cloud architectures and cloud operations principles. You can't take an HP Cloudline server and run it like any other rack server; for one thing, it's not designed to operate with the same levels of hardware redundancy.

When you're running a datacenter with tens of thousands of identical servers, you're not going to worry about lights-out management or redundant power supplies - or even redundant network connections.

What matters is the workloads running on that hardware.

Any cloud platform you're running, like OpenStack or the Windows Server Azure Pack, will identify hardware failures and move workloads to other compute nodes, storage to other storage nodes, and networking functions to other switches.
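The failover behavior described here can be boiled down to a control loop: watch node health, and when a node fails, move its workloads to whichever healthy node has the most headroom. This is a minimal sketch of that idea, not any real platform's API; all node and workload names are made up.

```python
# Hypothetical sketch of the failure-handling loop a cloud platform runs:
# when a compute node fails its health check, its workloads are reassigned
# to healthy nodes. Real platforms add fencing, retries, and placement
# constraints; this shows only the core reassignment step.

nodes = {
    "node-01": {"healthy": True, "workloads": ["web-1", "db-1"]},
    "node-02": {"healthy": True, "workloads": ["web-2"]},
    "node-03": {"healthy": True, "workloads": []},
}

def reschedule(nodes):
    """Move workloads off any node that has failed its health check."""
    for node in nodes.values():
        if node["healthy"] or not node["workloads"]:
            continue
        # Pick the healthy node with the fewest workloads as the target.
        target = min(
            (n for n in nodes.values() if n["healthy"]),
            key=lambda n: len(n["workloads"]),
        )
        target["workloads"].extend(node["workloads"])
        node["workloads"] = []

# Simulate node-01 failing its health check.
nodes["node-01"]["healthy"] = False
reschedule(nodes)
print(nodes["node-01"]["workloads"])  # []
print(nodes["node-03"]["workloads"])  # ['web-1', 'db-1']
```

The same loop applies, with different resource types, to storage nodes and switches: the platform's job is to keep the desired workloads running somewhere, not to keep any particular box alive.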

Cloud platforms are only part of the answer, as they're focused on running applications and managing VMs. They're abstracted from the underlying hardware, giving us only part of what we need from a cloud-scale service.

That's where tools like Canonical's new Metal-as-a-Service (MaaS) hardware management software come into play. Taking advantage of discovery APIs built into OCP hardware, Canonical's software can get a picture of the available hardware resources before deploying a cloud platform.

You'll be able to use MaaS tools to map out a datacenter before layering an OpenStack cloud on top of your 6PB of storage, your 50 servers (a mix of edge and core compute hardware) and your five switches. With a clear hardware management layer under a cloud, swapping out a failed server becomes a trivial process, with MaaS tools showing exactly what hardware has failed, then mapping in its replacement and deploying cloud software as soon as it's powered on.
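Underneath that workflow sits a simple idea: an inventory that records each machine's role, so a replacement can be mapped straight into a failed machine's place. This is a hypothetical sketch of that model; none of these function or machine names come from Canonical's actual API.

```python
# Hypothetical inventory model for a MaaS-style hardware management layer:
# discover machines into roles, mark failures, and map replacements into
# the failed machine's role so the cloud layer above sees no gap.

inventory = {}

def discover(machine_id, role):
    """Record a machine found via hardware discovery, tagged with its role."""
    inventory[machine_id] = {"role": role, "status": "ready"}

def fail(machine_id):
    """Mark a machine as failed; its role is kept for the replacement."""
    inventory[machine_id]["status"] = "failed"

def replace(failed_id, new_id):
    """Map a new machine into the failed machine's role and retire the old one."""
    role = inventory.pop(failed_id)["role"]
    discover(new_id, role)
    return role

discover("rack1-u04", "edge-compute")
discover("rack1-u05", "core-compute")
fail("rack1-u04")
print(replace("rack1-u04", "rack2-u11"))  # edge-compute
```

The point is that the role, not the physical box, is the durable object: cloud software can be deployed to whatever machine currently holds the role.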

Automating datacenters in this way is key to delivering a cloud-scale operating system: one that's aware of the underlying hardware and its capabilities, and which can orchestrate applications across pools of compute, of networking, and of storage.

We don't have a cloud-scale OS yet. Tools like Canonical's Juju, like Apache's Mesos (the basis for Mesosphere), and like Google's Kubernetes are a start, offering key elements of application orchestration across both VMs and containers. But they're only part of the story: they need to reach down into the hardware, stepping back from the abstraction that's driven the move to the private and public clouds, giving us all access to the truly software-defined datacenter.
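At their core, orchestrators like Mesos and Kubernetes solve a placement problem: fit application tasks onto nodes with enough free resources. This toy scheduler shows only that bin-packing step; real systems add constraints, priorities, and preemption, and all names here are illustrative rather than drawn from any real API.

```python
# Toy placement step in the spirit of an orchestrator's scheduler: find a
# node with enough free CPU and memory for a task, and reserve those
# resources. Illustrative only; not any real scheduler's algorithm.

nodes = [
    {"name": "n1", "cpu": 8, "mem": 32},
    {"name": "n2", "cpu": 4, "mem": 16},
]

def schedule(task, nodes):
    """Return the name of the first node that fits the task, reserving its resources."""
    for node in nodes:
        if node["cpu"] >= task["cpu"] and node["mem"] >= task["mem"]:
            node["cpu"] -= task["cpu"]
            node["mem"] -= task["mem"]
            return node["name"]
    return None  # no capacity: a real system would queue the task or scale out

print(schedule({"name": "web", "cpu": 2, "mem": 4}, nodes))  # n1
print(schedule({"name": "db", "cpu": 8, "mem": 32}, nodes))  # None
```

A hardware-aware version of this loop is exactly what the article argues is missing: the scheduler would see not just abstract CPU and memory, but the discovered physical topology underneath.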

It's a concept that carried over into the second event, Juniper's press and analyst day. Unveiling new switches and routers, Juniper made it clear that network function virtualization was at the heart of its model. With hardware designed to separate control and data planes, Juniper's latest devices mix custom silicon at the network layer with x86-based controllers.

One interesting hardware element in its newest core routers is their use of 3D memory. With high-density memory on the same bus as the processor, it's possible to hold an entire BGP routing table in memory, allowing rapid reallocation of routes as network performance changes. It's also an approach that could allow significant changes in how we use software defined networking. With an in-memory routing table, it's easy to imagine using high-speed NoSQL database techniques to improve networking analytics - helping core network providers optimize routes and reduce latency for users.
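The operation that benefits from holding the whole table in memory is longest-prefix match, the lookup a router performs for every destination. A minimal sketch, using Python's standard `ipaddress` module and made-up prefixes and next hops:

```python
import ipaddress

# In-memory routing table with longest-prefix-match lookup: among all
# prefixes that contain the destination address, the most specific one
# (longest prefix) wins. Prefixes and next-hop names are illustrative.

routes = {
    ipaddress.ip_network("10.0.0.0/8"): "peer-a",
    ipaddress.ip_network("10.1.0.0/16"): "peer-b",
    ipaddress.ip_network("0.0.0.0/0"): "transit",  # default route
}

def lookup(addr):
    """Return the next hop for the most specific prefix matching addr."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in routes if ip in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]

print(lookup("10.1.2.3"))   # peer-b (the /16 beats the /8)
print(lookup("192.0.2.1"))  # transit (only the default route matches)
```

With a full BGP table (hundreds of thousands of such entries) resident in fast memory, both per-packet lookups and the kind of bulk, database-style analytics the article describes become cheap to run over the same structure.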

Similarly, new spine switches provide improved connectivity for cloud-scale datacenters, with Juniper providing high-density, high-throughput hardware. As private and public clouds scale up for the massive data flows that come with the predicted explosive growth of the internet of things, improved switching speeds and throughputs will be necessary in order to work with cloud machine learning systems and the streams of data they will need.

Software control of the networking stack makes a lot of sense, especially as we slowly move from traditional datacenters to private and hybrid clouds, with compute, storage, and networking fabrics and with orchestrated, containerized apps. Juniper's open hardware makes sense in light of this change, and should make it possible for a new generation of datacenter scale OSes to manage the most complex of network architectures.
