High availability: Software vs. hardware perspectives

Software experts and datacenter operators hold quite different notions of high availability.


Business is increasingly a game driven by the best use of information technology (IT) and a well-aligned IT strategy. The proper selection of technology and services is only the beginning. Managing the people, the processes, and the technology to achieve business goals is critical as well.

If the IT solutions that run business-critical applications and infrastructure slow down or fail, major business problems can follow. Datacenter facilities managers and IT developers appear to use similar language to describe creating highly available and disaster-tolerant computing solutions, but they are often thinking of very different approaches.

When we speak with datacenter operators, they largely look to hardware-oriented solutions: deploying redundant systems, networks, power supplies, and cooling systems, and setting up hardware-oriented approaches to workload failover. They may not understand what those systems are doing at all.
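The arithmetic behind the hardware-redundancy approach can be sketched in a few lines. This is a minimal illustration, assuming that redundant units fail independently and that one surviving unit is enough to carry the load. The function name and the 99% figure are hypothetical and do not come from any operator we spoke with.

```python
def parallel_availability(unit_availability: float, redundant_units: int) -> float:
    """Availability of N redundant units when any single unit can carry the load.

    Assumes failures are independent, which real power and cooling
    systems sharing a common cause (same feed, same room) may violate.
    """
    if not 0.0 <= unit_availability <= 1.0:
        raise ValueError("unit_availability must be between 0 and 1")
    if redundant_units < 1:
        raise ValueError("redundant_units must be at least 1")
    # The group is down only when every unit is down at the same time.
    return 1.0 - (1.0 - unit_availability) ** redundant_units


if __name__ == "__main__":
    for units in (1, 2, 3):
        print(units, parallel_availability(0.99, units))
```

Under these assumptions each added unit cuts the expected downtime sharply, which is why redundancy is the natural first tool for a facilities team. The calculation says nothing about whether the workload itself recovers cleanly after a failover.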

IT management and software developers, on the other hand, are increasingly looking at virtualization technology to encapsulate workloads or workload components; workload optimization tools to detect failures to meet service level guidelines; and migration technology to move workload components from place to place to meet service level objectives and prevent outages.

Increasingly, they design workloads with the understanding that systems, networks, and other hardware components are going to fail, and then develop software workarounds to address the issue. This might mean that a given workload or set of workload components moves from machine to machine, or from datacenter to datacenter, as conditions change.
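The software-oriented approach described above can be sketched as a simple placement decision. This is a minimal illustration, not any vendor's migration technology. The `Host` type, the `choose_host` function, and the latency threshold are all hypothetical names and values, and a real system would gather health and latency from monitoring tools.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    datacenter: str
    healthy: bool = True
    latency_ms: float = 10.0


def meets_objective(host: Host, max_latency_ms: float) -> bool:
    """A host is acceptable if it is up and within the latency objective."""
    return host.healthy and host.latency_ms <= max_latency_ms


def choose_host(current: Host, hosts: List[Host], max_latency_ms: float) -> Optional[Host]:
    """Stay put while the objective is met; otherwise pick a new host.

    Prefers a host in the same datacenter, then the lowest latency.
    Returns None when no host can meet the objective.
    """
    if meets_objective(current, max_latency_ms):
        return current
    candidates = [
        h for h in hosts
        if h is not current and meets_objective(h, max_latency_ms)
    ]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda h: (h.datacenter != current.datacenter, h.latency_ms),
    )


if __name__ == "__main__":
    a = Host("a", "dc1")
    b = Host("b", "dc1", latency_ms=20.0)
    c = Host("c", "dc2", latency_ms=5.0)
    a.healthy = False  # simulate a hardware failure under the workload
    target = choose_host(a, [a, b, c], max_latency_ms=50.0)
    print(target.name if target else "no host available")
```

The point of the sketch is that the failure is treated as an expected input to the software, not as something the hardware must prevent. The workload moves within the datacenter first and to another datacenter only when nothing local qualifies.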

After a number of conversations with datacenter facilities managers and datacenter operators at the recent Datacenter Dynamics Converged conference, it struck me that part of the confusion company executives face is that these two very important groups use similar language but can mean very different things. Even when they appear to agree on an approach to availability, reliability, and disaster tolerance, they sometimes discover when something fails that the agreement rested on a basic misunderstanding. This situation can be exacerbated when the two teams report to entirely different managers and there is no process in place to address problems when they arise.

Could this situation be addressed by bringing the facilities and IT experts together in a single organization?

After speaking with Converged attendees, it quickly became clear that some large enterprises have linked the facilities and the IT teams together under the care of a single manager. Others continue to be organized in separate silos.
