How much availability is enough?

Virtualization technology offers many ways to increase application availability. Before installing anything, however, it is wise to consider how much availability is enough for each application.

It is amazing to me that the same topics come up again and again when I speak with clients. This time, the topic was how to obtain higher levels of application or workload availability. The client in question wanted to understand when a continuous availability or fault-tolerant (FT) system is preferred over a software-based high availability (HA) solution. I wrote about this in April 2007 and thought I'd refresh that article here.

FT solutions go beyond HA failover solutions to present an environment that is never seen to fail, not merely an environment that survives a failure. Some suppliers of FT technology call this "fail through" rather than failover. I thought that was a well-known concept and was surprised to find that the distinction is still not clear to some.

How different layers of virtualization technology can help

Here's a summary of how virtualization technology operating at different levels of the Kusnetzky Group model can make applications and workloads more highly available:

Kusnetzky Group Model of Virtualization Technology

  • Access Virtualization: Access to applications can be virtualized. If the back-end system fails, the individual using the application can be reconnected to another system that offers the same application. Both Microsoft and Citrix have excellent products in this area. More sophisticated access virtualization software can make this reconnection automatic, and even more sophisticated products remember the state of the application and give the impression that nothing ever failed. That last capability, however, usually depends on other forms of virtualization. This failover process, by the way, is unlikely to be instantaneous.
  • Application Virtualization: Application frameworks may offer load balancing and failover capabilities. When the framework's monitor detects a failure, whether a missed service level objective or some other fault, it starts the application on another machine. Once again, the process could be automatic or require manual intervention. If other types of virtualization are in use, the application's state could be preserved during the process. Even when this happens quickly, individuals using the application are likely to notice a pause or slowdown.
  • Processing Virtualization: Processing virtualization, which includes clustering, parallel processing and virtual machine software, may offer load balancing and failover capabilities similar to those of application framework virtualization, either for selected applications or for all applications on a given system. The key difference between these two levels is scope: application framework virtualization only virtualizes applications running in that framework, while processing virtualization makes it possible for applications, data management products or even basic system services to fail over to another system. As with the other forms of virtualization, the failover process can take some time.
  • Storage Virtualization: Virtualizing storage is often a necessity for all of the other forms of virtualization. After all, what good is moving an application to another system if the data it was processing is no longer available? Storage virtualization could be implemented using special-purpose software on general-purpose systems or by moving the entire storage function to a special-purpose storage server.
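To make the "failover takes some time" point concrete, here is a toy simulation of the HA pattern described above: a monitor notices a failed node only after a missed-heartbeat timeout, then starts the application on a standby node. This is a minimal sketch, assuming a heartbeat-based monitor; the `Node` class, the `failover_outage` function, its timing parameters and the example values are all hypothetical and are not taken from any Microsoft, Citrix or other product.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical host that can run the application."""
    name: str
    healthy: bool = True


def failover_outage(nodes, fail_at, heartbeat_interval, heartbeat_timeout, restart_time):
    """Simulate heartbeat-based failover; return (new_active_node, outage_seconds).

    The primary (nodes[0]) fails at time `fail_at`. Heartbeats arrive every
    `heartbeat_interval` seconds, and the monitor, checking on the same
    schedule, declares failure once `heartbeat_timeout` seconds have passed
    since the last heartbeat. Restarting on a standby takes `restart_time`.
    """
    primary = nodes[0]
    primary.healthy = False

    # Last heartbeat the primary sent before it failed.
    last_heartbeat = (fail_at // heartbeat_interval) * heartbeat_interval

    # The monitor only notices the failure at a check after the timeout expires.
    detected_at = last_heartbeat
    while detected_at - last_heartbeat < heartbeat_timeout:
        detected_at += heartbeat_interval

    standby = next((n for n in nodes[1:] if n.healthy), None)
    if standby is None:
        raise RuntimeError("no healthy standby node available")

    # Users see an outage from the failure until the restart completes.
    recovered_at = detected_at + restart_time
    return standby.name, recovered_at - fail_at


nodes = [Node("A"), Node("B")]
print(failover_outage(nodes, fail_at=7, heartbeat_interval=2,
                      heartbeat_timeout=6, restart_time=20))  # ('B', 25)
```

The outage is the sum of detection delay and restart time. Tuning the parameters can shrink it, but it can never reach zero, which is why users of an HA configuration notice the pause.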

What happens if a failure or outage isn't acceptable

All of these are well and good. What happens, however, if a specific application or workload can never be seen to fail? Now we're entering the realm of FT systems. In this case, special-purpose, redundant hardware configurations are deployed and run in lock-step. If one component of the system fails, the others continue working and the application does not fail. Stratus Technologies supplies a single-system approach. Stratus and Marathon can also do this trick with multiple computers and sophisticated, tricky software. Stratus' single-system approach is likely to be faster because component failures are managed in hardware.
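The "fail through" idea can be illustrated with a toy lock-step model: two replicas process every input in step, and when one fails the other simply keeps producing results, so the caller sees no gap. This is a sketch under stated assumptions only: the `Replica` class, the hypothetical `step` function (a running total) and `run_lockstep` are illustrative inventions. Real FT systems compare and fail over components in hardware or low-level software; this is not how Stratus or Marathon implement it.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """One of two hypothetical lock-stepped components."""
    name: str
    failed: bool = False
    state: int = 0


def step(state, x):
    """Hypothetical application step: keep a running total."""
    return state + x


def run_lockstep(inputs, fail_replica_at):
    """Feed each input to both replicas in lock-step and return the outputs.

    `fail_replica_at` maps a replica name to the input index at which that
    replica fails. Surviving replicas keep executing, so exactly one output
    is produced per input whether or not a failure occurs.
    """
    replicas = [Replica("A"), Replica("B")]
    outputs = []
    for i, x in enumerate(inputs):
        # Inject any scheduled component failure before this step.
        for r in replicas:
            if fail_replica_at.get(r.name) == i:
                r.failed = True
        live = [r for r in replicas if not r.failed]
        if not live:
            raise RuntimeError("all replicas failed")
        for r in live:
            r.state = step(r.state, x)
        # Healthy replicas must agree; disagreement means a divergence fault.
        if len({r.state for r in live}) != 1:
            raise RuntimeError("replicas diverged")
        outputs.append(live[0].state)
    return outputs


print(run_lockstep([1, 2, 3, 4], {"A": 2}))  # [1, 3, 6, 10]
print(run_lockstep([1, 2, 3, 4], {}))        # [1, 3, 6, 10]
```

The output with a failed replica is identical to the output with none, which is the distinction the article draws between surviving a failure (HA) and never being seen to fail (FT).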

The key question I pose to clients boils down to "How much availability is enough?"