Hardware fault tolerance in a virtual environment

Stratus wants organizations that are moving more and more workloads into virtual environments to understand that they take a risk when they put a large number of workloads onto a single industry-standard system. Although these systems have improved steadily over the years, they are still not reliable enough for critical workloads.
Written by Dan Kusnetzky, Contributor

Grouping these machines into a cluster that depends on software to create a high-availability or fault-tolerant environment is a bit better, but it still leaves an organization exposed to the risk of failure. Stratus believes that deploying fault-tolerant systems to support virtual environments is a much better idea. So Stratus is now providing VMware® Infrastructure 3 Foundation free of charge with Stratus® ftServer® systems.

Understanding the nines

Let's look at uptime to gain an understanding of what adding a "nine" will do to an organization's exposure to downtime.
Monthly Uptime   Monthly Downtime   Seconds Down per Month   Minutes Down per Month   Hours Down per Month
99.0%            1%                 25,920                   432.00                   7.200
99.9%            0.1%               2,592                    43.20                    0.720
99.99%           0.01%              259                      4.32                     0.072
99.999%          0.001%             26                       0.43                     0.007
99.9999%         0.0001%            3                        0.04                     0.001

(Figures are based on a 30-day month of 2,592,000 seconds.)

Although workload uptime often depends on many factors, including the uptime of the systems, storage devices and network, as well as the probability of staff error causing a slowdown or failure, the chart above shows that a system supplier offering 99.0% uptime is really telling its customers that, on average, they'll experience roughly 7 hours of downtime in any given month. While that might be good enough for some workloads, it is woefully lacking for others.

If we add a "nine" to bring that figure to 99.9% uptime, those same organizations would experience nearly three quarters of an hour of downtime in any given month. This, by the way, is a common uptime figure if all of the factors that can create planned or unplanned downtime are considered.
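To see why the overall figure tends to land well below the uptime of any single component, here is a minimal sketch. It assumes the components fail independently and that the workload needs all of them at once, so their availabilities multiply. The function name and the four component figures are hypothetical, chosen only for illustration.

```python
from math import prod


def combined_availability(component_availabilities):
    """Availability of a chain of components that must all be up at once.

    Assumes failures are independent, so the availabilities multiply.
    """
    return prod(component_availabilities)


# Hypothetical figures for illustration only:
# server, storage, network, and operations (staff error).
components = [0.9999, 0.9995, 0.9999, 0.9998]
overall = combined_availability(components)
print(f"Overall availability: {overall:.4%}")
```

With these made-up inputs, every component is better than 99.9% on its own, yet the combined figure drops to roughly 99.9%. Real components rarely fail independently, so treat this as a rough estimate rather than a prediction.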

Those offering a clustering-based solution, which, by the way, includes most virtual machine migration-based clusters, would point out that they offer between 99.99% and 99.999% uptime. At 99.999% uptime, the organization would experience only 26 seconds of downtime in any given month. Even so, a financial institution could lose millions of dollars if its EFT or trading systems were down that long.

Stratus, a longtime supplier of fault-tolerant systems, would point out that even this isn't good enough for critical applications that cannot experience any downtime. Stratus wants organizations to have six nines, that is, 99.9999% uptime. Based on a 30-day month, that means experiencing only about 3 seconds of downtime in any given month.
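The arithmetic behind each "nine" is easy to check. The sketch below assumes a 30-day month (2,592,000 seconds), which is the assumption that matches the 25,920 seconds shown for 99.0% uptime. The function and constant names are illustrative.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: a 30-day month, 2,592,000 seconds


def monthly_downtime_seconds(uptime_percent: float) -> float:
    """Return the expected seconds of downtime per month for an uptime percentage."""
    downtime_fraction = (100.0 - uptime_percent) / 100.0
    return SECONDS_PER_MONTH * downtime_fraction


if __name__ == "__main__":
    # Print seconds, minutes and hours of downtime for two through six nines.
    for uptime in (99.0, 99.9, 99.99, 99.999, 99.9999):
        seconds = monthly_downtime_seconds(uptime)
        print(f"{uptime}% uptime: {seconds:,.1f} s, "
              f"{seconds / 60:.2f} min, {seconds / 3600:.3f} h per month")
```

Each added nine divides the downtime by ten, which is why the gap between two nines and six nines is the difference between hours and seconds.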

Stratus' approach

Rather than working with off-the-shelf, industry-standard systems, Stratus has developed systems that have redundant components for every function. No hardware failures are seen by the application! As a result, components such as vCenter Server, multiple software licenses, multiple servers, multiple switches and a dual network aren't needed. This approach, Stratus would point out, lowers costs and complexity when compared to a cluster of systems.

Snapshot analysis

Organizations often overlook fault-tolerant, hardware-based solutions because they believe that they're simply too expensive. In a full analysis of the costs of hardware, software and staff, along with the cost of downtime, they just might find that this is one of the lower-cost solutions. The trouble Stratus faces is convincing decision makers that the hard-to-quantify downtime costs more than make up for any costs of the fault-tolerant server. The company knows that if organizations needing that level of availability really considered all of the costs, they would be likely to change their minds.
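The kind of full-cost comparison described above can be sketched in a few lines. This is only an illustration: the function names are invented here, and every dollar amount and availability figure is a hypothetical placeholder, not Stratus or VMware pricing. The result depends entirely on the inputs an organization supplies.

```python
HOURS_PER_YEAR = 365 * 24


def annual_downtime_cost(availability: float, cost_per_hour: float) -> float:
    """Expected yearly cost of downtime at a given availability (0..1)."""
    return (1.0 - availability) * HOURS_PER_YEAR * cost_per_hour


def annual_total_cost(hardware, software, staff, availability, cost_per_hour):
    """Yearly hardware, software and staff costs plus expected downtime cost."""
    return hardware + software + staff + annual_downtime_cost(availability, cost_per_hour)


# All figures below are hypothetical placeholders for illustration only.
cluster = annual_total_cost(hardware=40_000, software=20_000, staff=30_000,
                            availability=0.9999, cost_per_hour=500_000)
fault_tolerant = annual_total_cost(hardware=70_000, software=5_000, staff=15_000,
                                   availability=0.999999, cost_per_hour=500_000)
print(f"Cluster: ${cluster:,.0f}  Fault tolerant: ${fault_tolerant:,.0f}")
```

The point of the sketch is the structure of the calculation: when the cost of an hour of downtime is high, the downtime term can outweigh the difference in purchase price, and when it is low, it may not.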
