I visited a Kusnetzky Group client, Racemi, several weeks ago. We had a fantastic discussion about disaster recovery and the different approaches organizations can deploy to achieve the level of availability their workloads need. Racemi, for those who don't know the company, offers the DynaCenter family of products, which targets cost-effective, automated, rapid server recovery - even onto dissimilar hardware.
As is always the case with IT-based solutions, there are several ways to look at this concept. One is based upon what percentage of uptime is required to fulfill an organization's needs. Let's look at percent uptime and then calculate the downtime that would be experienced at each level (a short sketch of the arithmetic follows the table).
Level of availability | Downtime per year
--- | ---
90% | 36.5 days
95% | 18.25 days
99% | 3.65 days
99.9% | 8.76 hours
99.99% | 52.56 minutes
99.999% | 5.26 minutes
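For those who want to check the numbers, here is a minimal Python sketch of the arithmetic behind the table - downtime is simply the unavailable fraction of a 365-day year. The function name and output formatting are mine, for illustration only:

```python
HOURS_PER_YEAR = 365 * 24  # ignoring leap years for simplicity

def downtime_per_year(availability_pct):
    """Return a human-readable yearly downtime for a given availability %."""
    downtime_hours = HOURS_PER_YEAR * (1 - availability_pct / 100.0)
    if downtime_hours >= 24:
        return f"{downtime_hours / 24:g} days"
    if downtime_hours >= 1:
        return f"{downtime_hours:g} hours"
    return f"{downtime_hours * 60:.2f} minutes"

for pct in (90, 95, 99, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_per_year(pct)}")
```

Running this reproduces the figures above: 36.5 days at 90%, down to roughly 5.26 minutes at "five nines."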
Moving up to 95% availability is likely to require system or application clustering combined with redundant hardware. It may also be possible to meet this level of availability with redundant systems combined with something much simpler - backup/archiving/disaster recovery tools available from a number of vendors (including Racemi, of course).
Moving up to the next step, 99% availability, almost always requires redundant systems, power supplies, storage and networking. Some form of multi-system clustering is an absolute requirement at this point. There are many ways to attack this problem - clustering (at the operating system or application level), virtual system movement tools (XenMotion, VMotion, Live Migration), or virtual system orchestration/automation tools (Cassatt, VMLogix, Novell, Scalent Systems and Surgient all play here).
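As a rough illustration of why redundancy buys you so much, the standard reliability math says that n independent components, each available a fraction A of the time, have a combined availability of 1 - (1 - A)^n when any one of them can carry the load. The sketch below is idealized - it assumes independent failures and instant, perfect failover, which no real deployment fully achieves:

```python
def parallel_availability(single_availability, n):
    """Availability of n independent, redundant components in parallel."""
    return 1 - (1 - single_availability) ** n

# Two 99%-available servers behind a failover mechanism:
print(f"{parallel_availability(0.99, 2):.4%}")  # -> 99.9900%
```

In other words, pairing two 99% systems can, in principle, get you into "three nines" territory - which is exactly why redundant systems, power, storage and networking show up as requirements at this level.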
Going much beyond that often requires complex planning and datacenter design. I would bet that your suppliers would just love to sell planning and implementation services to your organization.
What is your organization doing to make sure that applications are available when needed?