Disaster Recovery: Reaction, Not Reality

Written by Mark Vanston, Contributor

The cumulative effect of many business pressures (e.g., regulatory compliance such as Sarbanes-Oxley, perceived and real terrorism threats) on IT operational departments to provide guaranteed service levels has created a unique opportunity for disaster recovery (DR) professionals to promote and develop robust DR processes. Such professionals are now in a position to market their services and actively promote the benefits of a robust continuity process. Business executives have begun to realize the value of these processes and, with corporate liability issues pushed to the forefront (à la Enron, HealthSouth, etc.), have become more averse to risk; stakeholder protection has thus become increasingly important. This has freed up budgets for DR/business continuity (BC) initiatives and has conferred the necessary "authority" to implement change within these processes.

Unfortunately, these same business pressures have caused some organizations to overreact and implement plans that, though robust, are exorbitant and, for much of the business function, completely unnecessary. DR professionals need, as always, to balance perceived risk against cost and thus provide maximum business advantage to the organization they are chartered to serve. Risk is based on the following three variables:

- Threat: To be at risk, there must be a threat (natural or otherwise).
- Vulnerability: To be at risk, an asset must have a vulnerability (e.g., the building could become flooded).
- Value: To be at risk, an asset must have some value. If there is no perceived value to the asset, then there is no risk. This is the most important distinction in understanding and exploiting different gradations of recoverability services. Ultimately, an asset with minimal value should have minimal protection, and vice versa. In an efficient recoverability process, valuation of an asset will define the recoverability option for that asset.
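
The three variables above can be sketched as a simple scoring model. This is a minimal illustration, not a prescribed method: the multiplicative formula, the `Asset` fields, the dollar thresholds, and the sample assets are all assumptions introduced here. The tier names echo the recovery options mentioned elsewhere in this article.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    threat: float         # assumed probability of a threat event, 0.0 to 1.0
    vulnerability: float  # assumed exposure to that threat, 0.0 to 1.0
    value: float          # assumed business value at stake, in dollars


def risk_score(asset: Asset) -> float:
    """Multiplicative model (an assumption): any zero factor means zero risk."""
    return asset.threat * asset.vulnerability * asset.value


def recovery_tier(asset: Asset) -> str:
    """Map a risk score to a recoverability option using hypothetical thresholds."""
    score = risk_score(asset)
    if score >= 1_000_000:
        return "real-time recovery"
    if score >= 50_000:
        return "capacity on demand"
    if score > 0:
        return "traditional 48- to 72-hour recovery"
    return "no dedicated recovery"


# Hypothetical assets; the figures are placeholders, not survey data.
assets = [
    Asset("order processing", threat=0.5, vulnerability=0.5, value=8_000_000),
    Asset("internal wiki", threat=0.5, vulnerability=0.5, value=20_000),
    Asset("retired archive", threat=0.5, vulnerability=0.5, value=0),
]
for a in assets:
    print(f"{a.name}: {risk_score(a):,.0f} -> {recovery_tier(a)}")
```

Because the factors are multiplied, an asset with no perceived value scores zero regardless of threat or vulnerability, which mirrors the point that valuation should drive the recoverability option.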

Inevitably, risk-averse corporations will spend a higher proportion of their budget on continuity initiatives. It is up to DR professionals to understand the level of risk an organization is willing to take and recommend appropriate strategies that mitigate risk to the organization's satisfaction while minimizing cost, both short and long term. Committing to a particular architecture (e.g., synchronous vs. asynchronous data mirroring) can have significant long-term cost ramifications.

META Trend: By 2008, 45% of Global 2000 users will utilize two data centers to deliver continuous availability; of these, 25% will support real-time recovery. By 2006, more than 60% of G2000 data centers will utilize capacity on demand to satisfy less critical recovery services. Through 2008, more than 50% of G2000 users will utilize a single "hardened" data center augmented by third-party services to deliver traditional, cost-effective disaster recovery services (48- to 72-hour recovery).

We project that expenditures for DR/BC projects will continue to increase from the current average of about 4% of IT budgets to more than 7% by 2007 for Global 2000 (G2000) organizations (this includes third-party BC/DR vendor recovery sites and implementation services). Of greater concern is the research that indicates that fewer than 40% of G2000 IT organizations have comprehensive, enterprisewide BC/DR recovery plans that are current and regularly/rigorously tested. We predict this meager subset of G2000 businesses with realistic, effective business recovery architectures will increase only to about 60% by 2005. Aside from the statistical context, it is imperative that organizations develop realistic recoverability processes. Organizations need to understand numerous variables that will ultimately affect recoverability options and thus cost structures.

Of primary importance is understanding exactly what kind of business an organization conducts. For example, financial institutions, due to the nature of the business (real-time transactions and regulatory compliance issues), typically have no choice but to build and maintain expensive recoverability architectures that can guarantee specific service levels (e.g., synchronous data mirroring). This model is inapplicable to organizations that compete in a different vertical, due to cost structures ultimately hitting the bottom line and affecting profitability. A good example is organizations that compete in the logistics space (e.g., FedEx, UPS) and operate on thin margins. An exorbitant BC solution would ultimately lead to price increases (e.g., shipping costs) and thus a competitive disadvantage.

The new pressures placed on DR services are exacerbating many of the problems DR staffs already face. These include understanding business priorities, obtaining necessary funding, and gaining proper application group participation. These have heretofore been viewed largely as IT organization problems, even though unrecoverable disasters are “corporate” events. Indeed, with corporations facing increasing scrutiny from both regulators and stakeholders, underlying technology problems require corporate-level communication. We believe most BC/DR problems, at root, are structural, related to outworn organizational arrangements that must necessarily evolve to a business-side dominance. In the early 1990s, as IT staffs struggled with constant user griping about expenditures, we began advising them to relinquish budget control to business-side dictation. Analogously, DR issues are sufficiently acute (and business-driven) that subsuming them under a business-side organization is imperative.

Between 30% and 35% of our clients currently have a strong business-side BC organization capable of dictating policy that permeates the (corporate) organization. This has grown from fewer than 20% four years ago - driven primarily by the events of September 11, 2001 (even though September 11, 2001, has little statistical significance to continuity projects) - and is amply evidenced by user-initiated BC/DR consulting arrangements, participation in BC/DR events, etc. We expect that, by 2005, 50% of Fortune 1000 companies will have established strong BC organizational commitments, typified by a business continuity office and officer (BCO) as a CEO report, supported by an eclectic (line-of-business, IT, corporate presence) BC steering committee. By 2007, this model will be pervasive (80%+). We believe IT groups have been, and will continue to be, instrumental in fostering these arrangements because of their historical central role in continuity and articulation of insurmountable organizational difficulties.

Establishing the Value of Continuity
Fundamental to a realistic and effective BC strategy is the clear statement of its value to the business. Specifically, the IT organization must seek and obtain business buy-in on the real-world business value of system outages. For example, we have listed our estimates for the real hourly dollar value to the business of outages for nine specific business applications, expressed not in IT metrics, but in terms that drive the business (i.e., not transactions lost or calls missed, but dollars lost per hour; see Figure 1). Such metrics are critical to effective BC deployment and must be created, measured, and tailored to the specific service provided.

In Figure 2, we have identified a data center availability scale (1-10) to aid IT organizations in concretely identifying and categorizing levels of availability, their characteristics, and resource requirements. From this rough outline, the IT organization can more precisely estimate the costs associated with each level. It is important to note that this data center availability scale is specifically targeted at IT organizations and uses IT vernacular that will be meaningless to business users. Indeed, it is critical that the IT organization and the business agree on the specific, business-oriented effect of each availability level and its corresponding hard-dollar value. Without the business’s intimate involvement in this latter process, the fundamental validity of the enterprise BC/DR architecture will be in question.

Taking the valuation of continuity to the next level requires the accurate measurement of the real costs of disruption. Indeed, many Global 2000 users fail to adequately consider or document the critical link between the IT organization’s recovery plans and the business units’ requirements. The business-case justification for a granular, multi-tiered BC/DR architecture must be tied directly to a credible estimate of business performance impact. Although gross industry sector averages exist to guide users (see Figure 3), a thoroughly researched business impact analysis should be an ongoing (and regularly updated) part of the enterprise BC/DR architecture. This analysis must be based on factual, user-specific performance history and include assumptions such as peak periods of performance, time required for users to regain productivity after an interruption, fully loaded employee burden rates (i.e., full-time-equivalent cost), customer responses to incidents, stakeholder resistance to change, break-even date estimates for investments, and IT asset life spans. Accurately and consistently calculating the value of potential business disruption involves staffing of new roles for performance analysis. In addition, permanent preventive strategies must be planned, and user impact and consumer/customer response to performance changes must be tracked and reported.
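
As a rough illustration of how several of these assumptions might combine into a single estimate, the sketch below adds lost revenue (scaled for peak periods) to lost staff productivity (covering both the outage and the time needed to regain productivity). The additive model, the function name `disruption_cost`, its parameters, and every figure are hypothetical; a real business impact analysis would draw on the organization's own performance history.

```python
def disruption_cost(
    outage_hours: float,
    revenue_per_hour: float,
    peak_multiplier: float,
    affected_staff: int,
    burden_rate_per_hour: float,
    regain_productivity_hours: float,
) -> float:
    """Estimate the cost of one outage under a simple additive model."""
    # Revenue lost while the service is down, scaled up for peak periods.
    lost_revenue = outage_hours * revenue_per_hour * peak_multiplier
    # Staff are unproductive during the outage and while they ramp back up.
    idle_hours = outage_hours + regain_productivity_hours
    lost_productivity = idle_hours * affected_staff * burden_rate_per_hour
    return lost_revenue + lost_productivity


# Placeholder inputs for a hypothetical four-hour outage during a peak period.
cost = disruption_cost(
    outage_hours=4,
    revenue_per_hour=50_000,
    peak_multiplier=1.5,
    affected_staff=200,
    burden_rate_per_hour=75,   # fully loaded employee burden rate
    regain_productivity_hours=1,
)
print(f"Estimated cost of the outage: ${cost:,.0f}")
```

Customer responses to incidents and investment break-even dates are deliberately left out of this sketch; they would need their own, organization-specific inputs.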

Longer term, users should not limit the business impact analysis to absolute, visible disruptions. Its use can be effectively extended to quantify loss that is less severe (but nonetheless costly) due to underperformance or inefficient assets. For example, slow networks, poor server or storage use, and inappropriately skilled or underskilled staff can all have significant, costly business impacts yet be under the business continuity “radar.” If downtime can cost millions of dollars per hour, typical losses caused by slow or inefficient systems can potentially cost a business millions of dollars monthly. Users must estimate business value by application, tying recovery resource expenditure directly to the specific application’s documented business value. These estimated costs of downtime by application will form the foundation for the business case in an organization’s budget allocation for disaster avoidance and recovery, and will minimize the risk of inappropriate overspending on BC/DR for less-than-mission-critical applications.
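
One way to picture tying recovery expenditure to application value is a per-application check of annual DR spend against the downtime loss that spend is expected to avoid. The sketch below is only that: the comparison rule, the field names, and all figures are assumptions made for illustration.

```python
def loss_avoided(app: dict) -> float:
    """Annual downtime loss avoided by the DR investment (hypothetical model)."""
    hours_saved = app["annual_hours_down_without_dr"] - app["annual_hours_down_with_dr"]
    return app["cost_per_hour"] * hours_saved


def flag_overspending(apps: list[dict]) -> list[str]:
    """Return the applications whose DR spend exceeds the loss it avoids."""
    return [a["name"] for a in apps if a["annual_dr_spend"] > loss_avoided(a)]


# Placeholder portfolio; none of these figures come from the article.
apps = [
    {
        "name": "trading platform",
        "cost_per_hour": 1_000_000,
        "annual_hours_down_without_dr": 48,
        "annual_hours_down_with_dr": 1,
        "annual_dr_spend": 5_000_000,
    },
    {
        "name": "intranet portal",
        "cost_per_hour": 500,
        "annual_hours_down_without_dr": 48,
        "annual_hours_down_with_dr": 4,
        "annual_dr_spend": 100_000,
    },
]
print("Possible overspending:", flag_overspending(apps))
```

In this toy portfolio the mission-critical application easily justifies its spend, while the low-value one does not, which is the kind of mismatch the per-application estimates are meant to surface.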

Bottom Line: DR professionals need to exercise caution as new emphasis is placed on recoverability services. Ultimately, recoverability architectures should be linked to an organization’s risk aversion and business model.

Business Impact: From a business perspective, not everything is worth protecting. Organizations need to continuously measure risk against cost to remain competitive.

META Group originally published this article on 25 February 2004.
