The cost of system downtime adversely affects productivity, delivery schedules, commitments, and credibility. Downtime is always a negative; sometimes it is even a business killer. The recent US East Coast and Italy blackouts proved that major IT outages can affect consumer confidence, organizational image, brand, revenues, and sales. CIOs should provision contingencies for business continuity and address the disaster recovery and continuous availability of systems to minimize the impacts of outages and to further ensure the IT organization's credibility.
META Trend: By 2007, best-practice business continuity (BC) architecture will be a three-way team effort: 1) IT architects and business leaders will establish objectives and business impact; 2) infrastructure and engineering teams will manage availability (planning and building the infrastructure to meet BC requirements); and 3) operations command-and-control groups will run/test and monitor/report on disaster recovery preparedness.
By 2007, 40% of Global 2000 users will support two data centers for high-availability (HA) requirements (and workload balancing), deploying capacity-on-demand capability for less critical recovery services. However, only 10% of these two-data-center environments will have perfected real-time recovery solutions (e.g., zero downtime and no data loss). Through 2007, a single “hardened” data center delivering traditional, cost-effective disaster recovery (DR) services (48- to 72-hour recovery utilizing third-party services) will suffice for more than 50% of users. During 2004, 80% of IT groups will re-evaluate continuity plans to ensure alignment with business impact statements (e.g., recovery point, recovery time, viability horizon). By 2005/06, DR operational availability models (technology and process) and business continuity planning (BCP) will overlap. “Near-time” and continuous-availability (CA) proximity recovery techniques will appear in 60% of IT shops by 2007.
The blurring of business-as-usual and always-available environments requires the integration of CA into business processes and a more comprehensive and planned approach to BCP. The commensurate project cost impact is dramatic (2x-3x the cost of current 48-to-72-hour recovery) and requires that BCP be addressed at the application/infrastructure design phase of all projects (currently not a common practice), because afterthought accommodations tend to be at least 3x as expensive. CA environments must be designed from the ground up (project initiation) to mitigate risks of operational, application, infrastructure, and business process failures. End-to-end business impact analysis (BIA) of both internal and external information/business flows is required to successfully provide recovery options for all potential scenarios. A note of caution, however: two-hour recovery time objectives (RTOs) typically cost 10x regular operating costs. If such objectives are poorly planned or executed, costs can escalate 2x to 5x.
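To make these multipliers concrete, the following minimal sketch applies them to hypothetical baseline figures. Only the multipliers come from the text; the function names, the dollar amounts, and the reading of "3x as expensive" as relative to design-phase cost are assumptions made for illustration.

```python
# Illustrative only. The multipliers are taken from the text; every name and
# dollar figure below is a hypothetical assumption, not META Group data.

def ca_project_cost_range(baseline_dr_cost):
    """CA project cost: 2x-3x the cost of 48-to-72-hour recovery."""
    return (2 * baseline_dr_cost, 3 * baseline_dr_cost)

def retrofit_cost_floor(design_phase_cost):
    """Afterthought accommodations: at least 3x as expensive.
    Assumption: measured against the cost of addressing BCP at design time."""
    return 3 * design_phase_cost

def two_hour_rto_cost(regular_operating_cost):
    """A two-hour RTO typically costs about 10x regular operating costs."""
    return 10 * regular_operating_cost

def escalated_cost_range(planned_cost):
    """Poor planning/execution can escalate costs 2x to 5x."""
    return (2 * planned_cost, 5 * planned_cost)

# Hypothetical inputs for a single application.
baseline_dr = 500_000        # assumed cost of a 48-to-72-hour recovery solution
regular_operating = 200_000  # assumed regular operating cost

low, high = ca_project_cost_range(baseline_dr)
print("CA project cost range:", low, "to", high)

rto_cost = two_hour_rto_cost(regular_operating)
print("Two-hour RTO cost:", rto_cost)
print("If planned or executed poorly:", escalated_cost_range(rto_cost))
```

Even with rough inputs, a calculation of this kind shows why the RTO decision belongs with those who fund it: each step toward faster recovery multiplies the cost.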
On September 5, 2002, the Board of Governors of the US Federal Reserve System (FRB), Office of the Comptroller of the Currency (OCC), and the US Securities and Exchange Commission (SEC) published for comment a draft interagency white paper on sound practices to strengthen the resilience of the US financial system. Approximately 90 comment letters were submitted to one or more of the agencies by clearing and settlement system operators; banking organizations; investment banking firms; industry associations; technology companies; federal, state, and local officials; and other interested parties.
Respondents to the regulators' request for comment agree that a within-the-business-day recovery and resumption objective for core clearing and settlement organizations is appropriate and acknowledge that a two-hour RTO is an achievable goal, though somewhat aggressive for some because of the volume and complexity of transaction data involved. There is general consensus that the end-of-business-day recovery objective is achievable for firms that play significant roles in critical markets, though many state that this is possible only if firms are able to use synchronous data storage technologies, which can limit the extent of geographic separation between primary and backup sites. CIOs of financial institutions regulated by one of the agencies (the FRB, OCC, or SEC) must comply with the interagency directive issued April 7, 2003 (see Figure 1).
CIO Compliance Commitment
Key activities (and compliance dates) that CIOs must acknowledge, consider, and commit to include the following:
- Clearing and settlement firms: CIOs of these organizations should continue their accelerated efforts to develop, approve, and implement plans that substantially achieve sound practices. DR plans should provide for backup facilities that are well outside the current synchronous range and that can meet within-the-business-day recovery targets. On a case-by-case basis, federal agencies can give core clearing and settlement organizations additional time to complete implementation of backup facilities that are well outside the current synchronous range, so long as those organizations take concrete, near-term steps that result in substantially improved resilience by the end of 2004.
- Short-term vs. long-term compliance activities: Some CIOs will find it necessary to provide for a longer implementation period in light of their organization’s risk profile, level of resilience, and unique business circumstances. CIOs should create and communicate plans that incorporate interim milestones against which progress can be measured and should provide for ongoing consideration of the costs and benefits of achieving greater geographic diversification of backup facilities.
- Cost vs. risk-benefit considerations: DR/BCP solutions, and especially CA solutions, are important but costly, so cost-effectiveness matters. The costs associated with implementing the sound practices can vary substantially, depending on the extent to which incremental improvements may be needed to address the risks of a wide-scale disruption. To mitigate the costs of these enhancements, CIOs should integrate them into the strategic planning process (e.g., coordinate with planned enhancements to facilities, information system components/architecture, and business processes).
- Board-level activities: Boards of directors should review DR/BCP strategies to ensure plans are consistent with the firm's overall business objectives, risk-management strategies, and financial resources. Smart CIOs do not make RTO decisions; they provide the RTO choices and corresponding cost-of-recovery options (including dual data centers/load balancing) to the board and executive management committee for decision making (and funding). Decisions about overall DR/BCP objectives, including RTOs, should not be left to the discretion of individual business units.
- Implementation time frame: Firms that play significant roles in critical financial markets should develop, approve, and implement plans that call for substantial achievement of the sound practices as soon as practicable, but generally within three years of the directive (i.e., by April 7, 2006).
CIOs must ensure that the recovery solution is neither overengineered nor inadequate for meeting the business’s tolerance for risk. CIOs and their DR managers should provide a categorization framework for a business-based evaluation by establishing base-level availability services that match business requirements. Complexity should be minimized. Key applications should be chosen using an agreed-to business RTO categorization framework (see Figure 2).
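An RTO categorization framework of the kind described above can be sketched in a few lines. This is a minimal illustration, not the framework shown in Figure 2: the tier names, the assumed eight-hour business day, and the sample applications are all hypothetical, and the thresholds merely echo RTO figures mentioned elsewhere in this article (two hours, within the business day, 48 to 72 hours).

```python
# Hypothetical RTO tiers (upper bound in hours, tier name). These are
# assumptions for illustration, not the categories in Figure 2.
TIERS = [
    (2, "continuous availability"),   # two-hour RTO
    (8, "within business day"),       # assumes an 8-hour business day
    (72, "traditional DR"),           # 48-to-72-hour recovery
]

def categorize(rto_hours):
    """Return the first tier whose upper bound covers the agreed RTO."""
    for limit_hours, tier_name in TIERS:
        if rto_hours <= limit_hours:
            return tier_name
    return "deferred recovery"        # hypothetical catch-all tier

def group_by_tier(app_rtos):
    """Group applications by tier, given a mapping of name -> RTO in hours."""
    groups = {}
    for app, rto_hours in app_rtos.items():
        groups.setdefault(categorize(rto_hours), []).append(app)
    return groups

# Hypothetical applications with business-agreed RTOs.
apps = {"settlement": 2, "trade capture": 6, "payroll": 48, "archive search": 120}
for tier, names in group_by_tier(apps).items():
    print(tier, "->", names)
```

Keeping the number of tiers small reflects the point about minimizing complexity: each tier maps to one base-level availability service rather than a per-application design.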
Business continuity is a corporate responsibility. CIOs and IT operations are enablers and facilitators. The IT organization has an obligation to help build BC plans and facilitate execution of those plans through architecture, technology, and process. CIOs need to be cognizant of both the IT organization's capabilities and the lines of business' expectations of those capabilities. They must ensure that the capabilities align with business requirements and service levels. Enterprisewide failure to comply with regulatory BCP/DR directives can result in government sanctions and fines.
Business Impact: CIOs who fail to provide continuous/high-availability systems necessary to complete the clearing and settlement of pending transactions within the business day could create systemic liquidity dislocations, as well as exacerbate credit and market risk for critical markets.
Bottom Line: CIOs must commit to robust recovery capabilities and prepare the IT organization to respond to a wide-scale disruption by adopting sound BCP and DR practices.
META Group originally published this article on 29 October 2003.