How 9/11 changed disaster planning

New technologies and new practices have come to the fore since last September. Five consultants give their views

Two law firms, both just a few blocks away from the World Trade Center, were equally decimated by the collapse of 1.8 million tons of glass and steel on September 11. Both firms used the same company to help them reconstitute business, but one firm was up and back to business as usual in two days. The other lost everything and, a year later, is still digging through paper files in warehouses, going back to clients and even competitors to try to recreate its records.

So goes an anecdote from Andrew Kass, director of technical services for Array Technologies, the New York-based consulting firm that helped the two law firms with their disaster recovery plans. Law firm A had followed his company's prescriptions to the letter, while firm B had used some of Array's services for its networking and documentation, but had taken business continuity into its own hands, making the unwise -- and ultimately unlucky -- decision to store its backup tapes in the World Trade Center.

Kass is a business continuity expert who witnessed the human and business toll of the catastrophe. Since that horrific event, analysts have been looking to the business continuity industry to divine whether any good has come of it. While some reports indicate companies are now more open to spending on disaster recovery plans, others say purse-strings are as tight as ever during the tough economic times.

According to statistics from Meta Group, fewer than 25 percent of businesses currently have comprehensive business continuity/disaster recovery plans that are adequately documented and regularly tested. By 2003, that number should rise to 35 percent of businesses, and by 2004, it will hover around 50 percent, Meta predicts.

An online poll of Tech Update readers suggests that companies are still resistant to change. Of approximately 250 respondents, only 10 percent said their company had changed its disaster recovery plan since September 11.
A whopping 50 percent said their company doesn't yet have a plan in place. "September 11 is losing its effect," said Yankee Group analyst Zeus Kerravala, who saw a short spurt of interest after the World Trade Center disaster, followed by a relapse into apathy. "I run panels and talk to people ad nauseam about business continuity planning, but companies are still taking a reactive rather than a proactive approach." Tech Update talked to five companies with first-hand experience helping companies recover from the devastation of 9/11 to see if disaster recovery practices have really changed.

  • Andrew Kass, Director of Technical Services for Array Technologies, a New York-based consulting company
  • Chris Cangero, Vice President of Operations at Epoch Data, another New York-based consultancy
  • H. Grady Crunk III, Executive Vice President of CentralData, a remote network management company in Titusville, Florida
  • Todd Pekats, Director of Professional Services at Neartek, a Lakeville, MA-based storage consulting company
  • Gary Lokken, Advisory Director of the Business Continuity Planners Association
Anecdotes - learning from experience

Tech Update: Do you have any 9/11 anecdotes that illustrate best or worst practices?

Kass: Array had three clients in One World Trade Center, which form a pretty illustrative cross-section of small business preparedness. Thankfully, all personnel safely escaped the disaster.

Client A had been through the first Trade Center bombing in 1993. We had been able to retrieve their essential systems and set them up in temporary quarters at that time. They were thus extremely conscious of rotating verified tapes offsite. We were able to restore their systems to new hardware within two days, excepting only an e-mail data file that was not handled properly by the open file manager.

Client B, a small law firm, took matters into its own hands. The firm backed up conscientiously and locked up the tapes in a vault -- in One World Trade Center. Client B had to start from scratch.

Client C backed up and reviewed procedures. We had proposed an audit, which unfortunately was tabled to late September. When the offsite tapes were mounted, all daily tapes were found to be unusable, necessitating use of a months-old archive tape. Test restore of critical data is not optional; it is of a piece with sound backup management.

I'll also note that, while we never lost Internet or telephone service, many of our downtown clients were disconnected for weeks. One organisation jury-rigged a low-bandwidth wireless Internet connection for minimal e-mail use only, and supplemented this with cell phones to stay in touch. We were in the process of changing our wireless provider that very week, which allowed us to use two systems to maintain telephone coverage in the field.

Cangero: For a law firm, the ability to do business is determined by the accessibility of your network. One company, which had no disaster recovery plan in place, found itself in ruins on September 12, its office just five blocks from the World Trade Center.
But they were a client of Epoch Data -- we replicated their environment, recovered their data, and had them conducting business within 24 hours from a hotel room. That's an example of worst practice with a successful outcome.

Lokken: I believe companies acted very responsibly in hunting down their employees and making sure they were safe. Most companies that are regulated have excellent plans in place, and most large companies have some level of recovery and just need to place more emphasis on the commitment. Companies, when asked, state that business continuity is one of the top concerns that keep them awake at night -- but fewer dollars are usually spent directly on business continuity efforts than on other concerns.

What has happened to budgets?

Are companies open to spending more on business continuity now?

Kass: Yes, but on appropriate, targeted spending. No one is writing blank checks, and there is a perception of snake oil in a lot of disaster recovery nostrums, particularly following the Year 2000 blitz.

Pekats: Some companies are clearly spending more, but many more are still in the consideration phase. The DR/BCP (disaster recovery/business continuity planning) initiatives are being treated the same by companies that have been serious about it for years. The biggest change is that many customers that had no DR/BCP plans have now identified the need to start getting serious.

Crunk: Companies are slightly more open to spending on business continuity. However, even though September 11 was a wake-up call for many companies, most still feel that it is an isolated incident. What is going on is that the issues of business continuity are being brought to the top of people's minds, conversations, and budgets.

Lokken: There hasn't been a change as far as I've seen, in terms of internal spending by companies not directly affected by September 11.
Companies are more apt to spend money on outside consulting to make sure they have their bases covered, though. In many cases, they'll do it at a higher level without even consulting their inside experts -- who could have used the money to further improve their recoverability in areas they already know they have issues with.

Are there any particular areas where companies are still dragging their heels?

Kass: Companies will spend thousands on new tape libraries before they'll spend a dime for online backup subscriptions. These are really complementary technologies, but online backup offers a sort of near-line mirror for firms with low tolerance of downtime and circumscribed budgets. It's not archival; that is a role for tape and optical media. It is there to minimise the backup loss window and accelerate return to productivity.

Pekats: Spending money. If money needs to be spent, it goes into revenue-producing technology to support the business. DR/BCP may be a necessity but it does not make any money.

Crunk: When companies are dragging their heels, it is usually because they have put together a plan and then found out that they cannot afford to implement it. We're working with a lot of those companies to help them put together a staged plan. We're trying to make them understand that putting some processes in place, especially if they cannot afford to do it all at once, is much better than waiting for the plan to magically be implemented. We can show an ROI on even the smallest investment. By showing they are working on a disaster recovery/business continuance plan, part of the ROI can come from their customers' and shareholders' confidence. One thing that most people will admit to is that if the worst does happen, customers and shareholders will not be sympathetic if the companies are caught with their pants down.

Lokken: Dollars are tight for non-money-making functions and even tighter in this economy.
Companies that do not have plans are willing to spend money on contractors to obtain plans but may not continue with proper follow-through. Companies that are regulated are already doing an adequate job and will not need to enhance their efforts unless it's already part of their plans to do so. The job market for DR personnel was very good prior to last September and dried up afterwards, except in New York. This appears to be due to mergers and acquisitions, but many DR folks in the industry have been laid off. One of the problems is that many companies have the DR responsibilities at too low a level in their organisation, treating them as a necessary evil. They don't take DR seriously because they still believe it will never happen to them.

New technologies, new practices

Are there any new technologies or techniques that have evolved over the past year?

Kass: Nothing new as much as a more targeted approach to packaging certain services -- disaster recovery consulting, for example. But really, the various combinations of online services and VPN -- for live backup, e-mail, Web log and document management collaboration -- all help to decentralise resources to route around disaster and allow people in general to be productive and connected wherever they may be. We are seeing applications trying to catch up with the self-healing properties designed into ARPAnet a generation ago.

Pekats: Backup to disk and then replicating disk is one new technique. Also, tape virtualisation that enables tape duplications and library replication, and extending SAN fabrics over distances via DWDM (dense wavelength division multiplexing) or Storage over IP.

Crunk: Many technologies that were out there have evolved faster because of the September 11 tragedy. One of the fastest growing segments of our business has been IP video security and surveillance.
This technology allows people to proactively monitor their environment by allowing for policy-based guidelines. This means that alerts can be set up based on events that have changed within a video or set environment. This system can be set up anywhere. It can protect wiring closets, data centers, and communication lines through policies that the customers set and review. It can be used at secondary sites, which would cut manpower and hours logged by people simply watching a monitor, which in turn shows an immediate ROI. This system also allows our customers to use the Homeland Security Funding Initiative. That means the funding does not come out of the IT budget and gives our customers an opportunity to use monies they never had before. This has proven to be the fastest growing segment our business has experienced over the last 18 years.

Lokken: I haven't seen anything revolutionary -- just improvements on what was new technology a couple of years ago, such as disk storage, communication technology, and more capable non-mainframe hardware.

How have practices changed? We've heard about companies sharing resources, such as offsite facilities. Have you seen any evidence of this?

Kass: I personally have seen more awareness and emphasis on secure multi-site communications leveraging the Internet since September 11th. While we and other firms opened our doors to clients and colleagues in the short term -- the re-staging period -- I see things going pretty much as usual now.

Cangero: It is true, companies have used offsite facilities. But we have found over the last year that the majority of companies we talk to want to maintain control of their own data. Trusting offsite storage is simply not very comforting.

Crunk: We have indeed seen companies sharing resources since the September 11 tragedy. Companies are now more open to the idea of shared resources because they cannot afford them by themselves.
It has forced joint partnerships among entities that at one time were competitors. Bandwidth is more affordable now, which allows us to tie separate locations together more cost effectively.

What's the minimum response?

Looking back at what did and didn't work over the past year, is it possible to come up with a bare minimum that a mid- to large-size enterprise needs to do to protect itself?

Kass: Designate a permanent recovery planning committee, empowered to draft, implement, and test a graduated continuity plan appropriate to the company's budget and needs. Make sure tapes and copies of OS, application, and backup software media are available, with documentation and licensing information, in a known, secure location offsite. Publish a plan and contact directory on a mirrored company Web site, accessible by login to all employees. Have one or more recovery staging servers and appropriate tape devices or other media readers for bonus points.

Cangero: Any company, no matter the size, that thinks that the "bare minimum" is going to protect them -- I would suggest they do nothing and save their money. They're going to need every penny when a disaster does happen, and it will happen. We all learned that lesson the hardest way imaginable.

Lokken: Companies have to have a usable recovery plan in place that can bring up key systems in a very short period of time. The plan must be tested and understood by the whole company -- especially management. It's scary how many companies do not have plans in place, or the proper personnel to preach disaster recovery/business continuity at a high enough level in the company to really get the attention of the board of directors and senior management. The understanding of the plan is the important piece of the puzzle.
The plan should not have to be read during an incident -- employees should be able to deal with an incident from the hip, and the plan should only be a guideline for very specific tasks or contact/responsibility information. Disaster recovery should be incorporated into every employee's day-to-day activities, and not just come up when someone calls for a test or a disaster actually occurs.

What would you say are the most important lessons that you -- or the industry overall -- learned about disaster recovery practices from 9/11?

Kass: For companies directly impacted, the three keys were backup, re-staging, and lighting up or improvising new infrastructure services. A lot of companies that spent thousands on hot sites got short shrift when the sites were swamped and no one was flying. And for most downtown businesses, electricity and phone service, let alone Internet connectivity, were huge X factors for months following the attack. I think hot-site centres proved less valuable than anticipated -- they just couldn't scale space to this huge and sudden demand. Virtual office strategies -- mirrored data centers with remote VPN access -- offer a more flexible and serviceable approach leveraging the recoverability and versatility of the Internet, with fewer worries about bathrooms and parking.

Cangero: The industry really realised within a couple of weeks just how important it is to prepare their customers for the worst.

Crunk: With the September 11 tragedy, we saw disaster plans that were not at the level they should have been. I don't like to tell people to plan for the worst. I like to show the customer that by having a good plan in place, their ROI can come from eliminating some of the little disasters that are easier to see and happen quite often. By eliminating the smaller crises, we then have the plans in place for managing the big ones.
But the major lesson learned by the business continuity industry from the September 11 tragedy is this: there must be redundancy in more than one location.

Pekats: Practice, practice, practice -- never underestimate the power of continual training. You can never anticipate everything, but you can plan by focusing on the outcomes of a few simple possibilities. Geographic disparity is critical. Design your backups around restoration and recovery. Too many customers concern themselves with the "backup window"; that is the first and biggest mistake. It is clearly a consideration, but the reality is that recovery and restoration is the end goal.

Lokken: Effective communication during an incident, and speedy backup for communications afterwards, are still the most critical aspects of an effective plan. On a separate note, we also learned that the industry will help without concern for compensation when disasters strike, as long as it's not anyone's fault.

Speaking of lessons learned, what has become of companies A, B and C a year later?

Kass: Company C essentially went quiet for a month or two while rebuilding data from such information as they had on archival backups or external paper files; we were a subcontractor there, so I don't have all the details. I am pretty sure that the general support contractor is no longer involved in this account!

Companies A and B, which were in geographically equivalent situations (Company B was and is Company A's sub-tenant), had very different recoveries. While Company A recovered quickly, Company B had to go back to warehoused paper files and files that were in attorneys' homes for current work, and contact courts, clients and even rival law firms in specific cases to update its records, a process that is ongoing. It is currently in the process of scanning everything needed from its paper records into its new system, and scrupulously rotates tapes off-site.
If the courts and counsel were less forgiving -- were opposing counsel, for example, not willing to accept affidavits regarding the contents of lost documents -- Company B would have been exposed to professional liability to such an extent that it most likely would have had to suspend practice. But I think another lesson here is that this was such an extraordinary and far-reaching disaster that special dispensations were freely granted, even among competitors. Were this a more mundane sort of event -- a transformer fire in a building rendering it contaminated, for example -- I strongly doubt that any such leniency would have been displayed.

I hope never to have to employ the knowledge I gained in those weeks. We rationalise that we learn from mistakes and disasters; that's how we keep going, how we bring professional analysis to overwhelming events. It is, for better or worse, what makes us human.