Disaster recovery means rolling back the clock

Written by Dan Kusnetzky, Contributor
A Kusnetzky Group client and I had a rather intense discussion about the role virtualization technology could play in a disaster recovery strategy. Although the strategy this organization was currently pursuing appeared to be based upon sound foundations, it didn't go far enough. The organization really didn't have a plan to address the need to change network addresses, storage addresses and a number of other configuration issues found in the physical world.

Today's static datacenters

Datacenters, especially those based upon industry standard systems and software, have been static environments: a server is configured to support a single operating system, data management system, application framework and a number of applications. Systems then access both storage and the network using a pre-assigned configuration that can only be changed through a carefully planned set of manual procedures. As the users of mainframes and single-vendor midrange systems discovered nearly three decades ago, this type of static thinking leads to a number of problems and must be replaced by the careful use of virtualization and automation. Although adopting dynamic, adaptive thinking is an important step, it's still important to remember that physical machines, including systems, network and storage, must be running for all of this to work! This is a lesson the managers of industry standard system-based datacenters are just learning now.
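
To make the point concrete, here is a rough Python sketch of what it might look like to treat a workload's network and storage bindings as data that can be re-applied at a recovery site, rather than as settings baked into one physical server. Every name, address and storage identifier below is invented for illustration; this is a sketch of the idea, not any vendor's implementation.

```python
# Hypothetical sketch: a workload's network and storage bindings captured as
# data so they can be re-applied at a recovery site, instead of living only in
# one physical server's configuration. All names and addresses are invented.

from dataclasses import dataclass

@dataclass
class WorkloadProfile:
    name: str
    ip_address: str          # address the workload is reached at
    gateway: str             # network gateway for the site
    storage_target: str      # e.g. an iSCSI target or LUN identifier

# Bindings for the primary site and the recovery site (invented values).
PRIMARY = WorkloadProfile("order-entry", "10.10.4.21", "10.10.4.1",
                          "iqn.2008-01.com.example:primary-lun7")
RECOVERY = WorkloadProfile("order-entry", "10.20.4.21", "10.20.4.1",
                           "iqn.2008-01.com.example:dr-lun7")

def failover_plan(current, target):
    """Return the reconfiguration steps needed to move the workload."""
    steps = []
    if current.ip_address != target.ip_address:
        steps.append(f"reassign IP {current.ip_address} -> {target.ip_address}")
    if current.gateway != target.gateway:
        steps.append(f"update gateway to {target.gateway}")
    if current.storage_target != target.storage_target:
        steps.append(f"remap storage to {target.storage_target}")
    return steps

if __name__ == "__main__":
    for step in failover_plan(PRIMARY, RECOVERY):
        print(step)
```

The point is simply that when the bindings are described as data, moving them is a repeatable procedure rather than a hand-crafted one.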

Over provisioning

Today's industry standard system-based datacenters often evolved without an overarching plan. Each business unit or department selected systems and software to satisfy only its own requirements and to support only its own flow of business. This means that most datacenters have become a warehouse, something a few would call a museum, for "silos of computing." Each silo of computing was purchased with an eye only to the business unit or department's needs, and each was often managed with its own management tools that may not play well with the tools the organization relies on to manage other silos. Business units and departments purchased sufficient system, software, storage and network resources to handle their own peak periods, plus enough additional resources to provide the redundancy needed to keep business solutions up and available.

This approach had an expensive side effect: those resources sat idle a great deal of the time, waiting for peak periods. If all of the organization's idle resources are considered, a great deal of the organization's IT investment has been wasted. After all, those resources are not available for the day-to-day processing requirements of the organization. It is clear that this approach, one that seemed reasonable and prudent only a few years ago, is now a luxury that many organizations can no longer afford. A global market, rapidly changing market dynamics and regulations have forced organizations to include efficiency and making the best use of their resources in their list of priorities. I must also note that some organizations derive such value from immediate response that it overwhelms the cost of keeping a great deal of redundant equipment. Much of this post isn't for them.
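
A back-of-the-envelope calculation shows why this matters. The figures below are entirely invented, but they illustrate how sizing each silo for its own peak, plus redundancy, leaves most of the purchased capacity idle compared with a shared pool sized for the combined peak.

```python
# Illustrative calculation with invented figures: each silo is sized for its
# own peak plus redundancy, so most capacity sits idle; a shared pool sized
# for the combined peak needs far less hardware for the same average demand.

silos = {
    # name: (average load, peak load), in arbitrary capacity units
    "order-entry":  (20, 60),
    "reporting":    (10, 80),
    "web-frontend": (30, 70),
}

redundancy_factor = 2.0   # assume each silo doubles capacity for availability

per_silo_capacity = sum(peak * redundancy_factor for _, peak in silos.values())
average_demand = sum(avg for avg, _ in silos.values())

# Peaks rarely coincide; assume the combined peak is well below the sum of peaks.
combined_peak = 120       # invented figure for the shared pool's worst case
pooled_capacity = combined_peak * redundancy_factor

print(f"Capacity bought silo-by-silo: {per_silo_capacity:.0f} units")
print(f"Capacity in a shared pool:    {pooled_capacity:.0f} units")
print(f"Average demand:               {average_demand} units")
print(f"Utilization, silo model:      {average_demand / per_silo_capacity:.0%}")
print(f"Utilization, pooled model:    {average_demand / pooled_capacity:.0%}")
```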

Problematic manual processes

When outages occur in the static datacenter, many organizations turn to error-prone manual processes and procedures to determine what's happening, isolate the problem, move resources around so the business can keep running, fix the problem and then move resources back to their normal configuration. It's also necessary to get physical machines turned on and loaded with the appropriate software. The network must be restarted or reconfigured. Storage systems must be restarted and reconfigured. Speed of recovery, in this scenario, is heavily dependent upon getting the physical systems back up and configured. Each of these steps can take a great deal of time, require costly expertise that the organization doesn't normally have on staff and may be subject to human error. Another complication is that each of the computing silos is often based upon a different application and management framework. This means that the staff expertise that works well in solving one part of the problem is not the expertise needed to solve other parts. It is clear that manual processes don't scale well. This, of course, is the reason mainframe and midrange-based datacenters turned to automation decades ago. Organizations want a dynamic datacenter that can deal with planned and unplanned outages, that can roll back the clock, without any of the painful issues mentioned above.
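
This is exactly the kind of work that lends itself to automation. The sketch below, with hypothetical hosts and workloads, shows the shape of it: detect an unhealthy machine, restart its workloads on spare capacity and log every step so recovery doesn't depend on ad hoc manual procedures. It is a toy illustration, not a substitute for a real orchestration product.

```python
# Minimal recovery-automation sketch. Hosts, workloads and the health flag
# are hypothetical stand-ins for whatever monitoring a real datacenter uses.

import time

hosts = {
    "host-a":  {"healthy": True,  "workloads": ["order-entry"]},
    "host-b":  {"healthy": False, "workloads": ["reporting", "web-frontend"]},
    "spare-1": {"healthy": True,  "workloads": []},
}

def recover(hosts):
    """Move workloads off unhealthy hosts onto healthy spares and log each step."""
    log = []
    spares = [h for h, info in hosts.items()
              if info["healthy"] and not info["workloads"]]
    for name, info in hosts.items():
        if info["healthy"] or not info["workloads"]:
            continue   # nothing to do for healthy or empty hosts
        for workload in list(info["workloads"]):
            if not spares:
                log.append(f"NO CAPACITY for {workload}; paging an operator")
                continue
            target = spares[0]
            hosts[target]["workloads"].append(workload)
            info["workloads"].remove(workload)
            log.append(f"{time.strftime('%H:%M:%S')} restarted {workload} on {target}")
    return log

if __name__ == "__main__":
    for entry in recover(hosts):
        print(entry)
```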

Dreams of a dynamic datacenter

Managers of industry standard system-based datacenters have dreams of moving beyond a static environment. They imagine what it would be like if their datacenter did the following things:
  • Automatically found unused, and thus wasted, resources on a moment-to-moment basis, including systems, software, storage and networks.
  • Automatically re-purposed those resources in a coordinated, policy-based fashion to make optimal use of them, giving high-priority tasks resources first.
  • Automatically assigned re-purposed resources to useful tasks.
  • Provided each workload with the resources it needed without letting it interfere with or slow down other tasks.
  • Freed up unneeded resources so they could be powered down to reduce power consumption and heat generation if the organization so desired, then powered them back up, provisioned them for the tasks at hand and put them to work as needed later.
  • Added new resources only when currently available resources were truly exhausted.
What's clear is that everything must adapt in real time, in a coordinated way; otherwise problems are simply being shuffled around rather than really solved.
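
As a rough illustration of the policy-driven allocation described above, the sketch below hands pooled servers to the highest-priority workloads first and powers down whatever is left. The workload names, priorities and capacities are all invented.

```python
# Toy policy-based allocator: capacity goes to the highest-priority workloads
# first; anything unassigned can be powered down. All figures are invented.

workloads = [
    # (name, priority: lower number = more important, servers needed)
    ("order-entry", 1, 4),
    ("reporting",   3, 2),
    ("test-lab",    5, 3),
]

pool = [f"server-{i:02d}" for i in range(1, 11)]   # ten idle servers

assignments = {}
for name, _priority, needed in sorted(workloads, key=lambda w: w[1]):
    granted, pool = pool[:needed], pool[needed:]   # high priority draws first
    assignments[name] = granted

powered_down = pool    # whatever remains can be switched off to save power

for name, servers in assignments.items():
    print(f"{name}: {servers or 'waiting for capacity'}")
print(f"powered down to save energy: {powered_down}")
```

The point isn't the code; it's that priority and power policy live in one coordinated place instead of being scattered across silos.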

Is there a solution?

I've spoken with a number of suppliers who are trying to address these issues, including Cassatt, Egenera, Scalent Systems and, possibly, even VMlogix. Some of these folks have been very innovative and would be worth getting to know, even if they're speaking about lab automation rather than disaster recovery.

What's your organization doing to address these issues?
