Datacenter Business Continuity Requires Rigorous Testing

Summary: PayPal's failure highlights the need for cloud service consumers to be comfortable with their providers' business continuity capabilities.

In more than 20 years of designing backup, disaster recovery, and business continuity systems for clients, there is one factor I have continually stressed: testing, testing, and more testing. Test often, and test regularly. Make sure that you test a variety of scenarios; it's not enough just to test the connections between sites. And remember that the worst-case scenario isn't a meteor strike leveling your datacenter; it's a series of cascading failures that stop your disaster recovery or business continuity plans from working.

This isn't a technology issue; it's simply one of common sense. There is little point in spending the money necessary for a full-fledged business continuity solution without some significant assurance that it actually works. And as companies move to cloud solutions, the failure of a solution provider affects not a single business, but dozens, if not hundreds or thousands, of businesses.

That is what happened with last week's series of failures at PayPal: not one, but two separate failures took the credit card processing capabilities of thousands of merchants offline for almost three hours. On the plus side, all PayPal customers were affected, so a potential purchase wouldn't have shifted from one PayPal vendor to another as the buyer fought to spend their money. On the extra negative side, PayPal didn't acknowledge the initial failure until after it had been resolved.

So from the perspective of the potential cloud customer, this vendor suffered not a single failure, but effectively three failures:

  • The network hardware failure that was the original problem
  • A failover failure which caused a second outage
  • A communications failure where PayPal didn't acknowledge the problem until after the first issue had been resolved.
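
As a rough illustration of what testing "a variety of scenarios" might look like, here is a minimal sketch that enumerates every combination of the three failure points listed above and flags the combinations that end badly. It is a toy model, not a description of PayPal's systems: the names `simulate`, `problems`, and `DrillResult`, and the three boolean flags, are hypothetical stand-ins for failures a real drill would induce on purpose.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional


@dataclass(frozen=True)
class DrillResult:
    primary_up: bool
    standby_up: bool
    notifier_up: bool
    served_by: Optional[str]   # "primary", "standby", or None for a total outage
    customers_notified: bool   # was a notice sent while the incident was ongoing?


def simulate(primary_up: bool, standby_up: bool, notifier_up: bool) -> DrillResult:
    """Model one drill scenario (all three flags are hypothetical inputs)."""
    if primary_up:
        served_by = "primary"
    elif standby_up:
        served_by = "standby"   # failover worked
    else:
        served_by = None        # failover failed too: a cascading outage
    incident = not primary_up   # any loss of the primary counts as an incident
    notified = incident and notifier_up
    return DrillResult(primary_up, standby_up, notifier_up, served_by, notified)


def problems(result: DrillResult) -> List[str]:
    """List what went wrong in a scenario; an empty list means the drill passed."""
    found = []
    if result.served_by is None:
        found.append("total outage")
    if not result.primary_up and not result.customers_notified:
        found.append("incident not communicated")
    return found


def run_all_scenarios() -> List[DrillResult]:
    """Cover every combination, not just the single-failure cases."""
    return [simulate(*flags) for flags in product([True, False], repeat=3)]


if __name__ == "__main__":
    for r in run_all_scenarios():
        print(r.primary_up, r.standby_up, r.notifier_up, problems(r) or "ok")
```

The point of walking all eight combinations is that the damaging cases are the compound ones: the scenario in which the primary, the standby, and the notification path all fail together is the only one that reports both problems, and it is the one a connectivity-only test never exercises.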

Frankly, at this point, I would want to see not only the business continuity plan of any cloud vendor I was planning on entrusting with a business-critical process, but also their policy and actual practices for testing their own business continuity process.

Topics: Data Centers, Enterprise Software

Talkback

  • RE: Datacenter Business Continuity Requires Rigorous Testing

    David, you are soooooo RIGHT ON! But I think the problem with Cloud Providers is a little more systemic, and can be summed up as "not taking care of the basics." Anyone who has run an enterprise data center takes your point about testing for granted. It is just part of data center life. Another section of the enterprise data center "creed" is to notify your users of problems BEFORE they notify you. Having watched the Cloud community over the last 12-18 months, I see other disturbing actions that could and will be extremely detrimental to the adoption of Cloud Computing by enterprises. In 2009, we saw outage after outage with the big cloud players: Google, Amazon, Microsoft, Rackspace, Salesforce. One Microsoft incident struck me in particular. It was an extended mid-week, mid-day outage, caused by a change they made that they couldn't back out. In an enterprise environment, you do not make changes to production environments during peak hours. Even if it is an emergency, you ALWAYS have a crisp back-out or bypass plan. Another incident was a facility outage... in a highly touted "Tier 4" data center. They gave a great explanation, even showing facility schematics. The problem was that their schematics clearly showed that the data center was not Tier 4... oops!

    Look back through the cloud provider outages and you will see a pattern of what I call a "lack of maturity" in core data center management discipline. They need to hire some top-notch, enterprise-class IT infrastructure executives and then give them the authority to lay down the law.
    Ken Cameron