What's the fuss about... disaster tolerance?

Or: 'How to prepare for the end of the universe...'
Written by silicon.com staff, Contributor

There may be no real way of making sure you survive that kind of disaster, but other, more mundane events could bring your company to its knees. Quocirca analyst Clive Longbottom tells you how to prepare for (almost) the worst...

When a problem occurs, it is important to be able to recover from it as soon as possible. This is known as disaster recovery. However, it should be even more important to ensure that the problem does not affect your business in the first place. This is known as business continuity services or, more accurately, as disaster tolerance.

Disaster tolerance is often made out to be expensive and only for the biggest companies, which would lose millions in revenue if a technical failure were to occur. However, companies of all sizes should look at their disaster tolerance requirements. Solutions can be put in place to meet most needs and most budgets. Not every company needs the utmost in disaster tolerance, but it is sensible to choose the solution that best balances your needs against your risks.

If you are highly dependent on your technology, a higher level of solution should be chosen. However, if your technology systems generally provide internal services such as office functionality and customer accounts, a solution providing minimal disaster tolerance is appropriate. We advise that any solution must be capable of being moved seamlessly into the next two levels, enabling you to grow your company while retaining disaster tolerance.

A full set of policies and procedures should also be put in place to deal with all the levels of disaster that would impact your business without automatically putting you out of business. These should cover responsibilities for carrying out image and/or data backups, where the backups are to be stored, where replacement components or systems should be sourced from, and how work will be carried out should a disaster occur. The policies and procedures should be reviewed regularly.
Quocirca has a 10-level disaster scale, and provides advice on the minimum levels of solutions that should be implemented to provide tolerance.

1. In-box component failure
Failure of a single component (e.g. disk, power supply) within an assembly. This can be guarded against through the use of n+1 components. This should also be supported with traditional disaster recovery solutions, such as image backup and incremental data back-up.
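To see why a single spare component makes such a difference, here is a rough Python sketch of the arithmetic behind n+1. It assumes components fail independently and that each is working 99% of the time. Both the independence assumption and the 99% figure are illustrative and are not Quocirca's numbers, and the function name is made up for this example.

```python
from math import comb


def availability_at_least(required: int, installed: int, p_up: float) -> float:
    """Probability that at least `required` of `installed` components are up.

    Assumes each component is up independently with probability `p_up`.
    """
    return sum(
        comb(installed, k) * p_up**k * (1 - p_up) ** (installed - k)
        for k in range(required, installed + 1)
    )


# Hypothetical example: an assembly needs 2 power supplies to run,
# and each supply is assumed to be working 99% of the time.
n = 2
p = 0.99
without_spare = availability_at_least(n, n, p)    # exactly n fitted, no spare
with_spare = availability_at_least(n, n + 1, p)   # n+1 fitted, one may fail

print(f"n components:   {without_spare:.6f}")
print(f"n+1 components: {with_spare:.6f}")
```

Under these assumptions the spare cuts the chance of the assembly being down from roughly 2% to roughly 0.03%. Real components share power feeds, firmware and rooms, so failures are rarely fully independent, which is one reason the backups recommended above remain necessary.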
2. Box failure
Failure of a complete assembly, e.g. through board or wiring failure. This can be guarded against through the use of fail-over back-up boxes in the same room. As there is still a considerable risk of room failure, Quocirca still recommends full image and data backup policies be implemented.
3. Room failure
Failure of a physical environment, e.g. through fire or through failure of a critical networking component. This can be guarded against through the use of fail-over back-up boxes in another room in the same building. At this stage, image backups begin to be less important. Quocirca recommends that data backups still be operated.
4. Building failure
Failure of a whole building may be brought about through, for example, the cutting through of major data cables outside the building, lightning strikes, etc. This can be guarded against through the use of replicated data centres within the campus areas. At this point, we begin to run into data latency issues, and it is necessary to ensure that data replication solutions take this into account.
5. Campus failure
Failure of a campus environment may be brought about by major data cable failure in the public environment, localised flooding, etc. Here, a replicated data centre in another part of the city will be required to provide tolerance. At this point, data latency issues become of major importance, and transactional logging of the data replication may be required. As the solutions will now often be running in totally different environments, the solution must also have advanced capabilities to deal with users' state, maintaining their position within an application as well as their data. This is the stage at which outsourced business continuity services come into their own: an external company provides a 'hot' environment where your own staff can relocate on a temporary basis while the original site is sorted out.
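The latency point can be made concrete with a back-of-the-envelope calculation. The Python sketch below assumes light travels through optical fibre at roughly 200,000km per second (about two thirds of its speed in a vacuum) and ignores switching, queuing and disk delays, so the figures are a best case. The campus and city distances are illustrative guesses; 1500km is the long-range replication distance quoted for country-level tolerance. The function names are invented for this example.

```python
# Assumption: signal speed in optical fibre is roughly 200,000 km/s.
FIBRE_SPEED_KM_PER_S = 200_000.0


def round_trip_ms(distance_km: float) -> float:
    """Minimum round-trip propagation delay over `distance_km` of fibre, in ms."""
    return 2 * distance_km / FIBRE_SPEED_KM_PER_S * 1000


def max_sync_writes_per_s(distance_km: float) -> float:
    """Upper bound on strictly sequential writes per second when each write
    must be acknowledged by the remote site before the next one starts."""
    return 1000 / round_trip_ms(distance_km)


# Illustrative distances; only the 1500 km figure comes from the article.
for label, km in [("campus", 1), ("city", 50), ("country", 1500)]:
    print(
        f"{label:8s} {km:5d} km  "
        f"{round_trip_ms(km):6.2f} ms round trip  "
        f"{max_sync_writes_per_s(km):9.0f} sequential writes/s at most"
    )
```

On these assumptions a 1500km link adds at least 15 milliseconds to every acknowledged write, which is why longer distances tend to push replication towards logged, asynchronous approaches rather than waiting on each write.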
6. City failure
Failure of a whole area may be brought about by flooding, earthquake, etc. Tolerance here will require a replicated data centre elsewhere in the country.
7. Country failure
Country failure may be brought about by war or political/civil unrest. Tolerance can only be provided by extreme long-range data centre replication. Many of today's storage companies have demonstrable capabilities to a distance of 1500km and beyond.

The above are all highly possible in today's environments, but the last three vague possibilities must also be considered for completeness:
8. Continental failure
Continental failure would require a major natural disaster, such as an earthquake resulting in a tsunami. In most cases this would also involve the end of the world. Should continental failure be an issue, it would be necessary to arrange for replicated data centre environments across multiple continents.
9. World failure
World failure brought about by, say, a large asteroid impact, presents problems where the ongoing running of a company should not, at this stage, be at the forefront of anybody's mind. However, Quocirca believes that for those who really want to think ahead, the use of narrow-beam radio technology will enable you to stream your data out into space, and that this stream could be captured and rebuilt as data from the depths of space onto an extra-terrestrial mirrored data system.
10. Universe failure
At this stage, Quocirca has failed to come up with a means of providing any levels of tolerance...

Quocirca is a leading, user-facing analyst house known for its focus on the 'big picture'. For a full summary of its activities see http://www.quocirca.com, or reach the company's founding directors by emailing quocirca@silicon.com.