Everyone needs backups, but how do you recover a server quickly--operating system, patches, and all? We look at some of the options available for snapshot backup and other disaster recovery techniques.
Disaster recovery software covers quite a wide gamut of features and functionality. One we are all familiar with is good old tape backup software. Disaster recovery is precisely what tape backup is all about, but it is really only the blunt end of the disaster recovery pyramid.
Limitations of tape backup
Traditionally with tape backup software, you take full backups of your data files at regular intervals and, more frequently, incremental backups. Should a disaster occur, it is simply a matter of restoring the most recent full backup and then each subsequent incremental backup to recover your data. In a simple scenario, you might run the full backup over the weekend and then incremental backups Monday through Friday. Of course a loss of data on a Friday is going to be a long-winded restore, as you would need to restore a full backup and then four incremental backups--Monday through Thursday night--one after the other. For this reason alone, many choose to carry out a full backup on the weekend, incremental backups Monday and Tuesday, then a differential backup on the Wednesday, followed by two more incremental backups on Thursday and Friday.
The differential backup on the Wednesday takes longer than an incremental but it means a failure on the Friday would only require a full, a differential, and an incremental to be restored--not a full and four incremental backups.
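The arithmetic behind that trade-off is easy to sketch. The following is purely illustrative (the schedule and function names are our own invention, not any product's API), but it shows how a differential backup shortens the restore chain:

```python
# Hypothetical sketch of working out the restore chain for the weekly
# schedule described above: full on the weekend, then a mix of
# incremental and differential backups. Names are illustrative only.

# Each night's backup type, in order.
SCHEDULE = [
    ("Sunday", "full"),
    ("Monday", "incremental"),
    ("Tuesday", "incremental"),
    ("Wednesday", "differential"),
    ("Thursday", "incremental"),
    ("Friday", "incremental"),
]

def restore_chain(failure_day):
    """Return the backups to restore, in order, after a failure on the
    given day (before that night's backup has run)."""
    # Only backups taken strictly before the failure are available.
    days = [d for d, _ in SCHEDULE]
    available = SCHEDULE[:days.index(failure_day)]

    chain = []
    for day, kind in available:
        if kind == "full":
            chain = [(day, kind)]            # a full restarts the chain
        elif kind == "differential":
            # A differential captures everything since the full, so it
            # replaces all the incrementals taken since then.
            chain = [chain[0], (day, kind)]
        else:
            chain.append((day, kind))        # incrementals stack up
    return chain

# A Friday failure needs only three restores: the full, Wednesday's
# differential, and Thursday's incremental.
print(restore_chain("Friday"))
```

Without the Wednesday differential, the same Friday failure would require the full plus four incrementals, as described above.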
If however your whole server Chernobyls or is stolen, you have lost a bit more than your valuable data, you have also lost a possibly significant chunk of time rebuilding your server to the point where you can start to restore your data files. The traditional tape backup scenario only backs up data files. Since you are not going to be restoring your operating system (OS) from tape, you will first have to install the OS, drivers, and probably a great wad of patches from the OS vendor. And believe me--I have just rolled out seven new test servers for the Lab--this is quite a time-consuming task. Time of course is money and most probably your business will be operating at a fraction of its efficiency at best, or not at all at the worst, until your data is back online.
Of course with inexpensive ATA drives that are fast and offer large storage capacities, many companies are replacing their tape unit with a direct attached ATA drive array. This speeds up the backup and restore processes considerably but file-based backup software does nothing to cure the inevitable and ponderous manual OS reinstall.
A problem with your basic garden-variety backup software is that the application(s) may need to be taken offline to execute the backup. This is fine if your business runs nine to five, but in many cases the systems must be up 24x7, so there is no window of opportunity.
To overcome this problem, there are new incremental or snapshot backup technologies, file- or block-based, that run in the background while your system is fully operational. The idea is that as data changes, only the changes are stored.
The snapshots may be 15 minutes apart, for example, so at the most you may lose 15 minutes of data. Some vendors offer "real time" snapshots of the data as the changes are made; these snapshots are all time stamped so if a problem occurs you can simply roll back one step prior to where the problem occurred. This is great for malicious virus attacks, for example, as you can identify the very point that the virus infected the system and roll back to clean data.
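The roll-back logic can be sketched as follows. This is an illustrative toy, not any vendor's implementation; the snapshot identifiers and times are made up:

```python
import bisect
from datetime import datetime, timedelta

# Illustrative sketch: time-stamped snapshots let you roll back to the
# last clean state strictly before a problem (e.g. a virus infection).

snapshots = []  # kept sorted as (timestamp, snapshot_id)

def take_snapshot(ts, snap_id):
    snapshots.append((ts, snap_id))

def rollback_target(problem_time):
    """Pick the most recent snapshot strictly before the problem."""
    i = bisect.bisect_left(snapshots, (problem_time,))
    return snapshots[i - 1] if i > 0 else None

# Snapshots every 15 minutes from 9am.
start = datetime(2003, 6, 1, 9, 0)
for n in range(8):
    take_snapshot(start + timedelta(minutes=15 * n), f"snap-{n}")

# Virus detected at 10:05 -> roll back to the 10:00 snapshot,
# losing at most 15 minutes of data.
print(rollback_target(datetime(2003, 6, 1, 10, 5)))
```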
How do you take full backups or even snapshots of applications that cannot be taken down?
Well a simple solution may be to run a mirror of the application's volume. While the app runs on one volume, the mirror volume can be taken offline and streamed to backup. Of course when the mirror is placed back online, the mirrored volume needs to be resynced. Since 1TB of data takes around 16 hours to resync, it can be quite a long wait. However a small "resync" volume can be created that simply tracks the changes after the mirror is taken offline; this then updates the mirror in a fraction of the time.
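The reason the change-tracking approach is so much faster is that only the regions written while the mirror was detached need to be copied back. A rough sketch, with invented names and block counts:

```python
# Sketch of change-tracked mirror resync: while the mirror is offline
# (being streamed to tape), writes to the primary are recorded in a
# "dirty" set; resync then copies only those blocks rather than the
# whole volume. Purely illustrative.

class MirrorPair:
    def __init__(self, nblocks):
        self.primary = [0] * nblocks
        self.mirror = [0] * nblocks
        self.dirty = set()        # blocks changed while mirror offline
        self.offline = False

    def write(self, block, value):
        self.primary[block] = value
        if self.offline:
            self.dirty.add(block)     # track the change, copy later
        else:
            self.mirror[block] = value

    def detach(self):
        self.offline = True           # mirror can now go to tape

    def resync(self):
        copied = 0
        for block in self.dirty:      # copy only dirty blocks
            self.mirror[block] = self.primary[block]
            copied += 1
        self.dirty.clear()
        self.offline = False
        return copied

pair = MirrorPair(1_000_000)
pair.detach()
for b in (3, 42, 99):
    pair.write(b, 1)
print(pair.resync())  # 3 blocks copied instead of 1,000,000
```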
There are pros and cons to either file- or block-based incremental backups. File-based backups are naturally larger than block-based backups, but restoring a file-based backup is generally less involved and may at times be quicker when all you need is a particular file.
HP OpenView Storage Data Protector V 5.1
Data Protector can centrally manage the entire backup requirements of your enterprise through what is, at first appearance, a simple interface. There is of course a lot more complexity as you delve deeper. From the console, servers and devices can be administered remotely, right down to installing clients and modules on remote servers.
Managing devices is not as intuitive as in some other packages we have seen, but it does not take overly long to come to grips with the underlying logic, and there is quite a list of default devices provided.
Data Protector works by dividing your business into "cells" where a cell has a cell manager (who maintains the overall backup database), client systems, and backup devices. The client systems must have the backup agent installed to be managed by the cell manager but of course this can be remotely deployed.
So if you have a very large or geographically diverse company, it can be divided up into a number of cells to provide logical groupings or separate administrative control, or to suit performance considerations.
The product has very powerful media management and allows grouping of media into media pools so that devices do not need to be managed individually. Bar codes are supported and so media can be tracked, loaded and unloaded, rotated, and managed automatically.
Data Protector supports direct or serverless backups, split mirror backups--the product allows up to three mirrors of the primary volume, which can be rotated--and snapshots or point-in-time copies of databases.
Microsoft Windows Server 2003's Volume Shadow Copy service--technically a form of snapshot--is supported, along with Microsoft's Automated System Recovery.
The administration software runs on Windows NT to 2003, HP-UX 11.x, and Sun Solaris 7 and 8. Backup agents are available for a very wide range of OSes, including XP 64-bit and Compaq Tru64, Novell Netware, and various flavours of Unix and Linux. Application agent support is also extensive covering all of the major databases and backup device servers.
Product: HP OpenView Storage Data Protector V 5.1
Price: $1559 (single server edition for Windows); $2604 (starter pack for Windows)
Phone: 13 13 47
- Wide platform and database support.
- Capabilities can grow with your enterprise with the addition of other OpenView components.
- For basic network functionality, the starter pack is probably the one to pick.
- Unlimited phone support between 8am and 5pm and some software updates. Extended support packages are available at additional cost.
PowerQuest V2i Protector 2.0 Small Business Edition
V2i Protector takes image-based backups of entire drives rather than individual files. So if your entire server goes belly up, you simply wheel in a new server and perform a "bare metal" restore in a matter of minutes without needing to fiddle with drivers, patches, and updates (a bare metal restore bypasses reinstalling the OS, creates a disk partition automatically, and recovers the entire system without manual intervention).
In addition to the bare metal restore, Protector is capable of restoring just the OS, data, or individual files or folders. A neat feature is that backup image files can be mounted as read-only drives for shared access by others.
No doubt you can see a limitation in this approach: for a bare metal restore, the new server must be pretty much identical to the old; you can't replace your old no-name server with a new HP box, for example. Another limitation with Protector is that it only supports Windows 200x, Windows NT, and Windows Small Business Servers.
The user interface is very clean and simple, which is perhaps not surprising given the overall simplicity of the feature set. The interface is so easy to navigate that it is possible in just a few minutes to create a backup task and back up your server without a single glance at the online help.
Creating a backup job is just a few straightforward steps; there are not a lot of options other than full or incremental backup and your basic scheduling. Restoring your drive is just as straightforward although if the system drive goes belly up, a bootable rescue disk can be created to retrieve the image. It is also easy to find and restore a single file or folder from the backup image file.
Product: PowerQuest V2i Protector 2.0 Small Business Edition
Price: $2080 (single license)
Phone: 02 9521 6466
- Windows OS support only.
- Reasonably basic feature set works very well but does not cater for infrastructure growth as well as some other solutions.
- Average value for money given its limited feature set and target of the small business market.
- Phone support during normal business hours and 24x7 Web access. A further year's support is renewable at 21 percent of the license cost.
Snap Server 4500 and Backup Express
The server is powered by a 2.4GHz Pentium 4 processor with 512MB of memory, expandable to 3GB. In our configuration, it was equipped with four 250GB hard drives for around 700GB of RAID 5 storage. The hardware supports a good range of network file protocols including Microsoft, Unix, and Apple, and supports clients from Windows 95 to XP, Mac OS 8.x to X, and Unix including Solaris, HP-UX, AIX, SCO, and Linux. Connectivity to the LAN is provided by a pair of gigabit Ethernet ports.
Managing the server is a breeze, with its simple Web interface and Linux-based OS called Guardian. Guardian is preloaded and configured; on our system it even included backup images of the clean system should you need to reinstall. Guardian features a journaling file system and to further improve robustness and security, has eTrust Antivirus software integrated, as well as support for SSL v3 and a password encrypted SSH CLI.
The OS has embedded snapshot technology for non-disruptive backups and server-to-server synchronisation. The unit also shipped with Backup Express to support backups to local tape.
As can be seen from the screenshot, configuring a snapshot task could not be easier and the disaster recovery window is similarly simple to use.
The software package supplied with the Snap Server is a product called Backup Express for GuardianOS from Syncsort, and the version that shipped is an "SE" edition in that it does not feature full functionality. The software will only back up the Snap Server itself plus up to five additional Snap Servers, and then only to a locally attached SCSI tape device. The software is very easy to drive and can be upgraded to the fully functional Enterprise Edition for an added cost.
Product: Snap Server 4500 and Backup Express
Phone: 02 9318 4222
- Supports a wide range of network file protocols but the software supplied is restricted to GuardianOS functionality.
- As the enterprise grows multiple servers can be implemented although the software will need upgrading to increase backup flexibility.
- The server is modestly priced given its high disk capacity.
- One-year standard RTB warranty. Premium support plans are available for a fee.
Veritas NetBackup Business Server V4.5
The software has two main components: the administration console and the backup, archive, and restore console. There are other extra-cost options such as intelligent disaster recovery ($1525), specifically for Windows platforms, or bare metal restore ($1525), which as the name suggests can be scripted to deploy bare metal servers to full functionality. Bare metal restore currently supports Windows, Solaris, HP-UX, and AIX, and will soon support Linux. We should also note that the basic software will happily back up any files you wish, but live backup of a database, for example, requires the purchase of the relevant backup agent; these are around $2182 each for Microsoft Exchange and SQL Server, for example.
The administration console features a simple tree-structure layout for selecting main functions such as policy management, media and device management, and host properties. As an item is selected from the tree, a group of supporting subfunctions appears in the right-hand window. We found it very easy to navigate and quite logical in its operation.
As the name suggests, it is here that NetBackup functionality for your site is configured and managed.
In addition to full, incremental, and differential backups, NetBackup can also perform synthetic backups, where a full backup and several incremental backups are combined to produce an up-to-date "full" backup without the need to bog down the database server with the requirements of a true full backup. Another neat feature is the ability to multiplex up to 32 data streams onto a single tape; this can be useful if, for one reason or another, the data stream from a single server is below your tape drive's maximum throughput.
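The merge behind a synthetic backup can be sketched roughly as follows. This is an illustrative toy, not NetBackup's actual mechanism, and the file names are invented:

```python
# Sketch of the synthetic backup idea: merging a full backup with the
# incrementals taken since produces an up-to-date "full" without
# touching the production server again.

def synthesise_full(full, incrementals):
    """full: {path: contents}. incrementals: list of {path: contents},
    oldest first, where contents of None marks a deleted file."""
    merged = dict(full)
    for inc in incrementals:
        for path, contents in inc.items():
            if contents is None:
                merged.pop(path, None)    # file deleted since the full
            else:
                merged[path] = contents   # newer version wins
    return merged

full = {"a.txt": "v1", "b.txt": "v1"}
monday = {"a.txt": "v2"}                  # a.txt changed
tuesday = {"b.txt": None, "c.txt": "v1"}  # b.txt deleted, c.txt added

# The synthetic full holds only the newest version of each file.
print(synthesise_full(full, [monday, tuesday]))
```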
Where server performance is critical, NetBackup can manage the staging of the backup data to an intermediate drive array prior to the subsequent transfer to tape thus reducing the time the main server must contend with backing up. The backup agent is quite intelligent, unlike some packages where multiple copies result in additional server load. NetBackup simply grabs the data from the server once and maintains a buffer, on another server, to feed the single backup stream to multiple tapes.
Vaulting is also supported: the software manages the tapes and maintains a database of which tapes must be sent off site and which tapes are to be returned from the vault for reuse. This extends right down to tape "pick lists" and if your backup hardware supports the feature, ejecting the relevant tape/s for the operators to dispatch.
While you can nail down backup strategies using the very flexible policy editor, it is also possible to carry out more ad-hoc backups and restores with the backup, archive, and restore applet. Again a simple tree structure is employed, but this is only used to navigate your volumes and folders. As the name suggests, functionality is pretty limited, but it makes a simple chore of any ad-hoc backup tasks.
Product: Veritas NetBackup Business Server V4.5
Price: $4375 (with 4 client license)
Phone: 02 8220 7000
- Supports a wide range of OSes and databases, and has very broad backup device support.
- Can be integrated with Veritas's extensive range of network management software.
- Average value for money given its strong feature set.
- Standard support is e-mail only with no guaranteed response time. Full 24x7 support is available for around 23 percent of the license cost per year.
- Will the software be able to back up a variety of operating systems, databases, and other applications?
- Will the software be able to scale and extend its capabilities to grow with your business?
- Does the price justify the productivity you may gain by using the software?
- What options are available for service and support, and how much do they cost?
Disaster recovery techniques
- Synchronous Replication
  - Data is current at all sites.
  - Should a failure occur, data is available immediately.
  - Generally impacts on application performance.
  - Data writes can lag by the network latency between the remote sites; this also results in write commit latencies across nodes.
- Asynchronous Replication
  - Should have less impact on application performance than synchronous replication, maximising performance.
  - Data is available immediately at the secondary replication site.
  - Secondary replication site data may lag the primary site.
  - Potential for data corruption with some databases.
- Periodic Replication
  - Data is available immediately at the secondary replication site.
  - Data is old and possibly inconsistent.
  - Requires more storage to maintain overall consistency.
  - Reconciling data sets may be very difficult.
Remote mirroring--where your primary and secondary sites are connected by SAN fabric--is certainly a good way to go. With volume management software, the physical disks on both sites are transparently (to the application and users) organised into logical volumes, and with some management software, this can be hardware independent. With the right management software, the administration is quite simple.
Both replication and remote mirroring have their pros and cons: mirroring over the SAN is generally restricted to distances of less than 100km but can occur synchronously, while replication can take place at distances greater than 100km--using IP, for example--but unless you can live with application performance degradation, it is generally asynchronous.
Clustering is a seamless way of improving the chances of business continuation during a disaster. A simple definition of clustering is the use of multiple servers, storage devices, and redundant connections, which appear to the outside world as a single system.
Clustering--in addition to providing high availability--has the added benefit that it can be used for load balancing. Of course the application must support clustering, and large applications such as SAP and Oracle certainly do.
A cluster can exist over a SAN so the primary site can be relatively remote to the other failover sites, if your organisation is large enough and geographically dispersed. Global clustering can be linked by a public carrier WAN and still employ a single point of monitoring and administration.
A typical architecture may be to have the main cluster at the primary site, so that if a single server fails, switching to a failover server will not result in any performance hit; the remote site may not be used at all unless a major accident takes out your primary cluster.
Disaster recovery precautions
Don't put all your eggs in one basket!
Where is your failover server or tape repository located? The building next door maybe?
It does not take a genius to see that this is not terribly bright: a fire or some other local disaster could very easily take out the building next door as well. If you want to be really sure about your DR security then you must think of more than simply locating your backups further away than the next building.
If you do happen to locate your DR facility in a relatively distant remote site, is it on the same power grid, or does it use the same telco? A power blackout or the telco going down may knock out your primary site, but you still want customers to be able to carry out transactions on your secondary site.
What about other geographic location considerations?
Do your primary and secondary sites sit on the same fault line? Of course in Australia this is not as important as it is for, say, a Californian company, but it is worth considering.
What about flood plains? If both sites are located in relatively low-lying areas, a freak downpour--and let's face it the weather is quite unpredictable of late--could result in both sites being inundated.
And both vendors and governments never tire of telling us about the grimmer possibilities. Locating one site in Sydney and the other in Melbourne might be all well and good, but if both sites are located next to "strategic" sites such as airports or in the CBD, it may not be such a great idea.
Always locate the vault offsite--a fireproof safe is simply not good enough.
To speed up DR in the event of a problem, it can help to have two copies of each backup: one for the offsite vault and the second retained in a safe place either onsite or, preferably, a couple of minutes away. This way, should your normal garden-variety failure occur--such as an array going west, not a catastrophic failure such as your office burning down--the second set can be located and installed in a matter of minutes without having to source, and pay the fee for, the copy from the vault.
A typical offsite vaulting process may involve the following steps:
- Carry out the backups (ensure two copies of each)
- Catalogue tapes and prepare for pickup by vaulting supplier.
- Tapes are transported to the offsite vault.
- Tapes are stored in the vault and the location of each tape recorded.
- When the tapes expire, based on your backup policies, they are transported back onsite for reuse.
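The bookkeeping behind those steps can be sketched as follows. This is an illustrative example only; the retention period, tape IDs, and class names are our own invention, not any vaulting product's API:

```python
from datetime import date, timedelta

# Minimal sketch of vaulting bookkeeping: which tapes go offsite,
# where each one is, and which have expired and are due back onsite
# for reuse. The 28-day retention policy is a made-up example.

RETENTION = timedelta(days=28)

class VaultCatalogue:
    def __init__(self):
        self.tapes = {}  # tape_id -> {"written": date, "location": str}

    def backup(self, tape_id, written):
        # Steps 1-2: catalogue the tape and prepare it for pickup.
        self.tapes[tape_id] = {"written": written, "location": "pickup"}

    def vault(self, tape_id, slot):
        # Steps 3-4: tape transported offsite, vault slot recorded.
        self.tapes[tape_id]["location"] = f"vault:{slot}"

    def pick_list(self, today):
        # Step 5: expired tapes to bring back onsite for reuse.
        return sorted(
            t for t, info in self.tapes.items()
            if info["location"].startswith("vault:")
            and today - info["written"] >= RETENTION
        )

cat = VaultCatalogue()
cat.backup("T001", date(2003, 5, 1)); cat.vault("T001", "A17")
cat.backup("T002", date(2003, 5, 20)); cat.vault("T002", "A18")
print(cat.pick_list(date(2003, 6, 2)))  # only T001 has expired
```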
Creating your two copies of data may be a problem if, for example, you create a single backup tape and then copy the data from one tape to another. The delay in getting one copy offsite and into a vault offers a window of opportunity for a disaster to occur that potentially takes out both copies while they are at the primary site.
It would of course be wise to create both tapes at the same time during a backup; however, many backup agents are not very smart and try to drag two copies of the data from the server and push them over the network, creating undue work for the server, storage array, and network. The smarter backup agents will grab the data once and create a buffer on the backup server to cater for the potentially different backup speeds of the two tapes.
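The "grab once, write twice" pattern can be sketched as a simple producer feeding two buffered consumers. This is an illustrative toy, not any backup agent's implementation; lists stand in for tape drives:

```python
import queue
import threading

# Sketch of buffered dual-tape writing: each block is read from the
# server a single time, and per-drive queues absorb the difference in
# speed between the two tape drives.

def backup_to_two_tapes(blocks, tape_a, tape_b):
    buf_a, buf_b = queue.Queue(), queue.Queue()

    def writer(buf, tape):
        while True:
            block = buf.get()
            if block is None:        # sentinel: no more data
                break
            tape.append(block)       # stand-in for a real tape write

    threads = [threading.Thread(target=writer, args=(buf_a, tape_a)),
               threading.Thread(target=writer, args=(buf_b, tape_b))]
    for t in threads:
        t.start()
    for block in blocks:             # each block read from source once
        buf_a.put(block)
        buf_b.put(block)
    for buf in (buf_a, buf_b):
        buf.put(None)
    for t in threads:
        t.join()

tape_a, tape_b = [], []
backup_to_two_tapes([b"blk0", b"blk1", b"blk2"], tape_a, tape_b)
print(tape_a == tape_b == [b"blk0", b"blk1", b"blk2"])  # True
```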
It would of course be nice if the server could be kept out of the equation so it can devote all its time to serving the users. A true serverless backup is possible. As an example, your storage array can be connected to a smart fabric switch that includes an intelligent backup agent API. In this case, the backup data is streamed straight from the storage array to the fabric switch and then directly to the tape backup.
So you purchased some backup tapes and got a great deal; they are not your usual brand, but hey, a tape cartridge is a tape cartridge, right?
The industry scuttlebutt goes like this: when the tape media is manufactured it is wound onto large drums, the outside portion of the winding actually stretches, degrading the tape quality. Typically the outside winding is sold relatively cheaply to the bargain-price cartridge manufacturers and the inner windings are sold at a higher price to the "premium" tape manufacturers.
Like most things in life, you get what you pay for. Unless of course this is just a rumour spread by the more expensive cartridge manufacturers to justify their inflated prices.
About RMIT IT Test Labs
RMIT IT Test Labs is an independent testing institution based in Melbourne, Victoria, performing IT product testing for clients such as IBM, Coles-Myer, and a wide variety of government bodies. In the Labs' testing for T&B, they are in direct contact with the clients supplying products and the magazine is responsible for the full cost of the testing. The findings are the Labs' own--only the specifications of the products to be tested are provided by the magazine. For more information on RMIT, please contact the Lab Manager, Steven Turvey.