
Geek Sheet: Bare-metal backup and recovery

Written by Jason Perlow, Senior Contributing Writer

Tired of the same old punditry and OS wars? Want to read something practical you can actually use and apply to your real job? Or perhaps you need some light reading material to help you get some sleep on the plane between consulting engagements – either way, welcome to the first in what I hope will be a series of technical HOWTO articles, entitled “Geek Sheets”.

One of my favorite Linux tools and live CD distributions is the System Rescue CD. It allows you to boot any x86-, PowerPC- or SPARC-based machine and perform any number of backup and recovery tasks on Linux, Mac, Solaris and Windows-based systems. The System Rescue CD can even be booted on completely diskless systems from a USB stick or PXE-booted over the network. In another publication, I went into depth on how to use some of the System Rescue CD's basic functions and how to bare-metal image a typical desktop and server Linux or Windows configuration.

However, that article was aimed at an end-user or utility computing scenario with fixed file system configurations on local storage, not at an enterprise server machine. For storage flexibility, many Linux server systems today use LVM, the Logical Volume Manager, rather than the fixed file system partitioning that desktop distributions such as Ubuntu typically use. This adds a layer of abstraction on top of the bare-metal partition layout and introduces some complexity as to how these systems can be imaged.

As a best practice, enterprises should look to SAN-based replication and disaster recovery for a system imaging and snapshot solution. However, we all know that it isn't always cost effective or practical to have a boot-from-SAN or SAN-replicated server infrastructure, and it may sometimes be desirable – such as during a data center move – to have a bare-metal “snapshot” backup of an entire system on a network file store or a portable storage device, from which everything can be quickly restored as it was without some sort of complicated rebuild. I recently had to perform this procedure for an engagement I was working on, and it dawned on me that, while probably very useful, it was not documented anywhere in one place.

The good news is that it can still all be done with standard Linux tools, the System Rescue CD and without any expensive proprietary software. The bad news is that nobody has automated the process. I'm hoping that some enterprising college or high school kid looking for a summer project will take the information I am posting here and create a utility or script for inclusion in the next System Rescue CD build. Can you say “Google Summer of Code”?

I would like to give special thanks to Durham, NC-based Linux sysadmin Mike Brennan for helping me to develop this procedure.


Storing your System Images and creating the NFS mount

If you have a bunch of machines that you need to image, the easiest way is to set up an NFS export, with sufficient storage allocated to it, on a Linux (or UNIX) box that all of your systems can reach over the network. As we will be using Partimage for the imaging solution, only the used portions of the partitions are going to be stored, so even if you have multiple 500GB drives on the source machines that are only 20 percent utilized, only the actual used data will be sent across. This is far more efficient than using the Unix/Linux “dd” command, which dumps an entire block device, including all of the zeroed, unused bits, to an image file. On large filesystems this can make a big difference in backup and restore time.
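For comparison, a raw dd dump of a whole disk would look something like the line below (a sketch only, with placeholder device and output names). This is exactly the approach we are avoiding, since it copies every zeroed, unused block along with the real data:

dd if=/dev/sda of=/mnt/backup/backuptest1/sda-raw.img bs=1M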

On most Linux systems, NFS is enabled by starting the scripts /etc/init.d/portmap and /etc/init.d/nfsd (or /etc/init.d/nfs-kernel-server). The configuration file /etc/exports is what defines the actual shared directories and sets the network permissions. Here is an example listing of that file that will permit any computer on the network to dump to it:

/home/baremetal *(async,rw,no_root_squash)

This exports the directory “/home/baremetal” to everyone on the network, with full read-write permissions. For more information on how to set up an NFS server, please consult the Linux NFS HOWTO at SourceForge.
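If the NFS server was already running when you edited /etc/exports, something like the following will publish the new export and let you verify it (standard nfs-utils commands; substitute your server's hostname for localhost):

exportfs -ra
showmount -e localhost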

Note that even a Windows system will suffice as an NFS server if you install Services for UNIX and enable the NFS service, which is built into Server 2003 R2 and Server 2008.

Once the NFS server is started, you may now boot with your copy of the System Rescue CD and issue the following commands:

root@sysresccd % /etc/init.d/nfs restart

root@sysresccd % mount 192.168.1.100:/home/baremetal /mnt/backup

Here “192.168.1.100” is the IP address or DNS hostname of the NFS server. Please note that when an IBM AIX machine is acting as the NFS server, a known bug causes the remote mount to fail unless the client machine is registered in DNS or in the local /etc/hosts file of the NFS server.

Once the NFS volume is mounted on /mnt/backup of the System Rescue CD, make a directory for each system you wish to back up with the mkdir command, such as “mkdir /mnt/backup/backuptest1”.


Backing up the System

The first thing we need to do is back up the partition geometry of all the storage devices on the system. In my sample case, I have an x86-based Linux system using LVM on a hardware-based RAID 5 array, so the partitions come up under the /dev/sd(x) nomenclature. I only have a single RAID device, /dev/sda. On a software RAID, it will come up as /dev/md(x). Similarly, file systems mounted on a SAN using Host Bus Adapters (HBAs) will also appear as /dev/sd(x).
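Before going further, it helps to take a quick inventory of the disks and partitions the kernel has detected. A couple of standard commands will do it (shown here against the example device /dev/sda; your device names will differ):

root@sysresccd % cat /proc/partitions
root@sysresccd % fdisk -l /dev/sda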

The System Rescue CD's kernel has pre-compiled modules for the most common Host Bus Adapter and SCSI/SAS/SATA/RAID controller types, including QLogic and Emulex. You can issue a “dmesg > /mnt/backup/backuptest1/dmesg.txt” command to dump the kernel boot messages, which contain all the detected devices on the system, to a file. Alternatively, a “df -h” will show you the block devices and file systems that are currently mounted. The following command will display the partition map of my local RAID device, /dev/sda:

root@sysresccd /mnt/backup/backuptest1 % sfdisk -d /dev/sda
# partition table of /dev/sda
unit: sectors

/dev/sda1 : start=       63, size=   208782, Id=83, bootable
/dev/sda2 : start=   208845, size=122881185, Id=8e
/dev/sda3 : start=123090030, size=102398310, Id=8e
/dev/sda4 : start=225488340, size= 60998805, Id= 5
/dev/sda5 : start=225488403, size= 60998742, Id=8e

Similarly, the following command will dump the same output to a file:

root@sysresccd /mnt/backup/backuptest1 % sfdisk -d /dev/sda > /mnt/backup/backuptest1/backuptest-ptable.sda

This output file will be needed to restore the original geometry of the blank partitions prior to restoring the LVM metadata and the partition images themselves. You will need to do this for every major block device on the system (/dev/sda, /dev/sdb, /dev/md0, /dev/md1, etc.).
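If the system has several block devices, a small shell loop saves some typing. This is just a sketch; the device list ("sda sdb") is an assumption you would replace with your own:

root@sysresccd % for d in sda sdb; do sfdisk -d /dev/$d > /mnt/backup/backuptest1/backuptest-ptable.$d; done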

The following command will activate my local Volume Groups and list them:

root@sysresccd % vgchange -ay

File descriptor 4 left open
  1 logical volume(s) in volume group "VolGroup02" now active
  2 logical volume(s) in volume group "VolGroup01" now active
  2 logical volume(s) in volume group "VolGroup00" now active

And the following command will map volume groups to block devices:

root@sysresccd /mnt/backup/backuptest1 % pvscan
File descriptor 4 left open
  PV /dev/sda5   VG VolGroup02   lvm2 [29.06 GB / 25.09 GB free]
  PV /dev/sda3   VG VolGroup01   lvm2 [48.81 GB / 32.81 GB free]
  PV /dev/sda2   VG VolGroup00   lvm2 [58.59 GB / 34.88 GB free]
  Total: 3 [136.47 GB] / in use: 3 [136.47 GB] / in no VG: 0 [0   ]

And here is how to show similar information in greater detail:

root@sysresccd /mnt/backup/backuptest1 % pvdisplay
File descriptor 4 left open
  --- Physical volume ---
  PV Name               /dev/sda5
  VG Name               VolGroup02
  PV Size               29.09 GB / not usable 24.54 MB
  Allocatable           yes
  PE Size (KByte)       32768
  Total PE              930
  Free PE               803
  Allocated PE          127
  PV UUID               S7FvV3-9Vea-hq7K-bJgH-bwd0-3S0N-u3NfPi

  --- Physical volume ---
  PV Name               /dev/sda3
  VG Name               VolGroup01
  PV Size               48.83 GB / not usable 15.17 MB
  Allocatable           yes
  PE Size (KByte)       32768
  Total PE              1562
  Free PE               1050
  Allocated PE          512
  PV UUID               7aYX40-I9HJ-2hcS-gBT0-Xajx-1aka-wQtKww

  --- Physical volume ---
  PV Name               /dev/sda2
  VG Name               VolGroup00
  PV Size               58.59 GB / not usable 592.50 KB
  Allocatable           yes
  PE Size (KByte)       32768
  Total PE              1875
  Free PE               1116
  Allocated PE          759
  PV UUID               7tYXil-6peA-y1j1-OHtP-hNfv-dgjg-3sUfh2

You may wish to issue a “pvdisplay > /mnt/backup/backuptest1/pvdisplay.txt” and print this out for every machine being backed up, should you need to refer to this later.

Here is where we get into the real nitty gritty. On LVM based systems, the actual Logical Volumes that reside within specific Volume Groups are enumerated in the /dev/mapper directory.

 

root@sysresccd /dev/mapper % ls
VolGroup00-LogVol00  VolGroup01-LogVol02  VolGroup02-LogVol00
VolGroup00-LogVol01  VolGroup01-LogVol03  control

On a stock RHEL machine, the default naming convention for Logical Volume entries is “VolGroupNN-LogVolNN”. Your system may have different names. Each one of these entries corresponds to a logical volume configuration file as well as a logical volume image file that we are going to store on the remote server. The “control” file is not backed up to an image. Now we issue the command(s) to back up the LVM metadata for each Volume Group:

 

root@sysresccd /mnt/backup/backuptest1 % vgcfgbackup -d -v VolGroup00 -f /mnt/backup/backuptest1/backuptest1-VolGroup00.lvm.backup

root@sysresccd /mnt/backup/backuptest1 % vgcfgbackup -d -v VolGroup01 -f /mnt/backup/backuptest1/backuptest1-VolGroup01.lvm.backup

root@sysresccd /mnt/backup/backuptest1 % vgcfgbackup -d -v VolGroup02 -f /mnt/backup/backuptest1/backuptest1-VolGroup02.lvm.backup
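If you would rather not run one vgcfgbackup per group, you can loop over whatever volume groups LVM reports. A sketch, assuming the same backup directory as above:

root@sysresccd % for vg in $(vgs --noheadings -o vg_name); do vgcfgbackup -d -v $vg -f /mnt/backup/backuptest1/backuptest1-$vg.lvm.backup; done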

With the LVM metadata preserved, we back up each physical partition using the partimage program.

root@sysresccd /mnt/backup/backuptest1 % partimage -d -b -z0 save /dev/sda1 /mnt/backup/backuptest1/backuptest-sda1.img

/dev/sda1 is our /boot partition, which does not reside in a volume group; it's a fixed ext3 file system. The -d flag bypasses the description prompt, -b executes in batch mode with no user intervention, and -z0 specifies no compression. Alternatively, -z1 and -z2 add gzip and bzip2 compression respectively, but they also slow down the backup process considerably.
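As an example, if the network link to the NFS server is the bottleneck rather than CPU time, the same save command with -z1 trades speed for a smaller image (the .img.gz file name is just my own naming convention, not something partimage requires):

root@sysresccd /mnt/backup/backuptest1 % partimage -d -b -z1 save /dev/sda1 /mnt/backup/backuptest1/backuptest-sda1.img.gz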

The second image file to be backed up, the first logical volume, adds the -M flag, which tells partimage not to save the Master Boot Record. This is a precaution more than anything else, so that the only image saved with an MBR is /dev/sda1, the /boot partition.

root@sysresccd /mnt/backup/backuptest1 % partimage -d -M -b -z0 save /dev/mapper/VolGroup00-LogVol00 /mnt/backup/backuptest1/backuptest-VolGroup00-LogVol00.img

Now we do this for the rest of the logical volumes:

root@sysresccd /mnt/backup/backuptest1 % partimage -d -M -b -z0 save /dev/mapper/VolGroup00-LogVol01 /mnt/backup/backuptest1/backuptest-VolGroup00-LogVol01.img

root@sysresccd /mnt/backup/backuptest1 % partimage -d -M -b -z0 save /dev/mapper/VolGroup01-LogVol02 /mnt/backup/backuptest1/backuptest-VolGroup01-LogVol02.img

root@sysresccd /mnt/backup/backuptest1 % partimage -d -M -b -z0 save /dev/mapper/VolGroup01-LogVol03 /mnt/backup/backuptest1/backuptest-VolGroup01-LogVol03.img

root@sysresccd /mnt/backup/backuptest1 % partimage -d -M -b -z0 save /dev/mapper/VolGroup02-LogVol00 /mnt/backup/backuptest1/backuptest-VolGroup02-LogVol00.img

(This final command, for VolGroup02-LogVol00, is not actually necessary: that logical volume is the swap partition, which is recreated from the restored LVM metadata, so the command accomplishes nothing.)
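This step, too, can be scripted. A minimal sketch, which assumes every /dev/mapper entry except “control” and the swap volume (VolGroup02-LogVol00 in this example) should be imaged:

root@sysresccd % for lv in /dev/mapper/VolGroup*; do
  [ "$lv" = "/dev/mapper/VolGroup02-LogVol00" ] && continue
  partimage -d -M -b -z0 save $lv /mnt/backup/backuptest1/backuptest-$(basename $lv).img
done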

So here are the files that should be in your backup directory after you are done:

 

root@sysresccd /mnt/backup/backuptest1 % ls
backuptest.ptable                       backuptest1-VolGroup00.lvm.backup
backuptest-VolGroup00-LogVol00.img.000  backuptest1-VolGroup01.lvm.backup
backuptest-VolGroup00-LogVol01.img.000  backuptest1-VolGroup02.lvm.backup
backuptest-VolGroup01-LogVol02.img.000  backuptest1-pvscan.txt
backuptest-VolGroup01-LogVol02.img.001  backuptest1-vgdisplay.txt
backuptest-VolGroup01-LogVol03.img.000
backuptest-sda1.img.000


Restoring the System

So you've experienced a catastrophic failure on one of your servers during a datacenter move, or someone accidentally re-initialized your drive array during a night of wild, inebriated partying at the KVM console. No problem, you have a bare-metal backup! Boot from the System Rescue CD, re-mount your NFS share, and get to work.

Restore the partition tables:

sysresccd backuptest1 # sfdisk /dev/sda < /mnt/backup/backuptest1/backuptest-ptable.sda
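If you dumped partition tables for more than one device during the backup, restore each in turn. A sketch, where the device list is again an assumption you would replace with your own:

sysresccd backuptest1 # for d in sda sdb; do sfdisk /dev/$d < /mnt/backup/backuptest1/backuptest-ptable.$d; done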

Re-create the Physical Volumes:

sysresccd backuptest1 # pvcreate --uuid ASDFASDFASDFASDFASDF /dev/sda2

You'll want to use the original UUID of your physical volume, detailed in the "pvdisplay" command output from earlier. You'll need to do this for every physical volume. EDITOR'S NOTE -- this may be a redundant step, because the next step restores the UUIDs.
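On reasonably current LVM versions, pvcreate can also be pointed at the vgcfgbackup file with --restorefile, which lays the physical volume out exactly as recorded in the backed-up metadata; you still supply the UUID from your pvdisplay notes. A sketch using the /dev/sda2 values from the example above:

sysresccd backuptest1 # pvcreate --uuid 7tYXil-6peA-y1j1-OHtP-hNfv-dgjg-3sUfh2 --restorefile /mnt/backup/backuptest1/backuptest1-VolGroup00.lvm.backup /dev/sda2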

Next, restore the LVM metadata to restore the /dev/mapper entries:

sysresccd backuptest1 # vgcfgrestore --file /mnt/backup/backuptest1/backuptest1-VolGroup00.lvm.backup VolGroup00

Restored volume group VolGroup00

sysresccd backuptest1 # vgcfgrestore --file /mnt/backup/backuptest1/backuptest1-VolGroup01.lvm.backup VolGroup01

Restored volume group VolGroup01

sysresccd backuptest1 # vgcfgrestore --file /mnt/backup/backuptest1/backuptest1-VolGroup02.lvm.backup VolGroup02

Restored volume group VolGroup02

Verify that the volume groups have been re-created:

sysresccd backuptest1 # pvscan
  PV /dev/sda5   VG VolGroup02   lvm2 [29.06 GB / 25.09 GB free]
  PV /dev/sda3   VG VolGroup01   lvm2 [48.81 GB / 32.81 GB free]
  PV /dev/sda2   VG VolGroup00   lvm2 [58.59 GB / 34.88 GB free]
  Total: 3 [136.47 GB] / in use: 3 [136.47 GB] / in no VG: 0 [0   ]

Activate the volume groups:

sysresccd backuptest1 # vgchange -ay
  1 logical volume(s) in volume group "VolGroup02" now active
  2 logical volume(s) in volume group "VolGroup01" now active
  2 logical volume(s) in volume group "VolGroup00" now active

Restore the Master Boot Record, which was saved with the /boot partition image:

sysresccd backuptest1 # partimage -e -b restmbr /dev/sda1 /mnt/backup/backuptest1/backuptest-sda1.img.000

Restore the partition data using partimage for each partition:

root@sysresccd /mnt/backup/backuptest1 % partimage -e -b restore /dev/sda1 /mnt/backup/backuptest1/backuptest-sda1.img.000

sysresccd backuptest1 # partimage -e -b restore /dev/mapper/VolGroup00-LogVol01 /mnt/backup/backuptest1/backuptest-VolGroup00-LogVol01.img.000

sysresccd backuptest1 # partimage -e -b restore /dev/mapper/VolGroup01-LogVol02 /mnt/backup/backuptest1/backuptest-VolGroup01-LogVol02.img.000

sysresccd backuptest1 # partimage -e -b restore /dev/mapper/VolGroup01-LogVol03 /mnt/backup/backuptest1/backuptest-VolGroup01-LogVol03.img.000
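As with the backups, these restores can be looped over the image files. A minimal sketch, assuming the naming convention used above and that the matching /dev/mapper entries already exist from the vgcfgrestore step:

sysresccd backuptest1 # for img in /mnt/backup/backuptest1/backuptest-VolGroup*.img.000; do
  lv=$(basename $img .img.000 | sed 's/^backuptest-//')
  partimage -e -b restore /dev/mapper/$lv $img
done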

Restore the original swap partition:

sysresccd backuptest1 # mkswap /dev/mapper/VolGroup02-LogVol00
Setting up swapspace version 1, size = 4261408 kB
no label, UUID=f33236e4-cef7-454e-8455-7c04412a87cd

Deactivate LVM prior to rebooting the machine:

sysresccd backuptest1 # vgchange -an
  0 logical volume(s) in volume group "VolGroup02" now active
  0 logical volume(s) in volume group "VolGroup01" now active
  0 logical volume(s) in volume group "VolGroup00" now active

Sync the filesystems:

sysresccd backuptest1 # sync

Reboot the box:

sysresccd backuptest1 # reboot

Broadcast message from root (pts/1) (Thu May 1 15:58:35 2008):

The system is going down for reboot NOW!

Eject the System Rescue CD, reboot as normal. Pat yourself on the back and crack open a frosty one, you've saved the day, supergeek.

Got any more ideas for Geek Sheets you'd like to see published? Talk Back and let me know, or reach me through my contact page.
