Tired of the same old punditry and OS wars? Want to read something practical you can actually use and apply to your real job? Or perhaps you need some light reading material to help you get some sleep on the plane between consulting engagements – either way, welcome to the first in what I hope will be a series of technical HOWTO articles, entitled “Geek Sheets”.
One of my favorite Linux tools and live CD distributions is the System Rescue CD. It allows you to boot any x86-, PowerPC- or SPARC-based machine and perform any number of backup and recovery tasks on Linux, Mac, Solaris and Windows-based systems. The System Rescue CD can even be booted on completely diskless systems from a USB stick or PXE-booted over the network. In another publication, I went into depth on how to use some of System Rescue CD's basic functions and how to bare-metal image a typical desktop and server Linux or Windows configuration.
However, that article was aimed at an end-user or utility computing scenario with fixed file system configurations on local storage, not at an enterprise server machine. For storage flexibility, many Linux server systems today use LVM, the Logical Volume Manager, rather than the fixed file system partitioning that desktop distributions such as Ubuntu typically use. This adds a layer of abstraction on top of the bare-metal partition layout and introduces some complexity as to how these systems can be imaged.
As a best practice, enterprises should look to do SAN-based replication and disaster recovery for a system imaging and snapshot solution. However, we all know that it isn't always cost effective or practical to have a boot from SAN or SAN-replicated server infrastructure, and it may sometimes be desirable – such as during a data center move – to have a bare-metal “snapshot” backup of an entire system on a network file store or a portable storage device where everything can be quickly restored as it was without having to do some sort of complicated rebuild. I recently had to perform this for an engagement I was working on and it dawned on me that this procedure, while probably very useful, was not documented anywhere in one place.
The good news is that it can still all be done with standard Linux tools, System Rescue CD and without any expensive proprietary software. The bad news is that nobody has automated the process. I'm hoping that some enterprising college or high school kid looking for a summer project will take the information I am posting here and create a utility or script for inclusion in the next System Rescue CD build. Can you say “Google Summer of Code?”
I would like to give special thanks to Durham, NC-based Linux sysadmin Mike Brennan for helping me to develop this procedure.
Storing your System Images and creating the NFS mount
If you have a bunch of machines that you need to image, the easiest way is to set up an NFS export on a Linux box (or UNIX machine) on a network that all your systems can talk to, with sufficient storage allocated to it. As we will be using Partimage for the imaging solution, only the used portions of the partitions are going to be stored, so even if you have multiple 500GB drives on the source machines, if they are only 20 percent utilized, only the actual used data will be sent across. This is far more efficient than using the Unix/Linux "dd" command, which dumps an entire block device to an image file, unused blocks and all. On large filesystems this can make a big difference in backup and restore time.
On most Linux systems, NFS is enabled by starting the scripts /etc/init.d/portmap and /etc/init.d/nfsd (or /etc/init.d/nfs-kernel-server). The configuration file /etc/exports is what defines the actual shared directories and sets the network permissions. Here is an example listing of that file that will permit any computer on the network to dump to it:
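A minimal /etc/exports entry matching the description below would look like the following (the no_root_squash and sync options are my assumptions; adjust them to your security policy):

```
# /etc/exports -- export /home/baremetal read-write to every host.
# no_root_squash lets root on the rescue CD write to the share as root.
/home/baremetal *(rw,no_root_squash,sync)
```

After editing /etc/exports, run "exportfs -ra" to make the NFS server pick up the change.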
This exports the directory "/home/baremetal" to everyone on the network, with full read-write permissions. For more information on how to set up an NFS server, please consult the Linux NFS HOWTO at SourceForge.
Note that even a Windows system will suffice as an NFS server if you install Services for UNIX and enable the NFS service, which is built into Server 2003 R2 and Server 2008.
Once the NFS server is started, you may now boot with your copy of the System Rescue CD and issue the following command:
root@sysresccd % mount 192.168.1.100:/home/baremetal /mnt/backup
Where "192.168.1.100" is the actual IP address or DNS hostname of the NFS server. Please note that with IBM AIX machines running as NFS servers, a known bug exists where the remote mount fails unless the client machine is registered in DNS or in the local /etc/hosts file of the NFS server.
Once the NFS volume is mounted on /mnt/backup of the System Rescue CD, make a directory for each system you wish to back up with the mkdir command, such as “mkdir /mnt/backup/backuptest1”
The first thing we need to do is back up the partition geometry of all the storage devices on the system. In my sample case, I have an x86-based Linux system using LVM on hardware-based RAID5, so the partitions will come up under /dev/sd(x) nomenclature. I only have a single RAID device, /dev/sda. On a software RAID, it will come up as /dev/md(x). Similarly, file systems mounted on a SAN using Host Bus Adapters (HBAs) will also appear as /dev/sd(x).
The System Rescue CD's kernel has pre-compiled modules for the most common Host Bus Adapter and SCSI/SAS/SATA/RAID controller types, including QLogic and Emulex. You can issue a "dmesg > /mnt/backup/backuptest1/dmesg.txt" command to dump the kernel boot messages to a file, which contains all the detected devices on the system. Alternatively, "fdisk -l" will list the partition tables of all the block devices the kernel detects. The following command will save the partition map of my local RAID device, /dev/sda:
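One way to capture the geometry is sfdisk's dump mode, which writes a restorable text description of the partition table (the output file name here is my own convention, not from the original article):

```shell
# Dump the partition table of /dev/sda to a text file on the NFS share;
# sfdisk can later recreate exactly the same layout from this file.
sfdisk -d /dev/sda > /mnt/backup/backuptest1/sda-parttable.txt
```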
This output file will be needed to restore the original geometry of the blank partitions prior to restoring the LVM metadata and the partition image itself. You will need to do this for every major block device on the system (/dev/sda, /dev/sdb, /dev/md0, /dev/md1, etc).
The following command will activate my local Volume Groups and report them:
root@sysresccd % vgchange -ay
File descriptor 4 left open
1 logical volume(s) in volume group "VolGroup02" now active
2 logical volume(s) in volume group "VolGroup01" now active
2 logical volume(s) in volume group "VolGroup00" now active
And the following command will map volume groups to block devices:
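The command in question was most likely pvscan, which lists each physical volume alongside the volume group it belongs to (an assumption on my part; pvs gives a similar tabular view):

```shell
# List every LVM physical volume and the volume group it maps to
pvscan
```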
And here is how to show similar information in greater detail:
root@sysresccd /mnt/backup/backuptest1 % pvdisplay
File descriptor 4 left open
--- Physical volume ---
PV Name /dev/sda5
VG Name VolGroup02
PV Size 29.09 GB / not usable 24.54 MB
PE Size (KByte) 32768
Total PE 930
Free PE 803
Allocated PE 127
PV UUID S7FvV3-9Vea-hq7K-bJgH-bwd0-3S0N-u3NfPi
--- Physical volume ---
PV Name /dev/sda3
VG Name VolGroup01
PV Size 48.83 GB / not usable 15.17 MB
PE Size (KByte) 32768
Total PE 1562
Free PE 1050
Allocated PE 512
PV UUID 7aYX40-I9HJ-2hcS-gBT0-Xajx-1aka-wQtKww
--- Physical volume ---
PV Name /dev/sda2
VG Name VolGroup00
PV Size 58.59 GB / not usable 592.50 KB
PE Size (KByte) 32768
Total PE 1875
Free PE 1116
Allocated PE 759
PV UUID 7tYXil-6peA-y1j1-OHtP-hNfv-dgjg-3sUfh2
You may wish to issue a “pvdisplay > /mnt/backup/backuptest1/pvdisplay.txt” and print this out for every machine being backed up, should you need to refer to this later.
Here is where we get into the real nitty gritty. On LVM based systems, the actual Logical Volumes that reside within specific Volume Groups are enumerated in the /dev/mapper directory.
root@sysresccd /dev/mapper % ls
VolGroup00-LogVol00 VolGroup01-LogVol02 VolGroup02-LogVol00
VolGroup00-LogVol01 VolGroup01-LogVol03 control
On a stock RHEL machine, the default naming convention for Logical Volume device files is "VolGroupNN-LogVolNN". Your system may have different names. Each one of these entries corresponds to a unique logical volume configuration file as well as a logical volume image file that we are going to store on the remote server. The "control" file is not backed up to an image.
Now we issue the command(s) to back up the LVM metadata for each Volume Group:
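vgcfgbackup writes a volume group's metadata to a text file that vgcfgrestore can later read back. A sketch for the three sample volume groups (the per-VG file names are my own convention):

```shell
# Save the LVM metadata of each volume group to the NFS share
for vg in VolGroup00 VolGroup01 VolGroup02; do
    vgcfgbackup -f /mnt/backup/backuptest1/${vg}.cfg ${vg}
done
```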
With the LVM metadata preserved, we back up each physical partition using the partimage program.
root@sysresccd /mnt/backup/backuptest1 % partimage -d -b -z0 save /dev/sda1 /mnt/backup/backuptest1/backuptest-sda1.img
/dev/sda1 is our /boot partition, which does not reside in a volume group; it's a fixed ext3 file system. The -d flag bypasses the description prompt, -b executes in batch mode with no user intervention, and -z0 specifies no compression. Alternatively, -z1 or -z2 adds gzip or bzip2 compression respectively, but also slows down the backup process considerably.
The second image file to be backed up, the first logical volume, adds the -M flag, which tells partimage not to save the Master Boot Record. This is a precaution more than anything else, so that the only partition saved with an MBR is /dev/sda1, the /boot partition.
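Assuming the root file system lives on the first logical volume, the corresponding save command would look like this (device and image names mirror the sample system and are illustrative):

```shell
# Save the first logical volume, skipping the Master Boot Record (-M)
partimage -d -b -z0 -M save /dev/mapper/VolGroup00-LogVol00 \
    /mnt/backup/backuptest1/backuptest-VolGroup00-LogVol00.img
```

Repeat for each remaining logical volume, skipping swap, which carries no data worth imaging.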
So you've experienced a catastrophic failure on one of your servers during a datacenter move, or someone accidentally re-initialized your drive array during a night of wild inebriated partying at the KVM console. No problem: you have a bare-metal backup! Boot from the System Rescue CD, re-mount your NFS share, and get to work.
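The first restore step is to put the original partition geometry back onto the blank disk, assuming it was captured with "sfdisk -d" into a file such as the hypothetical sda-parttable.txt used here:

```shell
# Recreate the original partition layout on the replacement disk
sfdisk /dev/sda < /mnt/backup/backuptest1/sda-parttable.txt
```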
You'll want to use the original UUID of each physical volume, detailed in the "pvdisplay" command output from earlier. You'll need to do this for every physical volume. EDITOR'S NOTE -- this may be a redundant step because the next step restores the UUIDs.
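Recreating a physical volume with its original UUID is done with pvcreate; the UUID shown is the one for /dev/sda2 from the sample pvdisplay output above (repeat for /dev/sda3 and /dev/sda5 with their respective UUIDs):

```shell
# Recreate the physical volume on /dev/sda2 with its original UUID
pvcreate --uuid 7tYXil-6peA-y1j1-OHtP-hNfv-dgjg-3sUfh2 /dev/sda2
```

Note that depending on your LVM2 version, pvcreate may also require a --restorefile argument pointing at the vgcfgbackup metadata file when --uuid is used.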
Next, restore the LVM metadata to restore the /dev/mapper entries:
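A sketch of the metadata restore, reactivation, and image restore, assuming the metadata was saved with vgcfgbackup to per-VG files named as below (a naming convention of mine, not from the original text; partimage appends a volume suffix such as .000 to saved images):

```shell
# Restore the volume group metadata saved earlier with vgcfgbackup
for vg in VolGroup00 VolGroup01 VolGroup02; do
    vgcfgrestore -f /mnt/backup/backuptest1/${vg}.cfg ${vg}
done

# Reactivate the volume groups so the /dev/mapper entries reappear
vgchange -ay

# Restore each saved image: /boot first, then the logical volumes
partimage -b restore /dev/sda1 \
    /mnt/backup/backuptest1/backuptest-sda1.img.000
partimage -b restore /dev/mapper/VolGroup00-LogVol00 \
    /mnt/backup/backuptest1/backuptest-VolGroup00-LogVol00.img.000
```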
Since swap was not imaged, recreate the swap logical volume:
sysresccd backuptest1 # mkswap /dev/mapper/VolGroup02-LogVol00
Setting up swapspace version 1, size = 4261408 kB
no label, UUID=f33236e4-cef7-454e-8455-7c04412a87cd
Deactivate LVM prior to rebooting the machine:
sysresccd backuptest1 # vgchange -an
0 logical volume(s) in volume group "VolGroup02" now active
0 logical volume(s) in volume group "VolGroup01" now active
0 logical volume(s) in volume group "VolGroup00" now active
Sync the filesystems:
sysresccd backuptest1 # sync
Reboot the box:
sysresccd backuptest1 # reboot
Broadcast message from root (pts/1) (Thu May 1 15:58:35 2008):
The system is going down for reboot NOW!
Eject the System Rescue CD and reboot as normal. Pat yourself on the back and crack open a frosty one: you've saved the day, supergeek.