Geek Sheet: Bare-metal backup and recovery

Summary: Tired of the same old punditry and OS wars? Want to read something practical you can actually use and apply to your real job?

Tired of the same old punditry and OS wars? Want to read something practical you can actually use and apply to your real job? Or perhaps you need some light reading material to help you get some sleep on the plane between consulting engagements. Either way, welcome to the first in what I hope will be a series of technical HOWTO articles, entitled “Geek Sheets”.

One of my favorite Linux tools and live CD distributions is the System Rescue CD. It allows you to boot any x86, PowerPC, or SPARC-based machine and perform any number of backup and recovery tasks on Linux, Mac, Solaris and Windows systems. The System Rescue CD can even be booted on completely diskless systems from a USB stick or PXE-booted over the network. In another publication, I went into depth on how to use some of the System Rescue CD's basic functions and how to bare-metal image a typical desktop or server Linux or Windows configuration.

However, that article was aimed at an end-user or utility computing scenario with fixed file system configurations on local storage, not at an enterprise server machine. For storage flexibility, many Linux server systems today use LVM, the Logical Volume Manager, rather than the fixed file system partitioning that desktop distributions such as Ubuntu typically use. This adds a layer of abstraction on top of the bare-metal partition layout and introduces some complexity in how these systems can be imaged.

Logical Volume Manager Layout

As a best practice, enterprises should look to SAN-based replication and disaster recovery as their system imaging and snapshot solution. However, we all know that it isn't always cost-effective or practical to have a boot-from-SAN or SAN-replicated server infrastructure, and it may sometimes be desirable – such as during a data center move – to have a bare-metal “snapshot” backup of an entire system on a network file store or a portable storage device, where everything can be quickly restored as it was without some sort of complicated rebuild. I recently had to perform this for an engagement I was working on, and it dawned on me that this procedure, while probably very useful, was not documented anywhere in one place.

The good news is that it can all still be done with standard Linux tools and the System Rescue CD, without any expensive proprietary software. The bad news is that nobody has automated the process. I'm hoping that some enterprising college or high school kid looking for a summer project will take the information I am posting here and create a utility or script for inclusion in the next System Rescue CD build. Can you say “Google Summer of Code”?
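To give that enterprising kid a head start, here is a rough dry-run sketch of what such a driver script might look like. Every name in it (destination directory, device, volume group names) is an assumption taken from the example system later in this article; a real script would discover them from sfdisk, pvscan and a /dev/mapper listing.

```shell
#!/bin/sh
# Dry-run sketch of an automated bare-metal backup driver.
# It only PRINTS the commands it would run; review them, then pipe to 'sh'.
plan_backup() {
    dest=/mnt/backup/backuptest1   # assumed NFS-backed destination directory
    dev=/dev/sda                   # assumed single hardware RAID device
    # 1. save the partition geometry
    echo "sfdisk -d $dev > $dest/backuptest-ptable.sda"
    # 2. save the LVM metadata for each volume group
    for vg in VolGroup00 VolGroup01 VolGroup02; do
        echo "vgcfgbackup -d -v $vg -f $dest/backuptest1-$vg.lvm.backup"
    done
    # 3. image the fixed /boot partition
    echo "partimage -d -b -z0 save ${dev}1 $dest/backuptest-sda1.img"
}
plan_backup
```

The dry-run pattern (emit the commands rather than execute them) makes the script safe to test on a live box before trusting it with real disks.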

I would like to give special thanks to Durham, NC-based Linux sysadmin Mike Brennan for helping me to develop this procedure.

Storing your System Images and creating the NFS mount

If you have a bunch of machines to image, the easiest way is to set up an NFS export on a Linux (or UNIX) box that all your systems can talk to, with sufficient storage allocated to it. As we will be using partimage as the imaging solution, only the used portions of the partitions are stored; even if you have multiple 500GB drives on the source machines that are only 20 percent utilized, only the actual used data will be sent across. This is far more efficient than the Unix/Linux “dd” command, which dumps an entire block device to an image file, including all the zeroed, unused bits. On large file systems this can make a big difference in backup and restore time.
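A quick back-of-envelope check of that claim, using the figures above (500GB drive, 20 percent utilized):

```shell
# Approximate data shipped over the LAN: partimage copies only used blocks,
# dd copies the whole block device. Figures are the article's example numbers.
disk_gb=500
used_pct=20
partimage_gb=$(( disk_gb * used_pct / 100 ))
echo "partimage sends ~${partimage_gb} GB; dd sends ${disk_gb} GB"
```

That is roughly a 5x reduction in transfer and storage before any compression is applied.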

On most Linux systems, NFS is enabled by starting the scripts /etc/init.d/portmap and /etc/init.d/nfsd (or /etc/init.d/nfs-kernel-server). The configuration file /etc/exports is what defines the actual shared directories and sets the network permissions. Here is an example listing of that file that will permit any computer on the network to dump to it:

/home/baremetal *(async,rw,no_root_squash)

This exports the directory “/home/baremetal” to everyone on the network with full read-write permissions. For more information on how to set up an NFS server, consult the Linux NFS HOWTO at SourceForge.

Note that even a Windows system will suffice as an NFS server if you install Services for UNIX and enable the NFS service, which is built into Server 2003 R2 and Server 2008.

Once the NFS server is started, boot the target machine with your copy of the System Rescue CD and issue the following commands:

root@sysresccd % /etc/init.d/nfs restart

root@sysresccd % mount 192.168.1.100:/home/baremetal /mnt/backup

Here “192.168.1.100” is the actual IP address or DNS hostname of the NFS server. Please note that when using an IBM AIX machine as the NFS server, a known bug causes the remote mount to fail unless the client machine is registered in DNS or in the local /etc/hosts of the NFS server.

Once the NFS volume is mounted on /mnt/backup of the System Rescue CD, make a directory for each system you wish to back up with the mkdir command, such as “mkdir /mnt/backup/backuptest1”

Backing up the System

The first thing we need to do is back up the partition geometry of all the storage devices on the system. In my sample case, I have an x86-based Linux system using LVM on hardware RAID5, so the partitions come up under the /dev/sd(x) nomenclature. I only have a single RAID device, /dev/sda. On a software RAID, it will come up as /dev/md(x). Similarly, file systems mounted on a SAN using Host Bus Adapters (HBAs) will also appear as /dev/sd(x).

The System Rescue CD's kernel has pre-compiled modules for the most common Host Bus Adapter and SCSI/SAS/SATA/RAID controller types, including QLogic and Emulex. You can issue a “dmesg > /mnt/backup/backuptest1/dmesg.txt” command to dump the kernel boot messages, which list all the detected devices on the system, to a file. Alternatively, a “df -h” will show the mounted file systems, and “fdisk -l” will list the partitions on each block device the kernel detects. The following command displays the partition map of my local RAID device, /dev/sda:

root@sysresccd /mnt/backup/backuptest1 % sfdisk -d /dev/sda
# partition table of /dev/sda
unit: sectors

/dev/sda1 : start=       63, size=   208782, Id=83, bootable
/dev/sda2 : start=   208845, size=122881185, Id=8e
/dev/sda3 : start=123090030, size=102398310, Id=8e
/dev/sda4 : start=225488340, size= 60998805, Id= 5
/dev/sda5 : start=225488403, size= 60998742, Id=8e

Similarly, the following command will dump the same output to a file:

root@sysresccd /mnt/backup/backuptest1 % sfdisk -d /dev/sda > /mnt/backup/backuptest1/backuptest-ptable.sda

This output file will be needed to restore the original geometry to the blank partitions prior to restoring the LVM metadata and the partition images themselves. You will need to do this for every major block device on the system (/dev/sda, /dev/sdb, /dev/md0, /dev/md1, etc.).
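The per-device dump is easily scripted. A sketch, again as a dry run; the device list here is an assumption, so substitute the block devices your own dmesg output reports:

```shell
# Print one sfdisk dump command per major block device.
# The device list is hard-coded for illustration; a real script would
# enumerate devices from /proc/partitions or the dmesg output saved earlier.
dump_ptables() {
    dest=/mnt/backup/backuptest1
    for dev in /dev/sda /dev/sdb /dev/md0; do
        echo "sfdisk -d $dev > $dest/backuptest-ptable.$(basename "$dev")"
    done
}
dump_ptables   # review the output, then pipe to 'sh' to execute
```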

The following command activates my local Volume Groups:

root@sysresccd % vgchange -ay
File descriptor 4 left open
1 logical volume(s) in volume group "VolGroup02" now active
2 logical volume(s) in volume group "VolGroup01" now active
2 logical volume(s) in volume group "VolGroup00" now active

And the following command will map volume groups to block devices:

root@sysresccd /mnt/backup/backuptest1 % pvscan
File descriptor 4 left open
PV /dev/sda5   VG VolGroup02   lvm2 [29.06 GB / 25.09 GB free]
PV /dev/sda3   VG VolGroup01   lvm2 [48.81 GB / 32.81 GB free]
PV /dev/sda2   VG VolGroup00   lvm2 [58.59 GB / 34.88 GB free]
Total: 3 [136.47 GB] / in use: 3 [136.47 GB] / in no VG: 0 [0 ]

And here is how to show similar information in greater detail:

root@sysresccd /mnt/backup/backuptest1 % pvdisplay
File descriptor 4 left open
  --- Physical volume ---
  PV Name               /dev/sda5
  VG Name               VolGroup02
  PV Size               29.09 GB / not usable 24.54 MB
  Allocatable           yes
  PE Size (KByte)       32768
  Total PE              930
  Free PE               803
  Allocated PE          127
  PV UUID               S7FvV3-9Vea-hq7K-bJgH-bwd0-3S0N-u3NfPi

  --- Physical volume ---
  PV Name               /dev/sda3
  VG Name               VolGroup01
  PV Size               48.83 GB / not usable 15.17 MB
  Allocatable           yes
  PE Size (KByte)       32768
  Total PE              1562
  Free PE               1050
  Allocated PE          512
  PV UUID               7aYX40-I9HJ-2hcS-gBT0-Xajx-1aka-wQtKww

  --- Physical volume ---
  PV Name               /dev/sda2
  VG Name               VolGroup00
  PV Size               58.59 GB / not usable 592.50 KB
  Allocatable           yes
  PE Size (KByte)       32768
  Total PE              1875
  Free PE               1116
  Allocated PE          759
  PV UUID               7tYXil-6peA-y1j1-OHtP-hNfv-dgjg-3sUfh2

You may wish to issue a “pvdisplay > /mnt/backup/backuptest1/pvdisplay.txt” and print this out for every machine being backed up, should you need to refer to this later.

Here is where we get into the real nitty-gritty. On LVM-based systems, the actual Logical Volumes that reside within specific Volume Groups are enumerated in the /dev/mapper directory.

 

root@sysresccd /dev/mapper % ls
VolGroup00-LogVol00  VolGroup01-LogVol02  VolGroup02-LogVol00
VolGroup00-LogVol01  VolGroup01-LogVol03  control

On a stock RHEL machine, the default naming convention for logical volume entries is “VolGroupNN-LogVolNN”; your system may have different names. Each of these entries corresponds to a unique logical volume configuration file as well as a logical volume image file that we are going to store on the remote server. The “control” file is not backed up to an image. Now we issue the commands to back up the LVM metadata for each Volume Group:

 

root@sysresccd /mnt/backup/backuptest1 % vgcfgbackup -d -v VolGroup00 -f /mnt/backup/backuptest1/backuptest1-VolGroup00.lvm.backup

root@sysresccd /mnt/backup/backuptest1 % vgcfgbackup -d -v VolGroup01 -f /mnt/backup/backuptest1/backuptest1-VolGroup01.lvm.backup

root@sysresccd /mnt/backup/backuptest1 % vgcfgbackup -d -v VolGroup02 -f /mnt/backup/backuptest1/backuptest1-VolGroup02.lvm.backup
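The repetitive vgcfgbackup invocations above can be generated with a loop. The volume group names here are hard-coded from this article's example system; a real script could read them from "vgdisplay" output instead:

```shell
# Print one vgcfgbackup command per volume group (dry run).
gen_vgcfg_cmds() {
    for vg in VolGroup00 VolGroup01 VolGroup02; do
        echo "vgcfgbackup -d -v $vg -f /mnt/backup/backuptest1/backuptest1-$vg.lvm.backup"
    done
}
gen_vgcfg_cmds   # review the output, then pipe to 'sh' to execute
```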

With the LVM metadata preserved, we back up each physical partition using the partimage program.

root@sysresccd /mnt/backup/backuptest1 % partimage -d -b -z0 save /dev/sda1 /mnt/backup/backuptest1/backuptest-sda1.img

/dev/sda1 is our /boot partition, which does not reside in a volume group; it's a fixed ext3 file system. The -d flag bypasses the description prompt, -b executes in batch mode with no user intervention, and -z0 specifies no compression. Alternatively, -z1 and -z2 add gzip and bzip2 compression respectively, but they also slow down the backup process considerably.

The second image file to be backed up, the first logical volume, has the addition of the -M flag, which tells partimage not to save the Master Boot Record. This is a precaution more than anything else, so that the only image saved with an MBR is /dev/sda1, the /boot partition.

root@sysresccd /mnt/backup/backuptest1 % partimage -d -M -b -z0 save /dev/mapper/VolGroup00-LogVol00 /mnt/backup/backuptest1/backuptest-VolGroup00-LogVol00.img

Now we do this for the rest of the logical volumes:

root@sysresccd /mnt/backup/backuptest1 % partimage -d -M -b -z0 save /dev/mapper/VolGroup00-LogVol01 /mnt/backup/backuptest1/backuptest-VolGroup00-LogVol01.img

root@sysresccd /mnt/backup/backuptest1 % partimage -d -M -b -z0 save /dev/mapper/VolGroup01-LogVol02 /mnt/backup/backuptest1/backuptest-VolGroup01-LogVol02.img

root@sysresccd /mnt/backup/backuptest1 % partimage -d -M -b -z0 save /dev/mapper/VolGroup01-LogVol03 /mnt/backup/backuptest1/backuptest-VolGroup01-LogVol03.img

root@sysresccd /mnt/backup/backuptest1 % partimage -d -M -b -z0 save /dev/mapper/VolGroup02-LogVol00 /mnt/backup/backuptest1/backuptest-VolGroup02-LogVol00.img

(This final command is not actually necessary: the volume is a swap partition and will be recreated by the LVM metadata restore, so the command results in nothing done.)
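The whole run of partimage saves can likewise be generated from a list of logical volumes, skipping the “control” node. The LV names below are this article's example and the swap LV is left out, per the note above; a real script would read the names from an "ls /dev/mapper" listing:

```shell
# Print one partimage save command per logical volume (dry run),
# skipping the /dev/mapper 'control' node, which is not a real LV.
gen_save_cmds() {
    dest=/mnt/backup/backuptest1
    for lv in VolGroup00-LogVol00 VolGroup00-LogVol01 \
              VolGroup01-LogVol02 VolGroup01-LogVol03 control; do
        case "$lv" in control) continue ;; esac
        echo "partimage -d -M -b -z0 save /dev/mapper/$lv $dest/backuptest-$lv.img"
    done
}
gen_save_cmds   # review the output, then pipe to 'sh' to execute
```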

So here are the files that should be in your backup directory when you are done:

 

root@sysresccd /mnt/backup/backuptest1 % ls
backuptest-ptable.sda                   backuptest1-VolGroup00.lvm.backup
backuptest-sda1.img.000                 backuptest1-VolGroup01.lvm.backup
backuptest-VolGroup00-LogVol00.img.000  backuptest1-VolGroup02.lvm.backup
backuptest-VolGroup00-LogVol01.img.000  backuptest1-pvscan.txt
backuptest-VolGroup01-LogVol02.img.000  backuptest1-vgdisplay.txt
backuptest-VolGroup01-LogVol02.img.001
backuptest-VolGroup01-LogVol03.img.000

Restoring the System

So you've experienced a catastrophic failure on one of your servers during a datacenter move, or someone accidentally re-initialized your drive array during a night of wild, inebriated partying at the KVM console. No problem: you have a bare-metal backup! Boot from the System Rescue CD, re-mount your NFS share, and get to work.

Restore the partition tables:

sysresccd backuptest1 # sfdisk /dev/sda < /mnt/backup/backuptest1/backuptest-ptable.sda

Re-create each Physical Volume:

sysresccd backuptest1 # pvcreate --uuid ASDFASDFASDFASDFASDF /dev/sda2

You'll want to use the original UUID of the physical volume, as detailed in the "pvdisplay" command output from earlier. You'll need to do this for every physical volume. (Editor's note: this may be a redundant step, because the next step restores the UUIDs.)

Next, restore the LVM metadata to re-create the /dev/mapper entries:

sysresccd backuptest1 # vgcfgrestore --file /mnt/backup/backuptest1/backuptest1-VolGroup00.lvm.backup VolGroup00

Restored volume group VolGroup00

sysresccd backuptest1 # vgcfgrestore --file /mnt/backup/backuptest1/backuptest1-VolGroup01.lvm.backup VolGroup01

Restored volume group VolGroup01

sysresccd backuptest1 # vgcfgrestore --file /mnt/backup/backuptest1/backuptest1-VolGroup02.lvm.backup VolGroup02

Restored volume group VolGroup02
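If you have many volume groups, the vgcfgrestore commands can be derived from whatever *.lvm.backup files are present, assuming the "hostname-VGname.lvm.backup" naming convention used in this article:

```shell
# Print one vgcfgrestore command per *.lvm.backup file (dry run).
# The VG name is recovered by stripping the host prefix and the suffix.
gen_restore_cmds() {
    dir=${1:-/mnt/backup/backuptest1}
    for f in "$dir"/*.lvm.backup; do
        [ -e "$f" ] || continue                 # no matches: glob stays literal
        vg=$(basename "$f" .lvm.backup)         # e.g. backuptest1-VolGroup00
        vg=${vg#*-}                             # strip host prefix -> VolGroup00
        echo "vgcfgrestore --file $f $vg"
    done
}
gen_restore_cmds   # with no argument, scans /mnt/backup/backuptest1
```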

Verify that the volume groups have been re-created:

sysresccd backuptest1 # pvscan
PV /dev/sda5   VG VolGroup02   lvm2 [29.06 GB / 25.09 GB free]
PV /dev/sda3   VG VolGroup01   lvm2 [48.81 GB / 32.81 GB free]
PV /dev/sda2   VG VolGroup00   lvm2 [58.59 GB / 34.88 GB free]
Total: 3 [136.47 GB] / in use: 3 [136.47 GB] / in no VG: 0 [0 ]

Activate the volume groups:

sysresccd backuptest1 # vgchange -ay
1 logical volume(s) in volume group "VolGroup02" now active
2 logical volume(s) in volume group "VolGroup01" now active
2 logical volume(s) in volume group "VolGroup00" now active

Restore the Master Boot Record on the first partition:

sysresccd backuptest1 # partimage -e -b restmbr /dev/mapper/VolGroup00-LogVol00 /mnt/backup/backuptest1/backuptest-VolGroup00-LogVol00.img.000

Restore the partition data using partimage for each partition:

root@sysresccd /mnt/backup/backuptest1 % partimage -e -b restore /dev/sda1 /mnt/backup/backuptest1/backuptest-sda1.img.000

sysresccd backuptest1 # partimage -e -b restore /dev/mapper/VolGroup00-LogVol00 /mnt/backup/backuptest1/backuptest-VolGroup00-LogVol00.img.000

sysresccd backuptest1 # partimage -e -b restore /dev/mapper/VolGroup00-LogVol01 /mnt/backup/backuptest1/backuptest-VolGroup00-LogVol01.img.000

sysresccd backuptest1 # partimage -e -b restore /dev/mapper/VolGroup01-LogVol02 /mnt/backup/backuptest1/backuptest-VolGroup01-LogVol02.img.000

sysresccd backuptest1 # partimage -e -b restore /dev/mapper/VolGroup01-LogVol03 /mnt/backup/backuptest1/backuptest-VolGroup01-LogVol03.img.000

Restore the original swap partition:

sysresccd backuptest1 # mkswap /dev/mapper/VolGroup02-LogVol00
Setting up swapspace version 1, size = 4261408 kB
no label, UUID=f33236e4-cef7-454e-8455-7c04412a87cd

Deactivate LVM prior to rebooting the machine:

sysresccd backuptest1 # vgchange -an
0 logical volume(s) in volume group "VolGroup02" now active
0 logical volume(s) in volume group "VolGroup01" now active
0 logical volume(s) in volume group "VolGroup00" now active

Sync the filesystems:

sysresccd backuptest1 # sync

Reboot the box:

sysresccd backuptest1 # reboot

Broadcast message from root (pts/1) (Thu May 1 15:58:35 2008):

The system is going down for reboot NOW!

Eject the System Rescue CD and reboot as normal. Pat yourself on the back and crack open a frosty one; you've saved the day, supergeek.

Got any more ideas for Geek Sheets you'd like to see published? Talk Back and let me know, or reach me through my contact page.

About

Jason Perlow, Sr. Technology Editor at ZDNet, is a technologist with over two decades of experience integrating large heterogeneous multi-vendor computing environments in Fortune 500 companies. Jason is currently a Partner Technology Strategist with Microsoft Corp. His expressed views do not necessarily represent those of his employer.

Talkback

  • rsync? Anyone? Bueler?

    rsync -auvPz /<path to source>/* /path to destination backup/
    D T Schmitz
  • rsync is for directories

    not for system images.
    jperlow
    • Good Lord. Yes.

      Like dd or dd_rescue. Gotcha.
      How does it deal with bad sectors Jason?

      I keep bootable Knoppix on a pen-drive hanging around my neck for drive recoveries. dd_rescue is great for doing backups even over ssh to network drives.
      D T Schmitz
      • comparison with dd

        obviously dd is the lowest level you can get, it simply duplicates a block device, and it is the most foolproof and simple method of cloning a disk since it is all done with a single command. However, the disadvantage is that on large filesystems you are in effect dumping the entire block device including the zero bits, so you then have to go and gzip or bzip2 that image for effective storage on your image dump directory. On a 500GB RAID set that's a lot of bits to send across the LAN. With partimage, it only saves the used bits, and depending on the speed of your network and the speed of your processor you can set the compression level.
        jperlow
  • RE: Geek Sheet: Bare-metal backup and recovery

    I also think rsync is a better solution. When hardware fails you are unlikely to be able to get exact replacements (unless you've squirreled away spares).

    I think rsync's directory-oriented backup is better than partimage's partition-oriented backup when the restore target has changed.

    I can edit /etc/fstab /boot/grub etc. to account for different hardware in the rsync backup prior to the restore.

    --wally.
    wkulecz
    • Rsync can be combined with this solution

      You can combine rsync with system imaging and things like database exports. The point is you want the path of least pain to restoring a system as it was. Image, rsync critical user and data directories, and export your DBs to flat files. Then if your box fails, you restore the image, restore the deltas from an rsync, and then import your databases to the most current version or from incrementals if required.
      jperlow
  • Do what Texstar did with PCLinuxOS

    MKLiveCD to remaster the OS/iso to a Live CD that can be easily automated to install just the way you like it:

    http://www.linux.com/articles/44293
    johnf76@...
  • Hmmm

    Yet, this is more punditry.
    Spiritusindomit@...
  • RE: Geek Sheet: Bare-metal backup and recovery

    I will never need to do a bare metal restore since production servers are virtualized, and the host system can be brought up from scratch very quickly. The vm's are on the san, mirrored to another san.

    Have you ever tried BartPE as a rescue disk?
    erm@...
  • RE: Geek Sheet: Bare-metal backup and recovery

    I prefer Acronis True Image 11 to a "roll your own" solution.

    I think that backups like editors are very much a case of "how comfortable are we with the interface"?

    Acronis will do both Linux and Windows types of partition backups.
    frj111@...
  • RE: Geek Sheet: So how can we print this article as a sheet?

    heck, its called Geek Sheet but I do not see a way to print out all 4 pages as one article ???
    chips@...
    • Agreed ... Print functionality is deficient

      The print button on each displayed page only prints the article content displayed. It should just dump the entire article to the print device.
      David A. Pimentel
      • Or make available as PDF download?

        Especially for good "How To..." articles. You know, like TechRepublic.
        seanferd
        • Print is not working properly... Found way using OS X

          Print function only works for the page selected, so I had to
          make 4 PDF files. Luckly, with Mac OS X pdf application, I
          was able to bundle all 4 in 1.

          I only don`t know if I can share them.
          LucasArruda
          • created a single html page

            I use SeaMonkey, and editing the pages so that all the text and none of the ads, etc are collated wasn't that hard. Took about 3 minutes to do.

            [b]BUT[/b] it is shameful that there is no 'Print Whole Article' button.
            chips@...
  • Download PDF of Article Here:

    I put the article together in a PDF file and it is available for download here: http://cid-f1d12a0e1e2ad2b4.skydrive.live.com/self.aspx/OpenShare/GeekSheetBareMetalBkup.pdf

    I'll leave it posted for two or three weeks unless Jason or ZDNET requests me to take it down for some reason.
    blc1839
  • RE: Geek Sheet: Bare-metal backup and recovery

    Thanks for uploading a pdf. :D
    vhinzsanchez@...
  • Ubuntu's default install uses LVM

    Get your facts straight.
    angrykeyboarder
  • Mondo Rescue vs System Restore CD?

    Has anyone used Mondo Rescue to do a bare-metal recovery? Pros and cons between Mondo and System Rescue CD?
    mariano.lopez@...
  • Bare Metal Online Backup?

    Does anyone know of a Bare Metal backup solution?
    CrashPlan, etc. does not offer this. Can't find an online version.
    Thanks.
    2012WillGO2012