Btrfs hands-on: Exploring the error recovery features of the new Linux file system

Making things right when something goes wrong, and a summary of what I have seen and done in this series.
Written by J.A. Watson, Contributor

This is my final post in this series about the btrfs filesystem. The first in the series covered btrfs basics, the second covered resizing, multiple volumes and devices, the third RAID and redundancy, and the fourth and most recent, subvolumes and snapshots.

I think (and hope) that all of those together give a reasonable overview of what the btrfs filesystem is, what you can do with it, and how you can do some of those things.  In this post I will wrap up a couple of loose ends — error recovery, and integration with other standard Linux utilities — and try to give a recap of the series as a whole. For complete and authoritative information, please refer to the Btrfs Wiki at kernel.org.

OK, let's dive right into it. What happens when something goes wrong with a btrfs filesystem — or maybe you just suspect that something has gone wrong? Well, as I mentioned briefly in the second post in this series, you can use the btrfs scrub utility to read all of the data in the filesystem and verify the checksums as it goes. 


Depending on the size of the filesystem, this can take a considerable time, so by default it will go off to do its work in the background and let you get on with other things on your terminal. Once you have a scrub running in the background, you can check on its progress at any time with the btrfs scrub status command. If it finds any checksum problems, the scrub process will attempt to repair them (unless you started it with the read-only option to suppress this).
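To make that concrete, here is a sketch of a typical scrub session. The mount point /mnt is just a placeholder for wherever your btrfs filesystem is mounted, and these commands generally need to be run as root:

```shell
# Start a scrub; by default it runs in the background
btrfs scrub start /mnt

# Check on its progress at any time
btrfs scrub status /mnt

# Start a read-only scrub that reports checksum errors
# but does not attempt to repair them
btrfs scrub start -r /mnt

# Cancel a running scrub if you need the I/O bandwidth back
btrfs scrub cancel /mnt
```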

If you have a damaged btrfs filesystem that you just want to get as much as possible of the data from, you can use the btrfs restore utility, which reads from an unmounted btrfs filesystem, and restores as much as it can to some specified path.
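As a sketch (the device name and destination path here are placeholders), restore is run against the unmounted device and copies whatever it can salvage to an ordinary directory:

```shell
# Salvage as much as possible from the (unmounted) filesystem
# on /dev/sda4 into /tmp/recovered
btrfs restore /dev/sda4 /tmp/recovered

# -i ignores errors and keeps going; -v lists files as they
# are restored
btrfs restore -iv /dev/sda4 /tmp/recovered
```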

If your btrfs filesystem is made up of multiple devices or partitions, and one of those becomes damaged or unavailable, you can use the btrfs replace command to add a new device, copy the data from the old device to the new (as much as possible), and then remove the old device from the btrfs filesystem. This can be a lengthy process of copying over data, so this command is similar to the scrub command in that it will go into the background to do its work, and there are status and cancel options for this command. 

If the old device is not available (irretrievably damaged or defective), and you have a RAID filesystem, this command will rebuild the new device using the raid data from the other existing devices. In the worst case, if you just want to give up and remove the old device from the btrfs filesystem, you can use the btrfs device delete command.
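A sketch of that workflow, assuming for illustration that /dev/sdb1 is the failing device and /dev/sdc1 is its replacement:

```shell
# Replace /dev/sdb1 with /dev/sdc1 in the filesystem mounted
# on /mnt; the copy runs in the background
btrfs replace start /dev/sdb1 /dev/sdc1 /mnt

# If the old device is gone entirely, refer to it by its devid
# (shown by 'btrfs filesystem show') instead of its path
btrfs replace start 2 /dev/sdc1 /mnt

# Monitor or abort the background copy
btrfs replace status /mnt
btrfs replace cancel /mnt
```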

There are a couple of things to remember here. First, this kind of failure recovery is one of the biggest reasons to use RAID in the first place: if you have a single-partition btrfs filesystem, or any other single-copy configuration, your options for replacing devices and recovering data are rather limited. Second, RAID configurations require some minimum number of devices, and you can't go below that minimum even as part of failure recovery.

Here's a simple example. RAID1 (mirroring) requires at least two devices; you can't have a mirror without something to do the reflecting. So if you have such a minimum configuration, and then one device fails, your data will hopefully still be OK on the second device.

But even though you know that physically you are down to only one device at that point, you still can't just delete the defective device, because that would put you below the logical minimum for a RAID1 filesystem.  So what you have to do is add another device to replace the defective one, while staying within the minimum limits.

Since I am sure that everyone reading this will be inspired to go out and add disks to their computers so that they can use btrfs RAID filesystems, how about if I just run through that entire procedure, up to and including device replacement, as an example? First, we still have the simple one-partition btrfs filesystem that I have used in some of the previous posts, which we can see with the show and df commands:

    # btrfs filesystem show /mnt
        Label: none  uuid: 3b6d6515-a75c-484b-8704-0fc803130140
        Total devices 1 FS bytes used 304.00KiB
        devid    1 size 16.00GiB used 2.04GiB path /dev/sda4
    # btrfs filesystem df /mnt
        Data, single: total=8.00MiB, used=256.00KiB
        System, DUP: total=8.00MiB, used=4.00KiB
        System, single: total=4.00MiB, used=0.00
        Metadata, DUP: total=1.00GiB, used=44.00KiB
        Metadata, single: total=8.00MiB, used=0.00

So it's a 16GB filesystem with "single" Data, meaning there is no duplication of that, and "DUP" System and Metadata, so it is at least keeping a copy of those parts.  Now, on the second drive of this system I have created a new empty partition of the same size, which I can add to this btrfs filesystem:

    # btrfs device add /dev/sdb1 /mnt
    # btrfs filesystem show /mnt
        Label: none  uuid: 3b6d6515-a75c-484b-8704-0fc803130140
        Total devices 2 FS bytes used 304.00KiB
        devid    1 size 16.00GiB used 2.04GiB path /dev/sda4
        devid    2 size 16.00GiB used 0.00 path /dev/sdb1
    # btrfs filesystem df /mnt
        Data, single: total=8.00MiB, used=256.00KiB
        System, DUP: total=8.00MiB, used=4.00KiB
        System, single: total=4.00MiB, used=0.00
        Metadata, DUP: total=1.00GiB, used=44.00KiB
        Metadata, single: total=8.00MiB, used=0.00

Well, that worked, we now have a filesystem with two devices, and a total of 32GB of space, but look at that last bit, from the df command! It's still single Data and DUP Metadata, and since we want a RAID setup, that's not good enough yet. We need to use the btrfs balance command to convert from single/DUP layout to RAID1.

    # btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt
        Done, had to relocate 4 out of 4 chunks
    # btrfs filesystem df /mnt
        Data, RAID1: total=1.00GiB, used=320.00KiB
        System, RAID1: total=32.00MiB, used=4.00KiB
        System, single: total=4.00MiB, used=0.00
        Metadata, RAID1: total=256.00MiB, used=44.00KiB

Ah, that's better: now it is a complete RAID1 filesystem, with Data and Metadata both mirrored on both devices. But then... OH NO!  Disaster strikes! In this case some moron (me) comes along and wipes the filesystem on the original disk drive. The next time I try to mount that filesystem I get:

    # mount /dev/sda4 /mnt
        mount: /dev/sda4 is write-protected, mounting read-only
        mount.nilfs2: Error while mounting /dev/sda4 on /mnt: Invalid argument

YIKES! That's not good. For the sake of brevity, and because the actual range of possible causes and symptoms is quite broad, I am going to skip over identifying which device or filesystem is actually defective; in practice this will normally involve some combination of listening for strange noises coming from the drive, watching for smoke and/or sparks, analysing the system logs, and attempting to mount or status-check the pieces.

In this case I know that it is /dev/sda4 that is defective, because I wiped it out myself (even this is not an unheard-of situation, unfortunately, so don't criticise me too harshly for an unrealistic scenario). Anyway, once I know that /dev/sdb1 should still be intact (at least I hope and pray that it is), I can try to mount it. A normal mount doesn't work, of course, but fortunately there is an option to mount such a "degraded" filesystem, which we need to do in order to repair it:

    # mount /dev/sdb1 /mnt
        mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
        missing codepage or helper program, or other error
    # mount -o degraded /dev/sdb1 /mnt

Whew, ok, that worked. Now I need to replace the defective device, but as I mentioned above, I can't just delete the defective device, even though I know already from the sound, light and smoke coming from the computer cabinet that it is hopelessly gone:

    # btrfs device delete /dev/sda4 /mnt
        ERROR: error removing the device '/dev/sda4' - unable to go below two devices on raid1

Like I said, you're not allowed to break the rules for RAID, even when you know that they have already been effectively broken. 

What we need to do is add another device, to take the place of the defective one, then we can remove the defective device and still meet the minimum RAID requirement:

    # btrfs device add /dev/sda5 /mnt
    # btrfs device delete missing /mnt
    # btrfs filesystem show /mnt
        Label: none  uuid: 181420d6-f545-409e-af86-24d0cc316f79
        Total devices 2 FS bytes used 36.00KiB
        devid    2 size 16.00GiB used 292.00MiB path /dev/sdb1
        devid    3 size 16.00GiB used 288.00MiB path /dev/sda5
    # btrfs filesystem df /mnt
        Data, RAID1: total=1.00GiB, used=320.00KiB
        System, RAID1: total=32.00MiB, used=4.00KiB
        Metadata, RAID1: total=256.00MiB, used=24.00KiB

In the delete command above, missing is a special keyword that tells btrfs to delete whatever device was not found when the filesystem was mounted in degraded mode. So, now we have replaced a defective device in our RAID filesystem, and we have everything duplicated (mirrored) again as it should be. Whew.
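Incidentally, with a reasonably recent btrfs-progs the add-then-delete dance above can be collapsed into a single step with btrfs replace, referring to the vanished device by its devid. A sketch, assuming as in the example above that devid 1 is the missing device and /dev/sda5 is the new partition:

```shell
# Mount the surviving half of the mirror
mount -o degraded /dev/sdb1 /mnt

# Rebuild directly onto the new partition in one step;
# devid 1 is the device that went missing
btrfs replace start 1 /dev/sda5 /mnt
btrfs replace status /mnt
```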

OK, I think that's enough on data recovery — although honestly we have barely scratched the surface of what is possible — and in particular that is enough CLI examples. I want to wrap this up with a couple of words about how btrfs is (or isn't) integrated in and supported by standard utilities.

All of the GUI disk/partition management utilities that I have checked have at least rudimentary support for btrfs. This means that they can create, resize and delete partitions containing simple btrfs filesystems. The degree of support beyond that, however, varies.

This test system is running openSUSE 13.1, which uses YaST2 for system management, including disk partition management, so I might as well start with that. This utility has pretty good btrfs support, starting with this overview of Available storage:

YaST2 System Partition Overview

The most interesting thing to note here is that although it shows both /dev/sda4 and /dev/sdb1 as btrfs filesystems, it doesn't show a mount point for either of them; however, the next btrfs filesystem shown, identified by its UUID, is mounted on /mnt, which is where we have our test btrfs filesystem. If I select that filesystem, I can then check the Used Devices to see that it is made up of the two physical partitions. However, I still can't find any detailed information about the btrfs filesystem itself, such as its RAID level (if any) or its data and metadata duplication.

The other distribution that I am familiar with which includes btrfs support is Fedora. When I set up a similar configuration on that, and then run the Disks utility, it shows me this:

Fedora 20 TC5 disk display with btrfs partitions

Here you can see that it is a btrfs filesystem, and it is mounted on /mnt, but you can't see any details about the structure, or whether the filesystem actually spans other partitions or devices.  If I then view the second disk:

Fedora 20 TC5 Disks display

Hmmm. Well, it shows that /dev/sdb1 is a btrfs filesystem, but it also says that it is not mounted, which is actually wrong. It doesn't seem to understand that this is part of a RAID1 filesystem with /dev/sda4, and that filesystem is in fact currently mounted. Of course, the CLI tools still get it right, and tell the whole story:

    # btrfs filesystem show /mnt
        Label: none  uuid: 8321c748-b0e2-487f-9d9b-e4565e4c2730
        Total devices 2 FS bytes used 624.00KiB
        devid    1 size 32.00GiB used 2.04GiB path /dev/sda4
        devid    2 size 32.00GiB used 2.03GiB path /dev/sdb1
    # btrfs filesystem df /mnt
        Data, RAID1: total=1.00GiB, used=512.00KiB
        System, RAID1: total=32.00MiB, used=16.00KiB
        System, single: total=4.00MiB, used=0.00
        Metadata, RAID1: total=1.00GiB, used=96.00KiB

The utility I personally use most frequently is gparted. As far as I can tell, it doesn't have anything beyond basic btrfs support. The display below, taken from the server used in the examples above, shows and correctly identifies the btrfs filesystems. However, it has no provision to create, manage or display btrfs filesystems that span multiple partitions or devices, nor to show the specifics of btrfs filesystem structure, such as the RAID1 pairing of /dev/sda4 and /dev/sdb1 in our example.

gparted displaying btrfs filesystems

By the way, the screenshot above shows a nice view of the openSUSE 13.1 btrfs installation, with both root and home btrfs filesystems, plus the test filesystem that I created and mounted on /mnt, and then a standard Linux swap partition. There are no ext3/ext4, FAT or other filesystems in this installation; it is pure btrfs.

To summarise this complete series of posts, I would start by saying that the btrfs filesystem is now becoming a realistic and viable alternative. 

You can create, resize and delete at least simple btrfs filesystems either from the CLI or from most GUI disk and partition management utilities. You can also convert existing ext3 and ext4 filesystems to btrfs (but be careful doing this with the root or other system filesystems).
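That conversion is done with the btrfs-convert tool from btrfs-progs. A sketch, with /dev/sda3 standing in for an unmounted ext4 partition (and do make a backup first):

```shell
# Convert an *unmounted* ext4 filesystem to btrfs in place
btrfs-convert /dev/sda3

# If you change your mind before deleting the saved original
# filesystem image, you can roll the conversion back
btrfs-convert -r /dev/sda3
```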

Beyond that, using the CLI tools, you can create filesystems that span multiple partitions, and you can control the internal structure of a btrfs filesystem, specifying anything from the simplest possible single copy of everything (data and metadata) to various RAID formats. Again, you can use the CLI tools to convert existing simple multi-volume or multi-device filesystems to RAID, but in doing so you have to pay attention to the minimum number of devices required by the different RAID configurations.

You can create multiple subvolumes within a btrfs filesystem, and these can be mounted, and snapshots taken, independently of each other. Finally, if you should have a problem of some sort with a btrfs filesystem, there are various recovery and replacement options available to help protect and recover your data.

I'm not advocating using btrfs in important or business-critical applications yet, and I wouldn't do that myself yet, but I have set up several of my personal systems to use btrfs, sometimes only for data partitions but some of them for the root and system partitions as well.  I expect this choice to become more common in the near future, as btrfs development continues and really good GUI utilities add support for it.

And finally: here's a little bonus tidbit.  While reloading the deskside system with Fedora 20 final TC5, I noticed that when I selected BTRFS for the default configuration, I then also had the option of selecting a RAID LEVEL. I didn't try it at that time, but I assume that this means I could load Fedora with a RAID0/1/10 root filesystem.  That would be quite nice.
