Btrfs hands on: An extremely cool file system

Summary: Resizing, Adding Partitions, Adding Disk Drives, Snapshots - all while the btrfs filesystem is still mounted!

In my first post on this subject, Btrfs basics, I discussed how to create a simple btrfs filesystem, or a complete btrfs Linux system. 

The information and examples in that post are going to be important because I am going to use them as the basis for what follows.

One subject that I did not discuss in the previous post was disk partitioning. Hopefully the basic concepts are familiar: a physical disk drive can be logically divided into partitions. 

For Linux users, the most common use of this partitioning is to load multiple operating systems on a single disk drive (such as Windows and Linux), or perhaps to segregate files into different groups, such as operating system files, boot files, user data files and such. 

The important characteristic of partitioning for this discussion is that the disk controller and driver enforce the partition boundaries. More advanced users or system administrators might have dealt with unpartitioned disk drives, usually in the context of a RAID system where the entire disk (multiple disks, actually) is given to the RAID controller for management. As we will see below, btrfs filesystems introduce some interesting new twists on this concept.

So, let's jump right in with a partitioning example (issue). In the previous post I gave this example of creating a btrfs filesystem:

    mkfs.btrfs /dev/sda16

That command assumes that the partition /dev/sda16 already exists, but then I quickly skirted that issue by saying that you could do the same thing with gparted — the important difference is that gparted will create the partition for you, and then create the btrfs filesystem within that partition. 

Either way, the result of the above command is a btrfs filesystem which fills the specified partition. In the case of my example, I had created a 16GB partition, so I then got a 16GB btrfs filesystem. Nice and easy. I can check that with the btrfs utility program:

    btrfs filesystem show /dev/sda16

        Label: 'btest'  uuid: a8a0ea98-5746-4d34-92b7-cfd447af9ddf

        Total devices 1 FS bytes used 28.00KiB

        devid    1 size 16.00GiB used 2.04GiB path /dev/sda16
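
For scripting, the per-device lines of that output can be picked apart with awk. What follows is only a sketch: list_devices is a made-up helper name, and it assumes the "devid ... size ... used ... path ..." layout shown above, which may differ between versions of the btrfs tools.

```shell
# list_devices: read `btrfs filesystem show` output on stdin and print
# one "path size used" line per device. Hypothetical helper; it relies on
# the "devid N size S used U path P" layout shown in this article.
list_devices() {
    awk '$1 == "devid" {
        for (i = 2; i < NF; i++) {
            if ($i == "size") size = $(i + 1)
            if ($i == "used") used = $(i + 1)
            if ($i == "path") path = $(i + 1)
        }
        print path, size, used
    }'
}

# Example using the output captured above. On a live system (as root):
#   btrfs filesystem show /dev/sda16 | list_devices
list_devices <<'EOF'
Label: 'btest'  uuid: a8a0ea98-5746-4d34-92b7-cfd447af9ddf
        Total devices 1 FS bytes used 28.00KiB
        devid    1 size 16.00GiB used 2.04GiB path /dev/sda16
EOF
# prints: /dev/sda16 16.00GiB 2.04GiB
```

Keying on the field labels rather than on fixed column positions keeps the helper working if extra fields appear between them.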

Now, one of the really interesting things about btrfs filesystems is that you can resize them "on the fly" — while they are still mounted. 

That is just wonderful — how many times over the years have I had a filesystem fill up, and I had to create a whole new filesystem and then copy everything over — or worse yet, copy everything from the full filesystem to tape, then delete that filesystem and recreate it larger, then copy it all back again. Ugh. So the ability to do this without disturbing a running system makes me smile. A lot. 

The command for this is btrfs filesystem resize <size> <fs>. The size value may be given in absolute terms, with human-friendly suffixes (K for kilobytes, M for megabytes, G for gigabytes), and <fs> is where the filesystem is mounted. To expand our filesystem to 20 gigabytes, with it mounted on /mnt, the command would be:

    btrfs filesystem resize 20G /mnt

        Resize '/mnt' of '20G'

        ERROR: unable to resize '/mnt' — File too large

Whoops. "File too large" — you didn't think I included that whole discussion above about partitioning for nothing, did you?  The partition that I created (well, gparted created) is 16GB, so I can't tell btrfs to make the filesystem any larger than that — in logical terms, my "disk" is full.  I could take the easy way out here, and go back to gparted to increase the size, of course. But that is not the point right now. So rather than increase the filesystem size, we will start by decreasing it:

    btrfs filesystem resize -4G /mnt

        Resize '/mnt' of '-4G'

This shows a different notation: instead of giving the absolute size, I have told it to reduce the size by 4GB.  That looks more promising, and we can check that it really worked. (Note: the actual resizing of the filesystem takes a little time, depending on how fast or slow your system is, so don't be surprised if the result doesn't show up instantly.)

    btrfs filesystem show /dev/sda16

        Label: 'btest'  uuid: a8a0ea98-5746-4d34-92b7-cfd447af9ddf

        Total devices 1 FS bytes used 284.00KiB

        devid    1 size 12.00GiB used 2.04GiB path /dev/sda16

Well, that is just extremely cool!  It is a generally useless example, since shrinking a filesystem is not something you want or need to do very often, but wow, it did it while the filesystem was mounted!

Now that we have (artificially) created some free space within the partition, we can look at the other, more useful example — increasing the size of a filesystem. Again, the size can be given either as an increment (with "+" before the number), or as an absolute size:

    btrfs filesystem resize 14G /mnt

        Resize '/mnt' of '14G'

Finally, before this bit gets too boring, there is one more key word that can be used, to expand the filesystem to fill whatever its boundaries are:

    btrfs filesystem resize max /mnt

        Resize '/mnt' of 'max'

    btrfs filesystem show /dev/sda16

        Label: 'btest'  uuid: a8a0ea98-5746-4d34-92b7-cfd447af9ddf

        Total devices 1 FS bytes used 284.00KiB

        devid    1 size 16.00GiB used 2.04GiB path /dev/sda16

Hooray!  If I had been able to do this many years ago, it would have saved me a lot of long nights and weekends moving files around and changing filesystems.
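
To keep the three notations straight (absolute size, "+" or "-" increment, and max), here is a small model of the arithmetic in plain shell. It is a sketch of the behavior described in this post, not of what btrfs does internally: to_bytes and resize_target are hypothetical names, only whole numbers with K, M or G suffixes are handled, and the only check is the partition boundary that tripped me up above.

```shell
# to_bytes: convert a size such as 20G, 512M, 64K or a plain byte count
# to bytes, using powers of 1024.
to_bytes() {
    case $1 in
        *[Kk]) echo $(( ${1%?} * 1024 )) ;;
        *[Mm]) echo $(( ${1%?} * 1024 * 1024 )) ;;
        *[Gg]) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
        *)     echo $(( $1 )) ;;
    esac
}

# resize_target CURRENT LIMIT ARG
# CURRENT is the filesystem size, LIMIT the size of the enclosing partition.
# ARG is a resize argument: an absolute size, +SIZE, -SIZE, or "max".
# Prints the resulting size in bytes, or fails if it would exceed LIMIT.
resize_target() {
    cur=$(to_bytes "$1")
    limit=$(to_bytes "$2")
    case $3 in
        max) new=$limit ;;
        +*)  new=$(( cur + $(to_bytes "${3#+}") )) ;;
        -*)  new=$(( cur - $(to_bytes "${3#-}") )) ;;
        *)   new=$(to_bytes "$3") ;;
    esac
    if [ "$new" -gt "$limit" ]; then
        echo "cannot resize to $new bytes: partition holds only $limit" >&2
        return 1
    fi
    echo "$new"
}

resize_target 16G 16G -4G    # prints 12884901888 (12 GiB)
resize_target 12G 16G 14G    # prints 15032385536 (14 GiB)
resize_target 14G 16G max    # prints 17179869184 (16 GiB)
resize_target 16G 16G 20G || echo "refused, like the first attempt above"
```

The four calls at the end replay the sequence from this post: the shrink by 4G, the grow to 14G, the grow to max, and the attempt to go past the 16GB partition.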

Oh, and there is one other way to keep this from getting boring... some of the documentation I read mentioned that if you needed to expand a filesystem in a partition that was already full, one option was to use fdisk to delete the partition and then create a new, larger one, being sure to use the same starting cylinder.

Well, let me tell you, if you are brave enough to delete and recreate a partition around a live filesystem, then ZOWIE, my hat is off to you!  Personally, I'll stick with gparted and its equivalents for that, thank you...

Ok, so now I can change filesystem sizes, within the bounds of the partition or disk drive that contains them.  Hmm... but that last bit can turn out to be a problem, because what if there is no more space available on the disk drive, or no adjacent space to expand the partition?  This is where the btrfs ability to span partitions or even span devices is invaluable. (Note: at this point I am still only discussing simple filesystems, I am not yet going to address RAID capabilities.)

For purposes of this illustration, I have created a new unformatted partition (/dev/sda17).  In the real world, this new partition is most likely to be on a different disk drive, but the key here is that as far as btrfs is concerned, it doesn't care where it is, it's just another partition. 

To add the new partition to the existing btrfs filesystem, I just use the btrfs command again.  To do this, the original btrfs filesystem has to be mounted, and you have to give the device name of the partition to be added, followed by the mount point of the original filesystem, like this:

    btrfs device add /dev/sda17 /mnt

This command produces no output — old Unix/Linux hands will be comfortable with the "no news is good news" philosophy, but if you want to be sure that it worked, you can check it again:

    btrfs filesystem show /dev/sda16

        Label: 'btest'  uuid: a8a0ea98-5746-4d34-92b7-cfd447af9ddf

        Total devices 2 FS bytes used 284.00KiB

        devid    2 size 16.00GiB used 0.00 path /dev/sda17

        devid    1 size 16.00GiB used 2.04GiB path /dev/sda16

Hey, cool, there it is, it worked!  Two devices listed, each with 16GB capacity.  Another way to check this would be to look at the total size of the mounted filesystem — remember, we originally created it as one 16GB partition:

    df -h /mnt

        Filesystem      Size  Used Avail Use% Mounted on

        /dev/sda16       32G  312K   30G   1% /mnt

Yes indeed, it is now 32GB! Very nice. Within that filesystem mounted on /mnt there is no distinction between the two devices, and we can use it completely normally.  The operating system will use the space from both parts as necessary. We can check the distribution of data between the two parts with the btrfs filesystem show command above. One other interesting note here: if I had done this because the existing filesystem was full, then after adding the new device I could redistribute the data evenly across the two partitions like this:

    btrfs filesystem balance /mnt

This is a one-time command: it balances the current content across all of the devices within the filesystem. Once it is done, subsequent data is distributed normally again, without necessarily maintaining the balanced state.
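
Adding a device and rebalancing go together often enough that they can be wrapped in one small function. The sketch below is an illustration only: grow_onto and the DRY_RUN variable are made-up names, and the balance step uses the spelling btrfs balance start, which, as far as I know, is how newer releases of the btrfs tools name the balance command used above. Setting DRY_RUN=1 prints the commands instead of running them, which is a sensible first step before touching a real filesystem.

```shell
# grow_onto DEVICE MOUNTPOINT
# Add DEVICE to the btrfs filesystem mounted at MOUNTPOINT, then balance
# the existing data across all devices. The real commands need root.
# Set DRY_RUN=1 to print the commands instead of running them.
grow_onto() {
    dev=$1
    mnt=$2
    if [ "${DRY_RUN:-0}" = 1 ]; then
        run=echo
    else
        run=
        # Refuse to continue unless DEVICE really is a block device.
        [ -b "$dev" ] || { echo "$dev is not a block device" >&2; return 1; }
    fi
    $run btrfs device add "$dev" "$mnt" &&
    $run btrfs balance start "$mnt"
}

# Preview what would be done for the example in this post:
DRY_RUN=1 grow_onto /dev/sda17 /mnt
```

Because the two steps are chained with &&, the balance is only attempted if the device was added successfully.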

Ok, I would like to cover just a couple more housekeeping commands before wrapping up.  First, if you are dealing with relatively large files and you want to clean up whatever fragmentation might have crept into a file over time, you can use the command:

    btrfs filesystem defragment <filename>

Oh my. Just think about that, defragmenting a single, specific file. To be honest, whenever someone starts talking about defragmenting, I immediately think about the old Dilbert comic strip, where Dogbert is working in Customer Support, and he says to someone on the phone: "Well, I could give you some false hope and tell you to try defragmenting your disk". 

But in this case, when I can specify a particular file which I know is large and would benefit, this could be very good indeed.  There are a number of options to this command, so you can specify file and fragment sizes, but the best one is — are you ready for this — you can also tell btrfs to turn on compression of the file contents as it defragments it! 

So, I have a large file, which has become scattered on the disk over time, and now I have a command which will gather the pieces and make them contiguous again, and at the same time it will compress the contents so that I can recover disk space at the same time?  I think I must have died and gone to heaven...
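
As a sketch of how that might be applied to more than one file, the function below uses find to pick out large files and runs the defragment command with the -c option (here -czlib, to request zlib compression) on each of them. The name defrag_large and the RUNNER variable are made up for this illustration, and the size threshold is whatever you consider "large". Setting RUNNER=echo prints the commands without running them.

```shell
# defrag_large DIR MIN_BYTES
# Defragment and compress every regular file under DIR that is larger
# than MIN_BYTES, without crossing into other mounted filesystems.
# Set RUNNER=echo to preview the commands instead of running them.
defrag_large() {
    find "$1" -xdev -type f -size +"$2"c \
        -exec ${RUNNER:-} btrfs filesystem defragment -czlib {} \;
}

# Preview for files over 1 GiB on the example filesystem:
#   RUNNER=echo defrag_large /mnt 1073741824
# Really do it (as root):
#   defrag_large /mnt 1073741824
```

The size is given to find in bytes (the "c" suffix) so that the comparison is exact rather than rounded to blocks.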

I haven't mentioned it until now, but another very important characteristic of btrfs is that it keeps checksums on data (and metadata, such as the directory structures).  These can help in identifying and possibly recovering corrupted data. Those of us who remember "alternate super blocks" in Unix filesystems might turn a bit green when we first learn about this. 

The btrfs utility includes a feature to use these checksums to verify data integrity, either of an entire filesystem or of individual devices or partitions within the filesystem:

    btrfs scrub start <path|device>

In its simplest form, this starts a scrub on the specified mount point or device, and the scrub will run in the background so that it doesn't tie up your terminal for what could be a rather long time.  What it actually does is read all of the data and compare the checksums to validate it.  If it finds an error it will attempt to fix it. In this simplest case, when it is run in the background, you will have to use the status command to get the results:

    btrfs scrub status <path|device>

If the scrub is still running, this will tell you what is happening; if it has finished it will give you the results.  Hmmm. Maybe this sounds a bit confusing, so a real example might help. On my completely trivial btrfs filesystem, it looks like this:

    # btrfs scrub start /mnt

    scrub started on /mnt, fsid a8a0ea98-5746-4d34-92b7-cfd447af9ddf (pid=3414)

Some time later:

    # btrfs scrub status /mnt

    scrub status for a8a0ea98-5746-4d34-92b7-cfd447af9ddf

        scrub started at Fri Nov 29 09:58:44 2013 and finished after 1 seconds

        total bytes scrubbed: 312.00KiB with 0 errors

There are options for this command to set the priority it runs at, keep it from going into the background, make it more or less verbose, and even disable repair and simply report errors. So nice.
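
Two of those options make a scrub easy to script: -B keeps it in the foreground, and -r makes it read-only, so errors are reported but not repaired. The sketch below combines them with a tiny parser for the status output. The names scrub_errors and check_scrub are made up, and the parser assumes the "with N errors" wording shown in my example above, which other versions of the tools may not use.

```shell
# scrub_errors: read `btrfs scrub status` output on stdin and print the
# error count. Assumes the "... with N errors" wording shown above.
scrub_errors() {
    sed -n 's/.*with \([0-9][0-9]*\) errors.*/\1/p'
}

# check_scrub MOUNTPOINT (run as root)
# Runs a foreground, read-only scrub and succeeds only if the status
# output reports 0 errors. If the output format is not recognized, the
# count is empty and the check fails, which is the cautious choice.
check_scrub() {
    btrfs scrub start -B -r "$1" >/dev/null || return 1
    errors=$(btrfs scrub status "$1" | scrub_errors)
    [ "$errors" = 0 ]
}

# Usage, for example from a nightly job:
#   check_scrub /mnt || echo "scrub reported problems on /mnt"
```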

Ok, enough is enough.  This has been a lot of information, and my fingers are tired.  For the next post I will drag out an old deskside server which has two disk drives, and things will get even more interesting, with RAID, subvolumes and snapshots.  I hope it is becoming clear why btrfs is such an interesting and important development in Linux.

J.A. Watson

About J.A. Watson

I started working with what we called "analog computers" in aircraft maintenance with the United States Air Force in 1970. After finishing military service and returning to university, I was introduced to microprocessors and machine language programming on Intel 4040 processors. After that I also worked on, operated and programmed Digital Equipment Corporation PDP-8, PDP-11 (/45 and /70) and VAX minicomputers. I was involved with the first wave of Unix-based microcomputers, in the early '80s. I have been working in software development, operation, installation and support since then.

Talkback

21 comments
  • LVM anyone?

    I recall being able to do much of what is being touted as new in the LVM like 10 years ago. Sistina, which was acquired by RedHat, wrote LVM 1 and 2, plus GFS (really good until clvmd and redhat cluster were intertwined). LVM allowed for adding disks, expanding volumes, migrating from disk to disk, ...etc ALL online.

    Sure, Btrfs is combining the filesystem and the LVM functionality into one, and adding some of its own, but really much of it could be done with tools that have been there for a long time.
    sys_engineer
    • LVM2/GFS2, even with clvmd and a cluster, can do most of these things

      If not all. Been using them for years, nothing beats LVM and GFS2 for large scale clusters.
      anothercanuck
    • Very True

      Yes, LVM was probably the earliest really flexible solution to a lot of these problems - especially having filesystems span partitions and devices, and changing size and structure of mounted filesystems. I first started working with it in AIX, I can't remember exactly when but it must have been around 2000 sometime. But btrfs adds a number of new features and capabilities, and I find it to be significantly easier to use. Also, coming in the next post, btrfs has RAID capability, I don't recall that LVM had that - but I haven't worked with it in four or five years.

      Thanks for reading and commenting.

      jw
      j.a.watson@...
    • Disk error?

      It has been a few years since I have done anything with LVM, but I remember one potential downside was that if one of multiple physical disks in a volume failed, it could put data in all the volume at risk, even if on another disk. Is btrfs more resistant to this problem?
      timdor
      • btrfs ...

        It certainly should be more resistant to single disk failure because the RAID function is built in to the format rather than being an add on layer as it is with LVM. With btrfs you simply create an array of any number of drives or partitions and set the RAID level independently for data and metadata. If you set those to RAID 1 btrfs will make sure that all data and metadata will be written to two locations with each location being on a separate drive. If any one of the drives in the array fails, simply issuing a command to remove the drive results in all data previously residing on that drive being copied to the remaining drives with the stipulation that it is not copied to the drive holding its mirror image. Once the mirroring is restored the failed drive is deleted from the array. My own RAID 1 arrays are bridged across five drives, thus I should be able to retain mirroring as long as I retain sufficient free space, without having to replace the defective drive(s). This is where having an integrated management capability pays dividends as opposed to dealing with an LVM layer, a RAID layer, and a filesystem layer. And all this gets achieved with one CLI command and remarkably few formats and options. Additionally if the drive failure takes place at boot, the whole array locks down to read only just in case one of the drives has been disconnected during the downtime which means nothing gets out of sync on the boot and the system can simply be shut down again, the failing drive identified and reconnected and immediately restored with the following reboot. Certainly not all of this works perfectly yet, there is still a lot of bug swatting going on, but enough of it does work that it is already pretty impressive.
        George Mitchell
        • Thanks

          Great explanation! I will definitely have to begin experimenting with birds.
          timdor
          • btrfs

            Stupid autocomplete. I have no desire to experiment with birds.
            timdor
  • Article: "defragmenting a single, specific file"

    I searched and found the Dilbert cartoon from May, 2005. LOL!

    FYI. A Sysinternals tool, Contig, allows one to defragment a single file from Windows XP forward. Here's a link to Contig:

    http://technet.microsoft.com/en-us/sysinternals/bb897428.aspx

    Like btrfs, Contig uses the CLI. Enjoy ...
    Rabid Howler Monkey
    • CLI

      Only a problem for those too lazy or technically handicapped to use the CLI. A lot of things are far easier to do by using the CLI than with a GUI. Some tasks are appropriate for GUI, others for CLI. They are two different tools for two different approaches. Just because the GUI came later doesn't mean it is better suited for every task. At first I was very put off by lack of GUI access to btrfs, but when I started working with it I found that CLI access is very straight forward. I used both GUI and CLI in administering 3ware RAID and GUI works better for some tasks while CLI works better for others. There are indeed some btrfs tasks that are better suited to GUI and that will come along soon enough. In the mean time it continues to be developed by Oracle with many more features likely to be added over the coming years before it is eventually finalized.
      George Mitchell
    • Brain Clutter

      Ack. May 2005? It's kind of scary that something like that is still cluttering my brain after more than 8 years.

      Thanks for reading and commenting.

      jw
      j.a.watson@...
  • It's a great filesystem ...

    I switched from 3ware RAID to btrfs RAID in April and have found it far easier to live with. I hated the occasional resyncs with hardware RAID and it is a big relief not to have to deal with them any more. Having all the features accessible through a common toolset also makes things much easier. Coming up will be things like the ability to assign RAID levels by file and a lot of other really cool stuff.
    George Mitchell
    • Redundant Array of Inexpensive Disks

      There have been a number of variations on what the RAID acronym stands for - in particular, the "I" is variously said to be "Independent", "Interconnected", "Inexpensive" and probably others that I am not thinking of right now. When I first learned about RAID it was "Inexpensive", and that has stuck in my brain ever since (see comment above re: Dilbert), so it always seemed incongruous to base a RAID system on a very expensive hardware controller. I also worked with a couple of different hardware-based RAID system from DEC over the years, and managing those was nothing short of a nightmare. I still shudder.

      Thanks for reading and commenting.

      jw
      j.a.watson@...
  • still

    I will stick with ZFS. It is much more cross platform, much more tested and stable and is natively supported in several OSes as well.

    It also scales (in real life) to huge dimensions.
    danbi
    • needs a lot of RAM though

      I am wondering if btrfs is also quite memory hungry?
      The license of zfs is gpl-incompatible, so its future as a kernel module remains gloomy, taking away its popularity.
      What's good for *BSD, Oracle and Apple might be detrimental for GNU and Linux. BTW, GPL is so good that even Oracle where btrfs project had begun couldn't routinely poison it.
      eulampius
      • Regardless of its limitations ...

        ZFS is a VERY good filesystem and has some features that are lightyears ahead of btrfs. HOPEFULLY, btrfs will eventually include at least some of them. I think Oracle is just covering its bases by remaining committed to btrfs development. They may also hold some ownership rights in the project that will allow them to merge certain btrfs capabilities into ZFS along the way, since GPL does not restrict their rights to anything that they also hold copyright on. But a number of other players are contributing to btrfs as well. I personally think that both filesystems have a great future. And Oracle has a great advantage by holding at least some degree of control over BOTH of the leading next generation filesystems. It's called "access to prior art" which is very useful in the courtroom these days when IP altercations come around.
        George Mitchell
  • The future belongs to file system RAID ...

    The hardware RAID industry faces a huge challenge posed by ever growing hard drive capacity. One can only imagine how long resyncs will take when multi terabyte drives become common. And as drive sizes increase, the risk of hardware array fatalities increases exponentially. Although RAID was never intended to replace backups, it was intended to maintain uptime. The longer a resync takes, the more danger there is of a second error that can take down the whole system. UNLESS hardware RAID systems become a whole lot smarter, they are going to begin to fade in popularity soon. And even assuming they achieve that, in the end that RAID card is just one more potential, and now unnecessary, point of failure. The future belongs to file based RAID systems like ZFS, BtrFS, and whatever MS and others will come up with to stake their claim to that niche. MS is already moving in that direction with their latest filesystem. All of these nextgen filesystems amount to ONE intelligent integrated system to manage hard drive storage, both conventional and solid state. One system to manage with lots of scalability and granularity and minimal points of failure.
    George Mitchell
    • the ntfs nightmare

      >> "MS is already moving in that direction with their latest filesystem"
      Well, where and how Microsoft is moving is and always will be a mystery. Had a friend to lose his entire ntfs partition by some magic with an absolutely healthy hdd and a new system. Robustness, stability and ease of troubleshooting are among the many places where MS and its proprietary friends have always been floundering.
      eulampius
      • Where MS is moving ...

        can usually be summed up in two words: predatory and opportunistic. But lately that strategy has seemingly produced mediocre results. As one who has seen multiple desktop worlds (Unix, Linux, Windows) I have often felt that the only thing that keeps MS going is its embedded base of applications and the dearth of key proprietary applications for Linux/Unix environments. If that stranglehold on apps EVER seriously breaks, I suspect MS is going to have a real horserace on their hands for the first time. Until then it's a Windows world, no matter what shortcomings Windows might have. But seriously, at least in this case MS is recognizing that conventional block level RAID is at end of life and attempting to develop their own product to replace it. Of course that does not necessarily mean they can successfully execute that maneuver, but at least they are trying (or at least pretending to try). They are in a tough spot right now competitively. They have all of these Lilliputians sniping away at them and all it would take is for one unlikely success and they would truly be in a world of hurt. I am not sure they could withstand another Android experience, even though they probably are making more off of Android than Google is.
        George Mitchell
      • NTFS is garbage

        eulampius:

        "Had a friend to lose his entire ntfs partition by some magic with an absolutely healthy hdd and a new system."

        I can't count how many times I've seen NTFS become corrupted in production use, and poof!, data has vanished. It's happened right in front of me on everything from Windows file servers, to basic Windows 7/XP workstations. Yes, NTFS has been revised over the years, but I trust the Linux ext3/4 and xfs filesystems over NTFS any day, for valuable data, where corruption cannot happen due to software problems. Hardware is out of the scope here. And yes, the solution is to have good and reliable backups. But with corrupted filesystems, there is downtime, and that can cost businesses lots of money.
        Chris_Clay
  • Very nice, but ready for production?

    JW, thanks as always for the great and thorough first hand review. This definitely seems like the wave of the future, it will be interesting to see which distributions jump on board with this filesystem as the default. It will especially be interesting to see if Btrfs is adopted more for the consumer end and/or for the enterprise (as some mentioned, where LVM has been for years), and also when it will be ready for heavy production use to fit in with the established filesystems like ext3, ext4, xfs, etc. It sure is not lacking any features, that is for sure.

    The other side is when the GUI tools will be updated to take advantage of these features, probably coming down the pipe I bet.
    Chris_Clay