Resurrection: Mac/ZFS returns!
Summary: ZDNet EXCLUSIVE: The Mac gets a 21st century file system. More good news: it's free!
Not from Apple - who promised it 4 years ago - but from Greenbytes, an east coast data and cloud storage company. Greenbytes bought Tens Complement, the company that released a version of the Sun/Oracle ZFS file system for Mac last year.
But Tens Complement couldn't scale its business, so founder Don Brady - who was Apple's lead engineer on the original ZFS port - sold out to Greenbytes. That was the end of Mac ZFS - or so I thought.
What you need to know: Now christened "Zevo Community Edition", it is coming out on Saturday the 15th. It works with Snow Leopard, Lion and Mountain Lion, 64 bit only. 4GB RAM is required, with 8GB recommended.
Has much of the original ZFS goodness, including snapshots, quotas, data scrubbing, mirrors & RAID Z - up to 3x failure protection - all heavy duty data protection features that the antiquated HFS+ never dreamed of. And, of course, the proven data integrity features of ZFS are standard.
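For readers curious how parity-based protection can survive a lost drive, here is a minimal Python sketch of single-parity XOR reconstruction. It is a simplification for illustration only, not Zevo's or ZFS's actual on-disk layout, and the function names are made up for this example.

```python
from functools import reduce
from operator import xor

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*blocks))

def make_parity(data_blocks):
    """Compute one parity block for a stripe of data blocks."""
    return xor_blocks(data_blocks)

def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])

# A stripe spread over three hypothetical data drives plus one parity drive.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(stripe)

# Pretend the second drive died; rebuild its block from what is left.
recovered = rebuild_missing([stripe[0], stripe[2]], parity)
```

Surviving two or three simultaneous failures needs more parity blocks and more involved arithmetic than plain XOR, which this sketch does not attempt.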
But this release isn't for everyone. It requires Terminal to set up - no fancy GUI. ZFS features such as dedup and hot spares aren't supported yet. Most important: you can't boot your system from it.
But hey! It's free! And it's the 1st release - not the last.
The Storage Bits take
Think of this release as the storage enthusiast version. It's for people who have multi-drive systems, are comfortable using Disk Utility and, most importantly, care about data integrity and protection.
Which leaves most consumers out.
Every time I write about data corruption some claim - perhaps rightly - that they've never seen HFS+ data corruption. But the kicker is that unless you are a data-intensive power user, you'll probably never realize that the program that wouldn't open, or the photo that wasn't recognized, are symptoms of data corruption.
There's no big red "DATA CORRUPTION" flag that comes up - and Apple likes it that way. And why not? Power users are a shrinking piece of the user population. Why engineer advanced features for people who'll never know?
But Zevo is great news for pro users - creatives, scientists, lawyers, doctors, businesses - who need enterprise class data integrity on their Macs. Greenbytes plans to continue development of Zevo with more resources, so watch this space for future news.
Comments welcome, of course. Here's a link to the Zevo homepage.
Talkback
If They Required Data Integrity, They Wouldn't Be Using Macs
My fix? Ran the Apple Disk Utility and told it to fix permissions. That was all. As a longtime Linux user, I've NEVER come across a situation where protections on important system files or directories could be randomly changed to something unusable like this. Apple has taken the legendary Unix concept, and made a complete hash of it.
Filesystem Is Probably Mac's Biggest Weakness
A Linux user who knows little about OS X
The operating system does not change permissions randomly at all. But permissions can be changed by badly written installers that are given admin privileges.
This sort of thing can happen on any operating system, if a third-party installer is badly written, and changes permissions but doesn't change them back.
To Elaborate
Here are some of the issues:
HFS+ is not POSIX compliant. It has no support for hard links. In order to support hard links for OS X (which needs them), an ugly workaround has been put in place in which information about hard links is put into a hidden directory on the drive. This directory can become unwieldy, and makes for a greater risk of filesystem corruption.
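For anyone unsure what a hard link gives you at the POSIX level, here is a small Python sketch. It only shows the behavior an application sees (two names, one file), not how HFS+ emulates it internally; the file names are placeholders.

```python
import os
import tempfile

def hard_link_demo():
    """Create a file, hard-link it, delete the first name, read via the second."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.txt")
        alias = os.path.join(tmp, "alias.txt")
        with open(original, "w") as f:
            f.write("hello")

        os.link(original, alias)  # second directory entry for the same file

        same_inode = os.stat(original).st_ino == os.stat(alias).st_ino
        link_count = os.stat(original).st_nlink

        os.remove(original)       # the data survives under the other name
        with open(alias) as f:
            survived = f.read()
        return same_inode, link_count, survived
```

Calling `hard_link_demo()` on a file system with hard link support should report that both names share one inode and that the contents outlive the first name.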
HFS+ does not support relative paths or working directories. Every reference to a file handed to the filesystem has to be the full path (or else the CNID, perhaps). Any support for working directories or relative paths has to be included in the operating system or application.
HFS+ uses colons for file path component separators. Again this has to be worked around to make the normal Unix forward slash appear to work as a separator. It also has to be worked around to make colons possible as characters in a file name. This juggling of colons and slashes is a complication that can easily lead to problems when switching between various methods of accessing files in different programs or at the command line.
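A rough Python sketch of the colon/slash juggling, assuming the simple rule that the two characters trade places inside each file name. The function names are hypothetical and volume names are ignored, so treat it as an illustration rather than what OS X actually runs.

```python
# Swap table: ':' becomes '/' and '/' becomes ':' inside a single name.
_SWAP = str.maketrans(":/", "/:")

def swap_separators(name: str) -> str:
    """Trade colons and slashes within one path component."""
    return name.translate(_SWAP)

def posix_to_colon(posix_path: str) -> str:
    """Turn a slash-separated path into a colon-separated one."""
    components = [c for c in posix_path.split("/") if c]
    return ":".join(swap_separators(c) for c in components)

def colon_to_posix(colon_path: str) -> str:
    """Turn a colon-separated path back into an absolute slash-separated one."""
    components = [c for c in colon_path.split(":") if c]
    return "/" + "/".join(swap_separators(c) for c in components)

# A name with a colon on the Unix side shows up with a slash on the other side.
example = posix_to_colon("/Users/me/Notes 9:30.txt")
```

Every layer that accepts a path has to agree on which form it was handed, which is where the mix-ups described above can creep in.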
HFS+ supports flags that are not used anymore, resulting in cruft lying around (not a terribly big deal).
HFS+ supports resource forks, which are a security risk (to be fair so does NTFS, although they are not quite as easy to exploit in NTFS). Resource forks can be exploited to hide malicious code. Also, resource forks are used by some Mac software in HFS in a way unlike any other file system, and are sometimes used in a way that will cause files not to be correctly copied to another file system (Note that Apple has asked programmers to stop doing this, so it shouldn't be a problem with newer software, as long as the programmers listened).
HFS+ searches for unused nodes in the b-tree file 16 bits at a time (apparently a throwback to the 16 bit processors used when HFS was created). This degrades performance.
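To show why the width of the scan matters, here is a toy Python sketch that looks for the first clear bit in a bitmap one word at a time and counts how many words it had to examine. The bitmap layout (most significant bit first) and the function name are assumptions for the example, not the real HFS+ code.

```python
def find_free_bit(bitmap: bytes, word_bits: int = 16):
    """Return (index of first clear bit, words examined), or (None, words examined)."""
    word_bytes = word_bits // 8
    examined = 0
    for offset in range(0, len(bitmap), word_bytes):
        chunk = bitmap[offset:offset + word_bytes]
        chunk_bits = len(chunk) * 8
        word = int.from_bytes(chunk, "big")
        examined += 1
        if word == (1 << chunk_bits) - 1:
            continue  # every bit in this word is set: nothing free here
        # At least one bit is clear; find it, most significant bit first.
        for bit in range(chunk_bits):
            if not (word >> (chunk_bits - 1 - bit)) & 1:
                return offset * 8 + bit, examined
    return None, examined

# 64 fully used bytes, then one byte with a single free bit, then 7 used bytes.
bitmap = b"\xff" * 64 + b"\xf7" + b"\xff" * 7
narrow = find_free_bit(bitmap, word_bits=16)
wide = find_free_bit(bitmap, word_bits=64)
```

Both scans find the same bit, but the 64-bit scan gets there with far fewer comparisons, which is the gist of the complaint.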
HFS+ uses global locks for system metadata structures. That means that only one application can change metadata at a time, even for different files (only one application at a time should be given access to the metadata for the same file). This was acceptable at one time, but today it's another bottleneck on performance.
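The difference between one global lock and per-file locks can be sketched in a few lines of Python. The class names are invented for this example and it says nothing about how HFS+ is really coded; it only probes whether a second file's metadata could be updated while the first file's lock is held.

```python
import threading
from collections import defaultdict

class GlobalLockMetadata:
    """One lock guards the metadata of every file."""
    def __init__(self):
        self._lock = threading.Lock()

    def lock_for(self, path):
        return self._lock

class PerFileLockMetadata:
    """Each file gets its own metadata lock."""
    def __init__(self):
        self._locks = defaultdict(threading.Lock)

    def lock_for(self, path):
        return self._locks[path]

def can_update_concurrently(store, path_a, path_b):
    """Hold the lock for path_a, then check whether path_b's lock is free."""
    with store.lock_for(path_a):
        other = store.lock_for(path_b)
        acquired = other.acquire(blocking=False)  # do not wait, just probe
        if acquired:
            other.release()
        return acquired

global_ok = can_update_concurrently(GlobalLockMetadata(), "a.txt", "b.txt")
per_file_ok = can_update_concurrently(PerFileLockMetadata(), "a.txt", "b.txt")
```

With the global lock the probe fails even though the two files are unrelated; with per-file locks it succeeds.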
HFS+ also doesn't have sparse file support. This means that in order to create a file that is large in anticipation of it later being filled with data, the whole file has to be written in HFS+ (much of it blank), rather than just a few bytes at the beginning and at the end.
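Here is a short Python sketch of the "write only the last byte" trick. On a file system with sparse file support the space actually allocated should stay far below the file's logical size; on one without it, the two numbers should end up close together. The helper names are made up, and the result depends entirely on the file system the temporary directory lives on.

```python
import os
import tempfile

def make_file_with_hole(path, size):
    """Create a file of `size` bytes by writing only its final byte."""
    with open(path, "wb") as f:
        f.seek(size - 1)
        f.write(b"\0")

def sizes(path):
    """Return (logical size, allocated bytes or None if unknown)."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on POSIX systems; it is absent on Windows.
    blocks = getattr(st, "st_blocks", None)
    return st.st_size, None if blocks is None else blocks * 512

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "big.bin")
    make_file_with_hole(path, 10 * 1024 * 1024)
    logical, allocated = sizes(path)
    print("logical:", logical, "allocated:", allocated)
```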
There are probably more problems with HFS+ (there are at least some problems with every file system), but some of these are rather serious shortcomings for an advanced operating system like OS X.
Been on OSX since 2002
Let me introduce you to "silent corruption"
You've obviously never heard of silent corruption, which means that data gets corrupted without you even knowing it. The system simply can't tell because there are no parity checks that can detect such problems.
Here is a whole PhD dissertation showing that normal file systems are unreliable:
http://www.zdnet.com/blog/storage/how-microsoft-puts-your-data-at-risk/169
Dr. Prabhakaran stated in this paper that he found that ALL the file systems shared
...ad hoc failure handling and a great deal of illogical inconsistency in failure policy...such inconsistency leads to substantially different detection and recovery strategies under similar fault scenarios, resulting in unpredictable and often undesirable fault-handling strategies.
We observe little tolerance to transient failures;...none of the file systems can recover from partial disk failures, due to a lack of in-disk redundancy.
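To make the idea of silent corruption concrete, here is a minimal Python sketch. A block is stored with a SHA-256 checksum, one bit is flipped to simulate a bad sector, and the data is then read back with and without verification. The dict-based "block" and the function names are stand-ins for the example, not how any real file system lays out its data.

```python
import hashlib

class ChecksumError(Exception):
    """Raised when stored data no longer matches its checksum."""

def write_block(data: bytes):
    """Store the data together with a checksum taken at write time."""
    return {"data": bytearray(data), "checksum": hashlib.sha256(data).digest()}

def read_unchecked(block):
    """What a file system without checksums does: hand back whatever is there."""
    return bytes(block["data"])

def read_checked(block):
    """Verify the checksum before returning the data."""
    data = bytes(block["data"])
    if hashlib.sha256(data).digest() != block["checksum"]:
        raise ChecksumError("checksum mismatch: block is corrupt")
    return data

block = write_block(b"family photo bytes")
block["data"][3] ^= 0x01  # simulate a single flipped bit on disk
```

After the bit flip, `read_unchecked(block)` returns the damaged bytes without complaint, while `read_checked(block)` raises `ChecksumError`.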
Refrain sounds familiar....
No one has ever had any of these... until it happens, and then they stop making noise. I had a great talk with a Genius Bar fellow a while back and mentioned that I had never run Time Machine until I had a system failure. Now I run it religiously.
He commented that he sees this all the time. The way he put it was this: "Time Machine saves a lot of people's systems... the second or later time they fail."
Filesystem failure isn't common
Hardware failure is about 5 or 10 times more common than filesystem damage, and most of the filesystem issues are because of power failure or improper disconnects (external drives).
Backups are critical (businesses, or anyone with extremely important or hard to replace data, should also get an off-site backup).
As for permissions issues, as someone else mentioned, it is typically a badly made installer that tromps on directory permissions in areas it shouldn't even touch.
That said, there are other issues with HFS+ that have had whitepapers written about them. One is: don't let the drive get above about 85% full (large files like video shouldn't be an issue, but a typical boot drive with hundreds of thousands of small files could be).
I was really looking forward to ZFS, but when Apple ran into licensing issues with ZFS, they did hire at least one or two filesystem specialists, probably to develop a new one of their own internally.
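If you want to keep an eye on that 85% figure, a few lines of Python will do it. This is just a sketch using the standard library; the threshold comes from the comment above and the function names are made up.

```python
import shutil

def exceeds_threshold(used, total, threshold=0.85):
    """True if used/total is above the threshold (0.85 means 85% full)."""
    return used / total > threshold

def volume_too_full(path=".", threshold=0.85):
    """Check the volume holding `path`; returns (too_full, fraction_used)."""
    usage = shutil.disk_usage(path)
    fraction = usage.used / usage.total
    return exceeds_threshold(usage.used, usage.total, threshold), fraction

too_full, fraction = volume_too_full(".")
print(f"{fraction:.0%} used, above 85%: {too_full}")
```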
Same
The disconnect caused directory corruption at a point that was critical - as a result the drive was inaccessible.
Norton Utilities could not fix the issue.
I ended up manually comparing the sector with the file system spec. It turned out that one byte was dropped which should have been 0x00.
Manually inserted the missing byte and wrote back to the sector and all well again.
Again, run an experiment and show
Repeatable results of experiments
Time Machine to JHFSX: no hint of a problem with the media.
Time Machine to ZEVO Community Edition 1.1: errors are revealed.
I'll not bother to graph things (sorry) but screenshots such as this http://www.wuala.com/grahamperrin/public/2012/09/23/a/?mode=gallery are exemplary.
Right - so an imperfect disk is imperfect
I am in the middle of recovery of failed HD
So it was good timing.
I have to say that OS X / HFS+ is fairly stable. Very few problems. Haven't had any directory or permissions issues for years. (besides this media failure)
OS X gracefully handled the media failure and the resulting directory corruption. The HD gets mounted as read-only after a warning that the disk cannot be repaired automatically.
So yes I agree that HFS+ has underlying issues that could be better. I have zero experience of data corruption after decades of use. Directory corruption - yes in the past - or in case of hardware failure now.
ZFS - yes, it would be better - and I am considering using it. Becoming dependent on something not standard in the OS worries me though, as I may end up with an unsupported disk.
Meanwhile I continue to copy files from a drive that has a transfer speed varying from 10Kbps to 40MBps.
Certainly the only way to avoid issues with data loss is to have copies. There is really no way any file structure can prevent a drive failure from being a problem without having some form of redundant storage.
There is no case I have experienced where I would have avoided data loss by having ZFS and the same number of hard drives in all the time I and the people I know and work with have been using Macs. (in other words since 1984)
For all the shortcomings of HFS (which I know are there) it does work quite well overall.
The links support issue is interesting - have a look at how Time Machine does backups using links that are not supported.
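For the curious, the general idea of a link-based backup can be sketched in Python: each snapshot looks like a full copy, but files that have not changed since the previous snapshot are hard-linked instead of copied. This is only an illustration of the technique with a hypothetical `snapshot` function. It handles a flat folder of files and does not attempt directory links, which Python's `os.link` cannot create.

```python
import filecmp
import os
import shutil
import tempfile

def snapshot(source_dir, dest_dir, previous_dir=None):
    """Copy source_dir into dest_dir, hard-linking files unchanged since previous_dir."""
    os.makedirs(dest_dir)
    for name in sorted(os.listdir(source_dir)):
        src = os.path.join(source_dir, name)
        if not os.path.isfile(src):
            continue  # flat folders only, to keep the sketch short
        dst = os.path.join(dest_dir, name)
        prev = os.path.join(previous_dir, name) if previous_dir else None
        if prev and os.path.isfile(prev) and filecmp.cmp(src, prev, shallow=False):
            os.link(prev, dst)      # unchanged: share the old snapshot's data
        else:
            shutil.copy2(src, dst)  # new or changed: store a fresh copy
```

Taking a second snapshot with `previous_dir` pointing at the first means unchanged files cost only a directory entry, which is why each backup can look complete without using the space of a full copy.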