A post last month in ACM's Queue raised a scary issue: block-level deduplication - used in some popular SSDs - can wipe out your file system.
**Context**

SSDs that use MLC flash have to balance endurance against cost and capacity. Flash is expensive and has limited endurance - as little as 3,000 write cycles - so maximizing capacity while minimizing writes is a Good Thing.
One popular flash SSD controller maker has done a couple of things to achieve this goal:
- Inline compression of data
- Block level deduplication
Compressing the data means less data to write and thus greater flash endurance. Block-level deduplication - another form of compression - compares each incoming block against blocks already stored and, if there is a match, writes a pointer to the existing block instead of writing a new one.
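In concept, a block-level dedup layer works something like this sketch (a hypothetical, simplified model - real controllers hash in hardware and also handle reference counting, garbage collection and wear leveling):

```python
import hashlib

class DedupStore:
    """Toy block-level deduplicating store: identical blocks share one physical copy."""

    def __init__(self):
        self.blocks = {}    # content digest -> physical block data
        self.refcount = {}  # content digest -> number of logical blocks pointing at it

    def write(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:   # new content: actually write to "flash"
            self.blocks[digest] = data
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest                   # the logical "pointer" to the stored block

    def read(self, digest: str) -> bytes:
        return self.blocks[digest]

store = DedupStore()
p1 = store.write(b"some block of data")
p2 = store.write(b"some block of data")  # duplicate content: no new flash write
assert p1 == p2                          # both logical writes point at one physical block
assert len(store.blocks) == 1
```

The endurance win is visible in the last two lines: two logical writes cost only one physical program cycle.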
It's fast, efficient and maximizes endurance. What's not to like?
**Block-level de-dup problem**

Researchers found that at least one SandForce SSD controller - the SF-1200 - does block-level deduplication by default. Which can be a problem.
Many file systems - NTFS, ZFS and most Unix/Linux file systems among them - write critical metadata to multiple blocks in case one copy gets corrupted. But what if, unbeknownst to you, your SSD de-duplicates those blocks, leaving your file system with only one copy?
Yup, corruption of one block could wipe out your entire file system. And since all the "copies" point to the same corrupted block, there's no way to recover.
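The failure mode is easy to demonstrate with a toy dedup layer (a hypothetical sketch, not any vendor's actual firmware): the file system writes two "redundant" copies of its metadata, the dedup layer stores only one physical block, and a single corruption destroys both logical copies at once.

```python
import hashlib

physical = {}  # content digest -> block data (the "flash")

def dedup_write(data: bytes) -> str:
    """Write a logical block; duplicate content is not written again."""
    digest = hashlib.sha256(data).hexdigest()
    physical.setdefault(digest, data)
    return digest

# The file system writes its critical metadata twice, believing it has two copies.
meta = b"critical fs metadata"
copy_a = dedup_write(meta)
copy_b = dedup_write(meta)

assert len(physical) == 1  # ...but only ONE physical block actually exists

# A single media error corrupts that one block:
physical[copy_a] = b"\x00" * len(meta)

# Both "redundant" copies now read back corrupted - nothing left to recover from.
assert physical[copy_a] != meta and physical[copy_b] != meta
```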
**Industry comment**

I contacted SandForce for a response. The complete response is at StorageMojo, but here's the key part:
> We completely agree that any loss of metadata is likely to corrupt access to the underlying data. That is why SandForce created RAISE (Redundant Array of Independent Silicon Elements) and includes it on every SSD that uses a SandForce SSD Processor. All storage devices include ECC protection to minimize the potential that a bit can be lost and corrupt data. Not only do SandForce SSD Processors employ ECC protection enabling an UBER (Uncorrectable Bit Error Rate) of greater than 10^-17, if the ECC engine is unable to correct the bit error RAISE will step in to correct a complete failure of an entire sector, page, or block.
>
> This combination of ECC and RAISE protection provides a resulting UBER of 10^-29 virtually eliminates the probabilities of data corruption. This combined protection is much higher than any other currently shipping SSD or HDD solution we know about. . . . All data stored on a SandForce Driven SSD is viewed critical and protected with the highest level of certainty.
I also contacted Other World Computing and OCZ, companies that sell SSDs based on SandForce controllers. OWC founder and CEO Larry O'Connor responded, noting that OWC designs conservatively and has over 400 Macs using SandForce-based drives without seeing this problem. OCZ didn't respond.
**The Storage Bits take**

There are two reasons not to panic: not all SSD controllers do this, and there are bigger threats to your data. But is the feature worth it?
Most flash SSDs are spec’d at one URE per 10^15 bits read or better, so we’re talking one lost block for every ~100 TB to 1 PB read. With small-capacity drives – say 160 GB or less – most drives will never see a URE, and only rarely will a URE hit a critical metadata block.
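As a sanity check on those numbers (assuming the spec means one unrecoverable error per 10^15 bits read), the arithmetic works out like this:

```python
# UBER of 10^-15: expect one unrecoverable bit error per 10^15 bits read.
bits_per_error = 10**15
bytes_per_error = bits_per_error / 8
terabytes_per_error = bytes_per_error / 1e12
print(terabytes_per_error)   # 125.0 -> roughly 125 TB read per expected URE

# A 160 GB drive would have to be read end-to-end ~780 times
# before a single URE is expected.
full_drive_reads = bytes_per_error / 160e9
print(round(full_drive_reads))   # 781
```

In other words, a small consumer drive is statistically unlikely to ever hit a URE in its service life - which is exactly why the rare metadata-killing case is so easy to overlook.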
But when it does, that drive is gone. That’s when mirroring or RAID saves the day.
Whether or not SandForce's assertions about bit-error rate are accurate - they spec the SF-1200 at 10^-15, not 10^-29 - this points up a common problem: file system designers assume one thing, while storage designers assume something else.
Another problem is that this failure will simply look like the drive suddenly died. It may already be happening to people who don't recognize what they're seeing.
What is certain is that no matter what the technology - disk, flash, DRAM, tape or whatever is coming down the pike - storage fails, so your vital data needs protection.
Comments welcome, of course. I often buy from OWC. TMS advertises on StorageMojo.