How SSDs can hose your data
Summary: A post last month in ACM's Queue raised a scary issue: block-level deduplication - used in some popular SSDs - can wipe out your file system. Here's the scoop.
Context
SSDs that use MLC flash have to balance endurance against cost and capacity. Flash is expensive and has limited endurance - as little as 3,000 writes - so maximizing capacity while minimizing writes is a Good Thing.
One popular flash SSD controller maker has done a couple of things to achieve this goal:
- Inline compression of data
- Block-level deduplication
Compressing the data means less data to write and thus greater flash endurance. Block-level deduplication - which is another form of compression - compares an incoming block against the blocks already stored and, if there is a match, substitutes a pointer to the stored block instead of writing a new block.
It's fast, efficient and maximizes endurance. What's not to like?
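To make the mechanism concrete, here is a minimal Python sketch of block-level deduplication. It is a toy model, not the controller's actual design, which is not public: the `DedupStore` class, the SHA-256 fingerprint and the dictionary-based mapping are all assumptions made for illustration.

```python
import hashlib


class DedupStore:
    """Toy block store (hypothetical): identical blocks are stored once."""

    def __init__(self):
        self.physical = {}        # fingerprint -> block contents (one stored copy)
        self.lba_map = {}         # logical block address -> fingerprint ("pointer")
        self.physical_writes = 0  # how many blocks actually hit the flash

    def write(self, lba, data):
        """Write a logical block; return True only if flash was written."""
        fp = hashlib.sha256(data).hexdigest()  # assumed fingerprint function
        self.lba_map[lba] = fp
        if fp in self.physical:
            return False          # match found: just point at the stored block
        self.physical[fp] = data  # no match: store a new physical block
        self.physical_writes += 1
        return True

    def read(self, lba):
        return self.physical[self.lba_map[lba]]


store = DedupStore()
block = b"A" * 4096
wrote_first = store.write(0, block)
wrote_second = store.write(1, block)  # same contents, different address
print(wrote_first, wrote_second, store.physical_writes)
```

The second write costs no flash wear because both logical addresses now point at one stored block. The sketch leaves out everything a real drive needs beyond that, such as reclaiming blocks that are no longer referenced.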
Block-level de-dup problem
Researchers found that at least one Sandforce SSD controller - the SF-1200 - does block-level deduplication by default, which can be a problem.
Many file systems, including NTFS, most Unix/Linux file systems and ZFS, write critical metadata to multiple blocks in case one copy gets corrupted. But what if, unbeknownst to you, your SSD de-duplicates those blocks, leaving your file system with only one physical copy?
Yup, corruption of one block could wipe out your entire file system. And since all the "copies" point to the same corrupted block, there's no way to recover.
Ouch!
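The failure mode can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `write` and `read` helpers, the SHA-256 fingerprint and the block addresses are hypothetical, and the sketch deliberately ignores any error correction inside the drive.

```python
import hashlib

physical = {}  # fingerprint -> stored block (hypothetical flash contents)
lba_map = {}   # logical block address -> fingerprint


def write(lba, data):
    """Deduplicating write: identical contents share one physical block."""
    fp = hashlib.sha256(data).hexdigest()
    physical.setdefault(fp, bytearray(data))
    lba_map[lba] = fp


def read(lba):
    return bytes(physical[lba_map[lba]])


# The file system writes its critical metadata twice, at two addresses.
superblock = b"fs-metadata" + bytes(4085)  # padded to a 4096-byte block
write(0, superblock)     # primary copy
write(8192, superblock)  # "backup" copy

# A single media error flips one bit in the only physical block.
physical[lba_map[0]][0] ^= 0x01

primary_ok = read(0) == superblock
backup_ok = read(8192) == superblock
print(len(physical), primary_ok, backup_ok)
```

Only one physical block exists, so the single bit flip shows up in both the primary and the "backup" copy, and the file system has nothing intact to fall back on.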
Industry comment
I contacted Sandforce for a response. The complete response is at StorageMojo but here's the key part:
We completely agree that any loss of metadata is likely to corrupt access to the underlying data. That is why SandForce created RAISE (Redundant Array of Independent Silicon Elements) and includes it on every SSD that uses a SandForce SSD Processor. All storage devices include ECC protection to minimize the potential that a bit can be lost and corrupt data. Not only do SandForce SSD Processors employ ECC protection enabling an UBER (Uncorrectable Bit Error Rate) of greater than 10^-17, if the ECC engine is unable to correct the bit error RAISE will step in to correct a complete failure of an entire sector, page, or block.
This combination of ECC and RAISE protection provides a resulting UBER of 10^-29 virtually eliminates the probabilities of data corruption. This combined protection is much higher than any other currently shipping SSD or HDD solution we know about. . . . All data stored on a SandForce Driven SSD is viewed critical and protected with the highest level of certainty.
I also contacted Other World Computing and OCZ, companies that sell SSDs based on Sandforce controllers. OWC founder and CEO Larry O'Connor responded, noting that OWC designs conservatively and has over 400 Macs using Sandforce-based drives without seeing this. OCZ didn't respond.
Intel responded that they do not use compression/de-duplication in any of their currently shipping SSDs. Nor does Texas Memory Systems, a maker of high-end enterprise DRAM and flash SSDs.
The Storage Bits take
There are two reasons not to panic: not all SSD controllers do this, and there are bigger threats to your data. But is the feature worth it?
Most flash SSDs are spec’d at 1 URE (unrecoverable read error) in every 10^15 bits read or better, so we’re talking 1 lost block every 100 TB to 1 PB. With small-capacity drives – say 160 GB or less – most drives will never see a URE, and only rarely will a URE hit a critical metadata block.
But when it does, that drive is gone. That’s when mirroring or RAID saves the day.
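The back-of-the-envelope arithmetic behind those figures can be checked with a short Python sketch. The 10^16 rate is an assumed stand-in for "or better", the helper name `bytes_per_ure` is made up for illustration, and decimal units (1 TB = 10^12 bytes) are used throughout.

```python
def bytes_per_ure(exponent):
    """Expected bytes read per URE at a rate of 1 URE per 10**exponent bits."""
    return 10 ** exponent / 8


TB = 10 ** 12  # decimal terabyte

for exp in (15, 16):
    tb_read = bytes_per_ure(exp) / TB
    print(f"1 URE per 10^{exp} bits is about one per {tb_read:,.0f} TB read")

# How many times could a 160 GB drive be read end to end per expected URE?
drive_bytes = 160 * 10 ** 9
full_reads = bytes_per_ure(15) / drive_bytes
print(f"about {full_reads:.0f} full reads of a 160 GB drive per expected URE")
```

At 10^15 bits the expected interval works out to 125 TB read, and at 10^16 bits to 1,250 TB, in line with the 100 TB to 1 PB ballpark above. For a 160 GB drive that is several hundred complete reads of the whole device per expected error.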
Whether or not Sandforce's assertions about bit-error rate are accurate - they spec the SF-1200 at 10^-15, not 10^-29 - this points up a common problem: file system designers assume one thing while storage designers assume something else.
Another problem is that this failure will simply look like the drive suddenly died. It may be happening to people who don't recognize what happened.
What is certain is that no matter what the technology - disk, flash, DRAM, tape or whatever is coming down the pike - storage fails, so your vital data needs protection.
Comments welcome, of course. I often buy from OWC. TMS advertises on StorageMojo.
Talkback
RE: How SSDs can hose your data
My OCZ crapped out
RE: How SSDs can hose your data
Just as well you did have a copy in the cloud. If you had, well, stored a backup on the same drive, the drive may well have decided to store only one copy.
RE: How SSDs can hose your data
what a load ..
RE: How SSDs can hose your data
I installed four SanDisk (C300) 250 GB SSDs in some work laptops (one is mine) that have been running solid for four months with no problems yet. I'm running a 90 GB OCZ Vertex 2 on my home machine (this one) and it's been flawless for six months.
I guess I'll have to suffer a failure to believe that the rate is that high, but I also run Acronis full system images once a week, just to be on the safe side.
RE: How SSDs can hose your data
Backups
NINE Backup Hard Drives for my computer's hard drive.
AND
TWO Backup Computers for my Computer.
And I have backups in different locations.
I think the only thing missing is periodically backing up to DVDs or Blu-ray discs.
The only problem is that DVDs and Blu-Ray disks may only last 10 years at most before becoming corrupted spontaneously while in storage.
I think we have to realize that no data is going to be completely safe and that backups are necessary for everyone.
Yes. SSDs need backups. I use SSDs and I LOVE THEM.
In a mobile environment (e.g. moving vehicles), hard drives can easily be destroyed by potholes. SSDs are immune to this and nearly any shock.
RE: How SSDs can hose your data
I've worked with people that used optical backups. Unless you use the most expensive, highest class, don't do it at all. Maybe for your vacation pictures or something, but not anything important.
Then again, I worked for a hospital that still did everything on paper. The warehouse they contracted out to lost six file cabinets' worth of patient files. Mathematically calculate those failure rates.
re: blakepedersen
I LOVE OWC
They use quality components.
AND they design conservatively.
And they back it up with a great warranty.
AND they are FAST.
I love OWC period
OWC is my go-to for a lot of things. They are a fabulous supplier of quality goods. Having said that, I don't care how high quality the components, how conservative the design, or how great the warranty service is: you need to back up your data, period.
I used to tell people at the Genius Bar that there are 3 types of computer users:
1) Those that back up
2) Those that have lost data
3) Those that are not in group 1 or 2 but are guaranteed to be at some point.
Easy to fix
Problem solved.... The main thing is never get excited...
Are you a file system engineer?
@mouse2600 .. great logic
maybe this is where you need to discuss HDD / SSD architecture:
http://www.enterprisestorageforum.com/technology/features/article.php/3864891/Solid-State-Drive-Reliability-and-Performance-in-Storage-Networking.htm
.. keep'a chuggin'
( n.b. i think your idea has merit, but it might actually go over the heads of most folk here. ;) )
RE: How SSDs can hose your data
Why so timid?
1) The file systems listed were invented before SSDs, and therefore any missing data due to deduplication is squarely the fault of the SSD manufacturer. If their product doesn't work reliably, then redesign it. That's what I'd expect you to say.
2) You should have listed the top 10 SSDs on the market, and indicated which did deduplication, and which didn't. You (again ironically) fell back on failure statistics in an aw-shucks kind of way, instead of shining a bright light on a potentially dark design secret.