How SSDs can hose your data

Summary: A post last month in ACM's Queue raised a scary issue: block-level deduplication - used in some popular SSDs - can wipe out your file system. Here's the scoop.

TOPICS: Storage, Hardware

Context

SSDs that use MLC flash have to balance endurance against cost and capacity. Flash is expensive and has limited endurance - as little as 3,000 writes per cell - so maximizing capacity while minimizing writes is a Good Thing.

1 popular flash SSD controller maker has done a couple of things to achieve this goal:

  • Inline compression of data
  • Block level deduplication

Compressing the data means less data to write and thus greater flash endurance. Block level deduplication - which is another form of compression - compares an incoming block against current blocks and, if there is a match, substitutes a pointer to the stored block instead of writing a new block.
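The write path can be sketched as a toy model - a hypothetical illustration of hash-based block deduplication, not SandForce's actual (proprietary) algorithm:

```python
import hashlib

class DedupStore:
    """Toy model of a deduplicating block store (illustrative sketch,
    not SandForce's actual algorithm)."""

    def __init__(self):
        self.physical = {}   # content fingerprint -> stored block
        self.mapping = {}    # logical block address -> fingerprint
        self.writes = 0      # physical flash writes actually performed

    def write(self, lba, data):
        fp = hashlib.sha256(data).hexdigest()
        if fp not in self.physical:
            self.physical[fp] = data   # new content: one real write
            self.writes += 1
        self.mapping[lba] = fp         # duplicate: store only a pointer

    def read(self, lba):
        return self.physical[self.mapping[lba]]

store = DedupStore()
for lba in range(100):
    store.write(lba, b"\x00" * 4096)   # 100 identical logical blocks
print(store.writes)                    # 1 - only one block hit the flash
```

One hundred identical logical writes cost one physical write - which is exactly why the technique is so attractive for endurance.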

It's fast, efficient and maximizes endurance. What's not to like?

Block level de-dup problem

Researchers found that at least 1 SandForce SSD controller - the SF-1200 - does block-level deduplication by default. Which can be a problem.

Many file systems - NTFS, most Unix/Linux file systems and ZFS among them - write critical metadata to multiple blocks in case one copy gets corrupted. But what if, unbeknownst to you, your SSD de-duplicates those blocks, leaving your file system with only 1 physical copy?

Yup, corruption of 1 block could wipe out your entire file system. And since all the "copies" point to the same corrupted block, there's no way to recover.
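A minimal sketch of the failure mode, assuming a simple hash-indexed block store (hypothetical; real controller logic is proprietary):

```python
import hashlib

physical = {}   # fingerprint -> bytearray (stands in for the flash)
lba_map = {}    # logical block address -> fingerprint

def dedup_write(lba, data):
    fp = hashlib.sha256(data).hexdigest()
    physical.setdefault(fp, bytearray(data))   # duplicate content: no new block
    lba_map[lba] = fp

superblock = b"critical file system metadata"
dedup_write(2, superblock)       # primary copy
dedup_write(4096, superblock)    # "redundant" copy elsewhere on the disk

# One bit flips in the single shared physical block...
physical[lba_map[2]][0] ^= 0x01

# ...and every logical "copy" reads back corrupted.
assert bytes(physical[lba_map[2]]) != superblock
assert bytes(physical[lba_map[4096]]) != superblock
```

The file system believes it has two independent copies; the flash holds one.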


Industry comment

I contacted SandForce for a response. The complete response is at StorageMojo, but here's the key part:

We completely agree that any loss of metadata is likely to corrupt access to the underlying data. That is why SandForce created RAISE (Redundant Array of Independent Silicon Elements) and includes it on every SSD that uses a SandForce SSD Processor. All storage devices include ECC protection to minimize the potential that a bit can be lost and corrupt data. Not only do SandForce SSD Processors employ ECC protection enabling an UBER (Uncorrectable Bit Error Rate) of greater than 10^-17, if the ECC engine is unable to correct the bit error RAISE will step in to correct a complete failure of an entire sector, page, or block.

This combination of ECC and RAISE protection provides a resulting UBER of 10^-29 virtually eliminates the probabilities of data corruption. This combined protection is much higher than any other currently shipping SSD or HDD solution we know about. . . . All data stored on a SandForce Driven SSD is viewed critical and protected with the highest level of certainty.

I also contacted Other World Computing and OCZ, companies that sell SSDs based on SandForce controllers. OWC founder and CEO Larry O'Connor responded, noting that OWC designs conservatively and has over 400 Macs running SandForce-based drives without seeing this problem. OCZ didn't respond.

Intel responded that they do not use compression/de-duplication in any of their currently shipping SSDs. Nor does Texas Memory Systems, a maker of high-end enterprise DRAM and flash SSDs.

The Storage Bits take

There are 2 reasons not to panic: not all SSD controllers do this, and there are bigger threats to your data. But is the feature worth it?

Most flash SSDs are spec'd at 1 URE in every 10^15 bits read, or better, so we're talking 1 lost block every 100 TB to 1 PB. With small capacity drives - say 160 GB or less - most drives will never see a URE - and only rarely will that URE hit a critical metadata block.
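The back-of-the-envelope arithmetic behind those figures, assuming the 1-in-10^15-bits spec, is easy to check:

```python
# Rough arithmetic behind "1 lost block every ~100 TB",
# assuming a spec of one unreadable bit per 10^15 bits read.
bits_per_ure = 10 ** 15
bytes_per_ure = bits_per_ure // 8
print(bytes_per_ure / 1e12)        # 125.0 -> ~125 TB read per expected URE

# A 160 GB drive must be read end-to-end ~780 times, on average,
# before a single URE is expected.
full_drive_reads = bytes_per_ure / 160e9
print(round(full_drive_reads))     # 781
```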

But when it does, that drive is gone. That’s when mirroring or RAID saves the day.

Whether or not SandForce's assertions about bit-error rate are accurate - they spec the SF-1200 at 10^-15, not 10^-29 - this points up a common problem: file system designers assume 1 thing, while storage designers assume something else.

Another problem is that this failure will simply look like the drive suddenly died. It may be happening to people who don't recognize what happened.

What is certain is that no matter what the technology - disk, flash, DRAM, tape or whatever is coming down the pike - storage fails, so your vital data needs protection.

Comments welcome, of course. I often buy from OWC. TMS advertises on StorageMojo.

  • RE: How SSDs can hose your data

    Is there a specific SSD you prefer? I'm looking at upgrading my MacBook Pro but I'm curious which would you like after doing a bit of research?
    • RE: How SSDs can hose your data

      @wergo The OWC Mercury Extreme is one of the fastest internal SATA 2.5" SSDs, with data rates up to 285 MB/s.
  • My OCZ crapped out

    Windows had to try and recover the volume. Fortunately, the data on the drive was from a service that resynced a fresh copy.
    • RE: How SSDs can hose your data


      Just as well you did have a copy in the cloud. Had you stored a backup on the same drive, the drive may well have decided to store only one copy.
  • RE: How SSDs can hose your data

    I don't think this is a valuable article at all.

    Most people are concrete thinkers. If you say SSD is unreliable, then they go purchase a traditional HDD. They assume you meant to advise that.

    You haven't written an article that enlightens, if you ask me. The reality is there is a failure rate on SSDs - just as there is a failure rate on regular hard drives.

    Should people back up their computers? Yes, of course. Is the SSD unreliable? That depends.

    In real life, the de-duplication compression hasn't been an issue for me - and I run quite a few SSDs. And beyond my anecdotal experience, I found a more interesting statistic to be return rates: the Intel drives have a return rate of under 1%, but drives from OCZ are closer to 3%.

    Now another interesting study is the one done by Google, because they were able to examine 100,000 drives (regular SATA/PATA traditional drives). Given a manufacturer spec of 300,000-hour MTBF, out of 100,000 drives you'd expect almost a 3% annual failure rate - but Google found it was much higher than that.

    So you do stand a reasonable chance of losing a real hard drive - and they also found that safety features like SMART alerts failed: over 1/3rd of the failed drives gave no warning.

    So - yes, you better back up. And bottom line, SSDs may be more reliable than regular hard drives, but you better back them up too.
    • what a load ..

      @rdupuy11 .. "I don't think this is a valuable article at all."

      So the author does some background research; confirms a major flaw in a particular storage media type; states it (for everyone's benefit, mind you) and the repercussions; and all you can do is ad hominem his efforts.

      This is one of the very few worthy tech write-ups posted on the whole of ZDNet and you paste the guy?

      Tough luck, certain SSD brands have this flaw - get used to it. Until it's fixed, those customers that have SF1200s (..i'm gonna guess that's a decent number) are up s@#t creek if the aforementioned (quote) "...Yup, corruption of 1 block could wipe out your entire file system. And since all the copies point to the same corrupted block, there's no way to recover." ever happens to these folk.

      Back to RAID i guess - or mirroring. (Personally, i would recommend ghosting a system when you have it at its optimum state .. it's quick and removes the stress of worrying about eventual corruption / deterioration of a running file system.) Oh well, never mind .. i guess it's a risk people take when folk choose to early adopt a new / developing technology.
    • RE: How SSDs can hose your data


      I installed four SanDisk (C300) 250 GB SSDs in some work laptops (one is mine) that have been running solid for four months with no problems yet. I'm running a 90 GB OCZ Vertex 2 on my home machine (this one) and it's been flawless for six months.

      I guess I'll have to suffer a failure to believe that the rate is that high, but I also run Acronis full system images once a week, just to be on the safe side.
  • RE: How SSDs can hose your data

    I think using the digit "1" when you mean "one" in terms of unity or singularness, is considered unconventional writing.
    Benjie Dog
    • RE: How SSDs can hose your data

      @Benjie Dog Correct. However the research I've seen says it works better on the web at keeping readers reading. I have no idea why.
      Robin Harris
      • RE: How SSDs can hose your data

        @Robin Harris <br>Interesting tidbit there. But what kept me reading your article was not the use of "1" but the informative content of your article and your effective writing style.
        Benjie Dog
  • Backups

    I have:
    NINE Backup Hard Drives for my computer's hard drive.
    TWO Backup Computers for my Computer.
    And I have backups in different locations.

    I think the only thing missing is periodically backing up to DVDs or Blu-Ray disks.

    The only problem is that DVDs and Blu-Ray disks may only last 10 years at most before becoming corrupted spontaneously while in storage.

    I think we have to realize that no data is going to be completely safe and that backups are necessary for everyone.

    Yes. SSDs need backups. I use SSDs and I LOVE THEM.

    In a mobile environment - e.g. moving vehicles, hard drives can get easily destroyed by potholes. SSDs are immune to this and nearly any shock.
    • RE: How SSDs can hose your data


      I've worked with people that used optical backups. Unless you use the most expensive, highest-class media, don't do it at all. Maybe for your vacation pictures or something, but not anything important.

      Then again, I worked for a hospital that still did everything on paper. The warehouse they contracted out to lost six file cabinets' worth of patient files. Mathematically calculate those failure rates.

    I love OWC SSDs.

    They use quality components.
    AND they design conservatively.
    And they back it up with a great warranty.
    AND they are FAST.
    • I love OWC period

      OWC is my go to for a lot of things. They are a fabulous supplier of quality goods. Having said that, I don't care how high quality the components, how conservative the design, or how great the warranty service is <i>you need to back up your data, period</i>.

      I used to tell people at the Genius Bar that there are 3 types of computer users:
      1) Those that back up
      2) Those that have lost data
      3) Those that are not in group 1 or 2 but are guaranteed to be at some point.
  • Easy to fix

    This is a very easy problem to solve. The file system driver just needs to XOR each superblock with its block number. That way each superblock copy is binary-distinct and will not be de-duplicated. The driver just XORs the data with the block number again when it reads the superblock. XOR is a very fast CPU operation and is used for many things that require speed. There would be virtually no performance loss from this.

    Problem solved.... The main thing is never get excited...
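The XOR scheme the commenter describes can be sketched as follows - a hypothetical illustration, not code from any shipping file system:

```python
def scramble(block: bytes, block_number: int) -> bytes:
    """XOR a superblock with its own block number so that byte-identical
    copies stored at different locations differ on the medium.
    Illustrative sketch of the commenter's idea only."""
    key = block_number.to_bytes(8, "little")
    return bytes(b ^ key[i % 8] for i, b in enumerate(block))

superblock = b"superblock metadata " * 16
copy_a = scramble(superblock, 64)        # copy stored at block 64
copy_b = scramble(superblock, 98304)     # copy stored at block 98304

assert copy_a != copy_b                  # no longer identical: no dedup match
assert scramble(copy_a, 64) == superblock        # XOR is self-inverse
assert scramble(copy_b, 98304) == superblock
```

Because XOR is its own inverse, the same function descrambles on read; the two on-medium images differ, so a content-hashing controller sees them as distinct blocks.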
    • Are you a file system engineer?

      @mouse2600
      Yes - Great! Let me know when, as a consumer, I can license your new file system.
      No - Whoops! Maybe you can tell me, as a consumer, how to do what you suggest, and make it simple for me to implement?
    • @mouse2600 .. great logic

      .. just bad placement. This isn't the real forum .. it's simply a tech blog site - as opposed to a technology development or technical review site. (.. I mean it may well have been once .. but it sure as h#ll ain't now.)

      maybe this is where you need to discuss HDD / SSD architecture:

      .. keep'a chuggin'

      ( n.b. i think your idea has merit, but it might actually go over the heads of most folk here. ;) )
  • RE: How SSDs can hose your data

    Thank you for the column and the pointer to the research. Still, I don't agree that this is a significant issue.

    Though de-dupe is an enterprise software feature, the ACM article and yours seem to conclude that block de-dupe in a drive is a critical issue vs. file systems that make multiple copies of metadata. However, wouldn't it be also worrisome that NTFS is putting those multiple copies on a single spinning HDD? I'll go with the SandForce SSD with a very strong chance of avoiding data corruption vs. a full drive mechanical failure!

    It's always a good idea for any sensitive apps/data to run on a RAID array (multiple drives) and/or to send backups and logs to an external DR storage system. Otherwise, what app has the problem here that it wouldn't somewhere else?

    Maybe it's just a bad practice to expect that the file system making multiple copies around a single drive will buy you very much protection from failure. It seems like the ACM article may have been written from a software perspective, but it didn't consider the typical storage hardware ecosystem around the drive or good backup practices.

    RE: "Block level de-dup problem. Many file systems - NTFS, most Unix/Linux FSs, ZFS are some - write critical metadata to multiple blocks in case one copy gets corrupted. But what if, unbeknownst to you, your SSD de-duplicates that block, leaving your file system with only 1 copy? Yup, corruption of 1 block could wipe out your entire file system. And since all the copies point to the same corrupted block, there's no way to recover."
  • Why so timid?

    You've been much more adversarial in the past (ironically with regards to RAID), yet this blog about redundant data going missing was barely worth reading.

    1) The file systems listed were invented before SSDs, and therefore any missing data due to deduplication is squarely the fault of the SSD manufacturer. If their product doesn't work reliably, then redesign it. That's what I'd expect you to say.

    2) You should have listed the top 10 SSDs on the market, and indicated which did deduplication and which didn't. You (again ironically) fell back on failure statistics in an aw-shucks kind of way, instead of shining a bright light on a potentially dark design secret.