How Microsoft puts your data at risk

Summary: 56% of data loss due to system & hardware problems - Ontrack. Data loss is painful and all too common. Why?

TOPICS: Microsoft

56% of data loss due to system & hardware problems - Ontrack

Data loss is painful and all too common. Why? Because your file system stinks. Microsoft's NTFS (used in XP & Vista) with its de facto monopoly is the worst offender. But Apple and Linux aren't any better.

Everyone knows what the problems are AND high-end systems fixed many of them years ago. Yet only one desktop vendor is moving forward, and they aren't based in Redmond. Here's the scoop.

Y2K got fixed. File systems didn't.

That may sound harsh. But with all the lip-service paid to innovation - especially in Redmond - you'd think that sometimes we'd see some, especially in core technology. After all, more than half of all data loss is caused by system and hardware problems that the file system could recover from - but doesn't.

Instead we're using 20-year-old technology that, like the two-digit year that led to the Y2K drama, was designed for a world of scarce storage, small disks and limited CPU power. Unlike Y2K, though, we are living with, and paying for, these compromises every day with lost data, corrupted files, lame RAID solutions and hinky backup products that seem to fail almost as often as they work.

File systems? I should care because . . .

You rely on your file system every time you save or retrieve a document. It is the file system that keeps track of all the information on your computer. If the file system barfs, your data is the victim. And you get to pick up the pieces.

As documented in my last two posts (see How data gets lost and 50 ways to lose your data), PC and commodity server storage stacks are prone to data corruption and loss, much of it silent. Only your file system is positioned to see and fix these problems. It doesn't, of course, but it could.

And you enterprise data center folks, smirking over the junk consumers get, don't be too smug. Some of your costly high-end storage servers have NTFS or Linux FS's under the hood as well. And no, RAID doesn't fix these problems. According to Kroll Ontrack, only a quarter of data loss instances are due to human error - and many of those errors happen in the panic after a loss is discovered.

Hey, I thought machines were supposed to be good at keeping track of stuff? Only if they are built to.

IRON = Internal RObustNess

I came across the fascinating PhD thesis of Vijayan Prabhakaran, IRON File Systems, which analyzes how five commodity journaling file systems - NTFS, ext3, ReiserFS, JFS and XFS - handle storage problems.

In a nutshell he found that all the file systems have

. . . failure policies that are often inconsistent, sometimes buggy, and generally inadequate in their ability to recover from partial disk failures.

Dr. Prabhakaran will see you now

In a mere 155 pages of lucid prose he lays out his analysis of the interaction between hosts and local file systems. It is a clever analysis, especially of the proprietary and unpublished NTFS.

First, inject a lot of errors

Dr. Prabhakaran built an error-injection framework that enabled him to control what kind of errors the file system would see so he could document how the FS handled them. These errors include:

  • Failure type: read or write? If read: latent sector fault or block corruption? Does the machine crash before or after certain block failures?
  • Block type: directory block; super block? Specific inode or block numbers could be specified as well.
  • Transient or permanent fault?
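To make the fault categories above concrete, here is a minimal sketch of what an injection layer could look like. It is not the thesis's framework: the `FaultyDisk`, `Fault` and `DiskError` names are hypothetical, the "disk" is just a list of byte strings in memory, and crash injection is left out.

```python
from dataclasses import dataclass


class DiskError(IOError):
    """Raised when the simulated disk reports a failed read or write."""


@dataclass
class Fault:
    op: str          # "read" or "write"
    block: int       # block number the fault targets
    kind: str        # "error" (latent sector fault) or "corrupt" (silent corruption)
    transient: bool  # True: fires once, then disappears; False: fires every time


class FaultyDisk:
    """Toy in-memory block device that sits under a file system (hypothetical)."""

    def __init__(self, num_blocks, block_size=16):
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.faults = []

    def inject(self, fault):
        self.faults.append(fault)

    def _match(self, op, block):
        # Find the first armed fault for this operation and block.
        for fault in self.faults:
            if fault.op == op and fault.block == block:
                if fault.transient:
                    self.faults.remove(fault)  # safe: we return immediately
                return fault
        return None

    def read(self, block):
        fault = self._match("read", block)
        if fault and fault.kind == "error":
            raise DiskError(f"read failed at block {block}")
        data = self.blocks[block]
        if fault and fault.kind == "corrupt":
            # Return flipped bits without reporting any error.
            data = bytes(b ^ 0xFF for b in data)
        return data

    def write(self, block, data):
        fault = self._match("write", block)
        if fault and fault.kind == "error":
            raise DiskError(f"write failed at block {block}")
        self.blocks[block] = bytes(data)
```

A test harness would arm one fault at a time, run a file system operation on top of the disk, and record whether the error was detected, retried, reported to the application, or silently ignored.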

So how did NTFS fare?

Since NTFS is proprietary, Dr. Prabhakaran couldn't get as deeply into it as the open-source systems. While NTFS doesn't implement the strongest form of journaling, he found it pretty reliable at letting applications know when an I/O error has occurred. NTFS also retries failed I/O requests more often than the Linux file systems do, which is a good thing given how rarely the Linux file systems retry at all.
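Why retries matter: a transient fault goes away if you simply ask again. The sketch below shows the general idea only. It says nothing about how NTFS implements its retries, and `read_with_retry` and its parameters are hypothetical names.

```python
def read_with_retry(read_fn, block, attempts=3):
    """Call read_fn(block) up to `attempts` times before giving up.

    read_fn is any callable that returns the block's data or raises IOError.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return read_fn(block)
        except IOError as err:
            last_error = err  # remember the failure and try again
    # Every attempt failed: report the error instead of hiding it.
    raise last_error
```

A permanent fault still surfaces as an error after the last attempt, so the application finds out either way.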

NTFS sanity checking is also stronger than some. Yet he notes that

NTFS surprisingly does not always perform sanity checking; for example, a corrupted block pointer can point to important system structures and hence corrupt them when the block pointed to is updated.

Translation: Bad Thing.

General screw-ups

Dr. Prabhakaran offered a set of general conclusions about the commodity file systems including NTFS:

  • "Detection and Recovery: Bugs are common. We also found numerous bugs across the file systems we tested, some of which are serious, and many of which are not found by other sophisticated techniques."
  • "Detection: Sanity checking is of limited utility. Many of the file systems use sanity checking . . . . However, modern disk failure modes such as misdirected and phantom writes lead to cases where . . . [a] bad block thus passes sanity checks, is used, and can corrupt the file system. Indeed, all file systems we tested exhibit this behavior."
  • "Recovery: Automatic repair is rare. Automatic repair is used rarely by the file systems; . . . most of the file systems require manual intervention . . . (i.e., running fsck)."
  • "Detection and Recovery: Redundancy is not used. . . . [P]erhaps most importantly, while virtually all file systems include some machinery to detect disk failures, none of them apply redundancy to enable recovery from such failures."

Dr. Prabhakaran found that ALL the file systems shared

. . . ad hoc failure handling and a great deal of illogical inconsistency in failure policy . . . such inconsistency leads to substantially different detection and recovery strategies under similar fault scenarios, resulting in unpredictable and often undesirable fault-handling strategies. . . . We observe little tolerance to transient failures; . . . . none of the file systems can recover from partial disk failures, due to a lack of in-disk redundancy.

How doomed are we?

Pretty doomed. But there is some hope.

There are well-known techniques used in high-end systems, such as disk scrubbing, checksumming and more robust ECC, that could be added to our systems. Not rocket science.
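Here is a rough sketch of how checksumming, redundancy and scrubbing fit together. It is a toy under stated assumptions, not any vendor's design: `MirroredStore` is a hypothetical name, the two "disks" are Python lists, the checksums are assumed to live somewhere safe, and CRC-32 stands in for the stronger checksums a real system would use.

```python
import zlib


class MirroredStore:
    """Two copies of every block plus a checksum kept apart from the data."""

    def __init__(self, num_blocks, block_size=16):
        empty = bytes(block_size)
        self.copies = [[empty] * num_blocks, [empty] * num_blocks]
        self.checksums = [zlib.crc32(empty)] * num_blocks

    def write(self, block, data):
        data = bytes(data)
        for copy in self.copies:
            copy[block] = data
        self.checksums[block] = zlib.crc32(data)

    def read(self, block):
        expected = self.checksums[block]
        for copy in self.copies:
            if zlib.crc32(copy[block]) == expected:
                good = copy[block]
                # Automatic repair: overwrite every copy with the good data.
                for other in self.copies:
                    other[block] = good
                return good
        raise IOError(f"block {block}: every copy fails its checksum")

    def scrub(self):
        """Check every block and return how many bad copies were repaired.

        Raises IOError if a block has no good copy left.
        """
        repaired = 0
        for block, expected in enumerate(self.checksums):
            bad = sum(zlib.crc32(c[block]) != expected for c in self.copies)
            if bad:
                self.read(block)  # read() rewrites the bad copies
                repaired += bad
        return repaired
```

The checksum does the detecting, the second copy does the recovering, and the scrub finds damage before you need the data rather than after.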

Young Dr. Prabhakaran now works at Microsoft Research. Perhaps someone up in Redmond will reach out to him to see how NTFS's aging architecture might be enhanced.

Of course, Microsoft is fine with the status quo until it threatens market share. Internet Explorer's innovation hiatus after crushing Netscape is a fine example.

So it is good news that Apple has two storage initiatives that will put pressure on Redmond to clean up its act.

  • Time Machine is a beautifully crafted automatic backup utility in Mac OS X 10.5 (Leopard). While it doesn't solve the data corruption problems that I assume HFS+ has as well, it does make it very easy for regular folks to back up and recover their data. I think small business types will love it.
  • ZFS is the new open-source file system from Sun that Apple is incorporating into OS X. I expect the port won't be complete for another year, but ZFS is the first file system to offer end-to-end data integrity that can detect and correct such devious problems as phantom writes.
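Why does a phantom write slip past ordinary checks? The old block is still sitting on disk, perfectly well formed, so a checksum stored inside the block still matches. ZFS keeps each block's checksum in the parent's block pointer instead, which is what lets it notice. The sketch below illustrates only that one idea. The `Disk` class and helper names are hypothetical, the pointer lives in memory rather than in a checksummed parent block, and correcting the error would additionally need a redundant copy.

```python
import hashlib


def digest(data):
    return hashlib.sha256(data).hexdigest()


class Disk:
    """Toy disk that can silently drop one write (a phantom write)."""

    def __init__(self):
        self.blocks = {}
        self.drop_next_write = False

    def write(self, addr, data):
        if self.drop_next_write:
            self.drop_next_write = False
            return  # reports success, but nothing reaches the media
        self.blocks[addr] = data

    def read(self, addr):
        return self.blocks[addr]


# Scheme 1: checksum stored inside the same block as the data.
def self_checked_write(disk, addr, payload):
    disk.write(addr, (digest(payload), payload))


def self_checked_read(disk, addr):
    stored, payload = disk.read(addr)
    if digest(payload) != stored:
        raise IOError("checksum mismatch")
    return payload


# Scheme 2: checksum stored in the parent's pointer to the block.
def parent_checked_write(disk, addr, payload):
    disk.write(addr, payload)
    return (addr, digest(payload))  # the parent keeps this pointer


def parent_checked_read(disk, pointer):
    addr, expected = pointer
    payload = disk.read(addr)
    if digest(payload) != expected:
        raise IOError("checksum mismatch: stale or corrupt block")
    return payload
```

Write version 1, drop the write of version 2, then read back: scheme 1 happily returns the stale version 1, while scheme 2 raises an error because the parent expects the checksum of version 2.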

See Apple’s new kick-butt file system for more on ZFS.

The Storage Bits take

As noted in "How data gets lost", more than half of all data loss is caused by system and hardware problems. A high quality file system that took better care of our data could eliminate many of those failures.

The industry knows how to fix the problems. The question is when. With a resurgent Mac pushing ZFS maybe Redmond will see the light sooner, rather than later, and dramatically increase the reliability of all our systems.

It will be interesting to see how Microsofties spin inferior data integrity once ZFS is the OS X default file system. Especially to the enterprise folks for whom data integrity is the ne plus ultra of the data center.

Comments welcome, of course. Itching to read a well-done CompSci PhD thesis? Here's a link to IRON File Systems. Enjoy.

Update: based on the first couple of commenters, who seem to believe that data loss is a figment of my imagination, I gave more prominence to the factual basis of data loss and added a couple of short quotes from the thesis. I single out Microsoft because their negligence impacts more people than any other company. Maybe, someday, Microsoft will start measuring success in terms of software quality instead of market share.

  • Yawn

    Talk about troll FUD.

    As a computer user since the 60s, all that has happened is that both hardware and software have become more reliable. You manage to mention NTFS disparagingly even when it outperforms the other file systems and with FAT32 is the most commonly used file system on the planet. It's one thing to build theoretical models, but where is the data that suggest we are losing information all over the place?

    And of course Apple (again rebadging someone else's technology) is going to save the day. Just like its totally unexploitable operating system and its one-button mouse. Well it's still not here is it? Frankly, I prefer the expertise of a company whose OS is used by over 90% of the world to one that borrows other people's work for their small audience.

    How about some facts about all the data we are losing?
    • OK, 56% of data loss due to system & hardware problems

      as reported by Ontrack? That data not good enough for you?

      If you are as experienced as you say - which I doubt - you should know that Murphy's
      Law is operative here.

      Nor was this a theoretical exercise: he did real fault injection and found real
      problems. And Microsoft Research hired him based on that work. I thought you
      respected Microsoft?

      And no, Apple won't save the day, but competition will. Isn't that the American
      way? Seems to work in other industries.

      R Harris
      • Competition????

        I am sorry, but I do not understand this comment, that competition takes care. There is no competition, and MS has taken care of that. Whatever they see as competition, they buy and kill off. That's because the legislation is deficient, which was not good enough to stop one company from monopolising the OS market. With all the cash (some 50 Billion!), this monopolising company was not able to make ONE OS that fulfills the criteria of being anywhere near safe. Competition, sorry, that is a fairytale.
        • hmmm. Did you see this TITLE!!

          Far as I can tell, Apple is not on Microsoft's shopping list, nor do I think it's for sale. Same for SUN. As for Linux...I do not see Microsoft buying off "Linux". Who does that leave? That is the competition. With the over 7 year lock down of Microsoft, the gate has been swinging wide open. I think some substantial ground has been gained. And Vista is not stopping it from continuing so far. I'm not sure I understand your point? Things don't change overnight, but surely you can see the climate has begun to change. Microsoft has taken a beating from the press...look at this article alone, all vendors have the same problems or WORSE and he puts Microsoft in the Title as the Offender for the casual browsers to digest.....and they are given zero slack by anyone. Even when they are doing what's right. The funniest thing I've heard was the blog on how IE running on Vista is TOO Secure. yeah, ok...after years of getting battered over security (that wasn't really any worse than any competitor) they try to fix that and they are TOO secure.

          I wish the bloggers who are trying to do their part to dent Microsoft at every chance would just come out and say it. This story was obvious, but some are more subtle and I think it's time even bloggers just stated their stance.
        • Not entirely true

          Granted, Microsoft has much of the market and that won't change in the near future. But they certainly upgraded IE from 6 to 7 in response to the challenge from Firefox. And Linux has certainly had an impact on their behavior, if not always positively.

          An advantage to open source is that it's tough for Microsoft to actually buy anything. Which forces them to other methods, like the nebulous claims of patent infringement. But it could allow one to infer that they're nervous.
          • Is this your only exposure ...

            to business? It's a given and something that really doesn't require anything beyond a highschool education, at most, to realize that every business in every industry, from car makers to plumbers, has to be concerned about its competition and respond to it in some manner. Those who don't are what? Out of business very quickly, yes. I'm not sure of the point of your post other than stating the extremely obvious.
        • My only wish . . .

          I just wish that too-many-billion-dollar Bill would take one of those billions and fix Windows.
    • Gee...

      It's a good thing hardware and software are perfect now. We don't have to worry about software bugs or faulty hardware causing data loss because Tony here doesn't believe that data loss even occurs. He takes two seemingly valid points...NTFS outperforms FAT32 and is the most commonly used file system on the planet...and jumps to the wild conclusion that it doesn't have problems. And he tries to call someone else a troll... What a laugh. Maybe someone else can figure out exactly where it was that the author claimed Apple was coming in to "save the day". All he said was that Apple's initiatives were likely to spur innovation at Microsoft since competition is the driver of innovation.

      Now, if you want facts about data we are losing, here are a few real life facts for you... Since 2000 I have been directly involved with a company that has lost data from their Exchange server on 3 occasions. Two were a result of hardware failure and one was a result of "data corruption due to unknown reasons" per MS support. The scope of the problem was mitigated after the first incident of data loss by initiating incremental backups of the email system several times during the day. Problem is, the loss of even an hour's worth of emails can be devastating to a company that relies on it as a communications vehicle (and these days, who doesn't?). One thing to note is that it's not the billion dollar a year company that typically sees this kind of problem; it's the small company with revenues of less than say $10 million. They don't have the resources to tackle problems like this that big corporations typically have.
    • It doesn't out perform ReiserFS 4

      While not easy to set up initially, it works awesome and with the module plug-in ability it's even more powerful and flexible than NTFS could dream of.

      How it compares to ZFS, I don't know. But since I have been using ReiserFS 4 I have noticed at least a 15% increase in overall read/write performance. I measured this by running hdparm -Tt and comparing the results to those from the ReiserFS 3.6 that my drives were originally formatted with.

      One thing though, I haven't had any serious data loss or corruption in the last couple years using ReiserFS. I have with ext2 and ext3 but not ReiserFS. ]:)

      The other powerful attribute is the lack of needing to defragment the drive, something that all Windows systems seem to need, even the NTFS system. Gotta love the *nix based FS... takes all the maintenance out of the equation for you! ]:)
      Linux User 147560
      • RE: ... ReiserFS 4

        I don't know about ReiserFS v4 but I've been using ReiserFS v3 since it was in beta and never once lost data.

        In one situation ReiserFS v3 was on a bulletin board system that allowed tax payers to phone in and check on the status of their returns. The system was up 24/7 for 18 months without the loss of a single byte of data, except when ReiserFS was called upon twice during that period to recover data during reboot when squirrels short-circuited power lines and brought the entire building down. The APC failed on both occasions, too, so I don't use them any more.
        • APC or UPS in general?

          I'm not sure if you meant generic UPS failed or specifically the APC brand UPS failed.
          We've had nothing but problems with APC - just fails when called upon. Of course they always seem to have some explanation, but that completely overlooks the idea that this is the failsafe backup to prevent power loss - so if they can't work out the bugs and I still have a reasonable chance of losing power when utility power dies, what's the point of spending all that money to ensure I never lose power? They sputter and struggle, but never come up with a reply.

          But who do you go to? APC seems to be a bigger monopoly than Microsoft.
      • Haven't used that fs

        I don't lose vital data because I understand appropriate and aggressive backup methodology. I shouldn't have to be so fastidious to protect myself from the file system. Perhaps I ought to try it...
    • Good theory, stick with it

      Your theory is good and you should stick with it but I'm one who tends to think differently and the reason why is simple. The Japanese have taught us that they are not ones to invent but take others' inventions and perfect them. This has made them a dominant force in the car and motorcycle industries as GM, Ford and Chrysler have all felt their presence in more ways than balance sheets. Factory closures, loss of market and price lowering all account for severe losses. God help Microsoft if the Japanese get hold of a Linux OS and decide to perfect it like Honda did with their cars. Seriously, how would or could you perfect their cars as Mercedes are no more reliable than a Honda Accord, Toyota Camry or Lexus which has a similar price tag. Personally, I like Windows OS but I really dislike what Microsoft has done with its license terms and conditions but more importantly how they FORCE you to prove to them it's their product over and over again. You may be able to accept this aspect of Microsoft, I won't as I feel it is not up to me to protect their software or spend my time doing it. Activation, certification and proving it's a qualified product are just a few of the many things Microsoft should find other ways to have looked after without involving me. If they are so good at developing their OS, they can find a way of looking after their piracy issues as I'm tired of doing it for them.
      • Too late...

        It's begun...
        Linux User 147560
      • WGA has no place in a business!

        @intrepi@... Agreed!

        Operating Systems have become commodity items. Using license keys or serial numbers for an OS as if it were a $25,000 engineering package is laughable.

        Let's not kid ourselves here - Microsoft says they are doing this to combat piracy - Unless WGA is 100% ineffective, they've reduced piracy by some unknown percentage.

        How much has the price of Windows dropped since they introduced WGA again?

        Lastly, WGA precludes windows from being used in a business. Right now, with reduced functionality, you may still have use of your computer. What's to stop Microsoft from ratcheting this down?

    • Rebadging

      "And of course Apple (again rebadging someone else's technology) is going to save the day."

      Good for Apple. Isn't that the point of open systems? Why create another proprietary system?

      And haven't those fellas in Redmond gone out and bought a whole slew of technologies they didn't invent? The difference is that once they rebadge something, it becomes proprietary.
      • Yes, MS does that all the time.

        MS has nearly always been rebadging someone else's technology, like MS Excel, MS DOS, MS SQL, First TCP/IP stack, MS Windows etc...

        They HAVE developed one product themselves, and that is the original MS Basic that was in ROM on the first IBM PC. That was the one that Mr B. Gates himself was one of the coders on. Developed on a Unix system, if I remember right (though I might be wrong on this last part).
    • Real issues

      In the last few years, I've had two NTFS drives fail. Not in hardware, but by the loss of both MFTs. Suddenly, with no warning. I have good security and system maintenance software and I follow best (or at least good) practices, including backups. Which is why I didn't lose too much data. But on 100+ GB drives, even a small percentage is a lot.

      Fortunately, I ran across a program called Handy Recovery which helped minimize the loss. Even though I have difficulty recommending anything that uses any form of product activation. And Spinrite 6 verified that the drives were not affected by any physical problems. So it was apparently a problem in the file system.

      I too thought that NTFS was very robust, but I'd never had that kind of catastrophic failure with FAT or FAT32, even a couple of times when disk compression was involved. Having 90% of the market proves nothing, since most users I know and even some of the IT people don't really understand that much about computers. It's a black box to them. They use what's available in their price range or what they're given. Most would still be using Win98, if factors didn't push them to newer versions.

      I use best of breed software, rather than relying on a single source solution. No reason for Apple not to do the same. And it's not as if Microsoft has never done it. Have you noticed how many companies and software programs they've bought or co-opted over the years? They didn't even develop the original version of DOS. They have probably used far more of other people's work than Apple has. Or ever will. Not that I like Apple as a company, but the truth is what it is.
  • BS -Mr Harris

    How MS puts your data at risk? What a stupid headline when 3/4 of the article points out that that antique NTFS does a reasonable job at handling errors which is on-par or slightly better than file systems on Linux.

    Sure, there are better file systems in the works. I'm pretty sure Microsoft has some proprietary ones in the works too that you are not privy to.

    In any event, only an idiot would conclude the file system puts your data at risk. Aside from a serious system crash, what puts your data at risk are external factors like power failure, drive failure, drive damage from moving a powered-on external USB drive, etc.

    Notwithstanding the dramatic BS headline, having a 10-15 minute UPS, with at least a mirrored array = NO PROBLEMS whether running NTFS or EXT3 or Reiser. In that one-in-a-million event where the journaling file system fails, you will fall back on your backup (which you should be doing anyway to protect against hardware failures).

    A total BS story. You should find something useful to blog about.
    • Go back and read the article

      Better yet - go and read the thesis.
      If you haven't had any data corruption count yourself lucky, not smart.
      Also, I didn't say Microsoft was worse than the others - they are on a par. They all
      stink. But because they are the biggest vendor, they put more data at risk than any
      other vendor.

      Isn't that obvious?
      R Harris