Making MLC safe for the enterprise

Summary: Can 3bpc flash - rated for only about 1,000 write cycles - ever be safe for serious enterprise apps? Of course it can. In fact, the enterprise already uses lower-endurance media today. It's how you use it that matters.

As NAND flash has moved from cameras and cell phones to SSDs and storage arrays, the focus has been on SLC - single level cell - flash. SLC endures far more write cycles - from 100k to 1,000k - than MLC (multi-level cell) flash. The newest generation of 3bpc - 3 bits-per-cell - MLC is rated for only about 1k write cycles. Should this stop designers from using it for mission-critical apps?

The cost factor

The only reason for using MLC is that it is a lot cheaper than SLC. You get 2-3x the capacity on the same chip.

But the actual price difference is closer to 5x. Why? Because MLC is the high-volume product. It takes a lot of photos to fill an 8 GB SD card even 100 times, let alone 1,000.
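To put rough numbers on that, here is a back-of-the-envelope sketch. The 4 MB average photo size and the 100-photos-a-day habit are assumptions picked for illustration, not figures from the article, and the math assumes writes are spread evenly across the card.

```python
# Rough estimate: how many photos does it take to wear out a flash card?
# photo_mb (4 MB) and photos_per_day are illustrative assumptions.

def photos_to_wear_out(capacity_gb, rated_cycles, photo_mb=4.0):
    """Photos needed to fill the card `rated_cycles` times over.

    Assumes decimal units (1 GB = 1000 MB) and perfectly even wear.
    """
    photos_per_fill = (capacity_gb * 1000) / photo_mb
    return int(photos_per_fill * rated_cycles)


def years_to_wear_out(capacity_gb, rated_cycles, photos_per_day, photo_mb=4.0):
    """Years of shooting at `photos_per_day` before the card hits its rating."""
    total = photos_to_wear_out(capacity_gb, rated_cycles, photo_mb)
    return total / (photos_per_day * 365)


if __name__ == "__main__":
    for cycles in (100, 1000):
        print(cycles, "fills of an 8 GB card:",
              photos_to_wear_out(8, cycles), "photos")
    print("years at 100 photos/day:", round(years_to_wear_out(8, 1000, 100), 1))
```

Under those assumptions, 1,000 fills of an 8 GB card works out to about 2 million photos, which is decades of shooting for a typical consumer.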

But MLC chips have a higher failure rate - perhaps as high as 50x SLC. That is only about 0.5%, but for vendors using millions of chips this is a real problem.

Overcoming the problems

None of these factors are fatal to MLC in mission-critical apps. Enterprises routinely trust their data to LTO tapes that only support 200 head passes - 1/5th of what even 3bpc MLC offers. The key is matching the media to the application.

MLC designers use several techniques to overcome the endurance problem:

  • Higher capacity. The larger the box, the longer it takes to fill. While filling a 128 MB flash is easy, filling a 128 GB SD card 1,000 times takes much longer.
  • Over-provisioning. Do what disk drives do: assume bad blocks happen and have extra capacity to replace them when they fail.
  • Wear-leveling. Just because you're hitting the same file doesn't mean you have to hit the same blocks. Wear-leveling spreads the joy and ensures that blocks wear at about the same pace.
  • Improved garbage collection. Since flash has to be erased a whole block at a time, copying still-valid data from old blocks to new ones is a major source of wear. Replace some flash capacity with non-volatile DRAM and you eliminate much of that copying.
  • Enhanced ECC. Just like disks, flash vendors have gone from 4-bit to 15-bit ECC as geometries shrink and capacities increase. They can go much further, just as disk drives have.
  • Improved signal processing. Signal processing determines what is signal and what is noise. There are multiple techniques for measuring flash cell performance to improve data integrity. The net: MLC that acts more like SLC.
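
The wear-leveling and over-provisioning ideas above can be sketched in a few lines. This is a toy model, not how any particular controller works: the class name, block counts and "pick the least-worn free block" policy are all assumptions made for illustration. It ignores pages, garbage collection and cold data that never moves.

```python
# Toy flash translation layer showing dynamic wear-leveling.
# All names and sizes are hypothetical; real controllers are far more complex.

class WearLevelingFlash:
    def __init__(self, physical_blocks, logical_blocks):
        # Over-provisioning: there must be more physical than logical blocks.
        if logical_blocks >= physical_blocks:
            raise ValueError("need at least one spare physical block")
        self.logical_blocks = logical_blocks
        self.erase_counts = [0] * physical_blocks
        self.mapping = {}                        # logical block -> physical block
        self.free = set(range(physical_blocks))  # erased blocks ready for data

    def write(self, logical):
        """Write a logical block to the least-worn free physical block."""
        if not 0 <= logical < self.logical_blocks:
            raise IndexError("logical block out of range")
        # Choose the free block with the fewest erases (ties: lowest index).
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        old = self.mapping.get(logical)
        if old is not None:
            # The previous copy is now stale: erase it and recycle it.
            self.erase_counts[old] += 1
            self.free.add(old)
        self.mapping[logical] = target
        return target


def simulate_hot_block(physical_blocks=10, logical_blocks=8, writes=1000):
    """Rewrite logical block 0 over and over; return per-block erase counts."""
    flash = WearLevelingFlash(physical_blocks, logical_blocks)
    for _ in range(writes):
        flash.write(0)
    return flash.erase_counts


if __name__ == "__main__":
    counts = simulate_hot_block()
    print("erase counts per physical block:", counts)
    print("worst block:", max(counts), "erases out of", sum(counts), "total")
```

In this simplified run the device is otherwise empty, so 1,000 rewrites of one logical block are shared across all ten physical blocks and no single block sees more than about a tenth of the erases. On a device full of rarely-changed data, the hot block would rotate only among the spare blocks unless the controller also moves cold data around, which this sketch does not attempt.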

The Storage Bits take

It will take time for engineers to scope out all the MLC issues and to develop the software to implement optimized algorithms. But have no doubt, the problems will be solved because the economics are too great to ignore.

Once these techniques are embedded in silicon, they'll spread to consumer devices too. That means much cheaper SSDs for us consumers. Cool.

Comments welcome, of course. A presentation by Anobit at the Storage Networking Industry Association Storage Developer Conference spurred my thinking on this subject.

About

Harris has been working with computers for over 35 years and selling and marketing data storage for over 30 in companies large and small. He introduced a couple of multi-billion dollar storage products (DLT, the first Fibre Channel array) to market, as well as many smaller ones. Earlier he spent 10 years marketing servers and networks.
