How does flash storage fail?

The flash failure mode is odd: when most things break you lose their contents. But when flash fails your data is still there. Did you ever wonder why?
Written by Robin Harris, Contributor

The flash storage industry seems to have a code of silence on NAND flash technical issues. Search online for how NAND flash fails and you won't come up with much.

I recently did a video for LSI, the longtime maker of RAID and SCSI controllers and adapters, in which I interviewed LSI system architect and corporate fellow Robert Ober, who holds dozens of patents and is deeply knowledgeable about storage technologies.

How does flash work?
Flash stores an electrical charge in a quantum well in a floating gate transistor. The floating gate name comes from the fact that the normal transistor gate is isolated from the source and drain by an insulating layer of oxide. The gate "floats" between the two insulating layers.

Figure: A floating gate NAND flash cell


The quantum well in the floating gate stores an electrical charge. In single level cell (SLC) flash, the absence or presence of charge gives us a single binary digit.

In multi level cell (MLC) flash there are four levels of charge corresponding to two binary digits. And in triple level cell (TLC) flash there are eight levels of charge corresponding to three binary digits.
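The relationship between bits per cell and charge levels is just powers of two: each added bit doubles the number of levels the cell has to distinguish. Here is a minimal sketch of that arithmetic. The `charge_levels` helper and the `CELL_TYPES` table are names made up for this illustration, not anything from a flash vendor's tooling.

```python
# Bits stored per cell for each flash type named in the text.
CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3}


def charge_levels(bits_per_cell: int) -> int:
    """Number of distinct charge levels needed to encode this many bits."""
    return 2 ** bits_per_cell


for name, bits in CELL_TYPES.items():
    print(f"{name}: {bits} bit(s) per cell -> {charge_levels(bits)} charge levels")
```

The levels double with each extra bit, so the cell has to tell apart more charge states for every bit it gains.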

It takes about 20 V to write a flash cell. This voltage is created by on-chip pumps, which is why flash chips do not require a 20 V input.

With each write the high voltage places more charge into the insulating layers that protect the floating gate. As the charge in the insulating layers grows it takes longer and longer to write the cell.

Eventually, a write is no longer possible. When that happens the existing data cannot be overwritten and is therefore preserved.

That's what flash "failure" looks like.
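To make that wear-out behavior concrete, here is a toy model of a single cell. It is only a sketch: the `ToyFlashCell` class, the linear slowdown, and every number in it (base write time, slowdown per write, the write time limit) are assumptions invented for illustration, not measurements of any real device. It shows only the pattern described above: writes get slower, eventually one is refused, and the last data written stays put.

```python
class ToyFlashCell:
    """Toy model of one flash cell. All numbers are illustrative assumptions."""

    def __init__(self, base_write_us=200.0, slowdown_us_per_write=0.5,
                 max_write_us=1000.0):
        self.base_write_us = base_write_us                  # write time when new
        self.slowdown_us_per_write = slowdown_us_per_write  # assumed linear slowdown
        self.max_write_us = max_write_us                    # assumed write time limit
        self.writes = 0   # stand-in for charge trapped in the insulating layers
        self.data = None

    def write_time_us(self):
        """Time the next write would take, growing with every past write."""
        return self.base_write_us + self.slowdown_us_per_write * self.writes

    def write(self, value):
        """Try to store value. Return False, leaving old data intact, if worn out."""
        if self.write_time_us() > self.max_write_us:
            return False
        self.data = value
        self.writes += 1
        return True


cell = ToyFlashCell()
value = 0
while cell.write(value):   # keep rewriting until the cell refuses a write
    value += 1

print(f"writes accepted before wear-out: {cell.writes}")
print(f"new write accepted: {cell.write('new data')}")
print(f"cell still holds: {cell.data}")
```

In this sketch a worn-out cell behaves like read-only storage: the refused write changes nothing, and the value from the last successful write is still readable.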

Other issues with flash
The video talks about more than how flash fails. For example, how should we think about the fact that flash is a wearing medium? How does write amplification work? What is the impact of data compression on write amplification?

These are among the other major issues addressed in the LSI video that you can watch here.

The Storage Bits take
Given the deep impact that NAND flash has had on the storage industry in the last five years, it is surprising how little is generally known about the technology. Some of this is due to the desire to maintain trade secrets, but much of it has to do with a fear that knowledge would make users less comfortable about flash.

And users are uncomfortable about flash, though the discomfort is slowly dissipating with greater experience.

For example, early on the industry claimed that flash-based SSDs were much more reliable than disk drives. That wasn't completely true.

Yes, the most reliable SSDs are somewhat more reliable than HDDs, but SSDs from vendors who threw together products from spot-market components often turned out to be much less reliable than disk drives.

Today we don't have any good alternatives to NAND flash, so the questions about how it works and how well it works are somewhat academic. But as new persistent storage technologies, such as resistive RAM, come to market, these issues will become more important to technical decision-makers.

Comments welcome, as always. LSI paid to create the video, but not for this post. What questions do you have about how flash works?
