The issue is simple enough: if we had started with NAND flash - instead of disks - in the late 1950s, would our storage devices and software stack look like they do today? No, of course not.
Over the last year, researchers have been teasing out the problems with making flash look like disks. While these problems are less of an issue for notebook and desktop users, they are a big problem for servers.
Recent academic research has found that the SSDs used in many flash arrays have surprising performance issues. For example, researchers from Carnegie Mellon and Facebook recently discovered that placing sparse arrays on SSDs causes premature wear and failure.
Researchers at SanDisk found that applications with log-structured I/O, such as NoSQL databases, interfere with SSDs in ways that slow performance and increase latency. The log-structured flash translation layer (FTL) that makes flash look like a disk interacts with the already log-structured I/O from the application in deleterious ways.
Another recent paper concluded:
. . . [W]e show through empirical evaluation that performance SLOs cannot be satisfied with current commercial SSDs.
It is also well known that SSD performance drops as the drive ages. The number of I/O threads accessing an SSD can also have large performance effects.
The common problem underlying these results is that flash SSDs rely on proprietary FTLs that introduce unpredictable slowdowns and delays. The root cause is that FTLs have to implement log-structured I/O, which requires a non-deterministic process - commonly known as garbage collection - to reclaim the space held by invalid data.
SSDs require log-structured I/O and garbage collection because flash is not byte addressable and data cannot be updated in place: a page must be erased before it can be rewritten, and flash erases only whole blocks of pages. Log-structured I/O writes new data to the "end" of the storage block pool, which can be very efficient, but requires garbage collection as a background process.
As data is updated, the FTL decides when to rewrite valid data to a new block. While that happens, the data isn't available, which leads to sporadic high latency. Furthermore, apps that use log-structured I/O create serious problems with the FTL's logging: segment mismatches, write amplification, and asynchronous data invalidation.
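To make the mechanism concrete, here is a minimal Python sketch of a log-structured FTL. It is a toy model and not any vendor's design: the class name ToyFTL, the tiny block and page counts, the two spare blocks of over-provisioning, and the greedy "fewest valid pages" victim choice are all assumptions made for illustration. Real FTLs are proprietary and far more elaborate. The sketch shows only how appending every write to a log leaves stale pages behind, and how garbage collection has to copy the surviving pages before it can erase a block.

```python
import random


class ToyFTL:
    """Toy page-mapped, log-structured FTL. Illustrative only."""

    def __init__(self, num_blocks=8, pages_per_block=4):
        if num_blocks < 3:
            raise ValueError("need at least 3 blocks")
        self.pages_per_block = pages_per_block
        # Assumption: two blocks of spare capacity (over-provisioning), so
        # garbage collection can always find a block holding a stale page.
        self.logical_pages = (num_blocks - 2) * pages_per_block
        # A block is a list of slots: a logical page number, or None if stale.
        self.blocks = [[] for _ in range(num_blocks)]
        self.active = 0                      # block at the "end" of the log
        self.free = list(range(1, num_blocks))
        self.mapping = {}                    # logical page -> (block, slot)
        self.host_writes = 0
        self.flash_writes = 0
        self.gc_runs = 0

    def write(self, lpn):
        """Handle one host write to logical page number lpn."""
        if not 0 <= lpn < self.logical_pages:
            raise ValueError("logical page out of range")
        self.host_writes += 1
        if lpn in self.mapping:
            # No update in place: the old copy is only marked stale.
            block, slot = self.mapping[lpn]
            self.blocks[block][slot] = None
        if len(self.blocks[self.active]) == self.pages_per_block:
            self._open_new_block()
        self._program(lpn)

    def _program(self, lpn):
        """Append one page to the active block and update the map."""
        block = self.blocks[self.active]
        block.append(lpn)
        self.mapping[lpn] = (self.active, len(block) - 1)
        self.flash_writes += 1

    def _open_new_block(self):
        self.active = self.free.pop(0)
        if not self.free:
            # Last erased block just went into service: reclaim one now.
            # The host write that triggered this waits until it finishes.
            self._collect()

    def _collect(self):
        """Greedy garbage collection: clean the block with fewest valid pages."""
        self.gc_runs += 1
        candidates = [b for b in range(len(self.blocks)) if b != self.active]
        victim = min(
            candidates,
            key=lambda b: sum(p is not None for p in self.blocks[b]),
        )
        for lpn in self.blocks[victim]:
            if lpn is not None:
                self._program(lpn)           # extra flash write, not a host write
        self.blocks[victim] = []             # erase the whole block
        self.free.append(victim)

    def write_amplification(self):
        return self.flash_writes / self.host_writes


if __name__ == "__main__":
    rng = random.Random(0)
    ftl = ToyFTL()
    for lpn in range(ftl.logical_pages):     # first fill: nothing is stale yet
        ftl.write(lpn)
    for _ in range(1000):                    # random overwrites force cleaning
        ftl.write(rng.randrange(ftl.logical_pages))
    print("GC runs:", ftl.gc_runs)
    print("write amplification:", round(ftl.write_amplification(), 2))
```

In this model the first sequential fill costs exactly one flash write per host write. Once overwrites begin, every cleaning pass adds copy traffic on top of the host's writes, and the write that triggers a pass has to wait for it. That is write amplification and sporadic latency in miniature.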
Work by Steven Swanson and Adrian M. Caulfield at the University of California, San Diego found that with a 4 KB disk access, the standard Linux software stack accounted for just 0.3% of the latency and 0.4% of the energy consumption. With flash, however, the same software stack accounted for 70% of the latency and 87.7% of the energy consumed.
Software is the long pole in the SSD tent. Clearly, something has to change.
The Storage Bits take
SSD vendors adopted the FTL model for the sake of expedience, not performance. That made it easier to sell millions of SSDs.
But SSDs are no longer exotic rarities. They are common in many servers, especially those running virtualized apps. It is time for the industry to step up and rethink the I/O stack for non-volatile memory.
The good news is that very smart people have been thinking about this problem for the last five years. No, we haven't seen any solutions - yet. But I expect to see them start arriving soon.
Comments welcome, as always.