Flash storage, with its high-performance reads and low power consumption, is remaking the storage industry. But raw flash is expensive, roughly 10-20 times the cost of raw disk, making the economic case more difficult.
A slew of flash vendors, including Pure Storage, have invoked the idea that, with proper techniques, usable flash capacity can be competitive with the per-gigabyte cost of disk arrays. But are these comparisons fair and realistic?
Flash vendors have a point: Traditional enterprise disk arrays are notoriously under-utilized, with only 30-40 percent of their expensive capacity storing user data. That's because they are over-provisioned and conservatively managed to allow for end-of-quarter spikes and years of application growth.
But there's a technical issue too: Since flash can handle tens of thousands of I/Os per second, data structures — especially metadata — and compression techniques can be optimized in ways that aren't feasible for disk-based storage. While disk systems can use some of these techniques — and one has to wonder if the goal of selling more gigs caused them not to be employed — the fact is they haven't been, leaving a clear field for flash vendors.
The techniques flash vendors employ include some old standbys as well as more modern technologies. They are all forms of compression, although what gets compressed and how it is compressed vary widely.
- Data compression. Lempel-Ziv-Welch (LZW), widely used in tape drives for decades, typically reduces character data by approximately 50 percent. Hardware LZW is very fast and easily handled inline.
- De-duplication. Enterprises typically store many copies of very similar documents — think a presentation where a client name is changed — that can be stored with only the differences noted and restored when the document is read.
- Advanced erasure codes. RAID5 is an erasure code, but in the last 25 years rateless erasure codes have enabled much higher levels of protection with much lower overhead.
- Thin provisioning. Traditional provisioning dedicates capacity to an app whether used or not. Thin provisioning "tells" the file system that the capacity is dedicated, but only allocates it when data is written.
- Snapshots. Like a backup, but old data is only saved when it is updated (the copy-on-write algorithm), so snapshots are typically very compact.
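To make the dictionary coding behind LZW concrete, here is a minimal Python sketch of the compression side. It is purely illustrative — real tape and array hardware implements this in silicon with fixed-width codes, not as Python:

```python
# Minimal LZW compressor: builds a dictionary of repeated byte sequences
# on the fly and emits one code per longest known match.
# A sketch for illustration, not production code.
def lzw_compress(data: bytes) -> list[int]:
    """Return a list of dictionary codes for `data`."""
    # Start with one entry per possible byte value.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    result = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate          # keep extending the match
        else:
            result.append(table[current])
            table[candidate] = next_code  # learn the new sequence
            next_code += 1
            current = bytes([byte])
    if current:
        result.append(table[current])
    return result

text = b"the quick brown fox jumps over the lazy dog " * 100
codes = lzw_compress(text)
# Each code fits in 2 bytes for this input, so compare 2 * len(codes)
# against the original byte count.
ratio = (2 * len(codes)) / len(text)
print(f"compressed to {ratio:.0%} of original size")
```

On repetitive text like this, the dictionary learns ever-longer phrases and the output shrinks well below half the input — which is where the "approximately 50 percent" rule of thumb for typical character data comes from.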
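De-duplication can be sketched as a content-addressed block store: identical blocks are kept once and referenced by hash. The 4 KB block size and SHA-256 are illustrative assumptions, not any vendor's actual design:

```python
import hashlib

BLOCK = 4096  # illustrative block size

class DedupStore:
    """Toy block-level de-duplicating store."""
    def __init__(self):
        self.blocks = {}   # hash -> block bytes (each unique block stored once)
        self.files = {}    # name -> ordered list of block hashes

    def write(self, name: str, data: bytes) -> None:
        hashes = []
        for i in range(0, len(data), BLOCK):
            chunk = data[i:i + BLOCK]
            h = hashlib.sha256(chunk).hexdigest()
            self.blocks.setdefault(h, chunk)   # store only if unseen
            hashes.append(h)
        self.files[name] = hashes

    def read(self, name: str) -> bytes:
        return b"".join(self.blocks[h] for h in self.files[name])

store = DedupStore()
original = b"A" * BLOCK * 10            # ten identical blocks
edited = original[:-5] + b"v2..."       # same deck, client name changed
store.write("deck_v1.ppt", original)
store.write("deck_v2.ppt", edited)
print(len(store.blocks), "unique blocks for 20 logical blocks")
```

Note how the pointer lists in `files` are exactly the fragile metadata discussed below: lose them and both "copies" are gone.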
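Copy-on-write snapshots can be sketched the same way: a snapshot starts as an empty overlay and only captures a block's old contents the first time that block is overwritten afterward. The block-map design here is a simplifying assumption:

```python
class Volume:
    """Toy volume with copy-on-write snapshots."""
    def __init__(self, nblocks: int):
        self.blocks = [b"\x00"] * nblocks
        self.snapshots = []    # list of {block_index: old_data} overlays

    def snapshot(self) -> None:
        self.snapshots.append({})          # costs nothing up front

    def write(self, idx: int, data: bytes) -> None:
        for snap in self.snapshots:
            snap.setdefault(idx, self.blocks[idx])  # preserve old data once
        self.blocks[idx] = data

    def read_snapshot(self, snap_no: int, idx: int) -> bytes:
        # Unchanged blocks are read from the live volume via the overlay miss.
        return self.snapshots[snap_no].get(idx, self.blocks[idx])

vol = Volume(1000)
vol.write(7, b"old")
vol.snapshot()
vol.write(7, b"new")
print(len(vol.snapshots[0]), "block(s) copied for a 1000-block volume")
```

Because only overwritten blocks are copied, a snapshot of a mostly-idle volume consumes almost nothing — the sense in which snapshots are "very compact."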
HP also promotes something called Thin Clones. I haven't delved into it, but I assume it's simply a file containing only the differences, with pointers back to the original for unchanged blocks.
The Storage Bits take
So, are the "usable gigabyte" claims legitimate? Yes, if you understand some caveats.
All these techniques make assumptions about data and/or usage that may not always apply. For example, LZW assumes the data is compressible — redundant enough to squeeze by roughly half — but if you feed it already compressed data, it's stuck and your "available" capacity suddenly drops.
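You can see this caveat directly. The sketch below uses Python's zlib (DEFLATE) as a stand-in for an array's hardware compressor: repetitive text shrinks dramatically, while random bytes — a proxy for already-compressed data — don't shrink at all:

```python
import os
import zlib

# Compressibility depends entirely on the input.
text = b"quarterly revenue report " * 1000
random_data = os.urandom(len(text))   # stands in for pre-compressed data

text_ratio = len(zlib.compress(text)) / len(text)
random_ratio = len(zlib.compress(random_data)) / len(random_data)

print(f"text:   {text_ratio:.2f} of original size")
print(f"random: {random_ratio:.2f} of original size")  # about 1.0, no gain
```

An array sized on the assumption of 2:1 compression loses half its "usable" capacity the day users start storing JPEGs, video, or encrypted files.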
De-duplication just keeps one copy of your data, plus a list of pointers and changes. If that list is corrupted, so is your data, maybe lots of data. So those data structures need to be bulletproof. I wouldn't rely on RAID5 to protect them.
Thin provisioning assumes that all apps aren't going to want all their provisioned capacity all at once. A pretty safe bet, but a bet nonetheless.
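The bet is easy to model. In this toy sketch (names and sizes are mine, not any product's), a pool promises far more capacity than it physically has and only allocates on first write — fine until the apps collectively call the bet:

```python
class ThinPool:
    """Toy thin-provisioned capacity pool."""
    def __init__(self, physical_gb: int):
        self.physical_gb = physical_gb
        self.promised_gb = 0      # sum of provisioned volume sizes
        self.allocated_gb = 0     # capacity actually backed by writes

    def provision(self, size_gb: int) -> None:
        self.promised_gb += size_gb   # no physical space consumed yet

    def write(self, size_gb: int) -> None:
        if self.allocated_gb + size_gb > self.physical_gb:
            raise RuntimeError("pool exhausted: the thin bet went bad")
        self.allocated_gb += size_gb

pool = ThinPool(physical_gb=100)
for _ in range(10):
    pool.provision(50)    # 500 GB promised against 100 GB physical
pool.write(30)            # apps have actually written only 30 GB so far
print(pool.promised_gb, "GB promised,", pool.allocated_gb, "GB allocated")
```

The pool is 5x oversubscribed and perfectly healthy — until writes approach the physical 100 GB, at which point every file system that was "told" it had dedicated capacity discovers otherwise at once.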
In the main, then, the flash vendors are correct. For the most part, they've built in these features and others to work inline at wire speed so they don't impact performance. The array vendors could have done something similar, but chose not to.
As the declining sales of traditional enterprise RAID attest, they are now paying the price.
Courteous comments welcome, of course. I'm currently a guest at HP's Discover conference in Las Vegas. Question: Do you have any horror stories due to these "usable gigabyte" technologies? Be as specific as possible.