Can cloud storage replace your onsite storage?

Cloud providers offer lots of different storage services, but none of them are ideal for most applications - which is keeping enterprise storage vendors in business. Will cloud vendors ever be able to replace on-site storage for critical apps?

Cloud vendors such as Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure all offer a variety of cloud storage services, ranging from high-performance, SSD-based capacity to high-latency archive storage, at prices ranging from high to relatively low. But most applications have a variety of I/O needs, from latency-sensitive metadata updates to bandwidth-sucking backups. No single cloud storage service is ideal for all of them.

Application developers know this, and often perform unnatural acts in their code to overcome cloud storage deficits. Two major issues are cost/performance tradeoffs, and inelastic deployment boundaries.

Cost/performance tradeoffs

The storage hierarchy -- in simpler times DRAM, disk, and tape -- reflects the tradeoffs. Fast storage is expensive, and cheap storage is slow.

To accommodate varying workloads, enterprise storage arrays move data adaptively, transferring hot data to fast caches and moving cool data off to disk or, in some cases, all the way to a cloud archive. But this is hard to do with cloud storage, as the different services require explicit deployment and offer different consistency guarantees.
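To make the idea of adaptive data movement concrete, here is a minimal Python sketch of a tiering decision. It is an illustration only, not how any particular array works: the function name `plan_tiers`, the `fast_slots` parameter, and the simple "hottest keys win" rule are all assumptions made for the example.

```python
from typing import Dict


def plan_tiers(access_counts: Dict[str, int], fast_slots: int) -> Dict[str, str]:
    """Assign each key to the 'fast' or 'slow' tier by access frequency.

    Hypothetical policy: the `fast_slots` most-accessed keys go to the fast
    tier, everything else goes to the slow tier.
    """
    # Rank keys from hottest to coldest; ties are broken by key name so the
    # result is deterministic.
    ranked = sorted(access_counts, key=lambda k: (-access_counts[k], k))
    hot = set(ranked[:fast_slots])
    return {key: ("fast" if key in hot else "slow") for key in ranked}


# Example: three objects, room for two in the fast tier.
counts = {"orders.idx": 100, "backup.tar": 3, "catalog.db": 50}
plan = plan_tiers(counts, fast_slots=2)
```

Rerunning a plan like this as the access counts change is the easy part. The hard part in the cloud is that "fast" and "slow" are separate services, each with its own API and consistency behavior.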

Inelastic deployment

Cloud storage services also tend to offer only single-metric elasticity. The AWS S3 service, for example, scales with capacity, but not with I/O demand. DynamoDB scales with I/O demand, but is prohibitively expensive in low-latency configurations.

Anna to the rescue

In a recent paper, researchers at UC Berkeley explore an advanced key-value storage system, Anna, designed to overcome current cloud storage limitations. Key-value stores are essentially two-column spreadsheets, where the first column contains an access key and the second contains the data you wish to store.
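For readers who think in code, the two-column picture can be sketched in a few lines of Python. This is a toy in-memory illustration, not Anna's interface: the class name `KeyValueStore` and its `put`/`get` methods are assumptions chosen for the example.

```python
from typing import Dict, Optional


class KeyValueStore:
    """A toy in-memory key-value store: one column of keys, one of values."""

    def __init__(self) -> None:
        self._rows: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Writing an existing key overwrites its value, like editing a row.
        self._rows[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Returns None when the key is absent.
        return self._rows.get(key)


store = KeyValueStore()
store.put("user:42:name", b"Ada")
store.put("user:42:plan", b"pro")
```

A production store adds what this sketch leaves out: persistence, distribution across machines, and rules for what happens when two writers update the same key.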

Key-value stores are already in wide use in cloud services, but Anna implements three important optimizations.

  • Horizontal elasticity for scaling
  • Vertical data movement to accommodate changing access patterns
  • Selective replication of hot-data keys across multiple cores and nodes to scale access performance

These optimizations are intended to address the need for growth in aggregate throughput, the reality of hot keys, and the shifting of workload hotspots.
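Selective replication is the easiest of the three to sketch. The Python below is a simplified illustration under stated assumptions, not the policy from the Berkeley paper: the functions `replica_count` and `place_key`, the `hot_threshold` and `max_replicas` parameters, and the hash-based placement are all hypothetical.

```python
import hashlib
from typing import List


def replica_count(accesses: int, hot_threshold: int, max_replicas: int) -> int:
    """One replica for cold keys; more for keys at or above the threshold."""
    if accesses < hot_threshold:
        return 1
    # Add one replica per multiple of the threshold, capped at max_replicas.
    return min(max_replicas, 1 + accesses // hot_threshold)


def place_key(key: str, nodes: List[str], replicas: int) -> List[str]:
    """Pick distinct nodes for a key, starting from a hash-chosen position."""
    start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(nodes)
    # Never ask for more replicas than there are nodes.
    count = min(replicas, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d"]
cold = place_key("rarely-read", nodes, replica_count(5, 100, 4))
hot = place_key("front-page", nodes, replica_count(1000, 100, 4))
```

In this sketch the cold key lives on a single node while the hot key is spread across all four, so reads of the hot key can be served in parallel. Adding entries to `nodes` is a crude stand-in for horizontal elasticity.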

Performance

There's a lot of detail in how Anna accomplishes these goals. But the bottom line is: how well does it work compared to, say, DynamoDB?

Here's one table, comparing the two:

[Table: Anna vs. DynamoDB. Courtesy UC Berkeley]

Adapting to hotspots is another test:

[Figure: Hotspot adaptation. Courtesy UC Berkeley]

That's quite respectable.

The Storage Bits take

If I were Dell/EMC or NetApp, I'd be worried. Large-scale public cloud storage is less than a decade old and is rapidly maturing, as the lack of growth in enterprise storage attests.

Anna is important not only for performance gains, but for its focus on cost. Cloud storage headline rates seem reasonable, but when you add in all the overhead costs for directory lookups and data networking, enterprise storage is a lot more competitive.

The cloud vendors have as many PhDs as Berkeley does -- and the paper's authors have probably received job offers already -- so expect to see something like Anna productized in the near future.

Anything that makes storage more efficient at a lower cost is a win for our developing digital civilization. But perhaps not so much for enterprise storage vendors.

Courteous comments welcome, of course.