Why cloud data isn't as safe as you think

Summary: Yes, the cloud works pretty well. So does your PC. But the two are not always happy together. Here's why.

Serious cloud users know the vendor story: multiple datacenters, geographically distributed; advanced erasure coding that is better than RAID 6 (which I've discussed); multiple version retention; checksums to ensure data integrity; and synchronization across devices. What could possibly go wrong?

Plenty
As has been documented, client-side corruption is all too common, so the cloud will carefully preserve and spread corrupted data. If you crash during an upload the data may be inconsistent - but the cloud doesn't know that - or the cloud may fail to sync changed files.

Worse, clients cannot typically preserve dependencies between files since uploads are not point-in-time snapshots, creating unexpected and unwanted application (mis)behavior. A group of linked databases - say, between CRM, ERP and distribution systems - could end up inconsistent due to piecemeal uploads of changes at different times.
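To make the piecemeal-upload problem concrete, here is a toy sketch in Python. Everything in it is hypothetical: the file names `crm.db` and `erp.db`, the `version` field, and the two sync functions are illustrations, not how Dropbox, Seafile or any real client works. It only shows why uploading files one at a time can leave the cloud copy in a state that never existed locally, while uploading from a frozen view cannot.

```python
import copy

def piecemeal_sync(local, cloud, mutate):
    """Upload files one at a time; the application may write between uploads."""
    for i, name in enumerate(sorted(local)):
        cloud[name] = local[name]
        if i == 0:
            mutate(local)  # the application commits a change mid-sync
    return cloud

def snapshot_sync(local, cloud, mutate):
    """Freeze a point-in-time view first, then upload only from that view."""
    view = copy.deepcopy(local)
    for i, name in enumerate(sorted(view)):
        cloud[name] = view[name]
        if i == 0:
            mutate(local)  # same mid-sync write, but the view is unaffected
    return cloud

def commit_order(files):
    """Hypothetical transaction that must update both databases together."""
    files["crm.db"] = {"version": 2}
    files["erp.db"] = {"version": 2}

def is_consistent(cloud):
    """The linked databases agree only if they carry the same version."""
    return len({f["version"] for f in cloud.values()}) == 1

def fresh_local():
    return {"crm.db": {"version": 1}, "erp.db": {"version": 1}}

piecemeal_cloud = piecemeal_sync(fresh_local(), {}, commit_order)
snapshot_cloud = snapshot_sync(fresh_local(), {}, commit_order)
print(is_consistent(piecemeal_cloud), is_consistent(snapshot_cloud))
```

In the piecemeal case the cloud ends up with the old CRM database next to the new ERP database. The snapshot case uploads a slightly stale but self-consistent pair, and the newer pair goes up on the next pass.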

The basic issue is that the loose coupling between the local and cloud file systems leaves data less protected than users - or cloud vendors - like to admit. Like most problems it is fixable, once we admit we have a problem.

ViewBox
In a not-yet-online paper to be presented at the FAST - File and Storage Technologies - conference tomorrow, researchers from NetApp and the University of Wisconsin-Madison present a solution they call ViewBox.

Built on the popular ext4 file system, ViewBox has three key components:

  • Checksumming that detects corrupt and inconsistent data
  • A view manager that creates and exposes views to the synchronization client
  • A damaged-data recovery daemon that handles the server backend independently of the client
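The checksumming idea can be sketched in a few lines of user-space Python. This is not ViewBox's implementation, which lives inside the file system; the `ChecksummedStore` class, its methods, and the choice of SHA-256 are assumptions made for illustration. The point is simply that a checksum recorded at write time lets the sync path refuse to upload data that has silently changed since.

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class ChecksummedStore:
    """Hypothetical local store that records a checksum on every write."""

    def __init__(self):
        self._data = {}
        self._sums = {}

    def write(self, name: str, data: bytes) -> None:
        self._data[name] = data
        self._sums[name] = checksum(data)

    def corrupt(self, name: str, data: bytes) -> None:
        """Simulate silent corruption: contents change, the checksum does not."""
        self._data[name] = data

    def read_verified(self, name: str) -> bytes:
        data = self._data[name]
        if checksum(data) != self._sums[name]:
            raise IOError(f"checksum mismatch for {name}")
        return data

def sync(store, names, cloud):
    """Upload only files that verify; return the names that were held back."""
    rejected = []
    for name in names:
        try:
            cloud[name] = store.read_verified(name)
        except IOError:
            rejected.append(name)
    return rejected

store = ChecksummedStore()
store.write("a.txt", b"hello")
store.write("b.txt", b"world")
store.corrupt("b.txt", b"w0rld")  # bit rot, a bad write, a crash mid-update

cloud = {}
rejected = sync(store, ["a.txt", "b.txt"], cloud)
print(rejected, sorted(cloud))
```

Without the recorded checksum, the corrupted `b.txt` would look like any other changed file and would be uploaded and spread to every device.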

The team integrated ViewBox with Dropbox and Seafile, two popular sync services. ViewBox ensures that the local file system and the cloud services cooperate to detect and recover from these failure modes, at a runtime speed penalty of 5% or less.

The Storage Bits take
Obviously today's file systems were not built to handle backend cloud storage. How could they have been?

But now the low cost and resiliency of cloud storage have made it a go-to resource for many IT pros. That's not a problem for archiving, but the more timely data is passed into or through the cloud, the greater the chance for problems.

Linux users will probably get a solution like ViewBox sooner than either Windows or OS X users. But the real problem will be convincing users that there is a problem that will cost them. Even today Apple fans often refuse to recognize HFS+ data integrity problems.

But research like this will help focus OS teams on the problem, hopefully to speed a solution to market.

Comments welcome, please. The name of the paper is ViewBox: Integrating Local File Systems with Cloud Storage Services, by Yupu Zhang, Chris Dragga, Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau.


About

Harris has been working with computers for over 35 years and selling and marketing data storage for over 30 in companies large and small. He introduced a couple of multi-billion dollar storage products (DLT, the first Fibre Channel array) to market, as well as many smaller ones. Earlier he spent 10 years marketing servers and networks.
