Why cloud data isn't as safe as you think

Summary: Yes, the cloud works pretty well. So does your PC. But the two are not always happy together. Here's why.

TOPICS: Storage, Cloud, Software

Serious cloud users know the vendor story: multiple datacenters, geographically distributed; advanced erasure coding that is better than RAID 6 (which I've discussed); multiple version retention; checksums to ensure data integrity; and synchronization across devices. What could possibly go wrong?

As has been documented, client-side corruption is all too common, and the cloud will faithfully preserve and replicate whatever corrupted data the client hands it. If your system crashes during an upload, the uploaded data may be inconsistent, but the cloud doesn't know that. Or the service may simply fail to sync changed files.
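A minimal sketch of that failure mode and one way to catch it, assuming a hypothetical local store that records a SHA-256 checksum whenever a file is written. The names `LocalStore` and `upload_verified` are illustrative only, not ViewBox's or any sync client's API. Without the checksum, a sync client uploads corrupted bytes as faithfully as good ones; with it, the client can refuse to propagate the damage.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


class LocalStore:
    """Hypothetical local store that records a checksum when each file is written."""

    def __init__(self) -> None:
        self.files = {}      # name -> bytes currently on disk
        self.checksums = {}  # name -> checksum recorded at write time

    def write(self, name: str, data: bytes) -> None:
        self.files[name] = data
        self.checksums[name] = checksum(data)

    def is_intact(self, name: str) -> bool:
        # A mismatch means the bytes changed without going through write().
        return checksum(self.files[name]) == self.checksums[name]


def upload_verified(store: LocalStore, cloud: dict) -> list:
    """Copy intact files to the (simulated) cloud; return names skipped as corrupt."""
    skipped = []
    for name, data in store.files.items():
        if store.is_intact(name):
            cloud[name] = data
        else:
            skipped.append(name)
    return skipped


store = LocalStore()
store.write("a.txt", b"hello")
store.write("b.txt", b"world")
store.files["b.txt"] = b"wor1d"  # simulate silent on-disk corruption

cloud = {}
skipped = upload_verified(store, cloud)
print(skipped, sorted(cloud))  # ['b.txt'] ['a.txt']
```

A real client would also have to decide what to do with the skipped file (alert the user, or restore it from a good copy); the sketch only shows the detection step.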

Worse, clients cannot typically preserve dependencies between files since uploads are not point-in-time snapshots, creating unexpected and unwanted application (mis)behavior. A group of linked databases - say, between CRM, ERP and distribution systems - could end up inconsistent due to piecemeal uploads of changes at different times.
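The piecemeal-upload problem can be shown with a toy sketch. The two linked "databases" are plain Python dicts, and `place_order`, `take_view` and the consistency rule are hypothetical stand-ins for real CRM/ERP-style applications. The only point is the difference between uploading files one at a time and uploading every file from one frozen, point-in-time view.

```python
import copy

TOTAL_STOCK = 10  # invariant: orders placed + widgets left == TOTAL_STOCK


def place_order(db: dict) -> None:
    """One application 'transaction' that touches both linked files."""
    db["orders"].append("widget")
    db["inventory"]["widget"] -= 1


def consistent(db: dict) -> bool:
    return len(db["orders"]) + db["inventory"]["widget"] == TOTAL_STOCK


def upload_file(local: dict, cloud: dict, name: str) -> None:
    """Piecemeal: upload one file as it is at this moment."""
    cloud[name] = copy.deepcopy(local[name])


def take_view(local: dict) -> dict:
    """Freeze a point-in-time copy of every file, taken between transactions."""
    return copy.deepcopy(local)


def upload_view(view: dict, cloud: dict) -> None:
    """Snapshot: upload every file from the same frozen view."""
    for name, contents in view.items():
        cloud[name] = copy.deepcopy(contents)


local = {"orders": [], "inventory": {"widget": TOTAL_STOCK}}

# Piecemeal sync: the two files are uploaded at different times.
cloud_piecemeal = {}
upload_file(local, cloud_piecemeal, "inventory")  # uploads a stock of 10
place_order(local)                                # app updates both files
upload_file(local, cloud_piecemeal, "orders")     # uploads 1 order

# View-based sync: changes made after the view is taken don't leak into the upload.
cloud_view = {}
view = take_view(local)
place_order(local)
upload_view(view, cloud_view)

print(consistent(cloud_piecemeal), consistent(cloud_view))  # False True
```

The sketch sidesteps the hard part: knowing when no application is mid-update, so that a view can be taken safely. That is the kind of problem a component like ViewBox's view manager has to solve; this toy code does not model it.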

The basic issue is that the loose coupling between the local and cloud file systems leaves data less protected than users - or cloud vendors - like to admit. Like most problems it is fixable, once we admit we have a problem.

In a not-yet-online paper to be presented at the FAST (File and Storage Technologies) conference tomorrow, researchers from NetApp and the University of Wisconsin-Madison present a solution they call ViewBox.

Built on the popular ext4 file system, ViewBox has three key components:

  • Checksumming that detects corrupt and inconsistent data
  • A view manager that creates and exposes views to the synchronization client
  • A damaged-data recovery daemon that handles the server backend independently of the client
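The recovery idea can be sketched in a few lines, assuming the locally recorded checksums are trustworthy and the cloud holds an older known-good copy. `scrub_and_repair` is a hypothetical name, and this is one plausible approach rather than how ViewBox's daemon actually works: instead of uploading a file that fails its checksum, fetch the cloud copy and keep it only if it matches.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def scrub_and_repair(local: dict, recorded: dict, cloud: dict) -> dict:
    """Verify each local file against its recorded checksum.

    A corrupt file is replaced with the cloud copy only if that copy matches
    the recorded checksum. Returns 'ok', 'repaired' or 'unrecoverable' per file.
    """
    status = {}
    for name in list(local):
        if digest(local[name]) == recorded[name]:
            status[name] = "ok"
        elif name in cloud and digest(cloud[name]) == recorded[name]:
            local[name] = cloud[name]  # restore the known-good copy
            status[name] = "repaired"
        else:
            status[name] = "unrecoverable"  # no good copy anywhere; report it
    return status


good_report = b"quarterly report"
local = {"report.doc": b"quarterly rep0rt", "notes.txt": b"todo", "draft.txt": b"x?"}
recorded = {
    "report.doc": digest(good_report),
    "notes.txt": digest(b"todo"),
    "draft.txt": digest(b"xy"),
}
cloud = {"report.doc": good_report}  # draft.txt was never uploaded

status = scrub_and_repair(local, recorded, cloud)
print(status)  # {'report.doc': 'repaired', 'notes.txt': 'ok', 'draft.txt': 'unrecoverable'}
```

Checking the cloud copy against the same recorded checksum matters: otherwise a damaged cloud copy could silently overwrite a damaged local one.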

The team integrated ViewBox with Dropbox and Seafile, two popular sync services. ViewBox ensures that the local file system and the cloud services cooperate to detect and recover from these failure modes, at a runtime speed penalty of 5% or less.

The Storage Bits take
Obviously today's file systems were not built to handle backend cloud storage. How could they have been?

But now the low cost and resiliency of cloud storage have made it a go-to resource for many IT pros. That's not a problem for archiving, but the more timely data passes into or through the cloud, the greater the chance of problems.

Linux users will probably get a solution like ViewBox sooner than Windows or OS X users. But the real problem will be convincing users that there is a problem that will cost them. Even today, Apple fans often refuse to recognize HFS+'s data integrity problems.

But research like this will help focus OS teams on the problem, hopefully to speed a solution to market.

Comments welcome, please. The name of the paper is ViewBox: Integrating Local File Systems with Cloud Storage Services, by Yupu Zhang, Chris Dragga, Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau.

  • It will always be a problem.

    Even during a sync, an update can come in from the net, and invalidate the sync... which doubles the traffic... and if it happens often enough, you never get caught up.
  • Good work

    'The Cloud' has got to be the biggest rip-off ever. I can't wait to shout out 'told you so'. The sheep that can't think for themselves, the weak IT managers that should be standing up and fighting for security over keeping the ants happy, the moronic media that write about tech to look cool but actually know very little and have virtually no hands-on experience. If you are a muppet, then get into 'The Cloud'. If you have no family and kids and are a Financial Controller or IT Manager, then join the war on putting things right. I say no family or kids because you will be fired as soon as you stand up for all the things that are good and great about IT and humans. BYOD and 'The Cloud' are not among them. Get your own datacentres, your own servers, two firewalls with a DMZ and honeypots, and forcefully educate your staff and your kids about security and being proud to protect data and systems. Hire security guards to pat down visitors and forcefully remove all tech from them to stop espionage and photography. Back up to tape drives, not to the server of someone you don't know, with no clue where the data is. Are you thinking this is over-the-top? Oh no, this is the minimum you should be doing. But of course many of you will do nothing. Not until all your data is deleted or changed or corrupted or ransomed. Educate, educate, educate (and start with yourself).
    • Phil, it is about close to the vest

      Keep your information as close to the vest as they say. Stop trusting people you don't know with information stored who knows where? Plenty of cheap options that can store plenty of information for a lifetime. Easy enough to transfer that information as storage options improve. Why risk transferring information over the internet and then have it stored on some server somewhere? All because the bean counters say you'll save a few bucks?
    • All that philswift wrote is true

      The lesson of Fazio/Target should be ringing in the ears of every IT manager who considers the cloud. When data is stored locally, it is under your control; in the cloud, you have no control. China wants corporate data for corporate espionage. Russia wants it for cyber-theft. Competitors want it for a leg-up in the market. Are you confident that none of your cloud provider's employees will sell access to your crown jewels?

      BYOD is a related, but different, issue. Most people think computer security is a big joke, so allowing them to bring their personal devices behind the corporate firewall is an invitation to disaster because they won't secure them. And some corporations are pouring gasoline onto the fire by requiring employees to supply their own devices, with those employees rightfully refusing to allow company IT policies to be enforced on their personal property.

      Outsourcing has ruined our country.
      • It's not only the cloud sellers Saucy

        We keep getting all these stories about data being hacked from stores as well as this hacked and that hacked. Look at Snowden - he took the job as a contractor SPECIFICALLY to get the information he later shared. (Yeah, I don't like that the NSA spied, but by getting the job specifically, and giving the info to non-Americans, he committed espionage.) Once that data is out of your hands, you also lose control over how to protect it. How do you know that these companies are doing enough to protect your data?
        library assistant
    • Too complex

      I thought this article was a critical review of the cloud, but then it goes on to suggest that there are /solutions/. Solutions that make the storage picture even MORE complex. I used to be an IT student and am a long-time hobby programmer, and my instinct tells me that this complexity is WRONG. The old principle comes to mind: keep it simple, stupid!
      • Unfortunately

        Unfortunately, KISS has met its match. When it comes to cyber security, simple ideas are repeatedly hacked. Companies close, sometimes without much notice for the users, who then overload the servers trying to get their data back. Individuals are corrupted: they hack, steal the same data they are supposed to be protecting, even read your files and sell information gleaned from them to the highest bidder. People make mistakes too, in purchasing hardware and software. What about price increases at a time when profits are down? Are you going to be held hostage for your files when the service price goes up 200% with little notice and, if you balk, the provider prevents you from getting your data? At least if you have someone in your organization handling the data, you can fire them.
        library assistant
    • Agreed

      It's not safe at all, and never will be. I think the idea was not thought through by the people who implemented it. It was designed and marketed by and for people who did not/do not work in IT. Perhaps it is a way to make home users and small businesses feel safe and like they are equal to really big business. For companies like Adobe, it's a means to rip off their customer base. You used to OWN your program; now you don't.
      library assistant
    • Why tape backup?

      I agree with philswift 99.9%. What I don't understand is why anyone today uses tape backup.

      Tape is flimsy, subject to atmospheric conditions of temperature and humidity. It physically rubs against the read/write head of the tape drive causing wear and tear and once that drive fails you can't read any of your backups without replacing the drive, and there's no guarantee that drives for your tape format will still be available. Even if they are, the azimuth alignment of the read/write head may not match that of your failed drive so you could still be unable to read your backup tapes recorded on the old drive. Backups are slow compared with disc and restoration is very slow due to the sequential access nature of tape. Moreover, every time you backup you're causing wear to the same drive, regardless of which tape you're using.

      In contrast, removable hard disc backup uses disc drives which are hermetically sealed, use rigid discs, have read/write heads that fly above the discs, and each unit is self-contained, so a drive failure only affects the failed drive, not other discs. Disc is faster than tape, and restoring a file is much simpler because of the random-access nature of disc storage. One other factor is standards. Hard disc standards change far less frequently than tape standards, and because the hard drive is self-contained, even a change of standards will have very little effect on disc backup, as all disc standards will have interfaces that can connect via Ethernet, USB, FireWire or Thunderbolt, or more than one of these. Then there's the wear and tear factor: wear only occurs to the disc drive you're using at the time; all other discs are subject to wear only when they're in use, rather than one drive doing the work for all discs.

      I rest my case - or is there some fundamental advantage that tape has that I'm missing? Did I hear "cost"? Cost per gigabyte of discs is now so low that any savings that tape may offer are not worth the reduced reliability. Shouldn't the backup medium be chosen for maximum reliability?
  • No data is 'safe' be it cloud or datacenter or PC it's all vulnerable

    When you get that then you have the right perspective.

    ^philswift nice rant :-\
    • Ours is!

      We kept, and keep, all our important data on hard drives (not connected to a computer) stored in a huge bank-grade safe. No matter how good you are, you're not going to get access to it.

      as for the title of this article, "the cloud" IS EXACTLY AS SAFE AS I THINK !!!

      (BTW: I don't think it is safe at all, if you could not work that out!)
      • That safe isn't as safe as you think.

        The bearings on disk drives (and their lubricant) still age...

        I worked at one place that kept their critical storage in a safe...

        But for some reason anything stored on the bottom shelf kept getting damaged.

        It wasn't until an operator working overtime had to go to the safe... and found the janitorial staff using a floor polisher in the safe.. and bumping the units on both shelves.

        Right where the data was damaged...
        • Stored media is still the safest way to go

          I started with 3.5" floppy backups. Oh, what a joy that was. Then SyQuest came out with 88MB removable hard platter drives. That worked much better, and I stored the floppy images on redundant SyQuest drives. When CD burners came along, I wrote multiple SyQuest images to each CD. After that came the Jaz drives, which were faster and could store more than a CD could. These became the backup media for the years it took for DVD burners to mature in value and performance. DVDs were used to back up the Jaz images and were archived. Now it is a combination of thumb drives and redundant DL DVD media.

          I can access my data as far back as 1995, with important stuff going into the '80s (I still have a working copy of Ashton-Tate's Framework II). Not a single bit of that data is accessible from anywhere but my offices, and none is online.

          I may be a single seat/user case, but this can scale to most small and medium businesses. When you have TBs of data for a single user, how can the cloud make sense even if we did assume it was secure?
        • it is a special safe,

          It's very thick steel, inside a building, and magnetic-free. And don't you think we would also have duplicates of each HD? They are also regularly checked and have a "use by" date, before which we have to copy each one to new media (along with its duplicate).

          It is as safe as we make it, and that we think it is.

          It's not a walk-in safe, the janitor does not have access to it, and the floor does not need polishing.

          What kind of IT security do you have if your janitor has access to secure data?
          You need a new Admin.

          The hard drives are also stored in their original anti-static bags and boxes, with anti-static foam (those 'black boxes').

          The thing is, we control everything: access, security, backups, rewriting, and record keeping. With the cloud you control nothing.

          We have a 'chain', but we own each link, and we control how long the chain is. There are weak links, but even those links are quite strong (and there aren't many of them).

          How to reduce security: make the chain longer, give up control of the links, add more links, and make sure you include a good number of 'weak links'.

          And if the internet goes down, or the phones go down, or the power goes off, we can fire up the generator and, without the internet, phones, ISPs or service providers, simply keep on going normally.

          Can you do that if you are tied to the 'cloud'?
      • Exactly!

        I have never held the misapprehension that the cloud was even close to being "safe."
  • For personal data the cloud is the last resort

    Knowing storage as I do, I maintain at least 4 copies of critical data - some of that written on the 1,000-year DVDs, M-Discs - the 4th copy being encrypted before it leaves my system and stored remotely. But I hope I never have to use it, because that would mean my house was destroyed.

    All data is vulnerable, but being humans we don't want to think about that. The Universe hates your data!

    And yes, nice rant, Phil!

    R Harris
    • This is a genuine question

      But out of interest, what is all this critical personal data?
      It seems to be an American thing, so I'm just genuinely curious.
      I use Google Docs as a sort of My Docs replacement; it's useful to have everything available no matter where I am.
      But if anyone got hold of the contents, I think they would be very disappointed.
      No bank details in there, or any other payment type. No addresses.
      I also have a business Google Apps account, and yes, it has invoices in there, but no links to where the actual money is. Nothing of real use to anyone.
      The really critical stuff is in my head.
      • Not Fail Safe Either

        Having the 'really critical stuff' in your head is not fail safe at all.
      • And

        What happens if Google accidentally deletes your account or your data?

        Critical? All those photographs of the kids growing up, friends and family etc.
  • Narrow view

    If you think file sharing and sync services are "the cloud", then you have a parochial consumer view of things. Most file sharing and sync hosts don't even pretend to offer geographic resource distribution, and when they do it is only for business reasons and not service resiliency.

    The term Cloud is far too overused by marketers and "journalists" parroting press releases.