Small issue of getting big data out of cloud

It's all very well planning how to get data into the cloud, but moving it back out again requires at least as much thought, says Lori MacVittie
Written by Lori MacVittie, Contributor

There's a scene in the 1980s cult classic film The Princess Bride in which the heroes plot how to rescue the princess from the castle and the clutches of the dastardly Prince Humperdinck. With minor changes, the scene accurately reflects the seat-of-the-pants approach frequently found in discussions about retrieving 'big data' from the cloud:

Westley: All right, all right. Come on, help me out. Now I'll need an SCP connection eventually.
Inigo: Why? You can't even type.
Westley: True, but that's hardly common knowledge, is it? Thank you. Now, there may be problems once the data is inside the cloud.
Inigo: I'll say. Namely, how do I find the cloud? Once I do, how do I find the data? Once I find the data, how do I get it out?
Fezzik: Don't pester him. He's had a hard day.
Inigo: Right. Right. Sorry.
Fezzik: Inigo?
Inigo: What?
Fezzik: I hope we win.

In the early days of cloud computing, the concept of big data focused on the applications themselves: the virtual images that had to be transferred from the datacentre to the cloud-computing provider. This issue was a huge roadblock then and, in many cases, still is. The most commonly adopted solution is the sneakernet: using couriers such as FedEx and UPS to physically transfer the huge number of bytes involved.

IT departments are increasingly deploying WAN optimisation services in a virtual form, which fit better into a cloud-computing model. This approach is helping address the problems arising from the size of such images. But these same solutions do not necessarily solve the other big-data problem of database consistency and moving data out of the cloud-computing environment.

After all, if it took a FedEx truck to get the data into the cloud, it's almost certainly going to take a FedEx truck to get it out again. And it's going to cost you. It's worth noting that Amazon charges per hour to import and export data, which gives you some indication of the amount of time it takes to handle big data even in the cloud.
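A back-of-the-envelope calculation shows why the truck keeps winning. The sketch below estimates how long a dataset would take to cross a network link and compares that with a courier's door-to-door time. The function names and every figure in it (10TB of data, a 100Mbps link, 80 percent usable bandwidth, a 48-hour courier run) are illustrative assumptions, not numbers from Amazon or any other provider.

```python
def transfer_hours(size_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to move size_tb terabytes over a network link.

    size_tb    -- dataset size in decimal terabytes (1 TB = 10**12 bytes)
    link_mbps  -- nominal link speed in megabits per second
    efficiency -- assumed fraction of the nominal speed actually achieved
    """
    if size_tb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link speed > 0, efficiency in (0, 1]")
    total_bits = size_tb * 1e12 * 8
    effective_bits_per_second = link_mbps * 1e6 * efficiency
    return total_bits / effective_bits_per_second / 3600


def faster_by_courier(size_tb: float, link_mbps: float,
                      courier_hours: float = 48.0,
                      efficiency: float = 0.8) -> bool:
    """Return True when the assumed courier time beats the network estimate."""
    return transfer_hours(size_tb, link_mbps, efficiency) > courier_hours


if __name__ == "__main__":
    hours = transfer_hours(size_tb=10, link_mbps=100)
    print(f"Network transfer: {hours:.0f} hours ({hours / 24:.1f} days)")
    print("Send the truck" if faster_by_courier(10, 100) else "Use the wire")
```

Under those assumptions, 10TB takes roughly 278 hours, or about eleven and a half days, to come back over the wire. The estimate ignores the time needed to export the data from the provider's storage in the first place, so the real wait is likely to be longer still.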

Big data need not imply big database

Cloud computing and virtualisation have so far had little impact on application architectures because it has not been necessary to make changes to adapt to a cloud-computing model. But the world of data may not escape so lightly.

Cloud computing and the promise of faster, cheaper compute are bringing to the surface the realisation that perhaps an enormous, monolithic database model is not really appropriate for the more modern, highly distributed world of mobile applications. Smarter application architectures, particularly in the data tier, will eventually be necessary to combat data-transfer fatigue experienced by networks and people.

As we've learned from big data in the enterprise, backup and transfer-window times increase, sometimes exponentially, as the volume of data requiring transfer continues to grow. The same is true of data in traditional database systems. And the reality is that cloud is not the primary deployment site for most applications today. It is secondary or ancillary to critical business applications and eventually, often sooner rather than later, the data in the cloud must escape.

We need to plan better and that may mean significantly altering our application architectures as they relate to the data tier. We may need to explore distributed databases or alternative data-storage sources such as Hadoop, NoSQL, or even cloud-based data-storage services.

But to do so we'll really need to think about our applications. We need to consider where and how data flows between the place it is deployed and the place where we ultimately want, and need, that data to be at rest.

It's true that our intrepid heroes — even without a solid plan of rendezvous and escape — saved the princess and defeated the evil Prince Humperdinck. Though we may be inspired by such an improvised approach, it's important to recognise that it's fantasy and not reality.

The reality is that we need a plan, not only for how to get data into the cloud, but for how to get it back out again. And we need to worry not only about getting it back out, but about getting it back out when we need it, not when the FedEx truck makes its next delivery. I hope we win.

Lori MacVittie is responsible for application services education and evangelism at application delivery firm F5 Networks. Her role includes producing technical materials and participating in community-based forums and industry standards organisations. MacVittie has extensive programming experience as an application architect, as well as in network and systems development and administration.

