Will you get locked into your cloud? Ask the data gravity theory

A researcher has developed a way of looking at clouds that assesses their individual "gravity" and the effect it can have on users' data

Cloud computing was meant to do away with vendor lock-in, but the work of an independent researcher suggests it could be having the opposite effect.

Researcher and cloud architect Dave McCrory has spent the past two years working on a theory of "data gravity", intended to allow IT buyers to assess cloud products' potential for vendor lock-in and so make decisions to keep their data as accessible as possible.

According to the theory, clouds are not the adaptable systems their marketing portrays, but planets that are always hungry for more data - and loath to let it leave.

"The motivation for looking at things like this is to determine what you want to do with your data and where you want to put your data, so this allows you to look at it as instead of just storing bits and bytes, it could be the longer term effects of your decision to put your bits and bytes over there," McCrory says.

The theory of data gravity looks at applications, clouds and even collections of clouds and the amount of information they contain ("data mass"). If a cloud has more data gravity, a function of the mass of its application and its data mass divided by the square of limiters such as latency and bandwidth, then McCrory's theory suggests it will be more difficult to get data out of it.
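The analogy to Newtonian gravity can be sketched in a few lines of code. The function below is an illustration of the structure described above, not McCrory's official formula: the variable names, units and the choice to treat the latency/bandwidth ratio as the squared limiter are all assumptions made for the sake of the example.

```python
# Illustrative sketch of the data gravity idea, not an official formula.
# Masses are multiplied (as in Newtonian gravity) and the network
# limiters are squared, per the description in the text; treating
# latency/bandwidth as the limiter term is an assumption.

def data_gravity(app_mass, data_mass, latency_ms, bandwidth_gbps):
    """Higher score = data is harder to pull away from this cloud.

    app_mass       -- relative size/importance of the application
    data_mass      -- amount of data stored (e.g. in TB)
    latency_ms     -- network latency to the data (a limiter)
    bandwidth_gbps -- available bandwidth (more bandwidth, less friction)
    """
    limiter = (latency_ms / bandwidth_gbps) ** 2
    return (app_mass * data_mass) / limiter

# The same dataset exerts far more pull over a fast, low-latency
# link than it does across a slow WAN connection:
near = data_gravity(app_mass=10, data_mass=500, latency_ms=1, bandwidth_gbps=10)
far = data_gravity(app_mass=10, data_mass=500, latency_ms=50, bandwidth_gbps=1)
print(near > far)  # True: nearby data exerts the stronger pull
```

The squared limiter is what makes the effect so pronounced: halving latency does not double the pull, it quadruples it, which is why co-locating an application with its data is such a strong attractor.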

"I think [data gravity] explains a lot of behaviours. For example, why does Amazon allow you to transfer data in at no cost, and to transfer data out you have to pay? That's leveraging the data gravity effect," he adds.
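The pricing asymmetry McCrory points to can be made concrete with a back-of-the-envelope calculation. The per-gigabyte rates below are hypothetical placeholders, not Amazon's actual prices; the point is simply that exit cost scales with accumulated data mass while entry stays free.

```python
# Hypothetical per-GB transfer rates (placeholders, not real pricing).
INGRESS_PER_GB = 0.00   # transfer in: free
EGRESS_PER_GB = 0.09    # transfer out: paid

def exit_cost(data_gb):
    """Cost in dollars to migrate a dataset out of the cloud."""
    return data_gb * EGRESS_PER_GB

# As accumulated data mass grows, so does the cost of leaving:
for gb in (100, 10_000, 1_000_000):
    print(f"{gb:>9,} GB in: $0.00   out: ${exit_cost(gb):,.2f}")
```

Free ingress lowers the barrier to accumulating data mass; paid egress raises the barrier to shedding it. That is the data gravity effect expressed as a pricing policy.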

Cloud buyers should make data gravity a consideration when deciding how and where to deploy an application, according to McCrory. They will have to walk a tightrope: placing an application close to the source of gravity brings a host of network benefits, but keeping it far enough away stops it becoming stuck to a single cloud.

"If your application spans clouds or different data sources, the idea of being able to measure the different data gravity would determine where you want to move your app," he says. "Maybe you want to resist [data gravity] so you don't become more locked in, so maybe you would not want to move your app closer to the source, or maybe you want to get more performance so you want to leverage data gravity so you can get closer."

Data gravity can also be used to look at companies' future strategies, he believes.

"When companies make acquisitions, for example Facebook with Instagram, Instagram was its own asteroid or planet and ultimately I would bet we see Instagram collapse into Facebook and so it becomes a bigger logical body than previously," McCrory says.

"It also explains why companies might make acquisitions of technologies or other services to boost requests per second around their datasets... I think that explains why Google just announced their cloud platform which is more infrastructure based, they again want to exploit the fact that if you run your application there... you're closer to [your] data."

McCrory's theory is a work in progress, and he is asking interested parties to contribute ideas and, most importantly, data, so that the theory can be tested and evaluated by the community at his data gravity website.