Data gravity: The reason for a cloud's success

Summary: As clouds get larger, they gain a greater hold over their customers because of "data gravity", according to researcher Dave McCrory.

As clouds get larger, they gain a greater hold over their customers because of "data gravity", according to researcher Dave McCrory.

In a packed session at Interop Las Vegas on Monday, McCrory, a senior architect of VMware's Cloud Foundry service, gave a talk on his pet subject of "data gravity". He showed how, as a cloud provider such as Amazon Web Services brings in more information, it creates a virtuous circle that attracts more and more data to the same cloud.

"The more data you have in your network, the more likelihood you'll have data that will want to consume it," McCory said. "If more of the data lives in [Amazon's] network, people will be attracted to it."

This is not a case of lock-in, he said, but more a natural consequence of how data behaves: as applications feed on datasets, they create data in turn, which gets analysed by other applications, which create their own data, and so on.

From a developer's point of view, it becomes sensible to locate these applications and data within the same network to achieve high bandwidth and low latency.

This then favours whatever cloud system the original data was stored in, as developers can reap a whole host of benefits by staying within the bounds of the provider's network.

"The closer and closer you get, the more addicted you are to sub-millisecond latencies, the harder it's becoming to move your data away," he said. "You generally only move closer."

To illustrate how cloud providers can benefit from this, he pointed to the explosive growth of Amazon's S3 storage service, which went from storing 2.9 billion objects in 2006 to 762 billion in 2011. Data has grown on the service because of 'data gravity', he said, and the growth has been spurred by Amazon opening its APIs to developers, making it easier for them to write to the service.
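
As a flavour of how simple that write path is, here is a minimal sketch using boto3, the current Python AWS SDK (not necessarily the tooling of the time; the bucket name is hypothetical):

    import boto3

    # Create an S3 client using credentials from the local AWS configuration.
    s3 = boto3.client("s3")

    # Writing an object is a single API call; this ease of access is part
    # of what draws ever more data into the same cloud.
    s3.put_object(
        Bucket="example-bucket",    # hypothetical bucket name
        Key="datasets/sample.csv",
        Body=b"id,value\n1,42\n",
    )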

However, the cost of moving data in and out of Amazon shows how data gravity can lead to difficulties: it costs nothing to load a terabyte of data into AWS and $10 (£6) to process it within Amazon's cloud, but $120 to take the data out.
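
As a rough illustration of that asymmetry, the sketch below applies the per-terabyte figures quoted above (2012-era prices, used purely for illustration) to an arbitrary dataset size:

    # Per-terabyte prices quoted in the article (2012-era, illustrative only).
    PRICE_IN_PER_TB = 0.0        # loading data into AWS was free
    PRICE_PROCESS_PER_TB = 10.0  # processing within Amazon's cloud
    PRICE_OUT_PER_TB = 120.0     # taking the data out again

    def transfer_costs(terabytes: float) -> dict:
        """Cost of moving a dataset in, processing it, and moving it out."""
        return {
            "in": terabytes * PRICE_IN_PER_TB,
            "process": terabytes * PRICE_PROCESS_PER_TB,
            "out": terabytes * PRICE_OUT_PER_TB,
        }

    # A 50TB dataset: free to bring in, $500 to process, $6,000 to extract.
    print(transfer_costs(50))

The exit cost dwarfs everything else, which is the gravitational pull in pounds and dollars.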

"It's incredibly difficult to beat the gravitational pull," he said.

About Jack Clark

Currently a reporter for ZDNet UK, I previously worked as a technology researcher and reporter for a London-based news agency.

Talkback

4 comments
  • It sounds like an attractive theory although (ahem) I'd like to see the data supporting it. I'm not sure I see how it necessarily translates across corporation boundaries, though: just because I have my stuff on Amazon servers, it doesn't affect what services other organisations use...
    Manek Dubash
  • Hi Manek, I had a chat with McCrory and he said he's putting together some data and a few algorithms to tell you things like the "mass" of a dataset. It sounds like a useful way of looking at data in terms of stickiness/friction, but I agree with you that the idea isn't fully fleshed out yet. Not sure what you mean by the last bit, though if I'm parsing it correctly, I think the data one organisation puts into a cloud directly affects other organisations, as more data = more scale for Amazon = greater chance of a price cut and an ensuing increase in 'data gravity' within its cloud.
    Jack Clark
  • Jack: you parse my comment correctly so thanks for clarifying - it's a good point you make about scale and gravity, which makes a great deal of sense.
    Manek Dubash
  • This sounds like a restatement of Metcalfe's Law for a network of data (originally, the value of a telecommunications network is proportional to the square of the number of connected users, but it's held to be true of pretty nearly anything networked). I agree with the independent restatements that it's closer to n log n than n squared, so value grows much more slowly than quadratically, especially since assigning equal value to all connections or all groups is a bit unrealistic and Zipf's law applies to everything with a tail (see the sketch after these comments). But this is just the flip side of lock-in; good luck getting your business model out if the costs of a service shift to outweigh the advantages of having your data connected and tractable by the same tools. In fact, a good exploration tool like Tableau or the new Microsoft Research climate data explorer means that you don't have to move your data to the same place to explore it together. Assuming bandwidth isn't an issue, distributed data is a better fit for our heterogeneous world ;-)
    Simon Bisson and Mary Branscombe
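
To make the n log n versus n squared comparison in that last comment concrete, here is a small illustrative sketch (ours, not the commenter's) of how quickly the two curves diverge:

    import math

    # Compare the two value-growth models from the comment above:
    # Metcalfe's n^2 against the more conservative n*log(n) estimate.
    for n in (10, 1_000, 1_000_000):
        nlogn = n * math.log2(n)
        print(f"n={n:>9,}  n*log2(n)={nlogn:>14,.0f}  n^2={n**2:>18,}")

At a million nodes, n squared exceeds n log n by a factor of roughly 50,000, which is why the choice of model matters so much when estimating the pull a networked dataset exerts.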