Cloud secrecy: Will it cause a system meltdown?

The tendency of cloud providers to keep the internal details of their infrastructure secret could wreak havoc in the cloud, a researcher has warned.
Written by Jack Clark, Contributor

Microsoft's general manager for Windows Azure, Bill Hilf, thinks it is okay for cloud providers to hang onto the secrets of their internal infrastructure, and says if customers are designing applications that live or die on the basis of very specific requirements, they should not be going to the cloud in the first place.

"You've picked a hammer instead of a screwdriver for a screw," Hilf says. He stresses that less than one percent of the Windows Azure customers he talks to have requirements that go this far, and they tend to be from the government.

Hilf was reacting to a paper published by a Yale academic that argues the secrecy with which cloud providers treat their infrastructure could lead to wide-ranging problems.

In his paper "Icebergs in the Clouds: the Other Risks of Cloud Computing", academic Bryan Ford argues the lack of disclosure about the inner workings of clouds could put service providers on a collision course with one another.

"As diverse, independently developed cloud services share ever more fluidly and aggressively multiplexed hardware resource pools, unpredictable interactions between load balancing and other reactive mechanisms could lead to dynamic instabilities or meltdowns," he writes.

The problems he identifies come in two classes - programming issues and interdependency problems - and look set to become more prevalent over time as cloud providers and services interlace with one another. He presented his paper on Tuesday at the HotCloud '12 conference in Boston.

One programming issue he identifies arises when an application provider's load balancer eventually syncs its update cycles with the hardware power optimiser operated by a separate provider. This leads to a death spiral: as power is cut to one server, the load balancer moves workloads to another, and all incoming traffic ends up oscillating between one server and the other, "cutting the system's overall capacity in half - or worse if more than two servers are involved," he writes.
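
To make the scenario concrete, here is a toy simulation of two reactive controllers that run on the same cycle and act on the same stale load snapshot. It is a minimal sketch and not Ford's own model: the function name `simulate`, the capacity and demand figures, and the `throttled` factor are all assumptions made up for illustration.

```python
def simulate(n_servers=2, ticks=6, demand=200.0, capacity=100.0, throttled=0.5):
    """Toy herd-effect model (hypothetical, not taken from Ford's paper).

    Returns a list of (target_server, units_served) tuples, one per tick.
    """
    # Start with all traffic on server 0.
    load = [demand] + [0.0] * (n_servers - 1)
    history = []
    for _ in range(ticks):
        # Both controllers read the same snapshot of last tick's load.
        snapshot = list(load)
        # Assumed power optimiser: slow down any server that looked idle.
        speed = [1.0 if seen > 0 else throttled for seen in snapshot]
        # Assumed load balancer: send everything to the server that
        # looked least loaded (the first one wins on a tie).
        target = min(range(n_servers), key=lambda i: snapshot[i])
        # The chosen server can serve only what its current speed allows.
        served = min(demand, capacity * speed[target])
        # All traffic now sits on the chosen server.
        load = [0.0] * n_servers
        load[target] = demand
        history.append((target, served))
    return history
```

With the default values, traffic flips between server 1 and server 0 on every tick and each tick serves 50.0 units, because the balancer always picks the server the optimiser has just throttled. Setting `throttled=1.0` removes the power optimiser from the picture and leaves 100.0 units served per tick, which is half of the 200.0 units the two servers could handle together.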

This problem would not arise if cloud providers disclosed the internal technologies they use to scale power, distribute loads and perform other detailed infrastructure management techniques, he argues, as developers would be able to see problems before they arose.

Ford believes the cloud business model encourages "providers not to share with each other the details of their resource allocation and optimisation algorithms - crucial parts of their 'secret sauce' - that would be necessary to analyse or ensure the stability of the larger, composite system."

Like Microsoft's Hilf, Sam Johnston, Equinix's director of cloud and technical services, believes that the paper could be exaggerating the scale of the problem.

"Many of these [problems] aren't new or specific to cloud," Johnston says. "You need open formats and open interfaces."

For example, if OpenStack, the open-source cloud operating system, gained more traction, it could alleviate this problem. However, it is an immature technology and even companies that have put it into production public clouds, such as HP, have seasoned it with additional, proprietary technologies.

The second problem Ford identifies is easier to grasp - interdependency. By building services atop other clouds, which themselves sit on top of other ones, Ford believes that services can expose themselves to greater risks of failures.

He gives the example of a high-availability application that provisions storage from two separate cloud providers. If both of these providers provision their network or compute resources from the same mega-provider and the mega-provider goes down, the application will fail.
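
If providers did publish what they depend on, a hidden common dependency of this kind could be found mechanically. The sketch below assumes such a dependency map exists; the provider names and the helper functions `transitive_deps` and `shared_dependencies` are hypothetical and are not drawn from Ford's paper.

```python
def transitive_deps(graph, node):
    """Return everything `node` depends on, directly or indirectly."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for dep in graph.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def shared_dependencies(graph, replicas):
    """Return providers that every supposedly independent replica relies on."""
    if not replicas:
        return set()
    # Include each replica itself so that listing the same provider
    # twice is also reported as a shared point of failure.
    closures = [transitive_deps(graph, r) | {r} for r in replicas]
    return set.intersection(*closures)


# Hypothetical dependency map: each key relies on the providers listed.
graph = {
    "storage_a": ["mega_provider"],
    "storage_b": ["mega_provider"],
    "storage_c": ["other_provider"],
}
```

Here `shared_dependencies(graph, ["storage_a", "storage_b"])` returns `{"mega_provider"}`, flagging the single point of failure, while pairing `storage_a` with `storage_c` returns an empty set. The check is only as good as the map it is given, which is the disclosure problem Ford describes.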

This type of dependency problem reared its head last year when Amazon Web Services failed, taking down the Heroku platform-as-a-service and a variety of applications that sat on top of Heroku.

In my opinion, this type of problem is here and only going to get worse. When you consider the de facto monopoly that Amazon enjoys over the developer-centric cloud, it's hard not to foresee further problems of this nature. Dropbox, for instance, is used in many enterprises but its data is ultimately hosted in the AWS S3 storage cloud. If Amazon goes down in the future and takes Dropbox with it, you can guarantee it will be big news.

Ultimately Ford's paper concludes that the secrecy of cloud providers could create problems.

It's worth pointing out that by keeping this information private, the providers could be saving themselves from attackers. "The information is also able to potentially be used and abused by attackers," Johnston says. "I don't want to advocate security via obscurity but I think it can be useful to not share that information."

Ford counters Johnston's point by arguing that if cloud companies shared information about the specific workings of their infrastructures they could foresee problems and get rid of them before they arose. Because many companies, such as Amazon or Microsoft, see their internal software as a key competitive advantage, Ford suggests establishing a third-party organisation that could perform independent testing on various clouds' inner workings, without the companies having to give all the information to the public.

"While the cloud computing model is promising and attractive in many ways, the author hopes that this paper has made the case that the model may bring risks beyond obvious information security concerns," he writes. "At the very least, it would be prudent for us to study some of these risks before our socioeconomic system becomes completely and irreversibly dependent on a computing model whose foundations may still be incompletely understood."

I think Ford makes some good points, though some of the problems he identifies only seem to arise if you make strange architectural decisions. However, cloud companies are not immune to this: Heroku had an outage last week that came about because one part of its infrastructure created a data record that its routing infrastructure could not parse - a prime example of what happens when not enough information is known about two independent systems, and exactly the type of problem Ford identifies.
