Cloud outages: Why one status page is better than many

Summary: A clear picture of cloud outages matters more than ever now that more businesses use multiple clouds for sites and software. But most dashboards are basic, at best.

TOPICS: Cloud, Amazon, Outage

When a site like CloudApp goes down, does the reason lie with a fault in its application, or in Heroku's platform-as-a-service that it sits on, or in Amazon Web Services's infrastructure-as-a-service cloud, which Heroku in turn relies on?

These are the questions administrators need to ask themselves when something goes wrong with a modern web application — but finding the answer can be tricky. 
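One way to picture that question is as a walk down the stack. The sketch below is only an illustration: the chain mirrors the CloudApp, Heroku and Amazon Web Services example above, while the `DEPENDS_ON` table, the `probable_root_cause` function and the health values are all hypothetical, and real outages are rarely this tidy. It assumes that when several layers look unhealthy at once, the deepest one is the most likely culprit.

```python
# Hypothetical stack taken from the article's example: each layer names the
# layer directly beneath it, or None if it sits at the bottom.
DEPENDS_ON = {
    "CloudApp": "Heroku",
    "Heroku": "AWS",
    "AWS": None,
}


def probable_root_cause(site, healthy):
    """Return the deepest unhealthy layer beneath `site`, or None if all are healthy.

    `healthy` maps each layer name to True (up) or False (down). The values
    are assumed to come from some monitoring source; none is specified here.
    """
    culprit = None
    layer = site
    while layer is not None:
        if not healthy[layer]:
            culprit = layer  # keep overwriting so the deepest failure wins
        layer = DEPENDS_ON[layer]
    return culprit


# Everything looks down, so suspicion falls on the bottom of the stack.
print(probable_root_cause("CloudApp", {"CloudApp": False, "Heroku": False, "AWS": False}))

# Only the application is down, so the fault is probably its own.
print(probable_root_cause("CloudApp", {"CloudApp": False, "Heroku": True, "AWS": True}))
```

The hard part in practice is getting trustworthy health values for layers you do not run yourself, which is where the tools discussed below come in.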

The composite nature of modern websites means they can be damaged by flaws with their own technologies, as well as by problems in the cloud services they use. With the rise in the use of third-party technologies for anything from ads, to databases, to login and payment areas, large websites frequently access a multitude of services — all of which can, and do, fail. 

How, then, can you efficiently diagnose a problem? 

Many companies have tried to build tools to let administrators see through the fog of cloud disruption. On Wednesday, Compuware released its own attempt. 

The Outage Analyser site lets administrators track outages as they happen. Image: Compuware

Outage Analyser is a free website that compiles data gathered by 150,000 Compuware application performance management (APM) software agents used by its customers across the world. This data is amalgamated to give administrators the information they need to determine the root cause of a problem.

The Compuware tool shows the probable cause of the outage, the regions affected and a list of potentially hit websites and other dependent services. It also has an option to display a timeline to show how the outage evolved. 

With Outage Analyser, admins can view outages as they happen and track their spread across the globe. Unlike other comparison tools, it can also make a stab at telling them which sites are dependent on services that have gone down. 
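Compuware has not said how it works out which sites depend on a failed service. As a rough illustration of the idea only, the sketch below assumes a known map of direct dependencies and follows it backwards from the failed service. The `DEPENDENCIES` table, the `affected_by` function, and the names ExampleShop, ExampleBlog, PaymentsCo and AdsCo are all made up for the example.

```python
from collections import deque

# Hypothetical map of each site or platform to the services it uses directly.
DEPENDENCIES = {
    "CloudApp": ["Heroku"],
    "Heroku": ["AWS"],
    "ExampleShop": ["AWS", "PaymentsCo"],
    "ExampleBlog": ["AdsCo"],
}


def affected_by(failed_service, dependencies):
    """Return every site that depends, directly or indirectly, on `failed_service`."""
    # Invert the map so each service points at the things that rely on it.
    dependents = {}
    for site, services in dependencies.items():
        for service in services:
            dependents.setdefault(service, set()).add(site)

    # Breadth-first walk outwards from the failed service.
    affected = set()
    queue = deque([failed_service])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


print(sorted(affected_by("AWS", DEPENDENCIES)))
# ['CloudApp', 'ExampleShop', 'Heroku']
```

CloudApp shows up even though it never talks to AWS directly, which is the kind of indirect exposure a per-provider status page cannot show.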

"It is a fairly sophisticated approach that is required to do something like this. The first ingredient is to have the insight and visibility across the internet," Steve Tack, a product manager for Compuware's APM products, tells me, noting that Compuware is using its APM technology to take around eight billion measurements a day.

Though all the major clouds — Amazon, Microsoft, Google — and some of the minor ones have comprehensive status pages, there are few services that pull together information from multiple providers. 

"There's a benefit of having a neutral provider deliver this information," Tack says. "All the cloud providers themselves will present back information. What makes this unique is it's testing from the user experience... you have all these third-party services coming together at the browser."

"If I'm delivering a web property, I don't want to go to each of my providers [status pages]. I care about the whole experience," he adds.

Other companies have attempted to produce a tool like Compuware's. Cedexis, for example, has a range of services that let businesses monitor the performance of clouds, though all of them cost money and none displays dependencies.

In 2010, Compuware released CloudSleuth, which showed the latencies and availability of various clouds, but it did not list dependencies or have timeline features. 

Urgent need

In my view, Outage Analyser is a handy tool, though the interface is a bit clunky. There is an urgent need for better information about clouds and cloud interdependencies, and this site is a step in the right direction.

However, until we have more tools like Outage Analyser, it will be difficult to assess the true scale of an outage. This is because the data on Compuware's site is sampled from Compuware APM customers, so a service will not appear if none of those businesses happens to be using it.

Unfortunately, this creates a complicated situation. One service evaluating multiple clouds needs to be compared with another to get a full picture, and then their results have to be normalised. The layers of complexity just increase. I hope that other companies produce similar tools, so administrators can have an easier time quickly diagnosing cloud failures in the future.

Jack Clark

About Jack Clark

Currently a reporter for ZDNet UK, I previously worked as a technology researcher and reporter for a London-based news agency.

Talkback

3 comments
  • Cloud Security

    So how do you get your data from the cloud when all 65 failovers went down? This is a complicated situation and I don't think that Amazon's failover is the Apple Cloud. Apple's failover will not be Ubuntu ONE. I think this would be up to the underlying company who implements any cloud architecture.

    I want to quote Jack Clark here "With the rise in the use of third-party technologies for anything from ads, to databases, to login and payment areas, large websites frequently access a multitude of services — all of which can, and do, fail."

    This is the single most important comment in his editorial. We should take pride in how we interact and place our fingerprint in the cloud. The Sony Playstation Network outage is the best example here. You have care-free users who completely trust the vendor to secure their clouds without any regard for security. We take our chances when we apply and sign up for cloud accounts. We can't be responsible for something we are not in control of and in contrast, we must learn what we need to be in control of.
    smitheo1
  • Quite right, Mr. Smith,

    as I always say, "Live by the cloud, die by the cloud."

    Have a nice day,

    Doc
    Doc.Savage
  • Real Time AWS EC2 Cloud Availability Graphs

    For a helpful tool to figure out whether the issue lies with your systems or with the cloud provider, check out http://www.systemswatch.com. It's a lot quicker than waiting for Amazon's Health Status page.
    systemswatch