When any business's IT goes down, users will ask questions of those who provide their IT services - but in a cloudy world, a lax response to those questions can add up to major frustration.
It's a situation thrown sharply into relief by this week's outage of cloud DNS service Zerigo, which fell over after being targeted by a malicious DDoS. Zerigo sells its services directly to customers from its site, and is also available as a free and paid-for add-on for the popular Heroku platform-as-a-service.
The moment Zerigo went down on Tuesday, affected customer websites became inaccessible, internal messaging services stopped working, and mobile applications failed.
Affected customers turned to Zerigo's Twitter account for information, but after initially giving some details on the outage, the company went quiet on both Twitter and its status page for several hours - and in a cloud-enabled enterprise, that information blackout can be disastrous.
Dave Zille, a web entrepreneur, says 65 of his websites were taken offline by the seven-hour outage. Event.ly, a technology-oriented events and listings start-up, was also hit, while a quick browse of Twitter reveals dozens of other businesses affected by the outage. Many of them migrated to Amazon Web Services' Route 53 DNS service during the downtime, while others waited for Zerigo to come back up.
The situation could, and should, have been handled better.
Where Zerigo went wrong:
1. Poor communication. Both Event.ly and Zille called out the four- to five-hour communication gap as a major failing.
2. Opaque technical language. The service's status page mixed technical terms in with standard English, making it difficult for both technical and non-technical users to get a clear picture of the outage.
Key lessons for cloud providers:
1. If a house is burning, let people know exactly how far away the fire brigade is.
Even if a fix is hours away, consistently updating people as to the state of the problem will reassure not only them, but their clients. The worst thing companies can do is appear to fall out of contact.
"We all understand that there are tech issues like this from time to time, but there was no excuse for such a large gap in communication," Zille said. "In these incidents, communication and transparency is key."
This message was echoed by James Conroy-Finn, a head of architecture at Event.ly, who told me that Zerigo should have kept customers informed of exactly what was going on and who was affected.
2. Be clear about compensation.
When I asked Zerigo what it was going to do in terms of compensation for affected customers, the company said:
"As stated in the Zerigo SLA, 'the guarantees provided in this SLA exclude acts of nature, acts of God, criminal acts, other extenuating circumstances outside our control, equipment outside our control, scheduled maintenance with at least 24 hours notice, terms of service violations, and any period of time when advance payment has not been received for services'."
This is neither a clear nor a helpful response. What people need in the case of an outage is clarity, and clarity is what cloud providers often fail to provide.