A promise of 99.9% uptime sounds impressive until you do the math. With a total of 8766* hours in a year, that 0.1% of downtime still adds up to eight and three-quarter hours. So salesforce.com could have a further three hours' downtime on top of the five and three-quarter hours that some customers suffered on Tuesday before it got above 0.1% downtime for the year — which is still quite a decent performance.
That calculation is worth bearing in mind before anyone is tempted to either jump ship from salesforce.com to another provider or even completely dismiss the whole notion of on-demand. TalkBack poster jmjames, for example, writes "why in the world would anyone outsource not just a critical business process such as CRM, but critical, confidential data?" I wonder how many readers work for IT shops that manage to consistently deliver 99.9% uptime to all of their users 24/7/365, let alone commit to it? Does jmjames? If not, there is the answer to his question.
Of course there are certain applications (and IT shops) where 8 hours of annual downtime is still too much, in which case it's worth making the extra investment. But the very high cost of that extra reliability has to be weighed up against the commercial benefit. Having your customer call center out of action for an hour or two would be a disaster for many businesses. Having your salespeople briefly unable to make sales calls or review their performance is something most businesses can live with — those individuals have other work they can get on with in the meantime.
I suspect the vast majority of salesforce.com users would be perfectly content with just 8 hours of annual downtime — although naturally they'd prefer if it didn't come all on the same day. In fact I'd be prepared to bet that most would turn you down if you offered them a premium service to reduce the figure to two-and-a-half hours (99.97%) or five minutes (99.999%). What matters is not the downtime itself — provided the vendor has made every effort to maintain an appropriate service level — but the perception of the downtime. This is where I believe salesforce.com has slipped up badly in its handling of Tuesday's outage and indeed in its overall approach to service levels.
In my view, salesforce.com has broken two cardinal rules that I believe on-demand providers must adhere to:
- Take steps to keep users informed. Salesforce.com doesn't even seem to have a process in place to notify users when a glitch occurs. Best practice would be something like what Jonathan Tang of competitor Salesnet outlined to a journalist on Wednesday:
"... while unscheduled down-time is unavoidable, companies should alert customers immediately when there's an outage and keep in touch with status reports. Salesnet has four tiers of customers; those at the top can expect hourly calls from account executives and engineers during a 'code red,' while the lower tiers can expect e-mails to their administrators."
- Be upfront about service levels. Providers should spell out to customers the service levels they'll commit to, and the circumstances in which they'll pay penalties, if any. Amazingly, salesforce.com's generic Master Subscription Agreement makes no undertakings whatsoever, beyond this vaguest of assertions:
"Salesforce.com represents and warrants that it will provide the Service in a manner consistent with general industry standards reasonably applicable to the provision thereof and that the Service will perform substantially in accordance with the online salesforce.com help documentation under normal use and circumstances."
Customers have a responsibility too: never to take for granted anything that's not a contractual commitment. Any salesforce.com users who couldn't afford to be offline for most of Tuesday really only have themselves to blame for not reading the small print.
* An earlier version of this posting quoted the erroneous figure of 8736 hours in a year, which was based on calculating 52x7x24 (ie 364 days) rather than 365x24. Add on another 6 hours to average in the effect of leap years and you reach the correct figure of 8766. Thanks to the first two TalkBack posters for spotting this.
Here's a quick reminder of how much downtime customers are exposed to at various service levels hosting providers often boast about:
- 99.5% — 43.83 hours (an entire working week, and more)
- 99.7% — 26.30 hours (more than three working days)
- 99.9% — 8.77 hours (more than one working day)
- 99.95% — 4.38 hours (half a working day)
- 99.97% — 2.63 hours (an extended lunch break)
- 99.99% — 0.88 hours (about 50 minutes)
- 99.995% — 0.44 hours (a half hour)
- 99.999% — 0.09 hours (five minutes)
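
For anyone who wants to check these figures or work out the allowance for a different service level, here is a minimal Python sketch of the arithmetic. It assumes the 8766-hour average year used above (365 x 24, plus 6 hours for leap years). The function names `annual_downtime_hours` and `remaining_allowance_hours` are my own, made up for illustration, not part of any vendor's tooling.

```python
# Average hours in a year: 365 days x 24 hours, plus 6 hours to
# average in the effect of leap years (the figure used in this post).
HOURS_PER_YEAR = 365 * 24 + 6  # 8766


def annual_downtime_hours(uptime_percent: float) -> float:
    """Hours of downtime per year permitted by a given uptime percentage."""
    return HOURS_PER_YEAR * (100.0 - uptime_percent) / 100.0


def remaining_allowance_hours(uptime_percent: float, hours_already_down: float) -> float:
    """Hours of downtime still available this year before the uptime level is breached."""
    return annual_downtime_hours(uptime_percent) - hours_already_down


if __name__ == "__main__":
    # The service levels listed above.
    for level in (99.5, 99.7, 99.9, 99.95, 99.97, 99.99, 99.995, 99.999):
        hours = annual_downtime_hours(level)
        print(f"{level}% uptime allows {hours:.2f} hours "
              f"({hours * 60:.0f} minutes) of downtime a year")
```

Calling `remaining_allowance_hours(99.9, 5.75)` gives roughly three hours, which is the headroom salesforce.com would have left after Tuesday's outage if it were working to a 99.9% commitment.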