Yesterday, salesforce.com posted a more complete explanation of, and a mea culpa for, its Dec. 20 outage. The gist of the explanation is that there is no explanation, at least not yet. The database clustering error "just appeared" and wasn't a previously known problem. The salesforce.com statement said:
We have deep redundancy, but with all complex software-based systems, there will occasionally be issues. It's our job to make those issues as rare as possible, and that is a never-ending mission for us....an extremely rare, undocumented software issue is not something that even the most robust systems can prevent 100% of the time. No system has 100% performance, and no software is bug free.
It's a bit ironic that the company with the slogan "No Software" is compelled to intone the obvious, that "no software is bug free." The company said it has put in place new procedures for restoring service more quickly. Customers who had unrealistic expectations for software services will certainly have more realistic ones going forward. However, the recent black eyes for hosted services (including the Six Apart/TypePad outage) won't slow down the rapid growth of hosted business applications.
NetSuite, another on-demand software service, took the occasion of salesforce.com's outage to offer a money-back guarantee of 99.5 percent uptime, which still allows nearly 44 hours of downtime per year. It doesn't cover the entire weekend, and 99.5 percent coverage is not enterprise-class service, but then you're not paying for five-nines service either. I'd expect salesforce.com to follow suit with some kind of service-level guarantee as part of its service tiers. Phil Wainewright has his take on service levels and where salesforce.com broke some cardinal rules of on-demand providers.
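
For anyone who wants to check the uptime arithmetic, here is a minimal sketch that converts an uptime percentage into the downtime it permits over a year. The function name and the 365-day year are my own assumptions for illustration; they say nothing about how NetSuite or anyone else actually measures its guarantee.

```python
# Rough sketch: how much downtime does a given uptime percentage allow?
# Assumes a 365-day year (8,760 hours) and that every hour counts equally,
# which may not match the terms of any real service-level guarantee.

HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def allowed_downtime_hours(uptime_percent: float) -> float:
    """Return the hours of downtime per year permitted at this uptime level."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


if __name__ == "__main__":
    for label, pct in [("99.5 percent", 99.5), ("five nines", 99.999)]:
        hours = allowed_downtime_hours(pct)
        print(f"{label}: {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```

At 99.5 percent that works out to about 43.8 hours a year, while five nines leaves a little over five minutes, which is the gap between the two kinds of service.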