Yes, of course Salesforce's outages are sullying the reputation of the SaaS model, as David Berlind surmises. But it's not (or shouldn't be) the outages themselves that are doing the sullying. The true culprit is Salesforce.com's outage obfuscation, which Dan Farber reported last night — and on which I have more to add.
After Salesforce.com had its first major outage in December (and prompted by some robust comments from Talkback contributor jmjames), I spelt out a five-point code of practice for service providers, prefaced with this warning:
"If providers get this wrong — if they allow themselves to take the kind of short cuts with customer trust that jmjames describes — then they will consign their entire industry to the status of also-rans."
As Salesforce.com's own CEO Marc Benioff has said in response to that outage and another subsequent one, no computer system is 100% perfect. On-demand providers should always strive in that direction — after all, telecoms providers at one time managed to establish a reputation for service availability so good that the term 'dialtone' became synonymous with constant availability. But software — especially in a distributed environment — just isn't in the same league. Coming back for a second to telecoms and its alleged 'dialtone' track record, you only have to try to establish a few conference calls with multiple parties to verify the truth of that.
The difficulty for every SaaS provider is that each of its outages is publicly visible. The effect on the industry's reputation, as I've mentioned previously, is not dissimilar to the effect on the airline industry when an aircraft malfunctions. The incident becomes headline news, and every prospective passenger imagines themselves in a similar position. They ask themselves, 'How would I feel? How would I react? Would I survive?'
In answering those questions, the testimony of other survivors plays a major part in restoring trust and confidence. And this is where Salesforce.com is currently failing lamentably. As Alorie Gilbert observed so tellingly in yesterday's news story, many of those users are remarkably web-savvy. There was enough testimony out there yesterday to know that a significant number of Salesforce.com users had experienced the following calamities:
- an outage of the API lasting almost 6 hours
- an outage of search functionality lasting 4 hours
- a brief period of intermittent availability of the site/application itself
But when I requested confirmation and further information yesterday about each of the above, the last point was the only one the company was prepared to own up to — and nothing further had emerged at the time of writing.
To remain silent on an extended outage of the API is particularly galling, considering how crucial this facility is to the application mashup and integration capabilities of AppExchange, which the company has been talking up since its launch just two weeks ago.
When I interviewed Marc Benioff on a range of subjects last week, I didn't specifically ask about outages because I believed the company had dealt with its issues on this point, and therefore it wasn't high up my list of questions. Later this month, Salesforce.com is due to move its systems into a new, much more resilient and mirrored data center setup. In theory the move should make these outages rarer.
But what I'm starting to suspect from the extended silences from Salesforce.com when outages occur is that the malaise is not simply down to the robustness of the current, soon-to-be-retired infrastructure. My hunch is that there's also a lack of effective real-time monitoring of what's going on, which means that the company really doesn't know exactly what problems its customers are encountering.
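To make concrete what I mean by real-time monitoring: at its simplest, it means continuously recording the results of health checks against each customer-facing service (the application, the search facility, the API) and being able to report outage windows and availability from that record. The sketch below is purely illustrative — the class and method names are my own invention, not a description of Salesforce.com's actual tooling — but it shows the minimum bookkeeping a provider would need before it could tell customers exactly what failed and for how long.

```python
from dataclasses import dataclass, field

@dataclass
class ServiceMonitor:
    """Records health-check results for one customer-facing service
    (e.g. 'api', 'search') and reports what customers experienced.
    Illustrative sketch only; not any vendor's real monitoring stack."""
    name: str
    # Each entry is (timestamp_in_seconds, check_succeeded)
    results: list = field(default_factory=list)

    def record(self, timestamp: float, ok: bool) -> None:
        """Log one health-check result, in timestamp order."""
        self.results.append((timestamp, ok))

    def outages(self) -> list:
        """Return (start, end) windows of consecutive failed checks,
        closing each window at the first subsequent success."""
        windows, start = [], None
        for ts, ok in self.results:
            if not ok and start is None:
                start = ts          # outage begins
            elif ok and start is not None:
                windows.append((start, ts))  # outage ends
                start = None
        if start is not None:       # still down at last check
            windows.append((start, self.results[-1][0]))
        return windows

    def availability(self) -> float:
        """Fraction of health checks that succeeded."""
        if not self.results:
            return 1.0
        return sum(ok for _, ok in self.results) / len(self.results)
```

With a record like this per service, a provider can answer the two questions its customers are asking during an incident — which services are down, and since when — rather than finding out from its users' weblogs.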
In Salesforce.com's defence, it's easy to insist that such monitoring should be happening, but much trickier to actually implement it. In the latest excerpt that I've published from last week's interview with Benioff, I've discussed the shortcomings of conventional software for supporting the massively scalable, multitenant architecture that Salesforce.com and other large on-demand vendors maintain. These vendors may have moved on from the old application code, but unless they handcode their entire infrastructure, they still often have to rely on systems management software from vendors that are not really up to speed with the particular demands of a high-volume on-demand data center.
My fear is that Salesforce.com doesn't have the excuse that it's relying on third-party software that's not up to the job (which to be frank is not much of an excuse anyway). My fear is that the company has not geared up to fulfil its responsibilities to know exactly what service levels its customers are experiencing, and to keep them informed when things go wrong.
If I have this wrong, then I look forward to being corrected by a clear statement from Salesforce.com of the true situation.