Reputation, trust and Salesforce outages

Summary: It's not the outages themselves that are sullying the reputation of the SaaS model. The true culprit is Salesforce's obfuscation.

TOPICS: Tech Industry

Yes, of course Salesforce's outages are sullying the reputation of the SaaS model, as David Berlind surmises. But it's not (or shouldn't be) the outages themselves that are doing the sullying. The true culprit is Salesforce's outage obfuscation, which Dan Farber reported last night, and on which I have more to add.

After Salesforce had its first major outage in December (and prompted by some robust comments from Talkback contributor jmjames), I spelt out a five-point code of practice for service providers, prefaced with this warning:

"If providers get this wrong — if they allow themselves to take the kind of short cuts with customer trust that jmjames describes — then they will consign their entire industry to the status of also-rans."

As Salesforce's own CEO Marc Benioff has said in response to that outage and another subsequent one, no computer system is 100% perfect. On-demand providers should always strive in that direction; after all, telecoms providers at one time managed to establish a reputation for service availability that was so good the term 'dialtone' has become synonymous with constant availability. But software, especially in a distributed environment, just isn't in the same league. Coming back for a second to telecoms and its alleged 'dialtone' track record, you only have to try and establish a few conference calls with multiple parties to verify the truth of that.

The difficulty for every SaaS provider is that each of their outages is publicly visible. The effect on the industry's reputation, as I've mentioned previously, is not dissimilar to the effect on the airline industry when an aircraft malfunctions. The incident becomes headline news, and every prospective passenger imagines themselves in a similar position. They ask themselves, 'How would I feel? How would I react? Would I survive?'

In answering those questions, the testimony of other survivors plays a major part in restoring trust and confidence. And this is where Salesforce is currently failing lamentably. As Alorie Gilbert observed so tellingly in yesterday's news story, many of those users are remarkably web-savvy. There was enough testimony out there yesterday to know that a significant number of users had experienced the following calamities:

  • an outage of the API lasting almost 6 hours
  • an outage of search functionality lasting 4 hours
  • a brief period of intermittent availability of the site/application itself

But when I requested confirmation and further information yesterday about each of the above, the last point was the only one the company was prepared to own up to, and nothing further has emerged at the time of writing.

To remain silent on an extended outage of the API is particularly galling, considering how crucial this facility is to the application mashup and integration capabilities of AppExchange, which the company has been talking up since the launch of AppExchange just two weeks ago.

When I interviewed Marc Benioff on a range of subjects last week, I didn't specifically ask about outages because I believed the company had dealt with its issues on this point, and therefore it wasn't high up my list of questions. Later this month, Salesforce is due to move its systems into a new, much more resilient and mirrored data center setup. In theory the move should make these outages rarer.

But what I'm starting to suspect from the extended silences from Salesforce when outages occur is that the malaise is not simply down to the robustness of the current, soon-to-be-retired infrastructure. My hunch is that there's also a lack of effective real-time monitoring of what's going on, which means that the company really doesn't know exactly what problems its customers are encountering.

In Salesforce's defence, it has to be said that it's easy enough to say such monitoring should be happening, but much trickier to actually implement. In the latest excerpt that I've published from last week's interview with Benioff, I've discussed the shortcomings of conventional software for supporting the massively scalable, multitenant architecture that Salesforce and other large on-demand vendors maintain. These vendors may have moved on from the old application code, but unless they handcode their entire infrastructure, they still often have to rely on systems management software from vendors that are not really up to speed with the particular demands of a high-volume on-demand data center.

My fear is that Salesforce doesn't have the excuse that it's relying on third-party software that's not up to the job (which to be frank is not much of an excuse anyway). My fear is that the company has not geared up to fulfil its responsibilities to know exactly what service levels its customers are experiencing, and to keep them informed when things go wrong.

If I have this wrong, then I look forward to being corrected by a clear statement from Salesforce of the true situation.

Phil Wainewright

About Phil Wainewright

Since 1998, Phil Wainewright has been a thought leader in cloud computing as a blogger, analyst and consultant.

  • Not with current carriers...

    I wouldn't want to bet my business on using software services with the current carriers in charge of connectivity. Supposedly, Comcast has only 2 DNS servers for their entire network, and they have had extended outages. Verizon's internet connectivity for the entire west coast of Florida was disrupted by a single cable cut in Georgia. So much for the self-repairing and routing capability of the network. In addition most of the carriers have now decided that they're not making enough money on selling connectivity alone - they want a piece of your profits in addition... pay up or your 'bits' don't go through.
    Steven J. Ackerman
  • Monitoring is not an add-on service...

    If Salesforce did not build in the ability to monitor real-time data on problems, then it really has itself to blame.

    Our on-demand software currently supports tens of thousands of people (phone, fax and Web) and we have instant notification when something goes wrong. Not only that, we get alerts on minor issues that didn't stop a user dead, but caused their action to not stick.

    We track those minor issues and have a person research to see if the user is doing something we did not expect or if this minor issue is a precursor to a larger issue. There is no excuse to not do this today.

    In a nutshell: this type of monitoring and follow-up has to be built into the system, not an add-on that gets slapped in place when the system starts crashing.

    We may not have hundreds of thousands of users (yet!), but the tens of thousands who do use it consider their phone, fax and distributed workforce applications to be mission-critical. And we take that very seriously. We do have outages, but we are up-front with clients and give them all of the information, even if it was our screw-up. That has built a very loyal client following.

    It's too bad that Salesforce does not see the benefit in being honest with its customers. If we had sales force management tools, I would invite them to be our clients. Perhaps one day we will consider adding sales force tools to our service line....
    Paul C.
  • Single Point of Failure ...

    Any "single point of failure" is a concern. It's one thing to lose productivity on one application for a handful of employees and quite another to have all your employees idle because of an outage at a single service provider for all of your mission critical applications.
    M Wagner
  • Salesforce is dishonest on this issue

    Although I have no inside information regarding how Salesforce monitors the availability of their service, they were aware of the extended API outage this week. It was posted on the status page portion of their website, and we heard from company representatives during the outage that they were aware of and working on the problem. While it is possible they do not have the monitoring capability to know exactly when the system went down, by the end of the day they certainly knew the extent of the outage.

    You are giving them a pass by saying they don't have the technical ability to track outages. They are not being honest with the press. They know about the outage, and avoid public discussion of it. They are not a reliable source of information, and potential customers would do well to keep that in mind.
  • Phil, Phil, Phil...not enough floor time lately...

    Too much time as an "analyst" or "strategist" rather than time "on the floor" will certainly cloud your perception. It's really quite Dilbert-esque. You said:

    "But it's not (or shouldn't be) the outages themselves that are doing the sullying."

    Are you kidding!?! I could care less if Salesforce gave me all the info in the world about what is going on, if the outages persist. It is not only the outages that are sullying the reputation of SaaS, but that is certainly primary!
    • Perception matters ...

      Most of the complaints I've seen have been about the lack of information. Everyone accepts that the world isn't perfect and outages happen from time to time (though the less frequently the better, of course). If I'm on the floor, the first thing I want is to have something I can tell my users about what's going on. Sure, if the outages persist, you'll start to wonder whether you ought to move. But knowing what's happening and why is really important, and by stonewalling, Salesforce is failing in its duty to customers.
      phil wainewright