Upgrade disrupts Salesforce.com


Summary: Salesforce.com, a leading Software as a Service (SaaS) vendor, faced outages across parts of its network yesterday.


Salesforce.com, a leading Software as a Service (SaaS) vendor, faced outages across parts of its network yesterday. Here's a screen capture from the company's status dashboard:

[Screen capture: Upgrade causes Salesforce.com outage]

The main outage lasted about six hours, as reported by the company:

4:40 am PST: Root Cause Messaging - NA5 Performance Degradation 2/11/2008

Starting at 1622 UTC, customers on the NA5 instance began experiencing intermittent performance degradations. The salesforce.com technology team worked on troubleshooting the issue throughout the day and took corrective actions to restore normal service levels by 2204 UTC. We believe that the problem occurred due to changes in database utilization introduced in the Spring '08 release which went live on the NA5 instance on Friday night. We have subsequently changed the configuration of our servers to address the problem and do not expect further issues.

Secondary problems also occurred:

Time: 2/11/08 6:43 am PST

Detail: NA0 (SSL) Service Disruption from 1443 UTC to 1502 UTC on Monday, February 11th.

Root cause: Starting at 1443 UTC, customers on the NA0 instance experienced a service interruption lasting approximately 19 minutes. The salesforce.com technology team worked to isolate the issue and restored the service at 1502 UTC. While the salesforce.com technical team still is going through the forensics, we believe that the root cause for the outage was related to a significant slowdown in IO throughput to the database storage sub-systems.

Salesforce.com is continuing to work with our vendors to further understand what could have caused the IO slowdown and will take corrective actions to ensure that these issues are addressed appropriately.
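The UTC timestamps in the two statements above are enough to check the reported durations. Here is a minimal Python sketch, assuming both windows fall within a single UTC day (they do here). The helper `outage_minutes` is hypothetical and exists only for illustration:

```python
from datetime import datetime

def outage_minutes(start_hhmm: str, end_hhmm: str) -> int:
    """Minutes between two same-day UTC times written as 'HHMM'."""
    fmt = "%H%M"
    start = datetime.strptime(start_hhmm, fmt)
    end = datetime.strptime(end_hhmm, fmt)
    return int((end - start).total_seconds() // 60)

na5 = outage_minutes("1622", "2204")  # NA5 performance degradation window
na0 = outage_minutes("1443", "1502")  # NA0 (SSL) service disruption

print(f"NA5: {na5} min ({na5 / 60:.1f} h)")
print(f"NA0: {na0} min")
```

This gives 342 minutes (5.7 hours) for NA5, consistent with "about six hours," and 19 minutes for NA0. The NA5 figure measures the window of intermittent degradation, not continuous downtime.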

THE PROJECT FAILURES ANALYSIS

Software as a service vendors claim non-disruptive upgrades are an important reason customers should buy online, rather than on-premises, software. In this instance, Salesforce.com did not insulate its customers from downtime and inconvenience associated with an upgrade.

Phil Wainewright, ZDNet blogger and premier SaaS analyst, notes the outage was small. With all due respect to Phil, I think this "minor" outage points to deeper issues.

Two points stand out:

  1. Customers don't want this hassle. SaaS customers pay to be insulated from release problems, and that's what they should get. With no reasons, excuses, or explanations.
  2. Salesforce.com's release testing process is broken. Testing should have caught these problems before they were released into the wild.

Outsourced software services remain a great option for many customers, but the marketing hype sometimes outstrips reality. Whether online or on your own server, critical points in the software lifecycle, such as upgrades, can (and do) cause end-user disruption.

In the meantime, salesforce.com should do some remedial work on its release and testing processes.


Talkback

  • The promise of SaaS...

    That's what you get for going with that model. Enjoy.
    Techboy_z
  • Testing, Testing, and more Testing..

    I agree that software is more prone to problems in a wide open arena of computers, but SaaS providers have no excuse.

    They obviously failed to stress test their app. Very bad.

    I saw Blockbuster do this: they had their new website up for about 8 hours before they pulled it and replaced it with the old one. Not enough testing.
    Been_Done_Before
  • Weekend is the best time...

    for this kind of stuff. Maybe they don't believe in working on weekends.
    bjbrock
  • Release process broken?

    Oh, I get it - it's either black or white, good or bad, perfect or broken. Yeah they can do a better job. Yeah, no other organization has tried this model before. Yeah, the release was perfect on most of their instances, for most customers.
    devils_advocate
    • "Most" customers

      Ask the affected customers whether the release process is broken or not.
      mkrigsman@...
  • The list of reasons not to switch to SaaS keeps getting longer

    Add that to the list of reasons not to use SaaS.

    Frankly, most of the touted benefits of SaaS are minimal, if they exist at all. Search for "SaaS benefits" and you'll get a bunch of websites, each with a totally different list. All of them with fluff sayings and without any real, solid, concrete, measurable benefits.

    IMHO SaaS is a fad. I doubt it'll last long once the hype is over and businesses take a closer look at whether they're really getting any benefits from it.
    CobraA1
  • RE: Upgrade disrupts Salesforce.com

    Michael,

    You must not keep a lot of mission critical systems up and running. Having 6 hours of disruption on the CRM system in one year is small. Not having to be the one to actually fix it and knowing that salesforce.com is working as hard as it can to get it back up and running.....HUGE!
    Andy123123
  • RE: Upgrade disrupts Salesforce.com

    I thought Mr. Wainewright was kidding! Big or small, nobody wants outages in IT! It is like being in a fancy restaurant, having the food of your dreams, and being told you're under time pressure to finish. You still have the food and you can go back again, but this time you just have to put up with it!

    But it is still far from being a death knell for SaaS, just something that may delay the inevitable adoption of SaaS.

    Best.
    alain
    alain@...
  • If this is the only downtime this year

    Then they are managing a defect rate of 36.15 defects per million opportunities.

    They've already blown achieving Six Sigma level quality.

    I assume that downtime isn't the only quality measure we are applying; the software has to deliver correct results too.
    jorwell
    • reality

      The true comparison is what most of their customers could achieve on their own from an uptime perspective. The reality is that the majority of their customers, on a unit basis, likely have fewer than 500 people, and they don't have a track record of keeping things like Exchange up more than 90% of the time.
      software.reality@...
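jorwell's figure of 36.15 appears to come from treating each minute of a 365-day year as one "opportunity" and counting the 19-minute NA0 outage as defects: 19 / 525,600 × 1,000,000 ≈ 36.15. The Python sketch below reproduces that reading. The per-minute opportunity model and the helper names `dpmo` and `availability` are assumptions for illustration; 3.4 DPMO is the conventional Six Sigma target. The sketch also converts the 90% uptime figure from the last reply into hours of downtime per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year
SIX_SIGMA_DPMO = 3.4              # conventional Six Sigma defect target

def dpmo(downtime_minutes: float, opportunities: int = MINUTES_PER_YEAR) -> float:
    """Defects per million opportunities, assuming one opportunity per minute."""
    return downtime_minutes / opportunities * 1_000_000

def availability(downtime_minutes: float, period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Fraction of the period the service was up."""
    return 1 - downtime_minutes / period_minutes

na0_dpmo = dpmo(19)  # the 19-minute NA0 outage
print(f"NA0 outage: {na0_dpmo:.2f} DPMO, Six Sigma met: {na0_dpmo <= SIX_SIGMA_DPMO}")
print(f"NA0 outage availability: {availability(19):.4%}")

# Downtime allowed by 90% uptime, the figure cited for in-house Exchange
print(f"90% uptime allows {0.10 * MINUTES_PER_YEAR / 60:.0f} hours of downtime a year")
```

Under these assumptions the NA0 outage alone works out to roughly 99.996% availability, while 90% uptime would allow about 876 hours of downtime a year.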