Salesforce.com suffers worldwide disruption after power outage

Summary: Salesforce.com has suffered a series of outages that left many unable to use the CRM service. Even the service status page was down for a while.

Salesforce.com customers continue to suffer as instances of the social enterprise service crumbled in the face of a power outage, the company said today.

The bright side is that many customers were tucked up in bed when the initial outage began at 12:49 a.m. PDT. Europeans bore the brunt, as the outage hit while they were heading into work. Many took to Twitter to complain.

Even the status page was down for a while, some said on Twitter. Power appears to be the cause of the outage, as explained on the site's status page --- once it jumped back into life:

2:46 am PDT : NA1/NA5/NA6/CS0,CS3,CS1,CS12 salesforce.com System Status 
The salesforce.com NA1/NA5/NA6/CS0,CS3,CS1,CS12 instances are continuing to experience a service disruption. Power issues were detected but our technician onsite has confirmed this has been fixed. We are currently working to restore the service. Please check the status of trust.salesforce.com frequently for updates regarding this issue.

Since then, the company noted that "standard salesforce.com reporting, contacts, updates, case entry services" were available, but some will see problems with "sporadic search and file attachment performance."

The Salesforce.com application store is also down, the company said at 5:30 a.m. PDT.

It's the second such major outage in as many weeks.

Late June saw the last major outage, in which a fault occurred in its storage tier. Performance suffered in both North America and European regions. In total, the outage lasted seven hours during European business primetime. 

At the time of writing, service had been restored on NA1 and NA5 servers, though performance was still choppy, but no estimates were given for the restoration of the remaining five affected instances: NA6, CS0, CS3, CS1 and CS12.

Hats off to the Salesforce public relations team. The status site remained (mostly) up and provided detailed information, and at times there was a tweet a minute from its Twitter-based team to keep spirits up among those suffering through the outage.

ZDNet contacted Salesforce for comment, but did not receive a reply at the time of writing. 

Update at 11:20 a.m. PDT: All of the instances are back up with the exception of CS12. While warnings and notifications, such as issues relating to search functionality, are still appearing under all previously affected instances, many can at least now log back into the Salesforce.com website.

"The salesforce.com Technology Team is working to fail the CS12 instance over to its DR datacenter," the update at 11 a.m. PDT read. 

As of this update, the service is expected to have all instances up by noon PDT.

Update at 12:15 p.m. PDT: The status page says Salesforce has paused the CS12 failover to its alternate datacenter but says it could restore the instance "within the same time window which is a lower risk option."

It also added: 

11:30 am PDT : NA5/CS1 Search and CS12 Outage System Status - Update 
The salesforce.com Technology Team has identified and repaired a problem impacting NA5 search for large indexes. The search servers, when restarted, defaulted to a sub-optimal packet size setting on the host network interface card. As such, large indexes were broken up and handled inefficiently. We found and corrected the packet size setting and remediated the search performance impact for NA5.
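
To illustrate why a small packet size hurts large transfers, here is a rough sketch in Python. It is not Salesforce's code, and the status update does not give the actual values involved: the 100 MiB index size, the 576-byte and 1,500-byte packet sizes, and the 40-byte header overhead are all assumptions chosen for illustration, as is the function name `packets_needed`.

```python
def packets_needed(payload_bytes: int, packet_size: int, header_bytes: int = 40) -> int:
    """Count the packets needed to carry a payload.

    Each packet carries at most (packet_size - header_bytes) bytes of data.
    The 40-byte default assumes an IPv4 header plus a TCP header, both
    without options; real overhead varies.
    """
    data_per_packet = packet_size - header_bytes
    if data_per_packet <= 0:
        raise ValueError("packet_size must be larger than header_bytes")
    # Integer ceiling division: round up so a partial packet still counts.
    return -(-payload_bytes // data_per_packet)


# Hypothetical figures, not taken from the status update.
index_bytes = 100 * 1024 * 1024   # an assumed 100 MiB search index
small_setting = 576               # an assumed sub-optimal packet size
usual_setting = 1500              # a common Ethernet packet size

small_count = packets_needed(index_bytes, small_setting)
usual_count = packets_needed(index_bytes, usual_setting)

print(f"{small_setting}-byte packets: {small_count}")
print(f"{usual_setting}-byte packets: {usual_count}")
print(f"ratio: {small_count / usual_count:.2f}x")
```

Under these assumed numbers the smaller setting needs well over twice as many packets to move the same index, and each extra packet carries its own header and processing cost, which is one plausible reading of "handled inefficiently."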

Update at 12:30 p.m. PDT: Salesforce said it has "resolved all issues with the CS12 instance, and all CS12 services are restored." All instances are back up and running, though some problems with search will persist throughout the remainder of the day. 

Topics: Salesforce.com, Outage, Social Enterprise

Talkback

11 comments
  • Geographically distributed failover clusters...

    that's about it, see Google.
    dtdono0
    • Geographically distributed failover clusters...

      are expensive and eat into the bottom line. ;-)
      ForeverSPb
      • That sounds about par for the course...

        CIO: We need failover.
        CEO: Costs too much.
        CIO: okay...
        * servers crash, customers complain *
        CEO: This is all your fault, why don't we have redundancy?
        CIO: ...

        I'm not saying you need 20 data centers, but you should at least have a 2nd site over a slow WAN connection as a backup in case of a disaster like this, just to keep the service up until you get power back. A company the size of salesforce should be able to afford one small secondary cluster node.
        dtdono0
        • there's more to it kid..

          There is more to geographic failover than "one small secondary cluster node" and a "slow WAN connection".

          It's from an older report, but from what I found, Salesforce stores its data in one of the largest Oracle instances in the world. You don't just set up one small secondary cluster node over a slow WAN connection to replicate that.

          Enterprise applications don't work like you think they do or like what you just described. I'd bet that a team of engineers at Salesforce has spent many days and weeks researching the feasibility of replicating Oracle to a geographically diverse location, and I suspect they determined what most do: it's much more cost-effective to fix things locally for redundancy, with a much better return, than to try to engineer such a large-scale replication.

          The conflict resolution and business rules they'd have to come up with would be mind boggling but I think I've already lost you..
          fireman949
  • Trust the cloud

    Um, no thanks.
    NoAxToGrind
  • Uptime Article

    The cloud is often sold as a more reliable entity than in-house IT resources. An article that examines cloud downtime (RIM, Amazon, Microsoft, Salesforce, Yahoo, etc.) and contrasts it with like expectations within corporate IT could be very useful in terms of setting expectations among non-technical staff. Mayor Bloomberg's office in NYC recently completed a report on how unreliable or insufficient broadband communications can be in the metro area, and how that infrastructure thwarts the efforts of tech-minded startups. In other words, contrast the reality with the dream.
    mgw@...
  • Well huh...

    imagine that, a cloud application that had service failure. yay cloud!
    bobavery
  • Vindication at long last

    I recently implemented an on-site server-based CRM solution for our company and spent six months as upper management vacillated back and forth and sang the praises of "The Cloud".

    Of course none of them had any idea what cloud computing was. They had just heard that it was the wave of the future. More than once I had to explain that "cloud" is just a euphemism for a server hosting an application at someone else's location.

    Now I feel vindicated...I'm sending this article to everyone involved.
    fifth_disciple@...
    • misconception

      What the CIO says:
      We NEED CLOUD!

      What the CIO meant:
      We need software as a service


      "Cloud" has become such a useless term now because of how it is thrown around. Ask the next person who says "Let's put it in the CLOUD" what they actually mean by that.
      fireman949
  • So painful

    As a geek of 24 years, a lot of that spent in data centers, I truly feel the pain of those with their hands on the hardware/software involved. OTOH, there's really no excuse for the "search servers, when restarted, defaulted to a sub-optimal packet size setting" problem.

    This story highlights a few things:

    1. Cloud schmoud. Yawn. Next hot thing, please step forward.
    2. No one is immune from unplanned downtime, no matter how big they are.
    3. Everyone makes mistakes (e.g. packet size).
    4. It's yet another reality check for CxO folks who know diddly about IT (including CIOs) on the alleged reliability of any system.
    5. Feel free to laugh at anyone demanding five nines (usually because they read about it in Forbes or Business Week or some such). Just send them a link to this article or any of a multitude of other similar stories.
    moebiusloop
  • Size is not always the benefit you may think

    Salesforce is often touted as the de facto CRM, but big is not always better and certainly not right for all.

    Independent analysis of viable Salesforce alternatives can be found at www.g2crowd.com, where leading CRMs are compared by users in an open format.

    Ian Moyse
    Workbooks
    ianm32@...