365 Main on Wednesday detailed what went wrong during last week's San Francisco power outage and outlined what it is doing to keep its facilities running in the future.
The power outage on July 24 knocked several sites, including CNET and Craigslist, offline and raised questions about business continuity planning.
Here's the explanation provided by the company in full:
At 1:47 p.m. on Tuesday, July 24, 365 Main’s San Francisco data center was impacted by a power surge caused when transformer breakers at a local PG&E power station unexpectedly opened. PG&E has still not determined what caused the breakers to open.
Typically when a power outage occurs, the outage triggers 365 Main’s rigorously maintained and tested back-up diesel generators to start-up and take over providing power supply to customers. 365 Main’s San Francisco facility has ten 2.1 megawatt back-up generators to be used in the event of a loss of utility power. Eight primary generators can successfully power the building, with two generators available on stand-by in case there are any failures with the primary eight.
However, following the power outage last week, three of 365 Main’s 10 back-up power generators, manufactured by Hitec, failed to complete their start sequence. A complete investigation of the incident began immediately.
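The redundancy arithmetic in the statement can be checked with a short sketch. The figures (ten units, eight required, 2.1 megawatts each, three failures) come from the statement; the class and method names are hypothetical, and treating the shortfall as each missing unit's full rated output is a simplifying assumption, not something the company stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratorPlant:
    """Illustrative model of an N+2 generator plant (names are hypothetical)."""
    total_units: int      # generators installed
    required_units: int   # units that must run to power the building
    unit_kw: int          # rated output per unit, in kilowatts

    def spare_units(self, failed: int) -> int:
        """Units left over after `failed` units fail to start (negative = shortfall)."""
        return self.total_units - failed - self.required_units

    def can_carry_load(self, failed: int) -> bool:
        """True if enough units started to power the building."""
        return self.spare_units(failed) >= 0

    def shortfall_kw(self, failed: int) -> int:
        """Missing capacity in kW, assuming every unit runs at rated output."""
        return max(0, -self.spare_units(failed)) * self.unit_kw


# Figures from the statement: ten 2.1 MW units, eight needed, two on stand-by.
SF_PLANT = GeneratorPlant(total_units=10, required_units=8, unit_kw=2100)

if __name__ == "__main__":
    for failed in range(4):
        print(failed, SF_PLANT.can_carry_load(failed), SF_PLANT.shortfall_kw(failed))
```

Under these assumptions the plant tolerates two failed starts, but a third leaves only seven running units, one short of the eight needed.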
Within hours of the incident, an international team of specialists was deployed to 365 Main’s San Francisco data center facility to join on-site technicians and begin systematically testing the generators in search of a root cause. After days of thorough testing around the clock, the team discovered a weakness in an essential component of the back-up generator system known as a DDEC (Detroit Diesel Electronic Controller).
The team discovered a setting in the DDEC that was not allowing the component to correctly reset its memory. Erroneous data left in the DDEC’s memory subsequently caused misfiring or engine start failures when the generators were called on to start during the power outage on July 24.
The investigation team discovered DDEC issues on each of the failed Hitec units and were able to successfully simulate failure. A fix was introduced by altering the timing of a command to the DDEC component, allowing more time between the engine shut-down command and the DDEC reset command. Once this fix was introduced, the Hitec generators successfully passed more than 50 consecutive start-up sequence tests without incident.
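The timing fix described above can be illustrated with a toy model. This is only a sketch of the general idea that a reset issued too soon after shutdown may leave stale data behind; it is not the DDEC's actual logic. The class `ToyController`, the function `run_cycle`, and the two-second settle time are all hypothetical values chosen for illustration.

```python
class ToyController:
    """Hypothetical stand-in for an engine controller; NOT the real DDEC."""

    SETTLE_SECONDS = 2.0  # assumed time needed after shutdown before a reset works

    def __init__(self) -> None:
        self.stale_data = False
        self.shutdown_at = None  # simulated time of the last shutdown command

    def shutdown(self, now: float) -> None:
        # A completed run leaves data in memory that must be cleared by a reset.
        self.stale_data = True
        self.shutdown_at = now

    def reset(self, now: float) -> None:
        # The reset only clears memory if it arrives long enough after shutdown.
        if self.shutdown_at is not None and now - self.shutdown_at >= self.SETTLE_SECONDS:
            self.stale_data = False

    def start(self) -> bool:
        # A start succeeds only when no stale data remains in memory.
        return not self.stale_data


def run_cycle(reset_delay: float) -> bool:
    """Shut down at t=0, reset after `reset_delay` seconds, then try to start."""
    controller = ToyController()
    controller.shutdown(now=0.0)
    controller.reset(now=reset_delay)
    return controller.start()


if __name__ == "__main__":
    print("early reset starts:", run_cycle(0.5))
    print("delayed reset starts:", run_cycle(3.0))
```

In this model, lengthening the gap between the shutdown command and the reset command is enough to make every subsequent start succeed, which mirrors the shape of the fix the statement describes.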
The testing methodology was performed by Hitec specialists along with 365 Main’s chief technician and staff. Specialists from Cupertino Electric were present during all testing, and EYP Mission Critical Facilities will provide independent verification of the findings the week of 8/6/07.
365 Main has implemented the DDEC fix in its San Francisco and El Segundo facilities. Of the five data centers in 365 Main’s portfolio, the San Francisco and El Segundo facilities are the only ones with Hitec generators containing DDECs. All other facilities feature other brands of generators or have different models of Hitecs.
365 Main is sharing the discoveries of its investigation with other Hitec customers. In addition, Hitec has expanded its preventative maintenance procedures as a direct result of discoveries made during the 365 Main investigation.
The company also has a full archive of the developments last week.