Research in Motion co-chief executive Mike Lazaridis apologized this morning for the worldwide service outage affecting the company's BlackBerry smartphones, the largest in its history.
"I want to apologize to all of the BlackBerry customers we've let down," he said during a press conference. "You expect better of us. I expect better of us. Our inability to quickly fix this has been frustrating."
Service has now been fully restored, but the recovery took several days.
Lazaridis said a hardware failure on Monday caused a "ripple effect" in the system. A dual redundant high-capacity core switch failed, he said, setting off a cascading failure of outages and delays in Europe, the Middle East, Africa, India, Brazil, Chile and Argentina.
What's more, the backup switch didn't function as intended, causing a significant data backlog. As Europe's queue backed up from the failure, it overloaded service in the other affected regions. And the data backlog took much longer to clear than expected, Lazaridis said.
"We don't know why the switch failed in the particular way it did and did not fail over to its redundant pair," Lazaridis said. "We do know that there was an error in it that [was] most likely caused by hardware."
The company is scrambling to have its vendors correct the switch's failure mode and to audit its own infrastructure to understand why it took so long to get the system back online.
"We plan ahead for the kind of anticipated growth that we have and expect to have in the future," Lazaridis said, suggesting that the outage was not an issue of capital expenditure.
The company is also managing the fallout from a break in trust among its customers, many of whom chose RIM precisely for its security and stability.
Recent layoffs did not impact the team that manages such outages, Lazaridis said.
"Nobody has gone home since Monday," co-CEO Jim Balsillie said.