Colleague Marguerite Reardon reports that BlackBerry-maker Research In Motion has finally issued a statement identifying the cause of the 14-hour service disruption earlier this week.
According to that statement, RIM has "determined that the incident was triggered by the introduction of a new, noncritical system routine that was designed to provide better optimization of the system's cache."
"RIM said the system routine was not expected to impact the regular operations of the BlackBerry servers and infrastructure," Marguerite adds. "But despite previous testing, the new system routine produced an unexpected impact that set off a chain reaction triggering a series of interaction errors between the system's operational database and the cache."
First, RIM isolated the database problem and tried to fix it. No can do there. At that point, RIM launched a "failover" process that switched over to a backup system.
You get one guess. That failed as well.
"Although the backup system and failover process had been repeatedly and successfully tested previously, the failover process did not fully perform to RIM's expectations in this situation and therefore caused further delay in restoring service and processing the resulting message queue," RIM says.
The good news for BlackBerry users is that RIM says it has identified several testing, monitoring and recovery processes that it now deems due for enhancement.