A catastrophic systems failure at Navitaire, a cloud-based software provider and business process outsourcing (BPO) unit of Accenture, disrupted travel for 50,000 customers of the Australian airline Virgin Blue. The situation offers important lessons for buyers of cloud-based outsourcing services.
Virgin Blue provided details in a press release:
Navitaire is the supplier of Virgin Blue’s reservation and distribution software platform and also hosts that platform on its own server infrastructure at a data centre in Sydney.
At 0800 (AEST) yesterday the solid state disk server infrastructure used to host Virgin Blue failed resulting in the outage of our guest facing service technology systems.
We are advised by Navitaire that while they were able to isolate the point of failure to the device in question relatively quickly, an initial decision to seek to repair the device proved less than fruitful and also contributed to the delay in initiating a cutover to a contingency hardware platform.
The service agreement Virgin Blue has with Navitaire requires any mission critical system outages to be remedied within a short period of time. This did not happen in this instance. We did get our check-in and online booking systems operational again by just after 0500 (AEST) today.
Navitaire has given us an assurance that they are thoroughly investigating all circumstances which led to the hardware device failure and the delay in getting an alternative platform up and running. They have given an undertaking to get a full report to us as soon as possible.
According to the travel technology website tnooze, Virgin Blue recently moved from Navitaire's Open Skies platform to the same company's New Skies system. Navitaire's website describes New Skies as follows:
New Skies is a comprehensive airline passenger sales and management solution providing capabilities for integrated Internet booking, call center reservations, travel agency global distribution connectivity, inter-airline and alliance code-share itineraries, real-time reporting, ancillary revenue generation and departure control.
A Navitaire representative promised to contact me with additional details, but never did.
The Virgin Blue situation raises several key issues for business buyers of cloud-based services.
I asked Phil Fersht, CEO of the leading BPO and sourcing analyst firm Horses for Sources, for his view:
This incident highlights the advantages of using a single provider to manage both the business processes and related IT services within a cloud-based business services model. The Navitaire team responded relatively quickly to solve the problem, without Virgin having to deal with multiple points of blame. These things happen all the time; at least Virgin has a "single throat to choke."
Implications for buyers. Outages are an unpleasant reality in both the on-premises and cloud worlds.
To uncover potential Navitaire system weaknesses in advance, Virgin Blue would have needed to perform extraordinary and impractical levels of due diligence, digging deeply into Navitaire's technology, policies, and training procedures. Even then, it is unclear whether Virgin could have anticipated this particular point of failure.
Buyers of mission-critical outsourcing services should consider developing their own plans and procedures for handling failures at their providers. In the end, process redundancy is the best protection against failure. During the outage, Virgin Blue fell back on a poorly executed manual system, and that weak fallback is what prolonged the inconvenience its customers experienced.
Without excusing Navitaire, we must recognize that all parties share responsibility for planning and preparing for predictable, and even inevitable, failures.
Update 9/27/10, 1:30 PM ET: A few readers question whether this is actually a "cloud" situation or merely traditional outsourcing. Accenture, Navitaire's owner, titles the Navitaire web page "Navitaire: Cloud computing for airlines: Accenture." Accenture also lists Navitaire among its Cloud Services offerings. All this raises questions about what "cloud" actually means. It's a tough question without an easy answer.
Update 10/14/10, 7:30 AM ET: CIO Magazine reports that the outage cost Virgin Blue $15-20 million.