What is going on?
One of our less pleasant responsibilities here at Forrester is commenting on serious business, security, or technical failures in the digital and IT industry. Due to its duration and the implications for a subset of the user base, the current Atlassian outage rises to that level.
Atlassian is staking its future on being a cloud provider: it is transforming all of its products into SaaS offerings and sunsetting most of its traditional support for on-premises deployments. This week's outage puts intense scrutiny on its ability to execute and to win and maintain customer trust, despite the reported low number of customers impacted. Atlassian-sourced figures put the number at around 0.2% of its cloud customer base, and it says that it has restored service to about 45% of its impacted users, but the time this restoration is taking makes this an unusually long SaaS outage.
For those not familiar, 400 customers lost service on Jira, Jira Service Management, Jira Work Management, Confluence, Opsgenie, Statuspage, and Atlassian Access for a week. The outage is expected to last at least two more weeks for some.
This outage was particularly ill-timed, occurring during Atlassian's annual Team '22 customer conference. Even before the outage, analyst and market reception of Atlassian's business strategy was mixed. While there are natural benefits for customers in moving to a SaaS model (such as reduced admin work), the reputational damage to Atlassian's cloud capabilities is occurring at a particularly contentious time. It seems likely that Atlassian's cloud migration timelines will be adjusted.
What can customers do?
In the interim, Atlassian customers should take a few steps in response to this outage:
- Verify whether you are affected across all of your Atlassian products and instances. You may have an Atlassian product being run independently in your organization, not part of standard IT channels. This discovery may prove useful for bundling instances or centralizing management in future negotiations.
- If you have not yet migrated off its Server option, speak with Atlassian migration reps about the ongoing risk to see if there are any architectural strategies you can employ to avoid a similar outage.
- If you have migrated to the cloud (or started on the cloud), speak with your representative about the outage. Explore if there are additional assurances your organization can leverage, whether it's an advanced SLA level (e.g. Atlassian's 99.9% and 99.95% uptime options) or architectural strategies to avoid similar impacts.
- Watch how Atlassian reacts to the outage:
  - Atlassian has just concluded its first blameless mid-incident assessment and published it for public review. It focuses on communicating what went wrong, and it correctly avoids narrowing the incident down to a single point of failure or individual. It should be followed, once the incident concludes, by additional pieces outlining the actions Atlassian will take to ensure this can't happen again. While the initial mid-incident assessment avoided blaming a single individual, it will be an organizational red flag if Atlassian later pins responsibility on a specific group and obscures culpability. This seems unlikely, but it is worthy of attention.
  - Look for customer compensation beyond the required SLA. How does Atlassian make it right? Does it go above and beyond to repair customer trust, or does it meet the contractual minimums? Just meeting minimums should generate skepticism.
  - Look at how it executes on its findings and how it acts to prevent this from happening again. Does it invest significantly in resiliency? Does it hire resiliency experts? Or does it routinely downplay the probability of such a failure happening again? The latter is less encouraging for an existing or potential customer.
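To put the SLA levels mentioned above in perspective, the following is a minimal sketch of the underlying arithmetic. It assumes a 30-day measurement period and a simple "total minutes minus downtime" definition of uptime. Both are illustrative assumptions, not Atlassian's actual contract terms, and the function names are hypothetical.

```python
# Illustrative arithmetic only. The 30-day period and the uptime definition
# are assumptions; check your own contract for the real measurement terms.
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime a given uptime target permits within the period."""
    total_minutes = period_days * MINUTES_PER_DAY
    return total_minutes * (100.0 - uptime_percent) / 100.0


def observed_uptime_percent(outage_minutes: float, period_days: int = 30) -> float:
    """Uptime percentage for the period, given the total minutes of outage."""
    total_minutes = period_days * MINUTES_PER_DAY
    return 100.0 * (total_minutes - outage_minutes) / total_minutes


if __name__ == "__main__":
    for sla in (99.9, 99.95):
        minutes = allowed_downtime_minutes(sla)
        print(f"{sla}% uptime allows {minutes:.1f} minutes of downtime per 30 days")

    one_week = 7 * MINUTES_PER_DAY
    uptime = observed_uptime_percent(one_week)
    print(f"A 7-day outage leaves {uptime:.1f}% uptime for that 30-day period")
```

Under these assumptions, a 99.9% target allows about 43.2 minutes of downtime per 30 days and a 99.95% target about 21.6 minutes, while a week-long outage leaves roughly 76.7% uptime for the period. That gap is why compensation limited to the contractual SLA minimum deserves scrutiny.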
Can you use other tools?
Some will undoubtedly consider alternatives to Atlassian. The challenge with this approach is that Atlassian is an increasingly broad, integrated, cross-functional suite (as the list of affected products above shows). The recent product announcements around Atlas, Compass, and enhancements to underlying architecture (like the Atlassian Data Lake and Atlassian Analytics) indicate a smart emphasis on this strategy. Acquisitions (especially Opsgenie) are not remaining self-contained and decoupled; quite the opposite, they are being integrated into the whole. Atlassian increasingly finds itself in the company of vendors like SAP or Salesforce, where replacement is made difficult by their cross-functional capabilities.
What is going to happen next?
Of course, all is not lost for Atlassian. Unfortunately, high-profile outages are common. One Forrester analyst worked for a major US bank that suffered a high-profile mainframe outage. Customers could not access their funds, and some missed payroll for their employees. That bank still exists, and there is little or no residual impact from that outage.
As our friends in the resilience engineering community are fond of pointing out, it's a miracle that complex systems work at all. A clear-eyed examination of their operational history reveals sobering and ongoing near misses, and that examination is critical for building more resilient systems.
But Atlassian will not escape unscathed. It is in multiple challenging markets, and it has formidable competitors. Customers are going to use this opportunity to demand additional discounts and deployment flexibility. Dramatic commitments to resilience are going to be required to re-establish trust, so Atlassian can become the center of work it seeks to be. Cloud improvements have already been top of mind for the Atlassian teams (its leaders spoke about performance improvements on the main stage at its Team '22 event), but further improvements to resilience must come immediately, along with financial commitments.
In conclusion, we expect Atlassian to survive the situation. The majority of Atlassian's pragmatic customers will say, "I wasn't affected," but we expect this outage to introduce additional cloud migration resistance into the market, as well as additional fallout dependent on Atlassian's response to the situation. While this unfortunate situation involved SaaS offerings, this alone is no reason to abandon SaaS. Cloud services have generally proven dependable. However, it is a wake-up call: having something in a cloud service does not justify blind trust in that service. Perform your diligence regardless of where "it" is.
In the meantime, for customers, there aren't many options to increase your own Atlassian resiliency beyond some of the more basic steps outlined above. You can develop alternatives/options to mitigate risks, like Brent Ellis and Naveen Chhabra outline here.
Atlassian plans to EOL its Atlassian Server option by 2024 and is designing its newer on-prem offering, Atlassian Data Center, for larger organizations. Its timeline and the specifics of this plan are available here: Atlassian Server end of life (sale/support) information.
This post was written by Analyst William McKeon-White and it originally appeared here.