The 'real' cost of application outages

Application downtime, whether you're measuring intermittent availability or fully downed systems, is too costly to ignore. The best way to avoid trouble is to view the infrastructure through the eyes of your transactions, says OpTier's Motti Tal.
Written by Motti Tal, Contributor

Few items on the CIO's agenda today are more important than application availability. Today's mission-critical and emerging Web-based applications are the backbone for more financial transactions than ever before. They're also integrating partners and suppliers more tightly than was possible just five years ago, and they're often the front-line provider of customer service. These reasons, among many others, are why application outages, ranging from intermittent brownouts to outright crashes, simply are too costly to ignore.

Application management is a multifaceted challenge
It's important to focus not just on fully downed applications, as many observers do, but on those intermittent moments of poor application performance when employees, business partners, suppliers, and customers end up staring at a swirling timer and are forced to wait too long—or grow impatient and give up entirely. Organizations that don't have cross-tier, end-to-end visibility into business transactions expose themselves to the significant risk of outages, poor customer satisfaction, and lost customers. But few organizations are aware of just how costly the spectrum of application outages can be.

When it comes to application outages, many different factors often come into play. Disruptions can be caused by databases, servers, load-balancing problems, network performance, and possibly the design of the application itself. Adding to the complexity is the fact that applications are hyper-dependent on an ever-increasing number of components that reach across the entire IT infrastructure. It's an intricate matrix that spans applications, servers, and sometimes organizational domains. This is why predicting impending outages, as well as spotting the root cause of availability issues, can be so complex.

The costs of application outages
Getting to the precise costs of application outages isn't easy, and can be as complex as identifying the precise reasons for availability disruptions and outages themselves. Recent research from Enterprise Management Associates suggests that when one looks across industry sectors and organizational size, from the low mid-tier to the large enterprise, the average cost of a single hour of application downtime is about $45,000. That cost, analysts say, can be far higher for high-value and transaction-intensive verticals, such as financial services, where application outages can cost millions of dollars each minute.

An expense that typically is not considered when measuring the total cost of application outages is the time highly skilled IT workers spend troubleshooting problems. Organizations that lack the visibility needed to pinpoint potential weaknesses, such as a database approaching capacity, before they become visible have to start the process manually. They need to convene teams of experts (database administrators, network engineers, application owners, and security managers) to try to determine why Web application performance is deteriorating at an increasing rate. Now, add the cumulative costs of several such calls every month or, at larger enterprises, every week, and you'll see that, over time, these meetings consume significant budget.
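To make the arithmetic concrete, here is a rough sketch that adds the labor cost of troubleshooting calls to the direct cost of downtime. The $45,000 hourly figure is the average cited above; the downtime hours, call frequency, team size, and hourly labor rate are purely illustrative assumptions, not figures from the research.

```python
from dataclasses import dataclass

# Average cost of one hour of application downtime, as cited in the article.
AVG_DOWNTIME_COST_PER_HOUR = 45_000


@dataclass
class OutageProfile:
    """Hypothetical monthly outage profile; every value is an assumption."""
    downtime_hours: float        # hours of downtime per month
    troubleshooting_calls: int   # cross-team calls convened per month
    experts_per_call: int        # DBAs, network engineers, app owners, ...
    hours_per_call: float        # length of each call
    hourly_labor_rate: float     # assumed loaded cost per expert-hour


def monthly_outage_cost(profile, cost_per_hour=AVG_DOWNTIME_COST_PER_HOUR):
    """Return the downtime, labor, and total cost for one month."""
    downtime = profile.downtime_hours * cost_per_hour
    labor = (profile.troubleshooting_calls
             * profile.experts_per_call
             * profile.hours_per_call
             * profile.hourly_labor_rate)
    return {"downtime": downtime, "labor": labor, "total": downtime + labor}


# Illustrative numbers only: 2 hours down, four 3-hour calls with 4 experts.
example = OutageProfile(
    downtime_hours=2.0,
    troubleshooting_calls=4,
    experts_per_call=4,
    hours_per_call=3.0,
    hourly_labor_rate=100.0,
)
costs = monthly_outage_cost(example)
print(costs)
```

Even with modest assumptions, the labor line is a recurring cost that never shows up in a per-hour downtime figure, which is why it is so easy to overlook.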

Unfortunately, the complexity of troubleshooting applications isn't going to wane any time soon. Consider the growth of interdependencies and complexities introduced by growing technologies such as virtualization, SOA, and enterprise Web 2.0—all of which are very dynamic and real-time in nature. Many add an entire additional layer of abstraction, which further increases the difficulties in maintaining service levels and availability. The only way many organizations can avoid an availability train wreck is to get ahead of this acceleration of application complexity. This can be achieved through an understanding of how transactions actually flow through the infrastructure, not by trying to focus on the infrastructure in its entirety.

A common example would be the performance problems an end user might experience while trying to conduct a balance transfer from a banking account portal. When the end user starts to witness performance degradation, it's likely an initial sign of a bigger, more systemic problem. The key to finding the trouble spots would be to follow the flow of this user's transaction, rather than to analyze the entire infrastructure. This analysis helps expose weak links. By stepping through the infrastructure from the vantage point of the transaction, an organization can grasp the intricate web of connectivity between its business activities and the IT infrastructure components. Once this is complete, a significant part of the outage problem is resolved.
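As a minimal sketch of what following a single transaction might look like, the snippet below takes per-tier timings for one balance transfer and reports which tier accounts for most of the delay. The record format, tier names, and timings are hypothetical and chosen only for illustration; they do not represent any particular product's data model.

```python
from collections import defaultdict

# Hypothetical hop records: (transaction id, tier, time spent in milliseconds).
hops = [
    ("tx-1001", "web portal", 40),
    ("tx-1001", "app server", 120),
    ("tx-1001", "database", 1850),
    ("tx-1001", "core banking", 210),
    ("tx-1002", "web portal", 35),
    ("tx-1002", "app server", 110),
]


def slowest_tier(records, tx_id):
    """Return (tier, milliseconds, share of total time) for one transaction."""
    per_tier = defaultdict(int)
    for record_tx, tier, millis in records:
        if record_tx == tx_id:
            per_tier[tier] += millis
    if not per_tier:
        raise KeyError(f"no hops recorded for transaction {tx_id!r}")
    total = sum(per_tier.values())
    tier = max(per_tier, key=per_tier.get)
    return tier, per_tier[tier], per_tier[tier] / total


tier, millis, share = slowest_tier(hops, "tx-1001")
print(f"{tier}: {millis} ms ({share:.0%} of the transaction)")
```

The point of the sketch is the shape of the question: instead of inspecting every server, it asks where this one transaction spent its time.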

An emerging technology: Business transaction management
To get this granular, transaction-level view, more organizations are turning to an emerging technology known as Business Transaction Management (BTM). Fundamentally, BTM follows the business transactions that flow through an organization's IT infrastructure, yielding a greater understanding of the service quality, flow, and dependencies among databases, servers, and applications throughout the entire transaction life cycle. In this way, BTM technologies can help new computing techniques such as virtualization live up to the promises of cost savings and increased business agility.

Now, with the proper implementation of BTM, IT managers can measure the performance of their applications from their users’ perspective and concurrently gain detailed visibility into how each transaction flows across the entire infrastructure and utilizes each shared resource. As a result, they can optimize the performance of their transactions, applications, and the infrastructure itself, to provide better customer experience and deliver the quality-of-service that business units expect at lower cost.

Through this transaction insight, and the ability to identify problem areas before they become visible to users or lead to an outright crash, BTM can provide a significant return on investment.

A BTM approach will not only mean much less downtime; it will also make it unnecessary to pull together large teams of experts for manual troubleshooting. Organizations will know that the database is beginning to stall because it's nearing transaction capacity, or that server utilization across the network has been trending higher over time and more capacity needs to be added to meet demand. In this way, outages can be foreseen and avoided long before they materialize. In some cases, organizations that use BTM have experienced up to 70 percent less unplanned downtime. This equates to significant savings in labor time, as well as in the thousands to millions of dollars that can be lost every minute to application downtime.
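The idea of spotting an upward trend before it becomes an outage can be illustrated with a simple straight-line projection. The sketch below fits a least-squares line to evenly spaced utilization samples and estimates how many more sampling periods remain before a capacity limit is reached. The sample values and the 100 percent limit are assumptions for illustration, and a linear fit is only one simple way to project a trend, not a description of how any BTM product works.

```python
def periods_until_capacity(samples, capacity):
    """Estimate periods remaining until a linear trend reaches capacity.

    samples: utilization readings taken at evenly spaced intervals.
    Returns None when the trend is flat or falling, 0.0 when the fitted
    line has already reached capacity at the latest sample.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")

    # Ordinary least-squares fit of utilization against sample index.
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    if slope <= 0:
        return None

    intercept = mean_y - slope * mean_x
    index_at_capacity = (capacity - intercept) / slope
    # Measure the remaining time from the most recent sample.
    return max(0.0, index_at_capacity - (n - 1))


# Hypothetical weekly database utilization, in percent.
weekly_utilization = [60, 64, 68, 72, 76]
remaining = periods_until_capacity(weekly_utilization, capacity=100)
print(f"Estimated weeks until capacity: {remaining}")
```

A projection like this is only as good as the assumption that growth stays linear, but even a crude estimate turns a surprise outage into a planned capacity upgrade.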

Reducing and eliminating costly application outages has been one of the most important discussions within IT for decades. The challenge is that the problem isn't getting simpler to resolve; indeed, it's getting more complex because of the growing interdependence of most applications. But by focusing on the transaction, the always-on availability we've all come to expect becomes a measurable and attainable goal.

Motti Tal is executive vice president for Marketing and Business Development at OpTier.
