
The 'real' cost of application outages

Motti Tal, OpTier, Special to ZDNet | October 28, 2008 12:38 PM PDT

Summary

Application downtime, whether you're measuring intermittent availability or fully downed systems, is too costly to ignore. The best way to avoid trouble is to view the infrastructure through the eyes of your transactions, says OpTier's Motti Tal.

Few items on the CIO's agenda today are more important than application availability. Today's mission-critical and emerging Web-based applications are the backbone for more financial transactions than ever before. They're also integrating partners and suppliers more tightly than was possible just five years ago, and they're often the front-line provider of customer service. These reasons, among many others, are why application outages, ranging from intermittent brownouts to outright crashes, simply are too costly to ignore.

Application management is a multifaceted challenge
It's important to focus not just on fully downed applications, as many observers do, but on those intermittent moments of poor application performance when employees, business partners, suppliers, and customers end up staring at a swirling timer and are forced to wait too long—or grow impatient and give up entirely. Organizations that don't have cross-tier, end-to-end visibility into business transactions expose themselves to the significant risk of outages, poor customer satisfaction, and lost customers. But few organizations are aware of just how costly the spectrum of application outages can be.

When it comes to application outages, many different factors often come into play. Disruptions can be caused by databases, servers, load-balancing problems, network performance, and possibly the design of the application itself. Adding to the complexity, applications depend on an ever-increasing number of components that reach across the entire IT infrastructure. It's an intricate matrix that spans applications, servers, and sometimes organizational domains. This is why predicting impending outages, as well as spotting the root cause of availability issues, can be so complex.

The costs of application outages
Pinning down the precise costs of application outages isn't easy; it can be as complex as identifying the precise reasons for the availability disruptions and outages themselves. Recent research from Enterprise Management Associates suggests that, across industry sectors and organizational sizes, from the low mid-tier to the large enterprise, the average cost of a single hour of application downtime is about $45,000. That cost, analysts say, can be far higher for high-value, transaction-intensive verticals such as financial services, where application outages can cost millions of dollars each minute.
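
As a rough illustration of the arithmetic, the sketch below multiplies outage duration by an hourly cost. The default of $45,000 per hour is the cross-industry average cited above; the function names and the sample outage durations are hypothetical, and a real estimate would substitute an organization's own figure.

```python
from typing import Iterable

# Average hourly cost cited in the article; replace with your own figure.
AVERAGE_COST_PER_HOUR = 45_000.0


def downtime_cost(hours_down: float,
                  cost_per_hour: float = AVERAGE_COST_PER_HOUR) -> float:
    """Return the estimated direct cost of one outage lasting hours_down hours."""
    if hours_down < 0:
        raise ValueError("hours_down must be non-negative")
    return hours_down * cost_per_hour


def total_downtime_cost(outage_hours: Iterable[float],
                        cost_per_hour: float = AVERAGE_COST_PER_HOUR) -> float:
    """Sum the estimated cost of several outages, for example one year's worth."""
    return sum(downtime_cost(h, cost_per_hour) for h in outage_hours)


if __name__ == "__main__":
    # Hypothetical year: three outages of 30 minutes, 90 minutes, and 2 hours.
    print(total_downtime_cost([0.5, 1.5, 2.0]))
```

This counts only the headline hourly figure. It leaves out harder-to-measure effects such as lost customers, which is one reason precise totals are difficult to reach.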

An expense that typically is not considered when measuring the total cost of application outages is the time highly skilled IT workers spend troubleshooting problems. Organizations that lack the visibility needed to pinpoint potential weaknesses, such as a database approaching capacity, before they become visible have to start the process manually. They need to convene teams of experts (database administrators, network engineers, application owners, and security managers) to try to determine why Web application performance is deteriorating at an increasing rate. Now add the cumulative costs of several such calls every month or, at larger enterprises, every week, and you’ll see that, over time, these meetings consume a significant share of the budget.
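
The cumulative labor cost of those calls can be sketched the same way. Everything here is an assumption for illustration: the function name, the idea of a single blended hourly rate, and the sample numbers are hypothetical and do not come from the research cited above.

```python
def troubleshooting_labor_cost(attendees: int,
                               hours_per_call: float,
                               hourly_rate: float,
                               calls_per_month: float,
                               months: int = 12) -> float:
    """Estimate the labor cost of recurring manual troubleshooting calls.

    hourly_rate is a blended, fully loaded rate per attendee (an assumption;
    real teams mix roles with different rates).
    """
    if min(attendees, hours_per_call, hourly_rate, calls_per_month, months) < 0:
        raise ValueError("all inputs must be non-negative")
    return attendees * hours_per_call * hourly_rate * calls_per_month * months


if __name__ == "__main__":
    # Hypothetical: 4 experts, 2-hour calls, $100/hour, 3 calls a month for a year.
    print(troubleshooting_labor_cost(4, 2.0, 100.0, 3))
```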

Unfortunately, the complexity of troubleshooting applications isn't going to wane any time soon. Consider the growth of interdependencies and complexities introduced by growing technologies such as virtualization, SOA, and enterprise Web 2.0—all of which are very dynamic and real-time in nature. Many add an entire additional layer of abstraction, which further increases the difficulties in maintaining service levels and availability. The only way many organizations can avoid an availability train wreck is to get ahead of this acceleration of application complexity. This can be achieved through an understanding of how transactions actually flow through the infrastructure, not by trying to focus on the infrastructure in its entirety.

A common example would be the performance problems an end user might experience while trying to conduct a balance transfer from a banking account portal. When the end user starts to witness performance degradation, it's likely an initial sign of a bigger, more systemic problem. The key to finding the trouble spots is to follow the flow of this user's transaction rather than to analyze the entire infrastructure. That analysis helps expose the weak links. By stepping through the infrastructure from the vantage point of the transaction, an organization can grasp the intricate web of connectivity between those business activities and the IT infrastructure components. Once this is complete, a significant part of the outage problem is resolved.
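
To make the idea of following one transaction concrete, here is a minimal sketch. It assumes each tier a transaction touches reports how long it spent there; the `Hop` type, the tier names, and the timings are all hypothetical, and this is not a description of how any particular monitoring product records data.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Hop:
    """Time one transaction spent in one tier (hypothetical record format)."""
    tier: str
    elapsed_ms: float


def time_by_tier(hops: Sequence[Hop]) -> Dict[str, float]:
    """Total elapsed time per tier; a tier visited twice is summed."""
    totals: Dict[str, float] = {}
    for hop in hops:
        totals[hop.tier] = totals.get(hop.tier, 0.0) + hop.elapsed_ms
    return totals


def slowest_tier(hops: Sequence[Hop]) -> str:
    """Name of the tier where the transaction spent the most time."""
    if not hops:
        raise ValueError("a transaction needs at least one hop")
    totals = time_by_tier(hops)
    return max(totals, key=lambda tier: totals[tier])


if __name__ == "__main__":
    # Hypothetical balance transfer traced across three tiers.
    transfer = [
        Hop("web portal", 40.0),
        Hop("application server", 110.0),
        Hop("database", 850.0),
    ]
    print(slowest_tier(transfer), time_by_tier(transfer))
```

With timings like these, the investigation starts at the database tier instead of with a survey of every component.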

An emerging technology: Business transaction management
To get this granular, transaction-level view, more organizations are turning to an emerging technology known as Business Transaction Management (BTM). Fundamentally, BTM follows the business transactions that flow through an organization’s IT infrastructure and yields a greater understanding of the service quality, flow, and dependencies among databases, servers, and applications throughout the entire transaction life cycle. In this way, BTM technologies can help new computing techniques such as virtualization live up to their promises of cost savings and increased business agility.

Now, with the proper implementation of BTM, IT managers can measure the performance of their applications from their users’ perspective and concurrently gain detailed visibility into how each transaction flows across the entire infrastructure and utilizes each shared resource. As a result, they can optimize the performance of their transactions, applications, and the infrastructure itself, to provide better customer experience and deliver the quality-of-service that business units expect at lower cost.

Through this transaction insight, and the ability to identify problem areas before they become visible to users or lead to an outright crash, BTM can provide a significant return on investment.

A BTM approach will not only mean much less downtime; it will also make it unnecessary to pull together large teams of experts for manual troubleshooting. Organizations will know that the database is beginning to stall because it's nearing its transaction capacity, or that server utilization across the network has been trending higher over time and more capacity needs to be added to meet demand. In this way, outages can be foreseen and avoided long before they materialize. In some cases, organizations that use BTM have experienced up to 70 percent less unplanned downtime. This equates to significant savings in labor time, as well as in the thousands to millions of dollars that can be lost every minute to application downtime.
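
One simple way to act on an upward utilization trend is to fit a straight line to recent samples and estimate when it would cross a capacity limit. The sketch below uses an ordinary least-squares fit; it is an illustration under that assumption, not the method of any BTM product, and the function name and sample data are hypothetical.

```python
from typing import Optional, Sequence


def periods_until_capacity(samples: Sequence[float],
                           capacity: float) -> Optional[float]:
    """Estimate how many sampling periods remain before a linear trend
    through samples reaches capacity.

    Samples are assumed to be evenly spaced (for example, one per day).
    Returns None when the trend is flat or falling, and 0.0 when the
    fitted line is already at or above capacity at the latest sample.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing = (capacity - intercept) / slope  # x where the line hits capacity
    return max(0.0, crossing - (n - 1))


if __name__ == "__main__":
    # Hypothetical daily peak utilization (percent) for one server pool.
    print(periods_until_capacity([50, 55, 60, 65, 70], capacity=90))
```

A straight line is a crude model, since real load is often seasonal or bursty, but even a rough estimate turns "trending higher" into a date by which capacity has to be added.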

Reducing and eliminating costly application outages has been one of the most important discussions within IT for decades. The challenge is that the problem isn't getting simpler to resolve; indeed, it's getting more complex because of the growing interdependence of most applications. But by focusing on the transaction, organizations can make the always-on availability we've all come to expect a measurable and attainable goal.

Biography
Motti Tal is executive vice president for Marketing and Business Development at OpTier.

Talkback

  • availability
    Downtime costs will naturally depend on the business. I worked many years at NYSE, where downtime could cost millions. I used to maintain and run one of their most critical systems. Over 7 years, the total 'unavailable' time was less than 50 seconds.

    Now working in a small shop: 250-300 Intel-based servers (mostly Windows), a couple of IBM AS400s and one small z800 Mainframe running z/OS. On any given day, from one to several of the servers have availability issues, usually outright crashes, with outages ranging from 30 minutes to several hours.
    The AS400s have some availability limitations in the middle of the night, but that is a result of the application design.

    The mainframe has never crashed in the 8 years I've been running it. Any required outages are planned and usually very short. In that time, there have only been 7 or 8 significant (1 hour or more) planned outages and there are brief (10-30 minutes) quarterly outages, plus a monthly 20-minute outage required by the (admittedly) out-of-date application design (I didn't design the application). And we do not make use of any of the mirroring or multiple-mainframe functionality available in mainframe hardware and software architecture.

    Even planned outages to the servers are much more frequent.

    There may be reasons to go the Intel-server route, but application availability is not one of them.
    (Neither is I/O operations. No system does I/O like the mainframe).

    If you want maximum uptime, go IBM Mainframe.
    IBM mainframes are often called dinosaurs, but it is worth remembering that Homo Sapiens has only been around about 150 thousand years while dinosaurs were around 1000 times longer - and I question if we will still be around 150 million years from now.

    "Most IT history of the last 20 years has involved replacing what worked with what sounded good."
    steeleweed@...
    29th Oct 2008
  • RE: The 'real' cost of application outages
    I'm not surprised when an article like this turns into an ad for a particular "solution", but how is BTM any different from the many middleware transaction management solutions of the past? There have been a lot of them, virtually all are now gone.
    alflanagan
    29th Oct 2008
  • Very informative!
    I just read a similar article, check it out: Transaction Monitoring
    Alonben
    1st Nov 2008
  • Real performance monitoring: look out for tractors!
    While we can agree that understanding the detail of transactions can be the key to spotting application-related problems, both front and back end, I cannot agree that this will enable an organisation to, as you say, "... grasp the intricate web of connectivity between those business activities and the IT infrastructure components."

    Measuring specific transactions alone is simply not enough to understand that interconnectivity; it ignores so much else that is going on. Just monitoring key application transactions is like running a road transport operation by measuring the performance parameters of just your super important red vans: you can tell where they are, and when they got there, and learn when there is a hold up. All useful information, but not really as useful as being able to also spot the tractors, broken down trucks and road restrictions that are causing the hold ups. Or having trending information at your finger tips which will let you understand delays are due to "rush hour" traffic rather than a red van problem.

    For real insight you must look at the whole system picture to give context to the performance metrics of individual applications. You need, for example, to correlate application performance with trends in SNMP data to understand link utilisation impacts, and with NetFlow data to attribute and investigate application and user usage. Only then can you optimise, diagnose faults and plan your systems effectively.

    Benny Vogels
    Fluke Networks
    Make IT perform
    17th Nov 2008
