Sometimes, when end users experience application trouble, response times degrade slowly. Other times, applications seem to turn off as if a power switch has been flipped. In many cases, the causes of end user application woes can be identified and preempted before there is perceivable degradation; other times, they cannot. But in all cases of poor end-user application experience, the cause needs to be identified and remedied swiftly.
That is easier said than done. The ability to deliver a consistent, quality end user experience and to fix problems quickly is growing both more important and more complex. It is more important because workers today are more sophisticated, expect a superior experience and are less patient, while dependence on IT has never been higher. It is more complex because the move to virtualization, cloud computing and SOA makes for more intricate application and infrastructure dependencies. That, in turn, makes troubleshooting much trickier.
In fact, business applications are growing more complex by the day. Seemingly simple transactions encompass the end user client, web servers and application servers, databases, mainframes, message buses and, increasingly, services from web service providers, such as public and private clouds. When managing application performance through the eyes of the user, IT teams need light shed on the actual trouble spots. That’s the only way to ensure that SLAs are met, that calls to the help desk about application degradation are minimized, and that web sites are not abandoned because of poor response times.
All this requires a solid understanding of how the health and capacity of every system and device, across all tiers of the infrastructure, support the application, from the network through the servers and databases to the end user. Unfortunately, the way this information is typically gathered is no longer effective or efficient for today’s rapidly changing, dynamic environments.
To date, most organizations have relied on point products instead of gaining a view of the entire application transaction life cycle. They monitor the end user experience, but in isolation. The same is true of database performance, network latency, and the performance of servers, mainframes and other systems. These tools provide slivers of insight, like a flashlight in a dark field. What organizations need is the reach of a spotlight.
The result is siloed information: the data necessary to fix the problem is scattered among a variety of monitoring and troubleshooting tools. To determine the cause of a problem, members of each IT group have to gather with their respective reports, compare numbers and try to work out what went wrong. This is manual correlation, and it should have gone extinct years ago.
While these specialized tools are necessary, and often provide deep insight into the areas on which they focus, they don’t help manage the entire business-technology infrastructure in real time.
Problems can first reveal themselves as shifts in performance measured in milliseconds or fractions of a percentage point of CPU utilization, and then affect the end user minutes or hours later. That makes it clear that organizations need the ability to detect issues before end users feel them. This means not merely monitoring the application, but also tracking the entire transaction flow and monitoring each step of every transaction by measuring response times, latencies, protocol and application errors, and all of the associated dependencies, on every tier from the end user through the data center.
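As a rough illustration of measuring each step of a transaction, the sketch below records the elapsed time and any error for every tier a single transaction touches. The class name `TransactionTrace`, the tier labels and the transaction ID are all hypothetical, chosen only for this example; it is a minimal sketch of the idea, not a description of any particular monitoring product.

```python
import time
from contextlib import contextmanager


class TransactionTrace:
    """Collects per-step timings and errors for one tagged transaction (hypothetical helper)."""

    def __init__(self, transaction_id):
        self.transaction_id = transaction_id
        self.steps = []  # one record per tier, in the order the tiers were visited

    @contextmanager
    def step(self, tier):
        """Time the enclosed block and record it under the given tier name."""
        start = time.perf_counter()
        error = None
        try:
            yield
        except Exception as exc:
            error = repr(exc)  # keep the error with the step, then let it propagate
            raise
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.steps.append({"tier": tier, "elapsed_ms": elapsed_ms, "error": error})

    def slowest_step(self):
        """Return the recorded step with the largest elapsed time."""
        return max(self.steps, key=lambda s: s["elapsed_ms"])


# Illustrative usage: the sleeps stand in for real work on each tier.
trace = TransactionTrace("order-1001")
with trace.step("web"):
    time.sleep(0.01)
with trace.step("database"):
    time.sleep(0.03)

for record in trace.steps:
    print(trace.transaction_id, record["tier"], round(record["elapsed_ms"], 1), record["error"])
```

Because every record carries the same transaction ID, the timings from different tiers can later be lined up side by side rather than compared by hand.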
Consider the behind-the-scenes complexity of a typical online purchase. The buyer adds items to the shopping cart, enters his or her billing information and clicks “submit.” If the user gets an error, the business is most likely lost. This transaction will likely have touched dozens of systems: the underlying infrastructure, applications, databases, a credit card authentication system, and other tiers. Had the capabilities been in place to monitor all of those “pieces” of the transaction, the error might very well have been avoided altogether. For instance, a sluggish database or partner application could have been spotted, and remedied, before ever affecting the online shopper. And errors that cannot be spotted in advance can at least be fixed much more swiftly.
Consider this capability as it applies to shipping physical packages.
In the dark ages of shipping (barely a decade ago), the shipper and receiver knew little more than when a package left its starting point and when it arrived at its destination. Today, packages are tagged and customers can track them in near real time as they pass each waypoint in their journey. Still, there is no easy way to determine, while the package is en route or before it is shipped, whether it will miss its deadline.
It would be useful to have enough detail to predict that a package will arrive late. That data could be culled from slowdowns at the loading dock, the health of the truck’s engine and tires, and real-time traffic information for the truck’s route. Similarly, end-user experience monitoring today provides visibility into application response time and alerts when response times have degraded. Unfortunately, many tools lack the more detailed information necessary to predict and rectify potential performance issues before they arise.
To manage the end user’s application experience properly, organizations need the same capabilities for tracking application performance. They need to tag the transaction at its starting point and monitor it as it makes its way from the end user’s system all the way through the data center and back again. That kind of capability won’t be found in conventional point solutions that individually measure the performance of networks, databases, servers or applications. It will be found only by monitoring application performance from the end user’s perspective, and by understanding how that experience is affected by the real-time health of all of the devices and systems on which the application depends. Ultimately, that means application failures, slow response times and unmet SLAs must not be discovered at the help desk or from upset customers, when the damage is already done and problems are the most difficult to fix.
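To make the tagging idea concrete, here is a minimal sketch of what a shared transaction tag allows: measurements reported separately by each tier are grouped by tag, then compared against a per-tier baseline to point at the likely trouble spot. The function names, the record layout, the 1.5x tolerance and every number in the sample data are invented for illustration and are not drawn from any real system.

```python
from collections import defaultdict


def correlate(records):
    """Group per-tier latency measurements by their transaction tag."""
    by_txn = defaultdict(dict)
    for rec in records:
        by_txn[rec["txn"]][rec["tier"]] = rec["ms"]
    return dict(by_txn)


def tiers_over_baseline(timings, baseline_ms, tolerance=1.5):
    """Return the tiers whose latency exceeds tolerance times their baseline."""
    return sorted(
        tier
        for tier, ms in timings.items()
        if tier in baseline_ms and ms > tolerance * baseline_ms[tier]
    )


# Hypothetical measurements, as if reported independently by three tier monitors.
records = [
    {"txn": "A", "tier": "web", "ms": 40},
    {"txn": "B", "tier": "web", "ms": 42},
    {"txn": "A", "tier": "app", "ms": 55},
    {"txn": "B", "tier": "app", "ms": 50},
    {"txn": "A", "tier": "database", "ms": 310},
    {"txn": "B", "tier": "database", "ms": 90},
]
baseline_ms = {"web": 40, "app": 50, "database": 100}  # assumed normal latencies

for txn, timings in sorted(correlate(records).items()):
    print(txn, tiers_over_baseline(timings, baseline_ms))
# prints:
# A ['database']
# B []
```

Without the shared tag, the slow database reading for transaction A would sit in one tool's report and the user's slow response in another, which is the manual correlation the article argues against.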
Motti Tal is executive vice president for marketing, product and business development for OpTier.