Measurement vs. Perception

Sometimes an IT performance problem has nothing to do with computers or applications, but the general rule still holds: if you can't measure it, you don't understand it and you can't manage it.
Written by Paul Murphy, Contributor

The old adage that you can't manage what you don't measure has an obvious corollary: to improve management, first improve measurement.

I was reminded of this at both the public policy and the personal level this week. On the public policy level there was the widely heralded claim that this year, realistically the coldest in decades, was the hottest on record; on the personal level there was a call from a former colleague facing an improbably complex performance problem.

The temperature claims turned out to be largely the result of cost cutting meeting a converging agenda: pilots need to know surface air temperatures near runways to gauge lift, so the pressure to cut costs at various national weather services has led those agencies to preferentially reduce the number of reporting stations that are not on or near runways, thus significantly raising reported average temperatures.

The other problem was much harder to understand. Basically, the key servers run AIX (so no DTrace); the networking is Cisco (so largely unfathomable); there's a large collection of mostly Wintel clients and the rack-mount applications that go along with them (so no good performance measures there either); and, of course, there's been no long-term focus on performance that might have generated some kind of proxy measures.

He does have some records: the help desk software, helpfully chosen by some long-gone predecessor to maximize positive reporting, offers a bit more than three years of "data"; the SAP error logs go back almost twelve years; and the AIX system logs go back to the latest machine replacement, about two years ago.

He's held the job for about six months and from day one has heard other executives complain that things used to work but now don't. The complaint seems to refer mostly to a perceived significant increase in system response time over the last eighteen months or so, but it is rapidly becoming the lens through which the company sees him.

So the obvious question is, of course, "what changed?" but the less obvious, and as it turned out more important, question is "how do they know?"

He'd done the obvious stuff: looked for network bottlenecks, reviewed the DB logs, had people examine specific corporate PC applications (particularly AD and Exchange) for performance, upgraded a few racks, and talked to both IBM and SAP about it, even going so far as to load and try to run some specific analytic software their guys recommended.

Great, except that he found only a few minor problems, nothing sufficient to affect user perception. In fact, what the logs really show is that the company now does about 10% less business, and thus processes fewer transactions, than it did in the same quarter of 2008.

So what to do? First, ask the right question: if IT can't objectively track performance and nobody has numbers on user response, how do the users know that performance has been degrading?
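The column doesn't say how such numbers would be gathered. As a hypothetical sketch, assuming you can extract per-request response times in milliseconds for two periods from some log source, comparing the median and 95th percentile is one simple way to turn "it feels slower" into something measurable. The function names `summarize` and `compare_periods` are illustrative, not part of any tool mentioned above.

```python
from statistics import median, quantiles


def summarize(samples_ms):
    """Return (median, 95th percentile) of response times in milliseconds.

    Hypothetical helper: samples_ms is assumed to be a list of at least
    two response times pulled from whatever logs are available.
    """
    # quantiles(..., n=20) returns 19 cut points; the last one is the 95th percentile
    p95 = quantiles(samples_ms, n=20)[-1]
    return median(samples_ms), p95


def compare_periods(before_ms, after_ms):
    """Report the percentage change in median and p95 between two periods."""
    b_med, b_p95 = summarize(before_ms)
    a_med, a_p95 = summarize(after_ms)
    return {
        "median_change_pct": 100.0 * (a_med - b_med) / b_med,
        "p95_change_pct": 100.0 * (a_p95 - b_p95) / b_p95,
    }
```

Tracking the 95th percentile alongside the median matters because users tend to remember the occasional slow response rather than the typical one; if neither number moves while complaints grow, the problem probably lies outside the machines.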

The answer turned out to be that the layoffs affected IT too. In particular, there was a DB/AIX guy the users treated as their go-to guy for IT who left just before the last round of layoffs, when about a third of the "user facing" staff followed him out the door rather less voluntarily.

Thus users thought computer response had slowed when, in reality, what had happened was that many of the time-saving, face-to-face shortcuts around IT standards and procedures, which had grown up during the earlier period of relative staffing stability in IT, had disappeared. That left users with the perception that IT had become less responsive, and left user management complaining about overall system response in terms the IT people heard as complaints about computer and application response.

Basically, what happened to my friend was that his predecessor had laid off the wrong people, leaving in place only those most likely to work to rule: a classic case, apparently, of retaining in his own image.

I suggested he sit down with the other executives to discuss the issue while launching both cross-training and a replacement program aimed at replacing the worst of his deadwood with more motivated people. The general lesson here for the rest of us, however, is the same as the one from the hottest-summer nonsense: if you hope to manage well next year, you'd better measure, and measure the right things, this year.
