How do you benchmark real-world work?

Summary: Most of the technical reviews of Windows Vista I've read recently focus on speeds and feeds. But does that granular approach miss the real point of owning and using a PC? Can any stopwatch-based measurement of isolated tasks performed by individual hardware and software components really measure the worth of a technology investment? I don't think so. What really matters is usability, a subject I've been thinking and writing about for nearly two decades now. But what's the best way to measure usability? The answer isn't as simple as you might think.


Adrian Kingsley-Hughes and I have been focusing lately on a tiny aspect of PC performance. He ran two sets of file management benchmarks on a test PC in his lab; I performed similar tests on a machine in mine. The results? Inconclusive.

But are both of us missing the real point of owning and using a PC? Can any stopwatch-based measurement of isolated tasks as performed by individual hardware and software components really measure the worth of a technology investment? I don't think so.

This is not a new question for me. Back in the early 1990s, when I was editor of the late, lamented PC Computing, we differentiated our product reviews from those of our sister publication PC Magazine by focusing on usability. The highly regarded PC Magazine Labs was the quintessential "speeds and feeds" shop. We went to the extreme of spending a small fortune (I still remember the budget battles) building a state-of-the-art usability lab and hiring usability professionals to run it.

I liked our reviews better than the ones at PC Mag because we didn't have a one-size-fits-all conclusion. Instead, using the usability data, we tried to determine which product was a better fit for readers and prospective buyers with different needs. I think that approach still works today.

In the Talkback section of my earlier post, there's a lively discussion of what sort of benchmarking would work better than flawed speed tests that don't map to real-world activities. The short version, from commenter frgough, is that Adrian and I should

simply do stopwatch tests on their normal daily workflow and see how the two operating systems compare, because, at the end of the day, that's what it comes down to.
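
To be fair, the stopwatch itself is the easy part to automate. Here's a minimal sketch in Python, using a hypothetical file-copy workload as a stand-in for one scriptable step of a daily workflow (the workload, file size, and repeat count are purely illustrative, not anything Adrian or I actually ran):

```python
# Stopwatch harness for a repeatable "workflow step" benchmark.
# The file-copy workload below is a hypothetical stand-in; any
# scriptable task could be dropped in instead.
import os
import shutil
import tempfile
import time

def timed(task, repeats=5):
    """Run task() several times and return per-run wall-clock seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        times.append(time.perf_counter() - start)
    return times

def copy_workload():
    # Create a scratch file, copy it, then clean up.
    scratch = tempfile.mkdtemp()
    src = os.path.join(scratch, "sample.bin")
    with open(src, "wb") as f:
        f.write(os.urandom(1024 * 1024))  # 1 MB of junk data
    shutil.copyfile(src, os.path.join(scratch, "copy.bin"))
    shutil.rmtree(scratch)

runs = timed(copy_workload, repeats=3)
print("runs:", len(runs), "best: %.4fs" % min(runs))
```

The hard part isn't this harness; it's deciding which tasks belong in the workflow and what the resulting numbers actually mean.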

Easier said than done. Here's a short list of lessons I learned from the PC Computing usability lab that are still valuable today:

Preconceptions affect perceptions. In the case of Windows Vista, that's a double whammy. The relentless drumbeat of "Vista sucks" press coverage is pretty hard to ignore. Try to find a usability tester who hasn't read any of that coverage and doesn't already have a bias going in.

Bad experiences affect perceptions too. The negative reviews of Vista are in many cases grounded in painful reality. There's no doubt that bad drivers, bugs in Vista itself, and crappy OEM hardware configurations caused a lot of early adopters to have unpleasant experiences with Windows Vista. Those initial impressions affect perceptions in a fundamental, hard-to-shake way. Even a minor problem can be painful if you don't know the solution. If it requires indeterminate amounts of troubleshooting to figure out why something doesn't work the way it's supposed to, that can be a deal-breaker.

The older, established system has a built-in advantage. Switching to a new computing platform involves unlearning old ways and learning new procedures (just look at the advice offered to people switching from Windows to a Mac). Initial productivity will be lower on the new system.

Are you testing learnability or usability? One trap that usability professionals warn about is the danger of disproportionately crediting a product that has a great out-of-box experience but doesn't deliver over the long haul. Jeff Atwood offers an excellent summary of the issues, capped by this great quote from Joel Spolsky:

If you did a usability test of cars, you would be forced to conclude that they are simply unusable.

Faster isn't always better. Simply measuring productivity by seeing who finishes first doesn't necessarily give you the right answer either. In the hands of someone who knows a system well, even a terrible design can be highly efficient. I can be tremendously productive at a command prompt and can probably finish many tasks faster with command-line tools. But if you forced me to choose between a command-line interface and a GUI for daily work I would choose the latter every time. I don't miss MS-DOS.

Sometimes there is no right answer. I talked with a usability professional at Microsoft recently who described an all-too-common real-world dilemma. The interface designers had to decide how the up arrow should work in a particular feature. There were only two possible choices. The trouble is, usability testing proved conclusively that 50% of the test subjects thought it should work one way, and 50% thought it should work the other way. No matter which design you choose, half of your customers will think you designed an unintuitive interface.

Ultimately, for mainstream business use and everyday consumer scenarios, I think usability is the key to measuring how well a piece of hardware or software performs. The trouble is finding the right metrics to measure usability.

I'm interested in your thoughts. Regardless of which computing platform you use, what aspects of usability are important to you? Leave your thoughts in the Talkback section.

Topics: Hardware, Microsoft, Windows



  • Usability

    There are three things that would dramatically improve my real world work.
    The first thing is relatively small, but it annoys me: having to watch the start-up of a new application under Vista. I am referring to that spinning circle of slowness. If I want to start an application, it should start now, or at least give the impression that it is listening. It is like the scene in Spaceballs where the commander always wants to prepare something. Why don't you just go for it? I know it is perception only, but perception counts.
    The second and third issues have to do with a paradigm shift. We still assume that there is one average user who works on one desktop (or we assume an enterprise environment where everything is organised by an IT department). We therefore focus on making everything monkey-proof and install settings locally. This may be sufficient for most users, but wouldn't it be nice if a distinction were made between an average and an experienced SOHO user? I spend quite a lot of time aligning settings over multiple PCs or using a laptop in different environments. Yes, I know you can e.g. sync bookmarks, but have you tried syncing between Firefox and Opera? Or working on a client's business PC that doesn't allow USB keys?
    So let's also focus on a substantial subgroup that doesn't work in one place on one computer and isn't satisfied with the default of 4 recent files in an Open file dialog. Oh, and I don't want to store my settings on a Microsoft or Google server. I'd rather have it managed by my own provider. Thank you.
    Frank from Holland
    • I agree with you...

      ...on the paradigm shift - differing security practices and software packages can more often than not be a hindrance.

      The only software on my Vista machine that is slow to start up is Firefox, but after the lacklustre, memory-hogging FF2 and the fairly appalling FF3 beta, I think it won't be around on my HDD much longer.
      • Agree?

        You jest, I'm sure! FF3 is faster, very much so, and has yet to freeze like IE7, which is a joke as far as browsers go!
        • Personal preference

          Could just be my install; I'll try cleaning it right out and then reinstalling it. But the new interface irritates the life out of me. That weird drop-down thing on the address bar, and the silly mismatched back and forward buttons...

          It's a personal preference thing, but at the moment FF isn't doing anything for me that IE7 doesn't do as well, and quite a few things that IMHO IE7 does better, for me at least.

          It's a shame, I was a really big advocate of FF in the past.
    • Professional Versus Personal

      Hi everybody! I work as a computer technician and spend a lot of time working with consumers as they deploy technology both on personal and professional levels.

      While I have read many of the comments about Vista, I decided to reserve judgment until I could test it personally. I purchased a new HP AMD Dual-Core laptop and gave it 4 gigs of DDR2 RAM and then ran some tests of my own.

      Vista is, by far, the "prettier" of the operating systems (although XP can be "tricked out" with some add-on software). I ran a memory and cpu tracker to see how things did as I ran through my daily work and here's what I found:

      1. Vista does use up more hardware. It used about a gig of RAM just sitting there. Of course I had some bells and whistles going in the background, but even a comparably equipped XP load used only about 750 MB.

      The upshot of this is you simply can't put Vista on "older" machines. For most IT people, this is straightforward, but to most home-users, this can be confusing. (I've seen XP run on a K6-2 500 with 128 MB - it wasn't pretty, but it ran.) Vista simply needs a LOT more hardware and is not necessarily compatible with the peripherals and programs you may already have in your possession. If you have the money and can afford the hardware, buy Vista. If you don't need it, XP will work.

      2. Vista, even with SP1 and defragged, took about 20-25 seconds to load. XP took about 15 seconds.

      The upshot of this is that once they loaded, I noticed almost no difference in actual execution of tasks (beyond the Vista - "
  • Real-world vs. Synthetic

    I think that what's important to point out here is that there are plenty of real-world benchmarks that can be carried out. File copy is one, but I can think of dozens of others - rendering a video, ripping a CD/DVD, converting an audio file, FPS in a game ... the list goes on and on. Sure, it's debatable whether you're benching the OS itself or the applications within the OS, but either way the process is clear - you put real-world work in one end and you get real-world data out the other.

    Then you have synthetic benchmarks. PassMark, FutureMark ... these have less to do with the real world. You end up with a number at the end, but relating that number to reality can be hard.

    Then you have the middle ground that you can't bench. Benching tasks such as word processing, email, browsing and so on is tricky, and you end up relying far too much on macros. These tests might look "real world" but I've long been suspicious of the results.

    Adrian Kingsley-Hughes
    • I think it's a shame you benchmarked with a Pentium D

      C'mon, that's legacy hardware more than 3 years old. Do you think that meets Vista's hardware requirements (well, BARELY)?

      I think you should rerun your tests on a brand-new machine today and add that to your comparison.
  • Benchmarks are definitely not...

    the tell all. As a matter of fact, those benchmarks won't have one iota of bearing on my decision of what OS or hardware to invest in. If I were buying for gaming, it may be another thing.

    Usability and that "warm fuzzy feeling" of confidence in the product trump all of the benchmarks put together.
    • Spot on in an immature market...

      Once markets mature, name recognition and "warm and fuzzy" only take you so far. Look at the ISP market for instance. AOL cornered the market on warm and fuzzy and were rewarded in spades during the internet boom when that market was in its young, immature stage. But look at what's happened. The marketplace matured. People began to realize that warm and fuzzy was actually getting in the way. Now, I don't know anyone who uses the kind of warm and fuzzy service AOL used to offer (I suppose they still do). One would probably guess that buying American automobiles was once a warm and fuzzy experience. That market matured too, and now there are literally dozens of international players, many of whom are eating American carmakers for lunch. I'd suggest that we've already begun moving beyond the need for warm and fuzzy in IT. Now there isn't just one or two products in any particular vertical market to have confidence in. Good solid competition trumps warm and fuzzy on most days.
      • Agreed but...

        we're talking about benchmarking OSes and not markets. I get the warm fuzzies (sorta) with XP, but Vista leaves me cold.
        • My in-laws love Vista

          As a home user, they love all the additional media enhancements that provide that warm and fuzzy feeling with Vista. To them, compared to their last 2 computers, the new OS along with the solid hardware upgrade has been a boon to their usability and stability.

          So personally, I think Vista is more usable for home users, but business users are better off sticking with XP Pro because usability in the office isn't likely to produce a productivity up-tick that will ever offset the cost of the upgrade.

          As the article suggests, different users have different applications for an OS. Those different needs can determine usability. Most people would never think about setting up a Windows XP box to run a high-traffic website; you would install Windows Server or a *nix server instead. Different OS for different needs.
  • Usability improvements equal...

    1. Not having to learn new locations or names for existing, well-established commands.
    2. Faster operation of the most common tasks, like file copies or opening a spreadsheet.
    3. New features or shortcuts to reduce the number of mouse clicks or keyboard use for the most common tasks.
    Unless an 'improved' version of an OS or application addresses these items, you can probably do without it.
  • So what you're telling us...

    is that anyone who spent time reading your benchmarking article just wasted their time. That might be something you want to keep from your readership in the future. There's nothing quite like reading some self-important blowhard explain why someone else is wrong and they are right, only to turn around and invalidate their own findings, explaining it away as "well, it's not really meaningful anyway". If benchmarking isn't meaningful then please, for god's sake, stop benchmarking and using the results to make a point. Otherwise, let's have an open and honest discussion about how benchmarking applies to the real world (I'd suggest that it does; otherwise there really is no performance metric and we're all being taken for fools by the continual upgrade process). Why tout faster processors, improved code or anything else that's supposedly there to improve performance if performance isn't something that can be quantified? It's like saying "Buy my product, it'll make you feel better about yourself." Sure, there are plenty of idiots who will buy into that marketing pitch. This market is supposed to be maturing; how about we act like it?
    • It's a response

      I think this blog is a response to the last few benchmarking blogs, where commenters have asked for "more realistic" benchmarking than a few simple tasks like copying or opening zipped files.

      Everyone wants benchmarks, but then when they don't agree with the benchmarks, they demand other, more 'meaningful' benchmarks (to them). So the logical next question was "how do we do real-world benchmarks?" - which is the point of this blog.
  • Subjective vs. objective

    After thinking about it some more, this is the kind of review I'd like to see.

    A description of your workflow.

    An overall stopwatch test of total time to complete the same workflow in both
    operating systems. It took me X time in OS A and Y time in OS B.

    What helped you get your work done in each OS and why, and what hindered
    getting your work done in each OS and why.

    And learning curve is a valid part of that evaluation. If a new version of the OS has
    a significant learning curve, then that's something potential buyers for whom time
    is money need to know. And if the learning curve was worth it because you were
    wicked fast and worked with a smile on your face after you got the learning down,
    that's good to know, too.

    I think we don't see this kind of review because it's a lot of work. Running
    benchmarks is something you can do while taking your wife to dinner.
    • To beat the tired old car analogy

      If you read an automobile review, you'll get horsepower and torque and 0-60, but
      you'll also get smoothness of ride, road handling, dashboard controls, sound system,
      and a bunch of other subjective factors and the reviewer's opinion on how enjoyable it
      made the whole driving experience.

      But, I guess, according to some other poster on here, that's silly marketing fluff for an
      immature market.
    • Quick and dirty example

      Here's a quick and dirty example comparing OS X and XP, both of which I use
      extensively, so there is no learning curve.

      Things that help my work in XP that aren't in OS X:

      The ability to resize windows from any side or corner (I do a lot of window sizing
      and positioning for screen shots).

      The ability to type a network share path directly into an Explorer window (I need to
      connect temporarily to a lot of servers during the day, and they change from day to day).

      The fact that the above also works when connected via VPN.

      That I can browse servers in XP when connected via VPN (network browsing in OS X
      breaks over VPN connections)

      What hinders my work in XP:

      Inability to put shortcuts to folders or programs in the Explorer toolbar like I can in
      OS X's Finder window.

      Inability to have multiple start menus so that I can have task-specific menus (like
      stacks in 10.5.2).

      Inability to drop a folder in a start menu and have its contents display hierarchically.

      Limited screenshot tools (I make heavy use of screen shots)

      Inability to generate PDFs for document distribution.

      The above is just a quick and dirty example of things off the top of my head.

      It is my opinion that this kind of review would be more useful than simple
      benchmarking because it would give readers a feel for how the OS would work for
      them, or that your annoyances aren't for them because they do different things.

      Just some thoughts from your friendly neighborhood blowhard.
      • But do these translate into increased/decreased productivity?

        For example, how much time are you really saving by having windows that resize from any side? Or losing because you don't? Unless a significant amount of your day is spent performing window-resize operations, I don't think you gain or lose productivity. Though it might feel as if you're being more productive with Windows.

        This is not to say little touches aren't welcome. Or that little annoyances don't exist. But in the end you gain a little with benefit x and lose a little with detriment y.
        • Don't underestimate the annoyance factor

          People work better when they enjoy it.
          • I agree. But every platform has its annoyances.

            That offsets the gains. In the end, all platforms have their positives and negatives. Unless one of them touches a primary function of what you do, they're unlikely to influence productivity.