Hardware comparability in benchmark comparisons

Summary: Many people believe that performance comparisons of an application running under competing OSes are fair only if the hardware is identical. This is wrong: identical is unfair if the configuration is rigged to favor one OS over the other. Instead, the base hardware should be the same, with competing teams of experts encouraged to configure that base system to their best advantage.


Last week's discussions under my "purloined benchmark" title included this bit from ShadeTree:

if you are comparing OS performance....

... and you are not using the same hardware, you cannot tell if the performance is related to the software or is merely a reflection of better hardware. I proposed you choose the hardware and run a benchmark. I suspect that the results will be the same regardless of the hardware chosen. As I have stated before, I do have a background in Unix and in fact started there. I am also involved in the deployment of Linux. I have also stated that Linux has a role in IT, albeit not on the desktop. As for the configuration in the benchmark, I believe it to be typical of many small business systems. I don't think it was biased in any way.

Quite a lot of people believe this - basically that you can only test which of two or more OSes better supports an application by running the application on identical hardware under each OS. In my opinion, however, the core hardware should be the same, but the configuration should be adapted to show each OS at its best.

The specific configuration I object to in the Pedigo benchmark is this:

For the purposes of these tests, it was important that the servers for both the Linux and Windows clusters be configured as identically as possible. The following configuration options were chosen for the RAC clusters:

  • Four HP® ProLiant DL380 G4
  • Two Windows servers

    • RacBench1
    • RacBench2

  • Two Linux servers

    • racbench3
    • racbench4

  • Two Intel EM64T Xeon processors per server (4 logical processors with Hyperthreading), 3.4 GHz
  • Two 36 GB SCSI disks per server, configured as RAID 1
  • 8 GB RAM per server
  • 8 GB swap space/paging file per server
  • Two Gigabit NICs per server
  • One Qlogic 2340-E HBA per server

Suppose I were to argue, in the context of the workload selected and the metric applied, that this configuration should be changed on all four servers by:

  1. adding another 8GB to each;
  2. adding two more storage connectors to each;
  3. replacing both 1Gb NICs with a single Neptune 10Gb card. I know this card wouldn't work on the 2004 Linux release used in the benchmark, but let's pretend - and if you don't like that, just pick the fastest single card supported in that release;
  4. adjusting Oracle's set-up to separate logging and sort I/O from regular I/O while setting the SGA limits and block sizes to their respective maximums for this Linux configuration.
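To make item 4 concrete, here is a hedged sketch of what that Oracle tuning might look like. Everything below is an illustrative assumption - Oracle 10g-era syntax, with sizes and file paths invented for the example, not taken from the actual benchmark:

```sql
-- Raise the memory limits to use the proposed 16 GB (values are assumptions):
ALTER SYSTEM SET sga_max_size = 12G SCOPE = SPFILE;
ALTER SYSTEM SET sga_target = 12G SCOPE = SPFILE;
-- Let sorts work in memory rather than spilling to disk early:
ALTER SYSTEM SET pga_aggregate_target = 2G SCOPE = SPFILE;
-- Move redo logging off the data spindles (paths hypothetical):
ALTER DATABASE ADD LOGFILE GROUP 5 ('/u02/redo/redo05a.log') SIZE 512M;
-- Give sort spill its own tablespace on a separate controller:
CREATE TEMPORARY TABLESPACE temp_sort TEMPFILE '/u03/temp/sort01.dbf' SIZE 8G;
```

The point is not these particular values but the separation: logging, sort spill, and regular data I/O each get their own path to disk, which the extra storage connectors in item 2 make possible.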

Now if we were to run the same benchmark workload, and apply the same metrics, Linux would win easily because:

  1. a Windows server application that will just barely fit in 8GB will run measurably more slowly in 16GB - but the paging delay encountered on Linux at 8GB will simply disappear.
  2. the use of only one network card will be a performance killer for Windows but actually lead to a slight throughput improvement for Linux;
  3. the Windows preference for mirroring the two internal drives is now effectively effectless for Linux - since almost all of the paging has disappeared; and,
  4. the larger Oracle limits will have more effect on Linux than on Windows.

Notice, however, that the configurations are still identical - and so, according to ShadeTree and many others, the comparison is still completely fair.

In reality, however, it isn't - because identical is unfair and misleading if the configuration markedly favors one OS over another.

The fair thing to do, therefore, is to have competing teams of experts configure comparable base machines to reflect both the OS of their choice and the application - because, bottom line, Linux isn't Windows and it doesn't use identical hardware identically.

Topics: Windows, Hardware, Linux, Open Source, Operating Systems, Servers



  • What's fair? PRICE

    My former employer had a yearly workstation benchmark process. We would invite the big four UNIX workstation vendors (HP, Sun, IBM, SGI), give them the applications and test matrix - and let them tune their boxes to their hearts' content. After the benchmark was complete, we looked at price/performance numbers and made an informed decision. Bear in mind that this company purchased about 1000 workstations per year - which is enough to keep the vendors interested and "honest". Even the losers of the benchmark came back with lower quotes to win business (SGI lost one year and we ended up buying O2 workstations for a couple grand - a rock-bottom price in those days).

    It was relatively simple to benchmark UNIX CAD/CAM workstations - as the user experience is not too hard to measure. We tried looking at this same process for servers, but the costs of running it were high. To stress test a large infrastructure application, you would need server, SAN, Disk arrays, network, power and SOFTWARE LICENSES for things like Oracle, Datastage, Websphere, etc. Not to mention that some apps don't run on some platforms, so you need "equivalents". Then you have to compare things like - does support come with the platform (bundled) or separate (a la carte)? Just looking at a bill of sale for a large server is pretty disheartening (IBM P690: Power cord - $1000, Front rack door - $5000, Rear rack door - $4000, Total rack price (no computer) $40k). Would price/performance be better if we left off the doors? . . .
    Roger Ramjet
    • Yes

      1) I've always liked competitive run-offs. You generally get the best price/performance balance that way. (Unless, of course, you agree to pay per diems for the set-up, then you just lose your shirt)

      2 - the issues you raise about the packaging are interesting. On the P680 I had the misfortune of working with, the packaging exceeded the cost of the processors. Oddly, it was a great little machine - had it run anything other than AIX it would have been great: fast enough, integrated storage, reasonably reliable - but the real cost exceeded the quoted cost by over 50% precisely because of things like $1K doors that turned out to be integral to the thing's cooling plan. HP played the same game - ugh.
      • Well now...

        the p680 is a very different box from the p690.
        The p680, also known as the S80 (which was basically an AS/400),
        was RS64-based, while the p690 was POWER4-based. Big difference.

        // Jesper
  • Once again you state as facts ....

    ... things you believe to be true but offer no supporting evidence or benchmarks to support your claims.

    Let's examine them one by one:

    "a Windows server application that will just barely fit in 8GB will run measurably more slowly in 16GB - but the paging delay encountered on Linux at 8GB will simply disappear."

    This is a totally false statement. Windows will make use of the additional RAM and will speed up.

    "the use of only one network card will be a performance killer for Windows but actually lead to a slight throughput improvement for Linux;"

    I don't know what you are trying to prove here, other than that Linux cannot handle multiple NICs effectively. The Windows box will handle either dual 1Gb or a single 10Gb NIC very effectively and will not suffer the performance degradation you claim. By the way, 10Gb networks are not exactly in common usage in the enterprise, let alone small to medium size businesses.

    "the Windows preference for mirroring the two internal drives is now effectively effectless for Linux - since almost all of the paging has disappeared; and,"

    The additional RAM will also reduce the paging in Windows. I don't know what mirroring drives has to do with it. Is that just misdirection?

    "the larger Oracle limits will have more effect on Linux than on Windows."

    Depending on the cluster size and drive layout, the changes to Oracle will be no more effective on Linux than on Windows.

    In conclusion, you have made a lot of assertions without support, many of which are totally false. You threw in hardware that would not typically be used on this class of machine and added tweaks to the application that have nothing to do with the premise that the hardware favors either OS. I am still not convinced.
    • To answer one question.

      [b]I don't know what mirroring drives has to do with it.[/b]

      The original test mirrored the swap partitions (as has been discussed in this very blog), which creates an unnecessary performance hit.
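For readers wondering what the alternative looks like: the Linux kernel will interleave page-outs across multiple plain swap areas on its own when they share a priority, so mirroring swap through RAID 1 only adds a duplicated write per page-out. A minimal /etc/fstab sketch - the device names are illustrative assumptions, not the benchmark machines' actual layout:

```
# Two plain swap partitions, one per physical disk, at equal priority:
# the kernel stripes page-outs across them instead of paying the
# mirror's second write on every page-out. (Device names assumed.)
/dev/sda2  none  swap  sw,pri=1  0  0
/dev/sdb2  none  swap  sw,pri=1  0  0
```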

      ...and you claim you deploy Linux at work...
  • Agree with ShadeTree: baseless arguments


    In many of your arguments (particularly about hardware) you throw out comments as if they were facts. Back when Core 2 Duo was released you argued for AMD's performance superiority even when Core 2 Duo was blowing it away. You've argued the superiority of Niagara when, again and again in multiple third-party benchmarks of web server and database applications, it was outpowered. You've categorically dismissed Windows as less secure than Mac OS X and Linux, yet in two contests the Mac fell fastest and first, and due to Apple code, not third-party code.

    You're entitled to your opinion, but opinion that is simply based on bias is not worth anything. Quite frankly, do some work with Windows so that you have personal experience, not just hearsay. It is almost as if you live in a dream world.

    Some of the other ZDNet bloggers did experiments. George Ou, Adrian K., Ed Bott and others try stuff and test their assumptions. Test yours, rather than read trade journals or talk to buddies who share the same bias.
    • oh boy - some answers

      1 - I think that if you look at what Intel is doing now and maybe read some industry materials, you'll find that they're copying AMD's integrated memory controllers and high-bandwidth interconnects. Core 2 Duo is faster than AMD now only because Intel can fabricate at a smaller process scale - but back then AMD blew it away, and now Intel is copying it.

      2 - I don't think there are credible benchmarks showing Xeon competitive with Niagara. If there are, point me at them and I'll discuss them in detail in this blog.

      And no Ou et al are not credible sources.

      3 - Yes, MacOS X fell first in the last and most widely touted test. Linux fell last (or did it fall at all? - I'm not sure). What's going on there is that it takes two steps for an attack to work: first find a software bug, then exploit it. With both wintel and mactel on Intel the exploit is easy, so it comes down to finding a software bug. The bug exploited to bring down mactel in that test was in third party software - software available for both wintel and mactel and equally vulnerable on both.
      Further, if you read what I write, you'll notice that I'm now generally saying that Apple's in a security panic because Intel leaves them far more vulnerable than they used to be - i.e. they're desperately trying to protect a reputation really earned by the PPC architecture they've abandoned.
      • Technically it didn't fall.

        I'm still not sure if it could have been completely owned in the same way (via Flash), as there is no real user/admin privilege differentiation on OS X, while a Linux hack may only gain user privileges, with a secondary hack required to gain root privileges.

        Can't say for sure, but as Firefox and, by extension, Flash would be running in userland, it's likely it would have taken more effort to completely rootkit the box.

        Unless it was running Ubuntu of course, in which case you're pretty much hosed. Great distro, terrible security policy.
        • Any exploit that targets a buffer overflow ....

          ... can completely own NIX the same as it does Windows. You are outside of the controls placed on the system by the OS.
          • Ah

            Didn't know any details had been released. I can't believe buffer overflows still happen... no wait, I've met some pretty incompetent programmers.
  • RE: Ou is not a credible source

    "And no Ou et al are not credible sources."

    Ou is not a credible source as compared to, say, who? George may not be picture perfect, but who in blogging - or even IT - is? As for effort, he can hardly be faulted. Perhaps you'll enlighten us on what makes one "credible" in your opinion.
    • Real World IT

      The problem with Ou is that he often seems extremely out of touch with enterprise computing. He obsesses over things like boot times, the size of executables, cobbling together a cheap PC or server from components, etc.

      He also tends to shut his mind to considerations outside his own devising. Given that he seems pretty out-of-touch with enterprise computing, this is a problem.

      That being said, at least he makes an effort to defend his assertions. He also seems pretty smart and occasionally insightful, although often this just makes him look like a troll because he seems smart enough to know better.

      Basically, he seems to conflate personal computing, small-business IT, and enterprise IT into the same thing. This is probably good for his readership.

      Murph has the opposite problem. As ShadeTree pointed out, he makes statements as facts then doesn't bother to defend them. I often get the impression that he doesn't care about computing outside of megacorporations and governments, or at least doesn't think it's important.

      Murph's pieces are often high-minded, and I think may be written simply because he likes to hear himself type. I would say he likes to read his own writing, but then they would be better proof-read. If people happen to read it and post comments that's gravy.

      For me it often boils down to Ou being wrong in concept but right in details, while Murph is right (or at least interesting) in concept but dubious in details.

      If the concept is wrong then I don't care about the details.
      Erik Engbrecht
      • Thanks

        In my mind the reason I often don't defend claims like that about Windows slowing down as you add unused memory is twofold:

        1 - if you know, you know - and if you don't the amount of proof needed is prohibitive for a blog; and,

        2 - the question as raised by ShadeTree suggests that he didn't understand the nature of the benchmark run and the gaming involved in getting the throughput to use just less than 8GB on Windows Server. Again, the amount of verbiage needed to explain and re-explain is just prohibitive.

        And... if I'm wrong on details (fact, not opinion) I'll be embarrassed but glad someone pointed it out to me.
        • Defending assertions

          If you make an assertion, you must be prepared to defend it in order to maintain credibility. This especially applies to issues of fact.
          No matter how much "verbiage" is required. (You might wish to consider a different word; you've admitted to surplusage.)

          "You either agree or you're wrong" or, as you put it, "if you know, you know", is a challenge and not a substitute for an explanation.

          Any idea can be stated in a single sentence. Incompletely, unsupportedly, but communicating what's essential. And serving as the topic sentence for an explanation or explication.
          Being unable to simplify often signifies a limit to understanding.

          Comments are brief arguments. If necessary proofs or descriptions must be left out to fit, then the topic selected has been too large.
          The solution can be to limit the subject matter by breaking it up into pieces which can be fully discussed within the available space.

          Sorry, Murph, but your defense as expressed here seems more an admission that you're willing to argue only incompletely, and that the correctness of your views is unquestionable on certain aspects of the topic you selected.
          Anton Philidor
        • Because "I said so" is not a ....

          ... very convincing argument. It takes a lot of nerve to indict your colleagues' credibility when your own is so obviously lacking.
          • Which is interesting.

            Given that this has been the entirety of your counter-argument. In some cases, you've even managed to prove your own ignorance, rattling on about how little you think the Linux kernel could have changed in two years, repeatedly, until I brought up the changes from three months, and now criticising Linux on bases you don't understand.

            Here's a tip: there is always overhead when using multiple NICs to access the same network, therefore there will always be gains from using only one NIC. If one O/S needs to use two NICs because it can't get the speed it needs out of one card but another could do the same job with one, it's the [u]former[/u] O/S that needs work. It is, of course, possible that Windows has better packet-splitting code, but then, if it can't handle a one-card workload on one card, it'd need it.

            This is all completely irrelevant to your argument that Murph's claims are 'just opinion' of course, because if that's the case then, obviously, you can't even use your misreading of his arguments as any claim for Windows or against Linux because you're just using opinion to back up opinion, right?

            Of course, that you've completely misread that particular argument leads me to the opinion that your opinion isn't worth much.

            The argument, of course, is that if Linux can handle the workload on one card while Windows requires two, Linux can kill that overhead while Windows can't and so a performance advantage is gained.

            Dom mah?
          • You missed this conversation by a mile.

            Here is a tip for you: don't start a conversation by rambling on about a previous one. Now to the discussion at hand. The hardware used in the benchmark did have two Gigabit NICs. Only Murph claimed this was an advantage for Windows. Whether the performance win for Windows was influenced by that remains in question. You claim there is overhead associated with two cards and that Windows requires two to do the same work as one in the Linux box. What you failed to notice is that this compares two 1Gb NICs to one 10Gb NIC. Hardly a fair comparison.

            You then gloss over the fact that Murph claims you need two more storage controllers to advantage the Linux box over the Windows box. Wouldn't your same overhead argument apply there also? Is it your contention that because Linux needs two to do the work of one, it needs work?

            This whole discussion goes back to where I showed a benchmark where Windows beat Linux on the same hardware. That wasn't opinion; that was fact. Neither Murph nor you has offered anything viable to refute it. Sure, you made a paltry attempt at it by showing benchmarks on different hardware running different applications and acted like that proved anything. So, for the record, you need to get your act together before you go off half-cocked.

            Dom mah, right back at you!
          • I didn't claim any such thing.

            I used a lot of 'ifs'. I was being hypothetical. At no point did I actually make a specific claim, other than regarding your knowledge of networking.

            [b]You then gloss over the fact that Murph claims you need two more storage controllers to advantage the Linux box over the Windows box.[/b]

            Not so much glossed as missed. With regards to the network cards, I was only reiterating what the previous argument was (again, not actually making any claims). With regards to the storage controllers, I'm not sure what that's about, maybe it's the lack of Intel Matrix drivers for Linux (not actually a problem, the hardware still facilitates excellent software RAID), or maybe it's something that disadvantages Windows rather than helping Linux. Maybe Linux needs multiple storage controllers, or maybe Windows can't handle all those extra controllers properly.

            With regards to the NICs, they are, again, identical. So what's the problem this time? It is, after all, not going to give any advantage to either system. Right?


            Of course, what's really missing the point is [u]that that's the point[/u]. It's identical hardware designed to [u]give Linux an artificial advantage[/u].

            Here's something you appear to have 'glossed over':

            [b]Notice, however, that the configurations are still identical - and so, according to ShadeTree and many others, the comparison is still completely fair.

            In reality, however, it isn't - [i]because identical is unfair and misleading if the configuration markedly favors one OS over another.[/i][/b]

            That's straight from the article. You might want to read it through properly next time.
          • I didn't gloss over anything!

            I never said the new configuration was unfair. If Murph had tested and supplied data to back his theories with his preferred configuration, I wouldn't have a problem with it at all.
          • So...

            why did you even bring up the NICs? Why bring up the extra storage controllers?

            On the one hand your arguments depend on operating systems behaving differently on the same hardware; on the other, they depend on hardware having no effect. Which is it?