Hardware comparability in benchmark comparisons

Many people believe that a performance comparison of one application running under competing OSes is only fair if the hardware is identical. This is wrong: an identical configuration is unfair if it is rigged to favor one OS over the other. Instead, the base hardware should be the same, with competing teams of experts encouraged to configure that base system to their best advantage.

Last week's discussions under my "purloined benchmark" title included this bit from ShadeTree:

if you are comparing OS performance....

... and you are not using the same hardware You cannot tell if the performance is related to the software or is merely a reflection of better hardware. I proposed you choose the hardware and run a benchmark. I suspect that the results will be the same regardless of the hardware chosen. As I have stated before I do have a background in Unix and in fact started there. I am also involved in the deployment of Linux. I have also stated that linux has a role in IT albeit not on the desktop. As for the configuration in the benchmark I believe it to be typical of many small business systems. I don't think it was biased in any way.

Quite a lot of people believe this - basically that you can only test which of two or more OSes better supports an application by running the application on identical hardware under each OS. In my opinion, however, the core hardware should be the same, but the configuration should be adapted to show each OS at its best.

The specific configuration I object to in the Pedigo benchmark is this:

For the purposes of these tests, it was important that the servers for both the Linux and Windows clusters be configured as identically as possible. The following configuration options were chosen for the RAC clusters:

  • Four HP® ProLiant DL380 G4
  • Two Windows servers
    • RacBench1
    • RacBench2

  • Two Linux servers
    • racbench3
    • racbench4

  • Two Intel EM64T Xeon processors per server (4 logical processors with Hyperthreading), 3.4 GHz
  • Two 36 GB SCSI disks per server, configured as RAID 1
  • 8 GB RAM per server
  • 8 GB swap space/paging file per server
  • Two Gigabit NICs per server
  • One Qlogic 2340-E HBA per server

Suppose I were to argue, in the context of the workload selected and the metric applied, that this configuration should be changed on all four servers by:

  1. adding another 8GB to each;
  2. adding two more storage connectors to each;
  3. replacing both 1Gb NICs with a single Neptune 10Gb card (I know this card wouldn't work on the 2004 Linux release used in the benchmark, but let's pretend - and if you don't like that, just pick the fastest single card supported in that release); and,
  4. adjusting Oracle's set-up to separate logging and sort I/O from regular I/O while setting the SGA limits and block sizes to their respective maximums for this Linux configuration.

Now if we were to run the same benchmark workload, and apply the same metrics, Linux would win easily because:

  1. a Windows server application that will just barely fit in 8GB will run measurably more slowly in 16GB - but the paging delay encountered on Linux at 8GB will simply disappear;
  2. the use of only one network card will be a performance killer for Windows but actually lead to a slight throughput improvement for Linux;
  3. the Windows preference for mirroring the two internal drives now has almost no effect for Linux - since almost all of the paging has disappeared; and,
  4. the larger Oracle limits will have more effect on Linux than on Windows.

Notice, however, that the configurations are still identical - and so, according to ShadeTree and many others, the comparison is still completely fair.

In reality, however, it isn't - because identical is unfair and misleading if the configuration markedly favors one OS over another.

The fair thing to do, therefore, is to have competing teams of experts configure comparable base machines to reflect both the OS of their choice and the application - because, bottom line, Linux isn't Windows and it doesn't use identical hardware identically.
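
To make the distinction concrete, here is a minimal sketch of the two fairness tests at issue: "identical configuration" versus "same base hardware, team-chosen configuration". The class and function names are hypothetical, and so is the split between what counts as base hardware (server model, processor count, clock speed) and what counts as configuration (RAM, NICs, HBAs, disk layout); that split is my reading of the argument, not something the benchmark report defines. The Windows values come from the published configuration; the Linux values are purely illustrative, loosely following the changes listed above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BaseHardware:
    """The part both teams must share (assumed split, see lead-in)."""
    model: str
    cpus: int
    cpu_ghz: float


@dataclass(frozen=True)
class TeamConfig:
    """What each team of experts is free to tune (hypothetical fields)."""
    os_name: str
    base: BaseHardware
    ram_gb: int
    nics: int
    hbas: int
    mirrored_internal_disks: bool


def same_base(a: TeamConfig, b: TeamConfig) -> bool:
    """The fairness test argued for here: only the base must match."""
    return a.base == b.base


def identical_config(a: TeamConfig, b: TeamConfig) -> bool:
    """The stricter test many people ask for: everything but the OS matches."""
    return replace(a, os_name="") == replace(b, os_name="")


# Base hardware taken from the benchmark's published configuration.
base = BaseHardware(model="HP ProLiant DL380 G4", cpus=2, cpu_ghz=3.4)

# Windows side as published in the benchmark configuration.
windows = TeamConfig("Windows", base, ram_gb=8, nics=2, hbas=1,
                     mirrored_internal_disks=True)

# Linux side: illustrative values only, not a measured or recommended setup.
linux = TeamConfig("Linux", base, ram_gb=16, nics=1, hbas=3,
                   mirrored_internal_disks=False)

print("same base hardware:", same_base(windows, linux))
print("identical configuration:", identical_config(windows, linux))
```

The point of keeping the two checks separate is that a comparison can pass the first and fail the second, and on the argument made here that is exactly what a fair comparison should look like.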