I suggested yesterday that Sun's second-generation CMT/SMP uni-processor will surprise people in terms of total performance on traditional database tasks, but that's based on a fairly naive measure: most RDBMS transaction completions per minute.
Since it's hard to argue with people who say there's more to performance than that, what happens if we improve the benchmark?
The general answer is that we get an argument - because no credible third-party group currently offers a widely accepted benchmark methodology measuring the actual effectiveness of a dollar spent to buy a system.
The great thing about the TPC has always been that they force the vendors to publish more or less real cost data right along with the performance results; and, correspondingly, the worst thing about the SPEC organization is that they don't. Meaning that we're caught between the rock of the TPC benchmark suite's irrelevance to most current processing and the hard place of SPEC's continuing refusal to force cost as a co-measure on otherwise appropriate benchmarks - something I'll have more to say about tomorrow.
So what other benchmarks are there? Look past the 80s desktop Unix or VAX/VMS stuff, like Dhrystone and Bonnie, that has made its way into the PC world and what you see is pretty limited - lots of special purpose stuff you can run yourself, but very little in the way of comprehensive, adjudicated work you can attach dollar figures to and make decisions on.
Both SAP and Oracle, for example, maintain proprietary benchmark suites that are easy to cost, but these, while nominally offered as a service to customers making hardware choices, are really about co-marketing agreements with vendors. If you're buying Oracle and you know you're buying hardware from IBM, it makes sense to look at Oracle's benchmarks to know whether Power or Opteron is likely to be your better choice; but if you're not buying Oracle or haven't picked your hardware vendor, those results are as likely to mislead you as not.
There are special purpose benchmarks that make sense. For example, the Storage Performance benchmarks offer realistic pricing in the detailed reports and do let you compare different external storage for the same machines and applications.
Another example, the Lotus NotesBench series, is much more interesting because it focuses on complete systems, because it's well supported, and because it provides real hardware detail and cost numbers. Unfortunately, while you can reasonably extrapolate from relative Domino performance to relative Apache/PHP or similar task performance, the pricing information is relatively weak - and you have to look very closely at the software pricing and hardware configuration information to notice that IBM in particular often stacks the deck to favor Microsoft solutions over Linux.
SPEC.org has a panel working on a new power/throughput benchmark - and they may end up incorporating a variation on Sun's SWaP measure for data center space and power efficiency. Unfortunately, it's not ready yet, and a SPEC decision to continue its policy of not publishing audited financial information would greatly reduce its usefulness.
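For reference, Sun defines its space, watts, and performance measure as performance divided by the product of space and power. A minimal sketch of that arithmetic follows; the function name and the sample numbers are mine and purely illustrative, not Sun's or SPEC's, and the performance figure can come from whatever benchmark you trust.

```python
def swap_score(performance: float, rack_units: float, watts: float) -> float:
    """Performance / (space * power), per Sun's published definition.

    performance: any benchmark score (higher is better)
    rack_units:  space the system occupies, in rack units
    watts:       measured power consumption
    """
    if rack_units <= 0 or watts <= 0:
        raise ValueError("rack_units and watts must be positive")
    return performance / (rack_units * watts)


# Hypothetical systems, made up for illustration only.
dense_box = swap_score(performance=1000.0, rack_units=2, watts=250.0)
big_box = swap_score(performance=1500.0, rack_units=4, watts=500.0)
print(dense_box > big_box)
```

Note what the measure leaves out: there's no dollar term anywhere in it, which is exactly the gap in question.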
By far the most advanced power/efficiency benchmark, albeit without cost information for public distribution, is the PowerEnergy benchmark developed by EEMBC (eembc.org). Here's what they say about it:
- Provides data on energy consumed by a processor while running EEMBC's performance benchmarks
- Applies to all benchmarks provided by EEMBC, ties performance with energy consumption for specific benchmarks
- Specified for silicon devices which can be certified under current procedures
- Can be used by system designers in conjunction with EEMBC benchmark software to test in-situ processor behaviour
- Non-intrusive methodology
Early adopters are demonstrating clear business value - but the benchmark is largely limited to embedded systems and correspondingly unhelpful if what I want to do is compare a general purpose system X decision to a system Y choice.
So what's the bottom line? Formal benchmarks are in a state of flux right now, with no obvious stand-outs for general applicability and correspondingly interesting opportunities for someone to move into the vacuum this creates. A website offering cost information for each vendor-provided SPEC configuration might, for example, be of tremendous value to the industry.
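The core calculation such a site would need is trivial, which is rather the point: once a price is attached to a tested configuration, dollars per unit of benchmark score falls out directly, much as it does in the TPC's price/performance figures. Here's a rough sketch, assuming a hypothetical record layout and made-up numbers rather than any real SPEC or vendor data:

```python
from dataclasses import dataclass


@dataclass
class PricedResult:
    system: str        # label for the tested configuration (hypothetical)
    score: float       # published benchmark score, higher is better
    price_usd: float   # price of the configuration as tested


def dollars_per_point(result: PricedResult) -> float:
    """Cost of one unit of benchmark score for this configuration."""
    if result.score <= 0:
        raise ValueError("score must be positive")
    return result.price_usd / result.score


def rank_by_value(results: list[PricedResult]) -> list[PricedResult]:
    """Cheapest cost per unit of score first."""
    return sorted(results, key=dollars_per_point)


# Made-up entries for illustration only.
results = [
    PricedResult("system X", score=100.0, price_usd=5000.0),
    PricedResult("system Y", score=200.0, price_usd=8000.0),
]
for r in rank_by_value(results):
    print(f"{r.system}: ${dollars_per_point(r):.2f} per point")
```

The hard part isn't the division; it's getting audited prices for the configurations actually tested.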