When supercomputing benchmarks fail to add up

Using benchmarks to choose a supercomputer is more complex than just picking the fastest system, says Andrew Jones
Written by Andrew Jones, Contributor

Benchmarking is either an invaluable procurement tool or just a pointless attempt to quantify the immeasurable, says Andrew Jones.

Benchmarking is valuable for many reasons. It can measure the performance of a system architecture or new algorithms, or evaluate progress in application development and identify hotspots to optimise. But the highest-profile use of benchmarking is to help with high-performance computing (HPC) buying decisions.

Benchmarking is so important in procurement that it would be foolish to try doing without it. Yet it is also something that can be hard to get right, because of complexity, the plethora of options and user applications, and the difficulty of keeping the benchmarking in proportion to the scale of investment.

In supercomputing, benchmarking is an area of specialism in its own right, both on the vendor and user sides. It is a discipline that can help make the most of your investment, but it can also make your head whirl in the process.

Using benchmarks correctly
For example, people often assume the system with the best benchmark results will win the order. Sometimes bidders make this assumption, seeing the benchmarking as the most concrete aspect of the proposal evaluation process, and sometimes it is buyers who think the benchmark will provide an unambiguous winner.

Of course, in any sensible procurement, benchmarks are no more than a supporting element. They are one method of determining the business benefit. But other business aspects also matter — for example, reliability, service partnership and price.

Benchmarks will not — or should not — exclusively pick the winner, but they can and should be used to narrow the field and help avoid buying a turkey.
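The idea that benchmarks are one supporting criterion among several can be sketched as a simple weighted scoring scheme. All the criteria, weights and scores below are hypothetical illustrations, not a prescribed evaluation method:

```python
# Illustrative sketch: combining benchmark results with other
# procurement criteria via weighted scoring. Every weight, criterion
# and score here is an assumed example, not a real procurement rule.

# Normalised scores (0-1, higher is better) for two hypothetical bids.
bids = {
    "vendor_a": {"benchmark": 0.92, "reliability": 0.70, "service": 0.80, "price": 0.60},
    "vendor_b": {"benchmark": 0.85, "reliability": 0.90, "service": 0.85, "price": 0.75},
}

# Benchmarks support the decision; they do not dominate it.
weights = {"benchmark": 0.35, "reliability": 0.25, "service": 0.15, "price": 0.25}

def total_score(scores):
    """Weighted sum of a bid's criterion scores."""
    return sum(weights[c] * scores[c] for c in weights)

for vendor, scores in bids.items():
    print(vendor, round(total_score(scores), 3))
```

In this made-up example, vendor B wins the overall evaluation despite the weaker benchmark result, which is exactly the point: the benchmark narrows the field and flags turkeys, but the business case decides.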

Excessive demands
However, nearly as bad as those procurements that use no benchmarking are those that require each bidder to benchmark a large number of applications under specific rules on their tendered systems. Sometimes the benchmarking effort by the bidder and buyer is out of proportion to the scale of the proposed procurement.

The efforts of bidders under these excessive conditions raise costs and reduce margins, potentially affecting the price-performance available to the customer, or the long-term viability of the bidder's business. In some cases, the size of the benchmark effort compared with the investment on offer simply means the vendor cannot even afford to bid, therefore reducing competition and options for the customer.

The effort from the procurement team is also significant. They have to define the benchmarks, support the vendors running them, collate and compare the results, and investigate anomalies. All these activities take time, raise the cost of the procurement and, again, potentially reduce the money available for investing in the proposed system.

Get the balance right
So, the trick is to balance the benchmarking with the scale of the procurement. For small procurements, or for clearly defined user needs — for example, a few applications with known dataset patterns — buyers could use a very lightweight benchmarking process or publicly available and verified benchmarks. Or they could employ a benchmarking specialist with experience of the applications and systems to provide independent advice.

If the procurement is larger, high profile, involves public money or a diverse set of users and applications, a wider benchmarking exercise becomes viable. It is also important to get the size of benchmark dataset right. If you are buying a bigger machine to cope with workload growth, don't just use today's problem size.
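Sizing the benchmark dataset for the machine you intend to buy, rather than the one you have, amounts to a simple projection. The growth rate and figures below are assumptions chosen purely for illustration:

```python
# Illustrative sketch: projecting a benchmark problem size from
# expected workload growth. The growth rate, lifetime and problem
# size are assumed figures, not recommendations.

todays_problem_size = 50_000_000   # e.g. grid cells in today's typical job
annual_growth = 0.4                # assumed 40% yearly workload growth
system_lifetime_years = 4          # assumed service life of the new machine

# Benchmark with the problem size expected mid-life, not today's.
midlife_years = system_lifetime_years / 2
benchmark_size = todays_problem_size * (1 + annual_growth) ** midlife_years
print(f"Benchmark problem size: {benchmark_size:,.0f} cells")
```

Under these assumed numbers the benchmark dataset is roughly double today's, which changes memory footprint, I/O pressure and scaling behaviour, and can change which bid looks best.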

Prove it
Perhaps the most important use of benchmarks is in acceptance tests. These are tests the buyer conducts after the winning system has been delivered and the vendor wants to hand it over to the customer — followed by an invoice, of course. The tests check that the vendor has supplied a system to match the bid and that it works correctly.

Benchmarks need to be a key part of this process. If the delivered system cannot match the bid, reject it and negotiate a remedy in the form of a discount or extra performance, or, if necessary, make a business decision to accept the solution as-is, knowing the risk.

Benchmarks should not be the only feature of an acceptance test, and the set used for acceptance may be either more or less thorough than the procurement benchmarks. But to be fair and to get the right result, it must be made clear to bidders what they will have to commit to in terms of benchmarks.

If the procurement benchmarks serve only as a performance guide from the vendors, their value to you in comparing solutions is low. If they form part of the acceptance suite, then bidders must be held to them, and their effort and the impact on price and risk in the bid should be commensurate.

It is surprising that so many HPC procurements are conducted without benchmarking playing any role at all. In these cases, the buyer is relying on the vendor's sales pitch and subjective assumptions about value for money. How can the return on investment be evaluated under those conditions?

As vice-president of HPC at the Numerical Algorithms Group, Andrew Jones leads the company's HPC services and consulting business, providing expertise in parallel, scalable and robust software development. Jones is well known in the supercomputing community. He is a former head of HPC at the University of Manchester and has more than 10 years' experience in HPC as an end user.
