Rohit Valia, of IBM's Platform Computing group, stopped by the other day to fill me in on some outstanding technical computing and big data benchmark results that were generated using IBM Platform's Symphony and LSF products. The results were impressive.
- The Terasort benchmark ran 40 times faster using BigInsights 220.127.116.11 and Platform Symphony 5.2
- A MapReduce test ran 63 times faster with Platform Symphony 5.2 than a similar configuration that just used Apache MapReduce
- The Berkeley SWIM test ran 6 times faster using Hadoop 1.0.1 combined with Platform Symphony than Hadoop 1.0.1 could manage all by itself.
When people want to show me the results of their benchmarks, I am typically reminded of the quote "There are three kinds of lies: lies, damned lies, and statistics," which is often attributed to either Benjamin Disraeli or Mark Twain. Had the author known about them, benchmarks would most certainly have been added to the list.
Regardless of who is actually responsible for that statement, it is often true that suppliers use and abuse benchmarks in the hope of winning over potential customers, even when the benchmark has little or nothing to do with the customer's proposed use of the systems. Why do suppliers do this? Because it is very difficult to know ahead of time how a cluster or grid computing solution will really perform until a specific workload is installed and used in real life.