Rohit Valia, of IBM's Platform Computing group, stopped by the other day to fill me in on some outstanding technical computing and big data benchmark results that were generated using IBM Platform's Symphony and LSF products. The results were impressive.
- The Terasort benchmark ran 40 times faster using BigInsights 220.127.116.11 and Platform Symphony 5.2
- A MapReduce test ran 63 times faster with Platform Symphony 5.2 than a similar configuration that just used Apache MapReduce
- The Berkeley SWIM test ran 6 times faster using Hadoop 1.0.1 combined with Platform Symphony than Hadoop 1.0.1 could manage by itself
When people want to show me the results of their benchmarks, I am typically reminded of the quote, "There are three kinds of lies: lies, damned lies, and statistics," which is often attributed to either Benjamin Disraeli or Mark Twain. Had the author known about benchmarks, they would almost certainly have been added to the list.
Regardless of who is actually responsible for that statement, suppliers often use and abuse benchmarks in the hope of winning over potential customers, even when the benchmark has little or nothing to do with the customer's proposed use of the systems. Why do suppliers do this? Because it is very difficult to know how a cluster or grid computing solution will really perform until a specific workload is installed and used in real life.
Since suppliers aren't in the business of giving complex, expensive computing solutions away, they try to demonstrate what a somewhat similar workload can be made to do on a specific configuration. The benchmarks IBM cited are designed largely to show how certain types of cluster or grid-based computing solutions will perform.
Whether a customer will see the same or similar performance running their own applications is a key question. The answer, of course, is "it depends." Very similar workloads, running on very similar system configurations set up by people with expertise very similar to that of the IBM folks, are likely to see very similar performance. Workloads that are quite different, running on quite different configurations set up by people with quite different levels of expertise, are likely to perform differently.
What attracted my attention was the enormous performance improvements offered by inserting IBM Platform Symphony or LSF into an environment when both the software being tested and the system configurations were identical. While I wasn't totally surprised by the results as I've been following Platform Computing for nearly two decades, the results were impressive.
The point IBM is trying to make is that an intelligent orchestration tool designed to manage the efforts of thousands of systems can make a big difference in performance, efficiency and cost. That point appears to be well supported by the benchmark results. I have to wonder, however, if similar results could be achieved by using other orchestration software, such as the well-known Beowulf project. Since that type of configuration wasn't tested, we don't know the answer to that question.
If your organization is involved in technical computing, high performance computing or Big Data, it would be wise to look into what IBM did to learn more about how to improve both the performance and efficiency of your operation. Furthermore, you are likely to discover that you can accomplish the same things using a much smaller system configuration when a low latency orchestration tool like Symphony is optimizing resource usage.