In most cases, a public benchmark is really nothing more than a transaction race where bragging rights and platform pride are the prizes. But every so often, a revolutionary performance breakthrough comes to the forefront during a test.
In the case of eWeek Labs' Web server benchmark, Red Hat's Tux 2.0 Web server running on a Linux 2.4 kernel has taken performance far beyond what was previously possible and blazes the way for future Web servers built on the same architecture.
Working closely with Dell Computer's Performance Engineering group (the group that first published Tux's performance results on the SPECWeb 99 benchmark), eWeek Labs found that Tux handled nearly three times as many transactions per second as current Web server mainstay Apache (12,792 tps vs. 4,602 tps) when running a mix of dynamic and static Web content.
The 60.7MB of static Web content was small enough to easily fit into RAM, so this benchmark primarily tested networking and thread management code, not disk-handling routines (we did have Web server logging enabled, however).
Tux's amazing speeds, even on low-end hardware, strongly validate its unusual design: First, Tux puts Web server code into the kernel and reads Web pages directly from Linux's kernel-mode file system cache for speed; second, Tux handles high numbers of connections very efficiently by using a small pool of worker threads instead of using one worker process per connection (as Apache does); third, Tux uses its own high-performance thread-scheduling algorithm to minimize the impact of disk activity.
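The second design point, a small fixed pool of workers shared by many connections, can be illustrated in user space. The following is only a sketch in Python, not Tux's in-kernel code: the names `handle_request` and `serve`, the pool size of 4 and the request count of 1,000 are all assumptions chosen for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id):
    # Hypothetical stand-in for parsing a request and building a response.
    # Returns the id plus the name of the thread that did the work.
    return request_id, threading.current_thread().name


def serve(num_requests, pool_size):
    # A fixed pool of `pool_size` threads is reused for every request,
    # instead of creating one worker per connection.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return list(pool.map(handle_request, range(num_requests)))


results = serve(num_requests=1000, pool_size=4)
workers_used = {name for _, name in results}
print(len(results), "requests handled by", len(workers_used), "thread(s)")
```

However many requests arrive, the number of threads never exceeds the pool size, which keeps per-connection overhead low compared with a process-per-connection model.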
Tux is also very easy to deploy incrementally across an enterprise because it can transparently forward Web requests it cannot handle to another Web server, such as Apache. Tux's main weakness is that it doesn't support Secure Sockets Layer traffic, a feature planned for a future version.
The fact that Tux 2.0 was also significantly faster than Windows 2000's Internet Information Server 5.0 Web server (5,137 requests per second) clearly shows the advantages of Tux's new design over that of a well-established Web server. The next version of IIS (which ships with Microsoft Corp.'s Whistler project) uses several ideas introduced by Tux, including the kernel-space design.
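The speed-ups implied by the reported figures can be checked with simple arithmetic. The short Python sketch below uses only the three throughput numbers quoted in this article; the variable names are ours.

```python
# Throughput figures quoted in the article (transactions/requests per second).
tux_tps = 12792
apache_tps = 4602
iis_tps = 5137

# Ratio of Tux's throughput to each of the other servers.
tux_vs_apache = tux_tps / apache_tps
tux_vs_iis = tux_tps / iis_tps

print(f"Tux vs. Apache: {tux_vs_apache:.2f}x")
print(f"Tux vs. IIS 5.0: {tux_vs_iis:.2f}x")
```

On these numbers, Tux handled roughly 2.8 times Apache's throughput and roughly 2.5 times that of IIS 5.0.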
IBM's AIX has included a kernel-space Web cache (although not a kernel-space Web server) since 1999, so this in-kernel trend is starting to sweep across the industry.
In terms of system implementation, the explosive performance of Tux and Tux-like Web servers should allow IT managers to build faster and more scalable Web server farms using fewer servers and processors, which in turn should free up corporate resources to buy bigger and better application and database servers.
In this test, we also wanted to help quantify the many scalability and performance changes in the Linux 2.4 kernel. It's very clear from our results that Linux 2.4, whether running Tux or Apache, is a far faster platform than Linux 2.2 was.
As mentioned, Tux's internal architecture is designed specifically for high performance, but that design is only one of five factors critical to its top-notch performance, according to Tux's primary author, Ingo Molnar, kernel development/systems engineer at Red Hat, in Berlin.
The other four areas are all features of the Linux 2.4 kernel and will speed up any Linux server application, not just Tux: zero-copy TCP/IP networking, interrupt and process CPU affinity, per-CPU kernel memory resources (called slab caches) and wake-one scheduling. (Some features, including zero-copy networking, require server application changes before they can be used.)
Molnar also credits the big development effort to tune the 2.4 kernel for SMP (symmetric multiprocessing) systems. "Getting good SMP scalability in a kernel is a process of many, many smaller steps and careful profiling," he said. "The 2.4 kernel's main goal was to achieve these enterprise scalability levels."
Zero-copy networking provides a way for a network card driver to access the data it's been asked to send directly from the kernel's disk cache or user-space memory buffer. Previously, the kernel had to copy data from the disk cache to a separate network buffer before sending it.
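From an application's point of view, one common way to hand the kernel a file to send without first copying it into a user-space buffer is the `sendfile` system call, which Python exposes as `os.sendfile`. The sketch below assumes a Linux system and uses a local socket pair and a temporary file as stand-ins for a real client connection and Web page. The helper name `send_file_zero_copy` is hypothetical, and whether the data path is truly copy-free all the way to the network card depends on the kernel, driver and hardware.

```python
import os
import socket
import tempfile


def send_file_zero_copy(sock, path):
    """Send the file at `path` over `sock` using sendfile(2).

    The kernel moves data from its file cache to the socket; the bytes
    are never read into a buffer in this process. Returns bytes sent.
    """
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:  # nothing more could be sent
                break
            sent += n
    return sent


# Small illustrative "Web page" written to a temporary file.
payload = b"<html>hello</html>" * 100
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

# A connected socket pair stands in for a server/client connection.
server_side, client_side = socket.socketpair()
try:
    sent = send_file_zero_copy(server_side, path)
    server_side.close()  # signal end of data to the reader

    received = b""
    while True:
        chunk = client_side.recv(4096)
        if not chunk:
            break
        received += chunk
finally:
    server_side.close()
    client_side.close()
    os.unlink(path)

print("sent", sent, "bytes; received", len(received), "bytes")
```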
Affinity features associate system objects such as a running process or an interrupt with particular CPUs to get more from each CPU's cache.
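As a rough illustration of process affinity, the Python sketch below pins the current process to a single CPU and then restores the original setting. It uses `os.sched_getaffinity` and `os.sched_setaffinity`, which are present-day Linux interfaces and not necessarily the mechanism used inside the 2.4 kernel; the helper name `pin_to_one_cpu` is hypothetical.

```python
import os


def pin_to_one_cpu(pid=0):
    """Restrict process `pid` (0 means the caller) to a single CPU.

    Returns the original set of allowed CPUs and the CPU chosen.
    """
    allowed = os.sched_getaffinity(pid)
    target = min(allowed)  # arbitrary choice: lowest-numbered allowed CPU
    os.sched_setaffinity(pid, {target})
    return allowed, target


original, cpu = pin_to_one_cpu()
pinned = os.sched_getaffinity(0)
print("allowed before:", sorted(original), "pinned to:", sorted(pinned))

# Restore the original affinity so the rest of the program is unaffected.
os.sched_setaffinity(0, original)
restored = os.sched_getaffinity(0)
```

Keeping a process on one CPU means the data it touched recently is more likely to still be in that CPU's cache the next time it runs.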
Wake-one scheduling is a major change that improves the efficiency of multiprocess server applications.
In Linux 2.2, every process waiting for external input (such as network traffic) is woken up when that input arrives. Because only one process is needed to handle the request, the rest find no data and immediately go back to sleep. In Linux 2.4, only one process is woken up, saving CPU cycles.
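The difference between waking every waiter and waking just one can be simulated in user space. The Python sketch below is an analogy built on `threading.Condition`, not kernel code: `notify_all()` plays the role of the older wake-everyone behaviour and `notify()` the wake-one behaviour. The function name `count_wakeups` and the choice of eight workers are assumptions for illustration.

```python
import threading
import time


def count_wakeups(num_workers, wake_all):
    """Deliver one request to sleeping workers; return how many woke up."""
    cond = threading.Condition()
    pending = []   # queue of incoming "requests"
    waiting = 0    # workers currently asleep on the condition
    wakeups = 0    # times a worker woke up to look for work
    handled = 0    # requests actually processed
    done = False

    def worker():
        nonlocal waiting, wakeups, handled
        with cond:
            while True:
                waiting += 1
                cond.wait()
                if done:
                    return
                wakeups += 1
                if pending:          # only one worker finds the request
                    pending.pop()
                    handled += 1
                # otherwise: woken for nothing, loop and sleep again

    def wait_until(predicate):
        while True:
            with cond:
                if predicate():
                    return
            time.sleep(0.001)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    wait_until(lambda: waiting == num_workers)

    with cond:
        pending.append("request")
        if wake_all:
            waiting = 0              # every sleeper will be woken
            cond.notify_all()
        else:
            waiting -= 1             # exactly one sleeper will be woken
            cond.notify()

    # Wait until the request is handled and all woken workers sleep again.
    wait_until(lambda: handled == 1 and waiting == num_workers)
    with cond:
        result = wakeups
        done = True
        cond.notify_all()            # shut the workers down
    for t in threads:
        t.join()
    return result


wake_all_count = count_wakeups(8, wake_all=True)
wake_one_count = count_wakeups(8, wake_all=False)
print("wake-all:", wake_all_count, "wakeups for one request")
print("wake-one:", wake_one_count, "wakeup for one request")
```

With eight workers, the wake-all variant rouses all eight for a single request, while the wake-one variant rouses only the worker that will handle it.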
Senior analyst Henry Baltazar and west coast technical director Timothy Dyck can be contacted at henry_baltazar@ziffdavis.com and timothy_dyck@ziffdavis.com, respectively.