As I listen to server manufacturers these days, the choice seems to come down to "scale-up" or "scale-out." Scale-up means getting more processing power by adding more processors to the same server (symmetric multi-processing, or SMP). Scale-out increases processing power by coding the application to run across lots of 1U dual-processor "pizza-box" servers. Blades are a variation on that same theme.
Each of these has its place. SMP is particularly effective when the application has a monolithic architecture and requests for service aren't always independent. Databases are a good example of the kind of application that people run on big SMP boxes because of cache coherency and other issues. Multiple, independent servers make sense when the application can be split into multiple independent tasks. Web servers are a good example of applications that people run "scaled-out."
Some recent developments in the world of processors could portend changes to the conventional wisdom surrounding these two ways of scaling.
Intel and AMD support up to 4-way SMP and no more. You can buy 8-way, 16-way, and 32-way SMP machines, but building them takes considerable expense and engineering expertise. I think we're close to seeing the last of the 8-socket servers. Only IBM and Unisys still make Intel-based 8-way systems. Currently, 8-way and higher SMP servers represent less than 1% of the Intel-based server market and that's likely to go down. Here's why:
Connecting more than 4 processors requires a custom bridge. As you'd expect, bridging isn't free, so an 8-way server yields only about 6 processors' worth of processing power. But software vendors license big apps per socket. Consequently, going from 4 sockets to 8 means paying for four more sockets in both hardware and software, but getting only two more processors' worth of compute power.
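To make that arithmetic concrete, here is a rough sketch in Python. The dollar figures are invented purely for illustration, and I'm assuming a 4-way box delivers a full 4 processors' worth of work and ignoring the extra cost of the custom bridge. Only the "8 sockets yield about 6" ratio comes from the discussion above.

```python
# Hypothetical prices, chosen only to illustrate the ratio.
HARDWARE_PER_SOCKET = 5_000    # assumed hardware cost per socket
LICENSE_PER_SOCKET = 20_000    # assumed per-socket software license


def cost_per_effective_processor(sockets, effective_processors):
    """Total hardware + license cost divided by the compute actually delivered."""
    total = sockets * (HARDWARE_PER_SOCKET + LICENSE_PER_SOCKET)
    return total / effective_processors


# Assumption: a 4-way server delivers 4 processors' worth of work.
four_way = cost_per_effective_processor(sockets=4, effective_processors=4.0)

# From the text: an 8-way server delivers about 6 processors' worth.
eight_way = cost_per_effective_processor(sockets=8, effective_processors=6.0)

# Marginal cost of the upgrade: four more sockets buy two more processors.
marginal = (8 - 4) * (HARDWARE_PER_SOCKET + LICENSE_PER_SOCKET) / (6.0 - 4.0)

print(f"4-way: ${four_way:,.0f} per effective processor")
print(f"8-way: ${eight_way:,.0f} per effective processor")
print(f"Upgrade: ${marginal:,.0f} per additional effective processor")
```

Whatever prices you plug in, each processor's worth of compute gained by the upgrade costs twice what it costs in the 4-way box, because you pay for four sockets and get two.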
Another development making 8-socket servers less attractive is the advent of multi-core chips. We're not far from the day when you'll be able to buy a 4-socket, 4-core server, yielding 16 cores on a single motherboard with no custom bridging hardware. Just getting enough memory bandwidth to that many processors is an engineering feat.
Hyperthreading ups the ante even further. If you're not familiar with hyperthreading, it's not some sleight of hand like multi-tasking that only makes it look like multiple threads are running at the same time. Modern processors have multiple parallel instruction processing pipelines. Unfortunately, dependencies in the instruction stream mean that lots of that parallel processing power goes to waste. Hyperthreading uses that otherwise idle hardware more effectively by feeding it instructions from a second thread, which works because the threads are independent of each other. (For an excellent introduction to hyperthreading, see this introduction at Ars Technica.)
Not only do the sockets contain multiple cores, but those cores are, in many cases, hyperthreaded. Right now, 2-way hyperthreading is the norm, potentially giving a single socket the power of an 8-way SMP machine for properly threaded applications (more on this later). It's likely that by 2010 we'll see chips from Intel and AMD that can support up to 512 parallel threads in a single socket through the use of multiple cores and hyperthreading.
This raises an interesting question: how do you make use of that processing power? I'm not asking what we will do with it so much as how we will write applications that are threaded in ways that take advantage of the architecture. Humans are notoriously bad at writing parallel code, and automatically extracting parallelism from otherwise sequential algorithms is difficult as well. I see a few possibilities:
- Virtualization gives us the ability to split a 512-thread server up into lots of little servers, making the parallel decomposition more manageable. For example, virtualization would allow us to put one of today's multi-tiered Web applications on a single chip while maintaining the illusion that we're managing individual boxes with their own OS. As I wrote a few weeks ago, Intel and AMD are both working on putting virtualization primitives in the chip, potentially making virtual servers a standard feature on every box you buy.
- Application servers (like JBoss or WebLogic) support a development model in which programmers develop threadless code and the app server manages the threads. That simplifies the point too much, perhaps, but I think having that much parallel processing power on a single chip might make app servers much more important for developing applications. There are continuing debates about whether app servers add more complexity than they're worth, but that might be because we haven't met many problems large enough to require them--yet. In the early 1960s programmers scoffed at the idea of operating systems as being "needlessly complex"; that idea is ludicrous today.
- It's tough to see how you'll use that much parallelism on the desktop. I just counted the number of processes running on my PowerBook: 76. Give each its own thread and you've still got lots of power left over. But I think we'll see new ways of using the parallelism spring up. Just over lunch today I was chatting with one of my colleagues about some of his research that uses massive parallelism to create user interfaces that have embedded machine learning for responding to user inputs.
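The app-server idea in the list above, where the programmer writes threadless code and the container manages the threads, can be sketched in a few lines of Python. This is only an illustration of the model, not how JBoss or WebLogic actually work: the standard library's `ThreadPoolExecutor` stands in for the container, and `handle_request` and `serve` are hypothetical names I made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id):
    """Application code: plain sequential logic, no thread management."""
    return f"response-{request_id}"


def serve(request_ids, workers=8):
    """Stand-in for the container: it owns the threads and fans requests out."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands each request to a pool thread and returns the
        # results in the same order the requests were submitted.
        return list(pool.map(handle_request, request_ids))


responses = serve(range(5))
print(responses)
```

The point of the split is that `handle_request` never changes as the hardware grows; only the `workers` setting in the "container" does.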
And there are things we've hardly been able to imagine. One thing's for sure: applications that require a supercomputer today will be runnable on a desktop in 5-7 years. I can hardly wait.