IBM hopes to upend industry standard server ROI equation
IBM on Tuesday will introduce a new class of industry standard servers that it hopes will widen its market share lead and put rivals like HP and Dell on defense.
Big Blue's new family of servers, dubbed the eX5 portfolio, features architecture tweaks that allow the customer to add more memory without buying an entirely new server. IBM spent three years engineering the systems, which will be previewed at CeBIT in Germany.
IBM plans to alter the industry standard server return equation via memory expansion options and an extra chip that's designed to coach the system to better performance. IBM said it will announce three eX5 systems through 2010: a four-processor version, a new blade design, and an entry-priced two-processor server.
The big advantage here appears to be IBM's memory pitch. The eX5 line is engineered to support more DIMMs (Dual In-Line Memory Modules). A DIMM is a printed circuit board that holds memory chips and plugs into a socket on the motherboard.
Industry standard x86 blade servers generally come with 12 to 16 DIMMs, and once those are maxed out you need to buy another server. The real trick would be to add more memory without buying a new server and all the hardware that goes with it.
IBM's plan with eX5? Offer blade and rack servers that have 16 DIMMs standard and then the ability to add an additional 24.
According to IBM, the win is that customers don't have to buy a new server when they max out memory. They can simply buy more memory and can do it in smaller increments for overall savings. Tom Bradidich, IBM fellow and vice president of IBM x86 servers, said expansion options are a big plus because customers were buying full systems when they only really needed more memory. Those additional servers led to higher maintenance and license costs. "It was like buying a full Happy Meal when all you really wanted was the prize," he explained.
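The trade-off Bradidich describes can be sketched as simple arithmetic. The 16 standard and 24 additional DIMMs come from IBM's description above; the function names and every price passed in are hypothetical placeholders for illustration, not IBM figures.

```python
import math

BASE_DIMMS = 16       # standard DIMMs per eX5 server, per IBM
EXPANSION_DIMMS = 24  # additional DIMMs that can be added, per IBM


def max_dimms(base=BASE_DIMMS, expansion=EXPANSION_DIMMS):
    """Total DIMMs one server can hold once the expansion is used."""
    return base + expansion


def cost_scale_out(extra_dimms, dimm_price, server_price, per_server_overhead,
                   base=BASE_DIMMS):
    """Cost of getting more memory the conventional way: buy whole servers.

    per_server_overhead stands in for the extra maintenance and license
    costs each additional server brings (a hypothetical lump sum).
    """
    servers = math.ceil(extra_dimms / base)
    return servers * (server_price + per_server_overhead) + extra_dimms * dimm_price


def cost_expand(extra_dimms, dimm_price, expansion_price,
                expansion=EXPANSION_DIMMS):
    """Cost of adding memory to an existing server via the expansion option."""
    if extra_dimms > expansion:
        raise ValueError("expansion holds at most %d DIMMs" % expansion)
    return expansion_price + extra_dimms * dimm_price
```

With made-up prices, adding 8 DIMMs costs `cost_expand(8, 100, 500)` on the expansion path versus `cost_scale_out(8, 100, 3000, 1000)` when a whole new server is bought. Whether the gap looks like this in practice depends on pricing IBM has not yet announced.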
Bradidich said this approach can help customers buy less equipment, cut energy costs and prevent server sprawl. Bradidich argues that x86 servers are based on PC architecture that is three decades old and locks memory and the processing power together. "PC architecture shouldn't masquerade as enterprise server," said Bradidich.
Aside from the memory advantages, IBM is also adding an additional chip to its eX5 systems. The chip, based on IBM's X-Architecture, will ride shotgun along with standard Intel server chips and memory. This IBM chip will cut the latency between memory and the processor. With the additional chip, IBM claims that its eX5 portfolio will deliver 30 times better database performance compared to the current generation of systems, 99 percent better performance per watt and the ability to run 78 percent more virtual servers for the same license cost.
Bradidich added that the eX5 chip pulls together I/O, chip, memory, storage and networking to coax more performance out of industry standard memory and chips.
Big Blue said pricing of these new servers will be competitive with the broader market, but specifics would wait until Intel launches its latest server chips at the end of the month.

Talkback
How is this new tech?
and 90's in the form of PCI add-on cards where you could add another proc and/or more RAM. Or for the PowerPC Mac, the ability to run PC apps via a Celeron chip, RAM, etc. on a PCI card. Seems like it was just re-engineered for current times, if anything LOL
SIMM Stackers!
RE: IBM hopes to upend industry standard server ROI equation
I find that typically, if the memory requirements have grown, so have the overall system requirements. Even if you could swap the CPUs as well, would you not want new power supplies, a faster bus, etc.?
Plus, it's called "commodity" hardware for a reason - it has a finite life-span.
You missed the point.
One ECC failure can ruin your whole day
And you are sure that IBM has not accounted for this?
Are you sure that they have?
Yes, as a matter of fact, I am.
Do a little digging yourself before you introduce //more\\ noise into this thread.
But it has more than just ECC
Uncorrectable errors happen. Get used to it.
http://arstechnica.com/business/news/2009/10/dram-study-turns-assumptions-about-errors-upside-down.ars
Chipkill is certainly better than traditional ECC, but no error correction system is perfect. As you increase the total size of your RAM array, you increase the risk of an uncorrectable error. The fact remains that one such error takes down the entire server. If the server is running many virtual machines, that creates a lot of pain.
My take is that there are very few workloads where a single "super sized server" is the best solution. I'm not saying that [b]nobody[/b] needs a server with several TB of RAM, but [b]very few[/b] people do. As such, IBM's announcement is mostly marketing BS, not some "breakthrough".
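The scaling claim in the comment above (a bigger RAM array means a higher chance of an uncorrectable error) can be illustrated with a minimal sketch. It assumes each DIMM fails independently with the same probability over some period, which is a simplification; the function name and any probability passed in are hypothetical, not measured data.

```python
def p_any_uncorrectable(p_per_dimm, n_dimms):
    """Probability that at least one of n_dimms sees an uncorrectable error.

    Assumes independent DIMMs with identical per-DIMM probability
    p_per_dimm over the period of interest (an illustrative assumption).
    """
    return 1.0 - (1.0 - p_per_dimm) ** n_dimms
```

Under this model, `p_any_uncorrectable(p, 40)` is always at least `p_any_uncorrectable(p, 16)`, so a server with 40 DIMMs is more exposed than one with 16. The same model applies to 40 DIMMs spread over several smaller servers; the difference is how many workloads a single failure takes down.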
More common, yes, but with Chipkill
you're in pretty good shape.
I disagree that "very few" people need this. For a VM host, one of the critical factors is RAM. You don't even necessarily need these kinds of ultra-reliable measures in that situation--just buy multiple boxes and set up an HA solution. If the memory fails in one server, you can fail over to another.
I would worry much more about unrecoverable error rates for disks (affecting rebuilds) more than RAM:
http://blogs.zdnet.com/storage/?p=805
HA/clustering vs. mainframes
My point exactly! The best way for most people to achieve HA is with multiple boxes, not with a single "high reliability mega server".
[i]I would worry much more about unrecoverable error rates for disks (affecting rebuilds) more than RAM[/i]
I'm worried about both.
Chipkill and DIMMkill
However, a double bit unrecoverable error is dependent on the ECC domain.
So a server with twice the memory will experience twice the CEs (correctable errors) and UEs (uncorrectable errors) as one with 1X memory, just like two separate servers with 1X memory would. But it is not just the number of DIMMs: an 8GB DIMM will have twice the potential for errors as a 4GB DIMM.
Chipkill helps most of this. I do not have data, but I would guess Chipkill provides similar reliability at 100GB of RAM as ECC only provides at 1GB RAM. IBM studies in the late 1990s showed Chipkill reduced UEs by 150 times compared to ECC-only.
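The arithmetic behind that guess can be sketched as follows, assuming UEs scale linearly with capacity (as argued above) and taking the 150-times reduction cited from IBM's studies at face value. The function name is hypothetical and the result is a relative rate, not a measured error count.

```python
CHIPKILL_UE_REDUCTION = 150  # reduction factor cited above from IBM's late-1990s studies


def relative_ue_rate(capacity_gb, chipkill=False, reduction=CHIPKILL_UE_REDUCTION):
    """UE rate relative to 1 GB of ECC-only memory.

    Assumes the rate grows linearly with capacity and that Chipkill
    divides it by a constant factor (both are simplifying assumptions).
    """
    rate = float(capacity_gb)
    return rate / reduction if chipkill else rate
```

Under these assumptions, `relative_ue_rate(100, chipkill=True)` comes out at about two thirds of the rate for 1 GB of ECC-only memory, so the 100GB-versus-1GB guess is in line with the 150-times figure.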
As for the comment that there is no PC operating system which can withstand a memory UE, this is simply not correct. It depends on whether the UE occurs in kernel space or user space of the host operating system, the error reporting capabilities of the processor, and the architecture of the OS.
Nehalem-EX includes Intel's Machine Check Architecture (MCA) Recovery, formerly found only on Itanium. MCA can report to the OS the memory location of a UE. The operating system, if properly architected with a service management facility (rather than the legacy UNIX init system), can then deal with the error. If the error is in kernel space, the OS will panic and reboot the kernel to prevent data corruption. If it is in user space, the OS can kill the affected process and restart it.
VMware plans to support this in a future release of vSphere. What this means is if vSphere is running on Nehalem-EX, and a UE affects only a running VM, and not the ESX VMkernel, ESX will kill and restart the VM, and keep the other VMs running.
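The recovery policy described in the last two paragraphs boils down to a decision on where the error landed. The sketch below only models that decision; the enum and function names are made up for illustration and do not correspond to any real OS, Intel, or vSphere interface.

```python
from enum import Enum


class Region(Enum):
    """Where the uncorrectable error was reported (hypothetical labels)."""
    KERNEL = "kernel"
    USER_PROCESS = "user_process"
    GUEST_VM = "guest_vm"


class Action(Enum):
    """What the OS or hypervisor does in response (hypothetical labels)."""
    PANIC_AND_REBOOT = "panic_and_reboot"
    KILL_AND_RESTART_PROCESS = "kill_and_restart_process"
    KILL_AND_RESTART_VM = "kill_and_restart_vm"


def handle_uncorrectable_error(region):
    """Map the location of a UE to the response described in the comment."""
    if region is Region.KERNEL:
        # Kernel (or hypervisor kernel) memory is hit: stop to avoid data corruption.
        return Action.PANIC_AND_REBOOT
    if region is Region.USER_PROCESS:
        # Only one process is affected: kill it and let service management restart it.
        return Action.KILL_AND_RESTART_PROCESS
    if region is Region.GUEST_VM:
        # Only one VM is affected: restart that VM and keep the others running.
        return Action.KILL_AND_RESTART_VM
    raise ValueError("unknown region: %r" % (region,))
```

The VM branch reflects behavior described as planned for a future vSphere release, not something confirmed as shipping.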
One last thought on this. IBM knows what it is doing when it comes to DRAM. They have an excellent track record of reducing and tolerating UEs on the mainframe and pSeries. These machines have built-in hypervisors and are like giant ESX hosts with hundreds of gigabytes to terabytes of DRAM. They rarely crash.
Future plans vs. current reality
As I said, no PC operating system [b]available today[/b] can recover from an uncorrectable memory error. Let me know when VMware has something that (1) is proven to work, and (2) I can buy. Until then, it's just marketing hype.
Innovation