The cost of mainframe Linux

The data center will never get close to breaking even on capital cost, and the organization as a whole will incur significant losses because users expect better and will adapt - usually by buying Wintel for themselves.

Here's the opening bit from a widely quoted Network World report by "layer8" on IBM's recently announced plan to save $250 million by replacing 4,000 small servers with 30 mainframes:

Talk about eating your own dog food. IBM today will announce it is consolidating nearly 4,000 small computer servers in six locations onto about 30 refrigerator-sized mainframes running Linux saving $250 million in the process.

A related article by John Fontana gives more, although somewhat different, details:

The company will deploy 30 System z9 mainframes running Linux within six data centers to replace 3,900 servers, which will be recycled by IBM Global Asset Recovery Services.

The data centers are located in Poughkeepsie, N.Y., Southbury, Conn., Boulder, Colo., Portsmouth, UK; Osaka, Japan, and Sydney, Australia.

The company is focused mainly on moving workloads generated by WebSphere, SAP and DB2, but will also shift some of its Lotus Notes infrastructure.

The mainframe's z/VM virtualization technology will play a big role in dividing up resources, including processing cycles, networking, storage and memory. With z/VM 5.3, IBM can host hundreds of instances of Linux on a single processor. The z9's Hipersockets technology, a sort of virtual Ethernet, will support communication between virtual servers on a single mainframe. IBM also will take advantage of logical partitioning, which is rated at Level 5, the highest security ranking on the Common Criteria's Evaluation Assurance Level (EAL).

IBM says energy costs represent the bulk of $250 million in expected savings over five years.

Now, before we look at substantive issues here, it's important to note that $250 million divided by 3,900 is $64,102 per server - meaning that this story embeds the prediction that energy cost will go up by a factor of more than seven sometime later this year.
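For anyone who wants to check the division, here is a minimal sketch using only the figures quoted above. The per-year number is my own derivation, assuming the savings are spread evenly over the five years IBM mentions.

```python
# Figures as quoted: $250 million in claimed savings over five years,
# spread across the 3,900 servers being replaced.
total_savings = 250_000_000
servers = 3_900
years = 5  # assumption: savings accrue evenly across the five years

per_server = total_savings / servers
per_server_per_year = per_server / years

print(f"claimed savings per server: ${per_server:,.2f}")
print(f"claimed savings per server per year: ${per_server_per_year:,.2f}")
```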

Note too that a Dell 860 server with a dual core Pentium D at 3.2GHz, an 80GB disk, and 2GB of memory lists for $1,060, while a 26 MIPS z9 (about 120 x86 MHz) with seven attached 1.65GHz Power5 class Linux processors, each with 16GB, lists at about $850,000 with SuSE 9.0. The cost, therefore, of the 30 z9s mentioned will exceed that for 3,900 new Dells by roughly five times the Dell total, or more than $20,000,000 - and that's before additional IBM licensing and storage.
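The capital cost comparison is easy to reproduce. This sketch uses the two list prices quoted above and nothing else; the variable names are mine, and it ignores licensing, storage, and discounts.

```python
# List prices as quoted in the text (not independently verified).
dell_price = 1_060      # Dell 860, dual core Pentium D
z9_price = 850_000      # z9 with seven IFLs, as configured above

dell_total = 3_900 * dell_price
z9_total = 30 * z9_price

difference = z9_total - dell_total
excess_multiple = difference / dell_total  # the excess, in units of the Dell total

print(f"3,900 Dells: ${dell_total:,}")
print(f"30 z9s:      ${z9_total:,}")
print(f"difference:  ${difference:,} ({excess_multiple:.1f}x the Dell total)")
```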

This story, in other words, is as obviously bogus as the two mentioned yesterday.

We do not know what the service times or request frequencies for the 3,900 x86 servers look like, but we can make assumptions about this, do the arithmetic, and then see what that tells us about how well this transition is likely to work.

The 7 IFLs attached to each z9 offer a total of 11.55 PPC GHz - at the usual 2:1 PPC advantage, this is roughly equivalent to 23 x86 GHz. 130 (3900/30) single core x86 servers at 3GHz offer 390 x86 GHz. In other words, if mainframe ghosting imposed zero overhead and everything ran from directly connected RAM disks, then the maximum x86 average utilization the z9 could handle would be just under 6%.
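The same arithmetic as a short sketch, for readers who want to try other assumptions. The 2:1 PPC-to-x86 factor and the 3GHz single core x86 baseline are the assumptions stated above, not measurements, and the model deliberately ignores all virtualization overhead.

```python
# Per-mainframe capacity under the assumptions in the text.
ifls_per_z9 = 7
ifl_ghz = 1.65            # Power5 class IFL clock
ppc_to_x86 = 2.0          # assumed 2:1 PPC advantage per clock
servers_per_z9 = 3_900 / 30
x86_ghz_per_server = 3.0  # assumed single core x86 at 3GHz

z9_ppc_ghz = ifls_per_z9 * ifl_ghz
z9_x86_equiv_ghz = z9_ppc_ghz * ppc_to_x86
x86_pool_ghz = servers_per_z9 * x86_ghz_per_server

# Highest average x86 utilization one z9 could absorb with zero overhead.
max_utilization = z9_x86_equiv_ghz / x86_pool_ghz

print(f"z9 capacity: {z9_ppc_ghz:.2f} PPC GHz, about {z9_x86_equiv_ghz:.1f} x86 GHz")
print(f"x86 pool being replaced: {x86_pool_ghz:.0f} GHz")
print(f"maximum average x86 utilization: {max_utilization:.1%}")
```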

Notice, however, that this is not a linear function - the 6% noted above is a local maximum for a function determined by the relative task completion potential for the two processors in the context of the workload thrown at them. If, for example, the x86 workload consisted of one request per server per minute and that request took an average of 0.6 seconds to service, you'd have a 1% x86 utilization rate - but the mainframe would need 62 seconds per minute to keep up with the workload.

In practice, of course, overheads are a killer because there isn't enough memory or network bandwidth available to keep all 130 ghosts alive concurrently.

The bottom line is simple: to the average user, going from a system with 11,700 x86 GHz to one with 346 PPC GHz will be almost exactly like going from a 3000 MHz x86 machine to one running at 180 MHz - and what that means is not only will the data center never get close to break even on capital cost, but the organization as a whole will incur significant losses because users expect better and will adapt, usually by some combination of drunken-sailor spending on Wintel to bypass data center delays and/or by slowing whatever they do to match the mainframe's pace.
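A rough sketch of where the 180 MHz comparison comes from, again assuming the 2:1 PPC advantage and 3GHz x86 servers used earlier. It treats aggregate clock as a stand-in for user-visible speed, which is a simplification rather than a benchmark.

```python
# Aggregate clock before and after the consolidation.
x86_total_ghz = 3_900 * 3.0       # 3,900 servers at an assumed 3GHz each
ppc_total_ghz = 30 * 7 * 1.65     # 30 z9s, 7 IFLs each at 1.65GHz
ppc_to_x86 = 2.0                  # assumed 2:1 PPC advantage per clock

x86_equiv_ghz = ppc_total_ghz * ppc_to_x86
slowdown = x86_total_ghz / x86_equiv_ghz

# What a 3000 MHz desktop would feel like after the same slowdown.
equivalent_mhz = 3_000 / slowdown

print(f"before: {x86_total_ghz:,.0f} x86 GHz")
print(f"after:  {ppc_total_ghz:.1f} PPC GHz, about {x86_equiv_ghz:.0f} x86 GHz")
print(f"slowdown factor: {slowdown:.1f}, equivalent to about {equivalent_mhz:.0f} MHz")
```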