The real pros and cons of server virtualization

PC-style virtualization, derived from IBM's zVM ideas and now supported in both AMD and Intel hardware, is a revenant of data processing's failure to adapt to the age of digital computing - an expensive, user-punishing detour into 1920s management ideas in pursuit of that period's holy grail: 100% systems utilization.
Written by Paul Murphy, Contributor

First, let's be clear: this comment is about server virtualization through ghosting - the business of using one OS to run one or more ghost OSes in lieu of applications, each of which in turn runs one or more applications. It's not about desktops, not about N1-type technologies, and not about containerization.

The pre-eminent examples of ghosting OSes are IBM's zVM - an OS that originated in the late 1960s as one answer to the memory management and application isolation problems confronting the industry at the time - and VMware's more recent rendition of the same ideas for x86.

Back then, IBM was caught between rocks and hard places: lots of people (including IBM's own research leaders) were developing system resident interactive OSes aimed at using the computer largely as a central information switch, but its commercial customer base absolutely refused to countenance any advance on the batch tabulation and reporting model around which its management ideas had evolved in the 1920s and 30s.

Thus when the Multics design effort started at MIT in the mid-1960s, most of IBM's people didn't even know there were two sides to the argument. Its research people lined up with science-based computing, while those who made the money for IBM almost unanimously chose the data processing side - and ten years later, after MIT's people had first won their design battles and then lost the war (by letting data processing take control of the Multics development effort), IBM's own fence-sitting solution, VM, ended up roundly hated by nearly everyone.

Nearly everyone, that is, except people limited to IBM 360-class hardware who had no other means of achieving any kind of interactive use (this was before MTS and a dozen later solutions) - and they, essentially over the objections of IBM's own management, made VM the success it still is.

All of which brings us to the 90s, when available x86 hardware mostly wouldn't run NT 3.51, and Microsoft's emergency VMS port, aka NT 4.0, contained a misconstrued UAF derivative known as the registry that effectively limited it to loading one application at a time - thus forcing buyers to choose between a lot of downtime or rackmounts of dedicated little boxes.

The rackmounts won - at least for a few years; but then data processing took control of the Wintel world, and VM, in the VMware incarnation of its ideas, soon became the preferred tool for reducing the rackmount count in the name of the profession's holy grail: higher system utilization.

Unfortunately there are two big problems with this:

  1. first, NT 4's limitations went away with NT 4 - addressing them today with VMs achieves a level of absurdity no audience would accept in musical comedy - it's right up there with using a licensed terminal emulation on a licensed PC to access a licensed server running a licensed PC emulation; and,

  2. it is very nearly a universal truth that every gain data processing makes in improving system utilization produces a larger loss in IT productivity for the business paying them to do it.

The reductio ad absurdum example of the latter is Linux running under VM on a zSeries machine: data processing can get very close to 100% system utilization with this approach, but the cost per unit of application work done will be on the order of twenty times what it would be running the same application directly on Lintel; and every variable in the user value equation, from response time to the freedom to innovate, gains a negative exponent.

You can see the latter consequence in virtually every result on benchmarks featuring some kind of interaction processing. For example, the Sun/Oracle people behind the recent foray into TPC-C demonstrated both their own utter incompetence as IT professionals, by achieving less than 50% CPU utilization, and the user value of this "failure", by turning in response times averaging roughly one seventeenth of IBM's:

  Transaction              IBM p595 (6,085,166 tpmC)   Sun T5440 (7,717,510.6 tpmC)
  New-Order                1.22                        0.075
  Payment                  1.20                        0.063
  Order-Status             1.21                        0.057
  Delivery (Interactive)   0.78                        0.041
  Delivery (Deferred)      0.26                        0.021
  Stock-Level              1.20                        0.090
  Menu                     0.78                        0.044

Average response times in seconds, from the detailed reports at http://www.tpc.org/tpcc/results/tpcc_perf_results.asp
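To put a number on "roughly one seventeenth": the per-transaction ratios can be computed directly from the table above. The response-time values are as published; the unweighted averaging is my own reading of how the figure was arrived at:

```python
# Average response times in seconds, from the TPC-C detail reports cited above
# (IBM p595 at 6,085,166 tpmC vs Sun T5440 at 7,717,510.6 tpmC).
transactions = ["New-Order", "Payment", "Order-Status",
                "Delivery (Interactive)", "Delivery (Deferred)",
                "Stock-Level", "Menu"]
ibm = [1.22, 1.20, 1.21, 0.78, 0.26, 1.20, 0.78]
sun = [0.075, 0.063, 0.057, 0.041, 0.021, 0.090, 0.044]

ratios = [i / s for i, s in zip(ibm, sun)]
for name, r in zip(transactions, ratios):
    print(f"{name:24s} IBM/Sun ratio: {r:4.1f}")

# Unweighted mean across the seven transaction types:
print(f"mean ratio: {sum(ratios) / len(ratios):.1f}")  # prints 17.0
```

The per-transaction ratios range from about 12x to about 21x, and their unweighted mean lands almost exactly on 17 - presumably the source of the "one seventeenth" figure.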

The counter argument I usually hear about all this is that virtual system images are more easily managed than real ones - and this is both perfectly true and utterly specious.

It's perfectly true that VM style virtualization lets you bundle an application with everything it needs to run except hardware, and then move that bundle between machines at the click of an icon; but the simple fact that this applies just as well to Solaris containers as it does to VM ghosts shows that this is an argument for encapsulation and application isolation, not for ghosting.

Worse, the argument is completely specious because it bases its value claims on two demonstrably false beliefs: first, that the only alternative is the traditional isolated-machine structure; and second, that virtualization lets the business achieve more for less. Both are utter nonsense: Unix process management has worked better than VM since the 1970s, and because virtualization adds both overheads and licensing, it always costs more to do less than modern alternatives like containerization - or than simply letting Unix process management do its job.

Again the quintessential example of this is from the heart of the data processing profession: take a $20 million zSeries installation and achieve a 60-way split to produce 100% system utilization from 60 logical machines running applications or ghosts, and what the business gets out of it is roughly equivalent to what it would get from four Lintel racks costing a cumulative $500,000.
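The arithmetic behind that comparison is simple enough to sketch. The figures are the round numbers from the example above; the even per-rack split is my assumption:

```python
zseries_cost = 20_000_000   # the $20 million zSeries installation
lintel_cost = 4 * 125_000   # four Lintel racks, a cumulative $500,000
                            # (even per-rack split assumed for illustration)

# If both deliver roughly equivalent application work, the capital-cost
# ratio is the multiplier the business pays for 100% utilization:
print(f"cost multiplier: {zseries_cost / lintel_cost:.0f}x")  # prints 40x
```

In other words, on capital cost alone the fully utilized mainframe comes in at forty times the price of the half-idle racks doing the same work.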

A more down-home illustration is provided by VMware itself: their competitive value calculator computes a cost advantage for their products over those of others, on the basis of their belief that their VMs impose less overhead and let you get closer to 100% hardware utilization. Thus if you enter values saying you've got 200 applications running on NAS-connected quad-core servers, want to manage virtually, and have average infrastructure costs, they produce a table with this data:

                                       VMware vSphere 4          Microsoft Hyper-V R2
                                       Enterprise Plus Edition   + System Center
  Number of applications virtualized   202                       205 (inc. mgmt VMs)
  Number of VMs per host               18                        12
  Number of hosts                      12                        18
  Infrastructure costs                 $206,571                  $280,941
  Software costs                       $240,951                  $181,830
  Total costs                          $447,522                  $462,771
  Cost per application                 $2,238                    $2,314
  Cost-per-application savings         3%                        -
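As a sanity check on the calculator's arithmetic - the per-application figures appear to divide total cost by the 200 applications entered, not by the virtualized counts shown in the table:

```python
apps = 200                           # applications entered into the calculator
vmware_total = 206_571 + 240_951     # infrastructure + software
hyperv_total = 280_941 + 181_830

vmware_per_app = vmware_total / apps    # 2237.61 -> $2,238
hyperv_per_app = hyperv_total / apps    # 2313.86 -> $2,314

savings = 1 - vmware_per_app / hyperv_per_app
print(f"VMware ${vmware_per_app:,.0f}/app vs Hyper-V ${hyperv_per_app:,.0f}/app; "
      f"claimed savings {savings:.0%}")
```

The totals and the 3% savings figure reproduce exactly, so the table is at least internally consistent - the argument below is about what it leaves out, not its arithmetic.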

All of which should raise a couple of questions in your mind:

  1. first, if the consensus that ghosting doesn't have significant overhead is right, where is VMware getting the third of the box it claims you can recover by getting its ghosting software instead of Microsoft's?

  2. and, second, wouldn't the money VMware wants you to spend on ghosting ($241K in this example) be better spent on hiring people who can move these applications to free environments like Linux or OpenSolaris?

So what's the bottom line? Simple: the real ghost in ghosting is that of 1920s data processing - and the right way to see this particular con job for what it is, a professional cost sink, is to focus on costs to the business, not on ideological comfort in IT.
