Searching for Operational Savings and Effectiveness? Try a Little Virtualization: Part 1 - Virtualization Frameworks

Written by Rich Evans, Contributor

Infrastructure virtualization is no longer just a technically interesting addition to vendors' (e.g., Unix, Intel) and operations groups' bag of tricks - it is quickly becoming a key element in data center provisioning, service delivery, and, equally important, cost-effective consolidation. Like most technology enhancements, there are pluses and minuses to this development, both financially and operationally. Operations groups will have to hone traditional operations skills, especially in capacity and workload management, to attain positive bottom-line results.

META Trend: During 2003/04, IT operations groups will invest heavily in improving business-unit alignment and implementing more efficient automation strategies. Specific areas of focus will include breaking down vertical, platform-specific delivery, rationalizing inconsistent processes, managing the accelerating platform transitions/migrations to Intel (Windows and Linux: 2004-10), and building robust, effective measurement systems. By 2007, these operations improvements will be formalized as part of standard planning and support.

A Short Bit of History

Infrastructure virtualization is nothing new. Mainframe users got a taste of it in the late 60s and early 70s as IBM offered virtual machine technology in two flavors - both, however, enabled by a microcode layer on top of the hardware. Multiple virtual machines were provided by its VM (Virtual Machine) offering and logical partitions via its more traditional MVS (Multiple Virtual Storage) mainframe offering. As is currently the case, virtualization offered faster provisioning, thereby encouraging users to invoke all aspects of infrastructure virtually versus “building and cabling” specific solutions. Another key characteristic of early virtualization was the ability to supply critical virtual resources, since real resources were extremely expensive. This enabled more users and better asset use or return on assets by allowing a resource to run harder - a problem that is plaguing current operations and financial managers, since average Unix and Intel use has stayed well below 30% for the past 10 years.

Thirty years ago, IT organizations had the problem of not having enough real resources. They now have a different problem: too many unused real resources are driving up the cost of computing. Maturing virtualization and virtualization-like partitioning can help turn the tables. The evolution of mainframe architecture virtualization over more than 30 years is now being repeated in an accelerated fashion for both Unix and Intel platforms, with the same goals in mind - namely more flexible and faster provisioning, with better use via aggregation, coupled with the ability (for the longer term, 24-36 months) to run multiple, isolated workloads on a single SMP platform.

A Key Intel Driver: Simple, Stateless Applications

With the continuing application architectural shift to “stateless” application servers driving large back-end database servers, a front-end failure may be “annoying” but it is not “life-threatening,” since all actual application coordination and recovery work is done by the database. In addition, although data center analysis shows legacy systems being overtaken by Intel in three to five years, a key part of a winning strategy for the data center is this simple fact: Application servers for newly purchased applications such as SAP R/3 consume 8x-10x as much computing power as the database. Consequently, there will be an increasing number of “simple” application servers talking to large complex database servers as defined in Figure 1 (which depicts past years and the next five years for high-end data center operations). In fact, when considering application architectures and factoring in some Web service futures, it appears that roughly 75% of installed Intel capacity requirements can be met by exploiting simple 1- to 2-way servers (remember Moore’s Law - 2x capacity increases every 18 months). And although Linux is not part of this analysis, it is interesting to note that Wintel gained a data center foothold five years ago not because of its robustness at the time but because of the increasing use and discovery of “simple” application servers. Because the growth side of computing is still in simple application servers (i.e., 1- to 2-way boxes or blades with a 6x-10x capacity ratio versus database servers), Linux as a simple application server will do to Windows what Windows did to Unix - provide cheaper, good-enough application servers for a market that is growing 60% annually.

Rational Virtualization Consolidation

Operations groups and data centers have two interrelated infrastructure challenges:

  • Reduce cost: Provide improved hardware, software, and personnel efficiencies
  • Improve time to market: Provide a more flexible (effective) business-driven infrastructure

Physical co-location and logical consolidation, along with the introduction of blade technologies (which will be discussed in future research on this topic), can help solve some of these cost problems:

  • The first step, simple co-location, exposes important operational process and usage deficiencies as well as shortfalls. Knowing the “basics” can help reduce staffing by improving and introducing important processes such as change, configuration, and problem management. As previously noted, a well-planned logical consolidation also provides this benefit to a degree, but it is more difficult to work from afar. In many cases, especially for immature organizations where process maturity as defined by the META Group Maturity Model is in the Level 1 to Level 2 range (out of 5 Levels), physical consolidation is the only real solution if timely benefits are to accrue.
  • Although consolidation - the second step and the rationalization phase - may sound easy, immature workload and resource management has made running multiple applications on Intel and even Unix servers too much of an art. Often, operations groups have to rely on heroic efforts by skilled technicians and operations personnel to keep users happy, as applications fight one another for resources. Simple machine constructs are needed to enable the benefits of platform sharing, without the downside of application collision.

Enter virtualization and partitioning. Operations groups now have vendor-supplied virtualization for guaranteed isolation - virtual machines for Intel and an expanded partitioning scheme for Unix. (Although partitioning is not true virtualization, it can help utilization and operational flexibility.) In addition, expanding Unix workload management (WLM) can provide improved shareability (see Figures 2 and 3 for a description of these differences). Intel virtualization solutions such as VMware and Connectix (a part of Microsoft) supply a virtualization layer on top of an Intel platform for physical and logical partitioning.

The last step, application integration, which is the most difficult, has been quietly side-stepped. Over time, as Unix, Windows, and Linux improve their workload management schemes, this step becomes more reasonable. By 2006/08, these operating systems should provide enough functionality to enable multiple applications to run in a single logical or real partition.

The Inherent Challenges of Consolidation: Big Is Almost Always More Expensive

There are at least three significant challenges with all consolidation efforts and server aggregation, with the first challenge being the most difficult to overcome:

  • There are always non-economies of scale associated with all systems. Simply put, it costs more to run a transaction on a larger system than on a smaller one, due to the “internal plumbing” costs that enable SMP scale, for example. That is, an 8-way system is significantly more expensive than eight times a 1-way platform. For example, the cost for running a transaction on a “1-way” system (the most economical design) is half the cost of running the same transaction on an 8-way Intel system. So, from the start, consolidation efforts have “cost” challenges. Consolidation almost always requires many servers to be aggregated, easily out-stripping real hardware resources available from, for example, an 8-way system. To make it all “fit,” a virtual solution is more than likely required. It is this solution that can provide the number of virtual entities to match the number of real requirements.
  • A second operational consolidation challenge is the need for a detailed understanding of each targeted server’s use. Although this sounds simple, varying workload patterns have different peak-to-average usage ratios, which affect the absolute capacity required. Providing consolidated service-level agreements (SLAs) to meet established single-system agreements requires strong processes and methods that facilitate acquisition and analysis of significant amounts of individual system usage data. These skills must be coupled with strong capacity planning and performance management processes - with tools included.
  • The third challenge is that of forcing a holistic view of the effort at hand. For example, did things get better, faster, or cheaper after the consolidation? This requires an in-depth review of platform costs and cost ratios (hardware, software, and people) before and after consolidation. The challenge is to not leave any more money on the table after a consolidation than is absolutely necessary. Basically, we are looking for a reasonably optimized solution for hardware, people, and software, which is no simple task. In this case, we must rely on strong operational processes, for example, that can help pull costs out via simpler, more repeatable organizational structures (and processes) such as a plan/build/run design versus more costly systems administration-centric approaches that have large (often hidden) support staffs.
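
The second challenge above, peak-to-average usage, can be made concrete with a small sketch. The server names and hourly utilization samples below are invented purely for illustration; the point is only that servers peaking at different times need less combined capacity than the sum of their individual peaks, which is why per-server usage data matters before consolidating.

```python
# Hypothetical hourly CPU utilization samples (percent of one 1-way server).
# Names and numbers are invented for illustration; they are not from the article.
hourly_load = {
    "web_frontend": [10, 10, 60, 80, 30, 10],
    "batch_reports": [70, 50, 10, 5, 5, 60],
    "mail_gateway": [20, 25, 30, 25, 20, 15],
}


def peak_to_average(samples):
    """Return the highest sample divided by the mean of all samples."""
    return max(samples) / (sum(samples) / len(samples))


def combined_peak(loads):
    """Return the peak of the hour-by-hour summed load across all servers."""
    totals = [sum(hour) for hour in zip(*loads.values())]
    return max(totals)


# Capacity needed if every server is sized for its own peak.
sum_of_peaks = sum(max(samples) for samples in hourly_load.values())
# Capacity needed if the workloads share one platform.
coincident_peak = combined_peak(hourly_load)

for name, samples in hourly_load.items():
    print(f"{name}: peak/average = {peak_to_average(samples):.2f}")
print(f"sum of individual peaks: {sum_of_peaks}")
print(f"peak of combined load:   {coincident_peak}")
```

In this made-up data set the combined peak is well below the sum of the individual peaks. Real workloads may peak together, in which case the gap narrows or disappears, so the sketch is an argument for collecting the data, not a sizing method.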

Non-Economies of Scale: A Constant Problem

The Intel family of SMP machines, like all other SMP families (1-way to 32-way), produces negative economies of scale. That is, a transaction that is run on a large n-way machine costs more to run than a transaction on a simple 1-way solution. In fact, the per-transaction cost ratio of 8-way to 1-way systems (via published benchmarks and user data) is about 2+ to 1. Moving on to larger machines (16-way to 32-way), users will experience a doubling again of transaction costs, showing a transaction cost ratio of about 4 to 1 for the 32-way. Bigger is often not cheaper.

At the low end, a 1-way system is just a simple system, built without the plumbing required for scalable SMP performance. Therefore, its price is low and its cost per transaction is exceptional compared with 8-way systems. Larger systems are saddled with the plumbing that 8-way and higher configurations need for high-speed cache coherence and strong SMP expectations. In a nutshell, this is what makes Intel platform consolidation so difficult - exceptionally inexpensive boxes, multiplying like rabbits, that consume steadily growing amounts of care and feeding, but provide high transaction “value” in the form of low basic transaction costs.

A Consolidation Model Framework

From this consolidation introduction, it should be obvious that “just” consolidating eight 1-way Intel servers on an Intel 8-way system, for example, may not provide the needed financial or operational success. An actual consolidation model would work like this: the average cost for a basic Intel 1-way server processor (excluding storage and software) is just over $2K per year, considering a three-year life. Cost for a typical 8-way Intel solution (excluding storage and software) is about $42K per year over its three-year life, or about $5K+ per processor per year. From a simple breakeven viewpoint, it takes a consolidation of at least 20+ servers to justify an 8-way platform (a cost ratio of about 2.5 per processor times 8 processors).
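
The breakeven arithmetic above can be checked with a few lines of Python. This is a minimal sketch that uses only the rounded hardware figures quoted in the text; the constant and function names are ours, and storage, software, and staffing costs are deliberately left out, as they are in the paragraph.

```python
import math

# Rounded annual hardware costs quoted in the text (three-year life assumed).
ONE_WAY_COST_PER_YEAR = 2_000     # "just over $2K" for a 1-way server
EIGHT_WAY_COST_PER_YEAR = 42_000  # "about $42K" for an 8-way server
PROCESSORS = 8


def breakeven_servers(big_system_cost, small_system_cost):
    """Smallest number of small servers whose total cost covers the big system."""
    return math.ceil(big_system_cost / small_system_cost)


per_processor_cost = EIGHT_WAY_COST_PER_YEAR / PROCESSORS
cost_ratio = per_processor_cost / ONE_WAY_COST_PER_YEAR
servers_needed = breakeven_servers(EIGHT_WAY_COST_PER_YEAR, ONE_WAY_COST_PER_YEAR)

print(f"8-way cost per processor per year: ${per_processor_cost:,.0f}")
print(f"per-processor cost ratio vs. 1-way: {cost_ratio:.3f}")
print(f"1-way servers to replace at breakeven: {servers_needed}")
```

With exactly $2,000 for the 1-way server the sketch reports 21 servers; the text's rounded ratio of 2.5 gives 20. Either way, the 8-way platform pays for itself only when it replaces far more than eight 1-way machines.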

Complicating the consolidation effort is the simple fact that the more than 20 application servers may have few, if any, “run” characteristics in common, which often leads to erratic operations. This results in broken SLAs, which translates into unhappy customers. Therefore, loading an 8-way machine with the workload of 20 1-way machines is not the financial or operational answer. Because workload management is still very immature, management of performance and user expectations is an operational crapshoot, at best. A better solution is the machine and application isolation provided by machine virtualization techniques supplied by two vendors, VMware and Connectix (part of Microsoft).

Virtualization Game Plan

Although the technical approaches of these two vendors differ slightly, both vendors end up with images that provide the look and feel of the “real thing.” As previously mentioned, virtual machines have been with us since the 1970s. They have been successful at providing complete operational isolation and have enabled significant virtual aggregation; that is, running far more virtual machines (e.g., 1-way Intel platforms) on a real system than would have been possible if each required actual real and complete resources. This brings up the question of why it took so long for the Intel platform to “go virtual.” To understand this, one must look at the market and the IA32 technology.

IA32/X86 Is a Tough Cookie to Virtualize

As for markets, the answer is simple. All vendors living off Intel’s architecture have done exceedingly well, and a large portion of their hardware financial success has been due to their ability to continue to sell increasingly more machines into an already (and inherently) low utilization environment. In fact, until recently, the question of use had rarely been raised. Although virtualization has been a key element of mainframe technology since the late 60s, the IA32 architecture was never designed for virtualization, so subtle workarounds had to be developed, which, in the case of VMware, resulted in several “virtualization” patents.

Virtualization Layer Magic

Most processor architectures - including S/360 through z/OS - contain two or more “privilege levels.” This is a feature that has allowed for the rather straightforward processor virtualization initially found on S/360. Typically, the most privileged level is owned by the operating system and driver software, while application software (an operating system can be considered an application in the virtual world) uses the least privileged state. Simply put, the theory for the Virtual Machines is to run VM code in non-privileged mode and then have the hardware (and some unique software) trap or catch all privileged operations executed by the application or VM and have the Virtual Machine Manager (VMM) - a thin layer that sits on top of the hardware - emulate (or execute) these instructions. This is the case for IBM’s VM or VMware’s ESX offering, which virtualizes all the resources of the machine (see Figure 4 for a brief discussion of virtualization). Since the exported interface (built by the VMM) is the same as the hardware interface of the machine, operating systems such as Windows or Linux are unable to detect the presence of the VMM layer. The theory of VMs for the past 40 years points to the simple fact that most processors faithfully generate exceptions (i.e., for operations that cannot be done by a non-privileged task such as a virtual machine); the VMM layer emulates them and still maintains control over the system hardware. Unfortunately, the IA32/x86 architecture does not follow this simple rule.
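
The trap-and-emulate idea described above can be sketched as a toy model. Everything here is hypothetical: the instruction names, the `ToyCPU` and `ToyVMM` classes, and the way state is recorded are invented for illustration and do not represent how IBM's VM or VMware's products are implemented.

```python
class PrivilegedInstructionTrap(Exception):
    """Raised by the toy CPU when non-privileged code issues a privileged op."""


class ToyCPU:
    # Invented instruction names standing in for privileged operations.
    PRIVILEGED = {"SET_TIMER", "DISABLE_INTERRUPTS"}

    def execute(self, instruction, privileged_mode):
        """Run one instruction, trapping privileged ops issued without privilege."""
        if instruction in self.PRIVILEGED and not privileged_mode:
            raise PrivilegedInstructionTrap(instruction)
        return f"hardware executed {instruction}"


class ToyVMM:
    """Thin layer that runs guests de-privileged and emulates what traps."""

    def __init__(self, cpu):
        self.cpu = cpu
        self.virtual_state = {}  # per-guest record of emulated privileged ops

    def run_guest(self, guest_name, instructions):
        log = []
        state = self.virtual_state.setdefault(guest_name, [])
        for instruction in instructions:
            try:
                # The guest never runs in privileged mode.
                log.append(self.cpu.execute(instruction, privileged_mode=False))
            except PrivilegedInstructionTrap:
                # The VMM applies the effect to the guest's virtual state only,
                # so the real hardware stays under VMM control.
                state.append(instruction)
                log.append(f"VMM emulated {instruction} for {guest_name}")
        return log


vmm = ToyVMM(ToyCPU())
for line in vmm.run_guest("guest_a", ["ADD", "SET_TIMER", "LOAD"]):
    print(line)
```

The model works only because every privileged instruction raises a trap. An instruction that quietly did something different in non-privileged mode, without trapping, would slip past the `except` branch and the VMM would never see it.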

In fact, 17 sensitive IA32/x86 instructions do not trap when executed in a non-privileged mode. Moreover, the same machine operation code has different semantics in the various “protection rings” (or levels of privilege) of the IA32/x86 system, which has four. Therefore, the IA32/x86 architecture is only “somewhat” virtualizable. Indeed, both VMware and Connectix have spent a great deal of time developing frameworks and methods to correct this problem in order to keep virtual machines honest. And it is the Virtual Machine management layer that has proven to be difficult to manufacture.

The myriad typical PC devices and drivers only increase the complexity of the Intel virtualization challenge. Needless to say, both vendors have solved this challenge and, except for “small” things such as performance and extended SMP support, both are providing technically correct VM solutions that are getting stronger with every release. Although we believe that the x86 VM environment will continue to improve basic efficiency, it will be three to five years before x86 VMs begin to approach mainframe architecture efficiencies. In many cases, mainframe architecture efficiencies run into the very high nineties (with only 2%-3% virtualization losses), since the architecture is more amenable to virtualization. To put x86 virtual performance in perspective, early VM performance pointed to losses of about 40%. Virtualization losses are now in the range of 20%, and losses of less than 10% will not be out of the question in three years.
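
To see what these loss figures mean for consolidation, the sketch below estimates how many lightly used 1-way workloads an 8-way host could carry at each level of virtualization loss. It is a rough, hedged calculation: the 30% average utilization is borrowed from the utilization ceiling cited earlier in the article, the function name is ours, and the model ignores peaks, memory, I/O, and SMP scaling effects.

```python
def guests_per_host(host_processors, loss_percent, avg_guest_utilization_percent):
    """Estimate how many 1-way guests fit on a host after virtualization losses.

    Integer percentages are used so the result is exact, with no
    floating-point rounding surprises.
    """
    usable_capacity = host_processors * (100 - loss_percent)
    return usable_capacity // avg_guest_utilization_percent


# Loss levels quoted in the text; 3% stands in for the mainframe's 2%-3%.
scenarios = {
    "early x86 VMs (about 40% loss)": 40,
    "current x86 VMs (about 20% loss)": 20,
    "x86 in three years (10% loss)": 10,
    "mainframe-class (3% loss)": 3,
}

for label, loss in scenarios.items():
    guests = guests_per_host(8, loss, 30)
    print(f"{label}: roughly {guests} guests on an 8-way host")
```

Under these assumptions, moving from 40% to 20% loss raises the estimate from 16 to 21 guests, which is the difference between missing and reaching the 20+ server breakeven discussed earlier.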

The Battle for Virtualization

Platform virtualization is a key enabler for on-demand computing and the adaptive enterprise. Provided, of course, that overall resources are sufficient, data operations groups can simply provision machines instead of buying machines. In many instances, a VMM supplier such as VMware might provide reference platform environments for a whole family of servers, enabling strong thought leadership as to the hows and whys of virtualization and computing architecture in general. Clearly, virtualization is a powerful feature. In light of that, Intel recently announced its Vanderpool initiative, the goal of which is to provide its own virtualization layer or even a Virtual Machine Monitor, much like VMware’s ESX product. So there are now three players in the race - Microsoft, Intel, and VMware - and others are certain to join.

As noted, IA32/x86 virtualization required significant skill and effort, and IA64, for example, will also require a similar or greater effort since the architecture is very difficult to virtualize and to virtualize efficiently. With this in mind, and given Intel’s announced five-year delivery window, customers should continue to evaluate virtualization software from viable vendors and not wait for Intel’s offering. For many organizations, current vendors such as VMware are supplying solutions that will have great operational benefit, even if most are still in semi-production mode. Moreover, in 12-18 months, more than 20% of the high-end Intel market will be exploiting virtualization for production applications. Since virtualization is a thin layer on top of the hardware, users will be able to explore different vendor solutions. Although the effort will not be totally transparent, resource requirements will not be outrageous.

Business Impact: Infrastructure virtualization can provide significant operational benefits such as improved time to market and increased configuration flexibility as well as the financial benefits of reduced transaction costs.

Bottom Line: Infrastructure virtualization can provide significant operational flexibility. To be successful, operations groups must have strong operational practices in place to ensure appropriate management of increasingly complex computational architectures.

META Group originally published this article on 31 December 2003.
