Infrastructure virtualization is no longer just a technically interesting addition to platform vendors' (e.g., Unix, Intel) and operations groups' bag of tricks - it is quickly becoming a key element in data center provisioning, service delivery, and, equally important, cost-effective consolidation. Like most technology advances, this development has pluses and minuses, both financially and operationally. Operations groups will have to hone traditional operations skills, especially in capacity and workload management, to attain positive bottom-line results.
META Trend: During 2003/04, IT operations groups will invest heavily in improving business-unit alignment and implementing more efficient automation strategies. Specific areas of focus will include breaking down vertical, platform-specific delivery, rationalizing inconsistent processes, managing the accelerating platform transitions/migrations to Intel (Windows and Linux: 2004-10), and building robust, effective measurement systems. By 2007, these operations improvements will be formalized as part of standard planning and support.
A Short Bit of History
Infrastructure virtualization is nothing new. Mainframe users got a taste of it in the late 1960s and early 1970s, when IBM offered virtual machine technology in two flavors - both enabled by a microcode layer on top of the hardware. Multiple virtual machines were provided by its VM (Virtual Machine) offering, and logical partitions by its more traditional MVS (Multiple Virtual Storage) mainframe offering. As is the case today, virtualization offered faster provisioning, encouraging users to invoke all aspects of the infrastructure virtually rather than "building and cabling" specific solutions. Another key characteristic of early virtualization was its ability to supply critical resources virtually, since real resources were extremely expensive. This supported more users and delivered better asset use, or return on assets, by allowing a resource to run harder - precisely the discipline that eludes current operations and financial managers, given that average Unix and Intel utilization has stayed well below 30% for the past 10 years.
Thirty years ago, IT organizations did not have enough real resources. They now have the opposite problem: too much unused real resource is driving up the cost of computing. Maturing virtualization and virtualization-like partitioning can help turn the tables. The evolution of mainframe virtualization over more than 30 years is now being repeated, in accelerated fashion, on both Unix and Intel platforms, with the same goals in mind - more flexible and faster provisioning, better use through aggregation, and, over the longer term (24-36 months), the ability to run multiple isolated workloads on a single SMP platform.
A Key Intel Driver: Simple, Stateless Applications
With the continuing architectural shift to "stateless" application servers driving large back-end database servers, a front-end failure may be "annoying," but it is not "life-threatening," since all actual application coordination and recovery work is done by the database. In addition, although data center analysis shows legacy systems being overtaken by Intel in three to five years, a key part of a winning data center strategy is this simple fact: application servers for newly purchased applications such as SAP R/3 consume 8x-10x as much computing power as the database. Consequently, there will be increasingly more "simple" application servers talking to large, complex database servers, as depicted in Figure 1 (which covers past years and the next five years for high-end data center operations). In fact, considering application architectures and factoring in some Web services futures, roughly 75% of installed Intel capacity requirements can be met by simple 1- to 2-way servers (remember Moore's Law - a 2x capacity increase every 18 months). And although Linux is not part of this analysis, it is worth noting that Wintel gained a data center foothold five years ago not because of its then robustness but because of the increasing use and discovery of "simple" application servers. Because the growth side of computing is still in simple application servers (i.e., 1- to 2-way boxes or blades with a 6x-10x capacity ratio versus database servers), Linux as a simple application server will do to Windows what Windows did to Unix - provide cheaper, good-enough application servers for a market that is growing 60% annually.
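To make that capacity arithmetic concrete, the sketch below projects the headroom of a simple server under the Moore's Law assumption cited above; the figures are purely illustrative, not measured data.

```python
# Illustrative projection only: treats Moore's Law as a 2x capacity
# increase every 18 months for a simple 1- to 2-way application server.

def projected_capacity(years, base_capacity=1.0, doubling_interval=1.5):
    """Relative capacity of a simple server after the given number of years."""
    return base_capacity * 2 ** (years / doubling_interval)

for years in (1.5, 3, 5):
    print(f"after {years:>3} years: ~{projected_capacity(years):.1f}x today's 1- to 2-way capacity")

# ~2x, ~4x, and ~10x respectively - enough headroom for the "simple" application
# server tier to keep absorbing the bulk of new Intel capacity requirements.
```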
Rational Virtualization Consolidation
Operations groups and data centers have two interrelated infrastructure challenges:
The last step, application integration, is the most difficult and has been quietly sidestepped. Over time, as Unix, Windows, and Linux improve their workload management schemes, this step becomes more reasonable. By 2006/08, these operating systems should provide enough functionality to enable multiple applications to run in a single logical or real partition.
The Inherent Challenges of Consolidation: Big Is Almost Always More Expensive
There are at least three significant challenges with all consolidation efforts and server aggregation, with the first challenge being the most difficult to overcome:
The Intel family of SMP machines, like all other SMP families (1-way to 32-way), produces negative economies of scale. That is, a transaction run on a large n-way machine costs more than the same transaction run on a simple 1-way solution. In fact, the cost ratio of an 8-way to a 1-way system (based on published benchmarks and user data) is about 2+ to 1. Moving to larger machines (16-way to 32-way), users will see transaction costs double again, for a cost ratio of about 4 to 1 on a 32-way. Bigger is often not cheaper.
At the low end, a 1-way system is just that - a simple system, built without the plumbing required for scalable SMP performance. Its price is therefore low, and its cost per transaction is exceptional compared with 8-way systems. Larger systems, by contrast, are saddled with the plumbing demanded by 8-way and larger configurations: high-speed cache coherence and strong SMP scaling expectations. In a nutshell, this is what makes Intel platform consolidation so difficult - exceptionally inexpensive boxes, multiplying like rabbits, that consume steadily growing amounts of care and feeding but provide high transaction "value" in the form of low basic transaction costs.
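A minimal sketch of this negative economy of scale follows, using the rough three-year cost figures from the consolidation model in the next section and treating throughput as proportional to processor count (both simplifying assumptions; the 32-way price is extrapolated from the roughly 4:1 ratio noted above).

```python
# Illustrative numbers only: three-year cost excluding storage and software,
# throughput approximated as proportional to processor count.
servers = {
    "1-way":  {"cost": 6_300,   "throughput": 1},
    "8-way":  {"cost": 126_000, "throughput": 8},
    "32-way": {"cost": 800_000, "throughput": 32},  # assumed, to match the ~4:1 ratio
}

base = servers["1-way"]["cost"] / servers["1-way"]["throughput"]
for name, s in servers.items():
    ratio = (s["cost"] / s["throughput"]) / base
    print(f"{name:>6}: per-transaction cost ratio vs. 1-way ~ {ratio:.1f}")

# 1-way ~1.0, 8-way ~2.5, 32-way ~4.0: bigger is rarely cheaper per transaction.
```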
A Consolidation Model Framework
From this consolidation introduction, it should be obvious that "just" consolidating eight 1-way Intel servers onto an 8-way Intel system, for example, may not deliver the needed financial or operational results. An actual consolidation model works like this: the average cost for a basic 1-way Intel server (excluding storage and software) is just over $2K per year, assuming a three-year life. The cost of a typical 8-way Intel solution (excluding storage and software) is about $42K per year over its three-year life, or roughly $5K+ per processor per year. From a simple breakeven viewpoint, it therefore takes a consolidation of at least 20+ servers to justify an 8-way platform (a 2.5 cost ratio per processor times 8 processors).
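The break-even arithmetic itself is straightforward, as the following sketch (using the per-year cost figures cited above, with storage and software excluded) illustrates.

```python
# Break-even sketch based on the annual cost figures cited above
# (storage and software excluded; three-year server life assumed).
one_way_per_year   = 2_100    # just over $2K per basic 1-way server per year
eight_way_per_year = 42_000   # ~$42K per 8-way server per year

per_processor = eight_way_per_year / 8             # ~$5.25K per processor per year
cost_ratio = per_processor / one_way_per_year      # ~2.5

breakeven = eight_way_per_year / one_way_per_year  # ~20 one-way servers
print(f"per-processor cost ratio: {cost_ratio:.1f}")
print(f"break-even consolidation: ~{breakeven:.0f} one-way servers per 8-way platform")
```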
Complicating the consolidation effort is the simple fact that the more than 20 application servers may have few, if any, “run” characteristics in common, which often leads to erratic operations. This results in broken SLAs, which translates into unhappy customers. Therefore, loading an 8-way machine with the workload of 20 1-way machines is not the financial or operational answer. Because workload management is still very immature, management of performance and user expectations is an operational crapshoot, at best. A better solution is the machine and application isolation provided by machine virtualization techniques supplied by two vendors, VMware and Connectix (part of Microsoft).
Virtualization Game Plan
Although the technical approaches of these two vendors differ slightly, both vendors end up with images that provide the look and feel of the “real thing.” As previously mentioned, virtual machines have been with us since the 1970s. They have been successful at providing complete operational isolation and have enabled significant virtual aggregation; that is, running far more virtual machines (e.g., 1-way Intel platforms) on a real system than would have been possible if each required actual real and complete resources. This brings up the question of why it took so long for the Intel platform to “go virtual.” To understand this, one must look at the market and the IA32 technology.
IA32/X86 Is a Tough Cookie to Virtualize
As for markets, the answer is simple. All vendors living off Intel's architecture have done exceedingly well, and a large portion of their hardware financial success has been due to their ability to continue selling ever more machines into an already (and inherently) low-utilization environment. In fact, until recently, the question of utilization had rarely been raised. Although virtualization has been a key element of mainframe technology since the late 1960s, the IA32 architecture was never designed for virtualization, so subtle workarounds had to be developed - which, in VMware's case, resulted in several "virtualization" patents.
Virtualization Layer Magic
Most processor architectures - including the mainframe line from S/360 through today's zSeries - contain two or more "privilege levels." This feature is what allowed the rather straightforward processor virtualization initially found on S/360. Typically, the most privileged level is owned by the operating system and driver software, while application software uses the least privileged state (in the virtual world, an operating system can be considered an application). Simply put, the theory of virtual machines is to run the VM's code in non-privileged mode and have the hardware (plus some unique software) trap all privileged operations executed by the application or VM, which are then emulated (or executed on the guest's behalf) by the Virtual Machine Monitor (VMM) - a thin layer that sits on top of the hardware. This is the case for IBM's VM and VMware's ESX offerings, which virtualize all the resources of the machine (see Figure 4 for a brief discussion of virtualization). Because the interface exported by the VMM is identical to the hardware interface of the machine, operating systems such as Windows or Linux cannot detect the presence of the VMM layer. Forty years of VM theory rests on the simple fact that most processors faithfully generate exceptions for operations that cannot be performed by a non-privileged task such as a virtual machine; the VMM layer emulates those operations while maintaining control over the system hardware.
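The classic trap-and-emulate model can be illustrated with a toy sketch; the class names and "instruction set" below are invented for illustration and do not represent any vendor's actual implementation.

```python
# A toy model of classic trap-and-emulate virtualization (conceptual sketch only).

PRIVILEGED_OPS = {"write_control_register", "disable_interrupts", "io_out"}

class Guest:
    """A de-privileged guest OS, represented here as a scripted instruction stream."""
    def __init__(self, name, instructions):
        self.name = name
        self.instructions = iter(instructions)
        self.virtual_state = {}   # the guest's private copy of "hardware" state

class VMM:
    """Thin layer that owns the real hardware and multiplexes guests on top of it."""
    def run(self, guest):
        for op in guest.instructions:   # hardware executes guest code directly...
            if op in PRIVILEGED_OPS:    # ...until a privileged operation traps
                self.emulate(guest, op)
            # non-privileged work runs at full speed with no VMM involvement

    def emulate(self, guest, op):
        # Apply the effect to the guest's virtual state, never to the real machine,
        # so the guest cannot detect the VMM or interfere with other guests.
        guest.virtual_state[op] = "emulated"
        print(f"[VMM] trapped '{op}' from {guest.name} and emulated it")

VMM().run(Guest("windows-vm", ["add", "load", "disable_interrupts", "store", "io_out"]))
```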
Unfortunately, the IA32/x86 architecture does not follow this simple rule. In fact, 17 sensitive IA32/x86 instructions fail to trap when executed in non-privileged mode. Moreover, the same operation code can have different semantics in the various "protection rings" (levels of privilege) of the IA32/x86 system, which has four. The IA32/x86 architecture is therefore only "somewhat" virtualizable. Indeed, both VMware and Connectix have spent a great deal of time developing frameworks and methods to work around this problem and keep virtual machines honest, and it is this virtual machine management layer that has proven difficult to build.
The sheer number and variety of typical PC devices and drivers only increase the complexity of the Intel virtualization challenge. Needless to say, both vendors have solved it and, except for "small" things such as performance and extended SMP support, both provide technically correct VM solutions that are getting stronger with every release. Although we believe the x86 VM environment will continue to improve its basic efficiency, it will be three to five years before x86 VMs begin to approach mainframe virtualization efficiencies, which often run into the very high nineties (only 2%-3% virtualization loss), since that architecture is more amenable to virtualization. To put x86 virtual performance in perspective, early VM implementations showed losses of about 40%. Virtualization losses are now in the range of 20%, and losses of less than 10% will not be out of the question in three years.
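A back-of-the-envelope view of what those overhead figures mean for consolidation density follows; the near-linear scaling assumed for the 8-way host is a simplification.

```python
# How much of an 8-way host's capacity survives virtualization overhead,
# expressed in 1-way workload equivalents (near-linear scaling assumed).
raw_capacity = 8.0   # 1-way equivalents

for label, loss in [("early x86 VMs", 0.40), ("today", 0.20),
                    ("~3 years out", 0.10), ("mainframe-class", 0.03)]:
    usable = raw_capacity * (1 - loss)
    print(f"{label:>15} ({loss:.0%} loss): ~{usable:.1f} one-way workloads per host")
```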
The Battle for Virtualization
Platform virtualization is a key enabler for on-demand computing and the adaptive enterprise. Provided, of course, that overall resources are sufficient, data center operations groups can simply provision virtual machines instead of buying physical ones. In many instances, a VMM supplier such as VMware might provide reference platform environments for a whole family of servers, enabling strong thought leadership on the hows and whys of virtualization and computing architecture in general. Clearly, virtualization is a powerful feature. In light of that, Intel recently announced its Vanderpool initiative, the goal of which is to provide its own virtualization layer, or even a full Virtual Machine Monitor, much like VMware's ESX product. There are now three players in the race - Microsoft, Intel, and VMware - and others are certain to join.
As noted, IA32/x86 virtualization required significant skill and effort, and IA64, for example, will require a similar or greater effort, since that architecture is also very difficult to virtualize efficiently. With this in mind, and given Intel's announced five-year delivery window, customers should continue to evaluate virtualization software from viable vendors rather than wait for Intel's offering. For many organizations, current vendors such as VMware are supplying solutions with great operational benefit, even if most are still in semi-production mode. Moreover, within 12-18 months, more than 20% of the high-end Intel market will be exploiting virtualization for production applications. Because virtualization is a thin layer on top of the hardware, users will be able to explore different vendor solutions; although a switch will not be totally transparent, resource requirements will not be outrageous.
Business Impact: Infrastructure virtualization can provide significant operational benefits such as improved time to market and increased configuration flexibility as well as the financial benefits of reduced transaction costs.
Bottom Line: Infrastructure virtualization can provide significant operational flexibility. To be successful, operations groups must have strong operational practices in place to ensure appropriate management of increasingly complex computational architectures.
META Group originally published this article on 31 December 2003.