Building the perfect data centre

What do you need in a data centre and can you find it in new technologies that vendors are offering, like self-management, virtualisation and automation?
Written by David Braue, Contributor

Deep under the stands at Melbourne's Rod Laver Arena, far away from the glare of the TV cameras, there is a locked, darkened room. Inside that room lie a host of servers, quietly humming away to serve up the 81 million Web pages viewed by more than 1.8 million visitors during the two-week Australian Open late in January.
During that time, the room remained locked, with virtually nobody going in. The job of remote monitoring staff was made easier by intelligence, built into the system, that let the systems automatically detect and avoid many common problems that might have brought servers in a lesser data centre crashing down.
This is the face of autonomous computing, a concept that IBM was keen to demonstrate at the Open -- a high-profile regular sponsorship for the company and a chance to test out some of the latest innovations to come out of the IT giant. By the tournament's end, the data centre had done its job handsomely, serving up mountains of information to journalists, referees, players, coaches, and many of the several hundred thousand people that paraded through the gates during the Open.
Could your data centre match this level of reliability? In the olden days of mainframes, maybe: you'd build the glasshouse, lock the doors, and only let in the gurus.
These days, however, very few companies can afford to turn their backs on their data centres enough to lock the door and turn out the lights. While IBM has the luxury of building the data centre for a specific purpose and keeping its configuration constant throughout a fixed event, for most real-world companies the data centre is a hive of constant change. Computing heterogeneity has introduced levels of complexity that have increased along with open systems' rapid spread into the data centre -- and you just can't leave them alone for a second.
"Just because you've got a data centre doesn't mean you're protecting your data," says David Cowell, principal consultant for data centre services with StorageTek Australia. "A lot of the disciplines that were pumped into us in the 1980s are still not being managed properly 20 years later. As soon as you get into just about any customer environment, you have to change it. People are going to introduce changes, and each change -- no matter how simple it may seem -- could have a huge impact. It's a lot more complex than it was 20 years ago."
As a result, many companies' data centres are operating at far from the reliability and autonomy many would consider ideal -- often due to simple human oversight. Cowell has seen his share of poorly configured data centres where, for example, a surprising number of companies had installed uninterruptible power supplies but were left with malfunctioning units after failing to maintain the batteries.
That's hardly the way to treat a facility that's quickly becoming the heart of the modern business. A recent survey by StorageTek revealed that 60 percent of the 130 Australian organisations surveyed said all their mission-critical servers and storage infrastructure resided in a data centre, while another 26 percent said that around 80 percent of their servers and storage had found a home there.
Recognising the complexity of the modern data centre, many companies aim to provide enough computing and storage power in the data centre by vastly overprovisioning the bandwidth, electrical services, server power, storage, and other resources installed in the site. Assessments of data centres typically show that only 15 to 25 percent of available resources are actually being used. A recent customer survey by American Power Conversion found that the average data centre is designed with three times more computing resources than it requires, and ten times the actual startup capacity needed.
While it may work for many companies some of the time, overprovisioning is far from a strategic ideal. At the very least, it's expensive: staff waste time managing servers and storage arrays that are never used, and the business pays dearly for their time, the energy to keep the systems running, and the leasing costs of the equipment itself. Throughout the lifecycle of the data centre, the investment in underutilised equipment also represents a significant expense that ties up capital that could be better invested in other parts of the business.
The problem is compounded in multi-platform environments, where Windows, mainframe, and several varieties of Unix historically required their own dedicated storage space, backup policies, monitoring software, and staff expertise. Multiply this by n servers for each environment, and it's clear why overprovisioning quickly becomes an expensive and -- in all but the largest and best-funded organisations -- strategically problematic proposition.
Building the virtual vision
Unfortunately, given the lack of alternatives, overprovisioning has also been pretty much the only choice for most companies. And by the time they grow into the excess capacity they bought several years earlier, they're trying to build a modern computing infrastructure on years-old servers and storage systems that simply aren't up to the task anymore.
It would be much nicer if data centres could be designed to suit actual business requirements, automatically allocating resources as needed but shifting unused resources to other tasks -- or other companies -- when application load was low. In an ideal world, resource-efficient data centres would automatically sense when it was time to make a change -- intelligently monitoring application loads and performance. This self-awareness would also help them monitor system functions, automatically fix problems, escalate issues to technical support staff, and provision resources as necessary.
Imbuing the data centre with that much intelligence about itself may sound infinitely difficult, but that's exactly the goal of a number of initiatives by infrastructure providers and the various giants of systems management. In recent years, each has painted a long-term vision of the way they see their server environments evolving. HP calls its vision Adaptive Enterprise, Sun is pushing its N1, IBM knows it as e-Business on Demand, and Microsoft talks of its Dynamic Systems Initiative.
Whatever you call it, however, these vendors are talking about the same thing: a flexible computing infrastructure that can be dynamically adjusted -- or adjust itself -- to suit changing application demands. This is almost antithetical to the static and carefully thought-out overprovisioning approach that's defined data centres in the past, but it's being positioned to become commonplace as ideas about service-based computing take hold.
The benefits of virtualisation
At the core of these visions of the data centre future are presuppositions about the spread of virtualisation -- in which technology builds an abstraction layer across servers and storage devices of different types.
With a virtualisation layer in place, enterprise applications can address data centre resources as an abstraction rather than having to be concerned with the specific operating system versions or application programming interfaces (APIs) necessary to access the systems. Relying on virtual servers rather than real ones allows companies to look past the specifics of any given computing platform and instead focus on the functionality of the platform.
The best example of virtualisation is IBM's successful push to reposition its zSeries mainframes as platforms for the creation and maintenance of hundreds of simultaneous virtual Linux servers. In periods of increasing demand, extra virtual servers can be commissioned, using up more of the system's latent and unused computing capacity to scale horizontally rather than allocating all of the system's processing power to one single vertical task. When demand slows, virtual machines are decommissioned or reassigned to other tasks as necessary.
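The commission-and-decommission cycle described above can be sketched in a few lines of Python. This is only an illustration, not how zSeries or any vendor's product actually schedules virtual machines: the function names, the per-server capacity figure, and the host limit are all hypothetical.

```python
import math


def virtual_servers_needed(requests_per_sec, per_server_capacity,
                           max_servers, min_servers=1):
    """Return how many virtual servers current demand calls for.

    All parameters are hypothetical: per_server_capacity is the load one
    virtual server is assumed to handle, and max_servers is the most the
    host is assumed able to run at once.
    """
    if per_server_capacity <= 0:
        raise ValueError("per_server_capacity must be positive")
    wanted = math.ceil(requests_per_sec / per_server_capacity)
    # Never drop below the minimum, never exceed what the host can carry.
    return max(min_servers, min(wanted, max_servers))


def rebalance(current_servers, requests_per_sec, per_server_capacity,
              max_servers):
    """Decide whether to commission, decommission, or hold steady."""
    target = virtual_servers_needed(requests_per_sec, per_server_capacity,
                                    max_servers)
    delta = target - current_servers
    if delta > 0:
        return ("commission", delta)
    if delta < 0:
        return ("decommission", -delta)
    return ("hold", 0)
```

Calling `rebalance(10, 250, 100, 300)` would suggest decommissioning seven of the ten running virtual servers, freeing that capacity for other tasks until demand picks up again.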
Another place where virtualisation has come into its own is in blade servers, which contain a number of commodity servers-on-cards designed to be automatically configured as any of several types of server according to demand. While a blade might normally support an SAP environment by acting as part of a Windows Server 2003 cluster, it could be quickly reconfigured to act as a Linux Web server during times of peak Web site demand.
Virtualisation of servers provides a robust way of scaling to meet application demand, but the technique is even more important within storage area networks (SANs), where the need to support many different types of systems in a single storage array years ago led vendors to find ways of virtualising the storage. A virtualised SAN can look like many different file systems to the many different operating systems accessing it; on the SAN disk itself, data is stored in standardised blocks that the disks are optimised to handle in their own way.
Storage and server virtualisation converging?
There are signs that storage and server virtualisation are converging. VMware, long a leader in the server virtualisation space, was this year snapped up by storage giant EMC, whose archrival Veritas has also bought provisioning and virtualisation capabilities through its 2002 acquisition of Jareva Technologies and its January purchase of Ejasent. Microsoft, which has many aspirations to worm its way into the corporate SAN, will soon release Microsoft Virtual Server, a reworked version of virtual server technology it acquired from VMware rival Connectix last year. For its part, Sun will incorporate virtualisation technology in its upcoming Solaris 10 that will allow for the creation of up to 4000 "Solaris Zones".
Both EMC and Veritas, known for their robust storage management software, now have technology to automatically create virtual Windows, Linux, Unix, and other servers that run on top of all manner of hardware platforms, while operating system vendors are building both close storage links and better virtualisation into their platforms. Virtual server images will be intrinsically linked with the capabilities of the enterprise storage that supports them, tightening the relationship between computing power and storage system while keeping the actual hardware of each completely separate.
The industry "is working towards a timeline of being able to virtualise everything in the data centre," says Angus MacDonald, chief technical officer with Sun Microsystems Australia. "Then we can start thinking seriously in terms of pools of resources rather than that 'I've got x servers of type z'. We need to move people from thinking about systems to thinking about services."
Sun and IBM, among others, eventually see customers taking the idea to a logical extreme with their support for grid-based computing. Also known as utility computing, the grid idea is built on the notion that virtualisation allows coupling of geographically dispersed systems of all different kinds; applications can then call upon this collective computing power as it's needed, paying its owners as appropriate.
Data centres' reality check
In theory, virtualisation represents an important step away from the inflexibility of homogeneous data centres and the practice of overprovisioning of heterogeneous environments. By adapting resources to match changing application needs, technology investments can be better utilised and data centre reliability increased.
As is so often the case, reality is much different. For most customers, high-end platforms remain painfully elusive, even for those that have invested in SANs, blade servers, or other infrastructure that is readily virtualised.
Their more immediate focus is to complete the quite demanding process of server consolidation, which is a useful precursor to full virtualisation of the data centre resources. StorageTek's customer survey found that 71 percent of respondents were planning or considering consolidating their server and storage infrastructure into a data centre, suggesting that consolidation remains a work in progress for most companies.
That consolidation process can often be quite eye-opening, says Sun's MacDonald. "So many organisations really don't understand what they have in their data centres today," he points out. "You can't automate something if you don't know what you're automating."
This makes it early days for the automated enterprise management that N1, like competing paradigms, is designed to make possible. With customers still needing to make investment decisions based on potential business return, MacDonald concedes it is likely to be two years before N1 really starts to gain traction within the corporate data centre.
A far bigger impediment to the spread of autonomous monitoring is the issue of heterogeneity. Few companies have implemented a single technology platform across their entire data centre, which means that effective autonomy is going to require a multi-platform solution from the start. Each of these platforms is affiliated with a vendor that's trying to dominate the market with its own approach to grid and autonomous computing; put them together and, without some crucial interoperability standards, full virtualisation capabilities -- and the attendant access to heterogeneous resources that they enable -- will prove elusive.
Enter DCML (Data Center Markup Language), an emerging standard that promises to bridge the gaps between the myriad components of today's data centres. Built around a standard XML-based vocabulary, DCML code describes the components of a data centre and the policies governing it. It's being designed to enable data centre automation, utility computing, and system management solutions by providing a standard method for interchanging information.
Systems management, virtualisation, and other tools will use this information to recognise and automatically adjust for the variations between data centres. Yet while it's ambitious in scope, just what effect DCML has is yet to be seen: the 44-member organisation is due to issue the first draft of the standard by June, so it will be several years before a final standard is ratified and incorporated into products.
Where autonomous software may prove beneficial is in the use of pseudo-intelligent monitoring agents to provide more meaningful views of enterprise network activities. This is particularly the case with regard to security solutions such as intrusion detection systems (IDSes), which have earned a bad reputation for throwing up alerts at a rate of knots and drowning security personnel in unusable information. The addition of increasingly popular correlation engines, which have gained currency in conventional network monitoring environments as a way of better prioritising network alarms, can reduce this type of information to a volume that's small enough to manage and respond to.
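To give a rough sense of how a correlation engine cuts alert volume, the Python sketch below folds repeated alerts into single incidents. It is a deliberately simplified assumption of one common approach, grouping by source and signature within a time window; commercial engines use far richer rules, and the field names (`time`, `source`, `signature`) and the 60-second window are hypothetical.

```python
def correlate(alerts, window_seconds=60):
    """Collapse repeated alerts into incidents.

    Alerts that share a source and signature, and that arrive within
    window_seconds of the previous matching alert, are counted against
    one incident instead of being reported separately.
    """
    incidents = []
    latest = {}  # (source, signature) -> most recent incident for that key
    for alert in sorted(alerts, key=lambda a: a["time"]):
        key = (alert["source"], alert["signature"])
        incident = latest.get(key)
        if (incident is not None
                and alert["time"] - incident["last_seen"] <= window_seconds):
            # Same problem, still ongoing: bump the counter.
            incident["count"] += 1
            incident["last_seen"] = alert["time"]
        else:
            # First sighting, or the previous burst has gone quiet.
            incident = {
                "source": alert["source"],
                "signature": alert["signature"],
                "first_seen": alert["time"],
                "last_seen": alert["time"],
                "count": 1,
            }
            incidents.append(incident)
            latest[key] = incident
    return incidents
```

A burst of hundreds of identical port-scan alerts from one address would surface as a single incident with a high count, which is the kind of reduction that lets staff respond to what matters.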
StorageTek's Cowell flat out doubts that fully automated data centres can deliver the reliability they're supposed to. "Although automated management products have done backups and restores automatically in the past, a lot of customising and tuning have to be done," he says. "As much as vendors may say their systems are all singing and all dancing, the reality is that once you get to implementation it's different. Running a data centre is all about process -- for example, the process of change management -- and I can't ever see the day where you're going to be able to automate that."
At this pace, should customers really be buying into vendors' discussions about automation and the lights-out data centres they will enable? Not yet.
For now, you can put all the technology you want into your data centre but you're still going to require the expertise of real live humans to deal with exceptions and the ever-tricky issues intertwined with change management. In the near future at least, autonomous agents will become tools to help people, rather than replace them. And as long as the business continues to grow and change, the lights will almost always have to stay on inside the data centre. Otherwise, your people might trip over a cord.

Case study: Lights still on at MCT datacentre

Automation may help with some aspects of data centre management, but Glen Noble believes humans are still critical in keeping things running smoothly.
He should know. As general manager of data and hosting with Macquarie Corporate Telecommunications (MCT), Noble heads a growing team that offers collocation, managed dedicated hosting, managed shared hosting, and customised data centre outsourcing to a broad array of corporate clients.
With loads of bandwidth coming into the data centre, redundant everything and all sorts of management software keeping tabs on things, MCT's 24x7 data centre is fully wired. Yet while tools such as correlation engines are proving useful in sorting through and trimming tens of millions of security alerts -- typically reducing the volume of alarms by more than 90 percent -- Noble says there's a limit to how much technology can do.
"The amount of automation we have now is fantastic," he says. "Management software suites have lots of element managers and correlation engines. On the other hand, I've got an army of engineers who have to patch servers every couple of weeks. With all these patches constantly coming out, you've got to look at them, analyse them, and make decisions as to whether you should install them. Security is about countermeasures and interpreting stuff. At the end of the day, it's extremely laborious and manual."
Noble gives the example of a hosted customer whose systems have become infected with one of the many e-mail viruses currently doing the rounds. Although management systems will quickly pick up on the infection, it ultimately takes a human to decide whether to cut that customer's access for the time it takes to resolve the issue.
Will MCT ever be able to turn the lights out on its data centre? Noble doesn't think so; in fact, he's continuing to expand his data centre monitoring team with new hires. And while he appreciates that autonomous computing technology continues to improve by leaps and bounds, he doubts it's ever going to be able to do all the work on its own.
"If you had a data centre that was standalone and not connected to the world, maybe [you could switch off the lights]," he says. "But this one is not an island and it's a fair bit of effort to manage it efficiently and securely. You need human intelligence to make interpretive decisions, and no amount of automation can take away decision making."

Executive summary: keep your lights on

Vendors like to talk about building automated, lights-out data centres. In the real world, however, it's going to be a long time, if ever, before you can flip the switch for real. Here are a few things to remember:

  • Virtualisation rules
    In the past, data was tied to specific servers and operating systems. This is a major reason why data centres have been so hard to manage: things work differently on different systems. Virtual servers and storage make applications run at an abstract layer divorced from the underlying hardware -- improving uptime and availability.
  • Consolidate before you automate
    Data centres are rightfully becoming centralised places for storage of all corporate data and application servers. But don't jump the gun: ensure you've undergone the often tedious process of server consolidation before giving management systems free rein with virtualisation tools. Otherwise, unconsolidated servers may be left out of the loop and lose their mission-critical value.
  • It's all about availability
    Make sure your planning ensures that mission-critical business processes are supported by high-availability systems.
  • Systems can't run everything
    People do. In the end, good old-fashioned human intuition and abstract thinking -- two capabilities still lacking from artificial intelligence engines -- are key skills in ensuring that the data centre can change to meet every new challenge. But systems are great at grunt work. Use automation where it's appropriate -- like to sort out the wheat from the chaff of the millions of security alerts the average company will get every month from overcautious security systems.
  • Think smart
    The biggest threats to business continuity these days aren't mean, nasty hackers so much as common things like virus-laden e-mails. Formulate policies for preventing them, then make sure your employees learn why it's important to follow them. Simple precautions now could spare you a data centre meltdown down the track.
  • Think heterogeneous
    Vendors won't, but you need to if you're going to introduce some level of consistency across varying types of equipment. Look to standards like DCML to normalise the structure of your data centre in understandable terms.
  • People still know best
    If you think automated tools will let you reduce your data centre head count, think again. You may be able to improve management of certain parts of the facility, but only people can effectively handle the change management that is part of everyday life within the data centre. Keep them around. Besides, machines aren't very interesting at meetings.

This article was first published in Technology & Business magazine, a ZDNet Australia publication.
