Purdue experience shows value of out of the box datacenter thinking

Summary: An off-the-wall idea to use capabilities already at hand saves millions of compute hours at Purdue University.


One of the current trends in datacenter hardware energy efficiency is the growing availability of on-demand power distribution and cooling devices. The basic idea is that power and cooling should be distributed through the datacenter dynamically, applied when and where they are needed.

This clearly helps the overall efficiency of the datacenter and should be part of the model for future datacenter design. This summer's heat wave has certainly put a strain on the cooling systems of US datacenters, and it has been surprising how few heat-related outages have actually been reported.

But at Purdue's High Performance Computing datacenter, heat is more of a problem than in most corporate datacenters. The jobs that run on its supercomputers are such that if the computer or some major subset of nodes goes down, the entire job has to be restarted from scratch. That is no trivial matter, as some of the computational analyses run at the site can take months to complete, running 24/7.

Since the amount of heat that is generated by 15,000 processors is significant, and runaway temperatures can actually damage the hardware, the Purdue supercomputer datacenter starts setting off warnings at an ambient temperature of 82 degrees and starts shutting itself down at 90. But one of their system administrators thought he had a better answer.

Patrick Finnegan came up with the idea of using capabilities built into the AMD and Intel processors to throttle down CPU performance (on 8,000 processors) and so reduce the waste heat generated by the supercomputer. The nature of the datacenter meant, however, that the process couldn't be tested; the first time it would be used was when it was actually needed.
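
As a rough illustration of the trade-off, and not Purdue's actual code, the sketch below maps ambient temperature to a CPU frequency cap. It assumes the 82 and 90 degree thresholds mentioned above are Fahrenheit, and it invents a simple linear ramp between them; every function and constant name here is hypothetical.

```python
from enum import Enum

# Thresholds quoted in the article; the Fahrenheit unit is an assumption.
WARN_F = 82.0      # warnings start at this ambient temperature
SHUTDOWN_F = 90.0  # automatic shutdown starts at this ambient temperature


class Action(Enum):
    NORMAL = "normal"
    THROTTLE = "throttle"
    SHUTDOWN = "shutdown"


def choose_action(ambient_f: float) -> Action:
    """Pick a coarse response for the current ambient temperature."""
    if ambient_f >= SHUTDOWN_F:
        return Action.SHUTDOWN
    if ambient_f >= WARN_F:
        return Action.THROTTLE
    return Action.NORMAL


def target_frequency_khz(ambient_f: float, min_khz: int, max_khz: int) -> int:
    """Scale the frequency cap linearly from max_khz at the warning
    threshold down to min_khz at the shutdown threshold.

    The linear ramp is an illustrative policy, not the one Purdue used.
    """
    if ambient_f <= WARN_F:
        return max_khz
    if ambient_f >= SHUTDOWN_F:
        return min_khz
    fraction = (ambient_f - WARN_F) / (SHUTDOWN_F - WARN_F)
    return int(max_khz - fraction * (max_khz - min_khz))
```

Under this policy the processors run at full speed below the warning threshold and lose clock speed gradually as the room warms, so a long-running job slows down instead of being killed by a shutdown.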

Needless to say, the process worked, and Purdue has made it available from its Folio Direct site to other users of distributed computing systems running Linux (it was written for Red Hat).
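
On Linux, one way a tool like this could apply a frequency cap is through the kernel's cpufreq sysfs files. The sketch below is an assumption about how that might look, not the Purdue software itself: cap_cpu_frequency is a hypothetical helper, and writing to scaling_max_freq on a real system normally requires root.

```python
from pathlib import Path

# Standard location of per-CPU cpufreq files on Linux.
CPU_ROOT = Path("/sys/devices/system/cpu")


def cap_cpu_frequency(limit_khz: int, root: Path = CPU_ROOT) -> list[str]:
    """Cap every CPU's maximum frequency at limit_khz (in kHz).

    The limit is clamped to the range the hardware reports, then written
    to scaling_max_freq. Returns the names of the CPUs that were capped.
    """
    capped = []
    # Match cpu0, cpu1, ... but not sibling directories such as cpuidle.
    for policy in sorted(root.glob("cpu[0-9]*/cpufreq")):
        lowest = int((policy / "cpuinfo_min_freq").read_text())
        highest = int((policy / "cpuinfo_max_freq").read_text())
        clamped = max(lowest, min(limit_khz, highest))
        (policy / "scaling_max_freq").write_text(str(clamped))
        capped.append(policy.parent.name)
    return capped
```

Taking the root directory as a parameter lets the function be tried against a scratch directory before it is pointed at a live machine, which matters when, as at Purdue, the real thing cannot be tested in advance.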

So what does the corporate user learn from this? A little imagination can go a long way, and there is more than one way to manage the environment in your datacenter.

Topics: Storage, Data Centers, Hardware

  • I think that's a great idea ...

    ... In fact, I think OSs such as Windows should have algorithms running in the background that note the rate at which heat is increasing in PCs (this applies especially to laptops) and preemptively throttle the processes causing the increase if they calculate that the PC will soon exceed its heat tolerance threshold. The OSs could also notify users, giving them the option of manually shutting down these CPU-hungry processes.
    P. Douglas
  • RE: Purdue experience shows value of out of the box datacenter thinking

    Intel SpeedStep and AMD PowerNow! are features that bring down the core frequency when loads are light. This is already supported by most modern OSs, such as Linux (cpufreq) and Windows, and is already enabled by default on Windows. It is not clear why additional software is needed to accomplish this.
  • RE: Purdue experience shows value of out of the box datacenter thinking

    It is true that Intel and AMD have supported this functionality for a while now, and vendors like HP provide direct control of processor throttling settings via their Lights Out Management interfaces. HP's power capping works essentially in this manner.

    Furthermore, new servers and supercomputing clusters are being built with components tested to higher thermal constraints. For most new servers, vendors will warrant the server up to 95F inlet temperature, and SGI/Rackable provide servers with up to 104F thermal certifications. The telco industry has required 104F for years, which is why you rarely hear about telephone circuits going down due to heat waves. It is time for the data center to catch up to the state of the art from 20 years ago in the telco market!