Could this be every techie's dream job? To design a supercomputer from scratch, with no expense spared, no need for private sector investment and, if at all possible, no human involvement in the running of it. This is the challenge that one supercomputer specialist is wrestling with in Germany.
Germany's second most powerful supercomputer, at the Leibniz computing centre near Munich, is going through an upgrade that should see its performance double in 2007.
The Leibniz Rechenzentrum (LRZ), a specialist supercomputer centre, was only completed two years ago. According to the director of the centre, Professor Dr Heinz-Gerd Hegering, its Silicon Graphics (SGI) computer was designed from scratch and built with an eye to repeated upgrading of the main components as supercomputing technology improved.
The design of the completed building is certainly eye-catching: a flat cube that dominates the squat science park surrounding it. It looks, if anything, like a Borg ship beamed down from space and left marooned in a sea of bland low-rises.
The alien comparison is a valid one. Professor Hegering has planned this system to operate without people, fully "lights out", with only robotic tape storage loaders moving within its walls. During our visit to the centre we saw only one other person in this massive facility.
Considering that the whole building is only there to house one supercomputer, it is massive, almost twice the height of the five-storey building standing next to it.
The supercomputer sits at the top of the building, with only the exhaust blowers for the air-conditioning system and the roof above it. The computer room is in a cage in the centre. Around it, on all four sides, is a space stretching from the ground to the top of the building, which helps shield the computer.
The Leibniz supercomputer is part of a network of computers and research and education facilities that run across Bavaria and throughout southern Germany.
Professor Hegering is chairman of the board of Leibniz University, which he says was "one of the first elite campuses in Germany" — campuses that were identified as academic areas of particular excellence. It is a state university and, like the supercomputer, owes nothing to private enterprise or investment. While local companies, such as BMW, would like to be involved with enterprises such as the supercomputer, their involvement is not encouraged.
The research and academic infrastructure around Leibniz is vast, including 65,000 computers linked in what they call a high-performance computer cluster (HPCC).
The Leibniz supercomputer is a two-year-old Silicon Graphics (SGI) Altix 4700 and is part of a planned European computing centre, which in turn hosts a Grid project extending out to the rest of the world. There are three national supercomputers within Germany.
"We need to move this with one voice," Hegering says of the European projects. "Within days we are signing a memorandum of understanding between these groups. We want to boost the science through understanding."
The framework for the Leibniz supercomputer is eight of these sets of servers. The basic components are single-socket, dual-core Intel Itanium 2 servers, and there are 1,024 processor cores in a single row. Counting a CPU as one, you get the advertised total of 4,096 units.
In performance terms, it is rated at 26.2 teraflops (trillion floating-point operations per second), but it is now going through an upgrade that will more than double its performance, according to Hegering.
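That headline figure can be sanity-checked with simple arithmetic. The sketch below is an assumption-laden back-of-envelope calculation, not vendor data: it assumes the quoted 4,096 cores, the 1.6GHz clock mentioned for the Montecito parts later in this article, and the Itanium 2's ability to retire four double-precision floating-point operations per cycle per core.

```python
# Back-of-envelope peak-performance estimate for the Leibniz machine.
# All three inputs are assumptions taken from the article, not vendor specs.
cores = 4096            # advertised processor-core count
clock_hz = 1.6e9        # assumed clock speed (1.6GHz)
flops_per_cycle = 4     # Itanium 2: two fused multiply-add units per core

peak_flops = cores * clock_hz * flops_per_cycle
print(f"theoretical peak: {peak_flops / 1e12:.1f} teraflops")
# -> theoretical peak: 26.2 teraflops
```

Under those assumptions the arithmetic lands exactly on the quoted 26.2 teraflop rating, which suggests the figure is a theoretical peak rather than a measured benchmark result.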
There have been challenges, not least SGI's lurch into Chapter 11 bankruptcy protection soon after the LRZ bought the supercomputer.
"You ask, was SGI the right choice in view of what happened to the company?" says Hegering. "At that time you could not forsee the soundness of the enterprise. I could not forsee Chapter 11. But, you know, that did not affect us at all."
Between now and the end of the year, the Leibniz supercomputer will be upgraded incrementally to dual-socket Itanium 2 servers. Additionally, some of the blades will move to the 1.6GHz Montecito dual-core Itanium 2 with 9MB of cache.
This is part of Professor Hegering's strategy of improving the supercomputer without widespread disruption. The whole computer room is built in such a way as to be easily upgradeable. But it does not rely on Intel's ability to offer upgraded processors. At the end of the upgrade, peak performance of around 60 trillion flops will put the Leibniz supercomputer near the top of the Top 500 supercomputer list.
However, Professor Hegering does not put much faith in the Top 500 list, which he calls an "artificial measurement".
"It is comparable to categorising all vehicles with one number," Hegering says, “How do you distinguish between a bus and a truck? It says nothing about search, about frequency bandwidth, about many other factors."
The Top 500 is not for academics, Hegering argues. "Of course your position in the Top 500 is important, but mainly for political reasons," he says. "Politicians like to see they have something that is up there."
Like the true IT professional he is — and he has been working on supercomputers since 1968 — Hegering says: "My aim was to optimise the usability of the system."
As Hegering showed us, underneath these systems is a massive two-metre void filled with hundreds of pillars standing a very short distance apart. Any of the pillars, which support these systems, can be removed to let any of the mass of cabling be moved, or removed and replaced, with ease. If the LRZ moves to a completely different computer platform with completely different infrastructure, the old one can be taken out and the new one put in just as easily.
Even the air conditioning at this centre looks like part of an alien ship arriving: eye-catching, to say the least. And massive.
It's not just the computers and the computer room at the LRZ that exist on a large scale. So do all the ancillary systems that are needed to look after a large and expensive supercomputer. Most of these systems run underneath the computers, apart from the air-conditioning outlets which sit below and just above the roof of the centre.
These systems certainly work. As you walk around the centre you quickly gain a new understanding of the term "ambient temperature". The environment is perfectly ambient, neither hot nor cold, neither moist nor too dry. It would be a very comfortable place to work, except that no humans work here. The perfect environment exists for the benefit of the systems alone, and they appear to thrive on it.
This picture shows a tiny fraction of the cabling needed to keep so many separate computers working together. There is a mass of cabling, of course, but as you walk around the centre, opening doors, roof spaces and hatches, you see there is absolutely nothing out of place. Everything is entirely in order and in its correct place, right down to the cabling, which sits in perfect order and is clearly marked.
The standard access to the centre is across this bridge, which increases the LRZ's sense of isolation. There is, of course, other access built at ground-floor level, for emergency use and, when necessary, for equipment, but this is the only access for regular use. And, naturally enough, when walking around the centre you don't see anybody using it apart from visitors.
It is elevated high above the ground, because that's where the computers are. Everything below is equipment, and there is rather a lot of that. It is worth noting that the computer room for the supercomputer only takes up one third of the width of the building. There are other IT systems and equipment here as well, which are governed by the same rules on "lights out" computing and general low maintenance as the supercomputer itself.
The coolant tanks for the air-conditioning systems continue the larger-than-life and orderly theme. These tanks stand two metres high, and we counted six in one area. That helps explain how the centre's carefully controlled ambient temperature is achieved.
The tanks sit just below the main computer room.
Argon is one of the most expensive gases you can use for fire suppression in a computer centre, and is believed to be the most suitable for emergency use around people and computer systems. These argon tanks stand ready for use in an emergency at the LRZ and, in themselves, represent a sizeable investment for the public sector that financed the enterprise.
While safety features such as deployment of argon in the event of a fire can be handled by automatic emergency systems, there is also the control centre for dealing manually with more complex emergencies. The control centre is unmanned and remains silent and dark except for the flickering of the nine main computer terminals. According to Professor Hegering, the last time the centre was used was during the power cuts that blacked out large swathes of Continental Europe in November 2006.