
How outage triggered mayhem for hospital datacentre

When a brief electricity failure struck a datacentre at a hospital in Australia, it started a chain of incidents that resulted in serious outages of more than 20 health applications
Written by Suzanne Tindal, Contributor

On 20 May, a brief electricity failure struck a datacentre run by Queensland Health in Australia, starting a chain of incidents that resulted in serious outages of over 20 health applications.

The datacentre, located on the campus of Herston hospital, is believed to be one of three datacentres operated by Queensland Health. It lost power for only a fraction of a second, when two flooded Energex transformers failed at around 5pm that day, according to a source close to the incident. Uninterruptible power supplies kicked in to keep servers up.

However, the power cut tripped the chilled water system, cutting chilled water to the hospital campus. As the chilled water system was not monitored, the datacentre support team did not notice the loss. A datacentre employee came on scene to check everything was running but, satisfied that nothing was wrong, he left.

Only two of 10 air-conditioning units within the datacentre were able to use refrigerant gas when chilled water was unavailable, meaning that although the rest of the units were operating, they were not cooling. The temperature in the datacentre began to rise.

No messages
Although people were called in to investigate the temperature rise, the chilled water problem was not found. Due to a DNS change made the day before the problems began, no alert messages were being sent to notify staff of server problems. Four hours after the power cut, services began to suffer. On-call hospital staff were affected and complained. Soon after, a server shut down.

The air-conditioning specialist who had been called in took three hours to arrive on site; his whereabouts were then unknown to many staff members and he did not answer his phone. Five hours after the power cut, as more servers shut down with temperatures over 50°C, staff discovered that the chilled water pumps had not been operating. The problem was believed to be fixed.

Because the remote-access system was not working, staff had to wait until they arrived at the datacentre before they could begin shutting down servers. When they arrived, they started to move systems over to an alternative datacentre, which in some cases caused brief user inconvenience. Some systems, however, could not be moved: their servers had no ability to fail over, and Queensland Health's virtual machine architecture did not allow them to be shifted to a second datacentre.

The hospital's Cerner electronic medical record (patient administration) system was shut down by the hospital staff.

Six hours after the power cut, the air conditioning was still not working. Although staff believed they had found the problem, more systems shut down, until 75 percent of applications were down and the datacentre reached 45°C.

Eight hours after the power cut, chilled water was finally brought back up. Nine hours after, the datacentre was back to normal and the services could be restored. By 9am the morning after the power cut, all services were restored.

Over the course of the problems, the outage had a significant impact on 12 applications, with a minor impact on another 12. Three years earlier, the datacentre had been forced to shut down for the same reason. Afterwards, the team had been told it could not be allowed to happen again.

When queried on the incident, Queensland Health acting chief information officer Ray Brown did not respond to a question on which facilities around Queensland the applications provided services to. However, it is believed that Queensland Health's three datacentres provide services to multiple locations around the state.

Brown denied there had been more than one incident over the past three years at the datacentre.

'Lessons learned'
According to Brown, since several applications were relocated to the other datacentre, there was "minimal disruption" to services. "The majority of services impacted were available by 2:30am and all Queensland Health systems categorised as critical remained operational during this incident," he said.

"In the face of a severe weather event, the IT staff involved were outstanding in their response to minimise the impact of this incident. The ability of staff to physically attend the site was severely hampered by flooding in the area."

Lessons had been learned, according to Brown. Queensland Health was exploring options to remove its reliance on chilled water, intended to replace the remote-access system by the third quarter of this year, and was undertaking a review of its management tools and examining its crisis-management plan.

Queensland Health has lost several chief information officers over the past several years. Long-time chief information officer Paul Summergreene had his contract terminated by the department in July 2008. Dr Richard Ashbury filled his shoes for a short time before departing, leaving the chair vacant, and Brown currently leads the department's IT function in an acting capacity.

The news also comes as the Queensland government flagged in the last state budget its intent to spend hundreds of millions of dollars on health IT systems to support its e-health capability.
