CityRail outage prompts kit, software steps

Software issues caused by a faulty network switch have been blamed for the 12 April CityRail outage that delayed hundreds of train services and left thousands of passengers running late.

(Cityrail train image by George Grinsted, CC BY-SA 2.0)

The incident began at 7:36am on 12 April when one of the four switches used in the Advanced Train Running Information Control System (ATRICS) in the Sydenham signal box failed, according to a report (PDF) by RailCorp's chief engineer released on Friday. The root cause of the outage, according to RailCorp CEO Rob Mason, was the ATRICS software's inability to readjust after the failure of one of the switches.

"In simple terms, a computer switch failed and this failure was detected by an adjacent switch. This switch detected that the failed switch had started a cycle of operating and then failing. This pattern repeated," he said.

The resulting instability shut down control systems for 40 per cent of the CityRail network, delaying 847 trains and forcing the cancellation of 240 services.

"If the first switch had failed completely, the system would have operated in backup mode," he said.
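The failure mode Mason describes, a device that repeatedly recovers and fails rather than staying down, is commonly handled by treating a "flapping" component as failed until it has been stable for a while. The Python sketch below illustrates that general idea only. The names `FlapDetector` and `choose_mode` and the thresholds are hypothetical, and nothing here is drawn from the ATRICS software or the RailCorp report.

```python
from collections import deque


class FlapDetector:
    """Flags a link as flapping when it changes state too often (hypothetical sketch)."""

    def __init__(self, max_transitions=3, window_seconds=60.0):
        # Both thresholds are illustrative assumptions, not values from the report.
        self.max_transitions = max_transitions
        self.window_seconds = window_seconds
        self._transitions = deque()  # timestamps of recent up/down changes
        self._last_state = None

    def observe(self, timestamp, is_up):
        """Record one health check; return True if the link is considered flapping."""
        if self._last_state is not None and is_up != self._last_state:
            self._transitions.append(timestamp)
        self._last_state = is_up
        # Forget state changes older than the observation window.
        while self._transitions and timestamp - self._transitions[0] > self.window_seconds:
            self._transitions.popleft()
        return len(self._transitions) >= self.max_transitions


def choose_mode(is_up, flapping):
    """Use the primary path only when the switch is both up and stable."""
    return "primary" if is_up and not flapping else "backup"


# A switch that alternates between working and failing every five seconds.
detector = FlapDetector(max_transitions=3, window_seconds=60.0)
samples = [(0, True), (5, False), (10, True), (15, False), (20, True)]
modes = [choose_mode(up, detector.observe(t, up)) for t, up in samples]
```

In this sketch the system stays in backup mode once the third state change is seen, even when the switch briefly reports itself healthy again, and returns to the primary path only after the window has passed without further changes.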

Network engineers replaced the problematic switches, which fixed the issue. The report noted that the first faulty switch was installed in 2003 and had been in place for over eight years. It broke down because of a failed electrolytic capacitor.

The report has made a number of recommendations to prevent the incident from occurring again, including upgrading the ATRICS software and developing new standards for the replacement of system components.

Mason said that RailCorp would implement the report's recommendations. He added that the organisation needs to communicate with passengers more promptly when incidents such as this arise.

"We have already taken steps to reduce the risk of a similar failure from occurring again but we can also learn some lessons on how we responded to the incident on the day, in particular the quality and timeliness of our communication with customers and staff," he said.

"It's clear that we need to improve passenger communications during a major network disruption. This includes announcements at stations and on trains, as well as the information we provide to customers online."

About

Armed with a degree in Computer Science and a Masters in Journalism, Josh keeps a close eye on the telecommunications industry, the National Broadband Network, and all the goings on in government IT.
