Internode resorts to disaster recovery

The email accounts of Internode users were stranded over the weekend as the internet service provider battled a major storage infrastructure failure and was forced to fall back to its disaster recovery centre to restore lost services.

Customer email, corporate Web hosting, personal Web hosting, Web mail and customer Web tools took a hit on Friday when some of the company's systems were taken down by a major hardware failure affecting multiple servers.

Corporate Web hosting was restored early Friday afternoon, while other services stayed down for longer. Email was the last service to come back: the percentage of customers with working email crept up until Sunday afternoon, when the company announced that all services were up and running. The company insisted no email was lost.

Internode managing director Simon Hackett wrote on broadband forum Whirlpool that the outage could have been over within four hours of starting, but once the recovery process had been completed, the system began crashing within minutes of being fed production traffic. This forced a rebuild of the entire file system, which holds more than 22 million files.

Complications aside, the outage never should have happened, Hackett said in another post.

"We have a very large investment in a very high end dual-site/fully redundant storage area network system that just isn't supposed to do this — ever. Clearly, it has — and yes, the vendor of that system has been involved (from 20min after the initial failure) in being a part of 'the solution' here, too," he said.

The server operating system was also implicated in the failure. Once services were fully restored, there would be an investigation into exactly what caused the disaster, Hackett said.

"The wrap-up here is going to involve two separate vendors (SAN and server OS) debugging some failure modes neither of them has seen before, some changes in approach in handling the mail cluster to avoid the restoration process (in the unlikely event it's ever needed again) from taking so long, and a variety of other related measures," he said in another post.

"This is a rare and extremely annoying thing for us, as well as for you — and we're absolutely determined to avoid it becoming a habit," he said.

Internode did not respond this morning to an emailed request for comment.

Suzanne Tindal cut her teeth at ZDNet.com.au as the site's telecommunications reporter, a role that saw her break some of the biggest stories associated with the National Broadband Network process. She then turned her attention to all matters in government and corporate ICT circles. Now she's taking on the whole gamut as news editor for t...