Internode resorts to disaster recovery

The email accounts of Internode users were stranded over the weekend as the internet service provider battled a major storage infrastructure failure and was forced to fall back to its disaster recovery centre to restore lost services.

Simon Hackett
(Credit: Internode)

Customer email, corporate Web hosting, personal Web hosting, Web mail and customer Web tools took a hit on Friday when some of the company's systems were taken down by a major hardware failure affecting multiple servers.

Corporate Web hosting was restored early on Friday afternoon, while the other affected services stayed down for longer. Email was the last to come back: the share of customers with working email climbed steadily until Sunday afternoon, when the company announced that all services were up and running. The company insisted that no email was lost.

Internode managing director Simon Hackett wrote on the broadband forum Whirlpool that the outage could have been over within four hours of starting, but once the recovery process was complete, the system began crashing within minutes of being put back under production traffic. As a result, the entire file system, holding more than 22 million files, had to be rebuilt.

Complications aside, the outage never should have happened, Hackett said in another post.

"We have a very large investment in a very high end dual-site/fully redundant storage area network system that just isn't supposed to do this — ever. Clearly, it has — and yes, the vendor of that system has been involved (from 20min after the initial failure) in being a part of 'the solution' here, too," he said.

The failure also involved the server operating system. Once services were stable again, there would be an investigation into exactly what caused the disaster, Hackett said.

"The wrap-up here is going to involve two separate vendors (SAN and server OS) debugging some failure modes neither of them has seen before, some changes in approach in handling the mail cluster to avoid the restoration process (in the unlikely event it's ever needed again) from taking so long, and a variety of other related measures," he said in another post.

"This is a rare and extremely annoying thing for us, as well as for you — and we're absolutely determined to avoid it becoming a habit," he said.

Internode did not respond this morning to an emailed request for comment.

Topics: Collaboration, Data Management, Outage, Storage, Telcos

Suzanne Tindal

About Suzanne Tindal

Suzanne Tindal cut her teeth at ZDNet.com.au as the site's telecommunications reporter, a role that saw her break some of the biggest stories associated with the National Broadband Network process. She then turned her attention to all matters in government and corporate ICT circles. Now she's taking on the whole gamut as news editor for the site.


Talkback

7 comments
  • Internode

    So, S. Hackett does not issue an apology to his customers or discuss financial compensation for business losses attributed fully or partially to an issue he has taken full responsibility for on an online public forum? Arrogant. If this was Telstra, the media would be collectively sinking the boot in. Hackett is too busy providing an excessively technical explanation (excuse) on a blog, rather than offering an apology or reassuring its growing customer base. Disappointing.
    anonymous
  • Arrogance? Non!

    Actually, I found Simon's posts a refreshingly honest change in these times of corporate spin and misinformation. Given that Simon is probably aware that a good percentage of his customer base uses Whirlpool (Whingepool, call it what you will), posting updates there was a master stroke of information dissemination.
    BTW - I'm not an Internode customer, but the attention to their customers (rather than the media) they've shown here is the sort of act that attracts.
    Cheers.
    anonymous
  • 'The sort of act that attracts'??

    What they are saying is that they relied on hardware redundancy. When that let them down, they could NOT perform a cold restore.

    Redundancy does not replace backups and restores.
    anonymous
  • Refreshingly Honest

    As anyone who views or contributes to the Whirlpool forums knows, Simon responds to most Internode posts and is open and honest, and displays respect for his customers by always providing a clear, non-corporate view from his side on issues, e.g. ABC iView. You will not get such openness from any other ISP.
    Of course Simon is sorry for what happened but it was out of his control so no apology is wanted by me or most of his reasonable customers.
    anonymous
  • What do they use?

    I'm very interested in what redundancy and DR solutions they tried to employ (without success). Note that I'm not blaming the solution itself, as most IT problems are caused by misconfiguration, lack of training and other "people" issues. However, some DR solutions keep that in mind by being able to perform virtual "fire drills" to test whether or not the DR site has consistent and usable data (e.g. Veritas Cluster Server and some other state-of-the-art DR software).

    And yes, I agree, part of a DR strategy *needs* to be traditional backup/restore methodology.
    anonymous
  • RTFA

    They did restore, then the restored system immediately started experiencing problems.

    If you're relying on an ISP-provided e-mail account (@internode.on.net in this case) for your business, you can't value your e-mail very highly. Stop grandstanding.
    ISP-provided e-mail is a no-extra charge service aimed at home users. If e-mail is important to your business get your own domain, host it on a small business server or a reputable outsourced e-mail provider.

    E-mail is not an instant communication medium. If you're in a hurry, use the phone. The outage was over the weekend, replying to e-mail on Monday is (in most cases) going to be perfectly acceptable.

    My Internet access was uninterrupted during this outage, and I'm very happy with Internode (so far, switched a few months back), and my e-mail was available (and obviously I'm not using Internode's mail).
    Don't overstate this problem's impact.

    I have to applaud Internode for their customer service and Simon for his openness and honesty. More ISPs (and other companies) should do the same, instead of bullshitting us.
    Comments like some of the earlier posters here have made only encourage the corporate PR nonsense instead of honesty.
    anonymous
  • No apology?

    Simon (or one of his clones) may respond to many posts, but they are not open or honest by any means. They are defensive of the company, pull on emotions (poor little ISP), are offensive towards regulation and the competitors, and are political most of the time.

    As to an apology not being needed, is that because it happened on a Friday night and impacted businesses and individual users less than if it had happened on a Monday morning?
    anonymous