Aussie Broadband has provided customers with a detailed report following an outage across its National Broadband Network (NBN) services over the weekend, explaining that it was due to an attack from a customer IP range targeting its VoIP infrastructure, which caused its firewalls to reboot.
In a post-incident report that included a detailed timeline of Saturday's outage, Aussie Broadband said that at 11.05am AEST its monitoring systems reported a large number of failed connections on its VoIP platform.
At 11.06am, its Cisco ASA primary firewall ran out of resources and rebooted, which led to the secondary Cisco ASA firewall entering an active state and taking on the load until it too ran out of resources and rebooted.
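The sequence described here, in which a standby unit takes over and then fails under the same load, can be illustrated with a small simulation. This is only a sketch: the `Firewall` class, the `fail_over` helper, and the capacity and load figures are hypothetical and do not reflect Aussie Broadband's configuration or how a Cisco ASA behaves internally.

```python
from dataclasses import dataclass


@dataclass
class Firewall:
    name: str
    capacity: int  # hypothetical: connection attempts per second the unit can absorb
    up: bool = True

    def handle(self, load: int) -> bool:
        """Return True if the unit survives the load; otherwise mark it down."""
        if load > self.capacity:
            self.up = False
        return self.up


def fail_over(units, load):
    """Offer the load to each unit in priority order.

    Returns the name of the first unit that survives, or None if every
    unit is overwhelmed (the cascading case).
    """
    for unit in units:
        if unit.handle(load):
            return unit.name
    return None


# Illustrative numbers only: both units have the same capacity, so a load
# that exhausts the primary exhausts the secondary as well.
pair = [Firewall("primary", 5_000), Firewall("secondary", 5_000)]
survivor = fail_over(pair, 8_000)
print(survivor)
```

In this toy model, an identically sized standby protects against a hardware fault in the primary but not against a load that exceeds either unit's capacity.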
"This caused a cascading failure of most of our systems including DHCP, DNS, Radius authentication, office phone system, mail, office network and many others," Aussie Broadband MD Phillip Britt explained on broadband enthusiast website Whirlpool.
CTO John Reisinger was then sent to one of Aussie Broadband's two Melbourne-based datacentres by 11.45am to begin diagnosing the fault alongside senior network engineer Peter Ansell, with Britt personally transporting a spare firewall there at 2pm.
"Peter and John began adding various IP ranges back onto the firewall service by service until it crashed again. It was determined that something was hitting the VoIP servers; over 10,000 connection attempts per second were being seen," Britt added.
"It was decided to sacrifice the VoIP servers for the moment and then bring back the rest of the services."
Aussie Broadband told ZDNet that the outage affected approximately half of all customers, with internet services being restored by 3.05pm while VoIP services remained offline until around 5.50pm.
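The service-by-service restoration Britt describes, with VoIP held back while everything else returned, can be sketched as a simple loop. The service names and the `is_stable` check below are hypothetical stand-ins; in the actual incident the check was engineers watching whether the firewall crashed again.

```python
def restore_services(services, is_stable):
    """Re-enable services one at a time.

    A service stays enabled only if the system remains stable with it
    switched on; otherwise it is held back and the loop carries on with
    the remaining services.
    """
    enabled, held_back = [], []
    for service in services:
        candidate = enabled + [service]
        if is_stable(candidate):
            enabled = candidate
        else:
            held_back.append(service)
    return enabled, held_back


# Hypothetical stability check: the firewall falls over whenever the
# VoIP range is exposed.
def is_stable(active):
    return "voip" not in active


services = ["dhcp", "dns", "radius", "voip", "mail"]
enabled, held_back = restore_services(services, is_stable)
print(enabled, held_back)
```

The approach trades speed for certainty: each step risks another crash, but it identifies the service drawing the traffic without needing working monitoring behind the failed firewall.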
According to Aussie Broadband, it has been spending its capex and attention on building out to all 121 points of interconnect for the NBN, resulting in the company being "caught with our pants down".
"The server components of the network and firewall systems were designed four years ago prior to our national expansion. Our focus has been on building network and connecting to POIs and the server infrastructure and its supporting systems have been on the backburner," Britt explained.
"Prior to this outage, work was already under way to move DHCP and Radius servers into separate failure domains but was not complete. This is why some customers stayed online and others didn't."
The provider added that it will take months to make the changes needed to prevent a similar issue from affecting its services, but that within 48 hours it will ensure the DHCP and Radius systems no longer rely on legacy server or firewall infrastructure.
"Medium term things will be separating the office network onto its own infrastructure so that any issue in the ISP part of the business can't affect our phones and internal IT equipment so that we can continue to support customers, this includes systems like being able to send outage notifications to customers etc," the company added.
"Finally, each discrete system that we operate will be separated so that no one system can take out another. This is going to require a significant piece of work to separate IP ranges used for our public-facing servers."
The provider also contacted customers via email to explain the outage.
"We're still conducting investigations, but it appears that our systems came under an attack from an external source, aimed at one of our customers. Normally we would be protected by our firewalls, but the attack was so large that our protection struggled to deal with the load," Britt explained in the email.
"We began working on Saturday night to significantly restructure components of our network, and we are working on ways to improve our outage communications if we ever experienced an internal systems failure again."
In July, Aussie Broadband took home the highest scores in the Australian Competition and Consumer Commission's (ACCC) second NBN speed-monitoring report.
According to the ACCC, Aussie Broadband delivered 89.1 percent of its maximum plan speeds overall and 88.3 percent during busy hours for downloads. It likewise scored highest on average upload speeds, providing 89.4 percent of its maximum plan speeds overall and 89.1 percent during busy hours.
Upload speeds during very busy hours remained fairly high, with Aussie Broadband providing 82.4 percent of maximum plan speeds.