Telstra outage review: Increase capacity, traffic protection across network

Telstra's COO has said the ongoing review into its three mobile outages has yielded several plans to increase and augment its signalling capacity, improve traffic management, and better manage any issues.

Recently embattled telecommunications carrier Telstra has provided an update on the network engineering review into its three national outages this year, with plans to increase capacity of its signalling channels, add extra traffic management protection, improve capacity for its home location register, and heighten its "awareness plan".

According to chief operating officer Kate McKenzie, Telstra is seeking the advice of both internal and external engineering experts, including from Cisco, Ericsson, and Juniper.

"Our initial review has confirmed the recent incidents were not related, although two of the disruptions were due to delays in processing the registration of mobile devices ... we are absolutely committed to getting to the bottom of these incidents, and are taking all of the necessary steps to minimise the risk of it happening again," McKenzie said at the CommsDay Summit in Sydney on Monday morning.

"We are well into a thorough review of the network. I am leading this review, and it involves our own specialist teams as well as external experts from around the world. We have already progressed short- and medium-term actions to improve resilience and robustness in the mobile network.

"Changes have been implemented to increase the capacity and path diversity of critical signalling channels, and a temporary layout of traffic management protection has been added to minimise the impact of events like the ones we saw on the 9th of February and the 17th of March.

"Within a few days, we expect to augment capacity in a key platform -- the home location register -- that manages our customers' subscription data. In conjunction with our global partners Ericsson, Cisco, and Juniper, we have assembled a team of internal and external engineering experts to do an end-to-end review of our network.

"While this work is under way, Telstra Operations has a heightened awareness plan, including executive-level review of any changes planned for our mobile and core IP networks. Our network is resilient, and we are determined to get the best advice from around the world to help ensure that it stays that way."

McKenzie added that the network "is now stable, and operating as it should".

Telstra customers have been subjected to three outages over the past six weeks: The first on February 9, which affected prepaid and post-paid mobile services; the second on March 17, which involved an hours-long national mobile data and voice outage; and the latest on March 22, which was a smaller voice outage.

The first outage took down mobile voice and data services for several hours, and was caused by "embarrassing human error". As compensation, the telco gave all customers a day of free unlimited data on February 14.

"Absolutely apologise right across our customer base, this is an embarrassing human error," the COO told journalists in February.

"It's not OK, we do not like causing that level of inconvenience to our customers, and we are working very quickly right now how we can provide some free data to our customers to make up for the inconvenience that's been caused to them today."

The human error occurred when the correct procedure was not followed after one of the telco's 10 mobile nodes was taken down.

"The outage was caused when one of our major mobile nodes went down," a Telstra spokesperson said.

"The network is configured to manage this; however, in this instance, we had issues transferring customers to other nodes, which caused congestion on the network for some customers.

"Services have now been restored, with the vast majority of our customers now back online. We thank customers for their patience and apologise for the inconvenience caused."

McKenzie added further detail: "We've got 10 of these nodes across the company ... basically, they are a piece of equipment that ... run in pools, so that enables us to be able to manage traffic and connections for both voice and data around the whole country across the pooled environments.

"So normally, we could take down three or four of those nodes and do work on them, fix them up, and it would have no impact, but on this occasion the correct procedure was unfortunately not followed, and the flow-on consequences you can see."

The second outage was also experienced across the country, with smartphones stuck on "SOS only" or "no service", unable to connect to either data or voice services. Telstra consequently began a "major" engineering review and offered another free data day on April 3.

"Our early findings show we had a problem that triggered a significant number of customers to be disconnected from the network, and as they were all automatically reconnecting at the same time, this caused congestion," Telstra CEO Andrew Penn said.

"While this is unrelated to a network outage last month, the congestion caused by people reconnecting to the network was similar.

"Following the last event we started a major process and engineering review of the network, which includes global network experts, to understand how it occurred. We will add the lessons learned from this incident to that review."

The third outage, which the telco repeatedly emphasised affected fewer than 3 percent of its customers, was due to an issue with a media gateway in Victoria that failed to connect calls.

"The disruption ... was caused by a card failure in a media gateway in Victoria, which meant certain calls could temporarily not get through. The media gateway allows the calls to connect," a Telstra spokesperson told ZDNet at the time.

"While small, we appreciate the impact this outage had on the customers affected, and we apologise to them."

The spokesperson added that Telstra had recommitted to its "major" network review.

"While we have the leading network in Australia, like any of our global peers there will always be issues that arise in such a large and complex technology environment," the spokesperson said.

"We are committed to redoubling our efforts on resilience in the network, and part of that is conducting a major review in relation to the outages from last week and February."

While some users complained over Twitter about the speed of the network during the free data day on Sunday, McKenzie said it was a success, beating the 1,841 terabytes of data downloaded on February's free data day by 46 percent, with 2,686 terabytes of data.

"Yesterday, we reached the peak network traffic level of the previous data day by 8am. By 4pm, we had already surpassed the record set in February. By the end of Sunday, our customers had downloaded 2,686 terabytes of data, which is three times the amount of data downloaded on a normal weekday," McKenzie said at CommsDay.

"We were very pleased with the way the network performed, given this tsunami of data. To ensure fairness for all customers accessing the network, traffic-balancing mechanisms tuned and optimised the mobile network during the day. Like any of our global peers, there will always be issues that arise in such a large and complex environment.

"We understand there is a heightened degree of interest in relation to our network elements at the moment. Social media is a very immediate if not occasionally over-enthusiastic indicator of network performance. But you only need to look at the results of our free data days to see what our network is capable of."

Meanwhile, Vodafone Australia last month attempted to take advantage of Telstra's outages by spruiking its own network and offerings.

"We know how important it is to stay connected, so if you're having trouble with your network, we invite you to come on over to Vodafone," said Vodafone director of Sales Ben McIntosh.

"We know 'free data' days are all the rage right now, which is why Vodafone offers two months of unlimited data to post-paid voice customers when they join or upgrade."

Kogan Mobile also leveraged the Telstra outages, offering 70 percent off deals. The company relaunched on the Vodafone network after its earlier service on the Telstra network ran into trouble when ISPOne blocked Kogan customers from using data.

Telstra also experienced an outage in its Australia-Singapore subsea cable in October last year.