The Royal Bank of Scotland's (RBS) CEO, Stephen Hester, provided more information about the ongoing IT failure that hit customers of RBS, NatWest, and Ulster Bank. The bank provided this detail in response to a request from the chairman of the UK Treasury Select Committee:
Each evening, the bank processes the day’s transactions across all our businesses. This is a highly complex and large-scale operation - on an average day we process around 20 million transactions. Due to the scale and complexity of the task the transactions are processed in batches through highly automated systems. In normal conditions this overnight batch processing completes before business resumes the following day.
The initial reviews we have carried out indicate that the problem was created when maintenance on systems, which are managed and operated by our team in Edinburgh, caused an error in our batch scheduler. This error caused the automated batch processing to fail on the night of Tuesday 19 June. The knock-on effects were substantial and required significant manual interventions from our team, compounded because the team could not access the record of transactions that had been processed up to the point of failure. The need to first establish at what point processing had stopped delayed subsequent batches and created a substantial backlog. It is not clear at this stage why that record was not available. Consequently, a significant number of customer account balances did not update as they should have from Thursday 21 June.
Progress to Date
Although the problem was rectified promptly, we were faced with a processing backlog which had to be cleared before we could begin to return the systems to normal. In order to be able to recommence automated batch processing and to move towards a recovered state, the batches had to be brought back into sequence.
The bank's comments are troubling at best. It appears that someone in the batch processing department of IT made a mistake and screwed up the batch scheduler. (It's important to note that a software bug or hardware failure could have caused the problem, although this is less likely than human error.)
After operators found the initial problem, they tried to fix it but could not access the transaction record, making the detective work needed to find the problem nearly impossible. By the time the problem was isolated, the backlog was enormous, which created the long delay in getting operations back to normal.
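To make the missing transaction record concrete, here is a minimal sketch in Python of a batch runner that keeps a durable journal of completed batches, so a restart can see where processing stopped. This is an illustration only, not a description of how RBS's scheduler works: the function names (load_completed, run_batches, demo), the JSON-lines journal file, and the sample batches are all hypothetical.

```python
import json
import os
import tempfile


def load_completed(journal_path):
    """Return the IDs of batches the journal records as complete, in order."""
    if not os.path.exists(journal_path):
        return []
    with open(journal_path, encoding="utf-8") as journal:
        return [json.loads(line)["batch_id"] for line in journal if line.strip()]


def run_batches(batches, journal_path, process):
    """Run batches in sequence, skipping any the journal says are done."""
    done = set(load_completed(journal_path))
    with open(journal_path, "a", encoding="utf-8") as journal:
        for batch_id, transactions in batches:
            if batch_id in done:
                continue
            # If process() raises, nothing is journaled for this batch.
            process(batch_id, transactions)
            entry = {"batch_id": batch_id, "count": len(transactions)}
            journal.write(json.dumps(entry) + "\n")
            # Force the entry to disk before moving to the next batch.
            journal.flush()
            os.fsync(journal.fileno())


def demo():
    """Simulate a failure on batch 3, then resume from the journal."""
    batches = [(1, ["t1", "t2"]), (2, ["t3"]), (3, ["t4", "t5"]), (4, ["t6"])]
    processed = []
    fail_once = {"armed": True}

    def process(batch_id, transactions):
        if batch_id == 3 and fail_once["armed"]:
            fail_once["armed"] = False
            raise RuntimeError("simulated scheduler failure")
        processed.append(batch_id)

    with tempfile.TemporaryDirectory() as tmp:
        journal_path = os.path.join(tmp, "journal.jsonl")
        try:
            run_batches(batches, journal_path, process)
        except RuntimeError:
            pass  # the overnight run stops here
        after_failure = load_completed(journal_path)
        run_batches(batches, journal_path, process)  # recovery run
        after_recovery = load_completed(journal_path)
    return after_failure, after_recovery, processed
```

In this sketch, the journal after the simulated failure lists batches 1 and 2, so the recovery run picks up at batch 3 without any detective work and without processing anything twice. Take the journal away and the operators are in the position the bank describes: they must first work out where processing stopped while new batches pile up behind them.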
A STRATEGIC PERSPECTIVE
Information technology is an easy target for budget cuts and negativity, and it often lacks status inside the organization. To some extent, IT's poor reputation is justified because so many IT organizations do a poor job at project execution and delivery.
At the same time, situations like this remind us that virtually every modern organization depends on IT for core operations. Companies that undercut and neglect IT are playing a dangerous fool's game, putting themselves and their customers at risk.
If you work in IT, the solution requires two steps. First, deliver operational excellence; in other words, make sure to complete your projects on time and within budget. Second, learn what the business needs and give it to them. In IT, job security rests on the foundation of these two points.
With IT such an easy target, it's easy to forget that corporate policies create the environment in which technologists carry out their work. I wonder whether the problems at RBS can ultimately be traced to layoffs, budget cuts, and a general lack of interest in IT as a valuable function.