RBS gives more detail on IT failure train wreck
Summary: As RBS releases more details about their massive IT failure, we are still left asking how it happened.
The Royal Bank of Scotland's (RBS) CEO, Stephen Hester, provided more information about the ongoing IT failure that hit customers of RBS, NatWest, and Ulster Bank. The bank provided this detail in response to a request from the chairman of the UK Treasury Select Committee.
Some key points from the RBS statement:
Incident Background
Each evening, the bank processes the day’s transactions across all our businesses. This is a highly complex and large-scale operation - on an average day we process around 20 million transactions. Due to the scale and complexity of the task the transactions are processed in batches through highly automated systems. In normal conditions this overnight batch processing completes before business resumes the following day.
The Incident
The initial reviews we have carried out indicate that the problem was created when maintenance on systems, which are managed and operated by our team in Edinburgh, caused an error in our batch scheduler. This error caused the automated batch processing to fail on the night of Tuesday 19 June. The knock-on effects were substantial and required significant manual interventions from our team, compounded because the team could not access the record of transactions that had been processed up to the point of failure. The need to first establish at what point processing had stopped delayed subsequent batches and created a substantial backlog. It is not clear at this stage why that record was not available. Consequently, a significant number of customer account balances did not update as they should have from Thursday 21 June.
Progress to Date
Although the problem was rectified promptly, we were faced with a processing backlog which had to be cleared before we could begin to return the systems to normal. In order to be able to recommence automated batch processing and to move towards a recovered state, the batches had to be brought back into sequence.
The bank's comments are troubling at best. It appears that someone in the batch processing department of IT made a mistake and screwed up the batch scheduler. (It's important to note that a software bug or hardware failure could have caused the problem, although this is less likely than human error.)
After operators found the initial problem, they tried to fix it but could not access the transaction record, making the detective work needed to find the problem nearly impossible. By the time the problem was isolated, the backlog was enormous, which created the long delay in getting operations back to normal.
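To make the role of that missing record concrete, here is a minimal sketch of a batch runner that journals each completed batch, so that a restart can skip work already done instead of reconstructing where processing stopped. It is an illustration only and says nothing about how the RBS scheduler actually works; the names run_batches, record_completed, load_journal, and the journal file are all hypothetical.

```python
import os


def load_journal(journal_path):
    """Return the batch IDs already recorded as complete, in order."""
    if not os.path.exists(journal_path):
        return []
    with open(journal_path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def record_completed(journal_path, batch_id):
    """Append a batch ID to the journal and force it to disk."""
    with open(journal_path, "a", encoding="utf-8") as f:
        f.write(batch_id + "\n")
        f.flush()
        os.fsync(f.fileno())


def run_batches(batches, journal_path, process):
    """Process batches in sequence, skipping any the journal marks as done.

    `batches` is an ordered list of (batch_id, transactions) pairs and
    `process` is a caller-supplied function (both hypothetical).
    """
    completed = set(load_journal(journal_path))
    for batch_id, transactions in batches:
        if batch_id in completed:
            continue  # finished before the failure; do not reprocess
        process(batch_id, transactions)
        # Only record the batch once processing has returned without error.
        record_completed(journal_path, batch_id)
```

If process raises partway through the run, the journal still lists every batch that finished, and calling run_batches again with the same journal resumes at the first unfinished batch. Without a readable journal, operators have to establish by hand where processing stopped, which is the situation the bank describes.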
A STRATEGIC PERSPECTIVE
Information technology is an easy target for budget cuts, low status, and negativity. To some extent, IT's poor reputation is justified because so many IT organizations do a poor job at project execution and delivery.
At the same time, situations like this remind us that virtually every modern organization depends on IT for core operations. Companies that undercut and neglect IT are playing a dangerous fool's game, putting themselves and their customers at risk.
If you work in IT, the solution requires two steps. First, deliver operational excellence; in other words, make sure to complete your projects on time and within budget. Second, learn what the business needs and give it to them. In IT, job security rests on the foundation of these two points.
With IT such an easy target, it's easy to forget that corporate policies create the environment in which technologists carry out their work. I wonder whether the problems at RBS can ultimately be traced to layoffs, budget cuts, and general disinterest in IT as a valuable function.
Talkback
Most likely the ugly head of GREED showed its true nature.
Yeah - yeah. Save on back-up plans and give more and more ridiculous, insane bonuses and salaries to the "fat-cats".
Greed is really taking its toll.
This was indeed an unfortunate occurrence...
Serious IT failures have the potential to cause catastrophic and long-lasting effects on the profitability and image of a business. By taking the appropriate steps to ensure your business has a proper IT Service Management solution in place, you are ensuring higher levels of consistency, productivity and reduced downtime.
Axios covers this topic as it relates to the RBS incident at length in our latest blog: http://blog.axiossystems.com/?p=56.
Testing this sites options
I worked for a Bank.
Also, I worked for a company a couple of years ago where I put a backdoor in a system (for testing purposes). This system was audited several times and nobody found it. For the record, the backdoor is still up and running.