RBS gives more detail on IT failure train wreck

Summary: As RBS releases more details about its massive IT failure, we are still left asking how it happened.

The Royal Bank of Scotland's (RBS) CEO, Stephen Hester, provided more information about the ongoing IT failure that hit customers of RBS, NatWest, and Ulster Bank. The bank provided this detail in response to a request from the chairman of the UK Treasury Select Committee.

The RBS IT failures train wreck
Photo credit: Michael Krigsman

Also Read:
RBS Bank joins the IT failures 'Hall of Shame'
Key questions on the massive RBS / NatWest IT failure

Some key points from the RBS statement:

Incident Background

Each evening, the bank processes the day’s transactions across all our businesses.  This is a highly complex and large-scale operation - on an average day we process around 20 million transactions. Due to the scale and complexity of the task the transactions are processed in batches through highly automated systems. In normal conditions this overnight batch processing completes before business resumes the following day.

The Incident

The initial reviews we have carried out indicate that the problem was created when maintenance on systems, which are managed and operated by our team in Edinburgh, caused an error in our batch scheduler. This error caused the automated batch processing to fail on the night of Tuesday 19 June. The knock-on effects were substantial and required significant manual interventions from our team, compounded because the team could not access the record of transactions that had been processed up to the point of failure.  The need to first establish at what point processing had stopped delayed subsequent batches and created a substantial backlog. It is not clear at this stage why that record was not available. Consequently, a significant number of customer account balances did not update as they should have from Thursday 21 June.

Progress to Date

Although the problem was rectified promptly, we were faced with a processing backlog which had to be cleared before we could begin to return the systems to normal. In order to be able to recommence automated batch processing and to move towards a recovered state, the batches had to be brought back into sequence.

The bank's comments are troubling at best. It appears that someone in the batch processing department of IT made a mistake and broke the batch scheduler. (It's important to note that a software bug or hardware failure could have caused the problem, although this is less likely than human error.)

After operators found the initial problem, they tried to fix it but could not access the transaction record, making the detective work needed to find the problem nearly impossible. By the time the problem was isolated, the backlog was enormous, which created the long delay in getting operations back to normal.
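
To make the role of that missing transaction record concrete, here is a minimal Python sketch of a batch runner that writes a journal of completed batches, so that a restart can pick up exactly where processing stopped. This is an illustration only, not a description of the RBS systems: the function names (run_batches, load_journal, save_journal), the JSON journal file, and the process callback are all hypothetical.

```python
import json
import os


def load_journal(journal_path):
    """Return the list of batch ids already completed (empty if no journal yet)."""
    if not os.path.exists(journal_path):
        return []
    with open(journal_path) as f:
        return json.load(f)


def save_journal(journal_path, done):
    """Write the journal to a temp file, then rename it into place.

    The rename means a crash mid-write cannot leave a half-written journal.
    """
    tmp_path = journal_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(done, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, journal_path)


def run_batches(batches, journal_path, process):
    """Run batches in order, recording each completed batch in the journal.

    `batches` is a list of (batch_id, transactions) pairs and `process` is a
    hypothetical callback that applies one batch. On restart, batches already
    listed in the journal are skipped.
    """
    done = load_journal(journal_path)
    for batch_id, transactions in batches:
        if batch_id in done:
            continue  # completed before the failure, so do not reapply it
        process(batch_id, transactions)
        done.append(batch_id)
        save_journal(journal_path, done)
    return done
```

If process raises partway through the night, the journal still lists every batch that finished, and rerunning run_batches with the same journal resumes at the first unfinished batch. Without such a record, operators must first work out by hand where processing stopped, which is the situation the RBS statement describes.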

A STRATEGIC PERSPECTIVE

Information technology is an easy target for budget cuts and often suffers from low status and negativity. To some extent, IT's poor reputation is justified because so many IT organizations do a poor job at project execution and delivery.

At the same time, situations like this remind us that virtually every modern organization depends on IT for core operations. Companies that undercut and neglect IT are playing a dangerous fool's game, putting themselves and their customers at risk.

If you work in IT, the solution requires two steps. First, deliver operational excellence; in other words, make sure to complete your projects on time and within budget. Second, learn what the business needs and give it to them. In IT, job security rests on the foundation of these two points.

With IT such an easy target, it's easy to forget that corporate policies create the environment in which technologists carry out their work. I wonder whether the problems at RBS can ultimately be traced to layoffs, budget cuts, and a general lack of interest in IT as a valuable function.

Topics: CXO, Banking, Enterprise Software

Talkback

4 comments
  • Most likely the ugly head of GREED showed its true nature.

    It looks that way because the obscene bonuses and CEO salaries seem to be more important than a proper "Plan B" when something does not go as planned.

    Yeah - yeah. Save on back-up plans and give more and more ridiculous, insane bonuses and salaries to the "fat-cats".
    Greed is really taking its toll.
    hkommedal
  • This was indeed an unfortunate occurrence...

    As mentioned in the article, IT programs tend to be an easy target for budget cuts and lack of status within organizations. However, in today’s world – where businesses are increasingly reliant on IT and IT services to gain and retain a competitive edge in the market – this shouldn’t be the case.

    Serious IT failures have the potential to cause catastrophic and long-lasting effects on the profitability and image of a business. By taking the appropriate steps to ensure your business has a proper IT Service Management solution in place, you are ensuring higher levels of consistency, productivity and reduced downtime.

    Axios covers this topic as it relates to the RBS incident at length in our latest blog: http://blog.axiossystems.com/?p=56.
    Markos Symeonides_Axios Systems
  • Testing this site's options

    Testing to see if my user name shows up
    TBNCBSI
  • I worked for a Bank.

    They imposed a lot of restrictions and security checks on every transaction, such as passing through a legacy COBOL system and so on. However, the whole security procedure was nonsense because I was contracted as an outsourced worker, I never signed anything, and my direct employer hardly knew anything about me. BTW, the work was sh*t: older computers (they still use CRT monitors!!), older systems, crappy code, and unmotivated workers.

    Also, I worked for a company a couple of years ago, and I put a backdoor in one of its systems (for testing purposes). This system was audited several times and nobody found it. For the record, the backdoor is still up and running.
    magallanes