Key questions on the massive RBS / NatWest IT failure

Summary: Despite its massive IT failure, RBS has not released sufficient detail and many questions remain.

TOPICS: CXO, Banking

The recent situation at UK banking giant, Royal Bank of Scotland (RBS), will certainly go down in history as one of the most disruptive IT failures of all time. The massive impact of this failure continues to be felt by banking customers a week after the computer disruption first interfered with their gaining access to funds. Unfortunately, RBS has not released sufficient detail and many questions remain.

Also read:
ZDNet: RBS Bank joins the IT failures 'Hall of Shame'
ZDNet: RBS gives more detail on IT failure train wreck
The Guardian: How NatWest's IT meltdown developed

Although RBS (and its operating units NatWest and Ulster Bank) has revealed little information about what caused the problems, new details have emerged in London newspaper The Guardian. Reports indicate that the failure occurred when RBS computer operators tried to upgrade the bank's workload automation system, which is based on a product called CA-7 from CA Technologies.


Image credit: iStockphoto

The upgrade initially seemed to work as expected, but within hours other "guardian systems" discovered anomalies in batch jobs following the upgrade. Although technicians performed the failed upgrade last Tuesday night, they were not able to complete a successful batch run until Friday. By the time operators finished a successful run, millions upon millions of customer transactions were waiting to be processed. Customers will continue to experience problems until the bank works through this massive backlog of transactions.

WHAT'S GOING ON?

We don't know, and that's the problem. RBS has released only sketchy details of what caused the problem and why it took so long to resolve. To understand the full scope of the event and its aftermath, RBS must answer questions like these:

Upgrade Questions


  • How detailed are the procedures governing patches and upgrades to production systems?
  • Did the operator follow these procedures or deviate at all?
  • To what extent was this upgrade tested in a production-style environment?
  • Who installed the upgrade?
  • How much CA-7 experience did the upgrade installer possess?
  • How much RBS-specific experience did the installer possess?

Software Recovery Questions


  • How did the bank first learn of the problem?
  • What procedures did the operators use to solve the problem initially?
  • Who performed these problem resolution procedures?
  • What was their level of experience with the software, RBS processes, and with the broader RBS technology architecture and environment?
  • Why did the rollback procedure take so long?
  • How did the team finally isolate and solve the problem?

Business Policy Questions


  • Did layoffs and outsourcing contribute to reducing the bank's knowledge and experience with its own systems?
  • Why did recovery take several days?
  • What are the bank's business continuity plans and how often are they tested?
  • Why did this event happen in the first place? There is always a cause, so what was it?

THE BOTTOM LINE - DON'T BLAME THE CIO

In a case like this, it's tempting to blame the CIO; after all, he or she is responsible for systems. However, it is not the CIO's fault if bank policies dictate layoffs and offshoring that result in lost skills. These are management issues with consequences across the organization, including in IT.

Also read:
Who's accountable for IT failure? (part one)
Who's accountable for IT failure? (part two)

Government regulators must question the judgment and wisdom of decisions made by the bank Board and CEO, to uncover policies that created an environment in which critical IT tasks could roll out without complete testing, verification, and clear paths to recovery.

Update 6/27/12: According to The Register (a sensationalist tech news site), an inexperienced computer operator in India caused the RBS failure. Apparently, a routine problem arose during the upgrade procedure, which is usually not a serious issue because administrators can roll back to a previous and stable version of the software. In this case, however, it seems the operator erroneously cleared the entire transaction job queue, kicking off a long and difficult process of reconstruction. The article adds: "A complicated legacy mainframe system at RBS and a team inexperienced in its quirks made the problem harder to fix".
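
To make the difference between those two recovery paths concrete, here is a minimal Python sketch. It is a toy model only: the `JobScheduler` class, its method names, and the job names are all hypothetical, and nothing here reflects how CA-7 or the RBS systems actually work. It assumes a scheduler that saves its state before an upgrade, so that a rollback restores the pending jobs, while clearing the queue leaves nothing to restore.

```python
from copy import deepcopy


class JobScheduler:
    """Toy stand-in for a batch scheduler; not CA-7 or any real product."""

    def __init__(self, version, queue):
        self.version = version
        self.queue = list(queue)  # pending batch jobs, in run order
        self._snapshot = None     # saved state, set only during an upgrade

    def begin_upgrade(self, new_version):
        # Save the version and a copy of the queue before changing anything.
        self._snapshot = (self.version, deepcopy(self.queue))
        self.version = new_version

    def roll_back(self):
        # Restore the saved state; the pending jobs survive intact.
        if self._snapshot is None:
            raise RuntimeError("nothing to roll back to")
        self.version, self.queue = self._snapshot
        self._snapshot = None

    def clear_queue(self):
        # Assumption for this sketch: the wipe is irreversible, so the
        # saved state is discarded along with the pending jobs.
        self.queue.clear()
        self._snapshot = None


def demo():
    jobs = ["post-payments", "update-balances", "send-statements"]

    # Path 1: the upgrade misbehaves and the operator rolls back.
    safe = JobScheduler("v1", jobs)
    safe.begin_upgrade("v2")
    safe.roll_back()

    # Path 2: the upgrade misbehaves and the operator clears the queue.
    wiped = JobScheduler("v1", jobs)
    wiped.begin_upgrade("v2")
    wiped.clear_queue()

    return safe, wiped
```

In this model, the first scheduler ends up back on its old version with every pending job still queued. The second has an empty queue and no saved state, so the jobs would have to be rebuilt from other records, which is the kind of reconstruction work The Register describes.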

Talkback

  • Great questions that I don't think will ever be answered...

    Enterprise systems and platforms as big as these are massively complex affairs with lots of moving parts and cross dependencies. Despite all efforts to test 100%, any CIO who has had to back out a major change (most of us) will appreciate just how difficult it can be to clear up the aftermath. Sometimes the path to reverse out the change is different to that of introducing the change, and data issues can be left behind.

    This combined with the fact that you can't shut the systems down for business continuity reasons during the recovery process is why it can take days to fully clean up.

    Should the CIO take the largest share of the blame? I believe so. Even if the business steer is to reduce cost, the CIO is accountable for choosing the vendors/partners, overseeing this, and advising the board of the associated risks.

    One thing is for sure... if it was a cost cutting exercise then years of savings are likely to be wiped out in one go...

    A huge lesson for RBS nonetheless
    Stuartlynn
  • The old Tie Brigade ain't what it used to be.

    Back in the days of ole, the old tie brigade could pat themselves on each other's backs, safe in the knowledge that johnny foreigner was more inept than them.

    Fast forward to today, and the CEO boys running anything British are friends of friends' fathers who get their peerages and knighthoods with a nudge and a wink, utterly without any merit for the job, let alone aptitude. They are being made a mockery of by johnny foreigner, who comes in and offers them shiny baubles for the British company they ran into the ground listening only to each other, and not to those of "lower class" than themselves, despite them having vastly superior intellects, knowledge and, dare I say it, less ego.

    The main failure, truth be told, is arrogance in the British Ruling Elite, still thinking they have little competition. When in fact, they have been utterly outed for the charlatans they have always been.

    It's worth bearing in mind that the previous CEO of this bank actually had his knighthood taken off him. Poor little mite, how badly that must have gone down at the polo club. Fred the Shred he was called.

    The "lower classes" were forced to bail out this bank, almost completely. He walked away with billions in bonuses, and millions in pensions.

    The new "krew", came in, decided they wanted to keep their bonuses, and so sacked the people who knew what they were doing.

    Arrogance. Ineptitude, and sheer criminal negligence.

    These "chaps" should be in prison, but that would ruin Christmas lunch with the Judges, and all the other cronies.
    Bozzer
  • Guardian article comments

    There's a comment on the Guardian article from someone who says they are a former NatWest employee.

    He says the batch systems are largely written in IBM Assembler. Now I acknowledge that these systems were not the cause of the problem (it was the scheduler configuration files) but even so I would say this points to a lack of investment in more modern technologies that would (hopefully) be more maintainable than the existing systems.
    jorwell
    • investment in more modern technologies...

      is not the point - testing them to be sure they match the reliability of old assembler or COBOL code is - and that takes decades.
      vgrig
      • As someone who used to program in COBOL

        I would say using an RDBMS properly (not just as a one for one substitute for ISAM) can bring you a long way forward.

        But I would agree there would be a huge amount of work involved in replacing the existing systems and also considerable risk.

        If someone were suggesting replacing the assembler and COBOL with Java then I would agree, they would never make it.
        jorwell
  • "The Register (a sensationalist tech news site)"

    Say what? I find The Register a lot less sensationalist, a lot more subjective and a lot more accurate than ZDNet (not to mention a lot more technical)...
    vgrig