About that London Stock Exchange IT failure

Summary: The obvious lesson from the contrast between Microsoft's eagerness to brag about the performance and reliability of the system installed at the London Stock Exchange and its actual performance and unreliability is that pride goeth before a fall - but the deeper lesson is that top management didn't do its job and should be held accountable.

It's the third one in a year and worse even than last year's November 8th failure.

That said, here's a "reprint" of my blog for November 21, 2006 - followed by a few new comments.

---

Another Microsoft anti-Linux case study

As most people know, Microsoft has an anti-Linux program called "Get the Facts" featuring case studies arguing the Windows case. One of those, wearing the title "London Stock Exchange chooses Windows over Linux for reliability", arrived in my email last week.

Here's the summary quotation attributed to the customer, LSE CIO David Lester:

"No other exchange is undertaking such an ambitious technology refresh programme based on next-generation Microsoft technology. We've always provided a first-class service, but now we can claim to be the fastest in the world as well."

Take a careful look at the actual wording: "No other exchange is undertaking.." and, "now we can claim to be the fastest in the world." (Emphasis added.)

The Tandem system this replaced was installed in 1995 and had earned its non-stop tradename with zero downtime over the last six operating years, but it now belongs to HP and is therefore going away. In response, LSE CIO David Lester developed a plan - one structured around a partnership with Microsoft:

Before choosing Microsoft technology, the London Stock Exchange reviewed several potential architectures to meet the requirements of Infolect® and the TRM design objectives. The Microsoft .NET Framework -an integral component of the Windows Server® 2003 operating system- was selected for a number of reasons, including developer efficiency, performance, and scalability. The Infolect® application, which went into production in September 2005, was implemented on a total of 120 HP ProLiant servers across multiple data centres. This configuration allows Infolect to process an average of 15 million real-time messages a day distributed to more than 107,000 trading screens in more than 100 countries.

120 HP ProLiant servers sounds like a lot - but then so does 15 million if you're thinking in terms of personal dollars or weeds to pull in your garden. Unfortunately neither number is as impressive as it sounds: 15 million messages per day amounts to something between about 520 messages per second if generation occurs only during an eight-hour trading period, and about 175 if you average across 24 hours to allow for electronic trading. Either way, that's easily within scope for a small Unix server like a four-way Opteron or T2000 - remember, this stuff ran on an old Tandem before those 120 ProLiants were brought in.
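For anyone who wants to check that arithmetic, here's a minimal Python sketch. It assumes messages arrive evenly across the period, which real market data does not do (bursts at the open and close will run well above the average). The per-server figure simply divides the load across all 120 ProLiants, which is an illustrative assumption and not something the case study claims. The function and variable names are hypothetical.

```python
# Back-of-the-envelope average message rates for the Infolect figures.
# Assumption: messages are spread evenly over the active period.

MESSAGES_PER_DAY = 15_000_000  # daily average quoted in the case study
SERVER_COUNT = 120             # HP ProLiant servers quoted in the case study


def messages_per_second(messages_per_day: float, active_hours: float) -> float:
    """Average rate if the daily volume arrives evenly over active_hours."""
    return messages_per_day / (active_hours * 3600)


# Eight-hour trading day versus round-the-clock electronic trading.
trading_day_rate = messages_per_second(MESSAGES_PER_DAY, 8)
full_day_rate = messages_per_second(MESSAGES_PER_DAY, 24)

# Illustrative only: assumes every server shares the load equally.
per_server_rate = trading_day_rate / SERVER_COUNT

print(f"8-hour day:  {trading_day_rate:.0f} messages/second")
print(f"24-hour day: {full_day_rate:.0f} messages/second")
print(f"Per server (8-hour day): {per_server_rate:.1f} messages/second")
```

Even on the busier eight-hour assumption, the average works out to only a handful of messages per second per box.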

But at least they can claim it's fast, right? Here's their headline:

London Stock Exchange Cuts Information Dissemination Time from 30 to 2 Milliseconds

Two milliseconds isn't much time - in fact it's barely communications latency for a PC NIC - and 30 ms is pretty fast for the old gear, considering that the system was first developed and implemented before the Pentium hit 100MHz.

If you look carefully at the wording, especially as repeated in the excerpt below, you'll see how this is achieved: because they say only that the information is "distributed to more than 107,000 trading screens in more than 100 countries", not that their system actually does it:

Reliability is fundamental to the London Stock Exchange value proposition for the market and continues to give its senior managers peace of mind about system uptime. There are approximately 300 customers who connect directly to the live Infolect system to receive real-time market data directly from the London Stock Exchange. The data disseminated from Infolect is then displayed on more than 107,000 terminals in more than 100 countries.

In other words, we're entitled to assume that the 2ms number represents something like a packet delivery time for bulk flows over a local area network - and not only do those "107,000 screens in more than 100 countries" have nothing at all to do with the 2ms claim, but, because they're attached to networks run by the 300 or so big customers with servers on that LAN, it's very doubtful that their users would have experienced any change at all.

All of which should have you wondering what Linux has to do with any of this - Microsoft's headline, you'll recall, said that the LSE picked Windows over Linux for reliability.

The answer is that Linux has nothing to do with any of this: Microsoft simply hung an anti-Linux label on a very carefully worded story about a pair of committed Microsoft partners, HP and Accenture, getting together with Microsoft to sell rather simple technology to a willing customer - and neither Linux nor Solaris is mentioned anywhere in the text.

---

So now the chickens are coming home and the question is, why? Are Microsoft's dot.net technologies so inherently unreliable it's simply absurd to expect them to work when volume changes dramatically and performance pressure mounts, or is there something deeper going on?

My vote goes for a combination of both: second-rate technology combining with a problem obvious in both the decision process and Microsoft's decision to brag about this install on its anti-Linux site. Specifically, the problem is one of incentives: what incentive did any of the power players involved have to get either the decision or the implementation right?

Before the sale, incentives for Accenture, HP, and Microsoft were aligned with selling a Windows project - not with actually achieving both the high reliability and the high performance the customer seems to have expected. And, after the sale, the incentives align more with keeping costs down while getting sign-offs than with meeting any promises made about reliability or performance.

What I'm reminded of in this context is the sad story of the frog who believed a scorpion's promise of unscorpion-like behavior and died for his naivete when the scorpion did what scorpions do - what I think, in other words, is that primary responsibility for the LSE mess belongs to the top LSE managers who let their CIO get the LSE into bed with Microsoft and its partners.

Basically it's top management's job to set the right performance incentives in place, to understand how existing incentives are likely to work out, and to take immediate corrective action when people who report to them start to respond to career incentives that don't align with the organization's welfare - and thus the single most important driver for these recent failures wasn't poor technology but the simple fact that LSE top management didn't do its job.


Talkback

88 comments
  • You do know, right?

    You do know that the outage was caused by a [b]network[/b] problem, and had nothing to do with MS technology, right?

    http://www.computerworlduk.com/management/infrastructure/networks/news/index.cfm?newsid=10912

    And about those performance rantings of yours:

    http://www.londonstockexchange.com/NR/exeres/5CA1EB52-2B47-4922-A6F1-E9EC32D0152B.htm

    http://www.londonstockexchange.com/NR/exeres/A1B303DD-86EE-4D84-9B4F-BADFA1C770C7.htm

    Seriously, is there [b]any[/b] truth in any of your rantings? Don't you get tired of being wrong all the time in your clueless anti-MS rantings?

    I am curious, I would love to hear from you how Linux would have kept on working when the [b]network[/b] goes down. Should be interesting to hear you explain how that would work...
    Qbt
    • RE: You dunno right?

      [i]"You do know that the outage was caused by a [b]network[/b] problem, and had nothing to do with MS technology, right?"[/i]

      [i]"'Two particular [b]software[/b] activities' had taken place, causing the problem, the [url=http://www.computerworlduk.com/management/infrastructure/networks/news/index.cfm?newsid=10912]LSE admitted[/url]. But it categorically refused to disclose any more information."[/i]

      ^o^

      n0neXn0ne
      • Article clearly states problem was not MS related

        I'm not sure what you are trying to imply, but the following quotes from [url=http://tinyurl.com/3s4tbm]the ComputerworldUK article[/url] clearly indicate that the problem is not a MS technology problem.

        [i]But the LSE said it would not point the finger of blame at Accenture, which built its crucial TradElect platform. [b]It said TradElect was not at the source of the problem.
        .
        .
        .

        An LSE spokesperson told Computerworld UK the problem was related to network software, and not the LSE's key TradElect platform.[/b]
        .
        .
        .

        The problems are "thought to have occurred" on the trading gateway between the LSE's Extranex private network (linking the exchange and clients) and the TradElect electronic trading platform, the Financial Times said. [/i]
        P. Douglas
        • He's never let facts get in the way of his MS bashing. (nt)

          .
          ye
        • I wonder

          if that gateway software was relying on .NOT ...
          Roger Ramjet
          • Worth a try, but...

            Just can't let it go, eh?

            The MS FUDsters thought they finally had a good anti-MS story, only to be exposed as people who would rather keep peddling false information to further their little anti-MS jihads. Your opinions keep meaning less and less in the real world. They should, since if you can't even stick to the basic facts then your opinions don't have a place on any "tech" blog.

            Seriously, you people are truly pathetic.
            Qbt
          • Even if it was an MS failing what would it prove?

            That Microsoft is not perfect? We already know that. So what would be the point anyway?
            ye
          • LOL

            The only reason why MS technology is "not perfect" is because of FUD stories like this one Murph is peddling. Yes no software is truly "perfect" but to try and keep spreading misinformation just to make your competition look "less perfect" is just, well, disgusting.

            So basically what you are saying is that any anti-MS FUD should go unchallenged because you have a low opinion of MS and therefore it is OK? Talk about ethics...

            Simply amazing...
            Qbt
          • @Qbt: No.

            [i]So basically what you are saying is that any anti-MS FUD should go unchallenged because you have a low opinion of MS and therefore it is OK? Talk about ethics...[/i]

            If you read long enough you'll see I challenge a lot of FUD about Microsoft. So much so I've been branded a shill and a troll.

            My point was that even if it was a failure on the part of MS all it proves is that MS isn't perfect. Which is what we already know. Therefore the anti-MS crusade would have what point?
            ye
          • RE: Worth a good try, but...

            [i]"[b][url=http://www.microsoft.com/canada/getthefacts/default.mspx]The MS FUDbusters[/url][/b] thought they finally had a good anti-MS story, only to be exposed as people who would rather [url=http://www.microsoft.com/windowsserver/compare/default.mspx][b]keep peddling false information[/b][/url] to further their little anti-MS jihads."[/i]

            ^o^

            n0neXn0ne
      • You do know that if it was Linux running the exchange

        Both you and Murph would have mentioned at least 42 times in the article that it was network hardware related, and the word Linux would have never come up except to point out the fact that it wasn't Linux's fault.


        Get a grip why don't you?
        AllKnowingAllSeeing
  • Burn. (nt)

    .
    silent.griffin
  • RE: About that London Stock Exchange IT failure

    Subterfuge, dude. The whole network story is a distraction from the real problem. The network story was thoroughly debunked at Slashdot.
    epitax
    • Which implies it is accurate.

      [i]The network story was thoroughly debunked at Slashdot.[/i]

      Slashdot is one of the biggest anti-Microsoft sites in existence. Any conclusion reached there on Microsoft being at fault is a given.
      ye
      • Where's a link to a story confirming the root cause?

        nt
        D T Schmitz
        • RE: ... link to a story confirming the root cause?

          [i]"LSE representatives are saying is that "[url=http://www.computerworlduk.com/management/it-business/it-organisation/news/index.cfm?newsid=10947]there was a combination of software activities that coincided[/url]", ... "[/i]

          ^o^

          n0neXn0ne
        • I am not aware of one at the moment. Which is why I think it's too...

          ...early to be pointing fingers at anyone.
          ye
          • RE: I am not aware of one ..., O yeah ?

            Maybe because you don't want to be [i]aware of one[/i]?
            Because you are not [i]aware of one[/i], that doesn't change facts.

            Hint; Try Google *not* Live Search.

            ^o^ ]:)
            n0neXn0ne
          • I don't see one. But if you've got one post a link.

            I'm happy to look it over.
            ye
      • RE: It is accurate & it is Accenture

        [i]"[url=http://www.computerworlduk.com/management/infrastructure/networks/news/index.cfm?newsid=10912]Accenture would not comment[/url], and referred all queries to the LSE."[/i]

        ^o^

        n0neXn0ne