IT sizing error delays Boston commuters

Summary: The automated machines that dispense subway tickets in Boston went down for two hours during a recent rush hour. This situation offers an example of IT failure in an unusual setting.

TOPICS: CXO, Hardware

According to the Boston Globe:

The breakdown affected Charlie Card fare-dispensing machines that take credit and debit cards. Between 7:40 and 9:24 a.m., the machines "started experiencing intermittent disruptions of service" that typically lasted between five and seven minutes, but worked fine after that, T spokesman Joe Pesaturo said.

"[Scheidt & Bachmann, the manufacturer, is] going to come back to us with a corrective action plan," Pesaturo said, which could include beefing up computer processing power to handle a crush of monthly pass and fare sales like yesterday's.

THE PROJECT FAILURES ANALYSIS

Automated fare systems are designed to help passengers quickly purchase tickets with minimum delay. This user simplicity is matched by back-end complexity, as shown below in the manufacturer's system diagram:

[Figure: manufacturer's system diagram (image not available)]

IT deployments of this type require proper sizing to determine hardware requirements, based on anticipated transaction levels, system requirements, and so on. If the system isn't properly sized, outages can occur during times of peak demand.
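To make the sizing idea concrete, here is a minimal sketch of the arithmetic involved. It is not the manufacturer's method; the function name, the transaction rates, the service time, and the 60% utilization target are all hypothetical values chosen for illustration. The point is only that capacity should be derived from the peak load, not the average.

```python
import math


def required_workers(peak_tx_per_sec: float,
                     service_time_sec: float,
                     target_utilization: float = 0.6) -> int:
    """Estimate how many parallel processing units are needed at peak.

    Offered load = arrival rate x time each transaction occupies a worker.
    Dividing by a target utilization below 1.0 leaves headroom for bursts.
    All inputs are assumptions supplied by the caller.
    """
    if peak_tx_per_sec < 0 or service_time_sec < 0:
        raise ValueError("rates and service times must be non-negative")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    offered_load = peak_tx_per_sec * service_time_sec
    return max(1, math.ceil(offered_load / target_utilization))


# Hypothetical figures: an ordinary rush hour versus a first-of-month rush,
# when monthly pass sales pile on top of normal fare purchases.
ordinary_rush = required_workers(peak_tx_per_sec=5, service_time_sec=0.4)
first_of_month = required_workers(peak_tx_per_sec=20, service_time_sec=0.4)

print(ordinary_rush, first_of_month)
```

With these made-up numbers, a system sized for the ordinary rush would have well under half the capacity needed on the first of the month, which is the kind of gap that shows up as intermittent outages under load.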

Given the manufacturer's proposal to add processing power, it seems likely the hardware was not adequately sized during the initial deployment. Since Boston subway ridership levels are carefully tracked, and the backend system requirements are well known to the manufacturer, this failure should not have occurred.

From my perspective, there are two possible explanations:

  • Ridership spiked beyond any reasonable expectations, causing hardware overload. Given the available historical data, this scenario seems unlikely.
  • The implementation was not planned or managed properly. I must conclude this is the likely cause of the failure.

How and why a technical exercise like sizing could be so screwed up is hard to say. Perhaps Boston wanted to save money on hardware, hoping for the best. Regardless of the reasons, we can definitely say this is another government IT project gone awry.


Talkback

9 comments
  • Maybe there should be a backup

    I still remember my college days when the computerized cash registers at the department store where I worked went down every once in a while, bringing sales to a halt, sometimes for an hour or more.
    John L. Ries
  • RE: IT sizing error delays Boston commuters

    With LANs, WANs, GSM, and everything on the strategic level probably being external to the system, you automatically jump to the conclusion that it was an application server sizing problem?
    Vesicant
    • Processor power is cited

      Insufficient processing power sounds like sizing to me. I suspect they didn't fully account for start of month loads, and this month was especially bad for some reason.
      mkrigsman@...
      • Not exactly...

        I didn't see processing power being cited as a problem, merely adding processing power being offered as a potential fix.

        So there could be more fundamental flaws in the system, and they are using additional processing power as a cheap band-aid.

        In fact, it wouldn't surprise me. I've seen a number of systems that would seize up under certain conditions and then magically fix themselves minutes or hours later. I've never seen lack of CPU power be the cause.

        Usually it is some sort of resource leak. For example, one off-the-shelf web application we have won't, in some situations, free resources until a session times out or the user explicitly exits certain functions. If the user closes his browser window, the resources are held until they are automatically cleaned up on expiration. If the user repeats this process several times, it can bring down the application.

        Adding more servers would raise the bar where the problem occurs, but the problem has nothing to do with a lack of processing power. It is simply poorly constructed software.
        Erik Engbrecht
        • Good points

          Thanks for the alternative view. Regardless of the underlying reason, there is clearly a problem with the system. Wish more information was available to know for sure.
          mkrigsman@...
  • Maybe it was the Transit company's fault...

    I'll bet there was someone in the front office who looked at the processing power requirements and figured they could save money by getting some highly-discounted discontinued equipment, not as powerful as spec'd but "good enough."
    muzhik
  • Could have been an upgrade gone bad.

    They have been using this system for a while, so why would you immediately think it's a sizing issue?

    This sounds more like an upgrade failure of some kind.
    Been_Done_Before
  • Being a User of the System

    I've had problems with the kiosks during off-peak times. The credit/debit-only machines can't do credit card Txs (the only option that shows up on the screen is for cash, when the machine has no cash facility).

    And at least one of the electronic turnstiles is out of order. And this system is about a year old.

    There are more problems with the system than meet the eye here.
    elizab
    • Often the case

      This is an excellent point. I suspect you are absolutely correct, that this is only the tip of the iceberg.
      mkrigsman@...