Twitter: An IT failures management perspective

Summary: Twitter, the well-known social messaging service, has finally acknowledged the depth and severity of technical problems causing downtime and disruption to users. While such candor is refreshing, it also offers a glimpse into the kinds of management issues that underlie virtually all IT failures.

The end-user problem. Twitter has raised about $20 million of funding and garnered tremendous publicity, giving users the expectation that this high-profile company should offer consistent and reliable service. Despite Twitter's substantial resources, it has acknowledged the service is not sufficiently reliable. From the Twitter blog:

Twitter downtime

[This graph] should be flat.

We've gone through our various databases, caches, web servers, daemons, and despite some increased traffic activity across the board, all systems are running nominally. The truth is we're not sure what's happening. It seems to be occurring in-between these parts.

The technical problem. As is typical of many Web 2.0 companies, Twitter built its service to meet short-term objectives; in this case, that meant choosing an unsuitable technical architecture, as another post on the Twitter blog describes:

Twitter is, fundamentally, a messaging system. Twitter was not architected as a messaging system, however. For expediency's sake, Twitter was built with technologies and practices that are more appropriate to a content management system. [This has] introduced a great deal of complexity and unpredictability....This is, clearly, not optimal.

THE PROJECT FAILURES ANALYSIS

When technology fails, management oversight and skill usually determine the impact on a business and its users. Twitter acknowledged they "aren't sure" where the technical problems lie and admitted they built the basic architecture for "expediency" rather than suitability to task.

Lack of sufficient management experience and judgment ultimately created the difficult situation in which Twitter must rip and replace foundational technical components to resolve severe performance and reliability issues. These issues are substantial enough to threaten users' confidence in both the company and the Twitter service.

Management experience and judgment. I asked Steve Mann, social media strategist at SAP, for comment on the experience issue:

In customer-centric organizations one of the most critical factors which these enterprises focus on is the customer expectation of high availability. Now I don't know the Twitter management or technology teams so I won't presume to armchair Quarterback on their behalf but as both an outsider looking in and an avid Twitter user, the recent outages suggest a degree of inexperience. Now maybe they've done this but I would have expected the Twitter team to project out usage patterns at least six months ago and based on those projections, would have begun re-architecting their infrastructure in order to scale to meet anticipated volumes and the availability expectations we are seeing today.

Steve also blogged about Twitter's credibility in the face of poor reliability:

Once trust is blown it doesn't matter if Twitter fixes its availability issues in a week or a year. Its already lost the opportunity it currently has. One might say, its already lost that opportunity for good.

Zoliblog shares similar concerns about Twitter's management judgment:

On second thought, I am less forgiving. Twitter already raised $5M before this round, that should have allowed them to bring in expertise they clearly lack. If only their priorities were on fixing the service instead of chasing more money.

Legitimate technology challenges. In fairness, the challenges facing Twitter are substantial and push the limits of current technologies. Hueniverse put those issues in context:

The idea that building a large scale web application is trivial or a solved problem is simply ridiculous....The social web is creating demand for new scaling tools and technologies. Current databases and caching solutions are simply unable to handle a complex network of multiple relationship between objects.

Nonetheless, Twitter's rollout planning process is clearly flawed. In contrast, Facebook has demonstrated professionalism in large-scale rollout planning:

The secret for going from zero to seventy million users overnight is to avoid doing it all in one fell swoop. We chose to simulate the impact of many real users hitting many machines by means of a "dark launch" period in which Facebook pages would make connections to the chat servers, query for presence information and simulate message sends without a single UI element drawn on the page. With the "dark launch" bugs fixed, we hope that you enjoy Facebook Chat now that the UI lights have been turned on.

The same post says: "scalability has to be baked in from the start," a lesson that Twitter is learning slowly, painfully, and very much publicly. Facebook is a far larger and more mature organization than Twitter and you can see the difference in their respective approaches toward managing technology.
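
To make the "dark launch" idea in the Facebook quote concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Facebook's implementation: the names `ChatBackend`, `DarkLaunch`, `dark_fraction`, and `ui_enabled` are hypothetical and do not come from the quoted post. The point it shows is that real page loads drive presence queries and simulated message sends against the backend while no chat UI is rendered.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ChatBackend:
    """Hypothetical stand-in for a chat server; it only counts the load it receives."""
    presence_queries: int = 0
    messages_received: int = 0

    def query_presence(self, user_id: int) -> bool:
        self.presence_queries += 1
        return user_id % 2 == 0  # arbitrary fake answer, for illustration only

    def send_message(self, sender: int, recipient: int, text: str) -> None:
        self.messages_received += 1


@dataclass
class DarkLaunch:
    """Assumed rollout wrapper: exercises the backend without drawing any UI."""
    backend: ChatBackend
    dark_fraction: float      # share of page loads that exercise the backend
    ui_enabled: bool = False  # stays False until the "UI lights" are turned on
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def handle_page_load(self, user_id: int) -> str:
        """Return the chat HTML fragment; it is empty while the launch is dark."""
        if self.rng.random() < self.dark_fraction:
            # Same calls a visible chat widget would make, but with simulated content.
            self.backend.query_presence(user_id)
            self.backend.send_message(user_id, user_id + 1, "simulated")
        return "<div id='chat'></div>" if self.ui_enabled else ""


backend = ChatBackend()
launch = DarkLaunch(backend, dark_fraction=1.0)
pages = [launch.handle_page_load(uid) for uid in range(100)]
print(backend.presence_queries, backend.messages_received, set(pages))
```

In a sketch like this, `dark_fraction` would be raised gradually while the backend is monitored, and `ui_enabled` would be switched on only after the bugs found under simulated load are fixed.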

My take. Twitter is a great service and I love it when it works. In addition, the Twitter folks are friendly and accessible, so it feels somewhat mean-spirited to apply the usual IT failures expectations to them.

Twitter co-founder and Creative Director, Biz Stone, offered these comments by email:

I continue to be inspired by both Jack [Dorsey, CEO] and Ev [Williams, Chief Product Officer] and I'd argue that their talent and judgment is precisely what will navigate us through these growing pains and help us reach the vision of Twitter's future we all share. You'll be interested in knowing that we are actively seeking talented managers and we are spending significant resources on recruiting.

Despite the obvious good will, Twitter now has an $80 million valuation and provides a communications infrastructure upon which many people depend. From that perspective, users are completely justified in expecting a robust, reliable service with no explanations and no excuses.

Talkback

4 comments
  • Getting from Unknown Startup to Popular Service Provider

    Twitter's predicament is the story of every startup --- demonstrate the concept of the service is viable, which enables one to raise money to become a "real" company. As with all software startups (at least since the 80s), the problem is during the pre-startup phase, there is a need to quickly build something (almost prototype like) to demonstrate the validity of the concept and the viability of the business. One always thinks there will be time, once there is money, to revisit the architecture and redo the product.

    The problem is, that there is usually no time between validation of the product and business concept and having to have "industrial-strength" product / service, support, and the ability to run a business.

    Having said that, I am skeptical of management's plan to re-architect and deploy the replacement technology, component by component. This means that the new architecture must work in and of itself, as well as interface with the architecture of the current technology, even if the new architecture model is based on a completely different approach. Can you say "nightmare?"

    I think this strategy continues to show lack of experience by the Twitter team. I wish them the best of luck!
    elizab
    • They're in a tough predicament

      Current architecture doesn't work and won't scale; building a new architecture is always more time-consuming and expensive than anyone expects.

      Twitter is at risk of user defections during this critical time period. How they manage this interim period may ultimately determine the fate of the company and the $20M of invested funding.
      mkrigsman
  • RE: Twitter: An IT failures management perspective

    Very fair article and analysis. I know from personal
    experience how difficult it is to get a product out there.
    You think it works just fine and then the user touches it!

    I hope Twitter can fix itself. I'd hate to lose their best
    value - accessibility. Imagine if Microsoft or Google
    gobbled them up. Then it would lose its charm and I'd
    have to figure out where I'd go for the very little social
    networking I do.

    BTW, Facebook may work, but I find it tedious and pretty
    useless. It's so complex and full of spam-like applications
    that I can barely tolerate using it. It's just full of junk and
    e-trinkets. Useless. It may work, but utterly useless.
    007baf
  • RE: Twitter: An IT failures management perspective

    You know, I've been watching this whole Twitter thing for over a year now. I know about the technical problems and all of the brouhaha over Ruby, Rails, architecture, scalability, etc. But one question has never been answered -- what is their profitability model?

    How do they make money with this, assuming they can make it work? I have an account (znmeb) and nobody has asked me for money, I haven't seen any ads, etc. As far as I know, it's a free service.

    What's their business model?
    znmeb