Twitter, the well-known social messaging service, has finally acknowledged the depth and severity of technical problems causing downtime and disruption to users. While such candor is refreshing, it also offers a glimpse into the kinds of management issues that underlie virtually all IT failures.
The end-user problem. Twitter has raised about $20 million in funding and garnered tremendous publicity, leading users to expect consistent and reliable service from such a high-profile company. Despite its substantial resources, Twitter has acknowledged that the service is not sufficiently reliable. From the Twitter blog:
[This graph] should be flat.
We've gone through our various databases, caches, web servers, daemons, and despite some increased traffic activity across the board, all systems are running nominally. The truth is we're not sure what's happening. It seems to be occurring in-between these parts.
The technical problem. As is typical of many Web 2.0 companies, Twitter built its service to meet short-term objectives; in this case, that meant choosing an unsuitable technical architecture, as another post on the Twitter blog describes:
Twitter is, fundamentally, a messaging system. Twitter was not architected as a messaging system, however. For expediency's sake, Twitter was built with technologies and practices that are more appropriate to a content management system. [This has] introduced a great deal of complexity and unpredictability....This is, clearly, not optimal.
THE PROJECT FAILURES ANALYSIS
When technology fails, management oversight and skill usually determine the impact on a business and its users. Twitter acknowledged that it is "not sure" where the technical problems lie and admitted that it built the basic architecture for "expediency" rather than suitability to the task.
A lack of management experience and judgment ultimately created the difficult situation in which Twitter must rip and replace foundational technical components to resolve severe performance and reliability issues. These issues are substantial enough to threaten users' confidence in both the company and the Twitter service.
Management experience and judgment. I asked Steve Mann, social media strategist at SAP, for comment on the experience issue:
In customer-centric organizations one of the most critical factors which these enterprises focus on is the customer expectation of high availability. Now I don't know the Twitter management or technology teams so I won't presume to armchair Quarterback on their behalf but as both an outsider looking in and an avid Twitter user, the recent outages suggest a degree of inexperience. Now maybe they've done this but I would have expected the Twitter team to project out usage patterns at least six months ago and based on those projections, would have begun re-architecting their infrastructure in order to scale to meet anticipated volumes and the availability expectations we are seeing today.
Steve also blogged about Twitter's credibility in the face of poor reliability:
Once trust is blown it doesn't matter if Twitter fixes its availability issues in a week or a year. Its already lost the opportunity it currently has. One might say, its already lost that opportunity for good.
Zoliblog shares similar concerns about Twitter's management judgment:
On second thought, I am less forgiving. Twitter already raised $5M before this round, that should have allowed them to bring in expertise they clearly lack. If only their priorities were on fixing the service instead of chasing more money.
Legitimate technology challenges. In fairness, the challenges facing Twitter are substantial and push the limits of current technologies. Hueniverse put those issues in context:
The idea that building a large scale web application is trivial or a solved problem is simply ridiculous....The social web is creating demand for new scaling tools and technologies. Current databases and caching solutions are simply unable to handle a complex network of multiple relationship between objects.
Nonetheless, Twitter's rollout planning process is clearly flawed. In contrast, Facebook has demonstrated professionalism in large-scale rollout planning:
The secret for going from zero to seventy million users overnight is to avoid doing it all in one fell swoop. We chose to simulate the impact of many real users hitting many machines by means of a "dark launch" period in which Facebook pages would make connections to the chat servers, query for presence information and simulate message sends without a single UI element drawn on the page. With the "dark launch" bugs fixed, we hope that you enjoy Facebook Chat now that the UI lights have been turned on.
The same post says: "scalability has to be baked in from the start," a lesson that Twitter is learning slowly, painfully, and very much publicly. Facebook is a far larger and more mature organization than Twitter and you can see the difference in their respective approaches toward managing technology.
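For readers unfamiliar with the "dark launch" idea in the Facebook quote, here is a minimal sketch in Python. It is not Facebook's code: the `ChatBackend` class, the `render_page` function, and both flags are hypothetical names invented for illustration. The point is only that the page exercises the back end (presence queries and simulated sends) while drawing no chat UI, so real traffic loads the new servers before users see the feature.

```python
from dataclasses import dataclass


@dataclass
class ChatBackend:
    """Hypothetical stand-in for the chat servers; it only counts calls."""
    presence_queries: int = 0
    simulated_sends: int = 0

    def query_presence(self, user_id: int) -> bool:
        self.presence_queries += 1
        return True

    def simulate_send(self, user_id: int) -> None:
        self.simulated_sends += 1


def render_page(user_id: int, backend: ChatBackend,
                dark_launch: bool, ui_enabled: bool) -> str:
    """Render a page, optionally exercising the chat back end."""
    html = "<body>profile</body>"

    # Both the dark launch and the real feature query presence.
    if dark_launch or ui_enabled:
        backend.query_presence(user_id)

    # During the dark launch only: generate synthetic message load
    # without showing anything to the user.
    if dark_launch and not ui_enabled:
        backend.simulate_send(user_id)

    # "Turning the UI lights on" is the only step that changes the page.
    if ui_enabled:
        html = html.replace("</body>", "<div id='chat'></div></body>")
    return html


backend = ChatBackend()
dark_html = render_page(1, backend, dark_launch=True, ui_enabled=False)
live_html = render_page(1, backend, dark_launch=False, ui_enabled=True)
```

In this sketch the dark-launch page is byte-for-byte the same as a page without chat, yet the back end has already received load, which is what lets scaling bugs surface before the public rollout.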
My take. Twitter is a great service and I love it when it works. In addition, the Twitter folks are friendly and accessible, so it feels somewhat mean-spirited to apply the usual IT failures expectations to them.
Twitter co-founder and Creative Director, Biz Stone, offered these comments by email:
I continue to be inspired by both Jack [Dorsey, CEO] and Ev [Williams, Chief Product Officer] and I'd argue that their talent and judgment is precisely what will navigate us through these growing pains and help us reach the vision of Twitter's future we all share. You'll be interested in knowing that we are actively seeking talented managers and we are spending significant resources on recruiting.
Despite the obvious good will, Twitter now has an $80 million valuation and provides a communications infrastructure upon which many people depend. From that perspective, users are completely justified in expecting a robust, reliable service with no explanations and no excuses.