Update 5/30/08 11:00am EDT: TechCrunch says "Twitter’s doing an excellent job of actually communicating with users all of a sudden." That's nonsense. Yes, they've improved, but there's a long way to go to achieve failure communication greatness.
Twitter's downtime, lack of reliability and unresponsiveness have annoyed many users. Some observers, me included, chalk it up to inexperienced management, while still acknowledging Twitter's greatness when it works. Finally, the Twitter team has started trying to be more open and transparent about the problems and its efforts to fix them. While that openness is long overdue and much appreciated, the Twitter team still has some lessons to learn about communicating failure.
Twitter has answered a burning question in the development community: Will Ruby on Rails stay as Twitter overhauls its infrastructure? The answer: Ruby stays, but Twitter may diversify in some areas.
While that's an interesting point, ordinary users want to know when the frikkin' service will be working again. Presumably, the Twitter status blog should help us there. Here's a quote from yesterday's update:
Wanted to provide an update on where we are in restoring services. The partial pagination fix we deployed appears stable. Some folks are still going to be missing older links but we’re working to restore those.
Geez, Twitter leaders, that's just not enough. We users want more insight into your problems; help us understand your issues and feel your pain, because we share some of it.
In contrast to Twitter's mediocre status posts, here's an example of great post-failure communications from Technorati, another free service with a rather spotty reliability record:
Technorati's spiders were shutdown for several hours on Thursday and various intervals since then while we investigated a number of anomalies that were appearing in our data; essentially, a small percentage of recently created blogs were having their data scrambled. An example of this appears in this blog post. The spidering outages allowed us time to investigate, diagnose and make corrections that prevented further data corruption. We started running some corrective measures on Friday but found over the weekend that that was only partially effective. Technorati handles a large volume of data everyday; isolating and devising remedies for these kinds of issues that effect a small percentage of the data flow is tricky. However, we think we're recovering now and the backlog of data processing is getting worked through.
The post goes on to explain what happened, and why, in more depth. Technorati comes across as transparent, insightful, and honest because the company:
- Acknowledged the full scope of the problem
- Took immediate corrective action once they realized the problem existed
- Provided context regarding why the problem was hard to solve
- Protected the company’s credibility (I call this “intelligent CYA”)
- Described symptoms the customer might experience, in jargon-free terms
- Presented their problem resolution strategy
- Demonstrated responsible and professional analysis
Transparency is hard, and I know there was debate inside Technorati about whether such openness was wise. But for Twitter, the time for transparency is now. Soon we'll see whether Twitter's management can rise to the occasion and meet this challenge.