What Microsoft needs to do about the Sidekick fiasco

Written by Rupert Goodwins, Contributor

It's nearly Wednesday, and the question of what happened to the Sidekick service remains unanswered.

At one level, it's obvious. T-Mobile US's Sidekick service, run by Danger and thus Danger's owners Microsoft, went down and stayed down. As the Sidekick mobile phones rely completely on the service to maintain their data – they're thin clients with only battery-backed local storage – this has left the users with nothing.

At another level, it's remarkably unclear. How can a company with the resources, experience and reputation of Microsoft allow a mission-critical system to die beyond resurrection? The reaction of the punditsphere has been rapid, predictable and not unreasonable: this is the failure of the Cloud idea, this is a vindication of keeping it local, this proves Windows is useless, this is absolute proof of Microsoft's incompetence.

Not unreasonable, just not very right. If local backups never failed, then yes, that would be a killer shot against remote services. Cloud means many things, but it doesn't normally mean having a single copy of data running on a single system: Google's downtimes have been bona fide cloud failures, but data has not been lost. And as for Windows being somehow at fault: please. Danger is an Oracle, Unix and Java outfit: unless the problem happened because the service was being moved to a new Windows-based system, keep yer trap shut. (And even if it was, the old system should have been there as a fallback. So, no.)

Which leaves Microsoft's incompetence. Of all the diagnoses, this is the most unanswerable. And while Microsoft has many very competent people, the loss of all data goes beyond personal incompetence. This has to be, at a very deep level, a systemic management failure.

And that is a truly dangerous perception for Microsoft. Forget about the loss of consumer confidence in Sidekick – that's gone, and won't come back. It's not as if the customers really knew or cared that Microsoft was behind their service. Management failure speaks most eloquently to enterprises, who know more than they know anything else that bad management is a corporeal disease that will ruin all else that is good and reliable in a company. It makes a partnership unconscionable: it doesn't matter if a business partner is evil, greedy, power-crazed or working to hidden agendas, you can work with all of that if it comes up with the goods. If it is incompetent, though, it is poison and can kill you. Run away.

There is only one course of action that will save the day for Microsoft, and that's a detailed, frank and complete explanation of what happened. That's very difficult for any company to do, let alone one so addicted to public protestations of God-like perfection in the face of what the faithless consider evidence to the contrary. Yet in this case, there is no other path to redemption.

Microsoft even has a template to work to. Earlier this year, the Apache Foundation had a very embarrassing and very public security failure. Hackers gained access to many of its public servers and installed scripts that compromised various Apache developer services. It took some time and a loss of service for Apache to recover from this: not what you want if you're responsible for the code that runs the majority of the Web.

Apache reacted well. After diagnosis and restoration, and having fixed the chain of vulnerabilities that exposed them, the Foundation published a very detailed account of what happened, how they recovered and why it won't be happening again. That account was sufficiently complete to act as a valuable and apt lesson for others who also run web services. Not only did it restore confidence in Apache, it took a bad event and turned it into a public good.

We still don't know what happened with Sidekick. There are plenty of rumours, including sabotage, Machiavellian actions by Microsoft to destroy confidence in the cloud, internal revolt and quite possibly alien action. Best guess? It was probably a failed upgrade that required data restoration from a backup that subsequently proved unusable – this happens. The golden rule, that a backup isn't a backup until it's restored, was ignored.
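That golden rule is cheap to follow. A minimal sketch of what "a backup isn't a backup until it's restored" means in practice: actually unpack the archive somewhere disposable and compare the restored files against the live data. (Everything here — the file names, the helper, the tar format — is hypothetical illustration, and has nothing to do with Danger's actual Oracle/Unix setup.)

```python
import hashlib
import os
import shutil
import tempfile

def sha256_of(path):
    """Hash a file's contents so live and restored copies can be compared."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_backup(source_dir, backup_archive):
    """Restore the archive into a scratch directory, then check that every
    live file exists in the restore with identical contents. The backup only
    counts as a backup if this returns True."""
    scratch = tempfile.mkdtemp()
    try:
        shutil.unpack_archive(backup_archive, scratch)
        for root, _, files in os.walk(source_dir):
            for name in files:
                live_file = os.path.join(root, name)
                restored = os.path.join(scratch,
                                        os.path.relpath(live_file, source_dir))
                if not os.path.exists(restored):
                    return False
                if sha256_of(live_file) != sha256_of(restored):
                    return False
        return True
    finally:
        shutil.rmtree(scratch)

# Demo: a tiny "live" dataset, backed up and then verified by restoring it.
live = tempfile.mkdtemp()
with open(os.path.join(live, "contacts.txt"), "w") as f:
    f.write("alice,555-0100\n")
archive = shutil.make_archive(os.path.join(tempfile.mkdtemp(), "backup"),
                              "gztar", live)
print(verify_backup(live, archive))  # True — and only now is it a backup
```

The point of the scratch directory is that verification exercises the same restore path you'd use in a disaster, rather than merely checking that the archive file exists and has a plausible size.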

Microsoft needs to come clean. It needs to publish what it found in the wreckage, it needs to say how that happened and it needs to say what steps it's taken to ensure that it doesn't happen again anywhere in the organisation. And it needs to do so in terms that the rest of us can use, to check our own systems and guard against our own tendencies to incompetence.

It needs to do this now. Otherwise, all we know is that the company is incompetent in delivering the core services it's trying to sell us, and it may not be able to cure that.

Enterprise poison.
