Rupert Goodwins' Diary

Tuesday 8/8/2006

Happy Winter Solstice! Dismiss any thoughts of skyclad prancing, though: the festival is only being celebrated in the south of Mars. Fourteen degrees South, 175 degrees East, to be exact, where Mars Rover Spirit is patiently sitting out its second winter parked on the Low Ridge Haven outcrop. Since landing on 4 January 2004, the valiant vehicle has survived dust storms and broken motors, yet it and its twin Opportunity continue to send back unparalleled data by the bucketload. Although Spirit hasn't moved since April, it's found meteorites and produced a high-resolution, 360-degree panorama of its surroundings, neither of which I've managed from my London flat, despite not moving for considerably longer. NASA has agreed to extend the Rover missions for another year from October, so it's Martian Martinis all round and on with the show.

The question I want answered above all others though, as I survey yet another utility bill that seems to bear little relation to the basic tenets of mathematics, is how NASA can keep such complicated systems alive and well on the surface of the Red Planet while my local telco can't accurately track my telephone usage in Holloway. It's true that the Rovers run a notably robust and mission-tested operating system, VxWorks from Wind River, that has had far more than the usual number of hideously bright people buff its shiny carapace. But then, my Linksys wireless router runs VxWorks too (yes, I like the idea that I've got code running in my living room that may also be running on Mars): if I can buy that level of reliability for 50 quid, it's not made of unobtainium.

Of course, it's not possible to directly compare a billing system with a real-time data acquisition and control project. It's also palpably untrue that to make a reliable application all you need is a reliable operating system. The Rovers and their forerunners have had software problems of their own: the difference is, they got fixed and stayed fixed, because there was no other option. You can't afford to have something turn itself into a brick a hundred million miles away from someone who can reflash the ROM.

That's what's missing: the idea that there is no other option. Telcos and the vast army of other concerns that think it's acceptable to be unreliable know that they can stagger from patch to patch: if they realised that they couldn't, then they wouldn't. The engineers behind the Rover missions are quite clear on how they did it: you make sure you know how your system works. Whether you wrote it, farmed it out to a third party or bought in a commercial off-the-shelf system, make sure you have the people with the knowledge to understand what on earth it's doing, at every level. Is there anyone on this or any other planet outside Microsoft who can say that about Windows?

Then you make sure you have diagnostics and back-up options plumbed into the system during testing, and then you make sure they stay in when you go live. Test what you ship and ship what you test. And during that testing, you'll have limited resources — of course. So prioritise. Categorise by seriousness, likelihood and difficulty. How will your teams' efforts best be spent?

And make darn sure none of this is lost as things go live and developers move on. Document.

None of this is, ahem, rocket science. All of it works. If you get it right, you will produce good results. Look up at the sky if you doubt it. So why is it so hard?

Anyway. VxWorks is also in use in the Stardust project — and if you fancy contributing a bit of your time to space science and perhaps getting your name immortalised as the discoverer of something small yet really important, take a look at Stardust@Home. It's a curiously relaxing pastime, and just the thing if you need to unwind after that last phone bill.