
Glitch and recent system outages

Recent data center and software failures have people pointing fingers. Should we look at a broader causal factor and could we be tempting fate?
Written by Brian Sommer, Contributor

There have been some well-publicized system outages recently. Cloud hosting sites have been down, as have some cloud services and heavily used applications.

These systems fail for a number of reasons. Some fail due to hardware failures. Some fail due to operator error. Some fail due to software errors. And several fail without any of us being aware of it or ever learning what caused the failure.

As these events occur, I keep remembering a book called Glitch (from Prentice-Hall), written by Jeff Papows (Jeff was Lotus' CEO and President of Cognos). While I read it months ago, recent events made me recall the words in those pages.

Glitch by Jeff Papows

In Glitch, Jeff tells how software bugs trigger some spectacular problems. Some of the examples he cites are even tragic. But, through them all, he paints a picture that leaves no doubt as to the scale of the problem and the economic damage these glitches create.

Not all the glitches he describes are due to the willful disregard or negligence of programmers. Some occur when two independently created pieces of software (or software and hardware) are mated together without people understanding all the possible ways these different products will be used together. While many of us can test code for the predictable/knowable situations, we have trouble testing for the unknown and unforeseeable events. That's when really hinky things can happen.

Yes, more testing is beneficial, and for certain situations (like aviation navigation software) it is imperative. But, as with all business decisions, human beings must decide when an adequate level of testing has been completed. That thought makes me uncomfortable.

I'm not saying that we should accept anything less than perfection, but I am also enough of a realist to understand that nothing in this world is perfect. The challenges Jeff describes in his book are only the tip of the iceberg, and it's a problem that will only grow more apparent as we incorporate more software into our automobiles, homes, work, etc.

Several times this year, I've been chided for having a really old cell phone. It's not a smart phone but it does exactly what I need it to do: phone calls, voice mail and the odd text message. There's something to simplicity and that phone is one of the simplest pieces of technology I own. It has never had a software upgrade. It can't do much and, as a result, it has very limited means of failing me.

I worry when we surround ourselves with overly engineered 'solutions' because someday, somehow, they will fail. When you see some spectacularly over-engineered products, think about the beach houses on U.S. barrier islands. People construct those homes thinking the engineering of the building will spare it from major hurricane damage. While the roof might survive the winds and the pilings might mitigate some of the flood damage, nothing will save the structure when the wind topples a utility pole onto the house, the transformer shorts out and a massive fire starts. Sometimes, you can't predict every adverse situation. However, sometimes you should avoid over-engineered solutions.

Maybe the Luddites had it right after all....
