Why do programs fail? For as far as we've come since the very first time Ada Lovelace saw the potential of universal computation in the 19th century, our software still has bugs. Over the years, even though we've developed many sophisticated ways to ensure successful code, programs still break.
While the answer to that could be taken in any number of existential directions, we've decided to provide a practical answer. Programmers make mistakes. They sometimes get sloppy. They don't always use the best tools or best practices.
I teach object-oriented programming at UC Berkeley Extension, where I spend as much (or more) time teaching good programming practice as I do helping students understand the code itself. In my classes, I see a lot of common mistakes, and I'll share some of those with you in this column.
I also reached out to Professor James A. Connor of Northwestern Polytechnic University's School of Engineering and asked him to share some common mistakes his students make.
I'll go first, and then I'll share Jim's insights with you.
Mistake #1: Poor commenting practice
Comments are elements of text inside a program that the computer does not execute. They are written by a programmer as notes, explaining what's going on inside the code.
Many of my students avoid commenting their code, and don't understand why they should take their time away from actually coding to write some notes. My most practical example comes from my own life.
I wrote Version 1.0 of ZENPRESS, one of the earliest content management systems, back before the turn of the century. I expected it to be delivering articles for a few years. Fourteen years later, it was still feeding articles, having prepared nearly 75,000 articles and fed 2.6 billion pages.
Eventually, the platform it was running on became obsolete. I had to dive back into the code. In 2009, I ported it from the original platform to a modern one. I recently had to change it again because a key language feature of PHP simply disappeared in a version upgrade.
There is no way I would remember how all that code worked after 19 years, but because I'd commented my code well, I had something of a road map. I was able to look at my code, see my notes embedded with the code, and make fixes.
Comments are also important when you're working on a team, or when your software will live on beyond your stewardship. You may move on with your career, and someone else may need to come in and understand your code. Commenting will help.
Mistake #2: Poor variable naming
I'm going to continue my theme of making code understandable through language. I'll illustrate this with an example. Let's say you're in a car that gets 20 miles per gallon and you drive 100 miles. How much gas have you used?
It's a simple example, but it works for our purposes. Let's say you encounter the line a = b/c. What does a mean? What are b and c? How do they relate to the rest of your code? Ten minutes after you write the routine, you're going to forget. Never mind if someone else has to come in and make fixes or write an update.
Now look at this expression: gallons = miles/mpg. It's immediately clear what each of the variables is meant to do. One represents gallons, one represents miles and one represents miles per gallon. It's clear.
Think about the relationship of giving variables clear, English-language names (or whatever your native spoken language is) and comments. Let's say you inherit a chunk of code and you see a = b/c. What does it do? Do you have any idea?
Be sure to name your variables in a way that represents their function. You will save a lot of time and reduce a lot of headaches.
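To make the contrast concrete, here is a minimal sketch of the mileage example in Python. Python is simply the language chosen for illustration, and the function and parameter names are made up for this example.

```python
def fuel_used(miles_driven: float, miles_per_gallon: float) -> float:
    """Return the gallons of fuel consumed over a trip."""
    return miles_driven / miles_per_gallon


# The call reads almost like the word problem it solves.
gallons = fuel_used(miles_driven=100, miles_per_gallon=20)
print(gallons)  # 5.0
```

Compare that with a = b/c: the arithmetic is identical, but here the names carry the meaning, so the code needs far less explanation.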
Mistake #3: No lab notes
I started writing ZENPRESS in mid-1997 and it went live in January 1998. Sadly, I was in a hurry to finish the project, and I didn't want to take the time to write lab notes for that first release. I have regretted it many times since. Beginning in June of 1999, when I started on version 2, I kept regular lab notes.
Lab notes are records that go beyond comments in code. Scientists use lab notes all the time as a journal or dialog of their development process. Lab notes have been used to prove ownership of scientific discoveries, because the process of exploration is often documented in the daily journal that scientists use to record progress.
Lab notes are a powerful tool for programmers as well. My last lab note for ZENPRESS was written in March of this year, when I had to move the ZATZ archives from one hosting provider to another. I keep regular lab notes on my other projects as well, and I have been saved many times by being able to go back into my notes.
If you're not keeping lab notes already, start now. Write down changes you've made, your reasoning, the things you considered and discarded, references to useful resources, and anything else that would help the future you. You'll also be helping your future colleagues or replacements -- or a judge if you need to prove ownership.
Mistake #4: Not writing in a human language
My students don't just have to program to pass their classes. They also have to write discussion board posts that demonstrate their understanding of certain coding concepts.
We require this for two reasons. First, of course, is the demonstration of understanding of the concepts. But of much more importance is the need for all professionals to be able to write.
I get a lot of push back on this from my students. At least two every semester cry, "I want to be a programmer, not a writer." But programming, engineering, IT -- almost all professional endeavors -- don't exist in a vacuum.
You'll need to write to explain concepts, to pitch an idea, to get funding, to ask for clarification, to prepare a proposal, or even to argue for a better grade. Open source project participants work as colleagues in very extended teams, and the only way they can stay in sync is by writing clear and understandable messages.
The bottom line is simple: If you want to do professional work or work on anything of importance, you need to write in a human language, like English, and not just in a programming language.
Mistake #5: Poor code formatting
As you've no doubt picked up, there's a theme here: make code understandable. Code maintenance is enormously time consuming and expensive. Frankly, it's not all that much fun. It's much better to be able to spend productive time adding capabilities than to spend weeks digging through old code, trying to figure out what you (or the person you inherited the code from) were trying to do.
I have experienced this personally, not just from my old code, but from code I've inherited. I adopt abandoned WordPress open-source plugins as a side project. As far as I know, I've adopted more than anyone else (and I'm speaking on that topic this weekend at WordCamp). Each plugin was developed by someone else, and to keep it working, I've had to dig through strangers' code.
Fortunately, those developers were excellent practitioners of the programming arts. I wouldn't have taken on these projects if they weren't. But even so, it's been challenging to come up to speed. Can you imagine how hard it would have been if their code had been poorly structured?
By structured, I mean the way in which code is laid out. I did a video on this for my students. You're welcome to watch it on YouTube.
Think about the articles you read online. Some are nicely formatted, with a line between each paragraph, and everything is consistent. Some articles, though, have everything arranged in one large blob, and they're impossible to read.
Every programmer (or project) tends to have a programming style. What your style is matters less than whether it is consistent. You need to let the code format help guide you.
For example, in my code, I insist on never having more than one blank line between sections. If I see a bigger chunk of white space, I am immediately cued in to the fact that something is not as it's supposed to be, and that there might be a bug in that area.
As you move forward in your code, look into whether your organization has a coding style. Consider defining a coding style for all your programmers, and stick to one that's clear and fosters maintainability.
Mistake #6: Poor error checking
Some famous general once said that a plan never survives an encounter with the enemy. My variation of that is that your code will never survive as expected when encountering users. Even though you think you know how users will use your code, trust me on this: you don't.
Users will break your code.
The way to handle this properly is with testing and error checking. Error checking is the practice of checking the result of every operation in your code. Make sure it's either what you expect, or that your code can handle the unexpected result.
My students, for example, have an assignment that involves reading a file. Almost all of them write the code by calling the file read routine. They check if the user cancels the dialog box, but they rarely ever check to see if the file is actually read in, or if there's a system error of some kind. It's worse when they try to write a file. They almost never actually check to see if the file was, in fact, saved. Oops.
You can see how this can be bad. To counter it, you always need to think about whether or not you can absolutely predict behavior -- and then realize you can't. You'll need to test. Testing doesn't just imply running the code yourself. Testing means letting real users, people who may behave in unpredictable ways, run your code.
You will find it infinitely informative.
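As a rough sketch of what checking the result of a file operation can look like, here is the file-reading assignment described above reduced to two small Python functions. The function names and the read-back check after saving are illustrative assumptions, not the actual assignment code.

```python
from pathlib import Path
from typing import Optional


def read_text_file(path: str) -> Optional[str]:
    """Return the file's contents, or None if it could not be read."""
    try:
        return Path(path).read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError) as error:
        # Missing file, no permission, bad encoding: report it, don't crash.
        print(f"Could not read {path}: {error}")
        return None


def write_text_file(path: str, text: str) -> bool:
    """Write text to path and report whether the save really succeeded."""
    try:
        Path(path).write_text(text, encoding="utf-8")
    except OSError as error:
        print(f"Could not save {path}: {error}")
        return False
    # Read the file back to confirm what was actually stored.
    return read_text_file(path) == text
```

The caller has to deal with None or False explicitly, which is the point: the failure case is handled rather than assumed away.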
Mistake #7: Using print statements instead of a real debugger
I've found over the years that programmers of different languages tend to have different cultures. In large part, that's because they're building different kinds of solutions and using different tools.
One example of this is the difference between my C# programming students and the open source PHP developers I work with on some of my projects. Almost no C# programmer would ever consider debugging his or her code without the use of a symbolic debugger. That's because C# is natively programmed using Visual Studio as the coding environment, and a debugger is built in.
By contrast, I have seen a never-ending stream of PHP developers who think just dropping an echo statement or var_dump is good enough to help them debug their code. In part, this is because most PHP programmers tend to program in an editor, not a development environment. A big difference between the two is the debugger.
So what is a debugger? Put simply, it's a tool that allows you to look inside your code as it runs. Think of it as an X-ray, ultrasound or MRI for your code. You can tell the debugger to stop at certain points for you to examine the state of all your variables. You can tell your debugger to stop on certain conditions. You can change values. You can watch and profile values (although profiling is sometimes a separate tool).
The difference in productivity can be substantial. If you want to get the job done faster and with far more accuracy, make sure to use a real symbolic debugger.
With that, my section of tips and observations is over, and I'll turn over the floor to Professor James Connor.
Mistake #8: Using magical numbers
Many programmers think they're just going to have to code once and it'll be perfect. However, in order to optimize the long-term life-cycle costs of enterprise and industrial software, it is necessary to write code that can withstand changing conditions.
One classic example of this is the idea of magic numbers. By magic, I mean numbers that the programmer thinks will always survive the test of time.
Take, for example, a commission calculation that might be based off the customer's purchase amount. At the time of writing, the commission percentage might be three percent, or 0.03.
Now, imagine how this code might be written: commission = .03 * sale. In this context, the magic number is 0.03. Since the programmer thinks this would be magically valid forever, he hard codes the number 0.03 into the code.
That's all well and good, but commissions tend to change from year to year. If the following year the commission increases by half a percentage point, to 0.035, it will be very hard to hunt down every place the old value appears in thousands of lines of code.
Rather than using magic numbers, define variables or constants in one place, and let your code use those variables. If you pre-define commission_rate, then code like commission = commission_rate * sale won't need to be changed.
Another thing to consider is that wherever you find a magical number, you might have identified an option you want to expose to the user, one that you can let him or her set in a preferences section.
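A minimal Python sketch of the commission example follows. The constant name, the function, and the default-parameter approach are illustrative choices; the three percent rate is the one used in the text.

```python
# Defined once, in one place. Next year's rate change is a one-line edit.
COMMISSION_RATE = 0.03


def commission(sale: float, rate: float = COMMISSION_RATE) -> float:
    """Return the commission owed on a sale at the given rate."""
    return rate * sale


standard = commission(1000.00)            # uses the shared constant
negotiated = commission(1000.00, 0.035)   # a rate supplied by the caller
```

Passing the rate as a parameter also leaves room for the second idea above: the value could just as easily come from a user preference as from the constant.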
Mistake #9: Sloppy dates and times
Here's a tough question: how many days are in a year? 365 might be the normal answer, but this year, of course, has 366. Are there ever 365.25 days in a year? No. No there are not.
But some of my students have decided that since leap year comes once every four years, there are therefore 365.25 days in each year on average. When doing date calculations, they use this average, and, as a result, nothing is ever correct.
Often it's better to use a system library to calculate dates, because the dates you are calculating may not be western calendar dates.
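As one illustration of leaning on a library, this short Python sketch uses the standard calendar and datetime modules instead of a 365.25-day average. The helper names and the sample years are arbitrary choices for the example, and these modules cover only the Gregorian calendar.

```python
import calendar
from datetime import date


def days_in_year(year: int) -> int:
    """Return 365 or 366, using the library's leap-year rule."""
    return 366 if calendar.isleap(year) else 365


def days_between(start: date, end: date) -> int:
    """Return the exact number of days from start to end."""
    return (end - start).days


print(days_in_year(2023))  # 365
print(days_in_year(2024))  # 366
print(days_in_year(1900))  # 365: century years must also divide by 400
print(days_between(date(2024, 1, 1), date(2025, 1, 1)))  # 366
```

The 1900 case is the kind of rule a hand-rolled "every four years" calculation tends to miss.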
Let's look at a similar issue with time. Every several years, as the Earth slows down, an extra second is added to a day, typically on either June 30 or December 31. This is called a leap second. When one is added, the clock can go from 11:59:59 p.m. to 11:59:60 p.m. to 12:00:00 a.m. (in UTC).
Here's a second time challenge. In places where daylight saving time is in use, it's possible for transactions to occur out of order. For example: Transaction A is placed first, but then time is reset backwards by one hour, then Transaction B is placed. However, if you're sloppy about time sequencing, it will be recorded that Transaction B happened first. This type of time error can cause financial penalty fees to be wrongly incurred and all manner of other chaos.
Once again, there are many good language and system libraries set up to accommodate both of these time issues. It's often better to use existing libraries than to code your own time calculations.
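The transaction-ordering problem can be sketched in Python with timezone-aware timestamps. The date, the clock times, and the fixed UTC offsets below are assumptions made up for illustration; a real system would more likely take its zone rules from a time zone library such as Python's zoneinfo.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for one region before and after clocks fall back.
DAYLIGHT_TIME = timezone(timedelta(hours=-4), "EDT")
STANDARD_TIME = timezone(timedelta(hours=-5), "EST")

# Transaction A is placed at 1:50 a.m., just before the clocks go back.
transaction_a = datetime(2024, 11, 3, 1, 50, tzinfo=DAYLIGHT_TIME)
# The clocks go back one hour, then Transaction B is placed at 1:10 a.m.
transaction_b = datetime(2024, 11, 3, 1, 10, tzinfo=STANDARD_TIME)

# Sloppy approach: compare what the wall clock showed, ignoring the offset.
b_looks_first = transaction_b.replace(tzinfo=None) < transaction_a.replace(tzinfo=None)

# Safer approach: compare aware timestamps (or store everything in UTC).
a_is_first = transaction_a < transaction_b

print(b_looks_first)  # True: the wall clock gets the order wrong
print(a_is_first)     # True: the aware comparison gets it right
print(transaction_a.astimezone(timezone.utc).isoformat())  # 2024-11-03T05:50:00+00:00
print(transaction_b.astimezone(timezone.utc).isoformat())  # 2024-11-03T06:10:00+00:00
```

Recording timestamps in UTC and converting to local time only for display is a common way to sidestep this class of bug.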
Mistake #10: Not choosing the right data structures
A data structure is a mechanism for representing data in your programs. Many of you have heard terms like linked list, tree, and array. Each of these is a logical representation of data that corresponds to some architectural structure of what you're trying to represent.
One of the most common mistakes I see programmers make -- both experienced coders and newbies alike -- is paying too little attention to data structure choice. Since almost all your code builds on your choice of data representation method, choosing the wrong data structure can have costly implications down the line.
Here's one example that illustrates this sort of design error: choosing a simple stack or queue, instead of a circular queue. Think of a stack as a stack of dishes. You put the bottom dish down, then another dish on top, then another, and so forth.
If you want to remove a dish, you take it from the top of the stack. This is called last-in, first-out. The problem is, if you need to remove something earlier in the stack, it's a hassle. Let's say you have ten dishes in the stack. To get to the first one, you have to remove all the others first.
Now, let's think of a queue. When you stand in line at the bank, you're in a queue. The first person in is also the first person out. As soon as the first person is served, the next person is up, and that person is served. The other thing that happens is that each person takes a step forward, moving up in the queue.
What happens when too many people show up? They're either turned away or the line goes out the door. And when the first person is called, all these people have to move.
When you have a lot of data, a queue of this sort can be enormously inefficient. Each time data is pulled from the beginning of the queue, all the data needs to move. We're in a big data world, where we have a constant flow of data through our systems.
In this context, it might be better to implement a circular queue. In this case, the data never moves. Instead, pointers are set up to point to the beginning and end of the queue and, internally, the queue wraps around itself, so that the data is organized in a ring instead of a line. When a data element is used and eliminated from the ring, there's no need to move all the data in the ring. All that happens is the first element pointer points to a new element in the ring.
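Here is a minimal Python sketch of the ring described above. The class name, the fixed capacity, and the choice to raise an error when the ring is full are assumptions for illustration. This version tracks the head position plus an item count rather than two separate pointers; the end of the queue is computed from those two values.

```python
class CircularQueue:
    """Fixed-capacity first-in, first-out queue stored in a ring."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots = [None] * capacity
        self._head = 0    # index of the oldest item
        self._count = 0   # number of items currently stored

    def __len__(self) -> int:
        return self._count

    def enqueue(self, item) -> None:
        if self._count == len(self._slots):
            raise OverflowError("queue is full")
        # The tail wraps around to the start of the list when it runs off the end.
        tail = (self._head + self._count) % len(self._slots)
        self._slots[tail] = item
        self._count += 1

    def dequeue(self):
        if self._count == 0:
            raise IndexError("queue is empty")
        item = self._slots[self._head]
        self._slots[self._head] = None  # release the reference
        # Only the head index moves; no stored item is shifted.
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return item


line = CircularQueue(3)
for customer in ("Ann", "Bob", "Cy"):
    line.enqueue(customer)
print(line.dequeue())  # Ann
line.enqueue("Dee")    # reuses the slot Ann vacated
print(line.dequeue())  # Bob
```

In everyday Python code, collections.deque from the standard library already provides efficient removal from the front, so a hand-written ring like this is mainly useful for seeing how the technique works.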
This is but one of many examples of how the choice of correct data structure can have enormous implications on the efficiency and effectiveness of your code.
David here. I'd like to send a big shout-out of thanks to Professor Connor for sharing some of his insights. Hopefully, between my tips and his, you'll become more efficient and effective programmers and avoid some of these serious mistakes.