Tubthumping

A place to rant.

Monday, February 27, 2006

Breaking the 'continue' habit

The keywords continue and break are used by many programmers for "easy" flow control. But they make code harder for humans to read.

The problem with them is that they interrupt the flow of control in a strange way. Of course, that's the whole point, but while it makes lots of sense when you're writing it, it's surprisingly counter-intuitive for people reading your code.

When you're scanning code, and you encounter a 'break' or 'continue', you immediately have to stop and figure out where the code actually jumps to. This is particularly annoying with nested scopes, and they always occur in at least one nested conditional.

The next thing the reader has to do is determine the state of local variables after the jump, which is not always easy. In fact, it's hard enough for any non-standard loop to determine the current state of the program at the end of the loop. But when the loop allows the possibility of break'ing or return'ing early, it's even harder, because it means you have to scan the entire body of the loop to figure out what it changed, instead of being able to look at just the header or footer.

A much better way to write most loops is to partition the body into several conditional cases, possibly nested ones. Then, put the 'break' condition into the loop condition, where it should be. People expect terminating conditions here - that's what it's for!

Consider a contrived example in which you're iterating over n elements in a collection list. You only care about elements x such that foo(x) is positive (for some function foo). For those x, you want to do some complicated stuff (like compute an average). You want to stop if you get to a negative number. It's very tempting to write this:


int min = 0;
for(int i = 0; i < list.length; ++i) {
if(!foo(list[i])) {
continue;
} else if(list[i] < 0) {
break;
}
// do work to calculate the average
}


I would argue that the following is a much better way:


int min = 0;
for(int i = 0; i < list.length && list[i] >= 0; ++i) {
if(foo(list[i])) {
// do work to calculate the average
}
}


Why's this better? Well, it's more concise, but that just reflects the other nice property: it's much simpler! If you're reading this code and wondering how long this loop will run, it's easy: you can just look at the loop conditions. And if you want to know what it does in the body, you only need to consider two cases (one of which is a no-op). In the first example, there are at least three important cases to worry about, and the order in which they are checked is significant. As a result, looking at any given line of code in the loop, it's hard to figure out under what circumstances it will be executed. With the second example, it's extremely clear.

Of course, there are exceptions to all rules of thumb. Return'ing early is usually much less controversial (to me), and a really good idea if, for example, you're doing a linear search for something that will only be true for one element. But you can always rewrite the code to be as efficient without 'break' or 'continue', and it will very often be clearer.