Miscellaneous Comments on Programming Complexity

The comments here are a collection of thoughts and opinions I've had on the subject of programming and programming languages. My experience has been primarily with C, C++, and Perl, with an occasional bit of Fortran, Lisp, Pascal, BASIC, and others.

Program Complexity

I have identified three different criteria which can be used to measure the "size" or "difficulty" of a program. These are syntactic complexity, conceptual complexity, and length.
Syntactic Complexity
This manifests itself as a difficulty in translating between the intent behind the code and the code itself. A syntactically complex program is one which is straightforward when written in pseudocode, but which becomes difficult when expressed in an actual programming language. This is obviously not only a function of the algorithm, but also of the particular language in which it is being expressed.

Excellent (albeit somewhat extreme) examples of syntactically complex programs are available from sources such as The Obfuscated C Contest, The Obfuscated Perl Contest, and Befunge.

Conceptual Complexity
Conceptual complexity relates to the concepts behind the code. A conceptually complex piece of code is one which becomes no easier to write or understand when it is written as pseudocode than when it is written in an actual programming language. Conceptual complexity is mostly independent of the language in which the algorithm is being implemented.

Particularly good examples of things which are conceptually complex are mathematical algorithms of just about any sort. These often involve nothing more than loops and arithmetic, which are easily expressed in just about any language, yet they can still be quite difficult to understand without an explanation of the actual math behind them.

Length
It is also possible for code to simply be long without actually being complicated or difficult in any way. My favorite example of this (based on my very limited experiences with Microsoft Windows and the hatred that resulted) is user-interface code for a graphical user interface. It takes hundreds of lines of code just to put up a window on the screen and print "hello world" on it, because you're required to write functions to handle all sorts of events, and then register these handlers. Even something which does most of the work for you, such as Tk (which I've used in the form of Perl/Tk but not Tcl/Tk), still requires quite a lot of code to do nothing more than specify what the window is supposed to look like. This isn't complex or difficult; it's just long and tedious.
I've spoken of pseudocode above as if it were language independent, and of course this is not really true. When writing pseudocode, one needs to have some idea of which concepts can reasonably be expressed on one line, and which should be broken down into more. This depends on what sort of functionality is available from the underlying language. Thus, to write the pseudocode, one does not need to know the detailed syntax of the language which is planned for the final implementation, but one does need to have some idea of its capabilities.

Language Complexity

It seems that the act of learning to program in a particular language can be usefully broken into two parts. The first of these is learning the concepts the language uses and constructs it provides, and the second is understanding how to combine these concepts and constructs to achieve desired results.

The greater the complexity of the language, the higher the ratio of the difficulty of the first part to the difficulty of the second part for a given result (of course, there is variability depending on what exactly this result is, but consider some sort of average). Assembly language, for example, is conceptually quite simple. Almost all the difficulty comes from putting the instructions together to do useful things. A good example from the other end of the spectrum is something that isn't even always thought of as programming: shell scripts. The building blocks are innumerable - just about every program installed on the system can be used. However, once the available constructs (the programs) are understood, relatively little effort is required to put them together to achieve the desired result.

Obviously, less complex languages are easier to learn in their entirety. However, it is often the case that more complex languages are easier to use for simple or straightforward tasks, as long as you're willing to use them without fully understanding them. I'll give an example using C and Perl to illustrate this point. Perl is a more complex language than C; C has nothing which compares to concepts like Perl's namespaces and packages, rules for bareword interpretation, or a hash member's ability to exist without being defined. However, if you want to match regular expressions, you can do it trivially in Perl; it would be a significant undertaking in C.

This language complexity can, if you want, be divided into conceptual (for things like namespaces and packages) and syntactic (for things like bareword interpretation, implicit ->, and so on).

From an æsthetic point of view, I tend to prefer simple languages over complex ones. Simple languages are also easier for me to learn, as I feel uncomfortable using a language that I do not completely understand; I don't like programming using constructs that are magic to me. This means that when learning a complex language, I endeavor to learn all of the concepts and most of the syntax as quickly as possible; this can be a significant undertaking. This is entirely a personal preference, however. Much information available to me indicates that many people do not feel this way and do not much care how well they understand something as long as they are able to use it. It may well be the case that there are people who feel exactly the opposite - they are not bothered by not understanding all of the concepts or constructs provided by a language, but they feel uncomfortable unless they are confident that they have an excellent idea of how to use the concepts they do understand.

Design and Implementation

Leaving aside all debugging and testing functions, there are essentially two distinct activities in programming. These are the making of design decisions and the actual writing of code. I consider this distinction important because the two abilities are not necessarily related, and an individual person may be better at either one. A certain amount of inequality in these abilities is perfectly acceptable and even desirable: if you have two people, one performing each of these functions, you're better off when each is better at his own function than when the two are interchangeable.

It is not possible, however, to have someone who is good at one of these things and has no ability whatsoever at the other, for the simple reason that the abilities are not entirely separable. It is possible to make design decisions without actually writing any code, but it is important to understand how the code is written in order to make decisions which will not require perverse implementations. Similarly, it is nigh impossible to write code without making at least a few decisions along the way.

Thus, it is my opinion that anyone who has one of these abilities without the other is actually quite useless. On one side are the people with no intuition who have learned to program by rote. They can't do anything that they haven't seen before, and they need hand-holding to perform what are generally viewed as relatively simple tasks. On the other side are the Computer Scientists who never actually program computers. These people spout grandiose schemes which produce illness in the people who are supposed to implement them and much laughter in the people who aren't involved. Both of these categories of people are fairly useless, and if you put some from each category together, they would likely not produce much.

Lines of Code

Many people seem to use "lines of code" (or number of statements, or something similar) as a measure of productivity, or as a measure of the size of a project. If you read my comments in the section on Program Complexity, you can probably tell that I don't think much of this. Some people do it because they genuinely believe it to be a useful measure; some people say "I know that it isn't very good, but it's all there is" and go ahead and use it anyway. I have even less respect for the latter than the former; the former are at least honestly mistaken, whereas the latter recognize the inadequacy of their methods yet cannot be bothered to make the exertion required to use something better.

The uselessness of lines of code as a measure of productivity or project size comes directly from the fact that different pieces of code differ in complexity. If someone spends hours analytically solving differential equations, and then writes 50 lines implementing this solution, should his contribution be valued less than that of someone who spent the entire time writing 400 lines of user interface code? My opinion is that obviously it should not.

There is another way in which lines of code as a measure of the size of a project are used. A disturbing number of people seem to think that if someone has worked on a large project (measured in lines of code), he must be better at making design decisions or dealing with complicated concepts than someone who has not. This is, however, completely specious. If the person in question was not the overall architect of the program, he might have been given assignments of any conceivable size and specifications of any conceivable detail. If he was given functions to write which took a few weeks each, and each one bore no obvious relation to the previous (as might well have been the case), he has demonstrated no different abilities than he would have by writing a whole series of separate programs which took a few weeks each.

The only time I have seen lines of code used for something appropriate was when someone wanted to see how many pages would be required for a printout.

Selecting a Programming Language

The first question when selecting a programming language is whether or not you care about execution speed. I don't have much to say about the case where you do; my ridiculously overgeneralized answer is that if execution time needs to be as low as possible, you don't really have any option but C. If execution time needs to be lower still and you don't care about portability, you use C with the critical parts written in assembly language for the target platform.

Now let's suppose you don't care about speed this much. The next consideration is, of course, programmer time. To minimize programmer time, one wants to minimize complexity. However, the dominant type of complexity (syntactic or conceptual) depends on the size of the program.

For small programs, conceptual complexity dominates. It is desirable to find a language whose data types, functions, and libraries match what one is trying to do as closely as possible. This is the reason Perl is so popular for relatively simple, IO-centered programs - it provides many of the functions that one would have to write for oneself in another language. For a small program, this can result in the program being only a fraction of the work it would have been in a language which did not provide as convenient a set of building blocks.

For large programs, conceptual complexity does not come into play in the selection of a language. Large programs are of course going to be quite conceptually complex, but they're also going to be pretty much the same regardless of which language is used. Any program where most of the work could be eliminated by using Perl's text-handling functions, for example, simply doesn't qualify as a large program. In a large program, the gains you would get from using one language's built-in functions instead of writing your functions in another language are negligible in comparison to other considerations.

Therefore, as counterintuitive as it may initially seem, I believe that large programs are dominated by syntactic complexity. Thus, you want to select a language which minimizes this complexity. Arguably, however, complexity is conserved - the only way to get a simpler program is to write it in a more complex language. You can shift it between the language and the individual programs, but you can't get rid of it altogether. However, it is still desirable to shift the complexity from the individual pieces of code to the language - complexity in the program must be dealt with repeatedly, but complexity in the language must be dealt with only once. For any given piece of code, there is no benefit, but when you look at the next piece, you already understand the complexity which is a part of the language.

Suppose, for example, that you have a program in C. In this program, you're using "static" global variables to hide data from functions in other source files, and you're passing around void pointers to data of varying types, where the functions that operate on this data determine what to do by checking the magic number that comprises the first two bytes. This program is an obvious candidate for C++, because C++ provides syntax to facilitate these things.

On the other hand, programs which perform a great deal of numerical computation or IO tend not to be syntactically complex, even if they are long. These programs would therefore experience little or no benefit from being moved into a more syntactically complex language like C++. In fact, they would likely become more difficult to understand, because language features that weren't really needed would be used anyway simply because they were there. It is important to remember that syntactic complexity does not necessarily go along with length or conceptual complexity, and every project needs to be evaluated on an individual basis.