Reducing Incidental Complexity in Our Code and in Our Teaching

"Controlling complexity is the essence of computer programming." - Brian Kernighan

As software craftsmen, we face complexity in our daily tasks. With that in mind, it's important to minimize incidental complexity, the byproduct of introducing things that only obfuscate the intended purpose of the code. It's also critical to teach others in an efficient and effective manner, one that doesn't unnecessarily complicate things.

Code complexity is one of the many reasons our job is a difficult one. Imagine today is your first day on a new team. In your mind you've created a short to-do list:

  • Comprehend the code.
  • Take note of the stylistic and structural conventions being used in the code base.
  • Become familiar with the infrastructure of the project and the tools.

Complexity can be defined as the degree of difficulty with which each item on this list is accomplished. The other developers on your new team, those helping you with the onboarding process, may have contributed to this complexity. The unnecessary complexity they introduced is incidental and should be differentiated from intrinsic complexity.

If you've worked in our industry for any length of time, you understand that minimizing incidental complexity is easier said than done. Dr. Edsger W. Dijkstra went so far as to call software "the most complex product ever produced by human effort." We learn how to produce this complex product and eventually take on the responsibility of teaching it, often with only an opaque understanding of why it's complex in the first place and of where and how that complexity can be reduced.

Why is it complex?

"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." - Martin Fowler

What are we doing that increases the incidental complexity of our code, thereby making it more difficult for people to understand? There are a number of factors that can impact complexity.

Unruly nesting

Abusing control structures can lead to deeply nested statements that are tremendously difficult to parse and make the code far less extensible.

I'll give you an example, but first, some background: we have a Java app with a method called generateResponseObject() that returns an instance of a class extending ResponseObject. There are five subclasses this method could return.

public ResponseObject generateResponseObject(String route) {
  ResponseObject responseObject;
  if (route.equals("time")) {
    responseObject = new TimeResponse();
  } else if (route.equals("echo")) {
    responseObject = new EchoResponse();
  } else if (route.equals("redirect")) {
    responseObject = new RedirectResponse();
  } else if (route.equals("file")) {
    responseObject = new FileResponse();
  } else {
    responseObject = new FormResponse();
  }
  return responseObject;
}

This has gotten out of hand. It's an unwieldy mess, and you can imagine how much worse it would get if we added more ResponseObject subclasses. In this method we've introduced incidental complexity. Remember your to-do list, the one you created on your first day with the new team? The complexity in the code above was introduced by the other developers on that team, and it will make each item on that list more difficult to accomplish.

Let's look at one way this could have been avoided.

public ResponseObject generateResponseObject(Map<String, ResponseObject> routes,
                                             Map<String, String> httpRequestContent) {
  // The parsed request tells us which route was requested; the routes
  // map tells us which ResponseObject handles that route.
  String route = httpRequestContent.get("route");
  return routes.get(route);
}

We parsed the request and stored its contents in a hash map (the httpRequestContent parameter above). To access the route, we simply retrieve the value for the key "route". With that route we look up the specific subclass of ResponseObject we want in a second hash map, one we created with keys like "time", "echo", and "redirect" and the corresponding response objects as values. The incidental complexity is gone, and the code is both more extensible and easier to understand.
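
To make the lookup concrete, here's a sketch of how the routes map might be built and used. The construction below is illustrative; the subclass names come from the earlier example.

// Built once, at startup. Requires java.util.Map and java.util.HashMap;
// ResponseObject and its subclasses come from the earlier example.
Map<String, ResponseObject> routes = new HashMap<>();
routes.put("time", new TimeResponse());
routes.put("echo", new EchoResponse());
routes.put("redirect", new RedirectResponse());
routes.put("file", new FileResponse());
routes.put("form", new FormResponse());

// Adding a new response type is now a one-line change to this map
// rather than another branch in a conditional.
ResponseObject response = generateResponseObject(routes, httpRequestContent);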

Tight coupling

Let's say your team also has a Ruby backend service. You're exploring the project and you find a class with a long list of require statements. This is one of the signs of a coupled code base, and it adds unnecessary complexity. You might find it difficult to make sense of a tangled web of dependencies and third-party libraries, which can have a significant impact on the program's complexity. Sometimes a program's flow of control is intrinsically complex due to the problem domain, but oftentimes we introduce an unnecessary level of complexity through the confusing (and flawed) way in which our program is designed.

Let's consider another situation. You're writing tests, and your test setup grows to the point where it's very difficult to follow, so you decide to extract it into a helper function. The class you're testing has a huge pile of dependencies, and your helper ends up pulling lots of other files into it. You move the test setup into its own file. You're satisfied that your code is DRY, but did you really accomplish anything, or did you just hide a mess that needs to be dealt with? In situations like these, DRYing up the code isn't helping; the root of the problem is a poor design that has led to tight coupling and a confusing tangle of dependencies.
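
To sketch the difference (every class name below is hypothetical), compare a class that constructs its own collaborators with one that accepts them through its constructor:

class SqlReportSource { /* talks to the database */ }
class SmtpMailer { /* talks to the mail server */ }

// Before: the class news up concrete collaborators, so every test
// setup must drag the database and mail server along with it.
class TightlyCoupledReportService {
  private final SqlReportSource source = new SqlReportSource();
  private final SmtpMailer mailer = new SmtpMailer();
  // ...
}

// After: the dependency is an interface handed in through the
// constructor, so a test can pass in a tiny fake instead of hiding
// a pile of setup in a helper file.
interface ReportSource {
  String fetchReport();
}

class ReportService {
  private final ReportSource source;

  ReportService(ReportSource source) {
    this.source = source;
  }

  String render() {
    return "Report: " + source.fetchReport();
  }
}

A test can now build the class with new ReportService(() -> "canned data") and never touch a database, with no helper file papering over the mess.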

Including multiple files, each of which includes multiple files, can lead to bugs (ones that are especially difficult to track down) and confusion for anyone working in the code later, particularly when it isn't obvious where specific functions are defined. In his blog post Modules called, they want their integrity back, 8th Light Craftsman Josh Cheek describes some pitfalls of carelessly including modules that provide unneeded and unexpected functionality.

Improper formatting

Formatting antipatterns related to indentation, whitespace, naming, and comments can make code unnecessarily difficult to understand. Naming is particularly important; well-constructed method names can make a developer's life much easier. Similarly, the number and names of parameters are worth considering. While the impact of this group of antipatterns on code complexity may not be as severe as that of unruly nesting, it is still significant. Consider this example.

(defn resume-game []
  (let [file-name (get-file)
          game-type     (:game-type file-name)]
    (if (= 1 game-type)
      (display "We've loaded your game. It's the computer's turn.")
    (display "We've loaded your game. It's the human's turn."))))

This is a simple function, but it takes a few moments to understand its purpose as a result of inconsistent indentation, whitespace, and naming. Let's look more closely at the mistakes. The indentation within the let special form is different on each of the three lines. The incorrect indentation within the if special form is also a problem: because the else branch is implicit in Clojure, it's important for readability that we align the two branches appropriately. We also see extra, unnecessary whitespace after the game-type binding. Lastly, the naming is odd. It seems that file-name actually holds the contents of the file, not the name of the file, and game-type is evaluated against an integer, which implies that it's not really a type of game but rather an integer associated with a type of game.
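
For contrast, here's how the same function might read with those issues addressed. The names loaded-game and game-type-id are my own suggestions; get-file and display are assumed to exist, as in the original.

(defn resume-game []
  (let [loaded-game (get-file)
        game-type-id (:game-type loaded-game)]
    (if (= 1 game-type-id)
      (display "We've loaded your game. It's the computer's turn.")
      (display "We've loaded your game. It's the human's turn."))))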

Unruly nesting, tight coupling, and improper formatting are three common causes of incidental complexity. Fortunately, there are a number of ways to measure complexity and gain insight into where it was introduced.

Code complexity metrics

Computer scientists have been measuring code complexity for decades, but we still struggle to quantify it. Unlike the manufacturing industry, where processes and procedures tend to be objectively analyzed using detailed metrics, in software development we often fail to provide real quantifiable measurements regarding our code. A manufacturing plant might have a way to identify unnecessary complexity or inefficiencies in their processes, whether that’s in the order of the process, the way the machines are told to do their jobs, or in the way managers communicate and teach subordinates. In our industry, however, similar points of complexity and inefficiency aren't as likely to be discovered.

Cyclomatic complexity

The most widely used metric for quantifying code complexity is cyclomatic complexity. It measures structural complexity by analyzing the control logic (essentially the number of independent paths through a program). The metric was created by Thomas McCabe, who recommended that developers measure the complexity of their code and split it into smaller modules when that complexity exceeded 10. While the metric by no means guarantees an accurate portrayal of a software system's complexity, it can provide a guideline for estimating how many test cases a piece of code needs.

The metric has its limitations, though: the complexity of an expression within a conditional statement is never acknowledged, so a trivial condition and an intricate one can count the same.
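
As a rough illustration, the if/else version of generateResponseObject above contains four decision points (the four route comparisons), giving it a cyclomatic complexity of 4 + 1 = 5, while the table-driven version has no branches at all and scores the minimum of 1.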

Halstead complexity measures

Developed by Maurice Halstead in 1977, these measures evaluate attributes such as length, vocabulary, volume, difficulty, and effort. For the most part they are based on the code's operators (the things that modify data) and operands (the data being modified).

Length is the total count of operators and operands. Vocabulary is the count of unique operators and operands (a smaller vocabulary of more frequently repeated elements is considered less complex). The first two measures combine to form the third, volume, which scales the length by the logarithm of the vocabulary and lets you judge complexity by the size of the code. Difficulty estimates how hard the code is to write and maintain; it grows with the number of unique operators and with how often operands are repeated. Lastly, effort is the product of volume and difficulty. All of these measurements are taken statically from the source code; the code is never executed.
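
As a small worked example (counting conventions vary from tool to tool), take the statement return a + b * a;. It has three distinct operators (return, +, *) used three times in total, and two distinct operands (a and b) used three times in total. That gives a vocabulary of 5, a length of 6, a volume of 6 × log2(5) ≈ 13.9, a difficulty of (3/2) × (3/2) = 2.25, and an effort of roughly 2.25 × 13.9 ≈ 31.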

These metrics can provide useful information, but they shouldn't be overvalued. At a minimum, they encourage developers to discuss important attributes of their code, such as the vocabulary used, the length of the code, and the difficulty of writing and maintaining it.

Absolute Priority Premise

8th Light Craftsman Micah Martin suggested the Absolute Priority Premise (APP) as a means of scoring code in order to evaluate it objectively. Because our tendency to judge code on both its static and dynamic qualities is inherently subjective, Micah sought a way to remove the dynamic aspect, using the Transformation Priority Premise as a jumping-off point. What I found especially interesting about Micah's presentation was his suggestion that TDD can lead to less-than-ideal algorithms as a result of the indirect path taken to arrive at a solution. The "less-than-ideal" part is incidental complexity, and as I've mentioned, it's not always easy to determine where this complexity was introduced and how to remove it.

The APP can help because it assigns point values to different kinds of operations. The six things taken into account are constants, bindings, invocations, conditionals, loops, and assignments. Each is weighted, so a constant adds one to the total mass while a while loop adds five. The sum produces a total code mass, which hints at the weight and complexity of the code. In theory, a lower score means a simpler, more direct, and less incidentally complex solution.
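
To make the scoring concrete using only the two weights mentioned above: a snippet containing three constants and one while loop would carry a mass of at least 3 × 1 + 5 = 8, before its bindings, invocations, conditionals, and assignments are counted.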

Where and how can incidental complexity be reduced?

We've already discussed a number of ways unnecessary complexity can be reduced, but there's another piece of the puzzle that comes long before we sit down to write code. Dijkstra claimed that the steep complexity involved in writing software is only made worse by the way we introduce "radically novel" concepts. It's a common belief that in order to absorb new information we must build connections from previous knowledge, but Dijkstra viewed metaphors as nothing more than a crutch. Perhaps part of the struggle we endure as we learn (all of us, regardless of experience) is due to our tendency to force a square peg into a round hole. Perhaps the incidental complexity in a code base was introduced by developers who had flawed teachers and mentors, guides who failed them and in doing so produced developers more prone to this type of mistake. Perhaps it isn't at the coding level where we should look for ways to reduce incidental complexity, but at the teaching level.

More than two decades ago, Dijkstra wrote a paper titled On the cruelty of really teaching computing science, in which he made an observation that is startlingly relevant today.

The educational dogma seems to be that everything is fine as long as the student does not notice that he is learning something really new...Coming to grips with a radical novelty amounts to creating and learning a new foreign language that can not be translated into one's mother tongue.

As software craftsmen, we may interact with beginners who are taking the first few steps on a long journey toward becoming proficient developers. We certainly work with people who have a small amount of experience but are a long way from becoming master craftsmen. Regardless of the individual's skill level, everyone we work with will be introduced to something new at some point. It might be a new language, a new design pattern, or a new code base. For that person, the process of learning is inherently difficult.

The key to teaching is not finding a box that already exists within someone's mental model and tidily wrapping up the new information to place in that box. Where we strive to reduce incidental complexity, we so often add to it by forcing ill-fitting metaphors, oftentimes through anthropomorphic language that takes the place of precise software terminology. This forced personification often handicaps learners and puts them on a path toward making similar mistakes in their code.

We all write code that is unnecessarily complex, and we read code that is the same. We might help others acclimate to a new code base in a way that actually increases complexity, and if the code violates some of the guidelines mentioned earlier, the code base itself may be overly complex. We might even become familiar with a new code base the same way, choosing to work around its complexity rather than remove it.

Writing code that introduces a minimal level of incidental complexity isn't just important for the others working on our team; it's also important for ourselves. Brian Kernighan put it very well:

"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian Kernighan

Ryan Verner, Software Craftsman

Ryan Verner is interested in the process of teaching and learning the craft of software development.