Thoughts on Code Clarity & Maintainability

under construction This post is a work in progress, and may be updated at some point in the potentially distant future!

I’ve been thinking about code clarity & maintainability recently, and I thought I’d write some of it up. The main goal is to make code easier to read, debug, understand, and collaborate on. This is especially important as a team grows.

Code Styles

For teams, choose a set code style guidelines and stick to them. Keep indentation, naming conventions, spacing, etc. consistent so all contributors do not need to learn a new style every time they move to a new section of the codebase. Keep a knowledge base of best practices and guidelines.

If your language has a standardized code style, adopt that if possible so you don’t need to teach new hires your custom flavor of code formatting.

It’s not particularly useful to argue about the best style, isIndexSignatureDeclaration, etc, - it’s more important to be consistent across the codebase.

For example, some common JS style guides are:

Python has it’s own PEP 8 style guide, Rust has it’s own recommmended style, and many other languages have similar guides or formatting tools.

Type Annotations and Comments

Obviously, this is geared towards scripting languages like Python and JavaScript, not typed languages like Rust, C, etc which require types for the code to run.

Some languages, like Python, support this out-of-the box in recent versions. Using type annotations makes it much more clear how to properly utilize a function, and much easier for static analysis to catch issues. It allows you to communicate expectations around inputs and outputs to the rest of your team.

Type definitions are nice since they’re part of the code. Unlike comments, they’re harder to ignore or forget when refactoring or adding new code. With good naming and type annotations, comments can be relatively minimal to if you need to describe type information, etc. in them.

These examples are in Python and are supported out of the box, but languages like JavaScript require using a tool like TypeScript to compile your code first. Many linting tools alert if types don’t match expected values.

Resources:

Anti-Pattern

Take this function definition as an example. Just reading this, you don’t have much to go on for how to use it. You could maybe guess that name/email might be strings, but the format of user is unknown without further context. Is it an id? is it an object? Does this return the updates? We can’t be sure.

def perform_user_updates(user, name, email):
  # <some implementation>

Best Practice

The following is a little more verbose, but much more maintainable in the long run. It’s explicitly clear what’s required to successfully call the function, without even needing to read its implementation. As a plus, modern editors also suggest/autocomplete fields when things are typed, and comment blocks are visible on hover in other files!

from typing import TypedDict

# this could be stored in a separate module for types and imported across your project (recommended)
class User(TypedDict):
  user_id: int
  name: str
  email: str

def perform_user_updates(user: User, name: Optional[str] = None, email: Optional[str] = None) -> User:
  """
    Update a user's information in the database, and return the current user.
    If arguments are not specified / None, the existing values are kept.

    - user: the user object to update in the database
    - name: the user's new display name
    - email: the new email
  """
  # <some implementation>

Functions / Modules

Use pure functions for computations

When doing anything non-trivial algorithmically, prefer to use “pure” functions. Functions of this type:

  • Return the same output given the same inputs
  • Have no side-effects:
    • do not modify global variables, arguments, etc.
    • don’t perform database read/writes, make API calls, etc.

If you need to perform side-effects, prefer to do them in a separate function that is called outside the pure function. This allows testing of the pure function without needing to mock out the side-effects.

Avoid global variables

Global variables make it hard to reason about the current state of your program while debugging or editing. If you need to share state, prefer to create it once and pass as an argument to any functions you call. This way, it’s clear what is impacting the behavior of your functions.

There are some exceptions to this, where a global singleton is needed. Some examples of this might be a database connection, configuration, etc.

Limit function size

This is especially important as your team size grows. It’s very hard to maintain a single multi-hundred-line function or module. If you chunk it up into several functions that are then chained together in some controller function, you can reason about the behavior of a single portion more clearly.

If split up the code, you can work on pieces without needing to worry as much about potential side-effects of modifying one part of a single large procedure, and you’ll have far less merge conflicts if multiple people are contributing.

Prefer to split logic, database updates, and utilities whenever possible to ensure the code’s easy to follow. Prefer to reduce the amount of code needed to be understood in order to build, new features and fix bugs.

Split up large modules

If a module grows to a large size, it might be advantageous to split it into separate files. At the moment I don’t have a “guideline” for how to approach this, other than do something that makes sense for your team and codebase.

Notes

I’ve maintained a fair amount of code at this point in my career that breaks one or more of these rules - and this is where I’ve seen all sorts of issues arise. These modules have tended to be several thousand line files, often just one or two large functions, using global variables within for maintaining state throughout the whole thing. Where functions are used, they have side effects requiring edits to all sorts of other places. I’ve often seen this as part of some prototype that ended up getting sent into production, and never refactored.

Programs like this incredibly hard to maintain. It’s incredibly difficult to collaborate on these sorts of programs, since adding a new feature usually introduces edits across the file and all sorts of merge conflicts arise. Reuse of code from these sorts of programs is next-to-impossible.

Un-Nest your code

Especially for languages like Python where indents are meaningful, but this applies to all languages - limit your nesting as much as possible. There’s a couple ways to tackle this, depending on the situation.

  • Add functions to break up distinct cases
  • Refactor callback nesting into async/await or similar async code
  • Refactor if statements to reduce nesting / bail out quickly for error cases
  • Rethink your logic to avoid nested loops (with large n, execution time can quickly grow since this is O(n^2) or worse)

As part of the linux kernel coding style - they explicitly call this out. Indentation for linux kernel code is set at 8 characters, and this makes it incredibly clear when indentation is becoming a problem.

Callback Hell

This is a common issue in JavaScript, where you have a series of nested callbacks.

JavaScript (and many other languages), have a solution to this problem, by way of async/await, and promises. You can easily wrap some classic callback function with a promise “promisifying” it, and then and await the returned promise:

Anti-Pattern

/**
 * Avoid this!
 * This can get incredibly nested, and confusing, fast.
 */
function foo (callback) {
  someLibfunction((res, err) => {
    if (err) { return callback(err) }
    someOtherLib(res, (res2, err) => {
      if (err) { return callback(err) }

      // do something (potentially *MORE* nested logic)
      // this could go on
      callback(undefined, res2)
    })
  })
}

// to use our function:
foo((err, res) => {
  if (err) {
    return console.error('error!', err)
  }

  dosSomething(res)
}))

Best Practice

I’ve found that utilizing tools such as Javascript’s promises are a good way to tackle this problem. In node, util has a nice promisify function that turns any callback func into a promise that can be awaited. This also allows you to write less error handling boilerplate as a side-bonus!

import util from 'node:util'

const someLibfunctionAsync = util.promisify(someLibfunction);
const someOtherLibAsync = util.promisify(someLibfunction);

/**
 * A much clearer version of foo()
 * Logic is much more readable
 */
async function foo() {
  // get the first Result
  const firstResult = await someLibFunctionAsync()

  // get 2nd result
  const secondResult = await someOtherLibAsync(firstResult)

  // do something else:
  // logic
  return secondResult
}

// then, use it:
foo()
  .then(res => { doSomething(res) })
  .catch(err => console.error('error!', err))

Refactor conditionals

Another trick is to limit the amount of if statements. Sometimes, it’s better to invert your statement and raise/return early, rather than keeping logic for that dangling in else {} blocks.

I saw an example of this on twitter x.com. Of which, there are some extremely overcomplicated suggestions, ranging from doing math to calculate the correct ASCII char code, to using all sorts of dictionaries and loops. The best solution is still to just use a simple if statement.

Anti-Pattern

function checkGrade(score: number) {
  if (score >= 90) {
    return "A"
  } else {
    if (score >= 80) {
      return "B"
    } else {
      if (score >= 70) {
        return "C"
      } else {
        if (score >= 60) {
          return "D"
        } else {
          return "F"
        }
      }
    }
  }
}

Best Practice

function  checkGrade(score: number) {
  if (score >= 90) {
    return "A"
  }
  if (score >= 80) {
    return "B"
  }
  if (score >= 70) {
    return "C"
  }
  if (score >= 60) {
    return "D"
  }
  return "F"
}

Often, you can move these onto a single line to enhance clarity if the conditions are simple:

function checkGrade(score: number) {
  if (score >= 90) return "A"
  if (score >= 80) return "B"
  if (score >= 70) return "C"
  if (score >= 60) return "D"
  return "F"
}

Anti-Pattern

Here’s another example, where nested conditionals and else statements are hard to flollow. THe preconditions for the function are specified throughout the body of the function, making it hard to follow what’s going on.

function getDashboardInfo (user: User) {
  if (user) {
    const out = {}
    if (!user.suspended) {
      // TODO: calculate dashboard metrics
      return out
    } else {
      throw new Error('User is suspended')
    }
  } else {
    throw new Error('User is not valid')
  }
}

Best Practice

Do this, which is both easier to read, and debug. The preconditions for the function are specified at the top, so it’s easier to read:

function getDashboardInfo(user: User) {
  if (!user) { throw new Error('User is not valid') }
  if (user.suspended) { throw new Error('User is suspended') }

  const out = {}
  // TODO: calculate dashboard metrics

  return out
}

Avoid Nested Loops

For small n, this doesn’t cause too much of an issue. For larger n, this is a huge bottleneck, as the time complexity here is O(n^2) (quadratic time)

There’s some cases where you need to nest loops (i.e. computing a matrix of data, or something along those lines). But for many data processing cases, you don’t need to, and the nesting is a sign there’s a better way (and clearer/concise) to write it.

Anti-Pattern

This is an example of some theoretical update procedure. Suppose we have an array of items, and we need to produce a merged set with a new list of items/item updates:

const out = []

// merge updates to items with existing data
for (const item of items) {
  for (const new_item of new_items) {
    if (item.id === new_item.id) {
      out.push({
        ...old_item,
        ...new_item
      })
    }
  }
}

// add items we didn't find in our current dataset
for (const new_item of new_items) {
  let found = false
  for (const outItem of out) {
    if (new_item.id === outItem.id) {
      found = true
    }
  }

  if (!found) {
    out.push(new_item)
  }
}

Best Practice

We can more efficiently do this if we first scan the initial list into a hash table. We can then execute our merge much more quickly as the size of items grows, as the time complexity here is O(n). This is also a ton clearer what we’re trying to do:

const out = []
const existingItems = {}

// create hash table of current items
for (const item of items) {
  existingItems[item.id] = item
}

// scan the new items and perform updates to build a new set of items
for (const new_item of new_items) {
  if (existingItems[new_item.id]) {
    // merge updates to items with existing data
    out.push({
      ...existingItems[new_item.id],
      ...new_item
    })
  } else {
    // add items we didn't find in our current dataset
    out.push(new_item)
  }
}

Libraries and Abstraction

A healthy amount of abstraction is helpful in keeping code consise and understandable. If you find yourself copying hundreds of lines of code around your app - that’a a good sign you should abstract that portion.

I’ve worked on a fair amount of sites and apps that use the MVC architecture. In my personal opinion, you should try to keep your controllers as high level as possible - large amounts of business logic don’t belong there. If there is any non-trivial logic, it should go into a model or library function so that it can be re-used, without requiring copy-pasting across controllers.

In many apps, you’ll end up duplicating common controller operations at least once or twice, especially if you have a main app, an administration portal, or an api which have different controllers but all perform similar functions internally.

Final Words

Like most things in programming, some of this is subjective. But I’ve found that trying to follow these guidelines has made my code much easier to maintain, and understand and much easier to collaborate on.

Change Log

  • 4/15/2024 - Initial Revision

Found a typo or technical problem? file an issue!