The Software Engineering Game

CRUFT: An alternative to the Technical Debt metaphor

Technical Debt is a term that means different things to different people. Ward Cunningham created the metaphor as a way to explain to his non-technical boss why the team was refactoring their code. But get six programmers in a room and ask them what it means, and you'll get a dozen definitions. The phrase can mean just about anything, from useless features, to bugs, to unnecessary code. So, practically, it means "code I don't like".

We, as programmers, can be more precise when we discuss code. For the last few years, whenever I've found myself reaching for this metaphor, I've tried to think a little harder about what I'm actually trying to say. We don't need a financial metaphor intended for non-technical people to discuss the tradeoffs we make in software. Instead, I'd suggest we use this:

CRUFT

CRUFT represents five dimensions of technical debt:

  • Complexity
  • Risks
  • Uses
  • Feedback
  • Team

Each of these dimensions can be measured, often using simple and objective metrics that are familiar to anyone who has worked in software.

Complexity

Complexity makes software hard to change

Complexity comes in many forms, but "lines of code" is a pretty reasonable measurement. This is especially true if you use automated formatting and linting tools to keep your code consistent. The more code we have, the more there is to read, test, document, deploy, etc. This increases the cost of change. The less code we have, the better.
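
If you want to put a number on it, a short script will do. Here's a minimal sketch in Python; the "src" directory and the ".py" suffix are placeholders for your own layout:

    from pathlib import Path

    def count_lines(root: str, suffix: str = ".py") -> int:
        """Count non-blank lines in every file under root with the given suffix."""
        total = 0
        for path in Path(root).rglob(f"*{suffix}"):
            text = path.read_text(errors="ignore")
            total += sum(1 for line in text.splitlines() if line.strip())
        return total

    print(count_lines("src"))  # placeholder source directory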

Here's a thought experiment: Let's say that you have a couple of programmers, Alice and Bob, working on the same project. By accident, both of them try to solve the same problem on the same day. Bob sits down and writes 1000 lines of code that solve the problem beautifully. The code is well written, well tested, and its deployment and operation are well documented. Alice heads off to the park, where she thinks about the problem and feeds the pigeons. At the end of the day, Alice wanders back to the office, deletes 3 lines of code...and the problem is fixed. Which of these solutions is better?

If you want to be more nuanced, you can separate complexity into necessary and unnecessary complexity. Complex problems sometimes require complex solutions, but oftentimes we make our solutions more complex than they need to be. Or sometimes the problem changes, and solutions that were previously complex become simple. This is where refactoring can help. But all complexity is bad, even the necessary kind. The more of it you have, the harder it will be to make changes.

Risks

Risks represent unwanted behavior

Risks fall into two categories, known and unknown (and perhaps a third we could call Rumsfeldian). To measure your known risks, look in your issue backlog. You probably have some categorization that makes sense here (e.g. "bug", "incident"). I try to close issues for risks that I don't expect to see again, knowing that I can re-open them if I'm wrong about that. That way, the open "incident" issues represent risks that we are living with but, for one reason or another, don't want to address right now. For example, let's say we had an outage because the disk on one of our servers filled up with logs. The immediate remediation of that risk was to delete the old logs, but until we set up something to automatically rotate the log files, we still have this risk.
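
If your backlog lives in GitHub, counting those open risks is a single API call. Here's a rough sketch using GitHub's REST API; the repository name and the "incident" label are assumptions you'd swap for your own:

    import requests

    def open_risk_count(owner: str, repo: str, label: str = "incident") -> int:
        """Count open issues carrying the given risk label via GitHub's REST API."""
        url = f"https://api.github.com/repos/{owner}/{repo}/issues"
        resp = requests.get(url, params={"labels": label, "state": "open", "per_page": 100})
        resp.raise_for_status()
        # The issues endpoint also returns pull requests, so filter those out.
        # A real version would also follow pagination for more than 100 results.
        return sum(1 for issue in resp.json() if "pull_request" not in issue)

    print(open_risk_count("example-org", "example-repo"))  # hypothetical repository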

Unknown risks are harder to measure, but thinking about their potential impact more precisely can give you more flexibility in how to manage them. I like to think about the Net Present Cost of a risk: how much is this going to cost me (in time or money), when, and with what probability? These are "what if" scenarios that are important to consider...but it's equally important to avoid analysis paralysis. A ship is safe in harbor, but that's not what ships are built for. Any venture requires risk, and one reason we build feedback into our software is to help protect us against risks.
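
To make that concrete, here's one way you might sketch Net Present Cost as a function. The 5% discount rate and the example numbers are illustrative assumptions, not recommendations:

    def net_present_cost(probability: float, cost: float, years: float,
                         discount_rate: float = 0.05) -> float:
        """Expected cost of a risk, discounted back to today's terms.

        probability    chance the risk materializes (0.0 to 1.0)
        cost           what it costs if it does (time or money)
        years          how far in the future we expect it to land
        discount_rate  assumed annual discount rate
        """
        return probability * cost / (1 + discount_rate) ** years

    # A 30% chance of a $50,000 outage two years out is worth about $13,600 today:
    print(net_present_cost(0.3, 50_000, 2))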

When I'm reviewing a pull request, I can frame my feedback in terms of the known and possible unknown risks. I will occasionally add new issues to the backlog when approving a PR to track the new risks that we're taking on when it gets merged. I can do that because I'm the one who will bear the brunt of the cost if that risk turns against us. I will get paged in the middle of the night. I will have to answer to my boss if something goes wrong. Separating the assessment and repercussions of risks leads to imbalance and poor decision making, so I try to make sure those responsibilities stay with the same people.

Uses

Uses represent wanted behavior

This is why we build software in the first place: It does things that hopefully create value (usually $$$). However, uses aren't necessarily valuable. Sometimes, we create uses that are valuable, but that value fades over time. Sometimes, we're just wrong about what's valuable, and what we create doesn't result in the value we hoped for. In these situations, we can consider removing some uses to reduce complexity. This is a common definition of technical debt, one in which a system has accumulated a lot of functionality over time, but some of it is no longer used. The added complexity makes new uses harder to add, and so the system is unable to adapt to new requirements because it's spending too much of its "complexity budget" meeting old requirements that are no longer needed.

I think the best way to measure uses is to look at your passing tests. Each automated test in your system represents one thing that your software can do. Again, whether these behaviors are valuable is another question. Often, to see the value of the system, you can't look at the behavior in isolation...you have to look at the interactions and workflows that they enable. But if you want to measure uses (perhaps to track the change over time), just counting your passing tests is a pretty good way to do it.
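
Here's a crude sketch of that count, assuming pytest-style naming and a "tests" directory; your test runner's own collection step (e.g. pytest's --collect-only) will be more accurate:

    import re
    from pathlib import Path

    TEST_DEF = re.compile(r"^\s*def test_\w+", re.MULTILINE)

    def count_tests(root: str = "tests") -> int:
        """Roughly count test functions by scanning for pytest-style names."""
        return sum(len(TEST_DEF.findall(p.read_text(errors="ignore")))
                   for p in Path(root).rglob("test_*.py"))

    print(count_tests())  # assumes pytest conventions and a "tests" directory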

Feedback

Feedback represents how fast we can learn

Feedback is essential, and fast feedback allows us to make progress quickly. Oftentimes we need to go through a certain number of iteration cycles to achieve a result, so the total time to finish the task is a function of how long each cycle takes. How many times have you been trying to diagnose a problem with an automated build, committing and pushing changes over and over again, hoping that this time the problem will be fixed? If you need 10 cycles to fix the problem, and your build takes an hour, then this problem will take all day to fix. If the build takes a minute, you can fix it while you wait for coffee to brew.

Feedback takes many, many forms, but the three that I think about most are Observability, Automated Testing, and Value Discovery. Each of these has its own sub-topics, and I won't go into them here, other than to provide some basic definitions of what I mean:

  • Observability - Logging, metrics, tracing, latency. How do you know what running software is doing?
  • Automated Testing - Unit/integration/acceptance tests, CI/CD pipelines. How do you know if a change to your software will have the intended effects?
  • Value Discovery - A/B testing, retrospectives, customer satisfaction, usage metrics. How do you know which uses are valuable?

How you measure feedback depends on what it is, but all feedback can be thought of as information over time. When something happens, how quickly will you know about it? For example, one thing I keep a close eye on is automated test cycle time. I like this a lot because it's easy to measure, objective, and actionable. When a suite of unit tests gets too slow (more than a few seconds), it means the programmers working on that code can no longer stay in the flow because the cycle time has gotten too long. That's an indication to me that we've got too much complexity in a single repository, and we need to take steps to split it up.
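
Tracking this can be as simple as timing the suite and appending the result to a file. A minimal sketch, assuming a pytest-based suite:

    import subprocess
    import time
    from datetime import datetime, timezone

    def record_cycle_time(logfile: str = "test_times.csv") -> float:
        """Run the test suite and append (timestamp, seconds) so the trend is visible."""
        start = time.monotonic()
        subprocess.run(["pytest", "-q"])  # assumes a pytest-based suite
        elapsed = time.monotonic() - start
        with open(logfile, "a") as f:
            f.write(f"{datetime.now(timezone.utc).isoformat()},{elapsed:.2f}\n")
        return elapsed

    print(f"cycle time: {record_cycle_time():.1f}s")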

Team

The team represents the people who are capable of supporting the software

Software engineering teams can only be so big while remaining effective. Amazon's two-pizza teams are an example of this harsh reality. We can't simply add more people to a software effort and expect all of them to be able to work together effectively.

However, our ability to manage complexity, mitigate risks, create new uses, and interpret and respond to feedback will be a function of the number of software engineers who are capable of doing those things for a particular software system. This is sometimes different from the number of people who are organized into a "team". Different factors at different companies might create team structures where only a small percentage of a "team" is actually capable of performing these tasks.

One way to separate these two groups is to ask which people could leave and make supporting the software impossible (aka the Bus Factor). The size of this set of people allows us to quantify the support that we can provide. In my experience, 3-5 people is optimal. 2 is lean but can be effective. With 6 to 8 people, you start to spend a lot of time on communication overhead; if you don't, the support starts to fracture into smaller groups, where only certain people can do certain things. Personally, I've never seen 9 or more people all maintain the same software effectively.
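
Commit history gives you a rough proxy for this set of people. Here's a sketch that counts recent commit authors via git log; committing isn't the same as being able to support the system, so treat the output as a conversation starter:

    import subprocess
    from collections import Counter

    def recent_committers(since: str = "1 year ago") -> Counter:
        """Approximate who can support the code by counting recent commit authors."""
        log = subprocess.run(
            ["git", "log", f"--since={since}", "--format=%ae"],
            capture_output=True, text=True, check=True,
        ).stdout
        return Counter(line for line in log.splitlines() if line)

    for author, commits in recent_committers().most_common():
        print(f"{commits:5d}  {author}")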

When you only have 1 person who can maintain a particular piece of software, you introduce "key person risk", which acts as a multiplier on all your other risks. If that person departs, the number of capable supporters drops to zero and you wind up with abandonware. That is a difficult situation to recover from.

Minimizing CRUFT

Given these five dimensions, you can treat technical debt as a portfolio optimization problem: increase feedback and team support, minimize complexity and risks, and maximize the value derived from uses. Once you start thinking of technical debt in this way, you can start making tradeoffs in design discussions, pull request reviews, and everyday collaboration with your fellow programmers. Here are some common tradeoffs (a small scorecard sketch follows the list):

  • Increasing complexity to add new uses
  • Increasing risks to add uses (implementing only the "happy path" to get something done quickly, without handling edge cases)
  • Decreasing complexity to increase team support
  • Decreasing uses to reduce risks and/or complexity
  • Increasing feedback to reduce risk
  • Increasing uses to increase feedback (through more user activity)
  • Increasing the team size to reduce key person risk
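
Here's the hypothetical scorecard mentioned above, pulling the earlier metrics into one place. The fields and thresholds are assumptions for illustration, not rules:

    from dataclasses import dataclass

    @dataclass
    class CruftScore:
        """A hypothetical scorecard for one system, using the metrics above."""
        lines_of_code: int          # Complexity: lower is better
        open_incidents: int         # Risks: known risks we're living with
        passing_tests: int          # Uses: behaviors the system provides
        test_cycle_seconds: float   # Feedback: lower is better
        supporters: int             # Team: people who can support it

        def flags(self) -> list[str]:
            """Surface tradeoffs worth discussing; the thresholds are assumptions."""
            warnings = []
            if self.test_cycle_seconds > 10:
                warnings.append("slow feedback: consider splitting the code up")
            if self.supporters < 2:
                warnings.append("key person risk: grow the set of supporters")
            if self.lines_of_code > 200 * max(self.passing_tests, 1):
                warnings.append("high complexity per use: look for code to delete")
            return warnings

    print(CruftScore(120_000, 4, 350, 45.0, 1).flags())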
