Testing with FIRE
November 18, 2016
Updated April 25, 2023
For years now, I've held the belief that effective automated test suites have four essential attributes. These attributes have been referenced by other authors, and were the subject of a talk I gave at the Agile 2009 conference. But I was shocked to discover (that is, remember) that the only place they are formally documented is in my Continuous Testing book [Pragmatic Bookshelf, 2011], which is now out of date, out of print, and totally inaccessible to most of the Internet. And so, I'm capturing these four attributes here. I intend to treat this as a living document, updating it as my understanding of these attributes evolves.
The Four Attributes
Neatly summarized by the acronym FIRE, the four attributes I look for in a test suite are:
- Fast
- Informative
- Reliable
- Exhaustive
I frequently consider these attributes when making decisions about automated testing. They are essential to my work, and I often find myself referring to them when making tradeoffs in test design (e.g. "This will make the test more reliable, but less informative. Is it worth it?"). Let's discuss them in detail...
Fast
When I say fast, I mean that the suite runs hundreds of individual tests per second. When writing code with tests, the duration of the test suite determines the length of my shortest feedback loop. This loop is really important. It creates the very foundation of my development process.
Some basic math will tell you that this means my individual tests should run in less than 10 milliseconds, on average (100 tests per second works out to 10 milliseconds per test). However, I often find that the durations of individual tests in a suite follow a power law distribution, meaning that a handful of tests take up the bulk of the time and the rest are quite fast by comparison. This makes average test time misleading (median is a more useful metric), and it means that placing hard limits on the duration of individual tests is rarely useful. I get a lot more value out of measuring the total duration of all the tests and comparing that to the number of tests in the suite.
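To make that concrete, here's a quick sketch in Python with made-up timing data. Three slow tests push the mean well above the median, which is why the median (along with the overall tests-per-second rate) tells you more about the suite:

```python
import statistics

# Made-up per-test durations (in seconds): a power-law-ish distribution
# where three slow tests account for most of the suite's total runtime.
durations = [1.0, 0.3, 0.1] + [0.003] * 297

total = sum(durations)
print(f"tests:  {len(durations)}")
print(f"total:  {total:.2f}s")
print(f"mean:   {statistics.mean(durations) * 1000:.1f}ms")    # ~7.6ms, skewed by the slow few
print(f"median: {statistics.median(durations) * 1000:.1f}ms")  # ~3.0ms, what a typical test costs
print(f"rate:   {len(durations) / total:.0f} tests/second")    # ~131 tests/second
```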
Another reason to measure speed this way is that the total duration of the test suite puts an upper bound on the size of the system. As I described in One Second Services, the goal of keeping the test feedback loop at a second or so can serve as a guide when decomposing a system. A well-written test suite can have up to 10,000 tests and still run in less than 10 seconds. At that scale, it often makes sense to run the tests in parallel to speed things up. Any larger than that, though, and you'll want to start breaking the system into smaller pieces that can be tested and deployed independently.
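In the Python world, a plugin like pytest-xdist (`pytest -n auto`) will parallelize a suite for you, but the idea is simple enough to sketch by hand. This sketch assumes a tests/ directory of test_*.py files, and tests reliable enough to run in any order:

```python
import subprocess
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def run_shard(files):
    # Each shard of test files runs in its own pytest process.
    return subprocess.run(["pytest", "-q", *files]).returncode

def run_in_parallel(test_dir="tests", workers=4):
    files = sorted(str(p) for p in Path(test_dir).glob("test_*.py"))
    shards = [files[i::workers] for i in range(workers)]
    shards = [s for s in shards if s]  # skip empty shards
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # The suite fails if any shard fails.
        return max(pool.map(run_shard, shards), default=0)

if __name__ == "__main__":
    raise SystemExit(run_in_parallel())
```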
Informative
An informative test tells me what's wrong. It should make clear what the expected behavior was, and how it differs from the actual behavior. A test that says "something" is wrong without telling me what is wrong isn't informative. It should not spew lots of errors into the console and expect me to read them, or dump string representations of large data structures and force me to compare them by hand to find where they differ. And it should have a clear, concise, readable name that helps explain why the expected behavior is valuable.
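Here's a minimal illustration in Python's unittest, using a hypothetical order_total function. Both tests check the same thing, but only the second has an informative name and an informative failure message:

```python
import unittest

def order_total(prices):
    # Hypothetical function under test
    return sum(prices)

class OrderTotalTest(unittest.TestCase):
    def test_total(self):
        # Uninformative: on failure, this reports only "False is not true"
        self.assertTrue(order_total([10.00, 2.50]) == 12.50)

    def test_total_sums_line_item_prices(self):
        # Informative: a descriptive name, and a failure message that
        # reports both the expected value and the actual value
        self.assertEqual(order_total([10.00, 2.50]), 12.50)

if __name__ == "__main__":
    unittest.main()
```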
Test suites become particularly uninformative when introducing a single bug causes the majority of tests to fail. You often see this in test suites that rely heavily on integration tests. Ideally, if I introduce one bug, it should cause one failing test. However, because it is often useful to write sociable tests, it's not uncommon to have one test fail for each underlying layer in a system. In practice, this is not usually more than 5 or so...for example, in a Model-View-Controller system it should be no more than 3.
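Here's a minimal sketch of that fan-out, with a hypothetical two-layer system: a bug in Model fails both tests below, one per layer, but no more than that:

```python
class Model:
    def discounted(self, price):
        return price * 0.9  # a bug here fails both tests below

class Controller:
    def __init__(self, model):
        self.model = model  # sociable: the controller test uses the real Model

    def price_label(self, price):
        return f"${self.model.discounted(price):.2f}"

def test_model_applies_ten_percent_discount():
    assert Model().discounted(100) == 90

def test_controller_formats_the_discounted_price():
    assert Controller(Model()).price_label(100) == "$90.00"
```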
Reliable
A reliable test suite passes or fails consistently across test runs and environments. The results are completely deterministic, and solely a function of whether or not the system under test has the expected behavior. It is not dependent on the order the tests are run, whether they're run serially or in parallel, user input, the state of a database, network access, the presence of ephemeral files, or any other state or process that is not completely controlled by the test suite itself (which, again, runs hundreds of tests per second). This means that any sort of external database setup/teardown is probably not feasible on a per test basis.
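The usual way to get there is to inject every source of nondeterminism so the test controls it. Here's a small sketch with a hypothetical Session class: rather than reading the system clock directly, it accepts a clock function, and the tests supply frozen ones, making the result a pure function of the inputs:

```python
from datetime import datetime, timedelta

class Session:
    TIMEOUT = timedelta(minutes=30)

    def __init__(self, started_at, clock=datetime.now):
        self.started_at = started_at
        self.clock = clock  # injected so a test can control "now" completely

    def expired(self):
        return self.clock() - self.started_at > self.TIMEOUT

def test_session_expires_after_thirty_minutes():
    start = datetime(2023, 4, 25, 12, 0)
    assert Session(start, clock=lambda: start + timedelta(minutes=31)).expired()

def test_session_is_live_before_the_timeout():
    start = datetime(2023, 4, 25, 12, 0)
    assert not Session(start, clock=lambda: start + timedelta(minutes=29)).expired()
```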
It is hard to write unreliable tests that are also fast. Focusing on speed will get you most of the way there, and eliminating the dependencies that make tests slow will also tend to make them more reliable. This is one of the natural synergies that you get from writing code that is easy to test. Just as testable code tends to be more decoupled and have better levels of abstraction, code that can be tested quickly can usually be tested reliably.
Creating a reliable test suite takes discipline. If a test fails unexpectedly, the worst thing you can do is just re-run it to see if it passes. Outside of intentional randomness, race conditions, and thread safety problems, unreliable tests are almost always caused by some unexpected external dependency. Running the test again can pollute the state of those external dependencies and make it impossible to diagnose what caused the test to fail. It's like "solving" a murder by cleaning up the crime scene.
Exhaustive
Exhaustive tests check all the behavior you want, and none of the behavior you don't. Tests are an essential part of refactoring. Unfortunately, "refactoring" is perhaps the software industry's most famous malapropism. As defined, refactoring is "a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior." With this technique, the definition of that "external behavior" is controlled by a suite of tests. Behavior that the tests assert is relevant behavior. Behavior that is not asserted is not relevant and is free to change. Designing the test suite to strike the right balance and get the definition of behavior "just right" is what I mean by exhaustive.
If you don't strike this balance correctly, you will find it difficult to refactor. If you under-specify the behavior in your system, changes intended as refactoring will result in altered behavior and bugs. If you over-specify the behavior in your system...asserting things that don't need to be true in order for the system to work, or breaking through layers of abstraction to test the internal implementation of your software components...you'll find that you can't make any changes to the system without also changing the tests. The hallmark of an exhaustive test suite is the ability to change the design of your system freely, without introducing bugs and also without being forced to change the tests.
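To illustrate, here's a hypothetical ShoppingCart with one over-specified test and one behavior-focused test. The first pins the internal representation, so a pure refactoring (say, storing line items in a list) would break it; the second asserts only what callers can observe:

```python
class ShoppingCart:
    def __init__(self):
        self._items = {}  # internal representation, free to change

    def add(self, sku, qty=1):
        self._items[sku] = self._items.get(sku, 0) + qty

    def count(self, sku):
        return self._items.get(sku, 0)

def test_add_overspecified():
    # Over-specified: asserts the internal dict, coupling the test
    # to the implementation rather than the behavior.
    cart = ShoppingCart()
    cart.add("apple")
    assert cart._items == {"apple": 1}

def test_add_increments_the_item_count():
    # Behavior-focused: asserts only through the public API, so the
    # internals can be redesigned without touching this test.
    cart = ShoppingCart()
    cart.add("apple")
    cart.add("apple")
    assert cart.count("apple") == 2
```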