Going overboard with unit tests

In the course of writing this book, I read other sources of design advice. A big part of what I want to convey with my book is a handful of “big ideas” that have a lot of consequences. When I find other advice that I feel ignores these big ideas, that’s a great opportunity for some critique.

Today’s post is inspired by something that really vexed me.

Stop me if you’ve heard this one before

Tests should only test one thing.
Each test should be independent and self-contained.
Refactoring should not break tests.
Try to achieve maximal coverage with tests.

Would it surprise you, dear reader, to hear that the above set of advice (which I’m starting to find is standard in certain circles) is not just practically difficult, but literally impossible, a logical contradiction?

Let’s start at the beginning

One of the ideas—good ideas—behind refactoring is that you should be able to keep all tests green while you do it. This makes perfect sense, in some ways.

If we are refactoring the internals of a public API—a system boundary—then we want to ensure we aren’t breaking users of that API. Most obviously, we don’t want to change the API/ABI of the system boundary, as that would directly break external users. But equally, we don’t want to break the intended behavior of those APIs, and testing is one way we can police that. If we break a test that’s written against a system boundary, obviously we’ve changed some behavior that an external user might be relying upon! In this way, tests are mini-contracts, warning us our implementation is no longer doing what it is supposed to do, or that our changes might have done something we didn’t intend.

But this whole logic only applies to tests written against a system boundary. It does not, and sometimes cannot, apply to unit tests written against the internals.

Especially the kinds of unit tests you’re writing because someone told you tests should only be testing one thing, independent and self-contained.

Refactoring can break unit tests

And here’s where we start to run into problems.

Behind our system boundary, we can have quite a lot of code. Code where we control all users; it’s not a system boundary. And that code will have lots of different subdivisions.

If we’re doing things in an OO fashion, that means lots of internal classes. And so, we’re likely to start writing tests against those classes and their public methods.

And this is reasonable, to a point. Certainly, we can write a bit of code, and we may then immediately want to double-check that we did it correctly, by testing it out a bit. Writing those checks as unit tests is just “automating our REPL,” after all.

All of our problems arise when we start insisting that we can’t break these tests. The absolutely central problem here is that we want to! Changing the behavior may well be the design change we’re trying to make.

Tests that ensure behavior doesn’t change make sense on a system boundary, because there we don’t want to change behavior. Tests that ensure behavior doesn’t change are ridiculous on an internal API, especially when we’re trying to change that behavior.

Internally, we control all the users of that API, so changing the behavior is fine (and for many refactorings, desired). In fact, we frequently want to deliberately break an internal API, so the compiler will help us find all its users. Perhaps by changing the name of a function, or type, or constructor, etc. Just follow the list of red squigglies to each place we must audit, to accommodate the design change we’re trying to make.

And then the red squigglies lead us to the tests, which are all broken by our change, and must be rewritten.

Sometimes, if the goal is just to rename a function, we’re able to use IDE refactoring services to do the work for us. This is fine when the goal is the renaming. But frequently, what I want to do is make a more subtle change to the function’s behavior, and so I deliberately rename it, to cause a compiler error everywhere it is used. This is the point of static languages: the computer should help me do the work. Take me to every place I need to think about, wizard machine!

Of course, that’s not such a great deal when it turns out to be 80% tests that are now broken and that I now need to manually fix. This is the tension of unit testing. This is why we should concentrate tests on system boundaries where possible. In this situation, unit testing has impeded our ability to refactor our code.

Does this mean unit testing is a problem?

Not really. For one thing, a class in object-oriented languages is intended to be a “miniature system boundary” encapsulating its internal code. Sometimes that works out quite well. It’s just that sometimes it doesn’t fit well.

Part of the problem is simply an over-emphasis on classes in testing advice. If you come away from bad testing advice thinking every class needs to have unit tests written against it, you can start to suffer this problem. Some classes are aggressively internal, and really shouldn’t have their own tests.

Part of the problem is a denigration of “integration testing.” This is sometimes taken to mean that any test should depend on the behavior of only the class under test, otherwise you’ve broken the sacred rules and it’s not a neat, proper unit test anymore. The trouble here is that a class looks like any other class, but in reality some are system boundaries and some aren’t. Classes are artificial boundaries; they don’t naturally reflect a real “unit” of code.

So sometimes a class represents a system boundary, in which case test away! In some cases, a class represents a real “unit,” and so testing it will be fine. But most classes are in-between.

And of course another problem is that example-based testing means test suites are a lot bigger and so harder to maintain in the face of changes. Consider property testing, where a single stated property can stand in for many hand-written examples.
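
As a rough sketch of what that looks like, here is a hand-rolled property check using only the standard library. It is a stand-in for a real property-testing library, not a replacement for one, and the run-length encoder (`rle_encode`, `rle_decode`) is a made-up example chosen only because its round-trip property is easy to state.

```python
import random


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Hypothetical code under test: run-length encode a string."""
    runs: list[tuple[str, int]] = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))  # start a new run
    return runs


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Inverse of rle_encode."""
    return "".join(ch * count for ch, count in runs)


def check_round_trip(trials: int = 500, seed: int = 0) -> bool:
    """Property: decoding an encoded string gives back the original string."""
    rng = random.Random(seed)  # fixed seed so a failure is reproducible
    for _ in range(trials):
        length = rng.randint(0, 20)
        text = "".join(rng.choice("ab") for _ in range(length))
        if rle_decode(rle_encode(text)) != text:
            return False
    return True


print(check_round_trip())  # True
```

One property covers hundreds of generated inputs, and it says nothing about the intermediate representation, so there is less to rewrite when the internals change. A real library would also shrink failing inputs to a minimal case, which this sketch does not attempt.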

Keeping tests too green

This kind of “contradictory TDD” advice is also interestingly self-defeating. If you’ve really made all your tests very self-contained, then it’s no longer even possible to effectively use tests on the system boundary.

The whole idea behind a refactoring not breaking a system boundary test is that the test should be spotting undesired behavior changes on that boundary. But if you’ve worked hard to isolate that boundary (test only one thing!) from its dependencies, then you might accidentally make a breaking change… and the test wouldn’t spot it. This can happen if things have been over-mocked.

After all, how often have you seen the advice that you should see a test failure only against the code you’re changing? If class A depends on B, which depends on C, so the advice goes, then a change to C should not be causing a test failure in A. Obviously it should be reported on C, right? That’s where the behavior changed!

But now consider A as a public API, and C an internal API, where we intended to change the behavior of C, but did not intend for that to impact the behavior of A. If we’ve followed that isolation advice, the test on A is incorrectly mocking out C and we’ve had to fix the tests on C to accommodate the behavior change, and… we broke something, and the test for it comes back green.

Not so helpful, that.
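
Here is a small sketch of that failure mode, using `unittest.mock` from the standard library. The classes are hypothetical: `Invoice` plays the role of A (the public API) and `TaxTable` plays the role of C (the internal API whose behavior we deliberately changed). B is left out to keep the sketch short.

```python
from unittest.mock import Mock


class TaxTable:
    """Plays the role of C: an internal class whose behavior we changed on purpose."""

    def tax_rate(self) -> int:
        # Before the change this returned a whole percentage (20).
        # After the change it returns basis points (2000).
        return 2000


class Invoice:
    """Plays the role of A: the public API. It still assumes a whole percentage."""

    def __init__(self, taxes) -> None:
        self._taxes = taxes

    def total_cents(self, net_cents: int) -> int:
        return net_cents + net_cents * self._taxes.tax_rate() // 100


def isolated_test() -> bool:
    """'Proper' unit test: C is mocked, and the mock still encodes C's old behavior."""
    fake = Mock()
    fake.tax_rate.return_value = 20
    return Invoice(fake).total_cents(10_000) == 12_000


def boundary_test() -> bool:
    """Test through the boundary with the real dependency wired in."""
    return Invoice(TaxTable()).total_cents(10_000) == 12_000


print(isolated_test())  # True: green, even though invoices are now wrong
print(boundary_test())  # False: this is the test that notices
```

The isolated test passes because the mock froze the old contract of `tax_rate` in place. Only the test that exercises A together with the real C reports that the public behavior changed.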

How does this advice persist?

We can’t possibly follow it! In part, I cynically wonder if that’s a feature. Any time this strange form of TDD (this is not what all TDD people advocate) fails, the consultant can insist you did it wrong. After all, you literally cannot do it right, can you?

I’m okay with tensions and trade-offs. There’s frequently some balance to be struck between competing concerns. I’m okay with rigid rules. (I like static languages, after all.)

But we’re dealing with the impossible here. If you’re writing these independent, self-contained, high-coverage, one-thing-only unit tests, then you’re eliminating your own ability to do effective refactoring.

The only reason I don’t think this particular problem with rigid tests could lead you into designing code worse than the legendary untestable designs of old is that we always have the option to declare testing bankruptcy.