I’ve been on a bit of a distributed systems streak lately, which will make this post seem quite out of order. But… that’s part of what this blog is for: I write down and expand on the thoughts I have when I get attacked by them. The book can put things in order. So today’s a jog back to thinking about testing.

Among the many goals of TDD is being quick about things, and being quick by being small. I’ve been re-reading Kent Beck’s book on TDD, and it’s almost excruciating how small the steps get. There’s a real sense of panic to get back to being able to run the tests and see the green light again.

(Interestingly, Beck does not prescribe small changes later in the book. He muses that larger changes are probably reasonable, but notes that practitioners of TDD almost always gravitate towards more rapid small changes by preference.)

Likewise, there’s a strong sense that tests should be small: isolated unit tests. Test “one thing.” Make sure, when tests start failing, that they can be quickly tracked down to their root cause. Isolate parts of the system, so broken code in one place doesn’t create “spurious” test failures elsewhere.

I think both of these things are, in one sense, redundant. That said, there are still good reasons to want to do both. I also think types (and data!) change things a lot.

Why keep changes small?

Part of forcing yourself to make small changes is that this is simply how humans work. You have to break large problems down into small problems. We just can’t cope with problems that are too big, and we seem to take non-linear (hyperbolic?) amounts of time to deal with larger and larger problems, until we hit that limit and our brains overflow and it’s impossible. So forcing yourself to work small change by small change can increase the speed at which you work overall, even if you’re doing “more work.” (In quotes because this measure of “work” is by typing on a keyboard, and typing isn’t a good measure of the work involved, is it?)

Some might object, “work smarter, not harder!” But this advice is rather useless. If you get the work done faster and easier by doing a lot of small changes, instead of one big complicated change, then the “smarter” approach is the “harder” approach, so this aphorism is just… tautological.

But the other reason it’s important to keep changes small comes from the interaction with testing. If you go from green to red with a small change, the thing you “broke” is narrowly identified: it’s something (or directly related to something) you just changed. The debugging problem is simpler. We know where to look.

Which means, with small changes, we don’t actually need narrowly-focused, isolated, unit tests. The point of all that is making it clear what broke, but our small changes already do that!

Why try to keep tests isolated?

Well, as long-time readers of the blog know, I don’t necessarily like this. “Over-mocking” is a design and testing failure that results from over-zealous attempts to keep tests more isolated than is actually beneficial.

One of the reasons we try to keep tests isolated is that keeping tests small means they run faster. TDD is all about getting into that loop where you can make a small change and then run the test suite. Big, long, expensive, time-consuming test suites impede that. It’s kinda hard to get immediate feedback when it’s just not immediate.

However, there is another reason to want to better isolate tests: we can’t always keep changes small.

Sure, that goes against the general TDD philosophy, but we can’t really lose sight of the fact that TDD is some weird human-specific hack that just kinda seems to work pretty well? We can speculate on some reasons why, but really this isn’t a principled methodology. We didn’t carefully arrive at TDD by following the empirical scientific evidence. It’s absolutely inevitable that it’s not always going to work well.

When we’re forced to make larger changes, isolation helps us quickly debug test failures. A failing isolated test is one that says “this particular small bit of code has something wrong in it.” It narrows things down immensely. When our change is small, we don’t really need help with that. But when the change is large, we’re going to have to spend time debugging and tracking down each and every failure to its root cause.

This is one of those things that can make big changes take non-linear amounts of effort. It’s not just our brains, there’s also something inherent to it. Instead of “oh, this is broken, let me look at it and fix it,” we have to spend time trying to figure out what even is broken in the first place. And then the necessary fix could have far-reaching implications, too!

So I don’t want to disparage isolation for tests! It’s clearly a benefit, when you can get it cheaply enough.

But take note: the TDD method has us going with two redundant approaches to trying to keep debugging time down. First, we keep the changes we make really small before getting back to a green test run. Second, we try to keep the tests as reasonably isolated as possible, so that when we run into a large change, we don’t have to spend as much time tracking down why a failure happened.

A hypothesis about static types

I don’t think it’s a coincidence that TDD came out of the Smalltalk community. I think this is a design approach that’s especially well-suited to the dynamic object-oriented style of programming, both its strengths and weaknesses. I also don’t think it’s a coincidence that one of the immediate “walls of applicability” TDD ran into is database schema design. It’s… just not something amenable to this approach!

I think, and of course cannot prove, that static types and better support for data change things about development in ways that affect both of the phenomena I’ve discussed above.

  1. TDD encourages smaller units of change. Static types and data allow rigorous (machine-aided) thinking about much larger chunks of code. This is not a requirement to do things in bigger chunks, but it enables thinking holistically and systemically to a much greater degree.

  2. TDD encourages isolation of tests to narrow the scope of large changes. Static types and data enforce a natural isolation. Modulo bugs involving state (an orthogonal issue?), it becomes a lot easier to track down the cause of a test failure, compared to a dynamic object-oriented design, because the test failure can only originate in a cause permitted by the types. (Ignoring state, a lot of “cascading failures” of non-isolated tests simply become type errors instead.)
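A minimal sketch of point 2, assuming plain Haskell with no libraries. The function everyOther below is a hypothetical example, not something from the post. Because its type is [a] -> [a], it cannot inspect or invent elements (ignoring undefined and non-termination), so all it can do is drop, duplicate, or reorder what it is given. If one of its tests fails, the types have already narrowed the cause down to which elements came out and in what order.

```haskell
-- Hypothetical example: keep the 1st, 3rd, 5th, ... elements of a list.
-- The type is fully polymorphic in 'a', so the implementation cannot
-- compare, inspect, or fabricate elements; it can only choose which input
-- elements to return and in what order.
everyOther :: [a] -> [a]
everyOther (x : _ : rest) = x : everyOther rest
everyOther xs             = xs

-- Example-based tests. A failure here can only mean the wrong elements
-- were kept, or they came out in the wrong order.
test_everyOther_odd :: Bool
test_everyOther_odd = everyOther [1, 2, 3, 4, 5] == [1, 3, 5 :: Int]

test_everyOther_even :: Bool
test_everyOther_even = everyOther "abcd" == "ac"
```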

One of the things I think happens here is that the “edit and typecheck” loop serves essentially the same role as “edit and test.” We always want machine-checked feedback quickly!

A quick example

Let’s write the map function in Haskell, using TDD.

The first step? That’s right, types.

map :: (a -> b) -> [a] -> [b]

This might come as a surprise to some who think of TDD as “test-first,” but you literally cannot write the test without first thinking about types. Types are always first.

Now, let’s get to that test!

test_map_single =
  map id [1] == [1]

Okay, you could argue I should start with the empty list, but I want to speed this example along a little. Now that we have our test, let’s follow the rules and get an implementation that gets us back to green as fast as humanly possible!

map f x = [1]  -- TYPE ERROR!

Oops, that doesn’t work. Uh…

map f [1] = [1]  -- TYPE ERROR!

Uh, well,

map f [] = []
map f [x] = [x]  -- TYPE ERROR!

Shoot.

map f [] = []
map f [x] = [f x]
-- Well, maybe now just a warning: incomplete match

Honestly, that should probably be an error rather than a warning, but hey! We’ve arrived! But… there’s that warning to fix… and how?

map f [] = []
map f [x] = [f x]
map f (x:xs) = undefined

LOL, got it. I said TDD tries to encourage excruciatingly small steps. Okay, now let’s try another test…

test_map_increment =
  map (+1) [1, 2] == [2, 3]

Great! It fails. And now to fix it…

map f [] = []
map f (x:xs) = f x : map f xs

Well… we’re kinda forced to just finish it, aren’t we? Are we really already done writing example-based tests for map? If our implementation is done just like that, we’re left unable to write a test that fails anymore. We can’t restart the “red-green-refactor” loop without going red.
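Here is a sketch that gathers the finished implementation and both example tests into one file, assuming GHC. Two things in it are additions for illustration, not part of the walkthrough above: the OPTIONS_GHC pragma, which uses GHC’s -Werror=incomplete-patterns flag to turn the incomplete-match warning from earlier into a compile error, and the hypothetical follow-up tests test_map_empty and test_map_show. Both of those pass the moment they are written, so neither can take us back to red.

```haskell
{-# OPTIONS_GHC -Werror=incomplete-patterns #-}
-- Promotes GHC's incomplete-patterns warning to an error, so a draft like
-- the two-equation version above would be rejected outright.

import Prelude hiding (map)

-- The final implementation from the walkthrough.
map :: (a -> b) -> [a] -> [b]
map _ []       = []
map f (x : xs) = f x : map f xs

-- The two example tests from the walkthrough (element type pinned to Int).
test_map_single :: Bool
test_map_single = map id [1] == [1 :: Int]

test_map_increment :: Bool
test_map_increment = map (+ 1) [1, 2] == [2, 3 :: Int]

-- Hypothetical follow-up tests: each already passes, so none of them
-- can start a new red-green-refactor cycle.
test_map_empty :: Bool
test_map_empty = map (+ 1) [] == ([] :: [Int])

test_map_show :: Bool
test_map_show = map show [1, 2, 3 :: Int] == ["1", "2", "3"]

-- Every test paired with a name, for a simple runner.
allTests :: [(String, Bool)]
allTests =
  [ ("single", test_map_single)
  , ("increment", test_map_increment)
  , ("empty", test_map_empty)
  , ("show", test_map_show)
  ]
```

Evaluating all snd allTests in GHCi gives True.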

And if we opt for property testing, we have an even shorter run. Starting over from scratch:

map :: (a -> b) -> [a] -> [b]

prop_map x =
  map id x == x

Oops, now we’re forced to go directly to the correct implementation, nothing else to do here! I suppose the benefit here is that while we’re developing that solution, we can continue to get counter-examples of where our implementation fails by running the tests. So… “TDD” is still good here! But we’re emphatically not following the process as originally envisioned. The tiny step by step process of writing test after test gets completely circumvented. We stood up the type and property test and got dragged all the way to the final implementation.
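That counter-example loop can be sketched without any test library. Assume, for illustration only, a hand-rolled exhaustive checker over small lists (in the spirit of SmallCheck); in practice, QuickCheck’s quickCheck prop_map does this job with random inputs. The names mapDraft, smallLists, and findCounterexample are hypothetical. Checking the property against a wrong draft produces a concrete failing input, while the finished implementation survives every list tried.

```haskell
import Prelude hiding (map)

-- Finished implementation.
map :: (a -> b) -> [a] -> [b]
map _ []       = []
map f (x : xs) = f x : map f xs

-- Hypothetical wrong draft, similar to the intermediate steps earlier:
-- it only ever maps the first element.
mapDraft :: (a -> b) -> [a] -> [b]
mapDraft _ []      = []
mapDraft f (x : _) = [f x]

-- The identity property from the post, specialised to Int lists.
prop_map :: [Int] -> Bool
prop_map x = map id x == x

-- Every list of length 0..n over the alphabet {0, 1, 2}, shortest first.
smallLists :: Int -> [[Int]]
smallLists n = concatMap (\k -> sequence (replicate k [0, 1, 2])) [0 .. n]

-- The first (and therefore shortest) input that falsifies the property.
findCounterexample :: ([Int] -> Bool) -> Int -> Maybe [Int]
findCounterexample prop n =
  case filter (not . prop) (smallLists n) of
    []      -> Nothing
    (c : _) -> Just c
```

With this file loaded, findCounterexample (\x -> mapDraft id x == x) 4 returns Just [0,0], the shortest list the draft gets wrong, and findCounterexample prop_map 4 returns Nothing.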

What’s happening here is that types are a significant aid, and type checking gives us feedback similar to what testing gives. We’re practically railroaded into writing a correct implementation of map after just 2 example-based tests (ok, I skipped [], so 3), and even just 1 property test is enough to force us all the way to the right implementation.

Although I’m not going to write out a comparable example here, I’d encourage you to try this with, say, Python. How long until you start to have confidence that you’ve got a reliable implementation? For reference, the Python standard library’s test suite has 12 tests for map. That fits my intuition of about how many tests you’d probably want to write, TDD-style, to arrive at an implementation you have confidence in.

“TDD as described” is really focused on dynamic object-oriented languages. It changes a lot when good types and data are involved.

Thinking with types

One of the first tasks I ever did “professionally” was design a database schema. Sit down, figure out what tables we need, with what columns, their relationships, normalization, obvious initial indexes, and so on. This is a task for which TDD is ill-suited, but this is also a task we can do without running any code!

Kent Beck’s original book notes database schema design as something not really addressed by TDD, and it sure seems like that hasn’t changed. It hasn’t stopped people from trying, but oh yikes. I’m gonna scream if I see “test that table x has column y” “oh no, it failed!” “modify table x to have column y” “mission accomplished!” ever again.

Schema design is just a form of type design. This is also what we’re doing before we write tests. We write the types. Can’t write tests without the types.

And the types can be arbitrarily large—a whole database schema is a nice example. When programming, we’re not just talking about the signature of a function, but we might be designing the types of arguments to that function, too. Or at least, we might with a language that has good support for data.
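To illustrate schema design as type design, here is a sketch of a two-table schema written as Haskell data types. Everything in it (the Customer and Order records, their fields, and the ordersFor query) is hypothetical. Each table becomes a record and each key gets its own newtype, so mixing up key types is caught by the type checker instead of by a test.

```haskell
-- Distinct key types: the compiler rejects a CustomerId where an OrderId
-- is expected, and vice versa.
newtype CustomerId = CustomerId Int deriving (Eq, Show)
newtype OrderId    = OrderId Int    deriving (Eq, Show)

-- Hypothetical "customers" table.
data Customer = Customer
  { customerId   :: CustomerId
  , customerName :: String
  } deriving (Eq, Show)

-- Hypothetical "orders" table; orderCustomer plays the role of a foreign key.
data Order = Order
  { orderId       :: OrderId
  , orderCustomer :: CustomerId
  , orderCents    :: Int  -- hypothetical: order total in integer cents
  } deriving (Eq, Show)

-- Hypothetical query: all orders placed by one customer.
ordersFor :: CustomerId -> [Order] -> [Order]
ordersFor cid = filter ((== cid) . orderCustomer)
```

For example, ordersFor (OrderId 10) orders is rejected at compile time, because ordersFor expects a CustomerId.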

When we have to use objects instead of data, we end up pausing what we’re currently doing, and going to create that data class instead. And that class probably has methods, and so we might start by writing tests for those methods. We end up following a very different path of development, like what we see in the books about TDD.

But in any case, type design is a task we humans actually can do just fine. We don’t actually encounter problems accomplishing this, the way we do writing implementations. There isn’t much of a design methodology actually called for here. Or at least, whatever design methodology we should be using isn’t anything like TDD at all.

Type, test, red, code, green, refactor

I think my point today drifted into several things:

  1. “Small changes” and “isolated tests” are redundant, but redundancy is useful because nothing’s perfect here.
  2. Good (e.g. non-nullable, abstract, data-oriented) static types help us naturally isolate tests, reducing the need to alter designs (to over-mock) to pursue it. (State remains the major foil for isolation.)
  3. Thinking with types is powerful, does not require anything like TDD itself, and is synergistic with TDD in practice.
  4. Good support for data (together with static types of course) makes us apply TDD differently. It’s not the same as what happens with the purely object-oriented style. (Not just the dynamic style, but even the statically typed object-oriented style, such as Java.)