I recently came across some discussion about the meaning of the word “abstraction.” I found it somewhat enlightening, actually, to see programmers of various experience argue about the meaning of the term. Part of what interested me was that there didn’t seem to be an apparent correlation between experience and ability to articulate a meaningful definition. (This is hardly any kind of scientific claim; I’m making guesses about anonymous people here.) But that suggests to me that maybe we don’t necessarily learn this just from experience, we need some help organizing our thoughts.
But after thinking it over, abstraction can come next week. Let’s start by talking about modularity.
Modularity is generally hailed as the thing that lets us construct large programs. The general idea concerns the approach we take to break a large program down into smaller modules. Done well, this lets us think only about each smaller part in relative isolation, and ignore most of the rest of the system. This is essential, because humans have tiny meat brains and we need all the help we can get.
Before we get into how we benefit from breaking the system down into modules, though, let’s find a way to think about modules themselves. Here’s my approach:
This diagram depicts two separate aspects of what a module consists of. First are the linguistic aspects: the code itself generally describes the abstractions and implementations being defined. Second are the system aspects: we get a network of dependencies between modules. Both are important.
Long ago, the term “programming in the large” was coined to describe this kind of systems-level divide. The meaning of this term seems to have drifted around over time, I suspect in reaction to programming languages evolving to better support modular programming. Programming in the small is about what happens inside the implementation part of a module. In the large, we’re all about the system between modules, interacting crucially with the design of the abstractions each module publicly exposes.
In my breakdown:
- The private implementation is usually almost all of the code a module consists of.
- The public abstractions are the abstractions this module exposes for other modules to make use of.
- The public/private dependencies are the other modules this module depends upon.
- The incoming dependencies are the other modules that depend upon this one.
The distinction between a public and private dependency is one I don’t see often made. Many package managers distinguish “real” dependencies from test-only or compile-time-only ones, but that’s not the same thing.
To illustrate with an example, if a module uses a regex module internally, that’s a private dependency. But if a module exposes a function that accepts a type from that regex module, that becomes a public dependency. For a user of this module to call that function, they have to know about the other module and be able to construct one of its types. If it’s a dependency our users need to be aware of too, then it’s public.
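Here’s a minimal sketch of that distinction in Java, using the standard `java.util.regex` module. The class and method names (`Wrapper`, `Leaky`, `isWord`, `matches`) are hypothetical, invented for illustration:

```java
import java.util.regex.Pattern;

// Hypothetical module where regex is a *private* dependency:
// Pattern appears only in the implementation, never in a signature.
class Wrapper {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Callers see only String and boolean; they never need to know
    // that regex is involved at all.
    static boolean isWord(String s) {
        return WORD.matcher(s).matches();
    }
}

// Hypothetical module where regex is a *public* dependency:
// the signature forces callers to construct a Pattern themselves.
class Leaky {
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }
}

public class DepDemo {
    public static void main(String[] args) {
        // No knowledge of regex required here:
        System.out.println(Wrapper.isWord("hello"));
        // Here the caller must understand the regex module too:
        System.out.println(Leaky.matches(Pattern.compile("h.*"), "hello"));
    }
}
```

Both use the same regex machinery internally; the difference is entirely in whether `Pattern` escapes into the public signatures.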
I think part of the reason we’re not used to separating these two categories out is that our languages play very fast and loose with them.
In Java, for example, you cannot discover what dependencies a `.java` file has without fully type checking it. Even if a package is not imported, even if it is explicitly never named at all, some method from another type might return a type from that package. In `x.foo().bar()`, `foo` can return an intermediate type that’s never named in the file at all, and so `bar` can be a method call into an entirely unknown other module. This isn’t unreasonable behavior, but it can make us less aware that we’ve just jumped from one dependency to another.
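A small sketch of how this happens. The classes here are hypothetical stand-ins (in a real codebase each would live in its own package):

```java
// A module this file never imports or names anywhere:
class HiddenDep {
    String bar() { return "came from HiddenDep"; }
}

// The module the file *does* know about:
class Known {
    // foo() hands back a type from the hidden module...
    HiddenDep foo() { return new HiddenDep(); }
}

public class ChainDemo {
    public static void main(String[] args) {
        Known x = new Known();
        // HiddenDep is never named in this method, yet calling
        // bar() is a call into that other module.
        System.out.println(x.foo().bar());
    }
}
```

Only a type checker, following `foo`’s return type, can tell you this file depends on `HiddenDep`.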
This difference between a public dependency and a private one is important. It came up when we talked about how an interface can cut off dependencies. With private dependencies, we’ve increased the burden on a user of our module not at all. But a public dependency means, in order to use this module (or at least, some part of this module), you need to understand another module first.
The theory of modularity is that in order to understand how to use a module, you just need to understand its public abstractions and the appropriate public dependencies. Everything else (the private dependencies, and hopefully the implementation, depending on abstraction quality) shouldn’t be relevant anymore. That reduces scope. It’s supposed to cut off at depth one a recursive procedure that would otherwise cover an unmanageable amount of code. (We’d need to know all of that implementation and its private dependencies, and all their implementations and private dependencies, and all their etc, etc.)
In order to change a module, you only need to understand its dependencies (of any kind) in the above way, and the module itself (abstractions and implementation). Thus, the only path to recursively having to understand more and more is the public dependencies.
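That “recurse only through public dependencies” rule can be made concrete. A minimal sketch, with an entirely hypothetical module representation (a real build system would derive this from actual code):

```java
import java.util.*;

// Hypothetical record of a module's outgoing dependencies.
class Module {
    final Set<String> publicDeps, privateDeps;
    Module(Set<String> pub, Set<String> priv) {
        publicDeps = pub;
        privateDeps = priv;
    }
}

public class Understand {
    // What you must understand to *use* a module: the module itself,
    // plus its public dependencies, recursively. Private dependencies
    // and implementations are cut off at depth one.
    static Set<String> neededToUse(Map<String, Module> system, String name) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(List.of(name));
        while (!work.isEmpty()) {
            String m = work.pop();
            if (seen.add(m)) {
                // Recurse only on public deps; privateDeps are ignored.
                work.addAll(system.get(m).publicDeps);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, Module> sys = Map.of(
            "app",     new Module(Set.of("json"), Set.of("regex")),
            "json",    new Module(Set.of(), Set.of("strings")),
            "regex",   new Module(Set.of(), Set.of()),
            "strings", new Module(Set.of(), Set.of()));
        // Using "app" requires understanding app and json only;
        // regex and strings stay hidden behind abstractions.
        System.out.println(neededToUse(sys, "app"));
    }
}
```

If `privateDeps` were added to the worklist too, the result would balloon into exactly the unbounded recursive closure the theory is meant to prevent.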
In order to make a breaking change to a module, you additionally need to understand the incoming dependencies, and their implementations, to some degree. The abstractions exported by a module are at least a tiny bit like a system boundary, but they could be hard or soft boundaries. Entire modules can be considered non-system boundaries, after all, because we might easily control all incoming dependencies (and there may be very few). But regardless of how easy the breaking change is to make, it’s still inherently a harder thing to do than a non-breaking change.
In practice, my observation is that the recursive nature of the public dependencies gets headed off by our designs. We frequently end up with a small set of foundational modules that are the things that routinely get used as public dependencies in that program. These are things like standard library modules, certain framework modules or fundamental data structures (like a JSON library), and the occasional application-specific module.
The end result is that instead of having a complex structure formed by public dependencies, we instead think of having to learn this foundational set first (in long-term memory), and then view the system structure as mostly devoid of any public dependencies (outside of this set). This greatly simplifies the task of understanding any given module. Any other module you already know is one you don’t need to spend any more time learning about before you can get to the task at hand.
(This is also a good razor for visualizing module dependencies. Identify the foundational modules, and just cut them out of the module dependency graph entirely. Put them aside as a list of “stuff to learn first.” It greatly reduces the amount of noise.)
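The razor itself is a simple graph operation. A sketch, again with hypothetical module names:

```java
import java.util.*;

public class Razor {
    // Set aside the foundational modules as "learn first", and drop
    // both their nodes and all edges pointing into them.
    static Map<String, Set<String>> prune(Map<String, Set<String>> deps,
                                          Set<String> foundational) {
        Map<String, Set<String>> out = new HashMap<>();
        for (var e : deps.entrySet()) {
            if (foundational.contains(e.getKey())) continue; // node set aside
            Set<String> kept = new HashSet<>(e.getValue());
            kept.removeAll(foundational);                    // edges removed
            out.put(e.getKey(), kept);
        }
        return out;
    }

    public static void main(String[] args) {
        var deps = Map.of(
            "app",    Set.of("json", "orders"),
            "orders", Set.of("json"),
            "json",   Set.<String>of());
        // With "json" treated as foundational, the remaining graph is
        // just app -> orders: far less noise to look at.
        System.out.println(prune(deps, Set.of("json")));
    }
}
```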
Is smaller better?
With every single one of these five parts of a module, we’re better off if they’re smaller:
- Fewer public dependencies is good for the reasons outlined above: this is the dangerous recursive part that could make it dramatically harder to understand code.
- Fewer private dependencies is good because that’s simply less we need to understand to work on this module.
- Smaller implementations are obviously good.
- Fewer or smaller abstractions are easier to learn, making this module a smaller contribution to any user’s cognitive burden.
- Fewer incoming dependencies means the code and design are easier to evolve (e.g. in consideration of last week’s discussion on how re-use isn’t always the goal.)
This is all true, but by thinking holistically (considering both linguistic and system aspects of what a module is) we can see how each of these goals is in tension:
- It may not be possible to remove a public (or incoming) dependency without duplicating some functionality or creating additional abstractions and indirections (i.e. adding a lot of code, increasing the size of the implementation or public interfaces). Abstracting away from things we already need to understand is not helpful in reducing cognitive burden.
- It might be hard to reduce private dependencies without likewise duplicating code.
- If we try to reduce the number of abstractions in a module, abstractions that were presumably necessary, we’re probably increasing the number of modules, and increasing the number of dependencies between modules.
- If we try to reduce the size of the implementation, we again risk adding more modules or more private dependencies.
So we don’t want to make a module as small as possible. Too many tiny modules can be a big problem, because a huge proliferation of small modules creates additional system-level complexity. There are going to be more nodes in our dependency graph, with more edges between them. (In the extreme, consider neurons: tiny modules in a system capable of amazing things, but unfathomable complexity in between.)
When faced with a choice between complexity we can keep “linguistic” (as abstractions and implementations) or an equal amount of complexity at the system level (“between” modules), it’s usually best to stay linguistic. Our programming languages and tools are designed to help us manage that. It’s a lot rarer to have tools that are any good at helping you understand how your configuration files are wiring together interfaces and classes and modules in your dependency injection framework. It’s easier to “find all references” when your references are code, and not XML, JSON, or YAML. But where the cut-off lies is harder to say; we get surrounded by trade-offs here. This is the task of design.
A lot about this description of modularity hinges on what “abstraction” means. They’re interrelated ideas. I had to pick one first. Next week!
- (2018-8-25) One thing I don’t think I played up enough here is that one of the key tools to help us understand how to reduce the dependencies from a module is to determine what sorts of concerns a module should never have. In my experience, most good designs start with what a module should do, but all the interesting design content then comes from the list of things that module should not do.