One reasonably well-known problem is the way object-oriented APIs fail badly if we just take them and expose them over the network. We have a tidy little graveyard of old services and protocols for trying to do exactly that. Here in 2019, we hopefully know you can’t just take an object and expose it to the network (or equally take an object, and secretly make it remote instead of local), and expect things to work out. You have to design for remote services.

Why is this the case?

Remote services have different design constraints from object-oriented designs:

  1. Distributed systems involve failures (more on this in a later post). Typical OO design doesn't treat these kinds of failures as a possibility; we don't think about ordinary class design in a “what if we call this method and it just never happens?” sort of way. (There's a small sketch of the difference after this list.)

  2. Typical object-oriented design involves allocation/construction of objects and management of their lifetimes. How do you do this in a distributed way? Distributed garbage collection has never really been an option, so we often end up with error-prone manual management. But… failures in a distributed system are common, and then who will clean up the mess? So lifetime management of “objects” is required, and this introduces quite a bit of complexity, especially if trying to come up with a generic solution.

  3. The subtleties: problems that were truly there all along, but are far less noticeable locally.
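
To make the first point concrete, here's a minimal sketch of how failure changes an API's very shape. The interface names and signatures here are my own invention, not any particular framework's:

import java.time.Duration;
import java.util.concurrent.CompletableFuture;

// Local design: calling increment() is assumed to simply happen.
interface LocalCounter {
  int increment();
}

// Remote design: the call may fail or time out, and after a timeout we
// don't even know whether the increment happened. The signature has to
// surface all of that.
interface RemoteCounter {
  CompletableFuture<Integer> increment(Duration timeout);
}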

What is imperative programming?

The usual definition of imperative programming involves the notion of statements that manipulate state. I don’t think this is an especially helpful definition.

The issues I take with this definition are:

  1. Our machines work like this at the most fundamental level, so we get into an inappropriate mindset where imperative programming is “what’s really happening.”

  2. All programs, at some point, will involve state and the manipulation of state, so in some way, all programming is imperative programming?

I’ve seen a lot of brilliant people grapple with what “declarative” programming even means, but I’m not sure I’ve seen it presented this way around. After all, what is declarative programming, when it’s supposed to contrast with imperative programming, and our definition of imperative programming appears to encompass all programming?!

So I’d like to offer an alternative definition; one that I’d describe as my best attempt at what a “future historian” might come up with to describe what’s going on today.

Imperative programming is the inappropriate use of state and sequences to represent things that are neither stateful nor sequential.

Here is a simple example:

int *vals = calloc(4, sizeof(int));  /* four zero-initialized ints */
vals[0] = 3;
vals[1] = 1;
vals[2] = 4;
vals[3] = 1;

What we’re seeing here is the attempt to represent an array of digits of pi, but to achieve this we have to write a sequence of mutations. The contrasting non-imperative version of this code would simply be something like:

[3, 1, 4, 1]

Many programming languages lack the ability to directly describe such things, and so force us into an imperative style. Object-oriented programming, as commonly practiced, even forces this: the whole notion of a constructor is to mutate an uninitialized object into its intended initial state. (Bizarre things can happen with partially constructed objects: consider the problems with calling virtual methods from a constructor.) We wanted one thing, but are forced to use mutation and state to simulate it.
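
To make that parenthetical concrete, here's the classic Java illustration: the virtual call runs against a partially constructed object.

class Base {
  Base() {
    describe();  // virtual dispatch from inside a constructor
  }
  void describe() {}
}

class Derived extends Base {
  String name = "a Derived";  // initialized only after Base() returns

  @Override
  void describe() {
    System.out.println(name);  // observes the not-yet-initialized field
  }
}

// new Derived() prints "null": the object was observed mid-mutation,
// before its intended initial state ever existed.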

Here is another example:

class Queue<T> {
  static Queue<T> empty();

  T pop();
  void push(T elem);
}

// Construction
Queue<int> q = Queue.empty();
q.push(3);
q.push(1);
q.push(4);

// Later
q.push(1);
q.push(5);

// Later
a = q.pop();
b = q.pop();

A queue seems pretty naturally presented this way, but this is also a pretty imperative way to go about it. I’d argue the “naturalness” of this design is as much a result of the imperative mindset (and limitations of the programming language) as of anything inherent to queues.

Here’s a straw proposal for an alternative design:

class Queue<T> {
  static Queue<T> from(T[]);
  T[] pop(int count);
  void push(T[] elems);
}

// Construction
Queue<int> q = Queue.from([3, 1, 4]);

// Later
q.push([1, 5]);

// Later
[a, b] = q.pop(2);

Now, Queue obviously still represents state for us. Clearly, my aim here is not to eschew state entirely. We’ve presumed that state was desired and useful, and instead we’ve tried to design an API that doesn’t inappropriately require sequences of mutation where none are needed. The API now allows operations on multiple elements at once, so we no longer need a sequence of operations to accomplish the same thing.

This is, of course, somewhat contrived in order to get a small and simple example here. The question we’d want to ask is: how often do we end up pushing/popping multiple elements at a time? For the sake of this example, I’ve assumed that’s a thing we want to do, but that may not be true of many algorithms that use queues!

That said, what I do think is unequivocal here is that a Queue class should have both sets of these methods, and many programming languages lack this second set…
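
For concreteness, here's a rough Java sketch of a queue that offers both sets of methods. The class name and implementation details are mine, just to show the shape:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

final class BatchQueue<T> {
  private final ArrayDeque<T> items = new ArrayDeque<>();

  static <T> BatchQueue<T> from(List<T> elems) {
    BatchQueue<T> q = new BatchQueue<>();
    q.items.addAll(elems);
    return q;
  }

  // The element-at-a-time set:
  void push(T elem) { items.addLast(elem); }
  T pop() { return items.removeFirst(); }

  // The bulk, data-oriented set:
  void push(List<T> elems) { items.addAll(elems); }
  List<T> pop(int count) {
    List<T> popped = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      popped.add(items.removeFirst());
    }
    return popped;
  }
}

// Usage, mirroring the example above:
//   BatchQueue<Integer> q = BatchQueue.from(List.of(3, 1, 4));
//   q.push(List.of(1, 5));
//   List<Integer> ab = q.pop(2);  // [3, 1]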

Long time readers of the blog will likely have noticed that all I’ve really done here is use data. Many of our languages have pretty poor support for representing richer forms of plain old data.

There’s no good reason for that, but there is something of an explanation. After all, machines don’t have good support for data. Registers are integers and floats, and sometimes the integers get interpreted as pointers. Is it any wonder that early languages didn’t give us much more than that, at first?

But of course, I really don’t know what excuse we have in 2019 for not doing much better yet.

Back to the RPC mistake

The remote procedure call (RPC) mistake is exactly what we started this post talking about: taking ordinary imperative functions and calling them over the network without rethinking their design. So, given what we’ve talked about above, here is how I’d frame the reasons this is a mistake:

  1. We still have the distributed systems problem: failure has to be taken into account.

  2. OO designs are frequently too imperative. This doesn’t have to be inherent to OO, but present OO languages certainly force it in the ways discussed above.

  3. The object lifetime problem mostly exists because OO designs inappropriately use objects to represent data, and these objects are often very ephemeral. Outside of these kinds of ephemeral replacements for data, the objects of a distributed system are often persistent. (Not that deletion isn’t a thing, hence “CRUD” apps, but rather deleting becomes a real action, rather than just some administrative cleanup we have to get around to.)

So if we look at the most successful approaches to building remote APIs, what’s different? The interesting thing about REST APIs is that they’re arguably, in a sense, object-oriented. After all, each endpoint can be thought of as encapsulated state we send messages to, right?

But of course, the key thing about “message-oriented” programming is that, in contrast to object-oriented programming, it emphasizes data. And that’s exactly what we see from HTTP APIs: exchanges of richly structured data, usually in the form of JSON (and before that, perhaps XML).
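
For instance, pushing onto our queue over HTTP would naturally carry a whole batch of structured data in one request. (The endpoint and payload here are hypothetical, purely for illustration.)

<-- SEND:
POST /queue/items HTTP/1.1
Content-Type: application/json

[3, 1, 4]

--> RECV:
HTTP/1.1 204 No Content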

And if we look closely at more modern RPC frameworks, we see something similar. Protocol Buffers are used extensively in gRPC, which places significant emphasis on messages. This approach doesn’t just try to handle the boilerplate of making calls remotely, it also encompasses representing data more richly than many programming languages provide for natively.
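
To sketch what that looks like (the schema below is hypothetical, not any real service's), the data gets first-class treatment in the interface definition itself:

syntax = "proto3";

// The batch of elements is itself a structured message.
message PushRequest {
  repeated int32 elems = 1;
}

message PushResponse {}

service Queue {
  rpc Push(PushRequest) returns (PushResponse);
}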

Richer data here isn’t a small thing. Consider the Queue class above. If we just took the original implementation and exposed it over the network, we’d see interactions like this:

<-- SEND: push(3)
--> RECV: done.
<-- SEND: push(1)
--> RECV: done.
<-- SEND: push(4)
--> RECV: done.

Three round trips over the network just to accomplish a simple task. But the alternative design does not have this problem:

<-- SEND: push([3, 1, 4])
--> RECV: done.

The data-heavier approach looks like it’s less efficient when we’re thinking in terms of local CPU instructions. To pass richer data like an array into push, we’d have to allocate some sort of structure, somewhere. (Though we should be perfectly able to do this on the stack, where the cost can be near zero.) Once we start to count the cost in round trips across the network, however, the calculation goes the other way. That’s how seemingly reasonable designs can start to manifest major flaws.

But efficiency aside, if our goal was to push multiple things onto the queue, then the second design is actually the better design even locally. Not just because it’s fewer lines of code: it also directly says what we want to happen. And, in contrast to imperative approaches, data-oriented approaches are more amenable to composition:

for(elem in compute_pi_digits(3)) {
  q.push(elem);
}

versus

q.push(compute_pi_digits(3))

Again, we see a sequence of mutations replaced with a direct statement of intent.

End notes

This is my initial foray into talking about the very basics of designing distributed systems. I’ve obviously punted today on a lot of that, but I think it’s quite interesting to see how some of our design sensibilities that worked alright locally can completely fall apart remotely, before we even get into the trickier stuff.