Concurrency vs parallelism
There’s some recent discourse out there comparing the approaches of Node and Go to concurrency. I was thinking about writing up some of my own thoughts on this subject, since this is a design-related topic, but I realized that before I can do that, I really need to work up to it with a few posts.
Part of what precipitated today’s post is that I cracked open a few books I have lying around to skim over to ensure I wasn’t forgetting something important, and discovered a chapter (in a book that will remain nameless) entitled “declarative concurrency.” This phrase caused my brain to crash to a stop with some kind of equivalent to a record scratch. What on earth could that be? I paged through the chapter only to discover… it was about declarative parallelism. So right away these very smart authors had managed to make a fundamental error.
I wasn’t sure the world needed another “concurrency is not parallelism” post, but there are people who still need it. And the book will probably need it, so let’s get into it.
What’s the difference?
The difference is really quite trivial, but it takes a lot of work to get there:
- Concurrency is about handling (asynchronous, nondeterministic) events.
- Parallelism is about using hardware resources to do more at the same time.
This is a nice, stark difference for a topic that often gets buried in a lot of confusion. After all, how did these smart book authors manage to confuse these two concepts if they’re as obviously different as I describe them above?
The short answer is because the most basic programming model that we’ve used since forever ago intermixes the two in a confusing way.
A process (or thread) pretends to be a sequential execution of instructions, but not only is this a lie because it’s interacting with operating system resources, it’s also a more fundamental lie. Every process (or thread) is a concurrent process, there’s no such thing as a sequential process, not really. We’ll get into this in a second.
Meanwhile, because we rely on sequential synchronous execution, if we want to try to get more balls in the air, we need to actually resort to parallel execution. Instead of “submit read file A request, submit read file B request, wait for response” we often end up writing “in thread 1, read file A, in thread 2, read file B.” We were writing multi-threaded code back in the days of single-core processors. This era is still deeply grooved into people’s brains, and it’s still pervasive in OS API designs.
So on the one hand, we have a proliferation of synchronous APIs meaning we routinely had to use parallelism features to achieve concurrency. On the other hand, to use parallel features, you often also have to involve concurrency. Threads compute things, and then they have to communicate. Communication manifests as events.
The immediate topic that arises after introducing threads (a parallelism concept) is locks and mutexes and semaphores, oh my! And these get labeled “concurrency primitives,” which… technically isn’t a mislabel (because they’re all about nondeterministic, asynchronous events!) but is also misleading? Sorta? Because these are very specific events.
This ties these two concepts together really tightly. You can’t really have parallelism without some kind of concurrency (however tamed and well-behaved) because that would imply the parallel thread functions mostly as a means of converting electricity into heat inside the CPU. You have to communicate results, somehow.
But parallelism and concurrency can be separated. With good enough asynchronous APIs, you can do concurrency without a bit of parallelism. With a good parallel programming model, the concurrency aspects can be completely and perfectly abstracted away, leaving you just focused on the deterministic parallelism.
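To make the “concurrency abstracted away” direction concrete, here’s a sketch (my own naming, not from any library) of a parallel map where each goroutine writes only to its own output slot. All the nondeterminism — who finishes first — is hidden behind the join, so the result is fully deterministic:

```go
package main

import (
	"fmt"
	"sync"
)

// parallelMap applies f to each element of xs, one goroutine per element.
// Each goroutine writes only its own index, so there are no data races
// and the caller sees purely deterministic data-flow: same input, same
// output, regardless of scheduling.
func parallelMap(xs []int, f func(int) int) []int {
	out := make([]int, len(xs))
	var wg sync.WaitGroup
	for i, x := range xs {
		wg.Add(1)
		go func(i, x int) {
			defer wg.Done()
			out[i] = f(x)
		}(i, x)
	}
	wg.Wait() // the only "concurrency" the caller ever observes
	return out
}

func main() {
	fmt.Println(parallelMap([]int{1, 2, 3}, func(x int) int { return x * x }))
}
```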
The sequential lie: our dumb history
In the 70s, our computers were single-core, single-digit megahertz CPUs, connected to nothing more capable than a tape drive. The design decisions from this era are still with us today.
One of the themes I mentioned for this book is how much of design is dealing with ancient unfixable design mistakes we can only work around and pile more hacks on top of. It’s not a pretty picture. It’s a bit of a depressing fact. We can always dream of a better future. But today, we learn to work around the mistakes.
Every program we write is a concurrent program, but almost every API we use is a synchronous one that pretends concurrency does not exist. If you’re writing a web server, you’re fundamentally responding to events. If you’re writing a user interface, you’re fundamentally responding to events.
Even if you’re writing a command-line program, if you want to handle `ctrl+c`, well, that’s delivered via a `SIGINT` signal, out of band from your blocking `read` calls. (And wait, what’s “blocking” without concurrency?) And you can’t actually do much from a signal handler.

So maybe you just set a global variable, and replace all your `read` prompts with a special function that will check that global variable before actually blocking on `read`.

And that primitive event handler works great until one of your users discovers a reliable way to send SIGINT after the global-variable check but before the blocking `read`. Now they’re complaining that `ctrl+c` doesn’t work reliably in your application.
And of course, any time we interact with the filesystem, our sequential programs are lying to us. Sure, we can check to ensure a path has no symlinks before deciding it’s safe to open it, but that’s just another potential “Time of Check / Time of Use” (TOCTOU) vulnerability. Everything is not-so-secretly concurrent.
On the one hand, it’s easy to look at that situation and just scream and rage about how hard concurrency is, but that’s bullcrap. Concurrency is actually quite easy. Bullshitting yourself into pretending you’re not doing concurrency, and then getting swamped in “concurrent, but trying to hide it by being sequential and synchronous” APIs, that’s the recipe for everything getting hard.
If you embrace the fact that everything is concurrent, and use something like `libuv`, things get better: `libuv` is able to use `signalfd` to turn UNIX signals into just another kind of event your event loop is handling. The concurrency difficulty here just goes away.
TOCTOU might become a bit more obvious when there’s a giant “and then THINGS might happen” sitting between the check and use.
This is an area where I’d like to say that Windows has embraced concurrency better, but I really can’t.
Event handling loops are a core part of how Windows works, but they’re also buried under decades of old bad designs and work-arounds.
Event loops look like they’re associated with `HWND` handles, so you might think you need a window to receive messages? But no, event loops are actually associated with threads, and each `HWND` has an owning thread, and a thread can have an event loop without any windows (but a thread might not have an event loop, until it’s done something to create one). And there are still multiple event handling mechanisms; this is just the one used by UI threads. Have fun combining `WaitForMultipleObjects` with event loops! Everything is terrible!
But whatever. This point here is: confusion is inevitable coming out of this history. Single threads exist in an inherently concurrent context, but synchronous APIs rob us of that concurrency. So we have to reach for multi-threading, a parallelism tool, to achieve concurrency, which creates more concurrency concerns when multiple threads interact. And many of these different kinds of events (message passing, signals, mutexes, file descriptors) have trouble interacting, exacerbating problems like deadlocks. I’d really like my thread to respond to an event, but it’s currently waiting on a mutex. Darn.
Control and computation
Another way to distinguish concurrency and parallelism is by what you’re most interested in. Concurrency is a control-flow oriented concern. Parallelism is a data-flow oriented concern.
If you’re just trying to compute a result (so you’re chiefly data-flow oriented) then parallelism is what you’re after. How those results get communicated (the concurrency aspects) aren’t as important. If your problem fits a good parallel programming model, they can even be abstracted away for you. (This was the topic of that “declarative concurrency” chapter I mentioned.)
If you’re trying to control what happens (so you’re chiefly control-flow oriented) then concurrency is your bag. This means you’re doing something that is inherently more I/O- or state-oriented.
So it’s not much of a wonder that “functional programming” (whatever that means) ends up so much more associated with parallelism. The unlikely promise that functional programming would somehow be a panacea for parallel programming didn’t pan out. But both are predominantly focused on data-flow over control-flow.
- The canonical essay on this topic, in my mind, is Bob Harper’s “Parallelism is not Concurrency”.
- Coming up is some comparison of C# and Erlang, which best represent two different approaches to concurrency. Okay, fine, we can talk about Node and Go, too.
- I’m also planning a post deep-diving into async/await. This is a major recent innovation. It’s cool.
- I should take some time to think more carefully about the function coloring arguments. I’m not sure I’m convinced this is actually a problem. Even without a forced distinction in the language, there’s already a huge amount of “coloring” that happens just from our mental models of what a function does.
I’m reminded of invented coloring conventions, like mutation functions in Lisp/Scheme/Ruby ending in a `!`. The real question is how much composition gets complicated.
- This book isn’t going to be a parallel programming book, beyond maybe a few really basic things. Parallel programming is a specialized topic with considerable depth. Concurrency, however, I don’t think is all that specialized, and deserves more attention.
- (Added 2018-11-6) An alternative definition of “parallelism vs concurrency” I’ve seen used is distinguishing “hardware vs programming model.” I disagree with this definition: there are a myriad of parallel programming models. These should not be dismissed as mere hardware concerns. And there are several programming models for concurrency, too. Parallelism might be inspired by hardware, but it’s distinct.