We’re in the middle of a run of concurrency posts on this blog. A quick recap:
- We have to start with the difference between concurrency and parallelism. Everything’s concurrent, but not everything is parallel. I’m focusing on concurrency models, not parallel programming models.
- Next, we looked at how to go about embracing concurrency (event loops) and how to program asynchronously (futures/promises).
- Last week, we looked at async/await, a language feature helping us construct those futures/promises.
This week, I’d like to get into a few different approaches to concurrency, and why I emphasize asynchronous event loops and promises.
The synchronous threaded model
Traditionally, we’re exposed to a lot of synchronous APIs from our operating systems. These APIs are the root of all evil here. If we’re not terribly interested in I/O, and we’re really focused on compute (i.e. parallelism more than concurrency), then the threaded model is fine. But once we start to depend on those synchronous APIs, we start to suffer problems.
The most interesting case for the synchronous model is mixing compute and I/O. Once given a workload, the only tunable parameter we have is thread count. In the figure above, I try to sketch a waste curve, showing the costs of thread count choices. There is no ideal choice here. We can try to tune for minimal waste, but the synchronous threaded model simply imposes waste, no matter what.
But this model has further warts. The heavier the I/O workload, the more it starts to break down. Part of the trouble is that submitting more synchronous I/O requests requires more threads. And more threads means more memory use. The classic “C10K” problem is rooted in this conundrum: if each additional concurrent connection requires another thread (or even process) and each takes 10MB, then 10K concurrent connections start at 100GB of memory overhead!
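To make this concrete, here’s a minimal sketch of the thread-per-request model. The handler and its workload are hypothetical; `time.sleep` stands in for a blocking read or write on a socket. The point is that every in-flight request ties up a whole thread, stack and all, even while it does nothing but wait.

```python
# Sketch of the synchronous thread-per-request model.
# time.sleep stands in for blocking I/O (a socket read, say).
import threading
import time

def handle_request(request_id, results):
    time.sleep(0.1)  # blocking I/O: this thread is parked, its stack wasted
    results.append(request_id)

results = []
threads = [
    threading.Thread(target=handle_request, args=(i, results))
    for i in range(100)
]
start = time.monotonic()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

# 100 concurrent requests required 100 threads. Each idle thread still
# holds a full stack -- the memory cost behind C10K.
print(len(results))
```

The requests overlap, so wall-clock time stays near 0.1s, but only because we paid for one thread per connection.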
The asynchronous event loop model
The immediate, dramatic benefit of embracing event loops is that this inherent trade-off goes away. We can actually hit the “ideal” point in that optimization curve above, and we don’t even have to work at it; there’s no tuning necessary.
This is possible because the event loop will aggressively do as much compute work as possible, dispatching as many I/O requests as it gets to. Multiple concurrent paused requests generally take less memory than whole threads, so this architecture is generally able to scale to an absurd degree. (The event loop model has obsoleted C10K as a real challenge, and some have proposed C10M as the replacement challenge. Three orders of magnitude difference.)
The tuning that happens with this model isn’t so much to achieve higher performance, but to limit memory usage, if that proves to be the limiting factor. It is better to serve requests less aggressively than to crash from lack of memory. Better still to install more RAM in your servers, though.
One remarkable part of this new concurrency model is that it frequently manifested itself as a single-threaded model. Applications like Nginx and Node.js are largely single-threaded (at least in terms of compute). The task of fully utilizing the machine’s CPU cores was simply delegated to running multiple independent application processes, one per core. This is made easier by Linux kernel features that allow multiple processes to listen on the same port, easily balancing incoming request workload between the application instances.
Can we abstract away from asynchrony?
One of the classic alternative options to embracing event loops is to hide that event loop behind an abstraction layer. The idea is for the programming language runtime to take care of pumping the event loop, and instead let us continue to believe we’re executing in the synchronous single-threaded model, augmented by “green threads.” (My preferred name. These are Erlang processes, or goroutines.)
Different approaches to green threads nevertheless all function in generally the same way: the programmer believes they’re writing synchronous sequential code using cheap “threads,” but secretly the language runtime does not actually block when a thread blocks. Much as with an await, the thread being executed gets put on hold, and the runtime switches to a different thread, selected from the event loop. In many ways, you can imagine green threads as asking: “what if every function was async, and every await was just implicit?”
There are a few drawbacks to this programming model:
- It requires a heavier runtime. Rust ruled it out for this reason.
- It easily lends itself to erroneous conflation with real threads. We all know what’s supposed to happen when you block, sure, but what happens when you enter a tight numerical loop for a few seconds? Green threads are almost always co-operatively scheduled, which means this loop will in effect block execution of other tasks on this thread (which might be the only thread). This might be fine, or it might be a serious problem, or even a serious problem that’s subtly hidden until you reach a critical mass of “blocked” threads. It all depends on whether you’re using green threads for concurrency (ok) or believe they’re giving you parallelism (maybe uh oh).
- Green threads are a leaky abstraction. Lots of system calls block; many things like file I/O only present synchronous APIs, after all. But in practice, we get blocking behavior from even more sources. FFI calls, for instance, might internally make blocking system calls, so even if we cleverly wrap language-internal system calls, we can still run aground.
- There’s no indication of blocking or non-blocking behavior. A function call is just a function call. It might hard-block in FFI, it might soft-block co-operatively, or it might just run to completion normally. To find out which… run it?
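This hazard is easy to demonstrate in any cooperatively scheduled system. Here’s a sketch using `asyncio`, where `time.sleep` plays the role of a hard-blocking call (an FFI call, a tight compute loop): while it runs, every other task on the loop is frozen.

```python
# Sketch of the "hidden blocking" problem in a cooperatively scheduled loop.
# time.sleep hard-blocks the whole thread; asyncio.sleep yields to the loop.
import asyncio
import time

ticks = []

async def ticker():
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)  # cooperative: other tasks run during this

async def bad_neighbor():
    time.sleep(0.2)  # hard-blocks: the ticker cannot run during this
    await asyncio.sleep(0)

async def main():
    await asyncio.gather(ticker(), bad_neighbor())

asyncio.run(main())

# Ticks should be ~10ms apart, but one gap absorbs the whole 200ms block.
gaps = [b - a for a, b in zip(ticks, ticks[1:])]
print(max(gaps) > 0.15)
```

Nothing in the signature of `bad_neighbor` warns you it will stall the loop, which is exactly the “run it to find out” problem.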
I believe this combination of drawbacks is what motivated the much more recent development of async/await. (Promises, of course, go back a long time too, although not entirely in their modern form.) In contrast to the above, here is what Rust appears to be getting (once the design process is finalized) with its version of explicit asynchronous programming:
- A minimal and low-overhead runtime. In fact, likely even multiple runtime implementations. I haven’t investigated in depth yet, but I believe that while the initial design efforts have focused on single-threaded concurrency, supporting parallel work-stealing thread pools may be a library-level change. The async programming model isn’t tightly tied to the implementation technique.
- Since it’s clearly about asynchronous programming, and not easily confused with parallelism, you know what happens if you get into long-running compute loops. Nothing is hidden.
- If a function is going to do I/O, then if it returns a result, it’s synchronous, and if it returns a promise/future, it’s asynchronous. You can more easily understand your program just by reading it, instead of running it and investigating the result.
- When faced with a synchronous API, it’s possible to build mechanisms (yourself, or simply using a library) to use a sacrificial thread pool to “async-ize” the call.
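That last trick can be sketched in a few lines with `asyncio`, whose `to_thread` helper runs a blocking function on a worker thread from a pool and hands back an awaitable. The blocking function here is hypothetical; `time.sleep` stands in for a synchronous-only system call.

```python
# Sketch of "async-izing" a blocking call via a sacrificial thread pool.
import asyncio
import time

def blocking_read():
    time.sleep(0.1)  # stand-in for a synchronous-only API call
    return "data"

async def main():
    # Both run concurrently: the blocking call on a pool thread,
    # the asyncio.sleep on the event loop itself.
    blocking_result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_read),
        asyncio.sleep(0.1),
    )
    return blocking_result

result = asyncio.run(main())
print(result)
```

The event loop never blocks; only the sacrificial pool thread does, and the result comes back as an ordinary awaitable.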
In the end, I think the async/await/promise approach gets us the same benefits as green threads, with a different mix of drawbacks that, on balance, comes out in favor of the explicit asynchronous model. Pedagogically, I’m also in favor of actually exposing programmers to asynchrony. The confusion that arises from a wholly synchronous world-view is real. Especially since everything is actually asynchronous.
Coming up next week are my thoughts on “function coloring,” and then I might be done with concurrency for the moment. If you want my thoughts on a specific topic here, ask away! Patreon or Twitter or messenger pigeon, whatever floats your boat.