Last week was all about asynchronous programming and how we need to embrace an event loop as a core part of how we write programs, in order to do concurrency well.

At the end of that essay, we’d figured out this Promise (or Future, or Task, or…) abstraction that allows us to compose concurrent control flow as data, then let it loose to run with an event loop. But this comes with some downsides: we’re giving up the control flow our languages already have, in order to switch over to a more complicated, explicitly constructed version. Why can’t we just write programs like we usually do?

There are two approaches to solving that problem. The first is to try to invent a “green threads” abstraction that hides the asynchrony, concurrency and event loop inside the language runtime. I’ll get to this idea in a bit, but I’m a skeptic. The more effective strategy, in my opinion, is to avoid hiding the important details, embrace visible event loops and asynchrony, but also expose a minimal language feature to help re-use traditional control flow.

That minimal feature is async/await.

The core abstraction

Async/await builds on top of a core abstraction like Promises (etc). It is essentially a way of easily constructing new promises.

I’m happy to see modern languages more readily introduce language-level features for library-level subjects. These can be foreach-like statements for iterator types, string interpolation syntax backed by a “string builder” library underneath, or do notation in Haskell for supporting monadic types. Async/await is another such feature: a bit of language built on top of a promise library.

In fact, promises and monads are deeply related, in that the promise type can be monadic. However, most languages have made choices that deviate from that a little bit. Rust, though, might be getting a truly monadic Future type out of accidental necessity, seemingly related to lifetime and efficiency concerns. We’ll see.
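Loosely speaking (and glossing over the ways Javascript promises bend the rules, like auto-flattening nested promises), then plays the role of monadic bind, and Promise.resolve plays the role of return:

// Promise.resolve is "return": it wraps a plain value in a promise.
var p = Promise.resolve(5);

// .then with a promise-returning function is "bind": it sequences a
// computation that itself produces a promise.
p.then(n => Promise.resolve(n + 1))
 .then(n => console.log(n)); // logs 6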

Syntactically, the first new keyword allows the declaration of async functions. Async functions always return a promise, though not all functions that return a promise are async functions (we’ll go into this distinction more in a bit). This promise return type might either be explicit (in C#, the return type of an async function is a Task<T>), or implicit (T but automatically wrapped). But in either case, the construction of that promise is taken care of automatically by having declared the function async. Indeed, the language is likely taking care of constructing several promise objects for you, composed together into the single promise the function returns.

Within a function that’s declared async, we gain the ability to use await to unwrap other promises. Instead of truly blocking and waiting for that promise to complete, what actually happens is that await serves as a marker of where this function is being broken up into multiple sub-promises.

A function like this:

// This whole function implicitly returns a promise
async function example(x) {
  var a = calculate1(x);
  var b = await query1(a); // query1 returns a promise, so unwrap it with await
  var c = calculate2(b);
  var d = await query2(c); // likewise query2 returns a promise
  return calculate3(d);
}

Would be written without async/await like this:

query1(calculate1(x)).then(b => query2(calculate2(b))).then(calculate3)

And since I don’t want anyone to come away with the impression this is more verbose: if we really wanted to be terse, the body of the original function could have been written like so:

calculate3(await query2(calculate2(await query1(calculate1(x)))))

(And the advantages over plain promises get clearer as soon as we start branching or looping.)

Although implementation details vary, something very much like this transformation from await to promises happens behind the scenes. Async/await just lets us write code in the manner we usually would, with the stopping points annotated with await keywords. (Sometimes there’s a preference for generators or state machines instead of composed promises, but they’re supposed to behave the same in the end.)
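As a sketch of that idea (not any particular engine’s actual implementation), the example function above could be rewritten as a generator that yields at each await, plus a small driver that resumes it whenever the yielded promise settles. Rejections are simply propagated outward here rather than thrown back into the generator:

// The async function from above, rewritten as a generator: await becomes yield.
function* exampleGen(x) {
  var a = calculate1(x);
  var b = yield query1(a);
  var c = calculate2(b);
  var d = yield query2(c);
  return calculate3(d);
}

// A minimal driver: resume the generator each time the yielded promise settles.
function run(gen) {
  return new Promise((resolve, reject) => {
    function step(input) {
      var { value, done } = gen.next(input);
      if (done) { resolve(value); return; }
      Promise.resolve(value).then(step, reject);
    }
    step(undefined);
  });
}

// run(exampleGen(x)) behaves like example(x) above.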

So at its core, an async function is just one that has internal gaps where it might return early (and it can return early because it returns a promise, not the final value) and then resume as events come in later.

This async function pauses twice to perform queries.

Where’s the concurrency?

But one thing is missing from the above story: how do we keep multiple balls in the air? What if we want to run query1 and query2 concurrently? That is, submit both queries before we wait? So far, all I’ve described is the ability for async functions to “block” without actually blocking the thread.

Unfortunately, this is where things splinter into a lot of different answers. The truth is: this is no longer about async/await. That language feature has done its job. Now the question is: how do those promises work?

And languages have different answers. Let’s take a look at three.

C#: Task

Microsoft deserves a lot of credit for pioneering async/await. It went pretty quickly from experimental design in F# to full implementation in C#, bringing the feature directly into a widely used mainstream language, and it was done well, too.

The design of Task (the C# promise type) was probably driven by two major factors:

  1. The C# designers probably had database queries in mind. Like the “PHP trick” from last week, what you’d like is for queries to be submitted as soon as they’re made, without pausing execution until you actually have to await one of the results. As a result, when a function like query1 is called and it returns a promise, the work has already begun executing (or has at least been scheduled). You don’t need to do anything special to get it to actually start. It starts eagerly.

  2. The C# designers are quite fond of multi-threaded parallel programming. They were absolutely not content with mere single-threaded concurrency, so C# Tasks are inherently parallel-capable. It’s not that a Task represents a thread; instead, it’s a unit of work that can be “stolen” by another thread in the worker thread pool. This allows the application to create exactly as many compute threads as there are CPU cores, and work as quickly as possible without significant overhead. It also means that when a Task is created… it can begin executing immediately, on a different thread from the one that spawned it.

This means that if we wanted to run two queries in parallel, we’d write (continuing to use my Javascript-like pseudocode syntax here):

async function example(x) {
  var a = query1(x);
  var b = query2(x);
  return f(await a, await b);
}

The mere act of calling query1 (but not yet awaiting it) can already have the Task running in the background on another thread. So in this situation, both the query1 and query2 Tasks are created and begin executing before this function’s own Task blocks waiting on either of them.

So this approach makes things work a bit like the PHP trick: we can put multiple queries out there, and only block once we need one of the results. But it’s also really well integrated with multi-threading, which is a nice advantage. If we want to run two compute tasks actually in parallel, we could even reach for Tasks to do it: our function above could replace query1 and query2 with compute functions, and we’d still gain a benefit.
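A minimal sketch of that, in the same pseudocode; compute1 and compute2 are hypothetical stand-ins for functions that hand CPU-bound work to the thread pool (in real C#, something like Task.Run):

async function example(x) {
  var a = compute1(x); // a Task a worker thread can pick up and run immediately
  var b = compute2(x); // a second Task, so both computations can proceed in parallel
  return f(await a, await b); // block only when we need the results
}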

Node: Promise

Javascript promises descended directly from the earlier callback-oriented design for asynchronous programming. As a result, the major design criterion was compatibility with that earlier approach and all the code written for it.

The result is that Javascript promises are also eager, though in a slightly different way than in C#. C# has multiple threads, so its design was oriented around the possibility of work-stealing thread pools. Javascript does not, so what we see instead is immediate execution of async functions on the calling thread, up until the first await.

So if we want to put two queries in flight simultaneously, we do basically the same thing as in C# above. It’s just… instead of there potentially being multiple threads, the main thread will submit the first query, the second query, and then pause, returning to the event loop.
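Javascript also offers Promise.all to collect several results with a single await, which reads a bit better as the number of in-flight promises grows:

async function example(x) {
  // Both queries are submitted immediately; the single await then yields
  // to the event loop until both promises have resolved.
  var [a, b] = await Promise.all([query1(x), query2(x)]);
  return f(a, b);
}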

Of course, there’s a lot more to Javascript promises (exceptions, cancellations, and so on), but this isn’t a tutorial on this language feature. The interesting part here is just what it does differently.

Rust: Future

Rust and C# make the most interesting design choices here. Rust Futures take the opposite design direction: instead of being eager, they are lazy. Or rather, they will be: the Rust developers are still working out the details of how this feature will make it into the standard library, but this much seems settled.

This means calling an async function just returns an object; it doesn’t execute anything at all. Async functions are more like closures than functions: even fully applied, they’re just a value, and they don’t get run until they’re actually evaluated (e.g. by await).

This comes with one major downside: In Rust, this function from before (again, using JS-like pseudocode):

async function example(x) {
  var a = query1(x);
  var b = query2(x);
  return f(await a, await b);
}

Will run query1 to completion and only then run query2: neither future starts until it is awaited, and await a happens first. (Rust specifies left-to-right argument evaluation order.)

This means that in Rust, you won’t get concurrent queries just from async/await alone. This is somewhat unfortunate, as that’s a major use-case for async/await. But it’s also not the end of the world.

In some ways, I might actually end up preferring this design. With the eager approach, actually concurrent execution is sort of implicit: you just happen to create multiple promises/tasks before you “block.” With Rust, if you want concurrency, you have to explicitly compose it. In some ways, this is nice: if you’re trying to spot places where you could be doing more concurrently, it’ll be much more obvious where you are not.

But this “explicitness” could also be illusory: explicit isn’t always better. One of the bits of programming language design lore is that when a feature is new, everyone always wants it to be explicit. But once the feature is better understood, people often want it to be less “noisy.” So… we’ll see.

But to make things happen at the same time with Rust futures, you have to be explicit about it, like so (more pseudo-code):

async function example(x) {
  var a = query1(x);
  var b = query2(x);
  var (ra, rb) = await futures_join(a, b);
  return f(ra, rb);
}

However, while this is a drawback, there are a lot of benefits, too. For one, cancellation is a total non-issue with Rust futures.

In Javascript and C#, there are explicit mechanisms to signal to the work pool that a promise should be removed and no longer executed. These mechanisms complicate things rather significantly sometimes. They are necessary because the work pool is inherently stateful: any time you create a promise/task, the pool is mutated to include it.

Rust doesn’t have cancellation. If you don’t want to do the work, don’t await that future. Done. The problem is gone, because with this design the work was never started in the first place.
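A rough sketch of the contrast, back in pseudocode (imagine both fragments inside an async function); Canceller is a made-up stand-in for mechanisms like Javascript’s AbortController or C#’s CancellationToken:

// Eager promises/tasks: the work is already running, so stopping it
// requires an explicit signal threaded through to the work itself.
var canceller = new Canceller();        // hypothetical cancellation handle
var task = query1(x, canceller.signal); // already executing
if (noLongerNeeded) canceller.cancel(); // must actively ask it to stop

// Lazy futures: nothing runs until awaited, so "cancelling" is just not awaiting.
var future = query1(x);                 // only a value; no work has started
if (stillNeeded) {
  var result = await future;            // the query runs only here
}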

This is probably fine.

I think the Rust approach is actually the best one, and I’d like to demonstrate why. Let’s pick a slightly more complicated example of an asynchronous function we want to run.

Our goal: two queries, with some post-processing done on each result as soon as it arrives, before we use both processed results to compute our final result.

Using promises directly, we would write something like this:

function example() {
  var a = query1().then(compute1);
  var b = query2().then(compute2);
  return Promise.all([a, b]).then(([x, y]) => compute3(x, y));
}

This is not too bad, but that’s because we’re assuming all these query and compute functions are actually separate functions. All the actual trouble with this purely promise-based approach arises when we’d really just like to write some simple inline code.

So, what do we do with C# async? Well, Task of course has ContinueWith (which is basically then), so we can always re-write the above code, but what about using async/await?

There’s… actually no good answer here, as far as I can tell. The trouble is that async/await is a linguistic abstraction for dealing with essentially just functions with gaps in them. Just that much. The way we went about adding local concurrency in C# and Javascript is a hack: the creation of a promise that executes eagerly is just an implicit mutation of some global state to add it to a work queue. It’s not a general solution to the problem, as we can see here.

If we want to write this function using async/await, we essentially have no choice but to break it up into multiple async functions:

async function example() {
  async function branch1() {
    return compute1(await query1());
  }
  async function branch2() {
    return compute2(await query2());
  }
  ... // fill in here
}

These two branches are quite amenable to being implemented as async functions because they’re “just sequential functions with gaps.” From here, we can see the two different styles for how to finish the function.

Eager-style:

  var x = branch1(); // spawns it, but doesn't wait
  var y = branch2(); // ditto
  return compute3(await x, await y);

Or lazy-style:

  var (x, y) = await futures_join(branch1(), branch2()); // runs both concurrently
  return compute3(x, y);

Suddenly, the disadvantage of the Rust style isn’t so sharp. As soon as you want to do something even slightly more complicated than just submit multiple queries, you’re either re-writing the C# code to be more like what you’d have written in Rust, or you’re skipping async/await and going back to then call chaining on the promises again.

PS: Async functions vs returning a promise

With the Javascript approach, there’s no difference. If a function returns a promise, it’s async. If it’s declared async, all that happens is return values automatically get wrapped into promises, and you can use await. But either way, the function runs immediately.
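For example (the names here are just for illustration), these two are interchangeable from the caller’s point of view:

async function getValueA() {
  return 42;                    // automatically wrapped: the caller gets a promise of 42
}

function getValueB() {
  return Promise.resolve(42);   // the same thing, wrapped by hand
}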

With C#, there’s a minor difference: parts of an async function could end up running on another thread, whereas calling an ordinary function that returns a Task means that function itself runs on this thread before it eventually spawns a new Task.

With Rust, there’s a huge difference. Normal functions actually run when they’re called, whereas async functions are glorified data type constructors. So if you write an ordinary function that returns a future, it runs immediately and then stops by returning the future, which has not yet run. This can actually be essential for lifetime reasons: you might want to borrow a reference to a value but not store that reference in the async block, instead immediately using it to compute a derived value that you do close over.

So an async function looks like this (now dipping into Rust syntax):

async fn print_async() {
    println!("I won't print until polled");
}

While a synchronous function that returns a future looks like this:

use std::future::Future;

fn print_both(arg: &str) -> impl Future<Output = ()> {
    println!("Hi {0}, I am printed immediately", arg);

    // The returned async block does not borrow `arg`, so it can outlive it.
    async {
        println!("I won't print until polled, without borrowing that string.");
    }
}

I could see some Rust developers initially wanting to reach for this pattern for things like database queries, to try to emulate the PHP trick. I’m not sure how that will go, for the reasons I discussed in the previous section: it doesn’t really “scale” to more complex kinds of concurrency. We’ll see how things work out, but I suspect this second form won’t be used all that often.

End notes