Some AP functionality has been available since Python 3.5; however, key functions such as asyncio.run are only available from 3.7 (released in June 2018), so install that version if you want to follow along.
Batch downloading of web pages for scraping is a common task in Python — but terribly slow. For example, my project planetmoney-rss downloads every Planet Money episode page (there’s over 1000!) and assembles them into an RSS feed for use in a podcast player. Previously extracted data is of course cached, but the initial download takes over 15 minutes — uncomfortable if you are trying to test or debug it.
Let us try to understand why synchronous batch download is so slow:
As you can see in the figure, when not using AP, we have to wait for each request to finish its round-trip to the server and back again; during this time our program cannot do anything useful. With AP our code for request #1 voluntarily gives up control until its response is ready, allowing code for request #2 to run, and later request #3. This difference shows up in the total time taken by each program: about 1 RTT (one round-trip time) instead of 3 RTT.
This guy reduced his download time for NBA player statistics from 12 minutes to 22 seconds using AP. I will definitely be rewriting planetmoney-rss to use AsyncIO now!
Say we have a discord bot and want to do some maintenance in the bot every 24 hours (in my case clearing a dictionary to prevent it from growing infinitely, but this could also be a daily morning/goodnight message).
A naive implementation
time.sleep(24 * 60 * 60)  # sleep 24 hours
send_message('Another day has passed!')
runs into the obvious problem of being unable to respond to any messages it sees since it is busy sleeping.
What would an AP-based implementation look like?
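The original listing is not reproduced here; a sketch of what it might look like, written against the discord.py API (newer discord.py versions also require an intents argument to discord.Client; the maintenance dictionary and the token placeholder are hypothetical):

```python
import asyncio
import discord  # third-party: pip install discord.py

client = discord.Client()
seen_users = {}  # hypothetical dictionary that would otherwise grow forever

@client.event
async def on_ready():
    while True:
        # Non-blocking sleep: the event loop keeps dispatching other events.
        await asyncio.sleep(24 * 60 * 60)
        print('Another day has passed!')
        seen_users.clear()  # the daily maintenance

@client.event
async def on_message(message):
    if message.author == client.user:
        return
    await message.channel.send('I can still respond to your messages!')

client.run('your-bot-token')
```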
This implementation makes the bot output “Another day has passed!” every 24 hours, but still keeps the bot responsive as evidenced by replying “I can still respond to your messages!” every time any user sends a message.
Concurrency in a single thread
So how does AP work? The key concept here is *inversion of control*: your code is not in charge, an event loop is. The event loop runs continually and calls your functions (called coroutines in an AP context), which run until they terminate or yield. Termination occurs when we reach the end of a coroutine, hit a return, or throw an exception. For yielding, consider await coroutine(): it looks like any other subroutine call and, just like one, pushes a frame onto the call stack. When synchronous code hits a blocking call, we wait (with unchanged call stack) until the result comes back. When asynchronous code hits a normally-blocking call, we instead unwind the call stack and travel all the way back up to the event loop; we yield.
Once we yield, the event loop is free to run some other coroutine if the events they were waiting for have occurred — these coroutines will also yield control back to the event loop at some point. The event loop continually checks if the result of any suspended coroutine is available by polling all sockets (that were registered by the routines) for data. Once a result is available, the event loop will deal with it by restoring the respective coroutine’s call stack and jumping back into the coroutine at the exact point where it yielded.
But to the programmer it all just looks like any other subroutine call, except that you make it with the await keyword; this is by design. The call can even return a value. Think of await as meaning "I'm doing a normal function call here, but somewhere deep down the call stack we might yield", and the event loop will take care of resuming everything for you.
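A minimal sketch of this (function names are hypothetical): the await site reads like an ordinary call and receives a return value, even though the coroutine yields to the event loop partway through.

```python
import asyncio

async def fetch_answer():
    # Somewhere down here we yield to the event loop...
    await asyncio.sleep(0.1)
    return 42

async def main():
    # ...but at the call site it reads like a plain function call.
    answer = await fetch_answer()
    return answer

result = asyncio.run(main())
print(result)
```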
Let us look at the discord bot from above as an example:
on_ready gets called when the bot starts up, but we immediately hit asyncio.sleep, a *non-blocking sleep* which relinquishes control back to the event loop until 24 hours have passed. If any on_message events happen during that window, the event loop can dispatch the on_message coroutine with little latency. After the 24 hours are up, the event loop "calls" on_ready again; more precisely, we continue execution at the line where we left off and print a message to the terminal.
await message.channel.send prepares a request to send (looking up IP, creating a packet, and all that jazz), sends it, and relinquishes control to the event loop. In the meantime the request travels to the Discord servers, gets processed and acknowledged there, and the acknowledgment is sent back, similar to our first motivating example above. Once the event loop gets around to checking sockets our await can complete.
Conclusion: Co-routines co-operate with each other by giving up control voluntarily. Control returns to the event loop which manages co-routine states and execution. In contrast, a subroutine (i.e. a normal, non-AP function call) runs until exit (by return, exception, or otherwise).
Tasks & Entry Points
Take a look at these two “use” functions:
use_await cannot continue execution until each say_after has been resolved (awaited), so it takes a total of 5 seconds, with output printed in order.
*create_task* (pre-3.7: ensure_future) works differently, however: it adds a task to the event loop (here task_a), but execution continues unhindered; no waiting is required before task_b is also launched. await in this context blocks execution until the already-launched task completes. In this case task_b completes first and prints 'hello', then task_a finishes, at which point use_tasks can continue. await task_b is hit but returns immediately as task_b has already finished, resulting in a total of 3 seconds. In essence, all three coroutines ran concurrently.
A pitfall here is to start a task but not await it before using its results — this happened to me and I did not notice as the code just used results from the last invocation. Another pitfall is to forget to wrap the coroutine into a task: task = coroutine(…); await task (wrong) instead of task = asyncio.create_task( coroutine(…) ); await task.
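A minimal sketch of the second pitfall (function names are hypothetical): a bare coroutine object does nothing until awaited, whereas a task starts running on the event loop immediately.

```python
import asyncio

async def compute():
    await asyncio.sleep(0.1)
    return 7

async def main():
    # Pitfall: a bare coroutine object is NOT running in the background;
    # nothing happens until it is awaited.
    coro = compute()

    # Correct: wrapping it in a task schedules it on the event loop right away.
    task = asyncio.create_task(compute())

    result = await task   # wait for the task's result
    await coro            # the bare coroutine only starts running here
    return result

print(asyncio.run(main()))
```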
An *entry point* is needed to call async code from sync code, setting up and entering the event loop. This is demonstrated on the last two lines using asyncio.run — by design we could use any other library’s event loop here too. Event loops are familiar from video games or GUI programming — you define event handlers for keyboard and other events which then get called by the event loop.
Let us look at AP code for batch downloading the Discord default avatars:
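The listing itself is not shown; a sketch using the third-party aiohttp library (pip install aiohttp) might look like the following. The CDN URL scheme and the count of five avatars are assumptions.

```python
import asyncio
import aiohttp  # third-party: pip install aiohttp

async def download(session, i):
    # URL scheme for Discord's default avatars (an assumption here)
    url = f'https://cdn.discordapp.com/embed/avatars/{i}.png'
    async with session.get(url) as response:
        data = await response.read()
    with open(f'avatar_{i}.png', 'wb') as f:
        f.write(data)

async def main():
    # async with: the session's setup and teardown are coroutines themselves
    async with aiohttp.ClientSession() as session:
        aws = [asyncio.create_task(download(session, i)) for i in range(5)]
        await asyncio.wait(aws)

asyncio.run(main())
```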
This introduces two new concepts, though both are mostly syntactic sugar.
await asyncio.wait(aws) (and the similar asyncio.gather) is used to await a list of awaitables together and is roughly equivalent to for aw in aws: await aw; execution only continues once all awaitables have finished. Even better: if the awaitables return something, asyncio.gather collects the return values into a list! (results = await asyncio.gather(*aws))
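For instance, a small sketch of gather collecting return values (function names are hypothetical); the results come back in the same order as the awaitables were passed in:

```python
import asyncio

async def double(x):
    await asyncio.sleep(0.01)
    return 2 * x

async def main():
    aws = [double(i) for i in range(4)]
    # gather runs them concurrently and preserves argument order
    return await asyncio.gather(*aws)

results = asyncio.run(main())
print(results)
```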
async with is related to context managers and the with-statement. While context managers can seem like magic, they simply call a predefined function when entering a block of code and a different one when exiting it, yet they make code which opens and closes resources so much more readable. (Context managers can also handle exceptions, but most choose not to.) async with simply allows both the enter and exit functions to be coroutines.
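Defining one yourself only takes two methods; a minimal sketch (the Connection class is hypothetical) where both the enter and exit hooks are coroutines:

```python
import asyncio

class Connection:
    # An async context manager: __aenter__ and __aexit__ are coroutines,
    # so they may await (e.g. a network handshake) without blocking.
    async def __aenter__(self):
        await asyncio.sleep(0.01)   # stand-in for opening the resource
        self.open = True
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await asyncio.sleep(0.01)   # stand-in for closing it
        self.open = False

async def main():
    async with Connection() as conn:
        assert conn.open            # entered: resource is open
    return conn.open                # exited: resource was closed

print(asyncio.run(main()))
```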
As described in the first motivating example, the code sets up a download task, then instead of waiting sets up the next task, and so on, so that many requests are in flight concurrently.
Let us recap the new Python keywords introduced for AP:
- await — Do an operation and only continue once the result is ready. The operation might relinquish control to the event loop. The operation must be ‘awaitable’, typically a coroutine (async def), a task, or a future.
- async def — Any function registered to the event loop must be defined as async def, marking it as a *coroutine*.
- async with — Use a context manager which might await during entry and/or exit.
Any function containing an await, async for or async with must be a coroutine (async def).
Versus Multiple Threads
Now we understand how we can write asynchronous programs in Python, and everything is nice and well, right? “Why not use multi-threading”, I hear you cry.
Multi-threading does indeed follow a very similar idea — while one thread is waiting on some resource, schedule another waiting thread to run (with each thread being responsible for downloading one file for example). The difference here is that each thread can be pre-empted (paused by the runtime) at any line rather than at fixed await points as in AP, so synchronization is needed when accessing a shared list for example.
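A sketch of why that synchronization is needed (a shared counter stands in for the shared list; names are hypothetical): the read-modify-write of the counter must be wrapped in a lock because a thread can be pre-empted in the middle of it.

```python
import threading

counter = 0
lock = threading.Lock()

def work():
    global counter
    for _ in range(100_000):
        # A thread can be pre-empted between reading and writing `counter`,
        # so the read-modify-write is protected by a lock.
        with lock:
            counter += 1

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)
```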
Usually, multi-threading has the ability to use multiple CPU cores, but because the CPython interpreter maintains a GIL (global interpreter lock), only one thread can execute Python bytecode at a time, so performance would not exceed that of asynchronous code anyway. The multiprocessing library can launch multiple independent Python interpreters at once, but the startup is slow and resource-intensive compared to using coroutines, and communication between the processes is a pain.
A caveat of using AP is that CPU-intensive operations need to be avoided, because we want to return control to the event loop frequently. For example, if we fail to return control in a GUI event loop, the GUI stops responding to user input and starts to hang. AP is also of little benefit for large data transfers (as opposed to the small images in the example above): the bottleneck then becomes network bandwidth rather than round-trip latency.
If you think you’ve heard the concept of holding state in a function and voluntarily giving up control in Python before, you are not wrong! Generators have been in Python since 2.2. They generate one object at a time which is meant to be used in a loop, so they are a special kind of iterator. A simple version of the common range object could be implemented like so:
def range(start, stop, step):
    while start < stop:
        yield start
        start += step
In fact, generators are also called semi-coroutines, and yield from was used in place of await in Python 3.4’s implementation of coroutines. yield from goes one call deeper into the generator call stack and yield saves the stack and backs up to the original non-generator call site — the exact two operations we have in AP. However one difference here to AP is that yield returns control back to the call site until next is invoked again, not an event loop.
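To make the suspend/resume behaviour concrete, here is such a generator (named my_range to avoid shadowing the built-in) advanced by hand with next():

```python
def my_range(start, stop, step):
    while start < stop:
        yield start        # suspend here, handing `start` to the caller
        start += step      # resume here on the next next() call

gen = my_range(0, 10, 3)   # nothing runs yet; we only get a generator object
first = next(gen)          # runs up to the first yield, then suspends: 0
second = next(gen)         # resumes right after the yield: 3
rest = list(gen)           # exhausts the generator: [6, 9]
```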
AP can be a powerful tool for speeding up programs, particularly ones heavy in I/O such as reads from the web or a large database. AP can be complex to understand, but it is worth it. Sadly Python is too old for AP to be the default as it is in Go, leading to somewhat cluttered await/async syntax and limited interoperability between AP and non-AP code. If anything is still unclear I recommend this StackOverflow question (answers from Jul 4 '18 and Feb 27 '18).