
Multithreading, multiprocessing, and asyncio

6 min read · software

How to choose the right concurrency model

introduction

Python provides three main approaches to concurrency: multithreading, multiprocessing, and asyncio. Choosing the right model is crucial for writing efficient programs.

Without concurrency, a program processes one task at a time — and wastes CPU cycles sitting idle during I/O operations. Understanding the trade-offs between these models lets you pick the right tool for the job.

fundamentals of concurrency

concurrency vs parallelism

Concurrency is about managing multiple tasks at the same time — but not necessarily running them simultaneously. It creates the illusion of multitasking by switching between tasks quickly.

Parallelism, on the other hand, runs multiple tasks simultaneously using multiple CPU cores. True parallelism requires hardware support.

Visual representation of concurrency vs parallelism

programs

A program is a static file — like a Python script sitting on disk. It's passive and does nothing until the OS loads it into memory and begins executing it.

Diagram showing the relationship between programs, processes, and threads

processes

A process is an independent instance of a running program. Each process has its own memory space, resources, and execution state. Processes are isolated from each other — unless they explicitly communicate via IPC (inter-process communication).

Processes can be categorised into two types based on what they spend most of their time doing:

  • I/O-bound: The process spends most of its time waiting for input/output (network requests, disk reads, etc.), and the CPU sits idle.
  • CPU-bound: The process is doing heavy computation, keeping the CPU busy.
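A minimal sketch of the distinction, using `time.sleep()` as a stand-in for real I/O (the function names here are illustrative, not standard):

```python
import time

def io_bound_task():
    # Stand-in for a network call or disk read: the CPU sits idle while waiting
    time.sleep(0.1)
    return "response"

def cpu_bound_task(n):
    # Pure computation: the CPU is busy for the whole duration
    return sum(i * i for i in range(n))

print(io_bound_task())        # spends ~0.1s waiting, doing no computation
print(cpu_bound_task(1_000))  # spends all of its time computing
```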

A process goes through a lifecycle: new → ready → running → waiting → terminated, cycling between ready, running, and waiting until it finishes.

threads

A thread is the smallest unit of execution within a process. Think of a process as a "container" for threads — every process has at least one thread (the main thread).

Threads within the same process share memory and resources. This makes communication between them fast, but it also opens the door to problems like race conditions and deadlocks. One misbehaving thread can crash the entire process.
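A hypothetical counter illustrates the problem: `counter += 1` is a read-modify-write, not an atomic operation, so two threads incrementing concurrently can lose updates unless they synchronise with a lock.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        # Without the lock, two threads can interleave the read-modify-write
        # and lose updates (a race condition)
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 200000 with the lock; often less without it
```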

how does the OS manage threads and processes?

A CPU can only execute one task per core at any given time. To handle multiple tasks, the OS uses preemptive context switching — it decides when to pause one task and resume another.

Process context switching is more resource-intensive because each process has its own separate memory space that needs to be swapped in and out.

Thread context switching is faster because threads within the same process share memory — there's less state to save and restore.

True parallel execution requires multiple CPU cores. On a single-core machine, even with multiple threads or processes, you're only getting concurrency (fast switching), not parallelism.

Python's concurrency models

Summary of Python's concurrency models

multithreading

Multithreading allows multiple threads to run concurrently within a single process, sharing memory. However, in Python, multithreading is limited by the GIL.

Python's Global Interpreter Lock (GIL)

The GIL is a mutex that allows only one thread to execute Python bytecode at any given time. It exists because CPython's memory management (reference counting) is not thread-safe, and a single global lock was the simplest way to protect it.

When is the GIL a bottleneck?

  • Single-threaded programs: The GIL is irrelevant — there's only one thread.
  • Multithreaded, I/O-bound: Less problematic. Threads release the GIL when they're waiting for I/O, so other threads can run.
  • Multithreaded, CPU-bound: Significant bottleneck. Multiple threads competing for CPU time can only execute Python bytecode one at a time.

Note: time.sleep() is treated as an I/O operation and releases the GIL, allowing other threads to execute while one is sleeping.
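A quick sketch of this in action: three threads each sleeping for one second finish in roughly one second total, not three, because each releases the GIL while it sleeps.

```python
import threading
import time

def wait_one_second():
    time.sleep(1)  # releases the GIL while sleeping

start = time.perf_counter()
threads = [threading.Thread(target=wait_one_second) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

print(f"{elapsed:.2f}s")  # roughly 1s, not 3s: the sleeps overlap
```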

multiprocessing

Multiprocessing runs multiple processes in parallel, each with its own memory space, GIL, and resources. Because each process has its own GIL, multiprocessing bypasses the GIL limitation entirely.

This makes it suitable for CPU-bound tasks where you need true parallelism. The trade-off is that it's more resource-intensive — spawning and managing processes has more overhead than threads, and sharing data between processes requires explicit IPC.
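A minimal sketch using `multiprocessing.Pool` (the function name `cpu_heavy` is just for illustration). The `if __name__ == "__main__"` guard is required because worker processes re-import the module.

```python
import multiprocessing

def cpu_heavy(n):
    # Pure computation: benefits from running on a separate core
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Each worker is a separate process with its own interpreter and GIL,
    # so the four calls can run in parallel on four cores
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(cpu_heavy, [10_000] * 4)
    print(results)
```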

asyncio

Asyncio uses a single thread to manage multiple tasks using async/await keywords. It's built around a few key concepts:

  • Coroutines: Functions defined with async def. They can be paused and resumed.
  • Event loop: The central scheduler that manages and runs coroutines.
  • Tasks: Wrappers around coroutines that allow them to be scheduled concurrently.
  • await: Pauses execution of the current coroutine until the awaited result is ready.

How it works: The event loop schedules tasks and runs them one at a time. When a task hits an await, it voluntarily pauses, and the event loop picks up another task that's ready to run. This makes asyncio ideal for scenarios with many small tasks that involve waiting (like network requests).

Because everything runs on a single thread, asyncio avoids the overhead of thread switching entirely.

Key difference: Multithreading uses OS-level preemptive switching (the OS decides when to switch). Asyncio uses cooperative multitasking (tasks decide when to yield control).

method 1: await coroutine

Calling await on a coroutine runs it sequentially within the current coroutine. Only the current coroutine pauses — the rest of the program continues running. This is non-blocking at the program level, but sequential within the calling coroutine.

python
import asyncio

async def fetch_data():
    print("Fetching data...")
    await asyncio.sleep(1)  # Simulate a network call
    print("Data fetched")
    return "data"

async def main():
    result = await fetch_data()  # Current coroutine pauses here
    print(f"Result: {result}")

asyncio.run(main())

method 2: asyncio.create_task(coroutine)

Using create_task() schedules the coroutine to run concurrently in the background. The current coroutine continues immediately without waiting. This is how you enable true concurrency in asyncio.

python
import asyncio

async def fetch_data():
    # Simulate a network call
    await asyncio.sleep(1)
    return "data"

async def main():
    # Schedule fetch_data to run concurrently
    task = asyncio.create_task(fetch_data())
    # Simulate doing other work
    await asyncio.sleep(5)
    # Now, await the task to get its result
    result = await task
    print(result)

asyncio.run(main())
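When you want several coroutines to run concurrently and need all of their results, `asyncio.gather()` is a common shorthand for creating and awaiting multiple tasks (the `fetch` helper below is illustrative):

```python
import asyncio

async def fetch(name, delay):
    await asyncio.sleep(delay)  # simulate a network call
    return name

async def main():
    # gather schedules both coroutines concurrently and waits for all of them,
    # so the total time is roughly the slowest call, not the sum
    results = await asyncio.gather(fetch("a", 1), fetch("b", 1))
    print(results)

asyncio.run(main())
```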

mixing sync and async code

You can bridge synchronous and asynchronous code using asyncio.to_thread(), which runs a sync function in a separate thread without blocking the event loop.

python
import asyncio
import time

def sync_task():
    time.sleep(2)
    return "Completed"

async def main():
    result = await asyncio.to_thread(sync_task)
    print(result)

asyncio.run(main())

For CPU-bound tasks within an async program, offload them to a separate process to avoid blocking the event loop.
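One way to sketch this is with `loop.run_in_executor()` and a `ProcessPoolExecutor` (the `cpu_heavy` function is a placeholder for your actual computation):

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n):
    # Pure computation that would block the event loop if run inline
    return sum(i * i for i in range(n))

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # The event loop stays free while the work runs in another process
        result = await loop.run_in_executor(pool, cpu_heavy, 10_000)
    print(result)

if __name__ == "__main__":
    asyncio.run(main())
```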

when should I use which concurrency model?

Flowchart for choosing the right concurrency model

  • Multiprocessing: Best for CPU-bound tasks. Bypasses the GIL and achieves true parallelism across multiple cores.
  • Multithreading: Best for fast I/O-bound tasks with a moderate number of concurrent operations, where the overhead of OS threads is acceptable and you want to keep using blocking libraries.
  • Asyncio: Best for slow I/O-bound tasks (long network requests, database queries) where you need to scale to many concurrent operations efficiently.

closing thoughts

Concurrency in Python isn't one-size-fits-all. The GIL makes the choice more nuanced than in other languages, but once you understand the trade-offs — threads for quick I/O, processes for CPU work, and asyncio for scalable async I/O — picking the right model becomes straightforward.