# Free Threading Building Blocks

This document describes the low-level primitives used to implement
free-threaded (nogil) CPython. These are internal implementation details
and may change between versions.

For a higher-level overview of free threading aimed at Python users, see
the [free-threading HOWTO](../Doc/howto/free-threading-python.rst). For
C extension authors, see the
[free-threading extensions HOWTO](../Doc/howto/free-threading-extensions.rst).


## Object Header Layout

In free-threaded builds (`Py_GIL_DISABLED`), every `PyObject` has an
extended header (defined in `Include/object.h`):

```c
struct _object {
    // ob_tid stores the thread id (or zero). It is also used by the GC and the
    // trashcan mechanism as a linked list pointer and by the GC to store the
    // computed "gc_refs" refcount.
    uintptr_t ob_tid;          // (declared with _Py_ALIGNED_DEF for alignment)
    uint16_t ob_flags;
    PyMutex ob_mutex;          // per-object lock
    uint8_t ob_gc_bits;        // gc-related state
    uint32_t ob_ref_local;     // local reference count
    Py_ssize_t ob_ref_shared;  // shared (atomic) reference count
    PyTypeObject *ob_type;
};
```

Key fields:

- **`ob_tid`**: The thread id of the owning thread. Set to
  `_Py_UNOWNED_TID` (0) for immortal objects or objects whose reference
  count fields have been merged. Also reused by the GC and the trashcan
  mechanism as a linked list pointer.

- **`ob_mutex`**: A one-byte mutex (`PyMutex`) used by the critical
  section API. Must not be locked directly; use `Py_BEGIN_CRITICAL_SECTION`
  instead (see [Per-Object Locking](#per-object-locking-critical-sections)
  below).

- **`ob_ref_local`**: Reference count for references held by the owning
  thread. Non-atomic — only the owning thread modifies it. A value of
  `UINT32_MAX` (`_Py_IMMORTAL_REFCNT_LOCAL`) indicates an immortal object.

- **`ob_ref_shared`**: Atomic reference count for cross-thread references.
  The two least-significant bits store state flags (see
  [Shared Reference Count Flags](#shared-reference-count-flags) below).
  The actual reference count is stored in the upper bits, shifted by
  `_Py_REF_SHARED_SHIFT` (2).

- **`ob_gc_bits`**: Bit flags for the garbage collector (see
  [GC Bit Flags](#gc-bit-flags) below).


## Thread Ownership

Each object is "owned" by the thread that created it. The owning thread
can use fast, non-atomic operations on `ob_ref_local`. Other threads
must use atomic operations on `ob_ref_shared`.

| Primitive | Header | Description |
|---|---|---|
| `_Py_ThreadId()` | `Include/object.h` | Returns the current thread's id (`uintptr_t`) using platform-specific TLS. |
| `_Py_IsOwnedByCurrentThread(op)` | `Include/object.h` | Returns non-zero if `op->ob_tid` matches the current thread. Uses an atomic load under ThreadSanitizer. |
| `_Py_UNOWNED_TID` | `Include/object.h` | Constant (0) indicating no owning thread — used for immortal objects and objects with merged refcounts. |


## Reference Counting Primitives

Free-threaded CPython splits the reference count into a local part
(`ob_ref_local`) and a shared part (`ob_ref_shared`). The owning thread
increments `ob_ref_local` directly. Other threads atomically update
`ob_ref_shared`.

### Incrementing References

| Primitive | Header | Description |
|---|---|---|
| `_Py_TryIncrefFast(op)` | `pycore_object.h` | Tries to increment the refcount using the fast (non-atomic) path. Succeeds only for immortal objects or objects owned by the current thread. Returns 0 if the object requires an atomic operation. |
| `_Py_TryIncRefShared(op)` | `pycore_object.h` | Atomically increments `ob_ref_shared` using a compare-and-exchange loop. Fails (returns 0) if the raw shared field is 0 or `_Py_REF_MERGED` (0x3). |
| `_Py_TryIncrefCompare(src, op)` | `pycore_object.h` | Increfs `op` (trying fast path, then shared) and verifies that `*src` still points to `op`. If `*src` changed concurrently, decrefs `op` and returns 0. |
| `_Py_TryIncref(op)` | `pycore_object.h` | Convenience wrapper: tries `_Py_TryIncrefFast`, then falls back to `_Py_TryIncRefShared`. In GIL builds, checks `Py_REFCNT(op) > 0` and calls `Py_INCREF`. |

### Safe Pointer Loading

These load a `PyObject *` from a location that may be concurrently
updated by another thread, and return it with an incremented reference
count:

| Primitive | Header | Description |
|---|---|---|
| `_Py_XGetRef(ptr)` | `pycore_object.h` | Loads `*ptr` and increfs it, retrying in a loop until it succeeds or `*ptr` is NULL. The writer must set `_Py_REF_MAYBE_WEAKREF` on the stored object. |
| `_Py_TryXGetRef(ptr)` | `pycore_object.h` | Single-attempt version of `_Py_XGetRef`. Returns NULL on failure, which may be due to a NULL value or a concurrent update. |
| `_Py_NewRefWithLock(op)` | `pycore_object.h` | Like `Py_NewRef` but also optimistically sets `_Py_REF_MAYBE_WEAKREF` on objects owned by a different thread. Uses the fast path for local/immortal objects, falls back to an atomic CAS loop on `ob_ref_shared`. |

### Shared Reference Count Flags

The two least-significant bits of `ob_ref_shared` encode state
(defined in `Include/refcount.h`):

| Flag | Value | Meaning |
|---|---|---|
| `_Py_REF_SHARED_INIT` | `0x0` | Initial state, no flags set. |
| `_Py_REF_MAYBE_WEAKREF` | `0x1` | Object may have weak references. |
| `_Py_REF_QUEUED` | `0x2` | Queued for refcount merging by the owning thread. |
| `_Py_REF_MERGED` | `0x3` | Local and shared refcounts have been merged. |

Helpers:

| Primitive | Header | Description |
|---|---|---|
| `_Py_REF_IS_MERGED(ob_ref_shared)` | `pycore_object.h` | True if the low two bits equal `_Py_REF_MERGED`. |
| `_Py_REF_IS_QUEUED(ob_ref_shared)` | `pycore_object.h` | True if the low two bits equal `_Py_REF_QUEUED`. |
| `_Py_ExplicitMergeRefcount(op, extra)` | `pycore_object.h` | Merges the local and shared reference count fields, adding `extra` to the refcount when merging. |
| `_PyObject_SetMaybeWeakref(op)` | `pycore_object.h` | Atomically sets `_Py_REF_MAYBE_WEAKREF` on `ob_ref_shared` if no flags are currently set. No-op for immortal objects or objects already in WEAKREF/QUEUED/MERGED state. |


## Biased Reference Counting

Each object has two refcount fields: `ob_ref_local` (modified only by
the owning thread) and `ob_ref_shared` (modified atomically by all
other threads). The true refcount is the sum of both fields. The
`ob_ref_shared` field can be negative (although the total refcount must
be at least zero).

When a non-owning thread calls `Py_DECREF`, it calls `_Py_DecRefShared`,
which decrements `ob_ref_shared`. If the count portion of
`ob_ref_shared` is nonzero, the decrement simply subtracts; the shared
count can go negative. If the count portion is already zero (even if
the `_Py_REF_MAYBE_WEAKREF` flag is set), the thread does not subtract
from the refcount. Instead, it sets the `_Py_REF_QUEUED` flag and
enqueues the object for the owning thread to merge. The queue holds a
reference to the object. The owning thread is notified via the eval
breaker mechanism.

When the owning thread processes its merge queue (via
`_Py_brc_merge_refcounts`), it calls `_Py_ExplicitMergeRefcount` for
each queued object, which merges `ob_ref_local` and `ob_ref_shared`
into a single value, sets `ob_tid` to 0, and sets the `_Py_REF_MERGED`
flag. If the merged refcount is zero, the object is deallocated.

Similarly, when the owning thread's own `Py_DECREF` drops `ob_ref_local`
to zero, it calls `_Py_MergeZeroLocalRefcount`, which either deallocates
the object immediately (if `ob_ref_shared` is also zero) or gives up
ownership by setting `ob_tid` to 0 and the `_Py_REF_MERGED` flag,
deallocating the object if the combined refcount is zero.

Defined in `Include/internal/pycore_brc.h`:

| Primitive | Description |
|---|---|
| `_Py_brc_queue_object(ob)` | Enqueues `ob` to be merged by its owning thread. Steals a reference to the object. |
| `_Py_brc_merge_refcounts(tstate)` | Merges the refcounts of all objects queued for the current thread. |

The per-interpreter state (`struct _brc_state`) uses a hash table of
`_Py_BRC_NUM_BUCKETS` (257) buckets keyed by thread id. Each bucket is
protected by its own `PyMutex`. Per-thread state
(`struct _brc_thread_state`) stores a stack of objects to merge.


## Per-Object Locking (Critical Sections)

Instead of locking `ob_mutex` directly, code must use the **critical
section API**. Critical sections integrate with the thread state to
support suspension (e.g., when a thread needs to release locks for GC)
and deadlock avoidance.

Defined in `Include/cpython/critical_section.h` (public) and
`Include/internal/pycore_critical_section.h` (internal):

| Macro | Description |
|---|---|
| `Py_BEGIN_CRITICAL_SECTION(op)` | Acquires `op->ob_mutex` via the critical section stack. |
| `Py_END_CRITICAL_SECTION()` | Releases the lock and pops the critical section. |
| `Py_BEGIN_CRITICAL_SECTION2(a, b)` | Acquires locks on two objects. Sorts mutexes by address to prevent deadlocks. If both arguments are the same object, degrades to a single-mutex critical section. |
| `Py_END_CRITICAL_SECTION2()` | Releases both locks. |
| `Py_BEGIN_CRITICAL_SECTION_SEQUENCE_FAST(op)` | Specialized variant for `PySequence_Fast` — only locks if `op` is a list (`PyList_CheckExact`). Tuples are immutable and need no locking. |

All critical section macros are **no-ops** in GIL-enabled builds (they
expand to just `{` and `}`).

### PyMutex

`PyMutex` (defined in `Include/cpython/pylock.h`) is a one-byte mutex:

```
Bit layout (only the two least significant bits are used):
  0b00  unlocked
  0b01  locked
  0b10  unlocked and has parked threads
  0b11  locked and has parked threads
```

The fast path uses a single compare-and-swap. If contended, the calling
thread is "parked" until the mutex is unlocked. If the current thread
holds the GIL, the GIL is released while the thread is parked.

| Function | Description |
|---|---|
| `PyMutex_Lock(m)` | Locks the mutex. Parks the calling thread if contended. |
| `PyMutex_Unlock(m)` | Unlocks the mutex. Wakes a parked thread if any. |
| `PyMutex_IsLocked(m)` | Returns non-zero if currently locked. |


## Immortalization

Immortal objects are never deallocated and their reference counts are
never modified. In free-threaded builds, immortality is indicated by
`ob_ref_local == UINT32_MAX` (`_Py_IMMORTAL_REFCNT_LOCAL`).

Defined in `Include/refcount.h` and `Include/internal/pycore_object.h`:

| Primitive | Description |
|---|---|
| `_Py_IsImmortal(op)` | True if `op` is immortal. |
| `_Py_IsStaticImmortal(op)` | True if `op` was statically allocated as immortal. |
| `_Py_SetImmortal(op)` | Promotes `op` to immortal. |
| `_Py_SetMortal(op, refcnt)` | Demotes `op` back to mortal. Should only be used during runtime finalization. |


## Deferred Reference Counting

Frequently shared objects (e.g., top-level functions, types, modules)
use **deferred reference counting** to avoid contention on their
reference count fields. These objects add `_Py_REF_DEFERRED`
(`PY_SSIZE_T_MAX / 8`) to `ob_ref_shared` so that they are not
immediately deallocated when the non-deferred reference count drops to
zero. They are only freed during cyclic garbage collection.

Defined in `Include/internal/pycore_object_deferred.h`:

| Primitive | Description |
|---|---|
| `_PyObject_SetDeferredRefcount(op)` | Marks `op` as using deferred reference counting. Objects that use deferred reference counting should be tracked by the GC so that they are eventually collected. No-op in GIL builds. |
| `_PyObject_HasDeferredRefcount(op)` | True if `op` uses deferred reference counting (checks `_PyGC_BITS_DEFERRED`). Always returns 0 in GIL builds. |

The GC scans each thread's evaluation stack and local variables to keep
deferred-refcounted objects alive.

See also: [Stack references (_PyStackRef)](stackrefs.md) for how
deferred reference counting interacts with the evaluation stack.


## GC Bit Flags

In free-threaded builds, GC state is stored in `ob_gc_bits` rather than
a separate `PyGC_Head` linked list. Defined in
`Include/internal/pycore_gc.h`:

| Flag | Bit | Meaning |
|---|---|---|
| `_PyGC_BITS_TRACKED` | 0 | Tracked by the GC. |
| `_PyGC_BITS_FINALIZED` | 1 | `tp_finalize` was called. |
| `_PyGC_BITS_UNREACHABLE` | 2 | Object determined unreachable during collection. |
| `_PyGC_BITS_FROZEN` | 3 | Object is frozen (not collected). |
| `_PyGC_BITS_SHARED` | 4 | Object is shared between threads. |
| `_PyGC_BITS_ALIVE` | 5 | Reachable from a known root. |
| `_PyGC_BITS_DEFERRED` | 6 | Uses deferred reference counting. |

Setting the bits uses a relaxed atomic store and requires the
per-object lock to be held, except when the object is visible to only
a single thread (e.g., during initialization or destruction). Reading
the bits uses a relaxed atomic load and does not require the
per-object lock.

Helpers:

| Primitive | Description |
|---|---|
| `_PyObject_SET_GC_BITS(op, bits)` | Sets the specified bits (atomic relaxed load-modify-store). |
| `_PyObject_HAS_GC_BITS(op, bits)` | True if the specified bits are set (atomic relaxed load). |
| `_PyObject_CLEAR_GC_BITS(op, bits)` | Clears the specified bits (atomic relaxed load-modify-store). |


## Atomic Operations

### Unconditional Atomics (`_Py_atomic_*`)

Full atomic operations defined in `Include/cpython/pyatomic.h`. These
are always atomic regardless of build configuration. They support
multiple memory orderings (relaxed, acquire, release, seq_cst).

### Free-Threading Wrappers (`FT_ATOMIC_*`)

Defined in `Include/internal/pycore_pyatomic_ft_wrappers.h`. These
are **atomic in free-threaded builds** and plain reads/writes in
GIL-enabled builds:

```c
// Free-threaded build:
#define FT_ATOMIC_LOAD_PTR(value) _Py_atomic_load_ptr(&value)
#define FT_MUTEX_LOCK(lock) PyMutex_Lock(lock)

// GIL-enabled build:
#define FT_ATOMIC_LOAD_PTR(value) value
#define FT_MUTEX_LOCK(lock) do {} while (0)
```

Use `FT_ATOMIC_*` when protecting data that is inherently safe under
the GIL but needs atomics in free-threaded builds. Use `_Py_atomic_*`
when the operation must always be atomic.


## QSBR (Quiescent-State Based Reclamation)

QSBR is used to safely reclaim memory that has been logically removed
from a data structure but may still be accessed by concurrent readers
(e.g., resized list backing arrays, dictionary keys).

See the dedicated document: [Quiescent-State Based Reclamation (QSBR)](qsbr.md).


## Key Source Files

| File | Contents |
|---|---|
| `Include/object.h` | Object header, `_Py_ThreadId`, `_Py_IsOwnedByCurrentThread` |
| `Include/refcount.h` | Immortalization constants, `_Py_REF_SHARED_*` flags, `Py_INCREF`/`Py_DECREF` |
| `Include/cpython/pylock.h` | `PyMutex` definition and lock/unlock |
| `Include/cpython/critical_section.h` | Public critical section macros |
| `Include/cpython/pyatomic.h` | Unconditional atomic operations |
| `Include/internal/pycore_object.h` | `_Py_TryIncrefFast`, `_Py_TryXGetRef`, `_Py_XGetRef`, etc. |
| `Include/internal/pycore_critical_section.h` | Internal critical section implementation |
| `Include/internal/pycore_brc.h` | Biased reference counting (per-thread merge queues) |
| `Include/internal/pycore_object_deferred.h` | Deferred reference counting |
| `Include/internal/pycore_gc.h` | GC bit flags and helpers |
| `Include/internal/pycore_pyatomic_ft_wrappers.h` | `FT_ATOMIC_*` wrappers |
| `Include/internal/pycore_qsbr.h` | QSBR implementation |
| `Python/gc_free_threading.c` | Free-threaded garbage collector |
| `Python/critical_section.c` | Critical section slow paths |