Commit 5598717 (parent 80ba4e1)

gh-128642: Add free threading glossary of building blocks to InternalDocs

2 files changed: 336 additions, 0 deletions
InternalDocs/README.md (2 additions, 0 deletions)

```diff
@@ -49,6 +49,8 @@ Program Execution
 
 - [Quiescent-State Based Reclamation (QSBR)](qsbr.md)
 
+- [Free Threading Building Blocks](free_threading.md)
+
 - [Stack protection](stack_protection.md)
 
 Built-in Types
```

InternalDocs/free_threading.md (334 additions, 0 deletions)
# Free Threading Building Blocks

This document describes the low-level primitives used to implement
free-threaded (nogil) CPython. These are internal implementation details
and may change between versions.

For a higher-level overview of free threading aimed at Python users, see
the [free-threading HOWTO](../Doc/howto/free-threading-python.rst). For
C extension authors, see the
[free-threading extensions HOWTO](../Doc/howto/free-threading-extensions.rst).

## Object Header Layout

In free-threaded builds (`Py_GIL_DISABLED`), every `PyObject` has an
extended header (defined in `Include/object.h`):

```c
struct _object {
    // ob_tid stores the thread id (or zero). It is also used by the GC and the
    // trashcan mechanism as a linked list pointer and by the GC to store the
    // computed "gc_refs" refcount.
    uintptr_t ob_tid;          // (declared with _Py_ALIGNED_DEF for alignment)
    uint16_t ob_flags;
    PyMutex ob_mutex;          // per-object lock
    uint8_t ob_gc_bits;        // gc-related state
    uint32_t ob_ref_local;     // local reference count
    Py_ssize_t ob_ref_shared;  // shared (atomic) reference count
    PyTypeObject *ob_type;
};
```

Key fields:

- **`ob_tid`**: The thread id of the owning thread. Set to
  `_Py_UNOWNED_TID` (0) for immortal objects or objects whose reference
  count fields have been merged. Also reused by the GC and the trashcan
  mechanism as a linked list pointer.

- **`ob_mutex`**: A one-byte mutex (`PyMutex`) used by the critical
  section API. Must not be locked directly; use `Py_BEGIN_CRITICAL_SECTION`
  instead (see [Per-Object Locking](#per-object-locking-critical-sections)
  below).

- **`ob_ref_local`**: Reference count for references held by the owning
  thread. Non-atomic — only the owning thread modifies it. A value of
  `UINT32_MAX` (`_Py_IMMORTAL_REFCNT_LOCAL`) indicates an immortal object.

- **`ob_ref_shared`**: Atomic reference count for cross-thread references.
  The two least-significant bits store state flags (see
  [Shared Reference Count Flags](#shared-reference-count-flags) below).
  The actual reference count is stored in the upper bits, shifted by
  `_Py_REF_SHARED_SHIFT` (2).

- **`ob_gc_bits`**: Bit flags for the garbage collector (see
  [GC Bit Flags](#gc-bit-flags) below).

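The split refcount layout can be illustrated with a small standalone sketch. This is a model, not CPython code: the `model_refcount` function and `REF_SHARED_SHIFT` macro are local to this example, but the arithmetic mirrors the description above (true refcount = local count plus the shared field with its two flag bits shifted out).

```c
#include <assert.h>
#include <stdint.h>

/* Standalone model (not CPython code) of how the true reference count
 * is recovered from the two per-object fields. */
#define REF_SHARED_SHIFT 2

static int64_t model_refcount(uint32_t ob_ref_local, int64_t ob_ref_shared)
{
    // The low two bits of ob_ref_shared are flags; the count lives above them.
    return (int64_t)ob_ref_local + (ob_ref_shared >> REF_SHARED_SHIFT);
}
```

For example, an object with three local references, two shared references, and the `_Py_REF_MAYBE_WEAKREF` flag set has `ob_ref_local == 3` and `ob_ref_shared == (2 << 2) | 0x1`, for a true refcount of 5.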
## Thread Ownership

Each object is "owned" by the thread that created it. The owning thread
can use fast, non-atomic operations on `ob_ref_local`. Other threads
must use atomic operations on `ob_ref_shared`.

| Primitive | Header | Description |
|---|---|---|
| `_Py_ThreadId()` | `Include/object.h` | Returns the current thread's id (`uintptr_t`) using platform-specific TLS. |
| `_Py_IsOwnedByCurrentThread(op)` | `Include/object.h` | Returns non-zero if `op->ob_tid` matches the current thread. Uses an atomic load under ThreadSanitizer. |
| `_Py_UNOWNED_TID` | `Include/object.h` | Constant (0) indicating no owning thread — used for immortal objects and objects with merged refcounts. |

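The ownership test reduces to a thread-id comparison. The sketch below is a standalone model, not CPython code (`model_is_owned_by` and `UNOWNED_TID` are names local to this example); it assumes real thread ids are always nonzero, so the unowned sentinel never matches.

```c
#include <assert.h>
#include <stdint.h>

/* Standalone model (not CPython code) of the ownership check:
 * an object is owned by the current thread iff ob_tid matches its id. */
#define UNOWNED_TID ((uintptr_t)0)

static int model_is_owned_by(uintptr_t ob_tid, uintptr_t current_tid)
{
    // UNOWNED_TID marks immortal or merged objects; it matches no thread.
    return ob_tid != UNOWNED_TID && ob_tid == current_tid;
}
```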
## Reference Counting Primitives

Free-threaded CPython splits the reference count into a local part
(`ob_ref_local`) and a shared part (`ob_ref_shared`). The owning thread
increments `ob_ref_local` directly. Other threads atomically update
`ob_ref_shared`.

### Incrementing References

| Primitive | Header | Description |
|---|---|---|
| `_Py_TryIncrefFast(op)` | `pycore_object.h` | Tries to increment the refcount using the fast (non-atomic) path. Succeeds only for immortal objects or objects owned by the current thread. Returns 0 if the object requires an atomic operation. |
| `_Py_TryIncRefShared(op)` | `pycore_object.h` | Atomically increments `ob_ref_shared` using a compare-and-exchange loop. Fails (returns 0) if the raw shared field is 0 or `_Py_REF_MERGED` (0x3). |
| `_Py_TryIncrefCompare(src, op)` | `pycore_object.h` | Increfs `op` (trying fast path, then shared) and verifies that `*src` still points to `op`. If `*src` changed concurrently, decrefs `op` and returns 0. |
| `_Py_TryIncref(op)` | `pycore_object.h` | Convenience wrapper: tries `_Py_TryIncrefFast`, then falls back to `_Py_TryIncRefShared`. In GIL builds, checks `Py_REFCNT(op) > 0` and calls `Py_INCREF`. |

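The shared-side compare-and-exchange loop can be sketched in isolation with C11 atomics. This is a model, not CPython's implementation (the `model_*` names and constants are local to the example): it fails when the raw field is 0 or `MERGED`, because in those states the refcount may be about to reach zero on another thread, and otherwise adds one reference above the two flag bits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Standalone model (not CPython code) of a _Py_TryIncRefShared-style
 * CAS loop on the shared refcount field. */
#define REF_SHARED_SHIFT 2
#define REF_MERGED       0x3

static int model_try_incref_shared(_Atomic int64_t *shared)
{
    int64_t old = atomic_load_explicit(shared, memory_order_relaxed);
    for (;;) {
        // A raw value of 0 or MERGED means the object may be on its way
        // to deallocation; the incref must fail rather than resurrect it.
        if (old == 0 || old == REF_MERGED) {
            return 0;
        }
        // Add one reference above the two flag bits.
        if (atomic_compare_exchange_weak_explicit(
                shared, &old, old + ((int64_t)1 << REF_SHARED_SHIFT),
                memory_order_acq_rel, memory_order_relaxed)) {
            return 1;
        }
        // On failure, `old` was reloaded; retry with the fresh value.
    }
}
```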
### Safe Pointer Loading

These load a `PyObject *` from a location that may be concurrently
updated by another thread, and return it with an incremented reference
count:

| Primitive | Header | Description |
|---|---|---|
| `_Py_XGetRef(ptr)` | `pycore_object.h` | Loads `*ptr` and increfs it, retrying in a loop until it succeeds or `*ptr` is NULL. The writer must set `_Py_REF_MAYBE_WEAKREF` on the stored object. |
| `_Py_TryXGetRef(ptr)` | `pycore_object.h` | Single-attempt version of `_Py_XGetRef`. Returns NULL on failure, which may be due to a NULL value or a concurrent update. |
| `_Py_NewRefWithLock(op)` | `pycore_object.h` | Like `Py_NewRef` but also optimistically sets `_Py_REF_MAYBE_WEAKREF` on objects owned by a different thread. Uses the fast path for local/immortal objects, falls back to an atomic CAS loop on `ob_ref_shared`. |

### Shared Reference Count Flags

The two least-significant bits of `ob_ref_shared` encode state
(defined in `Include/refcount.h`):

| Flag | Value | Meaning |
|---|---|---|
| `_Py_REF_SHARED_INIT` | `0x0` | Initial state, no flags set. |
| `_Py_REF_MAYBE_WEAKREF` | `0x1` | Object may have weak references. |
| `_Py_REF_QUEUED` | `0x2` | Queued for refcount merging by the owning thread. |
| `_Py_REF_MERGED` | `0x3` | Local and shared refcounts have been merged. |

Helpers:

| Primitive | Header | Description |
|---|---|---|
| `_Py_REF_IS_MERGED(ob_ref_shared)` | `pycore_object.h` | True if the low two bits equal `_Py_REF_MERGED`. |
| `_Py_REF_IS_QUEUED(ob_ref_shared)` | `pycore_object.h` | True if the low two bits equal `_Py_REF_QUEUED`. |
| `_Py_ExplicitMergeRefcount(op, extra)` | `pycore_object.h` | Merges the local and shared reference count fields, adding `extra` to the refcount when merging. |
| `_PyObject_SetMaybeWeakref(op)` | `pycore_object.h` | Atomically sets `_Py_REF_MAYBE_WEAKREF` on `ob_ref_shared` if no flags are currently set. No-op for immortal objects or objects already in WEAKREF/QUEUED/MERGED state. |

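Decoding a raw `ob_ref_shared` value is a mask for the flags and an arithmetic shift for the count. A minimal standalone sketch (not CPython code; the `model_*` helpers are local to this example):

```c
#include <assert.h>
#include <stdint.h>

/* Standalone model (not CPython code): split a raw shared-refcount value
 * into its flag bits and its count portion. */
#define REF_SHARED_SHIFT  2
#define REF_MAYBE_WEAKREF 0x1
#define REF_MERGED        0x3

static int model_shared_flags(int64_t shared)
{
    return (int)(shared & 0x3);       // low two bits are state flags
}

static int64_t model_shared_count(int64_t shared)
{
    return shared >> REF_SHARED_SHIFT; // count lives above the flags
}
```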
## Biased Reference Counting

Each object has two refcount fields: `ob_ref_local` (modified only by
the owning thread) and `ob_ref_shared` (modified atomically by all
other threads). The true refcount is the sum of both fields. The
`ob_ref_shared` field can be negative (although the total refcount must
be at least zero).

When a non-owning thread calls `Py_DECREF`, it calls `_Py_DecRefShared`,
which decrements `ob_ref_shared`. If `ob_ref_shared` is already zero
(or only has the `_Py_REF_MAYBE_WEAKREF` flag set, meaning the count
portion is zero), the thread does not subtract from the refcount.
Instead, it sets the `_Py_REF_QUEUED` flag and enqueues the object for
the owning thread to merge. The queue holds a reference to the object.
The owning thread is notified via the eval breaker mechanism.

When the owning thread processes its merge queue (via
`_Py_brc_merge_refcounts`), it calls `_Py_ExplicitMergeRefcount` for
each queued object, which merges `ob_ref_local` and `ob_ref_shared`
into a single value, sets `ob_tid` to 0, and sets the `_Py_REF_MERGED`
flag. If the merged refcount is zero, the object is deallocated.

Similarly, when the owning thread's own `Py_DECREF` drops `ob_ref_local`
to zero, it calls `_Py_MergeZeroLocalRefcount`, which either deallocates
immediately (if `ob_ref_shared` is also zero) or gives up ownership by
setting `ob_tid` to 0 and `_Py_REF_MERGED`, and deallocates if the
combined refcount is zero.

If `ob_ref_shared` is nonzero when a non-owning thread decrements it,
the decrement subtracts normally — the shared count can go negative.

Defined in `Include/internal/pycore_brc.h`:

| Primitive | Description |
|---|---|
| `_Py_brc_queue_object(ob)` | Enqueues `ob` to be merged by its owning thread. Steals a reference to the object. |
| `_Py_brc_merge_refcounts(tstate)` | Merges the refcounts of all objects queued for the current thread. |

The per-interpreter state (`struct _brc_state`) uses a hash table of
`_Py_BRC_NUM_BUCKETS` (257) buckets keyed by thread id. Each bucket is
protected by its own `PyMutex`. Per-thread state
(`struct _brc_thread_state`) stores a stack of objects to merge.

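The queue-or-subtract decision described above can be sketched without atomics. This is a standalone model, not CPython code (the real path performs the update in a compare-and-exchange loop and then enqueues the object): when the count portion of the shared field is zero, the decref stores `QUEUED` and defers to the owner; otherwise it subtracts one reference, possibly driving the shared count negative.

```c
#include <assert.h>
#include <stdint.h>

/* Standalone model (not CPython code) of the decision made by a
 * non-owning Py_DECREF on the shared refcount field. */
#define REF_SHARED_SHIFT  2
#define REF_MAYBE_WEAKREF 0x1
#define REF_QUEUED        0x2

/* Returns 1 if the object should be queued for the owner to merge,
 * 0 if the decrement was applied directly. Updates *shared either way. */
static int model_decref_shared(int64_t *shared)
{
    if (*shared == 0 || *shared == REF_MAYBE_WEAKREF) {
        // Count portion is zero: mark QUEUED and let the owner merge.
        *shared = REF_QUEUED;
        return 1;
    }
    // Otherwise subtract one reference; the count portion may go negative.
    *shared -= (int64_t)1 << REF_SHARED_SHIFT;
    return 0;
}
```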
## Per-Object Locking (Critical Sections)

Instead of locking `ob_mutex` directly, code must use the **critical
section API**. Critical sections integrate with the thread state to
support suspension (e.g., when a thread needs to release locks for GC)
and deadlock avoidance.

Defined in `Include/cpython/critical_section.h` (public) and
`Include/internal/pycore_critical_section.h` (internal):

| Macro | Description |
|---|---|
| `Py_BEGIN_CRITICAL_SECTION(op)` | Acquires `op->ob_mutex` via the critical section stack. |
| `Py_END_CRITICAL_SECTION()` | Releases the lock and pops the critical section. |
| `Py_BEGIN_CRITICAL_SECTION2(a, b)` | Acquires locks on two objects. Sorts mutexes by address to prevent deadlocks. If both arguments are the same object, degrades to a single-mutex critical section. |
| `Py_END_CRITICAL_SECTION2()` | Releases both locks. |
| `Py_BEGIN_CRITICAL_SECTION_SEQUENCE_FAST(op)` | Specialized variant for `PySequence_Fast` — only locks if `op` is a list (`PyList_CheckExact`). Tuples are immutable and need no locking. |

All critical section macros are **no-ops** in GIL-enabled builds (they
expand to just `{` and `}`).

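The deadlock-avoidance rule behind `Py_BEGIN_CRITICAL_SECTION2` is to impose a global lock order by address. The sketch below is a standalone model, not CPython code (`model_mutex` and `model_order_locks` are names local to this example): it picks which lock to take first and degrades to a single lock when both arguments alias the same object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standalone model (not CPython code) of address-ordered two-lock
 * acquisition: always lock the lower-addressed mutex first. */
typedef struct { unsigned char v; } model_mutex;

static void model_order_locks(model_mutex *a, model_mutex *b,
                              model_mutex **first, model_mutex **second)
{
    if (a == b) {
        // Same object: a single lock suffices.
        *first = a;
        *second = NULL;
    }
    else if ((uintptr_t)a < (uintptr_t)b) {
        *first = a;
        *second = b;
    }
    else {
        *first = b;
        *second = a;
    }
}
```

Because every thread acquires any given pair of mutexes in the same order, two threads can never each hold one lock of a pair while waiting on the other.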
### PyMutex

`PyMutex` (defined in `Include/cpython/lock.h`) is a one-byte mutex:

```
Bit layout (only the two least significant bits are used):
0b00  unlocked
0b01  locked
0b10  unlocked and has parked threads
0b11  locked and has parked threads
```

The fast path uses a single compare-and-swap. If contended, the calling
thread is "parked" until the mutex is unlocked. If the current thread
holds the GIL, the GIL is released while the thread is parked.

| Function | Description |
|---|---|
| `PyMutex_Lock(m)` | Locks the mutex. Parks the calling thread if contended. |
| `PyMutex_Unlock(m)` | Unlocks the mutex. Wakes a parked thread if any. |
| `PyMutex_IsLocked(m)` | Returns non-zero if currently locked. |

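The uncontended fast path is a single compare-and-swap of the byte from "unlocked, no parked threads" to "locked". The sketch below is a standalone model, not CPython code (the `model_*` names are local to this example), and it deliberately omits the parking slow path:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Standalone model (not CPython code) of the PyMutex fast path:
 * one CAS on a one-byte state word. Parking is not modeled. */
#define MUTEX_LOCKED     0x1
#define MUTEX_HAS_PARKED 0x2

static int model_mutex_trylock(_Atomic uint8_t *m)
{
    uint8_t expected = 0;  // 0b00: unlocked, no parked threads
    return atomic_compare_exchange_strong_explicit(
        m, &expected, MUTEX_LOCKED,
        memory_order_acquire, memory_order_relaxed);
}

static int model_mutex_is_locked(const _Atomic uint8_t *m)
{
    return (atomic_load_explicit(m, memory_order_relaxed) & MUTEX_LOCKED) != 0;
}
```

In the real implementation, a failed CAS falls through to a slow path that parks the thread (releasing the GIL first if held) until the holder unlocks.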
## Immortalization

Immortal objects are never deallocated and their reference counts are
never modified. In free-threaded builds, immortality is indicated by
`ob_ref_local == UINT32_MAX` (`_Py_IMMORTAL_REFCNT_LOCAL`).

Defined in `Include/refcount.h` and `Include/internal/pycore_object.h`:

| Primitive | Description |
|---|---|
| `_Py_IsImmortal(op)` | True if `op` is immortal. |
| `_Py_IsStaticImmortal(op)` | True if `op` was statically allocated as immortal. |
| `_Py_SetImmortal(op)` | Promotes `op` to immortal. |
| `_Py_SetMortal(op, refcnt)` | Demotes `op` back to mortal. Should only be used during runtime finalization. |

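In free-threaded builds the immortality test is a single comparison on the local refcount field, which is why incref/decref fast paths can skip immortal objects cheaply. A minimal standalone sketch (not CPython code; the `model_*` names are local to this example):

```c
#include <assert.h>
#include <stdint.h>

/* Standalone model (not CPython code): immortality is encoded as a
 * sentinel value in the local refcount field. */
#define IMMORTAL_REFCNT_LOCAL UINT32_MAX

static int model_is_immortal(uint32_t ob_ref_local)
{
    return ob_ref_local == IMMORTAL_REFCNT_LOCAL;
}
```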
## Deferred Reference Counting

Frequently shared objects (e.g., top-level functions, types, modules)
use **deferred reference counting** to avoid contention on their
reference count fields. These objects add `_Py_REF_DEFERRED`
(`PY_SSIZE_T_MAX / 8`) to `ob_ref_shared` so that they are not
immediately deallocated when the non-deferred reference count drops to
zero. They are only freed during cyclic garbage collection.

Defined in `Include/internal/pycore_object_deferred.h`:

| Primitive | Description |
|---|---|
| `_PyObject_SetDeferredRefcount(op)` | Marks `op` as using deferred reference counting. Objects that use deferred reference counting should be tracked by the GC so that they are eventually collected. No-op in GIL builds. |
| `_PyObject_HasDeferredRefcount(op)` | True if `op` uses deferred reference counting (checks `_PyGC_BITS_DEFERRED`). Always returns 0 in GIL builds. |

The GC scans each thread's evaluation stack and local variables to keep
deferred-refcounted objects alive.

See also: [Stack references (_PyStackRef)](stackrefs.md) for how
deferred reference counting interacts with the evaluation stack.

## GC Bit Flags

In free-threaded builds, GC state is stored in `ob_gc_bits` rather than
a separate `PyGC_Head` linked list. Defined in
`Include/internal/pycore_gc.h`:

| Flag | Bit | Meaning |
|---|---|---|
| `_PyGC_BITS_TRACKED` | 0 | Tracked by the GC. |
| `_PyGC_BITS_FINALIZED` | 1 | `tp_finalize` was called. |
| `_PyGC_BITS_UNREACHABLE` | 2 | Object determined unreachable during collection. |
| `_PyGC_BITS_FROZEN` | 3 | Object is frozen (not collected). |
| `_PyGC_BITS_SHARED` | 4 | Object is shared between threads. |
| `_PyGC_BITS_ALIVE` | 5 | Reachable from a known root. |
| `_PyGC_BITS_DEFERRED` | 6 | Uses deferred reference counting. |

Setting the bits requires a relaxed store, and the per-object lock must
be held (except when the object is only visible to a single thread,
e.g., during initialization or destruction). Reading the bits requires
a relaxed load but does not require the per-object lock.

Helpers:

| Primitive | Description |
|---|---|
| `_PyObject_SET_GC_BITS(op, bits)` | Sets the specified bits (atomic relaxed load-modify-store). |
| `_PyObject_HAS_GC_BITS(op, bits)` | True if the specified bits are set (atomic relaxed load). |
| `_PyObject_CLEAR_GC_BITS(op, bits)` | Clears the specified bits (atomic relaxed load-modify-store). |

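The helpers amount to relaxed atomic accesses on a one-byte field. The sketch below is a standalone model, not CPython code (the `model_*` names are local to this example); note that set and clear are a relaxed load followed by a relaxed store, which is why the document requires the per-object lock to be held while writing:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Standalone model (not CPython code) of the ob_gc_bits helpers:
 * relaxed load-modify-store for set/clear, relaxed load for the test.
 * The load/store pair is not a single atomic RMW, hence the lock. */
#define GC_BITS_TRACKED  (1 << 0)
#define GC_BITS_DEFERRED (1 << 6)

static void model_set_gc_bits(_Atomic uint8_t *gc_bits, uint8_t bits)
{
    uint8_t v = atomic_load_explicit(gc_bits, memory_order_relaxed);
    atomic_store_explicit(gc_bits, (uint8_t)(v | bits), memory_order_relaxed);
}

static int model_has_gc_bits(const _Atomic uint8_t *gc_bits, uint8_t bits)
{
    return (atomic_load_explicit(gc_bits, memory_order_relaxed) & bits) != 0;
}

static void model_clear_gc_bits(_Atomic uint8_t *gc_bits, uint8_t bits)
{
    uint8_t v = atomic_load_explicit(gc_bits, memory_order_relaxed);
    atomic_store_explicit(gc_bits, (uint8_t)(v & ~bits), memory_order_relaxed);
}
```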
## Atomic Operations

### Unconditional Atomics (`_Py_atomic_*`)

Full atomic operations are defined in `Include/cpython/pyatomic.h`. These
are always atomic regardless of build configuration. They support
multiple memory orderings (relaxed, acquire, release, seq_cst).

### Free-Threading Wrappers (`FT_ATOMIC_*`)

Defined in `Include/internal/pycore_pyatomic_ft_wrappers.h`. These
are **atomic in free-threaded builds** and plain reads/writes in
GIL-enabled builds:

```c
// Free-threaded build:
#define FT_ATOMIC_LOAD_PTR(value) _Py_atomic_load_ptr(&value)
#define FT_MUTEX_LOCK(lock) PyMutex_Lock(lock)

// GIL-enabled build:
#define FT_ATOMIC_LOAD_PTR(value) value
#define FT_MUTEX_LOCK(lock) do {} while (0)
```

Use `FT_ATOMIC_*` when protecting data that is inherently safe under
the GIL but needs atomics in free-threaded builds. Use `_Py_atomic_*`
when the operation must always be atomic.

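The wrapper pattern compiles the same macro name to an atomic access or a plain access depending on a build flag. The sketch below is a standalone model of the idea using C11 atomics, not CPython's header (all `MODEL_*` names are local to this example):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Standalone sketch (not CPython's header) of the FT_ATOMIC idea:
 * one macro name, atomic when the "free-threaded" flag is set,
 * a plain read/write otherwise. */
#define MODEL_GIL_DISABLED 1

#if MODEL_GIL_DISABLED
typedef _Atomic int64_t model_ssize;
#  define MODEL_LOAD_SSIZE(value) \
       atomic_load_explicit(&(value), memory_order_seq_cst)
#  define MODEL_STORE_SSIZE(value, v) \
       atomic_store_explicit(&(value), (v), memory_order_seq_cst)
#else
typedef int64_t model_ssize;
#  define MODEL_LOAD_SSIZE(value) (value)
#  define MODEL_STORE_SSIZE(value, v) ((value) = (v))
#endif
```

Call sites are written once against the macro names; the build configuration, not the caller, decides whether the access is atomic.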
## QSBR (Quiescent-State Based Reclamation)

QSBR is used to safely reclaim memory that has been logically removed
from a data structure but may still be accessed by concurrent readers
(e.g., resized list backing arrays, dictionary keys).

See the dedicated document: [Quiescent-State Based Reclamation (QSBR)](qsbr.md).

## Key Source Files

| File | Contents |
|---|---|
| `Include/object.h` | Object header, `_Py_ThreadId`, `_Py_IsOwnedByCurrentThread` |
| `Include/refcount.h` | Immortalization constants, `_Py_REF_SHARED_*` flags, `Py_INCREF`/`Py_DECREF` |
| `Include/cpython/lock.h` | `PyMutex` definition and lock/unlock |
| `Include/cpython/critical_section.h` | Public critical section macros |
| `Include/cpython/pyatomic.h` | Unconditional atomic operations |
| `Include/internal/pycore_object.h` | `_Py_TryIncrefFast`, `_Py_TryXGetRef`, `_Py_XGetRef`, etc. |
| `Include/internal/pycore_critical_section.h` | Internal critical section implementation |
| `Include/internal/pycore_brc.h` | Biased reference counting (per-thread merge queues) |
| `Include/internal/pycore_object_deferred.h` | Deferred reference counting |
| `Include/internal/pycore_gc.h` | GC bit flags and helpers |
| `Include/internal/pycore_pyatomic_ft_wrappers.h` | `FT_ATOMIC_*` wrappers |
| `Include/internal/pycore_qsbr.h` | QSBR implementation |
| `Python/gc_free_threading.c` | Free-threaded garbage collector |
| `Python/critical_section.c` | Critical section slow paths |