fix(python): avoid async callback GIL deadlock via unbounded work channel#2005
Open
chaliy wants to merge 7 commits into
Open
fix(python): avoid async callback GIL deadlock via unbounded work channel#2005chaliy wants to merge 7 commits into
chaliy wants to merge 7 commits into
Conversation
Deploying with
|
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs |
bashkit | 4199313 | Commit Preview URL | Jun 10 2026, 09:44 AM |
There was a problem hiding this comment.
Pull request overview
This PR fixes a deterministic Python GIL deadlock in the private-loop async callback path by ensuring the caller’s work dispatch cannot block while still holding the GIL, and adds a regression test that reliably detects the deadlock via a subprocess.
Changes:
- Replace the private-loop worker work channel from a zero-capacity rendezvous channel to an unbounded
std::sync::mpsc::Sendersosend()cannot block during worker startup while the caller holds the GIL. - Keep the result handoff on a zero-capacity channel with the caller waiting while detached from the GIL.
- Add a subprocess-based pytest regression test to ensure the first
execute_sync()call with an async callback on the private loop completes (no deadlock).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
crates/bashkit-python/src/lib.rs |
Switch private-loop work dispatch to an unbounded channel and update comments explaining the deadlock avoidance. |
crates/bashkit-python/tests/test_async_callbacks.py |
Add a subprocess regression test to catch the GIL deadlock deterministically. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The previous comment described a capacity-0 rendezvous channel and explained why GIL-detach was needed for the send. With the unbounded channel the send is non-blocking; GIL release is only required for the result rendezvous. Update the threat-model comment to match.
3ff6055 to
13fc9d3
Compare
Variant (1) is now mitigated by using an unbounded channel() so send() cannot block, rather than solely relying on GIL detach around a rendezvous. Add regression test reference for the first-call deadlock.
- Propagate sys.path via PYTHONPATH to subprocess so it finds the compiled native extension in CI (maturin PYTHONPATH-only setup) - Increase subprocess timeout from 3s to 10s to avoid flakiness on loaded CI runners during extension startup - Add comment to unbounded channel explaining natural queue-depth bound (each caller blocks on result_rx.recv(), so queue depth ≤ concurrent calls)
… test
The previous PYTHONPATH propagation (sys.path join) injected source-tree
paths before site-packages, causing the subprocess to find bashkit/__init__.py
from the source directory but fail to find _bashkit.so (which is only in
site-packages after `pip install`).
Use importlib.util.find_spec('bashkit._bashkit') to locate the .so in the
same installation the test process uses, then prepend its parent directory
to PYTHONPATH. Works for both pip-installed (site-packages) and maturin-
develop (source tree) setups.
…nd_spec
find_spec("bashkit._bashkit") returns None in CI because pytest already
imports bashkit from the source tree; the submodule lookup resolves in that
context, finds no .so, and returns None — leaving PYTHONPATH unset and the
subprocess still hitting the source tree.
Switch to scanning sys.path with glob.glob for _bashkit*.so / *.pyd: this
is independent of Python import state and correctly finds site-packages (pip
install) or the source tree (maturin develop) in all environments.
pytest runs from crates/bashkit-python/, so the subprocess inherits that CWD. Python always puts '' (= CWD) as sys.path[0], placing the source tree before any PYTHONPATH entries — including the site-packages path we derive from the glob scan. Result: subprocess finds bashkit/__init__.py from source but not _bashkit.so, causing ModuleNotFoundError. Pass cwd='/' so '' resolves to a neutral directory with no bashkit/ subdir. The PYTHONPATH (site-packages or maturin-develop source tree, found via glob) is then the first real bashkit candidate and the .so is always found.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Switches the private-loop async callback work dispatch from a zero-capacity rendezvous channel (
sync_channel(0)) to an unboundedchannel(), sosend()can never block while the caller holds the GIL.Rebased on main (includes #2007/#2008 fixes). Updates TM-PY-030 threat model entry and corrects the inline code comment.
Why
The
sync_channel(0)rendezvous was the root cause of the TM-PY-030 variant (1) GIL deadlock: on first call, the worker must attach to Python to create its asyncio loop before it reachesrecv(), so dispatcher and worker waited on each other with the GIL held.#2007 mitigated this by detaching the GIL around the send. This PR eliminates the root cause entirely: an unbounded channel makes
send()non-blocking regardless of worker state. Only the result wait (result_rx.recv()) needs the GIL detached, which is preserved.Changes
crates/bashkit-python/src/lib.rs:SyncSender/sync_channel(0)→Sender/channel()for the work dispatch channel; update TM-PY-030 inline comment to reflect unbounded semantics.crates/bashkit-python/tests/test_async_callbacks.py: add subprocess-based regression testtest_async_callback_execute_sync_first_private_loop_call_does_not_deadlockthat reliably detects the deadlock (runs in a fresh process with 3 s timeout).specs/threat-model.md: update TM-PY-030 mitigation description for variant (1) and add regression test reference.Security
The unbounded channel cannot cause unbounded queuing:
result_rx.recv()blocks until the worker finishes, so the caller cannot issue a second send before the first result returns — queue depth ≤ 1 in practice.