fix(threading): catch std::system_error in ThreadPool::worker_thread cv.wait #656
Open
rmcgrorty wants to merge 1 commit into
…cv.wait
On macOS, libc++ translates non-zero pthread_cond_wait return codes into
std::system_error. During ThreadPool destruction, a parked worker thread
on cv.wait can race with the destructor's notify_all and observe such an
error, which then propagates uncaught and triggers std::terminate.
Symptom in the wild (M3M0 desktop app, 6 crashes 2026-04-30 → 2026-05-19,
identical signature):
std::terminate -> std::condition_variable::wait
-> CactusThreading::ThreadPool::worker_thread
The shared, process-global static ThreadPool plus per-session Model
init/destroy cycles trigger this reliably under STT workloads, especially
multi-channel sessions where multiple Transcribers share the pool.
Wrap the wait in try/catch. On caught std::system_error, re-check the
predicate. Exit cleanly if shutting down, otherwise loop and re-wait.
This is a narrow catch that does not mask other exceptions and does not
change normal-path semantics.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
CactusThreading::ThreadPool::worker_thread() calls cv.wait() without a try/catch. On macOS, libc++'s condition_variable::wait translates non-zero pthread_cond_wait return codes into std::system_error. When the destructor races a parked worker (stop = true; notify_all() followed by the worker's lambda predicate read), the wait can throw, and the uncaught exception propagates to std::terminate.

This patch wraps the wait in a try/catch. On caught std::system_error, the worker re-checks the predicate. If shutting down, it returns cleanly; otherwise it loops and re-waits. The catch is narrow: it does not mask other exceptions, and it does not change normal-path semantics.

Motivation
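We hit this in production six times across 2026-04-30 → 2026-05-19, all with identical stack signatures (as recorded in the commit message):

    std::terminate -> std::condition_variable::wait
    -> CactusThreading::ThreadPool::worker_thread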
The crash is reliably triggerable on macOS during STT workloads where multiple Transcriber instances share the process-global static ThreadPool (e.g. multi-channel streaming sessions where N transcribers are constructed and dropped together). Shutdown sequencing of the dropping transcribers races the workers parked on cv.wait.

We tried three workarounds on our side first:

- Drop join on the Rust wrapper (~4× crash reduction, didn't eliminate)
- Arc<Model> to avoid per-session cactus_init/cactus_destroy (narrowed further, didn't eliminate)

Each layer added Rust-side complexity without closing the race, because the underlying defect (an uncaught exception inside a third-party C++ thread) can't be cleanly compensated for from the FFI consumer side. Catching the exception in worker_thread itself is the right place.

Patch shape
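The diff itself is not reproduced on this page, so what follows is only a minimal sketch of the control flow the summary describes, not the actual patch. The names stop, cv, and worker_thread come from the description; the class layout, the mutex and tasks members, and enqueue() are assumptions made for illustration. It also assumes the std::system_error actually reaches the worker's frame, which depends on the standard library implementation.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <system_error>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool; not the real CactusThreading::ThreadPool.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t n) {
        assert(n > 0);
        for (std::size_t i = 0; i < n; ++i)
            workers.emplace_back([this] { worker_thread(); });
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex);
            stop = true;  // set under the lock so parked workers observe it
        }
        cv.notify_all();
        for (auto& w : workers) w.join();
    }

    void enqueue(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex);
            tasks.push(std::move(task));
        }
        cv.notify_one();
    }

private:
    void worker_thread() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex);
                try {
                    cv.wait(lock, [this] { return stop || !tasks.empty(); });
                } catch (const std::system_error&) {
                    // Narrow catch: only the wait failure is handled here.
                    // Assumption: the lock may or may not still be held after
                    // the throw, so re-acquire it before reading shared state.
                    if (!lock.owns_lock()) lock.lock();
                    if (stop) return;  // shutting down: exit cleanly
                    continue;          // otherwise loop and re-wait
                }
                if (tasks.empty()) return;  // woken by stop with nothing left to run
                task = std::move(tasks.front());
                tasks.pop();
            }
            task();  // run outside the lock
        }
    }

    std::mutex mutex;
    std::condition_variable cv;
    std::queue<std::function<void()>> tasks;
    bool stop = false;
    std::vector<std::thread> workers;
};
```

Only the wait is inside the try block, so an exception thrown by a task still propagates as it did before; that is the sense in which the catch stays narrow.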
Test plan
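The test plan items did not survive on this page. As one possible way to exercise the destructor/worker race described above, here is a hedged sketch of a lifecycle stress loop. ParkedWorkers is a hypothetical stand-in for the pool under test and stress_lifecycle is an illustrative helper; neither is part of the Cactus codebase.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in: workers park on cv.wait until the destructor
// sets stop and calls notify_all, mirroring the shutdown path in question.
class ParkedWorkers {
public:
    ParkedWorkers(unsigned n, std::atomic<int>& exited) {
        for (unsigned i = 0; i < n; ++i)
            threads.emplace_back([this, &exited] {
                std::unique_lock<std::mutex> lock(mutex);
                cv.wait(lock, [this] { return stop; });
                ++exited;  // counts workers that left the wait normally
            });
    }

    ~ParkedWorkers() {
        {
            std::lock_guard<std::mutex> lock(mutex);
            stop = true;
        }
        cv.notify_all();
        for (auto& t : threads) t.join();
    }

private:
    std::mutex mutex;
    std::condition_variable cv;
    bool stop = false;
    std::vector<std::thread> threads;
};

// Repeats the init/destroy cycle and returns how many workers exited cleanly.
int stress_lifecycle(int cycles, unsigned workers_per_pool) {
    assert(cycles >= 0);
    std::atomic<int> exited{0};
    for (int i = 0; i < cycles; ++i) {
        ParkedWorkers pool(workers_per_pool, exited);  // destroyed at end of each iteration
    }
    return exited.load();
}
```

A run passes when every worker from every cycle is accounted for and the process never reaches std::terminate. Swapping the real pool in for the stand-in and running on macOS is what would make this meaningful for the crash in question.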
🤖 Generated with Claude Code