Fix cross-thread data races where event-loop coroutines complete off the loop thread#54
Open
battlesnake wants to merge 6 commits into
Open
Fix cross-thread data races where event-loop coroutines complete off the loop thread#54battlesnake wants to merge 6 commits into
battlesnake wants to merge 6 commits into
Conversation
… via cfuture Root cause: task_final_awaiter did inline symmetric transfer to the event_task continuation on whatever thread the inner task completed on (e.g. a pool thread resuming a co_await cfuture). The event_task body then ran off the loop thread, racing the loop thread's event_task.done() check in run_until_done(). Fix (Option A - loop-affine by default): - task.h: add inline thread_local tl_loop_post_fn/tl_loop_ctx in rpp::detail. task_base::await_suspend captures them into the promise atomics (continuation, resume_via, resume_ctx) using a CAS handshake with task_final_awaiter so that both the "task finishes first" and "awaiter sets up first" races are safe. task_final_awaiter reads them with acquire semantics and posts the continuation back to the owner loop instead of doing symmetric transfer. - event_loop.cpp: constructor registers loop_post_resume on the owner thread's TLS (covering eagerly-started event_tasks before the first process_event call). process_event saves/restores TLS around each resume so nested loops are correct. loop_post_resume is a static C-function trampoline that calls post_resume(). - tests/test_event_loop.cpp: add TestCase(tsan_task_await_resumes_on_loop_thread) that co_awaits a cfuture-completing task<int> from an event_task and asserts the continuation runs on the loop thread; passes TSAN clean 5/5 runs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The CAS failure in task_final_awaiter used memory_order_relaxed, so it did not synchronize with the release CAS in task_base::await_suspend. On weak-memory architectures (ARM) this means resume_via/resume_ctx stores (written before the await_suspend release CAS) were not guaranteed visible to task_final_awaiter after the relaxed failure, breaking the loop-post-back mechanism. Fix: use memory_order_acquire on the CAS failure so it pairs with await_suspend's release and sees all prior relaxed stores. The subsequent resume_via.load can then be relaxed (HB already established). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Merge tl_loop_post_fn + tl_loop_ctx into a single loop_ctx struct (one TLS slot; cleaner save/restore in process_event) - Hoist resume_via/ctx stores before the lazy branch in await_suspend (was duplicated in two places, now one store unconditionally) - await_suspend CAS-failure path: use thread-locals directly instead of re-reading from atomics we just wrote on the same thread - Trim WHAT comments per style guide Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adversarial review (5-lens propose + 3-skeptic verify) of the CAS handshake confirmed resume_via/resume_ctx carried zero synchronization: they were two relaxed atomics per promise specialization, written on the loop thread before the release CAS on `continuation` and read on the completing thread only after the acquire-on-failure CAS. That is the textbook "publish a plain payload via a release flag" idiom, so they can be a single plain detail::loop_ctx member — the `continuation` atomic provides all the happens-before. Removes 4 atomic objects + 4 relaxed memory-order tokens, and makes the contract clearer (the relaxed atomics implied an ordering they did not carry). The four `continuation` memory orders and the done-sentinel are unchanged: the demotion's safety rests entirely on the acquire-on-failure CAS staying acquire. Also: move loop_post_resume above its first use, drop its forward declaration; tighten the ordering WHY-comments to name both publication sites (the lazy first-await release store is a second one the old wording omitted). Verified on the worktree (not master) with gcc and clang TSAN: test_event_loop 45/45 incl. tsan_task_await_resumes_on_loop_thread (30/30 under test_until_failure), full suite green except the pre-existing basic_duration_coro timing assertion, clang-tidy clean, only the 2 pre-existing showcase warnings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mechanical line-number sync (update_doc_linerefs.py); no display-text or API changes. Reconciles refs that drifted as this branch reworked task.h / event_loop handshake (and some pre-existing master drift in sprint.h/tests.h). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… thread Same root cause as this PR's main fix, in a second spot. run_until_done() drove a top-level task by polling handle.done() in a loop; if the task body finished on a pool thread (e.g. it awaited a cfuture directly), that poll raced the pool thread and could read the result before it was ready. Now run_until_done() awaits the task from a small internal event_task, so completion is posted back to the loop and the result is read on the loop thread — reusing the path the event_task driver already uses. The old polling loop is gone. Tests cover the task, deferred, void and exception cases; all clean under TSAN on gcc and clang. (README line refs refreshed mechanically.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
battlesnake
added a commit
that referenced
this pull request
Jun 21, 2026
…drift) #54 made task co_await loop-affine (task_final_awaiter posts the continuation back to the owner loop). cfuture coroutines (initial/final_suspend = never, no task_final_awaiter) were left out: cfuture::await_suspend spawns a pool worker that waits on the future and called cont.resume() on that worker, so an event-loop-bound coroutine that co_awaits a cfuture drifts onto a pool thread and then races the loop thread on its own coroutine frame. A chain of cfuture coroutines (each co_awaiting the next) drifts across pool threads with no happens-before -> TSAN data race on the frame. Fix (mirror #54's loop-affine-by-default): - future_types.h: move the loop-context TLS (loop_post_fn / loop_ctx / tl_loop) here from task.h so both the task and the cfuture awaiters can capture it. - future.h: cfuture<T>::await_suspend and cfuture<void>::await_suspend capture tl_loop and, on the pool worker, post the continuation back to the owner loop (falling back to cont.resume() for standalone, no-loop use). - event_loop.cpp: ~event_loop clears tl_loop if it still points at this loop. The constructor advertises the loop on the owner thread's TLS; without the matching clear a later task/cfuture await on that thread would capture a pointer to the destroyed loop (use-after-free) - a pre-existing latent bug the cfuture change above would otherwise widen. Test: tsan_cfuture_await_resumes_on_loop_thread - an event_task co_awaits a pool-completing cfuture directly and asserts the continuation runs on the loop thread (RED before the fix, GREEN after). The existing run_until_done task test is updated: with the loop-affine cfuture await the body now stays on the loop. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
rpp::event_loopis meant to resume every coroutine on its own thread. Two related cases brokethat and tripped TSAN — both with the same root cause: completion was observed off the loop
instead of routed back to it.
The two races
event_taskawaiting atask— if the task's body finished on a pool thread (because itco_awaited anrpp::cfuture),task_final_awaiterdid a symmetric transfer that dragged theevent_taskonto the pool thread, racing the loop'sdone()check.run_until_done(task<T>&)/(deferred<T>&)— these drove a top-level task by pollinghandle.done()in a loop. If the task finished on a pool thread, that poll raced the poolthread on the frame, and the result could be read with no happens-before.
Repro for (1), now a regression test:
The fix — make completion loop-affine
task_final_awaiterposts the awaiter's continuation back to the owning loop (via athread-local loop handle) instead of resuming inline on whatever thread finished the task. A
small lock-free CAS handshake on
promise.continuationcoordinates the two sides — the loopthread registering the awaiter vs. the pool thread completing — with no lost wakeups or
double-resumes. Standalone use (no loop) is unchanged. The captured loop handle is a plain
member published through that CAS, so it stays correct on weak-memory hardware (ARM).
run_until_done(task/deferred)now awaits the task from a tiny internalevent_taskandreuses the (now loop-affine) event_task driver, so the result is produced and the frame
destroyed on the loop thread. The old polling loop is deleted.
No public API change — the new pieces live in
rpp::detail.Verification (gcc + clang, TSAN)
test_event_loop49/49, including 5 new regression tests (event_task await, plusrun_until_donefor task / deferred / void / exception propagation). The key ones stay clean30–50× under
test_until_failure.basic_duration_corotiming assertion (fails onmastertoo — TSAN overhead).clang-tidyclean. The 2 pre-existingtest_event_loopwarnings (lines 89/149) are adifferent mechanism (
background_awaiter_fut) and are left untouched.Notes
README.mdline references refreshed mechanically (update_doc_linerefs.py); no API ordisplay-text changes.
demote-to-plain → docs → the second race). Squash-merge if you'd rather have one.