future: make cfuture co_await loop-affine too (stacked on #54)#55
future: make cfuture co_await loop-affine too (stacked on #54)#55battlesnake wants to merge 1 commit into
Conversation
…drift) #54 made task co_await loop-affine (task_final_awaiter posts the continuation back to the owner loop). cfuture coroutines (initial/final_suspend = never, no task_final_awaiter) were left out: cfuture::await_suspend spawns a pool worker that waits on the future and called cont.resume() on that worker, so an event-loop-bound coroutine that co_awaits a cfuture drifts onto a pool thread and then races the loop thread on its own coroutine frame. A chain of cfuture coroutines (each co_awaiting the next) drifts across pool threads with no happens-before -> TSAN data race on the frame. Fix (mirror #54's loop-affine-by-default): - future_types.h: move the loop-context TLS (loop_post_fn / loop_ctx / tl_loop) here from task.h so both the task and the cfuture awaiters can capture it. - future.h: cfuture<T>::await_suspend and cfuture<void>::await_suspend capture tl_loop and, on the pool worker, post the continuation back to the owner loop (falling back to cont.resume() for standalone, no-loop use). - event_loop.cpp: ~event_loop clears tl_loop if it still points at this loop. The constructor advertises the loop on the owner thread's TLS; without the matching clear a later task/cfuture await on that thread would capture a pointer to the destroyed loop (use-after-free) - a pre-existing latent bug the cfuture change above would otherwise widen. Test: tsan_cfuture_await_resumes_on_loop_thread - an event_task co_awaits a pool-completing cfuture directly and asserts the continuation runs on the loop thread (RED before the fix, GREEN after). The existing run_until_done task test is updated: with the loop-affine cfuture await the body now stays on the loop. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
97ec896 to
0bfa8d2
Compare
|
Heads up before this merges: I pinned a downstream consumer (krattlink) to this commit to verify it end to end, and it regressed a radio-config handshake test.
I've reverted the krattlink pin back to #54 for now (which is functionally correct, 102ms pass). This PR needs rework before it can land: keep cfuture awaits loop-affine without stalling a cross-loop handshake. Worth investigating whether the loop post needs a wakeup/notify, or whether the downstream test simply needs to pump both loops. Converting to draft until then. |
Stacked on #54 (base branch:
fix/event-loop-task-pool-thread-race).Problem
#54 made
taskco_awaitloop-affine:task_final_awaiterposts the continuation back to the owner loop instead of resuming on the completing pool thread.cfuturecoroutines were left out - they useinitial_suspend/final_suspend = suspend_neverand have notask_final_awaiter, andcfuture::await_suspendspawns a pool worker that callscont.resume()on that worker. So an event-loop-bound coroutine thatco_awaits a cfuture drifts onto a pool thread, and a chain of cfuture coroutines hops across pool threads with no happens-before edge, racing each other's frames under TSAN.Approach (mirror #54's loop-affine-by-default)
loop_post_fn/loop_ctx/tl_loop) here from task.h so both the task and cfuture awaiters share it.cfuture<T>/cfuture<void>::await_suspendcapturetl_loopand post the continuation back to the owner loop (falling back tocont.resume()with no loop context).~event_loopclearstl_loopif it still points at this loop, so a later await on that thread cannot capture a pointer to the destroyed loop.Test
tsan_cfuture_await_resumes_on_loop_thread: anevent_taskco_awaits a pool-completing cfuture and asserts the continuation runs on the loop thread.