feat(stop-hook): reply-send guard for interactive channels (Signal + Email) #199
Open
mxzinke wants to merge 3 commits into
Interactive Signal sessions must call `signal send` explicitly to deliver a
reply — the harness does not auto-send the assistant's prose. The recurring
KW24 failure: the agent writes a full reply as text, never sends it, and the
user is left waiting ("Hello?" / "Und?").
This is a real harness gap, not a prompting problem, so the fix is enforcement:
- signal-addon.py: new `has_pending_reply(db, contact)` + `needs-reply <number>`
subcommand. Ground truth is the Signal DB — inbound (cmd_incoming) and
outbound (cmd_send) rows are stored in order, so a highest-id row that is
still inbound means no reply has gone out. Exits 0 when a reply is pending,
1 otherwise; fails open on any error.
- stop.sh: reads the hook payload from stdin and, for ATLAS_TRIGGER_CHANNEL=
signal, blocks the stop with a clear nudge when a reply is pending. Gated by
`stop_hook_active` so it fires at most once per turn — if the agent genuinely
has nothing to send, the next stop is allowed instead of looping.
Tests: 6 new cases for has_pending_reply (inbound/outbound/new-inbound/unknown/
empty/per-contact); 48 pass total. Guard branches + one-shot behavior verified
manually. Docs updated (hooks.md, Integrations.md).
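The "highest-id row still inbound" check described above can be sketched as follows. This is a minimal illustration, not the code from signal-addon.py: the `messages` table, its `contact` and `direction` columns, the `'in'`/`'out'` values, and the `needs_reply_exit_code` helper are assumptions made for the example.

```python
import sqlite3


def has_pending_reply(db: sqlite3.Connection, contact: str) -> bool:
    """Return True when the newest stored row for `contact` is inbound.

    Assumed (hypothetical) schema: messages(id, contact, direction, body),
    with direction 'in' for received and 'out' for sent messages.
    """
    try:
        row = db.execute(
            "SELECT direction FROM messages "
            "WHERE contact = ? ORDER BY id DESC LIMIT 1",
            (contact,),
        ).fetchone()
    except sqlite3.Error:
        # Fail open: a database problem must never block a stop.
        return False
    # No rows at all means an unknown contact, so nothing is pending.
    return row is not None and row[0] == "in"


def needs_reply_exit_code(db: sqlite3.Connection, contact: str) -> int:
    """Map the check onto the exit codes described above (0 = pending)."""
    return 0 if has_pending_reply(db, contact) else 1
```

Because the query filters by contact before taking the newest row, an outbound message to one contact does not clear a pending reply for another, which matches the per-contact test case listed above.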
…rding
Per review feedback: the same "composed but never sent" failure mode applies to
the email-handler channel, where the agent must run `email reply` explicitly;
prose in the turn is never delivered. Generalize the guard to both interactive
reply channels.
- email_db.py: new EmailDb.reply_pending(thread_id). If the newest message in
the thread is inbound (direction='in'), a reply is outstanding. Ordered by
created_at, id.
- email-addon.py: new `email needs-reply <thread_id>` subcommand (silent, exits
0 when pending / 1 otherwise, fails open).
- stop.sh: the guard is now channel-driven (signal → signal send, email →
email reply). Reminder wording spells out that the correspondent only sees
what is actually sent, and that no-reply cases (already answered, triaged/
archived, acknowledgement) may stop. It fires only once.
Tests: 5 new reply_pending cases (email_db 70 pass); signal 48, email_addon 161
unchanged. CLI exit codes verified for both channels. Docs updated.
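The channel-driven, one-shot decision can be sketched in Python. The real guard lives in stop.sh, so this is a rough model of the logic only. The `CHANNELS` table, the `target` argument (how the number or thread id reaches the hook is not stated here), the five-second timeout, and the `{"decision": "block", "reason": ...}` result shape are all assumptions for illustration.

```python
import subprocess

# Hypothetical mapping: channel -> (needs-reply probe, command the agent must run).
CHANNELS = {
    "signal": (["signal", "needs-reply"], "signal send"),
    "email": (["email", "needs-reply"], "email reply"),
}


def reply_pending(probe_cmd, timeout_s=5.0):
    """Run the needs-reply probe; exit code 0 means a reply is pending."""
    try:
        result = subprocess.run(
            probe_cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Fail open: a missing or hung probe must never block a stop.
        return False
    return result.returncode == 0


def guard_decision(payload, channel, target, probe=reply_pending):
    """Return a block decision (assumed shape), or None to allow the stop."""
    if payload.get("stop_hook_active"):
        return None  # already nudged once this turn; do not loop
    entry = CHANNELS.get(channel)
    if entry is None or not target:
        return None  # not an interactive reply channel, or nothing to check
    probe_cmd, send_cmd = entry
    if not probe(probe_cmd + [target]):
        return None
    return {
        "decision": "block",
        "reason": (
            f"A reply on the {channel} channel is still pending. The "
            f"correspondent only sees what is actually sent, so run "
            f"`{send_cmd}` now, or stop again if no reply is needed."
        ),
    }
```

Passing the probe in as a parameter keeps the decision testable without the real CLIs; a caller would read the payload with `json.load(sys.stdin)` and take the channel from `ATLAS_TRIGGER_CHANNEL`.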
Why
KW24 post-mortem (§3.1) + review feedback from Max: on interactive reply channels the agent writes a full reply as prose in its turn but forgets to actually send it (`signal send` / `email reply`), so the correspondent receives nothing and has to nudge ("Hello?" / "Und?"). The harness does not auto-deliver the turn's prose, so "forgot to send" is a structural failure, not a discipline problem.
Max's design criteria, all met:
What changed
Ground truth from each channel's own DB (no transcript parsing):
- `signal needs-reply <number>`: `has_pending_reply()` on the Signal DB.
- `email needs-reply <thread_id>`: `EmailDb.reply_pending()`.
Newest message inbound ⇒ reply outstanding. Exits 0 when pending, 1 otherwise; fails open on any error.
`stop.sh` reads the hook payload from stdin and, for `ATLAS_TRIGGER_CHANNEL` in {signal, email}, blocks the stop once with a channel-specific reminder spelling out that the correspondent only sees what's actually sent, not the turn's prose, and that no-reply cases (already answered, triaged/archived, acknowledgement) may stop.
- `stop_hook_active` → at most once per turn; never a hard gate.
- `timeout`-guarded so the hook can never hang a stop.
Tests
- Signal: 6 `has_pending_reply` cases (48 pass).
- Email: 5 `reply_pending` cases (email_db 70 pass); email_addon 161 unchanged.
Docs: `hooks.md`, `Integrations.md`.
🤖 Generated with Atlas