Commit 7aef4e5

DavertMik and claude authored
feat(mcp): pause_session tool + MCP-aware pause() yield mode (#5544)
* feat(mcp): pause_session tool + MCP-aware pause() yield mode

In-test pause() calls hung subprocess runs invoked through the MCP
server because readline blocked on stdin that an agent can't supply.
pause() now detects MCP context (CODECEPTJS_MCP=1, non-TTY stdin) and
adapts:

- Skip mode (CODECEPTJS_MCP=1 only): pause() prints a notice and
  resolves immediately so leftover pause() calls don't deadlock CI runs.
- Yield mode (CODECEPTJS_MCP_PAUSE=1): pause() reads JSON-line commands
  on stdin and emits {__mcpPause:true,...} responses on stdout (paused,
  result, resumed, exited, error). Each run/snapshot response includes
  the artifact bundle from captureSnapshot.

The new MCP server pause_session tool spawns a test subprocess in yield
mode and multiplexes start/run/snapshot/step/resume/exit/status
sub-actions over the JSON-line protocol.

TTY behavior at a terminal is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(mcp): simplify pause_session — code in, result out

Drops the id-keyed message multiplexer and 7-action enum
(run/snapshot/step/resume/exit/status). The yield-mode subprocess now
reads plain text lines from stdin (same shape as the TTY readline REPL)
and emits one JSON line per input on stdout.

The MCP server pause_session tool exposes only "start" and "run". A run
takes a code string with the same conventions as the TTY pause REPL —
"" steps, "resume" continues, "exit" aborts, otherwise treat as I.<expr>
or =>raw_js. Each run returns the next protocol message.

Net: 237 lines removed, 159 added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(mcp): pause is a follow-up to run_test, not standalone

run_test now spawns its subprocess in pause yield mode and returns early
with {status:"paused"} when the test hits pause(). The agent then drives
the REPL through the new "pause" tool, which only takes a code string.

Drops the standalone pause_session.start action — pause only makes sense
when a test is already running. Resume / step / exit are just code
values (matching the TTY pause REPL conventions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(mcp): drop subprocess for pause — run in-process via shared container

Previously pause yield mode spawned a test subprocess and shuttled
JSON-line messages through stdin/stdout. That was a lot of plumbing for
something the existing run_step_by_step tool already does cleanly: run
codecept in-process in the MCP server itself.

Now lib/pause.js exposes setPauseHandler/setNextStep. The MCP server
installs a handler at startup that turns pause() into a Promise the
agent controls. run_test races bootstrap+run() vs that paused promise;
on pause it returns {status:"paused"} with the test promise stashed at
module level. The pause tool drives the REPL by running code through the
same I that the test is using, no IPC. resume/exit await the test
promise and return the final reporter result.

Drops: pauseChild, pauseProtocolWaiters, pauseProcessChunk,
mcpYieldSession, emitMcpProtocol, ensureMcpReadline, the CODECEPTJS_MCP*
env detection in lib/pause.js. The TTY readline path is unchanged.

Net: 270 added, 526 removed across pause/mcp files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(mcp): drop pause tool — use run_code + continue

The pause tool was duplicating the TTY pause REPL (empty/resume/exit
magic strings, => prefix, default I.<expr>) when MCP already has
run_code for running code against the live container. Both tools share
the same I, so during a paused test, run_code is the right surface for
code execution.

Replace pause with a simple "continue" tool that just releases the
paused test and returns the final reporter result. Drop setNextStep — no
step-by-step mode for MCP (use run_step_by_step if needed).

Net: 55 added, 152 removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(mcp): don't override process.stdout across the pause window

The previous patch hijacked process.stdout.write at the start of
run_test and only restored it inside collectRunCompletion (i.e., on
continue). That muted the MCP SDK's own protocol writes during the pause
window — any run_code or continue response would be lost.

Reuse the existing withSilencedIO helper instead. Wrap run_test's race
and continue's await-pending-run inside it, so stdout is muted while
codecept is producing step output and restored before the tool returns
its MCP response. The MCP SDK writes responses on a clean stdout.

While paused, the test is suspended (handler promise unresolved), so no
test output is being produced — no need to mute. run_code calls during
pause go through the existing run_code handler, which has its own
isolation pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(mcp): pauseAt step breakpoint + rich paused payload

run_test now accepts an optional pauseAt (1-based step index). The MCP
server tracks step.after events; when stepIndex matches pauseAt, it
schedules pauseNow() through the recorder so the test pauses between
steps. Useful as a programmatic breakpoint without editing the test —
the agent gets step indices via the list CLI or run_step_by_step.

The paused response now includes:

- pausedAfter: { index, name, status } of the last completed step
- page: { url, title, contentSize } via the live helper
- suggestions: which tool to call next (snapshot / run_code / continue)

lib/pause.js gains pauseNow() which schedules a one-shot pauseSession
via recorder.add — the same mechanism as the in-test pause() but without
re-attaching the global event listeners.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(mcp): make run_step_by_step actually interactive

Previously run_step_by_step ran the whole test to completion in one call
and returned a fat blob of per-step artifacts. That's the aiTrace
plugin's job, not an interactive tool's.

Now it pauses after every step using the same pauseNow + handler
machinery as run_test's pauseAt: agent calls run_step_by_step, gets back
a paused payload after step 1, calls continue to advance to step 2, and
so on. At any pause they can run_code / snapshot to inspect state.

continue is unified: it races "test paused again" vs "test completed",
so the same call works for run_step_by_step (re-pauses each time),
pauseAt (runs to end), and explicit pause() in the test (runs to end).
Module-level pendingTestFile / pendingStepInfo carry the paused-payload
data through repeated continue cycles.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: DavertMik <davert@testomat.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
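The in-process design described in the message (a handler that turns pause() into a Promise, a run that races "paused" against "completed", and a continue that repeats the same race) can be illustrated with a small standalone sketch. This is not the code from lib/pause.js or the MCP server: only the names pause() and setPauseHandler come from the commit message, while runTest, continueRun, raceRun, armPausedSignal, and demoTest are hypothetical, and the recorder, container, and reporter are left out entirely.

```javascript
// Hypothetical stand-in for the hook lib/pause.js is described as exposing.
let pauseHandler = null;
function setPauseHandler(handler) {
  pauseHandler = handler;
}

// What a test would call. With a handler installed, it waits on the
// handler's promise instead of opening a readline REPL (TTY path omitted).
async function pause() {
  if (pauseHandler) return pauseHandler();
}

// "Server" side state, kept at module level so it survives across tool calls.
let releasePaused = null; // resolves the promise the paused test is awaiting
let notifyPaused = null;  // resolves the current "test paused" signal
let pendingRun = null;    // promise of the running test

// Create a fresh "paused" signal; called once per pause cycle.
function armPausedSignal() {
  return new Promise((resolve) => { notifyPaused = resolve; });
}
let pausedSignal = armPausedSignal();

// Installed once at startup: suspend the test and signal the waiting tool.
setPauseHandler(() => new Promise((resolve) => {
  releasePaused = resolve;
  notifyPaused({ status: 'paused' });
}));

// Race "test completed" against "test paused"; shared by runTest and continueRun.
async function raceRun() {
  const outcome = await Promise.race([
    pendingRun.then((result) => ({ status: 'completed', result })),
    pausedSignal,
  ]);
  if (outcome.status === 'paused') {
    pausedSignal = armPausedSignal(); // re-arm for a possible later pause
  } else {
    pendingRun = null;
  }
  return outcome;
}

// Start a test and return as soon as it either pauses or finishes.
function runTest(testFn) {
  pendingRun = testFn();
  return raceRun();
}

// Release the paused test, then race again: it may pause again or finish.
async function continueRun() {
  if (!releasePaused) throw new Error('no paused test to continue');
  const release = releasePaused;
  releasePaused = null;
  release();
  return raceRun();
}

// A toy "test" with a single pause() between two steps.
async function demoTest() {
  const log = ['step 1'];
  await pause();
  log.push('step 2');
  return log;
}

// Drive it: the first call stops at the pause, the second runs to the end.
async function demo() {
  const first = await runTest(demoTest);
  const second = await continueRun();
  return [first, second];
}
```

In this sketch the same raceRun serves both entry points, which mirrors the "continue is unified" idea: a test that pauses after every step simply resolves the re-armed signal again, while a test with no further pauses resolves pendingRun.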
1 parent 6f791dc commit 7aef4e5
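The stdout fix in the message above depends on scoping: output is muted only while test code runs and is restored before the tool writes its response. The helper below is a guess at what a function like withSilencedIO might look like. The real helper's body is not shown in this commit message, so the implementation and the name of the callback parameter are assumptions.

```javascript
// Assumed shape of a scoped stdout mute; the project's real withSilencedIO
// may differ (for example, it may also handle stderr or console methods).
async function withSilencedIO(fn) {
  const originalWrite = process.stdout.write;
  // Swallow writes, but still honour the optional completion callback
  // so callers waiting on it are not left hanging.
  process.stdout.write = (chunk, encoding, callback) => {
    const done = typeof encoding === 'function' ? encoding : callback;
    if (typeof done === 'function') done();
    return true;
  };
  try {
    return await fn();
  } finally {
    // Always restore, even if fn throws, so later protocol writes go through.
    process.stdout.write = originalWrite;
  }
}
```

Because the restore sits in a finally block, a failing test run cannot leave stdout muted across the pause window, which is the failure mode the fix describes.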

5 files changed

Lines changed: 454 additions & 193 deletions
