Commit 7aef4e5
feat(mcp): pause_session tool + MCP-aware pause() yield mode (#5544)
* feat(mcp): pause_session tool + MCP-aware pause() yield mode
In-test pause() calls hung subprocess runs invoked through the MCP server because
readline blocked on stdin that an agent can't supply. pause() now detects MCP
context (CODECEPTJS_MCP=1, non-TTY stdin) and adapts:
- Skip mode (CODECEPTJS_MCP=1 only): pause() prints a notice and resolves
immediately so leftover pause() calls don't deadlock CI runs.
- Yield mode (CODECEPTJS_MCP_PAUSE=1): pause() reads JSON-line commands on
stdin and emits {__mcpPause:true,...} responses on stdout (paused, result,
resumed, exited, error). Each run/snapshot response includes the artifact
bundle from captureSnapshot.
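
The mode selection described above can be sketched roughly as follows. This is an illustration only: `detectPauseMode` is a hypothetical helper, and both the precedence of the yield flag over the skip flag and the fallback to the TTY path are assumptions, not taken from lib/pause.js.

```javascript
// Hypothetical helper: decide how pause() should behave.
// Assumption: a real terminal always keeps the interactive readline REPL.
function detectPauseMode(env, stdinIsTTY) {
  if (stdinIsTTY) return 'tty';
  // Assumption: the yield flag takes precedence over the plain MCP flag.
  if (env.CODECEPTJS_MCP_PAUSE === '1') return 'yield';
  if (env.CODECEPTJS_MCP === '1') return 'skip';
  // Assumption: with no MCP flags set, keep the existing behavior.
  return 'tty';
}
```

A caller would pass `process.env` and `Boolean(process.stdin.isTTY)`.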
The new MCP server pause_session tool spawns a test subprocess in yield mode
and multiplexes start/run/snapshot/step/resume/exit/status sub-actions over
the JSON-line protocol. TTY behavior at a terminal is unchanged.
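
A minimal sketch of the JSON-line framing, assuming each message is one JSON object per line tagged with `__mcpPause: true`. The `event` field name and both helper names are hypothetical; only the marker and the five message kinds come from the description above.

```javascript
// Message kinds named in the commit message.
const PAUSE_EVENTS = ['paused', 'result', 'resumed', 'exited', 'error'];

// Hypothetical: serialize one protocol message as a single JSON line.
// The "event" key is an assumption; the source only shows {__mcpPause:true,...}.
function formatPauseMessage(event, payload = {}) {
  if (!PAUSE_EVENTS.includes(event)) {
    throw new Error(`unknown pause event: ${event}`);
  }
  return JSON.stringify({ __mcpPause: true, event, ...payload }) + '\n';
}

// Hypothetical: pick protocol messages out of mixed stdout,
// skipping ordinary step output printed by the test run.
function parsePauseMessages(chunk) {
  const messages = [];
  for (const line of chunk.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('{')) continue;
    try {
      const parsed = JSON.parse(trimmed);
      if (parsed && parsed.__mcpPause === true) messages.push(parsed);
    } catch {
      // Not valid JSON, so it is regular output: ignore it.
    }
  }
  return messages;
}
```

Tagging every message lets the reader share stdout with normal reporter output without a separate channel.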
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(mcp): simplify pause_session — code in, result out
Drops the id-keyed message multiplexer and 7-action enum (start/run/snapshot/
step/resume/exit/status). The yield-mode subprocess now reads plain text lines
from stdin (same shape as the TTY readline REPL) and emits one JSON line
per input on stdout.
The MCP server pause_session tool exposes only "start" and "run". A run
takes a code string with the same conventions as the TTY pause REPL —
"" steps, "resume" continues, "exit" aborts; anything else is treated as I.<expr>
or =>raw_js. Each run returns the next protocol message.
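
The input conventions above could be dispatched along these lines. `classifyPauseInput` and the shape of its return value are hypothetical; only the four conventions themselves come from the description.

```javascript
// Hypothetical helper: map one line of REPL input to an action.
function classifyPauseInput(code) {
  const input = code.trim();
  if (input === '') return { action: 'step' };
  if (input === 'resume') return { action: 'resume' };
  if (input === 'exit') return { action: 'exit' };
  // "=>" prefix: evaluate the rest as raw JavaScript.
  if (input.startsWith('=>')) {
    return { action: 'eval', source: input.slice(2).trim() };
  }
  // Default: treat the input as a method call on the actor object I.
  return { action: 'eval', source: `I.${input}` };
}
```

For example, `classifyPauseInput('see("Welcome")')` yields a source of `I.see("Welcome")`.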
Net: 237 lines removed, 159 added.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(mcp): pause is a follow-up to run_test, not standalone
run_test now spawns its subprocess in pause yield mode and returns early
with {status:"paused"} when the test hits pause(). The agent then drives
the REPL through the new "pause" tool, which only takes a code string.
Drops the standalone pause_session.start action — pause only makes sense
when a test is already running. Resume / step / exit are just code values
(matching the TTY pause REPL conventions).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(mcp): drop subprocess for pause — run in-process via shared container
Previously pause yield mode spawned a test subprocess and shuttled JSON-line
messages through stdin/stdout. That was a lot of plumbing for something the
existing run_step_by_step tool already does cleanly: run codecept in-process
in the MCP server itself.
Now lib/pause.js exposes setPauseHandler/setNextStep. The MCP server
installs a handler at startup that turns pause() into a Promise the agent
controls. run_test races bootstrap+run() vs that paused promise; on pause
it returns {status:"paused"} with the test promise stashed at module level.
The pause tool drives the REPL by running code through the same I that the
test is using, no IPC. resume/exit await the test promise and return the
final reporter result.
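
A rough sketch of the handler-plus-race idea, under stated assumptions: the `setPauseHandler` and `pause` functions below are simplified stand-ins written for this example, not the code in lib/pause.js, and `runTest`, `continueRun`, and the result objects are hypothetical.

```javascript
// Stand-in for the hook in lib/pause.js (assumed shape).
let pauseHandler = null;
function setPauseHandler(handler) { pauseHandler = handler; }
async function pause() {
  if (pauseHandler) return pauseHandler();
  // The real module would fall back to the TTY readline REPL here.
}

// Server-side state, kept at module level as the commit describes.
let releasePause = null; // resolves the promise the test is awaiting
let notifyPaused = null; // resolves the "test hit pause()" signal
let pendingRun = null;   // promise for the whole test run

// Installed once at startup: pause() becomes a promise the agent controls.
setPauseHandler(() => new Promise(resolve => {
  releasePause = resolve;
  if (notifyPaused) notifyPaused({ status: 'paused' });
}));

// Race the test run against the paused signal; return whichever comes first.
function runTest(testBody) {
  const paused = new Promise(resolve => { notifyPaused = resolve; });
  pendingRun = testBody().then(() => ({ status: 'completed' }));
  return Promise.race([pendingRun, paused]);
}

// Release the paused test and wait for the final result.
function continueRun() {
  if (!releasePause) throw new Error('no paused test');
  const release = releasePause;
  releasePause = null;
  release();
  return pendingRun;
}
```

Because everything lives in one process, the paused test and the agent's follow-up calls share the same objects and no IPC is needed.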
Drops: pauseChild, pauseProtocolWaiters, pauseProcessChunk, mcpYieldSession,
emitMcpProtocol, ensureMcpReadline, the CODECEPTJS_MCP* env detection in
lib/pause.js. The TTY readline path is unchanged.
Net: 270 added, 526 removed across pause/mcp files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(mcp): drop pause tool — use run_code + continue
The pause tool was duplicating the TTY pause REPL (empty/resume/exit magic
strings, => prefix, default I.<expr>) when MCP already has run_code for
running code against the live container. Both tools share the same I, so
during a paused test, run_code is the right surface for code execution.
Replace pause with a simple "continue" tool that just releases the paused
test and returns the final reporter result. Drop setNextStep — no
step-by-step mode for MCP (use run_step_by_step if needed).
Net: 55 added, 152 removed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(mcp): don't override process.stdout across the pause window
The previous patch hijacked process.stdout.write at the start of run_test
and only restored it inside collectRunCompletion (i.e., on continue). That
muted the MCP SDK's own protocol writes during the pause window — any
run_code or continue response would be lost.
Reuse the existing withSilencedIO helper instead. Wrap run_test's race
and continue's await-pending-run inside it, so stdout is muted while
codecept is producing step output and restored before the tool returns
its MCP response. The MCP SDK writes responses on a clean stdout.
While paused, the test is suspended (handler promise unresolved), so no
test output is being produced — no need to mute. run_code calls during
pause go through the existing run_code handler, which has its own
isolation pattern.
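
The scoped-mute pattern can be sketched as below. `withSilencedStdout` is a hypothetical stand-in written for this example; the actual withSilencedIO helper is not shown in this commit message and may differ.

```javascript
// Hypothetical sketch: mute process.stdout.write only while fn runs.
async function withSilencedStdout(fn) {
  const originalWrite = process.stdout.write;
  process.stdout.write = (chunk, encoding, callback) => {
    // Honor the optional completion callback so callers do not hang.
    const done = typeof encoding === 'function' ? encoding : callback;
    if (typeof done === 'function') done();
    return true;
  };
  try {
    return await fn();
  } finally {
    // Always restore, even when fn throws, so later protocol
    // responses are written to a clean stdout.
    process.stdout.write = originalWrite;
  }
}
```

Restoring in `finally` is the key point: the override never outlives the call that installed it.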
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(mcp): pauseAt step breakpoint + rich paused payload
run_test now accepts an optional pauseAt (1-based step index). The MCP
server tracks step.after events; when stepIndex matches pauseAt, it
schedules pauseNow() through the recorder so the test pauses between
steps. Useful as a programmatic breakpoint without editing the test —
the agent gets step indices via the list CLI or run_step_by_step.
The paused response now includes:
- pausedAfter: { index, name, status } of the last completed step
- page: { url, title, contentSize } via the live helper
- suggestions: which tool to call next (snapshot / run_code / continue)
lib/pause.js gains pauseNow() which schedules a one-shot pauseSession via
recorder.add — the same mechanism as the in-test pause() but without
re-attaching the global event listeners.
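
A minimal sketch of the step-counting breakpoint, assuming a Node.js EventEmitter stands in for the CodeceptJS event dispatcher. `installPauseAt` is hypothetical, and `pauseNow` is passed in as a plain callback here rather than being scheduled through the recorder.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical helper: count step.after events and trigger a pause
// once the 1-based step index reaches pauseAt.
function installPauseAt(dispatcher, pauseAt, pauseNow) {
  let stepIndex = 0;
  const onStepAfter = step => {
    stepIndex += 1;
    if (stepIndex !== pauseAt) return;
    // One-shot breakpoint: stop listening once it has fired.
    dispatcher.off('step.after', onStepAfter);
    pauseNow({ index: stepIndex, name: step.name, status: step.status });
  };
  dispatcher.on('step.after', onStepAfter);
  // Return a cleanup function for runs that end before the breakpoint.
  return () => dispatcher.off('step.after', onStepAfter);
}

// Usage with a stand-in dispatcher:
const dispatcher = new EventEmitter();
const hits = [];
installPauseAt(dispatcher, 2, info => hits.push(info));
dispatcher.emit('step.after', { name: 'amOnPage', status: 'success' });
dispatcher.emit('step.after', { name: 'click', status: 'success' });
dispatcher.emit('step.after', { name: 'see', status: 'success' });
// hits now holds one entry: { index: 2, name: 'click', status: 'success' }
```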
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(mcp): make run_step_by_step actually interactive
Previously run_step_by_step ran the whole test to completion in one call
and returned a fat blob of per-step artifacts. That's the aiTrace plugin's
job, not an interactive tool's.
Now it pauses after every step using the same pauseNow + handler machinery
as run_test's pauseAt: the agent calls run_step_by_step, gets back a paused
payload after step 1, calls continue to advance to step 2, and so on. At
any pause it can call run_code / snapshot to inspect state.
continue is unified: it races "test paused again" vs "test completed", so
the same call works for run_step_by_step (re-pauses each time), pauseAt
(runs to end), and explicit pause() in the test (runs to end). Module-
level pendingTestFile / pendingStepInfo carry the paused-payload data
through repeated continue cycles.
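
The unified continue can be sketched as a race that is re-armed on every call. `PausableRun` and its method names are hypothetical and keep state on an instance for the sake of a self-contained example, whereas the commit describes module-level variables.

```javascript
// Hypothetical sketch of the repeated pause/continue cycle.
class PausableRun {
  constructor() {
    this.release = null; // resolves the pause the test is waiting on
    this.onPause = null; // resolves the current "paused again" signal
    this.pending = null; // promise for the whole test run
  }

  // Called from the test side at each pause point.
  pause(info) {
    return new Promise(resolve => {
      this.release = resolve;
      if (this.onPause) this.onPause({ status: 'paused', pausedAfter: info });
    });
  }

  // Arm a fresh paused signal and race it against completion.
  raceNext() {
    const paused = new Promise(resolve => { this.onPause = resolve; });
    return { paused, outcome: () => Promise.race([this.pending, paused]) };
  }

  start(testBody) {
    const next = this.raceNext();
    this.pending = testBody(this).then(() => ({ status: 'completed' }));
    return next.outcome();
  }

  // Same call whether the test re-pauses or runs to the end.
  continueRun() {
    if (!this.release) throw new Error('nothing is paused');
    const next = this.raceNext();
    const release = this.release;
    this.release = null;
    release();
    return next.outcome();
  }
}
```

A test that pauses after every step returns a paused payload from each `continueRun()` until the last one, which resolves with the completed result.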
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: DavertMik <davert@testomat.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 6f791dc commit 7aef4e5
5 files changed
Lines changed: 454 additions & 193 deletions