
Commit 72e92f4

refactor(12/12): add snapshot test fixtures and benchmarks (#330)
This is **PR 12 of 12**, the final PR in the stacked series that decouples the rendering pipeline from MCP transport. It depends on PR 11 and adds the snapshot test suite and performance benchmarks that validate the entire rendering pipeline end-to-end. The line count is large, but almost all of it is test fixtures (expected output files) and benchmark scripts.

The snapshot test infrastructure captures the rendered output of tool invocations and compares it against expected fixtures. This provides regression protection for the rendering pipeline: any change to event formatting, diagnostic grouping, or output ordering will show up as a fixture mismatch.

**Test harness** (`src/snapshot-tests/`):

- `harness.ts`: Core test runner that invokes tools with mock executors and captures rendered output
- `fixture-io.ts`: Reads/writes fixture files, handles normalization (timestamps, paths, UUIDs)
- `normalize.ts`: Output normalization for stable comparisons across environments
- `resource-harness.ts`: Resource-specific snapshot testing

**Fixtures**: Expected output files for each tool, covering success, error, and edge-case scenarios. These serve as living documentation of what each tool's output looks like.

**Benchmarks**: Performance benchmarks for the rendering pipeline and xcodebuild parsing:

- Parser throughput: lines/second for xcodebuild output parsing
- Render session performance: events/second for the text and JSON strategies
- End-to-end tool invocation timing

These benchmarks establish baselines and can be run in CI to catch performance regressions.

This PR is large by line count but low in conceptual complexity. The fixture files are auto-generated expected outputs, and the benchmark scripts are straightforward timing loops. The meaningful code is the ~500 lines of test harness infrastructure.
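
As a rough illustration of what the harness and normalization steps involve, the following sketch builds a mock executor, replaces volatile values (UUIDs, ISO-8601 timestamps, the home directory path) with placeholders, and compares the result line by line against an expected fixture. All names here (`createMockExecutor`, `normalizeOutput`, `compareToFixture`) and the placeholder tokens are hypothetical. The `CommandResult` shape is only inferred from the `executor(...)` call in the architecture example, so the real `harness.ts` and `normalize.ts` may work quite differently.

```typescript
// Hypothetical sketch: none of these names come from src/snapshot-tests/.
// Assumed result shape, inferred from the executor call in the architecture example.
type CommandResult = { success: boolean; output: string; error?: string };
type CommandExecutor = (command: string[], description: string) => Promise<CommandResult>;

// Mock executor: returns canned results keyed by the space-joined command line,
// so tool logic can run without spawning real processes.
function createMockExecutor(responses: Record<string, CommandResult>): CommandExecutor {
  return async (command) => {
    const key = command.join(' ');
    const response = responses[key];
    if (!response) {
      return { success: false, output: '', error: `No mock response for: ${key}` };
    }
    return response;
  };
}

const UUID_PATTERN = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;
const ISO_TIMESTAMP_PATTERN = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?/g;

// Replace volatile values (UUIDs, ISO-8601 timestamps, the home directory)
// with stable placeholders so identical runs produce identical text on any machine.
function normalizeOutput(text: string, homeDir: string): string {
  let result = text.replace(UUID_PATTERN, '<UUID>').replace(ISO_TIMESTAMP_PATTERN, '<TIMESTAMP>');
  if (homeDir) {
    result = result.split(homeDir).join('<HOME>');
  }
  return result;
}

// Normalize the captured output and compare it line by line with the expected fixture.
// Returns null on a match, otherwise a description of the first differing line.
function compareToFixture(actual: string, expected: string, homeDir: string): string | null {
  const actualLines = normalizeOutput(actual, homeDir).split('\n');
  const expectedLines = expected.split('\n');
  const lineCount = Math.max(actualLines.length, expectedLines.length);
  for (let i = 0; i < lineCount; i++) {
    if (actualLines[i] !== expectedLines[i]) {
      return `line ${i + 1}: expected ${JSON.stringify(expectedLines[i])}, got ${JSON.stringify(actualLines[i])}`;
    }
  }
  return null;
}
```

Normalizing only the actual output keeps the stored fixtures human-readable: they contain the placeholders directly, so a reviewer sees `<UUID>` rather than a value that changes on every run.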
**Stack**:

- PR 1-11/12: All code and configuration changes
- **PR 12/12** (this PR): Snapshot tests and benchmarks

**Test plan**:

- [ ] `npx vitest run` passes, and snapshot tests match expected fixtures
- [ ] `npx vitest run --config vitest.snapshot.config.ts` runs the snapshot suite specifically
- [ ] Benchmarks execute without errors (performance numbers are informational)
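
The parser-throughput benchmark described above boils down to a timing loop. Here is a minimal sketch, assuming a line-oriented parser. `classifyLine` is a hypothetical stand-in for the real xcodebuild parser, and `benchmarkParser` is not the project's benchmark script.

```typescript
// Hypothetical stand-in for the real xcodebuild output parser.
type LineKind = 'error' | 'warning' | 'other';

function classifyLine(line: string): LineKind {
  if (/:\s*error:/.test(line)) return 'error';
  if (/:\s*warning:/.test(line)) return 'warning';
  return 'other';
}

interface BenchResult {
  lines: number;
  elapsedMs: number;
  linesPerSecond: number;
}

// Time repeated passes over a fixed input and report throughput in lines/second.
function benchmarkParser(input: string[], iterations: number, parse: (line: string) => unknown): BenchResult {
  // One untimed warm-up pass so one-time startup costs are less likely to skew the result.
  for (const line of input) parse(line);

  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    for (const line of input) parse(line);
  }
  const elapsedMs = performance.now() - start;
  const lines = input.length * iterations;
  // Guard against a zero elapsed time on very small inputs.
  const linesPerSecond = elapsedMs > 0 ? (lines / elapsedMs) * 1000 : Infinity;
  return { lines, elapsedMs, linesPerSecond };
}

const sample = [
  '/src/App.swift:10:5: error: cannot find type in scope',
  '/src/App.swift:12:3: warning: initialization of variable was never used',
  'CompileSwift normal arm64 /src/App.swift',
];
const result = benchmarkParser(sample, 10_000, classifyLine);
console.log(`${result.lines} lines in ${result.elapsedMs.toFixed(1)} ms (${result.linesPerSecond.toFixed(0)} lines/s)`);
```

Absolute numbers from a loop like this depend on the machine and runtime, which is why they are best compared against a baseline from the same environment rather than treated as pass/fail thresholds.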
1 parent a461561 commit 72e92f4


219 files changed: +6291, -476 lines


docs/dev/ARCHITECTURE.md

Lines changed: 38 additions & 32 deletions
````diff
@@ -201,11 +201,12 @@ Each tool is implemented in TypeScript and follows a standardized pattern that s
 
 ```typescript
 import { z } from 'zod';
-import { createTypedTool } from '../../../utils/typed-tool-factory.js';
+import { createTypedTool, getHandlerContext } from '../../../utils/typed-tool-factory.js';
 import type { CommandExecutor } from '../../../utils/execution/index.js';
 import { getDefaultCommandExecutor } from '../../../utils/execution/index.js';
 import { log } from '../../../utils/logging/index.js';
-import { createTextResponse, createErrorResponse } from '../../../utils/responses/index.js';
+import { withErrorHandling } from '../../../utils/tool-error-handling.js';
+import { header, statusLine } from '../../../utils/tool-event-builders.js';
 
 // 1. Define the Zod schema for parameters
 const someToolSchema = z.object({
@@ -216,41 +217,46 @@ const someToolSchema = z.object({
 // 2. Infer the parameter type from the schema
 type SomeToolParams = z.infer<typeof someToolSchema>;
 
-// 3. Implement the core logic in a separate, testable function
-// This function receives strongly-typed parameters and an injected executor.
+// 3. Implement the core logic as an event-emitting function.
+// Handlers emit structured events via ctx.emit() instead of returning ToolResponse.
 export async function someToolLogic(
   params: SomeToolParams,
   executor: CommandExecutor,
-): Promise<ToolResponse> {
-  log('info', `Executing some_tool with param: ${params.requiredParam}`);
-
-  try {
-    const result = await executor(['some', 'command'], 'Some Tool Operation');
-
-    if (!result.success) {
-      return createErrorResponse('Operation failed', result.error);
-    }
-
-    return createTextResponse(`✅ Success: ${result.output}`);
-  } catch (error) {
-    const errorMessage = error instanceof Error ? error.message : String(error);
-    return createErrorResponse('Tool execution failed', errorMessage);
-  }
+): Promise<void> {
+  const headerEvent = header('Some Tool', [
+    { label: 'Param', value: params.requiredParam },
+  ]);
+  const ctx = getHandlerContext();
+
+  return withErrorHandling(
+    ctx,
+    async () => {
+      const result = await executor(['some', 'command'], 'Some Tool Operation');
+
+      if (!result.success) {
+        ctx.emit(headerEvent);
+        ctx.emit(statusLine('error', `Operation failed: ${result.error}`));
+        return;
+      }
+
+      ctx.emit(headerEvent);
+      ctx.emit(statusLine('success', `Success: ${result.output}`));
+    },
+    {
+      header: headerEvent,
+      errorMessage: ({ message }) => `Tool execution failed: ${message}`,
+    },
+  );
 }
 
-// 4. Export the tool definition for auto-discovery
-export default {
-  name: 'some_tool',
-  description: 'Tool description for AI agents. Example: some_tool({ requiredParam: "value" })',
-  schema: someToolSchema.shape, // Expose shape for MCP SDK
-
-  // 5. Create the handler using the type-safe factory
-  handler: createTypedTool(
-    someToolSchema,
-    someToolLogic,
-    getDefaultCommandExecutor,
-  ),
-};
+// 4. Export schema shape and handler for manifest-driven auto-discovery
+export const schema = someToolSchema.shape;
+
+export const handler = createTypedTool(
+  someToolSchema,
+  someToolLogic,
+  getDefaultCommandExecutor,
+);
 ```
 
 This pattern ensures that:
````
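
The rewritten architecture example replaces `ToolResponse` return values with events emitted through `ctx.emit()`. To make that control flow concrete, here is a minimal self-contained sketch. `ToolEvent`, `HandlerContext`, `createCollectingContext`, and `renderText` are hypothetical. `header`, `statusLine`, and `withErrorHandling` are simplified stand-ins whose shapes are inferred from the diff, so the real helpers in `utils/` may differ. In particular, it is an assumption that the real `withErrorHandling` emits the supplied header before the error status.

```typescript
// Hypothetical stand-ins: shapes are inferred from the diff, not copied from src/utils/.
type Field = { label: string; value: string };
type ToolEvent =
  | { type: 'header'; title: string; fields: Field[] }
  | { type: 'status'; level: 'success' | 'error'; message: string };

interface HandlerContext {
  emit(event: ToolEvent): void;
}

const header = (title: string, fields: Field[]): ToolEvent => ({ type: 'header', title, fields });
const statusLine = (level: 'success' | 'error', message: string): ToolEvent => ({
  type: 'status',
  level,
  message,
});

// Runs the handler body; if it throws, emits the header plus an error status line
// instead of letting the exception propagate to the caller.
async function withErrorHandling(
  ctx: HandlerContext,
  body: () => Promise<void>,
  options: { header: ToolEvent; errorMessage: (error: { message: string }) => string },
): Promise<void> {
  try {
    await body();
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    ctx.emit(options.header);
    ctx.emit(statusLine('error', options.errorMessage({ message })));
  }
}

// A context that simply records events, the kind of hook a snapshot harness could use.
function createCollectingContext(): HandlerContext & { events: ToolEvent[] } {
  const events: ToolEvent[] = [];
  return {
    events,
    emit: (event) => {
      events.push(event);
    },
  };
}

// One possible text rendering; a JSON strategy could serialize the same events instead.
function renderText(events: ToolEvent[]): string {
  return events
    .map((event) =>
      event.type === 'header'
        ? [event.title, ...event.fields.map((field) => `  ${field.label}: ${field.value}`)].join('\n')
        : `${event.level.toUpperCase()}: ${event.message}`,
    )
    .join('\n');
}
```

Because a handler written this way only calls `ctx.emit()`, the same events can be rendered as text, serialized as JSON, or recorded by a collecting context. That separation is presumably what lets the snapshot harness in this PR capture rendered output without going through MCP transport.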

src/mcp/tools/debugging/__tests__/debugging-tools.test.ts

Lines changed: 0 additions & 1 deletion
```diff
@@ -47,7 +47,6 @@ import {
 } from '../debug_variables.ts';
 import { allText, runLogic } from '../../../../test-utils/test-helpers.ts';
 
-
 function createMockBackend(overrides: Partial<DebuggerBackend> = {}): DebuggerBackend {
   return {
     kind: 'dap',
```

src/mcp/tools/device/__tests__/get_device_app_path.test.ts

Lines changed: 0 additions & 1 deletion
```diff
@@ -9,7 +9,6 @@ import { schema, handler, get_device_app_pathLogic } from '../get_device_app_pat
 import { sessionStore } from '../../../../utils/session-store.ts';
 import { runLogic } from '../../../../test-utils/test-helpers.ts';
 
-
 describe('get_device_app_path plugin', () => {
   beforeEach(() => {
     sessionStore.clear();
```

src/mcp/tools/device/__tests__/install_app_device.test.ts

Lines changed: 0 additions & 1 deletion
```diff
@@ -5,7 +5,6 @@ import { schema, handler, install_app_deviceLogic } from '../install_app_device.
 import { sessionStore } from '../../../../utils/session-store.ts';
 import { allText, runLogic } from '../../../../test-utils/test-helpers.ts';
 
-
 describe('install_app_device plugin', () => {
   beforeEach(() => {
     sessionStore.clear();
```

src/mcp/tools/device/__tests__/launch_app_device.test.ts

Lines changed: 0 additions & 1 deletion
```diff
@@ -8,7 +8,6 @@ import { schema, handler, launch_app_deviceLogic } from '../launch_app_device.ts
 import { sessionStore } from '../../../../utils/session-store.ts';
 import { allText, runLogic } from '../../../../test-utils/test-helpers.ts';
 
-
 describe('launch_app_device plugin (device-shared)', () => {
   beforeEach(() => {
     sessionStore.clear();
```

src/mcp/tools/device/__tests__/list_devices.test.ts

Lines changed: 4 additions & 1 deletion
```diff
@@ -216,7 +216,10 @@ describe('list_devices plugin (device-shared)', () => {
     const text = allText(result);
     expect(text).toContain('Test iPhone');
     expect(text).toContain('test-device-123');
-    expect(result.nextStepParams).toBeUndefined();
+    expect(result.nextStepParams).toEqual({
+      build_device: { scheme: 'YOUR_SCHEME', deviceId: 'UUID_FROM_ABOVE' },
+      install_app_device: { deviceId: 'UUID_FROM_ABOVE', appPath: 'PATH_TO_APP' },
+    });
   });
 
   it('should return successful xctrace fallback response', async () => {
```

src/mcp/tools/device/__tests__/stop_app_device.test.ts

Lines changed: 0 additions & 1 deletion
```diff
@@ -5,7 +5,6 @@ import { schema, handler, stop_app_deviceLogic } from '../stop_app_device.ts';
 import { sessionStore } from '../../../../utils/session-store.ts';
 import { allText, runLogic } from '../../../../test-utils/test-helpers.ts';
 
-
 describe('stop_app_device plugin', () => {
   beforeEach(() => {
     sessionStore.clear();
```
