Skip to content

feat(main-ai): migrate AI core to main backend#15326

Open
DeJeune wants to merge 4 commits into
stack/ai-service/01-provider-corefrom
stack/ai-service/02-main-ai-backend
Open

feat(main-ai): migrate AI core to main backend#15326
DeJeune wants to merge 4 commits into
stack/ai-service/01-provider-corefrom
stack/ai-service/02-main-ai-backend

Conversation

@DeJeune
Copy link
Copy Markdown
Collaborator

@DeJeune DeJeune commented May 26, 2026

What this PR does

Before this PR:
AI execution was split between renderer-owned aiCore, main-process services, MCP services, agent session code, message conversion, tracing, and transport glue.

After this PR:
The AI runtime is consolidated under src/main/ai, renderer aiCore migration sources are removed, shared transport/types are introduced, DataApi/schema wiring is updated, and AI architecture documentation is included with the backend migration.

Fixes # N/A

Why we need it and why it was done in this way

The following tradeoffs were made:
This PR is intentionally the largest stack layer because it preserves the reviewer focus on backend ownership and the renderer-to-main aiCore migration. Renderer consumer cleanup is deferred to the next PR.

The following alternatives were considered:
Splitting every src/main/ai cluster into separate PRs was considered, but that would make the renderer aiCore migration harder to review as one backend ownership change.

Links to places where the discussion took place: N/A

Breaking changes

None for current released users. v2 refactor migration notes are included under v2-refactor-temp/docs/breaking-changes/ for tracking the in-progress v2 data/runtime transition.

If this PR introduces breaking changes, please describe the changes and the impact on users.

Special notes for your reviewer

Stack PR 2/3. Base this review on the provider split PR. Please focus on src/main/ai, the removed src/renderer/src/aiCore migration sources, shared AI transport/types, DataApi/schema wiring, and the architecture docs in docs/references/ai and v2-refactor-temp/docs/ai.

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note

NONE

@DeJeune DeJeune force-pushed the stack/ai-service/02-main-ai-backend branch from 6b7882f to 9d1a527 Compare May 26, 2026 05:06
@DeJeune DeJeune force-pushed the stack/ai-service/01-provider-core branch 2 times, most recently from d619d4d to 2f5815c Compare May 26, 2026 05:15
@DeJeune DeJeune force-pushed the stack/ai-service/02-main-ai-backend branch 2 times, most recently from 1d0ae00 to 32462a7 Compare May 26, 2026 05:23
@DeJeune DeJeune force-pushed the stack/ai-service/01-provider-core branch from 2f5815c to d757342 Compare May 26, 2026 05:23
@DeJeune DeJeune force-pushed the stack/ai-service/02-main-ai-backend branch from 32462a7 to e52243f Compare May 26, 2026 05:35
@DeJeune DeJeune force-pushed the stack/ai-service/01-provider-core branch from d757342 to 98add12 Compare May 26, 2026 05:35
@MyPrototypeWhat
Copy link
Copy Markdown
Collaborator

MyPrototypeWhat commented May 26, 2026

Note

This comment was translated by Claude.

Ran a review with 5 specialized agents (code quality / silent failures / test coverage / type design / comments & docs) locally against PR HEAD e52243f0c. Below, findings are grouped by "severity + whether in PR scope"; all include file:line references.


🚨 Critical — Must Fix in This PR

1. Build Will Break: SpanCacheService File Missing

src/main/core/application/serviceRegistry.ts:13

import { SpanCacheService } from '@main/ai/observability/local/SpanCacheService'

Path doesn't exist anywhere in the tree — the old file src/main/services/SpanCacheService.ts was deleted, and no replacement was added. It's also referenced by multiple @DependsOn(['SpanCacheService']) calls (src/main/ai/observability/runtime/NodeTraceService.ts:35, src/main/ai/observability/adapters/claudeCode/ClaudeCodeTraceBridgeService.ts:22).

2. Orphan Directory src/main/services/agents/ Leaves 42 Files, Multiple Broken Imports

The PR deleted upper-level exports (SessionMessageOrchestrator, SessionStreamBus, sessionStreamIpc, etc.) but kept the underlying files, breaking:

  • src/main/services/agents/services/channels/ChannelMessageHandler.ts:9 imports deleted SessionMessageOrchestrator
  • src/main/services/agents/services/channels/index.ts re-exports deleted SessionStreamBus/sessionStreamIpc
  • src/main/services/agents/skills/SkillService.ts imports migrated symbols

The entire directory should be deleted (CLAUDE.md "Surgical Changes": "Remove imports / variables / functions that your changes orphaned").

3. src/main/services/NodeTraceService.ts (241 lines) Is an Orphan Copy

The new location src/main/ai/observability/runtime/NodeTraceService.ts was added, but the old file wasn't deleted; both have @Injectable('NodeTraceService').

4. DB Persistence Failures Silently Swallowed, UI Shows Success

src/main/ai/streamManager/listeners/PersistenceListener.ts:123-131

} catch (err) {
  logger.error('Failed to persist assistant message', {...})
  return
}

AiStreamManager receives no failure signal, WebContentsListener.onDone still sends {status:'success'}. The assistant placeholder row stays stuck in pending forever; after restart, the message vanishes. Fix: push a 'data-error' part in finalMessage.parts, or re-throw to let the manager mark the topic as error.

5. ClaudeCode Truncation Recovery Code Unreachable

src/main/ai/runtime/claudeCode/ClaudeCodeRuntimeDriver.ts:231-237 pushes {type:'error'} for all for await errors. ClaudeCodeStreamAdapter.handleTruncationError (src/main/ai/runtime/claudeCode/streamAdapter.ts:282) is implemented and unit-tested, but grep shows no runtime callers — it only appears in its own tests. Claude SDK's "Unexpected end of JSON input" truncation should recover gracefully; instead, it becomes a hard error and drops all received tokens. Fix: call this.adapter?.handleTruncationError(error) before pushing error; if it returns true, treat as normal turn-complete.

6. Aborted Streams May Be Reported as Error

The catch at src/main/ai/streamManager/AiStreamManager.ts:702-705 missed the abort check — after user abort, if cleanup throws, it goes to onExecutionError, and UI shows a red error instead of "paused". Mirror the in-loop logic from lines 758-777.

7. Reasoning Extraction Loses 2071 Lines of Tests

On base, src/renderer/src/aiCore/utils/__tests__/reasoning.test.ts is 2071 lines + reasoning.poe.test.ts 288 lines, covering DeepSeek/Qwen/OpenRouter/Gemini/Poe provider reasoning parsing. The new file src/main/ai/utils/reasoning.ts (1088 lines) is the only home for this logic. No tests in the new location — the biggest coverage regression in this PR.

8. Security-Related Code Has No Tests

  • src/main/ai/runtime/claudeCode/ToolApprovalRegistry.ts (112 lines) — 0 tests. Tool-approval scheduler, signal-attach/abort/duplicate-rejection all unverified. Security consequence: if the registry incorrectly "rejects a rejection," it allows tool calls the user already rejected.
  • src/main/ai/AiService.ts:175-258Ai_ToolApproval_Respond IPC handler has 5 branches (live registry / missing anchor / partial pending / DB-only write / dispatch); grep 'Ai_ToolApproval_Respond|respondToolApproval' in AiService.test.ts = 0.
  • src/main/ai/mcp/oauth/{callback,provider,storage}.ts — OAuth callback HTTP server + token persistence, no tests.

⚠️ Important — In PR Scope

Type Design

  • AiStreamRequest Should Be a Discriminated Union, Not Flat Optionals
    src/main/ai/types/requests.ts:46 marks chatId/trigger/runtime/messages/pendingMessages all optional, forcing src/main/ai/AiService.ts:273 to runtime throw new Error('Agent session stream requires...') — exactly the illegal combination the type should forbid. Suggestion: mirror the discriminated union from IPC sibling AiStreamOpenRequest, promote runtime to top-level discriminator. This is the highest-leverage type fix in this PR.

  • ChannelAdapter Flattens Strongly-Typed Config
    src/main/ai/channels/ChannelAdapter.ts:32 flattens zod-discriminated ChannelConfig into Record<string, unknown>; src/main/ai/channels/ChannelManager.ts:18 uses string instead of ChannelType as registry key — new channels can be silently missed from registration. Suggest ChannelAdapter<T extends ChannelType>.

Silent Failures

  • Ai_ToolApproval_Respond Doesn't Propagate Blocked Resultssrc/main/ai/AiService.ts:243-252 returns { ok: true } regardless of what dispatch returns. dispatchStreamRequest may return {mode:'blocked', reason:'agent-session-workspace'}, but renderer waits forever for stream chunks after seeing ok.
  • ChannelAdapterListener.onTextUpdate Completely Swallows Errorssrc/main/ai/streamManager/listeners/ChannelAdapterListener.ts:27: void this.adapter.onTextUpdate(...).catch(() => {}). Slack chat.update 429 / Feishu CardKit errors → streaming card freezes until onDone; external users see "bot is dead." At minimum add logger.warn.
  • approvalEmitter.dispose() Is Dead Codesrc/main/ai/runtime/claudeCode/settingsBuilder.ts:71-86 defines dispose to clear toolApprovalRegistry, but there are no callers in the tree. After turn ends, stale approvals stay in registry; when users later click Approve, they resolve a dead promise. Suggest calling in AgentSessionRuntimeService.markTurnTerminal path.
  • Unreadable file:// Parts Silently Droppedsrc/main/ai/messages/fileProcessor.ts:58 directly filters out unreadable attachments; LLM sees no attachment, user gets a seemingly confident answer that actually didn't see the PDF. Suggest replacing with 'data-error' part or text hint.

Lifecycle / Consistency

  • AgentSessionRuntimeService Missing @DependsOnsrc/main/ai/agentSession/AgentSessionRuntimeService.ts:598-599 accesses two WhenReady same-phase services via application.get('ClaudeCodeTraceBridgeService') and application.get('ClaudeCodeWarmQueryManager'), but the class decorator declares no dependencies. Without @DependsOn for same phase, container initializes in register order — latent race.
  • AiService.@DependsOn Includes 'AiStreamManager' — The comment at src/main/ai/AiService.ts:114-116 explicitly says "DO NOT mirror @DependsOn(['AiService']) on AiStreamManager", but AiService itself depends on AiStreamManager. Runtime, AiStreamManager also reverse-looks up via application.get('AiService') — this is an implicit cycle. Suggest either removing this dep or adding a comment explaining why one-way is OK.

Documentation Accuracy

  • All docs/references/ai/* Paths Wrong — Docs use kebab-case (agent-session/, runtime/ai-sdk/, stream-manager/, tools/builtin/), but actual directories are all camelCase (agentSession/, runtime/aiSdk/, streamManager/, tools/adapters/aiSdk/builtin/). Affects docs/references/ai/README.md:41-67, core-architecture.md:139-156, agent-loop.md:5,77,98-100, params-pipeline.md:4-5,99-101, tool-registry.md:17-68, stream-manager.md:198-202, etc. Future contributors grep docs but can't find code.
  • v2 Cluster Docs Have Fake IPCv2-refactor-temp/docs/ai/ai-service-cluster.md:34 claims Ai_EstimateTokens is an AiService handler, but it doesn't exist in the IpcChannel list; :28 says Ai_GenerateImage uses ipcOn/MessagePort, but actual src/main/ai/AiService.ts:149 is ipcHandle.
  • 4 Breaking-Change Notes All Mark introduced_in_pr: TBD, v2-refactor-temp/docs/breaking-changes/README.md marks as required.
  • 9/13 New docs/references/ai/* Not Indexed in docs/README.md — orphan docs.

💡 Suggestions — Can Fix in Follow-up PR

  • Ai_GenerateText has no abort registry (only Ai_GenerateImage does); tokens burn indefinitely when backend hangs. Mirror the image pattern. src/main/ai/AiService.ts:137-139
  • messageService.getById throws when anchor missing; src/main/ai/AiService.ts:216 should change to try/catch + return {ok:false}, not throw 500 to renderer.
  • Test mocks inconsistent: src/main/ai/__tests__/AiService.test.ts:8-22 and agentSession/__tests__/AgentSessionRuntimeService.test.ts:23 use ad-hoc vi.mock('@main/core/application'), while AiStreamManager.test.ts:118-129 correctly uses mockApplicationFactory. Standardize.
  • SkillInstaller backup rollback path has no tests (src/main/ai/skills/SkillInstaller.ts:48-55); failure may corrupt user skill directory.

⚙️ Out of PR Scope (Suggest Split to Cleanup PR / Follow-up)

The following "violations" already exist identically on the base branch stack/ai-service/01-provider-core; this PR only moved files. Per CLAUDE.md "Fix upstream" principle, this PR shouldn't be responsible, but separate follow-up has value:

  • ChannelManager IPC registration at src/main/ipc.ts:481-485 is global instead of this.ipcHandle() — base ipc.ts:527,532 already written this way
  • ChannelMessageHandler.ts:48-64 getInstance() singleton anti-pattern — same code on base
  • SkillService is non-lifecycle but holds ~10 long IPC handlers (src/main/ipc.ts:548-657) — same on base, just import path changed
  • DxtService.performVariableSubstitution:176-188 uses os.homedir() — word-for-word same on base
  • Slack / Discord / Feishu performFlush silently swallow flush errors — base already has same // Swallow flush errors — FlushController will reflush if needed comment
  • New services use db.transaction(...) instead of withWriteTx — entire repo currently only has 3 services using withWriteTx; asking this PR to fix is unfair

✅ Positive Notes

  • packages/shared/ai/transport/{stream,turnState}.ts discriminated union + Record<TopicStreamStatus, TurnStateFlags> exhaustiveness table + accompanying tests are the type-design gold standard of this PR — what main-process AI types should look like.
  • src/main/ai/streamManager/__tests__/AiStreamManager.test.ts (1009 lines) covering selective real-timer / ring-buffer overflow / multi-model fan-out / grace-period eviction is a model unit test.
  • Plenty of "why" comments blocking future rollbacks: src/main/ai/AiService.ts:206-219 (DB projection race), paired src/main/ai/streamManager/AiStreamManager.ts:174-179 + src/main/ai/AiService.ts:104-110 warning not to mirror @DependsOn, src/main/ai/runtime/aiSdk/observers/usage.ts:1-12 (why result.totalUsage is unreliable).
  • src/main/ai/streamManager/AiStreamManager.ts:710-778 (runExecutionLoop) precise abort vs error split / withIdleTimeout three-way termination cleanup / ToolApprovalRegistry all paths resolve — error discipline is solid in core places.
  • src/main/ai/skills/__tests__/SkillService.test.ts (500 lines) is the only new test in this PR that correctly uses setupTestDatabase().

Suggested Fix Order:

  1. Block merge until Critical SyntaxError: Unexpected token 'E', "Error: abo"... is not valid JSON #1缺一些常用功能 #3 (build) fixed
  2. Fix Critical 非常棒,简洁,配置方便! #4总是出现连接错误,网络一切顺通 #6 (data loss / error degradation)
  3. At minimum add Critical 功能建议 #7 core reasoning tests + 功能建议 #8 ToolApprovalRegistry tests
  4. Fix Important items in this PR
  5. Convert Suggestions + out-of-scope issues to follow-up issues

Original Content

本地用 5 个专项 agent (代码质量 / 静默失败 / 测试覆盖 / 类型设计 / 注释文档) 跑了一轮 review,核对到 PR HEAD e52243f0c。下文按"严重度 + 是否属本 PR scope"分组,所有发现都附 file:line


🚨 Critical — 必须在本 PR 修

1. Build 必崩:SpanCacheService 文件缺失

src/main/core/application/serviceRegistry.ts:13

import { SpanCacheService } from '@main/ai/observability/local/SpanCacheService'

路径在整棵树都找不到 — 老文件 src/main/services/SpanCacheService.ts 已删除,替代文件未加。同时被多处 @DependsOn(['SpanCacheService']) 引用 (src/main/ai/observability/runtime/NodeTraceService.ts:35, src/main/ai/observability/adapters/claudeCode/ClaudeCodeTraceBridgeService.ts:22)。

2. 孤儿目录 src/main/services/agents/ 留下 42 个文件,多处 broken import

PR 删了上层导出 (SessionMessageOrchestrator, SessionStreamBus, sessionStreamIpc 等) 但保留了底层文件,破坏:

  • src/main/services/agents/services/channels/ChannelMessageHandler.ts:9 import 已删除的 SessionMessageOrchestrator
  • src/main/services/agents/services/channels/index.ts re-export 已删除的 SessionStreamBus/sessionStreamIpc
  • src/main/services/agents/skills/SkillService.ts import 已迁移的符号

整个目录应删除 (CLAUDE.md "Surgical Changes":"Remove imports / variables / functions that your changes orphaned")。

3. src/main/services/NodeTraceService.ts (241 行) 是孤儿副本

新位置 src/main/ai/observability/runtime/NodeTraceService.ts 已添加,旧文件未删;两者都带 @Injectable('NodeTraceService')

4. DB 持久化失败被静默吞掉,UI 显示成功

src/main/ai/streamManager/listeners/PersistenceListener.ts:123-131

} catch (err) {
  logger.error('Failed to persist assistant message', {...})
  return
}

AiStreamManager 收不到失败信号,WebContentsListener.onDone 仍发 {status:'success'}。assistant 占位行卡在 pending 永不更新;用户重启后消息消失。修复:在 finalMessage.parts 里 push 'data-error' part,或重新 throw 让 manager 把 topic 标 error

5. ClaudeCode 截断恢复代码不可达

src/main/ai/runtime/claudeCode/ClaudeCodeRuntimeDriver.ts:231-237 把所有 for await 错误一律 push {type:'error'}ClaudeCodeStreamAdapter.handleTruncationError (src/main/ai/runtime/claudeCode/streamAdapter.ts:282) 已实现且单测覆盖,但 grep 显示无任何 runtime 调用方 — 只在自己的测试里出现。Claude SDK 的 "Unexpected end of JSON input" 截断本应优雅恢复,现在变成硬 error 把已收到的 token 全丢。修复:push error 之前调 this.adapter?.handleTruncationError(error),返回 true 则当 normal turn-complete。

6. Aborted stream 可能被报为 error

src/main/ai/streamManager/AiStreamManager.ts:702-705 的 catch 漏 fork 了 abort 检查 — 用户 abort 后如果 cleanup 抛异常,会走 onExecutionError,UI 显示红色错误而非"已暂停"。镜像 line 758-777 的 in-loop 逻辑即可。

7. Reasoning 提取丢失 2071 行测试

base 上 src/renderer/src/aiCore/utils/__tests__/reasoning.test.ts 是 2071 行 + reasoning.poe.test.ts 288 行,覆盖 DeepSeek/Qwen/OpenRouter/Gemini/Poe 各 provider 的 reasoning 解析。新文件 src/main/ai/utils/reasoning.ts (1088 行) 是这套逻辑唯一的家。新位置无任何测试 — 整个 PR 最大的覆盖率回退。

8. 安全相关代码无测试

  • src/main/ai/runtime/claudeCode/ToolApprovalRegistry.ts (112 行) — 0 测试。tool-approval 调度器,signal-attach/abort/duplicate-rejection 全部未验证。security 后果:如果 registry 错误地"拒绝了一次拒绝",会允许用户已拒绝的工具调用。
  • src/main/ai/AiService.ts:175-258Ai_ToolApproval_Respond IPC handler 5 个分支 (live registry / missing anchor / partial pending / DB-only write / dispatch),AiService.test.tsgrep 'Ai_ToolApproval_Respond\\|respondToolApproval' = 0。
  • src/main/ai/mcp/oauth/{callback,provider,storage}.ts — OAuth 回调 HTTP server + token 持久化,无测试。

⚠️ Important — 属本 PR scope

类型设计

  • AiStreamRequest 应是 discriminated union 而非 flat optionals
    src/main/ai/types/requests.ts:46chatId/trigger/runtime/messages/pendingMessages 全标 optional,导致 src/main/ai/AiService.ts:273 必须 runtime throw new Error('Agent session stream requires...') — 这正是类型该禁止的非法组合。建议把 IPC sibling AiStreamOpenRequest 的 discriminated union 镜像过来,把 runtime promote 到 top-level discriminator。这是 PR 中最高 leverage 的类型修复。

  • ChannelAdapter 把强类型 config 展平
    src/main/ai/channels/ChannelAdapter.ts:32 把 zod-discriminated ChannelConfig 展平为 Record<string, unknown>;src/main/ai/channels/ChannelManager.ts:18string 而非 ChannelType 做 registry key — 新增 channel 可能静默漏注册。建议 ChannelAdapter<T extends ChannelType>

静默失败

  • Ai_ToolApproval_Respond 不传播 blocked 结果src/main/ai/AiService.ts:243-252 不论 dispatch 返回什么都 return { ok: true }dispatchStreamRequest 可能返回 {mode:'blocked', reason:'agent-session-workspace'},renderer 看到 ok 后等永不到来的 stream chunks。
  • ChannelAdapterListener.onTextUpdate 完全吞错src/main/ai/streamManager/listeners/ChannelAdapterListener.ts:27: void this.adapter.onTextUpdate(...).catch(() => {})。Slack chat.update 429 / Feishu CardKit 错误 → 流式卡片冻结直到 onDone,外部用户看到的是"机器人挂了"。至少加 logger.warn
  • approvalEmitter.dispose() 是死代码src/main/ai/runtime/claudeCode/settingsBuilder.ts:71-86 定义了 dispose 来清空 toolApprovalRegistry,但全树无调用方。Turn 结束后旧 approval 留在 registry 里,用户后续点 Approve 解 resolve 一个死 promise。建议在 AgentSessionRuntimeService.markTurnTerminal 路径里调用。
  • 不可读的 file:// part 静默丢弃src/main/ai/messages/fileProcessor.ts:58 把读不到的附件直接 filter 掉,LLM 看不到附件,用户得到一个看似自信但其实没看到 PDF 的回答。建议替换为 'data-error' part 或文字提示。

Lifecycle / 一致性

  • AgentSessionRuntimeService@DependsOnsrc/main/ai/agentSession/AgentSessionRuntimeService.ts:598-599 通过 application.get('ClaudeCodeTraceBridgeService')application.get('ClaudeCodeWarmQueryManager') 访问两个 WhenReady 同 phase 服务,但 class decorator 未声明依赖。同 phase 无 @DependsOn 时容器按 register 顺序初始化 — latent race。
  • AiService.@DependsOn'AiStreamManager'src/main/ai/AiService.ts:114-116 的注释明确写 "DO NOT mirror @DependsOn(['AiService']) on AiStreamManager",但 AiService 自己反向依赖 AiStreamManager。runtime 上 AiStreamManager 又通过 application.get('AiService') 反查 — 这是个隐式循环。建议要么去掉这条 dep,要么补一个注释说明为什么单向 OK。

文档准确性

  • docs/references/ai/* 路径全错 — 文档用 kebab-case (agent-session/, runtime/ai-sdk/, stream-manager/, tools/builtin/),实际目录全是 camelCase (agentSession/, runtime/aiSdk/, streamManager/, tools/adapters/aiSdk/builtin/)。涉及 docs/references/ai/README.md:41-67, core-architecture.md:139-156, agent-loop.md:5,77,98-100, params-pipeline.md:4-5,99-101, tool-registry.md:17-68, stream-manager.md:198-202 等。后续贡献者按文档 grep 找不到代码。
  • v2 cluster docs 假 IPCv2-refactor-temp/docs/ai/ai-service-cluster.md:34 声称 Ai_EstimateTokens 是 AiService handler,实际 IpcChannel 列表里没有这个 channel;:28Ai_GenerateImageipcOn/MessagePort,实际 src/main/ai/AiService.ts:149ipcHandle
  • 4 份 breaking-change notes 都是 introduced_in_pr: TBD,v2-refactor-temp/docs/breaking-changes/README.md 标记为必填。
  • 9/13 个新 docs/references/ai/* 未在 docs/README.md 索引中 — 孤儿文档。

💡 Suggestion — 可在跟进 PR 修

  • Ai_GenerateText 无 abort registry (只 Ai_GenerateImage 有);后端挂时 token 持续燃烧。镜像 image 的模式即可。src/main/ai/AiService.ts:137-139
  • messageService.getById 在 anchor 缺失时 throw,src/main/ai/AiService.ts:216 应改成 try/catch + return {ok:false},而非把 500 抛给 renderer。
  • 测试 mock 不一致:src/main/ai/__tests__/AiService.test.ts:8-22agentSession/__tests__/AgentSessionRuntimeService.test.ts:23 用 ad-hoc vi.mock('@main/core/application'),而 AiStreamManager.test.ts:118-129 正确使用了 mockApplicationFactory。统一一下。
  • SkillInstaller 备份回滚路径无测试 (src/main/ai/skills/SkillInstaller.ts:48-55),失败时可能损坏用户技能目录。

⚙️ 非 PR scope (建议拆到清理 PR / 后续 stack)

以下"违规"在 base 分支 stack/ai-service/01-provider-core已经是同样写法,本 PR 只是 file move。按 CLAUDE.md "Fix upstream" 原则,不应该让这个 PR 兜底,但单独跟进有价值:

  • ChannelManager IPC 注册在 src/main/ipc.ts:481-485 全局而非 this.ipcHandle() — base ipc.ts:527,532 已这么写
  • ChannelMessageHandler.ts:48-64getInstance() 单例反模式 — base 上同样代码
  • SkillService 是非 lifecycle 但持有 ~10 个长 IPC handler (src/main/ipc.ts:548-657) — base 上同样写法,只是 import 路径换了
  • DxtService.performVariableSubstitution:176-188os.homedir() — base 上一字不改的代码
  • Slack / Discord / Feishu performFlush 静默吞 flush errors — base 上同样的 // Swallow flush errors — FlushController will reflush if needed 注释已存在
  • 新 services 用 db.transaction(...) 而非 withWriteTx — 整个仓库目前只有 3 个服务用 withWriteTx,要求这个 PR 兜底不公平

✅ 值得肯定

  • packages/shared/ai/transport/{stream,turnState}.ts 的 discriminated union + Record<TopicStreamStatus, TurnStateFlags> 穷尽性表 + 配套测试是这份 PR 类型设计的标杆 — 应该是 main-process AI 类型该有的样子。
  • src/main/ai/streamManager/__tests__/AiStreamManager.test.ts (1009 行) 选择性 real-timer / ring-buffer overflow / multi-model fan-out / grace-period eviction 都覆盖,是模范单测。
  • 大量 "为什么" 注释拦住未来回退:src/main/ai/AiService.ts:206-219 (DB projection race),paired src/main/ai/streamManager/AiStreamManager.ts:174-179 + src/main/ai/AiService.ts:104-110 警告不要 mirror @DependsOn,src/main/ai/runtime/aiSdk/observers/usage.ts:1-12 (result.totalUsage 不可靠原因)。
  • src/main/ai/streamManager/AiStreamManager.ts:710-778 (runExecutionLoop) abort vs error 的精确分流 / withIdleTimeout 三路终止 cleanup / ToolApprovalRegistry 所有路径都 resolve 不会 hang — 错误纪律在核心位置都很对。
  • src/main/ai/skills/__tests__/SkillService.test.ts (500 行) 是这份 PR 里唯一正确使用 setupTestDatabase() 的新测试。

建议处理顺序:

  1. Block merge 直到 Critical SyntaxError: Unexpected token 'E', "Error: abo"... is not valid JSON #1缺一些常用功能 #3 (build) 修复
  2. 修 Critical 非常棒,简洁,配置方便! #4总是出现连接错误,网络一切顺通 #6 (数据丢失 / 错误降级)
  3. 至少补 Critical 功能建议 #7 的核心 reasoning 测试 + 功能建议 #8 的 ToolApprovalRegistry 测试
  4. Important 区在本 PR 修
  5. Suggestion + scope-外问题转 follow-up issue

@DeJeune DeJeune force-pushed the stack/ai-service/02-main-ai-backend branch 3 times, most recently from 3edb986 to 651b0fb Compare May 26, 2026 06:41
@DeJeune DeJeune force-pushed the stack/ai-service/01-provider-core branch from 98add12 to 31a0c7e Compare May 26, 2026 08:53
@DeJeune DeJeune force-pushed the stack/ai-service/02-main-ai-backend branch 2 times, most recently from 93f20ab to 9eb9236 Compare May 26, 2026 09:02
Signed-off-by: suyao <sy20010504@gmail.com>
@DeJeune DeJeune force-pushed the stack/ai-service/02-main-ai-backend branch from 9eb9236 to 8ba1704 Compare May 26, 2026 09:24
DeJeune added 3 commits May 26, 2026 20:02
Signed-off-by: suyao <sy20010504@gmail.com>
Signed-off-by: suyao <sy20010504@gmail.com>
Signed-off-by: suyao <sy20010504@gmail.com>
@MyPrototypeWhat
Copy link
Copy Markdown
Collaborator

MyPrototypeWhat commented May 27, 2026

Note

This comment was translated by Claude.

Follow-up: Status Check After 4 Commits

Updated HEAD to 0a47d1dfb. Verified item-by-item the code status for all Critical/Important items from the previous review, recorded below.


✅ Fixed

Item Resolution
Critical #1 SpanCacheService path doesn't exist Added at src/main/ai/observability/cache/SpanCacheService.ts, registry import updated synchronously
Critical #2 42 orphan files in src/main/services/agents/ All deleted (git ls-tree hit count = 0)
Critical #3 Old src/main/services/NodeTraceService.ts duplicate Deleted, only the new location version remains
Important All docs paths wrong + breaking-change introduced_in_pr: TBD Via 0a47d1dfb, moved all 9 docs/references/ai/*.md + 14 v2-refactor-temp/docs/ai/*-cluster.md out of this stack. ✅ Reasonable defer

❌ Unhandled Review Items

Critical (Data Correctness / Test Coverage)

  • 非常棒,简洁,配置方便! #4 PersistenceListener silently swallows DB persistence failuressrc/main/ai/streamManager/listeners/PersistenceListener.ts:123-131 code unchanged, catch still only logger.error + return. AiStreamManager receives no failure signal, WebContentsListener.onDone still sends {status:'success'}.
  • 一些建议 #5 ClaudeCode handleTruncationError dead codestreamAdapter.ts:282 definition + tests, but git grep handleTruncationError across entire tree still only hits definition and its own tests, no runtime callers. ClaudeCodeRuntimeDriver.ts:231-234 catch still directly eventQueue.push({type:'error'}).
  • 总是出现连接错误,网络一切顺通 #6 Aborted stream pathAiStreamManager.ts:702-705 outer launchLoop().catch(...) still lacks signal.aborted fork (comment says "Defensive funnel for sync throws"). The fork inside runExecutionLoop for abort is correct; only the path where streamText synchronously rejects lands here — low trigger probability, but semantically incorrect case still exists.
  • 功能建议 #7 reasoning testssrc/main/ai/utils/__tests__/reasoning.test.ts still doesn't exist (git ls-tree | grep reasoning only hits source file). 2071+288 line coverage from base not migrated.
  • 功能建议 #8 ToolApprovalRegistry / Ai_ToolApproval_Respond testsgrep -ci 'toolapproval\\|approvalrespond\\|Ai_ToolApproval' = 0 in src/main/ai/__tests__/AiService.test.ts; src/main/ai/runtime/claudeCode/__tests__/ToolApprovalRegistry.test.ts still doesn't exist.

Important

  • AiStreamRequest still flat optionalssrc/main/ai/types/requests.ts:43-52 all fields optional; runtime guard src/main/ai/AiService.ts:283 "Agent session stream X requires an agent-session runtime request" throw still exists, this is the highest leverage type fix opportunity.
  • ChannelAdapter config still flattenedsrc/main/ai/channels/ChannelAdapter.ts:33 channelConfig: Record<string, unknown>; src/main/ai/channels/ChannelManager.ts:18 factory registry still keyed by string not ChannelType.
  • Ai_ToolApproval_Respond still doesn't return blockedsrc/main/ai/AiService.ts:247-252 ignores dispatchStreamRequest return value and directly return { ok: true }. dispatch.ts:69-72 confirms dispatch returns {mode:'blocked', reason:'agent-session-workspace'} when isAgentSessionWorkspaceError, this path can still trigger silent hang.
  • ChannelAdapterListener.onTextUpdate still .catch(() => {})ChannelAdapterListener.ts:26 unchanged, Slack chat.update 429 / Feishu CardKit errors still have no logging.
  • approvalEmitter.dispose still dead codesrc/main/ai/runtime/claudeCode/settingsBuilder.ts:77 defined, :188 comment says "drops any approval still pending for this session when the stream exits abnormally", but git grep dispose across entire codebase has 0 callers. Pending approvals still leak until session close.
  • AgentSessionRuntimeService still lacks @DependsOn:133 class decorator only has @Injectable + @ServicePhase, but :601-602 accesses same-phase services via application.get('ClaudeCodeTraceBridgeService') and application.get('ClaudeCodeWarmQueryManager'). Same-phase without @DependsOn means container initializes by register order — latent race.
  • AiService.@DependsOn(['AiStreamManager']) contradicts comment:120 still contains 'AiStreamManager', while :114-116 comment "DO NOT mirror @DependsOn(['AiService']) on AiStreamManager" doesn't explain why reverse unidirectional is OK.
  • messages/fileProcessor.ts + messageConverter.ts:36 still silently drops attachmentsmessageConverter.ts:36 changed to else logger.warn('Dropped unresolved file part', ...) added a warning log (thank you for this improvement), but still doesn't replace with 'data-error' part, LLM has no way to know attachment existed.

Suggestion (Original List)

Ai_GenerateText still has no abort registry / messageService.getById still throws / test mocks inconsistent / SkillInstaller backup rollback untested — all unchanged.


🆕 Author Proactively Fixed (Real Bugs Review Didn't Find)

9a21256b5 stream-manager replay tool input

Fixed an attach race: buildCompactReplay previously directly fell through and dropped tool-input-start, causing when attach happens before tool-input-available, the tool-input-delta received by renderer can't find toolCall part, reducer deadlocks.

-      case 'tool-input-start':
-      case 'tool-input-delta':
+      case 'tool-input-start': {
+        // Preserve the part announcement — without it the renderer's chat
+        // reducer cannot apply subsequent live tool-input-delta chunks
         flushPending()
+        compact.push(chunk)
+        break

Added delta merging (consecutive same toolCallId merge), lastCompletedAt field, cleanup of isFulfilledCandidate dead field,配套 +70 lines AiStreamManager.test.ts and +112 lines buildCompactReplay.test.ts. ✅ Real bug, real fix, with tests.

9e6a3b075 topic-naming push events

Changed SWR cache-version polling to IPC push: added Topic_AutoRenamed / AgentSession_AutoRenamed two new IPC channels, WindowManager.broadcast actively pushes. Deleted useless cache fields topic.cache_version / agent_session.cache_version. ✅ Architecturally more correct.


Summary

Dimension Progress
Build/Compile (Critical #1-#3) ✅ All fixed
Data Correctness (Critical #4-#6) ❌ 0/3
Test Coverage (Critical #7-#8) ❌ 0/2
Important 8 items ❌ 7/8 untouched (only docs resolved via defer)
Type Design ❌ Untouched
Author's newly discovered attach race + push refactor ✅ Proactively fixed

Suggestion: Build should pass now (locally verify pnpm build:check). But #4#8 are data correctness + safety code without tests, unrelated to build — want to confirm if author plans to handle these in this PR or defer to follow-up? If the latter, opening a separate issue to track #4#6 + #8's ToolApprovalRegistry tests would be appropriate.


Original Content

Follow-up: 4 个 commit 后状态核对

更新 HEAD 0a47d1dfb。逐项 verify 了之前 review 中所有 Critical/Important 的代码状态,记录如下。


✅ 已修复

解决方式
Critical #1 SpanCacheService 路径不存在 加在 src/main/ai/observability/cache/SpanCacheService.ts,registry import 同步更新
Critical #2 src/main/services/agents/ 42 孤儿文件 全删 (git ls-tree 命中数 = 0)
Critical #3src/main/services/NodeTraceService.ts 重复 已删,只剩新位置的版本
Important docs 路径全错 + breaking-change introduced_in_pr: TBD 通过 0a47d1dfb 把 9 份 docs/references/ai/*.md + 14 份 v2-refactor-temp/docs/ai/*-cluster.md 全部移出本 stack。✅ 合理 defer

❌ 仍未处理的 review 项

Critical(数据正确性 / 测试覆盖)

  • 非常棒,简洁,配置方便! #4 PersistenceListener 静默吞 DB 持久化失败src/main/ai/streamManager/listeners/PersistenceListener.ts:123-131 代码原样,catch 里仍然只 logger.error + returnAiStreamManager 收不到失败信号,WebContentsListener.onDone 仍发 {status:'success'}
  • 一些建议 #5 ClaudeCode handleTruncationError 死代码streamAdapter.ts:282 定义 + 测试,但 git grep handleTruncationError 全树仍只命中定义和它自己的测试,无任何 runtime 调用方。ClaudeCodeRuntimeDriver.ts:231-234 的 catch 仍直接 eventQueue.push({type:'error'})
  • 总是出现连接错误,网络一切顺通 #6 Aborted stream 路径AiStreamManager.ts:702-705 外层 launchLoop().catch(...) 仍无 signal.aborted fork(注释自己写的"Defensive funnel for sync throws")。runExecutionLoop 内部对 abort 的 fork 是对的;只有 streamText 同步 reject 的路径会落到这里 — 触发概率不高,但语义不对的 case 仍存在。
  • 功能建议 #7 reasoning 测试src/main/ai/utils/__tests__/reasoning.test.ts 仍不存在(git ls-tree | grep reasoning 只命中源文件)。base 上的 2071+288 行覆盖未迁移。
  • 功能建议 #8 ToolApprovalRegistry / Ai_ToolApproval_Respond 测试src/main/ai/__tests__/AiService.test.tsgrep -ci 'toolapproval\\|approvalrespond\\|Ai_ToolApproval' = 0;src/main/ai/runtime/claudeCode/__tests__/ToolApprovalRegistry.test.ts 仍不存在。

Important

  • AiStreamRequest 仍是 flat optionalssrc/main/ai/types/requests.ts:43-52 字段全 optional;runtime guard src/main/ai/AiService.ts:283 "Agent session stream X requires an agent-session runtime request" 的 throw 仍存在,这是最高 leverage 的类型修复机会。
  • ChannelAdapter 配置仍展平src/main/ai/channels/ChannelAdapter.ts:33 channelConfig: Record<string, unknown>;src/main/ai/channels/ChannelManager.ts:18 factory registry 仍 keyed by string 而非 ChannelType
  • Ai_ToolApproval_Respond 仍不返 blockedsrc/main/ai/AiService.ts:247-252 无视 dispatchStreamRequest 返回值直接 return { ok: true }dispatch.ts:69-72 确认 dispatch 在 isAgentSessionWorkspaceError 时会返 {mode:'blocked', reason:'agent-session-workspace'},该路径仍可触发 silent hang。
  • ChannelAdapterListener.onTextUpdate.catch(() => {})ChannelAdapterListener.ts:26 原样,Slack chat.update 429 / Feishu CardKit 错误仍无任何日志。
  • approvalEmitter.dispose 仍是死代码src/main/ai/runtime/claudeCode/settingsBuilder.ts:77 定义,:188 注释自己说 "drops any approval still pending for this session when the stream exits abnormally",但 git grep dispose 全 codebase 0 调用方。Pending approvals 仍会泄漏到 session 关闭。
  • AgentSessionRuntimeService 仍缺 @DependsOn:133 类装饰器只有 @Injectable + @ServicePhase,但 :601-602 通过 application.get('ClaudeCodeTraceBridgeService')application.get('ClaudeCodeWarmQueryManager') 访问同 phase 服务。同 phase 无 @DependsOn 时容器按 register 顺序初始化 — latent race。
  • AiService.@DependsOn(['AiStreamManager']) 与注释矛盾:120 仍含 'AiStreamManager',而 :114-116 的注释 "DO NOT mirror @DependsOn(['AiService']) on AiStreamManager" 没有解释为什么单向反过来 OK。
  • messages/fileProcessor.ts + messageConverter.ts:36 仍 silent 丢附件messageConverter.ts:36 改成 else logger.warn('Dropped unresolved file part', ...) 加了一行警告日志(感谢这个改进),但仍不替换为 'data-error' part,LLM 无从知晓附件存在。

Suggestion(原列表)

Ai_GenerateText 仍无 abort registry / messageService.getById 仍 throw / 测试 mock 不一致 / SkillInstaller 备份回滚未测 — 全部未改。


🆕 作者主动修复的(review 没找到的真 bug)

9a21256b5 stream-manager replay tool input

修了一个 attach race:buildCompactReplay 之前直接 fallthrough 丢弃 tool-input-start,导致 attach 在 tool-input-available 之前发生时,renderer 收到的 tool-input-delta 找不到 toolCall part,reducer 卡死。

-      case 'tool-input-start':
-      case 'tool-input-delta':
+      case 'tool-input-start': {
+        // Preserve the part announcement — without it the renderer's chat
+        // reducer cannot apply subsequent live tool-input-delta chunks
         flushPending()
+        compact.push(chunk)
+        break

加了 delta 合并(连续相同 toolCallId 合并)、lastCompletedAt 字段、清理 isFulfilledCandidate 死字段、配套 +70AiStreamManager.test.ts+112buildCompactReplay.test.ts。✅ 真 bug,真修,带测试。

9e6a3b075 topic-naming push events

把 SWR cache-version polling 改成 IPC push:加 Topic_AutoRenamed / AgentSession_AutoRenamed 两个新 IPC channel,WindowManager.broadcast 主动推。删掉 topic.cache_version / agent_session.cache_version 两个无用的 cache 字段。✅ 架构上更对。


总结

维度 进展
Build/编译 (Critical #1-#3) ✅ 全修
数据正确性 (Critical #4-#6) ❌ 0/3
测试覆盖 (Critical #7-#8) ❌ 0/2
Important 8 项 ❌ 7/8 未动(只 docs 通过 defer 解决)
类型设计 ❌ 未动
作者新发现的 attach race + push 重构 ✅ 主动修了

建议:Build 应该现在能过了(本地 verify 一下 pnpm build:check)。但 #4#8 是数据正确性 + 安全代码无测试,与 build 无关 — 想确认下作者计划在本 PR 处理这些,还是要推到 follow-up?如果是后者,把 #4#6 + #8ToolApprovalRegistry 测试单独开 issue 跟踪比较合适。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants