
fix(memory): bound mate_memory_recall.filename to VARCHAR(256) (#461) #463

Merged
mateaix merged 1 commit into mateaix:dev from ncw1992120:fix/memory-recall-filename-overflow on Jul 1, 2026

@ncw1992120

Contributor

Fixes #461

Root cause

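`mate_memory_recall.filename` is `VARCHAR(256)`, but snippet-level recall tracking builds the key as `file path + '#' + H2 heading slug` (MemoryRecallTracker.java:121). The prompt in summarize-system.txt places no length limit on `##` headings, so the LLM occasionally writes a whole paragraph of event details as a heading. After `sanitizeSectionKey()` (which keeps CJK characters as-is and does not truncate), the slug plus the file-path prefix exceeds 256 characters, and the write fails with Data too long / string too long. `MemoryRecallService.recordRecall` does not truncate before writing either.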
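To make the overflow concrete, here is a minimal Java sketch of the key assembly. The daily-note path `memory/2026-07-01.md` and the slug rules in the hypothetical `uncappedSlug` (lowercase, keep Unicode letters and digits including CJK, collapse everything else to `-`) are assumptions for illustration; the real `sanitizeSectionKey` may differ in detail. The only point it makes is that an uncapped, CJK-preserving slug grows linearly with the heading, so a paragraph-length heading pushes the key past 256.

```java
// Hypothetical sketch: how a `path + '#' + slug` key can outgrow VARCHAR(256).
// The path and the slug rules are assumptions, not the project's code.
final class RecallKeyOverflowDemo {
    static final int COLUMN_LIMIT = 256; // mate_memory_recall.filename is VARCHAR(256)

    // Hypothetical, deliberately uncapped slug (mirrors the pre-fix behaviour):
    // keep letters/digits (Character.isLetterOrDigit is Unicode-aware, so CJK
    // survives), lowercase them, and collapse any other run of characters to '-'.
    static String uncappedSlug(String heading) {
        StringBuilder sb = new StringBuilder();
        boolean lastDash = false;
        for (int i = 0; i < heading.length(); ) {
            int cp = heading.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isLetterOrDigit(cp)) {
                sb.appendCodePoint(Character.toLowerCase(cp));
                lastDash = false;
            } else if (!lastDash && sb.length() > 0) {
                sb.append('-');
                lastDash = true;
            }
        }
        // Drop a trailing dash produced by trailing punctuation.
        if (sb.length() > 0 && sb.charAt(sb.length() - 1) == '-') {
            sb.setLength(sb.length() - 1);
        }
        return sb.toString();
    }

    static String recallKey(String path, String heading) {
        return path + "#" + uncappedSlug(heading);
    }

    public static void main(String[] args) {
        String path = "memory/2026-07-01.md"; // hypothetical daily-note path
        // An LLM-written "heading" that is really a paragraph of event details.
        String heading = "用户在会议中讨论了新版本的发布计划、回滚策略以及监控告警阈值的调整细节".repeat(8);
        String key = recallKey(path, heading);
        System.out.println(key.length() + " chars, fits column: " + (key.length() <= COLUMN_LIMIT));
    }
}
```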

See #461 for the detailed analysis.

Changes (three layers of protection: root-cause fix + hard fallbacks)

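| Layer | File | Change |
| --- | --- | --- |
| Root cause · source | prompts/memory/summarize-system.txt | Require `##` headings to stay short (≤30 characters); details go in the body, not the heading |
| Hard fallback · slug | MemoryRecallTracker.java | Truncate the return value of `sanitizeSectionKey` to `MAX_SECTION_SLUG=200`, leaving ample headroom for the path (~25) + `#` (1) |
| Last line of defence · DB write | MemoryRecallService.java | Truncate at the `recordRecall` entry point to `MAX_FILENAME_LENGTH=255`, covering every call path |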
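Layers 2 and 3 can be sketched as two independent char-based caps. The constant names `MAX_SECTION_SLUG` and `MAX_FILENAME_LENGTH` and the name `truncateFilename` come from the PR; the class `RecallKeyCaps`, the helper `capSlug`, and the `substring`-based bodies are assumptions, not the merged implementation.

```java
// Hypothetical sketch of layers 2 and 3; method bodies are assumptions.
final class RecallKeyCaps {
    static final int MAX_SECTION_SLUG = 200;    // layer 2: cap on the heading slug
    static final int MAX_FILENAME_LENGTH = 255; // layer 3: cap at the recordRecall entry point

    // Layer 2: applied to whatever the slug function produced (char-based cut).
    static String capSlug(String slug) {
        return slug.length() <= MAX_SECTION_SLUG ? slug : slug.substring(0, MAX_SECTION_SLUG);
    }

    // Layer 3: applied to every filename before it reaches the database,
    // including filenames that never went through the slug function.
    static String truncateFilename(String filename) {
        return filename.length() <= MAX_FILENAME_LENGTH
                ? filename
                : filename.substring(0, MAX_FILENAME_LENGTH);
    }

    public static void main(String[] args) {
        String path = "memory/2026-07-01.md";        // hypothetical 20-char daily-note path
        String longSlug = "细节".repeat(150);           // 300 chars, over both caps
        String key = path + "#" + capSlug(longSlug);   // 20 + 1 + 200 = 221 chars
        System.out.println(truncateFilename(key).length()); // 221: layer 3 leaves it unchanged

        // A filename passed straight through from a tool call skips capSlug
        // (the trackActiveRetrieval case), so only layer 3 bounds it.
        String raw = path + "#" + longSlug;            // 321 chars
        System.out.println(truncateFilename(raw).length()); // 255
    }
}
```

Keeping both caps bounds the common daily-note path close to the source, while any caller that bypasses the slug function is still bounded at the write.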

Why both layers 2 and 3 are needed

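  • Layer 2 only covers the daily-note section keys that the tracker itself generates;
  • `trackActiveRetrieval` (MemoryRecallTracker.java:148) passes the `filename` from the tool call straight through to `recordRecall` without going through `sanitizeSectionKey`, so it has to rely on layer 3 as the common fallback;
  • `recordRecall` is the only entry point that writes to `mate_memory_recall`. Truncating at that entry point guarantees that the select / insert / update branches inside the method all use the same value, which avoids the "old record not found → insert again → truncated again" mismatch, and the concurrency fallback (the `DuplicateKeyException` path) also matches correctly.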
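The single-entry-point argument can be illustrated with a small in-memory model. A `ConcurrentHashMap` stands in for `mate_memory_recall` with `filename` as a unique key, and the nested `DuplicateKey` stands in for Spring's `DuplicateKeyException`; `RecallStoreSketch` and its methods are hypothetical. `recordRecall` normalizes once after the null/blank guard and reuses that value for select, insert, update, and the duplicate-key fallback, while `recordRecallMismatched` shows the failure mode of truncating only on insert.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model: the map plays the table, filename is the unique key,
// the value is the recall count.
final class RecallStoreSketch {
    static final int MAX_FILENAME_LENGTH = 255;
    final Map<String, Integer> table = new ConcurrentHashMap<>();

    static final class DuplicateKey extends RuntimeException {}

    static String truncateFilename(String f) {
        return f.length() <= MAX_FILENAME_LENGTH ? f : f.substring(0, MAX_FILENAME_LENGTH);
    }

    private void insert(String key) {
        if (table.putIfAbsent(key, 1) != null) throw new DuplicateKey(); // unique index
    }

    void recordRecall(String filename) {
        if (filename == null || filename.isBlank()) return; // guard runs before truncation
        String key = truncateFilename(filename);             // one value for every branch
        if (table.get(key) != null) {                       // "select"
            table.merge(key, 1, Integer::sum);              // "update"
            return;
        }
        try {
            insert(key);                                    // "insert"
        } catch (DuplicateKey e) {
            table.merge(key, 1, Integer::sum);              // a concurrent insert won: update instead
        }
    }

    // Anti-pattern for contrast: select uses the raw name, insert truncates.
    void recordRecallMismatched(String filename) {
        if (filename == null || filename.isBlank()) return;
        if (table.containsKey(filename)) {                  // never matches once truncated
            table.merge(filename, 1, Integer::sum);
            return;
        }
        insert(truncateFilename(filename));                 // second call hits the unique key
    }

    public static void main(String[] args) {
        String longName = "memory/2026-07-01.md#" + "x".repeat(300);

        RecallStoreSketch ok = new RecallStoreSketch();
        ok.recordRecall(longName);
        ok.recordRecall(longName);
        ok.recordRecall(null);
        System.out.println(ok.table.get(truncateFilename(longName))); // 2

        RecallStoreSketch bad = new RecallStoreSketch();
        bad.recordRecallMismatched(longName);
        try {
            bad.recordRecallMismatched(longName);
        } catch (DuplicateKey e) {
            System.out.println("mismatch: second recall collided with the truncated row");
        }
    }
}
```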

No schema change

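`VARCHAR(256)` stays as it is. Every other reader in the system handles the `#` anchor by cutting it off and discarding it (`indexOf('#')` in `MemoryRecallService.computeFreshness`, `split("#",2)` in `FactController`), so truncating the slug does not affect date parsing or recall/freshness computation.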
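The compatibility claim rests on both readers cutting at the first `#`. Here is a hypothetical sketch of the two patterns, assuming the date appears as `yyyy-MM-dd` in the path part; the class `AnchorReaders`, the regex, and the `dateOf` helper are illustrative, not taken from the codebase.

```java
import java.time.LocalDate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the two anchor-discarding reader patterns.
final class AnchorReaders {
    private static final Pattern DATE = Pattern.compile("(\\d{4}-\\d{2}-\\d{2})");

    // computeFreshness-style: drop everything from the first '#'.
    static String stripAnchorIndexOf(String filename) {
        int hash = filename.indexOf('#');
        return hash < 0 ? filename : filename.substring(0, hash);
    }

    // FactController-style: split("#", 2) and keep the file part.
    static String stripAnchorSplit(String filename) {
        return filename.split("#", 2)[0];
    }

    // Assumed naming convention: the first yyyy-MM-dd in the path is the note date.
    static LocalDate dateOf(String filePart) {
        Matcher m = DATE.matcher(filePart);
        return m.find() ? LocalDate.parse(m.group(1)) : null;
    }

    public static void main(String[] args) {
        String full = "memory/2026-07-01.md#" + "细节".repeat(150);
        String truncated = full.substring(0, 255); // what the write-side cap would store
        System.out.println(dateOf(stripAnchorIndexOf(truncated)));                // 2026-07-01
        System.out.println(stripAnchorSplit(truncated).equals(stripAnchorSplit(full))); // true
    }
}
```

This holds only while the path part itself stays well under 255 characters, which the roughly 25-character daily-note paths satisfy.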

Verification

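  • Added MemoryRecallFilenameTruncationTest (plain JUnit 5, no Spring context, following the style of AlwaysOnFileBudgetTest): all 7 cases pass
    • sanitizeSectionKey: an over-long Chinese heading yields a slug of ≤200 without throwing; a normal Chinese heading is not truncated by mistake (CJK included); ASCII headings are folded correctly
    • truncateFilename: input over 255 is truncated to 255; input of ≤255 is returned unchanged; the date prefix survives truncation (computeFreshness can still parse it)
    • End-to-end: after both layers of truncation, the stored value of a daily-note section key is ≤256
  • Regression on the existing memory package (AlwaysOnFileBudgetTest, MemorySummarizationGateTest, MemoryHilServiceTest, DreamFlagGuardTest, StructuredMemoryPrefetchTest, etc.): all 35 pass, 0 failures, 0 errors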

…ix#461)

mate_memory_recall.filename is VARCHAR(256), but the snippet-level recall
tracker assembles the key as `path + '#' + H2-heading-slug`. When the LLM
writes an over-long daily-note heading (the summarize prompt placed no
length cap on the `##` title), the CJK-preserving slug pushes the filename
past the column, and writes fail with Data too long / string too long.

Three layers of defence, root cause + hard caps:

1. prompt (source) — summarize-system.txt now asks for short (≤30 chars)
   `##` titles; details go in the body, not the heading.
2. slug cap (close to source) — MemoryRecallTracker.sanitizeSectionKey
   caps the slug at MAX_SECTION_SLUG=200, leaving path+'#' well under 256.
3. write-side cap (catches every path) — MemoryRecallService.recordRecall
   truncates filename to MAX_FILENAME_LENGTH=255 at the entry point, so
   the select/insert/update branches share one value and the dup-key
   concurrency fallback still matches. Covers trackActiveRetrieval too,
   which bypasses sanitizeSectionKey.

Tests: MemoryRecallFilenameTruncationTest covers both caps (over-long CJK
heading, normal heading untouched, ascii slug, date prefix survives) plus
an end-to-end assertion that the stored value fits VARCHAR(256). Existing
memory-suite unit tests still green.

mateaix merged commit 3ac7362 into mateaix:dev on Jul 1, 2026
@mateaix

mateaix commented Jul 1, 2026

Owner

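Merged, thanks 🙏 A textbook-clean fix: the root cause is pinpointed precisely, and the three layers of protection (prompt constraint → slug capped at 200 → cap of 255 at the write entry point) combine a root-cause fix with hard fallbacks. Both the approach and the trade-offs are spot on.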

Verified on review:

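  • Null safety: `recordRecall` returns early when `filename == null` / `isBlank`, and the truncation comes after that check, so there is no NPE ✅
  • CJK safety: truncation is char-based, and common CJK characters are in the BMP, so the cut does not split a surrogate pair; normal Chinese headings are not truncated by mistake ✅
  • Consistency: truncation sits at `recordRecall`, the single write entry point, so the select/insert/update branches and the concurrent `DuplicateKeyException` path all use the same value and cannot mismatch ✅
  • Compatibility: no schema change; the readers (`indexOf('#')` in `computeFreshness`, `split` in `FactController`) all cut off and discard the anchor, so date parsing and freshness are unaffected ✅
  • Tests: the new MemoryRecallFilenameTruncationTest with 7 cases plus the memory package regression; locally `mvn compile` reports BUILD SUCCESS and everything passes ✅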
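One refinement on the CJK point: `String.length()` counts UTF-16 code units, so characters outside the BMP (for example CJK Extension B such as U+20000, or emoji) occupy two `char`s, and a plain `substring` can leave a lone high surrogate at the cut, which is not valid text on its own. The following is a hypothetical hardening, not part of the merged PR; `SurrogateSafeTruncate` is an illustrative name.

```java
// Hypothetical hardening, not the merged code: a char-based cap that never
// ends on the first half of a surrogate pair.
final class SurrogateSafeTruncate {
    static String truncate(String s, int maxChars) {
        if (s.length() <= maxChars) return s;
        int end = maxChars;
        // If the last kept char is a high surrogate, its partner would be cut off: drop it too.
        if (end > 0 && Character.isHighSurrogate(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        String extB = "\uD840\uDC00"; // U+20000, CJK Extension B: two UTF-16 chars
        String s = "a".repeat(254) + extB + "tail";
        System.out.println(s.substring(0, 255).length() + " vs " + truncate(s, 255).length()); // 255 vs 254
    }
}
```

Because a string of at most 255 UTF-16 units never contains more than 255 code points, this cap also stays within a column that counts characters (as MySQL's `VARCHAR(256)` does, assuming MySQL is the backing database).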

Merged as-is with no changes needed; no follow-up this time. Thanks again for this high-quality fix 👏


Development

Successfully merging this pull request may close these issues.

memory: mate_memory_recall.filename exceeds VARCHAR(256) because the entire H2 heading slug is concatenated into it (writes fail with Data too long / string too long)
