hybrid semantic search: embeddings + KNN + RRF fusion (#1) #33

Open
arreyder wants to merge 7 commits into main from feat/semantic-search

@arreyder

Owner

Roadmap item #1: dense-vector semantic search blended with the existing lexical ranking, so conceptually-related memories surface even with no shared terms.

What it does

  • internal/embed: pluggable Embedder; Ollama /api/embeddings impl; FromEnv (EMBED_URL/EMBED_MODEL/EMBED_DIM, default nomic-embed-text/768). Disabled no-op when unconfigured → lexical-only, never fails. Input truncated to EMBED_MAX_CHARS (default 6000) to stay under the model's context window.
  • Schema: knn_vector_768 DenseVectorField (cosine) + embedding field (stored=false so vectors never bloat responses; indexed for KNN).
  • Write path: store/bulk_store embed title+content; update re-embeds only when content/title change (tag/importance updates skip it), fetching the missing half so the vector reflects both.
  • Query path: when enabled, embed the query, run KNN alongside lexical edismax, fuse with reciprocal rank fusion (k=60). Over-fetch fusionK then trim to limit. semantic=false opts out; start>0 (pagination) forces lexical-only. Any embed/KNN error degrades to lexical-only.
  • client.KNNQuery: POSTs the vector (dodges URL-length limits) and forces defType=lucene — the /select handler defaults to edismax, which doesn't honor the leading {!knn} parser-switch and would tokenize the 768-float literal across qf fields, blowing maxClauseCount.
  • cmd/solr-mem-backfill: one-shot re-embed of existing memories (idempotent).
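
For reviewers, the fusion step in the query path can be sketched roughly as below. This is a minimal illustration of reciprocal rank fusion with k=60 over plain ID lists, not the PR's fuseResponses; the name `fuseRRF`, the string-ID signature, and the first-seen tie-break are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfK is the rank-smoothing constant; the PR uses k=60.
const rrfK = 60

// fuseRRF (hypothetical name) merges ranked lists of document IDs with
// reciprocal rank fusion: each document scores sum(1 / (rrfK + rank)) over
// the lists it appears in, with rank starting at 1. A document found by only
// one list is still included, with that single contribution.
func fuseRRF(limit int, lists ...[]string) []string {
	scores := make(map[string]float64)
	var order []string // first-seen order, used as the tie-break (assumption)
	for _, list := range lists {
		seen := make(map[string]bool) // skip duplicate IDs within one list
		rank := 0
		for _, id := range list {
			if seen[id] {
				continue
			}
			seen[id] = true
			rank++
			if _, ok := scores[id]; !ok {
				order = append(order, id)
			}
			scores[id] += 1.0 / float64(rrfK+rank)
		}
	}
	// Highest fused score first; the stable sort keeps first-seen order on ties.
	sort.SliceStable(order, func(i, j int) bool {
		return scores[order[i]] > scores[order[j]]
	})
	// Trim the over-fetched candidates down to the requested limit.
	if limit >= 0 && len(order) > limit {
		order = order[:limit]
	}
	return order
}

func main() {
	lexical := []string{"a", "b", "c"}
	semantic := []string{"c", "a", "d"}
	fmt.Println(fuseRRF(3, lexical, semantic)) // prints [a c b]
}
```

In the example, "a" and "c" appear in both lists and therefore outrank "b", which only the lexical list returned; "d" is fused in as a semantic-only hit and then trimmed by the limit.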

Verified live (crr-mini0)

596/596 memories embedded. A query with zero lexical matches (semantic=false → 0 results) returns conceptually-related memories via vector similarity (semantic=true → hits). Mechanism confirmed end-to-end.

Tests

embedder (request/parse/dim/truncate/disabled), fuseResponses (RRF order, dedup, semantic-only inclusion, limit, nil). build/vet/test/gofmt clean.

Deploy notes (gotchas hit)

  • macOS 15 Local Network Privacy (TCC) blocks the launchd-run server from LAN connections (shell-run binaries are fine). Worked around with a reverse SSH tunnel from pax99 → crr-mini0 loopback, so the server only ever touches 127.0.0.1 (EMBED_URL=http://127.0.0.1:11434). Tunnel is a systemd service on pax99.
  • Schema deploy = docker cp managed-schema.xml into the live core + cores?action=RELOAD (classic read-only schema factory).

Follow-ups (relevance tuning — not in this PR)

  1. nomic task prefixes: prepend search_document: to stored text and search_query: to queries — nomic-embed-text is trained for this and precision is noticeably better with it. Needs a doc-vs-query distinction in the Embedder + a re-backfill. Highest-value next step.
  2. Consider mxbai-embed-large (1024d) for higher quality.
  3. Tune RRF weighting / fusionK; expose a semantic-only mode.

🤖 Generated with Claude Code

arreyder and others added 5 commits June 9, 2026 23:49
Adds dense-vector semantic search blended with the existing lexical ranking,
so conceptually-related memories surface even with no shared terms.

- internal/embed: pluggable Embedder; Ollama /api/embeddings impl; FromEnv
  (EMBED_URL/EMBED_MODEL/EMBED_DIM). Disabled no-op when unconfigured →
  lexical-only, never fails.
- schema: knn_vector_768 DenseVectorField (cosine) + `embedding` field
  (stored=false so vectors never bloat responses; indexed for KNN).
- store/bulk_store: embed title+content on write. update: re-embed only when
  content/title change (tag/importance updates skip it), fetching the missing
  half so the vector reflects both.
- search: when enabled, embed the query, KNN alongside lexical edismax, fuse
  with reciprocal rank fusion (k=60). Over-fetch fusionK then trim to limit.
  semantic=false opts out; start>0 (pagination) forces lexical-only. Any
  embed/KNN error degrades to lexical-only.
- client.KNNQuery (POSTs the vector to dodge URL-length limits) + formatVector.
- cmd/solr-mem-backfill: one-shot re-embed of existing memories (idempotent).

Tests: embedder (request/parse/dim, disabled), fuseResponses (RRF order,
dedup, semantic-only inclusion, limit, nil). build/vet/test/gofmt clean.

Deploy: schema reload (docker cp + cores RELOAD) + EMBED_* env on the server +
run backfill.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t 6000)

16/596 memories failed backfill with 'input length exceeds context length'
(nomic-embed-text ~2048 tokens). Truncate embed input by runes; title+head
carries the semantic signal. Configurable via EMBED_MAX_CHARS.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
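
A rough sketch of the rune-based truncation described in the commit above. The function name `truncateRunes` and the "non-positive limit disables truncation" behaviour are assumptions for illustration; only the idea of cutting by runes and the EMBED_MAX_CHARS default of 6000 come from the PR.

```go
package main

import "fmt"

// truncateRunes (hypothetical name) returns s cut to at most maxChars Unicode
// code points. Cutting on rune boundaries rather than byte offsets avoids
// splitting a multi-byte UTF-8 sequence in the embed input.
func truncateRunes(s string, maxChars int) string {
	if maxChars <= 0 {
		return s // assumption: a non-positive limit disables truncation
	}
	count := 0
	// Ranging over a string yields the byte index at which each rune starts.
	for i := range s {
		if count == maxChars {
			return s[:i]
		}
		count++
	}
	return s // already within the limit
}

func main() {
	fmt.Println(truncateRunes("héllo wörld", 4)) // prints héll
}
```
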
The /select handler defaults to edismax, which doesn't honor the leading
{!knn} parser switch — it tokenized the 768-float vector literal across qf
fields, exceeding maxClauseCount (1024) -> 500. Send defType=lucene (+ drop
facet/hl) so the knn parser is used.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
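
To make the fix above concrete, here is a hedged sketch of how the form parameters for such a KNN request might be assembled. `knnForm` is a hypothetical helper, this `formatVector` body is a guess at what the PR's helper of the same name does, and the `fl`/`rows` values are assumptions; the `{!knn f=... topK=...}` local-params syntax and `defType=lucene` are as described in the commit.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// formatVector renders a vector as the bracketed literal the knn parser
// expects, e.g. [0.1,-0.25,1]. (Assumed implementation.)
func formatVector(vec []float32) string {
	parts := make([]string, len(vec))
	for i, v := range vec {
		parts[i] = strconv.FormatFloat(float64(v), 'f', -1, 32)
	}
	return "[" + strings.Join(parts, ",") + "]"
}

// knnForm (hypothetical helper) builds the form body for a POST to /select.
// defType=lucene makes the leading {!knn ...} local params take effect
// instead of edismax tokenizing the vector literal across qf fields.
func knnForm(field string, topK int, vec []float32) url.Values {
	form := url.Values{}
	form.Set("q", fmt.Sprintf("{!knn f=%s topK=%d}%s", field, topK, formatVector(vec)))
	form.Set("defType", "lucene")
	form.Set("rows", strconv.Itoa(topK)) // assumption: fetch as many rows as topK
	form.Set("fl", "id,score")           // assumption: only IDs and scores are needed for fusion
	form.Set("facet", "false")           // drop facet/hl, as the commit describes
	form.Set("hl", "false")
	return form
}

func main() {
	form := knnForm("embedding", 10, []float32{0.1, -0.25, 1})
	fmt.Println(form.Get("q")) // prints {!knn f=embedding topK=10}[0.1,-0.25,1]
}
```

The resulting `form.Encode()` string would be sent as an `application/x-www-form-urlencoded` POST body, which is what keeps the 768-float literal out of the URL.
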
…exical count)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ric retrieval

nomic-embed-text is trained with task prefixes; without them query/doc vectors
are misaligned and precision suffers. Split Embed into EmbedDocument/EmbedQuery;
prefix stored text with 'search_document: ' and queries with 'search_query: '
(auto for nomic models, overridable via EMBED_DOC_PREFIX/EMBED_QUERY_PREFIX).
Requires re-backfill so stored vectors carry the document prefix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
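
A minimal sketch of the prefix selection described in the commit above. The `prefixes` type, the `prefixesFor` helper, the substring check used to detect a nomic model, and the way title and content are joined are all assumptions for illustration; the prefix strings and the EMBED_DOC_PREFIX/EMBED_QUERY_PREFIX overrides are taken from the commit message.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// prefixes holds the task prefixes prepended to embed inputs.
type prefixes struct{ doc, query string }

// prefixesFor (hypothetical helper) picks default prefixes for a model and
// applies env overrides. lookup has the same shape as os.LookupEnv so tests
// can substitute a fake environment.
func prefixesFor(model string, lookup func(string) (string, bool)) prefixes {
	var p prefixes
	// Assumption: a nomic model is detected by a substring match on its name.
	if strings.Contains(strings.ToLower(model), "nomic") {
		p = prefixes{doc: "search_document: ", query: "search_query: "}
	}
	if v, ok := lookup("EMBED_DOC_PREFIX"); ok {
		p.doc = v
	}
	if v, ok := lookup("EMBED_QUERY_PREFIX"); ok {
		p.query = v
	}
	return p
}

// documentInput builds the text embedded for a stored memory.
// Assumption: title and content are joined with a newline.
func (p prefixes) documentInput(title, content string) string {
	return p.doc + title + "\n" + content
}

// queryInput builds the text embedded for a search query.
func (p prefixes) queryInput(q string) string { return p.query + q }

func main() {
	p := prefixesFor("nomic-embed-text", os.LookupEnv)
	fmt.Printf("%q\n", p.queryInput("pebble block size"))
}
```

Because stored vectors are computed from `documentInput`, any change to the document prefix only takes effect for existing memories after a re-backfill, as the commit notes.
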
@arreyder

Owner Author

Follow-up #1 (nomic task prefixes) is now included in this PR (commit 5294b3a) and re-backfilled live — EmbedDocument/EmbedQuery split, search_document:/search_query: prefixes (auto for nomic, env-overridable).

Live quality check after prefixes:

  • Lexical/obvious queries: excellent. E.g. "pebble SST blocksize 32k final decision" returns the 3 Pebble memories with the FINAL decision ranked #1. Hybrid does not bury exact matches.
  • Vocab-mismatch/conceptual queries: real recall win (returns related memories where lexical gets 0), but precision is only moderate on this dense technical corpus, which looks like the ceiling for nomic-embed-text (768d). Next lever if we want sharper conceptual hits: mxbai-embed-large (1024d), which needs a schema dim change + re-backfill.

arreyder and others added 2 commits June 10, 2026 00:18
…ng1024 field

Dimension change can't happen in place (Lucene forbids mixed vector dims in a
field). Add embedding1024 (knn_vector_1024) and point store/update/backfill/KNN
at it; the old 768 'embedding' field goes empty/vestigial (no delete-all, zero
risk to existing data). Set EMBED_MODEL=mxbai-embed-large, EMBED_DIM=1024.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…pus too hard

mxbai-embed-large (1024d) benchmarks higher but its 512-token window forced
~1200-char truncation (vs nomic's 6000) — embedding ~1/5 of each memory, with
no conceptual-precision gain. nomic-embed-text (2048 tok) fits our content far
better. Point store/update/backfill/KNN back at the 768 'embedding' field;
embedding1024 stays defined-but-vestigial (kept so its data doesn't break
reload). Env reverts to nomic/768/6000. Real conceptual-precision lever is
reranking or chunked embeddings, not a bigger model.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>