hybrid semantic search: embeddings + KNN + RRF fusion (#1) by arreyder · Pull Request #33 · arreyder/solr-mem

arreyder · 2026-06-10T05:08:10Z

Roadmap item #1: dense-vector semantic search blended with the existing lexical ranking, so conceptually-related memories surface even with no shared terms.

What it does

internal/embed: pluggable Embedder; Ollama /api/embeddings impl; FromEnv (EMBED_URL/EMBED_MODEL/EMBED_DIM, default nomic-embed-text/768). Disabled no-op when unconfigured → lexical-only, never fails. Input truncated to EMBED_MAX_CHARS (default 6000) to stay under the model's context window.
Schema: knn_vector_768 DenseVectorField (cosine) + embedding field (stored=false so vectors never bloat responses; indexed for KNN).
Write path: store/bulk_store embed title+content; update re-embeds only when content/title change (tag/importance updates skip it), fetching the missing half so the vector reflects both.
Query path: when enabled, embed the query, run KNN alongside lexical edismax, fuse with reciprocal rank fusion (k=60). Over-fetch fusionK then trim to limit. semantic=false opts out; start>0 (pagination) forces lexical-only. Any embed/KNN error degrades to lexical-only.
client.KNNQuery: POSTs the vector (dodges URL-length limits) and forces defType=lucene — the /select handler defaults to edismax, which doesn't honor the leading {!knn} parser-switch and would tokenize the 768-float literal across qf fields, blowing maxClauseCount.
cmd/solr-mem-backfill: one-shot re-embed of existing memories (idempotent).
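
For readers unfamiliar with reciprocal rank fusion, the fuse step in the query path can be sketched as below. This is a minimal illustration, not the PR's fuseResponses: the name `fuseRRF`, its signature, and the use of plain ID lists instead of Solr responses are assumptions. Each list contributes 1/(k+rank) per document, scores are summed across lists, and the merged ranking is trimmed to the limit.

```go
package main

import (
	"fmt"
	"sort"
)

// fuseRRF merges ranked ID lists with reciprocal rank fusion.
// Hypothetical helper: each list adds 1/(k+rank) to a document's score,
// with rank starting at 1. Documents found by only one list are kept.
func fuseRRF(k float64, limit int, lists ...[]string) []string {
	scores := map[string]float64{}
	var order []string // first-seen order, gives a deterministic tie-break
	for _, list := range lists {
		seen := map[string]bool{}
		for i, id := range list {
			if seen[id] {
				continue // count a document once per list
			}
			seen[id] = true
			if _, ok := scores[id]; !ok {
				order = append(order, id)
			}
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	// Stable sort keeps first-seen order among equal scores.
	sort.SliceStable(order, func(a, b int) bool {
		return scores[order[a]] > scores[order[b]]
	})
	if limit >= 0 && len(order) > limit {
		order = order[:limit]
	}
	return order
}

func main() {
	lexical := []string{"a", "b", "c"}
	semantic := []string{"b", "d"} // "d" has no lexical match
	fmt.Println(fuseRRF(60, 3, lexical, semantic))
}
```

With k=60, "b" wins because both lists rank it, and the semantic-only "d" still makes the cut ahead of the lower-ranked lexical hit "c".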

Verified live (crr-mini0)

596/596 memories embedded. A query with zero lexical matches (semantic=false → 0 results) returns conceptually-related memories via vector similarity (semantic=true → hits). Mechanism confirmed end-to-end.

Tests

embedder (request/parse/dim/truncate/disabled), fuseResponses (RRF order, dedup, semantic-only inclusion, limit, nil). build/vet/test/gofmt clean.

Deploy notes (gotchas hit)

macOS 15 Local Network Privacy (TCC) blocks the launchd-run server from LAN connections (shell-run binaries are fine). Worked around with a reverse SSH tunnel from pax99 → crr-mini0 loopback, so the server only ever touches 127.0.0.1 (EMBED_URL=http://127.0.0.1:11434). Tunnel is a systemd service on pax99.
Schema deploy = docker cp managed-schema.xml into the live core + cores?action=RELOAD (classic read-only schema factory).

Follow-ups (relevance tuning — not in this PR)

nomic task prefixes: prepend search_document: to stored text and search_query: to queries — nomic-embed-text is trained for this and precision is noticeably better with it. Needs a doc-vs-query distinction in the Embedder + a re-backfill. Highest-value next step.
Consider mxbai-embed-large (1024d) for higher quality.
Tune RRF weighting / fusionK; expose a semantic-only mode.

🤖 Generated with Claude Code

Adds dense-vector semantic search blended with the existing lexical ranking, so conceptually-related memories surface even with no shared terms.

- internal/embed: pluggable Embedder; Ollama /api/embeddings impl; FromEnv (EMBED_URL/EMBED_MODEL/EMBED_DIM). Disabled no-op when unconfigured → lexical-only, never fails.
- schema: knn_vector_768 DenseVectorField (cosine) + `embedding` field (stored=false so vectors never bloat responses; indexed for KNN).
- store/bulk_store: embed title+content on write. update: re-embed only when content/title change (tag/importance updates skip it), fetching the missing half so the vector reflects both.
- search: when enabled, embed the query, KNN alongside lexical edismax, fuse with reciprocal rank fusion (k=60). Over-fetch fusionK then trim to limit. semantic=false opts out; start>0 (pagination) forces lexical-only. Any embed/KNN error degrades to lexical-only.
- client.KNNQuery (POSTs the vector to dodge URL-length limits) + formatVector.
- cmd/solr-mem-backfill: one-shot re-embed of existing memories (idempotent).

Tests: embedder (request/parse/dim, disabled), fuseResponses (RRF order, dedup, semantic-only inclusion, limit, nil). build/vet/test/gofmt clean.

Deploy: schema reload (docker cp + cores RELOAD) + EMBED_* env on the server + run backfill.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…t 6000) 16/596 memories failed backfill with 'input length exceeds context length' (nomic-embed-text ~2048 tokens). Truncate embed input by runes; title+head carries the semantic signal. Configurable via EMBED_MAX_CHARS. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
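
Truncating by runes rather than bytes matters because a byte cut can split a multi-byte UTF-8 character. One way to do it, as a sketch: `truncateRunes` is a hypothetical helper name, and treating a non-positive limit as "no limit" is an assumption.

```go
package main

import "fmt"

// truncateRunes returns s cut to at most max runes, never splitting a
// multi-byte character. A non-positive max is treated as "no limit"
// (an assumption for this sketch).
func truncateRunes(s string, max int) string {
	if max <= 0 {
		return s
	}
	n := 0
	// Ranging over a string yields the byte offset of each rune start.
	for i := range s {
		if n == max {
			return s[:i]
		}
		n++
	}
	return s
}

func main() {
	fmt.Println(truncateRunes("héllo wörld", 2))
}
```

The loop stops at the byte offset of the first rune past the limit, so the string is walked only as far as needed and nothing is copied into a `[]rune`.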

The /select handler defaults to edismax, which doesn't honor the leading {!knn} parser switch — it tokenized the 768-float vector literal across qf fields, exceeding maxClauseCount (1024) -> 500. Send defType=lucene (+ drop facet/hl) so the knn parser is used. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
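
A sketch of the request parameters this fix implies, assuming Solr's `{!knn f=<field> topK=<n>}[...]` local-params syntax. `knnParams` is a hypothetical helper, and the body of `formatVector` here is a guess at what the PR's function of the same name does.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// formatVector renders a vector as the bracketed literal the knn parser
// expects, e.g. [0.5,-0.25]. Guessed implementation.
func formatVector(v []float32) string {
	parts := make([]string, len(v))
	for i, f := range v {
		parts[i] = strconv.FormatFloat(float64(f), 'f', -1, 32)
	}
	return "[" + strings.Join(parts, ",") + "]"
}

// knnParams builds form parameters for a KNN query. defType=lucene makes
// the leading {!knn} switch take effect instead of edismax tokenizing
// the vector literal across qf fields.
func knnParams(field string, vec []float32, topK int) url.Values {
	p := url.Values{}
	p.Set("q", fmt.Sprintf("{!knn f=%s topK=%d}%s", field, topK, formatVector(vec)))
	p.Set("defType", "lucene")
	p.Set("rows", strconv.Itoa(topK))
	p.Set("fl", "id,score")
	p.Set("facet", "false") // faceting and highlighting are not needed here
	p.Set("hl", "false")
	return p
}

func main() {
	p := knnParams("embedding", []float32{0.5, -0.25}, 10)
	fmt.Println(p.Get("q"))
	// The encoded form (p.Encode()) would go in a POST body to /select,
	// which keeps a 768-float literal out of the URL.
}
```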

fuse: NumFound reflects returned docs (semantic-only hits aren't in lexical count) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ric retrieval nomic-embed-text is trained with task prefixes; without them query/doc vectors are misaligned and precision suffers. Split Embed into EmbedDocument/EmbedQuery; prefix stored text with 'search_document: ' and queries with 'search_query: ' (auto for nomic models, overridable via EMBED_DOC_PREFIX/EMBED_QUERY_PREFIX). Requires re-backfill so stored vectors carry the document prefix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

arreyder · 2026-06-10T05:14:23Z

Follow-up #1 (nomic task prefixes) is now included in this PR (commit 5294b3a) and re-backfilled live — EmbedDocument/EmbedQuery split, search_document:/search_query: prefixes (auto for nomic, env-overridable).

Live quality check after prefixes:

Lexical/obvious queries: excellent, e.g. "pebble SST blocksize 32k final decision" returns the 3 Pebble memories with the FINAL decision ranked #1. Hybrid does not bury exact matches.
Vocab-mismatch/conceptual queries: real recall win (returns related memories where lexical gets 0), but precision is moderate on this dense technical corpus — nomic-embed-text (768d) ceiling. Next lever if we want sharper conceptual hits: mxbai-embed-large (1024d) — schema dim change + re-backfill.

…ng1024 field Dimension change can't happen in place (Lucene forbids mixed vector dims in a field). Add embedding1024 (knn_vector_1024) and point store/update/backfill/KNN at it; the old 768 'embedding' field goes empty/vestigial (no delete-all, zero risk to existing data). Set EMBED_MODEL=mxbai-embed-large, EMBED_DIM=1024. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…pus too hard mxbai-embed-large (1024d) benchmarks higher but its 512-token window forced ~1200-char truncation (vs nomic's 6000) — embedding ~1/5 of each memory, with no conceptual-precision gain. nomic-embed-text (2048 tok) fits our content far better. Point store/update/backfill/KNN back at the 768 'embedding' field; embedding1024 stays defined-but-vestigial (kept so its data doesn't break reload). Env reverts to nomic/768/6000. Real conceptual-precision lever is reranking or chunked embeddings, not a bigger model. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

arreyder and others added 5 commits June 9, 2026 23:49

fuse: NumFound reflects returned docs (semantic-only hits aren't in l…

e5e5173

arreyder and others added 2 commits June 10, 2026 00:18

arreyder wants to merge 7 commits into main from feat/semantic-search
