Veles


Fast, hybrid (BM25 + semantic) local code search for AI agents and humans, written in pure Rust.

Veles runs entirely on CPU — no GPU, no transformer forward pass at query time. Queries return in tens of milliseconds against a persistent on-disk index, with tree-sitter-aware symbol lookups, pipe-friendly output formats, and built-in MCP / gRPC servers for integration with Claude, Cursor, or anything else that speaks JSON-RPC. Static embeddings come from the potion family via model2vec-rs.

Originally inspired by Semble — Veles started as a Rust port of the same hybrid retrieval recipe and has grown to add persistent + incremental indexing, tree-sitter symbols / defs / refs, six pipe-friendly output formats, glob/language filters, gRPC, and shell completions.

veles tui — live hybrid search, ~10ms per keystroke.

Interfaces

  • CLI: veles search "query" ./my-repo
  • MCP server: stdio JSON-RPC for AI agent integration (Claude, Cursor, etc.)
  • gRPC: tonic-based service with Index, Search, FindRelated, GetStats RPCs

Features

  • Persistent index under <repo>/.veles/: searches reuse the cache and finish in tens of milliseconds. Incremental updates keep the embeddings of unchanged files.
  • Hybrid search with Reciprocal Rank Fusion (RRF) blending BM25 and semantic scores
  • Tree-sitter symbol commands: symbols / defs / refs for Rust, Python, JavaScript, TypeScript, Go
  • Identifier-aware tokenizer: splits camelCase, snake_case, and mixed-script names
  • Query-type detection: symbol queries lean BM25, natural language leans semantic
  • Definition boosting: promotes chunks that define the queried symbol
  • Path penalties: demotes test files, compat dirs, re-export files
  • File saturation: avoids stacking all results from one file
  • Scope labels on every hit: search/related/refs results carry a tree-sitter-derived "defines `Foo`" or "in `bar`" suffix, so the result header alone tells you what each chunk is
  • Multilingual model option for Cyrillic, CJK, Arabic, etc.
  • Pipe-friendly output: pretty, compact, ripgrep, paths, json, jsonl
  • Filter flags: --lang, --path and --exclude glob patterns, --min-score
  • Prebuilt binaries for macOS (Intel/ARM), Linux x86_64/ARM64 (musl), Windows x86_64
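
To make the hybrid ranking step more concrete, here is a minimal sketch of Reciprocal Rank Fusion in its textbook form, where each result earns 1 / (k + rank) from every list it appears in. It is not Veles's actual implementation: the function name `rrf_fuse`, the chunk IDs, and the constant k = 60 (a common default in the literature) are assumptions for illustration, and the query-type weighting, definition boosting, and path penalties listed above are left out.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion.
/// Hypothetical helper; `k` is the usual smoothing constant.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (position, id) in ranking.iter().enumerate() {
            // Ranks are 1-based in the standard formula: 1 / (k + rank).
            let rank = position as f64 + 1.0;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties are broken by ID so output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    // Made-up rankings standing in for the BM25 and semantic result lists.
    let bm25 = vec!["a.rs:10", "b.rs:5", "c.rs:1"];
    let semantic = vec!["b.rs:5", "d.rs:7", "a.rs:10"];
    for (id, score) in rrf_fuse(&[bm25, semantic], 60.0) {
        println!("{id} {score:.5}");
    }
}
```

In this toy input, `b.rs:5` comes out first because it sits near the top of both lists, while chunks that appear in only one list fall to the bottom.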

Install

# Linux / macOS — prebuilt binary (one-liner)
curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/julymetodiev/Veles/releases/latest/download/veles-cli-installer.sh | sh

# Windows — PowerShell
irm https://github.com/julymetodiev/Veles/releases/latest/download/veles-cli-installer.ps1 | iex

# Homebrew (macOS / Linux)
brew install julymetodiev/tap/veles-cli

# From crates.io (compiles locally; no protoc / extra deps needed)
cargo install veles-cli

# Manual download
gh release download --repo julymetodiev/Veles --pattern '*linux-gnu*'   # or browse
#   https://github.com/julymetodiev/Veles/releases/latest

# Verify (optional)
veles --version    # → veles 0.6.0

See INSTALL.md for SHA-256 verification and other install paths.

Quickstart

veles index .                     # one-off, builds .veles/
veles search "parse config file"  # auto-loads the cache
veles update .                    # refresh after edits

The first search downloads the embedding model from Hugging Face (~64 MB, cached at ~/.cache/huggingface/hub/).

Most-used commands

Search

veles search "rate limiting"                          # hybrid (default)
veles search "rate limiting" -t 10 -f compact         # 10 results, 1 line each
veles search "rate limiting" -f rg                    # ripgrep-style path:line:content
veles search "rate limiting" -f json | jq '.results'  # structured for scripting
veles search "rate limiting" -f paths | xargs $EDITOR # open every matching file
veles search "TokenStream" -m bm25                    # exact identifier
veles search "auth flow"    -m semantic               # fuzzy concept
veles search "auth"  -l rust,python                   # language filter
veles search "X"     -g 'src/**/*.rs' -x 'src/legacy/**'   # glob include / exclude
veles search "BM25"  --min-score 0.4                  # drop weak hits
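
Identifier queries such as "TokenStream" are where the identifier-aware tokenizer from the feature list matters. The sketch below shows one plausible way to split camelCase and snake_case names into lowercase sub-tokens. It is an illustration only, not Veles's tokenizer: `split_identifier` is a hypothetical name, and mixed-script boundaries are not handled.

```rust
/// Split an identifier into lowercase sub-tokens at separators such as `_`
/// and at case transitions (`fooBar`, `HTTPServer`). Hypothetical helper.
fn split_identifier(ident: &str) -> Vec<String> {
    let chars: Vec<char> = ident.chars().collect();
    let mut tokens = Vec::new();
    let mut current = String::new();

    for (i, &c) in chars.iter().enumerate() {
        if !c.is_alphanumeric() {
            // Any non-alphanumeric character acts as a separator.
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            continue;
        }
        let prev = if i > 0 { Some(chars[i - 1]) } else { None };
        let next = chars.get(i + 1).copied();
        // An uppercase letter starts a new token after a lowercase letter or
        // a digit, or when it ends an acronym (the `S` in `HTTPServer`).
        let boundary = match prev {
            Some(p) if c.is_uppercase() => {
                p.is_lowercase()
                    || p.is_numeric()
                    || (p.is_uppercase() && next.map_or(false, |n| n.is_lowercase()))
            }
            _ => false,
        };
        if boundary && !current.is_empty() {
            tokens.push(std::mem::take(&mut current));
        }
        current.extend(c.to_lowercase());
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

fn main() {
    for name in ["TokenStream", "save_index", "HTTPServer"] {
        println!("{name} -> {:?}", split_identifier(name));
    }
}
```

Indexing both the whole identifier and its parts is a common design choice, because it lets a query for "stream" match `TokenStream` without giving up exact-name hits.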

Symbols (tree-sitter)

veles symbols crates/veles-core/src/persist.rs        # outline a single file
veles defs Manifest                                   # every definition named "Manifest"
veles defs save -k function -l rust                   # filter by kind + language
veles refs save_index -t 30                           # defs + BM25 references

Related code

veles find-related src/main.rs 42                     # semantically similar chunks
veles find-related src/main.rs 42  -l rust            # restrict to one language
veles find-related src/main.rs 42  -g 'crates/foo/**' # restrict to a subtree

Index lifecycle

veles index .              # bootstrap
veles index . --force      # rebuild from scratch
veles update .             # incremental refresh
veles status .             # manifest + drift
veles clean .              # remove .veles/

Interactive TUI

veles tui                          # live hybrid search with preview pane
veles tui ./my-repo                # against another repo
veles tui --debug-keys             # echo every keypress (terminal diagnostic)

Loads the persistent index once, then debounces queries so each keystroke re-runs in tens of milliseconds. Highlights:

  • Search: ↑↓ navigate, Tab cycles hybrid/bm25/semantic.
  • Lookups: Ctrl-D defs, Ctrl-F refs, Ctrl-R semantically related. With an empty query, Ctrl-D / Ctrl-F use the selected row's symbol.
  • History: Ctrl-B / Ctrl-X (also F2/F3, Alt-←/→, Alt-h/l) jump back and forward across past views, like a browser.
  • Query recall: Ctrl-↑ / Ctrl-↓ (also Alt-P/Alt-N) walk past queries, recorded at Enter, Ctrl-O, Ctrl-R/D/F, and Ctrl-U.
  • Filters: Ctrl-T cycles the language filter through the indexed languages, Ctrl-Y opens a path-glob input. Both pass through to every dispatch (search, related, defs, refs).
  • Preview: Shift-↑↓ / Shift-PgUp/PgDn scroll within the chunk; F5-F8 are non-modifier fallbacks for terminals that swallow Shift+Arrow.
  • Open: Enter prints path:line to stdout ($EDITOR $(veles tui) works), Ctrl-O spawns $EDITOR in-place and returns to the TUI on exit. The editor heuristic covers vim / nvim / emacs / nano / VS Code / Cursor / Windsurf / Helix; set EDITOR=vim (or VISUAL) to pick one.
  • Help: ? (when the query is empty) opens a scrollable keybinding overlay; Ctrl-G cancels an in-flight search (readline convention); Esc / Ctrl-C quit.

Servers

veles serve-mcp                                # MCP over stdio (default if no args)
veles serve-grpc --addr "[::1]:50051"          # gRPC

Shell integration

mkdir -p ~/.zfunc ~/.local/share/man/man1
veles completions zsh > ~/.zfunc/_veles
veles man --out-dir ~/.local/share/man/man1

veles man --out-dir DIR writes one page per subcommand (veles.1, veles-search.1, veles-defs.1, …) so man veles-search works the same way as man git-commit.

Then once in ~/.zshrc:

fpath=(~/.zfunc $fpath)
autoload -Uz compinit && compinit
export MANPATH="$HOME/.local/share/man:$MANPATH"

Remote repos

veles search "BM25 inverted index" https://github.com/julymetodiev/Veles

See USAGE.md for the full reference, recipes (fzf, vim quickfix, jq), and troubleshooting. Maintainers can read the automatic workspace indexing specification for lifecycle, concurrency, and persistence guarantees.

MCP server

veles serve-mcp     # configure once; workspace indexing stays current automatically
veles               # equivalent — bare `veles` starts MCP when stdin is piped

No watcher, pipeline, owner, or dashboard configuration is required. Each coding-agent session discovers its workspace and shares one repository-local updater with any other Veles MCP processes for that repository. Different repositories remain independent.

serve-mcp [PATH] optionally sets the default repo whenever an MCP tool omits repo. Without PATH, Veles checks VELES_WORKSPACE, CLAUDE_PROJECT_DIR, then the spawned process's current directory. Coding-agent configurations should pass the workspace explicitly or set the server cwd; see crates/veles-mcp/README.md.
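
The lookup order described above can be sketched as follows. This is a minimal illustration of the documented precedence (explicit PATH, then VELES_WORKSPACE, then CLAUDE_PROJECT_DIR, then the current directory), not the code in veles-mcp. The function name `resolve_workspace` and its injected `lookup` closure are hypothetical, chosen so the logic can be exercised without touching the real environment.

```rust
use std::env;
use std::path::PathBuf;

/// Pick the default repo: explicit PATH, then VELES_WORKSPACE, then
/// CLAUDE_PROJECT_DIR, then the process's current directory.
/// Hypothetical helper; `lookup` stands in for reading environment variables.
fn resolve_workspace(
    explicit: Option<&str>,
    lookup: impl Fn(&str) -> Option<String>,
    cwd: PathBuf,
) -> PathBuf {
    if let Some(path) = explicit {
        return PathBuf::from(path);
    }
    for var in ["VELES_WORKSPACE", "CLAUDE_PROJECT_DIR"] {
        if let Some(value) = lookup(var) {
            return PathBuf::from(value);
        }
    }
    cwd
}

fn main() -> std::io::Result<()> {
    // The first CLI argument plays the role of `serve-mcp [PATH]`.
    let root = resolve_workspace(
        env::args().nth(1).as_deref(),
        |name| env::var(name).ok(),
        env::current_dir()?,
    );
    println!("{}", root.display());
    Ok(())
}
```

Because the current directory is only the last fallback, an agent configuration that spawns the server from an unrelated directory still resolves correctly as long as it passes PATH or sets one of the two variables.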

Exposed tools:

| Tool | Use it for |
| --- | --- |
| search | Hybrid / BM25 / semantic query, with optional lang / path / exclude / min_score. |
| defs | Tree-sitter definitions for an exact symbol name (Rust, Python, JS, TS, Go). |
| refs | Definitions plus BM25 hits: "where is X defined and where is X used", in one call. |
| find_related | Semantically similar chunks for a (file_path, line) from an earlier search. |
| list_symbols | Every tree-sitter definition across the index, with kind / lang / path filters. |
| symbols | Outline of a single file: every definition it contains. |
| scope_at | Innermost tree-sitter symbol containing a given file:line. |
| files | Distinct file paths in the index, with lang / path / exclude filters. |
| read | Line range from an indexed file (capped at 500 lines, repo-relative paths only). |
| stats | File / chunk counts, model metadata, per-language chunk breakdown. |
| status | Non-mutating drift check vs. persisted manifest; distinguishes content edits from bare touch. |
| update | Incremental refresh of a local repo's .veles/ index after edits (BLAKE3-aware). |

search, find_related, and refs accept a format argument:

  • default (default) — scored, fenced code blocks tagged with the enclosing scope.
  • paths — flat per-line list. search / find_related emit path:start-end; refs emits path:line per word-boundary occurrence of the symbol.
  • unique_paths — collapsed to one path line per file. For agent shortlist workflows that just want "which files matter".
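
To illustrate how the `paths` and `unique_paths` formats relate, the sketch below collapses `path:start-end` lines into one entry per file. It is only an approximation of the idea on the consumer side: `unique_paths` here is a hypothetical helper, the sample lines are made up, and the server may order or deduplicate its output differently.

```rust
use std::collections::HashSet;

/// Collapse `path:start-end` (or `path:line`) lines to one path per file,
/// keeping first-seen order. Hypothetical helper.
fn unique_paths(lines: &[&str]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for line in lines {
        // Drop everything after the last ':'; lines without one pass through.
        let path = line.rsplit_once(':').map_or(*line, |(p, _)| p);
        if seen.insert(path) {
            out.push(path.to_string());
        }
    }
    out
}

fn main() {
    // Made-up hits in the `paths` format.
    let hits = ["src/persist.rs:10-42", "src/bm25.rs:5-30", "src/persist.rs:80-120"];
    for path in unique_paths(&hits) {
        println!("{path}");
    }
}
```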

Build from source

cargo build --release

tonic-build ships a vendored protoc via protoc-bin-vendored, so no system-wide protobuf compiler is required.

Embedding in your own Rust project

The workspace publishes four crates on crates.io — pick the layer you need:

| Crate | Purpose |
| --- | --- |
| veles-core | Indexing, chunking, BM25, dense search, hybrid ranking, persistence. |
| veles-grpc | tonic-based gRPC service wrapping veles-core. |
| veles-mcp | MCP / JSON-RPC server (stdio) for AI-agent integration. |
| veles-cli | The veles binary. |

Full API docs are on docs.rs.

[dependencies]
veles-core = "0.2"
anyhow = "1"   # used below for error propagation

use std::path::Path;
use veles_core::{SearchMode, VelesIndex};

fn main() -> anyhow::Result<()> {
    // Index rooted at the current directory.
    let index = VelesIndex::from_path(Path::new("."), None, None, false)?;

    // Run a hybrid search and print each hit's location and score.
    let results = index.search("parse config", 5, SearchMode::Hybrid, None, None, None);
    for r in results {
        println!("{} [{:.3}]", r.chunk.location(), r.score);
    }
    Ok(())
}

Architecture

Veles/
  crates/
    veles-core/    indexing, chunking, BM25, dense search, ranking, symbols
    veles-grpc/    gRPC service (tonic + prost)
      proto/
        veles.proto  gRPC schema
    veles-mcp/     MCP server over stdio
    veles-cli/     CLI binary

The persistent index lives under <repo>/.veles/:

.veles/
  manifest.json   # model, dim, per-file (size, mtime, chunk_count)
  chunks.bin      # bincode Vec<Chunk>
  bm25.bin        # bincode BM25 inverted index
  dense.bin       # bincode dense matrix
  symbols.bin     # bincode tree-sitter symbols

update reuses embeddings of files whose (size, mtime) fingerprint hasn't changed, so refreshing after a small edit is near-instant on large repos.
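
The fingerprint check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the veles-core implementation: the `Fingerprint` struct, `fingerprint`, and `plan_update` are hypothetical names, mtime is truncated to whole seconds, and files that were deleted since the last index are not handled.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// Cheap change detector: file size plus modification time in seconds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Fingerprint {
    size: u64,
    mtime_secs: u64,
}

/// Read a fingerprint from the file system.
fn fingerprint(path: &Path) -> io::Result<Fingerprint> {
    let meta = fs::metadata(path)?;
    let mtime = meta
        .modified()?
        .duration_since(UNIX_EPOCH)
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
    Ok(Fingerprint { size: meta.len(), mtime_secs: mtime.as_secs() })
}

/// Split current files into (reuse cached embeddings, re-embed).
/// Both lists are sorted so the plan is deterministic.
fn plan_update<'a>(
    manifest: &HashMap<String, Fingerprint>,
    current: &'a HashMap<String, Fingerprint>,
) -> (Vec<&'a str>, Vec<&'a str>) {
    let mut reuse = Vec::new();
    let mut reembed = Vec::new();
    for (path, fp) in current {
        if manifest.get(path) == Some(fp) {
            reuse.push(path.as_str());
        } else {
            // New file, or size/mtime differs from the manifest entry.
            reembed.push(path.as_str());
        }
    }
    reuse.sort_unstable();
    reembed.sort_unstable();
    (reuse, reembed)
}

fn main() -> io::Result<()> {
    let mut manifest = HashMap::new();
    manifest.insert("src/lib.rs".to_string(), Fingerprint { size: 120, mtime_secs: 1_700_000_000 });
    manifest.insert("src/main.rs".to_string(), Fingerprint { size: 80, mtime_secs: 1_700_000_000 });

    // Simulate an edit to one file: both size and mtime change.
    let mut current = manifest.clone();
    current.insert("src/main.rs".to_string(), Fingerprint { size: 95, mtime_secs: 1_700_000_500 });

    let (reuse, reembed) = plan_update(&manifest, &current);
    println!("reuse: {:?}", reuse);
    println!("re-embed: {:?}", reembed);

    // Real fingerprints come from file metadata.
    println!("{:?}", fingerprint(Path::new("."))?);
    Ok(())
}
```

Comparing metadata avoids reading file contents at all, which is why a refresh after a small edit only pays the embedding cost for the files that actually changed.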

License

MIT
