savings: record on every read path, persist the ledger in the sidecar database#76
Merged
Conversation
Single-repo mode indexes nodes and file paths without a repo prefix
while registering the repo's metadata under its real prefix, so
RepoRoot(""), MultiIndexer.ResolveFilePath, and the MCP path resolvers
refused every node the single-repo indexer minted: get_symbol_source
errored with 'could not resolve repo root (repo_prefix="")',
smart_context silently dropped its embedded source, and bare-relative
read paths were rejected — and because savings recording sits behind
those source reads, the token-savings ledger never saw an observation
on single-repo daemons.
With exactly one tracked repo the empty prefix is unambiguous: resolve
it to the lone repo's root. Two or more tracked repos keep the
ambiguity miss/error.
The savings ledger only saw get_symbol_source, batch_symbols (include_source:true), and smart_context's embedded sources — while read_file, get_file_summary, and get_editing_context, the tools agents actually lean on as Read replacements, recorded nothing. A session served entirely by those tools left the savings dashboard empty no matter how much it saved. Each of the three now books a server-side observation: read_file with exact counts of the content it returned vs the original bytes (an uncompressed read books returned == baseline, counting the call without inventing savings), and the summary-shaped tools against the on-disk size of the file their response stands in for, using the payload itself to calibrate the chars-per-token ratio. A synthesized attribution node carries the repo prefix and language so per-repo/per-language buckets stay correct, including the lone-repo fallback for single-repo mode.
The flat-file ledger (savings.json cumulative totals + savings.jsonl event log under the cache dir) was only durable on a lucky schedule: the cumulative file flushed every 20 observations, on a 5-minute ticker gated on pending work, or on graceful shutdown — and MCP clients SIGKILL their stdio servers, so light sessions never reached disk at all. Every write error was silently discarded on top. The ledger now lives in the machine-global SQLite sidecar (~/.gortex/sidecar.sqlite, shared with notes/memories/scopes): savings_events one row per call (now carrying the session id), savings_totals running aggregates per bucket, savings_meta the first/last stamps. Each observation is a single transaction — durable at the call, safe across concurrent writer processes via WAL, no flush machinery left to miss (the periodic flusher and its wiring are gone; FlushSavings stays as a no-op for the shutdown chains). Flat files import once on open — totals, per-repo/per-language buckets, and the event history (a lone .jsonl without its cumulative file rebuilds totals from events) — then rename to *.bak behind a migration mark. The savings/gain CLIs and serverstack derive the ledger from the same sidecar path the side-stores use, --cache-dir now relocating both the ledger and the legacy files it imports. A fresh ledger reports a zero FirstSeen instead of seeding time.Now(), so nothing claims to have been tracking before anything was recorded.
The empty state hinted at three tools when six record, the help text still described the flat files the ledger replaced, and --cache-dir's description pointed at savings.json. The empty-state test now also pins that a never-used ledger prints no 'tracking since' line — the zero FirstSeen stays hidden instead of being passed off as a start date.
Actualise the savings docs for the sidecar-backed ledger: where it lives, the transactional durability story, the legacy flat-file import, and the real recording surface (the read family records too, and the per-call value is server-side accounting, not a response field).
The legacy-import guard checked migration_marks before taking the write path, so two processes racing the first start (daemon + CLI) could both pass the check and double-seed the ledger. The mark is now re-checked and written inside the import transaction: the loser either sees the winner's mark or aborts on the write conflict.
Deriving the ledger from the side-store dir split it by entry point: the embedded server's side stores default to the cache dir, so its ledger landed in <cache>/sidecar.sqlite while the savings CLI reads ~/.gortex/sidecar.sqlite — the writer/reader divergence the flat files had. Every entry point now defaults to the machine-global sidecar; an explicit --cache-dir still relocates both the ledger and the legacy files for isolation.
…ount transitions The review of the lone-repo fallback found it trusted whichever repo happened to be the only one tracked. Three hardening changes: RepoMetadata carries Unprefixed — stamped where single-repo mode mints unprefixed nodes — and the empty-prefix fallback honours only a lone repo that actually did. A 1→2→1 track/untrack sequence leaves a prefixed lone survivor, and stale unprefixed node IDs now keep failing closed instead of resolving (and writing) into the wrong checkout. Untracking a single-mode repo evicts its nodes file-by-file: they never enter the byRepo bucket EvictRepo walks, so untrack used to remove nothing and the next lone repo inherited them. Tracking a second repo into a live single-repo daemon re-mints the first repo's nodes under its real prefix (index first, evict the unprefixed originals after — crash leaves both forms resolvable, never neither), so its symbols stay reachable when the fallback disarms. Plus a collision guard in both path resolvers: a lone repo whose directory name matches one of its own top-level directories (repo "api" containing api/handlers.go) resolves the raw on-disk path instead of being hijacked by the prefix-strip join.
…ration Three accounting defects from the review. Conditional fetches that hit the etag (if_none_match) booked full-content savings while shipping a 47-byte not_modified stub — a polling client minted unbounded fake savings; read_file and get_file_summary now record after the conditional-fetch return, like get_editing_context always did, which also skips tokenizing the content on the cheap turnaround. EstimateFromSample ran the uncached tokenizer over the whole sample on every call (~200ms per MB on the read_file hot path, even when the line above had just cache-counted identical bytes); it now goes through the disk cache, and an uncompressed read_file skips the second pass entirely (returned == baseline by construction). Attribution: get_file_summary records the payload of the format it actually returns instead of pricing every format off the compact rendering (which also double-rendered the compact path); out-of-repo absolute reads stay unattributed instead of polluting the lone repo's buckets; and symbol-tool events in single-repo mode now land in the same per-repo bucket as the read family via the lone-prefix attribution.
The content-addressed cache grew one inode per unique payload forever — and the read-family instrumentation now feeds it every file version. Every 64th write prunes the shard it just wrote of entries idle longer than 30 days; read hits refresh the entry mtime so the TTL approximates LRU for content still flowing through the counters. Best-effort by construction: a swept entry is just a future miss.
Crash class: a legacy savings.json carrying JSON null bucket values panicked the import on every server start — readLegacyFile now drops nil entries at the single choke point. A hard event-log read error aborts the import without marking or renaming, so a truncated read is retried next open instead of becoming a permanent loss. Consistency class: ResetSavings runs its three DELETEs in one transaction (a concurrent observation either survives whole or is wiped whole); the import floors totals per bucket and field at what the events reconstruct, so a flush-lagged cumulative file can't leave 'Last 7 days' above 'All time'; the already-imported path self-heals lingering flat files left by a crash between commit and rename. Diagnosability class: a failing ledger no longer mimics a fresh install — Snapshot returns its read error (the CLI and graph_stats surface it), dropped writes warn once on stderr and ride on the snapshot as dropped_observations, and rename failures propagate. Scale class: the dashboard now reads one week of events plus a SQL per-tool aggregate (SavingsToolTotals) instead of materializing the full event history per invocation, and Store gains Close so tests and one-shot callers release the process-cached handle.
…tive reads Three wiring holes from the review. The embedded server's --cache-dir still relocated its ledger away from the dashboard's default read path — the exact writer/reader split this branch exists to fix; the flag now moves only the graph cache, and ledger isolation comes from XDG_DATA_HOME / XDG_CACHE_HOME, which both ledger paths honour. The savings/gain CLIs run the one-shot legacy import only against the default locations: pointing a dashboard at a directory with --cache-dir must never rename files there as a side effect of looking. And the serverstack constructor test pinned its SavingsPath + SavingsLegacyJSON to temp paths — with both empty it opened the REAL machine-global sidecar and imported (renaming!) the developer's live flat-file ledger on every 'go test ./internal/serverstack'. The config doc now matches the machine-global behavior and carries the warning. Also states the percentage semantics in the dashboard help: bars cover ALL recorded source fetches, including uncompressed read_file calls that saved nothing.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #67
What was broken
gortex savingsstayed permanently empty for single-repo users — the typical fresh install — because three defects stacked:repo_prefix="", butRepoRoot("")unconditionally refused the empty prefix and the lone-indexer fallback inresolveNodePathwas dead code (aMultiIndexeris always constructed).get_symbol_sourcefailed withcould not resolve repo root (repo_prefix=""),smart_contextsilently dropped its embedded source — and since every savings record site sits behind that resolution, zero observations were ever taken with one tracked repo. Multi-repo setups masked the bug, which is why maintainer machines never saw it.get_symbol_source,batch_symbolswithinclude_source,smart_context).read_file— the flagship "instead of Read" tool — recorded nothing.savings.jsonflushed every 20 observations, on a pending-gated 5-minute ticker, or on graceful shutdown — and MCP clients SIGKILL their stdio servers. Light sessions never produced a file at all, and the dashboard printed a fabricated "tracking since " on machines that had never recorded anything.What changed
Resolution. The empty prefix now resolves to the lone tracked repo — with provenance:
RepoMetadata.Unprefixedrecords which repo actually minted unprefixed nodes, so after track/untrack transitions stale IDs keep failing closed instead of resolving into the wrong checkout. Untracking a single-mode repo evicts its nodes file-by-file (they were invisible to thebyRepo-bucket eviction and lingered forever), and tracking a second repo re-mints the first repo's nodes under its real prefix so they stay reachable when the fallback disarms. Bare-relative paths (read_file path:"main.go") anchor to the lone repo with the usual containment checks, and a repo whose directory name collides with one of its own top-level dirs resolves by on-disk existence instead of prefix-stripping.Recording.
read_file,get_file_summary, andget_editing_contextnow book observations: tokens actually returned vs the full-file read the response stands in for (an uncompressedread_filebooksreturned == baseline— counted, not invented savings). Conditional fetches that hit the etag book nothing. Events carry the session ID and attribute to the lone repo's prefix in single-repo mode, the same bucket multi-repo mode uses.Persistence. The ledger moves from
~/.gortex/cache/savings.json+savings.jsonlinto the machine-global sidecar (~/.gortex/sidecar.sqlite):savings_events(one session-tagged row per call),savings_totals(running aggregates per bucket),savings_meta. Every observation is a single transaction — durable at the call under SIGKILL and safe across concurrent writer processes — so the periodic-flush machinery is gone. Flat files import once behind an in-transaction migration mark (totals floored at what the events reconstruct, so a flush-lagged cumulative file can't put "Last 7 days" above "All time"), then rename to*.bak. Every entry point and the savings/gain CLIs resolve the same database; per-tool breakdowns aggregate in SQL instead of materializing the event history.Honesty & diagnosability. No fabricated "tracking since" on an empty ledger; an unreadable ledger reports its error instead of posing as a fresh install; dropped writes warn once and ride on snapshots as
dropped_observations; a legacy file with JSONnullbuckets imports instead of crash-looping the daemon; resets are transactional; the token-count disk cache age-sweeps instead of growing one inode per content version forever.Compatibility notes
gortex savings) and renamed*.bak.gortex savings --cache-dir/gortex gain --cache-dirnow read the given directory'ssidecar.sqliteand never import/rename files there;gortex mcp --cache-dirno longer relocates the ledger (isolation comes fromXDG_DATA_HOME/XDG_CACHE_HOME). Theevents_pathkey left the--jsonoutput;first_seen/last_updatedare omitted until something is recorded.graph_statscumulative_savingsis unchanged (plus optionaldropped_observations).