fix(code-graph): respect .gitignore + collision-safe symbol ids (indexing never crashes on a repo)#30
Merged
Merged
Conversation
…xing never crashes on a repo) Indexing a repo with build output (e.g. a `playwright-report/` of minified bundles) crashed the WHOLE index: `Found duplicated primary key value …codeMirrorModule-Ds_H_9Yq.js#h#5`. Two root causes, both fixed: 1. We indexed gitignored files. `discoverFiles` now uses `git ls-files` (TRACKED files only) in a git repo, so .gitignore'd build output / vendored bundles / reports are never indexed — the principled fix vs. a hardcoded skip-list, and consistent with co-change being git-based. A non-git dir still falls back to the SKIP_DIRS filesystem walk. (Incremental already respected ignores — `git diff` never reports ignored files.) 2. Symbol PK `file#name#startLine` wasn't collision-safe — a minified file packs many declarations on one line, so two same-named symbols at the same line duplicated the PK and Kùzu aborted the entire index. `uniqueSymbolId` appends a stable `#<n>` suffix on collision (deterministic extraction order ⇒ stable across re-indexes), so one pathological file can never take down the whole graph. Verified: re-indexing playright now SUCCEEDS — 1328 files / 4759 symbols / 4255 imports / 100 co-change pairs (was: crash → an edgeless partial graph). The viz code-graph landing now renders it with structure (80 nodes / 214 edges). Tests: new `gitignore-robust` integration scenario (a tracked dup-symbol file survives; a gitignored file is not indexed). 552 unit green; typecheck + build green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ly (PR #30 review) Address the Plex review of PR #30: - IMPROVEMENT (item 1): `git ls-files` lists TRACKED files only, so a brand-new source file created mid-feature but not yet `git add`ed got no File/Symbol nodes — an empty blast radius for the very files a review is about. Switch to `git ls-files --cached --others --exclude-standard` (rename listTrackedFiles → listWorktreeFiles): the whole working tree MINUS ignored paths. Build output (`playwright-report/`, `dist/`, the self-ignored `.plex`) is still excluded; new uncommitted source is now included. The correct implementation of "respect .gitignore". - NIT (item 2): documented the intentional git-vs-fallback divergence on committed dot-directory source (`.storybook/*.ts`) — the git path indexes it as real code, the non-git walk skips it; git is authoritative, the walk is best-effort. No behaviour change. - NOTE (item 3): no change — `uniqueSymbolId`'s `seen` loop guarantees PK uniqueness regardless of whether the `#<n>` suffix could theoretically collide (the reviewer agreed it's not a live bug). Tests: the `gitignore-robust` scenario now also asserts an UNTRACKED-not-ignored file IS indexed (`res.files === 2`, the report/ dir excluded). 552 unit green; typecheck + build green; the scenario passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Owner
Author
|
Plex review feedback addressed — ready for re-review. The Plex review (in-session) raised 1 improvement + 1 nit + 1 note. The core fix (collision-safe symbol ids; gitignore-respect) was verified sound — the
Tests: the Checks: No GitHub review threads existed to reply to/resolve (the review was in-session). |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changed
Indexing a repo that contains build output (e.g. a Playwright
playwright-report/full of minified bundles) crashed the entire index:Two root causes, both fixed:
discoverFilesnow usesgit ls-files(TRACKED files only) in a git repo, so.gitignored build output, vendored bundles, and generated reports are never indexed — the principled fix vs. a hardcoded skip-list, and consistent with co-change already being git-based. A non-git directory falls back to theSKIP_DIRSfilesystem walk. (The incremental path already respected ignores —git diffnever reports ignored files.)file#name#startLinewasn't collision-safe. A minified file packs many declarations on one line, so two same-named symbols at the same line duplicated the PK and Kùzu aborted the whole index.uniqueSymbolIdappends a stable#<n>suffix on collision (deterministic extraction order ⇒ stable across re-indexes), so one pathological file can never take down the graph — belt-and-suspenders even for a committed minified/vendored file.Verification
Re-indexing the repo that crashed (playright) now succeeds:
1328 files / 4759 symbols / 4255 imports / 100 co-change pairs(was: crash → an edgeless partial graph). The viz code-graph landing renders it with structure (80 nodes / 214 edges) instead of edgeless.How to test
pnpm test:integration— newgitignore-robustscenario: a tracked file with two same-named declarations on one line survives the build (both symbols, disambiguated ids), and a.gitignored file is not indexed.pnpm test:unit(552) +pnpm typecheck+pnpm buildgreen.plex index <repo-with-a-gitignored-report-dir>→ succeeds; the report dir is absent from the graph.Notes
This was the launch-checklist "generated-file index crash" item — a real "user indexes a normal repo with a
dist//report dir → broken graph" bug. Independent of the viz PRs.🤖 Generated with Claude Code