Clemg/diff byte arena by clemg · Pull Request #775 · pierrecomputer/pierre

clemg · 2026-06-03T20:26:32Z

Description

This changes how we represent patch storage, from a string[] to a byte arena (one contiguous byte array). The memory savings grow with the size of the PR.
For the full explanation and details, see #760.

Note that I haven't updated any documentation yet.

Motivation & Context

About half of the time, I was getting OOM crashes on huge PRs like the linux v6..v7 comparison. That is not too surprising given the size of the patch, but it is still annoying. I think this way of representing the lines is more efficient and can help either with rendering bigger diffs on good hardware or with rendering normal diffs on older hardware.

Type of changes

Bug fix (non-breaking change which fixes an issue)
Refactoring (non-breaking change)
New feature (non-breaking change which adds functionality). You must have
first discussed with the dev team and they should be aware that this PR is
being opened
Breaking change (fix or feature that would change existing functionality).
You must have first discussed with the dev team and they should be aware
that this PR is being opened
Documentation update

Checklist

I have read the contributing guidelines
My code follows the code style of the project (bun run lint)
My code is formatted properly (bun run format)
I have updated the documentation accordingly (if applicable)
I have added tests to cover my changes (if applicable)
All new and existing tests pass (bun run diffs:test)

How was AI used in generating this PR

The tests have been fully generated by opus 4.8

Related issues

See: #760

`additionLines`/`deletionLines` change from `string[]` to `DiffLines`: a plain data object holding a file's lines as one UTF-8 byte arena plus an offset table, decoded on demand via `lineAt` / `joinLines`. On a huge diff (linux v6..v7, ~22.8M lines across ~77k files) this avoids tens of millions of tiny `String` objects, so the V8 heap drops ~33% on that compare and the parser is faster: it no longer encode+decode-detaches every line, it encodes once on seal and decodes only the visible (virtualized) lines. It is plain data on purpose, so it survives structured clone (the highlight worker), `structuredClone`, and IndexedDB without a revive step (no class, no prototype to drop). `.length` stays a field, so the many `.length` consumers are unchanged; only content reads migrate (`x[i]` -> `lineAt(x, i)`, `x.join('')` -> `joinLines(x)`). Per-file offsets use the smallest int width that fits the file. A file with a lone surrogate keeps exact strings as a fallback, and merge-conflict diffs keep plain strings (no encode) so their parse stays at parity. The parsed model is byte-identical to before (snapshot + content-hash).
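To make the arena-plus-offset-table layout concrete, here is a minimal sketch, not the PR's actual code. The field names (`bytes`, `offsets`), the `sealLines` helper, and the bodies of `lineAt` / `joinLines` are assumptions for illustration. For simplicity it encodes each line separately and copies it into the arena, and it assumes well-formed input (no lone surrogates).

```typescript
// Hypothetical shape of a DiffLines-like object: plain data only, so it can
// pass through structured clone or IndexedDB without a revive step.
interface DiffLinesSketch {
  bytes: Uint8Array; // every line, UTF-8 encoded, stored back to back
  offsets: Uint8Array | Uint16Array | Uint32Array; // offsets[i] = start of line i; last entry = total bytes
  length: number; // line count, kept as a plain field like string[].length
}

const encoder = new TextEncoder();
// ignoreBOM: true keeps a leading U+FEFF in the decoded text instead of stripping it.
const decoder = new TextDecoder('utf-8', { ignoreBOM: true });

// Pick the smallest unsigned integer width that can hold the largest offset.
function allocOffsets(count: number, maxValue: number): Uint8Array | Uint16Array | Uint32Array {
  if (maxValue <= 0xff) return new Uint8Array(count);
  if (maxValue <= 0xffff) return new Uint16Array(count);
  return new Uint32Array(count);
}

// Hypothetical helper: turn a string[] into the arena form.
function sealLines(lines: string[]): DiffLinesSketch {
  const chunks = lines.map((line) => encoder.encode(line));
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const bytes = new Uint8Array(total);
  const offsets = allocOffsets(lines.length + 1, total);
  let pos = 0;
  for (let i = 0; i < chunks.length; i++) {
    offsets[i] = pos;
    bytes.set(chunks[i], pos);
    pos += chunks[i].length;
  }
  offsets[lines.length] = pos;
  return { bytes, offsets, length: lines.length };
}

// Decode a single line on demand (what a virtualized view would call).
function lineAt(lines: DiffLinesSketch, i: number): string {
  return decoder.decode(lines.bytes.subarray(lines.offsets[i], lines.offsets[i + 1]));
}

// Decode the whole side in one pass, replacing x.join('').
function joinLines(lines: DiffLinesSketch): string {
  return decoder.decode(lines.bytes);
}
```

With this layout a caller that used to write `x[i]` calls `lineAt(x, i)`, and `x.length` keeps working unchanged because the line count is an ordinary field.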

Adds diffLines.test.ts (arena round-trip, multibyte, emoji-keeps-arena, lone-surrogate fallback, BOM, offset-width, plainLines, joinLines, isWellFormed) and a withPlainLines snapshot converter so the existing parsed-model snapshots assert byte-identical line content.
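The lone-surrogate fallback is needed because UTF-8 cannot represent an unpaired surrogate: `TextEncoder` replaces it with U+FFFD, so an encode/decode round trip would not return the original string. The sketch below illustrates that behavior and a manual well-formedness check. The helper names `hasLoneSurrogate` and `roundTrips` are hypothetical and are not taken from the PR, which may rely on a different check.

```typescript
// Returns true when the string contains a surrogate code unit that is not
// part of a valid high+low pair (that is, the string is not well-formed UTF-16).
function hasLoneSurrogate(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      // High surrogate: must be followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low half of a valid pair
        continue;
      }
      return true;
    }
    if (c >= 0xdc00 && c <= 0xdfff) return true; // low surrogate with no high before it
  }
  return false;
}

// True when encoding to UTF-8 and decoding back yields the exact same string.
function roundTrips(s: string): boolean {
  const bytes = new TextEncoder().encode(s);
  const back = new TextDecoder('utf-8', { ignoreBOM: true }).decode(bytes);
  return back === s;
}
```

Under these assumptions, an emoji such as `'😀'` is a valid pair and round-trips, so it can stay in the arena, while a string like `'a\ud83d'` does not round-trip and would need to keep its exact strings.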

The byte-arena type change makes additionLines/deletionLines a DiffLines, so the editor's FileDiff whole-side accessors (getDeletionFile/getAdditionFile) read them with joinLines(...) instead of .join('').

vercel · 2026-06-03T20:26:38Z

@clemg is attempting to deploy a commit to the Pierre Computer Company Team on Vercel.

A member of the Team first needs to authorize it.

clemg added 3 commits June 3, 2026 16:45

fix(diffs): read the editor's whole-side file contents via joinLines

90ac3b0


clemg wants to merge 3 commits into pierrecomputer:beta-1.3 from clemg:clemg/diff-byte-arena
