Shared cache backend (Postgres) for multi-server consistency #3

Merged
megamattron merged 6 commits into main from feat/shared-cache-backend on Jun 6, 2026

@megamattron

Member

Summary

Adds an opt-in shared cache backend so Cache is consistent across a horizontally-scaled deploy, without changing the per-process default. One line opts in:

app.cache(CacheBackend.postgres(dbFactory));   // shared, durable, cross-server-consistent
// default stays in-process if you never call this

The in-process default is unchanged and stays first-class (live objects, zero serialization). The choice is per use case, not per deployment — you can run both (in-process for hot read-through pages, a shared Cache for counters/invalidation that must be consistent).

Design + rationale: docs/2026-06-04-brace-shared-cache.md.

What's included (by phase)

  • Phase 1 — SPI + Postgres backend. Extract a CacheBackend SPI; Cache becomes a facade owning stats, TTL parsing, and Jackson value serialization. InMemoryBackend (default, live objects) + PostgresBackend (bytes via JDBC). Atomic incr (INSERT … ON CONFLICT … RETURNING), GIN-backed clearTag over TEXT[], read-time expiry. Postgres-only migration_pg/V8 (H2 never sees it).
  • Phase 2 — Rendered page caching. CachedHandler caches a serializable RenderedResponse snapshot, so a page rendered on one server is replayed by any other. No BraceHandler surgery needed (View renders eagerly).
  • Phase 3 — Ops + docs. shared flag on /ops/cache + /ops/status, scope: instance|fleet on clear, dashboard [clear fleet] label. BRACE-AGENTS.md + README.md updated.

Backends at a glance

| Use | Backend |
| --- | --- |
| Single server | in-process (default) |
| Read-through of expensive compute, per-server copies OK | in-process |
| Counters / rate limits / invalidation across servers | shared |
| Cached pages that must match across the fleet | shared |

Shared-backend constraints (the in-process default has none): values must be Jackson-round-trippable and non-null; getOrSet single-flight is per-server, not global. The near-cache (L1/L2) tier is deferred by design — see the doc.
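
The "per-server, not global" single-flight constraint can be illustrated with a small sketch. PerServerSingleFlight is a hypothetical class, not code from this PR, and it leaves out the shared store entirely: it only shows that coordination built on an in-process map stops at the JVM boundary, so two servers can each run the loader once for the same key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: single-flight that only coordinates callers inside one JVM.
final class PerServerSingleFlight {
    private final ConcurrentHashMap<String, Object> local = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    <T> T getOrSet(String key, Supplier<T> loader) {
        // computeIfAbsent invokes the loader at most once per key in this process;
        // another process has its own map and may invoke its own loader.
        return (T) local.computeIfAbsent(key, k -> loader.get());
    }
}
```

With two instances standing in for two servers, repeated calls on one instance compute once, while the second instance computes again. Loaders for shared keys should therefore tolerate running once per server.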

Code review

Ran a high-effort multi-agent review on the branch before this PR and fixed all 10 findings (commit 9c0b177). Notable real bugs caught:

  • Value/counter key collision on Postgres → split into a separate brace_cache_counters table (parity with the in-memory two-map design).
  • deserialize crash on corrupt/truncated bytes → bounds-checked; unreadable entries (incl. a class removed across a rolling deploy) are treated as a cache miss, not a 500.
  • Reuse Json.mapper() (consistent date handling); vary page key on HX-Request; stop the sweep thread on Brace.stop(); reject null values; size() cached ~5s; PostgresBackend uses DatabaseFactory.withSession.
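
For readers unfamiliar with the atomic-increment pattern named above, here is a rough JDBC sketch. The table name brace_cache_counters comes from this PR; the column names (cache_key, counter) and the CounterSql class are assumptions for illustration, and the real PostgresBackend goes through DatabaseFactory.withSession rather than a raw Connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical sketch of an upsert-based atomic counter on PostgreSQL.
final class CounterSql {
    // Column names are assumptions; only the table name appears in the PR.
    static final String INCR_SQL =
        "INSERT INTO brace_cache_counters (cache_key, counter) VALUES (?, ?) "
      + "ON CONFLICT (cache_key) DO UPDATE "
      + "SET counter = brace_cache_counters.counter + EXCLUDED.counter "
      + "RETURNING counter";

    // One statement inserts or increments and returns the new total, so
    // concurrent callers never read-modify-write in application code.
    static long incr(Connection conn, String key, long delta) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INCR_SQL)) {
            ps.setString(1, key);
            ps.setLong(2, delta);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getLong(1);
            }
        }
    }
}
```

Keeping counters in their own table means the upsert can never overwrite a serialized value stored under the same key.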

Behavior / compatibility notes

  • Additive, no breaking changes. Existing apps that never call app.cache(backend) are unaffected.
  • Cache values must now be non-null on both backends (set(key, null) throws); null is reserved for "missing". Previously a stored null was handled inconsistently: get and getOrSet diverged, and the entry behaved as a permanent miss.
  • cache.wrap(...) hits now return a materialized (raw-bytes) Result; the response bytes are identical, but custom middleware reading result.body() should read the response bytes instead.
  • Page caching now varies on HX-Request.
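
A small sketch of what varying the page key on HX-Request could look like. PageCacheKey and its key layout are hypothetical, not the format CachedHandler actually uses; the only external fact relied on is that htmx sends the header HX-Request: true on requests it issues.

```java
// Hypothetical sketch of a page-cache key that varies on the HX-Request header.
final class PageCacheKey {
    static String of(String method, String pathAndQuery, String hxRequestHeader) {
        // A missing header or any value other than "true" counts as a full-page request.
        boolean htmx = "true".equalsIgnoreCase(hxRequestHeader);
        return "page:" + method + ":" + pathAndQuery + (htmx ? ":hx" : ":full");
    }
}
```

Because the partial and the full page get different keys, a fragment rendered for an htmx swap is never replayed to a browser navigating to the same URL.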

Migration guide

docs/migrations/brace-0.1.6-to-0.1.7.md documents the optional opt-in. Separately, this branch adds a CLAUDE.md rule requiring a migration guide per version step and a docs/migrations/README.md index that flags a pre-existing gap (no guides for 0.1.1→0.1.6) — tracked for a separate backfill, not addressed here.

Tests

  • 601 unit tests (H2, Docker-free) — includes serialization, corruption→miss, null-rejection, htmx-key separation, value/counter independence.
  • 7 Postgres IT tests (Testcontainers, mvn verify) — same CacheBackendContract against real Postgres, plus atomic-incr-under-concurrency, GIN clearTag, read-time expiry, cross-instance page caching through BYTEA, and value/counter no-collision.

All green.

Extract storage behind a CacheBackend SPI; Cache becomes a facade owning
stats, TTL parsing, and Jackson value serialization. Ships two backends:

- InMemoryBackend: today's behavior unchanged (live objects, zero
  serialization), kept as the default — existing apps pay nothing.
- PostgresBackend: shared, cross-server-consistent, durable. Atomic incr
  via INSERT ... ON CONFLICT ... RETURNING, GIN-backed clearTag over
  TEXT[] tags, read-time expiry. Table in the Postgres-only migration_pg
  tier (V8), so H2 never sees it.

Opt in with app.cache(CacheBackend.postgres(dbFactory)); default stays
in-process. requiresSerialization() selects the byte path (shared) vs the
live-object fast path (in-memory). Values on a byte backend carry a
class-name header so a wrong-type get fails loudly and a non-serializable
value throws at set time. Page caching (CachedHandler) bypasses a
serializing backend for now — rendered-response caching lands in Phase 2.

Tests: 6 new serialization unit tests + a shared CacheBackendContract run
Docker-free via a SerializingMapBackend fixture; PostgresCacheBackendIT
runs the same contract plus atomic-incr-under-concurrency, GIN clearTag,
and read-time expiry against real Postgres. Existing CacheTest unchanged.

See docs/2026-06-04-brace-shared-cache.md.

CachedHandler now caches a RenderedResponse, a serializable snapshot of
the materialized response (status, content type, headers, body bytes),
instead of the Result object. A page rendered on one server is replayed
by any other across a shared backend; it also works on the in-memory
backend (stored as a live object). This removes the Phase 1 bypass that
skipped page caching on serializing backends.

No render seam needed in BraceHandler: View.of renders eagerly at
construction, so a Result is already materialized by the time
CachedHandler sees it — RenderedResponse.from just snapshots its fields.
Result.raw rebuilds an arbitrary status/headers/bytes response on replay
(Result.bytes hardcodes 200).
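
The snapshot idea can be sketched as a record that round-trips through bytes. ResponseSnapshot is a hypothetical stand-in for RenderedResponse: the field set follows the description above, but the encoding here uses DataOutputStream so the example needs only the JDK, whereas the PR serializes through Jackson.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for RenderedResponse; not the PR's actual class or wire format.
record ResponseSnapshot(int status, String contentType, Map<String, String> headers, byte[] body) {

    byte[] toBytes() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(status);
            out.writeUTF(contentType);
            out.writeInt(headers.size());
            for (Map.Entry<String, String> h : headers.entrySet()) {
                out.writeUTF(h.getKey());
                out.writeUTF(h.getValue());
            }
            out.writeInt(body.length);
            out.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    static ResponseSnapshot fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int status = in.readInt();
            String contentType = in.readUTF();
            int headerCount = in.readInt();
            Map<String, String> headers = new LinkedHashMap<>();
            for (int i = 0; i < headerCount; i++) {
                String name = in.readUTF();
                headers.put(name, in.readUTF());
            }
            byte[] body = new byte[in.readInt()];
            in.readFully(body);
            return new ResponseSnapshot(status, contentType, headers, body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Since the status travels with the snapshot, a cached 404 replays as a 404 on another server, which is the reason the commit adds a replay path that does not hardcode 200.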

Tests: cross-instance page-cache hit and status/header preservation
through serialization (unit, via SerializingMapBackend) and against real
Postgres BYTEA (IT). Existing wrap tests updated to read the effective
response body, since a cache hit now replays as raw bytes.

Surface whether the cache is shared so operators know clear is fleet-wide:

- CacheBackend.shared() (default false; PostgresBackend true), Cache.shared().
- /ops/cache and /ops/status report "shared"; POST /ops/cache/clear returns
  scope: "instance"|"fleet". Dashboard shows a shared/in-process label and a
  [clear fleet] vs [clear] button.

clearCache already mapped to TRUNCATE on the shared backend (fleet-wide) and
size() to a count query — no behavior change there, just clearer reporting.
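
As a rough illustration of how the reported scope could follow from the flag, here is a sketch with hypothetical names (ScopedBackend, ClearScope). It assumes the scope string is derived directly from shared(), which matches the description above but is not confirmed as the actual implementation.

```java
// Hypothetical sketch: derive the clear scope reported to operators from a shared() flag.
interface ScopedBackend {
    default boolean shared() { return false; }   // in-process backends keep the default
}

final class ClearScope {
    static String of(ScopedBackend backend) {
        // A shared backend clears data for every server; the default clears one process.
        return backend.shared() ? "fleet" : "instance";
    }
}
```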

Docs: BRACE-AGENTS.md and README.md document the in-process-vs-shared choice,
the one-line opt-in (app.cache(CacheBackend.postgres(dbFactory))), the
per-use-case framing, and the shared-backend constraints (Jackson-round-trippable
values, per-server getOrSet dogpile). Design doc marked Phases 1-3 done.

Tests updated for the new dashboard label and the shared stat.

- BRACE-AGENTS.md: document what clear() actually clears: data is
  fleet-wide on a shared backend (TRUNCATE) / instance-only on the
  default, but hit/miss/eviction stats are per-instance and only the
  instance handling the request resets its own; and only the
  app-registered Cache is touched by the ops endpoint. Add a multi-server
  note to the cache-diagnosis runbook (size fleet-wide, hitRate
  per-instance).
- README.md: same clear/stats clarification.
- docs/migrations/brace-0.1.6-to-0.1.7.md: add an "optional shared cache
  backend" section (additive, non-breaking) with before/after opt-in.
- ClaudeMdGenerator: generated project CLAUDE.md now mentions the shared
  backend option, not just in-process.

Root cause of the missing migration guides: CLAUDE.md's documentation
rule covered BRACE-AGENTS.md/README.md but never mentioned the
docs/migrations/ guides, so agents had no instruction to write them.

- CLAUDE.md: add a "Migration guides (per version step)" rule — keep the
  in-progress (-SNAPSHOT) step's guide current as changes land, require a
  guide even for no-breaking-change steps (so a gap never reads as
  "nothing changed"), and surface missing guides rather than backfilling
  silently.
- docs/migrations/README.md: index of released steps with guide status,
  explicitly flagging the 0.1.1->0.1.6 guides as a known, untouched gap
  to backfill as a separate focused pass.

Correctness:
- PostgresBackend: split counters into brace_cache_counters so a key used
  as both a value and a counter no longer clobbers itself (parity with
  the in-memory two-map design). V8 migration updated; counterCount() now
  reports the real count.
- Cache.deserialize: bounds-check the length prefix and catch
  Class.forName/Jackson failures, treating a corrupt/truncated/
  class-removed entry as a cache MISS instead of crashing the request
  (NegativeArraySize/OOM/BufferUnderflow). Wrong-but-valid type still
  fails loudly. getOrSet recomputes on a corrupt entry.
- Cache: reuse Json.mapper() instead of a second ObjectMapper, so cached
  values and HTTP JSON share date/module config (no divergence).
- CachedHandler: vary the page-cache key on HX-Request so htmx partials
  and full pages don't share an entry; replay the snapshot on a miss too
  so miss and hit return the same materialized Result shape.
- Cache: reject null values on both backends (null is reserved for
  "missing"; previously diverged between get and getOrSet).
- Cache.close() stops the sweep thread; Brace.stop() calls it, so a
  Postgres-backed sweep no longer hammers a closed pool after shutdown.
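
The corrupt-entry handling can be sketched as follows. The framing (a 4-byte length prefix, the class name, then the payload) and the FramedValue class are assumptions made for illustration; the PR does not spell out its byte layout, and the real payload is Jackson-encoded rather than opaque bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical framing: [int nameLength][class name, UTF-8][payload bytes].
final class FramedValue {
    static byte[] encode(String className, byte[] payload) {
        byte[] name = className.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + name.length + payload.length)
                .putInt(name.length).put(name).put(payload).array();
    }

    // Returns the payload if the frame is readable and the class still exists;
    // an empty Optional means "treat as a cache miss".
    static Optional<byte[]> decode(byte[] bytes, Class<?> expected) {
        if (bytes == null || bytes.length < 4) return Optional.empty();
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int nameLength = buf.getInt();
        // Bounds-check before allocating: a garbage prefix must not cause
        // NegativeArraySizeException, a huge allocation, or a buffer underflow.
        if (nameLength < 0 || nameLength > buf.remaining()) return Optional.empty();
        byte[] name = new byte[nameLength];
        buf.get(name);
        Class<?> stored;
        try {
            stored = Class.forName(new String(name, StandardCharsets.UTF_8));
        } catch (ClassNotFoundException | LinkageError e) {
            return Optional.empty();   // e.g. class removed across a rolling deploy
        }
        if (!expected.isAssignableFrom(stored)) {
            // A wrong-but-valid type is a programming error, so it still fails loudly.
            throw new ClassCastException(stored.getName() + " is not a " + expected.getName());
        }
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        return Optional.of(payload);
    }
}
```

The distinction matters for callers: a miss lets getOrSet recompute and overwrite the bad entry, while the type mismatch surfaces a bug instead of hiding it.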

Cleanup/efficiency:
- PostgresBackend.run() delegates to DatabaseFactory.withSession instead
  of hand-rolling open/begin/commit/rollback/close.
- PostgresBackend.size() caches the count(*) for ~5s (dashboard polls it).
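
A time-bounded memo like the one described for size() might look like this. CachedCount is a hypothetical helper, not the PR's code; the clock is injected only so the behavior can be checked without sleeping.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: memoize an expensive count for a short window.
final class CachedCount {
    private final LongSupplier query;   // e.g. a SELECT count(*) round-trip
    private final long ttlNanos;
    private final LongSupplier clock;   // System::nanoTime in production, a fake in tests
    private long cached;
    private long loadedAt;
    private boolean loaded;

    CachedCount(LongSupplier query, long ttlNanos, LongSupplier clock) {
        this.query = query;
        this.ttlNanos = ttlNanos;
        this.clock = clock;
    }

    synchronized long get() {
        long now = clock.getAsLong();
        // Re-run the query only on first use or once the window has elapsed.
        if (!loaded || now - loadedAt >= ttlNanos) {
            cached = query.getAsLong();
            loadedAt = now;
            loaded = true;
        }
        return cached;
    }
}
```

With a 5-second window, a dashboard polling every second costs at most one count query per window per server, at the price of a size that can be a few seconds stale.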

Tests: +1 IT (value/counter no-collision on Postgres), +unit tests for
corrupt-bytes-as-miss, unknown-class-as-miss, getOrSet-recompute,
null-rejection (both backends), htmx-key separation, value/counter
independence, close(). Docs updated (two tables, null rule, htmx vary).
601 unit + 7 Postgres IT green.

megamattron merged commit c369fb7 into main on Jun 6, 2026
2 checks passed
megamattron deleted the feat/shared-cache-backend branch on June 6, 2026 23:05

megamattron added a commit that referenced this pull request on Jun 14, 2026
Wire JMH into the brace-benchmark module (jmh-core + explicit annotationProcessorPaths
— JDK 23+ disables implicit annotation processing) with a programmatic JmhRunner that
always attaches the GC profiler, since gc.alloc.rate.norm is the point. run-jmh.sh
installs the framework, rebuilds the benchmark jar, and runs from the repo root.

RenderAllocBench isolates M6's before/after on the render unit: a jte engine without
binaryStaticContent (StringOutput -> toString -> getBytes) vs with it (Utf8ByteOutput ->
toByteArray), plus the JSON pair (writeValueAsString().getBytes() vs writeValueAsBytes()),
parameterized by row count.

Results (gc.alloc.rate.norm, deterministic +/-0.001 B/op; JDK 25):
  View render   12 rows: 34,008 -> 14,184 B/op (-58%);  100 rows: 127,187 -> 107,432 (-16%)
  JSON serialize 12 rows: 8,560 -> 1,656 B/op (-81%);   100 rows: 67,904 -> 18,448 (-73%)
The View static-content saving is a near-constant ~19.8 KB/render regardless of rows,
exactly as binaryStaticContent predicts. Time also dropped more than the review predicted
(render -37%, JSON -43% at 100 rows) — a real CPU cut, not just GC pressure. Full results
and mechanism notes recorded in the findings doc.

megamattron added a commit that referenced this pull request on Jun 14, 2026
Follow-ups from the merge-gate code review of the Low batch:

#1 Verify ?v= against the current fingerprint before promising immutable.
   serveStaticFile trusted any ?v= param's presence and emitted 1-year
   immutable; a stale or hand-rolled ?v= (or one a CDN/client appended) could
   pin wrong/old bytes for a year. Now Assets.currentVersion(path) returns the
   current content hash (shared (path,mtime) cache) and only an exact match
   earns immutable; everything else is revalidate-always.

#2 Bundled htmx.min.js now carries an ETag + revalidate Cache-Control and
   honors conditional GETs (304), so browsers skip the ~50KB re-download each
   page. Not immutable — a brace upgrade can change the bytes at that fixed URL.

#3 serveStaticFile uses one Files.readAttributes (size+mtime+isRegularFile)
   instead of four separate File stat syscalls (exists/isFile/length/lastModified).

#4 Removed the now-dead null-invoker fallback in the request path and the
   unused null-producing Route(method,pattern,handler) constructor, so 'every
   Route has a non-null invoker' (L1) is enforced by construction.

#8 isNotModified: dropped the unused trimmed-copy var (split ran on the original).
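
Follow-ups #1 and #2 both come down to a small decision function, sketched below. StaticCacheSketch, its method signatures, and the exact Cache-Control strings are assumptions for illustration; the commit does not show the header values it emits. The ETag comparison uses weak comparison, as HTTP specifies for If-None-Match, and splits on commas as a simplification.

```java
// Hypothetical sketch of the static-file caching decisions; header values are assumptions.
final class StaticCacheSketch {
    static final String IMMUTABLE = "public, max-age=31536000, immutable"; // one year
    static final String REVALIDATE = "no-cache";

    // Only an exact match with the current content hash earns a long-lived response.
    static String cacheControl(String requestedVersion, String currentVersion) {
        boolean exact = requestedVersion != null && requestedVersion.equals(currentVersion);
        return exact ? IMMUTABLE : REVALIDATE;
    }

    // True when If-None-Match lists the current ETag (or "*"), so a 304 can be sent.
    static boolean isNotModified(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch == null || currentEtag == null) return false;
        for (String candidate : ifNoneMatch.split(",")) {
            String tag = candidate.trim();
            if (tag.equals("*")) return true;
            // Weak comparison: ignore a W/ prefix on either side.
            if (stripWeak(tag).equals(stripWeak(currentEtag))) return true;
        }
        return false;
    }

    private static String stripWeak(String tag) {
        return tag.startsWith("W/") ? tag.substring(2) : tag;
    }
}
```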

StaticFilesTest +2 (stale ?v= -> revalidate, htmx revalidates); suite 846/846.