Fix: MCP commit_draft shipped raw bzip2 bytes into published HTML (4.10.1) #101

Open
kennethphough wants to merge 1 commit into master from fix/mcp-publish-bzip2-section-templates

@kennethphough
Member

Summary

Publishing a page via the MCP commit_draft flow could inject raw bzip2-compressed binary (BZh9…) into the published HTML, inside <style>/<footer> blocks — browser UTF-8 decode error + garbled header/footer CSS. Surfaced on FrameVTO / doctor.etometry.com (page 58, v13). Tracked as Tempo card #228.

Root cause

A page's header/footer are FKs to KyteSectionTemplate, which stores html/stylesheet/javascript/block_layout bzip2-compressed. getObject()'s FK expansion returns those fields raw, and the assembly path (createHtml, buildHeaderFooterStyles, etc.) concatenates them straight into the output.

  • Human publish (HTTP update, state=1) was saved only incidentally: KytePageController::hook_response_data() runs first and decompresses $r['header']/$r['footer'].
  • MCP commit (DraftService::commitDraft → publishForSurface → KytePageController::publishFromContent) calls getObject() then goes straight to publishPage(), never invoking that hook → ships raw compressed bytes.

Deterministic per-path (not a race): any page with a populated header/footer section template was affected. The page's own html/css/js were always fine (decompressed by DraftService::versionContent); only the section templates leaked.
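To make the failure mode concrete, here is a minimal sketch of what "leaking" a stored field looks like. It assumes the bz2 extension is loaded and that fields are compressed at block size 9 (inferred only from the BZh9 prefix seen in the published page); the variable names are illustrative and are not taken from the Kyte codebase.

```php
<?php
// Illustrative only: requires ext-bz2; names are hypothetical, not Kyte's code.
$stylesheet = "header { color: #333; }";

// Stored form: the section-template field is kept bzip2-compressed.
// Block size 9 is an assumption based on the "BZh9" prefix in the bug report.
$stored = bzcompress($stylesheet, 9);

// Leaky path: the stored bytes are concatenated into the page as-is.
$leaky = "<style>" . $stored . "</style>";

// Correct path: decompress first, then assemble.
$clean = "<style>" . bzdecompress($stored) . "</style>";

// The compressed stream starts with the header "BZh9" (hex 425a6839).
echo bin2hex(substr($stored, 0, 4)), "\n";
echo $clean, "\n";
```

The leaky string carries the whole compressed stream between the style tags, which is what the browser then fails to decode as UTF-8.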

Changes

  1. Path-independent decompression: new KytePageController::decompressSectionTemplate(&$section) (uses Bz2Codec::decompressIfBz2), called from both hook_response_data() and publishFromContent().
  2. Latent guard bug: the old hook_response_data() decompressed the fields only when all four were set, and otherwise decompressed none; a null block_layout would have leaked the other three even on the HTTP path. Now each field decompresses independently.
  3. Output integrity guard: publishPage() runs hasBinaryContamination() (bzip2 magic BZh[1-9]1AY&SY or invalid UTF-8) and aborts before the S3 write (returns false → MCP commit reports committed:false, draft intact).
  4. Charset: published HTML is written with Content-Type: text/html; charset=utf-8 (S3::write() gained an optional content-type via stream context; other callers unchanged).
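For reviewers, the per-field behaviour in changes 1 and 2 can be sketched roughly as follows. This is not the code in the diff: the decompressIfBz2() function below is a hypothetical stand-in for Bz2Codec::decompressIfBz2, the field list is taken from the description above, and the sketch assumes the bz2 extension is available.

```php
<?php
// Hypothetical stand-in for Bz2Codec::decompressIfBz2(); the real helper may differ.
function decompressIfBz2(?string $value): ?string
{
    // Leave nulls and values without a bzip2 stream header untouched.
    if ($value === null || preg_match('/^BZh[1-9]/', $value) !== 1) {
        return $value;
    }
    $out = bzdecompress($value);
    // On failure bzdecompress() does not return a string; keep the input then.
    return is_string($out) ? $out : $value;
}

// Sketch of per-field handling: each field is checked on its own, so a null
// block_layout no longer prevents the other three from being decompressed.
function decompressSectionTemplate(?array &$section): void
{
    if ($section === null) {
        return;
    }
    foreach (['html', 'stylesheet', 'javascript', 'block_layout'] as $field) {
        if (isset($section[$field]) && is_string($section[$field])) {
            $section[$field] = decompressIfBz2($section[$field]);
        }
    }
}

// Usage: a footer whose block_layout is null (the regression case).
$footer = [
    'html'         => bzcompress('<footer>Hi</footer>', 9),
    'stylesheet'   => bzcompress('footer{margin:0}', 9),
    'javascript'   => bzcompress('console.log(1);', 9),
    'block_layout' => null,
];
decompressSectionTemplate($footer);
echo $footer['html'], "\n";
```

Checking the stream header before calling bzdecompress() keeps the helper safe to call on values that were already decompressed by another path.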

Tests

tests/PublishIntegrityTest.php — per-field decompression (incl. null-block_layout regression) + contamination detector (embedded bzip2 stream, invalid UTF-8, no false-positive on literal "BZh" prose).
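The three detector cases listed above can be illustrated with a small standalone check. This is a sketch under assumptions, not the PR's hasBinaryContamination(): it reuses that name for readability, and publishGuarded() with its $write callback is a hypothetical stand-in for the abort-before-write behaviour in publishPage().

```php
<?php
// Sketch only: the real hasBinaryContamination() in the PR may be implemented differently.
function hasBinaryContamination(string $html): bool
{
    // bzip2 stream header ("BZh" + level digit) followed by the block magic "1AY&SY".
    if (preg_match('/BZh[1-9]1AY&SY/', $html) === 1) {
        return true;
    }
    // With the /u flag, preg_match() fails on subjects that are not valid UTF-8.
    return preg_match('//u', $html) !== 1;
}

// Hypothetical guard: refuse to write when the assembled HTML is contaminated.
function publishGuarded(string $html, callable $write): bool
{
    if (hasBinaryContamination($html)) {
        return false; // abort before any write happens
    }
    $write($html);
    return true;
}

$written = [];
$writer = function (string $html) use (&$written): void {
    $written[] = $html;
};

// Embedded bzip2 stream header: rejected, nothing is written.
var_dump(publishGuarded("<style>BZh91AY&SY\x00\x01</style>", $writer));
// Literal "BZh" in prose without the block magic: accepted.
var_dump(publishGuarded("<p>The prefix BZh appears in prose.</p>", $writer));
```

Requiring the block magic after the stream header is what keeps ordinary text that happens to contain "BZh" from being flagged.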

Verified in a PHP 8.3 container with the bz2 extension: OK (12 tests, 20 assertions) (8 new + 4 existing Bz2CodecTest); php -l clean. Full DB-backed suite not run.

🤖 Generated with Claude Code

…10.1)

A page's header/footer are FKs to KyteSectionTemplate, which stores its
content bzip2-compressed. getObject()'s FK expansion returns those fields
raw, and the page-assembly path concatenates them straight into the output.
The human publish path was saved only incidentally by hook_response_data()
decompressing header/footer first; the MCP commit path
(DraftService::commitDraft -> publishFromContent) never runs that hook, so
it shipped the raw "BZh..." bytes into the published <style>/<footer> blocks
(browser UTF-8 decode error + garbled CSS). Deterministic per-path, not a race.

- Add KytePageController::decompressSectionTemplate(), called from both
  hook_response_data() and publishFromContent() so decompression is
  path-independent.
- Fix latent guard bug: the old all-or-nothing isset() check skipped
  decompression of html/stylesheet/javascript when block_layout was null;
  each field now decompresses independently.
- Add KytePageController::hasBinaryContamination() integrity guard;
  publishPage() now aborts before the S3 write (returns false ->
  commit reports committed:false, draft intact) if the assembled HTML
  contains bzip2 stream magic or invalid UTF-8.
- Write published HTML with Content-Type: text/html; charset=utf-8
  (S3::write() gained an optional content-type via stream context;
  other callers unchanged).
- Tests: tests/PublishIntegrityTest.php (per-field decompression incl. the
  null-block_layout regression; contamination detector).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>