Fix: MCP commit_draft shipped raw bzip2 bytes into published HTML (4.10.1) #101
Open
kennethphough wants to merge 1 commit into
…10.1)

A page's header/footer are FKs to KyteSectionTemplate, which stores its content bzip2-compressed. getObject()'s FK expansion returns those fields raw, and the page-assembly path concatenates them straight into the output. The human publish path was saved only incidentally by hook_response_data() decompressing header/footer first; the MCP commit path (DraftService::commitDraft -> publishFromContent) never runs that hook, so it shipped the raw "BZh..." bytes into the published <style>/<footer> blocks (browser UTF-8 decode error + garbled CSS). Deterministic per-path, not a race.

- Add KytePageController::decompressSectionTemplate(), called from both hook_response_data() and publishFromContent() so decompression is path-independent.
- Fix latent guard bug: the old all-or-nothing isset() check skipped decompression of html/stylesheet/javascript when block_layout was null; each field now decompresses independently.
- Add KytePageController::hasBinaryContamination() integrity guard; publishPage() now aborts before the S3 write (returns false -> commit reports committed:false, draft intact) if the assembled HTML contains bzip2 stream magic or invalid UTF-8.
- Write published HTML with Content-Type: text/html; charset=utf-8 (S3::write() gained an optional content-type via stream context; other callers unchanged).
- Tests: tests/PublishIntegrityTest.php (per-field decompression incl. the null-block_layout regression; contamination detector).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
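The latent guard bug described in the commit message can be illustrated with a minimal sketch. This is not the PR's code: `decompressIfBz2Sketch`, `decompressAllOrNothing`, `decompressPerField`, and `SECTION_FIELDS` are hypothetical names, and the helper only approximates what `Bz2Codec::decompressIfBz2` might do. It assumes the `bz2` extension is loaded.

```php
<?php
declare(strict_types=1);

// The four compressed fields named in the PR text.
const SECTION_FIELDS = ['html', 'stylesheet', 'javascript', 'block_layout'];

/**
 * Hypothetical stand-in for Bz2Codec::decompressIfBz2 (the real helper is not shown here).
 * Decompresses only values that start with a bzip2 stream header.
 */
function decompressIfBz2Sketch(?string $value): ?string
{
    if ($value === null || preg_match('/^BZh[1-9]/', $value) !== 1) {
        return $value; // null or plain text: leave unchanged
    }
    $out = bzdecompress($value);            // requires the bz2 extension
    return is_string($out) ? $out : $value; // non-string result signals a decode error
}

/** Old behaviour as described: decompress only when all four fields are set. */
function decompressAllOrNothing(array &$section): void
{
    if (isset($section['html'], $section['stylesheet'], $section['javascript'], $section['block_layout'])) {
        foreach (SECTION_FIELDS as $field) {
            $section[$field] = decompressIfBz2Sketch($section[$field]);
        }
    }
}

/** New behaviour as described: each field is decompressed independently. */
function decompressPerField(array &$section): void
{
    foreach (SECTION_FIELDS as $field) {
        if (isset($section[$field])) {
            $section[$field] = decompressIfBz2Sketch($section[$field]);
        }
    }
}
```

With a null `block_layout`, the all-or-nothing variant leaves `html`, `stylesheet`, and `javascript` compressed, while the per-field variant decompresses the three populated fields and leaves the null untouched.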
Summary

Publishing a page via the MCP `commit_draft` flow could inject raw bzip2-compressed binary (`BZh9…`) into the published HTML, inside `<style>`/`<footer>` blocks, causing a browser UTF-8 decode error and garbled header/footer CSS. Surfaced on FrameVTO / doctor.etometry.com (page 58, v13). Tracked as Tempo card #228.

Root cause
A page's `header`/`footer` are FKs to `KyteSectionTemplate`, which stores `html`/`stylesheet`/`javascript`/`block_layout` bzip2-compressed. `getObject()`'s FK expansion returns those fields raw, and the assembly path (`createHtml` → `buildHeaderFooterStyles` etc.) concatenates them straight into the output.

- The human publish path (`update`, `state=1`) was saved only incidentally: `KytePageController::hook_response_data()` runs first and decompresses `$r['header']`/`$r['footer']`.
- The MCP commit path (`DraftService::commitDraft` → `publishForSurface` → `KytePageController::publishFromContent`) calls `getObject()` then goes straight to `publishPage()`, never invoking that hook, so it ships raw compressed bytes.

Deterministic per-path (not a race): any page with a populated header/footer section template was affected. The page's own html/css/js were always fine (decompressed by `DraftService::versionContent`); only the section templates leaked.

Changes
- Add `KytePageController::decompressSectionTemplate(&$section)` (uses `Bz2Codec::decompressIfBz2`), called from both `hook_response_data()` and `publishFromContent()`.
- Fix latent guard bug: `hook_response_data` required all four fields set or decompressed none; a null `block_layout` would have leaked the other three even on the HTTP path. Now each field decompresses independently.
- Add integrity guard: `publishPage()` runs `hasBinaryContamination()` (bzip2 magic `BZh[1-9]1AY&SY` or invalid UTF-8) and aborts before the S3 write (returns false → MCP commit reports `committed:false`, draft intact).
- Write published HTML with `Content-Type: text/html; charset=utf-8` (`S3::write()` gained an optional content-type via stream context; other callers unchanged).

Tests
`tests/PublishIntegrityTest.php`: per-field decompression (incl. null-`block_layout` regression) + contamination detector (embedded bzip2 stream, invalid UTF-8, no false-positive on literal "BZh" prose).

Verified in a PHP 8.3 container with the `bz2` extension: `OK (12 tests, 20 assertions)` (8 new + 4 existing `Bz2CodecTest`); `php -l` clean. Full DB-backed suite not run.

🤖 Generated with Claude Code
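For reviewers who want to see the shape of the contamination check, here is a minimal sketch of a detector covering the three test cases listed above. It is an illustration under assumptions, not the PR's implementation: `hasBinaryContaminationSketch` is a hypothetical name, and the UTF-8 check uses PCRE's `/u` flag, which may differ from what `hasBinaryContamination()` actually does.

```php
<?php
declare(strict_types=1);

/**
 * Sketch of a publish-time integrity check (hypothetical, not the PR's code).
 * Returns true when $html appears to embed a raw bzip2 stream or is not valid UTF-8.
 */
function hasBinaryContaminationSketch(string $html): bool
{
    // A bzip2 stream starts with "BZh" plus a level digit 1-9, followed by the
    // block header magic 0x314159265359, which reads as "1AY&SY" in ASCII.
    // Matching the full magic avoids flagging prose that merely mentions "BZh".
    if (preg_match('/BZh[1-9]1AY&SY/', $html) === 1) {
        return true;
    }

    // With the /u flag, preg_match() returns false when the subject is not
    // valid UTF-8; the empty pattern matches any valid string and returns 1.
    return preg_match('//u', $html) !== 1;
}
```

A caller in the spirit of the guard described in Changes would skip the write and return false whenever the check returns true for the assembled HTML, leaving the draft intact.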