Support non-root entry points and nested directories in HTML5 zips #672
Draft
rtibbles wants to merge 1 commit into
Ports Studio's findFirstHtml/cleanHTML5Zip behavior to the conversion pipeline: detect the HTML entry point (preferring index.html at the common root), denest archives whose files all share a common parent directory, and record a non-default entry point in extra_fields.options.entry so Kolibri loads it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
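For readers unfamiliar with the denesting step, here is a minimal sketch of the idea, not the code in this PR. The names common_root and denest are hypothetical, and the sketch rewrites the archive in memory, whereas the PR writes through a temp file.

```python
import io
import posixpath
import zipfile


def common_root(names):
    """Return the directory prefix shared by every file member ('' if none).

    Hypothetical helper for illustration; not the function used in the PR.
    """
    files = [n for n in names if not n.endswith("/")]
    if not files:
        return ""
    dirs = [posixpath.dirname(n) for n in files]
    # A file sitting at the archive root means there is nothing to strip.
    if any(d == "" for d in dirs):
        return ""
    root = posixpath.commonpath(dirs)
    return root + "/" if root else ""


def denest(zip_bytes):
    """Return zip bytes with the shared parent directory stripped.

    Assumption: works in memory for brevity; the PR uses a temp file.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as src:
        root = common_root(src.namelist())
        if not root:
            return zip_bytes  # nothing to strip, keep the archive untouched
        out = io.BytesIO()
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                if info.is_dir():
                    continue  # explicit directory entries are dropped
                dst.writestr(info.filename[len(root):], src.read(info))
    return out.getvalue()
```

Under these assumptions, an archive containing only site/index.html and site/css/a.css would come out with index.html and css/a.css at the root, while an archive that already has a file at the root is returned unchanged.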
Summary
HTML5 zips were rejected unless index.html existed at the archive root, even though Studio accepts such zips on upload. This ports Studio's findFirstHtml/cleanHTML5Zip behavior to the conversion pipeline: detect the HTML entry point (index.html at the common root, then any index.html, then the shallowest HTML file), strip a common parent directory shared by all archive members, and record a non-default entry point in extra_fields.options.entry so Kolibri loads it.

Two existing test fixtures asserted the old behavior (zips with only notindex.html or a nested index.html are invalid) and were updated, since those archives are now intentionally valid.

References
Ported from Studio's frontend/shared/utils/zipFile.js. Extracted from in-progress spreadsheet chef work; no linked issue.

Reviewer guidance
uv run --group test pytest tests/pipeline/test_convert.py tests/test_files.py tests/test_data.py

Areas worth a careful look:
- _find_entry_html priority order (ricecooker/utils/pipeline/convert.py): should match Studio's findFirstHtml exactly so ricecooker and Studio agree on the entry point
- _prepare_archive rewrites denested zips through a temp file that is cleaned up in a finally: check the error paths

AI usage
I used Claude Code to extract this change from a larger working branch and add the tests; tests were verified to fail without the change, and I reviewed the final diff.
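
To make the entry-point priority order from the Summary easier to review, here is a rough sketch of it. This is not the PR's _find_entry_html: the names find_entry_html and entry_options are hypothetical, and the case-sensitive .html match and the alphabetical tie-break at equal depth are assumptions that may differ from Studio's findFirstHtml.

```python
import posixpath


def find_entry_html(names, root=""):
    """Pick an HTML entry point from archive member names.

    root is the common directory prefix ('' or a path ending in '/').
    Hypothetical sketch; matching and tie-break rules are assumptions.
    """
    html = [n for n in names if n.endswith(".html")]
    if not html:
        return None
    # 1. index.html at the common root wins outright.
    if root + "index.html" in html:
        return root + "index.html"
    # Order by depth, then name, so the result is deterministic.
    by_depth = sorted(html, key=lambda n: (n.count("/"), n))
    # 2. Otherwise any index.html, shallowest first.
    for name in by_depth:
        if posixpath.basename(name) == "index.html":
            return name
    # 3. Otherwise the shallowest HTML file.
    return by_depth[0]


def entry_options(entry):
    """Return options to record: empty when the default index.html applies."""
    return {} if entry == "index.html" else {"entry": entry}
```

With this sketch, an archive holding main.html and docs/index.html resolves to docs/index.html, because rule 2 is checked before rule 3. Whether that matches Studio in every case is exactly what the first review point asks to confirm.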