Support non-root entry points and nested directories in HTML5 zips #672

Draft
rtibbles wants to merge 1 commit into learningequality:main from rtibbles:html5_entry_points
rtibbles commented Jun 4, 2026

Summary

HTML5 zips were rejected unless index.html existed at the archive root, even though Studio accepts zips without a root-level index.html on upload. This ports Studio's findFirstHtml/cleanHTML5Zip behavior to the conversion pipeline. It detects the HTML entry point in priority order (index.html at the common root, then any index.html, then the shallowest HTML file), strips a common parent directory when every archive member shares one, and records a non-default entry point in extra_fields.options.entry so Kolibri loads it.
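
For reviewers who want the priority order in concrete form, here is a minimal sketch of the detection logic described above. It is not the code in this PR: the names `common_root`, `find_entry_html`, and `entry_options` are hypothetical. Two points are assumptions rather than confirmed behavior of Studio's findFirstHtml: matching only the `.html` extension, and breaking ties by depth and then by name.

```python
from typing import Dict, List, Optional


def common_root(names: List[str]) -> str:
    """Return the directory prefix shared by every file member, or ""."""
    files = [n for n in names if not n.endswith("/")]
    if not files:
        return ""
    # Directory segments of each member, without the file name itself.
    parts = [n.split("/")[:-1] for n in files]
    prefix = []
    for segments in zip(*parts):
        if len(set(segments)) != 1:
            break
        prefix.append(segments[0])
    return "/".join(prefix)


def find_entry_html(names: List[str]) -> Optional[str]:
    """Pick an entry point: root index.html, any index.html, shallowest HTML."""
    files = [n for n in names if not n.endswith("/")]
    # Assumption: only ".html" counts as an HTML file.
    html = [n for n in files if n.lower().endswith(".html")]
    if not html:
        return None
    root = common_root(files)
    preferred = f"{root}/index.html" if root else "index.html"
    if preferred in html:
        return preferred
    indexes = [n for n in html if n.split("/")[-1] == "index.html"]
    candidates = indexes or html
    # Assumption: ties are broken by depth, then alphabetically.
    return min(candidates, key=lambda n: (n.count("/"), n))


def entry_options(entry: str, root: str = "") -> Dict[str, dict]:
    """Build extra_fields content, recording the entry only when non-default."""
    relative = entry[len(root):].lstrip("/") if root else entry
    if relative == "index.html":
        return {}
    return {"options": {"entry": relative}}
```

In this sketch an archive containing `site/index.html` and `site/js/app.js` has common root `site`, so after denesting the entry is the default `index.html` and nothing is recorded. An archive containing only `notindex.html` would record that file as the entry.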

Two existing test fixtures asserted the old behavior (zips with only notindex.html or a nested index.html are invalid) and were updated, since those archives are now intentionally valid.

References

Ported from Studio's frontend/shared/utils/zipFile.js. Extracted from in-progress spreadsheet chef work; no linked issue.

Reviewer guidance

uv run --group test pytest tests/pipeline/test_convert.py tests/test_files.py tests/test_data.py

Areas worth a careful look:

  • _find_entry_html priority order (ricecooker/utils/pipeline/convert.py) — should match Studio's findFirstHtml exactly so ricecooker and Studio agree on the entry point
  • _prepare_archive rewrites denested zips through a temp file that is cleaned up in a finally — check the error paths
  • Validation loosening is intentional: archives that previously failed now pass
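
As a reference point for reviewing the error paths, this is a minimal sketch of the pattern the second item describes: rewriting a zip through a temp file that is removed in a `finally`. It is not the `_prepare_archive` implementation. The function name `denest_zip`, the in-place replacement with `os.replace`, and raising `ValueError` for members outside the root are all assumptions made for illustration.

```python
import os
import tempfile
import zipfile


def denest_zip(path: str, root: str) -> None:
    """Rewrite the zip at `path` in place with the `root` prefix stripped."""
    prefix = root.rstrip("/") + "/"
    # Create the temp file next to the original so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(
        suffix=".zip", dir=os.path.dirname(os.path.abspath(path))
    )
    os.close(fd)
    try:
        with zipfile.ZipFile(path) as src, zipfile.ZipFile(
            tmp_path, "w", zipfile.ZIP_DEFLATED
        ) as dst:
            for info in src.infolist():
                if info.is_dir():
                    continue
                if not info.filename.startswith(prefix):
                    # Hypothetical policy: refuse rather than silently drop members.
                    raise ValueError(f"{info.filename!r} is not under {root!r}")
                dst.writestr(info.filename[len(prefix):], src.read(info))
        # Only reached on success; the original is untouched on any failure.
        os.replace(tmp_path, path)
    finally:
        # After a successful replace the temp path no longer exists.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

The property to check in the real code is the same one this sketch aims for: on any exception the original archive is left unmodified and no temp file remains on disk.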

AI usage

I used Claude Code to extract this change from a larger working branch and add the tests; tests were verified to fail without the change, and I reviewed the final diff.

Ports Studio's findFirstHtml/cleanHTML5Zip behavior to the conversion
pipeline: detect the HTML entry point (preferring index.html at the
common root), denest archives whose files all share a common parent
directory, and record a non-default entry point in
extra_fields.options.entry so Kolibri loads it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>