[codex] Restore transcript package and output directory#2

Draft
ahassoun wants to merge 2 commits into main from codex/rebuild-transcript-pipeline

@ahassoun

Collaborator

Summary

  • Restore the deleted output/digital_transformation_with_eric_kimberling directory from the last known-good output commit.
  • Restore the tracked src/scrapling_cli package after the latest main removed it while wrappers/tests still depended on it.
  • Integrate transcript-provider handling so that a provider-block error from youtube-transcript-api falls through to the next provider instead of retrying the same backend, and so that cached "unavailable" transcript results are retried during live regeneration.
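
The provider handling in the last bullet can be pictured with a minimal sketch. Everything named here (`ProviderBlocked`, `TranscriptResult`, `fetch_transcript`, the `live` flag, the dict-based cache) is hypothetical and chosen for illustration; it is not the actual src/scrapling_cli code, only one way the described behavior could look.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


class ProviderBlocked(Exception):
    """Hypothetical error: the backend refused the request (IP block, HTTP 429)."""


@dataclass
class TranscriptResult:
    video_id: str
    text: Optional[str]              # None means "transcript unavailable"
    provider: Optional[str] = None   # name of the backend that succeeded


# A provider is assumed to be a callable taking a video id and returning text.
Provider = Callable[[str], str]


def fetch_transcript(
    video_id: str,
    providers: List[Tuple[str, Provider]],
    cache: Dict[str, TranscriptResult],
    live: bool = False,
) -> TranscriptResult:
    cached = cache.get(video_id)
    # Reuse a cached transcript always; reuse a cached "unavailable"
    # result only when this is not a live regeneration.
    if cached is not None and (cached.text is not None or not live):
        return cached

    result = TranscriptResult(video_id, None)
    for name, provider in providers:
        try:
            text = provider(video_id)
        except ProviderBlocked:
            # Fall through to the next backend; the blocked one is
            # not retried for this video.
            continue
        result = TranscriptResult(video_id, text, name)
        break

    cache[video_id] = result
    return result
```

In this sketch each backend is called at most once per video, and a cached "unavailable" entry only short-circuits the lookup when `live` is false.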

Validation

  • .venv/bin/python -m compileall src scrapling_cli.py fetch_new.py tests
  • .venv/bin/python -m pytest (25 passed, 1 skipped)
  • .venv/bin/python scrapling_cli.py --help
  • .venv/bin/python fetch_new.py --help

Live generation note

  • IBM Technology regenerated successfully with 26/26 selected transcripts available, but the generated output was not committed because the second channel was blocked.
  • Digital Transformation with Eric Kimberling hit provider blocking: youtube-transcript-api reported YouTube IP blocking, yt-dlp returned HTTP 429, and OpenAI ASR was unavailable because OPENAI_API_KEY is not set.
  • Per the blocker rule, no partial regenerated output commit is included.
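
The all-or-nothing decision described above could be expressed as a small gate. This is a sketch under assumptions: `commit_decision` and its input mapping are hypothetical names, and the rule is taken to mean "commit regenerated output only if every selected channel regenerated without being blocked".

```python
from typing import Dict, List, Tuple


def commit_decision(channel_ok: Dict[str, bool]) -> Tuple[bool, List[str]]:
    """Return (should_commit, blocked_channels).

    channel_ok maps a channel name to True when that channel
    regenerated without hitting a provider block.
    """
    blocked = sorted(name for name, ok in channel_ok.items() if not ok)
    # Commit only when there is at least one channel and none were blocked.
    return (bool(channel_ok) and not blocked, blocked)


decision = commit_decision(
    {
        "IBM Technology": True,
        "Digital Transformation with Eric Kimberling": False,
    }
)
# decision == (False, ["Digital Transformation with Eric Kimberling"])
```

With the two channels from this run, the gate returns no commit and names the blocked channel, which matches the outcome reported in the note.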
