docs: add OpenDataLoader PDF Reader demo notebook #21384

Open
hyunhee-jo wants to merge 1 commit into run-llama:main from hyunhee-jo:docs/add-opendataloader-pdf-reader-demo

@hyunhee-jo

Description

Add an example notebook for OpenDataLoader PDF Reader (opendataloader-pdf-llamaindex), an independently published LlamaIndex reader for fast, accurate, 100% local PDF extraction.

The reader is available on PyPI as opendataloader-pdf-llamaindex (v0.0.3). No new package is added to this repository — this PR is documentation only.

Notebook contents (44 cells, Colab-ready):

  • Basic usage with per-page Document splitting
  • All four output formats (text, Markdown, JSON, HTML)
  • Advanced options (tagged PDFs, table detection, sanitization, page selection, image handling)
  • Custom metadata via extra_info
  • Multiple file loading
  • SimpleDirectoryReader integration via file_extractor
  • Hybrid AI mode (docling-fast backend)
  • RAG pipeline with MarkdownNodeParser + VectorStoreIndex

Also adds a one-line entry to the data connectors module guide (modules.md).

New Package?

  • Yes
  • No

Version Bump?

  • Yes
  • No

Type of Change

  • This change requires a documentation update

How Has This Been Tested?

  • Ran all notebook cells end-to-end on Google Colab — all cells passed without errors
  • Validated notebook JSON structure and cell source format
  • Verified Colab badge URL resolves to the correct run-llama/llama_index path
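
For reviewers who want to repeat the structural checks above locally, here is a minimal sketch using only the Python standard library. It is not the script used for this PR: the function names `validate_notebook` and `colab_badge_targets_repo` are hypothetical, the checks cover only a small subset of the nbformat 4 schema, and the badge check assumes the usual `colab.research.google.com/github/<org>/<repo>/blob/...` URL layout. The sample notebook and URL in the usage section are illustrative, not the files in this PR.

```python
import json


def validate_notebook(nb: dict) -> list[str]:
    """Return a list of structural problems found in a parsed .ipynb dict."""
    problems = []
    if nb.get("nbformat") != 4:
        problems.append("nbformat is not 4")
    cells = nb.get("cells")
    if not isinstance(cells, list):
        problems.append("'cells' is missing or not a list")
        return problems
    for i, cell in enumerate(cells):
        if not isinstance(cell, dict):
            problems.append(f"cell {i}: not a JSON object")
            continue
        if cell.get("cell_type") not in {"code", "markdown", "raw"}:
            problems.append(f"cell {i}: unexpected cell_type")
        source = cell.get("source")
        # nbformat 4 stores source as a single string or a list of strings.
        is_str_list = isinstance(source, list) and all(
            isinstance(line, str) for line in source
        )
        if not (isinstance(source, str) or is_str_list):
            problems.append(f"cell {i}: source is not a string or list of strings")
    return problems


def colab_badge_targets_repo(url: str, repo: str = "run-llama/llama_index") -> bool:
    """Check that a Colab badge URL points at a notebook in the given GitHub repo."""
    # Assumed URL layout: https://colab.research.google.com/github/<org>/<repo>/blob/...
    prefix = f"https://colab.research.google.com/github/{repo}/blob/"
    return url.startswith(prefix) and url.endswith(".ipynb")


if __name__ == "__main__":
    # Illustrative two-cell notebook, not the notebook added in this PR.
    sample = json.loads(
        '{"nbformat": 4, "nbformat_minor": 5, "metadata": {}, "cells": ['
        '{"cell_type": "markdown", "metadata": {}, "source": ["# Demo\\n"]},'
        '{"cell_type": "code", "metadata": {}, "source": ["print(1)"],'
        ' "outputs": [], "execution_count": null}]}'
    )
    print(validate_notebook(sample))
    # Hypothetical badge URL used only to exercise the check.
    print(
        colab_badge_targets_repo(
            "https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/example.ipynb"
        )
    )
```

An empty list from `validate_notebook` means none of these particular checks failed; it does not replace a full schema validation or an end-to-end run.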

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran uv run make format; uv run make lint to appease the lint gods

Tests and lint checklist items are not applicable — this PR adds only a Jupyter notebook and a one-line markdown entry, with no Python package code.

AI Disclosure

This notebook was drafted with Claude Code assistance. All content was reviewed by the author, executed end-to-end on Google Colab, and revised through multiple code-review rounds before submission.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.



@dosubot (bot) added the size:L label (this PR changes 100-499 lines, ignoring generated files) on Apr 14, 2026