
refactor(datafusion): convert files_size from table functions to system tables #325

Merged
JingsongLi merged 2 commits into apache:main from JingsongLi:to_system_tables on May 19, 2026


@JingsongLi (Contributor)

Purpose

physical_files_size and referenced_files_size expose table metadata, so they fit better as system tables (table$physical_files_size) than as UDTFs (physical_files_size('table')), consistent with $partitions, $snapshots, etc.
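With the system-table form, the metadata view is addressed through the table identifier itself rather than through a function argument. The sketch below shows one way such an identifier could be split into a base table and a system-table suffix. The function name split_system_table is hypothetical, and the rule that the suffix follows the last `$` is an assumption, not the crate's actual parsing logic.

```rust
/// Hypothetical helper: splits a `<table>$<system_table>` identifier into
/// (base table, system-table suffix). Returns `None` for plain table names.
/// Assumption: the suffix is whatever follows the *last* `$`.
fn split_system_table(identifier: &str) -> Option<(&str, &str)> {
    let (base, suffix) = identifier.rsplit_once('$')?;
    // Reject degenerate forms such as "$snapshots" or "orders$".
    if base.is_empty() || suffix.is_empty() {
        return None;
    }
    Some((base, suffix))
}

fn main() {
    // New style: the file-size view is reached via the table name.
    let parsed = split_system_table("orders$physical_files_size");
    println!("{:?}", parsed);

    // A plain table name carries no system-table suffix.
    println!("{:?}", split_system_table("orders"));
}
```

Under this scheme the old call physical_files_size('orders') and the new reference orders$physical_files_size carry the same information; the difference is that the second one resolves through the normal table lookup path instead of a separate table-function argument layer.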

Brief change log

Tests

API and Format

Documentation

JingsongLi and others added 2 commits May 19, 2026 10:34
…s_size from table functions to system tables

These are table metadata and fit better as system tables (table$physical_files_size)
rather than UDTFs (physical_files_size('table')), consistent with $partitions, $snapshots, etc.
… as system tables

Replace the old table function documentation with system table ($) syntax.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@leaves12138 left a comment


Looks good to me.

This refactor fits the existing DataFusion system table model: the two file-size providers now hang off <table>$physical_files_size and <table>$referenced_files_size, are registered through the shared system_tables registry, and the old table-function argument/catalog lookup layer is removed cleanly. The schemas and collection paths remain equivalent to the previous UDTF providers.
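Registering the file-size providers in a shared registry means a `$suffix` is resolved the same way for every system table. The following is a minimal sketch of such a lookup under stated assumptions: the enum SystemTable, its methods, and the exact, case-sensitive name matching are hypothetical illustrations, and only the four suffix names come from this PR. The real system_tables registry in paimon-datafusion may be structured differently.

```rust
/// Hypothetical registry entries; names mirror $partitions, $snapshots and
/// the two file-size tables discussed in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SystemTable {
    Partitions,
    Snapshots,
    PhysicalFilesSize,
    ReferencedFilesSize,
}

impl SystemTable {
    /// Every registered system table, in a fixed order.
    const ALL: [SystemTable; 4] = [
        SystemTable::Partitions,
        SystemTable::Snapshots,
        SystemTable::PhysicalFilesSize,
        SystemTable::ReferencedFilesSize,
    ];

    /// The suffix written after `$` in SQL, e.g. `orders$snapshots`.
    fn name(self) -> &'static str {
        match self {
            SystemTable::Partitions => "partitions",
            SystemTable::Snapshots => "snapshots",
            SystemTable::PhysicalFilesSize => "physical_files_size",
            SystemTable::ReferencedFilesSize => "referenced_files_size",
        }
    }

    /// Looks a suffix up in the registry (exact match, by assumption).
    fn from_name(suffix: &str) -> Option<SystemTable> {
        Self::ALL.iter().copied().find(|t| t.name() == suffix)
    }
}

fn main() {
    // List every suffix the sketch registry knows about.
    for table in SystemTable::ALL.iter() {
        println!("<table>${}", table.name());
    }
    println!("{:?}", SystemTable::from_name("physical_files_size"));
}
```

The design point being illustrated is that adding a new metadata view becomes one more registry entry plus a provider, with no per-view argument parsing or catalog lookup code.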

I checked the diff, verified the GitHub CI checks are green for this head commit, and locally ran:

  • cargo test -p paimon-datafusion system_tables
  • cargo test -p paimon-datafusion referenced_files_size
  • cargo test -p paimon-datafusion physical_files_size
  • cargo test -p paimon-datafusion --test system_tables -- --nocapture
  • cargo fmt --all -- --check

All passed. A future follow-up could add dedicated integration assertions for the two new $...files_size system tables, but I do not think that needs to block this refactor.

@JingsongLi merged commit 2b58d25 into apache:main on May 19, 2026
8 checks passed
shyjsarah added a commit to shyjsarah/paimon-rust that referenced this pull request on May 19, 2026
Upstream apache#325 converted referenced_files_size / physical_files_size
from table functions to system tables, so they no longer have
register_* functions. register_catalog now auto-registers only the
remaining UDTFs — vector_search and full_text_search.

The binding test is reworked accordingly: it verifies the two UDTFs
are registered by triggering their own argument-count validation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Labels

None yet

Projects

None yet


2 participants