refactor(datafusion): convert files_size from table functions to system tables #325
Merged
…s_size from table functions to system tables
These are table metadata and fit better as system tables (table$physical_files_size)
rather than UDTFs (physical_files_size('table')), consistent with $partitions, $snapshots, etc.
… as system tables
Replace the old table function documentation with system table ($) syntax.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
leaves12138 approved these changes and left a comment on May 19, 2026:
Looks good to me.
This refactor fits the existing DataFusion system table model: the two file-size providers now hang off <table>$physical_files_size and <table>$referenced_files_size, are registered through the shared system_tables registry, and the old table-function argument/catalog lookup layer is removed cleanly. The schemas and collection paths remain equivalent to the previous UDTF providers.
I checked the diff, verified the GitHub CI checks are green for this head commit, and locally ran:
cargo test -p paimon-datafusion system_tables
cargo test -p paimon-datafusion referenced_files_size
cargo test -p paimon-datafusion physical_files_size
cargo test -p paimon-datafusion --test system_tables -- --nocapture
cargo fmt --all -- --check
All passed. A future follow-up could add dedicated integration assertions for the two new $...files_size system tables, but I do not think that needs to block this refactor.
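The registry model described in the review can be pictured with a minimal, std-only sketch. It is not the paimon-datafusion code: the `SystemTableBuilder` type, the `resolve` helper, and the string return values are hypothetical stand-ins for the real providers, and only the two system table names come from this pull request. The sketch shows a name of the form `<table>$<system_table>` being split at the last `$` and the suffix being looked up in one shared map.

```rust
use std::collections::HashMap;

/// Hypothetical builder signature: takes the base table name and returns a
/// description of the provider. Real providers would return DataFusion tables.
type SystemTableBuilder = fn(&str) -> String;

fn physical_files_size(table: &str) -> String {
    format!("physical_files_size provider for {table}")
}

fn referenced_files_size(table: &str) -> String {
    format!("referenced_files_size provider for {table}")
}

/// Hypothetical shared registry, keyed by the suffix that follows `$`.
fn system_tables() -> HashMap<&'static str, SystemTableBuilder> {
    let mut registry: HashMap<&'static str, SystemTableBuilder> = HashMap::new();
    registry.insert("physical_files_size", physical_files_size);
    registry.insert("referenced_files_size", referenced_files_size);
    registry
}

/// Split `<table>$<system_table>` at the last `$` and look the suffix up.
/// Returns `None` for plain table names, empty base names, and unknown suffixes.
fn resolve(name: &str) -> Option<String> {
    let (table, suffix) = name.rsplit_once('$')?;
    if table.is_empty() {
        return None;
    }
    let builder = system_tables().get(suffix).copied()?;
    Some(builder(table))
}
```

Under these assumptions, `resolve("orders$physical_files_size")` yields a provider description for `orders`, while `resolve("orders")` and `resolve("orders$unknown")` both return `None`. Adding another system table is then a single `insert` into the map rather than a new table function with its own argument parsing.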
shyjsarah added a commit to shyjsarah/paimon-rust that referenced this pull request on May 19, 2026
Upstream apache#325 converted referenced_files_size / physical_files_size from table functions to system tables, so they no longer have register_* functions. register_catalog now auto-registers only the remaining UDTFs: vector_search and full_text_search. The binding test is reworked accordingly: it verifies the two UDTFs are registered by triggering their own argument-count validation.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
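The testing idea in that commit message, proving a function is registered by provoking its own argument-count error, can be sketched without DataFusion. Everything below is an assumption made for illustration: the `TableFunction` type, the `call` and `is_registered` helpers, the error wording, and the arities 4 and 3 are invented, and `register_catalog` here is a stand-in that only shares its name with the real function.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a table function that validates its arguments.
type TableFunction = fn(&[&str]) -> Result<String, String>;

fn vector_search(args: &[&str]) -> Result<String, String> {
    // The arity of 4 is assumed for this sketch only.
    if args.len() != 4 {
        return Err(format!("vector_search expects 4 arguments, got {}", args.len()));
    }
    Ok("vector_search plan".to_string())
}

fn full_text_search(args: &[&str]) -> Result<String, String> {
    // The arity of 3 is assumed for this sketch only.
    if args.len() != 3 {
        return Err(format!("full_text_search expects 3 arguments, got {}", args.len()));
    }
    Ok("full_text_search plan".to_string())
}

/// Stand-in for catalog registration: only the remaining UDTFs are added.
fn register_catalog() -> HashMap<&'static str, TableFunction> {
    let mut functions: HashMap<&'static str, TableFunction> = HashMap::new();
    functions.insert("vector_search", vector_search);
    functions.insert("full_text_search", full_text_search);
    functions
}

/// Dispatch by name; an unknown name fails before any validation runs.
fn call(
    functions: &HashMap<&'static str, TableFunction>,
    name: &str,
    args: &[&str],
) -> Result<String, String> {
    match functions.get(name) {
        Some(function) => function(args),
        None => Err(format!("table function '{name}' not found")),
    }
}

/// A function counts as registered when a call with no arguments fails with
/// the function's own arity message rather than the "not found" message.
fn is_registered(functions: &HashMap<&'static str, TableFunction>, name: &str) -> bool {
    match call(functions, name, &[]) {
        Err(message) => message.starts_with(&format!("{name} expects")),
        Ok(_) => false,
    }
}
```

The point of the technique is that the arity error can only come from the function body, so seeing it shows that name lookup succeeded. In this sketch `is_registered` is true for `vector_search` and `full_text_search` and false for `physical_files_size`, which is no longer a table function.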
Purpose
These are table metadata and fit better as system tables (table$physical_files_size) rather than UDTFs (physical_files_size('table')), consistent with $partitions, $snapshots, etc.
Brief change log
Tests
API and Format
Documentation