expose table observability API for pypaimon_rust by SML0127 · Pull Request #307 · apache/paimon-rust

SML0127 · 2026-05-03T14:40:23Z

Purpose

Linked issue: close #285

Brief change log

added PartitionStat and Table::partition_stats / list_partitions.
exposed latest_snapshot, list_snapshots, list_partitions, partition_stats, and list_tags

Tests

API and Format

Documentation

SML0127 · 2026-05-03T14:43:41Z

I'll rebase this pr once #306 is merged

JingsongLi

Thanks for the contribution! This extends the Python bindings with useful observability APIs (snapshots, tags, partitions).

A few review comments:

PR #306 dependency: This PR builds on #306 (catalog metadata API). Given that #306 still has requested changes outstanding (type stubs and rebase), this PR should wait for #306 to merge first. Consider rebasing once #306 is resolved.
partition_stat.rs — manifest entry scan approach: The read_all_manifest_entries implementation reads ALL manifests from both base_manifest_list and delta_manifest_list, then aggregates stats by partition key bytes. For tables with many manifests, this could be expensive. Consider:
- Adding a note about expected performance characteristics.
- Or in a follow-up, exploring whether the $partitions system table path in DataFusion could be reused instead of duplicating the scan logic.
partition_value_to_string — the fallback branch _ => row.get_string(pos_i) will silently misinterpret binary types (VARBINARY, BINARY) as UTF-8 strings. Consider returning a hex encoding or explicitly erroring for unsupported partition key types.
Snapshot ordering: list_snapshots() reverses the list (into_iter().rev()) so newest is first. The Python test asserts snaps[0].id() == snap.id() — which is correct. But this ordering convention should be documented in the Python type stubs or docstring.
Missing type stubs: The new Snapshot, Tag, and PartitionStat classes need .pyi entries in datafusion.pyi for IDE support. Same issue as raised on #306.
Thread safety of runtime().block_on(): The list_snapshots, latest_snapshot, list_tags, and partition_stats methods all call runtime().block_on(...) from Python. If called from within an async Python context (e.g., inside an asyncio event loop), this will panic. Consider documenting this limitation.

Overall the implementation looks correct and the test coverage is reasonable. The main blocker is the dependency on #306 and the missing type stubs.

SML0127 · 2026-05-29T04:07:26Z

Hi @JingsongLi
I've rebased the branch. Could you take a look?

JingsongLi · 2026-06-16T05:43:35Z

+    for entry in entries {
+        let bucket = grouped.entry(entry.partition().to_vec()).or_default();
+        let file = entry.file();
+        let live_rows = file.row_count - file.delete_row_count.unwrap_or(0);


delete_row_count should not be subtracted when producing these partition statistics. In Paimon, DataFileMeta::row_count is the physical row count and delete_row_count is only the number of DELETE row-kind records inside that file (rowCount = add_row_count + delete_row_count). The existing commit/catalog partition statistics paths aggregate file.row_count directly, so this new API will under-report PK/changelog files and still will not represent the true logical live row count. Please keep this consistent with the existing partition statistics semantics, or document/expose a separately computed logical count.

JingsongLi · 2026-06-16T05:43:53Z

+            .iter()
+            .find(|f| f.name() == key)
+            .map(|f| f.data_type());
+        let value = if row.is_null_at(i) {


This custom decoder returns partition values that differ from the values used everywhere else in the table. For example null partitions should use partition.default-name, non-legacy date/time/timestamp partitions should be formatted with PartitionComputer, and decimals should use the configured scale instead of the raw unscaled integer. As a result list_partitions() / partition_stats() can report partition specs that do not match the catalog/system-table view for the same table. Please reuse PartitionComputer::generate_part_values here (or expose a helper) so the observability API uses the same partition formatting rules as writes and catalog partition listing.

JingsongLi reviewed May 24, 2026

View reviewed changes

JingsongLi mentioned this pull request May 24, 2026

feat: expose catalog metadata api for pypaimon_rust #306

Merged

SML0127 added a commit to SML0127/paimon-rust that referenced this pull request May 25, 2026

fix: address PR apache#307 feedback

34dbad1

SML0127 force-pushed the issue-285 branch from aa5a129 to 34dbad1 Compare May 25, 2026 16:14

SML0127 added a commit to SML0127/paimon-rust that referenced this pull request May 25, 2026

fix: address PR apache#307 feedback

290262f

SML0127 force-pushed the issue-285 branch from 34dbad1 to 290262f Compare May 25, 2026 16:27

SML0127 marked this pull request as ready for review May 25, 2026 18:38

SML0127 added 4 commits June 16, 2026 12:03

add partition_stats and list_partitions

c629393

expose table observability api

d21c8a7

add tests for observability

27e0014

fix: address PR apache#307 feedback

7b54112

JingsongLi force-pushed the issue-285 branch from 290262f to 7b54112 Compare June 16, 2026 04:03

JingsongLi reviewed Jun 16, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

expose table observability API for pypaimon_rust#307

expose table observability API for pypaimon_rust#307
SML0127 wants to merge 4 commits into
apache:mainfrom
SML0127:issue-285

SML0127 commented May 3, 2026 •

edited

Loading

Uh oh!

SML0127 commented May 3, 2026

Uh oh!

JingsongLi left a comment

Uh oh!

SML0127 commented May 29, 2026

Uh oh!

JingsongLi Jun 16, 2026

Uh oh!

JingsongLi Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

SML0127 commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Brief change log

Tests

API and Format

Documentation

Uh oh!

SML0127 commented May 3, 2026

Uh oh!

JingsongLi left a comment

Choose a reason for hiding this comment

Uh oh!

SML0127 commented May 29, 2026

Uh oh!

JingsongLi Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

JingsongLi Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

SML0127 commented May 3, 2026 •

edited

Loading