expose table observability API for pypaimon_rust#307
Conversation
|
I'll rebase this pr once #306 is merged |
JingsongLi
left a comment
There was a problem hiding this comment.
Thanks for the contribution! This extends the Python bindings with useful observability APIs (snapshots, tags, partitions).
A few review comments:
-
PR #306 dependency: This PR builds on #306 (catalog metadata API). Given that #306 still has requested changes outstanding (type stubs and rebase), this PR should wait for #306 to merge first. Consider rebasing once #306 is resolved.
-
partition_stat.rs— manifest entry scan approach: Theread_all_manifest_entriesimplementation reads ALL manifests from bothbase_manifest_listanddelta_manifest_list, then aggregates stats by partition key bytes. For tables with many manifests, this could be expensive. Consider:- Adding a note about expected performance characteristics.
- Or in a follow-up, exploring whether the
$partitionssystem table path in DataFusion could be reused instead of duplicating the scan logic.
-
partition_value_to_string— the fallback branch_ => row.get_string(pos_i)will silently misinterpret binary types (VARBINARY, BINARY) as UTF-8 strings. Consider returning a hex encoding or explicitly erroring for unsupported partition key types. -
Snapshot ordering:
list_snapshots()reverses the list (into_iter().rev()) so newest is first. The Python test assertssnaps[0].id() == snap.id()— which is correct. But this ordering convention should be documented in the Python type stubs or docstring. -
Missing type stubs: The new
Snapshot,Tag, andPartitionStatclasses need.pyientries indatafusion.pyifor IDE support. Same issue as raised on #306. -
Thread safety of
runtime().block_on(): Thelist_snapshots,latest_snapshot,list_tags, andpartition_statsmethods all callruntime().block_on(...)from Python. If called from within an async Python context (e.g., inside anasyncioevent loop), this will panic. Consider documenting this limitation.
Overall the implementation looks correct and the test coverage is reasonable. The main blocker is the dependency on #306 and the missing type stubs.
|
Hi @JingsongLi |
| for entry in entries { | ||
| let bucket = grouped.entry(entry.partition().to_vec()).or_default(); | ||
| let file = entry.file(); | ||
| let live_rows = file.row_count - file.delete_row_count.unwrap_or(0); |
There was a problem hiding this comment.
delete_row_count should not be subtracted when producing these partition statistics. In Paimon, DataFileMeta::row_count is the physical row count and delete_row_count is only the number of DELETE row-kind records inside that file (rowCount = add_row_count + delete_row_count). The existing commit/catalog partition statistics paths aggregate file.row_count directly, so this new API will under-report PK/changelog files and still will not represent the true logical live row count. Please keep this consistent with the existing partition statistics semantics, or document/expose a separately computed logical count.
| .iter() | ||
| .find(|f| f.name() == key) | ||
| .map(|f| f.data_type()); | ||
| let value = if row.is_null_at(i) { |
There was a problem hiding this comment.
This custom decoder returns partition values that differ from the values used everywhere else in the table. For example null partitions should use partition.default-name, non-legacy date/time/timestamp partitions should be formatted with PartitionComputer, and decimals should use the configured scale instead of the raw unscaled integer. As a result list_partitions() / partition_stats() can report partition specs that do not match the catalog/system-table view for the same table. Please reuse PartitionComputer::generate_part_values here (or expose a helper) so the observability API uses the same partition formatting rules as writes and catalog partition listing.
Purpose
Linked issue: close #285
Brief change log
Tests
API and Format
Documentation