
fix PruneManifests::row_counts to use total live rows #21

Merged
laverem merged 1 commit into splitgraph:datafusion-54 from JanKaul:datafusion-54 on Jun 17, 2026

JanKaul commented Jun 17, 2026

added_rows_count counts only the rows in files with ADDED status in the current snapshot. Manifests that carry only EXISTING rows (e.g. after compaction) therefore have added_rows_count = 0. DataFusion's IS NOT NULL pruning check (null_count != row_count) then evaluates 0 != 0 = false and incorrectly prunes manifests that contain live data, silently returning empty results.
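To make the failure mode concrete, here is a minimal sketch of the pruning decision described above. It is a simplified model, not DataFusion's actual API: the function name `keep_for_is_not_null` and the use of `Option<u64>` for the statistics are assumptions made for illustration.

```rust
/// Simplified, hypothetical model of the `col IS NOT NULL` pruning check.
/// Returns true when the container (here: a manifest) must be kept because
/// it may hold at least one non-null value.
/// `None` means the statistic is unknown, so the container is kept.
fn keep_for_is_not_null(null_count: Option<u64>, row_count: Option<u64>) -> bool {
    match (null_count, row_count) {
        // Both statistics known: keep only if not every row is null.
        (Some(nulls), Some(rows)) => nulls != rows,
        // Any statistic missing: nothing can be proven, so keep.
        _ => true,
    }
}
```

Under this model, a manifest with 100 EXISTING rows that reports `row_count = Some(0)` (taken from added_rows_count) yields `keep_for_is_not_null(Some(0), Some(0)) == false` and is dropped. Reporting `Some(100)` or `None` instead keeps it.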

Uses added + existing - deleted for the actual live row count; falls back to None (unknown, safe) when any of the three optional fields is absent.
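The calculation can be sketched roughly as follows. This is an illustration of the formula stated above, not the code merged in this PR: the `ManifestCounts` struct and `live_row_count` function are hypothetical names, the `i64` field type is an assumption, and the overflow handling via checked arithmetic is an addition of this sketch.

```rust
/// Hypothetical container for the three optional per-manifest row counters.
struct ManifestCounts {
    added_rows_count: Option<i64>,
    existing_rows_count: Option<i64>,
    deleted_rows_count: Option<i64>,
}

/// Live rows = added + existing - deleted.
/// Returns `None` (unknown) when any counter is absent, so a pruning
/// layer cannot draw conclusions from a partial count.
fn live_row_count(c: &ManifestCounts) -> Option<i64> {
    let added = c.added_rows_count?;
    let existing = c.existing_rows_count?;
    let deleted = c.deleted_rows_count?;
    // Assumption of this sketch: treat arithmetic overflow as unknown too.
    added.checked_add(existing)?.checked_sub(deleted)
}
```

For a compacted manifest with `added = 0`, `existing = 100`, `deleted = 0`, this yields `Some(100)` rather than `Some(0)`. Returning `None` for a missing field is the conservative choice, since an unknown row count prevents the manifest from being pruned.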

Fixes JanKaul#359

…d_rows_count

added_rows_count counts only ADDED-status files in the current snapshot.
Manifests carrying EXISTING rows (e.g. after compaction) have added_rows_count=0,
causing DataFusion's IS NOT NULL pruning (null_count != row_count) to evaluate
0 != 0 = false and incorrectly prune manifests that contain live data.

Use added + existing - deleted to get the actual live row count; fall back to
None (unknown) when any of the three optional fields is absent.

Fixes #359

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
laverem merged commit bda1aa9 into splitgraph:datafusion-54 on Jun 17, 2026