fix(mtree): bound table-diff row collection with max_diff_rows (ACE-198)#133
Conversation
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 39 |
| Duplication | 10 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
📝 WalkthroughWalkthroughAdds a ChangesMaxDiffRows Enforcement
Related PRs: None identified. Suggested labels: enhancement, mtree, tests, documentation Suggested reviewers: Reviewers familiar with Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/consistency/mtree/merkle.go`:
- Around line 590-596: The truncation flag is being set too early in
recordPairRowLocked: it currently marks DiffRowLimitReached and logs the warning
when the counter reaches MaxDiffRows, even though the last allowed row was still
accepted. Move the flag/warning into the branch that actually rejects the next
row after the limit is exceeded, and keep the current accepted row path in
recordPairRowLocked unchanged so only real drops are reported.
- Around line 2098-2102: DiffMtree currently treats 0 as both “unset” and
“unbounded,” which breaks explicit MaxDiffRows settings and lets config-derived
negatives slip past validation. Update DiffMtree to resolve m.MaxDiffRows and
cfg.TableDiff.MaxDiffRows into a separate effective cap that preserves explicit
zero versus unset, then validate that resolved value once before use. Use the
MaxDiffRows handling around the table_diff config resolution path to ensure a
negative config value is rejected after resolution and an explicit 0 remains
unbounded.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d3455636-f350-4a6b-b741-bcf931885f61
📒 Files selected for processing (3)
docs/commands/mtree/mtree-table-diff.mdinternal/consistency/mtree/merkle.gointernal/consistency/mtree/merkle_test.go
The Merkle-tree diff accumulated every differing row per node pair in memory with no limit, so a heavily diverged table (~1.5M differing rows) grew the in-memory diff to several GB and the process was OOM-killed. The classic table-diff engine already caps this via max_diff_rows; the mtree engine did not. Mirror that cap in the mtree path: DiffMtree resolves MaxDiffRows from table_diff.max_diff_rows (default 1000000; 0 = unbounded), appendDiffs stops collecting once a pair reaches the cap and marks the report with diff_row_limit_reached (warning once), and the fetch loops stop pulling further batches for a capped pair. A negative cap is rejected up front. This bounds memory and honestly flags a truncated diff instead of dying silently; the default still assumes a few GB of headroom, so lower it on memory-constrained hosts and re-run after repair. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Flag DiffRowLimitReached only when a row is actually dropped, not on the last accepted row, so a pair with exactly max_diff_rows diffs is not falsely marked truncated. - Source the cap from mtree.diff.max_diff_rows like every other mtree tunable, and guard config resolution with > 0 so a negative value cannot silently disable the bound (matches table_diff).
ff3d46a to
c04ee00
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/config/config.go`:
- Around line 70-74: Negative mtree.diff.max_diff_rows is being treated as unset
before the config fallback in the config loading logic, so the validation in the
MaxDiffRows handling path needs to reject values below zero before applying
cfg.MTree.Diff.MaxDiffRows. Update the branch in the config parser that assigns
or normalizes MaxDiffRows so a negative YAML value fails fast instead of leaving
the limit unbounded, using the existing MaxDiffRows field and
cfg.MTree.Diff.MaxDiffRows path as the anchor points.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9ef61878-1b6f-466c-ac74-a9b46c6d384d
📒 Files selected for processing (6)
ace.yamldocs/commands/mtree/mtree-table-diff.mdinternal/cli/default_config.yamlinternal/consistency/mtree/merkle.gointernal/consistency/mtree/merkle_test.gopkg/config/config.go
✅ Files skipped from review due to trivial changes (2)
- docs/commands/mtree/mtree-table-diff.md
- ace.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/consistency/mtree/merkle_test.go
- internal/consistency/mtree/merkle.go
The Merkle-tree diff accumulated every differing row per node pair in memory with no limit, so a heavily diverged table (~1.5M differing rows) grew the in-memory diff to several GB and the process was OOM-killed. The classic table-diff engine already caps this via max_diff_rows; the mtree engine did not.
Mirror that cap in the mtree path: DiffMtree resolves MaxDiffRows from mtree.max_diff_rows (default 1000000; 0 = unbounded), appendDiffs stops collecting once a pair reaches the cap and marks the report with diff_row_limit_reached (warning once), and the fetch loops stop pulling further batches for a capped pair. A negative cap is rejected up front. This bounds memory and honestly flags a truncated diff instead of dying silently; the default still assumes a few GB of headroom, so lower it on memory-constrained hosts and re-run after repair.