Skip to content

fix(mtree): bound table-diff row collection with max_diff_rows (ACE-198)#133

Merged
mason-sharp merged 2 commits into
mainfrom
fix/ACE-198/mtree-diff-max-rows
Jul 1, 2026
Merged

fix(mtree): bound table-diff row collection with max_diff_rows (ACE-198)#133
mason-sharp merged 2 commits into
mainfrom
fix/ACE-198/mtree-diff-max-rows

Conversation

@danolivo

@danolivo danolivo commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

The Merkle-tree diff accumulated every differing row per node pair in memory with no limit, so a heavily diverged table (~1.5M differing rows) grew the in-memory diff to several GB and the process was OOM-killed. The classic table-diff engine already caps this via max_diff_rows; the mtree engine did not.

Mirror that cap in the mtree path: DiffMtree resolves MaxDiffRows from mtree.max_diff_rows (default 1000000; 0 = unbounded), appendDiffs stops collecting once a pair reaches the cap and marks the report with diff_row_limit_reached (warning once), and the fetch loops stop pulling further batches for a capped pair. A negative cap is rejected up front. This bounds memory and honestly flags a truncated diff instead of dying silently; the default still assumes a few GB of headroom, so lower it on memory-constrained hosts and re-run after repair.

@codacy-production

codacy-production Bot commented Jul 1, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 39 complexity · 10 duplication

Metric Results
Complexity 39
Duplication 10

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a MaxDiffRows configuration to cap differing-row collection per node pair in Merkle tree table diffs. The change enforces the cap during enumeration and accumulation, records truncation in diff results, validates and resolves the setting, and updates tests, config defaults, and docs.

Changes

MaxDiffRows Enforcement

Layer / File(s) Summary
Task struct fields and locking helpers
internal/consistency/mtree/merkle.go
MerkleTreeTask gains MaxDiffRows plus per-pair diff counting and warning state, with helper methods for cap checks, counter updates, and one-time truncation marking.
Work item processing and diff recording
internal/consistency/mtree/merkle.go
processWorkItem builds a stable nodePairKey and stops fetching more mismatched batches once a pair reaches the cap; appendDiffs enforces the cap while recording Node1-only, Node2-only, and modified rows.
DiffMtree validation and config resolution
internal/consistency/mtree/merkle.go, pkg/config/config.go
DiffMtree rejects negative MaxDiffRows, resolves it from config.MTree.Diff.MaxDiffRows when unset, initializes counting state, and writes the resolved value into the diff summary; the config struct gains the YAML-backed field.
Unit tests, defaults, and docs
internal/consistency/mtree/merkle_test.go, internal/cli/default_config.yaml, ace.yaml, docs/commands/mtree/mtree-table-diff.md
New tests cover capped, exact-cap, unlimited, and invalid cases; default config and shipped config add max_diff_rows: 1000000; docs describe the limit, truncation flag, and warning behavior.

Related PRs: None identified.

Suggested labels: enhancement, mtree, tests, documentation

Suggested reviewers: Reviewers familiar with internal/consistency/mtree diff logic.

Poem

A rabbit counts each differing row,
and trims the stream as diffs still flow.
When caps are met, I wink and say,
“Enough for now — more on another day.” 🐇

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: bounding mtree table-diff row collection with max_diff_rows.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The description clearly matches the changeset: it adds a max diff-row cap, truncation warning, validation, and config updates for mtree diff.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/ACE-198/mtree-diff-max-rows

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/consistency/mtree/merkle.go`:
- Around line 590-596: The truncation flag is being set too early in
recordPairRowLocked: it currently marks DiffRowLimitReached and logs the warning
when the counter reaches MaxDiffRows, even though the last allowed row was still
accepted. Move the flag/warning into the branch that actually rejects the next
row after the limit is exceeded, and keep the current accepted row path in
recordPairRowLocked unchanged so only real drops are reported.
- Around line 2098-2102: DiffMtree currently treats 0 as both “unset” and
“unbounded,” which breaks explicit MaxDiffRows settings and lets config-derived
negatives slip past validation. Update DiffMtree to resolve m.MaxDiffRows and
cfg.TableDiff.MaxDiffRows into a separate effective cap that preserves explicit
zero versus unset, then validate that resolved value once before use. Use the
MaxDiffRows handling around the table_diff config resolution path to ensure a
negative config value is rejected after resolution and an explicit 0 remains
unbounded.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d3455636-f350-4a6b-b741-bcf931885f61

📥 Commits

Reviewing files that changed from the base of the PR and between 8dc2f4c and ff3d46a.

📒 Files selected for processing (3)
  • docs/commands/mtree/mtree-table-diff.md
  • internal/consistency/mtree/merkle.go
  • internal/consistency/mtree/merkle_test.go

Comment thread internal/consistency/mtree/merkle.go Outdated
Comment thread internal/consistency/mtree/merkle.go
Andrei Lepikhov and others added 2 commits July 1, 2026 11:14
The Merkle-tree diff accumulated every differing row per node pair in
memory with no limit, so a heavily diverged table (~1.5M differing rows)
grew the in-memory diff to several GB and the process was OOM-killed. The
classic table-diff engine already caps this via max_diff_rows; the mtree
engine did not.

Mirror that cap in the mtree path: DiffMtree resolves MaxDiffRows from
table_diff.max_diff_rows (default 1000000; 0 = unbounded), appendDiffs
stops collecting once a pair reaches the cap and marks the report with
diff_row_limit_reached (warning once), and the fetch loops stop pulling
further batches for a capped pair. A negative cap is rejected up front.
This bounds memory and honestly flags a truncated diff instead of dying
silently; the default still assumes a few GB of headroom, so lower it on
memory-constrained hosts and re-run after repair.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Flag DiffRowLimitReached only when a row is actually dropped, not on
  the last accepted row, so a pair with exactly max_diff_rows diffs is
  not falsely marked truncated.
- Source the cap from mtree.diff.max_diff_rows like every other mtree
  tunable, and guard config resolution with > 0 so a negative value
  cannot silently disable the bound (matches table_diff).
@mason-sharp mason-sharp force-pushed the fix/ACE-198/mtree-diff-max-rows branch from ff3d46a to c04ee00 Compare July 1, 2026 19:01

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/config/config.go`:
- Around line 70-74: Negative mtree.diff.max_diff_rows is being treated as unset
before the config fallback in the config loading logic, so the validation in the
MaxDiffRows handling path needs to reject values below zero before applying
cfg.MTree.Diff.MaxDiffRows. Update the branch in the config parser that assigns
or normalizes MaxDiffRows so a negative YAML value fails fast instead of leaving
the limit unbounded, using the existing MaxDiffRows field and
cfg.MTree.Diff.MaxDiffRows path as the anchor points.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9ef61878-1b6f-466c-ac74-a9b46c6d384d

📥 Commits

Reviewing files that changed from the base of the PR and between ff3d46a and c04ee00.

📒 Files selected for processing (6)
  • ace.yaml
  • docs/commands/mtree/mtree-table-diff.md
  • internal/cli/default_config.yaml
  • internal/consistency/mtree/merkle.go
  • internal/consistency/mtree/merkle_test.go
  • pkg/config/config.go
✅ Files skipped from review due to trivial changes (2)
  • docs/commands/mtree/mtree-table-diff.md
  • ace.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/consistency/mtree/merkle_test.go
  • internal/consistency/mtree/merkle.go

Comment thread pkg/config/config.go
@mason-sharp mason-sharp merged commit f085feb into main Jul 1, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants