test(harness): raise CI Eventually scale ×3→×5 to fix autoplace-convergence flake #176
Andrei Kvapil (kvaps) wants to merge 1 commit into
The x3 CI budget stretch (30s->90s) still let the heaviest autoplace-convergence cases rotate-flake the Integration lane under full-suite contention: TestGroupFRToggleDiskful2DisklessReapsTieBreaker and TestGroupJ/CSICreateVolumeFromEmpty both timed out at exactly 90s on a loaded GitHub runner while completing in ~8s locally, which points to CPU starvation, not a hang.
Raise the scale to x5 (30s->150s) so the placer / mock-satellite reconcile loop gets more wall-clock time under contention.
Fail-safe: Eventually returns the instant the predicate passes, so green runs pay nothing for the larger budget; a genuinely stuck test still fails at the job-level -timeout=15m ceiling.
Pin test updated to 150s.
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Code Review
This pull request increases the CI timeout scaling factor from 3x to 5x (adjusting a 30-second timeout to 150 seconds) in the integration test harness to prevent flaky test failures caused by resource contention on CI runners. The corresponding test case has been updated to reflect this change. No review comments were provided, so there is no additional feedback to address.
What
Raise the CI Eventually budget scale from ×3 to ×5 (per-group base 30s → 150s on CI) in tests/integration/harness/asserts.go.
Why
The ×3 stretch introduced in #173 (30s → 90s on CI) still let the Integration lane rotate-flake under full-suite contention. The heaviest autoplace-convergence cases, TestGroupFRToggleDiskful2DisklessReapsTieBreaker and TestGroupJ/CSICreateVolumeFromEmpty, both timed out at exactly 90s on a loaded GitHub runner (Eventually timed out after 1m30s: ... never reached 2 diskful replicas / autoplace did not converge to placeCount=2), while the same tests complete in ~8s locally and pass on other CI runs. That signature is CPU starvation under load, not a hang: the placer / mock-satellite reconcile loop is simply not getting scheduled enough within 90s when the whole suite runs concurrently. More wall-clock time is the correct, targeted mitigation.
Fail-safe
Eventually returns the instant its predicate passes, so green runs pay nothing for the larger budget. Only genuinely slow or failing runs report later, and those are still capped by the job-level -timeout=15m. So ×5 cannot cause runaway jobs; it only widens the headroom for load-starved convergence.
Scope
Test-infrastructure only; no product code changes. The pin test TestScaledTimeoutStretchesOnCI is updated to the new 150s expectation. The CHANGELOG entry lands in the v0.1.17 release section (this repo writes the CHANGELOG at release time, not per-PR).