fix(rest): enforce size bounds on VD resize to match create (Bug 155)#175
fix(rest): enforce size bounds on VD resize to match create (Bug 155)#175Andrei Kvapil (kvaps) wants to merge 2 commits into
Conversation
The VD resize path (PUT .../volume-definitions/{vn}, `linstor vd
set-size`) did not enforce the [4 MiB, 16 TiB] size bounds the create
path enforces via validateVDSize (Bug 155). A below-floor force-shrink
or an over-ceiling grow was accepted (200) and stored verbatim, leaving
a spec the satellite can never materialise — it hot-loops on `drbdadm
create-md` / device resize forever, the exact Bug 155 failure mode
reached through the resize verb. It does not self-heal.
Gate size_kib in rejectVDPatchSize, reusing the create path's
validateVDSize + writeVDSizeRejection so the rejection envelope is
byte-identical across the create and resize verbs. The check runs
before the shrink-vs-force gate (like the Bug 383 non-positive floor):
force waives the shrink-direction opt-in, never the physical
floor/ceiling.
Regression tests (fail on the pre-fix tree, pass with the gate): L1
handler tests assert a below-floor force-shrink and an over-ceiling
grow are refused with 400 + FAIL_INVLD_VLM_SIZE and leave the stored
size untouched, plus an inclusive-boundary accept; an envtest pair
drives the same two scenarios through the real apiserver round-trip.
The existing LargeSizeKibRoundTrip test used 1 PiB — above the 16 TiB
ceiling the create path already refuses (TestBug155VDCreateRefusesAbsurd
Size) — so it was pinning the very create/resize asymmetry this gate
closes. Its int64-no-truncation guard now uses the largest in-bounds
size (16 TiB), still ~8x the int32 clamp point.
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>
📝 WalkthroughWalkthroughAdds a size-bounds validation gate to the VD resize PUT path ( ChangesVD resize size bounds
Estimated code review effort: 3 (Moderate) | ~25 minutes Sequence Diagram(s)sequenceDiagram
participant Client
participant RestHandler as rejectVDPatchSize
participant BoundsGate as rejectVDPatchOutOfBounds
Client->>RestHandler: PUT volume-definitions/{vn} (size_kib, force)
RestHandler->>BoundsGate: check size_kib bounds
BoundsGate->>BoundsGate: validateVDSize(size_kib)
alt out of bounds
BoundsGate-->>RestHandler: write 400 FAIL_INVLD_VLM_SIZE envelope
RestHandler-->>Client: 400 Bad Request
else in bounds
BoundsGate-->>RestHandler: no-op
RestHandler-->>Client: proceed with shrink/force logic
end
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request mirrors the volume definition size bounds validation (4 MiB to 16 TiB) from the CREATE path onto the RESIZE path (PUT). This prevents out-of-bounds sizes from being stored during a resize, which previously caused satellite hot-loops. Unit and integration tests have been added to verify this behavior. The feedback highlights that reusing the writeVDSizeRejection helper on the resize path may output a misleading correction message referencing the create command. Additionally, the integration tests should be tightened to assert an exact 400 Bad Request status code rather than just checking for non-2xx status codes to prevent false positives during server crashes.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| return false | ||
| } | ||
|
|
||
| writeVDSizeRejection(w, rd, vn, *patch.SizeKib, sizeErr) |
There was a problem hiding this comment.
The writeVDSizeRejection helper has a hardcoded correction message (Correc) that instructs the operator to re-issue linstor vd c (the create command). Since this helper is now reused in the resize/PUT path (rejectVDPatchOutOfBounds), this correction message will be misleading to operators attempting a resize. Consider refactoring writeVDSizeRejection to accept the action/verb or make the correction message more generic (e.g., "retry the operation with a valid size").
| if status >= 200 && status < 300 { | ||
| t.Fatalf("resize ACCEPTED a sub-floor size %d KiB < %d KiB min (Bug-155 class via vd set-size)", | ||
| belowFloor, vdBoundsMinSizeKib) | ||
| } |
There was a problem hiding this comment.
Asserting that the status code is simply not in the 2xx range is too loose. If the server crashes or returns a 500 Internal Server Error, the test would pass because the stored size remains unchanged. It is better to assert the exact expected status code (400 Bad Request) to verify the API contract strictly.
if status != http.StatusBadRequest {
t.Fatalf("status: got %d, want 400 (sub-floor size must be refused like create); body=%s", status, body)
}| if status >= 200 && status < 300 { | ||
| t.Fatalf("resize ACCEPTED an over-max size %d KiB > %d KiB max (Bug-155 class via vd set-size)", | ||
| aboveMax, vdBoundsMaxSizeKib) | ||
| } |
There was a problem hiding this comment.
Asserting that the status code is simply not in the 2xx range is too loose. If the server crashes or returns a 500 Internal Server Error, the test would pass because the stored size remains unchanged. It is better to assert the exact expected status code (400 Bad Request) to verify the API contract strictly.
if status != http.StatusBadRequest {
t.Fatalf("status: got %d, want 400 (over-ceiling size must be refused like create); body=%s", status, body)
}Complete the CLI-bug-fix test tiers for the VD resize size-bounds gate (vd set-size), per the blockstor CLI-bug-fix protocol. - L6 cli-matrix cell tests/e2e/cli-matrix/vd-resize-bounds-rejected.sh: over-ceiling grow via CLI rejected, below-floor force-shrink via raw REST PUT rejected (400 + FAIL_INVLD_VLM_SIZE), in-bounds grow lands; size unchanged after each rejection. - L7 replay tests/operator-harness/replay/vd-resize-bounds-rejected.yaml: the CLI-driveable operator sequence (over-ceiling reject, below-floor reject, in-bounds grow succeeds) with size-unchanged assertions. - L1: add an in-bounds force-shrink (1 GiB -> 8 MiB) guard so the gate cannot regress the legitimate scenario-4.W13 force-shrink path. The on-live-stand run of the L6 cell and L7 replay is DEFERRED — the dev-kvaps DRBD oracle is unavailable this session; both files carry a run-deferred note and a stand-pending task tracks running them. The fix is already proven end-to-end at the L1 (handler) + integration (real apiserver via envtest) tiers, which fully cover a REST-layer input rejection; the L6/L7 tiers validate DRBD-state convergence, which is N/A for a refused request, so they are the required paper trail rather than the proof. Signed-off-by: Andrei Kvapil <kvapss@gmail.com> Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
pkg/rest/volume_definitions.go (1)
812-843: 📐 Maintainability & Code Quality | 🔵 TrivialTwo overlapping floor checks — consider consolidating.
rejectVDNonPositiveSize(size_kib ≤ 0) and the newrejectVDPatchOutOfBounds(viavalidateVDSize, size_kib < min) overlap: sinceminVolumeDefinitionSizeKibis presumably > 0, any non-positive value is already caught by the new bounds check too. Functionally harmless (non-positive rejects first with a more specific "must be > 0" message), but it's now two code paths enforcing effectively the same floor. Consider folding the non-positive case intovalidateVDSize/rejectVDPatchOutOfBoundswith a tailored message, or leaving a short comment noting this is deliberate for message-wording purposes.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/rest/volume_definitions.go` around lines 812 - 843, The size validation in rejectVDPatchSize now has two overlapping floor checks: rejectVDNonPositiveSize and rejectVDPatchOutOfBounds both reject values below the minimum, with rejectVDPatchOutOfBounds already covering non-positive sizes through validateVDSize. Consolidate the logic by either folding the non-positive case into rejectVDPatchOutOfBounds/validateVDSize with a specific error message, or keep both paths but add a short comment explaining that rejectVDNonPositiveSize is intentionally separate for the clearer “must be > 0” response. Ensure rejectShrinkWithoutForce remains after the bounds checks.tests/integration/vd_resize_bounds_test.go (2)
121-131: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winTests only check non-2xx status, not the specific rejection envelope.
The PR's stated fix returns
400withFAIL_INVLD_VLM_SIZEplus cause/correction text (per PR objectives). These tests only assertstatus >= 200 && status < 300is false and that stored size is unchanged, but never assert the response body actually carries theFAIL_INVLD_VLM_SIZEcode — a regression that returns a different 4xx/5xx (e.g., a generic decode error) would still pass these tests.Consider decoding
bodyinto theapiv1.APICallRcenvelope and asserting the ret-code / message, tightening the regression coverage this test is meant to provide.Also applies to: 147-157
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/vd_resize_bounds_test.go` around lines 121 - 131, The resize bounds integration test only checks that sub-floor requests are non-2xx and leave stored size unchanged, so it can miss the intended rejection contract. Update the `vdBoundsPut` assertions in `tests/integration/vd_resize_bounds_test.go` to decode `body` into the `apiv1.APICallRc` envelope and verify the returned ret-code/message matches `FAIL_INVLD_VLM_SIZE` (including the expected cause/correction text), using the same check in the `vdBoundsMinSizeKib`/`vdBoundsStoredSize` test cases so regressions returning a different error still fail.
85-104: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick winUse a bounded context for the CRD read.
vdBoundsStoredSize'sGetcall usescontext.Background()with no timeout, unlikevdBoundsPutwhich bounds the HTTP call withgroupGTimeout. If the envtest apiserver stalls, this call can hang the test indefinitely instead of failing fast.🔧 Proposed fix
func vdBoundsStoredSize(t *testing.T, stack *harness.Stack, rd string) int64 { t.Helper() + ctx, cancel := context.WithTimeout(context.Background(), groupGTimeout) + defer cancel() + var rdObj blockstoriov1alpha1.ResourceDefinition - if err := stack.Env.Client.Get(context.Background(), types.NamespacedName{Name: rd}, &rdObj); err != nil { + if err := stack.Env.Client.Get(ctx, types.NamespacedName{Name: rd}, &rdObj); err != nil { t.Fatalf("get RD %q: %v", rd, err) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/vd_resize_bounds_test.go` around lines 85 - 104, vdBoundsStoredSize currently reads the RD CRD with an unbounded context, which can hang the test if the apiserver stalls. Update the Get call in vdBoundsStoredSize to use the same bounded timeout pattern as vdBoundsPut, reusing groupGTimeout or an equivalent context with deadline so the read fails fast instead of waiting indefinitely.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/e2e/cli-matrix/vd-resize-bounds-rejected.sh`:
- Around line 103-144: The `force=true` check in `vd-resize-bounds-rejected.sh`
is asserting the wrong behavior for the REST shrink path. Update the test around
the raw PUT to `volume-definitions/0` so it either removes `force=true` when
verifying rejection, or keeps `force=true` and asserts a successful resize plus
the expected size change; use the `floor_json`/`code` checks in that block to
validate the intended outcome.
---
Nitpick comments:
In `@pkg/rest/volume_definitions.go`:
- Around line 812-843: The size validation in rejectVDPatchSize now has two
overlapping floor checks: rejectVDNonPositiveSize and rejectVDPatchOutOfBounds
both reject values below the minimum, with rejectVDPatchOutOfBounds already
covering non-positive sizes through validateVDSize. Consolidate the logic by
either folding the non-positive case into
rejectVDPatchOutOfBounds/validateVDSize with a specific error message, or keep
both paths but add a short comment explaining that rejectVDNonPositiveSize is
intentionally separate for the clearer “must be > 0” response. Ensure
rejectShrinkWithoutForce remains after the bounds checks.
In `@tests/integration/vd_resize_bounds_test.go`:
- Around line 121-131: The resize bounds integration test only checks that
sub-floor requests are non-2xx and leave stored size unchanged, so it can miss
the intended rejection contract. Update the `vdBoundsPut` assertions in
`tests/integration/vd_resize_bounds_test.go` to decode `body` into the
`apiv1.APICallRc` envelope and verify the returned ret-code/message matches
`FAIL_INVLD_VLM_SIZE` (including the expected cause/correction text), using the
same check in the `vdBoundsMinSizeKib`/`vdBoundsStoredSize` test cases so
regressions returning a different error still fail.
- Around line 85-104: vdBoundsStoredSize currently reads the RD CRD with an
unbounded context, which can hang the test if the apiserver stalls. Update the
Get call in vdBoundsStoredSize to use the same bounded timeout pattern as
vdBoundsPut, reusing groupGTimeout or an equivalent context with deadline so the
read fails fast instead of waiting indefinitely.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d847c010-a147-410f-9b49-73c18c32a0b9
📒 Files selected for processing (6)
pkg/rest/vd_put_size_bounds_round4_test.gopkg/rest/volume_definitions.gopkg/rest/volume_definitions_test.gotests/e2e/cli-matrix/vd-resize-bounds-rejected.shtests/integration/vd_resize_bounds_test.gotests/operator-harness/replay/vd-resize-bounds-rejected.yaml
| echo ">> over-ceiling grow vd s $RD 0 16385G (16 TiB + 1 GiB — MUST exit non-zero)" | ||
| err_file=$(mktemp) | ||
| if "${LCTL[@]}" volume-definition set-size "$RD" 0 16385G >"$err_file" 2>&1; then | ||
| echo "FAIL: over-ceiling grow (16 TiB + 1 GiB) unexpectedly succeeded" >&2 | ||
| echo " size_kib > 16 TiB is unaddressable by DRBD — REST must reject." >&2 | ||
| cat "$err_file" >&2 | ||
| rm -f "$err_file" | ||
| exit 1 | ||
| fi | ||
| if ! grep -qiE 'above maximum|maximum|ceiling|invalid volume definition size' "$err_file"; then | ||
| echo "FAIL: over-ceiling rejected but error text is unhelpful:" >&2 | ||
| cat "$err_file" >&2 | ||
| rm -f "$err_file" | ||
| exit 1 | ||
| fi | ||
| rm -f "$err_file" | ||
|
|
||
| cur_kib=$(linstor_vd_size_kib "$RD" 0) | ||
| if (( cur_kib != SIZE_1G_KIB )); then | ||
| echo "FAIL: post-reject SizeKib=$cur_kib != $SIZE_1G_KIB (over-ceiling reject mutated state)" >&2 | ||
| exit 1 | ||
| fi | ||
|
|
||
| echo ">> below-floor force-shrink via raw REST PUT (3072 KiB, force=true — MUST be 400)" | ||
| floor_json=$(mktemp) | ||
| code=$(curl -s -m 5 -o "$floor_json" -w '%{http_code}' -X PUT \ | ||
| -H 'Content-Type: application/json' \ | ||
| -d '{"size_kib":3072,"force":true}' \ | ||
| "http://localhost:${LCTL_PORT}/v1/resource-definitions/${RD}/volume-definitions/0") | ||
| if [[ "$code" != "400" ]]; then | ||
| echo "FAIL: below-floor force-shrink returned HTTP $code, want 400" >&2 | ||
| cat "$floor_json" >&2 | ||
| rm -f "$floor_json" | ||
| exit 1 | ||
| fi | ||
| if ! grep -qiE 'below minimum|invalid volume definition size' "$floor_json"; then | ||
| echo "FAIL: 400 but envelope is not a size rejection:" >&2 | ||
| cat "$floor_json" >&2 | ||
| rm -f "$floor_json" | ||
| exit 1 | ||
| fi | ||
| rm -f "$floor_json" |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm the exact rejection message text emitted for out-of-bounds VD resize
fd volume_definitions.go pkg/rest --exec grep -n -i -A3 -B3 'FAIL_INVLD_VLM_SIZE\|maximum\|minimum' {}Repository: cozystack/blockstor
Length of output: 1990
🌐 Web query:
cozystack blockstor rejectVDPatchSize rejectVDPatchOutOfBounds FAIL_INVLD_VLM_SIZE volume_definitions.go
💡 Result:
In the Cozystack blockstor project, rejectVDPatchSize, rejectVDPatchOutOfBounds, and FAIL_INVLD_VLM_SIZE are components of the system's volume management logic, primarily located within volume_definitions.go (specifically in the REST API and volume definition handling logic) [1]. These mechanisms are part of a safety guardrail system designed to prevent data loss during volume resizing operations, specifically volume shrinking [1]. 1. FAIL_INVLD_VLM_SIZE: This is an error code returned by the blockstor API (with an HTTP 400 status) when an operation attempts to shrink a volume [1]. 2. Rationale: Because every blockstor RD (Resource Definition) is backed by DRBD, and DRBD cannot shrink a volume past the on-disk metadata position without risking data destruction, the system defaults to rejecting all shrink operations [1]. 3. Escape Hatch: Users can bypass this restriction by including force=true (as a query parameter or body field) in the API request, which is intended for scenarios where the operator has already performed necessary filesystem-level actions (e.g., resize2fs -s) [1]. These definitions are integral to ensuring that blockstor maintains safe, predictable behavior when managing DRBD-backed storage in a Kubernetes environment [1][2].
Citations:
- 1: https://github.com/cozystack/blockstor/blob/main/docs/cli-parity-known-deltas.md
- 2: https://github.com/cozystack/blockstor
force=true should not be asserted as a rejection The REST force-shrink path is meant to bypass the size guard, so this call should not expect HTTP 400. If the intent is to test rejection, drop force=true; otherwise assert success and the resulting size change.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/e2e/cli-matrix/vd-resize-bounds-rejected.sh` around lines 103 - 144,
The `force=true` check in `vd-resize-bounds-rejected.sh` is asserting the wrong
behavior for the REST shrink path. Update the test around the raw PUT to
`volume-definitions/0` so it either removes `force=true` when verifying
rejection, or keeps `force=true` and asserts a successful resize plus the
expected size change; use the `floor_json`/`code` checks in that block to
validate the intended outcome.
What
The VD resize path (
PUT .../volume-definitions/{vn}, i.e.linstor vd set-size) did not enforce the[4 MiB, 16 TiB]size bounds the create path enforces viavalidateVDSize(Bug 155). A below-floor force-shrink (e.g. 3 MiB) or an over-ceiling grow (e.g. 16 TiB + 1 GiB) was accepted with200 OKand stored verbatim, leaving aVolumeDefinitionspec the satellite can never materialise — it hot-loops ondrbdadm create-md/ device resize indefinitely. This is the exact Bug 155 failure mode reached through the resize verb instead of create, and it does not self-heal.Why it mattered
linstor vd set-sizeandlinstor vd cshould be symmetric on physical size bounds. The create path already refuses out-of-range sizes (TestBug155VDCreateRefusesAbsurdSize); the resize path silently accepted them, so a realistic operator or CSI resize could wedge a resource.Fix
Gate
size_kibinrejectVDPatchSize, reusing the create path'svalidateVDSize+writeVDSizeRejectionso the rejection envelope (400+FAIL_INVLD_VLM_SIZE, with cause/correction) is byte-identical across the two verbs. The bounds check runs before the shrink-vs-force gate — consistent with the existing Bug 383 non-positive floor — becauseforcewaives the shrink-direction opt-in, never the physical floor/ceiling; checking bounds first gives the operator the accurate "invalid size" error instead of an "add --force" hint on a size that would be refused even with force. A prop-only PUT (nosize_kib) is untouched.Tests (fail-on-bug, proven both directions)
pkg/rest/vd_put_size_bounds_round4_test.go): a below-floor force-shrink and an over-ceiling grow are refused with400+FAIL_INVLD_VLM_SIZEand leave the stored size untouched; inclusive-boundary sizes (exactly 4 MiB / 16 TiB) still land200.tests/integration/vd_resize_bounds_test.go): the same two scenarios driven through the real apiserver + store round-trip, asserting both the wire rejection and that the durable spec is unchanged.Both fail on the pre-fix tree (PUT returns
200and mutates the stored spec) and pass with the gate.Heads-up for reviewers
TestVolumeDefinitionsUpdateLargeSizeKibRoundTrippreviously used1 PiB— above the 16 TiB DRBD ceiling the create path already refuses — so it was inadvertently pinning the very create/resize asymmetry this PR closes. Its genuine purpose (a multi-TiBsize_kibsurvives the wire without int32 truncation) is preserved by switching to the largest in-bounds size (16 TiB, still ~8× the int32 clamp point).Scope
REST-layer input-validation fix, fully proven at the L1 + integration (envtest) tiers. The L6 cli-matrix / L7 replay runs on the live DRBD stand are deferred (stand unavailable); this change does not touch the satellite/DRBD data plane.
Summary by CodeRabbit
New Features
Bug Fixes