Skip to content

fix(speculate): gate finalize on the batch's own build result#220

Open
albertywu wants to merge 2 commits into
mainfrom
fix/speculate-build-gating-e2e
Open

fix(speculate): gate finalize on the batch's own build result#220
albertywu wants to merge 2 commits into
mainfrom
fix/speculate-build-gating-e2e

Conversation

@albertywu

@albertywu albertywu commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Problem

speculate.tryFinalize only consulted dependency batches, never the batch's own build. A batch whose build failed still passed finalize and merged, so a sq-fake=build-fail request reached landed instead of error (verified live: build.status=failed, batch=succeeded, request=landed).

Fix

tryFinalize now fetches the batch's own build first:

  • own build failed → fail the batch (CAS Speculating → Failed + publish conclude), via a shared failBatch helper also used by the existing dependency-failure path;
  • own build succeeded → fall through to the existing dependency evaluation → merge;
  • own build not yet persisted / still in-flight → no-op and wait for the next event.

Net merge condition is now own build Succeeded AND all deps Succeeded.

Enabling change: BuildStore.GetByBatchID

Build.ID is the runner-assigned ID (e.g. fake-build-fail-…), not batch.ID, so a caller holding only a batch ID could not fetch its build via Get. Added GetByBatchID(ctx, batchID) (entity.Build, error) (interface + MySQL impl WHERE batch_id = ? LIMIT 1, indexed; regenerated mock).

This also repairs cancelBuild, which called Get(ctx, batch.ID) and was a silent no-op (always ErrNotFound); it now resolves the build by batch ID and updates it by build ID.

Tests

make fmt && make build && make test && make e2e-test

tryFinalize only consulted dependency batches, never the batch's own
build, so a failed build still merged and the request reached "landed".
It now fetches the batch's build first and fails the batch when that
build failed, only falling through to dependency evaluation once the own
build has succeeded (waiting if it is not yet persisted or still
in-flight).

Enabling change: add BuildStore.GetByBatchID. Build.ID is the
runner-assigned ID (e.g. "fake-build-fail-..."), not batch.ID, so a
caller holding only a batch ID could not fetch its build via Get. This
also fixes cancelBuild, which called Get(ctx, batch.ID) and was a silent
no-op (always ErrNotFound); it now resolves the build by batch ID and
updates it by build ID.

e2e: drive a request to terminal status — assert the happy path reaches
"landed" and a "sq-fake=build-fail" request reaches "error" (the
end-to-end regression guard for this fix). Make
PersistsStartedLogViaGatewayConsumer deterministic by asserting on the
durable, append-only request_log row instead of polling the transient
current status via the Status RPC.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 8, 2026 19:47
@albertywu albertywu requested review from a team as code owners June 8, 2026 19:47

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants