From c615b22939506c8e6c9b7b73ad249d93798b893e Mon Sep 17 00:00:00 2001 From: Preetam Dwivedi Date: Mon, 8 Jun 2026 09:29:06 -0700 Subject: [PATCH 1/3] =?UTF-8?q?docs(rfc):=20extension=20contract=20?= =?UTF-8?q?=E2=80=94=20identity=20in,=20resolve=20internally?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ## Summary ### Why? Extension input granularity is inconsistent across the orchestrator pipeline: `conflict.Analyzer` takes orchestrator identity (`entity.Batch`), while `scorer` / `mergechecker` / `changeprovider` / `buildrunner` / `pusher` take controller-resolved `entity.Change`. The split caps what an extension can do — a real `target_overlap` conflict analyzer and a diff-aware heuristic scorer both cannot be written today, because the data they need is neither in the contract nor resolvable by the extension. ### What? Adds `doc/rfc/submitqueue/extension-contract.md` proposing that decision/action extensions accept thin reference entities at their pipeline-stage granularity (`entity.Request` for request-stage, `entity.Batch` / `[]entity.Batch` for batch-stage) and resolve granular content themselves via narrowly-injected `Factory` dependencies, while `storage` / `changestore` / `queueconfig` stay key/value resolution targets. `conflict.Analyzer` is the baseline. The RFC revises the BuildRunner base/head contract (`build-runner.md`) to pass batches rather than change lists. Also encodes the rule in `CLAUDE.md` so new extensions and signature changes follow it, and links the RFC from the RFC index. Documentation only — no code changes. --- CLAUDE.md | 2 + doc/rfc/index.md | 1 + doc/rfc/submitqueue/extension-contract.md | 58 +++++++++++++++++++++++ 3 files changed, 61 insertions(+) create mode 100644 doc/rfc/submitqueue/extension-contract.md diff --git a/CLAUDE.md b/CLAUDE.md index 650889d4..afaf5d0d 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -138,6 +138,8 @@ The cost of "callers loop over a small batch" is usually negligible. The cost of When in doubt, ask: *"If the next implementation were DynamoDB / Kafka / Bigtable / a remote RPC service / an in-memory map, could it satisfy this signature without contortion?"* If the answer is no, simplify the contract. +**Input contract — identity in, resolve internally.** A decision/action extension takes the orchestrator's thin reference entity at its pipeline-stage granularity — `entity.Request` (request stage) or `entity.Batch` / `[]entity.Batch` (batch stage) — never controller-pre-resolved data. It resolves the granular content it needs (changes, diffs, targets) through dependencies injected at its `Factory` (e.g. a request store, a change provider), not a global aggregator. Stores (`storage`, `changestore`) and config (`queueconfig`) are the exception — they are the resolution *targets* and stay key/value-shaped per the rule above. `conflict.Analyzer` is the reference shape; every new extension or signature change must follow it. See [doc/rfc/submitqueue/extension-contract.md](doc/rfc/submitqueue/extension-contract.md). + ### Import Paths Paths follow the directory layout: shared code is top-level, domain code nests under the domain folder (`submitqueue/`, `stovepipe/`). diff --git a/doc/rfc/index.md b/doc/rfc/index.md index 1305a909..f81f41d8 100644 --- a/doc/rfc/index.md +++ b/doc/rfc/index.md @@ -10,3 +10,4 @@ Design documents and technical proposals, grouped by scope. Shared/cross-cutting - [Orchestrator Workflow](submitqueue/workflow.md) - Queue-driven controller pipeline from gateway entry through batching, scoring, build, merge, and conclude - [Build Runner](submitqueue/build-runner.md) - Vendor-agnostic BuildRunner interface, provider-neutral BuildStatus lifecycle, and how the orchestrator wires it into the build stage +- [Extension Contract](submitqueue/extension-contract.md) - When extensions take orchestrator identity (request/batch) and resolve granular content themselves vs. take controller-resolved data; revises the BuildRunner base/head contract diff --git a/doc/rfc/submitqueue/extension-contract.md b/doc/rfc/submitqueue/extension-contract.md new file mode 100644 index 00000000..6dd5f105 --- /dev/null +++ b/doc/rfc/submitqueue/extension-contract.md @@ -0,0 +1,58 @@ +# Extension Contract + +Design notes for what SubmitQueue's pluggable extensions accept: orchestrator **identity** they resolve themselves, versus **controller-resolved data**. Decisions and rationale only; the code changes land after this RFC is reviewed. + +## Problem + +Extension input granularity is inconsistent across the pipeline stages (see [workflow.md](workflow.md)). `conflict.Analyzer` takes identity (`entity.Batch`); `scorer`, `mergechecker`, `changeprovider`, `buildrunner`, `pusher` take controller-resolved `entity.Change`. The split caps what an extension can do: + +- `ConflictType` already names `target_overlap`, but a real target-overlap analyzer **cannot be written** — the batch controller hands it identity-level batches (no changed targets) and the contract has nowhere to put them. +- `scorer` gets a URIs-only `Change`, so a heuristic scorer **cannot see** lines-changed / file-count. + +Both unblock with the shape `conflict` already uses: accept identity, resolve internally. + +## Principle + +- **Decision/action extensions** take orchestrator identity at their stage granularity and resolve granular content through narrowly-injected dependencies. Request stage → `entity.Request`; batch stage → `entity.Batch` / `[]entity.Batch`. Both are thin reference entities (a `Request` carries URIs, not diffs; a `Batch` carries IDs, not changes). +- **Resolution targets** — `storage`, `changestore`, `queueconfig` — stay key/value-shaped. They are what the others resolve *through* (see [storage/README.md](../../../submitqueue/extension/storage/README.md) and CLAUDE.md). + +### What each stage resolves today + +| Stage | Loads | Resolves for the extension | Hands to the extension | +|---|---|---|---| +| `validate` | `entity.Request` | nothing — `request.Change` is already in hand (the change-store reads here serve duplicate detection) | `request.Change` → `mergechecker`, `changeprovider` | +| `batch` | `entity.Request` + active `[]entity.Batch` | **nothing** — builds a batch whose `Contains` is `[requestID]` | `entity.Batch`, `[]entity.Batch` → `conflict` | +| `score` | `entity.Batch`, then each `entity.Request` | batch → requests | `request.Change` per request, then multiplies the scores → `scorer` | +| `build` | `entity.Batch`, then `collectChanges` | batch → requests → changes, **flattening batch boundaries** | base `[]Change`, head `[]Change` → `buildrunner` | +| `merge` | `entity.Batch`, then `collectChanges` | batch → requests → changes | `[]Change` → `pusher` | + +Two facts this grounds: `conflict` already resolves nothing (the baseline), and the batch→changes walk is **already duplicated** in `build`/`merge` `collectChanges` — the shared resolver below only consolidates it. + +## Verdict + +| Extension | Stage | Today | Proposed input | Injected deps | +|---|---|---|---|---| +| `conflict.Analyzer` | batch | identity (`Batch`, `[]Batch`) | unchanged — **the baseline** | request store + change provider | +| `scorer.Scorer` | score | flat `Change`, per request | `entity.Batch` — resolve + reduce internally | request store + change provider | +| `mergechecker.MergeChecker` | validate | `Change` | `entity.Request` | none | +| `changeprovider.ChangeProvider` | validate | `Change` | `entity.Request` | none — it *is* the resolver | +| `buildrunner.BuildRunner` | build | base/head `[]Change` | base `[]entity.Batch` + head `entity.Batch` | request store + change provider | +| `pusher.Pusher` | merge | `[]Change` | ordered `[]entity.Batch` | request store + change provider | +| `storage`, `changestore`, `queueconfig` | — | keys + entities | unchanged — resolution targets | — | + +Non-obvious points: + +- **scorer** — owning the batch moves batch-level reduction (today the controller's multiplicative product) into the scorer, where the `composite` reduce step already lives. +- **buildrunner** — this **revises** [build-runner.md](build-runner.md), which deliberately kept batches out of the boundary. The base/head split survives, expressed as batches; the provider still operates on changes (the shared resolver produces them inside the extension). Cost: a `buildrunner` / `pusher` implementation now depends on a request store + change provider. +- **pusher** — a *list* of batches (not one) designs for a merge-train: land several ready batches, or a batch with not-yet-landed deps, in one atomic push. Today merge pushes a single batch because deps are already on trunk. + +## Mechanism + +Dependencies are injected per-extension at the existing `Factory.For` (wiring: `example/submitqueue/orchestrator/server/main.go`) — only the handles a contract justifies, never the whole storage aggregator. The repeated batch→changes walk becomes one shared resolver (today's duplicated `collectChanges`, consolidated, and preserving the batch boundaries build's copy flattens). Controllers shrink to passing the identity entity they already load. + +## Rejected + +- **Status quo (controller resolves).** Keeps extensions pure and trivially testable, but thickens controllers and caps every extension at what the controller chose to pre-compute — the two blocked features are that ceiling. +- **Literal string IDs.** An extra read per call when the controller already holds the entity; pass thin reference entities instead. +- **Per-implementation batch→changes resolution.** How the `build`/`merge` duplication arose; one shared resolver instead. +- *Acknowledged:* decision extensions gain dependencies and are no longer pure functions — mitigated by their existing mock packages and `Factory` injection. From 43a1cff7eb7276545ca88e46b48e96b657eba212 Mon Sep 17 00:00:00 2001 From: Preetam Dwivedi Date: Mon, 8 Jun 2026 11:32:22 -0700 Subject: [PATCH 2/3] docs(rfc): extension outputs and per-batch pusher result Add an "Output" column to the verdict table and a principle stating that an output element self-identifies with its input unit (ChangeInfo by URI, Conflict by BatchID); a wrapper entity is added only to aggregate up to a coarser unit. Five of six return contracts are unchanged. pusher is the exception: its input becomes a list of batches, so its result regroups per batch (BatchID-tagged, per-change commit detail underneath). Atomicity stays all-or-nothing, so a per-batch status is intentionally omitted. Also note entity.BatchChanges is kept as the shared resolver's detailed output rather than a controller-assembled value. --- doc/rfc/submitqueue/extension-contract.md | 25 ++++++++++++++--------- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/doc/rfc/submitqueue/extension-contract.md b/doc/rfc/submitqueue/extension-contract.md index 6dd5f105..1fae9134 100644 --- a/doc/rfc/submitqueue/extension-contract.md +++ b/doc/rfc/submitqueue/extension-contract.md @@ -15,6 +15,7 @@ Both unblock with the shape `conflict` already uses: accept identity, resolve in - **Decision/action extensions** take orchestrator identity at their stage granularity and resolve granular content through narrowly-injected dependencies. Request stage → `entity.Request`; batch stage → `entity.Batch` / `[]entity.Batch`. Both are thin reference entities (a `Request` carries URIs, not diffs; a `Batch` carries IDs, not changes). - **Resolution targets** — `storage`, `changestore`, `queueconfig` — stay key/value-shaped. They are what the others resolve *through* (see [storage/README.md](../../../submitqueue/extension/storage/README.md) and CLAUDE.md). +- **Output mirrors the input unit.** Each output element self-identifies with the input it corresponds to — `changeprovider`'s `ChangeInfo` carries its `URI`, `conflict`'s `Conflict` carries its `BatchID` — so a flat list suffices and the caller correlates results back to inputs without re-deriving boundaries. A *wrapper* entity (`entity.BatchChanges`) is introduced only to aggregate *up* to a coarser unit than the elements — the scorer needs batch-wide line/file totals, so the rollup earns its keep; no `RequestChanges` exists because nothing needs request-wide rollups. And when the input is a *collection* of independently-actioned units, the output groups by them: `pusher`, fed `[]entity.Batch`, returns outcomes grouped per batch, the same way `conflict` already tags each `Conflict` with its in-flight `BatchID`. ### What each stage resolves today @@ -30,26 +31,30 @@ Two facts this grounds: `conflict` already resolves nothing (the baseline), and ## Verdict -| Extension | Stage | Today | Proposed input | Injected deps | -|---|---|---|---|---| -| `conflict.Analyzer` | batch | identity (`Batch`, `[]Batch`) | unchanged — **the baseline** | request store + change provider | -| `scorer.Scorer` | score | flat `Change`, per request | `entity.Batch` — resolve + reduce internally | request store + change provider | -| `mergechecker.MergeChecker` | validate | `Change` | `entity.Request` | none | -| `changeprovider.ChangeProvider` | validate | `Change` | `entity.Request` | none — it *is* the resolver | -| `buildrunner.BuildRunner` | build | base/head `[]Change` | base `[]entity.Batch` + head `entity.Batch` | request store + change provider | -| `pusher.Pusher` | merge | `[]Change` | ordered `[]entity.Batch` | request store + change provider | -| `storage`, `changestore`, `queueconfig` | — | keys + entities | unchanged — resolution targets | — | +| Extension | Stage | Input today | Proposed input | Output | Injected deps | +|---|---|---|---|---|---| +| `conflict.Analyzer` | batch | identity (`Batch`, `[]Batch`) | unchanged — **the baseline** | conflicting in-flight batches (`[]Conflict`, `BatchID`-tagged) — unchanged | request store + change provider | +| `scorer.Scorer` | score | flat `Change`, per request | `entity.Batch` — resolve + reduce internally | one batch score (`float64`) — unchanged | request store + change provider | +| `mergechecker.MergeChecker` | validate | `Change` | `entity.Request` | mergeability (`Result`) — unchanged | none | +| `changeprovider.ChangeProvider` | validate | `Change` | `entity.Request` | per-URI change info (`[]ChangeInfo`, `URI`-tagged) — unchanged | none — it *is* the resolver | +| `buildrunner.BuildRunner` | build | base/head `[]Change` | base `[]entity.Batch` + head `entity.Batch` | build id, then status/cancel (`BuildID`, `BuildStatus`) — unchanged | request store + change provider | +| `pusher.Pusher` | merge | `[]Change` | ordered `[]entity.Batch` | **per-batch** outcomes (`Result` grouped by `BatchID`) — **changed** | request store + change provider | +| `storage`, `changestore`, `queueconfig` | — | keys + entities | unchanged — resolution targets | entities | — | + +**Outputs are unchanged except `pusher`.** This RFC moves the *input* toward identity; five of the six return contracts — conflicts, score, mergeability, change info, build id/status — are exactly what they are today. `pusher` is the lone exception: because its input becomes a *list* of independently-landed batches, its result regroups per batch (`BatchID`-tagged, per-change commit detail kept underneath) so each batch's outcome stays correlatable — the "output mirrors the input unit" principle above. No other output shape changes. Non-obvious points: - **scorer** — owning the batch moves batch-level reduction (today the controller's multiplicative product) into the scorer, where the `composite` reduce step already lives. - **buildrunner** — this **revises** [build-runner.md](build-runner.md), which deliberately kept batches out of the boundary. The base/head split survives, expressed as batches; the provider still operates on changes (the shared resolver produces them inside the extension). Cost: a `buildrunner` / `pusher` implementation now depends on a request store + change provider. -- **pusher** — a *list* of batches (not one) designs for a merge-train: land several ready batches, or a batch with not-yet-landed deps, in one atomic push. Today merge pushes a single batch because deps are already on trunk. +- **pusher** — a *list* of batches (not one) designs for a merge-train: land several ready batches, or a batch with not-yet-landed deps, in one atomic push. Today merge pushes a single batch because deps are already on trunk. Since the input is now a list, the output groups outcomes per batch (`BatchID`-tagged, with per-change commit detail kept underneath) instead of one flat per-change list — the only output shape this RFC changes. Push atomicity is unchanged (all-or-nothing across the whole call), so a per-batch *status* is intentionally omitted: a partial-landing train would be a separate, larger change to the atomicity contract. ## Mechanism Dependencies are injected per-extension at the existing `Factory.For` (wiring: `example/submitqueue/orchestrator/server/main.go`) — only the handles a contract justifies, never the whole storage aggregator. The repeated batch→changes walk becomes one shared resolver (today's duplicated `collectChanges`, consolidated, and preserving the batch boundaries build's copy flattens). Controllers shrink to passing the identity entity they already load. +`entity.BatchChanges` is kept, not removed — it becomes the shared resolver's *detailed output* (URIs + provider details for a batch, what the scorer consumes) rather than a value the score controller assembles and passes in. Its line/file helpers move with it; only its producer changes. + ## Rejected - **Status quo (controller resolves).** Keeps extensions pure and trivially testable, but thickens controllers and caps every extension at what the controller chose to pre-compute — the two blocked features are that ceiling. From 27354750a4bdf1632f4f6dda09a99955f86efac8 Mon Sep 17 00:00:00 2001 From: Preetam Dwivedi Date: Mon, 8 Jun 2026 11:44:07 -0700 Subject: [PATCH 3/3] =?UTF-8?q?feat(changeset):=20shared=20batch=E2=86=92c?= =?UTF-8?q?hanges=20resolver=20in=20core?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add submitqueue/core/changeset, the single place the orchestrator resolves batch identity into the changes a batch contains — consolidating the batch -> requests -> changes walk that the build, merge, and score controllers each performed privately. Resolver exposes two single-batch fidelities, both keyed per batch so callers with several batches loop and keep the per-batch boundary: ChangesForBatch returns raw changes (URIs only, no change-store read) for the build and merge stages, and DetailedForBatch returns one ChangeInfo per claimed URI with provider details read from the change store, for the score stage and detail-aware analyzers. Ships with a store-backed implementation (depending only on the request and change stores), a programmable in-memory fake, a generated mock, and tests. The package is added unused; extensions adopt it in later branches. entity.BatchChanges is repurposed as DetailedForBatch's output (doc comment only). The mocks make-target gains the new package. --- Makefile | 2 +- submitqueue/core/changeset/BUILD.bazel | 28 +++++ submitqueue/core/changeset/README.md | 18 +++ submitqueue/core/changeset/changeset.go | 51 ++++++++ submitqueue/core/changeset/fake/BUILD.bazel | 23 ++++ submitqueue/core/changeset/fake/fake.go | 86 +++++++++++++ submitqueue/core/changeset/fake/fake_test.go | 70 +++++++++++ submitqueue/core/changeset/mock/BUILD.bazel | 12 ++ .../core/changeset/mock/changeset_mock.go | 72 +++++++++++ submitqueue/core/changeset/resolver.go | 77 ++++++++++++ submitqueue/core/changeset/resolver_test.go | 113 ++++++++++++++++++ submitqueue/entity/batch_changes.go | 12 +- 12 files changed, 558 insertions(+), 6 deletions(-) create mode 100644 submitqueue/core/changeset/BUILD.bazel create mode 100644 submitqueue/core/changeset/README.md create mode 100644 submitqueue/core/changeset/changeset.go create mode 100644 submitqueue/core/changeset/fake/BUILD.bazel create mode 100644 submitqueue/core/changeset/fake/fake.go create mode 100644 submitqueue/core/changeset/fake/fake_test.go create mode 100644 submitqueue/core/changeset/mock/BUILD.bazel create mode 100644 submitqueue/core/changeset/mock/changeset_mock.go create mode 100644 submitqueue/core/changeset/resolver.go create mode 100644 submitqueue/core/changeset/resolver_test.go diff --git a/Makefile b/Makefile index 4a8f7cf5..af3b8aa3 100644 --- a/Makefile +++ b/Makefile @@ -336,7 +336,7 @@ local-stovepipe-gateway-start: build-stovepipe-gateway-linux ## Start Stovepipe mocks: ## Generate mock files using mockgen @echo "Generating mocks..." - @$(BAZEL) run @rules_go//go -- generate ./submitqueue/extension/storage/... ./submitqueue/extension/buildrunner/... ./submitqueue/extension/changeprovider/... ./extension/counter/... ./extension/messagequeue/... ./submitqueue/extension/queueconfig/... ./submitqueue/extension/mergechecker/... ./submitqueue/extension/pusher/... ./submitqueue/extension/scorer/... ./submitqueue/extension/conflict/... ./submitqueue/core/consumer/... + @$(BAZEL) run @rules_go//go -- generate ./submitqueue/extension/storage/... ./submitqueue/extension/buildrunner/... ./submitqueue/extension/changeprovider/... ./extension/counter/... ./extension/messagequeue/... ./submitqueue/extension/queueconfig/... ./submitqueue/extension/mergechecker/... ./submitqueue/extension/pusher/... ./submitqueue/extension/scorer/... ./submitqueue/extension/conflict/... ./submitqueue/core/consumer/... ./submitqueue/core/changeset/... @echo "Mocks generated successfully!" proto: ## Generate protobuf files from .proto definitions diff --git a/submitqueue/core/changeset/BUILD.bazel b/submitqueue/core/changeset/BUILD.bazel new file mode 100644 index 00000000..44da4767 --- /dev/null +++ b/submitqueue/core/changeset/BUILD.bazel @@ -0,0 +1,28 @@ +load("@rules_go//go:def.bzl", "go_library", "go_test") + +go_library( + name = "changeset", + srcs = [ + "changeset.go", + "resolver.go", + ], + importpath = "github.com/uber/submitqueue/submitqueue/core/changeset", + visibility = ["//visibility:public"], + deps = [ + "//submitqueue/entity", + "//submitqueue/extension/storage", + ], +) + +go_test( + name = "changeset_test", + srcs = ["resolver_test.go"], + embed = [":changeset"], + deps = [ + "//submitqueue/entity", + "//submitqueue/extension/storage/mock", + "@com_github_stretchr_testify//assert", + "@com_github_stretchr_testify//require", + "@org_uber_go_mock//gomock", + ], +) diff --git a/submitqueue/core/changeset/README.md b/submitqueue/core/changeset/README.md new file mode 100644 index 00000000..b57b9015 --- /dev/null +++ b/submitqueue/core/changeset/README.md @@ -0,0 +1,18 @@ +# changeset + +`changeset` resolves batch identity into the changes a batch contains. It is the single place the orchestrator walks batch → requests → changes, consolidating the resolution the build, merge, and score controllers each performed privately. + +## Why it exists + +A `Batch` is a thin reference entity: it carries the IDs of the requests it contains, not their changes. Decision and action extensions (the scorer, build runner, pusher, and future detail-aware conflict analyzers) are handed that identity and resolve the granular content themselves through an injected `Resolver`, rather than depending on a controller to pre-resolve and pass the data in. The resolver depends only on the two resolution-target stores — the request store (to walk a batch's contained requests) and the change store (to attach provider details) — and nothing else. + +## Two fidelities + +The resolver offers the same walk at two levels of detail, and both preserve batch boundaries — neither flattens across batches, so a caller that wants a flat list flattens the result itself: + +- The raw view returns each batch's contained changes as URIs only, one group per input batch, in input order. It performs no change-store read. The build stage uses it for base and head inputs; the merge stage uses it for the pusher. +- The detailed view returns a single batch's normalized, batch-level changes: one entry per claimed URI, each carrying the provider details recorded in the change store, aggregated across every request in the batch. Because the change store returns rows for every request that ever claimed a URI, the resolver selects the row owned by the requesting request. The score stage uses it, as will any analyzer that needs changed-file or line-count facts. + +## Testing + +A programmable in-memory fake lives in `fake/`: seed per-batch results and inject errors without a real store. A generated mock lives in `mock/` for tests that assert on exact call expectations. Extensions that take a `Resolver` can be exercised against either. diff --git a/submitqueue/core/changeset/changeset.go b/submitqueue/core/changeset/changeset.go new file mode 100644 index 00000000..6ae287e4 --- /dev/null +++ b/submitqueue/core/changeset/changeset.go @@ -0,0 +1,51 @@ +// Copyright (c) 2025 Uber Technologies, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// Package changeset resolves batch identity into the changes a batch contains. +// It is the single place the orchestrator walks batch -> requests -> changes, +// consolidating what the build, merge, and score controllers each did privately. +// Decision/action extensions (scorer, buildrunner, pusher, and future +// detail-aware conflict analyzers) take thin identity entities and resolve their +// granular content through an injected Resolver instead of being handed +// pre-resolved data by a controller. +package changeset + +//go:generate mockgen -source=changeset.go -destination=mock/changeset_mock.go -package=mock + +import ( + "context" + + "github.com/uber/submitqueue/submitqueue/entity" +) + +// Resolver turns batch identity into the changes the batch contains. Both methods +// operate on a single batch — callers with several batches (a build's base, a +// merge train) loop and keep the per-batch boundary by holding a slice per batch. +// The two methods differ only in fidelity: ChangesForBatch is the cheap URI-only +// view; DetailedForBatch reads the change store for provider details. +type Resolver interface { + // ChangesForBatch resolves a batch's contained requests into their raw + // changes (URIs only; no change-store read), in batch.Contains order. A batch + // with no requests yields an empty slice. Used by the build (base/head) and + // merge stages. + ChangesForBatch(ctx context.Context, batch entity.Batch) ([]entity.Change, error) + + // DetailedForBatch resolves a batch into its normalized, batch-level view: + // one entity.ChangeInfo per claimed URI (URI plus the provider details read + // from the change store), aggregated across every request in the batch. For + // each URI it selects the record owned by the request, since the change store + // returns rows for all requests that ever claimed the URI. Used by the score + // stage and detail-aware analyzers. + DetailedForBatch(ctx context.Context, batch entity.Batch) (entity.BatchChanges, error) +} diff --git a/submitqueue/core/changeset/fake/BUILD.bazel b/submitqueue/core/changeset/fake/BUILD.bazel new file mode 100644 index 00000000..881731cc --- /dev/null +++ b/submitqueue/core/changeset/fake/BUILD.bazel @@ -0,0 +1,23 @@ +load("@rules_go//go:def.bzl", "go_library", "go_test") + +go_library( + name = "fake", + srcs = ["fake.go"], + importpath = "github.com/uber/submitqueue/submitqueue/core/changeset/fake", + visibility = ["//visibility:public"], + deps = [ + "//submitqueue/core/changeset", + "//submitqueue/entity", + ], +) + +go_test( + name = "fake_test", + srcs = ["fake_test.go"], + embed = [":fake"], + deps = [ + "//submitqueue/entity", + "@com_github_stretchr_testify//assert", + "@com_github_stretchr_testify//require", + ], +) diff --git a/submitqueue/core/changeset/fake/fake.go b/submitqueue/core/changeset/fake/fake.go new file mode 100644 index 00000000..28e3ba4b --- /dev/null +++ b/submitqueue/core/changeset/fake/fake.go @@ -0,0 +1,86 @@ +// Copyright (c) 2025 Uber Technologies, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// Package fake provides an in-memory changeset.Resolver for tests and examples. +// Seed per-batch results with Set (raw changes) and SetDetailed (detailed view), +// keyed by batch ID; the resolver serves what was seeded. A batch with no seeded +// entry resolves to empty rather than an error, matching a batch whose requests +// carry no changes. FailWith injects an error on every call to exercise the error +// path without a real store. It is intended for examples and tests only, never +// production. +package fake + +import ( + "context" + + "github.com/uber/submitqueue/submitqueue/core/changeset" + "github.com/uber/submitqueue/submitqueue/entity" +) + +// Resolver is a programmable in-memory changeset.Resolver. +type Resolver struct { + changes map[string][]entity.Change + detailed map[string]entity.BatchChanges + err error +} + +// New returns an empty fake Resolver. Seed it with Set / SetDetailed. +func New() *Resolver { + return &Resolver{ + changes: map[string][]entity.Change{}, + detailed: map[string]entity.BatchChanges{}, + } +} + +// Set seeds the raw changes returned by Changes for the given batch ID. +func (r *Resolver) Set(batchID string, changes ...entity.Change) *Resolver { + r.changes[batchID] = changes + return r +} + +// SetDetailed seeds the detailed view returned by Detailed for the given batch ID. +func (r *Resolver) SetDetailed(batchID string, detailed entity.BatchChanges) *Resolver { + r.detailed[batchID] = detailed + return r +} + +// FailWith makes every Changes and Detailed call return err. +func (r *Resolver) FailWith(err error) *Resolver { + r.err = err + return r +} + +// ChangesForBatch returns the seeded raw changes for the batch, in seeded order. +// An unseeded batch resolves to a nil slice. +func (r *Resolver) ChangesForBatch(_ context.Context, batch entity.Batch) ([]entity.Change, error) { + if r.err != nil { + return nil, r.err + } + return r.changes[batch.ID], nil +} + +// DetailedForBatch returns the seeded detailed view for the batch. An unseeded +// batch resolves to an empty entity.BatchChanges carrying the batch's identity. +func (r *Resolver) DetailedForBatch(_ context.Context, batch entity.Batch) (entity.BatchChanges, error) { + if r.err != nil { + return entity.BatchChanges{}, r.err + } + if detailed, ok := r.detailed[batch.ID]; ok { + return detailed, nil + } + return entity.BatchChanges{BatchID: batch.ID, Queue: batch.Queue}, nil +} + +// ensure the fake satisfies the interface. +var _ changeset.Resolver = (*Resolver)(nil) diff --git a/submitqueue/core/changeset/fake/fake_test.go b/submitqueue/core/changeset/fake/fake_test.go new file mode 100644 index 00000000..7cdb135f --- /dev/null +++ b/submitqueue/core/changeset/fake/fake_test.go @@ -0,0 +1,70 @@ +// Copyright (c) 2025 Uber Technologies, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package fake + +import ( + "context" + "errors" + "testing" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + + "github.com/uber/submitqueue/submitqueue/entity" +) + +func TestResolverChanges(t *testing.T) { + r := New(). + Set("q/batch/1", entity.Change{URIs: []string{"u1"}}). + Set("q/batch/2", entity.Change{URIs: []string{"u2"}}, entity.Change{URIs: []string{"u3"}}) + + got, err := r.ChangesForBatch(context.Background(), entity.Batch{ID: "q/batch/2"}) + require.NoError(t, err) + assert.Equal(t, []entity.Change{{URIs: []string{"u2"}}, {URIs: []string{"u3"}}}, got) + + unseeded, err := r.ChangesForBatch(context.Background(), entity.Batch{ID: "q/batch/unseeded"}) + require.NoError(t, err) + assert.Empty(t, unseeded) +} + +func TestResolverDetailed(t *testing.T) { + want := entity.BatchChanges{ + BatchID: "q/batch/1", + Queue: "q", + Changes: []entity.ChangeInfo{{URI: "u1"}}, + } + r := New().SetDetailed("q/batch/1", want) + + got, err := r.DetailedForBatch(context.Background(), entity.Batch{ID: "q/batch/1", Queue: "q"}) + require.NoError(t, err) + assert.Equal(t, want, got) +} + +func TestResolverDetailedUnseeded(t *testing.T) { + got, err := New().DetailedForBatch(context.Background(), entity.Batch{ID: "q/batch/9", Queue: "q"}) + require.NoError(t, err) + assert.Equal(t, entity.BatchChanges{BatchID: "q/batch/9", Queue: "q"}, got) +} + +func TestResolverFailWith(t *testing.T) { + sentinel := errors.New("boom") + r := New().FailWith(sentinel) + + _, err := r.ChangesForBatch(context.Background(), entity.Batch{ID: "q/batch/1"}) + require.ErrorIs(t, err, sentinel) + + _, err = r.DetailedForBatch(context.Background(), entity.Batch{ID: "q/batch/1"}) + require.ErrorIs(t, err, sentinel) +} diff --git a/submitqueue/core/changeset/mock/BUILD.bazel b/submitqueue/core/changeset/mock/BUILD.bazel new file mode 100644 index 00000000..e4aaae4a --- /dev/null +++ b/submitqueue/core/changeset/mock/BUILD.bazel @@ -0,0 +1,12 @@ +load("@rules_go//go:def.bzl", "go_library") + +go_library( + name = "mock", + srcs = ["changeset_mock.go"], + importpath = "github.com/uber/submitqueue/submitqueue/core/changeset/mock", + visibility = ["//visibility:public"], + deps = [ + "//submitqueue/entity", + "@org_uber_go_mock//gomock", + ], +) diff --git a/submitqueue/core/changeset/mock/changeset_mock.go b/submitqueue/core/changeset/mock/changeset_mock.go new file mode 100644 index 00000000..d4d15829 --- /dev/null +++ b/submitqueue/core/changeset/mock/changeset_mock.go @@ -0,0 +1,72 @@ +// Code generated by MockGen. DO NOT EDIT. +// Source: changeset.go +// +// Generated by this command: +// +// mockgen -source=changeset.go -destination=mock/changeset_mock.go -package=mock +// + +// Package mock is a generated GoMock package. +package mock + +import ( + context "context" + reflect "reflect" + + entity "github.com/uber/submitqueue/submitqueue/entity" + gomock "go.uber.org/mock/gomock" +) + +// MockResolver is a mock of Resolver interface. +type MockResolver struct { + ctrl *gomock.Controller + recorder *MockResolverMockRecorder + isgomock struct{} +} + +// MockResolverMockRecorder is the mock recorder for MockResolver. +type MockResolverMockRecorder struct { + mock *MockResolver +} + +// NewMockResolver creates a new mock instance. +func NewMockResolver(ctrl *gomock.Controller) *MockResolver { + mock := &MockResolver{ctrl: ctrl} + mock.recorder = &MockResolverMockRecorder{mock} + return mock +} + +// EXPECT returns an object that allows the caller to indicate expected use. +func (m *MockResolver) EXPECT() *MockResolverMockRecorder { + return m.recorder +} + +// ChangesForBatch mocks base method. +func (m *MockResolver) ChangesForBatch(ctx context.Context, batch entity.Batch) ([]entity.Change, error) { + m.ctrl.T.Helper() + ret := m.ctrl.Call(m, "ChangesForBatch", ctx, batch) + ret0, _ := ret[0].([]entity.Change) + ret1, _ := ret[1].(error) + return ret0, ret1 +} + +// ChangesForBatch indicates an expected call of ChangesForBatch. +func (mr *MockResolverMockRecorder) ChangesForBatch(ctx, batch any) *gomock.Call { + mr.mock.ctrl.T.Helper() + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "ChangesForBatch", reflect.TypeOf((*MockResolver)(nil).ChangesForBatch), ctx, batch) +} + +// DetailedForBatch mocks base method. +func (m *MockResolver) DetailedForBatch(ctx context.Context, batch entity.Batch) (entity.BatchChanges, error) { + m.ctrl.T.Helper() + ret := m.ctrl.Call(m, "DetailedForBatch", ctx, batch) + ret0, _ := ret[0].(entity.BatchChanges) + ret1, _ := ret[1].(error) + return ret0, ret1 +} + +// DetailedForBatch indicates an expected call of DetailedForBatch. +func (mr *MockResolverMockRecorder) DetailedForBatch(ctx, batch any) *gomock.Call { + mr.mock.ctrl.T.Helper() + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "DetailedForBatch", reflect.TypeOf((*MockResolver)(nil).DetailedForBatch), ctx, batch) +} diff --git a/submitqueue/core/changeset/resolver.go b/submitqueue/core/changeset/resolver.go new file mode 100644 index 00000000..28403d7b --- /dev/null +++ b/submitqueue/core/changeset/resolver.go @@ -0,0 +1,77 @@ +// Copyright (c) 2025 Uber Technologies, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package changeset + +import ( + "context" + "fmt" + + "github.com/uber/submitqueue/submitqueue/entity" + "github.com/uber/submitqueue/submitqueue/extension/storage" +) + +// resolver is the store-backed Resolver. It owns the two resolution-target +// stores and nothing else: a request store to walk batch.Contains, and a change +// store to attach provider details for the Detailed view. +type resolver struct { + requests storage.RequestStore + changes storage.ChangeStore +} + +// New returns a Resolver backed by the given request and change stores. +func New(requests storage.RequestStore, changes storage.ChangeStore) Resolver { + return resolver{requests: requests, changes: changes} +} + +// ChangesForBatch resolves a batch's requests to their raw changes, in +// batch.Contains order. +func (r resolver) ChangesForBatch(ctx context.Context, batch entity.Batch) ([]entity.Change, error) { + changes := make([]entity.Change, 0, len(batch.Contains)) + for _, requestID := range batch.Contains { + request, err := r.requests.Get(ctx, requestID) + if err != nil { + return nil, fmt.Errorf("failed to get request %s for batch %s: %w", requestID, batch.ID, err) + } + changes = append(changes, request.Change) + } + return changes, nil +} + +// DetailedForBatch resolves a batch into the normalized entity.BatchChanges: one +// ChangeInfo per claimed URI, owned by the requesting request, aggregated across +// the whole batch. +func (r resolver) DetailedForBatch(ctx context.Context, batch entity.Batch) (entity.BatchChanges, error) { + result := entity.BatchChanges{BatchID: batch.ID, Queue: batch.Queue} + for _, requestID := range batch.Contains { + request, err := r.requests.Get(ctx, requestID) + if err != nil { + return entity.BatchChanges{}, fmt.Errorf("failed to get request %s: %w", requestID, err) + } + for _, uri := range request.Change.URIs { + records, err := r.changes.GetByURI(ctx, batch.Queue, uri) + if err != nil { + return entity.BatchChanges{}, fmt.Errorf("failed to read change record for request %s uri=%s: %w", requestID, uri, err) + } + for _, rec := range records { + if rec.RequestID != requestID { + continue + } + result.Changes = append(result.Changes, entity.ChangeInfo{URI: rec.URI, Details: rec.Details}) + break + } + } + } + return result, nil +} diff --git a/submitqueue/core/changeset/resolver_test.go b/submitqueue/core/changeset/resolver_test.go new file mode 100644 index 00000000..407bdabd --- /dev/null +++ b/submitqueue/core/changeset/resolver_test.go @@ -0,0 +1,113 @@ +// Copyright (c) 2025 Uber Technologies, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package changeset + +import ( + "context" + "errors" + "testing" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + "go.uber.org/mock/gomock" + + "github.com/uber/submitqueue/submitqueue/entity" + storagemock "github.com/uber/submitqueue/submitqueue/extension/storage/mock" +) + +func req(id string, uris ...string) entity.Request { + return entity.Request{ID: id, Change: entity.Change{URIs: uris}} +} + +func TestResolverChanges(t *testing.T) { + ctrl := gomock.NewController(t) + reqs := storagemock.NewMockRequestStore(ctrl) + changes := storagemock.NewMockChangeStore(ctrl) + r := New(reqs, changes) + + reqs.EXPECT().Get(gomock.Any(), "r2").Return(req("r2", "u2"), nil) + reqs.EXPECT().Get(gomock.Any(), "r3").Return(req("r3", "u3"), nil) + + got, err := r.ChangesForBatch(context.Background(), entity.Batch{ID: "q/batch/2", Contains: []string{"r2", "r3"}}) + require.NoError(t, err) + // request order within the batch is preserved. + assert.Equal(t, []entity.Change{{URIs: []string{"u2"}}, {URIs: []string{"u3"}}}, got) +} + +func TestResolverChangesEmpty(t *testing.T) { + ctrl := gomock.NewController(t) + r := New(storagemock.NewMockRequestStore(ctrl), storagemock.NewMockChangeStore(ctrl)) + + got, err := r.ChangesForBatch(context.Background(), entity.Batch{ID: "q/batch/1"}) + require.NoError(t, err) + assert.Empty(t, got) +} + +func TestResolverChangesRequestError(t *testing.T) { + ctrl := gomock.NewController(t) + reqs := storagemock.NewMockRequestStore(ctrl) + r := New(reqs, storagemock.NewMockChangeStore(ctrl)) + + sentinel := errors.New("not found") + reqs.EXPECT().Get(gomock.Any(), "r1").Return(entity.Request{}, sentinel) + + _, err := r.ChangesForBatch(context.Background(), entity.Batch{ID: "q/batch/1", Contains: []string{"r1"}}) + require.ErrorIs(t, err, sentinel) +} + +func TestResolverDetailed(t *testing.T) { + ctrl := gomock.NewController(t) + reqs := storagemock.NewMockRequestStore(ctrl) + changes := storagemock.NewMockChangeStore(ctrl) + r := New(reqs, changes) + + batch := entity.Batch{ID: "q/batch/1", Queue: "q", Contains: []string{"r1", "r2"}} + reqs.EXPECT().Get(gomock.Any(), "r1").Return(req("r1", "u1"), nil) + reqs.EXPECT().Get(gomock.Any(), "r2").Return(req("r2", "u2"), nil) + + d1 := entity.ChangeDetails{ChangedFiles: []entity.ChangedFile{{Path: "a.go", LinesAdded: 3}}} + d2 := entity.ChangeDetails{ChangedFiles: []entity.ChangedFile{{Path: "b.go", LinesAdded: 5}}} + // GetByURI returns rows for every request that ever claimed the URI; the + // resolver must pick the row owned by the requesting request. + changes.EXPECT().GetByURI(gomock.Any(), "q", "u1").Return([]entity.ChangeRecord{ + {URI: "u1", RequestID: "other", Details: entity.ChangeDetails{}}, + {URI: "u1", RequestID: "r1", Details: d1}, + }, nil) + changes.EXPECT().GetByURI(gomock.Any(), "q", "u2").Return([]entity.ChangeRecord{ + {URI: "u2", RequestID: "r2", Details: d2}, + }, nil) + + got, err := r.DetailedForBatch(context.Background(), batch) + require.NoError(t, err) + assert.Equal(t, entity.BatchChanges{ + BatchID: "q/batch/1", + Queue: "q", + Changes: []entity.ChangeInfo{{URI: "u1", Details: d1}, {URI: "u2", Details: d2}}, + }, got) +} + +func TestResolverDetailedChangeStoreError(t *testing.T) { + ctrl := gomock.NewController(t) + reqs := storagemock.NewMockRequestStore(ctrl) + changes := storagemock.NewMockChangeStore(ctrl) + r := New(reqs, changes) + + sentinel := errors.New("read failed") + reqs.EXPECT().Get(gomock.Any(), "r1").Return(req("r1", "u1"), nil) + changes.EXPECT().GetByURI(gomock.Any(), "q", "u1").Return(nil, sentinel) + + _, err := r.DetailedForBatch(context.Background(), entity.Batch{ID: "q/batch/1", Queue: "q", Contains: []string{"r1"}}) + require.ErrorIs(t, err, sentinel) +} diff --git a/submitqueue/entity/batch_changes.go b/submitqueue/entity/batch_changes.go index d97e8a9c..ece494ad 100644 --- a/submitqueue/entity/batch_changes.go +++ b/submitqueue/entity/batch_changes.go @@ -14,11 +14,13 @@ package entity -// BatchChanges is the normalized, batch-level view of all changes in a batch, -// assembled by the score controller and handed to a Scorer. A Batch references -// only request IDs, so the controller resolves each request's change records and -// flattens their details into Changes — giving the scorer the whole batch's -// change facts in one value without coupling it to storage. +// BatchChanges is the normalized, batch-level view of all changes in a batch: +// one ChangeInfo per claimed URI, aggregated across every request the batch +// contains. It is produced by the shared changeset resolver (Resolver.DetailedForBatch) +// and consumed by a Scorer. A Batch references only request IDs, so the resolver +// resolves each request's change records and flattens their details into Changes — +// giving the scorer the whole batch's change facts in one value without coupling +// it to storage. type BatchChanges struct { // BatchID is the batch being scored. Format: "/batch/". BatchID string