fix(harbor): nested-run flags/dedup/resume, archive + secret guards [fixes 3/4 compiler] #9
Merged
…mkdtemp cleanup

Mode B runner + build-compiler robustness (review findings on PR #5):

- runner: emit --n-attempts / --max-retries (the typed HarborConfig fields were silently dropped); pick the best trial per task deterministically instead of last-write-wins over an unordered rglob; treat a persisted error sample as not-done so a transient failure is re-run on resume.
- compiler: assert git archive exited 0 (and reap it) so a truncated stream cannot bake a near-empty baseline; validate declared secrets at build time and render compose secrets with a fail-fast guard so an unset host var fails loudly instead of producing a credential-less sidecar.
- seed.sh: document that read_only_paths is advisory only and scorer provenance is enforced sidecar-side.
- compiler: stage the dataset in a cleaned-up TemporaryDirectory (Greptile: the mkdtemp scratch dir was leaking, datasets can be gigabytes).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
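The git archive guard in the commit message (check the exit code, capture stderr, reap the process) could look roughly like the sketch below. It is not the PR's code: the helper name `unpack_archive_stream`, its signature, and the choice to spool the stream to a temporary file before unpacking are all assumptions made for illustration.

```python
import subprocess
import tarfile
import tempfile
from pathlib import Path
from typing import Sequence


def unpack_archive_stream(cmd: Sequence[str], dest: Path) -> None:
    """Run a tar-producing command and unpack its output into dest.

    Hypothetical helper: the name and signature are illustrative only.
    """
    dest.mkdir(parents=True, exist_ok=True)
    # Spool the stream to a temp file so stderr can be captured separately
    # without risking a pipe deadlock on large archives.
    with tempfile.TemporaryFile() as spool:
        # subprocess.run waits for the child, so the process is always reaped.
        proc = subprocess.run(list(cmd), stdout=spool, stderr=subprocess.PIPE)
        if proc.returncode != 0:
            detail = proc.stderr.decode(errors="replace").strip()
            raise RuntimeError(
                f"archive command exited {proc.returncode}: {detail}"
            )
        # Only unpack once the producer is known to have succeeded.
        spool.seek(0)
        with tarfile.open(fileobj=spool, mode="r:") as tar:
            tar.extractall(dest)
```

For the case in the commit message, the command would presumably be something like `["git", "-C", str(repo), "archive", "--format=tar", "HEAD"]`. Because nothing is unpacked until the exit code has been checked, a truncated stream raises instead of producing a near-empty baseline.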
Comment on lines +246 to +258
# 4b. Fail early if a declared secret is missing from the host env, so the
# operator finds out at build time rather than via a credential-less
# sidecar. The compose ${VAR:?} guard is the run-time backstop.
import os

if not os.environ.get("VERO_SKIP_SECRET_CHECK"):
    missing = [s for s in config.secrets if not os.environ.get(s)]
    if missing:
        raise ValueError(
            "Declared secrets missing from the host environment: "
            f"{', '.join(missing)}. Set them, or set VERO_SKIP_SECRET_CHECK=1 "
            "to defer to the run-time compose check."
        )
Secret check runs after expensive build steps
The presence check for declared secrets (step 4b) executes after _prepare_baseline_repo (git archive of the agent repo) and the dataset staging block, both of which can take minutes for large repos or multi-gigabyte datasets. If a required secret is absent, the operator waits through those steps before learning about the missing credential. Moving this check to the top of compile_task — before step 1 — would fail fast and avoid the partial-state output directory that is left behind on failure.
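The reordering suggested in this comment can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual `compile_task`: `check_declared_secrets` and `compile_task_sketch` are hypothetical names, and the expensive build steps are stood in for by plain callables. Only the `VERO_SKIP_SECRET_CHECK` opt-out and the error wording come from the diff above.

```python
import os
from typing import Callable, Iterable, List, Mapping, Optional


def check_declared_secrets(
    secrets: Iterable[str],
    env: Optional[Mapping[str, str]] = None,
) -> None:
    """Raise ValueError if any declared secret is unset or empty in env."""
    env = os.environ if env is None else env
    if env.get("VERO_SKIP_SECRET_CHECK"):
        return  # operator opted out; defer to the run-time compose check
    missing = [name for name in secrets if not env.get(name)]
    if missing:
        raise ValueError(
            "Declared secrets missing from the host environment: "
            + ", ".join(missing)
        )


def compile_task_sketch(
    secrets: Iterable[str],
    expensive_steps: Iterable[Callable[[], object]],
    env: Optional[Mapping[str, str]] = None,
) -> List[object]:
    """Hypothetical stand-in for compile_task with the check moved first."""
    # Cheap validation runs before any slow step (git archive, dataset
    # staging), so a missing secret fails before anything is written.
    check_declared_secrets(secrets, env)
    return [step() for step in expensive_steps]
```

With this ordering, a missing secret raises before any expensive step runs, so no partial output directory is left behind by those steps.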
This was referenced Jul 2, 2026
Stacks on #5 (harbor-3-compiler). Addresses the Mode B runner + build-compiler findings. 3 of 4 fix PRs.

What this fixes

- Finding: HarborConfig.n_attempts / max_retries are typed fields that _build_command never emits, so nested runs use harbor defaults. Fix: emit --n-attempts / --max-retries from the configured values.
- Finding: _load_trials is last-write-wins over an unordered rglob; a failing retry can clobber a passing trial. Fix: pick the best trial per task deterministically.
- Finding: a persisted error sample is treated as done, so a transient failure is never re-run on resume. Fix: treat a persisted error sample as not-done.
- Finding: _prepare_baseline_repo discards git archive's exit code, so a truncated stream can bake a near-empty baseline. Fix: assert git archive exited 0 (capture stderr, reap the process); fail the build otherwise.
- Finding: an unset host variable produces a credential-less sidecar without any error. Fix: ${VAR:?msg} compose form + a build-time presence check (opt-out VERO_SKIP_SECRET_CHECK).
- Finding: read_only_paths perms read as tamper-protection but the verifier checks out git blobs, not the working tree. Fix: document in seed.sh that read_only_paths is advisory only.

Greptile finding on PR #5, also fixed here (reply posted on #5)

- compiler.py: the dataset-staging dir is now a tempfile.TemporaryDirectory(), cleaned up on success and on exception (datasets can be gigabytes).

Tests

18 passed (test_harbor_runner.py, test_harbor_build.py): attempts/retries flags emitted; a passing trial wins over a later failing retry; resume re-runs a persisted error sample; a failing git archive raises; a missing declared secret fails the build; the rendered seed.sh documents the advisory caveat.

🤖 Generated with Claude Code
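The flag-emission fix from the PR description could be sketched like this. It is a rough illustration, not the real `_build_command`: `NestedRunConfig` is a hypothetical stand-in for HarborConfig, and treating the two fields as optional integers is an assumption. The flag names and the `-i task_name` argument are taken from the PR's own sequence diagram.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NestedRunConfig:
    """Hypothetical stand-in for HarborConfig; field types are assumed."""

    n_attempts: Optional[int] = None
    max_retries: Optional[int] = None


def build_command(config: NestedRunConfig, task_name: str) -> List[str]:
    """Build the nested `harbor run` argv, forwarding configured values."""
    cmd = ["harbor", "run"]
    # Emit each flag only when configured, so an unset field still falls
    # back to harbor's own default instead of being overridden.
    if config.n_attempts is not None:
        cmd += ["--n-attempts", str(config.n_attempts)]
    if config.max_retries is not None:
        cmd += ["--max-retries", str(config.max_retries)]
    cmd += ["-i", task_name]
    return cmd
```

Building the command as a list rather than a shell string keeps the values from needing any quoting when passed to `subprocess`.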
Greptile Summary
This PR fixes six issues in the Harbor Mode-B runner and build compiler, covering correctness bugs around nested-run flag propagation, non-deterministic trial selection, error-sample resume skipping, git archive exit-code checking, silent empty-secret injection, and missing documentation on read_only_paths scope.

- --n-attempts and --max-retries are now emitted in _build_command; _load_trials picks the best trial per task deterministically (clean+rewards beats failure, ties to latest attempt) rather than using undefined rglob order; _is_done re-runs a persisted error sample on resume instead of permanently skipping it.
- git archive exit code is now asserted after reaping the process; declared secrets are validated against the host environment at build time with an opt-out escape hatch (VERO_SKIP_SECRET_CHECK); the compose template uses ${VAR:?msg} fail-fast interpolation; seed.sh documents read_only_paths as advisory only; the dataset-staging temp dir is now a TemporaryDirectory context manager that cleans up on both success and exception.

Confidence Score: 4/5
Safe to merge; all six targeted defects are fixed with direct test coverage and the changes are tightly scoped to the harbor runner and compiler.
The secret-presence check is placed after the git archive and dataset-staging steps rather than at the top of compile_task. An operator with a missing secret waits through potentially slow operations before getting the error, and the output directory is left in a partial state. Everything else — flag emission, deterministic trial selection, error-sample resume, git archive exit assertion, compose fail-fast interpolation — looks correct and is covered by the new tests.
vero/src/vero/harbor/build/compiler.py — the ordering of the secret check relative to the expensive build steps.
Important Files Changed
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant V as vero (HarborRunner)
    participant H as harbor run (nested)
    participant FS as jobs_dir filesystem
    participant DB as SampleResult store
    V->>DB: _is_done(sample_id)?
    alt result exists AND not is_error()
        DB-->>V: "done=True (skip)"
    else missing OR is_error()
        DB-->>V: "done=False (pending)"
        V->>H: harbor run --n-attempts N --max-retries M -i task_name
        H->>FS: "write jobs/<ts>/<trial>/result.json (per attempt)"
        H-->>V: exit (non-zero tolerated)
    end
    V->>FS: rglob result.json
    FS-->>V: all trial files (unordered)
    V->>V: "_trial_rank: pick best per task (clean+rewards > rewards > latest finished_at)"
    V->>DB: save_sample_result (score or error)
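The ranking step in the diagram (clean+rewards > rewards > latest finished_at) might be implemented along these lines. This is a sketch under assumptions: the `Trial` shape, the field names other than `finished_at`, and the helper names `trial_rank` and `pick_best` are hypothetical, and `finished_at` is assumed to be an ISO-8601 string so that string order matches time order.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Trial:
    """Hypothetical view of one result.json; the real schema may differ."""

    task_name: str
    finished_at: str  # assumed ISO-8601, so lexicographic order is time order
    rewards: Optional[dict] = None
    error: Optional[str] = None


def trial_rank(trial: Trial) -> Tuple[int, str]:
    """Sort key: higher tuples are better trials."""
    if trial.rewards is not None and trial.error is None:
        tier = 2  # clean run with rewards
    elif trial.rewards is not None:
        tier = 1  # rewards present, but the trial also recorded an error
    else:
        tier = 0  # no rewards at all
    return (tier, trial.finished_at)


def pick_best(trials: Iterable[Trial]) -> Dict[str, Trial]:
    """Keep the highest-ranked trial per task, independent of input order."""
    best: Dict[str, Trial] = {}
    for trial in trials:
        current = best.get(trial.task_name)
        if current is None or trial_rank(trial) > trial_rank(current):
            best[trial.task_name] = trial
    return best
```

Because the winner is chosen by rank rather than by arrival, the unordered `rglob` listing no longer matters. Two trials with an identical tier and timestamp would still resolve by input order in this sketch, so a real implementation may want a further tie-breaker.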
Reviews (1): Last reviewed commit: "fix(harbor): nested-run flags/dedup/resu..."