docs(harbor): honest integrity guarantees, GAIA leak caveat, Mode A example [fixes 4/4 docs] #10
Merged
…xample, by-mode scorer

Documentation accuracy fixes (review findings on PR #6):

- architecture: soften the intro from a hard guarantee ("the optimizer cannot read hidden labels, modify the scorer, or bypass its budget") to best-effort, OS/process-level language describing what is actually enforced.
- gaia build.yaml: correct "never reaches the optimizer". Because agent_repo is "." and build.yaml is git-tracked, the validation task ids ARE seeded into the optimizer's repo; only the per-sample scores are withheld. Acceptable for a public benchmark, with a caveat + mitigations for secret-identity benchmarks.
- examples: gsm8k-agent is cited as the Mode A example but ships no build.yaml; repoint to gaia-optimization as the complete runnable example and pair gsm8k-agent with the tutorial's Mode A snippet.
- architecture: document the current fail-open default for unlisted splits (and that it becomes fail-closed once the protocol fix lands), and split the "scorer is sidecar-only" claim by mode (true for Mode B; Mode A keeps the scorer in the agent's editable repo until the serve.py fix).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
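The fail-open versus fail-closed distinction for unlisted splits can be illustrated with a minimal sketch. This is not Harbor's actual protocol code: the function name `split_is_exposed`, the `visibility` mapping, and the `fail_closed` flag are all hypothetical and exist only to show how the default changes what an unlisted split reveals.

```python
from typing import Mapping


def split_is_exposed(
    split: str,
    visibility: Mapping[str, bool],
    *,
    fail_closed: bool,
) -> bool:
    """Return True if per-sample data for `split` would be exposed.

    Hypothetical helper: `visibility` maps a split name to an explicit
    expose/withhold decision. Splits missing from the mapping fall back
    to the default selected by `fail_closed`.
    """
    if split in visibility:
        # An explicit entry always wins over the default.
        return visibility[split]
    # Unlisted split: fail-open exposes it, fail-closed withholds it.
    return not fail_closed


# Assumed example policy: only two splits are listed explicitly.
visibility = {"train": True, "validation": False}

# Fail-open default: a split nobody listed is exposed.
exposed_open = split_is_exposed("test", visibility, fail_closed=False)

# Fail-closed default: the same unlisted split is withheld.
exposed_closed = split_is_exposed("test", visibility, fail_closed=True)
```

Under the fail-open default, forgetting to list a split silently exposes it; under fail-closed, the same omission withholds it, which is the safer failure mode for held-out data.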
Stacks on #6 (harbor-4-docs). Documentation-accuracy fixes from the review of that PR. 4 of 4 fix PRs.

What this fixes

- `build.yaml` says the validation split "never reaches the optimizer", but `agent_repo: .` + a git-tracked `build.yaml` seeds the held-out task ids into the optimizer's repo.
- `gsm8k-agent` is cited as the Mode A example but ships no `build.yaml`. `gaia-optimization` is the complete runnable example; `gsm8k-agent` is the Mode A agent reference, paired with the tutorial's Mode A `build.yaml` snippet.

Prose/yaml only; `build.yaml` still parses. These align the docs with the behavior after the core/sidecar fix PRs (#7, #8) land.

🤖 Generated with Claude Code
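The leak condition described above (a config file that sits inside the optimizer's repo and is tracked by git) can be sketched as a small check. This is an illustration under stated assumptions, not part of Harbor: `config_visible_to_optimizer` is a hypothetical helper, and the list of tracked files is passed in by the caller instead of being read from git.

```python
from pathlib import PurePosixPath
from typing import Iterable


def config_visible_to_optimizer(
    agent_repo: str,
    config_path: str,
    tracked: Iterable[str],
) -> bool:
    """Return True if `config_path` lies inside `agent_repo` and is tracked.

    Hypothetical helper. `tracked` holds repo-relative paths of tracked
    files, supplied by the caller. All paths use POSIX separators.
    """
    repo = PurePosixPath(agent_repo)
    cfg = PurePosixPath(config_path)
    try:
        # Raises ValueError when the config is outside the agent repo.
        rel = cfg.relative_to(repo)
    except ValueError:
        return False
    tracked_set = {str(PurePosixPath(t)) for t in tracked}
    return str(rel) in tracked_set


# agent_repo is "." and build.yaml is tracked: the ids in it are visible.
leaks = config_visible_to_optimizer(".", "build.yaml", ["build.yaml", "agent.py"])

# Same layout, but build.yaml is not tracked: nothing is seeded.
untracked = config_visible_to_optimizer(".", "build.yaml", ["agent.py"])

# Config kept outside the agent repo: not visible through the repo.
outside = config_visible_to_optimizer("agent", "config/build.yaml", ["main.py"])
```

The check only covers exposure through the repo itself; it says nothing about other channels by which task ids might reach the optimizer.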
Greptile Summary
This PR updates Harbor documentation to describe the current integrity limits more accurately. The main changes are:
- `gaia-optimization` `build.yaml` caveat about held-out task IDs being visible.

Confidence Score: 4/5
Merge is mostly safe after the README wording is aligned with the caveated integrity model documented elsewhere.
The changes are documentation-only and generally improve accuracy, but one prominent README paragraph still preserves an over-strong guarantee that can mislead users about the actual isolation properties.
vero/README.md
Comments Outside Diff (1)
vero/README.md, line 530: … `agent_repo` and `read_only_paths`, fail-open unlisted splits, and the GAIA identity caveat. A user who reads only the README can still rely on guarantees the implementation does not provide, so this paragraph should be softened or pointed at the caveated integrity model.
Reviews (1): Last reviewed commit: "docs(harbor): honest integrity guarantee..."