Add coherence-bootstrap spec tracking#150
Conversation
Child steps that boot from a cached parent snapshot now receive fresh source files via tar injection, preventing stale workspace data from leaking across builds.
cp -cR can fail with nonzero exit (cross-APFS-volume) but or_else only caught spawn failures. Now falls back to cp -R on either case.
Add optional workspace_dir column to the snapshots table with an idempotent schema migration. Change eviction return type from Vec<SnapshotId> to Vec<(SnapshotId, Option<String>)> so callers can clean up workspace directories alongside backend snapshots. New methods: put_with_workspace() and get_with_workspace(). Updated vm.rs eviction loop to remove workspace dirs on eviction.
…to cache Replace the tar-upload inject() mechanism with Docker bind mounts for workspace directories. The workspace is now mounted at container creation time via HostConfig.binds, eliminating the tar_directory() helper and UploadToContainerOptions usage entirely. Key changes: - Add WorkspaceMount type (host_path + guest_path) to types.rs - Add workspace_dir field to ExecutionResult for downstream consumption - Add workspace_cache_dir to VmConfig for COW cache persistence - Update VmBackend::create/restore to accept Optional<&WorkspaceMount> - Remove Vm::inject() from the trait and all implementations - Delete tar_directory() and remove tar crate dependency - Persist workspace via cow_copy after snapshot on cache miss - Store workspace path in registry via put_with_workspace on cache store - Return cached workspace path from registry on cache hit - Clean up evicted workspace directories alongside snapshots - Update VmRunner to construct WorkspaceMount instead of inject path
- Add parent_workspace_dir to StepContext, workspace_dir to StepResult - VmRunner: COW copy parent workspace for child steps, extract fresh for root - Scheduler: propagate workspace_dir through StepOutcome to children
- Format all Rust files - Collapse nested if-let to tuple pattern (clippy) - Replace map().unwrap_or() with map_or() (clippy) - Add #[doc] Errors section to cow_copy (clippy pedantic)
- Skip workspace persistence for CachingPolicy::None (fixes parallel uncached steps racing on shared "ephemeral" dir) - Runner keeps step tempdir alive via TempDir::keep() for uncached steps so children can still COW-copy from parent workspace - Single get_with_workspace() call on cache hit (was double lookup) - invalidate() now returns workspace_dir for cleanup (was leaking orphaned workspace dirs on disk) - Format test_cache_predicate.py with ruff
- Registry: unify put/put_with_workspace into single put(key, snap, ws); fold eviction into single lock scope to fix TOCTOU race (C3+C4) - Ephemeral snapshots: UUID-based labels to avoid parallel collision (C1); scheduler cleanup removes Docker images and temp dirs after build (C2) - Cache hits: peek_cache() in VmRunner skips expensive COW copy (I3); validate workspace_dir still exists on disk before trusting cache (I5) - Blocking I/O: wrap cow_copy, remove_dir_all, create_dir_all in spawn_blocking to avoid starving the tokio runtime (I2) - workspace.rs: clear partial state between failed APFS clone and fallback copy (I6); use -p flag to preserve permissions (I7); include src/dst paths in error messages - Python DSL: validate predicate callback is callable; wrap exceptions and reject None returns (I8)
…ots only Rearchitect workspace flow to kill the stale-source bug class (CLI-28): cached snapshots now carry ONLY system state (docker commit); workspace state is strictly run-scoped and rebases to the current run's fresh source at every cache-hit boundary. A cache-hit step contributes its snapshot for container lineage but no workspace dir; its children COW from the run's fresh source extract instead of a stale persisted copy. Core changes: - registry: workspace_dir always written NULL (legacy rows reaped); put() runs upsert+eviction in one transaction with DELETE..RETURNING; invalidate_if() CAS closes the stale-entry/fresh-insert race; contains_snapshot() backs all guarded image removals - vm: HmVm::execute takes a CancellationToken — cooperative cancel with awaited teardown (destroy/chown reclaim strictly happens-before the runner drops the workspace tempdir); evictions deferred to end-of-run (cleanup_deferred_evictions) so in-flight readers never lose their tag; put-failure demotes the snapshot to ephemeral instead of leaking it; gc_orphaned_snapshots sweeps aged rowless tags at startup - scheduler: per-step workspace refcounting frees a parent's kept dir the moment its last BuildsIn consumer finishes (temp footprint now tracks the live DAG frontier); per-step timeout fires a child token and AWAITS teardown instead of dropping the future; ephemeral snapshot cleanup is guarded by contains_snapshot so a concurrently re-registered tag survives - runner: once-per-run shared source extract (Arc<OnceCell<TempDir>>); cow_copy call sites moved into spawn_blocking; cache-hit path does zero workspace prep - workspace: single-attempt cp with captured stderr (cp -c self-falls back to copyfile(2); the manual retry masked real errors) - docker: per-tag GC removal (untag-safe for multi-tag images); exec quiesce on cancelled commands before chown reclaim - DSL: forever() docs state the hit-boundary semantics (workspace writes are not replayed across runs); predicate keygen cross-language tests Semantic change: on a cache hit, files a step wrote into /workspace are no longer visible to downstream steps across runs — downstream sees the current run's source tree. Write build outputs to system paths or use on_change() when workspace outputs must survive hits. Acceptance: deep_cache_chain_workspace passes from warm and cold cache, asserting [a]/[b] cache hits AND fresh marker content (the CLI-28 contract). local_fork_cache + local_parallelism pass. Test fixtures modernized from the obsolete def build()/.run() DSL form.
…workspace-broken # Conflicts: # crates/hm-exec/src/local/backend.rs # crates/hm-exec/src/local/runner/vm.rs # crates/hm-exec/src/local/scheduler.rs # crates/hm-vm/src/docker.rs # crates/hm-vm/src/registry.rs # crates/hm-vm/src/vm.rs
…workspace-broken # Conflicts: # crates/hm-exec/src/local/scheduler.rs
Initialize coherence-bootstrap spec tracking for harmont-cli. 6 specs with 38 acceptance criteria linked to existing tests via the Dolt-backed codeintel store.
|
Very cool to see the specs examples! There's a lot of rough edges, and current version is not really ready for production use.
To make sure everyone can sync Coherence specs for the project and execute tests, you need to export Once I figure out deterministic way to reverse-engineer specs, I will try to do it and push a PR with 5-level taxonomy. On In my example taxonomy,
|
Summary
.coherence/project.toml)Specs created
SPEC-dsl-renderingSPEC-local-execSPEC-cachingSPEC-cloudSPEC-cli-uxSPEC-pipeline-irGaps (SKIP — no test exists yet)
hm cache cleancommandTest plan
coherence-bootstrap verify-spec SPEC-pipeline-ir— all 4 passcoherence-bootstrap verify-spec SPEC-dsl-rendering— all 7 passcoherence-bootstrap verify-spec SPEC-cli-ux— all 6 passcoherence-bootstrap verify-spec SPEC-local-exec— all 8 pass (some Docker-gated)coherence-bootstrap verify-spec SPEC-caching— all 7 pass (some Docker-gated)coherence-bootstrap verify-spec SPEC-cloud— all 6 pass (some SKIP for live API)