
Add coherence-bootstrap spec tracking #150

Draft
markovejnovic wants to merge 19 commits into main from try-coherence

@markovejnovic

Summary

  • Initialize coherence-bootstrap project manifest (.coherence/project.toml)
  • 6 specs with 38 acceptance criteria covering all major harmont-cli subsystems
  • All ACs linked to existing tests via the Dolt-backed codeintel store

Specs created

| Spec | ACs | Coverage |
| --- | --- | --- |
| SPEC-dsl-rendering | 7 | Python/TS rendering, fork, wait, toolchains, errors, cache policy |
| SPEC-local-exec | 8 | Docker execution, parallelism, keep-going, fail-fast, workspace inheritance, timeouts, events, preflight |
| SPEC-caching | 7 | Cache hit/miss, on_change, TTL, workspace propagation, ephemeral cleanup, cache clean |
| SPEC-cloud | 6 | Browser/paste login, env token, build submission, auth preflight, cancel |
| SPEC-cli-ux | 6 | Init templates, JSON/human output, exit codes, pipeline listing, config layering |
| SPEC-pipeline-ir | 4 | DAG validation, edge semantics, default image, env merging |

Gaps (SKIP — no test exists yet)

  • Step timeout enforcement
  • hm cache clean command
  • Browser OAuth / paste login / cloud build submit / cloud cancel (need live infra)

Test plan

  • coherence-bootstrap verify-spec SPEC-pipeline-ir — all 4 pass
  • coherence-bootstrap verify-spec SPEC-dsl-rendering — all 7 pass
  • coherence-bootstrap verify-spec SPEC-cli-ux — all 6 pass
  • coherence-bootstrap verify-spec SPEC-local-exec — all 8 pass (some Docker-gated)
  • coherence-bootstrap verify-spec SPEC-caching — all 7 pass (some Docker-gated)
  • coherence-bootstrap verify-spec SPEC-cloud — all 6 pass (some SKIP for live API)

Child steps that boot from a cached parent snapshot now receive
fresh source files via tar injection, preventing stale workspace
data from leaking across builds.

cp -cR can fail with a nonzero exit (cross-APFS-volume), but or_else
only caught spawn failures. Now falls back to cp -R in either case.
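
The distinction between a spawn failure and a nonzero exit can be sketched as follows. This is a minimal illustration, not the code from this PR: `succeeded` and `copy_tree` are hypothetical names, and only the `cp -cR` then `cp -R` order is taken from the commit message.

```rust
use std::io;
use std::path::Path;
use std::process::{Command, ExitStatus};

/// True only if the command was spawned AND exited with status 0.
/// `Err(_)` is a spawn failure; `Ok(status)` with a nonzero code is a
/// command failure. Both cases must trigger the fallback.
fn succeeded(outcome: &io::Result<ExitStatus>) -> bool {
    matches!(outcome, Ok(status) if status.success())
}

/// Hypothetical helper: try a clone copy first, then a plain recursive copy.
fn copy_tree(src: &Path, dst: &Path) -> io::Result<()> {
    let cloned = Command::new("cp").arg("-cR").arg(src).arg(dst).status();
    if succeeded(&cloned) {
        return Ok(());
    }
    // Reached on a spawn failure and on a nonzero exit alike.
    let plain = Command::new("cp").arg("-R").arg(src).arg(dst).status()?;
    if plain.success() {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("cp -R {} {} failed: {}", src.display(), dst.display(), plain),
        ))
    }
}
```

Chaining `or_else` on the `io::Result` alone would only cover the `Err` branch, which is the gap the commit describes.
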
Add optional workspace_dir column to the snapshots table with an
idempotent schema migration. Change eviction return type from
Vec<SnapshotId> to Vec<(SnapshotId, Option<String>)> so callers can
clean up workspace directories alongside backend snapshots.

New methods: put_with_workspace() and get_with_workspace().
Updated vm.rs eviction loop to remove workspace dirs on eviction.
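
A rough sketch of why the eviction result carries the workspace path: the caller can then delete the directory together with the backend snapshot. The in-memory `Registry`, its capacity rule, and the method signatures here are assumptions for illustration; the real registry is backed by a snapshots table and its signatures are not shown in this PR text.

```rust
use std::collections::VecDeque;

type SnapshotId = String; // assumption: the real SnapshotId type is not shown

/// Toy in-memory registry with a fixed capacity; the oldest entry is evicted first.
struct Registry {
    capacity: usize,
    // (cache key, snapshot, optional workspace_dir)
    entries: VecDeque<(String, SnapshotId, Option<String>)>,
}

impl Registry {
    fn new(capacity: usize) -> Self {
        Self { capacity, entries: VecDeque::new() }
    }

    fn put_with_workspace(&mut self, key: &str, snap: SnapshotId, workspace_dir: Option<String>) {
        self.entries.retain(|(k, _, _)| k.as_str() != key);
        self.entries.push_back((key.to_string(), snap, workspace_dir));
    }

    fn get_with_workspace(&self, key: &str) -> Option<(&SnapshotId, Option<&str>)> {
        self.entries
            .iter()
            .find(|(k, _, _)| k.as_str() == key)
            .map(|(_, snap, ws)| (snap, ws.as_deref()))
    }

    /// Returns each evicted snapshot paired with its workspace dir, if any.
    fn evict_over_capacity(&mut self) -> Vec<(SnapshotId, Option<String>)> {
        let mut evicted = Vec::new();
        while self.entries.len() > self.capacity {
            if let Some((_, snap, ws)) = self.entries.pop_front() {
                evicted.push((snap, ws));
            }
        }
        evicted
    }
}

/// Caller-side cleanup: remove workspace dirs of evicted entries.
/// Returns how many directories were actually removed.
fn remove_evicted_workspaces(evicted: &[(SnapshotId, Option<String>)]) -> usize {
    let mut removed = 0;
    for (_snap, ws) in evicted {
        if let Some(dir) = ws {
            if std::fs::remove_dir_all(dir).is_ok() {
                removed += 1;
            }
        }
    }
    removed
}
```
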
…to cache

Replace the tar-upload inject() mechanism with Docker bind mounts for
workspace directories. The workspace is now mounted at container creation
time via HostConfig.binds, eliminating the tar_directory() helper and
UploadToContainerOptions usage entirely.
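
As an illustration of the bind-mount approach, a mount can be rendered into the `host:guest` string form that Docker accepts in `HostConfig.Binds`. The field types and the `to_bind` and `binds_for` helpers are assumptions; only the `WorkspaceMount` name and its two fields come from this commit.

```rust
use std::path::PathBuf;

/// Assumed shape: the commit only names the two fields, not their types.
struct WorkspaceMount {
    host_path: PathBuf,
    guest_path: PathBuf,
}

impl WorkspaceMount {
    /// Renders the mount as a Docker bind spec, "host:guest".
    fn to_bind(&self) -> String {
        format!("{}:{}", self.host_path.display(), self.guest_path.display())
    }
}

/// Hypothetical helper: no mount means no binds are set on the container.
fn binds_for(mount: Option<&WorkspaceMount>) -> Option<Vec<String>> {
    mount.map(|m| vec![m.to_bind()])
}
```

Because the directory is attached when the container is created, nothing has to be packed into a tar archive and uploaded afterwards.
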

Key changes:
- Add WorkspaceMount type (host_path + guest_path) to types.rs
- Add workspace_dir field to ExecutionResult for downstream consumption
- Add workspace_cache_dir to VmConfig for COW cache persistence
- Update VmBackend::create/restore to accept Option<&WorkspaceMount>
- Remove Vm::inject() from the trait and all implementations
- Delete tar_directory() and remove tar crate dependency
- Persist workspace via cow_copy after snapshot on cache miss
- Store workspace path in registry via put_with_workspace on cache store
- Return cached workspace path from registry on cache hit
- Clean up evicted workspace directories alongside snapshots
- Update VmRunner to construct WorkspaceMount instead of inject path
- Add parent_workspace_dir to StepContext, workspace_dir to StepResult
- VmRunner: COW copy parent workspace for child steps, extract fresh for root
- Scheduler: propagate workspace_dir through StepOutcome to children
- Format all Rust files
- Collapse nested if-let to tuple pattern (clippy)
- Replace map().unwrap_or() with map_or() (clippy)
- Add "# Errors" doc section to cow_copy (clippy pedantic)
- Skip workspace persistence for CachingPolicy::None (fixes parallel
  uncached steps racing on shared "ephemeral" dir)
- Runner keeps step tempdir alive via TempDir::keep() for uncached
  steps so children can still COW-copy from parent workspace
- Single get_with_workspace() call on cache hit (was double lookup)
- invalidate() now returns workspace_dir for cleanup (was leaking
  orphaned workspace dirs on disk)
- Format test_cache_predicate.py with ruff
- Registry: unify put/put_with_workspace into single put(key, snap, ws);
  fold eviction into single lock scope to fix TOCTOU race (C3+C4)
- Ephemeral snapshots: UUID-based labels to avoid parallel collision (C1);
  scheduler cleanup removes Docker images and temp dirs after build (C2)
- Cache hits: peek_cache() in VmRunner skips expensive COW copy (I3);
  validate workspace_dir still exists on disk before trusting cache (I5)
- Blocking I/O: wrap cow_copy, remove_dir_all, create_dir_all in
  spawn_blocking to avoid starving the tokio runtime (I2)
- workspace.rs: clear partial state between failed APFS clone and
  fallback copy (I6); use -p flag to preserve permissions (I7);
  include src/dst paths in error messages
- Python DSL: validate predicate callback is callable; wrap exceptions
  and reject None returns (I8)
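
The I5 check in the list above (validate the recorded workspace before trusting the cache) could look roughly like this. The function name is hypothetical, and the real code may treat a missing directory differently, for example by invalidating the cache entry.

```rust
use std::path::Path;

/// Keeps a recorded workspace path only if it still exists on disk as a directory.
/// A stale path is treated the same as having no workspace at all.
fn trusted_workspace(recorded: Option<&Path>) -> Option<&Path> {
    recorded.filter(|dir| dir.is_dir())
}
```
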
…ots only

Rearchitect workspace flow to kill the stale-source bug class (CLI-28):
cached snapshots now carry ONLY system state (docker commit); workspace
state is strictly run-scoped and rebases to the current run's fresh
source at every cache-hit boundary. A cache-hit step contributes its
snapshot for container lineage but no workspace dir; its children COW
from the run's fresh source extract instead of a stale persisted copy.

Core changes:
- registry: workspace_dir always written NULL (legacy rows reaped);
  put() runs upsert+eviction in one transaction with DELETE..RETURNING;
  invalidate_if() CAS closes the stale-entry/fresh-insert race;
  contains_snapshot() backs all guarded image removals
- vm: HmVm::execute takes a CancellationToken — cooperative cancel with
  awaited teardown (destroy/chown reclaim strictly happens-before the
  runner drops the workspace tempdir); evictions deferred to end-of-run
  (cleanup_deferred_evictions) so in-flight readers never lose their
  tag; put-failure demotes the snapshot to ephemeral instead of leaking
  it; gc_orphaned_snapshots sweeps aged rowless tags at startup
- scheduler: per-step workspace refcounting frees a parent's kept dir
  the moment its last BuildsIn consumer finishes (temp footprint now
  tracks the live DAG frontier); per-step timeout fires a child token
  and AWAITS teardown instead of dropping the future; ephemeral
  snapshot cleanup is guarded by contains_snapshot so a concurrently
  re-registered tag survives
- runner: once-per-run shared source extract (Arc<OnceCell<TempDir>>);
  cow_copy call sites moved into spawn_blocking; cache-hit path does
  zero workspace prep
- workspace: single-attempt cp with captured stderr (cp -c self-falls
  back to copyfile(2); the manual retry masked real errors)
- docker: per-tag GC removal (untag-safe for multi-tag images); exec
  quiesce on cancelled commands before chown reclaim
- DSL: forever() docs state the hit-boundary semantics (workspace
  writes are not replayed across runs); predicate keygen
  cross-language tests
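
The per-step workspace refcounting from the scheduler item above can be sketched like this. `WorkspaceRefs`, `keep`, and `release` are hypothetical names; the sketch only shows the idea that a parent's kept directory is handed back for deletion once its last consumer finishes.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

type StepId = &'static str; // hypothetical: the real step identifier type is not shown

struct WorkspaceRefs {
    // parent step -> (kept workspace dir, consumers that still need it)
    live: HashMap<StepId, (PathBuf, usize)>,
}

impl WorkspaceRefs {
    fn new() -> Self {
        Self { live: HashMap::new() }
    }

    /// Registers a kept dir. With zero consumers the dir is returned
    /// immediately so the caller can delete it right away.
    fn keep(&mut self, parent: StepId, dir: PathBuf, consumers: usize) -> Option<PathBuf> {
        if consumers == 0 {
            return Some(dir);
        }
        self.live.insert(parent, (dir, consumers));
        None
    }

    /// Called when one consumer of `parent` finishes. Returns the dir to
    /// delete once the last consumer is done, otherwise `None`.
    fn release(&mut self, parent: StepId) -> Option<PathBuf> {
        let entry = self.live.get_mut(parent)?;
        entry.1 -= 1; // never underflows: entries are removed when they reach 0
        if entry.1 > 0 {
            return None;
        }
        self.live.remove(parent).map(|(dir, _)| dir)
    }
}
```
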

Semantic change: on a cache hit, files a step wrote into /workspace are
no longer visible to downstream steps across runs — downstream sees the
current run's source tree. Write build outputs to system paths or use
on_change() when workspace outputs must survive hits.
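
A small sketch of the rule described above, under assumed types: a cache-hit parent supplies only its snapshot, so the child's workspace is copied from the run's fresh source, while an executed parent supplies its own workspace directory. `ParentOutcome` and `child_workspace_source` are illustrative names, not identifiers from this PR.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical view of what a finished parent step hands to its children.
enum ParentOutcome {
    /// Cache hit: container lineage only, no workspace directory.
    CacheHit { snapshot: String },
    /// Executed in this run: lineage plus the workspace it produced.
    Executed { snapshot: String, workspace_dir: PathBuf },
}

impl ParentOutcome {
    /// The snapshot is available in both cases, so the child container
    /// always starts from the parent's system state.
    fn snapshot(&self) -> &str {
        match self {
            ParentOutcome::CacheHit { snapshot }
            | ParentOutcome::Executed { snapshot, .. } => snapshot.as_str(),
        }
    }
}

/// Picks the directory a child step should COW-copy its workspace from.
fn child_workspace_source<'a>(parent: &'a ParentOutcome, fresh_source: &'a Path) -> &'a Path {
    match parent {
        ParentOutcome::CacheHit { .. } => fresh_source,
        ParentOutcome::Executed { workspace_dir, .. } => workspace_dir.as_path(),
    }
}
```
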

Acceptance: deep_cache_chain_workspace passes from warm and cold cache,
asserting [a]/[b] cache hits AND fresh marker content (the CLI-28
contract). local_fork_cache + local_parallelism pass. Test fixtures
modernized from the obsolete def build()/.run() DSL form.
…workspace-broken

# Conflicts:
#	crates/hm-exec/src/local/backend.rs
#	crates/hm-exec/src/local/runner/vm.rs
#	crates/hm-exec/src/local/scheduler.rs
#	crates/hm-vm/src/docker.rs
#	crates/hm-vm/src/registry.rs
#	crates/hm-vm/src/vm.rs
…workspace-broken

# Conflicts:
#	crates/hm-exec/src/local/scheduler.rs
Initialize coherence-bootstrap spec tracking for harmont-cli.
6 specs with 38 acceptance criteria linked to existing tests
via the Dolt-backed codeintel store.
@konovalov-nk

konovalov-nk commented Jun 13, 2026

Very cool to see the specs examples!

There are a lot of rough edges, and the current version is not really ready for production use.

  1. It seems that Claude was able to figure out the format for creating specs and ACs ✅
  2. The specs currently live in a Dolt DB on your machine, and the problem is that you can't push the DB to GitHub directly. I'd rather not store raw SQL, which is why I added the import/export jsonl commands. Eventually I want to add https://github.com/dolthub/doltlite as the default DB adapter.

To make sure everyone can sync Coherence specs for the project and execute tests, you need to export jsonl from your DB and push it.

Once I figure out a deterministic way to reverse-engineer specs, I will try it and push a PR with a 5-level taxonomy. harmont-cli is >100k lines of Rust code, so this would be a serious benchmark for Coherence.

On coherence-bootstrap/bootstrap-specs.jsonl I have ~12k LoC and 285 DB records already, so I expect harmont-cli to have even more than that. I can't tell exactly which specs Claude created because it didn't push the jsonl, but I assume those are product-level. Coherence makes it possible to define specs as granular as you want.

In my example taxonomy,

  • The Product level is the highest abstraction (UI, user-visible behavior), meaning more "wiggle" room for the agent to implement things.
  • Foundation specs are the lowest level (infrastructure, DB models, low-level processes you care about).
