Add opt-in nerve.db retention: message compaction + telemetry pruning #116

Open
alex-fedotyev wants to merge 1 commit into ClickHouse:main from alex-fedotyev:alex/db-retention

Conversation

@alex-fedotyev

Contributor

Summary

nerve.db grows without bound. On a heavily-used install it reached 767MB, of
which the messages table is 714MB, dominated by the machine-facing blocks
JSON (523MB) and thinking (54MB). That JSON is only needed while a message is
rendered live or being indexed into memU. memU extraction reads content
(gated by the per-session last_memorized_at watermark), and SDK resume
restores context from the .jsonl transcript rather than DB blocks, so old
already-memorized messages can drop blocks/thinking while keeping content
(the UI falls back to content text when blocks is NULL).

This adds an opt-in retention subsystem, disabled by default so merging it
mutates no existing data:

  • MaintenanceStore DB mixin:
    • compact_messages nulls blocks/thinking for messages older than
      retention_full_days that are past their session's memorize watermark, in
      a non-starred, non-active session. Keeps content. Idempotent.
    • prune_telemetry / prune_file_snapshots delete append-only rows older
      than retention_days.
    • checkpoint truncates the WAL; vacuum rewrites the file to reclaim
      freed pages.
  • RetentionConfig (enabled=False, retention_days=90,
    retention_full_days=30, interval_hours=24).
  • A background lifespan task that no-ops unless enabled.
  • nerve db prune [--dry-run] and nerve db vacuum commands.
  • Reference docs in docs/config.md.

No schema change.
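
The compaction eligibility rule above can be sketched as a single guarded UPDATE. This is a minimal illustration, not the code in this PR: the table layout (sessions with starred, status, last_memorized_at; messages with session_id, created_at, content, blocks, thinking), the epoch-second timestamps, and the 'active' status value are all assumptions made for the example.

```python
import sqlite3
import time

DAY = 86400  # seconds

# Hypothetical eligibility predicate; column names are assumed, not taken from nerve.db.
_ELIGIBLE = """
    (blocks IS NOT NULL OR thinking IS NOT NULL)
    AND created_at < ?
    AND EXISTS (
        SELECT 1 FROM sessions s
        WHERE s.id = messages.session_id
          AND s.starred = 0
          AND s.status != 'active'
          AND s.last_memorized_at IS NOT NULL
          AND messages.created_at <= s.last_memorized_at
    )
"""


def compact_messages(conn, retention_full_days=30, dry_run=False, now=None):
    """Null blocks/thinking on old, already-memorized messages; content is kept.

    Returns the number of rows that were (or, with dry_run, would be) compacted.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_full_days * DAY
    if dry_run:
        # Count only; nothing is written.
        sql = "SELECT COUNT(*) FROM messages WHERE" + _ELIGIBLE
        return conn.execute(sql, (cutoff,)).fetchone()[0]
    sql = "UPDATE messages SET blocks = NULL, thinking = NULL WHERE" + _ELIGIBLE
    cur = conn.execute(sql, (cutoff,))
    conn.commit()
    return cur.rowcount
```

Because the predicate skips rows whose blocks and thinking are already NULL, a second run touches nothing, which is one way to get the idempotency described above.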

Reclaim model

Nulling columns and deleting rows frees pages to the SQLite freelist (reused by
later writes) but does not shrink the file. PRAGMA wal_checkpoint(TRUNCATE)
truncates only the WAL. Only VACUUM shrinks the file, so it is an explicit
operator step (nerve db vacuum), never on the background loop.
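
The reclaim behavior can be reproduced against a throwaway SQLite file. The sketch below is illustrative only (the table name t and the row sizes are arbitrary): it measures the file after a delete plus WAL checkpoint, then again after VACUUM.

```python
import os
import sqlite3


def reclaim_demo(path):
    """Return (size_full, size_after_delete, free_pages, size_after_vacuum) in bytes/pages."""
    conn = sqlite3.connect(path)
    # auto_vacuum is off by default in stock SQLite; set explicitly to keep the demo deterministic.
    conn.execute("PRAGMA auto_vacuum = NONE")
    conn.execute("PRAGMA journal_mode = WAL").fetchall()
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany("INSERT INTO t (payload) VALUES (?)", [("x" * 4000,) for _ in range(500)])
    conn.commit()
    conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()
    size_full = os.path.getsize(path)

    # Deleting rows moves their pages to the freelist; the file keeps its size.
    conn.execute("DELETE FROM t")
    conn.commit()
    conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()
    size_after_delete = os.path.getsize(path)
    free_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]

    # VACUUM rewrites the database without the free pages; checkpoint so the main file reflects it.
    conn.execute("VACUUM")
    conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()
    size_after_vacuum = os.path.getsize(path)
    conn.close()
    return size_full, size_after_delete, free_pages, size_after_vacuum
```

Running this shows the main file holding its size after the delete while freelist_count rises, and shrinking only after VACUUM.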

Operator reclaim sequence

  1. Set retention.enabled: true (optionally tune the windows) in the local
    config; restart to pick it up.
  2. nerve db prune --dry-run to preview.
  3. Back up nerve.db, then nerve db prune followed by nerve db vacuum
    (daemon stopped) to shrink the file.
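
For the backup in step 3, this PR does not prescribe a method. One option, sketched here as an assumption rather than a documented nerve workflow, is Python's sqlite3 online backup API, which copies a consistent snapshot including any pages still in the WAL. The function name and paths are hypothetical.

```python
import sqlite3


def backup_db(src_path, dest_path):
    """Copy the SQLite database at src_path into dest_path using the online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Connection.backup copies every page of the source into the destination.
        src.backup(dest)
    finally:
        dest.close()
        src.close()
```

With the daemon stopped, a plain file copy of nerve.db (plus any -wal and -shm files beside it) should serve the same purpose.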

Test plan

  • New tests/test_db_retention.py (15 tests): compaction eligibility (old
    + memorized + non-starred + non-active), and every exclusion (recent,
    starred, active, never-memorized, message-newer-than-watermark); content
    always kept; idempotency; dry-run mutates nothing; telemetry/snapshot pruning
    deletes old and keeps new while leaving core tables untouched; combined
    run_retention.
  • Full backend suite green except pre-existing, environment-specific
    failures unrelated to this change (docker-mode detection under a container
    env var; codex tests that need untracked local fixtures).
  • nerve db prune --dry-run, nerve db prune, and nerve db vacuum
    exercised end-to-end against a throwaway database.
  • Config parses the new block, clamps ints to >= 1, defaults to disabled,
    and raises no unknown-key warnings.
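
The config behavior in the last bullet might look roughly like the following. The field names and defaults come from the description above; the from_dict constructor, the _clamp_min1 helper, and the fallback to the default on unparsable values are assumptions for illustration, not the PR's actual loader.

```python
from dataclasses import dataclass


def _clamp_min1(value, default):
    """Coerce to int and clamp to >= 1; fall back to the default if unparsable (assumed behavior)."""
    try:
        return max(1, int(value))
    except (TypeError, ValueError):
        return default


@dataclass(frozen=True)
class RetentionConfig:
    enabled: bool = False          # opt-in: disabled unless the operator sets it
    retention_days: int = 90       # telemetry / file-snapshot window
    retention_full_days: int = 30  # window before blocks/thinking are nulled
    interval_hours: int = 24       # background loop period

    @classmethod
    def from_dict(cls, raw):
        """Build from an already-parsed mapping (hypothetical loader entry point)."""
        raw = raw or {}
        return cls(
            # Assumes the config parser already produced a real bool, not the string "false".
            enabled=bool(raw.get("enabled", False)),
            retention_days=_clamp_min1(raw.get("retention_days", 90), 90),
            retention_full_days=_clamp_min1(raw.get("retention_full_days", 30), 30),
            interval_hours=_clamp_min1(raw.get("interval_hours", 24), 24),
        )
```

A missing or empty block yields the disabled defaults, and a zero or negative window is raised to 1 rather than rejected.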

nerve.db grows unbounded. On a heavily-used install it reached 767MB,
714MB of which is the messages table, dominated by the machine-facing
blocks (523MB) and thinking (54MB) JSON. That JSON is only needed while
a message is rendered live or being indexed into memU: memU reads
content (gated by the per-session last_memorized_at watermark) and SDK
resume restores from the .jsonl transcript, not DB blocks. So old,
already-memorized messages can drop blocks/thinking while keeping
content (the UI falls back to content text when blocks is NULL).

This adds an opt-in retention subsystem, disabled by default:

- MaintenanceStore mixin. compact_messages nulls blocks/thinking for
  messages older than retention_full_days that are past their session's
  memorize watermark, in a non-starred, non-active session; content is
  kept. prune_telemetry and prune_file_snapshots delete append-only
  rows older than retention_days. checkpoint truncates the WAL; vacuum
  rewrites the file to reclaim freed pages.
- RetentionConfig (enabled=False, retention_days=90,
  retention_full_days=30, interval_hours=24).
- A background lifespan task that no-ops unless enabled.
- nerve db prune [--dry-run] and nerve db vacuum CLI commands.

It ships disabled by default, so merging mutates no existing data. The
operator opts in, previews with --dry-run, then runs prune + vacuum to
reclaim the file. No schema change.