Skip to content

ateredis: encode Actor/Worker records as binary protobuf (#307)#356

Open
Frederick F. Kautz IV (fkautz) wants to merge 1 commit into
agent-substrate:mainfrom
fkautz:issue-307-binary-protobuf
Open

ateredis: encode Actor/Worker records as binary protobuf (#307)#356
Frederick F. Kautz IV (fkautz) wants to merge 1 commit into
agent-substrate:mainfrom
fkautz:issue-307-binary-protobuf

Conversation

@fkautz

Copy link
Copy Markdown
Contributor

Summary

Switch the on-the-wire encoding of Actor and Worker records in the Valkey
state store from protojson to binary protobuf. This roughly halves per-record
RAM and is about 4x faster to encode, which extends runway toward the
1-billion-record target. The change is scoped to the resident Actor/Worker
records, as described in #307.

This is a clean cutover with no migration path. Since this is pre-alpha with no
production data, the keyspace is flushed on deploy. protojson and binary
protobuf cannot decode each other, so it is not safe as a rolling update across
the cutover. Deploy by stopping the replicas, flushing the keyspace, then
starting the new version.

Changes

  • Write path: proto.Marshal for Actor and Worker (Create and Update).
  • Read path: proto.Unmarshal (Get, Update, Delete, List).
  • The optimistic-concurrency version check stays in Go and compares the decoded
    version field rather than the raw bytes, so the WATCH/MULTI paths are
    unaffected and the encoding is never inspected from Redis Lua.
  • Added key-identity guards to the list paths (fetchActors and ListWorkers).
    Unlike protojson, proto.Unmarshal accepts an empty byte slice and returns a
    zero-valued message without an error. GetActor and GetWorker already
    rejected that case via their identity checks, and the list paths now do too.

Out of scope (intentionally still protojson)

  • Atespace records, which are low cardinality and use negligible RAM at scale.
  • The worker-change pub/sub payload, which is ephemeral and never persisted.

Testing

  • Unit (miniredis): a binary-on-the-wire assertion for both record types, which
    catches an accidental revert to protojson, plus rejection of empty, corrupt,
    and identity-mismatched values across both the Get and List paths.
  • Integration (-tags integration): real-cluster round-trip for both types,
    an optimistic-concurrency conflict for both types, and a binary-safe lock test
    using a token with a NUL byte. These skip unless ATE_TEST_REDIS_ADDR points
    at a dedicated, empty cluster, and they refuse to run against a non-empty or
    non-cluster server.

Verified locally against a single-node Valkey 9.1.0 cluster:

ATE_TEST_REDIS_ADDR=127.0.0.1:7300 go test -tags integration \
    ./cmd/ateapi/internal/store/ateredis/

All unit and integration tests pass, and go vet and gofmt are clean. There
is no change to store.Interface, and existing store tests pass unmodified.

Closes #307.

…rate#307)

Switch the on-the-wire encoding of Actor and Worker records in the Valkey state
store from protojson to binary protobuf. This roughly halves per-record RAM and
is ~4x faster to encode, extending runway toward the 1B-record target. Clean
cutover, no migration path (pre-alpha, no production data): the keyspace is
flushed on deploy.

- Write path: proto.Marshal for Actor and Worker (Create/Update).
- Read path: proto.Unmarshal (Get/Update/Delete/List).
- The optimistic-concurrency version check stays in Go and compares the decoded
  version field, never the raw bytes, so WATCH/MULTI is unaffected and the
  encoding is never inspected from Redis Lua.
- Unlike protojson, proto.Unmarshal accepts an empty byte slice and yields a
  zero-valued message without error. GetActor/GetWorker already reject that via
  their key-identity checks; this adds the same guard to the list paths
  (fetchActors, ListWorkers), which previously relied on protojson failing.
- Atespace records and the worker-change pub/sub stay on protojson (low
  cardinality / ephemeral, not RAM-resident at scale); the cutover is scoped to
  the resident Actor/Worker records per agent-substrate#307.

Tests (miniredis): binary-on-the-wire encoding for both record types (so a
silent revert to protojson is caught, since the store API is otherwise
encoding-agnostic), and rejection of empty/identity-mismatched values across the
Get and List paths. Real-cluster round-trip was verified manually against a
single-node Valkey 9.1.0 cluster.

Signed-off-by: Frederick F. Kautz IV <fkautz@alumni.cmu.edu>
// `actor:<atespace>:<actor-id>`, holding a binary protobuf-serialized Actor
// record. Binary protobuf roughly halves per-record memory versus protojson and
// is ~4x faster to encode; the optimistic-concurrency version check stays in Go
// (comparing the decoded version field, never the raw bytes), so the encoding is

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how relevant this comment is:

"Binary protobuf roughly halves per-record memory versus protojson and
// is ~4x faster to encode; the optimistic-concurrency version check stays in Go
Comment view// (comparing the decoded version field, never the raw bytes), so the encoding is
// never inspected from Redis lua."

Looks more like a point in time commit message than package documentation (e.g. 4x faster than what?).

I would just remove it or turn it into a message that justifies why we use binary proto encoding instead of JSON / textproto.

// `worker:<namespace>:<pool-name>:<pod-name>`, holding a binary
// protobuf-serialized Worker record.
//
// Only the resident Actor/Worker records use binary protobuf. Atespace records

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be consistent and use the same encoding across all resources. Over time we will add more resources, and having to think about which ones are in which encoding will add significant cognitive burden.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a problem, I'll move the others over too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Adopt binary protobuf encoding for control-plane records (Actor/Worker)

2 participants