ateredis: encode Actor/Worker records as binary protobuf (#307)#356
ateredis: encode Actor/Worker records as binary protobuf (#307)#356Frederick F. Kautz IV (fkautz) wants to merge 1 commit into
Conversation
…rate#307) Switch the on-the-wire encoding of Actor and Worker records in the Valkey state store from protojson to binary protobuf. This roughly halves per-record RAM and is ~4x faster to encode, extending runway toward the 1B-record target. Clean cutover, no migration path (pre-alpha, no production data): the keyspace is flushed on deploy. - Write path: proto.Marshal for Actor and Worker (Create/Update). - Read path: proto.Unmarshal (Get/Update/Delete/List). - The optimistic-concurrency version check stays in Go and compares the decoded version field, never the raw bytes, so WATCH/MULTI is unaffected and the encoding is never inspected from Redis Lua. - Unlike protojson, proto.Unmarshal accepts an empty byte slice and yields a zero-valued message without error. GetActor/GetWorker already reject that via their key-identity checks; this adds the same guard to the list paths (fetchActors, ListWorkers), which previously relied on protojson failing. - Atespace records and the worker-change pub/sub stay on protojson (low cardinality / ephemeral, not RAM-resident at scale); the cutover is scoped to the resident Actor/Worker records per agent-substrate#307. Tests (miniredis): binary-on-the-wire encoding for both record types (so a silent revert to protojson is caught, since the store API is otherwise encoding-agnostic), and rejection of empty/identity-mismatched values across the Get and List paths. Real-cluster round-trip was verified manually against a single-node Valkey 9.1.0 cluster. Signed-off-by: Frederick F. Kautz IV <fkautz@alumni.cmu.edu>
4c2a6ca to
37deaf5
Compare
| // `actor:<atespace>:<actor-id>`, holding a binary protobuf-serialized Actor | ||
| // record. Binary protobuf roughly halves per-record memory versus protojson and | ||
| // is ~4x faster to encode; the optimistic-concurrency version check stays in Go | ||
| // (comparing the decoded version field, never the raw bytes), so the encoding is |
There was a problem hiding this comment.
I don't know how relevant this comment is:
"Binary protobuf roughly halves per-record memory versus protojson and
// is ~4x faster to encode; the optimistic-concurrency version check stays in Go
Comment view// (comparing the decoded version field, never the raw bytes), so the encoding is
// never inspected from Redis lua."
Looks more like a point in time commit message than package documentation (e.g. 4x faster than what?).
I would just remove it or turn it into a message that justifies why we use binary proto encoding instead of JSON / textproto.
| // `worker:<namespace>:<pool-name>:<pod-name>`, holding a binary | ||
| // protobuf-serialized Worker record. | ||
| // | ||
| // Only the resident Actor/Worker records use binary protobuf. Atespace records |
There was a problem hiding this comment.
I think we should be consistent and use the same encoding across all resources. Over time we will add more resources, and having to think about which ones are in which encoding will add significant cognitive burden.
There was a problem hiding this comment.
Not a problem, I'll move the others over too.
Summary
Switch the on-the-wire encoding of
ActorandWorkerrecords in the Valkeystate store from protojson to binary protobuf. This roughly halves per-record
RAM and is about 4x faster to encode, which extends runway toward the
1-billion-record target. The change is scoped to the resident Actor/Worker
records, as described in #307.
This is a clean cutover with no migration path. Since this is pre-alpha with no
production data, the keyspace is flushed on deploy. protojson and binary
protobuf cannot decode each other, so it is not safe as a rolling update across
the cutover. Deploy by stopping the replicas, flushing the keyspace, then
starting the new version.
Changes
proto.Marshalfor Actor and Worker (Create and Update).proto.Unmarshal(Get, Update, Delete, List).versionfield rather than the raw bytes, so theWATCH/MULTIpaths areunaffected and the encoding is never inspected from Redis Lua.
fetchActorsandListWorkers).Unlike protojson,
proto.Unmarshalaccepts an empty byte slice and returns azero-valued message without an error.
GetActorandGetWorkeralreadyrejected that case via their identity checks, and the list paths now do too.
Out of scope (intentionally still protojson)
Testing
catches an accidental revert to protojson, plus rejection of empty, corrupt,
and identity-mismatched values across both the Get and List paths.
-tags integration): real-cluster round-trip for both types,an optimistic-concurrency conflict for both types, and a binary-safe lock test
using a token with a NUL byte. These skip unless
ATE_TEST_REDIS_ADDRpointsat a dedicated, empty cluster, and they refuse to run against a non-empty or
non-cluster server.
Verified locally against a single-node Valkey 9.1.0 cluster:
All unit and integration tests pass, and
go vetandgofmtare clean. Thereis no change to
store.Interface, and existing store tests pass unmodified.Closes #307.