Run remaining benchmark scenarios in fedify bench #802
Implement actor and object benchmark runners so read-path workloads can fetch actor documents and object documents after discovery. Authenticated reads use the synthetic actor server, and dry runs now describe the resolved read targets before any load is sent. fedify-dev#744 fedify-dev#785 Assisted-by: Codex:gpt-5.5
Implement the fanout scenario with benchmark-owned sink inboxes, trigger endpoint delivery, stats-based queue drain polling, and delivery-throughput expectations. Queue task counters are now projected from benchmark stats so the runner can tell when fanout and outbox work has drained. fedify-dev#744 fedify-dev#785 Assisted-by: Codex:gpt-5.5
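The stats-based drain check described above can be sketched roughly as follows. This is a minimal illustration, not the runner's code: `QueueCounters`, `waitForDrain`, and the injected `readCounters` and `sleep` callbacks are hypothetical names, and the sketch assumes the runner knows how many queue tasks a trigger should produce.

```typescript
// Hypothetical counter shape; the real benchmark stats payload may differ.
interface QueueCounters {
  completed: number;
  failed: number;
}

interface DrainResult {
  drained: boolean;
  polls: number;
  completed: number;
  failed: number;
}

// Polls until the tasks finished since `baseline` reach `expectedTasks`,
// or until `maxPolls` polls have been made.
async function waitForDrain(
  baseline: QueueCounters,
  expectedTasks: number,
  readCounters: () => Promise<QueueCounters>,
  sleep: (ms: number) => Promise<void>,
  options: { intervalMs: number; maxPolls: number },
): Promise<DrainResult> {
  let completed = 0;
  let failed = 0;
  for (let polls = 1; polls <= options.maxPolls; polls++) {
    const now = await readCounters();
    // Only work finished after the baseline snapshot counts as drained.
    completed = now.completed - baseline.completed;
    failed = now.failed - baseline.failed;
    if (completed + failed >= expectedTasks) {
      return { drained: true, polls, completed, failed };
    }
    if (polls < options.maxPolls) await sleep(options.intervalMs);
  }
  return { drained: false, polls: options.maxPolls, completed, failed };
}
```

Taking the baseline before the trigger is sent keeps work left over from earlier scenarios out of the drain decision.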
Implement expected-outcome failure benchmarking and weighted mixed scenario execution so the remaining executable scenario types can run under the published suite schema. fedify-dev#744 fedify-dev#785 Assisted-by: Codex:gpt-5.5
Update the benchmarking manual and changelog for the newly executable bench scenario runners, including queue-backend, failure-outcome, and mixed-run semantics. fedify-dev#744 fedify-dev#785 Assisted-by: Codex:gpt-5.5
Keep unauthenticated actor and object reads on a read-only destination gate so remote private reads do not require a reachable synthetic actor server. Also validate mixed child references during runner preflight so bad suites exit as configuration errors before probing the target. fedify-dev#744 fedify-dev#785 Assisted-by: Codex:gpt-5.5
Gate object-discovery actor and collection fetches before crawling them, serialize fanout trigger and drain windows, and merge mixed scenario latency histograms from the underlying samples. These fixes keep benchmark safety gates ahead of public discovery requests and make reported fanout and mixed metrics match the traffic actually measured. fedify-dev#744 fedify-dev#785 Assisted-by: Codex:gpt-5.5
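Why merging from the underlying samples matters can be shown with a small sketch. The helper names (`percentile`, `mergeSamples`) and the nearest-rank percentile method are assumptions for illustration; the runner's actual histogram representation is not shown in this conversation.

```typescript
// Nearest-rank percentile over an ascending list of latencies (milliseconds).
function percentile(sortedMs: readonly number[], p: number): number {
  if (sortedMs.length === 0) return NaN;
  const rank = Math.ceil((p / 100) * sortedMs.length);
  const index = Math.min(sortedMs.length, Math.max(1, rank)) - 1;
  return sortedMs[index] ?? NaN;
}

// Pools every child's raw samples so percentiles reflect all measured traffic.
function mergeSamples(children: ReadonlyArray<readonly number[]>): number[] {
  const merged: number[] = [];
  for (const samples of children) merged.push(...samples);
  return merged.sort((a, b) => a - b);
}

// Child A sent nine fast requests, child B sent one slow request.
const childA = [10, 10, 10, 10, 10, 10, 10, 10, 10];
const childB = [500];

const pooled = mergeSamples([childA, childB]);
const pooledP50 = percentile(pooled, 50);
const pooledP95 = percentile(pooled, 95);

// Averaging per-child percentiles weights both children equally,
// even though child A produced nine times as many samples.
const averagedP50 = (percentile(childA, 50) + percentile(childB, 50)) / 2;
```

Here the pooled median stays at 10 ms, while the averaged per-child median is 255 ms, a latency that no request in either child actually had.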
Apply mixed maxInFlight as a parent-wide fetch limiter instead of copying the same cap into every concurrent child scenario. This keeps the configured in-flight safety limit intact even when children run in parallel. Also skip URL-only collection items when an object source type filter is set, because their type cannot be checked without additional discovery fetches. fedify-dev#744 fedify-dev#785 Assisted-by: Codex:gpt-5.5
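A parent-wide limiter of this kind can be sketched as a counting semaphore that every child shares. `InFlightLimiter` and `runChildren` are hypothetical names for illustration, not the fedify bench API.

```typescript
// Hypothetical sketch: one limiter instance is shared by all children of a
// mixed scenario, so the cap applies to their combined in-flight requests.
class InFlightLimiter {
  private active = 0;
  private readonly max: number;
  private readonly waiters: Array<() => void> = [];

  constructor(max: number) {
    if (!Number.isInteger(max) || max < 1) {
      throw new RangeError("max must be a positive integer");
    }
    this.max = max;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.max) {
      // Park until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiters.shift();
      if (next) next(); // slot is transferred, so `active` stays the same
      else this.active--;
    }
  }
}

// Each inner array holds one child's requests; all of them go through
// the same limiter rather than each child getting its own copy of the cap.
async function runChildren(
  limiter: InFlightLimiter,
  children: ReadonlyArray<ReadonlyArray<() => Promise<void>>>,
): Promise<void> {
  await Promise.all(
    children.map((requests) =>
      Promise.all(requests.map((request) => limiter.run(request)))
    ),
  );
}
```

Copying the cap into each child would instead allow up to `max` requests per child, so three parallel children could reach three times the configured limit.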
Pass the dry-run fetch and read destination gate into object crawl discovery so inspection cannot contact off-target public actor or collection URLs before the same safety checks used by real runs. Also validate explicit object source URLs and crawl seed identifiers during runner preflight, making malformed object sources fail as configuration errors before any probe or load is sent. fedify-dev#744 fedify-dev#785 Assisted-by: Codex:gpt-5.5
Pass the configured dry-run fetch into actor handle discovery so acct: actor plans use the same injected fetch and User-Agent wrapper as real actor runs. This prevents dry-run planning from falling back to the global network fetch. fedify-dev#744 fedify-dev#785 Assisted-by: Codex:gpt-5.5
Route authenticated actor and object reads through the configured signing pipeline so presign and pipeline modes do not silently fall back to inline JIT signing during the measured send path. Resolve and gate inbound failure scenario inboxes before starting the load loop, then reuse the discovered delivery target for each generated failure request. This keeps repeated WebFinger and actor discovery out of the measured samples. fedify-dev#744 fedify-dev#785 Assisted-by: Codex:gpt-5.5
Treat drained fanout queue failures as failed benchmark samples so failed sink deliveries cannot inflate success rate or delivery throughput. Allow missing-actor-only failure scenarios to run against non-loopback targets without --advertise-host, since the missing actor identity is placed under the benchmark target rather than the synthetic actor server. fedify-dev#785 Assisted-by: Codex:gpt-5.5
Route missing-actor failure destinations through the gate that does not require a reachable synthetic actor server, while still applying public load safety checks. Track fanout delivery throughput separately from request throughput so mixed scenarios cannot satisfy deliveryThroughput expectations with read traffic from other children. fedify-dev#785 Assisted-by: Codex:gpt-5.5
Leave fanout throughputPerSec as the trigger request rate and report recipient delivery rate only through deliveryThroughputPerSec. This keeps fanout reports consistent with request counts and prevents mixed scenarios from folding delivery throughput into ordinary request throughput. fedify-dev#785 Assisted-by: Codex:gpt-5.5
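The distinction between the two rates is simple arithmetic, sketched below. The field names `throughputPerSec` and `deliveryThroughputPerSec` come from the commit message; `summarizeFanout` and its parameters are hypothetical, and omitting the delivery field when no deliveries were measured is an assumption.

```typescript
interface FanoutSummary {
  throughputPerSec: number; // trigger requests per second
  deliveryThroughputPerSec?: number; // recipient deliveries per second
}

function summarizeFanout(
  triggerRequests: number,
  successfulDeliveries: number | undefined,
  durationSec: number,
): FanoutSummary {
  if (!(durationSec > 0)) throw new RangeError("durationSec must be positive");
  const summary: FanoutSummary = {
    throughputPerSec: triggerRequests / durationSec,
  };
  // Only scenarios that measured deliveries report a delivery rate.
  if (successfulDeliveries !== undefined) {
    summary.deliveryThroughputPerSec = successfulDeliveries / durationSec;
  }
  return summary;
}

// 10 triggers in 5 seconds, each fanning out to 50 recipients.
const fanout = summarizeFanout(10, 10 * 50, 5);
```

With these numbers the request rate is 2 per second and the delivery rate is 100 per second, so folding one into the other would misstate either figure by a factor of 50.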
Send ActivityPub Accept headers while crawling actor and collection sources for object benchmarks, keeping WebFinger discovery on a JRD Accept header. Carry deliveryThroughputPerSec into scenario reports, render it in text and Markdown, and publish report schema v2 so the JSON report preserves the new field without mutating the immutable v1 schema. fedify-dev#785 Assisted-by: Codex:gpt-5.5
Keep inbound failure scenarios from treating target-side 5xx responses as expected client-fault rejections, so server crashes remain visible in the error buckets. Reject ambiguous mixed child references when duplicate scenario names would otherwise make child lookup bind the first matching scenario silently. fedify-dev#785 Assisted-by: Codex:gpt-5.5
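The ambiguity check for mixed child references might look like the sketch below. `checkMixedChildren` and its error wording are hypothetical; the point is only that a name matching zero or several scenarios is reported instead of silently binding to the first match.

```typescript
// Returns configuration errors for child references that do not resolve
// to exactly one scenario name in the suite.
function checkMixedChildren(
  scenarioNames: readonly string[],
  childRefs: readonly string[],
): string[] {
  const counts = new Map<string, number>();
  for (const name of scenarioNames) {
    counts.set(name, (counts.get(name) ?? 0) + 1);
  }
  const errors: string[] = [];
  for (const ref of childRefs) {
    const count = counts.get(ref) ?? 0;
    if (count === 0) {
      errors.push(`unknown child scenario "${ref}"`);
    } else if (count > 1) {
      errors.push(`ambiguous child scenario "${ref}" (${count} matches)`);
    }
  }
  return errors;
}
```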
Make remote failure benchmark modes call the target benchmark trigger and observe the target's queue metrics before reporting the expected fault as successful. This keeps offline or non-participating targets from producing synthetic passing samples. Require a sender for remote failure faults and require advertised sink reachability for non-loopback targets, matching the benchmark-owned sink server used to exercise the outbound path. fedify-dev#785 Assisted-by: Codex:gpt-5.5
Unwrap ActivityPub activity items while crawling object sources so outbox entries benchmark their contained objects instead of the wrapper activity. Apply source type filters after that unwrap step. Validate actor recipients during runner preflight so malformed values fail as configuration errors before probing or sending benchmark load. fedify-dev#785 Assisted-by: Codex:gpt-5.5
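The unwrap-then-filter order can be sketched as follows. This is an illustration under stated assumptions: `candidateObjectUrls` is a hypothetical helper, the set of wrapper activity types is an assumed subset, and the sketch also folds in two behaviors described elsewhere in this conversation (URL-only items are skipped when a type filter is set, and wrappers without an `object` yield nothing).

```typescript
// Activity types treated as wrappers in this sketch (an assumed subset).
const WRAPPER_TYPES = new Set(["Create", "Announce", "Update"]);

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function typesOf(value: Record<string, unknown>): string[] {
  const raw = value["type"];
  if (typeof raw === "string") return [raw];
  if (!Array.isArray(raw)) return [];
  return raw.filter((t): t is string => typeof t === "string");
}

function candidateObjectUrls(item: unknown, typeFilter?: string): string[] {
  if (typeof item === "string") {
    // A bare URL has no type to check without another fetch.
    return typeFilter === undefined ? [item] : [];
  }
  if (!isRecord(item)) return [];
  const types = typesOf(item);
  if (types.some((t) => WRAPPER_TYPES.has(t))) {
    const inner = item["object"];
    // A wrapper without an object is dropped, not benchmarked itself.
    if (inner === undefined || inner === null) return [];
    const candidates: unknown[] = Array.isArray(inner) ? inner : [inner];
    const urls: string[] = [];
    for (const candidate of candidates) {
      urls.push(...candidateObjectUrls(candidate, typeFilter));
    }
    return urls;
  }
  // The type filter applies to the unwrapped object, not the wrapper.
  if (typeFilter !== undefined && !types.some((t) => t === typeFilter)) {
    return [];
  }
  const id = item["id"];
  return typeof id === "string" ? [id] : [];
}
```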
Mixed scenarios currently merge client-side measurements but discard child server metrics, so server-side expectations would pass preflight and then fail after load with null actual values. Reject those expectations during mixed validation instead, before probing or sending benchmark load. fedify-dev#785 Assisted-by: Codex:gpt-5.5
Validate actor and object read URLs before safety gates and load scheduling so malformed resolved destinations do not turn into full-duration failed sample runs. fedify-dev#785 Assisted-by: Codex:gpt-5.5
Fail actor and object read scenarios during runner validation when resolved URLs cannot be fetched safely, and reject mixed queue-observing children that would share unscoped target queue counters with concurrent queue producers. fedify-dev#785 Assisted-by: Codex:gpt-5.5
Treat a completed outbound task with a remaining retry as the expected network-error signal, matching Fedify's normal retry path where transport failures enqueue follow-up work instead of failing the current task. fedify-dev#785 Assisted-by: Codex:gpt-5.5
Only report fanout queue-drain latency after at least one measured drain sample, and validate explicit failure inbox selectors before discovery can turn typos into runtime URL errors. fedify-dev#785 Assisted-by: Codex:gpt-5.5
Select typed objects from every unwrapped activity object candidate, and wait for all mixed child runners to settle before rethrowing a child setup error. fedify-dev#785 Assisted-by: Codex:gpt-5.5
Describe the newly executable benchmark scenario types in the CLI help and keep nearby schema/type comments aligned with collection being the only reserved scenario type. fedify-dev#785 Assisted-by: Codex:gpt-5.5
📝 Walkthrough
Adds actor, object, fanout, failure, and mixed bench scenario runners; shared discovery/read helpers and destination gating; queue-task and permanent-failure snapshot parsing; delivery-throughput and report-v2 support; CLI dry-run/preflight wiring; extensive tests, fixtures, schema, and docs.
Changes
Fedify Benchmark Scenario Runners: Actor, Object, Fanout, Failure, Mixed
Sequence Diagram(s)
sequenceDiagram
participant Runner
participant SyntheticServer
participant Target
participant StatsStore
participant Reporter
Runner->>SyntheticServer: spawn sink servers / create signing fleet
Runner->>Target: discovery requests (WebFinger, collections) and read requests
Runner->>Target: POST trigger (fanout / remote-failure)
Target->>StatsStore: update scopeMetrics (queue tasks, delivery failures)
Runner->>StatsStore: poll snapshots (baseline & end)
StatsStore->>Runner: return snapshots
Runner->>Runner: diff snapshots, aggregate client/server metrics
Runner->>Reporter: build report-v2 and render outputs
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
/gemini review
Code Review
This pull request expands the fedify bench tool by implementing several new benchmark scenario runners, including actor, object, fanout, failure, and mixed, while publishing version 2 of both the scenario and report schemas. The changes also introduce support for deterministic sink bases, delivery throughput metrics, and comprehensive unit tests. The review feedback focuses on enhancing runtime robustness across these new components, specifically recommending defensive checks against untrusted JSON payloads in stats-client.ts and object-discovery.ts, safe optional chaining for actor keys in read.ts, and error handling for latency duration parsing in fanout.ts.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Codex Review: Didn't find any major issues. 🎉
Invalid sinkBehavior latency strings could throw while preparing the local sink server. Treat malformed latency values as the default zero latency so the benchmark can still run with the rest of the sink behavior. fedify-dev#802 (comment) Assisted-by: Codex:gpt-5.5
Authenticated read signing now handles malformed synthetic actor records without crashing on a missing keys object. It reports the existing clear RSA signing error instead. fedify-dev#802 (comment) Assisted-by: Codex:gpt-5.5
Object discovery now wraps JSON parse failures with the fetched URL so users can tell which actor or collection response was malformed. fedify-dev#802 (comment) Assisted-by: Codex:gpt-5.5
Stats parsing now checks sum data points before reading their values so malformed target-provided JSON cannot discard the whole snapshot. fedify-dev#802 (comment) Assisted-by: Codex:gpt-5.5
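A defensive read of sum data points might look like the sketch below. The JSON shape (a `sum` object holding a `dataPoints` array whose entries carry `asInt` or `asDouble`) is an assumption modeled on OpenTelemetry-style metric JSON, and `sumOfDataPoints` is a hypothetical helper rather than the code in stats-client.ts.

```typescript
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Adds up the usable data points of one metric. Malformed points are
// skipped so one bad entry does not discard the rest of the snapshot.
function sumOfDataPoints(metric: unknown): number | undefined {
  if (!isRecord(metric)) return undefined;
  const sum = metric["sum"];
  if (!isRecord(sum)) return undefined;
  const points = sum["dataPoints"];
  if (!Array.isArray(points)) return undefined;
  let total = 0;
  for (const point of points) {
    if (!isRecord(point)) continue; // e.g. null or a stray string
    const raw = point["asInt"] ?? point["asDouble"];
    // Assumed: integer values may arrive as decimal strings.
    const value = typeof raw === "string" ? Number(raw) : raw;
    if (typeof value === "number" && Number.isFinite(value)) total += value;
  }
  return total;
}
```

Returning `undefined` for a metric with no recognizable structure lets the caller tell "no data" apart from a genuine total of zero.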
@codex review
/gemini review
Codex Review: Didn't find any major issues. Already looking forward to the next diff.
Code Review
This pull request introduces several new scenario runners (actor, object, fanout, failure, and mixed) to fedify bench, updating the benchmark scenario and report schemas to version 2. It also adds the sinkBase option to generate deterministic benchmark sink inbox URLs. Feedback on the changes identifies a bug in scenarioNeedsReachableLocalServer where it fails to account for the default ["remote-404"] fault when scenario.faults is empty, which incorrectly suggests that a reachable local server is not required.
Failure scenarios without an explicit fault now use the same remote-404 default when checking local sink reachability as the runner uses while executing the scenario. Scenario schema v2 also allows the omitted fault form so the documented runner default is reachable from CLI suites. fedify-dev#802 (comment) Assisted-by: Codex:gpt-5.5
/gemini review
@codex review
Code Review
This pull request implements the 'actor', 'object', 'fanout', 'failure', and 'mixed' scenario runners for the 'fedify bench' command, upgrading the benchmark scenario and report schemas to version 2. It also includes comprehensive tests, validation logic, and documentation updates. Feedback on the changes points out that the destination safety gates passed to child runners in 'mixed' scenarios are pre-bound to the parent scenario, which can lead to incorrect rejections of valid child destinations. Additionally, it is recommended to limit the concurrency of destination checks in read scenarios to prevent DNS resolution bursts or socket exhaustion when processing large lists of URLs.
Codex Review: Didn't find any major issues. Keep them coming!
Mixed child runners now re-bind destination safety gates with the scaled child scenario instead of inheriting the parent mixed scenario. This keeps child-specific settings such as destination overrides visible to the safety checks. fedify-dev#802 (comment) Assisted-by: Codex:gpt-5.5
Read scenarios now gate resolved URLs through a bounded worker pool instead of launching every destination check at once. This avoids creating a burst of DNS and socket work for large resolved actor or object URL lists. fedify-dev#802 (comment) Assisted-by: Codex:gpt-5.5
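A bounded worker pool of the kind described can be sketched in a few lines. `gateAll` and its `gate` callback are hypothetical names; the real destination gate and its concurrency limit are not shown in this conversation.

```typescript
// Runs `gate` over every item with at most `concurrency` checks in flight.
// Results keep the input order.
async function gateAll<I, R>(
  items: readonly I[],
  concurrency: number,
  gate: (item: I) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  // All workers pull from one shared iterator, so each item is taken once.
  const queue = items.entries();
  async function worker(): Promise<void> {
    for (const [index, item] of queue) {
      results[index] = await gate(item);
    }
  }
  const workers = Math.max(1, Math.min(Math.floor(concurrency), items.length));
  await Promise.all(Array.from({ length: workers }, () => worker()));
  return results;
}
```

Compared with `Promise.all(items.map(gate))`, this starts at most `concurrency` checks at a time, which is what avoids the DNS and socket burst for large URL lists. In this sketch a rejected check rejects the whole call, while the other workers keep draining the queue in the background.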
@codex review
/gemini review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2a8235764a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Code Review
This pull request implements the actor, object, fanout, failure, and mixed scenario runners for the fedify bench tool, updating the benchmark scenario and report schemas to version 2. It introduces collection crawling for object discovery, background queue drain observation for fanout/failure scenarios, concurrent execution of weighted child scenarios, and comprehensive validation and testing. Feedback is provided regarding objectCandidates in object-discovery.ts, where a missing object property on a wrapper activity could incorrectly cause the activity itself to be fetched as an object candidate.
Object discovery now drops ActivityPub wrapper activities that do not carry an object field instead of benchmarking the wrapper activity URL as if it were the target object. fedify-dev#802 (comment) Assisted-by: Codex:gpt-5.5
Scenario schema v2 now permits signatureVerification expectations for authenticated object reads while keeping unauthenticated object scenarios limited to read metrics. fedify-dev#802 (comment) Assisted-by: Codex:gpt-5.5
/gemini review
@codex review
Code Review
This pull request implements several new benchmark scenario runners (actor, object, fanout, failure, and mixed) for the fedify bench CLI tool, leaving only the collection scenario type reserved but unimplemented. It introduces version 2 of both the benchmark scenario suite schema and the benchmark report schema to support new properties like sinkBase and deliveryThroughputPerSec. Additionally, it adds comprehensive unit tests, updates the documentation, and implements helper modules for crawling collections, resolving actor/object URLs, and validating destination safety. There are no review comments, so I have no feedback to provide.
Codex Review: Didn't find any major issues. Bravo.
Background
This finishes the scenario runner work split out from #744 into #785. Before this change, the benchmark suite format could describe more scenario types than the CLI could actually run. That made the schema useful as a sketch of the intended benchmark surface, but left most of it unavailable to users.
What changed
This PR adds executable runners for actor, object, fanout, failure, and mixed scenarios. The existing inbox and webfinger runners remain in place, and collection stays reserved by the suite format.
Read scenarios now cover actor and object document fetches, including authenticated GET requests. Object discovery can crawl actor collections, unwrap activities, apply type filters, send ActivityPub Accept headers, and gate every discovered read destination before fetching it.
Fanout scenarios now drive the benchmark trigger endpoint, start local sink inboxes, wait for target queue drain, report delivery throughput separately from trigger request throughput, and avoid reporting queue-drain latency when no drain sample was measured.
Failure scenarios now exercise both inbound and outbound fault paths. Inbound faults cover invalid signatures and missing actors. Remote faults cover 404, 410, slow inbox, and network-error delivery behavior through the target's normal outbound path.
Mixed scenarios now run weighted child scenario blends, share the parent maxInFlight cap, reject queue-metric combinations that cannot be attributed cleanly, and wait for child runners to settle before reporting setup errors.
The report model and schema now preserve deliveryThroughputPerSec, published as schema/bench/report-v2.json. Text and Markdown renderers show delivery throughput when a scenario measured it.
The user-facing docs and fixtures were updated in docs/manual/benchmarking.md, CHANGES.md, packages/cli/src/bench/command.ts, and the benchmark scenario fixtures.
Closes #785. Part of #744.
Validation
I used targeted scenario tests throughout the implementation and ran the CLI package checks before each commit. The final branch has also passed the review follow-ups for destination gating, dry-run discovery, signing modes, fanout drain accounting, mixed result aggregation, malformed URL preflight, and help text.
Commands used during the last verification passes included:
- mise run check-each cli
- deno test -A packages/cli/src/bench/scenarios/object.test.ts packages/cli/src/bench/scenarios/mixed.test.ts
- deno test -A packages/cli/src/bench/scenarios/registry.test.ts packages/cli/src/bench/command.test.ts packages/cli/src/bench/schema.test.ts
- git diff --check