
Run remaining benchmark scenarios in fedify bench #802

Open
dahlia wants to merge 50 commits into fedify-dev:main from dahlia:feat/bench/remaining-scenarios

Conversation

@dahlia

@dahlia dahlia commented Jun 11, 2026

Member

Background

This finishes the scenario runner work split out from #744 into #785. Before this change, the benchmark suite format could describe more scenario types than the CLI could actually run. That made the schema useful as a sketch of the intended benchmark surface, but left most of it unavailable to users.

What changed

This PR adds executable runners for actor, object, fanout, failure, and mixed scenarios. The existing inbox and webfinger runners remain in place, and collection stays reserved by the suite format.

Read scenarios now cover actor and object document fetches, including authenticated GET requests. Object discovery can crawl actor collections, unwrap activities, apply type filters, send ActivityPub Accept headers, and gate every discovered read destination before fetching it.
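The unwrap-then-filter step described above can be sketched roughly as follows. This is a minimal illustration, not the code in object-discovery.ts: `selectObjectUrl`, `WRAPPER_TYPES`, and the `Json` alias are hypothetical names, the list of wrapper activity types is an assumed subset, and an `object` that is an array of candidates is not handled here.

```typescript
// Hypothetical helper for illustration; not the code in object-discovery.ts.
type Json = Record<string, unknown>;

// Assumed subset of wrapper activity types; the real list may differ.
const WRAPPER_TYPES: ReadonlySet<string> = new Set(["Create", "Announce", "Update"]);

function isRecord(value: unknown): value is Json {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns the object URL to benchmark for one collection item, or null to skip it.
function selectObjectUrl(
  item: unknown,
  typeFilter?: ReadonlySet<string>,
): string | null {
  let candidate: unknown = item;
  if (isRecord(item)) {
    const itemType = item.type;
    if (typeof itemType === "string" && WRAPPER_TYPES.has(itemType)) {
      // Unwrap the activity; a wrapper without an object field is dropped.
      if (!("object" in item)) return null;
      candidate = item.object;
    }
  }
  // URL-only candidates cannot be type-checked without another fetch,
  // so they are skipped whenever a type filter is set.
  if (typeof candidate === "string") {
    return typeFilter == null ? candidate : null;
  }
  if (!isRecord(candidate)) return null;
  const id = candidate.id;
  const type = candidate.type;
  if (typeof id !== "string") return null;
  // The type filter is applied after the unwrap step.
  if (typeFilter != null && !(typeof type === "string" && typeFilter.has(type))) {
    return null;
  }
  return id;
}
```

Every URL returned by such a helper would still have to pass the read destination gate before it is fetched.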

Fanout scenarios now drive the benchmark trigger endpoint, start local sink inboxes, wait for target queue drain, report delivery throughput separately from trigger request throughput, and avoid reporting queue-drain latency when no drain sample was measured.
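As a rough illustration of why the two rates are reported separately, the arithmetic might look like the sketch below. The `FanoutMeasurement` shape and `fanoutRates` function are hypothetical, and the sketch assumes both counts share a single measured window, which may not match how the runner defines its trigger and drain windows.

```typescript
// Hypothetical input shape; only the two output field names come from the PR.
interface FanoutMeasurement {
  triggerRequests: number; // POSTs sent to the benchmark trigger endpoint
  sinkDeliveries: number; // deliveries received by the local sink inboxes
  durationSec: number; // measured window, assumed shared by both counts
}

interface FanoutRates {
  throughputPerSec: number; // trigger request rate
  deliveryThroughputPerSec: number; // recipient delivery rate
}

function fanoutRates(m: FanoutMeasurement): FanoutRates {
  if (!(m.durationSec > 0)) {
    throw new RangeError("durationSec must be positive");
  }
  return {
    throughputPerSec: m.triggerRequests / m.durationSec,
    deliveryThroughputPerSec: m.sinkDeliveries / m.durationSec,
  };
}

// 100 triggers that each fan out to 50 recipients over 10 seconds.
const rates = fanoutRates({
  triggerRequests: 100,
  sinkDeliveries: 5000,
  durationSec: 10,
});
console.log(rates.throughputPerSec, rates.deliveryThroughputPerSec); // 10 500
```

Folding the second number into the first would make the request rate disagree with the request count, which is why the two stay separate.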

Failure scenarios now exercise both inbound and outbound fault paths. Inbound faults cover invalid signatures and missing actors. Remote faults cover 404, 410, slow inbox, and network-error delivery behavior through the target's normal outbound path.

Mixed scenarios now run weighted child scenario blends, share the parent maxInFlight cap, reject queue-metric combinations that cannot be attributed cleanly, and wait for child runners to settle before reporting setup errors.
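One way to picture the weighted blend and the shared cap is the sketch below. It is not the implementation in mixed.ts: `InFlightLimiter`, `WeightedChild`, and `pickWeighted` are hypothetical names, and the sketch assumes a simple counting semaphore is enough to model the parent-wide maxInFlight limit.

```typescript
// Hypothetical sketch; names are illustrative and not taken from mixed.ts.
class InFlightLimiter {
  private active = 0;
  private readonly waiters: Array<() => void> = [];
  private readonly max: number;

  constructor(max: number) {
    this.max = max;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.max) {
      // Wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiters.shift();
      if (next != null) next(); // the slot passes directly to the waiter
      else this.active--;
    }
  }
}

interface WeightedChild {
  name: string;
  weight: number;
}

// Picks a child in proportion to its weight; `roll` is a number in [0, 1).
function pickWeighted(
  children: readonly WeightedChild[],
  roll: number,
): WeightedChild {
  if (children.length === 0) throw new RangeError("no child scenarios");
  const total = children.reduce((sum, child) => sum + child.weight, 0);
  let threshold = roll * total;
  for (const child of children) {
    threshold -= child.weight;
    if (threshold < 0) return child;
  }
  return children[children.length - 1];
}
```

In this sketch every child would send its requests through the same `InFlightLimiter` instance, so the cap holds across children instead of being multiplied by the number of children running in parallel.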

The report model and schema now preserve deliveryThroughputPerSec, published as schema/bench/report-v2.json. Text and Markdown renderers show delivery throughput when a scenario measured it.

The user-facing docs and fixtures were updated in docs/manual/benchmarking.md, CHANGES.md, packages/cli/src/bench/command.ts, and the benchmark scenario fixtures.

Closes #785. Part of #744.

Validation

I used targeted scenario tests throughout the implementation and ran the CLI package checks before each commit. The final branch has also passed the review follow-ups for destination gating, dry-run discovery, signing modes, fanout drain accounting, mixed result aggregation, malformed URL preflight, and help text.

Commands used during the last verification passes included:

  • mise run check-each cli
  • deno test -A packages/cli/src/bench/scenarios/object.test.ts packages/cli/src/bench/scenarios/mixed.test.ts
  • deno test -A packages/cli/src/bench/scenarios/registry.test.ts packages/cli/src/bench/command.test.ts packages/cli/src/bench/schema.test.ts
  • git diff --check

dahlia added 24 commits June 9, 2026 19:49
Implement actor and object benchmark runners so read-path workloads can fetch
actor documents and object documents after discovery.  Authenticated reads use
the synthetic actor server, and dry runs now describe the resolved read targets
before any load is sent.

fedify-dev#744
fedify-dev#785

Assisted-by: Codex:gpt-5.5
Implement the fanout scenario with benchmark-owned sink inboxes, trigger
endpoint delivery, stats-based queue drain polling, and delivery-throughput
expectations.  Queue task counters are now projected from benchmark stats so
the runner can tell when fanout and outbox work has drained.

fedify-dev#744
fedify-dev#785

Assisted-by: Codex:gpt-5.5
Implement expected-outcome failure benchmarking and weighted mixed scenario
execution so the remaining executable scenario types can run under the
published suite schema.

fedify-dev#744
fedify-dev#785

Assisted-by: Codex:gpt-5.5
Update the benchmarking manual and changelog for the newly executable bench
scenario runners, including queue-backend, failure-outcome, and mixed-run
semantics.

fedify-dev#744
fedify-dev#785

Assisted-by: Codex:gpt-5.5
Keep unauthenticated actor and object reads on a read-only destination gate so
remote private reads do not require a reachable synthetic actor server.  Also
validate mixed child references during runner preflight so bad suites exit as
configuration errors before probing the target.

fedify-dev#744
fedify-dev#785

Assisted-by: Codex:gpt-5.5
Gate object-discovery actor and collection fetches before crawling them,
serialize fanout trigger and drain windows, and merge mixed scenario latency
histograms from the underlying samples.

These fixes keep benchmark safety gates ahead of public discovery requests and
make reported fanout and mixed metrics match the traffic actually measured.

fedify-dev#744
fedify-dev#785

Assisted-by: Codex:gpt-5.5
Apply mixed maxInFlight as a parent-wide fetch limiter instead of copying the
same cap into every concurrent child scenario.  This keeps the configured
in-flight safety limit intact even when children run in parallel.

Also skip URL-only collection items when an object source type filter is set,
because their type cannot be checked without additional discovery fetches.

fedify-dev#744
fedify-dev#785

Assisted-by: Codex:gpt-5.5
Pass the dry-run fetch and read destination gate into object crawl discovery so
inspection cannot contact off-target public actor or collection URLs before the
same safety checks used by real runs.

Also validate explicit object source URLs and crawl seed identifiers during
runner preflight, making malformed object sources fail as configuration errors
before any probe or load is sent.

fedify-dev#744
fedify-dev#785

Assisted-by: Codex:gpt-5.5
Pass the configured dry-run fetch into actor handle discovery so acct: actor
plans use the same injected fetch and User-Agent wrapper as real actor runs.
This prevents dry-run planning from falling back to the global network fetch.

fedify-dev#744
fedify-dev#785

Assisted-by: Codex:gpt-5.5
Route authenticated actor and object reads through the configured signing
pipeline so presign and pipeline modes do not silently fall back to inline JIT
signing during the measured send path.

Resolve and gate inbound failure scenario inboxes before starting the load loop,
then reuse the discovered delivery target for each generated failure request.
This keeps repeated WebFinger and actor discovery out of the measured samples.

fedify-dev#744
fedify-dev#785

Assisted-by: Codex:gpt-5.5
Treat drained fanout queue failures as failed benchmark samples so
failed sink deliveries cannot inflate success rate or delivery
throughput.

Allow missing-actor-only failure scenarios to run against non-loopback
targets without --advertise-host, since the missing actor identity is
placed under the benchmark target rather than the synthetic actor server.

fedify-dev#785

Assisted-by: Codex:gpt-5.5
Route missing-actor failure destinations through the gate that does not
require a reachable synthetic actor server, while still applying public
load safety checks.

Track fanout delivery throughput separately from request throughput so
mixed scenarios cannot satisfy deliveryThroughput expectations with read
traffic from other children.

fedify-dev#785

Assisted-by: Codex:gpt-5.5
Leave fanout throughputPerSec as the trigger request rate and report
recipient delivery rate only through deliveryThroughputPerSec.  This keeps
fanout reports consistent with request counts and prevents mixed scenarios
from folding delivery throughput into ordinary request throughput.

fedify-dev#785

Assisted-by: Codex:gpt-5.5
Send ActivityPub Accept headers while crawling actor and collection sources
for object benchmarks, keeping WebFinger discovery on a JRD Accept header.

Carry deliveryThroughputPerSec into scenario reports, render it in text and
Markdown, and publish report schema v2 so the JSON report preserves the new
field without mutating the immutable v1 schema.

fedify-dev#785

Assisted-by: Codex:gpt-5.5
Keep inbound failure scenarios from treating target-side 5xx responses
as expected client-fault rejections, so server crashes remain visible in
the error buckets.

Reject ambiguous mixed child references when duplicate scenario names
would otherwise make child lookup bind the first matching scenario
silently.

fedify-dev#785

Assisted-by: Codex:gpt-5.5
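The ambiguous-reference check from the commit message above could look something like this during preflight. The function name, the `NamedScenario` shape, and the error messages are assumptions made for illustration only.

```typescript
// Hypothetical preflight helper; not the validation code in the PR.
interface NamedScenario {
  name: string;
}

function resolveChildRefs<T extends NamedScenario>(
  scenarios: readonly T[],
  refs: readonly string[],
): T[] {
  // Group scenarios by name so duplicates are visible instead of shadowed.
  const byName = new Map<string, T[]>();
  for (const scenario of scenarios) {
    const list = byName.get(scenario.name) ?? [];
    list.push(scenario);
    byName.set(scenario.name, list);
  }
  return refs.map((ref) => {
    const matches = byName.get(ref) ?? [];
    if (matches.length === 0) {
      throw new Error(`Unknown child scenario: ${ref}`);
    }
    if (matches.length > 1) {
      // Binding the first match silently would hide a suite mistake.
      throw new Error(`Ambiguous child scenario: ${ref}`);
    }
    return matches[0];
  });
}
```

Failing here, before any probe or load is sent, would let a bad suite exit as a configuration error.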
Make remote failure benchmark modes call the target benchmark trigger and
observe the target's queue metrics before reporting the expected fault as
successful.  This keeps offline or non-participating targets from producing
synthetic passing samples.

Require a sender for remote failure faults and require advertised sink
reachability for non-loopback targets, matching the benchmark-owned sink
server used to exercise the outbound path.

fedify-dev#785

Assisted-by: Codex:gpt-5.5
Unwrap ActivityPub activity items while crawling object sources so outbox
entries benchmark their contained objects instead of the wrapper activity.
Apply source type filters after that unwrap step.

Validate actor recipients during runner preflight so malformed values fail as
configuration errors before probing or sending benchmark load.

fedify-dev#785

Assisted-by: Codex:gpt-5.5
Mixed scenarios currently merge client-side measurements but discard child
server metrics, so server-side expectations would pass preflight and then fail
after load with null actual values.

Reject those expectations during mixed validation instead, before probing or
sending benchmark load.

fedify-dev#785

Assisted-by: Codex:gpt-5.5
Validate actor and object read URLs before safety gates and load scheduling so
malformed resolved destinations do not turn into full-duration failed sample
runs.

fedify-dev#785

Assisted-by: Codex:gpt-5.5
Fail actor and object read scenarios during runner validation when resolved
URLs cannot be fetched safely, and reject mixed queue-observing children that
would share unscoped target queue counters with concurrent queue producers.

fedify-dev#785

Assisted-by: Codex:gpt-5.5
Treat a completed outbound task with a remaining retry as the expected
network-error signal, matching Fedify's normal retry path where transport
failures enqueue follow-up work instead of failing the current task.

fedify-dev#785

Assisted-by: Codex:gpt-5.5
Only report fanout queue-drain latency after at least one measured drain sample,
and validate explicit failure inbox selectors before discovery can turn typos
into runtime URL errors.

fedify-dev#785

Assisted-by: Codex:gpt-5.5
Select typed objects from every unwrapped activity object candidate, and wait
for all mixed child runners to settle before rethrowing a child setup error.

fedify-dev#785

Assisted-by: Codex:gpt-5.5
Describe the newly executable benchmark scenario types in the CLI help and keep
nearby schema/type comments aligned with collection being the only reserved
scenario type.

fedify-dev#785

Assisted-by: Codex:gpt-5.5
@dahlia dahlia added this to the Fedify 2.3 milestone Jun 11, 2026
@dahlia dahlia self-assigned this Jun 11, 2026
@dahlia dahlia added the component/cli (CLI tools related) and component/otel (OpenTelemetry integration) labels Jun 11, 2026
@coderabbitai

coderabbitai Bot commented Jun 11, 2026


Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Walkthrough

Adds actor, object, fanout, failure, and mixed bench scenario runners; shared discovery/read helpers and destination gating; queue-task and permanent-failure snapshot parsing; delivery-throughput and report-v2 support; CLI dry-run/preflight wiring; extensive tests, fixtures, schema, and docs.

Changes

Fedify Benchmark Scenario Runners: Actor, Object, Fanout, Failure, Mixed

Layer / File(s): All benchmark runners & infra (packages/cli/src/bench/*, schema/bench/*, docs/manual/benchmarking.md, CHANGES.md)
Summary: Implements actor/object/fanout/failure/mixed runners; shared discovery/read helpers and destination gates; server snapshot parsing/diffing for queue tasks and permanent delivery failures; deliveryThroughput metric plumbing and report-v2 schema/model/build/render/expect updates; CLI dry-run plan and safety-gating changes; many tests, fixtures, and docs. See hidden review stack for per-range mapping.

Sequence Diagram(s)

sequenceDiagram
  participant Runner
  participant SyntheticServer
  participant Target
  participant StatsStore
  participant Reporter
  Runner->>SyntheticServer: spawn sink servers / create signing fleet
  Runner->>Target: discovery requests (WebFinger, collections) and read requests
  Runner->>Target: POST trigger (fanout / remote-failure)
  Target->>StatsStore: update scopeMetrics (queue tasks, delivery failures)
  Runner->>StatsStore: poll snapshots (baseline & end)
  StatsStore->>Runner: return snapshots
  Runner->>Runner: diff snapshots, aggregate client/server metrics
  Runner->>Reporter: build report-v2 and render outputs

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

type/feature

Suggested reviewers

  • sij411
  • 2chanhaeng

@dahlia

dahlia commented Jun 11, 2026

Member Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request expands the fedify bench tool by implementing several new benchmark scenario runners, including actor, object, fanout, failure, and mixed, while publishing version 2 of both the scenario and report schemas. The changes also introduce support for deterministic sink bases, delivery throughput metrics, and comprehensive unit tests. The review feedback focuses on enhancing runtime robustness across these new components, specifically recommending defensive checks against untrusted JSON payloads in stats-client.ts and object-discovery.ts, safe optional chaining for actor keys in read.ts, and error handling for latency duration parsing in fanout.ts.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread packages/cli/src/bench/scenarios/fanout.ts
Comment thread packages/cli/src/bench/scenarios/read.ts Outdated
Comment thread packages/cli/src/bench/scenarios/object-discovery.ts Outdated
Comment thread packages/cli/src/bench/metrics/stats-client.ts
@chatgpt-codex-connector


Codex Review: Didn't find any major issues. 🎉

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

dahlia added 4 commits June 11, 2026 18:42
Invalid sinkBehavior latency strings could throw while preparing the
local sink server. Treat malformed latency values as the default zero
latency so the benchmark can still run with the rest of the sink
behavior.

fedify-dev#802 (comment)

Assisted-by: Codex:gpt-5.5
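The fallback described in the commit message above might be sketched as below. The actual sinkBehavior latency syntax is not shown in this conversation, so the accepted forms here (`"250ms"`, `"2s"`) and the `parseLatencyMs` name are assumptions; only the "malformed means zero latency" behavior comes from the commit message.

```typescript
// Hypothetical parser; the accepted syntax is assumed for illustration.
function parseLatencyMs(value: unknown): number {
  if (typeof value !== "string") return 0;
  const match = /^(\d+(?:\.\d+)?)(ms|s)$/.exec(value.trim());
  // Malformed values fall back to zero latency instead of throwing.
  if (match == null) return 0;
  const amount = Number(match[1]);
  return match[2] === "s" ? amount * 1000 : amount;
}

console.log(parseLatencyMs("250ms")); // 250
console.log(parseLatencyMs("2s")); // 2000
console.log(parseLatencyMs("fast")); // 0
```

Returning a default keeps the rest of the sink behavior usable, at the cost of silently ignoring a typo; a warning log alongside the fallback would be one way to surface it.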
Authenticated read signing now handles malformed synthetic actor records
without crashing on a missing keys object. It reports the existing clear
RSA signing error instead.

fedify-dev#802 (comment)

Assisted-by: Codex:gpt-5.5
Object discovery now wraps JSON parse failures with the fetched URL so
users can tell which actor or collection response was malformed.

fedify-dev#802 (comment)

Assisted-by: Codex:gpt-5.5
Stats parsing now checks sum data points before reading their values so
malformed target-provided JSON cannot discard the whole snapshot.

fedify-dev#802 (comment)

Assisted-by: Codex:gpt-5.5
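A defensive reader for sum data points, in the spirit of the commit message above, could look like the following. The sketch assumes the target's stats resemble OTLP JSON metrics (a `sum.dataPoints` array whose entries carry `asInt` or `asDouble`); that shape and the `sumDataPoints` name are assumptions, not details confirmed in this conversation.

```typescript
// Hypothetical sketch; the payload shape is assumed to resemble OTLP JSON.
type Json = Record<string, unknown>;

function isRecord(value: unknown): value is Json {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Adds up the usable data points of one metric and skips malformed ones.
function sumDataPoints(metric: unknown): number {
  if (!isRecord(metric)) return 0;
  const sum = metric.sum;
  if (!isRecord(sum)) return 0;
  const points = sum.dataPoints;
  if (!Array.isArray(points)) return 0;
  let total = 0;
  for (const point of points) {
    // A bad entry is skipped so it cannot discard the whole snapshot.
    if (!isRecord(point)) continue;
    const raw = point.asInt ?? point.asDouble;
    const value = typeof raw === "string" ? Number(raw) : raw;
    if (typeof value === "number" && Number.isFinite(value)) total += value;
  }
  return total;
}
```

Because the JSON comes from the benchmark target, treating every level as `unknown` until checked keeps one malformed metric from throwing during snapshot parsing.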
@dahlia

dahlia commented Jun 11, 2026

Member Author

@codex review

@dahlia

dahlia commented Jun 11, 2026

Member Author

/gemini review

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Already looking forward to the next diff.


@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces several new scenario runners (actor, object, fanout, failure, and mixed) to fedify bench, updating the benchmark scenario and report schemas to version 2. It also adds the sinkBase option to generate deterministic benchmark sink inbox URLs. Feedback on the changes identifies a bug in scenarioNeedsReachableLocalServer where it fails to account for the default ["remote-404"] fault when scenario.faults is empty, which incorrectly suggests that a reachable local server is not required.


Comment thread packages/cli/src/bench/action.ts
Failure scenarios without an explicit fault now use the same remote-404
default when checking local sink reachability as the runner uses while
executing the scenario. Scenario schema v2 also allows the omitted fault
form so the documented runner default is reachable from CLI suites.

fedify-dev#802 (comment)

Assisted-by: Codex:gpt-5.5
@dahlia

dahlia commented Jun 11, 2026

Member Author

/gemini review

@dahlia

dahlia commented Jun 11, 2026

Member Author

@codex review

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request implements the 'actor', 'object', 'fanout', 'failure', and 'mixed' scenario runners for the 'fedify bench' command, upgrading the benchmark scenario and report schemas to version 2. It also includes comprehensive tests, validation logic, and documentation updates. Feedback on the changes points out that the destination safety gates passed to child runners in 'mixed' scenarios are pre-bound to the parent scenario, which can lead to incorrect rejections of valid child destinations. Additionally, it is recommended to limit the concurrency of destination checks in read scenarios to prevent DNS resolution bursts or socket exhaustion when processing large lists of URLs.


Comment thread packages/cli/src/bench/scenarios/mixed.ts
Comment thread packages/cli/src/bench/scenarios/read.ts Outdated
@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Keep them coming!


dahlia added 2 commits June 11, 2026 22:36
Mixed child runners now re-bind destination safety gates with the scaled
child scenario instead of inheriting the parent mixed scenario. This keeps
child-specific settings such as destination overrides visible to the safety
checks.

fedify-dev#802 (comment)

Assisted-by: Codex:gpt-5.5
Read scenarios now gate resolved URLs through a bounded worker pool instead
of launching every destination check at once. This avoids creating a burst of
DNS and socket work for large resolved actor or object URL lists.

fedify-dev#802 (comment)

Assisted-by: Codex:gpt-5.5
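A bounded worker pool of the kind described in the commit message above can be sketched in a few lines. `gateAll`, the `gate` callback, and the `concurrency` parameter are hypothetical names; the pool size the runner actually uses is not stated in this conversation.

```typescript
// Hypothetical sketch; not the gating code in read.ts.
async function gateAll(
  urls: readonly string[],
  gate: (url: string) => Promise<void>,
  concurrency: number,
): Promise<void> {
  let next = 0;
  // Each worker pulls the next unchecked URL until the list is exhausted.
  const worker = async (): Promise<void> => {
    while (next < urls.length) {
      const url = urls[next++];
      await gate(url);
    }
  };
  const size = Math.max(1, Math.min(concurrency, urls.length));
  await Promise.all(Array.from({ length: size }, () => worker()));
}
```

With this shape at most `concurrency` destination checks are in flight at once, so a large resolved URL list no longer produces one burst of DNS and socket work. If any check rejects, `Promise.all` rejects with that error while the other workers finish their current item.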
@dahlia

dahlia commented Jun 11, 2026

Member Author

@codex review

@dahlia

dahlia commented Jun 11, 2026

Member Author

/gemini review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2a8235764a


Comment thread packages/cli/src/bench/scenarios/object.ts

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request implements the actor, object, fanout, failure, and mixed scenario runners for the fedify bench tool, updating the benchmark scenario and report schemas to version 2. It introduces collection crawling for object discovery, background queue drain observation for fanout/failure scenarios, concurrent execution of weighted child scenarios, and comprehensive validation and testing. Feedback is provided regarding objectCandidates in object-discovery.ts, where a missing object property on a wrapper activity could incorrectly cause the activity itself to be fetched as an object candidate.


Comment thread packages/cli/src/bench/scenarios/object-discovery.ts Outdated
dahlia added 2 commits June 11, 2026 22:59
Object discovery now drops ActivityPub wrapper activities that do not
carry an object field instead of benchmarking the wrapper activity URL as
if it were the target object.

fedify-dev#802 (comment)

Assisted-by: Codex:gpt-5.5
Scenario schema v2 now permits signatureVerification expectations for
authenticated object reads while keeping unauthenticated object scenarios
limited to read metrics.

fedify-dev#802 (comment)

Assisted-by: Codex:gpt-5.5
@dahlia

dahlia commented Jun 11, 2026

Member Author

/gemini review

@dahlia

dahlia commented Jun 11, 2026

Member Author

@codex review

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request implements several new benchmark scenario runners (actor, object, fanout, failure, and mixed) for the fedify bench CLI tool, leaving only the collection scenario type reserved but unimplemented. It introduces version 2 of both the benchmark scenario suite schema and the benchmark report schema to support new properties like sinkBase and deliveryThroughputPerSec. Additionally, it adds comprehensive unit tests, updates the documentation, and implements helper modules for crawling collections, resolving actor/object URLs, and validating destination safety. There are no review comments, so I have no feedback to provide.


@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Bravo.


@dahlia dahlia requested review from 2chanhaeng and sij411 June 11, 2026 15:02

Labels

component/cli (CLI tools related), component/otel (OpenTelemetry integration)

Development

Successfully merging this pull request may close these issues.

Benchmarking: remaining scenario types (actor, object, fanout, failure, mixed)

1 participant