Add dual-path EdgeZero entry point with feature flag (PR 14)#628
Add dual-path EdgeZero entry point with feature flag (PR 14)#628prk-Jr wants to merge 212 commits into
Conversation
Rename crates/common → crates/trusted-server-core and crates/fastly → crates/trusted-server-adapter-fastly following the EdgeZero naming convention. Add EdgeZero workspace dependencies pinned to rev 170b74b. Update all references across docs, CI workflows, scripts, agent files, and configuration.
Introduces trusted-server-core::platform with PlatformConfigStore, PlatformSecretStore, PlatformKvStore, PlatformBackend, PlatformHttpClient, and PlatformGeo traits alongside ClientInfo, PlatformError, and RuntimeServices. Wires the Fastly adapter implementations and threads RuntimeServices into route_request. Moves GeoInfo to platform/types as platform-neutral data and adds geo_from_fastly for field mapping.
…o-pr2-platform-traits
- Defer KV store opening: replace early error return with a local UnavailableKvStore fallback so routes that do not need synthetic ID access succeed when the KV store is missing or temporarily unavailable - Use ConfigStore::try_open + try_get and SecretStore::try_get throughout FastlyPlatformConfigStore and FastlyPlatformSecretStore to honour the Result contract instead of panicking on open/lookup failure - Encapsulate RuntimeServices service fields as pub(crate) with public getter methods (config_store, secret_store, backend, http_client, geo) and a pub new() constructor; adapter updated to use new() - Reference #487 in FastlyPlatformHttpClient stub (PR 6 implements it) - Remove unused KvPage re-export from platform/mod.rs - Use super::KvHandle shorthand in RuntimeServices::kv_handle()
- Split fastly_storage.rs into storage/{config_store,secret_store,api_client,mod}.rs
- Add PlatformConfigStore read path via FastlyPlatformConfigStore::get using ConfigStore::try_open/try_get
- Add PlatformError::NotImplemented variant; stub write methods on FastlyPlatformConfigStore and FastlyPlatformSecretStore
- Add StoreName/StoreId newtypes with From<String>, From<&str>, AsRef<str>
- Add UnavailableKvStore to core platform module
- Add RuntimeServicesBuilder replacing 7-arg constructor
- Migrate get_active_jwks and handle_trusted_server_discovery to use &RuntimeServices
- Update call sites in signing.rs, rotation.rs, main.rs
- Add success-path test for handle_trusted_server_discovery using StubJwksConfigStore
- Fix test_parse_cookies_to_jar_empty typo (was emtpy)
- Make StoreName and StoreId inner fields private; From/AsRef provide all needed construction and access - Add #[deprecated] to GeoInfo::from_request with #[allow(deprecated)] at the three legacy call sites to track migration progress - Enumerate the six platform traits in the platform module doc comment - Extract backend_config_from_spec helper to remove duplicate BackendConfig construction in predict_name and ensure - Replace .into_iter().collect() with .to_vec() on secret plaintext bytes - Remove unused bytes dependency from trusted-server-adapter-fastly - Add comment on SecretStore::open clarifying it already returns Result (unlike ConfigStore::open which panics)
…sing path The buffering cap was only applied when the EdgeZero entry point converted a streaming publisher response into a buffered one. The shared handle_publisher_request BufferedProcessed arm — taken for HTML responses with a registered post-processor such as NextJS — decoded and re-wrote the body into an unbounded Vec, so a highly-compressible origin response under the platform raw-body cap could expand past max_buffered_body_bytes and exhaust the Wasm heap. Move BoundedWriter into trusted-server-core and apply it in the BufferedProcessed arm so both that path and the EdgeZero stream-to-buffer conversion enforce the same configured limit. Add a regression test driving a registered HTML post-processor with a response that exceeds a small cap and asserting the request errors instead of allocating past the limit.
ChristianPavilonis
left a comment
There was a problem hiding this comment.
Automated review: I reviewed PR #628 against feature/edgezero-pr13-integration-provider-type-migration, focusing on the current dual-path EdgeZero entry point, EC/publisher fallback parity, and the latest buffering/config changes. CI checks surfaced by GitHub are green, and most previously raised blockers appear addressed. I found one blocking reliability issue: the new publisher.max_buffered_body_bytes guard is applied after the HTML post-processor has already accumulated the full decoded document, so the cap does not actually prevent the Wasm-heap OOM class it is intended to mitigate. I also left one non-blocking operator-doc correction.
Merge 6e191e9 combined the edgezero http::Request/Response migration with new auction tests from main that still used the Fastly SDK API, and added the RequestTooLarge error variant. Update the auction format tests to build Request<EdgeBody> via the http builder, read response bodies through EdgeBody::into_bytes, pass the new services and geo arguments to convert_tsjs_to_auction_request, and use http accessors. Cover the new RequestTooLarge variant in the error status-code guard and mapping test.
…yer-types' into feature/edgezero-pr13-integration-provider-type-migration # Conflicts: # crates/trusted-server-adapter-fastly/src/platform.rs # crates/trusted-server-core/src/auction/orchestrator.rs # crates/trusted-server-core/src/auction/provider.rs # crates/trusted-server-core/src/integrations/didomi.rs # crates/trusted-server-core/src/integrations/registry.rs # crates/trusted-server-core/src/platform/test_support.rs
Provider transport failures (select() errors) pushed a bare AuctionResponse::error with no metadata, while parse and launch failures attach error_type/message. Route transport failures through a shared helper with a new ERROR_TYPE_TRANSPORT classifier and a static, upstream-safe message, so the public /auction response shape no longer depends on how a provider failed. Add unit coverage for the new helper.
…n-provider-type-migration' into feature/edgezero-pr14-entry-point-dual-path # Conflicts: # crates/trusted-server-adapter-fastly/src/main.rs # crates/trusted-server-core/src/platform/test_support.rs # crates/trusted-server-core/src/proxy.rs
The publisher.max_buffered_body_bytes cap was only enforced by BoundedWriter on the bytes emitted from process_response_streaming. The HTML path with registered post-processors first accumulates every rewritten chunk in HtmlWithPostProcessing::accumulated_output and emits nothing until the final chunk, so a highly compressible document could grow that buffer far past the cap before the writer ever saw it — the exact Wasm-heap OOM the setting is meant to prevent. Thread the configured limit into the accumulator and reject before extending it, and re-check the post-processed output (post-processors may append content). Errors use the same message and propagate through the same process_chunk path that maps to a 5xx proxy error. Add a regression test proving the accumulator itself cannot grow past the cap mid-stream rather than only failing on the final write.
BoundedWriter errors map through TrustedServerError::Proxy, whose status is 502 Bad Gateway, not 500. Operators may build runbooks from this sample, so document the actual 502 proxy-error status.
…-entry-point-dual-path # Conflicts: # crates/trusted-server-adapter-fastly/src/main.rs # crates/trusted-server-core/src/ec/mod.rs # crates/trusted-server-core/src/proxy.rs # crates/trusted-server-core/src/publisher.rs
ChristianPavilonis
left a comment
There was a problem hiding this comment.
Automated review: I reviewed the current PR #628 diff against origin/main, including the EdgeZero entry point, router fallback, EC finalization lifecycle, and current CI/review context. CI is green, and the earlier forwarded-header/router/middleware blockers appear addressed, but I found two high-confidence EdgeZero-path regressions that should be fixed before this feature flag can be safely enabled: configured asset routes are bypassed entirely, and invalid EC partner configuration fails open for fallback responses instead of failing the request as legacy does.
Inline comments contain the actionable findings. Because these are compatibility/correctness regressions on the new entry point, I am submitting REQUEST_CHANGES.
CI observed via gh pr checks 628: all listed checks pass (cargo fmt, cargo test, clippy/Analyze rust, vitest, format-typescript, format-docs, browser/integration tests, CodeQL). Existing reviews already covered forwarded header sanitization, router-level 405/finalize behavior, TLS metadata, parse_edgezero_flag, and buffering; I avoided duplicating those.
aram356
left a comment
There was a problem hiding this comment.
Summary
Re-review at head 34e16098. All three round-1 findings are resolved (the max_buffered_body_bytes default/null contradictions and the geo-401 parity note). New work — the EC identity lifecycle port, the app.router().oneshot() switch to preserve duplicate Set-Cookie headers, and fastly-ssl stripping before origin forwarding — is solid and well-tested. Two items need attention before merge: a shared-path buffering cap that changes legacy behavior while the docs scope it to EdgeZero, and confirmation that the client-IP request context still reaches the router after the dispatch bypass.
Blocking
🔧 wrench
- 16 MiB cap now also constrains the legacy path, but docs say EdgeZero-only: the buffered HTML-post-processed branch of the shared
handle_publisher_requestwent from unboundedVec::new()toBoundedWriter::new(max_buffered_body_bytes), so legacy/default-path post-processed responses over 16 MiB decoded now 500. Reconcile code with thesettings.rs/trusted-server.tomldocs (publisher.rs:699).
❓ question
- Confirm
client_ip(FastlyRequestContext) reaches the router on the EdgeZero path: theinto_core_request+oneshotbypass inserts onlyconfig_store; EC consent geo-gating and middleware geo read the IP viaFastlyRequestContext. Tests useNone-IP requests so they wouldn't catch a missing context (main.rs:243).
Non-blocking
🤔 thinking
- CORS preflight builds full EC state:
OPTIONS /_ts/api/v1/identifyruns the device/geo/KV prelude and attachesEcFinalizeStatebefore routing to the CORS handler; confirm this matches legacy or short-circuit OPTIONS before the prelude (app.rs:337).
CI Status
- browser integration tests: PASS
- integration tests: PASS
- prepare integration artifacts: PASS
- fmt / clippy / unit tests: not surfaced as separate PR checks
Address two EdgeZero-path regressions and one doc mismatch from PR 628 review: - Asset routes: dispatch_fallback now checks settings.asset_route_for_path for GET/HEAD before the publisher fallback and dispatches matches through handle_asset_proxy_request, mirroring legacy route_request. Asset responses skip EC finalization (no EcFinalizeState) and thread AssetProxyCachePolicy out so edgezero_main reapplies protected cache directives after finalization. - EC partner config: build PartnerRegistry in build_state_from_settings so an invalid ec.partners config fails closed at startup (routes() falls back to startup_error_router) instead of serving publisher fallback responses with the partner-registry error only logged at finalize. Per-request rebuilds in identify/auction/batch-sync now reuse the shared registry. - Docs: max_buffered_body_bytes cap applies to any buffered post-processed publisher response on both the legacy and EdgeZero paths; broaden the settings.rs and trusted-server.toml doc strings to match the shared code path. Add regression tests for the asset-route fallback and fail-closed partner config.
ChristianPavilonis
left a comment
There was a problem hiding this comment.
Automated review: I reviewed the current PR #628 diff against origin/main with a focus on EdgeZero/legacy parity, EC lifecycle correctness, asset fallback behavior, and the new buffering setting. I found one blocking EdgeZero correctness regression: device signals are derived after the Fastly request has been converted, so the JA4/H2 signals used by the EC bot gate are lost and real browser requests are treated as non-browser traffic. I also left two non-blocking compatibility/maintainability findings around asset-route buffering and the Rust Default value for the new publisher buffer cap.
CI is currently green in GitHub checks (cargo test, vitest, format jobs, integration tests, and CodeQL). Existing review threads cover many earlier EdgeZero parity issues; these findings are for remaining/current issues I did not see resolved in prior comments.
…-entry-point-dual-path # Conflicts: # crates/trusted-server-adapter-fastly/src/main.rs
The EdgeZero path derived device signals from a synthetic fastly request reconstructed from the EdgeZero HTTP request. That request cannot expose Fastly-only TLS values (`get_tls_ja4`, `get_client_h2_fingerprint`), so the JA4/H2 class the EC bot gate relies on was lost and real browsers were classified as non-browser traffic — silently suppressing EC generation, finalize KV writes, and pull-sync. Derive `DeviceSignals` in `edgezero_main` from the original `FastlyRequest` before `into_core_request` consumes it, store them in the request extensions, and read them back in `build_ec_request_state` (falling back to the reconstructed request when absent, e.g. in tests). Add EdgeZero-path tests proving a browser-shaped request reaches the EC finalize path and that a request without captured signals is treated as non-browser.
The EdgeZero asset-route fallback unconditionally rewrote Content-Length to the buffered byte count whenever the asset origin returned a streaming body. For HEAD responses and bodiless statuses (204, 304) — which advertise the origin representation length while carrying no body — that rewrote a meaningful Content-Length to 0, corrupting the metadata. Only recompute Content-Length and attach the buffered body for body-bearing responses; HEAD/204/304 keep the origin Content-Length untouched. Add a unit test for the body-presence guard. Also document the interim asset buffering behavior: the EdgeZero path buffers asset bodies (legacy streams them uncapped) bounded by the publisher OOM guard. Restoring streaming and deciding whether assets warrant a dedicated cap are deferred to the streaming cutover (issue #495).
Publisher derived Default, so Publisher::default() / Settings::default() set max_buffered_body_bytes to usize's 0 while deserializing TOML without the key applied the intended 16 MiB default. Programmatic defaults are used in tests and helper code, where any buffered post-processing would fail immediately at a zero-byte cap. Replace the derived Default with a manual impl that sources default_max_buffered_body_bytes(), keeping the other fields at their derived defaults. Add a unit test asserting the manual default and the TOML default stay aligned.
ChristianPavilonis
left a comment
There was a problem hiding this comment.
Automated review:
Review Summary
Reviewed PR #628 at b0a57ac against main, focusing on the dual-path EdgeZero entry point, routing/middleware parity, EC/publisher handling, config, and tests. CI is green, and several prior review findings appear addressed, but I found one blocking security/parity regression: the EdgeZero router never runs the integration request-filter pipeline, which bypasses DataDome-style protection/challenge behavior and drops request/response mutations when the flag is enabled.
Findings
P0 / Blockers
- EdgeZero path bypasses integration request filters — see inline comment on
crates/trusted-server-adapter-fastly/src/app.rs.
P1 / High
None.
P2 / Medium
publisher.max_buffered_body_bytes = 0is accepted and can break all buffered publisher responses — see inline comment oncrates/trusted-server-core/src/settings.rs.- Publisher
HEADresponses can be returned withContent-Length: 0after EdgeZero buffering — see inline comment oncrates/trusted-server-adapter-fastly/src/main.rs. - No enabled EdgeZero-path integration/Viceroy coverage — the local
trusted_server_config.edgezero_enableddefault isfalse, and the integration-test workflow does not override it, so CI still primarily exercises the legacy entry point while this PR's new production path can regress in Fastly request conversion, config-store dispatch, and end-to-end middleware/EC wiring. Please add at least one integration-test variant with the flag enabled covering representative publisher fallback, protected-route/auth, EC/API, and asset/proxy paths.
P3 / Low
- Configuration docs omit the new publisher buffer cap —
trusted-server.tomldocumentspublisher.max_buffered_body_bytes, butdocs/guide/configuration.mdstill omits the field and its environment override. Please add the default, operational impact, effective Fastly limits, andTRUSTED_SERVER__PUBLISHER__MAX_BUFFERED_BODY_BYTESexample.
CI / Existing Reviews
gh pr checks 628 reports all visible checks passing, including Rust analysis, cargo test, cargo fmt, vitest, TypeScript/docs format, CodeQL, and integration/browser tests. Existing reviews include earlier CHANGES_REQUESTED rounds; several prior findings have author replies/fixes, but the PR review decision remains CHANGES_REQUESTED. This review is submitted as REQUEST_CHANGES because the request-filter bypass is a blocking security/correctness regression when edgezero_enabled is enabled.
ChristianPavilonis
left a comment
There was a problem hiding this comment.
Summary
Reviewed the current PR #628 diff after the latest fixes. Requesting changes because the EdgeZero path still misses a security-critical legacy hook: integration request filters, including DataDome server-side protection, do not run when edgezero_enabled=true.
Cross-cutting finding
🔧 wrench — CI does not exercise the EdgeZero Fastly entry point
The runtime branch only enters edgezero_main when trusted_server_config.edgezero_enabled is readable and true, but the Rust Viceroy fixture defines only jwks_store, and both browser CI runs point at that same fixture. That means the Fastly request conversion/finalization/send-to-client behavior introduced by this PR can be untested while CI stays green. Please add an EdgeZero-enabled Viceroy fixture or CI matrix entry with trusted_server_config.contents.edgezero_enabled = "true" and run at least a focused parity/smoke subset against it before merging.
Existing GitHub checks are green, but they currently appear to exercise the legacy entry point unless a test explicitly bypasses main.
Address PR review findings on the dual-path EdgeZero entry point. - Run the integration request-filter pipeline (DataDome protection) on the EdgeZero dispatch path, mirroring legacy ordering: after auth, before route match. Respond short-circuits keep EcFinalizeState; response effects thread out via extensions and apply after EC finalization in edgezero_main. Add dispatch regression tests via a test-utils-gated registry constructor. - Reject publisher.max_buffered_body_bytes = 0 at config validation. - Make the buffered publisher resolver method-aware so HEAD and bodiless statuses keep origin Content-Length instead of a misleading 0. - Bound cumulative decoded HTML input so a placeholder-emitting rewriter cannot grow the Wasm heap behind the output cap. - Add an EdgeZero-enabled Viceroy fixture and integration-tests-edgezero CI job running the EC lifecycle suite against the EdgeZero entry point. - Document publisher.max_buffered_body_bytes and the edgezero_enabled runtime flag in the configuration guide.
ChristianPavilonis
left a comment
There was a problem hiding this comment.
Automated review:
Review Summary
Reviewed PR #628 against main, focusing on the current dual-path EdgeZero entry point after prior review fixes. The major previously reported parity issues appear addressed, but I found one remaining security-relevant regression: the EdgeZero path drops Fastly request metadata that DataDome protection consumes for bot detection. I also left a non-blocking CI canary suggestion.
Findings
P0 / Blockers
None.
P1 / High
See inline comment on crates/trusted-server-adapter-fastly/src/app.rs.
P2 / Medium
See inline comment on .github/workflows/integration-tests.yml.
P3 / Low
None.
CI / Existing Reviews
GitHub checks are passing, including the EdgeZero integration job. Existing reviews already covered forwarded-header sanitization, build-state panic handling, dependency pinning, router-level 405/finalization, flag parsing, auth error conversion, and several HTTP/header parity issues; I avoided duplicating those.
| .backend(Arc::new(FastlyPlatformBackend)) | ||
| .http_client(Arc::new(FastlyPlatformHttpClient)) | ||
| .geo(Arc::new(FastlyPlatformGeo)) | ||
| .client_info(ClientInfo { |
There was a problem hiding this comment.
Automated review: P1 / High — EdgeZero drops bot-protection metadata before DataDome sees it.
build_per_request_services only carries client_ip into RuntimeServices::client_info and leaves tls_protocol, tls_cipher, tls_ja4, h2_fingerprint, server_hostname, and server_region empty. The legacy Fastly path populates the same fields from the original fastly::Request and env (platform.rs:570-577), and DataDome protection serializes them into its payload (ServerName, ServerRegion, TlsProtocol, TlsCipher, JA4, H2Fingerprint). On the EdgeZero path, DataDome therefore receives a materially weaker bot-protection payload even though edgezero_main still has the original Fastly request before into_core_request consumes it.
Suggested fix: capture the full ClientInfo in edgezero_main before conversion (or insert an extension alongside DeviceSignals) and have this builder read that extension instead of defaulting these fields to empty. Add a regression test with sample TLS/JA4/H2/server metadata and a DataDome request filter assertion that the EdgeZero protection payload includes those fields.
| # dispatch, publisher fallback proxying, and end-to-end EC/API wiring on | ||
| # the EdgeZero path. The legacy `integration-tests` job above still covers | ||
| # the full framework matrix. | ||
| - name: Run EdgeZero EC lifecycle tests |
There was a problem hiding this comment.
Automated review: P2 / Medium — EdgeZero CI can false-pass on the legacy path.
This job relies on VICEROY_CONFIG_PATH enabling edgezero_enabled, but main() intentionally falls back to legacy_main when the config store cannot be opened/read. The reused EC lifecycle scenarios are also valid on the legacy path, so a fixture/env/config-store regression could make this job green while exercising legacy instead of EdgeZero.
Suggested fix: add an EdgeZero-only canary to this job/suite, such as asserting TRACE / returns the documented EdgeZero router-level 405 rather than the legacy publisher fallback, or exposing a test-only response marker that proves the EdgeZero branch was taken.
Summary
TrustedServerAppor the preservedlegacy_mainbased onedgezero_enabledin thetrusted_server_configFastly ConfigStore. Missing, unreadable, or non-true flag values fall back to legacy.GET /healthas a cheap entry-point shortcut that bypasses logging, settings, and app construction.TrustedServerAppviaedgezero_core::app::HookswithFinalizeResponseMiddlewareandAuthMiddleware; auction and publisher fallback routes fail closed when a configured consent store cannot be opened.rev = "38198f9839b70aef03ab971ae5876982773fc2a1"and bumpstomlto"1.1".ProxyRequestConfig.allowed_domains:&[]is open mode for operator-configured integration proxies; a non-empty slice restricts first-party proxy initial targets and redirect hops.Changes
Cargo.tomledgezero-adapter-axum,edgezero-adapter-cloudflare,edgezero-adapter-fastly, andedgezero-coretorev = "38198f9839b70aef03ab971ae5876982773fc2a1"; bumptomlto"1.1"Cargo.locktomlversion changefastly.tomltrusted_server_configwithedgezero_enabled = "false"as the defaultcrates/trusted-server-adapter-fastly/src/main.rs/healthshortcut, EdgeZerodispatch_with_config, entry-pointmatch get_settings()finalize block for router-level 405/404 responses, and PR13 legacy streaming handlingcrates/trusted-server-adapter-fastly/src/app.rsTrustedServerApp, middleware wiring, consent-aware runtime service selection, table-driven named route registration, androutes_for_statetest seamcrates/trusted-server-adapter-fastly/src/middleware.rsapply_finalize_headershelpercrates/trusted-server-adapter-fastly/src/route_tests.rsHandlerOutcomeand the consent-store fail-closed behaviorcrates/trusted-server-core/src/proxy.rsProxyRequestConfig.allowed_domainsopen-mode versus restricted-mode semanticsCloses
Closes #495
Test plan
cargo test -p trusted-server-adapter-fastly dispatch_auction_with_missing_consent_store_returns_503cargo test -p trusted-server-adapter-fastlycargo test --workspacecargo clippy --workspace --all-targets --all-features -- -D warningscargo fmt --all -- --checkcd crates/js/lib && npx vitest run(not rerun in this cleanup)cd crates/js/lib && npm run format(not rerun in this cleanup)cd docs && npm run format(not rerun in this cleanup)Checklist
unwrap()in production code - useexpect("should ...")logmacros, notprintln!