Add vMCP assembly API in pkg/vmcp/app#5672
Conversation
There was a problem hiding this comment.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #5672 +/- ##
==========================================
+ Coverage 70.61% 70.72% +0.10%
==========================================
Files 667 672 +5
Lines 67607 67856 +249
==========================================
+ Hits 47743 47989 +246
+ Misses 16410 16382 -28
- Partials 3454 3485 +31 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
tgrunnagle
left a comment
There was a problem hiding this comment.
Multi-Agent Consensus Review
Agents consulted: api-design, architecture, go-idioms, test-coverage, general-quality
Consensus Summary
| # | Finding | File | Consensus | Severity | Action |
|---|---|---|---|---|---|
| F1 | Discovered-mode watcher silently discarded in BuildCore | build_core.go:199 | 9/10 | MEDIUM | Fix |
| F2 | exampleembedder not anchored in CI build graph | exampleembedder/embedder.go:1 | 9/10 | MEDIUM | Fix |
| F3 | Static/dynamic registries built twice (double-discovery risk) | build_core.go:146 | 7/10 | MEDIUM | Discuss |
| F4 | TestBuildCore_InlineDiscovery bypasses the path it claims to test | build_core_test.go:43 | 8/10 | MEDIUM | Fix |
| F5 | Three deleted CLI tests have no port in new package | serve_test.go (deletions) | 7/10 | MEDIUM | Fix |
| F6 | Shallow copy for OutgoingAuth missing — mutation escapes to caller | build_server_config.go:58 | 8/10 | MEDIUM | Fix |
| F7 | "discovered" compared as magic string literal |
build_core.go:149 | 7/10 | LOW | Fix |
| F8 | WithBackendRegistry accepts nil registry silently | options.go:144 | 7/10 | LOW | Fix |
| F9 | Asymmetric docstring language for discovered mode | build_server_config.go:29 | 7/10 | LOW | Fix |
| F10 | applyOptions panics on nil Option in slice | options.go:184 | 7/10 | LOW | Fix |
| F11 | pkg/vmcp/app missing from vmcp-library.md stability table | docs/arch/vmcp-library.md | 7/10 | LOW | Fix |
| F12 | LateBoundElicitationRequester lacks stability annotation | elicitation_latebound.go:27 | 7/10 | LOW | Fix |
| F13 | Close() error discarded without explanation | build_core.go:143 | 7/10 | LOW | Fix |
| F14 | VMCP_NAMESPACE error message starts uppercase | build_core.go:181 | 8/10 | LOW | Fix |
| F15 | Rate-limit cleanup captures context.Background() | build_server_config.go:148 | 7/10 | LOW | Fix |
| F16 | Log message starts uppercase, inconsistent | build_core.go:177 | 7/10 | LOW | Fix |
| F17 | Cleanup-idempotency tests are vacuously true | build_server_config_test.go:153 | 9/10 | LOW | Fix |
| F18 | deriveHealthMonitorConfig error path untested | build_core_test.go | 7/10 | LOW | Fix |
| F19 | BuildServerConfig static-backend mode untested | build_server_config_test.go | 7/10 | LOW | Fix |
| F20 | exampleembedder cleanup guard pattern is brittle | exampleembedder/embedder.go:59 | 7/10 | LOW | Polish |
Overall
This PR introduces pkg/vmcp/app as the new public assembly package for vMCP, implementing the design specified in issue #5581. The approach — two independent derivation functions (BuildCore, BuildServerConfig) sharing stateful collaborators via functional options — is architecturally sound and correctly reflects the "independent projections from the same config" pattern the issue describes. The CLI migration (serve.go reduced from ~700 to ~360 lines) validates the API in the real production caller.
The main risk is a behavioral asymmetry in the Kubernetes "discovered" mode: BuildServerConfig enforces WithBackendRegistry at runtime (returning an error if absent), while BuildCore silently starts its own K8s informer with no readiness-gating handle when the option is omitted. An embedder who provides WithBackendRegistry to BuildServerConfig but forgets BuildCore gets two informer caches and no clear error. The docstring says "SHOULD provide" rather than enforcing it — this understates the risk for a public API intended for external embedders.
The test gaps (F4, F5: misleadingly-named test, three deleted tests with no port) and the missing importgraph anchor for the exampleembedder (F2) are addressable without design changes and worth fixing before downstream adoption.
Documentation
docs/arch/vmcp-library.md: The stability table has no entry for the new pkg/vmcp/app package (intended for downstream embedders). Add a row with an Experimental designation covering BuildCore, BuildServerConfig, and Option. Also note the LateBoundElicitationRequester addition to the pkg/vmcp/server entry.
F5 — Deleted tests with no port (no diff line available): TestRunDiscovery_KubernetesGroupNotFound, TestRunDiscovery_ZeroBackends, and TestVMCPNamespace were deleted from pkg/vmcp/cli/serve_test.go. The behavior each tested still exists verbatim in build_core.go (lines 240–253) and build_server_config.go. Port all three to pkg/vmcp/app/build_core_test.go / pkg/vmcp/app/build_server_config_test.go.
Generated with Claude Code
Addresses #5672 review comments: - MEDIUM build_core.go (3487247177, F1): make BuildCore return an error for the discovered outgoingAuth source without WithBackendRegistry, symmetric with BuildServerConfig; delete the internal Kubernetes registry build path and the now-dead WithRESTConfig option. Resolves F9/F14/F16 by removal. - MEDIUM exampleembedder/embedder.go (3487247182, F2): add importgraph_test.go anchoring the example in the CI build graph. - MEDIUM build_core.go (3487247185, F3): document that static/dynamic modes build registries independently and may snapshot state at different moments. - MEDIUM build_core_test.go (3487247187, F4): rename misleading InlineDiscovery test to WithInjectedRegistry_SkipsDiscovery; drop the duplicate. - MEDIUM serve_test.go deletions (F5): port TestRunDiscovery_ZeroBackends, TestRunDiscovery_KubernetesGroupNotFound, and TestVMCPNamespace to the app package. - MEDIUM build_server_config.go (3487247188, F6): deep-copy OutgoingAuth before InjectSubjectProviderNames so strategy mutations do not escape to the caller. - LOW build_core.go/build_server_config.go (3487247195, F7): add OutgoingAuthSource{Inline,Discovered} constants and use them. - LOW options.go (3487247197, F8): panic on nil registry in WithBackendRegistry. - LOW build_server_config.go (3487247201, F9): align discovered-mode docstring to MUST. - LOW options.go (3487247203, F10): skip nil Option in applyOptions. - LOW vmcp-library.md (F11): add pkg/vmcp/app row; note LateBoundElicitationRequester. - LOW elicitation_latebound.go (3487247205, F12): add Experimental annotation. - LOW build_core.go (3487247208, F13): log core Close error in cleanup. - LOW build_server_config.go (3487247210, F15): document context.Background() in cleanup. - LOW build_*_test.go (3487247212, F17): rename vacuous cleanup tests to DoesNotPanic. - LOW build_core_test.go (3487247214, F18): add health-monitor invalid-threshold test. - LOW build_server_config_test.go (3487247215, F19): add static-backend test. - LOW exampleembedder/embedder.go (3487247216, F20): consolidate cleanup guards into a single disarm flag. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Large PR justification has been provided. Thank you!
|
✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review. |
Addresses #5672 review comments: - MEDIUM build_core.go (3487247177, F1): make BuildCore return an error for the discovered outgoingAuth source without WithBackendRegistry, symmetric with BuildServerConfig; delete the internal Kubernetes registry build path and the now-dead WithRESTConfig option. Resolves F9/F14/F16 by removal. - MEDIUM exampleembedder/embedder.go (3487247182, F2): add importgraph_test.go anchoring the example in the CI build graph. - MEDIUM build_core.go (3487247185, F3): document that static/dynamic modes build registries independently and may snapshot state at different moments. - MEDIUM build_core_test.go (3487247187, F4): rename misleading InlineDiscovery test to WithInjectedRegistry_SkipsDiscovery; drop the duplicate. - MEDIUM serve_test.go deletions (F5): port TestRunDiscovery_ZeroBackends, TestRunDiscovery_KubernetesGroupNotFound, and TestVMCPNamespace to the app package. - MEDIUM build_server_config.go (3487247188, F6): deep-copy OutgoingAuth before InjectSubjectProviderNames so strategy mutations do not escape to the caller. - LOW build_core.go/build_server_config.go (3487247195, F7): add OutgoingAuthSource{Inline,Discovered} constants and use them. - LOW options.go (3487247197, F8): panic on nil registry in WithBackendRegistry. - LOW build_server_config.go (3487247201, F9): align discovered-mode docstring to MUST. - LOW options.go (3487247203, F10): skip nil Option in applyOptions. - LOW vmcp-library.md (F11): add pkg/vmcp/app row; note LateBoundElicitationRequester. - LOW elicitation_latebound.go (3487247205, F12): add Experimental annotation. - LOW build_core.go (3487247208, F13): log core Close error in cleanup. - LOW build_server_config.go (3487247210, F15): document context.Background() in cleanup. - LOW build_*_test.go (3487247212, F17): rename vacuous cleanup tests to DoesNotPanic. - LOW build_core_test.go (3487247214, F18): add health-monitor invalid-threshold test. - LOW build_server_config_test.go (3487247215, F19): add static-backend test. - LOW exampleembedder/embedder.go (3487247216, F20): consolidate cleanup guards into a single disarm flag. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1bac365 to
b114ee3
Compare
|
Design question on the assembly API shape — would a builder fit better here? Reading through the new I think these are all symptoms of one thing: The elicitation case is the clearest example. The late-bind exists because Which made me wonder whether a small builder that owns b := app.NewBuilder(ctx, cfg, app.WithVersion(v), app.WithListenAddress(host, port))
b.Decorate(myDecorator) // the THV-0076 seam
// optional overrides with config-derived defaults: WithTelemetryProvider, WithBackendRegistry, ...
srv, core, err := b.Finish() // builds the registry once, wires elicitation internally, owns cleanup
If That said, I may well be missing constraints that make this awkward — e.g. an embedder needing to do something between core and server beyond decoration, or a reason the two derivations genuinely need to stay independent. wdyt? Happy to be talked out of it if there's complexity here I'm not seeing. |
|
Thanks @jerm-dro — this was a good call, implemented in 14ddcec9.
srv, core, cleanup, err := app.NewBuilder(ctx, cfg,
app.WithVersion(v), app.WithHost(host, port)).
Decorate(myDecorator).
Finish()
defer cleanup()
One deliberate deviation from the proposal: I kept The one place I didn't fully collapse:
|
Closes #5581 Creates pkg/vmcp/app with two public functions that encapsulate vMCP assembly behind a config-driven API: - app.BuildCore derives core.Config from vmcpconfig.Config + Options and calls core.New, returning the assembled VMCP for decoration. Handles all three discovery modes: static backends, Kubernetes discovered (reusing backendregistry.NewKubernetesBackendRegistry from #5541), and dynamic (groups manager). - app.BuildServerConfig derives *server.ServerConfig from the same config independently. Requires WithBackendRegistry for the Kubernetes discovered mode to prevent duplicate informer caches. - Functional Options carry the non-serialized runtime injectables: version, host/port, session TTL, telemetry provider, backend registry + watcher, REST config, embedded auth server RunConfig, status reporter, and elicitation requester. - Example embedder in app/internal/exampleembedder demonstrates the full BuildCore + decorate + BuildServerConfig + server.Serve flow. - Exports LateBoundElicitationRequester from pkg/vmcp/server so callers outside the package can wire composite-tool elicitation. Migrates pkg/vmcp/cli/serve.go to call app.BuildCore + app.BuildServerConfig, replacing the ~650-line hand-wired assembly with ~130 lines. Moves readHeaderForwardFromEnv from cli to app (the only consumer is now BuildCore). Removes cli functions that moved: discoverBackends, runDiscovery, vmcpNamespace, getStatusReportingInterval, readHeaderForwardFromEnv. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Remove dead outgoing auth registry and backend client construction from buildStaticBackendRegistry and buildDynamicBackendRegistry; those functions only need a discoverer, not a backend client (the BackendClient for aggregation is constructed once at the top of BuildCore) - Remove the unused _ vmcp.BackendClient parameter from runDiscovery - Correct the package doc telemetry comment: neither BuildCore nor BuildServerConfig builds a provider; callers who want telemetry must inject one via WithTelemetryProvider - Replace "Kubernetes dynamic mode" with "Kubernetes discovered mode" in options.go and the example embedder to match the rest of the codebase convention Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Addresses #5672 review comments: - MEDIUM build_core.go (3487247177, F1): make BuildCore return an error for the discovered outgoingAuth source without WithBackendRegistry, symmetric with BuildServerConfig; delete the internal Kubernetes registry build path and the now-dead WithRESTConfig option. Resolves F9/F14/F16 by removal. - MEDIUM exampleembedder/embedder.go (3487247182, F2): add importgraph_test.go anchoring the example in the CI build graph. - MEDIUM build_core.go (3487247185, F3): document that static/dynamic modes build registries independently and may snapshot state at different moments. - MEDIUM build_core_test.go (3487247187, F4): rename misleading InlineDiscovery test to WithInjectedRegistry_SkipsDiscovery; drop the duplicate. - MEDIUM serve_test.go deletions (F5): port TestRunDiscovery_ZeroBackends, TestRunDiscovery_KubernetesGroupNotFound, and TestVMCPNamespace to the app package. - MEDIUM build_server_config.go (3487247188, F6): deep-copy OutgoingAuth before InjectSubjectProviderNames so strategy mutations do not escape to the caller. - LOW build_core.go/build_server_config.go (3487247195, F7): add OutgoingAuthSource{Inline,Discovered} constants and use them. - LOW options.go (3487247197, F8): panic on nil registry in WithBackendRegistry. - LOW build_server_config.go (3487247201, F9): align discovered-mode docstring to MUST. - LOW options.go (3487247203, F10): skip nil Option in applyOptions. - LOW vmcp-library.md (F11): add pkg/vmcp/app row; note LateBoundElicitationRequester. - LOW elicitation_latebound.go (3487247205, F12): add Experimental annotation. - LOW build_core.go (3487247208, F13): log core Close error in cleanup. - LOW build_server_config.go (3487247210, F15): document context.Background() in cleanup. - LOW build_*_test.go (3487247212, F17): rename vacuous cleanup tests to DoesNotPanic. - LOW build_core_test.go (3487247214, F18): add health-monitor invalid-threshold test. - LOW build_server_config_test.go (3487247215, F19): add static-backend test. - LOW exampleembedder/embedder.go (3487247216, F20): consolidate cleanup guards into a single disarm flag. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The lifecycle E2E suite (E2E Test Lifecycle) failed on this branch because the migration of cli.Serve to the app.BuildCore/BuildServerConfig assembly dropped two behaviors that the legacy server.New path provided. 1. Discovered-mode backend registry was no longer seeded at startup. The old cli.Serve seeded the DynamicRegistry synchronously via discovery and then attached the K8s watcher; backendregistry.NewKubernetesBackendRegistry started empty and relied solely on the watcher's asynchronous reconcile, which lags the informer cache sync. The pod could pass its /readyz cache-sync gate and report status while the registry was still empty, leaving a window where it advertised zero backends. The "VirtualMCPServer Unauthenticated Backend Auth" E2E caught this (Status.DiscoveredBackends empty though BackendsDiscovered=True). Fix: add BackendWatcher.SyncRegistry, which lists the group's current backends via a direct (uncached) client and upserts them through the same workloads.Discoverer conversion the reconciler uses, and call it from NewKubernetesBackendRegistry before returning so the registry is seeded for all callers (CLI and embedders). A seeding failure is logged and tolerated; the watcher still populates the registry as backends appear. 2. Code mode decorator was no longer applied. server.New wrapped the core with codemode.NewDecorator when CodeModeConfig was set, but BuildCore + server.Serve never did, so execute_tool_script was never advertised (the "VirtualMCPServer Code Mode" E2E failed). Fix: apply codemode.NewDecorator in BuildCore from cfg.CodeMode, matching server.New (it sits below any Serve-layer optimizer). Public interfaces are unchanged: core.VMCP, core.New/Config, server.Serve/ ServerConfig/New, and backendregistry.NewKubernetesBackendRegistry's signature are all byte-identical to main, so downstream embedders are unaffected. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The lifecycle E2E flaked on "Redis-Backed Session Sharing ... reject the session on pod B after it is terminated on pod A" at the precondition that pod B serves the reconstructed session before termination. Cross-pod session reconstruction is eventually consistent — pod B must reconstruct the session from Redis and warm its per-identity capability view — so a single immediate ListTools on pod B flakes under parallel CI load. Identical code passed on a prior run and passes locally, confirming a timing flake rather than a regression. Wrap both cross-pod "pod B serves tools" preconditions (the lazy-eviction test and its reconstruction sibling) in Eventually so they retry until pod B serves the session. A genuine failure to serve still fails when the poll times out, so this hardens the test without masking real regressions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addresses the design feedback on PR #5672: correct use of BuildCore + BuildServerConfig was largely the caller's responsibility (pass the same opts to both, build telemetry/registry once, create + Bind the elicitation requester, choreograph cleanups). Introduce app.Builder to own that single "construct once / wire once" pass: - NewBuilder(ctx, cfg, opts...).Decorate(fn).Finish() builds the shared collaborators exactly once (telemetry from cfg.Telemetry; the K8s backend registry + watcher for the discovered source), wires composite-tool elicitation internally (LateBoundElicitationRequester created and bound so the construction-order chicken-and-egg never leaks), applies the caller's decorator, serves, and returns the server, the assembled core, and a single cleanup func. It also injects token-exchange subject-provider names once so the core's backend client and the transport session factory can't disagree. - BuildCore / BuildServerConfig / LateBoundElicitationRequester stay public as lower-level primitives for advanced embedders that must compose the derived pieces themselves (inspect/modify the ServerConfig, reuse the core, run a custom serve loop) or replace a collaborator via the interface-typed options. - cli.Serve and the example embedder now assemble via the Builder, dogfooding it as the single assembly entry point. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
main #5522 rearchitected vMCP rate limiting from HTTP middleware to a core.VMCP decorator applied at the CallTool seam (below the optimizer), keyed by the resolved backend tool name. It removed ratelimitfactory.NewMiddleware and server.ServerConfig.RateLimitMiddleware in favor of NewLimiter + vmcpratelimit.NewDecorator wired through server.New. Rebasing the assembly work onto that change requires the app package to build the limiter and apply it as a core decorator rather than as transport middleware: - BuildCore builds the limiter via ratelimitfactory.NewLimiter and applies vmcpratelimit.NewDecorator below the code-mode decorator (matching server.New's order: core.New → rate-limit → code-mode), and releases the limiter's Redis connection in its cleanup. Extracted buildDomainCore and applyCoreDecorators helpers to keep BuildCore within the complexity budget. - BuildServerConfig no longer builds rate-limit middleware or sets the removed ServerConfig.RateLimitMiddleware field. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
e4708e9 to
7c97f44
Compare
|
Thanks for turning this around so fast — the To be clear, I'm not asking for any changes here — just thinking out loud, since I found the public-surface question interesting. One thread I keep pulling on: keeping What I couldn't talk myself out of is that each composition case you cited seems expressible as a Builder method or return value rather than an exported primitive:
Concretely, what I'm imagining is a public surface of just |
jerm-dro
left a comment
There was a problem hiding this comment.
Approving. Remaining items are all non-blocking follow-ups: the public-surface question in the comment above, the static/dynamic registry double-build, and unit coverage for the telemetry-from-config and don't-mutate-caller-config paths.
JAORMX
left a comment
There was a problem hiding this comment.
Multi-agent panel review 🤖
Reviewed by a 5-axis panel (correctness · security · architecture/API · tests · vMCP anti-patterns). Non-blocking COMMENT.
Consensus: approve-with-nits. No critical/high bug or vulnerability on the production CLI/operator path (still validated and unchanged). This is a strong refactor — it removes two documented mirrors and shrinks cli/serve.go from ~700 to ~360 lines. Findings below are prioritized; the tests axis alone leaned toward request-changes.
Highest priority
1. Registry "construct once" divergence + a reachable ordering asymmetry (correctness + vMCP, medium)
The Builder builds a single shared registry only for the discovered (K8s) source. For static/dynamic modes, BuildCore and BuildServerConfig each resolve and run discovery independently, producing two snapshots (core aggregates against one, the session manager connects against another) — contradicting the Builder's "construct once" docstring. Compounding this, the two resolvers check branches in different order:
resolveBackendRegistryForCore(build_core.go:242-257) testslen(cfg.Backends) > 0beforediscovered.resolveBackendRegistryForServerConfig(build_server_config.go:831-857) testsdiscoveredfirst.
validator.go has no cross-field rule rejecting non-empty Backends together with source: discovered, so an advanced embedder passing that config gets success from one primitive and errDiscoveredModeRequiresRegistry from the other — despite both doc comments claiming identical behavior.
Suggested fix: build the registry once for all modes and inject via WithBackendRegistry; or make the branch order identical and add the validator cross-field rule.
2. New assembly API skips validation → default-open auth (security, medium)
The public assembly path never calls vmcpconfig.NewValidator().Validate. resolveIncomingAuth silently defaults a nil IncomingAuth to anonymous, and deriveAuthzConfig returns nil (allow-all) — a combination the validator explicitly rejects (validator.go:143). An embedder building a config programmatically and calling Finish() gets a fully unauthenticated, allow-all vMCP fronting every backend, instead of a loud startup error. The shipped internal/exampleembedder passes config straight through, making the unvalidated path the path of least resistance. (Production CLI/operator still validates, so this is a hardening gap in the new public contract, not a shipped vuln.)
Suggested fix: validate inside BuildCore/BuildServerConfig/Finish, or fail loudly on nil IncomingAuth (per go-style.md "fail loudly"); validate in the example embedder.
3. SyncRegistry — the PR's own "extra scrutiny" seam — has zero unit/integration coverage (tests, high)
k8s/manager.go:289 (SyncRegistry: direct-client construction, per-workload conversion, skip-on-error, skip-nil, upsert tolerance, seeded count) is exercised only by the added operator E2E. A regression in the skip/count/upsert logic would silently reintroduce the transiently-empty-registry window this PR fixes, and no unit test would catch it. envtest infra already exists (backend_reconciler_integration_test.go). Related test gaps: TestNewKubernetesBackendRegistry_StartsEmpty now passes only incidentally and its name/comment describe the pre-PR mechanism; decorator ordering (applyCoreDecorators), deriveHealthMonitorConfig happy path, and the embedded-auth-server branch are untested; TestBuildCore_WithInjectedRegistry_SkipsDiscovery / ..._SharedCollaboratorBuiltOnce over-claim with AnyTimes().
Worth addressing
4. docs/arch contradicts the new "primary entry point" (architecture, medium) — docs/arch/vmcp-library.md now calls Builder the primary entry point, but the "Library Embedding Pattern" prose still hand-wires server.New(...) — the exact wiring this PR encapsulates. Add a Builder-based embedding example; note server.New as the low-level alternative.
5. Speculative public surface (vMCP medium / architecture low) — Three exported entry points (Builder + BuildCore + BuildServerConfig), but only Builder has production consumers; the primitives are consumed only by Builder (same package) and tests. Consider unexporting them until a real embedder needs the split, or explicitly marking the surface speculative in the stability table.
Lower / nits
- Doubled side-effecting discovery (
EnsureDefaultGroupExists+Discover) on the local CLI path — same root as #1. Builderstorescontext.Contextin a struct field (documented tradeoff).WithBackendRegistrypanics on nil while every otherWith*option is a total function — asymmetric.- Exporting
LateBoundElicitationRequester/Bindmakes mcp-go's two-phase-lifecycle race window permanent public surface (correctly marked Experimental); redundantBind/bind+ dual constructors. - Helpers thread the full
*vmcpconfig.Configwith nested nil-guards instead of the specific sub-struct (mirrors pre-existingderive.go). header_forward_env.golost the doc sentence explaining secret-backed header values arrive viasecretKeyRefand never enter the operator's view (logic unchanged).
Positives
Removes the documented mirror in backendregistry (closing the transiently-empty-registry window); SyncRegistry reuses the reconciler's conversion instead of duplicating it (no new mirror); god-function removed from the CLI layer; SPDX/naming/%w-wrapping conform; the redis-session E2E Eventually change is a sound de-flake.
Generated with Claude Code — multi-agent panel review.
Addresses #5672 review comments: - MEDIUM build_core.go/build_server_config.go/builder.go (#1): the Builder now constructs the backend registry ONCE for every mode (static, dynamic, and discovered) and injects it into both derivations, so core aggregation and the session manager operate on one snapshot — satisfying the Builder's "construct once" contract and removing the doubled discovery on the CLI path (#6). The two standalone-primitive resolvers now check branches in an identical order (injected → discovered → static → dynamic) so they can no longer disagree. - MEDIUM config/validator.go (#1): reject the contradictory combination of a non-empty Backends list with outgoingAuth.source: discovered, which previously produced success from one primitive and an error from the other. - MEDIUM build_core.go/build_server_config.go (#2): validate the config at the top of BuildCore and BuildServerConfig (inherited by Finish), so a programmatic embedder gets a loud error instead of a nil IncomingAuth silently defaulting to anonymous + allow-all. The example embedder is covered transitively. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addresses #5672 review comment #3 (tests): - Extract SyncRegistry's per-workload conversion+upsert loop into seedRegistryFromDiscoverer and unit-test it (k8s/seed_registry_test.go) with a mock discoverer: list-error-is-fatal, empty group, happy-path count, skip-on-conversion-error, and skip-nil-backend. This is the PR's "extra scrutiny" seam — previously exercised only by the operator E2E. - Rename TestNewKubernetesBackendRegistry_StartsEmpty → ..._ReturnsRegistryAndWatcher and fix its comment: the constructor now seeds synchronously, so the empty result reflects an unreachable fake API, not a by-design empty start. - Add TestValidator_RejectsBackendsWithDiscoveredSource for the new cross-field rule. - Add TestDeriveHealthMonitorConfig covering the disabled and happy paths (default and explicit timeout + circuit breaker). - Rename TestBuildCore_SharedCollaboratorBuiltOnce → ..._InjectedRegistryUsedByBothBuilds so it no longer over-claims single construction (which is the Builder's guarantee, covered there). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addresses #5672 review comments (approve-with-nits): - LOW docs/arch/vmcp-library.md (#4): add an app.Builder embedding example as the recommended path and reposition server.New as the low-level composition root, so the doc no longer hand-wires the exact assembly this PR encapsulates. - INFO docs/arch/vmcp-library.md (#5): mark BuildCore/BuildServerConfig/Option as speculative lower-level primitives in the stability table; the Builder is the primary supported entry point. - LOW pkg/vmcp/server/elicitation_latebound.go (#9): drop the redundant unexported newLateBoundElicitationRequester/bind aliases now that the app package consumes the exported NewLateBoundElicitationRequester/Bind directly. - LOW pkg/vmcp/app/header_forward_env.go (#11): restore the doc sentence noting secret-backed header values are delivered via valueFrom.secretKeyRef and never enter the operator's view (logic unchanged). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Thanks for the panel review — really thorough. Addressed all the recommended findings across three focused commits; details below (finding numbers match the review). Fixed#1 — registry "construct once" divergence + ordering asymmetry — #2 — assembly API skipped validation → default-open auth — #3 — #4 — docs/arch hand-wired #5 (partial) — speculative surface marked in the stability table — #6 — doubled side-effecting discovery on the local CLI path — #9 — redundant #11 — Kept as-is (with rationale)#5 — unexporting #7 — #8 — #10 — helpers thread the full All unit tests + lint (0 issues) pass. |
Summary
Closes #5581
vMCP's assembly — turning a
vmcpconfig.Configinto a running server — previously livedentirely in
pkg/vmcp/cli/serve.go(~700 lines). Any embedder (e.g., the enterprisegateway) that wanted a vMCP server had to replicate that wiring by hand, and would break
whenever OSS changed its assembly sequence or internal rules. This PR encapsulates the
assembly behind a public API in the new
pkg/vmcp/apppackage so that callers hand overconfig and get back a
core.VMCPto decorate and a*server.ServerConfigto serve —without reimplementing vMCP internals.
app.BuildCorederivescore.Configfromvmcpconfig.Config(+ options) andreturns the assembled
core.VMCPfor decoration.app.BuildServerConfigindependently derives*server.ServerConfigfrom the sameconfig — the transport config is not a byproduct of core construction.
Optionvalues carry non-serializable runtime injectables: version, host/port,session TTL, telemetry provider, backend registry + watcher, REST config, embedded auth
server run config, status reporter, and elicitation requester.
injected via options so one instance is shared between both
Build*calls — no duplicateOTEL pipelines or K8s informer caches.
pkg/vmcp/cli/serve.gois migrated to callapp.BuildCore+app.BuildServerConfig,reducing it from ~700 lines to ~360 and removing all domain assembly logic from the CLI.
LateBoundElicitationRequesterand its constructor are exported so embedders can wirecomposite-tool elicitation without access to package-internal types.
The migration is behavior-preserving for the production CLI/operator path. Two behaviors
that the legacy
server.New+ hand-wiredcli.Serveprovided had to be carried into theassembly, since the operator launches the same
vmcp servebinary in its pods:path builds its registry via
backendregistry.NewKubernetesBackendRegistry, which startedempty and relied solely on the watcher's controller-runtime reconciler to populate it. That
reconcile runs asynchronously and lags the informer cache-sync that the pod's
/readyzprobe gates on, so the server could pass readiness and begin serving (and reporting
status.discoveredBackends) while the registry was still empty — a startup window where itadvertised zero backends. The legacy
cli.Serveavoided this by seeding the registry from asynchronous discovery before serving. To restore that, a new
k8s.BackendWatcher.SyncRegistrylists the group's current backends through the same
workloads.Discovererconversion thereconciler uses and upserts them;
NewKubernetesBackendRegistrycalls it before returning, sothe registry is populated before the server serves — for both
cli.Serveand embedders. Thewatcher still keeps it current; a seed failure is logged and tolerated. This was caught by the
operator lifecycle E2E (
VirtualMCPServer Unauthenticated Backend Auth).BuildCore.server.Newwrapped the core withcodemode.NewDecoratorwhen configured;BuildCorenow does the same fromvmcpConfig.CodeMode, so theexecute_tool_scriptvirtual tool is advertised on theassembly path (it sits below any Serve-layer optimizer). Caught by the
VirtualMCPServer Code ModeE2E.Public interfaces are unchanged:
core.VMCP,core.New/Config,server.Serve/ServerConfig/New,and
backendregistry.NewKubernetesBackendRegistry's signature are byte-identical tomain, sodownstream embedders are unaffected (the registry seed is a backward-compatible improvement —
callers now receive a populated registry instead of a transiently-empty one).
Large PR Justification
This PR is a single, atomic logical change — encapsulating vMCP assembly behind the
new
pkg/vmcp/appAPI (issue #5581) — that cannot be cleanly split:apppackage(
BuildCore/BuildServerConfig/Option) and the migration ofpkg/vmcp/cli/serve.goonto it are two halves of one change. Migrating
cli.Serveis an explicit acceptancecriterion of Encapsulate vMCP assembly behind a config-driven public API #5581 ("prove the API is sufficient for the real caller"). Landing the API
without the migration would add dead code; landing the migration without the API is
impossible. Splitting them would leave a broken or dead intermediate state.
large reduction in
cli/serve.go(~700 → ~360 lines; 499 changed, mostly deletions).Roughly 600 added lines are tests (6 files), the runnable example embedder, and docs.
(discovered mode, code mode, auth discovery, aggregation) pass on a Kind cluster running
the branch's image.
Splitting into <1000-line PRs would increase review overhead and risk without improving
reviewability, since the pieces only make sense together.
Type of change
Test plan
task test)task lint-fix)task thv-operator-e2e-test-runon a local Kind cluster)Unit tests in
pkg/vmcp/app/build_core_test.goandpkg/vmcp/app/build_server_config_test.gocover both static-backends and dynamic-discovery modes, option application, and nil-config
rejection. The
pkg/vmcp/app/internal/exampleembedderpackage demonstrates the fullBuildCore → decorate → BuildServerConfig → server.Serveflow and validates that thepublic API types plug directly into
server.Serve.The operator lifecycle E2E suite was run on a local Kind cluster against a vmcp image built
from this branch, confirming the discovered-mode seeding and code-mode fixes: the
VirtualMCPServer Unauthenticated Backend Auth,Code Mode,Auth Discovery, and discovered-modeaggregation/circuit-breaker specs pass. (The optimizer specs require the TEI embeddings image,
which has no arm64 manifest, so they were exercised in CI rather than locally.)
Changes
pkg/vmcp/app/options.goOptionfunctional type and allWith*constructorspkg/vmcp/app/build_core.goBuildCore, backend-registry resolution, and code-mode decorator wiringpkg/vmcp/app/build_core_test.goBuildCorepkg/vmcp/app/build_server_config.goBuildServerConfigand transport derivation helperspkg/vmcp/app/build_server_config_test.goBuildServerConfigpkg/vmcp/app/discovery_internal_test.gorunDiscovery/vmcpNamespacetests fromclipkg/vmcp/app/importgraph_test.gopkg/vmcp/app/header_forward_env.gopkg/vmcp/cli/(no logic change)pkg/vmcp/app/header_forward_env_test.gopkg/vmcp/cli/(no logic change)pkg/vmcp/app/internal/exampleembedder/embedder.gopkg/vmcp/backendregistry/registry.goSyncRegistrybefore returning (discovered mode)pkg/vmcp/k8s/manager.goBackendWatcher.SyncRegistry: synchronous initial backend seedpkg/vmcp/config/config.goOutgoingAuthSourceInline/OutgoingAuthSourceDiscoveredconstantspkg/vmcp/config/validator.gopkg/vmcp/cli/serve.goapp.BuildCore+app.BuildServerConfigpkg/vmcp/cli/serve_test.goserve.gopkg/vmcp/cli/header_forward_env.gopkg/vmcp/app/)pkg/vmcp/server/elicitation_latebound.goLateBoundElicitationRequester,NewLateBoundElicitationRequester, andBinddocs/arch/vmcp-library.mdpkg/vmcp/appstability-table row; noted the elicitation exportDoes this introduce a user-facing change?
No. The CLI behavior of
thv vmcp serveis unchanged. The newpkg/vmcp/apppackage isadditive public API intended for embedders; no existing user-facing behavior is modified.
Special notes for reviewers
Build*functions are intentionally independent derivations from the samevmcpconfig.Config, mirroring the existingderiveCoreConfig/deriveServerConfigsplit.BuildCoredoes not return aServerConfig— the transport config is not a byproduct ofcore construction.
WithBackendRegistry;BuildServerConfigreturns an error if it is absent, to preventstarting two K8s informer caches.
LateBoundElicitationRequesterexport (pkg/vmcp/server) adds a small amount ofsurface to an existing unexported type. The package-internal
newLateBoundElicitationRequesterand
bindaliases are retained soserver.Newcall sites need no change.BackendWatcher.SyncRegistryreads the current backends with a direct, uncached client(the manager cache isn't started yet at seed time) and reuses the reconciler's
workloads.Discoverer.GetWorkloadAsVMCPBackendconversion, so the seeded set is identical towhat the watcher would produce. It is the seam that makes the discovered-mode assembly path
behavior-preserving versus the legacy
cli.Serve; worth extra scrutiny.Generated with Claude Code