Make resource names (like service + namespace) configurable#13
Merged
EItanya merged 9 commits intoJul 1, 2026
Merged
Conversation
|
Have you run the manifest generation? I ran |
d286c7d to
f16d3d7
Compare
0589059 to
13904b1
Compare
03f370f to
133474d
Compare
The dns-controller (`atenet dns`) and router (`atenet router`) hardcoded
the substrate namespace ("ate-system") and the component Service names
("atenet-router", "dns") from the canonical install manifests under
`manifests/ate-install/`. Deployments that deviate from that layout —
running in a different namespace, renaming the Services, or composing
substrate into a larger install that rewrites resource names — silently
break: the dns-controller can't find atenet-router, the router can't
find itself for /statusz, and the cluster's actor DNS never gets
patched.
Expose the relevant names as flags on the cobra commands and as fields
on `dns.Controller` / `router.RouterConfig`. Defaults match the values
in `manifests/ate-install/` so existing deployments are unaffected:
atenet dns:
--system-namespace (default "ate-system")
--router-service-name (default "atenet-router")
--dns-service-name (default "dns")
atenet router:
--router-service-name (default "atenet-router")
The atelet pod informer hardcoded `ateletNamespace = "ate-system"`, so ate-api-server could only locate atelet pods in that namespace. Deployments that run atelet elsewhere — an alternative install layout or a larger composition that relocates substrate components — leave the informer's cache empty and ResumeActor fails with `found 0 atelet pods on node "<node>", expected 1`. Promote the constant to an exported default and accept the namespace as a parameter to `AteletInformer`. Add an `--atelet-namespace` flag on the ateapi binary (default DefaultAteletNamespace) that callers override when needed.
…router
Wire the new flags added in the previous commit through the Helm
templates so the canonical-render defaults are overridden when the chart
is used as a subchart (e.g. the kagent-enterprise composition where
substrate.fullname prefixes all component Service names).
For atenet-dns the dns-controller now receives:
--system-namespace={{ .Release.Namespace }}
--router-service-name={{ include "substrate.fullname" (list "atenet-router" .) }}
--dns-service-name={{ include "substrate.fullname" (list "dns" .) }}
For atenet-router the /statusz lookup gets:
--router-service-name={{ include "substrate.fullname" (list "atenet-router" .) }}
When the release name equals the chart name ("substrate") these expand
to the canonical bare names, preserving existing behavior for top-level
installs.
Wire the new ateapi flag from the previous commit through the chart so the atelet pod informer watches the chart's release namespace by default. Canonical render (release name "substrate" in namespace "ate-system") still produces "--atelet-namespace=ate-system", so behavior is unchanged for top-level installs.
Re-runs `make helm-template` so the checked-in render matches the chart. Brings in rustfs.yaml, the s3-backed atelet storage envvars, the trimmed valkey manifest, and drops the no-longer-templated sandboxconfig-gvisor and sandboxconfig-validation manifests. `make verify-helm-template` now passes.
…_NAMESPACE Addresses review comments on agent-substrate#350: - New internal/installdefaults package owns SystemNamespace, RouterServiceName, DNSServiceName. dns, router, and controlapi/informer drop their duplicate Default* constants and reference installdefaults via the matching flag declarations and tests. - Drop the --atelet-namespace flag on ateapi. The namespace is now resolved at startup from the POD_NAMESPACE env var (Kubernetes' downward API), falling back to installdefaults.SystemNamespace for non-k8s invocations (tests, local dev). atelet and ateapi share a namespace in every supported deployment topology, so a separate knob was dead weight.
Same rationale as the prior atelet-namespace change: atenet, atenet-router, and substrate's CoreDNS live in a single namespace in every supported deployment topology, so a separate --system-namespace flag was dead weight. Resolve from the POD_NAMESPACE env var (Kubernetes' downward API) with installdefaults.SystemNamespace as the fallback for non-k8s runs. --router-service-name and --dns-service-name stay as flags because a subchart deployment renames those Services with a release prefix, and the binary can't derive that from pod metadata.
…ardcodes
Three follow-ups from the self-review:
- Extract the POD_NAMESPACE-with-SystemNamespace-fallback pattern into
installdefaults.NamespaceFromPodEnv() so ateapi and atenet share a
single implementation (also makes a third call site one line instead
of four if anyone needs one).
- Add installdefaults.PodNamespaceEnv ("POD_NAMESPACE") and APIServiceName
("api") so the constant set covers every name in the canonical install
layout that's referenced by Go code.
- Route internal/ateclient/builder.go's previously-hardcoded "ate-system"
and "api" lookups through installdefaults, so kubectl-ate's port-forward
no longer bypasses the new single source of truth.
ate-controller (ServiceAccount), ate-api-server-deployment (Deployment),
and "api.ate-system.svc" (JWT audience) are still hardcoded but their
configurability needs a real flag/discovery story and is out of scope
for this PR.
The JWT install overlay (manifests/ate-install/jwt) references
ate-client.yaml as a top-level resource, but the chart previously
guarded the SA behind {{ if eq .Values.auth.mode "jwt" }} so
render-manifests.sh (mtls) never emitted it. That divergence broke
verify-helm-template after merging the upstream JWT fix that added a
hand-maintained manifests/ate-install/ate-client.yaml.
The SA is harmless in mtls installs (unused), so render it
unconditionally so the chart is the single source of truth.
ce2405a to
3e2403b
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Makes substrate's component Service names operator-overrideable via flags and wires Kubernetes' downward API through the Helm chart so atenet, ateapi, and atelet pick up their pod's namespace automatically. The canonical chart render and
manifests/ate-install/remain byte-identical at the default release name and namespace.Motivated by deploying substrate as a Helm subchart of kagent (OSS), where the parent release prefixes substrate resource names with its release name and runs everything in the parent's namespace. With hardcoded values in the substrate code, the components silently fail to find each other after install:
dns-controllercan't locateatenet-routeror substrate's CoreDNS Service (forbidden / not found inate-system).ate-controllerdialsdns:///api.ate-system.svc:443and gets NXDOMAIN — the chart never passed--ateapi-conn-speceither.ate-api-server's atelet pod informer watches a hardcodedate-systemnamespace and finds no atelet pods.What changed
New package
internal/installdefaultsSingle source of truth for the canonical install names:
SystemNamespace = "ate-system"APIServiceName = "api"RouterServiceName = "atenet-router"DNSServiceName = "dns"PodNamespaceEnv = "POD_NAMESPACE"NamespaceFromPodEnv()helper for env-var-with-fallback resolution.dns/,router/, andcontrolapi/no longer duplicate the same string literals.Binary changes
atenet dns:--router-service-name(defaultinstalldefaults.RouterServiceName)--dns-service-name(defaultinstalldefaults.DNSServiceName)POD_NAMESPACEenv (downward API), no flag.atenet router:--router-service-name(defaultinstalldefaults.RouterServiceName) — used by/statusz.ateapi:AteletInformertakes the namespace as a parameter;main.goresolves it fromPOD_NAMESPACE.internal/ateclient/builder.go:kubectl-ate's port-forward helper now referencesinstalldefaults.SystemNamespace/APIServiceNameinstead of literal"ate-system"/"api".Tests in
cmd/atenet/internal/dnsandcmd/ateapi/internal/controlapiupdated to set the new fields explicitly.Chart templates
charts/substrate/templates/atenet-dns.yaml— passes--router-service-name,--dns-service-nameresolved viasubstrate.fullname; injectsPOD_NAMESPACEvia downward API.charts/substrate/templates/atenet-router.yaml— passes--router-service-name.charts/substrate/templates/ate-controller.yaml— also adds--ateapi-conn-spec(the binary already accepts it; the chart had only been passing--ateapi-server-name, which is just SNI, not the dial target).charts/substrate/templates/ate-api-server.yaml— injectsPOD_NAMESPACEvia downward API.At the canonical render (release name
substrate, namespaceate-system) every flag resolves to the same value the code previously hardcoded, so existing top-level installs are byte-identical before and after.Manifests catch-up
The chart regen commit also picks up unrelated chart drift that had accumulated but never been reconciled (s3-backed atelet storage, rustfs, trimmed valkey, etc.). A follow-up commit restores
sandboxconfig-gvisor.yamlandsandboxconfig-validation.yaml— those aren't chart-rendered but are hand-maintained and depended on by the SandboxConfig ValidatingAdmissionPolicy test and the e2e installer.Related
A separate issue tracks the next templating gap surfaced during testing —
dns-controlleronly patches the GKE-stylekube-dnsConfigMap and silently no-ops on kubeadm/kind clusters that usekube-system/coredns: agent-substrate#348. Out of scope for this PR.Validation
End-to-end on a local kind cluster consuming substrate as a Helm subchart of a parent release:
atenet-routerand substrate'sdnsServices in the release namespace (was:services "atenet-router" is forbidden ... in namespace "ate-system")api.ate-system.svc:443)found 0 atelet pods on node)CreateActor→ResumeActor→AssignWorkersucceedsSandboxAssets, downloads runsc, gVisor sandbox starts, actor reachesSTATUS_RUNNING(Past that point, gVisor's
runsc checkpointfails on Apple Silicon kind due to a separate linuxkit-kernel limitation; unrelated to this PR.)Compatibility
Backward compatible. Every new flag has a default matching the previously hardcoded value. The chart's existing one-shot install path (
helm install substrate ... -n ate-system) renders identically before and after.Test plan
go build ./...go vet ./...go test ./cmd/atenet/internal/... ./cmd/ateapi/internal/... ./internal/installdefaults/...(all pass)helm lint charts/substrateclean