Skip to content

Make resource names (like service + namespace) configurable#13

Merged
EItanya merged 9 commits into
kagent-dev:mainfrom
jjamroga:jjamroga/atenet-configurable-resource-names-and-chart
Jul 1, 2026
Merged

Make resource names (like service + namespace) configurable#13
EItanya merged 9 commits into
kagent-dev:mainfrom
jjamroga:jjamroga/atenet-configurable-resource-names-and-chart

Conversation

@jjamroga

@jjamroga jjamroga commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

Summary

Makes substrate's component Service names operator-overrideable via flags and wires Kubernetes' downward API through the Helm chart so atenet, ateapi, and atelet pick up their pod's namespace automatically. The canonical chart render and manifests/ate-install/ remain byte-identical at the default release name and namespace.

Motivated by deploying substrate as a Helm subchart of kagent (OSS), where the parent release prefixes substrate resource names with its release name and runs everything in the parent's namespace. With hardcoded values in the substrate code, the components silently fail to find each other after install:

  • dns-controller can't locate atenet-router or substrate's CoreDNS Service (forbidden / not found in ate-system).
  • ate-controller dials dns:///api.ate-system.svc:443 and gets NXDOMAIN — the chart never passed --ateapi-conn-spec either.
  • ate-api-server's atelet pod informer watches a hardcoded ate-system namespace and finds no atelet pods.

What changed

New package internal/installdefaults

Single source of truth for the canonical install names:

  • SystemNamespace = "ate-system"
  • APIServiceName = "api"
  • RouterServiceName = "atenet-router"
  • DNSServiceName = "dns"
  • PodNamespaceEnv = "POD_NAMESPACE"
  • NamespaceFromPodEnv() helper for env-var-with-fallback resolution.

dns/, router/, and controlapi/ no longer duplicate the same string literals.

Binary changes

  • atenet dns:
    • --router-service-name (default installdefaults.RouterServiceName)
    • --dns-service-name (default installdefaults.DNSServiceName)
    • Namespace derived from POD_NAMESPACE env (downward API), no flag.
  • atenet router:
    • --router-service-name (default installdefaults.RouterServiceName) — used by /statusz.
  • ateapi:
    • AteletInformer takes the namespace as a parameter; main.go resolves it from POD_NAMESPACE.
  • internal/ateclient/builder.go:
    • kubectl-ate's port-forward helper now references installdefaults.SystemNamespace / APIServiceName instead of literal "ate-system" / "api".

Tests in cmd/atenet/internal/dns and cmd/ateapi/internal/controlapi updated to set the new fields explicitly.

Chart templates

  • charts/substrate/templates/atenet-dns.yaml — passes --router-service-name, --dns-service-name resolved via substrate.fullname; injects POD_NAMESPACE via downward API.
  • charts/substrate/templates/atenet-router.yaml — passes --router-service-name.
  • charts/substrate/templates/ate-controller.yaml — also adds --ateapi-conn-spec (the binary already accepts it; the chart had only been passing --ateapi-server-name, which is just SNI, not the dial target).
  • charts/substrate/templates/ate-api-server.yaml — injects POD_NAMESPACE via downward API.

At the canonical render (release name substrate, namespace ate-system) every flag resolves to the same value the code previously hardcoded, so existing top-level installs are byte-identical before and after.

Manifests catch-up

The chart regen commit also picks up unrelated chart drift that had accumulated but never been reconciled (s3-backed atelet storage, rustfs, trimmed valkey, etc.). A follow-up commit restores sandboxconfig-gvisor.yaml and sandboxconfig-validation.yaml — those aren't chart-rendered but are hand-maintained and depended on by the SandboxConfig ValidatingAdmissionPolicy test and the e2e installer.

Related

A separate issue tracks the next templating gap surfaced during testing — dns-controller only patches the GKE-style kube-dns ConfigMap and silently no-ops on kubeadm/kind clusters that use kube-system/coredns: agent-substrate#348. Out of scope for this PR.

Validation

End-to-end on a local kind cluster consuming substrate as a Helm subchart of a parent release:

  • ✅ dns-controller can locate atenet-router and substrate's dns Services in the release namespace (was: services "atenet-router" is forbidden ... in namespace "ate-system")
  • ✅ ate-controller dials ate-api-server (was: NXDOMAIN on api.ate-system.svc:443)
  • ✅ ate-api-server finds atelet in the release namespace (was: found 0 atelet pods on node)
  • CreateActorResumeActorAssignWorker succeeds
  • ✅ atelet receives SandboxAssets, downloads runsc, gVisor sandbox starts, actor reaches STATUS_RUNNING

(Past that point, gVisor's runsc checkpoint fails on Apple Silicon kind due to a separate linuxkit-kernel limitation; unrelated to this PR.)

Compatibility

Backward compatible. Every new flag has a default matching the previously hardcoded value. The chart's existing one-shot install path (helm install substrate ... -n ate-system) renders identically before and after.

Test plan

  • go build ./...
  • go vet ./...
  • go test ./cmd/atenet/internal/... ./cmd/ateapi/internal/... ./internal/installdefaults/... (all pass)
  • helm lint charts/substrate clean
  • End-to-end demo (SandboxAgent on substrate) reconciles past the originally-failing dial steps when substrate is installed as a subchart

@iplay88keys

Copy link
Copy Markdown

Have you run the manifest generation? I ran make verify-helm-template and it looks like the manifests/ate-install folder is out-of-date.

@EItanya EItanya force-pushed the main branch 2 times, most recently from d286c7d to f16d3d7 Compare June 30, 2026 13:44
@jjamroga jjamroga force-pushed the jjamroga/atenet-configurable-resource-names-and-chart branch from 0589059 to 13904b1 Compare June 30, 2026 14:02
@jjamroga jjamroga force-pushed the jjamroga/atenet-configurable-resource-names-and-chart branch 2 times, most recently from 03f370f to 133474d Compare July 1, 2026 15:40
jjamroga added 9 commits July 1, 2026 12:22
The dns-controller (`atenet dns`) and router (`atenet router`) hardcoded
the substrate namespace ("ate-system") and the component Service names
("atenet-router", "dns") from the canonical install manifests under
`manifests/ate-install/`. Deployments that deviate from that layout —
running in a different namespace, renaming the Services, or composing
substrate into a larger install that rewrites resource names — silently
break: the dns-controller can't find atenet-router, the router can't
find itself for /statusz, and the cluster's actor DNS never gets
patched.

Expose the relevant names as flags on the cobra commands and as fields
on `dns.Controller` / `router.RouterConfig`. Defaults match the values
in `manifests/ate-install/` so existing deployments are unaffected:

  atenet dns:
    --system-namespace       (default "ate-system")
    --router-service-name    (default "atenet-router")
    --dns-service-name       (default "dns")

  atenet router:
    --router-service-name    (default "atenet-router")
The atelet pod informer hardcoded `ateletNamespace = "ate-system"`, so
ate-api-server could only locate atelet pods in that namespace.
Deployments that run atelet elsewhere — an alternative install layout
or a larger composition that relocates substrate components — leave the
informer's cache empty and ResumeActor fails with
`found 0 atelet pods on node "<node>", expected 1`.

Promote the constant to an exported default and accept the namespace as
a parameter to `AteletInformer`. Add an `--atelet-namespace` flag on
the ateapi binary (default DefaultAteletNamespace) that callers
override when needed.
…router

Wire the new flags added in the previous commit through the Helm
templates so the canonical-render defaults are overridden when the chart
is used as a subchart (e.g. the kagent-enterprise composition where
substrate.fullname prefixes all component Service names).

For atenet-dns the dns-controller now receives:
  --system-namespace={{ .Release.Namespace }}
  --router-service-name={{ include "substrate.fullname" (list "atenet-router" .) }}
  --dns-service-name={{ include "substrate.fullname" (list "dns" .) }}

For atenet-router the /statusz lookup gets:
  --router-service-name={{ include "substrate.fullname" (list "atenet-router" .) }}

When the release name equals the chart name ("substrate") these expand
to the canonical bare names, preserving existing behavior for top-level
installs.
Wire the new ateapi flag from the previous commit through the chart so
the atelet pod informer watches the chart's release namespace by default.
Canonical render (release name "substrate" in namespace "ate-system")
still produces "--atelet-namespace=ate-system", so behavior is unchanged
for top-level installs.
Re-runs `make helm-template` so the checked-in render matches the
chart. Brings in rustfs.yaml, the s3-backed atelet storage envvars,
the trimmed valkey manifest, and drops the no-longer-templated
sandboxconfig-gvisor and sandboxconfig-validation manifests.
`make verify-helm-template` now passes.
…_NAMESPACE

Addresses review comments on agent-substrate#350:

- New internal/installdefaults package owns SystemNamespace,
  RouterServiceName, DNSServiceName. dns, router, and controlapi/informer
  drop their duplicate Default* constants and reference installdefaults
  via the matching flag declarations and tests.

- Drop the --atelet-namespace flag on ateapi. The namespace is now
  resolved at startup from the POD_NAMESPACE env var (Kubernetes' downward
  API), falling back to installdefaults.SystemNamespace for non-k8s
  invocations (tests, local dev). atelet and ateapi share a namespace in
  every supported deployment topology, so a separate knob was dead weight.
Same rationale as the prior atelet-namespace change: atenet, atenet-router,
and substrate's CoreDNS live in a single namespace in every supported
deployment topology, so a separate --system-namespace flag was dead
weight. Resolve from the POD_NAMESPACE env var (Kubernetes' downward
API) with installdefaults.SystemNamespace as the fallback for non-k8s
runs.

--router-service-name and --dns-service-name stay as flags because a
subchart deployment renames those Services with a release prefix, and
the binary can't derive that from pod metadata.
…ardcodes

Three follow-ups from the self-review:

- Extract the POD_NAMESPACE-with-SystemNamespace-fallback pattern into
  installdefaults.NamespaceFromPodEnv() so ateapi and atenet share a
  single implementation (also makes a third call site one line instead
  of four if anyone needs one).
- Add installdefaults.PodNamespaceEnv ("POD_NAMESPACE") and APIServiceName
  ("api") so the constant set covers every name in the canonical install
  layout that's referenced by Go code.
- Route internal/ateclient/builder.go's previously-hardcoded "ate-system"
  and "api" lookups through installdefaults, so kubectl-ate's port-forward
  no longer bypasses the new single source of truth.

ate-controller (ServiceAccount), ate-api-server-deployment (Deployment),
and "api.ate-system.svc" (JWT audience) are still hardcoded but their
configurability needs a real flag/discovery story and is out of scope
for this PR.
The JWT install overlay (manifests/ate-install/jwt) references
ate-client.yaml as a top-level resource, but the chart previously
guarded the SA behind {{ if eq .Values.auth.mode "jwt" }} so
render-manifests.sh (mtls) never emitted it. That divergence broke
verify-helm-template after merging the upstream JWT fix that added a
hand-maintained manifests/ate-install/ate-client.yaml.

The SA is harmless in mtls installs (unused), so render it
unconditionally so the chart is the single source of truth.
@jjamroga jjamroga force-pushed the jjamroga/atenet-configurable-resource-names-and-chart branch from ce2405a to 3e2403b Compare July 1, 2026 16:23
@EItanya EItanya merged commit de0b0e6 into kagent-dev:main Jul 1, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants