docs: add "Route to a Kubernetes service with HA" how-to#810

Open
SunsetDrifter wants to merge 14 commits into main from cc/k8s-ha-routing-peers

Conversation

@SunsetDrifter SunsetDrifter commented Jun 23, 2026

Contributor

Summary

Adds a full how-to guide, Route to a Kubernetes service with high availability, under Kubernetes > Use Cases. It walks the whole journey of reaching a private in-cluster ClusterIP Service from a NetBird client through a redundant pool of routing-peer pods, so access survives a pod or node failure.

The guide covers the NetBird-side pieces the operator does not create for you, then the operator CRDs, then verification:

  1. Create a custom DNS zone: created empty; the operator fills in the A record (<service>.<namespace>.<zone>) automatically when you expose a Service.
  2. Create groups and an access policy: NetBird is deny-by-default; the operator writes no groups or policies.
  3. Deploy the routing peers (HA): NetworkRouter with workloadOverride.replicas, the group-backed router at one metric for equal-metric failover, and the auto-created PodDisruptionBudget. Node spread is woven in: Kubernetes spreads replicas across nodes by default, with a topologySpreadConstraints (ScheduleAnyway) example and a note on the DoNotSchedule rolling-update deadlock.
  4. Expose the Service: NetworkResource (ClusterIP-only) placed in the destination group.
  5. Verify and test failover: pods spread across nodes, PDB, resolve + curl, drop a pod and watch traffic continue.

Also includes an appendix on friendly DNS names (CNAME alias to the operator-managed record) and a custom dark-mode topology diagram.

Changes

  • New page: manage/integrations/kubernetes/use-cases/route-to-a-kubernetes-service.mdx
  • Nav: new Use Cases group under Kubernetes
  • Assets: custom dark-mode topology.svg + dashboard/terminal screenshots

Summary by CodeRabbit

  • Documentation
    • Added a new Kubernetes integration guide explaining how to route NetBird clients to in-cluster Kubernetes services through routing peers, including setup and failover testing procedures.
    • Updated navigation menu to include the new Kubernetes use case documentation page.

…erator)

Add a standalone use-case page under a new Use Cases group in the Kubernetes
nav, covering how to run the operator's routing peers in HA: NetworkRouter
workloadOverride.replicas (default 3), the auto-created PodDisruptionBudget
(maxUnavailable: 1), equal-metric automatic failover, and spreading replicas
across failure domains via workloadOverride.podTemplate. Models least-privilege
(named destination group + access policy) rather than the All group.
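
To make the availability arithmetic in this commit concrete, here is a minimal Python sketch. It assumes only the values stated above (3 replicas, a PodDisruptionBudget with maxUnavailable: 1); the function name is hypothetical and this is not operator code.

```python
def min_available_during_disruption(replicas: int, max_unavailable: int) -> int:
    """Pods that stay running while a PodDisruptionBudget limits voluntary evictions."""
    if replicas < 0 or max_unavailable < 0:
        raise ValueError("replicas and max_unavailable must be non-negative")
    # A PDB with maxUnavailable=N allows at most N pods to be evicted at once.
    return max(replicas - max_unavailable, 0)


# Values from the commit message: replicas default 3, maxUnavailable: 1.
remaining = min_available_during_disruption(replicas=3, max_unavailable=1)
print(remaining)  # 2
```

A PodDisruptionBudget only constrains voluntary disruptions such as a node drain; an abrupt node failure is covered by the replica count and failover instead.
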
Two SVG topology diagrams: replicas on a single node (single point of
failure) and replicas spread one-per-node via topologySpreadConstraints.
Embedded in Step 1 and the failure-domains section.

kube-scheduler spreads a Deployment's replicas across nodes by default
(best-effort, via built-in PodTopologySpread defaults). The earlier text/
diagram wrongly implied replicas co-locate by default. Reframe: multi-node
spread is the default; topologySpreadConstraints turns it into a guarantee
(or spans zones). Remove the single-node diagram (non-HA case, out of scope).

Document exposing a service under a cleaner name via a CNAME in a custom
zone pointing at the operator's <service>.<namespace>.<zone> record (verified
end-to-end). Placed as an appendix for now; can move to a shared location later.
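
A rough Python sketch of how the alias described in this commit resolves, assuming a simple in-memory record table. The alias name `app.company.internal` and the ClusterIP `10.96.0.10` are made up for illustration; only the `<service>.<namespace>.<zone>` target shape comes from the PR.

```python
from typing import Dict, Tuple

# Hypothetical records: name -> (record type, value).
RECORDS: Dict[str, Tuple[str, str]] = {
    # Operator-managed A record (<service>.<namespace>.<zone>); the IP is invented.
    "nginx.default.k8s.company.internal": ("A", "10.96.0.10"),
    # Hand-created friendly alias pointing at the operator's record.
    "app.company.internal": ("CNAME", "nginx.default.k8s.company.internal"),
}


def resolve_alias(name: str, records: Dict[str, Tuple[str, str]], max_hops: int = 8) -> str:
    """Follow CNAME records until an A record is reached."""
    for _ in range(max_hops):
        if name not in records:
            raise KeyError(f"no record for {name}")
        record_type, value = records[name]
        if record_type == "A":
            return value
        name = value  # CNAME: continue with the target name
    raise RuntimeError("CNAME chain too long")


print(resolve_alias("app.company.internal", RECORDS))  # 10.96.0.10
```

Because the alias points at the operator-managed name rather than at an IP, it keeps working if the operator rewrites the A record.
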
…t deadlock

Multi-node verification: default scheduling already spreads replicas one-per-node;
the operator merges workloadOverride.podTemplate.topologySpreadConstraints into the
Deployment. DoNotSchedule with replicas == schedulable nodes deadlocks rolling updates
(surge pod can't place). Switch the example to ScheduleAnyway (verified clean rollout)
and document DoNotSchedule + the node-count/maxSurge caveat for a hard guarantee.
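
The node-count caveat can be written down as a quick pre-flight check. This Python sketch encodes the observation reported in this commit (a surge pod needs a spare node under a hard one-per-node spread); it does not reimplement the scheduler's skew computation, and the function names are hypothetical. The only Kubernetes fact it relies on is that a Deployment's default maxSurge is 25%, rounded up.

```python
import math


def default_surge_pods(replicas: int) -> int:
    """Extra pods a Deployment may create during a rollout with the default maxSurge of 25%."""
    return math.ceil(replicas * 25 / 100)


def hard_spread_rollout_has_headroom(schedulable_nodes: int, replicas: int) -> bool:
    """Rule of thumb from the commit: keep more schedulable nodes than replicas."""
    return schedulable_nodes >= replicas + default_surge_pods(replicas)


print(default_surge_pods(3))                   # 1
print(hard_spread_rollout_has_headroom(3, 3))  # False: the case reported as deadlocking
print(hard_spread_rollout_has_headroom(4, 3))  # True
```

With ScheduleAnyway the constraint is a preference, so this check is only relevant when DoNotSchedule is used for a hard guarantee.
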
…wing)

Verified on the lab: a NetBird custom zone serves only the records you add; other
names under the domain fall through to upstream DNS. Reusing a real internal domain
for friendly names is safe except for exact-name collisions.
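
The fall-through behaviour described in this commit can be modelled in a few lines of Python. This is an illustrative sketch only, not NetBird's resolver: the record tables, names, and addresses are invented, and the upstream is represented as a plain dictionary.

```python
from typing import Dict, Optional


def resolve_with_fallthrough(
    name: str,
    zone_records: Dict[str, str],
    upstream_records: Dict[str, str],
) -> Optional[str]:
    """Answer from the custom zone only for names it holds; otherwise ask upstream."""
    if name in zone_records:
        return zone_records[name]
    return upstream_records.get(name)


# Hypothetical data: a custom zone that reuses a real internal domain.
zone = {"app.company.internal": "10.96.0.10"}
upstream = {
    "app.company.internal": "192.0.2.10",   # exact-name collision with the zone
    "wiki.company.internal": "192.0.2.20",  # exists only upstream
}

print(resolve_with_fallthrough("wiki.company.internal", zone, upstream))  # 192.0.2.20
print(resolve_with_fallthrough("app.company.internal", zone, upstream))   # 10.96.0.10
```

The second lookup shows the one unsafe case: a name defined in both places is answered by the custom zone, so the upstream record is shadowed.
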
Restructure the HA use-case page into an end-to-end guide covering the whole
journey: create the custom DNS zone, groups, and access policy (dashboard) ->
deploy HA routing peers (NetworkRouter, replicas:3) -> expose a Service
(NetworkResource) -> verify + failover. Generic, human-readable example names
(k8s.company.internal, kubernetes-clients/-services, network 'kubernetes',
nginx). Keeps the failure-domains diagram + ScheduleAnyway/DoNotSchedule note
and the friendly-DNS appendix. Adds <img> slots for 5 dashboard/terminal
screenshots (to be supplied). Renames the page + nav entry to
route-to-a-kubernetes-service; old slug removed.

Four screenshots (DNS zone, access policy, the kubernetes network with HA +
3 routing peers, kubectl pods-across-nodes). Drop the groups screenshot and
renumber the <img> refs to match.

Node-spread is the point of an HA guide, not a tail-end section. Move the
topology diagram up to 'What you'll achieve', fold the node-spread story into
Step 3 (deploy HA routing peers) - leading with the verified fact that the
scheduler spreads replicas across nodes by default (HA out of the box), with
topologySpreadConstraints as optional hardening - and drop the orphaned
'Spread across failure domains' section.

…cord)

Step 1 showed the auto-created A record without saying you don't enter it.
Note that you create only the zone (no hostname/IP/TTL by hand) and the
operator adds <service>.<namespace>.<zone> -> ClusterIP (5-min TTL) in Step 4.
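
For reference, the record name the operator is described as creating can be derived like this. A small Python sketch using the example names from this PR (nginx, default, k8s.company.internal); the helper name is hypothetical and the TTL constant simply restates the 5-minute value mentioned above.

```python
RECORD_TTL_SECONDS = 5 * 60  # the 5-minute TTL mentioned in the commit


def operator_record_name(service: str, namespace: str, zone: str) -> str:
    """Build <service>.<namespace>.<zone> for a Service exposed through the operator."""
    return f"{service}.{namespace}.{zone.strip('.')}"


name = operator_record_name("nginx", "default", "k8s.company.internal")
print(name)                # nginx.default.k8s.company.internal
print(RECORD_TTL_SECONDS)  # 300
```
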
Hand-authored dark-background topology diagram (NetBird overlay -> routing
peers one-per-node -> Service) that matches the dark docs theme, replacing the
light Excalidraw-derived SVG. Removes the orphaned ha-routing-peers-spread-nodes.svg.

Show the Add DNS Record dialog (CNAME 'app' -> nginx.default.k8s.company.internal)
and align the example hostname to 'app' to match.

coderabbitai Bot commented Jun 23, 2026

Contributor

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2b5b2998-54b2-4102-aa99-cf79b968fe82

📥 Commits

Reviewing files that changed from the base of the PR and between cce9e07 and 020d952.

📒 Files selected for processing (1)
  • src/pages/manage/integrations/kubernetes/use-cases/route-to-a-kubernetes-service.mdx
📝 Walkthrough

A new "Use Cases" navigation group is added under the Kubernetes integration in NavigationDocs.jsx, containing a single link. A new MDX page is added documenting how to route NetBird clients to a Kubernetes ClusterIP Service through an HA pool of routing peers using DNS, access-control policies, NetworkRouter, and NetworkResource CRDs.

Changes

Kubernetes Use Cases: Route to a ClusterIP Service

| Layer | File(s) | Summary |
|---|---|---|
| Navigation entry for Kubernetes Use Cases | `src/components/NavigationDocs.jsx` | Adds a new nested "Use Cases" group (`isOpen: false`) under the Kubernetes integration section containing a single child link for "Route to a Kubernetes Service". |
| Guide intro, prerequisites, and DNS zone setup | `src/pages/manage/integrations/kubernetes/use-cases/route-to-a-kubernetes-service.mdx` | Introduces the guide's goal and end-to-end flow, lists prerequisites and example object names, and provides Step 1 for creating a custom DNS zone with a distribution group. |
| Access control policy and HA NetworkRouter deployment | `src/pages/manage/integrations/kubernetes/use-cases/route-to-a-kubernetes-service.mdx` | Step 2 creates two access-control groups and a TCP/80 policy; Step 3 deploys a NetworkRouter with `workloadOverride.replicas`, PodDisruptionBudget behavior, and optional topology spread constraints. |
| NetworkResource exposure, verification, failover, and DNS appendix | `src/pages/manage/integrations/kubernetes/use-cases/route-to-a-kubernetes-service.mdx` | Step 4 exposes the ClusterIP Service via NetworkResource with an operator-managed DNS A record; Step 5 provides verification commands and a failover test; the appendix covers the `<service>.<namespace>.<zone>` naming scheme and CNAME aliasing. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • netbirdio/docs#796: Also adds a new "Use Cases" navigation group entry to docsNavigation in NavigationDocs.jsx, following the same structural pattern as this PR.
  • netbirdio/docs#786: Modifies docsNavigation group structure and isOpen toggle behavior in NavigationDocs.jsx, directly affecting how the newly added Kubernetes "Use Cases" group expands and collapses.

Suggested reviewers

  • mlsmaycon
  • jnfrati

Poem

🐇 Hoppity hop through the cluster we go,
A ClusterIP hidden, now easy to know!
DNS zones blossom, the routing peers stand tall,
HA replicas catch us whenever we fall.
The docs nav now shines with a bright "Use Cases" sign—
This rabbit approves, your Kubernetes is fine! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding documentation for routing to a Kubernetes service with high availability, which aligns with the PR's primary objective. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@src/pages/manage/integrations/kubernetes/use-cases/route-to-a-kubernetes-service.mdx`:
- Around line 115-117: The Note mentions setting `maxSurge: 0` as a solution for
rolling-update constraints with `whenUnsatisfiable: DoNotSchedule`, but the
documented example only shows pod-spec configuration through
`spec.workloadOverride.podTemplate`. Since `maxSurge` is a Deployment-level
`strategy.rollingUpdate` field rather than a pod-spec field, either add
clarification with an example showing how to configure `maxSurge: 0` through the
operator if it is supported, or remove the mention of `maxSurge: 0` from the
Note and keep only the option about maintaining more schedulable nodes than
replicas.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ed468dec-b3a9-4cf8-8844-a40a66e36645

📥 Commits

Reviewing files that changed from the base of the PR and between 6b4aa80 and cce9e07.

⛔ Files ignored due to path filters (6)
  • public/docs-static/img/manage/integrations/kubernetes/use-cases/route-to-a-kubernetes-service/01-dns-zone.png is excluded by !**/*.png
  • public/docs-static/img/manage/integrations/kubernetes/use-cases/route-to-a-kubernetes-service/02-access-policy.png is excluded by !**/*.png
  • public/docs-static/img/manage/integrations/kubernetes/use-cases/route-to-a-kubernetes-service/03-network.png is excluded by !**/*.png
  • public/docs-static/img/manage/integrations/kubernetes/use-cases/route-to-a-kubernetes-service/04-pods-across-nodes.png is excluded by !**/*.png
  • public/docs-static/img/manage/integrations/kubernetes/use-cases/route-to-a-kubernetes-service/friendly-dns-cname.png is excluded by !**/*.png
  • public/docs-static/img/manage/integrations/kubernetes/use-cases/route-to-a-kubernetes-service/topology.svg is excluded by !**/*.svg
📒 Files selected for processing (2)
  • src/components/NavigationDocs.jsx
  • src/pages/manage/integrations/kubernetes/use-cases/route-to-a-kubernetes-service.mdx

The operator's workloadOverride only exposes annotations, labels, podTemplate,
and replicas — there is no hook for the Deployment's strategy.rollingUpdate.maxSurge.
Keep the achievable workaround (more schedulable nodes than replicas).
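
As a rough illustration of why the maxSurge suggestion was dropped, this Python sketch checks an override against the four fields listed in this commit message. The field set is taken from the message above, not from the CRD schema, and the helper name and the `strategy` key in the example are hypothetical.

```python
from typing import Any, Dict, List

# Fields workloadOverride is said to expose, per the commit message above.
SUPPORTED_OVERRIDE_FIELDS = frozenset({"annotations", "labels", "podTemplate", "replicas"})


def unsupported_override_fields(override: Dict[str, Any]) -> List[str]:
    """Return top-level keys that workloadOverride would not accept."""
    return sorted(set(override) - SUPPORTED_OVERRIDE_FIELDS)


# Hypothetical attempt to tune the rollout strategy through the override.
attempt = {
    "replicas": 3,
    "strategy": {"rollingUpdate": {"maxSurge": 0}},
}
print(unsupported_override_fields(attempt))  # ['strategy']
```
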