
[WIP - test needed] docker: switch node to host networking on Linux (fix #18) #19

Open
shayanb wants to merge 1 commit into main from feat/host-networking-linux

@shayanb shayanb commented Jun 11, 2026

Member

Summary

Fixes #18. ufw-docker installs a DOCKER-USER rule that drops inbound UDP to container IPs on dport ≤32767. The node's QUIC socket reuses port 3000 for listen + dial, so peer replies come back on dport 3000 and get dropped → every handshake times out, node stuck at height 0.

Host networking sidesteps the entire DOCKER-USER chain (host-net containers flow through INPUT/OUTPUT, not FORWARD). Linux installs now default to network_mode: host; macOS / Docker Desktop stays on bridge.
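
As a rough illustration of the default described above, the mode could be resolved along these lines. This is a minimal sketch, not the code in this PR: the function name `resolve_network_mode` is hypothetical, and it assumes an explicit `LOGOS_DOCKER_NETWORK_MODE` value always wins over the per-OS default.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): pick the effective network mode.
# Usage: resolve_network_mode [os_name]   (os_name defaults to `uname -s`)
resolve_network_mode() {
    local os="${1:-$(uname -s)}"
    local override="${LOGOS_DOCKER_NETWORK_MODE:-}"

    # Assumption: an explicit, valid setting takes precedence on any OS.
    case "$override" in
        host|bridge)
            printf '%s\n' "$override"
            return 0
            ;;
        "")
            ;;
        *)
            printf 'invalid LOGOS_DOCKER_NETWORK_MODE: %s\n' "$override" >&2
            return 1
            ;;
    esac

    # Default: host on Linux, bridge elsewhere (macOS / Docker Desktop).
    if [ "$os" = "Linux" ]; then
        printf 'host\n'
    else
        printf 'bridge\n'
    fi
}
```

Passing the OS name as an optional argument keeps the sketch easy to exercise on any machine, e.g. `resolve_network_mode Darwin`.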

Changes

  • LOGOS_DOCKER_NETWORK_MODE setting added to settings.env. Default = host on Linux, bridge elsewhere. Escape hatch for operators with custom ports or unexpected host-net edge cases.
  • Node compose (lib/docker.sh) — generate_compose_file() now emits two distinct shapes: under host mode it drops ports: / networks: and adds network_mode: host; bridge mode is byte-equivalent to 0.4.3.
  • Monitoring compose (lib/monitoring.sh) — stays on the bridge. Under host mode the exporter points at host.docker.internal:${LOGOS_API_PORT} with extra_hosts: host-gateway; logos-otel publishes 127.0.0.1:4317:4317 so the host-net node can push OTLP via loopback (never exposed to the LAN).
  • OTLP auto-migration: new migrate_user_config_otlp_endpoint helper rewrites endpoint: "http://logos-otel:4317" → "http://127.0.0.1:4317" on logosup update so 0.4.3 installs going to host mode get migrated automatically. Idempotent; leaves operator-customized endpoints alone; no-op when the metrics block is absent.
  • Drift detection (cmd_start.sh) — regen triggers on network-mode mismatch in addition to port drift, with a defensive migrate_user_config_otlp_endpoint call after any regen.
  • Custom-port warning — at every compose-regen site, warn when LOGOS_API_PORT≠8080 or LOGOS_UDP_PORT≠3000 under host mode (they're ignored because Docker port mapping no longer applies) and point at the escape hatch.
  • Security scan — new _check_ufw_docker finding (Linux-only) flags ufw-docker presence when the bridged monitoring stack is installed. Grafana LAN access still needs an explicit sudo ufw-docker allow logos-grafana 3000/tcp — the finding surfaces that command.
  • VERSION → 0.4.4.
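
The two compose shapes from the node compose item above might look roughly like this. It is only a sketch and does not reproduce `generate_compose_file()`: the function name `emit_node_compose`, the image name, and the container-side ports (8080, 3000/udp) are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: not the project's generate_compose_file().
# The image name and container-side ports below are illustrative assumptions.
emit_node_compose() {
    local mode="$1"
    local api_port="${LOGOS_API_PORT:-8080}"
    local udp_port="${LOGOS_UDP_PORT:-3000}"

    # Part shared by both shapes.
    cat <<EOF
services:
  logos-node:
    image: example/logos-node:latest
    container_name: logos-node
EOF

    if [ "$mode" = "host" ]; then
        # Host mode: no ports:/networks:, the node binds host ports directly.
        cat <<EOF
    network_mode: host
EOF
    else
        # Bridge mode: published ports plus the shared bridge network.
        cat <<EOF
    ports:
      - "${api_port}:8080"
      - "${udp_port}:3000/udp"
    networks:
      - logosnode-net

networks:
  logosnode-net: {}
EOF
    fi
}
```

For example, `emit_node_compose host > docker-compose.yml` followed by `docker compose config --quiet` would check that the output parses.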

Local verification done

  • bash -n on all eight modified files: clean.
  • Compose YAML validates under both modes (python3 yaml.safe_load).
  • docker compose config --quiet exits 0 for both node and monitoring compose, under both modes.
  • OTLP migrate helper: forward / backward / idempotent re-run / custom-endpoint-untouched / no-metrics-block-noop — all five scenarios pass.
  • Custom-port warning fires only under host + custom, silent in all other combinations.
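
To make the five migrate scenarios concrete, here is a minimal sketch of a rewrite with those properties. It is not the `migrate_user_config_otlp_endpoint` helper itself: the name `migrate_otlp_endpoint` and its `file mode` arguments are hypothetical, and it assumes the endpoint is written as a double-quoted value on a single `endpoint:` line. Unlike the real helper, it does not scope the match to the metrics block.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the PR's helper): swap the stock OTLP endpoint
# to match the network mode, leaving anything else untouched.
# Usage: migrate_otlp_endpoint <config_file> <host|bridge>
migrate_otlp_endpoint() {
    local file="$1" mode="$2" from_re to tmp
    [ -f "$file" ] || return 0

    if [ "$mode" = "host" ]; then
        from_re='http://logos-otel:4317'
        to='http://127.0.0.1:4317'
    else
        from_re='http://127\.0\.0\.1:4317'
        to='http://logos-otel:4317'
    fi

    # No stock endpoint for the other mode: already migrated, customized,
    # or no endpoint line at all. All three are no-ops.
    grep -Eq "^[[:space:]]*endpoint:[[:space:]]*\"${from_re}\"" "$file" || return 0

    tmp="$(mktemp)" || return 1
    if sed -E "s|^([[:space:]]*endpoint:[[:space:]]*)\"${from_re}\"|\1\"${to}\"|" \
        "$file" > "$tmp"; then
        # Write back in place so the file keeps its owner and permissions.
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}
```

Because the match requires the exact stock URL, running it twice in the same mode changes nothing, and a custom endpoint such as a remote collector never matches.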

Test plan (real hardware — not yet done)

  • Fresh Linux install: logosup install on a clean box. Verify network_mode: host in docker-compose.yml, no ports:/networks: block. docker inspect logos-node --format '{{.HostConfig.NetworkMode}}' returns host. ss -lnup | grep :3000 shows the node listening on host UDP 3000 directly.
  • The actual #18 box (issue "Docker networking with ufw-docker breaks"): on Lisbon Pi (ufw + ufw-docker active), confirm peers connect and the node syncs past height 0.
  • Update from 0.4.3 on Linux — start on 0.4.3 with monitoring enabled, logosup update cli && logosup update node. Verify user_config.yaml endpoint rewrote from logos-otel:4317 to 127.0.0.1:4317, compose recreated with host mode, container has NetworkMode=host. Chain DB and wallet keys untouched.
  • Opt-out on Linux: echo 'LOGOS_DOCKER_NETWORK_MODE=bridge' >> settings.env && logosup stop && logosup start. Compose regenerates with ports: and networks:, OTLP endpoint reverts to logos-otel:4317.
  • Monitoring stack health under host mode — confirm exporter still reaches the node (/cryptarchia/info, /network/info), prometheus scrapes are green, Grafana dashboard shows live data, node-side OTLP push lands in the otel collector.
  • macOS — fresh install on Mac. Defaults to bridge. Compose is byte-equivalent to 0.4.3 (modulo whitespace from the heredoc refactor).
  • Security scan — on a Linux box with ufw-docker rules + docker-compose.monitoring.yml, confirm the new warn finding appears with the ufw-docker allow command. Without ufw-docker → no finding. Without monitoring compose → no finding.
  • Custom-port warning: LOGOS_API_PORT=9000 + Linux + logosup update node → warn message visible, points at escape hatch.
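
The custom-port warning exercised in the last item could be approximated as follows. This is a sketch under assumptions: the function name `warn_custom_ports` and the message wording are invented here; only the trigger condition (host mode plus a non-default `LOGOS_API_PORT` or `LOGOS_UDP_PORT`) comes from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: warn when custom ports are set but will be ignored.
# Usage: warn_custom_ports <host|bridge>
warn_custom_ports() {
    local mode="$1"
    local api_port="${LOGOS_API_PORT:-8080}"
    local udp_port="${LOGOS_UDP_PORT:-3000}"

    # Port mapping still applies in bridge mode, so stay silent there.
    [ "$mode" = "host" ] || return 0

    if [ "$api_port" != "8080" ] || [ "$udp_port" != "3000" ]; then
        # Message text is illustrative, not the project's actual wording.
        printf 'warning: LOGOS_API_PORT=%s / LOGOS_UDP_PORT=%s are ignored under host networking; set LOGOS_DOCKER_NETWORK_MODE=bridge in settings.env to use custom ports\n' \
            "$api_port" "$udp_port" >&2
    fi
    return 0
}
```

Writing the warning to stderr keeps it out of any compose output that a caller might be capturing on stdout.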

Edge cases noted

  • Rootless Docker on Linux — host mode works but routes through slirp4netns; no code change, mention in release notes.
  • WSL2 — identical to bare-metal Linux. Host mode works.
  • Healthcheck port collision — if something else binds host:8080, the node binary fails to start and exits; cmd_start.sh health_rc=2 branch already catches this and surfaces logs.
  • docker_repair_unmanaged_network — under host mode the node is no longer attached to logosnode-net, but the monitoring stack still owns it. Repair logic operates on labels and works fine; verified during planning.

Planning notes: /Users/shayan/.claude/plans/crystalline-growing-flame.md

🤖 Generated with Claude Code

ufw-docker installs a DOCKER-USER rule that drops inbound UDP to container
IPs on dport ≤32767. The node's QUIC socket reuses port 3000 for listen +
dial, so peer replies come back on dport 3000 and get dropped → every
handshake times out, node stuck at height 0.

Host networking sidesteps the DOCKER-USER chain entirely (host-net containers
flow through INPUT/OUTPUT, not FORWARD). Linux installs now default to
`network_mode: host`; Mac/Docker Desktop stays on bridge.

  - New LOGOS_DOCKER_NETWORK_MODE setting in settings.env. Default = host on
    Linux, bridge elsewhere. Operators with custom ports or unexpected
    host-net problems can opt back to bridge.
  - Monitoring stack stays on bridge: exporter reaches the node via
    host.docker.internal:${LOGOS_API_PORT} + extra_hosts host-gateway;
    logos-otel publishes 4317 on loopback only so the node can push OTLP.
  - migrate_user_config_otlp_endpoint rewrites the OTLP endpoint on update
    so 0.4.3 installs going to host mode get http://logos-otel:4317 →
    http://127.0.0.1:4317 automatically. Idempotent; leaves custom
    endpoints alone.
  - cmd_start drift check now also triggers on network-mode mismatch.
  - Warn at every compose-regen site when custom LOGOS_API_PORT /
    LOGOS_UDP_PORT are set under host mode (they no longer apply), and
    point at the escape hatch.
  - New security scan finding flags ufw-docker presence when the bridged
    monitoring stack is installed — Grafana LAN access still needs an
    explicit `ufw-docker allow logos-grafana 3000/tcp`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Docker networking with ufw-docker breaks
