Egress tunnel capture + agent identity #338

Open
Nina Polshakova (npolshakova) wants to merge 3 commits into agent-substrate:main from npolshakova:egress-tunnel

@npolshakova Nina Polshakova (npolshakova) commented Jun 26, 2026

Contributor

POC for #126

Based on design discussed in https://docs.google.com/document/d/1KmpIFu2gnqy9gp95wASgIo_vkJ_dA1DZckV8upET6bs/edit?usp=sharing

  • Tests pass
  • Appropriate changes to documentation are included in the PR

Summary:
This is a proof-of-concept egress capture path for actors. It introduces a reusable internal/egresscapture package that:

  • starts local capture listeners
  • derives the CONNECT authority from HTTP Host or TLS SNI
  • opens a CONNECT-style tunnel to a configured PEP address

The gVisor and microvm runtimes wire this into actor network setup by redirecting actor HTTP/HTTPS egress traffic to local capture ports. Agentgateway is used as the receiving proxy to prove that captured actor traffic reaches the tunnel endpoint.

Notes:

  • Tested the gVisor and microvm setups in kind. These can be split into separate chunks to make review easier, but this shows the shared code.
  • Agent identity is currently passed as unsigned metadata headers:
    • x-ate-actor-id
    • x-ate-actor-template
    • x-ate-actor-template-namespace
    • x-ate-original-destination
    • x-ate-connect-authority
  • Signed agent/actor identity is out of scope for this PR and should replace the current metadata-header approach once agent identity lands in [Feature] Actor Identity #124
  • The PEP control plane is out of scope for this PR (agentgateway is used only to prove that the proxy receives traffic from the tunnel)

Comment thread cmd/ateom-gvisor/egress_capture.go Outdated
Comment on lines +35 to +36
egressCaptureHTTPPort = uint16(15001)
egressCaptureHTTPSPort = uint16(15002)

Why do you need to capture HTTP and HTTPS separately? A single listener could handle both, since SO_ORIGINAL_DST is already used to look up the original port in deriveConnectAuthority.

Contributor Author

Yeah, I think that's a good cleanup. Originally I was testing HTTP first and had one redirect per protocol.
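
To make the single-listener idea concrete, here is a small Go sketch under stated assumptions. It assumes the original destination arrives as the raw IPv4 `sockaddr_in` bytes that Linux returns for `SO_ORIGINAL_DST`, and it picks the authority source from the original port alone. `parseOriginalDst` and `authoritySource` are hypothetical names, not the PR's `deriveConnectAuthority`.

```go
package main

import (
	"encoding/binary"
	"net/netip"
)

// parseOriginalDst decodes a raw IPv4 sockaddr_in: 2 bytes of address
// family, 2 bytes of port in network byte order, then 4 address bytes.
// The remaining 8 bytes are padding and are ignored.
func parseOriginalDst(raw [16]byte) netip.AddrPort {
	port := binary.BigEndian.Uint16(raw[2:4])
	addr := netip.AddrFrom4([4]byte{raw[4], raw[5], raw[6], raw[7]})
	return netip.AddrPortFrom(addr, port)
}

// authoritySource reports which signal one shared listener would use for a
// connection, based only on the original destination port.
func authoritySource(dst netip.AddrPort) string {
	switch dst.Port() {
	case 80:
		return "http-host"
	case 443:
		return "tls-sni"
	default:
		return "ip-port"
	}
}
```

With this shape, both redirects can target one capture port, because the protocol decision is made per connection from the recovered destination rather than from the port the listener is bound to.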

@npolshakova Nina Polshakova (npolshakova) force-pushed the egress-tunnel branch 2 times, most recently from 9b63f65 to e678eeb June 29, 2026 16:50
@npolshakova Nina Polshakova (npolshakova) marked this pull request as ready for review June 29, 2026 17:24
Comment thread cmd/ateom-microvm/egress_proxy.go Outdated
const (
egressCapturePort = uint16(15001)
egressOriginalHTTPPort = uint16(80)
egressOriginalHTTPSPort = uint16(443)

Collaborator

Is there any specific reason we're only capturing these ports, and not just redirecting everything?

Contributor Author

Right now ateom derives the CONNECT authority from the HTTP Host header for port 80 or from TLS SNI for port 443. For other ports, the current capture code falls back to the raw original destination IP:port.

We could try to derive the authority on other ports too, but it would be a little more complicated because we'd need some sort of classifier:

  • try TLS SNI on any port
  • try HTTP Host on any port
  • fall back to recent per-actor DNS correlation
  • if still can't figure it out, use IP:port

Comment thread cmd/ateom-gvisor/egress_proxy.go Outdated
Comment on lines +36 to +37
egressOriginalHTTPPort = uint16(80)
egressOriginalHTTPSPort = uint16(443)

Collaborator

Same question about only supporting these 2 ports.

Comment thread cmd/ateom-microvm/net.go
cleanupErr = errors.Join(cleanupErr, fmt.Errorf("while removing actor nftables rules: %w", err))
slog.WarnContext(ctx, "Failed to remove actor nftables rules; continuing actor netns cleanup", slog.Any("err", err))
}
if s.egressCapture != nil {

Collaborator

Is there a time we would want no egress capture?

Contributor Author

That's a good question. I think egress capture should generally always be enabled, but you can imagine scenarios where that doesn't make sense:

  1. No PEP egress is set up. In this case I think traffic should still be captured by the ateom proxy, but not forwarded to any PEP.
  2. Flexibility for workloads using traffic we do not capture yet (UDP/QUIC, non-80/443 TCP if we don't want to do the classifier, etc.).

Contributor Author

Mike Morris (@mikemorris) This is related to the discussion we had in Slack. As you brought up, if we expect all traffic to always egress through a PEP, then PEP registration can be a first-class API. Are there cases, though, where we don't want traffic to be captured or to egress through a PEP?

Comment thread demos/counter/counter.go Outdated
return counter
}

const defaultEgressURL = "https://httpbin.org/get"

Collaborator

Why was this added to the counter demo? It seems like we should potentially have a different demo.

Contributor Author

It seemed more lightweight to just have a new endpoint under the counter demo for future testing, but we can make it a separate demo!

Collaborator

Should we name this package egress, or maybe tunnel?

Comment thread internal/egresscapture/capture.go Outdated
Comment on lines +80 to +81
// Keep tunnel protocol support behind factories so additional transports
// such as HBONE can plug in without changing capture/listener logic.

Collaborator

I know we discussed keeping the protocol generic, but do we need to start there? It makes the first merge more confusing to read, and I'm not clear if it's actually necessary.

Contributor Author

Yeah, I was hoping to show how it could be extended in the future, but let me clean it up.

Signed-off-by: npolshakova <nina.polshakova@solo.io>
3 participants