Stop SSE filter from leaking tools/list on undecodable lines#5304
Stop SSE filter from leaking tools/list on undecodable lines#5304saivedant169 wants to merge 6 commits into
Conversation
processSSEResponse used to write the entire raw upstream payload and return whenever a single SSE data line failed jsonrpc2.DecodeMessage or decoded to a non-Response message (e.g. a notifications/* frame). On a tools/list reply that meant every subsequent data line, including the real Response, reached the client unfiltered, silently bypassing the cedar authorization filter and producing the superfluous WriteHeader warning at response_filter.go:191. Treat undecodable or non-Response data lines as pass-through for that line only: fall through to the existing line writer instead of dumping rawResponse. The explicit WriteHeader calls go away with them, which also removes the double-header warning that surfaced the bug. Skipped filtering on tools/list now emits a WARN so future bypasses are visible in audit logs. Adds a table-driven regression test covering both branches (decode error and non-Response). It fails on the old code with the unfiltered admin_tool entry reaching the recorder. Closes stacklok#5257
|
This PR looks like it's targetting issue #5292 ? |
|
cc @amirejaz |
|
Not quite — #5304 closes #5257, which is an authorization bypass in the Cedar SSE response filter ( Our issue #5292 is a separate concern in a different layer: per-event JSON-RPC frame validation in the transparent proxy ( The two are complementary — #5304 fixes a more serious auth bypass, #5292 is about frame validation in the non-auth transparent proxy path. |
yrobla
left a comment
There was a problem hiding this comment.
The direction is correct and the core fix is sound — removing the whole-payload dump and letting the existing if !written branch handle per-line passthrough is exactly right. Two warnings below about logging coverage and test coverage.
| // Pass this line through unfiltered. Earlier revisions wrote | ||
| // rawResponse and returned here, which leaked every subsequent | ||
| // data line on the stream past the filter (issue #5257). | ||
| if rfw.method == string(mcp.MethodToolsList) { |
There was a problem hiding this comment.
Warning: WARN logs are only emitted for tools/list, but the bypass applies to all four filtered methods.
Both log conditions — err != nil branch here, and the else if for non-*jsonrpc2.Response in the default: branch below — are gated on rfw.method == string(mcp.MethodToolsList). However, requiresResponseFiltering covers four methods: tools/list, prompts/list, resources/list, and find_tool. An SSE backend serving prompts/list or resources/list with interleaved MCP notifications would silently hit the same passthrough path — no log line at all.
The security fix itself is correct for all methods (the written flag controls passthrough regardless of method), but the observability goal from the issue — "future bypasses surface in audit logs rather than going silent" — is only fulfilled for tools/list.
Consider replacing both method checks with requiresResponseFiltering(rfw.method), which already encodes the right set.
There was a problem hiding this comment.
Confirmed FIXED — all three branches in processSSEResponse (decode-error, carriesResult-drop, genuine-non-Response) now slog.Warn(..., "method", rfw.method) unconditionally. Since this function only runs for the four requiresResponseFiltering methods, bypasses on all of them now log.
| // an MCP notification) or an undecodable data line with a real tools/list | ||
| // response, the filter previously wrote the entire raw upstream payload and | ||
| // returned, leaking the unfiltered tools/list past Cedar. It must instead pass | ||
| // only the offending line through and continue filtering the rest of the |
There was a problem hiding this comment.
Warning: Regression test only covers MethodToolsList; the same bypass applies to prompts/list and resources/list over SSE.
The new test hardcodes string(mcp.MethodToolsList). The same code path in processSSEResponse runs for any method where requiresResponseFiltering is true — including MethodPromptsList and MethodResourcesList. A notification-interleaved SSE response on those paths is not currently exercised.
A brief additional sub-test for MethodPromptsList or MethodResourcesList with an interleaved notification would confirm parity and guard against a future regression on those paths.
There was a problem hiding this comment.
Confirmed FIXED — TestResponseFilteringWriter_SSE_PerLineFallthrough is a genuine 3 methods × 2 scenarios table (tools/list, prompts/list, resources/list × notification-interleaved, undecodable-line), each with its own Cedar policy and authorized/unauthorized fixture. Not a shallow method-name parameterization.
Address review feedback on stacklok#5304: The WARN log paths in processSSEResponse were gated on the tools/list method, but the underlying bypass applies to every method routed through requiresResponseFiltering (tools/list, prompts/list, resources/list, find_tool). Drop the gating so the WARN fires for any of them. The regression test only exercised tools/list. Restructure it into a method-by-preceding-line table so the same notification and undecodable line cases run against prompts/list and resources/list as well. Verified the WARN now logs for all three methods and the unauthorized entry never reaches the recorder.
|
Both points addressed in 3984262. The WARN logs no longer gate on tools/list, so any method that hits processSSEResponse via requiresResponseFiltering (tools/list, prompts/list, resources/list, find_tool) now WARNs when a data line is undecodable or non-Response. The regression test is now a method-by-preceding-line table over all three list methods. Each method has its own cedar policy and a one-authorized + one-unauthorized fixture; both the notification and undecodable-line cases run against tools/list, prompts/list, and resources/list. Confirmed the unauthorized entry never reaches the recorder for any of the six combinations. @yrobla ready for another look when you have a moment. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5304 +/- ##
=======================================
Coverage 68.40% 68.40%
=======================================
Files 624 624
Lines 63442 63442
=======================================
+ Hits 43397 43400 +3
+ Misses 16807 16805 -2
+ Partials 3238 3237 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| } | ||
|
|
||
| written = true | ||
| } else { |
There was a problem hiding this comment.
The pass-through gate here is still keyed on the JSON-RPC message type, which the upstream controls. DecodeMessage only classifies a frame as *Response when it has no method field. So a frame carrying both method (e.g. notifications/initialized) and result (with the real tools list) decodes to a *Request, fails the *Response assertion, and passes through unfiltered via the if !written branch below.
Same bypass class as #5257, just narrowed from "any non-Response line" to "a deliberately framed one." Since the proxy's threat model treats upstream list responses as untrusted (that's the whole reason requiresResponseFiltering exists), I think this residual is worth closing.
Cheapest fix: for filtered methods, if the raw data contains a result field, route it through filterListResponse regardless of message type — or drop the line as a protocol violation. A regression test with a frame like {"jsonrpc":"2.0","method":"notifications/initialized","id":1,"result":{"tools":[...]}} would lock it.
Happy to file this as a follow-up if you'd rather ship the accidental-bypass fix now and track it separately — your call.
There was a problem hiding this comment.
Partially fixed. The exact PoC shape you gave (method+result, valid jsonrpc/id) is now caught — carriesResult (response_filter.go:278-283) does a proper json.Unmarshal into a struct, not substring matching, and TestResponseFilteringWriter_SSE_DisguisedResponseFrame genuinely exercises it.
But the fix is incomplete: carriesResult is only consulted in the err == nil && !isResponse branch (line 217). jsonrpc2.DecodeMessage returns an error — not a benign non-Response — for several upstream-controlled shapes (missing/invalid jsonrpc version tag, a response-shaped frame with no id, a non-scalar id, or a batch/array payload), and all of those land in the if err != nil branch at line 191, which still passes the line through unfiltered without ever checking for a smuggled result. That's the same bypass class, just relocated to an easier-to-reach branch. Left a detailed comment with concrete exploit frames on line 191.
Also: processJSONResponse (line 132) has the identical unpatched shape for the application/json transport — the PR's 'Special notes for reviewers' says that path is 'already correct', but the disguised-frame issue is transport-independent, not stream-specific, so that note isn't right.
| return rfw.writeErrorResponse(response.ID, err) | ||
| } | ||
|
|
||
| _, err = rfw.ResponseWriter.Write([]byte("data: " + string(filteredData) + "\n")) |
There was a problem hiding this comment.
This hardcodes "\n", but the loop already writes the detected linesep at line 228. On a \r\n stream that produces data: <json>\n\r\n — a stray \n a strict SSE parser might read as an empty event. The passthrough branch writes line + linesep (no extra \n), so the two paths produce structurally different output frames.
Drop the "\n" here and let linesep terminate. One-line change.
There was a problem hiding this comment.
Confirmed FIXED — the filtered-write path no longer appends a hardcoded terminator; the loop's shared linesep write at the bottom handles termination for every branch. No more \r\n desync.
| filteredData, err := jsonrpc2.EncodeMessage(filteredResponse) | ||
| if err != nil { | ||
| return rfw.writeErrorResponse(response.ID, err) | ||
| switch { |
There was a problem hiding this comment.
The switch { default: if/else } is harder to read than a plain if / else if / else chain here. There's exactly one condition (err != nil) plus a type assertion — the switch adds no dispatch value, and the happy path (decode ok, it's a Response, filter it) is buried in a default with a nested if, three levels deep.
An if err != nil { warn } else if response, ok := ...; ok { filter } else { warn } chain keeps the happy path at one indentation level and makes the three outcomes symmetric. No behavior change, just clarity.
There was a problem hiding this comment.
Confirmed FIXED — this is now a flat if err != nil {...} else if isResponse {...} else if carriesResult(...) {...} else {...} chain at one indentation level. Happy path is no longer buried in a default case.
A non-Response SSE frame carrying both a method field and a result (for example notifications/initialized smuggling a tools/list result) was classified as a request by DecodeMessage, failed the Response check, and passed through unfiltered, a narrowed form of stacklok#5257. carriesResult now detects a result field on such frames and drops the line as a protocol violation, with a regression test. Also drop the hardcoded "\n" after the filtered data line: the loop writes the detected line separator, so the literal "\n" desynced \r\n streams. Flatten the decode switch into an if/else chain for readability. Signed-off-by: Saivedant Hava <saivedant169@gmail.com>
|
Pushed
@yrobla: your two points are already in on the current branch, the WARN logs are unconditional (so every filtered method logs), and the SSE fallthrough test exercises tools/list, prompts/list, and resources/list. |
JAORMX
left a comment
There was a problem hiding this comment.
Re-reviewed against head commit eaa6c0d. Items 1, 2, and 4 from the two review rounds are genuinely fixed with real regression tests that fail without the fix — good work there. Item 3 (the method+result frame-smuggling bypass) is only partially closed: carriesResult correctly catches the exact PoC shape via a proper JSON unmarshal, but it's only consulted in the err == nil && !isResponse branch. jsonrpc2.DecodeMessage returns an error — not a benign non-Response — for several upstream-controlled cases (missing/invalid jsonrpc version tag, a response-shaped frame with no id, a non-scalar id, or a batch/array payload), and all of those land in the err != nil branch, which still passes the line through unfiltered without ever checking for a smuggled result. That's the exact #5257 bypass this PR exists to close, just relocated to an easier-to-reach branch — and it fails open on a component whose entire job is enforcing an authz boundary against an untrusted upstream.
The sibling processJSONResponse has the identical unpatched shape for the application/json transport. The PR's 'Special notes for reviewers' dismisses this as 'already correct', but that's not right — the disguised-frame issue is transport-independent, not stream-specific.
Blocking:
- Make the result-smuggling check run regardless of
DecodeMessage's error outcome — fail closed on anydata:line under a filtered method that either can't be classified as a benign non-response or carries aresult— and handle array/batch payloads too. - Apply the same guard to
processJSONResponse, or explicitly track it as a fast-follow and correct the PR notes (don't leave 'already correct' standing). - Extend
TestResponseFilteringWriter_SSE_DisguisedResponseFramewith the additional smuggled shapes (missingjsonrpctag, missingid, batch array) — all three currently leakadmin_toolper this review.
Non-blocking:
carriesResulthas no standalone unit test independent of the SSE integration test.find_toolisn't covered by the new fallthrough test table (only the three list methods are).
Replied inline on the specific threads, plus two new comments below.
| if data, ok := bytes.CutPrefix(line, []byte("data:")); ok { | ||
| message, err := jsonrpc2.DecodeMessage(data) | ||
| response, isResponse := message.(*jsonrpc2.Response) | ||
| if err != nil { |
There was a problem hiding this comment.
This is the actual hole. jsonrpc2.DecodeMessage returns a non-nil err — landing here, not in the carriesResult branch below — for several frame shapes a malicious/compromised upstream fully controls:
- missing or invalid
jsonrpcversion tag - a response-shaped frame (no
method) with no validid - an
idof a non-scalar type (object/array/bool) - a JSON array (batch) payload
All of these hit this branch and get written through unfiltered, e.g.:
data: {"id":1,"result":{"tools":[{"name":"admin_tool"}]}}
(missing jsonrpc tag → decode error → unfiltered passthrough), or
data: [{"jsonrpc":"2.0","id":1,"result":{"tools":[{"name":"admin_tool"}]}}]
(batch array → decode error; note carriesResult's struct-based probe would also return false on this shape even if it ran here, since it doesn't handle arrays).
Any client lenient about the jsonrpc tag, a missing id, or JSON-RPC batching (which MCP supported prior to 2025-06-18) will read the smuggled result as the real response — exactly the Cedar bypass this PR sets out to close, just relocated here. Recommend restructuring so the smuggled-result check runs independent of whether DecodeMessage succeeded, and extending it to detect a result key inside array-shaped payloads too. This is the same class as #5257 and should fail closed, not open, on malformed/adversarial input.
| // response and must reach the filter. | ||
| slog.Warn("SSE data line was not a JSON-RPC Response; passing through unfiltered", | ||
| "method", rfw.method) | ||
| } |
There was a problem hiding this comment.
Minor: this branch (genuine non-Response frame, e.g. an interleaved notifications/* message) is spec-legitimate, routine SSE traffic, but it logs at Warn — the same level as the two actually-suspicious branches (undecodable line, smuggled-result drop). Warning on every routine notification will train operators to filter the WARN level, which then buries the two branches that actually indicate something adversarial. Consider demoting this one to Debug so severity tracks suspiciousness.
The SSE and JSON response filters only checked for a smuggled result on frames that decoded as a non-Response. Frames that DecodeMessage rejects outright (missing or invalid jsonrpc tag, a response frame with no or a non-scalar id, a batch array) landed in the decode-error branch and were written through unfiltered, the same Cedar bypass as stacklok#5257 relocated to an easier-to-reach branch. Run the smuggled-result check independent of decode success on both the SSE and application/json paths, and extend carriesResult to detect a result inside a batch array. Fail closed on these frames. Demote the genuine-notification SSE branch to Debug so the two suspicious branches are not buried under routine traffic. Signed-off-by: Saivedant Hava <saivedant169@gmail.com>
|
Pushed 644506a. You were right on both counts. The smuggled-result check now runs independent of DecodeMessage's outcome. I reordered the SSE branch so
Demoted the genuine-notification SSE branch to Debug so the two branches that actually indicate something adversarial (undecodable line, smuggled-result drop) keep Warn to themselves. Tests: One judgment call worth flagging: on the JSON path I fail closed by writing |
Summary
ResponseFilteringWriter.processSSEResponseused to abort filtering of the entire SSE stream whenever a singledata:line failedjsonrpc2.DecodeMessageor decoded to a non-*jsonrpc2.Responsemessage. The fallback wrote the entire buffered upstream payload and returned, so any subsequentdata:line containing the realtools/listreply reached the client unfiltered, bypassing the cedar authorization filter. The fallback also re-calledWriteHeaderafter the reverse proxy's firstFlush()had already emitted headers, producing thehttp: superfluous response.WriteHeader call from … response_filter.go:191warning that surfaced the bug.This PR:
Responsedata:line as pass-through for that line only: fall through to the existing line writer (the loop'sif !writtenbranch) rather than dumpingrawResponseand returning.WriteHeader(rfw.statusCode)calls in the fallback branches, which removes the double-header warning at line 191.tools/listreply contains adata:line that bypasses the filter, so future bypasses are visible in audit logs instead of silent.Notifications interleaved on a response stream are explicitly allowed by the MCP spec, so this case can be hit by fully spec-compliant upstreams (
notifications/message, progress updates, etc.) as well as by upstreams fronted by an SSE bridge (the npmmcp-proxycase from the issue reproduction).Closes #5257
Type of change
Test plan
task test)task lint-fix)Added
TestResponseFilteringWriter_SSE_PerLineFallthroughas a table-driven regression test covering both branches (decode error and non-Responsedecode). It builds an SSE stream that interleaves a notification (or an undecodable garbage line) with a realtools/listresponse, then asserts:tools/listresponse is filtered to only authorized tools."admin_tool"(the unauthorized tool from the unfiltered upstream payload) does not appear anywhere in the output.The test fails on the prior code with
admin_toolreaching the recorder, and passes on the patched code.Verified on
pkg/authz/...with-race:go test -ldflags=-extldflags=-Wl,-w -race ./pkg/authz/...passes (including the existing JSON-path tests and the new SSE test).golangci-lint run ./pkg/authz/...reports 0 issues.Does this introduce a user-facing change?
Yes. On SSE upstreams (
Content-Type: text/event-stream),tools/list,prompts/list, andresources/listresponses now respect the configured cedar authorization filter even when the upstream interleaves notifications or sends an undecodabledata:line. Previously such streams returned the unfiltered tool/prompt/resource catalog to the caller. Filtered behavior is what operators already get onapplication/jsonupstreams; this brings SSE in line.Special notes for reviewers
The change is intentionally minimal. I considered also auditing
processJSONResponsefor the same shape (return rawResponse on per-line failure) but it processes a single message, not a stream, so the same code shape there is already correct. Happy to extend if you want a defensive log added on that side too.