fix(WebRequest): CWE-190/DoS fix and boundary-parsing refactor#447
fix(WebRequest): CWE-190/DoS fix and boundary-parsing refactor#447mathieucarbou wants to merge 3 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR hardens and refactors multipart boundary= parsing in AsyncWebServerRequest::_parseReqHeader() to reduce allocation overhead during header scanning and to enforce RFC 2046’s 70-character boundary limit as a mitigation for CWE-190/DoS scenarios.
Changes:
- Refactors the
boundary=parameter scan to use raw C-string access and earlier exit conditions. - Reworks boundary value extraction to a single-pass unescape/write path (token or quoted-string) and enforces the 70-char limit early.
- Introduces
std::string_view-based slicing to avoid intermediateString::substring()allocations during parsing.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@willmmiles : fyi we cannot use string_view if we want to keep supporting arduino 2 and libretiny. |
521087c to
9ed7c0d
Compare
7bbca28 to
d62f601
Compare
|
@me-no-dev @willmmiles : can you guys review / approve this pr do that we can merge it ? It is sitting there since a while now. Thanks ! |
| // If there is not enough room left for "boundary=" plus at least | ||
| // one value byte, no later ';' can match either — stop scanning. | ||
| if (j + (int)T_BOUNDARY_LEN + 1 > llen) { | ||
| break; | ||
| } |
There was a problem hiding this comment.
We shouldn't have to check this here - the test at the top of the loop should be enough. The mistake is that lscan needs to deduct the semicolon too.
There was a problem hiding this comment.
The code is valid and this check is necessary.
lscan already accounts for semicolon.
I have doubled checked this one with GLM 5.2 to be sure and have a more in depth explanation of why your proposal would fail.
the loop would stop one iteration early and miss valid boundaries like ;boundary=X at the very end of the string.
There was a problem hiding this comment.
You're right - this function does require a duplicate check as it is currently arranged. I'm sorry for the confusion - I really couldn't put my finger on what it was that was bugging me with the logic here, and I didn't do a very good job of articulating that unease.
I did a more thorough re-review with Claude and it was very helpful for clarifying my intuition. The deeper issue IMO is that the function tries to mix token analysis in a byte loop. A clearer formulation is to make the outer loop a loop over parameters instead, with "skip logic" over the quoted values inside.
The cleaner version is here: willmmiles@65e128e
It's substantially shorter and clearer IMO. (It's also dropped the toLowerCase call in favor of strncasecmp which should be much faster and avoids the extra allocation.)
There was a problem hiding this comment.
Thanks! I will have a look!
There was a problem hiding this comment.
@willmmiles I cherry picked your commit and verified it. I added a commit on top of it to fix an issue in your commit that re-introduced the CVE, and updated the MultiPart.ino example. The issue was an non-handled quoted empty boundary parameter. All the test cases in MultiPart.ino are working fine now.
|
@willmmiles : I have addressed your comments. Could you please have a look ? Thanks! |
willmmiles' parameter-loop refactor (65e128e) removed the final _boundary length safety net that existed after both extraction branches, relying on per-branch checks. However, the per-branch checks do not cover the empty quoted-string case: boundary="" closes immediately with _boundary still empty and falls through to _isMultipart = true without rejection. Downstream in EXPECT_BOUNDARY, an empty _boundary would cause --\r\n to be accepted as a valid boundary delimiter, reintroducing the CWE-190 / DoS condition this PR is meant to fix. Restore the final safety check (length == 0 || length > 70) with a comment explaining the empty-quoted-string gap that motivates it. Ref: #447 (comment)
Tighten multipart boundary parsing in _parseReqHeader(): - Replace String::charAt() inner loop with a raw C-string pointer and pre-computed length for faster, allocation-free scanning. - Add an early-exit upper bound on the scan loop so that positions where 'boundary=' cannot possibly fit are skipped entirely. - Replace the three-pass quoted-string extraction (find close-quote, substring, then unescape) with a single-pass approach that writes directly into _boundary using a raw pointer+length pair — no intermediate heap allocations and no C++17 dependency. - Enforce the RFC 2046 §5.1 70-character limit on boundary length in both token and quoted-string paths; abort with PARSE_REQ_FAIL on violation to prevent CWE-190 integer-overflow / DoS. - Replace strncmp length guard with a position-based early break that is clearer and avoids the redundant cast. - Fix ESP8266 build: String(const char*, size_t) is unavailable; use a 71-byte stack buffer + String(const char*) instead (#ifdef ESP8266). - Fix LibreTiny/C++14 build: remove std::string_view (C++17 only); replaced with plain const char* + size_t throughout. Ref: #445
Use a token-based loop instead of a byte loop as the primary parser for multipart options. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
willmmiles' parameter-loop refactor (65e128e) removed the final _boundary length safety net that existed after both extraction branches, relying on per-branch checks. However, the per-branch checks do not cover the empty quoted-string case: boundary="" closes immediately with _boundary still empty and falls through to _isMultipart = true without rejection. Downstream in EXPECT_BOUNDARY, an empty _boundary would cause --\r\n to be accepted as a valid boundary delimiter, reintroducing the CWE-190 / DoS condition this PR is meant to fix. Restore the final safety check (length == 0 || length > 70) with a comment explaining the empty-quoted-string gap that motivates it. Ref: #447 (comment)
Tighten multipart boundary parsing in _parseReqHeader():
Ref: #445