Harden header/host control-char validation (unicode + trailing newline) [follow-up to #85] #86
Open
icanhasmath wants to merge 25 commits into
CVE-2025-8194: tarfile accepted negative member offsets (reachable via a PAX extended header with a negative "size"), causing TarInfo._block to return a negative block count that moved the archive offset backwards and could hang (seekable files) or raise StreamError (streams). _block now rejects negative counts with InvalidHeaderError. CVE-2026-4786 / CVE-2026-4519: webbrowser.open passed attacker-controlled URLs to the browser command line unvalidated, so a URL starting with "-" could be treated as a command-line option (argument injection). Add BaseBrowser._check_url and call it from GenericBrowser, BackgroundBrowser and UnixBrowser; UnixBrowser validates the URL after %action expansion and substitutes %action before %s so %action cannot smuggle a leading dash. Adds tests in test_tarfile (negative _block count and a PAX negative-size archive) and a new test_webbrowser covering URL rejection and the %action bypass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
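As a rough illustration of the `_block` guard described in this commit, the sketch below rounds a byte count up to whole 512-byte tar blocks and refuses negative counts. The names `block_size` and `BLOCKSIZE`, and the local `InvalidHeaderError` class, are stand-ins chosen for this example; this is not the patched tarfile code.

```python
BLOCKSIZE = 512  # tar archives are laid out in 512-byte blocks


class InvalidHeaderError(Exception):
    """Local stand-in for tarfile.InvalidHeaderError (illustration only)."""


def block_size(count):
    """Round a member size up to a whole number of blocks, in bytes."""
    if count < 0:
        # A negative size would move the archive offset backwards.
        raise InvalidHeaderError("invalid offset")
    blocks, remainder = divmod(count, BLOCKSIZE)
    if remainder:
        blocks += 1
    return blocks * BLOCKSIZE
```

Rejecting the value at the point where the offset is computed means every caller, including the PAX "size" path, gets the same protection.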
Backports a uniform "reject C0 control characters and DEL" defense across several stdlib modules where unvalidated user input was emitted into protocol headers/commands, enabling CR/LF (and NUL) injection:
- wsgiref.headers.Headers: validate name/value in __init__, __setitem__ and add_header (CVE-2026-0865), raising ValueError.
- Cookie.Morsel: reject control chars in set()/__setitem__ key, value and coded value (CVE-2026-0672), raising CookieError. Validation lives at the value-storage chokepoints, so the CVE-2026-3644-style bypasses do not apply (2.7 has no Morsel.update/|=/__setstate__).
- imaplib.IMAP4._command: reject control chars in command arguments (CVE-2025-15366), raising ValueError.
- poplib.POP3._putline (and the SSL override): reject control chars in the command line (CVE-2025-15367), raising error_proto.
- httplib.HTTPConnection.set_tunnel: validate the CONNECT tunnel host via the existing _validate_host (CVE-2026-1502), raising InvalidURL.
Adds focused tests to test_cookie, test_wsgiref, test_imaplib, test_poplib and test_httplib.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
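The shared "reject C0 control characters and DEL" rule can be pictured with a small sketch. The helper name `reject_control_chars` and the pattern name are hypothetical; each module in the commit raises its own exception type (ValueError, CookieError, error_proto, InvalidURL), while this example uses ValueError throughout.

```python
import re

# C0 control characters (0x00-0x1F) plus DEL (0x7F). Tab (0x09) is a C0
# character, so this sketch rejects it as well.
_CONTROL_CHARS = re.compile(r'[\x00-\x1f\x7f]')


def reject_control_chars(value, what="value"):
    """Return value unchanged, or raise ValueError if it has control chars."""
    if _CONTROL_CHARS.search(value):
        raise ValueError(
            "control characters not allowed in %s: %r" % (what, value))
    return value
```

For example, `reject_control_chars("a\r\nSet-Cookie: x", "header value")` raises, while `reject_control_chars("text/plain")` returns its argument.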
email.generator.Generator wrote header values verbatim, so a value containing a bare CR/LF (e.g. set via msg['To'] = 'a\r\nBcc: x') could inject additional headers or body content. Port the upstream verify_generated_headers behaviour into _write_headers: after computing each header's serialized form, reject it (raise the new email.errors.HeaderWriteError) if it contains a CR/LF that is not part of valid folding, using NEWLINE_WITHOUT_FWSP = re.compile( r'\r\n[^ \t]|\r[^ \n\t]|\n[^ \t]'). Since 2.7's email has no policy framework, the check is unconditional (matching upstream's default-on). Adds email.errors.HeaderWriteError and a regression test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
On a Windows release build, CRT calls made on a deliberately-invalid file descriptor invoke the invalid-parameter handler and fast-fail (surfacing as a "stopped working" dialog) instead of returning EBADF. Several tests (test_fileio, test_os, test_signal) intentionally exercise bad fds and so hung the suite. The existing _Py_BEGIN_SUPPRESS_IPH backport (e361063) guarded the primary fd operations but missed three secondary fstat/lseek calls:
- _io/fileio.c new_buffersize(): fstat()/lseek() reached by readall() before the already-guarded read() (test_fileio testErrnoOnClosedReadall).
- posixmodule.c posix_fdopen(): the directory-check fstat() reached by os.fdopen(bad_fd) (test_os TestInvalidFD.test_fdopen).
- signalmodule.c signal_set_wakeup_fd(): the validation fstat() reached by signal.set_wakeup_fd(bad_fd) (test_signal test_invalid_fd).
Wrap each in _Py_BEGIN_SUPPRESS_IPH/_Py_END_SUPPRESS_IPH so the call returns an error and the expected exception is raised.
Also:
- regrtest: suppress Windows error-reporting / CRT-assert dialogs at suite startup (SetErrorMode + debug-CRT report mode), mirroring Python 3's test driver, so a faulting test crashes cleanly instead of blocking on a modal dialog. Runs in -j slaves too.
- test_ctypes: skip test_pass_pointers where c_long is narrower than a pointer (win64); it truncates the returned pointer and dereferences a bad address (an access violation, not an IPH case).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On a modern (UCRT / VS2015+) Windows build, <errno.h> defines the POSIX
errno values (ECONNRESET=108, ...), so Python's errno module exposes those
rather than the Winsock values. But the socket layer still reports Winsock
error codes (WSAECONNRESET=10054), so asyncore's _DISCONNECTED set -- built
from the C-runtime errno constants -- no longer matches a real connection
reset. recv() then re-raises instead of closing, the server threads in
test_ftplib die mid-loop, and they leak their handlers into the global
asyncore.socket_map ("socket_map was modified" -> test failure).
The existing Windows WSA handling here already mapped WSAENOTCONN,
WSAECONNABORTED and WSAEBADF, but omitted WSAECONNRESET (the actual error
seen) and WSAESHUTDOWN. Add both to the import, the POSIX fallback aliases,
and _DISCONNECTED. Harmless on older toolchains and on POSIX, where the WSA
names alias to the same POSIX values.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
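A minimal sketch of the idea behind the asyncore change: treat both the C-runtime and the Winsock spelling of each error as "peer disconnected". The helper names `both_spellings` and `is_disconnect` are invented for this example, and the `getattr` lookup is a simplification; the actual patch extends the module's import list, its POSIX fallback aliases and `_DISCONNECTED`.

```python
import errno


def both_spellings(posix_name):
    """Return the errno value for posix_name plus its WSA twin, if any."""
    codes = {getattr(errno, posix_name)}
    wsa_code = getattr(errno, "WSA" + posix_name, None)  # absent on POSIX
    if wsa_code is not None:
        codes.add(wsa_code)
    return codes


# Errors that mean the connection is gone rather than a real failure.
DISCONNECTED = frozenset(
    code
    for name in ("ECONNRESET", "ENOTCONN", "ESHUTDOWN",
                 "ECONNABORTED", "EPIPE", "EBADF")
    for code in both_spellings(name)
)


def is_disconnect(code):
    """True if a socket error code should close the channel quietly."""
    return code in DISCONNECTED
```

On POSIX the WSA names are missing and the set collapses to the plain errno values; on a UCRT Windows build it holds both, so a Winsock 10054 from recv() is recognised as a reset.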
On 64-bit Windows (LLP64) sys.maxint is still 2**31-1 because C long is 32-bit, while the address space and Py_ssize_t are 64-bit. seq_tests' test_bigrepeat gated its 32-bit overflow check on sys.maxint, so on win64 it ran the check and tried to build a ~34 GB sequence (2**32 elements) that never raises MemoryError -- a multi-minute memory thrash ending in "MemoryError not raised". This hit test_list, test_tuple and test_userlist (all share seq_tests). Use sys.maxsize, matching upstream 2.7. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This fork's test_socket.py calls _have_socket_can() (module level) and references HAVE_SOCKET_ALG (in a class decorator) but the defining helpers were lost in backport, so the module crashed at import on Windows with "NameError: name '_have_socket_can' is not defined" (and would next hit HAVE_SOCKET_ALG). Restore both helpers from upstream 2.7 and define HAVE_SOCKET_ALG. On Windows AF_CAN/AF_ALG don't exist, so both return False and the corresponding test classes skip. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CVE-2024-0450: zipfile did not detect "quoted overlap" archives where an entry's compressed data overruns the start of the next entry, a high-ratio zip bomb. _RealGetContents now records each member's _end_offset (the start of the next local header, or the central directory for the last member) and ZipFile.open raises BadZipfile if an entry's data would extend past it. CVE-2025-8291: _EndRecData64 trusted that the ZIP64 end-of-central-directory record sat immediately before its locator and ignored the locator's stored relative offset. It now rejects archives whose locator offset points past the expected record position ("Corrupt zip64 end of central directory locator"). Adds _end_offset to ZipInfo (slots + __init__) and regression tests for both issues. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
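The overlap check can be sketched roughly as follows, assuming each member is identified by its local-header offset. `assign_end_offsets`, `check_member` and the local `BadZipfile` class are illustrative names, not the code added to zipfile.

```python
class BadZipfile(Exception):
    """Local stand-in for zipfile.BadZipfile (illustration only)."""


def assign_end_offsets(header_offsets, central_dir_start):
    """Map each local-header offset to the offset its data must not pass.

    A member ends where the next local header starts; the last member
    ends at the start of the central directory.
    """
    ordered = sorted(set(header_offsets))
    limits = ordered[1:] + [central_dir_start]
    return dict(zip(ordered, limits))


def check_member(data_start, compress_size, end_offset):
    """Raise if the compressed data would run into the next entry."""
    if data_start + compress_size > end_offset:
        raise BadZipfile("Overlapped entries (possible zip bomb)")
```

With headers at offsets 0, 40 and 100 and a central directory at 300, the limits come out as 40, 100 and 300, so a member whose data starts at 30 with 20 compressed bytes is rejected.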
…ion test
find_msvcrt() fabricated a non-existent CRT name (e.g. msvcr130.dll) on a
modern toolchain because _get_build_version() used the old "int(s[:-2]) - 6"
formula, which yields 13 for MSC v.19xx. CDLL() of that name fails with
WindowsError 126, breaking test_loading, test_errno and test_callbacks
(which load find_library("c")). Apply the upstream bpo-23606 fix: bump the
major version past the skipped v13, and have find_msvcrt() return None for
VS2015+ (the UCRT is not directly loadable). find_library("c") then returns
None and those tests skip cleanly.
Also skip test_prototypes.test_int_pointer_arg where c_long is narrower than
a pointer (win64): like test_pass_pointers it sets restype=c_long and
compares it against a full pointer address, which truncates on LLP64.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
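To make the version arithmetic concrete, here is a hedged sketch that takes the digits following "MSC v." in `sys.version` as a string. The function names and the returned DLL names are assumptions for illustration; the real helpers live in `ctypes.util` and read `sys.version` themselves.

```python
def msvc_major_version(msc_ver):
    """Map e.g. '1500' or '1900' to the Visual C++ major version."""
    major = int(msc_ver[:-2]) - 6
    if major >= 13:
        # Microsoft skipped version 13, so MSC v.19xx is VC++ 14.
        major += 1
    return major


def msvcrt_name(msc_ver):
    """Return a loadable CRT DLL name, or None for VS2015+ (UCRT)."""
    major = msvc_major_version(msc_ver)
    if major > 13:
        return None  # the UCRT is not a single msvcrXXX.dll
    if major <= 6:
        return "msvcrt.dll"
    minor = int(msc_ver[2:3])
    return "msvcr%d%d.dll" % (major, minor)
```

Under the old formula "1900" gave 13 and hence the non-existent msvcr130.dll; with the bump it gives 14 and the lookup returns None.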
urlparse.urlsplit only rejected mismatched IPv6 brackets, so a netloc like "ex[ample].com" or "[example.com]" was accepted, parsing differently from RFC 3986-compliant tools (a differential-parsing / SSRF vector). Add _check_bracketed_netloc / _check_bracketed_host (ported from the upstream fix) and call them from both urlsplit code paths. Brackets are now allowed only when they enclose a valid IPv6/IPvFuture host. Since 2.7 lacks the ipaddress module, IPv6 content is validated via socket.inet_pton (with a conservative character fallback where inet_pton is unavailable). Adds a regression test covering rejected and accepted bracketed hosts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
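A simplified sketch of the bracket rules, assuming `socket.inet_pton` is available. The function names mirror the ones in the commit message but the bodies are an approximation written for this example (including the IPvFuture pattern), not the ported code.

```python
import re
import socket

# IPvFuture literal: "v" + hex digits + "." + at least one more character.
_IPVFUTURE = re.compile(r'\Av[0-9A-Fa-f]+\..+\Z')


def check_bracketed_host(hostname):
    """Accept only IPv6 or IPvFuture content between the brackets."""
    if hostname.startswith('v'):
        if not _IPVFUTURE.match(hostname):
            raise ValueError("IPvFuture address is invalid")
        return
    try:
        socket.inet_pton(socket.AF_INET6, hostname)
    except (socket.error, ValueError):
        raise ValueError("%r is not an IPv6 address" % hostname)


def check_bracketed_netloc(netloc):
    """Reject brackets anywhere except around a leading IP literal."""
    host_and_port = netloc.rpartition('@')[2]
    before, has_open, bracketed = host_and_port.partition('[')
    if not has_open:
        if ']' in host_and_port:
            raise ValueError("Invalid IPv6 URL")
        return
    hostname, has_close, port = bracketed.partition(']')
    # Nothing may precede "[" and only ":port" may follow "]".
    if before or not has_close or (port and not port.startswith(':')):
        raise ValueError("Invalid IPv6 URL")
    check_bracketed_host(hostname)
```

Here `check_bracketed_netloc("[::1]:8080")` passes, while `"ex[ample].com"` and `"[example.com]"` both raise ValueError.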
SSLSocket.__init__ calls getpeername() to decide if the socket is already connected, expecting errno.ENOTCONN for an unconnected socket. On a modern (UCRT) Windows build errno.ENOTCONN is the C-runtime POSIX value (126) while Winsock reports WSAENOTCONN (10057), so the check failed and wrap_socket() re-raised for the (very common) "wrap a not-yet-connected socket" path -- cascading to ~50 errors across test_ssl. Match both spellings via _NOT_CONNECTED_ERRORS, same root cause as the asyncore WSAECONNRESET fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CVE-2025-12084: xml.dom.minidom._clear_id_cache called _in_document(), which walks the parent chain to the document root on every node mutation, making deeply nested appendChild()/insertBefore() O(n^2). Replace the walk with an O(1) `node.ownerDocument` check (over-clearing a detached node's cache is harmless; the cache is rebuilt lazily). CVE-2025-6075: posixpath.expandvars rebuilt the whole path string on each substitution and ntpath.expandvars concatenated to a result string char by char, both quadratic in the input size. posixpath now accumulates output segments; ntpath now expands via a single regex-substitution pass (ported from upstream), preserving the existing matching semantics. Adds bulk/regression tests for both expandvars implementations. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
test_socket.py imports `from test import test_support` but the (partially forward-ported) body refers to the support module under BOTH names (test_support.* and support.*, ~24 each). The bare `support.*` references — including @support.requires_linux_version decorators evaluated at import — raised "NameError: name 'support' is not defined", crashing the module on import once the earlier _have_socket_can gap was fixed. Bind support = test_support (same module object) so both names resolve. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When close() flushed input ending in an unterminated construct, goahead() advanced only to the next '<' and re-parsed, so an input with many incomplete constructs (e.g. repeated "<!--") scanned the remaining buffer once per construct -- O(n^2). Backport the upstream fix: at EOF an unterminated construct is closed per HTML5 (comments/declarations/CDATA/PI are emitted via their handlers, and incomplete tags are ignored) and the rest of the buffer is consumed in one step (k = n), making it linear. Adds endtagopen and updates the affected EOF expectations in test_htmlparser, plus new test_eof_in_* cases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
test_loading.test_load_library asserted libc_name (find_library("c")) is not
None, but on a VS2015+/UCRT build find_msvcrt() correctly returns None (the
UCRT is not loadable as a single msvcrXXX.dll), so the assert failed. The
test's real purpose is loading kernel32, which is independent of libc_name;
drop the vestigial assertion (the libc-dependent cases already skip when
libc_name is None).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
base64.b64decode silently discarded non-alphabet characters and, with an alternative alphabet, still accepted the standard '+'/'/' characters (CVE-2025-12781); it also ignored any data after the padding (CVE-2026-3446). Add a validate=False parameter (mirroring Python 3). When validate=True the input is checked against the *requested* alphabet -- so '+'/'/' are rejected when altchars is given, and embedded or post-padding junk is rejected rather than silently dropped. This goes beyond upstream, which only deprecates the lenient behaviour. The default (validate=False) is unchanged. Adds a regression test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
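One way to picture "checked against the requested alphabet" is the sketch below, which validates the input with a regular expression before delegating to the standard decoder. `strict_b64decode` is a hypothetical helper written for this example; the backport adds a `validate` parameter to `base64.b64decode` instead.

```python
import base64
import binascii
import re


def strict_b64decode(data, altchars=None):
    """Decode base64 bytes, rejecting anything outside the chosen alphabet."""
    # With altchars given, '+' and '/' are no longer part of the alphabet.
    extra = altchars if altchars is not None else b"+/"
    chars = b"[A-Za-z0-9" + re.escape(extra) + b"]"
    # Whole 4-character groups, at most one padded final group, then the end.
    pattern = (b"\\A(?:" + chars + b"{4})*(?:" + chars + b"{2}==|"
               + chars + b"{3}=)?\\Z")
    if re.match(pattern, data) is None:
        raise binascii.Error("non-alphabet character or data after padding")
    return base64.b64decode(data, altchars)
```

For instance, `strict_b64decode(b"a+b/", altchars=b"-_")` and `strict_b64decode(b"YQ==junk")` both raise, whereas `strict_b64decode(b"YQ==")` returns `b"a"`.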
Two classes of Windows/2.7 test-side failures after the ssl.py WSAENOTCONN library fix:
- test_getpeercert_enotconn / test_do_handshake_enotconn asserted the raised errno equals errno.ENOTCONN, but on a UCRT build the socket reports the Winsock value WSAENOTCONN (10057) while errno.ENOTCONN is the C-runtime value (126). Compare against a set that includes both (_ENOTCONN).
- test_pha_no_pha_client/server, test_pha_not_tls13 and (ThreadedTests) test_bpo37428_pha_cert_none exercise TLS 1.3 post-handshake authentication, whose Python API (ssl.TLSVersion etc.) is not present in this 2.7 ssl backport (test_pha_not_tls13 raised AttributeError on ssl.TLSVersion). Skip them unless hasattr(ssl, 'TLSVersion').
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tarfile normalized an AREGTYPE ('\x00') header whose name ends in a slash to
DIRTYPE. This was also applied to follow-up headers (a GNU long name/link
or a pax header), letting a crafted archive be interpreted differently from
other tools.
Split the header parsing into _frombuf(dircheck=...) / _fromtarfile and
perform the AREGTYPE->DIRTYPE normalization only for primary headers; the
follow-up reads in _proc_gnulong and _proc_pax pass dircheck=False. The
public frombuf()/fromtarfile() signatures are unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
test_socket.py (and other partially forward-ported tests) decorate Linux/CAN-specific cases with @support.requires_linux_version(...), which is a Python 3 test.support helper absent from this fork. Evaluated at class-body import time, it raised "AttributeError: 'module' object has no attribute 'requires_linux_version'", crashing test_socket on import (after the earlier support-alias and _have_socket_can gaps were closed). Backport requires_linux_version (+ the _requires_unix_version helper) from upstream test.support. On non-Linux it is a pass-through; the CAN/Linux tests it guards already skip on Windows via HAVE_SOCKET_CAN. functools/platform/ unittest are already imported here. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bump PY_VERSION to 2.7.18.14 and add release notes for the security fixes addressed in this release: CVE-2025-8194, CVE-2026-4519, CVE-2026-4786, CVE-2026-0865, CVE-2026-0672, CVE-2025-15366, CVE-2025-15367, CVE-2026-1502, CVE-2024-6923, CVE-2024-0450, CVE-2025-8291, CVE-2025-0938, CVE-2024-11168, CVE-2025-6069, CVE-2025-6075, CVE-2025-12084, CVE-2025-13462, CVE-2025-12781 and CVE-2026-3446. Documents "not affected" determinations for CVE-2025-13836, CVE-2025-15282, CVE-2025-11468, CVE-2025-1795, CVE-2026-3644 and CVE-2024-5642. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on win32
Two test-side residuals, surfaced once the asyncore WSAECONNRESET fix stopped the server threads dying mid-run:
- test_with_statement sent 'noop', but DummyFTPHandler had no cmd_noop, so it replied "550 command not understood". Add cmd_noop (200 ok).
- test_source_address / _passive_connection assert the bound source port equals a find_unused_port() value; on Windows that port can be taken between selection and bind, so getsockname() returns a different port (off by 1-2). Skip on win32 (the EADDRINUSE branch doesn't cover the assertEqual race).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…_can
Two test-suite fixes surfaced by running `python2 -m test` against the 2.7.18.14 build:
* test_posixpath: test_expandvars_nonascii_word dereferenced test_support.FS_NONASCII (None under an ASCII/C-locale filesystem encoding) before its skipTest check, crashing with AttributeError: 'NoneType' object has no attribute 'encode'. Guard it with @skipUnless(test_support.FS_NONASCII, ...) to match its sibling test_expandvars_many and upstream CPython. Regression from the CVE-2025-6075 expandvars backport.
* test_socket: HAVE_SOCKET_CAN = _have_socket_can() called a helper that was missing from the module, raising NameError at import and crashing the whole test. Restore the upstream _have_socket_can() definition; it resolves to False where PF_CAN/CAN_RAW are absent.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Finalize the 2.7.18.14 release notes with the Windows (VS2022/UCRT, win64) regression-remediation work delivered on this line: IPH suppression for invalid-fd CRT calls (fileio/posix/signal), asyncore WSAECONNRESET/WSAESHUTDOWN and ssl WSAENOTCONN handling, ctypes find_msvcrt on VS2015+ (bpo-23606), and the win64/UCRT test-suite fixes. patchlevel.h already reads 2.7.18.14 (FINAL). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reconcile the remote 2.7.18.14 branch (which carried an independent fix, bc3b091: skip-guard test_expandvars_nonascii_word on FS_NONASCII, and a _have_socket_can helper) with the rc1-rc5 Windows regression remediation. test_socket.py conflict resolved in favour of the more complete remediation version (keeps both _have_socket_can and _have_socket_alg + the support alias); the test_posixpath.py FS_NONASCII skip-guard from bc3b091 is taken as-is. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… newline
Two defense-in-depth gaps in the 2.7.18.14 header-injection backports (CVE-2024-6923 / control-char cluster), surfaced in review of PR #85:
- wsgiref.headers._check_string only checked `str`, so a `unicode` header name/value carrying control characters bypassed the guard and could still be serialized. Check `basestring` instead.
- email.generator.NEWLINE_WITHOUT_FWSP used consuming character classes ([^ \t]) that require a following character, so a value ending in a bare CR/LF/CRLF was not rejected -- the generator then appends its own newline, prematurely terminating the header block. Use negative lookaheads, which also fire at end-of-string while still permitting valid CRLF folding.
Regression tests extended in test_wsgiref (unicode names/values) and test_email (trailing-newline values).
A third review comment, about urlparse._check_bracketed_host leaking TypeError from socket.inet_pton on unicode hosts, does not reproduce: CPython 2.7's "s" arg parser encodes unicode, so invalid hosts raise UnicodeEncodeError (a ValueError subclass) or socket.error, both already handled.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
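The difference between the consuming classes and negative lookaheads is easiest to see side by side. `CONSUMING` is the pattern quoted in the earlier email.generator commit; `LOOKAHEAD` is an assumed rewrite of it for illustration, since the exact expression used in this commit is not shown on this page.

```python
import re

# Original form: each alternative must consume a character after the newline,
# so it cannot match when the newline is the last character of the value.
CONSUMING = re.compile(r'\r\n[^ \t]|\r[^ \n\t]|\n[^ \t]')

# Assumed lookahead form: "not followed by folding whitespace" is also true
# at end-of-string, so trailing CR, LF and CRLF are caught.
LOOKAHEAD = re.compile(r'\r\n(?![ \t])|\r(?![ \n\t])|\n(?![ \t])')


def has_unfolded_newline(value, pattern=LOOKAHEAD):
    """True if value contains a CR/LF that is not part of valid folding."""
    return pattern.search(value) is not None
```

Both patterns flag `'a\r\nBcc: x'` and both accept the folded `'folded\r\n continuation'`; only the lookahead form flags `'evil\n'`.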
Harden header/host control-char validation

Follow-up to review feedback on #85. Two low-severity but real defense-in-depth gaps in the 2.7.18.14 header-injection backports (CVE-2024-6923 / control-char cluster):

- `wsgiref.headers._check_string` only checked `str`: a `unicode` header name/value carrying control characters (`u'\r\n…'`) bypassed the guard entirely on Python 2. Now checks `basestring`.
- `email.generator.NEWLINE_WITHOUT_FWSP` missed trailing newlines: the consuming `[^ \t]` classes require a following char, so a value ending in a bare CR/LF/CRLF (`'evil\n'`) was not rejected; the generator then appends its own newline, prematurely terminating the header block. Switched to negative lookaheads, which also fire at end-of-string while still allowing valid CRLF folding.

Regression tests extended in `test_wsgiref` (unicode names/values) and `test_email` (trailing-newline values). Verified on the standalone Py2.7 build: `test_wsgiref` 26/26, `test_email` 280/280.

Note on the third review comment (urlparse)

The comment about `_check_bracketed_host` leaking `TypeError` from `socket.inet_pton` on unicode hosts does not reproduce. CPython 2.7's `"s"` arg parser encodes unicode, so an invalid bracketed host raises `UnicodeEncodeError` (a `ValueError` subclass) or `socket.error`, both already caught. No change needed.

Stacking

2.7, this PR's diff will also show #85's commits; it cleans up to just the one hardening commit once #85 lands. Merge #85 first.

No version bump / release here: these can be folded into a future `2.7.18.15` if/when one is cut.