
Harden header/host control-char validation (unicode + trailing newline) [follow-up to #85] #86

Open
icanhasmath wants to merge 25 commits into 2.7 from harden-header-host-ctrlchar

Conversation

@icanhasmath

Harden header/host control-char validation

Follow-up to review feedback on #85. Two low-severity but real defense-in-depth gaps in the 2.7.18.14 header-injection backports (CVE-2024-6923 / control-char cluster):

  1. wsgiref.headers._check_string only checked str — a unicode header name/value carrying control characters (u'\r\n…') bypassed the guard entirely on Python 2. Now checks basestring.
  2. email.generator.NEWLINE_WITHOUT_FWSP missed trailing newlines — the consuming [^ \t] classes require a following char, so a value ending in a bare CR/LF/CRLF ('evil\n') was not rejected; the generator then appends its own newline, prematurely terminating the header block. Switched to negative lookaheads, which also fire at end-of-string while still allowing valid CRLF folding.
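The gap in item 2 can be reproduced by running both patterns side by side. This is a minimal sketch: `OLD` is the pattern quoted in the original backport commit, `NEW` is an assumed lookahead form (the exact pattern in the patch may differ), and `is_rejected` is a hypothetical helper.

```python
import re

# Pattern from the original backport: each alternative consumes one
# character after the newline, so it cannot match at end-of-string.
OLD = re.compile(r'\r\n[^ \t]|\r[^ \n\t]|\n[^ \t]')

# Assumed lookahead form: "(?!...)" succeeds when the next character is
# not folding whitespace, and also when there is no next character.
NEW = re.compile(r'\r\n(?![ \t])|\r(?![ \n\t])|\n(?![ \t])')


def is_rejected(pattern, value):
    """Return True if `pattern` flags `value` as an unsafe header value."""
    return pattern.search(value) is not None


samples = ['a\r\n b', 'a\r\nBcc: x', 'evil\n', 'evil\r', 'evil\r\n']
for value in samples:
    print('%r old=%s new=%s' % (value,
                                is_rejected(OLD, value),
                                is_rejected(NEW, value)))
```

Both patterns accept the folded value `'a\r\n b'` and reject the mid-value injection; only the lookahead form rejects the three values that end in a bare newline.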

Regression tests extended in test_wsgiref (unicode names/values) and test_email (trailing-newline values). Verified on the standalone Py2.7 build: test_wsgiref 26/26, test_email 280/280.

Note on the third review comment (urlparse)

The comment about _check_bracketed_host leaking TypeError from socket.inet_pton on unicode hosts does not reproduce. CPython 2.7's "s" arg parser encodes unicode, so an invalid bracketed host raises UnicodeEncodeError (a ValueError subclass) or socket.error — both already caught. No change needed.
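The exception hierarchy behind this reasoning is easy to check. The sketch below uses a hypothetical helper, `is_ipv6_literal`, which is not the patched function; it only shows that one `except (ValueError, socket.error)` clause covers both failure modes described above.

```python
import socket

# UnicodeEncodeError derives from ValueError, so "except ValueError"
# also covers encoding failures.
unicode_error_is_value_error = issubclass(UnicodeEncodeError, ValueError)


def is_ipv6_literal(host):
    """Hypothetical helper: True if inet_pton accepts `host` as IPv6."""
    try:
        socket.inet_pton(socket.AF_INET6, host)
    except (ValueError, socket.error):
        # ValueError covers UnicodeEncodeError; socket.error covers
        # syntactically invalid addresses.
        return False
    return True
```

The `UnicodeEncodeError` path is specific to Python 2.7's argument parsing; on Python 3 a non-ASCII host reaches `inet_pton` and fails with `socket.error` (an alias of `OSError`) instead.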

Stacking

⚠️ Stacked on #85 — the files patched here are introduced by #85. Until #85 merges into 2.7, this PR's diff will also show #85's commits; it cleans up to just the one hardening commit once #85 lands. Merge #85 first.

No version bump / release here — these can be folded into a future 2.7.18.15 if/when one is cut.

icanhasmath and others added 25 commits June 10, 2026 13:53
CVE-2025-8194: tarfile accepted negative member offsets (reachable via a
PAX extended header with a negative "size"), causing TarInfo._block to
return a negative block count that moved the archive offset backwards and
could hang (seekable files) or raise StreamError (streams). _block now
rejects negative counts with InvalidHeaderError.
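
A standalone sketch of the rounding helper with the added guard follows. `block` and the local `InvalidHeaderError` class are stand-ins for `TarInfo._block` and `tarfile.InvalidHeaderError`; the error message is an assumption.

```python
BLOCKSIZE = 512  # tar archives are organised in 512-byte blocks


class InvalidHeaderError(Exception):
    """Stand-in for tarfile.InvalidHeaderError in this sketch."""


def block(count):
    """Round a byte count up to a multiple of BLOCKSIZE, e.g. 834 -> 1024."""
    if count < 0:
        # A negative size would move the archive offset backwards.
        raise InvalidHeaderError('invalid offset')
    blocks, remainder = divmod(count, BLOCKSIZE)
    if remainder:
        blocks += 1
    return blocks * BLOCKSIZE
```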

CVE-2026-4786 / CVE-2026-4519: webbrowser.open passed attacker-controlled
URLs to the browser command line unvalidated, so a URL starting with "-"
could be treated as a command-line option (argument injection). Add
BaseBrowser._check_url and call it from GenericBrowser, BackgroundBrowser
and UnixBrowser; UnixBrowser validates the URL after %action expansion and
substitutes %action before %s so %action cannot smuggle a leading dash.
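
One reading of this paragraph, as a sketch: `check_url` and `build_args` are hypothetical stand-ins for `BaseBrowser._check_url` and the argument expansion in `UnixBrowser`, and the exact validation rule in the patch may differ.

```python
def check_url(url):
    """Reject URLs that a browser could parse as a command-line option."""
    if url and url.lstrip().startswith('-'):
        raise ValueError('Invalid URL: %r' % (url,))


def build_args(template, url, action):
    """Expand %action first, then %s, so URL text is never re-expanded."""
    check_url(url)
    return [arg.replace('%action', action).replace('%s', url)
            for arg in template]
```

Because `%action` is replaced before `%s`, a URL whose text is literally `%action` stays literal rather than turning into an option such as `-new-tab`.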

Adds tests in test_tarfile (negative _block count and a PAX negative-size
archive) and a new test_webbrowser covering URL rejection and the %action
bypass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Backports a uniform "reject C0 control characters and DEL" defense across
several stdlib modules where unvalidated user input was emitted into
protocol headers/commands, enabling CR/LF (and NUL) injection:

- wsgiref.headers.Headers: validate name/value in __init__, __setitem__
  and add_header (CVE-2026-0865), raising ValueError.
- Cookie.Morsel: reject control chars in set()/__setitem__ key, value and
  coded value (CVE-2026-0672), raising CookieError. Validation lives at the
  value-storage chokepoints, so the CVE-2026-3644-style bypasses do not
  apply (2.7 has no Morsel.update/|=/__setstate__).
- imaplib.IMAP4._command: reject control chars in command arguments
  (CVE-2025-15366), raising ValueError.
- poplib.POP3._putline (and the SSL override): reject control chars in the
  command line (CVE-2025-15367), raising error_proto.
- httplib.HTTPConnection.set_tunnel: validate the CONNECT tunnel host via
  the existing _validate_host (CVE-2026-1502), raising InvalidURL.
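
The shared rule in the list above can be sketched as one pattern and one helper. `reject_control_chars` is a hypothetical name, and the `error` parameter stands in for the per-module exception types (ValueError, CookieError, error_proto, InvalidURL).

```python
import re

# C0 controls (0x00-0x1F) plus DEL (0x7F).
CONTROL_CHARS = re.compile(r'[\x00-\x1f\x7f]')


def reject_control_chars(value, error=ValueError):
    """Raise `error` if `value` contains a C0 control character or DEL."""
    match = CONTROL_CHARS.search(value)
    if match:
        raise error('control character %r not allowed' % (match.group(),))
    return value
```

Note that the C0 range includes horizontal tab, so a sketch like this also rejects tabs; whether each backported module does the same is not stated here.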

Adds focused tests to test_cookie, test_wsgiref, test_imaplib, test_poplib
and test_httplib.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
email.generator.Generator wrote header values verbatim, so a value
containing a bare CR/LF (e.g. set via msg['To'] = 'a\r\nBcc: x') could
inject additional headers or body content.

Port the upstream verify_generated_headers behaviour into _write_headers:
after computing each header's serialized form, reject it (raise the new
email.errors.HeaderWriteError) if it contains a CR/LF that is not part of
valid folding, using NEWLINE_WITHOUT_FWSP = re.compile(
r'\r\n[^ \t]|\r[^ \n\t]|\n[^ \t]'). Since 2.7's email has no policy
framework, the check is unconditional (matching upstream's default-on).

Adds email.errors.HeaderWriteError and a regression test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
On a Windows release build, CRT calls made on a deliberately-invalid file
descriptor invoke the invalid-parameter handler and fast-fail (surfacing as
a "stopped working" dialog) instead of returning EBADF. Several tests
(test_fileio, test_os, test_signal) intentionally exercise bad fds and so
hung the suite. The existing _Py_BEGIN_SUPPRESS_IPH backport (e361063)
guarded the primary fd operations but missed three secondary fstat/lseek
calls:

- _io/fileio.c new_buffersize(): fstat()/lseek() reached by readall() before
  the already-guarded read() (test_fileio testErrnoOnClosedReadall).
- posixmodule.c posix_fdopen(): the directory-check fstat() reached by
  os.fdopen(bad_fd) (test_os TestInvalidFD.test_fdopen).
- signalmodule.c signal_set_wakeup_fd(): the validation fstat() reached by
  signal.set_wakeup_fd(bad_fd) (test_signal test_invalid_fd).

Wrap each in _Py_BEGIN_SUPPRESS_IPH/_Py_END_SUPPRESS_IPH so the call returns
an error and the expected exception is raised.

Also:
- regrtest: suppress Windows error-reporting / CRT-assert dialogs at suite
  startup (SetErrorMode + debug-CRT report mode), mirroring Python 3's test
  driver, so a faulting test crashes cleanly instead of blocking on a modal
  dialog. Runs in -j slaves too.
- test_ctypes: skip test_pass_pointers where c_long is narrower than a
  pointer (win64); it truncates the returned pointer and dereferences a bad
  address (an access violation, not an IPH case).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On a modern (UCRT / VS2015+) Windows build, <errno.h> defines the POSIX
errno values (ECONNRESET=108, ...), so Python's errno module exposes those
rather than the Winsock values. But the socket layer still reports Winsock
error codes (WSAECONNRESET=10054), so asyncore's _DISCONNECTED set -- built
from the C-runtime errno constants -- no longer matches a real connection
reset. recv() then re-raises instead of closing, the server threads in
test_ftplib die mid-loop, and they leak their handlers into the global
asyncore.socket_map ("socket_map was modified" -> test failure).

The existing Windows WSA handling here already mapped WSAENOTCONN,
WSAECONNABORTED and WSAEBADF, but omitted WSAECONNRESET (the actual error
seen) and WSAESHUTDOWN. Add both to the import, the POSIX fallback aliases,
and _DISCONNECTED. Harmless on older toolchains and on POSIX, where the WSA
names alias to the same POSIX values.
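
The aliasing scheme might look like the sketch below. `errno_value` and `is_disconnect` are hypothetical helpers, not asyncore names; the set mirrors `_DISCONNECTED` with the two Winsock codes added.

```python
import errno


def errno_value(wsa_name, posix_name):
    """Return the Winsock constant if defined here, else the POSIX one."""
    return getattr(errno, wsa_name, getattr(errno, posix_name))


WSAECONNRESET = errno_value('WSAECONNRESET', 'ECONNRESET')
WSAESHUTDOWN = errno_value('WSAESHUTDOWN', 'ESHUTDOWN')

DISCONNECTED = frozenset([
    errno.ECONNRESET, errno.ENOTCONN, errno.ESHUTDOWN,
    errno.ECONNABORTED, errno.EPIPE, errno.EBADF,
    WSAECONNRESET, WSAESHUTDOWN,
])


def is_disconnect(exc):
    """True if a socket error should be treated as a closed connection."""
    return exc.errno in DISCONNECTED
```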

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On 64-bit Windows (LLP64) sys.maxint is still 2**31-1 because C long is
32-bit, while the address space and Py_ssize_t are 64-bit. seq_tests'
test_bigrepeat gated its 32-bit overflow check on sys.maxint, so on win64 it
ran the check and tried to build a ~34 GB sequence (2**32 elements) that
never raises MemoryError -- a multi-minute memory thrash ending in
"MemoryError not raised". This hit test_list, test_tuple and test_userlist
(all share seq_tests). Use sys.maxsize, matching upstream 2.7.
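
The data-model difference can be inspected with ctypes. The variable names below are illustrative only, and the sketch computes the value `sys.maxint` would have rather than reading it, since that attribute exists only on Python 2.

```python
import ctypes
import sys

ssize_bits = 8 * ctypes.sizeof(ctypes.c_ssize_t)
long_bits = 8 * ctypes.sizeof(ctypes.c_long)
pointer_bits = 8 * ctypes.sizeof(ctypes.c_void_p)

# sys.maxsize follows Py_ssize_t, which bounds container sizes.
maxsize_matches_ssize_t = (sys.maxsize == 2 ** (ssize_bits - 1) - 1)

# On Python 2, sys.maxint follows C long instead: 2**31 - 1 on win64.
maxint_equivalent = 2 ** (long_bits - 1) - 1

# True on LLP64 (64-bit Windows); False on LP64 (64-bit Linux, macOS).
long_narrower_than_pointer = long_bits < pointer_bits
```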

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This fork's test_socket.py calls _have_socket_can() (module level) and
references HAVE_SOCKET_ALG (in a class decorator) but the defining helpers
were lost in backport, so the module crashed at import on Windows with
"NameError: name '_have_socket_can' is not defined" (and would next hit
HAVE_SOCKET_ALG). Restore both helpers from upstream 2.7 and define
HAVE_SOCKET_ALG. On Windows AF_CAN/AF_ALG don't exist, so both return False
and the corresponding test classes skip.
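
A probe of this kind can be sketched as follows. It is modelled on the description above rather than copied from the restored helper, so the exact exception list is an assumption.

```python
import socket


def have_socket_can():
    """Check whether CAN sockets are supported on this host."""
    try:
        s = socket.socket(socket.PF_CAN, socket.SOCK_RAW, socket.CAN_RAW)
    except (AttributeError, socket.error, OSError):
        # AttributeError: the constants are absent (e.g. on Windows).
        # socket.error/OSError: the kernel refuses the address family.
        return False
    s.close()
    return True


HAVE_SOCKET_CAN = have_socket_can()
```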

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CVE-2024-0450: zipfile did not detect "quoted overlap" archives where an
entry's compressed data overruns the start of the next entry, a high-ratio
zip bomb. _RealGetContents now records each member's _end_offset (the start
of the next local header, or the central directory for the last member) and
ZipFile.open raises BadZipfile if an entry's data would extend past it.
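
The bookkeeping can be sketched with plain dictionaries in place of ZipInfo objects. `assign_end_offsets` and `check_no_overlap` are hypothetical names, and ValueError stands in for BadZipfile.

```python
def assign_end_offsets(members, start_dir):
    """Set '_end_offset' on each member: the next header or the directory."""
    end_offset = start_dir
    for member in sorted(members, key=lambda m: m['header_offset'],
                         reverse=True):
        member['_end_offset'] = end_offset
        end_offset = member['header_offset']


def check_no_overlap(member, data_start):
    """Raise if the member's compressed data would run past its end offset."""
    if data_start + member['compress_size'] > member['_end_offset']:
        raise ValueError('Overlapped entries: %r' % (member['name'],))
```

Walking the members from the highest header offset downwards means each one inherits the start of its successor as its limit, with the central directory as the limit for the last member.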

CVE-2025-8291: _EndRecData64 trusted that the ZIP64 end-of-central-directory
record sat immediately before its locator and ignored the locator's stored
relative offset. It now rejects archives whose locator offset points past
the expected record position ("Corrupt zip64 end of central directory
locator").

Adds _end_offset to ZipInfo (slots + __init__) and regression tests for
both issues.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ion test

find_msvcrt() fabricated a non-existent CRT name (e.g. msvcr130.dll) on a
modern toolchain because _get_build_version() used the old "int(s[:-2]) - 6"
formula, which yields 13 for MSC v.19xx. CDLL() of that name fails with
WindowsError 126, breaking test_loading, test_errno and test_callbacks
(which load find_library("c")). Apply the upstream bpo-23606 fix: bump the
major version past the skipped v13, and have find_msvcrt() return None for
VS2015+ (the UCRT is not directly loadable). find_library("c") then returns
None and those tests skip cleanly.
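
The version arithmetic can be sketched as below. `get_build_version` takes the version string as a parameter so it runs on any platform, and `crt_name` is a simplified stand-in for `find_msvcrt()` that ignores debug builds.

```python
def get_build_version(version_string):
    """Map an 'MSC v.NNNN' marker to a Visual Studio version number."""
    prefix = 'MSC v.'
    i = version_string.find(prefix)
    if i == -1:
        return 6
    s = version_string[i + len(prefix):].split(' ', 1)[0]
    major = int(s[:-2]) - 6
    if major >= 13:
        major += 1  # Visual Studio skipped version 13
    minor = int(s[2:3]) / 10.0
    if major == 6:
        minor = 0
    if major >= 6:
        return major + minor
    return None


def crt_name(version):
    """Return the CRT DLL name, or None when no single DLL is loadable."""
    if version is None or version > 13:
        return None  # VS2015+ uses the UCRT
    if version <= 6:
        return 'msvcrt.dll'
    return 'msvcr%d.dll' % (version * 10)
```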

Also skip test_prototypes.test_int_pointer_arg where c_long is narrower than
a pointer (win64): like test_pass_pointers it sets restype=c_long and
compares it against a full pointer address, which truncates on LLP64.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
urlparse.urlsplit only rejected mismatched IPv6 brackets, so a netloc like
"ex[ample].com" or "[example.com]" was accepted, parsing differently from
RFC 3986-compliant tools (a differential-parsing / SSRF vector).

Add _check_bracketed_netloc / _check_bracketed_host (ported from the
upstream fix) and call them from both urlsplit code paths. Brackets are now
allowed only when they enclose a valid IPv6/IPvFuture host. Since 2.7 lacks
the ipaddress module, IPv6 content is validated via socket.inet_pton (with a
conservative character fallback where inet_pton is unavailable).
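
A condensed sketch of the rule described here: `check_bracketed_netloc` below folds both helpers into one hypothetical function, omits the fallback for platforms without inet_pton, and does not handle IPv6 zone identifiers.

```python
import re
import socket


def check_bracketed_netloc(netloc):
    """Raise ValueError unless brackets enclose an IPv6/IPvFuture host."""
    host_port = netloc.rpartition('@')[2]
    before, has_open, bracketed = host_port.partition('[')
    if not has_open:
        if ']' in host_port:
            raise ValueError('Invalid IPv6 URL')
        return
    host, has_close, after = bracketed.partition(']')
    # Nothing may precede '[', and only ':port' may follow ']'.
    if before or not has_close or (after and not after.startswith(':')):
        raise ValueError('Invalid IPv6 URL')
    if host.startswith('v'):
        if not re.match(r'v[a-fA-F0-9]+\..+\Z', host):
            raise ValueError('IPvFuture address is invalid')
        return
    try:
        socket.inet_pton(socket.AF_INET6, host)
    except (ValueError, socket.error):
        raise ValueError('Invalid IPv6 URL')
```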

Adds a regression test covering rejected and accepted bracketed hosts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SSLSocket.__init__ calls getpeername() to decide if the socket is already
connected, expecting errno.ENOTCONN for an unconnected socket. On a modern
(UCRT) Windows build errno.ENOTCONN is the C-runtime value (126) while Winsock
reports WSAENOTCONN (10057), so the check failed and wrap_socket() re-raised
for the (very common) "wrap a not-yet-connected socket" path -- cascading to
~50 errors across test_ssl. Match both spellings via _NOT_CONNECTED_ERRORS,
same root cause as the asyncore WSAECONNRESET fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CVE-2025-12084: xml.dom.minidom._clear_id_cache called _in_document(), which
walks the parent chain to the document root on every node mutation, making
deeply nested appendChild()/insertBefore() O(n^2). Replace the walk with an
O(1) `node.ownerDocument` check (over-clearing a detached node's cache is
harmless; the cache is rebuilt lazily).
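
The two checks can be compared directly. `in_document_walk` follows the parent-chain walk described above; `in_document_fast` is a hypothetical name for the O(1) test and applies to nodes other than the document itself.

```python
from xml.dom import Node, minidom


def in_document_walk(node):
    """Walk the parent chain to the root: O(depth) per call."""
    while node is not None:
        if node.nodeType == Node.DOCUMENT_NODE:
            return True
        node = node.parentNode
    return False


def in_document_fast(node):
    """O(1) approximation: does the node know an owning document?"""
    return node.ownerDocument is not None


doc = minidom.parseString('<a><b/></a>')
attached = doc.documentElement.firstChild
detached = doc.createElement('c')  # created by doc but never inserted
```

For `detached` the walk returns False while the fast check returns True, which is the over-clearing case the commit message calls harmless.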

CVE-2025-6075: posixpath.expandvars rebuilt the whole path string on each
substitution and ntpath.expandvars concatenated to a result string char by
char, both quadratic in the input size. posixpath now accumulates output
segments; ntpath now expands via a single regex-substitution pass (ported
from upstream), preserving the existing matching semantics.
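
The segment-accumulation idea can be sketched as below. `expandvars_linear` is a hypothetical function that takes the environment as a dict; it covers only the `$name` and `${name}` forms and is not the patched posixpath or ntpath code.

```python
import re

_VAR = re.compile(r'\$(\w+|\{[^}]*\})')


def expandvars_linear(path, environ):
    """Expand $name and ${name} in one pass over `path`."""
    parts = []
    pos = 0
    for match in _VAR.finditer(path):
        name = match.group(1)
        if name.startswith('{'):
            name = name[1:-1]
        parts.append(path[pos:match.start()])
        # Unknown variables are left unchanged.
        parts.append(environ.get(name, match.group(0)))
        pos = match.end()
    parts.append(path[pos:])
    return ''.join(parts)
```

Collecting segments in a list and joining once keeps the work proportional to the input size, whereas rebuilding the whole string per substitution copies the remainder each time.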

Adds bulk/regression tests for both expandvars implementations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
test_socket.py imports `from test import test_support` but the (partially
forward-ported) body refers to the support module under BOTH names
(test_support.* and support.*, ~24 each). The bare `support.*` references —
including @support.requires_linux_version decorators evaluated at import —
raised "NameError: name 'support' is not defined", crashing the module on
import once the earlier _have_socket_can gap was fixed. Bind support =
test_support (same module object) so both names resolve.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When close() flushed input ending in an unterminated construct, goahead()
advanced only to the next '<' and re-parsed, so an input with many
incomplete constructs (e.g. repeated "<!--") scanned the remaining buffer
once per construct -- O(n^2).

Backport the upstream fix: at EOF an unterminated construct is closed per
HTML5 (comments/declarations/CDATA/PI are emitted via their handlers, and
incomplete tags are ignored) and the rest of the buffer is consumed in one
step (k = n), making it linear. Adds endtagopen and updates the affected
EOF expectations in test_htmlparser, plus new test_eof_in_* cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
test_loading.test_load_library asserted libc_name (find_library("c")) is not
None, but on a VS2015+/UCRT build find_msvcrt() correctly returns None (the
UCRT is not loadable as a single msvcrXXX.dll), so the assert failed. The
test's real purpose is loading kernel32, which is independent of libc_name;
drop the vestigial assertion (the libc-dependent cases already skip when
libc_name is None).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
base64.b64decode silently discarded non-alphabet characters and, with an
alternative alphabet, still accepted the standard '+'/'/' characters
(CVE-2025-12781); it also ignored any data after the padding
(CVE-2026-3446).

Add a validate=False parameter (mirroring Python 3). When validate=True the
input is checked against the *requested* alphabet -- so '+'/'/' are rejected
when altchars is given, and embedded or post-padding junk is rejected rather
than silently dropped. This goes beyond upstream, which only deprecates the
lenient behaviour. The default (validate=False) is unchanged.
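
Checking against the requested alphabet can be sketched as a wrapper. `strict_b64decode` is a hypothetical function, not the patched `b64decode`; it takes bytes and validates with a regular expression before delegating to the standard decoder.

```python
import base64
import re


def strict_b64decode(data, altchars=None):
    """Decode `data`, rejecting anything outside the requested alphabet."""
    extra = altchars if altchars is not None else b'+/'
    sym = b'[A-Za-z0-9' + re.escape(extra) + b']'
    # Whole 4-symbol groups, then an optional padded final group.
    pattern = (b'(?:' + sym + b'{4})*'
               b'(?:' + sym + b'{2}==|' + sym + b'{3}=)?\\Z')
    if not re.match(pattern, data):
        raise ValueError('non-alphabet character or bad padding')
    return base64.b64decode(data, altchars)
```

With `altchars=b'-_'` the character class no longer contains `+` or `/`, so input using the standard symbols is rejected, and the `\Z` anchor rejects any data after the padding.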

Adds a regression test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two classes of Windows/2.7 test-side failures after the ssl.py WSAENOTCONN
library fix:

- test_getpeercert_enotconn / test_do_handshake_enotconn asserted the raised
  errno equals errno.ENOTCONN, but on a UCRT build the socket reports the
  Winsock value WSAENOTCONN (10057) while errno.ENOTCONN is the C-runtime
  value (126). Compare against a set that includes both (_ENOTCONN).

- test_pha_no_pha_client/server, test_pha_not_tls13 and (ThreadedTests)
  test_bpo37428_pha_cert_none exercise TLS 1.3 post-handshake authentication,
  whose Python API (ssl.TLSVersion etc.) is not present in this 2.7 ssl
  backport (test_pha_not_tls13 raised AttributeError on ssl.TLSVersion).
  Skip them unless hasattr(ssl, 'TLSVersion').

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tarfile normalized an AREGTYPE ('\x00') header whose name ends in a slash to
DIRTYPE.  This was also applied to follow-up headers (a GNU long name/link
or a pax header), letting a crafted archive be interpreted differently from
other tools.

Split the header parsing into _frombuf(dircheck=...) / _fromtarfile and
perform the AREGTYPE->DIRTYPE normalization only for primary headers; the
follow-up reads in _proc_gnulong and _proc_pax pass dircheck=False. The
public frombuf()/fromtarfile() signatures are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
test_socket.py (and other partially forward-ported tests) decorate
Linux/CAN-specific cases with @support.requires_linux_version(...), which is a
Python 3 test.support helper absent from this fork. Evaluated at class-body
import time, it raised "AttributeError: 'module' object has no attribute
'requires_linux_version'", crashing test_socket on import (after the earlier
support-alias and _have_socket_can gaps were closed).

Backport requires_linux_version (+ the _requires_unix_version helper) from
upstream test.support. On non-Linux it is a pass-through; the CAN/Linux tests
it guards already skip on Windows via HAVE_SOCKET_CAN. functools/platform/
unittest are already imported here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bump PY_VERSION to 2.7.18.14 and add release notes for the security fixes
addressed in this release: CVE-2025-8194, CVE-2026-4519, CVE-2026-4786,
CVE-2026-0865, CVE-2026-0672, CVE-2025-15366, CVE-2025-15367, CVE-2026-1502,
CVE-2024-6923, CVE-2024-0450, CVE-2025-8291, CVE-2025-0938, CVE-2024-11168,
CVE-2025-6069, CVE-2025-6075, CVE-2025-12084, CVE-2025-13462, CVE-2025-12781
and CVE-2026-3446.  Documents "not affected" determinations for
CVE-2025-13836, CVE-2025-15282, CVE-2025-11468, CVE-2025-1795, CVE-2026-3644
and CVE-2024-5642.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on win32

Two test-side residuals, surfaced once the asyncore WSAECONNRESET fix stopped
the server threads dying mid-run:

- test_with_statement sent 'noop', but DummyFTPHandler had no cmd_noop, so it
  replied "550 command not understood". Add cmd_noop (200 ok).
- test_source_address / _passive_connection assert the bound source port
  equals a find_unused_port() value; on Windows that port can be taken between
  selection and bind, so getsockname() returns a different port (off by 1-2).
  Skip on win32 (the EADDRINUSE branch doesn't cover the assertEqual race).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…_can

Two test-suite fixes surfaced by running `python2 -m test` against the
2.7.18.14 build:

* test_posixpath: test_expandvars_nonascii_word dereferenced
  test_support.FS_NONASCII (None under an ASCII/C-locale filesystem
  encoding) before its skipTest check, crashing with
  AttributeError: 'NoneType' object has no attribute 'encode'.
  Guard it with @skipUnless(test_support.FS_NONASCII, ...) to match
  its sibling test_expandvars_many and upstream CPython. Regression
  from the CVE-2025-6075 expandvars backport.

* test_socket: HAVE_SOCKET_CAN = _have_socket_can() called a helper
  that was missing from the module, raising NameError at import and
  crashing the whole test. Restore the upstream _have_socket_can()
  definition; it resolves to False where PF_CAN/CAN_RAW are absent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Finalize the 2.7.18.14 release notes with the Windows (VS2022/UCRT, win64)
regression-remediation work delivered on this line: IPH suppression for
invalid-fd CRT calls (fileio/posix/signal), asyncore WSAECONNRESET/WSAESHUTDOWN
and ssl WSAENOTCONN handling, ctypes find_msvcrt on VS2015+ (bpo-23606), and
the win64/UCRT test-suite fixes. patchlevel.h already reads 2.7.18.14 (FINAL).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reconcile the remote 2.7.18.14 branch (which carried an independent fix,
bc3b091: skip-guard test_expandvars_nonascii_word on FS_NONASCII, and a
_have_socket_can helper) with the rc1-rc5 Windows regression remediation.

test_socket.py conflict resolved in favour of the more complete remediation
version (keeps both _have_socket_can and _have_socket_alg + the support alias);
the test_posixpath.py FS_NONASCII skip-guard from bc3b091 is taken as-is.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… newline

Two defense-in-depth gaps in the 2.7.18.14 header-injection backports
(CVE-2024-6923 / control-char cluster), surfaced in review of PR #85:

- wsgiref.headers._check_string only checked `str`, so a `unicode` header
  name/value carrying control characters bypassed the guard and could still
  be serialized. Check `basestring` instead.

- email.generator.NEWLINE_WITHOUT_FWSP used consuming character classes
  ([^ \t]) that require a following character, so a value ending in a bare
  CR/LF/CRLF was not rejected -- the generator then appends its own newline,
  prematurely terminating the header block. Use negative lookaheads, which
  also fire at end-of-string while still permitting valid CRLF folding.
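
The first bullet comes down to which types the guard inspects. In the sketch below `check_string` is a hypothetical stand-in for `wsgiref.headers._check_string`; the TypeError branch and the messages are assumptions.

```python
import re

try:
    text_types = basestring  # Python 2: covers both str and unicode
except NameError:
    text_types = str         # Python 3: str is the only text type

_CONTROL_CHARS = re.compile(r'[\x00-\x1f\x7f]')


def check_string(value):
    """Validate a header name or value of any text type."""
    if not isinstance(value, text_types):
        raise TypeError('header names/values must be strings, got %r'
                        % (type(value),))
    if _CONTROL_CHARS.search(value):
        raise ValueError('control characters not allowed in headers')
    return value
```

On Python 2 a guard written as `isinstance(value, str)` would skip the pattern search for `u'evil\r\n'`; testing against `basestring` routes both text types through the same check.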

Regression tests extended in test_wsgiref (unicode names/values) and
test_email (trailing-newline values). A third review comment, about
urlparse._check_bracketed_host leaking TypeError from socket.inet_pton on
unicode hosts, does not reproduce: CPython 2.7's "s" arg parser encodes
unicode, so invalid hosts raise UnicodeEncodeError (a ValueError subclass)
or socket.error, both already handled.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>