
fix: treat non-breaking space as a separator in links (#66) #113

Open
patchwright wants to merge 1 commit into robinst:main from patchwright:fix/nbsp-separator



@patchwright


Problem

Fixes #66. A non-breaking space (U+00A0) next to an e-mail address or URL is swallowed into the link instead of separating it from the surrounding text. For example, in test@example.com\u{a0} followed by more text, the e-mail link extends past the NBSP into that text, and in https://example.com\u{a0}now the URL absorbs both the NBSP and the word "now". Confirmed on main at HEAD b663d4e.

Root cause

All three scanners accept U+00A0 because it falls in the non-ASCII range they allow for internationalized text: the fallback arm _ => c >= '\u{80}' of local_atom_allowed in email.rs, the ALPHA arm '\u{80}'..=char::MAX in domains.rs, and, in url.rs, the set of characters that are never part of a URL, whose range stops at \u{9F}, one code point below NBSP. As a result, NBSP never takes a separator/break path, even though Unicode classifies it as whitespace (White_Space=yes, general category Zs).
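To make the root cause concrete, here is a minimal, self-contained sketch. The three functions are hypothetical stand-ins, not the crate's actual code: only the quoted ranges (c >= '\u{80}', '\u{80}'..=char::MAX, and a delimiter range ending at \u{9F}) come from the description above; the ASCII arms and the exact contents of the URL delimiter set are assumptions for illustration.

```rust
// Hypothetical, simplified stand-ins for the three "before" checks.
// Only the non-ASCII ranges mirror the PR text; the ASCII arms are assumed.

/// email.rs idea: local_atom_allowed with a `_ => c >= '\u{80}'` fallback arm.
fn local_atom_allowed_before(c: char) -> bool {
    match c {
        'a'..='z' | 'A'..='Z' | '0'..='9' => true,
        // RFC 5322 atext specials (assumed ASCII arm, for illustration).
        '!' | '#' | '$' | '%' | '&' | '\'' | '*' | '+' | '-' | '/' | '=' | '?' | '^' | '_'
        | '`' | '{' | '|' | '}' | '~' => true,
        _ => c >= '\u{80}',
    }
}

/// domains.rs idea: an ALPHA arm covering '\u{80}'..=char::MAX.
fn domain_alpha_before(c: char) -> bool {
    matches!(c, 'a'..='z' | 'A'..='Z' | '\u{80}'..=char::MAX)
}

/// url.rs idea: a "never part of a URL" control range that ends at U+009F.
/// (Assumed contents; the real set holds more characters.)
fn url_never_part_before(c: char) -> bool {
    matches!(c, '\u{00}'..='\u{20}' | '\u{7F}'..='\u{9F}')
}

fn main() {
    let nbsp = '\u{a0}';
    // Unicode classifies NBSP as whitespace (White_Space=yes)...
    assert!(nbsp.is_whitespace());
    // ...yet each check treats it like any other non-ASCII letter.
    assert!(local_atom_allowed_before(nbsp));
    assert!(domain_alpha_before(nbsp));
    assert!(!url_never_part_before(nbsp)); // the range ends at '\u{9F}' < '\u{a0}'
    println!("NBSP is accepted by all three checks");
}
```

None of the checks ever return the "stop here" answer for U+00A0, so a scanner built on them keeps consuming characters straight through the NBSP.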

Fix

Treat U+00A0 as a separator in all three scanners: exclude it from local_atom_allowed, break the authority scan on it, and add it to the URL "never part of a URL" set. The change touches three source lines and makes no API changes; other non-ASCII characters (e.g. U+03F8) are unaffected.
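The shape of the fix can be sketched with the same simplified predicates: each one gains an explicit U+00A0 arm ahead of its broad non-ASCII arm. This is a hypothetical illustration, not the patch itself; the function names, the ASCII arms, and the toy scanners (url_end, domain_end, local_start) are assumptions made only to show where the links now end on the inputs from the Problem section.

```rust
// Hypothetical sketch of the fix, not the crate's real code.
const NBSP: char = '\u{a0}';

/// email.rs idea: NBSP is no longer a local-part atom character.
fn local_atom_allowed_after(c: char) -> bool {
    match c {
        'a'..='z' | 'A'..='Z' | '0'..='9' => true,
        // RFC 5322 atext specials (assumed ASCII arm, for illustration).
        '!' | '#' | '$' | '%' | '&' | '\'' | '*' | '+' | '-' | '/' | '=' | '?' | '^' | '_'
        | '`' | '{' | '|' | '}' | '~' => true,
        NBSP => false, // new arm: NBSP separates instead of joining
        _ => c >= '\u{80}',
    }
}

/// domains.rs idea: the authority scan breaks on NBSP.
fn domain_alpha_after(c: char) -> bool {
    match c {
        NBSP => false, // new arm, checked before the non-ASCII range
        'a'..='z' | 'A'..='Z' | '\u{80}'..=char::MAX => true,
        _ => false,
    }
}

/// url.rs idea: NBSP joins the "never part of a URL" set.
fn url_never_part_after(c: char) -> bool {
    matches!(c, '\u{00}'..='\u{20}' | '\u{7F}'..='\u{9F}' | NBSP)
}

/// Byte offset where a toy URL scan over `s` stops.
fn url_end(s: &str) -> usize {
    s.char_indices()
        .find(|&(_, c)| url_never_part_after(c))
        .map_or(s.len(), |(i, _)| i)
}

/// Byte offset where a toy domain scan over `s` stops
/// (letters via domain_alpha_after, plus ASCII digits, '-' and '.').
fn domain_end(s: &str) -> usize {
    s.char_indices()
        .find(|&(_, c)| !(domain_alpha_after(c) || c.is_ascii_digit() || c == '-' || c == '.'))
        .map_or(s.len(), |(i, _)| i)
}

/// Byte offset where the local-part starts, scanning backwards from `at` (the '@').
fn local_start(s: &str, at: usize) -> usize {
    s[..at]
        .char_indices()
        .rev()
        .take_while(|&(_, c)| local_atom_allowed_after(c))
        .last()
        .map_or(at, |(i, _)| i)
}

fn main() {
    // URL example from the Problem section: the link stops before the NBSP.
    let url = "https://example.com\u{a0}now";
    assert_eq!(&url[..url_end(url)], "https://example.com");

    // E-mail with an NBSP on both sides: neither side is absorbed.
    let email = "hi\u{a0}test@example.com\u{a0}now";
    let at = email.find('@').unwrap();
    let start = local_start(email, at);
    let end = at + 1 + domain_end(&email[at + 1..]);
    assert_eq!(&email[start..end], "test@example.com");

    // Other non-ASCII characters such as U+03F8 are still accepted.
    assert!(local_atom_allowed_after('\u{3f8}'));
    assert!(domain_alpha_after('\u{3f8}'));
    assert!(!url_never_part_after('\u{3f8}'));
    println!("links stop at NBSP");
}
```

Placing the NBSP arm before the catch-all non-ASCII arm keeps the change to one special case per scanner, which is why the rest of the internationalized range behaves exactly as before.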

How to test

cargo test --test email non_breaking_space_does_not_join_email
cargo test --test url non_breaking_space_does_not_join_url

Both new tests fail on main and pass with this change; the full suite stays green (91 passed).

Backward compatibility

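No breaking API changes. NBSP is whitespace, and a literal NBSP is not valid in IDNA domain labels, in ASCII-only RFC 3986 URIs (it must be percent-encoded), or in RFC 5321 local-parts. The internationalized grammars are more permissive (RFC 3987 ucschar and RFC 6531 UTF8-non-ascii both cover U+00A0), but in running text an NBSP next to a link is far more likely to be a separator than part of the link. The only behavior change is that links which previously ran through an NBSP now terminate at it.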
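RFC 3986 restricts URIs to ASCII, so an NBSP can only appear in a URI in percent-encoded form, as the escapes of its two UTF-8 bytes. A minimal sketch of that encoding follows; percent_encode_unreserved is a hypothetical, deliberately strict helper that escapes every byte outside the RFC 3986 unreserved set, not what linkify or any particular URL library does.

```rust
/// Percent-encodes every byte outside RFC 3986 "unreserved"
/// (ALPHA / DIGIT / "-" / "." / "_" / "~"). Deliberately strict for
/// illustration: real encoders keep delimiters such as '/' and ':' where allowed.
fn percent_encode_unreserved(s: &str) -> String {
    let mut out = String::new();
    for b in s.bytes() {
        if b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~') {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{:02X}", b));
        }
    }
    out
}

fn main() {
    // U+00A0 is two bytes in UTF-8: 0xC2 0xA0.
    assert_eq!("\u{a0}".as_bytes(), &[0xC2u8, 0xA0u8][..]);

    let encoded = percent_encode_unreserved("a\u{a0}b");
    assert_eq!(encoded, "a%C2%A0b");
    println!("{}", encoded);
}
```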


Assisted-by: Claude (code generation, reviewed and tested locally)



Linked issue that merging this pull request may close: non-breaking space is included as part of e-mail links
