-
Notifications
You must be signed in to change notification settings - Fork 42
Open
Description
Description
When calling URL.to_uri() on a URL that uses a plain IPv6 literal host (e.g. http://[::1]:PORT/...), hyperlink attempts to IDNA-encode the host. This is incorrect and results in an exception from idna, since IPv6 literals are not DNS labels and must not undergo IDNA processing.
This is the same root cause as #68, which appears to have gone stale.
Reproduction
from hyperlink import EncodedURL
url = EncodedURL.from_text("http://[::1]:60993/api/v1/token")
url.to_uri().to_text().encode("ascii")Actual result
idna.core.InvalidCodepoint:
Codepoint U+003A ':' not allowed
This occurs because idna.encode("::1") is invoked.
Expected result
The URL should round-trip successfully:
"http://[::1]:60993/api/v1/token"Why this is a bug
- IDNA applies only to DNS labels, not IP literals
- RFC 3986 explicitly allows IPv6 literal hosts
hyperlinkalready hasparse_host()which distinguishes:- DNS names
- IPv4 literals
- IPv6 literals
- Applying IDNA to IP literals is both unnecessary and invalid
Suggested fix
In URL.to_uri(), detect IPv4/IPv6 literal hosts using parse_host() and skip IDNA encoding for those cases. IDNA should only be applied to non-IP (DNS) hosts.
Pseudo-logic:
family, host_text = parse_host(self.host)
if family is not None:
# IPv4 / IPv6 literal
host = host_text
else:
host = idna_encode(host_text, uts46=True).decode("ascii")Regression test
def test_ipv6_literal_not_idna_encoded():
from hyperlink import EncodedURL
url = EncodedURL.from_text("http://[::1]:8080/path")
assert url.to_uri().to_text() == "http://[::1]:8080/path"Environment
- hyperlink: current release / main
- Python: 3.10+
- idna: current
Metadata
Metadata
Assignees
Labels
No labels