Commit 58f0d09
[PATCH] urllib.parse: Restrict IPv6 ZoneID characters to RFC 6874-compliant set
The current parsing logic for IPv6 addresses with Zone Identifiers (ZoneIDs)
relies on the `ipaddress` module, which validates ZoneIDs according to RFC 4007
and therefore accepts essentially any non-empty string. However, when a ZoneID
appears in a URL, it must follow the percent-encoded format defined in RFC 6874.
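A quick way to see this permissiveness is to hand the host from the example below straight to `ipaddress`. This sketch assumes Python 3.9 or later, where `IPv6Address` gained `scope_id` support. The sample addresses are illustrative only.

```python
import ipaddress

# ipaddress follows RFC 4007: everything after the first "%" becomes the
# scope (zone) ID, with no restriction on which characters it contains.
addr = ipaddress.ip_address("::1%2|test")
print(type(addr).__name__, addr.scope_id)  # IPv6Address 2|test

# Only an empty zone or a second "%" is rejected.
for bad in ("::1%", "::1%a%b"):
    try:
        ipaddress.ip_address(bad)
    except ValueError:
        print("rejected:", bad)
```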
This patch adds a check that restricts ZoneIDs to the characters RFC 6874 allows:
`ALPHA / DIGIT / "-" / "." / "_" / "~" / "%" HEXDIG HEXDIG`
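The allowed set corresponds to the RFC 6874 rule `ZoneID = 1*( unreserved / pct-encoded )`. Below is a minimal sketch of that rule as a regular expression. `ZONE_ID_RE` and `is_valid_zone_id` are hypothetical names chosen for illustration, not names taken from the patch.

```python
import re

# unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
# pct-encoded = "%" HEXDIG HEXDIG   (ABNF hex digits are case-insensitive)
# ZoneID      = 1*( unreserved / pct-encoded )
ZONE_ID_RE = re.compile(r"(?:[A-Za-z0-9\-._~]|%[0-9A-Fa-f]{2})+")


def is_valid_zone_id(zone: str) -> bool:
    """Return True if *zone* matches the RFC 6874 ZoneID rule (hypothetical helper)."""
    return ZONE_ID_RE.fullmatch(zone) is not None


for candidate in ("eth0", "en1", "en%2F1", "2|test", "%2", ""):
    print(repr(candidate), is_valid_zone_id(candidate))
```

Using `fullmatch` matters here: a plain `match` would accept `"2|test"` because its prefix `"2"` is valid.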
RFC 6874 §2.1 specifies the format of an IPv6 address with a ZoneID in a URI as:
`IPv6addrz = IPv6address "%25" ZoneID`
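Read strictly, this rule separates the address from the ZoneID with the percent-encoded delimiter `%25`, and the ZoneID is percent-decoded before use. The hypothetical helper `split_ipv6addrz` sketches only that strict form (no bare `%`). It uses `ipaddress.IPv6Address` to validate the address part and `urllib.parse.unquote` to decode the zone. It assumes the brackets have already been removed from the host.

```python
import ipaddress
import re
from urllib.parse import unquote

# Same RFC 6874 ZoneID rule: 1*( unreserved / pct-encoded )
ZONE_ID_RE = re.compile(r"(?:[A-Za-z0-9\-._~]|%[0-9A-Fa-f]{2})+")


def split_ipv6addrz(host: str) -> tuple[ipaddress.IPv6Address, str]:
    """Split 'IPv6address%25ZoneID' into (address, decoded zone).

    Hypothetical helper covering the strict "%25" form only.
    """
    addr, sep, zone = host.partition("%25")
    if not sep:
        raise ValueError("missing %25 zone delimiter")
    if not ZONE_ID_RE.fullmatch(zone):
        raise ValueError("IPv6 ZoneID is invalid")
    # IPv6Address raises AddressValueError (a ValueError subclass)
    # if the address part is malformed.
    return ipaddress.IPv6Address(addr), unquote(zone)


print(split_ipv6addrz("fe80::a%25en1"))
```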
Additionally, RFC 6874 permits URI parsers to accept a bare `%` (one not followed
by two hex digits) as a liberal extension, but even then the ZoneID itself may
contain only the characters listed above. This patch enforces that restriction,
so a ZoneID containing any other character is rejected.
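A check along these lines can be sketched as follows. This is not the patch's own code. `check_zone_id` is a hypothetical name, and the sketch assumes it receives the host text from between the brackets. Because the digits `25` are themselves allowed ZoneID characters, validating everything after the first `%` covers both the strict `%25` delimiter and the liberal bare `%`.

```python
import re

# RFC 6874 ZoneID rule: 1*( unreserved / pct-encoded )
ZONE_ID_RE = re.compile(r"(?:[A-Za-z0-9\-._~]|%[0-9A-Fa-f]{2})+")


def check_zone_id(host: str) -> None:
    """Raise ValueError if the zone part of a bracket-stripped IPv6 host
    contains characters outside the RFC 6874 ZoneID set (hypothetical helper)."""
    _, sep, zone = host.partition("%")
    if not sep:
        return  # no ZoneID present
    # With the strict "%25" delimiter, *zone* starts with "25", which is
    # itself valid ZoneID text, so one check covers both forms.
    if not ZONE_ID_RE.fullmatch(zone):
        raise ValueError("IPv6 ZoneID is invalid")


for host in ("::1", "fe80::1%25eth0", "fe80::1%eth0", "::1%2|test"):
    try:
        check_zone_id(host)
        print(host, "accepted")
    except ValueError as exc:
        print(host, "rejected:", exc)
```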
### Before the fix:
```py
>>> import urllib.parse
>>> urllib.parse.urlparse("http://[::1%2|test]/path")
ParseResult(scheme='http', netloc='[::1%2|test]', path='/path', ...)
```
Invalid characters such as `|` were incorrectly accepted in ZoneIDs.
### After the fix:
```py
>>> import urllib.parse
>>> urllib.parse.urlparse("http://[::1%2|test]/path")
Traceback (most recent call last):
...
ValueError: IPv6 ZoneID is invalid
```
This patch ensures `urllib.parse` properly rejects ZoneIDs with invalid characters,
improving compliance with the URI standards and helping prevent subtle bugs
or security issues that can arise when a malformed host is silently accepted.
1 parent: 2bd4ff0
1 file changed: 2 additions, 0 deletions
Diff: 2 lines added as new lines 469-470, after original line 468 (line contents not preserved).