216 changes: 81 additions & 135 deletions Doc/library/urllib.parse.rst
Expand Up @@ -50,11 +50,12 @@
The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme=None, allow_fragments=True, *, missing_as_none=False)
.. function:: urlsplit(urlstring, scheme=None, allow_fragments=True, *, missing_as_none=False)

Parse a URL into six components, returning a 6-item :term:`named tuple`. This
corresponds to the general structure of a URL:
``scheme://netloc/path;parameters?query#fragment``.
Parse a URL into five components, returning a 5-item :term:`named tuple`
:class:`SplitResult` or :class:`SplitResultBytes`.
This corresponds to the general structure of a URL:
``scheme://netloc/path?query#fragment``.
Each tuple item is a string, possibly empty, or ``None`` if
*missing_as_none* is true.
Undefined components are represented by an empty string (by default) or
Expand All @@ -68,15 +69,15 @@
.. doctest::
:options: +NORMALIZE_WHITESPACE

>>> from urllib.parse import urlparse
>>> urlparse("scheme://netloc/path;parameters?query#fragment")
ParseResult(scheme='scheme', netloc='netloc', path='/path;parameters', params='',
>>> from urllib.parse import urlsplit
>>> urlsplit("scheme://netloc/path?query#fragment")
SplitResult(scheme='scheme', netloc='netloc', path='/path',
query='query', fragment='fragment')
>>> o = urlparse("http://docs.python.org:80/3/library/urllib.parse.html?"
>>> o = urlsplit("http://docs.python.org:80/3/library/urllib.parse.html?"
... "highlight=params#url-parsing")
>>> o
ParseResult(scheme='http', netloc='docs.python.org:80',
path='/3/library/urllib.parse.html', params='',
SplitResult(scheme='http', netloc='docs.python.org:80',
path='/3/library/urllib.parse.html',
query='highlight=params', fragment='url-parsing')
>>> o.scheme
'http'
Expand All @@ -88,42 +89,42 @@
80
>>> o._replace(fragment="").geturl()
'http://docs.python.org:80/3/library/urllib.parse.html?highlight=params'
>>> urlparse("http://docs.python.org?")
ParseResult(scheme='http', netloc='docs.python.org',
path='', params='', query='', fragment='')
>>> urlparse("http://docs.python.org?", missing_as_none=True)
ParseResult(scheme='http', netloc='docs.python.org',
path='', params=None, query='', fragment=None)

Following the syntax specifications in :rfc:`1808`, urlparse recognizes
>>> urlsplit("http://docs.python.org?")
SplitResult(scheme='http', netloc='docs.python.org', path='',
query='', fragment='')
>>> urlsplit("http://docs.python.org?", missing_as_none=True)
SplitResult(scheme='http', netloc='docs.python.org', path='',
query='', fragment=None)

Following the syntax specifications in :rfc:`1808`, :func:`!urlsplit` recognizes
a netloc only if it is properly introduced by '//'. Otherwise the
input is presumed to be a relative URL and thus to start with
a path component.

.. doctest::
:options: +NORMALIZE_WHITESPACE

>>> from urllib.parse import urlparse
>>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
params='', query='', fragment='')
>>> urlparse('www.cwi.nl/%7Eguido/Python.html')
ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
params='', query='', fragment='')
>>> urlparse('help/Python.html')
ParseResult(scheme='', netloc='', path='help/Python.html',
params='', query='', fragment='')
>>> urlparse('help/Python.html', missing_as_none=True)
ParseResult(scheme=None, netloc=None, path='help/Python.html',
params=None, query=None, fragment=None)
>>> from urllib.parse import urlsplit
>>> urlsplit('//www.cwi.nl:80/%7Eguido/Python.html')
SplitResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
query='', fragment='')
>>> urlsplit('www.cwi.nl/%7Eguido/Python.html')
SplitResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
query='', fragment='')
>>> urlsplit('help/Python.html')
SplitResult(scheme='', netloc='', path='help/Python.html',
query='', fragment='')
>>> urlsplit('help/Python.html', missing_as_none=True)
SplitResult(scheme=None, netloc=None, path='help/Python.html',
query=None, fragment=None)

The *scheme* argument gives the default addressing scheme, to be
used only if the URL does not specify one. It should be the same type
(text or bytes) as *urlstring* or ``None``, except that ``''`` is
always allowed and is automatically converted to ``b''`` if appropriate.
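
A minimal sketch of how the *scheme* default behaves. The ``https`` default and the second URL are chosen only for illustration:

```python
from urllib.parse import urlsplit

# The URL carries no scheme, so the default passed as *scheme* is used.
relative = urlsplit("//www.cwi.nl:80/%7Eguido/Python.html", scheme="https")

# The URL already specifies a scheme, so the default is ignored.
absolute = urlsplit("http://www.cwi.nl:80/%7Eguido/Python.html", scheme="https")

print(relative.scheme)  # scheme taken from the default
print(absolute.scheme)  # scheme taken from the URL itself
```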

If the *allow_fragments* argument is false, fragment identifiers are not
recognized. Instead, they are parsed as part of the path, parameters
recognized. Instead, they are parsed as part of the path
or query component, and :attr:`fragment` is set to ``None`` or the empty
string (depending on the value of *missing_as_none*) in the return value.
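
The effect of *allow_fragments* can be sketched as follows. The URLs are illustrative, and the example relies only on the default ``missing_as_none=False`` behaviour:

```python
from urllib.parse import urlsplit

url = ("http://docs.python.org/3/library/urllib.parse.html"
       "?highlight=params#url-parsing")

default = urlsplit(url)
no_fragments = urlsplit(url, allow_fragments=False)

# With a query present, the "#..." text stays attached to the query.
print(default.query, default.fragment)
print(no_fragments.query, repr(no_fragments.fragment))

# Without a query, the "#..." text stays attached to the path instead.
path_only = urlsplit("http://docs.python.org/index.html#top",
                     allow_fragments=False)
print(path_only.path)
```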

Expand All @@ -134,27 +135,24 @@
| Attribute        | Index | Value                   | Value if not present          |
+==================+=======+=========================+===============================+
| :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter or         |
|                  |       |                         | empty string [1]_             |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`netloc`   | 1     | Network location part   | ``None`` or empty string [1]_ |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`path`     | 2     | Hierarchical path       | empty string                  |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`params`   | 3     | Parameters for last     | ``None`` or empty string [1]_ |
|                  |       | path element            |                               |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`query`    | 4     | Query component         | ``None`` or empty string [1]_ |
| :attr:`query`    | 3     | Query component         | ``None`` or empty string [1]_ |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`fragment` | 5     | Fragment identifier     | ``None`` or empty string [1]_ |
| :attr:`fragment` | 4     | Fragment identifier     | ``None`` or empty string [1]_ |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`username` |       | User name               | ``None``                      |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`password` |       | Password                | ``None``                      |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`hostname` |       | Host name (lower case)  | ``None``                      |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`port`     |       | Port number as integer, | ``None``                      |
|                  |       | if present              |                               |
+------------------+-------+-------------------------+-------------------------------+

.. [1] Depending on the value of the *missing_as_none* argument.
Expand All @@ -171,26 +169,30 @@
``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
decomposed before parsing, no error will be raised.
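
As an illustration of the NFKC check, the following sketch uses U+FF03 (FULLWIDTH NUMBER SIGN), which normalizes to ``#``. The host name is made up for the example:

```python
import unicodedata
from urllib.parse import urlsplit

# U+FF03 (FULLWIDTH NUMBER SIGN) becomes "#" under NFKC normalization.
url = "http://a\uff03b.example/"

try:
    urlsplit(url)
    rejected = False
except ValueError:
    rejected = True

# Normalizing first means the parser sees a plain "#", so no error is
# raised, but the netloc is no longer what the original text suggested.
normalized = urlsplit(unicodedata.normalize("NFKC", url))
print(rejected, normalized.netloc, normalized.fragment)
```

This is why normalizing untrusted input before parsing does not make it safe: the error disappears, but so does the apparent host.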

Following some of the `WHATWG spec`_ that updates :rfc:`3986`, leading C0
control and space characters are stripped from the URL. ``\n``,
``\r`` and tab ``\t`` characters are removed from the URL at any position.
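
A short sketch of the tab and newline removal, assuming Python 3.10 or later as noted in the version history of this function. The URL is illustrative:

```python
from urllib.parse import urlsplit

# Tab, carriage return and newline characters are dropped wherever
# they occur, before the URL is split into components.
parts = urlsplit("http://www.python.org/in\tdex.html\n?high\rlight=params")

print(parts.path)
print(parts.query)
```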

As is the case with all named tuples, the subclass has a few additional methods
and attributes that are particularly useful. One such method is :meth:`_replace`.
The :meth:`_replace` method will return a new ParseResult object replacing specified
fields with new values.
The :meth:`_replace` method will return a new :class:`SplitResult` object
replacing specified fields with new values.

.. doctest::
:options: +NORMALIZE_WHITESPACE

>>> from urllib.parse import urlparse
>>> u = urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
>>> from urllib.parse import urlsplit
>>> u = urlsplit('//www.cwi.nl:80/%7Eguido/Python.html')
>>> u
ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
params='', query='', fragment='')
SplitResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
query='', fragment='')
>>> u._replace(scheme='http')
ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
params='', query='', fragment='')
SplitResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
query='', fragment='')

.. warning::

:func:`urlparse` does not perform validation. See :ref:`URL parsing
:func:`urlsplit` does not perform validation. See :ref:`URL parsing
security <url-parsing-security>` for details.

.. versionchanged:: 3.2
Expand All @@ -209,9 +211,17 @@
Characters that affect netloc parsing under NFKC normalization will
now raise :exc:`ValueError`.

.. versionchanged:: 3.10
ASCII newline and tab characters are stripped from the URL.

.. versionchanged:: 3.12
Leading WHATWG C0 control and space characters are stripped from the URL.

.. versionchanged:: next
Added the *missing_as_none* parameter.

.. _WHATWG spec: https://url.spec.whatwg.org/#concept-basic-url-parser


.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')

Expand Down Expand Up @@ -306,11 +316,11 @@
separator key, with ``&`` as the default separator.


.. function:: urlunparse(parts)
urlunparse(parts, *, keep_empty)
.. function:: urlunsplit(parts)
urlunsplit(parts, *, keep_empty)

Construct a URL from a tuple as returned by ``urlparse()``. The *parts*
argument can be any six-item iterable.
Construct a URL from a tuple as returned by :func:`urlsplit`. The *parts*
argument can be any five-item iterable.
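
For example, a plain list can be passed as well as a :class:`SplitResult`. This is a minimal sketch with illustrative values, using only the default behaviour (no *keep_empty*):

```python
from urllib.parse import urlsplit, urlunsplit

# Any five-item iterable works, not only a SplitResult.
built = urlunsplit(["https", "example.org", "/a/b", "x=1", "top"])
print(built)

# Round trip: split a URL, then put it back together.
original = ("http://docs.python.org:80/3/library/urllib.parse.html"
            "?highlight=params#url-parsing")
rebuilt = urlunsplit(urlsplit(original))
print(rebuilt == original)
```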

This may result in a slightly different, but equivalent URL, if the
URL that was parsed originally had unnecessary delimiters (for example,
Expand All @@ -321,97 +331,33 @@
This allows rebuilding a URL that was parsed with option
``missing_as_none=True``.
By default, *keep_empty* is true if *parts* is the result of the
:func:`urlparse` call with ``missing_as_none=True``.
:func:`urlsplit` call with ``missing_as_none=True``.

.. versionchanged:: next
Added the *keep_empty* parameter.


.. function:: urlsplit(urlstring, scheme=None, allow_fragments=True, *, missing_as_none=False)

This is similar to :func:`urlparse`, but does not split the params from the URL.
This should generally be used instead of :func:`urlparse` if the more recent URL
syntax allowing parameters to be applied to each segment of the *path* portion
of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
separate the path segments and parameters. This function returns a 5-item
:term:`named tuple`::

(addressing scheme, network location, path, query, fragment identifier).

The return value is a :term:`named tuple`, its items can be accessed by index
or as named attributes:

+------------------+-------+-------------------------+-------------------------------+
| Attribute        | Index | Value                   | Value if not present          |
+==================+=======+=========================+===============================+
| :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter or         |
|                  |       |                         | empty string [1]_             |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`netloc`   | 1     | Network location part   | ``None`` or empty string [2]_ |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`path`     | 2     | Hierarchical path       | empty string                  |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`query`    | 3     | Query component         | ``None`` or empty string [2]_ |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`fragment` | 4     | Fragment identifier     | ``None`` or empty string [2]_ |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`username` |       | User name               | ``None``                      |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`password` |       | Password                | ``None``                      |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`hostname` |       | Host name (lower case)  | ``None``                      |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`port`     |       | Port number as integer, | ``None``                      |
|                  |       | if present              |                               |
+------------------+-------+-------------------------+-------------------------------+

.. [2] Depending on the value of the *missing_as_none* argument.

Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
an invalid port is specified in the URL. See section
:ref:`urlparse-result-object` for more information on the result object.

Unmatched square brackets in the :attr:`netloc` attribute will raise a
:exc:`ValueError`.

Characters in the :attr:`netloc` attribute that decompose under NFKC
normalization (as used by the IDNA encoding) into any of ``/``, ``?``,
``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
decomposed before parsing, no error will be raised.

Following some of the `WHATWG spec`_ that updates RFC 3986, leading C0
control and space characters are stripped from the URL. ``\n``,
``\r`` and tab ``\t`` characters are removed from the URL at any position.

.. warning::

:func:`urlsplit` does not perform validation. See :ref:`URL parsing
security <url-parsing-security>` for details.

.. versionchanged:: 3.6
Out-of-range port numbers now raise :exc:`ValueError`, instead of
returning ``None``.

.. versionchanged:: 3.8
Characters that affect netloc parsing under NFKC normalization will
now raise :exc:`ValueError`.

.. versionchanged:: 3.10
ASCII newline and tab characters are stripped from the URL.

.. versionchanged:: 3.12
Leading WHATWG C0 control and space characters are stripped from the URL.
.. function:: urlparse(urlstring, scheme=None, allow_fragments=True, *, missing_as_none=False)

.. versionchanged:: next
Added the *missing_as_none* parameter.
This is similar to :func:`urlsplit`, but additionally splits the *path*
component into *path* and *params*.
This function returns a 6-item :term:`named tuple` :class:`ParseResult`
or :class:`ParseResultBytes`.
Its items are the same as for the :func:`!urlsplit` result, except that
*params* is inserted at index 3, between *path* and *query*.

.. _WHATWG spec: https://url.spec.whatwg.org/#concept-basic-url-parser
This function is based on the obsolete :rfc:`1738` and :rfc:`1808`, which
listed *params* as a main URL component.
The more recent URL syntax allows parameters to be applied to each segment
of the *path* portion of the URL (see :rfc:`3986`).
:func:`urlsplit` should generally be used instead of :func:`urlparse`.
A separate function is needed to separate the path segments and parameters.
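
The difference can be sketched as follows. The URL is illustrative, and ``split_segment_params`` is a hypothetical helper written for this example, not part of :mod:`urllib.parse`:

```python
from urllib.parse import urlparse, urlsplit

url = "http://example.com/a;x/b;y?q=1#frag"

parsed = urlparse(url)
split = urlsplit(url)

# urlparse() only splits off the parameters of the last path segment.
print(parsed.path, parsed.params)
# urlsplit() leaves every ";..." part inside the path.
print(split.path)


def split_segment_params(path):
    """Hypothetical helper: return (segment, params) for each path segment."""
    return [tuple(seg.partition(";")[::2]) for seg in path.split("/")]


segments = split_segment_params(split.path)
print(segments)
```

Working from the :func:`urlsplit` result keeps the per-segment parameters available, which :func:`urlparse` does not.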

.. function:: urlunsplit(parts)
urlunsplit(parts, *, keep_empty)
.. function:: urlunparse(parts)
urlunparse(parts, *, keep_empty)

Combine the elements of a tuple as returned by :func:`urlsplit` into a
complete URL as a string. The *parts* argument can be any five-item
Combine the elements of a tuple as returned by :func:`urlparse` into a
complete URL as a string. The *parts* argument can be any six-item
iterable.

This may result in a slightly different, but equivalent URL, if the
Expand All @@ -423,7 +369,7 @@
This allows rebuilding a URL that was parsed with option
``missing_as_none=True``.
By default, *keep_empty* is true if *parts* is the result of the
:func:`urlsplit` call with ``missing_as_none=True``.
:func:`urlparse` call with ``missing_as_none=True``.

.. versionchanged:: next
Added the *keep_empty* parameter.
Expand All @@ -441,7 +387,7 @@
'http://www.cwi.nl/%7Eguido/FAQ.html'

The *allow_fragments* argument has the same meaning and default as for
:func:`urlparse`.
:func:`urlsplit`.

.. note::

Expand Down Expand Up @@ -587,7 +533,7 @@
Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
The result objects from the :func:`urlsplit`, :func:`urlparse` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
Expand Down
4 changes: 2 additions & 2 deletions Doc/library/venv.rst
Expand Up @@ -550,7 +550,7 @@ subclass which installs setuptools and pip into a created virtual environment::
from subprocess import Popen, PIPE
import sys
from threading import Thread
from urllib.parse import urlparse
from urllib.parse import urlsplit
from urllib.request import urlretrieve
import venv

Expand Down Expand Up @@ -621,7 +621,7 @@ subclass which installs setuptools and pip into a created virtual environment::
stream.close()

def install_script(self, context, name, url):
_, _, path, _, _, _ = urlparse(url)
_, _, path, _, _ = urlsplit(url)
fn = os.path.split(path)[-1]
binpath = context.bin_path
distpath = os.path.join(binpath, fn)
Expand Down
8 changes: 4 additions & 4 deletions Doc/whatsnew/3.15.rst
Expand Up @@ -968,10 +968,10 @@ unittest
urllib.parse
------------

* Add the *missing_as_none* parameter to :func:`~urllib.parse.urlparse`,
:func:`~urllib.parse.urlsplit` and :func:`~urllib.parse.urldefrag` functions.
Add the *keep_empty* parameter to :func:`~urllib.parse.urlunparse` and
:func:`~urllib.parse.urlunsplit` functions.
* Add the *missing_as_none* parameter to :func:`~urllib.parse.urlsplit`,
:func:`~urllib.parse.urlparse` and :func:`~urllib.parse.urldefrag` functions.
Add the *keep_empty* parameter to :func:`~urllib.parse.urlunsplit` and
:func:`~urllib.parse.urlunparse` functions.
This makes it possible to distinguish between empty and undefined URI
components and to preserve empty components.
(Contributed by Serhiy Storchaka in :gh:`67041`.)
Expand Down