././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1716919522.8112032 uritools-4.0.3/0000775000175000017500000000000014625416343012132 5ustar00tkemtkem././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1716919430.0 uritools-4.0.3/CHANGELOG.rst0000664000175000017500000001263214625416206014155 0ustar00tkemtkemv4.0.3 (2024-05-28) =================== - Prepare for Python 3.13. v4.0.2 (2023-08-30) =================== - Depend on Python >= 3.7. - Support Python 3.12. v4.0.1 (2023-01-08) =================== - Add support for Python 3.11. - Correct version information in RTD documentation. - ``badges/shields``: Change to GitHub workflow badge routes. v4.0.0 (2022-01-02) =================== - Require Python 3.7 or later (breaking change). - Remove undocumented submodules (breaking change). The ``chars``, ``classify``, ``compose``, ``defrag``, ``encoding``, ``join`` and ``split`` submodules have been deleted. Therefore, statements like ``from uritools.classify import isuri`` will no longer work. Use ``from uritools import isuri`` instead. v3.0.2 (2021-04-27) =================== - Update build environment. v3.0.1 (2021-03-09) =================== - Do not convert percent-encodings to uppercase in host components generated by ``uricompose()``. - Officially support Python 3.9. - Format code with Black. v3.0.0 (2019-12-15) =================== - Require Python 3.5 or later. v2.2.0 (2018-05-17) =================== - Add URI classification methods and functions. v2.1.1 (2018-05-13) =================== - Treat URIs with invalid schemes as relative references. v2.1.0 (2017-10-07) =================== - Add ``SplitResult.getauthority()``. - Add optional ``errors`` parameter to ``SplitResult.gethost()``. v2.0.1 (2017-09-13) =================== - Officially support Python 3.6. - Move documentation to RTD. - Fix ``flake8`` checks. v2.0.0 (2016-10-09) =================== - Drop Python 3.2 support (breaking change). 
- No longer treat semicolons as query separators by default (breaking change). - Add optional ``sep`` parameter to ``SplitResult.getquerydict()`` and ``SplitResult.getquerylist()`` (breaks ``encoding`` when passed as positional argument). - Add optional ``querysep`` parameter to ``uricompose()`` (breaks ``encoding`` when passed as positional argument). v1.0.2 (2016-04-08) =================== - Fix ``uriencode()`` documentation and unit tests requiring the ``safe`` parameter to be a ``bytes`` object. v1.0.1 (2015-07-09) =================== - Encode semicolon in query values passed to ``uricompose()``. v1.0.0 (2015-06-12) =================== - Fix use of URI references as base URIs in ``urijoin()`` and ``SplitResult.transform()``. - Remove ``SplitResult.getaddrinfo()``. - Remove ``SplitResult.getauthority()``. - Remove ``SplitResult.gethostip()``; return ``ipaddress`` address objects from ``SplitResult.gethost()`` instead. - Remove ``SplitResult.gethost()`` ``encoding`` parameter. - Remove query delimiter parameters. - Return normalized paths from ``SplitResult.getpath()``. - Convert character constants to strings. v0.12.0 (2015-04-03) ==================== - Deprecate ``SplitResult.getaddrinfo()``. - Deprecate ``SplitResult.getauthority()``. - Deprecate ``SplitResult.gethost()`` and ``SplitResult.gethostip()`` ``encoding`` parameter; always use ``utf-8`` instead. - Drop support for "bytes-like objects". - Remove ``DefragResult.base``. v0.11.1 (2015-03-25) ==================== - Fix ``uricompose()`` for relative-path references with colons in the first path segment. v0.11.0 (2014-12-16) ==================== - Support ``encoding=None`` for ``uriencode()`` and ``uridecode()``. - Add optional ``errors`` parameter to decoding methods. v0.10.1 (2014-11-30) ==================== - Make ``uricompose()`` return ``str`` on all Python versions. v0.10.0 (2014-11-30) ==================== - Use ``ipaddress`` module for handling IPv4/IPv6 host addresses. 
- Add ``userinfo``, ``host`` and ``port`` keyword arguments to ``uricompose()``. - Deprecate ``DefragResult.base``. - Feature freeze for v1.0. v0.9.0 (2014-11-21) =================== - Improve Python 3 support. v0.8.0 (2014-11-04) =================== - Fix ``uriencode()`` and ``uridecode()``. - Deprecate ``RE``, ``urinormpath()``, ``DefragResult.getbase()``. - Support non-string query values in ``uricompose()``. v0.7.0 (2014-10-12) =================== - Add optional port parameter to ``SplitResult.getaddrinfo()``. - Cache ``SplitResult.authority`` subcomponents. v0.6.0 (2014-09-17) =================== - Add basic IPv6 support. - Change ``SplitResult.port`` back to string, to distinguish between empty and absent port components. - Remove ``querysep`` and ``sep`` parameters. - Do not raise ``ValueError`` if scheme is not well-formed. - Improve Python 3 support. v0.5.2 (2014-08-06) =================== - Fix empty port handling. v0.5.1 (2014-06-22) =================== - Add basic Python 3 support. v0.5.0 (2014-06-21) =================== - Add ``SplitResult.getaddrinfo()``. - Support query mappings and sequences in ``uricompose()``. v0.4.0 (2014-03-20) =================== - Fix ``SplitResult.port`` to return int (matching urlparse). - Add ``SplitResult.getquerylist(), SplitResult.getquerydict()``. v0.3.0 (2014-03-02) =================== - Add result object accessor methods. - Update documentation. v0.2.1 (2014-02-24) =================== - Fix IndexError in ``urinormpath()``. - Integrate Python 2.7.6 ``urlparse`` unit tests. v0.2.0 (2014-02-18) =================== - Add authority subcomponent attributes. - Return ``DefragResult`` from ``uridefrag()``. - Improve edge case behavior. v0.1.0 (2014-02-14) =================== - Initial beta release. 
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1716919411.0 uritools-4.0.3/LICENSE0000664000175000017500000000207514625416163013143 0ustar00tkemtkemThe MIT License (MIT) Copyright (c) 2014-2024 Thomas Kemmer Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1714588891.0 uritools-4.0.3/MANIFEST.in0000644000175000017500000000030114614506333013655 0ustar00tkemtkeminclude CHANGELOG.rst include LICENSE include MANIFEST.in include README.rst include tox.ini exclude .readthedocs.yaml recursive-include docs * prune docs/_build recursive-include tests *.py ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1716919522.8112032 uritools-4.0.3/PKG-INFO0000644000175000017500000001115514625416343013230 0ustar00tkemtkemMetadata-Version: 2.1 Name: uritools Version: 4.0.3 Summary: URI parsing, classification and composition Home-page: https://github.com/tkem/uritools/ Author: Thomas Kemmer Author-email: tkemmer@computer.org License: MIT Classifier: Development Status :: 5 - Production/Stable Classifier: Environment :: Other Environment Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: MIT License Classifier: Operating System :: OS Independent Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3.7 Classifier: Programming Language :: Python :: 3.8 Classifier: Programming Language :: Python :: 3.9 Classifier: Programming Language :: Python :: 3.10 Classifier: Programming Language :: Python :: 3.11 Classifier: Programming Language :: Python :: 3.12 Classifier: Topic :: Software Development :: Libraries :: Python Modules Requires-Python: >=3.7 License-File: LICENSE uritools ======================================================================== .. image:: https://img.shields.io/pypi/v/uritools :target: https://pypi.org/project/uritools :alt: Latest PyPI version .. image:: https://img.shields.io/github/actions/workflow/status/tkem/uritools/ci.yml :target: https://github.com/tkem/uritools/actions/workflows/ci.yml :alt: CI build status .. 
image:: https://img.shields.io/readthedocs/uritools :target: https://uritools.readthedocs.io :alt: Documentation build status .. image:: https://img.shields.io/codecov/c/github/tkem/uritools/master.svg :target: https://codecov.io/gh/tkem/uritools :alt: Test coverage .. image:: https://img.shields.io/librariesio/sourcerank/pypi/uritools :target: https://libraries.io/pypi/uritools :alt: Libraries.io SourceRank .. image:: https://img.shields.io/github/license/tkem/uritools :target: https://raw.github.com/tkem/uritools/master/LICENSE :alt: License .. image:: https://img.shields.io/badge/code%20style-black-000000.svg :target: https://github.com/psf/black :alt: Code style: black This module provides RFC 3986 compliant functions for parsing, classifying and composing URIs and URI references, largely replacing the Python Standard Library's ``urllib.parse`` module. .. code-block:: pycon >>> from uritools import uricompose, urijoin, urisplit, uriunsplit >>> uricompose(scheme='foo', host='example.com', port=8042, ... path='/over/there', query={'name': 'ferret'}, ... fragment='nose') 'foo://example.com:8042/over/there?name=ferret#nose' >>> parts = urisplit(_) >>> parts.scheme 'foo' >>> parts.authority 'example.com:8042' >>> parts.getport(default=80) 8042 >>> parts.getquerydict().get('name') ['ferret'] >>> parts.isuri() True >>> parts.isabsuri() False >>> urijoin(uriunsplit(parts), '/right/here?name=swallow#beak') 'foo://example.com:8042/right/here?name=swallow#beak' For various reasons, ``urllib.parse`` and its Python 2 predecessor ``urlparse`` are not compliant with current Internet standards. As stated in `Lib/urllib/parse.py `_: RFC 3986 is considered the current standard and any future changes to urlparse module should conform with it. The urlparse module is currently not entirely compliant with this RFC due to defacto scenarios for parsing, and for backward compatibility purposes, some parsing quirks from older RFCs are retained. 
This module aims to provide fully RFC 3986 compliant replacements for the most commonly used functions found in ``urllib.parse``. It also includes functions for distinguishing between the different forms of URIs and URI references, and for conveniently creating URIs from their individual components. Installation ------------------------------------------------------------------------ uritools is available from PyPI_ and can be installed by running:: pip install uritools Project Resources ------------------------------------------------------------------------ - `Documentation`_ - `Issue tracker`_ - `Source code`_ - `Change log`_ License ------------------------------------------------------------------------ Copyright (c) 2014-2023 Thomas Kemmer. Licensed under the `MIT License`_. .. _PyPI: https://pypi.org/project/uritools/ .. _Documentation: https://uritools.readthedocs.io/ .. _Issue tracker: https://github.com/tkem/uritools/issues/ .. _Source code: https://github.com/tkem/uritools/ .. _Change log: https://github.com/tkem/uritools/blob/master/CHANGELOG.rst .. _MIT License: https://raw.github.com/tkem/uritools/master/LICENSE ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1673210476.0 uritools-4.0.3/README.rst0000664000175000017500000000723514356625154013633 0ustar00tkemtkemuritools ======================================================================== .. image:: https://img.shields.io/pypi/v/uritools :target: https://pypi.org/project/uritools :alt: Latest PyPI version .. image:: https://img.shields.io/github/actions/workflow/status/tkem/uritools/ci.yml :target: https://github.com/tkem/uritools/actions/workflows/ci.yml :alt: CI build status .. image:: https://img.shields.io/readthedocs/uritools :target: https://uritools.readthedocs.io :alt: Documentation build status .. image:: https://img.shields.io/codecov/c/github/tkem/uritools/master.svg :target: https://codecov.io/gh/tkem/uritools :alt: Test coverage .. 
image:: https://img.shields.io/librariesio/sourcerank/pypi/uritools :target: https://libraries.io/pypi/uritools :alt: Libraries.io SourceRank .. image:: https://img.shields.io/github/license/tkem/uritools :target: https://raw.github.com/tkem/uritools/master/LICENSE :alt: License .. image:: https://img.shields.io/badge/code%20style-black-000000.svg :target: https://github.com/psf/black :alt: Code style: black This module provides RFC 3986 compliant functions for parsing, classifying and composing URIs and URI references, largely replacing the Python Standard Library's ``urllib.parse`` module. .. code-block:: pycon >>> from uritools import uricompose, urijoin, urisplit, uriunsplit >>> uricompose(scheme='foo', host='example.com', port=8042, ... path='/over/there', query={'name': 'ferret'}, ... fragment='nose') 'foo://example.com:8042/over/there?name=ferret#nose' >>> parts = urisplit(_) >>> parts.scheme 'foo' >>> parts.authority 'example.com:8042' >>> parts.getport(default=80) 8042 >>> parts.getquerydict().get('name') ['ferret'] >>> parts.isuri() True >>> parts.isabsuri() False >>> urijoin(uriunsplit(parts), '/right/here?name=swallow#beak') 'foo://example.com:8042/right/here?name=swallow#beak' For various reasons, ``urllib.parse`` and its Python 2 predecessor ``urlparse`` are not compliant with current Internet standards. As stated in `Lib/urllib/parse.py `_: RFC 3986 is considered the current standard and any future changes to urlparse module should conform with it. The urlparse module is currently not entirely compliant with this RFC due to defacto scenarios for parsing, and for backward compatibility purposes, some parsing quirks from older RFCs are retained. This module aims to provide fully RFC 3986 compliant replacements for the most commonly used functions found in ``urllib.parse``. It also includes functions for distinguishing between the different forms of URIs and URI references, and for conveniently creating URIs from their individual components. 
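One difference that is easy to observe is how an empty component is told apart from an absent one. The following is a minimal, standard-library-only sketch and not uritools code: ``split_query`` is a hypothetical helper built on the generic regular expression from RFC 3986, Appendix B, and the example URIs are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Generic URI-reference regular expression from RFC 3986, Appendix B.
RFC3986_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_query(uriref):
    """Hypothetical helper: return the query component, or None if absent."""
    # Group 7 is the query; it is None when no "?" delimiter was present.
    return RFC3986_RE.match(uriref).group(7)


with_empty_query = "http://example.com/?"
without_query = "http://example.com/"

# urllib.parse reports an empty string for both references.
print(repr(urlsplit(with_empty_query).query), repr(urlsplit(without_query).query))

# The RFC 3986 expression keeps an empty query apart from an absent one.
print(repr(split_query(with_empty_query)), repr(split_query(without_query)))
```

This mirrors the convention documented for ``urisplit()``, whose result attributes are ``None`` when a component is not present.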
Installation ------------------------------------------------------------------------ uritools is available from PyPI_ and can be installed by running:: pip install uritools Project Resources ------------------------------------------------------------------------ - `Documentation`_ - `Issue tracker`_ - `Source code`_ - `Change log`_ License ------------------------------------------------------------------------ Copyright (c) 2014-2023 Thomas Kemmer. Licensed under the `MIT License`_. .. _PyPI: https://pypi.org/project/uritools/ .. _Documentation: https://uritools.readthedocs.io/ .. _Issue tracker: https://github.com/tkem/uritools/issues/ .. _Source code: https://github.com/tkem/uritools/ .. _Change log: https://github.com/tkem/uritools/blob/master/CHANGELOG.rst .. _MIT License: https://raw.github.com/tkem/uritools/master/LICENSE ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1716919522.8072033 uritools-4.0.3/docs/0000775000175000017500000000000014625416343013062 5ustar00tkemtkem././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1597080907.0 uritools-4.0.3/docs/.gitignore0000644000175000017500000000000713714302513015034 0ustar00tkemtkem_build ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1597080907.0 uritools-4.0.3/docs/Makefile0000644000175000017500000001270413714302513014513 0ustar00tkemtkem# Makefile for Sphinx documentation # # You can set these variables from the command line. SPHINXOPTS = SPHINXBUILD = sphinx-build PAPER = BUILDDIR = _build # Internal variables. PAPEROPT_a4 = -D latex_paper_size=a4 PAPEROPT_letter = -D latex_paper_size=letter ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . # the i18n builder cannot share the environment and doctrees with the others I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . 
.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext help: @echo "Please use \`make <target>' where <target> is one of" @echo " html to make standalone HTML files" @echo " dirhtml to make HTML files named index.html in directories" @echo " singlehtml to make a single large HTML file" @echo " pickle to make pickle files" @echo " json to make JSON files" @echo " htmlhelp to make HTML files and a HTML help project" @echo " qthelp to make HTML files and a qthelp project" @echo " devhelp to make HTML files and a Devhelp project" @echo " epub to make an epub" @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" @echo " latexpdf to make LaTeX files and run them through pdflatex" @echo " text to make text files" @echo " man to make manual pages" @echo " texinfo to make Texinfo files" @echo " info to make Texinfo files and run them through makeinfo" @echo " gettext to make PO message catalogs" @echo " changes to make an overview of all changed/added/deprecated items" @echo " linkcheck to check all external links for integrity" @echo " doctest to run all doctests embedded in the documentation (if enabled)" clean: -rm -rf $(BUILDDIR)/* html: $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." dirhtml: $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." singlehtml: $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml @echo @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." pickle: $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle @echo @echo "Build finished; now you can process the pickle files." json: $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json @echo @echo "Build finished; now you can process the JSON files." 
htmlhelp: $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp @echo @echo "Build finished; now you can run HTML Help Workshop with the" \ ".hhp project file in $(BUILDDIR)/htmlhelp." qthelp: $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp @echo @echo "Build finished; now you can run "qcollectiongenerator" with the" \ ".qhcp project file in $(BUILDDIR)/qthelp, like this:" @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/uritools.qhcp" @echo "To view the help file:" @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/uritools.qhc" devhelp: $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp @echo @echo "Build finished." @echo "To view the help file:" @echo "# mkdir -p $$HOME/.local/share/devhelp/uritools" @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/uritools" @echo "# devhelp" epub: $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub @echo @echo "Build finished. The epub file is in $(BUILDDIR)/epub." latex: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." @echo "Run \`make' in that directory to run these through (pdf)latex" \ "(use \`make latexpdf' here to do that automatically)." latexpdf: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo "Running LaTeX files through pdflatex..." $(MAKE) -C $(BUILDDIR)/latex all-pdf @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." text: $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text @echo @echo "Build finished. The text files are in $(BUILDDIR)/text." man: $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man @echo @echo "Build finished. The manual pages are in $(BUILDDIR)/man." texinfo: $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo @echo @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." @echo "Run \`make' in that directory to run these through makeinfo" \ "(use \`make info' here to do that automatically)." 
info: $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo @echo "Running Texinfo files through makeinfo..." make -C $(BUILDDIR)/texinfo info @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." gettext: $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale @echo @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." changes: $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes @echo @echo "The overview file is in $(BUILDDIR)/changes." linkcheck: $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck @echo @echo "Link check complete; look for any errors in the above output " \ "or in $(BUILDDIR)/linkcheck/output.txt." doctest: $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest @echo "Testing of doctests in the sources finished, look at the " \ "results in $(BUILDDIR)/doctest/output.txt." ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1716919411.0 uritools-4.0.3/docs/conf.py0000664000175000017500000000161314625416163014362 0ustar00tkemtkemimport pathlib import sys src_directory = (pathlib.Path(__file__).parent.parent / "src").resolve() sys.path.insert(0, str(src_directory)) # Extract the current version from the source. 
def get_version(): """Get the version and release from the source code.""" text = (src_directory / "uritools/__init__.py").read_text() for line in text.splitlines(): if not line.strip().startswith("__version__"): continue full_version = line.partition("=")[2].strip().strip("\"'") partial_version = ".".join(full_version.split(".")[:2]) return full_version, partial_version project = "uritools" copyright = "2014-2024 Thomas Kemmer" release, version = get_version() extensions = [ "sphinx.ext.autodoc", "sphinx.ext.coverage", "sphinx.ext.doctest", "sphinx.ext.todo", ] exclude_patterns = ["_build"] master_doc = "index" html_theme = "classic" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1597080907.0 uritools-4.0.3/docs/index.rst0000644000175000017500000002213213714302513014710 0ustar00tkemtkem*************************************************************** :mod:`uritools` --- URI parsing, classification and composition *************************************************************** .. module:: uritools This module provides RFC 3986 compliant functions for parsing, classifying and composing URIs and URI references, largely replacing the Python Standard Library's :mod:`urllib.parse` module. .. doctest:: >>> from uritools import uricompose, urijoin, urisplit, uriunsplit >>> uricompose(scheme='foo', host='example.com', port=8042, ... path='/over/there', query={'name': 'ferret'}, ... fragment='nose') 'foo://example.com:8042/over/there?name=ferret#nose' >>> parts = urisplit(_) >>> parts.scheme 'foo' >>> parts.authority 'example.com:8042' >>> parts.getport(default=80) 8042 >>> parts.getquerydict().get('name') ['ferret'] >>> parts.isuri() True >>> parts.isabsuri() False >>> urijoin(uriunsplit(parts), '/right/here?name=swallow#beak') 'foo://example.com:8042/right/here?name=swallow#beak' For various reasons, :mod:`urllib.parse` and its Python 2 predecessor :mod:`urlparse` are not compliant with current Internet standards. 
As stated in `Lib/urllib/parse.py `_: RFC 3986 is considered the current standard and any future changes to urlparse module should conform with it. The urlparse module is currently not entirely compliant with this RFC due to defacto scenarios for parsing, and for backward compatibility purposes, some parsing quirks from older RFCs are retained. This module aims to provide fully RFC 3986 compliant replacements for the most commonly used functions found in :mod:`urllib.parse`. It also includes functions for distinguishing between the different forms of URIs and URI references, and for conveniently creating URIs from their individual components. .. seealso:: :rfc:`3986` - Uniform Resource Identifier (URI): Generic Syntax The current Internet standard (STD66) defining URI syntax, to which any changes to :mod:`uritools` should conform. If deviations are observed, the module's implementation should be changed, even if this means breaking backward compatibility. URI Classification ================== According to RFC 3986, a URI reference is either a URI or a *relative reference*. If the URI reference's prefix does not match the syntax of a scheme followed by its colon separator, then the URI reference is a relative reference. A relative reference that begins with two slash characters is termed a *network-path* reference. A relative reference that begins with a single slash character is termed an *absolute-path* reference. A relative reference that does not begin with a slash character is termed a *relative-path* reference. When a URI reference refers to a URI that is, aside from its fragment component, identical to the base URI, that reference is called a *same-document* reference. Examples of same-document references are relative references that are empty or include only the number sign ("#") separator followed by a fragment identifier. A URI without a fragment identifier is termed an *absolute URI*. A base URI, for example, must be an absolute URI. 
If the base URI is obtained from a URI reference, then that reference must be stripped of any fragment component prior to its use as a base URI. .. autofunction:: isuri .. autofunction:: isabsuri .. autofunction:: isnetpath .. autofunction:: isabspath .. autofunction:: isrelpath .. autofunction:: issamedoc URI Composition =============== .. autofunction:: uricompose All components may be specified as either Unicode strings, which will be encoded according to `encoding`, or :class:`bytes` objects. `authority` may also be passed a three-item iterable specifying userinfo, host and port subcomponents. If both `authority` and any of the `userinfo`, `host` or `port` keyword arguments are given, the keyword argument will override the corresponding `authority` subcomponent. `query` may also be passed a mapping object or a sequence of two-element tuples, which will be converted to a string of `name=value` pairs separated by `querysep`. The returned URI reference is of type :class:`str`. .. autofunction:: urijoin If `strict` is :const:`False`, a scheme in the reference is ignored if it is identical to the base URI's scheme. .. autofunction:: uriunsplit URI Decomposition ================= .. autofunction:: uridefrag The return value is an instance of a subclass of :class:`collections.namedtuple` with the following read-only attributes: +-------------------+-------+---------------------------------------------+ | Attribute | Index | Value | +===================+=======+=============================================+ | :attr:`uri` | 0 | Absolute URI, or relative reference without | | | | a fragment identifier | +-------------------+-------+---------------------------------------------+ | :attr:`fragment` | 1 | Fragment identifier, or :const:`None` if no | | | | fragment was present | +-------------------+-------+---------------------------------------------+ .. 
autofunction:: urisplit The return value is an instance of a subclass of :class:`collections.namedtuple` with the following read-only attributes: +-------------------+-------+---------------------------------------------+ | Attribute | Index | Value | +===================+=======+=============================================+ | :attr:`scheme` | 0 | URI scheme, or :const:`None` if not present | +-------------------+-------+---------------------------------------------+ | :attr:`authority` | 1 | Authority component, | | | | or :const:`None` if not present | +-------------------+-------+---------------------------------------------+ | :attr:`path` | 2 | Path component, always present but may be | | | | empty | +-------------------+-------+---------------------------------------------+ | :attr:`query` | 3 | Query component, | | | | or :const:`None` if not present | +-------------------+-------+---------------------------------------------+ | :attr:`fragment` | 4 | Fragment identifier, | | | | or :const:`None` if not present | +-------------------+-------+---------------------------------------------+ | :attr:`userinfo` | | Userinfo subcomponent of `authority`, | | | | or :const:`None` if not present | +-------------------+-------+---------------------------------------------+ | :attr:`host` | | Host subcomponent of `authority`, | | | | or :const:`None` if not present | +-------------------+-------+---------------------------------------------+ | :attr:`port` | | Port subcomponent of `authority` as a | | | | (possibly empty) string, | | | | or :const:`None` if not present | +-------------------+-------+---------------------------------------------+ URI Encoding ============ .. autofunction:: uridecode If `encoding` is set to :const:`None`, return the percent-decoded `uristring` as a :class:`bytes` object. Otherwise, replace any percent-encodings and decode `uristring` using the codec registered for `encoding`, returning a Unicode string. .. 
autofunction:: uriencode If `uristring` is a :class:`bytes` object, replace any characters not in :const:`UNRESERVED` or `safe` with their corresponding percent-encodings and return the result as a :class:`bytes` object. Otherwise, encode `uristring` using the codec registered for `encoding` before replacing any percent encodings. Structured Parse Results ======================== The result objects from the :func:`uridefrag` and :func:`urisplit` functions are instances of subclasses of :class:`collections.namedtuple`. These objects contain the attributes described in the function documentation, as well as some additional convenience methods. .. autoclass:: DefragResult :members: .. autoclass:: SplitResult :members: Character Constants =================== .. data:: GEN_DELIMS A string containing all general delimiting characters specified in RFC 3986. .. data:: RESERVED A string containing all reserved characters specified in RFC 3986. .. data:: SUB_DELIMS A string containing all subcomponent delimiting characters specified in RFC 3986. .. data:: UNRESERVED A string containing all unreserved characters specified in RFC 3986. 
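To illustrate how the unreserved character set drives percent-encoding, here is a simplified, standard-library-only sketch. It is not the module's implementation: ``percent_encode`` is a hypothetical helper, it returns a :class:`str` rather than the :class:`bytes` object that :func:`uriencode` returns, and ``UNRESERVED`` is redefined locally so that the snippet runs without uritools installed.

```python
# Local copy of the unreserved characters listed in RFC 3986, section 2.3.
UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "-._~"
)


def percent_encode(text, safe="", encoding="utf-8"):
    """Hypothetical helper: percent-encode everything outside UNRESERVED + safe."""
    keep = frozenset((UNRESERVED + safe).encode("ascii"))
    data = text.encode(encoding)
    # RFC 3986 2.1 recommends uppercase hexadecimal digits in percent-encodings.
    return "".join(chr(b) if b in keep else "%%%02X" % b for b in data)


print(percent_encode("a b/c~d"))            # a%20b%2Fc~d
print(percent_encode("a b/c~d", safe="/"))  # a%20b/c~d
print(percent_encode("é"))                  # %C3%A9
```

Passing additional characters in ``safe`` leaves them untouched, which is why a path can keep its slashes while a single path segment has them encoded.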
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1633976314.0 uritools-4.0.3/pyproject.toml0000664000175000017500000000014414131077772015046 0ustar00tkemtkem[build-system] requires = ["setuptools >= 46.4.0", "wheel"] build-backend = "setuptools.build_meta" ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1716919522.8112032 uritools-4.0.3/setup.cfg0000664000175000017500000000226414625416343013757 0ustar00tkemtkem[metadata] name = uritools version = attr: uritools.__version__ url = https://github.com/tkem/uritools/ author = Thomas Kemmer author_email = tkemmer@computer.org license = MIT license_files = LICENSE description = URI parsing, classification and composition long_description = file: README.rst classifiers = Development Status :: 5 - Production/Stable Environment :: Other Environment Intended Audience :: Developers License :: OSI Approved :: MIT License Operating System :: OS Independent Programming Language :: Python Programming Language :: Python :: 3 Programming Language :: Python :: 3.7 Programming Language :: Python :: 3.8 Programming Language :: Python :: 3.9 Programming Language :: Python :: 3.10 Programming Language :: Python :: 3.11 Programming Language :: Python :: 3.12 Topic :: Software Development :: Libraries :: Python Modules [options] package_dir = = src packages = find: python_requires = >= 3.7 [options.packages.find] where = src [flake8] max-line-length = 80 exclude = .git, .tox, build select = C, E, F, W, B, B950, I, N ignore = E501, W503 [build_sphinx] source-dir = docs/ build-dir = docs/_build all_files = 1 [egg_info] tag_build = tag_date = 0 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1597080907.0 uritools-4.0.3/setup.py0000644000175000017500000000004613714302513013631 0ustar00tkemtkemfrom setuptools import setup setup() ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1716919522.8032033 
uritools-4.0.3/src/0000775000175000017500000000000014625416343012721 5ustar00tkemtkem
././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1716919522.8072033 uritools-4.0.3/src/uritools/0000775000175000017500000000000014625416343014601 5ustar00tkemtkem
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1716919411.0 uritools-4.0.3/src/uritools/__init__.py0000664000175000017500000006457214625416163016730 0ustar00tkemtkem
"""RFC 3986 compliant, scheme-agnostic replacement for `urllib.parse`.

This module defines RFC 3986 compliant replacements for the most
commonly used functions of the Python Standard Library
:mod:`urllib.parse` module.

"""

import collections
import collections.abc
import ipaddress
import numbers
import re
from string import hexdigits

__all__ = (
    "GEN_DELIMS",
    "RESERVED",
    "SUB_DELIMS",
    "UNRESERVED",
    "isabspath",
    "isabsuri",
    "isnetpath",
    "isrelpath",
    "issamedoc",
    "isuri",
    "uricompose",
    "uridecode",
    "uridefrag",
    "uriencode",
    "urijoin",
    "urisplit",
    "uriunsplit",
)

__version__ = "4.0.3"


# RFC 3986 2.2.  Reserved Characters
#
#   reserved    = gen-delims / sub-delims
#
#   gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
#
#   sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
#               / "*" / "+" / "," / ";" / "="
#
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
RESERVED = GEN_DELIMS + SUB_DELIMS

# RFC 3986 2.3.  Unreserved Characters
#
#   unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
#
UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "-._~"
)

_unreserved = frozenset(UNRESERVED.encode())

# RFC 3986 2.1: For consistency, URI producers and normalizers should
# use uppercase hexadecimal digits for all percent-encodings.
_encoded = {
    b"": [
        bytes([i]) if i in _unreserved else ("%%%02X" % i).encode()
        for i in range(256)
    ]
}

_decoded = {
    (a + b).encode(): bytes.fromhex(a + b) for a in hexdigits for b in hexdigits
}


def uriencode(uristring, safe="", encoding="utf-8", errors="strict"):
    """Encode a URI string or string component."""
    if not isinstance(uristring, bytes):
        uristring = uristring.encode(encoding, errors)
    if not isinstance(safe, bytes):
        safe = safe.encode("ascii")
    try:
        encoded = _encoded[safe]
    except KeyError:
        encoded = _encoded[b""][:]
        for i in safe:
            encoded[i] = bytes([i])
        _encoded[safe] = encoded
    return b"".join(map(encoded.__getitem__, uristring))


def uridecode(uristring, encoding="utf-8", errors="strict"):
    """Decode a URI string or string component."""
    if not isinstance(uristring, bytes):
        uristring = uristring.encode(encoding or "ascii", errors)
    parts = uristring.split(b"%")
    result = [parts[0]]
    append = result.append
    decode = _decoded.get
    for s in parts[1:]:
        append(decode(s[:2], b"%" + s[:2]))
        append(s[2:])
    if encoding is not None:
        return b"".join(result).decode(encoding, errors)
    else:
        return b"".join(result)


class DefragResult(collections.namedtuple("DefragResult", "uri fragment")):
    """Class to hold :func:`uridefrag` results."""

    __slots__ = ()  # prevent creation of instance dictionary

    def geturi(self):
        """Return the recombined version of the original URI as a string."""
        fragment = self.fragment
        if fragment is None:
            return self.uri
        elif isinstance(fragment, bytes):
            return self.uri + b"#" + fragment
        else:
            return self.uri + "#" + fragment

    def getfragment(self, default=None, encoding="utf-8", errors="strict"):
        """Return the decoded fragment identifier, or `default` if the
        original URI did not contain a fragment component.

        """
        fragment = self.fragment
        if fragment is not None:
            return uridecode(fragment, encoding, errors)
        else:
            return default


class SplitResult(
    collections.namedtuple("SplitResult", "scheme authority path query fragment")
):
    """Base class to hold :func:`urisplit` results."""

    __slots__ = ()  # prevent creation of instance dictionary

    @property
    def userinfo(self):
        authority = self.authority
        if authority is None:
            return None
        userinfo, present, _ = authority.rpartition(self.AT)
        if present:
            return userinfo
        else:
            return None

    @property
    def host(self):
        authority = self.authority
        if authority is None:
            return None
        _, _, hostinfo = authority.rpartition(self.AT)
        host, _, port = hostinfo.rpartition(self.COLON)
        if port.lstrip(self.DIGITS):
            return hostinfo
        else:
            return host

    @property
    def port(self):
        authority = self.authority
        if authority is None:
            return None
        _, present, port = authority.rpartition(self.COLON)
        if present and not port.lstrip(self.DIGITS):
            return port
        else:
            return None

    def geturi(self):
        """Return the re-combined version of the original URI reference as a
        string.

        """
        scheme, authority, path, query, fragment = self

        # RFC 3986 5.3. Component Recomposition
        result = []
        if scheme is not None:
            result.extend([scheme, self.COLON])
        if authority is not None:
            result.extend([self.SLASH, self.SLASH, authority])
        result.append(path)
        if query is not None:
            result.extend([self.QUEST, query])
        if fragment is not None:
            result.extend([self.HASH, fragment])
        return self.EMPTY.join(result)

    def getscheme(self, default=None):
        """Return the URI scheme in canonical (lowercase) form, or `default`
        if the original URI reference did not contain a scheme component.

        """
        scheme = self.scheme
        if scheme is None:
            return default
        elif isinstance(scheme, bytes):
            return scheme.decode("ascii").lower()
        else:
            return scheme.lower()

    def getauthority(self, default=None, encoding="utf-8", errors="strict"):
        """Return the decoded userinfo, host and port subcomponents of the URI
        authority as a three-item tuple.

        """
        # TBD: (userinfo, host, port) kwargs, default string?
        if default is None:
            default = (None, None, None)
        elif not isinstance(default, collections.abc.Iterable):
            raise TypeError("Invalid default type")
        elif len(default) != 3:
            raise ValueError("Invalid default length")
        # TODO: this could be much more efficient by using a dedicated regex
        return (
            self.getuserinfo(default[0], encoding, errors),
            self.gethost(default[1], errors),
            self.getport(default[2]),
        )

    def getuserinfo(self, default=None, encoding="utf-8", errors="strict"):
        """Return the decoded userinfo subcomponent of the URI authority, or
        `default` if the original URI reference did not contain a
        userinfo field.

        """
        userinfo = self.userinfo
        if userinfo is None:
            return default
        else:
            return uridecode(userinfo, encoding, errors)

    def gethost(self, default=None, errors="strict"):
        """Return the decoded host subcomponent of the URI authority as a
        string or an :mod:`ipaddress` address object, or `default` if
        the original URI reference did not contain a host.

        """
        host = self.host
        if host is None or (not host and default is not None):
            return default
        elif host.startswith(self.LBRACKET) and host.endswith(self.RBRACKET):
            return self.__parse_ip_literal(host[1:-1])
        elif host.startswith(self.LBRACKET) or host.endswith(self.RBRACKET):
            raise ValueError("Invalid host %r" % host)
        # TODO: faster check for IPv4 address?
        try:
            if isinstance(host, bytes):
                return ipaddress.IPv4Address(host.decode("ascii"))
            else:
                return ipaddress.IPv4Address(host)
        except ValueError:
            return uridecode(host, "utf-8", errors).lower()

    def getport(self, default=None):
        """Return the port subcomponent of the URI authority as an
        :class:`int`, or `default` if the original URI reference did
        not contain a port or if the port was empty.

        """
        port = self.port
        if port:
            return int(port)
        else:
            return default

    def getpath(self, encoding="utf-8", errors="strict"):
        """Return the normalized decoded URI path."""
        path = self.__remove_dot_segments(self.path)
        return uridecode(path, encoding, errors)

    def getquery(self, default=None, encoding="utf-8", errors="strict"):
        """Return the decoded query string, or `default` if the original URI
        reference did not contain a query component.

        """
        query = self.query
        if query is None:
            return default
        else:
            return uridecode(query, encoding, errors)

    def getquerydict(self, sep="&", encoding="utf-8", errors="strict"):
        """Split the query component into individual `name=value` pairs
        separated by `sep` and return a dictionary of query variables.
        The dictionary keys are the unique query variable names and
        the values are lists of values for each name.

        """
        dict = collections.defaultdict(list)
        for name, value in self.getquerylist(sep, encoding, errors):
            dict[name].append(value)
        return dict

    def getquerylist(self, sep="&", encoding="utf-8", errors="strict"):
        """Split the query component into individual `name=value` pairs
        separated by `sep`, and return a list of `(name, value)`
        tuples.

        """
        if not self.query:
            return []
        elif isinstance(sep, type(self.query)):
            qsl = self.query.split(sep)
        elif isinstance(sep, bytes):
            qsl = self.query.split(sep.decode("ascii"))
        else:
            qsl = self.query.split(sep.encode("ascii"))
        items = []
        for parts in [qs.partition(self.EQ) for qs in qsl if qs]:
            name = uridecode(parts[0], encoding, errors)
            if parts[1]:
                value = uridecode(parts[2], encoding, errors)
            else:
                value = None
            items.append((name, value))
        return items

    def getfragment(self, default=None, encoding="utf-8", errors="strict"):
        """Return the decoded fragment identifier, or `default` if the
        original URI reference did not contain a fragment component.

        """
        fragment = self.fragment
        if fragment is None:
            return default
        else:
            return uridecode(fragment, encoding, errors)

    def isuri(self):
        """Return :const:`True` if this is a URI."""
        return self.scheme is not None

    def isabsuri(self):
        """Return :const:`True` if this is an absolute URI."""
        return self.scheme is not None and self.fragment is None

    def isnetpath(self):
        """Return :const:`True` if this is a network-path reference."""
        return self.scheme is None and self.authority is not None

    def isabspath(self):
        """Return :const:`True` if this is an absolute-path reference."""
        return (
            self.scheme is None
            and self.authority is None
            and self.path.startswith(self.SLASH)
        )

    def isrelpath(self):
        """Return :const:`True` if this is a relative-path reference."""
        return (
            self.scheme is None
            and self.authority is None
            and not self.path.startswith(self.SLASH)
        )

    def issamedoc(self):
        """Return :const:`True` if this is a same-document reference."""
        return (
            self.scheme is None
            and self.authority is None
            and not self.path
            and self.query is None
        )

    def transform(self, ref, strict=False):
        """Transform a URI reference relative to `self` into a
        :class:`SplitResult` representing its target URI.

        """
        scheme, authority, path, query, fragment = self.RE.match(ref).groups()

        # RFC 3986 5.2.2. Transform References
        if scheme is not None and (strict or scheme != self.scheme):
            path = self.__remove_dot_segments(path)
        elif authority is not None:
            scheme = self.scheme
            path = self.__remove_dot_segments(path)
        elif not path:
            scheme = self.scheme
            authority = self.authority
            path = self.path
            query = self.query if query is None else query
        elif path.startswith(self.SLASH):
            scheme = self.scheme
            authority = self.authority
            path = self.__remove_dot_segments(path)
        else:
            scheme = self.scheme
            authority = self.authority
            path = self.__remove_dot_segments(self.__merge(path))
        return type(self)(scheme, authority, path, query, fragment)

    def __merge(self, path):
        # RFC 3986 5.2.3. Merge Paths
        if self.authority is not None and not self.path:
            return self.SLASH + path
        else:
            parts = self.path.rpartition(self.SLASH)
            return parts[1].join((parts[0], path))

    @classmethod
    def __remove_dot_segments(cls, path):
        # RFC 3986 5.2.4. Remove Dot Segments
        pseg = []
        for s in path.split(cls.SLASH):
            if s == cls.DOT:
                continue
            elif s != cls.DOTDOT:
                pseg.append(s)
            elif len(pseg) == 1 and not pseg[0]:
                continue
            elif pseg and pseg[-1] != cls.DOTDOT:
                pseg.pop()
            else:
                pseg.append(s)
        # adjust for trailing '/.' or '/..'
        if path.rpartition(cls.SLASH)[2] in (cls.DOT, cls.DOTDOT):
            pseg.append(cls.EMPTY)
        if path and len(pseg) == 1 and pseg[0] == cls.EMPTY:
            pseg.insert(0, cls.DOT)
        return cls.SLASH.join(pseg)

    @classmethod
    def __parse_ip_literal(cls, address):
        # RFC 3986 3.2.2: In anticipation of future, as-yet-undefined
        # IP literal address formats, an implementation may use an
        # optional version flag to indicate such a format explicitly
        # rather than rely on heuristic determination.
        #
        #  IP-literal = "[" ( IPv6address / IPvFuture  ) "]"
        #
        #  IPvFuture  = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
        #
        # If a URI containing an IP-literal that starts with "v"
        # (case-insensitive), indicating that the version flag is
        # present, is dereferenced by an application that does not
        # know the meaning of that version flag, then the application
        # should return an appropriate error for "address mechanism
        # not supported".
        if isinstance(address, bytes):
            address = address.decode("ascii")
        if address.startswith("v"):
            raise ValueError("address mechanism not supported")
        return ipaddress.IPv6Address(address)


class SplitResultBytes(SplitResult):
    __slots__ = ()  # prevent creation of instance dictionary

    # RFC 3986 Appendix B
    RE = re.compile(
        rb"""
    (?:([A-Za-z][A-Za-z0-9+.-]*):)?  # scheme (RFC 3986 3.1)
    (?://([^/?#]*))?                 # authority
    ([^?#]*)                         # path
    (?:\?([^#]*))?                   # query
    (?:\#(.*))?
                                     # fragment
    """,
        flags=re.VERBOSE,
    )

    # RFC 3986 2.2 gen-delims
    COLON, SLASH, QUEST, HASH, LBRACKET, RBRACKET, AT = (
        b":",
        b"/",
        b"?",
        b"#",
        b"[",
        b"]",
        b"@",
    )

    # RFC 3986 3.3 dot-segments
    DOT, DOTDOT = b".", b".."

    EMPTY, EQ = b"", b"="

    DIGITS = b"0123456789"


class SplitResultString(SplitResult):
    __slots__ = ()  # prevent creation of instance dictionary

    # RFC 3986 Appendix B
    RE = re.compile(
        r"""
    (?:([A-Za-z][A-Za-z0-9+.-]*):)?  # scheme (RFC 3986 3.1)
    (?://([^/?#]*))?                 # authority
    ([^?#]*)                         # path
    (?:\?([^#]*))?                   # query
    (?:\#(.*))?                      # fragment
    """,
        flags=re.VERBOSE,
    )

    # RFC 3986 2.2 gen-delims
    COLON, SLASH, QUEST, HASH, LBRACKET, RBRACKET, AT = (
        ":",
        "/",
        "?",
        "#",
        "[",
        "]",
        "@",
    )

    # RFC 3986 3.3 dot-segments
    DOT, DOTDOT = ".", ".."

    EMPTY, EQ = "", "="

    DIGITS = "0123456789"


def uridefrag(uristring):
    """Remove an existing fragment component from a URI reference string."""
    if isinstance(uristring, bytes):
        parts = uristring.partition(b"#")
    else:
        parts = uristring.partition("#")
    return DefragResult(parts[0], parts[2] if parts[1] else None)


def urisplit(uristring):
    """Split a well-formed URI reference string into a tuple with five
    components corresponding to a URI's general structure::

      <scheme>://<authority>/<path>?<query>#<fragment>

    """
    if isinstance(uristring, bytes):
        result = SplitResultBytes
    else:
        result = SplitResultString
    return result(*result.RE.match(uristring).groups())


def uriunsplit(parts):
    """Combine the elements of a five-item iterable into a URI reference's
    string representation.

    """
    scheme, authority, path, query, fragment = parts
    if isinstance(path, bytes):
        result = SplitResultBytes
    else:
        result = SplitResultString
    return result(scheme, authority, path, query, fragment).geturi()


def urijoin(base, ref, strict=False):
    """Convert a URI reference relative to a base URI to its target URI
    string.

    """
    if isinstance(base, type(ref)):
        return urisplit(base).transform(ref, strict).geturi()
    elif isinstance(base, bytes):
        return urisplit(base.decode()).transform(ref, strict).geturi()
    else:
        return urisplit(base).transform(ref.decode(), strict).geturi()


def isuri(uristring):
    """Return :const:`True` if `uristring` is a URI."""
    return urisplit(uristring).isuri()


def isabsuri(uristring):
    """Return :const:`True` if `uristring` is an absolute URI."""
    return urisplit(uristring).isabsuri()


def isnetpath(uristring):
    """Return :const:`True` if `uristring` is a network-path reference."""
    return urisplit(uristring).isnetpath()


def isabspath(uristring):
    """Return :const:`True` if `uristring` is an absolute-path reference."""
    return urisplit(uristring).isabspath()


def isrelpath(uristring):
    """Return :const:`True` if `uristring` is a relative-path reference."""
    return urisplit(uristring).isrelpath()


def issamedoc(uristring):
    """Return :const:`True` if `uristring` is a same-document reference."""
    return urisplit(uristring).issamedoc()


# TBD: move compose to its own submodule?

# RFC 3986 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
_SCHEME_RE = re.compile(b"^[A-Za-z][A-Za-z0-9+.-]*$")

# RFC 3986 3.2: authority = [ userinfo "@" ] host [ ":" port ]
_AUTHORITY_RE_BYTES = re.compile(b"^(?:(.*)@)?(.*?)(?::([0-9]*))?$")
_AUTHORITY_RE_STR = re.compile("^(?:(.*)@)?(.*?)(?::([0-9]*))?$")

# safe component characters
_SAFE_USERINFO = SUB_DELIMS + ":"
_SAFE_HOST = SUB_DELIMS
_SAFE_PATH = SUB_DELIMS + ":@/"
_SAFE_QUERY = SUB_DELIMS + ":@/?"
_SAFE_FRAGMENT = SUB_DELIMS + ":@/?"


def _scheme(scheme):
    if _SCHEME_RE.match(scheme):
        return scheme.lower()
    else:
        raise ValueError("Invalid scheme component")


def _authority(userinfo, host, port, encoding):
    authority = []

    if userinfo is not None:
        authority.append(uriencode(userinfo, _SAFE_USERINFO, encoding))
        authority.append(b"@")

    if isinstance(host, ipaddress.IPv6Address):
        authority.append(b"[" + host.compressed.encode() + b"]")
    elif isinstance(host, ipaddress.IPv4Address):
        authority.append(host.compressed.encode())
    elif isinstance(host, bytes):
        authority.append(_host(host))
    elif host is not None:
        authority.append(_host(host.encode("utf-8")))

    if isinstance(port, numbers.Number):
        authority.append(_port(str(port).encode()))
    elif isinstance(port, bytes):
        authority.append(_port(port))
    elif port is not None:
        authority.append(_port(port.encode()))

    return b"".join(authority) if authority else None


def _ip_literal(address):
    if address.startswith("v"):
        raise ValueError("Address mechanism not supported")
    else:
        return b"[" + ipaddress.IPv6Address(address).compressed.encode() + b"]"


def _host(host):
    # RFC 3986 3.2.3: Although host is case-insensitive, producers and
    # normalizers should use lowercase for registered names and
    # hexadecimal addresses for the sake of uniformity, while only
    # using uppercase letters for percent-encodings.
    if host.startswith(b"[") and host.endswith(b"]"):
        return _ip_literal(host[1:-1].decode())
    # check for IPv6 addresses as returned by SplitResult.gethost()
    try:
        return _ip_literal(host.decode("utf-8"))
    except ValueError:
        return uriencode(host.lower(), _SAFE_HOST, "utf-8")


def _port(port):
    # RFC 3986 3.2.3: URI producers and normalizers should omit the
    # port component and its ":" delimiter if port is empty or if its
    # value would be the same as that of the scheme's default.
    if port.lstrip(b"0123456789"):
        raise ValueError("Invalid port subcomponent")
    elif port:
        return b":" + port
    else:
        return b""


def _querylist(items, sep, encoding):
    terms = []
    append = terms.append
    safe = _SAFE_QUERY.replace(sep, "")
    for key, value in items:
        name = uriencode(key, safe, encoding)
        if value is None:
            append(name)
        elif isinstance(value, (bytes, str)):
            append(name + b"=" + uriencode(value, safe, encoding))
        else:
            append(name + b"=" + uriencode(str(value), safe, encoding))
    return sep.encode("ascii").join(terms)


def _querydict(mapping, sep, encoding):
    items = []
    for key, value in mapping.items():
        if isinstance(value, (bytes, str)):
            items.append((key, value))
        elif isinstance(value, collections.abc.Iterable):
            items.extend([(key, v) for v in value])
        else:
            items.append((key, value))
    return _querylist(items, sep, encoding)


def uricompose(
    scheme=None,
    authority=None,
    path="",
    query=None,
    fragment=None,
    userinfo=None,
    host=None,
    port=None,
    querysep="&",
    encoding="utf-8",
):
    """Compose a URI reference string from its individual components."""

    # RFC 3986 3.1: Scheme names consist of a sequence of characters
    # beginning with a letter and followed by any combination of
    # letters, digits, plus ("+"), period ("."), or hyphen ("-").
    # Although schemes are case-insensitive, the canonical form is
    # lowercase and documents that specify schemes must do so with
    # lowercase letters.  An implementation should accept uppercase
    # letters as equivalent to lowercase in scheme names (e.g., allow
    # "HTTP" as well as "http") for the sake of robustness but should
    # only produce lowercase scheme names for consistency.
    if isinstance(scheme, bytes):
        scheme = _scheme(scheme)
    elif scheme is not None:
        scheme = _scheme(scheme.encode())

    # authority must be string type or three-item iterable
    if authority is None:
        authority = (None, None, None)
    elif isinstance(authority, bytes):
        authority = _AUTHORITY_RE_BYTES.match(authority).groups()
    elif isinstance(authority, str):
        authority = _AUTHORITY_RE_STR.match(authority).groups()
    elif not isinstance(authority, collections.abc.Iterable):
        raise TypeError("Invalid authority type")
    elif len(authority) != 3:
        raise ValueError("Invalid authority length")
    authority = _authority(
        userinfo if userinfo is not None else authority[0],
        host if host is not None else authority[1],
        port if port is not None else authority[2],
        encoding,
    )

    # RFC 3986 3.3: If a URI contains an authority component, then the
    # path component must either be empty or begin with a slash ("/")
    # character.  If a URI does not contain an authority component,
    # then the path cannot begin with two slash characters ("//").
    path = uriencode(path, _SAFE_PATH, encoding)
    if authority is not None and path and not path.startswith(b"/"):
        raise ValueError("Invalid path with authority component")
    if authority is None and path.startswith(b"//"):
        raise ValueError("Invalid path without authority component")

    # RFC 3986 4.2: A path segment that contains a colon character
    # (e.g., "this:that") cannot be used as the first segment of a
    # relative-path reference, as it would be mistaken for a scheme
    # name.  Such a segment must be preceded by a dot-segment (e.g.,
    # "./this:that") to make a relative-path reference.
    if scheme is None and authority is None and not path.startswith(b"/"):
        if b":" in path.partition(b"/")[0]:
            path = b"./" + path

    # RFC 3986 3.4: The characters slash ("/") and question mark ("?")
    # may represent data within the query component.
    # Beware that some older, erroneous implementations may not handle
    # such data correctly when it is used as the base URI for relative
    # references (Section 5.1), apparently because they fail to
    # distinguish query data from path data when looking for
    # hierarchical separators.  However, as query components are often
    # used to carry identifying information in the form of "key=value"
    # pairs and one frequently used value is a reference to another
    # URI, it is sometimes better for usability to avoid percent-
    # encoding those characters.
    if isinstance(query, (bytes, str)):
        query = uriencode(query, _SAFE_QUERY, encoding)
    elif isinstance(query, collections.abc.Mapping):
        query = _querydict(query, querysep, encoding)
    elif isinstance(query, collections.abc.Iterable):
        query = _querylist(query, querysep, encoding)
    elif query is not None:
        raise TypeError("Invalid query type")

    # RFC 3986 3.5: The characters slash ("/") and question mark ("?")
    # are allowed to represent data within the fragment identifier.
    # Beware that some older, erroneous implementations may not handle
    # this data correctly when it is used as the base URI for relative
    # references.
    if fragment is not None:
        fragment = uriencode(fragment, _SAFE_FRAGMENT, encoding)

    # return URI reference as `str`
    return uriunsplit((scheme, authority, path, query, fragment)).decode()
././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1716919522.8072033 uritools-4.0.3/src/uritools.egg-info/0000775000175000017500000000000014625416343016273 5ustar00tkemtkem
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1716919522.0 uritools-4.0.3/src/uritools.egg-info/PKG-INFO0000644000175000017500000001115514625416342017370 0ustar00tkemtkem
Metadata-Version: 2.1
Name: uritools
Version: 4.0.3
Summary: URI parsing, classification and composition
Home-page: https://github.com/tkem/uritools/
Author: Thomas Kemmer
Author-email: tkemmer@computer.org
License: MIT
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Other Environment
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.7
License-File: LICENSE

uritools
========================================================================

.. image:: https://img.shields.io/pypi/v/uritools
   :target: https://pypi.org/project/uritools
   :alt: Latest PyPI version

.. image:: https://img.shields.io/github/actions/workflow/status/tkem/uritools/ci.yml
   :target: https://github.com/tkem/uritools/actions/workflows/ci.yml
   :alt: CI build status

.. image:: https://img.shields.io/readthedocs/uritools
   :target: https://uritools.readthedocs.io
   :alt: Documentation build status

.. image:: https://img.shields.io/codecov/c/github/tkem/uritools/master.svg
   :target: https://codecov.io/gh/tkem/uritools
   :alt: Test coverage

.. image:: https://img.shields.io/librariesio/sourcerank/pypi/uritools
   :target: https://libraries.io/pypi/uritools
   :alt: Libraries.io SourceRank

.. image:: https://img.shields.io/github/license/tkem/uritools
   :target: https://raw.github.com/tkem/uritools/master/LICENSE
   :alt: License

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/psf/black
   :alt: Code style: black


This module provides RFC 3986 compliant functions for parsing,
classifying and composing URIs and URI references, largely replacing
the Python Standard Library's ``urllib.parse`` module.

.. code-block:: pycon

    >>> from uritools import uricompose, urijoin, urisplit, uriunsplit
    >>> uricompose(scheme='foo', host='example.com', port=8042,
    ...            path='/over/there', query={'name': 'ferret'},
    ...            fragment='nose')
    'foo://example.com:8042/over/there?name=ferret#nose'
    >>> parts = urisplit(_)
    >>> parts.scheme
    'foo'
    >>> parts.authority
    'example.com:8042'
    >>> parts.getport(default=80)
    8042
    >>> parts.getquerydict().get('name')
    ['ferret']
    >>> parts.isuri()
    True
    >>> parts.isabsuri()
    False
    >>> urijoin(uriunsplit(parts), '/right/here?name=swallow#beak')
    'foo://example.com:8042/right/here?name=swallow#beak'

For various reasons, ``urllib.parse`` and its Python 2 predecessor
``urlparse`` are not compliant with current Internet standards.  As
stated in `Lib/urllib/parse.py `_:

    RFC 3986 is considered the current standard and any future changes
    to urlparse module should conform with it.  The urlparse module is
    currently not entirely compliant with this RFC due to defacto
    scenarios for parsing, and for backward compatibility purposes,
    some parsing quirks from older RFCs are retained.

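To make the idea of RFC 3986 compliant splitting concrete, here is a minimal sketch that uses only the standard library and the generic regular expression from RFC 3986, Appendix B. The function name ``split_uri_reference`` is hypothetical and this is not how uritools is implemented; the package source uses a similar pattern with a stricter scheme rule.

```python
# Illustrative sketch only: `split_uri_reference` is a hypothetical
# helper built on the generic pattern from RFC 3986, Appendix B.
import re

URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri_reference(ref):
    """Return (scheme, authority, path, query, fragment) for `ref`.

    Components that are absent come back as None, which keeps them
    distinct from components that are present but empty.
    """
    groups = URI_RE.match(ref).groups()
    # Groups 2, 4, 5, 7 and 9 of the RFC pattern hold the components.
    return groups[1], groups[3], groups[4], groups[6], groups[8]


print(split_uri_reference("foo://example.com:8042/over/there?name=ferret#nose"))
# ('foo', 'example.com:8042', '/over/there', 'name=ferret', 'nose')
print(split_uri_reference("/over/there?"))
# (None, None, '/over/there', '', None)
```

The second call shows why undefined components are reported as ``None``: an empty query (``?`` with nothing after it) is not the same as no query at all, and recomposition needs that distinction to round-trip a reference.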
This module aims to provide fully RFC 3986 compliant replacements for
the most commonly used functions found in ``urllib.parse``.  It also
includes functions for distinguishing between the different forms of
URIs and URI references, and for conveniently creating URIs from their
individual components.


Installation
------------------------------------------------------------------------

uritools is available from PyPI_ and can be installed by running::

  pip install uritools


Project Resources
------------------------------------------------------------------------

- `Documentation`_
- `Issue tracker`_
- `Source code`_
- `Change log`_


License
------------------------------------------------------------------------

Copyright (c) 2014-2023 Thomas Kemmer.

Licensed under the `MIT License`_.


.. _PyPI: https://pypi.org/project/uritools/
.. _Documentation: https://uritools.readthedocs.io/
.. _Issue tracker: https://github.com/tkem/uritools/issues/
.. _Source code: https://github.com/tkem/uritools/
.. _Change log: https://github.com/tkem/uritools/blob/master/CHANGELOG.rst
.. _MIT License: https://raw.github.com/tkem/uritools/master/LICENSE
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1716919522.0 uritools-4.0.3/src/uritools.egg-info/SOURCES.txt0000664000175000017500000000074114625416342020160 0ustar00tkemtkem
CHANGELOG.rst
LICENSE
MANIFEST.in
README.rst
pyproject.toml
setup.cfg
setup.py
tox.ini
docs/.gitignore
docs/Makefile
docs/conf.py
docs/index.rst
src/uritools/__init__.py
src/uritools.egg-info/PKG-INFO
src/uritools.egg-info/SOURCES.txt
src/uritools.egg-info/dependency_links.txt
src/uritools.egg-info/top_level.txt
tests/__init__.py
tests/test_classify.py
tests/test_compose.py
tests/test_defrag.py
tests/test_encoding.py
tests/test_join.py
tests/test_split.py
tests/test_unsplit.py
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1716919522.0 uritools-4.0.3/src/uritools.egg-info/dependency_links.txt0000664000175000017500000000000114625416342022340 0ustar00tkemtkem

././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1716919522.0 uritools-4.0.3/src/uritools.egg-info/top_level.txt0000664000175000017500000000001114625416342021014 0ustar00tkemtkem
uritools
././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1716919522.8072033 uritools-4.0.3/tests/0000775000175000017500000000000014625416343013274 5ustar00tkemtkem
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1597080907.0 uritools-4.0.3/tests/__init__.py0000644000175000017500000000000013714302513015360 0ustar00tkemtkem
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1633976314.0 uritools-4.0.3/tests/test_classify.py0000664000175000017500000000576614131077772016541 0ustar00tkemtkem
import unittest

import uritools


class ClassifyTest(unittest.TestCase):
    def test_classification(self):
        cases = [
            ("", False, False, False, False, True, True),
            ("#", False, False, False, False, True, True),
            ("#f", False, False, False, False, True, True),
            ("?", False, False, False, False, True, False),
            ("?q", False, False, False, False, True, False),
            ("p", False, False, False, False, True, False),
            ("/p", False, False, False, True, False, False),
            ("/p?", False, False, False, True, False, False),
            ("/p?q", False, False, False, True, False, False),
            ("/p#", False, False, False, True, False, False),
            ("/p#f", False, False, False, True, False, False),
            ("/p?q#f", False, False, False, True, False, False),
            ("//", False, False, True, False, False, False),
            ("//n?", False, False, True, False, False, False),
            ("//n?q", False, False, True, False, False, False),
            ("//n#", False, False, True, False, False, False),
            ("//n#f", False, False, True, False, False, False),
            ("//n?q#f", False, False, True, False, False, False),
            ("s:", True, True, False, False, False, False),
            ("s:p", True, True, False, False, False, False),
            ("s:p?", True, True, False, False, False, False),
            ("s:p?q", True, True, False, False, False, False),
            ("s:p#", True, False, False, False, False, False),
            ("s:p#f", True, False, False, False, False, False),
            ("s://", True, True, False, False, False, False),
            ("s://h", True, True, False, False, False, False),
            ("s://h/", True, True, False, False, False, False),
            ("s://h/p", True, True, False, False, False, False),
            ("s://h/p?", True, True, False, False, False, False),
            ("s://h/p?q", True, True, False, False, False, False),
            ("s://h/p#", True, False, False, False, False, False),
            ("s://h/p#f", True, False, False, False, False, False),
        ]
        for s, uri, absuri, netpath, abspath, relpath, samedoc in cases:
            for ref in [s, s.encode("ascii")]:
                parts = uritools.urisplit(ref)
                self.assertEqual(parts.isuri(), uri)
                self.assertEqual(parts.isabsuri(), absuri)
                self.assertEqual(parts.isnetpath(), netpath)
                self.assertEqual(parts.isabspath(), abspath)
                self.assertEqual(parts.isrelpath(), relpath)
                self.assertEqual(parts.issamedoc(), samedoc)
                self.assertEqual(uritools.isuri(ref), uri)
                self.assertEqual(uritools.isabsuri(ref), absuri)
                self.assertEqual(uritools.isnetpath(ref), netpath)
                self.assertEqual(uritools.isabspath(ref), abspath)
                self.assertEqual(uritools.isrelpath(ref), relpath)
                self.assertEqual(uritools.issamedoc(ref), samedoc)
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1633976314.0 uritools-4.0.3/tests/test_compose.py0000664000175000017500000002363614131077772016365 0ustar00tkemtkem
import ipaddress
import unittest

from uritools import uricompose


class ComposeTest(unittest.TestCase):
    def check(self, uri, **kwargs):
        result = uricompose(**kwargs)
        self.assertEqual(
            uri, result, msg="%r != %r (kwargs=%r)" % (uri, result, kwargs)
        )

    def test_rfc3986(self):
        """uricompose test cases from [RFC3986] 3. Syntax Components"""
        self.check(
            "foo://example.com:42/over/there?name=ferret#nose",
            scheme="foo",
            authority="example.com:42",
            path="/over/there",
            query="name=ferret",
            fragment="nose",
        )
        self.check(
            "urn:example:animal:ferret:nose",
            scheme="urn",
            path="example:animal:ferret:nose",
        )

    def test_scheme(self):
        cases = [
            ("foo+bar:", "foo+bar"),
            ("foo+bar:", b"foo+bar"),
            ("foo+bar:", "FOO+BAR"),
            ("foo+bar:", b"FOO+BAR"),
        ]
        for uri, scheme in cases:
            self.check(uri, scheme=scheme)
        # invalid scheme
        for scheme in ("", "foo:", "\xf6lk\xfcrbis"):
            with self.assertRaises(ValueError, msg="scheme=%r" % scheme):
                uricompose(scheme=scheme)

    def test_authority(self):
        cases = [
            ("", None),
            ("//", ""),
            ("//", b""),
            ("//example.com", "example.com"),
            ("//example.com", b"example.com"),
            ("//example.com", "example.com:"),
            ("//example.com", b"example.com:"),
            ("//user@example.com", "user@example.com"),
            ("//user@example.com", b"user@example.com"),
            ("//example.com:42", "example.com:42"),
            ("//example.com:42", b"example.com:42"),
            ("//user@example.com:42", "user@example.com:42"),
            ("//user@example.com:42", b"user@example.com:42"),
            ("//user@127.0.0.1:42", "user@127.0.0.1:42"),
            ("//user@127.0.0.1:42", b"user@127.0.0.1:42"),
            ("//user@[::1]:42", "user@[::1]:42"),
            ("//user@[::1]:42", b"user@[::1]:42"),
            ("//user:c2VjcmV0@example.com", "user:c2VjcmV0@example.com"),
            ("//user:c2VjcmV0@example.com", b"user:c2VjcmV0@example.com"),
        ]
        for uri, authority in cases:
            self.check(uri, authority=authority)
        # invalid authority type
        for authority in (True, 42, 3.14, ipaddress.IPv6Address("::1")):
            with self.assertRaises(TypeError, msg="authority=%r" % authority):
                uricompose(authority=authority)

    def test_authority_kwargs(self):
        from ipaddress import IPv4Address, IPv6Address

        cases = [
            ("", [None, None, None]),
            ("//", [None, "", None]),
            ("//", [None, b"", None]),
            ("//example.com", [None, "example.com", None]),
            ("//example.com", [None, b"example.com", None]),
            ("//example.com", [None, "example.com", ""]),
            ("//example.com", [None, "example.com", b""]),
            ("//user@example.com", ["user", "example.com", None]),
            ("//user@example.com", [b"user", "example.com", None]),
            ("//user@example.com", [b"user", b"example.com", None]),
            ("//example.com:42", [None, "example.com", "42"]),
            ("//example.com:42", [None, b"example.com", "42"]),
            ("//example.com:42", [None, b"example.com", b"42"]),
            ("//example.com:42", [None, "example.com", 42]),
            ("//example.com:42", [None, b"example.com", 42]),
            ("//user@example.com:42", ["user", "example.com", "42"]),
            ("//user@example.com:42", [b"user", "example.com", "42"]),
            ("//user@example.com:42", [b"user", b"example.com", "42"]),
            ("//user@example.com:42", [b"user", b"example.com", b"42"]),
            ("//user@example.com:42", ["user", "example.com", 42]),
            ("//user@example.com:42", [b"user", "example.com", 42]),
            ("//user@example.com:42", [b"user", b"example.com", 42]),
            ("//user@127.0.0.1:42", ["user", "127.0.0.1", 42]),
            ("//user@127.0.0.1:42", ["user", b"127.0.0.1", 42]),
            ("//user@127.0.0.1:42", ["user", IPv4Address("127.0.0.1"), 42]),
            ("//user@[::1]:42", ["user", "::1", 42]),
            ("//user@[::1]:42", ["user", b"::1", 42]),
            ("//user@[::1]:42", ["user", "[::1]", 42]),
            ("//user@[::1]:42", ["user", b"[::1]", 42]),
            ("//user@[::1]:42", ["user", IPv6Address("::1"), 42]),
        ]
        for uri, authority in cases:
            self.check(uri, authority=authority)
            userinfo, host, port = authority
            self.check(uri, userinfo=userinfo, host=host, port=port)
        # invalid authority value
        for authority in ([], ["foo"], ["foo", "bar"], range(4)):
            with self.assertRaises(ValueError, msg="authority=%r" % authority):
                uricompose(authority=authority)
        # invalid host type
        for host in (True, 42, 3.14, ipaddress.IPv6Network("2001:db00::0/24")):
            with self.assertRaises(AttributeError, msg="host=%r" % host):
                uricompose(authority=[None, host, None])
            with self.assertRaises(AttributeError, msg="host=%r" % host):
                uricompose(host=host)
        # invalid host ip-literal
        for host in ("[foo]", "[v1.x]"):
            with self.assertRaises(ValueError, msg="host=%r" % host):
                uricompose(authority=[None, host, None])
            with self.assertRaises(ValueError, msg="host=%r" % host):
                uricompose(host=host)
        # invalid port value
        for port in (-1, "foo", 3.14):
            with self.assertRaises(ValueError, msg="port=%r" % port):
                uricompose(authority=[None, "", port])
            with self.assertRaises(ValueError, msg="port=%r" % port):
                uricompose(port=port)

    def test_authority_override(self):
        cases = [
            ("//user@example.com:42", None, "user", "example.com", 42),
            ("//user@example.com:42", "", "user", "example.com", 42),
            ("//user@example.com:42", "example.com", "user", None, 42),
            ("//user@example.com:42", "user@:42", None, "example.com", None),
        ]
        for uri, authority, userinfo, host, port in cases:
            self.check(
                uri, authority=authority, userinfo=userinfo, host=host, port=port
            )

    def test_host_lowercase(self):
        cases = [
            ("//hostname", "HostName"),
            ("//[2001:db8::1]", "[2001:DB8::1]"),
            (
                "//uuid%3A228f0766-a241-4050-a7a8-2c153073e3d7",
                "UUID:228F0766-A241-4050-A7A8-2C153073E3D7",
            ),
        ]
        for uri, host in cases:
            self.check(uri, host=host)

    def test_path(self):
        cases = [
            ("foo", "foo"),
            ("foo", b"foo"),
            ("foo+bar", "foo+bar"),
            ("foo+bar", b"foo+bar"),
            ("foo%20bar", "foo bar"),
            ("foo%20bar", b"foo bar"),
            ("./this:that", "this:that"),
            ("./this:that", b"this:that"),
            ("./this:that/",
"this:that/"), ("./this:that/", b"this:that/"), ] for uri, path in cases: self.check(uri, path=path) # invalid path with authority for path in ("foo", b"foo"): with self.assertRaises(ValueError, msg="path=%r" % path): uricompose(authority="auth", path=path) # invalid path without authority for path in ("//", b"//", "//foo", b"//foo"): with self.assertRaises(ValueError, msg="path=%r" % path): uricompose(path=path) def test_query(self): from collections import OrderedDict as od cases = [ ("?", ""), ("?", b""), ("?", []), ("?", {}), ("?name", "name"), ("?name", b"name"), ("?name", [("name", None)]), ("?name", [(b"name", None)]), ("?name", {"name": None}), ("?name", {b"name": None}), ("?name=foo", "name=foo"), ("?name=foo", b"name=foo"), ("?name=foo", [("name", "foo")]), ("?name=foo", [("name", b"foo")]), ("?name=foo", [(b"name", b"foo")]), ("?name=foo", {"name": "foo"}), ("?name=foo", {"name": b"foo"}), ("?name=foo", {"name": ["foo"]}), ("?name=foo", {"name": [b"foo"]}), ("?name=foo", {b"name": b"foo"}), ("?name=foo", {b"name": [b"foo"]}), ("?name=42", [("name", 42)]), ("?name=42", {"name": 42}), ("?name=42", {"name": [42]}), ("?name=foo&type=bar", [("name", "foo"), ("type", "bar")]), ("?name=foo&type=bar", od([("name", "foo"), ("type", "bar")])), ("?name=foo&name=bar", [("name", "foo"), ("name", "bar")]), ("?name=foo&name=bar", {"name": ["foo", "bar"]}), ("?name=a/b/c", dict(name="a/b/c")), ("?name=a:b:c", dict(name="a:b:c")), ("?name=a?b?c", dict(name="a?b?c")), ("?name=a@b@c", dict(name="a@b@c")), ("?name=a;b;c", dict(name="a;b;c")), ("?name=a%23b%23c", dict(name="a#b#c")), ("?name=a%26b%26c", dict(name="a&b&c")), ] for uri, query in cases: self.check(uri, query=query) # invalid query type for query in (0, [1]): with self.assertRaises(TypeError, msg="query=%r" % query): uricompose(query=query) def test_query_sep(self): cases = [ ("&", "?x=foo&y=bar", [("x", "foo"), ("y", "bar")]), (";", "?x=foo;y=bar", [("x", "foo"), ("y", "bar")]), ("&", "?x=foo;y=bar", [("x", 
"foo;y=bar")]), (";", "?x=foo&y=bar", [("x", "foo&y=bar")]), ] for sep, uri, query in cases: self.check(uri, query=query, querysep=sep) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1633976314.0 uritools-4.0.3/tests/test_defrag.py0000664000175000017500000000362514131077772016144 0ustar00tkemtkemimport unittest from uritools import uridefrag class DefragTest(unittest.TestCase): def test_uridefrag(self): cases = [ ("http://python.org#frag", "http://python.org", "frag"), ("http://python.org", "http://python.org", None), ("http://python.org/#frag", "http://python.org/", "frag"), ("http://python.org/", "http://python.org/", None), ("http://python.org/?q#frag", "http://python.org/?q", "frag"), ("http://python.org/?q", "http://python.org/?q", None), ("http://python.org/p#frag", "http://python.org/p", "frag"), ("http://python.org/p?q", "http://python.org/p?q", None), ("http://python.org#", "http://python.org", ""), ("http://python.org/#", "http://python.org/", ""), ("http://python.org/?q#", "http://python.org/?q", ""), ("http://python.org/p?q#", "http://python.org/p?q", ""), ] def encode(s): return s.encode() if s is not None else None cases += list(map(encode, case) for case in cases) for uri, base, fragment in cases: defrag = uridefrag(uri) self.assertEqual(defrag, (base, fragment)) self.assertEqual(defrag.uri, base) self.assertEqual(defrag.fragment, fragment) self.assertEqual(uri, defrag.geturi()) def test_getfragment(self): self.assertEqual(uridefrag("").getfragment(), None) self.assertEqual(uridefrag(b"").getfragment(), None) self.assertEqual(uridefrag("#").getfragment(), "") self.assertEqual(uridefrag(b"#").getfragment(), "") self.assertEqual(uridefrag("#foo").getfragment(), "foo") self.assertEqual(uridefrag(b"#foo").getfragment(), "foo") self.assertEqual(uridefrag("#foo%20bar").getfragment(), "foo bar") self.assertEqual(uridefrag(b"#foo%20bar").getfragment(), "foo bar") 
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1633976314.0 uritools-4.0.3/tests/test_encoding.py0000664000175000017500000000527414131077772016504 0ustar00tkemtkemimport unittest from uritools import RESERVED, UNRESERVED, uridecode, uriencode class EncodingTest(unittest.TestCase): def check(self, decoded, encoded, safe="", encoding="utf-8"): self.assertEqual(uriencode(decoded, safe, encoding), encoded) self.assertEqual(uridecode(encoded, encoding), decoded) # swap bytes/string types self.assertEqual( uriencode(decoded.encode(encoding), safe, encoding), encoded ) # noqa self.assertEqual(uridecode(encoded.decode("ascii"), encoding), decoded) def test_encoding(self): cases = [ ("", b""), (" ", b"%20"), ("%", b"%25"), ("~", b"~"), (UNRESERVED, UNRESERVED.encode("ascii")), ] for decoded, encoded in cases: self.check(decoded, encoded) def test_safe_encoding(self): cases = [ ("", b"", ""), ("", b"", b""), (" ", b" ", " "), (" ", b" ", b" "), ("%", b"%", "%"), ("%", b"%", b"%"), (RESERVED, RESERVED.encode("ascii"), RESERVED), ] for decoded, encoded, safe in cases: self.check(decoded, encoded, safe) def test_utf8_encoding(self): cases = [("\xf6lk\xfcrbis", b"%C3%B6lk%C3%BCrbis")] for decoded, encoded in cases: self.check(decoded, encoded, encoding="utf-8") def test_latin1_encoding(self): cases = [("\xf6lk\xfcrbis", b"%F6lk%FCrbis")] for decoded, encoded in cases: self.check(decoded, encoded, encoding="latin-1") def test_idna_encoding(self): cases = [("\xf6lk\xfcrbis", b"xn--lkrbis-vxa4c")] for decoded, encoded in cases: self.check(decoded, encoded, encoding="idna") def test_decode_bytes(self): cases = [ ("%F6lk%FCrbis", b"\xf6lk\xfcrbis"), (b"%F6lk%FCrbis", b"\xf6lk\xfcrbis"), ] for input, output in cases: self.assertEqual(uridecode(input, encoding=None), output) def test_encode_bytes(self): cases = [(b"\xf6lk\xfcrbis", b"%F6lk%FCrbis")] for input, output in cases: self.assertEqual(uriencode(input, encoding=None), output) def 
test_decode_errors(self): cases = [ (UnicodeError, b"%FF", "utf-8"), ] for exception, string, encoding in cases: self.assertRaises(exception, uridecode, string, encoding) def test_encode_errors(self): cases = [ (UnicodeError, "\xff", b"", "ascii"), ] for exception, string, safe, encoding in cases: self.assertRaises(exception, uriencode, string, safe, encoding) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1693423911.0 uritools-4.0.3/tests/test_join.py0000664000175000017500000001125414473714447015656 0ustar00tkemtkemimport unittest from uritools import urijoin class JoinTest(unittest.TestCase): RFC3986_BASE = "http://a/b/c/d;p?q" def check(self, base, ref, expected, strict=False): self.assertEqual(expected, urijoin(base, ref, strict)) # base as bytes, ref as str self.assertEqual(expected, urijoin(base.encode(), ref, strict)) # base as str, ref as bytes self.assertEqual(expected, urijoin(base, ref.encode(), strict)) # both base and ref as bytes self.assertEqual( expected.encode(), urijoin(base.encode(), ref.encode(), strict) ) def test_rfc3986_normal(self): """urijoin test cases from RFC 3986 5.4.1. 
Normal Examples""" self.check(self.RFC3986_BASE, "g:h", "g:h") self.check(self.RFC3986_BASE, "g", "http://a/b/c/g") self.check(self.RFC3986_BASE, "./g", "http://a/b/c/g") self.check(self.RFC3986_BASE, "g/", "http://a/b/c/g/") self.check(self.RFC3986_BASE, "/g", "http://a/g") self.check(self.RFC3986_BASE, "//g", "http://g") self.check(self.RFC3986_BASE, "?y", "http://a/b/c/d;p?y") self.check(self.RFC3986_BASE, "g?y", "http://a/b/c/g?y") self.check(self.RFC3986_BASE, "#s", "http://a/b/c/d;p?q#s") self.check(self.RFC3986_BASE, "g#s", "http://a/b/c/g#s") self.check(self.RFC3986_BASE, "g?y#s", "http://a/b/c/g?y#s") self.check(self.RFC3986_BASE, ";x", "http://a/b/c/;x") self.check(self.RFC3986_BASE, "g;x", "http://a/b/c/g;x") self.check(self.RFC3986_BASE, "g;x?y#s", "http://a/b/c/g;x?y#s") self.check(self.RFC3986_BASE, "", "http://a/b/c/d;p?q") self.check(self.RFC3986_BASE, ".", "http://a/b/c/") self.check(self.RFC3986_BASE, "./", "http://a/b/c/") self.check(self.RFC3986_BASE, "..", "http://a/b/") self.check(self.RFC3986_BASE, "../", "http://a/b/") self.check(self.RFC3986_BASE, "../g", "http://a/b/g") self.check(self.RFC3986_BASE, "../..", "http://a/") self.check(self.RFC3986_BASE, "../../", "http://a/") self.check(self.RFC3986_BASE, "../../g", "http://a/g") def test_rfc3986_abnormal(self): """urijoin test cases from RFC 3986 5.4.2. 
Abnormal Examples""" self.check(self.RFC3986_BASE, "../../../g", "http://a/g") self.check(self.RFC3986_BASE, "../../../../g", "http://a/g") self.check(self.RFC3986_BASE, "/./g", "http://a/g") self.check(self.RFC3986_BASE, "/../g", "http://a/g") self.check(self.RFC3986_BASE, "g.", "http://a/b/c/g.") self.check(self.RFC3986_BASE, ".g", "http://a/b/c/.g") self.check(self.RFC3986_BASE, "g..", "http://a/b/c/g..") self.check(self.RFC3986_BASE, "..g", "http://a/b/c/..g") self.check(self.RFC3986_BASE, "./../g", "http://a/b/g") self.check(self.RFC3986_BASE, "./g/.", "http://a/b/c/g/") self.check(self.RFC3986_BASE, "g/./h", "http://a/b/c/g/h") self.check(self.RFC3986_BASE, "g/../h", "http://a/b/c/h") self.check(self.RFC3986_BASE, "g;x=1/./y", "http://a/b/c/g;x=1/y") self.check(self.RFC3986_BASE, "g;x=1/../y", "http://a/b/c/y") self.check(self.RFC3986_BASE, "g?y/./x", "http://a/b/c/g?y/./x") self.check(self.RFC3986_BASE, "g?y/../x", "http://a/b/c/g?y/../x") self.check(self.RFC3986_BASE, "g#s/./x", "http://a/b/c/g#s/./x") self.check(self.RFC3986_BASE, "g#s/../x", "http://a/b/c/g#s/../x") self.check(self.RFC3986_BASE, "http:g", "http:g", True) self.check(self.RFC3986_BASE, "http:g", "http://a/b/c/g", False) def test_rfc3986_merge(self): """urijoin test cases for RFC 3986 5.2.3. 
Merge Paths""" self.check("http://a", "b", "http://a/b") def test_relative_base(self): self.check("", "bar", "bar") self.check("foo", "bar", "bar") self.check("foo/", "bar", "foo/bar") self.check(".", "bar", "bar") self.check("./", "bar", "bar") self.check("./foo", "bar", "bar") self.check("./foo/", "bar", "foo/bar") self.check("..", "bar", "bar") self.check("../", "bar", "../bar") self.check("../foo", "bar", "../bar") self.check("../foo/", "bar", "../foo/bar") self.check("", "../bar", "../bar") self.check("foo", "../bar", "../bar") self.check("foo/", "../bar", "bar") self.check(".", "../bar", "../bar") self.check("./", "../bar", "../bar") self.check("./foo", "../bar", "../bar") self.check("./foo/", "../bar", "bar") self.check("..", "../bar", "../bar") self.check("../", "../bar", "../../bar") self.check("../foo", "../bar", "../../bar") self.check("../foo/", "../bar", "../bar") ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1716919411.0 uritools-4.0.3/tests/test_split.py0000664000175000017500000005503714625416163016052 0ustar00tkemtkemimport unittest from uritools import urisplit class SplitTest(unittest.TestCase): def check(self, uri, parts, decoded=None): result = urisplit(uri) self.assertEqual(result, parts, "Error parsing %r" % uri) self.assertEqual(result.geturi(), uri, "Error recomposing %r" % uri) def test_rfc3986(self): """urisplit test cases from [RFC3986] 3. 
Syntax Components""" cases = [ ( "foo://example.com:8042/over/there?name=ferret#nose", ("foo", "example.com:8042", "/over/there", "name=ferret", "nose"), ), ( "urn:example:animal:ferret:nose", ("urn", None, "example:animal:ferret:nose", None, None), ), ( b"foo://example.com:8042/over/there?name=ferret#nose", (b"foo", b"example.com:8042", b"/over/there", b"name=ferret", b"nose"), ), ( b"urn:example:animal:ferret:nose", (b"urn", None, b"example:animal:ferret:nose", None, None), ), ] for uri, parts in cases: self.check(uri, parts) def test_abnormal(self): cases = [ ("", (None, None, "", None, None)), (":", (None, None, ":", None, None)), (":/", (None, None, ":/", None, None)), ("://", (None, None, "://", None, None)), ("://?", (None, None, "://", "", None)), ("://#", (None, None, "://", None, "")), ("://?#", (None, None, "://", "", "")), ("//", (None, "", "", None, None)), ("//:", (None, ":", "", None, None)), ("///", (None, "", "/", None, None)), ("//?", (None, "", "", "", None)), ("//#", (None, "", "", None, "")), ("//?#", (None, "", "", "", "")), ("?", (None, None, "", "", None)), ("??", (None, None, "", "?", None)), ("?#", (None, None, "", "", "")), ("#", (None, None, "", None, "")), ("##", (None, None, "", None, "#")), ("?:", (None, None, "", ":", None)), ("#:", (None, None, "", None, ":")), ("_:", (None, None, "_:", None, None)), (b"", (None, None, b"", None, None)), (b":", (None, None, b":", None, None)), (b":/", (None, None, b":/", None, None)), (b"://", (None, None, b"://", None, None)), (b"://?", (None, None, b"://", b"", None)), (b"://#", (None, None, b"://", None, b"")), (b"://?#", (None, None, b"://", b"", b"")), (b"//", (None, b"", b"", None, None)), (b"//:", (None, b":", b"", None, None)), (b"///", (None, b"", b"/", None, None)), (b"//?", (None, b"", b"", b"", None)), (b"//#", (None, b"", b"", None, b"")), (b"//?#", (None, b"", b"", b"", b"")), (b"?", (None, None, b"", b"", None)), (b"??", (None, None, b"", b"?", None)), (b"?#", (None, None, b"", b"", 
b"")), (b"#", (None, None, b"", None, b"")), (b"##", (None, None, b"", None, b"#")), (b"?:", (None, None, b"", b":", None)), (b"#:", (None, None, b"", None, b":")), (b"_:", (None, None, b"_:", None, None)), ] for uri, parts in cases: self.check(uri, parts) def test_members(self): uri = "foo://user@example.com:8042/over/there?name=ferret#nose" result = urisplit(uri) self.assertEqual(result.scheme, "foo") self.assertEqual(result.authority, "user@example.com:8042") self.assertEqual(result.path, "/over/there") self.assertEqual(result.query, "name=ferret") self.assertEqual(result.fragment, "nose") self.assertEqual(result.userinfo, "user") self.assertEqual(result.host, "example.com") self.assertEqual(result.port, "8042") self.assertEqual(result.geturi(), uri) self.assertEqual(result.getscheme(), "foo") self.assertEqual(result.getauthority(), ("user", "example.com", 8042)) self.assertEqual(result.getuserinfo(), "user") self.assertEqual(result.gethost(), "example.com") self.assertEqual(result.getport(), 8042) self.assertEqual(result.getpath(), "/over/there") self.assertEqual(result.getquery(), "name=ferret") self.assertEqual(dict(result.getquerydict()), {"name": ["ferret"]}) self.assertEqual(list(result.getquerylist()), [("name", "ferret")]) self.assertEqual(result.getfragment(), "nose") uri = "urn:example:animal:ferret:nose" result = urisplit(uri) self.assertEqual(result.scheme, "urn") self.assertEqual(result.authority, None) self.assertEqual(result.path, "example:animal:ferret:nose") self.assertEqual(result.query, None) self.assertEqual(result.fragment, None) self.assertEqual(result.userinfo, None) self.assertEqual(result.host, None) self.assertEqual(result.port, None) self.assertEqual(result.geturi(), uri) self.assertEqual(result.getscheme(), "urn") self.assertEqual(result.getauthority(), (None, None, None)) self.assertEqual(result.getuserinfo(), None) self.assertEqual(result.gethost(), None) self.assertEqual(result.getport(), None) self.assertEqual(result.getpath(), 
"example:animal:ferret:nose") self.assertEqual(result.getquery(), None) self.assertEqual(dict(result.getquerydict()), {}) self.assertEqual(list(result.getquerylist()), []) self.assertEqual(result.getfragment(), None) uri = "file:///" result = urisplit(uri) self.assertEqual(result.scheme, "file") self.assertEqual(result.authority, "") self.assertEqual(result.path, "/") self.assertEqual(result.query, None) self.assertEqual(result.fragment, None) self.assertEqual(result.userinfo, None) self.assertEqual(result.host, "") self.assertEqual(result.port, None) self.assertEqual(result.geturi(), uri) self.assertEqual(result.getscheme(), "file") self.assertEqual(result.getauthority(), (None, "", None)) self.assertEqual(result.getuserinfo(), None) self.assertEqual(result.gethost(), "") self.assertEqual(result.getport(), None) self.assertEqual(result.getpath(), "/") self.assertEqual(result.getquery(), None) self.assertEqual(dict(result.getquerydict()), {}) self.assertEqual(list(result.getquerylist()), []) self.assertEqual(result.getfragment(), None) uri = b"foo://user@example.com:8042/over/there?name=ferret#nose" result = urisplit(uri) self.assertEqual(result.scheme, b"foo") self.assertEqual(result.authority, b"user@example.com:8042") self.assertEqual(result.path, b"/over/there") self.assertEqual(result.query, b"name=ferret") self.assertEqual(result.fragment, b"nose") self.assertEqual(result.userinfo, b"user") self.assertEqual(result.host, b"example.com") self.assertEqual(result.port, b"8042") self.assertEqual(result.geturi(), uri) self.assertEqual(result.getscheme(), "foo") self.assertEqual(result.getauthority(), ("user", "example.com", 8042)) self.assertEqual(result.getuserinfo(), "user") self.assertEqual(result.gethost(), "example.com") self.assertEqual(result.getport(), 8042) self.assertEqual(result.getpath(), "/over/there") self.assertEqual(result.getquery(), "name=ferret") self.assertEqual(dict(result.getquerydict()), {"name": ["ferret"]}) 
self.assertEqual(list(result.getquerylist()), [("name", "ferret")]) self.assertEqual(result.getfragment(), "nose") uri = b"urn:example:animal:ferret:nose" result = urisplit(uri) self.assertEqual(result.scheme, b"urn") self.assertEqual(result.authority, None) self.assertEqual(result.path, b"example:animal:ferret:nose") self.assertEqual(result.query, None) self.assertEqual(result.fragment, None) self.assertEqual(result.userinfo, None) self.assertEqual(result.host, None) self.assertEqual(result.port, None) self.assertEqual(result.geturi(), uri) self.assertEqual(result.getscheme(), "urn") self.assertEqual(result.getauthority(), (None, None, None)) self.assertEqual(result.getuserinfo(), None) self.assertEqual(result.gethost(), None) self.assertEqual(result.getport(), None) self.assertEqual(result.getpath(), "example:animal:ferret:nose") self.assertEqual(result.getquery(), None) self.assertEqual(dict(result.getquerydict()), {}) self.assertEqual(list(result.getquerylist()), []) self.assertEqual(result.getfragment(), None) uri = b"file:///" result = urisplit(uri) self.assertEqual(result.scheme, b"file") self.assertEqual(result.authority, b"") self.assertEqual(result.path, b"/") self.assertEqual(result.query, None) self.assertEqual(result.fragment, None) self.assertEqual(result.userinfo, None) self.assertEqual(result.host, b"") self.assertEqual(result.port, None) self.assertEqual(result.geturi(), uri) self.assertEqual(result.getscheme(), "file") self.assertEqual(result.getauthority(), (None, "", None)) self.assertEqual(result.getuserinfo(), None) self.assertEqual(result.gethost(), "") self.assertEqual(result.getport(), None) self.assertEqual(result.getpath(), "/") self.assertEqual(result.getquery(), None) self.assertEqual(dict(result.getquerydict()), {}) self.assertEqual(list(result.getquerylist()), []) self.assertEqual(result.getfragment(), None) def test_getscheme(self): self.assertEqual(urisplit("foo").getscheme(default="bar"), "bar") 
self.assertEqual(urisplit("foo:").getscheme(default="bar"), "foo") self.assertEqual(urisplit("FOO:").getscheme(default="bar"), "foo") self.assertEqual(urisplit("FOO_BAR:/").getscheme(default="x"), "x") self.assertEqual(urisplit(b"foo").getscheme(default="bar"), "bar") self.assertEqual(urisplit(b"foo:").getscheme(default="bar"), "foo") self.assertEqual(urisplit(b"FOO:").getscheme(default="bar"), "foo") self.assertEqual(urisplit(b"FOO_BAR:/").getscheme(default="x"), "x") def test_getauthority(self): from ipaddress import IPv4Address, IPv6Address cases = [ ("urn:example:animal:ferret:nose", None, (None, None, None)), ("file:///", None, (None, "", None)), ( "http://userinfo@Test.python.org:5432/foo/", None, ("userinfo", "test.python.org", 5432), ), ( "http://userinfo@12.34.56.78:5432/foo/", None, ("userinfo", IPv4Address("12.34.56.78"), 5432), ), ( "http://userinfo@[::1]:5432/foo/", None, ("userinfo", IPv6Address("::1"), 5432), ), ( "urn:example:animal:ferret:nose", ("nobody", "localhost", 42), ("nobody", "localhost", 42), ), ("file:///", ("nobody", "localhost", 42), ("nobody", "localhost", 42)), ( "http://Test.python.org/foo/", ("nobody", "localhost", 42), ("nobody", "test.python.org", 42), ), ( "http://userinfo@Test.python.org/foo/", ("nobody", "localhost", 42), ("userinfo", "test.python.org", 42), ), ( "http://Test.python.org:5432/foo/", ("nobody", "localhost", 42), ("nobody", "test.python.org", 5432), ), ( "http://userinfo@Test.python.org:5432/foo/", ("nobody", "localhost", 42), ("userinfo", "test.python.org", 5432), ), ] for uri, default, authority in cases: self.assertEqual(urisplit(uri).getauthority(default), authority) for uri in ["http://[::1/", "http://::1]/"]: with self.assertRaises(ValueError, msg="%r" % uri): urisplit(uri).getauthority() with self.assertRaises(ValueError, msg="%r" % uri): urisplit(uri.encode()).getauthority() with self.assertRaises(TypeError, msg="%r" % uri): urisplit("").getauthority(42) with self.assertRaises(ValueError, msg="%r" % uri): 
urisplit("").getauthority(("userinfo", "test.python.org")) def test_gethost(self): from ipaddress import IPv4Address, IPv6Address cases = [ ("http://Test.python.org:5432/foo/", "test.python.org"), ("http://12.34.56.78:5432/foo/", IPv4Address("12.34.56.78")), ("http://[::1]:5432/foo/", IPv6Address("::1")), ] for uri, host in cases: self.assertEqual(urisplit(uri).gethost(), host) self.assertEqual(urisplit(uri.encode()).gethost(), host) for uri in ["http://[::1/", "http://::1]/"]: with self.assertRaises(ValueError, msg="%r" % uri): urisplit(uri).gethost() with self.assertRaises(ValueError, msg="%r" % uri): urisplit(uri.encode()).gethost() def test_getport(self): for uri in ["foo://bar", "foo://bar:", "foo://bar/", "foo://bar:/"]: result = urisplit(uri) if result.authority.endswith(":"): self.assertEqual(result.port, "") else: self.assertEqual(result.port, None) self.assertEqual(result.gethost(), "bar") self.assertEqual(result.getport(8000), 8000) def test_getpath(self): cases = [ ("", "", "/"), (".", "./", "/"), ("./", "./", "/"), ("./.", "./", "/"), ("./..", "../", "/"), ("./foo", "foo", "/foo"), ("./foo/", "foo/", "/foo/"), ("./foo/.", "foo/", "/foo/"), ("./foo/..", "./", "/"), ("..", "../", "/"), ("../", "../", "/"), ("../.", "../", "/"), ("../..", "../../", "/"), ("../foo", "../foo", "/foo"), ("../foo/", "../foo/", "/foo/"), ("../foo/.", "../foo/", "/foo/"), ("../foo/..", "../", "/"), ("../../foo", "../../foo", "/foo"), ("../../foo/", "../../foo/", "/foo/"), ("../../foo/.", "../../foo/", "/foo/"), ("../../foo/..", "../../", "/"), ("../../foo/../bar", "../../bar", "/bar"), ("../../foo/../bar/", "../../bar/", "/bar/"), ("../../foo/../bar/.", "../../bar/", "/bar/"), ("../../foo/../bar/..", "../../", "/"), ("../../foo/../..", "../../../", "/"), ] for uri, relpath, abspath in cases: parts = urisplit(uri) self.assertEqual(relpath, parts.getpath()) parts = urisplit(uri.encode("ascii")) self.assertEqual(relpath, parts.getpath()) parts = urisplit("/" + uri) 
self.assertEqual(abspath, parts.getpath()) parts = urisplit(("/" + uri).encode("ascii")) self.assertEqual(abspath, parts.getpath()) def test_getquery(self): cases = [ ("", [], {}), ("?", [], {}), ("?&", [], {}), ("?&&", [], {}), ("?=", [("", "")], {"": [""]}), ("?=a", [("", "a")], {"": ["a"]}), ("?a", [("a", None)], {"a": [None]}), ("?a=", [("a", "")], {"a": [""]}), ("?&a=b", [("a", "b")], {"a": ["b"]}), ( "?a=a+b&b=b+c", [("a", "a+b"), ("b", "b+c")], {"a": ["a+b"], "b": ["b+c"]}, ), ( "?a=a%20b&b=b%20c", [("a", "a b"), ("b", "b c")], {"a": ["a b"], "b": ["b c"]}, ), ("?a=1&a=2", [("a", "1"), ("a", "2")], {"a": ["1", "2"]}), ] for query, querylist, querydict in cases: parts = urisplit(query) self.assertEqual( parts.getquerylist(), querylist, "Error parsing query list for %r" % query, ) self.assertEqual( parts.getquerydict(), querydict, "Error parsing query dict for %r" % query, ) def test_getquerysep(self): cases = [ ("&", "?a=b", [("a", "b")]), (";", "?a=b", [("a", "b")]), ("&", "?a=a+b&b=b+c", [("a", "a+b"), ("b", "b+c")]), (";", "?a=a+b;b=b+c", [("a", "a+b"), ("b", "b+c")]), ("&", "?a=a+b;b=b+c", [("a", "a+b;b=b+c")]), (";", "?a=a+b&b=b+c", [("a", "a+b&b=b+c")]), ("&", "?a&b", [("a", None), ("b", None)]), (";", "?a;b", [("a", None), ("b", None)]), (b"&", "?a&b", [("a", None), ("b", None)]), ("&", b"?a&b", [("a", None), ("b", None)]), ] for sep, query, querylist in cases: parts = urisplit(query) self.assertEqual( parts.getquerylist(sep), querylist, "Error parsing query list for %r" % query, ) def test_ipv4_literal(self): cases = [ ("http://12.34.56.78/foo/", "12.34.56.78", None), ("http://12.34.56.78:/foo/", "12.34.56.78", None), ("http://12.34.56.78:5432/foo/", "12.34.56.78", 5432), ] for uri, host, port in cases: for parts in (urisplit(uri), urisplit(uri.encode("ascii"))): self.assertEqual(host, str(parts.gethost())) self.assertEqual(port, parts.getport()) def test_ipv6_literal(self): cases = [ ("http://[::1]:5432/foo/", 
"0000:0000:0000:0000:0000:0000:0000:0001", 5432), ( "http://[dead:beef::1]:5432/foo/", "dead:beef:0000:0000:0000:0000:0000:0001", 5432, ), ( "http://[dead:beef::]:5432/foo/", "dead:beef:0000:0000:0000:0000:0000:0000", 5432, ), ( "http://[dead:beef:cafe:5417:affe:8FA3:deaf:feed]:5432/foo/", "dead:beef:cafe:5417:affe:8fa3:deaf:feed", 5432, ), ("http://[::1]/foo/", "0000:0000:0000:0000:0000:0000:0000:0001", None), ( "http://[dead:beef::1]/foo/", "dead:beef:0000:0000:0000:0000:0000:0001", None, ), ( "http://[dead:beef::]/foo/", "dead:beef:0000:0000:0000:0000:0000:0000", None, ), ( "http://[dead:beef:cafe:5417:affe:8FA3:deaf:feed]/foo/", "dead:beef:cafe:5417:affe:8fa3:deaf:feed", None, ), ("http://[::1]:/foo/", "0000:0000:0000:0000:0000:0000:0000:0001", None), ( "http://[dead:beef::1]:/foo/", "dead:beef:0000:0000:0000:0000:0000:0001", None, ), ( "http://[dead:beef::]:/foo/", "dead:beef:0000:0000:0000:0000:0000:0000", None, ), ( "http://[dead:beef:cafe:5417:affe:8FA3:deaf:feed]:/foo/", "dead:beef:cafe:5417:affe:8fa3:deaf:feed", None, ), ] for uri, host, port in cases: for parts in (urisplit(uri), urisplit(uri.encode("ascii"))): self.assertEqual(host, parts.gethost().exploded) self.assertEqual(port, parts.getport()) def test_ipv4_mapped_literal(self): # since Python 3.13, the "alternative form" is used for # IPv4-mapped addresses, see RFC 4291 2.2 p.3 cases = [ ( "http://[::12.34.56.78]:5432/foo/", [ "0000:0000:0000:0000:0000:0000:0c22:384e", "0000:0000:0000:0000:0000:0000:12.34.56.78", ], 5432, ), ( "http://[::ffff:12.34.56.78]:5432/foo/", [ "0000:0000:0000:0000:0000:ffff:0c22:384e", "0000:0000:0000:0000:0000:ffff:12.34.56.78", ], 5432, ), ( "http://[::12.34.56.78]/foo/", [ "0000:0000:0000:0000:0000:0000:0c22:384e", "0000:0000:0000:0000:0000:0000:12.34.56.78", ], None, ), ( "http://[::ffff:12.34.56.78]/foo/", [ "0000:0000:0000:0000:0000:ffff:0c22:384e", "0000:0000:0000:0000:0000:ffff:12.34.56.78", ], None, ), ( "http://[::12.34.56.78]:/foo/", [ 
"0000:0000:0000:0000:0000:0000:0c22:384e", "0000:0000:0000:0000:0000:0000:12.34.56.78", ], None, ), ( "http://[::ffff:12.34.56.78]:/foo/", [ "0000:0000:0000:0000:0000:ffff:0c22:384e", "0000:0000:0000:0000:0000:ffff:12.34.56.78", ], None, ), ] for uri, hosts, port in cases: parts = urisplit(uri) self.assertIn(parts.gethost().exploded, hosts) self.assertEqual(parts.getport(), port) parts = urisplit(uri.encode("ascii")) self.assertIn(parts.gethost().exploded, hosts) self.assertEqual(parts.getport(), port) def test_invalid_ip_literal(self): uris = [ "http://::12.34.56.78]/", "http://[::1/foo/", "ftp://[::1/foo/bad]/bad", "http://[::1/foo/bad]/bad", "http://[foo]/", "http://[v7.future]", ] for uri in uris: with self.assertRaises(ValueError, msg="%r" % uri): urisplit(uri).gethost() with self.assertRaises(ValueError, msg="%r" % uri.encode("ascii")): urisplit(uri.encode("ascii")).gethost() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1633976314.0 uritools-4.0.3/tests/test_unsplit.py0000664000175000017500000000205714131077772016410 0ustar00tkemtkemimport unittest from uritools import uriunsplit class UnsplitTest(unittest.TestCase): def check(self, split, uri): result = uriunsplit(split) self.assertEqual(result, uri) def test_rfc3986_3(self): """uriunsplit test cases from [RFC3986] 3. 
Syntax Components""" cases = [ ( ("foo", "example.com:8042", "/over/there", "name=ferret", "nose"), "foo://example.com:8042/over/there?name=ferret#nose", ), ( ("urn", None, "example:animal:ferret:nose", None, None), "urn:example:animal:ferret:nose", ), ( (b"foo", b"example.com:8042", b"/over/there", b"name=ferret", b"nose"), b"foo://example.com:8042/over/there?name=ferret#nose", ), ( (b"urn", None, b"example:animal:ferret:nose", None, None), b"urn:example:animal:ferret:nose", ), ] for uri, parts in cases: self.check(uri, parts) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1716919411.0 uritools-4.0.3/tox.ini0000664000175000017500000000140014625416163013440 0ustar00tkemtkem[tox] envlist = check-manifest,docs,doctest,flake8,py [testenv] deps = pytest pytest-cov commands = py.test --basetemp={envtmpdir} --cov=uritools {posargs} [testenv:check-manifest] deps = check-manifest==0.44; python_version < "3.8" check-manifest; python_version >= "3.8" commands = check-manifest skip_install = true [testenv:docs] deps = sphinx commands = sphinx-build -W -b html -d {envtmpdir}/doctrees docs {envtmpdir}/html [testenv:doctest] deps = sphinx commands = sphinx-build -W -b doctest -d {envtmpdir}/doctrees docs {envtmpdir}/doctest [testenv:flake8] deps = flake8 flake8-black; implementation_name == "cpython" flake8-bugbear flake8-import-order commands = flake8 skip_install = true