rfc3986-1.3.2/0000775000327200032720000000000013466312017014014 5ustar slarsonslarson00000000000000rfc3986-1.3.2/AUTHORS.rst0000664000327200032720000000033713466311640015677 0ustar slarsonslarson00000000000000Development Lead ---------------- - Ian Stapleton Cordasco Contributors ------------ - Thomas Weißschuh - Kostya Esmukov - Derek Higgins - Victor Stinner - Viktor Haag - Seth Michael Larson rfc3986-1.3.2/LICENSE0000664000327200032720000000106413466311640015023 0ustar slarsonslarson00000000000000Copyright 2014 Ian Cordasco, Rackspace Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. rfc3986-1.3.2/MANIFEST.in0000664000327200032720000000024613466311640015555 0ustar slarsonslarson00000000000000include README.rst include LICENSE include AUTHORS.rst include setup.cfg prune *.pyc recursive-include docs *.rst *.py recursive-include tests *.py prune docs/_build rfc3986-1.3.2/PKG-INFO0000664000327200032720000001760313466312017015120 0ustar slarsonslarson00000000000000Metadata-Version: 2.1 Name: rfc3986 Version: 1.3.2 Summary: Validating URI References per RFC 3986 Home-page: http://rfc3986.readthedocs.io Author: Ian Stapleton Cordasco Author-email: graffatcolmingov@gmail.com License: Apache 2.0 Description: rfc3986 ======= A Python implementation of `RFC 3986`_ including validation and authority parsing. Installation ------------ Use pip to install ``rfc3986`` like so:: pip install rfc3986 License ------- `Apache License Version 2.0`_ Example Usage ------------- The following are the two most common use cases envisioned for ``rfc3986``. Replacing ``urlparse`` `````````````````````` To parse a URI and receive something very similar to the standard library's ``urllib.parse.urlparse`` .. code-block:: python from rfc3986 import urlparse ssh = urlparse('ssh://user@git.openstack.org:29418/openstack/glance.git') print(ssh.scheme) # => ssh print(ssh.userinfo) # => user print(ssh.params) # => None print(ssh.port) # => 29418 To create a copy of it with new pieces you can use ``copy_with``: .. code-block:: python new_ssh = ssh.copy_with( scheme='https' userinfo='', port=443, path='/openstack/glance' ) print(new_ssh.scheme) # => https print(new_ssh.userinfo) # => None # etc. Strictly Parsing a URI and Applying Validation `````````````````````````````````````````````` To parse a URI into a convenient named tuple, you can simply: .. code-block:: python from rfc3986 import uri_reference example = uri_reference('http://example.com') email = uri_reference('mailto:user@domain.com') ssh = uri_reference('ssh://user@git.openstack.org:29418/openstack/keystone.git') With a parsed URI you can access data about the components: .. code-block:: python print(example.scheme) # => http print(email.path) # => user@domain.com print(ssh.userinfo) # => user print(ssh.host) # => git.openstack.org print(ssh.port) # => 29418 It can also parse URIs with unicode present: .. code-block:: python uni = uri_reference(b'http://httpbin.org/get?utf8=\xe2\x98\x83') # ☃ print(uni.query) # utf8=%E2%98%83 With a parsed URI you can also validate it: .. code-block:: python if ssh.is_valid(): subprocess.call(['git', 'clone', ssh.unsplit()]) You can also take a parsed URI and normalize it: .. code-block:: python mangled = uri_reference('hTTp://exAMPLe.COM') print(mangled.scheme) # => hTTp print(mangled.authority) # => exAMPLe.COM normal = mangled.normalize() print(normal.scheme) # => http print(mangled.authority) # => example.com But these two URIs are (functionally) equivalent: .. code-block:: python if normal == mangled: webbrowser.open(normal.unsplit()) Your paths, queries, and fragments are safe with us though: .. code-block:: python mangled = uri_reference('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth') normal = mangled.normalize() assert normal == 'hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth' assert normal == 'http://example.com/Some/reallY/biZZare/pAth' assert normal != 'http://example.com/some/really/bizzare/path' If you do not actually need a real reference object and just want to normalize your URI: .. code-block:: python from rfc3986 import normalize_uri assert (normalize_uri('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth') == 'http://example.com/Some/reallY/biZZare/pAth') You can also very simply validate a URI: .. code-block:: python from rfc3986 import is_valid_uri assert is_valid_uri('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth') Requiring Components ~~~~~~~~~~~~~~~~~~~~ You can validate that a particular string is a valid URI and require independent components: .. code-block:: python from rfc3986 import is_valid_uri assert is_valid_uri('http://localhost:8774/v2/resource', require_scheme=True, require_authority=True, require_path=True) # Assert that a mailto URI is invalid if you require an authority # component assert is_valid_uri('mailto:user@example.com', require_authority=True) is False If you have an instance of a ``URIReference``, you can pass the same arguments to ``URIReference#is_valid``, e.g., .. code-block:: python from rfc3986 import uri_reference http = uri_reference('http://localhost:8774/v2/resource') assert uri.is_valid(require_scheme=True, require_authority=True, require_path=True) # Assert that a mailto URI is invalid if you require an authority # component mailto = uri_reference('mailto:user@example.com') assert uri.is_valid(require_authority=True) is False Alternatives ------------ - `rfc3987 `_ This is a direct competitor to this library, with extra features, licensed under the GPL. - `uritools `_ This can parse URIs in the manner of RFC 3986 but provides no validation and only recently added Python 3 support. - Standard library's `urlparse`/`urllib.parse` The functions in these libraries can only split a URI (valid or not) and provide no validation. Contributing ------------ This project follows and enforces the Python Software Foundation's `Code of Conduct `_. If you would like to contribute but do not have a bug or feature in mind, feel free to email Ian and find out how you can help. The git repository for this project is maintained at https://github.com/python-hyper/rfc3986 .. _RFC 3986: http://tools.ietf.org/html/rfc3986 .. _Apache License Version 2.0: https://www.apache.org/licenses/LICENSE-2.0 Platform: UNKNOWN Classifier: Development Status :: 5 - Production/Stable Classifier: Intended Audience :: Developers Classifier: Natural Language :: English Classifier: License :: OSI Approved :: Apache Software License Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 2.7 Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3.4 Classifier: Programming Language :: Python :: 3.5 Classifier: Programming Language :: Python :: 3.6 Classifier: Programming Language :: Python :: 3.7 Provides-Extra: idna2008 rfc3986-1.3.2/README.rst0000664000327200032720000001273213466311640015511 0ustar slarsonslarson00000000000000rfc3986 ======= A Python implementation of `RFC 3986`_ including validation and authority parsing. Installation ------------ Use pip to install ``rfc3986`` like so:: pip install rfc3986 License ------- `Apache License Version 2.0`_ Example Usage ------------- The following are the two most common use cases envisioned for ``rfc3986``. Replacing ``urlparse`` `````````````````````` To parse a URI and receive something very similar to the standard library's ``urllib.parse.urlparse`` .. code-block:: python from rfc3986 import urlparse ssh = urlparse('ssh://user@git.openstack.org:29418/openstack/glance.git') print(ssh.scheme) # => ssh print(ssh.userinfo) # => user print(ssh.params) # => None print(ssh.port) # => 29418 To create a copy of it with new pieces you can use ``copy_with``: .. code-block:: python new_ssh = ssh.copy_with( scheme='https' userinfo='', port=443, path='/openstack/glance' ) print(new_ssh.scheme) # => https print(new_ssh.userinfo) # => None # etc. Strictly Parsing a URI and Applying Validation `````````````````````````````````````````````` To parse a URI into a convenient named tuple, you can simply: .. code-block:: python from rfc3986 import uri_reference example = uri_reference('http://example.com') email = uri_reference('mailto:user@domain.com') ssh = uri_reference('ssh://user@git.openstack.org:29418/openstack/keystone.git') With a parsed URI you can access data about the components: .. code-block:: python print(example.scheme) # => http print(email.path) # => user@domain.com print(ssh.userinfo) # => user print(ssh.host) # => git.openstack.org print(ssh.port) # => 29418 It can also parse URIs with unicode present: .. code-block:: python uni = uri_reference(b'http://httpbin.org/get?utf8=\xe2\x98\x83') # ☃ print(uni.query) # utf8=%E2%98%83 With a parsed URI you can also validate it: .. code-block:: python if ssh.is_valid(): subprocess.call(['git', 'clone', ssh.unsplit()]) You can also take a parsed URI and normalize it: .. code-block:: python mangled = uri_reference('hTTp://exAMPLe.COM') print(mangled.scheme) # => hTTp print(mangled.authority) # => exAMPLe.COM normal = mangled.normalize() print(normal.scheme) # => http print(mangled.authority) # => example.com But these two URIs are (functionally) equivalent: .. code-block:: python if normal == mangled: webbrowser.open(normal.unsplit()) Your paths, queries, and fragments are safe with us though: .. code-block:: python mangled = uri_reference('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth') normal = mangled.normalize() assert normal == 'hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth' assert normal == 'http://example.com/Some/reallY/biZZare/pAth' assert normal != 'http://example.com/some/really/bizzare/path' If you do not actually need a real reference object and just want to normalize your URI: .. code-block:: python from rfc3986 import normalize_uri assert (normalize_uri('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth') == 'http://example.com/Some/reallY/biZZare/pAth') You can also very simply validate a URI: .. code-block:: python from rfc3986 import is_valid_uri assert is_valid_uri('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth') Requiring Components ~~~~~~~~~~~~~~~~~~~~ You can validate that a particular string is a valid URI and require independent components: .. code-block:: python from rfc3986 import is_valid_uri assert is_valid_uri('http://localhost:8774/v2/resource', require_scheme=True, require_authority=True, require_path=True) # Assert that a mailto URI is invalid if you require an authority # component assert is_valid_uri('mailto:user@example.com', require_authority=True) is False If you have an instance of a ``URIReference``, you can pass the same arguments to ``URIReference#is_valid``, e.g., .. code-block:: python from rfc3986 import uri_reference http = uri_reference('http://localhost:8774/v2/resource') assert uri.is_valid(require_scheme=True, require_authority=True, require_path=True) # Assert that a mailto URI is invalid if you require an authority # component mailto = uri_reference('mailto:user@example.com') assert uri.is_valid(require_authority=True) is False Alternatives ------------ - `rfc3987 `_ This is a direct competitor to this library, with extra features, licensed under the GPL. - `uritools `_ This can parse URIs in the manner of RFC 3986 but provides no validation and only recently added Python 3 support. - Standard library's `urlparse`/`urllib.parse` The functions in these libraries can only split a URI (valid or not) and provide no validation. Contributing ------------ This project follows and enforces the Python Software Foundation's `Code of Conduct `_. If you would like to contribute but do not have a bug or feature in mind, feel free to email Ian and find out how you can help. The git repository for this project is maintained at https://github.com/python-hyper/rfc3986 .. _RFC 3986: http://tools.ietf.org/html/rfc3986 .. _Apache License Version 2.0: https://www.apache.org/licenses/LICENSE-2.0 rfc3986-1.3.2/docs/0000775000327200032720000000000013466312017014744 5ustar slarsonslarson00000000000000rfc3986-1.3.2/docs/source/0000775000327200032720000000000013466312017016244 5ustar slarsonslarson00000000000000rfc3986-1.3.2/docs/source/api-ref/0000775000327200032720000000000013466312017017567 5ustar slarsonslarson00000000000000rfc3986-1.3.2/docs/source/api-ref/api.rst0000664000327200032720000000026113466311640021072 0ustar slarsonslarson00000000000000=============== API Submodule =============== .. autofunction:: rfc3986.api.urlparse .. autofunction:: rfc3986.api.uri_reference .. autofunction:: rfc3986.api.normalize_uri rfc3986-1.3.2/docs/source/api-ref/builder.rst0000664000327200032720000000113213466311640021745 0ustar slarsonslarson00000000000000==================== URI Builder Module ==================== .. autoclass:: rfc3986.builder.URIBuilder .. automethod:: rfc3986.builder.URIBuilder.add_scheme .. automethod:: rfc3986.builder.URIBuilder.add_credentials .. automethod:: rfc3986.builder.URIBuilder.add_host .. automethod:: rfc3986.builder.URIBuilder.add_port .. automethod:: rfc3986.builder.URIBuilder.add_path .. automethod:: rfc3986.builder.URIBuilder.add_query_from .. automethod:: rfc3986.builder.URIBuilder.add_query .. automethod:: rfc3986.builder.URIBuilder.add_fragment .. automethod:: rfc3986.builder.URIBuilder.finalize rfc3986-1.3.2/docs/source/api-ref/index.rst0000664000327200032720000000053713466311640021436 0ustar slarsonslarson00000000000000=============== API Reference =============== This section contains API documentation generated from the source code of |rfc3986|. If you're looking for an introduction to the module and how it can be utilized, please see :ref:`narrative` instead. .. toctree:: :maxdepth: 1 api builder uri validators iri miscellaneous rfc3986-1.3.2/docs/source/api-ref/iri.rst0000664000327200032720000000071313466311640021106 0ustar slarsonslarson00000000000000=============== IRI Submodule =============== .. autoclass:: rfc3986.iri.IRIReference .. automethod:: rfc3986.iri.IRIReference.encode .. automethod:: rfc3986.iri.IRIReference.from_string .. automethod:: rfc3986.iri.IRIReference.unsplit .. automethod:: rfc3986.iri.IRIReference.resolve_with .. automethod:: rfc3986.iri.IRIReference.copy_with .. automethod:: rfc3986.iri.IRIReference.is_absolute .. automethod:: rfc3986.iri.IRIReference.authority_info rfc3986-1.3.2/docs/source/api-ref/miscellaneous.rst0000664000327200032720000001517113466311640023172 0ustar slarsonslarson00000000000000========================== Miscellaneous Submodules ========================== There are several submodules in |rfc3986| that are not meant to be exposed to users directly but which are valuable to document, regardless. .. data:: rfc3986.misc.UseExisting A sentinel object to make certain APIs simpler for users. .. module:: rfc3986.abnf_regexp The :mod:`rfc3986.abnf_regexp` module contains the regular expressions written from the RFC's ABNF. The :mod:`rfc3986.misc` module contains compiled regular expressions from :mod:`rfc3986.abnf_regexp` and previously contained those regular expressions. .. data:: rfc3986.abnf_regexp.GEN_DELIMS .. data:: rfc3986.abnf_regexp.GENERIC_DELIMITERS The string containing all of the generic delimiters as defined on `page 13 `__. .. data:: rfc3986.abnf_regexp.GENERIC_DELIMITERS_SET :data:`rfc3986.abnf_regexp.GEN_DELIMS` represented as a :class:`set`. .. data:: rfc3986.abnf_regexp.SUB_DELIMS .. data:: rfc3986.abnf_regexp.SUB_DELIMITERS The string containing all of the 'sub' delimiters as defined on `page 13 `__. .. data:: rfc3986.abnf_regexp.SUB_DELIMITERS_SET :data:`rfc3986.abnf_regexp.SUB_DELIMS` represented as a :class:`set`. .. data:: rfc3986.abnf_regexp.SUB_DELIMITERS_RE :data:`rfc3986.abnf_regexp.SUB_DELIMS` with the ``*`` escaped for use in regular expressions. .. data:: rfc3986.abnf_regexp.RESERVED_CHARS_SET A :class:`set` constructed of :data:`GEN_DELIMS` and :data:`SUB_DELIMS`. This union is defined on `page 13 `__. .. data:: rfc3986.abnf_regexp.ALPHA The string of upper- and lower-case letters in USASCII. .. data:: rfc3986.abnf_regexp.DIGIT The string of digits 0 through 9. .. data:: rfc3986.abnf_regexp.UNRESERVED .. data:: rfc3986.abnf_regexp.UNRESERVED_CHARS The string of unreserved characters defined in :rfc:`3986#section-2.3`. .. data:: rfc3986.abnf_regexp.UNRESERVED_CHARS_SET :data:`rfc3986.abnf_regexp.UNRESERVED_CHARS` represented as a :class:`set`. .. data:: rfc3986.abnf_regexp.NON_PCT_ENCODED_SET The non-percent encoded characters represented as a :class:`set`. .. data:: rfc3986.abnf_regexp.UNRESERVED_RE Optimized regular expression for unreserved characters. .. data:: rfc3986.abnf_regexp.SCHEME_RE Stricter regular expression to match and validate the scheme part of a URI. .. data:: rfc3986.abnf_regexp.COMPONENT_PATTERN_DICT Dictionary with regular expressions to match various components in a URI. Except for :data:`rfc3986.abnf_regexp.SCHEME_RE`, all patterns are from :rfc:`3986#appendix-B`. .. data:: rfc3986.abnf_regexp.URL_PARSING_RE Regular expression compposed from the components in :data:`rfc3986.abnf_regexp.COMPONENT_PATTERN_DICT`. .. data:: rfc3986.abnf_regexp.HEXDIG_RE Hexadecimal characters used in each piece of an IPv6 address. See :rfc:`3986#section-3.2.2`. .. data:: rfc3986.abnf_regexp.LS32_RE Lease significant 32 bits of an IPv6 address. See :rfc:`3986#section-3.2.2`. .. data:: rfc3986.abnf_regexp.REG_NAME .. data:: rfc3986.abnf_regexp.REGULAR_NAME_RE The pattern for a regular name, e.g., ``www.google.com``, ``api.github.com``. See :rfc:`3986#section-3.2.2`. .. data:: rfc3986.abnf_regexp.IPv4_RE The pattern for an IPv4 address, e.g., ``192.168.255.255``. See :rfc:`3986#section-3.2.2`. .. data:: rfc3986.abnf_regexp.IPv6_RE The pattern for an IPv6 address, e.g., ``::1``. See :rfc:`3986#section-3.2.2`. .. data:: rfc3986.abnf_regexp.IPv_FUTURE_RE A regular expression to parse out IPv Futures. See :rfc:`3986#section-3.2.2`. .. data:: rfc3986.abnf_regexp.IP_LITERAL_RE Pattern to match IPv6 addresses and IPv Future addresses. See :rfc:`3986#section-3.2.2`. .. data:: rfc3986.abnf_regexp.HOST_RE .. data:: rfc3986.abnf_regexp.HOST_PATTERN Pattern to match and validate the host piece of an authority. This is composed of - :data:`rfc3986.abnf_regexp.REG_NAME` - :data:`rfc3986.abnf_regexp.IPv4_RE` - :data:`rfc3986.abnf_regexp.IP_LITERAL_RE` See :rfc:`3986#section-3.2.2`. .. data:: rfc3986.abnf_regexp.USERINFO_RE Pattern to match and validate the user information portion of an authority component. See :rfc:`3986#section-3.2.2`. .. data:: rfc3986.abnf_regexp.PORT_RE Pattern to match and validate the port portion of an authority component. See :rfc:`3986#section-3.2.2`. .. data:: rfc3986.abnf_regexp.PCT_ENCODED .. data:: rfc3986.abnf_regexp.PERCENT_ENCODED Regular expression to match percent encoded character values. .. data:: rfc3986.abnf_regexp.PCHAR Regular expression to match printable characters. .. data:: rfc3986.abnf_regexp.PATH_RE Regular expression to match and validate the path component of a URI. See :rfc:`3986#section-3.3`. .. data:: rfc3986.abnf_regexp.PATH_EMPTY .. data:: rfc3986.abnf_regexp.PATH_ROOTLESS .. data:: rfc3986.abnf_regexp.PATH_NOSCHEME .. data:: rfc3986.abnf_regexp.PATH_ABSOLUTE .. data:: rfc3986.abnf_regexp.PATH_ABEMPTY Components of the :data:`rfc3986.abnf_regexp.PATH_RE`. See :rfc:`3986#section-3.3`. .. data:: rfc3986.abnf_regexp.QUERY_RE Regular expression to parse and validate the query component of a URI. .. data:: rfc3986.abnf_regexp.FRAGMENT_RE Regular expression to parse and validate the fragment component of a URI. .. data:: rfc3986.abnf_regexp.RELATIVE_PART_RE Regular expression to parse the relative URI when resolving URIs. .. data:: rfc3986.abnf_regexp.HIER_PART_RE The hierarchical part of a URI. This regular expression is used when resolving relative URIs. See :rfc:`3986#section-3`. .. module:: rfc3986.misc .. data:: rfc3986.misc.URI_MATCHER Compiled version of :data:`rfc3986.abnf_regexp.URL_PARSING_RE`. .. data:: rfc3986.misc.SUBAUTHORITY_MATCHER Compiled compilation of :data:`rfc3986.abnf_regexp.USERINFO_RE`, :data:`rfc3986.abnf_regexp.HOST_PATTERN`, :data:`rfc3986.abnf_regexp.PORT_RE`. .. data:: rfc3986.misc.SCHEME_MATCHER Compiled version of :data:`rfc3986.abnf_regexp.SCHEME_RE`. .. data:: rfc3986.misc.IPv4_MATCHER Compiled version of :data:`rfc3986.abnf_regexp.IPv4_RE`. .. data:: rfc3986.misc.PATH_MATCHER Compiled version of :data:`rfc3986.abnf_regexp.PATH_RE`. .. data:: rfc3986.misc.QUERY_MATCHER Compiled version of :data:`rfc3986.abnf_regexp.QUERY_RE`. .. data:: rfc3986.misc.RELATIVE_REF_MATCHER Compiled compilation of :data:`rfc3986.abnf_regexp.SCHEME_RE`, :data:`rfc3986.abnf_regexp.HIER_PART_RE`, :data:`rfc3986.abnf_regexp.QUERY_RE`. rfc3986-1.3.2/docs/source/api-ref/uri.rst0000664000327200032720000000151513466311640021123 0ustar slarsonslarson00000000000000=============== URI Submodule =============== .. autoclass:: rfc3986.uri.URIReference .. automethod:: rfc3986.uri.URIReference.from_string .. automethod:: rfc3986.uri.URIReference.unsplit .. automethod:: rfc3986.uri.URIReference.resolve_with .. automethod:: rfc3986.uri.URIReference.copy_with .. automethod:: rfc3986.uri.URIReference.normalize .. automethod:: rfc3986.uri.URIReference.is_absolute .. automethod:: rfc3986.uri.URIReference.authority_info Deprecated Methods ================== .. automethod:: rfc3986.uri.URIReference.is_valid .. automethod:: rfc3986.uri.URIReference.authority_is_valid .. automethod:: rfc3986.uri.URIReference.scheme_is_valid .. automethod:: rfc3986.uri.URIReference.path_is_valid .. automethod:: rfc3986.uri.URIReference.query_is_valid .. automethod:: rfc3986.uri.URIReference.fragment_is_valid rfc3986-1.3.2/docs/source/api-ref/validators.rst0000664000327200032720000000114313466311640022471 0ustar slarsonslarson00000000000000====================== Validators Submodule ====================== .. autoclass:: rfc3986.validators.Validator .. automethod:: rfc3986.validators.Validator.allow_schemes .. automethod:: rfc3986.validators.Validator.allow_hosts .. automethod:: rfc3986.validators.Validator.allow_ports .. automethod:: rfc3986.validators.Validator.allow_use_of_password .. automethod:: rfc3986.validators.Validator.check_validity_of .. automethod:: rfc3986.validators.Validator.forbid_use_of_password .. automethod:: rfc3986.validators.Validator.require_presence_of .. automethod:: rfc3986.validators.Validator.validate rfc3986-1.3.2/docs/source/conf.py0000664000327200032720000001165213466311640017551 0ustar slarsonslarson00000000000000# -*- coding: utf-8 -*- # # rfc3986 documentation build configuration file, created by # sphinx-quickstart on Tue Mar 14 07:06:46 2017. # # This file is execfile()d with the current directory set to its # containing dir. # # Note that not all possible configuration values are present in this # autogenerated file. # # All configuration values have a default; values that are commented out # serve to show the default. # If extensions (or modules to document with autodoc) are in another directory, # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. # # import os # import sys # sys.path.insert(0, os.path.abspath('.')) # -- General configuration ------------------------------------------------ # If your documentation needs a minimal Sphinx version, state it here. # # needs_sphinx = '1.0' rst_epilog = """ .. |rfc3986| replace:: :mod:`rfc3986` """ # Add any Sphinx extension module names here, as strings. They can be # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom # ones. extensions = [ 'sphinx.ext.autodoc', 'sphinx.ext.doctest', 'sphinx.ext.intersphinx', 'sphinx.ext.coverage', 'sphinx-prompt', ] # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] # The suffix(es) of source filenames. # You can specify multiple suffix as a list of string: # # source_suffix = ['.rst', '.md'] source_suffix = '.rst' # The master toctree document. master_doc = 'index' # General information about the project. project = u'rfc3986' copyright = u'2017, Ian Stapleton Cordasco' author = u'Ian Stapleton Cordasco' # The version info for the project you're documenting, acts as replacement for # |version| and |release|, also used in various other places throughout the # built documents. # # The short X.Y version. version = u'1.0.0' # The full version, including alpha/beta/rc tags. release = u'1.0.0' # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. # # This is also used if you do content translation via gettext catalogs. # Usually you set "language" from the command line for these cases. language = None # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. # This patterns also effect to html_static_path and html_extra_path exclude_patterns = [] # The name of the Pygments (syntax highlighting) style to use. pygments_style = 'sphinx' # If true, `todo` and `todoList` produce output, else they produce nothing. todo_include_todos = False # -- Options for HTML output ---------------------------------------------- # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. # html_theme = 'sphinx_rtd_theme' # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the # documentation. # # html_theme_options = {} # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". html_static_path = ['_static'] # -- Options for HTMLHelp output ------------------------------------------ # Output file base name for HTML help builder. htmlhelp_basename = 'rfc3986doc' # -- Options for LaTeX output --------------------------------------------- latex_elements = { # The paper size ('letterpaper' or 'a4paper'). # # 'papersize': 'letterpaper', # The font size ('10pt', '11pt' or '12pt'). # # 'pointsize': '10pt', # Additional stuff for the LaTeX preamble. # # 'preamble': '', # Latex figure (float) alignment # # 'figure_align': 'htbp', } # Grouping the document tree into LaTeX files. List of tuples # (source start file, target name, title, # author, documentclass [howto, manual, or own class]). latex_documents = [ (master_doc, 'rfc3986.tex', u'rfc3986 Documentation', u'Ian Stapleton Cordasco', 'manual'), ] # -- Options for manual page output --------------------------------------- # One entry per manual page. List of tuples # (source start file, name, description, authors, manual section). man_pages = [ (master_doc, 'rfc3986', u'rfc3986 Documentation', [author], 1) ] # -- Options for Texinfo output ------------------------------------------- # Grouping the document tree into Texinfo files. List of tuples # (source start file, target name, title, author, # dir menu entry, description, category) texinfo_documents = [ (master_doc, 'rfc3986', u'rfc3986 Documentation', author, 'rfc3986', 'One line description of project.', 'Miscellaneous'), ] # Example configuration for intersphinx: refer to the Python standard library. intersphinx_mapping = {'https://docs.python.org/': None} rfc3986-1.3.2/docs/source/index.rst0000664000327200032720000000111713466311640020106 0ustar slarsonslarson00000000000000========= rfc3986 ========= |rfc3986| is a Python implementation of :rfc:`3986` including validation and authority parsing. This module also supports :rfc:`6874` which adds support for zone identifiers to IPv6 Addresses. The maintainers strongly suggest using `pip`_ to install |rfc3986|. For example, .. prompt:: bash pip install rfc3986 python -m pip install rfc3986 python3.6 -m pip install rfc3986 .. toctree:: :maxdepth: 2 :caption: Contents: narrative api-ref/index release-notes/index .. links .. _pip: https://pypi.python.org/pypi/pip/ rfc3986-1.3.2/docs/source/narrative.rst0000664000327200032720000000156113466311640020775 0ustar slarsonslarson00000000000000.. _narrative: ==================== User Documentation ==================== |rfc3986| has several API features and convenience methods. The core of |rfc3986|'s API revolves around parsing, validating, and building URIs. There is an API to provide compatibility with :mod:`urllib.parse`, there is an API to parse a URI as a URI Reference, there's an API to provide validation of URIs, and finally there's an API to build URIs. .. note:: There's presently no support for IRIs as defined in :rfc:`3987`. |rfc3986| parses URIs much differently from :mod:`urllib.parse` so users may see some subtle differences with very specific URLs that contain rough edgecases. Regardless, we do our best to implement the same API so you should be able to seemlessly swap |rfc3986| for ``urlparse``. .. toctree:: :maxdepth: 2 user/parsing user/validating user/building rfc3986-1.3.2/docs/source/release-notes/0000775000327200032720000000000013466312017021012 5ustar slarsonslarson00000000000000rfc3986-1.3.2/docs/source/release-notes/0.1.0.rst0000664000327200032720000000015113466311640022176 0ustar slarsonslarson000000000000000.1.0 -- 2014-06-27 ------------------- - Initial Release includes validation and normalization of URIs rfc3986-1.3.2/docs/source/release-notes/0.2.0.rst0000664000327200032720000000047713466311640022212 0ustar slarsonslarson000000000000000.2.0 -- 2014-06-30 ------------------- - Add support for requiring components during validation. This includes adding parameters ``require_scheme``, ``require_authority``, ``require_path``, ``require_path``, ``require_query``, and ``require_fragment`` to ``rfc3986.is_valid_uri`` and ``URIReference#is_valid``. rfc3986-1.3.2/docs/source/release-notes/0.2.1.rst0000664000327200032720000000044013466311640022201 0ustar slarsonslarson000000000000000.2.1 -- 2015-03-20 ------------------- - Check that the bytes of an IPv4 Host Address are within the valid range. Otherwise, URIs like "http://256.255.255.0/v1/resource" are considered valid. - Add 6 to the list of unreserved characters. It was previously missing. Closes bug #9 rfc3986-1.3.2/docs/source/release-notes/0.2.2.rst0000664000327200032720000000040213466311640022200 0ustar slarsonslarson000000000000000.2.2 -- 2015-05-27 ------------------- - Update the regular name regular expression to accept all of the characters allowed in the RFC. Closes bug #11 (Thanks Viktor Haag). Previously URIs similar to "http://http-bin.org" would be considered invalid. rfc3986-1.3.2/docs/source/release-notes/0.3.0.rst0000664000327200032720000000036313466311640022205 0ustar slarsonslarson000000000000000.3.0 -- 2015-10-20 ------------------- - Read README and HISTORY files using the appropriate codec so rfc3986 can be installed on systems with locale's other than utf-8 (specifically C) - Replace the standard library's urlparse behaviour rfc3986-1.3.2/docs/source/release-notes/0.3.1.rst0000664000327200032720000000013513466311640022203 0ustar slarsonslarson000000000000000.3.1 -- 2015-12-15 ------------------- - Preserve empty query strings during normalization rfc3986-1.3.2/docs/source/release-notes/0.4.0.rst0000664000327200032720000000044613466311640022210 0ustar slarsonslarson000000000000000.4.0 -- 2016-08-20 ------------------- - Add ``ParseResult.from_parts`` and ``ParseResultBytes.from_parts`` class methods to easily create a ParseResult - When using regular expressions, use ``[0-9]`` instead of ``\d`` to avoid finding ports with "numerals" that are not valid in a port rfc3986-1.3.2/docs/source/release-notes/0.4.1.rst0000664000327200032720000000021513466311640022203 0ustar slarsonslarson000000000000000.4.1 -- 2016-08-22 ------------------- - Normalize URIs constructed using ``ParseResult.from_parts`` and ``ParseResultBytes.from_parts`` rfc3986-1.3.2/docs/source/release-notes/0.4.2.rst0000664000327200032720000000017413466311640022210 0ustar slarsonslarson000000000000000.4.2 -- 2016-08-22 ------------------- - Avoid parsing an string with just an IPv6 address as having a scheme of ``[``. rfc3986-1.3.2/docs/source/release-notes/1.0.0.rst0000664000327200032720000000130413466311640022177 0ustar slarsonslarson000000000000001.0.0 -- 2017-05-10 ------------------- - Add support for :rfc:`6874` - Zone Identifiers in IPv6 Addresses See also `issue #2`_ - Add a more flexible and usable validation framework. See our documentation for more information. - Add an object to aid in building new URIs from scratch. See our documentation for more information. - Add real documentation for the entire module. - Add separate submodule with documented regular expression strings for the collected ABNF. - Allow ``None`` to be used to eliminate components via ``copy_with`` for URIs and ParseResults. - Move release history into our documentation. .. links .. _issue #2: https://github.com/python-hyper/rfc3986/issues/2 rfc3986-1.3.2/docs/source/release-notes/1.1.0.rst0000664000327200032720000000067013466311640022205 0ustar slarsonslarson000000000000001.1.0 -- 2017-07-18 ------------------- - Correct the regular expression for the User Information sub-component of the Authority Component. See also `GitHub #26`_ - :meth:`~rfc3986.validators.Validator.check_validity_of` to the :class:`~rfc3986.validators.Validator` class. See :ref:`Validating URIs ` documentation for more information. .. links .. _GitHub #26: https://github.com/python-hyper/rfc3986/pull/26 rfc3986-1.3.2/docs/source/release-notes/1.2.0.rst0000664000327200032720000000111613466311640022202 0ustar slarsonslarson000000000000001.2.0 -- 2018-12-04 ------------------- - Attempt to detect percent-encoded URI components and encode ``%`` characters if required. See also `GitHub #38`_ - Allow percent-encoded bytes within host. See also `GitHub #39`_ - Correct the IPv6 regular expression by adding a missing variation. - Fix hashing for URIReferences on Python 3. See also `GitHub !35`_ .. links .. _GitHub !35: https://github.com/python-hyper/rfc3986/pull/35 .. _GitHub #38: https://github.com/python-hyper/rfc3986/pull/38 .. _GitHub #39: https://github.com/python-hyper/rfc3986/pull/39 rfc3986-1.3.2/docs/source/release-notes/1.3.0.rst0000664000327200032720000000040313466311640022201 0ustar slarsonslarson000000000000001.3.0 -- 2019-04-20 ------------------- - Add the ``IRIReference`` class which parses data according to RFC 3987 and encodes into an ``URIReference``. See also `GitHub #50`_ .. links .. _GitHub #50: https://github.com/python-hyper/rfc3986/pull/50 rfc3986-1.3.2/docs/source/release-notes/1.3.1.rst0000664000327200032720000000035513466311640022210 0ustar slarsonslarson000000000000001.3.1 -- 2019-04-23 ------------------- - Only apply IDNA-encoding when there are characters outside of the ASCII character set. See also `GitHub #52`_ .. links .. _GitHub #52: https://github.com/python-hyper/rfc3986/pull/52 rfc3986-1.3.2/docs/source/release-notes/1.3.2.rst0000664000327200032720000000041613466311640022207 0ustar slarsonslarson000000000000001.3.2 -- 2019-05-13 ------------------- - Remove unnecessary IRI-flavored matchers from ``rfc3986.misc`` to speed up import time on resource-constrained systems. See also `GitHub #55`_ .. links .. _GitHub #55: https://github.com/python-hyper/rfc3986/pull/55 rfc3986-1.3.2/docs/source/release-notes/index.rst0000664000327200032720000000071013466311640022652 0ustar slarsonslarson00000000000000=========================== Release Notes and History =========================== All of the release notes that have been recorded for |rfc3986| are organized here with the newest releases first. 1.x Release Series ================== .. toctree:: 1.3.2 1.3.1 1.3.0 1.2.0 1.1.0 1.0.0 0.x Release Series ================== .. toctree:: 0.4.2 0.4.1 0.4.0 0.3.1 0.3.0 0.2.2 0.2.1 0.2.0 0.1.0 rfc3986-1.3.2/docs/source/user/0000775000327200032720000000000013466312017017222 5ustar slarsonslarson00000000000000rfc3986-1.3.2/docs/source/user/building.rst0000664000327200032720000000720613466311640021557 0ustar slarsonslarson00000000000000=============== Building URIs =============== Constructing URLs often seems simple. There are some problems with concatenating strings to build a URL: - Certain parts of the URL disallow certain characters - Formatting some parts of the URL is tricky and doing it manually isn't fun To make the experience better |rfc3986| provides the :class:`~rfc3986.builder.URIBuilder` class to generate valid :class:`~rfc3986.uri.URIReference` instances. The :class:`~rfc3986.builder.URIBuilder` class will handle ensuring that each component is normalized and safe for real world use. Example Usage ============= .. note:: All of the methods on a :class:`~rfc3986.builder.URIBuilder` are chainable (except :meth:`~rfc3986.builder.URIBuilder.finalize`). Let's build a basic URL with just a scheme and host. First we create an instance of :class:`~rfc3986.builder.URIBuilder`. Then we call :meth:`~rfc3986.builder.URIBuilder.add_scheme` and :meth:`~rfc3986.builder.URIBuilder.add_host` with the scheme and host we want to include in the URL. Then we convert our builder object into a :class:`~rfc3986.uri.URIReference` and call :meth:`~rfc3986.uri.URIReference.unsplit`. .. doctest:: >>> from rfc3986 import builder >>> print(builder.URIBuilder().add_scheme( ... 'https' ... ).add_host( ... 'github.com' ... ).finalize().unsplit()) https://github.com Each time you invoke a method, you get a new instance of a :class:`~rfc3986.builder.URIBuilder` class so you can build several different URLs from one base instance. .. doctest:: >>> from rfc3986 import builder >>> github_builder = builder.URIBuilder().add_scheme( ... 'https' ... ).add_host( ... 'api.github.com' ... ) >>> print(github_builder.add_path( ... '/users/sigmavirus24' ... ).finalize().unsplit()) https://api.github.com/users/sigmavirus24 >>> print(github_builder.add_path( ... '/repos/sigmavirus24/rfc3986' ... ).finalize().unsplit()) https://api.github.com/repos/sigmavirus24/rfc3986 |rfc3986| makes adding authentication credentials convenient. It takes care of making the credentials URL safe. There are some characters someone might want to include in a URL that are not safe for the authority component of a URL. .. doctest:: >>> from rfc3986 import builder >>> print(builder.URIBuilder().add_scheme( ... 'https' ... ).add_host( ... 'api.github.com' ... ).add_credentials( ... username='us3r', ... password='p@ssw0rd', ... ).finalize().unsplit()) https://us3r:p%40ssw0rd@api.github.com Further, |rfc3986| attempts to simplify the process of adding query parameters to a URL. For example, if we were using Elasticsearch, we might do something like: .. doctest:: >>> from rfc3986 import builder >>> print(builder.URIBuilder().add_scheme( ... 'https' ... ).add_host( ... 'search.example.com' ... ).add_path( ... '_search' ... ).add_query_from( ... [('q', 'repo:sigmavirus24/rfc3986'), ('sort', 'created_at:asc')] ... ).finalize().unsplit()) https://search.example.com/_search?q=repo%3Asigmavirus24%2Frfc3986&sort=created_at%3Aasc Finally, we provide a way to add a fragment to a URL. Let's build up a URL to view the section of the RFC that refers to fragments: .. doctest:: >>> from rfc3986 import builder >>> print(builder.URIBuilder().add_scheme( ... 'https' ... ).add_host( ... 'tools.ietf.org' ... ).add_path( ... '/html/rfc3986' ... ).add_fragment( ... 'section-3.5' ... ).finalize().unsplit()) https://tools.ietf.org/html/rfc3986#section-3.5 rfc3986-1.3.2/docs/source/user/parsing.rst0000664000327200032720000000736513466311640021433 0ustar slarsonslarson00000000000000=============== Parsing a URI =============== There are two ways to parse a URI with |rfc3986| #. :meth:`rfc3986.api.uri_reference` This is best when you're **not** replacing existing usage of :mod:`urllib.parse`. This also provides convenience methods around safely normalizing URIs passed into it. #. :meth:`rfc3986.api.urlparse` This is best suited to completely replace :func:`urllib.parse.urlparse`. It returns a class that should be indistinguishable from :class:`urllib.parse.ParseResult` Let's look at some code samples. Some Examples ============= First we'll parse the URL that points to the repository for this project. .. testsetup:: * import rfc3986 url = rfc3986.urlparse('https://github.com/sigmavirus24/rfc3986') uri = rfc3986.uri_reference('https://github.com/sigmavirus24/rfc3986') .. code-block:: python url = rfc3986.urlparse('https://github.com/sigmavirus24/rfc3986') Then we'll replace parts of that URL with new values: .. testcode:: ex0 print(url.copy_with( userinfo='username:password', port='443', ).unsplit()) .. testoutput:: ex0 https://username:password@github.com:443/sigmavirus24/rfc3986 This, however, does not change the current ``url`` instance of :class:`~rfc3986.parseresult.ParseResult`. As the method name might suggest, we're copying that instance and then overriding certain attributes. In fact, we can make as many copies as we like and nothing will change. .. testcode:: ex1 print(url.copy_with( scheme='ssh', userinfo='git', ).unsplit()) .. testoutput:: ex1 ssh://git@github.com/sigmavirus24/rfc3986 .. testcode:: ex1 print(url.scheme) .. testoutput:: ex1 https We can do similar things with URI References as well. .. code-block:: python uri = rfc3986.uri_reference('https://github.com/sigmavirus24/rfc3986') .. testcode:: ex2 print(uri.copy_with( authority='username:password@github.com:443', path='/sigmavirus24/github3.py', ).unsplit()) .. testoutput:: ex2 https://username:password@github.com:443/sigmavirus24/github3.py However, URI References may have some unexpected behaviour based strictly on the RFC. Finally, if you want to remove a component from a URI, you may pass ``None`` to remove it, for example: .. testcode:: ex3 print(uri.copy_with(path=None).unsplit()) .. testoutput:: ex3 https://github.com This will work on both URI References and Parse Results. And Now For Something Slightly Unusual ====================================== If you are familiar with GitHub, GitLab, or a similar service, you may have interacted with the "SSH URL" for some projects. For this project, the SSH URL is: .. code:: git@github.com:sigmavirus24/rfc3986 Let's see what happens when we parse this. .. code-block:: pycon >>> rfc3986.uri_reference('git@github.com:sigmavirus24/rfc3986') URIReference(scheme=None, authority=None, path=u'git@github.com:sigmavirus24/rfc3986', query=None, fragment=None) There's no scheme present, but it is apparent to our (human) eyes that ``git@github.com`` should not be part of the path. This is one of the areas where :mod:`rfc3986` suffers slightly due to its strict conformance to :rfc:`3986`. In the RFC, an authority must be preceded by ``//``. Let's see what happens when we add that to our URI .. code-block:: pycon >>> rfc3986.uri_reference('//git@github.com:sigmavirus24/rfc3986') URIReference(scheme=None, authority=u'git@github.com:sigmavirus24', path=u'/rfc3986', query=None, fragment=None) Somewhat better, but not much. .. note:: The maintainers of :mod:`rfc3986` are working to discern better ways to parse these less common URIs in a reasonable and sensible way without losing conformance to the RFC. rfc3986-1.3.2/docs/source/user/validating.rst0000664000327200032720000001456513466311640022112 0ustar slarsonslarson00000000000000.. _validating: ================= Validating URIs ================= While not as difficult as `validating an email address`_, validating URIs is tricky. Different parts of the URI allow different characters. Those sets sometimes overlap and othertimes they don't and it's not very convenient. Luckily, |rfc3986| makes validating URIs far simpler. Example Usage ============= First we need to create an instance of a :class:`~rfc3986.validators.Validator` which takes no parameters. After that we can call methods on the instance to indicate what we want to validate. Allowing Only Trusted Domains and Schemes ----------------------------------------- Let's assume that we're building something that takes user input for a URL and we want to ensure that URL is only ever using a specific domain with https. In that case, our code would look like this: .. doctest:: >>> from rfc3986 import validators, uri_reference >>> user_url = 'https://github.com/sigmavirus24/rfc3986' >>> validator = validators.Validator().allow_schemes( ... 'https', ... ).allow_hosts( ... 'github.com', ... ) >>> validator.validate(uri_reference( ... 'https://github.com/sigmavirus24/rfc3986' ... )) >>> validator.validate(uri_reference( ... 'https://github.com/' ... )) >>> validator.validate(uri_reference( ... 'http://example.com' ... )) Traceback (most recent call last): ... rfc3986.exceptions.UnpermittedComponentError First notice that we can easily reuse our validator object for each URL. This allows users to not have to constantly reconstruct Validators for each bit of user input. Next, we have three different URLs that we validate: #. ``https://github.com/sigmavirus24/rfc3986`` #. ``https://github.com/`` #. ``http://example.com`` As it stands, our validator will allow the first two URLs to pass but will fail the third. This is specifically because we only allow URLs using ``https`` as a scheme and ``github.com`` as the domain name. Preventing Leaks of User Credentials ------------------------------------ Next, let's imagine that we want to prevent leaking user credentials. In that case, we want to ensure that there is no password in the user information portion of the authority. In that case, our new validator would look like this: .. doctest:: >>> from rfc3986 import validators, uri_reference >>> user_url = 'https://github.com/sigmavirus24/rfc3986' >>> validator = validators.Validator().allow_schemes( ... 'https', ... ).allow_hosts( ... 'github.com', ... ).forbid_use_of_password() >>> validator.validate(uri_reference( ... 'https://github.com/sigmavirus24/rfc3986' ... )) >>> validator.validate(uri_reference( ... 'https://github.com/' ... )) >>> validator.validate(uri_reference( ... 'http://example.com' ... )) Traceback (most recent call last): ... rfc3986.exceptions.UnpermittedComponentError >>> validator.validate(uri_reference( ... 'https://sigmavirus24@github.com' ... )) >>> validator.validate(uri_reference( ... 'https://sigmavirus24:not-my-real-password@github.com' ... )) Traceback (most recent call last): ... rfc3986.exceptions.PasswordForbidden Requiring the Presence of Components ------------------------------------ Up until now, we have assumed that we will get a URL that has the appropriate components for validation. For example, we assume that we will have a URL that has a scheme and hostname. However, our current validation doesn't require those items exist. .. doctest:: >>> from rfc3986 import validators, uri_reference >>> user_url = 'https://github.com/sigmavirus24/rfc3986' >>> validator = validators.Validator().allow_schemes( ... 'https', ... ).allow_hosts( ... 'github.com', ... ).forbid_use_of_password() >>> validator.validate(uri_reference('//github.com')) >>> validator.validate(uri_reference('https:/')) In the first case, we have a host name but no scheme and in the second we have a scheme and a path but no host. If we want to ensure that those components are there and that they are *always* what we allow, then we must add one last item to our validator: .. doctest:: >>> from rfc3986 import validators, uri_reference >>> user_url = 'https://github.com/sigmavirus24/rfc3986' >>> validator = validators.Validator().allow_schemes( ... 'https', ... ).allow_hosts( ... 'github.com', ... ).forbid_use_of_password( ... ).require_presence_of( ... 'scheme', 'host', ... ) >>> validator.validate(uri_reference('//github.com')) Traceback (most recent call last): ... rfc3986.exceptions.MissingComponentError >>> validator.validate(uri_reference('https:/')) Traceback (most recent call last): ... rfc3986.exceptions.MissingComponentError >>> validator.validate(uri_reference('https://github.com')) >>> validator.validate(uri_reference( ... 'https://github.com/sigmavirus24/rfc3986' ... )) Checking the Validity of Components ----------------------------------- As of version 1.1.0, |rfc3986| allows users to check the validity of a URI Reference using a :class:`~rfc3986.validators.Validator`. Along with the above examples we can also check that a URI is valid per :rfc:`3986`. The validation of the components is pre-determined so all we need to do is specify which components we want to validate: .. doctest:: >>> from rfc3986 import validators, uri_reference >>> valid_uri = uri_reference('https://github.com/') >>> validator = validators.Validator().allow_schemes( ... 'https', ... ).allow_hosts( ... 'github.com', ... ).forbid_use_of_password( ... ).require_presence_of( ... 'scheme', 'host', ... ).check_validity_of( ... 'scheme', 'host', 'path', ... ) >>> validator.validate(valid_uri) >>> invalid_uri = valid_uri.copy_with(path='/#invalid/path') >>> validator.validate(invalid_uri) Traceback (most recent call last): ... rfc3986.exceptions.InvalidComponentsError Paths are not allowed to contain a ``#`` character unless it's percent-encoded. This is why our ``invalid_uri`` raises an exception when we attempt to validate it. .. links .. _validating an email address: http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx/ rfc3986-1.3.2/setup.cfg0000664000327200032720000000010313466312017015627 0ustar slarsonslarson00000000000000[bdist_wheel] universal = 1 [egg_info] tag_build = tag_date = 0 rfc3986-1.3.2/setup.py0000775000327200032720000000247413466311640015541 0ustar slarsonslarson00000000000000"""Packaging logic for the rfc3986 library.""" import io import os import sys import setuptools sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'src')) # noqa import rfc3986 packages = [ 'rfc3986', ] with io.open('README.rst', encoding='utf-8') as f: readme = f.read() setuptools.setup( name='rfc3986', version=rfc3986.__version__, description='Validating URI References per RFC 3986', long_description=readme, author='Ian Stapleton Cordasco', author_email='graffatcolmingov@gmail.com', url='http://rfc3986.readthedocs.io', packages=packages, package_dir={'': 'src'}, package_data={'': ['LICENSE']}, include_package_data=True, license='Apache 2.0', classifiers=( 'Development Status :: 5 - Production/Stable', 'Intended Audience :: Developers', 'Natural Language :: English', 'License :: OSI Approved :: Apache Software License', 'Programming Language :: Python', 'Programming Language :: Python :: 2.7', 'Programming Language :: Python :: 3', 'Programming Language :: Python :: 3.4', 'Programming Language :: Python :: 3.5', 'Programming Language :: Python :: 3.6', 'Programming Language :: Python :: 3.7', ), extras_require={ 'idna2008': ['idna'] } ) rfc3986-1.3.2/src/0000775000327200032720000000000013466312017014603 5ustar slarsonslarson00000000000000rfc3986-1.3.2/src/rfc3986/0000775000327200032720000000000013466312017015707 5ustar slarsonslarson00000000000000rfc3986-1.3.2/src/rfc3986/__init__.py0000664000327200032720000000303213466311640020017 0ustar slarsonslarson00000000000000# -*- coding: utf-8 -*- # Copyright (c) 2014 Rackspace # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or # implied. # See the License for the specific language governing permissions and # limitations under the License. """ An implementation of semantics and validations described in RFC 3986. See http://rfc3986.readthedocs.io/ for detailed documentation. :copyright: (c) 2014 Rackspace :license: Apache v2.0, see LICENSE for details """ from .api import iri_reference from .api import IRIReference from .api import is_valid_uri from .api import normalize_uri from .api import uri_reference from .api import URIReference from .api import urlparse from .parseresult import ParseResult __title__ = 'rfc3986' __author__ = 'Ian Stapleton Cordasco' __author_email__ = 'graffatcolmingov@gmail.com' __license__ = 'Apache v2.0' __copyright__ = 'Copyright 2014 Rackspace' __version__ = '1.3.2' __all__ = ( 'ParseResult', 'URIReference', 'IRIReference', 'is_valid_uri', 'normalize_uri', 'uri_reference', 'iri_reference', 'urlparse', '__title__', '__author__', '__author_email__', '__license__', '__copyright__', '__version__', ) rfc3986-1.3.2/src/rfc3986/_mixin.py0000664000327200032720000003163613466311640017556 0ustar slarsonslarson00000000000000"""Module containing the implementation of the URIMixin class.""" import warnings from . import exceptions as exc from . import misc from . import normalizers from . import validators class URIMixin(object): """Mixin with all shared methods for URIs and IRIs.""" __hash__ = tuple.__hash__ def authority_info(self): """Return a dictionary with the ``userinfo``, ``host``, and ``port``. If the authority is not valid, it will raise a :class:`~rfc3986.exceptions.InvalidAuthority` Exception. :returns: ``{'userinfo': 'username:password', 'host': 'www.example.com', 'port': '80'}`` :rtype: dict :raises rfc3986.exceptions.InvalidAuthority: If the authority is not ``None`` and can not be parsed. """ if not self.authority: return {'userinfo': None, 'host': None, 'port': None} match = self._match_subauthority() if match is None: # In this case, we have an authority that was parsed from the URI # Reference, but it cannot be further parsed by our # misc.SUBAUTHORITY_MATCHER. In this case it must not be a valid # authority. raise exc.InvalidAuthority(self.authority.encode(self.encoding)) # We had a match, now let's ensure that it is actually a valid host # address if it is IPv4 matches = match.groupdict() host = matches.get('host') if (host and misc.IPv4_MATCHER.match(host) and not validators.valid_ipv4_host_address(host)): # If we have a host, it appears to be IPv4 and it does not have # valid bytes, it is an InvalidAuthority. raise exc.InvalidAuthority(self.authority.encode(self.encoding)) return matches def _match_subauthority(self): return misc.SUBAUTHORITY_MATCHER.match(self.authority) @property def host(self): """If present, a string representing the host.""" try: authority = self.authority_info() except exc.InvalidAuthority: return None return authority['host'] @property def port(self): """If present, the port extracted from the authority.""" try: authority = self.authority_info() except exc.InvalidAuthority: return None return authority['port'] @property def userinfo(self): """If present, the userinfo extracted from the authority.""" try: authority = self.authority_info() except exc.InvalidAuthority: return None return authority['userinfo'] def is_absolute(self): """Determine if this URI Reference is an absolute URI. See http://tools.ietf.org/html/rfc3986#section-4.3 for explanation. :returns: ``True`` if it is an absolute URI, ``False`` otherwise. :rtype: bool """ return bool(misc.ABSOLUTE_URI_MATCHER.match(self.unsplit())) def is_valid(self, **kwargs): """Determine if the URI is valid. .. deprecated:: 1.1.0 Use the :class:`~rfc3986.validators.Validator` object instead. :param bool require_scheme: Set to ``True`` if you wish to require the presence of the scheme component. :param bool require_authority: Set to ``True`` if you wish to require the presence of the authority component. :param bool require_path: Set to ``True`` if you wish to require the presence of the path component. :param bool require_query: Set to ``True`` if you wish to require the presence of the query component. :param bool require_fragment: Set to ``True`` if you wish to require the presence of the fragment component. :returns: ``True`` if the URI is valid. ``False`` otherwise. :rtype: bool """ warnings.warn("Please use rfc3986.validators.Validator instead. " "This method will be eventually removed.", DeprecationWarning) validators = [ (self.scheme_is_valid, kwargs.get('require_scheme', False)), (self.authority_is_valid, kwargs.get('require_authority', False)), (self.path_is_valid, kwargs.get('require_path', False)), (self.query_is_valid, kwargs.get('require_query', False)), (self.fragment_is_valid, kwargs.get('require_fragment', False)), ] return all(v(r) for v, r in validators) def authority_is_valid(self, require=False): """Determine if the authority component is valid. .. deprecated:: 1.1.0 Use the :class:`~rfc3986.validators.Validator` object instead. :param bool require: Set to ``True`` to require the presence of this component. :returns: ``True`` if the authority is valid. ``False`` otherwise. :rtype: bool """ warnings.warn("Please use rfc3986.validators.Validator instead. " "This method will be eventually removed.", DeprecationWarning) try: self.authority_info() except exc.InvalidAuthority: return False return validators.authority_is_valid( self.authority, host=self.host, require=require, ) def scheme_is_valid(self, require=False): """Determine if the scheme component is valid. .. deprecated:: 1.1.0 Use the :class:`~rfc3986.validators.Validator` object instead. :param str require: Set to ``True`` to require the presence of this component. :returns: ``True`` if the scheme is valid. ``False`` otherwise. :rtype: bool """ warnings.warn("Please use rfc3986.validators.Validator instead. " "This method will be eventually removed.", DeprecationWarning) return validators.scheme_is_valid(self.scheme, require) def path_is_valid(self, require=False): """Determine if the path component is valid. .. deprecated:: 1.1.0 Use the :class:`~rfc3986.validators.Validator` object instead. :param str require: Set to ``True`` to require the presence of this component. :returns: ``True`` if the path is valid. ``False`` otherwise. :rtype: bool """ warnings.warn("Please use rfc3986.validators.Validator instead. " "This method will be eventually removed.", DeprecationWarning) return validators.path_is_valid(self.path, require) def query_is_valid(self, require=False): """Determine if the query component is valid. .. deprecated:: 1.1.0 Use the :class:`~rfc3986.validators.Validator` object instead. :param str require: Set to ``True`` to require the presence of this component. :returns: ``True`` if the query is valid. ``False`` otherwise. :rtype: bool """ warnings.warn("Please use rfc3986.validators.Validator instead. " "This method will be eventually removed.", DeprecationWarning) return validators.query_is_valid(self.query, require) def fragment_is_valid(self, require=False): """Determine if the fragment component is valid. .. deprecated:: 1.1.0 Use the Validator object instead. :param str require: Set to ``True`` to require the presence of this component. :returns: ``True`` if the fragment is valid. ``False`` otherwise. :rtype: bool """ warnings.warn("Please use rfc3986.validators.Validator instead. " "This method will be eventually removed.", DeprecationWarning) return validators.fragment_is_valid(self.fragment, require) def normalized_equality(self, other_ref): """Compare this URIReference to another URIReference. :param URIReference other_ref: (required), The reference with which we're comparing. :returns: ``True`` if the references are equal, ``False`` otherwise. :rtype: bool """ return tuple(self.normalize()) == tuple(other_ref.normalize()) def resolve_with(self, base_uri, strict=False): """Use an absolute URI Reference to resolve this relative reference. Assuming this is a relative reference that you would like to resolve, use the provided base URI to resolve it. See http://tools.ietf.org/html/rfc3986#section-5 for more information. :param base_uri: Either a string or URIReference. It must be an absolute URI or it will raise an exception. :returns: A new URIReference which is the result of resolving this reference using ``base_uri``. :rtype: :class:`URIReference` :raises rfc3986.exceptions.ResolutionError: If the ``base_uri`` is not an absolute URI. """ if not isinstance(base_uri, URIMixin): base_uri = type(self).from_string(base_uri) if not base_uri.is_absolute(): raise exc.ResolutionError(base_uri) # This is optional per # http://tools.ietf.org/html/rfc3986#section-5.2.1 base_uri = base_uri.normalize() # The reference we're resolving resolving = self if not strict and resolving.scheme == base_uri.scheme: resolving = resolving.copy_with(scheme=None) # http://tools.ietf.org/html/rfc3986#page-32 if resolving.scheme is not None: target = resolving.copy_with( path=normalizers.normalize_path(resolving.path) ) else: if resolving.authority is not None: target = resolving.copy_with( scheme=base_uri.scheme, path=normalizers.normalize_path(resolving.path) ) else: if resolving.path is None: if resolving.query is not None: query = resolving.query else: query = base_uri.query target = resolving.copy_with( scheme=base_uri.scheme, authority=base_uri.authority, path=base_uri.path, query=query ) else: if resolving.path.startswith('/'): path = normalizers.normalize_path(resolving.path) else: path = normalizers.normalize_path( misc.merge_paths(base_uri, resolving.path) ) target = resolving.copy_with( scheme=base_uri.scheme, authority=base_uri.authority, path=path, query=resolving.query ) return target def unsplit(self): """Create a URI string from the components. :returns: The URI Reference reconstituted as a string. :rtype: str """ # See http://tools.ietf.org/html/rfc3986#section-5.3 result_list = [] if self.scheme: result_list.extend([self.scheme, ':']) if self.authority: result_list.extend(['//', self.authority]) if self.path: result_list.append(self.path) if self.query is not None: result_list.extend(['?', self.query]) if self.fragment is not None: result_list.extend(['#', self.fragment]) return ''.join(result_list) def copy_with(self, scheme=misc.UseExisting, authority=misc.UseExisting, path=misc.UseExisting, query=misc.UseExisting, fragment=misc.UseExisting): """Create a copy of this reference with the new components. :param str scheme: (optional) The scheme to use for the new reference. :param str authority: (optional) The authority to use for the new reference. :param str path: (optional) The path to use for the new reference. :param str query: (optional) The query to use for the new reference. :param str fragment: (optional) The fragment to use for the new reference. :returns: New URIReference with provided components. :rtype: URIReference """ attributes = { 'scheme': scheme, 'authority': authority, 'path': path, 'query': query, 'fragment': fragment, } for key, value in list(attributes.items()): if value is misc.UseExisting: del attributes[key] uri = self._replace(**attributes) uri.encoding = self.encoding return uri rfc3986-1.3.2/src/rfc3986/abnf_regexp.py0000664000327200032720000002157113466311640020550 0ustar slarsonslarson00000000000000# -*- coding: utf-8 -*- # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or # implied. # See the License for the specific language governing permissions and # limitations under the License. """Module for the regular expressions crafted from ABNF.""" import sys # https://tools.ietf.org/html/rfc3986#page-13 GEN_DELIMS = GENERIC_DELIMITERS = ":/?#[]@" GENERIC_DELIMITERS_SET = set(GENERIC_DELIMITERS) # https://tools.ietf.org/html/rfc3986#page-13 SUB_DELIMS = SUB_DELIMITERS = "!$&'()*+,;=" SUB_DELIMITERS_SET = set(SUB_DELIMITERS) # Escape the '*' for use in regular expressions SUB_DELIMITERS_RE = r"!$&'()\*+,;=" RESERVED_CHARS_SET = GENERIC_DELIMITERS_SET.union(SUB_DELIMITERS_SET) ALPHA = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz' DIGIT = '0123456789' # https://tools.ietf.org/html/rfc3986#section-2.3 UNRESERVED = UNRESERVED_CHARS = ALPHA + DIGIT + r'._!-' UNRESERVED_CHARS_SET = set(UNRESERVED_CHARS) NON_PCT_ENCODED_SET = RESERVED_CHARS_SET.union(UNRESERVED_CHARS_SET) # We need to escape the '-' in this case: UNRESERVED_RE = r'A-Za-z0-9._~\-' # Percent encoded character values PERCENT_ENCODED = PCT_ENCODED = '%[A-Fa-f0-9]{2}' PCHAR = '([' + UNRESERVED_RE + SUB_DELIMITERS_RE + ':@]|%s)' % PCT_ENCODED # NOTE(sigmavirus24): We're going to use more strict regular expressions # than appear in Appendix B for scheme. This will prevent over-eager # consuming of items that aren't schemes. SCHEME_RE = '[a-zA-Z][a-zA-Z0-9+.-]*' _AUTHORITY_RE = '[^/?#]*' _PATH_RE = '[^?#]*' _QUERY_RE = '[^#]*' _FRAGMENT_RE = '.*' # Extracted from http://tools.ietf.org/html/rfc3986#appendix-B COMPONENT_PATTERN_DICT = { 'scheme': SCHEME_RE, 'authority': _AUTHORITY_RE, 'path': _PATH_RE, 'query': _QUERY_RE, 'fragment': _FRAGMENT_RE, } # See http://tools.ietf.org/html/rfc3986#appendix-B # In this case, we name each of the important matches so we can use # SRE_Match#groupdict to parse the values out if we so choose. This is also # modified to ignore other matches that are not important to the parsing of # the reference so we can also simply use SRE_Match#groups. URL_PARSING_RE = ( r'(?:(?P{scheme}):)?(?://(?P{authority}))?' r'(?P{path})(?:\?(?P{query}))?' r'(?:#(?P{fragment}))?' ).format(**COMPONENT_PATTERN_DICT) # ######################### # Authority Matcher Section # ######################### # Host patterns, see: http://tools.ietf.org/html/rfc3986#section-3.2.2 # The pattern for a regular name, e.g., www.google.com, api.github.com REGULAR_NAME_RE = REG_NAME = '((?:{0}|[{1}])*)'.format( '%[0-9A-Fa-f]{2}', SUB_DELIMITERS_RE + UNRESERVED_RE ) # The pattern for an IPv4 address, e.g., 192.168.255.255, 127.0.0.1, IPv4_RE = r'([0-9]{1,3}\.){3}[0-9]{1,3}' # Hexadecimal characters used in each piece of an IPv6 address HEXDIG_RE = '[0-9A-Fa-f]{1,4}' # Least-significant 32 bits of an IPv6 address LS32_RE = '({hex}:{hex}|{ipv4})'.format(hex=HEXDIG_RE, ipv4=IPv4_RE) # Substitutions into the following patterns for IPv6 patterns defined # http://tools.ietf.org/html/rfc3986#page-20 _subs = {'hex': HEXDIG_RE, 'ls32': LS32_RE} # Below: h16 = hexdig, see: https://tools.ietf.org/html/rfc5234 for details # about ABNF (Augmented Backus-Naur Form) use in the comments variations = [ # 6( h16 ":" ) ls32 '(%(hex)s:){6}%(ls32)s' % _subs, # "::" 5( h16 ":" ) ls32 '::(%(hex)s:){5}%(ls32)s' % _subs, # [ h16 ] "::" 4( h16 ":" ) ls32 '(%(hex)s)?::(%(hex)s:){4}%(ls32)s' % _subs, # [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32 '((%(hex)s:)?%(hex)s)?::(%(hex)s:){3}%(ls32)s' % _subs, # [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32 '((%(hex)s:){0,2}%(hex)s)?::(%(hex)s:){2}%(ls32)s' % _subs, # [ *3( h16 ":" ) h16 ] "::" h16 ":" ls32 '((%(hex)s:){0,3}%(hex)s)?::%(hex)s:%(ls32)s' % _subs, # [ *4( h16 ":" ) h16 ] "::" ls32 '((%(hex)s:){0,4}%(hex)s)?::%(ls32)s' % _subs, # [ *5( h16 ":" ) h16 ] "::" h16 '((%(hex)s:){0,5}%(hex)s)?::%(hex)s' % _subs, # [ *6( h16 ":" ) h16 ] "::" '((%(hex)s:){0,6}%(hex)s)?::' % _subs, ] IPv6_RE = '(({0})|({1})|({2})|({3})|({4})|({5})|({6})|({7})|({8}))'.format( *variations ) IPv_FUTURE_RE = r'v[0-9A-Fa-f]+\.[%s]+' % ( UNRESERVED_RE + SUB_DELIMITERS_RE + ':' ) # RFC 6874 Zone ID ABNF ZONE_ID = '(?:[' + UNRESERVED_RE + ']|' + PCT_ENCODED + ')+' IPv6_ADDRZ_RFC4007_RE = IPv6_RE + '(?:(?:%25|%)' + ZONE_ID + ')?' IPv6_ADDRZ_RE = IPv6_RE + '(?:%25' + ZONE_ID + ')?' IP_LITERAL_RE = r'\[({0}|{1})\]'.format( IPv6_ADDRZ_RFC4007_RE, IPv_FUTURE_RE, ) # Pattern for matching the host piece of the authority HOST_RE = HOST_PATTERN = '({0}|{1}|{2})'.format( REG_NAME, IPv4_RE, IP_LITERAL_RE, ) USERINFO_RE = '^([' + UNRESERVED_RE + SUB_DELIMITERS_RE + ':]|%s)+' % ( PCT_ENCODED ) PORT_RE = '[0-9]{1,5}' # #################### # Path Matcher Section # #################### # See http://tools.ietf.org/html/rfc3986#section-3.3 for more information # about the path patterns defined below. segments = { 'segment': PCHAR + '*', # Non-zero length segment 'segment-nz': PCHAR + '+', # Non-zero length segment without ":" 'segment-nz-nc': PCHAR.replace(':', '') + '+' } # Path types taken from Section 3.3 (linked above) PATH_EMPTY = '^$' PATH_ROOTLESS = '%(segment-nz)s(/%(segment)s)*' % segments PATH_NOSCHEME = '%(segment-nz-nc)s(/%(segment)s)*' % segments PATH_ABSOLUTE = '/(%s)?' % PATH_ROOTLESS PATH_ABEMPTY = '(/%(segment)s)*' % segments PATH_RE = '^(%s|%s|%s|%s|%s)$' % ( PATH_ABEMPTY, PATH_ABSOLUTE, PATH_NOSCHEME, PATH_ROOTLESS, PATH_EMPTY ) FRAGMENT_RE = QUERY_RE = ( '^([/?:@' + UNRESERVED_RE + SUB_DELIMITERS_RE + ']|%s)*$' % PCT_ENCODED ) # ########################## # Relative reference matcher # ########################## # See http://tools.ietf.org/html/rfc3986#section-4.2 for details RELATIVE_PART_RE = '(//%s%s|%s|%s|%s)' % ( COMPONENT_PATTERN_DICT['authority'], PATH_ABEMPTY, PATH_ABSOLUTE, PATH_NOSCHEME, PATH_EMPTY, ) # See http://tools.ietf.org/html/rfc3986#section-3 for definition HIER_PART_RE = '(//%s%s|%s|%s|%s)' % ( COMPONENT_PATTERN_DICT['authority'], PATH_ABEMPTY, PATH_ABSOLUTE, PATH_ROOTLESS, PATH_EMPTY, ) # ############### # IRIs / RFC 3987 # ############### # Only wide-unicode gets the high-ranges of UCSCHAR if sys.maxunicode > 0xFFFF: # pragma: no cover IPRIVATE = u'\uE000-\uF8FF\U000F0000-\U000FFFFD\U00100000-\U0010FFFD' UCSCHAR_RE = ( u'\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF' u'\U00010000-\U0001FFFD\U00020000-\U0002FFFD' u'\U00030000-\U0003FFFD\U00040000-\U0004FFFD' u'\U00050000-\U0005FFFD\U00060000-\U0006FFFD' u'\U00070000-\U0007FFFD\U00080000-\U0008FFFD' u'\U00090000-\U0009FFFD\U000A0000-\U000AFFFD' u'\U000B0000-\U000BFFFD\U000C0000-\U000CFFFD' u'\U000D0000-\U000DFFFD\U000E1000-\U000EFFFD' ) else: # pragma: no cover IPRIVATE = u'\uE000-\uF8FF' UCSCHAR_RE = ( u'\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF' ) IUNRESERVED_RE = u'A-Za-z0-9\\._~\\-' + UCSCHAR_RE IPCHAR = u'([' + IUNRESERVED_RE + SUB_DELIMITERS_RE + u':@]|%s)' % PCT_ENCODED isegments = { 'isegment': IPCHAR + u'*', # Non-zero length segment 'isegment-nz': IPCHAR + u'+', # Non-zero length segment without ":" 'isegment-nz-nc': IPCHAR.replace(':', '') + u'+' } IPATH_ROOTLESS = u'%(isegment-nz)s(/%(isegment)s)*' % isegments IPATH_NOSCHEME = u'%(isegment-nz-nc)s(/%(isegment)s)*' % isegments IPATH_ABSOLUTE = u'/(?:%s)?' % IPATH_ROOTLESS IPATH_ABEMPTY = u'(?:/%(isegment)s)*' % isegments IPATH_RE = u'^(?:%s|%s|%s|%s|%s)$' % ( IPATH_ABEMPTY, IPATH_ABSOLUTE, IPATH_NOSCHEME, IPATH_ROOTLESS, PATH_EMPTY ) IREGULAR_NAME_RE = IREG_NAME = u'(?:{0}|[{1}])*'.format( u'%[0-9A-Fa-f]{2}', SUB_DELIMITERS_RE + IUNRESERVED_RE ) IHOST_RE = IHOST_PATTERN = u'({0}|{1}|{2})'.format( IREG_NAME, IPv4_RE, IP_LITERAL_RE, ) IUSERINFO_RE = u'^(?:[' + IUNRESERVED_RE + SUB_DELIMITERS_RE + u':]|%s)+' % ( PCT_ENCODED ) IFRAGMENT_RE = (u'^(?:[/?:@' + IUNRESERVED_RE + SUB_DELIMITERS_RE + u']|%s)*$' % PCT_ENCODED) IQUERY_RE = (u'^(?:[/?:@' + IUNRESERVED_RE + SUB_DELIMITERS_RE + IPRIVATE + u']|%s)*$' % PCT_ENCODED) IRELATIVE_PART_RE = u'(//%s%s|%s|%s|%s)' % ( COMPONENT_PATTERN_DICT['authority'], IPATH_ABEMPTY, IPATH_ABSOLUTE, IPATH_NOSCHEME, PATH_EMPTY, ) IHIER_PART_RE = u'(//%s%s|%s|%s|%s)' % ( COMPONENT_PATTERN_DICT['authority'], IPATH_ABEMPTY, IPATH_ABSOLUTE, IPATH_ROOTLESS, PATH_EMPTY, ) rfc3986-1.3.2/src/rfc3986/api.py0000664000327200032720000000745713466311640017050 0ustar slarsonslarson00000000000000# -*- coding: utf-8 -*- # Copyright (c) 2014 Rackspace # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or # implied. # See the License for the specific language governing permissions and # limitations under the License. """ Module containing the simple and functional API for rfc3986. This module defines functions and provides access to the public attributes and classes of rfc3986. """ from .iri import IRIReference from .parseresult import ParseResult from .uri import URIReference def uri_reference(uri, encoding='utf-8'): """Parse a URI string into a URIReference. This is a convenience function. You could achieve the same end by using ``URIReference.from_string(uri)``. :param str uri: The URI which needs to be parsed into a reference. :param str encoding: The encoding of the string provided :returns: A parsed URI :rtype: :class:`URIReference` """ return URIReference.from_string(uri, encoding) def iri_reference(iri, encoding='utf-8'): """Parse a IRI string into an IRIReference. This is a convenience function. You could achieve the same end by using ``IRIReference.from_string(iri)``. :param str iri: The IRI which needs to be parsed into a reference. :param str encoding: The encoding of the string provided :returns: A parsed IRI :rtype: :class:`IRIReference` """ return IRIReference.from_string(iri, encoding) def is_valid_uri(uri, encoding='utf-8', **kwargs): """Determine if the URI given is valid. This is a convenience function. You could use either ``uri_reference(uri).is_valid()`` or ``URIReference.from_string(uri).is_valid()`` to achieve the same result. :param str uri: The URI to be validated. :param str encoding: The encoding of the string provided :param bool require_scheme: Set to ``True`` if you wish to require the presence of the scheme component. :param bool require_authority: Set to ``True`` if you wish to require the presence of the authority component. :param bool require_path: Set to ``True`` if you wish to require the presence of the path component. :param bool require_query: Set to ``True`` if you wish to require the presence of the query component. :param bool require_fragment: Set to ``True`` if you wish to require the presence of the fragment component. :returns: ``True`` if the URI is valid, ``False`` otherwise. :rtype: bool """ return URIReference.from_string(uri, encoding).is_valid(**kwargs) def normalize_uri(uri, encoding='utf-8'): """Normalize the given URI. This is a convenience function. You could use either ``uri_reference(uri).normalize().unsplit()`` or ``URIReference.from_string(uri).normalize().unsplit()`` instead. :param str uri: The URI to be normalized. :param str encoding: The encoding of the string provided :returns: The normalized URI. :rtype: str """ normalized_reference = URIReference.from_string(uri, encoding).normalize() return normalized_reference.unsplit() def urlparse(uri, encoding='utf-8'): """Parse a given URI and return a ParseResult. This is a partial replacement of the standard library's urlparse function. :param str uri: The URI to be parsed. :param str encoding: The encoding of the string provided. :returns: A parsed URI :rtype: :class:`~rfc3986.parseresult.ParseResult` """ return ParseResult.from_string(uri, encoding, strict=False) rfc3986-1.3.2/src/rfc3986/builder.py0000664000327200032720000002255113466311640017715 0ustar slarsonslarson00000000000000# -*- coding: utf-8 -*- # Copyright (c) 2017 Ian Stapleton Cordasco # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or # implied. # See the License for the specific language governing permissions and # limitations under the License. """Module containing the logic for the URIBuilder object.""" from . import compat from . import normalizers from . import uri class URIBuilder(object): """Object to aid in building up a URI Reference from parts. .. note:: This object should be instantiated by the user, but it's recommended that it is not provided with arguments. Instead, use the available method to populate the fields. """ def __init__(self, scheme=None, userinfo=None, host=None, port=None, path=None, query=None, fragment=None): """Initialize our URI builder. :param str scheme: (optional) :param str userinfo: (optional) :param str host: (optional) :param int port: (optional) :param str path: (optional) :param str query: (optional) :param str fragment: (optional) """ self.scheme = scheme self.userinfo = userinfo self.host = host self.port = port self.path = path self.query = query self.fragment = fragment def __repr__(self): """Provide a convenient view of our builder object.""" formatstr = ('URIBuilder(scheme={b.scheme}, userinfo={b.userinfo}, ' 'host={b.host}, port={b.port}, path={b.path}, ' 'query={b.query}, fragment={b.fragment})') return formatstr.format(b=self) def add_scheme(self, scheme): """Add a scheme to our builder object. After normalizing, this will generate a new URIBuilder instance with the specified scheme and all other attributes the same. .. code-block:: python >>> URIBuilder().add_scheme('HTTPS') URIBuilder(scheme='https', userinfo=None, host=None, port=None, path=None, query=None, fragment=None) """ scheme = normalizers.normalize_scheme(scheme) return URIBuilder( scheme=scheme, userinfo=self.userinfo, host=self.host, port=self.port, path=self.path, query=self.query, fragment=self.fragment, ) def add_credentials(self, username, password): """Add credentials as the userinfo portion of the URI. .. code-block:: python >>> URIBuilder().add_credentials('root', 's3crete') URIBuilder(scheme=None, userinfo='root:s3crete', host=None, port=None, path=None, query=None, fragment=None) >>> URIBuilder().add_credentials('root', None) URIBuilder(scheme=None, userinfo='root', host=None, port=None, path=None, query=None, fragment=None) """ if username is None: raise ValueError('Username cannot be None') userinfo = normalizers.normalize_username(username) if password is not None: userinfo = '{}:{}'.format( userinfo, normalizers.normalize_password(password), ) return URIBuilder( scheme=self.scheme, userinfo=userinfo, host=self.host, port=self.port, path=self.path, query=self.query, fragment=self.fragment, ) def add_host(self, host): """Add hostname to the URI. .. code-block:: python >>> URIBuilder().add_host('google.com') URIBuilder(scheme=None, userinfo=None, host='google.com', port=None, path=None, query=None, fragment=None) """ return URIBuilder( scheme=self.scheme, userinfo=self.userinfo, host=normalizers.normalize_host(host), port=self.port, path=self.path, query=self.query, fragment=self.fragment, ) def add_port(self, port): """Add port to the URI. .. code-block:: python >>> URIBuilder().add_port(80) URIBuilder(scheme=None, userinfo=None, host=None, port='80', path=None, query=None, fragment=None) >>> URIBuilder().add_port(443) URIBuilder(scheme=None, userinfo=None, host=None, port='443', path=None, query=None, fragment=None) """ port_int = int(port) if port_int < 0: raise ValueError( 'ports are not allowed to be negative. You provided {}'.format( port_int, ) ) if port_int > 65535: raise ValueError( 'ports are not allowed to be larger than 65535. ' 'You provided {}'.format( port_int, ) ) return URIBuilder( scheme=self.scheme, userinfo=self.userinfo, host=self.host, port='{}'.format(port_int), path=self.path, query=self.query, fragment=self.fragment, ) def add_path(self, path): """Add a path to the URI. .. code-block:: python >>> URIBuilder().add_path('sigmavirus24/rfc3985') URIBuilder(scheme=None, userinfo=None, host=None, port=None, path='/sigmavirus24/rfc3986', query=None, fragment=None) >>> URIBuilder().add_path('/checkout.php') URIBuilder(scheme=None, userinfo=None, host=None, port=None, path='/checkout.php', query=None, fragment=None) """ if not path.startswith('/'): path = '/{}'.format(path) return URIBuilder( scheme=self.scheme, userinfo=self.userinfo, host=self.host, port=self.port, path=normalizers.normalize_path(path), query=self.query, fragment=self.fragment, ) def add_query_from(self, query_items): """Generate and add a query a dictionary or list of tuples. .. code-block:: python >>> URIBuilder().add_query_from({'a': 'b c'}) URIBuilder(scheme=None, userinfo=None, host=None, port=None, path=None, query='a=b+c', fragment=None) >>> URIBuilder().add_query_from([('a', 'b c')]) URIBuilder(scheme=None, userinfo=None, host=None, port=None, path=None, query='a=b+c', fragment=None) """ query = normalizers.normalize_query(compat.urlencode(query_items)) return URIBuilder( scheme=self.scheme, userinfo=self.userinfo, host=self.host, port=self.port, path=self.path, query=query, fragment=self.fragment, ) def add_query(self, query): """Add a pre-formated query string to the URI. .. code-block:: python >>> URIBuilder().add_query('a=b&c=d') URIBuilder(scheme=None, userinfo=None, host=None, port=None, path=None, query='a=b&c=d', fragment=None) """ return URIBuilder( scheme=self.scheme, userinfo=self.userinfo, host=self.host, port=self.port, path=self.path, query=normalizers.normalize_query(query), fragment=self.fragment, ) def add_fragment(self, fragment): """Add a fragment to the URI. .. code-block:: python >>> URIBuilder().add_fragment('section-2.6.1') URIBuilder(scheme=None, userinfo=None, host=None, port=None, path=None, query=None, fragment='section-2.6.1') """ return URIBuilder( scheme=self.scheme, userinfo=self.userinfo, host=self.host, port=self.port, path=self.path, query=self.query, fragment=normalizers.normalize_fragment(fragment), ) def finalize(self): """Create a URIReference from our builder. .. code-block:: python >>> URIBuilder().add_scheme('https').add_host('github.com' ... ).add_path('sigmavirus24/rfc3986').finalize().unsplit() 'https://github.com/sigmavirus24/rfc3986' >>> URIBuilder().add_scheme('https').add_host('github.com' ... ).add_path('sigmavirus24/rfc3986').add_credentials( ... 'sigmavirus24', 'not-re@l').finalize().unsplit() 'https://sigmavirus24:not-re%40l@github.com/sigmavirus24/rfc3986' """ return uri.URIReference( self.scheme, normalizers.normalize_authority( (self.userinfo, self.host, self.port) ), self.path, self.query, self.fragment, ) rfc3986-1.3.2/src/rfc3986/compat.py0000664000327200032720000000275113466311640017552 0ustar slarsonslarson00000000000000# -*- coding: utf-8 -*- # Copyright (c) 2014 Rackspace # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or # implied. # See the License for the specific language governing permissions and # limitations under the License. """Compatibility module for Python 2 and 3 support.""" import sys try: from urllib.parse import quote as urlquote except ImportError: # Python 2.x from urllib import quote as urlquote try: from urllib.parse import urlencode except ImportError: # Python 2.x from urllib import urlencode __all__ = ( 'to_bytes', 'to_str', 'urlquote', 'urlencode', ) PY3 = (3, 0) <= sys.version_info < (4, 0) PY2 = (2, 6) <= sys.version_info < (2, 8) if PY3: unicode = str # Python 3.x def to_str(b, encoding='utf-8'): """Ensure that b is text in the specified encoding.""" if hasattr(b, 'decode') and not isinstance(b, unicode): b = b.decode(encoding) return b def to_bytes(s, encoding='utf-8'): """Ensure that s is converted to bytes from the encoding.""" if hasattr(s, 'encode') and not isinstance(s, bytes): s = s.encode(encoding) return s rfc3986-1.3.2/src/rfc3986/exceptions.py0000664000327200032720000000727713466311640020460 0ustar slarsonslarson00000000000000# -*- coding: utf-8 -*- """Exceptions module for rfc3986.""" from . import compat class RFC3986Exception(Exception): """Base class for all rfc3986 exception classes.""" pass class InvalidAuthority(RFC3986Exception): """Exception when the authority string is invalid.""" def __init__(self, authority): """Initialize the exception with the invalid authority.""" super(InvalidAuthority, self).__init__( u"The authority ({0}) is not valid.".format( compat.to_str(authority))) class InvalidPort(RFC3986Exception): """Exception when the port is invalid.""" def __init__(self, port): """Initialize the exception with the invalid port.""" super(InvalidPort, self).__init__( 'The port ("{0}") is not valid.'.format(port)) class ResolutionError(RFC3986Exception): """Exception to indicate a failure to resolve a URI.""" def __init__(self, uri): """Initialize the error with the failed URI.""" super(ResolutionError, self).__init__( "{0} is not an absolute URI.".format(uri.unsplit())) class ValidationError(RFC3986Exception): """Exception raised during Validation of a URI.""" pass class MissingComponentError(ValidationError): """Exception raised when a required component is missing.""" def __init__(self, uri, *component_names): """Initialize the error with the missing component name.""" verb = 'was' if len(component_names) > 1: verb = 'were' self.uri = uri self.components = sorted(component_names) components = ', '.join(self.components) super(MissingComponentError, self).__init__( "{} {} required but missing".format(components, verb), uri, self.components, ) class UnpermittedComponentError(ValidationError): """Exception raised when a component has an unpermitted value.""" def __init__(self, component_name, component_value, allowed_values): """Initialize the error with the unpermitted component.""" super(UnpermittedComponentError, self).__init__( "{} was required to be one of {!r} but was {!r}".format( component_name, list(sorted(allowed_values)), component_value, ), component_name, component_value, allowed_values, ) self.component_name = component_name self.component_value = component_value self.allowed_values = allowed_values class PasswordForbidden(ValidationError): """Exception raised when a URL has a password in the userinfo section.""" def __init__(self, uri): """Initialize the error with the URI that failed validation.""" unsplit = getattr(uri, 'unsplit', lambda: uri) super(PasswordForbidden, self).__init__( '"{}" contained a password when validation forbade it'.format( unsplit() ) ) self.uri = uri class InvalidComponentsError(ValidationError): """Exception raised when one or more components are invalid.""" def __init__(self, uri, *component_names): """Initialize the error with the invalid component name(s).""" verb = 'was' if len(component_names) > 1: verb = 'were' self.uri = uri self.components = sorted(component_names) components = ', '.join(self.components) super(InvalidComponentsError, self).__init__( "{} {} found to be invalid".format(components, verb), uri, self.components, ) class MissingDependencyError(RFC3986Exception): """Exception raised when an IRI is encoded without the 'idna' module.""" rfc3986-1.3.2/src/rfc3986/iri.py0000664000327200032720000001253213466311640017050 0ustar slarsonslarson00000000000000"""Module containing the implementation of the IRIReference class.""" # -*- coding: utf-8 -*- # Copyright (c) 2014 Rackspace # Copyright (c) 2015 Ian Stapleton Cordasco # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or # implied. # See the License for the specific language governing permissions and # limitations under the License. from collections import namedtuple from . import compat from . import exceptions from . import misc from . import normalizers from . import uri try: import idna except ImportError: # pragma: no cover idna = None class IRIReference(namedtuple('IRIReference', misc.URI_COMPONENTS), uri.URIMixin): """Immutable object representing a parsed IRI Reference. Can be encoded into an URIReference object via the procedure specified in RFC 3987 Section 3.1 .. note:: The IRI submodule is a new interface and may possibly change in the future. Check for changes to the interface when upgrading. """ slots = () def __new__(cls, scheme, authority, path, query, fragment, encoding='utf-8'): """Create a new IRIReference.""" ref = super(IRIReference, cls).__new__( cls, scheme or None, authority or None, path or None, query, fragment) ref.encoding = encoding return ref def __eq__(self, other): """Compare this reference to another.""" other_ref = other if isinstance(other, tuple): other_ref = self.__class__(*other) elif not isinstance(other, IRIReference): try: other_ref = self.__class__.from_string(other) except TypeError: raise TypeError( 'Unable to compare {0}() to {1}()'.format( type(self).__name__, type(other).__name__)) # See http://tools.ietf.org/html/rfc3986#section-6.2 return tuple(self) == tuple(other_ref) def _match_subauthority(self): return misc.ISUBAUTHORITY_MATCHER.match(self.authority) @classmethod def from_string(cls, iri_string, encoding='utf-8'): """Parse a IRI reference from the given unicode IRI string. :param str iri_string: Unicode IRI to be parsed into a reference. :param str encoding: The encoding of the string provided :returns: :class:`IRIReference` or subclass thereof """ iri_string = compat.to_str(iri_string, encoding) split_iri = misc.IRI_MATCHER.match(iri_string).groupdict() return cls( split_iri['scheme'], split_iri['authority'], normalizers.encode_component(split_iri['path'], encoding), normalizers.encode_component(split_iri['query'], encoding), normalizers.encode_component(split_iri['fragment'], encoding), encoding, ) def encode(self, idna_encoder=None): # noqa: C901 """Encode an IRIReference into a URIReference instance. If the ``idna`` module is installed or the ``rfc3986[idna]`` extra is used then unicode characters in the IRI host component will be encoded with IDNA2008. :param idna_encoder: Function that encodes each part of the host component If not given will raise an exception if the IRI contains a host component. :rtype: uri.URIReference :returns: A URI reference """ authority = self.authority if authority: if idna_encoder is None: if idna is None: # pragma: no cover raise exceptions.MissingDependencyError( "Could not import the 'idna' module " "and the IRI hostname requires encoding" ) def idna_encoder(name): if any(ord(c) > 128 for c in name): try: return idna.encode(name.lower(), strict=True, std3_rules=True) except idna.IDNAError: raise exceptions.InvalidAuthority(self.authority) return name authority = "" if self.host: authority = ".".join([compat.to_str(idna_encoder(part)) for part in self.host.split(".")]) if self.userinfo is not None: authority = (normalizers.encode_component( self.userinfo, self.encoding) + '@' + authority) if self.port is not None: authority += ":" + str(self.port) return uri.URIReference(self.scheme, authority, path=self.path, query=self.query, fragment=self.fragment, encoding=self.encoding) rfc3986-1.3.2/src/rfc3986/misc.py0000664000327200032720000000777613466311640017236 0ustar slarsonslarson00000000000000# -*- coding: utf-8 -*- # Copyright (c) 2014 Rackspace # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or # implied. # See the License for the specific language governing permissions and # limitations under the License. """ Module containing compiled regular expressions and constants. This module contains important constants, patterns, and compiled regular expressions for parsing and validating URIs and their components. """ import re from . import abnf_regexp # These are enumerated for the named tuple used as a superclass of # URIReference URI_COMPONENTS = ['scheme', 'authority', 'path', 'query', 'fragment'] important_characters = { 'generic_delimiters': abnf_regexp.GENERIC_DELIMITERS, 'sub_delimiters': abnf_regexp.SUB_DELIMITERS, # We need to escape the '*' in this case 're_sub_delimiters': abnf_regexp.SUB_DELIMITERS_RE, 'unreserved_chars': abnf_regexp.UNRESERVED_CHARS, # We need to escape the '-' in this case: 're_unreserved': abnf_regexp.UNRESERVED_RE, } # For details about delimiters and reserved characters, see: # http://tools.ietf.org/html/rfc3986#section-2.2 GENERIC_DELIMITERS = abnf_regexp.GENERIC_DELIMITERS_SET SUB_DELIMITERS = abnf_regexp.SUB_DELIMITERS_SET RESERVED_CHARS = abnf_regexp.RESERVED_CHARS_SET # For details about unreserved characters, see: # http://tools.ietf.org/html/rfc3986#section-2.3 UNRESERVED_CHARS = abnf_regexp.UNRESERVED_CHARS_SET NON_PCT_ENCODED = abnf_regexp.NON_PCT_ENCODED_SET URI_MATCHER = re.compile(abnf_regexp.URL_PARSING_RE) SUBAUTHORITY_MATCHER = re.compile(( '^(?:(?P{0})@)?' # userinfo '(?P{1})' # host ':?(?P{2})?$' # port ).format(abnf_regexp.USERINFO_RE, abnf_regexp.HOST_PATTERN, abnf_regexp.PORT_RE)) HOST_MATCHER = re.compile('^' + abnf_regexp.HOST_RE + '$') IPv4_MATCHER = re.compile('^' + abnf_regexp.IPv4_RE + '$') IPv6_MATCHER = re.compile(r'^\[' + abnf_regexp.IPv6_ADDRZ_RFC4007_RE + r'\]$') # Used by host validator IPv6_NO_RFC4007_MATCHER = re.compile(r'^\[%s\]$' % ( abnf_regexp.IPv6_ADDRZ_RE )) # Matcher used to validate path components PATH_MATCHER = re.compile(abnf_regexp.PATH_RE) # ################################## # Query and Fragment Matcher Section # ################################## QUERY_MATCHER = re.compile(abnf_regexp.QUERY_RE) FRAGMENT_MATCHER = QUERY_MATCHER # Scheme validation, see: http://tools.ietf.org/html/rfc3986#section-3.1 SCHEME_MATCHER = re.compile('^{0}$'.format(abnf_regexp.SCHEME_RE)) RELATIVE_REF_MATCHER = re.compile(r'^%s(\?%s)?(#%s)?$' % ( abnf_regexp.RELATIVE_PART_RE, abnf_regexp.QUERY_RE, abnf_regexp.FRAGMENT_RE, )) # See http://tools.ietf.org/html/rfc3986#section-4.3 ABSOLUTE_URI_MATCHER = re.compile(r'^%s:%s(\?%s)?$' % ( abnf_regexp.COMPONENT_PATTERN_DICT['scheme'], abnf_regexp.HIER_PART_RE, abnf_regexp.QUERY_RE[1:-1], )) # ############### # IRIs / RFC 3987 # ############### IRI_MATCHER = re.compile(abnf_regexp.URL_PARSING_RE, re.UNICODE) ISUBAUTHORITY_MATCHER = re.compile(( u'^(?:(?P{0})@)?' # iuserinfo u'(?P{1})' # ihost u':?(?P{2})?$' # port ).format(abnf_regexp.IUSERINFO_RE, abnf_regexp.IHOST_RE, abnf_regexp.PORT_RE), re.UNICODE) # Path merger as defined in http://tools.ietf.org/html/rfc3986#section-5.2.3 def merge_paths(base_uri, relative_path): """Merge a base URI's path with a relative URI's path.""" if base_uri.path is None and base_uri.authority is not None: return '/' + relative_path else: path = base_uri.path or '' index = path.rfind('/') return path[:index] + '/' + relative_path UseExisting = object() rfc3986-1.3.2/src/rfc3986/normalizers.py0000664000327200032720000001221313466311640020626 0ustar slarsonslarson00000000000000# -*- coding: utf-8 -*- # Copyright (c) 2014 Rackspace # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or # implied. # See the License for the specific language governing permissions and # limitations under the License. """Module with functions to normalize components.""" import re from . import compat from . import misc def normalize_scheme(scheme): """Normalize the scheme component.""" return scheme.lower() def normalize_authority(authority): """Normalize an authority tuple to a string.""" userinfo, host, port = authority result = '' if userinfo: result += normalize_percent_characters(userinfo) + '@' if host: result += normalize_host(host) if port: result += ':' + port return result def normalize_username(username): """Normalize a username to make it safe to include in userinfo.""" return compat.urlquote(username) def normalize_password(password): """Normalize a password to make safe for userinfo.""" return compat.urlquote(password) def normalize_host(host): """Normalize a host string.""" if misc.IPv6_MATCHER.match(host): percent = host.find('%') if percent != -1: percent_25 = host.find('%25') # Replace RFC 4007 IPv6 Zone ID delimiter '%' with '%25' # from RFC 6874. If the host is '[%25]' then we # assume RFC 4007 and normalize to '[%2525]' if percent_25 == -1 or percent < percent_25 or \ (percent == percent_25 and percent_25 == len(host) - 4): host = host.replace('%', '%25', 1) # Don't normalize the casing of the Zone ID return host[:percent].lower() + host[percent:] return host.lower() def normalize_path(path): """Normalize the path string.""" if not path: return path path = normalize_percent_characters(path) return remove_dot_segments(path) def normalize_query(query): """Normalize the query string.""" if not query: return query return normalize_percent_characters(query) def normalize_fragment(fragment): """Normalize the fragment string.""" if not fragment: return fragment return normalize_percent_characters(fragment) PERCENT_MATCHER = re.compile('%[A-Fa-f0-9]{2}') def normalize_percent_characters(s): """All percent characters should be upper-cased. For example, ``"%3afoo%DF%ab"`` should be turned into ``"%3Afoo%DF%AB"``. """ matches = set(PERCENT_MATCHER.findall(s)) for m in matches: if not m.isupper(): s = s.replace(m, m.upper()) return s def remove_dot_segments(s): """Remove dot segments from the string. See also Section 5.2.4 of :rfc:`3986`. """ # See http://tools.ietf.org/html/rfc3986#section-5.2.4 for pseudo-code segments = s.split('/') # Turn the path into a list of segments output = [] # Initialize the variable to use to store output for segment in segments: # '.' is the current directory, so ignore it, it is superfluous if segment == '.': continue # Anything other than '..', should be appended to the output elif segment != '..': output.append(segment) # In this case segment == '..', if we can, we should pop the last # element elif output: output.pop() # If the path starts with '/' and the output is empty or the first string # is non-empty if s.startswith('/') and (not output or output[0]): output.insert(0, '') # If the path starts with '/.' or '/..' ensure we add one more empty # string to add a trailing '/' if s.endswith(('/.', '/..')): output.append('') return '/'.join(output) def encode_component(uri_component, encoding): """Encode the specific component in the provided encoding.""" if uri_component is None: return uri_component # Try to see if the component we're encoding is already percent-encoded # so we can skip all '%' characters but still encode all others. percent_encodings = len(PERCENT_MATCHER.findall( compat.to_str(uri_component, encoding))) uri_bytes = compat.to_bytes(uri_component, encoding) is_percent_encoded = percent_encodings == uri_bytes.count(b'%') encoded_uri = bytearray() for i in range(0, len(uri_bytes)): # Will return a single character bytestring on both Python 2 & 3 byte = uri_bytes[i:i+1] byte_ord = ord(byte) if ((is_percent_encoded and byte == b'%') or (byte_ord < 128 and byte.decode() in misc.NON_PCT_ENCODED)): encoded_uri.extend(byte) continue encoded_uri.extend('%{0:02x}'.format(byte_ord).encode().upper()) return encoded_uri.decode(encoding) rfc3986-1.3.2/src/rfc3986/parseresult.py0000664000327200032720000003447613466311640020651 0ustar slarsonslarson00000000000000# -*- coding: utf-8 -*- # Copyright (c) 2015 Ian Stapleton Cordasco # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or # implied. # See the License for the specific language governing permissions and # limitations under the License. """Module containing the urlparse compatibility logic.""" from collections import namedtuple from . import compat from . import exceptions from . import misc from . import normalizers from . import uri __all__ = ('ParseResult', 'ParseResultBytes') PARSED_COMPONENTS = ('scheme', 'userinfo', 'host', 'port', 'path', 'query', 'fragment') class ParseResultMixin(object): def _generate_authority(self, attributes): # I swear I did not align the comparisons below. That's just how they # happened to align based on pep8 and attribute lengths. userinfo, host, port = (attributes[p] for p in ('userinfo', 'host', 'port')) if (self.userinfo != userinfo or self.host != host or self.port != port): if port: port = '{0}'.format(port) return normalizers.normalize_authority( (compat.to_str(userinfo, self.encoding), compat.to_str(host, self.encoding), port) ) return self.authority def geturl(self): """Shim to match the standard library method.""" return self.unsplit() @property def hostname(self): """Shim to match the standard library.""" return self.host @property def netloc(self): """Shim to match the standard library.""" return self.authority @property def params(self): """Shim to match the standard library.""" return self.query class ParseResult(namedtuple('ParseResult', PARSED_COMPONENTS), ParseResultMixin): """Implementation of urlparse compatibility class. This uses the URIReference logic to handle compatibility with the urlparse.ParseResult class. """ slots = () def __new__(cls, scheme, userinfo, host, port, path, query, fragment, uri_ref, encoding='utf-8'): """Create a new ParseResult.""" parse_result = super(ParseResult, cls).__new__( cls, scheme or None, userinfo or None, host, port or None, path or None, query, fragment) parse_result.encoding = encoding parse_result.reference = uri_ref return parse_result @classmethod def from_parts(cls, scheme=None, userinfo=None, host=None, port=None, path=None, query=None, fragment=None, encoding='utf-8'): """Create a ParseResult instance from its parts.""" authority = '' if userinfo is not None: authority += userinfo + '@' if host is not None: authority += host if port is not None: authority += ':{0}'.format(port) uri_ref = uri.URIReference(scheme=scheme, authority=authority, path=path, query=query, fragment=fragment, encoding=encoding).normalize() userinfo, host, port = authority_from(uri_ref, strict=True) return cls(scheme=uri_ref.scheme, userinfo=userinfo, host=host, port=port, path=uri_ref.path, query=uri_ref.query, fragment=uri_ref.fragment, uri_ref=uri_ref, encoding=encoding) @classmethod def from_string(cls, uri_string, encoding='utf-8', strict=True, lazy_normalize=True): """Parse a URI from the given unicode URI string. :param str uri_string: Unicode URI to be parsed into a reference. :param str encoding: The encoding of the string provided :param bool strict: Parse strictly according to :rfc:`3986` if True. If False, parse similarly to the standard library's urlparse function. :returns: :class:`ParseResult` or subclass thereof """ reference = uri.URIReference.from_string(uri_string, encoding) if not lazy_normalize: reference = reference.normalize() userinfo, host, port = authority_from(reference, strict) return cls(scheme=reference.scheme, userinfo=userinfo, host=host, port=port, path=reference.path, query=reference.query, fragment=reference.fragment, uri_ref=reference, encoding=encoding) @property def authority(self): """Return the normalized authority.""" return self.reference.authority def copy_with(self, scheme=misc.UseExisting, userinfo=misc.UseExisting, host=misc.UseExisting, port=misc.UseExisting, path=misc.UseExisting, query=misc.UseExisting, fragment=misc.UseExisting): """Create a copy of this instance replacing with specified parts.""" attributes = zip(PARSED_COMPONENTS, (scheme, userinfo, host, port, path, query, fragment)) attrs_dict = {} for name, value in attributes: if value is misc.UseExisting: value = getattr(self, name) attrs_dict[name] = value authority = self._generate_authority(attrs_dict) ref = self.reference.copy_with(scheme=attrs_dict['scheme'], authority=authority, path=attrs_dict['path'], query=attrs_dict['query'], fragment=attrs_dict['fragment']) return ParseResult(uri_ref=ref, encoding=self.encoding, **attrs_dict) def encode(self, encoding=None): """Convert to an instance of ParseResultBytes.""" encoding = encoding or self.encoding attrs = dict( zip(PARSED_COMPONENTS, (attr.encode(encoding) if hasattr(attr, 'encode') else attr for attr in self))) return ParseResultBytes( uri_ref=self.reference, encoding=encoding, **attrs ) def unsplit(self, use_idna=False): """Create a URI string from the components. :returns: The parsed URI reconstituted as a string. :rtype: str """ parse_result = self if use_idna and self.host: hostbytes = self.host.encode('idna') host = hostbytes.decode(self.encoding) parse_result = self.copy_with(host=host) return parse_result.reference.unsplit() class ParseResultBytes(namedtuple('ParseResultBytes', PARSED_COMPONENTS), ParseResultMixin): """Compatibility shim for the urlparse.ParseResultBytes object.""" def __new__(cls, scheme, userinfo, host, port, path, query, fragment, uri_ref, encoding='utf-8', lazy_normalize=True): """Create a new ParseResultBytes instance.""" parse_result = super(ParseResultBytes, cls).__new__( cls, scheme or None, userinfo or None, host, port or None, path or None, query or None, fragment or None) parse_result.encoding = encoding parse_result.reference = uri_ref parse_result.lazy_normalize = lazy_normalize return parse_result @classmethod def from_parts(cls, scheme=None, userinfo=None, host=None, port=None, path=None, query=None, fragment=None, encoding='utf-8', lazy_normalize=True): """Create a ParseResult instance from its parts.""" authority = '' if userinfo is not None: authority += userinfo + '@' if host is not None: authority += host if port is not None: authority += ':{0}'.format(int(port)) uri_ref = uri.URIReference(scheme=scheme, authority=authority, path=path, query=query, fragment=fragment, encoding=encoding) if not lazy_normalize: uri_ref = uri_ref.normalize() to_bytes = compat.to_bytes userinfo, host, port = authority_from(uri_ref, strict=True) return cls(scheme=to_bytes(scheme, encoding), userinfo=to_bytes(userinfo, encoding), host=to_bytes(host, encoding), port=port, path=to_bytes(path, encoding), query=to_bytes(query, encoding), fragment=to_bytes(fragment, encoding), uri_ref=uri_ref, encoding=encoding, lazy_normalize=lazy_normalize) @classmethod def from_string(cls, uri_string, encoding='utf-8', strict=True, lazy_normalize=True): """Parse a URI from the given unicode URI string. :param str uri_string: Unicode URI to be parsed into a reference. :param str encoding: The encoding of the string provided :param bool strict: Parse strictly according to :rfc:`3986` if True. If False, parse similarly to the standard library's urlparse function. :returns: :class:`ParseResultBytes` or subclass thereof """ reference = uri.URIReference.from_string(uri_string, encoding) if not lazy_normalize: reference = reference.normalize() userinfo, host, port = authority_from(reference, strict) to_bytes = compat.to_bytes return cls(scheme=to_bytes(reference.scheme, encoding), userinfo=to_bytes(userinfo, encoding), host=to_bytes(host, encoding), port=port, path=to_bytes(reference.path, encoding), query=to_bytes(reference.query, encoding), fragment=to_bytes(reference.fragment, encoding), uri_ref=reference, encoding=encoding, lazy_normalize=lazy_normalize) @property def authority(self): """Return the normalized authority.""" return self.reference.authority.encode(self.encoding) def copy_with(self, scheme=misc.UseExisting, userinfo=misc.UseExisting, host=misc.UseExisting, port=misc.UseExisting, path=misc.UseExisting, query=misc.UseExisting, fragment=misc.UseExisting, lazy_normalize=True): """Create a copy of this instance replacing with specified parts.""" attributes = zip(PARSED_COMPONENTS, (scheme, userinfo, host, port, path, query, fragment)) attrs_dict = {} for name, value in attributes: if value is misc.UseExisting: value = getattr(self, name) if not isinstance(value, bytes) and hasattr(value, 'encode'): value = value.encode(self.encoding) attrs_dict[name] = value authority = self._generate_authority(attrs_dict) to_str = compat.to_str ref = self.reference.copy_with( scheme=to_str(attrs_dict['scheme'], self.encoding), authority=to_str(authority, self.encoding), path=to_str(attrs_dict['path'], self.encoding), query=to_str(attrs_dict['query'], self.encoding), fragment=to_str(attrs_dict['fragment'], self.encoding) ) if not lazy_normalize: ref = ref.normalize() return ParseResultBytes( uri_ref=ref, encoding=self.encoding, lazy_normalize=lazy_normalize, **attrs_dict ) def unsplit(self, use_idna=False): """Create a URI bytes object from the components. :returns: The parsed URI reconstituted as a string. :rtype: bytes """ parse_result = self if use_idna and self.host: # self.host is bytes, to encode to idna, we need to decode it # first host = self.host.decode(self.encoding) hostbytes = host.encode('idna') parse_result = self.copy_with(host=hostbytes) if self.lazy_normalize: parse_result = parse_result.copy_with(lazy_normalize=False) uri = parse_result.reference.unsplit() return uri.encode(self.encoding) def split_authority(authority): # Initialize our expected return values userinfo = host = port = None # Initialize an extra var we may need to use extra_host = None # Set-up rest in case there is no userinfo portion rest = authority if '@' in authority: userinfo, rest = authority.rsplit('@', 1) # Handle IPv6 host addresses if rest.startswith('['): host, rest = rest.split(']', 1) host += ']' if ':' in rest: extra_host, port = rest.split(':', 1) elif not host and rest: host = rest if extra_host and not host: host = extra_host return userinfo, host, port def authority_from(reference, strict): try: subauthority = reference.authority_info() except exceptions.InvalidAuthority: if strict: raise userinfo, host, port = split_authority(reference.authority) else: # Thanks to Richard Barrell for this idea: # https://twitter.com/0x2ba22e11/status/617338811975139328 userinfo, host, port = (subauthority.get(p) for p in ('userinfo', 'host', 'port')) if port: try: port = int(port) except ValueError: raise exceptions.InvalidPort(port) return userinfo, host, port rfc3986-1.3.2/src/rfc3986/uri.py0000664000327200032720000001215313466311640017063 0ustar slarsonslarson00000000000000"""Module containing the implementation of the URIReference class.""" # -*- coding: utf-8 -*- # Copyright (c) 2014 Rackspace # Copyright (c) 2015 Ian Stapleton Cordasco # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or # implied. # See the License for the specific language governing permissions and # limitations under the License. from collections import namedtuple from . import compat from . import misc from . import normalizers from ._mixin import URIMixin class URIReference(namedtuple('URIReference', misc.URI_COMPONENTS), URIMixin): """Immutable object representing a parsed URI Reference. .. note:: This class is not intended to be directly instantiated by the user. This object exposes attributes for the following components of a URI: - scheme - authority - path - query - fragment .. attribute:: scheme The scheme that was parsed for the URI Reference. For example, ``http``, ``https``, ``smtp``, ``imap``, etc. .. attribute:: authority Component of the URI that contains the user information, host, and port sub-components. For example, ``google.com``, ``127.0.0.1:5000``, ``username@[::1]``, ``username:password@example.com:443``, etc. .. attribute:: path The path that was parsed for the given URI Reference. For example, ``/``, ``/index.php``, etc. .. attribute:: query The query component for a given URI Reference. For example, ``a=b``, ``a=b%20c``, ``a=b+c``, ``a=b,c=d,e=%20f``, etc. .. attribute:: fragment The fragment component of a URI. For example, ``section-3.1``. This class also provides extra attributes for easier access to information like the subcomponents of the authority component. .. attribute:: userinfo The user information parsed from the authority. .. attribute:: host The hostname, IPv4, or IPv6 adddres parsed from the authority. .. attribute:: port The port parsed from the authority. """ slots = () def __new__(cls, scheme, authority, path, query, fragment, encoding='utf-8'): """Create a new URIReference.""" ref = super(URIReference, cls).__new__( cls, scheme or None, authority or None, path or None, query, fragment) ref.encoding = encoding return ref __hash__ = tuple.__hash__ def __eq__(self, other): """Compare this reference to another.""" other_ref = other if isinstance(other, tuple): other_ref = URIReference(*other) elif not isinstance(other, URIReference): try: other_ref = URIReference.from_string(other) except TypeError: raise TypeError( 'Unable to compare URIReference() to {0}()'.format( type(other).__name__)) # See http://tools.ietf.org/html/rfc3986#section-6.2 naive_equality = tuple(self) == tuple(other_ref) return naive_equality or self.normalized_equality(other_ref) def normalize(self): """Normalize this reference as described in Section 6.2.2. This is not an in-place normalization. Instead this creates a new URIReference. :returns: A new reference object with normalized components. :rtype: URIReference """ # See http://tools.ietf.org/html/rfc3986#section-6.2.2 for logic in # this method. return URIReference(normalizers.normalize_scheme(self.scheme or ''), normalizers.normalize_authority( (self.userinfo, self.host, self.port)), normalizers.normalize_path(self.path or ''), normalizers.normalize_query(self.query), normalizers.normalize_fragment(self.fragment), self.encoding) @classmethod def from_string(cls, uri_string, encoding='utf-8'): """Parse a URI reference from the given unicode URI string. :param str uri_string: Unicode URI to be parsed into a reference. :param str encoding: The encoding of the string provided :returns: :class:`URIReference` or subclass thereof """ uri_string = compat.to_str(uri_string, encoding) split_uri = misc.URI_MATCHER.match(uri_string).groupdict() return cls( split_uri['scheme'], split_uri['authority'], normalizers.encode_component(split_uri['path'], encoding), normalizers.encode_component(split_uri['query'], encoding), normalizers.encode_component(split_uri['fragment'], encoding), encoding, ) rfc3986-1.3.2/src/rfc3986/validators.py0000664000327200032720000003303613466311640020437 0ustar slarsonslarson00000000000000# -*- coding: utf-8 -*- # Copyright (c) 2017 Ian Stapleton Cordasco # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or # implied. # See the License for the specific language governing permissions and # limitations under the License. """Module containing the validation logic for rfc3986.""" from . import exceptions from . import misc from . import normalizers class Validator(object): """Object used to configure validation of all objects in rfc3986. .. versionadded:: 1.0 Example usage:: >>> from rfc3986 import api, validators >>> uri = api.uri_reference('https://github.com/') >>> validator = validators.Validator().require_presence_of( ... 'scheme', 'host', 'path', ... ).allow_schemes( ... 'http', 'https', ... ).allow_hosts( ... '127.0.0.1', 'github.com', ... ) >>> validator.validate(uri) >>> invalid_uri = rfc3986.uri_reference('imap://mail.google.com') >>> validator.validate(invalid_uri) Traceback (most recent call last): ... rfc3986.exceptions.MissingComponentError: ('path was required but missing', URIReference(scheme=u'imap', authority=u'mail.google.com', path=None, query=None, fragment=None), ['path']) """ COMPONENT_NAMES = frozenset([ 'scheme', 'userinfo', 'host', 'port', 'path', 'query', 'fragment', ]) def __init__(self): """Initialize our default validations.""" self.allowed_schemes = set() self.allowed_hosts = set() self.allowed_ports = set() self.allow_password = True self.required_components = { 'scheme': False, 'userinfo': False, 'host': False, 'port': False, 'path': False, 'query': False, 'fragment': False, } self.validated_components = self.required_components.copy() def allow_schemes(self, *schemes): """Require the scheme to be one of the provided schemes. .. versionadded:: 1.0 :param schemes: Schemes, without ``://`` that are allowed. :returns: The validator instance. :rtype: Validator """ for scheme in schemes: self.allowed_schemes.add(normalizers.normalize_scheme(scheme)) return self def allow_hosts(self, *hosts): """Require the host to be one of the provided hosts. .. versionadded:: 1.0 :param hosts: Hosts that are allowed. :returns: The validator instance. :rtype: Validator """ for host in hosts: self.allowed_hosts.add(normalizers.normalize_host(host)) return self def allow_ports(self, *ports): """Require the port to be one of the provided ports. .. versionadded:: 1.0 :param ports: Ports that are allowed. :returns: The validator instance. :rtype: Validator """ for port in ports: port_int = int(port, base=10) if 0 <= port_int <= 65535: self.allowed_ports.add(port) return self def allow_use_of_password(self): """Allow passwords to be present in the URI. .. versionadded:: 1.0 :returns: The validator instance. :rtype: Validator """ self.allow_password = True return self def forbid_use_of_password(self): """Prevent passwords from being included in the URI. .. versionadded:: 1.0 :returns: The validator instance. :rtype: Validator """ self.allow_password = False return self def check_validity_of(self, *components): """Check the validity of the components provided. This can be specified repeatedly. .. versionadded:: 1.1 :param components: Names of components from :attr:`Validator.COMPONENT_NAMES`. :returns: The validator instance. :rtype: Validator """ components = [c.lower() for c in components] for component in components: if component not in self.COMPONENT_NAMES: raise ValueError( '"{}" is not a valid component'.format(component) ) self.validated_components.update({ component: True for component in components }) return self def require_presence_of(self, *components): """Require the components provided. This can be specified repeatedly. .. versionadded:: 1.0 :param components: Names of components from :attr:`Validator.COMPONENT_NAMES`. :returns: The validator instance. :rtype: Validator """ components = [c.lower() for c in components] for component in components: if component not in self.COMPONENT_NAMES: raise ValueError( '"{}" is not a valid component'.format(component) ) self.required_components.update({ component: True for component in components }) return self def validate(self, uri): """Check a URI for conditions specified on this validator. .. versionadded:: 1.0 :param uri: Parsed URI to validate. :type uri: rfc3986.uri.URIReference :raises MissingComponentError: When a required component is missing. :raises UnpermittedComponentError: When a component is not one of those allowed. :raises PasswordForbidden: When a password is present in the userinfo component but is not permitted by configuration. :raises InvalidComponentsError: When a component was found to be invalid. """ if not self.allow_password: check_password(uri) required_components = [ component for component, required in self.required_components.items() if required ] validated_components = [ component for component, required in self.validated_components.items() if required ] if required_components: ensure_required_components_exist(uri, required_components) if validated_components: ensure_components_are_valid(uri, validated_components) ensure_one_of(self.allowed_schemes, uri, 'scheme') ensure_one_of(self.allowed_hosts, uri, 'host') ensure_one_of(self.allowed_ports, uri, 'port') def check_password(uri): """Assert that there is no password present in the uri.""" userinfo = uri.userinfo if not userinfo: return credentials = userinfo.split(':', 1) if len(credentials) <= 1: return raise exceptions.PasswordForbidden(uri) def ensure_one_of(allowed_values, uri, attribute): """Assert that the uri's attribute is one of the allowed values.""" value = getattr(uri, attribute) if value is not None and allowed_values and value not in allowed_values: raise exceptions.UnpermittedComponentError( attribute, value, allowed_values, ) def ensure_required_components_exist(uri, required_components): """Assert that all required components are present in the URI.""" missing_components = sorted([ component for component in required_components if getattr(uri, component) is None ]) if missing_components: raise exceptions.MissingComponentError(uri, *missing_components) def is_valid(value, matcher, require): """Determine if a value is valid based on the provided matcher. :param str value: Value to validate. :param matcher: Compiled regular expression to use to validate the value. :param require: Whether or not the value is required. """ if require: return (value is not None and matcher.match(value)) # require is False and value is not None return value is None or matcher.match(value) def authority_is_valid(authority, host=None, require=False): """Determine if the authority string is valid. :param str authority: The authority to validate. :param str host: (optional) The host portion of the authority to validate. :param bool require: (optional) Specify if authority must not be None. :returns: ``True`` if valid, ``False`` otherwise :rtype: bool """ validated = is_valid(authority, misc.SUBAUTHORITY_MATCHER, require) if validated and host is not None: return host_is_valid(host, require) return validated def host_is_valid(host, require=False): """Determine if the host string is valid. :param str host: The host to validate. :param bool require: (optional) Specify if host must not be None. :returns: ``True`` if valid, ``False`` otherwise :rtype: bool """ validated = is_valid(host, misc.HOST_MATCHER, require) if validated and host is not None and misc.IPv4_MATCHER.match(host): return valid_ipv4_host_address(host) elif validated and host is not None and misc.IPv6_MATCHER.match(host): return misc.IPv6_NO_RFC4007_MATCHER.match(host) is not None return validated def scheme_is_valid(scheme, require=False): """Determine if the scheme is valid. :param str scheme: The scheme string to validate. :param bool require: (optional) Set to ``True`` to require the presence of a scheme. :returns: ``True`` if the scheme is valid. ``False`` otherwise. :rtype: bool """ return is_valid(scheme, misc.SCHEME_MATCHER, require) def path_is_valid(path, require=False): """Determine if the path component is valid. :param str path: The path string to validate. :param bool require: (optional) Set to ``True`` to require the presence of a path. :returns: ``True`` if the path is valid. ``False`` otherwise. :rtype: bool """ return is_valid(path, misc.PATH_MATCHER, require) def query_is_valid(query, require=False): """Determine if the query component is valid. :param str query: The query string to validate. :param bool require: (optional) Set to ``True`` to require the presence of a query. :returns: ``True`` if the query is valid. ``False`` otherwise. :rtype: bool """ return is_valid(query, misc.QUERY_MATCHER, require) def fragment_is_valid(fragment, require=False): """Determine if the fragment component is valid. :param str fragment: The fragment string to validate. :param bool require: (optional) Set to ``True`` to require the presence of a fragment. :returns: ``True`` if the fragment is valid. ``False`` otherwise. :rtype: bool """ return is_valid(fragment, misc.FRAGMENT_MATCHER, require) def valid_ipv4_host_address(host): """Determine if the given host is a valid IPv4 address.""" # If the host exists, and it might be IPv4, check each byte in the # address. return all([0 <= int(byte, base=10) <= 255 for byte in host.split('.')]) _COMPONENT_VALIDATORS = { 'scheme': scheme_is_valid, 'path': path_is_valid, 'query': query_is_valid, 'fragment': fragment_is_valid, } _SUBAUTHORITY_VALIDATORS = set(['userinfo', 'host', 'port']) def subauthority_component_is_valid(uri, component): """Determine if the userinfo, host, and port are valid.""" try: subauthority_dict = uri.authority_info() except exceptions.InvalidAuthority: return False # If we can parse the authority into sub-components and we're not # validating the port, we can assume it's valid. if component == 'host': return host_is_valid(subauthority_dict['host']) elif component != 'port': return True try: port = int(subauthority_dict['port']) except TypeError: # If the port wasn't provided it'll be None and int(None) raises a # TypeError return True return (0 <= port <= 65535) def ensure_components_are_valid(uri, validated_components): """Assert that all components are valid in the URI.""" invalid_components = set([]) for component in validated_components: if component in _SUBAUTHORITY_VALIDATORS: if not subauthority_component_is_valid(uri, component): invalid_components.add(component) # Python's peephole optimizer means that while this continue *is* # actually executed, coverage.py cannot detect that. See also, # https://bitbucket.org/ned/coveragepy/issues/198/continue-marked-as-not-covered continue # nocov: Python 2.7, 3.3, 3.4 validator = _COMPONENT_VALIDATORS[component] if not validator(getattr(uri, component)): invalid_components.add(component) if invalid_components: raise exceptions.InvalidComponentsError(uri, *invalid_components) rfc3986-1.3.2/src/rfc3986.egg-info/0000775000327200032720000000000013466312017017401 5ustar slarsonslarson00000000000000rfc3986-1.3.2/src/rfc3986.egg-info/PKG-INFO0000664000327200032720000001760313466312017020505 0ustar slarsonslarson00000000000000Metadata-Version: 2.1 Name: rfc3986 Version: 1.3.2 Summary: Validating URI References per RFC 3986 Home-page: http://rfc3986.readthedocs.io Author: Ian Stapleton Cordasco Author-email: graffatcolmingov@gmail.com License: Apache 2.0 Description: rfc3986 ======= A Python implementation of `RFC 3986`_ including validation and authority parsing. Installation ------------ Use pip to install ``rfc3986`` like so:: pip install rfc3986 License ------- `Apache License Version 2.0`_ Example Usage ------------- The following are the two most common use cases envisioned for ``rfc3986``. Replacing ``urlparse`` `````````````````````` To parse a URI and receive something very similar to the standard library's ``urllib.parse.urlparse`` .. code-block:: python from rfc3986 import urlparse ssh = urlparse('ssh://user@git.openstack.org:29418/openstack/glance.git') print(ssh.scheme) # => ssh print(ssh.userinfo) # => user print(ssh.params) # => None print(ssh.port) # => 29418 To create a copy of it with new pieces you can use ``copy_with``: .. code-block:: python new_ssh = ssh.copy_with( scheme='https' userinfo='', port=443, path='/openstack/glance' ) print(new_ssh.scheme) # => https print(new_ssh.userinfo) # => None # etc. Strictly Parsing a URI and Applying Validation `````````````````````````````````````````````` To parse a URI into a convenient named tuple, you can simply: .. code-block:: python from rfc3986 import uri_reference example = uri_reference('http://example.com') email = uri_reference('mailto:user@domain.com') ssh = uri_reference('ssh://user@git.openstack.org:29418/openstack/keystone.git') With a parsed URI you can access data about the components: .. code-block:: python print(example.scheme) # => http print(email.path) # => user@domain.com print(ssh.userinfo) # => user print(ssh.host) # => git.openstack.org print(ssh.port) # => 29418 It can also parse URIs with unicode present: .. code-block:: python uni = uri_reference(b'http://httpbin.org/get?utf8=\xe2\x98\x83') # ☃ print(uni.query) # utf8=%E2%98%83 With a parsed URI you can also validate it: .. code-block:: python if ssh.is_valid(): subprocess.call(['git', 'clone', ssh.unsplit()]) You can also take a parsed URI and normalize it: .. code-block:: python mangled = uri_reference('hTTp://exAMPLe.COM') print(mangled.scheme) # => hTTp print(mangled.authority) # => exAMPLe.COM normal = mangled.normalize() print(normal.scheme) # => http print(mangled.authority) # => example.com But these two URIs are (functionally) equivalent: .. code-block:: python if normal == mangled: webbrowser.open(normal.unsplit()) Your paths, queries, and fragments are safe with us though: .. code-block:: python mangled = uri_reference('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth') normal = mangled.normalize() assert normal == 'hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth' assert normal == 'http://example.com/Some/reallY/biZZare/pAth' assert normal != 'http://example.com/some/really/bizzare/path' If you do not actually need a real reference object and just want to normalize your URI: .. code-block:: python from rfc3986 import normalize_uri assert (normalize_uri('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth') == 'http://example.com/Some/reallY/biZZare/pAth') You can also very simply validate a URI: .. code-block:: python from rfc3986 import is_valid_uri assert is_valid_uri('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth') Requiring Components ~~~~~~~~~~~~~~~~~~~~ You can validate that a particular string is a valid URI and require independent components: .. code-block:: python from rfc3986 import is_valid_uri assert is_valid_uri('http://localhost:8774/v2/resource', require_scheme=True, require_authority=True, require_path=True) # Assert that a mailto URI is invalid if you require an authority # component assert is_valid_uri('mailto:user@example.com', require_authority=True) is False If you have an instance of a ``URIReference``, you can pass the same arguments to ``URIReference#is_valid``, e.g., .. code-block:: python from rfc3986 import uri_reference http = uri_reference('http://localhost:8774/v2/resource') assert uri.is_valid(require_scheme=True, require_authority=True, require_path=True) # Assert that a mailto URI is invalid if you require an authority # component mailto = uri_reference('mailto:user@example.com') assert uri.is_valid(require_authority=True) is False Alternatives ------------ - `rfc3987 `_ This is a direct competitor to this library, with extra features, licensed under the GPL. - `uritools `_ This can parse URIs in the manner of RFC 3986 but provides no validation and only recently added Python 3 support. - Standard library's `urlparse`/`urllib.parse` The functions in these libraries can only split a URI (valid or not) and provide no validation. Contributing ------------ This project follows and enforces the Python Software Foundation's `Code of Conduct `_. If you would like to contribute but do not have a bug or feature in mind, feel free to email Ian and find out how you can help. The git repository for this project is maintained at https://github.com/python-hyper/rfc3986 .. _RFC 3986: http://tools.ietf.org/html/rfc3986 .. _Apache License Version 2.0: https://www.apache.org/licenses/LICENSE-2.0 Platform: UNKNOWN Classifier: Development Status :: 5 - Production/Stable Classifier: Intended Audience :: Developers Classifier: Natural Language :: English Classifier: License :: OSI Approved :: Apache Software License Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 2.7 Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3.4 Classifier: Programming Language :: Python :: 3.5 Classifier: Programming Language :: Python :: 3.6 Classifier: Programming Language :: Python :: 3.7 Provides-Extra: idna2008 rfc3986-1.3.2/src/rfc3986.egg-info/SOURCES.txt0000664000327200032720000000331613466312017021270 0ustar slarsonslarson00000000000000AUTHORS.rst LICENSE MANIFEST.in README.rst setup.cfg setup.py docs/source/conf.py docs/source/index.rst docs/source/narrative.rst docs/source/api-ref/api.rst docs/source/api-ref/builder.rst docs/source/api-ref/index.rst docs/source/api-ref/iri.rst docs/source/api-ref/miscellaneous.rst docs/source/api-ref/uri.rst docs/source/api-ref/validators.rst docs/source/release-notes/0.1.0.rst docs/source/release-notes/0.2.0.rst docs/source/release-notes/0.2.1.rst docs/source/release-notes/0.2.2.rst docs/source/release-notes/0.3.0.rst docs/source/release-notes/0.3.1.rst docs/source/release-notes/0.4.0.rst docs/source/release-notes/0.4.1.rst docs/source/release-notes/0.4.2.rst docs/source/release-notes/1.0.0.rst docs/source/release-notes/1.1.0.rst docs/source/release-notes/1.2.0.rst docs/source/release-notes/1.3.0.rst docs/source/release-notes/1.3.1.rst docs/source/release-notes/1.3.2.rst docs/source/release-notes/index.rst docs/source/user/building.rst docs/source/user/parsing.rst docs/source/user/validating.rst src/rfc3986/__init__.py src/rfc3986/_mixin.py src/rfc3986/abnf_regexp.py src/rfc3986/api.py src/rfc3986/builder.py src/rfc3986/compat.py src/rfc3986/exceptions.py src/rfc3986/iri.py src/rfc3986/misc.py src/rfc3986/normalizers.py src/rfc3986/parseresult.py src/rfc3986/uri.py src/rfc3986/validators.py src/rfc3986.egg-info/PKG-INFO src/rfc3986.egg-info/SOURCES.txt src/rfc3986.egg-info/dependency_links.txt src/rfc3986.egg-info/requires.txt src/rfc3986.egg-info/top_level.txt tests/__init__.py tests/base.py tests/conftest.py tests/test_api.py tests/test_builder.py tests/test_iri.py tests/test_misc.py tests/test_normalizers.py tests/test_parseresult.py tests/test_unicode_support.py tests/test_uri.py tests/test_validators.pyrfc3986-1.3.2/src/rfc3986.egg-info/dependency_links.txt0000664000327200032720000000000113466312017023447 0ustar slarsonslarson00000000000000 rfc3986-1.3.2/src/rfc3986.egg-info/requires.txt0000664000327200032720000000002113466312017021772 0ustar slarsonslarson00000000000000 [idna2008] idna rfc3986-1.3.2/src/rfc3986.egg-info/top_level.txt0000664000327200032720000000001013466312017022122 0ustar slarsonslarson00000000000000rfc3986 rfc3986-1.3.2/tests/0000775000327200032720000000000013466312017015156 5ustar slarsonslarson00000000000000rfc3986-1.3.2/tests/__init__.py0000664000327200032720000000000013466311640017256 0ustar slarsonslarson00000000000000rfc3986-1.3.2/tests/base.py0000664000327200032720000001553413466311640016453 0ustar slarsonslarson00000000000000# -*- coding: utf-8 -*- # Copyright (c) 2015 Ian Stapleton Cordasco # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or # implied. # See the License for the specific language governing permissions and # limitations under the License. class BaseTestParsesURIs: test_class = None """Tests for self.test_class handling of URIs.""" def test_handles_basic_uri(self, basic_uri): """Test that self.test_class can handle a simple URI.""" uri = self.test_class.from_string(basic_uri) assert uri.scheme == 'http' assert uri.authority == basic_uri[7:] # len('http://') assert uri.host == uri.authority assert uri.path is None assert uri.query is None assert uri.fragment is None assert uri.port is None assert uri.userinfo is None def test_handles_basic_uri_with_port(self, basic_uri_with_port): """Test that self.test_class can handle a simple URI with a port.""" uri = self.test_class.from_string(basic_uri_with_port) assert uri.scheme == 'ftp' assert uri.authority == basic_uri_with_port[6:] assert uri.host != uri.authority assert str(uri.port) == '21' assert uri.path is None assert uri.query is None assert uri.fragment is None assert uri.userinfo is None def test_handles_uri_with_port_and_userinfo( self, uri_with_port_and_userinfo): """ Test that self.test_class can handle a URI with a port and userinfo. """ uri = self.test_class.from_string(uri_with_port_and_userinfo) assert uri.scheme == 'ssh' # 6 == len('ftp://') assert uri.authority == uri_with_port_and_userinfo[6:] assert uri.host != uri.authority assert str(uri.port) == '22' assert uri.path is None assert uri.query is None assert uri.fragment is None assert uri.userinfo == 'user:pass' def test_handles_tricky_userinfo( self, uri_with_port_and_tricky_userinfo): """ Test that self.test_class can handle a URI with unusual (non a-z) chars in userinfo. """ uri = self.test_class.from_string(uri_with_port_and_tricky_userinfo) assert uri.scheme == 'ssh' # 6 == len('ftp://') assert uri.authority == uri_with_port_and_tricky_userinfo[6:] assert uri.host != uri.authority assert str(uri.port) == '22' assert uri.path is None assert uri.query is None assert uri.fragment is None assert uri.userinfo == 'user%20!=:pass' def test_handles_basic_uri_with_path(self, basic_uri_with_path): """Test that self.test_class can handle a URI with a path.""" uri = self.test_class.from_string(basic_uri_with_path) assert uri.scheme == 'http' assert basic_uri_with_path == (uri.scheme + '://' + uri.authority + uri.path) assert uri.host == uri.authority assert uri.path == '/path/to/resource' assert uri.query is None assert uri.fragment is None assert uri.userinfo is None assert uri.port is None def test_handles_uri_with_path_and_query(self, uri_with_path_and_query): """ Test that self.test_class can handle a URI with a path and query. """ uri = self.test_class.from_string(uri_with_path_and_query) assert uri.scheme == 'http' assert uri.host == uri.authority assert uri.path == '/path/to/resource' assert uri.query == 'key=value' assert uri.fragment is None assert uri.userinfo is None assert uri.port is None def test_handles_uri_with_everything(self, uri_with_everything): """ Test that self.test_class can handle and with everything in it. """ uri = self.test_class.from_string(uri_with_everything) assert uri.scheme == 'https' assert uri.path == '/path/to/resource' assert uri.query == 'key=value' assert uri.fragment == 'fragment' assert uri.userinfo == 'user:pass' assert str(uri.port) == '443' def test_handles_relative_uri(self, relative_uri): """Test that self.test_class can handle a relative URI.""" uri = self.test_class.from_string(relative_uri) assert uri.scheme is None assert uri.authority == relative_uri[2:] def test_handles_percent_in_path(self, uri_path_with_percent): """Test that self.test_class encodes the % character properly.""" uri = self.test_class.from_string(uri_path_with_percent) print(uri.path) assert uri.path == '/%25%20' def test_handles_percent_in_query(self, uri_query_with_percent): uri = self.test_class.from_string(uri_query_with_percent) assert uri.query == 'a=%25' def test_handles_percent_in_fragment(self, uri_fragment_with_percent): uri = self.test_class.from_string(uri_fragment_with_percent) assert uri.fragment == 'perc%25ent' class BaseTestUnsplits: test_class = None def test_basic_uri_unsplits(self, basic_uri): uri = self.test_class.from_string(basic_uri) assert uri.unsplit() == basic_uri def test_basic_uri_with_port_unsplits(self, basic_uri_with_port): uri = self.test_class.from_string(basic_uri_with_port) assert uri.unsplit() == basic_uri_with_port def test_uri_with_port_and_userinfo_unsplits(self, uri_with_port_and_userinfo): uri = self.test_class.from_string(uri_with_port_and_userinfo) assert uri.unsplit() == uri_with_port_and_userinfo def test_basic_uri_with_path_unsplits(self, basic_uri_with_path): uri = self.test_class.from_string(basic_uri_with_path) assert uri.unsplit() == basic_uri_with_path def test_uri_with_path_and_query_unsplits(self, uri_with_path_and_query): uri = self.test_class.from_string(uri_with_path_and_query) assert uri.unsplit() == uri_with_path_and_query def test_uri_with_everything_unsplits(self, uri_with_everything): uri = self.test_class.from_string(uri_with_everything) assert uri.unsplit() == uri_with_everything def test_relative_uri_unsplits(self, relative_uri): uri = self.test_class.from_string(relative_uri) assert uri.unsplit() == relative_uri def test_absolute_path_uri_unsplits(self, absolute_path_uri): uri = self.test_class.from_string(absolute_path_uri) assert uri.unsplit() == absolute_path_uri rfc3986-1.3.2/tests/conftest.py0000664000327200032720000000622613466311640017364 0ustar slarsonslarson00000000000000# -*- coding: utf-8 -*- import itertools import sys import pytest SNOWMAN = b'\xe2\x98\x83' valid_hosts = [ '[21DA:00D3:0000:2F3B:02AA:00FF:FE28:9C5A]', '[::1]', '[::1%25lo]', # With ZoneID '[FF02:0:0:0:0:0:0:2%25en01]', # With ZoneID '[FF02:30:0:0:0:0:0:5%25en1]', # With ZoneID '[FF02:30:0:0:0:0:0:5%25%26]', # With ZoneID '[FF02:30:0:0:0:0:0:5%2525]', # With ZoneID '[21DA:D3:0:2F3B:2AA:FF:FE28:9C5A]', '[FE80::2AA:FF:FE9A:4CA2]', '[FF02::2]', '[FFFF::]', '[FF02:3::5]', '[FF02:0:0:0:0:0:0:2]', '[FF02:30:0:0:0:0:0:5]', '127.0.0.1', 'www.example.com', 'localhost', 'http-bin.org', '%2Fvar%2Frun%2Fsocket', '6g9m8V6', # Issue #48 ] invalid_hosts = [ '[FF02::3::5]', # IPv6 can only have one :: '[FADF:01]', # Not properly compacted (missing a :) '[FADF:01%en0]', # Not properly compacted (missing a :), Invalid ZoneID '[FADF::01%]', # Empty Zone ID 'localhost:80:80:80', # Too many ports '256.256.256.256', # Invalid IPv4 Address SNOWMAN.decode('utf-8') ] equivalent_hostnames = [ 'example.com', 'eXample.com', 'example.COM', 'EXAMPLE.com', 'ExAMPLE.com', 'eXample.COM', 'example.COM', 'EXAMPLE.COM', 'ExAMPLE.COM', ] equivalent_schemes = [ 'https', 'HTTPS', 'HttPs', 'hTTpS', 'HtTpS', ] equivalent_schemes_and_hostnames = list(itertools.product( equivalent_schemes, equivalent_hostnames, )) @pytest.fixture(params=valid_hosts) def basic_uri(request): return 'http://%s' % request.param @pytest.fixture(params=equivalent_schemes_and_hostnames) def uri_to_normalize(request): return '%s://%s' % request.param @pytest.fixture(params=valid_hosts) def basic_uri_with_port(request): return 'ftp://%s:21' % request.param @pytest.fixture(params=valid_hosts) def uri_with_port_and_userinfo(request): return 'ssh://user:pass@%s:22' % request.param @pytest.fixture(params=valid_hosts) def uri_with_port_and_tricky_userinfo(request): return 'ssh://%s@%s:22' % ('user%20!=:pass', request.param) @pytest.fixture(params=valid_hosts) def basic_uri_with_path(request): return 'http://%s/path/to/resource' % request.param @pytest.fixture(params=valid_hosts) def uri_with_path_and_query(request): return 'http://%s/path/to/resource?key=value' % request.param @pytest.fixture(params=valid_hosts) def uri_with_everything(request): return 'https://user:pass@%s:443/path/to/resource?key=value#fragment' % ( request.param) @pytest.fixture(params=valid_hosts) def relative_uri(request): return '//%s' % request.param @pytest.fixture def absolute_path_uri(): return '/path/to/file' @pytest.fixture(params=invalid_hosts) def invalid_uri(request): return 'https://%s' % request.param @pytest.fixture(params=valid_hosts) def uri_path_with_percent(request): return 'https://%s/%% ' % request.param @pytest.fixture(params=valid_hosts) def uri_query_with_percent(request): return 'https://%s?a=%%' % request.param @pytest.fixture(params=valid_hosts) def uri_fragment_with_percent(request): return 'https://%s#perc%%ent' % request.param sys.path.insert(0, '.') rfc3986-1.3.2/tests/test_api.py0000664000327200032720000000061613466311640017344 0ustar slarsonslarson00000000000000# -*- coding: utf-8 -*- from rfc3986.api import ( uri_reference, is_valid_uri, normalize_uri, URIReference ) def test_uri_reference(): assert isinstance(uri_reference('http://example.com'), URIReference) def test_is_valid_uri(): assert is_valid_uri('http://example.com') is True def test_normalize_uri(): assert normalize_uri('HTTP://EXAMPLE.COM') == 'http://example.com' rfc3986-1.3.2/tests/test_builder.py0000664000327200032720000001174713466311640020230 0ustar slarsonslarson00000000000000# -*- coding: utf-8 -*- # Copyright (c) 2017 Ian Stapleton Cordasco # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or # implied. # See the License for the specific language governing permissions and # limitations under the License. """Module containing the tests for the URIBuilder object.""" import pytest from rfc3986 import builder def test_builder_default(): """Verify the default values.""" uribuilder = builder.URIBuilder() assert uribuilder.scheme is None assert uribuilder.userinfo is None assert uribuilder.host is None assert uribuilder.port is None assert uribuilder.path is None assert uribuilder.query is None assert uribuilder.fragment is None def test_repr(): """Verify our repr looks like our class.""" uribuilder = builder.URIBuilder() assert repr(uribuilder).startswith('URIBuilder(scheme=None') @pytest.mark.parametrize('scheme', [ 'https', 'hTTps', 'Https', 'HtTpS', 'HTTPS', ]) def test_add_scheme(scheme): """Verify schemes are normalized when added.""" uribuilder = builder.URIBuilder().add_scheme(scheme) assert uribuilder.scheme == 'https' @pytest.mark.parametrize('username, password, userinfo', [ ('user', 'pass', 'user:pass'), ('user', None, 'user'), ('user@domain.com', 'password', 'user%40domain.com:password'), ('user', 'pass:word', 'user:pass%3Aword'), ]) def test_add_credentials(username, password, userinfo): """Verify we normalize usernames and passwords.""" uribuilder = builder.URIBuilder().add_credentials(username, password) assert uribuilder.userinfo == userinfo def test_add_credentials_requires_username(): """Verify one needs a username to add credentials.""" with pytest.raises(ValueError): builder.URIBuilder().add_credentials(None, None) @pytest.mark.parametrize( ['hostname', 'expected_hostname'], [ ('google.com', 'google.com'), ('GOOGLE.COM', 'google.com'), ('gOOgLe.COM', 'google.com'), ('goOgLE.com', 'google.com'), ('[::ff%etH0]', '[::ff%25etH0]'), ('[::ff%25etH0]', '[::ff%25etH0]'), ('[::FF%etH0]', '[::ff%25etH0]'), ] ) def test_add_host(hostname, expected_hostname): """Verify we normalize hostnames in add_host.""" uribuilder = builder.URIBuilder().add_host(hostname) assert uribuilder.host == expected_hostname @pytest.mark.parametrize('port', [ -100, '-100', -1, '-1', 65536, '65536', 1000000, '1000000', '', 'abc', '0b10', ]) def test_add_invalid_port(port): """Verify we raise a ValueError for invalid ports.""" with pytest.raises(ValueError): builder.URIBuilder().add_port(port) @pytest.mark.parametrize('port, expected', [ (0, '0'), ('0', '0'), (1, '1'), ('1', '1'), (22, '22'), ('22', '22'), (80, '80'), ('80', '80'), (443, '443'), ('443', '443'), (65535, '65535'), ('65535', '65535'), ]) def test_add_port(port, expected): """Verify we normalize our port.""" uribuilder = builder.URIBuilder().add_port(port) assert uribuilder.port == expected @pytest.mark.parametrize('path', [ 'sigmavirus24/rfc3986', '/sigmavirus24/rfc3986', ]) def test_add_path(path): """Verify we normalize our path value.""" uribuilder = builder.URIBuilder().add_path(path) assert uribuilder.path == '/sigmavirus24/rfc3986' @pytest.mark.parametrize('query_items, expected', [ ({'a': 'b c'}, 'a=b+c'), ({'a': 'b+c'}, 'a=b%2Bc'), ([('a', 'b c')], 'a=b+c'), ([('a', 'b+c')], 'a=b%2Bc'), ([('a', 'b'), ('c', 'd')], 'a=b&c=d'), ([('a', 'b'), ('username', '@d')], 'a=b&username=%40d'), ([('percent', '%')], 'percent=%25'), ]) def test_add_query_from(query_items, expected): """Verify the behaviour of add_query_from.""" uribuilder = builder.URIBuilder().add_query_from(query_items) assert uribuilder.query == expected def test_add_query(): """Verify we do not modify the provided query string.""" uribuilder = builder.URIBuilder().add_query('username=@foo') assert uribuilder.query == 'username=@foo' def test_add_fragment(): """Verify our handling of fragments.""" uribuilder = builder.URIBuilder().add_fragment('section-2.5.1') assert uribuilder.fragment == 'section-2.5.1' def test_finalize(): """Verify the whole thing.""" uri = builder.URIBuilder().add_scheme('https').add_credentials( 'sigmavirus24', 'not-my-re@l-password' ).add_host('github.com').add_path('sigmavirus24/rfc3986').finalize( ).unsplit() expected = ('https://sigmavirus24:not-my-re%40l-password@github.com/' 'sigmavirus24/rfc3986') assert expected == uri rfc3986-1.3.2/tests/test_iri.py0000664000327200032720000000400713466311640017354 0ustar slarsonslarson00000000000000# coding: utf-8 import pytest import rfc3986 import sys from rfc3986.exceptions import InvalidAuthority try: import idna except ImportError: idna = None requires_idna = pytest.mark.skipif(idna is None, reason="This test requires the 'idna' module") iri_to_uri = pytest.mark.parametrize( ["iri", "uri"], [ (u'http://Bücher.de', u'http://xn--bcher-kva.de'), (u'http://faß.de', u'http://xn--fa-hia.de'), (u'http://βόλος.com/β/ό?λ#ος', u'http://xn--nxasmm1c.com/%CE%B2/%CF%8C?%CE%BB#%CE%BF%CF%82'), (u'http://ශ්\u200dරී.com', u'http://xn--10cl1a0b660p.com'), (u'http://نامه\u200cای.com', u'http://xn--mgba3gch31f060k.com'), (u'http://Bü:ẞ@gOoGle.com', u'http://B%C3%BC:%E1%BA%9E@gOoGle.com'), (u'http://ẞ.com:443', u'http://xn--zca.com:443'), (u'http://ẞ.foo.com', u'http://xn--zca.foo.com'), (u'http://Bẞ.com', u'http://xn--b-qfa.com'), (u'http+unix://%2Ftmp%2FTEST.sock/get', 'http+unix://%2Ftmp%2FTEST.sock/get'), ] ) @requires_idna @iri_to_uri def test_encode_iri(iri, uri): assert rfc3986.iri_reference(iri).encode().unsplit() == uri @iri_to_uri def test_iri_equality(iri, uri): assert rfc3986.iri_reference(iri) == iri def test_iri_equality_special_cases(): assert rfc3986.iri_reference(u"http://Bü:ẞ@βόλος.com/β/ό?λ#ος") == \ (u"http", u"Bü:ẞ@βόλος.com", u"/%CE%B2/%CF%8C", u"%CE%BB", u"%CE%BF%CF%82") with pytest.raises(TypeError): rfc3986.iri_reference(u"http://ẞ.com") == 1 @requires_idna @pytest.mark.parametrize("iri", [ u'http://♥.net', u'http://\u0378.net', pytest.param( u'http://㛼.com', marks=pytest.mark.skipif( sys.version_info < (3, 3) and sys.maxunicode <= 0xFFFF, reason="Python configured without UCS-4 support" ) ), ]) def test_encode_invalid_iri(iri): iri_ref = rfc3986.iri_reference(iri) with pytest.raises(InvalidAuthority): iri_ref.encode() rfc3986-1.3.2/tests/test_misc.py0000664000327200032720000000332513466311640017526 0ustar slarsonslarson00000000000000# -*- coding: utf-8 -*- from rfc3986.uri import URIReference from rfc3986.misc import merge_paths def test_merge_paths_with_base_path_without_base_authority(): """Demonstrate merging with a base URI without an authority.""" base = URIReference(scheme=None, authority=None, path='/foo/bar/bogus', query=None, fragment=None) expected = '/foo/bar/relative' assert merge_paths(base, 'relative') == expected def test_merge_paths_with_base_authority_and_path(): """Demonstrate merging with a base URI with an authority and path.""" base = URIReference(scheme=None, authority='authority', path='/foo/bar/bogus', query=None, fragment=None) expected = '/foo/bar/relative' assert merge_paths(base, 'relative') == expected def test_merge_paths_without_base_authority_or_path(): """Demonstrate merging with a base URI without an authority or path.""" base = URIReference(scheme=None, authority=None, path=None, query=None, fragment=None) expected = '/relative' assert merge_paths(base, 'relative') == expected def test_merge_paths_with_base_authority_without_path(): """Demonstrate merging with a base URI without an authority or path.""" base = URIReference(scheme=None, authority='authority', path=None, query=None, fragment=None) expected = '/relative' assert merge_paths(base, 'relative') == expected rfc3986-1.3.2/tests/test_normalizers.py0000664000327200032720000000650313466311640021141 0ustar slarsonslarson00000000000000# -*- coding: utf-8 -*- import pytest from rfc3986.uri import URIReference from rfc3986.normalizers import ( normalize_scheme, normalize_percent_characters, remove_dot_segments, encode_component, normalize_host ) def test_normalize_scheme(): assert 'http' == normalize_scheme('htTp') assert 'http' == normalize_scheme('http') assert 'http' == normalize_scheme('HTTP') def test_normalize_percent_characters(): expected = '%3Athis_should_be_lowercase%DF%AB%4C' assert expected == normalize_percent_characters( '%3athis_should_be_lowercase%DF%ab%4c') assert expected == normalize_percent_characters( '%3Athis_should_be_lowercase%DF%AB%4C') assert expected == normalize_percent_characters( '%3Athis_should_be_lowercase%DF%aB%4C') paths = [ # (Input, expected output) ('/foo/bar/.', '/foo/bar/'), ('/foo/bar/', '/foo/bar/'), ('/foo/bar', '/foo/bar'), ('./foo/bar', 'foo/bar'), ('/./foo/bar', '/foo/bar'), ('/foo%20bar/biz%2Abaz', '/foo%20bar/biz%2Abaz'), ('../foo/bar', 'foo/bar'), ('/../foo/bar', '/foo/bar'), ('a/./b/../b/%63/%7Bfoo%7D', 'a/b/%63/%7Bfoo%7D'), ('//a/./b/../b/%63/%7Bfoo%7D', '//a/b/%63/%7Bfoo%7D'), ('mid/content=5/../6', 'mid/6'), ('/a/b/c/./../../g', '/a/g'), ] @pytest.fixture(params=paths) def path_fixture(request): return request.param @pytest.fixture(params=paths) def uris(request): to_norm, normalized = request.param return (URIReference(None, None, to_norm, None, None), URIReference(None, None, normalized, None, None)) def test_remove_dot_segments(path_fixture): to_normalize, expected = path_fixture assert expected == remove_dot_segments(to_normalize) def test_normalized_equality(uris): assert uris[0] == uris[1] def test_hostname_normalization(): assert (URIReference(None, 'EXAMPLE.COM', None, None, None) == URIReference(None, 'example.com', None, None, None)) @pytest.mark.parametrize( ['authority', 'expected_authority'], [ ('user%2aName@EXAMPLE.COM', 'user%2AName@example.com'), ('[::1%eth0]', '[::1%25eth0]') ] ) def test_authority_normalization(authority, expected_authority): uri = URIReference( None, authority, None, None, None).normalize() assert uri.authority == expected_authority def test_fragment_normalization(): uri = URIReference( None, 'example.com', None, None, 'fiz%DF').normalize() assert uri.fragment == 'fiz%DF' @pytest.mark.parametrize( ["component", "encoded_component"], [ ('/%', '/%25'), ('/%a', '/%25a'), ('/%ag', '/%25ag'), ('/%af', '/%af'), ('/%20/%', '/%2520/%25'), ('/%20%25', '/%20%25'), ('/%21%22%23%ah%12%ff', '/%2521%2522%2523%25ah%2512%25ff'), ] ) def test_detect_percent_encoded_component(component, encoded_component): assert encode_component(component, 'utf-8') == encoded_component @pytest.mark.parametrize( ["host", "normalized_host"], [ ('LOCALHOST', 'localhost'), ('[::1%eth0]', '[::1%25eth0]'), ('[::1%25]', '[::1%2525]'), ('[::1%%25]', '[::1%25%25]'), ('[::1%25%25]', '[::1%25%25]'), ('[::Af%Ff]', '[::af%25Ff]'), ('[::Af%%Ff]', '[::af%25%Ff]'), ('[::Af%25Ff]', '[::af%25Ff]'), ] ) def test_normalize_host(host, normalized_host): assert normalize_host(host) == normalized_host rfc3986-1.3.2/tests/test_parseresult.py0000664000327200032720000001401413466311640021141 0ustar slarsonslarson00000000000000# -*- coding: utf-8 -*- # Copyright (c) 2015 Ian Stapleton Cordasco # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or # implied. # See the License for the specific language governing permissions and # limitations under the License. import rfc3986 from rfc3986 import exceptions from rfc3986 import parseresult as pr import pytest from . import base INVALID_PORTS = ['443:80', '443:80:443', 'abcdef', 'port', '43port'] SNOWMAN = b'\xe2\x98\x83' SNOWMAN_IDNA_HOST = 'http://xn--n3h.com' @pytest.mark.parametrize('port', INVALID_PORTS) def test_port_parsing(port): with pytest.raises(exceptions.InvalidPort): rfc3986.urlparse('https://httpbin.org:{0}/get'.format(port)) @pytest.mark.parametrize('parts, unsplit', [ (('https', None, 'httpbin.org'), u'https://httpbin.org'), (('https', 'user', 'httpbin.org'), u'https://user@httpbin.org'), (('https', None, 'httpbin.org', 443, '/get'), u'https://httpbin.org:443/get'), (('HTTPS', None, 'HTTPBIN.ORG'), u'https://httpbin.org'), ]) def test_from_parts(parts, unsplit): uri = pr.ParseResult.from_parts(*parts) assert uri.unsplit() == unsplit @pytest.mark.parametrize('parts, unsplit', [ (('https', None, 'httpbin.org'), b'https://httpbin.org'), (('https', 'user', 'httpbin.org'), b'https://user@httpbin.org'), (('https', None, 'httpbin.org', 443, '/get'), b'https://httpbin.org:443/get'), (('HTTPS', None, 'HTTPBIN.ORG'), b'https://httpbin.org'), ]) def test_bytes_from_parts(parts, unsplit): uri = pr.ParseResultBytes.from_parts(*parts) assert uri.unsplit() == unsplit class TestParseResultParsesURIs(base.BaseTestParsesURIs): test_class = pr.ParseResult class TestParseResultUnsplits(base.BaseTestUnsplits): test_class = pr.ParseResult def test_normalizes_uris_when_using_from_string(uri_to_normalize): """Verify we always get the same thing out as we expect.""" result = pr.ParseResult.from_string(uri_to_normalize, lazy_normalize=False) assert result.scheme == 'https' assert result.host == 'example.com' class TestStdlibShims: def test_uri_with_everything(self, uri_with_everything): uri = pr.ParseResult.from_string(uri_with_everything) assert uri.host == uri.hostname assert uri.netloc == uri.authority assert uri.query == uri.params assert uri.geturl() == uri.unsplit() def test_creates_a_copy_with_a_new_path(uri_with_everything): uri = pr.ParseResult.from_string(uri_with_everything) new_uri = uri.copy_with(path='/parse/result/tests/are/fun') assert new_uri.path == '/parse/result/tests/are/fun' def test_creates_a_copy_with_a_new_port(basic_uri): uri = pr.ParseResult.from_string(basic_uri) new_uri = uri.copy_with(port=443) assert new_uri.port == 443 def test_parse_result_encodes_itself(uri_with_everything): uri = pr.ParseResult.from_string(uri_with_everything) uribytes = uri.encode() encoding = uri.encoding assert uri.scheme.encode(encoding) == uribytes.scheme assert uri.userinfo.encode(encoding) == uribytes.userinfo assert uri.host.encode(encoding) == uribytes.host assert uri.port == uribytes.port assert uri.path.encode(encoding) == uribytes.path assert uri.query.encode(encoding) == uribytes.query assert uri.fragment.encode(encoding) == uribytes.fragment class TestParseResultBytes: def test_handles_uri_with_everything(self, uri_with_everything): uri = pr.ParseResultBytes.from_string(uri_with_everything) assert uri.scheme == b'https' assert uri.path == b'/path/to/resource' assert uri.query == b'key=value' assert uri.fragment == b'fragment' assert uri.userinfo == b'user:pass' assert uri.port == 443 assert isinstance(uri.authority, bytes) is True def test_raises_invalid_authority_for_invalid_uris(self, invalid_uri): with pytest.raises(exceptions.InvalidAuthority): pr.ParseResultBytes.from_string(invalid_uri) @pytest.mark.parametrize('port', INVALID_PORTS) def test_raises_invalid_port_non_strict_parse(self, port): with pytest.raises(exceptions.InvalidPort): pr.ParseResultBytes.from_string( 'https://httpbin.org:{0}/get'.format(port), strict=False ) def test_copy_with_a_new_path(self, uri_with_everything): uri = pr.ParseResultBytes.from_string(uri_with_everything) new_uri = uri.copy_with(path=b'/parse/result/tests/are/fun') assert new_uri.path == b'/parse/result/tests/are/fun' def test_copy_with_a_new_unicode_path(self, uri_with_everything): uri = pr.ParseResultBytes.from_string(uri_with_everything) pathbytes = b'/parse/result/tests/are/fun' + SNOWMAN new_uri = uri.copy_with(path=pathbytes.decode('utf-8')) assert new_uri.path == (b'/parse/result/tests/are/fun' + SNOWMAN) def test_unsplit(self): uri = pr.ParseResultBytes.from_string( b'http://' + SNOWMAN + b'.com/path', strict=False ) idna_encoded = SNOWMAN_IDNA_HOST.encode('utf-8') + b'/path' assert uri.unsplit(use_idna=True) == idna_encoded def test_eager_normalization_from_string(self): uri = pr.ParseResultBytes.from_string( b'http://' + SNOWMAN + b'.com/path', strict=False, lazy_normalize=False, ) assert uri.unsplit() == b'http:/path' def test_eager_normalization_from_parts(self): uri = pr.ParseResultBytes.from_parts( scheme='http', host=SNOWMAN.decode('utf-8'), path='/path', lazy_normalize=False, ) assert uri.unsplit() == b'http:/path' rfc3986-1.3.2/tests/test_unicode_support.py0000664000327200032720000000353613466311640022021 0ustar slarsonslarson00000000000000# -*- coding: utf-8 -*- import pytest from rfc3986 import exceptions from rfc3986 import parseresult from rfc3986 import uri_reference from rfc3986 import urlparse SNOWMAN = b'\xe2\x98\x83' SNOWMAN_PARAMS = b'http://example.com?utf8=' + SNOWMAN SNOWMAN_HOST = b'http://' + SNOWMAN + b'.com' SNOWMAN_IDNA_HOST = 'http://xn--n3h.com' def test_unicode_uri(): url_bytestring = SNOWMAN_PARAMS unicode_url = url_bytestring.decode('utf-8') uri = uri_reference(unicode_url) assert uri.is_valid() is True assert uri == 'http://example.com?utf8=%E2%98%83' def test_unicode_uri_passed_as_bytes(): url_bytestring = SNOWMAN_PARAMS uri = uri_reference(url_bytestring) assert uri.is_valid() is True assert uri == 'http://example.com?utf8=%E2%98%83' def test_unicode_authority(): url_bytestring = SNOWMAN_HOST unicode_url = url_bytestring.decode('utf-8') uri = uri_reference(unicode_url) assert uri.is_valid() is False assert uri == unicode_url def test_urlparse_a_unicode_hostname(): url_bytestring = SNOWMAN_HOST unicode_url = url_bytestring.decode('utf-8') parsed = urlparse(url_bytestring) assert parsed.host == unicode_url[7:] def test_urlparse_a_unicode_hostname_with_auth(): url = b'http://userinfo@' + SNOWMAN + b'.com' parsed = urlparse(url) assert parsed.userinfo == 'userinfo' def test_urlparse_an_invalid_authority_parses_port(): url = 'http://foo:b@r@[::1]:80/get' parsed = urlparse(url) assert parsed.port == 80 assert parsed.userinfo == 'foo:b@r' assert parsed.hostname == '[::1]' def test_unsplit_idna_a_unicode_hostname(): parsed = urlparse(SNOWMAN_HOST) assert parsed.unsplit(use_idna=True) == SNOWMAN_IDNA_HOST def test_strict_urlparsing(): with pytest.raises(exceptions.InvalidAuthority): parseresult.ParseResult.from_string(SNOWMAN_HOST) rfc3986-1.3.2/tests/test_uri.py0000664000327200032720000003220613466311640017372 0ustar slarsonslarson00000000000000# -*- coding: utf-8 -*- import pytest from rfc3986.exceptions import InvalidAuthority, ResolutionError from rfc3986.misc import URI_MATCHER from rfc3986.uri import URIReference from . import base @pytest.fixture def scheme_and_path_uri(): return 'mailto:user@example.com' class TestURIReferenceParsesURIs(base.BaseTestParsesURIs): """Tests for URIReference handling of URIs.""" test_class = URIReference def test_authority_info_raises_InvalidAuthority(self, invalid_uri): """Test that an invalid IPv6 is caught by authority_info().""" uri = URIReference.from_string(invalid_uri) with pytest.raises(InvalidAuthority): uri.authority_info() def test_attributes_catch_InvalidAuthority(self, invalid_uri): """Test that an invalid IPv6 is caught by authority_info().""" uri = URIReference.from_string(invalid_uri) assert uri.host is None assert uri.userinfo is None assert uri.port is None def test_handles_absolute_path_uri(self, absolute_path_uri): """Test that URIReference can handle a path-only URI.""" uri = URIReference.from_string(absolute_path_uri) assert uri.path == absolute_path_uri assert uri.authority_info() == { 'userinfo': None, 'host': None, 'port': None, } def test_scheme_and_path_uri_is_valid(self, scheme_and_path_uri): uri = self.test_class.from_string(scheme_and_path_uri) assert uri.is_valid() is True def test_handles_scheme_and_path_uri(self, scheme_and_path_uri): """Test that self.test_class can handle a `scheme:path` URI.""" uri = self.test_class.from_string(scheme_and_path_uri) assert uri.path == 'user@example.com' assert uri.scheme == 'mailto' assert uri.query is None assert uri.host is None assert uri.port is None assert uri.userinfo is None assert uri.authority is None def test_parses_ipv6_to_path(self): """Verify that we don't parse [ as a scheme.""" uri = self.test_class.from_string('[::1]') assert uri.scheme is None assert uri.authority is None assert uri.path == '[::1]' class TestURIValidation: # Valid URI tests def test_basic_uri_is_valid(self, basic_uri): uri = URIReference.from_string(basic_uri) assert uri.is_valid() is True def test_basic_uri_requiring_scheme(self, basic_uri): uri = URIReference.from_string(basic_uri) assert uri.is_valid(require_scheme=True) is True def test_basic_uri_requiring_authority(self, basic_uri): uri = URIReference.from_string(basic_uri) assert uri.is_valid(require_authority=True) is True def test_uri_with_everything_requiring_path(self, uri_with_everything): uri = URIReference.from_string(uri_with_everything) assert uri.is_valid(require_path=True) is True def test_uri_with_everything_requiring_query(self, uri_with_everything): uri = URIReference.from_string(uri_with_everything) assert uri.is_valid(require_query=True) is True def test_uri_with_everything_requiring_fragment(self, uri_with_everything): uri = URIReference.from_string(uri_with_everything) assert uri.is_valid(require_fragment=True) is True def test_basic_uri_with_port_is_valid(self, basic_uri_with_port): uri = URIReference.from_string(basic_uri_with_port) assert uri.is_valid() is True def test_uri_with_port_and_userinfo_is_valid(self, uri_with_port_and_userinfo): uri = URIReference.from_string(uri_with_port_and_userinfo) assert uri.is_valid() is True def test_basic_uri_with_path_is_valid(self, basic_uri_with_path): uri = URIReference.from_string(basic_uri_with_path) assert uri.is_valid() is True def test_uri_with_path_and_query_is_valid(self, uri_with_path_and_query): uri = URIReference.from_string(uri_with_path_and_query) assert uri.is_valid() is True def test_uri_with_everything_is_valid(self, uri_with_everything): uri = URIReference.from_string(uri_with_everything) assert uri.is_valid() is True def test_relative_uri_is_valid(self, relative_uri): uri = URIReference.from_string(relative_uri) assert uri.is_valid() is True def test_absolute_path_uri_is_valid(self, absolute_path_uri): uri = URIReference.from_string(absolute_path_uri) assert uri.is_valid() is True def test_scheme_and_path_uri_is_valid(self, scheme_and_path_uri): uri = URIReference.from_string(scheme_and_path_uri) assert uri.is_valid() is True # Invalid URI tests def test_invalid_uri_is_not_valid(self, invalid_uri): uri = URIReference.from_string(invalid_uri) assert uri.is_valid() is False def test_invalid_scheme(self): uri = URIReference('123', None, None, None, None) assert uri.is_valid() is False def test_invalid_path(self): uri = URIReference(None, None, 'foo#bar', None, None) assert uri.is_valid() is False def test_invalid_query_component(self): uri = URIReference(None, None, None, 'foo#bar', None) assert uri.is_valid() is False def test_invalid_fragment_component(self): uri = URIReference(None, None, None, None, 'foo#bar') assert uri.is_valid() is False class TestURIReferenceUnsplits(base.BaseTestUnsplits): test_class = URIReference def test_scheme_and_path_uri_unsplits(self, scheme_and_path_uri): uri = self.test_class.from_string(scheme_and_path_uri) assert uri.unsplit() == scheme_and_path_uri class TestURIReferenceComparesToStrings: def test_basic_uri(self, basic_uri): uri = URIReference.from_string(basic_uri) assert uri == basic_uri def test_basic_uri_with_port(self, basic_uri_with_port): uri = URIReference.from_string(basic_uri_with_port) assert uri == basic_uri_with_port def test_uri_with_port_and_userinfo(self, uri_with_port_and_userinfo): uri = URIReference.from_string(uri_with_port_and_userinfo) assert uri == uri_with_port_and_userinfo def test_basic_uri_with_path(self, basic_uri_with_path): uri = URIReference.from_string(basic_uri_with_path) assert uri == basic_uri_with_path def test_uri_with_path_and_query(self, uri_with_path_and_query): uri = URIReference.from_string(uri_with_path_and_query) assert uri == uri_with_path_and_query def test_uri_with_everything(self, uri_with_everything): uri = URIReference.from_string(uri_with_everything) assert uri == uri_with_everything def test_relative_uri(self, relative_uri): uri = URIReference.from_string(relative_uri) assert uri == relative_uri def test_absolute_path_uri(self, absolute_path_uri): uri = URIReference.from_string(absolute_path_uri) assert uri == absolute_path_uri def test_scheme_and_path_uri(self, scheme_and_path_uri): uri = URIReference.from_string(scheme_and_path_uri) assert uri == scheme_and_path_uri class TestURIReferenceComparesToTuples: def to_tuple(self, uri): return URI_MATCHER.match(uri).groups() def test_basic_uri(self, basic_uri): uri = URIReference.from_string(basic_uri) assert uri == self.to_tuple(basic_uri) def test_basic_uri_with_port(self, basic_uri_with_port): uri = URIReference.from_string(basic_uri_with_port) assert uri == self.to_tuple(basic_uri_with_port) def test_uri_with_port_and_userinfo(self, uri_with_port_and_userinfo): uri = URIReference.from_string(uri_with_port_and_userinfo) assert uri == self.to_tuple(uri_with_port_and_userinfo) def test_basic_uri_with_path(self, basic_uri_with_path): uri = URIReference.from_string(basic_uri_with_path) assert uri == self.to_tuple(basic_uri_with_path) def test_uri_with_path_and_query(self, uri_with_path_and_query): uri = URIReference.from_string(uri_with_path_and_query) assert uri == self.to_tuple(uri_with_path_and_query) def test_uri_with_everything(self, uri_with_everything): uri = URIReference.from_string(uri_with_everything) assert uri == self.to_tuple(uri_with_everything) def test_relative_uri(self, relative_uri): uri = URIReference.from_string(relative_uri) assert uri == self.to_tuple(relative_uri) def test_absolute_path_uri(self, absolute_path_uri): uri = URIReference.from_string(absolute_path_uri) assert uri == self.to_tuple(absolute_path_uri) def test_scheme_and_path_uri(self, scheme_and_path_uri): uri = URIReference.from_string(scheme_and_path_uri) assert uri == self.to_tuple(scheme_and_path_uri) def test_uri_comparison_raises_TypeError(basic_uri): uri = URIReference.from_string(basic_uri) with pytest.raises(TypeError): uri == 1 class TestURIReferenceComparesToURIReferences: def test_same_basic_uri(self, basic_uri): uri = URIReference.from_string(basic_uri) assert uri == uri def test_different_basic_uris(self, basic_uri, basic_uri_with_port): uri = URIReference.from_string(basic_uri) assert (uri == URIReference.from_string(basic_uri_with_port)) is False class TestURIReferenceIsAbsolute: def test_basic_uris_are_absolute(self, basic_uri): uri = URIReference.from_string(basic_uri) assert uri.is_absolute() is True def test_basic_uris_with_ports_are_absolute(self, basic_uri_with_port): uri = URIReference.from_string(basic_uri_with_port) assert uri.is_absolute() is True def test_basic_uris_with_paths_are_absolute(self, basic_uri_with_path): uri = URIReference.from_string(basic_uri_with_path) assert uri.is_absolute() is True def test_uri_with_everything_are_not_absolute(self, uri_with_everything): uri = URIReference.from_string(uri_with_everything) assert uri.is_absolute() is False def test_absolute_paths_are_not_absolute_uris(self, absolute_path_uri): uri = URIReference.from_string(absolute_path_uri) assert uri.is_absolute() is False # @pytest.fixture(params=[ # basic_uri, basic_uri_with_port, basic_uri_with_path, # scheme_and_path_uri, uri_with_path_and_query # ]) # @pytest.fixture(params=[absolute_path_uri, relative_uri]) class TestURIReferencesResolve: def test_with_basic_and_relative_uris(self, basic_uri, relative_uri): R = URIReference.from_string(relative_uri) B = URIReference.from_string(basic_uri) T = R.resolve_with(basic_uri) assert T.scheme == B.scheme assert T.host == R.host assert T.path == R.path def test_with_basic_and_absolute_path_uris(self, basic_uri, absolute_path_uri): R = URIReference.from_string(absolute_path_uri) B = URIReference.from_string(basic_uri).normalize() T = R.resolve_with(B) assert T.scheme == B.scheme assert T.host == B.host assert T.path == R.path def test_with_basic_uri_and_relative_path(self, basic_uri): R = URIReference.from_string('foo/bar/bogus') B = URIReference.from_string(basic_uri).normalize() T = R.resolve_with(B) assert T.scheme == B.scheme assert T.host == B.host assert T.path == '/' + R.path def test_basic_uri_with_path_and_relative_path(self, basic_uri_with_path): R = URIReference.from_string('foo/bar/bogus') B = URIReference.from_string(basic_uri_with_path).normalize() T = R.resolve_with(B) assert T.scheme == B.scheme assert T.host == B.host index = B.path.rfind('/') assert T.path == B.path[:index] + '/' + R.path def test_uri_with_everything_raises_exception(self, uri_with_everything): R = URIReference.from_string('foo/bar/bogus') B = URIReference.from_string(uri_with_everything) with pytest.raises(ResolutionError): R.resolve_with(B) def test_basic_uri_resolves_itself(self, basic_uri): R = URIReference.from_string(basic_uri) B = URIReference.from_string(basic_uri) T = R.resolve_with(B) assert T == B def test_differing_schemes(self, basic_uri): R = URIReference.from_string('https://example.com/path') B = URIReference.from_string(basic_uri) T = R.resolve_with(B) assert T.scheme == R.scheme def test_resolve_pathless_fragment(self, basic_uri): R = URIReference.from_string('#fragment') B = URIReference.from_string(basic_uri) T = R.resolve_with(B) assert T.path is None assert T.fragment == 'fragment' def test_resolve_pathless_query(self, basic_uri): R = URIReference.from_string('?query') B = URIReference.from_string(basic_uri) T = R.resolve_with(B) assert T.path is None assert T.query == 'query' def test_empty_querystrings_persist(): url = 'https://httpbin.org/get?' ref = URIReference.from_string(url) assert ref.query == '' assert ref.unsplit() == url rfc3986-1.3.2/tests/test_validators.py0000664000327200032720000002252613466311640020747 0ustar slarsonslarson00000000000000# -*- coding: utf-8 -*- """Tests for the validators module.""" import rfc3986 from rfc3986 import exceptions from rfc3986 import validators import pytest def test_defaults(): """Verify the default Validator settings.""" validator = validators.Validator() assert validator.required_components == { c: False for c in validator.COMPONENT_NAMES } assert validator.allow_password is True assert validator.allowed_schemes == set() assert validator.allowed_hosts == set() assert validator.allowed_ports == set() def test_allowing_schemes(): """Verify the ability to select schemes to be allowed.""" validator = validators.Validator().allow_schemes('http', 'https') assert 'http' in validator.allowed_schemes assert 'https' in validator.allowed_schemes def test_allowing_hosts(): """Verify the ability to select hosts to be allowed.""" validator = validators.Validator().allow_hosts( 'pypi.python.org', 'pypi.org', ) assert 'pypi.python.org' in validator.allowed_hosts assert 'pypi.org' in validator.allowed_hosts def test_allowing_ports(): """Verify the ability select ports to be allowed.""" validator = validators.Validator().allow_ports('80', '100') assert '80' in validator.allowed_ports assert '100' in validator.allowed_ports def test_requiring_invalid_component(): """Verify that we validate required component names.""" with pytest.raises(ValueError): validators.Validator().require_presence_of('frob') def test_checking_validity_of_component(): """Verify that we validate components we're validating.""" with pytest.raises(ValueError): validators.Validator().check_validity_of('frob') def test_use_of_password(): """Verify the behaviour of {forbid,allow}_use_of_password.""" validator = validators.Validator() assert validator.allow_password is True validator.forbid_use_of_password() assert validator.allow_password is False validator.allow_use_of_password() assert validator.allow_password is True @pytest.mark.parametrize('uri', [ rfc3986.uri_reference('https://user:password@github.com'), rfc3986.uri_reference('https://user:password@github.com/path'), rfc3986.uri_reference('https://user:password@github.com/path?query'), rfc3986.uri_reference('https://user:password@github.com/path?query#frag'), rfc3986.uri_reference('//user:password@github.com'), ]) def test_forbidden_passwords(uri): """Verify that passwords are disallowed.""" validator = validators.Validator().forbid_use_of_password() with pytest.raises(exceptions.PasswordForbidden): validator.validate(uri) @pytest.mark.parametrize('uri', [ rfc3986.uri_reference('https://user@github.com'), rfc3986.uri_reference('https://user@github.com/path'), rfc3986.uri_reference('https://user@github.com/path?query'), rfc3986.uri_reference('https://user@github.com/path?query#frag'), rfc3986.uri_reference('//user@github.com'), rfc3986.uri_reference('//github.com'), rfc3986.uri_reference('https://github.com'), ]) def test_passwordless_uris_pass_validation(uri): """Verify password-less URLs validate properly.""" validator = validators.Validator().forbid_use_of_password() validator.validate(uri) @pytest.mark.parametrize('uri', [ rfc3986.uri_reference('https://'), rfc3986.uri_reference('/path/to/resource'), ]) def test_missing_host_component(uri): """Verify that missing host components cause errors.""" validators.Validator().validate(uri) validator = validators.Validator().require_presence_of('host') with pytest.raises(exceptions.MissingComponentError): validator.validate(uri) @pytest.mark.parametrize('uri', [ rfc3986.uri_reference('https://'), rfc3986.uri_reference('//google.com'), rfc3986.uri_reference('//google.com?query=value'), rfc3986.uri_reference('//google.com#fragment'), rfc3986.uri_reference('https://google.com'), rfc3986.uri_reference('https://google.com#fragment'), rfc3986.uri_reference('https://google.com?query=value'), ]) def test_missing_path_component(uri): """Verify that missing path components cause errors.""" validator = validators.Validator().require_presence_of('path') with pytest.raises(exceptions.MissingComponentError): validator.validate(uri) @pytest.mark.parametrize('uri', [ rfc3986.uri_reference('//google.com'), rfc3986.uri_reference('//google.com?query=value'), rfc3986.uri_reference('//google.com#fragment'), ]) def test_multiple_missing_components(uri): """Verify that multiple missing components are caught.""" validator = validators.Validator().require_presence_of('scheme', 'path') with pytest.raises(exceptions.MissingComponentError) as captured_exc: validator.validate(uri) exception = captured_exc.value assert 2 == len(exception.args[-1]) @pytest.mark.parametrize('uri', [ rfc3986.uri_reference('smtp://'), rfc3986.uri_reference('telnet://'), ]) def test_ensure_uri_has_a_scheme(uri): """Verify validation with allowed schemes.""" validator = validators.Validator().allow_schemes('https', 'http') with pytest.raises(exceptions.UnpermittedComponentError): validator.validate(uri) @pytest.mark.parametrize('uri, failed_component', [ (rfc3986.uri_reference('git://github.com'), 'scheme'), (rfc3986.uri_reference('http://github.com'), 'scheme'), (rfc3986.uri_reference('ssh://gitlab.com'), 'host'), (rfc3986.uri_reference('https://gitlab.com'), 'host'), ]) def test_allowed_hosts_and_schemes(uri, failed_component): """Verify each of these fails.""" validator = validators.Validator().allow_schemes( 'https', 'ssh', ).allow_hosts( 'github.com', 'git.openstack.org', ) with pytest.raises(exceptions.UnpermittedComponentError) as caught_exc: validator.validate(uri) exc = caught_exc.value assert exc.component_name == failed_component @pytest.mark.parametrize('uri', [ rfc3986.uri_reference('https://github.com/sigmavirus24'), rfc3986.uri_reference('ssh://github.com/sigmavirus24'), rfc3986.uri_reference('ssh://ssh@github.com:22/sigmavirus24'), rfc3986.uri_reference('https://github.com:443/sigmavirus24'), rfc3986.uri_reference('https://gitlab.com/sigmavirus24'), rfc3986.uri_reference('ssh://gitlab.com/sigmavirus24'), rfc3986.uri_reference('ssh://ssh@gitlab.com:22/sigmavirus24'), rfc3986.uri_reference('https://gitlab.com:443/sigmavirus24'), rfc3986.uri_reference('https://bitbucket.org/sigmavirus24'), rfc3986.uri_reference('ssh://bitbucket.org/sigmavirus24'), rfc3986.uri_reference('ssh://ssh@bitbucket.org:22/sigmavirus24'), rfc3986.uri_reference('https://bitbucket.org:443/sigmavirus24'), rfc3986.uri_reference('https://git.openstack.org/sigmavirus24'), rfc3986.uri_reference('ssh://git.openstack.org/sigmavirus24'), rfc3986.uri_reference('ssh://ssh@git.openstack.org:22/sigmavirus24'), rfc3986.uri_reference('https://git.openstack.org:443/sigmavirus24'), rfc3986.uri_reference( 'ssh://ssh@git.openstack.org:22/sigmavirus24?foo=bar#fragment' ), rfc3986.uri_reference( 'ssh://git.openstack.org:22/sigmavirus24?foo=bar#fragment' ), rfc3986.uri_reference('ssh://git.openstack.org:22/?foo=bar#fragment'), rfc3986.uri_reference('ssh://git.openstack.org:22/sigmavirus24#fragment'), rfc3986.uri_reference('ssh://git.openstack.org:22/#fragment'), rfc3986.uri_reference('ssh://git.openstack.org:22/'), rfc3986.uri_reference('ssh://ssh@git.openstack.org:22/?foo=bar#fragment'), rfc3986.uri_reference( 'ssh://ssh@git.openstack.org:22/sigmavirus24#fragment' ), rfc3986.uri_reference('ssh://ssh@git.openstack.org:22/#fragment'), rfc3986.uri_reference('ssh://ssh@git.openstack.org:22/'), ]) def test_successful_complex_validation(uri): """Verify we do not raise ValidationErrors for good URIs.""" validators.Validator().allow_schemes( 'https', 'ssh', ).allow_hosts( 'github.com', 'bitbucket.org', 'gitlab.com', 'git.openstack.org', ).allow_ports( '22', '443', ).require_presence_of( 'scheme', 'host', 'path', ).check_validity_of( 'scheme', 'userinfo', 'host', 'port', 'path', 'query', 'fragment', ).validate(uri) def test_invalid_uri_generates_error(invalid_uri): """Verify we catch invalid URIs.""" uri = rfc3986.uri_reference(invalid_uri) with pytest.raises(exceptions.InvalidComponentsError): validators.Validator().check_validity_of('host').validate(uri) def test_invalid_uri_with_invalid_path(invalid_uri): """Verify we catch multiple invalid components.""" uri = rfc3986.uri_reference(invalid_uri) uri = uri.copy_with(path='#foobar') with pytest.raises(exceptions.InvalidComponentsError): validators.Validator().check_validity_of( 'host', 'path', ).validate(uri) def test_validating_rfc_4007_ipv6_zone_ids(): """Verify that RFC 4007 IPv6 Zone IDs are invalid host/authority but after normalization are valid """ uri = rfc3986.uri_reference("http://[::1%eth0]") with pytest.raises(exceptions.InvalidComponentsError): validators.Validator().check_validity_of( 'host' ).validate(uri) uri = uri.normalize() assert uri.host == '[::1%25eth0]' validators.Validator().check_validity_of( 'host' ).validate(uri)