pax_global_header00006660000000000000000000000064142754601110014513gustar00rootroot0000000000000052 comment=19d8a42bab3a2832461e2ffbc9e947f85cab27d2 elementpath-3.0.2/000077500000000000000000000000001427546011100140235ustar00rootroot00000000000000elementpath-3.0.2/.coveragerc000066400000000000000000000002501427546011100161410ustar00rootroot00000000000000[run] branch = True source = elementpath/ omit = elementpath/regex/generate_categories.py [report] exclude_lines = pragma: no cover raise NotImplementedError()elementpath-3.0.2/.github/000077500000000000000000000000001427546011100153635ustar00rootroot00000000000000elementpath-3.0.2/.github/workflows/000077500000000000000000000000001427546011100174205ustar00rootroot00000000000000elementpath-3.0.2/.github/workflows/test-elementpath.yml000066400000000000000000000032721427546011100234320ustar00rootroot00000000000000name: elementpath on: push: branches: [master, develop] pull_request: branches: [master, develop] jobs: build: runs-on: ${{ matrix.os }} strategy: fail-fast: false matrix: os: [ubuntu-latest, macos-latest, windows-latest] python-version: [3.7, 3.8, 3.9, "3.10", "3.11-dev", "pypy-3.8"] exclude: - os: macos-latest python-version: 3.7 - os: windows-latest python-version: 3.7 - os: macos-latest python-version: 3.8 - os: windows-latest python-version: 3.8 steps: - uses: actions/checkout@v2 - name: Install additional development libraries for building lxml if: ${{ matrix.os == 'ubuntu-latest' && (matrix.python-version == '3.11-dev' || matrix.python-version == 'pypy-3.8') }} run: sudo apt-get update && sudo apt-get install libxml2-dev libxslt-dev python-dev - name: Set up Python ${{ matrix.python-version }} uses: actions/setup-python@v2 with: python-version: ${{ matrix.python-version }} - name: Install pip and setuptools run: | python -m pip install --upgrade pip pip install setuptools - name: Lint with flake8 run: | pip install flake8 flake8 elementpath --max-line-length=100 --statistics - name: Lint with mypy if 
Python version != 3.7 if: ${{ matrix.python-version != '3.7' }} run: | pip install mypy==0.971 lxml-stubs mypy --show-error-codes --strict elementpath - name: Test with unittest run: | pip install lxml xmlschema>=2.0.0 python -m unittest elementpath-3.0.2/.gitignore000066400000000000000000000002661427546011100160170ustar00rootroot00000000000000*.pyc *.pyo *~ *.so *.swp *.egg-info .idea/ .project .ipynb_checkpoints/ .tox/ .mypy_cache/ .coverage* !.coveragerc doc/_* __pycache__/ dist/ build/ development/ out/ profiling/out/ elementpath-3.0.2/CHANGELOG.rst000066400000000000000000000342551427546011100160550ustar00rootroot00000000000000********* CHANGELOG ********* `v3.0.2`_ (2022-08-12) ====================== * Extend root concept to subtrees used as root (e.g. XSD 1.1 assertions) * Begin XPath 3.1 implementation adding XPathMap and XPathArray `v3.0.1`_ (2022-07-23) ====================== * Fix of descendant path operator (issue #51) * Add support for Python 3.11 `v3.0.0`_ (2022-07-16) ====================== * Transition to full XPath node implementation (more memory usage but better control and overall faster) * Add etree.py module with a safe XML parser (ported from xmlschema) `v2.5.3`_ (2022-05-30) ====================== * Fix unary path step operator (issue #46) * Fix sphinx warnings *'reference target not found'* (issue #45) `v2.5.2`_ (2022-05-17) ====================== * Include PR #43 with fixes for `XPathContext.iter_siblings()` (issues #42 and #44) `v2.5.1`_ (2022-04-28) ====================== * Fix for failed floats equality tests (issue #41) * Static typing tested with mypy==0.950 `v2.5.0`_ (2022-03-04) ====================== * Add XPath 3.0 support * Better use of lxml.etree features * Full coverage of W3C tests * Drop support for Python 3.6 `v2.4.0`_ (2021-11-09) ====================== * Fix type annotations and going strict on parsers and other public classes * Add XPathConstructor token class (subclass of XPathFunction) * Last release for Python 3.6 
`v2.3.2`_ (2021-09-16) ====================== * Make ElementProtocol and LxmlElementProtocol runtime checkable (only for Python 3.8+) * Type annotations for all package public APIs `v2.3.1`_ (2021-09-07) ====================== * Add LxmlElementProtocol * Add pytest env to tox.ini (test issue #39) `v2.3.0`_ (2021-09-01) ====================== * Add inline type annotations check support * Add structural Protocol based type checks (effective for Python 3.8+) `v2.2.3`_ (2021-06-16) ====================== * Add Python 3.10 in Tox and CI tests * Apply __slots__ to TDOP and regex classes `v2.2.2`_ (2021-05-03) ====================== * Fix issue sissaschool/xmlschema#243 (assert with xsi:nil usage) * First implementation of XPath 3.0 fn:format-integer `v2.2.1`_ (2021-03-24) ====================== * Add function signatures at token registration * Some fixes to XPath tokens and more XPath 3.0 implementations `v2.2.0`_ (2021-03-01) ====================== * Optimize TDOP parser's tokenizer * Resolve ambiguities with operators and statements that are also names * Merge with XPath 3.0/3.1 develop (to be completed) `v2.1.4`_ (2021-02-09) ====================== * Add tests and apply small fixes to TDOP parser * Fix wildcard selection of attributes (issue #35) `v2.1.3`_ (2021-01-30) ====================== * Extend tests for XPath 2.0 with minor fixes * Fix fn:round-half-to-even (issue #33) `v2.1.2`_ (2021-01-22) ====================== * Extend tests for XPath 1.0/2.0 with minor fixes * Fix for +/- prefix operators * Fix for regex patterns anchors and binary datatypes `v2.1.1`_ (2021-01-06) ====================== * Fix for issue #32 (test failure on missing locale setting) * Extend tests for XPath 1.0 with minor fixes `v2.1.0`_ (2021-01-05) ====================== * Create custom class hierarchy for XPath nodes that replaces named-tuples * Bind attribute nodes, text nodes and namespace nodes to parent element (issue #31) `v2.0.5`_ (2020-12-02) ====================== * Increase the 
speed of path step selection on large trees * More tests and small fixes to XSD builtin datatypes `v2.0.4`_ (2020-10-30) ====================== * Lazy tokenizer for parser classes in order to minimize import time `v2.0.3`_ (2020-09-13) ====================== * Fix context handling in cycle statements * Change constructor's label to 'constructor function' `v2.0.2`_ (2020-09-03) ====================== * Add regex translator to package API * More than 99% of W3C XPath 2.0 tests pass `v2.0.1`_ (2020-08-24) ====================== * Add regex transpiler (for XPath/XQuery and XML Schema regular expressions) * Hotfix for issue #30 `v2.0.0`_ (2020-08-13) ====================== * Extensive testing with W3C XPath 2.0 tests (~98% passed) * Split context variables from in-scope variables (types) * Add other XSD builtin atomic types `v1.4.6`_ (2020-06-15) ====================== * Fix XPathContext to let the subclasses replace the XPath nodes iterator function `v1.4.5`_ (2020-05-22) ====================== * Fix tokenizer and parsers for ambiguities between symbols and names `v1.4.4`_ (2020-04-23) ====================== * Improve XPath context and axes processing * Integrate pull requests and fix bug on predicate selector `v1.4.3`_ (2020-03-18) ====================== * Fix PyPy 3 tests on xs:base64Binary and xs:hexBinary * Separated the tests of schema proxy API and other schemas based tests `v1.4.2`_ (2020-03-13) ====================== * Multiple XSD type associations on a token * Extend xs:untypedAtomic type usage * Increase the tests coverage to 95% `v1.4.1`_ (2020-01-28) ====================== * Fix for node kind tests * Fix for issue #17 * Update test dependencies * Add PyPy3 to tests `v1.4.0`_ (2019-12-31) ====================== * Remove Python 2 support * Add TextNode node type * Fix for issue #15 and for errors related to PR #16 `v1.3.3`_ (2019-12-17) ====================== * Fix 'attribute' multi-role token (axis and kind test) * Fixes for issues #13 and #14 `v1.3.2`_ 
(2019-12-10) ====================== * Add token labels 'sequence types' and 'kind test' for callables that are not XPath functions * Add missing XPath 2.0 functions * Fix for issue #12 `v1.3.1`_ (2019-10-21) ====================== * Add test module for TDOP parser * Fix for issue #10 `v1.3.0`_ (2019-10-11) ====================== * Improved schema proxy * Improved XSD type matching using paths * Cached parent path for XPathContext (only Python 3) * Improve typed selection with TypedAttribute and TypedElement named-tuples * Add iter_results to XPathContext * Remove XMLSchemaProxy from package * Fix descendant shortcut operator '//' * Fix text() function * Fix typed select of '(name)' token * Fix 24-hour time for DateTime `v1.2.1`_ (2019-08-30) ====================== * Hashable XSD datatypes classes * Fix Duration types comparison `v1.2.0`_ (2019-08-14) ====================== * Added special XSD datatypes * Better handling of schema contexts * Added validators for numeric types * Fixed function conversion rules * Fixed tests with lxml and XPath 1.0 * Added tests for uncovered code `v1.1.8`_ (2019-05-20) ====================== * Added code coverage and flake8 checks * Drop Python 3.4 support * Use more specific XPath errors for functions and namespace resolving * Fix for issue #4 `v1.1.7`_ (2019-04-25) ====================== * Added Parser.is_spaced() method for checking if the current token has extra spaces before or after * Fixes for '/' and ':' tokens * Fixes for fn:max() and fn:min() functions `v1.1.6`_ (2019-03-28) ====================== * Fixes for XSD datatypes * Minor fixes after a first test run with Python v3.8a3 `v1.1.5`_ (2019-02-23) ====================== * Differentiated unordered XPath gregorian types from ordered types for XSD * Fix issue #2 `v1.1.4`_ (2019-02-21) ====================== * Implementation of a full Static Analysis Phase at parse() level * Schema-based static analysis for XPath 2.0 parsers using schema contexts * Added 
``XPathSchemaContext`` class for processing schema contexts * Added atomization() and get_atomized_operand() helpers to XPathToken * Fix value comparison operators `v1.1.3`_ (2019-02-06) ====================== * Fix for issue #1 * Added fn:static-base-uri() and fn:resolve-uri() * Fixes to XPath 1.0 functions for compatibility mode `v1.1.2`_ (2019-01-30) ====================== * Fixes for XSD datatypes * Change the default value of *default_namespace* argument of XPath2Parser to ``None`` `v1.1.1`_ (2019-01-19) ====================== * Improvements and fixes for XSD datatypes * Rewritten AbstractDateTime for supporting years with value > 9999 * Added fn:dateTime() `v1.1.0`_ (2018-12-23) ====================== * Almost full implementation of XPath 2.0 * Extended XPath errors management * Add XSD datatypes for data/time builtins * Add constructors for XSD builtins `v1.0.12`_ (2018-09-01) ======================= * Fixed the default namespace use for names without prefix. `v1.0.11`_ (2018-07-25) ======================= * Added two recursive protected methods to context class * Minor fixes for context and helpers `v1.0.10`_ (2018-06-15) ======================= * Updated TDOP parser and implemented token classes serialization `v1.0.8`_ (2018-06-13) ====================== * Fixed token classes creation for parsers serialization `v1.0.7`_ (2018-05-07) ====================== * Added autodoc based manual with Sphinx `v1.0.6`_ (2018-05-02) ====================== * Added tox testing * Improved the parser class with raw_advance method `v1.0.5`_ (2018-03-31) ====================== * Added n.10 XPath 2.0 functions for strings * Fix README.rst for right rendering in PyPI * Added ElementPathMissingContextError exception for a correct handling of static context evaluation `v1.0.4`_ (2018-03-27) ====================== * Fixed packaging ('packages' argument in setup.py). 
`v1.0.3`_ (2018-03-27) ====================== * Fixed the effective boolean value for a list containing an empty string. `v1.0.2`_ (2018-03-27) ====================== * Add QName parsing like in the ElementPath library (usage regulated by a *strict* flag). `v1.0.1`_ (2018-03-27) ====================== * Some bug fixes for attributes selection. `v1.0.0`_ (2018-03-26) ====================== * First stable version. .. _v1.0.0: https://github.com/sissaschool/elementpath/commit/b28da83 .. _v1.0.1: https://github.com/sissaschool/elementpath/compare/v1.0.0...v1.0.1 .. _v1.0.2: https://github.com/sissaschool/elementpath/compare/v1.0.1...v1.0.2 .. _v1.0.3: https://github.com/sissaschool/elementpath/compare/v1.0.2...v1.0.3 .. _v1.0.4: https://github.com/sissaschool/elementpath/compare/v1.0.3...v1.0.4 .. _v1.0.5: https://github.com/sissaschool/elementpath/compare/v1.0.4...v1.0.5 .. _v1.0.6: https://github.com/sissaschool/elementpath/compare/v1.0.5...v1.0.6 .. _v1.0.7: https://github.com/sissaschool/elementpath/compare/v1.0.6...v1.0.7 .. _v1.0.8: https://github.com/sissaschool/elementpath/compare/v1.0.7...v1.0.8 .. _v1.0.10: https://github.com/sissaschool/elementpath/compare/v1.0.8...v1.0.10 .. _v1.0.11: https://github.com/sissaschool/elementpath/compare/v1.0.10...v1.0.11 .. _v1.0.12: https://github.com/sissaschool/elementpath/compare/v1.0.11...v1.0.12 .. _v1.1.0: https://github.com/sissaschool/elementpath/compare/v1.0.12...v1.1.0 .. _v1.1.1: https://github.com/sissaschool/elementpath/compare/v1.1.0...v1.1.1 .. _v1.1.2: https://github.com/sissaschool/elementpath/compare/v1.1.1...v1.1.2 .. _v1.1.3: https://github.com/sissaschool/elementpath/compare/v1.1.2...v1.1.3 .. _v1.1.4: https://github.com/sissaschool/elementpath/compare/v1.1.3...v1.1.4 .. _v1.1.5: https://github.com/sissaschool/elementpath/compare/v1.1.4...v1.1.5 .. _v1.1.6: https://github.com/sissaschool/elementpath/compare/v1.1.5...v1.1.6 .. _v1.1.7: https://github.com/sissaschool/elementpath/compare/v1.1.6...v1.1.7 .. 
_v1.1.8: https://github.com/sissaschool/elementpath/compare/v1.1.7...v1.1.8 .. _v1.1.9: https://github.com/sissaschool/elementpath/compare/v1.1.8...v1.1.9 .. _v1.2.0: https://github.com/sissaschool/elementpath/compare/v1.1.9...v1.2.0 .. _v1.2.1: https://github.com/sissaschool/elementpath/compare/v1.2.0...v1.2.1 .. _v1.3.0: https://github.com/sissaschool/elementpath/compare/v1.2.1...v1.3.0 .. _v1.3.1: https://github.com/sissaschool/elementpath/compare/v1.3.0...v1.3.1 .. _v1.3.2: https://github.com/sissaschool/elementpath/compare/v1.3.1...v1.3.2 .. _v1.3.3: https://github.com/sissaschool/elementpath/compare/v1.3.2...v1.3.3 .. _v1.4.0: https://github.com/sissaschool/elementpath/compare/v1.3.3...v1.4.0 .. _v1.4.1: https://github.com/sissaschool/elementpath/compare/v1.4.0...v1.4.1 .. _v1.4.2: https://github.com/sissaschool/elementpath/compare/v1.4.1...v1.4.2 .. _v1.4.3: https://github.com/sissaschool/elementpath/compare/v1.4.2...v1.4.3 .. _v1.4.4: https://github.com/sissaschool/elementpath/compare/v1.4.3...v1.4.4 .. _v1.4.5: https://github.com/sissaschool/elementpath/compare/v1.4.4...v1.4.5 .. _v1.4.6: https://github.com/sissaschool/elementpath/compare/v1.4.5...v1.4.6 .. _v2.0.0: https://github.com/sissaschool/elementpath/compare/v1.4.6...v2.0.0 .. _v2.0.1: https://github.com/sissaschool/elementpath/compare/v2.0.0...v2.0.1 .. _v2.0.2: https://github.com/sissaschool/elementpath/compare/v2.0.1...v2.0.2 .. _v2.0.3: https://github.com/sissaschool/elementpath/compare/v2.0.2...v2.0.3 .. _v2.0.4: https://github.com/sissaschool/elementpath/compare/v2.0.3...v2.0.4 .. _v2.0.5: https://github.com/sissaschool/elementpath/compare/v2.0.4...v2.0.5 .. _v2.1.0: https://github.com/sissaschool/elementpath/compare/v2.0.5...v2.1.0 .. _v2.1.1: https://github.com/sissaschool/elementpath/compare/v2.1.0...v2.1.1 .. _v2.1.2: https://github.com/sissaschool/elementpath/compare/v2.1.1...v2.1.2 .. _v2.1.3: https://github.com/sissaschool/elementpath/compare/v2.1.2...v2.1.3 .. 
_v2.1.4: https://github.com/sissaschool/elementpath/compare/v2.1.3...v2.1.4 .. _v2.2.0: https://github.com/sissaschool/elementpath/compare/v2.1.4...v2.2.0 .. _v2.2.1: https://github.com/sissaschool/elementpath/compare/v2.2.0...v2.2.1 .. _v2.2.2: https://github.com/sissaschool/elementpath/compare/v2.2.1...v2.2.2 .. _v2.2.3: https://github.com/sissaschool/elementpath/compare/v2.2.2...v2.2.3 .. _v2.3.0: https://github.com/sissaschool/elementpath/compare/v2.2.3...v2.3.0 .. _v2.3.1: https://github.com/sissaschool/elementpath/compare/v2.3.0...v2.3.1 .. _v2.3.2: https://github.com/sissaschool/elementpath/compare/v2.3.1...v2.3.2 .. _v2.4.0: https://github.com/sissaschool/elementpath/compare/v2.3.3...v2.4.0 .. _v2.5.0: https://github.com/sissaschool/elementpath/compare/v2.4.0...v2.5.0 .. _v2.5.1: https://github.com/sissaschool/elementpath/compare/v2.5.0...v2.5.1 .. _v2.5.2: https://github.com/sissaschool/elementpath/compare/v2.5.1...v2.5.2 .. _v2.5.3: https://github.com/sissaschool/elementpath/compare/v2.5.2...v2.5.3 .. _v3.0.0: https://github.com/sissaschool/elementpath/compare/v2.5.3...v3.0.0 .. _v3.0.1: https://github.com/sissaschool/elementpath/compare/v3.0.0...v3.0.1 .. 
_v3.0.2: https://github.com/sissaschool/elementpath/compare/v3.0.1...v3.0.2 elementpath-3.0.2/LICENSE000066400000000000000000000021531427546011100150310ustar00rootroot00000000000000The MIT License (MIT) Copyright (c), 2018-2021, SISSA (Scuola Internazionale Superiore di Studi Avanzati) Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. elementpath-3.0.2/MANIFEST.in000066400000000000000000000004001427546011100155530ustar00rootroot00000000000000include LICENSE include MANIFEST.in include README.rst include CHANGELOG.rst include setup.py include setup.cfg include requirements-dev.txt include tox.ini include doc/* recursive-include elementpath * recursive-include tests * global-exclude *.py[cod] elementpath-3.0.2/README.rst000066400000000000000000000111621427546011100155130ustar00rootroot00000000000000*********** elementpath *********** .. image:: https://img.shields.io/pypi/v/elementpath.svg :target: https://pypi.python.org/pypi/elementpath/ .. 
image:: https://img.shields.io/pypi/pyversions/elementpath.svg :target: https://pypi.python.org/pypi/elementpath/ .. image:: https://img.shields.io/pypi/implementation/elementpath.svg :target: https://pypi.python.org/pypi/elementpath/ .. image:: https://img.shields.io/badge/License-MIT-blue.svg :alt: MIT License :target: https://lbesson.mit-license.org/ .. image:: https://travis-ci.org/sissaschool/elementpath.svg?branch=master :target: https://travis-ci.org/sissaschool/elementpath .. image:: https://img.shields.io/pypi/dm/elementpath.svg :target: https://pypi.python.org/pypi/elementpath/ .. elementpath-introduction The purpose of this package is to provide XPath 1.0, 2.0 and 3.0 selectors for ElementTree XML data structures, both for the standard ElementTree library and for the `lxml.etree `_ library. For `lxml.etree `_ this package can be useful for providing XPath 2.0/3.0 selectors, because `lxml.etree `_ already has its own implementation of XPath 1.0. Installation and usage ====================== You can install the package with *pip* in a Python 3.7+ environment:: pip install elementpath To use it, import the package and apply the selectors to ElementTree nodes: >>> import elementpath >>> from xml.etree import ElementTree >>> root = ElementTree.XML('') >>> elementpath.select(root, '/A/B2/*') [, , ] The *select* API provides the standard XPath result format, which is a list or an elementary datatype's value. If you only want to iterate over the results, you can use the generator function *iter_select*, which accepts the same arguments as *select*. 
The selectors API also works with XML data trees based on the `lxml.etree `_ library: >>> import elementpath >>> import lxml.etree as etree >>> root = etree.XML('') >>> elementpath.select(root, '/A/B2/*') [, , ] When you need to apply the same XPath expression to several XML documents you can also use the *Selector* class, creating an instance and then using it to apply the path to distinct XML data: >>> import elementpath >>> import lxml.etree as etree >>> selector = elementpath.Selector('/A/*/*') >>> root = etree.XML('') >>> selector.select(root) [, , ] >>> root = etree.XML('') >>> selector.select(root) [, , , ] Public API classes and functions are described in the `elementpath manual on the "Read the Docs" site `_. By default XPath 2.0 is used. If you need the XPath 1.0 parser, provide the *parser* argument: >>> from elementpath import select, XPath1Parser >>> from xml.etree import ElementTree >>> root = ElementTree.XML('') >>> select(root, '/A/B2/*', parser=XPath1Parser) [, , ] For XPath 3.0, import the parser from the *elementpath.xpath3* subpackage, which is not loaded by default: >>> from elementpath.xpath3 import XPath3Parser >>> select(root, 'math:atan(1.0e0)', parser=XPath3Parser) 0.7853981633974483 Contributing ============ You can contribute to this package by reporting bugs through the issue tracker or by opening a pull request. If you open an issue, please try to provide a test or test data for reproducing the wrong behaviour. The provided testing code shall be added to the tests of the package. The XPath parsers are based on an implementation of Pratt's Top Down Operator Precedence parser. The implemented parser includes some lookahead features, helpers for registering tokens and for extending language implementations. The token class has also been generalized using a `MutableSequence` as base class. See *tdop_parser.py* for the basic internal classes and *xpath1_parser.py* for extensions and for a basic usage of the parser. 
If you like you can use the basic parser and tokens provided by the *tdop_parser.py* module to implement other types of parsers (I think it could be also a funny exercise!). License ======= This software is distributed under the terms of the MIT License. See the file 'LICENSE' in the root directory of the present distribution, or http://opensource.org/licenses/MIT. elementpath-3.0.2/doc/000077500000000000000000000000001427546011100145705ustar00rootroot00000000000000elementpath-3.0.2/doc/Makefile000066400000000000000000000011401427546011100162240ustar00rootroot00000000000000# Minimal makefile for Sphinx documentation # # You can set these variables from the command line. SPHINXOPTS = SPHINXBUILD = sphinx-build SPHINXPROJ = elementpath SOURCEDIR = . BUILDDIR = _build # Put it first so that "make" without argument is like "make help". help: @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) .PHONY: help Makefile # Catch-all target: route all unknown targets to Sphinx using the new # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). %: Makefile @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)elementpath-3.0.2/doc/conf.py000066400000000000000000000133521427546011100160730ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Configuration file for the Sphinx documentation builder. # # This file does only contain a selection of the most common options. For a # full list see the documentation: # http://www.sphinx-doc.org/en/stable/config # -- Path setup -------------------------------------------------------------- # If extensions (or modules to document with autodoc) are in another directory, # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. 
# # import os # import sys # sys.path.insert(0, os.path.abspath('.')) # Extends the path with parent directory in order to import elementpath from # the project directory also if it's installed. import sys import os sys.path.insert(0, os.path.abspath('..')) # -- Project information ----------------------------------------------------- project = 'elementpath' copyright = '2018-2022, SISSA (International School for Advanced Studies)' author = 'Davide Brunato' # The short X.Y version version = '3.0' # The full version, including alpha/beta/rc tags release = '3.0.2' # -- General configuration --------------------------------------------------- # If your documentation needs a minimal Sphinx version, state it here. # # needs_sphinx = '1.0' # Add any Sphinx extension module names here, as strings. They can be # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom # ones. extensions = [ 'sphinx.ext.autodoc', 'sphinx.ext.doctest', ] # Options for autodoc add_module_names = False # do not add module name as prefix to classes or functions. autodoc_typehints = 'none' # do not add type annotations nitpick_ignore = [ ('py:class', 'XMLSchemaProxy') ] # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] # The suffix(es) of source filenames. # You can specify multiple suffix as a list of string: # # source_suffix = ['.rst', '.md'] source_suffix = '.rst' # The master toctree document. master_doc = 'index' # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. # # This is also used if you do content translation via gettext catalogs. # Usually you set "language" from the command line for these cases. # language = None language = 'en' # required by Sphinx v5.0.0 # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. # This pattern also affects html_static_path and html_extra_path . 
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] # The name of the Pygments (syntax highlighting) style to use. pygments_style = 'sphinx' # -- Options for HTML output ------------------------------------------------- # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. # html_theme = 'alabaster' # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the # documentation. # # html_theme_options = {} # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". html_static_path = ['_static'] # Custom sidebar templates, must be a dictionary that maps document names # to template names. # # The default sidebars (for documents that don't match any pattern) are # defined by theme itself. Builtin themes are using these templates by # default: ``['localtoc.html', 'relations.html', 'sourcelink.html', # 'searchbox.html']``. # # html_sidebars = {} # -- Options for HTMLHelp output --------------------------------------------- # Output file base name for HTML help builder. htmlhelp_basename = 'elementpathdoc' # -- Options for LaTeX output ------------------------------------------------ latex_elements = { # The paper size ('letterpaper' or 'a4paper'). # # 'papersize': 'letterpaper', # The font size ('10pt', '11pt' or '12pt'). # # 'pointsize': '10pt', # Additional stuff for the LaTeX preamble. # # 'preamble': '', # Latex figure (float) alignment # # 'figure_align': 'htbp', } # Grouping the document tree into LaTeX files. List of tuples # (source start file, target name, title, # author, documentclass [howto, manual, or own class]). 
latex_documents = [ (master_doc, 'elementpath.tex', 'elementpath Manual', 'Davide Brunato', 'manual'), ] # -- Options for manual page output ------------------------------------------ # One entry per manual page. List of tuples # (source start file, name, description, authors, manual section). man_pages = [ (master_doc, 'elementpath', 'elementpath Manual', [author], 1) ] # -- Options for Texinfo output ---------------------------------------------- # Grouping the document tree into Texinfo files. List of tuples # (source start file, target name, title, author, # dir menu entry, description, category) texinfo_documents = [ (master_doc, 'elementpath', 'elementpath Manual', author, 'elementpath', 'One line description of project.', 'Miscellaneous'), ] # -- Options for Epub output ------------------------------------------------- # Bibliographic Dublin Core info. epub_title = project epub_author = author epub_publisher = author epub_copyright = copyright # The unique identifier of the text. This can be a ISBN number # or the project homepage. # # epub_identifier = '' # A unique identification for the text. # # epub_uid = '' # A list of files that should not be packed into the epub file. epub_exclude_files = ['search.html'] # -- Extension configuration ------------------------------------------------- elementpath-3.0.2/doc/index.rst000066400000000000000000000005161427546011100164330ustar00rootroot00000000000000.. elementpath documentation master file, created by sphinx-quickstart on Fri May 4 19:54:35 2018. You can adapt this file completely to your liking, but it should at least contain the root `toctree` directive. elementpath manual ================== .. toctree:: :maxdepth: 2 introduction xpath_api pratt_api elementpath-3.0.2/doc/introduction.rst000066400000000000000000000001561427546011100200450ustar00rootroot00000000000000************ Introduction ************ .. 
include:: ../README.rst :start-after: elementpath-introduction elementpath-3.0.2/doc/make.bat000066400000000000000000000014571427546011100162040ustar00rootroot00000000000000@ECHO OFF pushd %~dp0 REM Command file for Sphinx documentation if "%SPHINXBUILD%" == "" ( set SPHINXBUILD=sphinx-build ) set SOURCEDIR=. set BUILDDIR=_build set SPHINXPROJ=elementpath if "%1" == "" goto help %SPHINXBUILD% >NUL 2>NUL if errorlevel 9009 ( echo. echo.The 'sphinx-build' command was not found. Make sure you have Sphinx echo.installed, then set the SPHINXBUILD environment variable to point echo.to the full path of the 'sphinx-build' executable. Alternatively you echo.may add the Sphinx directory to PATH. echo. echo.If you don't have Sphinx installed, grab it from echo.http://sphinx-doc.org/ exit /b 1 ) %SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% goto end :help %SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% :end popd elementpath-3.0.2/doc/pratt_api.rst000066400000000000000000000036121427546011100173070ustar00rootroot00000000000000****************** Pratt's parser API ****************** The TDOP (Top Down Operator Precedence) parser implemented within this library is a variant of Pratt's original parser, based on a class for the parser and meta-classes for tokens. The parser base class includes helper functions for registering token classes, Pratt's methods and a regexp-based tokenizer builder. There are also additional methods and attributes to help in developing new parsers. Parsers can be defined by class derivation, following a token registration procedure. These classes are not available at package level but only within the module `elementpath.tdop`. Token base class ================ .. autoclass:: elementpath.tdop.Token .. autoattribute:: arity .. autoattribute:: tree .. autoattribute:: source .. automethod:: nud .. automethod:: led .. automethod:: evaluate .. automethod:: iter Helper methods for checking symbols and for error raising: .. 
automethod:: expected .. automethod:: unexpected .. automethod:: wrong_syntax .. automethod:: wrong_value .. automethod:: wrong_type Parser base class ================= .. autoclass:: elementpath.tdop.Parser .. autoattribute:: position Parsing methods: .. automethod:: parse .. automethod:: advance .. automethod:: advance_until .. automethod:: expression Helper methods for checking parser status: .. automethod:: is_source_start .. automethod:: is_line_start .. automethod:: is_spaced Helper methods for building new parsers: .. automethod:: register .. automethod:: unregister .. automethod:: duplicate .. automethod:: literal .. automethod:: nullary .. automethod:: prefix .. automethod:: postfix .. automethod:: infix .. automethod:: infixr .. automethod:: method .. automethod:: build .. automethod:: create_tokenizer elementpath-3.0.2/doc/xpath_api.rst000066400000000000000000000110551427546011100173010ustar00rootroot00000000000000**************** Public XPath API **************** The package includes some classes and functions that implement XPath selectors, parsers, tokens, contexts and schema proxy. XPath selectors =============== .. autofunction:: elementpath.select .. autofunction:: elementpath.iter_select .. autoclass:: elementpath.Selector .. autoattribute:: namespaces .. automethod:: select .. automethod:: iter_select XPath parsers ============= .. autoclass:: elementpath.XPath1Parser .. autoattribute:: DEFAULT_NAMESPACES .. autoattribute:: version Helper methods for defining token classes: .. automethod:: axis .. automethod:: function .. autoclass:: elementpath.XPath2Parser .. autoclass:: elementpath.xpath3.XPath30Parser XPath tokens ============ .. autoclass:: elementpath.XPathToken .. automethod:: evaluate .. automethod:: select Context manipulation helpers: .. automethod:: get_argument .. automethod:: atomization .. automethod:: get_atomized_operand .. automethod:: iter_comparison_data .. automethod:: get_operands .. automethod:: get_results .. 
automethod:: select_results .. automethod:: adjust_datetime .. automethod:: use_locale Schema context methods: .. automethod:: select_xsd_nodes .. automethod:: add_xsd_type .. automethod:: get_xsd_type .. automethod:: get_typed_node Data accessor helpers: .. automethod:: data_value .. automethod:: boolean_value .. automethod:: string_value .. automethod:: number_value .. automethod:: schema_node_value Error management helper: .. automethod:: error XPath contexts ============== .. autoclass:: elementpath.XPathContext .. autoclass:: elementpath.XPathSchemaContext XML Schema proxy ================ The XPath 2.0 parser can be interfaced with an XML Schema processor through a schema proxy. An :class:`XMLSchemaProxy` class is defined for interfacing schemas created with the *xmlschema* package. This class is based on an abstract class :class:`elementpath.AbstractSchemaProxy`, which can be used for implementing concrete interfaces to other types of XML Schema processors. .. autoclass:: elementpath.AbstractSchemaProxy .. automethod:: bind_parser .. automethod:: get_context .. automethod:: find .. automethod:: get_type .. automethod:: get_attribute .. automethod:: get_element .. automethod:: is_instance .. automethod:: cast_as .. automethod:: iter_atomic_types XPath nodes =========== XPath nodes are processed using a set of classes derived from :class:`elementpath.XPathNode`. This class hierarchy is as simple as possible, with a focus on speed and low memory consumption. .. autoclass:: elementpath.XPathNode The seven XPath node types: .. autoclass:: elementpath.AttributeNode .. autoclass:: elementpath.NamespaceNode .. autoclass:: elementpath.TextNode .. autoclass:: elementpath.CommentNode .. autoclass:: elementpath.ProcessingInstructionNode .. autoclass:: elementpath.ElementNode .. autoclass:: elementpath.DocumentNode There are also two other specialized versions of ElementNode, usable in specific cases: .. autoclass:: elementpath.LazyElementNode ..
autoclass:: elementpath.SchemaElementNode Node tree builders ================== Node trees are automatically created during the initialization of an :class:`elementpath.XPathContext`. If you need to process the same XML data multiple times, there is a helper API for creating document-based or element-based node trees: .. autofunction:: elementpath.get_node_tree .. autofunction:: elementpath.build_node_tree .. autofunction:: elementpath.build_lxml_node_tree .. autofunction:: elementpath.build_schema_node_tree XPath regular expressions ========================= .. autofunction:: elementpath.translate_pattern Exception classes ================= .. autoexception:: elementpath.ElementPathError .. autoexception:: elementpath.MissingContextError .. autoexception:: elementpath.RegexError .. autoexception:: elementpath.ElementPathLocaleError There are also other exceptions, derived by multiple inheritance from the base exception :class:`elementpath.ElementPathError` and from Python built-in exceptions: .. autoexception:: elementpath.ElementPathKeyError .. autoexception:: elementpath.ElementPathNameError .. autoexception:: elementpath.ElementPathOverflowError .. autoexception:: elementpath.ElementPathRuntimeError .. autoexception:: elementpath.ElementPathSyntaxError .. autoexception:: elementpath.ElementPathTypeError .. autoexception:: elementpath.ElementPathValueError .. autoexception:: elementpath.ElementPathZeroDivisionError elementpath-3.0.2/elementpath/000077500000000000000000000000001427546011100163315ustar00rootroot00000000000000elementpath-3.0.2/elementpath/__init__.py000066400000000000000000000052011427546011100204400ustar00rootroot00000000000000# # Copyright (c), 2018-2022, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT.
# # @author Davide Brunato # __version__ = '3.0.2' __author__ = "Davide Brunato" __contact__ = "brunato@sissa.it" __copyright__ = "Copyright 2018-2022, SISSA" __license__ = "MIT" __status__ = "Production/Stable" # Imports here are considered as stable API, other internal calls may change. from . import datatypes # XSD datatypes from . import etree # Safe parser and helper functions for ElementTree from . import protocols # Protocols for type annotations from .exceptions import ElementPathError, MissingContextError, ElementPathKeyError, \ ElementPathZeroDivisionError, ElementPathNameError, ElementPathOverflowError, \ ElementPathRuntimeError, ElementPathSyntaxError, ElementPathTypeError, \ ElementPathValueError, ElementPathLocaleError from .xpath_context import XPathContext, XPathSchemaContext from .xpath_nodes import XPathNode, DocumentNode, ElementNode, AttributeNode, \ NamespaceNode, CommentNode, ProcessingInstructionNode, TextNode, \ LazyElementNode, SchemaElementNode from .tree_builders import get_node_tree, build_node_tree, build_lxml_node_tree, \ build_schema_node_tree from .xpath_token import XPathToken, XPathFunction from .xpath1 import XPath1Parser from .xpath2 import XPath2Parser from .xpath_selectors import select, iter_select, Selector from .schema_proxy import AbstractSchemaProxy from .regex import RegexError, translate_pattern TypedElement = ElementNode # for backward compatibility with xmlschema<=1.10.0 __all__ = ['datatypes', 'protocols', 'etree', 'ElementPathError', 'MissingContextError', 'ElementPathKeyError', 'ElementPathZeroDivisionError', 'ElementPathNameError', 'ElementPathOverflowError', 'ElementPathRuntimeError', 'ElementPathSyntaxError', 'ElementPathTypeError', 'ElementPathValueError', 'ElementPathLocaleError', 'XPathContext', 'XPathSchemaContext', 'XPathNode', 'DocumentNode', 'ElementNode', 'AttributeNode', 'NamespaceNode', 'CommentNode', 'ProcessingInstructionNode', 'TextNode', 'LazyElementNode', 'SchemaElementNode', 'TypedElement', 
'get_node_tree', 'build_node_tree', 'build_lxml_node_tree', 'build_schema_node_tree', 'XPathToken', 'XPathFunction', 'XPath1Parser', 'XPath2Parser', 'select', 'iter_select', 'Selector', 'AbstractSchemaProxy', 'RegexError', 'translate_pattern'] elementpath-3.0.2/elementpath/datatypes/000077500000000000000000000000001427546011100203275ustar00rootroot00000000000000elementpath-3.0.2/elementpath/datatypes/__init__.py000066400000000000000000000167151427546011100224520ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ XSD atomic datatypes subpackage. Includes a class for UntypedAtomic data and classes for other XSD built-in types. This subpackage raises only built-in exceptions in order to be reusable in other packages. 
""" from decimal import Decimal from typing import Dict, Optional, Union from ..helpers import QNAME_PATTERN # For backward compatibility from ..namespaces import XSD_NAMESPACE from ..protocols import XsdTypeProtocol from .atomic_types import xsd10_atomic_types, xsd11_atomic_types, \ AtomicTypeMeta, AnyAtomicType from .untyped import UntypedAtomic from .qname import Notation, QName from .numeric import Float10, Float, Integer, Int, NegativeInteger, \ PositiveInteger, NonNegativeInteger, NonPositiveInteger, Long, \ Short, Byte, UnsignedByte, UnsignedInt, UnsignedLong, UnsignedShort from .string import NormalizedString, XsdToken, Name, NCName, NMToken, Id, \ Idref, Language, Entity from .uri import AnyURI from .binary import AbstractBinary, Base64Binary, HexBinary from .datetime import AbstractDateTime, DateTime10, DateTime, DateTimeStamp, \ Date10, Date, GregorianDay, GregorianMonth, GregorianYear, GregorianYear10, \ GregorianMonthDay, GregorianYearMonth, GregorianYearMonth10, Time, Timezone, \ Duration, DayTimeDuration, YearMonthDuration, OrderedDateTime from .proxies import BooleanProxy, DecimalProxy, DoubleProxy10, DoubleProxy, \ StringProxy, NumericProxy, ArithmeticProxy ## # Register not derived XSD primitive types as virtual subclasses of AnyAtomicType AnyAtomicType.register(BooleanProxy) AnyAtomicType.register(Base64Binary) AnyAtomicType.register(DecimalProxy) AnyAtomicType.register(StringProxy) AnyAtomicType.register(Date10) AnyAtomicType.register(DateTime10) AnyAtomicType.register(DoubleProxy10) AnyAtomicType.register(GregorianDay) AnyAtomicType.register(GregorianMonth) AnyAtomicType.register(GregorianMonthDay) AnyAtomicType.register(GregorianYear10) AnyAtomicType.register(GregorianYearMonth10) AnyAtomicType.register(HexBinary) AnyAtomicType.register(Notation) AnyAtomicType.register(QName) AnyAtomicType.register(Time) AnyAtomicType.register(UntypedAtomic) StringProxy.register(NormalizedString) xsd11_atomic_types.update( (k, v) for k, v in 
xsd10_atomic_types.items() if k not in xsd11_atomic_types ) XSD_BUILTIN_TYPES = xsd10_atomic_types DatetimeValueType = Union[OrderedDateTime, Date10, Date, DateTime10, DateTime, Time, GregorianDay, GregorianMonth, GregorianMonthDay, GregorianYear10, GregorianYear, GregorianYearMonth10, GregorianYearMonth] AtomicValueType = Union[str, int, float, Decimal, bool, Integer, Float10, NormalizedString, AnyURI, HexBinary, Base64Binary, QName, AbstractDateTime, Duration, UntypedAtomic, DatetimeValueType] ATOMIC_VALUES: Dict[Optional[str], AtomicValueType] = { f'{{{XSD_NAMESPACE}}}untypedAtomic': UntypedAtomic('1'), f'{{{XSD_NAMESPACE}}}anyType': UntypedAtomic('1'), f'{{{XSD_NAMESPACE}}}anySimpleType': UntypedAtomic('1'), f'{{{XSD_NAMESPACE}}}anyAtomicType': UntypedAtomic('1'), f'{{{XSD_NAMESPACE}}}boolean': True, f'{{{XSD_NAMESPACE}}}decimal': Decimal('1.0'), f'{{{XSD_NAMESPACE}}}double': 1.0, f'{{{XSD_NAMESPACE}}}float': Float10(1.0), f'{{{XSD_NAMESPACE}}}string': ' alpha\t', f'{{{XSD_NAMESPACE}}}date': Date.fromstring('2000-01-01'), f'{{{XSD_NAMESPACE}}}dateTime': DateTime.fromstring('2000-01-01T12:00:00'), f'{{{XSD_NAMESPACE}}}gDay': GregorianDay.fromstring('---31'), f'{{{XSD_NAMESPACE}}}gMonth': GregorianMonth.fromstring('--12'), f'{{{XSD_NAMESPACE}}}gMonthDay': GregorianMonthDay.fromstring('--12-01'), f'{{{XSD_NAMESPACE}}}gYear': GregorianYear.fromstring('1999'), f'{{{XSD_NAMESPACE}}}gYearMonth': GregorianYearMonth.fromstring('1999-09'), f'{{{XSD_NAMESPACE}}}time': Time.fromstring('09:26:54'), f'{{{XSD_NAMESPACE}}}duration': Duration.fromstring('P1MT1S'), f'{{{XSD_NAMESPACE}}}dayTimeDuration': DayTimeDuration.fromstring('P1DT1S'), f'{{{XSD_NAMESPACE}}}yearMonthDuration': YearMonthDuration.fromstring('P1Y1M'), f'{{{XSD_NAMESPACE}}}QName': QName("http://www.w3.org/2001/XMLSchema", 'xs:element'), f'{{{XSD_NAMESPACE}}}anyURI': AnyURI('https://example.com'), f'{{{XSD_NAMESPACE}}}normalizedString': NormalizedString(' alpha '), f'{{{XSD_NAMESPACE}}}token': XsdToken('a 
token'), f'{{{XSD_NAMESPACE}}}language': Language('en-US'), f'{{{XSD_NAMESPACE}}}Name': Name('_a.name::'), f'{{{XSD_NAMESPACE}}}NCName': NCName('nc-name'), f'{{{XSD_NAMESPACE}}}ID': Id('id1'), f'{{{XSD_NAMESPACE}}}IDREF': Idref('id_ref1'), f'{{{XSD_NAMESPACE}}}ENTITY': Entity('entity1'), f'{{{XSD_NAMESPACE}}}NMTOKEN': NMToken('a_token'), f'{{{XSD_NAMESPACE}}}base64Binary': Base64Binary(b'YWxwaGE='), f'{{{XSD_NAMESPACE}}}hexBinary': HexBinary(b'31'), f'{{{XSD_NAMESPACE}}}dateTimeStamp': DateTimeStamp.fromstring('2000-01-01T12:00:00+01:00'), f'{{{XSD_NAMESPACE}}}integer': Integer(1), f'{{{XSD_NAMESPACE}}}long': Long(1), f'{{{XSD_NAMESPACE}}}int': Int(1), f'{{{XSD_NAMESPACE}}}short': Short(1), f'{{{XSD_NAMESPACE}}}byte': Byte(1), f'{{{XSD_NAMESPACE}}}positiveInteger': PositiveInteger(1), f'{{{XSD_NAMESPACE}}}negativeInteger': NegativeInteger(-1), f'{{{XSD_NAMESPACE}}}nonPositiveInteger': NonPositiveInteger(0), f'{{{XSD_NAMESPACE}}}nonNegativeInteger': NonNegativeInteger(0), f'{{{XSD_NAMESPACE}}}unsignedLong': UnsignedLong(1), f'{{{XSD_NAMESPACE}}}unsignedInt': UnsignedInt(1), f'{{{XSD_NAMESPACE}}}unsignedShort': UnsignedShort(1), f'{{{XSD_NAMESPACE}}}unsignedByte': UnsignedByte(1), } def get_atomic_value(xsd_type: Optional[XsdTypeProtocol]) -> AtomicValueType: """Gets an atomic value for an XSD type instance. 
Used for schema contexts.""" if xsd_type is None: return UntypedAtomic('1') try: return ATOMIC_VALUES[xsd_type.name] except KeyError: try: return ATOMIC_VALUES[xsd_type.root_type.name] except KeyError: return UntypedAtomic('1') __all__ = ['xsd10_atomic_types', 'xsd11_atomic_types', 'get_atomic_value', 'AtomicTypeMeta', 'AnyAtomicType', 'XSD_BUILTIN_TYPES', 'NumericProxy', 'ArithmeticProxy', 'QNAME_PATTERN', 'AbstractDateTime', 'DateTime10', 'DateTime', 'DateTimeStamp', 'Date10', 'Date', 'Time', 'GregorianDay', 'GregorianMonth', 'GregorianMonthDay', 'GregorianYear10', 'GregorianYear', 'GregorianYearMonth10', 'GregorianYearMonth', 'Timezone', 'Duration', 'YearMonthDuration', 'DayTimeDuration', 'StringProxy', 'NormalizedString', 'XsdToken', 'Language', 'Name', 'NCName', 'Id', 'Idref', 'Entity', 'NMToken', 'Base64Binary', 'HexBinary', 'Float10', 'Float', 'Integer', 'NonPositiveInteger', 'NegativeInteger', 'Long', 'Int', 'Short', 'Byte', 'NonNegativeInteger', 'PositiveInteger', 'UnsignedLong', 'UnsignedInt', 'UnsignedShort', 'UnsignedByte', 'AnyURI', 'Notation', 'QName', 'BooleanProxy', 'DecimalProxy', 'DoubleProxy10', 'DoubleProxy', 'UntypedAtomic', 'AbstractBinary', 'AtomicValueType', 'DatetimeValueType', 'OrderedDateTime'] elementpath-3.0.2/elementpath/datatypes/atomic_types.py000066400000000000000000000070421427546011100234040ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from abc import ABCMeta from typing import Any, Dict, Optional, Pattern, Tuple, Type import re XSD_NAMESPACE = "http://www.w3.org/2001/XMLSchema" ### # Classes for XSD built-in atomic types. All defined classes use a # metaclass that adds some common methods and registers each class # into a dictionary. 
Some classes of XSD primitive types are defined # as proxies of basic Python datatypes. xsd10_atomic_types: Dict[Optional[str], 'AtomicTypeMeta'] = {} """Dictionary of builtin XSD 1.0 atomic types.""" xsd11_atomic_types: Dict[Optional[str], 'AtomicTypeMeta'] = {} """Dictionary of builtin XSD 1.1 atomic types.""" class AtomicTypeMeta(ABCMeta): """ Metaclass for creating XSD atomic types. The created classes are decorated with missing attributes and methods. When a name attribute is provided the class is registered into a global map of XSD atomic types and also the expanded name is added. """ xsd_version: str pattern: Pattern[str] name: Optional[str] = None def __new__(mcs, class_name: str, bases: Tuple[Type[Any], ...], dict_: Dict[str, Any]) \ -> 'AtomicTypeMeta': try: name = dict_['name'] except KeyError: name = dict_['name'] = None # do not inherit name if name is not None and not isinstance(name, str): raise TypeError("attribute 'name' must be a string or None") dict_['is_valid'] = classmethod(mcs.is_valid) dict_['invalid_type'] = classmethod(mcs.invalid_type) dict_['invalid_value'] = classmethod(mcs.invalid_value) cls = super(AtomicTypeMeta, mcs).__new__(mcs, class_name, bases, dict_) # Add missing attributes and methods if not hasattr(cls, 'xsd_version'): cls.xsd_version = '1.0' if not hasattr(cls, 'pattern'): cls.pattern = re.compile(r'^$') # Register class with a name if name: expanded_name = '{%s}%s' % (XSD_NAMESPACE, name) if cls.xsd_version == '1.0': xsd10_atomic_types[name] = xsd10_atomic_types[expanded_name] = cls else: xsd11_atomic_types[name] = xsd11_atomic_types[expanded_name] = cls return cls def validate(cls, value: object) -> None: if isinstance(value, cls): return elif isinstance(value, str): if cls.pattern.match(value) is None: raise cls.invalid_value(value) else: raise cls.invalid_type(value) def is_valid(cls, value: object) -> bool: try: cls.validate(value) except (TypeError, ValueError): return False else: return True def invalid_type(cls, 
value: object) -> TypeError: if cls.name: return TypeError('invalid type {!r} for xs:{}'.format(type(value), cls.name)) return TypeError('invalid type {!r} for {!r}'.format(type(value), cls)) def invalid_value(cls, value: object) -> ValueError: if cls.name: return ValueError('invalid value {!r} for xs:{}'.format(value, cls.name)) return ValueError('invalid value {!r} for {!r}'.format(value, cls)) class AnyAtomicType(metaclass=AtomicTypeMeta): name = 'anyAtomicType' elementpath-3.0.2/elementpath/datatypes/binary.py000066400000000000000000000120011427546011100221570ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from abc import abstractmethod from typing import Any, Callable, Union import re import codecs from ..helpers import collapse_white_spaces from .atomic_types import AtomicTypeMeta from .untyped import UntypedAtomic class AbstractBinary(metaclass=AtomicTypeMeta): """ Abstract class for xs:base64Binary data. :param value: a string or a binary data or an untyped atomic instance. 
""" value: bytes invalid_type: Callable[[Any], TypeError] def __init__(self, value: Union[str, bytes, UntypedAtomic, 'AbstractBinary']) -> None: if isinstance(value, self.__class__): self.value = value.value elif isinstance(value, AbstractBinary): self.value = self.encoder(value.decode()) else: if isinstance(value, UntypedAtomic): value = collapse_white_spaces(value.value) elif isinstance(value, str): value = collapse_white_spaces(value) elif isinstance(value, bytes): value = collapse_white_spaces(value.decode('utf-8')) else: raise self.invalid_type(value) self.validate(value) self.value = value.replace(' ', '').encode('ascii') def __repr__(self) -> str: return '%s(%r)' % (self.__class__.__name__, self.value) def __bytes__(self) -> bytes: return self.value @classmethod def validate(cls, value: object) -> None: raise NotImplementedError() @staticmethod @abstractmethod def encoder(value: bytes) -> bytes: raise NotImplementedError() @abstractmethod def decode(self) -> bytes: raise NotImplementedError() class Base64Binary(AbstractBinary): name = 'base64Binary' pattern = re.compile( r'((?:(?:[A-Za-z0-9+/] ?){4})*(?:(?:[A-Za-z0-9+/] ?){3}[A-Za-z0-9+/]|' r'(?:[A-Za-z0-9+/] ?){2}' r'[AEIMQUYcgkosw048] ?=|[A-Za-z0-9+/] ?[AQgw] ?= ?=))?' 
) @classmethod def validate(cls, value: object) -> None: if isinstance(value, cls): return elif isinstance(value, bytes): value = value.decode() elif not isinstance(value, str): raise cls.invalid_type(value) value = value.replace(' ', '') if value: match = cls.pattern.match(value) if match is None or match.group(0) != value: raise cls.invalid_value(value) def __str__(self) -> str: return self.value.decode('utf-8') def __hash__(self) -> int: return hash(self.value) def __len__(self) -> int: if self.value[-2] == ord('='): return len(self.value) // 4 * 3 - 2 elif self.value[-1] == ord('='): return len(self.value) // 4 * 3 - 1 return len(self.value) // 4 * 3 def __eq__(self, other: object) -> bool: if isinstance(other, self.__class__): return self.value == other.value elif isinstance(other, UntypedAtomic): return self.value == self.__class__(other).value elif isinstance(other, str): return self.value == other.encode() return isinstance(other, bytes) and self.value == other @staticmethod def encoder(value: bytes) -> bytes: return codecs.encode(value, 'base64').rstrip(b'\n') def decode(self) -> bytes: return codecs.decode(self.value, 'base64') class HexBinary(AbstractBinary): name = 'hexBinary' pattern = re.compile(r'^([0-9a-fA-F]{2})*$') @classmethod def validate(cls, value: object) -> None: if isinstance(value, cls): return elif isinstance(value, bytes): value = value.decode() elif not isinstance(value, str): raise cls.invalid_type(value) value = value.strip() if cls.pattern.match(value) is None: raise cls.invalid_value(value) @staticmethod def encoder(value: bytes) -> bytes: return codecs.encode(value, 'hex') def decode(self) -> bytes: return codecs.decode(self.value, 'hex') def __str__(self) -> str: return self.value.decode('utf-8').upper() def __hash__(self) -> int: return hash(self.value.upper()) def __len__(self) -> int: return len(self.value) // 2 def __eq__(self, other: object) -> bool: if isinstance(other, self.__class__): return self.value.upper() == 
other.value.upper() elif isinstance(other, UntypedAtomic): return self.value.upper() == self.__class__(other).value.upper() elif isinstance(other, str): return self.value.upper() == other.encode().upper() return isinstance(other, bytes) and self.value.upper() == other.upper() elementpath-3.0.2/elementpath/datatypes/datetime.py000066400000000000000000001153271427546011100225060ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from abc import abstractmethod import math import operator import re import datetime from calendar import isleap from decimal import Decimal, Context from typing import cast, Any, Callable, Dict, Optional, Tuple, Union from ..helpers import MONTH_DAYS_LEAP, MONTH_DAYS, DAYS_IN_4Y, \ DAYS_IN_100Y, DAYS_IN_400Y, days_from_common_era, adjust_day, \ normalized_seconds, months2days, round_number from .atomic_types import AtomicTypeMeta, AnyAtomicType from .untyped import UntypedAtomic class Timezone(datetime.tzinfo): """ A tzinfo implementation for XSD timezone offsets. Offsets must be specified between -14:00 and +14:00. :param offset: a timedelta instance (use `fromstring` to create an instance from an XSD timezone formatted string).
""" _maxoffset = datetime.timedelta(hours=14, minutes=0) _minoffset = -_maxoffset def __init__(self, offset: datetime.timedelta) -> None: super(Timezone, self).__init__() if not isinstance(offset, datetime.timedelta): raise TypeError("offset must be a datetime.timedelta") if offset < self._minoffset or offset > self._maxoffset: raise ValueError("offset must be between -14:00 and +14:00") self.offset = offset @classmethod def fromstring(cls, text: str) -> 'Timezone': try: hours, minutes = text.strip().split(':') if hours.startswith('-'): return cls(datetime.timedelta(hours=int(hours), minutes=-int(minutes))) else: return cls(datetime.timedelta(hours=int(hours), minutes=int(minutes))) except AttributeError: raise TypeError("argument is not a string") except ValueError: if text.strip() == 'Z': return cls(datetime.timedelta(0)) raise ValueError("%r: not an XSD timezone formatted string" % text) from None @classmethod def fromduration(cls, duration: 'Duration') -> 'Timezone': if duration.seconds % 60 != 0: raise ValueError("{!r} has not an integral number of minutes".format(duration)) return cls(datetime.timedelta(seconds=int(duration.seconds))) def __getinitargs__(self) -> Tuple[datetime.timedelta]: return self.offset, def __hash__(self) -> int: return hash(self.offset) def __eq__(self, other: object) -> bool: return isinstance(other, Timezone) and self.offset == other.offset def __ne__(self, other: object) -> bool: return not isinstance(other, Timezone) or self.offset != other.offset def __repr__(self) -> str: return "%s(%r)" % (self.__class__.__name__, self.offset) def __str__(self) -> str: return self.tzname(None) def utcoffset(self, dt: Optional[datetime.datetime]) -> datetime.timedelta: if not isinstance(dt, datetime.datetime) and dt is not None: raise TypeError("utcoffset() argument must be a " "datetime.datetime instance or None") return self.offset def tzname(self, dt: Optional[datetime.datetime]) -> str: if not isinstance(dt, datetime.datetime) and dt is not 
None: raise TypeError("tzname() argument must be a " "datetime.datetime instance or None") if not self.offset: return 'Z' elif self.offset < datetime.timedelta(0): sign, offset = '-', -self.offset else: sign, offset = '+', self.offset hours, minutes = offset.seconds // 3600, offset.seconds // 60 % 60 return '{}{:02d}:{:02d}'.format(sign, hours, minutes) def dst(self, dt: Optional[datetime.datetime]) -> None: if not isinstance(dt, datetime.datetime) and dt is not None: raise TypeError("dst() argument must be a " "datetime.datetime instance or None") def fromutc(self, dt: datetime.datetime) -> datetime.datetime: if isinstance(dt, datetime.datetime): return dt + self.offset raise TypeError("fromutc() argument must be a datetime.datetime instance") class AbstractDateTime(metaclass=AtomicTypeMeta): """ A class for representing XSD date/time objects. It uses an internal datetime.datetime attribute and an integer attribute for processing BCE years or for years after 9999 CE. """ xsd_version = '1.0' pattern = re.compile(r'^$') _utc_timezone = Timezone(datetime.timedelta(0)) _year = None def __init__(self, year: int = 2000, month: int = 1, day: int = 1, hour: int = 0, minute: int = 0, second: int = 0, microsecond: int = 0, tzinfo: Optional[datetime.tzinfo] = None) -> None: if hour == 24 and minute == second == microsecond == 0: delta = datetime.timedelta(days=1) hour = 0 else: delta = datetime.timedelta(0) if 1 <= year <= 9999: self._dt = datetime.datetime(year, month, day, hour, minute, second, microsecond, tzinfo) elif year == 0: raise ValueError('0 is an illegal value for year') elif not isinstance(year, int): raise TypeError("invalid type %r for year" % type(year)) elif abs(year) > 2 ** 31: raise OverflowError("year overflow") else: self._year = year if isleap(year + bool(self.xsd_version != '1.0')): self._dt = datetime.datetime(4, month, day, hour, minute, second, microsecond, tzinfo) else: self._dt = datetime.datetime(6, month, day, hour, minute, second, microsecond,
tzinfo) if delta: self._dt += delta def __repr__(self) -> str: fields = self.pattern.groupindex.keys() arg_string = ', '.join( str(getattr(self, k)) for k in ['year', 'month', 'day', 'hour', 'minute'] if k in fields ) if 'second' in fields: if self.microsecond: arg_string += ', %d.%06d' % (self.second, self.microsecond) else: arg_string += ', %d' % self.second if self.tzinfo is not None: arg_string += ', tzinfo=%r' % self.tzinfo return '%s(%s)' % (self.__class__.__name__, arg_string) @abstractmethod def __str__(self) -> str: raise NotImplementedError() @property def year(self) -> int: return self._year or self._dt.year @property def bce(self) -> bool: return self._year is not None and self._year < 0 @property def iso_year(self) -> str: """The ISO string representation of the year field.""" year = self.year if -9999 <= year < -1: return '{:05}'.format(year if self.xsd_version == '1.0' else year + 1) elif year == -1: return '-0001' if self.xsd_version == '1.0' else '0000' elif 0 <= year <= 9999: return '{:04}'.format(year) else: return str(year) @property def month(self) -> int: return self._dt.month @property def day(self) -> int: return self._dt.day @property def hour(self) -> int: return self._dt.hour @property def minute(self) -> int: return self._dt.minute @property def second(self) -> int: return self._dt.second @property def microsecond(self) -> int: return self._dt.microsecond @property def tzinfo(self) -> Optional[Timezone]: return cast(Timezone, self._dt.tzinfo) @tzinfo.setter def tzinfo(self, tz: Timezone) -> None: self._dt = self._dt.replace(tzinfo=tz) def tzname(self) -> Optional[str]: return self._dt.tzname() def astimezone(self, tz: Optional[datetime.tzinfo] = None) -> datetime.datetime: return self._dt.astimezone(tz) def isocalendar(self) -> Tuple[int, int, int]: return self._dt.isocalendar() @classmethod def fromstring(cls, datetime_string: str, tzinfo: Optional[Timezone] = None) \ -> 'AbstractDateTime': """ Creates an XSD date/time instance from a 
string formatted value. :param datetime_string: a string containing an XSD formatted date/time specification. :param tzinfo: optional implicit timezone information, must be a `Timezone` instance. :return: an AbstractDateTime concrete subclass instance. """ if not isinstance(datetime_string, str): msg = '1st argument has an invalid type {!r}' raise TypeError(msg.format(type(datetime_string))) elif tzinfo and not isinstance(tzinfo, Timezone): msg = '2nd argument has an invalid type {!r}' raise TypeError(msg.format(type(tzinfo))) match = cls.pattern.match(datetime_string.strip()) if match is None: msg = 'Invalid datetime string {!r} for {!r}' raise ValueError(msg.format(datetime_string, cls)) match_dict = match.groupdict() kwargs: Dict[str, int] = { k: int(v) for k, v in match_dict.items() if k != 'tzinfo' and v is not None } if match_dict['tzinfo'] is not None: tzinfo = Timezone.fromstring(match_dict['tzinfo']) if 'microsecond' in kwargs: microseconds = match_dict['microsecond'] if len(microseconds) != 6: microseconds += '0' * (6 - len(microseconds)) kwargs['microsecond'] = int(microseconds[:6]) if 'year' in kwargs: year_digits = match_dict['year'].lstrip('-') if year_digits.startswith('0') and len(year_digits) > 4: msg = "Invalid datetime string {!r} for {!r} (when year " \ "exceeds 4 digits leading zeroes are not allowed)" raise ValueError(msg.format(datetime_string, cls)) if cls.xsd_version == '1.0': if kwargs['year'] == 0: raise ValueError("year '0000' is an illegal value for XSD 1.0") elif kwargs['year'] <= 0: kwargs['year'] -= 1 return cls(tzinfo=tzinfo, **kwargs) @classmethod def fromdatetime(cls, dt: Union[datetime.datetime, datetime.date, datetime.time], year: Optional[int] = None) -> 'AbstractDateTime': """ Creates an XSD date/time instance from a datetime.datetime/date/time instance. :param dt: the datetime, date or time instance that stores the XSD Date/Time value. 
:param year: if a year is provided the created instance refers to it and the \ possibly present *dt.year* part is ignored. :return: an AbstractDateTime concrete subclass instance. """ if not isinstance(dt, (datetime.datetime, datetime.date, datetime.time)): raise TypeError('1st argument has an invalid type %r' % type(dt)) elif year is not None and not isinstance(year, int): raise TypeError('2nd argument has an invalid type %r' % type(year)) kwargs = {k: getattr(dt, k) for k in cls.pattern.groupindex.keys() if hasattr(dt, k)} if year is not None: kwargs['year'] = year return cls(**kwargs) # Python can't compare offset-naive and offset-aware datetimes def _get_operands(self, other: object) -> Tuple[datetime.datetime, datetime.datetime]: if isinstance(other, (self.__class__, datetime.datetime)) or \ isinstance(self, other.__class__): dt: datetime.datetime = getattr(other, '_dt', cast(datetime.datetime, other)) if self._dt.tzinfo is dt.tzinfo: return self._dt, dt elif self.tzinfo is None: return self._dt.replace(tzinfo=self._utc_timezone), dt elif dt.tzinfo is None: return self._dt, dt.replace(tzinfo=self._utc_timezone) else: return self._dt, dt else: raise TypeError("wrong type %r for operand %r" % (type(other), other)) def __hash__(self) -> int: return hash((self._dt, self._year)) def __eq__(self, other: object) -> bool: if not isinstance(other, (AbstractDateTime, datetime.datetime)): return False try: return operator.eq(*self._get_operands(other)) and self.year == other.year except TypeError: return False def __ne__(self, other: object) -> bool: if not isinstance(other, (AbstractDateTime, datetime.datetime)): return True try: return operator.ne(*self._get_operands(other)) or self.year != other.year except TypeError: return True class OrderedDateTime(AbstractDateTime): @abstractmethod def __str__(self) -> str: raise NotImplementedError() @classmethod def fromdelta(cls, delta: datetime.timedelta, adjust_timezone: bool = False) \ -> 'OrderedDateTime': """ Creates an 
XSD dateTime/date instance from a datetime.timedelta related to 0001-01-01T00:00:00 CE. In case of a date the time part is not counted. :param delta: a datetime.timedelta instance. :param adjust_timezone: if `True` adjusts the timezone of Date objects \ with eventually present hours and minutes. """ try: dt = datetime.datetime(1, 1, 1) + delta except OverflowError: days = delta.days if days > 0: y400, days = divmod(days, DAYS_IN_400Y) y100, days = divmod(days, DAYS_IN_100Y) y4, days = divmod(days, DAYS_IN_4Y) y1, days = divmod(days, 365) year = y400 * 400 + y100 * 100 + y4 * 4 + y1 + 1 if y1 == 4 or y100 == 4: year -= 1 days = 365 td = datetime.timedelta(days=days, seconds=delta.seconds, microseconds=delta.microseconds) dt = datetime.datetime(4 if isleap(year) else 6, 1, 1) + td elif days >= -366: year = -1 td = datetime.timedelta(days=days, seconds=delta.seconds, microseconds=delta.microseconds) dt = datetime.datetime(5, 1, 1) + td else: days = -days - 366 y400, days = divmod(days, DAYS_IN_400Y) y100, days = divmod(days, DAYS_IN_100Y) y4, days = divmod(days, DAYS_IN_4Y) y1, days = divmod(days, 365) year = -y400 * 400 - y100 * 100 - y4 * 4 - y1 - 2 if y1 == 4 or y100 == 4: year += 1 days = 365 td = datetime.timedelta(days=-days, seconds=delta.seconds, microseconds=delta.microseconds) if not td: dt = datetime.datetime(4 if isleap(year + 1) else 6, 1, 1) year += 1 else: dt = datetime.datetime(5 if isleap(year + 1) else 7, 1, 1) + td else: year = dt.year if issubclass(cls, Date10): if adjust_timezone and (dt.hour or dt.minute): assert dt.tzinfo is None hour, minute = dt.hour, dt.minute if hour < 14 or hour == 14 and minute == 0: tz = Timezone(datetime.timedelta(hours=-hour, minutes=-minute)) dt = dt.replace(tzinfo=tz) else: tz = Timezone(datetime.timedelta(hours=-dt.hour + 24, minutes=-minute)) dt = dt.replace(tzinfo=tz) dt += datetime.timedelta(days=1) return cls(year, dt.month, dt.day, tzinfo=dt.tzinfo) return cls(year, dt.month, dt.day, dt.hour, dt.minute, 
dt.second, dt.microsecond, dt.tzinfo) def todelta(self) -> datetime.timedelta: """Returns the datetime.timedelta from 0001-01-01T00:00:00 CE.""" if self._year is None: delta = operator.sub(*self._get_operands(datetime.datetime(1, 1, 1))) return cast(datetime.timedelta, delta) year, dt = self.year, self._dt tzinfo = None if dt.tzinfo is None else self._utc_timezone if year > 0: m_days = MONTH_DAYS_LEAP if isleap(year) else MONTH_DAYS days = days_from_common_era(year - 1) + sum(m_days[m] for m in range(1, dt.month)) else: m_days = MONTH_DAYS_LEAP if isleap(year + 1) else MONTH_DAYS days = days_from_common_era(year) + sum(m_days[m] for m in range(1, dt.month)) delta = (dt - datetime.datetime(dt.year, dt.month, day=1, tzinfo=tzinfo)) return datetime.timedelta(days=days, seconds=delta.total_seconds()) def _date_operator(self, op: Callable[[Any, Any], Any], other: object) \ -> Union['DayTimeDuration', 'OrderedDateTime']: if isinstance(other, self.__class__): dt1, dt2 = self._get_operands(other) if self._year is None and other._year is None: return DayTimeDuration.fromtimedelta(dt1 - dt2) return DayTimeDuration.fromtimedelta(self.todelta() - other.todelta()) elif isinstance(other, datetime.timedelta): delta = op(self.todelta(), other) return type(self).fromdelta(delta, adjust_timezone=True) elif isinstance(other, DayTimeDuration): delta = op(self.todelta(), other.get_timedelta()) tzinfo = cast(Optional[Timezone], self._dt.tzinfo) if tzinfo is None: return type(self).fromdelta(delta) value = type(self).fromdelta(delta + tzinfo.offset) value.tzinfo = tzinfo return value elif isinstance(other, YearMonthDuration): month = op(self._dt.month - 1, other.months) % 12 + 1 year = self.year + op(self._dt.month - 1, other.months) // 12 day = adjust_day(year, month, self._dt.day) if year > 0: dt = self._dt.replace(year=year, month=month, day=day) elif isleap(year): dt = self._dt.replace(year=4, month=month, day=day) else: dt = self._dt.replace(year=6, month=month, day=day) kwargs = 
{k: getattr(dt, k) for k in self.pattern.groupindex.keys()} if year <= 0: kwargs['year'] = year return type(self)(**kwargs) else: raise TypeError("wrong type %r for operand %r" % (type(other), other)) def __lt__(self, other: object) -> bool: if not isinstance(other, (AbstractDateTime, datetime.datetime)): return NotImplemented dt1, dt2 = self._get_operands(other) y1, y2 = self.year, other.year return y1 < y2 or y1 == y2 and dt1 < dt2 def __le__(self, other: object) -> bool: if not isinstance(other, (AbstractDateTime, datetime.datetime)): return NotImplemented dt1, dt2 = self._get_operands(other) y1, y2 = self.year, other.year return y1 < y2 or y1 == y2 and dt1 <= dt2 def __gt__(self, other: object) -> bool: if not isinstance(other, (AbstractDateTime, datetime.datetime)): return NotImplemented dt1, dt2 = self._get_operands(other) y1, y2 = self.year, other.year return y1 > y2 or y1 == y2 and dt1 > dt2 def __ge__(self, other: object) -> bool: if not isinstance(other, (AbstractDateTime, datetime.datetime)): return NotImplemented dt1, dt2 = self._get_operands(other) y1, y2 = self.year, other.year return y1 > y2 or y1 == y2 and dt1 >= dt2 def __add__(self, other: object) -> Union['DayTimeDuration', 'OrderedDateTime']: if isinstance(other, OrderedDateTime): raise TypeError("wrong type %r for operand %r" % (type(other), other)) return self._date_operator(operator.add, other) def __sub__(self, other: object) -> Union['DayTimeDuration', 'OrderedDateTime']: return self._date_operator(operator.sub, other) class DateTime10(OrderedDateTime): """XSD 1.0 xs:dateTime builtin type""" name = 'dateTime' pattern = re.compile( r'^(?P<year>-?[0-9]*[0-9]{4})-(?P<month>[0-9]{2})-(?P<day>[0-9]{2})' r'(T(?P<hour>[0-9]{2}):(?P<minute>[0-9]{2}):' r'(?P<second>[0-9]{2})(?:\.(?P<microsecond>[0-9]+))?)' r'(?P<tzinfo>Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?$') def __init__(self, year: int, month: int, day: int, hour: int = 0, minute: int = 0, second: int = 0, microsecond: int = 0, tzinfo: Optional[datetime.tzinfo] = None) -> None: super(DateTime10, 
self).__init__( year, month, day, hour, minute, second, microsecond, tzinfo ) def __str__(self) -> str: if self.microsecond: return '{}-{:02}-{:02}T{:02}:{:02}:{:02}.{}{}'.format( self.iso_year, self.month, self.day, self.hour, self.minute, self.second, '{:06}'.format(self.microsecond).rstrip('0'), str(self.tzinfo or '') ) return '{}-{:02}-{:02}T{:02}:{:02}:{:02}{}'.format( self.iso_year, self.month, self.day, self.hour, self.minute, self.second, str(self.tzinfo or '') ) class DateTime(DateTime10): """XSD 1.1 xs:dateTime builtin type""" name = 'dateTime' xsd_version = '1.1' class DateTimeStamp(DateTime): """XSD 1.1 xs:dateTimeStamp builtin type""" name = 'dateTimeStamp' pattern = re.compile( r'^(?P<year>-?[0-9]*[0-9]{4})-(?P<month>[0-9]{2})-(?P<day>[0-9]{2})' r'(T(?P<hour>[0-9]{2}):(?P<minute>[0-9]{2}):' r'(?P<second>[0-9]{2})(?:\.(?P<microsecond>[0-9]+))?)' r'(?P<tzinfo>Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))$') class Date10(OrderedDateTime): """XSD 1.0 xs:date builtin type""" name = 'date' pattern = re.compile(r'^(?P<year>-?[0-9]*[0-9]{4})-(?P<month>[0-9]{2})-(?P<day>[0-9]{2})' r'(?P<tzinfo>Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?$') def __init__(self, year: int, month: int, day: int, tzinfo: Optional[datetime.tzinfo] = None) -> None: super(Date10, self).__init__(year, month, day, tzinfo=tzinfo) def __str__(self) -> str: return '{}-{:02}-{:02}{}'.format( self.iso_year, self.month, self.day, str(self.tzinfo or '') ) class Date(Date10): """XSD 1.1 xs:date builtin type""" name = 'date' xsd_version = '1.1' class GregorianDay(OrderedDateTime): """XSD xs:gDay builtin type""" name = 'gDay' pattern = re.compile(r'^---(?P<day>[0-9]{2})' r'(?P<tzinfo>Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?$') def __init__(self, day: int, tzinfo: Optional[Timezone] = None) -> None: super(GregorianDay, self).__init__(day=day, tzinfo=tzinfo) def __str__(self) -> str: return '---{:02}{}'.format(self.day, str(self.tzinfo or '')) class GregorianMonth(OrderedDateTime): """XSD xs:gMonth builtin type""" name = 'gMonth' pattern = re.compile(r'^--(?P<month>[0-9]{2})' 
r'(?P<tzinfo>Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?$') def __init__(self, month: int, tzinfo: Optional[Timezone] = None) -> None: super(GregorianMonth, self).__init__(month=month, tzinfo=tzinfo) def __str__(self) -> str: return '--{:02}{}'.format(self.month, str(self.tzinfo or '')) class GregorianMonthDay(OrderedDateTime): """XSD xs:gMonthDay builtin type""" name = 'gMonthDay' pattern = re.compile(r'^--(?P<month>[0-9]{2})-(?P<day>[0-9]{2})' r'(?P<tzinfo>Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?$') def __init__(self, month: int, day: int, tzinfo: Optional[Timezone] = None) -> None: super(GregorianMonthDay, self).__init__(month=month, day=day, tzinfo=tzinfo) def __str__(self) -> str: return '--{:02}-{:02}{}'.format(self.month, self.day, str(self.tzinfo or '')) class GregorianYear10(OrderedDateTime): """XSD 1.0 xs:gYear builtin type""" name = 'gYear' pattern = re.compile(r'^(?P<year>-?[0-9]*[0-9]{4})' r'(?P<tzinfo>Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?$') def __init__(self, year: int, tzinfo: Optional[Timezone] = None) -> None: super(GregorianYear10, self).__init__(year, tzinfo=tzinfo) def __str__(self) -> str: return '{}{}'.format(self.iso_year, str(self.tzinfo or '')) class GregorianYear(GregorianYear10): """XSD 1.1 xs:gYear builtin type""" name = 'gYear' xsd_version = '1.1' class GregorianYearMonth10(OrderedDateTime): """XSD 1.0 xs:gYearMonth builtin type""" name = 'gYearMonth' pattern = re.compile(r'^(?P<year>-?[0-9]*[0-9]{4})-(?P<month>[0-9]{2})' r'(?P<tzinfo>Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?$') def __init__(self, year: int, month: int, tzinfo: Optional[Timezone] = None) -> None: super(GregorianYearMonth10, self).__init__(year, month, tzinfo=tzinfo) def __str__(self) -> str: return '{}-{:02}{}'.format(self.iso_year, self.month, str(self.tzinfo or '')) class GregorianYearMonth(GregorianYearMonth10): """XSD 1.1 xs:gYearMonth builtin type""" name = 'gYearMonth' xsd_version = '1.1' class Time(AbstractDateTime): """XSD xs:time builtin type""" name = 'time' pattern = re.compile( 
r'^(?P<hour>[0-9]{2}):(?P<minute>[0-9]{2}):' r'(?P<second>[0-9]{2})(?:\.(?P<microsecond>[0-9]+))?' r'(?P<tzinfo>Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?$') def __init__(self, hour: int = 0, minute: int = 0, second: int = 0, microsecond: int = 0, tzinfo: Union[None, Timezone, datetime.tzinfo] = None) -> None: if hour == 24 and minute == second == microsecond == 0: hour = 0 super(Time, self).__init__( hour=hour, minute=minute, second=second, microsecond=microsecond, tzinfo=tzinfo ) def __str__(self) -> str: if self.microsecond: return '{:02}:{:02}:{:02}.{}{}'.format( self.hour, self.minute, self.second, '{:06}'.format(self.microsecond).rstrip('0'), str(self.tzinfo or '') ) return '{:02}:{:02}:{:02}{}'.format( self.hour, self.minute, self.second, str(self.tzinfo or '') ) def __lt__(self, other: object) -> bool: return cast(bool, operator.lt(*self._get_operands(other))) def __le__(self, other: object) -> bool: return cast(bool, operator.le(*self._get_operands(other))) def __gt__(self, other: object) -> bool: return cast(bool, operator.gt(*self._get_operands(other))) def __ge__(self, other: object) -> bool: return cast(bool, operator.ge(*self._get_operands(other))) def __add__(self, other: object) -> 'Time': if isinstance(other, DayTimeDuration): dt = self._dt + other.get_timedelta() elif isinstance(other, datetime.timedelta): dt = self._dt + other else: raise TypeError("wrong type %r for operand %r" % (type(other), other)) return Time(dt.hour, dt.minute, dt.second, dt.microsecond, dt.tzinfo) def __sub__(self, other: object) -> Union['DayTimeDuration', 'Time']: if isinstance(other, self.__class__): delta = operator.sub(*self._get_operands(other)) return DayTimeDuration.fromtimedelta(delta) elif isinstance(other, DayTimeDuration): dt = self._dt - other.get_timedelta() return Time(dt.hour, dt.minute, dt.second, dt.microsecond, dt.tzinfo) elif isinstance(other, datetime.timedelta): dt = self._dt - other return Time(dt.hour, dt.minute, dt.second, dt.microsecond, dt.tzinfo) else: raise TypeError("wrong type %r 
for operand %r" % (type(other), other)) class Duration(AnyAtomicType): """ Base class for the XSD duration types. :param months: an integer value that represents years and months. :param seconds: a decimal or an integer instance that represents \ days, hours, minutes, seconds and fractions of seconds. """ name = 'duration' pattern = re.compile( r'^(-)?P(?=[0-9]|T)(?:([0-9]+)Y)?(?:([0-9]+)M)?(?:([0-9]+)D)?' r'(?:T(?=[0-9])(?:([0-9]+)H)?(?:([0-9]+)M)?(?:([0-9]+(?:\.[0-9]+)?)S)?)?$' ) def __init__(self, months: int = 0, seconds: Union[Decimal, int] = 0) -> None: if seconds < 0 < months or months < 0 < seconds: raise ValueError('signs differ: (months=%d, seconds=%d)' % (months, seconds)) elif abs(months) > 2 ** 31: raise OverflowError("months duration overflow") elif abs(seconds) > 2 ** 63: # type: ignore[operator] raise OverflowError("seconds duration overflow") self.months = months self.seconds = Decimal(seconds).quantize(Decimal('1.000000', context=Context(prec=30))) def __repr__(self) -> str: return '{}(months={!r}, seconds={})'.format( self.__class__.__name__, self.months, normalized_seconds(self.seconds) ) def __str__(self) -> str: m = abs(self.months) years, months = m // 12, m % 12 s = self.seconds.copy_abs() days = int(s // 86400) hours = int(s // 3600 % 24) minutes = int(s // 60 % 60) seconds = s % 60 value = '-P' if self.sign else 'P' if years or months or days: if years: value += '%dY' % years if months: value += '%dM' % months if days: value += '%dD' % days if hours or minutes or seconds: value += 'T' if hours: value += '%dH' % hours if minutes: value += '%dM' % minutes if seconds: value += '%sS' % normalized_seconds(seconds) elif value[-1] == 'P': value += 'T0S' return value @classmethod def fromstring(cls, text: str) -> 'Duration': """ Creates a Duration instance from a formatted XSD duration string. :param text: an ISO 8601 representation without week fragment and an optional decimal part \ only for seconds fragment. 
""" if not isinstance(text, str): msg = 'argument has an invalid type {!r}' raise TypeError(msg.format(type(text))) match = cls.pattern.match(text.strip()) if match is None: raise ValueError('%r is not an xs:duration value' % text) sign, y, mo, d, h, mi, s = match.groups() seconds = Decimal(s or 0) minutes = int(mi or 0) + int(seconds // 60) seconds = seconds % 60 hours = int(h or 0) + minutes // 60 minutes = minutes % 60 days = int(d or 0) + hours // 24 hours = hours % 24 months = int(mo or 0) + 12 * int(y or 0) if sign is None: seconds = seconds + (days * 24 + hours) * 3600 + minutes * 60 else: months = -months seconds = -seconds - (days * 24 + hours) * 3600 - minutes * 60 if cls is DayTimeDuration: if months: raise ValueError('months must be 0 for %r' % cls.__name__) return cls(seconds=seconds) elif cls is YearMonthDuration: if seconds: raise ValueError('seconds must be 0 for %r' % cls.__name__) return cls(months=months) return cls(months=months, seconds=seconds) @property def sign(self) -> str: return '-' if self.months < 0 or self.seconds < 0 else '' def _compare_durations(self, other: object, op: Callable[[Any, Any], Any]) -> bool: """ Ordering is defined through comparison of four datetime.datetime values. 
Ref: https://www.w3.org/TR/2012/REC-xmlschema11-2-20120405/#duration """ if not isinstance(other, self.__class__): raise TypeError("wrong type %r for operand %r" % (type(other), other)) m1, s1 = self.months, int(self.seconds) m2, s2 = other.months, int(other.seconds) ms1, ms2 = int((self.seconds - s1) * 1000000), int((other.seconds - s2) * 1000000) return all([ op(datetime.timedelta(months2days(1696, 9, m1), s1, ms1), datetime.timedelta(months2days(1696, 9, m2), s2, ms2)), op(datetime.timedelta(months2days(1697, 2, m1), s1, ms1), datetime.timedelta(months2days(1697, 2, m2), s2, ms2)), op(datetime.timedelta(months2days(1903, 3, m1), s1, ms1), datetime.timedelta(months2days(1903, 3, m2), s2, ms2)), op(datetime.timedelta(months2days(1903, 7, m1), s1, ms1), datetime.timedelta(months2days(1903, 7, m2), s2, ms2)), ]) def __hash__(self) -> int: return hash((self.months, self.seconds)) def __eq__(self, other: object) -> bool: if isinstance(other, self.__class__): return self.months == other.months and self.seconds == other.seconds elif isinstance(other, UntypedAtomic): return self.__eq__(self.fromstring(other.value)) else: return other == (self.months, self.seconds) def __ne__(self, other: object) -> bool: if isinstance(other, self.__class__): return self.months != other.months or self.seconds != other.seconds elif isinstance(other, UntypedAtomic): return self.__ne__(self.fromstring(other.value)) else: return other != (self.months, self.seconds) def __lt__(self, other: object) -> bool: return self._compare_durations(other, operator.lt) def __le__(self, other: object) -> bool: return self == other or self._compare_durations(other, operator.le) def __gt__(self, other: object) -> bool: return self._compare_durations(other, operator.gt) def __ge__(self, other: object) -> bool: return self == other or self._compare_durations(other, operator.ge) class YearMonthDuration(Duration): name = 'yearMonthDuration' def __init__(self, months: int = 0) -> None: super(YearMonthDuration, 
self).__init__(months, 0) def __repr__(self) -> str: return '%s(months=%r)' % (self.__class__.__name__, self.months) def __str__(self) -> str: m = abs(self.months) years, months = m // 12, m % 12 if not years: return '-P%dM' % months if self.months < 0 else 'P%dM' % months elif not months: return '-P%dY' % years if self.months < 0 else 'P%dY' % years elif self.months < 0: return '-P%dY%dM' % (years, months) else: return 'P%dY%dM' % (years, months) def __add__(self, other: object) \ -> Union['YearMonthDuration', 'DayTimeDuration', 'OrderedDateTime']: if isinstance(other, self.__class__): return YearMonthDuration(months=self.months + other.months) elif isinstance(other, (DateTime10, Date10)): return other + self raise TypeError("cannot add %r to %r" % (type(other), type(self))) def __sub__(self, other: object) -> 'YearMonthDuration': if not isinstance(other, self.__class__): raise TypeError("cannot subtract %r from %r" % (type(other), type(self))) return YearMonthDuration(months=self.months - other.months) def __mul__(self, other: object) -> 'YearMonthDuration': if not isinstance(other, (float, int, Decimal)): raise TypeError("cannot multiply a %r by %r" % (type(self), type(other))) return YearMonthDuration(months=int(round_number(self.months * other))) def __truediv__(self, other: object) -> Union[float, 'YearMonthDuration']: if isinstance(other, self.__class__): return self.months / other.months elif isinstance(other, (float, int, Decimal)): return YearMonthDuration(months=int(round_number(self.months / other))) else: raise TypeError("cannot divide a %r by %r" % (type(self), type(other))) class DayTimeDuration(Duration): name = 'dayTimeDuration' def __init__(self, seconds: Union[Decimal, int] = 0) -> None: super(DayTimeDuration, self).__init__(0, seconds) @classmethod def fromtimedelta(cls, td: datetime.timedelta) -> 'DayTimeDuration': return cls(seconds=Decimal( '{}.{:06}'.format(td.days * 86400 + td.seconds, td.microseconds) )) def get_timedelta(self) -> 
datetime.timedelta: return datetime.timedelta( seconds=int(self.seconds), microseconds=int(self.seconds % 1 * 1000000) ) def __repr__(self) -> str: return '%s(seconds=%s)' % (self.__class__.__name__, normalized_seconds(self.seconds)) def __add__(self, other: object) -> Union['DayTimeDuration', Time, OrderedDateTime]: if isinstance(other, (Time, Date10)): return other + self elif isinstance(other, self.__class__): return DayTimeDuration(self.seconds + other.seconds) raise TypeError("cannot add %r to %r" % (type(other), type(self))) def __sub__(self, other: object) -> 'DayTimeDuration': if not isinstance(other, self.__class__): raise TypeError("cannot subtract %r from %r" % (type(other), type(self))) return DayTimeDuration(seconds=self.seconds - other.seconds) def __mul__(self, other: object) -> 'DayTimeDuration': if isinstance(other, (float, int, Decimal)): if math.isnan(other): raise ValueError("cannot multiply a %r by NaN" % type(self)) if isinstance(other, (int, Decimal)): seconds = self.seconds * other else: seconds = self.seconds * Decimal.from_float(other) return DayTimeDuration(seconds) else: raise TypeError("cannot multiply a %r by %r" % (type(self), type(other))) def __truediv__(self, other: object) -> Union[Decimal, 'DayTimeDuration']: if isinstance(other, self.__class__): return self.seconds / other.seconds elif isinstance(other, (float, int, Decimal)): if math.isnan(other): raise ValueError("cannot divide a %r by NaN" % type(self)) if isinstance(other, (int, Decimal)): seconds = self.seconds / other else: seconds = self.seconds / Decimal.from_float(other) return DayTimeDuration(seconds) else: raise TypeError("cannot divide a %r by %r" % (type(self), type(other))) elementpath-3.0.2/elementpath/datatypes/numeric.py000066400000000000000000000176311427546011100223530ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. 
# See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import re import math from typing import Any, Optional, SupportsFloat, SupportsInt, Union, Type from ..helpers import NUMERIC_INF_OR_NAN, INVALID_NUMERIC, collapse_white_spaces from .atomic_types import AtomicTypeMeta, AnyAtomicType class Float10(float, AnyAtomicType): name = 'float' xsd_version = '1.0' pattern = re.compile( r'^(?:[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[Ee][+-]?[0-9]+)?|[+-]?INF|NaN)$' ) def __new__(cls, value: Union[str, SupportsFloat]) -> 'Float10': if isinstance(value, str): value = collapse_white_spaces(value) if value in NUMERIC_INF_OR_NAN or cls.xsd_version != '1.0' and value == '+INF': pass elif value.lower() in INVALID_NUMERIC: raise ValueError('invalid value {!r} for xs:{}'.format(value, cls.name)) _value = super().__new__(cls, value) if _value > 3.4028235E38: return super().__new__(cls, 'INF') elif _value < -3.4028235E38: return super().__new__(cls, '-INF') elif -1e-37 < _value < 1e-37: return super().__new__(cls, -0.0 if str(_value).startswith('-') else 0.0) return _value def __hash__(self) -> int: return super(Float10, self).__hash__() def __eq__(self, other: object) -> bool: if isinstance(other, self.__class__): if super(Float10, self).__eq__(other): return True return math.isclose(self, other, rel_tol=1e-7, abs_tol=0.0) return super(Float10, self).__eq__(other) def __ne__(self, other: object) -> bool: if isinstance(other, self.__class__): if super(Float10, self).__eq__(other): return False return not math.isclose(self, other, rel_tol=1e-7, abs_tol=0.0) return super(Float10, self).__ne__(other) def __add__(self, other: object) -> Union[float, 'Float10', 'Float']: if isinstance(other, (self.__class__, int)): return self.__class__(super(Float10, self).__add__(other)) elif isinstance(other, float): return super(Float10, self).__add__(other) return NotImplemented def __radd__(self, other: 
object) -> Union[float, 'Float10', 'Float']: if isinstance(other, (self.__class__, int)): return self.__class__(super(Float10, self).__radd__(other)) elif isinstance(other, float): return super(Float10, self).__radd__(other) return NotImplemented def __sub__(self, other: object) -> Union[float, 'Float10', 'Float']: if isinstance(other, (self.__class__, int)): return self.__class__(super(Float10, self).__sub__(other)) elif isinstance(other, float): return super(Float10, self).__sub__(other) return NotImplemented def __rsub__(self, other: object) -> Union[float, 'Float10', 'Float']: if isinstance(other, (self.__class__, int)): return self.__class__(super(Float10, self).__rsub__(other)) elif isinstance(other, float): return super(Float10, self).__rsub__(other) return NotImplemented def __mul__(self, other: object) -> Union[float, 'Float10', 'Float']: if isinstance(other, (self.__class__, int)): return self.__class__(super(Float10, self).__mul__(other)) elif isinstance(other, float): return super(Float10, self).__mul__(other) return NotImplemented def __rmul__(self, other: object) -> Union[float, 'Float10', 'Float']: if isinstance(other, (self.__class__, int)): return self.__class__(super(Float10, self).__rmul__(other)) elif isinstance(other, float): return super(Float10, self).__rmul__(other) return NotImplemented def __truediv__(self, other: object) -> Union[float, 'Float10', 'Float']: if isinstance(other, (self.__class__, int)): return self.__class__(super(Float10, self).__truediv__(other)) elif isinstance(other, float): return super(Float10, self).__truediv__(other) return NotImplemented def __rtruediv__(self, other: object) -> Union[float, 'Float10', 'Float']: if isinstance(other, (self.__class__, int)): return self.__class__(super(Float10, self).__rtruediv__(other)) elif isinstance(other, float): return super(Float10, self).__rtruediv__(other) return NotImplemented def __mod__(self, other: object) -> Union[float, 'Float10', 'Float']: if isinstance(other, 
(self.__class__, int)): return self.__class__(super(Float10, self).__mod__(other)) elif isinstance(other, float): return super(Float10, self).__mod__(other) return NotImplemented def __rmod__(self, other: object) -> Union[float, 'Float10', 'Float']: if isinstance(other, (self.__class__, int)): return self.__class__(super(Float10, self).__rmod__(other)) elif isinstance(other, float): return super(Float10, self).__rmod__(other) return NotImplemented def __abs__(self) -> Union['Float10', 'Float']: return self.__class__(super(Float10, self).__abs__()) class Float(Float10): name = 'float' xsd_version = '1.1' class Integer(int, metaclass=AtomicTypeMeta): """A wrapper for emulating xs:integer and limited integer types.""" name = 'integer' pattern = re.compile(r'^[\-+]?[0-9]+$') lower_bound: Optional[int] = None higher_bound: Optional[int] = None def __init__(self, value: Union[str, SupportsInt]) -> None: if self.lower_bound is not None and self < self.lower_bound: raise ValueError("value {} is too low for {!r}".format(value, self.__class__)) elif self.higher_bound is not None and self >= self.higher_bound: raise ValueError("value {} is too high for {!r}".format(value, self.__class__)) super(Integer, self).__init__() @classmethod def __subclasshook__(cls, subclass: Type[Any]) -> bool: if cls is Integer: return issubclass(subclass, int) and not issubclass(subclass, bool) return NotImplemented @classmethod def validate(cls, value: object) -> None: if isinstance(value, cls): return elif isinstance(value, str): if cls.pattern.match(value) is None: raise cls.invalid_value(value) else: raise cls.invalid_type(value) class NonPositiveInteger(Integer): name = 'nonPositiveInteger' lower_bound, higher_bound = None, 1 class NegativeInteger(NonPositiveInteger): name = 'negativeInteger' lower_bound, higher_bound = None, 0 class Long(Integer): name = 'long' lower_bound, higher_bound = -2**63, 2**63 class Int(Long): name = 'int' lower_bound, higher_bound = -2**31, 2**31 class Short(Int): 
name = 'short' lower_bound, higher_bound = -2**15, 2**15 class Byte(Short): name = 'byte' lower_bound, higher_bound = -2**7, 2**7 class NonNegativeInteger(Integer): name = 'nonNegativeInteger' lower_bound = 0 higher_bound: Optional[int] = None class PositiveInteger(NonNegativeInteger): name = 'positiveInteger' lower_bound, higher_bound = 1, None class UnsignedLong(NonNegativeInteger): name = 'unsignedLong' lower_bound, higher_bound = 0, 2**64 class UnsignedInt(UnsignedLong): name = 'unsignedInt' lower_bound, higher_bound = 0, 2**32 class UnsignedShort(UnsignedInt): name = 'unsignedShort' lower_bound, higher_bound = 0, 2**16 class UnsignedByte(UnsignedShort): name = 'unsignedByte' lower_bound, higher_bound = 0, 2**8 elementpath-3.0.2/elementpath/datatypes/proxies.py000066400000000000000000000155611427546011100224020ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import re import math from decimal import Decimal from typing import Any, Union, SupportsFloat from ..helpers import BOOLEAN_VALUES, NUMERIC_INF_OR_NAN, INVALID_NUMERIC, \ collapse_white_spaces from .atomic_types import AtomicTypeMeta from .untyped import UntypedAtomic from .numeric import Float10, Integer from .datetime import AbstractDateTime, Duration FloatArgType = Union[SupportsFloat, str, bytes] #### # Type proxies for basic Python datatypes: a proxy class creates # and validates its Python datatype and virtual registered types. 
class BooleanProxy(metaclass=AtomicTypeMeta): name = 'boolean' pattern = re.compile(r'^(?:true|false|1|0)$') def __new__(cls, value: object) -> bool: # type: ignore[misc] if isinstance(value, bool): return value elif isinstance(value, (int, float, Decimal)): if math.isnan(value): return False return bool(value) elif isinstance(value, UntypedAtomic): value = value.value elif not isinstance(value, str): raise TypeError('invalid type {!r} for xs:{}'.format(type(value), cls.name)) if value.strip() not in BOOLEAN_VALUES: raise ValueError('invalid value {!r} for xs:{}'.format(value, cls.name)) return 't' in value or '1' in value @classmethod def __subclasshook__(cls, subclass: type) -> bool: return issubclass(subclass, bool) @classmethod def validate(cls, value: object) -> None: if isinstance(value, bool): return elif isinstance(value, str): if cls.pattern.match(value) is None: raise cls.invalid_value(value) else: raise cls.invalid_type(value) class DecimalProxy(metaclass=AtomicTypeMeta): name = 'decimal' pattern = re.compile(r'^[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)$') def __new__(cls, value: Any) -> Decimal: # type: ignore[misc] if isinstance(value, (str, UntypedAtomic)): value = collapse_white_spaces(str(value)).replace(' ', '') if cls.pattern.match(value) is None: raise cls.invalid_value(value) elif isinstance(value, (float, Float10, Decimal)): if math.isinf(value) or math.isnan(value): raise cls.invalid_value(value) try: return Decimal(value) except (ValueError, ArithmeticError): msg = 'invalid value {!r} for xs:{}' raise ArithmeticError(msg.format(value, cls.name)) from None @classmethod def __subclasshook__(cls, subclass: type) -> bool: return issubclass(subclass, (int, Decimal, Integer)) and not issubclass(subclass, bool) @classmethod def validate(cls, value: object) -> None: if isinstance(value, Decimal): if math.isnan(value) or math.isinf(value): raise cls.invalid_value(value) elif isinstance(value, (int, Integer)) and not isinstance(value, bool): return elif 
isinstance(value, str): if cls.pattern.match(value) is None: raise cls.invalid_value(value) else: raise cls.invalid_type(value) class DoubleProxy10(metaclass=AtomicTypeMeta): name = 'double' xsd_version = '1.0' pattern = re.compile( r'^(?:[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[Ee][+-]?[0-9]+)?|[+-]?INF|NaN)$' ) def __new__(cls, value: Union[SupportsFloat, str]) -> float: # type: ignore[misc] if isinstance(value, str): value = collapse_white_spaces(value) if value in NUMERIC_INF_OR_NAN or cls.xsd_version != '1.0' and value == '+INF': pass elif value.lower() in INVALID_NUMERIC: raise ValueError('invalid value {!r} for xs:{}'.format(value, cls.name)) return float(value) @classmethod def __subclasshook__(cls, subclass: type) -> bool: return issubclass(subclass, float) and not issubclass(subclass, Float10) @classmethod def validate(cls, value: object) -> None: if isinstance(value, float) and not isinstance(value, Float10): return elif isinstance(value, str): if cls.pattern.match(value) is None: raise cls.invalid_value(value) else: raise cls.invalid_type(value) class DoubleProxy(DoubleProxy10): name = 'double' xsd_version = '1.1' class StringProxy(metaclass=AtomicTypeMeta): name = 'string' def __new__(cls, *args: object, **kwargs: object) -> str: # type: ignore[misc] return str(*args, **kwargs) @classmethod def __subclasshook__(cls, subclass: type) -> bool: return issubclass(subclass, str) @classmethod def validate(cls, value: object) -> None: if not isinstance(value, str): raise cls.invalid_type(value) #### # Type proxies for multiple type-checking in XPath expressions class NumericTypeMeta(type): """Metaclass for checking numeric classes and instances.""" def __instancecheck__(cls, instance: object) -> bool: return isinstance(instance, (int, float, Decimal)) and not isinstance(instance, bool) def __subclasscheck__(cls, subclass: type) -> bool: if issubclass(subclass, bool): return False return issubclass(subclass, int) or issubclass(subclass, float) \ or 
issubclass(subclass, Decimal) class NumericProxy(metaclass=NumericTypeMeta): """Proxy for xs:numeric related types. Builds xs:float instances.""" def __new__(cls, *args: FloatArgType, **kwargs: FloatArgType) -> float: # type: ignore[misc] return float(*args, **kwargs) class ArithmeticTypeMeta(type): """Metaclass for checking numeric, datetime and duration classes/instances.""" def __instancecheck__(cls, instance: object) -> bool: return isinstance( instance, (int, float, Decimal, AbstractDateTime, Duration, UntypedAtomic) ) and not isinstance(instance, bool) def __subclasscheck__(cls, subclass: type) -> bool: if issubclass(subclass, bool): return False return issubclass(subclass, int) or issubclass(subclass, float) or \ issubclass(subclass, Decimal) or issubclass(subclass, Duration) \ or issubclass(subclass, AbstractDateTime) or issubclass(subclass, UntypedAtomic) class ArithmeticProxy(metaclass=ArithmeticTypeMeta): """Proxy for arithmetic related types. Builds xs:float instances.""" def __new__(cls, *args: FloatArgType, **kwargs: FloatArgType) -> float: # type: ignore[misc] return float(*args, **kwargs) elementpath-3.0.2/elementpath/datatypes/qname.py000066400000000000000000000056071427546011100220120ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from typing import Any, Optional from ..helpers import QNAME_PATTERN from .atomic_types import AtomicTypeMeta from .untyped import UntypedAtomic class AbstractQName(metaclass=AtomicTypeMeta): """ XPath compliant QName, bound with a prefix and a namespace. :param uri: the bound namespace URI, must be a not empty \ URI if a prefixed name is provided for the 2nd argument. :param qname: the prefixed name or a local name. 
""" pattern = QNAME_PATTERN def __new__(cls, *args: Any, **kwargs: Any) -> 'AbstractQName': if cls.__name__ == 'Notation': raise TypeError("can't instantiate xs:NOTATION objects") return super().__new__(cls) def __init__(self, uri: Optional[str], qname: str) -> None: if uri is None: self.uri = '' elif isinstance(uri, str): self.uri = uri else: raise TypeError('the 1st argument has an invalid type %r' % type(uri)) if not isinstance(qname, str): raise TypeError('the 2nd argument has an invalid type %r' % type(qname)) self.qname = qname.strip() match = self.pattern.match(self.qname) if match is None: raise ValueError('invalid value {!r} for an xs:QName'.format(self.qname)) self.prefix = match.groupdict()['prefix'] self.local_name = match.groupdict()['local'] if not uri and self.prefix: msg = '{!r}: cannot associate a non-empty prefix with no namespace' raise ValueError(msg.format(self)) @property def namespace(self) -> str: return self.uri @property def expanded_name(self) -> str: return '{%s}%s' % (self.uri, self.local_name) if self.uri else self.local_name @property def braced_uri_name(self) -> str: return 'Q{%s}%s' % (self.uri, self.local_name) if self.uri else self.local_name def __repr__(self) -> str: return '%s(uri=%r, qname=%r)' % (self.__class__.__name__, self.uri, self.qname) def __str__(self) -> str: return self.qname def __hash__(self) -> int: return hash((self.uri, self.local_name)) def __eq__(self, other: object) -> bool: if isinstance(other, AbstractQName): return self.uri == other.uri and self.local_name == other.local_name elif isinstance(other, (str, UntypedAtomic)): return other == self.qname raise TypeError("cannot compare {!r} to {!r}".format(type(self), type(other))) class QName(AbstractQName): name = 'QName' class Notation(AbstractQName): name = 'NOTATION' elementpath-3.0.2/elementpath/datatypes/string.py000066400000000000000000000045501427546011100222130ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for 
Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import re from typing import Any from ..helpers import NORMALIZE_PATTERN, collapse_white_spaces from .atomic_types import AtomicTypeMeta class NormalizedString(str, metaclass=AtomicTypeMeta): name = 'normalizedString' pattern = re.compile('^[^\t\r]*$') def __new__(cls, obj: Any) -> 'NormalizedString': try: return super().__new__(cls, NORMALIZE_PATTERN.sub(' ', obj)) except TypeError: return super().__new__(cls, obj) class XsdToken(NormalizedString): name = 'token' pattern = re.compile(r'^[\S\xa0]*(?: [\S\xa0]+)*$') def __new__(cls, value: Any) -> 'XsdToken': if not isinstance(value, str): value = str(value) else: value = collapse_white_spaces(value) match = cls.pattern.match(value) if match is None: raise ValueError('invalid value {!r} for xs:{}'.format(value, cls.name)) return super(NormalizedString, cls).__new__(cls, value) class Language(XsdToken): name = 'language' pattern = re.compile(r'^[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*$') def __new__(cls, value: Any) -> 'Language': if isinstance(value, bool): value = 'true' if value else 'false' elif not isinstance(value, str): value = str(value) else: value = collapse_white_spaces(value) match = cls.pattern.match(value) if match is None: raise ValueError('invalid value {!r} for xs:{}'.format(value, cls.name)) return super(NormalizedString, cls).__new__(cls, value) class Name(XsdToken): name = 'Name' pattern = re.compile(r'^(?:[^\d\W]|:)[\w.\-:\u00B7\u0300-\u036F\u203F\u2040]*$') class NCName(Name): name = 'NCName' pattern = re.compile(r'^[^\d\W][\w.\-\u00B7\u0300-\u036F\u203F\u2040]*$') class Id(NCName): name = 'ID' class Idref(NCName): name = 'IDREF' class Entity(NCName): name = 'ENTITY' class NMToken(XsdToken): name = 'NMTOKEN' pattern = 
re.compile(r'^[\w.\-:\u00B7\u0300-\u036F\u203F\u2040]+$') elementpath-3.0.2/elementpath/datatypes/untyped.py000066400000000000000000000117161427546011100223770ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import operator from decimal import Decimal from typing import Any, Tuple, Union from ..helpers import BOOLEAN_VALUES from .atomic_types import AtomicTypeMeta, AnyAtomicType class UntypedAtomic(metaclass=AtomicTypeMeta): """ Class for xs:untypedAtomic data. Provides special methods for comparing and converting to basic data types. :param value: the untyped value, usually a string. """ name = 'untypedAtomic' value: str @classmethod def validate(cls, value: object) -> None: if not isinstance(value, cls): raise cls.invalid_type(value) def __init__(self, value: Union[str, bytes, bool, float, Decimal, 'UntypedAtomic', AnyAtomicType]) -> None: if isinstance(value, str): self.value = value elif isinstance(value, bytes): self.value = value.decode('utf-8') elif isinstance(value, bool): self.value = 'true' if value else 'false' elif isinstance(value, float): self.value = str(value).rstrip('0').rstrip('.') elif isinstance(value, Decimal): self.value = str(value.normalize()) elif isinstance(value, UntypedAtomic): self.value = value.value elif isinstance(value, AnyAtomicType): self.value = str(value) else: raise TypeError("{!r} is not an atomic value".format(value)) def __repr__(self) -> str: return '%s(%r)' % (self.__class__.__name__, self.value) def _get_operands(self, other: Any, force_float: bool = True) -> Tuple[Any, Any]: """ Returns a couple of operands, applying a cast to the instance value based on the type of the *other* argument. 
:param other: The other operand, that determines the cast for the untyped instance. :param force_float: Force a conversion to float if *other* is an UntypedAtomic instance. :return: A couple of values. """ if isinstance(other, UntypedAtomic): if force_float: return float(self.value), float(other.value) return self.value, other.value elif isinstance(other, bool): # Cast to xs:boolean value = self.value.strip() if value not in BOOLEAN_VALUES: raise ValueError("{!r} cannot be cast to xs:boolean".format(self.value)) return value in ('1', 'true'), other elif isinstance(other, int): return float(self.value), other elif other is None or isinstance(other, (str, list)): return self.value, other try: return type(other).fromstring(self.value), other except AttributeError: return type(other)(self.value), other def __hash__(self) -> int: return hash(self.value) def __eq__(self, other: Any) -> Any: return operator.eq(*self._get_operands(other, force_float=False)) def __ne__(self, other: Any) -> Any: return not operator.eq(*self._get_operands(other, force_float=False)) def __lt__(self, other: Any) -> Any: return operator.lt(*self._get_operands(other)) def __le__(self, other: Any) -> Any: return operator.le(*self._get_operands(other)) def __gt__(self, other: Any) -> Any: return operator.gt(*self._get_operands(other)) def __ge__(self, other: Any) -> Any: return operator.ge(*self._get_operands(other)) def __add__(self, other: Any) -> Any: return operator.add(*self._get_operands(other)) __radd__ = __add__ def __sub__(self, other: Any) -> Any: return operator.sub(*self._get_operands(other)) def __rsub__(self, other: Any) -> Any: return operator.sub(*reversed(self._get_operands(other))) def __mul__(self, other: Any) -> Any: return operator.mul(*self._get_operands(other)) __rmul__ = __mul__ def __truediv__(self, other: Any) -> Any: return operator.truediv(*self._get_operands(other)) def __rtruediv__(self, other: Any) -> Any: return operator.truediv(*reversed(self._get_operands(other))) 
def __int__(self) -> int: return int(self.value) def __float__(self) -> float: return float(self.value) def __bool__(self) -> bool: return bool(self.value) # For effective boolean value, not for cast to xs:boolean. def __abs__(self) -> Decimal: return abs(Decimal(self.value)) def __mod__(self, other: Any) -> Any: return operator.mod(*self._get_operands(other)) def __str__(self) -> str: return self.value def __bytes__(self) -> bytes: return bytes(self.value, encoding='utf-8') elementpath-3.0.2/elementpath/datatypes/uri.py000066400000000000000000000103161427546011100215010ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from decimal import Decimal from urllib.parse import urlparse from typing import Union from ..helpers import collapse_white_spaces, WRONG_ESCAPE_PATTERN from .atomic_types import AnyAtomicType from .untyped import UntypedAtomic from .numeric import Integer class AnyURI(AnyAtomicType): """ Class for xs:anyURI data. :param value: a string or an untyped atomic instance. 
""" value: str name = 'anyURI' def __init__(self, value: Union[str, bytes, UntypedAtomic, 'AnyURI']) -> None: if isinstance(value, str): self.value = collapse_white_spaces(value) elif isinstance(value, bytes): self.value = collapse_white_spaces(value.decode('utf-8')) elif isinstance(value, self.__class__): self.value = value.value elif isinstance(value, UntypedAtomic): self.value = collapse_white_spaces(value.value) else: raise TypeError('the argument has an invalid type %r' % type(value)) self.validate(self.value) def __repr__(self) -> str: return '%s(%r)' % (self.__class__.__name__, self.value) def __str__(self) -> str: return self.value def __bool__(self) -> bool: return bool(self.value) # For effective boolean value def __hash__(self) -> int: return hash(self.value) def __contains__(self, item: str) -> bool: return item in self.value def __eq__(self, other: object) -> bool: if isinstance(other, (AnyURI, UntypedAtomic)): return self.value == other.value elif isinstance(other, (bool, float, Decimal, Integer)): raise TypeError("cannot compare {} with xs:{}".format(type(other), self.name)) return self.value == other def __ne__(self, other: object) -> bool: if isinstance(other, (AnyURI, UntypedAtomic)): return self.value != other.value elif isinstance(other, (bool, float, Decimal, Integer)): raise TypeError("cannot compare {} with xs:{}".format(type(other), self.name)) return self.value != other def __lt__(self, other: Union[str, 'AnyURI', UntypedAtomic]) -> bool: if isinstance(other, (AnyURI, UntypedAtomic)): return self.value < other.value return self.value < other def __le__(self, other: Union[str, 'AnyURI', UntypedAtomic]) -> bool: if isinstance(other, (AnyURI, UntypedAtomic)): return self.value <= other.value return self.value <= other def __gt__(self, other: Union[str, 'AnyURI', UntypedAtomic]) -> bool: if isinstance(other, (AnyURI, UntypedAtomic)): return self.value > other.value return self.value > other def __ge__(self, other: Union[str, 'AnyURI', 
UntypedAtomic]) -> bool: if isinstance(other, (AnyURI, UntypedAtomic)): return self.value >= other.value return self.value >= other @classmethod def validate(cls, value: object) -> None: if isinstance(value, cls): return elif isinstance(value, bytes): value = value.decode() elif not isinstance(value, str): raise cls.invalid_type(value) try: url_parts = urlparse(value) _ = url_parts.port # check invalid port! except ValueError as err: msg = 'invalid value {!r} for xs:{} ({})' raise ValueError(msg.format(value, cls.name, str(err))) from None else: if url_parts.path.startswith(':'): raise cls.invalid_value(value) elif value.count('#') > 1: msg = 'invalid value {!r} for xs:{} (too many # characters)' raise ValueError(msg.format(value, cls.name)) elif WRONG_ESCAPE_PATTERN.search(value) is not None: msg = 'invalid value {!r} for xs:{} (wrong escaping)' raise ValueError(msg.format(value, cls.name)) elementpath-3.0.2/elementpath/etree.py000066400000000000000000000256451427546011100200230ustar00rootroot00000000000000# # Copyright (c), 2016-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ A unified loader module for ElementTree with a safe parser and helper functions. """ import sys import re import io from typing import cast, Any, Counter, Iterator, Optional, MutableMapping, \ Tuple, Union from .protocols import ElementProtocol, DocumentProtocol ### # Programmatic import of the pure Python ElementTree module. # # In Python 3 the pure Python implementation is overwritten by the C module API, # so use a programmatic re-import to obtain the pure Python module, necessary for # defining a safer XMLParser. 
### # Temporary remove the loaded modules import xml.etree.ElementTree as ElementTree sys.modules.pop('xml.etree.ElementTree') _cmod = sys.modules.pop('_elementtree', None) # Load the pure Python module sys.modules['_elementtree'] = None # type: ignore[assignment] import xml.etree.ElementTree as PyElementTree # noqa import xml.etree # noqa # Restore original modules if _cmod is not None: sys.modules['_elementtree'] = _cmod xml.etree.ElementTree = ElementTree sys.modules['xml.etree.ElementTree'] = ElementTree class SafeXMLParser(PyElementTree.XMLParser): """ An XMLParser that forbids entities processing. Drops the *html* argument that is deprecated since version 3.4. :param target: the target object called by the `feed()` method of the \ parser, that defaults to `TreeBuilder`. :param encoding: if provided, its value overrides the encoding specified \ in the XML file. """ def __init__(self, target: Optional[Any] = None, encoding: Optional[str] = None) -> None: super(SafeXMLParser, self).__init__(target=target, encoding=encoding) self.parser.EntityDeclHandler = self.entity_declaration self.parser.UnparsedEntityDeclHandler = self.unparsed_entity_declaration self.parser.ExternalEntityRefHandler = self.external_entity_reference def entity_declaration(self, entity_name, is_parameter_entity, value, base, # type: ignore system_id, public_id, notation_name): raise PyElementTree.ParseError( "Entities are forbidden (entity_name={!r})".format(entity_name) ) def unparsed_entity_declaration(self, entity_name, base, system_id, # type: ignore public_id, notation_name): raise PyElementTree.ParseError( "Unparsed entities are forbidden (entity_name={!r})".format(entity_name) ) def external_entity_reference(self, context, base, system_id, public_id): # type: ignore raise PyElementTree.ParseError( "External references are forbidden (system_id={!r}, " "public_id={!r})".format(system_id, public_id) ) # pragma: no cover (EntityDeclHandler is called before) def defuse_xml(xml_source: 
Union[str, bytes]) -> Union[str, bytes]: resource: Any if isinstance(xml_source, str): resource = io.StringIO(xml_source) else: resource = io.BytesIO(xml_source) safe_parser = SafeXMLParser(target=PyElementTree.TreeBuilder()) try: for _ in PyElementTree.iterparse(resource, ('start',), safe_parser): break except PyElementTree.ParseError as err: msg = str(err) if "Entities are forbidden" in msg or \ "Unparsed entities are forbidden" in msg or \ "External references are forbidden" in msg: raise return xml_source def is_etree_element(obj: Any) -> bool: return hasattr(obj, 'tag') and hasattr(obj, 'attrib') and hasattr(obj, 'text') def is_lxml_etree_element(obj: Any) -> bool: return is_etree_element(obj) and hasattr(obj, 'getparent') and hasattr(obj, 'nsmap') def is_etree_document(obj: Any) -> bool: return hasattr(obj, 'getroot') and hasattr(obj, 'parse') and hasattr(obj, 'iter') def is_lxml_etree_document(obj: Any) -> bool: return is_etree_document(obj) and hasattr(obj, 'xpath') and hasattr(obj, 'xslt') def etree_iter_strings(elem: Union[DocumentProtocol, ElementProtocol], normalize: bool = False) -> Iterator[str]: e: ElementProtocol if normalize: for e in elem.iter(): if callable(e.tag): continue if e.text is not None: yield e.text.strip() if e is elem else e.text if e.tail is not None and e is not elem: yield e.tail.strip() if e in elem else e.tail else: for e in elem.iter(): if callable(e.tag): continue if e.text is not None: yield e.text if e.tail is not None and e is not elem: yield e.tail def etree_deep_equal(e1: ElementProtocol, e2: ElementProtocol) -> bool: if e1.tag != e2.tag: return False elif (e1.text or '').strip() != (e2.text or '').strip(): return False elif (e1.tail or '').strip() != (e2.tail or '').strip(): return False elif e1.attrib != e2.attrib: return False elif len(e1) != len(e2): return False return all(etree_deep_equal(c1, c2) for c1, c2 in zip(e1, e2)) def etree_iter_paths(elem: ElementProtocol, path: str = '.') \ -> 
Iterator[Tuple[ElementProtocol, str]]: yield elem, path comment_nodes = 0 pi_nodes = Counter[Optional[str]]() positions = Counter[Optional[str]]() for child in elem: if callable(child.tag): if child.tag.__name__ == 'Comment': # type: ignore[attr-defined] comment_nodes += 1 yield child, f'{path}/comment()[{comment_nodes}]' continue try: name = cast(str, child.target) # type: ignore[attr-defined] except AttributeError: assert child.text is not None name = child.text.split(' ', maxsplit=1)[0] pi_nodes[name] += 1 yield child, f'{path}/processing-instruction({name})[{pi_nodes[name]}]' continue if child.tag.startswith('{'): tag = f'Q{child.tag}' else: tag = f'Q{{}}{child.tag}' if path == '/': child_path = f'/{tag}' elif path: child_path = '/'.join((path, tag)) else: child_path = tag positions[child.tag] += 1 child_path += f'[{positions[child.tag]}]' yield from etree_iter_paths(child, child_path) def etree_tostring(elem: ElementProtocol, namespaces: Optional[MutableMapping[str, str]] = None, indent: str = '', max_lines: Optional[int] = None, spaces_for_tab: Optional[int] = None, xml_declaration: Optional[bool] = None, encoding: str = 'unicode', method: str = 'xml') -> Union[str, bytes]: """ Serialize an Element tree to a string. Tab characters are replaced by whitespaces. :param elem: the Element instance. :param namespaces: is an optional mapping from namespace prefix to URI. \ Provided namespaces are registered before serialization. :param indent: the base line indentation. :param max_lines: if truncate serialization after a number of lines \ (default: do not truncate). :param spaces_for_tab: number of spaces for replacing tab characters. \ For default tabs are replaced with 4 spaces, but only if not empty \ indentation or a max lines limit are provided. :param xml_declaration: if set to `True` inserts the XML declaration at the head. :param encoding: if "unicode" (the default) the output is a string, otherwise it’s binary. 
:param method: is either "xml" (the default), "html" or "text". :return: a Unicode string. """ def reindent(line: str) -> str: if not line: return line elif line.startswith(min_indent): return line[start:] if start >= 0 else indent[start:] + line else: return indent + line etree_module: Any if not is_etree_element(elem): raise TypeError(f"{elem!r} is not an Element") elif isinstance(elem, PyElementTree.Element): etree_module = PyElementTree elif not hasattr(elem, 'nsmap'): etree_module = ElementTree else: import lxml.etree as etree_module # type: ignore[no-redef] if namespaces: default_namespace = namespaces.get('') for prefix, uri in namespaces.items(): if prefix and not re.match(r'ns\d+$', prefix): etree_module.register_namespace(prefix, uri) if uri == default_namespace: default_namespace = None if default_namespace and not hasattr(elem, 'nsmap'): etree_module.register_namespace('', default_namespace) xml_text = etree_module.tostring(elem, encoding=encoding, method=method) if isinstance(xml_text, bytes): xml_text = xml_text.decode('utf-8') if spaces_for_tab: xml_text = xml_text.replace('\t', ' ' * spaces_for_tab) elif method != 'text' and (indent or max_lines): xml_text = xml_text.replace('\t', ' ' * 4) if xml_text.startswith('<?xml '): if xml_declaration is False: lines = xml_text.splitlines()[1:] else: lines = xml_text.splitlines() elif xml_declaration and encoding.lower() != 'unicode': lines = ['<?xml version="1.0" encoding="{}"?>'.format(encoding)] lines.extend(xml_text.splitlines()) else: lines = xml_text.splitlines() # Clear ending empty lines while lines and not lines[-1].strip(): lines.pop(-1) if not lines or method == 'text' or (not indent and not max_lines): if encoding == 'unicode': return '\n'.join(lines) return '\n'.join(lines).encode(encoding) last_indent = ' ' * min(k for k in range(len(lines[-1])) if lines[-1][k] != ' ') if len(lines) > 2: child_indent = ' ' * min( k for line in lines[1:-1] for k in range(len(line)) if line[k] != ' ' ) min_indent = min(child_indent, last_indent) else: min_indent = child_indent = last_indent start = len(min_indent) - len(indent) if max_lines is not None and len(lines) > max_lines + 2: lines = lines[:max_lines] +
[child_indent + '...'] * 2 + lines[-1:] if encoding == 'unicode': return '\n'.join(reindent(line) for line in lines) return '\n'.join(reindent(line) for line in lines).encode(encoding) __all__ = ['ElementTree', 'PyElementTree', 'SafeXMLParser', 'defuse_xml', 'is_etree_element', 'is_lxml_etree_element', 'is_etree_document', 'is_lxml_etree_document', 'etree_iter_strings', 'etree_deep_equal', 'etree_iter_paths', 'etree_tostring'] elementpath-3.0.2/elementpath/exceptions.py000066400000000000000000000256531427546011100210770ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import locale from typing import TYPE_CHECKING, Optional, Any if TYPE_CHECKING: from .tdop import Token class ElementPathError(Exception): """ Base exception class for elementpath package. :param message: the message related to the error. :param code: an optional error code. :param token: an optional token instance related with the error. 
""" def __init__(self, message: str, code: Optional[str] = None, token: Optional['Token[Any]'] = None) -> None: super(ElementPathError, self).__init__(message) self.message = message self.code = code self.token = token def __str__(self) -> str: if self.token is None or not isinstance(self.token.value, (str, bytes)): if not self.code: return self.message return '[{}] {}'.format(self.code, self.message) elif not self.code: return '{1} at line {2}, column {3}: {0}'.format( self.message, self.token, *self.token.position ) return '{2} at line {3}, column {4}: [{1}] {0}'.format( self.message, self.code, self.token, *self.token.position ) class MissingContextError(ElementPathError): """Raised when the dynamic context is required for evaluate the XPath expression.""" class ElementPathKeyError(ElementPathError, KeyError): pass class ElementPathZeroDivisionError(ElementPathError, ZeroDivisionError): pass class ElementPathNameError(ElementPathError, NameError): pass class ElementPathOverflowError(ElementPathError, OverflowError): pass class ElementPathRuntimeError(ElementPathError, RuntimeError): pass class ElementPathSyntaxError(ElementPathError, SyntaxError): pass class ElementPathTypeError(ElementPathError, TypeError): pass class ElementPathValueError(ElementPathError, ValueError): pass class ElementPathLocaleError(ElementPathError, locale.Error): pass XPATH_ERROR_CODES = { # XPath 2.0 parser errors (https://www.w3.org/TR/xpath20/#id-errors) 'XPST0001': (ElementPathValueError, 'Parser not bound to a schema'), 'XPST0003': (ElementPathSyntaxError, 'Invalid XPath expression'), 'XPDY0002': (MissingContextError, 'Dynamic context required for evaluate'), 'XPTY0004': (ElementPathTypeError, 'Type is not appropriate for the context'), 'XPST0005': (ElementPathValueError, 'A not empty sequence required'), 'XPST0008': (ElementPathNameError, 'Name not found'), 'XPST0010': (ElementPathNameError, 'Axis not found'), 'XPST0017': (ElementPathTypeError, 'Wrong number of arguments'), 
'XPTY0018': (ElementPathTypeError, 'Step result contains both nodes and atomic values'), 'XPTY0019': (ElementPathTypeError, 'Intermediate step contains an atomic value'), 'XPTY0020': (ElementPathTypeError, 'Context item is not a node'), 'XPDY0050': (ElementPathTypeError, 'Type does not match sequence type'), 'XPST0051': (ElementPathNameError, 'Unknown atomic type'), 'XPST0080': (ElementPathNameError, 'Target type cannot be xs:NOTATION or xs:anyAtomicType'), 'XPST0081': (ElementPathNameError, 'Unknown namespace'), # Data types and functions errors 'FOER0000': (ElementPathError, 'Unidentified error'), 'FOAR0001': (ElementPathZeroDivisionError, 'Division by zero'), 'FOAR0002': (ElementPathOverflowError, 'Numeric operation overflow/underflow'), 'FOCA0001': (ElementPathValueError, 'Input value too large for decimal'), 'FOCA0002': (ElementPathValueError, 'Invalid lexical value'), 'FOCA0003': (ElementPathValueError, 'Input value too large for integer'), 'FOCA0005': (ElementPathValueError, 'NaN supplied as float/double value'), 'FOCA0006': (ElementPathValueError, 'String to be cast to decimal has too many digits of precision'), 'FOCH0001': (ElementPathValueError, 'Code point not valid'), 'FOCH0002': (ElementPathLocaleError, 'Unsupported collation'), 'FOCH0003': (ElementPathValueError, 'Unsupported normalization form'), 'FOCH0004': (ElementPathValueError, 'Collation does not support collation units'), 'FODC0001': (ElementPathValueError, 'No context document'), 'FODC0002': (ElementPathValueError, 'Error retrieving resource'), 'FODC0003': (ElementPathValueError, 'Function stability not defined'), 'FODC0004': (ElementPathValueError, 'Invalid argument to fn:collection'), 'FODC0005': (ElementPathValueError, 'Invalid argument to fn:doc or fn:doc-available'), 'FODT0001': (ElementPathOverflowError, 'Overflow/underflow in date/time operation'), 'FODT0002': (ElementPathOverflowError, 'Overflow/underflow in duration operation'), 'FODT0003': (ElementPathValueError, 'Invalid timezone 
value'), 'FONS0004': (ElementPathKeyError, 'No namespace found for prefix'), 'FONS0005': (ElementPathValueError, 'Base-uri not defined in the static context'), 'FORG0001': (ElementPathValueError, 'Invalid value for cast/constructor'), 'FORG0002': (ElementPathValueError, 'Invalid argument to fn:resolve-uri()'), 'FORG0003': (ElementPathValueError, 'fn:zero-or-one called with a sequence containing more than one item'), 'FORG0004': (ElementPathValueError, 'fn:one-or-more called with a sequence containing no items'), 'FORG0005': (ElementPathValueError, 'fn:exactly-one called with a sequence containing zero or more than one item'), 'FORG0006': (ElementPathTypeError, 'Invalid argument type'), 'FORG0008': (ElementPathValueError, 'The two arguments to fn:dateTime have inconsistent timezones'), 'FORG0009': (ElementPathValueError, 'Error in resolving a relative URI against a base URI in fn:resolve-uri'), 'FORX0001': (ElementPathValueError, 'Invalid regular expression flags'), 'FORX0002': (ElementPathValueError, 'Invalid regular expression'), 'FORX0003': (ElementPathValueError, 'Regular expression matches zero-length string'), 'FORX0004': (ElementPathValueError, 'Invalid replacement string'), 'FOTY0012': (ElementPathValueError, 'Argument node does not have a typed value'), # XPath 3.0+ errors 'XQST0039': (ElementPathTypeError, 'Duplicate parameter name in inline function expression'), 'XQST0046': (ElementPathTypeError, 'The namespace part of the EQName is not a valid URI'), 'XQST0052': (ElementPathNameError, 'The name of an in-scope simple schema type required'), 'XQST0070': (ElementPathNameError, 'Illegal use of a predefined namespace'), 'FOTY0013': (ElementPathTypeError, 'The argument to fn:data() contains a function item'), 'FOTY0014': (ElementPathTypeError, 'The argument to fn:string() is a function item'), 'FOTY0015': (ElementPathTypeError, 'An argument to fn:deep-equal() contains a function item'), 'FODC0006': (ElementPathValueError, 'String passed to fn:parse-xml is not 
a well-formed XML document'), 'FODC0010': (ElementPathRuntimeError, 'The processor does not support serialization'), 'FOUT1170': (ElementPathValueError, 'Invalid $href argument to fn:unparsed-text()'), 'FOUT1190': (ElementPathValueError, 'Cannot decode resource retrieved by fn:unparsed-text()'), 'FOUT1200': (ElementPathValueError, 'Cannot infer encoding of resource retrieved by fn:unparsed-text()'), 'FODF1280': (ElementPathValueError, 'Invalid decimal format name'), 'FODF1310': (ElementPathValueError, 'Invalid decimal format picture string'), 'FOFD1340': (ElementPathValueError, 'Invalid date/time formatting parameters'), 'FOFD1350': (ElementPathValueError, 'Invalid date/time formatting component'), 'XPTY0117': (ElementPathTypeError, 'Item type is xs:untypedAtomic and the expected type is namespace-sensitive'), 'XPDY0130': (ElementPathValueError, 'An implementation-defined limit has been exceeded'), 'XPST0133': (ElementPathValueError, 'The namespace URI for EQName is http://www.w3.org/2000/xmlns/'), # XSLT and XQuery Serialization errors # (the complete list: https://www.w3.org/TR/xslt-xquery-serialization/#id-errors) 'SENR0001': (ElementPathTypeError, 'item is an attribute node or a namespace node'), 'SEPM0016': (ElementPathValueError, 'parameter value is invalid for the defined domain'), 'SEPM0017': (ElementPathValueError, 'error during extraction of serialization parameters'), 'SEPM0018': (ElementPathValueError, 'use-character-maps serialization parameter in ' 'a sequence of length greater than one'), 'SEPM0019': (ElementPathValueError, 'same serialization parameter appears more than once'), # XPath 3.1+ errors 'FOAY0001': (ElementPathValueError, 'Array index out of bounds'), 'FOAY0002': (ElementPathValueError, 'Negative array length'), } def xpath_error(code: str, message: Optional[str] = None, token: Optional['Token[Any]'] = None, prefix: str = 'err') -> ElementPathError: """ Returns an XPath error instance related with a code. 
An XPath/XQuery/XSLT error code (ref: http://www.w3.org/2005/xqt-errors) is an alphanumeric token starting with four uppercase letters and ending with four digits. :param code: the error code. :param message: an optional custom additional message. :param token: an optional token instance. :param prefix: the namespace prefix to apply to the error code, defaults to 'err'. """ if code.startswith('{'): try: namespace, code = code[1:].split('}') except ValueError: message = '{!r} is not an xs:QName'.format(code) raise ElementPathValueError(message, 'err:XPTY0004', token) else: if namespace != 'http://www.w3.org/2005/xqt-errors': message = 'invalid namespace {!r}'.format(namespace) raise ElementPathValueError(message, 'err:XPTY0004', token) pcode = '%s:%s' % (prefix, code) if prefix else code elif ':' not in code: pcode = '%s:%s' % (prefix, code) if prefix else code elif not prefix or not code.startswith(prefix + ':'): message = '%r is not an XPath error code' % code raise ElementPathValueError(message, 'err:XPTY0004', token) else: pcode = code code = code[len(prefix) + 1:] try: error_class, default_message = XPATH_ERROR_CODES[code] except KeyError: raise ElementPathValueError( message or 'unknown XPath error code %r' % code, 'err:XPTY0004', token ) else: return error_class(message or default_message, pcode, token) elementpath-3.0.2/elementpath/helpers.py000066400000000000000000000155351427546011100203560ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # import re import math import locale as _locale from calendar import isleap, leapdays from decimal import Decimal from operator import attrgetter from typing import Optional, Union, SupportsFloat ### # Common sets constants OCCURRENCE_INDICATORS = frozenset(('?', '*', '+')) BOOLEAN_VALUES = frozenset(('true', 'false', '1', '0')) NUMERIC_INF_OR_NAN = frozenset(('INF', '-INF', 'NaN')) INVALID_NUMERIC = frozenset( ('inf', '+inf', '-inf', 'nan', 'infinity', '+infinity', '-infinity') ) ### # Data validation helpers NORMALIZE_PATTERN = re.compile(r'[^\S\xa0]') WHITESPACES_PATTERN = re.compile(r'[^\S\xa0]+') # include ASCII 160 (non-breaking space) NCNAME_PATTERN = re.compile(r'^[^\d\W][\w.\-\u00B7\u0300-\u036F\u203F\u2040]*$') QNAME_PATTERN = re.compile( r'^(?:(?P<prefix>[^\d\W][\w\-.\u00B7\u0300-\u036F\u0387\u06DD\u06DE\u203F\u2040]*):)?' r'(?P<local>[^\d\W][\w\-.\u00B7\u0300-\u036F\u0387\u06DD\u06DE\u203F\u2040]*)$', ) EQNAME_PATTERN = re.compile( r'^(?:Q{(?P<namespace>[^}]+)}|' r'(?P<prefix>[^\d\W][\w\-.\u00B7\u0300-\u036F\u0387\u06DD\u06DE\u203F\u2040]*):)?' 
r'(?P<local>[^\d\W][\w\-.\u00B7\u0300-\u036F\u0387\u06DD\u06DE\u203F\u2040]*)$', ) WRONG_ESCAPE_PATTERN = re.compile(r'%(?![a-fA-F\d]{2})') XML_NEWLINES_PATTERN = re.compile('\r\n|\r|\n') def collapse_white_spaces(s: str) -> str: return WHITESPACES_PATTERN.sub(' ', s).strip(' ') def is_idrefs(value: Optional[str]) -> bool: return isinstance(value, str) and \ all(NCNAME_PATTERN.match(x) is not None for x in value.split()) node_position = attrgetter('position') ### # Sequence type checking SEQUENCE_TYPE_PATTERN = re.compile(r'\s?([()?*+,])\s?') def normalize_sequence_type(sequence_type: str) -> str: sequence_type = WHITESPACES_PATTERN.sub(' ', sequence_type).strip() sequence_type = SEQUENCE_TYPE_PATTERN.sub(r'\1', sequence_type) return sequence_type.replace(',', ', ').replace(')as', ') as') ### # Date/Time helpers MONTH_DAYS = [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31] MONTH_DAYS_LEAP = [0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31] def adjust_day(year: int, month: int, day: int) -> int: if month in (1, 3, 5, 7, 8, 10, 12): return day elif month in (4, 6, 9, 11): return min(day, 30) else: return min(day, 29) if isleap(year) else min(day, 28) def days_from_common_era(year: int) -> int: """ Returns the number of days from 0001-01-01 to the provided year. For a common era year the days are counted until the last day of December, for a BCE year the days are counted down from the end to the 1st of January. """ if year > 0: return year * 365 + year // 4 - year // 100 + year // 400 elif year >= -1: return year * 366 else: year = -year - 1 return -(366 + year * 365 + year // 4 - year // 100 + year // 400) DAYS_IN_4Y = days_from_common_era(4) DAYS_IN_100Y = days_from_common_era(100) DAYS_IN_400Y = days_from_common_era(400) def months2days(year: int, month: int, months_delta: int) -> int: """ Converts a delta of months to a delta of days, counting from the 1st day of the month, relative to the year and the month passed as arguments. 
:param year: the reference start year, a negative or zero value means a BCE year \ (0 is 1 BCE, -1 is 2 BCE, -2 is 3 BCE, etc). :param month: the starting month (1-12). :param months_delta: the number of months, if negative count backwards. """ if not months_delta: return 0 total_months = month - 1 + months_delta target_year = year + total_months // 12 target_month = total_months % 12 + 1 if month <= 2: y_days = 365 * (target_year - year) + leapdays(year, target_year) else: y_days = 365 * (target_year - year) + leapdays(year + 1, target_year + 1) months_days = MONTH_DAYS_LEAP if isleap(target_year) else MONTH_DAYS if target_month >= month: m_days = sum(months_days[m] for m in range(month, target_month)) return y_days + m_days if y_days >= 0 else y_days + m_days else: m_days = sum(months_days[m] for m in range(target_month, month)) return y_days - m_days if y_days >= 0 else y_days - m_days def round_number(value: Union[float, int, Decimal]) -> Union[float, int, Decimal]: if math.isnan(value) or math.isinf(value): return value number = Decimal(value) if number > 0: return type(value)(number.quantize(Decimal('1'), rounding='ROUND_HALF_UP')) else: return type(value)(number.quantize(Decimal('1'), rounding='ROUND_HALF_DOWN')) def normalized_seconds(seconds: Decimal) -> str: # Decimal.normalize() does not remove exp every time: eg. 
Decimal('1E+1') return '{:.6f}'.format(seconds).rstrip('0').rstrip('.') def is_xml_codepoint(cp: int) -> bool: return cp in (0x9, 0xA, 0xD) or \ 0x20 <= cp <= 0xD7FF or \ 0xE000 <= cp <= 0xFFFD or \ 0x10000 <= cp <= 0x10FFFF def ordinal(n: int) -> str: if n in (11, 12, 13): return '%dth' % n least_significant_digit = n % 10 if least_significant_digit == 1: return '%dst' % n elif least_significant_digit == 2: return '%dnd' % n elif least_significant_digit == 3: return '%drd' % n else: return '%dth' % n def numeric_equal(op1: SupportsFloat, op2: SupportsFloat) -> bool: if op1 == op2: return True return math.isclose(op1, op2, rel_tol=1e-7, abs_tol=0.0) def numeric_not_equal(op1: SupportsFloat, op2: SupportsFloat) -> bool: if op1 == op2: return False return not math.isclose(op1, op2, rel_tol=1e-7, abs_tol=0.0) def match_wildcard(name: str, wildcard: str) -> bool: if wildcard == '*' or wildcard == '*:*': return True elif wildcard.startswith('*:'): if name.startswith('{'): return name.endswith(f'}}{wildcard[2:]}') else: return name == wildcard[2:] elif wildcard.startswith('{') and wildcard.endswith('}*') or wildcard.endswith(':*'): return name.startswith(wildcard[:-1]) else: return False def get_locale_category(category: int) -> str: """ Gets the current value of a locale category. A replacement of locale.getdefaultlocale(), deprecated since Python 3.11. """ locale = _locale.setlocale(category, None) if locale == 'C': # locale category does not seem to be configured, so get the user # preferred locale and then restore the previous state locale = _locale.setlocale(category, '') _locale.setlocale(category, 'C') return locale elementpath-3.0.2/elementpath/namespaces.py000066400000000000000000000125301427546011100210230ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. 
# See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import re from typing import cast, Dict, Optional, Tuple, MutableMapping, Union NamespacesType = MutableMapping[str, str] # Regex patterns related to names and namespaces NAMESPACE_URI_PATTERN = re.compile(r'{([^}]+)}') EXPANDED_NAME_PATTERN = re.compile( r'^(?:{(?P<namespace>[^}]+)})?' r'(?P<local>[^\d\W][\w\-.\u00B7\u0300-\u036F\u0387\u06DD\u06DE\u203F\u2040]*)$', ) # Namespaces XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace" XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/" # Used in DOM for xmlns declarations XSD_NAMESPACE = "http://www.w3.org/2001/XMLSchema" XSI_NAMESPACE = "http://www.w3.org/2001/XMLSchema-instance" XLINK_NAMESPACE = "http://www.w3.org/1999/xlink" # XPath/XQuery namespaces XPATH_FUNCTIONS_NAMESPACE = "http://www.w3.org/2005/xpath-functions" XQT_ERRORS_NAMESPACE = "http://www.w3.org/2005/xqt-errors" XPATH_MATH_FUNCTIONS_NAMESPACE = "http://www.w3.org/2005/xpath-functions/math" XPATH_MAP_FUNCTIONS_NAMESPACE = "http://www.w3.org/2005/xpath-functions/map" XPATH_ARRAY_FUNCTIONS_NAMESPACE = "http://www.w3.org/2005/xpath-functions/array" XSLT_XQUERY_SERIALIZATION_NAMESPACE = "http://www.w3.org/2010/xslt-xquery-serialization" # XML namespace attributes XML_BASE = '{%s}base' % XML_NAMESPACE XML_LANG = '{%s}lang' % XML_NAMESPACE XML_SPACE = '{%s}space' % XML_NAMESPACE XML_ID = '{%s}id' % XML_NAMESPACE # XML Schema Instance namespace attributes XSI_TYPE = '{%s}type' % XSI_NAMESPACE XSI_NIL = '{%s}nil' % XSI_NAMESPACE XSI_SCHEMA_LOCATION = '{%s}schemaLocation' % XSI_NAMESPACE XSI_NONS_SCHEMA_LOCATION = '{%s}noNamespaceSchemaLocation' % XSI_NAMESPACE # XML Schema tags (schema and types) XSD_SCHEMA = '{%s}schema' % XSD_NAMESPACE XSD_ANY_TYPE = '{%s}anyType' % XSD_NAMESPACE XSD_ANY_SIMPLE_TYPE = '{%s}anySimpleType' % XSD_NAMESPACE XSD_ANY_ATOMIC_TYPE = '{%s}anyAtomicType' % XSD_NAMESPACE XSD_NOTATION = '{%s}NOTATION' % XSD_NAMESPACE 
XSD_ID = '{%s}ID' % XSD_NAMESPACE XSD_IDREF = '{%s}IDREF' % XSD_NAMESPACE XSD_IDREFS = '{%s}IDREFS' % XSD_NAMESPACE XSD_STRING = '{%s}string' % XSD_NAMESPACE XSD_FLOAT = '{%s}float' % XSD_NAMESPACE XSD_DOUBLE = '{%s}double' % XSD_NAMESPACE XSD_DECIMAL = '{%s}decimal' % XSD_NAMESPACE # XPath type labels defined in XSD namespace that are not XSD builtin types XSD_UNTYPED = '{%s}untyped' % XSD_NAMESPACE XSD_UNTYPED_ATOMIC = '{%s}untypedAtomic' % XSD_NAMESPACE XSD_ERROR = '{%s}error' % XSD_NAMESPACE def get_namespace(name: str) -> str: try: return NAMESPACE_URI_PATTERN.match(name).group(1) # type: ignore[union-attr] except AttributeError: return '' def split_expanded_name(name: str) -> Tuple[str, str]: match = EXPANDED_NAME_PATTERN.match(name) if match is None: raise ValueError("{!r} is not an expanded QName".format(name)) namespace, local_name = match.groups() return namespace or '', local_name def get_prefixed_name( qname: str, namespaces: Union[Dict[str, str], Dict[Optional[str], str]]) -> str: """ Get the prefixed form of a QName, using a namespace map. :param qname: an extended QName or a local name or a prefixed QName. :param namespaces: a dictionary with a map from prefixes to namespace URIs. """ try: if qname[0] == '{': ns_uri, local_name = qname[1:].split('}') elif qname[1] == '{' and qname[0] == 'Q': ns_uri, local_name = qname[2:].split('}') else: return qname except IndexError: return qname except (ValueError, TypeError): raise ValueError("{!r} is not a QName".format(qname)) for prefix, uri in sorted(namespaces.items(), reverse=True, key=lambda x: x if x[0] is not None else ('', x[1])): if uri == ns_uri: return '%s:%s' % (prefix, local_name) if prefix else local_name else: return qname def get_expanded_name( qname: str, namespaces: Union[Dict[str, str], Dict[Optional[str], str]]) -> str: """ Get the expanded form of a QName, using a namespace map. Local names are mapped to the default namespace. 
:param qname: a prefixed QName or a local name or an extended QName. :param namespaces: a dictionary with a map from prefixes to namespace URIs. :return: the expanded format of a QName or a local name. """ try: if qname[0] == '{' or qname[1] == '{' and qname[0] == 'Q': return qname except IndexError: return qname try: prefix, local_name = qname.split(':') except ValueError: if ':' in qname: raise ValueError("wrong format for prefixed QName %r" % qname) elif '' in namespaces: uri = namespaces[''] elif None in namespaces: uri = cast(Dict[Optional[str], str], namespaces)[None] # lxml nsmap else: return qname return '{%s}%s' % (uri, qname) if uri else qname else: if not prefix or not local_name: raise ValueError("wrong format for reference name %r" % qname) uri = namespaces[prefix] return '{%s}%s' % (uri, local_name) if uri else local_name elementpath-3.0.2/elementpath/protocols.py000066400000000000000000000160361427546011100207350ustar00rootroot00000000000000# # Copyright (c), 2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ Define type hints protocols for XPath related objects. 
""" import sys from typing import overload, Any if sys.version_info < (3, 8): # for Python < 3.8 fallback to typing.Any ElementProtocol = Any LxmlElementProtocol = Any DocumentProtocol = Any XsdValidatorProtocol = Any XsdSchemaProtocol = Any XsdComponentProtocol = Any XsdTypeProtocol = Any XsdElementProtocol = Any XsdAttributeProtocol = Any GlobalMapsProtocol = Any XMLSchemaProtocol = Any else: from typing import Dict, Iterator, Iterable, List, Literal, \ Optional, Protocol, Sized, Hashable, Union, TypeVar, runtime_checkable _T = TypeVar("_T") @runtime_checkable class ElementProtocol(Iterable['ElementProtocol'], Sized, Hashable, Protocol): def find( self, path: str, namespaces: Optional[Dict[str, str]] = ... ) -> Optional['ElementProtocol']: ... def iter(self, tag: Optional[str] = ...) -> Iterator['ElementProtocol']: ... @overload def get(self, key: str, default: None = ...) -> Optional[str]: ... # noinspection PyOverloads @overload def get(self, key: str, default: _T) -> Union[str, _T]: ... tag: str attrib: Dict[str, Any] text: Optional[str] tail: Optional[str] @runtime_checkable class LxmlElementProtocol(ElementProtocol, Protocol): def getroottree(self) -> 'DocumentProtocol': ... def getnext(self) -> Optional['LxmlElementProtocol']: ... def getparent(self) -> Optional['LxmlElementProtocol']: ... def getprevious(self) -> Optional['LxmlElementProtocol']: ... def itersiblings(self, tag: Optional[str] = ..., preceding: bool = False, *tags: str) -> Iterable['LxmlElementProtocol']: ... nsmap: Dict[Optional[str], str] @runtime_checkable class DocumentProtocol(Iterable[ElementProtocol], Hashable, Protocol): def getroot(self) -> ElementProtocol: ... def parse(self, source: Any, *args: Any, **kwargs: Any) -> 'DocumentProtocol': ... def iter(self, tag: Optional[str] = ...) -> Iterator[ElementProtocol]: ... @runtime_checkable class XsdValidatorProtocol(Protocol): def is_matching(self, name: Optional[str], default_namespace: Optional[str] = None) -> bool: ... 
xsd_version: Literal['1.0', '1.1'] name: Optional[str] maps: 'GlobalMapsProtocol' @runtime_checkable class XsdSchemaProtocol(XsdValidatorProtocol, ElementProtocol, Protocol): tag: Literal['{http://www.w3.org/2001/XMLSchema}schema'] attrib: Dict[str, 'XsdAttributeProtocol'] text: None XMLSchemaProtocol = XsdSchemaProtocol # for backward compatibility @runtime_checkable class XsdComponentProtocol(XsdValidatorProtocol, Protocol): parent: Optional['XsdComponentProtocol'] @runtime_checkable class XsdTypeProtocol(XsdComponentProtocol, Protocol): def is_simple(self) -> bool: """Returns `True` if it's a simpleType instance, `False` if it's a complexType.""" ... def is_empty(self) -> bool: """ Returns `True` if it's a simpleType instance or a complexType with empty content, `False` otherwise. """ ... def has_simple_content(self) -> bool: """ Returns `True` if it's a simpleType instance or a complexType with simple content, `False` otherwise. """ ... def has_mixed_content(self) -> bool: """ Returns `True` if it's a complexType with mixed content, `False` otherwise. """ ... def is_element_only(self) -> bool: """ Returns `True` if it's a complexType with element-only content, `False` otherwise. """ ... def is_key(self) -> bool: """Returns `True` if it's a simpleType derived from xs:ID, `False` otherwise.""" ... def is_qname(self) -> bool: """Returns `True` if it's a simpleType derived from xs:QName, `False` otherwise.""" ... def is_notation(self) -> bool: """Returns `True` if it's a simpleType derived from xs:NOTATION, `False` otherwise.""" ... def is_valid(self, obj: Any, *args: Any, **kwargs: Any) -> bool: """ Validates an XML object node using the XSD type. The argument *obj* is an element for complex type nodes or a text value for simple type nodes. Returns `True` if the argument is valid, `False` otherwise. """ ... def validate(self, obj: Any, *args: Any, **kwargs: Any) -> None: """ Validates an XML object node using the XSD type. 
The argument *obj* is an element for complex type nodes or a text value for simple type nodes. Raises a `ValueError` compatible exception (a `ValueError` or a subclass of it) if the argument is not valid. """ ... def decode(self, obj: Any, *args: Any, **kwargs: Any) -> Any: """ Decodes an XML object node using the XSD type. The argument *obj* is an element for complex type nodes or a text value for simple type nodes. Raises a `ValueError` or a `TypeError` compatible exception if the argument is not valid. """ ... root_type: 'XsdTypeProtocol' """ The type at the base of the definition of the XSD type. For a special type it is the type itself. For an atomic type it is the primitive type. For a list it is the primitive type of the item. For a union it is the base union type. For a complex type it is xs:anyType. """ @runtime_checkable class XsdAttributeProtocol(XsdComponentProtocol, Protocol): type: Optional[XsdTypeProtocol] ref: Optional['XsdAttributeProtocol'] @runtime_checkable class XsdElementProtocol(XsdComponentProtocol, ElementProtocol, Protocol): type: Optional[XsdTypeProtocol] ref: Optional['XsdElementProtocol'] attrib: Dict[str, XsdAttributeProtocol] text: None class GlobalMapsProtocol(Protocol): types: Dict[str, XsdTypeProtocol] attributes: Dict[str, XsdAttributeProtocol] elements: Dict[str, XsdElementProtocol] substitution_groups: Dict[str, List[XsdElementProtocol]] __all__ = ['ElementProtocol', 'LxmlElementProtocol', 'DocumentProtocol', 'XsdValidatorProtocol', 'XsdSchemaProtocol', 'XsdComponentProtocol', 'XsdTypeProtocol', 'XsdElementProtocol', 'XsdAttributeProtocol', 'GlobalMapsProtocol', 'XMLSchemaProtocol'] elementpath-3.0.2/elementpath/py.typed000066400000000000000000000000001427546011100200160ustar00rootroot00000000000000elementpath-3.0.2/elementpath/regex/000077500000000000000000000000001427546011100174435ustar00rootroot00000000000000elementpath-3.0.2/elementpath/regex/__init__.py000066400000000000000000000020331427546011100215520ustar00rootroot00000000000000# # 
Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ Subpackage for processing XML regular expressions and for converting them to Python-compatible regexps. XPath/XQuery/XML-Schema regex flavors are supported through translate_pattern() API options. Default options process XPath/XQuery patterns. """ from .codepoints import iter_code_points from .unicode_subsets import RegexError, UnicodeSubset, UNICODE_CATEGORIES, UNICODE_BLOCKS from .character_classes import I_SHORTCUT_REPLACE, C_SHORTCUT_REPLACE, CharacterClass from .patterns import translate_pattern __all__ = ['UNICODE_CATEGORIES', 'UNICODE_BLOCKS', 'I_SHORTCUT_REPLACE', 'C_SHORTCUT_REPLACE', 'translate_pattern', 'RegexError', 'UnicodeSubset', 'CharacterClass', 'iter_code_points'] elementpath-3.0.2/elementpath/regex/character_classes.py000066400000000000000000000201471427546011100234720ustar00rootroot00000000000000# # Copyright (c), 2016-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # import re from itertools import chain from sys import maxunicode from collections import Counter from typing import AbstractSet, Any, Iterator, MutableSet, Optional, Union from .unicode_subsets import RegexError, UnicodeSubset, UNICODE_CATEGORIES, unicode_subset I_SHORTCUT_REPLACE = ( ":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF" "\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD" ) C_SHORTCUT_REPLACE = ( "-.0-9:A-Z_a-z\u00B7\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u037D\u037F-\u1FFF\u200C-" "\u200D\u203F\u2040\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD" ) S_SHORTCUT_SET = UnicodeSubset(' \n\t\r') D_SHORTCUT_SET = UnicodeSubset() D_SHORTCUT_SET._codepoints = UNICODE_CATEGORIES['Nd'].codepoints I_SHORTCUT_SET = UnicodeSubset(I_SHORTCUT_REPLACE) C_SHORTCUT_SET = UnicodeSubset(C_SHORTCUT_REPLACE) W_SHORTCUT_SET = UnicodeSubset(chain( UNICODE_CATEGORIES['L'].codepoints, UNICODE_CATEGORIES['M'].codepoints, UNICODE_CATEGORIES['N'].codepoints, UNICODE_CATEGORIES['S'].codepoints )) # Single and Multi character escapes CHARACTER_ESCAPES = { # Single-character escapes '\\n': '\n', '\\r': '\r', '\\t': '\t', '\\|': '|', '\\.': '.', '\\-': '-', '\\^': '^', '\\?': '?', '\\*': '*', '\\+': '+', '\\{': '{', '\\}': '}', '\\(': '(', '\\)': ')', '\\[': '[', '\\]': ']', '\\\\': '\\', # Multi-character escapes '\\s': S_SHORTCUT_SET, '\\S': S_SHORTCUT_SET, '\\d': D_SHORTCUT_SET, '\\D': D_SHORTCUT_SET, '\\i': I_SHORTCUT_SET, '\\I': I_SHORTCUT_SET, '\\c': C_SHORTCUT_SET, '\\C': C_SHORTCUT_SET, '\\w': W_SHORTCUT_SET, '\\W': W_SHORTCUT_SET, } class CharacterClass(MutableSet[int]): """ A set class to represent XML Schema/XQuery/XPath regex character class. :param charset: a string with formatted character set. :param xsd_version: the reference XSD version for syntax variants. Defaults to '1.0'. TODO: implement __ior__, __iand__, __ixor__ operators for a full mutable set class. 
""" _re_char_set = re.compile(r'(? None: self.xsd_version = xsd_version self.positive = UnicodeSubset() self.negative = UnicodeSubset() if charset: self.add(charset) def __repr__(self) -> str: return '%s(%s)' % (self.__class__.__name__, str(self)) def __str__(self) -> str: if not self.negative: return '[%s]' % str(self.positive) elif not self.positive: return '[^%s]' % str(self.negative) else: return '[%s%s]' % ( str(UnicodeSubset(self.negative.complement())), str(self.positive) ) def __copy__(self) -> 'CharacterClass': obj = CharacterClass(xsd_version=self.xsd_version) obj.positive.update(self.positive) obj.negative.update(self.negative) return obj def __contains__(self, item: object) -> bool: if isinstance(item, str): item = ord(item) elif not isinstance(item, int): return False if self.negative: return item not in self.negative or item in self.positive return item in self.positive def __iter__(self) -> Iterator[int]: if self.negative: return ( cp for cp in range(maxunicode + 1) if cp in self.positive or cp not in self.negative ) return iter(sorted(self.positive)) # type: ignore[arg-type] def __len__(self) -> int: if self.negative: not_in_positive = Counter(x not in self.positive for x in self.negative)[True] return maxunicode + 1 - not_in_positive return len(self.positive) def __isub__(self, other: AbstractSet[Any]) -> 'CharacterClass': if not isinstance(other, CharacterClass): return NotImplemented elif self.negative: if other.negative: self.positive |= (other.negative - self.negative) self.negative.clear() self.negative |= other.positive elif other.negative: self.positive &= other.negative self.positive -= other.positive return self def __sub__(self, other: AbstractSet[Any]) -> 'CharacterClass': obj = self.__copy__() return obj.__isub__(other) def add(self, charset: Union[int, str]) -> None: if isinstance(charset, int): charset = chr(charset) for part in self._re_char_set.split(charset): if part in CHARACTER_ESCAPES: value = CHARACTER_ESCAPES[part] if 
isinstance(value, str): self.positive.update(value) elif part[-1].islower(): self.positive |= value else: self.negative |= value elif part.startswith('\\p') or part.startswith('\\P'): if self._re_unicode_ref.search(part) is None: raise RegexError("wrong Unicode block specification %r" % part) try: subset = unicode_subset(part[3:-1]) except RegexError: # XSD 1.1 supports Is prefix to match Unicode blocks if not self.xsd_version or not part[3:].startswith('Is'): raise self.positive |= UnicodeSubset([(0, maxunicode + 1)]) else: if part.startswith('\\p'): self.positive |= subset else: self.negative |= subset else: self.positive.update(part) def discard(self, charset: Union[int, str]) -> None: if isinstance(charset, int): charset = chr(charset) for part in self._re_char_set.split(charset): if part in CHARACTER_ESCAPES: value = CHARACTER_ESCAPES[part] if isinstance(value, str): self.positive.difference_update(value) if self.negative: self.negative.update(value) elif part[-1].islower(): self.positive -= value if self.negative: self.negative |= value else: self.positive &= value self.negative.clear() elif part.startswith('\\p') or part.startswith('\\P'): if self._re_unicode_ref.search(part) is None: raise RegexError("wrong Unicode block specification %r" % part) try: subset = unicode_subset(part[3:-1]) except RegexError: # XSD 1.1 supports Is prefix to match Unicode blocks if not self.xsd_version or not part[3:].startswith('Is'): raise self.positive -= UnicodeSubset([(0, maxunicode + 1)]) else: if part.startswith('\\p'): self.positive -= subset else: self.negative -= subset else: self.positive.difference_update(part) def clear(self) -> None: self.positive.clear() self.negative.clear() def complement(self) -> None: if self.positive or self.negative: self.positive, self.negative = self.negative, self.positive else: self.positive.codepoints.append((0, maxunicode + 1)) 
elementpath-3.0.2/elementpath/regex/codepoints.py000066400000000000000000000071671427546011100221770ustar00rootroot00000000000000# # Copyright (c), 2016-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ This module defines Unicode code points helper functions. """ from sys import maxunicode from typing import Iterable, Iterator, Optional, Set, Tuple, Union CHARACTER_CLASS_ESCAPED: Set[int] = {ord(c) for c in r'-|.^?*+{}()[]\\'} """Code Points of escaped chars in a character class.""" CodePoint = Union[int, Tuple[int, int]] def code_point_order(cp: CodePoint) -> int: """Ordering function for code points.""" return cp if isinstance(cp, int) else cp[0] def code_point_reverse_order(cp: CodePoint) -> int: """Reverse ordering function for code points.""" return cp if isinstance(cp, int) else cp[1] - 1 def iter_code_points(code_points: Iterable[CodePoint], reverse: bool = False) \ -> Iterator[CodePoint]: """ Iterates a code points sequence. Three or more consecutive code points are merged in a range. :param code_points: an iterable with code points and code point ranges. :param reverse: if `True` reverses the order of the sequence. :return: yields code points or code point ranges. 
""" start_cp = end_cp = 0 if reverse: code_points = sorted(code_points, key=code_point_reverse_order, reverse=True) else: code_points = sorted(code_points, key=code_point_order) for cp in code_points: if isinstance(cp, int): cp = cp, cp + 1 if not end_cp: start_cp, end_cp = cp continue elif reverse: if start_cp <= cp[1]: start_cp = min(start_cp, cp[0]) continue elif end_cp >= cp[0]: end_cp = max(end_cp, cp[1]) continue if end_cp > start_cp + 1: yield start_cp, end_cp else: yield start_cp start_cp, end_cp = cp else: if end_cp: if end_cp > start_cp + 1: yield start_cp, end_cp else: yield start_cp def get_code_point_range(cp: CodePoint) -> Optional[CodePoint]: """ Returns a code point range. :param cp: a single code point or a code point range. :return: a code point range or `None` if the argument is not a \ code point or a code point range. """ if isinstance(cp, int): if 0 <= cp <= maxunicode: return cp, cp + 1 else: try: if isinstance(cp[0], int) and isinstance(cp[1], int): if 0 <= cp[0] < cp[1] <= maxunicode + 1: return cp except (IndexError, TypeError): pass return None def code_point_repr(cp: CodePoint) -> str: """ Returns the string representation of a code point. :param cp: an integer or a tuple with at least two integers. \ Values must be in interval [0, sys.maxunicode]. """ if isinstance(cp, int): if cp in CHARACTER_CLASS_ESCAPED: return r'\%s' % chr(cp) return chr(cp) if cp[0] in CHARACTER_CLASS_ESCAPED: start_char = r'\%s' % chr(cp[0]) else: start_char = chr(cp[0]) end_cp = cp[1] - 1 # Character ranges include the right bound if end_cp in CHARACTER_CLASS_ESCAPED: end_char = r'\%s' % chr(end_cp) else: end_char = chr(end_cp) if end_cp > cp[0] + 1: return '%s-%s' % (start_char, end_char) else: return start_char + end_char elementpath-3.0.2/elementpath/regex/generate_categories.py000066400000000000000000000076741427546011100240320ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. 
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
# type: ignore
"""Codepoints module generator utility."""

CATEGORIES_TEMPLATE = """#
# Copyright (c), 2018-2020, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
# --- Auto-generated code: don't edit this file ---
#
# Unicode data version {0}
#
RAW_UNICODE_CATEGORIES = {{
    {1}
}}
"""


def get_unicodedata_categories():
    """
    Extracts Unicode categories information from unicodedata library. Each category is
    represented with an ordered list containing code points and code point ranges.

    :return: a dictionary with category names as keys and lists as values.
    """
    categories = {k: [] for k in (
        'C', 'Cc', 'Cf', 'Cs', 'Co', 'Cn',
        'L', 'Lu', 'Ll', 'Lt', 'Lm', 'Lo',
        'M', 'Mn', 'Mc', 'Me',
        'N', 'Nd', 'Nl', 'No',
        'P', 'Pc', 'Pd', 'Ps', 'Pe', 'Pi', 'Pf', 'Po',
        'S', 'Sm', 'Sc', 'Sk', 'So',
        'Z', 'Zs', 'Zl', 'Zp'
    )}

    # Generate major categories
    major_category = 'C'
    start_cp, next_cp = 0, 1
    for cp in range(maxunicode + 1):
        if category(chr(cp))[0] != major_category:
            if cp > next_cp:
                categories[major_category].append((start_cp, cp))
            else:
                categories[major_category].append(start_cp)

            major_category = category(chr(cp))[0]
            start_cp, next_cp = cp, cp + 1
    else:
        if next_cp == maxunicode + 1:
            categories[major_category].append(start_cp)
        else:
            categories[major_category].append((start_cp, maxunicode + 1))

    # Generate minor categories
    minor_category = 'Cc'
    start_cp, next_cp = 0, 1
    for cp in range(maxunicode + 1):
        if category(chr(cp)) != minor_category:
            if cp > next_cp:
                categories[minor_category].append((start_cp, cp))
            else:
                categories[minor_category].append(start_cp)

            minor_category = category(chr(cp))
            start_cp, next_cp = cp, cp + 1
    else:
        if next_cp == maxunicode + 1:
            categories[minor_category].append(start_cp)
        else:
            categories[minor_category].append((start_cp, maxunicode + 1))

    return categories


if __name__ == '__main__':
    import argparse
    import pprint
    import os
    from sys import maxunicode
    from unicodedata import category, unidata_version

    parser = argparse.ArgumentParser(description="Generate Unicode categories module.")
    parser.add_argument('dirpath', type=str, nargs='?', default=os.path.dirname(__file__),
                        help="alternative directory path for generated module.")
    args = parser.parse_args()

    print("+++ Generate Unicode categories module +++\n")
    print("Unicode data version {}\n".format(unidata_version))

    filename = os.path.join(args.dirpath, 'unicode_categories.py')
    if os.path.isfile(filename):
        confirm = input("Overwrite existing module %r? [Y/Yes to confirm] " % filename)
        if confirm.upper() not in ('Y', 'YES'):
            print("Generation not confirmed: exiting ...")
            exit()

    print("Saving Unicode categories codepoints to %r" % filename)
    with open(filename, 'w') as fp:
        categories_repr = pprint.pformat(get_unicodedata_categories(), compact=True)
        indented_repr = '\n    '.join(categories_repr[1:-1].split('\n'))
        fp.write(CATEGORIES_TEMPLATE.format(unidata_version, indented_repr))
elementpath-3.0.2/elementpath/regex/patterns.py000066400000000000000000000263571427546011100216720ustar00rootroot00000000000000
#
# Copyright (c), 2016-2020, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
"""
Parse and translate XML Schema regular expressions to Python regex syntax.
"""
import re
from sys import maxunicode

from ..helpers import OCCURRENCE_INDICATORS
from .unicode_subsets import RegexError, UnicodeSubset, unicode_subset
from .character_classes import I_SHORTCUT_REPLACE, C_SHORTCUT_REPLACE, CharacterClass

HYPHENS_PATTERN = re.compile(r'(? str:
    """
    Translates a pattern regex expression to a Python regex pattern. With default
    options the translator processes XPath 2.0/XQuery 1.0 regex patterns. For XML
    Schema patterns set all boolean options to `False`.

    :param pattern: the source XML Schema regular expression.
    :param flags: regex flags as represented by Python's re module.
    :param xsd_version: apply regex rules of a specific XSD version, '1.0' for default.
    :param back_references: if `True` supports back-references and capturing groups.
    :param lazy_quantifiers: if `True` supports lazy quantifiers (\\*?, +?).
    :param anchors: if `True` supports ^ and $ anchors, otherwise the translated \
    pattern is anchored to its boundaries and anchors are treated as normal characters.
    """
    pos: int
    msg: str

    def parse_character_class() -> CharacterClass:
        nonlocal pos
        nonlocal msg
        pos += 1
        if pattern[pos] == '^':
            pos += 1
            negative = True
        else:
            negative = False
        char_class_pos = pos
        while True:
            if pattern[pos] == '[':
                msg = "invalid character '[' at position {}: {!r}"
                raise RegexError(msg.format(pos, pattern))
            elif pattern[pos] == '\\':
                if pattern[pos + 1].isdigit():
                    msg = "illegal back-reference in character class at position {}: {!r}"
                    raise RegexError(msg.format(pos, pattern))
                pos += 2
            elif pattern[pos] == ']' or pattern[pos:pos + 2] == '-[':
                if pos == char_class_pos:
                    msg = "empty character class at position {}: {!r}"
                    raise RegexError(msg.format(pos, pattern))

                char_class_pattern = pattern[char_class_pos:pos]
                if HYPHENS_PATTERN.search(char_class_pattern) and pos - char_class_pos > 2:
                    msg = "invalid character range '--' at position {}: {!r}"
                    raise RegexError(msg.format(pos, pattern))

                if xsd_version == '1.0':
                    hyphen_match = INVALID_HYPHEN_PATTERN.search(char_class_pattern)
                    if hyphen_match is not None:
                        hyphen_pos = char_class_pos + hyphen_match.span()[1] - 2
                        msg = "unescaped character '-' at position {}: {!r}"
                        raise RegexError(msg.format(hyphen_pos, pattern))

                char_class = CharacterClass(char_class_pattern, xsd_version)
                if negative:
                    char_class.complement()
                break  # pragma: no cover
            else:
                pos += 1

        if pattern[pos] != ']':
            # Parse a group subtraction
            pos += 1
            subtracted_class = parse_character_class()
            pos += 1
            char_class -= subtracted_class

        return char_class

    group_open_char = '(' if back_references else '(?:'
    regex = [] if anchors else ['^%s' % group_open_char]
    pos = 0
    pattern_len = len(pattern)
    total_groups = 0
    nested_groups = 0
    dot_all = flags & re.DOTALL

    if back_references:
        match = FORBIDDEN_ESCAPES_REF_PATTERN.search(pattern)
    else:
        match = FORBIDDEN_ESCAPES_NOREF_PATTERN.search(pattern)

    if match:
        msg = "not allowed escape sequence {!r} at position {}: {!r}"
        raise RegexError(msg.format(match.group(), match.span()[0], pattern))

    while pos < pattern_len:
        ch = pattern[pos]
        if ch == '.':
            regex.append(ch if dot_all else '[^\r\n]')
        elif ch in ('^', '$'):
            if not anchors:
                regex.append(r'\%s' % ch)
            elif ch == '^':
                regex.append(r'(?= pattern_len:
                regex.append('\\')
            elif pattern[pos].isdigit():
                regex.append('\\%s' % pattern[pos])
                reference = DIGITS_PATTERN.match(pattern[pos:]).group()  # type: ignore[union-attr]
                if len(reference) > 1:
                    k = 0
                    for k in range(1, len(reference)):
                        if total_groups < int(reference[:k + 1]):
                            regex.append('[%s]' % pattern[pos + k])
                            break
                        else:
                            regex.append(pattern[pos + k])
                    pos += k  # pragma: no cover
            elif pattern[pos] == 'i':
                regex.append('[%s]' % I_SHORTCUT_REPLACE)
            elif pattern[pos] == 'I':
                regex.append('[^%s]' % I_SHORTCUT_REPLACE)
            elif pattern[pos] == 'c':
                regex.append('[%s]' % C_SHORTCUT_REPLACE)
            elif pattern[pos] == 'C':
                regex.append('[^%s]' % C_SHORTCUT_REPLACE)
            elif pattern[pos] in 'pP':
                block_pos = pos - 1
                try:
                    if pattern[pos + 1] != '{':
                        raise RegexError("a '{' expected, found %r." % pattern[pos + 1])
                    while pattern[pos] != '}':
                        pos += 1
                except (IndexError, ValueError):
                    msg = "truncated unicode block escape at position {}: {!r}"
                    raise RegexError(msg.format(block_pos, pattern))

                block_name = pattern[block_pos + 3:pos]
                if flags & re.VERBOSE:
                    # spaces are completely collapsed in verbose regex patterns
                    block_name = block_name.replace(' ', '')

                try:
                    p_shortcut_set = unicode_subset(block_name)
                except RegexError:
                    # XSD 1.1 supports Is prefix to match Unicode blocks
                    if xsd_version == '1.0' or not block_name.startswith('Is'):
                        raise
                    p_shortcut_group = '[%s]' % UnicodeSubset([(0, maxunicode)])
                else:
                    if pattern[block_pos + 1] == 'p':
                        p_shortcut_group = '[%s]' % p_shortcut_set
                    else:
                        p_shortcut_group = '[^%s]' % p_shortcut_set

                if flags & re.IGNORECASE:
                    regex.append('(?-i:%s)' % p_shortcut_group)
                else:
                    regex.append(p_shortcut_group)
            else:
                regex.append('\\%s' % pattern[pos])
        else:
            regex.append(ch)

        pos += 1

    if nested_groups > 0:
        raise RegexError("unterminated subpattern in expression: %r" % pattern)

    if not anchors:
        regex.append(r')$(?!\n\Z)')
    return ''.join(regex)
elementpath-3.0.2/elementpath/regex/unicode_categories.py000066400000000000000000002331701427546011100236560ustar00rootroot00000000000000
#
# Copyright (c), 2018-2020, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
# --- Auto-generated code: don't edit this file ---
#
# Unicode data version 12.1.0
#
RAW_UNICODE_CATEGORIES = {
    'C': [(0, 32), (127, 160), 173, (888, 890), (896, 900), 907, 909, 930, 1328,
          (1367, 1369), (1419, 1421), 1424, (1480, 1488), (1515, 1519),
          (1525, 1542), (1564, 1566), 1757, (1806, 1808), (1867, 1869),
          (1970, 1984), (2043, 2045), (2094, 2096), 2111, (2140, 2142), 2143,
          (2155, 2208), 2229, (2238, 2259), 2274, 2436, (2445, 2447),
          (2449, 2451), 2473, 2481, (2483, 2486), (2490, 2492), (2501, 2503),
          (2505, 2507), (2511, 2519), (2520, 2524), 2526, (2532, 2534),
          (2559, 2561), 2564, (2571, 2575), (2577, 2579), 2601, 2609, 2612,
          2615, (2618, 2620), 2621, (2627, 2631), (2633, 2635), (2638, 2641),
          (2642, 2649), 2653, (2655, 2662), (2679, 2689), 2692, 2702, 2706,
          2729, 2737, 2740, (2746, 2748), 2758, 2762, (2766, 2768),
          (2769, 2784), (2788, 2790), (2802, 2809), 2816, 2820, (2829, 2831),
          (2833, 2835), 2857, 2865, 2868, (2874, 2876), (2885, 2887),
          (2889, 2891), (2894, 2902), (2904, 2908), 2910, (2916, 2918),
          (2936, 2946), 2948, (2955, 2958), 2961, (2966, 2969), 2971, 2973,
          (2976, 2979), (2981, 2984), (2987, 2990), (3002, 3006), (3011, 3014),
          3017, (3022, 3024), (3025, 3031), (3032, 3046), (3067, 3072), 3085,
          3089, 3113, (3130, 3133), 3141, 3145, (3150, 3157), 3159,
          (3163, 3168), (3172, 3174), (3184, 3191), 3213, 3217, 3241, 3252,
          (3258, 3260), 3269, 3273, (3278, 3285), (3287, 3294), 3295,
          (3300, 3302), 3312, (3315, 3328), 3332, 3341, 3345, 3397, 3401,
          (3408, 3412),
(3428, 3430), (3456, 3458), 3460, (3479, 3482), 3506, 3516, (3518, 3520), (3527, 3530), (3531, 3535), 3541, 3543, (3552, 3558), (3568, 3570), (3573, 3585), (3643, 3647), (3676, 3713), 3715, 3717, 3723, 3748, 3750, (3774, 3776), 3781, 3783, (3790, 3792), (3802, 3804), (3808, 3840), 3912, (3949, 3953), 3992, 4029, 4045, (4059, 4096), 4294, (4296, 4301), (4302, 4304), 4681, (4686, 4688), 4695, 4697, (4702, 4704), 4745, (4750, 4752), 4785, (4790, 4792), 4799, 4801, (4806, 4808), 4823, 4881, (4886, 4888), (4955, 4957), (4989, 4992), (5018, 5024), (5110, 5112), (5118, 5120), (5789, 5792), (5881, 5888), 5901, (5909, 5920), (5943, 5952), (5972, 5984), 5997, 6001, (6004, 6016), (6110, 6112), (6122, 6128), (6138, 6144), (6158, 6160), (6170, 6176), (6265, 6272), (6315, 6320), (6390, 6400), 6431, (6444, 6448), (6460, 6464), (6465, 6468), (6510, 6512), (6517, 6528), (6572, 6576), (6602, 6608), (6619, 6622), (6684, 6686), 6751, (6781, 6783), (6794, 6800), (6810, 6816), (6830, 6832), (6847, 6912), (6988, 6992), (7037, 7040), (7156, 7164), (7224, 7227), (7242, 7245), (7305, 7312), (7355, 7357), (7368, 7376), (7419, 7424), 7674, (7958, 7960), (7966, 7968), (8006, 8008), (8014, 8016), 8024, 8026, 8028, 8030, (8062, 8064), 8117, 8133, (8148, 8150), 8156, (8176, 8178), 8181, 8191, (8203, 8208), (8234, 8239), (8288, 8304), (8306, 8308), 8335, (8349, 8352), (8384, 8400), (8433, 8448), (8588, 8592), (9255, 9280), (9291, 9312), (11124, 11126), (11158, 11160), 11311, 11359, (11508, 11513), 11558, (11560, 11565), (11566, 11568), (11624, 11631), (11633, 11647), (11671, 11680), 11687, 11695, 11703, 11711, 11719, 11727, 11735, 11743, (11856, 11904), 11930, (12020, 12032), (12246, 12272), (12284, 12288), 12352, (12439, 12441), (12544, 12549), 12592, 12687, (12731, 12736), (12772, 12784), 12831, (19894, 19904), (40944, 40960), (42125, 42128), (42183, 42192), (42540, 42560), (42744, 42752), (42944, 42946), (42951, 42999), (43052, 43056), (43066, 43072), (43128, 43136), (43206, 43214), (43226, 
43232), (43348, 43359), (43389, 43392), 43470, (43482, 43486), 43519, (43575, 43584), (43598, 43600), (43610, 43612), (43715, 43739), (43767, 43777), (43783, 43785), (43791, 43793), (43799, 43808), 43815, 43823, (43880, 43888), (44014, 44016), (44026, 44032), (55204, 55216), (55239, 55243), (55292, 63744), (64110, 64112), (64218, 64256), (64263, 64275), (64280, 64285), 64311, 64317, 64319, 64322, 64325, (64450, 64467), (64832, 64848), (64912, 64914), (64968, 65008), (65022, 65024), (65050, 65056), 65107, 65127, (65132, 65136), 65141, (65277, 65281), (65471, 65474), (65480, 65482), (65488, 65490), (65496, 65498), (65501, 65504), 65511, (65519, 65532), (65534, 65536), 65548, 65575, 65595, 65598, (65614, 65616), (65630, 65664), (65787, 65792), (65795, 65799), (65844, 65847), 65935, (65948, 65952), (65953, 66000), (66046, 66176), (66205, 66208), (66257, 66272), (66300, 66304), (66340, 66349), (66379, 66384), (66427, 66432), 66462, (66500, 66504), (66518, 66560), (66718, 66720), (66730, 66736), (66772, 66776), (66812, 66816), (66856, 66864), (66916, 66927), (66928, 67072), (67383, 67392), (67414, 67424), (67432, 67584), (67590, 67592), 67593, 67638, (67641, 67644), (67645, 67647), 67670, (67743, 67751), (67760, 67808), 67827, (67830, 67835), (67868, 67871), (67898, 67903), (67904, 67968), (68024, 68028), (68048, 68050), 68100, (68103, 68108), 68116, 68120, (68150, 68152), (68155, 68159), (68169, 68176), (68185, 68192), (68256, 68288), (68327, 68331), (68343, 68352), (68406, 68409), (68438, 68440), (68467, 68472), (68498, 68505), (68509, 68521), (68528, 68608), (68681, 68736), (68787, 68800), (68851, 68858), (68904, 68912), (68922, 69216), (69247, 69376), (69416, 69424), (69466, 69600), (69623, 69632), (69710, 69714), (69744, 69759), 69821, (69826, 69840), (69865, 69872), (69882, 69888), 69941, (69959, 69968), (70007, 70016), (70094, 70096), 70112, (70133, 70144), 70162, (70207, 70272), 70279, 70281, 70286, 70302, (70314, 70320), (70379, 70384), (70394, 70400), 70404, 
(70413, 70415), (70417, 70419), 70441, 70449, 70452, 70458, (70469, 70471), (70473, 70475), (70478, 70480), (70481, 70487), (70488, 70493), (70500, 70502), (70509, 70512), (70517, 70656), 70746, 70748, (70752, 70784), (70856, 70864), (70874, 71040), (71094, 71096), (71134, 71168), (71237, 71248), (71258, 71264), (71277, 71296), (71353, 71360), (71370, 71424), (71451, 71453), (71468, 71472), (71488, 71680), (71740, 71840), (71923, 71935), (71936, 72096), (72104, 72106), (72152, 72154), (72165, 72192), (72264, 72272), (72355, 72384), (72441, 72704), 72713, 72759, (72774, 72784), (72813, 72816), (72848, 72850), 72872, (72887, 72960), 72967, 72970, (73015, 73018), 73019, 73022, (73032, 73040), (73050, 73056), 73062, 73065, 73103, 73106, (73113, 73120), (73130, 73440), (73465, 73664), (73714, 73727), (74650, 74752), 74863, (74869, 74880), (75076, 77824), (78895, 82944), (83527, 92160), (92729, 92736), 92767, (92778, 92782), (92784, 92880), (92910, 92912), (92918, 92928), (92998, 93008), 93018, 93026, (93048, 93053), (93072, 93760), (93851, 93952), (94027, 94031), (94088, 94095), (94112, 94176), (94180, 94208), (100344, 100352), (101107, 110592), (110879, 110928), (110931, 110948), (110952, 110960), (111356, 113664), (113771, 113776), (113789, 113792), (113801, 113808), (113818, 113820), (113824, 118784), (119030, 119040), (119079, 119081), (119155, 119163), (119273, 119296), (119366, 119520), (119540, 119552), (119639, 119648), (119673, 119808), 119893, 119965, (119968, 119970), (119971, 119973), (119975, 119977), 119981, 119994, 119996, 120004, 120070, (120075, 120077), 120085, 120093, 120122, 120127, 120133, (120135, 120138), 120145, (120486, 120488), (120780, 120782), (121484, 121499), 121504, (121520, 122880), 122887, (122905, 122907), 122914, 122917, (122923, 123136), (123181, 123184), (123198, 123200), (123210, 123214), (123216, 123584), (123642, 123647), (123648, 124928), (125125, 125127), (125143, 125184), (125260, 125264), (125274, 125278), (125280, 126065), 
(126133, 126209), (126270, 126464), 126468, 126496, 126499, (126501, 126503), 126504, 126515, 126520, 126522, (126524, 126530), (126531, 126535), 126536, 126538, 126540, 126544, 126547, (126549, 126551), 126552, 126554, 126556, 126558, 126560, 126563, (126565, 126567), 126571, 126579, 126584, 126589, 126591, 126602, (126620, 126625), 126628, 126634, (126652, 126704), (126706, 126976), (127020, 127024), (127124, 127136), (127151, 127153), 127168, 127184, (127222, 127232), (127245, 127248), (127341, 127344), (127405, 127462), (127491, 127504), (127548, 127552), (127561, 127568), (127570, 127584), (127590, 127744), (128726, 128736), (128749, 128752), (128763, 128768), (128884, 128896), (128985, 128992), (129004, 129024), (129036, 129040), (129096, 129104), (129114, 129120), (129160, 129168), (129198, 129280), 129292, 129394, (129399, 129402), (129443, 129445), (129451, 129454), (129483, 129485), (129620, 129632), (129646, 129648), (129652, 129656), (129659, 129664), (129667, 129680), (129686, 131072), (173783, 173824), (177973, 177984), (178206, 178208), (183970, 183984), (191457, 194560), (195102, 917760), (918000, 1114112)], 'Cc': [(0, 32), (127, 160)], 'Cf': [173, (1536, 1542), 1564, 1757, 1807, 2274, 6158, (8203, 8208), (8234, 8239), (8288, 8293), (8294, 8304), 65279, (65529, 65532), 69821, 69837, (78896, 78905), (113824, 113828), (119155, 119163), 917505, (917536, 917632)], 'Cn': [(888, 890), (896, 900), 907, 909, 930, 1328, (1367, 1369), (1419, 1421), 1424, (1480, 1488), (1515, 1519), (1525, 1536), 1565, 1806, (1867, 1869), (1970, 1984), (2043, 2045), (2094, 2096), 2111, (2140, 2142), 2143, (2155, 2208), 2229, (2238, 2259), 2436, (2445, 2447), (2449, 2451), 2473, 2481, (2483, 2486), (2490, 2492), (2501, 2503), (2505, 2507), (2511, 2519), (2520, 2524), 2526, (2532, 2534), (2559, 2561), 2564, (2571, 2575), (2577, 2579), 2601, 2609, 2612, 2615, (2618, 2620), 2621, (2627, 2631), (2633, 2635), (2638, 2641), (2642, 2649), 2653, (2655, 2662), (2679, 2689), 2692, 2702, 
2706, 2729, 2737, 2740, (2746, 2748), 2758, 2762, (2766, 2768), (2769, 2784), (2788, 2790), (2802, 2809), 2816, 2820, (2829, 2831), (2833, 2835), 2857, 2865, 2868, (2874, 2876), (2885, 2887), (2889, 2891), (2894, 2902), (2904, 2908), 2910, (2916, 2918), (2936, 2946), 2948, (2955, 2958), 2961, (2966, 2969), 2971, 2973, (2976, 2979), (2981, 2984), (2987, 2990), (3002, 3006), (3011, 3014), 3017, (3022, 3024), (3025, 3031), (3032, 3046), (3067, 3072), 3085, 3089, 3113, (3130, 3133), 3141, 3145, (3150, 3157), 3159, (3163, 3168), (3172, 3174), (3184, 3191), 3213, 3217, 3241, 3252, (3258, 3260), 3269, 3273, (3278, 3285), (3287, 3294), 3295, (3300, 3302), 3312, (3315, 3328), 3332, 3341, 3345, 3397, 3401, (3408, 3412), (3428, 3430), (3456, 3458), 3460, (3479, 3482), 3506, 3516, (3518, 3520), (3527, 3530), (3531, 3535), 3541, 3543, (3552, 3558), (3568, 3570), (3573, 3585), (3643, 3647), (3676, 3713), 3715, 3717, 3723, 3748, 3750, (3774, 3776), 3781, 3783, (3790, 3792), (3802, 3804), (3808, 3840), 3912, (3949, 3953), 3992, 4029, 4045, (4059, 4096), 4294, (4296, 4301), (4302, 4304), 4681, (4686, 4688), 4695, 4697, (4702, 4704), 4745, (4750, 4752), 4785, (4790, 4792), 4799, 4801, (4806, 4808), 4823, 4881, (4886, 4888), (4955, 4957), (4989, 4992), (5018, 5024), (5110, 5112), (5118, 5120), (5789, 5792), (5881, 5888), 5901, (5909, 5920), (5943, 5952), (5972, 5984), 5997, 6001, (6004, 6016), (6110, 6112), (6122, 6128), (6138, 6144), 6159, (6170, 6176), (6265, 6272), (6315, 6320), (6390, 6400), 6431, (6444, 6448), (6460, 6464), (6465, 6468), (6510, 6512), (6517, 6528), (6572, 6576), (6602, 6608), (6619, 6622), (6684, 6686), 6751, (6781, 6783), (6794, 6800), (6810, 6816), (6830, 6832), (6847, 6912), (6988, 6992), (7037, 7040), (7156, 7164), (7224, 7227), (7242, 7245), (7305, 7312), (7355, 7357), (7368, 7376), (7419, 7424), 7674, (7958, 7960), (7966, 7968), (8006, 8008), (8014, 8016), 8024, 8026, 8028, 8030, (8062, 8064), 8117, 8133, (8148, 8150), 8156, (8176, 8178), 8181, 8191, 8293, 
(8306, 8308), 8335, (8349, 8352), (8384, 8400), (8433, 8448), (8588, 8592), (9255, 9280), (9291, 9312), (11124, 11126), (11158, 11160), 11311, 11359, (11508, 11513), 11558, (11560, 11565), (11566, 11568), (11624, 11631), (11633, 11647), (11671, 11680), 11687, 11695, 11703, 11711, 11719, 11727, 11735, 11743, (11856, 11904), 11930, (12020, 12032), (12246, 12272), (12284, 12288), 12352, (12439, 12441), (12544, 12549), 12592, 12687, (12731, 12736), (12772, 12784), 12831, (19894, 19904), (40944, 40960), (42125, 42128), (42183, 42192), (42540, 42560), (42744, 42752), (42944, 42946), (42951, 42999), (43052, 43056), (43066, 43072), (43128, 43136), (43206, 43214), (43226, 43232), (43348, 43359), (43389, 43392), 43470, (43482, 43486), 43519, (43575, 43584), (43598, 43600), (43610, 43612), (43715, 43739), (43767, 43777), (43783, 43785), (43791, 43793), (43799, 43808), 43815, 43823, (43880, 43888), (44014, 44016), (44026, 44032), (55204, 55216), (55239, 55243), (55292, 55296), (64110, 64112), (64218, 64256), (64263, 64275), (64280, 64285), 64311, 64317, 64319, 64322, 64325, (64450, 64467), (64832, 64848), (64912, 64914), (64968, 65008), (65022, 65024), (65050, 65056), 65107, 65127, (65132, 65136), 65141, (65277, 65279), 65280, (65471, 65474), (65480, 65482), (65488, 65490), (65496, 65498), (65501, 65504), 65511, (65519, 65529), (65534, 65536), 65548, 65575, 65595, 65598, (65614, 65616), (65630, 65664), (65787, 65792), (65795, 65799), (65844, 65847), 65935, (65948, 65952), (65953, 66000), (66046, 66176), (66205, 66208), (66257, 66272), (66300, 66304), (66340, 66349), (66379, 66384), (66427, 66432), 66462, (66500, 66504), (66518, 66560), (66718, 66720), (66730, 66736), (66772, 66776), (66812, 66816), (66856, 66864), (66916, 66927), (66928, 67072), (67383, 67392), (67414, 67424), (67432, 67584), (67590, 67592), 67593, 67638, (67641, 67644), (67645, 67647), 67670, (67743, 67751), (67760, 67808), 67827, (67830, 67835), (67868, 67871), (67898, 67903), (67904, 67968), (68024, 68028), 
(68048, 68050), 68100, (68103, 68108), 68116, 68120, (68150, 68152), (68155, 68159), (68169, 68176), (68185, 68192), (68256, 68288), (68327, 68331), (68343, 68352), (68406, 68409), (68438, 68440), (68467, 68472), (68498, 68505), (68509, 68521), (68528, 68608), (68681, 68736), (68787, 68800), (68851, 68858), (68904, 68912), (68922, 69216), (69247, 69376), (69416, 69424), (69466, 69600), (69623, 69632), (69710, 69714), (69744, 69759), (69826, 69837), (69838, 69840), (69865, 69872), (69882, 69888), 69941, (69959, 69968), (70007, 70016), (70094, 70096), 70112, (70133, 70144), 70162, (70207, 70272), 70279, 70281, 70286, 70302, (70314, 70320), (70379, 70384), (70394, 70400), 70404, (70413, 70415), (70417, 70419), 70441, 70449, 70452, 70458, (70469, 70471), (70473, 70475), (70478, 70480), (70481, 70487), (70488, 70493), (70500, 70502), (70509, 70512), (70517, 70656), 70746, 70748, (70752, 70784), (70856, 70864), (70874, 71040), (71094, 71096), (71134, 71168), (71237, 71248), (71258, 71264), (71277, 71296), (71353, 71360), (71370, 71424), (71451, 71453), (71468, 71472), (71488, 71680), (71740, 71840), (71923, 71935), (71936, 72096), (72104, 72106), (72152, 72154), (72165, 72192), (72264, 72272), (72355, 72384), (72441, 72704), 72713, 72759, (72774, 72784), (72813, 72816), (72848, 72850), 72872, (72887, 72960), 72967, 72970, (73015, 73018), 73019, 73022, (73032, 73040), (73050, 73056), 73062, 73065, 73103, 73106, (73113, 73120), (73130, 73440), (73465, 73664), (73714, 73727), (74650, 74752), 74863, (74869, 74880), (75076, 77824), 78895, (78905, 82944), (83527, 92160), (92729, 92736), 92767, (92778, 92782), (92784, 92880), (92910, 92912), (92918, 92928), (92998, 93008), 93018, 93026, (93048, 93053), (93072, 93760), (93851, 93952), (94027, 94031), (94088, 94095), (94112, 94176), (94180, 94208), (100344, 100352), (101107, 110592), (110879, 110928), (110931, 110948), (110952, 110960), (111356, 113664), (113771, 113776), (113789, 113792), (113801, 113808), (113818, 113820), 
(113828, 118784), (119030, 119040), (119079, 119081), (119273, 119296), (119366, 119520), (119540, 119552), (119639, 119648), (119673, 119808), 119893, 119965, (119968, 119970), (119971, 119973), (119975, 119977), 119981, 119994, 119996, 120004, 120070, (120075, 120077), 120085, 120093, 120122, 120127, 120133, (120135, 120138), 120145, (120486, 120488), (120780, 120782), (121484, 121499), 121504, (121520, 122880), 122887, (122905, 122907), 122914, 122917, (122923, 123136), (123181, 123184), (123198, 123200), (123210, 123214), (123216, 123584), (123642, 123647), (123648, 124928), (125125, 125127), (125143, 125184), (125260, 125264), (125274, 125278), (125280, 126065), (126133, 126209), (126270, 126464), 126468, 126496, 126499, (126501, 126503), 126504, 126515, 126520, 126522, (126524, 126530), (126531, 126535), 126536, 126538, 126540, 126544, 126547, (126549, 126551), 126552, 126554, 126556, 126558, 126560, 126563, (126565, 126567), 126571, 126579, 126584, 126589, 126591, 126602, (126620, 126625), 126628, 126634, (126652, 126704), (126706, 126976), (127020, 127024), (127124, 127136), (127151, 127153), 127168, 127184, (127222, 127232), (127245, 127248), (127341, 127344), (127405, 127462), (127491, 127504), (127548, 127552), (127561, 127568), (127570, 127584), (127590, 127744), (128726, 128736), (128749, 128752), (128763, 128768), (128884, 128896), (128985, 128992), (129004, 129024), (129036, 129040), (129096, 129104), (129114, 129120), (129160, 129168), (129198, 129280), 129292, 129394, (129399, 129402), (129443, 129445), (129451, 129454), (129483, 129485), (129620, 129632), (129646, 129648), (129652, 129656), (129659, 129664), (129667, 129680), (129686, 131072), (173783, 173824), (177973, 177984), (178206, 178208), (183970, 183984), (191457, 194560), (195102, 917505), (917506, 917536), (917632, 917760), (918000, 983040), (1048574, 1048576), (1114110, 1114112)], 'Co': [(57344, 63744), (983040, 1048574), (1048576, 1114110)], 'Cs': [(55296, 57344)], 'L': [(65, 91), 
(97, 123), 170, 181, 186, (192, 215), (216, 247), (248, 706), (710, 722), (736, 741), 748, 750, (880, 885), (886, 888), (890, 894), 895, 902, (904, 907), 908, (910, 930), (931, 1014), (1015, 1154), (1162, 1328), (1329, 1367), 1369, (1376, 1417), (1488, 1515), (1519, 1523), (1568, 1611), (1646, 1648), (1649, 1748), 1749, (1765, 1767), (1774, 1776), (1786, 1789), 1791, 1808, (1810, 1840), (1869, 1958), 1969, (1994, 2027), (2036, 2038), 2042, (2048, 2070), 2074, 2084, 2088, (2112, 2137), (2144, 2155), (2208, 2229), (2230, 2238), (2308, 2362), 2365, 2384, (2392, 2402), (2417, 2433), (2437, 2445), (2447, 2449), (2451, 2473), (2474, 2481), 2482, (2486, 2490), 2493, 2510, (2524, 2526), (2527, 2530), (2544, 2546), 2556, (2565, 2571), (2575, 2577), (2579, 2601), (2602, 2609), (2610, 2612), (2613, 2615), (2616, 2618), (2649, 2653), 2654, (2674, 2677), (2693, 2702), (2703, 2706), (2707, 2729), (2730, 2737), (2738, 2740), (2741, 2746), 2749, 2768, (2784, 2786), 2809, (2821, 2829), (2831, 2833), (2835, 2857), (2858, 2865), (2866, 2868), (2869, 2874), 2877, (2908, 2910), (2911, 2914), 2929, 2947, (2949, 2955), (2958, 2961), (2962, 2966), (2969, 2971), 2972, (2974, 2976), (2979, 2981), (2984, 2987), (2990, 3002), 3024, (3077, 3085), (3086, 3089), (3090, 3113), (3114, 3130), 3133, (3160, 3163), (3168, 3170), 3200, (3205, 3213), (3214, 3217), (3218, 3241), (3242, 3252), (3253, 3258), 3261, 3294, (3296, 3298), (3313, 3315), (3333, 3341), (3342, 3345), (3346, 3387), 3389, 3406, (3412, 3415), (3423, 3426), (3450, 3456), (3461, 3479), (3482, 3506), (3507, 3516), 3517, (3520, 3527), (3585, 3633), (3634, 3636), (3648, 3655), (3713, 3715), 3716, (3718, 3723), (3724, 3748), 3749, (3751, 3761), (3762, 3764), 3773, (3776, 3781), 3782, (3804, 3808), 3840, (3904, 3912), (3913, 3949), (3976, 3981), (4096, 4139), 4159, (4176, 4182), (4186, 4190), 4193, (4197, 4199), (4206, 4209), (4213, 4226), 4238, (4256, 4294), 4295, 4301, (4304, 4347), (4348, 4681), (4682, 4686), (4688, 4695), 4696, (4698, 
4702), (4704, 4745), (4746, 4750), (4752, 4785), (4786, 4790), (4792, 4799), 4800, (4802, 4806), (4808, 4823), (4824, 4881), (4882, 4886), (4888, 4955), (4992, 5008), (5024, 5110), (5112, 5118), (5121, 5741), (5743, 5760), (5761, 5787), (5792, 5867), (5873, 5881), (5888, 5901), (5902, 5906), (5920, 5938), (5952, 5970), (5984, 5997), (5998, 6001), (6016, 6068), 6103, 6108, (6176, 6265), (6272, 6277), (6279, 6313), 6314, (6320, 6390), (6400, 6431), (6480, 6510), (6512, 6517), (6528, 6572), (6576, 6602), (6656, 6679), (6688, 6741), 6823, (6917, 6964), (6981, 6988), (7043, 7073), (7086, 7088), (7098, 7142), (7168, 7204), (7245, 7248), (7258, 7294), (7296, 7305), (7312, 7355), (7357, 7360), (7401, 7405), (7406, 7412), (7413, 7415), 7418, (7424, 7616), (7680, 7958), (7960, 7966), (7968, 8006), (8008, 8014), (8016, 8024), 8025, 8027, 8029, (8031, 8062), (8064, 8117), (8118, 8125), 8126, (8130, 8133), (8134, 8141), (8144, 8148), (8150, 8156), (8160, 8173), (8178, 8181), (8182, 8189), 8305, 8319, (8336, 8349), 8450, 8455, (8458, 8468), 8469, (8473, 8478), 8484, 8486, 8488, (8490, 8494), (8495, 8506), (8508, 8512), (8517, 8522), 8526, (8579, 8581), (11264, 11311), (11312, 11359), (11360, 11493), (11499, 11503), (11506, 11508), (11520, 11558), 11559, 11565, (11568, 11624), 11631, (11648, 11671), (11680, 11687), (11688, 11695), (11696, 11703), (11704, 11711), (11712, 11719), (11720, 11727), (11728, 11735), (11736, 11743), 11823, (12293, 12295), (12337, 12342), (12347, 12349), (12353, 12439), (12445, 12448), (12449, 12539), (12540, 12544), (12549, 12592), (12593, 12687), (12704, 12731), (12784, 12800), (13312, 19894), (19968, 40944), (40960, 42125), (42192, 42238), (42240, 42509), (42512, 42528), (42538, 42540), (42560, 42607), (42623, 42654), (42656, 42726), (42775, 42784), (42786, 42889), (42891, 42944), (42946, 42951), (42999, 43010), (43011, 43014), (43015, 43019), (43020, 43043), (43072, 43124), (43138, 43188), (43250, 43256), 43259, (43261, 43263), (43274, 43302), (43312, 
43335), (43360, 43389), (43396, 43443), 43471, (43488, 43493), (43494, 43504), (43514, 43519), (43520, 43561), (43584, 43587), (43588, 43596), (43616, 43639), 43642, (43646, 43696), 43697, (43701, 43703), (43705, 43710), 43712, 43714, (43739, 43742), (43744, 43755), (43762, 43765), (43777, 43783), (43785, 43791), (43793, 43799), (43808, 43815), (43816, 43823), (43824, 43867), (43868, 43880), (43888, 44003), (44032, 55204), (55216, 55239), (55243, 55292), (63744, 64110), (64112, 64218), (64256, 64263), (64275, 64280), 64285, (64287, 64297), (64298, 64311), (64312, 64317), 64318, (64320, 64322), (64323, 64325), (64326, 64434), (64467, 64830), (64848, 64912), (64914, 64968), (65008, 65020), (65136, 65141), (65142, 65277), (65313, 65339), (65345, 65371), (65382, 65471), (65474, 65480), (65482, 65488), (65490, 65496), (65498, 65501), (65536, 65548), (65549, 65575), (65576, 65595), (65596, 65598), (65599, 65614), (65616, 65630), (65664, 65787), (66176, 66205), (66208, 66257), (66304, 66336), (66349, 66369), (66370, 66378), (66384, 66422), (66432, 66462), (66464, 66500), (66504, 66512), (66560, 66718), (66736, 66772), (66776, 66812), (66816, 66856), (66864, 66916), (67072, 67383), (67392, 67414), (67424, 67432), (67584, 67590), 67592, (67594, 67638), (67639, 67641), 67644, (67647, 67670), (67680, 67703), (67712, 67743), (67808, 67827), (67828, 67830), (67840, 67862), (67872, 67898), (67968, 68024), (68030, 68032), 68096, (68112, 68116), (68117, 68120), (68121, 68150), (68192, 68221), (68224, 68253), (68288, 68296), (68297, 68325), (68352, 68406), (68416, 68438), (68448, 68467), (68480, 68498), (68608, 68681), (68736, 68787), (68800, 68851), (68864, 68900), (69376, 69405), 69415, (69424, 69446), (69600, 69623), (69635, 69688), (69763, 69808), (69840, 69865), (69891, 69927), 69956, (69968, 70003), 70006, (70019, 70067), (70081, 70085), 70106, 70108, (70144, 70162), (70163, 70188), (70272, 70279), 70280, (70282, 70286), (70287, 70302), (70303, 70313), (70320, 70367), (70405, 
70413), (70415, 70417), (70419, 70441), (70442, 70449), (70450, 70452), (70453, 70458), 70461, 70480, (70493, 70498), (70656, 70709), (70727, 70731), 70751, (70784, 70832), (70852, 70854), 70855, (71040, 71087), (71128, 71132), (71168, 71216), 71236, (71296, 71339), 71352, (71424, 71451), (71680, 71724), (71840, 71904), 71935, (72096, 72104), (72106, 72145), 72161, 72163, 72192, (72203, 72243), 72250, 72272, (72284, 72330), 72349, (72384, 72441), (72704, 72713), (72714, 72751), 72768, (72818, 72848), (72960, 72967), (72968, 72970), (72971, 73009), 73030, (73056, 73062), (73063, 73065), (73066, 73098), 73112, (73440, 73459), (73728, 74650), (74880, 75076), (77824, 78895), (82944, 83527), (92160, 92729), (92736, 92767), (92880, 92910), (92928, 92976), (92992, 92996), (93027, 93048), (93053, 93072), (93760, 93824), (93952, 94027), 94032, (94099, 94112), (94176, 94178), 94179, (94208, 100344), (100352, 101107), (110592, 110879), (110928, 110931), (110948, 110952), (110960, 111356), (113664, 113771), (113776, 113789), (113792, 113801), (113808, 113818), (119808, 119893), (119894, 119965), (119966, 119968), 119970, (119973, 119975), (119977, 119981), (119982, 119994), 119995, (119997, 120004), (120005, 120070), (120071, 120075), (120077, 120085), (120086, 120093), (120094, 120122), (120123, 120127), (120128, 120133), 120134, (120138, 120145), (120146, 120486), (120488, 120513), (120514, 120539), (120540, 120571), (120572, 120597), (120598, 120629), (120630, 120655), (120656, 120687), (120688, 120713), (120714, 120745), (120746, 120771), (120772, 120780), (123136, 123181), (123191, 123198), 123214, (123584, 123628), (124928, 125125), (125184, 125252), 125259, (126464, 126468), (126469, 126496), (126497, 126499), 126500, 126503, (126505, 126515), (126516, 126520), 126521, 126523, 126530, 126535, 126537, 126539, (126541, 126544), (126545, 126547), 126548, 126551, 126553, 126555, 126557, 126559, (126561, 126563), 126564, (126567, 126571), (126572, 126579), (126580, 126584), 
(126585, 126589), 126590, (126592, 126602), (126603, 126620), (126625, 126628), (126629, 126634), (126635, 126652), (131072, 173783), (173824, 177973), (177984, 178206), (178208, 183970), (183984, 191457), (194560, 195102)], 'Ll': [(97, 123), 181, (223, 247), (248, 256), 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, (311, 313), 314, 316, 318, 320, 322, 324, 326, (328, 330), 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 378, 380, (382, 385), 387, 389, 392, (396, 398), 402, 405, (409, 412), 414, 417, 419, 421, 424, (426, 428), 429, 432, 436, 438, (441, 443), (445, 448), 454, 457, 460, 462, 464, 466, 468, 470, 472, 474, (476, 478), 479, 481, 483, 485, 487, 489, 491, 493, (495, 497), 499, 501, 505, 507, 509, 511, 513, 515, 517, 519, 521, 523, 525, 527, 529, 531, 533, 535, 537, 539, 541, 543, 545, 547, 549, 551, 553, 555, 557, 559, 561, (563, 570), 572, (575, 577), 578, 583, 585, 587, 589, (591, 660), (661, 688), 881, 883, 887, (891, 894), 912, (940, 975), (976, 978), (981, 984), 985, 987, 989, 991, 993, 995, 997, 999, 1001, 1003, 1005, (1007, 1012), 1013, 1016, (1019, 1021), (1072, 1120), 1121, 1123, 1125, 1127, 1129, 1131, 1133, 1135, 1137, 1139, 1141, 1143, 1145, 1147, 1149, 1151, 1153, 1163, 1165, 1167, 1169, 1171, 1173, 1175, 1177, 1179, 1181, 1183, 1185, 1187, 1189, 1191, 1193, 1195, 1197, 1199, 1201, 1203, 1205, 1207, 1209, 1211, 1213, 1215, 1218, 1220, 1222, 1224, 1226, 1228, (1230, 1232), 1233, 1235, 1237, 1239, 1241, 1243, 1245, 1247, 1249, 1251, 1253, 1255, 1257, 1259, 1261, 1263, 1265, 1267, 1269, 1271, 1273, 1275, 1277, 1279, 1281, 1283, 1285, 1287, 1289, 1291, 1293, 1295, 1297, 1299, 1301, 1303, 1305, 1307, 1309, 1311, 1313, 1315, 1317, 1319, 1321, 1323, 1325, 1327, (1376, 1417), (4304, 4347), (4349, 4352), (5112, 5118), (7296, 7305), (7424, 7468), (7531, 7544), (7545, 7579), 7681, 7683, 7685, 7687, 
7689, 7691, 7693, 7695, 7697, 7699, 7701, 7703, 7705, 7707, 7709, 7711, 7713, 7715, 7717, 7719, 7721, 7723, 7725, 7727, 7729, 7731, 7733, 7735, 7737, 7739, 7741, 7743, 7745, 7747, 7749, 7751, 7753, 7755, 7757, 7759, 7761, 7763, 7765, 7767, 7769, 7771, 7773, 7775, 7777, 7779, 7781, 7783, 7785, 7787, 7789, 7791, 7793, 7795, 7797, 7799, 7801, 7803, 7805, 7807, 7809, 7811, 7813, 7815, 7817, 7819, 7821, 7823, 7825, 7827, (7829, 7838), 7839, 7841, 7843, 7845, 7847, 7849, 7851, 7853, 7855, 7857, 7859, 7861, 7863, 7865, 7867, 7869, 7871, 7873, 7875, 7877, 7879, 7881, 7883, 7885, 7887, 7889, 7891, 7893, 7895, 7897, 7899, 7901, 7903, 7905, 7907, 7909, 7911, 7913, 7915, 7917, 7919, 7921, 7923, 7925, 7927, 7929, 7931, 7933, (7935, 7944), (7952, 7958), (7968, 7976), (7984, 7992), (8000, 8006), (8016, 8024), (8032, 8040), (8048, 8062), (8064, 8072), (8080, 8088), (8096, 8104), (8112, 8117), (8118, 8120), 8126, (8130, 8133), (8134, 8136), (8144, 8148), (8150, 8152), (8160, 8168), (8178, 8181), (8182, 8184), 8458, (8462, 8464), 8467, 8495, 8500, 8505, (8508, 8510), (8518, 8522), 8526, 8580, (11312, 11359), 11361, (11365, 11367), 11368, 11370, 11372, 11377, (11379, 11381), (11382, 11388), 11393, 11395, 11397, 11399, 11401, 11403, 11405, 11407, 11409, 11411, 11413, 11415, 11417, 11419, 11421, 11423, 11425, 11427, 11429, 11431, 11433, 11435, 11437, 11439, 11441, 11443, 11445, 11447, 11449, 11451, 11453, 11455, 11457, 11459, 11461, 11463, 11465, 11467, 11469, 11471, 11473, 11475, 11477, 11479, 11481, 11483, 11485, 11487, 11489, (11491, 11493), 11500, 11502, 11507, (11520, 11558), 11559, 11565, 42561, 42563, 42565, 42567, 42569, 42571, 42573, 42575, 42577, 42579, 42581, 42583, 42585, 42587, 42589, 42591, 42593, 42595, 42597, 42599, 42601, 42603, 42605, 42625, 42627, 42629, 42631, 42633, 42635, 42637, 42639, 42641, 42643, 42645, 42647, 42649, 42651, 42787, 42789, 42791, 42793, 42795, 42797, (42799, 42802), 42803, 42805, 42807, 42809, 42811, 42813, 42815, 42817, 42819, 42821, 42823, 
42825, 42827, 42829, 42831, 42833, 42835, 42837, 42839, 42841, 42843, 42845, 42847, 42849, 42851, 42853, 42855, 42857, 42859, 42861, 42863, (42865, 42873), 42874, 42876, 42879, 42881, 42883, 42885, 42887, 42892, 42894, 42897, (42899, 42902), 42903, 42905, 42907, 42909, 42911, 42913, 42915, 42917, 42919, 42921, 42927, 42933, 42935, 42937, 42939, 42941, 42943, 42947, 43002, (43824, 43867), (43872, 43880), (43888, 43968), (64256, 64263), (64275, 64280), (65345, 65371), (66600, 66640), (66776, 66812), (68800, 68851), (71872, 71904), (93792, 93824), (119834, 119860), (119886, 119893), (119894, 119912), (119938, 119964), (119990, 119994), 119995, (119997, 120004), (120005, 120016), (120042, 120068), (120094, 120120), (120146, 120172), (120198, 120224), (120250, 120276), (120302, 120328), (120354, 120380), (120406, 120432), (120458, 120486), (120514, 120539), (120540, 120546), (120572, 120597), (120598, 120604), (120630, 120655), (120656, 120662), (120688, 120713), (120714, 120720), (120746, 120771), (120772, 120778), 120779, (125218, 125252)], 'Lm': [(688, 706), (710, 722), (736, 741), 748, 750, 884, 890, 1369, 1600, (1765, 1767), (2036, 2038), 2042, 2074, 2084, 2088, 2417, 3654, 3782, 4348, 6103, 6211, 6823, (7288, 7294), (7468, 7531), 7544, (7579, 7616), 8305, 8319, (8336, 8349), (11388, 11390), 11631, 11823, 12293, (12337, 12342), 12347, (12445, 12447), (12540, 12543), 40981, (42232, 42238), 42508, 42623, (42652, 42654), (42775, 42784), 42864, 42888, (43000, 43002), 43471, 43494, 43632, 43741, (43763, 43765), (43868, 43872), 65392, (65438, 65440), (92992, 92996), (94099, 94112), (94176, 94178), 94179, (123191, 123198), 125259], 'Lo': [170, 186, 443, (448, 452), 660, (1488, 1515), (1519, 1523), (1568, 1600), (1601, 1611), (1646, 1648), (1649, 1748), 1749, (1774, 1776), (1786, 1789), 1791, 1808, (1810, 1840), (1869, 1958), 1969, (1994, 2027), (2048, 2070), (2112, 2137), (2144, 2155), (2208, 2229), (2230, 2238), (2308, 2362), 2365, 2384, (2392, 2402), (2418, 2433), 
(2437, 2445), (2447, 2449), (2451, 2473), (2474, 2481), 2482, (2486, 2490), 2493, 2510, (2524, 2526), (2527, 2530), (2544, 2546), 2556, (2565, 2571), (2575, 2577), (2579, 2601), (2602, 2609), (2610, 2612), (2613, 2615), (2616, 2618), (2649, 2653), 2654, (2674, 2677), (2693, 2702), (2703, 2706), (2707, 2729), (2730, 2737), (2738, 2740), (2741, 2746), 2749, 2768, (2784, 2786), 2809, (2821, 2829), (2831, 2833), (2835, 2857), (2858, 2865), (2866, 2868), (2869, 2874), 2877, (2908, 2910), (2911, 2914), 2929, 2947, (2949, 2955), (2958, 2961), (2962, 2966), (2969, 2971), 2972, (2974, 2976), (2979, 2981), (2984, 2987), (2990, 3002), 3024, (3077, 3085), (3086, 3089), (3090, 3113), (3114, 3130), 3133, (3160, 3163), (3168, 3170), 3200, (3205, 3213), (3214, 3217), (3218, 3241), (3242, 3252), (3253, 3258), 3261, 3294, (3296, 3298), (3313, 3315), (3333, 3341), (3342, 3345), (3346, 3387), 3389, 3406, (3412, 3415), (3423, 3426), (3450, 3456), (3461, 3479), (3482, 3506), (3507, 3516), 3517, (3520, 3527), (3585, 3633), (3634, 3636), (3648, 3654), (3713, 3715), 3716, (3718, 3723), (3724, 3748), 3749, (3751, 3761), (3762, 3764), 3773, (3776, 3781), (3804, 3808), 3840, (3904, 3912), (3913, 3949), (3976, 3981), (4096, 4139), 4159, (4176, 4182), (4186, 4190), 4193, (4197, 4199), (4206, 4209), (4213, 4226), 4238, (4352, 4681), (4682, 4686), (4688, 4695), 4696, (4698, 4702), (4704, 4745), (4746, 4750), (4752, 4785), (4786, 4790), (4792, 4799), 4800, (4802, 4806), (4808, 4823), (4824, 4881), (4882, 4886), (4888, 4955), (4992, 5008), (5121, 5741), (5743, 5760), (5761, 5787), (5792, 5867), (5873, 5881), (5888, 5901), (5902, 5906), (5920, 5938), (5952, 5970), (5984, 5997), (5998, 6001), (6016, 6068), 6108, (6176, 6211), (6212, 6265), (6272, 6277), (6279, 6313), 6314, (6320, 6390), (6400, 6431), (6480, 6510), (6512, 6517), (6528, 6572), (6576, 6602), (6656, 6679), (6688, 6741), (6917, 6964), (6981, 6988), (7043, 7073), (7086, 7088), (7098, 7142), (7168, 7204), (7245, 7248), (7258, 7288), (7401, 
7405), (7406, 7412), (7413, 7415), 7418, (8501, 8505), (11568, 11624), (11648, 11671), (11680, 11687), (11688, 11695), (11696, 11703), (11704, 11711), (11712, 11719), (11720, 11727), (11728, 11735), (11736, 11743), 12294, 12348, (12353, 12439), 12447, (12449, 12539), 12543, (12549, 12592), (12593, 12687), (12704, 12731), (12784, 12800), (13312, 19894), (19968, 40944), (40960, 40981), (40982, 42125), (42192, 42232), (42240, 42508), (42512, 42528), (42538, 42540), 42606, (42656, 42726), 42895, 42999, (43003, 43010), (43011, 43014), (43015, 43019), (43020, 43043), (43072, 43124), (43138, 43188), (43250, 43256), 43259, (43261, 43263), (43274, 43302), (43312, 43335), (43360, 43389), (43396, 43443), (43488, 43493), (43495, 43504), (43514, 43519), (43520, 43561), (43584, 43587), (43588, 43596), (43616, 43632), (43633, 43639), 43642, (43646, 43696), 43697, (43701, 43703), (43705, 43710), 43712, 43714, (43739, 43741), (43744, 43755), 43762, (43777, 43783), (43785, 43791), (43793, 43799), (43808, 43815), (43816, 43823), (43968, 44003), (44032, 55204), (55216, 55239), (55243, 55292), (63744, 64110), (64112, 64218), 64285, (64287, 64297), (64298, 64311), (64312, 64317), 64318, (64320, 64322), (64323, 64325), (64326, 64434), (64467, 64830), (64848, 64912), (64914, 64968), (65008, 65020), (65136, 65141), (65142, 65277), (65382, 65392), (65393, 65438), (65440, 65471), (65474, 65480), (65482, 65488), (65490, 65496), (65498, 65501), (65536, 65548), (65549, 65575), (65576, 65595), (65596, 65598), (65599, 65614), (65616, 65630), (65664, 65787), (66176, 66205), (66208, 66257), (66304, 66336), (66349, 66369), (66370, 66378), (66384, 66422), (66432, 66462), (66464, 66500), (66504, 66512), (66640, 66718), (66816, 66856), (66864, 66916), (67072, 67383), (67392, 67414), (67424, 67432), (67584, 67590), 67592, (67594, 67638), (67639, 67641), 67644, (67647, 67670), (67680, 67703), (67712, 67743), (67808, 67827), (67828, 67830), (67840, 67862), (67872, 67898), (67968, 68024), (68030, 68032), 
68096, (68112, 68116), (68117, 68120), (68121, 68150), (68192, 68221), (68224, 68253), (68288, 68296), (68297, 68325), (68352, 68406), (68416, 68438), (68448, 68467), (68480, 68498), (68608, 68681), (68864, 68900), (69376, 69405), 69415, (69424, 69446), (69600, 69623), (69635, 69688), (69763, 69808), (69840, 69865), (69891, 69927), 69956, (69968, 70003), 70006, (70019, 70067), (70081, 70085), 70106, 70108, (70144, 70162), (70163, 70188), (70272, 70279), 70280, (70282, 70286), (70287, 70302), (70303, 70313), (70320, 70367), (70405, 70413), (70415, 70417), (70419, 70441), (70442, 70449), (70450, 70452), (70453, 70458), 70461, 70480, (70493, 70498), (70656, 70709), (70727, 70731), 70751, (70784, 70832), (70852, 70854), 70855, (71040, 71087), (71128, 71132), (71168, 71216), 71236, (71296, 71339), 71352, (71424, 71451), (71680, 71724), 71935, (72096, 72104), (72106, 72145), 72161, 72163, 72192, (72203, 72243), 72250, 72272, (72284, 72330), 72349, (72384, 72441), (72704, 72713), (72714, 72751), 72768, (72818, 72848), (72960, 72967), (72968, 72970), (72971, 73009), 73030, (73056, 73062), (73063, 73065), (73066, 73098), 73112, (73440, 73459), (73728, 74650), (74880, 75076), (77824, 78895), (82944, 83527), (92160, 92729), (92736, 92767), (92880, 92910), (92928, 92976), (93027, 93048), (93053, 93072), (93952, 94027), 94032, (94208, 100344), (100352, 101107), (110592, 110879), (110928, 110931), (110948, 110952), (110960, 111356), (113664, 113771), (113776, 113789), (113792, 113801), (113808, 113818), (123136, 123181), 123214, (123584, 123628), (124928, 125125), (126464, 126468), (126469, 126496), (126497, 126499), 126500, 126503, (126505, 126515), (126516, 126520), 126521, 126523, 126530, 126535, 126537, 126539, (126541, 126544), (126545, 126547), 126548, 126551, 126553, 126555, 126557, 126559, (126561, 126563), 126564, (126567, 126571), (126572, 126579), (126580, 126584), (126585, 126589), 126590, (126592, 126602), (126603, 126620), (126625, 126628), (126629, 126634), 
(126635, 126652), (131072, 173783), (173824, 177973), (177984, 178206), (178208, 183970), (183984, 191457), (194560, 195102)], 'Lt': [453, 456, 459, 498, (8072, 8080), (8088, 8096), (8104, 8112), 8124, 8140, 8188], 'Lu': [(65, 91), (192, 215), (216, 223), 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 313, 315, 317, 319, 321, 323, 325, 327, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, (376, 378), 379, 381, (385, 387), 388, (390, 392), (393, 396), (398, 402), (403, 405), (406, 409), (412, 414), (415, 417), 418, 420, (422, 424), 425, 428, (430, 432), (433, 436), 437, (439, 441), 444, 452, 455, 458, 461, 463, 465, 467, 469, 471, 473, 475, 478, 480, 482, 484, 486, 488, 490, 492, 494, 497, 500, (502, 505), 506, 508, 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, 532, 534, 536, 538, 540, 542, 544, 546, 548, 550, 552, 554, 556, 558, 560, 562, (570, 572), (573, 575), 577, (579, 583), 584, 586, 588, 590, 880, 882, 886, 895, 902, (904, 907), 908, (910, 912), (913, 930), (931, 940), 975, (978, 981), 984, 986, 988, 990, 992, 994, 996, 998, 1000, 1002, 1004, 1006, 1012, 1015, (1017, 1019), (1021, 1072), 1120, 1122, 1124, 1126, 1128, 1130, 1132, 1134, 1136, 1138, 1140, 1142, 1144, 1146, 1148, 1150, 1152, 1162, 1164, 1166, 1168, 1170, 1172, 1174, 1176, 1178, 1180, 1182, 1184, 1186, 1188, 1190, 1192, 1194, 1196, 1198, 1200, 1202, 1204, 1206, 1208, 1210, 1212, 1214, (1216, 1218), 1219, 1221, 1223, 1225, 1227, 1229, 1232, 1234, 1236, 1238, 1240, 1242, 1244, 1246, 1248, 1250, 1252, 1254, 1256, 1258, 1260, 1262, 1264, 1266, 1268, 1270, 1272, 1274, 1276, 1278, 1280, 1282, 1284, 1286, 1288, 1290, 1292, 1294, 1296, 1298, 1300, 1302, 1304, 1306, 1308, 1310, 1312, 1314, 1316, 1318, 1320, 1322, 1324, 1326, (1329, 1367), (4256, 4294), 4295, 4301, (5024, 5110), (7312, 7355), (7357, 7360), 7680, 7682, 7684, 7686, 7688, 7690, 
7692, 7694, 7696, 7698, 7700, 7702, 7704, 7706, 7708, 7710, 7712, 7714, 7716, 7718, 7720, 7722, 7724, 7726, 7728, 7730, 7732, 7734, 7736, 7738, 7740, 7742, 7744, 7746, 7748, 7750, 7752, 7754, 7756, 7758, 7760, 7762, 7764, 7766, 7768, 7770, 7772, 7774, 7776, 7778, 7780, 7782, 7784, 7786, 7788, 7790, 7792, 7794, 7796, 7798, 7800, 7802, 7804, 7806, 7808, 7810, 7812, 7814, 7816, 7818, 7820, 7822, 7824, 7826, 7828, 7838, 7840, 7842, 7844, 7846, 7848, 7850, 7852, 7854, 7856, 7858, 7860, 7862, 7864, 7866, 7868, 7870, 7872, 7874, 7876, 7878, 7880, 7882, 7884, 7886, 7888, 7890, 7892, 7894, 7896, 7898, 7900, 7902, 7904, 7906, 7908, 7910, 7912, 7914, 7916, 7918, 7920, 7922, 7924, 7926, 7928, 7930, 7932, 7934, (7944, 7952), (7960, 7966), (7976, 7984), (7992, 8000), (8008, 8014), 8025, 8027, 8029, 8031, (8040, 8048), (8120, 8124), (8136, 8140), (8152, 8156), (8168, 8173), (8184, 8188), 8450, 8455, (8459, 8462), (8464, 8467), 8469, (8473, 8478), 8484, 8486, 8488, (8490, 8494), (8496, 8500), (8510, 8512), 8517, 8579, (11264, 11311), 11360, (11362, 11365), 11367, 11369, 11371, (11373, 11377), 11378, 11381, (11390, 11393), 11394, 11396, 11398, 11400, 11402, 11404, 11406, 11408, 11410, 11412, 11414, 11416, 11418, 11420, 11422, 11424, 11426, 11428, 11430, 11432, 11434, 11436, 11438, 11440, 11442, 11444, 11446, 11448, 11450, 11452, 11454, 11456, 11458, 11460, 11462, 11464, 11466, 11468, 11470, 11472, 11474, 11476, 11478, 11480, 11482, 11484, 11486, 11488, 11490, 11499, 11501, 11506, 42560, 42562, 42564, 42566, 42568, 42570, 42572, 42574, 42576, 42578, 42580, 42582, 42584, 42586, 42588, 42590, 42592, 42594, 42596, 42598, 42600, 42602, 42604, 42624, 42626, 42628, 42630, 42632, 42634, 42636, 42638, 42640, 42642, 42644, 42646, 42648, 42650, 42786, 42788, 42790, 42792, 42794, 42796, 42798, 42802, 42804, 42806, 42808, 42810, 42812, 42814, 42816, 42818, 42820, 42822, 42824, 42826, 42828, 42830, 42832, 42834, 42836, 42838, 42840, 42842, 42844, 42846, 42848, 42850, 42852, 42854, 42856, 42858, 
42860, 42862, 42873, 42875, (42877, 42879), 42880, 42882, 42884, 42886, 42891, 42893, 42896, 42898, 42902, 42904, 42906, 42908, 42910, 42912, 42914, 42916, 42918, 42920, (42922, 42927), (42928, 42933), 42934, 42936, 42938, 42940, 42942, 42946, (42948, 42951), (65313, 65339), (66560, 66600), (66736, 66772), (68736, 68787), (71840, 71872), (93760, 93792), (119808, 119834), (119860, 119886), (119912, 119938), 119964, (119966, 119968), 119970, (119973, 119975), (119977, 119981), (119982, 119990), (120016, 120042), (120068, 120070), (120071, 120075), (120077, 120085), (120086, 120093), (120120, 120122), (120123, 120127), (120128, 120133), 120134, (120138, 120145), (120172, 120198), (120224, 120250), (120276, 120302), (120328, 120354), (120380, 120406), (120432, 120458), (120488, 120513), (120546, 120571), (120604, 120629), (120662, 120687), (120720, 120745), 120778, (125184, 125218)], 'M': [(768, 880), (1155, 1162), (1425, 1470), 1471, (1473, 1475), (1476, 1478), 1479, (1552, 1563), (1611, 1632), 1648, (1750, 1757), (1759, 1765), (1767, 1769), (1770, 1774), 1809, (1840, 1867), (1958, 1969), (2027, 2036), 2045, (2070, 2074), (2075, 2084), (2085, 2088), (2089, 2094), (2137, 2140), (2259, 2274), (2275, 2308), (2362, 2365), (2366, 2384), (2385, 2392), (2402, 2404), (2433, 2436), 2492, (2494, 2501), (2503, 2505), (2507, 2510), 2519, (2530, 2532), 2558, (2561, 2564), 2620, (2622, 2627), (2631, 2633), (2635, 2638), 2641, (2672, 2674), 2677, (2689, 2692), 2748, (2750, 2758), (2759, 2762), (2763, 2766), (2786, 2788), (2810, 2816), (2817, 2820), 2876, (2878, 2885), (2887, 2889), (2891, 2894), (2902, 2904), (2914, 2916), 2946, (3006, 3011), (3014, 3017), (3018, 3022), 3031, (3072, 3077), (3134, 3141), (3142, 3145), (3146, 3150), (3157, 3159), (3170, 3172), (3201, 3204), 3260, (3262, 3269), (3270, 3273), (3274, 3278), (3285, 3287), (3298, 3300), (3328, 3332), (3387, 3389), (3390, 3397), (3398, 3401), (3402, 3406), 3415, (3426, 3428), (3458, 3460), 3530, (3535, 3541), 3542, (3544, 
3552), (3570, 3572), 3633, (3636, 3643), (3655, 3663), 3761, (3764, 3773), (3784, 3790), (3864, 3866), 3893, 3895, 3897, (3902, 3904), (3953, 3973), (3974, 3976), (3981, 3992), (3993, 4029), 4038, (4139, 4159), (4182, 4186), (4190, 4193), (4194, 4197), (4199, 4206), (4209, 4213), (4226, 4238), 4239, (4250, 4254), (4957, 4960), (5906, 5909), (5938, 5941), (5970, 5972), (6002, 6004), (6068, 6100), 6109, (6155, 6158), (6277, 6279), 6313, (6432, 6444), (6448, 6460), (6679, 6684), (6741, 6751), (6752, 6781), 6783, (6832, 6847), (6912, 6917), (6964, 6981), (7019, 7028), (7040, 7043), (7073, 7086), (7142, 7156), (7204, 7224), (7376, 7379), (7380, 7401), 7405, 7412, (7415, 7418), (7616, 7674), (7675, 7680), (8400, 8433), (11503, 11506), 11647, (11744, 11776), (12330, 12336), (12441, 12443), (42607, 42611), (42612, 42622), (42654, 42656), (42736, 42738), 43010, 43014, 43019, (43043, 43048), (43136, 43138), (43188, 43206), (43232, 43250), 43263, (43302, 43310), (43335, 43348), (43392, 43396), (43443, 43457), 43493, (43561, 43575), 43587, (43596, 43598), (43643, 43646), 43696, (43698, 43701), (43703, 43705), (43710, 43712), 43713, (43755, 43760), (43765, 43767), (44003, 44011), (44012, 44014), 64286, (65024, 65040), (65056, 65072), 66045, 66272, (66422, 66427), (68097, 68100), (68101, 68103), (68108, 68112), (68152, 68155), 68159, (68325, 68327), (68900, 68904), (69446, 69457), (69632, 69635), (69688, 69703), (69759, 69763), (69808, 69819), (69888, 69891), (69927, 69941), (69957, 69959), 70003, (70016, 70019), (70067, 70081), (70089, 70093), (70188, 70200), 70206, (70367, 70379), (70400, 70404), (70459, 70461), (70462, 70469), (70471, 70473), (70475, 70478), 70487, (70498, 70500), (70502, 70509), (70512, 70517), (70709, 70727), 70750, (70832, 70852), (71087, 71094), (71096, 71105), (71132, 71134), (71216, 71233), (71339, 71352), (71453, 71468), (71724, 71739), (72145, 72152), (72154, 72161), 72164, (72193, 72203), (72243, 72250), (72251, 72255), 72263, (72273, 72284), (72330, 
72346), (72751, 72759), (72760, 72768), (72850, 72872), (72873, 72887), (73009, 73015), 73018, (73020, 73022), (73023, 73030), 73031, (73098, 73103), (73104, 73106), (73107, 73112), (73459, 73463), (92912, 92917), (92976, 92983), 94031, (94033, 94088), (94095, 94099), (113821, 113823), (119141, 119146), (119149, 119155), (119163, 119171), (119173, 119180), (119210, 119214), (119362, 119365), (121344, 121399), (121403, 121453), 121461, 121476, (121499, 121504), (121505, 121520), (122880, 122887), (122888, 122905), (122907, 122914), (122915, 122917), (122918, 122923), (123184, 123191), (123628, 123632), (125136, 125143), (125252, 125259), (917760, 918000)], 'Mc': [2307, 2363, (2366, 2369), (2377, 2381), (2382, 2384), (2434, 2436), (2494, 2497), (2503, 2505), (2507, 2509), 2519, 2563, (2622, 2625), 2691, (2750, 2753), 2761, (2763, 2765), (2818, 2820), 2878, 2880, (2887, 2889), (2891, 2893), 2903, (3006, 3008), (3009, 3011), (3014, 3017), (3018, 3021), 3031, (3073, 3076), (3137, 3141), (3202, 3204), 3262, (3264, 3269), (3271, 3273), (3274, 3276), (3285, 3287), (3330, 3332), (3390, 3393), (3398, 3401), (3402, 3405), 3415, (3458, 3460), (3535, 3538), (3544, 3552), (3570, 3572), (3902, 3904), 3967, (4139, 4141), 4145, 4152, (4155, 4157), (4182, 4184), (4194, 4197), (4199, 4206), (4227, 4229), (4231, 4237), 4239, (4250, 4253), 6070, (6078, 6086), (6087, 6089), (6435, 6439), (6441, 6444), (6448, 6450), (6451, 6457), (6681, 6683), 6741, 6743, 6753, (6755, 6757), (6765, 6771), 6916, 6965, 6971, (6973, 6978), (6979, 6981), 7042, 7073, (7078, 7080), 7082, 7143, (7146, 7149), 7150, (7154, 7156), (7204, 7212), (7220, 7222), 7393, 7415, (12334, 12336), (43043, 43045), 43047, (43136, 43138), (43188, 43204), (43346, 43348), 43395, (43444, 43446), (43450, 43452), (43454, 43457), (43567, 43569), (43571, 43573), 43597, 43643, 43645, 43755, (43758, 43760), 43765, (44003, 44005), (44006, 44008), (44009, 44011), 44012, 69632, 69634, 69762, (69808, 69811), (69815, 69817), 69932, (69957, 
69959), 70018, (70067, 70070), (70079, 70081), (70188, 70191), (70194, 70196), 70197, (70368, 70371), (70402, 70404), (70462, 70464), (70465, 70469), (70471, 70473), (70475, 70478), 70487, (70498, 70500), (70709, 70712), (70720, 70722), 70725, (70832, 70835), 70841, (70843, 70847), 70849, (71087, 71090), (71096, 71100), 71102, (71216, 71219), (71227, 71229), 71230, 71340, (71342, 71344), 71350, (71456, 71458), 71462, (71724, 71727), 71736, (72145, 72148), (72156, 72160), 72164, 72249, (72279, 72281), 72343, 72751, 72766, 72873, 72881, 72884, (73098, 73103), (73107, 73109), 73110, (73461, 73463), (94033, 94088), (119141, 119143), (119149, 119155)], 'Me': [(1160, 1162), 6846, (8413, 8417), (8418, 8421), (42608, 42611)], 'Mn': [(768, 880), (1155, 1160), (1425, 1470), 1471, (1473, 1475), (1476, 1478), 1479, (1552, 1563), (1611, 1632), 1648, (1750, 1757), (1759, 1765), (1767, 1769), (1770, 1774), 1809, (1840, 1867), (1958, 1969), (2027, 2036), 2045, (2070, 2074), (2075, 2084), (2085, 2088), (2089, 2094), (2137, 2140), (2259, 2274), (2275, 2307), 2362, 2364, (2369, 2377), 2381, (2385, 2392), (2402, 2404), 2433, 2492, (2497, 2501), 2509, (2530, 2532), 2558, (2561, 2563), 2620, (2625, 2627), (2631, 2633), (2635, 2638), 2641, (2672, 2674), 2677, (2689, 2691), 2748, (2753, 2758), (2759, 2761), 2765, (2786, 2788), (2810, 2816), 2817, 2876, 2879, (2881, 2885), 2893, 2902, (2914, 2916), 2946, 3008, 3021, 3072, 3076, (3134, 3137), (3142, 3145), (3146, 3150), (3157, 3159), (3170, 3172), 3201, 3260, 3263, 3270, (3276, 3278), (3298, 3300), (3328, 3330), (3387, 3389), (3393, 3397), 3405, (3426, 3428), 3530, (3538, 3541), 3542, 3633, (3636, 3643), (3655, 3663), 3761, (3764, 3773), (3784, 3790), (3864, 3866), 3893, 3895, 3897, (3953, 3967), (3968, 3973), (3974, 3976), (3981, 3992), (3993, 4029), 4038, (4141, 4145), (4146, 4152), (4153, 4155), (4157, 4159), (4184, 4186), (4190, 4193), (4209, 4213), 4226, (4229, 4231), 4237, 4253, (4957, 4960), (5906, 5909), (5938, 5941), (5970, 5972), 
(6002, 6004), (6068, 6070), (6071, 6078), 6086, (6089, 6100), 6109, (6155, 6158), (6277, 6279), 6313, (6432, 6435), (6439, 6441), 6450, (6457, 6460), (6679, 6681), 6683, 6742, (6744, 6751), 6752, 6754, (6757, 6765), (6771, 6781), 6783, (6832, 6846), (6912, 6916), 6964, (6966, 6971), 6972, 6978, (7019, 7028), (7040, 7042), (7074, 7078), (7080, 7082), (7083, 7086), 7142, (7144, 7146), 7149, (7151, 7154), (7212, 7220), (7222, 7224), (7376, 7379), (7380, 7393), (7394, 7401), 7405, 7412, (7416, 7418), (7616, 7674), (7675, 7680), (8400, 8413), 8417, (8421, 8433), (11503, 11506), 11647, (11744, 11776), (12330, 12334), (12441, 12443), 42607, (42612, 42622), (42654, 42656), (42736, 42738), 43010, 43014, 43019, (43045, 43047), (43204, 43206), (43232, 43250), 43263, (43302, 43310), (43335, 43346), (43392, 43395), 43443, (43446, 43450), (43452, 43454), 43493, (43561, 43567), (43569, 43571), (43573, 43575), 43587, 43596, 43644, 43696, (43698, 43701), (43703, 43705), (43710, 43712), 43713, (43756, 43758), 43766, 44005, 44008, 44013, 64286, (65024, 65040), (65056, 65072), 66045, 66272, (66422, 66427), (68097, 68100), (68101, 68103), (68108, 68112), (68152, 68155), 68159, (68325, 68327), (68900, 68904), (69446, 69457), 69633, (69688, 69703), (69759, 69762), (69811, 69815), (69817, 69819), (69888, 69891), (69927, 69932), (69933, 69941), 70003, (70016, 70018), (70070, 70079), (70089, 70093), (70191, 70194), 70196, (70198, 70200), 70206, 70367, (70371, 70379), (70400, 70402), (70459, 70461), 70464, (70502, 70509), (70512, 70517), (70712, 70720), (70722, 70725), 70726, 70750, (70835, 70841), 70842, (70847, 70849), (70850, 70852), (71090, 71094), (71100, 71102), (71103, 71105), (71132, 71134), (71219, 71227), 71229, (71231, 71233), 71339, 71341, (71344, 71350), 71351, (71453, 71456), (71458, 71462), (71463, 71468), (71727, 71736), (71737, 71739), (72148, 72152), (72154, 72156), 72160, (72193, 72203), (72243, 72249), (72251, 72255), 72263, (72273, 72279), (72281, 72284), (72330, 72343), 
(72344, 72346), (72752, 72759), (72760, 72766), 72767, (72850, 72872), (72874, 72881), (72882, 72884), (72885, 72887), (73009, 73015), 73018, (73020, 73022), (73023, 73030), 73031, (73104, 73106), 73109, 73111, (73459, 73461), (92912, 92917), (92976, 92983), 94031, (94095, 94099), (113821, 113823), (119143, 119146), (119163, 119171), (119173, 119180), (119210, 119214), (119362, 119365), (121344, 121399), (121403, 121453), 121461, 121476, (121499, 121504), (121505, 121520), (122880, 122887), (122888, 122905), (122907, 122914), (122915, 122917), (122918, 122923), (123184, 123191), (123628, 123632), (125136, 125143), (125252, 125259), (917760, 918000)], 'N': [(48, 58), (178, 180), 185, (188, 191), (1632, 1642), (1776, 1786), (1984, 1994), (2406, 2416), (2534, 2544), (2548, 2554), (2662, 2672), (2790, 2800), (2918, 2928), (2930, 2936), (3046, 3059), (3174, 3184), (3192, 3199), (3302, 3312), (3416, 3423), (3430, 3449), (3558, 3568), (3664, 3674), (3792, 3802), (3872, 3892), (4160, 4170), (4240, 4250), (4969, 4989), (5870, 5873), (6112, 6122), (6128, 6138), (6160, 6170), (6470, 6480), (6608, 6619), (6784, 6794), (6800, 6810), (6992, 7002), (7088, 7098), (7232, 7242), (7248, 7258), 8304, (8308, 8314), (8320, 8330), (8528, 8579), (8581, 8586), (9312, 9372), (9450, 9472), (10102, 10132), 11517, 12295, (12321, 12330), (12344, 12347), (12690, 12694), (12832, 12842), (12872, 12880), (12881, 12896), (12928, 12938), (12977, 12992), (42528, 42538), (42726, 42736), (43056, 43062), (43216, 43226), (43264, 43274), (43472, 43482), (43504, 43514), (43600, 43610), (44016, 44026), (65296, 65306), (65799, 65844), (65856, 65913), (65930, 65932), (66273, 66300), (66336, 66340), 66369, 66378, (66513, 66518), (66720, 66730), (67672, 67680), (67705, 67712), (67751, 67760), (67835, 67840), (67862, 67868), (68028, 68030), (68032, 68048), (68050, 68096), (68160, 68169), (68221, 68223), (68253, 68256), (68331, 68336), (68440, 68448), (68472, 68480), (68521, 68528), (68858, 68864), (68912, 68922), 
(69216, 69247), (69405, 69415), (69457, 69461), (69714, 69744), (69872, 69882), (69942, 69952), (70096, 70106), (70113, 70133), (70384, 70394), (70736, 70746), (70864, 70874), (71248, 71258), (71360, 71370), (71472, 71484), (71904, 71923), (72784, 72813), (73040, 73050), (73120, 73130), (73664, 73685), (74752, 74863), (92768, 92778), (93008, 93018), (93019, 93026), (93824, 93847), (119520, 119540), (119648, 119673), (120782, 120832), (123200, 123210), (123632, 123642), (125127, 125136), (125264, 125274), (126065, 126124), (126125, 126128), (126129, 126133), (126209, 126254), (126255, 126270), (127232, 127245)], 'Nd': [(48, 58), (1632, 1642), (1776, 1786), (1984, 1994), (2406, 2416), (2534, 2544), (2662, 2672), (2790, 2800), (2918, 2928), (3046, 3056), (3174, 3184), (3302, 3312), (3430, 3440), (3558, 3568), (3664, 3674), (3792, 3802), (3872, 3882), (4160, 4170), (4240, 4250), (6112, 6122), (6160, 6170), (6470, 6480), (6608, 6618), (6784, 6794), (6800, 6810), (6992, 7002), (7088, 7098), (7232, 7242), (7248, 7258), (42528, 42538), (43216, 43226), (43264, 43274), (43472, 43482), (43504, 43514), (43600, 43610), (44016, 44026), (65296, 65306), (66720, 66730), (68912, 68922), (69734, 69744), (69872, 69882), (69942, 69952), (70096, 70106), (70384, 70394), (70736, 70746), (70864, 70874), (71248, 71258), (71360, 71370), (71472, 71482), (71904, 71914), (72784, 72794), (73040, 73050), (73120, 73130), (92768, 92778), (93008, 93018), (120782, 120832), (123200, 123210), (123632, 123642), (125264, 125274)], 'Nl': [(5870, 5873), (8544, 8579), (8581, 8585), 12295, (12321, 12330), (12344, 12347), (42726, 42736), (65856, 65909), 66369, 66378, (66513, 66518), (74752, 74863)], 'No': [(178, 180), 185, (188, 191), (2548, 2554), (2930, 2936), (3056, 3059), (3192, 3199), (3416, 3423), (3440, 3449), (3882, 3892), (4969, 4989), (6128, 6138), 6618, 8304, (8308, 8314), (8320, 8330), (8528, 8544), 8585, (9312, 9372), (9450, 9472), (10102, 10132), 11517, (12690, 12694), (12832, 12842), (12872, 
12880), (12881, 12896), (12928, 12938), (12977, 12992), (43056, 43062), (65799, 65844), (65909, 65913), (65930, 65932), (66273, 66300), (66336, 66340), (67672, 67680), (67705, 67712), (67751, 67760), (67835, 67840), (67862, 67868), (68028, 68030), (68032, 68048), (68050, 68096), (68160, 68169), (68221, 68223), (68253, 68256), (68331, 68336), (68440, 68448), (68472, 68480), (68521, 68528), (68858, 68864), (69216, 69247), (69405, 69415), (69457, 69461), (69714, 69734), (70113, 70133), (71482, 71484), (71914, 71923), (72794, 72813), (73664, 73685), (93019, 93026), (93824, 93847), (119520, 119540), (119648, 119673), (125127, 125136), (126065, 126124), (126125, 126128), (126129, 126133), (126209, 126254), (126255, 126270), (127232, 127245)], 'P': [(33, 36), (37, 43), (44, 48), (58, 60), (63, 65), (91, 94), 95, 123, 125, 161, 167, 171, (182, 184), 187, 191, 894, 903, (1370, 1376), (1417, 1419), 1470, 1472, 1475, 1478, (1523, 1525), (1545, 1547), (1548, 1550), 1563, (1566, 1568), (1642, 1646), 1748, (1792, 1806), (2039, 2042), (2096, 2111), 2142, (2404, 2406), 2416, 2557, 2678, 2800, 3191, 3204, 3572, 3663, (3674, 3676), (3844, 3859), 3860, (3898, 3902), 3973, (4048, 4053), (4057, 4059), (4170, 4176), 4347, (4960, 4969), 5120, 5742, (5787, 5789), (5867, 5870), (5941, 5943), (6100, 6103), (6104, 6107), (6144, 6155), (6468, 6470), (6686, 6688), (6816, 6823), (6824, 6830), (7002, 7009), (7164, 7168), (7227, 7232), (7294, 7296), (7360, 7368), 7379, (8208, 8232), (8240, 8260), (8261, 8274), (8275, 8287), (8317, 8319), (8333, 8335), (8968, 8972), (9001, 9003), (10088, 10102), (10181, 10183), (10214, 10224), (10627, 10649), (10712, 10716), (10748, 10750), (11513, 11517), (11518, 11520), 11632, (11776, 11823), (11824, 11856), (12289, 12292), (12296, 12306), (12308, 12320), 12336, 12349, 12448, 12539, (42238, 42240), (42509, 42512), 42611, 42622, (42738, 42744), (43124, 43128), (43214, 43216), (43256, 43259), 43260, (43310, 43312), 43359, (43457, 43470), (43486, 43488), (43612, 
43616), (43742, 43744), (43760, 43762), 44011, (64830, 64832), (65040, 65050), (65072, 65107), (65108, 65122), 65123, 65128, (65130, 65132), (65281, 65284), (65285, 65291), (65292, 65296), (65306, 65308), (65311, 65313), (65339, 65342), 65343, 65371, 65373, (65375, 65382), (65792, 65795), 66463, 66512, 66927, 67671, 67871, 67903, (68176, 68185), 68223, (68336, 68343), (68409, 68416), (68505, 68509), (69461, 69466), (69703, 69710), (69819, 69821), (69822, 69826), (69952, 69956), (70004, 70006), (70085, 70089), 70093, 70107, (70109, 70112), (70200, 70206), 70313, (70731, 70736), 70747, 70749, 70854, (71105, 71128), (71233, 71236), (71264, 71277), (71484, 71487), 71739, 72162, (72255, 72263), (72346, 72349), (72350, 72355), (72769, 72774), (72816, 72818), (73463, 73465), 73727, (74864, 74869), (92782, 92784), 92917, (92983, 92988), 92996, (93847, 93851), 94178, 113823, (121479, 121484), (125278, 125280)], 'Pc': [95, (8255, 8257), 8276, (65075, 65077), (65101, 65104), 65343], 'Pd': [45, 1418, 1470, 5120, 6150, (8208, 8214), 11799, 11802, (11834, 11836), 11840, 12316, 12336, 12448, (65073, 65075), 65112, 65123, 65293], 'Pe': [41, 93, 125, 3899, 3901, 5788, 8262, 8318, 8334, 8969, 8971, 9002, 10089, 10091, 10093, 10095, 10097, 10099, 10101, 10182, 10215, 10217, 10219, 10221, 10223, 10628, 10630, 10632, 10634, 10636, 10638, 10640, 10642, 10644, 10646, 10648, 10713, 10715, 10749, 11811, 11813, 11815, 11817, 12297, 12299, 12301, 12303, 12305, 12309, 12311, 12313, 12315, (12318, 12320), 64830, 65048, 65078, 65080, 65082, 65084, 65086, 65088, 65090, 65092, 65096, 65114, 65116, 65118, 65289, 65341, 65373, 65376, 65379], 'Pf': [187, 8217, 8221, 8250, 11779, 11781, 11786, 11789, 11805, 11809], 'Pi': [171, 8216, (8219, 8221), 8223, 8249, 11778, 11780, 11785, 11788, 11804, 11808], 'Po': [(33, 36), (37, 40), 42, 44, (46, 48), (58, 60), (63, 65), 92, 161, 167, (182, 184), 191, 894, 903, (1370, 1376), 1417, 1472, 1475, 1478, (1523, 1525), (1545, 1547), (1548, 1550), 1563, (1566, 
1568), (1642, 1646), 1748, (1792, 1806), (2039, 2042), (2096, 2111), 2142, (2404, 2406), 2416, 2557, 2678, 2800, 3191, 3204, 3572, 3663, (3674, 3676), (3844, 3859), 3860, 3973, (4048, 4053), (4057, 4059), (4170, 4176), 4347, (4960, 4969), 5742, (5867, 5870), (5941, 5943), (6100, 6103), (6104, 6107), (6144, 6150), (6151, 6155), (6468, 6470), (6686, 6688), (6816, 6823), (6824, 6830), (7002, 7009), (7164, 7168), (7227, 7232), (7294, 7296), (7360, 7368), 7379, (8214, 8216), (8224, 8232), (8240, 8249), (8251, 8255), (8257, 8260), (8263, 8274), 8275, (8277, 8287), (11513, 11517), (11518, 11520), 11632, (11776, 11778), (11782, 11785), 11787, (11790, 11799), (11800, 11802), 11803, (11806, 11808), (11818, 11823), (11824, 11834), (11836, 11840), 11841, (11843, 11856), (12289, 12292), 12349, 12539, (42238, 42240), (42509, 42512), 42611, 42622, (42738, 42744), (43124, 43128), (43214, 43216), (43256, 43259), 43260, (43310, 43312), 43359, (43457, 43470), (43486, 43488), (43612, 43616), (43742, 43744), (43760, 43762), 44011, (65040, 65047), 65049, 65072, (65093, 65095), (65097, 65101), (65104, 65107), (65108, 65112), (65119, 65122), 65128, (65130, 65132), (65281, 65284), (65285, 65288), 65290, 65292, (65294, 65296), (65306, 65308), (65311, 65313), 65340, 65377, (65380, 65382), (65792, 65795), 66463, 66512, 66927, 67671, 67871, 67903, (68176, 68185), 68223, (68336, 68343), (68409, 68416), (68505, 68509), (69461, 69466), (69703, 69710), (69819, 69821), (69822, 69826), (69952, 69956), (70004, 70006), (70085, 70089), 70093, 70107, (70109, 70112), (70200, 70206), 70313, (70731, 70736), 70747, 70749, 70854, (71105, 71128), (71233, 71236), (71264, 71277), (71484, 71487), 71739, 72162, (72255, 72263), (72346, 72349), (72350, 72355), (72769, 72774), (72816, 72818), (73463, 73465), 73727, (74864, 74869), (92782, 92784), 92917, (92983, 92988), 92996, (93847, 93851), 94178, 113823, (121479, 121484), (125278, 125280)], 'Ps': [40, 91, 123, 3898, 3900, 5787, 8218, 8222, 8261, 8317, 8333, 8968, 
8970, 9001, 10088, 10090, 10092, 10094, 10096, 10098, 10100, 10181, 10214, 10216, 10218, 10220, 10222, 10627, 10629, 10631, 10633, 10635, 10637, 10639, 10641, 10643, 10645, 10647, 10712, 10714, 10748, 11810, 11812, 11814, 11816, 11842, 12296, 12298, 12300, 12302, 12304, 12308, 12310, 12312, 12314, 12317, 64831, 65047, 65077, 65079, 65081, 65083, 65085, 65087, 65089, 65091, 65095, 65113, 65115, 65117, 65288, 65339, 65371, 65375, 65378], 'S': [36, 43, (60, 63), 94, 96, 124, 126, (162, 167), (168, 170), 172, (174, 178), 180, 184, 215, 247, (706, 710), (722, 736), (741, 748), 749, (751, 768), 885, (900, 902), 1014, 1154, (1421, 1424), (1542, 1545), 1547, (1550, 1552), 1758, 1769, (1789, 1791), 2038, (2046, 2048), (2546, 2548), (2554, 2556), 2801, 2928, (3059, 3067), 3199, 3407, 3449, 3647, (3841, 3844), 3859, (3861, 3864), (3866, 3872), 3892, 3894, 3896, (4030, 4038), (4039, 4045), (4046, 4048), (4053, 4057), (4254, 4256), (5008, 5018), 5741, 6107, 6464, (6622, 6656), (7009, 7019), (7028, 7037), 8125, (8127, 8130), (8141, 8144), (8157, 8160), (8173, 8176), (8189, 8191), 8260, 8274, (8314, 8317), (8330, 8333), (8352, 8384), (8448, 8450), (8451, 8455), (8456, 8458), 8468, (8470, 8473), (8478, 8484), 8485, 8487, 8489, 8494, (8506, 8508), (8512, 8517), (8522, 8526), 8527, (8586, 8588), (8592, 8968), (8972, 9001), (9003, 9255), (9280, 9291), (9372, 9450), (9472, 10088), (10132, 10181), (10183, 10214), (10224, 10627), (10649, 10712), (10716, 10748), (10750, 11124), (11126, 11158), (11160, 11264), (11493, 11499), (11904, 11930), (11931, 12020), (12032, 12246), (12272, 12284), 12292, (12306, 12308), 12320, (12342, 12344), (12350, 12352), (12443, 12445), (12688, 12690), (12694, 12704), (12736, 12772), (12800, 12831), (12842, 12872), 12880, (12896, 12928), (12938, 12977), (12992, 13312), (19904, 19968), (42128, 42183), (42752, 42775), (42784, 42786), (42889, 42891), (43048, 43052), (43062, 43066), (43639, 43642), 43867, 64297, (64434, 64450), (65020, 65022), 65122, (65124, 
65127), 65129, 65284, 65291, (65308, 65311), 65342, 65344, 65372, 65374, (65504, 65511), (65512, 65519), (65532, 65534), (65847, 65856), (65913, 65930), (65932, 65935), (65936, 65948), 65952, (66000, 66045), (67703, 67705), 68296, 71487, (73685, 73714), (92988, 92992), 92997, 113820, (118784, 119030), (119040, 119079), (119081, 119141), (119146, 119149), (119171, 119173), (119180, 119210), (119214, 119273), (119296, 119362), 119365, (119552, 119639), 120513, 120539, 120571, 120597, 120629, 120655, 120687, 120713, 120745, 120771, (120832, 121344), (121399, 121403), (121453, 121461), (121462, 121476), (121477, 121479), 123215, 123647, 126124, 126128, 126254, (126704, 126706), (126976, 127020), (127024, 127124), (127136, 127151), (127153, 127168), (127169, 127184), (127185, 127222), (127248, 127341), (127344, 127405), (127462, 127491), (127504, 127548), (127552, 127561), (127568, 127570), (127584, 127590), (127744, 128726), (128736, 128749), (128752, 128763), (128768, 128884), (128896, 128985), (128992, 129004), (129024, 129036), (129040, 129096), (129104, 129114), (129120, 129160), (129168, 129198), (129280, 129292), (129293, 129394), (129395, 129399), (129402, 129443), (129445, 129451), (129454, 129483), (129485, 129620), (129632, 129646), (129648, 129652), (129656, 129659), (129664, 129667), (129680, 129686)], 'Sc': [36, (162, 166), 1423, 1547, (2046, 2048), (2546, 2548), 2555, 2801, 3065, 3647, 6107, (8352, 8384), 43064, 65020, 65129, 65284, (65504, 65506), (65509, 65511), (73693, 73697), 123647, 126128], 'Sk': [94, 96, 168, 175, 180, 184, (706, 710), (722, 736), (741, 748), 749, (751, 768), 885, (900, 902), 8125, (8127, 8130), (8141, 8144), (8157, 8160), (8173, 8176), (8189, 8191), (12443, 12445), (42752, 42775), (42784, 42786), (42889, 42891), 43867, (64434, 64450), 65342, 65344, 65507, (127995, 128000)], 'Sm': [43, (60, 63), 124, 126, 172, 177, 215, 247, 1014, (1542, 1545), 8260, 8274, (8314, 8317), (8330, 8333), 8472, (8512, 8517), 8523, (8592, 8597), (8602, 
8604), 8608, 8611, 8614, 8622, (8654, 8656), 8658, 8660, (8692, 8960), (8992, 8994), 9084, (9115, 9140), (9180, 9186), 9655, 9665, (9720, 9728), 9839, (10176, 10181), (10183, 10214), (10224, 10240), (10496, 10627), (10649, 10712), (10716, 10748), (10750, 11008), (11056, 11077), (11079, 11085), 64297, 65122, (65124, 65127), 65291, (65308, 65311), 65372, 65374, 65506, (65513, 65517), 120513, 120539, 120571, 120597, 120629, 120655, 120687, 120713, 120745, 120771, (126704, 126706)], 'So': [166, 169, 174, 176, 1154, (1421, 1423), (1550, 1552), 1758, 1769, (1789, 1791), 2038, 2554, 2928, (3059, 3065), 3066, 3199, 3407, 3449, (3841, 3844), 3859, (3861, 3864), (3866, 3872), 3892, 3894, 3896, (4030, 4038), (4039, 4045), (4046, 4048), (4053, 4057), (4254, 4256), (5008, 5018), 5741, 6464, (6622, 6656), (7009, 7019), (7028, 7037), (8448, 8450), (8451, 8455), (8456, 8458), 8468, (8470, 8472), (8478, 8484), 8485, 8487, 8489, 8494, (8506, 8508), 8522, (8524, 8526), 8527, (8586, 8588), (8597, 8602), (8604, 8608), (8609, 8611), (8612, 8614), (8615, 8622), (8623, 8654), (8656, 8658), 8659, (8661, 8692), (8960, 8968), (8972, 8992), (8994, 9001), (9003, 9084), (9085, 9115), (9140, 9180), (9186, 9255), (9280, 9291), (9372, 9450), (9472, 9655), (9656, 9665), (9666, 9720), (9728, 9839), (9840, 10088), (10132, 10176), (10240, 10496), (11008, 11056), (11077, 11079), (11085, 11124), (11126, 11158), (11160, 11264), (11493, 11499), (11904, 11930), (11931, 12020), (12032, 12246), (12272, 12284), 12292, (12306, 12308), 12320, (12342, 12344), (12350, 12352), (12688, 12690), (12694, 12704), (12736, 12772), (12800, 12831), (12842, 12872), 12880, (12896, 12928), (12938, 12977), (12992, 13312), (19904, 19968), (42128, 42183), (43048, 43052), (43062, 43064), 43065, (43639, 43642), 65021, 65508, 65512, (65517, 65519), (65532, 65534), (65847, 65856), (65913, 65930), (65932, 65935), (65936, 65948), 65952, (66000, 66045), (67703, 67705), 68296, 71487, (73685, 73693), (73697, 73714), (92988, 92992), 
92997, 113820, (118784, 119030), (119040, 119079), (119081, 119141), (119146, 119149), (119171, 119173), (119180, 119210), (119214, 119273), (119296, 119362), 119365, (119552, 119639), (120832, 121344), (121399, 121403), (121453, 121461), (121462, 121476), (121477, 121479), 123215, 126124, 126254, (126976, 127020), (127024, 127124), (127136, 127151), (127153, 127168), (127169, 127184), (127185, 127222), (127248, 127341), (127344, 127405), (127462, 127491), (127504, 127548), (127552, 127561), (127568, 127570), (127584, 127590), (127744, 127995), (128000, 128726), (128736, 128749), (128752, 128763), (128768, 128884), (128896, 128985), (128992, 129004), (129024, 129036), (129040, 129096), (129104, 129114), (129120, 129160), (129168, 129198), (129280, 129292), (129293, 129394), (129395, 129399), (129402, 129443), (129445, 129451), (129454, 129483), (129485, 129620), (129632, 129646), (129648, 129652), (129656, 129659), (129664, 129667), (129680, 129686)], 'Z': [32, 160, 5760, (8192, 8203), (8232, 8234), 8239, 8287, 12288], 'Zl': [8232], 'Zp': [8233], 'Zs': [32, 160, 5760, (8192, 8203), 8239, 8287, 12288] } elementpath-3.0.2/elementpath/regex/unicode_subsets.py000066400000000000000000000471441427546011100232250ustar00rootroot00000000000000# # Copyright (c), 2016-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ This module defines Unicode character categories and blocks. 
""" from sys import maxunicode from typing import cast, Iterable, Iterator, List, MutableSet, Union, Optional from .unicode_categories import RAW_UNICODE_CATEGORIES from .codepoints import CodePoint, code_point_order, code_point_repr, \ iter_code_points, get_code_point_range CodePointsArgType = Union[str, 'UnicodeSubset', List[CodePoint], Iterable[CodePoint]] class RegexError(Exception): """ Error in a regular expression or in a character class specification. This exception is derived from `Exception` base class and is raised only by the regex subpackage. """ def iterparse_character_subset(s: str, expand_ranges: bool = False) -> Iterator[CodePoint]: """ Parses a regex character subset, generating a sequence of code points and code points ranges. An unescaped hyphen (-) that is not at the start or at the and is interpreted as range specifier. :param s: a string representing the character subset. :param expand_ranges: if set to `True` then expands character ranges. :return: yields integers or couples of integers. 
""" escaped = False on_range = False char = '' length = len(s) subset_index_iterator = iter(range(len(s))) for k in subset_index_iterator: if k == 0: char = s[0] if char == '\\': escaped = True elif char in r'[]' and length > 1: raise RegexError("bad character %r at position 0" % char) elif expand_ranges: yield ord(char) elif length <= 2 or s[1] != '-': yield ord(char) elif s[k] == '-': if escaped or (k == length - 1): char = s[k] yield ord(char) escaped = False elif on_range: char = s[k] yield ord(char) on_range = False else: # Parse character range on_range = True k = next(subset_index_iterator) end_char = s[k] if end_char == '\\' and (k < length - 1): if s[k + 1] in r'-|.^?*+{}()[]': k = next(subset_index_iterator) end_char = s[k] elif s[k + 1] in r'sSdDiIcCwWpP': msg = "bad character range '%s-\\%s' at position %d: %r" raise RegexError(msg % (char, s[k + 1], k - 2, s)) if ord(char) > ord(end_char): msg = "bad character range '%s-%s' at position %d: %r" raise RegexError(msg % (char, end_char, k - 2, s)) elif expand_ranges: yield from range(ord(char) + 1, ord(end_char) + 1) else: yield ord(char), ord(end_char) + 1 elif s[k] in r'|.^?*+{}()': if escaped: escaped = False on_range = False char = s[k] yield ord(char) elif s[k] in r'[]': if not escaped and length > 1: raise RegexError("bad character %r at position %d" % (s[k], k)) escaped = on_range = False char = s[k] if k >= length - 2 or s[k + 1] != '-': yield ord(char) elif s[k] == '\\': if escaped: escaped = on_range = False char = '\\' yield ord(char) else: escaped = True else: if escaped: escaped = False yield ord('\\') on_range = False char = s[k] if k >= length - 2 or s[k + 1] != '-': yield ord(char) if escaped: yield ord('\\') class UnicodeSubset(MutableSet[CodePoint]): """ Represents a subset of Unicode code points, implemented with an ordered list of integer values and ranges. 
Code points can be added or discarded using sequences of integer values and ranges, or with strings equivalent to a regex character set. :param codepoints: a sequence of integer values and ranges, another UnicodeSubset \ instance or a string equivalent of a regex character set. """ __slots__ = '_codepoints', _codepoints: List[CodePoint] def __init__(self, codepoints: Optional[CodePointsArgType] = None) -> None: if not codepoints: self._codepoints = list() elif isinstance(codepoints, list): self._codepoints = sorted(codepoints, key=code_point_order) elif isinstance(codepoints, UnicodeSubset): self._codepoints = codepoints.codepoints.copy() else: self._codepoints = list() self.update(codepoints) @property def codepoints(self) -> List[CodePoint]: return self._codepoints def __repr__(self) -> str: return '%s(%r)' % (self.__class__.__name__, str(self)) def __str__(self) -> str: return ''.join(code_point_repr(cp) for cp in self._codepoints) def copy(self) -> 'UnicodeSubset': return self.__copy__() def __copy__(self) -> 'UnicodeSubset': return UnicodeSubset(self._codepoints) def __reversed__(self) -> Iterator[int]: for item in reversed(self._codepoints): if isinstance(item, int): yield item else: yield from reversed(range(item[0], item[1])) def complement(self) -> Iterator[CodePoint]: last_cp = 0 for cp in self._codepoints: if isinstance(cp, int): cp = cp, cp + 1 diff = cp[0] - last_cp if diff > 2: yield last_cp, cp[0] elif diff == 2: yield last_cp yield last_cp + 1 elif diff == 1: yield last_cp elif diff: raise ValueError("unordered code points found in {!r}".format(self)) last_cp = cp[1] if last_cp < maxunicode: yield last_cp, maxunicode + 1 elif last_cp == maxunicode: yield maxunicode def iter_characters(self) -> Iterator[str]: return map(chr, self.__iter__()) # # MutableSet's abstract methods implementation def __contains__(self, value: object) -> bool: if not isinstance(value, int): try: value = ord(value) # type: ignore[arg-type] except TypeError: return False for cp 
in self._codepoints: if not isinstance(cp, int): if cp[0] > value: return False elif cp[1] <= value: continue else: return True elif cp > value: return False elif cp == value: return True return False def __iter__(self) -> Iterator[int]: for cp in self._codepoints: if isinstance(cp, int): yield cp else: yield from range(*cp) def __len__(self) -> int: k = 0 for _ in self: k += 1 return k def update(self, *others: Union[str, Iterable[CodePoint]]) -> None: for value in others: if isinstance(value, str): for cp in iter_code_points(iterparse_character_subset(value), reverse=True): self.add(cp) else: for cp in iter_code_points(value, reverse=True): self.add(cp) def add(self, value: CodePoint) -> None: try: start_value, end_value = get_code_point_range(value) # type: ignore[misc] except TypeError: raise ValueError("{!r} is not a Unicode code point value/range".format(value)) code_points = self._codepoints last_index = len(code_points) - 1 for k, cp in enumerate(code_points): if isinstance(cp, int): cp = cp, cp + 1 if end_value < cp[0]: code_points.insert(k, value) elif start_value > cp[1]: continue elif end_value > cp[1]: if k == last_index: code_points[k] = min(cp[0], start_value), end_value else: next_cp = code_points[k + 1] higher_bound = next_cp if isinstance(next_cp, int) else next_cp[0] if end_value <= higher_bound: code_points[k] = min(cp[0], start_value), end_value else: code_points[k] = min(cp[0], start_value), higher_bound start_value = higher_bound continue elif start_value < cp[0]: code_points[k] = start_value, cp[1] break else: self._codepoints.append(value) def difference_update(self, *others: Union[str, Iterable[CodePoint]]) -> None: for value in others: if isinstance(value, str): for cp in iter_code_points(iterparse_character_subset(value), reverse=True): self.discard(cp) else: for cp in iter_code_points(value, reverse=True): self.discard(cp) def discard(self, value: CodePoint) -> None: try: start_cp, end_cp = get_code_point_range(value) # type: 
ignore[misc] except TypeError: raise ValueError("{!r} is not a Unicode code point value/range".format(value)) code_points = self._codepoints for k in reversed(range(len(code_points))): cp = code_points[k] if isinstance(cp, int): cp = cp, cp + 1 if start_cp >= cp[1]: break elif end_cp >= cp[1]: if start_cp <= cp[0]: del code_points[k] elif start_cp - cp[0] > 1: code_points[k] = cp[0], start_cp else: code_points[k] = cp[0] elif end_cp > cp[0]: if start_cp <= cp[0]: if cp[1] - end_cp > 1: code_points[k] = end_cp, cp[1] else: code_points[k] = cp[1] - 1 else: if cp[1] - end_cp > 1: code_points.insert(k + 1, (end_cp, cp[1])) else: code_points.insert(k + 1, cp[1] - 1) if start_cp - cp[0] > 1: code_points[k] = cp[0], start_cp else: code_points[k] = cp[0] # # MutableSet's mixin methods override def clear(self) -> None: del self._codepoints[:] def __eq__(self, other: object) -> bool: if not isinstance(other, Iterable): return NotImplemented elif isinstance(other, UnicodeSubset): return self._codepoints == other._codepoints else: return self._codepoints == other def __ior__(self, other: object) -> 'UnicodeSubset': if not isinstance(other, Iterable): return NotImplemented elif isinstance(other, UnicodeSubset): other = reversed(other._codepoints) elif isinstance(other, str): other = reversed(UnicodeSubset(other)._codepoints) else: other = iter_code_points(other, reverse=True) for cp in other: self.add(cp) return self def __or__(self, other: object) -> 'UnicodeSubset': obj = self.copy() return obj.__ior__(other) def __isub__(self, other: object) -> 'UnicodeSubset': if not isinstance(other, Iterable): return NotImplemented elif isinstance(other, UnicodeSubset): other = reversed(other._codepoints) elif isinstance(other, str): other = reversed(UnicodeSubset(other)._codepoints) else: other = iter_code_points(other, reverse=True) for cp in other: self.discard(cp) return self def __sub__(self, other: object) -> 'UnicodeSubset': obj = self.copy() return obj.__isub__(other) __rsub__ = 
__sub__ def __iand__(self, other: object) -> 'UnicodeSubset': if not isinstance(other, Iterable): return NotImplemented for value in (self - other): self.discard(value) return self def __and__(self, other: object) -> 'UnicodeSubset': obj = self.copy() return obj.__iand__(other) def __ixor__(self, other: object) -> 'UnicodeSubset': if other is self: self.clear() return self elif not isinstance(other, Iterable): return NotImplemented elif not isinstance(other, UnicodeSubset): other = UnicodeSubset(cast(Union[str, Iterable[CodePoint]], other)) for value in other: if value in self: self.discard(value) else: self.add(value) return self def __xor__(self, other: object) -> 'UnicodeSubset': obj = self.copy() return obj.__ixor__(other) UNICODE_CATEGORIES = {k: UnicodeSubset(cast(List[CodePoint], v)) for k, v in RAW_UNICODE_CATEGORIES.items()} # See http://www.unicode.org/Public/UNIDATA/Blocks.txt UNICODE_BLOCKS = { 'IsBasicLatin': UnicodeSubset('\u0000-\u007F'), 'IsLatin-1Supplement': UnicodeSubset('\u0080-\u00FF'), 'IsLatinExtended-A': UnicodeSubset('\u0100-\u017F'), 'IsLatinExtended-B': UnicodeSubset('\u0180-\u024F'), 'IsIPAExtensions': UnicodeSubset('\u0250-\u02AF'), 'IsSpacingModifierLetters': UnicodeSubset('\u02B0-\u02FF'), 'IsCombiningDiacriticalMarks': UnicodeSubset('\u0300-\u036F'), 'IsGreek': UnicodeSubset('\u0370-\u03FF'), 'IsCyrillic': UnicodeSubset('\u0400-\u04FF'), 'IsArmenian': UnicodeSubset('\u0530-\u058F'), 'IsHebrew': UnicodeSubset('\u0590-\u05FF'), 'IsArabic': UnicodeSubset('\u0600-\u06FF'), 'IsSyriac': UnicodeSubset('\u0700-\u074F'), 'IsThaana': UnicodeSubset('\u0780-\u07BF'), 'IsDevanagari': UnicodeSubset('\u0900-\u097F'), 'IsBengali': UnicodeSubset('\u0980-\u09FF'), 'IsGurmukhi': UnicodeSubset('\u0A00-\u0A7F'), 'IsGujarati': UnicodeSubset('\u0A80-\u0AFF'), 'IsOriya': UnicodeSubset('\u0B00-\u0B7F'), 'IsTamil': UnicodeSubset('\u0B80-\u0BFF'), 'IsTelugu': UnicodeSubset('\u0C00-\u0C7F'), 'IsKannada': UnicodeSubset('\u0C80-\u0CFF'), 'IsMalayalam': 
UnicodeSubset('\u0D00-\u0D7F'), 'IsSinhala': UnicodeSubset('\u0D80-\u0DFF'), 'IsThai': UnicodeSubset('\u0E00-\u0E7F'), 'IsLao': UnicodeSubset('\u0E80-\u0EFF'), 'IsTibetan': UnicodeSubset('\u0F00-\u0FFF'), 'IsMyanmar': UnicodeSubset('\u1000-\u109F'), 'IsGeorgian': UnicodeSubset('\u10A0-\u10FF'), 'IsHangulJamo': UnicodeSubset('\u1100-\u11FF'), 'IsEthiopic': UnicodeSubset('\u1200-\u137F'), 'IsCherokee': UnicodeSubset('\u13A0-\u13FF'), 'IsUnifiedCanadianAboriginalSyllabics': UnicodeSubset('\u1400-\u167F'), 'IsOgham': UnicodeSubset('\u1680-\u169F'), 'IsRunic': UnicodeSubset('\u16A0-\u16FF'), 'IsKhmer': UnicodeSubset('\u1780-\u17FF'), 'IsMongolian': UnicodeSubset('\u1800-\u18AF'), 'IsLatinExtendedAdditional': UnicodeSubset('\u1E00-\u1EFF'), 'IsGreekExtended': UnicodeSubset('\u1F00-\u1FFF'), 'IsGeneralPunctuation': UnicodeSubset('\u2000-\u206F'), 'IsSuperscriptsandSubscripts': UnicodeSubset('\u2070-\u209F'), 'IsCurrencySymbols': UnicodeSubset('\u20A0-\u20CF'), 'IsCombiningMarksforSymbols': UnicodeSubset('\u20D0-\u20FF'), 'IsLetterlikeSymbols': UnicodeSubset('\u2100-\u214F'), 'IsNumberForms': UnicodeSubset('\u2150-\u218F'), 'IsArrows': UnicodeSubset('\u2190-\u21FF'), 'IsMathematicalOperators': UnicodeSubset('\u2200-\u22FF'), 'IsMiscellaneousTechnical': UnicodeSubset('\u2300-\u23FF'), 'IsControlPictures': UnicodeSubset('\u2400-\u243F'), 'IsOpticalCharacterRecognition': UnicodeSubset('\u2440-\u245F'), 'IsEnclosedAlphanumerics': UnicodeSubset('\u2460-\u24FF'), 'IsBoxDrawing': UnicodeSubset('\u2500-\u257F'), 'IsBlockElements': UnicodeSubset('\u2580-\u259F'), 'IsGeometricShapes': UnicodeSubset('\u25A0-\u25FF'), 'IsMiscellaneousSymbols': UnicodeSubset('\u2600-\u26FF'), 'IsDingbats': UnicodeSubset('\u2700-\u27BF'), 'IsBraillePatterns': UnicodeSubset('\u2800-\u28FF'), 'IsCJKRadicalsSupplement': UnicodeSubset('\u2E80-\u2EFF'), 'IsKangxiRadicals': UnicodeSubset('\u2F00-\u2FDF'), 'IsIdeographicDescriptionCharacters': UnicodeSubset('\u2FF0-\u2FFF'), 'IsCJKSymbolsandPunctuation': 
UnicodeSubset('\u3000-\u303F'), 'IsHiragana': UnicodeSubset('\u3040-\u309F'), 'IsKatakana': UnicodeSubset('\u30A0-\u30FF'), 'IsBopomofo': UnicodeSubset('\u3100-\u312F'), 'IsHangulCompatibilityJamo': UnicodeSubset('\u3130-\u318F'), 'IsKanbun': UnicodeSubset('\u3190-\u319F'), 'IsBopomofoExtended': UnicodeSubset('\u31A0-\u31BF'), 'IsEnclosedCJKLettersandMonths': UnicodeSubset('\u3200-\u32FF'), 'IsCJKCompatibility': UnicodeSubset('\u3300-\u33FF'), 'IsCJKUnifiedIdeographsExtensionA': UnicodeSubset('\u3400-\u4DB5'), 'IsCJKUnifiedIdeographs': UnicodeSubset('\u4E00-\u9FFF'), 'IsYiSyllables': UnicodeSubset('\uA000-\uA48F'), 'IsYiRadicals': UnicodeSubset('\uA490-\uA4CF'), 'IsHangulSyllables': UnicodeSubset('\uAC00-\uD7A3'), 'IsHighSurrogates': UnicodeSubset('\uD800-\uDB7F'), 'IsHighPrivateUseSurrogates': UnicodeSubset('\uDB80-\uDBFF'), 'IsLowSurrogates': UnicodeSubset('\uDC00-\uDFFF'), 'IsPrivateUse': UnicodeSubset('\uE000-\uF8FF\U000F0000-\U000FFFFF\U00100000-\U0010FFFF'), 'IsCJKCompatibilityIdeographs': UnicodeSubset('\uF900-\uFAFF'), 'IsAlphabeticPresentationForms': UnicodeSubset('\uFB00-\uFB4F'), 'IsArabicPresentationForms-A': UnicodeSubset('\uFB50-\uFDFF'), 'IsCombiningHalfMarks': UnicodeSubset('\uFE20-\uFE2F'), 'IsCJKCompatibilityForms': UnicodeSubset('\uFE30-\uFE4F'), 'IsSmallFormVariants': UnicodeSubset('\uFE50-\uFE6F'), 'IsArabicPresentationForms-B': UnicodeSubset('\uFE70-\uFEFE'), 'IsSpecials': UnicodeSubset('\uFEFF\uFFF0-\uFFFD'), 'IsHalfwidthandFullwidthForms': UnicodeSubset('\uFF00-\uFFEF'), 'IsOldItalic': UnicodeSubset('\U00010300-\U0001032F'), 'IsGothic': UnicodeSubset('\U00010330-\U0001034F'), 'IsDeseret': UnicodeSubset('\U00010400-\U0001044F'), 'IsByzantineMusicalSymbols': UnicodeSubset('\U0001D000-\U0001D0FF'), 'IsMusicalSymbols': UnicodeSubset('\U0001D100-\U0001D1FF'), 'IsMathematicalAlphanumericSymbols': UnicodeSubset('\U0001D400-\U0001D7FF'), 'IsCJKUnifiedIdeographsExtensionB': UnicodeSubset('\U00020000-\U0002A6D6'), 
'IsCJKCompatibilityIdeographsSupplement': UnicodeSubset('\U0002F800-\U0002FA1F'), 'IsTags': UnicodeSubset('\U000E0000-\U000E007F'), } UNICODE_BLOCKS['IsPrivateUse'].update('\U000F0000-\U0010FFFD') def unicode_subset(name: str) -> UnicodeSubset: if name.startswith('Is'): try: return UNICODE_BLOCKS[name] except KeyError: raise RegexError("%r doesn't match to any Unicode block." % name) else: try: return UNICODE_CATEGORIES[name] except KeyError: raise RegexError("%r doesn't match to any Unicode category." % name) elementpath-3.0.2/elementpath/schema_proxy.py000066400000000000000000000152401427546011100214060ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from abc import ABCMeta, abstractmethod from typing import TYPE_CHECKING, cast, Any, Dict, List, Optional, Iterator, Union from .exceptions import ElementPathTypeError from .protocols import ElementProtocol, XsdTypeProtocol, XsdAttributeProtocol, \ XsdElementProtocol, XsdSchemaProtocol from .datatypes import AtomicValueType from .etree import is_etree_element from .xpath_context import XPathSchemaContext if TYPE_CHECKING: from .xpath2 import XPath2Parser from .xpath30 import XPath30Parser XPathParserType = Union[XPath2Parser, XPath30Parser] else: XPathParserType = Any class AbstractSchemaProxy(metaclass=ABCMeta): """ Abstract base class for defining schema proxies. :param schema: a schema instance that implements the `AbstractEtreeElement` interface. :param base_element: the schema element used as base item for static analysis. 
""" def __init__(self, schema: XsdSchemaProtocol, base_element: Optional[ElementProtocol] = None) -> None: if not is_etree_element(schema): raise ElementPathTypeError( "argument {!r} is not a compatible schema instance".format(schema) ) if base_element is not None and not is_etree_element(base_element): raise ElementPathTypeError( "argument 'base_element' is not a compatible element instance" ) self._schema = schema self._base_element: Optional[ElementProtocol] = base_element def bind_parser(self, parser: XPathParserType) -> None: """ Binds a parser instance with schema proxy adding the schema's atomic types constructors. This method can be redefined in a concrete proxy to optimize schema bindings. :param parser: a parser instance. """ if parser.schema is not self: parser.schema = self parser.symbol_table = dict(parser.__class__.symbol_table) for xsd_type in self.iter_atomic_types(): if xsd_type.name is not None: parser.schema_constructor(xsd_type.name) parser.tokenizer = parser.create_tokenizer(parser.symbol_table) def get_context(self) -> XPathSchemaContext: """ Get a context instance for static analysis phase. :returns: an `XPathSchemaContext` instance. """ return XPathSchemaContext(root=self._schema, item=self._base_element) def find(self, path: str, namespaces: Optional[Dict[str, str]] = None) \ -> Optional[XsdElementProtocol]: """ Find a schema element or attribute using an XPath expression. :param path: an XPath expression that selects an element or an attribute node. :param namespaces: an optional mapping from namespace prefix to namespace URI. :return: The first matching schema component, or ``None`` if there is no match. """ return cast(Optional[XsdElementProtocol], self._schema.find(path, namespaces)) @property def xsd_version(self) -> str: """The XSD version, returns '1.0' or '1.1'.""" return self._schema.xsd_version def get_type(self, qname: str) -> Optional[XsdTypeProtocol]: """ Get the XSD global type from the schema's scope. 
A concrete implementation must return an object that supports the protocol `XsdTypeProtocol`, or `None` if the global type is not found. :param qname: the fully qualified name of the type to retrieve. :returns: an object that represents an XSD type or `None`. """ return self._schema.maps.types.get(qname) def get_attribute(self, qname: str) -> Optional[XsdAttributeProtocol]: """ Get the XSD global attribute from the schema's scope. A concrete implementation must return an object that supports the protocol `XsdAttributeProtocol`, or `None` if the global attribute is not found. :param qname: the fully qualified name of the attribute to retrieve. :returns: an object that represents an XSD attribute or `None`. """ return self._schema.maps.attributes.get(qname) def get_element(self, qname: str) -> Optional[XsdElementProtocol]: """ Get the XSD global element from the schema's scope. A concrete implementation must return an object that supports the protocol `XsdElementProtocol`, or `None` if the global element is not found. :param qname: the fully qualified name of the element to retrieve. :returns: an object that represents an XSD element or `None`. """ return self._schema.maps.elements.get(qname) def get_substitution_group(self, qname: str) -> Optional[List[XsdElementProtocol]]: """ Get a substitution group. A concrete implementation must return a list containing substitution elements or `None` if the substitution group is not found. Moreover, each item of the returned list must be an object that implements the `AbstractXsdElement` interface. :param qname: the fully qualified name of the substitution group to retrieve. :returns: a list containing substitution elements or `None`. """ return self._schema.maps.substitution_groups.get(qname) @abstractmethod def is_instance(self, obj: Any, type_qname: str) -> bool: """ Returns `True` if *obj* is an instance of the XSD global type, `False` if not. :param obj: the instance to be tested. 
:param type_qname: the fully qualified name of the type used to test the instance. """ @abstractmethod def cast_as(self, obj: Any, type_qname: str) -> AtomicValueType: """ Converts *obj* to the Python type associated with an XSD global type. A concrete implementation must raise a `ValueError` or `TypeError` in case of a decoding error or a `KeyError` if the type is not bound to the schema's scope. :param obj: the instance to be cast. :param type_qname: the fully qualified name of the type used to convert the instance. """ @abstractmethod def iter_atomic_types(self) -> Iterator[XsdTypeProtocol]: """ Returns an iterator for non-builtin atomic types defined in the schema's scope. A concrete implementation must yield objects that implement the protocol `XsdTypeProtocol`. """ __all__ = ['AbstractSchemaProxy'] elementpath-3.0.2/elementpath/tdop.py000066400000000000000000001034121427546011100176520ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ This module contains base classes and helper functions for defining Pratt parsers. """ import sys import re from abc import ABCMeta from unicodedata import name as unicode_name from decimal import Decimal, DecimalException from typing import Any, cast, overload, no_type_check_decorator, Callable, \ ClassVar, FrozenSet, Dict, Generic, List, Optional, Union, Tuple, Type, \ Pattern, Match, MutableMapping, MutableSequence, Iterator, Set, TypeVar # # Simple top-down parser based on Vaughan Pratt's algorithm (Top Down Operator Precedence). # # References: # # https://tdop.github.io/ (Vaughan R. 
Pratt's "Top Down Operator Precedence" - 1973) # http://crockford.com/javascript/tdop/tdop.html (Douglas Crockford - 2007) # http://effbot.org/zone/simple-top-down-parsing.htm (Fredrik Lundh - 2008) # # This implementation is based on a base class for tokens and a base class for parsers. # A real parser is built with a derivation of the base parser class followed by the # registrations of token classes for the symbols of the language. # # A parser can be extended by derivation, copying the reusable token classes and # defining the additional ones. See the files xpath1_parser.py and xpath2_parser.py # for a full implementation example of a real parser. # # Parser special symbols set, that includes the special symbols of TDOP plus # additional special symbols for managing invalid literals, unknown symbols # and the source start. SPECIAL_SYMBOLS = frozenset(( '(start)', '(end)', '(string)', '(float)', '(decimal)', '(integer)', '(name)', '(invalid)', '(unknown)', )) class ParseError(SyntaxError): """An error when parsing source with TDOP parser.""" def _symbol_to_classname(symbol: str) -> str: """ Converts a symbol string to an identifier (only alphanumeric and '_'). """ def get_id_name(c: str) -> str: if c.isalnum() or c == '_': return c else: return '%s_' % unicode_name(str(c)).title() if symbol.isalnum(): return symbol.title() elif symbol in SPECIAL_SYMBOLS: return symbol[1:-1].title() elif all(c in '-_' for c in symbol): value = ' '.join(unicode_name(c) for c in symbol) return value.title().replace(' ', '').replace('-', '').replace('_', '') value = symbol.replace('-', '_') if value.isidentifier(): return value.title().replace('_', '') value = ''.join(get_id_name(c) for c in symbol) return value.replace(' ', '').replace('-', '').replace('_', '') class MultiLabel: """ Helper class for defining multi-value labels for tokens. Useful when a symbol has multiple roles. A label of this type compares equal to each of its values.
Example: label = MultiLabel('function', 'operator') label == 'symbol' # False label == 'function' # True label == 'operator' # True """ def __init__(self, *values: str) -> None: self.values = values def __eq__(self, other: object) -> bool: return any(other == v for v in self.values) def __ne__(self, other: object) -> bool: return all(other != v for v in self.values) def __repr__(self) -> str: return '%s%s' % (self.__class__.__name__, self.values) def __str__(self) -> str: return '__'.join(self.values).replace(' ', '_') def __hash__(self) -> int: return hash(self.values) def __contains__(self, item: str) -> bool: return any(item in v for v in self.values) def startswith(self, s: str) -> bool: return any(v.startswith(s) for v in self.values) def endswith(self, s: str) -> bool: return any(v.endswith(s) for v in self.values) TK = TypeVar('TK', bound='Token[Any]') class Token(MutableSequence[TK]): """ Token base class for defining a parser based on Pratt's method. Each token instance is a list-like object. The number of token's items is the arity of the represented operator, where token's items are the operands. Nullary operators are used for symbols, names and literals. Tokens with items represent the other operators (unary, binary and so on). Each token class has a *symbol*, a lbp (left binding power) value and a rbp (right binding power) value, that are used in the sense described by the Pratt's method. This implementation of Pratt tokens includes two extra attributes, *pattern* and *label*, that can be used to simplify the parsing of symbols in a concrete parser. :param parser: The parser instance that creates the token instance. :param value: The token value. If not provided defaults to token symbol. :cvar symbol: the symbol of the token class. :cvar lbp: Pratt's left binding power, defaults to 0. :cvar rbp: Pratt's right binding power, defaults to 0. :cvar pattern: the regex pattern used for the token class. Defaults to the \ escaped symbol. 
Can be customized to match more detailed conditions (e.g. a \ function with its left round bracket), in order to simplify the related code. :cvar label: defines the typology of the token class. Its value is used in \ representations of the token instance and can be used to restrict code choices \ without more complicated analysis. The label value can be set as needed by the \ parser implementation (e.g. 'function', 'axis', 'constructor function' are used by \ the XPath parsers). In the base parser class defaults to 'symbol' with 'literal' \ and 'operator' as possible alternatives. If set by a tuple of values the token \ class label is transformed to a multi-value label, which means the token class can \ cover multiple roles (e.g. as XPath function or axis). In those cases the definitive \ role is defined at parse time (nud and/or led methods) after the token instance creation. """ lbp: int = 0 # left binding power rbp: int = 0 # right binding power symbol: str = '' # the token identifier lookup_name: str = '' # the key in symbol table, usually matches the symbol. label: str = 'symbol' # optional label pattern: Optional[str] = None # a custom regex pattern for building the tokenizer __slots__ = '_items', 'parser', 'value', '_source', 'span' _items: List[TK] parser: 'Parser[Token[TK]]' value: Optional[Any] _source: str span: Tuple[int, int] def __init__(self, parser: 'Parser[Token[TK]]', value: Optional[Any] = None) -> None: self._items = [] self.parser = parser self.value = value if value is not None else self.symbol self._source = parser.source self.span = (0, 0) if parser.next_match is None else parser.next_match.span() @overload def __getitem__(self, i: int) -> TK: ... @overload def __getitem__(self, s: slice) -> MutableSequence[TK]: ...
def __getitem__(self, i: Union[int, slice]) \ -> Union[TK, MutableSequence[TK]]: return self._items[i] def __setitem__(self, i: Union[int, slice], o: Any) -> None: self._items[i] = o def __delitem__(self, i: Union[int, slice]) -> None: del self._items[i] def __len__(self) -> int: return len(self._items) def insert(self, i: int, item: TK) -> None: self._items.insert(i, item) def __str__(self) -> str: if self.symbol in SPECIAL_SYMBOLS: return '%r %s' % (self.value, self.symbol[1:-1]) else: return '%r %s' % (self.symbol, str(self.label)) def __repr__(self) -> str: symbol, value = self.symbol, self.value if value != symbol: return '%s(value=%r)' % (self.__class__.__name__, value) else: return '%s()' % self.__class__.__name__ def __eq__(self, other: object) -> bool: if isinstance(other, Token): return self.symbol == other.symbol and self.value == other.value return False @property def arity(self) -> int: return len(self) @property def tree(self) -> str: """Returns a tree representation string.""" if self.symbol == '(name)': return '(%s)' % self.value elif self.symbol in SPECIAL_SYMBOLS: return '(%r)' % self.value elif self.symbol == '(': if not self: return '()' elif len(self) == 1: return self[0].tree return '(%s)' % ' '.join(item.tree for item in self) elif not self: return '(%s)' % self.symbol else: return '(%s %s)' % (self.symbol, ' '.join(item.tree for item in self)) @property def source(self) -> str: """Returns the source representation string.""" symbol = self.symbol if symbol == '(name)': return cast(str, self.value) elif symbol == '(decimal)': return str(self.value) elif symbol in SPECIAL_SYMBOLS: return repr(self.value) else: length = len(self) if not length: return symbol elif length == 1: if 'postfix' in self.label: return '%s %s' % (self[0].source, symbol) return '%s %s' % (symbol, self[0].source) elif length == 2: return '%s %s %s' % (self[0].source, symbol, self[1].source) else: return '%s %s' % (symbol, ' '.join(item.source for item in self)) @property 
def position(self) -> Tuple[int, int]: """A tuple with the position of the token in terms of line and column.""" token_index = self.span[0] line = self._source[:token_index].count('\n') + 1 if line == 1: return 1, token_index + 1 return line, token_index - self._source[:token_index].rindex('\n') def nud(self) -> TK: """Pratt's null denotation method""" raise self.wrong_syntax() def led(self, left: TK) -> TK: """Pratt's left denotation method""" raise self.wrong_syntax() def evaluate(self) -> Any: """Evaluation method""" return self.value def iter(self, *symbols: str) -> Iterator['Token[TK]']: """Returns a generator for iterating the token's tree.""" status: List[Tuple[Optional['Token[TK]'], Iterator['Token[TK]']]] = [] parent: Optional['Token[TK]'] = self children: Iterator['Token[TK]'] = iter(self) tk: 'Token[TK]' while True: try: tk = next(children) except StopIteration: try: parent, children = status.pop() except IndexError: if parent is not None: if not symbols or parent.symbol in symbols: yield parent return else: if parent is not None: if not symbols or parent.symbol in symbols: yield parent parent = None else: if parent is not None and len(parent._items) == 1: if not symbols or parent.symbol in symbols: yield parent parent = None if not tk._items: if not symbols or tk.symbol in symbols: yield tk if parent is not None: if not symbols or parent.symbol in symbols: yield parent parent = None continue status.append((parent, children)) parent, children = tk, iter(tk) def expected(self, *symbols: str, message: Optional[str] = None) -> None: if symbols and self.symbol not in symbols: raise self.wrong_syntax(message) def unexpected(self, *symbols: str, message: Optional[str] = None) -> None: if not symbols or self.symbol in symbols: raise self.wrong_syntax(message) def wrong_syntax(self, message: Optional[str] = None) -> ParseError: if message: return ParseError(message) elif self.symbol not in SPECIAL_SYMBOLS: return ParseError('unexpected %s' % self) elif 
self.symbol == '(invalid)': return ParseError('invalid literal %r' % self.value) elif self.symbol == '(unknown)': return ParseError('unknown symbol %r' % self.value) elif self.symbol == '(name)': return ParseError('unexpected name %r' % self.value) elif self.symbol != '(end)': return ParseError('unexpected literal %r' % self.value) elif self.parser.token.symbol == '(start)': return ParseError('source is empty') else: return ParseError('unexpected end of source') def wrong_type(self, message: str = 'invalid type') -> TypeError: return TypeError(message) def wrong_value(self, message: str = 'invalid value') -> ValueError: return ValueError(message) class ParserMeta(ABCMeta): token_base_class: Type[Any] literals_pattern: Pattern[str] name_pattern: Pattern[str] tokenizer: Optional[Pattern[str]] SYMBOLS: Set[str] symbol_table: Dict[str, Token[Any]] def __new__(mcs, name: str, bases: Tuple[Type[Any], ...], namespace: Dict[str, Any]) \ -> 'ParserMeta': cls = super(ParserMeta, mcs).__new__(mcs, name, bases, namespace) # Avoids more parsers definitions for a single module for k, v in sys.modules[cls.__module__].__dict__.items(): if isinstance(v, ParserMeta) and v.__module__ == cls.__module__: raise RuntimeError("Multiple parser class definitions per module are not allowed") # Checks and initializes class attributes if not hasattr(cls, 'token_base_class'): cls.token_base_class = Token if not hasattr(cls, 'literals_pattern'): cls.literals_pattern = re.compile( r"""'[^']*'|"[^"]*"|(?:\d+|\.\d+)(?:\.\d*)?(?:[Ee][+-]?\d+)?""" ) if not hasattr(cls, 'name_pattern'): cls.name_pattern = re.compile(r'[A-Za-z0-9_]+') if 'tokenizer' not in namespace: cls.tokenizer = None if 'SYMBOLS' not in namespace: cls.SYMBOLS = set() for base_class in bases: if hasattr(base_class, 'SYMBOLS'): cls.SYMBOLS.update(base_class.SYMBOLS) break if 'symbol_table' not in namespace: cls.symbol_table = {} for base_class in bases: if hasattr(base_class, 'symbol_table'): 
cls.symbol_table.update(base_class.symbol_table) break return cls TK_co = TypeVar('TK_co', bound=Token[Any], covariant=True) class Parser(Generic[TK_co], metaclass=ParserMeta): """ Parser class for implementing a Top-Down Operator Precedence parser. :cvar SYMBOLS: the symbols of the definable tokens for the parser. In the base class it's an \ immutable set that contains the symbols for special tokens (literals, names and end-token). \ Has to be extended in a concrete parser adding all the symbols of the language. :cvar symbol_table: a dictionary that stores the token classes defined for the language. :type symbol_table: dict :cvar token_base_class: the base class for creating language's token classes. :type token_base_class: Token :cvar tokenizer: the language tokenizer compiled regexp. """ SYMBOLS: ClassVar[FrozenSet[str]] = SPECIAL_SYMBOLS token_base_class = Token tokenizer: Optional[Pattern[str]] = None symbol_table: MutableMapping[str, Type[TK_co]] = {} _start_token: TK_co source: str tokens: Iterator[Match[str]] token: TK_co next_token: TK_co next_match: Optional[Match[str]] literals_pattern: Pattern[str] name_pattern: Pattern[str] __slots__ = 'source', 'tokens', 'next_match', '_start_token', 'token', 'next_token' def __init__(self) -> None: if self.tokenizer is None: self.build() self.source = '' self.tokens = iter(()) self.next_match = None self._start_token = self.symbol_table['(start)'](self) self.token = self.next_token = self._start_token def __eq__(self, other: object) -> bool: return isinstance(other, Parser) and \ self.token_base_class is other.token_base_class and \ self.SYMBOLS == other.SYMBOLS and \ self.symbol_table == other.symbol_table def parse(self, source: str) -> TK_co: """ Parses source code of the formal language. This is the main method that has to be called for a parser's instance. :param source: The source string. :return: The root of the token tree that represents the parsed source. """ assert self.tokenizer, "Parser tokenizer is not built!"
try: try: self.tokens = iter(self.tokenizer.finditer(source)) except TypeError as err: token = self.symbol_table['(invalid)'](self, source) raise token.wrong_syntax('invalid source type, {}'.format(err)) self.source = source self.advance() root_token = self.expression() self.next_token.expected('(end)') return root_token finally: self.tokens = iter(()) self.next_match = None self.token = self.next_token = self._start_token def advance(self, *symbols: str) -> TK_co: """ The Pratt's function for advancing to next token. :param symbols: Optional arguments tuple. If not empty one of the provided \ symbols is expected. If the next token's symbol differs the parser raises a \ parse error. :return: The next token instance. """ value: Any if self.next_token.symbol == '(end)' or \ symbols and self.next_token.symbol not in symbols: raise self.next_token.wrong_syntax() self.token = self.next_token for self.next_match in self.tokens: assert self.next_match is not None if not self.next_match.group().isspace(): break else: self.next_token = self.symbol_table['(end)'](self) return self.next_token literal, symbol, name, unknown = self.next_match.groups() if symbol is not None: try: self.next_token = self.symbol_table[symbol](self) except KeyError: if self.name_pattern.match(symbol) is None: self.next_token = self.symbol_table['(unknown)'](self, symbol) raise self.next_token.wrong_syntax() self.next_token = self.symbol_table['(name)'](self, symbol) elif literal is not None: if literal[0] in '\'"': value = self.unescape(literal) self.next_token = self.symbol_table['(string)'](self, value) elif 'e' in literal or 'E' in literal: try: value = float(literal) except ValueError as err: self.next_token = self.symbol_table['(invalid)'](self, literal) raise self.next_token.wrong_syntax(message=str(err)) else: self.next_token = self.symbol_table['(float)'](self, value) elif '.' 
in literal: try: value = Decimal(literal) except DecimalException as err: self.next_token = self.symbol_table['(invalid)'](self, literal) raise self.next_token.wrong_syntax(message=str(err)) else: self.next_token = self.symbol_table['(decimal)'](self, value) else: self.next_token = self.symbol_table['(integer)'](self, int(literal)) elif name is not None: self.next_token = self.symbol_table['(name)'](self, name) elif unknown is not None: self.next_token = self.symbol_table['(unknown)'](self, unknown) else: msg = "unexpected matching %r: incompatible tokenizer" raise RuntimeError(msg % self.next_match.group()) return self.next_token def advance_until(self, *stop_symbols: str) -> str: """ Advances until one of the symbols is found or the end of source is reached, returning the raw source string placed before. Useful for raw parsing of comments and references enclosed between specific symbols. :param stop_symbols: The symbols that have to be found for stopping advance. :return: The source string chunk enclosed between the initial position \ and the first stop symbol. """ if not stop_symbols: raise self.next_token.wrong_type("at least a stop symbol required!") elif self.next_token.symbol == '(end)': raise self.next_token.wrong_syntax() self.token = self.next_token source_chunk: List[str] = [] while True: try: self.next_match = next(self.tokens) except StopIteration: self.next_token = self.symbol_table['(end)'](self) break else: symbol = self.next_match.group(2) if symbol is not None: symbol = symbol.strip() if symbol not in stop_symbols: source_chunk.append(symbol) else: try: self.next_token = self.symbol_table[symbol](self) break except KeyError: self.next_token = self.symbol_table['(unknown)'](self) raise self.next_token.wrong_syntax() else: source_chunk.append(self.next_match.group()) return ''.join(source_chunk) def expression(self, rbp: int = 0) -> TK_co: """ Pratt's function for parsing an expression. 
It calls token.nud() and then advances while the right binding power is less than the left binding power of the next token, invoking the led() method on each following token. :param rbp: right binding power for the expression. :return: left token. """ self.advance() left = self.token.nud() while rbp < self.next_token.lbp: self.advance() left = self.token.led(left) return cast(TK_co, left) @property def position(self) -> Tuple[int, int]: """Property that returns the current line and column indexes.""" return self.token.position def is_source_start(self) -> bool: """ Returns `True` if the parser is positioned at the start of the source, ignoring the spaces. """ return not bool(self.source[0:self.token.span[0]].strip()) def is_line_start(self) -> bool: """ Returns `True` if the parser is positioned at the start of a source line, ignoring the spaces. """ token_index = self.token.span[0] try: line_start = self.source[:token_index].rindex('\n') + 1 except ValueError: return not bool(self.source[:token_index].strip()) else: return not bool(self.source[line_start:token_index].strip()) def is_spaced(self, before: bool = True, after: bool = True) -> bool: """ Returns `True` if the source has an extra space (whitespace, tab or newline) immediately before or after the current position of the parser. :param before: if `True` considers also the extra spaces before \ the current token symbol. :param after: if `True` considers also the extra spaces after \ the current token symbol. """ start, end = self.token.span try: if before and start > 0 and self.source[start - 1] in ' \t\n': return True return after and self.source[end] in ' \t\n' except IndexError: return False @staticmethod def unescape(string_literal: str) -> str: return string_literal[1:-1].replace("\\'", "'").replace('\\"', '"') @classmethod def register(cls, symbol: Union[str, Type[TK_co]], **kwargs: Any) -> Type[TK_co]: """ Register/update a token class in the symbol table.
:param symbol: The identifier symbol for a new class or an existent token class. :param kwargs: Optional attributes/methods for the token class. :return: A token class. """ if isinstance(symbol, str): if ' ' in symbol: raise ValueError("%r: a symbol can't contain whitespaces" % symbol) lookup_name = kwargs.get('lookup_name', symbol) try: token_class = cls.symbol_table[lookup_name] except KeyError: # Register a new symbol and create a new custom class. The new class # name is registered at parser class's module level. if symbol not in cls.SYMBOLS: if symbol != '(start)': # for backward compatibility raise NameError('%r is not a symbol of the parser %r.' % (symbol, cls)) kwargs['symbol'] = symbol kwargs['lookup_name'] = lookup_name label = kwargs.get('label', 'symbol') if isinstance(label, tuple): label = kwargs['label'] = MultiLabel(*label) token_class_name = "_{}{}".format( _symbol_to_classname(symbol), str(label).title().replace(' ', '') ) token_class_bases = kwargs.get('bases', (cls.token_base_class,)) kwargs.update({ '__module__': cls.__module__, '__qualname__': token_class_name, '__return__': None }) token_class = cast( Type[TK_co], ABCMeta(token_class_name, token_class_bases, kwargs) ) cls.symbol_table[lookup_name] = token_class MutableSequence.register(token_class) setattr(sys.modules[cls.__module__], token_class_name, token_class) elif not isinstance(symbol, type) or not issubclass(symbol, Token): raise TypeError("A string or a %r subclass requested, not %r." % (Token, symbol)) else: token_class = symbol if cls.symbol_table.get(symbol.lookup_name) is not token_class: raise ValueError("Token class %r is not registered." 
% token_class) for key, value in kwargs.items(): if key == 'lbp' and value > token_class.lbp: token_class.lbp = value elif key == 'rbp' and value > token_class.rbp: token_class.rbp = value elif callable(value): setattr(token_class, key, value) return token_class @classmethod def unregister(cls, symbol: str) -> None: """Unregister a token class from the symbol table.""" del cls.symbol_table[symbol.strip()] @classmethod def duplicate(cls, symbol: str, new_symbol: str, **kwargs: Any) -> Type[TK_co]: """Duplicate a token class with a new symbol.""" token_class = cls.symbol_table[symbol] new_token_class = cls.register(new_symbol, **kwargs) for key, value in token_class.__dict__.items(): if key in kwargs or key in ('symbol', 'pattern') or key.startswith('_'): continue setattr(new_token_class, key, value) return new_token_class @classmethod def literal(cls, symbol: str, bp: int = 0) -> Type[TK_co]: """Register a token for a symbol that represents a *literal*.""" def nud(self: Token[TK_co]) -> Token[TK_co]: return self def evaluate(self: Token[TK_co], *_args: Any, **_kwargs: Any) -> Any: return self.value return cls.register(symbol, label='literal', lbp=bp, evaluate=evaluate, nud=nud) @classmethod def nullary(cls, symbol: str, bp: int = 0) -> Type[TK_co]: """Register a token for a symbol that represents a *nullary* operator.""" def nud(self: Token[TK_co]) -> Token[TK_co]: return self return cls.register(symbol, label='operator', lbp=bp, nud=nud) @classmethod def prefix(cls, symbol: str, bp: int = 0) -> Type[TK_co]: """Register a token for a symbol that represents a *prefix* unary operator.""" def nud(self: Token[TK_co]) -> Token[TK_co]: self[:] = self.parser.expression(rbp=bp), return self return cls.register(symbol, label='prefix operator', lbp=bp, rbp=bp, nud=nud) @classmethod def postfix(cls, symbol: str, bp: int = 0) -> Type[TK_co]: """Register a token for a symbol that represents a *postfix* unary operator.""" def led(self: Token[TK_co], left: Token[TK_co]) -> 
Token[TK_co]: self[:] = left, return self return cls.register(symbol, label='postfix operator', lbp=bp, rbp=bp, led=led) @classmethod def infix(cls, symbol: str, bp: int = 0) -> Type[TK_co]: """Register a token for a symbol that represents an *infix* binary operator.""" def led(self: Token[TK_co], left: Token[TK_co]) -> Token[TK_co]: self[:] = left, self.parser.expression(rbp=bp) return self return cls.register(symbol, label='operator', lbp=bp, rbp=bp, led=led) @classmethod def infixr(cls, symbol: str, bp: int = 0) -> Type[TK_co]: """Register a token for a symbol that represents an *infixr* binary operator.""" def led(self: Token[TK_co], left: Token[TK_co]) -> Token[TK_co]: self[:] = left, self.parser.expression(rbp=bp - 1) return self return cls.register(symbol, label='operator', lbp=bp, rbp=bp - 1, led=led) @classmethod def method(cls, symbol: Union[str, Type[TK_co]], bp: int = 0) \ -> Callable[[Callable[..., Any]], Callable[..., Any]]: """ Register a token for a symbol that represents a custom operator or redefine a method for an existing token. """ token_class = cls.register(symbol, label='operator', lbp=bp, rbp=bp) @no_type_check_decorator def bind(func: Callable[..., Any]) -> Callable[..., Any]: method_name = func.__name__.partition('_')[0] if not callable(getattr(token_class, method_name)): raise TypeError(f"The attribute {method_name!r} is not a callable of {token_class}") setattr(token_class, method_name, func) return func return bind @classmethod def build(cls) -> None: """ Builds the parser class. Checks if all declared symbols are defined and builds the regex tokenizer using the symbol related patterns. 
""" # For backward compatibility with external defined parsers if '(start)' not in cls.symbol_table: cls.register('(start)') symbols = {tk.symbol for tk in cls.symbol_table.values()} if not cls.SYMBOLS.issubset(symbols): unregistered = list(s for s in cls.SYMBOLS if s not in symbols) raise ValueError("The parser %r has unregistered symbols: %r" % (cls, unregistered)) cls.tokenizer = cls.create_tokenizer(cls.symbol_table) build_tokenizer = build # For backward compatibility @classmethod def create_tokenizer(cls, symbol_table: MutableMapping[str, Type[TK_co]]) -> Pattern[str]: """ Returns a regex based tokenizer built from a symbol table of token classes. The returned tokenizer skips extra spaces between symbols. A regular expression is created from the symbol table of the parser using a template. The symbols are inserted in the template putting the longer symbols first. Symbols and their patterns can't contain spaces. :param symbol_table: a dictionary containing the token classes of the formal language. 
""" character_patterns = [] string_patterns = [] name_patterns = [] custom_patterns = set() for symbol, token_class in symbol_table.items(): if symbol in SPECIAL_SYMBOLS: continue elif token_class.pattern is not None: custom_patterns.add(token_class.pattern) elif cls.name_pattern.match(symbol) is not None: name_patterns.append(re.escape(symbol)) elif len(symbol) == 1: character_patterns.append(re.escape(symbol)) else: string_patterns.append(re.escape(symbol)) symbols_patterns: List[str] = [] if string_patterns: symbols_patterns.append('|'.join(sorted(string_patterns, key=lambda x: -len(x)))) if character_patterns: symbols_patterns.append('[{}]'.format(''.join(character_patterns))) if name_patterns: symbols_patterns.append(r'\b(?:{})\b(?![\-\.])'.format( '|'.join(sorted(name_patterns, key=lambda x: -len(x))) )) if custom_patterns: symbols_patterns.append('|'.join(custom_patterns)) tokenizer_pattern = r"({})|({})|({})|(\S)|\s+".format( cls.literals_pattern.pattern, '|'.join(symbols_patterns), cls.name_pattern.pattern ) return re.compile(tokenizer_pattern) elementpath-3.0.2/elementpath/tree_builders.py000066400000000000000000000276051427546011100215450ustar00rootroot00000000000000# # Copyright (c), 2018-2022, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # from typing import cast, Any, Dict, Iterator, List, MutableMapping, Optional, Union from .exceptions import ElementPathTypeError from .protocols import ElementProtocol, LxmlElementProtocol, \ DocumentProtocol, XsdElementProtocol from .etree import is_etree_document, is_etree_element from .xpath_nodes import SchemaElemType, RootArgType, ChildNodeType, \ ElementMapType, TextNode, CommentNode, ProcessingInstructionNode, \ ElementNode, SchemaElementNode, DocumentNode __all__ = ['get_node_tree', 'build_node_tree', 'build_lxml_node_tree', 'build_schema_node_tree'] def is_schema(obj: Any) -> bool: return hasattr(obj, 'xsd_version') and hasattr(obj, 'maps') and not hasattr(obj, 'parent') def get_node_tree(root: RootArgType, namespaces: Optional[Dict[str, str]] = None) \ -> Union[DocumentNode, ElementNode]: """ Returns a tree of XPath nodes that wrap the provided root tree. :param root: an Element or an ElementTree or a schema or a schema element. :param namespaces: an optional mapping from prefixes to namespace URIs, \ Ignored if root is a lxml etree or a schema structure. 
""" if isinstance(root, (DocumentNode, ElementNode)): return root elif is_etree_document(root): if hasattr(root, 'xpath'): return build_lxml_node_tree(cast(DocumentProtocol, root)) return build_node_tree( cast(DocumentProtocol, root), namespaces ) elif hasattr(root, 'xsd_version') and hasattr(root, 'maps'): # schema or schema node return build_schema_node_tree( cast(SchemaElemType, root) ) elif is_etree_element(root) and not callable(root.tag): # type: ignore[union-attr] if hasattr(root, 'nsmap') and hasattr(root, 'xpath'): return build_lxml_node_tree(cast(LxmlElementProtocol, root)) return build_node_tree( cast(ElementProtocol, root), namespaces ) else: msg = "invalid root {!r}, an Element or an ElementTree or a schema node required" raise ElementPathTypeError(msg.format(root)) def build_node_tree(root: Union[DocumentProtocol, ElementProtocol], namespaces: Optional[MutableMapping[str, str]] = None) \ -> Union[DocumentNode, ElementNode]: """ Returns a tree of XPath nodes that wrap the provided root tree. :param root: an Element or an ElementTree. :param namespaces: an optional mapping from prefixes to namespace URIs. """ root_node: Union[DocumentNode, ElementNode] parent: Any elements: Any child: ChildNodeType children: Iterator[Any] position = 1 def build_element_node() -> ElementNode: nonlocal position node = ElementNode(elem, parent, position, namespaces) position += 1 elements[elem] = node # Do not generate namespace and attribute nodes, only reserve positions. 
position += len(node.nsmap) + int('xml' not in node.nsmap) + len(elem.attrib) if elem.text is not None: node.children.append(TextNode(elem.text, node, position)) position += 1 return node if hasattr(root, 'parse'): document = cast(DocumentProtocol, root) root_node = parent = DocumentNode(document, position) position += 1 elements = root_node.elements elem = document.getroot() child = build_element_node() parent.children.append(child) parent = child else: elem = cast(ElementProtocol, root) parent = None elements = {} root_node = parent = build_element_node() root_node.elements = elements children = iter(elem) iterators: List[Any] = [] ancestors: List[Any] = [] while True: for elem in children: if not callable(elem.tag): child = build_element_node() elif elem.tag.__name__ == 'Comment': # type: ignore[attr-defined] child = CommentNode(elem, parent, position) position += 1 else: child = ProcessingInstructionNode(elem, parent, position) parent.children.append(child) if elem.tail is not None: parent.children.append(TextNode(elem.tail, parent, position)) position += 1 if len(elem): ancestors.append(parent) parent = child iterators.append(children) children = iter(elem) break else: try: children, parent = iterators.pop(), ancestors.pop() except IndexError: return root_node def build_lxml_node_tree(root: Union[DocumentProtocol, LxmlElementProtocol]) \ -> Union[DocumentNode, ElementNode]: """ Returns a tree of XPath nodes that wrap the provided lxml root tree. :param root: a lxml Element or a lxml ElementTree. """ root_node: Union[DocumentNode, ElementNode] parent: Any elements: Any child: ChildNodeType children: Iterator[Any] position = 1 def build_lxml_element_node() -> ElementNode: nonlocal position node = ElementNode(elem, parent, position, elem.nsmap) position += 1 elements[elem] = node # Do not generate namespace and attribute nodes, only reserve positions. 
position += len(elem.nsmap) + int('xml' not in elem.nsmap) + len(elem.attrib) if elem.text is not None: node.children.append(TextNode(elem.text, node, position)) position += 1 return node if hasattr(root, 'parse'): document = cast(DocumentProtocol, root) root_node = parent = DocumentNode(document, position) position += 1 else: # create a new ElementTree for the root element at position==0 document = cast(LxmlElementProtocol, root).getroottree() root_node = parent = DocumentNode(document, 0) elem = cast(LxmlElementProtocol, document.getroot()) elements = root_node.elements # Add root siblings (comments and processing instructions) for e in reversed([x for x in elem.itersiblings(preceding=True)]): if e.tag.__name__ == 'Comment': # type: ignore[attr-defined] parent.children.append(CommentNode(e, parent, position)) else: parent.children.append(ProcessingInstructionNode(e, parent, position)) position += 1 child = build_lxml_element_node() parent.children.append(child) for e in elem.itersiblings(): if e.tag.__name__ == 'Comment': # type: ignore[attr-defined] parent.children.append(CommentNode(e, parent, position)) else: parent.children.append(ProcessingInstructionNode(e, parent, position)) position += 1 if not root_node.position and len(parent.children) == 1: # Remove non-root document if root element has no siblings child.elements = root_node.elements root_node = child root_node.parent = None parent = child iterators: List[Any] = [] ancestors: List[Any] = [] children = iter(elem) while True: for elem in children: if not callable(elem.tag): child = build_lxml_element_node() elif elem.tag.__name__ == 'Comment': # type: ignore[attr-defined] child = CommentNode(elem, parent, position) position += 1 else: child = ProcessingInstructionNode(elem, parent, position) parent.children.append(child) if elem.tail is not None: parent.children.append(TextNode(elem.tail, parent, position)) position += 1 if len(elem): ancestors.append(parent) parent = child iterators.append(children) 
children = iter(elem) break else: try: children, parent = iterators.pop(), ancestors.pop() except IndexError: if isinstance(root_node, ElementNode) and root_node.elem is not root: for _node in root_node.iter_descendants(): if isinstance(_node, ElementNode) and _node.elem is root: return _node return root_node def build_schema_node_tree(root: SchemaElemType, elements: Optional[ElementMapType] = None, global_elements: Optional[List[ChildNodeType]] = None) \ -> SchemaElementNode: """ Returns a tree of XPath nodes that wrap the provided XSD schema structure. :param root: a schema or a schema element. :param elements: a shared map from XSD elements to tree nodes. Provided for \ linking together parts of the same schema or other schemas. :param global_elements: a list for schema global elements, used for linking \ the elements declared by reference. """ parent: Any elem: Any child: SchemaElementNode children: Iterator[Any] position = 1 _elements = {} if elements is None else elements def build_schema_element_node() -> SchemaElementNode: nonlocal position node = SchemaElementNode(elem, parent, position, elem.namespaces) position += 1 _elements[elem] = node # Do not generate namespace and attribute nodes, only reserve positions. 
position += len(elem.namespaces) + int('xml' not in elem.namespaces) + len(elem.attrib) return node children = iter(root) elem = root parent = None root_node = parent = build_schema_element_node() root_node.elements = _elements if global_elements is not None: global_elements.append(root_node) elif is_schema(root): global_elements = root_node.children else: # Track global elements even if the initial root is not a schema to avoid circularity global_elements = [] local_nodes = {root: root_node} # Irrelevant even if it's the schema ref_nodes: List[SchemaElementNode] = [] iterators: List[Any] = [] ancestors: List[Any] = [] while True: for elem in children: child = build_schema_element_node() child.xsd_type = elem.type parent.children.append(child) if elem in local_nodes: if elem.ref is None: child.children = local_nodes[elem].children else: ref_nodes.append(child) else: local_nodes[elem] = child if elem.ref is None: ancestors.append(parent) parent = child iterators.append(children) children = iter(elem) break else: ref_nodes.append(child) else: try: children, parent = iterators.pop(), ancestors.pop() except IndexError: # connect references to proper nodes for element_node in ref_nodes: ref = cast(XsdElementProtocol, element_node.elem).ref assert ref is not None other: Any for other in global_elements: if other.elem is ref: element_node.ref = other break else: # Extend node tree with other globals element_node.ref = build_schema_node_tree( ref, _elements, global_elements ) return root_node elementpath-3.0.2/elementpath/xpath1/000077500000000000000000000000001427546011100175365ustar00rootroot00000000000000elementpath-3.0.2/elementpath/xpath1/__init__.py000066400000000000000000000007711427546011100216540ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. 
# See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from typing import TYPE_CHECKING if TYPE_CHECKING: from .xpath1_parser import XPath1Parser else: from ._xpath1_axes import XPath1Parser __all__ = ['XPath1Parser'] elementpath-3.0.2/elementpath/xpath1/_xpath1_axes.py000066400000000000000000000073321427546011100225010ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # # type: ignore """ XPath 1.0 implementation - part 4 (axes) """ from ..xpath_nodes import ElementNode from ..xpath_context import XPathSchemaContext from ._xpath1_functions import XPath1Parser method = XPath1Parser.method axis = XPath1Parser.axis @method('@', bp=80) def nud_attribute_reference(self): self.parser.expected_name( '*', '(name)', ':', '{', 'Q{', message="invalid attribute specification") self[:] = self.parser.expression(rbp=80), return self @method('@') @method(axis('attribute')) def select_attribute_reference_or_axis(self, context=None): if context is None: raise self.missing_context() else: for _ in context.iter_attributes(): yield from self[0].select(context) @method(axis('namespace')) def select_namespace_axis(self, context=None): if context is None: raise self.missing_context() elif isinstance(context, XPathSchemaContext): return # deprecated for XP20+ and not needed for schema analysis elif isinstance(context.item, ElementNode): elem = context.item if self[0].symbol == 'namespace-node': name = '*' else: name = self[0].value for context.item in elem.namespace_nodes: if name == '*' or name == context.item.prefix: yield context.item @method(axis('self')) def select_self_axis(self, context=None): if context is 
None: raise self.missing_context() else: for _ in context.iter_self(): yield from self[0].select(context) @method(axis('child')) def select_child_axis(self, context=None): if context is None: raise self.missing_context() else: for _ in context.iter_children_or_self(): yield from self[0].select(context) @method(axis('parent', reverse_axis=True)) def select_parent_axis(self, context=None): if context is None: raise self.missing_context() else: for _ in context.iter_parent(): yield from self[0].select(context) @method(axis('following-sibling')) @method(axis('preceding-sibling', reverse_axis=True)) def select_sibling_axes(self, context=None): if context is None: raise self.missing_context() else: for _ in context.iter_siblings(axis=self.symbol): yield from self[0].select(context) @method(axis('ancestor', reverse_axis=True)) @method(axis('ancestor-or-self', reverse_axis=True)) def select_ancestor_axes(self, context=None): if context is None: raise self.missing_context() else: for _ in context.iter_ancestors(axis=self.symbol): yield from self[0].select(context) @method(axis('descendant')) @method(axis('descendant-or-self')) def select_descendant_axes(self, context=None): if context is None: raise self.missing_context() else: for _ in context.iter_descendants(axis=self.symbol): yield from self[0].select(context) @method(axis('following')) def select_following_axis(self, context=None): if context is None: raise self.missing_context() else: for _ in context.iter_followings(): yield from self[0].select(context) @method(axis('preceding', reverse_axis=True)) def select_preceding_axis(self, context=None): if context is None: raise self.missing_context() else: for _ in context.iter_preceding(): yield from self[0].select(context) elementpath-3.0.2/elementpath/xpath1/_xpath1_functions.py000066400000000000000000000365131427546011100235540ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. 
# This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # # type: ignore """ XPath 1.0 implementation - part 3 (functions) """ import sys import math import decimal from ..datatypes import Duration, DayTimeDuration, YearMonthDuration, \ StringProxy, AnyURI, Float10 from ..namespaces import XML_ID, XML_LANG, get_prefixed_name from ..xpath_nodes import XPathNode, ElementNode, TextNode, CommentNode, \ ProcessingInstructionNode, DocumentNode from ..xpath_token import XPathFunction from ._xpath1_operators import XPath1Parser method = XPath1Parser.method function = XPath1Parser.function ### # Kind tests (for matching of node types in XPath 1.0 or sequence types in XPath 2.0) @method(function('node', nargs=0, label='kind test')) def select_node_kind_test(self, context=None): if context is None: raise self.missing_context() for item in context.iter_children_or_self(): if item is None: yield context.root elif isinstance(item, XPathNode): yield item @method('node') def nud_item_sequence_type(self): XPathFunction.nud(self) if self.parser.next_token.symbol in ('*', '+', '?'): self.occurrence = self.parser.next_token.symbol self.parser.advance() return self @method(function('processing-instruction', nargs=(0, 1), bp=79, label='kind test')) def select_pi_kind_test(self, context=None): if context is None: raise self.missing_context() for item in context.iter_children_or_self(): if isinstance(item, ProcessingInstructionNode): if not self: yield item else: name = self[0].value if item.name == ' '.join(name.strip().split()): yield item @method('processing-instruction') def nud_pi_kind_test(self): self.parser.advance('(') if self.parser.next_token.symbol != ')': self.parser.next_token.expected('(name)', '(string)') self[0:] = self.parser.expression(5), self.parser.advance(')') self.value = None return self @method(function('comment', 
nargs=0, label='kind test')) def select_comment_kind_test(self, context=None): if context is None: raise self.missing_context() for item in context.iter_children_or_self(): if isinstance(item, CommentNode): yield item @method(function('text', nargs=0, label='kind test')) def select_text_kind_test(self, context=None): if context is None: raise self.missing_context() for item in context.iter_children_or_self(): if isinstance(item, TextNode): yield item ### # Node set functions @method(function('last', nargs=0, sequence_types=('xs:integer',))) def evaluate_last_function(self, context=None): if context is None: raise self.missing_context() return context.size @method(function('position', nargs=0, sequence_types=('xs:integer',))) def evaluate_position_function(self, context=None): if context is None: raise self.missing_context() return context.position @method(function('count', nargs=1, sequence_types=('item()*', 'xs:integer'))) def evaluate_count_function(self, context=None): return len([x for x in self[0].select(context)]) @method(function('id', nargs=1, sequence_types=('xs:string*', 'element()*'))) def select_id_function(self, context=None): if context is None: raise self.missing_context() else: value = self[0].evaluate(context) item = context.item if item is None: item = context.root if isinstance(item, (ElementNode, DocumentNode)): for element in item.iter_descendants(): if isinstance(element, ElementNode) and element.elem.get(XML_ID) == value: yield element @method(function('name', nargs=(0, 1), sequence_types=('node()?', 'xs:string'))) @method(function('local-name', nargs=(0, 1), sequence_types=('node()?', 'xs:string'))) @method(function('namespace-uri', nargs=(0, 1), sequence_types=('node()?', 'xs:anyURI'))) def evaluate_name_related_functions(self, context=None): if context is None: raise self.missing_context() arg = self.get_argument(context, default_to_context=True) if arg is None: return '' elif not isinstance(arg, XPathNode): raise self.error('XPTY0004') 
name = arg.name if name is None: return '' symbol = self.symbol if symbol == 'name': nsmap = getattr(arg, 'nsmap', self.parser.namespaces) return get_prefixed_name(name, nsmap) elif symbol == 'local-name': return name if not name or name[0] != '{' else name.split('}')[1] elif self.parser.version == '1.0': return '' if not name or name[0] != '{' else name.split('}')[0][1:] else: return AnyURI('') if not name or name[0] != '{' else AnyURI(name.split('}')[0][1:]) ### # String functions @method(function('string', nargs=(0, 1), sequence_types=('item()?', 'xs:string'))) def evaluate_string_function(self, context=None): if not self: if context is None: raise self.missing_context() return self.string_value(context.item) return self.string_value(self.get_argument(context)) @method(function('contains', nargs=2, sequence_types=('xs:string?', 'xs:string?', 'xs:boolean'))) def evaluate_contains_function(self, context=None): arg1 = self.get_argument(context, default='', cls=str) arg2 = self.get_argument(context, index=1, default='', cls=str) return arg2 in arg1 @method(function('concat', nargs=(2, None), sequence_types=('xs:anyAtomicType?', 'xs:anyAtomicType?', 'xs:string'))) def evaluate_concat_function(self, context=None): return ''.join(self.string_value(self.get_argument(context, index=k)) for k in range(len(self))) @method(function('string-length', nargs=(0, 1), sequence_types=('xs:string?', 'xs:integer'))) def evaluate_string_length_function(self, context=None): if self: return len(self.get_argument(context, default_to_context=True, default='', cls=str)) try: return len(self.string_value(context.item)) except AttributeError: raise self.missing_context() from None @method(function('normalize-space', nargs=(0, 1), sequence_types=('xs:string?', 'xs:string'))) def evaluate_normalize_space_function(self, context=None): if self.parser.version == '1.0' or not self: arg = self.string_value(self.get_argument(context, default_to_context=True, default='')) else: arg = 
self.get_argument(context, default_to_context=True, default='', cls=str) return ' '.join(arg.strip().split()) @method(function('starts-with', nargs=2, sequence_types=('xs:string?', 'xs:string?', 'xs:boolean'))) def evaluate_starts_with_function(self, context=None): arg1 = self.get_argument(context, default='', cls=str) arg2 = self.get_argument(context, index=1, default='', cls=str) return arg1.startswith(arg2) @method(function('translate', nargs=3, sequence_types=('xs:string?', 'xs:string', 'xs:string', 'xs:string'))) def evaluate_translate_function(self, context=None): arg = self.get_argument(context, default='', cls=str) map_string = self.get_argument(context, index=1, cls=str) if map_string is None: message = "the 2nd argument of fn:translate() cannot be the empty sequence" raise self.error('XPTY0004', message) trans_string = self.get_argument(context, index=2, cls=str) if trans_string is None: message = "the 3rd argument of fn:translate() cannot be the empty sequence" raise self.error('XPTY0004', message) if len(map_string) == len(trans_string): return arg.translate(str.maketrans(map_string, trans_string)) elif len(map_string) > len(trans_string): k = len(trans_string) return arg.translate(str.maketrans(map_string[:k], trans_string, map_string[k:])) else: return arg.translate(str.maketrans(map_string, trans_string[:len(map_string)])) @method(function('substring', nargs=(2, 3), sequence_types=('xs:string?', 'xs:double', 'xs:double', 'xs:string'))) def evaluate_substring_function(self, context=None): item = self.get_argument(context, default='', cls=str) start = self.get_argument(context, index=1) try: if math.isnan(start) or math.isinf(start): return '' except TypeError: raise self.error('FORG0006', "the second argument must be xs:numeric") from None else: start = int(round(start)) - 1 if len(self) == 2: return item[max(start, 0):] else: length = self.get_argument(context, index=2) try: if math.isnan(length) or length <= 0: return '' except TypeError: raise 
self.error('FORG0006', "the third argument must be xs:numeric") from None if math.isinf(length): return item[max(start, 0):] else: stop = start + int(round(length)) return item[slice(max(start, 0), max(stop, 0))] @method(function('substring-before', nargs=2, sequence_types=('xs:string?', 'xs:string?', 'xs:string'))) @method(function('substring-after', nargs=2, sequence_types=('xs:string?', 'xs:string?', 'xs:string'))) def evaluate_substring_before_or_after_functions(self, context=None): arg1 = self.get_argument(context, default='', cls=str) arg2 = self.get_argument(context, index=1, default='', cls=str) index = arg1.find(arg2) if index < 0: return '' if self.symbol == 'substring-before': return arg1[:index] else: return arg1[index + len(arg2):] ### # Boolean functions @method(function('boolean', nargs=1, sequence_types=('item()*', 'xs:boolean'))) def evaluate_boolean_function(self, context=None): return self.boolean_value([x for x in self[0].select(context)]) @method(function('not', nargs=1, sequence_types=('item()*', 'xs:boolean'))) def evaluate_not_function(self, context=None): return not self.boolean_value([x for x in self[0].select(context)]) @method(function('true', nargs=0, sequence_types=('xs:boolean',))) def evaluate_true_function(self, context=None): return True @method(function('false', nargs=0, sequence_types=('xs:boolean',))) def evaluate_false_function(self, context=None): return False @method(function('lang', nargs=1, sequence_types=('xs:string?', 'xs:boolean'))) def evaluate_lang_function(self, context=None): if context is None: raise self.missing_context() elif not isinstance(context.item, ElementNode): return False else: try: lang = context.item.elem.attrib[XML_LANG].strip() except KeyError: for e in context.iter_ancestors(): if isinstance(e, ElementNode) and XML_LANG in e.elem.attrib: lang = e.elem.attrib[XML_LANG] break else: return False if '-' in lang: lang = lang.split('-', 1)[0] return lang.lower() == self[0].evaluate().lower() ### # Number 
functions @method(function('number', nargs=(0, 1), sequence_types=('xs:anyAtomicType?', 'xs:double'))) def evaluate_number_function(self, context=None): arg = self.get_argument(context, default_to_context=True) try: return float(self.string_value(arg) if isinstance(arg, XPathNode) else arg) except (TypeError, ValueError): return float('nan') @method(function('sum', nargs=(1, 2), sequence_types=('xs:anyAtomicType*', 'xs:anyAtomicType?', 'xs:anyAtomicType?'))) def evaluate_sum_function(self, context=None): try: values = [float(self.string_value(x)) if isinstance(x, XPathNode) else x for x in self[0].select(context)] except (TypeError, ValueError): if self.parser.version == '1.0': return float('nan') raise self.error('FORG0006') from None if not values: zero = 0 if len(self) == 1 else self.get_argument(context, index=1) return [] if zero is None else zero if all(isinstance(x, (decimal.Decimal, int)) for x in values): return sum(values) if len(values) > 1 else values[0] elif all(isinstance(x, DayTimeDuration) for x in values) or \ all(isinstance(x, YearMonthDuration) for x in values): if sys.version_info >= (3, 8): return sum(values[1:], start=values[0]) result = values[0] for val in values[1:]: result += val return result elif any(isinstance(x, Duration) for x in values): raise self.error('FORG0006', 'invalid sum of duration values') elif any(isinstance(x, (StringProxy, AnyURI)) for x in values): raise self.error('FORG0006', 'cannot apply fn:sum() to string-based types') elif any(isinstance(x, float) and math.isnan(x) for x in values): return float('nan') elif all(isinstance(x, Float10) for x in values): return sum(values) try: return sum(self.number_value(x) for x in values) except TypeError: if self.parser.version == '1.0': return float('nan') raise self.error('FORG0006') from None @method(function('ceiling', nargs=1, sequence_types=('numeric?', 'numeric?'))) @method(function('floor', nargs=1, sequence_types=('numeric?', 'numeric?'))) def 
evaluate_ceiling_and_floor_functions(self, context=None): arg = self.get_argument(context) if arg is None: return float('nan') if self.parser.version == '1.0' else [] elif isinstance(arg, XPathNode) or self.parser.compatibility_mode: arg = self.number_value(arg) try: if math.isnan(arg) or math.isinf(arg): return arg if self.symbol == 'floor': return type(arg)(math.floor(arg)) else: return type(arg)(math.ceil(arg)) except TypeError as err: if isinstance(arg, str): raise self.error('XPTY0004', err) from None raise self.error('FORG0006', err) from None @method(function('round', nargs=1, sequence_types=('numeric?', 'numeric?'))) def evaluate_round_function(self, context=None): arg = self.get_argument(context) if arg is None: return float('nan') if self.parser.version == '1.0' else [] elif isinstance(arg, XPathNode) or self.parser.compatibility_mode: arg = self.number_value(arg) if isinstance(arg, float) and (math.isnan(arg) or math.isinf(arg)): return arg try: number = decimal.Decimal(arg) if number > 0: return type(arg)(number.quantize(decimal.Decimal('1'), rounding='ROUND_HALF_UP')) else: return type(arg)(number.quantize(decimal.Decimal('1'), rounding='ROUND_HALF_DOWN')) except TypeError as err: raise self.error('FORG0006', err) from None except decimal.InvalidOperation: if isinstance(arg, str): raise self.error('XPTY0004') from None return round(arg) except decimal.DecimalException as err: raise self.error('FOCA0002', err) from None # XPath 1.0 definitions continue into module xpath1_axes elementpath-3.0.2/elementpath/xpath1/_xpath1_operators.py000066400000000000000000000647131427546011100235650ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # # type: ignore """ XPath 1.0 implementation - part 2 (operators and expressions) """ import math import decimal import operator from copy import copy from ..datatypes import AnyURI from ..exceptions import ElementPathKeyError, ElementPathTypeError from ..helpers import collapse_white_spaces, node_position from ..datatypes import AbstractDateTime, Duration, DayTimeDuration, \ YearMonthDuration, NumericProxy, ArithmeticProxy from ..xpath_context import XPathSchemaContext from ..namespaces import XMLNS_NAMESPACE, XSD_NAMESPACE from ..schema_proxy import AbstractSchemaProxy from ..xpath_nodes import XPathNode, ElementNode, AttributeNode, DocumentNode from .xpath1_parser import XPath1Parser OPERATORS_MAP = { '=': operator.eq, '!=': operator.ne, '>': operator.gt, '>=': operator.ge, '<': operator.lt, '<=': operator.le, } register = XPath1Parser.register nullary = XPath1Parser.nullary prefix = XPath1Parser.prefix infix = XPath1Parser.infix postfix = XPath1Parser.postfix method = XPath1Parser.method function = XPath1Parser.function axis = XPath1Parser.axis @method(register('(name)', bp=10, label='literal')) def nud_name_literal(self): if self.parser.next_token.symbol == '::': raise self.missing_axis("axis '%s::' not found" % self.value) elif self.parser.next_token.symbol == '(': if self.parser.version >= '2.0': pass # XP30+ has led() for '(' operator that can check this elif self.namespace == XSD_NAMESPACE: raise self.error('XPST0017', 'unknown constructor function {!r}'.format(self.value)) elif self.namespace or self.value not in self.parser.RESERVED_FUNCTION_NAMES: raise self.error('XPST0017', 'unknown function {!r}'.format(self.value)) else: msg = f"{self.value!r} is not allowed as function name" raise self.error('XPST0003', msg) return self @method('(name)') def evaluate_name_literal(self, context=None): return [x for x in self.select(context)] @method('(name)') def select_name_literal(self, context=None): if context is None: raise 
self.missing_context() elif isinstance(context, XPathSchemaContext): yield from self.select_xsd_nodes(context, self.value) return else: name = self.value default_namespace = self.parser.default_namespace # With an ElementTree context, check whether the token is bound to an XSD type. If not, # try a match using the element path. If this match fails, the xsd_types attribute # is set to the schema object to prevent other checks until the schema changes. if self.xsd_types is self.parser.schema: # Untyped selection for item in context.iter_children_or_self(): if item.match_name(name, default_namespace): yield item elif self.xsd_types is None or isinstance(self.xsd_types, AbstractSchemaProxy): # Try to match the type using the item's path for item in context.iter_children_or_self(): if item.match_name(name, default_namespace): if item.xsd_type is not None: yield item else: xsd_node = self.parser.schema.find(item.path, self.parser.namespaces) if xsd_node is None: self.xsd_types = self.parser.schema elif isinstance(item, AttributeNode): self.xsd_types = {item.name: xsd_node.type} else: self.xsd_types = {item.elem.tag: xsd_node.type} context.item = self.get_typed_node(item) yield context.item else: # XSD typed selection for item in context.iter_children_or_self(): if item.match_name(name, default_namespace): if item.xsd_type is not None: yield item else: context.item = self.get_typed_node(item) yield context.item ### # Namespace prefix reference @method(':', bp=95) def led_namespace_prefix(self, left): if self.parser.version == '1.0': left.expected('(name)') else: left.expected('(name)', '*') if not self.parser.next_token.label.endswith('function'): self.parser.expected_name('(name)', '*') if self.parser.is_spaced(): raise self.wrong_syntax("a QName cannot contain spaces before or after ':'") if left.symbol == '(name)': try: namespace = self.get_namespace(left.value) except ElementPathKeyError: self.parser.advance() # Assure there isn't a following incomplete comment self[:] = 
left, self.parser.token msg = "prefix {!r} is not declared".format(left.value) raise self.error('XPST0081', msg) from None else: self.parser.next_token.bind_namespace(namespace) elif self.parser.next_token.symbol != '(name)': raise self.wrong_syntax() self[:] = left, self.parser.expression(90) self.value = '{}:{}'.format(self[0].value, self[1].value) if self.parser.next_token.symbol == ':': raise self.wrong_syntax() return self @method(':') def evaluate_namespace_prefix(self, context=None): if self[1].label.endswith('function'): return self[1].evaluate(context) return [x for x in self.select(context)] @method(':') def select_namespace_prefix(self, context=None): if self[1].label.endswith('function'): value = self[1].evaluate(context) if isinstance(value, list): yield from value elif value is not None: yield value return if self[0].value == '*': name = '*:%s' % self[1].value else: name = '{%s}%s' % (self.get_namespace(self[0].value), self[1].value) if context is None: yield name elif isinstance(context, XPathSchemaContext): yield from self.select_xsd_nodes(context, name) elif self.xsd_types is self.parser.schema: for item in context.iter_children_or_self(): if item.match_name(name): yield item elif self.xsd_types is None or isinstance(self.xsd_types, AbstractSchemaProxy): for item in context.iter_children_or_self(): if item.match_name(name): assert isinstance(item, (ElementNode, AttributeNode)) if item.xsd_type is not None: yield item else: xsd_node = self.parser.schema.find(item.path, self.parser.namespaces) if xsd_node is not None: self.add_xsd_type(xsd_node) else: self.xsd_types = self.parser.schema context.item = self.get_typed_node(item) yield context.item else: # XSD typed selection for item in context.iter_children_or_self(): if item.match_name(name): assert isinstance(item, (ElementNode, AttributeNode)) if item.xsd_type is not None: yield item else: context.item = self.get_typed_node(item) yield context.item ### # Namespace URI as in ElementPath @method('{', 
bp=95) def nud_namespace_uri(self): if self.parser.strict and self.symbol == '{': raise self.wrong_syntax("not allowed symbol if parser has strict=True") self.parser.next_token.unexpected('{') if self.parser.next_token.symbol == '}': namespace = '' else: namespace = self.parser.next_token.value + self.parser.advance_until('}') namespace = collapse_white_spaces(namespace) try: AnyURI(namespace) except ValueError as err: msg = f"invalid URI in an EQName: {str(err)}" raise self.error('XQST0046', msg) from None if namespace == XMLNS_NAMESPACE: msg = f"cannot use the URI {XMLNS_NAMESPACE!r} in an EQName" raise self.error('XQST0070', msg) self.parser.advance() if not self.parser.next_token.label.endswith('function'): self.parser.expected_name('(name)', '*') self.parser.next_token.bind_namespace(namespace) self[:] = self.parser.symbol_table['(string)'](self.parser, namespace), \ self.parser.expression(90) if self[1].value is None or not self[0].value: self.value = self[1].value else: self.value = '{%s}%s' % (self[0].value, self[1].value) return self @method('{') def evaluate_namespace_uri(self, context=None): if self[1].label.endswith('function'): return self[1].evaluate(context) return [x for x in self.select(context)] @method('{') def select_namespace_uri(self, context=None): if self[1].label.endswith('function'): yield self[1].evaluate(context) return elif context is None: raise self.missing_context() if isinstance(context, XPathSchemaContext): yield from self.select_xsd_nodes(context, self.value) elif self.xsd_types is None: for item in context.iter_children_or_self(): if item.match_name(self.value): yield item else: # XSD typed selection for item in context.iter_children_or_self(): if item.match_name(self.value): assert isinstance(item, (ElementNode, AttributeNode)) if item.xsd_type is not None: yield item else: context.item = self.get_typed_node(item) yield context.item ### # Variables @method('$', bp=90) def nud_variable_reference(self): 
self.parser.expected_name('(name)') self[:] = self.parser.expression(rbp=90), if ':' in self[0].value: raise self[0].wrong_syntax("variable reference requires a simple reference name") return self @method('$') def evaluate_variable_reference(self, context=None): if context is None: raise self.missing_context() try: return context.variables[self[0].value] except KeyError as err: raise self.missing_name('unknown variable %r' % str(err)) from None ### # Nullary operators (use only the context) @method(nullary('*')) def select_wildcard(self, context=None): if self: # Product operator item = self.evaluate(context) if item is not None: if context is not None: context.item = item yield item elif context is None: raise self.missing_context() # Wildcard literal elif isinstance(context, XPathSchemaContext): for item in context.iter_children_or_self(): if item is not None: self.add_xsd_type(item) yield item elif self.xsd_types is None: for item in context.iter_children_or_self(): if item is None: pass # '*' wildcard doesn't match document nodes elif context.axis == 'attribute': if isinstance(item, AttributeNode): yield item elif isinstance(item, ElementNode): yield item else: # XSD typed selection for item in context.iter_children_or_self(): if context.item is not None and context.is_principal_node_kind(): if isinstance(item, (ElementNode, AttributeNode)) and \ item.xsd_type is not None: yield item else: context.item = self.get_typed_node(item) yield context.item @method(nullary('.')) def select_self_shortcut(self, context=None): if context is None: raise self.missing_context() elif isinstance(context, XPathSchemaContext): for item in context.iter_self(): if isinstance(item, (AttributeNode, ElementNode)): if item.is_schema_element(): self.add_xsd_type(item) elif item is context.root: # item is the schema for xsd_element in item: self.add_xsd_type(xsd_element) yield item elif self.xsd_types is None: for item in context.iter_self(): if item is not None: yield item elif 
isinstance(context.root, DocumentNode): yield context.root else: for item in context.iter_self(): if item is not None: if isinstance(item, (ElementNode, AttributeNode)) and \ item.xsd_type is not None: yield item else: context.item = self.get_typed_node(item) yield context.item elif isinstance(context.root, DocumentNode): yield context.root @method(nullary('..')) def select_parent_shortcut(self, context=None): if context is None: raise self.missing_context() yield from context.iter_parent() ### # Logical Operators @method(infix('or', bp=20)) def evaluate_or_operator(self, context=None): return self.boolean_value(self[0].evaluate(copy(context))) or \ self.boolean_value(self[1].evaluate(copy(context))) @method(infix('and', bp=25)) def evaluate_and_operator(self, context=None): return self.boolean_value(self[0].evaluate(copy(context))) and \ self.boolean_value(self[1].evaluate(copy(context))) ### # Comparison operators @method('=', bp=30) @method('!=', bp=30) @method('<', bp=30) @method('>', bp=30) @method('<=', bp=30) @method('>=', bp=30) def led_comparison_operators(self, left): if left.symbol in OPERATORS_MAP: raise self.wrong_syntax() self[:] = left, self.parser.expression(rbp=30) return self @method('=') @method('!=') @method('<') @method('>') @method('<=') @method('>=') def evaluate_comparison_operators(self, context=None): op = OPERATORS_MAP[self.symbol] try: return any(op(x1, x2) for x1, x2 in self.iter_comparison_data(context)) except ElementPathTypeError: raise except TypeError as err: raise self.error('XPTY0004', err) from None except ValueError as err: raise self.error('FORG0001', err) from None ### # Numerical operators @method(infix('+', bp=40)) def evaluate_plus_operator(self, context=None): if len(self) == 1: arg = self.get_argument(context, cls=NumericProxy) if arg is not None: return +arg else: op1, op2 = self.get_operands(context, cls=ArithmeticProxy) if op1 is not None: try: return op1 + op2 except TypeError as err: raise self.error('XPTY0004', 
err) from None except OverflowError as err: if isinstance(op1, AbstractDateTime): raise self.error('FODT0001', err) from None elif isinstance(op1, Duration): raise self.error('FODT0002', err) from None else: raise self.error('FOAR0002', err) from None @method(infix('-', bp=40)) def evaluate_minus_operator(self, context=None): if len(self) == 1: arg = self.get_argument(context, cls=NumericProxy) if arg is not None: return -arg else: op1, op2 = self.get_operands(context, cls=ArithmeticProxy) if op1 is not None: try: return op1 - op2 except TypeError as err: raise self.error('XPTY0004', err) from None except OverflowError as err: if isinstance(op1, AbstractDateTime): raise self.error('FODT0001', err) from None elif isinstance(op1, Duration): raise self.error('FODT0002', err) from None else: raise self.error('FOAR0002', err) from None @method('+') @method('-') def nud_plus_minus_operators(self): self[:] = self.parser.expression(rbp=70), return self @method(infix('*', bp=45)) def evaluate_multiply_operator(self, context=None): if self: op1, op2 = self.get_operands(context, cls=ArithmeticProxy) if op1 is not None: try: if isinstance(op2, (YearMonthDuration, DayTimeDuration)): return op2 * op1 return op1 * op2 except TypeError as err: if isinstance(op1, (float, decimal.Decimal)): if math.isnan(op1): raise self.error('FOCA0005') from None elif math.isinf(op1): raise self.error('FODT0002') from None if isinstance(op2, (float, decimal.Decimal)): if math.isnan(op2): raise self.error('FOCA0005') from None elif math.isinf(op2): raise self.error('FODT0002') from None raise self.error('XPTY0004', err) from None except ValueError as err: raise self.error('FOCA0005', err) from None except OverflowError as err: if isinstance(op1, AbstractDateTime): raise self.error('FODT0001', err) from None elif isinstance(op1, Duration): raise self.error('FODT0002', err) from None else: raise self.error('FOAR0002', err) from None else: # This is not a multiplication operator but a wildcard select 
statement return [x for x in self.select(context)] @method(infix('div', bp=45)) def evaluate_div_operator(self, context=None): dividend, divisor = self.get_operands(context, cls=ArithmeticProxy) if dividend is None: return elif divisor != 0: try: if isinstance(dividend, int) and isinstance(divisor, int): return decimal.Decimal(dividend) / decimal.Decimal(divisor) return dividend / divisor except TypeError as err: raise self.error('XPTY0004', err) from None except ValueError as err: raise self.error('FOCA0005', err) from None except OverflowError as err: raise self.error('FOAR0002', err) from None except (ZeroDivisionError, decimal.DivisionByZero): raise self.error('FOAR0001') from None elif isinstance(dividend, AbstractDateTime): raise self.error('FODT0001') elif isinstance(dividend, Duration): raise self.error('FODT0002') elif not self.parser.compatibility_mode and \ isinstance(dividend, (int, decimal.Decimal)) and \ isinstance(divisor, (int, decimal.Decimal)): raise self.error('FOAR0001') elif dividend == 0: return float('nan') elif dividend > 0: return float('-inf') if str(divisor).startswith('-') else float('inf') else: return float('inf') if str(divisor).startswith('-') else float('-inf') @method(infix('mod', bp=45)) def evaluate_mod_operator(self, context=None): op1, op2 = self.get_operands(context, cls=NumericProxy) if op1 is not None: if op2 == 0 and isinstance(op2, float): return float('nan') elif math.isinf(op2) and not math.isinf(op1) and op1 != 0: return op1 if self.parser.version != '1.0' else float('nan') try: if isinstance(op1, int) and isinstance(op2, int): return op1 % op2 if op1 * op2 >= 0 else -(abs(op1) % op2) return op1 % op2 except TypeError as err: raise self.error('FORG0006', err) from None except (ZeroDivisionError, decimal.InvalidOperation): raise self.error('FOAR0001') from None # Resolve the intrinsic ambiguity of some infix operators @method('or') @method('and') @method('div') @method('mod') def nud_logical_div_mod_operators(self): 
token = self.parser.symbol_table['(name)'](self.parser, self.symbol) return token.nud() ### # Union expressions @method('|', bp=50) def led_union_operator(self, left): self.cut_and_sort = True if left.symbol in ('|', 'union'): left.cut_and_sort = False self[:] = left, self.parser.expression(rbp=50) return self @method('|') def select_union_operator(self, context=None): if context is None: raise self.missing_context() results = {item for k in range(2) for item in self[k].select(copy(context))} if any(not isinstance(x, XPathNode) for x in results): raise self.error('XPTY0004', 'only XPath nodes are allowed') elif not self.cut_and_sort: yield from results else: yield from sorted(results, key=node_position) ### # Path expressions @method('//', bp=75) def nud_descendant_path(self): if self.parser.next_token.label not in self.parser.PATH_STEP_LABELS: self.parser.expected_name(*self.parser.PATH_STEP_SYMBOLS) self[:] = self.parser.expression(75), return self @method('/', bp=75) def nud_child_path(self): if self.parser.next_token.label not in self.parser.PATH_STEP_LABELS: try: self.parser.expected_name(*self.parser.PATH_STEP_SYMBOLS) except SyntaxError: return self self[:] = self.parser.expression(75), return self @method('//') @method('/') def led_child_or_descendant_path(self, left): if self.parser.next_token.label not in self.parser.PATH_STEP_LABELS: self.parser.expected_name(*self.parser.PATH_STEP_SYMBOLS) self[:] = left, self.parser.expression(75) return self @method('/') def select_child_path(self, context=None): """ Child path expression. Selects child:: axis as default (when bound to '*' or '(name)').
""" if context is None: raise self.missing_context() elif not self: if isinstance(context.root, DocumentNode): yield context.root elif len(self) == 1: if context.item is context.root and context.item.parent is not None: return # A rooted subtree -> document root produce [] elif not isinstance(context, XPathSchemaContext): context.item = None else: context.item = context.root yield from self[0].select(context) else: items = set() for _ in context.inner_focus_select(self[0]): if not isinstance(context.item, XPathNode): msg = f"Intermediate step contains an atomic value {context.item!r}" raise self.error('XPTY0019', msg) for result in self[1].select(context): if not isinstance(result, XPathNode): yield result elif result in items: pass elif isinstance(result, ElementNode): if result.elem not in items: items.add(result) yield result else: items.add(result) yield result if isinstance(context, XPathSchemaContext): self[1].add_xsd_type(result) @method('//') def select_descendant_path(self, context=None): """Operator '//' is a short equivalent to /descendant-or-self::node()/""" if context is None: raise self.missing_context() elif len(self) == 2: items = set() for _ in context.inner_focus_select(self[0]): if not isinstance(context.item, XPathNode): raise self.error('XPTY0019') for _ in context.iter_descendants(): for result in self[1].select(context): if not isinstance(result, XPathNode): yield result elif result in items: pass elif isinstance(result, ElementNode): if result.elem not in items: items.add(result) yield result else: items.add(result) yield result if isinstance(context, XPathSchemaContext): self[1].add_xsd_type(result) else: if context.item is context.root and context.item.parent is not None: return # A rooted subtree -> document root produce [] elif not isinstance(context, XPathSchemaContext): context.item = None else: context.item = context.root items = set() for _ in context.iter_descendants(): for result in self[0].select(context): if not 
isinstance(result, XPathNode): items.add(result) elif result in items: pass elif isinstance(result, ElementNode): if result.elem not in items: items.add(result) else: items.add(result) if isinstance(context, XPathSchemaContext): self[0].add_xsd_type(result) yield from sorted(items, key=node_position) ### # Predicate filters @method('[', bp=80) def led_predicate(self, left): self[:] = left, self.parser.expression() self.parser.advance(']') return self @method('[') def select_predicate(self, context=None): if context is None: raise self.missing_context() for _ in context.inner_focus_select(self[0]): if (self[1].label in ('axis', 'kind test') or self[1].symbol == '..') \ and not isinstance(context.item, XPathNode): raise self.error('XPTY0020') elif False and isinstance(context, XPathSchemaContext): yield context.item continue predicate = [x for x in self[1].select(copy(context))] if len(predicate) == 1 and isinstance(predicate[0], NumericProxy): if context.position == predicate[0]: yield context.item elif self.boolean_value(predicate): yield context.item ### # Parenthesized expressions @method('(', bp=100) def nud_parenthesized_expr(self): self[:] = self.parser.expression(), self.parser.advance(')') return self @method('(') def evaluate_parenthesized_expr(self, context=None): return self[0].evaluate(context) @method('(') def select_parenthesized_expr(self, context=None): return self[0].select(context) # XPath 1.0 definitions continue into module xpath1_functions elementpath-3.0.2/elementpath/xpath1/xpath1_parser.py000066400000000000000000000466701427546011100227060ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # """ XPath 1.0 implementation - part 1 (parser class and symbols) """ import re from abc import ABCMeta from typing import cast, Any, ClassVar, Dict, FrozenSet, MutableMapping, \ Optional, Tuple, Type, Set, Sequence from ..helpers import OCCURRENCE_INDICATORS, EQNAME_PATTERN, normalize_sequence_type from ..exceptions import MissingContextError, ElementPathKeyError, \ ElementPathValueError, xpath_error from ..datatypes import AnyAtomicType, NumericProxy, UntypedAtomic, QName, \ xsd10_atomic_types, xsd11_atomic_types from ..tdop import Token, Parser from ..namespaces import NamespacesType, XML_NAMESPACE, XSD_NAMESPACE, XSD_ERROR, \ XPATH_FUNCTIONS_NAMESPACE, XSD_ANY_SIMPLE_TYPE, XSD_ANY_ATOMIC_TYPE, \ XSD_UNTYPED_ATOMIC, get_namespace, get_expanded_name from ..schema_proxy import AbstractSchemaProxy from ..xpath_token import NargsType, XPathToken, XPathAxis, XPathFunction, ProxyToken from ..xpath_nodes import XPathNode, ElementNode, AttributeNode, DocumentNode COMMON_SEQUENCE_TYPES = { 'xs:untyped', 'untypedAtomic', 'attribute()', 'attribute(*)', 'element()', 'element(*)', 'text()', 'document-node()', 'comment()', 'processing-instruction()', 'item()', 'node()', 'numeric' } class XPath1Parser(Parser[XPathToken]): """ XPath 1.0 expression parser class. Provide a *namespaces* dictionary argument for mapping namespace prefixes to URIs inside expressions. If *strict* is set to `False` the parser also enables the parsing of QNames in extended format, like the ElementPath library. :param namespaces: a dictionary that maps namespace prefixes to URIs. :param strict: if `False` the parser also enables the parsing of QNames \ in extended format, like Python's ElementPath library. Default is `True`.
""" version = '1.0' """The XPath version string.""" token_base_class: Type[Token[Any]] = XPathToken literals_pattern = re.compile( r"""'(?:[^']|'')*'|"(?:[^"]|"")*"|(?:\d+|\.\d+)(?:\.\d*)?(?:[Ee][+-]?\d+)?""" ) name_pattern = re.compile(r'[^\d\W][\w.\-\xb7\u0300-\u036F\u203F\u2040]*') SYMBOLS: ClassVar[FrozenSet[str]] = Parser.SYMBOLS | { # Axes 'descendant-or-self', 'following-sibling', 'preceding-sibling', 'ancestor-or-self', 'descendant', 'attribute', 'following', 'namespace', 'preceding', 'ancestor', 'parent', 'child', 'self', # Operators 'and', 'mod', 'div', 'or', '..', '//', '!=', '<=', '>=', '(', ')', '[', ']', ':', '.', '@', ',', '/', '|', '*', '-', '=', '+', '<', '>', '$', '::', # Node test functions 'node', 'text', 'comment', 'processing-instruction', # Node set functions 'last', 'position', 'count', 'id', 'name', 'local-name', 'namespace-uri', # String functions 'string', 'concat', 'starts-with', 'contains', 'substring-before', 'substring-after', 'substring', 'string-length', 'normalize-space', 'translate', # Boolean functions 'boolean', 'not', 'true', 'false', 'lang', # Number functions 'number', 'sum', 'floor', 'ceiling', 'round', # Symbols for ElementPath extensions '{', '}' } RESERVED_FUNCTION_NAMES = { 'comment', 'element', 'node', 'processing-instruction', 'text' } DEFAULT_NAMESPACES: ClassVar[Dict[str, str]] = {'xml': XML_NAMESPACE} """Namespaces known statically by default.""" # Labels and symbols admitted after a path step PATH_STEP_LABELS: ClassVar[Tuple[str, ...]] = ('axis', 'kind test') PATH_STEP_SYMBOLS: ClassVar[Set[str]] = { '(integer)', '(string)', '(float)', '(decimal)', '(name)', '*', '@', '..', '.', '{' } # Class attributes for compatibility with XPath 2.0+ schema: Optional[AbstractSchemaProxy] = None variable_types: Optional[Dict[str, str]] = None base_uri: Optional[str] = None function_namespace = XPATH_FUNCTIONS_NAMESPACE function_signatures: Dict[Tuple[QName, int], str] = {} compatibility_mode: bool = True """XPath 1.0 
compatibility mode.""" default_namespace: Optional[str] = None """ The default namespace. For XPath 1.0 this value is always `None` because the default namespace is ignored (see https://www.w3.org/TR/1999/REC-xpath-19991116/#node-tests). """ def __init__(self, namespaces: Optional[NamespacesType] = None, strict: bool = True, *args: Any, **kwargs: Any) -> None: super(XPath1Parser, self).__init__() self.namespaces: Dict[str, str] = self.DEFAULT_NAMESPACES.copy() if namespaces is not None: self.namespaces.update(namespaces) self.strict: bool = strict @property def other_namespaces(self) -> Dict[str, str]: """The subset of namespaces not known by default.""" return {k: v for k, v in self.namespaces.items() if k not in self.DEFAULT_NAMESPACES} @property def xsd_version(self) -> str: return '1.0' # Use XSD 1.0 datatypes by default def xsd_qname(self, local_name: str) -> str: """Returns a prefixed QName string for XSD namespace.""" if self.namespaces.get('xs') == XSD_NAMESPACE: return 'xs:%s' % local_name for pfx, uri in self.namespaces.items(): if uri == XSD_NAMESPACE: return '%s:%s' % (pfx, local_name) if pfx else local_name raise xpath_error('XPST0081', 'Missing XSD namespace registration') @classmethod def create_restricted_parser(cls, name: str, symbols: Sequence[str]) \ -> Type['XPath1Parser']: """Get a parser subclass with a restricted set of symbols.""" _symbols = frozenset(symbols) symbol_table = { k: v for k, v in cls.symbol_table.items() if k in _symbols } return cast(Type['XPath1Parser'], ABCMeta( f"{name}{cls.__name__}", (cls,), {'symbol_table': symbol_table, 'SYMBOLS': _symbols} )) @staticmethod def unescape(string_literal: str) -> str: if string_literal.startswith("'"): return string_literal[1:-1].replace("''", "'") else: return string_literal[1:-1].replace('""', '"') @classmethod def proxy(cls, symbol: str, label: str = 'proxy', bp: int = 90) -> Type[ProxyToken]: """Register a proxy token for a symbol.""" if symbol in cls.symbol_table and not
issubclass(cls.symbol_table[symbol], ProxyToken): # Move the token class before register the proxy token token_cls = cls.symbol_table.pop(symbol) cls.symbol_table[f'{{{token_cls.namespace}}}{symbol}'] = token_cls proxy_class = cls.register(symbol, bases=(ProxyToken,), label=label, lbp=bp, rbp=bp) return cast(Type[ProxyToken], proxy_class) @classmethod def axis(cls, symbol: str, reverse_axis: bool = False, bp: int = 80) -> Type[XPathAxis]: """Register a token for a symbol that represents an XPath *axis*.""" token_class = cls.register(symbol, label='axis', bases=(XPathAxis,), reverse_axis=reverse_axis, lbp=bp, rbp=bp) return cast(Type[XPathAxis], token_class) @classmethod def function(cls, symbol: str, prefix: Optional[str] = None, label: str = 'function', nargs: NargsType = None, sequence_types: Tuple[str, ...] = (), bp: int = 90) -> Type[XPathFunction]: """ Registers a token class for a symbol that represents an XPath function. """ kwargs = { 'bases': (XPathFunction,), 'label': label, 'nargs': nargs, 'lbp': bp, 'rbp': bp, } if 'function' not in label: # kind test or sequence type return cast(Type[XPathFunction], cls.register(symbol, **kwargs)) elif symbol in cls.RESERVED_FUNCTION_NAMES: raise ElementPathValueError(f'{symbol!r} is a reserved function name') if prefix: namespace = cls.DEFAULT_NAMESPACES[prefix] qname = QName(namespace, '%s:%s' % (prefix, symbol)) kwargs['lookup_name'] = qname.expanded_name kwargs['namespace'] = namespace cls.proxy(symbol, label='proxy function', bp=bp) else: qname = QName(XPATH_FUNCTIONS_NAMESPACE, 'fn:%s' % symbol) kwargs['namespace'] = XPATH_FUNCTIONS_NAMESPACE if sequence_types: # Register function signature(s) kwargs['sequence_types'] = sequence_types if nargs is None: pass # pragma: no cover elif isinstance(nargs, int): assert len(sequence_types) == nargs + 1 cls.function_signatures[(qname, nargs)] = 'function({}) as {}'.format( ', '.join(sequence_types[:-1]), sequence_types[-1] ) elif nargs[1] is None: assert 
len(sequence_types) == nargs[0] + 1 cls.function_signatures[(qname, nargs[0])] = 'function({}, ...) as {}'.format( ', '.join(sequence_types[:-1]), sequence_types[-1] ) else: assert len(sequence_types) == nargs[1] + 1 for arity in range(nargs[0], nargs[1] + 1): cls.function_signatures[(qname, arity)] = 'function({}) as {}'.format( ', '.join(sequence_types[:arity]), sequence_types[-1] ) return cast(Type[XPathFunction], cls.register(symbol, **kwargs)) def parse(self, source: str) -> XPathToken: root_token = super(XPath1Parser, self).parse(source) try: root_token.evaluate() # Static context evaluation except MissingContextError: pass return root_token def expected_name(self, *symbols: str, message: Optional[str] = None) -> None: """ Checks the next symbol with a list of symbols. Replaces the next token with a '(name)' token if check fails and the symbol can be also a name. Otherwise raises a syntax error. :param symbols: a sequence of symbols. :param message: optional error message. """ if self.next_token.symbol in symbols: return elif self.next_token.label in ('operator', 'symbol', 'let expression', 'proxy function') \ and self.name_pattern.match(self.next_token.symbol) is not None: token_class = self.symbol_table['(name)'] self.next_token = token_class(self, self.next_token.symbol) else: raise self.next_token.wrong_syntax(message) ### # Type checking (used in XPath 2.0) def is_instance(self, obj: Any, type_qname: str) -> bool: """Checks an instance against an XSD type.""" if get_namespace(type_qname) == XSD_NAMESPACE: if type_qname == XSD_ERROR: return obj is None or obj == [] elif type_qname == XSD_UNTYPED_ATOMIC: return isinstance(obj, UntypedAtomic) elif type_qname == XSD_ANY_ATOMIC_TYPE: return isinstance(obj, AnyAtomicType) elif type_qname == XSD_ANY_SIMPLE_TYPE: return isinstance(obj, AnyAtomicType) or \ isinstance(obj, list) and \ all(isinstance(x, AnyAtomicType) for x in obj) try: if self.xsd_version == '1.1': return isinstance(obj, 
xsd11_atomic_types[type_qname]) return isinstance(obj, xsd10_atomic_types[type_qname]) except KeyError: pass if self.schema is not None: try: return self.schema.is_instance(obj, type_qname) except KeyError: pass raise ElementPathKeyError("unknown type %r" % type_qname) def is_sequence_type(self, value: str) -> bool: """Checks if a string is a sequence type specification.""" try: value = normalize_sequence_type(value) except TypeError: return False if not value: return False elif value == 'empty-sequence()' or value == 'none': return True elif value in ('map(*)', 'array(*)') and self.version >= '3.1': return True elif value[-1] in OCCURRENCE_INDICATORS: value = value[:-1] if value in COMMON_SEQUENCE_TYPES: return True elif value.startswith('element(') and value.endswith(')'): if ',' not in value: return EQNAME_PATTERN.match(value[8:-1]) is not None try: arg1, arg2 = value[8:-1].split(', ') except ValueError: return False else: return (arg1 == '*' or EQNAME_PATTERN.match(arg1) is not None) \ and EQNAME_PATTERN.match(arg2) is not None elif value.startswith('document-node(') and value.endswith(')'): if not value.startswith('document-node(element('): return False return self.is_sequence_type(value[14:-1]) elif value.startswith('function('): if self.version >= '3.0': if value == 'function(*)': return True elif ' as ' in value: pass elif not value.endswith(')'): return False else: return self.is_sequence_type(value[9:-1]) try: value, return_type = value.rsplit(' as ', 1) except ValueError: return False else: if not self.is_sequence_type(return_type): return False elif value == 'function()': return True value = value[9:-1] if value.endswith(', ...'): value = value[:-5] if 'function(' not in value: return all(self.is_sequence_type(x) for x in value.split(', ')) # Cover only if function() spec is the last argument k = value.index('function(') if not self.is_sequence_type(value[k:]): return False return all(self.is_sequence_type(x) for x in value[:k].split(', ') if x) elif 
QName.pattern.match(value) is None: return False try: type_qname = get_expanded_name(value, self.namespaces) self.is_instance(None, type_qname) except (KeyError, ValueError): return False else: return True def match_sequence_type(self, value: Any, sequence_type: str, occurrence: Optional[str] = None) -> bool: """ Checks a value instance against a sequence type. :param value: the instance to check. :param sequence_type: a string containing the sequence type spec. :param occurrence: an optional occurrence spec, can be '?', '+' or '*'. """ if sequence_type[-1] in OCCURRENCE_INDICATORS: return self.match_sequence_type(value, sequence_type[:-1], sequence_type[-1]) elif value is None or isinstance(value, list) and value == []: return sequence_type in ('empty-sequence()', 'none') or occurrence in ('?', '*') elif sequence_type in ('empty-sequence()', 'none'): return False elif isinstance(value, list): if len(value) == 1: return self.match_sequence_type(value[0], sequence_type) elif occurrence is None or occurrence == '?': return False else: return all(self.match_sequence_type(x, sequence_type) for x in value) elif sequence_type == 'item()': return isinstance(value, XPathNode) \ or isinstance(value, (AnyAtomicType, list, XPathFunction)) elif sequence_type == 'numeric': return isinstance(value, NumericProxy) elif sequence_type.startswith('function('): if not isinstance(value, XPathFunction): return False return value.match_function_test(sequence_type) if isinstance(value, XPathNode): value_kind = value.kind else: try: type_expanded_name = get_expanded_name(sequence_type, self.namespaces) return self.is_instance(value, type_expanded_name) except (KeyError, ValueError): return False if sequence_type == 'node()': return True elif not sequence_type.startswith(value_kind) or not sequence_type.endswith(')'): return False elif sequence_type == f'{value_kind}()': return True elif value_kind == 'document': element_test = sequence_type[14:-1] if not element_test: return True 
element_node = cast(DocumentNode, value).getroot() return self.match_sequence_type(element_node, element_test) elif value_kind not in ('element', 'attribute'): return False _, params = sequence_type[:-1].split('(') if ',' not in sequence_type: name = params else: name, type_name = params.split(',') if type_name.endswith('?'): type_name = type_name[:-1] elif isinstance(value, ElementNode) and value.nilled: return False if type_name == 'xs:untyped': if isinstance(value, (ElementNode, AttributeNode)) \ and value.xsd_type is not None: return False else: try: type_expanded_name = get_expanded_name(type_name, self.namespaces) if not self.is_instance(value, type_expanded_name): return False except (KeyError, ValueError): return False if name == '*': return True try: return bool(value.name == get_expanded_name(name, self.namespaces)) except (KeyError, ValueError, AttributeError): return False def check_variables(self, values: MutableMapping[str, Any]) -> None: """Checks the sequence types of the XPath dynamic context's variables.""" for varname, value in values.items(): if not self.match_sequence_type( value, 'item()', occurrence='*' if isinstance(value, list) else None): message = "Unmatched sequence type for variable {!r}".format(varname) raise xpath_error('XPDY0050', message) ### # Special symbols XPath1Parser.register('(start)') XPath1Parser.register('(end)') XPath1Parser.literal('(string)') XPath1Parser.literal('(float)') XPath1Parser.literal('(decimal)') XPath1Parser.literal('(integer)') XPath1Parser.literal('(invalid)') XPath1Parser.register('(unknown)') ### # Simple symbols XPath1Parser.register(',') XPath1Parser.register(')', bp=100) XPath1Parser.register(']') XPath1Parser.register('::') XPath1Parser.register('}') # XPath 1.0 definitions continue into module xpath1_operators 
elementpath-3.0.2/elementpath/xpath2/000077500000000000000000000000001427546011100175375ustar00rootroot00000000000000elementpath-3.0.2/elementpath/xpath2/__init__.py000066400000000000000000000010011427546011100216400ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from typing import TYPE_CHECKING if TYPE_CHECKING: from .xpath2_parser import XPath2Parser else: from ._xpath2_constructors import XPath2Parser __all__ = ['XPath2Parser'] elementpath-3.0.2/elementpath/xpath2/_xpath2_constructors.py000066400000000000000000000446401427546011100243160ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # # type: ignore """ XPath 2.0 implementation - part 4 (XSD constructors) """ from ..exceptions import ElementPathError, ElementPathSyntaxError from ..namespaces import XSD_NAMESPACE from ..datatypes import xsd10_atomic_types, xsd11_atomic_types, GregorianDay, \ GregorianMonth, GregorianMonthDay, GregorianYear10, GregorianYear, \ GregorianYearMonth10, GregorianYearMonth, Duration, DayTimeDuration, \ YearMonthDuration, Date10, Date, DateTime10, DateTime, DateTimeStamp, \ Time, UntypedAtomic, QName, HexBinary, Base64Binary, BooleanProxy from ._xpath2_functions import XPath2Parser register = XPath2Parser.register unregister = XPath2Parser.unregister method = XPath2Parser.method constructor = XPath2Parser.constructor ### # Constructors for string-based XSD types @constructor('normalizedString') @constructor('token') @constructor('language') @constructor('NMTOKEN') @constructor('Name') @constructor('NCName') @constructor('ID') @constructor('IDREF') @constructor('ENTITY') @constructor('anyURI') def cast_string_based_types(self, value): try: return xsd10_atomic_types[self.symbol](value) except ValueError as err: raise self.error('FORG0001', err) ### # Constructors for numeric XSD types @constructor('decimal') @constructor('double') @constructor('float') def cast_numeric_types(self, value): try: if self.parser.xsd_version == '1.0': return xsd10_atomic_types[self.symbol](value) return xsd11_atomic_types[self.symbol](value) except ValueError as err: if isinstance(value, (str, UntypedAtomic)): raise self.error('FORG0001', err) raise self.error('FOCA0002', err) @constructor('integer') @constructor('nonNegativeInteger') @constructor('positiveInteger') @constructor('nonPositiveInteger') @constructor('negativeInteger') @constructor('long') @constructor('int') @constructor('short') @constructor('byte') @constructor('unsignedLong') @constructor('unsignedInt') @constructor('unsignedShort') @constructor('unsignedByte') def cast_integer_types(self, value): 
try: return xsd10_atomic_types[self.symbol](value) except ValueError: msg = 'could not convert {!r} to xs:{}'.format(value, self.symbol) if isinstance(value, (str, bytes, int, UntypedAtomic)): raise self.error('FORG0001', msg) from None raise self.error('FOCA0002', msg) from None except OverflowError as err: raise self.error('FOCA0002', err) from None ### # Constructors for datetime XSD types @constructor('date') def cast_date_type(self, value): cls = Date if self.parser.xsd_version == '1.1' else Date10 if isinstance(value, cls): return value try: if isinstance(value, UntypedAtomic): return cls.fromstring(value.value) elif isinstance(value, DateTime10): return cls(value.year, value.month, value.day, value.tzinfo) return cls.fromstring(value) except OverflowError as err: raise self.error('FODT0001', err) from None except ValueError as err: raise self.error('FORG0001', err) @constructor('gDay') def cast_gregorian_day_type(self, value): if isinstance(value, GregorianDay): return value try: if isinstance(value, UntypedAtomic): return GregorianDay.fromstring(value.value) elif isinstance(value, (Date10, DateTime10)): return GregorianDay(value.day, value.tzinfo) return GregorianDay.fromstring(value) except ValueError as err: raise self.error('FORG0001', err) @constructor('gMonth') def cast_gregorian_month_type(self, value): if isinstance(value, GregorianMonth): return value try: if isinstance(value, UntypedAtomic): return GregorianMonth.fromstring(value.value) elif isinstance(value, (Date10, DateTime10)): return GregorianMonth(value.month, value.tzinfo) return GregorianMonth.fromstring(value) except ValueError as err: raise self.error('FORG0001', err) @constructor('gMonthDay') def cast_gregorian_month_day_type(self, value): if isinstance(value, GregorianMonthDay): return value try: if isinstance(value, UntypedAtomic): return GregorianMonthDay.fromstring(value.value) elif isinstance(value, (Date10, DateTime10)): return GregorianMonthDay(value.month, value.day, 
value.tzinfo) return GregorianMonthDay.fromstring(value) except ValueError as err: raise self.error('FORG0001', err) @constructor('gYear') def cast_gregorian_year_type(self, value): cls = GregorianYear if self.parser.xsd_version == '1.1' else GregorianYear10 if isinstance(value, cls): return value try: if isinstance(value, UntypedAtomic): return cls.fromstring(value.value) elif isinstance(value, (Date10, DateTime10)): return cls(value.year, value.tzinfo) return cls.fromstring(value) except OverflowError as err: raise self.error('FODT0001', err) from None except ValueError as err: raise self.error('FORG0001', err) @constructor('gYearMonth') def cast_gregorian_year_month_type(self, value): cls = GregorianYearMonth \ if self.parser.xsd_version == '1.1' else GregorianYearMonth10 if isinstance(value, cls): return value try: if isinstance(value, UntypedAtomic): return cls.fromstring(value.value) elif isinstance(value, (Date10, DateTime10)): return cls(value.year, value.month, value.tzinfo) return cls.fromstring(value) except OverflowError as err: raise self.error('FODT0001', err) from None except ValueError as err: raise self.error('FORG0001', err) @constructor('time') def cast_time_type(self, value): if isinstance(value, Time): return value try: if isinstance(value, UntypedAtomic): return Time.fromstring(value.value) elif isinstance(value, DateTime10): return Time(value.hour, value.minute, value.second, value.microsecond, value.tzinfo) return Time.fromstring(value) except ValueError as err: raise self.error('FORG0001', err) @method('date') @method('gDay') @method('gMonth') @method('gMonthDay') @method('gYear') @method('gYearMonth') @method('time') def evaluate_other_datetime_types(self, context=None): arg = self.data_value(self.get_argument(context)) if arg is None: return [] try: return self.cast(arg) except TypeError as err: raise self.error('FORG0006', err) from None except OverflowError as err: raise self.error('FODT0001', err) from None ### # Constructors for time 
durations XSD types @constructor('duration') def cast_duration_type(self, value): if isinstance(value, Duration): return value try: if isinstance(value, UntypedAtomic): return Duration.fromstring(value.value) return Duration.fromstring(value) except OverflowError as err: raise self.error('FODT0002', err) from None except ValueError as err: raise self.error('FORG0001', err) @constructor('yearMonthDuration') def cast_year_month_duration_type(self, value): if isinstance(value, YearMonthDuration): return value elif isinstance(value, Duration): return YearMonthDuration(months=value.months) try: if isinstance(value, UntypedAtomic): return YearMonthDuration.fromstring(value.value) return YearMonthDuration.fromstring(value) except OverflowError as err: raise self.error('FODT0002', err) from None except ValueError as err: raise self.error('FORG0001', err) @constructor('dayTimeDuration') def cast_day_time_duration_type(self, value): if isinstance(value, DayTimeDuration): return value elif isinstance(value, Duration): return DayTimeDuration(seconds=value.seconds) try: if isinstance(value, UntypedAtomic): return DayTimeDuration.fromstring(value.value) return DayTimeDuration.fromstring(value) except OverflowError as err: raise self.error('FODT0002', err) from None except ValueError as err: raise self.error('FORG0001', err) from None @constructor('dateTimeStamp') def cast_datetime_stamp_type(self, value): if isinstance(value, DateTimeStamp): return value elif isinstance(value, DateTime10): value = str(value) try: if isinstance(value, UntypedAtomic): return DateTimeStamp.fromstring(value.value) elif isinstance(value, Date): return DateTimeStamp(value.year, value.month, value.day, tzinfo=value.tzinfo) return DateTimeStamp.fromstring(value) except ValueError as err: raise self.error('FORG0001', err) from None @method('dateTimeStamp') def evaluate_datetime_stamp_type(self, context=None): arg = self.data_value(self.get_argument(context)) if arg is None: return [] if isinstance(arg, 
UntypedAtomic): return self.cast(arg.value) elif isinstance(arg, Date): return self.cast(arg) return self.cast(str(arg)) @method('dateTimeStamp') def nud_datetime_stamp_type(self): if self.parser.xsd_version == '1.0': raise self.wrong_syntax("xs:dateTimeStamp is not recognized unless XSD 1.1 is enabled") try: self.parser.advance('(') self[0:] = self.parser.expression(5), if self.parser.next_token.symbol == ',': raise self.wrong_nargs('Too many arguments: expected at most 1 argument') self.parser.advance(')') self.value = None except SyntaxError: raise self.error('XPST0017') from None return self ### # Constructors for binary XSD types @constructor('base64Binary') def cast_base64_binary_type(self, value): try: return Base64Binary(value) except ValueError as err: raise self.error('FORG0001', err) from None except TypeError as err: raise self.error('XPTY0004', err) from None @constructor('hexBinary') def cast_hex_binary_type(self, value): try: return HexBinary(value) except ValueError as err: raise self.error('FORG0001', err) from None except TypeError as err: raise self.error('XPTY0004', err) from None @method('base64Binary') @method('hexBinary') def evaluate_binary_types(self, context=None): arg = self.data_value(self.get_argument(context)) if arg is None: return [] try: return self.cast(arg) except ElementPathError as err: err.token = self raise @constructor('NOTATION') def cast_notation_type(self, value): raise NotImplementedError("No value is castable to xs:NOTATION") @method('NOTATION') def nud_notation_type(self): self.parser.advance('(') if self.parser.next_token.symbol == ')': raise self.error('XPST0017', 'expected exactly one argument') self[0:] = self.parser.expression(5), if self.parser.next_token.symbol != ')': raise self.error('XPST0017', 'expected exactly one argument') self.parser.advance() self.value = None raise self.error('XPST0017', "no constructor function exists for xs:NOTATION") ### # Multirole tokens (function or constructor function) # # Case 
1: In XPath 2.0 the 'boolean' keyword is used both for boolean() function and # for boolean() constructor function. unregister('boolean') @constructor('boolean', label=('function', 'constructor function'), sequence_types=('item()*', 'xs:boolean')) def cast_boolean_type(self, value): try: return BooleanProxy(value) except ValueError as err: raise self.error('FORG0001', err) from None except TypeError as err: raise self.error('XPTY0004', err) from None @method('boolean') def nud_boolean_type_and_function(self): self.parser.advance('(') if self.parser.next_token.symbol == ')': raise self.wrong_nargs('Too few arguments: expected at least 1 argument') self[0:] = self.parser.expression(5), if self.parser.next_token.symbol == ',': raise self.wrong_nargs('Too many arguments: expected at most 1 argument') self.parser.advance(')') self.value = None return self @method('boolean') def evaluate_boolean_type_and_function(self, context=None): if self.label == 'function': return self.boolean_value([x for x in self[0].select(context)]) # xs:boolean constructor arg = self.data_value(self.get_argument(context)) if arg is None: return [] try: return self.cast(arg) except ElementPathError as err: err.token = self raise ### # Case 2: In XPath 2.0 the 'string' keyword is used both for fn:string() and xs:string(). 
unregister('string') @constructor('string', label=('function', 'constructor function'), nargs=(0, 1), sequence_types=('item()?', 'xs:string')) def cast_string_type(self, value): return self.string_value(value) @method('string') def nud_string_type_and_function(self): try: self.parser.advance('(') if self.label != 'function' or self.parser.next_token.symbol != ')': self[0:] = self.parser.expression(5), self.parser.advance(')') except ElementPathSyntaxError as err: err.code = self.error_code('XPST0017') raise self.value = None return self @method('string') def evaluate_string_type_and_function(self, context=None): if self.label == 'function': if not self: if context is None: raise self.missing_context() return self.string_value(context.item) return self.string_value(self.get_argument(context)) else: item = self.get_argument(context) return [] if item is None else self.string_value(item) # Case 3 and 4: In XPath 2.0 the XSD 'QName' and 'dateTime' types have special # constructor functions so the 'QName' keyword is used both for fn:QName() and # xs:QName(), the same for 'dateTime' keyword. # # In those cases the label at parse time is set by the nud method, in dependence # of the number of args. 
# @constructor('QName', bp=90, label=('function', 'constructor function'), nargs=(1, 2), sequence_types=('xs:string?', 'xs:string', 'xs:QName')) def cast_qname_type(self, value): if isinstance(value, QName): return value elif isinstance(value, UntypedAtomic) and self.parser.version >= '3.0': return self.cast_to_qname(value.value) elif isinstance(value, str): return self.cast_to_qname(value) else: raise self.error('XPTY0004', 'the argument has an invalid type %r' % type(value)) @constructor('dateTime', bp=90, label=('function', 'constructor function'), nargs=(1, 2), sequence_types=('xs:date?', 'xs:time?', 'xs:dateTime?')) def cast_datetime_type(self, value): cls = DateTime if self.parser.xsd_version == '1.1' else DateTime10 if isinstance(value, cls): return value try: if isinstance(value, UntypedAtomic): return cls.fromstring(value.value) elif isinstance(value, Date10): return cls(value.year, value.month, value.day, tzinfo=value.tzinfo) return cls.fromstring(value) except OverflowError as err: raise self.error('FODT0001', err) from None except ValueError as err: raise self.error('FORG0001', err) from None @method('QName') @method('dateTime') def nud_qname_and_datetime(self): try: self.parser.advance('(') self[0:] = self.parser.expression(5), if self.parser.next_token.symbol == ',': if self.label != 'function': raise self.error('XPST0017', 'unexpected 2nd argument') self.label = 'function' self.parser.advance(',') self[1:] = self.parser.expression(5), elif self.label != 'constructor function' or self.namespace != XSD_NAMESPACE: raise self.error('XPST0017', '2nd argument missing') else: self.label = 'constructor function' self.nargs = 1 self.parser.advance(')') except SyntaxError: raise self.error('XPST0017') from None self.value = None return self @method('QName') def evaluate_qname_type_and_function(self, context=None): if self.label == 'constructor function': arg = self.data_value(self.get_argument(context)) return [] if arg is None else self.cast(arg) else: uri = 
self.get_argument(context) qname = self.get_argument(context, index=1) try: return QName(uri, qname) except TypeError as err: raise self.error('XPTY0004', err) except ValueError as err: raise self.error('FOCA0002', err) @method('dateTime') def evaluate_datetime_type_and_function(self, context=None): if self.label == 'constructor function': arg = self.data_value(self.get_argument(context)) if arg is None: return [] try: return self.cast(arg) except ValueError as err: raise self.error('FORG0001', err) from None except TypeError as err: raise self.error('FORG0006', err) from None else: dt = self.get_argument(context, cls=Date10) tm = self.get_argument(context, 1, cls=Time) if dt is None or tm is None: return [] elif dt.tzinfo == tm.tzinfo or tm.tzinfo is None: tzinfo = dt.tzinfo elif dt.tzinfo is None: tzinfo = tm.tzinfo else: raise self.error('FORG0008') if self.parser.xsd_version == '1.1': return DateTime(dt.year, dt.month, dt.day, tm.hour, tm.minute, tm.second, tm.microsecond, tzinfo) return DateTime10(dt.year, dt.month, dt.day, tm.hour, tm.minute, tm.second, tm.microsecond, tzinfo) @constructor('untypedAtomic') def cast_untyped_atomic(self, value): return UntypedAtomic(value) @method('untypedAtomic') def evaluate_untyped_atomic(self, context=None): arg = self.data_value(self.get_argument(context)) if arg is None: return [] elif isinstance(arg, UntypedAtomic): return arg else: return self.cast(arg) elementpath-3.0.2/elementpath/xpath2/_xpath2_functions.py000066400000000000000000001624641427546011100235630ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # # type: ignore """ XPath 2.0 implementation - part 3 (functions) """ import math import datetime import time import re import locale import os.path import unicodedata from copy import copy from decimal import Decimal, DecimalException from string import ascii_letters from urllib.parse import urlsplit, quote as urllib_quote from ..exceptions import ElementPathValueError from ..helpers import is_idrefs, is_xml_codepoint, round_number from ..datatypes import QNAME_PATTERN, DateTime10, DateTime, Date10, Date, \ Float10, DoubleProxy, Time, Duration, DayTimeDuration, YearMonthDuration, \ UntypedAtomic, AnyURI, QName, NCName, Id, ArithmeticProxy, NumericProxy from ..namespaces import XML_NAMESPACE, get_namespace, split_expanded_name, \ XML_BASE, XML_ID, XML_LANG from ..etree import etree_deep_equal from ..xpath_context import XPathSchemaContext from ..xpath_nodes import XPathNode, DocumentNode, ElementNode, AttributeNode, \ NamespaceNode, CommentNode, ProcessingInstructionNode from ..xpath_token import XPathFunction from ..regex import RegexError, translate_pattern from ._xpath2_operators import XPath2Parser method = XPath2Parser.method function = XPath2Parser.function def is_local_url_scheme(scheme): return scheme in ('', 'file') or len(scheme) == 1 and scheme in ascii_letters def is_local_dir_url(url): url_parts = urlsplit(url) return is_local_url_scheme(url_parts.scheme) and os.path.isdir(url_parts.path.lstrip(':')) ### # Sequence types (allowed only for type checking in treat-as/instance-of statements) function('empty-sequence', nargs=0, label='sequence type') @method(function('item', nargs=0, label='sequence type')) def evaluate_item_sequence_type(self, context=None): if context is None: raise self.missing_context() return context.root if context.item is None else context.item @method('item') def nud_item_sequence_type(self): XPathFunction.nud(self) if self.parser.next_token.symbol in ('*', '+', '?'): self.occurrence = 
self.parser.next_token.symbol self.parser.advance() return self ### # Function for QNames @method(function('prefix-from-QName', nargs=1, sequence_types=('xs:QName?', 'xs:NCName?'))) def evaluate_prefix_from_qname_function(self, context=None): qname = self.get_argument(context) if qname is None: return [] elif not isinstance(qname, QName): raise self.error('XPTY0004', 'argument has an invalid type %r' % type(qname)) return NCName(qname.prefix) if qname.prefix else [] @method(function('local-name-from-QName', nargs=1, sequence_types=('xs:QName?', 'xs:NCName?'))) def evaluate_local_name_from_qname_function(self, context=None): qname = self.get_argument(context) if qname is None: return elif not isinstance(qname, QName): if self.parser.version >= '3.0' and \ isinstance(self.data_value(qname), UntypedAtomic): code = 'XPTY0117' else: code = 'XPTY0004' raise self.error(code, 'argument has an invalid type %r' % type(qname)) return NCName(qname.local_name) @method(function('namespace-uri-from-QName', nargs=1, sequence_types=('xs:QName?', 'xs:anyURI?'))) def evaluate_uri_from_qname_function(self, context=None): qname = self.get_argument(context) if qname is None: return elif not isinstance(qname, QName): if self.parser.version >= '3.0' and \ isinstance(self.data_value(qname), UntypedAtomic): code = 'XPTY0117' else: code = 'XPTY0004' raise self.error(code, 'argument has an invalid type %r' % type(qname)) return AnyURI(qname.uri or '') @method(function('namespace-uri-for-prefix', nargs=2, sequence_types=('xs:string?', 'element()', 'xs:anyURI?'))) def evaluate_namespace_uri_for_prefix_function(self, context=None): if context is None: raise self.missing_context() prefix = self.get_argument(context=copy(context)) if prefix is None: prefix = '' if not isinstance(prefix, str): raise self.error('FORG0006', '1st argument has an invalid type %r' % type(prefix)) elem = self.get_argument(context, index=1) if not isinstance(elem, ElementNode): raise self.error('FORG0006', '2nd argument 
%r is not an element node' % elem) ns_uris = {get_namespace(e.tag) for e in elem.elem.iter() if not callable(e.tag)} for p, uri in self.parser.namespaces.items(): if uri in ns_uris: if p == prefix: if not prefix or uri: return AnyURI(uri) else: msg = 'Prefix %r is associated to no namespace' raise self.error('XPST0081', msg % prefix) @method(function('in-scope-prefixes', nargs=1, sequence_types=('element()', 'xs:string*'))) def select_in_scope_prefixes_function(self, context=None): if context is None: raise self.missing_context() arg = self.get_argument(context, required=True) if not isinstance(arg, ElementNode): raise self.error('XPTY0004', 'argument %r is not an element node' % arg) elem = arg.elem if isinstance(context, XPathSchemaContext): # For schema context returns prefixes of static namespaces for pfx, uri in self.parser.namespaces.items(): if uri: yield pfx or '' elif hasattr(elem, 'nsmap'): # For lxml returns Element nsmap prefixes, replacing None with '' if 'xml' not in elem.nsmap: yield 'xml' for pfx, uri in elem.nsmap.items(): if uri: yield pfx or '' else: # For ElementTree returns module registered prefixes for pfx, uri in self.parser.namespaces.items(): if uri: yield pfx or '' if context.namespaces: yield from (x for x in context.namespaces if x not in self.parser.namespaces) @method(function('resolve-QName', nargs=2, sequence_types=('xs:string?', 'element()', 'xs:QName?'))) def evaluate_resolve_qname_function(self, context=None): qname = self.get_argument(context=copy(context)) if qname is None: return elif not isinstance(qname, str): raise self.error('FORG0006', '1st argument has an invalid type %r' % type(qname)) if context is None: raise self.missing_context() elem = self.get_argument(context, index=1) if not isinstance(elem, ElementNode): raise self.error('FORG0006', '2nd argument %r is not an element node' % elem) qname = qname.strip() match = QNAME_PATTERN.match(qname) if match is None: raise self.error('FOCA0002', '1st argument must be an 
xs:QName') prefix = match.groupdict()['prefix'] or '' if prefix == 'xml': return QName(XML_NAMESPACE, qname) try: nsmap = elem.nsmap except AttributeError: nsmap = self.parser.namespaces for pfx, uri in nsmap.items(): if pfx is None: pfx = '' if pfx == prefix: if pfx: return QName(uri, '{}:{}'.format(pfx, match.groupdict()['local'])) else: return QName(uri, match.groupdict()['local']) if prefix or '' in nsmap or None in nsmap: raise self.error('FONS0004', 'no namespace found for prefix %r' % prefix) return QName('', qname) ### # Accessor functions @method(function('node-name', nargs=1, sequence_types=('node()?', 'xs:QName?'))) def evaluate_node_name_function(self, context=None): arg = self.get_argument(context) if arg is None: return None elif not isinstance(arg, XPathNode): raise self.error('XPTY0004', 'an XPath node required') name = arg.name if name is None: return elif name.startswith('{'): # name is a QName in extended format namespace, local_name = split_expanded_name(name) for pfx, uri in self.parser.namespaces.items(): if uri == namespace: return QName(uri, '{}:{}'.format(pfx, local_name)) raise self.error('FONS0004', 'no prefix found for namespace {}'.format(namespace)) else: # name is a local name return QName(self.parser.namespaces.get('', ''), name) @method(function('nilled', nargs=1, sequence_types=('node()?', 'xs:boolean?'))) def evaluate_nilled_function(self, context=None): arg = self.get_argument(context) if arg is None: return None elif not isinstance(arg, XPathNode): raise self.error('XPTY0004', 'an XPath node required') return arg.nilled @method(function('data', nargs=1, sequence_types=('item()*', 'xs:anyAtomicType*'))) def select_data_function(self, context=None): yield from self[0].atomization(context) @method(function('base-uri', nargs=(0, 1), sequence_types=('node()?', 'xs:anyURI?'))) def evaluate_base_uri_function(self, context=None): item = self.get_argument(context, default_to_context=True) if context is None: raise 
self.missing_context("context item is undefined") elif item is None: return None elif not isinstance(item, XPathNode): raise self.wrong_context_type("context item is not a node") elif isinstance(item, DocumentNode): base_uri = item.document.getroot().get(XML_BASE) return AnyURI(base_uri if base_uri is not None else '') else: context.item = item base_uri = [] for item in context.iter_ancestors(axis='ancestor-or-self'): if isinstance(item, ElementNode): uri = item.elem.get(XML_BASE) if uri is not None: if base_uri and urlsplit(uri).scheme: break base_uri.append(uri) if not urlsplit(uri).path.endswith('/'): break return AnyURI(''.join(base_uri)) @method(function('document-uri', nargs=1, sequence_types=('node()?', 'xs:anyURI?'))) def evaluate_document_uri_function(self, context=None): if context is None: raise self.missing_context() arg = self.get_argument(context) if isinstance(arg, DocumentNode): uri = arg.document_uri if uri is not None: return AnyURI(uri) elif isinstance(context.root, DocumentNode): if context.documents: for uri, doc in context.documents.items(): if doc and doc.document is context.root.document: return AnyURI(uri) return None ### # Number functions @method(function('round-half-to-even', nargs=(1, 2), sequence_types=('numeric?', 'xs:integer', 'numeric?'))) def evaluate_round_half_to_even_function(self, context=None): item = self.get_argument(context) if item is None: return elif isinstance(item, float) and (math.isnan(item) or math.isinf(item)): return item elif not isinstance(item, (float, int, Decimal)): code = 'XPTY0004' if isinstance(item, str) else 'FORG0006' raise self.error(code, "invalid argument type {!r}".format(type(item))) precision = 0 if len(self) < 2 else self[1].evaluate(context) try: if isinstance(item, int): return round(item, precision) elif isinstance(item, Decimal): return round(item, precision) elif isinstance(item, Float10): return Float10(round(item, precision)) return float(round(Decimal.from_float(item), precision)) except 
TypeError as err: raise self.error('XPTY0004', err) except (DecimalException, OverflowError): if isinstance(item, Decimal): return Decimal.from_float(round(float(item), precision)) return round(item, precision) @method(function('abs', nargs=1, sequence_types=('numeric?', 'numeric?'))) def evaluate_abs_function(self, context=None): item = self.get_argument(context) if item is None: return elif isinstance(item, float) and math.isnan(item): return item elif isinstance(item, XPathNode): value = self.string_value(item) try: return abs(Decimal(value)) except DecimalException: raise self.error('FOCA0002', "invalid string value {!r} for {!r}".format(value, item)) elif isinstance(item, bool) or not isinstance(item, (float, int, Decimal)): raise self.error('XPTY0004', "invalid argument type {!r}".format(type(item))) else: return abs(item) ### # Aggregate functions @method(function('avg', nargs=1, sequence_types=('xs:anyAtomicType*', 'xs:anyAtomicType'))) def evaluate_avg_function(self, context=None): values = [] for item in self[0].atomization(context): if isinstance(item, UntypedAtomic): values.append(self.cast_to_double(item.value)) elif isinstance(item, (AnyURI, bool)): raise self.error('FORG0006', 'non numeric value {!r} in the sequence'.format(item)) else: values.append(item) if not values: return elif isinstance(values[0], Duration): value = values[0] try: for item in values[1:]: value = value + item return value / len(values) except TypeError as err: raise self.error('FORG0006', err) elif all(isinstance(x, int) for x in values): result = sum(values) / Decimal(len(values)) return int(result) if result % 1 == 0 else result elif all(isinstance(x, (int, Decimal)) for x in values): return sum(values) / Decimal(len(values)) elif all(not isinstance(x, DoubleProxy) for x in values): try: return sum(Float10(x) if isinstance(x, Decimal) else x for x in values) / len(values) except TypeError as err: raise self.error('FORG0006', err) else: try: return sum(float(x) if 
isinstance(x, Decimal) else x for x in values) / len(values) except TypeError as err: raise self.error('FORG0006', err) @method(function('max', nargs=(1, 2), sequence_types=('xs:anyAtomicType*', 'xs:string', 'xs:anyAtomicType?'))) @method(function('min', nargs=(1, 2), sequence_types=('xs:anyAtomicType*', 'xs:string', 'xs:anyAtomicType?'))) def evaluate_max_min_functions(self, context=None): def max_or_min(): if not values: return values elif all(isinstance(x, str) for x in values): if to_any_uri: return AnyURI(aggregate_func(values)) elif any(isinstance(x, str) for x in values): if any(isinstance(x, ArithmeticProxy) for x in values): raise self.error('FORG0006', "cannot compare strings with numeric data") elif all(isinstance(x, (Decimal, int)) for x in values): return aggregate_func(values) elif any(isinstance(x, float) and math.isnan(x) for x in values): return float_class('NaN') elif all(isinstance(x, (int, float, Decimal)) for x in values): return float_class(aggregate_func(values)) return aggregate_func(values) values = [] float_class = None to_any_uri = None aggregate_func = max if self.symbol == 'max' else min for item in self[0].atomization(context): if isinstance(item, UntypedAtomic): values.append(self.cast_to_double(item)) float_class = float elif isinstance(item, float): values.append(item) if float_class is None: float_class = type(item) elif float_class is Float10 and not isinstance(item, Float10): float_class = float elif isinstance(item, AnyURI): values.append(item.value) if to_any_uri is None: to_any_uri = True elif isinstance(item, (DayTimeDuration, YearMonthDuration)): values.append(item) elif isinstance(item, (Duration, QName)): raise self.error('FORG0006', "xs:{} is not an ordered type".format(type(item).name)) else: to_any_uri = False values.append(item) try: if len(self) > 1: with self.use_locale(collation=self.get_argument(context, 1)): return max_or_min() return max_or_min() except TypeError as err: raise self.error('FORG0006', err) ### # 
General functions for sequences @method(function('empty', nargs=1, sequence_types=('item()*', 'xs:boolean'))) @method(function('exists', nargs=1, sequence_types=('item()*', 'xs:boolean'))) def evaluate_empty_and_exists_functions(self, context=None): return next(iter(self.select(context))) @method('empty') def select_empty_function(self, context=None): try: next(iter(self[0].select(context))) except StopIteration: yield True else: yield False @method('exists') def select_exists_function(self, context=None): try: next(iter(self[0].select(context))) except StopIteration: yield False else: yield True @method(function('distinct-values', nargs=(1, 2), sequence_types=('xs:anyAtomicType*', 'xs:string', 'xs:anyAtomicType*'))) def select_distinct_values_function(self, context=None): def distinct_values(): nan = False results = [] for value in self[0].atomization(context): if isinstance(value, (float, Decimal)): if math.isnan(value): if not nan: yield value nan = True elif all(not math.isclose(value, x, rel_tol=1E-18, abs_tol=0) for x in results if isinstance(x, (int, Decimal, float))): yield value results.append(value) elif value not in results: yield value results.append(value) if len(self) > 1: with self.use_locale(collation=self.get_argument(context, 1)): yield from distinct_values() else: yield from distinct_values() @method(function('insert-before', nargs=3, sequence_types=('item()*', 'xs:integer', 'item()*', 'item()*'))) def select_insert_before_function(self, context=None): position = self.get_argument(context, 1, required=True, cls=int) insert_at_pos = max(0, position - 1) inserted = False for pos, result in enumerate(self[0].select(context)): if not inserted and pos == insert_at_pos: yield from self[2].select(context) inserted = True yield result if not inserted: yield from self[2].select(context) @method(function('index-of', nargs=(2, 3), sequence_types=( 'xs:anyAtomicType*', 'xs:anyAtomicType', 'xs:string', 'xs:integer*'))) def select_index_of_function(self, 
context=None): value = self[1].get_atomized_operand(copy(context)) if value is None: raise self.error('XPTY0004', "2nd argument cannot be an empty sequence") if len(self) < 3: for pos, result in enumerate(self[0].atomization(context), start=1): if result == value: yield pos else: with self.use_locale(collation=self.get_argument(context, 2)): for pos, result in enumerate(self[0].atomization(context), start=1): if result == value: yield pos @method(function('remove', nargs=2, sequence_types=('item()*', 'xs:integer', 'item()*'))) def select_remove_function(self, context=None): position = self.get_argument(context, 1) if not isinstance(position, int): raise self.error('XPTY0004', 'an xs:integer required') for pos, result in enumerate(self[0].select(context), start=1): if pos != position: yield result @method(function('reverse', nargs=1, sequence_types=('item()*', 'item()*'))) def select_reverse_function(self, context=None): yield from reversed([x for x in self[0].select(context)]) @method(function('subsequence', nargs=(2, 3), sequence_types=('item()*', 'xs:double', 'xs:double', 'item()*'))) def select_subsequence_function(self, context=None): starting_loc = self.get_argument(context, 1, cls=NumericProxy) if not math.isnan(starting_loc) and not math.isinf(starting_loc): starting_loc = float(round_number(starting_loc)) if len(self) == 2: for pos, result in enumerate(self[0].select(context), start=1): if starting_loc <= pos: yield result else: length = self.get_argument(context, 2, cls=NumericProxy) if not math.isnan(length) and not math.isinf(length): length = float(round_number(length)) for pos, result in enumerate(self[0].select(context), start=1): if starting_loc <= pos < starting_loc + length: yield result @method(function('unordered', nargs=1, sequence_types=('item()*', 'item()*'))) def select_unordered_function(self, context=None): yield from sorted([x for x in self[0].select(context)], key=lambda x: self.string_value(x)) ### # Cardinality functions for sequences 
@method(function('zero-or-one', nargs=1, sequence_types=('item()*', 'item()?'))) def select_zero_or_one_function(self, context=None): results = iter(self[0].select(context)) try: item = next(results) except StopIteration: return try: next(results) except StopIteration: yield item else: raise self.error('FORG0003') @method(function('one-or-more', nargs=1, sequence_types=('item()*', 'item()+'))) def select_one_or_more_function(self, context=None): results = iter(self[0].select(context)) try: item = next(results) except StopIteration: raise self.error('FORG0004') from None else: yield item while True: try: yield next(results) except StopIteration: break @method(function('exactly-one', nargs=1, sequence_types=('item()*', 'item()'))) def select_exactly_one_function(self, context=None): results = iter(self[0].select(context)) try: item = next(results) except StopIteration: raise self.error('FORG0005') from None else: try: next(results) except StopIteration: yield item else: raise self.error('FORG0005') ### # Comparing sequences @method(function('deep-equal', nargs=(2, 3), sequence_types=('item()*', 'item()*', 'xs:string', 'xs:boolean'))) def evaluate_deep_equal_function(self, context=None): def deep_equal(): while True: value1 = next(seq1, None) value2 = next(seq2, None) if isinstance(value1, XPathFunction) or isinstance(value2, XPathFunction): raise self.error('FOTY0015') if (value1 is None) ^ (value2 is None): return False elif value1 is None: return True elif isinstance(value1, XPathNode) ^ isinstance(value2, XPathNode): return False elif not isinstance(value1, XPathNode): try: if isinstance(value1, bool): if not isinstance(value2, bool) or value1 is not value2: return False elif isinstance(value2, bool): return False elif isinstance(value1, UntypedAtomic): if not isinstance(value2, UntypedAtomic) or value1 != value2: return False elif isinstance(value2, UntypedAtomic): return False elif isinstance(value1, float): if math.isnan(value1): if not math.isnan(value2): 
return False elif math.isinf(value1): if value1 != value2: return False elif isinstance(value2, Decimal): if value1 != float(value2): return False elif not isinstance(value2, (value1.__class__, int)): return False elif value1 != value2: return False elif isinstance(value2, float): if math.isnan(value2): return False elif math.isinf(value2): if value1 != value2: return False elif isinstance(value1, Decimal): if value2 != float(value1): return False elif not isinstance(value1, (value2.__class__, int)): return False elif value1 != value2: return False elif value1 != value2: return False except TypeError: return False elif value1.kind != value2.kind: return False elif isinstance(value1, (ElementNode, CommentNode, ProcessingInstructionNode)): if not etree_deep_equal(value1.elem, value2.elem): return False elif isinstance(value1, DocumentNode): if not etree_deep_equal(value1.document.getroot(), value2.document.getroot()): return False elif value1.value != value2.value: return False elif isinstance(value1, AttributeNode): if value1.name != value2.name: return False elif isinstance(value1, NamespaceNode): if value1.prefix != value2.prefix: return False seq1 = iter(self[0].select(copy(context))) seq2 = iter(self[1].select(copy(context))) if len(self) > 2: with self.use_locale(collation=self.get_argument(context, 2)): return deep_equal() else: return deep_equal() ### # Regex @method(function('matches', nargs=(2, 3), sequence_types=('xs:string?', 'xs:string', 'xs:string', 'xs:boolean'))) def evaluate_matches_function(self, context=None): input_string = self.get_argument(context, default='', cls=str) pattern = self.get_argument(context, 1, required=True, cls=str) flags = 0 if len(self) > 2: for c in self.get_argument(context, 2, required=True, cls=str): if c in 'smix': flags |= getattr(re, c.upper()) elif c == 'q' and self.parser.version > '2': pattern = re.escape(pattern) else: raise self.error('FORX0001', "Invalid regular expression flag %r" % c) try: python_pattern = 
translate_pattern(pattern, flags, self.parser.xsd_version)
        return re.search(python_pattern, input_string, flags=flags) is not None
    except (re.error, RegexError) as err:
        msg = "Invalid regular expression: {}"
        raise self.error('FORX0002', msg.format(str(err))) from None
    except OverflowError as err:
        raise self.error('FORX0002', err) from None


REPLACEMENT_PATTERN = re.compile(r'^([^\\$]|[\\]{2}|\\\$|\$\d+)*$')


@method(function('replace', nargs=(3, 4), sequence_types=(
        'xs:string?', 'xs:string', 'xs:string', 'xs:string', 'xs:string')))
def evaluate_replace_function(self, context=None):
    input_string = self.get_argument(context, default='', cls=str)
    pattern = self.get_argument(context, 1, required=True, cls=str)
    replacement = self.get_argument(context, 2, required=True, cls=str)
    flags = 0
    q_flag = False
    if len(self) > 3:
        for c in self.get_argument(context, 3, required=True, cls=str):
            if c in 'smix':
                flags |= getattr(re, c.upper())
            elif c == 'q' and self.parser.version > '2':
                pattern = re.escape(pattern)
                q_flag = True
            else:
                raise self.error('FORX0001', "Invalid regular expression flag %r" % c)

    try:
        python_pattern = translate_pattern(pattern, flags, self.parser.xsd_version)
        pattern = re.compile(python_pattern, flags=flags)
    except (re.error, RegexError):
        raise self.error('FORX0002', "Invalid regular expression %r" % pattern)
    else:
        if pattern.search(''):
            msg = "Regular expression %r matches zero-length string"
            raise self.error('FORX0003', msg % pattern.pattern)
        elif q_flag:
            # use replacement string as is (but inactivating escapes)
            replacement = replacement.replace('\\', '\\\\')
            input_string = input_string.replace('\\', '\\\\')
            return pattern.sub(replacement, input_string).replace('\\\\', '\\')
        elif REPLACEMENT_PATTERN.search(replacement) is None:
            raise self.error('FORX0004', "Invalid replacement string %r" % replacement)
        else:
            # Turn each unescaped XPath "$N" group reference into the
            # equivalent Python "\g<N>" reference, highest group first.
            for g in range(pattern.groups, -1, -1):
                if '$%d' % g in replacement:
                    replacement = re.sub(r'(?<!\\)\$%d' % g, r'\\g<%d>' % g, replacement)

            return pattern.sub(replacement, input_string).replace('\\$', '$')


@method(function('tokenize', nargs=(2, 3),
                 sequence_types=('xs:string?', 'xs:string', 'xs:string', 'xs:string*')))
def select_tokenize_function(self, context=None):
    input_string = self.get_argument(context, cls=str)
    pattern = self.get_argument(context, 1, required=True, cls=str)
    flags = 0
    if len(self) > 2:
        for c in self.get_argument(context, 2, required=True, cls=str):
            if c in 'smix':
                flags |= getattr(re, c.upper())
            elif c == 'q' and self.parser.version > '2':
                pattern = re.escape(pattern)
            else:
                raise self.error('FORX0001', "Invalid regular expression flag %r" % c)

    try:
        python_pattern = translate_pattern(pattern, flags, self.parser.xsd_version)
        pattern = re.compile(python_pattern, flags=flags)
    except (re.error, RegexError):
        raise self.error('FORX0002', "Invalid regular expression %r" % pattern) from None
    else:
        if pattern.search(''):
            msg = "Regular expression %r matches zero-length string"
            raise self.error('FORX0003', msg % pattern.pattern)

    if input_string:
        for value in pattern.split(input_string):
            if value is not None and pattern.search(value) is None:
                yield value


###
# Functions on anyURI
@method(function('resolve-uri', nargs=(1, 2),
                 sequence_types=('xs:string?', 'xs:string', 'xs:anyURI?')))
def evaluate_resolve_uri_function(self, context=None):
    relative = self.get_argument(context, cls=str)
    if len(self) == 1:
        if self.parser.base_uri is None:
            raise self.error('FONS0005')
        elif relative is None:
            return
        elif not AnyURI.is_valid(relative):
            raise self.error('FORG0002', '{!r} is not a valid URI'.format(relative))
        else:
            return self.get_absolute_uri(relative, as_string=False)

    base_uri = self.get_argument(context, index=1, required=True, cls=str)
    if not AnyURI.is_valid(base_uri):
        raise self.error('FORG0002', '{!r} is not a valid URI'.format(base_uri))
    elif relative is None:
        return
    elif not AnyURI.is_valid(relative):
        raise self.error('FORG0002', '{!r} is not a valid URI'.format(relative))
else: return self.get_absolute_uri(relative, base_uri, as_string=False) ### # String functions @method(function('codepoints-to-string', nargs=1, sequence_types=('xs:integer*', 'xs:string'))) def evaluate_codepoints_to_string_function(self, context=None): result = [] for value in self[0].select(context): if isinstance(value, UntypedAtomic): value = int(value) if not isinstance(value, int): msg = "invalid type {} for codepoint {}".format(type(value), value) if isinstance(value, str): raise self.error('XPTY0004', msg) raise self.error('FORG0006', msg) elif is_xml_codepoint(value): result.append(chr(value)) else: msg = "{} is not a valid XML 1.0 codepoint".format(value) raise self.error('FOCH0001', msg) return ''.join(result) @method(function('string-to-codepoints', nargs=1, sequence_types=('xs:string?', 'xs:integer*'))) def evaluate_string_to_codepoints_function(self, context=None): arg = self.get_argument(context, cls=str) return [ord(c) for c in arg] if arg else None @method(function('compare', nargs=(2, 3), sequence_types=('xs:string?', 'xs:string?', 'xs:string', 'xs:integer?'))) def evaluate_compare_function(self, context=None): comp1 = self.get_argument(context, 0, cls=str, promote=(AnyURI, UntypedAtomic)) comp2 = self.get_argument(context, 1, cls=str, promote=(AnyURI, UntypedAtomic)) if comp1 is None or comp2 is None: return None if len(self) < 3: value = locale.strcoll(comp1, comp2) else: with self.use_locale(collation=self.get_argument(context, 2)): value = locale.strcoll(comp1, comp2) return 0 if not value else 1 if value > 0 else -1 @method(function('contains', nargs=(2, 3), sequence_types=('xs:string?', 'xs:string?', 'xs:string', 'xs:boolean'))) def evaluate_contains_function(self, context=None): arg1 = self.get_argument(context, default='', cls=str) arg2 = self.get_argument(context, index=1, default='', cls=str) if len(self) < 3: return arg2 in arg1 else: with self.use_locale(collation=self.get_argument(context, 2)): return arg2 in arg1 
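# The string functions above (codepoints-to-string, compare) can be sketched
# outside the parser roughly as follows. This is a minimal standalone sketch,
# not elementpath's API: the names is_xml_char, codepoints_to_string and
# compare are hypothetical, ValueError stands in for the FOCH0001 error, and
# the range check assumes is_xml_codepoint follows the XML 1.0 'Char' production.

```python
import locale


def is_xml_char(cp):
    """Return True if cp is allowed by the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def codepoints_to_string(codepoints):
    """Join integer code points into a string, rejecting non-XML characters."""
    chars = []
    for cp in codepoints:
        if not is_xml_char(cp):
            # Hypothetical stand-in for the FOCH0001 error raised above
            raise ValueError("{} is not a valid XML 1.0 codepoint".format(cp))
        chars.append(chr(cp))
    return ''.join(chars)


def compare(comp1, comp2):
    """Reduce locale.strcoll() to -1, 0 or 1; None models the empty sequence."""
    if comp1 is None or comp2 is None:
        return None
    value = locale.strcoll(comp1, comp2)
    return 0 if not value else 1 if value > 0 else -1
```

# As in the original compare function, only the sign of the collation result
# is kept, because strcoll() may return any negative or positive integer.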
@method(function('codepoint-equal', nargs=2, sequence_types=('xs:string?', 'xs:string?', 'xs:boolean?'))) def evaluate_codepoint_equal_function(self, context=None): comp1 = self.get_argument(context, 0, cls=str) comp2 = self.get_argument(context, 1, cls=str) if comp1 is None or comp2 is None: return elif len(comp1) != len(comp2): return False else: return all(ord(c1) == ord(c2) for c1, c2 in zip(comp1, comp2)) @method(function('string-join', nargs=2, sequence_types=('xs:string*', 'xs:string', 'xs:string'))) def evaluate_string_join_function(self, context=None): items = [ self.validated_value(s, cls=str, promote=AnyURI) for s in self[0].atomization(context) ] return self.get_argument(context, 1, required=True, cls=str).join(items) @method(function('normalize-unicode', nargs=(1, 2), sequence_types=('xs:string?', 'xs:string', 'xs:string'))) def evaluate_normalize_unicode_function(self, context=None): arg = self.get_argument(context, default='', cls=str) if len(self) > 1: normalization_form = self.get_argument(context, 1, cls=str) if normalization_form is None: raise self.error('XPTY0004', "2nd argument can't be an empty sequence") else: normalization_form = normalization_form.strip().upper() else: normalization_form = 'NFC' if normalization_form == 'FULLY-NORMALIZED': msg = "%r normalization form not supported" % normalization_form raise self.error('FOCH0003', msg) if not arg: return '' elif not normalization_form: return arg try: return unicodedata.normalize(normalization_form, arg) except ValueError: msg = "unsupported normalization form %r" % normalization_form raise self.error('FOCH0003', msg) from None @method(function('upper-case', nargs=1, sequence_types=('xs:string?', 'xs:string'))) def evaluate_upper_case_function(self, context=None): return self.get_argument(context, default='', cls=str).upper() @method(function('lower-case', nargs=1, sequence_types=('xs:string?', 'xs:string'))) def evaluate_lower_case_function(self, context=None): return 
self.get_argument(context, default='', cls=str).lower() @method(function('encode-for-uri', nargs=1, sequence_types=('xs:string?', 'xs:string'))) def evaluate_encode_for_uri_function(self, context=None): uri_part = self.get_argument(context, cls=str) return '' if uri_part is None else urllib_quote(uri_part, safe='~') @method(function('iri-to-uri', nargs=1, sequence_types=('xs:string?', 'xs:string'))) def evaluate_iri_to_uri_function(self, context=None): iri = self.get_argument(context, cls=str, promote=AnyURI) return '' if iri is None else urllib_quote(iri, safe='-_.!~*\'()#;/?:@&=+$,[]%') @method(function('escape-html-uri', nargs=1, sequence_types=('xs:string?', 'xs:string'))) def evaluate_escape_html_uri_function(self, context=None): uri = self.get_argument(context, cls=str) if uri is None: return '' return urllib_quote(uri, safe=''.join(chr(cp) for cp in range(32, 127))) @method(function('starts-with', nargs=(2, 3), sequence_types=('xs:string?', 'xs:string?', 'xs:string', 'xs:boolean'))) def evaluate_starts_with_function(self, context=None): arg1 = self.get_argument(context, default='', cls=str) arg2 = self.get_argument(context, index=1, default='', cls=str) if len(self) < 3: return arg1.startswith(arg2) else: with self.use_locale(collation=self.get_argument(context, 2)): return arg1.startswith(arg2) @method(function('ends-with', nargs=(2, 3), sequence_types=('xs:string?', 'xs:string?', 'xs:string', 'xs:boolean'))) def evaluate_ends_with_function(self, context=None): arg1 = self.get_argument(context, default='', cls=str) arg2 = self.get_argument(context, index=1, default='', cls=str) if len(self) < 3: return arg1.endswith(arg2) else: with self.use_locale(collation=self.get_argument(context, 2)): return arg1.endswith(arg2) @method(function('substring-before', nargs=(2, 3), sequence_types=('xs:string?', 'xs:string?', 'xs:string', 'xs:string'))) @method(function('substring-after', nargs=(2, 3), sequence_types=('xs:string?', 'xs:string?', 'xs:string', 'xs:string'))) 
def evaluate_substring_functions(self, context=None): arg1 = self.get_argument(context, default='', cls=str) arg2 = self.get_argument(context, index=1, default='', cls=str) if len(self) < 3: index = arg1.find(arg2) else: with self.use_locale(collation=self.get_argument(context, 2)): index = arg1.find(arg2) if index < 0: return '' if self.symbol == 'substring-before': return arg1[:index] else: return arg1[index + len(arg2):] ### # Functions on durations, dates and times @method(function('years-from-duration', nargs=1, sequence_types=('xs:duration?', 'xs:integer?'))) def evaluate_years_from_duration_function(self, context=None): item = self.get_argument(context, cls=Duration) if item is None: return None elif item.months >= 0: return item.months // 12 else: return -(abs(item.months) // 12) @method(function('months-from-duration', nargs=1, sequence_types=('xs:duration?', 'xs:integer?'))) def evaluate_months_from_duration_function(self, context=None): item = self.get_argument(context, cls=Duration) if item is None: return None elif item.months >= 0: return item.months % 12 else: return -(abs(item.months) % 12) @method(function('days-from-duration', nargs=1, sequence_types=('xs:duration?', 'xs:integer?'))) def evaluate_days_from_duration_function(self, context=None): item = self.get_argument(context, cls=Duration) if item is None: return None elif item.seconds >= 0: return int(item.seconds // 86400) else: return - int(abs(item.seconds) // 86400) @method(function('hours-from-duration', nargs=1, sequence_types=('xs:duration?', 'xs:integer?'))) def evaluate_hours_from_duration_function(self, context=None): item = self.get_argument(context, cls=Duration) if item is None: return None elif item.seconds >= 0: return int(item.seconds // 3600 % 24) else: return - int(abs(item.seconds) // 3600 % 24) @method(function('minutes-from-duration', nargs=1, sequence_types=('xs:duration?', 'xs:integer?'))) def evaluate_minutes_from_duration_function(self, context=None): item = 
self.get_argument(context, cls=Duration)
    if item is None:
        return None
    elif item.seconds >= 0:
        return int(item.seconds // 60 % 60)
    else:
        return - int(abs(item.seconds) // 60 % 60)


@method(function('seconds-from-duration', nargs=1,
                 sequence_types=('xs:duration?', 'xs:decimal?')))
def evaluate_seconds_from_duration_function(self, context=None):
    item = self.get_argument(context, cls=Duration)
    if item is None:
        return None
    elif item.seconds >= 0:
        return item.seconds % 60
    else:
        return -(abs(item.seconds) % 60)


@method(function('year-from-dateTime', nargs=1,
                 sequence_types=('xs:dateTime?', 'xs:integer?')))
@method(function('month-from-dateTime', nargs=1,
                 sequence_types=('xs:dateTime?', 'xs:integer?')))
@method(function('day-from-dateTime', nargs=1,
                 sequence_types=('xs:dateTime?', 'xs:integer?')))
@method(function('hours-from-dateTime', nargs=1,
                 sequence_types=('xs:dateTime?', 'xs:integer?')))
@method(function('minutes-from-dateTime', nargs=1,
                 sequence_types=('xs:dateTime?', 'xs:integer?')))
@method(function('seconds-from-dateTime', nargs=1,
                 sequence_types=('xs:dateTime?', 'xs:decimal?')))
def evaluate_from_datetime_functions(self, context=None):
    cls = DateTime if self.parser.xsd_version == '1.1' else DateTime10
    item = self.get_argument(context, cls=cls)
    if item is None:
        return
    elif self.symbol.startswith('year'):
        return item.year
    elif self.symbol.startswith('month'):
        return item.month
    elif self.symbol.startswith('day'):
        return item.day
    elif self.symbol.startswith('hour'):
        return item.hour
    elif self.symbol.startswith('minute'):
        return item.minute
    elif item.microsecond:
        # Zero-pad microseconds to six digits, otherwise 50000 microseconds
        # would be rendered as ".50000" (0.5 s) instead of ".050000" (0.05 s).
        return Decimal('{}.{:06d}'.format(item.second, item.microsecond))
    else:
        return item.second


@method(function('timezone-from-dateTime', nargs=1,
                 sequence_types=('xs:dateTime?', 'xs:dayTimeDuration?')))
def evaluate_timezone_from_datetime_function(self, context=None):
    cls = DateTime if self.parser.xsd_version == '1.1' else DateTime10
    item = self.get_argument(context, cls=cls)
    if item is None or item.tzinfo is None:
return return DayTimeDuration(seconds=item.tzinfo.offset.total_seconds()) @method(function('year-from-date', nargs=1, sequence_types=('xs:date?', 'xs:integer?'))) @method(function('month-from-date', nargs=1, sequence_types=('xs:date?', 'xs:integer?'))) @method(function('day-from-date', nargs=1, sequence_types=('xs:date?', 'xs:integer?'))) @method(function('timezone-from-date', nargs=1, sequence_types=('xs:date?', 'xs:dayTimeDuration?'))) def evaluate_from_date_functions(self, context=None): cls = Date if self.parser.xsd_version == '1.1' else Date10 item = self.get_argument(context, cls=cls) if item is None: return elif self.symbol.startswith('year'): return item.year elif self.symbol.startswith('month'): return item.month elif self.symbol.startswith('day'): return item.day elif item.tzinfo is None: return return DayTimeDuration(seconds=item.tzinfo.offset.total_seconds()) @method(function('hours-from-time', nargs=1, sequence_types=('xs:time?', 'xs:integer?'))) def evaluate_hours_from_time_function(self, context=None): item = self.get_argument(context, cls=Time) return None if item is None else item.hour @method(function('minutes-from-time', nargs=1, sequence_types=('xs:time?', 'xs:integer?'))) def evaluate_minutes_from_time_function(self, context=None): item = self.get_argument(context, cls=Time) return None if item is None else item.minute @method(function('seconds-from-time', nargs=1, sequence_types=('xs:time?', 'xs:decimal?'))) def evaluate_seconds_from_time_function(self, context=None): item = self.get_argument(context, cls=Time) return None if item is None else item.second + item.microsecond / Decimal('1000000.0') @method(function('timezone-from-time', nargs=1, sequence_types=('xs:time?', 'xs:dayTimeDuration?'))) def evaluate_timezone_from_time_function(self, context=None): item = self.get_argument(context, cls=Time) if item is None or item.tzinfo is None: return return DayTimeDuration(seconds=item.tzinfo.offset.total_seconds()) ### # Timezone adjustment 
functions @method(function('adjust-dateTime-to-timezone', nargs=(1, 2), sequence_types=('xs:dateTime?', 'xs:dayTimeDuration?', 'xs:dateTime?'))) def evaluate_adjust_datetime_to_timezone_function(self, context=None): cls = DateTime if self.parser.xsd_version == '1.1' else DateTime10 return self.adjust_datetime(context, cls) @method(function('adjust-date-to-timezone', nargs=(1, 2), sequence_types=('xs:date?', 'xs:dayTimeDuration?', 'xs:date?'))) def evaluate_adjust_date_to_timezone_function(self, context=None): cls = Date if self.parser.xsd_version == '1.1' else Date10 return self.adjust_datetime(context, cls) @method(function('adjust-time-to-timezone', nargs=(1, 2), sequence_types=('xs:time?', 'xs:dayTimeDuration?', 'xs:time?'))) def evaluate_adjust_time_to_timezone_function(self, context=None): return self.adjust_datetime(context, Time) ### # Static context functions @method(function('default-collation', nargs=0, sequence_types=('xs:string',))) def evaluate_default_collation_function(self, context=None): return self.parser.default_collation @method(function('static-base-uri', nargs=0, sequence_types=('xs:anyURI?',))) def evaluate_static_base_uri_function(self, context=None): if self.parser.base_uri is None: return None return AnyURI(self.parser.base_uri) ### # Dynamic context functions @method(function('current-dateTime', nargs=0, sequence_types=('xs:dateTime',))) def evaluate_current_datetime_function(self, context=None): dt = datetime.datetime.now() if context is None else context.current_dt if self.parser.xsd_version == '1.1': return DateTime(dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second, dt.microsecond, dt.tzinfo) return DateTime10(dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second, dt.microsecond, dt.tzinfo) @method(function('current-date', nargs=0, sequence_types=('xs:date',))) def evaluate_current_date_function(self, context=None): dt = datetime.datetime.now() if context is None else context.current_dt if self.parser.xsd_version == '1.1': 
return Date(dt.year, dt.month, dt.day, tzinfo=dt.tzinfo) return Date10(dt.year, dt.month, dt.day, tzinfo=dt.tzinfo) @method(function('current-time', nargs=0, sequence_types=('xs:time',))) def evaluate_current_time_function(self, context=None): dt = datetime.datetime.now() if context is None else context.current_dt return Time(dt.hour, dt.minute, dt.second, dt.microsecond, dt.tzinfo) @method(function('implicit-timezone', nargs=0, sequence_types=('xs:dayTimeDuration',))) def evaluate_implicit_timezone_function(self, context=None): if context is not None and context.timezone is not None: return DayTimeDuration.fromtimedelta(context.timezone.offset) else: return DayTimeDuration.fromtimedelta(datetime.timedelta(seconds=time.timezone)) ### # The root function (Ref: https://www.w3.org/TR/2010/REC-xpath-functions-20101214/#func-root) @method(function('root', nargs=(0, 1), sequence_types=('node()?', 'node()?'))) def evaluate_root_function(self, context=None): if context is None: raise self.missing_context() elif isinstance(context, XPathSchemaContext): return None elif not self: if context.item is None: return context.root elif not isinstance(context.item, XPathNode): raise self.error('XPTY0004') return context.get_root(context.item) else: item = self.get_argument(context) if item is None: return None elif not isinstance(item, XPathNode): raise self.error('XPTY0004') return context.get_root(item) @method(function('lang', nargs=(1, 2), sequence_types=('xs:string?', 'node()', 'xs:boolean'))) def evaluate_lang_function(self, context=None): if len(self) > 1: item = self.get_argument(context, index=1, default_to_context=True) elif context is None: raise self.missing_context() else: item = context.item if not isinstance(item, ElementNode): raise self.error('XPTY0004') try: lang = item.elem.attrib[XML_LANG].strip() except KeyError: if len(self) > 1: return False for elem in context.iter_ancestors(): try: if XML_LANG in elem.elem.attrib: lang = elem.elem.attrib[XML_LANG] break 
except AttributeError: pass # is a document node else: return False test_lang = self.get_argument(context, cls=str) if test_lang is None: test_lang = '' test_lang = test_lang.strip().lower() lang = lang.strip().lower() return lang == test_lang or lang.startswith(test_lang) and lang[len(test_lang)] == '-' ### # Functions that generate sequences @method(function('element-with-id', nargs=(1, 2), sequence_types=('xs:string*', 'node()', 'element()*'))) @method(function('id', nargs=(1, 2), sequence_types=('xs:string*', 'node()', 'element()*'))) def select_id_function(self, context=None): idrefs = {x for item in self[0].select(copy(context)) for x in self.string_value(item).split() if Id.is_valid(x)} if context is None: raise self.missing_context() if len(self) == 1: node = context.item if node is None: node = context.root else: node = self.get_argument(context, index=1) if not isinstance(node, XPathNode): raise self.error('XPTY0004') if isinstance(context, XPathSchemaContext): return None root = context.get_root(node) if root is None: return None # TODO: PSVI bindings with also xsi:type evaluation for element in filter(lambda x: isinstance(x, ElementNode), root.iter_descendants()): if element.elem.text in idrefs: if self.parser.schema is not None: xsd_element = self.parser.schema.find(element.path, self.parser.namespaces) if xsd_element is None or not xsd_element.type.is_key(): continue idrefs.remove(element.elem.text) if self.symbol == 'id': yield element else: parent = element.parent if parent is not None: yield parent continue # pragma: no cover for attr in element.attributes: if attr.value in idrefs: if attr.name == XML_ID: idrefs.remove(attr.value) yield element break if self.parser.schema is None: continue xsd_element = self.parser.schema.find(element.path, self.parser.namespaces) if xsd_element is None: continue xsd_attribute = xsd_element.attrib.get(attr.name) if xsd_attribute is None or not xsd_attribute.type.is_key(): continue # pragma: no cover 
idrefs.remove(attr.value) yield element break @method(function('idref', nargs=(1, 2), sequence_types=('xs:string*', 'node()', 'node()*'))) def select_idref_function(self, context=None): # TODO: PSVI bindings with also xsi:type evaluation ids = [x for x in self[0].select(context=copy(context))] node = self.get_argument(context, index=1, default_to_context=True) if isinstance(context, XPathSchemaContext): return elif context is None or node is not context.item: pass elif context.item is None: node = context.root if not isinstance(node, XPathNode): raise self.error('XPTY0004') elif not isinstance(node, (ElementNode, DocumentNode)): return for element in filter(lambda x: isinstance(x, ElementNode), node.iter_descendants()): text = element.elem.text if text and is_idrefs(text) and any(v in text.split() for x in ids for v in x.split()): yield element continue for attr in element.attributes: # pragma: no cover if attr.name != XML_ID and \ any(v in attr.value.split() for x in ids for v in x.split()): yield element break @method(function('doc', nargs=1, sequence_types=('xs:string?', 'document-node()?'))) @method(function('doc-available', nargs=1, sequence_types=('xs:string?', 'xs:boolean'))) def evaluate_doc_functions(self, context=None): uri = self.get_argument(context) if uri is None: return None if self.symbol == 'doc' else False elif isinstance(uri, str): pass elif isinstance(uri, AnyURI): uri = str(uri) elif isinstance(uri, UntypedAtomic): raise self.error('FODC0002') else: raise self.error('XPTY0004') if context is None: raise self.missing_context() elif isinstance(context, XPathSchemaContext): return None uri = uri.strip() if uri.startswith(':'): raise self.error('FODC0005') try: uri = self.get_absolute_uri(uri) except ElementPathValueError as err: if self.symbol == 'doc': raise self.error('FODC0002', err.message) from None return False try: doc = context.documents[uri] except (KeyError, TypeError): if self.symbol == 'doc': if is_local_dir_url(uri): raise 
self.error('FODC0005', 'document URI is a directory') raise self.error('FODC0002') return False else: if doc is None: raise self.error('FODC0002') try: sequence_type = self.parser.document_types[uri] except (KeyError, TypeError): sequence_type = 'document-node()' if not self.parser.match_sequence_type(doc, sequence_type): msg = "Type does not match sequence type {!r}" raise self.wrong_sequence_type(msg.format(sequence_type)) return doc if self.symbol == 'doc' else True @method(function('collection', nargs=(0, 1), sequence_types=('xs:string?', 'node()*'))) def evaluate_collection_function(self, context=None): uri = self.get_argument(context) if context is None: raise self.missing_context() elif isinstance(context, XPathSchemaContext): return elif not self or uri is None: if context.default_collection is None: raise self.error('FODC0002', 'no default collection has been defined') collection = context.default_collection sequence_type = self.parser.default_collection_type else: uri = self.get_absolute_uri(uri) try: collection = context.collections[uri] except (KeyError, TypeError): if is_local_dir_url(uri): raise self.error('FODC0004', 'collection URI is a directory') raise self.error('FODC0002', '{!r} collection not found'.format(uri)) from None try: sequence_type = self.parser.collection_types[uri] except (KeyError, TypeError): return collection if not self.parser.match_sequence_type(collection, sequence_type): msg = "Type does not match sequence type {!r}" raise self.wrong_sequence_type(msg.format(sequence_type)) return collection ### # The error function # # https://www.w3.org/TR/2010/REC-xpath-functions-20101214/#func-error # https://www.w3.org/TR/xpath-functions/#func-error # @method(function('error', nargs=(0, 3), sequence_types=('xs:QName?', 'xs:string', 'item()*', 'none'))) def evaluate_error_function(self, context=None): if not self: raise self.error('FOER0000') elif len(self) == 1: error = self.get_argument(context, cls=QName) if error is None: raise 
self.error('XPTY0004', "an xs:QName expected")
        raise self.error(error or 'FOER0000')
    else:
        error = self.get_argument(context, cls=QName)
        description = self.get_argument(context, index=1, cls=str)
        raise self.error(error or 'FOER0000', description)


###
# The trace function
#
# https://www.w3.org/TR/2010/REC-xpath-functions-20101214/#func-trace
#
@method(function('trace', nargs=2,
                 sequence_types=('item()*', 'xs:string', 'item()*')))
def select_trace_function(self, context=None):
    label = self.get_argument(context, index=1, cls=str)
    for value in self[0].select(context):
        '{} {}'.format(label, str(value).strip())  # TODO: trace dataset
        yield value


# XPath 2.0 definitions continue into module xpath2_constructors

elementpath-3.0.2/elementpath/xpath2/_xpath2_operators.py

#
# Copyright (c), 2018-2021, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
# # @author Davide Brunato # # type: ignore """ XPath 2.0 implementation - part 2 (operators, expressions and multi-role tokens) """ import math import operator from copy import copy from decimal import Decimal, DivisionByZero from ..exceptions import ElementPathError, ElementPathTypeError from ..helpers import OCCURRENCE_INDICATORS, numeric_equal, numeric_not_equal, \ node_position from ..namespaces import XSD_NAMESPACE, XSD_NOTATION, XSD_ANY_ATOMIC_TYPE, \ get_namespace, get_expanded_name from ..datatypes import get_atomic_value, UntypedAtomic, QName, AnyURI, \ Duration, Integer, DoubleProxy10 from ..xpath_nodes import ElementNode, DocumentNode, XPathNode, AttributeNode from ..xpath_context import XPathSchemaContext from ..xpath_token import XPathFunction from .xpath2_parser import XPath2Parser COMPARISON_OPERATORS = {'eq', 'ne', 'lt', 'le', 'gt', 'ge'} register = XPath2Parser.register infix = XPath2Parser.infix method = XPath2Parser.method function = XPath2Parser.function @method('as') @method('of') def nud_as_and_of_symbols(self): raise self.error('XPDY0002') # Dynamic context required ### # Variables @method('$', bp=90) def nud_variable_reference(self): self.parser.expected_name('(name)', 'Q{') self[:] = self.parser.expression(rbp=90), return self @method('$') def evaluate_variable_reference(self, context=None): if context is None: raise self.missing_context() try: get_expanded_name(self[0].value, self.parser.namespaces) except KeyError as err: raise self.error('XPST0081', "namespace prefix {} not found".format(err)) varname = self[0].value try: return context.variables[varname] except KeyError: if isinstance(context, XPathSchemaContext): try: sequence_type = self.parser.variable_types[varname].strip() except KeyError: pass else: if sequence_type[-1] in OCCURRENCE_INDICATORS: sequence_type = sequence_type[:-1] if QName.pattern.match(sequence_type) is not None: try: type_name = get_expanded_name(sequence_type, self.parser.namespaces) except KeyError: pass else: 
xsd_type = context.root.elem.xpath_proxy.get_type(type_name) if xsd_type is not None: return get_atomic_value(xsd_type) return UntypedAtomic('1') raise self.missing_name('unknown variable %r' % str(varname)) ### # Node sequence composition XPath2Parser.duplicate('|', 'union') @method(infix('intersect', bp=55)) @method(infix('except', bp=55)) def select_intersect_and_except_operators(self, context=None): if context is None: raise self.missing_context() s1, s2 = set(self[0].select(copy(context))), set(self[1].select(copy(context))) if any(not isinstance(x, XPathNode) for x in s1) \ or any(not isinstance(x, XPathNode) for x in s2): raise self.error('XPTY0004', 'only XPath nodes are allowed') if self.symbol == 'except': yield from sorted(s1 - s2, key=node_position) else: yield from sorted(s1 & s2, key=node_position) ### # 'if' expression @method('if', bp=20) def nud_if_expression(self): if self.parser.next_token.symbol != '(': token = self.parser.symbol_table['(name)'](self.parser, self.symbol) return token.nud() self.parser.advance('(') self[:] = self.parser.expression(5), self.parser.advance(')') self.parser.advance('then') self[1:] = self.parser.expression(5), self.parser.advance('else') self[2:] = self.parser.expression(5), return self @method('if') def evaluate_if_expression(self, context=None): if self.boolean_value(self[0].evaluate(copy(context))): return self[1].evaluate(context) else: return self[2].evaluate(context) @method('if') def select_if_expression(self, context=None): if self.boolean_value([x for x in self[0].select(copy(context))]): yield from self[1].select(context) else: yield from self[2].select(context) ### # Quantified expressions @method('some', bp=20) @method('every', bp=20) def nud_quantified_expressions(self): del self[:] if self.parser.next_token.symbol != '$': token = self.parser.symbol_table['(name)'](self.parser, self.symbol) return token.nud() while True: self.parser.next_token.expected('$') variable = self.parser.expression(5) 
self.append(variable) self.parser.advance('in') expr = self.parser.expression(5) self.append(expr) for tk in filter(lambda x: x.symbol == '$', expr.iter()): if tk[0].value == variable[0].value: raise tk.error('XPST0008', 'loop variable in its range expression') if self.parser.next_token.symbol != ',': break self.parser.advance() self.parser.advance('satisfies') self.append(self.parser.expression(5)) return self @method('some') @method('every') def evaluate_quantified_expressions(self, context=None): if context is None: raise self.missing_context() context = copy(context) some = self.symbol == 'some' varnames = [self[k][0].value for k in range(0, len(self) - 1, 2)] selectors = [self[k].select for k in range(1, len(self) - 1, 2)] for results in copy(context).iter_product(selectors, varnames): context.variables.update(x for x in zip(varnames, results)) if self.boolean_value([x for x in self[-1].select(copy(context))]): if some: return True elif not some: return False return not some ### # 'for' expressions @method('for', bp=20) def nud_for_expression(self): del self[:] if self.parser.next_token.symbol != '$': token = self.parser.symbol_table['(name)'](self.parser, self.symbol) return token.nud() while True: self.parser.next_token.expected('$') variable = self.parser.expression(5) self.append(variable) self.parser.advance('in') expr = self.parser.expression(5) self.append(expr) for tk in filter(lambda x: x.symbol == '$', expr.iter()): if tk[0].value == variable[0].value: raise tk.error('XPST0008', 'loop variable in its range expression') if self.parser.next_token.symbol != ',': break self.parser.advance() self.parser.advance('return') self.append(self.parser.expression(5)) return self @method('for') def select_for_expression(self, context=None): if context is None: raise self.missing_context() context = copy(context) varnames = [self[k][0].value for k in range(0, len(self) - 1, 2)] selectors = [self[k].select for k in range(1, len(self) - 1, 2)] for results in 
copy(context).iter_product(selectors, varnames): context.variables.update(x for x in zip(varnames, results)) yield from self[-1].select(copy(context)) ### # Sequence type based @method('instance', bp=60) @method('treat', bp=61) def led_sequence_type_based_expressions(self, left): self.parser.advance('of' if self.symbol == 'instance' else 'as') if self.parser.next_token.label not in ('kind test', 'sequence type', 'function test'): self.parser.expected_name('(name)', ':') try: self[:] = left, self.parser.expression(rbp=self.rbp) except ElementPathTypeError as err: message = getattr(err, 'message', str(err)) raise self.error('XPST0003', message) from None next_symbol = self.parser.next_token.symbol if self[1].symbol != 'empty-sequence' and next_symbol in ('?', '*', '+'): self[2:] = self.parser.symbol_table[next_symbol](self.parser), # Add nullary token self.parser.advance() return self @method('instance') def evaluate_instance_expression(self, context=None): if len(self) > 2: occurs = self[2].symbol else: occurs = self[1].occurrence position = None if self[1].symbol == 'empty-sequence': for _ in self[0].select(context): return False return True elif self[1].label in ('kind test', 'sequence type', 'function test'): if context is None: raise self.missing_context() for position, context.item in enumerate(self[0].select(context)): result = self[1].evaluate(context) if result is None or isinstance(result, list) and not result: return occurs in ('*', '?') elif position and (occurs is None or occurs == '?'): return False else: return position is not None or occurs in ('*', '?') else: try: qname = get_expanded_name(self[1].source, self.parser.namespaces) except KeyError as err: raise self.error('XPST0081', "namespace prefix {} not found".format(err)) for position, item in enumerate(self[0].select(context)): try: if not self.parser.is_instance(item, qname): return False except KeyError: msg = "atomic type %r not found in in-scope schema types" raise self.error('XPST0051', msg 
% self[1].source) from None else: if position and (occurs is None or occurs == '?'): return False else: return position is not None or occurs in ('*', '?') @method('treat') def evaluate_treat_expression(self, context=None): if len(self) > 2: occurs = self[2].symbol else: occurs = self[1].occurrence position = None castable_expr = [] if self[1].symbol == 'empty-sequence': for _ in self[0].select(context): raise self.wrong_sequence_type() elif self[1].label in ('kind test', 'sequence type', 'function test'): for position, item in enumerate(self[0].select(context)): result = self[1].evaluate(context) if isinstance(result, list) and not result: raise self.wrong_sequence_type() elif position and (occurs is None or occurs == '?'): raise self.wrong_sequence_type("more than one item in sequence") castable_expr.append(item) else: if position is None and occurs not in ('*', '?'): raise self.wrong_sequence_type("the sequence cannot be empty") else: try: qname = get_expanded_name(self[1].source, self.parser.namespaces) except KeyError as err: raise self.error('XPST0081', 'prefix {} not found'.format(str(err))) if not qname.startswith('{') and not QName.is_valid(qname): raise self.error('XPST0003') for position, item in enumerate(self[0].select(context)): try: if not self.parser.is_instance(item, qname): msg = f"item {item!r} is not of type {self[1].source!r}" raise self.error('XPDY0050', msg) except KeyError: msg = "atomic type %r not found in in-scope schema types" raise self.error('XPST0051', msg % self[1].source) from None else: if position and (occurs is None or occurs == '?'): raise self.wrong_sequence_type("more than one item in sequence") castable_expr.append(item) else: if position is None and occurs not in ('*', '?'): raise self.wrong_sequence_type("the sequence cannot be empty") return castable_expr ### # Simple type based @method('castable', bp=62) @method('cast', bp=63) def led_cast_expressions(self, left): self.parser.advance('as') 
self.parser.expected_name('(name)', ':') self[:] = left, self.parser.expression(rbp=self.rbp) if self.parser.next_token.symbol == '?': self[2:] = self.parser.symbol_table['?'](self.parser), # Add nullary token self.parser.advance() return self @method('castable') @method('cast') def evaluate_cast_expressions(self, context=None): try: atomic_type = get_expanded_name(self[1].source, namespaces=self.parser.namespaces) except KeyError as err: raise self.error('XPST0081', 'prefix {} not found'.format(str(err))) if atomic_type in (XSD_NOTATION, XSD_ANY_ATOMIC_TYPE): raise self.error('XPST0080') namespace = get_namespace(atomic_type) if namespace != XSD_NAMESPACE and \ (self.parser.schema is None or self.parser.schema.get_type(atomic_type) is None): msg = "atomic type %r not found in the in-scope schema types" raise self.unknown_atomic_type(msg % atomic_type) result = [res for res in self[0].select(context)] if len(result) > 1: if self.symbol != 'cast': return False raise self.wrong_context_type("more than one value in expression") elif not result: if len(self) == 3: return [] if self.symbol == 'cast' else True elif self.symbol != 'cast': return False else: raise self.wrong_context_type("an atomic value is required") arg = self.data_value(result[0]) try: if namespace != XSD_NAMESPACE: value = self.parser.schema.cast_as(self.string_value(arg), atomic_type) else: local_name = atomic_type.split('}')[1] token_class = self.parser.symbol_table.get(local_name) if token_class is None or token_class.label != 'constructor function': msg = "atomic type %r not found in the in-scope schema types" raise self.unknown_atomic_type(msg % self[1].source) elif local_name == 'QName': if isinstance(arg, QName): pass elif self.parser.version < '3.0' and self[0].symbol != '(string)': raise self.error('XPTY0004', "Non literal string to QName cast") token = token_class(self.parser) value = token.cast(arg) except ElementPathError: if self.symbol != 'cast': return False raise except (TypeError, 
ValueError) as err: if self.symbol != 'cast': return False elif isinstance(arg, (UntypedAtomic, str)): raise self.error('FORG0001', err) from None raise self.error('XPTY0004', err) from None else: return value if self.symbol == 'cast' else True ### # Comma operator - concatenate items or sequences @method(infix(',', bp=5)) def evaluate_comma_operator(self, context=None): results = [] for op in self: result = op.evaluate(context) if isinstance(result, list): results.extend(result) elif result is not None: results.append(result) return results @method(',') def select_comma_operator(self, context=None): for op in self: yield from op.select(context=copy(context)) ### # Parenthesized expression: XPath 2.0 admits the empty case (). @method(register('(', lbp=80, rpb=80, label='expression')) def nud_parenthesized_expression(self): if self.parser.next_token.symbol != ')': self[:] = self.parser.expression(), self.parser.advance(')') return self @method('(') def led_parenthesized_expression(self, left): if left.symbol == '(name)': if left.value in self.parser.RESERVED_FUNCTION_NAMES: msg = f"{left.value!r} is not allowed as function name" raise left.error('XPST0003', msg) else: raise left.error('XPST0017', 'unknown function {!r}'.format(left.value)) elif left.symbol == ':' and left[1].symbol == '(name)': if left[1].namespace == XSD_NAMESPACE: msg = 'unknown constructor function {!r}'.format(left[1].value) raise left[1].error('XPST0017', msg) raise left.error('XPST0017', 'unknown function {!r}'.format(left.value)) if self.parser.next_token.symbol != ')': self[:] = left, self.parser.expression() else: self[:] = left, self.parser.advance(')') return self @method('(') def evaluate_parenthesized_expression(self, context=None): return self[0].evaluate(context) if self else [] @method('(') def select_parenthesized_expression(self, context=None): return self[0].select(context) if self else iter(()) ### # Value comparison operators (eq, ne, lt, le, gt, and ge) # # Ref: 
https://www.w3.org/TR/xpath20/#id-value-comparisons # @method('eq', bp=30) @method('ne', bp=30) @method('lt', bp=30) @method('gt', bp=30) @method('le', bp=30) @method('ge', bp=30) def led_value_comparison_operators(self, left): if left.symbol in COMPARISON_OPERATORS: raise self.wrong_syntax() self[:] = left, self.parser.expression(rbp=30) return self @method('eq') @method('ne') @method('lt') @method('gt') @method('le') @method('ge') def evaluate_value_comparison_operators(self, context=None): operands = [self[0].get_atomized_operand(context=copy(context)), self[1].get_atomized_operand(context=copy(context))] if any(x is None for x in operands): return None elif any(isinstance(x, XPathFunction) for x in operands): raise self.error('FOTY0013', "cannot compare a function item") elif all(isinstance(x, DoubleProxy10) for x in operands): # Special case of two values: use custom operators if self.symbol == 'eq': return numeric_equal(*operands) elif self.symbol == 'ne': return numeric_not_equal(*operands) elif numeric_equal(*operands): return self.symbol in ('le', 'ge') cls0, cls1 = type(operands[0]), type(operands[1]) if cls0 is cls1 and cls0 is not Duration: pass elif all(isinstance(x, float) for x in operands): pass elif any(isinstance(x, bool) for x in operands): msg = "cannot apply {} between {!r} and {!r}".format(self, *operands) raise self.error('XPTY0004', msg) elif all(isinstance(x, (int, Decimal)) for x in operands): pass elif all(isinstance(x, (str, UntypedAtomic, AnyURI)) for x in operands): pass elif all(isinstance(x, (str, UntypedAtomic, QName)) for x in operands): pass elif all(isinstance(x, (float, Decimal, int)) for x in operands): if isinstance(operands[0], float): operands[1] = float(operands[1]) else: operands[0] = float(operands[0]) elif all(isinstance(x, Duration) for x in operands) and self.symbol in ('eq', 'ne'): pass elif (issubclass(cls0, cls1) or issubclass(cls1, cls0)) and not issubclass(cls0, Duration): pass else: msg = "cannot apply {} between 
{!r} and {!r}".format(self, *operands) raise self.error('XPTY0004', msg) try: return getattr(operator, self.symbol)(*operands) except TypeError as err: raise self.error('XPTY0004', err) from None ### # Node comparison @method('is', bp=30) def led_node_comparison(self, left): if left.symbol == 'is': raise self.wrong_syntax() self[:] = left, self.parser.expression(rbp=30) return self @method('is') @method(infix('<<', bp=30)) @method(infix('>>', bp=30)) def evaluate_node_comparison(self, context=None): symbol = self.symbol left = [x for x in self[0].select(context)] if not left: return None elif len(left) > 1 or not isinstance(left[0], XPathNode): raise self[0].error('XPTY0004', "left operand of %r must be a single node" % symbol) right = [x for x in self[1].select(context)] if not right: return None elif len(right) > 1 or not isinstance(right[0], XPathNode): raise self[0].error('XPTY0004', "right operand of %r must be a single node" % symbol) if symbol == 'is': return left[0] is right[0] else: if left[0] is right[0]: return False documents = [context.root] documents.extend(v for v in context.variables.values() if isinstance(v, DocumentNode)) for root in documents: for item in root.iter_document(): # pragma: no cover if left[0] is item: return True if symbol == '<<' else False elif right[0] is item: return False if symbol == '<<' else True else: raise self.error('FOCA0002', "operands are not nodes of the XML tree!") ### # Range expression @method('to', bp=35) def led_range_expression(self, left): if left.symbol == 'to': raise self.wrong_syntax() self[:] = left, self.parser.expression(rbp=35) return self @method('to') def evaluate_range_expression(self, context=None): start, stop = self.get_operands(context, cls=Integer) try: return [x for x in range(start, stop + 1)] except TypeError: return [] @method('to') def select_range_expression(self, context=None): yield from self.evaluate(context) ### # Numerical operators @method(infix('idiv', bp=45)) def 
evaluate_idiv_operator(self, context=None): op1, op2 = self.get_operands(context) if op1 is None or op2 is None: raise self.error('XPST0005') try: if math.isinf(op1): raise self.error('FOAR0001' if op2 == 0 else 'FOAR0002') elif math.isnan(op1) or math.isnan(op2): raise self.error('FOAR0002') except TypeError as err: raise self.error('XPTY0004', err) from None try: result = op1 // op2 except (ZeroDivisionError, DivisionByZero): raise self.error('FOAR0001') from None else: if result >= 0 or isinstance(op1, Decimal) or \ isinstance(op2, Decimal) or op1 % op2 == 0: return int(result) else: return int(result) + 1 # Resolve the intrinsic ambiguity of some infix operators @method('union') @method('intersect') @method('except') @method('eq') @method('ne') @method('lt') @method('gt') @method('le') @method('ge') @method('is') @method('to') @method('idiv') @method('instance') @method('treat') @method('castable') @method('cast') def nud_disambiguation_of_infix_operators(self): token = self.parser.symbol_table['(name)'](self.parser, self.symbol) return token.nud() ### # Kind tests (sequence types that can appear also in XPath expressions) @method(function('document-node', nargs=(0, 1), label='kind test')) def select_document_node_kind_test(self, context=None): if context is None: raise self.missing_context() elif not self: if isinstance(context.item, DocumentNode): yield context.item elif isinstance(context.root, DocumentNode) and context.item is None: for item in context.iter_children_or_self(): if item is None: yield context.root else: elements = [e for e in self[0].select(copy(context)) if isinstance(e, ElementNode)] if isinstance(context.root, DocumentNode) and context.item is None: if len(elements) == 1: yield context.root @method('document-node') def nud_document_node_kind_test(self): self.parser.advance('(') if self.parser.next_token.symbol in ('element', 'schema-element'): self[0:] = self.parser.expression(5), if self.parser.next_token.symbol == ',': raise
self.wrong_nargs('Too many arguments: expected at most 1 argument') elif self.parser.next_token.symbol != ')': raise self.error('XPST0003', 'element or schema-element kind test expected') self.parser.advance(')') self.value = None return self @method(function('element', nargs=(0, 2), label='kind test')) def select_element_kind_test(self, context=None): if context is None: raise self.missing_context() elif not self: for item in context.iter_children_or_self(): if isinstance(item, ElementNode): yield item else: for item in self[0].select(context): if len(self) == 1: yield item elif isinstance(item, ElementNode): try: type_annotation = get_expanded_name(self[1].source, self.parser.namespaces) except KeyError: type_annotation = self[1].source if item.nilled: if type_annotation[-1] in '*?': yield item elif item.xsd_type is not None and type_annotation == item.xsd_type.name: yield item @method('element') def nud_element_kind_test(self): self.parser.advance('(') if self.parser.next_token.symbol != ')': self.parser.expected_name('(name)', ':', '*', message='a QName or a wildcard expected') self[0:] = self.parser.expression(5), if self.parser.next_token.symbol == ',': self.parser.advance(',') self.parser.expected_name('(name)', ':', message='a QName expected') self[1:] = self.parser.expression(5), if self.parser.next_token.symbol in ('*', '+', '?'): self[1].occurrence = self.parser.next_token.symbol self.parser.advance() self.parser.advance(')') self.value = None return self @method(function('schema-attribute', nargs=1, label='kind test')) def select_schema_attribute_kind_test(self, context=None): if context is None: raise self.missing_context() attribute_name = self[0].source for _ in context.iter_children_or_self(): qname = get_expanded_name(attribute_name, self.parser.namespaces) if self.parser.schema.get_attribute(qname) is None: raise self.missing_name("attribute %r not found in schema" % attribute_name) if isinstance(context.item, AttributeNode) and 
context.item.match_name(qname): yield context.item return if not isinstance(context, XPathSchemaContext): raise self.error('XPST0008', 'schema attribute %r not found' % attribute_name) @method(function('schema-element', nargs=1, label='kind test')) def select_schema_element_kind_test(self, context=None): if context is None: raise self.missing_context() element_name = self[0].source for _ in context.iter_children_or_self(): qname = get_expanded_name(element_name, self.parser.namespaces) if self.parser.schema.get_element(qname) is None \ and self.parser.schema.get_substitution_group(qname) is None: raise self.missing_name("element %r not found in schema" % element_name) if isinstance(context.item, ElementNode) and context.item.elem.tag == qname: yield context.item return if not isinstance(context, XPathSchemaContext): raise self.error('XPST0008', 'schema element %r not found' % element_name) @method('schema-attribute') @method('schema-element') def nud_schema_node_kind_test(self): self.parser.advance('(') self.parser.expected_name('(name)', ':', message='a QName expected') self[0:] = self.parser.expression(5), self.parser.advance(')') self.value = None return self ### # Multi role-tokens definition: in XPath 2.0 the 'attribute' keyword is used both for # attribute:: axis and attribute() node type function. # # First the XPath1 token class has to be removed from the XPath2 symbol table. Then the # symbol has to be registered usually with the same binding power (bp --> lbp, rbp), a # multi-value label (using a tuple of values) and a custom pattern. Finally a custom nud # or led method is required. 
XPath2Parser.unregister('attribute')
XPath2Parser.register(
    'attribute', lbp=90, rbp=90, label=('kind test', 'axis'),
    pattern=r'\battribute(?=\s*\:\:|\s*\(\:.*\:\)\s*\:\:|\s*\(|\s*\(\:.*\:\)\()'
)


@method('attribute')
def nud_attribute_kind_test_or_axis(self):
    if self.parser.next_token.symbol == '::':
        self.label = 'axis'
        self.parser.advance('::')
        self.parser.expected_name(
            '(name)', '*', 'text', 'node', 'document-node', 'comment',
            'processing-instruction', 'attribute', 'schema-attribute',
            'element', 'schema-element', 'namespace-node'
        )
        self[:] = self.parser.expression(rbp=90),
    else:
        self.label = 'kind test'
        self.parser.advance('(')
        if self.parser.next_token.symbol != ')':
            self.parser.next_token.expected('(name)', '*', ':')
            self[:] = self.parser.expression(5),
            if self.parser.next_token.symbol == ',':
                self.parser.advance(',')
                self.parser.next_token.expected('(name)', ':')
                self[1:] = self.parser.expression(5),
        self.parser.advance(')')

        if self.namespace:
            msg = f"{self.value!r} is not allowed as function name"
            raise self.error('XPST0003', msg)

    return self


@method('attribute')
def select_attribute_kind_test_or_axis(self, context=None):
    if context is None:
        raise self.missing_context()
    elif self.label == 'axis':
        for _ in context.iter_attributes():
            yield from self[0].select(context)
    elif not self:
        for attribute in context.iter_attributes():
            yield attribute.value
    else:
        name = self[0].value
        if self.parser.schema is not None and len(self) == 2:
            type_name = get_expanded_name(self[1].value, namespaces=self.parser.namespaces)
        else:
            type_name = None

        for attribute in context.iter_attributes():
            if attribute.match_name(name):
                if isinstance(context, XPathSchemaContext):
                    self.add_xsd_type(attribute)
                elif not type_name:
                    yield attribute.value
                else:
                    xsd_type = self.get_xsd_type(attribute)
                    if xsd_type is not None and xsd_type.name == type_name:
                        yield attribute.value

# XPath 2.0 definitions continue into module xpath2_functions
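

###
# Illustrative sketch, not part of the elementpath API. The helper name
# `_idiv_sketch` is hypothetical and is not registered in any symbol table.
# It restates, for plain Python integers only, the rule that XPath 2.0 gives
# for the `idiv` operator registered earlier in this module: the quotient is
# truncated toward zero, whereas Python's `//` rounds toward negative infinity.
# Floats, decimals and the XPath error codes are deliberately left out.
def _idiv_sketch(op1: int, op2: int) -> int:
    if op2 == 0:
        # In XPath this case is reported with the error code FOAR0001
        raise ZeroDivisionError("integer division by zero")
    quotient = abs(op1) // abs(op2)  # magnitude of the truncated quotient
    # The result is negative only when the operands have opposite signs
    return quotient if (op1 < 0) == (op2 < 0) else -quotient


# Usage of the sketch above:
#   _idiv_sketch(10, 3) gives 3
#   _idiv_sketch(-3, 2) gives -1, while -3 // 2 in Python gives -2
#   _idiv_sketch(-6, 3) gives -2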
elementpath-3.0.2/elementpath/xpath2/xpath2_parser.py000066400000000000000000000476761427546011100227170ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ XPath 2.0 implementation - part 1 (parser class and symbols) """ from abc import ABCMeta import locale from collections.abc import MutableSequence from urllib.parse import urlparse from typing import cast, Any, Callable, ClassVar, Dict, FrozenSet, List, \ MutableMapping, Optional, Tuple, Type, Union from ..helpers import normalize_sequence_type, get_locale_category from ..exceptions import ElementPathError, ElementPathTypeError, \ ElementPathValueError, MissingContextError, xpath_error from ..namespaces import NamespacesType, XSD_NAMESPACE, XML_NAMESPACE, \ XPATH_FUNCTIONS_NAMESPACE, XQT_ERRORS_NAMESPACE, \ XSD_NOTATION, XSD_ANY_ATOMIC_TYPE, get_prefixed_name from ..datatypes import UntypedAtomic, AtomicValueType, QName from ..xpath_token import UNICODE_CODEPOINT_COLLATION, NargsType, \ XPathToken, XPathFunction, XPathConstructor from ..xpath_context import XPathContext from ..schema_proxy import AbstractSchemaProxy from ..xpath1 import XPath1Parser class XPath2Parser(XPath1Parser): """ XPath 2.0 expression parser class. This is the default parser used by XPath selectors. A parser instance represents also the XPath static context. With *variable_types* you can pass a dictionary with the types of the in-scope variables. Provide a *namespaces* dictionary argument for mapping namespace prefixes to URI inside expressions. If *strict* is set to `False` the parser enables also the parsing of QNames, like the ElementPath library. There are some additional XPath 2.0 related arguments. 
    :param namespaces: a dictionary that maps namespace prefixes to URIs.
    :param variable_types: a dictionary with the static context's in-scope variable \
    types. It defines the associations between variables and static types.
    :param strict: if strict mode is `False` the parser also enables the parsing of \
    QNames, like the ElementPath library. Default is `True`.
    :param compatibility_mode: if set to `True` the parser instance works with \
    XPath 1.0 compatibility rules.
    :param default_namespace: the default namespace to apply to unprefixed names. \
    By default no namespace is applied (empty namespace '').
    :param function_namespace: the default namespace to apply to unprefixed function \
    names. By default the namespace "http://www.w3.org/2005/xpath-functions" is used.
    :param xsd_version: the XSD version to use when it cannot be obtained from the \
    schema. Default is '1.0'.
    :param schema: the schema proxy instance to use for types, attributes and \
    elements lookups. It must be an `AbstractSchemaProxy` instance. If it's not \
    provided the XPath 2.0 schema-related expressions cannot be used.
    :param base_uri: an absolute URI may be provided; it is used when necessary in \
    the resolution of relative URIs.
    :param default_collation: the default string collation to use. If not set the \
    environment's default locale setting is used.
    :param document_types: statically known documents, that is a dictionary from \
    absolute URIs onto types. Used for type checking when calling the *fn:doc* \
    function with a sequence of URIs. The default type of a document is \
    'document-node()'.
    :param collection_types: statically known collections, that is a dictionary from \
    absolute URIs onto types. Used for type checking when calling the \
    *fn:collection* function with a sequence of URIs. The default type of a \
    collection is 'node()*'.
:param default_collection_type: this is the type of the sequence of nodes that \ would result from calling the *fn:collection* function with no arguments. \ Default is 'node()*'. """ version = '2.0' SYMBOLS: ClassVar[FrozenSet[str]] = XPath1Parser.SYMBOLS | { 'union', 'intersect', 'instance', 'castable', 'if', 'then', 'else', 'for', 'to', 'some', 'every', 'in', 'satisfies', 'item', 'satisfies', 'cast', 'treat', 'return', 'except', '?', 'as', 'of', # Comments '(:', ':)', # Value comparison operators 'eq', 'ne', 'lt', 'le', 'gt', 'ge', # Node comparison operators 'is', '<<', '>>', # Mathematical operators 'idiv', # Node type functions 'document-node', 'schema-attribute', 'element', 'schema-element', 'attribute', 'empty-sequence', # Accessor functions 'node-name', 'nilled', 'data', 'base-uri', 'document-uri', # Number functions 'abs', 'round-half-to-even', # Aggregate functions 'avg', 'min', 'max', # String functions 'codepoints-to-string', 'string-to-codepoints', 'compare', 'codepoint-equal', 'string-join', 'normalize-unicode', 'upper-case', 'lower-case', 'encode-for-uri', 'iri-to-uri', 'escape-html-uri', 'ends-with', # General functions for sequences 'distinct-values', 'empty', 'exists', 'index-of', 'insert-before', 'remove', 'reverse', 'subsequence', 'unordered', # Cardinality functions for sequences 'zero-or-one', 'one-or-more', 'exactly-one', # Comparing function for sequences 'deep-equal', # Pattern matching functions 'matches', 'replace', 'tokenize', # Functions on anyURI 'resolve-uri', # Functions for extracting fragments from xs:duration 'years-from-duration', 'months-from-duration', 'days-from-duration', 'hours-from-duration', 'minutes-from-duration', 'seconds-from-duration', # Functions for extracting fragments from xs:dateTime 'year-from-dateTime', 'month-from-dateTime', 'day-from-dateTime', 'hours-from-dateTime', 'minutes-from-dateTime', 'seconds-from-dateTime', 'timezone-from-dateTime', # Functions for extracting fragments from xs:date 'year-from-date', 
'month-from-date', 'day-from-date', 'timezone-from-date', # Functions for extracting fragments from xs:time 'hours-from-time', 'minutes-from-time', 'seconds-from-time', 'timezone-from-time', # Timezone adjustment functions 'adjust-dateTime-to-timezone', 'adjust-date-to-timezone', 'adjust-time-to-timezone', # Functions Related to QNames (QName function is also a constructor) 'QName', 'local-name-from-QName', 'prefix-from-QName', 'local-name-from-QName', 'namespace-uri-from-QName', 'namespace-uri-for-prefix', 'in-scope-prefixes', 'resolve-QName', # Static context functions 'default-collation', 'static-base-uri', # Dynamic context functions 'current-dateTime', 'current-date', 'current-time', 'implicit-timezone', # Node set functions 'root', # Error function and trace function 'error', 'trace', # XSD builtins constructors ('string', 'boolean' and 'QName' are # already registered as functions) 'normalizedString', 'token', 'language', 'Name', 'NCName', 'ENTITY', 'ID', 'IDREF', 'NMTOKEN', 'anyURI', 'NOTATION', 'decimal', 'int', 'integer', 'long', 'short', 'byte', 'double', 'float', 'nonNegativeInteger', 'positiveInteger', 'nonPositiveInteger', 'negativeInteger', 'unsignedLong', 'unsignedInt', 'unsignedShort', 'unsignedByte', 'dateTime', 'date', 'time', 'gDay', 'gMonth', 'gYear', 'gMonthDay', 'gYearMonth', 'duration', 'dayTimeDuration', 'yearMonthDuration', 'dateTimeStamp', 'base64Binary', 'hexBinary', 'untypedAtomic', # Functions and Operators that Generate Sequences ('id' changes but # is already registered) 'element-with-id', 'idref', 'doc', 'doc-available', 'collection', } DEFAULT_NAMESPACES: ClassVar[Dict[str, str]] = { 'xml': XML_NAMESPACE, 'xs': XSD_NAMESPACE, # 'xlink': XLINK_NAMESPACE, 'fn': XPATH_FUNCTIONS_NAMESPACE, 'err': XQT_ERRORS_NAMESPACE } PATH_STEP_LABELS = ('axis', 'function', 'kind test') PATH_STEP_SYMBOLS = { '(integer)', '(string)', '(float)', '(decimal)', '(name)', '*', '@', '..', '.', '(', '{' } # https://www.w3.org/TR/xpath20/#id-reserved-fn-names 
RESERVED_FUNCTION_NAMES = { 'attribute', 'comment', 'document-node', 'element', 'empty-sequence', 'if', 'item', 'node', 'processing-instruction', 'schema-attribute', 'schema-element', 'text', 'typeswitch', } function_signatures: Dict[Tuple[QName, int], str] = XPath1Parser.function_signatures.copy() namespaces: Dict[str, str] token: XPathToken next_token: XPathToken def __init__(self, namespaces: Optional[NamespacesType] = None, variable_types: Optional[Dict[str, str]] = None, strict: bool = True, compatibility_mode: bool = False, default_collation: Optional[str] = None, default_namespace: Optional[str] = None, function_namespace: Optional[str] = None, xsd_version: Optional[str] = None, schema: Optional[AbstractSchemaProxy] = None, base_uri: Optional[str] = None, document_types: Optional[Dict[str, str]] = None, collection_types: Optional[Dict[str, str]] = None, default_collection_type: str = 'node()*') -> None: super(XPath2Parser, self).__init__(namespaces, strict) self.compatibility_mode = compatibility_mode self._default_collation = default_collation self._xsd_version = xsd_version if xsd_version is not None else '1.0' if default_namespace is not None: self.default_namespace = self.namespaces[''] = default_namespace else: self.default_namespace = self.namespaces.get('', '') if function_namespace is not None: self.function_namespace = function_namespace if schema is None: pass elif not isinstance(schema, AbstractSchemaProxy): msg = "argument 'schema' must be an instance of AbstractSchemaProxy" raise ElementPathTypeError(msg) else: schema.bind_parser(self) if not variable_types: self.variable_types = {} elif all(self.is_sequence_type(v) for v in variable_types.values()): self.variable_types = { k: normalize_sequence_type(v) for k, v in variable_types.items() } else: raise ElementPathValueError('invalid sequence type for in-scope variable types') self.base_uri = None if base_uri is None else urlparse(base_uri).geturl() if document_types: if any(not 
self.is_sequence_type(v) for v in document_types.values()): raise ElementPathValueError('invalid sequence type in document_types argument') self.document_types = document_types if collection_types: if any(not self.is_sequence_type(v) for v in collection_types.values()): raise ElementPathValueError('invalid sequence type in collection_types argument') self.collection_types = collection_types if not self.is_sequence_type(default_collection_type): raise ElementPathValueError('invalid sequence type for ' 'default_collection_type argument') self.default_collection_type = default_collection_type def __getstate__(self) -> Dict[str, Any]: state = self.__dict__.copy() state.pop('symbol_table', None) state.pop('tokenizer', None) return state @property def default_collation(self) -> str: if self._default_collation is not None: return self._default_collation language_code, encoding = get_locale_category(locale.LC_COLLATE).split('.') if language_code is None: return UNICODE_CODEPOINT_COLLATION elif encoding is None or not encoding: return language_code else: collation = f'{language_code}.{encoding}' if collation != 'en_US.UTF-8': return collation else: return UNICODE_CODEPOINT_COLLATION @property def xsd_version(self) -> str: if self.schema is None: return self._xsd_version try: return self.schema.xsd_version except (AttributeError, NotImplementedError): return self._xsd_version def advance(self, *symbols: str) -> XPathToken: super(XPath2Parser, self).advance(*symbols) if self.next_token.symbol == '(:': # Parses and consumes an XPath 2.0 comment. A comment is delimited # by symbols '(:' and ':)' and can be nested. The current token is # saved and restored after parsing the entire comment. Comments # cannot be inside a prefixed name ':' specification. 
self.token.unexpected(':') token = self.token comment_level = 1 while comment_level: self.advance_until('(:', ':)') if self.next_token.symbol == ':)': comment_level -= 1 else: comment_level += 1 self.advance(':)') self.next_token.unexpected(':') self.token = token return self.next_token @classmethod def constructor(cls, symbol: str, bp: int = 0, nargs: NargsType = 1, sequence_types: Union[Tuple[()], Tuple[str, ...], List[str]] = (), label: Union[str, Tuple[str, ...]] = 'constructor function') \ -> Callable[[Callable[..., Any]], Callable[..., Any]]: """Creates a constructor token class.""" def nud_(self: XPathConstructor) -> XPathConstructor: try: self.parser.advance('(') self[0:] = self.parser.expression(5), if self.parser.next_token.symbol == ',': raise self.wrong_nargs('Too many arguments: expected at most 1 argument') self.parser.advance(')') self.value = None except SyntaxError: raise self.error('XPST0017') from None if self[0].symbol == '?': self._partial_function() return self def evaluate_(self: XPathConstructor, context: Optional[XPathContext] = None) \ -> Union[List[None], AtomicValueType]: arg = self.data_value(self.get_argument(context)) if arg is None: return [] elif arg == '?' and self[0].symbol == '?': raise self.error('XPTY0004', "cannot evaluate a partial function") try: if isinstance(arg, UntypedAtomic): return self.cast(arg.value) return self.cast(arg) except ElementPathError: raise except (TypeError, ValueError) as err: raise self.error('FORG0001', err) from None if not sequence_types: assert nargs == 1 sequence_types = ('xs:anyAtomicType?', 'xs:%s?' 
% symbol) token_class = cls.register(symbol, nargs=nargs, sequence_types=sequence_types, label=label, bases=(XPathConstructor,), lbp=bp, rbp=bp, nud=nud_, evaluate=evaluate_) def bind(func: Callable[..., Any]) -> Callable[..., Any]: method_name = func.__name__.partition('_')[0] if method_name != 'cast': raise ValueError("The function name must be 'cast' or start with 'cast_'") setattr(token_class, method_name, func) return func return bind def schema_constructor(self, atomic_type_name: str, bp: int = 90) \ -> Type[XPathFunction]: """Registers a token class for a schema atomic type constructor function.""" if atomic_type_name in (XSD_ANY_ATOMIC_TYPE, XSD_NOTATION): raise xpath_error('XPST0080') def nud_(self_: XPathFunction) -> XPathFunction: self_.parser.advance('(') self_[0:] = self_.parser.expression(5), self_.parser.advance(')') try: self_.value = self_.evaluate() # Static context evaluation except MissingContextError: self_.value = None return self_ def evaluate_(self_: XPathFunction, context: Optional[XPathContext] = None) \ -> Union[List[None], AtomicValueType]: arg = self_.get_argument(context) if arg is None or self_.parser.schema is None: return [] value = self_.string_value(arg) try: return self_.parser.schema.cast_as(value, atomic_type_name) except (TypeError, ValueError) as err: raise self_.error('FORG0001', err) symbol = get_prefixed_name(atomic_type_name, self.namespaces) token_class_name = "_%sConstructorFunction" % symbol.replace(':', '_') kwargs = { 'symbol': symbol, 'nargs': 1, 'label': 'constructor function', 'pattern': r'\b%s(?=\s*\(|\s*\(\:.*\:\)\()' % symbol, 'lbp': bp, 'rbp': bp, 'nud': nud_, 'evaluate': evaluate_, '__module__': self.__module__, '__qualname__': token_class_name, '__return__': None } token_class = cast( Type[XPathFunction], ABCMeta(token_class_name, (XPathFunction,), kwargs) ) MutableSequence.register(token_class) self.symbol_table[symbol] = token_class return token_class def is_schema_bound(self) -> bool: return
'symbol_table' in self.__dict__ def parse(self, source: str) -> XPathToken: root_token = super(XPath1Parser, self).parse(source) if root_token.label in ('sequence type', 'function test'): raise root_token.error('XPST0003', "not allowed in XPath expression") if self.schema is None: try: root_token.evaluate() # Static context evaluation except MissingContextError: pass else: # Static context evaluation with a dynamic schema context context = self.schema.get_context() for _ in root_token.select(context): pass return root_token def check_variables(self, values: MutableMapping[str, Any]) -> None: if self.variable_types is None: return for varname, xsd_type in self.variable_types.items(): if varname not in values: raise xpath_error('XPST0008', "missing variable {!r}".format(varname)) for varname, value in values.items(): try: sequence_type = self.variable_types[varname] except KeyError: sequence_type = 'item()*' if isinstance(value, list) else 'item()' if not self.match_sequence_type(value, sequence_type): message = "Unmatched sequence type for variable {!r}".format(varname) raise xpath_error('XPDY0050', message) ## # Remove symbols that have to be redefined for XPath 2.0. 
XPath2Parser.unregister(',') XPath2Parser.unregister('(') XPath2Parser.unregister('$') XPath2Parser.unregister('contains') XPath2Parser.unregister('lang') XPath2Parser.unregister('id') XPath2Parser.unregister('substring-before') XPath2Parser.unregister('substring-after') XPath2Parser.unregister('starts-with') ### # Symbols XPath2Parser.register('then') XPath2Parser.register('else') XPath2Parser.register('in') XPath2Parser.register('return') XPath2Parser.register('satisfies') XPath2Parser.register('?') XPath2Parser.register('(:') XPath2Parser.register(':)') # XPath 2.0 definitions continue into module xpath2_operators elementpath-3.0.2/elementpath/xpath3.py000066400000000000000000000007561427546011100201220ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from .xpath30 import XPath30Parser from .xpath31 import XPath31Parser XPath3Parser = XPath30Parser __all__ = ['XPath30Parser', 'XPath31Parser', 'XPath3Parser'] elementpath-3.0.2/elementpath/xpath30/000077500000000000000000000000001427546011100176205ustar00rootroot00000000000000elementpath-3.0.2/elementpath/xpath30/__init__.py000066400000000000000000000010031427546011100217230ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # from typing import TYPE_CHECKING if TYPE_CHECKING: from .xpath30_parser import XPath30Parser else: from ._xpath30_functions import XPath30Parser __all__ = ['XPath30Parser'] elementpath-3.0.2/elementpath/xpath30/_translation_maps.py000066400000000000000000000101271427546011100237100ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ Translation maps for XPath 3.0+ format functions. Add languages with pull-requests. """ from string import ascii_lowercase ALPHABET_CHARACTERS = { None: ascii_lowercase, 'en': ascii_lowercase, 'it': 'abcdefghilmnopqrstuvz', 'el': 'αβγδεζηθικλμνξοπρςστυφχψω', } OTHER_NUMBERS = ( '\u2070\u00B9\u00B2\u00B3' + ''.join(chr(x) for x in range(0x2074, 0x207A)), # superscript digits (0-9) ''.join(chr(x) for x in range(0x2080, 0x208A)), # subscript digits (0-9) ''.join(chr(x) for x in range(0x2460, 0x2474)), # circled numbers (1-20) ''.join(chr(x) for x in range(0x2474, 0x2488)), # parenthesized numbers (1-20) ''.join(chr(x) for x in range(0x2488, 0x249C)), # full stop numbers (1-20) ) ROMAN_NUMERALS_MAP = { 1000: 'M', 900: 'CM', 500: 'D', 400: 'CD', 100: 'C', 90: 'XC', 50: 'L', 40: 'XL', 10: 'X', 9: 'IX', 5: 'V', 4: 'IV', 1: 'I', } NUM_TO_MONTH_MAPS = { 'en': { 1: 'january', 2: 'february', 3: 'march', 4: 'april', 5: 'may', 6: 'june', 7: 'july', 8: 'august', 9: 'september', 10: 'october', 11: 'november', 12: 'december', }, 'it': { 1: 'gennaio', 2: 'febbraio', 3: 'marzo', 4: 'aprile', 5: 'maggio', 6: 'giugno', 7: 'luglio', 8: 'agosto', 9: 'settembre', 10: 'ottobre', 11: 'novembre', 12: 'dicembre', }, } NUM_TO_WEEKDAY_MAPS = { 'en': { 1: 'monday', 2: 'tuesday', 3: 'wednesday', 4: 'thursday', 5: 'friday', 6: 'saturday', 7: 'sunday', 
}, 'it': { 1: 'lunedì', 2: 'martedì', 3: 'mercoledì', 4: 'giovedì', 5: 'venerdì', 6: 'sabato', 7: 'domenica', }, } NUM_TO_WORD_MAPS = { 'en': { 10 ** 9: 'billion', 10 ** 6: 'million', 1000: 'thousand', 100: 'hundred', 90: 'ninety', 80: 'eighty', 70: 'seventy', 60: 'sixty', 50: 'fifty', 40: 'forty', 30: 'thirty', 20: 'twenty', 19: 'nineteen', 18: 'eighteen', 17: 'seventeen', 16: 'sixteen', 15: 'fifteen', 14: 'fourteen', 13: 'thirteen', 12: 'twelve', 11: 'eleven', 10: 'ten', 9: 'nine', 8: 'eight', 7: 'seven', 6: 'six', 5: 'five', 4: 'four', 3: 'three', 2: 'two', 1: 'one', 0: 'zero', }, 'it': { 10 ** 9: 'miliardo', 10 ** 6: 'milione', 1000: 'mille', 100: 'cento', 90: 'novanta', 80: 'ottanta', 70: 'settanta', 60: 'sessanta', 50: 'cinquanta', 40: 'quaranta', 30: 'trenta', 20: 'venti', 19: 'diciannove', 18: 'diciotto', 17: 'diciassette', 16: 'sedici', 15: 'quindici', 14: 'quattordici', 13: 'tredici', 12: 'dodici', 11: 'undici', 10: 'dieci', 9: 'nove', 8: 'otto', 7: 'sette', 6: 'sei', 5: 'cinque', 4: 'quattro', 3: 'tre', 2: 'due', 1: 'uno', 0: 'zero', } } MILITARY_TIME_ZONES = { '+01': 'A', '+02': 'B', '+03': 'C', '+04': 'D', '+05': 'E', '+06': 'F', '+07': 'G', '+08': 'H', '+09': 'I', None: 'J', '+10': 'K', '+11': 'L', '+12': 'M', '-01': 'N', '-02': 'O', '-03': 'P', '-04': 'Q', '-05': 'R', '-06': 'S', '-07': 'T', '-08': 'U', '-09': 'V', '-10': 'W', '-11': 'X', '-12': 'Y', '+00': 'Z', } elementpath-3.0.2/elementpath/xpath30/_xpath30_functions.py000066400000000000000000001742601427546011100237220ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # # type: ignore """ XPath 3.0 implementation - part 3 (functions) """ import decimal import os import re import codecs import math import xml.etree.ElementTree as ElementTree from copy import copy from urllib.parse import urlsplit try: import zoneinfo except ImportError: zoneinfo = None # Python < 3.9 from ..exceptions import ElementPathError from ..helpers import OCCURRENCE_INDICATORS, EQNAME_PATTERN, \ XML_NEWLINES_PATTERN, is_xml_codepoint, node_position from ..namespaces import get_expanded_name, split_expanded_name, \ XPATH_FUNCTIONS_NAMESPACE, XSLT_XQUERY_SERIALIZATION_NAMESPACE, \ XSD_NAMESPACE from ..etree import defuse_xml, etree_iter_paths from ..xpath_nodes import XPathNode, ElementNode, TextNode, AttributeNode, \ NamespaceNode, DocumentNode, ProcessingInstructionNode, CommentNode from ..tree_builders import get_node_tree from ..xpath_token import XPathFunction from ..xpath_context import XPathSchemaContext from ..datatypes import xsd10_atomic_types, NumericProxy, QName, Date10, \ DateTime10, Time, AnyURI, UntypedAtomic from ..regex import translate_pattern, RegexError from ._xpath30_operators import XPath30Parser from .xpath30_helpers import UNICODE_DIGIT_PATTERN, DECIMAL_DIGIT_PATTERN, \ MODIFIER_PATTERN, decimal_to_string, int_to_roman, int_to_alphabetic, \ format_digits, int_to_words, parse_datetime_picture, parse_datetime_marker, \ ordinal_suffix # XSLT and XQuery Serialization parameters SERIALIZATION_PARAMS = '{%s}serialization-parameters' % XSLT_XQUERY_SERIALIZATION_NAMESPACE SER_PARAM_OMIT_XML_DECLARATION = '{%s}omit-xml-declaration' % XSLT_XQUERY_SERIALIZATION_NAMESPACE SER_PARAM_USE_CHARACTER_MAPS = '{%s}use-character-maps' % XSLT_XQUERY_SERIALIZATION_NAMESPACE SER_PARAM_CHARACTER_MAP = '{%s}character-map' % XSLT_XQUERY_SERIALIZATION_NAMESPACE SER_PARAM_METHOD = '{%s}method' % XSLT_XQUERY_SERIALIZATION_NAMESPACE SER_PARAM_INDENT = '{%s}indent' % XSLT_XQUERY_SERIALIZATION_NAMESPACE SER_PARAM_VERSION = '{%s}version' % 
XSLT_XQUERY_SERIALIZATION_NAMESPACE SER_PARAM_CDATA = '{%s}cdata-section-elements' % XSLT_XQUERY_SERIALIZATION_NAMESPACE SER_PARAM_NO_INDENT = '{%s}suppress-indentation' % XSLT_XQUERY_SERIALIZATION_NAMESPACE SER_PARAM_STANDALONE = '{%s}standalone' % XSLT_XQUERY_SERIALIZATION_NAMESPACE SER_PARAM_ITEM_SEPARATOR = '{%s}item-separator' % XSLT_XQUERY_SERIALIZATION_NAMESPACE FORMAT_INTEGER_TOKENS = {'A', 'a', 'i', 'I', 'w', 'W', 'Ww'} DECL_PARAM_PATTERN = re.compile(r'([^\d\W][\w.\-\u00B7\u0300-\u036F\u203F\u2040]*)\s*=\s*') EXPONENT_PIC = re.compile(r'\d[eE]\d') register = XPath30Parser.register method = XPath30Parser.method function = XPath30Parser.function ### # 'inline function' expression or 'function test' @method(register('function', bp=90, label=('inline function', 'function test'), bases=(XPathFunction,))) def nud_inline_function(self): if self.parser.next_token.symbol != '(': self.label = 'inline function' token = self.parser.symbol_table['(name)'](self.parser, self.symbol) return token.nud() def append_sequence_type(tk): if tk.symbol == '(' and len(tk) == 1: tk = tk[0] sequence_type = tk.source next_symbol = self.parser.next_token.symbol if sequence_type != 'empty-sequence()' and next_symbol in OCCURRENCE_INDICATORS: self.parser.advance() sequence_type += next_symbol tk.occurrence = next_symbol if not self.parser.is_sequence_type(sequence_type): if 'xs:NMTOKENS' in sequence_type \ or 'xs:ENTITIES' in sequence_type \ or 'xs:IDREFS' in sequence_type: msg = "a list type cannot be used in a function signature" raise self.error('XPST0051', msg) raise self.error('XPST0003', "a sequence type expected") self.sequence_types.append(sequence_type) self.parser.advance('(') self.sequence_types = [] if self.parser.next_token.symbol in ('$', ')'): self.label = 'inline function' while self.parser.next_token.symbol != ')': self.parser.next_token.expected('$') param = self.parser.expression(5) name = param[0].value if any(name == tk[0].value for tk in self): raise 
self.error('XQST0039') self.append(param) if self.parser.next_token.symbol == 'as': self.parser.advance('as') token = self.parser.expression(90) append_sequence_type(token) else: self.sequence_types.append('item()*') self.parser.next_token.expected(')', ',') if self.parser.next_token.symbol == ',': self.parser.advance() self.parser.next_token.unexpected(')') self.parser.advance(')') elif self.parser.next_token.symbol == '*': self.label = 'function test' self.append(self.parser.advance('*')) self.sequence_types.append('*') self.parser.advance(')') return self else: self.label = 'function test' while True: token = self.parser.expression(5) append_sequence_type(token) self.append(token) if self.parser.next_token.symbol != ',': break self.parser.advance(',') self.parser.advance(')') # Add function return sequence type if self.parser.next_token.symbol != 'as': self.sequence_types.append('item()*') else: self.parser.advance('as') if self.parser.next_token.label not in ('kind test', 'sequence type', 'function test'): self.parser.expected_name('(name)', ':') token = self.parser.expression(rbp=90) append_sequence_type(token) if self.label == 'inline function': if self.parser.next_token.symbol != '{' and not self: self.label = 'function test' else: self.parser.advance('{') self.body = self.parser.expression() self.parser.advance('}') return self @method('function') def evaluate_anonymous_function(self, context=None): if context is None: raise self.missing_context() elif self.label == 'inline function': self.variables = context.variables.copy() # like a closure return self # A function test if not isinstance(context.item, XPathFunction): return None elif self.source == 'function(*)': return context.item elif len(context.item) != len(self): return None # compare sequence types for t1, t2 in zip(context.item.sequence_types[:-1], self.sequence_types[:-1]): # check occurrences if t1[-1] not in '?+*': if t2[-1] in '?+*': return None elif t1[-1] == '+': t1 = t1[:-1] if t2[-1] in 
'?*': return None elif t2[-1] == '+': t2 = t2[:-1] elif t1[-1] == '*': t1 = t1[:-1] if t2[-1] in '?+': return None elif t2[-1] == '*': t2 = t2[:-1] elif t1[-1] == '?': t1 = t1[:-1] if t2[-1] in '+*': return None elif t2[-1] == '?': t2 = t2[:-1] if t1 == t2: continue elif t1 == 'item()': continue elif t2 == 'item()': return None else: return context.item ### # Mathematical functions @method(function('pi', prefix='math', label='math function', nargs=0, sequence_types=('xs:double',))) def evaluate_pi_function(self, context=None): return math.pi @method(function('exp', prefix='math', label='math function', nargs=1, sequence_types=('xs:double?', 'xs:double?'))) def evaluate_exp_function(self, context=None): arg = self.get_argument(context, cls=NumericProxy) if arg is not None: return math.exp(arg) @method(function('exp10', prefix='math', label='math function', nargs=1, sequence_types=('xs:double?', 'xs:double?'))) def evaluate_exp10_function(self, context=None): arg = self.get_argument(context, cls=NumericProxy) if arg is not None: return float(10 ** arg) @method(function('log', prefix='math', label='math function', nargs=1, sequence_types=('xs:double?', 'xs:double?'))) def evaluate_log_function(self, context=None): arg = self.get_argument(context, cls=NumericProxy) if arg is not None: return float('-inf') if not arg else float('nan') if arg < 0 else math.log(arg) @method(function('log10', prefix='math', label='math function', nargs=1, sequence_types=('xs:double?', 'xs:double?'))) def evaluate_log10_function(self, context=None): arg = self.get_argument(context, cls=NumericProxy) if arg is not None: return float('-inf') if not arg else float('nan') if arg < 0 else math.log10(arg) @method(function('pow', prefix='math', label='math function', nargs=2, sequence_types=('xs:double?', 'numeric', 'xs:double?'))) def evaluate_pow_function(self, context=None): x = self.get_argument(context, cls=NumericProxy) y = self.get_argument(context, index=1, required=True,
cls=NumericProxy) if x is not None: if not x and y < 0: return math.copysign(float('inf'), x) if (y % 2) == 1 else float('inf') try: return float(x ** y) except TypeError: return float('nan') @method(function('sqrt', prefix='math', label='math function', nargs=1, sequence_types=('xs:double?', 'xs:double?'))) def evaluate_sqrt_function(self, context=None): arg = self.get_argument(context, cls=NumericProxy) if arg is not None: if arg < 0: return float('nan') return math.sqrt(arg) @method(function('sin', prefix='math', label='math function', nargs=1, sequence_types=('xs:double?', 'xs:double?'))) def evaluate_sin_function(self, context=None): arg = self.get_argument(context, cls=NumericProxy) if arg is not None: if math.isinf(arg): return float('nan') return math.sin(arg) @method(function('cos', prefix='math', label='math function', nargs=1, sequence_types=('xs:double?', 'xs:double?'))) def evaluate_cos_function(self, context=None): arg = self.get_argument(context, cls=NumericProxy) if arg is not None: if math.isinf(arg): return float('nan') return math.cos(arg) @method(function('tan', prefix='math', label='math function', nargs=1, sequence_types=('xs:double?', 'xs:double?'))) def evaluate_tan_function(self, context=None): arg = self.get_argument(context, cls=NumericProxy) if arg is not None: if math.isinf(arg): return float('nan') return math.tan(arg) @method(function('asin', prefix='math', label='math function', nargs=1, sequence_types=('xs:double?', 'xs:double?'))) def evaluate_asin_function(self, context=None): arg = self.get_argument(context, cls=NumericProxy) if arg is not None: if arg < -1 or arg > 1: return float('nan') return math.asin(arg) @method(function('acos', prefix='math', label='math function', nargs=1, sequence_types=('xs:double?', 'xs:double?'))) def evaluate_acos_function(self, context=None): arg = self.get_argument(context, cls=NumericProxy) if arg is not None: if arg < -1 or arg > 1: return float('nan') return math.acos(arg) 
@method(function('atan', prefix='math', label='math function', nargs=1, sequence_types=('xs:double?', 'xs:double?'))) def evaluate_atan_function(self, context=None): arg = self.get_argument(context, cls=NumericProxy) if arg is not None: return math.atan(arg) @method(function('atan2', prefix='math', label='math function', nargs=2, sequence_types=('xs:double', 'xs:double', 'xs:double'))) def evaluate_atan2_function(self, context=None): x = self.get_argument(context, cls=NumericProxy) y = self.get_argument(context, index=1, required=True, cls=NumericProxy) return math.atan2(x, y) ### # Formatting functions @method(function('format-integer', nargs=(2, 3), sequence_types=('xs:integer?', 'xs:string', 'xs:string?', 'xs:string'))) def evaluate_format_integer_function(self, context=None): value = self.get_argument(context, cls=NumericProxy) picture = self.get_argument(context, index=1, required=True, cls=str) lang = self.get_argument(context, index=2, cls=str) if value is None: return '' if ';' not in picture: fmt_token, fmt_modifier = picture, '' else: fmt_token, fmt_modifier = picture.rsplit(';', 1) if MODIFIER_PATTERN.match(fmt_modifier) is None: raise self.error('FODF1310') if not fmt_token: raise self.error('FODF1310') elif fmt_token in FORMAT_INTEGER_TOKENS: if fmt_token == 'a': result = int_to_alphabetic(value, lang) elif fmt_token == 'A': result = int_to_alphabetic(value, lang).upper() elif fmt_token == 'i': result = int_to_roman(value).lower() elif fmt_token == 'I': result = int_to_roman(value) elif fmt_token == 'w': return int_to_words(value, lang, fmt_modifier) elif fmt_token == 'W': return int_to_words(value, lang, fmt_modifier).upper() else: return int_to_words(value, lang, fmt_modifier).title() else: if UNICODE_DIGIT_PATTERN.search(fmt_token) is None: if any(not x.isalpha() and not x.isdigit() for x in fmt_token): result = str(value) # fallback for invalid pictures else: base_char = '1' for base_char in fmt_token: if base_char.isalpha(): break if 
base_char.islower(): result = int_to_alphabetic(value, base_char) else: result = int_to_alphabetic(value, base_char.lower()).upper() elif DECIMAL_DIGIT_PATTERN.search(fmt_token) is None or ',,' in fmt_token: msg = 'picture argument has an invalid primary format token' raise self.error('FODF1310', msg) else: digits = UNICODE_DIGIT_PATTERN.findall(fmt_token) cp = ord(digits[0]) if any((ord(ch) - cp) > 10 for ch in digits[1:]): msg = "picture argument mixes digits from different digit families" raise self.error('FODF1310', msg) elif fmt_token[0].isdigit(): if '#' in fmt_token: msg = 'picture argument has an invalid primary format token' raise self.error('FODF1310', msg) elif fmt_token[0] != '#': raise self.error('FODF1310', "invalid grouping in picture argument") if digits[0].isdigit(): cp = ord(digits[0]) while chr(cp - 1).isdigit(): cp -= 1 digits_family = ''.join(chr(cp + k) for k in range(10)) else: raise ValueError() if value < 0: result = '-' + format_digits(str(abs(value)), fmt_token, digits_family) else: result = format_digits(str(abs(value)), fmt_token, digits_family) if fmt_modifier.startswith('o'): return f'{result}{ordinal_suffix(value)}' return result @method(function('format-number', nargs=(2, 3), sequence_types=('numeric?', 'xs:string', 'xs:string?', 'xs:string'))) def evaluate_format_number_function(self, context=None): value = self.get_argument(context, cls=NumericProxy) picture = self.get_argument(context, index=1, required=True, cls=str) decimal_format_name = self.get_argument(context, index=2, cls=str) # Check and adapt decimal format name if decimal_format_name is not None: decimal_format_name = decimal_format_name.strip() if decimal_format_name.startswith('Q{'): if decimal_format_name.startswith('Q{}'): decimal_format_name = decimal_format_name[3:] else: decimal_format_name = decimal_format_name[1:] elif ':' in decimal_format_name: try: decimal_format_name = get_expanded_name( qname=decimal_format_name, namespaces=self.parser.namespaces ) except 
(KeyError, ValueError): raise self.error('FODF1280') from None try: decimal_format = self.parser.decimal_formats[decimal_format_name] except KeyError: decimal_format = self.parser.decimal_formats[None] pattern_separator = decimal_format['pattern-separator'] sub_pictures = picture.split(pattern_separator) if len(sub_pictures) > 2: raise self.error('FODF1310') decimal_separator = decimal_format['decimal-separator'] if any(p.count(decimal_separator) > 1 for p in sub_pictures): raise self.error('FODF1310') percent_sign = decimal_format['percent'] per_mille_sign = decimal_format['per-mille'] if any(p.count(percent_sign) + p.count(per_mille_sign) > 1 for p in sub_pictures): raise self.error('FODF1310') zero_digit = decimal_format['zero-digit'] optional_digit = decimal_format['digit'] digits_family = ''.join(chr(cp + ord(zero_digit)) for cp in range(10)) if any(optional_digit not in p and all(x not in p for x in digits_family) for p in sub_pictures): raise self.error('FODF1310') grouping_separator = decimal_format['grouping-separator'] adjacent_pattern = re.compile(r'[\\%s\\%s]{2}' % (grouping_separator, decimal_separator)) if any(adjacent_pattern.search(p) for p in sub_pictures): raise self.error('FODF1310') if any(x.endswith(grouping_separator) for s in sub_pictures for x in s.split(decimal_separator)): raise self.error('FODF1310') if self.parser.version == '3.0' and any(EXPONENT_PIC.search(s) for s in sub_pictures): raise self.error('FODF1310') if value is None or math.isnan(value): return decimal_format['NaN'] elif isinstance(value, float): value = decimal.Decimal.from_float(value) elif not isinstance(value, decimal.Decimal): value = decimal.Decimal(value) minus_sign = decimal_format['minus-sign'] prefix = '' if value >= 0: fmt_tokens = sub_pictures[0].split(decimal_separator) else: fmt_tokens = sub_pictures[-1].split(decimal_separator) if len(sub_pictures) == 1: prefix = minus_sign for k, ch in enumerate(fmt_tokens[0]): if ch.isdigit() or ch == optional_digit or ch 
== grouping_separator: prefix += fmt_tokens[0][:k] fmt_tokens[0] = fmt_tokens[0][k:] break else: prefix += fmt_tokens[0] fmt_tokens[0] = '' if not fmt_tokens[-1]: suffix = '' elif fmt_tokens[-1][-1] == percent_sign: suffix = fmt_tokens[-1][-1] fmt_tokens[-1] = fmt_tokens[-1][:-1] if value.as_tuple().exponent < 0: value *= 100 else: value = decimal.Decimal(int(value) * 100) elif fmt_tokens[-1][-1] == per_mille_sign: suffix = fmt_tokens[-1][-1] fmt_tokens[-1] = fmt_tokens[-1][:-1] if value.as_tuple().exponent < 0: value *= 1000 else: value = decimal.Decimal(int(value) * 1000) else: for k, ch in enumerate(reversed(fmt_tokens[-1])): if ch in digits_family or ch == optional_digit: idx = len(fmt_tokens[-1]) - k suffix = fmt_tokens[-1][idx:] fmt_tokens[-1] = fmt_tokens[-1][:idx] break else: suffix = fmt_tokens[-1] fmt_tokens[-1] = '' if math.isinf(value): return prefix + decimal_format['infinity'] + suffix # round the value by fractional part if len(fmt_tokens) == 1 or not fmt_tokens[-1]: exp = decimal.Decimal('1') else: k = -1 for ch in fmt_tokens[-1]: if ch in digits_family or ch == optional_digit: k += 1 exp = decimal.Decimal('.' + '0' * k + '1') try: if value > 0: value = value.quantize(exp, rounding='ROUND_HALF_UP') else: value = value.quantize(exp, rounding='ROUND_HALF_DOWN') except decimal.InvalidOperation: pass # number too large, don't round ... 
chunks = decimal_to_string(value).lstrip('-').split('.') result = format_digits(chunks[0], fmt_tokens[0], digits_family, optional_digit, grouping_separator) if len(fmt_tokens) > 1 and fmt_tokens[-1]: has_optional_digit = False for ch in fmt_tokens[-1]: if ch == optional_digit: has_optional_digit = True elif ch.isdigit() and has_optional_digit: raise self.error('FODF1310') if len(chunks) == 1: chunks.append(zero_digit) decimal_part = format_digits(chunks[1], fmt_tokens[-1], digits_family, optional_digit, grouping_separator) for ch in reversed(fmt_tokens[-1]): if ch == optional_digit: if decimal_part and decimal_part[-1] == zero_digit: decimal_part = decimal_part[:-1] else: if not decimal_part: decimal_part = zero_digit break if decimal_part: result += decimal_separator + decimal_part if not fmt_tokens[0] and result.startswith(zero_digit): result = result.lstrip(zero_digit) return prefix + result + suffix function('format-dateTime', nargs=(2, 5), sequence_types=('xs:dateTime?', 'xs:string', 'xs:string?', 'xs:string?', 'xs:string?', 'xs:string?')) function('format-date', nargs=(2, 5), sequence_types=('xs:date?', 'xs:string', 'xs:string?', 'xs:string?', 'xs:string?', 'xs:string?')) function('format-time', nargs=(2, 5), sequence_types=('xs:time?', 'xs:string', 'xs:string?', 'xs:string?', 'xs:string?', 'xs:string?')) @method('format-dateTime') @method('format-date') @method('format-time') def evaluate_format_date_time_functions(self, context=None): if self.symbol == 'format-dateTime': cls = DateTime10 invalid_markers = '' elif self.symbol == 'format-date': cls = Date10 invalid_markers = 'HhPmsf' else: cls = Time invalid_markers = 'YMDdFWwCE' value = self.get_argument(context, cls=cls) picture = self.get_argument(context, index=1, required=True, cls=str) if len(self) not in [2, 5]: raise self.error('XPST0017') language = self.get_argument(context, index=2, cls=str) calendar = self.get_argument(context, index=3, cls=str) place = self.get_argument(context, index=4, cls=str) 
if value is None: return '' try: literals, markers = parse_datetime_picture(picture) except ElementPathError as err: err.token = self raise if invalid_markers: for mrk in markers: if mrk[1] in invalid_markers: msg = 'Invalid date formatting component {!r}'.format(mrk) raise self.error('FOFD1350', msg) result = [] if language not in ('en', 'it', None): language = 'en' result.append('[Language: en') if calendar is not None: if calendar.startswith('Q{}'): calendar = calendar[3:] if calendar not in ('AD', 'ISO', 'OS'): if context is None or calendar != context.default_calendar: if QName.is_valid(calendar): if ':' not in calendar: msg = f'unknown calendar in no namespace {calendar!r}' raise self.error('FOFD1340', msg) try: _ = get_expanded_name(calendar, self.parser.namespaces) except (KeyError, ValueError) as err: raise self.error('FOFD1340', str(err)) from None elif EQNAME_PATTERN.search(calendar) is None: raise self.error('FOFD1340', f'Invalid calendar argument {calendar!r}') else: result.append('[' if not result else ', ') result.append('Calendar: AD') if place is not None and zoneinfo is not None: try: zone = zoneinfo.ZoneInfo(place.strip()) except zoneinfo.ZoneInfoNotFoundError: raise self.error('FOFD1340', f'Invalid place argument {place!r}') else: value = value.astimezone(zone) if result: result.append(']') for k in range(len(markers)): result.append(literals[k]) try: result.append(parse_datetime_marker(markers[k], value, language)) except ElementPathError as err: err.token = self raise result.append(literals[-1]) return ''.join(result) ### # String functions that use regular expressions @method(function('analyze-string', nargs=(2, 3), sequence_types=('xs:string?', 'xs:string', 'xs:string', 'element(fn:analyze-string-result)'))) def evaluate_analyze_string_function(self, context=None): input_string = self.get_argument(context, default='', cls=str) pattern = self.get_argument(context, 1, required=True, cls=str) flags = 0 if len(self) > 2: for c in 
self.get_argument(context, 2, required=True, cls=str): if c in 'smix': flags |= getattr(re, c.upper()) elif c == 'q' and self.parser.version > '2': pattern = re.escape(pattern) else: raise self.error('FORX0001', "Invalid regular expression flag %r" % c) try: python_pattern = translate_pattern(pattern, flags, self.parser.xsd_version) compiled_pattern = re.compile(python_pattern, flags=flags) except (re.error, RegexError) as err: msg = "Invalid regular expression: {}" raise self.error('FORX0002', msg.format(str(err))) from None except OverflowError as err: raise self.error('FORX0002', err) from None if compiled_pattern.match('') is not None: raise self.error('FORX0003', "pattern matches a zero-length string") if context is None: raise self.missing_context() level = 0 escaped = False char_class = False group_levels = [0] for s in compiled_pattern.pattern: if escaped: escaped = False elif s == '\\': escaped = True elif char_class: if s == ']': char_class = False elif s == '[': char_class = True elif s == '(': group_levels.append(level) level += 1 elif s == ')': level -= 1 lines = ['<analyze-string-result xmlns="{}">'.format(XPATH_FUNCTIONS_NAMESPACE)] k = 0 while k < len(input_string): match = compiled_pattern.search(input_string, k) if match is None: lines.append('<non-match>{}</non-match>'.format(input_string[k:])) break elif not match.groups(): start, stop = match.span() if start > k: lines.append('<non-match>{}</non-match>'.format(input_string[k:start])) lines.append('<match>{}</match>'.format(input_string[start:stop])) k = stop else: start, stop = match.span() if start > k: lines.append('<non-match>{}</non-match>'.format(input_string[k:start])) k = start match_items = [] group_tmpl = '<group nr="{}">{}' empty_group_tmpl = '<group nr="{}"/>' unclosed_groups = 0 for idx in range(1, compiled_pattern.groups + 1): _start, _stop = match.span(idx) if _start < 0: continue elif _start > k: if unclosed_groups: for _ in range(unclosed_groups): match_items.append('</group>') unclosed_groups = 0 match_items.append(input_string[k:_start]) if _start == _stop: if group_levels[idx] <= group_levels[idx - 1]: for _ in
range(unclosed_groups): match_items.append('</group>') unclosed_groups = 0 match_items.append(empty_group_tmpl.format(idx)) k = _stop elif idx == compiled_pattern.groups: k = _stop match_items.append(group_tmpl.format(idx, input_string[_start:k])) match_items.append('</group>') else: next_start = match.span(idx + 1)[0] if next_start < 0 or _stop < next_start or _stop == next_start \ and group_levels[idx + 1] <= group_levels[idx]: k = _stop match_items.append(group_tmpl.format(idx, input_string[_start:k])) match_items.append('</group>') else: k = next_start match_items.append(group_tmpl.format(idx, input_string[_start:k])) unclosed_groups += 1 for _ in range(unclosed_groups): match_items.append('</group>') match_items.append(input_string[k:stop]) k = stop lines.append('<match>{}</match>'.format(''.join(match_items))) lines.append('</analyze-string-result>') if self.parser.defuse_xml: root = context.etree.XML(defuse_xml(''.join(lines))) else: root = context.etree.XML(''.join(lines)) return get_node_tree(root=root, namespaces=self.parser.namespaces) ### # Functions and operators on nodes @method(function('path', nargs=(0, 1), sequence_types=('node()?', 'xs:string?'))) def evaluate_path_function(self, context=None): if context is None: raise self.missing_context() elif isinstance(context, XPathSchemaContext): return None elif not self: if context.item is None: return '/' item = context.item else: item = self.get_argument(context) if item is None: return None suffix = '' if isinstance(item, DocumentNode): return '/' elif isinstance(item, (ElementNode, CommentNode, ProcessingInstructionNode)): elem = item.elem elif isinstance(item, TextNode): elem = item.parent.elem suffix = '/text()[1]' elif isinstance(item, AttributeNode): elem = item.parent.elem if item.name.startswith('{'): suffix = f'/@Q{item.name}' else: suffix = f'/@{item.name}' elif isinstance(item, NamespaceNode): elem = item.parent.elem if item.prefix: suffix = f'/namespace::{item.prefix}' else: suffix = f'/namespace::*[Q{{{XPATH_FUNCTIONS_NAMESPACE}}}local-name()=""]' else:
return None if isinstance(context.root, DocumentNode): root = context.root.getroot().elem path = f'/Q{root.tag}[1]' else: root = context.root.elem path = 'Q{%s}root()' % XPATH_FUNCTIONS_NAMESPACE if isinstance(item, ProcessingInstructionNode): if item.parent is None or isinstance(item.parent, DocumentNode): return f'/processing-instruction({item.name})[{context.position}]' elif isinstance(item, CommentNode): if item.parent is None or isinstance(item.parent, DocumentNode): return f'/comment()[{context.position}]' for e, path in etree_iter_paths(root, path): if e is elem: return path + suffix else: return None @method(function('has-children', nargs=(0, 1), sequence_types=('node()?', 'xs:boolean'))) def evaluate_has_children_function(self, context=None): if context is None: raise self.missing_context() elif not self: if context.item is None: return isinstance(context.root, DocumentNode) item = context.item if not isinstance(item, XPathNode): raise self.error('XPTY0004', 'context item must be a node') else: item = self.get_argument(context) if item is None: return False elif not isinstance(item, XPathNode): raise self.error('XPTY0004', 'argument must be a node') return isinstance(item, DocumentNode) or \ isinstance(item, ElementNode) and (len(item.elem) > 0 or item.elem.text is not None) @method(function('innermost', nargs=1, sequence_types=('node()*', 'node()*'))) def select_innermost_function(self, context=None): if context is None: raise self.missing_context() context = copy(context) nodes = [e for e in self[0].select(context)] if any(not isinstance(x, XPathNode) for x in nodes): raise self.error('XPTY0004', 'argument must contain only nodes') ancestors = {x for context.item in nodes for x in context.iter_ancestors(axis='ancestor')} results = {x for x in nodes if x not in ancestors} yield from sorted(results, key=node_position) @method(function('outermost', nargs=1, sequence_types=('node()*', 'node()*'))) def select_outermost_function(self, context=None): if context 
is None: raise self.missing_context() context = copy(context) nodes = {e for e in self[0].select(context)} if any(not isinstance(x, XPathNode) for x in nodes): raise self.error('XPTY0004', 'argument must contain only nodes') results = set() for item in nodes: context.item = item ancestors = {x for x in context.iter_ancestors(axis='ancestor')} if any(x in nodes for x in ancestors): continue results.add(item) yield from sorted(results, key=node_position) ## # Functions and operators on sequences @method(function('head', nargs=1, sequence_types=('item()*', 'item()?'))) def evaluate_head_function(self, context=None): for item in self[0].select(context): return item @method(function('tail', nargs=1, sequence_types=('item()*', 'item()?'))) def select_tail_function(self, context=None): for k, item in enumerate(self[0].select(context)): if k: yield item @method(function('generate-id', nargs=(0, 1), sequence_types=('node()?', 'xs:string'))) def evaluate_generate_id_function(self, context=None): arg = self.get_argument(context, default_to_context=True) if arg is None: return '' elif not isinstance(arg, XPathNode): if self: raise self.error('XPTY0004', "argument is not a node") raise self.error('XPTY0004', "context item is not a node") else: return f'ID{id(arg)}' @method(function('uri-collection', nargs=(0, 1), sequence_types=('xs:string?', 'xs:anyURI*'))) def evaluate_uri_collection_function(self, context=None): uri = self.get_argument(context) if context is None: raise self.missing_context() elif isinstance(context, XPathSchemaContext): return elif not self or uri is None: if context.default_resource_collection is None: raise self.error('FODC0002', 'no default resource collection has been defined') resource_collection = AnyURI(context.default_resource_collection) else: try: AnyURI(uri) except ValueError: raise self.error('FODC0004', 'invalid argument to fn:uri-collection') from None uri = self.get_absolute_uri(uri) try: resource_collection = 
context.resource_collections[uri] except (KeyError, TypeError): url_parts = urlsplit(uri) if url_parts.scheme in ('', 'file') and \ not url_parts.path.startswith(':') and url_parts.path.endswith('/'): raise self.error('FODC0003', 'collection URI is a directory') raise self.error('FODC0002', '{!r} collection not found'.format(uri)) from None if not self.parser.match_sequence_type(resource_collection, 'xs:anyURI*'): raise self.wrong_sequence_type("Type does not match sequence type xs:anyURI*") return resource_collection @method(function('unparsed-text', nargs=(1, 2), sequence_types=('xs:string?', 'xs:string', 'xs:string?'))) @method(function('unparsed-text-lines', nargs=(1, 2), sequence_types=('xs:string?', 'xs:string', 'xs:string*'))) def evaluate_unparsed_text_functions(self, context=None): from urllib.request import urlopen # optional because it consumes ~4.3 MiB from urllib.error import URLError href = self.get_argument(context, cls=str) if href is None: return elif urlsplit(href).fragment: raise self.error('FOUT1170') if len(self) > 1: encoding = self.get_argument(context, index=1, required=True, cls=str) else: encoding = 'UTF-8' try: uri = self.get_absolute_uri(href) except ValueError: raise self.error('FOUT1170') from None try: codecs.lookup(encoding) except LookupError: raise self.error('FOUT1190') from None if context is not None and uri in context.text_resources: obj = context.text_resources[uri] else: try: with urlopen(uri) as rp: obj = rp.read() except (ValueError, URLError) as err: message = str(err) if 'No such file' in message or \ 'unknown url type' in message or \ 'HTTP Error 404' in message or \ 'failure in name resolution' in message: raise self.error('FOUT1170', message) from None raise self.error('FOUT1190') from None else: if context is not None: context.text_resources[uri] = obj try: text = codecs.decode(obj, encoding) except UnicodeDecodeError: if len(self) > 1: raise self.error('FOUT1190') from None try: text = codecs.decode(obj, 'UTF-16') 
except UnicodeDecodeError: raise self.error('FOUT1190') from None if not all(is_xml_codepoint(ord(s)) for s in text): raise self.error('FOUT1190') text = text.lstrip('\ufeff') if self.symbol == 'unparsed-text-lines': lines = XML_NEWLINES_PATTERN.split(text) return lines[:-1] if lines[-1] == '' else lines return text @method(function('unparsed-text-available', nargs=(1, 2), sequence_types=('xs:string?', 'xs:string', 'xs:boolean'))) def evaluate_unparsed_text_available_function(self, context=None): from urllib.request import urlopen # optional because it consumes ~4.3 MiB from urllib.error import URLError href = self.get_argument(context, cls=str) if href is None: return False elif urlsplit(href).fragment: return False if len(self) > 1: encoding = self.get_argument(context, index=1, required=True, cls=str) else: encoding = 'UTF-8' try: uri = self.get_absolute_uri(href) codecs.lookup(encoding) with urlopen(uri) as rp: obj = rp.read() except (ValueError, URLError, LookupError): return False try: return all(is_xml_codepoint(ord(s)) for s in codecs.decode(obj, encoding)) except UnicodeDecodeError: if len(self) > 1: return False try: return all(is_xml_codepoint(ord(s)) for s in codecs.decode(obj, 'UTF-16')) except UnicodeDecodeError: return False @method(function('environment-variable', nargs=1, sequence_types=('xs:string', 'xs:string?'))) def evaluate_environment_variable_function(self, context=None): name = self.get_argument(context, required=True, cls=str) if context is None: raise self.missing_context() elif not context.allow_environment: return else: return os.environ.get(name) @method(function('available-environment-variables', nargs=0, sequence_types=('xs:string*',))) def evaluate_available_environment_variables_function(self, context=None): if context is None: raise self.missing_context() elif not context.allow_environment: return else: return list(os.environ) ### # Parsing and serializing @method(function('parse-xml', nargs=1, sequence_types=('xs:string?', 
'document-node(element(*))?'))) def evaluate_parse_xml_function(self, context=None): # TODO: resolve relative entity references with static base URI arg = self.get_argument(context, cls=str) if arg is None: return [] elif context is None: raise self.missing_context() etree = context.etree try: if self.parser.defuse_xml: root = etree.XML(defuse_xml(arg.encode('utf-8'))) else: root = etree.XML(arg.encode('utf-8')) except etree.ParseError: raise self.error('FODC0006') else: return get_node_tree(etree.ElementTree(root), self.parser.namespaces) @method(function('parse-xml-fragment', nargs=1, sequence_types=('xs:string?', 'document-node()?'))) def evaluate_parse_xml_fragment_function(self, context=None): arg = self.get_argument(context, cls=str) if arg is None: return [] elif context is None: raise self.missing_context() # Wrap argument in a fake document because an # XML document can have only one root element if arg.startswith('') xml_params = DECL_PARAM_PATTERN.findall(xml_declaration) if 'encoding' not in xml_params: raise self.error('FODC0006', "'encoding' argument is mandatory") for param in xml_params: if param not in ('version', 'encoding'): msg = f'unexpected parameter {param!r} in XML declaration' raise self.error('FODC0006', msg) if arg.lstrip().startswith('{arg}'), namespaces=self.parser.namespaces ) except etree.ParseError: raise self.error('FODC0006', str(err)) from None else: return get_node_tree( root=etree.ElementTree(root), namespaces=self.parser.namespaces ) @method(function('serialize', nargs=(1, 2), sequence_types=( 'item()*', 'element(output:serialization-parameters)?', 'xs:string'))) def evaluate_serialize_function(self, context=None): # TODO full implementation of serialization with # https://www.w3.org/TR/xpath-functions-30/#xslt-xquery-serialization-30 params = self.get_argument(context, index=1) if len(self) == 2 else None if params is None: tmpl = '' params = ElementTree.XML(tmpl.format(XSLT_XQUERY_SERIALIZATION_NAMESPACE)) elif 
isinstance(params, ElementNode): params = params.value if params.tag != SERIALIZATION_PARAMS: raise self.error('XPTY0004', 'output:serialization-parameters tag expected') if context is None or isinstance(context, XPathSchemaContext): etree = ElementTree else: etree = context.etree if context.namespaces: for pfx, uri in context.namespaces.items(): etree.register_namespace(pfx, uri) else: for pfx, uri in self.parser.namespaces.items(): etree.register_namespace(pfx, uri) item_separator = ' ' kwargs = {} character_map = {} if len(params): if len(params) > len({e.tag for e in params}): raise self.error('SEPM0019') for child in params: if child.tag == SER_PARAM_OMIT_XML_DECLARATION: value = child.get('value') if value not in ('yes', 'no') or len(child.attrib) > 1: raise self.error('SEPM0017') elif value == 'no': kwargs['xml_declaration'] = True elif child.tag == SER_PARAM_USE_CHARACTER_MAPS: if len(child.attrib): raise self.error('SEPM0017') for e in child: if e.tag != SER_PARAM_CHARACTER_MAP: raise self.error('SEPM0017') try: character = e.attrib['character'] if character in character_map: msg = 'duplicate character {!r} in character map' raise self.error('SEPM0018', msg.format(character)) elif len(character) != 1: msg = 'invalid character {!r} in character map' raise self.error('SEPM0017', msg.format(character)) character_map[character] = e.attrib['map-string'] except KeyError as key: msg = "missing {} in character map" raise self.error('SEPM0017', msg.format(key)) from None else: if len(e.attrib) > 2: msg = "invalid attribute in character map" raise self.error('SEPM0017', msg) elif child.tag == SER_PARAM_METHOD: value = child.get('value') if value not in ('html', 'xml', 'xhtml', 'text') or len(child.attrib) > 1: raise self.error('SEPM0017') kwargs['method'] = value if value != 'xhtml' else 'html' elif child.tag == SER_PARAM_INDENT: value = child.attrib.get('value', '').strip() if value not in ('yes', 'no') or len(child.attrib) > 1: raise self.error('SEPM0017') elif 
child.tag == SER_PARAM_ITEM_SEPARATOR: try: item_separator = child.attrib['value'] except KeyError: raise self.error('SEPM0017') from None elif child.tag == SER_PARAM_CDATA: pass # TODO param elif child.tag == SER_PARAM_NO_INDENT: pass # TODO param elif child.tag == SER_PARAM_STANDALONE: value = child.attrib.get('value', '').strip() if value not in ('yes', 'no', 'omit') or len(child.attrib) > 1: raise self.error('SEPM0017') if value != 'omit': kwargs['standalone'] = value == 'yes' elif child.tag.startswith(f'{{{XSLT_XQUERY_SERIALIZATION_NAMESPACE}'): raise self.error('SEPM0017') elif not child.tag.startswith('{'): # no-namespace not allowed raise self.error('SEPM0017') chunks = [] for item in self[0].select(context): if isinstance(item, DocumentNode): item = item.document.getroot() elif isinstance(item, ElementNode): item = item.elem elif isinstance(item, (AttributeNode, NamespaceNode)): raise self.error('SENR0001') elif isinstance(item, TextNode): chunks.append(item.value) continue elif isinstance(item, bool): chunks.append('true' if item else 'false') continue else: chunks.append(str(item)) continue if isinstance(context, XPathSchemaContext): continue try: ck = etree.tostringlist(item, encoding='utf-8', **kwargs) except TypeError: chunks.append(etree.tostring(item, encoding='utf-8').decode('utf-8')) else: if ck and ck[0].startswith(b' 0: return type(arg)(number.quantize(exponent, rounding='ROUND_HALF_UP')) else: return type(arg)(number.quantize(exponent, rounding='ROUND_HALF_DOWN')) except TypeError as err: raise self.error('FORG0006', err) from None except decimal.InvalidOperation: if isinstance(arg, str): raise self.error('XPTY0004') from None return round(arg) except decimal.DecimalException as err: raise self.error('FOCA0002', err) from None # # XSD list-based constructors @XPath30Parser.constructor('NMTOKENS', sequence_types=('xs:NMTOKEN*',)) def cast_nmtokens_list_type(self, value): cast_func = xsd10_atomic_types['NMTOKEN'] if isinstance(value, 
UntypedAtomic): values = value.value.split() or [value.value] else: values = value.split() or [value] try: return [cast_func(x) for x in values] except ValueError as err: raise self.error('FORG0001', err) from None @XPath30Parser.constructor('IDREFS', sequence_types=('xs:IDREF*',)) def cast_idrefs_list_type(self, value): cast_func = xsd10_atomic_types['IDREF'] if isinstance(value, UntypedAtomic): values = value.value.split() or [value.value] else: values = value.split() or [value] try: return [cast_func(x) for x in values] except ValueError as err: raise self.error('FORG0001', err) from None @XPath30Parser.constructor('ENTITIES', sequence_types=('xs:ENTITY*',)) def cast_entities_list_type(self, value): cast_func = xsd10_atomic_types['ENTITY'] if isinstance(value, UntypedAtomic): values = value.value.split() or [value.value] else: values = value.split() or [value] try: return [cast_func(x) for x in values] except ValueError as err: raise self.error('FORG0001', err) from None ### # In XPath 3.0+ the 'error' keyword has to be used both for fn:error() and xs:error() XPath30Parser.unregister('error') # TODO: apply sequence_types=('xs:anyAtomicType?', 'xs:error?') for xs:error @XPath30Parser.constructor('error', bp=90, label=('function', 'constructor function'), nargs=(0, 3), sequence_types=('xs:QName?', 'xs:string', 'item()*', 'none')) def cast_error_type(self, value): if value is None or value == []: return [] msg = f"Cast {value!r} to xs:error is not possible" raise self.error('FORG0001', msg) @method('error') def nud_error_type_and_function(self): self.clear() try: self.parser.advance('(') if self.namespace == XSD_NAMESPACE: self.label = 'constructor function' self.nargs = 1 if self.parser.xsd_version == '1.0': raise self.error('XPST0051', 'xs:error is not defined with XSD 1.0') self.append(self.parser.expression(5)) else: self.label = 'function' for k in range(3): if self.parser.next_token.symbol == ')': break self.append(self.parser.expression(5)) if 
self.parser.next_token.symbol == ')': break self.parser.advance(',') self.parser.advance(')') except SyntaxError: raise self.error('XPST0017') from None self.value = None return self @method('error') def evaluate_error_type_and_function(self, context=None): if self.label == 'constructor function': return self.cast(self.get_argument(context)) elif not self: raise self.error('FOER0000') elif len(self) == 1: error = self.get_argument(context, cls=QName) if error is None and self.parser.version == '3.0': raise self.error('XPTY0004', "an xs:QName expected") raise self.error(error or 'FOER0000') else: error = self.get_argument(context, cls=QName) description = self.get_argument(context, index=1, cls=str) raise self.error(error or 'FOER0000', description) elementpath-3.0.2/elementpath/xpath30/_xpath30_operators.py000066400000000000000000000167051427546011100237270ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # # type: ignore """ XPath 3.0 implementation - part 2 (symbols, operators and expressions) """ from copy import copy from ..namespaces import XPATH_FUNCTIONS_NAMESPACE, XSD_NAMESPACE from ..xpath_nodes import AttributeNode, ElementNode from ..xpath_token import XPathToken, ValueToken, XPathFunction from ..xpath_context import XPathSchemaContext from ..datatypes import QName from .xpath30_parser import XPath30Parser register = XPath30Parser.register infix = XPath30Parser.infix method = XPath30Parser.method register(':=') ### # Placeholder symbol (used also for optional occurrence) XPath30Parser.unregister('?') register('?', bases=(ValueToken,)) @method('?') def nud_placeholder_symbol(self): return self @method('?') def evaluate_placeholder_symbol(self, context=None): return self ### # Braced/expanded QName(s) XPath30Parser.duplicate('{', 'Q{', pattern=r'Q\{') XPath30Parser.unregister('{') XPath30Parser.unregister('}') register('{') register('}', bp=100) XPath30Parser.unregister('(') @method(register('(', lbp=80, rpb=80, label='expression')) def nud_parenthesized_expression(self): if self.parser.next_token.symbol != ')': self[:] = self.parser.expression(), self.parser.advance(')') return self @method('(') def led_parenthesized_expression(self, left): if left.symbol in ('(name)', 'Q{'): if left.value in self.parser.RESERVED_FUNCTION_NAMES: msg = f"{left.value!r} is not allowed as function name" raise left.error('XPST0003', msg) else: raise left.error('XPST0017', 'unknown function {!r}'.format(left.value)) elif left.symbol == ':' and left[1].symbol == '(name)': if left[1].namespace == XSD_NAMESPACE: msg = 'unknown constructor function {!r}'.format(left[1].value) raise left[1].error('XPST0017', msg) raise left.error('XPST0017', 'unknown function {!r}'.format(left.value)) if self.parser.next_token.symbol != ')': self[:] = left, self.parser.expression() else: self[:] = left, self.parser.advance(')') return self @method('(') def 
evaluate_parenthesized_expression(self, context=None): if not self: return [] value = self[0].evaluate(context) if isinstance(value, list) and len(value) == 1: value = value[0] if len(self) > 1: if isinstance(value, XPathFunction): # Build argument list considering commas as separators of different arguments arguments = [] tk = self[1] while True: if tk.symbol == ',': arguments.append(tk[1].evaluate(context)) tk = tk[0] else: arguments.append(tk.evaluate(context)) break arguments.reverse() return value(context, *arguments) elif self[0].symbol == '(': if not isinstance(value, list): return value elif any(not isinstance(x, XPathFunction) for x in value): return value if isinstance(value, XPathToken) and value.symbol == '?': return value raise self.error('XPTY0004', f'an XPath function expected, not {type(value)!r}') if not isinstance(value, XPathFunction) or self[0].span[0] > self.span[0]: return value else: return value(context) @method(infix('||', bp=32)) def evaluate_union_operator(self, context=None): return self.string_value(self.get_argument(context)) + \ self.string_value(self.get_argument(context, index=1)) @method(infix('!', bp=72)) def select_simple_map_operator(self, context=None): if context is None: raise self.missing_context() for context.item in context.inner_focus_select(self[0]): for result in self[1].select(copy(context)): yield result if isinstance(context, XPathSchemaContext) and \ isinstance(result, (AttributeNode, ElementNode)): self[1].add_xsd_type(result) ### # 'let' expressions @method(register('let', lbp=20, rbp=20, label='let expression')) def nud_let_expression(self): del self[:] if self.parser.next_token.symbol != '$': token = self.parser.symbol_table['(name)'](self.parser, self.symbol) return token.nud() while True: self.parser.next_token.expected('$') variable = self.parser.expression(5) self.append(variable) self.parser.advance(':=') expr = self.parser.expression(5) self.append(expr) if self.parser.next_token.symbol != ',': break 
self.parser.advance() self.parser.advance('return') self.append(self.parser.expression(5)) return self @method('let') def select_let_expression(self, context=None): if context is None: raise self.missing_context() context = copy(context) for k in range(0, len(self) - 1, 2): varname = self[k][0].value value = self[k+1].evaluate(context) context.variables[varname] = value yield from self[-1].select(context) @method('#', bp=90) def led_function_reference(self, left): left.expected(':', '(name)', 'Q{') self[:] = left, self.parser.expression(rbp=90) self[1].expected('(integer)') return self @method('#') def evaluate_function_reference(self, context=None): if self[0].symbol == ':': qname = QName(self[0][1].namespace, self[0].value) elif self[0].symbol == 'Q{': qname = QName(self[0][0].value, self[0][1].value) elif self[0].value in self.parser.RESERVED_FUNCTION_NAMES: msg = f"{self[0].value!r} is not allowed as function name" raise self.error('XPST0003', msg) else: qname = QName(XPATH_FUNCTIONS_NAMESPACE, self[0].value) arity = self[1].value namespace = qname.namespace local_name = qname.local_name # Generic rule for XSD constructor functions if namespace == XSD_NAMESPACE and arity != 1: raise self.error('XPST0017', f"unknown function {qname.qname}#{arity}") # Special checks for multirole tokens if namespace == XPATH_FUNCTIONS_NAMESPACE and \ local_name in ('QName', 'dateTime') and arity == 1: raise self.error('XPST0017', f"unknown function {qname.qname}#{arity}") try: if namespace in (XPATH_FUNCTIONS_NAMESPACE, XSD_NAMESPACE): token_class = self.parser.symbol_table[local_name] else: token_class = self.parser.symbol_table[qname.expanded_name] except KeyError: msg = f"unknown function {qname.qname}#{arity}" raise self.error('XPST0017', msg) from None else: if token_class.symbol == 'function' or not token_class.label.endswith('function'): raise self.error('XPST0003') try: func = token_class(self.parser, nargs=arity) except TypeError: msg = f"unknown function 
{qname.qname}#{arity}" raise self.error('XPST0017', msg) from None else: if func.namespace is None: func.namespace = namespace elif func.namespace != namespace: raise self.error('XPST0017', f"unknown function {qname.qname}#{arity}") return func elementpath-3.0.2/elementpath/xpath30/xpath30_helpers.py000066400000000000000000000555751427546011100232240ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import calendar import datetime import decimal import re from typing import Iterator, List, Optional, Tuple, Union from unicodedata import category from ..exceptions import xpath_error from ..regex import translate_pattern from ._translation_maps import ALPHABET_CHARACTERS, OTHER_NUMBERS, ROMAN_NUMERALS_MAP, \ NUM_TO_MONTH_MAPS, NUM_TO_WEEKDAY_MAPS, NUM_TO_WORD_MAPS, MILITARY_TIME_ZONES PRESENTATION_FORMATS = {'i', 'I', 'w', 'W', 'Ww', 'a', 'A', 'n', 'N', 'Nn', 'Z'} PICTURE_PATTERN = re.compile(r'\[(?!\[)[^]]+]') UNICODE_DIGIT_PATTERN = re.compile(r'\d') DECIMAL_DIGIT_PATTERN = re.compile(translate_pattern(r'^((\p{Nd}|#|[^\p{N}\p{L}])+?)$')) FMT_MODIFIER_PATTERN = re.compile(r'([co](\(.+\))?)?[at]?$') WIDTH_PATTERN = re.compile(r'^([0-9]+|\*)(-([0-9]+|\*))?$') MODIFIER_PATTERN = re.compile(r'^([co](\(.+\))?)?[at]?$') def decimal_to_string(value: decimal.Decimal) -> str: """ Convert a Decimal value to a string representation that does not include an exponent and keeps all its decimal digits. """ sign, digits, exponent = value.as_tuple() if not exponent: result = ''.join(str(x) for x in digits) elif exponent > 0: result = ''.join(str(x) for x in digits) + '0' * exponent else: result = ''.join(str(x) for x in digits[:exponent]) if not result: result = '0' result += '.'
if len(digits) >= -exponent: result += ''.join(str(x) for x in digits[exponent:]) else: result += '0' * (-exponent - len(digits)) result += ''.join(str(x) for x in digits) return '-' + result if sign else result def int_to_roman(num: int) -> str: """ Convert an integer to a Roman numeral. """ def roman_num(value: int) -> Iterator[str]: if not value: yield '0' return elif value < 0: yield '-' value = abs(value) for base, roman in ROMAN_NUMERALS_MAP.items(): if value: yield roman * (value // base) value %= base return ''.join(x for x in roman_num(num)) def int_to_alphabetic(num: int, reference: Optional[str] = None) -> str: if not reference or len(reference) > 1: try: alphabet = ALPHABET_CHARACTERS[reference] except KeyError: msg = "formatting for language {!r} is not supported" raise NotImplementedError(msg.format(reference)) elif reference.isdigit(): for alphabet in OTHER_NUMBERS: if reference in alphabet: break else: alphabet = '1234567890' else: for alphabet in ALPHABET_CHARACTERS.values(): if reference.lower() in alphabet: break else: alphabet = '1234567890' base = len(alphabet) if not num: return '0' chars = [] negative = num < 0 num = abs(num) - 1 while num >= 0: chars.append(alphabet[num % base]) num = (num // base) - 1 if negative: chars.append('-') return ''.join(reversed(chars)) def int_to_month(num: int, lang: Optional[str] = None) -> str: if lang is None: lang = 'en' try: months_map = NUM_TO_MONTH_MAPS[lang] except KeyError: months_map = NUM_TO_MONTH_MAPS['en'] return months_map[num] def int_to_weekday(num: int, lang: Optional[str] = None) -> str: if lang is None: lang = 'en' try: weekday_map = NUM_TO_WEEKDAY_MAPS[lang] except KeyError: weekday_map = NUM_TO_WEEKDAY_MAPS['en'] return weekday_map[num] def week_in_month(dt: datetime.datetime) -> int: month_cal = calendar.monthcalendar(dt.year, dt.month) for k, week_cal in enumerate(month_cal, start=1): if dt.day in week_cal: if month_cal[0][3]: return k elif k > 1: return k - 1 if dt.month > 1: prev_month_cal
= calendar.monthcalendar(dt.year, dt.month - 1) else: prev_month_cal = calendar.monthcalendar(dt.year - 1, 12) if prev_month_cal[0][3]: return len(prev_month_cal) else: return len(prev_month_cal) - 1 else: raise ValueError(f'{dt.day} does not match related calendar') def format_digits(digits: str, fmt: str, digits_family: str = '0123456789', optional_digit: str = '#', grouping_separator: Optional[str] = None) -> str: result = [] iter_num_digits = reversed(digits) num_digit = next(iter_num_digits) for fmt_char in reversed(fmt): if fmt_char.isdigit() or fmt_char == optional_digit: if num_digit: result.append(digits_family[ord(num_digit) - 48]) num_digit = next(iter_num_digits, '') elif fmt_char != optional_digit: result.append(digits_family[0]) elif not result or not result[-1].isdigit() and grouping_separator \ and result[-1] != grouping_separator: raise xpath_error('FODF1310', "invalid grouping in picture argument") else: result.append(fmt_char) if num_digit: separator = '' _separator = {x for x in fmt if not x.isdigit() and x != optional_digit} if len(_separator) != 1: repeat = None else: separator = _separator.pop() chunks = fmt.split(separator) repeat = len(chunks[-1]) if all(len(item) == repeat for item in chunks[1:-1]): repeat += 1 else: repeat = None if repeat is None: while num_digit: result.append(digits_family[ord(num_digit) - 48]) num_digit = next(iter_num_digits, '') else: while num_digit: if ((len(result) + 1) % repeat) == 0: result.append(separator) result.append(digits_family[ord(num_digit) - 48]) num_digit = next(iter_num_digits, '') if grouping_separator: return ''.join(reversed(result)).lstrip(grouping_separator) while result and \ category(result[-1]) not in ('Nd', 'Nl', 'No', 'Lu', 'Ll', 'Lt', 'Lm', 'Lo'): result.pop() return ''.join(reversed(result)) def ordinal_suffix(value: int) -> str: value = abs(value) % 100 if 3 < value < 20: return 'th' value %= 10 if value == 1: return 'st' elif value == 2: return 'nd' elif value == 3: return 'rd' else: 
return 'th' def to_ordinal_en(num_as_words: str) -> str: if num_as_words.endswith('one'): return num_as_words[:-3] + 'first' elif num_as_words.endswith('two'): return num_as_words[:-3] + 'second' elif num_as_words.endswith('three'): return num_as_words[:-5] + 'third' elif num_as_words.endswith('eight'): return num_as_words + 'h' elif num_as_words.endswith('nine'): return num_as_words[:-1] + 'th' elif num_as_words.endswith('y'): return num_as_words[:-1] + 'ieth' elif num_as_words.endswith('e'): return num_as_words[:-2] + 'fth' else: return num_as_words + 'th' def to_ordinal_it(num_as_words: str, fmt_modifier: str) -> str: if '%spellout-ordinal-feminine' in fmt_modifier: suffix = 'a' elif fmt_modifier.startswith('o(-'): suffix = fmt_modifier[3:-1] else: suffix = '' ordinal_map = { 'zero': '', 'uno': 'primo', 'due': 'secondo', 'tre': 'terzo', 'quattro': 'quarto', 'cinque': 'quinto', 'sei': 'sesto', 'sette': 'settimo', 'otto': 'ottavo', 'nove': 'nono', 'dieci': 'decimo', } try: value = ordinal_map[num_as_words] except KeyError: if num_as_words[-1] in 'eo': value = num_as_words[:-1] + 'esimo' else: value = num_as_words + 'esimo' if value and suffix: return value[:-1] + suffix return value def int_to_words(num: int, lang: Optional[str] = None, fmt_modifier: str = '') -> str: def word_num(value: int) -> Iterator[str]: if not value: yield num_map[value] for base, word in num_map.items(): if base >= 1: floor = value // base if not floor: continue elif base >= 100: yield from word_num(floor) yield ' ' yield word value %= base if not value: break elif base < 100: yield '-' elif base == 100: if lang == 'en': yield ' and ' else: yield ' ' try: num_map = NUM_TO_WORD_MAPS[lang] # type: ignore[index] except KeyError: lang = 'en' num_map = NUM_TO_WORD_MAPS[lang] if num < 0: result = '-' + ''.join(x for x in word_num(abs(num))) else: result = ''.join(x for x in word_num(num)) if not fmt_modifier.startswith('o'): return result if lang == 'en': return to_ordinal_en(result) elif lang 
== 'it': return to_ordinal_it(result, fmt_modifier) else: return result def parse_datetime_picture(picture: str) -> Tuple[List[str], List[str]]: """ Analyze a picture argument of XPath 3.0+ formatting functions. :param picture: the picture string. :return: a couple of lists containing the literal parts and markers. """ min_value: Union[int, str] max_value: Union[None, int, str] literals = [] for lit in PICTURE_PATTERN.split(picture): if '[' in lit.replace('[[', ''): raise xpath_error('FOFD1340', "Invalid character '[' in picture literal") elif ']' in lit.replace(']]', ''): raise xpath_error('FOFD1340', "Invalid character ']' in picture literal") else: literals.append(lit.replace('[[', '[').replace(']]', ']')) markers = [x.group().replace(' ', '').replace('\n', '').replace('\t', '') for x in PICTURE_PATTERN.finditer(picture)] assert len(markers) == (len(literals) - 1) msg_tmpl = 'Invalid formatting component {!r}' for value in markers: if value[1] not in 'YMDdFWwHhPmsfZzCE': raise xpath_error('FOFD1340', msg_tmpl.format(value)) if ',' not in value: presentation = value[2:-1] else: presentation, width = value[2:-1].rsplit(',', maxsplit=1) if WIDTH_PATTERN.match(width) is None: raise xpath_error('FOFD1340', f'Invalid width modifier {value!r}') elif '-' not in width: if '*' not in width and not int(width): raise xpath_error('FOFD1340', f'Invalid width modifier {value!r}') elif '*' not in width: min_value, max_value = map(int, width.split('-')) if min_value < 1 or max_value < min_value: raise xpath_error('FOFD1340', msg_tmpl.format(value)) else: min_value, max_value = width.split('-') if min_value != '*' and not int(min_value): raise xpath_error('FOFD1340', f'Invalid width modifier {value!r}') if max_value != '*' and not int(max_value): raise xpath_error('FOFD1340', f'Invalid width modifier {value!r}') if len(presentation) > 1 and presentation[-1] in 'atco': presentation = presentation[:-1] if not presentation or presentation in PRESENTATION_FORMATS: pass elif 
DECIMAL_DIGIT_PATTERN.match(presentation) is None: raise xpath_error('FOFD1340', msg_tmpl.format(value)) else: if value[1] == 'f': if presentation[0] == '#' and any(ch.isdigit() for ch in presentation): msg = 'picture argument has an invalid primary format token' raise xpath_error('FOFD1340', msg) elif presentation[0].isdigit() and '#' in presentation: msg = 'picture argument has an invalid primary format token' raise xpath_error('FOFD1340', msg) # Check digits set uniformity cp = None for ch in reversed(presentation): if not ch.isdigit(): continue elif cp is None: cp = ord(ch) elif abs(ord(ch) - cp) > 10: raise xpath_error('FOFD1340', msg_tmpl.format(value)) return literals, markers def parse_datetime_marker(marker: str, dt: datetime.datetime, lang: Optional[str] = None) -> str: min_width: int max_width: Optional[int] component = marker[1] fmt_token = marker[2:-1] if ',' not in fmt_token: presentation, width = fmt_token, '' else: presentation, width = fmt_token.rsplit(',', maxsplit=1) if not presentation: fmt_modifier = '' if component in 'Hhf': presentation = '1' elif component in 'ms': presentation = '01' elif component in 'Zz': presentation = '01:01' else: presentation = 'n' elif presentation == 'a': fmt_modifier = '' else: _match = FMT_MODIFIER_PATTERN.search(presentation) if _match is None: fmt_modifier = '' else: fmt_modifier = _match.group(0) if fmt_modifier: presentation = presentation[:-len(fmt_modifier)] if presentation.startswith('#') and presentation.endswith('#'): msg_tmpl = 'Invalid formatting component {!r}' raise xpath_error('FOFD1340', msg_tmpl.format(component)) for pch in presentation: if pch.isdigit(): zero_cp = ord(pch) - int(pch) zero_ch = chr(zero_cp) break else: zero_cp, zero_ch = ord('0'), '0' digits = sum(c.isdigit() for c in presentation) opt_digits = presentation.count('#') if not width or width == '*': if digits > 1: min_width, max_width = digits, digits + opt_digits else: min_width, max_width = 0, None else: min_width, max_width = 
parse_width(width) if digits > 1: min_width = max(min_width, digits) if max_width: max_width = max(max_width, digits + opt_digits) if component == 'Y': value = str(abs(dt.year)) elif component == 'M': if presentation.lower().startswith('n') and lang is not None: value = int_to_month(dt.month, lang) else: value = str(dt.month) elif component == 'D': value = str(dt.day) elif component == 'H': value = str(dt.hour) elif component == 'h': if dt.hour == 0: value = '12' elif dt.hour > 12: value = str(dt.hour % 12) else: value = str(dt.hour) elif component == 'P': value = 'a.m.' if dt.hour < 12 else 'p.m.' elif component == 'm': value = str(dt.minute) elif component == 's': value = str(dt.second) elif component == 'f': value = str('{:06}'.format(dt.microsecond)) elif component == 'z' or component == 'Z': if presentation == 'N': value = dt.tzname() or '' elif dt.tzinfo is None: value = '+00:00' else: value = str(dt) if value.endswith('Z'): value = '+00:00' else: value = value[-6:] elif component == 'W': value = str(dt.isocalendar()[1]) elif component == 'w': value = str(week_in_month(dt)) elif component == 'F': if presentation.lower().startswith('n') and lang is not None: value = int_to_weekday(dt.isocalendar()[2], lang) else: value = str(dt.isocalendar()[2]) elif component == 'E': if dt.year < 0: value = 'BC' else: value = 'AD' elif component == 'd': delta = dt - type(dt)(dt.year, 1, 1) value = str(1 + delta.days) else: msg_tmpl = 'Invalid formatting component {!r}' raise xpath_error('FOFD1340', msg_tmpl.format(component)) sign = '' left_to_right = component != 'Y' if presentation == 'n': fmt_chunk = value.lower() elif presentation == 'N': fmt_chunk = value.upper() elif presentation == 'Nn': fmt_chunk = value.title() elif presentation == 'I' or presentation == 'i': fmt_chunk = value elif presentation == 'Z' and component == 'Z': if dt.tzinfo is None: fmt_chunk = MILITARY_TIME_ZONES[None] elif value.endswith(':00'): fmt_chunk = MILITARY_TIME_ZONES.get(value[:3],
value) else: fmt_chunk = value elif presentation == 'w': fmt_chunk = int_to_words(int(value), lang, fmt_modifier) elif presentation == 'W': fmt_chunk = int_to_words(int(value), lang, fmt_modifier).upper() elif presentation == 'Ww': fmt_chunk = int_to_words(int(value), lang, fmt_modifier).title() elif presentation == 'a': fmt_chunk = int_to_alphabetic(int(value), lang) elif presentation == 'A': fmt_chunk = int_to_alphabetic(int(value), lang).upper() else: left_to_right = False k = 0 pch = '' chars = [] # Extract the sign if value.startswith('-') or value.startswith('+'): sign = value[0] value = value[1:] if component in 'zZ': if presentation.isdigit(): if len(presentation) <= 2: if value.endswith(':00'): value = value[:-3] left_to_right = True elif len(presentation) == 1: presentation = '#0:01' min_width, max_width = 3, 4 else: presentation = '01:01' min_width = max_width = 4 elif len(presentation) == 3: presentation = '#001' min_width, max_width = 3, 4 elif presentation.replace(':', '', 1).isdigit(): if len(presentation) == 4: presentation = '#0:01' min_width, max_width = 3, 4 if component != 'f': presentation = ''.join(reversed(presentation)) value = ''.join(reversed(value)) for ch in value: try: pch = presentation[k] except IndexError: if ch == '0' and not pch.isdigit(): break else: k += 1 while pch != '#' and not pch.isdigit(): chars.append(pch) min_width += 1 if max_width is not None: max_width += 1 try: pch = presentation[k] except IndexError: break else: k += 1 else: if ch.isdigit(): chars.append(ch) if component != 'f': fmt_chunk = ''.join(reversed(chars)) else: fmt_chunk = ''.join(chars) if 'o' in fmt_modifier: try: fmt_chunk += ordinal_suffix(int(fmt_chunk)) except ValueError: pass else: min_width += 2 if max_width is not None: max_width += 2 if len(fmt_chunk) < min_width and component not in 'PzZ': if component in 'f': fmt_chunk += zero_ch * (min_width - len(fmt_chunk)) else: fmt_chunk = zero_ch * (min_width - len(fmt_chunk)) + fmt_chunk if max_width: if 
left_to_right or component in 'f': fmt_chunk = fmt_chunk[:max_width] else: fmt_chunk = fmt_chunk[max(0, len(fmt_chunk)-max_width):] if component in 'zZ': if not min_width: fmt_chunk = fmt_chunk.lstrip('0') if not fmt_chunk: return 'Z' if component == 'Z' else 'GMT' + sign + '0' else: try: nz_first = min(k for k in range(len(fmt_chunk)) if fmt_chunk[k] != zero_ch) except ValueError: fmt_chunk = fmt_chunk[max(0, len(fmt_chunk) - min_width):] else: fmt_chunk = fmt_chunk[max(0, min(nz_first, len(fmt_chunk) - min_width)):] elif min_width == 3 and component == 'F': fmt_chunk = fmt_chunk[:3] elif min_width or component == 'f': try: nz_last = max(k for k in range(len(fmt_chunk)) if fmt_chunk[k] != zero_ch) except ValueError: nz_last = 0 fmt_chunk = fmt_chunk[:max(min_width, nz_last + 1)] if zero_ch != '0': fmt_chunk = ''.join(chr(zero_cp + int(ch)) if ch.isdigit() else ch for ch in fmt_chunk) if component == 'z': return 'GMT' + sign + fmt_chunk if presentation == 'I': return sign + int_to_roman(int(fmt_chunk)) elif presentation == 'i': return sign + int_to_roman(int(fmt_chunk)).lower() return sign + fmt_chunk def parse_width(width: str) -> Tuple[int, Optional[int]]: min_width: Union[str, int] max_width: Union[str, int, None] if WIDTH_PATTERN.match(width) is None: raise xpath_error('FOFD1340', f'Invalid width modifier {width!r}') elif '-' not in width: if width == '*': return 0, None min_width = int(width) if not min_width: raise xpath_error('FOFD1340', f'Invalid width modifier {width!r}') return min_width, None elif '*' not in width: min_width, max_width = map(int, width.split('-')) if not min_width or max_width < min_width: raise xpath_error('FOFD1340', f'Invalid width modifier {width!r}') return min_width, max_width else: min_width, max_width = width.split('-') if min_width == '*': min_width = 0 else: min_width = int(min_width) if not min_width: raise xpath_error('FOFD1340', f'Invalid width modifier {width!r}') if max_width == '*': return min_width, None else: max_width 
= int(max_width) if not max_width: raise xpath_error('FOFD1340', f'Invalid width modifier {width!r}') return min_width, max_width elementpath-3.0.2/elementpath/xpath30/xpath30_parser.py000066400000000000000000000107151427546011100230410ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ XPath 3.0 implementation - part 1 (parser class) Refs: - https://www.w3.org/TR/2014/REC-xpath-30-20140408/ - https://www.w3.org/TR/xpath-functions-30/ """ from copy import deepcopy from typing import Any, Dict, Optional from ..namespaces import XPATH_MATH_FUNCTIONS_NAMESPACE from ..xpath2 import XPath2Parser DecimalFormatsType = Dict[Optional[str], Dict[str, str]] class XPath30Parser(XPath2Parser): """ XPath 3.0 expression parser class. Accepts all XPath 2.0 options as keyword arguments, but the *strict* option is ignored because XPath 3.0+ has braced URI literals and the expanded name syntax is not compatible. :param args: the same positional arguments of class :class:`elementpath.XPath2Parser`. :param decimal_formats: a mapping with statically known decimal formats. :param defuse_xml: if `True` defuse XML data before parsing, that is the default. :param kwargs: the same keyword arguments of class :class:`elementpath.XPath2Parser`. 
""" version = '3.0' SYMBOLS = XPath2Parser.SYMBOLS | { 'Q{', # see BracedURILiteral rule '||', # concat operator '!', # Simple map operator # Math functions (trigonometric and exponential) 'pi', 'exp', 'exp10', 'log', 'log10', 'pow', 'sqrt', 'sin', 'cos', 'tan', 'asin', 'acos', 'atan', 'atan2', # Formatting functions 'format-integer', 'format-number', 'format-dateTime', 'format-date', 'format-time', # String functions that use regular expressions 'analyze-string', # Functions and operators on nodes 'path', 'has-children', 'innermost', 'outermost', # Functions and operators on sequences 'head', 'tail', 'generate-id', 'uri-collection', 'unparsed-text', 'unparsed-text-lines', 'unparsed-text-available', 'environment-variable', 'available-environment-variables', # Parsing and serializing 'parse-xml', 'parse-xml-fragment', 'serialize', # Higher-order functions 'function-lookup', 'function-name', 'function-arity', '#', '?', 'for-each', 'filter', 'fold-left', 'fold-right', 'for-each-pair', # Expressions and node type functions 'function', 'let', ':=', 'namespace-node', # XSD list-types constructor functions 'ENTITIES', 'IDREFS', 'NMTOKENS', } DEFAULT_NAMESPACES = { 'math': XPATH_MATH_FUNCTIONS_NAMESPACE, **XPath2Parser.DEFAULT_NAMESPACES } PATH_STEP_SYMBOLS = { '(integer)', '(string)', '(float)', '(decimal)', '(name)', '*', '@', '..', '.', '(', '{', 'Q{', '$', } decimal_formats: DecimalFormatsType = { None: { 'decimal-separator': '.', 'grouping-separator': ',', 'exponent-separator': 'e', 'infinity': 'Infinity', 'minus-sign': '-', 'NaN': 'NaN', 'percent': '%', 'per-mille': '‰', 'zero-digit': '0', 'digit': '#', 'pattern-separator': ';', } } # https://www.w3.org/TR/xpath-30/#id-reserved-fn-names RESERVED_FUNCTION_NAMES = { 'attribute', 'comment', 'document-node', 'element', 'empty-sequence', 'function', 'if', 'item', 'namespace-node', 'node', 'processing-instruction', 'schema-attribute', 'schema-element', 'switch', 'text', 'typeswitch', } function_signatures = 
XPath2Parser.function_signatures.copy() def __init__(self, *args: Any, decimal_formats: Optional[DecimalFormatsType] = None, defuse_xml: bool = True, **kwargs: Any) -> None: kwargs.pop('strict', None) super(XPath30Parser, self).__init__(*args, **kwargs) self.defuse_xml = defuse_xml if decimal_formats is not None: self.decimal_formats = deepcopy(self.decimal_formats) for k, v in decimal_formats.items(): if k is not None: self.decimal_formats[k] = self.decimal_formats[None].copy() self.decimal_formats[k].update(v) if None in decimal_formats: self.decimal_formats[None].update(decimal_formats[None]) elementpath-3.0.2/elementpath/xpath31/000077500000000000000000000000001427546011100176215ustar00rootroot00000000000000elementpath-3.0.2/elementpath/xpath31/__init__.py000066400000000000000000000010031427546011100217240ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from typing import TYPE_CHECKING if TYPE_CHECKING: from .xpath31_parser import XPath31Parser else: from ._xpath31_functions import XPath31Parser __all__ = ['XPath31Parser'] elementpath-3.0.2/elementpath/xpath31/_xpath31_functions.py000066400000000000000000000066221427546011100237200ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # # type: ignore """ XPath 3.1 implementation - part 3 (functions) """ from ..datatypes import AnyAtomicType from ..xpath_token import XPathMap, XPathArray from ._xpath31_operators import XPath31Parser method = XPath31Parser.method function = XPath31Parser.function XPath31Parser.unregister('string-join') @method(function('string-join', nargs=(1, 2), sequence_types=('xs:anyAtomicType*', 'xs:string', 'xs:string'))) def evaluate_string_join_function(self, context=None): items = [self.string_value(s) for s in self[0].select(context)] if len(self) == 1: return ''.join(items) return self.get_argument(context, 1, required=True, cls=str).join(items) @method(function('size', prefix='map', label='map function', nargs=1, sequence_types=('map(*)', 'xs:integer'))) def evaluate_map_size_function(self, context=None): return len(self.get_argument(context, required=True, cls=XPathMap)) @method(function('keys', prefix='map', label='map function', nargs=1, sequence_types=('map(*)', 'xs:anyAtomicType*'))) def evaluate_map_keys_function(self, context=None): map_ = self.get_argument(context, required=True, cls=XPathMap) return map_.keys(context) @method(function('contains', prefix='map', label='map function', nargs=2, sequence_types=('map(*)', 'xs:anyAtomicType', 'xs:boolean'))) def evaluate_map_contains_function(self, context=None): map_ = self.get_argument(context, required=True, cls=XPathMap) key = self.get_argument(context, index=1, required=True, cls=AnyAtomicType) return map_.contains(context, key) @method(function('get', prefix='map', label='map function', nargs=2, sequence_types=('map(*)', 'xs:anyAtomicType', 'item()*'))) def evaluate_map_get_function(self, context=None): map_ = self.get_argument(context, required=True, cls=XPathMap) key = self.get_argument(context, index=1, required=True, cls=AnyAtomicType) return map_(context, key) @method(function('size', prefix='array', label='array function', nargs=1, sequence_types=('array(*)', 'xs:integer'))) 
def evaluate_array_size_function(self, context=None): return len(self.get_argument(context, required=True, cls=XPathArray)) @method(function('get', prefix='array', label='array function', nargs=2, sequence_types=('array(*)', 'xs:integer', 'item()*'))) def evaluate_array_get_function(self, context=None): array_ = self.get_argument(context, required=True, cls=XPathArray) position = self.get_argument(context, index=1, required=True, cls=int) return array_(context, position) @method(function('put', prefix='array', label='array function', nargs=3, sequence_types=('array(*)', 'xs:integer', 'item()*', 'array(*)'))) def evaluate_array_put_function(self, context=None): array_ = self.get_argument(context, required=True, cls=XPathArray) position = self.get_argument(context, index=1, required=True, cls=int) member = self[2].evaluate(context) if member is None: member = [] return array_.put(position, member, context) elementpath-3.0.2/elementpath/xpath31/_xpath31_operators.py000066400000000000000000000023521427546011100237220ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # # type: ignore """ XPath 3.1 implementation - part 2 (operators and constructors) """ from ..xpath_token import XPathMap, XPathArray from .xpath31_parser import XPath31Parser register = XPath31Parser.register method = XPath31Parser.method register('map', bp=90, label='map', bases=(XPathMap,)) register('array', bp=90, label='array', bases=(XPathArray,)) ### # Square array constructor (pushed lazy) @method('[') def nud_array_constructor(self): if self.parser.version < '3.1': raise self.wrong_syntax() # Constructs an XPathArray token and returns it instead of the predicate token = XPathArray(self.parser) if token.parser.next_token.symbol not in (']', '(end)'): while True: token.append(self.parser.expression(5)) if token.parser.next_token.symbol != ',': break token.parser.advance() token.parser.advance(']') return token elementpath-3.0.2/elementpath/xpath31/xpath31_parser.py000066400000000000000000000034771427546011100230520ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ XPath 3.1 implementation """ from ..namespaces import XPATH_MAP_FUNCTIONS_NAMESPACE, \ XPATH_ARRAY_FUNCTIONS_NAMESPACE # , XSLT_XQUERY_SERIALIZATION_NAMESPACE from ..xpath30 import XPath30Parser class XPath31Parser(XPath30Parser): """ XPath 3.1 expression parser class. 
""" version = '3.1' SYMBOLS = XPath30Parser.SYMBOLS | { # Map and array functions 'map', 'array', # 'merge', 'size', 'keys', 'contains', 'get', # 'find', 'put', # 'entry', # 'remove', 'append', 'subarray', 'remove', 'join', 'flatten', # 'random-number-generator', 'collation-key', # 'contains-token', 'parse-ietf-date', # Higher-order functions # 'sort', 'apply', 'load-xquery-module', 'transform', # Functions on JSON Data # 'parse-json', 'json-doc', 'json-to-xml', 'xml-to-json', # Arrow operator # '=>', } DEFAULT_NAMESPACES = { 'map': XPATH_MAP_FUNCTIONS_NAMESPACE, 'array': XPATH_ARRAY_FUNCTIONS_NAMESPACE, **XPath30Parser.DEFAULT_NAMESPACES } # https://www.w3.org/TR/xpath-31/#id-reserved-fn-names RESERVED_FUNCTION_NAMES = { 'array', 'attribute', 'comment', 'document-node', 'element', 'empty-sequence', 'function', 'if', 'item', 'map', 'namespace-node', 'node', 'processing-instruction', 'schema-attribute', 'schema-element', 'switch', 'text', 'typeswitch', } function_signatures = XPath30Parser.function_signatures.copy() elementpath-3.0.2/elementpath/xpath_context.py000066400000000000000000000477021427546011100216050ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # import datetime import importlib from copy import copy from itertools import chain from types import ModuleType from typing import TYPE_CHECKING, cast, overload, Dict, Any, List, Iterator, \ Optional, Sequence, Union, Callable, Set from .exceptions import ElementPathTypeError from .namespaces import NamespacesType from .datatypes import AnyAtomicType, Timezone from .protocols import ElementProtocol, DocumentProtocol from .etree import is_etree_element, is_etree_document from .xpath_nodes import RootArgType, ChildNodeType, XPathNode, \ AttributeNode, NamespaceNode, CommentNode, ProcessingInstructionNode, \ ElementNode, DocumentNode from .tree_builders import get_node_tree if TYPE_CHECKING: from .xpath_token import XPathToken, XPathAxis __all__ = ['XPathContext', 'XPathSchemaContext'] ItemArgType = Union[RootArgType, XPathNode, AnyAtomicType] def is_xpath_node(obj: Any) -> bool: return isinstance(obj, XPathNode) or is_etree_element(obj) or is_etree_document(obj) class XPathContext: """ The XPath dynamic context. The static context is provided by the parser. Dynamic context instances are usually created by providing only the root element. The *variables* argument is needed if the XPath expression refers to in-scope variables. The other optional arguments are needed only if a specific position in the context is required, and should be used with an understanding of their meaning. :param root: the root of the XML document, can be an ElementTree instance or an Element. :param namespaces: a dictionary mapping namespace prefixes to URIs, \ used when namespace information is not available within document and element nodes. \ This can be useful when the dynamic context has additional namespaces and root \ is an Element or an ElementTree instance of the standard library. :param item: the context item. A `None` value means that the context is positioned on \ the document node.
:param position: the current position of the node within the input sequence. :param size: the number of items in the input sequence. :param axis: the active axis. Used to choose when to apply the default axis ('child' axis). :param variables: dictionary of context variables that maps a QName to a value. :param current_dt: current dateTime of the implementation, including explicit timezone. :param timezone: implicit timezone to be used when a date, time, or dateTime value does \ not have a timezone. :param documents: available documents. This is a mapping of absolute URI \ strings onto document nodes. Used by the function fn:doc. :param collections: available collections. This is a mapping of absolute URI \ strings onto sequences of nodes. Used by the XPath 2.0+ function fn:collection. :param default_collection: this is the sequence of nodes used when fn:collection \ is called with no arguments. :param text_resources: available text resources. This is a mapping of absolute URI strings \ onto text resources. Used by the XPath 3.0+ functions fn:unparsed-text and fn:unparsed-text-lines. :param resource_collections: available URI collections. This is a mapping of absolute \ URI strings to sequences of URIs. Used by the XPath 3.0+ function fn:uri-collection. :param default_resource_collection: this is the sequence of URIs used when \ fn:uri-collection is called with no arguments. :param allow_environment: defines whether access to the system environment is allowed, \ the default is `False`. Used by the XPath 3.0+ functions fn:environment-variable \ and fn:available-environment-variables.
""" _etree: Optional[ModuleType] = None root: Union[DocumentNode, ElementNode] item: Union[XPathNode, AnyAtomicType, None] total_nodes: int = 0 # Number of nodes associated to the context documents: Optional[Dict[str, Union[DocumentNode, ElementNode]]] = None collections = None default_collection: Optional[List[Union[XPathNode, ElementProtocol, DocumentProtocol]]] = None def __init__(self, root: RootArgType, namespaces: Optional[NamespacesType] = None, item: Optional[ItemArgType] = None, position: int = 1, size: int = 1, axis: Optional[str] = None, variables: Optional[Dict[str, Any]] = None, current_dt: Optional[datetime.datetime] = None, timezone: Optional[Union[str, Timezone]] = None, documents: Optional[Dict[str, RootArgType]] = None, collections: Optional[Dict[str, List[ItemArgType]]] = None, default_collection: Optional[str] = None, text_resources: Optional[Dict[str, str]] = None, resource_collections: Optional[Dict[str, List[str]]] = None, default_resource_collection: Optional[str] = None, allow_environment: bool = False, default_language: Optional[str] = None, default_calendar: Optional[str] = None, default_place: Optional[str] = None) -> None: self.namespaces = dict(namespaces) if namespaces else {} self.root = get_node_tree(root, self.namespaces) if item is None: self.item = self.root if isinstance(self.root, ElementNode) else None else: self.item = self.get_context_item(item) self.position = position self.size = size self.axis = axis if timezone is None or isinstance(timezone, Timezone): self.timezone = timezone else: self.timezone = Timezone.fromstring(timezone) self.current_dt = current_dt or datetime.datetime.now(tz=self.timezone) if documents is not None: self.documents = {k: get_node_tree(v, self.namespaces) if v is not None else v for k, v in documents.items()} if variables is None: self.variables = {} else: self.variables = {k: self.get_context_item(v) for k, v in variables.items()} if collections is not None: self.collections = {k: 
self.get_context_item(v) if v is not None else v for k, v in collections.items()} if default_collection is not None: if isinstance(default_collection, list) and \ all(is_xpath_node(x) for x in default_collection): self.default_collection = self.get_context_item(default_collection) else: msg = "'default_collection' argument must be a list of XPath nodes" raise ElementPathTypeError(msg) self.text_resources = text_resources if text_resources is not None else {} self.resource_collections = resource_collections self.default_resource_collection = default_resource_collection self.allow_environment = allow_environment self.default_language = default_language self.default_calendar = default_calendar self.default_place = default_place def __repr__(self) -> str: return f'{self.__class__.__name__}(root={self.root.value})' def __copy__(self) -> 'XPathContext': obj: XPathContext = object.__new__(self.__class__) obj.__dict__.update(self.__dict__) obj.axis = None obj.variables = {k: v for k, v in self.variables.items()} return obj @property def etree(self) -> ModuleType: if self._etree is None: etree_module_name = self.root.value.__class__.__module__ self._etree: ModuleType = importlib.import_module(etree_module_name) return self._etree def get_root(self, node: Any) -> Union[None, ElementNode, DocumentNode]: if any(node is x for x in self.root.iter()): return self.root if self.documents is not None: try: for uri, doc in self.documents.items(): if any(node is x for x in doc.iter()): return doc except AttributeError: pass return None def is_principal_node_kind(self) -> bool: if self.axis == 'attribute': return isinstance(self.item, AttributeNode) elif self.axis == 'namespace': return isinstance(self.item, NamespaceNode) else: return isinstance(self.item, ElementNode) @overload def get_context_item(self, item: ItemArgType) \ -> Union[XPathNode, AnyAtomicType]: ... @overload def get_context_item(self, item: List[ItemArgType]) \ -> List[Union[XPathNode, AnyAtomicType]]: ... 
def get_context_item(self, item: Union[ItemArgType, List[ItemArgType]]) \ -> Union[XPathNode, AnyAtomicType, List[Union[XPathNode, AnyAtomicType]]]: """ Checks the item and returns an item suitable for XPath processing. For XML trees and elements try a match with an existing node in the context. If it fails then builds a new node. """ if isinstance(item, XPathNode): return item elif isinstance(item, (list, tuple)): return [self.get_context_item(x) for x in item] elif is_etree_document(item): if item is self.root.value: return self.root if self.documents: for doc in self.documents.values(): if item is doc.value: return doc elif is_etree_element(item): try: return self.root.elements[item] # type: ignore[index] except (TypeError, KeyError): pass if self.documents: for doc in self.documents.values(): if doc.elements is not None and item in doc.elements: return doc.elements[item] # type: ignore[index] if callable(item.tag): # type: ignore[union-attr] if item.tag.__name__ == 'Comment': # type: ignore[union-attr] return CommentNode(cast(ElementProtocol, item)) else: return ProcessingInstructionNode(cast(ElementProtocol, item)) else: return cast(AnyAtomicType, item) return get_node_tree( root=cast(Union[RootArgType], item), namespaces=self.namespaces ) def inner_focus_select(self, token: Union['XPathToken', 'XPathAxis']) -> Iterator[Any]: """Apply the token's selector with an inner focus.""" status = self.item, self.size, self.position, self.axis results = [x for x in token.select(copy(self))] self.axis = None if token.label == 'axis' and cast('XPathAxis', token).reverse_axis: self.size = self.position = len(results) for self.item in results: yield self.item self.position -= 1 else: self.size = len(results) for self.position, self.item in enumerate(results, start=1): yield self.item self.item, self.size, self.position, self.axis = status def iter_product(self, selectors: Sequence[Callable[[Any], Any]], varnames: Optional[Sequence[str]] = None) -> Iterator[Any]: """ 
Iterator for cartesian products of selectors. :param selectors: a sequence of selector generator functions. :param varnames: a sequence of variables for storing the generated values. """ iterators = [x(self) for x in selectors] dimension = len(iterators) prod = [None] * dimension max_index = dimension - 1 k = 0 while True: try: value = next(iterators[k]) except StopIteration: if not k: return iterators[k] = selectors[k](self) k -= 1 else: if varnames is not None: try: self.variables[varnames[k]] = value except (TypeError, IndexError): pass prod[k] = value if k == max_index: yield tuple(prod) else: k += 1 ## # Context item iterators for axis def iter_self(self) -> Iterator[Union[XPathNode, AnyAtomicType, None]]: """Iterator for 'self' axis and '.' shortcut.""" status = self.axis self.axis = 'self' yield self.item self.axis = status def iter_attributes(self) -> Iterator[AttributeNode]: """Iterator for 'attribute' axis and '@' shortcut.""" status: Any if isinstance(self.item, AttributeNode): status = self.axis self.axis = 'attribute' yield self.item self.axis = status return elif isinstance(self.item, ElementNode): status = self.item, self.axis self.axis = 'attribute' for self.item in self.item.attributes: yield self.item self.item, self.axis = status def iter_children_or_self(self) -> Iterator[Union[XPathNode, AnyAtomicType, None]]: """Iterator for 'child' forward axis and '/' step.""" if self.axis is not None: yield self.item elif isinstance(self.item, (ElementNode, DocumentNode)): _status = self.item, self.axis self.axis = 'child' for self.item in self.item: yield self.item self.item, self.axis = _status elif self.item is None: self.axis = 'child' if isinstance(self.root, DocumentNode): for self.item in self.root: yield self.item else: # document position without a document node -> yield root ElementNode yield self.root self.item = self.axis = None def iter_parent(self) -> Iterator[Union[ElementNode, DocumentNode]]: """Iterator for 'parent' reverse axis and '..' 
        shortcut."""
        if not isinstance(self.item, XPathNode):
            return  # not applicable

        if self.item is not self.root:
            parent = self.item.parent
            if parent is not None:
                status = self.item, self.axis
                self.axis = 'parent'
                self.item = parent
                yield self.item
                self.item, self.axis = status

    def iter_siblings(self, axis: Optional[str] = None) -> Iterator[ChildNodeType]:
        """
        Iterator for 'following-sibling' forward axis and 'preceding-sibling' reverse axis.

        :param axis: the context axis, default is 'following-sibling'.
        """
        if not isinstance(self.item, XPathNode) or self.item is self.root:
            return

        parent = self.item.parent
        if parent is None:
            return

        item = self.item
        status = self.item, self.axis
        self.axis = axis or 'following-sibling'

        if axis == 'preceding-sibling':
            for child in parent:  # pragma: no cover
                if child is item:
                    break
                self.item = child
                yield child
        else:
            follows = False
            for child in parent:
                if follows:
                    self.item = child
                    yield child
                elif child is item:
                    follows = True

        self.item, self.axis = status

    def iter_descendants(self, axis: Optional[str] = None) -> Iterator[Union[None, XPathNode]]:
        """
        Iterator for 'descendant' and 'descendant-or-self' forward axes and '//' shortcut.

        :param axis: the context axis, by default there is no explicit axis.
        """
        descendants: Iterator[Union[None, XPathNode]]
        with_self = axis != 'descendant'

        if isinstance(self.item, (ElementNode, DocumentNode)):
            descendants = self.item.iter_descendants(with_self)
        elif self.item is None:
            if isinstance(self.root, DocumentNode):
                descendants = self.root.iter_descendants(with_self)
            elif with_self:
                # Yields None in order to emulate position on document
                # FIXME replacing the self.root with ElementTree(self.root)?
                descendants = chain((None,), self.root.iter_descendants())
            else:
                descendants = self.root.iter_descendants()
        else:
            if with_self and isinstance(self.item, XPathNode):
                self.axis, axis = axis, self.axis
                yield self.item
                self.axis = axis
            return

        status = self.item, self.axis
        self.axis = axis
        for self.item in descendants:
            yield self.item
        self.item, self.axis = status

    def iter_ancestors(self, axis: Optional[str] = None) -> Iterator[XPathNode]:
        """
        Iterator for 'ancestor' and 'ancestor-or-self' reverse axes.

        :param axis: the context axis, default is 'ancestor'.
        """
        if not isinstance(self.item, XPathNode):
            return  # item is not an XPath node or document position without a document root

        status = self.item, self.axis
        self.axis = axis or 'ancestor'

        ancestors: List[XPathNode] = []
        if axis == 'ancestor-or-self':
            ancestors.append(self.item)

        if self.item is not self.root:
            parent = self.item.parent
            while parent is not None:
                ancestors.append(parent)
                if parent is self.root:
                    break
                parent = parent.parent

        for self.item in reversed(ancestors):
            yield self.item

        self.item, self.axis = status

    def iter_preceding(self) -> Iterator[Union[DocumentNode, ChildNodeType]]:
        """Iterator for 'preceding' reverse axis."""
        ancestors: Set[Union[ElementNode, DocumentNode]]
        item: XPathNode
        parent: Union[None, ElementNode, DocumentNode]

        if not isinstance(self.item, XPathNode) or self.item is self.root:
            return

        parent = self.item.parent
        if parent is None:
            return

        status = self.item, self.axis
        self.axis = 'preceding'

        ancestors = set()
        while parent is not None:
            ancestors.add(parent)
            if parent is self.root:
                break
            parent = parent.parent

        item = self.item
        for self.item in self.root.iter_descendants():
            if self.item is item:
                break
            if self.item not in ancestors:
                yield self.item

        self.item, self.axis = status

    def iter_followings(self) -> Iterator[ChildNodeType]:
        """Iterator for 'following' forward axis."""
        if self.item is None or self.item is self.root:
            return
        elif isinstance(self.item, ElementNode):
            status = self.item, self.axis
            self.axis = 'following'
            item = self.item

            descendants = set(item.iter_descendants())
            for self.item in self.root.iter_descendants(with_self=False):
                if item.position < self.item.position and self.item not in descendants:
                    yield cast(ChildNodeType, self.item)

            self.item, self.axis = status


class XPathSchemaContext(XPathContext):
    """
    The XPath dynamic context base class for schema bounded parsers. Use this class
    as dynamic context for schema instances in order to perform a schema-based type
    checking during the static analysis phase. Don't use this as dynamic context on
    XML instances.
    """
    root: ElementNode


elementpath-3.0.2/elementpath/xpath_nodes.py
#
# Copyright (c), 2018-2022, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
from urllib.parse import urlparse
from typing import cast, Any, Dict, Iterator, List, MutableMapping, Optional, Tuple, Union

from .datatypes import UntypedAtomic, get_atomic_value, AtomicValueType
from .namespaces import XML_NAMESPACE, XML_BASE, XSI_NIL, \
    XSD_ANY_TYPE, XSD_ANY_SIMPLE_TYPE, XSD_ANY_ATOMIC_TYPE, \
    XML_ID, XSD_IDREF, XSD_IDREFS
from .protocols import ElementProtocol, DocumentProtocol, XsdElementProtocol, \
    XsdAttributeProtocol, XsdTypeProtocol, XsdSchemaProtocol
from .helpers import match_wildcard
from .etree import etree_iter_strings

__all__ = ['SchemaElemType', 'RootArgType', 'ChildNodeType', 'ElementMapType',
           'XPathNode', 'AttributeNode', 'NamespaceNode', 'TextNode', 'CommentNode',
           'ProcessingInstructionNode', 'ElementNode', 'LazyElementNode',
           'SchemaElementNode', 'DocumentNode']

_XSD_SPECIAL_TYPES = {XSD_ANY_TYPE, XSD_ANY_SIMPLE_TYPE, XSD_ANY_ATOMIC_TYPE}

SchemaElemType = Union[XsdSchemaProtocol, XsdElementProtocol]
RootArgType = Union[DocumentProtocol, ElementProtocol, SchemaElemType,
                    'DocumentNode', 'ElementNode']
ChildNodeType = Union['TextNode', 'ElementNode', 'CommentNode', 'ProcessingInstructionNode']
ElementMapType = Dict[Union[ElementProtocol, SchemaElemType], 'ElementNode']


###
# XQuery and XPath Data Model: https://www.w3.org/TR/xpath-datamodel/
#
# Note: in this implementation empty sequence return value is replaced by None.
#
# XPath has seven kinds of nodes:
#
#     element, attribute, text, namespace, processing-instruction, comment, document
###
class XPathNode:
    """The base class of all XPath nodes. Used only for type checking."""

    # Accessors, empty sequences are represented with None values.
    kind: str = ''
    children: Optional[List[ChildNodeType]]
    parent: Union['ElementNode', 'DocumentNode', None]

    __slots__ = 'parent', 'position'

    @property
    def attributes(self) -> Optional[List['AttributeNode']]:
        return None

    @property
    def base_uri(self) -> Optional[str]:
        return None

    @property
    def document_uri(self) -> Optional[str]:
        return None

    @property
    def is_id(self) -> Optional[bool]:
        return None

    @property
    def is_idrefs(self) -> Optional[bool]:
        return None

    @property
    def namespace_nodes(self) -> Optional[List['NamespaceNode']]:
        return None

    @property
    def nilled(self) -> Optional[bool]:
        return None

    @property
    def name(self) -> Optional[str]:
        return None

    @property
    def type_name(self) -> Optional[str]:
        return None

    @property
    def string_value(self) -> str:
        raise NotImplementedError()

    @property
    def typed_value(self) -> Optional[AtomicValueType]:
        raise NotImplementedError()

    # Other common attributes and methods
    value: Any
    position: int  # for document total order

    def match_name(self, name: str, default_namespace: Optional[str] = None) -> bool:
        """
        Returns `True` if the argument is matching the name of the node, `False`
        otherwise. Raises a ValueError if the argument is used, but it's in a
        wrong format.

        :param name: a fully qualified name, a local name or a wildcard. The accepted \
        wildcard formats are '*', '*:*', '*:local-name' and '{namespace}*'.
        :param default_namespace: the default namespace for unprefixed names.
        """
        return False


class AttributeNode(XPathNode):
    """
    A class for processing XPath attribute nodes.

    :param name: the attribute name.
    :param value: a string value or an XSD attribute when XPath is applied on a schema.
    :param parent: the parent element node.
    :param position: the position of the node in the document.
    :param xsd_type: an optional XSD type associated with the attribute node.
    """
    attributes: None
    children: None = None
    base_uri: None
    document_uri: None
    namespace_nodes: None
    nilled: None
    parent: Optional['ElementNode']

    kind = 'attribute'

    __slots__ = '_name', 'value', 'xsd_type'

    def __init__(self,
                 name: str,
                 value: Union[str, XsdAttributeProtocol],
                 parent: Optional['ElementNode'] = None,
                 position: int = 1,
                 xsd_type: Optional[XsdTypeProtocol] = None) -> None:
        self._name = name
        self.value: Union[str, XsdAttributeProtocol] = value
        self.parent = parent
        self.position = position
        self.xsd_type = xsd_type

    @property
    def is_id(self) -> bool:
        return self._name == XML_ID or self.xsd_type is not None and self.xsd_type.is_key()

    @property
    def is_idrefs(self) -> bool:
        if self.xsd_type is None:
            return False
        root_type = self.xsd_type.root_type
        return root_type.name == XSD_IDREF or root_type.name == XSD_IDREFS

    @property
    def name(self) -> Optional[str]:
        return self._name

    @property
    def type_name(self) -> Optional[str]:
        if self.xsd_type is None:
            return None
        return self.xsd_type.name

    @property
    def string_value(self) -> str:
        if isinstance(self.value, str):
            return self.value
        return str(get_atomic_value(self.value.type))

    @property
    def typed_value(self) -> AtomicValueType:
        if not isinstance(self.value, str):
            return get_atomic_value(self.value.type)
        elif self.xsd_type is None or self.xsd_type.name in _XSD_SPECIAL_TYPES:
            return UntypedAtomic(self.value)
        return cast(AtomicValueType, self.xsd_type.decode(self.value))

    def as_item(self) -> Tuple[str, Union[str, XsdAttributeProtocol]]:
        return self._name, self.value

    def __repr__(self) -> str:
        return '%s(name=%r, value=%r)' % (self.__class__.__name__, self._name, self.value)

    @property
    def path(self) -> str:
        if self.parent is None:
            return f'@{self._name}'
        return f'{self.parent.path}/@{self._name}'

    def match_name(self, name: str, default_namespace: Optional[str] = None) -> bool:
        if '*' in name:
            return match_wildcard(self._name, name)
        else:
            return self._name == name


class NamespaceNode(XPathNode):
    """
    A class for processing
    XPath namespace nodes.

    :param prefix: the namespace prefix.
    :param uri: the namespace URI.
    :param parent: the parent element node.
    :param position: the position of the node in the document.
    """
    attributes: None
    children: None = None
    base_uri: None
    document_uri: None
    is_id: None
    is_idrefs: None
    namespace_nodes: None
    nilled: None
    parent: Optional['ElementNode']
    type_name: None

    kind = 'namespace'

    __slots__ = 'prefix', 'uri'

    def __init__(self,
                 prefix: Optional[str],
                 uri: str,
                 parent: Optional['ElementNode'] = None,
                 position: int = 1) -> None:
        self.prefix = prefix
        self.uri = uri
        self.parent = parent
        self.position = position

    @property
    def name(self) -> Optional[str]:
        return self.prefix

    @property
    def value(self) -> str:
        return self.uri

    def as_item(self) -> Tuple[Optional[str], str]:
        return self.prefix, self.uri

    def __repr__(self) -> str:
        return '%s(prefix=%r, uri=%r)' % (self.__class__.__name__, self.prefix, self.uri)

    @property
    def string_value(self) -> str:
        return self.uri

    @property
    def typed_value(self) -> str:
        return self.uri


class TextNode(XPathNode):
    """
    A class for processing XPath text nodes. An Element's property
    (elem.text or elem.tail) with a `None` value is not a text node.

    :param value: a string value.
    :param parent: the parent element node.
    :param position: the position of the node in the document.
    """
    attributes: None
    children: None = None
    document_uri: None
    is_id: None
    is_idrefs: None
    namespace_nodes: None
    nilled: None
    name: None
    parent: Optional['ElementNode']
    type_name: None

    kind = 'text'
    value: str

    __slots__ = 'value',

    def __init__(self,
                 value: str,
                 parent: Optional['ElementNode'] = None,
                 position: int = 1) -> None:
        self.value = value
        self.parent = parent
        self.position = position

    def __repr__(self) -> str:
        return '%s(value=%r)' % (self.__class__.__name__, self.value)

    @property
    def base_uri(self) -> Optional[str]:
        if isinstance(self.parent, ElementNode):
            return self.parent.elem.get(XML_BASE)
        return None

    @property
    def string_value(self) -> str:
        return self.value

    @property
    def typed_value(self) -> UntypedAtomic:
        return UntypedAtomic(self.value)


class CommentNode(XPathNode):
    """
    A class for processing XPath comment nodes.

    :param elem: the wrapped Comment Element.
    :param parent: the parent element node.
    :param position: the position of the node in the document.
    """
    attributes: None
    children: None = None
    document_uri: None
    is_id: None
    is_idrefs: None
    namespace_nodes: None
    nilled: None
    name: None
    type_name: None

    kind = 'comment'

    __slots__ = 'elem',

    def __init__(self,
                 elem: ElementProtocol,
                 parent: Union['ElementNode', 'DocumentNode', None] = None,
                 position: int = 1) -> None:
        self.elem = elem
        self.parent = parent
        self.position = position

    def __repr__(self) -> str:
        return '%s(elem=%r)' % (self.__class__.__name__, self.elem)

    @property
    def value(self) -> ElementProtocol:
        return self.elem

    @property
    def base_uri(self) -> Optional[str]:
        if self.parent is not None:
            return self.parent.base_uri
        return None

    @property
    def string_value(self) -> str:
        return self.elem.text or ''

    @property
    def typed_value(self) -> str:
        return self.elem.text or ''


class ProcessingInstructionNode(XPathNode):
    """
    A class for XPath processing instruction nodes.

    :param elem: the wrapped Processing Instruction Element.
    :param parent: the parent element node.
    :param position: the position of the node in the document.
    """
    attributes: None
    children: None = None
    document_uri: None
    is_id: None
    is_idrefs: None
    namespace_nodes: None
    nilled: None
    type_name: None

    kind = 'processing-instruction'

    __slots__ = 'elem',

    def __init__(self,
                 elem: ElementProtocol,
                 parent: Union['ElementNode', 'DocumentNode', None] = None,
                 position: int = 1) -> None:
        self.elem = elem
        self.parent = parent
        self.position = position

    def __repr__(self) -> str:
        return '%s(elem=%r)' % (self.__class__.__name__, self.elem)

    @property
    def value(self) -> ElementProtocol:
        return self.elem

    @property
    def base_uri(self) -> Optional[str]:
        if self.parent is not None:
            return self.parent.base_uri
        return None

    @property
    def name(self) -> str:
        try:
            # an lxml PI
            return cast(str, self.elem.target)  # type: ignore[attr-defined]
        except AttributeError:
            return cast(str, self.elem.text).split(' ', maxsplit=1)[0]

    @property
    def string_value(self) -> str:
        return self.elem.text or ''

    @property
    def typed_value(self) -> str:
        return self.elem.text or ''


class ElementNode(XPathNode):
    """
    A class for processing XPath element nodes that uses lazy properties to
    reduce the average load of tree processing.

    :param elem: the wrapped Element or XSD schema/element.
    :param parent: the parent document node or element node.
    :param position: the position of the node in the document.
    :param nsmap: an optional mapping from prefix to namespace URI.
    :param xsd_type: an optional XSD type associated with the element node.
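
The lazy-property idea can be sketched with the standard library alone. This is
only an illustration under stated assumptions: the class `LazyAttributes` and its
`builds` counter are hypothetical names, not part of this module.

```python
from typing import List, Optional, Tuple
from xml.etree import ElementTree


class LazyAttributes:
    # Hypothetical stand-in for an element wrapper: the list of attribute
    # items is built on first access and cached for later calls.

    def __init__(self, elem: ElementTree.Element) -> None:
        self.elem = elem
        self._attributes: Optional[List[Tuple[str, str]]] = None
        self.builds = 0  # counts how many times the list is actually built

    @property
    def attributes(self) -> List[Tuple[str, str]]:
        if self._attributes is None:
            self.builds += 1
            self._attributes = list(self.elem.attrib.items())
        return self._attributes


node = LazyAttributes(ElementTree.XML('<a x="1" y="2"/>'))
first = node.attributes   # builds and caches the list
second = node.attributes  # returns the cached list
```

Nothing is built for elements whose attributes are never requested, which is
where the saving comes from when only part of a tree is visited.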
""" children: List[ChildNodeType] document_uri: None kind = 'element' elem: Union[ElementProtocol, SchemaElemType] nsmap: MutableMapping[Optional[str], str] elements: Optional[ElementMapType] _namespace_nodes: Optional[List['NamespaceNode']] _attributes: Optional[List['AttributeNode']] __slots__ = 'nsmap', 'elem', 'xsd_type', 'elements', \ '_namespace_nodes', '_attributes', 'children' def __init__(self, elem: Union[ElementProtocol, SchemaElemType], parent: Optional[Union['ElementNode', 'DocumentNode']] = None, position: int = 1, nsmap: Optional[MutableMapping[Any, str]] = None, xsd_type: Optional[XsdTypeProtocol] = None) -> None: self.elem = elem self.parent = parent self.position = position self.xsd_type = xsd_type self.elements = None self._namespace_nodes = None self._attributes = None self.children = [] if nsmap is not None: self.nsmap = nsmap else: try: self.nsmap = cast(Dict[Any, str], getattr(elem, 'nsmap')) except AttributeError: self.nsmap = {} def __repr__(self) -> str: return '%s(elem=%r)' % (self.__class__.__name__, self.elem) def __getitem__(self, i: Union[int, slice]) -> Union[ChildNodeType, List[ChildNodeType]]: return self.children[i] def __len__(self) -> int: return len(self.children) def __iter__(self) -> Iterator[ChildNodeType]: yield from self.children @property def value(self) -> Union[ElementProtocol, SchemaElemType]: return self.elem @property def is_id(self) -> bool: return False @property def is_idrefs(self) -> bool: return False @property def name(self) -> str: return self.elem.tag @property def type_name(self) -> Optional[str]: if self.xsd_type is None: return None return self.xsd_type.name @property def base_uri(self) -> Optional[str]: return self.elem.get(XML_BASE) @property def nilled(self) -> bool: return self.elem.get(XSI_NIL) in ('true', '1') @property def string_value(self) -> str: if self.xsd_type is not None and self.xsd_type.is_element_only(): # Element-only text content is normalized return ''.join(etree_iter_strings(self.elem, 
                                              normalize=True))
        return ''.join(etree_iter_strings(self.elem))

    @property
    def typed_value(self) -> Optional[AtomicValueType]:
        if self.xsd_type is None or \
                self.xsd_type.name in _XSD_SPECIAL_TYPES or \
                self.xsd_type.has_mixed_content():
            return UntypedAtomic(''.join(etree_iter_strings(self.elem)))
        elif self.xsd_type.is_element_only() or self.xsd_type.is_empty():
            return None
        elif self.elem.get(XSI_NIL) and getattr(self.xsd_type.parent, 'nillable', None):
            return None

        if self.elem.text is not None:
            value = self.xsd_type.decode(self.elem.text)
        elif self.elem.get(XSI_NIL) in ('1', 'true'):
            return ''
        else:
            value = self.xsd_type.decode(self.elem.text)

        return cast(Optional[AtomicValueType], value)

    @property
    def namespace_nodes(self) -> List['NamespaceNode']:
        if self._namespace_nodes is None:
            # Lazy generation of namespace nodes of the element
            position = self.position + 1
            self._namespace_nodes = [NamespaceNode('xml', XML_NAMESPACE, self, position)]
            position += 1
            if self.nsmap:
                for pfx, uri in self.nsmap.items():
                    if pfx != 'xml':
                        self._namespace_nodes.append(NamespaceNode(pfx, uri, self, position))
                        position += 1

        return self._namespace_nodes

    @property
    def attributes(self) -> List['AttributeNode']:
        if self._attributes is None:
            position = self.position + len(self.nsmap) + int('xml' not in self.nsmap)
            self._attributes = [
                AttributeNode(name, value, self, pos)
                for pos, (name, value) in enumerate(self.elem.attrib.items(), position)
            ]
        return self._attributes

    def is_schema_element(self) -> bool:
        return hasattr(self.elem, 'name') and hasattr(self.elem, 'type')

    @property
    def path(self) -> str:
        """Returns an absolute path for the node."""
        path = []
        item: Any = self
        while True:
            if isinstance(item, ElementNode):
                path.append(item.elem.tag)
            item = item.parent
            if item is None:
                return '/{}'.format('/'.join(reversed(path)))

    def match_name(self, name: str, default_namespace: Optional[str] = None) -> bool:
        if '*' in name:
            return match_wildcard(self.elem.tag, name)
        elif not name:
            return not self.elem.tag
        elif hasattr(self.elem, 'type'):
            return cast(XsdElementProtocol, self.elem).is_matching(name, default_namespace)
        elif name[0] == '{' or default_namespace is None:
            return self.elem.tag == name

        if None in self.nsmap:
            default_namespace = self.nsmap[None]  # lxml element in-scope namespaces

        if default_namespace:
            return self.elem.tag == '{%s}%s' % (default_namespace, name)
        return self.elem.tag == name

    def get_element_node(self, elem: Union[ElementProtocol, SchemaElemType]) \
            -> Optional['ElementNode']:
        if self.elements is not None:
            return self.elements.get(elem)

        # Fallback if there is not the map of elements but do not expand lazy elements
        for node in self.iter():
            if isinstance(node, ElementNode) and elem is node.elem:
                return node
        else:
            return None

    def iter(self) -> Iterator[XPathNode]:
        # Iterate the tree not including the not built lazy components.
        yield self

        iterators: List[Any] = []
        children: Iterator[Any] = iter(self.children)

        if self._namespace_nodes:
            yield from self._namespace_nodes
        if self._attributes:
            yield from self._attributes

        while True:
            for child in children:
                yield child

                if isinstance(child, ElementNode):
                    if child._namespace_nodes:
                        yield from child._namespace_nodes
                    if child._attributes:
                        yield from child._attributes

                    if child.children:
                        iterators.append(children)
                        children = iter(child.children)
                        break
            else:
                try:
                    children = iterators.pop()
                except IndexError:
                    return

    def iter_document(self) -> Iterator[XPathNode]:
        # Iterate the tree but building lazy components.
        # Rarely used, don't need optimization.
        yield self
        yield from self.namespace_nodes
        yield from self.attributes

        for child in self:
            if isinstance(child, ElementNode):
                yield from child.iter()
            else:
                yield child

    def iter_descendants(self, with_self: bool = True) -> Iterator[ChildNodeType]:
        if with_self:
            yield self

        iterators: List[Any] = []
        children: Iterator[Any] = iter(self.children)

        while True:
            for child in children:
                yield child

                if isinstance(child, ElementNode) and child.children:
                    iterators.append(children)
                    children = iter(child.children)
                    break
            else:
                try:
                    children = iterators.pop()
                except IndexError:
                    return


class DocumentNode(XPathNode):
    """
    A class for XPath document nodes.

    :param document: the wrapped ElementTree instance.
    :param position: the position of the node in the document, usually 1, \
    or 0 for lxml standalone root elements with siblings.
    """
    attributes: None = None
    children: List[ChildNodeType]
    is_id: None
    is_idrefs: None
    namespace_nodes: None
    nilled: None
    name: None
    parent: None
    type_name: None

    kind = 'document'
    elements: Dict[ElementProtocol, ElementNode]

    __slots__ = 'document', 'elements', 'children'

    def __init__(self, document: DocumentProtocol, position: int = 1) -> None:
        self.document = document
        self.parent = None
        self.position = position
        self.elements = {}
        self.children = []

    @property
    def base_uri(self) -> Optional[str]:
        if not self.children:
            return None
        return self.getroot().base_uri

    def getroot(self) -> ElementNode:
        for child in self.children:
            if isinstance(child, ElementNode):
                return child
        raise RuntimeError("Missing document root")

    def get_element_node(self, elem: ElementProtocol) -> Optional[ElementNode]:
        return self.elements.get(elem)

    def iter(self) -> Iterator[XPathNode]:
        yield self

        for e in self.children:
            if isinstance(e, ElementNode):
                yield from e.iter()
            else:
                yield e

    def iter_document(self) -> Iterator[XPathNode]:
        yield self

        for e in self.children:
            if isinstance(e, ElementNode):
                yield from e.iter_document()
            else:
                yield e

    def iter_descendants(self, with_self: bool = True) \
            -> Iterator[Union['DocumentNode', ChildNodeType]]:
        if with_self:
            yield self

        for e in self.children:
            if isinstance(e, ElementNode):
                yield from e.iter_descendants()
            else:
                yield e

    def __getitem__(self, i: Union[int, slice]) -> Union[ChildNodeType, List[ChildNodeType]]:
        return self.children[i]

    def __len__(self) -> int:
        return len(self.children)

    def __iter__(self) -> Iterator[ChildNodeType]:
        yield from self.children

    @property
    def value(self) -> DocumentProtocol:
        return self.document

    @property
    def string_value(self) -> str:
        return ''.join(etree_iter_strings(self.document.getroot()))

    @property
    def typed_value(self) -> UntypedAtomic:
        return UntypedAtomic(''.join(etree_iter_strings(self.document.getroot())))

    @property
    def document_uri(self) -> Optional[str]:
        try:
            uri = cast(str, self.document.getroot().attrib[XML_BASE])
            parts = urlparse(uri)
        except (KeyError, ValueError):
            pass
        else:
            if parts.scheme and parts.netloc or parts.path.startswith('/'):
                return uri
        return None


###
# Specialized element nodes

class LazyElementNode(ElementNode):
    """
    A fully lazy element node, slower but better if the node does not have to be
    used in a document context. The node expands its descendants on demand but
    records neither positions nor a map of elements.
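
The order in which children are expanded on demand can be sketched with the
standard library alone. The helper `iter_child_items` is a hypothetical name used
only for illustration; it yields plain tuples instead of node objects.

```python
from typing import Iterator, Tuple
from xml.etree import ElementTree


def iter_child_items(elem: ElementTree.Element) -> Iterator[Tuple[str, str]]:
    # Hypothetical helper: yields (kind, value) pairs in document order,
    # the text before the first child, then each child followed by its tail.
    if elem.text is not None:
        yield 'text', elem.text
    for child in elem:
        if callable(child.tag):
            # ElementTree comments and processing instructions use a
            # factory function as tag instead of a string.
            yield 'other', child.text or ''
        else:
            yield 'element', child.tag
        if child.tail is not None:
            yield 'text', child.tail


root = ElementTree.XML('<a>one<b/>two<c/></a>')
items = list(iter_child_items(root))
```

A `None` value for `text` or `tail` produces no text item, matching the rule that
such a property is not a text node.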
""" __slots__ = () def __iter__(self) -> Iterator[ChildNodeType]: if not self.children: if self.elem.text is not None: self.children.append(TextNode(self.elem.text, self)) if len(self.elem): for elem in self.elem: if not callable(elem.tag): nsmap = cast(Dict[Any, str], getattr(elem, 'nsmap', self.nsmap)) self.children.append(LazyElementNode(elem, self, nsmap=nsmap)) elif elem.tag.__name__ == 'Comment': # type: ignore[attr-defined] self.children.append(CommentNode(elem, self)) else: self.children.append(ProcessingInstructionNode(elem, self)) if elem.tail is not None: self.children.append(TextNode(elem.tail, self)) yield from self.children def iter_descendants(self, with_self: bool = True) -> Iterator[ChildNodeType]: if with_self: yield self for child in self: if isinstance(child, ElementNode): yield from child.iter_descendants() else: yield child class SchemaElementNode(ElementNode): """ An element node class for wrapping the XSD schema and its elements. The resulting structure can be a tree or a set of disjoint trees. With more roots only one of them is the schema node. 
""" __slots__ = '__dict__' ref: Optional['SchemaElementNode'] = None elem: SchemaElemType def __iter__(self) -> Iterator[ChildNodeType]: if self.ref is None: yield from self.children else: yield from self.ref.children @property def attributes(self) -> List['AttributeNode']: if self._attributes is None: position = self.position + len(self.nsmap) + int('xml' not in self.nsmap) self._attributes = [ AttributeNode(name, attr, self, pos, attr.type) for pos, (name, attr) in enumerate(self.elem.attrib.items(), position) ] return self._attributes @property def string_value(self) -> str: if not hasattr(self.elem, 'type'): return '' schema_node = cast(XsdElementProtocol, self.elem) return str(get_atomic_value(schema_node.type)) @property def typed_value(self) -> Optional[AtomicValueType]: if not hasattr(self.elem, 'type'): return UntypedAtomic('') schema_node = cast(XsdElementProtocol, self.elem) return get_atomic_value(schema_node.type) def iter(self) -> Iterator[XPathNode]: yield self iterators: List[Any] = [] children: Iterator[Any] = iter(self.children) if self._namespace_nodes: yield from self._namespace_nodes if self._attributes: yield from self._attributes elements = {self} while True: for child in children: if child in elements: continue yield child elements.add(child) if isinstance(child, ElementNode): if child._namespace_nodes: yield from child._namespace_nodes if child._attributes: yield from child._attributes if child.children: iterators.append(children) children = iter(child.children) break else: try: children = iterators.pop() except IndexError: return def iter_descendants(self, with_self: bool = True) -> Iterator[ChildNodeType]: if with_self: yield self iterators: List[Any] = [] children: Iterator[Any] = iter(self.children) elements = {self} while True: for child in children: if child.ref is not None: child = child.ref if child in elements: continue yield child elements.add(child) if child.children: iterators.append(children) children = iter(child.children) 
                    break
            else:
                try:
                    children = iterators.pop()
                except IndexError:
                    return


elementpath-3.0.2/elementpath/xpath_selectors.py
#
# Copyright (c), 2018, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
from typing import TYPE_CHECKING, Any, Dict, Optional, Iterator, Union, Type

from .namespaces import NamespacesType
from .xpath_nodes import RootArgType
from .xpath_context import XPathContext
from .xpath2 import XPath2Parser

if TYPE_CHECKING:
    from .xpath1 import XPath1Parser
    from .xpath30 import XPath30Parser

    ParserType = Union[Type[XPath1Parser], Type[XPath2Parser], Type[XPath30Parser]]
else:
    ParserType = XPath2Parser


def select(root: RootArgType,
           path: str,
           namespaces: Optional[NamespacesType] = None,
           parser: Optional[ParserType] = None,
           **kwargs: Any) -> Any:
    """
    XPath selector function that applies a *path* expression to the *root* Element.

    :param root: an Element or ElementTree instance.
    :param path: the XPath expression.
    :param namespaces: a dictionary with mapping from namespace prefixes into URIs.
    :param parser: the parser class to use, that is :class:`XPath2Parser` by default.
    :param kwargs: other optional parameters for the parser instance or the dynamic context.
    :return: a list with XPath nodes or a basic type for expressions based \
    on a function or literal.
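
A minimal sketch, using only the standard library, of how the keyword arguments
are divided between the parser and the dynamic context. The names `split_kwargs`
and `CONTEXT_DEFAULTS` are illustrative and not part of this module, and `strict`
stands in for any parser option.

```python
from typing import Any, Dict, Tuple

# Keyword names routed to the dynamic context, with their defaults.
CONTEXT_DEFAULTS: Dict[str, Any] = {
    'item': None, 'position': 1, 'size': 1, 'axis': None,
    'variables': None, 'current_dt': None, 'timezone': None,
}


def split_kwargs(kwargs: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    # Hypothetical helper: returns (parser_kwargs, context_kwargs).
    # Context keys are popped out, whatever remains goes to the parser.
    parser_kwargs = dict(kwargs)
    context_kwargs = {
        name: parser_kwargs.pop(name, default)
        for name, default in CONTEXT_DEFAULTS.items()
    }
    return parser_kwargs, context_kwargs


parser_kwargs, context_kwargs = split_kwargs({'variables': {'x': 1}, 'strict': False})
```

Any key that is not a context argument is left for the parser constructor, so an
unknown keyword is reported by the parser rather than silently ignored.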
""" context_kwargs = { 'item': kwargs.pop('item', None), 'position': kwargs.pop('position', 1), 'size': kwargs.pop('size', 1), 'axis': kwargs.pop('axis', None), 'variables': kwargs.pop('variables', None), 'current_dt': kwargs.pop('current_dt', None), 'timezone': kwargs.pop('timezone', None), } _parser = (parser or XPath2Parser)(namespaces, **kwargs) root_token = _parser.parse(path) context = XPathContext(root, namespaces, **context_kwargs) return root_token.get_results(context) def iter_select(root: RootArgType, path: str, namespaces: Optional[NamespacesType] = None, parser: Optional[ParserType] = None, **kwargs: Any) -> Iterator[Any]: """ A function that creates an XPath selector generator for apply a *path* expression on *root* Element. :param root: an Element or ElementTree instance. :param path: the XPath expression. :param namespaces: a dictionary with mapping from namespace prefixes into URIs. :param parser: the parser class to use, that is :class:`XPath2Parser` for default. :param kwargs: other optional parameters for the parser instance or the dynamic context. :return: a generator of the XPath expression results. """ context_kwargs = { 'item': kwargs.pop('item', None), 'position': kwargs.pop('position', 1), 'size': kwargs.pop('size', 1), 'axis': kwargs.pop('axis', None), 'variables': kwargs.pop('variables', None), 'current_dt': kwargs.pop('current_dt', None), 'timezone': kwargs.pop('timezone', None), } _parser = (parser or XPath2Parser)(namespaces, **kwargs) root_token = _parser.parse(path) context = XPathContext(root, namespaces, **context_kwargs) return root_token.select_results(context) class Selector(object): """ XPath selector class. Create an instance of this class if you want to apply an XPath selector to several target data. :param path: the XPath expression. :param namespaces: a dictionary with mapping from namespace prefixes into URIs. :param parser: the parser class to use, that is :class:`XPath2Parser` for default. 
    :param kwargs: other optional parameters for the XPath parser instance.

    :ivar path: the XPath expression.
    :vartype path: str
    :ivar parser: the parser instance.
    :vartype parser: XPath1Parser or XPath2Parser
    :ivar root_token: the root of tokens tree compiled from path.
    :vartype root_token: XPathToken
    """
    def __init__(self, path: str,
                 namespaces: Optional[NamespacesType] = None,
                 parser: Optional[ParserType] = None,
                 **kwargs: Any) -> None:

        self._variables = kwargs.pop('variables', None)  # For backward compatibility
        self.parser = (parser or XPath2Parser)(namespaces, **kwargs)
        self.path = path
        self.root_token = self.parser.parse(path)

    def __repr__(self) -> str:
        return '%s(path=%r, parser=%s)' % (
            self.__class__.__name__, self.path, self.parser.__class__.__name__
        )

    @property
    def namespaces(self) -> Dict[str, str]:
        """A dictionary with mapping from namespace prefixes into URIs."""
        return self.parser.namespaces

    def select(self, root: RootArgType, **kwargs: Any) -> Any:
        """
        Applies the instance's XPath expression on *root* Element.

        :param root: an Element or ElementTree instance.
        :param kwargs: other optional parameters for the XPath dynamic context.
        :return: a list with XPath nodes or a basic type for expressions based on \
        a function or literal.
        """
        if 'variables' not in kwargs and self._variables:
            kwargs['variables'] = self._variables

        context = XPathContext(root, **kwargs)
        return self.root_token.get_results(context)

    def iter_select(self, root: RootArgType, **kwargs: Any) -> Iterator[Any]:
        """
        Creates an XPath selector generator for applying the instance's XPath
        expression to the *root* Element.

        :param root: an Element or ElementTree instance.
        :param kwargs: other optional parameters for the XPath dynamic context.
        :return: a generator of the XPath expression results.
        """
        if 'variables' not in kwargs and self._variables:
            kwargs['variables'] = self._variables

        context = XPathContext(root, **kwargs)
        return self.root_token.select_results(context)


elementpath-3.0.2/elementpath/xpath_token.py
#
# Copyright (c), 2018-2021, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
"""
XPathToken and helper functions for XPath nodes. XPath error messages and node helper functions
are embedded in XPathToken class, in order to raise errors related to token instances.

In XPath there are 7 kinds of nodes:

    element, attribute, text, namespace, processing-instruction, comment, document

Element-like objects are used for representing elements and comments, ElementTree-like objects
for documents. XPathNode subclasses are used for representing other node types and typed
elements/attributes.
"""
import decimal
import locale
import contextlib
import math
from copy import copy
from decimal import Decimal
from itertools import product
from typing import TYPE_CHECKING, cast, Dict, Optional, List, Tuple, Union, \
    Any, Iterator, SupportsFloat, Type
import urllib.parse

from .exceptions import ElementPathError, ElementPathValueError, ElementPathNameError, \
    ElementPathTypeError, ElementPathSyntaxError, MissingContextError, XPATH_ERROR_CODES
from .helpers import ordinal
from .namespaces import XQT_ERRORS_NAMESPACE, XSD_NAMESPACE, XSD_SCHEMA, \
    XPATH_FUNCTIONS_NAMESPACE, XPATH_MATH_FUNCTIONS_NAMESPACE, XSD_DECIMAL, \
    XSD_ANY_TYPE, XSD_ANY_SIMPLE_TYPE, XSD_ANY_ATOMIC_TYPE
from .xpath_nodes import XPathNode, ElementNode, AttributeNode, \
    DocumentNode, NamespaceNode, SchemaElementNode
from .datatypes import xsd11_atomic_types, AbstractDateTime, AnyURI, \
    UntypedAtomic, Timezone, DateTime10, Date10, DayTimeDuration, Duration, \
    Integer, DoubleProxy10, DoubleProxy, QName, DatetimeValueType, \
    AtomicValueType, AnyAtomicType, Float10, Float
from .protocols import ElementProtocol, DocumentProtocol, XsdAttributeProtocol, \
    XsdElementProtocol, XsdTypeProtocol, XsdSchemaProtocol
from .schema_proxy import AbstractSchemaProxy
from .tdop import Token, MultiLabel
from .xpath_context import XPathContext, XPathSchemaContext

if TYPE_CHECKING:
    from .xpath1 import XPath1Parser
    from .xpath2 import XPath2Parser
    from .xpath30 import XPath30Parser

    XPathParserType = Union[XPath1Parser, XPath2Parser, XPath30Parser]
else:
    XPathParserType = Any


UNICODE_CODEPOINT_COLLATION = "http://www.w3.org/2005/xpath-functions/collation/codepoint"

_XSD_SPECIAL_TYPES = {XSD_ANY_TYPE, XSD_ANY_SIMPLE_TYPE, XSD_ANY_ATOMIC_TYPE}

_CHILD_AXIS_TOKENS = {
    '*', 'node', 'child', 'text', '(name)', ':', '[', 'document-node',
    'element', 'comment', 'processing-instruction', 'schema-element'
}

_LEAF_ELEMENTS_TOKENS = {
    '(name)', '*', ':', '..', '.', '[', 'self', 'child', 'parent',
    'following-sibling',
'preceding-sibling', 'ancestor', 'ancestor-or-self', 'descendant', 'descendant-or-self', 'following', 'preceding' } # Type annotations aliases NargsType = Optional[Union[int, Tuple[int, Optional[int]]]] ClassCheckType = Union[Type[Any], Tuple[Type[Any], ...]] PrincipalNodeType = Union[ElementProtocol, AttributeNode, ElementNode] OperandsType = Tuple[Optional[AtomicValueType], Optional[AtomicValueType]] XPathResultType = Union[ AtomicValueType, ElementProtocol, XsdAttributeProtocol, Tuple[Optional[str], str] ] XPathTokenType = Union['XPathToken', 'XPathAxis', 'XPathFunction', 'XPathConstructor'] XPathFunctionArgType = Union[None, 'XPathToken', XPathNode, AtomicValueType, List[Union['XPathToken', XPathNode, AtomicValueType]]] class XPathToken(Token[XPathTokenType]): """Base class for XPath tokens.""" parser: XPathParserType xsd_types: Optional[Dict[Optional[str], Union[XsdTypeProtocol, List[XsdTypeProtocol]]]] namespace: Optional[str] occurrence: Optional[str] xsd_types = None # for XPath 2.0+ XML Schema types labeling namespace = None # for namespace binding of names and wildcards occurrence = None # occurrence indicator for item types def __call__(self, context: Optional[XPathContext] = None) -> Any: return self.evaluate(context) def evaluate(self, context: Optional[XPathContext] = None) -> Any: """ Evaluate default method for XPath tokens. :param context: The XPath dynamic context. """ return [x for x in self.select(context)] def select(self, context: Optional[XPathContext] = None) -> Iterator[Any]: """ Select operator that generates XPath results. :param context: The XPath dynamic context. 
""" item = self.evaluate(context) if item is not None: if isinstance(item, list): yield from item else: if context is not None: context.item = item yield item def __str__(self) -> str: symbol, label = self.symbol, self.label if symbol == '$': return '$%s variable reference' % (self[0].value if self._items else '') elif symbol == ',': return 'comma operator' if self.parser.version > '1.0' else 'comma symbol' elif symbol == 'function': return str(label) elif label.endswith('function') or label in ('axis', 'sequence type', 'kind test'): return '%r %s' % (symbol, str(label)) return super(XPathToken, self).__str__() @property def source(self) -> str: symbol = self.symbol if self.label == 'axis': # For XPath 2.0 'attribute' multi-role token ('kind test', 'axis') return '%s::%s' % (symbol, self[0].source) elif symbol == ':': if self.occurrence: return str(self.value) + self.occurrence else: return str(self.value) elif symbol == '/' or symbol == '//': if not self: return symbol elif len(self) == 1: return f'{symbol}{self[0].source}' else: return f'{self[0].source}{symbol}{self[1].source}' elif symbol == '(': return '()' if not self else '(%s)' % self[0].source elif symbol == '[': return '%s[%s]' % (self[0].source, self[1].source) elif symbol == ',': return '%s, %s' % (self[0].source, self[1].source) elif symbol == '$' or symbol == '@': return f'{symbol}{self[0].source}' elif symbol == '{': return '{%s}%s' % (self[0].value, self[1].value) elif symbol == 'if': return 'if (%s) then %s else %s' % (self[0].source, self[1].source, self[2].source) elif symbol == 'instance': return '%s instance of %s' % (self[0].source, ''.join(t.source for t in self[1:])) elif symbol == 'treat': return '%s treat as %s' % (self[0].source, ''.join(t.source for t in self[1:])) elif symbol == 'for': return 'for %s return %s' % ( ', '.join('%s in %s' % (self[k].source, self[k + 1].source) for k in range(0, len(self) - 1, 2)), self[-1].source ) return super(XPathToken, self).source @property def 
child_axis(self) -> bool:
        """Is `True` if the token applies the child axis by default, `False` otherwise."""
        if self.symbol not in _CHILD_AXIS_TOKENS:
            return False
        elif self.symbol == '[':
            return self._items[0].child_axis
        elif self.symbol != ':':
            return True
        return not self._items[1].label.endswith('function')

    ###
    # Tokens tree analysis methods
    def iter_leaf_elements(self) -> Iterator[str]:
        """
        Iterates through the leaf elements of the token tree if there are any,
        returning QNames in prefixed format. A leaf element is an element
        positioned at the last path step. Does not consider kind tests and wildcards.
        """
        if self.symbol in ('(name)', ':'):
            yield cast(str, self.value)
        elif self.symbol in ('//', '/'):
            if self._items[-1].symbol in _LEAF_ELEMENTS_TOKENS:
                yield from self._items[-1].iter_leaf_elements()
        elif self.symbol in ('[',):
            yield from self._items[0].iter_leaf_elements()
        else:
            for tk in self._items:
                yield from tk.iter_leaf_elements()

    ###
    # Dynamic context methods
    def get_argument(self, context: Optional[XPathContext],
                     index: int = 0,
                     required: bool = False,
                     default_to_context: bool = False,
                     default: Optional[AtomicValueType] = None,
                     cls: Optional[Type[Any]] = None,
                     promote: Optional[ClassCheckType] = None) -> Any:
        """
        Get the argument value of a function or constructor token. A zero length sequence is
        converted to a `None` value. If the function has no argument returns the context's
        item if the dynamic context is not `None`.

        :param context: the dynamic context.
        :param index: the index of the argument to get, the first by default.
        :param required: if set to `True` missing or empty sequence arguments are not allowed.
        :param default_to_context: if set to `True` then the item of the dynamic context is \
        returned when the argument is missing.
        :param default: the default value returned in case the argument is an empty sequence. \
        If not provided returns `None`.
        :param cls: if a type is provided performs a type checking on item.
:param promote: a class or a tuple of classes that are promoted to `cls` class. """ item: Union[None, ElementProtocol, DocumentProtocol, XPathNode, AnyAtomicType] try: selector = self._items[index].select except IndexError: if default_to_context: if context is None: raise self.missing_context() from None item = context.item if context.item is not None else context.root elif required: msg = "missing %s argument" % ordinal(index + 1) raise self.error('XPST0017', msg) from None else: return default else: item = None for k, result in enumerate(selector(copy(context))): if k == 0: item = result elif self.parser.compatibility_mode: break elif isinstance(context, XPathSchemaContext): # Multiple schema nodes are ignored but do not raise. The target # of schema context selection is XSD type association and multiple # node coherency is already checked at schema level. break else: raise self.wrong_context_type( "a sequence of more than one item is not allowed as argument" ) else: if item is None: if not required: return default ord_arg = ordinal(index + 1) msg = "A not empty sequence required for {} argument" raise self.error('XPTY0004', msg.format(ord_arg)) if cls is not None: return self.validated_value(item, cls, promote) return item def validated_value(self, item: Any, cls: Type[Any], promote: Optional[ClassCheckType] = None) -> Any: """ Type promotion checking (see "function conversion rules" in XPath 2.0 language definition) """ if isinstance(item, (cls, ValueToken)): return item elif promote and isinstance(item, promote): return cls(item) if self.parser.compatibility_mode: if issubclass(cls, str): return self.string_value(item) elif issubclass(cls, float) or issubclass(float, cls): return self.number_value(item) if issubclass(cls, XPathToken) or self.parser.version == '1.0': code = 'XPTY0004' else: value = self.data_value(item) if isinstance(value, cls): return value elif isinstance(value, AnyURI) and issubclass(cls, str): return cls(value) elif isinstance(value, 
UntypedAtomic): try: return cls(value) except (TypeError, ValueError): pass code = 'FOTY0012' if value is None else 'XPTY0004' message = "item type is {!r} instead of {!r}" raise self.error(code, message.format(type(item), cls)) def atomization(self, context: Optional[XPathContext] = None) \ -> Iterator[AtomicValueType]: """ Helper method for value atomization of a sequence. Ref: https://www.w3.org/TR/xpath20/#id-atomization :param context: the XPath dynamic context. """ for item in self.select(context): value = self.data_value(item) if value is None: msg = "argument node {!r} does not have a typed value" raise self.error('FOTY0012', msg.format(item)) elif isinstance(value, list): yield from value else: yield value def get_atomized_operand(self, context: Optional[XPathContext] = None) \ -> Optional[AtomicValueType]: """ Get the atomized value for an XPath operator. :param context: the XPath dynamic context. :return: the atomized value of a single length sequence or `None` if the sequence is empty. """ selector = iter(self.atomization(context)) try: value = next(selector) except StopIteration: return None else: item = getattr(context, 'item', None) try: next(selector) except StopIteration: if isinstance(value, UntypedAtomic): value = str(value) if not isinstance(context, XPathSchemaContext) and \ item is not None and \ self.xsd_types and \ isinstance(value, str): xsd_type = self.get_xsd_type(item) if xsd_type is None or xsd_type.name in _XSD_SPECIAL_TYPES: pass else: try: value = xsd_type.decode(value) except (TypeError, ValueError): msg = "Type {!r} is not appropriate for the context" raise self.wrong_context_type(msg.format(type(value))) return value else: msg = "atomized operand is a sequence of length greater than one" raise self.wrong_context_type(msg) def iter_comparison_data(self, context: XPathContext) -> Iterator[OperandsType]: """ Generates comparison data couples for the general comparison of sequences. 
Different sequences maybe generated with an XPath 2.0 parser, depending on compatibility mode setting. Ref: https://www.w3.org/TR/xpath20/#id-general-comparisons :param context: the XPath dynamic context. """ left_values: Any right_values: Any if self.parser.compatibility_mode: left_values = [x for x in self._items[0].atomization(copy(context))] right_values = [x for x in self._items[1].atomization(copy(context))] # Boolean comparison if one of the results is a single boolean value (1.) try: if isinstance(left_values[0], bool): if len(left_values) == 1: yield left_values[0], self.boolean_value(right_values) return if isinstance(right_values[0], bool): if len(right_values) == 1: yield self.boolean_value(left_values), right_values[0] return except IndexError: return # Converts to float for lesser-greater operators (3.) if self.symbol in ('<', '<=', '>', '>='): yield from product(map(float, left_values), map(float, right_values)) return elif self.parser.version == '1.0': yield from product(left_values, right_values) return else: left_values = self._items[0].atomization(copy(context)) right_values = self._items[1].atomization(copy(context)) for values in product(left_values, right_values): if any(isinstance(x, bool) for x in values): if any(isinstance(x, (str, Integer)) for x in values): msg = "cannot compare {!r} and {!r}" raise TypeError(msg.format(type(values[0]), type(values[1]))) elif any(isinstance(x, Integer) for x in values) and \ any(isinstance(x, str) for x in values): msg = "cannot compare {!r} and {!r}" raise TypeError(msg.format(type(values[0]), type(values[1]))) elif any(isinstance(x, float) for x in values): if isinstance(values[0], decimal.Decimal): yield float(values[0]), values[1] continue elif isinstance(values[1], decimal.Decimal): yield values[0], float(values[1]) continue yield values def select_results(self, context: Optional[XPathContext]) -> Iterator[XPathResultType]: """ Generates formatted XPath results. 
:param context: the XPath dynamic context. """ if context is not None: self.parser.check_variables(context.variables) for result in self.select(context): if not isinstance(result, XPathNode): yield result elif isinstance(result, NamespaceNode): if self.parser.compatibility_mode: yield result.prefix, result.uri else: yield result.uri else: yield result.value def get_results(self, context: XPathContext) -> Union[List[XPathResultType], AtomicValueType]: """ Returns results formatted according to XPath specifications. :param context: the XPath dynamic context. :return: a list or a simple datatype when the result is a single simple type \ generated by a literal or function token. """ if context is not None: self.parser.check_variables(context.variables) results = [] item = None for item in self.select(context): if not isinstance(item, XPathNode): results.append(item) elif isinstance(item, NamespaceNode): if self.parser.compatibility_mode: results.append((item.prefix, item.uri)) else: results.append(item.uri) else: results.append(item.value) if len(results) == 1 and not isinstance(item, (ElementNode, DocumentNode)): if isinstance(item, (bool, int, float, Decimal)): return item elif self.label in ('function', 'literal'): return cast(AtomicValueType, results[0]) return results def get_operands(self, context: XPathContext, cls: Optional[Type[Any]] = None) \ -> OperandsType: """ Returns the operands for a binary operator. Float arguments are converted to decimal if the other argument is a `Decimal` instance. :param context: the XPath dynamic context. :param cls: if a type is provided performs a type checking on item. :return: a couple of values representing the operands. If any operand \ is not available returns a `(None, None)` couple. 
""" op1 = self.get_argument(context, cls=cls) if op1 is None: return None, None elif isinstance(op1, ElementNode): op1 = self._items[0].data_value(op1) op2 = self.get_argument(context, index=1, cls=cls) if op2 is None: return None, None elif isinstance(op2, ElementNode): op2 = self._items[1].data_value(op2) if isinstance(op1, AbstractDateTime) and isinstance(op2, AbstractDateTime): if context is not None and context.timezone is not None: if op1.tzinfo is None: op1.tzinfo = context.timezone if op2.tzinfo is None: op2.tzinfo = context.timezone else: if isinstance(op1, UntypedAtomic): op1 = self.cast_to_double(op1.value) if isinstance(op2, Decimal): return op1, float(op2) if isinstance(op2, UntypedAtomic): op2 = self.cast_to_double(op2.value) if isinstance(op1, Decimal): return float(op1), op2 if isinstance(op1, float): if isinstance(op2, Duration): return Decimal(op1), op2 if isinstance(op2, Decimal): return op1, type(op1)(op2) if isinstance(op2, float): if isinstance(op1, Duration): return op1, Decimal(op2) if isinstance(op1, Decimal): return type(op2)(op1), op2 return op1, op2 def get_absolute_uri(self, uri: str, base_uri: Optional[str] = None, as_string: bool = True) -> Union[str, AnyURI]: """ Obtains an absolute URI from the argument and the static context. :param uri: a string representing an URI. :param base_uri: an alternative base URI, otherwise the base_uri \ of the static context is used. :param as_string: if `True` then returns the URI as a string, otherwise \ returns the URI as xs:anyURI instance. :returns: the argument if it's an absolute URI. Otherwise returns the URI obtained by the join o the base_uri of the static context with the argument. Returns the argument if the base_uri is `None'. 
""" if not base_uri: base_uri = self.parser.base_uri uri_parts: urllib.parse.ParseResult = urllib.parse.urlparse(uri) if uri_parts.scheme or uri_parts.netloc or base_uri is None: return uri if as_string else AnyURI(uri) base_uri_parts: urllib.parse.SplitResult = urllib.parse.urlsplit(base_uri) if base_uri_parts.fragment or not base_uri_parts.scheme and \ not base_uri_parts.netloc and not base_uri_parts.path.startswith('/'): raise self.error('FORG0002', '{!r} is not suitable as base URI'.format(base_uri)) if uri_parts.path.startswith('/') and base_uri_parts.path not in ('', '/'): return uri if as_string else AnyURI(uri) if as_string: return urllib.parse.urljoin(base_uri, uri) return AnyURI(urllib.parse.urljoin(base_uri, uri)) def get_namespace(self, prefix: str) -> str: """ Resolves a prefix to a namespace raising an error (FONS0004) if the prefix is not found in the namespace map. """ try: return self.parser.namespaces[prefix] except KeyError as err: msg = 'no namespace found for prefix %r' % str(err) raise self.error('FONS0004', msg) from None def bind_namespace(self, namespace: str) -> None: """ Bind a token with a namespace. The token has to be a name, a name wildcard, a function or a constructor, otherwise a syntax error is raised. Functions and constructors must be limited to its namespaces. 
""" if self.symbol in ('(name)', '*') or isinstance(self, ProxyToken): pass elif namespace == self.parser.function_namespace: if self.label != 'function': msg = "a name, a wildcard or a function expected" raise self.wrong_syntax(msg, code='XPST0017') elif isinstance(self.label, MultiLabel): self.label = 'function' elif namespace == XSD_NAMESPACE: if self.label != 'constructor function': msg = "a name, a wildcard or a constructor function expected" raise self.wrong_syntax(msg, code='XPST0017') elif isinstance(self.label, MultiLabel): self.label = 'constructor function' elif namespace == XPATH_MATH_FUNCTIONS_NAMESPACE: if self.label != 'math function': msg = "a name, a wildcard or a math function expected" raise self.wrong_syntax(msg, code='XPST0017') elif isinstance(self.label, MultiLabel): self.label = 'math function' elif not self.label.endswith('function'): msg = "a name, a wildcard or a function expected" raise self.wrong_syntax(msg, code='XPST0017') elif self.namespace and namespace != self.namespace: msg = "unmatched namespace" raise self.wrong_syntax(msg, code='XPST0017') self.namespace = namespace def adjust_datetime(self, context: XPathContext, cls: Type[DatetimeValueType]) \ -> Optional[Union[DatetimeValueType, DayTimeDuration]]: """ XSD datetime adjust function helper. :param context: the XPath dynamic context. :param cls: the XSD datetime subclass to use. :return: an empty list if there is only one argument that is the empty sequence \ or the adjusted XSD datetime instance. 
""" timezone: Optional[Any] item: Optional[DatetimeValueType] _item: Union[DatetimeValueType, DayTimeDuration] if len(self) == 1: item = self.get_argument(context, cls=cls) if item is None: return None timezone = getattr(context, 'timezone', None) else: item = self.get_argument(context, cls=cls) timezone = self.get_argument(context, 1, cls=DayTimeDuration) if timezone is not None: try: timezone = Timezone.fromduration(timezone) except ValueError as err: raise self.error('FODT0003', str(err)) from None if item is None: return None _item = item _tzinfo = _item.tzinfo try: if _tzinfo is not None and timezone is not None: if isinstance(_item, DateTime10): _item += timezone.offset elif not isinstance(item, Date10): _item += timezone.offset - _tzinfo.offset elif timezone.offset < _tzinfo.offset: _item -= timezone.offset - _tzinfo.offset _item -= DayTimeDuration.fromstring('P1D') except OverflowError as err: raise self.error('FODT0001', str(err)) from None if not isinstance(_item, DayTimeDuration): _item.tzinfo = timezone return _item @contextlib.contextmanager def use_locale(self, collation: str) -> Iterator[None]: """A context manager for use a locale setting for string comparison in a code block.""" loc = locale.getlocale(locale.LC_COLLATE) if collation == UNICODE_CODEPOINT_COLLATION or collation == 'collation/codepoint': collation = 'en_US.UTF-8' elif collation is None: raise self.error('XPTY0004', 'collation cannot be an empty sequence') try: locale.setlocale(locale.LC_COLLATE, collation) except locale.Error: raise self.error('FOCH0002', 'Unsupported collation %r' % collation) from None else: yield finally: locale.setlocale(locale.LC_COLLATE, loc) ### # XSD types related methods def select_xsd_nodes(self, schema_context: XPathSchemaContext, name: str) \ -> Iterator[Union[None, AttributeNode, ElementNode]]: """ Selector for XSD nodes (elements, attributes and schemas). 
If there is a match with an attribute or an element the node's type is added to matching types of the token. For each matching elements or attributes yields tuple nodes containing the node, its type and a compatible value for doing static evaluation. For matching schemas yields the original instance. :param schema_context: an XPathSchemaContext instance. :param name: a QName in extended format. """ xsd_node: Any xsd_root = cast(Union[XsdSchemaProtocol, XsdElementProtocol], schema_context.root.value) for xsd_node in schema_context.iter_children_or_self(): if xsd_node is None: if name == XSD_SCHEMA == schema_context.root.elem.tag: yield None elif isinstance(xsd_node, AttributeNode): assert not isinstance(xsd_node.value, str) if not xsd_node.value.is_matching(name): continue if xsd_node.name is not None: self.add_xsd_type(xsd_node) else: # node is an XSD attribute wildcard xsd_attribute = xsd_root.maps.attributes.get(name) if xsd_attribute is not None: self.add_xsd_type(xsd_attribute) yield xsd_node elif isinstance(xsd_node, SchemaElementNode): if name == XSD_SCHEMA == xsd_node.elem.tag: # The element is a schema yield xsd_node elif xsd_node.elem.is_matching(name, self.parser.namespaces.get('')): if xsd_node.elem.name is not None: self.add_xsd_type(xsd_node) else: # node is an XSD element wildcard xsd_element = xsd_root.maps.elements.get(name) if xsd_element is not None: for child in schema_context.root.children: if child.value is xsd_element: xsd_node = child self.add_xsd_type(xsd_node) break else: self.add_xsd_type(xsd_element) yield xsd_node def add_xsd_type(self, item: Any) -> Optional[XsdTypeProtocol]: """ Adds an XSD type association from an item. The association is added using the item's name and type. 
""" if isinstance(item, XPathNode): item = item.value # TODO: replace with protocol check (XsdAttributeProtocol, XsdElementProtocol) if not hasattr(item, 'type') or not hasattr(item, 'xsd_version'): return None name: str = item.name xsd_type: XsdTypeProtocol = item.type if self.xsd_types is None: self.xsd_types = {name: xsd_type} else: obj = self.xsd_types.get(name) if obj is None: self.xsd_types[name] = xsd_type elif not isinstance(obj, list): if obj is not xsd_type: self.xsd_types[name] = [obj, xsd_type] elif xsd_type not in obj: obj.append(xsd_type) return xsd_type def get_xsd_type(self, item: Union[str, PrincipalNodeType]) \ -> Optional[XsdTypeProtocol]: """ Returns the XSD type associated with an item. Match by item's name and XSD validity. Returns `None` if no XSD type is matching. :param item: a string or an AttributeNode or an element. """ if not self.xsd_types or isinstance(self.xsd_types, AbstractSchemaProxy): return None elif isinstance(item, AttributeNode): if item.xsd_type is not None: return item.xsd_type xsd_type = self.xsd_types.get(item.name) elif isinstance(item, ElementNode): if item.xsd_type is not None: return item.xsd_type xsd_type = self.xsd_types.get(item.elem.tag) elif isinstance(item, str): xsd_type = self.xsd_types.get(item) else: return None x: XsdTypeProtocol if not xsd_type: return None elif not isinstance(xsd_type, list): return xsd_type elif isinstance(item, AttributeNode): for x in xsd_type: if x.is_valid(item.value): return x elif isinstance(item, ElementNode): for x in xsd_type: if x.is_simple(): if x.is_valid(item.elem.text): return x elif x.is_valid(item.elem): return x return xsd_type[0] def get_typed_node(self, item: PrincipalNodeType) -> PrincipalNodeType: """ Returns a typed node if the item is matching an XSD type. Ref: https://www.w3.org/TR/xpath20/#id-processing-model https://www.w3.org/TR/xpath20/#id-static-analysis https://www.w3.org/TR/xquery-semantics/ :param item: an untyped attribute or element. 
:return: a typed AttributeNode/ElementNode if the argument is matching \ any associated XSD type. """ if isinstance(item, (ElementNode, AttributeNode)) and item.xsd_type is not None: return item xsd_type = self.get_xsd_type(item) if xsd_type is not None and isinstance(item, (ElementNode, AttributeNode)): item.xsd_type = xsd_type return item def cast_to_qname(self, qname: str) -> QName: """Cast a prefixed qname string to a QName object.""" try: if ':' not in qname: return QName(self.parser.namespaces.get(''), qname.strip()) pfx, _ = qname.strip().split(':') return QName(self.parser.namespaces[pfx], qname) except ValueError: msg = 'invalid value {!r} for an xs:QName'.format(qname.strip()) raise self.error('FORG0001', msg) except KeyError as err: raise self.error('FONS0004', 'no namespace found for prefix {}'.format(err)) def cast_to_double(self, value: Union[SupportsFloat, str]) -> float: """Cast a value to xs:double.""" try: if self.parser.xsd_version == '1.0': return cast(float, DoubleProxy10(value)) return cast(float, DoubleProxy(value)) except ValueError as err: raise self.error('FORG0001', str(err)) # str or UntypedAtomic def cast_to_primitive_type(self, obj: Any, type_name: str) -> Any: if obj is None or not type_name.startswith('xs:') or type_name.count(':') != 1: return obj values = obj if isinstance(obj, list) else [obj] if not values: return obj if type_name[-1] in '+*?': type_name = type_name[:-1] result = [] for v in values: if self.parser.is_instance(v, XSD_DECIMAL): if type_name == 'xs:double': result.append(float(v)) continue elif type_name == 'xs:float': if self.parser.xsd_version == '1.0': result.append(Float10(v)) else: result.append(Float(v)) continue result.append(v) if isinstance(obj, list) or len(result) > 1: return result return result[0] ### # XPath data accessors base functions def boolean_value(self, obj: Any) -> bool: """ The effective boolean value, as computed by fn:boolean(). 
""" if isinstance(obj, list): if not obj: return False elif isinstance(obj[0], XPathNode): return True elif len(obj) > 1: message = "effective boolean value is not defined for a sequence " \ "of two or more items not starting with an XPath node." raise self.error('FORG0006', message) else: obj = obj[0] if isinstance(obj, (int, str, UntypedAtomic, AnyURI)): # Include bool return bool(obj) elif isinstance(obj, (float, Decimal)): return False if math.isnan(obj) else bool(obj) elif obj is None: return False elif isinstance(obj, XPathNode): return True else: message = "effective boolean value is not defined for {!r}.".format(type(obj)) raise self.error('FORG0006', message) def data_value(self, obj: Any) -> Optional[AtomicValueType]: """ The typed value, as computed by fn:data() on each item. Returns an instance of UntypedAtomic for untyped data. https://www.w3.org/TR/xpath20/#dt-typed-value """ if obj is None: return None elif isinstance(obj, XPathNode): try: return obj.typed_value except (TypeError, ValueError) as err: raise self.error('XPDY0050', str(err)) elif isinstance(obj, XPathFunction): raise self.error('FOTY0013', f"{obj.label!r} has no typed value") else: return cast(AtomicValueType, obj) def string_value(self, obj: Any) -> str: """ The string value, as computed by fn:string(). """ if obj is None: return '' elif isinstance(obj, XPathNode): return obj.string_value elif isinstance(obj, bool): return 'true' if obj else 'false' elif isinstance(obj, Decimal): value = format(obj, 'f') if '.' in value: return value.rstrip('0').rstrip('.') return value elif isinstance(obj, float): if math.isnan(obj): return 'NaN' elif math.isinf(obj): return str(obj).upper() value = str(obj) if '.' 
in value: value = value.rstrip('0').rstrip('.') if '+' in value: value = value.replace('+', '') if 'e' in value: return value.upper() return value elif isinstance(obj, XPathFunction): raise self.error('FOTY0014', f"{obj.label!r} has no string value") return str(obj) def number_value(self, obj: Any) -> float: """ The numeric value, as computed by fn:number() on each item. Returns a float value. """ try: return float(self.string_value(obj) if isinstance(obj, XPathNode) else obj) except (TypeError, ValueError): return float('nan') ### # Error handling helpers def error_code(self, code: str) -> str: """Returns a prefixed error code.""" if self.parser.namespaces.get('err') == XQT_ERRORS_NAMESPACE: return 'err:%s' % code for pfx, uri in self.parser.namespaces.items(): if uri == XQT_ERRORS_NAMESPACE: return '%s:%s' % (pfx, code) if pfx else code return code # returns an unprefixed code (without prefix the namespace is not checked) def error(self, code: Union[str, QName], message_or_error: Union[None, str, Exception] = None) -> ElementPathError: """ Returns an XPath error instance related with a code. An XPath/XQuery/XSLT error code is an alphanumeric token starting with four uppercase letters and ending with four digits. :param code: the error code as QName or string. :param message_or_error: an optional custom message or an exception. 
""" namespace: Optional[str] if isinstance(code, QName): namespace = code.uri code = code.local_name elif ':' not in code: namespace = None else: try: prefix, code = code.split(':') except ValueError: raise ElementPathValueError( message='%r is not a prefixed name' % code, code=self.error_code('XPTY0004'), token=self, ) else: namespace = self.parser.namespaces.get(prefix) if namespace and namespace != XQT_ERRORS_NAMESPACE: raise ElementPathValueError( message='%r namespace is required' % XQT_ERRORS_NAMESPACE, code=self.error_code('XPTY0004'), token=self, ) try: error_class, default_message = XPATH_ERROR_CODES[code] except KeyError: raise ElementPathValueError( message='unknown XPath error code %r' % code, code=self.error_code('XPTY0004'), token=self, ) if message_or_error is None: message = default_message elif isinstance(message_or_error, str): message = message_or_error elif isinstance(message_or_error, ElementPathError): message = message_or_error.message else: message = str(message_or_error) return error_class(message, code=self.error_code(code), token=self) # Shortcuts for XPath errors, only the wrong_syntax def expected(self, *symbols: str, message: Optional[str] = None, code: str = 'XPST0003') -> None: if symbols and self.symbol not in symbols: raise self.wrong_syntax(message, code) def unexpected(self, *symbols: str, message: Optional[str] = None, code: str = 'XPST0003') -> None: if not symbols or self.symbol in symbols: raise self.wrong_syntax(message, code) def wrong_syntax(self, message: Optional[str] = None, # type: ignore[override] code: str = 'XPST0003') -> ElementPathError: if self.label == 'function': code = 'XPST0017' if message: return self.error(code, message) error = super(XPathToken, self).wrong_syntax(message) return self.error(code, str(error)) def wrong_value(self, message: Optional[str] = None) -> ElementPathValueError: return cast(ElementPathValueError, self.error('FOCA0002', message)) def wrong_type(self, message: Optional[str] = None) -> 
ElementPathTypeError: return cast(ElementPathTypeError, self.error('FORG0006', message)) def missing_context(self, message: Optional[str] = None) -> MissingContextError: return cast(MissingContextError, self.error('XPDY0002', message)) def wrong_context_type(self, message: Optional[str] = None) -> ElementPathTypeError: return cast(ElementPathTypeError, self.error('XPTY0004', message)) def missing_name(self, message: Optional[str] = None) -> ElementPathNameError: return cast(ElementPathNameError, self.error('XPST0008', message)) def missing_axis(self, message: Optional[str] = None) \ -> Union[ElementPathNameError, ElementPathSyntaxError]: if self.parser.compatibility_mode: return cast(ElementPathNameError, self.error('XPST0010', message)) return cast(ElementPathSyntaxError, self.error('XPST0003', message)) def wrong_nargs(self, message: Optional[str] = None) -> ElementPathTypeError: return cast(ElementPathTypeError, self.error('XPST0017', message)) def wrong_sequence_type(self, message: Optional[str] = None) -> ElementPathTypeError: return cast(ElementPathTypeError, self.error('XPDY0050', message)) def unknown_atomic_type(self, message: Optional[str] = None) -> ElementPathNameError: return cast(ElementPathNameError, self.error('XPST0051', message)) class XPathAxis(XPathToken): pattern = r'\b[^\d\W][\w.\-\xb7\u0300-\u036F\u203F\u2040]*(?=\s*\:\:|\s*\(\:.*\:\)\s*\:\:)' label = 'axis' reverse_axis: bool = False def nud(self) -> 'XPathAxis': self.parser.advance('::') self.parser.expected_name( '(name)', '*', '{', 'Q{', 'text', 'node', 'document-node', 'comment', 'processing-instruction', 'element', 'attribute', 'schema-attribute', 'schema-element', 'namespace-node', ) self._items[:] = self.parser.expression(rbp=self.rbp), return self @property def source(self) -> str: return '%s::%s' % (self.symbol, self[0].source) class ValueToken(XPathToken): """ A dummy token for encapsulating a value. 
""" symbol = '(value)' def evaluate(self, context: Optional[XPathContext] = None) -> Any: return self.value def select(self, context: Optional[XPathContext] = None) -> Iterator[Any]: if isinstance(self.value, list): yield from self.value elif self.value is not None: yield self.value class ProxyToken(XPathToken): """ A proxy token for resolving or calling namespace related functions. TODO: adding dynamic function definitions and resolving possible conflicts for axes (e.g.: defining tns:child() function) """ label = 'proxy function' def nud(self) -> XPathToken: namespace = self.namespace or XPATH_FUNCTIONS_NAMESPACE expanded_name = '{%s}%s' % (namespace, self.value) try: token = self.parser.symbol_table[expanded_name](self.parser) except KeyError: if self.namespace == XSD_NAMESPACE: raise self.error('XPST0017', 'unknown constructor function {!r}'.format(self.symbol)) else: raise self.error('XPST0017', 'unknown function {!r}'.format(self.symbol)) else: if self.parser.next_token.symbol == '#': if self.parser.version >= '2.0': return token return token.nud() class XPathFunction(XPathToken): """ A token for processing XPath functions. """ _name: Optional[QName] = None pattern = r'(? 
None: super().__init__(parser) if isinstance(nargs, int) and nargs != self.nargs: if nargs < 0: raise self.error('XPST0017', 'number of arguments must be non negative') elif self.nargs is None: self.nargs = nargs elif isinstance(self.nargs, int): raise self.error('XPST0017', 'incongruent number of arguments') elif self.nargs[0] > nargs or self.nargs[1] is not None and self.nargs[1] < nargs: raise self.error('XPST0017', 'incongruent number of arguments') else: self.nargs = nargs def __call__(self, context: Optional[XPathContext] = None, *args: XPathFunctionArgType) -> Any: # Check provided argument with arity if self.nargs is None or self.nargs == len(args): pass elif isinstance(self.nargs, tuple): if len(args) < self.nargs[0]: raise self.error('XPTY0004', "missing required arguments") elif self.nargs[1] is not None and len(args) > self.nargs[1]: raise self.error('XPTY0004', "too many arguments") elif self.nargs > len(args): raise self.error('XPTY0004', "missing required arguments") else: raise self.error('XPTY0004', "too many arguments") context = copy(context) if self.variables is not None and context is not None: context.variables.update(self.variables) if self.symbol == 'function': if context is None: raise self.missing_context() elif not args and self: if context.item is None: if isinstance(context.root, DocumentNode): context.item = context.root.getroot() else: context.item = context.root args = cast(Tuple[Union[XPathNode, XPathToken, AtomicValueType]], (context.item,)) partial_function = False if self.variables is None: self.variables = {} for variable, sequence_type, value in zip(self, self.sequence_types, args): varname = cast(str, variable[0].value) if isinstance(value, XPathToken) and value.symbol == '?': partial_function = True continue elif isinstance(value, XPathFunction) and sequence_type.startswith('function('): if not value.match_function_test(sequence_type, as_argument=True): msg = "argument {!r}: {} does not match sequence type {}" raise 
self.error('XPTY0004', msg.format(varname, value, sequence_type)) elif not self.parser.match_sequence_type(value, sequence_type): value = self.cast_to_primitive_type(value, sequence_type) if not self.parser.match_sequence_type(value, sequence_type): msg = "argument {!r}: {} does not match sequence type {}" raise self.error('XPTY0004', msg.format(varname, value, sequence_type)) context.variables[varname] = self.variables[varname] = value if partial_function: return self elif self.label == 'partial function': for value, tk in zip(args, filter(lambda x: x.symbol == '?', self)): if isinstance(value, XPathToken): tk.value = value.evaluate(context) else: tk.value = value else: self.clear() for value in args: if isinstance(value, XPathToken): self._items.append(value) else: self._items.append(ValueToken(self.parser, value=value)) if any(tk.symbol == '?' for tk in self._items): self._partial_function() return self if isinstance(self.label, MultiLabel): # Disambiguate multi-label tokens if self.namespace == XSD_NAMESPACE and \ 'constructor function' in self.label.values: self.label = 'constructor function' else: for label in self.label.values: if label.endswith('function'): self.label = label break if self.label == 'partial function': result = self._partial_evaluate(context) elif self.body is not None: assert self.label == 'inline function' result = self.body.evaluate(context) else: result = self.evaluate(context) if isinstance(result, XPathToken) and result.symbol == '?': pass elif not self.parser.match_sequence_type(result, self.sequence_types[-1]): result = self.cast_to_primitive_type(result, self.sequence_types[-1]) if not self.parser.match_sequence_type(result, self.sequence_types[-1]): msg = "{!r} does not match sequence type {}" self.parser.match_sequence_type(result, self.sequence_types[-1]) raise self.error('XPTY0004', msg.format(result, self.sequence_types[-1])) return result @property def source(self) -> str: if self.label == 'function test': if 
len(self.sequence_types) == 1 and self.sequence_types[0] == '*': return 'function(*)' else: return 'function(%s) as %s' % ( ', '.join(self.sequence_types[:-1]), self.sequence_types[-1] ) elif self.label in ('sequence type', 'kind test', ''): return '%s(%s)%s' % ( self.symbol, ', '.join(item.source for item in self), self.occurrence or '' ) return '%s(%s)' % (self.symbol, ', '.join(item.source for item in self)) @property def name(self) -> Optional[QName]: if self._name is not None: return self._name elif self.symbol == 'function': return None elif self.label == 'partial function': return None elif not self.namespace or self.namespace == XPATH_FUNCTIONS_NAMESPACE: self._name = QName(XPATH_FUNCTIONS_NAMESPACE, 'fn:%s' % self.symbol) elif self.namespace == XSD_NAMESPACE: self._name = QName(XSD_NAMESPACE, 'xs:%s' % self.symbol) elif self.namespace == XPATH_MATH_FUNCTIONS_NAMESPACE: self._name = QName(XPATH_MATH_FUNCTIONS_NAMESPACE, 'math:%s' % self.symbol) else: for pfx, uri in self.parser.namespaces.items(): if uri == self.namespace: self._name = QName(uri, f'{pfx}:{self.symbol}') break else: self._name = QName(self.namespace, self.symbol) return self._name @property def arity(self) -> int: if isinstance(self.nargs, int): return self.nargs return len(self._items) def nud(self) -> 'XPathFunction': code = 'XPST0017' if self.label == 'function' else 'XPST0003' self.value = None self.parser.advance('(') if self.nargs is None: del self._items[:] if self.parser.next_token.symbol in (')', '(end)'): raise self.error(code, 'at least an argument is required') while True: self.append(self.parser.expression(5)) if self.parser.next_token.symbol != ',': break self.parser.advance() elif self.nargs == 0: if self.parser.next_token.symbol != ')': if self.parser.next_token.symbol != '(end)': raise self.error(code, '%s has no arguments' % str(self)) raise self.parser.next_token.wrong_syntax() self.parser.advance() return self else: if isinstance(self.nargs, (tuple, list)): min_args, 
max_args = self.nargs else: min_args = max_args = self.nargs k = 0 while k < min_args: if self.parser.next_token.symbol in (')', '(end)'): msg = 'Too few arguments: expected at least %s arguments' % min_args raise self.wrong_nargs(msg if min_args > 1 else msg[:-1]) self._items[k:] = self.parser.expression(5), k += 1 if k < min_args: if self.parser.next_token.symbol == ')': msg = 'Too few arguments: expected at least %s arguments' % min_args raise self.error(code, msg if min_args > 1 else msg[:-1]) self.parser.advance(',') while max_args is None or k < max_args: if self.parser.next_token.symbol == ',': self.parser.advance(',') self._items[k:] = self.parser.expression(5), elif k == 0 and self.parser.next_token.symbol != ')': self._items[k:] = self.parser.expression(5), else: break # pragma: no cover k += 1 if self.parser.next_token.symbol == ',': msg = 'Too many arguments: expected at most %s arguments' % max_args raise self.error(code, msg if max_args != 1 else msg[:-1]) self.parser.advance(')') if any(tk.symbol == '?' for tk in self._items): self._partial_function() return self def match_function_test(self, function_test: str, as_argument: bool = False) -> bool: """ Match if function signature is a subtype of provided *function_test*. For default return type is covariant and arguments are contravariant. If *as_argument* is `True` the match is inverted and also the return type is considered contravariant. 
References: https://www.w3.org/TR/xpath-31/#id-function-test https://www.w3.org/TR/xpath-31/#id-sequencetype-subtype """ if not function_test.startswith('function('): return False elif function_test == 'function(*)': return True parts = function_test[9:].partition(') as ') if not parts[1] or not parts[2]: return False sequence_types = parts[0].split(', ') sequence_types.append(parts[2]) signature = [x for x in self.sequence_types[:self.arity]] signature.append(self.sequence_types[-1]) if len(sequence_types) != len(signature): return False if as_argument: iterator = zip(sequence_types, signature) else: iterator = zip(signature, sequence_types) k = 0 for fst, st in iterator: k += 1 if not as_argument and k == len(sequence_types): st, fst = fst, st if st[-1] in '*+?': st_occurs = st[-1] st = st[:-1] else: st_occurs = '' if fst[-1] in '*+?': fst_occurs = fst[-1] fst = fst[:-1] else: fst_occurs = '' if st_occurs == fst_occurs or fst_occurs == '*': pass elif not fst_occurs: if st_occurs not in '?*': return False elif fst_occurs == '+': if st_occurs: return False elif st_occurs: return False if st == fst: continue elif fst == 'item()': continue elif st == 'item()': return False elif fst.startswith('xs:') ^ st.startswith('xs:'): return False elif fst.startswith('xs:'): if not issubclass(xsd11_atomic_types[st[3:]], xsd11_atomic_types[fst[3:]]): return False elif fst != 'node()': return False return True def _partial_function(self) -> None: """Convert a named function to an anonymous partial function.""" def evaluate(context: Optional[XPathContext] = None) -> Any: return self def select(context: Optional[XPathContext] = None) -> Any: yield self if self.__class__.evaluate is not XPathToken.evaluate: setattr(self, '_partial_evaluate', self.evaluate) if self.__class__.select is not XPathToken.select: setattr(self, '_partial_select', self.select) setattr(self, 'evaluate', evaluate) setattr(self, 'select', select) self._name = None self.label = 'partial function' self.nargs = 
len([tk for tk in self._items if tk.symbol == '?']) def _partial_evaluate(self, context: Optional[XPathContext] = None) -> Any: return [x for x in self._partial_select(context)] def _partial_select(self, context: Optional[XPathContext] = None) -> Iterator[Any]: item = self._partial_evaluate(context) if item is not None: if isinstance(item, list): yield from item else: if context is not None: context.item = item yield item class XPathConstructor(XPathFunction): """ A token for processing XPath 2.0+ constructors. """ @staticmethod def cast(value: Any) -> AtomicValueType: raise NotImplementedError() class XPathMap(XPathFunction): """ A token for processing XPath 3.1+ maps. """ pattern = r'(? None: self._values = [] super().__init__(parser, nargs) def nud(self) -> 'XPathMap': self.parser.advance('{') del self._items[:] if self.parser.next_token.symbol not in ('}', '(end)'): while True: key = self.parser.expression(95) # ':' self._items.append(key) self.parser.advance(':') self._values.append(self.parser.expression(5)) if self.parser.next_token.symbol != ',': break self.parser.advance() self.parser.advance('}') return self def evaluate(self, context: Optional[XPathContext] = None) -> Any: _map = {} for key, value in zip(self._items, self._values): k = next(key.atomization(context), None) if k is None: raise self.error('XPST0003', 'missing key value') assert k is not None _map[k] = value.evaluate(context) self._map = cast(Dict[AnyAtomicType, Any], _map) return self def __call__(self, context: Optional[XPathContext] = None, *args: XPathFunctionArgType) -> Any: if len(args) != 1 or not isinstance(args[0], AnyAtomicType): raise self.error('XPST0003', 'exactly one atomic argument is expected') key = cast(AnyAtomicType, args[0]) if self._map is None: self.evaluate(context) assert self._map is not None return self._map.get(key) def keys(self, context: Optional[XPathContext] = None) -> List[AnyAtomicType]: if self._map is None: self.evaluate(context) assert self._map is not None return 
list(self._map.keys()) def contains(self, context: Optional[XPathContext] = None, key: Optional[AnyAtomicType] = None) -> bool: if self._map is None: self.evaluate(context) assert self._map is not None return key in self._map.keys() class XPathArray(XPathFunction): """ A token for processing XPath 3.1+ arrays. """ pattern = r'(? 'XPathArray': self.value = None self.parser.advance('{') del self._items[:] if self.parser.next_token.symbol not in ('}', '(end)'): while True: self._items.append(self.parser.expression(5)) if self.parser.next_token.symbol != ',': break self.parser.advance() self.parser.advance('}') return self def evaluate(self, context: Optional[XPathContext] = None) -> Any: _array: List[Any] = [] for tk in self._items: _array.extend(tk.select(context)) self._array = _array return self def __call__(self, context: Optional[XPathContext] = None, *args: XPathFunctionArgType) -> Any: if len(args) != 1 or not isinstance(args[0], int): raise self.error('XPST0003', 'exactly one xs:integer argument is expected') position = cast(int, args[0]) if position <= 0: raise self.error('FOAY0002' if position else 'FOAY0001') if self._array is None: self.evaluate(context) assert self._array is not None try: return self._array[position - 1] except IndexError: raise self.error('FOAY0001') def put(self, position: int, member: Any, context: Optional[XPathContext] = None) \ -> 'XPathArray': if position <= 0: raise self.error('FOAY0002' if position else 'FOAY0001') other = XPathArray(self.parser) other.extend(self._items) other.evaluate(context) assert other._array is not None try: other._array[position - 1] = member except IndexError: raise self.error('FOAY0001') return other elementpath-3.0.2/mypy.ini000066400000000000000000000000371427546011100155220ustar00rootroot00000000000000[mypy] show_error_codes = True 
elementpath-3.0.2/profiling/000077500000000000000000000000001427546011100160145ustar00rootroot00000000000000elementpath-3.0.2/profiling/memray_node_tree.py000077500000000000000000000103731427546011100217130ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2022, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # if __name__ == '__main__': import argparse import pathlib import memray import xml.etree.ElementTree as ElementTree from elementpath import DocumentNode, AttributeNode, ElementNode, \ CommentNode, ProcessingInstructionNode, TextNode def get_element_tree(source): return ElementTree.XML(source) parser = argparse.ArgumentParser() parser.add_argument('--depth', type=int, default=7, help="the depth of the test XML tree (7 for default)") parser.add_argument('--children', type=int, default=3, help="the number of children for each element (3 for default)") params = parser.parse_args() print('*' * 64) print("*** Memory usage estimation of XPath node trees using memray ***") print('*' * 64) print() chunk = 'lorem ipsum' for k in range(params.depth - 1, 0, -1): chunk = f'{chunk}' * params.children xml_source = f'{chunk}' label = f'{params.depth}x{params.children}' outdir = pathlib.Path(__file__).parent.joinpath('out/') et_file = outdir.joinpath(f'memray-element-tree-{label}.bin') nt_file = outdir.joinpath(f'memray-node-tree-{label}.bin') if et_file.is_file(): et_file.unlink() with memray.Tracker(et_file, memory_interval_ms=1, follow_fork=True): root = get_element_tree(xml_source) if nt_file.is_file(): nt_file.unlink() with memray.Tracker(nt_file, follow_fork=True): namespaces = None position = 1 def build_element_node() -> ElementNode: global position node = ElementNode(elem, parent, position, nsmap) position += 1 
position += len(nsmap) if 'xml' in nsmap else len(nsmap) + 1 position += len(elem.attrib) if elem.text is not None: node.children.append(TextNode(elem.text, node, position)) position += 1 return node # Common nsmap nsmap = {} if namespaces is None else dict(namespaces) if hasattr(root, 'parse'): root_node = parent = DocumentNode(root, position) position += 1 elem = root.getroot() child = build_element_node() parent.children.append(child) parent = child else: elem = root parent = None root_node = parent = build_element_node() # elements = {elem: parent} # Enable for building a reverse map elem -> node children = iter(elem) iterators = [] ancestors = [] while True: for elem in children: if not callable(elem.tag): child = build_element_node() elif elem.tag.__name__ == 'Comment': # type: ignore[attr-defined] child = CommentNode(elem, parent, position) position += 1 else: child = ProcessingInstructionNode(elem, parent, position) # elements[elem] = child parent.children.append(child) if elem.tail is not None: parent.children.append(TextNode(elem.tail, parent, position)) position += 1 if len(elem): ancestors.append(parent) parent = child iterators.append(children) children = iter(elem) break else: try: children, parent = iterators.pop(), ancestors.pop() except IndexError: break print(f"Number of elements: {len(list(root.iter()))}") print(f"Number of nodes: {len(list(root_node.iter()))}") element_nodes = list(x for x in root_node.iter() if isinstance(x, ElementNode)) print(f"Number of element nodes: {len(element_nodes)}") elementpath-3.0.2/profiling/profile_character_classes.py000077500000000000000000000026351427546011100235700ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # from timeit import timeit from memory_profiler import profile from elementpath.regex import CharacterClass def run_timeit(stmt='pass', setup='pass', number=1000): seconds = timeit(stmt, setup=setup, number=number) print("{}: {}s".format(stmt, seconds)) @profile def character_class_objects(): return [CharacterClass(r'\c') for _ in range(10000)] if __name__ == '__main__': print('*' * 62) print("*** Memory and timing profile of CharacterClass class ***") print("***" + ' ' * 56 + "***") print("*** Note: save ~15% of memory with __slots__ (from v2.2.3) ***") print('*' * 62) print() character_class_objects() character_class = CharacterClass(r'\c') character_class -= CharacterClass(r'\i') SETUP = 'from __main__ import character_class' NUMBER = 10000 run_timeit('"9" in character_class # True ', SETUP, NUMBER) run_timeit('"q" in character_class # False', SETUP, NUMBER) run_timeit('8256 in character_class # True ', SETUP, NUMBER) run_timeit('8257 in character_class # False', SETUP, NUMBER) elementpath-3.0.2/profiling/profile_unicode_subsets.py000077500000000000000000000025421427546011100233120ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # from timeit import timeit from memory_profiler import profile from elementpath.regex import UNICODE_CATEGORIES, UnicodeSubset def run_timeit(stmt='pass', setup='pass', number=1000): seconds = timeit(stmt, setup=setup, number=number) print("{}: {}s".format(stmt, seconds)) @profile def unicode_subset_objects(): return [UnicodeSubset('\U00020000-\U0002A6D6') for _ in range(10000)] if __name__ == '__main__': print('*' * 62) print("*** Memory and timing profile of UnicodeSubset class ***") print("***" + ' ' * 56 + "***") print("*** Note: save ~28% of memory with __slots__ (from v2.2.3) ***") print('*' * 62) print() unicode_subset_objects() subset = UNICODE_CATEGORIES['C'] SETUP = 'from __main__ import subset' NUMBER = 10000 run_timeit('1328 in subset # True ', SETUP, NUMBER) run_timeit('1329 in subset # False', SETUP, NUMBER) run_timeit('72165 in subset # True ', SETUP, NUMBER) run_timeit('72872 in subset # False', SETUP, NUMBER) elementpath-3.0.2/profiling/profile_xpath_nodes.py000077500000000000000000000060741427546011100224340ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2022, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # import sys from timeit import timeit from memory_profiler import profile import lxml.etree as etree from elementpath import XPathNode, build_node_tree from elementpath.etree import PyElementTree def run_timeit(stmt='pass', setup='pass', number=1000): seconds = timeit(stmt, setup=setup, number=number) print("{}: {}s ({} times, about {}s each)".format(stmt, seconds, number, seconds/number)) @profile def create_element_tree(source): doc = etree.XML(source) return doc @profile def create_py_element_tree(source): doc = PyElementTree.XML(source) return doc @profile def create_xpath_tree(et_root): node_tree = build_node_tree(et_root) return node_tree # ep2.5 node checking function def is_xpath_node(obj): return isinstance(obj, XPathNode) or \ hasattr(obj, 'tag') and hasattr(obj, 'attrib') and hasattr(obj, 'text') or \ hasattr(obj, 'local_name') and hasattr(obj, 'type') and hasattr(obj, 'name') or \ hasattr(obj, 'getroot') and hasattr(obj, 'parse') and hasattr(obj, 'iter') if __name__ == '__main__': import argparse parser = argparse.ArgumentParser() parser.add_argument('--depth', type=int, default=7, help="the depth of the test XML tree (7 for default)") parser.add_argument('--children', type=int, default=3, help="the number of children for each element (3 for default)") parser.add_argument('--speed', action='store_true', default=False, help="run also speed tests (disabled for default)") params = parser.parse_args() print('*' * 60) print("*** Memory and timing profile of XPath node trees ***") print('*' * 60) print() SETUP = 'from __main__ import root, xpath_tree, build_node_tree, is_xpath_node, XPathNode' NUMBER = 5000 chunk = 'lorem ipsum' for k in range(params.depth - 1, 0, -1): chunk = f'{chunk}' * params.children xml_source = f'{chunk}' root = create_element_tree(xml_source) create_py_element_tree(xml_source) xpath_tree = create_xpath_tree(root) if not params.speed: print('Speed tests skipped ... 
exit') sys.exit() run_timeit('build_node_tree(root)', SETUP, 100) print() run_timeit('is_xpath_node(root)', SETUP, NUMBER) run_timeit('is_xpath_node(xpath_tree)', SETUP, NUMBER) run_timeit('isinstance(xpath_tree, XPathNode)', SETUP, NUMBER) print() run_timeit('for e in root.iter(): e', SETUP, NUMBER) run_timeit('for e in xpath_tree.iter(): e', SETUP, NUMBER) print() run_timeit('for e in root.iter(): is_xpath_node(e)', SETUP, NUMBER) run_timeit('for e in xpath_tree.iter(): isinstance(e, XPathNode)', SETUP, NUMBER) elementpath-3.0.2/profiling/profile_xpath_parsers.py000077500000000000000000000043561427546011100230040ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from timeit import timeit from memory_profiler import profile from elementpath import XPath1Parser, XPath2Parser from elementpath.xpath30 import XPath30Parser def run_timeit(stmt='pass', setup='pass', number=1000): seconds = timeit(stmt, setup=setup, number=number) print("{}: {}s".format(stmt, seconds)) @profile def xpath1_parser_objects(): return [XPath1Parser() for _ in range(10000)] @profile def xpath2_parser_objects(): return [XPath2Parser() for _ in range(10000)] @profile def xpath30_parser_objects(): return [XPath30Parser() for _ in range(10000)] if __name__ == '__main__': print('*' * 62) print("*** Memory and timing profile of XPathParser1/2/3 classes ***") print("***" + ' ' * 56 + "***") print('*' * 62) print() xpath1_parser_objects() xpath2_parser_objects() xpath30_parser_objects() NUMBER = 10000 SETUP = 'from __main__ import XPath1Parser' run_timeit("XPath1Parser().parse('18 - 9 + 10')", SETUP, NUMBER) run_timeit("XPath1Parser().parse('true()')", SETUP, NUMBER) 
run_timeit("XPath1Parser().parse('contains(\"foobar\", \"bar\")')", SETUP, NUMBER) run_timeit("XPath1Parser().parse('/A/B/C/D')", SETUP, NUMBER) print() SETUP = 'from __main__ import XPath2Parser' run_timeit("XPath2Parser().parse('18 - 9 + 10')", SETUP, NUMBER) run_timeit("XPath2Parser().parse('true()')", SETUP, NUMBER) run_timeit("XPath2Parser().parse('contains(\"foobar\", \"bar\")')", SETUP, NUMBER) run_timeit("XPath2Parser().parse('/A/B/C/D')", SETUP, NUMBER) print() SETUP = 'from __main__ import XPath30Parser' run_timeit("XPath30Parser().parse('18 - 9 + 10')", SETUP, NUMBER) run_timeit("XPath30Parser().parse('true()')", SETUP, NUMBER) run_timeit("XPath30Parser().parse('contains(\"foobar\", \"bar\")')", SETUP, NUMBER) run_timeit("XPath30Parser().parse('/A/B/C/D')", SETUP, NUMBER) print() elementpath-3.0.2/profiling/profile_xpath_tokens.py000077500000000000000000000026401427546011100226220ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # from timeit import timeit from memory_profiler import profile from elementpath import XPath1Parser def run_timeit(stmt='pass', setup='pass', number=1000): seconds = timeit(stmt, setup=setup, number=number) print("{}: {}s".format(stmt, seconds)) @profile def xpath_token_objects(): true_token = XPath1Parser.symbol_table['true'] return [true_token(parser) for _ in range(10000)] if __name__ == '__main__': print('*' * 62) print("*** Memory and timing profile of XPathToken class ***") print("***" + ' ' * 56 + "***") print("*** Note: save ~34% of memory with __slots__ (from v2.2.3) ***") print('*' * 62) print() parser = XPath1Parser() xpath_token_objects() t1 = parser.parse('18 - 9 + 10') t2 = parser.parse('true()') t3 = parser.parse('contains("foobar", "bar")') NUMBER = 30000 run_timeit('t1.evaluate() # 19 ', 'from __main__ import t1', NUMBER) run_timeit('t2.evaluate() # True ', 'from __main__ import t2', NUMBER) run_timeit('t3.evaluate() # True ', 'from __main__ import t3', NUMBER) elementpath-3.0.2/publiccode.yml000066400000000000000000000045201427546011100166600ustar00rootroot00000000000000# This repository adheres to the publiccode.yml standard by including this # metadata file that makes public software easily discoverable. 
# More info at https://github.com/italia/publiccode.yml publiccodeYmlVersion: '0.2' name: elementpath url: 'https://github.com/sissaschool/elementpath' landingURL: 'https://github.com/sissaschool/elementpath' releaseDate: '2022-08-12' softwareVersion: v3.0.2 developmentStatus: stable platforms: - linux - windows - mac softwareType: library inputTypes: - text/XML categories: - data-analytics - data-collection maintenance: type: internal contacts: - name: Davide Brunato email: davide.brunato@sissa.it affiliation: 'Scuola Internazionale Superiore di Studi Avanzati' legal: license: MIT mainCopyrightOwner: Scuola Internazionale Superiore di Studi Avanzati repoOwner: Scuola Internazionale Superiore di Studi Avanzati localisation: localisationReady: false availableLanguages: - en it: countryExtensionVersion: '0.2' riuso: codiceIPA: sissa description: en: genericName: elementpath apiDocumentation: 'https://elementpath.readthedocs.io/en/latest/xpath_api.html' documentation: 'https://elementpath.readthedocs.io/en/latest/' shortDescription: >- Python library that provides XPath 1.0/2.0/3.0 parsers and selectors for ElementTree and lxml longDescription: | This is a library for Python 3.7+ that provides XPath 1.0, 2.0 and 3.0 selectors for Python's ElementTree XML data structures, both for the standard **ElementTree** library and for the **lxml** library. For lxml this package can be useful for providing XPath 2.0 selectors, because lxml already has its own implementation of XPath 1.0. 
## Installation and usage You can install the package with _pip_ in a Python 3.7+ environment: ~~~~ pip install elementpath ~~~~ To use it, import the package and apply the selectors to ElementTree nodes: ~~~~ >>> import elementpath >>> from xml.etree import ElementTree >>> root = ElementTree.XML('<A><B1/><B2><C1/><C2/><C3/></B2></A>') >>> elementpath.select(root, '/A/B2/\*') [<Element 'C1' at ...>, <Element 'C2' at ...>, <Element 'C3' at ...>] ~~~~ features: - XPath 1.0, XPath 2.0 and XPath 3.0 implementations elementpath-3.0.2/requirements-dev.txt000066400000000000000000000002331427546011100200610ustar00rootroot00000000000000# Requirements for setting up a development environment setuptools tox coverage lxml xmlschema>=2.0.0 Sphinx memory-profiler memray flake8 mypy lxml-stubs -e . elementpath-3.0.2/setup.py000066400000000000000000000041771427546011100155460ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (c), 2018-2022, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # from setuptools import setup, find_packages with open("README.rst") as readme: long_description = readme.read() setup( name='elementpath', version='3.0.2', packages=find_packages(include=['elementpath', 'elementpath.*']), include_package_data=True, author='Davide Brunato', author_email='brunato@sissa.it', url='https://github.com/sissaschool/elementpath', keywords=['XPath', 'XPath2', 'XPath3', 'Pratt-parser', 'ElementTree', 'lxml'], license='MIT', license_file='LICENSE', description='XPath 1.0/2.0/3.0 parsers and selectors for ElementTree and lxml', long_description=long_description, python_requires='>=3.7', extras_require={ 'dev': ['tox', 'coverage', 'lxml', 'xmlschema>=2.0.0', 'Sphinx', 'memory-profiler', 'memray', 'flake8', 'mypy==0.971', 'lxml-stubs'] }, classifiers=[ 'Development Status :: 5 - Production/Stable', 'Intended Audience :: Developers', 'Intended Audience :: Information Technology', 'Intended Audience :: Science/Research', 'License :: OSI Approved :: MIT License', 'Operating System :: OS Independent', 'Programming Language :: Python', 'Programming Language :: Python :: 3', 'Programming Language :: Python :: 3 :: Only', 'Programming Language :: Python :: 3.7', 'Programming Language :: Python :: 3.8', 'Programming Language :: Python :: 3.9', 'Programming Language :: Python :: 3.10', 'Programming Language :: Python :: 3.11', 'Programming Language :: Python :: Implementation :: CPython', 'Programming Language :: Python :: Implementation :: PyPy', 'Topic :: Software Development :: Libraries', 'Topic :: Text Processing :: Markup :: XML', ] ) elementpath-3.0.2/tests/000077500000000000000000000000001427546011100151655ustar00rootroot00000000000000elementpath-3.0.2/tests/__init__.py000066400000000000000000000000001427546011100172640ustar00rootroot00000000000000elementpath-3.0.2/tests/execute_w3c_tests.py000077500000000000000000001530501427546011100212060ustar00rootroot00000000000000#!/usr/bin/env python3 # # Copyright (c), 
2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Jelte Jansen # @author Davide Brunato # """ Test script for running W3C XPath tests on elementpath. This is a reworking of https://github.com/tjeb/elementpath_w3c_tests project that uses ElementTree for default and collapses the essential parts into only one module. """ import argparse import contextlib import decimal import re import json import math import os import pathlib import sys import traceback from collections import OrderedDict from urllib.parse import urlsplit from xml.etree import ElementTree import lxml.etree import elementpath import xmlschema from elementpath import ElementPathError, XPath2Parser, XPathContext, XPathNode, \ CommentNode, ProcessingInstructionNode, get_node_tree from elementpath.namespaces import get_expanded_name from elementpath.xpath_token import XPathFunction from elementpath.datatypes import AnyAtomicType from elementpath.xpath31 import XPath31Parser PY38_PLUS = sys.version_info > (3, 8) DEPENDENCY_TYPES = {'spec', 'feature', 'calendar', 'default-language', 'format-integer-sequence', 'language', 'limits', 'xml-version', 'xsd-version', 'unicode-version', 'unicode-normalization-form'} SKIP_TESTS = { 'fn-subsequence__cbcl-subsequence-010', 'fn-subsequence__cbcl-subsequence-011', 'fn-subsequence__cbcl-subsequence-012', 'fn-subsequence__cbcl-subsequence-013', 'fn-subsequence__cbcl-subsequence-014', 'prod-NameTest__NodeTest004', # Unsupported collations 'fn-compare__compare-010', 'fn-substring-after__fn-substring-after-24', 'fn-substring-before__fn-substring-before-24', 'fn-deep-equal__K-SeqDeepEqualFunc-57', 'fn-deep-equal__K-SeqDeepEqualFunc-56', # Unsupported language 'fn-format-integer__format-integer-032', 'fn-format-integer__format-integer-032-fr', 
'fn-format-integer__format-integer-052', 'fn-format-integer__format-integer-065', # Processing-instructions (tests on env "auction") 'fn-local-name__fn-local-name-78', 'fn-name__fn-name-28', 'fn-string__fn-string-28', # Require XML 1.1 'fn-codepoints-to-string__K-CodepointToStringFunc-8a', 'fn-codepoints-to-string__K-CodepointToStringFunc-11b', 'fn-codepoints-to-string__K-CodepointToStringFunc-12b', # Require unicode version "7.0" 'fn-lower-case__fn-lower-case-19', 'fn-upper-case__fn-upper-case-19', 'fn-matches.re__re00506', 'fn-matches.re__re00984', # Very large number fault (interpreter crashes or float rounding) 'op-to__RangeExpr-409d', 'fn-format-number__numberformat60a', 'fn-format-number__cbcl-fn-format-number-035', # For XQuery?? 'fn-deep-equal__K2-SeqDeepEqualFunc-43', # includes a '!' symbol # For XP30+ 'fn-root__K-NodeRootFunc-2', # includes a XPath 3.0 fn:generate-id() 'fn-codepoints-to-string__cbcl-codepoints-to-string-021', # Too long ... 'fn-unparsed-text__fn-unparsed-text-038', # Typo in filename 'fn-unparsed-text-lines__fn-unparsed-text-lines-038', # Typo in filename 'fn-serialize__serialize-xml-015b', # Do not raise, attribute is good 'fn-parse-xml-fragment__parse-xml-fragment-022-st', # conflict with parse-xml-fragment-022 'fn-for-each-pair__fn-for-each-pair-017', # Requires PI and comments parsing 'fn-function-lookup__fn-function-lookup-522', # xs:dateTimeStamp for XSD 1.1 only # Unsupported language (German) 'fn-format-date__format-date-de101', 'fn-format-date__format-date-de102', 'fn-format-date__format-date-de103', 'fn-format-date__format-date-de104', 'fn-format-date__format-date-de105', 'fn-format-date__format-date-de106', 'fn-format-date__format-date-de111', 'fn-format-date__format-date-de112', 'fn-format-date__format-date-de113', 'fn-format-date__format-date-de114', 'fn-format-date__format-date-de115', 'fn-format-date__format-date-de116', # Unicode FULLY-NORMALIZATION not supported in Python's unicodedata 
'fn-normalize-unicode__cbcl-fn-normalize-unicode-001', 'fn-normalize-unicode__cbcl-fn-normalize-unicode-006', # 'เจมส์' does not match xs:NCName (maybe due to Python re module limitation) 'prod-CastExpr__K2-SeqExprCast-488', 'prod-CastExpr__K2-SeqExprCast-504', # IMHO incorrect tests 'fn-resolve-uri__fn-resolve-uri-9', # URI scheme names are lowercase 'fn-format-number__numberformat82', # result may be '12.340,00' instead of '0.012,34' 'fn-format-number__numberformat83', # (idem) } # Tests that can be run only with lxml.etree LXML_ONLY = { # parse of comments or PIs required 'fn-string__fn-string-30', 'prod-AxisStep__Axes003-4', 'prod-AxisStep__Axes006-4', 'prod-AxisStep__Axes033-4', 'prod-AxisStep__Axes037-2', 'prod-AxisStep__Axes046-2', 'prod-AxisStep__Axes049-2', 'prod-AxisStep__Axes058-2', 'prod-AxisStep__Axes058-3', 'prod-AxisStep__Axes061-1', 'prod-AxisStep__Axes061-2', 'prod-AxisStep__Axes064-2', 'prod-AxisStep__Axes064-3', 'prod-AxisStep__Axes067-2', 'prod-AxisStep__Axes067-3', 'prod-AxisStep__Axes073-1', 'prod-AxisStep__Axes073-2', 'prod-AxisStep__Axes076-4', 'prod-AxisStep__Axes079-4', 'fn-path__path007', 'fn-path__path009', 'fn-generate-id__generate-id-005', 'fn-parse-xml-fragment__parse-xml-fragment-010', # in-scope namespaces required 'prod-AxisStep__Axes118', 'prod-AxisStep__Axes120', 'prod-AxisStep__Axes126', 'fn-resolve-QName__fn-resolve-qname-26', 'fn-in-scope-prefixes__fn-in-scope-prefixes-21', 'fn-in-scope-prefixes__fn-in-scope-prefixes-22', 'fn-in-scope-prefixes__fn-in-scope-prefixes-24', 'fn-in-scope-prefixes__fn-in-scope-prefixes-25', 'fn-in-scope-prefixes__fn-in-scope-prefixes-26', 'fn-innermost__fn-innermost-017', 'fn-innermost__fn-innermost-018', 'fn-innermost__fn-innermost-019', 'fn-innermost__fn-innermost-020', 'fn-innermost__fn-innermost-021', 'fn-outermost__fn-outermost-017', 'fn-outermost__fn-outermost-018', 'fn-outermost__fn-outermost-019', 'fn-outermost__fn-outermost-020', 'fn-outermost__fn-outermost-021', 
'fn-outermost__fn-outermost-046', 'fn-local-name__fn-local-name-77', 'fn-local-name__fn-local-name-79', 'fn-name__fn-name-27', 'fn-name__fn-name-29', 'fn-string__fn-string-27', 'fn-format-number__numberformat87', 'fn-format-number__numberformat88', 'fn-path__path010', 'fn-path__path011', 'fn-path__path012', 'fn-path__path013', 'fn-function-lookup__fn-function-lookup-262', 'fn-generate-id__generate-id-007', 'fn-serialize__serialize-xml-012', 'prod-EQName__eqname-018', 'prod-EQName__eqname-023', 'prod-NamedFunctionRef__function-literal-262', # XML declaration 'fn-serialize__serialize-xml-029b', 'fn-serialize__serialize-xml-030b', # require external ENTITY parsing 'fn-parse-xml__parse-xml-010', } xpath_parser = XPath2Parser ignore_specs = {'XQ10', 'XQ10+', 'XP30', 'XP30+', 'XQ30', 'XQ30+', 'XP31', 'XP31+', 'XQ31', 'XQ31+', 'XT30+'} QT3_NAMESPACE = "http://www.w3.org/2010/09/qt-fots-catalog" namespaces = {'': QT3_NAMESPACE} INVALID_BASE_URL = 'http://www.w3.org/fots/unparsed-text/' effective_base_url = None @contextlib.contextmanager def working_directory(dirpath): orig_wd = os.getcwd() os.chdir(dirpath) try: yield finally: os.chdir(orig_wd) def get_context_result(item): if isinstance(item, XPathNode): raise TypeError("Unexpected XPath node in external results") elif isinstance(item, (list, tuple)): return [get_context_result(x) for x in item] elif hasattr(item, 'tag'): if callable(item.tag): if item.tag.__name__ == 'Comment': return CommentNode(item) else: return ProcessingInstructionNode(item) elif not hasattr(item, 'getroot'): return item return get_node_tree(root=item) def etree_is_equal(root1, root2, strict=True): nodes1 = root1.iter() nodes2 = root2.iter() for e1 in nodes1: e2 = next(nodes2, None) if e2 is None: return False if e1.tail != e2.tail: if strict or e1.tail is None or e2.tail is None: return False if e1.tail.strip() != e2.tail.strip(): return False if callable(e1.tag) ^ callable(e2.tag): return False elif not callable(e1.tag): if e1.tag != e2.tag:
return False if e1.attrib != e2.attrib: return False if e1.text != e2.text: if strict or e1.text is None or e2.text is None: return False if e1.text.strip() != e2.text.strip(): return False return next(nodes2, None) is None class ExecutionError(Exception): """Common class for W3C XPath tests execution script.""" class ParseError(ExecutionError): """Other error generated by XPath expression parsing and static evaluation.""" class EvaluateError(ExecutionError): """Other error generated by XPath token evaluation with dynamic context.""" class Schema(object): """Represents an XSD schema used in XML environment settings.""" def __init__(self, elem): assert elem.tag == '{%s}schema' % QT3_NAMESPACE self.uri = elem.attrib.get('uri') self.file = elem.attrib.get('file') try: self.description = elem.find('description', namespaces).text except AttributeError: self.description = '' self.filepath = self.file and os.path.abspath(self.file) def __repr__(self): return '%s(uri=%r, file=%s)' % (self.__class__.__name__, self.uri, self.file) class Source(object): """Represents a source file as used in XML environment settings.""" namespaces = None def __init__(self, elem, use_lxml=False): assert elem.tag == '{%s}source' % QT3_NAMESPACE self.file = elem.attrib['file'] self.role = elem.attrib.get('role', '') self.uri = elem.attrib.get('uri', self.file) if not urlsplit(self.uri).scheme: self.uri = pathlib.Path(self.uri).absolute().as_uri() self.key = self.role or self.file try: self.description = elem.find('description', namespaces).text except AttributeError: self.description = '' if use_lxml: iterparse = lxml.etree.iterparse parser = lxml.etree.XMLParser(collect_ids=False) try: self.xml = lxml.etree.parse(self.file, parser=parser) except lxml.etree.XMLSyntaxError: self.xml = None else: iterparse = ElementTree.iterparse if PY38_PLUS: tree_builder = ElementTree.TreeBuilder(insert_comments=True, insert_pis=True) parser = ElementTree.XMLParser(target=tree_builder) else: parser = None try:
self.xml = ElementTree.parse(self.file, parser=parser) except ElementTree.ParseError: self.xml = None try: self.namespaces = {} dup_index = 1 for _, (prefix, uri) in iterparse(self.file, events=('start-ns',)): if prefix not in self.namespaces: self.namespaces[prefix] = uri elif prefix: self.namespaces[f'{prefix}{dup_index}'] = uri dup_index += 1 else: self.namespaces[f'default{dup_index}'] = uri dup_index += 1 except (ElementTree.ParseError, lxml.etree.XMLSyntaxError): pass def __repr__(self): return '%s(file=%r)' % (self.__class__.__name__, self.file) class Collection(object): """Represents a collection of source files as used in XML environment settings.""" def __init__(self, elem, use_lxml=False): assert elem.tag == '{%s}collection' % QT3_NAMESPACE self.uri = elem.attrib.get('uri') self.query = elem.find('query', namespaces) # Not used (for XQuery) self.sources = [Source(e, use_lxml) for e in elem.iterfind('source', namespaces)] def __repr__(self): return '%s(uri=%r)' % (self.__class__.__name__, self.uri) class Environment(object): """ The XML environment definition for a test case. :param elem: the XML Element that contains the environment definition. :param use_lxml: use lxml.etree for loading XML sources. 
""" collection = None schema = None static_base_uri = None decimal_formats = None def __init__(self, elem, use_lxml=False): assert elem.tag == '{%s}environment' % QT3_NAMESPACE self.name = elem.get('name', 'anonymous') self.namespaces = { namespace.attrib['prefix']: namespace.attrib['uri'] for namespace in elem.iterfind('namespace', namespaces) } child = elem.find('decimal-format', namespaces) if child is not None: name = child.get('name') if name is not None and ':' in name: if use_lxml: name = get_expanded_name(name, child.nsmap) else: try: name = get_expanded_name(name, self.namespaces) except KeyError: pass self.decimal_formats = {name: child.attrib} child = elem.find('collection', namespaces) if child is not None: self.collection = Collection(child, use_lxml) child = elem.find('schema', namespaces) if child is not None: self.schema = Schema(child) child = elem.find('static-base-uri', namespaces) if child is not None: self.static_base_uri = child.get('uri') self.params = [e.attrib for e in elem.iterfind('param', namespaces)] self.sources = {} for child in elem.iterfind('source', namespaces): source = Source(child, use_lxml) self.sources[source.key] = source def __repr__(self): return '%s(name=%r)' % (self.__class__.__name__, self.name) def __str__(self): children = [] for prefix, uri in self.namespaces.items(): children.append(''.format(prefix, uri)) if self.schema is not None: children.append(''.format( self.schema.uri or '', self.schema.file or '' )) for role, source in self.sources.items(): children.append(''.format( role, source.uri or '', source.file )) return '\n {}\n'.format( self.name, '\n '.join(children) ) class TestSet(object): """ Represents a test-set as read from the catalog file and the test-set XML file itself. :param elem: the XML Element that contains the test-set definitions. :param pattern: the regex pattern for selecting test-cases to load. :param use_lxml: use lxml.etree for loading environment XML sources. 
:param environments: the global environments. """ def __init__(self, elem, pattern, use_lxml=False, environments=None): assert elem.tag == '{%s}test-set' % QT3_NAMESPACE self.name = elem.attrib['name'] self.file = elem.attrib['file'] self.environments = {} if environments is None else environments.copy() self.test_cases = [] self.specs = [] self.features = [] self.xsd_version = None self.use_lxml = use_lxml self.etree = lxml.etree if use_lxml else ElementTree full_path = os.path.abspath(self.file) filename = os.path.basename(full_path) self.workdir = os.path.dirname(full_path) with working_directory(self.workdir): xml_root = self.etree.parse(filename).getroot() self.description = xml_root.find('description', namespaces).text for child in xml_root.findall('dependency', namespaces): dep_type = child.attrib['type'] value = child.attrib['value'] if dep_type == 'spec': self.specs.extend(value.split(' ')) elif dep_type == 'feature': self.features.append(value) elif dep_type == 'xsd-version': self.xsd_version = value else: print("unexpected dependency type %s for test-set %r" % (dep_type, self.name)) for child in xml_root.findall('environment', namespaces): environment = Environment(child, use_lxml) self.environments[environment.name] = environment test_case_template = self.name + '__%s' for child in xml_root.findall('test-case', namespaces): if pattern.search(test_case_template % child.attrib['name']) is not None: self.test_cases.append(TestCase(child, self, use_lxml)) def __repr__(self): return '%s(name=%r)' % (self.__class__.__name__, self.name) class TestCase(object): """ Represents a test case as read from a test-set file. :param elem: the XML Element that contains the test-case definition. :param test_set: the test-set that the test-case belongs to. :param use_lxml: use lxml.etree for loading environment XML sources. 
""" # Single value dependencies calendar = None default_language = None format_integer_sequence = None language = None limits = None unicode_version = None unicode_normalization_form = None xml_version = None def __init__(self, elem, test_set, use_lxml=False): assert elem.tag == '{%s}test-case' % QT3_NAMESPACE self.test_set = test_set self.xsd_version = test_set.xsd_version self.use_lxml = use_lxml self.etree = lxml.etree if use_lxml else ElementTree self.name = test_set.name + "__" + elem.attrib['name'] self.description = elem.find('description', namespaces).text self.test = elem.find('test', namespaces).text result_child = elem.find('result', namespaces).find("*") self.result = Result(result_child, test_case=self, use_lxml=use_lxml) self.environment_ref = None self.environment = None self.specs = [] self.features = [] for child in elem.findall('dependency', namespaces): dep_type = child.attrib['type'] value = child.attrib['value'] if dep_type == 'spec': self.specs.extend(value.split(' ')) elif dep_type == 'feature': self.features.append(value) elif dep_type in DEPENDENCY_TYPES: setattr(self, dep_type.replace('-', '_'), value) else: print("unexpected dependency type %s for test-case %r" % (dep_type, self.name)) child = elem.find('environment', namespaces) if child is not None: if 'ref' in child.attrib: self.environment_ref = child.attrib['ref'] else: self.environment = Environment(child, use_lxml) def __repr__(self): return '%s(name=%r)' % (self.__class__.__name__, self.name) def __str__(self): children = [ '{}'.format(self.description or ''), '{}'.format(self.test) if self.test else '', '\n {}\n'.format(self.result), ] if self.environment_ref: children.append(''.format(self.environment_ref)) for dep_type in sorted(DEPENDENCY_TYPES): if dep_type == 'spec': if self.specs: children.extend(''.format(x) for x in self.specs) elif dep_type == 'feature': if self.features: children.extend(''.format(x) for x in self.features) else: value = getattr(self, 
dep_type.replace('-', '_')) if value is not None: children.append('<{} value="{}"/>'.format(dep_type, value)) return '\n {}\n'.format( self.name, self.test_set_file, '\n '.join('\n'.join(children).split('\n')), ) @property def test_set_file(self): return self.test_set.file def get_environment(self): env_ref = self.environment_ref if env_ref: try: return self.test_set.environments[env_ref] except KeyError: msg = "Unknown environment %s in test case %s" raise ExecutionError(msg % (env_ref, self.name)) from None elif self.environment: return self.environment def run(self, verbose=1): if verbose > 4: print("\n*** Execute test case {!r} ***".format(self.name)) print(str(self)) print() return self.result.validate(verbose) def run_xpath_test(self, verbose=1, with_context=True, with_xpath_nodes=False): """ Helper function to parse and evaluate tests with elementpath. ElementPathError exceptions are re-raised as they are, other exceptions are re-raised as ParseError (static evaluation) or EvaluateError (dynamic evaluation). """ environment = self.get_environment() # Create the parser instance (static context) if environment is None: test_namespaces = static_base_uri = schema_proxy = None else: test_namespaces = environment.namespaces.copy() for source in environment.sources.values(): if source.namespaces: for pfx, uri in source.namespaces.items(): if pfx not in test_namespaces: test_namespaces[pfx] = uri static_base_uri = environment.static_base_uri if environment.schema is None or not environment.schema.filepath: schema_proxy = None else: if verbose > 2: print("Schema %r required for test %r" % (environment.schema.file, self.name)) schema = xmlschema.XMLSchema(environment.schema.filepath) schema_proxy = schema.xpath_proxy if static_base_uri is None: if self.name == "fn-parse-xml__parse-xml-007": # workaround: static-base-uri() must return AnyURI('') for this case static_base_uri = '' else: base_uri = os.path.dirname(os.path.abspath(self.test_set_file)) if os.path.isdir(base_uri): static_base_uri = f'{pathlib.Path(base_uri).as_uri()}/' elif
static_base_uri.startswith(INVALID_BASE_URL): static_base_uri = static_base_uri.replace(INVALID_BASE_URL, effective_base_url) kwargs = dict( namespaces=test_namespaces, xsd_version=self.xsd_version, schema=schema_proxy, base_uri=static_base_uri, compatibility_mode='xpath-1.0-compatibility' in self.features, ) if environment is not None and xpath_parser.version >= '3.0': kwargs['decimal_formats'] = environment.decimal_formats kwargs['defuse_xml'] = False parser = xpath_parser(**kwargs) if self.test is not None: xpath_expression = self.test.replace(INVALID_BASE_URL, effective_base_url) else: xpath_expression = None try: root_node = parser.parse(xpath_expression) # static evaluation except Exception as err: if isinstance(err, ElementPathError): raise raise ParseError(err) # Create the dynamic context if not with_context: context = None elif environment is None: context = XPathContext( root=self.etree.XML(""), namespaces=test_namespaces, timezone='Z', default_calendar=self.calendar ) else: kwargs = {'timezone': 'Z'} variables = {} documents = {} if '.' 
in environment.sources: root = environment.sources['.'].xml else: root = self.etree.XML("") if any(k.startswith('$') for k in environment.sources): variables.update( (k[1:], v.xml) for k, v in environment.sources.items() if k.startswith('$') ) for param in environment.params: name = param['name'] value = xpath_parser().parse(param['select']).evaluate() variables[name] = value for source in environment.sources.values(): documents[source.uri] = source.xml if environment.collection is not None: uri = environment.collection.uri collection = [source.xml for source in environment.collection.sources] if uri is not None: kwargs['collections'] = {uri: collection} if collection: kwargs['default_collection'] = collection if 'non_empty_sequence_collection' in self.features: kwargs['default_resource_collection'] = uri if test_namespaces: kwargs['namespaces'] = test_namespaces if variables: kwargs['variables'] = variables if documents: kwargs['documents'] = documents if self.calendar: kwargs['default_calendar'] = self.calendar context = XPathContext(root=root, **kwargs) try: if with_xpath_nodes: result = root_node.evaluate(context) else: result = root_node.get_results(context) except Exception as err: if isinstance(err, ElementPathError): raise raise EvaluateError(err) if verbose > 4: print("Result of evaluation: {!r}\n".format(result)) return result class Result(object): """ Class for validating the result of a test case. Result instances can be nested for multiple validation options. There are several types of result validators available: * all-of * any-of * assert * assert-count * assert-deep-eq * assert-empty * assert-eq * assert-false * assert-permutation * assert-serialization-error * assert-string-value * assert-true * assert-type * assert-xml * error * not * serialization-matches :param elem: the XML Element that contains the test-case definition. :param test_case: the test-case that the result validator belongs to. 
""" # Validation helper tokens parser = xpath_parser() string_token = XPath31Parser().parse('fn:string($result)') string_join_token = XPath31Parser().parse('fn:string-join($result, " ")') def __init__(self, elem, test_case, use_lxml=False): self.test_case = test_case self.use_lxml = use_lxml self.etree = lxml.etree if use_lxml else ElementTree self.type = elem.tag.split('}')[1] self.value = elem.text self.attrib = {k: v for k, v in elem.attrib.items()} if self.value is None and self.type == 'assert-xml': self.attrib['file'] = os.path.abspath(self.attrib['file']) self.children = [Result(child, test_case) for child in elem.findall('*')] self.validate = getattr(self, '%s_validator' % self.type.replace("-", "_")) def __repr__(self): return '%s(type=%r)' % (self.__class__.__name__, self.type) def __str__(self): attrib = ' '.join('{}="{}"'.format(k, v) for k, v in self.attrib.items()) if self.children: return '<{0} {1}>{2}{3}\n'.format( self.type, attrib, self.value if self.value is not None else '', '\n '.join(str(child) for child in self.children), ) elif self.value is not None: return '<{0} {1}>{2}'.format(self.type, attrib, self.value) else: return '<{} {}/>'.format(self.type, attrib) def report_failure(self, verbose=1, **results): if verbose <= 1: return if verbose < 4: print('Result <{}> failed for test case {!r}'.format(self.type, self.test_case.name)) print('XPath expression: {}'.format(self.test_case.test)) else: print('Result <{}> failed\n'.format(self.type)) print(self.test_case) if results: print() print_traceback = False max_key = max(len(k) for k in results) for k, v in results.items(): if isinstance(v, Exception): v = "Unexpected {!r}: {}".format(type(v), v) if verbose >= 3: print_traceback = True print(' {}: {}{!r}'.format(k, ' ' * (max_key - len(k)), v)) if print_traceback: print() traceback.print_exc() print() def all_of_validator(self, verbose=1): """Valid if all child result validators are valid.""" assert self.children result = True for child in 
self.children: if not child.validate(verbose): result = False return result def any_of_validator(self, verbose=1): """Valid if any child result validator is valid.""" assert self.children result = False for child in self.children: if child.validate(): result = True if not result and verbose > 1: for child in self.children: child.validate(verbose) return result def not_validator(self, verbose=1): """Valid if the child result validator is not valid.""" assert len(self.children) == 1 result = not self.children[0].validate() if not result and verbose > 1: self.children[0].validate(verbose) if not result: self.report_failure(verbose, expected=False, result=True) return result def assert_eq_validator(self, verbose=1): try: result = self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError) as err: self.report_failure(verbose, error=err) return False if isinstance(result, list) and len(result) == 1: result = result[0] parser = xpath_parser(xsd_version=self.test_case.xsd_version) root_node = parser.parse(self.value) context = XPathContext(root=self.etree.XML("")) expected_result = root_node.evaluate(context) try: if expected_result == result: return True elif isinstance(expected_result, decimal.Decimal) and isinstance(result, float): if float(expected_result) == result: return True except TypeError: pass self.report_failure(verbose, expected=expected_result, result=result) return False def assert_type_validator(self, verbose=1): try: result = self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError) as err: self.report_failure(verbose, error=err) return False if isinstance(result, list) and len(result) == 1: result = result[0] if self.value == 'function(*)': type_check = isinstance(result, XPathFunction) elif not self.parser.is_sequence_type(self.value): msg = " test-case {}: {!r} is not a valid sequence type" print(msg.format(self.test_case.name, self.value)) type_check = False else: context_result = 
get_context_result(result) type_check = self.parser.match_sequence_type(context_result, self.value) if not type_check: self.report_failure( verbose, expected=self.value, result=result, result_type=type(result) ) return type_check def assert_string_value_validator(self, verbose=1): try: result = self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError) as err: self.report_failure(verbose, error=err) return False context = XPathContext(self.etree.XML(""), variables={'result': result}) if isinstance(result, list): value = self.string_join_token.evaluate(context) else: value = self.string_token.evaluate(context) if self.attrib.get('normalize-space'): expected = re.sub(r'\s+', ' ', self.value).strip() value = ' '.join(x.strip() for x in value.split('\n')).strip() else: expected = self.value if not value: if expected is None: return True elif value == expected: return True elif isinstance(expected, str): # workaround for typos in some expected values if expected.strip() == value: return True elif expected.replace('v ;', 'v;') == value: return True if value and ' ' not in value: try: dv = decimal.Decimal(value) if math.isclose(dv, decimal.Decimal(expected), rel_tol=1E-7, abs_tol=0.0): return True except decimal.DecimalException: pass self.report_failure( verbose, expected=expected, string_value=value, xpath_result=result ) return False def error_validator(self, verbose=1): code = self.attrib.get('code', '*').strip() err_traceback = '' try: self.test_case.run_xpath_test(verbose, with_context=code != 'XPDY0002') except ElementPathError as err: if code == '*' or code in str(err): return True if verbose > 3: err_traceback = ''.join(traceback.format_exception(None, err, err.__traceback__)) reason = "Unexpected error {!r}: {}".format(type(err), str(err)) except (ParseError, EvaluateError) as err: if verbose > 3: err_traceback = ''.join(traceback.format_exception(None, err, err.__traceback__)) reason = "Not an elementpath error {!r}: 
{}".format(type(err), str(err)) else: reason = "Error not raised" self.report_failure(verbose, reason=reason, expected_code=code) if err_traceback: print(err_traceback) return False def assert_true_validator(self, verbose=1): """Valid if the result is `True`.""" try: result = self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError) as err: self.report_failure(verbose, error=err) return False else: if result is True or isinstance(result, list) and result and result[0] is True: return True self.report_failure(verbose) return False def assert_false_validator(self, verbose=1): """Valid if the result is `False`.""" try: result = self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError) as err: self.report_failure(verbose, error=err) return False else: if result is False or isinstance(result, list) and result and result[0] is False: return True self.report_failure(verbose) return False def assert_count_validator(self, verbose=1): """Valid if the number of items of the result matches.""" try: result = self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError) as err: self.report_failure(verbose, error=err) return False if isinstance(result, AnyAtomicType): length = 1 else: try: length = len(result) except TypeError as err: self.report_failure(verbose, error=err) return False if int(self.value) == length: return True self.report_failure( verbose, expected=int(self.value), value=length, xpath_result=result ) return False def assert_validator(self, verbose=1): """ Assert validator contains an XPath expression whose value must be true. The expression may use the variable $result, which is the result of the original test. 
""" try: result = self.test_case.run_xpath_test(verbose, with_xpath_nodes=True) except (ElementPathError, ParseError, EvaluateError) as err: self.report_failure(verbose, error=err) return False variables = {'result': result} parser = XPath31Parser(xsd_version=self.test_case.xsd_version) root_node = parser.parse(self.value) context = XPathContext(root=self.etree.XML(""), variables=variables) if root_node.boolean_value(root_node.evaluate(context)) is True: return True self.report_failure(verbose) return False def assert_deep_eq_validator(self, verbose=1): try: result = self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError) as err: self.report_failure(verbose, error=err) return False if isinstance(result, (str, bytes)): result = result.strip() expression = "fn:deep-equal($result, (%s))" % self.value variables = {'result': result} parser = XPath31Parser(xsd_version=self.test_case.xsd_version) root_node = parser.parse(expression) context = XPathContext(root=self.etree.XML(""), variables=variables) if root_node.evaluate(context) is True: return True self.report_failure(verbose, expected=self.value, result=result) return False def assert_empty_validator(self, verbose=1): """Valid if the result is empty.""" try: result = self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError) as err: self.report_failure(verbose, error=err) return False else: if result is None or result == '' or result == [] or result == ['']: return True self.report_failure(verbose, result=result) return False def assert_permutation_validator(self, verbose=1): try: result = self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError) as err: self.report_failure(verbose, error=err) return False if not isinstance(result, list): result = [result] expected = self.parser.parse(self.value).evaluate() if not isinstance(expected, list): expected = [expected] if set(expected) == set(result): return True if 
len(expected) == len(result): _expected = set(expected) for value in result: if value in _expected: _expected.remove(value) continue elif not isinstance(value, (float, decimal.Decimal)): self.report_failure(verbose, result=result, expected=expected) return False dv = decimal.Decimal(value) for ev in _expected: if not isinstance(ev, (float, decimal.Decimal)): continue elif math.isnan(ev) and math.isnan(dv): _expected.remove(ev) break elif math.isclose(dv, decimal.Decimal(ev), rel_tol=1E-7, abs_tol=0.0): _expected.remove(ev) break else: self.report_failure(verbose, result=result, expected=expected) return False return True self.report_failure(verbose, result=result, expected=expected) return False def assert_serialization_error_validator(self, verbose=1): # TODO: this currently succeeds on any error try: self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError): return True else: return False def assert_xml_validator(self, verbose=1): try: if self.test_case.test_set.name == 'fn-parse-xml': with working_directory(self.test_case.test_set.workdir): result = self.test_case.run_xpath_test(verbose) else: result = self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError) as err: self.report_failure(verbose, error=err) return False if result is None: return False if self.use_lxml: fromstring = lxml.etree.fromstring tostring = lxml.etree.tostring else: fromstring = ElementTree.fromstring tostring = ElementTree.tostring environment = self.test_case.get_environment() if environment is not None: for source in environment.sources.values(): if source.namespaces: for prefix, uri in source.namespaces.items(): ElementTree.register_namespace(prefix, uri) for prefix, uri in environment.namespaces.items(): ElementTree.register_namespace(prefix, uri) else: for prefix, uri in self.parser.namespaces.items(): ElementTree.register_namespace(prefix, uri) if self.value is not None: expected = self.value else: with 
open(self.attrib['file']) as fp: expected = fp.read() if type(result) == list: parts = [] for item in result: if isinstance(item, elementpath.ElementNode): tail, item.elem.tail = item.elem.tail, None parts.append(tostring(item.elem).decode('utf-8').strip()) item.elem.tail = tail elif isinstance(item, XPathNode): parts.append(str(item.value)) elif hasattr(item, 'tag'): tail, item.tail = item.tail, None parts.append(tostring(item).decode('utf-8').strip()) item.tail = tail elif hasattr(item, 'getroot'): parts.append(tostring(item.getroot()).decode('utf-8').strip()) else: parts.append(str(item)) xml_str = ''.join(parts) else: try: root = result.getroot() except AttributeError: root = result xml_str = tostring(root).decode('utf-8').strip() # Remove character data from result if expected result is serialized if '\n' not in expected: xml_str = '>'.join(s.lstrip() for s in xml_str.split('>\n')) # Strip the tail from serialized result if '>' in xml_str: tail_pos = xml_str.rindex('>') + 1 if tail_pos < len(xml_str): xml_str = xml_str[:tail_pos] if xml_str == expected or xml_str.replace(' />', '/>') == expected: return True # 2nd tentative (expected result from a serialization or comparing trees) try: if xml_str == tostring(fromstring(expected)).decode('utf-8').strip(): return True if etree_is_equal(fromstring(xml_str), fromstring(expected)): return True except (ElementTree.ParseError, lxml.etree.ParseError): # invalid XML data (maybe empty or concatenation of XML elements) # Last try removing xmlns registrations xmlns_pattern = re.compile(r'\sxmlns[^"]+"[^"]+"') expected_xmlns = xmlns_pattern.findall(expected) if any(xmlns not in expected_xmlns for xmlns in xmlns_pattern.findall(xml_str)): pass elif xmlns_pattern.sub('', xml_str) == xmlns_pattern.sub('', expected): return True self.report_failure(verbose, result=xml_str, expected=self.value or self.attrib['file']) return False def serialization_matches_validator(self, verbose=1): try: result = 
self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError): return False regex = re.compile(self.value) return regex.match(result) def main(): global xpath_parser parser = argparse.ArgumentParser() parser.add_argument('catalog', metavar='CATALOG_FILE', help='the path to the main index file of the test suite (catalog.xml)') parser.add_argument('pattern', nargs='?', default='.*', metavar='PATTERN', help='run only test cases whose name matches a regex pattern') parser.add_argument('--xpath', metavar='XPATH_EXPR', help="run only test cases that have a specific XPath expression") parser.add_argument('-i', dest='ignore_case', action='store_true', default=False, help="ignore character case for regex pattern matching") parser.add_argument('--xp30', action='store_true', default=False, help="test XPath 3.0 parser") parser.add_argument('--xp31', action='store_true', default=False, help="test XPath 3.1 parser") parser.add_argument('-l', '--lxml', dest='use_lxml', action='store_true', default=False, help="use lxml.etree for environment sources (default is ElementTree)") parser.add_argument('-v', dest='verbose', action='count', default=1, help='increase verbosity: one option to show unexpected errors, ' 'two to also show unmatched error codes, three for debug') parser.add_argument('-r', dest='report', metavar='REPORT_FILE', help="write a report (JSON format) to the given file") args = parser.parse_args() report = OrderedDict() report["summary"] = OrderedDict() report['other_failures'] = [] report['unknown'] = [] report['failed'] = [] report['success'] = [] catalog_file = os.path.abspath(args.catalog) pattern = re.compile(args.pattern, flags=re.IGNORECASE if args.ignore_case else 0) etree = lxml.etree if args.use_lxml else ElementTree if not os.path.isfile(catalog_file): print("Error: catalog file %s does not exist" % args.catalog) sys.exit(1) if args.xp31: from elementpath.xpath31 import XPath31Parser xpath_parser = XPath31Parser
ignore_specs.remove('XP30+') ignore_specs.remove('XP31') ignore_specs.remove('XP31+') ignore_specs.add('XP20') elif args.xp30: from elementpath.xpath30 import XPath30Parser xpath_parser = XPath30Parser ignore_specs.remove('XP30') ignore_specs.remove('XP30+') ignore_specs.add('XP20') with working_directory(dirpath=os.path.dirname(catalog_file)): catalog_xml = etree.parse(catalog_file) global effective_base_url effective_base_url = 'file://{}/fn/unparsed-text/'.format(os.getcwd()) environments = {} for child in catalog_xml.getroot().iterfind("environment", namespaces): environment = Environment(child, args.use_lxml) environments[environment.name] = environment test_sets = {} for child in catalog_xml.getroot().iterfind("test-set", namespaces): test_set = TestSet(child, pattern, args.use_lxml, environments) test_sets[test_set.name] = test_set count_read = 0 count_skip = 0 count_run = 0 count_success = 0 count_failed = 0 count_unknown = 0 count_other_failures = 0 for test_set in test_sets.values(): # ignore by specs of test_set ignore_all_in_test_set = test_set.specs and all( dep in ignore_specs for dep in test_set.specs ) for test_case in test_set.test_cases: count_read += 1 if ignore_all_in_test_set: count_skip += 1 continue # ignore test cases for XML version 1.1 (not yet supported by Python's libraries) if test_case.xml_version == '1.1': count_skip += 1 continue # ignore by specs of test_case if test_case.specs and all(dep in ignore_specs for dep in test_case.specs): count_skip += 1 continue # ignore tests that rely on higher-order function such as array:sort() if 'higherOrderFunctions' in test_case.features: count_skip += 1 continue # ignore tests that rely on DTD parsing (TODO with lxml or a custom parser) if 'infoset-dtd' in test_case.features \ or test_case.environment_ref == 'id-idref-dtd': count_skip += 1 continue # ignore cases where a directory is used as collection uri (not supported # feature, only the case fn-collection__collection-010) if 
'directory-as-collection-uri' in test_case.features: count_skip += 1 continue # ignore tests that rely on XQuery 1.0/XPath 2.0 static-typing enforcement if 'staticTyping' in test_case.test_set.features \ or 'staticTyping' in test_case.features: count_skip += 1 continue # ignore tests that rely on processing-instructions and comments if test_case.environment_ref == 'bib2': count_skip += 1 continue # Other test cases to skip for technical limitations if test_case.name in SKIP_TESTS: count_skip += 1 continue if not args.use_lxml and test_case.name in LXML_ONLY: count_skip += 1 continue if args.xpath and test_case.test != args.xpath: count_skip += 1 continue if args.xp30 and not args.xp31 and test_case.test: if 'parse-json' in test_case.test: count_skip += 1 continue elif 'map {' in test_case.test: count_skip += 1 continue count_run += 1 try: case_result = test_case.run(verbose=args.verbose) if case_result is True: if args.report: report['success'].append(test_case.name) count_success += 1 elif case_result is False: if args.report: report['failed'].append(test_case.name) count_failed += 1 else: if args.report: report['unknown'].append(test_case.name) count_unknown += 1 except Exception as err: print("\nUnexpected failure for test %r" % test_case.name) print(type(err), str(err)) if args.verbose >= 4: traceback.print_exc() if args.report: report['other_failures'].append(test_case.name) count_other_failures += 1 print("\n*** Totals of W3C XPath tests execution ***\n") print("%d test cases read" % count_read) print("%d test cases skipped" % count_skip) print("%d test cases run\n" % count_run) print(" %d success" % count_success) print(" %d failed" % count_failed) print(" %d unknown" % count_unknown) print(" %d other failures" % count_other_failures) if args.report: report['summary']['read'] = count_read report['summary']['skipped'] = count_skip report['summary']['run'] = count_run report['summary']['success'] = count_success report['summary']['failed'] = count_failed 
report['summary']['unknown'] = count_unknown report['summary']['other_failures'] = count_other_failures with open(args.report, 'w') as outfile: outfile.write(json.dumps(report, indent=2)) if __name__ == '__main__': sys.exit(main()) elementpath-3.0.2/tests/memory_profiling.py000066400000000000000000000023471427546011100211260ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # # flake8: noqa from memory_profiler import profile # noinspection PyUnresolvedReferences @profile(precision=3) def elementpath_memory_usage(): # Memory-relevant standard library imports import pathlib import decimal import calendar import xml.etree.ElementTree import unicodedata # elementpath imports # # Note: comment out all subpackage imports in elementpath/__init__.py # to highlight the memory consumption of each subpackage.
# import elementpath import elementpath.regex import elementpath.datatypes import elementpath.xpath_nodes import elementpath.xpath_context import elementpath.xpath_token import elementpath.xpath1 import elementpath.xpath2 # Optional elementpath imports import elementpath.xpath30 import elementpath.xpath31 if __name__ == '__main__': elementpath_memory_usage() elementpath-3.0.2/tests/mypy_tests/000077500000000000000000000000001427546011100174055ustar00rootroot00000000000000elementpath-3.0.2/tests/mypy_tests/selectors.py000077500000000000000000000007631427546011100217730ustar00rootroot00000000000000#!/usr/bin/env python def main() -> None: from xml.etree.ElementTree import XML import elementpath root = XML('') result = elementpath.select(root, '*') print(result) result = list(elementpath.iter_select(root, '*')) print(result) selector = elementpath.Selector('*') result = selector.select(root) print(result) result = list(selector.iter_select(root)) print(result) if __name__ == '__main__': main() elementpath-3.0.2/tests/resources/000077500000000000000000000000001427546011100171775ustar00rootroot00000000000000elementpath-3.0.2/tests/resources/analyze-string.xsd000066400000000000000000000023011427546011100226620ustar00rootroot00000000000000 elementpath-3.0.2/tests/resources/external_entity.xml000066400000000000000000000002531427546011100231370ustar00rootroot00000000000000 ]> elementpath-3.0.2/tests/resources/sample.xml000066400000000000000000000001031427546011100211740ustar00rootroot00000000000000 abc àèéìù elementpath-3.0.2/tests/resources/unparsed_entity.xml000066400000000000000000000004641427546011100231420ustar00rootroot00000000000000 ]> elementpath-3.0.2/tests/resources/unused_external_entity.xml000066400000000000000000000002521427546011100245210ustar00rootroot00000000000000 ]> abc elementpath-3.0.2/tests/resources/unused_unparsed_entity.xml000066400000000000000000000004161427546011100245220ustar00rootroot00000000000000 ]> 
elementpath-3.0.2/tests/resources/with_entity.xml000066400000000000000000000002011427546011100222610ustar00rootroot00000000000000 ]> &e; elementpath-3.0.2/tests/test_datatypes.py000066400000000000000000002132501427546011100205770ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import unittest import sys import datetime import math import operator import pickle import platform import random from decimal import Decimal from calendar import isleap from textwrap import dedent from xml.etree import ElementTree try: import xmlschema except ImportError: xmlschema = None from elementpath.helpers import MONTH_DAYS, MONTH_DAYS_LEAP from elementpath.datatypes import DateTime, DateTime10, Date, Date10, Time, \ Timezone, Duration, DayTimeDuration, YearMonthDuration, UntypedAtomic, \ GregorianYear, GregorianYear10, GregorianYearMonth, GregorianYearMonth10, \ GregorianMonthDay, GregorianMonth, GregorianDay, AbstractDateTime, NumericProxy, \ ArithmeticProxy, Id, Notation, QName, Base64Binary, HexBinary, NormalizedString, \ XsdToken, Language, Float, Float10, Integer, AnyURI, BooleanProxy, DecimalProxy, \ DoubleProxy10, DoubleProxy, StringProxy, get_atomic_value from elementpath.datatypes.atomic_types import AtomicTypeMeta from elementpath.datatypes.datetime import OrderedDateTime class AnyAtomicTypeTest(unittest.TestCase): def test_invalid_type_name(self): with self.assertRaises(TypeError): class InvalidAtomicType(metaclass=AtomicTypeMeta): name = b'invalid' def test_validation(self): class AnotherAtomicType(metaclass=AtomicTypeMeta): pass self.assertIsNone(AnotherAtomicType.validate(AnotherAtomicType())) self.assertIsNone(AnotherAtomicType.validate('')) with 
self.assertRaises(TypeError) as ctx: AnotherAtomicType.validate(10) self.assertIn("invalid type for xyz').text)) self.assertFalse(Id.is_valid(ElementTree.XML('xyz abc').text)) self.assertFalse(Id.is_valid(ElementTree.XML('12345').text)) self.assertTrue(Id.is_valid('alpha')) self.assertFalse(Id.is_valid('alpha beta')) self.assertFalse(Id.is_valid('12345')) def test_new_instance(self): self.assertEqual(NormalizedString(' a b\t c\n'), ' a b c ') self.assertEqual(NormalizedString(10.0), '10.0') self.assertEqual(XsdToken(10), '10') self.assertEqual(Language(True), 'true') with self.assertRaises(ValueError) as ctx: Language(10), '10' self.assertEqual("invalid value '10' for xs:language", str(ctx.exception)) class FloatTypesTest(unittest.TestCase): def test_init(self): self.assertEqual(Float10(10), 10.0) self.assertTrue(math.isnan(Float10('NaN'))) self.assertTrue(math.isinf(Float10('INF'))) self.assertTrue(math.isinf(Float10('-INF'))) with self.assertRaises(ValueError): Float10('+INF') self.assertTrue(math.isnan(Float('NaN'))) self.assertTrue(math.isinf(Float('INF'))) self.assertTrue(math.isinf(Float('-INF'))) self.assertTrue(math.isinf(Float('+INF'))) with self.assertRaises(ValueError): Float10('nan') with self.assertRaises(ValueError): Float10('inf') def test_hash(self): self.assertEqual(hash(Float10(892.1)), hash(892.1)) def test_equivalence(self): self.assertEqual(Float10('10.1'), Float10('10.1')) self.assertEqual(Float10('10.1'), Float('10.1')) self.assertNotEqual(Float10('10.1001'), Float10('10.1')) self.assertFalse(Float10('10.1001') == Float10('10.1')) self.assertNotEqual(Float10('10.1001'), Float('10.1')) self.assertFalse(Float10('10.1') != Float10('10.1')) self.assertEqual(Float10('10.0'), 10) self.assertNotEqual(Float10('10.0'), 11) def test_addition(self): self.assertEqual(Float10('10.1') + Float10('10.1'), 20.2) self.assertEqual(Float('10.1') + Float10('10.1'), 20.2) self.assertEqual(10.1 + Float10('10.1'), 20.2) def test_subtraction(self): 
self.assertEqual(Float10('10.1') - Float10('1.1'), 9.0) self.assertEqual(Float('10.1') - Float10('1.1'), 9.0) self.assertEqual(10.1 - Float10('1.1'), 9.0) self.assertEqual(10 - Float10('1.1'), 8.9) def test_multiplication(self): self.assertEqual(Float10('10.1') * 2, 20.2) self.assertEqual(Float('10.1') * 2.0, 20.2) self.assertEqual(2 * Float10('10.1'), 20.2) self.assertEqual(2.0 * Float('10.1'), 20.2) def test_division(self): self.assertEqual(Float10('20.2') / 2, 10.1) self.assertEqual(Float('20.2') / 2.0, 10.1) self.assertEqual(20.2 / Float10('2'), 10.1) self.assertEqual(20 / Float('2'), 10.0) def test_module(self): self.assertEqual(Float10('20.2') % 3, 20.2 % 3) self.assertEqual(Float('20.2') % 3.0, 20.2 % 3.0) self.assertEqual(20.2 % Float10('3'), 20.2 % 3) self.assertEqual(20 % Float('3.0'), 20 % 3.0) def test_abs(self): self.assertEqual(abs(Float10('-20.2')), 20.2) class IntegerTypesTest(unittest.TestCase): def test_validate(self): self.assertIsNone(Integer.validate(10)) self.assertIsNone(Integer.validate(Integer(10))) self.assertIsNone(Integer.validate('10')) with self.assertRaises(TypeError): Integer.validate(True) with self.assertRaises(ValueError): Integer.validate('10.1') class UntypedAtomicTest(unittest.TestCase): def test_init(self): self.assertEqual(UntypedAtomic(1).value, '1') self.assertEqual(UntypedAtomic(-3.9).value, '-3.9') self.assertEqual(UntypedAtomic('alpha').value, 'alpha') self.assertEqual(UntypedAtomic(b'beta').value, 'beta') self.assertEqual(UntypedAtomic(True).value, 'true') self.assertEqual(UntypedAtomic(UntypedAtomic(2)).value, '2') self.assertEqual(UntypedAtomic(Date.fromstring('2000-02-01')).value, '2000-02-01') with self.assertRaises(TypeError) as err: UntypedAtomic(None) self.assertEqual(str(err.exception), "None is not an atomic value") def test_repr(self): self.assertEqual(repr(UntypedAtomic(7)), "UntypedAtomic('7')") def test_eq(self): self.assertTrue(UntypedAtomic(-10) == UntypedAtomic(-10)) self.assertTrue(UntypedAtomic(5.2) == 
UntypedAtomic(5.2)) self.assertTrue(UntypedAtomic('-6.09') == UntypedAtomic('-6.09')) self.assertTrue(UntypedAtomic(Decimal('8.91')) == UntypedAtomic(Decimal('8.91'))) self.assertTrue(UntypedAtomic(False) == UntypedAtomic(False)) self.assertTrue(UntypedAtomic(-10) == -10) self.assertTrue(-10 == UntypedAtomic(-10)) self.assertTrue('-10' == UntypedAtomic(-10)) self.assertTrue(UntypedAtomic(False) == bool(False)) self.assertTrue(bool(False) == UntypedAtomic(False)) self.assertTrue(Decimal('8.91') == UntypedAtomic(Decimal('8.91'))) self.assertTrue(UntypedAtomic(Decimal('8.91')) == Decimal('8.91')) self.assertTrue(bool(True) == UntypedAtomic(1)) with self.assertRaises(ValueError) as ctx: _ = bool(True) == UntypedAtomic(10) self.assertEqual(str(ctx.exception), "'10' cannot be cast to xs:boolean") self.assertFalse(-10.9 == UntypedAtomic(-10)) self.assertFalse(UntypedAtomic(-10) == -11) self.assertFalse(UntypedAtomic(-10.5) == UntypedAtomic(-10)) self.assertFalse(-10.5 == UntypedAtomic(-10)) self.assertFalse(-17 == UntypedAtomic(-17.3)) def test_ne(self): self.assertTrue(UntypedAtomic(True) != UntypedAtomic(False)) self.assertTrue(UntypedAtomic(5.12) != UntypedAtomic(5.2)) self.assertTrue('29' != UntypedAtomic(5.2)) self.assertFalse('2.0' != UntypedAtomic('2.0')) def test_lt(self): self.assertTrue(UntypedAtomic(9.0) < UntypedAtomic(15)) self.assertTrue(False < UntypedAtomic(True)) self.assertTrue(UntypedAtomic('78') < 100.0) self.assertFalse(UntypedAtomic('100.1') < 100.0) def test_le(self): self.assertTrue(UntypedAtomic(9.0) <= UntypedAtomic(15)) self.assertTrue(False <= UntypedAtomic(False)) self.assertTrue(UntypedAtomic('78') <= 100.0) self.assertFalse(UntypedAtomic('100.001') <= 100.0) def test_gt(self): self.assertTrue(UntypedAtomic(25) > UntypedAtomic(15)) self.assertTrue(25 > UntypedAtomic(15)) self.assertTrue(UntypedAtomic(25) > 15) self.assertTrue(UntypedAtomic(25) > '15') def test_ge(self): self.assertTrue(UntypedAtomic(25) >= UntypedAtomic(25)) 
self.assertFalse(25 >= UntypedAtomic(25.1)) def test_add(self): self.assertEqual(UntypedAtomic(20) + UntypedAtomic(3), UntypedAtomic(23)) self.assertEqual(UntypedAtomic(-2) + UntypedAtomic(3), UntypedAtomic(1)) self.assertEqual(UntypedAtomic(17) + UntypedAtomic(5.1), UntypedAtomic(22.1)) self.assertEqual(UntypedAtomic('1') + UntypedAtomic('2.7'), UntypedAtomic(3.7)) def test_conversion(self): self.assertEqual(str(UntypedAtomic(25.1)), '25.1') self.assertEqual(int(UntypedAtomic(25)), 25) with self.assertRaises(ValueError): int(UntypedAtomic(25.1)) self.assertEqual(float(UntypedAtomic(25.1)), 25.1) self.assertEqual(bool(UntypedAtomic(True)), True) self.assertEqual(str(UntypedAtomic(u'Joan Miró')), u'Joan Miró') self.assertEqual(bytes(UntypedAtomic(u'Joan Miró')), b'Joan Mir\xc3\xb3') def test_numerical_operators(self): self.assertEqual(0.25 * UntypedAtomic(1000), 250) self.assertEqual(1200 - UntypedAtomic(1000.0), 200.0) self.assertEqual(UntypedAtomic(1000.0) - 250, 750.0) self.assertEqual(UntypedAtomic('1000.0') - 250, 750.0) self.assertEqual(UntypedAtomic('1000.0') - UntypedAtomic(250), 750.0) self.assertEqual(UntypedAtomic(0.75) * UntypedAtomic(100), 75) self.assertEqual(UntypedAtomic('0.75') * UntypedAtomic('100'), 75) self.assertEqual(UntypedAtomic('9.0') / UntypedAtomic('3'), 3.0) self.assertEqual(9.0 / UntypedAtomic('3'), 3.0) self.assertEqual(UntypedAtomic('15') * UntypedAtomic('4'), 60) def test_abs(self): self.assertEqual(abs(UntypedAtomic(-10)), 10) def test_mod(self): self.assertEqual(UntypedAtomic(1) % 2, 1) self.assertEqual(UntypedAtomic('1') % 2, 1.0) def test_hashing(self): self.assertEqual(hash(UntypedAtomic(12345)), hash('12345')) self.assertIsInstance(hash(UntypedAtomic('alpha')), int) def test_validate(self): self.assertIsNone(UntypedAtomic.validate(UntypedAtomic('10'))) self.assertRaises(TypeError, UntypedAtomic.validate, '10') self.assertRaises(TypeError, UntypedAtomic.validate, 10) class DateTimeTypesTest(unittest.TestCase): def 
test_abstract_classes(self): self.assertRaises(TypeError, AbstractDateTime) self.assertRaises(TypeError, OrderedDateTime) def test_datetime_init(self): with self.assertRaises(ValueError) as err: DateTime(year=0, month=1, day=1) self.assertIn("0 is an illegal value for year", str(err.exception)) with self.assertRaises(TypeError) as err: DateTime(year=-1999.0, month=1, day=1) self.assertIn("invalid type for year", str(err.exception)) def test_datetime_fromstring(self): dt = DateTime.fromstring('2000-10-07T00:00:00') self.assertIsInstance(dt, DateTime) self.assertEqual(dt._dt, datetime.datetime(2000, 10, 7)) dt = DateTime.fromstring('-2000-10-07T00:00:00') self.assertIsInstance(dt, DateTime) self.assertEqual(dt._dt, datetime.datetime(4, 10, 7)) self.assertEqual(dt._year, -2001) dt = DateTime.fromstring('2020-03-05T23:04:10.047') self.assertIsInstance(dt, DateTime) self.assertEqual(dt._dt, datetime.datetime(2020, 3, 5, 23, 4, 10, 47000)) with self.assertRaises(TypeError) as err: DateTime.fromstring(b'00-10-07') self.assertIn("1st argument has an invalid type ", str(err.exception)) with self.assertRaises(TypeError) as err: DateTime.fromstring('2010-10-07', tzinfo='Z') self.assertIn("2nd argument has an invalid type ", str(err.exception)) with self.assertRaises(ValueError) as err: DateTime.fromstring('2000-10-07') self.assertIn("Invalid datetime string", str(err.exception)) with self.assertRaises(ValueError) as err: DateTime.fromstring('00-10-07T00:00:00') self.assertIn("Invalid datetime string", str(err.exception)) with self.assertRaises(ValueError) as err: DateTime.fromstring('2020-03-05 23:04:10.047') self.assertIn("Invalid datetime string", str(err.exception)) dt = DateTime.fromstring('2000-10-07T00:00:00.100000') self.assertIsInstance(dt, DateTime) self.assertEqual(dt._dt, datetime.datetime(2000, 10, 7, microsecond=100000)) def test_issue_36_fromstring_with_more_microseconds_digits(self): dt = DateTime.fromstring('2000-10-07T00:00:00.00090001') 
self.assertIsInstance(dt, DateTime) self.assertEqual(dt._dt, datetime.datetime(2000, 10, 7, microsecond=900)) dt = DateTime.fromstring('2000-10-07T00:00:00.0009009999') self.assertIsInstance(dt, DateTime) self.assertEqual(dt._dt, datetime.datetime(2000, 10, 7, microsecond=900)) dt = DateTime.fromstring('2000-10-07T00:00:00.1000000') self.assertIsInstance(dt, DateTime) self.assertEqual(dt._dt, datetime.datetime(2000, 10, 7, microsecond=100000)) # Regression test of issue #36 tz = Timezone.fromstring('+01:00') dt = DateTime.fromstring('2021-02-21T21:43:03.1121296+01:00') self.assertIsInstance(dt, DateTime) self.assertEqual(dt._dt, datetime.datetime(2021, 2, 21, 21, 43, 3, 112129, tz)) # From W3C's XQuery/XPath tests dt = DateTime.fromstring('9999-12-31T23:59:59.9999999') self.assertIsInstance(dt, DateTime) self.assertEqual(dt._dt, datetime.datetime(9999, 12, 31, 23, 59, 59, 999999)) def test_date_fromstring(self): self.assertIsInstance(Date.fromstring('2000-10-07'), Date) self.assertIsInstance(Date.fromstring('-2000-10-07'), Date) self.assertIsInstance(Date.fromstring('0000-02-29'), Date) with self.assertRaises(ValueError) as ctx: Date10.fromstring('0000-02-29') self.assertIn("year '0000' is an illegal value for XSD 1.0", str(ctx.exception)) with self.assertRaises(ValueError) as ctx: Date.fromstring('01000-02-29') self.assertIn("when year exceeds 4 digits leading zeroes are not allowed", str(ctx.exception)) dt = Date.fromstring("-0003-01-01") self.assertEqual(dt._year, -4) self.assertEqual(dt._dt.year, 6) self.assertEqual(dt._dt.month, 1) self.assertEqual(dt._dt.day, 1) self.assertTrue(dt.bce) def test_fromdatetime(self): dt = datetime.datetime(2000, 1, 20) self.assertEqual(str(DateTime.fromdatetime(dt)), '2000-01-20T00:00:00') with self.assertRaises(TypeError) as err: DateTime.fromdatetime('2000-10-07') self.assertEqual("1st argument has an invalid type ", str(err.exception)) with self.assertRaises(TypeError) as err: DateTime.fromdatetime(dt, year='0001') 
self.assertEqual("2nd argument has an invalid type ", str(err.exception)) self.assertEqual(str(DateTime.fromdatetime(dt, year=1)), '0001-01-20T00:00:00') def test_iso_year_property(self): self.assertEqual(DateTime(2000, 10, 7).iso_year, '2000') self.assertEqual(DateTime(20001, 10, 7).iso_year, '20001') self.assertEqual(DateTime(-9999, 10, 7).iso_year, '-9998') self.assertEqual(DateTime10(-9999, 10, 7).iso_year, '-9999') self.assertEqual(DateTime(-1, 10, 7).iso_year, '0000') self.assertEqual(DateTime10(-1, 10, 7).iso_year, '-0001') def test_datetime_repr(self): dt = DateTime.fromstring('2000-10-07T00:00:00') self.assertEqual(repr(dt), "DateTime(2000, 10, 7, 0, 0, 0)") self.assertEqual(str(dt), '2000-10-07T00:00:00') dt = DateTime.fromstring('-0100-04-13T23:59:59') self.assertEqual(repr(dt), "DateTime(-101, 4, 13, 23, 59, 59)") self.assertEqual(str(dt), '-0100-04-13T23:59:59') dt = DateTime10.fromstring('-0100-04-13T10:30:00-04:00') if sys.version_info >= (3, 7): self.assertEqual( repr(dt), "DateTime10(-100, 4, 13, 10, 30, 0, " "tzinfo=Timezone(datetime.timedelta(days=-1, seconds=72000)))" ) else: self.assertEqual(repr(dt), "DateTime10(-100, 4, 13, 10, 30, 0, " "tzinfo=Timezone(datetime.timedelta(-1, 72000)))") self.assertEqual(str(dt), '-0100-04-13T10:30:00-04:00') dt = DateTime(2001, 1, 1, microsecond=10) self.assertEqual(repr(dt), 'DateTime(2001, 1, 1, 0, 0, 0.000010)') self.assertEqual(str(dt), '2001-01-01T00:00:00.00001') def test_24_hour_datetime(self): dt = DateTime.fromstring('0000-09-19T24:00:00Z') self.assertEqual(str(dt), '0000-09-20T00:00:00Z') def test_date_repr(self): dt = Date.fromstring('2000-10-07') self.assertEqual(repr(dt), "Date(2000, 10, 7)") self.assertEqual(str(dt), '2000-10-07') dt = Date.fromstring('-0100-04-13') self.assertEqual(repr(dt), "Date(-101, 4, 13)") self.assertEqual(str(dt), '-0100-04-13') dt = Date10.fromstring('-0100-04-13') self.assertEqual(repr(dt), "Date10(-100, 4, 13)") self.assertEqual(str(dt), '-0100-04-13') dt = 
Date.fromstring("-0003-01-01") self.assertEqual(repr(dt), "Date(-4, 1, 1)") self.assertEqual(str(dt), '-0003-01-01') dt = Date10.fromstring("-0003-01-01") self.assertEqual(repr(dt), "Date10(-3, 1, 1)") self.assertEqual(str(dt), '-0003-01-01') def test_gregorian_year_repr(self): dt = GregorianYear.fromstring('1991') self.assertEqual(repr(dt), "GregorianYear(1991)") self.assertEqual(str(dt), '1991') dt = GregorianYear.fromstring('0000') self.assertEqual(repr(dt), "GregorianYear(-1)") self.assertEqual(str(dt), '0000') dt = GregorianYear10.fromstring('-0050') self.assertEqual(repr(dt), "GregorianYear10(-50)") self.assertEqual(str(dt), '-0050') def test_gregorian_day_repr(self): dt = GregorianDay.fromstring('---31') self.assertEqual(repr(dt), "GregorianDay(31)") self.assertEqual(str(dt), '---31') dt = GregorianDay.fromstring('---05Z') self.assertEqual(repr(dt), "GregorianDay(5, tzinfo=Timezone(datetime.timedelta(0)))") self.assertEqual(str(dt), '---05Z') def test_gregorian_month_repr(self): dt = GregorianMonth.fromstring('--09') self.assertEqual(repr(dt), "GregorianMonth(9)") self.assertEqual(str(dt), '--09') def test_gregorian_month_day_repr(self): dt = GregorianMonthDay.fromstring('--07-23') self.assertEqual(repr(dt), "GregorianMonthDay(7, 23)") self.assertEqual(str(dt), '--07-23') def test_gregorian_year_month_repr(self): dt = GregorianYearMonth.fromstring('-1890-12') self.assertEqual(repr(dt), "GregorianYearMonth(-1891, 12)") self.assertEqual(str(dt), '-1890-12') dt = GregorianYearMonth10.fromstring('-0050-04') self.assertEqual(repr(dt), "GregorianYearMonth10(-50, 4)") self.assertEqual(str(dt), '-0050-04') def test_time_repr(self): dt = Time.fromstring('20:40:13') self.assertEqual(repr(dt), "Time(20, 40, 13)") self.assertEqual(str(dt), '20:40:13') dt = Time.fromstring('24:00:00') self.assertEqual(repr(dt), "Time(0, 0, 0)") self.assertEqual(str(dt), '00:00:00') dt = Time.fromstring('15:34:29.000037') self.assertEqual(repr(dt), "Time(15, 34, 29.000037)") 
self.assertEqual(str(dt), '15:34:29.000037') def test_eq_operator(self): tz = Timezone.fromstring('-05:00') mkdt = DateTime.fromstring self.assertTrue(mkdt("2002-04-02T12:00:00-01:00") == mkdt("2002-04-02T17:00:00+04:00")) self.assertFalse(mkdt("2002-04-02T12:00:00") == mkdt("2002-04-02T23:00:00+06:00")) self.assertFalse(mkdt("2002-04-02T12:00:00") == mkdt("2002-04-02T17:00:00")) self.assertTrue(mkdt("2002-04-02T12:00:00") == mkdt("2002-04-02T12:00:00")) self.assertTrue(mkdt("2002-04-02T23:00:00-04:00") == mkdt("2002-04-03T02:00:00-01:00")) self.assertTrue(mkdt("1999-12-31T24:00:00") == mkdt("2000-01-01T00:00:00")) self.assertTrue(mkdt("2005-04-04T24:00:00") == mkdt("2005-04-05T00:00:00")) self.assertTrue( mkdt("2002-04-02T12:00:00-01:00", tz) == mkdt("2002-04-02T17:00:00+04:00", tz)) self.assertTrue(mkdt("2002-04-02T12:00:00", tz) == mkdt("2002-04-02T23:00:00+06:00", tz)) self.assertFalse(mkdt("2002-04-02T12:00:00", tz) == mkdt("2002-04-02T17:00:00", tz)) self.assertTrue(mkdt("2002-04-02T12:00:00", tz) == mkdt("2002-04-02T12:00:00", tz)) self.assertTrue( mkdt("2002-04-02T23:00:00-04:00", tz) == mkdt("2002-04-03T02:00:00-01:00", tz)) self.assertTrue(mkdt("1999-12-31T24:00:00", tz) == mkdt("2000-01-01T00:00:00", tz)) self.assertTrue(mkdt("2005-04-04T24:00:00", tz) == mkdt("2005-04-05T00:00:00", tz)) self.assertFalse(mkdt("2005-04-04T24:00:00", tz) != mkdt("2005-04-05T00:00:00", tz)) self.assertTrue(Date.fromstring("-1000-01-01") == Date.fromstring("-1000-01-01")) self.assertTrue(Date.fromstring("-10000-01-01") == Date.fromstring("-10000-01-01")) self.assertFalse(Date.fromstring("20000-01-01") != Date.fromstring("20000-01-01")) self.assertFalse(Date.fromstring("-10000-01-02") == Date.fromstring("-10000-01-01")) self.assertFalse(Date.fromstring("-10000-01-02") == (1, 2, 3)) # Wrong type self.assertTrue(Date.fromstring("-10000-01-02") != (1, 2, 3)) # Wrong type def test_lt_operator(self): mkdt = DateTime.fromstring mkdate = Date.fromstring 
self.assertTrue(mkdt("2002-04-02T12:00:00-01:00") < mkdt("2002-04-02T17:00:00-01:00")) self.assertFalse(mkdt("2002-04-02T18:00:00-01:00") < mkdt("2002-04-02T17:00:00-01:00")) self.assertTrue(mkdt("2002-04-02T18:00:00+02:00") < mkdt("2002-04-02T17:00:00Z")) self.assertTrue(mkdt("2002-04-02T18:00:00+02:00") < mkdt("2002-04-03T00:00:00Z")) self.assertTrue(mkdt("-2002-01-01T10:00:00") < mkdt("2001-01-01T17:00:00Z")) self.assertFalse(mkdt("2002-01-01T10:00:00") < mkdt("-2001-01-01T17:00:00Z")) self.assertTrue(mkdt("-2002-01-01T10:00:00") < mkdt("-2001-01-01T17:00:00Z")) self.assertTrue(mkdt("-12002-01-01T10:00:00") < mkdt("-12001-01-01T17:00:00Z")) self.assertFalse(mkdt("12002-01-01T10:00:00") < mkdt("12001-01-01T17:00:00Z")) self.assertTrue(mkdt("-10000-01-01T10:00:00Z") < mkdt("-10000-01-01T17:00:00Z")) self.assertRaises(TypeError, operator.lt, mkdt("2002-04-02T18:00:00+02:00"), mkdate("2002-04-03")) def test_le_operator(self): mkdt = DateTime.fromstring mkdate = Date.fromstring self.assertTrue(mkdt("2002-04-02T12:00:00-01:00") <= mkdt("2002-04-02T12:00:00-01:00")) self.assertFalse(mkdt("2002-04-02T18:00:00-01:00") <= mkdt("2002-04-02T17:00:00-01:00")) self.assertTrue(mkdt("2002-04-02T18:00:00+01:00") <= mkdt("2002-04-02T17:00:00Z")) self.assertTrue(mkdt("-2002-01-01T10:00:00") <= mkdt("2001-01-01T17:00:00Z")) self.assertFalse(mkdt("2002-01-01T10:00:00") <= mkdt("-2001-01-01T17:00:00Z")) self.assertTrue(mkdt("-2002-01-01T10:00:00") <= mkdt("-2001-01-01T17:00:00Z")) self.assertTrue(mkdt("-10000-01-01T10:00:00Z") <= mkdt("-10000-01-01T10:00:00Z")) self.assertTrue(mkdt("-190000-01-01T10:00:00Z") <= mkdt("0100-01-01T10:00:00Z")) self.assertRaises(TypeError, operator.le, mkdt("2002-04-02T18:00:00+02:00"), mkdate("2002-04-03")) def test_gt_operator(self): mkdt = DateTime.fromstring mkdate = Date.fromstring self.assertFalse(mkdt("2002-04-02T12:00:00-01:00") > mkdt("2002-04-02T17:00:00-01:00")) self.assertTrue(mkdt("2002-04-02T18:00:00-01:00") > 
mkdt("2002-04-02T17:00:00-01:00")) self.assertFalse(mkdt("2002-04-02T18:00:00+02:00") > mkdt("2002-04-02T17:00:00Z")) self.assertFalse(mkdt("2002-04-02T18:00:00+02:00") > mkdt("2002-04-03T00:00:00Z")) self.assertTrue(mkdt("2002-01-01T10:00:00") > mkdt("-2001-01-01T17:00:00Z")) self.assertFalse(mkdt("-2002-01-01T10:00:00") > mkdt("-2001-01-01T17:00:00Z")) self.assertTrue(mkdt("13567-04-18T10:00:00Z") > datetime.datetime.now()) self.assertFalse(mkdt("15032-11-12T23:17:59Z") > mkdt("15032-11-12T23:17:59Z")) self.assertRaises(TypeError, operator.gt, mkdt("2002-04-02T18:00:00+02:00"), mkdate("2002-04-03")) def test_ge_operator(self): mkdt = DateTime.fromstring mkdate = Date.fromstring self.assertTrue(mkdt("2002-04-02T12:00:00-01:00") >= mkdt("2002-04-02T12:00:00-01:00")) self.assertTrue(mkdt("2002-04-02T18:00:00-01:00") >= mkdt("2002-04-02T17:00:00-01:00")) self.assertTrue(mkdt("2002-04-02T18:00:00+01:00") >= mkdt("2002-04-02T17:00:00Z")) self.assertFalse(mkdt("-2002-01-01T10:00:00") >= mkdt("2001-01-01T17:00:00Z")) self.assertTrue(mkdt("2002-01-01T10:00:00") >= mkdt("-2001-01-01T17:00:00Z")) self.assertFalse(mkdt("-2002-01-01T10:00:00") >= mkdt("-2001-01-01T17:00:00Z")) self.assertTrue(mkdt("-3000-06-21T00:00:00Z") >= mkdt("-3000-06-21T00:00:00Z")) self.assertFalse(mkdt("-3000-06-21T00:00:00Z") >= mkdt("-3000-06-21T01:00:00Z")) self.assertTrue(mkdt("15032-11-12T23:17:59Z") >= mkdt("15032-11-12T23:17:59Z")) self.assertRaises(TypeError, operator.ge, mkdt("2002-04-02T18:00:00+02:00"), mkdate("2002-04-03")) def test_fromdelta(self): self.assertIsNotNone(Date.fromstring('10000-02-28')) self.assertEqual(Date.fromdelta(datetime.timedelta(days=0)), Date.fromstring("0001-01-01")) self.assertEqual(Date.fromdelta(datetime.timedelta(days=31)), Date.fromstring("0001-02-01")) self.assertEqual(Date.fromdelta(datetime.timedelta(days=59)), Date.fromstring("0001-03-01")) self.assertEqual(Date.fromdelta(datetime.timedelta(days=151)), Date.fromstring("0001-06-01")) 
self.assertEqual(Date.fromdelta(datetime.timedelta(days=153)), Date.fromstring("0001-06-03")) self.assertEqual(DateTime.fromdelta(datetime.timedelta(days=153, seconds=72000)), DateTime.fromstring("0001-06-03T20:00:00")) self.assertEqual(Date.fromdelta(datetime.timedelta(days=365)), Date.fromstring("0002-01-01")) self.assertEqual(Date.fromdelta(datetime.timedelta(days=396)), Date.fromstring("0002-02-01")) self.assertEqual(Date.fromdelta(datetime.timedelta(days=-366)), Date.fromstring("-0000-01-01")) self.assertEqual(Date.fromdelta(datetime.timedelta(days=-1)), Date.fromstring("-0000-12-31")) self.assertEqual(Date.fromdelta(datetime.timedelta(days=-335)), Date.fromstring("-0000-02-01")) self.assertEqual(Date.fromdelta(datetime.timedelta(days=-1)), Date.fromstring("-0000-12-31")) self.assertEqual(Date10.fromdelta(datetime.timedelta(days=-366)), Date10.fromstring("-0001-01-01")) self.assertEqual(Date10.fromdelta(datetime.timedelta(days=-326)), Date10.fromstring("-0001-02-10")) self.assertEqual(Date10.fromdelta(datetime.timedelta(days=-1)), Date10.fromstring("-0001-12-31Z")) # With timezone adjusting self.assertEqual(Date10.fromdelta(datetime.timedelta(hours=-22), adjust_timezone=True), Date10.fromstring("-0001-12-31-02:00")) self.assertEqual(Date10.fromdelta(datetime.timedelta(hours=-27), adjust_timezone=True), Date10.fromstring("-0001-12-31+03:00")) self.assertEqual( Date10.fromdelta(datetime.timedelta(hours=-27, minutes=-12), adjust_timezone=True), Date10.fromstring("-0001-12-31+03:12") ) self.assertEqual( DateTime10.fromdelta(datetime.timedelta(hours=-27, minutes=-12, seconds=-5)), DateTime10.fromstring("-0001-12-30T20:47:55") ) def test_todelta(self): self.assertEqual(Date.fromstring("0001-01-01").todelta(), datetime.timedelta(days=0)) self.assertEqual(Date.fromstring("0001-02-01").todelta(), datetime.timedelta(days=31)) self.assertEqual(Date.fromstring("0001-03-01").todelta(), datetime.timedelta(days=59)) self.assertEqual(Date.fromstring("0001-06-01").todelta(), 
datetime.timedelta(days=151)) self.assertEqual(Date.fromstring("0001-06-03").todelta(), datetime.timedelta(days=153)) self.assertEqual(DateTime.fromstring("0001-06-03T20:00:00").todelta(), datetime.timedelta(days=153, seconds=72000)) self.assertEqual(Date.fromstring("0001-01-01-01:00").todelta(), datetime.timedelta(seconds=3600)) self.assertEqual(Date.fromstring("0001-01-01-07:00").todelta(), datetime.timedelta(seconds=3600 * 7)) self.assertEqual(Date.fromstring("0001-01-01+10:00").todelta(), datetime.timedelta(seconds=-3600 * 10)) self.assertEqual(Date.fromstring("0001-01-02+10:00").todelta(), DayTimeDuration.fromstring("PT14H").get_timedelta()) self.assertEqual(Date.fromstring("-0000-12-31-01:00").todelta(), DayTimeDuration.fromstring("-PT23H").get_timedelta()) self.assertEqual(Date10.fromstring("-0001-12-31-01:00").todelta(), DayTimeDuration.fromstring("-PT23H").get_timedelta()) self.assertEqual(Date.fromstring("-0000-12-31+01:00").todelta(), DayTimeDuration.fromstring("-P1DT1H").get_timedelta()) self.assertEqual(Date.fromstring("0002-01-01").todelta(), datetime.timedelta(days=365)) self.assertEqual(Date.fromstring("0002-02-01").todelta(), datetime.timedelta(days=396)) self.assertEqual(Date.fromstring("-0000-01-01").todelta(), datetime.timedelta(days=-366)) self.assertEqual(Date.fromstring("-0000-02-01").todelta(), datetime.timedelta(days=-335)) self.assertEqual(Date.fromstring("-0000-12-31").todelta(), datetime.timedelta(days=-1)) self.assertEqual(Date10.fromstring("-0001-01-01").todelta(), datetime.timedelta(days=-366)) self.assertEqual(Date10.fromstring("-0001-02-10").todelta(), datetime.timedelta(days=-326)) self.assertEqual(Date10.fromstring("-0001-12-31Z").todelta(), datetime.timedelta(days=-1)) self.assertEqual(Date10.fromstring("-0001-12-31-02:00").todelta(), datetime.timedelta(hours=-22)) self.assertEqual(Date10.fromstring("-0001-12-31+03:00").todelta(), datetime.timedelta(hours=-27)) self.assertEqual(Date10.fromstring("-0001-12-31+03:00").todelta(), 
datetime.timedelta(hours=-27)) self.assertEqual(Date10.fromstring("-0001-12-31+03:12").todelta(), datetime.timedelta(hours=-27, minutes=-12)) def test_to_and_from_delta(self): for month, day in [(1, 1), (1, 2), (2, 1), (2, 28), (3, 10), (6, 30), (12, 31)]: fmt1 = '{:04}-%s' % '{:02}-{:02}'.format(month, day) fmt2 = '{}-%s' % '{:02}-{:02}'.format(month, day) days = sum(MONTH_DAYS[m] for m in range(1, month)) + day - 1 for year in range(1, 15000): if year <= 500 or 9900 <= year <= 10100 or random.randint(1, 20) == 1: date_string = fmt1.format(year) if year < 10000 else fmt2.format(year) dt1 = Date10.fromstring(date_string) delta1 = dt1.todelta() delta2 = datetime.timedelta(days=days) self.assertEqual(delta1, delta2, msg="Failed for %r: %r != %r" % (dt1, delta1, delta2)) dt2 = Date10.fromdelta(delta2) self.assertEqual(dt1, dt2, msg="Failed for year %d: %r != %r" % (year, dt1, dt2)) days += 366 if isleap(year if month <= 2 else year + 1) else 365 def test_to_and_from_delta_bce(self): for month, day in [(1, 1), (1, 2), (2, 1), (2, 28), (3, 10), (5, 26), (6, 30), (12, 31)]: fmt1 = '-{:04}-%s' % '{:02}-{:02}'.format(month, day) fmt2 = '{}-%s' % '{:02}-{:02}'.format(month, day) days = -sum(MONTH_DAYS_LEAP[m] for m in range(month, 13)) + day - 1 for year in range(-1, -15000, -1): if year >= -500 or -9900 >= year >= -10100 or random.randint(1, 20) == 1: date_string = fmt1.format(abs(year)) if year > -10000 else fmt2.format(year) dt1 = Date10.fromstring(date_string) delta1 = dt1.todelta() delta2 = datetime.timedelta(days=days) self.assertEqual(delta1, delta2, msg="Failed for %r: %r != %r" % (dt1, delta1, delta2)) dt2 = Date10.fromdelta(delta2) self.assertEqual(dt1, dt2, msg="Failed for year %d: %r != %r" % (year, dt1, dt2)) days -= 366 if isleap(year if month <= 2 else year + 1) else 365 def test_add_operator(self): date = Date.fromstring date10 = Date10.fromstring daytime_duration = DayTimeDuration.fromstring self.assertEqual(date("0001-01-01") + daytime_duration('P2D'), 
date("0001-01-03")) self.assertEqual(date("0001-01-01") + daytime_duration('-P2D'), date("0000-12-30")) self.assertEqual(date("-0001-01-01") + daytime_duration('P2D'), date("-0001-01-03")) self.assertEqual(date("-0001-12-01") + daytime_duration('P30D'), date("-0001-12-31")) self.assertEqual(date("-0001-12-01") + daytime_duration('P31D'), date("0000-01-01")) self.assertEqual(date10("-0001-12-01") + daytime_duration('P31D'), date10("0001-01-01")) self.assertEqual(date("0001-01-01") + YearMonthDuration(months=12), Date(2, 1, 1)) self.assertEqual(date("-0003-01-01") + YearMonthDuration(months=12), Date(-3, 1, 1)) self.assertEqual(date("-0004-01-01") + YearMonthDuration(months=13), Date(-4, 2, 1)) self.assertEqual(date("0001-01-05") + YearMonthDuration(months=25), Date(3, 2, 5)) with self.assertRaises(TypeError) as err: date("0001-01-05") + date("0001-01-01") self.assertEqual(str(err.exception), "wrong type " "for operand Date(1, 1, 1)") with self.assertRaises(TypeError) as err: date("0001-01-05") + 10 self.assertEqual(str(err.exception), "wrong type for operand 10") self.assertEqual(Time(13, 30, 00) + daytime_duration('PT3M21S'), Time(13, 33, 21)) self.assertEqual(Time(21, 00, 00) + datetime.timedelta(seconds=105), Time(21, 1, 45)) with self.assertRaises(TypeError) as err: Time(21, 00, 00) + 105 self.assertEqual(str(err.exception), "wrong type for operand 105") def test_sub_operator(self): date = Date.fromstring date10 = Date10.fromstring daytime_duration = DayTimeDuration.fromstring self.assertEqual(date("2002-04-02") - date("2002-04-01"), DayTimeDuration(seconds=86400)) self.assertEqual(date("-2002-04-02") - date("-2002-04-01"), DayTimeDuration(seconds=86400)) self.assertEqual(date("-0002-01-01") - date("-0001-12-31"), DayTimeDuration.fromstring('-P729D')) self.assertEqual(date("-0101-01-01") - date("-0100-12-31"), DayTimeDuration.fromstring('-P729D')) self.assertEqual(date("15032-11-12") - date("15032-11-11"), DayTimeDuration(seconds=86400)) 
self.assertEqual(date("-9999-11-12") - date("-9999-11-11"), DayTimeDuration(seconds=86400)) self.assertEqual(date("-9999-11-12") - date("-9999-11-12"), DayTimeDuration(seconds=0)) self.assertEqual(date("-9999-11-11") - date("-9999-11-12"), DayTimeDuration(seconds=-86400)) self.assertEqual(date10("-2001-04-02-02:00") - date10("-2001-04-01"), DayTimeDuration.fromstring('P1DT2H')) self.assertEqual(Time(13, 30, 00) - Time(13, 00, 00), daytime_duration('PT30M')) self.assertEqual(Time(13, 30, 00) - Time(13, 59, 59), daytime_duration('-PT29M59S')) self.assertEqual(Time(13, 30, 00) - daytime_duration('PT3M21S'), Time(13, 26, 39)) self.assertEqual(Time(21, 00, 00) - datetime.timedelta(seconds=105), Time(20, 58, 15)) with self.assertRaises(TypeError) as err: Time(21, 00, 00) - 105 self.assertEqual(str(err.exception), "wrong type for operand 105") def test_hashing(self): dt = DateTime.fromstring("2002-04-02T12:00:00-01:00") self.assertIsInstance(hash(dt), int) class DurationTypesTest(unittest.TestCase): def test_init(self): self.assertIsInstance(Duration(months=1, seconds=37000), Duration) with self.assertRaises(ValueError) as err: Duration(months=-1, seconds=1) self.assertEqual(str(err.exception), "signs differ: (months=-1, seconds=1)") seconds = Decimal('1.0100001') self.assertNotEqual(Duration(seconds=seconds).seconds, seconds) with self.assertRaises(OverflowError): Duration(months=2 ** 32) with self.assertRaises(OverflowError): Duration(seconds=Decimal('1' * 40)) self.assertEqual(DayTimeDuration(300).seconds, 300) self.assertEqual(YearMonthDuration(10).months, 10) def test_init_fromstring(self): self.assertIsInstance(Duration.fromstring('P1Y'), Duration) self.assertIsInstance(Duration.fromstring('P1M'), Duration) self.assertIsInstance(Duration.fromstring('P1D'), Duration) self.assertIsInstance(Duration.fromstring('PT0H'), Duration) self.assertIsInstance(Duration.fromstring('PT1M'), Duration) self.assertIsInstance(Duration.fromstring('PT0.0S'), Duration) 
self.assertRaises(ValueError, Duration.fromstring, 'P') self.assertRaises(ValueError, Duration.fromstring, 'PT') self.assertRaises(ValueError, Duration.fromstring, '1Y') self.assertRaises(ValueError, Duration.fromstring, 'P1W1DT5H3M23.9S') self.assertRaises(ValueError, Duration.fromstring, 'P1.5Y') self.assertRaises(ValueError, Duration.fromstring, 'PT1.1H') self.assertRaises(ValueError, Duration.fromstring, 'P1.0DT5H3M23.9S') self.assertIsInstance(DayTimeDuration.fromstring('PT0.0S'), DayTimeDuration) with self.assertRaises(ValueError) as err: DayTimeDuration.fromstring('P1MT0.0S') self.assertEqual(str(err.exception), "months must be 0 for 'DayTimeDuration'") self.assertIsInstance(YearMonthDuration.fromstring('P1Y'), YearMonthDuration) with self.assertRaises(ValueError) as err: YearMonthDuration.fromstring('P1YT10S') self.assertEqual(str(err.exception), "seconds must be 0 for 'YearMonthDuration'") def test_string_representation(self): self.assertEqual(repr(Duration(months=1, seconds=86400)), 'Duration(months=1, seconds=86400)') self.assertEqual(repr(Duration.fromstring('P3Y1D')), 'Duration(months=36, seconds=86400)') self.assertEqual(repr(YearMonthDuration.fromstring('P3Y6M')), 'YearMonthDuration(months=42)') self.assertEqual(repr(DayTimeDuration.fromstring('P1DT6H')), 'DayTimeDuration(seconds=108000)') def test_as_string(self): self.assertEqual(str(Duration.fromstring('P3Y1D')), 'P3Y1D') self.assertEqual(str(Duration.fromstring('PT2M10.4S')), 'PT2M10.4S') self.assertEqual(str(Duration.fromstring('PT2400H')), 'P100D') self.assertEqual(str(Duration.fromstring('-P15M')), '-P1Y3M') self.assertEqual(str(Duration.fromstring('-P809YT3H5M5S')), '-P809YT3H5M5S') self.assertEqual(str(Duration.fromstring('-PT1H8S')), '-PT1H8S') self.assertEqual(str(Duration.fromstring('PT2H5M')), 'PT2H5M') self.assertEqual(str(Duration.fromstring('P0Y')), 'PT0S') self.assertEqual(str(YearMonthDuration.fromstring('P3Y6M')), 'P3Y6M') 
self.assertEqual(str(YearMonthDuration.fromstring('-P3Y6M')), '-P3Y6M') self.assertEqual(str(YearMonthDuration.fromstring('P7M')), 'P7M') self.assertEqual(str(YearMonthDuration.fromstring('P2Y')), 'P2Y') self.assertEqual(str(DayTimeDuration.fromstring('P1DT6H')), 'P1DT6H') def test_eq(self): self.assertEqual(Duration.fromstring('PT147.5S'), (0, 147.5)) self.assertEqual(Duration.fromstring('PT147.3S'), (0, Decimal("147.3"))) self.assertEqual(Duration.fromstring('PT2M10.4S'), (0, Decimal("130.4"))) self.assertEqual(Duration.fromstring('PT5H3M23.9S'), (0, Decimal("18203.9"))) self.assertEqual(Duration.fromstring('P1DT5H3M23.9S'), (0, Decimal("104603.9"))) self.assertEqual(Duration.fromstring('P31DT5H3M23.9S'), (0, Decimal("2696603.9"))) self.assertEqual(Duration.fromstring('P1Y1DT5H3M23.9S'), (12, Decimal("104603.9"))) self.assertEqual(Duration.fromstring('-P809YT3H5M5S'), (-9708, -11105)) self.assertEqual(Duration.fromstring('P15M'), (15, 0)) self.assertEqual(Duration.fromstring('P1Y'), (12, 0)) self.assertEqual(Duration.fromstring('P3Y1D'), (36, 3600 * 24)) self.assertEqual(Duration.fromstring('PT2400H'), (0, 8640000)) self.assertEqual(Duration.fromstring('PT4500M'), (0, 4500 * 60)) self.assertEqual(Duration.fromstring('PT4500M70S'), (0, 4500 * 60 + 70)) self.assertEqual(Duration.fromstring('PT5529615.3S'), (0, Decimal('5529615.3'))) self.assertEqual(Duration.fromstring('P3Y1D'), UntypedAtomic('P3Y1D')) self.assertFalse(Duration.fromstring('P3Y1D') == UntypedAtomic('P3Y2D')) def test_ne(self): self.assertNotEqual(Duration.fromstring('PT147.3S'), None) self.assertNotEqual(Duration.fromstring('PT147.3S'), (0, 147.3)) self.assertNotEqual(Duration.fromstring('P3Y1D'), (36, 3600 * 2)) self.assertNotEqual(Duration.fromstring('P3Y1D'), (36, 3600 * 24, 0)) self.assertNotEqual(Duration.fromstring('P3Y1D'), None) self.assertNotEqual(Duration.fromstring('P3Y1D'), Duration.fromstring('P3Y2D')) self.assertNotEqual(Duration.fromstring('P3Y1D'), 
YearMonthDuration.fromstring('P3Y')) self.assertNotEqual(Duration.fromstring('P3Y1D'), UntypedAtomic('P3Y2D')) self.assertFalse(Duration.fromstring('P3Y1D') != UntypedAtomic('P3Y1D')) def test_lt(self): self.assertTrue(Duration(months=15) < Duration(months=16)) self.assertFalse(Duration(months=16) < Duration(months=16)) self.assertTrue(Duration(months=16) < Duration.fromstring('P16M1D')) self.assertTrue(Duration(months=16) < Duration.fromstring('P16MT1H')) self.assertTrue(Duration(months=16) < Duration.fromstring('P16MT1M')) self.assertTrue(Duration(months=16) < Duration.fromstring('P16MT1S')) self.assertFalse(Duration(months=16) < Duration.fromstring('P16MT0S')) self.assertTrue(Time(20, 15, 0) < Time(21, 0, 0)) self.assertFalse(Time(21, 15, 0) < Time(21, 0, 0)) with self.assertRaises(TypeError) as err: _ = Duration(months=16) < 16 self.assertEqual(str(err.exception), "wrong type for operand 16") def test_le(self): self.assertTrue(Duration(months=15) <= Duration(months=16)) self.assertTrue(Duration(months=16) <= Duration(16)) self.assertTrue(Duration(months=16) <= Duration.fromstring('P16M1D')) self.assertTrue(Duration(months=16) <= Duration.fromstring('P16MT1H')) self.assertTrue(Duration(months=16) <= Duration.fromstring('P16MT1M')) self.assertTrue(Duration(months=16) <= Duration.fromstring('P16MT1S')) self.assertTrue(Duration(months=16) <= Duration.fromstring('P16MT0S')) self.assertTrue(Time(11, 10, 35) <= Time(11, 10, 35)) self.assertFalse(Time(11, 10, 35) <= Time(11, 10, 34)) def test_gt(self): self.assertTrue(Duration(months=16) > Duration(15)) self.assertFalse(Duration(months=16) > Duration(16)) self.assertFalse(Time(23, 59, 59) > Time(23, 59, 59)) self.assertTrue(Time(9, 0, 0) > Time(8, 59, 59)) def test_ge(self): self.assertTrue(Duration(16) >= Duration(15)) self.assertTrue(Duration(16) >= Duration(16)) self.assertTrue(Duration.fromstring('P1Y1DT1S') >= Duration.fromstring('P1Y1D')) self.assertTrue(Time(23, 59, 59) >= Time(23, 59, 59)) 
self.assertFalse(Time(23, 59, 58) >= Time(23, 59, 59)) def test_incomparable_values(self): self.assertFalse(Duration(1) < Duration.fromstring('P30D')) self.assertFalse(Duration(1) <= Duration.fromstring('P30D')) self.assertFalse(Duration(1) > Duration.fromstring('P30D')) self.assertFalse(Duration(1) >= Duration.fromstring('P30D')) def test_add_operator(self): daytime_duration = DayTimeDuration.fromstring year_month_duration = YearMonthDuration.fromstring self.assertEqual(daytime_duration('P2D') + daytime_duration('P1D'), DayTimeDuration(seconds=86400 * 3)) self.assertEqual(daytime_duration('P2D') + Date10(1999, 8, 12), Date10(1999, 8, 14)) self.assertEqual(year_month_duration('P2Y') + year_month_duration('P1Y'), YearMonthDuration(months=36)) self.assertEqual(year_month_duration('P2Y') + Date10(1999, 8, 12), Date10(2001, 8, 12)) with self.assertRaises(TypeError) as err: _ = year_month_duration('P2Y') + daytime_duration('P1D') self.assertIn("cannot add ", str(ctx.exception)) qname = QName('http://xpath.test/ns', 'foo') self.assertEqual(qname.namespace, 'http://xpath.test/ns') self.assertEqual(qname.local_name, 'foo') self.assertIsNone(qname.prefix) self.assertEqual(qname.expanded_name, '{http://xpath.test/ns}foo') qname = QName('http://xpath.test/ns', 'tst:foo') self.assertEqual(qname.namespace, 'http://xpath.test/ns') self.assertEqual(qname.local_name, 'foo') self.assertEqual(qname.prefix, 'tst') self.assertEqual(qname.expanded_name, '{http://xpath.test/ns}foo') def test_string_representation(self): qname = QName('http://xpath.test/ns', 'tst:foo') self.assertEqual(repr(qname), "QName(uri='http://xpath.test/ns', qname='tst:foo')") qname = QName(uri=None, qname='foo') self.assertEqual(repr(qname), "QName(uri='', qname='foo')") qname = QName(uri='', qname='foo') self.assertEqual(repr(qname), "QName(uri='', qname='foo')") def test_hash_value(self): qname = QName('http://xpath.test/ns', 'tst:foo') self.assertEqual(hash(qname), hash(('http://xpath.test/ns', 'foo'))) def 
test_equivalence(self): qname1 = QName('http://xpath.test/ns1', 'tst1:foo') qname2 = QName('http://xpath.test/ns1', 'tst2:foo') qname3 = QName('http://xpath.test/ns2', 'tst2:foo') self.assertEqual(qname1, qname2) self.assertNotEqual(qname1, qname3) self.assertNotEqual(qname2, qname3) self.assertEqual(qname1, 'tst1:foo') with self.assertRaises(TypeError) as ctx: _ = qname1 == 1 self.assertIn('cannot compare', str(ctx.exception)) def test_notation(self): with self.assertRaises(TypeError) as ec: Notation(None, 'foo') self.assertEqual(str(ec.exception), "can't instantiate xs:NOTATION objects") class EffectiveNotation(Notation): def __init__(self, uri, qname): super().__init__(uri, qname) notation = EffectiveNotation(None, 'foo') self.assertEqual(notation, QName(None, 'foo')) notation = EffectiveNotation('http://xpath.test/ns1', 'tst1:foo') self.assertEqual(notation, QName('http://xpath.test/ns1', 'tst2:foo')) self.assertEqual(hash(notation), hash(('http://xpath.test/ns1', 'foo'))) class AnyUriTest(unittest.TestCase): def test_init(self): uri = AnyURI('http://xpath.test') self.assertEqual(uri, 'http://xpath.test') self.assertEqual(AnyURI(b'http://xpath.test'), 'http://xpath.test') self.assertEqual(AnyURI(uri), uri) self.assertEqual(AnyURI(UntypedAtomic('http://xpath.test')), uri) with self.assertRaises(TypeError): AnyURI(1) def test_string_representation(self): self.assertEqual(repr(AnyURI('http://xpath.test')), "AnyURI('http://xpath.test')") def test_bool_value(self): self.assertTrue(bool(AnyURI('http://xpath.test'))) self.assertFalse(bool(AnyURI(''))) def test_hash_value(self): self.assertEqual(hash(AnyURI('http://xpath.test')), hash('http://xpath.test')) def test_in_operator(self): uri = AnyURI('http://xpath.test') self.assertIn('xpath', uri) self.assertNotIn('example', uri) def test_comparison_operators(self): uri = AnyURI('http://xpath.test') self.assertTrue(uri != 'http://example.test') self.assertTrue(uri != AnyURI('http://example.test')) with 
self.assertRaises(TypeError): _ = uri == 10 with self.assertRaises(TypeError): _ = uri != 10 self.assertLess(AnyURI('1'), AnyURI('2')) self.assertLess(AnyURI('1'), '2') self.assertLessEqual(AnyURI('1'), AnyURI('1')) self.assertLessEqual(AnyURI('1'), '1') self.assertGreater(AnyURI('2'), AnyURI('1')) self.assertGreater(AnyURI('2'), '1') self.assertGreaterEqual(AnyURI('1'), AnyURI('1')) self.assertGreaterEqual(AnyURI('1'), '1') def test_validate(self): uri = AnyURI('http://xpath.test') self.assertIsNone(AnyURI.validate(uri)) self.assertIsNone(AnyURI.validate(b'http://xpath.test')) self.assertIsNone(AnyURI.validate('http://xpath.test')) with self.assertRaises(TypeError): AnyURI.validate(1) with self.assertRaises(ValueError): AnyURI.validate('http:://xpath.test') class TypeProxiesTest(unittest.TestCase): def test_instance_check(self): self.assertIsInstance(10, NumericProxy) self.assertIsInstance(17.8, NumericProxy) self.assertIsInstance(Decimal('18.12'), NumericProxy) self.assertNotIsInstance(True, NumericProxy) self.assertNotIsInstance(Duration.fromstring('P1Y'), NumericProxy) self.assertIsInstance(10, ArithmeticProxy) def test_subclass_check(self): self.assertFalse(issubclass(bool, NumericProxy)) self.assertFalse(issubclass(str, NumericProxy)) self.assertTrue(issubclass(int, NumericProxy)) self.assertTrue(issubclass(float, NumericProxy)) self.assertTrue(issubclass(Decimal, NumericProxy)) self.assertFalse(issubclass(DateTime10, NumericProxy)) self.assertFalse(issubclass(bool, ArithmeticProxy)) self.assertFalse(issubclass(str, ArithmeticProxy)) self.assertTrue(issubclass(int, ArithmeticProxy)) self.assertTrue(issubclass(float, ArithmeticProxy)) self.assertTrue(issubclass(Decimal, ArithmeticProxy)) # noinspection PyArgumentList def test_instance_build(self): self.assertEqual(NumericProxy(), 0.0) self.assertEqual(NumericProxy(9), 9.0) self.assertEqual(NumericProxy('49'), 49.0) self.assertEqual(ArithmeticProxy(), 0.0) self.assertEqual(ArithmeticProxy(8.0), 8.0) 
self.assertEqual(ArithmeticProxy('81.0'), 81.0) def test_boolean_proxy(self): self.assertTrue(BooleanProxy(1)) self.assertFalse(BooleanProxy(float('nan'))) self.assertIsNone(BooleanProxy.validate(True)) self.assertIsNone(BooleanProxy.validate('true')) self.assertIsNone(BooleanProxy.validate('1')) self.assertIsNone(BooleanProxy.validate('false')) self.assertIsNone(BooleanProxy.validate('0')) with self.assertRaises(TypeError): BooleanProxy.validate(1) with self.assertRaises(ValueError): BooleanProxy.validate('2') def test_decimal_proxy(self): self.assertIsInstance(DecimalProxy(20.0), Decimal) self.assertEqual(Decimal('10'), DecimalProxy('10')) self.assertEqual(Decimal('10'), DecimalProxy(Decimal('10'))) self.assertEqual(Decimal('10.0'), DecimalProxy(10.0)) self.assertEqual(Decimal(1), DecimalProxy(True)) with self.assertRaises(TypeError): DecimalProxy(None) with self.assertRaises(ArithmeticError): DecimalProxy([]) with self.assertRaises(ValueError): DecimalProxy('false') with self.assertRaises(ValueError): DecimalProxy('INF') with self.assertRaises(ValueError): DecimalProxy('NaN') with self.assertRaises(ValueError): DecimalProxy(float('nan')) with self.assertRaises(ValueError): DecimalProxy(float('inf')) self.assertIsNone(DecimalProxy.validate(Decimal(-2.0))) self.assertIsNone(DecimalProxy.validate(17)) self.assertIsNone(DecimalProxy.validate('17')) with self.assertRaises(ValueError): DecimalProxy.validate(Decimal('nan')) with self.assertRaises(ValueError): DecimalProxy.validate('alpha') with self.assertRaises(TypeError): DecimalProxy.validate(True) def test_double_proxy(self): self.assertIsInstance(DoubleProxy10(20), float) self.assertEqual(DoubleProxy10('10'), 10.0) self.assertTrue(math.isnan(DoubleProxy10('NaN'))) self.assertTrue(math.isinf(DoubleProxy10('INF'))) self.assertTrue(math.isinf(DoubleProxy10('-INF'))) # noinspection PyTypeChecker self.assertTrue(math.isinf(DoubleProxy('+INF'))) with self.assertRaises(ValueError): DoubleProxy10('+INF') with 
self.assertRaises(ValueError): DoubleProxy('nan') with self.assertRaises(ValueError): DoubleProxy('inf') self.assertIsNone(DoubleProxy10.validate(1.9)) self.assertIsNone(DoubleProxy10.validate('1.9')) with self.assertRaises(TypeError): DoubleProxy10.validate(Float10('1.9')) with self.assertRaises(ValueError): DoubleProxy10.validate('six') def test_string_proxy(self): self.assertIsInstance(StringProxy(20), str) self.assertIsNone(StringProxy.validate('alpha')) with self.assertRaises(TypeError): StringProxy.validate(b'alpha') @unittest.skipIf(xmlschema is None, "xmlschema library required.") class AtomicValuesTest(unittest.TestCase): def test_get_atomic_value(self): schema = xmlschema.XMLSchema(dedent("""\ """)) self.assertEqual(get_atomic_value(schema.elements['d'].type), UntypedAtomic('1')) with self.assertRaises(AttributeError): value = get_atomic_value(schema) value = get_atomic_value(schema.elements['a'].type) self.assertIsInstance(value, UntypedAtomic) self.assertEqual(value, UntypedAtomic(value='1')) value = get_atomic_value(schema.elements['b'].type) self.assertIsInstance(value, int) self.assertEqual(value, 1) value = get_atomic_value(schema.elements['c'].type) self.assertIsInstance(value, UntypedAtomic) self.assertEqual(value, UntypedAtomic(value='1')) value = get_atomic_value(schema.elements['d'].type) self.assertIsInstance(value, float) self.assertEqual(value, 1.0) value = get_atomic_value(schema.elements['e'].type) self.assertIsInstance(value, UntypedAtomic) self.assertEqual(value, UntypedAtomic(value='1')) if __name__ == '__main__': unittest.main() elementpath-3.0.2/tests/test_elementpath.py000066400000000000000000000021321427546011100211020ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. 
# See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # # # Note: Many tests in imported modules are built using the examples of the # XPath standards, published by W3C under the W3C Document License. # # References: # http://www.w3.org/TR/1999/REC-xpath-19991116/ # http://www.w3.org/TR/2010/REC-xpath20-20101214/ # http://www.w3.org/TR/2010/REC-xpath-functions-20101214/ # https://www.w3.org/Consortium/Legal/2015/doc-license # https://www.w3.org/TR/charmod-norm/ # if __name__ == '__main__': import unittest import os def load_tests(loader, tests, pattern): tests_dir = os.path.dirname(__file__) tests.addTests(loader.discover(start_dir=tests_dir, pattern=pattern or 'test*.py')) return tests unittest.main() elementpath-3.0.2/tests/test_etree.py000066400000000000000000000227601427546011100177110ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # import unittest import platform import importlib import io from pathlib import Path try: import lxml.etree as lxml_etree except ImportError: lxml_etree = None from elementpath.etree import ElementTree, PyElementTree, \ SafeXMLParser, etree_tostring, is_etree_document class TestElementTree(unittest.TestCase): @unittest.skipUnless(platform.python_implementation() == 'CPython', "requires CPython") def test_imported_modules(self): self.assertIs(importlib.import_module('xml.etree.ElementTree'), ElementTree) self.assertIs(importlib.import_module('xml.etree').ElementTree, ElementTree) self.assertIsNot(ElementTree.Element, ElementTree._Element_Py, msg="cElementTree is not available!") def test_element_string_serialization(self): self.assertRaises(TypeError, etree_tostring, '') elem = ElementTree.Element('element') self.assertEqual(etree_tostring(elem), '') self.assertEqual(etree_tostring(elem, xml_declaration=True), '') self.assertEqual(etree_tostring(elem, encoding='us-ascii'), b'') self.assertEqual(etree_tostring(elem, encoding='us-ascii', indent=' '), b' ') self.assertEqual(etree_tostring(elem, encoding='us-ascii', xml_declaration=True), b'\n') self.assertEqual(etree_tostring(elem, encoding='ascii'), b"\n") self.assertEqual(etree_tostring(elem, encoding='ascii', xml_declaration=False), b'') self.assertEqual(etree_tostring(elem, encoding='utf-8'), b'') self.assertEqual(etree_tostring(elem, encoding='utf-8', xml_declaration=True), b'\n') self.assertEqual(etree_tostring(elem, encoding='iso-8859-1'), b"\n") self.assertEqual(etree_tostring(elem, encoding='iso-8859-1', xml_declaration=False), b"") self.assertEqual(etree_tostring(elem, method='html'), '') self.assertEqual(etree_tostring(elem, method='text'), '') root = ElementTree.XML('\n' ' text1\n' ' text2\n' '') self.assertEqual(etree_tostring(root, method='text'), '\n text1\n text2') def test_py_element_string_serialization(self): elem = PyElementTree.Element('element') 
        self.assertEqual(etree_tostring(elem), '')
        self.assertEqual(etree_tostring(elem, xml_declaration=True), '')

        self.assertEqual(etree_tostring(elem, encoding='us-ascii'), b'')
        self.assertEqual(etree_tostring(elem, encoding='us-ascii', xml_declaration=True),
                         b'\n')

        self.assertEqual(etree_tostring(elem, encoding='ascii'), b"\n")
        self.assertEqual(etree_tostring(elem, encoding='ascii', xml_declaration=False), b'')
        self.assertEqual(etree_tostring(elem, encoding='utf-8'), b'')
        self.assertEqual(etree_tostring(elem, encoding='utf-8', xml_declaration=True),
                         b'\n')

        self.assertEqual(etree_tostring(elem, encoding='iso-8859-1'), b"\n")
        self.assertEqual(etree_tostring(elem, encoding='iso-8859-1', xml_declaration=False),
                         b"")

        self.assertEqual(etree_tostring(elem, method='html'), '')
        self.assertEqual(etree_tostring(elem, method='text'), '')

        root = PyElementTree.XML('\n'
                                 ' text1\n'
                                 ' text2\n'
                                 '')
        self.assertEqual(etree_tostring(root, method='text'), '\n text1\n text2')

    @unittest.skipIf(lxml_etree is None, 'lxml is not installed ...')
    def test_lxml_element_string_serialization(self):
        elem = lxml_etree.Element('element')
        self.assertEqual(etree_tostring(elem), '')
        self.assertEqual(etree_tostring(elem, xml_declaration=True), '')

        self.assertEqual(etree_tostring(elem, encoding='us-ascii'), b'')
        self.assertEqual(etree_tostring(elem, encoding='us-ascii', xml_declaration=True),
                         b'\n')

        self.assertEqual(etree_tostring(elem, encoding='ascii'), b'')
        self.assertEqual(etree_tostring(elem, encoding='ascii', xml_declaration=True),
                         b'\n')

        self.assertEqual(etree_tostring(elem, encoding='utf-8'), b'')
        self.assertEqual(etree_tostring(elem, encoding='utf-8', xml_declaration=True),
                         b'\n')

        self.assertEqual(etree_tostring(elem, encoding='iso-8859-1'), b"\n")
        self.assertEqual(etree_tostring(elem, encoding='iso-8859-1', xml_declaration=False),
                         b"")

        self.assertEqual(etree_tostring(elem, method='html'), '')
        self.assertEqual(etree_tostring(elem, method='text'), '')

        root = lxml_etree.XML('\n'
                              ' text1\n'
                              ' text2\n'
                              '')
        self.assertEqual(etree_tostring(root, method='text'), '\n text1\n text2')

    def test_defuse_xml_entities(self):
        xml_file = Path(__file__).parent.joinpath('resources/with_entity.xml')

        elem = ElementTree.parse(str(xml_file)).getroot()
        self.assertEqual(elem.text, 'abc')

        parser = SafeXMLParser(target=PyElementTree.TreeBuilder())
        with self.assertRaises(PyElementTree.ParseError) as ctx:
            ElementTree.parse(xml_file, parser=parser)
        self.assertEqual("Entities are forbidden (entity_name='e')", str(ctx.exception))

    def test_defuse_xml_external_entities(self):
        xml_file = Path(__file__).parent.joinpath('resources/external_entity.xml')

        with self.assertRaises(ElementTree.ParseError) as ctx:
            ElementTree.parse(str(xml_file))
        self.assertIn("undefined entity &ee", str(ctx.exception))

        parser = SafeXMLParser(target=PyElementTree.TreeBuilder())
        with self.assertRaises(PyElementTree.ParseError) as ctx:
            ElementTree.parse(str(xml_file), parser=parser)
        self.assertEqual("Entities are forbidden (entity_name='ee')", str(ctx.exception))

    def test_defuse_xml_unused_external_entities(self):
        xml_file = str(Path(__file__).parent.joinpath('resources/unused_external_entity.xml'))

        elem = ElementTree.parse(xml_file).getroot()
        self.assertEqual(elem.text, 'abc')

        parser = SafeXMLParser(target=PyElementTree.TreeBuilder())
        with self.assertRaises(PyElementTree.ParseError) as ctx:
            ElementTree.parse(xml_file, parser=parser)
        self.assertEqual("Entities are forbidden (entity_name='ee')", str(ctx.exception))

    def test_defuse_xml_unparsed_entities(self):
        xml_file = Path(__file__).parent.joinpath('resources/unparsed_entity.xml')

        parser = SafeXMLParser(target=PyElementTree.TreeBuilder())
        with self.assertRaises(PyElementTree.ParseError) as ctx:
            ElementTree.parse(str(xml_file), parser=parser)
        self.assertEqual("Unparsed entities are forbidden (entity_name='logo_file')",
                         str(ctx.exception))

    def test_defuse_xml_unused_unparsed_entities(self):
        xml_file = Path(__file__).parent.joinpath('resources/unused_unparsed_entity.xml')
        elem = ElementTree.parse(str(xml_file)).getroot()
        self.assertIsNone(elem.text)

        parser = SafeXMLParser(target=PyElementTree.TreeBuilder())
        with self.assertRaises(PyElementTree.ParseError) as ctx:
            ElementTree.parse(str(xml_file), parser=parser)
        self.assertEqual("Unparsed entities are forbidden (entity_name='logo_file')",
                         str(ctx.exception))

    def test_is_etree_document_function(self):
        document = ElementTree.parse(io.StringIO(''))
        self.assertTrue(is_etree_document(document))
        self.assertFalse(is_etree_document(ElementTree.XML('')))


if __name__ == '__main__':
    header_template = "ElementTree tests for elementpath with Python {} on {}"
    header = header_template.format(platform.python_version(), platform.platform())
    print('{0}\n{1}\n{0}'.format("*" * len(header), header))

    unittest.main()
elementpath-3.0.2/tests/test_exceptions.py000066400000000000000000000050611427546011100207610ustar00rootroot00000000000000
#!/usr/bin/env python
#
# Copyright (c), 2018-2020, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
import unittest

from elementpath.exceptions import ElementPathError, xpath_error
from elementpath.namespaces import XSD_NAMESPACE
from elementpath.xpath1 import XPath1Parser


class ExceptionsTest(unittest.TestCase):

    @classmethod
    def setUpClass(cls):
        cls.parser = XPath1Parser(namespaces={'xs': XSD_NAMESPACE, 'tst': "http://xpath.test/ns"})

    def test_string_conversion(self):
        err = ElementPathError("unknown error")
        self.assertEqual(str(err), 'unknown error')
        err = ElementPathError("unknown error", code='XPST0001')
        self.assertEqual(str(err), '[XPST0001] unknown error')
        token = self.parser.symbol_table['true'](self.parser)
        err = ElementPathError("unknown error", token=token)
        self.assertEqual(str(err), "'true' function at line 1, column 1: unknown error")
        err = ElementPathError("unknown error", code='XPST0001', token=token)
        self.assertEqual(str(err),
                         "'true' function at line 1, column 1: [XPST0001] unknown error")

    def test_xpath_error(self):
        self.assertEqual(str(xpath_error('XPST0001')),
                         '[err:XPST0001] Parser not bound to a schema')
        self.assertEqual(str(xpath_error('err:XPDY0002', "test message")),
                         '[err:XPDY0002] test message')

        self.assertRaises(ValueError, xpath_error, '')
        self.assertRaises(ValueError, xpath_error, 'error:XPDY0002')

        self.assertEqual(str(xpath_error('{http://www.w3.org/2005/xqt-errors}XPST0001')),
                         '[err:XPST0001] Parser not bound to a schema')

        with self.assertRaises(ValueError) as err:
            xpath_error('{http://www.w3.org/2005/xpath-functions}XPST0001')
        self.assertEqual(str(err.exception), "[err:XPTY0004] invalid namespace "
                                             "'http://www.w3.org/2005/xpath-functions'")

        with self.assertRaises(ValueError) as err:
            xpath_error('{http://www.w3.org/2005/xpath-functions}}XPST0001')
        self.assertEqual(str(err.exception),
                         "[err:XPTY0004] '{http://www.w3.org/2005/xpath-"
                         "functions}}XPST0001' is not an xs:QName",)


if __name__ == '__main__':
    unittest.main()
elementpath-3.0.2/tests/test_helpers.py000066400000000000000000000161331427546011100202440ustar00rootroot00000000000000
#!/usr/bin/env python
#
# Copyright (c), 2018-2021, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
import unittest
import math
from xml.etree import ElementTree

from elementpath.helpers import days_from_common_era, months2days, \
    round_number, is_idrefs, collapse_white_spaces, normalize_sequence_type


class HelperFunctionsTest(unittest.TestCase):

    def test_node_is_idref_function(self):
        self.assertTrue(is_idrefs(ElementTree.XML('xyz').text))
        self.assertTrue(is_idrefs(ElementTree.XML('xyz abc').text))
        self.assertFalse(is_idrefs(ElementTree.XML('12345').text))
        self.assertTrue(is_idrefs('alpha'))
        self.assertTrue(is_idrefs('alpha beta'))
        self.assertFalse(is_idrefs('12345'))

    def test_days_from_common_era_function(self):
        days4y = 365 * 3 + 366
        days100y = days4y * 24 + 365 * 4
        days400y = days100y * 4 + 1

        self.assertEqual(days_from_common_era(0), 0)
        self.assertEqual(days_from_common_era(1), 365)
        self.assertEqual(days_from_common_era(3), 365 * 3)
        self.assertEqual(days_from_common_era(4), days4y)
        self.assertEqual(days_from_common_era(100), days100y)
        self.assertEqual(days_from_common_era(200), days100y * 2)
        self.assertEqual(days_from_common_era(300), days100y * 3)
        self.assertEqual(days_from_common_era(400), days400y)
        self.assertEqual(days_from_common_era(800), 2 * days400y)
        self.assertEqual(days_from_common_era(-1), -366)
        self.assertEqual(days_from_common_era(-4), -days4y)
        self.assertEqual(days_from_common_era(-5), -days4y - 366)
        self.assertEqual(days_from_common_era(-100), -days100y - 1)
        self.assertEqual(days_from_common_era(-200), -days100y * 2 - 1)
        self.assertEqual(days_from_common_era(-300), -days100y * 3 - 1)
        self.assertEqual(days_from_common_era(-101), -days100y - 366)
        self.assertEqual(days_from_common_era(-400), -days400y)
        self.assertEqual(days_from_common_era(-401), -days400y - 366)
        self.assertEqual(days_from_common_era(-800), -days400y * 2)

    def test_months2days_function(self):
        self.assertEqual(months2days(-119, 1, 12 * 319), 116512)
        self.assertEqual(months2days(200, 1, -12 * 320) - 1, -116877 - 2)

        # 0000 BCE tests
        self.assertEqual(months2days(0, 1, 12), 366)
        self.assertEqual(months2days(0, 1, -12), -365)
        self.assertEqual(months2days(1, 1, 12), 365)
        self.assertEqual(months2days(1, 1, -12), -366)

        # xs:duration ordering related tests
        self.assertEqual(months2days(year=1696, month=9, months_delta=0), 0)
        self.assertEqual(months2days(1696, 9, 1), 30)
        self.assertEqual(months2days(1696, 9, 2), 61)
        self.assertEqual(months2days(1696, 9, 3), 91)
        self.assertEqual(months2days(1696, 9, 4), 122)
        self.assertEqual(months2days(1696, 9, 5), 153)
        self.assertEqual(months2days(1696, 9, 12), 365)
        self.assertEqual(months2days(1696, 9, -1), -31)
        self.assertEqual(months2days(1696, 9, -2), -62)
        self.assertEqual(months2days(1696, 9, -12), -366)

        self.assertEqual(months2days(1697, 2, 0), 0)
        self.assertEqual(months2days(1697, 2, 1), 28)
        self.assertEqual(months2days(1697, 2, 12), 365)
        self.assertEqual(months2days(1697, 2, -1), -31)
        self.assertEqual(months2days(1697, 2, -2), -62)
        self.assertEqual(months2days(1697, 2, -3), -92)
        self.assertEqual(months2days(1697, 2, -12), -366)
        self.assertEqual(months2days(1697, 2, -14), -428)
        self.assertEqual(months2days(1697, 2, -15), -458)

        self.assertEqual(months2days(1903, 3, 0), 0)
        self.assertEqual(months2days(1903, 3, 1), 31)
        self.assertEqual(months2days(1903, 3, 2), 61)
        self.assertEqual(months2days(1903, 3, 3), 92)
        self.assertEqual(months2days(1903, 3, 4), 122)
        self.assertEqual(months2days(1903, 3, 11), 366 - 29)
        self.assertEqual(months2days(1903, 3, 12), 366)
        self.assertEqual(months2days(1903, 3, -1), -28)
        self.assertEqual(months2days(1903, 3, -2), -59)
        self.assertEqual(months2days(1903, 3, -3), -90)
        self.assertEqual(months2days(1903, 3, -12), -365)

        self.assertEqual(months2days(1903, 7, 0), 0)
        self.assertEqual(months2days(1903, 7, 1), 31)
        self.assertEqual(months2days(1903, 7, 2), 62)
        self.assertEqual(months2days(1903, 7, 3), 92)
        self.assertEqual(months2days(1903, 7, 6), 184)
        self.assertEqual(months2days(1903, 7, 12), 366)
        self.assertEqual(months2days(1903, 7, -1), -30)
        self.assertEqual(months2days(1903, 7, -2), -61)
        self.assertEqual(months2days(1903, 7, -6), -181)
        self.assertEqual(months2days(1903, 7, -12), -365)

        # Extra tests
        self.assertEqual(months2days(1900, 3, 0), 0)
        self.assertEqual(months2days(1900, 3, 1), 31)
        self.assertEqual(months2days(1900, 3, 24), 730)
        self.assertEqual(months2days(1900, 3, -1), -28)
        self.assertEqual(months2days(1900, 3, -24), -730)

        self.assertEqual(months2days(1000, 4, 0), 0)
        self.assertEqual(months2days(1000, 4, 1), 30)
        self.assertEqual(months2days(1000, 4, 24), 730)
        self.assertEqual(months2days(1000, 4, -1), -31)
        self.assertEqual(months2days(1000, 4, -24), -730)

        self.assertEqual(months2days(2001, 10, -12), -365)
        self.assertEqual(months2days(2000, 10, -12), -366)
        self.assertEqual(months2days(2000, 2, -12), -365)
        self.assertEqual(months2days(2000, 3, -12), -366)

    def test_round_number_function(self):
        self.assertTrue(math.isnan(round_number(float('NaN'))))
        self.assertTrue(math.isinf(round_number(float('INF'))))
        self.assertTrue(math.isinf(round_number(float('-INF'))))
        self.assertEqual(round_number(10.1), 10)
        self.assertEqual(round_number(9.5), 10)
        self.assertEqual(round_number(-10.1), -10)
        self.assertEqual(round_number(-9.5), -9)

    def test_collapse_white_spaces_function(self):
        self.assertEqual(collapse_white_spaces(' ab c '), 'ab c')
        self.assertEqual(collapse_white_spaces(' ab\t\nc '), 'ab c')

    def test_normalize_sequence_type_function(self):
        self.assertEqual(normalize_sequence_type(' xs:integer + '), 'xs:integer+')
        self.assertEqual(normalize_sequence_type(' xs :integer + '), 'xs :integer+')  # Invalid
        self.assertEqual(normalize_sequence_type(' element( * ) '), 'element(*)')
        self.assertEqual(normalize_sequence_type(' element( *,xs:int ) '), 'element(*, xs:int)')
        self.assertEqual(normalize_sequence_type(' \nfunction ( * )\t '), 'function(*)')
        self.assertEqual(
            normalize_sequence_type(' \nfunction ( item( ) * ) as xs:integer\t '),
            'function(item()*) as xs:integer'
        )


if __name__ == '__main__':
    unittest.main()
elementpath-3.0.2/tests/test_namespaces.py000066400000000000000000000052721427546011100207230ustar00rootroot00000000000000
#!/usr/bin/env python
#
# Copyright (c), 2018-2020, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
import unittest

from elementpath.namespaces import XSD_NAMESPACE, get_namespace, \
    get_prefixed_name, get_expanded_name, split_expanded_name


class NamespacesTest(unittest.TestCase):
    namespaces = {
        'xs': XSD_NAMESPACE,
        'tst': "http://xpath.test/ns"
    }

    # namespaces.py module
    def test_get_namespace_function(self):
        self.assertEqual(get_namespace('A'), '')
        self.assertEqual(get_namespace('{ns}foo'), 'ns')
        self.assertEqual(get_namespace('{}foo'), '')
        self.assertEqual(get_namespace('{A}B{C}'), 'A')

    def test_qname_to_prefixed_function(self):
        self.assertEqual(get_prefixed_name('{ns}foo', {'bar': 'ns'}), 'bar:foo')
        self.assertEqual(get_prefixed_name('{ns}foo', {'': 'ns'}), 'foo')
        self.assertEqual(get_prefixed_name('Q{ns}foo', {'': 'ns'}), 'foo')
        self.assertEqual(get_prefixed_name('foo', {'': 'ns'}), 'foo')
        self.assertEqual(get_prefixed_name('', {'': 'ns'}), '')
        self.assertEqual(get_prefixed_name('{ns}foo', {}), '{ns}foo')
        self.assertEqual(get_prefixed_name('{ns}foo', {'bar': 'other'}), '{ns}foo')

        with self.assertRaises(ValueError):
            get_prefixed_name('{{ns}}foo', {'bar': 'ns'})

    def test_prefixed_to_qname_function(self):
        self.assertEqual(get_expanded_name('{ns}foo', {'bar': 'ns'}), '{ns}foo')
        self.assertEqual(get_expanded_name('bar:foo', {'bar': 'ns'}), '{ns}foo')
        self.assertEqual(get_expanded_name('foo', {'': 'ns'}), '{ns}foo')
        self.assertEqual(get_expanded_name('', {'': 'ns'}), '')

        with self.assertRaises(KeyError):
            get_expanded_name('bar:foo', self.namespaces)

        with self.assertRaises(ValueError):
            get_expanded_name('bar:foo:bar', {'bar': 'ns'})

        with self.assertRaises(ValueError):
            get_expanded_name(':foo', {'': 'ns'})

        with self.assertRaises(ValueError):
            get_expanded_name('foo:', {'': 'ns'})

    def test_split_expanded_name_function(self):
        self.assertEqual(split_expanded_name('{ns}foo'), ('ns', 'foo'))
        self.assertEqual(split_expanded_name('foo'), ('', 'foo'))

        with self.assertRaises(ValueError):
            split_expanded_name('tst:foo')

        with self.assertRaises(ValueError):
            split_expanded_name('{{ns}}foo')


if __name__ == '__main__':
    unittest.main()
elementpath-3.0.2/tests/test_package.py000066400000000000000000000053451427546011100202000ustar00rootroot00000000000000
#!/usr/bin/env python
#
# Copyright (c), 2018-2020, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
import unittest
import glob
import fileinput
import os
import re
import platform


class PackageTest(unittest.TestCase):

    @classmethod
    def setUpClass(cls):
        cls.test_dir = os.path.dirname(os.path.abspath(__file__))
        cls.package_dir = os.path.dirname(cls.test_dir)
        cls.source_dir = os.path.join(cls.package_dir, 'elementpath/')
        cls.missing_debug = re.compile(
            r"(\bimport\s+pdb\b|\bpdb\s*\.\s*set_trace\(\s*\)|\bprint\s*\(|\bbreakpoint\s*\()")
        cls.get_version = re.compile(
            r"(?:\bversion|__version__)(?:\s*=\s*)(\'[^\']*\'|\"[^\"]*\")")

    @unittest.skipIf(platform.system() == 'Windows', 'Skip on Windows platform')
    def test_missing_debug_statements(self):
        message = "\nFound a debug missing statement at line %d of file %r: %r"
        filename = None
        source_files = glob.glob(os.path.join(self.source_dir, '*.py')) + \
            glob.glob(os.path.join(self.source_dir, '*/*.py'))

        for line in fileinput.input(source_files):
            if fileinput.isfirstline():
                filename = os.path.basename(fileinput.filename())
                if filename == 'generate_categories.py':
                    fileinput.nextfile()
                    continue

            lineno = fileinput.filelineno()

            match = self.missing_debug.search(line)
            self.assertIsNone(
                match, message % (lineno, filename, match.group(0) if match else None)
            )

    def test_version_matching(self):
        message = "\nFound a different version at line %d of file %r: %r (maybe %r)."
        files = [
            os.path.join(self.source_dir, '__init__.py'),
            os.path.join(self.package_dir, 'setup.py'),
        ]
        version = filename = None
        for line in fileinput.input(files):
            if fileinput.isfirstline():
                filename = fileinput.filename()
            lineno = fileinput.filelineno()

            match = self.get_version.search(line)
            if match is not None:
                if version is None:
                    version = match.group(1).strip('\'\"')
                else:
                    self.assertTrue(
                        version == match.group(1).strip('\'\"'),
                        message % (lineno, filename, match.group(1).strip('\'\"'), version)
                    )


if __name__ == '__main__':
    unittest.main()
elementpath-3.0.2/tests/test_regex.py000066400000000000000000001224311427546011100177130ustar00rootroot00000000000000
#!/usr/bin/env python
#
# Copyright (c), 2016-2021, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
"""
This module runs tests on XML Schema regular expressions.
"""
import unittest
import sys
import re
import string
from itertools import chain
from unicodedata import category

from elementpath.regex import RegexError, CharacterClass, translate_pattern
from elementpath.regex.codepoints import get_code_point_range
from elementpath.regex.unicode_subsets import code_point_repr, \
    iterparse_character_subset, iter_code_points, UnicodeSubset, \
    UNICODE_CATEGORIES


class TestCodePoints(unittest.TestCase):

    def test_iter_code_points(self):
        self.assertEqual(list(iter_code_points([10, 20, 11, 12, 25, (9, 21), 21])),
                         [(9, 22), 25])
        self.assertEqual(list(iter_code_points([10, 20, 11, 12, 25, (9, 20), 21])),
                         [(9, 22), 25])
        self.assertEqual(list(iter_code_points({2, 120, 121, (150, 260)})),
                         [2, (120, 122), (150, 260)])
        self.assertEqual(
            list(iter_code_points([10, 20, (10, 22), 11, 12, 25, 8, (9, 20), 21, 22, 9, 0])),
            [0, (8, 23), 25]
        )
        self.assertEqual(
            list(e for e in iter_code_points([10, 20, 11, 12, 25, (9, 21)], reverse=True)),
            [25, (9, 21)]
        )
        self.assertEqual(
            list(iter_code_points([10, 20, (10, 22), 11, 12, 25, 8, (9, 20), 21, 22, 9, 0],
                                  reverse=True)),
            [25, (8, 23), 0]
        )

    def test_get_code_point_range(self):
        self.assertEqual(get_code_point_range(97), (97, 98))
        self.assertEqual(get_code_point_range((97, 100)), (97, 100))
        self.assertEqual(get_code_point_range([97, 100]), [97, 100])

        self.assertIsNone(get_code_point_range(-1))
        self.assertIsNone(get_code_point_range(sys.maxunicode + 1))
        self.assertIsNone(get_code_point_range((-1, 100)))
        self.assertIsNone(get_code_point_range((97, sys.maxunicode + 2)))
        self.assertIsNone(get_code_point_range(97.0))
        self.assertIsNone(get_code_point_range((97.0, 100)))


class TestParseCharacterSubset(unittest.TestCase):

    def test_expand_ranges(self):
        self.assertListEqual(
            list(iterparse_character_subset('a-e', expand_ranges=True)),
            [ord('a'), ord('b'), ord('c'), ord('d'), ord('e')]
        )

    def test_backslash_character(self):
        self.assertListEqual(list(iterparse_character_subset('\\')), [ord('\\')])
        self.assertListEqual(list(iterparse_character_subset('2-\\')),
                             [(ord('2'), ord('\\') + 1)])
        self.assertListEqual(list(iterparse_character_subset('2-\\\\')),
                             [(ord('2'), ord('\\') + 1), ord('\\')])
        self.assertListEqual(list(iterparse_character_subset('2-\\x')),
                             [(ord('2'), ord('\\') + 1), ord('x')])
        self.assertListEqual(list(iterparse_character_subset('2-\\a-x')),
                             [(ord('2'), ord('\\') + 1), (ord('a'), ord('x') + 1)])
        self.assertListEqual(list(iterparse_character_subset('2-\\{')),
                             [(ord('2'), ord('{') + 1)])

    def test_backslash_escapes(self):
        self.assertListEqual(list(iterparse_character_subset('\\{')), [ord('{')])
        self.assertListEqual(list(iterparse_character_subset('\\(')), [ord('(')])
        self.assertListEqual(list(iterparse_character_subset('\\a')), [ord('\\'), ord('a')])

    def test_square_brackets(self):
        self.assertListEqual(list(iterparse_character_subset('\\[')), [ord('[')])
        self.assertListEqual(list(iterparse_character_subset('[')), [ord('[')])

        with self.assertRaises(RegexError) as ctx:
            list(iterparse_character_subset('[ '))
        self.assertIn("bad character '['", str(ctx.exception))

        with self.assertRaises(RegexError) as ctx:
            list(iterparse_character_subset('x['))
        self.assertIn("bad character '['", str(ctx.exception))

        self.assertListEqual(list(iterparse_character_subset('\\]')), [ord(']')])
        self.assertListEqual(list(iterparse_character_subset(']')), [ord(']')])

        with self.assertRaises(RegexError) as ctx:
            list(iterparse_character_subset('].'))
        self.assertIn("bad character ']'", str(ctx.exception))

        with self.assertRaises(RegexError) as ctx:
            list(iterparse_character_subset('8['))
        self.assertIn("bad character '['", str(ctx.exception))

    def test_character_range(self):
        self.assertListEqual(list(iterparse_character_subset('A-z')),
                             [(ord('A'), ord('z') + 1)])
        self.assertListEqual(list(iterparse_character_subset('\\[-z')),
                             [(ord('['), ord('z') + 1)])

    def test_bad_character_range(self):
        with self.assertRaises(RegexError) as ctx:
            list(iterparse_character_subset('9-2'))
        self.assertIn('bad character range', str(ctx.exception))

        with self.assertRaises(RegexError) as ctx:
            list(iterparse_character_subset('2-\\s'))
        self.assertIn('bad character range', str(ctx.exception))

    def test_parse_multiple_ranges(self):
        self.assertListEqual(
            list(iterparse_character_subset('a-c-1-4x-z-7-9')),
            [(ord('a'), ord('c') + 1), ord('-'), (ord('1'), ord('4') + 1),
             (ord('x'), ord('z') + 1), ord('-'), (55, 58)]
        )


class TestUnicodeSubset(unittest.TestCase):

    def test_creation(self):
        subset = UnicodeSubset([(0, 9), 11, 12, (14, 32), (33, sys.maxunicode + 1)])
        self.assertEqual(subset, [(0, 9), 11, 12, (14, 32), (33, sys.maxunicode + 1)])
        self.assertEqual(UnicodeSubset('0-9'), [(48, 58)])
        self.assertEqual(UnicodeSubset('0-9:'), [(48, 59)])

        subset = UnicodeSubset('a-z')
        self.assertEqual(UnicodeSubset(subset), [(ord('a'), ord('z') + 1)])

    def test_repr(self):
        self.assertEqual(code_point_repr((ord('2'), ord('\\') + 1)), r'2-\\')

        subset = UnicodeSubset('a-z')
        self.assertEqual(repr(subset), "UnicodeSubset('a-z')")
        self.assertEqual(str(subset), "a-z")

        subset = UnicodeSubset((50, 90))
        subset.codepoints.append(sys.maxunicode + 10)  # Invalid subset
        self.assertRaises(ValueError, repr, subset)

    def test_modify(self):
        subset = UnicodeSubset()
        for cp in [50, 90, 10, 90]:
            subset.add(cp)
        self.assertEqual(subset, [10, 50, 90])
        self.assertRaises(ValueError, subset.add, -1)
        self.assertRaises(ValueError, subset.add, sys.maxunicode + 1)
        subset.add((100, 20001))
        subset.discard((100, 19001))
        self.assertEqual(subset, [10, 50, 90, (19001, 20001)])
        subset.add(0)
        subset.discard(1)
        self.assertEqual(subset, [0, 10, 50, 90, (19001, 20001)])
        subset.discard(0)
        self.assertEqual(subset, [10, 50, 90, (19001, 20001)])
        subset.discard((10, 100))
        self.assertEqual(subset, [(19001, 20001)])
        subset.add(20)
        subset.add(19)
        subset.add(30)
        subset.add([30, 33])
        subset.add(30000)
        subset.add(30001)
        self.assertEqual(subset, [(19, 21), (30, 33), (19001, 20001), (30000, 30002)])
        subset.add(22)
        subset.add(21)
        subset.add(22)
        self.assertEqual(subset, [(19, 22), 22, (30, 33), (19001, 20001), (30000, 30002)])
        subset.discard((90, 50000))
        self.assertEqual(subset, [(19, 22), 22, (30, 33)])
        subset.discard(21)
        subset.discard(19)
        self.assertEqual(subset, [20, 22, (30, 33)])
        subset.discard((0, 200))
        self.assertEqual(subset, [])

        with self.assertRaises(ValueError):
            subset.discard(None)
        with self.assertRaises(ValueError):
            subset.discard((10, 11, 12))

    def test_update_method(self):
        subset = UnicodeSubset()
        subset.update('\\\\')
        self.assertListEqual(subset.codepoints, [ord('\\')])
        subset.update('\\$')
        self.assertListEqual(subset.codepoints, [ord('$'), ord('\\')])

        subset.clear()
        subset.update('!--')
        self.assertListEqual(subset.codepoints, [(ord('!'), ord('-') + 1)])

        subset.clear()
        subset.update('!---')
        self.assertListEqual(subset.codepoints, [(ord('!'), ord('-') + 1)])

        subset.clear()
        subset.update('!--a')
        self.assertListEqual(subset.codepoints, [(ord('!'), ord('-') + 1), ord('a')])

        with self.assertRaises(RegexError):
            subset.update('[[')

    def test_difference_update_method(self):
        subset = UnicodeSubset('a-z')
        subset.difference_update('a-c')
        self.assertEqual(subset, UnicodeSubset('d-z'))

        subset = UnicodeSubset('a-z')
        subset.difference_update([(ord('a'), ord('c') + 1)])
        self.assertEqual(subset, UnicodeSubset('d-z'))

    def test_iterate(self):
        subset = UnicodeSubset('a-d')
        self.assertListEqual(list(iter(subset)), [ord('a'), ord('b'), ord('c'), ord('d')])
        self.assertListEqual(list(subset.iter_characters()), ['a', 'b', 'c', 'd'])

    def test_reversed(self):
        subset = UnicodeSubset('0-9ax')
        self.assertEqual(list(reversed(subset)),
                         [ord('x'), ord('a'), ord('9'), 56, 55, 54, 53, 52, 51, 50, 49, 48])

    def test_in_operator(self):
        subset = UnicodeSubset('0-9a-z')

        self.assertIn('a', subset)
        self.assertIn(ord('a'), subset)
        self.assertIn(ord('z'), subset)

        self.assertNotIn('/', subset)
        self.assertNotIn('A', subset)
        self.assertNotIn(ord('A'), subset)
        self.assertNotIn(ord('}'), subset)
        self.assertNotIn(float(ord('a')), subset)

        self.assertNotIn('.', subset)
        subset.update('.')
        self.assertIn('.', subset)
        self.assertNotIn('/', subset)
        self.assertNotIn('-', subset)

    def test_complement(self):
        subset = UnicodeSubset((50, 90, 10, 90))
        self.assertEqual(list(subset.complement()),
                         [(0, 10), (11, 50), (51, 90), (91, sys.maxunicode + 1)])
        subset.add(11)
        self.assertEqual(list(subset.complement()),
                         [(0, 10), (12, 50), (51, 90), (91, sys.maxunicode + 1)])
        subset.add((0, 10))
        self.assertEqual(list(subset.complement()),
                         [(12, 50), (51, 90), (91, sys.maxunicode + 1)])

        s1 = UnicodeSubset(chain(
            UNICODE_CATEGORIES['L'].codepoints,
            UNICODE_CATEGORIES['M'].codepoints,
            UNICODE_CATEGORIES['N'].codepoints,
            UNICODE_CATEGORIES['S'].codepoints
        ))
        s2 = UnicodeSubset(chain(
            UNICODE_CATEGORIES['C'].codepoints,
            UNICODE_CATEGORIES['P'].codepoints,
            UNICODE_CATEGORIES['Z'].codepoints
        ))
        self.assertEqual(s1.codepoints, UnicodeSubset(s2.complement()).codepoints)

        subset = UnicodeSubset((50, 90))
        subset.codepoints.append(70)  # Invalid subset (unordered)
        with self.assertRaises(ValueError) as ctx:
            list(subset.complement())
        self.assertEqual(
            str(ctx.exception), "unordered code points found in UnicodeSubset('2ZF')")

        subset = UnicodeSubset((sys.maxunicode - 1,))
        self.assertEqual(list(subset.complement()),
                         [(0, sys.maxunicode - 1), sys.maxunicode])

    def test_equality(self):
        self.assertFalse(UnicodeSubset() == 0.0)
        self.assertEqual(UnicodeSubset('a-z'), UnicodeSubset('a-kl-z'))

    def test_union_and_intersection(self):
        s1 = UnicodeSubset([50, (90, 200), 10])
        s2 = UnicodeSubset([10, 51, (89, 150), 90])
        self.assertEqual(s1 | s2, [10, (50, 52), (89, 200)])
        self.assertEqual(s1 & s2, [10, (90, 150)])

        subset = UnicodeSubset('a-z')
        subset |= UnicodeSubset('A-Zfx')
        self.assertEqual(subset, UnicodeSubset('A-Za-z'))
        subset |= '0-9'
        self.assertEqual(subset, UnicodeSubset('0-9A-Za-z'))
        subset |= [ord('{'), ord('}')]
        self.assertEqual(subset, UnicodeSubset('0-9A-Za-z{}'))

        subset = UnicodeSubset('a-z')
        subset &= UnicodeSubset('A-Zfx')
        self.assertEqual(subset, UnicodeSubset('fx'))
        subset &= 'xyz'
        self.assertEqual(subset, UnicodeSubset('x'))

        with self.assertRaises(TypeError) as ctx:
            subset = UnicodeSubset('a-z')
            subset |= False
        self.assertIn('unsupported operand type', str(ctx.exception))

        with self.assertRaises(TypeError) as ctx:
            subset = UnicodeSubset('a-z')
            subset &= False
        self.assertIn('unsupported operand type', str(ctx.exception))

    def test_max_and_min(self):
        s1 = UnicodeSubset([10, 51, (89, 151), 90])
        s2 = UnicodeSubset([0, 2, (80, 201), 10000])
        s3 = UnicodeSubset([1])
        self.assertEqual((min(s1), max(s1)), (10, 150))
        self.assertEqual((min(s2), max(s2)), (0, 10000))
        self.assertEqual((min(s3), max(s3)), (1, 1))

    def test_subtraction(self):
        subset = UnicodeSubset([0, 2, (80, 200), 10000])
        self.assertEqual(subset - {2, 120, 121, (150, 260)},
                         [0, (80, 120), (122, 150), 10000])

        subset = UnicodeSubset('a-z')
        subset -= UnicodeSubset('a-c')
        self.assertEqual(subset, UnicodeSubset('d-z'))

        subset = UnicodeSubset('a-z')
        subset -= 'a-c'
        self.assertEqual(subset, UnicodeSubset('d-z'))

        with self.assertRaises(TypeError) as ctx:
            subset = UnicodeSubset('a-z')
            subset -= False
        self.assertIn('unsupported operand type', str(ctx.exception))

    def test_xor(self):
        subset = UnicodeSubset('a-z')
        subset ^= subset
        self.assertEqual(subset, UnicodeSubset())

        subset = UnicodeSubset('a-z')
        subset ^= UnicodeSubset('a-c')
        self.assertEqual(subset, UnicodeSubset('d-z'))

        subset = UnicodeSubset('a-z')
        subset ^= 'a-f'
        self.assertEqual(subset, UnicodeSubset('g-z'))

        with self.assertRaises(TypeError) as ctx:
            subset = UnicodeSubset('a-z')
            subset ^= False
        self.assertIn('unsupported operand type', str(ctx.exception))

        subset = UnicodeSubset('a-z')
        subset ^= 'A-Za-f'
        self.assertEqual(subset, UnicodeSubset('A-Zg-z'))


class TestCharacterClass(unittest.TestCase):

    def test_char_class_init(self):
        char_class = CharacterClass()
        self.assertEqual(char_class.positive, [])
        self.assertEqual(char_class.negative, [])

        char_class = CharacterClass('a-z')
        self.assertEqual(char_class.positive, [(97, 123)])
        self.assertEqual(char_class.negative, [])

    def test_char_class_repr(self):
        char_class = CharacterClass('a-z')
        self.assertEqual(repr(char_class), 'CharacterClass([a-z])')
        char_class.complement()
        self.assertEqual(repr(char_class), 'CharacterClass([^a-z])')

    def test_char_class_split(self):
        self.assertListEqual(CharacterClass._re_char_set.split(r'2-\\'), [r'2-\\'])

    def test_complement(self):
        char_class = CharacterClass('a-z')
        self.assertListEqual(char_class.positive.codepoints, [(97, 123)])
        self.assertListEqual(char_class.negative.codepoints, [])

        char_class.complement()
        self.assertListEqual(char_class.positive.codepoints, [])
        self.assertListEqual(char_class.negative.codepoints, [(97, 123)])
        self.assertEqual(str(char_class), '[^a-z]')

        char_class = CharacterClass()
        char_class.complement()
        self.assertEqual(len(char_class), sys.maxunicode + 1)

    def test_isub_operator(self):
        char_class = CharacterClass('A-Za-z')
        char_class -= CharacterClass('a-z')
        self.assertEqual(str(char_class), '[A-Z]')

        char_class = CharacterClass('a-z')
        other = CharacterClass('A-Za-c')
        other.complement()
        char_class -= other
        self.assertEqual(str(char_class), '[a-c]')

        char_class = CharacterClass('a-z')
        other = CharacterClass('A-Za-c')
        other.complement()
        other.add('b')
        char_class -= other
        self.assertEqual(str(char_class), '[ac]')

        char_class = CharacterClass('a-c')
        char_class.complement()
        other = CharacterClass('a-z')
        other.complement()
        char_class -= other
        self.assertEqual(str(char_class), '[d-z]')

    def test_in_operator(self):
        char_class = CharacterClass('A-Za-z')
        self.assertIn(100, char_class)
        self.assertIn('d', char_class)
        self.assertNotIn(49, char_class)
        self.assertNotIn('1', char_class)

        char_class.complement()
        self.assertNotIn(100, char_class)
        self.assertNotIn('d', char_class)
        self.assertIn(49, char_class)
        self.assertIn('1', char_class)

    def test_iterate(self):
        char_class = CharacterClass('A-Za-z')
        self.assertEqual(''.join(chr(c) for c in char_class),
                         string.ascii_uppercase + string.ascii_lowercase)

        char_class.complement()
        self.assertEqual(len(''.join(chr(c) for c in char_class)),
                         sys.maxunicode + 1 - len(string.ascii_letters))

    def test_length(self):
        char_class = CharacterClass('0-9A-Z')
        self.assertListEqual(char_class.positive.codepoints, [(48, 58), (65, 91)])
        self.assertListEqual(char_class.negative.codepoints, [])
        self.assertEqual(len(char_class), 36)

        char_class.complement()
        self.assertListEqual(char_class.positive.codepoints, [])
        self.assertListEqual(char_class.negative.codepoints, [(48, 58), (65, 91)])
        self.assertEqual(len(char_class), sys.maxunicode + 1 - 36)

        char_class.add('k-m')
        self.assertListEqual(char_class.positive.codepoints, [(107, 110)])
        self.assertListEqual(char_class.negative.codepoints, [(48, 58), (65, 91)])
        self.assertEqual(str(char_class), '[\x00-/:-@\\[-\U0010ffffk-m]')
        self.assertEqual(len(char_class), sys.maxunicode + 1 - 36)

        char_class.add('K-M')
        self.assertListEqual(char_class.positive.codepoints, [(75, 78), (107, 110)])
        self.assertListEqual(char_class.negative.codepoints, [(48, 58), (65, 91)])
        self.assertEqual(len(char_class), sys.maxunicode + 1 - 33)
        self.assertEqual(str(char_class), '[\x00-/:-@\\[-\U0010ffffK-Mk-m]')

        char_class.clear()
        self.assertListEqual(char_class.positive.codepoints, [])
        self.assertListEqual(char_class.negative.codepoints, [])
        self.assertEqual(len(char_class), 0)

    def test_add(self):
        char_class = CharacterClass()
        self.assertListEqual(char_class.positive.codepoints, [])
        self.assertListEqual(char_class.negative.codepoints, [])
        self.assertEqual(len(char_class), 0)

        char_class.add('0-9')
        self.assertListEqual(char_class.positive.codepoints, [(48, 58)])
        self.assertListEqual(char_class.negative.codepoints, [])
        self.assertEqual(len(char_class), 10)

        char_class.add(r'\p{Nd}')
        self.assertEqual(len(char_class), 630)

        with self.assertRaises(RegexError):
            char_class.add(r'\p{}')

        with self.assertRaises(RegexError):
            char_class.add(r'\p{XYZ}')

        char_class.add(r'\P{Nd}')
self.assertEqual(len(char_class), sys.maxunicode + 1) char_class = CharacterClass() char_class.add(r'\p{IsFoo}') def test_discard(self): char_class = CharacterClass('0-9') char_class.discard('6-9') self.assertListEqual(char_class.positive.codepoints, [(48, 54)]) self.assertListEqual(char_class.negative.codepoints, []) self.assertEqual(len(char_class), 6) char_class.add(r'\p{Nd}') self.assertEqual(len(char_class), 630) char_class.discard(r'\p{Nd}') self.assertEqual(len(char_class), 0) with self.assertRaises(RegexError): char_class.discard(r'\p{}') with self.assertRaises(RegexError): char_class.discard(r'\p{XYZ}') char_class.add(r'\P{Nd}') self.assertEqual(len(char_class), sys.maxunicode + 1 - 630) char_class.discard(r'\P{Nd}') self.assertEqual(len(char_class), 0) char_class = CharacterClass('a-z') char_class.discard(r'\p{IsFoo}') self.assertEqual(len(char_class), 0) char_class = CharacterClass() char_class.complement() char_class.discard('\\n') self.assertListEqual(char_class.positive.codepoints, [(0, 10), (11, 1114112)]) self.assertListEqual(char_class.negative.codepoints, []) self.assertEqual(len(char_class), sys.maxunicode) char_class.discard('\\s') self.assertListEqual(char_class.positive.codepoints, [(0, 9), (11, 13), (14, 32), (33, 1114112)]) self.assertEqual(len(char_class), sys.maxunicode - 3) char_class.discard('\\S') self.assertEqual(len(char_class), 0) char_class.clear() char_class.negative.codepoints.append(10) char_class.discard('\\s') self.assertListEqual(char_class.positive.codepoints, []) self.assertListEqual(char_class.negative.codepoints, [(9, 11), 13, 32]) char_class = CharacterClass('\t') char_class.complement() self.assertListEqual(char_class.negative.codepoints, [9]) char_class.discard('\\n') self.assertListEqual(char_class.positive.codepoints, []) self.assertListEqual(char_class.negative.codepoints, [(9, 11)]) self.assertEqual(len(char_class), sys.maxunicode - 1) class TestUnicodeCategories(unittest.TestCase): """ Test the subsets of Unicode 
categories, mainly to check the loaded JSON file. """ def test_unicode_categories(self): self.assertEqual(sum(len(v) for k, v in UNICODE_CATEGORIES.items() if len(k) > 1), sys.maxunicode + 1) self.assertEqual(min([min(s) for s in UNICODE_CATEGORIES.values()]), 0) self.assertEqual(max([max(s) for s in UNICODE_CATEGORIES.values()]), sys.maxunicode) base_sets = [set(v) for k, v in UNICODE_CATEGORIES.items() if len(k) > 1] self.assertFalse(any(s.intersection(t) for s in base_sets for t in base_sets if s != t)) @unittest.skipIf(not ((3, 8) <= sys.version_info < (3, 9)), "Test only for Python 3.8") def test_unicodedata_category(self): for key in UNICODE_CATEGORIES: for cp in UNICODE_CATEGORIES[key]: uc = category(chr(cp)) if key == uc or len(key) == 1 and key == uc[0]: continue self.fail( "Wrong category %r for code point %d (should be %r)." % (uc, cp, key) ) class TestPatterns(unittest.TestCase): """ Test of specific regex patterns and their application. """ def test_issue_079(self): # Do not escape special characters in character class regex = translate_pattern('[^\n\t]+', anchors=False) self.assertEqual(regex, '^([^\t\n]+)$(?!\\n\\Z)') pattern = re.compile(regex) self.assertIsNone(pattern.search('first\tsecond\tthird')) self.assertEqual(pattern.search('first second third').group(0), 'first second third') def test_dot_wildcard(self): regex = translate_pattern('.+', anchors=False) self.assertEqual(regex, '^([^\r\n]+)$(?!\\n\\Z)') pattern = re.compile(regex) self.assertIsNone(pattern.search('line1\rline2\r')) self.assertIsNone(pattern.search('line1\nline2')) self.assertIsNone(pattern.search('')) self.assertIsNotNone(pattern.search('\\')) self.assertEqual(pattern.search('abc').group(0), 'abc') regex = translate_pattern('.+T.+(Z|[+-].+)', anchors=False) self.assertEqual(regex, '^([^\r\n]+T[^\r\n]+(Z|[\\+\\-][^\r\n]+))$(?!\\n\\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('12T0A3+36').group(0), '12T0A3+36')
self.assertEqual(pattern.search('12T0A3Z').group(0), '12T0A3Z') self.assertIsNone(pattern.search('')) self.assertIsNone(pattern.search('12T0A3Z2')) def test_not_spaces(self): regex = translate_pattern(r"[\S' ']{1,10}", anchors=False) if sys.version_info >= (3,): self.assertEqual( regex, "^([\x00-\x08\x0b\x0c\x0e-\x1f!-\U0010ffff ']{1,10})$(?!\\n\\Z)" ) pattern = re.compile(regex) self.assertIsNone(pattern.search('alpha\r')) self.assertEqual(pattern.search('beta').group(0), 'beta') self.assertIsNone(pattern.search('beta\n')) self.assertIsNone(pattern.search('beta\n ')) self.assertIsNone(pattern.search('')) self.assertIsNone(pattern.search('over the maximum length!')) self.assertIsNotNone(pattern.search('\\')) self.assertEqual(pattern.search('abc').group(0), 'abc') def test_category_escape(self): regex = translate_pattern('^\\p{IsBasicLatin}*$') self.assertEqual(regex, '^[\x00-\x7f]*$(?!\\n\\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('').group(0), '') self.assertEqual(pattern.search('e').group(0), 'e') self.assertIsNone(pattern.search('è')) regex = translate_pattern('^[\\p{IsBasicLatin}\\p{IsLatin-1Supplement}]*$') self.assertEqual(regex, '^[\x00-\xff]*$(?!\\n\\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('e').group(0), 'e') self.assertEqual(pattern.search('è').group(0), 'è') self.assertIsNone(pattern.search('Ĭ')) def test_digit_shortcut(self): regex = translate_pattern(r'\d{1,3}\.\d{1,2}', anchors=False) self.assertEqual(regex, r'^(\d{1,3}\.\d{1,2})$(?!\n\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('12.40').group(0), '12.40') self.assertEqual(pattern.search('867.00').group(0), '867.00') self.assertIsNone(pattern.search('867.00\n')) self.assertIsNone(pattern.search('867.00 ')) self.assertIsNone(pattern.search('867.000')) self.assertIsNone(pattern.search('1867.0')) self.assertIsNone(pattern.search('a1.13')) regex = translate_pattern(r'[-+]?(\d+|\d+(\.\d+)?%)', anchors=False) self.assertEqual(regex, 
r'^([\+\-]?(\d+|\d+(\.\d+)?%))$(?!\n\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('78.8%').group(0), '78.8%') self.assertIsNone(pattern.search('867.00')) def test_character_class_reordering(self): regex = translate_pattern('[A-Z ]', anchors=False) self.assertEqual(regex, '^([ A-Z])$(?!\\n\\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('A').group(0), 'A') self.assertEqual(pattern.search('Z').group(0), 'Z') self.assertEqual(pattern.search('Q').group(0), 'Q') self.assertEqual(pattern.search(' ').group(0), ' ') self.assertIsNone(pattern.search(' ')) self.assertIsNone(pattern.search('AA')) regex = translate_pattern(r'[0-9.,DHMPRSTWYZ/:+\-]+', anchors=False) self.assertEqual(regex, r'^([\+-\-\.-:DHMPR-TWYZ]+)$(?!\n\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('12,40').group(0), '12,40') self.assertEqual(pattern.search('YYYY:MM:DD').group(0), 'YYYY:MM:DD') self.assertIsNone(pattern.search('')) self.assertIsNone(pattern.search('C')) regex = translate_pattern('[^: \n\r\t]+', anchors=False) self.assertEqual(regex, '^([^\t\n\r :]+)$(?!\\n\\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('56,41').group(0), '56,41') self.assertIsNone(pattern.search('56,41\n')) self.assertIsNone(pattern.search('13:20')) regex = translate_pattern(r'^[A-Za-z0-9_\-]+(:[A-Za-z0-9_\-]+)?$') self.assertEqual(regex, r'^[\-0-9A-Z_a-z]+(:[\-0-9A-Z_a-z]+)?$(?!\n\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('fa9').group(0), 'fa9') self.assertIsNone(pattern.search('-x_1:_tZ-\n')) self.assertEqual(pattern.search('-x_1:_tZ-').group(0), '-x_1:_tZ-') self.assertIsNone(pattern.search('')) self.assertIsNone(pattern.search('+78')) regex = translate_pattern(r'[!%\^\*@~;#,|/]', anchors=False) self.assertEqual(regex, r'^([!#%\*,/;@\^\|~])$(?!\n\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('#').group(0), '#') self.assertEqual(pattern.search('!').group(0), '!') 
self.assertEqual(pattern.search('^').group(0), '^') self.assertEqual(pattern.search('|').group(0), '|') self.assertEqual(pattern.search('*').group(0), '*') self.assertIsNone(pattern.search('**')) self.assertIsNone(pattern.search('b')) self.assertIsNone(pattern.search('')) regex = translate_pattern('[A-Za-z]+:[A-Za-z][A-Za-z0-9\\-]+', anchors=False) self.assertEqual(regex, '^([A-Za-z]+:[A-Za-z][\\-0-9A-Za-z]+)$(?!\\n\\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('zk:xy-9s').group(0), 'zk:xy-9s') self.assertIsNone(pattern.search('xx:y')) def test_occurrences_qualifiers(self): regex = translate_pattern('#[0-9a-fA-F]{3}([0-9a-fA-F]{3})?', anchors=False) self.assertEqual(regex, r'^(#[0-9A-Fa-f]{3}([0-9A-Fa-f]{3})?)$(?!\n\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('#F3D').group(0), '#F3D') self.assertIsNone(pattern.search('#F3D\n')) self.assertEqual(pattern.search('#F3DA30').group(0), '#F3DA30') self.assertIsNone(pattern.search('#F3')) self.assertIsNone(pattern.search('#F3D ')) self.assertIsNone(pattern.search('F3D')) self.assertIsNone(pattern.search('')) def test_or_operator(self): regex = translate_pattern('0|1', anchors=False) self.assertEqual(regex, r'^(0|1)$(?!\n\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('0').group(0), '0') self.assertEqual(pattern.search('1').group(0), '1') self.assertIsNone(pattern.search('1\n')) self.assertIsNone(pattern.search('')) self.assertIsNone(pattern.search('2')) self.assertIsNone(pattern.search('01')) self.assertIsNone(pattern.search('1\n ')) regex = translate_pattern(r'\d+[%]|\d*\.\d+[%]', anchors=False) self.assertEqual(regex, r'^(\d+[%]|\d*\.\d+[%])$(?!\n\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('99%').group(0), '99%') self.assertEqual(pattern.search('99.9%').group(0), '99.9%') self.assertEqual(pattern.search('.90%').group(0), '.90%') self.assertIsNone(pattern.search('%')) self.assertIsNone(pattern.search('90.%')) regex = translate_pattern('([ 
-~]|\n|\r|\t)*', anchors=False) self.assertEqual(regex, '^(([ -~]|\n|\r|\t)*)$(?!\\n\\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('ciao\t-~ ').group(0), 'ciao\t-~ ') self.assertEqual(pattern.search('\r\r').group(0), '\r\r') self.assertEqual(pattern.search('\n -.abc').group(0), '\n -.abc') self.assertIsNone(pattern.search('à')) self.assertIsNone(pattern.search('\t\n à')) def test_character_class_shortcuts(self): regex = translate_pattern(r"^[\i-[:]][\c-[:]]*$") pattern = re.compile(regex) self.assertEqual(pattern.search('x11').group(0), 'x11') self.assertIsNone(pattern.search('3a')) regex = translate_pattern(r"^\w*$") pattern = re.compile(regex) self.assertEqual(pattern.search('aA_x7').group(0), 'aA_x7') self.assertIsNone(pattern.search('.')) self.assertIsNone(pattern.search('-')) regex = translate_pattern(r"\W*", anchors=False) pattern = re.compile(regex) self.assertIsNone(pattern.search('aA_x7')) self.assertEqual(pattern.search('.-').group(0), '.-') regex = translate_pattern(r"^\d*$") pattern = re.compile(regex) self.assertEqual(pattern.search('6410').group(0), '6410') self.assertIsNone(pattern.search('a')) self.assertIsNone(pattern.search('-')) regex = translate_pattern(r"^\D*$") pattern = re.compile(regex) self.assertIsNone(pattern.search('6410')) self.assertEqual(pattern.search('a').group(0), 'a') self.assertEqual(pattern.search('-').group(0), '-') # Pull Request 114 regex = translate_pattern(r"^[\w]{0,5}$") pattern = re.compile(regex) self.assertEqual(pattern.search('abc').group(0), 'abc') self.assertIsNone(pattern.search('.')) regex = translate_pattern(r"^[\W]{0,5}$") pattern = re.compile(regex) self.assertEqual(pattern.search('.').group(0), '.') self.assertIsNone(pattern.search('abc')) def test_character_class_range(self): regex = translate_pattern('[bc-]') self.assertEqual(regex, r'[\-bc]') def test_character_class_subtraction(self): regex = translate_pattern('[a-z-[aeiuo]]') self.assertEqual(regex, '[b-df-hj-np-tv-z]') # W3C XSD 1.1 
test group RegexTest_422 regex = translate_pattern('[^0-9-[a-zAE-Z]]') self.assertEqual(regex, '[^0-9AE-Za-z]') regex = translate_pattern(r'^([^0-9-[a-zAE-Z]]|[\w-[a-zAF-Z]])+$') pattern = re.compile(regex) self.assertIsNone(pattern.search('azBCDE1234567890BCDEFza')) self.assertEqual(pattern.search('BCD').group(0), 'BCD') def test_invalid_character_class(self): with self.assertRaises(RegexError) as ctx: translate_pattern('[[]') self.assertIn("invalid character '['", str(ctx.exception)) with self.assertRaises(RegexError) as ctx: translate_pattern('ab]d') self.assertIn("unexpected meta character ']'", str(ctx.exception)) with self.assertRaises(RegexError) as ctx: translate_pattern('[abc\\1]') self.assertIn("illegal back-reference in character class", str(ctx.exception)) with self.assertRaises(RegexError) as ctx: translate_pattern('[--a]') self.assertIn("invalid character range '--'", str(ctx.exception)) with self.assertRaises(RegexError) as ctx: translate_pattern('[a-z-[c-q') self.assertIn("unterminated character class", str(ctx.exception)) def test_empty_character_class(self): regex = translate_pattern('[a-[a-f]]', anchors=False) self.assertEqual(regex, r'^([^\w\W])$(?!\n\Z)') self.assertRaises(RegexError, translate_pattern, '[]') self.assertEqual(translate_pattern(r'[\w-[\w]]'), r'[^\w\W]') self.assertEqual(translate_pattern(r'[\s-[\s]]'), r'[^\w\W]') self.assertEqual(translate_pattern(r'[\c-[\c]]'), r'[^\w\W]') self.assertEqual(translate_pattern(r'[\i-[\i]]'), r'[^\w\W]') self.assertEqual(translate_pattern('[a-[ab]]'), r'[^\w\W]') self.assertEqual(translate_pattern('[^a-[^a]]'), r'[^\w\W]') def test_back_references(self): self.assertEqual(translate_pattern('(a)\\1'), '(a)\\1') self.assertEqual(translate_pattern('(a)\\11'), '(a)\\1[1]') regex = translate_pattern('((((((((((((a))))))))))))\\11') self.assertEqual(regex, '((((((((((((a))))))))))))\\11') with self.assertRaises(RegexError) as ctx: translate_pattern('(a)\\1', back_references=False) self.assertIn("not 
allowed escape sequence", str(ctx.exception)) def test_anchors(self): regex = translate_pattern('a^b') self.assertEqual(regex, 'a^b') regex = translate_pattern('a^b', anchors=False) self.assertEqual(regex, '^(a\\^b)$(?!\\n\\Z)') regex = translate_pattern('ab$') self.assertEqual(regex, 'ab$(?!\\n\\Z)') regex = translate_pattern('ab$', anchors=False) self.assertEqual(regex, '^(ab\\$)$(?!\\n\\Z)') def test_lazy_quantifiers(self): regex = translate_pattern('.*?') self.assertEqual(regex, '[^\r\n]*?') regex = translate_pattern('[a-z]{2,3}?') self.assertEqual(regex, '[a-z]{2,3}?') regex = translate_pattern('[a-z]*?') self.assertEqual(regex, '[a-z]*?') regex = translate_pattern('[a-z]*', lazy_quantifiers=False) self.assertEqual(regex, '[a-z]*') with self.assertRaises(RegexError) as ctx: translate_pattern('.*?', lazy_quantifiers=False) self.assertEqual(str(ctx.exception), "unexpected meta character '?' at position 2: '.*?'") with self.assertRaises(RegexError): translate_pattern('[a-z]{2,3}?', lazy_quantifiers=False) with self.assertRaises(RegexError): translate_pattern(r'[a-z]{2,3}?\s+', lazy_quantifiers=False) with self.assertRaises(RegexError): translate_pattern(r'[a-z]+?\s+', lazy_quantifiers=False) def test_invalid_quantifiers(self): with self.assertRaises(RegexError) as ctx: translate_pattern('{1}') self.assertIn("unexpected quantifier '{'", str(ctx.exception)) with self.assertRaises(RegexError) as ctx: translate_pattern('.{1,2,3}') self.assertIn("invalid quantifier '{'", str(ctx.exception)) with self.assertRaises(RegexError) as ctx: translate_pattern('*') self.assertIn("unexpected quantifier '*'", str(ctx.exception)) def test_invalid_hyphen(self): with self.assertRaises(RegexError) as ctx: translate_pattern('[a-b-c]') self.assertIn("unescaped character '-' at position 4", str(ctx.exception)) regex = translate_pattern('[a-b-c]', xsd_version='1.1') self.assertEqual(regex, '[\\-a-c]') self.assertEqual(translate_pattern('[-a-bc]'), regex) 
self.assertEqual(translate_pattern('[a-bc-]'), regex) def test_invalid_pattern_groups(self): with self.assertRaises(RegexError) as ctx: translate_pattern('(?.*)') self.assertIn("invalid '(?...)' extension notation", str(ctx.exception)) with self.assertRaises(RegexError) as ctx: translate_pattern('(.*))') self.assertIn("unbalanced parenthesis ')'", str(ctx.exception)) with self.assertRaises(RegexError) as ctx: translate_pattern('((.*)') self.assertIn("unterminated subpattern in expression", str(ctx.exception)) def test_verbose_patterns(self): regex = translate_pattern('\\ s*[a-z]+', flags=re.VERBOSE) self.assertEqual(regex, '\\s*[a-z]+') regex = translate_pattern('\\ p{ Is BasicLatin}+', flags=re.VERBOSE) self.assertEqual(regex, '[\x00-\x7f]+') def test_backslash_and_escapes(self): regex = translate_pattern('\\') self.assertEqual(regex, '\\') regex = translate_pattern('\\i') self.assertTrue(regex.startswith('[:A-Z_a-z')) regex = translate_pattern('\\I') self.assertTrue(regex.startswith('[^:A-Z_a-z')) regex = translate_pattern('\\c') self.assertTrue(regex.startswith('[-.0-9:A-Z_a-z')) regex = translate_pattern('\\C') self.assertTrue(regex.startswith('[^-.0-9:A-Z_a-z')) def test_block_escapes(self): regex = translate_pattern('\\p{P}') self.assertTrue(regex.startswith('[!-#%-')) regex = translate_pattern('\\P{P}') self.assertTrue(regex.startswith('[^!-#%-')) regex = translate_pattern('\\p{IsBasicLatin}') self.assertEqual(regex, '[\x00-\x7f]') regex = translate_pattern('\\p{IsBasicLatin}', flags=re.IGNORECASE) self.assertEqual(regex, '(?-i:[\x00-\x7f])') with self.assertRaises(RegexError) as ctx: translate_pattern('\\px') self.assertIn("a '{' expected", str(ctx.exception)) with self.assertRaises(RegexError) as ctx: translate_pattern('\\p{Pu') self.assertIn("truncated unicode block escape", str(ctx.exception)) with self.assertRaises(RegexError) as ctx: translate_pattern('\\p{Unknown}') self.assertIn("'Unknown' doesn't match to any Unicode category", str(ctx.exception)) 
regex = translate_pattern('\\p{IsUnknown}', xsd_version='1.1') self.assertEqual(regex, '[\x00-\U0010fffe]') with self.assertRaises(RegexError) as ctx: translate_pattern('\\p{IsUnknown}') self.assertIn("'IsUnknown' doesn't match to any Unicode block", str(ctx.exception)) def test_ending_newline_match(self): # Related to xmlschema's issue #223 regex = translate_pattern( pattern=r"\d{2}:\d{2}:\d{6,7}", back_references=False, lazy_quantifiers=False, anchors=False ) pattern = re.compile(regex) self.assertIsNotNone(pattern.match("38:36:000031")) self.assertIsNone(pattern.match("38:36:000031\n")) def test_possessive_quantifiers(self): # Note: possessive quantifiers (*+, ++, ?+, {m,n}+) are supported in Python 3.11+ with self.assertRaises(RegexError) as ctx: translate_pattern('^[abcd]*+$') self.assertIn("unexpected meta character '+' at position 8", str(ctx.exception)) with self.assertRaises(RegexError) as ctx: translate_pattern('^[abcd]{1,5}+$') self.assertIn("unexpected meta character '+' at position 12", str(ctx.exception)) if __name__ == '__main__': unittest.main() elementpath-3.0.2/tests/test_schema_context.py000066400000000000000000000231161427546011100216050ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT.
# # @author Davide Brunato # import unittest from copy import copy from textwrap import dedent from elementpath import XPath2Parser, XPathSchemaContext from elementpath.datatypes import UntypedAtomic try: # noinspection PyPackageRequirements import xmlschema except (ImportError, AttributeError): xmlschema = None @unittest.skipIf(xmlschema is None, "xmlschema library required.") class XMLSchemaProxyTest(unittest.TestCase): @classmethod def setUpClass(cls): cls.schema1 = xmlschema.XMLSchema(dedent('''\ ''')) cls.schema2 = xmlschema.XMLSchema(dedent('''\ ''')) def test_name_token(self): parser = XPath2Parser(default_namespace="http://xpath.test/ns") schema_context = XPathSchemaContext(self.schema1) elem_a = self.schema1.elements['a'] token = parser.parse('a') self.assertIsNone(token.xsd_types) context = copy(schema_context) element_node = context.root[0] self.assertIs(element_node.elem, elem_a) self.assertIs(element_node.xsd_type, elem_a.type) result = token.evaluate(context) self.assertEqual(token.xsd_types, {"{http://xpath.test/ns}a": elem_a.type}) self.assertListEqual(result, [element_node]) elem_b1 = elem_a.type.content[0] token = parser.parse('a/b1') self.assertIsNone(token[0].xsd_types) self.assertIsNone(token[1].xsd_types) context = copy(schema_context) element_node = context.root[0][0] self.assertIs(element_node.elem, elem_b1) self.assertIs(element_node.xsd_type, elem_b1.type) result = token.evaluate(context) self.assertEqual(token[0].xsd_types, {"{http://xpath.test/ns}a": elem_a.type}) self.assertEqual(token[1].xsd_types, {"b1": elem_b1.type}) self.assertListEqual(result, [element_node]) def test_colon_token(self): parser = XPath2Parser(namespaces={'tst': "http://xpath.test/ns"}) context = XPathSchemaContext(self.schema1) elem_a = self.schema1.elements['a'] token = parser.parse('tst:a') self.assertEqual(token.symbol, ':') self.assertIsNone(token.xsd_types) result = token.evaluate(copy(context)) self.assertEqual(token.xsd_types, {"{http://xpath.test/ns}a": 
elem_a.type}) self.assertListEqual(result, [context.root[0]]) elem_b1 = elem_a.type.content[0] token = parser.parse('tst:a/b1') self.assertEqual(token.symbol, '/') self.assertEqual(token[0].symbol, ':') self.assertIsNone(token[0].xsd_types) self.assertIsNone(token[1].xsd_types) result = token.evaluate(copy(context)) self.assertListEqual(result, [context.root[0][0]]) self.assertEqual(token[0].xsd_types, {"{http://xpath.test/ns}a": elem_a.type}) self.assertEqual(token[1].xsd_types, {"b1": elem_b1.type}) token = parser.parse('tst:a/tst:b1') result = token.evaluate(copy(context)) self.assertListEqual(result, []) self.assertEqual(token[0].xsd_types, {"{http://xpath.test/ns}a": elem_a.type}) self.assertIsNone(token[1].xsd_types) elem_b3 = elem_a.type.content[2] token = parser.parse('tst:a/tst:b3') self.assertEqual(token.symbol, '/') self.assertEqual(token[0].symbol, ':') self.assertIsNone(token[0].xsd_types) self.assertIsNone(token[1].xsd_types) result = token.evaluate(copy(context)) self.assertListEqual(result, [context.root[0][2]]) self.assertEqual(token[0].xsd_types, {"{http://xpath.test/ns}a": elem_a.type}) self.assertEqual(token[1].xsd_types, {"{http://xpath.test/ns}b3": elem_b3.type}) def test_extended_name_token(self): parser = XPath2Parser(strict=False) context = XPathSchemaContext(self.schema1) elem_a = self.schema1.elements['a'] token = parser.parse('{http://xpath.test/ns}a') self.assertEqual(token.symbol, '{') self.assertIsNone(token.xsd_types) self.assertEqual(token[0].symbol, '(string)') self.assertEqual(token[1].symbol, '(name)') self.assertEqual(token[1].value, 'a') result = token.evaluate(context) self.assertListEqual(result, [context.root[0]]) self.assertEqual(token.xsd_types, {"{http://xpath.test/ns}a": elem_a.type}) self.assertIsNone(token[0].xsd_types) self.assertIsNone(token[1].xsd_types) def test_wildcard_token(self): parser = XPath2Parser(default_namespace="http://xpath.test/ns") context = XPathSchemaContext(self.schema1) elem_a = 
self.schema1.elements['a'] elem_b3 = self.schema1.elements['b3'] token = parser.parse('*') self.assertEqual(token.symbol, '*') self.assertIsNone(token.xsd_types) result = token.evaluate(context) self.assertListEqual([e.value for e in result], [elem_a, elem_b3]) self.assertEqual(token.xsd_types, {"{http://xpath.test/ns}a": elem_a.type, "{http://xpath.test/ns}b3": elem_b3.type}) token = parser.parse('a/*') self.assertEqual(token.symbol, '/') self.assertEqual(token[0].symbol, '(name)') self.assertEqual(token[1].symbol, '*') result = token.evaluate(context) self.assertListEqual([e.value for e in result], elem_a.type.content[:]) self.assertIsNone(token.xsd_types) self.assertEqual(token[0].xsd_types, {"{http://xpath.test/ns}a": elem_a.type}) self.assertEqual(token[1].xsd_types, {'b1': elem_a.type.content[0].type, 'b2': elem_a.type.content[1].type, '{http://xpath.test/ns}b3': elem_b3.type}) def test_dot_shortcut_token(self): parser = XPath2Parser(default_namespace="http://xpath.test/ns") context = XPathSchemaContext(self.schema1) elem_a = self.schema1.elements['a'] elem_b3 = self.schema1.elements['b3'] token = parser.parse('.') self.assertIsNone(token.xsd_types) result = token.evaluate(context) self.assertListEqual(result, [context.root]) self.assertEqual(token.xsd_types, {"{http://xpath.test/ns}a": elem_a.type, "{http://xpath.test/ns}b3": elem_b3.type}) context = XPathSchemaContext(self.schema1, item=self.schema1) token = parser.parse('.') self.assertIsNone(token.xsd_types) result = token.evaluate(context) self.assertListEqual(result, [context.root]) self.assertEqual(token.xsd_types, {"{http://xpath.test/ns}a": elem_a.type, "{http://xpath.test/ns}b3": elem_b3.type}) context = XPathSchemaContext(self.schema1, item=self.schema2) schema2_node = context.item token = parser.parse('.') self.assertIsNone(token.xsd_types) result = token.evaluate(context) self.assertListEqual(result, [schema2_node]) self.assertIsNone(token.xsd_types) def test_schema_variables(self): 
variable_types = {'a': 'item()', 'b': 'xs:integer?', 'c': 'xs:string'} parser = XPath2Parser(default_namespace="http://xpath.test/ns", variable_types=variable_types) context = XPathSchemaContext(self.schema1) token = parser.parse('$a') result = token.evaluate(context) self.assertIsInstance(result, UntypedAtomic) self.assertEqual(result.value, '1') token = parser.parse('$b') result = token.evaluate(context) self.assertIsInstance(result, int) self.assertEqual(result, 1) token = parser.parse('$c') result = token.evaluate(context) self.assertIsInstance(result, str) self.assertEqual(result, ' alpha\t') token = parser.parse('$z') with self.assertRaises(NameError): token.evaluate(context) def test_not_applicable_functions(self): parser = XPath2Parser(default_namespace="http://xpath.test/ns") context = XPathSchemaContext(self.schema1) token = parser.parse("fn:collection('filepath')") self.assertIsNone(token.evaluate(context)) token = parser.parse("fn:doc-available('tns1')") self.assertIsNone(token.evaluate(context)) token = parser.parse("fn:root(.)") self.assertIsNone(token.evaluate(context)) token = parser.parse("fn:id('ID21256')") self.assertListEqual(token.evaluate(context), []) token = parser.parse("fn:idref('ID21256')") self.assertListEqual(token.evaluate(context), []) if __name__ == '__main__': unittest.main() elementpath-3.0.2/tests/test_schema_proxy.py000066400000000000000000000473201427546011100213050ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # import unittest import xml.etree.ElementTree as ElementTree import io from textwrap import dedent try: import lxml.etree as lxml_etree except ImportError: lxml_etree = None from elementpath import AttributeNode, XPathContext, XPath2Parser, MissingContextError from elementpath.namespaces import XML_LANG, XSD_NAMESPACE, XSD_ANY_ATOMIC_TYPE, XSD_NOTATION try: # noinspection PyPackageRequirements import xmlschema from xmlschema.xpath import XMLSchemaProxy except (ImportError, AttributeError): xmlschema = None try: from tests import xpath_test_class except ImportError: import xpath_test_class @unittest.skipIf(xmlschema is None, "xmlschema library required.") class XMLSchemaProxyTest(xpath_test_class.XPathTestCase): @classmethod def setUpClass(cls): cls.schema = xmlschema.XMLSchema(''' ''') def setUp(self): self.schema_proxy = XMLSchemaProxy(self.schema) self.parser = XPath2Parser(namespaces=self.namespaces, schema=self.schema_proxy) def test_abstract_xsd_schema(self): class GlobalMaps: types = {} attributes = {} elements = {} substitution_groups = {} class XsdSchema: tag = '{%s}schema' % XSD_NAMESPACE xsd_version = '1.1' maps = GlobalMaps() text = None @property def attrib(self): return {} def __iter__(self): return iter(()) def find(self, path, namespaces=None): return schema = XsdSchema() self.assertEqual(schema.tag, '{http://www.w3.org/2001/XMLSchema}schema') self.assertIsNone(schema.text) def test_schema_proxy_init(self): schema_src = """ """ schema_tree = ElementTree.parse(io.StringIO(schema_src)) self.assertIsInstance(XMLSchemaProxy(), XMLSchemaProxy) self.assertIsInstance(XMLSchemaProxy(xmlschema.XMLSchema(schema_src)), XMLSchemaProxy) with self.assertRaises(TypeError): XMLSchemaProxy(schema=schema_tree) with self.assertRaises(TypeError): XMLSchemaProxy(schema=xmlschema.XMLSchema(schema_src), base_element=schema_tree) with self.assertRaises(TypeError): XMLSchemaProxy(schema=xmlschema.XMLSchema(schema_src), 
base_element=schema_tree.getroot()) schema = xmlschema.XMLSchema(schema_src) with self.assertRaises(ValueError): XMLSchemaProxy(base_element=schema.elements['test_element']) def test_xmlschema_proxy(self): context = XPathContext( root=self.etree.XML('') ) self.wrong_syntax("schema-element(*)") self.wrong_name("schema-element(nil)") self.wrong_name("schema-element(xs:string)") self.check_value("schema-element(xs:complexType)", MissingContextError) self.check_value("self::schema-element(xs:complexType)", NameError, context) self.check_value("self::schema-element(xs:schema)", [context.item], context) self.check_tree("schema-element(xs:group)", '(schema-element (: (xs) (group)))') attribute = context.item = AttributeNode(XML_LANG, 'en') self.wrong_syntax("schema-attribute(*)") self.wrong_name("schema-attribute(nil)") self.wrong_name("schema-attribute(xs:string)") self.check_value("schema-attribute(xml:lang)", MissingContextError) self.check_value("schema-attribute(xml:lang)", NameError, context) self.check_value("self::schema-attribute(xml:lang)", [context.item], context) self.check_tree("schema-attribute(xsi:schemaLocation)", '(schema-attribute (: (xsi) (schemaLocation)))') token = self.parser.parse("self::schema-attribute(xml:lang)") context.item = attribute context.axis = 'attribute' self.assertEqual(list(token.select(context)), [context.item]) def test_bind_parser_method(self): schema_src = dedent(""" """) schema = xmlschema.XMLSchema(schema_src) schema_proxy = XMLSchemaProxy(schema=schema) parser = XPath2Parser(namespaces=self.namespaces) self.assertFalse(parser.is_schema_bound()) schema_proxy.bind_parser(parser) self.assertTrue(parser.is_schema_bound()) self.assertIs(schema_proxy, parser.schema) # To test AbstractSchemaProxy.bind_parser() parser = XPath2Parser(namespaces=self.namespaces) super(XMLSchemaProxy, schema_proxy).bind_parser(parser) self.assertIs(schema_proxy, parser.schema) super(XMLSchemaProxy, schema_proxy).bind_parser(parser) 
self.assertIs(schema_proxy, parser.schema) def test_schema_constructors(self): schema_src = dedent(""" """) schema = xmlschema.XMLSchema(schema_src) schema_proxy = XMLSchemaProxy(schema=schema) parser = XPath2Parser(namespaces=self.namespaces, schema=schema_proxy) with self.assertRaises(NameError) as ctx: parser.schema_constructor(XSD_ANY_ATOMIC_TYPE) self.assertIn('XPST0080', str(ctx.exception)) with self.assertRaises(NameError) as ctx: parser.schema_constructor(XSD_NOTATION) self.assertIn('XPST0080', str(ctx.exception)) token = parser.parse('stringType("apple")') self.assertEqual(token.symbol, 'stringType') self.assertEqual(token.label, 'constructor function') self.assertEqual(token.evaluate(), 'apple') token = parser.parse('stringType(())') self.assertEqual(token.symbol, 'stringType') self.assertEqual(token.label, 'constructor function') self.assertEqual(token.evaluate(), []) token = parser.parse('stringType(10)') self.assertEqual(token.symbol, 'stringType') self.assertEqual(token.label, 'constructor function') self.assertEqual(token.evaluate(), '10') token = parser.parse('stringType(.)') self.assertEqual(token.symbol, 'stringType') self.assertEqual(token.label, 'constructor function') token = parser.parse('intType(10)') self.assertEqual(token.symbol, 'intType') self.assertEqual(token.label, 'constructor function') self.assertEqual(token.evaluate(), 10) with self.assertRaises(ValueError) as ctx: parser.parse('intType(true())') self.assertIn('FORG0001', str(ctx.exception)) def test_get_context_method(self): schema_proxy = XMLSchemaProxy() self.assertIsInstance(schema_proxy.get_context(), XPathContext) self.assertIsInstance(super(XMLSchemaProxy, schema_proxy).get_context(), XPathContext) def test_get_type_api(self): schema_proxy = XMLSchemaProxy() self.assertIsNone(schema_proxy.get_type('unknown')) self.assertEqual(schema_proxy.get_type('{%s}string' % XSD_NAMESPACE), xmlschema.XMLSchema.builtin_types()['string']) def test_xsd_version_api(self): 
self.assertEqual(self.schema_proxy.xsd_version, '1.0') def test_find_api(self): schema_src = """ """ schema = xmlschema.XMLSchema(schema_src) schema_proxy = XMLSchemaProxy(schema=schema) self.assertEqual(schema_proxy.find('/test_element'), schema.elements['test_element']) def test_get_attribute_api(self): self.assertIs( self.schema_proxy.get_attribute("{http://xpath.test/ns}test_attribute"), self.schema_proxy._schema.maps.attributes["{http://xpath.test/ns}test_attribute"] ) def test_get_element_api(self): self.assertIs( self.schema_proxy.get_element("{http://xpath.test/ns}test_element"), self.schema_proxy._schema.maps.elements["{http://xpath.test/ns}test_element"] ) def test_get_substitution_group_api(self): self.assertIsNone(self.schema_proxy.get_substitution_group('x')) def test_is_instance_api(self): self.assertFalse(self.schema_proxy.is_instance(True, '{%s}integer' % XSD_NAMESPACE)) self.assertTrue(self.schema_proxy.is_instance(5, '{%s}integer' % XSD_NAMESPACE)) self.assertFalse(self.schema_proxy.is_instance('alpha', '{%s}integer' % XSD_NAMESPACE)) self.assertTrue(self.schema_proxy.is_instance('alpha', '{%s}string' % XSD_NAMESPACE)) self.assertTrue(self.schema_proxy.is_instance('alpha beta', '{%s}token' % XSD_NAMESPACE)) self.assertTrue(self.schema_proxy.is_instance('alpha', '{%s}Name' % XSD_NAMESPACE)) self.assertFalse(self.schema_proxy.is_instance('alpha beta', '{%s}Name' % XSD_NAMESPACE)) self.assertFalse(self.schema_proxy.is_instance('1alpha', '{%s}Name' % XSD_NAMESPACE)) self.assertTrue(self.schema_proxy.is_instance('alpha', '{%s}NCName' % XSD_NAMESPACE)) self.assertFalse(self.schema_proxy.is_instance('eg:alpha', '{%s}NCName' % XSD_NAMESPACE)) def test_cast_as_api(self): schema_proxy = XMLSchemaProxy() self.assertEqual(schema_proxy.cast_as('19', '{%s}short' % XSD_NAMESPACE), 19) def test_attributes_type(self): parser = XPath2Parser(namespaces=self.namespaces) token = parser.parse("@min le @max") context = XPathContext(self.etree.XML('')) 
self.assertTrue(token.evaluate(context)) context = XPathContext(self.etree.XML('')) self.assertTrue(token.evaluate(context)) schema = xmlschema.XMLSchema(''' ''') parser = XPath2Parser(namespaces=self.namespaces, schema=XMLSchemaProxy(schema, schema.elements['range'])) token = parser.parse("@min le @max") context = XPathContext(self.etree.XML('')) self.assertTrue(token.evaluate(context)) context = XPathContext(self.etree.XML('')) self.assertFalse(token.evaluate(context)) schema = xmlschema.XMLSchema(''' ''') parser = XPath2Parser(namespaces=self.namespaces, schema=XMLSchemaProxy(schema, schema.elements['range'])) self.assertRaises(TypeError, parser.parse, '@min le @max') def test_elements_type(self): schema = xmlschema.XMLSchema(''' ''') parser = XPath2Parser(namespaces={'': "http://xpath.test/ns", 'xs': XSD_NAMESPACE}, schema=XMLSchemaProxy(schema)) token = parser.parse("//a") self.assertEqual(token[0].xsd_types['a'], schema.maps.types['{%s}string' % XSD_NAMESPACE]) token = parser.parse("//b") self.assertEqual(token[0].xsd_types['b'], schema.maps.types['{%s}integer' % XSD_NAMESPACE]) token = parser.parse("//values/c") self.assertEqual(token[0][0].xsd_types["{http://xpath.test/ns}values"], schema.elements['values'].type) self.assertEqual(token[1].xsd_types['c'], schema.maps.types['{%s}boolean' % XSD_NAMESPACE]) token = parser.parse("values/c") self.assertEqual(token[0].xsd_types['{http://xpath.test/ns}values'], schema.elements['values'].type) self.assertEqual(token[1].xsd_types['c'], schema.maps.types['{%s}boolean' % XSD_NAMESPACE]) token = parser.parse("values/*") self.assertEqual(token[1].xsd_types, { 'a': schema.maps.types['{%s}string' % XSD_NAMESPACE], 'b': schema.maps.types['{%s}integer' % XSD_NAMESPACE], 'c': schema.maps.types['{%s}boolean' % XSD_NAMESPACE], 'd': schema.maps.types['{%s}float' % XSD_NAMESPACE], }) def test_elements_and_attributes_type(self): schema = xmlschema.XMLSchema(''' ''') parser = XPath2Parser(namespaces={'': "http://xpath.test/ns", 
'xs': XSD_NAMESPACE}, schema=XMLSchemaProxy(schema)) token = parser.parse("//a") self.assertEqual(token[0].xsd_types['a'], schema.maps.types['{%s}string' % XSD_NAMESPACE]) token = parser.parse("//b") self.assertEqual(token[0].xsd_types['b'], schema.types['rangeType']) token = parser.parse("values/c") self.assertEqual(token[0].xsd_types['{http://xpath.test/ns}values'], schema.elements['values'].type) self.assertEqual(token[1].xsd_types['c'], schema.maps.types['{%s}boolean' % XSD_NAMESPACE]) token = parser.parse("//b/@min") self.assertEqual(token[0][0].xsd_types['b'], schema.types['rangeType']) self.assertEqual(token[1][0].xsd_types['min'], schema.maps.types['{%s}integer' % XSD_NAMESPACE]) token = parser.parse("values/b/@min") self.assertEqual(token[0][0].xsd_types['{http://xpath.test/ns}values'], schema.elements['values'].type) self.assertEqual(token[0][1].xsd_types['b'], schema.types['rangeType']) self.assertEqual(token[1][0].xsd_types['min'], schema.maps.types['{%s}integer' % XSD_NAMESPACE]) token = parser.parse("//b/@min lt //b/@max") self.assertEqual(token[0][0][0].xsd_types['b'], schema.types['rangeType']) self.assertEqual(token[0][1][0].xsd_types['min'], schema.maps.types['{%s}integer' % XSD_NAMESPACE]) self.assertEqual(token[1][0][0].xsd_types['b'], schema.types['rangeType']) self.assertEqual(token[1][1][0].xsd_types['max'], schema.maps.types['{%s}integer' % XSD_NAMESPACE]) root = self.etree.XML('') context = XPathContext(root, namespaces={'': "http://xpath.test/ns"}) self.assertIsNone(token.evaluate(context)) root = self.etree.XML('30') context = XPathContext(root, namespaces={'': "http://xpath.test/ns"}) self.assertIsNone(token.evaluate(context)) root = self.etree.XML( '30') context = XPathContext(root, namespaces={'': "http://xpath.test/ns"}) self.assertTrue(token.evaluate(context)) root = self.etree.XML( '30') context = XPathContext(root, namespaces={'': "http://xpath.test/ns"}) self.assertFalse(token.evaluate(context)) def test_issue_10(self): schema = 
xmlschema.XMLSchema(''' ''') # TODO: test fail with xmlschema-1.0.17+, added namespaces as temporary fix for test. # A fix for xmlschema.xpath.ElementPathMixin._get_xpath_namespaces() is required. root = schema.find('root', namespaces={'': 'http://xpath.test/ns#'}) self.assertEqual(getattr(root, 'tag', None), '{http://xpath.test/ns#}root') @unittest.skipIf(xmlschema is None or lxml_etree is None, "both xmlschema and lxml required") class LxmlXMLSchemaProxyTest(XMLSchemaProxyTest): etree = lxml_etree if __name__ == '__main__': unittest.main() elementpath-3.0.2/tests/test_selectors.py000066400000000000000000000051761427546011100206120ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import unittest import xml.etree.ElementTree as ElementTree from elementpath import select, iter_select, Selector, XPath2Parser class XPathSelectorsTest(unittest.TestCase): root = ElementTree.XML('Dickens') def test_select_function(self): self.assertListEqual(select(self.root, 'text()'), ['Dickens']) self.assertEqual(select(self.root, '$a', variables={'a': 1}), 1) self.assertEqual( select(self.root, '$a', variables={'a': 1}, variable_types={'a': 'xs:decimal'}), 1 ) def test_iter_select_function(self): self.assertListEqual(list(iter_select(self.root, 'text()')), ['Dickens']) self.assertListEqual(list(iter_select(self.root, '$a', variables={'a': True})), [True]) def test_selector_class(self): selector = Selector('/A') self.assertEqual(repr(selector), "Selector(path='/A', parser=XPath2Parser)") self.assertEqual(selector.namespaces, XPath2Parser.DEFAULT_NAMESPACES) selector = Selector('text()') self.assertListEqual(selector.select(self.root), ['Dickens']) 
        self.assertListEqual(list(selector.iter_select(self.root)), ['Dickens'])

        selector = Selector('$a', variables={'a': 1})
        self.assertEqual(selector.select(self.root), 1)
        self.assertListEqual(list(selector.iter_select(self.root)), [1])

    def test_issue_001(self):
        selector = Selector("//FullPath[ends-with(., 'Temp')]")
        self.assertListEqual(selector.select(ElementTree.XML('')), [])
        self.assertListEqual(selector.select(ElementTree.XML('')), [])
        root = ElementTree.XML('High Temp')
        self.assertListEqual(selector.select(root), [root])

    def test_issue_042(self):
        selector1 = Selector('text()')
        selector2 = Selector('sup[last()]/preceding-sibling::text()')
        root = ElementTree.XML('a1b2c3')
        self.assertListEqual(selector1.select(root), selector2.select(root))

        selector2 = Selector('sup[1]/following-sibling::text()')
        root = ElementTree.XML('1b2c3d')
        self.assertListEqual(selector1.select(root), selector2.select(root))


if __name__ == '__main__':
    unittest.main()
elementpath-3.0.2/tests/test_tdop_parser.py000066400000000000000000000357301427546011100211300ustar00rootroot00000000000000
#!/usr/bin/env python
#
# Copyright (c), 2018-2020, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
import unittest
import re
import sys
from collections import namedtuple

from elementpath.tdop import _symbol_to_classname, ParseError, Token, \
    ParserMeta, Parser, MultiLabel


class TdopParserTest(unittest.TestCase):

    @classmethod
    def setUpClass(cls):

        class ExpressionParser(Parser):
            SYMBOLS = {'(integer)', '+', '-', '(name)', '(end)', '(invalid)', '(unknown)'}

            @classmethod
            def create_tokenizer(cls, symbol_table):
                return re.compile(
                    r'INCOMPATIBLE | (\d+) | (UNKNOWN|[+\-]) | (\w+) | (\S)| \s+',
                    flags=re.VERBOSE
                )

        ExpressionParser.literal('(integer)')
        ExpressionParser.register('(name)', bp=100, lbp=100)
        ExpressionParser.register('(end)')
        ExpressionParser.register('(invalid)')
        ExpressionParser.register('(unknown)')

        @ExpressionParser.method(ExpressionParser.infix('+', bp=40))
        def evaluate_plus(self, context=None):
            return self[0].evaluate(context) + self[1].evaluate(context)

        @ExpressionParser.method(ExpressionParser.infix('-', bp=40))
        def evaluate_minus(self, context=None):
            return self[0].evaluate(context) - self[1].evaluate(context)

        cls.parser = ExpressionParser()

    def test_multi_label_class(self):
        label = MultiLabel('function', 'constructor function')
        self.assertEqual(label, 'function')
        self.assertEqual(label, 'constructor function')
        self.assertNotEqual(label, 'constructor')
        self.assertNotEqual(label, 'operator')
        self.assertEqual(str(label), 'function__constructor_function')
        self.assertEqual(repr(label), "MultiLabel('function', 'constructor function')")
        self.assertEqual(hash(label), hash(('function', 'constructor function')))

        self.assertIn(label, ['function'])
        self.assertNotIn(label, [])
        self.assertNotIn(label, ['not a function'])
        self.assertNotIn(label, {'function'})  # set lookup compares hashes, not just equality

        self.assertIn('function', label)
        self.assertIn('constructor', label)
        self.assertNotIn('axis', label)

        self.assertTrue(label.startswith('function'))
        self.assertTrue(label.startswith('constructor'))
        self.assertFalse(label.startswith('operator'))
        self.assertTrue(label.endswith('function'))
        self.assertFalse(label.endswith('constructor'))

    def test_symbol_to_classname_function(self):
        self.assertEqual(_symbol_to_classname('_cat10'), 'Cat10')
        self.assertEqual(_symbol_to_classname('&'), 'Ampersand')
        self.assertEqual(_symbol_to_classname('('), 'LeftParenthesis')
        self.assertEqual(_symbol_to_classname(')'), 'RightParenthesis')

        self.assertEqual(_symbol_to_classname('(name)'), 'Name')
        self.assertEqual(_symbol_to_classname('(name'), 'LeftParenthesisname')

        self.assertEqual(_symbol_to_classname('-'), 'HyphenMinus')
        self.assertEqual(_symbol_to_classname('_'), 'LowLine')
        self.assertEqual(_symbol_to_classname('-_'), 'HyphenMinusLowLine')
        self.assertEqual(_symbol_to_classname('--'), 'HyphenMinusHyphenMinus')

        self.assertEqual(_symbol_to_classname('my-api-call'), 'MyApiCall')
        self.assertEqual(_symbol_to_classname('call-'), 'Call')

    def test_create_tokenizer_method(self):
        FakeToken = namedtuple('Token', 'symbol pattern label')

        tokens = {
            FakeToken(symbol='(name)', pattern=None, label='literal'),
            FakeToken('call', pattern=r'\bcall\b(?=\s+\()', label='function'),
        }

        pattern = Parser.create_tokenizer({t.symbol: t for t in tokens})
        self.assertEqual(pattern.pattern,
                         '(\'[^\']*\'|"[^"]*"|(?:\\d+|\\.\\d+)(?:\\.\\d*)?(?:[Ee][+-]?\\d+)?)|'
                         '(\\bcall\\b(?=\\s+\\())|([A-Za-z0-9_]+)|(\\S)|\\s+')

        tokens = {
            FakeToken(symbol='(name)', pattern=None, label='literal'),
            FakeToken('call', pattern=r'\bcall\b(?=\s+\()', label='function'),
            FakeToken('+', pattern=None, label='operator'),
        }

        pattern = Parser.create_tokenizer({t.symbol: t for t in tokens})
        self.assertEqual(pattern.pattern,
                         '(\'[^\']*\'|"[^"]*"|(?:\\d+|\\.\\d+)(?:\\.\\d*)?(?:[Ee][+-]?\\d+)?)|'
                         '([\\+]|\\bcall\\b(?=\\s+\\())|([A-Za-z0-9_]+)|(\\S)|\\s+')

        # Check fix for issue #10
        tk = FakeToken('{http://www.w3.org/2000/09/xmldsig#}CryptoBinary', None, 'constructor')
        tokens.add(tk)
        pattern = Parser.create_tokenizer({t.symbol: t for t in tokens})

        if sys.version_info >= (3, 7):
self.assertIn(r"(\{http://www\.w3\.org/2000/09/xmldsig\#\}", pattern.pattern) else: self.assertIn(r"(\{http\:\/\/www\.w3\.org\/2000\/09\/xmldsig\#\}", pattern.pattern) def test_tokenizer_items(self): self.assertListEqual(self.parser.tokenizer.findall('5 56'), [('5', '', '', ''), ('', '', '', ''), ('56', '', '', '')]) self.assertListEqual(self.parser.tokenizer.findall('5+56'), [('5', '', '', ''), ('', '+', '', ''), ('56', '', '', '')]) self.assertListEqual(self.parser.tokenizer.findall('xy'), [('', '', 'xy', '')]) self.assertListEqual(self.parser.tokenizer.findall('5x'), [('5', '', '', ''), ('', '', 'x', '')]) def test_incompatible_tokenizer(self): with self.assertRaises(RuntimeError) as ec: self.parser.parse('INCOMPATIBLE') self.assertIn("incompatible tokenizer", str(ec.exception)) def test_expression(self): token = self.parser.parse('10 + 6') self.assertEqual(token.evaluate(), 16) def test_syntax_errors(self): with self.assertRaises(ParseError) as ec: self.parser.parse('x') # with nud() self.assertEqual(str(ec.exception), "unexpected name 'x'") with self.assertRaises(ParseError) as ec: self.parser.parse('5y') # with led() self.assertEqual(str(ec.exception), "unexpected name 'y'") with self.assertRaises(ParseError) as ec: self.parser.parse('5 5') # with expected() self.assertEqual(str(ec.exception), "unexpected literal 5") def test_unused_token_helpers(self): token = self.parser.parse('10') self.assertIsNone(token.unexpected('+', '-')) with self.assertRaises(ParseError) as ec: token.unexpected('(integer)') self.assertEqual(str(ec.exception), "unexpected literal 10") self.assertIsInstance(token.wrong_type(), TypeError) self.assertIsInstance(token.wrong_value(), ValueError) def test_unknown_symbol(self): with self.assertRaises(ParseError) as ec: self.parser.parse('?') self.assertEqual(str(ec.exception), "unknown symbol '?'") with self.assertRaises(ParseError) as ec: self.parser.parse('UNKNOWN') self.assertEqual(str(ec.exception), "unexpected name 'UNKNOWN'") parser = 
self.parser.__class__() parser.symbol_table = parser.symbol_table.copy() parser.build() parser.symbol_table.pop('+') with self.assertRaises(ParseError) as ec: parser.parse('+') self.assertEqual(str(ec.exception), "unknown symbol '+'") def test_invalid_source(self): with self.assertRaises(ParseError) as ec: self.parser.parse(10) self.assertIn("invalid source type", str(ec.exception)) def test_invalid_token(self): token = self.parser.symbol_table['(invalid)'](self.parser, '10e') self.assertEqual(str(token.wrong_syntax()), "invalid literal '10e'") def test_parser_position(self): parser = type(self.parser)() parser.source = ' 7 +\n 8 ' parser.tokens = iter(parser.tokenizer.finditer(parser.source)) self.assertEqual(parser.token.symbol, '(start)') parser.advance() self.assertEqual(parser.token.symbol, '(start)') self.assertEqual(parser.position, (1, 1)) self.assertTrue(parser.is_source_start()) self.assertTrue(parser.is_line_start()) self.assertTrue(parser.is_spaced()) parser.advance() self.assertNotEqual(parser.token.symbol, '(start)') self.assertEqual(parser.token.value, 7) self.assertEqual(parser.position, (1, 4)) self.assertTrue(parser.is_source_start()) self.assertTrue(parser.is_line_start()) self.assertTrue(parser.is_spaced()) parser.advance() self.assertEqual(parser.token.symbol, '+') self.assertEqual(parser.position, (1, 6)) self.assertFalse(parser.is_source_start()) self.assertFalse(parser.is_line_start()) parser.advance() self.assertEqual(parser.token.value, 8) self.assertEqual(parser.position, (2, 2)) self.assertFalse(parser.is_source_start()) self.assertTrue(parser.is_line_start()) self.assertTrue(parser.is_spaced()) parser.source = ' 7 +' self.assertFalse(parser.is_spaced()) def test_advance_until(self): parser = type(self.parser)() parser.source = '' parser.tokens = iter(parser.tokenizer.finditer(parser.source)) parser.advance() with self.assertRaises(TypeError) as ec: parser.advance_until() self.assertEqual(str(ec.exception), "at least a stop symbol 
required!") with self.assertRaises(ParseError) as ec: parser.advance_until('+') self.assertEqual(str(ec.exception), "source is empty") parser = type(self.parser)() parser.source = '5 6 7 + 8' parser.tokens = iter(parser.tokenizer.finditer(parser.source)) parser.advance() self.assertEqual(parser.next_token.symbol, '(integer)') self.assertEqual(parser.next_token.value, 5) parser.advance_until('+') self.assertEqual(parser.next_token.symbol, '+') parser = type(self.parser)() parser.source = '5 6 7 + 8' parser.tokens = iter(parser.tokenizer.finditer(parser.source)) parser.advance() self.assertEqual(parser.next_token.symbol, '(integer)') self.assertEqual(parser.next_token.value, 5) parser.advance_until('*') self.assertEqual(parser.next_token.symbol, '(end)') parser = type(self.parser)() parser.source = '5 UNKNOWN' parser.tokens = iter(parser.tokenizer.finditer(parser.source)) parser.advance() self.assertEqual(parser.next_token.symbol, '(integer)') self.assertEqual(parser.next_token.value, 5) with self.assertRaises(ParseError) as ec: parser.advance_until('UNKNOWN') self.assertEqual(str(ec.exception), "unknown symbol '(unknown)'") def test_unescape_helper(self): self.assertEqual(self.parser.unescape("'\\''"), "'") self.assertEqual(self.parser.unescape('"\\""'), '"') def test_invalid_parser_derivation(self): globals()['ExpressionParser'] = self.parser.__class__ try: with self.assertRaises(RuntimeError) as ec: class AnotherParser(Parser): pass isinstance(AnotherParser, Parser) self.assertEqual(str(ec.exception), "Multiple parser class definitions per module are not allowed") finally: del globals()['ExpressionParser'] def test_new_parser_class(self): class FakeBase: pass class AnotherParser(FakeBase, metaclass=ParserMeta): pass self.assertIs(AnotherParser.token_base_class, Token) self.assertEqual(AnotherParser.literals_pattern.pattern, r"""'[^']*'|"[^"]*"|(?:\d+|\.\d+)(?:\.\d*)?(?:[Ee][+-]?\d+)?""") def test_incomplete_parser_build(self): class UnfinishedParser(Parser): 
            SYMBOLS = {'(integer)', r'function\(', r'axis\:\:', '(name)', '(end)'}

        UnfinishedParser.literal('(integer)')
        UnfinishedParser.register(r'function\(')
        UnfinishedParser.register(r'axis\:\:')
        UnfinishedParser.register('(end)')

        with self.assertRaises(ValueError) as ec:
            UnfinishedParser.build()
        self.assertIn("unregistered symbols: ['(name)']", str(ec.exception))

    def test_invalid_registrations(self):

        class AnotherParser(Parser):
            SYMBOLS = {'(integer)', r'function\(', '(name)', '(end)'}

        with self.assertRaises(ValueError) as ec:
            AnotherParser.register(r'function \(')
        self.assertIn("a symbol can't contain whitespaces", str(ec.exception))

        with self.assertRaises(NameError) as ec:
            AnotherParser.register('undefined')
        self.assertIn("'undefined' is not a symbol of the parser", str(ec.exception))

    def test_other_operators(self):

        class ExpressionParser(Parser):
            SYMBOLS = {'(integer)', '+', '++', '-', '*', '(end)'}

        ExpressionParser.prefix('++')
        ExpressionParser.postfix('+')

        @ExpressionParser.method(ExpressionParser.prefix('++', bp=90))
        def evaluate_increment(self_, context=None):
            return self_[0].evaluate(context) + 1

        @ExpressionParser.method(ExpressionParser.postfix('+', bp=90))
        def evaluate_plus(self_, context=None):
            return self_[0].evaluate(context) + 1

        @ExpressionParser.method(ExpressionParser.infixr('-', bp=50))
        def evaluate_minus(self_, context=None):
            return self_[0].evaluate(context) - self_[1].evaluate(context)

        @ExpressionParser.method('*', bp=70)
        def nud_mul(self_):
            for _ in range(3):
                self_.append(self_.parser.expression(rbp=70))
            return self_

        @ExpressionParser.method('*', bp=70)
        def evaluate_mul(self_, context=None):
            return self_[0].evaluate(context) * \
                self_[1].evaluate(context) * self_[2].evaluate(context)

        ExpressionParser.literal('(integer)')
        ExpressionParser.register('(end)')

        parser = ExpressionParser()

        token = parser.parse('++5')
        self.assertEqual(token.source, '++ 5')
        self.assertEqual(token.evaluate(), 6)

        token = parser.parse('8 +')
        self.assertEqual(token.source, '8 +')
        self.assertEqual(token.evaluate(), 9)

        token = parser.parse(' 8 - 5')
        self.assertEqual(token.source, '8 - 5')
        self.assertEqual(token.evaluate(), 3)

        token = parser.parse('* 8 2 5')
        self.assertEqual(token.source, '* 8 2 5')
        self.assertEqual(token.evaluate(), 80)


if __name__ == '__main__':
    unittest.main()
elementpath-3.0.2/tests/test_typing.py000066400000000000000000000035371427546011100201200ustar00rootroot00000000000000
#!/usr/bin/env python
#
# Copyright (c), 2018-2020, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
"""Tests about static typing of elementpath objects."""

import unittest
import subprocess
import re
import sys
from pathlib import Path

try:
    import mypy
except ImportError:
    mypy = None


@unittest.skipIf(mypy is None, "mypy is not installed")
@unittest.skipIf(sys.version_info < (3, 8), "Python version is lower than 3.8")
class TestTyping(unittest.TestCase):

    @classmethod
    def setUpClass(cls):
        cls.cases_dir = Path(__file__).parent.joinpath('mypy_tests')
        cls.config_file = Path(__file__).parent.parent.joinpath('mypy.ini')
        cls.error_pattern = re.compile(r'Found \d+ error', re.IGNORECASE)

    def check_mypy_output(self, testfile, *options):
        cmd = ['mypy', '--config-file', str(self.config_file), testfile]
        if options:
            cmd.extend(str(opt) for opt in options)
        process = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

        self.assertEqual(process.stderr, b'')
        output = process.stdout.decode('utf-8').strip()
        output_lines = output.split('\n')

        self.assertGreater(len(output_lines), 0, msg=output)
        self.assertNotRegex(output_lines[-1], self.error_pattern, msg=output)
        return output_lines

    def test_selectors(self):
        case_path = self.cases_dir.joinpath('selectors.py')
        output_lines = self.check_mypy_output(case_path, '--strict')
        self.assertTrue(output_lines[0].startswith('Success:'), msg='\n'.join(output_lines))


if __name__ == '__main__':
    unittest.main()
elementpath-3.0.2/tests/test_xpath1_parser.py000066400000000000000000002451671427546011100213760ustar00rootroot00000000000000
#!/usr/bin/env python
#
# Copyright (c), 2018-2021, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
#
# Note: Many tests are built using the examples of the XPath standards,
# published by W3C under the W3C Document License.
#
# References:
# http://www.w3.org/TR/1999/REC-xpath-19991116/
# http://www.w3.org/TR/2010/REC-xpath20-20101214/
# http://www.w3.org/TR/2010/REC-xpath-functions-20101214/
# https://www.w3.org/Consortium/Legal/2015/doc-license
# https://www.w3.org/TR/charmod-norm/
#
import unittest
import sys
import io
import math
import pickle
from decimal import Decimal
from textwrap import dedent
from typing import Optional, List, Tuple
from xml.etree import ElementTree

try:
    import lxml.etree as lxml_etree
except ImportError:
    lxml_etree = None

try:
    import xmlschema
except ImportError:
    xmlschema = None

from elementpath import datatypes, XPath1Parser, XPathContext, MissingContextError, \
    AttributeNode, NamespaceNode, TextNode, CommentNode, ProcessingInstructionNode, \
    ElementNode, select, XPathFunction
from elementpath.namespaces import XSD_NAMESPACE, XPATH_FUNCTIONS_NAMESPACE, \
    XPATH_MATH_FUNCTIONS_NAMESPACE, XSD_ANY_ATOMIC_TYPE, XSD_ANY_SIMPLE_TYPE, \
    XSD_UNTYPED_ATOMIC

try:
    from tests import xpath_test_class
except ImportError:
    import xpath_test_class


XML_GENERIC_TEST = """ some content space space \t .
""" XML_DATA_TEST = """ 3.4 20 -10.1 alpha true 44 """ # noinspection PyPropertyAccess,PyTypeChecker class XPath1ParserTest(xpath_test_class.XPathTestCase): def setUp(self): self.parser = XPath1Parser(self.namespaces, strict=True) # # Test methods @unittest.skipIf(sys.version_info < (3,), "Python 2 pickling is not supported.") def test_parser_pickling(self): if getattr(self.parser, 'schema', None) is None: obj = pickle.dumps(self.parser) parser = pickle.loads(obj) obj = pickle.dumps(self.parser.symbol_table) symbol_table = pickle.loads(obj) self.assertEqual(self.parser, parser) self.assertEqual(self.parser.symbol_table, symbol_table) def test_xpath_tokenizer(self): # tests from the XPath specification self.check_tokenizer("*", ['*']) self.check_tokenizer("text()", ['text', '(', ')']) self.check_tokenizer("@name", ['@', 'name']) self.check_tokenizer("@*", ['@', '*']) self.check_tokenizer("para[1]", ['para', '[', '1', ']']) self.check_tokenizer("para[last()]", ['para', '[', 'last', '(', ')', ']']) self.check_tokenizer("*/para", ['*', '/', 'para']) self.check_tokenizer("/doc/chapter[5]/section[2]", ['/', 'doc', '/', 'chapter', '[', '5', ']', '/', 'section', '[', '2', ']']) self.check_tokenizer("chapter//para", ['chapter', '//', 'para']) self.check_tokenizer("//para", ['//', 'para']) self.check_tokenizer("//olist/item", ['//', 'olist', '/', 'item']) self.check_tokenizer(".", ['.']) self.check_tokenizer(".//para", ['.', '//', 'para']) self.check_tokenizer("..", ['..']) self.check_tokenizer("../@lang", ['..', '/', '@', 'lang']) self.check_tokenizer("chapter[title]", ['chapter', '[', 'title', ']']) self.check_tokenizer( "employee[@secretary and @assistant]", ['employee', '[', '@', 'secretary', '', 'and', '', '@', 'assistant', ']'] ) self.check_tokenizer('/root/a/true()', ['/', 'root', '/', 'a', '/', 'true', '(', ')']) # additional tests from Python XML etree test cases self.check_tokenizer("{http://spam}egg", ['{', 'http', ':', '//', 'spam', '}', 'egg']) 
self.check_tokenizer("./spam.egg", ['.', '/', 'spam.egg']) self.check_tokenizer(".//spam:egg", ['.', '//', 'spam', ':', 'egg']) # additional tests self.check_tokenizer("substring-after()", ['substring-after', '(', ')']) self.check_tokenizer("contains('XML','XM')", ['contains', '(', "'XML'", ',', "'XM'", ')']) self.check_tokenizer( "concat('XML', true(), 10)", ['concat', '(', "'XML'", ',', '', 'true', '(', ')', ',', '', '10', ')'] ) self.check_tokenizer("concat('a', 'b', 'c')", ['concat', '(', "'a'", ',', '', "'b'", ',', '', "'c'", ')']) self.check_tokenizer("_last()", ['_last', '(', ')']) self.check_tokenizer("last ()", ['last', '', '(', ')']) self.check_tokenizer('child::text()', ['child', '::', 'text', '(', ')']) self.check_tokenizer('./ /.', ['.', '/', '', '/', '.']) self.check_tokenizer('tns :*', ['tns', '', ':', '*']) def test_token_classes(self): # Literals self.check_token('(string)', 'literal', "'hello' string", "_StringLiteral(value='hello')", 'hello') self.check_token('(integer)', 'literal', "1999 integer", "_IntegerLiteral(value=1999)", 1999) self.check_token('(float)', 'literal', "3.1415 float", "_FloatLiteral(value=3.1415)", 3.1415) self.check_token('(decimal)', 'literal', "217.35 decimal", "_DecimalLiteral(value=217.35)", 217.35) self.check_token('(name)', 'literal', "'schema' name", "_NameLiteral(value='schema')", 'schema') # Variables self.check_token('$', 'operator', "$ variable reference", "_DollarSignOperator()") # Axes self.check_token('self', 'axis', "'self' axis", "_SelfAxis()") self.check_token('child', 'axis', "'child' axis", "_ChildAxis()") self.check_token('parent', 'axis', "'parent' axis", "_ParentAxis()") self.check_token('ancestor', 'axis', "'ancestor' axis", "_AncestorAxis()") self.check_token('preceding', 'axis', "'preceding' axis", "_PrecedingAxis()") self.check_token('descendant-or-self', 'axis', "'descendant-or-self' axis") self.check_token('following-sibling', 'axis', "'following-sibling' axis") 
        self.check_token('preceding-sibling', 'axis', "'preceding-sibling' axis")
        self.check_token('ancestor-or-self', 'axis', "'ancestor-or-self' axis")
        self.check_token('descendant', 'axis', "'descendant' axis")
        if self.parser.version == '1.0':
            self.check_token('attribute', 'axis', "'attribute' axis")
        self.check_token('following', 'axis', "'following' axis")
        self.check_token('namespace', 'axis', "'namespace' axis")

        # Functions
        self.check_token(
            'position', 'function', "'position' function", "_PositionFunction()"
        )

        # Operators
        self.check_token('and', 'operator', "'and' operator", "_AndOperator()")
        if self.parser.version == '1.0':
            self.check_token(',', 'symbol', "comma symbol", "_CommaSymbol()")
        else:
            self.check_token(',', 'operator', "comma operator", "_CommaOperator()")

    def test_token_tree(self):
        self.check_tree('child::B1', '(child (B1))')
        self.check_tree('A/B//C/D', '(/ (// (/ (A) (B)) (C)) (D))')
        self.check_tree('child::*/child::B1', '(/ (child (*)) (child (B1)))')
        self.check_tree('attribute::name="Galileo"', "(= (attribute (name)) ('Galileo'))")
        self.check_tree('1 + 2 * 3', '(+ (1) (* (2) (3)))')
        self.check_tree('(1 + 2) * 3', '(* (+ (1) (2)) (3))')
        self.check_tree("false() and true()", '(and (false) (true))')
        self.check_tree("false() or true()", '(or (false) (true))')
        self.check_tree("./A/B[C][D]/E", '(/ (/ (/ (.) (A)) ([ ([ (B) (C)) (D))) (E))')
        self.check_tree("string(xml:lang)", '(string (: (xml) (lang)))')
        self.check_tree("//text/preceding-sibling::text[1]",
                        '(/ (// (text)) ([ (preceding-sibling (text)) (1)))')

    def test_token_source(self):
        self.check_source(' child ::B1', 'child::B1')
        self.check_source('false()', 'false()')
        self.check_source("concat('alpha', 'beta', 'gamma')", "concat('alpha', 'beta', 'gamma')")
        self.check_source('1 +2 * 3 ', '1 + 2 * 3')
        self.check_source('(1 + 2) * 3', '(1 + 2) * 3')
        self.check_source(' eg:example ', 'eg:example')
        self.check_source('attribute::name="Galileo"', "attribute::name = 'Galileo'")
        self.check_source(".//eg:a | .//eg:b", './/eg:a | .//eg:b')
        self.check_source("/A/B[C]", '/A/B[C]')

        if self.parser.version < '3.0':
            try:
                self.parser.strict = False
                self.check_source("{tns1}name", '{tns1}name')
            finally:
                self.parser.strict = True

    def test_parser_position(self):
        self.assertEqual(self.parser.position, (1, 1))

        with self.assertRaises(SyntaxError) as ctx:
            self.parser.parse('child::node())')
        self.assertIn('line 1, column 14', str(ctx.exception))

        with self.assertRaises(SyntaxError) as ctx:
            self.parser.parse(' child::node())')
        self.assertIn('line 1, column 15', str(ctx.exception))

        with self.assertRaises(SyntaxError) as ctx:
            self.parser.parse(' child::node( ))')
        self.assertIn('line 1, column 16', str(ctx.exception))

        with self.assertRaises(SyntaxError) as ctx:
            self.parser.parse(' child::node() )')
        self.assertIn('line 1, column 16', str(ctx.exception))

        with self.assertRaises(SyntaxError) as ctx:
            self.parser.parse('^)')
        self.assertIn('line 1, column 1', str(ctx.exception))

        with self.assertRaises(SyntaxError) as ctx:
            self.parser.parse(' ^)')
        self.assertIn('line 1, column 2', str(ctx.exception))

    def test_wrong_syntax(self):
        self.wrong_syntax('')
        self.wrong_syntax(" \n \n )")
        self.wrong_syntax('child::1')
        self.wrong_syntax("{}egg")
        self.wrong_syntax("./*:*")
        self.wrong_syntax('./ /.')
        self.wrong_syntax(' eg : example ')

    def test_wrong_nargs(self):
        self.wrong_type("boolean()")  # Too few arguments
        self.wrong_type("count(0, 1, 2)")  # Too many arguments
        self.wrong_type("round(2.5, 1.7)")
        self.wrong_type("contains('XPath', 'XP', 20)")
        self.wrong_type("boolean(1, 5)")

    def test_xsd_qname_method(self):
        qname = self.parser.xsd_qname('string')
        self.assertEqual(qname, 'xs:string')

        parser = self.parser.__class__(namespaces={'xs': XSD_NAMESPACE})
        parser.namespaces['xsd'] = parser.namespaces.pop('xs')
        self.assertEqual(parser.xsd_qname('string'), 'xsd:string')

        parser.namespaces.pop('xsd')
        with self.assertRaises(NameError) as ctx:
            parser.xsd_qname('string')
        self.assertIn('XPST0081', str(ctx.exception))

    def test_is_instance_method(self):
        self.assertTrue(self.parser.is_instance(datatypes.UntypedAtomic(1), XSD_UNTYPED_ATOMIC))
        self.assertFalse(self.parser.is_instance(1, XSD_UNTYPED_ATOMIC))
        self.assertTrue(self.parser.is_instance(1, XSD_ANY_ATOMIC_TYPE))
        self.assertFalse(self.parser.is_instance([1], XSD_ANY_ATOMIC_TYPE))
        self.assertTrue(self.parser.is_instance(1, XSD_ANY_SIMPLE_TYPE))
        self.assertTrue(self.parser.is_instance([1], XSD_ANY_SIMPLE_TYPE))

        self.assertTrue(self.parser.is_instance('foo', '{%s}string' % XSD_NAMESPACE))
        self.assertFalse(self.parser.is_instance(1, '{%s}string' % XSD_NAMESPACE))
        self.assertTrue(self.parser.is_instance(1.0, '{%s}double' % XSD_NAMESPACE))
        self.assertFalse(self.parser.is_instance(1.0, '{%s}float' % XSD_NAMESPACE))

        self.parser._xsd_version = '1.1'
        try:
            self.assertTrue(self.parser.is_instance(1.0, '{%s}double' % XSD_NAMESPACE))
            self.assertFalse(self.parser.is_instance(1.0, '{%s}float' % XSD_NAMESPACE))
        finally:
            self.parser._xsd_version = '1.0'

        with self.assertRaises(KeyError):
            self.parser.is_instance('foo', '{%s}unknown' % XSD_NAMESPACE)

        if xmlschema is not None and self.parser.version > '1.0':
            schema = xmlschema.XMLSchema(""" """)
            self.parser.schema = xmlschema.xpath.XMLSchemaProxy(schema)
            try:
                self.assertFalse(self.parser.is_instance(1.0, 'myInt'))
                self.assertTrue(self.parser.is_instance(1, 'myInt'))
                with self.assertRaises(KeyError):
                    self.parser.is_instance(1.0, 'dType')
            finally:
                self.parser.schema = None

    def test_check_variables_method(self):
        self.assertIsNone(self.parser.check_variables({
            'values': [1, 2, -1],
            'myaddress': 'info@example.com',
            'word': ''
        }))

        with self.assertRaises(TypeError) as ctx:
            self.parser.check_variables({'values': [None, 2, -1]})
        error_message = str(ctx.exception)
        self.assertIn('XPDY0050', error_message)
        self.assertIn('Unmatched sequence type', error_message)

        with self.assertRaises(TypeError) as ctx:
            self.parser.check_variables({'other': None})
        error_message = str(ctx.exception)
        self.assertIn('XPDY0050', error_message)
        self.assertIn('Unmatched sequence type', error_message)

    # XPath expression tests
    def test_node_selection(self):
        root = self.etree.XML('')
        self.check_value("mars", MissingContextError)

        context = XPathContext(root)
        self.check_value("mars", [], context=context)
        self.check_value("B1", [context.root[0]], context=context)
        self.check_value("B2", [context.root[1], context.root[3]], context=context)
        self.check_value("B4", [], context=context)

    def test_prefixed_references(self):
        namespaces = {'tst': "http://xpath.test/ns"}
        root = self.etree.XML(""" """)

        # Prefix references
        self.check_tree('eg:unknown', '(: (eg) (unknown))')
        self.check_tree('string(eg:unknown)', '(string (: (eg) (unknown)))')

        # Test evaluate method
        self.check_value("fn:true()", True)
        self.check_value("fx:true()", NameError)

        context = XPathContext(root)
        self.check_value("tst:B1", [context.root[1]], context=context)
        self.check_value("tst:B2", [context.root[3], context.root[7]], context=context)
        self.check_value("tst:B1:B2", NameError)

        self.check_selector("./tst:B1", root, [root[0]], namespaces=namespaces)
        self.check_selector("./tst:*", root, root[:], namespaces=namespaces)
        self.wrong_syntax("./tst:1")
        self.check_value("./fn:A", MissingContextError)
        self.wrong_type("./xs:true()")

        # Namespace wildcard works only for XPath > 1.0
        if self.parser.version == '1.0':
            self.check_selector("./*:B2", root, Exception, namespaces=namespaces)
        else:
            self.check_selector("./*:B2", root, [root[1], root[3]], namespaces=namespaces)

    def test_braced_uri_literal(self):
        root = self.etree.XML(""" """)
        self.parser.strict = False
        self.check_tree('{%s}string' % XSD_NAMESPACE,
                        "({ ('http://www.w3.org/2001/XMLSchema') (string))")
        self.check_tree('string({%s}unknown)' % XSD_NAMESPACE,
                        "(string ({ ('http://www.w3.org/2001/XMLSchema') (unknown)))")
        self.wrong_syntax("{%s" % XSD_NAMESPACE)
        self.wrong_syntax("{%s}1" % XSD_NAMESPACE)
        self.check_value("{%s}true()" % XPATH_FUNCTIONS_NAMESPACE, True)
        self.check_value("string({%s}true())" % XPATH_FUNCTIONS_NAMESPACE, 'true')

        context = XPathContext(root)
        name = '{%s}alpha' % XPATH_FUNCTIONS_NAMESPACE
        self.check_value(name, [], context)  # it's not an error to use 'fn' namespace for a name

        self.parser.strict = True
        self.wrong_syntax('{%s}string' % XSD_NAMESPACE)

        if not hasattr(self.etree, 'LxmlError') or self.parser.version > '1.0':
            # Do not test with XPath 1.0 on lxml.
            self.check_selector(
                "./{http://www.w3.org/2001/04/xmlenc#}EncryptedData", root, [], strict=False)
            self.check_selector("./{http://xpath.test/ns}B1", root, [root[0]], strict=False)
            self.check_selector("./{http://xpath.test/ns}*", root, root[:], strict=False)

    def test_node_types(self):
        document = self.etree.parse(io.StringIO(u''))
        element = self.etree.Element('schema')
        context = XPathContext(element)
        attribute = AttributeNode('id', '0212349350')
        namespace = NamespaceNode('xs', 'http://www.w3.org/2001/XMLSchema')
        comment = CommentNode(self.etree.Comment('nothing important'))
        pi = ProcessingInstructionNode(self.etree.ProcessingInstruction('action'))
        text = TextNode('aldebaran')

        self.check_selector("node()", element, [])
        context.item = attribute
        self.check_select("self::node()", [attribute], context)
        context.item = namespace
        self.check_select("self::node()", [namespace], context)

        self.check_value("comment()", [], context=context)
        context.item = comment
        self.check_select("self::node()", [comment], context)
        self.check_select("self::comment()", [comment], context)
        self.check_value("comment()", MissingContextError)

        self.check_value("processing-instruction()", [], context=context)
        context.item = pi
        self.check_select("self::node()", [pi], context)
        self.check_select("self::processing-instruction()", [pi], context)
        self.check_select("self::processing-instruction('action')", [pi], context)
        self.check_select("self::processing-instruction('other')", [], context)
        self.check_value("processing-instruction()", MissingContextError)

        context.item = text
        self.check_select("self::node()", [text], context)
        self.check_select("text()", [], context)  # Selects the children
        self.check_selector("node()", self.etree.XML('Dickens'), ['Dickens'])
        self.check_selector("text()", self.etree.XML('Dickens'), ['Dickens'])

        document = self.etree.parse(io.StringIO('Dickens'))
        root = document.getroot()
        if self.etree is not lxml_etree:
            # self.check_value("//self::node()", [document, root, 'Dickens'], context=context)
            # Skip lxml test because lxml's XPath doesn't include document root
            self.check_selector("//self::node()", document, [document, root, 'Dickens'])
            self.check_selector("/self::node()", document, [document])
            self.check_selector("/self::node()", root, [root])

        self.check_selector("//self::text()", root, ['Dickens'])

        context = XPathContext(document)
        self.check_select("node()", [context.root.getroot()], context)

        context = XPathContext(root)
        context.item = None
        # lxml differs: doesn't consider the document position even if select from an ElementTree
        self.check_value("/self::node()", expected=[context.root], context=context)

        context.item = 1
        self.check_value("self::node()", expected=[], context=context)

    def test_unknown_function(self):
        self.wrong_type("unknown('5')", 'XPST0017', 'unknown function')

    def test_node_set_id_function(self):
        # XPath 1.0 id() function: https://www.w3.org/TR/1999/REC-xpath-19991116/#function-id
        root = self.etree.XML('')
        self.check_selector('id("foo")', root, [root[0]])

        context = XPathContext(root)
        self.check_value('./B/@xml:id[id("bar")]', expected=[], context=context)

        context.item = None
        self.check_value('id("none")', expected=[], context=context)
        self.check_value('id("foo")', expected=[context.root[0]], context=context)
        self.check_value('id("bar")', expected=[context.root[2]], context=context)

        context.item = CommentNode(self.etree.Comment('a comment'))
        self.check_value('id("foo")', expected=[], context=context)

    def test_node_set_functions(self):
        root = self.etree.XML('')
        context = XPathContext(root, item=root[1], size=3, position=3)
        self.check_value("position()", MissingContextError)
        self.check_value("position()", 3, context=context)
        self.check_value("position()<=2", MissingContextError)
        self.check_value("position()<=2", False, context=context)
        self.check_value("position()=3", True, context=context)
        self.check_value("position()=2", False, context=context)
        self.check_value("last()", MissingContextError)
        self.check_value("last()", 3,
                         context=context)
        self.check_value("last()-1", 2, context=context)

        self.check_selector("name(.)", root, 'A')
        self.check_selector("name(A)", root, '')
        self.check_selector("name(1.0)", root, TypeError)
        self.check_selector("local-name(A)", root, '')
        self.check_selector("namespace-uri(A)", root, '')
        self.check_selector("name(B2)", root, 'B2')
        self.check_selector("local-name(B2)", root, 'B2')
        self.check_selector("namespace-uri(B2)", root, '')
        if self.parser.version <= '1.0':
            self.check_selector("name(*)", root, 'B1')

        context = XPathContext(root, item=self.etree.Comment('a comment'))
        self.check_value("name()", '', context=context)

        root = self.etree.XML('')
        self.check_selector("name(.)", root, 'tst:A',
                            namespaces={'tst': "http://xpath.test/ns"})
        self.check_selector("local-name(.)", root, 'A')
        self.check_selector("namespace-uri(.)", root, 'http://xpath.test/ns')
        self.check_selector("name(tst:B1)", root, 'tst:B1',
                            namespaces={'tst': "http://xpath.test/ns"})
        self.check_selector("name(tst:B1)", root, 'tst:B1',
                            namespaces={'tst': "http://xpath.test/ns", '': ''})

    def test_string_function(self):
        self.check_value("string()", MissingContextError)
        self.check_value("string(10.0)", '10')
        if self.parser.version == '1.0':
            self.wrong_syntax("string(())")
        else:
            self.check_value("string(())", '')

        root = self.etree.XML('foo')
        self.check_value("string()", 'foo', context=XPathContext(root))

    def test_string_length_function(self):
        root = self.etree.XML(XML_GENERIC_TEST)

        self.check_value("string-length('hello world')", 11)
        self.check_value("string-length('')", 0)
        self.check_selector("a[string-length(@id) = 4]", root, [root[0]])
        self.check_selector("a[string-length(@id) = 3]", root, [])
        self.check_selector("//b[string-length(.) = 12]", root, [root[0][0]])
        self.check_selector("//b[string-length(.) = 10]", root, [])
        self.check_selector("//none[string-length(.) = 10]", root, [])
        self.check_value(
            'fn:string-length("Harp not on that string, madam; that is past.")', 45)

        if self.parser.version == '1.0':
            self.wrong_syntax("string-length(())")
            self.check_value("string-length(12345)", 5)
        else:
            self.check_value("string-length(())", 0)
            self.check_value("string-length(('alpha'))", 5)
            self.check_value("string-length(('alpha'))", 5)
            self.wrong_type("string-length(12345)")
            self.wrong_type("string-length(('12345', 'abc'))")
            self.parser.compatibility_mode = True
            self.check_value("string-length(('12345', 'abc'))", 5)
            self.check_value("string-length(12345)", 5)
            self.parser.compatibility_mode = False

        root = self.etree.XML('foo')
        self.check_value("string-length()", 3, context=XPathContext(root))

    def test_normalize_space_function(self):
        root = self.etree.XML(XML_GENERIC_TEST)

        self.check_value("normalize-space(' hello \t world ')", 'hello world')
        self.check_selector("//c[normalize-space(.) = 'space space .']", root, [root[0][1]])
        self.check_value('fn:normalize-space(" The wealthy curled darlings of our nation. ")',
                         'The wealthy curled darlings of our nation.')
        if self.parser.version == '1.0':
            self.wrong_syntax('fn:normalize-space(())')
            self.check_value("normalize-space(1000)", '1000')
            self.check_value("normalize-space(true())", 'true')
        else:
            self.check_value('fn:normalize-space(())', '')
            self.wrong_type("normalize-space(true())")
            self.wrong_type("normalize-space(('\ta b c ', 'other'))")
            self.parser.compatibility_mode = True
            self.check_value("normalize-space(true())", 'true')
            self.check_value("normalize-space(('\ta b\tc ', 'other'))", 'a b c')
            self.parser.compatibility_mode = False

    def test_translate_function(self):
        root = self.etree.XML(XML_GENERIC_TEST)

        self.check_value("translate('hello world!', 'hw', 'HW')", 'Hello World!')
        self.check_value("translate('hello world!', 'hwx', 'HW')", 'Hello World!')
        self.check_value("translate('hello world!', 'hw!', 'HW')", 'Hello World')
        self.check_value("translate('hello world!', 'hw', 'HW!')", 'Hello World!')
        self.check_selector("a[translate(@id, 'id', 'no') = 'a_no']", root, [root[0]])
        self.check_selector("a[translate(@id, 'id', 'na') = 'a_no']", root, [])
        self.check_selector(
            "//b[translate(., 'some', 'one2') = 'one2 cnnt2nt']", root, [root[0][0]])
        self.check_selector("//b[translate(., 'some', 'two2') = 'one2 cnnt2nt']", root, [])
        self.check_selector("//none[translate(., 'some', 'two2') = 'one2 cnnt2nt']", root, [])

        self.check_value('fn:translate("bar","abc","ABC")', 'BAr')
        self.check_value('fn:translate("--aaa--","abc-","ABC")', 'AAA')
        self.check_value('fn:translate("abcdabc", "abc", "AB")', "ABdAB")

        if self.parser.version > '1.0':
            self.check_value("translate((), 'hw', 'HW')", '')
            self.wrong_type("translate((), (), 'HW')",
                            'XPTY0004', '2nd argument', 'empty sequence')
            self.wrong_type("translate((), 'hw', ())",
                            'XPTY0004', '3rd argument', 'empty sequence')

    def test_variable_substitution(self):
        root = self.etree.XML(''
                              ' 40kW'
                              ' 20kW'
                              ' 30kWXYZ'
                              '')
        variables = {'ups1': root[0], 'ups2': root[1], 'ups3': root[2]}
        self.check_selector('string($ups1/power)', root, '40kW', variables=variables)

        context = XPathContext(root, variables=self.variables)
        self.check_value('$word', 'alpha', context)
        self.wrong_syntax('${http://xpath.test/ns}word', 'XPST0003')

        if self.parser.version == '1.0':
            self.wrong_syntax('$eg:word',
                              'variable reference requires a simple reference name')
        else:
            context = XPathContext(root, variables={'eg:color': 'purple'})
            self.check_value('$eg:color', 'purple', context)

    def test_substring_function(self):
        root = self.etree.XML(XML_GENERIC_TEST)

        self.check_value("substring('Preem Palver', 1)", 'Preem Palver')
        self.check_value("substring('Preem Palver', 2)", 'reem Palver')
        self.check_value("substring('Preem Palver', 7)", 'Palver')
        self.check_value("substring('Preem Palver', 1, 5)", 'Preem')
        self.wrong_type("substring('Preem Palver', 'c', 5)")
        self.wrong_type("substring('Preem Palver', 1, '5')")
        self.check_selector("a[substring(@id, 1) = 'a_id']", root, [root[0]])
        self.check_selector("a[substring(@id, 2) = '_id']", root, [root[0]])
        self.check_selector("a[substring(@id, 3) = '_id']", root, [])
        self.check_selector("//b[substring(., 1, 5) = 'some ']", root, [root[0][0]])
        self.check_selector("//b[substring(., 1, 6) = 'some ']", root, [])
        self.check_selector("//none[substring(., 1, 6) = 'some ']", root, [])

        self.check_value("substring('12345', 1.5, 2.6)", '234')
        self.check_value("substring('12345', 0, 3)", '12')

        if self.parser.version == '1.0':
            self.check_value("substring('12345', 0 div 0, 3)", '')
            self.check_value("substring('12345', 1, 0 div 0)", '')
            self.check_value("substring('12345', -42, 1 div 0)", '12345')
            self.check_value("substring('12345', -1 div 0, 1 div 0)", '')
        else:
            self.check_value('fn:substring("motor car", 6)', ' car')
            self.check_value('fn:substring("metadata", 4, 3)', 'ada')
            self.check_value('fn:substring("12345", 1.5, 2.6)', '234')
            self.check_value('fn:substring("12345", 0, 3)', '12')
            self.check_value('fn:substring("12345", 5, -3)', '')
            self.check_value('fn:substring("12345", -3, 5)', '1')
            self.check_value('fn:substring("12345", 0 div 0E0, 3)', '')
            self.check_value('fn:substring("12345", 1, 0 div 0E0)', '')
            self.check_value('fn:substring((), 1, 3)', '')
            self.check_value('fn:substring("12345", -42, 1 div 0)', ZeroDivisionError)
            self.check_value('fn:substring("12345", -42, 1 div 0E0)', '12345')
            self.check_value('fn:substring("12345", -1 div 0E0, 1 div 0E0)', '')

            self.check_value('fn:substring(("alpha"), 1, 3)', 'alp')
            self.check_value('fn:substring(("alpha"), (1), 3)', 'alp')
            self.check_value('fn:substring(("alpha"), 1, (3))', 'alp')
            self.wrong_type('fn:substring(("alpha"), (1, 2), 3)')
            self.wrong_type('fn:substring(("alpha", "beta"), 1, 3)')

            self.parser.compatibility_mode = True
            self.check_value('fn:substring(("alpha", "beta"), 1, 3)', 'alp')
            self.check_value('fn:substring("12345", -42, 1 div 0E0)', '12345')
            self.check_value('fn:substring("12345", -1 div 0E0, 1 div 0E0)', '')
            self.parser.compatibility_mode = False

    def test_starts_with_function(self):
        root = self.etree.XML(XML_GENERIC_TEST)

        self.check_value("starts-with('Hello World', 'Hello')", True)
        self.check_value("starts-with('Hello World', 'hello')", False)

        self.check_selector("a[starts-with(@id, 'a_i')]", root, [root[0]])
        self.check_selector("a[starts-with(@id, 'a_b')]", root, [])
        self.check_selector("//b[starts-with(., 'some')]", root, [root[0][0]])
        self.check_selector("//b[starts-with(., 'none')]", root, [])
        self.check_selector("//none[starts-with(., 'none')]", root, [])

        self.check_selector("a[starts-with(@id, 'a_id')]", root, [root[0]])
        self.check_selector("a[starts-with(@id, 'a')]", root, [root[0]])
        self.check_selector("a[starts-with(@id, 'a!')]", root, [])
        self.check_selector("//b[starts-with(., 'some')]", root, [root[0][0]])
        self.check_selector("//b[starts-with(., 'a')]", root, [])

        self.check_value("starts-with('', '')", True)
        self.check_value('fn:starts-with("abracadabra", "abra")', True)
        self.check_value('fn:starts-with("abracadabra", "a")', True)
        self.check_value('fn:starts-with("abracadabra", "bra")', False)

        if self.parser.version == '1.0':
            self.wrong_syntax("starts-with((), ())")
            self.check_value("starts-with('1999', 19)", True)
        else:
            self.check_value('fn:starts-with("tattoo", "tat", "http://www.w3.org/'
                             '2005/xpath-functions/collation/codepoint")', True)
            self.check_value('fn:starts-with ("tattoo", "att", "http://www.w3.org/'
                             '2005/xpath-functions/collation/codepoint")', False)
            self.check_value('fn:starts-with ((), ())', True)
            self.wrong_type("starts-with('1999', 19)")
            self.parser.compatibility_mode = True
            self.check_value("starts-with('1999', 19)", True)
            self.parser.compatibility_mode = False

    def test_concat_function(self):
        root = self.etree.XML(XML_GENERIC_TEST)

        self.check_value("concat('alpha', 'beta', 'gamma')", 'alphabetagamma')
        self.check_value("concat('', '', '')", '')
        self.check_value("concat('alpha', 10, 'gamma')", 'alpha10gamma')

        self.check_value("concat('alpha', 'beta', 'gamma')", 'alphabetagamma')
        self.check_value("concat('alpha', 10, 'gamma')", 'alpha10gamma')
        self.check_value("concat('alpha', 'gamma')", 'alphagamma')
        self.check_selector("a[concat(@id, '_foo') = 'a_id_foo']", root, [root[0]])
        self.check_selector("a[concat(@id, '_fo') = 'a_id_foo']", root, [])
        self.check_selector("//b[concat(., '_foo') = 'some content_foo']", root, [root[0][0]])
        self.check_selector("//b[concat(., '_fo') = 'some content_foo']", root, [])
        self.check_selector("//none[concat(., '_fo') = 'some content_foo']", root, [])
        self.wrong_type("concat()", 'XPST0017')

        if self.parser.version == '1.0':
            self.wrong_syntax("concat((), (), ())")
        else:
            self.check_value("concat((), (), ())", '')
            self.check_value("concat(('a'), (), ('c'))", 'ac')
            self.wrong_type("concat(('a', 'b'), (), ('c'))")
            self.parser.compatibility_mode = True
            self.check_value("concat(('a', 'b'), (), ('c'))", 'ac')
            self.parser.compatibility_mode = False

    def test_contains_function(self):
        root = self.etree.XML(XML_GENERIC_TEST)
        self.check_value("contains('XPath','XP')", True)
        self.check_value("contains('XP','XPath')", False)
        self.check_value("contains('', '')", True)

        self.check_selector("a[contains(@id, '_i')]", root, [root[0]])
        self.check_selector("a[contains(@id, '_b')]", root, [])
        self.check_selector("//b[contains(., 'c')]", root, [root[0][0]])
        self.check_selector("//b[contains(., ' -con')]", root, [])
        self.check_selector("//none[contains(., ' -con')]", root, [])

        if self.parser.version == '1.0':
            self.wrong_syntax("contains((), ())")
            self.check_value("contains('XPath', 20)", False)
        else:
            self.check_value('fn:contains ( "tattoo", "t", "http://www.w3.org/'
                             '2005/xpath-functions/collation/codepoint")', True)
            self.check_value('fn:contains ( "tattoo", "ttt", "http://www.w3.org/'
                             '2005/xpath-functions/collation/codepoint")', False)
            self.check_value('fn:contains ( "", ())', True)
            self.wrong_type("contains('XPath', 20)")
            self.check_value('fn:contains(xs:untypedAtomic("abcde"), "bcd")', True)
            self.check_value('fn:contains(xs:anyURI("http://xpath.test"), "th")', True)

            self.parser.compatibility_mode = True
            try:
                self.check_value("contains('XPath', 20)", False)
            finally:
                self.parser.compatibility_mode = False

    def test_substring_before_function(self):
        root = self.etree.XML(XML_GENERIC_TEST)

        self.check_value("substring-before('Wolfgang Amadeus Mozart', 'Wolfgang')", '')
        self.check_value("substring-before('Wolfgang Amadeus Mozart', 'Amadeus')",
                         'Wolfgang ')
        self.check_value('substring-before("1999/04/01","/")', '1999')

        self.check_selector("a[substring-before(@id, 'a') = '']", root, [root[0]])
        self.check_selector("a[substring-before(@id, 'id') = 'a_']", root, [root[0]])
        self.check_selector("a[substring-before(@id, 'id') = '']", root, [])
        self.check_selector("//b[substring-before(., ' ') = 'some']", root, [root[0][0]])
        self.check_selector("//b[substring-before(., 'con') = 'some']", root, [])
        self.check_selector("//none[substring-before(., 'con') = 'some']", root, [])

        if self.parser.version == '1.0':
            self.check_value("substring-before('2017-10-27', 10)", '2017-')
            self.wrong_syntax("fn:substring-before((), ())")
        else:
            self.check_value('fn:substring-before ( "tattoo", "attoo", "http://www.w3.org/'
                             '2005/xpath-functions/collation/codepoint")', 't')
            self.check_value('fn:substring-before ( "tattoo", "tatto", "http://www.w3.org/'
                             '2005/xpath-functions/collation/codepoint")', '')

            self.check_value('fn:substring-before ((), ())', '')
            self.check_value('fn:substring-before ((), "")', '')
            self.wrong_type("substring-before('2017-10-27', 10)")
            self.parser.compatibility_mode = True
            self.check_value("substring-before('2017-10-27', 10)", '2017-')
            self.parser.compatibility_mode = False

    def test_substring_after_function(self):
        root = self.etree.XML(XML_GENERIC_TEST)

        self.check_value("substring-after('Wolfgang Amadeus Mozart', 'Amadeus ')", 'Mozart')
        self.check_value("substring-after('Wolfgang Amadeus Mozart', 'Mozart')", '')
        self.check_value("substring-after('', '')", '')
        self.check_value("substring-after('Mozart', 'B')", '')
        self.check_value("substring-after('Mozart', 'Bach')", '')
        self.check_value("substring-after('Mozart', 'Amadeus')", '')
        self.check_value("substring-after('Mozart', '')", 'Mozart')
        self.check_value('substring-after("1999/04/01","/")', '04/01')
        self.check_value('substring-after("1999/04/01","19")', '99/04/01')

        self.check_value("substring-after('Wolfgang Amadeus Mozart', 'Amadeus ')", 'Mozart')
        self.check_value("substring-after('Wolfgang Amadeus Mozart', 'Mozart')", '')

        self.check_selector("a[substring-after(@id, 'a') = '_id']", root, [root[0]])
        self.check_selector("a[substring-after(@id, 'id') = '']", root, [root[0]])
        self.check_selector("a[substring-after(@id, 'i') = '']", root, [])
        self.check_selector("//b[substring-after(., ' ') = 'content']", root, [root[0][0]])
        self.check_selector("//b[substring-after(., 'con') = 'content']", root, [])
        self.check_selector("//none[substring-after(., 'con') = 'content']", root, [])

        if self.parser.version == '1.0':
            self.wrong_syntax("fn:substring-after((), ())")
        else:
            self.check_value('fn:substring-after("tattoo", "tat")', 'too')
            self.check_value('fn:substring-after("tattoo", "tattoo")', '')
            self.check_value("fn:substring-after((), ())", '')
            self.wrong_type("substring-after('2017-10-27', 10)")
            self.parser.compatibility_mode = True
            self.check_value("substring-after('2017-10-27', 10)", '-27')
            self.parser.compatibility_mode = False

    def test_boolean_functions(self):
        self.check_value("true()", True)
        self.check_value("false()", False)
        self.check_value("not(false())", True)
        self.check_value("not(true())", False)
        self.check_value("boolean(0)", False)
        self.check_value("boolean(1)", True)
        self.check_value("boolean(-1)", True)
        self.check_value("boolean('hello!')", True)
        self.check_value("boolean(' ')", True)
        self.check_value("boolean('')", False)

        self.wrong_type("true(1)", 'XPST0017', "'true' function has no arguments")
        self.wrong_syntax("true(", 'unexpected end of source')

        if self.parser.version == '1.0':
            self.wrong_syntax("boolean(())")
        else:
            self.check_value("boolean(())", False)

    def test_boolean_context_nonempty_elements(self):
        root = self.etree.XML(""" text """)
        context = XPathContext(root=root)

        root_token = self.parser.parse("boolean(node())")
        self.assertEqual(True, root_token.evaluate(context))

        root_token = self.parser.parse("not(node())")
        self.assertEqual(False, root_token.evaluate(context))

        root_token = self.parser.parse("not(not(node()))")
        self.assertEqual(True, root_token.evaluate(context))

    def test_nonempty_elements(self):
        root = self.etree.XML(" text")
        context = XPathContext(root=root)

        root_token = self.parser.parse("normalize-space(text()) = ''")
        self.assertEqual(True, root_token.evaluate(context))

        if self.parser.version > '1.0':
            with self.assertRaises(TypeError) as ctx:
                root_token.evaluate(
                    context=XPathContext(
                        root=self.etree.XML(" text ")  # Two text nodes ...
                    )
                )
            self.assertIn('sequence of more than one item is not allowed', str(ctx.exception))

        elements = select(root, "//*")
        for element in elements:
            context = XPathContext(root=root, item=element)
            root_token = self.parser.parse("* or normalize-space(text()) != ''")
            self.assertEqual(True, root_token.evaluate(context), element)

    def test_lang_function(self):
        # From https://www.w3.org/TR/1999/REC-xpath-19991116/#section-Boolean-Functions
        root = self.etree.XML('')
        self.check_selector('lang("en")', root, True)

        root = self.etree.XML('')
        document = self.etree.ElementTree(root)
        self.check_selector('lang("en")', root, True)
        if self.parser.version > '1.0':
            self.check_selector('para/lang("en")', root, True)
            context = XPathContext(root)
            self.check_value('for $x in . return $x/fn:lang(())',
                             expected=[False], context=context)
        else:
            context = XPathContext(document, item=root[0])
            self.check_value('lang("en")', True, context=context)
            self.check_value('lang("it")', False, context=context)

        root = self.etree.XML('')
        self.check_selector('lang("en")', root, False)
        if self.parser.version > '1.0':
            self.check_selector('b/c/lang("en")', root, False)
            self.check_selector('b/c/lang("en", .)', root, False)
        else:
            context = XPathContext(root, item=root[0][0])
            self.check_value('lang("en")', False, context=context)

        self.check_selector('lang("en")', self.etree.XML(''), True)
        self.check_selector('lang("en")', self.etree.XML(''), True)
        self.check_selector('lang("en")', self.etree.XML(''), False)
        self.check_selector('lang("en")', self.etree.XML(''), False)

        document = self.etree.ElementTree(root)
        context = XPathContext(root=document)
        if self.parser.version == '1.0':
            self.check_value('lang("en")', expected=False, context=context)
        else:
            self.check_value('lang("en")', expected=TypeError, context=context)
            context.item = document
            self.check_value('for $x in /a/b/c return $x/fn:lang("en")',
                             expected=[False], context=context)

    def test_logical_and_operator(self):
        self.check_value("false() and true()", False)
        self.check_value("true() and true()", True)
        self.check_value("1 and 0", False)
        self.check_value("1 and 1", True)
        self.check_value("1 and 'jupiter'", True)
        self.check_value("0 and 'mars'", False)

        self.check_value("1 and mars", MissingContextError)
        context = XPathContext(self.etree.XML(''))
        self.check_value("1 and mars", False, context)

    def test_logical_or_operator(self):
        self.check_value("false() or true()", True)
        self.check_value("true() or false()", True)

    def test_logical_expressions(self):
        root_token = self.parser.parse("(@a and not(@b)) or (not(@a) and @b)")
        context = XPathContext(self.etree.XML(''))
        self.assertTrue(root_token.evaluate(context=context) is False)
        context = XPathContext(self.etree.XML(''))
        self.assertTrue(root_token.evaluate(context=context) is False)
        context = XPathContext(self.etree.XML(''))
        self.assertTrue(root_token.evaluate(context=context) is True)
        context = XPathContext(self.etree.XML(''))
        self.assertTrue(root_token.evaluate(context=context) is True)
        context = XPathContext(self.etree.XML(''))
        self.assertTrue(root_token.evaluate(context=context) is True)
        context = XPathContext(self.etree.XML(''))
        self.assertTrue(root_token.evaluate(context=context) is False)

    def test_comparison_operators(self):
        self.check_value("0.05 = 0.05", True)
        self.check_value("19.03 != 19.02999", True)
        self.check_value("-1.0 = 1.0", False)
        self.check_value("1 <= 2", True)
        self.check_value("5 >= 9", False)
        self.check_value("5 > 3", True)
        self.check_value("5 < 20.0", True)
        self.check_value("2 * 2 = 4", True)
        self.wrong_syntax("5 > 3 < 4", "unexpected '<' operator")

        if self.parser.version == '1.0':
            self.check_value("false() = 1", False)
            self.check_value("0 = false()", True)
        else:
            self.wrong_type("false() = 1")
            self.wrong_type("0 = false()")
            self.wrong_value('xs:untypedAtomic("1") = xs:dayTimeDuration("PT1S")',
                             'FORG0001', "'1' is not an xs:duration value")

    def test_comparison_of_sequences(self):
        root = self.etree.XML(''
                              ' 50'
                              ' 30'
                              ' 20'
                              ' 40'
                              '')
        self.check_selector("/table/unit[2]/cost <= /table/unit[1]/cost", root, True)
        self.check_selector("/table/unit[2]/cost > /table/unit[position()!=2]/cost", root, True)
        self.check_selector("/table/unit[3]/cost > /table/unit[position()!=3]/cost", root, False)
        self.check_selector(". = 'Dickens'", self.etree.XML('Dickens'), True)

    def test_numerical_expressions(self):
        self.check_value("9", 9)
        self.check_value("-3", -3)
        self.check_value("7.1", Decimal('7.1'))
        self.check_value("0.45e3", 0.45e3)
        self.check_value(" 7+5 ", 12)
        self.check_value("8 - 5", 3)
        self.check_value("-8 - 5", -13)
        self.check_value("-3 * 7", -21)
        self.check_value("(5 * 7) + 9", 44)
        self.check_value("-3 * 7", -21)
        self.check_value('(2 + 4) * 5', 30)
        self.check_value('2 + 4 * 5', 22)

        # From W3C XQuery/XPath test suite
        self.wrong_syntax('1.1.1.E2')
        self.wrong_syntax('.0.1')

    def test_addition_and_subtraction_operators(self):
        # '+' and '-' are both prefix and infix operators. The binding
        # power is equal to 40 but the nud() method is set with rbp=70.
self.check_value("9 + 1 + 6", 16) self.check_tree("9 - 1 + 6", '(+ (- (9) (1)) (6))') self.check_value("(9 - 1) + 6", 14) self.check_value("9 - 1 + 6", 14) self.check_tree('1 + 2 * 4 + (1 + 2 + 3 * 4)', '(+ (+ (1) (* (2) (4))) (+ (+ (1) (2)) (* (3) (4))))') self.check_value('1 + 2 * 4 + (1 + 2 + 3 * 4)', 24) self.check_tree('15 - 13.64 - 1.36', "(- (- (15) (Decimal('13.64'))) (Decimal('1.36')))") self.check_tree('15 + 13.64 + 1.36', "(+ (+ (15) (Decimal('13.64'))) (Decimal('1.36')))") self.check_value('15 - 13.64 - 1.36', 0) if self.parser.version != '1.0': self.check_tree('(5, 6) instance of xs:integer+', '(instance (, (5) (6)) (: (xs) (integer)) (+))') self.check_tree('- 1 instance of xs:int', "(instance (- (1)) (: (xs) (int)))") self.check_tree('+ 1 instance of xs:int', "(instance (+ (1)) (: (xs) (int)))") self.wrong_type('2 - 1 instance of xs:int', 'XPTY0004') def test_div_operator(self): self.check_value("5 div 2", 2.5) self.check_value("0 div 2", 0.0) if self.parser.version == '1.0': self.check_value("10div 3", SyntaxError) # TODO: accepted syntax in XPath 1.0 else: self.check_value("() div 2") self.check_raise('1 div 0.0', ZeroDivisionError, 'FOAR0001', 'Division by zero') def test_numerical_add_operator(self): self.check_value("3 + 8", 11) self.check_value("+9", 9) self.wrong_syntax("+") root = self.etree.XML(XML_DATA_TEST) if self.parser.version == '1.0': self.check_value("'9' + 5.0", 14) self.check_selector("/values/a + 2", root, 5.4) self.check_value("/values/b + 2", float('nan'), context=XPathContext(root)) self.check_value("+'alpha'", float('nan')) self.check_value("3 + 'alpha'", float('nan')) else: self.check_selector("/values/a + 2", root, TypeError) self.check_value("/values/b + 2", ValueError, context=XPathContext(root)) self.wrong_type("+'alpha'") self.wrong_type("3 + 'alpha'") self.check_value("() + 81") self.check_value("72 + ()") self.check_value("+()") self.wrong_type('xs:dayTimeDuration("P1D") + xs:duration("P6M")', 'XPTY0004') 
self.check_selector("/values/d + 3", root, 47) def test_numerical_sub_operator(self): self.check_value("9 - 5.0", 4) self.check_value("-8", -8) self.wrong_syntax("-") root = self.etree.XML(XML_DATA_TEST) if self.parser.version == '1.0': self.check_value("'9' - 5.0", 4) self.check_selector("/values/a - 2", root, 1.4) self.check_value("/values/b - 1", float('nan'), context=XPathContext(root)) self.check_value("-'alpha'", float('nan')) self.check_value("3 - 'alpha'", float('nan')) else: self.check_selector("/values/a - 2", root, TypeError) self.check_value("/values/b - 2", ValueError, context=XPathContext(root)) self.wrong_type("-'alpha'") self.wrong_type("3 - 'alpha'") self.check_value("() - 6") self.check_value("19 - ()") self.check_value("-()") self.wrong_type('xs:duration("P3Y") - xs:yearMonthDuration("P2Y3M")', 'XPTY0004') self.check_selector("/values/d - 3", root, 41) def test_numerical_mod_operator(self): self.check_value("11 mod 3", 2) self.check_value("4.5 mod 1.2", Decimal('0.9')) self.check_value("1.23E2 mod 0.6E1", 3.0E0) self.check_value("10 mod 0e1", math.isnan) self.check_raise('3 mod 0', ZeroDivisionError, 'FOAR0001') root = self.etree.XML(XML_DATA_TEST) if self.parser.version == '1.0': self.check_selector("/values/a mod 2", root, 1.4) self.check_value("/values/b mod 2", float('nan'), context=XPathContext(root)) else: self.check_selector("/values/a mod 2", root, TypeError) self.check_value("/values/b mod 2", TypeError, context=XPathContext(root)) self.check_value("() mod 2e1") self.check_value("2 mod xs:float('INF')", 2) self.check_selector("/values/d mod 3", root, 2) def test_number_function(self): root = self.etree.XML('15') self.check_value("number()", MissingContextError) self.check_value("number()", 15, context=XPathContext(root)) self.check_value("number()", 15, context=XPathContext(root, item=root.text)) self.check_value("number(.)", 15, context=XPathContext(root)) self.check_value("number(5.0)", 5.0) self.check_value("number('text')", 
math.isnan) self.check_value("number('-11')", -11) self.check_selector("number(9)", root, 9.0) if self.parser.version == '1.0': self.wrong_syntax("number(())") else: self.check_value("number(())", float('nan'), context=XPathContext(root)) root = self.etree.XML(XML_DATA_TEST) self.check_selector("/values/a/number()", root, [3.4, 20.0, -10.1]) results = select(root, "/values/*/number()", parser=self.parser.__class__) self.assertEqual(results[:3], [3.4, 20.0, -10.1]) self.assertTrue(math.isnan(results[3]) and math.isnan(results[4])) self.check_selector("number(/values/d)", root, 44.0) self.check_selector("number(/values/a)", root, TypeError) def test_count_function(self): root = self.etree.XML('') self.check_selector("count(B)", root, 3) self.check_selector("count(.//C)", root, 5) root = self.etree.XML('5') self.check_selector("count(@avg)", root, 0) self.check_selector("count(@max)", root, 1) self.check_selector("count(@min)", root, 1) self.check_selector("count(@min | @max)", root, 2) self.check_selector("count(@min | @avg)", root, 1) self.check_selector("count(@top | @avg)", root, 0) self.check_selector("count(@min | @max) = 1", root, False) self.check_selector("count(@min | @max) = 2", root, True) def test_sum_function(self): root = self.etree.XML(XML_DATA_TEST) context = XPathContext(root, variables=self.variables) self.check_value("sum($values)", 35, context) self.check_selector("sum(/values/a)", root, 13.299999999999999) if self.parser.version == '1.0': self.check_selector("sum(/values/*)", root, math.isnan) self.wrong_syntax("sum(())") else: self.check_selector("sum(/values/*)", root, TypeError) self.check_value("sum(())", 0) self.check_value("sum((), ())", []) self.check_value('sum((xs:yearMonthDuration("P2Y"), xs:yearMonthDuration("P1Y")))', datatypes.YearMonthDuration(months=36)) self.wrong_type('sum((xs:duration("P2Y"), xs:duration("P1Y")))', 'FORG0006') self.wrong_type('sum(("P2Y", "P1Y"))', 'FORG0006') self.check_value("sum((1.0, xs:float('NaN')))", 
math.isnan) def test_ceiling_function(self): root = self.etree.XML(XML_DATA_TEST) self.check_value("ceiling(10.5)", 11) self.check_value("ceiling(-10.5)", -10) self.check_selector("//a[ceiling(.) = 10]", root, []) self.check_selector("//a[ceiling(.) = -10]", root, [root[2]]) if self.parser.version == '1.0': self.wrong_syntax("ceiling(())") else: self.check_value("ceiling(())", []) self.check_value("ceiling((10.5))", 11) self.check_value("ceiling((xs:float('NaN')))", math.isnan) self.wrong_type("ceiling((10.5, 17.3))") def test_floor_function(self): root = self.etree.XML(XML_DATA_TEST) self.check_value("floor(10.5)", 10) self.check_value("floor(-10.5)", -11) self.check_selector("//a[floor(.) = 10]", root, []) self.check_selector("//a[floor(.) = 20]", root, [root[1]]) if self.parser.version == '1.0': self.wrong_syntax("floor(())") self.check_selector("//ab[floor(.) = 10]", root, []) else: self.check_value("floor(())", []) self.check_value("floor((10.5))", 10) self.wrong_type("floor((10.5, 17.3))") def test_round_function(self): self.check_value("round(2.5)", 3) self.check_value("round(2.4999)", 2) self.check_value("round(-2.5)", -2) if self.parser.version == '1.0': self.wrong_syntax("round(())") self.check_value("round('foo')", math.isnan) else: self.check_value("round(())", []) self.check_value("round((10.5))", 11) self.wrong_type("round((2.5, 12.2))") self.check_value("round(xs:double('NaN'))", math.isnan) self.wrong_type("round('foo')", 'XPTY0004') self.check_value('fn:round(xs:double("1E300"))', 1E300) def test_context_variables(self): root = self.etree.XML('') context = XPathContext(root, variables={'alpha': 10, 'id': '19273222'}) self.check_value("$alpha", MissingContextError) self.check_value("$alpha", 10, context=context) self.check_value("$beta", NameError, context=context) self.check_value("$id", '19273222', context=context) if self.parser.version == '1.0': self.wrong_type("$id()", 'XPST0017') else: self.check_value("$id()", MissingContextError) def 
test_path_step_operator(self): root = self.etree.XML('') document = self.etree.ElementTree(root) self.check_selector('/', root, []) if self.etree is ElementTree or self.parser.version > '1.0': # Skip lxml's xpath() comparison because it doesn't include document selection self.check_selector('/', document, [document]) self.check_selector('/B1', root, []) self.check_selector('/A1', root, []) self.check_selector('/A', root, [root]) self.check_selector('/A/B1', root, [root[0]]) self.check_selector('/A/*', root, [root[0], root[1], root[2]]) self.check_selector('/*/*', root, [root[0], root[1], root[2]]) self.check_selector('/A/B1/C1', root, [root[0][0]]) self.check_selector('/A/B1/*', root, [root[0][0]]) self.check_selector('/A/B3/*', root, [root[2][0], root[2][1]]) self.check_selector('child::*/child::C1', root, [root[0][0], root[2][0]]) self.check_selector('/A/child::B3', root, [root[2]]) self.check_selector('/A/child::C1', root, []) if self.parser.version == '1.0': self.wrong_type('/true()') self.wrong_type('/A/true()') self.wrong_syntax('/|') else: self.check_value('/true()', [True], context=XPathContext(root)) self.check_value('/A/true()', [True], context=XPathContext(root)) self.wrong_syntax('/|') root = self.etree.XML("") context = XPathContext(root) self.check_value('/A', [context.root], context=context) context = XPathContext(root, item=root[0][0]) self.check_value('/A', [context.root], context=context) def test_path_step_operator_with_duplicates(self): root = self.etree.XML('1011101011') self.check_selector('/A/node()', root, ['10', root[0], '10', root[1], '10', root[2]]) self.check_selector('/A/node() | /A/node()', root, ['10', root[0], '10', root[1], '10', root[2]]) self.check_selector('/A/node() | /A/B/text()', root, ['10', root[0], '11', '10', root[1], '10', root[2], '11']) root = self.etree.XML('') self.check_selector('/A/B1/@a', root, ['2', '2', '3']) self.check_selector('/A/B1/@a | /A/B1/@a', root, ['2', '2', '3']) self.check_selector('/A/B1/@a | /A/@a', 
root, ['2', '2', '2', '3']) self.check_selector('/A/B1/@a | /A/B2/@a', root, ['2', '2', '3', '2']) def test_context_item_expression(self): root = self.etree.XML('') self.check_selector('.', root, [root]) self.check_selector('./.', root, [root]) self.check_selector('././.', root, [root]) self.check_selector('./././.', root, [root]) self.check_selector('/', root, []) self.check_selector('/.', root, []) self.check_selector('/./.', root, []) self.check_selector('/././.', root, []) self.check_selector('/A/.', root, [root]) self.check_selector('/A/B1/.', root, [root[0]]) self.check_selector('/A/B1/././.', root, [root[0]]) self.check_selector('1/.', root, TypeError) document = self.etree.ElementTree(root) context = XPathContext(root) self.check_value('.', [context.root], context=context) context = XPathContext(root=document) self.check_value('.', [context.root], context=context) def test_self_axis(self): root = self.etree.XML('A textB1 textB3 text') self.check_selector('self::node()', root, [root]) self.check_selector('self::text()', root, []) def test_child_axis(self): root = self.etree.XML('A textB1 textB3 text') self.check_selector('child::B1', root, [root[0]]) self.check_selector('child::A', root, []) self.check_selector('child::text()', root, ['A text']) self.check_selector('child::node()', root, ['A text'] + root[:]) self.check_selector('child::*', root, root[:]) root = self.etree.XML('') self.check_selector('child::eg:A', root, [], namespaces={'eg': 'http://www.example.com/ns/'}) self.check_selector('child::eg:B1', root, [root[0]], namespaces={'eg': 'http://www.example.com/ns/'}) def test_descendant_axis(self): root = self.etree.XML('') self.check_selector('descendant::node()', root, [e for e in root.iter()][1:]) self.check_selector('/descendant::node()', root, [e for e in root.iter()]) def test_descendant_or_self_axis(self): root = self.etree.XML('') self.check_selector('descendant-or-self::node()', root, [e for e in root.iter()]) 
self.check_selector('descendant-or-self::node()/.', root, [e for e in root.iter()]) def test_double_slash_shortcut(self): root = self.etree.XML('') self.check_selector('//.', root, [e for e in root.iter()]) self.check_selector('/A//.', root, [e for e in root.iter()]) self.check_selector('/A//self::node()', root, [e for e in root.iter()]) self.check_selector('//C1', root, [root[2][1]]) self.check_selector('//B2', root, [root[1]]) self.check_selector('//C', root, [root[0][0], root[2][0]]) self.check_selector('//*', root, [e for e in root.iter()]) self.check_value('/1//*', TypeError, context=XPathContext(root)) # Issue #14 root = self.etree.XML(""" """) self.check_selector('/pm/content/pmEntry/pmEntry//pmEntry[@pmEntryType]', root, []) root = self.etree.XML("") context = XPathContext(root) expected = list(e for e in context.root.iter() if isinstance(e, ElementNode)) self.check_value('//*', expected=expected, context=context) context = XPathContext(root, item=root[0][0]) expected = list(e for e in context.root.iter() if isinstance(e, ElementNode)) self.check_value('//*', expected=expected, context=context) root = self.etree.XML("") context = XPathContext(root) expected = list(e for e in context.root.iter() if isinstance(e, ElementNode)) self.check_value('//A', expected=expected, context=context) def test_double_slash_shortcut_pr16(self): # Pull-Request #16 root = self.etree.XML("""
<html><ul><li><span class="class_a">a</span></li></ul></html>
""") self.check_selector("//span", root, [root[0][0][0]]) # self.check_selector("//span[concat('', '', 'class_a')='class_a']/text()", root, ['a']) self.check_selector("//span[concat('', '', @class)='class_a']/text()", root, ['a']) def test_following_axis(self): root = self.etree.XML( '') self.check_selector('/A/B1/C1/following::*', root, [ root[1], root[2], root[2][0], root[2][1], root[3], root[3][0], root[3][0][0] ]) self.check_selector('/A/B1/following::C1', root, [root[2][0], root[3][0]]) self.check_value('following::*', MissingContextError) def test_following_sibling_axis(self): root = self.etree.XML('') self.check_selector( '/A/B1/C1/following-sibling::*', root, [root[0][1], root[0][2]]) self.check_selector( '/A/B2/C1/following-sibling::*', root, [root[1][1], root[1][2], root[1][3]]) self.check_selector('/A/B1/C1/following-sibling::C3', root, [root[0][2]]) self.check_selector("/A/B1/C1/1/following-sibling::*", root, TypeError) self.check_selector("/A/B1/C1/@a/following-sibling::*", root, []) self.check_value('following-sibling::*', MissingContextError) def test_attribute_abbreviation_and_axis(self): root = self.etree.XML('' '') self.check_selector('/A/B1/attribute::*', root, ['beta1']) self.check_selector('/A/B1/@*', root, ['beta1']) self.check_selector('/A/B3/attribute::*', root, {'beta2', 'beta3'}) self.check_selector('/A/attribute::*', root, {'1', 'alpha'}) root = self.etree.XML('10') self.check_selector('@choice', root, ['int']) root = self.etree.XML('10') self.check_selector('@choice', root, ['int']) self.check_selector('@choice="int"', root, True) self.check_value('@choice', MissingContextError) self.check_value('@1', SyntaxError, context=XPathContext(root)) def test_namespace_axis(self): root = self.etree.XML('10') namespaces = list(self.parser.DEFAULT_NAMESPACES.items()) \ + [('tst', 'http://xpath.test/ns')] if self.parser.version == '1.0': self.check_selector('/A/namespace::*', root, expected=set(namespaces), namespaces=namespaces[-1:]) else: 
self.check_selector('/A/namespace::*', root, expected={'http://www.w3.org/XML/1998/namespace', 'http://xpath.test/ns'}, namespaces=namespaces[-1:]) self.check_value('namespace::*', MissingContextError) self.check_value('./text()/namespace::*', [], context=XPathContext(root)) def test_parent_shortcut_and_axis(self): root = self.etree.XML( '') self.check_selector('/A/*/C2/..', root, [root[2]]) self.check_selector('/A/*/*/..', root, [root[0], root[2], root[3]]) self.check_selector('//C2/..', root, [root[2]]) self.check_selector('/A/*/C2/parent::node()', root, [root[2]]) self.check_selector('/A/*/*/parent::node()', root, [root[0], root[2], root[3]]) self.check_selector('//C2/parent::node()', root, [root[2]]) self.check_selector('..', self.etree.ElementTree(root), []) self.check_value('..', MissingContextError) self.check_value('parent::*', MissingContextError) def test_ancestor_axes(self): root = self.etree.XML( '') self.check_selector('/A/B3/C1/ancestor::*', root, [root, root[2]]) self.check_selector('/A/B4/C1/ancestor::*', root, []) self.check_selector('/A/*/C1/ancestor::*', root, [root, root[0], root[1], root[2]]) self.check_selector('/A/*/C1/ancestor::B3', root, [root[2]]) self.check_selector('/A/B3/C1/ancestor-or-self::*', root, [root, root[2], root[2][0]]) self.check_selector('/A/*/C1/ancestor-or-self::*', root, [ root, root[0], root[0][0], root[1], root[1][0], root[2], root[2][0] ]) self.check_value('ancestor-or-self::*', MissingContextError) def test_preceding_axis(self): root = self.etree.XML('') self.check_selector('/A/B1/C2/preceding::*', root, [root[0][0]]) self.check_selector('/A/B2/C4/preceding::*', root, [ root[0], root[0][0], root[0][1], root[0][2], root[1][0], root[1][1], root[1][2] ]) root = self.etree.XML("") self.check_tree("/root/e/preceding::b", '(/ (/ (/ (root)) (e)) (preceding (b)))') self.check_selector('/root/e[2]/preceding::b', root, [root[0][0][0], root[0][1][0]]) self.check_value('preceding::*', MissingContextError) root = 
self.etree.XML('value') self.check_selector('./text()/preceding::*', root, []) def test_preceding_sibling_axis(self): root = self.etree.XML('') self.check_selector('/A/B1/C2/preceding-sibling::*', root, [root[0][0]]) self.check_selector('/A/B2/C4/preceding-sibling::*', root, [root[1][0], root[1][1], root[1][2]]) self.check_selector('/A/B1/C2/preceding-sibling::C3', root, []) def test_default_axis(self): """Tests about when child:: default axis is applied.""" root = self.etree.XML('firstsecond') self.check_selector('/root/a/*', root, [root[0][0]]) self.check_selector('/root/a/b', root, [root[0][0]]) self.check_selector('/root/a/node()', root, ['first', root[0][0], 'second']) self.check_selector('/root/a/text()', root, ['first', 'second']) self.check_selector('/root/a/attribute::*', root, ['1', '2']) if self.parser.version > '1.0': self.check_selector('/root/a/true()', root, [True, True]) self.check_selector('/root/a/attribute()', root, ['1', '2']) self.check_selector('/root/a/element()', root, [root[0][0]]) self.check_selector('/root/a/name()', root, ['a', 'a']) self.check_selector('/root/a/last()', root, [2, 2]) self.check_selector('/root/a/position()', root, [1, 2]) else: # Functions are not allowed after path step in XPath 1.0 self.wrong_type('/root/a/true()') def test_unknown_axis(self): self.wrong_name('unknown::node()', 'XPST0010') self.wrong_name('A/unknown::node()', 'XPST0010') def test_predicate(self): root = self.etree.XML('') self.check_selector('/A/B1[C2]', root, [root[0]]) self.check_selector('/A/B1[1]', root, [root[0]]) self.check_selector('/A/B1[2]', root, []) self.check_selector('/A/*[2]', root, [root[1]]) self.check_selector('/A/*[position()<2]', root, [root[0]]) self.check_selector('/A/*[last()-1]', root, [root[0]]) self.check_selector('/A/B2/*[position()>=2]', root, root[1][1:]) root = self.etree.XML("Asimov") self.check_selector("book/author[. = 'Asimov']", root, [root[0][0]]) self.check_selector("book/author[. 
= 'Dickens']", root, []) self.check_selector("book/author[text()='Asimov']", root, [root[0][0]]) root = self.etree.XML('hello ') self.check_selector("/A/*[' ']", root, root[:]) self.check_selector("/A/*['']", root, []) root = self.etree.XML("") self.check_tree("child::a[b][c]", '([ ([ (child (a)) (b)) (c))') self.check_selector("child::a[b][c]", root, [root[1]]) root = self.etree.XML("") self.check_tree("a[not(b)]", '([ (a) (not (b)))') self.check_value("a[not(b)]", [], context=XPathContext(root, item=root[0])) context = XPathContext(root, item=root[1]) self.check_value("a[not(b)]", [context.root[1][0]], context) self.check_raise('88[..]', TypeError, 'XPTY0020', 'Context item is not a node', context=XPathContext(root)) self.check_tree("preceding::a[not(b)]", '([ (preceding (a)) (not (b)))') self.check_value("a[preceding::a[not(b)]]", [], context=XPathContext(root, item=root[0])) self.check_value("a[preceding::a[not(b)]]", [], context=XPathContext(root, item=root[1])) def test_parenthesized_expression(self): self.check_value('(6 + 9)', 15) if self.parser.version == '1.0': self.check_value('()', SyntaxError) else: self.check_value('()', []) def test_union(self): root = self.etree.XML( '') self.check_selector('/A/B2 | /A/B1', root, root[:2]) self.check_selector('/A/B2 | /A/*', root, root[:]) self.check_selector('/A/B2 | /A/* | /A/B1', root, root[:]) self.check_selector('/A/@min | /A/@max', root, {'1', '10'}) self.check_raise('1|2|3', TypeError, 'XPTY0004', 'only XPath nodes are allowed', context=XPathContext(root)) def test_default_namespace(self): root = self.etree.XML('bar') self.check_selector('/foo', root, [root]) if self.parser.version == '1.0': # XPath 1.0 ignores the default namespace self.check_selector('/foo', root, [root], namespaces={'': 'ns'}) # foo --> foo else: self.check_selector('/foo', root, [], namespaces={'': 'ns'}) # foo --> {ns}foo if self.parser.version != '1.0': self.check_selector('/*:foo', root, [root], namespaces={'': 'ns'}) # foo --> {ns}foo 
root = self.etree.XML('bar') if self.parser.version == '1.0' or self.etree is not lxml_etree: self.check_selector('/foo', root, []) else: self.check_selector('/foo', root, [root]) if self.parser.version == '1.0': self.check_selector('/foo', root, [], namespaces={'': 'ns'}) else: self.check_selector('/foo', root, [root], namespaces={'': 'ns'}) root = self.etree.XML('') if self.parser.version > '1.0': self.check_selector("name(tst:B1)", root, 'B1' if self.etree is lxml_etree else 'tst:B1', namespaces={'tst': "http://xpath.test/ns"}) self.check_selector("name(B1)", root, 'B1', namespaces={'': "http://xpath.test/ns"}) else: # XPath 1.0 ignores the default namespace declarations self.check_selector("name(B1)", root, '', namespaces={'': "http://xpath.test/ns"}) def test_function_signatures(self): function_names = [] for tk in self.parser.symbol_table.values(): if issubclass(tk, XPathFunction) and 'function' in tk.label: function_names.append(tk.symbol) for st in tk.sequence_types: if 'dateTimeStamp' in st: self.assertFalse(self.parser.is_sequence_type(st), msg=st) with self.xsd_version_parser('1.1'): self.assertTrue(self.parser.is_sequence_type(st), msg=st) else: self.assertTrue(self.parser.is_sequence_type(st), msg=st) if self.parser.version == '1.0': self.assertEqual(len(self.parser.function_signatures), 36) elif self.parser.version == '2.0': self.assertEqual(len(self.parser.function_signatures), 150) elif self.parser.version == '3.0': self.assertEqual(len(self.parser.function_signatures), 220) for key, value in self.parser.function_signatures.items(): self.assertIsInstance(key, tuple) self.assertEqual(len(key), 2) self.assertIsInstance(key[0], datatypes.QName) self.assertIsInstance(key[1], int) try: self.assertIn(key[0].local_name, function_names) except AssertionError: self.assertIn(key[0].expanded_name, function_names) if self.parser.version <= '2.0': self.assertIn(key[0].namespace, XPATH_FUNCTIONS_NAMESPACE) elif self.parser.version == '3.0': 
self.assertIn(key[0].namespace, {XPATH_FUNCTIONS_NAMESPACE, XPATH_MATH_FUNCTIONS_NAMESPACE}) self.assertIsInstance(value, str) self.assertTrue(value.startswith('function(')) self.assertTrue(self.parser.is_sequence_type(value)) def test_descendant_predicate__issue_51(self): root = self.etree.XML(dedent(""" V1 3 foo V3 5 bar V1 3 V2 3 """)) self.check_selector("//target[name=//var/name]", root, expected=[root[0]]) @unittest.skipIf(lxml_etree is None, "The lxml library is not installed") class LxmlXPath1ParserTest(XPath1ParserTest): etree = lxml_etree def check_selector(self, path, root, expected, namespaces=None, **kwargs): """Check using the selector API (the *select* function of the package).""" if isinstance(expected, type) and issubclass(expected, Exception): self.assertRaises(expected, select, root, path, namespaces, self.parser.__class__, **kwargs) else: results = select(root, path, namespaces, self.parser.__class__, **kwargs) variables = kwargs.get('variables', {}) if namespaces and '' in namespaces: namespaces = {k: v for k, v in namespaces.items() if k} if isinstance(expected, set): self.assertEqual( set(root.xpath(path, namespaces=namespaces, **variables)), expected ) self.assertEqual(set(results), expected) elif not callable(expected): self.assertEqual(root.xpath(path, namespaces=namespaces, **variables), expected) self.assertEqual(results, expected) elif isinstance(expected, type): self.assertTrue(isinstance(results, expected)) else: self.assertTrue(expected(results)) def test_namespace_axis(self): root = self.etree.XML('') namespaces: List[Tuple[Optional[str], str]] = [] namespaces.extend(self.parser.DEFAULT_NAMESPACES.items()) namespaces += [('tst', 'http://xpath.test/ns')] self.check_selector('/A/namespace::*', root, expected=set(namespaces), namespaces=namespaces[-1:]) self.check_selector('/A/namespace::*', root, expected=set(namespaces)) root = self.etree.XML('') namespaces.append((None, 'http://xpath.test/ns')) 
self.check_selector('/tst:A/namespace::*', root, set(namespaces), namespaces=namespaces[-2:-1]) def test_issue_25_with_count_function(self): root = lxml_etree.fromstring(""" C A P I T O L O I I I """) path = '//text/preceding-sibling::text' self.check_selector(path, root, root[:-1]) self.check_tree('//text[7]/preceding-sibling::text[1]', '(/ (// ([ (text) (7))) ([ (preceding-sibling (text)) (1)))') if self.parser.version != '1.0': self.check_tree('//text[7]/(preceding-sibling::text)[1]', '(/ (// ([ (text) (7))) ([ (preceding-sibling (text)) (1)))') path = '//text[7]/(preceding-sibling::text)[2]' self.check_selector(path, root, [root[1]]) path = '//text[7]/preceding-sibling::text[2]' self.check_selector(path, root, [root[4]]) path = 'count(//text[@size="12.482"][not(preceding-sibling::text[1][@size="12.482"])])' self.check_selector(path, root, 3) path = '//text[@size="12.482"][not(preceding-sibling::text[1][@size="12.482"])]' self.check_selector(path, root, [root[0], root[4], root[9]]) if __name__ == '__main__': unittest.main()
elementpath-3.0.2/tests/test_xpath2_constructors.py
#!/usr/bin/env python # # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # import unittest import datetime import platform from decimal import Decimal try: import lxml.etree as lxml_etree except ImportError: lxml_etree = None from elementpath import XPathContext, AttributeNode from elementpath.datatypes import Timezone, DateTime10, DateTime, DateTimeStamp, \ GregorianDay, GregorianMonth, GregorianMonthDay, GregorianYear10, GregorianYearMonth10, \ Duration, YearMonthDuration, DayTimeDuration, Date10, Time, QName, UntypedAtomic from elementpath.namespaces import XSD_NAMESPACE try: from tests import xpath_test_class except ImportError: import xpath_test_class class XPath2ConstructorsTest(xpath_test_class.XPathTestCase): def test_unknown_constructor(self): self.wrong_type("xs:unknown('5')", 'XPST0017', 'unknown constructor function') def test_invalid_arguments(self): # Invalid argument types (parsed by null-denotation method) self.wrong_type('xs:normalizedString(()', 'XPST0017') self.wrong_type('xs:normalizedString(5, 2)', 'XPST0017') def test_string_constructor(self): self.check_value("xs:string(5.0)", '5') self.check_value("xs:string(5.2)", '5.2') self.check_value('xs:string(" hello ")', ' hello ') self.check_value('xs:string("\thello \n")', '\thello \n') self.check_value('xs:string(())', []) self.wrong_syntax('xs:string(()', 'XPST0017') # canonical string representation of xs:hexBinary self.check_value('xs:string(xs:hexBinary("ef"))', 'EF') def test_normalized_string_constructor(self): self.check_value('xs:normalizedString("hello")', "hello") self.check_value('xs:normalizedString(" hello ")', " hello ") self.check_value('xs:normalizedString("\thello \n")', " hello ") self.check_value('xs:normalizedString(())', []) root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:normalizedString(@a)', ' alpha beta ', context=context) def test_token_constructor(self): self.check_value('xs:token(" hello world ")', "hello world") self.check_value('xs:token("hello\t world\n")', "hello world") 
self.check_value('xs:token(xs:untypedAtomic("hello\t world\n"))', "hello world") self.check_value('xs:token(())', []) root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:token(@a)', 'hello world', context=context) def test_language_constructor(self): self.check_value('xs:language(" en ")', "en") self.check_value('xs:language(xs:untypedAtomic(" en "))', "en") self.check_value('xs:language(" en-GB ")', "en-GB") self.check_value('xs:language("it-IT")', "it-IT") self.check_value('xs:language("i-klingon")', 'i-klingon') # IANA-registered language self.check_value('xs:language("x-another-language-code")', 'x-another-language-code') self.wrong_value('xs:language("MoreThan8")') self.check_value('xs:language(())', []) root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:language(@a)', 'en-US', context=context) def test_nmtoken_constructor(self): self.check_value('xs:NMTOKEN(" :menù.09-_ ")', ":menù.09-_") self.check_value('xs:NMTOKEN(xs:untypedAtomic(" :menù.09-_ "))', ":menù.09-_") self.wrong_value('xs:NMTOKEN("alpha+")') self.wrong_value('xs:NMTOKEN("hello world")') self.check_value('xs:NMTOKEN(())', []) root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:NMTOKEN(@a)', 'tns:example', context=context) def test_name_constructor(self): self.check_value('xs:Name(" :base ")', ":base") self.check_value('xs:Name(xs:untypedAtomic(" :base "))', ":base") self.check_value('xs:Name(" ::level_alpha ")', "::level_alpha") self.check_value('xs:Name("level-alpha")', "level-alpha") self.check_value('xs:Name("level.alpha\t\n")', "level.alpha") self.check_value('xs:Name("__init__ ")', "__init__") self.check_value('xs:Name("\u0110")', "\u0110") self.wrong_value('xs:Name("2_values")') self.wrong_value('xs:Name(" .values ")') self.wrong_value('xs:Name(" -values ")') self.check_value('xs:Name(())', []) root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:Name(@a)', ':foo:', context=context) def 
test_ncname_constructor(self): self.check_value('xs:NCName(" base ")', "base") self.check_value('xs:NCName(xs:untypedAtomic(" base "))', "base") self.check_value('xs:NCName(" _level_alpha ")', "_level_alpha") self.check_value('xs:NCName("level-alpha")', "level-alpha") self.check_value('xs:NCName("level.alpha\t\n")', "level.alpha") self.check_value('xs:NCName("__init__ ")', "__init__") self.check_value('xs:NCName("\u0110")', "\u0110") self.wrong_value('xs:NCName("2_values")') self.wrong_value('xs:NCName(" .values ")') self.wrong_value('xs:NCName(" -values ")') self.check_value('xs:NCName(())', []) self.wrong_value('xs:NCName("tns:example")') root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:NCName(@a)', 'foo', context=context) def test_id_constructor(self): self.check_value('xs:ID("xyz")', 'xyz') self.check_value('xs:ID(xs:untypedAtomic("xyz"))', 'xyz') def test_idref_constructor(self): self.check_value('xs:IDREF("xyz")', 'xyz') self.check_value('xs:IDREF(xs:untypedAtomic("xyz"))', 'xyz') def test_entity_constructor(self): self.check_value('xs:ENTITY("xyz")', 'xyz') self.check_value('xs:ENTITY(xs:untypedAtomic("xyz"))', 'xyz') def test_qname_constructor(self): qname = QName(XSD_NAMESPACE, 'xs:element') self.check_value('xs:QName(())', []) self.check_value('xs:QName("xs:element")', qname) self.check_value('xs:QName(xs:QName("xs:element"))', qname) if self.parser.version == '2.0': self.wrong_type('xs:QName(xs:untypedAtomic("xs:element"))', 'XPTY0004') else: self.check_value('xs:QName(xs:untypedAtomic("xs:element"))', qname) self.wrong_type('xs:QName(5)', 'XPTY0004', "the argument has an invalid type") self.wrong_value('xs:QName("1")', 'FORG0001', "invalid value") def test_any_uri_constructor(self): self.check_value('xs:anyURI("")', '') self.check_value('xs:anyURI("https://example.com")', 'https://example.com') self.check_value('xs:anyURI("mailto:info@example.com")', 'mailto:info@example.com') self.check_value('xs:anyURI("urn:example:com")', 
'urn:example:com') self.check_value('xs:anyURI(xs:untypedAtomic("urn:example:com"))', 'urn:example:com') self.check_value('xs:anyURI("../principi/libertà.html")', '../principi/libertà.html') self.check_value('xs:anyURI("../principi/libert%E0.html")', '../principi/libert%E0.html') self.check_value('xs:anyURI("../path/page.html#frag")', '../path/page.html#frag') self.wrong_value('xs:anyURI("../path/page.html#frag1#frag2")') self.wrong_value('xs:anyURI("https://example.com/index%.html")') self.wrong_value('xs:anyURI("https://example.com/index.%html")') self.wrong_value('xs:anyURI("https://example.com/index.html% frag")') self.check_value('xs:anyURI(())', []) if platform.python_version_tuple() >= ('3', '6') and \ platform.python_implementation() != 'PyPy': self.wrong_value('xs:anyURI("https://example.com:65536")', 'FORG0001', 'Port out of range 0-65535') root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:anyURI(@a)', 'https://example.com', context=context) def test_boolean_constructor(self): self.check_value('xs:boolean(())', []) self.check_value('xs:boolean(1)', True) self.check_value('xs:boolean(0)', False) self.check_value('xs:boolean(xs:boolean(0))', False) self.check_value('xs:boolean(xs:untypedAtomic(0))', False) self.wrong_type('xs:boolean(xs:hexBinary("FF"))', 'XPTY0004', "HexBinary") self.wrong_value('xs:boolean("2")', 'FORG0001', "invalid value") def test_integer_constructors(self): self.wrong_value('xs:integer("hello")', 'FORG0001') self.check_value('xs:integer("19")', 19) self.check_value('xs:integer(xs:untypedAtomic("19"))', 19) self.check_value("xs:integer('-5')", -5) self.wrong_value("xs:integer('INF')", 'FORG0001') self.check_value("xs:integer('inf')", ValueError) self.wrong_value("xs:integer('NaN')", 'FORG0001') self.wrong_value("xs:integer(xs:float('-INF'))", 'FOCA0002') self.check_value("xs:integer(xs:double('NaN'))", ValueError) root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:integer(@a)', 19, 
context=context) root = self.etree.XML('') context = XPathContext(root, item=float('nan')) self.check_value('xs:integer(.)', ValueError, context=context) self.wrong_value('xs:nonNegativeInteger("-1")') self.wrong_value('xs:nonNegativeInteger(-1)') self.check_value('xs:nonNegativeInteger(0)', 0) self.check_value('xs:nonNegativeInteger(1000)', 1000) self.wrong_value('xs:positiveInteger(0)') self.check_value('xs:positiveInteger("1")', 1) self.wrong_value('xs:negativeInteger(0)') self.check_value('xs:negativeInteger(-1)', -1) self.wrong_value('xs:nonPositiveInteger(1)') self.check_value('xs:nonPositiveInteger(0)', 0) self.check_value('xs:nonPositiveInteger("-1")', -1) def test_limited_integer_constructors(self): self.wrong_value('xs:long("true")') self.wrong_value('xs:long("340282366920938463463374607431768211456")') self.check_value('xs:long("-20")', -20) self.wrong_value('xs:int("-20 91")') self.wrong_value('xs:int("2147483648")') self.wrong_value('xs:int(xs:untypedAtomic("INF"))') self.check_value('xs:int("2147483647")', 2**31 - 1) self.check_value('xs:int("-2147483648")', -2**31) self.wrong_value('xs:short("40000")') self.check_value('xs:short("9999")', 9999) self.check_value('xs:short(-9999)', -9999) self.wrong_value('xs:byte(-129)') self.wrong_value('xs:byte(128)') self.check_value('xs:byte("-128")', -128) self.check_value('xs:byte(127)', 127) self.check_value('xs:byte(-90)', -90) self.wrong_value('xs:unsignedLong("-10")') self.check_value('xs:unsignedLong("3")', 3) self.wrong_value('xs:unsignedInt("-4294967296")') self.check_value('xs:unsignedInt("4294967295")', 2**32 - 1) self.wrong_value('xs:unsignedShort("-1")') self.check_value('xs:unsignedShort("0")', 0) self.wrong_value('xs:unsignedByte(-128)') self.check_value('xs:unsignedByte("128")', 128) def test_decimal_constructors(self): self.check_value('xs:decimal("19")', 19) self.check_value('xs:decimal("19")', Decimal) self.check_value('xs:decimal(xs:untypedAtomic("19"))', 19) 
self.wrong_value('xs:decimal("hello")', 'FORG0001') self.wrong_value('xs:decimal(xs:float("INF"))', 'FOCA0002') root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:decimal(@a)', Decimal('10.3'), context=context) def test_double_constructor(self): self.wrong_value('xs:double("world")') self.check_value('xs:double("39.09")', 39.09) self.check_value('xs:double(xs:untypedAtomic("39.09"))', 39.09) self.check_value('xs:double(-5)', -5.0) self.check_value('xs:double(-5)', float) root = self.etree.XML('') context = XPathContext(root) context.item = context.root.attributes[0] self.check_value('xs:double(.)', float, context=context) self.check_value('xs:double(.)', 10.3, context=context) def test_float_constructor(self): self.wrong_value('xs:float("..")') self.wrong_value('xs:float("ab")', 'FORG0001') self.wrong_value('xs:float("inf")') self.check_value('xs:float(25.05)', 25.05) self.check_value('xs:float(xs:untypedAtomic(25.05))', 25.05) self.check_value('xs:float(-0.00001)', -0.00001) self.check_value('xs:float(0.00001)', float) self.check_value('xs:float("INF")', float('inf')) self.check_value('xs:float("-INF")', float('-inf')) root = self.etree.XML('') context = XPathContext(root) context.item = context.root.attributes[0] self.check_value('xs:float(.)', float, context=context) self.check_value('xs:float(.)', 10.3, context=context) self.parser._xsd_version = '1.1' try: self.check_value('xs:float(9.001)', 9.001) finally: self.parser._xsd_version = '1.0' def test_datetime_constructor(self): tz1 = Timezone(datetime.timedelta(hours=5, minutes=24)) self.check_value('xs:dateTime(())', []) self.check_value('xs:dateTime("1969-07-20T20:18:00")', DateTime10(1969, 7, 20, 20, 18)) self.check_value('xs:dateTime(xs:untypedAtomic("1969-07-20T20:18:00"))', DateTime10(1969, 7, 20, 20, 18)) self.check_value('xs:dateTime("2000-05-10T21:30:00+05:24")', datetime.datetime(2000, 5, 10, hour=21, minute=30, tzinfo=tz1)) 
self.check_value('xs:dateTime("1999-12-31T24:00:00")', datetime.datetime(2000, 1, 1, 0, 0)) self.check_value('xs:dateTime(xs:date("1969-07-20"))', DateTime10(1969, 7, 20)) self.check_value('xs:dateTime(xs:date("1969-07-20"))', DateTime10) with self.assertRaises(AssertionError): self.check_value('xs:dateTime(xs:date("1969-07-20"))', DateTime) self.parser._xsd_version = '1.1' try: self.check_value('xs:dateTime(xs:date("1969-07-20"))', DateTime(1969, 7, 20)) self.check_value('xs:dateTime(xs:date("1969-07-20"))', DateTime) finally: self.parser._xsd_version = '1.0' self.wrong_value('xs:dateTime("2000-05-10t21:30:00+05:24")') self.wrong_value('xs:dateTime("2000-5-10T21:30:00+05:24")') self.wrong_value('xs:dateTime("2000-05-10T21:3:00+05:24")') self.wrong_value('xs:dateTime("2000-05-10T21:13:0+05:24")') self.wrong_value('xs:dateTime("2000-05-10T21:13:0")') self.check_value('xs:dateTime("-25252734927766554-12-31T12:00:00")', OverflowError) self.wrong_type('xs:dateTime(50)', 'FORG0006', '1st argument has an invalid type') self.wrong_type('xs:dateTime("2000-05-10T21:30:00", "+05:24")', 'XPST0017') root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:dateTime(@a)', DateTime10(1969, 7, 20, 20, 18), context=context) context.item = AttributeNode('a', str(DateTime10(1969, 7, 20, 20, 18))) self.check_value('xs:dateTime(.)', DateTime10(1969, 7, 20, 20, 18), context=context) context.item = AttributeNode('a', 'true') self.check_value('xs:dateTime(.)', ValueError, context=context) context.item = DateTime10(1969, 7, 20, 20, 18) self.check_value('xs:dateTime(.)', DateTime10(1969, 7, 20, 20, 18), context=context) def test_datetimestamp_constructor(self): tz0 = Timezone(datetime.timedelta(hours=7, minutes=0)) tz1 = Timezone(datetime.timedelta(hours=5, minutes=24)) ts = DateTimeStamp(1969, 7, 20, 20, 18, tzinfo=tz0) self.assertEqual(self.parser.xsd_version, '1.0') self.wrong_syntax('xs:dateTimeStamp("1969-07-20T20:18:00+07:00")') self.parser._xsd_version = '1.1' try: 
self.check_value('xs:dateTimeStamp(())', []) self.check_value('xs:dateTimeStamp("1969-07-20T20:18:00+07:00")', ts) self.check_value('xs:dateTimeStamp(xs:untypedAtomic("1969-07-20T20:18:00+07:00"))', ts) self.check_value('xs:dateTimeStamp("1969-07-20T20:18:00+07:00") ' 'castable as xs:dateTimeStamp', True) self.check_value('xs:untypedAtomic("1969-07-20T20:18:00+07:00") ' 'castable as xs:dateTimeStamp', True) self.check_value('xs:dateTime("1969-07-20T20:18:00+07:00") ' 'cast as xs:dateTimeStamp', ts) self.check_value('xs:dateTimeStamp("2000-05-10T21:30:00+05:24")', datetime.datetime(2000, 5, 10, hour=21, minute=30, tzinfo=tz1)) self.wrong_value('xs:dateTimeStamp("1999-12-31T24:00:00")') self.wrong_value('xs:dateTimeStamp("2000-05-10t21:30:00+05:24")') self.wrong_type('xs:dateTimeStamp("1969-07-20T20:18:00", "+07:00")', 'XPST0017') self.wrong_type('xs:dateTimeStamp("1969-07-20T20:18:00+07:00"', 'XPST0017') finally: self.parser._xsd_version = '1.0' def test_time_constructor(self): tz = Timezone(datetime.timedelta(hours=5, minutes=24)) self.check_value('xs:time("21:30:00")', datetime.datetime(2000, 1, 1, 21, 30)) self.check_value('xs:time(xs:untypedAtomic("21:30:00"))', datetime.datetime(2000, 1, 1, 21, 30)) self.check_value('xs:time("11:15:48+05:24")', datetime.datetime(2000, 1, 1, 11, 15, 48, tzinfo=tz)) self.check_value('xs:time(xs:dateTime("1969-07-20T20:18:00"))', Time(20, 18, 00)) self.wrong_value('xs:time("24:00:01")') root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:time(@a)', Time(13, 15, 39), context=context) context.item = Time(20, 10, 00) self.check_value('xs:time(.)', Time(20, 10, 00), context=context) def test_date_constructor(self): tz = Timezone(datetime.timedelta(hours=-14, minutes=0)) self.check_value('xs:date("2017-01-19")', datetime.datetime(2017, 1, 19)) self.check_value('xs:date(xs:untypedAtomic("2017-01-19"))', datetime.datetime(2017, 1, 19)) self.check_value('xs:date("2011-11-11-14:00")', datetime.datetime(2011, 11, 11, 
tzinfo=tz)) self.check_value('xs:date(xs:dateTime("1969-07-20T20:18:00"))', Date10(1969, 7, 20)) self.wrong_value('xs:date("2011-11-11-14:01")') self.wrong_value('xs:date("11-11-11")') self.check_value('xs:date(())', []) root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:date(@a)', Date10(2017, 1, 19), context=context) class DummyXsdDateType(xpath_test_class.DummyXsdType): def is_simple(self): return True def decode(self, obj, *args, **kwargs): return Date10.fromstring(obj) def validate(self, obj, *args, **kwargs): if not isinstance(obj, Date10): raise TypeError() context.item = AttributeNode('a', 'true', xsd_type=DummyXsdDateType()) self.check_value('xs:date(.)', TypeError, context=context) context.item = AttributeNode('a', str(Date10(2017, 1, 19))) self.check_value('xs:date(.)', Date10(2017, 1, 19), context=context) context.item = AttributeNode('a', 'true') self.check_value('xs:date(.)', ValueError, context=context) root = self.etree.XML("2017-10-02") context = XPathContext(root) self.check_value('xs:date(.)', Date10(2017, 10, 2), context=context) root = self.etree.XML("2017-10-02") context = XPathContext(root) self.check_value('xs:date(.)', Date10(2017, 10, 2), context=context) context = XPathContext(root, item=Date10(2017, 10, 2)) self.check_value('xs:date(.)', Date10(2017, 10, 2), context=context) def test_gregorian_day_constructor(self): tz = Timezone(datetime.timedelta(hours=5, minutes=24)) self.check_value('xs:gDay("---30")', datetime.datetime(2000, 1, 30)) self.check_value('xs:gDay(xs:untypedAtomic("---30"))', datetime.datetime(2000, 1, 30)) self.check_value('xs:gDay("---21+05:24")', datetime.datetime(2000, 1, 21, tzinfo=tz)) self.check_value('xs:gDay(xs:dateTime("1969-07-20T20:18:00"))', GregorianDay(20)) self.wrong_value('xs:gDay("---32")') self.wrong_value('xs:gDay("--19")') root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:gDay(@a)', GregorianDay(8), context=context) context.item = GregorianDay(10) 
self.check_value('xs:gDay(.)', GregorianDay(10), context=context) def test_gregorian_month_constructor(self): self.check_value('xs:gMonth("--09")', datetime.datetime(2000, 9, 1)) self.check_value('xs:gMonth(xs:untypedAtomic("--09"))', datetime.datetime(2000, 9, 1)) self.check_value('xs:gMonth("--12")', datetime.datetime(2000, 12, 1)) self.wrong_value('xs:gMonth("--9")') self.wrong_value('xs:gMonth("-09")') self.wrong_value('xs:gMonth("--13")') self.check_value('xs:gMonth(xs:dateTime("1969-07-20T20:18:00"))', GregorianMonth(7)) root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:gMonth(@a)', GregorianMonth(11), context=context) context.item = GregorianMonth(1) self.check_value('xs:gMonth(.)', GregorianMonth(1), context=context) def test_gregorian_month_day_constructor(self): tz = Timezone(datetime.timedelta(hours=-14, minutes=0)) self.check_value('xs:gMonthDay("--07-02")', datetime.datetime(2000, 7, 2)) self.check_value('xs:gMonthDay(xs:untypedAtomic("--07-02"))', datetime.datetime(2000, 7, 2)) self.check_value('xs:gMonthDay("--07-02-14:00")', datetime.datetime(2000, 7, 2, tzinfo=tz)) self.check_value('xs:gMonthDay(xs:dateTime("1969-07-20T20:18:00"))', GregorianMonthDay(7, 20)) self.wrong_value('xs:gMonthDay("--7-02")') self.wrong_value('xs:gMonthDay("-07-02")') self.wrong_value('xs:gMonthDay("--07-32")') root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:gMonthDay(@a)', GregorianMonthDay(5, 20), context=context) context.item = GregorianMonthDay(1, 15) self.check_value('xs:gMonthDay(.)', GregorianMonthDay(1, 15), context=context) def test_gregorian_year_constructor(self): self.check_value('xs:gYear("2004")', datetime.datetime(2004, 1, 1)) self.check_value('xs:gYear(xs:untypedAtomic("2004"))', datetime.datetime(2004, 1, 1)) self.check_value('xs:gYear("-2004")', GregorianYear10(-2004)) self.check_value('xs:gYear("-12540")', GregorianYear10(-12540)) self.check_value('xs:gYear("12540")', GregorianYear10(12540)) 
self.check_value('xs:gYear(xs:dateTime("1969-07-20T20:18:00"))', GregorianYear10(1969)) self.wrong_value('xs:gYear("84")') self.wrong_value('xs:gYear("821")') self.wrong_value('xs:gYear("84")') self.check_value('"99999999999999999999999999999" castable as xs:gYear', False) root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:gYear(@a)', GregorianYear10(1999), context=context) context.item = GregorianYear10(1492) self.check_value('xs:gYear(.)', GregorianYear10(1492), context=context) def test_gregorian_year_month_constructor(self): self.check_value('xs:gYearMonth("2004-02")', datetime.datetime(2004, 2, 1)) self.check_value('xs:gYearMonth(xs:untypedAtomic("2004-02"))', datetime.datetime(2004, 2, 1)) self.check_value('xs:gYearMonth(xs:dateTime("1969-07-20T20:18:00"))', GregorianYearMonth10(1969, 7)) self.wrong_value('xs:gYearMonth("2004-2")') self.wrong_value('xs:gYearMonth("204-02")') self.check_value('"99999999999999999999999999999-01" castable as xs:gYearMonth', False) root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:gYearMonth(@a)', GregorianYearMonth10(1900, 1), context=context) context.item = GregorianYearMonth10(1300, 10) self.check_value('xs:gYearMonth(.)', GregorianYearMonth10(1300, 10), context=context) def test_duration_constructor(self): self.check_value('xs:duration("P3Y5M1D")', (41, 86400)) self.check_value('xs:duration(xs:untypedAtomic("P3Y5M1D"))', (41, 86400)) self.check_value('xs:duration("P3Y5M1DT1H")', (41, 90000)) self.check_value('xs:duration("P3Y5M1DT1H3M2.01S")', (41, Decimal('90182.01'))) self.check_value('xs:untypedAtomic("P3Y5M1D") castable as xs:duration', True) self.check_value('"P8192912991912Y" castable as xs:duration', False) self.wrong_value('xs:duration("P3Y5M1X")') self.assertRaises(ValueError, self.parser.parse, 'xs:duration(1)') root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:duration(@a)', Duration(months=17), context=context) context.item = 
Duration(months=12, seconds=86400) self.check_value('xs:duration(.)', Duration(12, 86400), context=context) root = self.etree.XML('P1Y5M') context = XPathContext(root) self.check_value('xs:duration(.)', Duration(months=17), context=context) def test_year_month_duration_constructor(self): self.check_value('xs:yearMonthDuration("P3Y5M")', (41, 0)) self.check_value('xs:yearMonthDuration(xs:untypedAtomic("P3Y5M"))', (41, 0)) self.check_value('xs:yearMonthDuration("-P15M")', (-15, 0)) self.check_value('xs:yearMonthDuration("-P20Y18M")', YearMonthDuration.fromstring("-P21Y6M")) self.check_value('xs:yearMonthDuration(xs:duration("P3Y5M"))', (41, 0)) self.check_value('xs:untypedAtomic("P3Y5M") castable as xs:yearMonthDuration', True) self.check_value('"P9999999999999999Y" castable as xs:yearMonthDuration', False) self.wrong_value('xs:yearMonthDuration("-P15M1D")') self.wrong_value('xs:yearMonthDuration("P15MT1H")') self.wrong_value('xs:yearMonthDuration("P1MT10H")') root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:yearMonthDuration(@a)', Duration(months=17), context=context) context.item = YearMonthDuration(months=12) self.check_value('xs:yearMonthDuration(.)', YearMonthDuration(12), context=context) def test_day_time_duration_constructor(self): self.check_value('xs:dayTimeDuration("-P2DT15H")', DayTimeDuration(seconds=-226800)) self.check_value('xs:dayTimeDuration(xs:duration("-P2DT15H"))', DayTimeDuration(seconds=-226800)) self.check_value('xs:dayTimeDuration("PT240H")', DayTimeDuration.fromstring("P10D")) self.check_value('xs:dayTimeDuration("P365D")', DayTimeDuration.fromstring("P365D")) self.check_value('xs:dayTimeDuration(xs:untypedAtomic("PT240H"))', DayTimeDuration.fromstring("P10D")) self.check_value('xs:untypedAtomic("PT240H") castable as xs:dayTimeDuration', True) self.check_value('xs:dayTimeDuration("-P2DT15H0M0S")', DayTimeDuration.fromstring('-P2DT15H')) self.check_value('xs:dayTimeDuration("P3DT10H")', 
DayTimeDuration.fromstring("P3DT10H")) self.check_value('xs:dayTimeDuration("PT1S")', (0, 1)) self.check_value('xs:dayTimeDuration("PT0S")', (0, 0)) self.wrong_value('xs:dayTimeDuration("+P3DT10H")', 'FORG0001') self.check_value('xs:dayTimeDuration("P999999999999999D")', OverflowError) root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:dayTimeDuration(@a)', DayTimeDuration(496800), context=context) context.item = DayTimeDuration(86400) self.check_value('xs:dayTimeDuration(.)', DayTimeDuration(86400), context=context) def test_hex_binary_constructor(self): self.check_value('xs:hexBinary(())', []) self.check_value('xs:hexBinary("84")', b'84') self.check_value('xs:hexBinary(xs:hexBinary("84"))', b'84') self.wrong_type('xs:hexBinary(12)') root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:hexBinary(@a)', b'84', context=context) context.item = UntypedAtomic('84') self.check_value('xs:hexBinary(.)', b'84', context=context) context.item = '84' self.check_value('xs:hexBinary(.)', b'84', context=context) context.item = b'84' self.check_value('xs:hexBinary(.)', b'84', context=context) context.item = b'XY' self.check_value('xs:hexBinary(.)', ValueError, context=context) context.item = b'F859' self.check_value('xs:hexBinary(.)', b'F859', context=context) def test_base64_binary_constructor(self): self.check_value('xs:base64Binary(())', []) self.check_value('xs:base64Binary("ODQ=")', b'ODQ=') self.check_value('xs:base64Binary(xs:base64Binary("ODQ="))', b'ODQ=') self.check_value('xs:base64Binary("YWJjZWZnaGk=")', b'YWJjZWZnaGk=') self.wrong_value('xs:base64Binary("xyz")') self.wrong_value('xs:base64Binary("\u0411")') self.wrong_type('xs:base64Binary(1e2)') self.wrong_type('xs:base64Binary(1.1)') root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:base64Binary(@a)', b'YWJjZWZnaGk=', context=context) context.item = UntypedAtomic('YWJjZWZnaGk=') self.check_value('xs:base64Binary(.)', b'YWJjZWZnaGk=', 
context=context) context.item = b'abcefghi' # Don't change, it can be an encoded value. self.check_value('xs:base64Binary(.)', b'abcefghi', context=context) context.item = b'YWJjZWZnaGlq' self.check_value('xs:base64Binary(.)', b'YWJjZWZnaGlq', context=context) def test_untyped_atomic_constructor(self): self.check_value('xs:untypedAtomic(())', []) root = self.etree.XML('1999') context = XPathContext(root) self.check_value('xs:untypedAtomic(.)', UntypedAtomic(1999), context=context) context.item = UntypedAtomic('true') self.check_value('xs:untypedAtomic(.)', UntypedAtomic(True), context=context) def test_notation_constructor(self): self.wrong_type('xs:NOTATION()', 'XPST0017') self.wrong_type('xs:NOTATION(()', 'XPST0017') self.wrong_type('xs:NOTATION(())', 'XPST0017', 'no constructor function exists for xs:NOTATION') self.wrong_name('"A120" castable as xs:NOTATION', 'XPST0080') @unittest.skipIf(lxml_etree is None, "The lxml library is not installed") class LxmlXPath2ConstructorsTest(XPath2ConstructorsTest): etree = lxml_etree if __name__ == '__main__': unittest.main() elementpath-3.0.2/tests/test_xpath2_functions.py000066400000000000000000002475051427546011100221110ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # # # Note: Many tests are built using the examples of the XPath standards, # published by W3C under the W3C Document License. 
# # References: # http://www.w3.org/TR/1999/REC-xpath-19991116/ # http://www.w3.org/TR/2010/REC-xpath20-20101214/ # http://www.w3.org/TR/2010/REC-xpath-functions-20101214/ # https://www.w3.org/Consortium/Legal/2015/doc-license # https://www.w3.org/TR/charmod-norm/ # import unittest import datetime import io import locale import math import os import platform import time from textwrap import dedent from decimal import Decimal try: import lxml.etree as lxml_etree except ImportError: lxml_etree = None try: import xmlschema except ImportError: xmlschema = None else: xmlschema.XMLSchema.meta_schema.build() from elementpath import XPath2Parser, XPathContext, ElementPathError, \ MissingContextError, select, Selector, datatypes, AttributeNode, \ NamespaceNode, TextNode from elementpath.namespaces import XSI_NAMESPACE, XML_NAMESPACE, XML_ID from elementpath.datatypes import DateTime10, DateTime, Date10, Date, Time, \ Timezone, DayTimeDuration, YearMonthDuration, QName, UntypedAtomic from elementpath.xpath_token import UNICODE_CODEPOINT_COLLATION try: from tests import test_xpath1_parser except ImportError: import test_xpath1_parser XML_GENERIC_TEST = test_xpath1_parser.XML_GENERIC_TEST XML_POEM_TEST = """ Kaum hat dies der Hahn gesehen, Fängt er auch schon an zu krähen: «Kikeriki! Kikikerikih!!» Tak, tak, tak! - da kommen sie. """ try: from tests import xpath_test_class except ImportError: import xpath_test_class class XPath2FunctionsTest(xpath_test_class.XPathTestCase): def setUp(self): self.parser = XPath2Parser(namespaces=self.namespaces) # Make sure the tests are repeatable. 
env_vars_to_tweak = 'LC_ALL', 'LANG' self.current_env_vars = {v: os.environ.get(v) for v in env_vars_to_tweak} for v in self.current_env_vars: os.environ[v] = 'en_US.UTF-8' def tearDown(self): if hasattr(self, 'current_env_vars'): for v in self.current_env_vars: if self.current_env_vars[v] is not None: os.environ[v] = self.current_env_vars[v] def test_boolean_function(self): root = self.etree.XML('') self.check_selector("boolean(/A)", root, True) self.check_selector("boolean((-10, 35))", root, TypeError) # Sequence with 2 numeric values self.check_selector("boolean((/A, 35))", root, True) def test_abs_function(self): # Test cases taken from https://www.w3.org/TR/xquery-operators/#numeric-value-functions self.check_value("abs(10.5)", 10.5) self.check_value("abs(-10.5)", 10.5) self.check_value("abs(())") root = self.etree.XML('-10') context = XPathContext(root, item=float('nan')) self.check_value("abs(.)", float('nan'), context=context) context = XPathContext(root) self.check_value("abs(.)", 10, context=context) context = XPathContext(root=self.etree.XML('foo')) self.wrong_type('abs("10")', 'XPTY0004', 'invalid argument type') with self.assertRaises(ValueError) as err: self.check_value("abs(.)", 10, context=context) self.assertIn('FOCA0002', str(err.exception)) self.assertIn('invalid string value', str(err.exception)) def test_round_half_to_even_function(self): self.check_value("round-half-to-even(())") self.check_value("round-half-to-even(0.5)", 0) self.check_value("round-half-to-even(1)", 1) self.check_value("round-half-to-even(1.5)", 2) self.check_value("round-half-to-even(2.5)", 2) self.check_value("round-half-to-even(xs:float(2.5))", 2) self.check_value("round-half-to-even(3.567812E+3, 2)", 3567.81E0) self.check_value("round-half-to-even(4.7564E-3, 2)", 0.0E0) self.check_value("round-half-to-even(35612.25, -2)", 35600) self.wrong_type('round-half-to-even(3.5, "2")', 'XPTY0004') self.check_value('fn:round-half-to-even(xs:double("1.0E300"))', 1.0E300) 
self.check_value('fn:round-half-to-even(4.8712122, 8328782878)', 4.8712122) root = self.etree.XML('') context = XPathContext(root, item=float('nan')) self.check_value("round-half-to-even(.)", float('nan'), context=context) self.wrong_type('round-half-to-even("wrong")', 'XPTY0004', 'invalid argument type') def test_sum_function(self): self.check_value("sum((10, 15, 6, -2))", 29) def test_avg_function(self): context = XPathContext(root=self.etree.XML(''), variables={ 'd1': YearMonthDuration.fromstring("P20Y"), 'd2': YearMonthDuration.fromstring("P10M"), 'seq3': [3, 4, 5] }) self.check_value("fn:avg($seq3)", 4.0, context=context) self.check_value("fn:avg(($d1, $d2))", YearMonthDuration.fromstring("P125M"), context=context) root_token = self.parser.parse("fn:avg(($d1, $seq3))") self.assertRaises(TypeError, root_token.evaluate, context=context) self.check_value("fn:avg(())") self.wrong_type("fn:avg('10')", 'FORG0006') self.check_value("fn:avg($seq3)", 4.0, context=context) self.check_value('avg((xs:float(1), xs:untypedAtomic(2), xs:integer(0)))', 1) self.check_value('avg((1.0, 2.0, 3.0))', 2) self.wrong_type('avg((xs:float(1), true(), xs:integer(0)))', 'FORG0006') self.wrong_type('avg((xs:untypedAtomic(3), xs:integer(3), "three"))', 'FORG0006', 'unsupported operand') root_token = self.parser.parse("fn:avg((xs:float('INF'), xs:float('-INF')))") self.assertTrue(math.isnan(root_token.evaluate(context))) root_token = self.parser.parse("fn:avg(($seq3, xs:float('NaN')))") self.assertTrue(math.isnan(root_token.evaluate(context))) root = self.etree.XML('19') self.check_selector('avg(/a/b/number(text()))', root, 5) def test_max_function(self): self.check_value("fn:max(())", []) self.check_value("fn:max((3,4,5))", 5) self.check_value("fn:max((3, 4, xs:float('NaN')))", float('nan')) self.check_value("fn:max((3,4,5), 'en_US.UTF-8')", 5) self.check_value("fn:max((5, 5.0e0))", 5.0e0) self.check_value("fn:max((xs:float(1.0E0), xs:double(15.0)))", 15.0) 
self.wrong_type("fn:max((3,4,'Zero'))") dt = datetime.datetime.now() self.check_value('fn:max((fn:current-date(), xs:date("2001-01-01")))', Date(dt.year, dt.month, dt.day, tzinfo=dt.tzinfo)) self.check_value('fn:max(("a", "b", "c"))', 'c') root = self.etree.XML('19') self.check_selector('max(/a/b/number(text()))', root, 9) self.check_selector('max(/a/b)', root, 9) self.check_value( 'max((xs:anyURI("http://xpath.test/ns0"), xs:anyURI("http://xpath.test/ns1")))', datatypes.AnyURI("http://xpath.test/ns1") ) self.check_value('max((xs:dayTimeDuration("P1D"), xs:dayTimeDuration("P2D")))', datatypes.DayTimeDuration(seconds=3600 * 48)) self.wrong_type('max(QName("http://xpath.test/ns", "foo"))', 'FORG0006', 'xs:QName is not an ordered type') self.wrong_type('max(xs:duration("P1Y"))', 'FORG0006', 'xs:duration is not an ordered type') def test_min_function(self): self.check_value("fn:min(())", []) self.check_value("fn:min((3,4,5))", 3) self.check_value("fn:min((3, 4, xs:float('NaN')))", float('nan')) self.check_value("fn:min((5, 5.0e0))", 5.0e0) self.check_value("fn:min((xs:float(0.0E0), xs:float(-0.0E0)))", 0.0) self.check_value("fn:min((xs:float(1.0E0), xs:double(15.0)))", 1.0) self.check_value('fn:min((fn:current-date(), xs:date("2001-01-01")))', Date.fromstring("2001-01-01")) self.check_value('fn:min(("a", "b", "c"))', 'a') root = self.etree.XML('19') self.check_selector('min(/a/b/number(text()))', root, 1) self.check_selector('min(/a/b)', root, 1) self.check_value( 'min((xs:anyURI("http://xpath.test/ns0"), xs:anyURI("http://xpath.test/ns1")))', datatypes.AnyURI("http://xpath.test/ns0") ) self.check_value('min((xs:dayTimeDuration("P1D"), xs:dayTimeDuration("P2D")))', datatypes.DayTimeDuration(seconds=3600 * 24)) self.wrong_type('min(QName("http://xpath.test/ns", "foo"))', 'FORG0006') self.wrong_type('min(xs:duration("P1Y"))', 'FORG0006') ### # Functions on strings def test_codepoints_to_string_function(self): self.check_value("codepoints-to-string((2309, 2358, 2378, 
2325))", 'अशॊक') self.check_value("codepoints-to-string(2309)", 'अ') self.wrong_value("codepoints-to-string((55296))", 'FOCH0001') self.wrong_type("codepoints-to-string(('z'))", 'XPTY0004') self.wrong_type("codepoints-to-string((2309.1))", 'FORG0006') def test_string_to_codepoints_function(self): self.check_value('string-to-codepoints("Thérèse")', [84, 104, 233, 114, 232, 115, 101]) self.check_value('string-to-codepoints(())') self.wrong_type('string-to-codepoints(84)', 'XPTY0004') self.check_value('string-to-codepoints(("Thérèse"))', [84, 104, 233, 114, 232, 115, 101]) self.wrong_type('string-to-codepoints(("Thér", "èse"))', 'XPTY0004') def test_codepoint_equal_function(self): self.check_value("fn:codepoint-equal('abc', 'abc')", True) self.check_value("fn:codepoint-equal('abc', 'abcd')", False) self.check_value("fn:codepoint-equal('', '')", True) self.check_value("fn:codepoint-equal((), 'abc')") self.check_value("fn:codepoint-equal('abc', ())") self.check_value("fn:codepoint-equal((), ())") def test_compare_function(self): env_locale_setting = locale.getlocale(locale.LC_COLLATE) locale.setlocale(locale.LC_COLLATE, 'C') try: self.assertEqual(locale.getlocale(locale.LC_COLLATE), (None, None)) self.check_value("fn:compare('abc', 'abc')", 0) self.check_value("fn:compare('abc', 'abd')", -1) self.check_value("fn:compare('abc', 'abb')", 1) self.check_value("fn:compare('foo bar', 'foo bar')", 0) self.check_value("fn:compare('', '')", 0) self.check_value("fn:compare('abc', 'abcd')", -1) self.check_value("fn:compare('', ' foo bar')", -1) self.check_value("fn:compare('abcd', 'abc')", 1) self.check_value("fn:compare('foo bar', '')", 1) self.check_value('fn:compare("a","A")', 1) self.check_value('fn:compare("A","a")', -1) self.check_value('fn:compare("+++","++")', 1) self.check_value('fn:compare("1234","123")', 1) self.check_value("fn:count(fn:compare((), ''))", 0) self.check_value("fn:count(fn:compare('abc', ()))", 0) 
self.check_value("compare(xs:anyURI('http://example.com/'), 'http://example.com/')", 0) self.check_value( "compare(xs:untypedAtomic('http://example.com/'), 'http://example.com/')", 0 ) self.check_value('compare("𐀁", "𐀂", ' '"http://www.w3.org/2005/xpath-functions/collation/codepoint")', -1) self.check_value('compare("𐀁", "￰", ' '"http://www.w3.org/2005/xpath-functions/collation/codepoint")', 1) # Issue #17 self.check_value("fn:compare('Strassen', 'Straße')", -1) if platform.system() != 'Linux': return locale.setlocale(locale.LC_COLLATE, 'en_US.UTF-8') self.check_value("fn:compare('Strasse', 'Straße')", -1) self.check_value("fn:compare('Strassen', 'Straße')", 1) try: self.check_value("fn:compare('Strasse', 'Straße', 'it_IT.UTF-8')", -1) self.check_value("fn:compare('Strassen', 'Straße')", 1) except locale.Error: pass # Skip test if 'it_IT.UTF-8' is an unknown locale setting try: self.check_value("fn:compare('Strasse', 'Straße', 'de_DE.UTF-8')", -1) except locale.Error: pass # Skip test if 'de_DE.UTF-8' is an unknown locale setting try: self.check_value("fn:compare('Strasse', 'Straße', 'deutsch')", -1) except locale.Error: pass # Skip test if 'deutsch' is an unknown locale setting with self.assertRaises(locale.Error) as cm: self.check_value("fn:compare('Strasse', 'Straße', 'invalid_collation')") self.assertIn('FOCH0002', str(cm.exception)) self.wrong_type("fn:compare('Strasse', 111)", 'XPTY0004') self.wrong_type('fn:compare("1234", 1234)', 'XPTY0004') finally: locale.setlocale(locale.LC_COLLATE, env_locale_setting) def test_normalize_unicode_function(self): self.check_value('fn:normalize-unicode(())', '') self.check_value('fn:normalize-unicode("menù")', 'menù') self.wrong_type('fn:normalize-unicode(xs:hexBinary("84"))', 'XPTY0004') self.assertRaises(ValueError, self.parser.parse, 'fn:normalize-unicode("à", "FULLY-NORMALIZED")') self.check_value('fn:normalize-unicode("à", "")', 'à') self.wrong_value('fn:normalize-unicode("à", "UNKNOWN")') 
self.wrong_type('fn:normalize-unicode("à", ())', 'XPTY0004', "can't be an empty sequence") # https://www.w3.org/TR/charmod-norm/#normalization_forms self.check_value("fn:normalize-unicode('\u01FA')", '\u01FA') self.check_value("fn:normalize-unicode('\u01FA', 'NFD')", '\u0041\u030A\u0301') self.check_value("fn:normalize-unicode('\u01FA', 'NFKC')", '\u01FA') self.check_value("fn:normalize-unicode('\u01FA', 'NFKD')", '\u0041\u030A\u0301') self.check_value("fn:normalize-unicode('\u00C5\u0301')", '\u01FA') self.check_value("fn:normalize-unicode('\u00C5\u0301', 'NFD')", '\u0041\u030A\u0301') self.check_value("fn:normalize-unicode('\u00C5\u0301', 'NFKC')", '\u01FA') self.check_value("fn:normalize-unicode('\u00C5\u0301', ' nfkd ')", '\u0041\u030A\u0301') self.check_value("fn:normalize-unicode('\u212B\u0301')", '\u01FA') self.check_value("fn:normalize-unicode('\u212B\u0301', 'NFD')", '\u0041\u030A\u0301') self.check_value("fn:normalize-unicode('\u212B\u0301', 'NFKC')", '\u01FA') self.check_value("fn:normalize-unicode('\u212B\u0301', 'NFKD')", '\u0041\u030A\u0301') self.check_value("fn:normalize-unicode('\u0041\u030A\u0301')", '\u01FA') self.check_value("fn:normalize-unicode('\u0041\u030A\u0301', 'NFD')", '\u0041\u030A\u0301') self.check_value("fn:normalize-unicode('\u0041\u030A\u0301', 'NFKC')", '\u01FA') self.check_value("fn:normalize-unicode('\u0041\u030A\u0301', 'NFKD')", '\u0041\u030A\u0301') self.check_value("fn:normalize-unicode('\uFF21\u030A\u0301')", '\uFF21\u030A\u0301') self.check_value("fn:normalize-unicode('\uFF21\u030A\u0301', 'NFD')", '\uFF21\u030A\u0301') self.check_value("fn:normalize-unicode('\uFF21\u030A\u0301', 'NFKC')", '\u01FA') self.check_value("fn:normalize-unicode('\uFF21\u030A\u0301', 'NFKD')", '\u0041\u030A\u0301') def test_count_function(self): self.check_value("fn:count('')", 1) self.check_value("count('')", 1) self.check_value("fn:count('abc')", 1) self.check_value("fn:count(7)", 1) self.check_value("fn:count(())", 0) 
self.check_value("fn:count((1, 2, 3))", 3) self.check_value("fn:count((1, 2, ()))", 2) self.check_value("fn:count((((()))))", 0) self.check_value("fn:count((((), (), ()), (), (), (), ()))", 0) self.check_value('fn:count((1, 2 to ()))', 1) self.check_value("count(('1', (2, ())))", 2) self.check_value("count(('1', (2, '3')))", 3) self.check_value("count(1 to 5)", 5) self.check_value("count(reverse((1, 2, 3, 4)))", 4) root = self.etree.XML('') self.check_selector("count(5)", root, 1) self.check_value("count((0, 1, 2 + 1, 3 - 1))", 4) self.check_value('fn:count((xs:decimal("-999999999999999999")))', 1) self.check_value('fn:count((xs:float("0")))', 1) self.check_value("count(//*[@name='John Doe'])", MissingContextError) context = XPathContext(self.etree.XML('')) self.check_value("count(//*[@name='John Doe'])", 0, context) with self.assertRaises(TypeError) as cm: self.check_value("fn:count()") self.assertIn('XPST0017', str(cm.exception)) with self.assertRaises(TypeError) as cm: self.check_value("fn:count(1, ())") self.assertIn('XPST0017', str(cm.exception)) with self.assertRaises(TypeError) as cm: self.check_value("fn:count(1, 2)") self.assertIn('XPST0017', str(cm.exception)) def test_lower_case_function(self): self.check_value('lower-case("aBcDe01")', 'abcde01') self.check_value('lower-case(("aBcDe01"))', 'abcde01') self.check_value('lower-case(())', '') self.wrong_type('lower-case((10))') root = self.etree.XML(XML_GENERIC_TEST) self.check_selector("a[lower-case(@id) = 'a_id']", root, [root[0]]) self.check_selector("a[lower-case(@id) = 'a_i']", root, []) self.check_selector("//b[lower-case(.) 
= 'some content']", root, [root[0][0]]) self.check_selector("//b[lower-case((.)) = 'some content']", root, [root[0][0]]) self.check_selector("//none[lower-case((.)) = 'some content']", root, []) def test_upper_case_function(self): self.check_value('upper-case("aBcDe01")', 'ABCDE01') self.check_value('upper-case(("aBcDe01"))', 'ABCDE01') self.check_value('upper-case(())', '') self.wrong_type('upper-case((10))', 'XPTY0004') root = self.etree.XML(XML_GENERIC_TEST) self.check_selector("a[upper-case(@id) = 'A_ID']", root, [root[0]]) self.check_selector("a[upper-case(@id) = 'A_I']", root, []) self.check_selector("//b[upper-case(.) = 'SOME CONTENT']", root, [root[0][0]]) self.check_selector("//b[upper-case((.)) = 'SOME CONTENT']", root, [root[0][0]]) self.check_selector("//none[upper-case((.)) = 'SOME CONTENT']", root, []) def test_encode_for_uri_function(self): self.check_value('encode-for-uri("http://xpath.test")', 'http%3A%2F%2Fxpath.test') self.check_value('encode-for-uri("~bébé")', '~b%C3%A9b%C3%A9') self.check_value('encode-for-uri("100% organic")', '100%25%20organic') self.check_value('encode-for-uri("")', '') self.check_value('encode-for-uri(())', '') def test_iri_to_uri_function(self): self.check_value('iri-to-uri("http://www.example.com/00/Weather/CA/Los%20Angeles#ocean")', 'http://www.example.com/00/Weather/CA/Los%20Angeles#ocean') self.check_value('iri-to-uri("http://www.example.com/~bébé")', 'http://www.example.com/~b%C3%A9b%C3%A9') self.check_value('iri-to-uri("")', '') self.check_value('iri-to-uri(())', '') def test_escape_html_uri_function(self): self.check_value( 'escape-html-uri("http://www.example.com/00/Weather/CA/Los Angeles#ocean")', 'http://www.example.com/00/Weather/CA/Los Angeles#ocean' ) self.check_value("escape-html-uri(\"javascript:if (navigator.browserLanguage == 'fr') " "window.open('http://www.example.com/~bébé');\")", "javascript:if (navigator.browserLanguage == 'fr') " "window.open('http://www.example.com/~b%C3%A9b%C3%A9');") 
self.check_value('escape-html-uri("")', '') self.check_value('escape-html-uri(())', '') def test_string_join_function(self): self.check_value("string-join(('Now', 'is', 'the', 'time', '...'), ' ')", "Now is the time ...") self.check_value("string-join(('Blow, ', 'blow, ', 'thou ', 'winter ', 'wind!'), '')", 'Blow, blow, thou winter wind!') self.check_value("string-join((), 'separator')", '') self.check_value("string-join(('a', 'b', 'c'), ', ')", 'a, b, c') self.wrong_type("string-join(('a', 'b', 'c'), 8)", 'XPTY0004') if self.parser.version < '3.1': self.wrong_type("string-join(('a', 4, 'c'), ', ')", 'XPTY0004') else: self.check_value("string-join(('a', 4, 'c'), ', ')", 'a, 4, c') root = self.etree.XML(XML_GENERIC_TEST) self.check_selector("a[string-join((@id, 'foo', 'bar'), ' ') = 'a_id foo bar']", root, [root[0]]) self.check_selector("a[string-join((@id, 'foo'), ',') = 'a_id,foo']", root, [root[0]]) self.check_selector("//b[string-join((., 'bar'), ' ') = 'some content bar']", root, [root[0][0]]) self.check_selector("//b[string-join((., 'bar'), ',') = 'some content,bar']", root, [root[0][0]]) self.check_selector("//b[string-join((., 'bar'), ',') = 'some content bar']", root, []) self.check_selector("//none[string-join((., 'bar'), ',') = 'some content,bar']", root, []) def test_matches_function(self): self.check_value('fn:matches("abracadabra", "bra")', True) self.check_value('fn:matches("abracadabra", "^a.*a$")', True) self.check_value('fn:matches("abracadabra", "^bra")', False) self.wrong_value('fn:matches("abracadabra", "bra", "k")') self.wrong_value('fn:matches("abracadabra", "[bra")') self.wrong_value('fn:matches("abracadabra", "a{1,99999999999999999999999999}")', 'FORX0002') self.check_value('fn:matches("1", "\\S")', True) self.check_value('fn:matches(" ", "\\S")', False) self.check_value('fn:matches("", "\\S")', False) self.check_value('fn:matches("\t", "\\S")', False) self.check_value('fn:matches(" foo bar", "\\S")', True) if 
platform.python_implementation() != 'PyPy' or self.etree is not lxml_etree: poem_context = XPathContext(root=self.etree.XML(XML_POEM_TEST)) self.check_value('fn:matches(., "Kaum.*krähen")', False, context=poem_context) self.check_value('fn:matches(., "Kaum.*krähen", "s")', True, context=poem_context) self.check_value('fn:matches(., "^Kaum.*gesehen,$", "m")', True, context=poem_context) self.check_value('fn:matches(., "^Kaum.*gesehen,$")', False, context=poem_context) self.check_value('fn:matches(., "kiki", "i")', True, context=poem_context) root = self.etree.XML(XML_GENERIC_TEST) self.check_selector("a[matches(@id, '^a_id$')]", root, [root[0]]) self.check_selector("a[matches(@id, 'a.id')]", root, [root[0]]) self.check_selector("a[matches(@id, '_id')]", root, [root[0]]) self.check_selector("a[matches(@id, 'a!')]", root, []) self.check_selector("//b[matches(., '^some.content$')]", root, [root[0][0]]) self.check_selector("//b[matches(., '^content')]", root, []) self.check_selector("//none[matches(., '.*')]", root, []) def test_ends_with_function(self): self.check_value('fn:ends-with("abracadabra", "bra")', True) self.check_value('fn:ends-with("abracadabra", "a")', True) self.check_value('fn:ends-with("abracadabra", "cbra")', False) root = self.etree.XML(XML_GENERIC_TEST) self.check_selector("a[ends-with(@id, 'a_id')]", root, [root[0]]) self.check_selector("a[ends-with(@id, 'id')]", root, [root[0]]) self.check_selector("a[ends-with(@id, 'a!')]", root, []) self.check_selector("//b[ends-with(., 'some content')]", root, [root[0][0]]) self.check_selector("//b[ends-with(., 't')]", root, [root[0][0]]) self.check_selector("//none[ends-with(., 's')]", root, []) self.check_value('fn:ends-with ( "tattoo", "tattoo", "http://www.w3.org/' '2005/xpath-functions/collation/codepoint")', True) self.check_value('fn:ends-with ( "tattoo", "atto", "http://www.w3.org/' '2005/xpath-functions/collation/codepoint")', False) self.check_value("ends-with((), ())", True) def 
test_replace_function(self): self.check_value('fn:replace("abracadabra", "bra", "*")', "a*cada*") self.check_value('fn:replace("abracadabra", "a.*a", "*")', "*") self.check_value('fn:replace("abracadabra", "a.*?a", "*")', "*c*bra") self.check_value('fn:replace("abracadabra", "a", "")', "brcdbr") self.check_value('fn:replace("abracadabra", "a", "", "i")', "brcdbr") self.wrong_value('fn:replace("abracadabra", "a", "", "z")') self.wrong_value('fn:replace("abracadabra", "[a", "")') self.wrong_type('fn:replace("abracadabra")') self.check_value('fn:replace("abracadabra", "a(.)", "a$1$1")', "abbraccaddabbra") self.wrong_value('replace("abc", "a(.)", "$x")', 'FORX0004', 'Invalid replacement string') self.wrong_value('fn:replace("abracadabra", ".*?", "$1")') self.check_value('fn:replace("AAAA", "A+", "b")', "b") self.check_value('fn:replace("AAAA", "A+?", "b")', "bbbb") self.check_value('fn:replace("darted", "^(.*?)d(.*)$", "$1c$2")', "carted") self.check_value('fn:replace("abcd", "(ab)|(a)", "[1=$1][2=$2]")', "[1=ab][2=]cd") root = self.etree.XML(XML_GENERIC_TEST) self.check_selector("a[replace(@id, '^a_id$', 'foo') = 'foo']", root, [root[0]]) self.check_selector("a[replace(@id, 'a.id', 'foo') = 'foo']", root, [root[0]]) self.check_selector("a[replace(@id, '_id', 'no') = 'ano']", root, [root[0]]) self.check_selector("//b[replace(., '^some.content$', 'new') = 'new']", root, [root[0][0]]) self.check_selector("//b[replace(., '^content', '') = '']", root, []) self.check_selector("//none[replace(., '.*', 'a') = 'a']", root, []) def test_tokenize_function(self): self.check_value('fn:tokenize("abracadabra", "(ab)|(a)")', ['', 'r', 'c', 'd', 'r', '']) self.check_value(r'fn:tokenize("The cat sat on the mat", "\s+")', ['The', 'cat', 'sat', 'on', 'the', 'mat']) self.check_value(r'fn:tokenize("1, 15, 24, 50", ",\s*")', ['1', '15', '24', '50']) self.check_value('fn:tokenize("1,15,,24,50,", ",")', ['1', '15', '', '24', '50', '']) self.check_value(r'fn:tokenize("Some unparsed <br> HTML <BR> text", "\s*<br>\s*", "i")', ['Some unparsed', 'HTML', 'text']) self.check_value('fn:tokenize("", "(ab)|(a)")', []) self.wrong_value('fn:tokenize("abc", "[a")', 'FORX0002', 'Invalid regular expression') self.wrong_value('fn:tokenize("abc", ".*?")', 'FORX0003', 'matches zero-length string') self.wrong_value('fn:tokenize("abba", ".?")') self.wrong_value('fn:tokenize("abracadabra", "(ab)|(a)", "sxf")') self.wrong_type('fn:tokenize("abracadabra", ())') self.wrong_type('fn:tokenize("abracadabra", "(ab)|(a)", ())') def test_resolve_uri_function(self): self.check_value('fn:resolve-uri("dir1/dir2", "file:///home/")', 'file:///home/dir1/dir2') self.wrong_value('fn:resolve-uri("dir1/dir2", "home/")', '') self.wrong_value('fn:resolve-uri("dir1/dir2")') self.check_value('fn:resolve-uri((), "http://xpath.test")') self.wrong_value('fn:resolve-uri("file:://file1.txt", "http://xpath.test")', 'FORG0002', "'file:://file1.txt' is not a valid URI") self.wrong_value('fn:resolve-uri("dir1/dir2", "http:://xpath.test")', 'FORG0002', "'http:://xpath.test' is not a valid URI") self.parser.base_uri = 'http://www.example.com/ns/' try: self.check_value('fn:resolve-uri("dir1/dir2")', 'http://www.example.com/ns/dir1/dir2') self.check_value('fn:resolve-uri("/dir1/dir2")', '/dir1/dir2') self.check_value('fn:resolve-uri("file:text.txt")', 'file:text.txt') self.check_value('fn:resolve-uri(())') self.wrong_value('fn:resolve-uri("http:://xpath.test")', 'FORG0002', "'http:://xpath.test' is not a valid URI") finally: self.parser.base_uri = None def test_empty_function(self): # Test cases from https://www.w3.org/TR/xquery-operators/#general-seq-funcs self.check_value('fn:empty(("hello", "world"))', False) self.check_value('fn:empty(fn:remove(("hello", "world"), 1))', False) self.check_value('fn:empty(())', True) self.check_value("empty(() * ())", True) self.check_value('fn:empty(fn:remove(("hello"), 1))', True) self.check_value('fn:empty((xs:double("0")))', False) def test_exists_function(self):
self.check_value('fn:exists(("hello", "world"))', True) self.check_value('fn:exists(())', False) self.check_value('fn:exists(fn:remove(("hello"), 1))', False) self.check_value('fn:exists((xs:int("-1873914410")))', True) def test_distinct_values_function(self): self.check_value('fn:distinct-values((1, 2.0, 3, 2))', [1, 2.0, 3]) context = XPathContext( root=self.etree.XML(''), variables={ 'x': [UntypedAtomic("foo"), UntypedAtomic("bar"), UntypedAtomic("bar")] } ) self.check_value('fn:distinct-values($x)', ['foo', 'bar'], context) context = XPathContext( root=self.etree.XML(''), variables={'x': [UntypedAtomic("foo"), float('nan'), UntypedAtomic("bar")]} ) token = self.parser.parse('fn:distinct-values($x)') results = token.evaluate(context) self.assertEqual(results[0], 'foo') self.assertTrue(math.isnan(results[1])) self.assertEqual(results[2], 'bar') root = self.etree.XML('') self.check_selector( "fn:distinct-values((xs:float('NaN'), xs:double('NaN'), xs:float('NaN')))", root, math.isnan ) self.check_value('fn:distinct-values((xs:float("0"), xs:float("0")))', [0.0]) self.check_value( 'fn:distinct-values("foo", "{}")'.format(UNICODE_CODEPOINT_COLLATION), ['foo'] ) def test_index_of_function(self): self.check_value('fn:index-of ((10, 20, 30, 40), 35)', []) self.wrong_type('fn:index-of ((10, 20, 30, 40), ())', 'XPTY0004') self.check_value('fn:index-of ((10, 20, 30, 30, 20, 10), 20)', [2, 5]) self.check_value('fn:index-of (("a", "sport", "and", "a", "pastime"), "a")', [1, 4]) self.check_value( 'fn:index-of (("foo", "bar"), "bar", "{}")'.format(UNICODE_CODEPOINT_COLLATION), [2] ) # Issue #28 root = self.etree.XML(""" 030 """) test1 = "/root/descript[index-of(('030','031'), '030')]" test2 = "/root/descript[ancestor::root/incode = '030']" test3 = "/root/descript[index-of(('030','031'), ancestor::root/incode)]" self.check_selector(test1, root, [root[1]]) self.check_selector(test2, root, [root[1]]) self.check_selector(test3, root, [root[1]]) def 
test_insert_before_function(self): context = XPathContext(root=self.etree.XML(''), variables={'x': ['a', 'b', 'c']}) self.check_value('fn:insert-before($x, 0, "z")', ['z', 'a', 'b', 'c'], context) self.check_value('fn:insert-before($x, 1, "z")', ['z', 'a', 'b', 'c'], context) self.check_value('fn:insert-before($x, 2, "z")', ['a', 'z', 'b', 'c'], context) self.check_value('fn:insert-before($x, 3, "z")', ['a', 'b', 'z', 'c'], context) self.check_value('fn:insert-before($x, 4, "z")', ['a', 'b', 'c', 'z'], context) self.wrong_type('fn:insert-before($x, "1", "z")', 'XPTY0004', context=context) def test_remove_function(self): context = XPathContext(root=self.etree.XML(''), variables={'x': ['a', 'b', 'c']}) self.check_value('fn:remove($x, 0)', ['a', 'b', 'c'], context) self.check_value('fn:remove($x, 1)', ['b', 'c'], context) self.check_value('remove($x, 6)', ['a', 'b', 'c'], context) self.wrong_type('remove($x, "6")', 'XPTY0004', context=context) self.check_value('fn:remove((), 3)', []) def test_reverse_function(self): context = XPathContext(root=self.etree.XML(''), variables={'x': ['a', 'b', 'c']}) self.check_value('reverse($x)', ['c', 'b', 'a'], context) self.check_value('fn:reverse(("hello"))', ['hello'], context) self.check_value('fn:reverse(())', []) def test_subsequence_function(self): self.check_value('fn:subsequence((), 5)', []) self.check_value('fn:subsequence((1, 2, 3, 4, 5, 6, 7), 1)', [1, 2, 3, 4, 5, 6, 7]) self.check_value('fn:subsequence((1, 2, 3, 4, 5, 6, 7), 0)', [1, 2, 3, 4, 5, 6, 7]) self.check_value('fn:subsequence((1, 2, 3, 4, 5, 6, 7), -1)', [1, 2, 3, 4, 5, 6, 7]) self.check_value('fn:subsequence((1, 2, 3, 4, 5, 6, 7), 10)', []) self.check_value('fn:subsequence((1, 2, 3, 4, 5, 6, 7), 4)', [4, 5, 6, 7]) self.check_value('fn:subsequence((1, 2, 3, 4, 5, 6, 7), 4, 2)', [4, 5]) self.check_value('fn:subsequence((1, 2, 3, 4, 5, 6, 7), 3, 10)', [3, 4, 5, 6, 7]) self.check_value('fn:subsequence((1, 2, 3, 4, 5, 6, 7), xs:float("INF"))', []) 
self.check_value('fn:subsequence((1, 2, 3, 4, 5, 6, 7), xs:float("-INF"))', [1, 2, 3, 4, 5, 6, 7]) self.check_value('fn:subsequence((1, 2, 3, 4, 5, 6, 7), 5, xs:float("-INF"))', []) self.check_value('fn:subsequence((1, 2, 3, 4, 5, 6, 7), 5, xs:float("INF"))', [5, 6, 7]) def test_unordered_function(self): self.check_value('fn:unordered(())', []) self.check_value('fn:unordered(("z", 2, "3", "Z", "b", "a"))', [2, '3', 'Z', 'a', 'b', 'z']) def test_sequence_cardinality_functions(self): self.check_value('fn:zero-or-one(())', []) self.check_value('fn:zero-or-one((10))', [10]) self.wrong_value('fn:zero-or-one((10, 20))') self.wrong_value('fn:one-or-more(())') self.check_value('fn:one-or-more((10))', [10]) self.check_value('fn:one-or-more((10, 20, 30, 40))', [10, 20, 30, 40]) self.check_value('fn:exactly-one((20))', [20]) self.wrong_value('fn:exactly-one(())') self.wrong_value('fn:exactly-one((10, 20, 30, 40))') def test_qname_function(self): self.check_value('fn:string(fn:QName("", "person"))', 'person') self.check_value('fn:string(fn:QName((), "person"))', 'person') self.check_value('fn:string(fn:QName("http://www.example.com/ns/", "person"))', 'person') self.check_value('fn:string(fn:QName("http://www.example.com/ns/", "ht:person"))', 'ht:person') self.check_value('fn:string(fn:QName("http://www.example.com/ns/", "xs:person"))', 'xs:person') self.wrong_value('fn:QName("http://www.example.com/ns/", "@person")') self.wrong_type('fn:QName(1.0, "person")', 'XPTY0004', '1st argument has an invalid type') self.wrong_type('fn:QName("", 2)', 'XPTY0004', '2nd argument has an invalid type') self.wrong_value('fn:QName("", "3")', 'FOCA0002', 'invalid value') self.wrong_value('fn:QName("", "xs:int")', 'FOCA0002', 'cannot associate a non-empty prefix with no namespace') self.wrong_type('fn:QName("http://www.example.com/ns/")', 'XPST0017', '2nd argument missing') self.wrong_type('fn:QName("http://www.example.com/ns/", "person"', 'XPST0017', 'Wrong number of arguments') if xmlschema is 
not None: schema = xmlschema.XMLSchema(""" """) with self.schema_bound_parser(schema.xpath_proxy): context = self.parser.schema.get_context() self.check_value('fn:QName("http://www.example.com/ns/", "@person")', expected=ValueError, context=context) def test_prefix_from_qname_function(self): self.check_value( 'fn:prefix-from-QName(fn:QName("http://www.example.com/ns/", "ht:person"))', 'ht' ) self.check_value( 'fn:prefix-from-QName(fn:QName("http://www.example.com/ns/", "person"))', [] ) self.check_value('fn:prefix-from-QName(())', []) self.check_value('fn:prefix-from-QName(7)', TypeError) self.check_value('fn:prefix-from-QName("7")', TypeError) def test_local_name_from_qname_function(self): self.check_value( 'fn:local-name-from-QName(fn:QName("http://www.example.com/ns/", "person"))', 'person' ) self.check_value('fn:local-name-from-QName(())') self.check_value('fn:local-name-from-QName(8)', TypeError) self.check_value('fn:local-name-from-QName("8")', TypeError) def test_namespace_uri_from_qname_function(self): root = self.etree.XML('' ' ' ' ' '') self.check_value( 'fn:namespace-uri-from-QName(fn:QName("http://www.example.com/ns/", "person"))', 'http://www.example.com/ns/' ) self.check_value('fn:namespace-uri-from-QName(())') self.check_value('fn:namespace-uri-from-QName(1)', TypeError) self.check_value('fn:namespace-uri-from-QName("1")', TypeError) self.check_selector("fn:namespace-uri-from-QName(xs:QName('p3:C3'))", root, KeyError) self.check_selector("fn:namespace-uri-from-QName(xs:QName('p3:C3'))", root, ValueError, namespaces={'p3': ''}) def test_resolve_qname_function(self): root = self.etree.XML('' ' ' ' ' '') context = XPathContext(root=root, namespaces=self.namespaces) self.check_value("fn:resolve-QName((), .)", context=context) if self.etree is lxml_etree: self.check_value("fn:string(fn:resolve-QName('eg:C2', .))", KeyError, context=context) self.check_selector("fn:resolve-QName('p3:C3', .)", root, KeyError, namespaces={'p3': ''}) else: 
self.check_value("fn:string(fn:resolve-QName('eg:C2', .))", 'eg:C2', context=context) self.check_selector("fn:resolve-QName('p3:C3', .)", root, ValueError, namespaces={'p3': ''}) self.check_raise("fn:resolve-QName('p3:C3', .)", KeyError, 'FONS0004', "no namespace found for prefix 'p3'", context=context) self.check_value("fn:resolve-QName('C3', .)", QName('', 'C3'), context=context) self.check_value("fn:resolve-QName(2, .)", TypeError, context=context) self.check_value("fn:resolve-QName('2', .)", ValueError, context=context) self.check_value("fn:resolve-QName((), 4)", context=context) self.wrong_type("fn:resolve-QName('p3:C3', 4)", 'FORG0006', '2nd argument 4 is not an element node', context=context) root = self.etree.XML('') self.check_selector("fn:resolve-QName('C3', .)", root, [QName('', 'C3')], namespaces={'': ''}) self.check_selector("fn:resolve-QName('xml:lang', .)", root, [QName(XML_NAMESPACE, 'lang')]) def test_namespace_uri_for_prefix_function(self): root = self.etree.XML('' ' ' ' ' '') context = XPathContext(root=root) self.check_value("fn:namespace-uri-for-prefix('p1', .)", context=context) self.check_value("fn:namespace-uri-for-prefix(4, .)", TypeError, context=context) self.check_value("fn:namespace-uri-for-prefix('p1', 9)", TypeError, context=context) self.check_value("fn:namespace-uri-for-prefix('eg', .)", 'http://www.example.com/ns/', context=context) self.check_selector("fn:namespace-uri-for-prefix('p3', .)", root, NameError, namespaces={'p3': ''}) # Note: default namespace for XPath 2 tests is 'http://www.example.com/ns/' self.check_value("fn:namespace-uri-for-prefix('', .)", context=context) self.check_value( 'fn:namespace-uri-from-QName(fn:QName("http://www.example.com/ns/", "person"))', 'http://www.example.com/ns/' ) self.check_value("fn:namespace-uri-for-prefix('', .)", context=context) self.check_value("fn:namespace-uri-for-prefix((), .)", context=context) def test_in_scope_prefixes_function(self): root = self.etree.XML('' ' ' ' ' '') 
namespaces = {'p0': 'ns0', 'p2': 'ns2'} prefixes = select(root, "fn:in-scope-prefixes(.)", namespaces, parser=type(self.parser)) if self.etree is lxml_etree: self.assertIn('p0', prefixes) self.assertIn('p1', prefixes) self.assertNotIn('p2', prefixes) else: self.assertIn('p0', prefixes) self.assertNotIn('p1', prefixes) self.assertIn('p2', prefixes) # Provides namespaces through the dynamic context selector = Selector("fn:in-scope-prefixes(.)", parser=type(self.parser)) prefixes = selector.select(root, namespaces=namespaces) self.assertIn('p0', prefixes) self.assertNotIn('p1', prefixes) self.assertIn('p2', prefixes) with self.assertRaises(TypeError): select(root, "fn:in-scope-prefixes('')", namespaces, parser=type(self.parser)) root = self.etree.XML(''.format(XML_NAMESPACE)) namespaces = {'tns': 'ns1', 'xml': XML_NAMESPACE} prefixes = select(root, "fn:in-scope-prefixes(.)", namespaces, parser=type(self.parser)) if self.etree is lxml_etree: self.assertIn('tns', prefixes) self.assertIn('xml', prefixes) self.assertNotIn('fn', prefixes) else: self.assertIn('tns', prefixes) self.assertIn('xml', prefixes) self.assertIn('fn', prefixes) if xmlschema is not None: schema = xmlschema.XMLSchema(""" """) with self.schema_bound_parser(schema.xpath_proxy): context = self.parser.schema.get_context() prefixes = {'xml', 'xs', 'fn', 'err', 'xsi', 'eg', 'tst'} if self.parser.version >= '3.0': prefixes.add('math') if self.parser.version >= '3.1': prefixes.add('map') prefixes.add('array') self.check_value("fn:in-scope-prefixes(.)", prefixes, context) def test_datetime_function(self): tz = Timezone(datetime.timedelta(hours=5, minutes=24)) self.check_value('fn:dateTime((), xs:time("24:00:00"))', []) self.check_value('fn:dateTime(xs:date("1999-12-31"), ())', []) self.check_value('fn:dateTime(xs:date("1999-12-31"), xs:time("12:00:00"))', datetime.datetime(1999, 12, 31, 12, 0)) self.check_value('fn:dateTime(xs:date("1999-12-31"), xs:time("24:00:00"))', datetime.datetime(1999, 12, 31, 0, 0)) 
self.check_value('fn:dateTime(xs:date("1999-12-31"), xs:time("13:00:00+05:24"))', datetime.datetime(1999, 12, 31, 13, 0, tzinfo=tz)) self.wrong_value('fn:dateTime(xs:date("1999-12-31+03:00"), xs:time("13:00:00+05:24"))', 'FORG0008', 'inconsistent timezones') self.check_value('fn:dateTime(xs:date("1999-12-31"), xs:time("12:00:00"))', DateTime10) with self.assertRaises(AssertionError): self.check_value('fn:dateTime(xs:date("1999-12-31"), xs:time("12:00:00"))', DateTime) self.parser._xsd_version = '1.1' try: self.check_value('fn:dateTime(xs:date("1999-12-31"), xs:time("12:00:00"))', DateTime(1999, 12, 31, 12)) self.check_value('fn:dateTime(xs:date("1999-12-31"), xs:time("12:00:00"))', DateTime) finally: self.parser._xsd_version = '1.0' def test_year_from_datetime_function(self): self.check_value('fn:year-from-dateTime(xs:dateTime("1999-05-31T13:20:00-05:00"))', 1999) self.check_value('fn:year-from-dateTime(xs:dateTime("1999-05-31T21:30:00-05:00"))', 1999) self.check_value('fn:year-from-dateTime(xs:dateTime("1999-12-31T19:20:00"))', 1999) self.check_value('fn:year-from-dateTime(xs:dateTime("1999-12-31T24:00:00"))', 2000) self.check_value('fn:year-from-dateTime(())') def test_month_from_datetime_function(self): self.check_value('fn:month-from-dateTime(xs:dateTime("1999-05-31T13:20:00-05:00"))', 5) self.check_value('fn:month-from-dateTime(xs:dateTime("1999-12-31T19:20:00-05:00"))', 12) self.check_value('fn:month-from-dateTime(fn:adjust-dateTime-to-timezone(xs:dateTime(' '"1999-12-31T19:20:00-05:00"), xs:dayTimeDuration("PT0S")))', 1) def test_day_from_datetime_function(self): self.check_value('fn:day-from-dateTime(xs:dateTime("1999-05-31T13:20:00-05:00"))', 31) self.check_value('fn:day-from-dateTime(xs:dateTime("1999-12-31T20:00:00-05:00"))', 31) self.check_value('fn:day-from-dateTime(fn:adjust-dateTime-to-timezone(xs:dateTime(' '"1999-12-31T19:20:00-05:00"), xs:dayTimeDuration("PT0S")))', 1) def test_hours_from_datetime_function(self): 
self.check_value('fn:hours-from-dateTime(xs:dateTime("1999-05-31T08:20:00-05:00")) ', 8) self.check_value('fn:hours-from-dateTime(xs:dateTime("1999-12-31T21:20:00-05:00"))', 21) self.check_value('fn:hours-from-dateTime(fn:adjust-dateTime-to-timezone(xs:dateTime(' '"1999-12-31T21:20:00-05:00"), xs:dayTimeDuration("PT0S")))', 2) self.check_value('fn:hours-from-dateTime(xs:dateTime("1999-12-31T12:00:00")) ', 12) self.check_value('fn:hours-from-dateTime(xs:dateTime("1999-12-31T24:00:00"))', 0) def test_minutes_from_datetime_function(self): self.check_value('fn:minutes-from-dateTime(xs:dateTime("1999-05-31T13:20:00-05:00"))', 20) self.check_value('fn:minutes-from-dateTime(xs:dateTime("1999-05-31T13:30:00+05:30"))', 30) def test_seconds_from_datetime_function(self): self.check_value('fn:seconds-from-dateTime(xs:dateTime("1999-05-31T13:20:00-05:00"))', 0) self.check_value('seconds-from-dateTime(xs:dateTime("2001-02-03T08:23:12.43"))', Decimal('12.43')) def test_timezone_from_datetime_function(self): self.check_value('fn:timezone-from-dateTime(xs:dateTime("1999-05-31T13:20:00-05:00"))', DayTimeDuration(seconds=-18000)) self.check_value('fn:timezone-from-dateTime(())') def test_year_from_date_function(self): self.check_value('fn:year-from-date(xs:date("1999-05-31"))', 1999) self.check_value('fn:year-from-date(xs:date("2000-01-01+05:00"))', 2000) self.check_value('year-from-date(())') def test_month_from_date_function(self): self.check_value('fn:month-from-date(xs:date("1999-05-31-05:00"))', 5) self.check_value('fn:month-from-date(xs:date("2000-01-01+05:00"))', 1) def test_day_from_date_function(self): self.check_value('fn:day-from-date(xs:date("1999-05-31-05:00"))', 31) self.check_value('fn:day-from-date(xs:date("2000-01-01+05:00"))', 1) def test_timezone_from_date_function(self): self.check_value('fn:timezone-from-date(xs:date("1999-05-31-05:00"))', DayTimeDuration.fromstring('-PT5H')) self.check_value('fn:timezone-from-date(xs:date("2000-06-12Z"))', 
DayTimeDuration.fromstring('PT0H')) self.check_value('fn:timezone-from-date(xs:date("2000-06-12"))') def test_hours_from_time_function(self): self.check_value('fn:hours-from-time(xs:time("11:23:00"))', 11) self.check_value('fn:hours-from-time(xs:time("21:23:00"))', 21) self.check_value('fn:hours-from-time(xs:time("01:23:00+05:00"))', 1) self.check_value('fn:hours-from-time(fn:adjust-time-to-timezone(xs:time("01:23:00+05:00"), ' 'xs:dayTimeDuration("PT0S")))', 20) self.check_value('fn:hours-from-time(xs:time("24:00:00"))', 0) def test_minutes_from_time_function(self): self.check_value('fn:minutes-from-time(xs:time("13:00:00Z"))', 0) self.check_value('fn:minutes-from-time(xs:time("09:45:10"))', 45) def test_seconds_from_time_function(self): self.check_value('fn:seconds-from-time(xs:time("13:20:10.5"))', 10.5) self.check_value('fn:seconds-from-time(xs:time("20:50:10.0"))', 10.0) self.check_value('fn:seconds-from-time(xs:time("03:59:59.000001"))', Decimal('59.000001')) def test_timezone_from_time_function(self): self.check_value('fn:timezone-from-time(xs:time("13:20:00-05:00"))', DayTimeDuration.fromstring('-PT5H')) self.check_value('timezone-from-time(())') def test_years_from_duration_function(self): self.check_value('fn:years-from-duration(())') self.check_value('fn:years-from-duration(xs:yearMonthDuration("P20Y15M"))', 21) self.check_value('fn:years-from-duration(xs:yearMonthDuration("-P15M"))', -1) self.check_value('fn:years-from-duration(xs:dayTimeDuration("-P2DT15H"))', 0) def test_months_from_duration_function(self): self.check_value('fn:months-from-duration(())') self.check_value('fn:months-from-duration(xs:yearMonthDuration("P20Y15M"))', 3) self.check_value('fn:months-from-duration(xs:yearMonthDuration("-P20Y18M"))', -6) self.check_value('fn:months-from-duration(xs:dayTimeDuration("-P2DT15H0M0S"))', 0) def test_days_from_duration_function(self): self.check_value('fn:days-from-duration(())') 
self.check_value('fn:days-from-duration(xs:dayTimeDuration("P3DT10H"))', 3) self.check_value('fn:days-from-duration(xs:dayTimeDuration("P3DT55H"))', 5) self.check_value('fn:days-from-duration(xs:yearMonthDuration("P3Y5M"))', 0) def test_hours_from_duration_function(self): self.check_value('fn:hours-from-duration(())') self.check_value('fn:hours-from-duration(xs:dayTimeDuration("P3DT10H"))', 10) self.check_value('fn:hours-from-duration(xs:dayTimeDuration("P3DT12H32M12S"))', 12) self.check_value('fn:hours-from-duration(xs:dayTimeDuration("PT123H"))', 3) self.check_value('fn:hours-from-duration(xs:dayTimeDuration("-P3DT10H"))', -10) def test_minutes_from_duration_function(self): self.check_value('fn:minutes-from-duration(())') self.check_value('fn:minutes-from-duration(xs:dayTimeDuration("P3DT10H"))', 0) self.check_value('fn:minutes-from-duration(xs:dayTimeDuration("-P5DT12H30M"))', -30) def test_seconds_from_duration_function(self): self.check_value('fn:seconds-from-duration(())') self.check_value('fn:seconds-from-duration(xs:dayTimeDuration("P3DT10H12.5S"))', 12.5) self.check_value('fn:seconds-from-duration(xs:dayTimeDuration("-PT256S"))', -16.0) def test_node_accessor_functions(self): root = self.etree.XML('' 'simple text' % XSI_NAMESPACE) self.check_selector("node-name(.)", root, QName('', 'A')) self.check_selector("node-name(/A/B1)", root, QName('', 'B1')) self.check_selector("node-name(/A/*)", root, TypeError) # Not allowed more than one item! 
self.check_selector("nilled(./B1/C1)", root, False) self.check_selector("nilled(./B1/C2)", root, True) self.check_raise("nilled(.)", MissingContextError) context = XPathContext(root) self.check_value('nilled(())', context=context) self.wrong_type('nilled(8)', 'XPTY0004', 'an XPath node required', context=context) self.check_value('node-name(())', context=context) self.wrong_type('node-name(8)', 'XPTY0004', 'an XPath node required', context=context) self.check_value('node-name(.)', context=XPathContext(self.etree.ElementTree(root))) root = self.etree.XML('') self.check_value('node-name(.)', QName('http://xpath.test/ns', 'root'), context=XPathContext(root)) self.check_value('node-name(./@tst:a)', QName('http://xpath.test/ns', 'a'), context=XPathContext(root)) root = self.etree.XML('') self.check_value('node-name(./@a)', QName('', 'a'), context=XPathContext(root)) root = self.etree.XML('') self.check_raise('node-name(.)', KeyError, 'FONS0004', 'no prefix found for namespace http://xpath.test/ns0', context=XPathContext(root)) def test_string_and_data_functions(self): root = self.etree.XML(' a text, an inner text, a tail, ' 'an ending text ') self.check_selector("/*/string()", root, [' a text, an inner text, a tail, an ending text ']) self.check_selector("string(.)", root, ' a text, an inner text, a tail, an ending text ') self.check_selector("data(.)", root, ' a text, an inner text, a tail, an ending text ') self.check_selector("data(.)", root, UntypedAtomic) self.check_selector("data(())", root, []) self.check_value("string()", MissingContextError) context = XPathContext(root=self.etree.XML('')) parser = XPath2Parser(base_uri='http://www.example.com/ns/') self.assertEqual(parser.parse('data(fn:resolve-uri(()))').evaluate(context), []) @unittest.skipIf(xmlschema is None, "The xmlschema library is not installed") def test_data_function_with_typed_nodes(self): schema = xmlschema.XMLSchema(dedent("""\ """)) self.parser.schema = xmlschema.xpath.XMLSchemaProxy(schema) try: 
root = self.etree.XML('') self.wrong_value("data(/root)", 'FOTY0012', 'argument node', 'does not have a typed value', context=XPathContext(root)) self.wrong_value("data(.)", 'FOTY0012', 'argument node', 'does not have a typed value', context=XPathContext(root)) finally: self.parser.schema = None def test_node_set_id_function(self): root = self.etree.XML('') self.check_selector('element-with-id("foo")', root, [root[0]]) self.check_selector('id("foo")', root, [root[0]]) doc = self.etree.ElementTree(root) root = doc.getroot() self.check_selector('id("foo")', doc, [root[0]]) self.check_selector('id("fox")', doc, []) self.check_selector('id("foo baz")', doc, [root[0], root[3]]) self.check_selector('id(("foo", "baz"))', doc, [root[0], root[3]]) self.check_selector('id(("foo", "baz bar"))', doc, [root[0], root[2], root[3]]) self.check_selector('id("baz bar foo")', doc, [root[0], root[2], root[3]]) # From XPath documentation doc = self.etree.parse(io.StringIO(""" E21256 John Brown """)) root = doc.getroot() self.check_selector("id('ID21256')", doc, [root]) self.check_selector("id('E21256')", doc, [root[0]]) self.check_selector('element-with-id("ID21256")', doc, [root]) self.check_selector('element-with-id("E21256")', doc, [root]) with self.assertRaises(MissingContextError) as err: self.check_value("id('ID21256')") self.assertIn('XPDY0002', str(err.exception)) context = XPathContext(doc, variables={'x': 11}) with self.assertRaises(TypeError) as err: self.check_value("id('ID21256', $x)", context=context) self.assertIn('XPTY0004', str(err.exception)) context = XPathContext(doc, item=11, variables={'x': 11}) with self.assertRaises(TypeError) as err: self.check_value("id('ID21256', $x)", context=context) self.assertIn('XPTY0004', str(err.exception)) context = XPathContext(doc, item=root, variables={'x': root}) self.check_value("id('ID21256', $x)", [context.root.getroot()], context=context) # Id on root element root = self.etree.XML("E21256") self.check_selector("id('E21256')", 
root, [root]) self.check_selector('element-with-id("E21256")', root, []) @unittest.skipIf(xmlschema is None, "xmlschema library is not installed ...") def test_node_set_id_function_with_schema(self): root = self.etree.XML(dedent("""\ E21256 John Brown """)) doc = self.etree.ElementTree(root) # Test with matching value of type xs:ID schema = xmlschema.XMLSchema(dedent("""\ """)) self.assertTrue(schema.is_valid(root)) with self.schema_bound_parser(schema.xpath_proxy): context = XPathContext(doc) self.check_select("id('ID21256')", [context.root.getroot()], context) # self.check_select("id('E21256')", [root[0]], context) # Test with matching value of type xs:string schema = xmlschema.XMLSchema(dedent("""\ """)) self.assertTrue(schema.is_valid(root)) with self.schema_bound_parser(schema.xpath_proxy): context = XPathContext(doc) self.check_select("id('E21256')", [], context) @unittest.skipIf(xmlschema is None, "xmlschema library is not installed ...") def test_node_set_id_function_with_wrong_schema(self): root = self.etree.XML(dedent("""\ E21256 John Brown """)) doc = self.etree.ElementTree(root) schema = xmlschema.XMLSchema(dedent("""\ """)) self.assertFalse(schema.is_valid(root)) with self.schema_bound_parser(schema.xpath_proxy): context = XPathContext(doc) self.check_select("id('ID21256')", [context.root.getroot()], context) self.check_select("id('E21256')", [], context) schema = xmlschema.XMLSchema(dedent("""\ """)) self.assertFalse(schema.is_valid(root)) with self.schema_bound_parser(schema.xpath_proxy): context = XPathContext(doc) self.check_select("id('ID21256')", [context.root.getroot()], context) self.check_select("id('E21256')", [], context) def test_node_set_idref_function(self): doc = self.etree.parse(io.StringIO(""" E21256 John Brown E21257 John Doe """)) root = doc.getroot() self.check_value("idref('ID21256')", MissingContextError) self.check_selector("idref('ID21256')", doc, []) self.check_selector("idref('E21256')", doc, [root[0][0]]) 
self.check_selector("idref('ID21256')", root, []) context = XPathContext(doc, variables={'x': 11}) self.wrong_type("idref('ID21256', $x)", 'XPTY0004', context=context) context = XPathContext(doc, item=root, variables={'x': root}) self.check_value("idref('ID21256', $x)", [], context=context) context = XPathContext(doc, item=root) context.variables = { 'x': AttributeNode(XML_ID, 'ID21256', parent=context.root[0]) } self.check_value("idref('ID21256', $x)", [], context=context) context = XPathContext(root, variables={'x': None}) context.item = None self.check_value("idref('ID21256', $x)", [], context=context) def test_deep_equal_function(self): root = self.etree.XML(""" """) context = XPathContext(root, variables={'xt': root}) self.check_value('fn:deep-equal($xt, $xt)', True, context=context) self.check_value('deep-equal($xt, $xt/*)', False, context=context) self.check_value('deep-equal($xt/name[1], $xt/name[2])', False, context=context) self.check_value('deep-equal($xt/name[1], $xt/name[3])', True, context=context) self.check_value('deep-equal($xt/name[1], $xt/name[3]/@last)', False, context=context) self.check_value('deep-equal($xt/name[1]/@last, $xt/name[3]/@last)', True, context=context) self.check_value('deep-equal($xt/name[1]/@last, $xt/name[2]/@last)', False, context=context) self.check_value('deep-equal($xt/name[1], "Peter Parker")', False, context=context) root = self.etree.XML("""""") context = XPathContext(root, variables={'xt': root}) self.check_value('deep-equal($xt, $xt)', True, context=context) self.check_value('deep-equal((1, 2, 3), (1, 2, 3))', True) self.check_value('deep-equal((1, 2, 3), (1, (), 3))', False) self.check_value('deep-equal((true(), 2, 3), (1, 2, 3))', False) self.check_value('deep-equal((true(), 2, 3), (true(), 2, 3))', True) self.check_value('deep-equal((1, 2, 3), (true(), 2, 3))', False) self.check_value('deep-equal((xs:untypedAtomic("1"), 2, 3), (1, 2, 3))', False) self.check_value('deep-equal((1, 2, 3), (xs:untypedAtomic("1"), 2, 
3))', False) self.check_value( 'deep-equal((xs:untypedAtomic("1"), 2, 3), (xs:untypedAtomic("2"), 2, 3))', False ) self.check_value( 'deep-equal((xs:untypedAtomic("1"), 2, 3), (xs:untypedAtomic("1"), 2, 3))', True ) self.check_value('deep-equal((), (1, 2, 3))', False) self.check_value('deep-equal((1, 2, 3), (1, 2, 4))', False) self.check_value("deep-equal((1, 2, 3), (1, '2', 3))", False) self.check_value("deep-equal(('1', '2', '3'), ('1', '2', '3'))", True) self.check_value("deep-equal(('1', '2', '3'), ('1', '4', '3'))", False) self.check_value("deep-equal((1, 2, 3), (1, 2, 3), 'en_US.UTF-8')", True) self.check_value('fn:deep-equal(xs:float("NaN"), xs:double("NaN"))', True) self.check_value('fn:deep-equal(xs:float("NaN"), 1.0)', False) self.check_value('fn:deep-equal(1.0, xs:double("NaN"))', False) self.check_value('deep-equal((1.1E0, 2E0, 3), (1.1, 2.0, 3))', True) self.check_value('deep-equal((1.1E0, 2E0, 3), (1.1, 2.1, 3))', False) self.check_value('deep-equal((1E0, 2E0, 3), (1, 2, 3))', True) self.check_value('deep-equal((1E0, 2E0, 3), (1, 4, 3))', False) self.check_value('deep-equal((1.1, 2.0, 3), (1.1E0, 2E0, 3))', True) self.check_value('deep-equal((1.1, 2.1, 3), (1.1E0, 2E0, 3))', False) self.check_value('deep-equal((1, 2, 3), (1E0, 2E0, 3))', True) self.check_value('deep-equal((1, 4, 3), (1E0, 2E0, 3))', False) self.check_value('deep-equal(3.1, xs:anyURI("http://xpath.test")) ', False) context = XPathContext(root) context.variables = {'a': [TextNode('alpha')], 'b': [TextNode('beta')]} self.check_value('deep-equal($a, $a)', True, context=context) self.check_value('deep-equal($a, $b)', False, context=context) context = XPathContext(root) context.variables = {'a': [AttributeNode('a', '10')], 'b': [AttributeNode('b', '10')]} self.check_value('deep-equal($a, $a)', True, context=context) self.check_value('deep-equal($a, $b)', False, context=context) context.variables = {'a': [NamespaceNode('tns0', 'http://xpath.test/ns')], 'b': [NamespaceNode('tns1', 
'http://xpath.test/ns')]} self.check_value('deep-equal($a, $a)', True, context=context) self.check_value('deep-equal($a, $b)', False, context=context) def test_adjust_datetime_to_timezone_function(self): context = XPathContext(root=self.etree.XML(''), timezone=Timezone.fromstring('-05:00'), variables={'tz': DayTimeDuration.fromstring("-PT10H")}) self.check_value('fn:adjust-dateTime-to-timezone(xs:dateTime("2002-03-07T10:00:00-07:00"))', DateTime.fromstring('2002-03-07T12:00:00-05:00'), context) self.check_value('fn:adjust-dateTime-to-timezone(xs:dateTime("2002-03-07T10:00:00"))', DateTime.fromstring('2002-03-07T10:00:00')) self.check_value('fn:adjust-dateTime-to-timezone(xs:dateTime("2002-03-07T10:00:00"))', DateTime.fromstring('2002-03-07T10:00:00-05:00'), context) self.check_value('fn:adjust-dateTime-to-timezone(xs:dateTime("2002-03-07T10:00:00"), $tz)', DateTime.fromstring('2002-03-07T10:00:00-10:00'), context) self.check_value( 'fn:adjust-dateTime-to-timezone(xs:dateTime("2002-03-07T10:00:00-07:00"), $tz)', DateTime.fromstring('2002-03-07T07:00:00-10:00'), context ) self.check_value('fn:adjust-dateTime-to-timezone(xs:dateTime("2002-03-07T10:00:00-07:00"), ' 'xs:dayTimeDuration("PT10H"))', DateTime.fromstring('2002-03-08T03:00:00+10:00'), context) self.check_value('fn:adjust-dateTime-to-timezone(xs:dateTime("2002-03-07T00:00:00+01:00"), ' 'xs:dayTimeDuration("-PT8H"))', DateTime.fromstring('2002-03-06T15:00:00-08:00'), context) self.check_value('fn:adjust-dateTime-to-timezone(xs:dateTime("2002-03-07T10:00:00"), ())', DateTime.fromstring('2002-03-07T10:00:00'), context) self.check_value( 'fn:adjust-dateTime-to-timezone(xs:dateTime("2002-03-07T10:00:00-07:00"), ())', DateTime.fromstring('2002-03-07T10:00:00'), context ) self.check_value('fn:adjust-dateTime-to-timezone((), ())') def test_adjust_date_to_timezone_function(self): context = XPathContext(root=self.etree.XML(''), timezone=Timezone.fromstring('-05:00'), variables={'tz': 
DayTimeDuration.fromstring("-PT10H")}) self.check_value('fn:adjust-date-to-timezone(xs:date("2002-03-07"))', Date.fromstring('2002-03-07-05:00'), context) self.check_value('fn:adjust-date-to-timezone(xs:date("2002-03-07-07:00"))', Date.fromstring('2002-03-07-05:00'), context) self.check_value('fn:adjust-date-to-timezone(xs:date("2002-03-07"), $tz)', Date.fromstring('2002-03-07-10:00'), context) self.check_value('fn:adjust-date-to-timezone(xs:date("2002-03-07"), ())', Date.fromstring('2002-03-07'), context) self.check_value('fn:adjust-date-to-timezone(xs:date("2002-03-07-07:00"), ())', Date.fromstring('2002-03-07'), context) self.check_value('fn:adjust-date-to-timezone(xs:date("2002-03-07-07:00"), $tz)', Date.fromstring('2002-03-06-10:00'), context) self.check_value('fn:adjust-date-to-timezone((), ())') self.check_value('adjust-date-to-timezone(xs:date("-25252734927766555-06-07+02:00"), ' 'xs:dayTimeDuration("PT0S"))', OverflowError) def test_adjust_time_to_timezone_function(self): context = XPathContext(root=self.etree.XML(''), timezone=Timezone.fromstring('-05:00'), variables={'tz': DayTimeDuration.fromstring("-PT10H")}) self.check_value('fn:adjust-time-to-timezone(())') self.check_value('fn:adjust-time-to-timezone((), ())') self.check_value('fn:adjust-time-to-timezone(xs:time("10:00:00"))', Time.fromstring('10:00:00-05:00'), context) self.check_value('fn:adjust-time-to-timezone(xs:time("10:00:00-07:00"))', Time.fromstring('12:00:00-05:00'), context) self.check_value('fn:adjust-time-to-timezone(xs:time("10:00:00"), $tz)', Time.fromstring('10:00:00-10:00'), context) self.check_value('fn:adjust-time-to-timezone(xs:time("10:00:00-07:00"), $tz)', Time.fromstring('07:00:00-10:00'), context) self.check_value('fn:adjust-time-to-timezone(xs:time("10:00:00"), ())', Time.fromstring('10:00:00'), context) self.check_value('fn:adjust-time-to-timezone(xs:time("10:00:00-07:00"), ())', Time.fromstring('10:00:00'), context) 
self.check_value('fn:adjust-time-to-timezone(xs:time("10:00:00-07:00"), ' 'xs:dayTimeDuration("PT10H"))', Time.fromstring('03:00:00+10:00'), context) def test_default_collation_function(self): default_collation = self.parser.default_collation self.check_value('fn:default-collation()', default_collation) def test_context_datetime_functions(self): context = XPathContext(root=self.etree.XML('')) self.check_value('fn:current-dateTime()', context=context, expected=DateTime10.fromdatetime(context.current_dt)) self.check_value(path='fn:current-date()', context=context, expected=Date10.fromdatetime(context.current_dt.date())) self.check_value(path='fn:current-time()', context=context, expected=Time.fromdatetime(context.current_dt)) self.check_value(path='fn:implicit-timezone()', context=context, expected=DayTimeDuration(seconds=time.timezone)) context.timezone = Timezone.fromstring('-05:00') self.check_value(path='fn:implicit-timezone()', context=context, expected=DayTimeDuration.fromstring('-PT5H')) self.parser._xsd_version = '1.1' try: self.check_value('fn:current-dateTime()', context=context, expected=DateTime.fromdatetime(context.current_dt)) self.check_value(path='fn:current-date()', context=context, expected=Date.fromdatetime(context.current_dt.date())) finally: self.parser._xsd_version = '1.0' def test_static_base_uri_function(self): context = XPathContext(root=self.etree.XML('')) self.check_value('fn:static-base-uri()', context=context) parser = XPath2Parser(strict=True, base_uri='http://example.com/ns/') self.assertEqual(parser.parse('fn:static-base-uri()').evaluate(context), 'http://example.com/ns/') def test_base_uri_function(self): context = XPathContext(root=self.etree.XML('')) with self.assertRaises(MissingContextError) as err: self.check_value('fn:base-uri(())') self.assertIn('XPDY0002', str(err.exception)) self.assertIn('context item is undefined', str(err.exception)) self.check_value('fn:base-uri(9)', MissingContextError) self.check_value('fn:base-uri(9)', 
TypeError, context=context) self.check_value('fn:base-uri()', datatypes.AnyURI(''), context=context) self.check_value('fn:base-uri(())', context=context) context = XPathContext(root=self.etree.XML('')) self.check_value('fn:base-uri()', '/base_path/', context=context) def test_document_uri_function(self): document = self.etree.parse(io.StringIO('')) context = XPathContext(root=document) self.check_value('fn:document-uri(())', context=context) self.check_value('fn:document-uri(.)', context=context) context = XPathContext(root=document.getroot(), item=document, documents={'/base_path/': document}) self.check_value('fn:document-uri(.)', context=context) context = XPathContext(root=document, documents={'/base_path/': document}) self.check_value('fn:document-uri(.)', '/base_path/', context=context) context = XPathContext(root=document, documents={ '/base_path/': self.etree.parse(io.StringIO('')), }) self.check_value('fn:document-uri(.)', context=context) document = self.etree.parse(io.StringIO('')) context = XPathContext(root=document) self.check_value('fn:document-uri(.)', '/base_path/', context=context) def test_doc_functions(self): root = self.etree.XML("") doc = self.etree.parse(io.StringIO("")) context = XPathContext(root, documents={'tns0': doc}) self.check_value("fn:doc(())", context=context) self.check_value("fn:doc-available(())", False, context=context) self.wrong_value('fn:doc-available(xs:untypedAtomic("2"))', 'FODC0002', context=context) self.wrong_type('fn:doc-available(2)', 'XPTY0004', context=context) self.check_value("fn:doc('tns0')", context.documents['tns0'], context=context) self.check_value("fn:doc-available('tns0')", True, context=context) self.check_value("fn:doc('tns1')", ValueError, context=context) self.check_value("fn:doc-available('tns1')", False, context=context) self.parser.base_uri = "/path1" self.check_value("fn:doc('http://foo.test')", ValueError, context=context) self.check_value("fn:doc-available('http://foo.test')", False, 
context=context) self.parser.base_uri = None doc = self.etree.XML("") context = XPathContext(root, documents={'tns0': doc}) self.wrong_type("fn:doc('tns0')", 'XPDY0050', context=context) self.wrong_type("fn:doc-available('tns0')", 'XPDY0050', context=context) context = XPathContext(root, documents={'file.xml': None}) self.wrong_value("fn:doc('file.xml')", 'FODC0002', context=context) self.wrong_value("fn:doc('unknown')", 'FODC0002', context=context) self.check_value("fn:doc-available('unknown')", False, context=context) dirpath = os.path.dirname(__file__) self.wrong_value("fn:doc('{}')".format(dirpath), 'FODC0005', context=context) def test_collection_function(self): root = self.etree.XML("") doc1 = self.etree.parse(io.StringIO("")) doc2 = self.etree.parse(io.StringIO("")) context = XPathContext(root, collections={'tns0': [doc1, doc2]}) collection = context.collections['tns0'] self.check_value("fn:collection('tns0')", collection, context=context) self.parser.collection_types = {'tns0': 'node()*'} self.check_value("fn:collection('tns0')", collection, context=context) self.parser.collection_types = {'tns0': 'node()'} self.check_value("fn:collection('tns0')", TypeError, context=context) self.check_value("fn:collection()", ValueError, context=context) context.default_collection = context.collections['tns0'] self.check_value("fn:collection()", collection, context=context) self.parser.default_collection_type = 'node()' self.check_value("fn:collection()", TypeError, context=context) self.parser.default_collection_type = 'node()*' context = XPathContext(root) self.wrong_value("fn:collection('filepath')", 'FODC0002', context=context) self.wrong_value("fn:collection('dirpath/')", 'FODC0002', context=context) def test_root_function(self): root = self.etree.XML("") context = XPathContext(root) self.check_value("root()", context.root, context=context) context = XPathContext(root, item=root[2]) self.check_value("root()", context.root, context=context) with 
self.assertRaises(TypeError) as err: context = XPathContext(root, item=10) self.check_value("root()", context.root, context=context) self.assertIn('XPTY0004', str(err.exception)) with self.assertRaises(TypeError) as err: self.check_value("root(7)", root, context=XPathContext(root)) self.assertIn('XPTY0004', str(err.exception)) context = XPathContext(root, variables={'elem': root[1]}) self.check_value("fn:root(())", context=context) self.check_value("fn:root($elem)", context.root, context=context) doc = self.etree.XML("") context = XPathContext(root, variables={'elem': doc[1]}) self.check_value("fn:root($elem)", context=context) context = XPathContext(root, variables={'elem': doc[1]}, documents={}) self.check_value("fn:root($elem)", context=context) context = XPathContext(root, variables={'elem': doc[1]}, documents={'.': doc}) self.check_value("root($elem)", context.documents['.'], context=context) doc2 = self.etree.XML("") context = XPathContext(root, variables={'elem': doc2[1]}, documents={'.': doc}) self.check_value("root($elem)", context=context) context = XPathContext(root, variables={'elem': doc2[1]}, documents={'.': doc, 'doc2': doc2}) self.check_value("root($elem)", context.documents['doc2'], context=context) if xmlschema is not None: schema = xmlschema.XMLSchema(dedent("""\ """)) with self.schema_bound_parser(schema.xpath_proxy): context = self.parser.schema.get_context() self.check_value("fn:root()", None, context) def test_error_function(self): with self.assertRaises(ElementPathError) as err: self.check_value('fn:error()') self.assertEqual(str(err.exception), '[err:FOER0000] Unidentified error') with self.assertRaises(ElementPathError) as err: self.check_value('fn:error("err:XPST0001")') self.assertIn("[err:XPTY0004]", str(err.exception)) with self.assertRaises(ElementPathError) as err: self.check_value( "fn:error(fn:QName('http://www.w3.org/2005/xqt-errors', 'err:XPST0001'))" ) self.assertEqual(str(err.exception), '[err:XPST0001] Parser not bound to a 
schema') with self.assertRaises(ElementPathError) as err: self.check_value( "fn:error(fn:QName('http://www.w3.org/2005/xqt-errors', 'err:XPST0001'), " "'Missing schema')" ) self.assertEqual(str(err.exception), '[err:XPST0001] Missing schema') def test_trace_function(self): self.check_value('trace((), "trace message")', []) self.check_value('trace("foo", "trace message")', ['foo']) @unittest.skipIf(lxml_etree is None, "The lxml library is not installed") class LxmlXPath2FunctionsTest(XPath2FunctionsTest): etree = lxml_etree if __name__ == '__main__': unittest.main() elementpath-3.0.2/tests/test_xpath2_parser.py000066400000000000000000001723601427546011100213710ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # # # Note: Many tests are built using the examples of the XPath standards, # published by W3C under the W3C Document License. 
# # References: # http://www.w3.org/TR/1999/REC-xpath-19991116/ # http://www.w3.org/TR/2010/REC-xpath20-20101214/ # http://www.w3.org/TR/2010/REC-xpath-functions-20101214/ # https://www.w3.org/Consortium/Legal/2015/doc-license # https://www.w3.org/TR/charmod-norm/ # import unittest import io import locale import os from decimal import Decimal from textwrap import dedent try: import lxml.etree as lxml_etree except ImportError: lxml_etree = None try: import xmlschema except ImportError: xmlschema = None else: xmlschema.XMLSchema.meta_schema.build() from elementpath import XPath2Parser, XPathContext, MissingContextError, \ ElementNode, select, iter_select from elementpath.datatypes import xsd10_atomic_types, xsd11_atomic_types, DateTime, \ Date, Time, Timezone, DayTimeDuration, YearMonthDuration, UntypedAtomic, QName from elementpath.helpers import get_locale_category try: from tests import test_xpath1_parser except ImportError: import test_xpath1_parser def get_sequence_type(value, xsd_version='1.0'): """ Infers the sequence type from a value. 
""" if value is None or value == []: return 'empty-sequence()' elif isinstance(value, list): if value[0] is not None and not isinstance(value[0], list): sequence_type = get_sequence_type(value[0], xsd_version) if all(get_sequence_type(x, xsd_version) == sequence_type for x in value[1:]): return '{}+'.format(sequence_type) else: return 'node()+' else: value_kind = getattr(value, 'kind', None) if value_kind is not None: return '{}()'.format(value_kind) elif isinstance(value, UntypedAtomic): return 'xs:untypedAtomic' if QName.is_valid(value) and ':' in str(value): return 'xs:QName' if xsd_version == '1.0': atomic_types = xsd10_atomic_types else: atomic_types = xsd11_atomic_types if atomic_types['dateTimeStamp'].is_valid(value): return 'xs:dateTimeStamp' for type_name in ['string', 'boolean', 'decimal', 'float', 'double', 'date', 'dateTime', 'gDay', 'gMonth', 'gMonthDay', 'anyURI', 'gYear', 'gYearMonth', 'time', 'duration', 'dayTimeDuration', 'yearMonthDuration', 'base64Binary', 'hexBinary']: if atomic_types[type_name].is_valid(value): return 'xs:%s' % type_name raise ValueError("Inconsistent sequence type for {!r}".format(value)) class XPath2ParserTest(test_xpath1_parser.XPath1ParserTest): def setUp(self): self.parser = XPath2Parser(namespaces=self.namespaces) # Make sure the tests are repeatable. 
env_vars_to_tweak = 'LC_ALL', 'LANG' self.current_env_vars = {v: os.environ.get(v) for v in env_vars_to_tweak} for v in self.current_env_vars: os.environ[v] = 'en_US.UTF-8' def tearDown(self): if hasattr(self, 'current_env_vars'): for v in self.current_env_vars: if self.current_env_vars[v] is not None: os.environ[v] = self.current_env_vars[v] def test_is_sequence_type_method(self): self.assertTrue(self.parser.is_sequence_type('empty-sequence()')) self.assertTrue(self.parser.is_sequence_type('xs:string')) self.assertTrue(self.parser.is_sequence_type('xs:float+')) self.assertTrue(self.parser.is_sequence_type('element()*')) self.assertTrue(self.parser.is_sequence_type('item()?')) self.assertTrue(self.parser.is_sequence_type('xs:untypedAtomic+')) self.assertFalse(self.parser.is_sequence_type(10)) self.assertFalse(self.parser.is_sequence_type('')) self.assertFalse(self.parser.is_sequence_type('empty-sequence()*')) self.assertFalse(self.parser.is_sequence_type('unknown')) self.assertFalse(self.parser.is_sequence_type('unknown?')) self.assertFalse(self.parser.is_sequence_type('tns0:unknown')) self.assertTrue(self.parser.is_sequence_type(' element( ) ')) self.assertTrue(self.parser.is_sequence_type(' element( * ) ')) self.assertFalse(self.parser.is_sequence_type(' element( *, * ) ')) self.assertTrue(self.parser.is_sequence_type('element(A)')) self.assertTrue(self.parser.is_sequence_type('element(A, xs:date)')) self.assertTrue(self.parser.is_sequence_type('element(*, xs:date)')) self.assertFalse(self.parser.is_sequence_type('element(A, B, xs:date)')) if self.parser.version >= '3.0': self.assertTrue(self.parser.is_sequence_type('function(*)')) else: self.assertFalse(self.parser.is_sequence_type('function(*)')) def test_match_sequence_type_method(self): self.assertTrue(self.parser.match_sequence_type(None, 'empty-sequence()')) self.assertTrue(self.parser.match_sequence_type([], 'empty-sequence()')) self.assertFalse(self.parser.match_sequence_type('', 'empty-sequence()')) 
self.assertFalse(self.parser.match_sequence_type('', 'empty-sequence()')) context = XPathContext(self.etree.XML('')) root = context.root self.assertTrue(self.parser.match_sequence_type(root, 'element()')) self.assertTrue(self.parser.match_sequence_type([root], 'element()')) self.assertTrue(self.parser.match_sequence_type(root, 'element()', '?')) self.assertTrue(self.parser.match_sequence_type(root, 'element()', '+')) self.assertTrue(self.parser.match_sequence_type(root, 'element()', '*')) self.assertFalse(self.parser.match_sequence_type(root[:], 'element()')) self.assertFalse(self.parser.match_sequence_type(root[:], 'element()', '?')) self.assertTrue(self.parser.match_sequence_type(root[:], 'element()', '+')) self.assertTrue(self.parser.match_sequence_type(root[:], 'element()', '*')) self.assertTrue(self.parser.match_sequence_type(UntypedAtomic(1), 'xs:untypedAtomic')) self.assertFalse(self.parser.match_sequence_type(1, 'xs:untypedAtomic')) self.assertTrue(self.parser.match_sequence_type('1', 'xs:string')) self.assertFalse(self.parser.match_sequence_type(1, 'xs:string')) self.assertFalse(self.parser.match_sequence_type('1', 'xs:unknown')) self.assertFalse(self.parser.match_sequence_type('1', 'tns0:string')) def test_variable_reference(self): root = self.etree.XML('') context = XPathContext(root=root, variables={'var1': root[0]}) self.check_value('$var1', context.root[0], context=context) context = XPathContext(root=root, variables={'tns:var1': root[0]}) self.check_raise('$tns:var1', NameError, 'XPST0081', context=context) # Test dynamic evaluation error parser = XPath2Parser(namespaces={'tns': 'http://xpath.test/ns'}) token = parser.parse('$tns:var1') parser.namespaces.pop('tns') with self.assertRaises(NameError) as ctx: token.evaluate(context) self.assertIn('XPST0081', str(ctx.exception)) def test_check_variables_method(self): self.parser.variable_types.update( (k, get_sequence_type(v)) for k, v in self.variables.items() ) 
self.assertEqual(self.parser.variable_types, {'values': 'xs:decimal+', 'myaddress': 'xs:string', 'word': 'xs:string'}) self.assertIsNone(self.parser.check_variables( {'values': [1, 2, -1], 'myaddress': 'info@example.com', 'word': ''} )) with self.assertRaises(NameError) as ctx: self.parser.check_variables({'values': 1}) self.assertIn("[err:XPST0008] missing variable", str(ctx.exception)) with self.assertRaises(TypeError) as ctx: self.parser.check_variables( {'values': 1.0, 'myaddress': 'info@example.com', 'word': ''} ) self.assertEqual("[err:XPDY0050] Unmatched sequence type for variable 'values'", str(ctx.exception)) with self.assertRaises(TypeError) as ctx: self.parser.check_variables( {'values': 1, 'myaddress': 'info@example.com', 'word': True} ) self.assertEqual("[err:XPDY0050] Unmatched sequence type for variable 'word'", str(ctx.exception)) self.parser.variable_types.clear() def test_xpath_tokenizer(self): super(XPath2ParserTest, self).test_xpath_tokenizer() self.check_tokenizer("(: this is a comment :)", ['(:', '', 'this', '', 'is', '', 'a', '', 'comment', '', ':)']) self.check_tokenizer("last (:", ['last', '', '(:']) def test_token_tree(self): super(XPath2ParserTest, self).test_token_tree() self.check_tree('(1 + 6, 2, 10 - 4)', '(, (, (+ (1) (6)) (2)) (- (10) (4)))') self.check_tree('/A/B2 union /A/B1', '(union (/ (/ (A)) (B2)) (/ (/ (A)) (B1)))') self.check_tree("//text/(preceding-sibling::text)[1]", '(/ (// (text)) ([ (preceding-sibling (text)) (1)))') def test_token_source(self): super(XPath2ParserTest, self).test_token_source() self.check_source("(5, 6) instance of xs:integer+", '(5, 6) instance of xs:integer+') self.check_source("$myaddress treat as element(*, USAddress)", "$myaddress treat as element(*, USAddress)") def test_xpath_comments(self): self.wrong_syntax("(: this is a comment :)") self.check_value("(: this is a comment :) true()", True) self.check_value("(: comment 1 :)(: comment 2 :) true()", True) self.check_value("(: comment 1 :) true() 
(: comment 2 :)", True) self.wrong_syntax("(: this is a (: nested :) comment :)") self.check_value("(: this is a (: nested :) comment :) true()", True) self.check_tree('child (: nasty (:nested :) axis comment :) ::B1', '(child (B1))') self.check_tree('child (: nasty "(: but not nested :)" axis comment :) ::B1', '(child (B1))') self.check_value("5 (: before operator comment :) < 4", False) # Before infix operator self.check_value("5 < (: after operator comment :) 4", False) # After infix operator self.check_value("true (:# nasty function comment :) ()", True) self.check_tree(' (: initial comment :)/ (:2nd comment:)A/B1(: 3rd comment :)/ \n' 'C1 (: last comment :)\t', '(/ (/ (/ (A)) (B1)) (C1))') self.wrong_syntax("xs:(: invalid QName :)string") def test_comma_operator(self): self.check_value("1, 2", [1, 2]) self.check_value("(1, 2)", [1, 2]) self.check_value("(1, 2, ())", [1, 2]) self.check_value("(1, fn:round-half-to-even(()), 7)", [1, 7]) self.check_value("(-9, 28, 10)", [-9, 28, 10]) self.check_value("(1, 2)", [1, 2]) root = self.etree.XML('') self.check_selector("(7.0, /A, 'foo')", root, [7.0, root, 'foo']) self.check_selector("7.0, /A, 'foo'", root, [7.0, root, 'foo']) self.check_selector("/A, 7.0, 'foo'", self.etree.XML(''), [7.0, 'foo']) def test_range_expressions(self): # Some cases from https://www.w3.org/TR/xpath20/#construct_seq self.check_value("1 to 2", [1, 2]) self.check_value("1 to 10", list(range(1, 11))) self.check_value("(10, 1 to 4)", [10, 1, 2, 3, 4]) self.check_value("10 to 10", [10]) self.check_value("15 to 10", []) self.check_value("fn:reverse(10 to 15)", [15, 14, 13, 12, 11, 10]) self.wrong_syntax("1 to 10 to 20", 'XPST0003') root = self.etree.XML('') self.wrong_type("'1' to '10'", 'XPTY0004', context=XPathContext(root)) self.wrong_type("true() to 10", 'XPTY0004') def test_parenthesized_expressions(self): self.check_value("(1, 2, '10')", [1, 2, '10']) self.check_value("()", []) def test_if_expressions(self): root = self.etree.XML('') token = 
self.parser.parse("if (1) then 2 else 3") self.assertEqual(len(token), 3) self.assertEqual(token.source, 'if (1) then 2 else 3') self.check_value("if (1) then 2 else 3", 2) self.check_selector("if (true()) then /A/B1 else /A/B2", root, root[:1]) self.check_selector("if (false()) then /A/B1 else /A/B2", root, root[1:2]) token = self.parser.parse("if") self.assertEqual(token.symbol, '(name)') self.assertEqual(token.value, 'if') # Cases from XPath 2.0 examples root = self.etree.XML('') self.check_selector( 'if ($part/@discounted) then $part/wholesale else $part/retail', root, [root[0]], variables={'part': root}, variable_types={'part': 'element()'} ) root = self.etree.XML('' ' 25' ' 10' ' 15' '') self.check_selector( 'if ($widget1/unit-cost < $widget2/unit-cost) then $widget1 else $widget2', root, [root[2]], variables={'widget1': root[0], 'widget2': root[2]} ) def test_quantifier_expressions(self): # Cases from XPath 2.0 examples root = self.etree.XML('' ' ' ' ' ' ' '') self.check_selector("every $part in /parts/part satisfies $part/@discounted", root, True) self.check_selector("every $part in /parts/part satisfies $part/@available", root, False) root = self.etree.XML('' ' 1000400' ' 1200300' ' 1200200' '') self.check_selector("some $emp in /emps/employee satisfies " " ($emp/bonus > 0.25 * $emp/salary)", root, True) self.check_selector("every $emp in /emps/employee satisfies " " ($emp/bonus < 0.5 * $emp/salary)", root, True) context = XPathContext(root=self.etree.XML('')) self.check_value("some $x in (1, 2, 3), $y in (2, 3, 4) satisfies $x + $y = 4", True, context) self.check_value("every $x in (1, 2, 3), $y in (2, 3, 4) satisfies $x + $y = 4", False, context) self.check_value("some $x in (1, 2, 3), $y in (2, 3, 4) satisfies $x + $y = 7", True, context) self.check_value("some $x in (1, 2, 3), $y in (2, 3, 4) satisfies $x + $y = 8", False, context) self.check_value('some $x in (1, 2, "cat") satisfies $x * 2 = 4', True, context) self.check_value('every $x in (1, 2, 
"cat") satisfies $x * 2 = 4', False, context) token = self.parser.parse("some") self.assertEqual(token.symbol, '(name)') self.assertEqual(token.value, 'some') # From W3C XQuery/XPath tests context = XPathContext(root=self.etree.XML(''), variables={'result': [43, 44, 45]}) self.check_value('some $i in $result satisfies $i = 44', True, context) self.check_value('every $i in $result satisfies $i = 44', False, context) self.check_raise('some $foo in (1, $foo) satisfies 1', NameError, 'XPST0008') def test_for_expressions(self): # Cases from XPath 2.0 examples context = XPathContext(root=self.etree.XML('')) path = "for $i in (10, 20), $j in (1, 2) return ($i + $j)" self.check_value(path, [11, 12, 21, 22], context) self.check_source(path, path) root = self.etree.XML( """ TCP/IP Illustrated Stevens Addison-Wesley Advanced Programming in the Unix Environment Stevens Addison-Wesley Data on the Web Abiteboul Buneman Suciu """) # Test step-by-step, testing also other basic features. self.check_selector("book/author[1]", root, [root[0][1], root[1][1], root[2][1]]) self.check_selector("book/author[. = $a]", root, [root[0][1], root[1][1]], variables={'a': 'Stevens'}) self.check_tree("book/author[. = $a][1]", '(/ (book) ([ ([ (author) (= (.) ($ (a)))) (1)))') self.check_selector("book/author[. = $a][1]", root, [root[0][1], root[1][1]], variables={'a': 'Stevens'}) self.check_selector("book/author[. = 'Stevens'][2]", root, []) self.check_selector("for $a in fn:distinct-values(book/author) return $a", root, ['Stevens', 'Abiteboul', 'Buneman', 'Suciu']) self.check_selector("for $a in fn:distinct-values(book/author) return book/author[. = $a]", root, [root[0][1], root[1][1]] + root[2][1:4]) self.check_selector("for $a in fn:distinct-values(book/author) " "return book/author[. = $a][1]", root, [root[0][1], root[1][1]] + root[2][1:4]) self.check_selector( "for $a in fn:distinct-values(book/author) " "return (book/author[. 
= $a][1], book[author = $a]/title)", root, [root[0][1], root[1][1], root[0][0], root[1][0], root[2][1], root[2][0], root[2][2], root[2][0], root[2][3], root[2][0]] ) # From W3C XQuery/XPath tests context = XPathContext(root=self.etree.XML(''), variables={'result': [43, 44, 45]}) self.check_value('for $i in $result return $i + 10', [53, 54, 55], context) self.check_raise('for $foo in (1, $foo) return 1', NameError, 'XPST0008') def test_idiv_operator(self): self.check_value("5 idiv 2", 2) self.check_value("-3.5 idiv -2", 1) self.check_value("-3.5 idiv 2", -1) self.check_value('xs:float("-3.5") idiv xs:float("3")', -1) self.check_value("-3.5 idiv 0", ZeroDivisionError) self.check_value("xs:float('INF') idiv 2", OverflowError) self.wrong_value("-3.5 idiv ()", 'XPST0005') self.check_raise('xs:float("NaN") idiv 1', OverflowError, 'FOAR0002') self.wrong_type("5 idiv '2'", 'XPTY0004') def test_comparison_operators(self): super(XPath2ParserTest, self).test_comparison_operators() self.check_value("0.05 eq 0.05", True) self.check_value("19.03 ne 19.02999", True) self.check_value("-1.0 eq 1.0", False) self.check_value("1 le 2", True) self.check_value("1e0 eq 1e2", False) self.check_value("xs:float('1e0') eq 1e2", False) self.check_value("1.0 lt 1e2", True) self.check_value("1e2 lt 1000", True) self.check_value("3 le 2", False) self.check_value("5 ge 9", False) self.check_value("5 gt 3", True) self.check_value("5 lt 20.0", True) self.wrong_type("false() eq 1", 'XPTY0004') self.wrong_type("0 eq false()", 'XPTY0004') self.check_value("2 * 2 eq 4", True) self.check_value("() * 7") self.check_value("() * ()") self.check_value('xs:string("http://xpath.test") eq xs:anyURI("http://xpath.test")', True) self.check_value("() le 4") self.check_value("4 gt ()") self.check_value("() eq ()") # Equality of empty sequences is also an empty sequence self.wrong_syntax('true() eq true() eq true()', 'XPST0003') # From W3C XQuery/XPath tests self.check_value('xs:duration("P31D") ne 
xs:yearMonthDuration("P1M")', True) self.wrong_type('QName("", "ncname") le QName("", "ncname")', 'XPTY0004') # From W3C XSD 1.1 tests context = XPathContext(root=self.etree.XML(''), variables={'value': Date(9999, 10, 10)}) self.check_value('$value lt current-date()', False, context=context) def test_comparison_in_expression(self): context = XPathContext(self.etree.XML('false')) self.check_value("(. = 'false') = (. = 'false')", True, context) self.check_value("(. = 'asdf') != (. = 'false')", True, context) def test_boolean_evaluation_in_selector(self): context = XPathContext(self.etree.XML(""" true 10.0 1 10.0 false 5.0 0 5.0 """)) self.check_value("sum(//price)", 30, context) self.check_value("sum(//price[../available = 'true'])", 10, context) self.check_value("sum(//price[../available = 'false'])", 5, context) self.check_value("sum(//price[../available = '1'])", 10, context) self.check_value("sum(//price[../available = '0'])", 5, context) self.check_value("sum(//price[../available = true()])", 20, context) self.check_value("sum(//price[../available = false()])", 10, context) def test_comparison_of_sequences(self): super(XPath2ParserTest, self).test_comparison_of_sequences() self.parser.compatibility_mode = True self.wrong_type("(false(), false()) = 1") self.check_value("(false(), false()) = (false(), false())", True) self.check_value("(false(), false()) = (false(), false(), false())", True) self.check_value("(false(), false()) = (false(), true())", True) self.check_value("(false(), false()) = (true(), false())", True) self.check_value("(false(), false()) = (true(), true())", False) self.check_value("(false(), false()) = (true(), true(), false())", True) self.parser.compatibility_mode = False # From XPath 2.0 examples root = self.etree.XML('' ' Kafka' ' Huxley' ' Asimov' '') context = XPathContext(root=root, variables={'book1': root[0]}) self.check_value('$book1 / author = "Kafka"', True, context=context) self.check_value('$book1 / author eq "Kafka"', True, 
context=context) self.check_value("(1, 2) = (2, 3)", True) self.check_value("(2, 3) = (3, 4)", True) self.check_value("(1, 2) = (3, 4)", False) self.check_value("(1, 2) != (2, 3)", True) # != is not the inverse of = context = XPathContext(root=root, variables={ 'a': UntypedAtomic('1'), 'b': UntypedAtomic('2'), 'c': UntypedAtomic('2.0') }) self.check_value('($a, $b) = ($c, 3.0)', False, context=context) self.check_value('($a, $b) = ($c, 2.0)', True, context=context) self.wrong_type("(1, 2) le (2, 3)", 'XPTY0004', 'sequence of length greater than one') root = self.etree.XML('') context = XPathContext(root=root) self.check_value('@min', [context.root.attributes[0]], context=context) self.check_value('@min le @max', True, context=context) root = self.etree.XML('') self.check_value('@min le @max', False, context=XPathContext(root=root)) self.check_value('@min le @maximum', None, context=XPathContext(root=root)) if xmlschema is not None: schema = xmlschema.XMLSchema(""" """) with self.schema_bound_parser(schema.elements['root'].xpath_proxy): root = self.etree.XML('11') self.check_value('. le 10', False, context=XPathContext(root)) self.check_value('. le 20', True, context=XPathContext(root)) root = self.etree.XML('eleven') self.wrong_type('. le 10', 'XPDY0050', context=XPathContext(root)) root = self.etree.XML('12') with self.assertRaises(TypeError) as err: self.check_value('. le "11"', context=XPathContext(root)) self.assertIn('XPTY0004', str(err.exception)) # Static schema context error with self.assertRaises(TypeError) as err: self.check_value('. le 10', context=XPathContext(root)) self.assertIn('XPTY0004', str(err.exception)) # Dynamic context error schema = xmlschema.XMLSchema(""" """) with self.schema_bound_parser(schema.elements['root'].xpath_proxy): root = self.etree.XML('15') self.check_value('. 
le "11"', False, context=XPathContext(root)) root = self.etree.XML('1103050') self.check_selector("a = (1 to 30)", root, True) self.check_selector("a = (2)", root, False) self.check_selector("a[1] = (1 to 10, 30)", root, True) self.check_selector("a[2] = (1 to 10, 30)", root, True) self.check_selector("a[3] = (1 to 10, 30)", root, True) self.check_selector("a[4] = (1 to 10, 30)", root, False) def test_unknown_axis(self): self.wrong_syntax('unknown::node()', 'XPST0003') self.wrong_syntax('A/unknown::node()', 'XPST0003') self.parser.compatibility_mode = True self.wrong_name('unknown::node()', 'XPST0010') self.wrong_name('A/unknown::node()', 'XPST0010') self.parser.compatibility_mode = False def test_predicate(self): super(XPath2ParserTest, self).test_predicate() root = self.etree.XML('') self.check_selector("/(A/*/*)[1]", root, [root[0][0]]) self.check_selector("/A/*/*[1]", root, [root[0][0], root[1][0]]) def test_subtract_datetimes(self): context = XPathContext(root=self.etree.XML(''), timezone=Timezone.fromstring('-05:00')) self.check_value('xs:dateTime("2000-10-30T06:12:00") - xs:dateTime("1999-11-28T09:00:00Z")', DayTimeDuration.fromstring('P337DT2H12M'), context) self.check_value('xs:dateTime("2000-10-30T06:12:00") - xs:dateTime("1999-11-28T09:00:00Z")', DayTimeDuration.fromstring('P336DT21H12M')) def test_subtract_dates(self): context = XPathContext(root=self.etree.XML(''), timezone=Timezone.fromstring('Z')) self.check_value('xs:date("2000-10-30") - xs:date("1999-11-28")', DayTimeDuration.fromstring('P337D'), context) context.timezone = Timezone.fromstring('+05:00') self.check_value('xs:date("2000-10-30") - xs:date("1999-11-28Z")', DayTimeDuration.fromstring('P336DT19H'), context) self.check_value('xs:date("2000-10-15-05:00") - xs:date("2000-10-10+02:00")', DayTimeDuration.fromstring('P5DT7H')) # BCE test cases self.check_value('xs:date("0001-01-01") - xs:date("-0001-01-01")', DayTimeDuration.fromstring('P366D')) self.check_value('xs:date("-0001-01-01") - 
xs:date("-0001-01-01")', DayTimeDuration.fromstring('P0D')) self.check_value('xs:date("-0001-01-01") - xs:date("0001-01-01")', DayTimeDuration.fromstring('-P366D')) self.check_value('xs:date("-0001-01-01") - xs:date("-0001-01-02")', DayTimeDuration.fromstring('-P1D')) self.check_value('xs:date("-0001-01-04") - xs:date("-0001-01-01")', DayTimeDuration.fromstring('P3D')) self.check_value('xs:date("0200-01-01") - xs:date("-0121-01-01")', DayTimeDuration.fromstring('P116878D')) self.check_value('xs:date("-0201-01-01") - xs:date("0120-01-01")', DayTimeDuration.fromstring('-P116877D')) def test_subtract_times(self): context = XPathContext(root=self.etree.XML(''), timezone=Timezone.fromstring('-05:00')) self.check_value('xs:time("11:12:00Z") - xs:time("04:00:00")', DayTimeDuration.fromstring('PT2H12M'), context) self.check_value('xs:time("11:00:00-05:00") - xs:time("21:30:00+05:30")', DayTimeDuration.fromstring('PT0S'), context) self.check_value('xs:time("17:00:00-06:00") - xs:time("08:00:00+09:00")', DayTimeDuration.fromstring('PT24H'), context) self.check_value('xs:time("24:00:00") - xs:time("23:59:59")', DayTimeDuration.fromstring('-PT23H59M59S'), context) def test_add_year_month_duration_to_datetime(self): self.check_value('xs:dateTime("2000-10-30T11:12:00") + xs:yearMonthDuration("P1Y2M")', DateTime.fromstring("2001-12-30T11:12:00")) def test_add_day_time_duration_to_datetime(self): self.check_value('xs:dateTime("2000-10-30T11:12:00") + xs:dayTimeDuration("P3DT1H15M")', DateTime.fromstring("2000-11-02T12:27:00")) def test_subtract_year_month_duration_from_datetime(self): self.check_value('xs:dateTime("2000-10-30T11:12:00") - xs:yearMonthDuration("P0Y2M")', DateTime.fromstring("2000-08-30T11:12:00")) self.check_value('xs:dateTime("2000-10-30T11:12:00") - xs:yearMonthDuration("P1Y2M")', DateTime.fromstring("1999-08-30T11:12:00")) def test_subtract_day_time_duration_from_datetime(self): self.check_value('xs:dateTime("2000-10-30T11:12:00") - 
xs:dayTimeDuration("P3DT1H15M")', DateTime.fromstring("2000-10-27T09:57:00")) def test_add_year_month_duration_to_date(self): self.check_value('xs:date("2000-10-30") + xs:yearMonthDuration("P1Y2M")', Date.fromstring('2001-12-30')) def test_subtract_year_month_duration_from_date(self): self.check_value('xs:date("2000-10-30") - xs:yearMonthDuration("P1Y2M")', Date.fromstring('1999-08-30')) self.check_value('xs:date("2000-02-29Z") - xs:yearMonthDuration("P1Y")', Date.fromstring('1999-02-28Z')) self.check_value('xs:date("2000-10-31-05:00") - xs:yearMonthDuration("P1Y1M")', Date.fromstring('1999-09-30-05:00')) def test_subtract_day_time_duration_from_date(self): self.check_value('xs:date("0001-01-05") - xs:dayTimeDuration("P3DT1H15M")', Date.fromstring('0001-01-01')) self.check_value('xs:date("2000-10-30") - xs:dayTimeDuration("P3DT1H15M")', Date.fromstring('2000-10-26')) def test_add_day_time_duration_to_time(self): self.check_value('xs:time("11:12:00") + xs:dayTimeDuration("P3DT1H15M")', Time.fromstring('12:27:00')) self.check_value('xs:time("23:12:00+03:00") + xs:dayTimeDuration("P1DT3H15M")', Time.fromstring('02:27:00+03:00')) def test_subtract_day_time_duration_to_time(self): self.check_value('xs:time("11:12:00") - xs:dayTimeDuration("P3DT1H15M")', Time.fromstring('09:57:00')) self.check_value('xs:time("08:20:00-05:00") - xs:dayTimeDuration("P23DT10H10M")', Time.fromstring('22:10:00-05:00')) def test_duration_with_arithmetical_operators(self): self.wrong_type('xs:duration("P1Y") * 3', 'XPTY0004', 'unsupported operand type(s)') self.wrong_value('xs:duration("P1Y") * xs:float("NaN")', 'FOCA0005') self.check_value('xs:duration("P1Y") * xs:float("INF")', OverflowError) self.wrong_value('xs:float("NaN") * xs:duration("P1Y")', 'FOCA0005') self.check_value('xs:float("INF") * xs:duration("P1Y")', OverflowError) self.wrong_type('xs:duration("P3Y") div 3', 'XPTY0004', 'unsupported operand type(s)') def test_year_month_duration_operators(self): 
self.check_value('xs:yearMonthDuration("P2Y11M") + xs:yearMonthDuration("P3Y3M")', YearMonthDuration(months=74)) self.check_value('xs:yearMonthDuration("P2Y11M") - xs:yearMonthDuration("P3Y3M")', YearMonthDuration(months=-4)) self.check_value('xs:yearMonthDuration("P2Y11M") * 2.3', YearMonthDuration.fromstring('P6Y9M')) self.check_value('xs:yearMonthDuration("P2Y11M") div 1.5', YearMonthDuration.fromstring('P1Y11M')) self.check_value('xs:yearMonthDuration("P3Y4M") div xs:yearMonthDuration("-P1Y4M")', -2.5) self.wrong_value('xs:double("NaN") * xs:yearMonthDuration("P2Y")', 'FOCA0005') self.check_value('xs:yearMonthDuration("P1Y") * xs:double("INF")', OverflowError) self.wrong_value('xs:yearMonthDuration("P3Y") div xs:double("NaN")', 'FOCA0005') self.check_raise('xs:yearMonthDuration("P3Y") div xs:yearMonthDuration("P0Y")', ZeroDivisionError, 'FOAR0001', 'Division by zero') self.check_raise('xs:yearMonthDuration("P3Y36M") div 0', OverflowError, 'FODT0002') def test_day_time_duration_operators(self): self.check_value('xs:dayTimeDuration("P2DT12H5M") + xs:dayTimeDuration("P5DT12H")', DayTimeDuration.fromstring('P8DT5M')) self.check_value('xs:dayTimeDuration("P2DT12H") - xs:dayTimeDuration("P1DT10H30M")', DayTimeDuration.fromstring('P1DT1H30M')) self.check_value('xs:dayTimeDuration("PT2H10M") * 2.1', DayTimeDuration.fromstring('PT4H33M')) self.check_value('xs:dayTimeDuration("P1DT2H30M10.5S") div 1.5', DayTimeDuration.fromstring('PT17H40M7S')) self.check_value('3 * xs:dayTimeDuration("P1D")', DayTimeDuration.fromstring('P3D')) self.check_value( 'xs:dayTimeDuration("P2DT53M11S") div xs:dayTimeDuration("P1DT10H")', Decimal('1.437834967320261437908496732') ) def test_document_node_accessor(self): document = self.etree.parse(io.StringIO('')) context = XPathContext(root=document) self.wrong_syntax("document-node(A)") self.wrong_syntax("document-node(*)") self.wrong_syntax("document-node(true())") self.wrong_syntax("document-node(node())") 
self.wrong_type("document-node(element(A), 1)") self.check_select("document-node()", [], context) self.check_select("self::document-node()", [context.root], context) self.check_selector("self::document-node(element(A))", document, [document]) self.check_selector("self::document-node(element(B))", document, []) context = XPathContext(root=document.getroot()) self.check_select("document-node()", [], context) self.check_select("self::document-node()", [], context) self.check_select("self::document-node(element(A))", [], context) def test_element_accessor(self): element = self.etree.Element('schema') context = XPathContext(root=element) self.wrong_syntax("element('name')") self.wrong_syntax("element(A, 'name')") self.check_select("element()", [], context) self.check_select("self::element()", [context.root], context) self.check_select("self::element(schema)", [context.root], context) self.check_select("self::element(schema, xs:string)", [], context) root = self.etree.XML('texttail') context = XPathContext(root) expected = [e for e in context.root if isinstance(e, ElementNode)] self.check_select("element(*)", expected, context) self.check_select("element(B)", expected, context) self.check_select("element(A)", [], context) if xmlschema is not None: schema = xmlschema.XMLSchema(dedent('''\ ''')) root = self.etree.XML('hello') context = XPathContext(root) with self.schema_bound_parser(schema.elements['root'].xpath_proxy): context.root.xsd_type = schema.elements['root'].type self.check_select("self::element(*, xs:string)", [context.root], context) self.check_select("self::element(*, xs:int)", [], context) def test_attribute_accessor(self): root = self.etree.XML('texttail') context = XPathContext(root) self.check_select("attribute()", {'10', '20'}, context) self.check_select("attribute(*)", {'10', '20'}, context) self.check_select("attribute(a)", ['10'], context) self.check_select("attribute(a, xs:int)", ['10'], context) if xmlschema is not None: schema = 
xmlschema.XMLSchema(""" """) with self.schema_bound_parser(schema.elements['A'].xpath_proxy): self.check_select("attribute(a, xs:int)", ['10'], context) self.check_select("attribute(*, xs:int)", {'10', '20'}, context) self.check_select("attribute(a, xs:string)", [], context) self.check_select("attribute(*, xs:string)", [], context) def test_node_and_node_accessors(self): element = self.etree.Element('schema') element.attrib.update([('id', '0212349350')]) context = XPathContext(root=element) self.check_select("self::node()", [context.root], context) self.check_select("self::attribute()", ['0212349350'], context) context.item = 7 self.check_select("node()", [], context) context.item = 10.2 self.check_select("node()", [], context) def test_union_intersect_except_operators(self): root = self.etree.XML('') self.check_selector('/A/B2 union /A/B1', root, root[:2]) self.check_selector('/A/B2 union /A/*', root, root[:]) self.check_selector('/A/B2 intersect /A/B1', root, []) self.check_selector('/A/B2 intersect /A/*', root, [root[1]]) self.check_selector('/A/B1/* intersect /A/B2/*', root, []) self.check_selector('/A/B1/* intersect /A/*/*', root, root[0][:]) self.check_selector('/A/B2 except /A/B1', root, root[1:2]) self.check_selector('/A/* except /A/B2', root, [root[0], root[2]]) self.check_selector('/A/*/* except /A/B2/*', root, root[0][:]) self.check_selector('/A/B2/* except /A/B1/*', root, root[1][:]) self.check_selector('/A/B2/* except /A/*/*', root, []) root = self.etree.XML('') # From variables like XPath 2.0 examples context = XPathContext(root, variables={ 'seq1': root[:2], # (A, B) 'seq2': root[:2], # (A, B) 'seq3': root[1:], # (B, C) }) self.check_select('$seq1 union $seq2', context.root[:2], context=context) self.check_select('$seq2 union $seq3', context.root[:], context=context) self.check_select('$seq1 intersect $seq2', context.root[:2], context=context) self.check_select('$seq2 intersect $seq3', context.root[1:2], context=context) self.check_select('$seq1 
except $seq2', [], context=context) self.check_select('$seq2 except $seq3', context.root[:1], context=context) self.wrong_type('1 intersect 1', 'XPTY0004', 'only XPath nodes are allowed', context=context) self.wrong_type('1 except $seq1', 'XPTY0004', 'only XPath nodes are allowed', context=context) self.wrong_type('1 union $seq1', 'XPTY0004', 'only XPath nodes are allowed', context=context) self.wrong_type('$seq1 intersect 1', 'XPTY0004', 'only XPath nodes are allowed', context=context) self.wrong_type('$seq1 union 1', 'XPTY0004', 'only XPath nodes are allowed', context=context) def test_node_comparison_operators(self): # Test cases from https://www.w3.org/TR/xpath20/#id-node-comparisons root = self.etree.XML(''' 1558604820QA76.9 C3845 0070512655QA76.9 C3846 0131477005QA76.9 C3847 ''') self.check_selector('/books/book[isbn="1558604820"] is /books/book[call="QA76.9 C3845"]', root, True) self.check_selector('/books/book[isbn="0070512655"] is /books/book[call="QA76.9 C3847"]', root, False) self.check_selector('/books/book[isbn="not a code"] is /books/book[call="QA76.9 C3847"]', root, []) context = XPathContext(root) self.check_value('/books/book[isbn="1558604820"] is ()', context=context) self.wrong_type('/books/book[isbn="1558604820"] is (1, 2)', 'XPTY0004', context=context) self.check_value('/books/book[isbn="1558604820"] << /books/book[isbn="1558604820"]', False, context=context) context = XPathContext(root, variables={'a': self.etree.Element('a'), 'b': self.etree.Element('b')}) self.wrong_value('$a << $b', 'FOCA0002', 'operands are not nodes of the XML tree', context=context) root = self.etree.XML(''' 28-451 33-870 15-392 35-530 10-639 10-639 39-729 ''') self.check_selector( '/transactions/purchase[parcel="28-451"] << /transactions/sale[parcel="33-870"]', root, True ) self.check_selector( '/transactions/purchase[parcel="15-392"] >> /transactions/sale[parcel="33-870"]', root, True ) self.check_selector( '/transactions/purchase[parcel="10-639"] >> 
/transactions/sale[parcel="33-870"]', root, TypeError ) self.wrong_type('is ()', 'XPST0017') self.wrong_syntax('is B', 'XPST0003') self.wrong_syntax('A is B is C', 'XPST0003') def test_empty_sequence_type(self): self.check_value("() treat as empty-sequence()", []) self.check_value("6 treat as empty-sequence()", TypeError) self.wrong_syntax("empty-sequence()") context = XPathContext(root=self.etree.XML('')) self.check_value("() instance of empty-sequence()", expected=True, context=context) self.check_value(". instance of empty-sequence()", expected=False, context=context) def test_item_sequence_type(self): self.check_value("4 treat as item()", MissingContextError) context = XPathContext(self.etree.XML('')) self.check_value("4 treat as item()", [4], context) self.check_value("() treat as item()", TypeError, context) self.wrong_syntax("item()") context = XPathContext(root=self.etree.XML('')) self.check_value(". instance of item()", expected=True, context=context) self.check_value("() instance of item()", expected=False, context=context) context = XPathContext(root=self.etree.parse(io.StringIO(''))) self.check_value(". 
instance of item()", expected=True, context=context) self.check_value("() instance of item()", expected=False, context=context) def test_static_analysis_phase(self): context = XPathContext(self.etree.XML(''), variables=self.variables) self.check_value('fn:concat($word, fn:lower-case(" BETA"))', 'alpha beta', context) self.check_value('fn:concat($word, fn:lower-case(10))', TypeError, context) self.check_value('fn:concat($unknown, fn:lower-case(10))', NameError, context) def test_instance_of_expression(self): element = self.etree.Element('schema') # Test cases from https://www.w3.org/TR/xpath20/#id-instance-of self.check_value("5 instance of xs:integer", True) self.check_value("5 instance of xs:decimal", True) self.check_value("9.0 instance of xs:integer", False) self.check_value("(5, 6) instance of xs:integer+", True) context = XPathContext(element) self.check_value(". instance of element()", True, context) context.item = None self.check_value(". instance of element()", False, context) self.check_value("(5, 6) instance of xs:integer", False) self.check_value("(5, 6) instance of xs:integer*", True) self.check_value("(5, 6) instance of xs:integer?", False) self.check_value("5 instance of empty-sequence()", False) self.check_value("() instance of empty-sequence()", True) self.wrong_syntax("5 instance of unknown()", 'XPST0003', "unknown function 'unknown'") self.wrong_syntax("1e3 instance of empty-sequence()(", 'XPST0003') # Test dynamic evaluation error on prefixed name parser = XPath2Parser() token = parser.parse('5 instance of xs:decimal') parser.namespaces.pop('xs') with self.assertRaises(NameError) as ctx: token.evaluate() self.assertIn('XPST0081', str(ctx.exception)) # From W3C XQuery/XPath tests context = XPathContext(element) self.check_value("not(1 instance of node())", True, context) self.check_value("(1, 2, 3, 4, 5) instance of item()+", True, context) self.check_value("(1, 2, 3, 4, 5) instance of item()", False, context) self.wrong_name("3 instance of void") 
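# Note on the occurrence indicators exercised by test_instance_of_expression:
# the assertions above rely on the XPath 2.0 rule that a sequence type is an
# item test plus an optional occurrence indicator ('' = exactly one, '?' = zero
# or one, '*' = any number, '+' = one or more). The sketch below is a minimal,
# standalone illustration of that rule only. It is NOT elementpath's
# implementation: occurrence_ok, is_instance and is_xs_integer are hypothetical
# helpers introduced here, and the mapping of xs:integer to Python int is a
# simplifying assumption.

```python
from typing import Any, Callable, Sequence


def occurrence_ok(count: int, indicator: str) -> bool:
    """Check a sequence length against an occurrence indicator."""
    if indicator == '':    # no indicator: exactly one item
        return count == 1
    if indicator == '?':   # zero or one item
        return count <= 1
    if indicator == '*':   # any number of items
        return True
    if indicator == '+':   # one or more items
        return count >= 1
    raise ValueError(f"unknown occurrence indicator {indicator!r}")


def is_instance(items: Sequence[Any],
                item_test: Callable[[Any], bool],
                indicator: str = '') -> bool:
    """Hypothetical model of 'items instance of <item test><indicator>'."""
    return occurrence_ok(len(items), indicator) and all(item_test(x) for x in items)


def is_xs_integer(value: Any) -> bool:
    """Assumed stand-in for xs:integer: a Python int that is not a bool."""
    return isinstance(value, int) and not isinstance(value, bool)


# Mirrors some of the expectations asserted in the test above.
print(is_instance([5], is_xs_integer))          # 5 instance of xs:integer
print(is_instance([5, 6], is_xs_integer, '+'))  # (5, 6) instance of xs:integer+
print(is_instance([5, 6], is_xs_integer))       # (5, 6) instance of xs:integer
print(is_instance([5, 6], is_xs_integer, '?'))  # (5, 6) instance of xs:integer?
```

# The length check runs before the item test, so a two-item sequence fails
# against a bare xs:integer or xs:integer? regardless of what the items are.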
def test_treat_as_expression(self): element = self.etree.Element('schema') context = XPathContext(element) self.check_value("5 treat as xs:integer", [5]) self.check_value("5 treat as xs:string", TypeError) self.check_value("5 treat as xs:decimal", [5]) self.check_value("(5, 6) treat as xs:integer+", [5, 6]) self.check_value(". treat as element()", [context.root], context) self.check_value("(5, 6) treat as xs:integer", TypeError) self.check_value("(5, 6) treat as xs:integer*", [5, 6]) self.check_value("(5, 6) treat as xs:integer?", TypeError) self.check_value("5 treat as empty-sequence()", TypeError) self.check_value("() treat as empty-sequence()", []) self.check_value("() treat as xs:integer?", []) self.wrong_type("() treat as xs:integer", 'XPDY0050') # Test dynamic evaluation error on prefixed name parser = XPath2Parser() token = parser.parse('5 treat as xs:decimal') parser.namespaces.pop('xs') with self.assertRaises(NameError) as ctx: token.evaluate() self.assertIn('XPST0081', str(ctx.exception)) # From W3C XQuery/XPath tests self.check_value("3 treat as item()+", [3], context) self.wrong_type("3 treat as node()+", 'XPDY0050', context=context) self.check_value("(1, 2, 3) treat as item()+", [1, 2, 3], context) self.wrong_type("(1, 2, 3) treat as item()", 'XPDY0050', context=context) self.wrong_name("3 treat as xs:doesNotExist") def test_castable_expression(self): self.check_value("5 castable as xs:integer", True) self.check_value("'5' castable as xs:integer", True) self.check_value("'hello' castable as xs:integer", False) self.check_value("('5', '6') castable as xs:integer", False) self.check_value("() castable as xs:integer", False) self.check_value("() castable as xs:integer?", True) self.wrong_syntax("5 castable as empty-sequence()", 'XPST0003') self.wrong_name("5 castable as void", 'XPST0051') self.check_value("5 castable as xs:void", False) self.check_value("'NaN' castable as xs:double", True) self.check_value("'None' castable as xs:double", False) 
self.check_value("'NaN' castable as xs:float", True) self.check_value("'NaN' castable as xs:integer", False) # From W3C XQuery/XPath tests self.check_value("(1E3) castable as xs:double?", True) def test_cast_expression(self): self.check_value("5 cast as xs:integer", 5) self.check_value("'5' cast as xs:integer", 5) self.check_value("'hello' cast as xs:integer", ValueError) self.check_value("('5', '6') cast as xs:integer", TypeError) self.check_value("() cast as xs:integer", TypeError) self.check_value("() cast as xs:integer?", []) self.check_value('"1" cast as xs:boolean', True) self.check_value('"0" cast as xs:boolean', False) self.check_value("xs:untypedAtomic('1E3') cast as xs:double", 1E3) self.wrong_value("xs:untypedAtomic('x') cast as xs:double", 'FORG0001') # Test dynamic evaluation error on prefixed name parser = XPath2Parser() token = parser.parse("() cast as xs:string?") parser.namespaces.pop('xs') with self.assertRaises(NameError) as ctx: token.evaluate() self.assertIn('XPST0081', str(ctx.exception)) @unittest.skipIf(xmlschema is None, "xmlschema library is not installed!") def test_cast_or_castable_with_derived_type(self): schema = xmlschema.XMLSchema(dedent("""\n """)) with self.schema_bound_parser(schema.xpath_proxy): root = self.etree.XML('') context = XPathContext(root) self.check_value("'1E3' castable as floatType", True, context) self.check_value("(1E3) castable as floatType", True, context) self.check_value("xs:untypedAtomic('1E3') cast as floatType", 1E3) self.check_value("xs:untypedAtomic('x') castable as floatType", False) self.wrong_value("xs:untypedAtomic('x') cast as floatType", 'FORG0001') self.wrong_value("'x' cast as floatType", 'FORG0001') self.wrong_type("xs:anyURI('http://xpath.test') cast as floatType", 'XPTY0004') def test_logical_expressions_(self): super(XPath2ParserTest, self).test_logical_expressions() if xmlschema is not None: schema = xmlschema.XMLSchema(""" """) with 
self.schema_bound_parser(schema.elements['root'].xpath_proxy): root_token = self.parser.parse("(@a and not(@b)) or (not(@a) and @b)") context = XPathContext(self.etree.XML('')) self.assertTrue(root_token.evaluate(context=context) is False) context = XPathContext(self.etree.XML('')) self.assertTrue(root_token.evaluate(context=context) is False) context = XPathContext(self.etree.XML('')) self.assertTrue(root_token.evaluate(context=context) is True) context = XPathContext(self.etree.XML('')) self.assertTrue(root_token.evaluate(context=context) is False) context = XPathContext(self.etree.XML('')) self.assertTrue(root_token.evaluate(context=context) is True) def test_element_decimal_cast(self): root = self.etree.XML(''' 155860482012.50 155860482013.50 1558604820-0.1 ''') expected_values = [Decimal('12.5'), Decimal('13.5'), Decimal('-0.1')] self.assertEqual(3, len(select(root, "//book"))) for book in iter_select(root, "//book"): context = XPathContext(root=root, item=book) root_token = self.parser.parse("xs:decimal(price)") self.assertEqual(expected_values.pop(0), root_token.evaluate(context)) def test_element_decimal_comparison_after_round(self): self.check_value('xs:decimal(0.36) = round(0.36*100) div 100', True) def test_tokenizer_ambiguity(self): # From issue #27 self.check_tokenizer("sch:pattern[@is-a]", ['sch', ':', 'pattern', '[', '@', 'is-a', ']']) self.check_tokenizer("/is-a", ['/', 'is-a']) self.check_tokenizer("/-is-a", ['/', '-', 'is-a']) def test_operator_ambiguity(self): # Related to issue #27 self.check_tokenizer("/is", ['/', 'is']) context = XPathContext(self.etree.XML('')) self.check_value('/is', [], context) context = XPathContext(self.etree.XML('')) self.check_value('/is', [context.root], context) self.check_value('/and', [], context) context = XPathContext(self.etree.XML('')) self.check_value('/and', [context.root], context) root = self.etree.XML('') context = XPathContext(self.etree.ElementTree(root)) self.check_value('and', [context.root.getroot()], 
context) root = self.etree.XML('') context = XPathContext(self.etree.ElementTree(root)) self.check_value('eq', [context.root.getroot()], context) root = self.etree.XML('') context = XPathContext(self.etree.ElementTree(root)) self.check_value('union', [context.root.getroot()], context) def test_statements_ambiguity(self): root = self.etree.XML('') context = XPathContext(self.etree.ElementTree(root)) self.check_value('for', [context.root.getroot()], context) def test_auxiliary_tokens(self): self.check_raise('as', MissingContextError) self.check_raise('of', MissingContextError) context = XPathContext(self.etree.XML('')) self.check_raise('as', MissingContextError, context=context) self.check_raise('of', MissingContextError, context=context) def test_function_namespace(self): function_namespace = "http://xpath.test/fn/xpath-functions" parser = self.parser.__class__( namespaces={'fn2': function_namespace}, function_namespace=function_namespace ) token = parser.parse('fn2:true()') self.assertTrue(token.evaluate()) def test_invalid_schema_argument(self): schema = dedent("""\ """) with self.assertRaises(TypeError) as ctx: self.parser.__class__(schema=schema) self.assertEqual(str(ctx.exception), "argument 'schema' must be an instance of AbstractSchemaProxy") if xmlschema is not None: with self.assertRaises(TypeError): self.parser.__class__(schema=xmlschema.XMLSchema(schema)) def test_variable_types_argument(self): variable_types = {'a': 'item()', 'b': 'xs:integer'} parser = self.parser.__class__(variable_types=variable_types) self.assertEqual(variable_types, parser.variable_types) self.assertIsNot(variable_types, parser.variable_types) with self.assertRaises(ValueError) as ctx: self.parser.__class__(variable_types={'a': 'item()', 'b': 'xs:complex'}) self.assertEqual(str(ctx.exception), "invalid sequence type for in-scope variable types") def test_document_types_argument(self): document_types = {'doc1': 'node()*', 'doc2': 'element()'} parser = 
self.parser.__class__(document_types=document_types) self.assertEqual(document_types, parser.document_types) self.assertIs(document_types, parser.document_types) with self.assertRaises(ValueError) as ctx: self.parser.__class__(document_types={'doc1': 'node()*', 'doc2': 'etree()'}) self.assertEqual(str(ctx.exception), "invalid sequence type in document_types argument") def test_collection_types_argument(self): collection_types = {'col1': 'node()*', 'col2': 'element()*'} parser = self.parser.__class__(collection_types=collection_types) self.assertEqual(collection_types, parser.collection_types) self.assertIs(collection_types, parser.collection_types) with self.assertRaises(ValueError) as ctx: self.parser.__class__(collection_types={'doc1': 'node()*', 'doc2': 'etree()*'}) self.assertEqual(str(ctx.exception), "invalid sequence type in collection_types argument") def test_default_collection_type_argument(self): parser = self.parser.__class__(default_collection_type='element()*') self.assertEqual(parser.default_collection_type, 'element()*') with self.assertRaises(ValueError) as ctx: self.parser.__class__(default_collection_type='elem()*') self.assertEqual(str(ctx.exception), "invalid sequence type for default_collection_type argument") def test_default_collation_argument(self): locale_collation = get_locale_category(locale.LC_COLLATE) if locale_collation == 'en_US.UTF-8': locale_collation = "http://www.w3.org/2005/xpath-functions/collation/codepoint" self.assertEqual(self.parser.__class__().default_collation, locale_collation) parser = self.parser.__class__(default_collation='it_IT.UTF-8') self.assertEqual(parser.default_collation, 'it_IT.UTF-8') def test_issue_35_getting_attribute_names(self): root = self.etree.XML(dedent("""\ some text T1 T1 T1 T1 T1 T2 T2 T2 T2 T2 """)) result = ['attrib1', 'attrib2', 'isbn', 'lang', 'isbn', 'lang'] self.check_selector('//@*/local-name()', root, result) self.check_selector('//@*/name()', root, result) @unittest.skipIf(lxml_etree is 
None, "The lxml library is not installed") class LxmlXPath2ParserTest(XPath2ParserTest): etree = lxml_etree if __name__ == '__main__': unittest.main() elementpath-3.0.2/tests/test_xpath30.py000066400000000000000000001375051427546011100201000ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # # # Note: Many tests are built using the examples of the XPath standards, # published by W3C under the W3C Document License. # # References: # https://www.w3.org/TR/xpath-3/ # https://www.w3.org/TR/xpath-30/ # https://www.w3.org/TR/xpath-31/ # https://www.w3.org/Consortium/Legal/2015/doc-license # https://www.w3.org/TR/charmod-norm/ # import unittest import os import re import math import pathlib import platform import xml.etree.ElementTree as ElementTree from typing import cast try: import lxml.etree as lxml_etree except ImportError: lxml_etree = None try: import xmlschema except ImportError: xmlschema = None else: xmlschema.XMLSchema.meta_schema.build() from elementpath import XPathContext, MissingContextError, datatypes, XPathFunction from elementpath.namespaces import XPATH_FUNCTIONS_NAMESPACE from elementpath.etree import is_etree_element, is_lxml_etree_document from elementpath.xpath_nodes import ElementNode, DocumentNode from elementpath.xpath3 import XPath30Parser from elementpath.xpath30.xpath30_helpers import PICTURE_PATTERN, \ int_to_roman, int_to_alphabetic, int_to_words try: from tests import test_xpath2_parser from tests import test_xpath2_functions from tests import test_xpath2_constructors except ImportError: import test_xpath2_parser import test_xpath2_functions import test_xpath2_constructors ANALYZE_STRING_1 = """ 2008-12-03 """ ANALYZE_STRING_2 = """ 
A1 , C15 ,, D24 , X50 , """ class XPath30ParserTest(test_xpath2_parser.XPath2ParserTest): def setUp(self): self.parser = XPath30Parser(namespaces=self.namespaces) def test_function_match(self): self.parser.parse('math:pi()') def test_braced_uri_literal(self): expected_lexemes = ['Q{', 'http', ':', '//', 'xpath.test', '/', 'ns', '}', 'ABC'] self.check_tokenizer("Q{http://xpath.test/ns}ABC", expected_lexemes) self.check_tokenizer("/Q{http://xpath.test/ns}ABC", ['/'] + expected_lexemes) self.check_tokenizer("Q{###}ABC", ['Q{', '#', '#', '#', '}', 'ABC']) token = self.parser.parse('/Q{http://xpath.test/ns}ABC') self.assertEqual(token.symbol, '/') self.assertEqual(token[0].symbol, 'Q{') with self.assertRaises(TypeError) as ctx: self.parser.parse('/Q{###}ABC') self.assertIn('XQST0046', str(ctx.exception)) token = self.parser.parse('Q{http://www.w3.org/2005/xpath-functions/math}pi()') self.assertAlmostEqual(token.evaluate(), math.pi) # '{' is unusable for non-standard braced URI literals # because it is used for inline function bodies with self.assertRaises(SyntaxError): self.parser.parse('{http://www.w3.org/2005/xpath-functions/math}pi()') def test_concat_operator(self): token = self.parser.parse("10 || '/' || 6") self.assertEqual(token.evaluate(), "10/6") self.check_tree('"true" || "false"', "(|| ('true') ('false'))") self.check_tree('"true"||"false"', "(|| ('true') ('false'))") def test_function_test(self): func: XPathFunction func = cast(XPathFunction, self.parser.parse("function($x as item()) as item() { $x }")) self.assertTrue(func.match_function_test('function(*)')) func = cast(XPathFunction, self.parser.parse("function($x as item()) as xs:integer { $x }")) self.assertTrue(func.match_function_test('function(item()) as item()')) func = cast(XPathFunction, self.parser.parse("function($x as item()) as item() { $x }")) self.assertTrue(func.match_function_test('function(xs:string) as item()')) def test_dynamic_function_call(self): token = self.parser.parse("$f(2, 3)") with
self.assertRaises(MissingContextError): token.evaluate() root = self.etree.XML('') context = XPathContext(root=root, variables={'f': 10}) with self.assertRaises(TypeError): token.evaluate(context) context.variables['f'] = self.parser.symbol_table['concat'](self.parser, nargs=2) self.assertEqual(token.evaluate(context), '23') with self.assertRaises(TypeError): self.parser.parse("f(2, 3)") with self.assertRaises(MissingContextError): token.evaluate() token = self.parser.parse('$f[2]("Hi there")') with self.assertRaises(MissingContextError): token.evaluate() context.variables['f'] = self.parser.symbol_table['concat'](self.parser, nargs=2) with self.assertRaises(TypeError): token.evaluate(context) context.variables['f'] = [1, context.variables['f']] with self.assertRaises(TypeError): token.evaluate(context) context.variables['f'] = self.parser.symbol_table['true'](self.parser, nargs=0) token = self.parser.parse('$f()[2]') with self.assertRaises(MissingContextError): token.evaluate() self.assertEqual(token.evaluate(context), []) token = self.parser.parse('$f()[1]') self.assertTrue(token.evaluate(context)) def test_let_expression(self): token = self.parser.parse('let $x := 4, $y := 3 return $x + $y') with self.assertRaises(MissingContextError): token.evaluate() root = self.etree.XML('') context = XPathContext(root=root) self.assertEqual(token.evaluate(context), [7]) def test_picture_pattern(self): self.assertListEqual(PICTURE_PATTERN.findall(''), []) self.assertListEqual(PICTURE_PATTERN.findall('a'), []) self.assertListEqual(PICTURE_PATTERN.findall('[y]'), ['[y]']) self.assertListEqual(PICTURE_PATTERN.findall('[h01][m01][z,2-6]'), ['[h01]', '[m01]', '[z,2-6]']) self.assertListEqual(PICTURE_PATTERN.findall('[H٠]:[m٠]:[s٠٠]:[f٠٠٠]'), ['[H٠]', '[m٠]', '[s٠٠]', '[f٠٠٠]']) self.assertListEqual(PICTURE_PATTERN.split(' [H٠]:[m٠]:[s٠٠]:[f٠٠٠]'), [' ', ':', ':', ':', '']) self.assertListEqual(PICTURE_PATTERN.findall('[y'), []) self.assertListEqual(PICTURE_PATTERN.findall('[[y]'), 
['[y]']) def test_int_to_roman(self): self.assertRaises(TypeError, int_to_roman, 3.0) self.assertEqual(int_to_roman(0), '0') self.assertEqual(int_to_roman(3), 'III') self.assertEqual(int_to_roman(4), 'IV') self.assertEqual(int_to_roman(5), 'V') self.assertEqual(int_to_roman(7), 'VII') self.assertEqual(int_to_roman(9), 'IX') self.assertEqual(int_to_roman(10), 'X') self.assertEqual(int_to_roman(11), 'XI') self.assertEqual(int_to_roman(19), 'XIX') self.assertEqual(int_to_roman(20), 'XX') self.assertEqual(int_to_roman(49), 'XLIX') self.assertEqual(int_to_roman(100), 'C') self.assertEqual(int_to_roman(489), 'CDLXXXIX') self.assertEqual(int_to_roman(2999), 'MMCMXCIX') def test_int_to_alphabetic(self): self.assertEqual(int_to_alphabetic(4), 'd') self.assertEqual(int_to_alphabetic(7), 'g') self.assertEqual(int_to_alphabetic(25), 'y') self.assertEqual(int_to_alphabetic(26), 'z') self.assertEqual(int_to_alphabetic(27), 'aa') self.assertEqual(int_to_alphabetic(-29), '-ac') self.assertEqual(int_to_alphabetic(890), 'ahf') def test_int_to_words(self): self.assertEqual(int_to_words(1), 'one') self.assertEqual(int_to_words(4), 'four') @unittest.skipIf(lxml_etree is None, "The lxml library is not installed") class LxmlXPath30ParserTest(XPath30ParserTest): etree = lxml_etree class XPath30FunctionsTest(test_xpath2_functions.XPath2FunctionsTest): maxDiff = 1024 def setUp(self): self.parser = XPath30Parser(namespaces=self.namespaces) # Make sure the tests are repeatable. 
env_vars_to_tweak = 'LC_ALL', 'LANG' self.current_env_vars = {v: os.environ.get(v) for v in env_vars_to_tweak} for v in self.current_env_vars: os.environ[v] = 'en_US.UTF-8' def tearDown(self): if hasattr(self, 'current_env_vars'): for v in self.current_env_vars: if self.current_env_vars[v] is not None: os.environ[v] = self.current_env_vars[v] else: os.environ.pop(v, None) def test_pi_math_function(self): token = self.parser.parse('math:pi()') self.assertEqual(token.evaluate(), math.pi) def test_exp_math_function(self): token = self.parser.parse('math:exp(())') self.assertIsNone(token.evaluate()) self.assertAlmostEqual(self.parser.parse('math:exp(0)').evaluate(), 1.0) self.assertAlmostEqual(self.parser.parse('math:exp(1)').evaluate(), 2.718281828459045) self.assertAlmostEqual(self.parser.parse('math:exp(2)').evaluate(), 7.38905609893065) self.assertAlmostEqual(self.parser.parse('math:exp(-1)').evaluate(), 0.36787944117144233) self.assertAlmostEqual(self.parser.parse('math:exp(math:pi())').evaluate(), 23.140692632779267) self.assertTrue(math.isnan(self.parser.parse('math:exp(xs:double("NaN"))').evaluate())) self.assertEqual(self.parser.parse("math:exp(xs:double('INF'))").evaluate(), float('inf')) self.assertAlmostEqual(self.parser.parse("math:exp(xs:double('-INF'))").evaluate(), 0.0) def test_exp10_math_function(self): token = self.parser.parse('math:exp10(())') self.assertIsNone(token.evaluate()) self.assertAlmostEqual(self.parser.parse('math:exp10(0)').evaluate(), 1.0) self.assertAlmostEqual(self.parser.parse('math:exp10(1)').evaluate(), 10) self.assertAlmostEqual(self.parser.parse('math:exp10(0.5)').evaluate(), 3.1622776601683795) self.assertAlmostEqual(self.parser.parse('math:exp10(-1)').evaluate(), 0.1) self.assertTrue(math.isnan(self.parser.parse('math:exp10(xs:double("NaN"))').evaluate())) self.assertEqual(self.parser.parse("math:exp10(xs:double('INF'))").evaluate(), float('inf')) self.assertAlmostEqual(self.parser.parse("math:exp10(xs:double('-INF'))").evaluate(), 0.0) def
test_log_math_function(self): token = self.parser.parse('math:log(())') self.assertIsNone(token.evaluate()) self.assertEqual(self.parser.parse('math:log(0)').evaluate(), float('-inf')) self.assertAlmostEqual(self.parser.parse('math:log(math:exp(1))').evaluate(), 1.0) self.assertAlmostEqual(self.parser.parse('math:log(1.0e-3)').evaluate(), -6.907755278982137) self.assertAlmostEqual(self.parser.parse('math:log(2)').evaluate(), 0.6931471805599453) self.assertTrue(math.isnan(self.parser.parse('math:log(-1)').evaluate())) self.assertTrue(math.isnan(self.parser.parse('math:log(xs:double("NaN"))').evaluate())) self.assertEqual(self.parser.parse("math:log(xs:double('INF'))").evaluate(), float('inf')) self.assertTrue(math.isnan(self.parser.parse('math:log(xs:double("-INF"))').evaluate())) def test_log10_math_function(self): token = self.parser.parse('math:log10(())') self.assertIsNone(token.evaluate()) self.assertEqual(self.parser.parse('math:log10(0)').evaluate(), float('-inf')) self.assertAlmostEqual(self.parser.parse('math:log10(1.0e3)').evaluate(), 3.0) self.assertAlmostEqual(self.parser.parse('math:log10(1.0e-3)').evaluate(), -3.0) self.assertAlmostEqual(self.parser.parse('math:log10(2)').evaluate(), 0.3010299956639812) self.assertTrue(math.isnan(self.parser.parse('math:log10(-1)').evaluate())) self.assertTrue(math.isnan(self.parser.parse('math:log10(xs:double("NaN"))').evaluate())) self.assertEqual(self.parser.parse("math:log10(xs:double('INF'))").evaluate(), float('inf')) self.assertTrue(math.isnan(self.parser.parse('math:log10(xs:double("-INF"))').evaluate())) def test_pow_math_function(self): self.assertIsNone(self.parser.parse('math:pow((), 93.7)').evaluate()) self.assertAlmostEqual(self.parser.parse('math:pow(2, 3)').evaluate(), 8.0) self.assertAlmostEqual(self.parser.parse('math:pow(-2, 3)').evaluate(), -8.0) self.assertAlmostEqual(self.parser.parse('math:pow(2, -3)').evaluate(), 0.125) self.assertAlmostEqual(self.parser.parse('math:pow(-2, -3)').evaluate(), 
-0.125) self.assertAlmostEqual(self.parser.parse('math:pow(2, 0)').evaluate(), 1.0) self.assertAlmostEqual(self.parser.parse('math:pow(0, 0)').evaluate(), 1.0) self.assertAlmostEqual(self.parser.parse("math:pow(xs:double('INF'), 0)").evaluate(), 1.0) self.assertAlmostEqual(self.parser.parse("math:pow(xs:double('NaN'), 0)").evaluate(), 1.0) self.assertAlmostEqual(self.parser.parse("math:pow(-math:pi(), 0)").evaluate(), 1.0) self.assertAlmostEqual(self.parser.parse('math:pow(0e0, 3)').evaluate(), 0.0) self.assertAlmostEqual(self.parser.parse('math:pow(0e0, 4)').evaluate(), 0.0) self.assertAlmostEqual(self.parser.parse('math:pow(-0e0, 3)').evaluate(), 0.0) self.assertAlmostEqual(self.parser.parse('math:pow(0, 3)').evaluate(), 0.0) self.assertEqual(self.parser.parse('math:pow(0e0, -3)').evaluate(), float('inf')) self.assertEqual(self.parser.parse('math:pow(0e0, -4)').evaluate(), float('inf')) # self.assertEqual(self.parser.parse('math:pow(-0e0, -3)').evaluate(), float('-inf')) self.assertEqual(self.parser.parse('math:pow(0, -4)').evaluate(), float('inf')) self.assertAlmostEqual(self.parser.parse('math:pow(16, 0.5e0)').evaluate(), 4.0) self.assertAlmostEqual(self.parser.parse('math:pow(16, 0.25e0)').evaluate(), 2.0) self.assertEqual(self.parser.parse('math:pow(0e0, -3.0e0)').evaluate(), float('inf')) # self.assertEqual(self.parser.parse('math:pow(-0e0, -3.0e0)').evaluate(), float('-inf')) self.assertEqual(self.parser.parse('math:pow(0e0, -3.1e0)').evaluate(), float('inf')) self.assertEqual(self.parser.parse('math:pow(-0e0, -3.1e0)').evaluate(), float('inf')) self.assertAlmostEqual(self.parser.parse('math:pow(0e0, 3.0e0)').evaluate(), 0.0) self.assertAlmostEqual(self.parser.parse('math:pow(-0e0, 3.0e0)').evaluate(), -0.0) self.assertAlmostEqual(self.parser.parse('math:pow(0e0, 3.1e0)').evaluate(), 0.0) self.assertAlmostEqual(self.parser.parse('math:pow(-0e0, 3.1e0)').evaluate(), -0.0) self.assertAlmostEqual(self.parser.parse("math:pow(-1, xs:double('INF'))").evaluate(), 
1.0) self.assertAlmostEqual(self.parser.parse("math:pow(-1, xs:double('-INF'))").evaluate(), 1.0) self.assertAlmostEqual(self.parser.parse("math:pow(1, xs:double('INF'))").evaluate(), 1.0) self.assertAlmostEqual(self.parser.parse("math:pow(1, xs:double('-INF'))").evaluate(), 1.0) self.assertAlmostEqual(self.parser.parse("math:pow(1, xs:double('NaN'))").evaluate(), 1.0) self.assertAlmostEqual(self.parser.parse('math:pow(-2.5e0, 2.0e0)').evaluate(), 6.25) self.assertTrue(math.isnan(self.parser.parse('math:pow(-2.5e0, 2.00000001e0)').evaluate())) def test_sqrt_math_function(self): self.assertIsNone(self.parser.parse('math:sqrt(())').evaluate()) self.assertAlmostEqual(self.parser.parse('math:sqrt(0.0e0)').evaluate(), 0.0) self.assertAlmostEqual(self.parser.parse('math:sqrt(-0.0e0)').evaluate(), -0.0) self.assertAlmostEqual(self.parser.parse('math:sqrt(1.0e6)').evaluate(), 1.0e3) self.assertAlmostEqual(self.parser.parse('math:sqrt(2.0e0)').evaluate(), 1.4142135623730951) self.assertTrue(math.isnan(self.parser.parse('math:sqrt(-2.0e0)').evaluate())) self.assertTrue(math.isnan(self.parser.parse("math:sqrt(xs:double('NaN'))").evaluate())) self.assertEqual(self.parser.parse("math:sqrt(xs:double('INF'))").evaluate(), float('inf')) self.assertTrue(math.isnan(self.parser.parse("math:sqrt(xs:double('-INF'))").evaluate())) def test_sin_math_function(self): self.assertIsNone(self.parser.parse('math:sin(())').evaluate()) self.assertAlmostEqual(self.parser.parse('math:sin(0)').evaluate(), 0.0) self.assertAlmostEqual(self.parser.parse('math:sin(-0.0e0)').evaluate(), -0.0) self.assertAlmostEqual(self.parser.parse('math:sin(math:pi() div 2)').evaluate(), 1.0) self.assertAlmostEqual(self.parser.parse('math:sin(-math:pi() div 2)').evaluate(), -1.0) self.assertAlmostEqual(self.parser.parse('math:sin(math:pi())').evaluate(), 0.0, places=13) self.assertTrue(math.isnan(self.parser.parse("math:sin(xs:double('NaN'))").evaluate())) 
self.assertTrue(math.isnan(self.parser.parse("math:sin(xs:double('INF'))").evaluate())) self.assertTrue(math.isnan(self.parser.parse("math:sin(xs:double('-INF'))").evaluate())) def test_cos_math_function(self): self.assertIsNone(self.parser.parse('math:cos(())').evaluate()) self.assertAlmostEqual(self.parser.parse('math:cos(0)').evaluate(), 1.0) self.assertAlmostEqual(self.parser.parse('math:cos(-0.0e0)').evaluate(), 1.0) self.assertAlmostEqual(self.parser.parse('math:cos(math:pi() div 2)').evaluate(), 0.0, places=13) self.assertAlmostEqual(self.parser.parse('math:cos(-math:pi() div 2)').evaluate(), 0.0, places=13) self.assertAlmostEqual(self.parser.parse('math:cos(math:pi())').evaluate(), -1.0) self.assertTrue(math.isnan(self.parser.parse("math:cos(xs:double('NaN'))").evaluate())) self.assertTrue(math.isnan(self.parser.parse("math:cos(xs:double('INF'))").evaluate())) self.assertTrue(math.isnan(self.parser.parse("math:cos(xs:double('-INF'))").evaluate())) def test_tan_math_function(self): self.assertIsNone(self.parser.parse('math:tan(())').evaluate()) self.assertAlmostEqual(self.parser.parse('math:tan(0)').evaluate(), 0.0) self.assertAlmostEqual(self.parser.parse('math:tan(-0.0e0)').evaluate(), -0.0) self.assertAlmostEqual(self.parser.parse('math:tan(math:pi() div 4)').evaluate(), 1.0, places=13) self.assertAlmostEqual(self.parser.parse('math:tan(-math:pi() div 4)').evaluate(), -1.0, places=13) self.assertAlmostEqual(self.parser.parse('math:tan(math:pi() div 2)').evaluate(), 1.633123935319537E16, places=13) self.assertAlmostEqual(self.parser.parse('math:tan(-math:pi() div 2)').evaluate(), -1.633123935319537E16, places=13) self.assertAlmostEqual(self.parser.parse('math:tan(math:pi())').evaluate(), 0.0, places=13) self.assertTrue(math.isnan(self.parser.parse("math:tan(xs:double('NaN'))").evaluate())) self.assertTrue(math.isnan(self.parser.parse("math:tan(xs:double('INF'))").evaluate())) 
self.assertTrue(math.isnan(self.parser.parse("math:tan(xs:double('-INF'))").evaluate())) def test_asin_math_function(self): self.assertIsNone(self.parser.parse('math:asin(())').evaluate()) self.assertAlmostEqual(self.parser.parse('math:asin(0)').evaluate(), 0.0) self.assertAlmostEqual(self.parser.parse('math:asin(-0.0e0)').evaluate(), -0.0) self.assertAlmostEqual( self.parser.parse('math:asin(1.0e0)').evaluate(), 1.5707963267948966e0, places=13 ) self.assertAlmostEqual( self.parser.parse('math:asin(-1.0e0)').evaluate(), -1.5707963267948966e0, places=13 ) self.assertTrue(math.isnan(self.parser.parse("math:asin(2.0e0)").evaluate())) self.assertTrue(math.isnan(self.parser.parse("math:asin(xs:double('NaN'))").evaluate())) self.assertTrue(math.isnan(self.parser.parse("math:asin(xs:double('INF'))").evaluate())) self.assertTrue(math.isnan(self.parser.parse("math:asin(xs:double('-INF'))").evaluate())) def test_acos_math_function(self): self.assertIsNone(self.parser.parse('math:acos(())').evaluate()) self.assertAlmostEqual( self.parser.parse('math:acos(0.0e0)').evaluate(), 1.5707963267948966e0, places=13 ) self.assertAlmostEqual( self.parser.parse('math:acos(-0.0e0)').evaluate(), 1.5707963267948966e0, places=13 ) self.assertAlmostEqual(self.parser.parse('math:acos(1.0)').evaluate(), 0.0) self.assertAlmostEqual(self.parser.parse('math:acos(-1.0e0)').evaluate(), math.pi) self.assertTrue(math.isnan(self.parser.parse("math:acos(2.0e0)").evaluate())) self.assertTrue(math.isnan(self.parser.parse("math:acos(xs:double('NaN'))").evaluate())) self.assertTrue(math.isnan(self.parser.parse("math:acos(xs:double('INF'))").evaluate())) self.assertTrue(math.isnan(self.parser.parse("math:acos(xs:double('-INF'))").evaluate())) def test_atan_math_function(self): self.assertIsNone(self.parser.parse('math:atan(())').evaluate()) self.assertAlmostEqual(self.parser.parse('math:atan(0)').evaluate(), 0.0) self.assertAlmostEqual(self.parser.parse('math:atan(-0.0e0)').evaluate(), -0.0) 
self.assertAlmostEqual( self.parser.parse('math:atan(1.0e0)').evaluate(), 0.7853981633974483e0, places=13 ) self.assertAlmostEqual( self.parser.parse('math:atan(-1.0e0)').evaluate(), -0.7853981633974483e0, places=13 ) self.assertTrue(math.isnan(self.parser.parse("math:atan(xs:double('NaN'))").evaluate())) self.assertAlmostEqual( self.parser.parse("math:atan(xs:double('INF'))").evaluate(), 1.5707963267948966e0, places=5 ) self.assertAlmostEqual( self.parser.parse("math:atan(xs:double('-INF'))").evaluate(), -1.5707963267948966e0, places=5 ) def test_atan2_math_function(self): self.assertAlmostEqual(self.parser.parse('math:atan2(+0.0e0, 0.0e0)').evaluate(), 0.0) self.assertAlmostEqual(self.parser.parse('math:atan2(-0.0e0, 0.0e0)').evaluate(), -0.0) self.assertAlmostEqual(self.parser.parse('math:atan2(+0.0e0, -0.0e0)').evaluate(), math.pi) self.assertAlmostEqual(self.parser.parse('math:atan2(-0.0e0, -0.0e0)').evaluate(), -math.pi) self.assertAlmostEqual(self.parser.parse('math:atan2(-1, 0.0e0)').evaluate(), -math.pi / 2) self.assertAlmostEqual(self.parser.parse('math:atan2(+1, 0.0e0)').evaluate(), math.pi / 2) self.assertAlmostEqual(self.parser.parse('math:atan2(-0.0e0, -1)').evaluate(), -math.pi) self.assertAlmostEqual(self.parser.parse('math:atan2(+0.0e0, -1)').evaluate(), math.pi) self.assertAlmostEqual(self.parser.parse('math:atan2(-0.0e0, +1)').evaluate(), -0.0e0) self.assertAlmostEqual(self.parser.parse('math:atan2(+0.0e0, +1)').evaluate(), 0.0e0) def test_analyze_string_function(self): context = XPathContext(root=self.etree.XML('')) token = self.parser.parse('fn:analyze-string("The cat sat on the mat.", "unmatchable")') result = token.evaluate(context) self.assertIsInstance(result, ElementNode) root = result.elem self.assertEqual(len(root), 1) self.assertEqual(root[0].text, "The cat sat on the mat.") token = self.parser.parse(r'fn:analyze-string("The cat sat on the mat.", "\w+")') result = token.evaluate(context) self.assertIsInstance(result, ElementNode) root = 
result.elem self.assertEqual(len(root), 12) chunks = ['The', ' ', 'cat', ' ', 'sat', ' ', 'on', ' ', 'the', ' ', 'mat', '.'] for k in range(len(chunks)): if k % 2: self.assertEqual(root[k].tag, '{http://www.w3.org/2005/xpath-functions}non-match') else: self.assertEqual(root[k].tag, '{http://www.w3.org/2005/xpath-functions}match') self.assertEqual(root[k].text, chunks[k]) token = self.parser.parse(r'fn:analyze-string("2008-12-03", "^(\d+)\-(\d+)\-(\d+)$")') result = token.evaluate(context) self.assertIsInstance(result, ElementNode) root = result.elem self.assertEqual(len(root), 1) ElementTree.register_namespace('', XPATH_FUNCTIONS_NAMESPACE) self.assertEqual( ElementTree.tostring(root, encoding='utf-8').decode('utf-8'), re.sub(r'\n\s*', '', ANALYZE_STRING_1) ) token = self.parser.parse('fn:analyze-string("A1,C15,,D24, X50,", "([A-Z])([0-9]+)")') result = token.evaluate(context) self.assertIsInstance(result, ElementNode) root = result.elem self.assertEqual(len(root), 8) self.assertEqual( ElementTree.tostring(root, encoding='utf-8').decode('utf-8'), re.sub(r'\n\s*', '', ANALYZE_STRING_2) ) def test_has_children_function(self): with self.assertRaises(MissingContextError): self.parser.parse('has-children()').evaluate() with self.assertRaises(MissingContextError): self.parser.parse('fn:has-children(1)').evaluate() context = XPathContext(root=self.etree.ElementTree(self.etree.XML(''))) self.assertTrue(self.parser.parse('has-children()').evaluate(context)) self.assertTrue(self.parser.parse('has-children(.)').evaluate(context)) context = XPathContext(root=self.etree.XML('')) self.assertFalse(self.parser.parse('has-children()').evaluate(context)) self.assertFalse(self.parser.parse('has-children(.)').evaluate(context)) context.item = None self.assertFalse(self.parser.parse('has-children()').evaluate(context)) self.assertFalse(self.parser.parse('has-children(.)').evaluate(context)) context.variables['elem'] = ElementNode(self.etree.XML('')) 
self.assertTrue(self.parser.parse('has-children($elem)').evaluate(context)) self.assertFalse(self.parser.parse('has-children($elem/b1)').evaluate(context)) def test_innermost_function(self): with self.assertRaises(MissingContextError): self.parser.parse('fn:innermost(A)').evaluate() root = self.etree.XML('') document = self.etree.ElementTree(root) context = XPathContext(root=document) nodes = self.parser.parse('fn:innermost(.)').evaluate(context) self.assertIsInstance(nodes, list) self.assertEqual(len(nodes), 1) self.assertIs(nodes[0], context.root) context = XPathContext(root=root) nodes = self.parser.parse('fn:innermost(.)').evaluate(context) self.assertIsInstance(nodes, list) self.assertEqual(len(nodes), 1) self.assertIs(nodes[0], context.root) context = XPathContext(root=document, variables={'nodes': [root, document]}) nodes = self.parser.parse('fn:innermost($nodes)').evaluate(context) self.assertIsInstance(nodes, list) self.assertEqual(len(nodes), 1) self.assertIsInstance(nodes[0], ElementNode) self.assertIs(nodes[0].value, root) root = self.etree.XML('') document = self.etree.ElementTree(root) context = XPathContext( root=document, variables={'nodes': [root, document, root[0], root[0]]} ) nodes = self.parser.parse('fn:innermost($nodes)').evaluate(context) self.assertIsInstance(nodes, list) self.assertEqual(len(nodes), 1) self.assertIs(nodes[0].value, root[0]) context = XPathContext( root=document, variables={'nodes': [document, root[0][0], root, document, root[0], root[1]]} ) nodes = self.parser.parse('fn:innermost($nodes)').evaluate(context) self.assertIsInstance(nodes, list) self.assertEqual(len(nodes), 2) self.assertIs(nodes[0].value, root[0][0]) self.assertIs(nodes[1].value, root[1]) def test_outermost_function(self): with self.assertRaises(MissingContextError): self.parser.parse('fn:outermost(A)').evaluate() root = self.etree.XML('') document = self.etree.ElementTree(root) context = XPathContext(root=document) nodes = 
self.parser.parse('fn:outermost(.)').evaluate(context) self.assertIsInstance(nodes, list) self.assertEqual(len(nodes), 1) self.assertIs(nodes[0], context.root) context = XPathContext(root=root) nodes = self.parser.parse('fn:outermost(.)').evaluate(context) self.assertIsInstance(nodes, list) self.assertEqual(len(nodes), 1) self.assertIs(nodes[0], context.root) context = XPathContext(root=document, variables={'nodes': [root, document]}) nodes = self.parser.parse('fn:outermost($nodes)').evaluate(context) self.assertIsInstance(nodes, list) self.assertEqual(len(nodes), 1) self.assertIsInstance(nodes[0], DocumentNode) self.assertIs(nodes[0].value, document) root = self.etree.XML('') document = self.etree.ElementTree(root) context = XPathContext( root=document, variables={'nodes': [root, document, root[0], document]} ) nodes = self.parser.parse('fn:outermost($nodes)').evaluate(context) self.assertIsInstance(nodes, list) self.assertEqual(len(nodes), 1) self.assertIsInstance(nodes[0], DocumentNode) self.assertIs(nodes[0].value, document) context = XPathContext( root=document, variables={'nodes': [document, root[0][0], root, document, root[0], root[1]]} ) nodes = self.parser.parse('fn:outermost($nodes)').evaluate(context) self.assertIsInstance(nodes, list) self.assertEqual(len(nodes), 1) self.assertIsInstance(nodes[0], DocumentNode) self.assertIs(nodes[0].value, document) context = XPathContext( root=document, variables={'nodes': [root[0][0], root[1], root[0]]} ) nodes = self.parser.parse('fn:outermost($nodes)').evaluate(context) self.assertIsInstance(nodes, list) self.assertEqual(len(nodes), 2) self.assertIs(nodes[0].value, root[0]) self.assertIs(nodes[1].value, root[1]) def test_parse_xml_function(self): with self.assertRaises(MissingContextError): self.parser.parse('fn:parse-xml("abcd")').evaluate() root = self.etree.XML('') context = XPathContext(root=self.etree.ElementTree(root)) document = self.parser.parse('fn:parse-xml("abcd")').evaluate(context) 
self.assertIsInstance(document, DocumentNode) self.assertTrue(is_etree_element(document.document.getroot())) self.assertEqual(document.document.getroot().tag, 'alpha') self.assertEqual(document.document.getroot().text, 'abcd') if self.etree is lxml_etree: self.assertTrue(is_lxml_etree_document(document.document)) else: self.assertFalse(is_lxml_etree_document(document.document)) self.assertEqual(document.document.getroot().tag, 'alpha') self.assertEqual(document.document.getroot().text, 'abcd') self.assertEqual(self.parser.parse('fn:parse-xml(())').evaluate(), []) with self.assertRaises(ValueError) as ctx: self.parser.parse('fn:parse-xml("abcd")').evaluate(context) self.assertIn('FODC0006', str(ctx.exception)) self.assertIn('not a well-formed XML document', str(ctx.exception)) def test_parse_xml_fragment_function(self): root = self.etree.XML('') context = XPathContext(root=self.etree.ElementTree(root)) result = self.parser.parse( 'fn:parse-xml-fragment("abcdabcd")' ).evaluate(context) self.assertIsInstance(result, DocumentNode) document = result.document self.assertTrue(is_etree_element(document.getroot())) self.assertEqual(document.getroot().tag, 'root') self.assertEqual(document.getroot()[0].tag, 'alpha') self.assertEqual(document.getroot()[0].text, 'abcd') self.assertEqual(document.getroot()[1].tag, 'beta') self.assertEqual(document.getroot()[1].text, 'abcd') # Fragments that are not valid formal documents result = self.parser.parse( 'fn:parse-xml-fragment("abcdabcd")' ).evaluate(context) self.assertIsInstance(result, ElementNode) root = result.elem self.assertTrue(is_etree_element(root)) self.assertEqual(root.tag, 'document') self.assertEqual(root[0].tag, 'alpha') self.assertEqual(root[0].text, 'abcd') self.assertEqual(root[1].tag, 'beta') self.assertEqual(root[1].text, 'abcd') result = self.parser.parse( 'fn:parse-xml-fragment("He was so kind")' ).evaluate(context) self.assertIsInstance(result, ElementNode) root = result.elem 
self.assertTrue(is_etree_element(root)) self.assertEqual(root.text, 'He was ') self.assertEqual(root.tag, 'document') self.assertEqual(root[0].tag, 'i') self.assertEqual(root[0].text, 'so') self.assertEqual(root[0].tail, ' kind') element_node = self.parser.parse('fn:parse-xml-fragment("")').evaluate(context) self.assertTrue(is_etree_element(element_node.elem)) self.assertEqual(element_node.elem.tag, 'document') self.assertIsNone(element_node.elem.text) element_node = self.parser.parse('fn:parse-xml-fragment(" ")').evaluate(context) self.assertTrue(is_etree_element(element_node.elem)) self.assertEqual(element_node.elem.tag, 'document') self.assertEqual(element_node.elem.text, ' ') with self.assertRaises(MissingContextError): self.parser.parse( 'fn:parse-xml(\'\')' ).evaluate() root = self.etree.XML('') context = XPathContext(root=self.etree.ElementTree(root)) with self.assertRaises(ValueError) as ctx: self.parser.parse( 'fn:parse-xml(\'\')' ).evaluate(context) self.assertIn('FODC0006', str(ctx.exception)) self.assertIn('not a well-formed XML document', str(ctx.exception)) def test_serialize_function(self): root = self.etree.XML('') document = self.etree.ElementTree(root) context = XPathContext( root=document, variables={ 'params': ElementTree.XML( '' ' ' '' ), 'data': self.etree.XML("") } ) result = self.parser.parse('fn:serialize($data, $params)').evaluate(context) self.assertEqual(result.replace(' />', '/>'), '') def test_head_function(self): self.assertEqual(self.parser.parse('fn:head(1 to 5)').evaluate(), 1) self.assertEqual(self.parser.parse('fn:head(("a", "b", "c"))').evaluate(), 'a') self.assertIsNone(self.parser.parse('fn:head(())').evaluate()) def test_tail_function(self): self.assertListEqual(self.parser.parse('fn:tail(1 to 5)').evaluate(), [2, 3, 4, 5]) self.assertListEqual(self.parser.parse('fn:tail(("a", "b", "c"))').evaluate(), ['b', 'c']) self.assertListEqual(self.parser.parse('fn:tail(("a"))').evaluate(), []) 
self.assertListEqual(self.parser.parse('fn:tail(())').evaluate(), []) def test_generate_id_function(self): with self.assertRaises(MissingContextError): self.parser.parse('fn:generate-id()').evaluate() with self.assertRaises(TypeError) as ctx: self.parser.parse('fn:generate-id(1)').evaluate() self.assertIn('XPTY0004', str(ctx.exception)) self.assertIn('argument is not a node', str(ctx.exception)) root = self.etree.XML('') context = XPathContext(root=root) result = self.parser.parse('fn:generate-id()').evaluate(context) self.assertEqual(result, 'ID{}'.format(id(context.item))) result = self.parser.parse('fn:generate-id(.)').evaluate(context) self.assertEqual(result, 'ID{}'.format(id(context.item))) context.item = 1 with self.assertRaises(TypeError) as ctx: self.parser.parse('fn:generate-id()').evaluate(context) self.assertIn('XPTY0004', str(ctx.exception)) self.assertIn('context item is not a node', str(ctx.exception)) def test_unparsed_text_function(self): with self.assertRaises(ValueError) as ctx: self.parser.parse('fn:unparsed-text("alpha#fragment")').evaluate() self.assertIn('FOUT1170', str(ctx.exception)) self.assertIsNone(self.parser.parse('fn:unparsed-text(())').evaluate()) if platform.system() != 'Windows': filepath = pathlib.Path(__file__).absolute().parent.joinpath('resources/sample.xml') file_lines = ['', 'abc àèéìù'] # Checks before that the resource text file is accessible and its content is as expected with filepath.open() as fp: text = fp.read() self.assertListEqual([x.strip() for x in text.strip().split('\n')], file_lines) path = 'fn:unparsed-text("file://{}")'.format(str(filepath)) text = self.parser.parse(path).evaluate() self.assertListEqual([x.strip() for x in text.strip().split('\n')], file_lines) path = 'fn:unparsed-text("file://{}", "unknown")'.format(str(filepath)) with self.assertRaises(ValueError) as ctx: self.parser.parse(path).evaluate() self.assertIn('FOUT1190', str(ctx.exception)) def test_environment_variable_function(self): with 
self.assertRaises(MissingContextError): self.parser.parse('fn:environment-variable("PATH")').evaluate() root = self.etree.XML('') context = XPathContext(root=root) path = 'fn:environment-variable("PATH")' self.assertIsNone(self.parser.parse(path).evaluate(context)) context = XPathContext(root=root, allow_environment=True) try: key = list(os.environ)[0] except IndexError: pass else: path = 'fn:environment-variable("{}")'.format(key) self.assertEqual(self.parser.parse(path).evaluate(context), os.environ[key]) def test_available_environment_variables_function(self): with self.assertRaises(MissingContextError): self.parser.parse('fn:available-environment-variables()').evaluate() root = self.etree.XML('') context = XPathContext(root=root) path = 'fn:available-environment-variables()' self.assertIsNone(self.parser.parse(path).evaluate(context)) context = XPathContext(root=root, allow_environment=True) self.assertListEqual(self.parser.parse(path).evaluate(context), list(os.environ)) def test_inline_function_expression(self): token = self.parser.parse("function() as xs:integer+ { 2, 3, 5, 7, 11, 13 }") with self.assertRaises(MissingContextError): token.evaluate() root = self.etree.XML('') context = XPathContext(root=root, variables={'a': 9.0, 'b': 3.0}) self.assertListEqual(token(context), [2, 3, 5, 7, 11, 13]) token = self.parser.parse( "function($a as xs:double, $b as xs:double) as xs:double { $a * $b } (9.0, 3.0)" ) with self.assertRaises(MissingContextError): token.evaluate() root = self.etree.XML('') context = XPathContext(root=root) self.assertAlmostEqual(token(context), 27.0) token = self.parser.parse("function($a) { $a } (10)") with self.assertRaises(MissingContextError): token.evaluate() self.assertEqual(token(context), 10) def test_function_lookup(self): token = self.parser.parse("fn:function-lookup(xs:QName('fn:substring'), 2)('abcd', 2)") self.assertEqual(token.evaluate(), "bcd") with self.xsd_version_parser('1.1'): token = 
self.parser.parse("(fn:function-lookup(xs:QName('xs:dateTimeStamp'), 1), " "xs:dateTime#1)[1] ('2011-11-11T11:11:11Z')") with self.assertRaises(MissingContextError): token.evaluate() # Context is required by predicate selector [1] root = self.etree.XML('') context = XPathContext(root=root) dts = datatypes.DateTimeStamp.fromstring('2011-11-11T11:11:11Z') self.assertEqual(token.evaluate(context), dts) def test_function_name(self): token = self.parser.parse("fn:function-name(fn:substring#2) ") result = datatypes.QName("http://www.w3.org/2005/xpath-functions", "fn:substring") self.assertEqual(token.evaluate(), result) token = self.parser.parse("fn:function-name(function($node){count($node/*)})") # Context is not used if the argument is a function self.assertEqual(token.evaluate(), []) root = self.etree.XML('') context = XPathContext(root=root, variables={'node': root}) self.assertEqual(token.evaluate(context), []) def test_function_arity(self): token = self.parser.parse("fn:function-arity(fn:substring#2)") self.assertEqual(token.evaluate(), 2) token = self.parser.parse("fn:function-arity(function($node){name($node)})") # Context is not used if the argument is a function self.assertEqual(token.evaluate(), 1) root = self.etree.XML('') context = XPathContext(root=root, variables={'node': root}) self.assertEqual(token.evaluate(context), 1) def test_for_each(self): token = self.parser.parse('fn:for-each(1 to 5, function($a) { $a * $a })') with self.assertRaises(MissingContextError): token.evaluate() root = self.etree.XML('') context = XPathContext(root=root) self.assertListEqual(token.evaluate(context), [1, 4, 9, 16, 25]) token = self.parser.parse('fn:for-each(("john", "jane"), fn:string-to-codepoints#1)') self.assertListEqual(token.evaluate(context), [106, 111, 104, 110, 106, 97, 110, 101]) token = self.parser.parse('fn:for-each(("23", "29"), xs:int#1)') self.assertListEqual(token.evaluate(context), [23, 29]) def test_filter(self): token = self.parser.parse('fn:filter(1 to 
10, function($a) {$a mod 2 = 0})') with self.assertRaises(MissingContextError): token.evaluate() root = self.etree.XML('') context = XPathContext(root=root) self.assertListEqual(token.evaluate(context), [2, 4, 6, 8, 10]) def test_fold_left(self): token = self.parser.parse('fn:fold-left(1 to 5, 0, function($a, $b) { $a + $b })') with self.assertRaises(MissingContextError): token.evaluate() root = self.etree.XML('') context = XPathContext(root=root) self.assertListEqual(token.evaluate(context), [15]) token = self.parser.parse('fn:fold-left((2,3,5,7), 1, function($a, $b) { $a * $b })') self.assertListEqual(token.evaluate(context), [210]) token = self.parser.parse( 'fn:fold-left((true(), false(), false()), false(), function($a, $b) { $a or $b })') self.assertListEqual(token.evaluate(context), [True]) token = self.parser.parse( 'fn:fold-left((true(), false(), false()), false(), function($a, $b) { $a and $b })') self.assertListEqual(token.evaluate(context), [False]) token = self.parser.parse( 'fn:fold-left(1 to 5, (), function($a, $b) {($b, $a)})') self.assertListEqual(token.evaluate(context), [5, 4, 3, 2, 1]) token = self.parser.parse( 'fn:fold-left(1 to 5, "", fn:concat(?, ".", ?))') self.assertListEqual(token.evaluate(context), [".1.2.3.4.5"]) token = self.parser.parse( 'fn:fold-left(1 to 5, "$zero", fn:concat("$f(", ?, ", ", ?, ")"))') self.assertListEqual(token.evaluate(context), ["$f($f($f($f($f($zero, 1), 2), 3), 4), 5)"]) def test_fold_right(self): token = self.parser.parse('fn:fold-right(1 to 5, 0, function($a, $b) { $a + $b })') with self.assertRaises(MissingContextError): token.evaluate() root = self.etree.XML('') context = XPathContext(root=root) self.assertListEqual(token.evaluate(context), [15]) token = self.parser.parse('fn:fold-right(1 to 5, "", fn:concat(?, ".", ?))') self.assertListEqual(token.evaluate(context), ["1.2.3.4.5."]) token = self.parser.parse( 'fn:fold-right(1 to 5, "$zero", concat("$f(", ?, ", ", ?, ")"))') 
        self.assertListEqual(token.evaluate(context),
                             ["$f(1, $f(2, $f(3, $f(4, $f(5, $zero)))))"])

    def test_for_each_pair(self):
        token = self.parser.parse('fn:for-each-pair(("a", "b", "c"), ("x", "y", "z"), concat#2)')
        self.assertListEqual(token.evaluate(), ["ax", "by", "cz"])

        token = self.parser.parse('fn:for-each-pair(1 to 5, 1 to 5, function($a, $b){10*$a + $b})')
        with self.assertRaises(MissingContextError):
            token.evaluate()

        root = self.etree.XML('')
        context = XPathContext(root=root)
        self.assertListEqual(token.evaluate(context), [11, 22, 33, 44, 55])

    def test_format_integer(self):
        self.check_value("format-integer(57, 'I')", 'LVII')
        self.check_value("format-integer(594, 'i')", 'dxciv')
        self.check_value("format-integer(7, 'a')", 'g')
        self.check_value("format-integer(-90956, 'A')", '-EDNH')
        self.check_value("format-integer(123, 'w')", 'one hundred and twenty-three')
        self.check_value("format-integer(-8912, 'W')", "-EIGHT THOUSAND NINE HUNDRED AND TWELVE")
        self.check_value("format-integer(17089674, 'Ww')",
                         "Seventeen Million Eighty-Nine Thousand Six Hundred And Seventy-Four")
        self.check_value("format-integer(123, '0000')", '0123')


@unittest.skipIf(lxml_etree is None, "The lxml library is not installed")
class LxmlXPath30FunctionsTest(XPath30FunctionsTest):
    etree = lxml_etree


class XPath30ConstructorsTest(test_xpath2_constructors.XPath2ConstructorsTest):

    def setUp(self):
        self.parser = XPath30Parser(namespaces=self.namespaces)


@unittest.skipIf(lxml_etree is None, "The lxml library is not installed")
class LxmlXPath30ConstructorsTest(XPath30ConstructorsTest):
    etree = lxml_etree


if __name__ == '__main__':
    unittest.main()
elementpath-3.0.2/tests/test_xpath31.py000066400000000000000000000177751427546011100201070ustar00rootroot00000000000000#!/usr/bin/env python
#
# Copyright (c), 2018-2022, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
#
# Note: Many tests are built using the examples of the XPath standards,
#       published by W3C under the W3C Document License.
#
#       References:
#           https://www.w3.org/TR/xpath-3/
#           https://www.w3.org/TR/xpath-30/
#           https://www.w3.org/TR/xpath-31/
#           https://www.w3.org/Consortium/Legal/2015/doc-license
#           https://www.w3.org/TR/charmod-norm/
#
import unittest
import os

try:
    import lxml.etree as lxml_etree
except ImportError:
    lxml_etree = None

try:
    import xmlschema
except ImportError:
    xmlschema = None
else:
    xmlschema.XMLSchema.meta_schema.build()

from elementpath import XPathContext
from elementpath.xpath3 import XPath31Parser
from elementpath.xpath_token import XPathMap, XPathArray

try:
    from tests import test_xpath30
except ImportError:
    import test_xpath30


MAP_WEEKDAYS = """\
map {
  "Su" : "Sunday",
  "Mo" : "Monday",
  "Tu" : "Tuesday",
  "We" : "Wednesday",
  "Th" : "Thursday",
  "Fr" : "Friday",
  "Sa" : "Saturday"
}"""

MAP_WEEKDAYS_DE = """\
map{0:"Sonntag", 1:"Montag", 2:"Dienstag", 3:"Mittwoch",
    4:"Donnerstag", 5:"Freitag", 6:"Samstag"}"""

NESTED_MAP = """\
map {
    "book": map {
        "title": "Data on the Web",
        "year": 2000,
        "author": [
            map { "last": "Abiteboul", "first": "Serge" },
            map { "last": "Buneman", "first": "Peter" },
            map { "last": "Suciu", "first": "Dan" }
        ],
        "publisher": "Morgan Kaufmann Publishers",
        "price": 39.95
    }
}"""


class XPath31ParserTest(test_xpath30.XPath30ParserTest):

    def setUp(self):
        self.parser = XPath31Parser(namespaces=self.namespaces)

    def test_map_weekdays(self):
        token = self.parser.parse(MAP_WEEKDAYS)
        self.assertIsInstance(token, XPathMap)

        map_value = {'Su': 'Sunday', 'Mo': 'Monday', 'Tu': 'Tuesday', 'We': 'Wednesday',
                     'Th': 'Thursday', 'Fr': 'Friday', 'Sa': 'Saturday'}
        self.assertDictEqual(token.evaluate()._map, map_value)

        token = self.parser.parse(f"{MAP_WEEKDAYS}('Mo')")
        self.assertEqual(token.evaluate(), 'Monday')

        token = self.parser.parse(f"{MAP_WEEKDAYS}('Mon')")
        self.assertIsNone(token.evaluate())

        token = self.parser.parse(f"let $x := {MAP_WEEKDAYS} return $x('Mo')")
        context = XPathContext(self.etree.XML(''))
        self.assertEqual(token.evaluate(context), ['Monday'])

    def test_nested_map(self):
        # Parse the nested map itself, not the flat weekdays map.
        token = self.parser.parse(NESTED_MAP)
        self.assertIsInstance(token, XPathMap)

        token = self.parser.parse(f'{NESTED_MAP}("book")("title")')
        self.assertEqual(token.evaluate(), 'Data on the Web')

        token = self.parser.parse(f'{NESTED_MAP}("book")("author")')
        self.assertIsInstance(token.evaluate(), XPathArray)

        # Array lookups are 1-based: (1) selects the first author map.
        token = self.parser.parse(f'{NESTED_MAP}("book")("author")(1)("last")')
        self.assertEqual(token.evaluate(), 'Abiteboul')

    def test_curly_array_constructor(self):
        token = self.parser.parse('array { 1, 2, 5, 7 }')
        self.assertIsInstance(token, XPathArray)

    def test_square_array_constructor(self):
        token = self.parser.parse('[ 1, 2, 5, 7 ]')
        self.assertIsInstance(token, XPathArray)

    def test_array_lookup(self):
        token = self.parser.parse('array { 1, 2, 5, 7 }(4)')
        self.assertEqual(token.evaluate(), 7)

        token = self.parser.parse('[ 1, 2, 5, 7 ](4)')
        self.assertEqual(token.evaluate(), 7)

    def test_map_size_function(self):
        self.check_value('map:size(map{})', 0)
        self.check_value('map:size(map{"true":1, "false":0})', 2)

    def test_map_keys_function(self):
        self.check_value('map:keys(map{})', [])
        self.check_value('map:keys(map{1:"yes", 2:"no"})', [1, 2])

    def test_map_contains_function(self):
        self.check_value('map:contains(map{}, 1)', False)
        self.check_value('map:contains(map{}, "xyz")', False)
        self.check_value('map:contains(map{1:"yes", 2:"no"}, 1)', True)
        self.check_value('map:contains(map{"xyz":23}, "xyz")', True)
        self.check_value('map:contains(map{"abc":23, "xyz":()}, "xyz")', True)

        context = XPathContext(self.etree.XML(''))
        expression = f"let $x := {MAP_WEEKDAYS_DE} return map:contains($x, 2)"
        self.check_value(expression, [True], context=context)

        expression = f"let $x := {MAP_WEEKDAYS_DE} return map:contains($x, 9)"
        self.check_value(expression, [False], context=context)

    def test_map_get_function(self):
        context = XPathContext(self.etree.XML(''))
        expression = f"let $x := {MAP_WEEKDAYS} return map:get($x, 'Mo')"
        self.check_value(expression, ['Monday'], context=context)

        expression = f"let $x := {MAP_WEEKDAYS} return map:get($x, 'Mon')"
        self.check_value(expression, [], context=context)

    def test_array_size_function(self):
        self.check_value('array:size(["a", "b", "c"])', 3)
        self.check_value('array:size(["a", ["b", "c"]])', 2)
        self.check_value('array:size([ ])', 0)
        self.check_value('array:size([[ ]])', 1)

    def test_array_get_function(self):
        self.check_value('array:get(["a", "b", "c"], 2)', 'b')

        token = self.parser.parse('array:get(["a", ["b", "c"]], 2)')
        result = token.evaluate()
        self.assertIsInstance(result, XPathArray)
        self.assertListEqual(result._array, ['b', 'c'])

    def test_array_put_function(self):
        token = self.parser.parse(' array:put(["a", "b", "c"], 2, "d")')
        result = token.evaluate()
        self.assertIsInstance(result, XPathArray)
        self.assertListEqual(result._array, ['a', 'd', 'c'])

        token = self.parser.parse('array:put(["a", "b", "c"], 2, ("d", "e"))')
        result = token.evaluate()
        self.assertIsInstance(result, XPathArray)
        self.assertListEqual(result._array, ['a', ['d', 'e'], 'c'])

        token = self.parser.parse('array:put(["a"], 1, ["d", "e"])')
        result = token.evaluate()
        self.assertIsInstance(result, XPathArray)
        self.assertIsInstance(result._array[0], XPathArray)
        self.assertListEqual(result._array[0]._array, ['d', 'e'])


@unittest.skipIf(lxml_etree is None, "The lxml library is not installed")
class LxmlXPath31ParserTest(XPath31ParserTest):
    etree = lxml_etree


class XPath31FunctionsTest(test_xpath30.XPath30FunctionsTest):
    maxDiff = 1024

    def setUp(self):
        self.parser = XPath31Parser(namespaces=self.namespaces)

        # Make sure the tests are repeatable.
        env_vars_to_tweak = 'LC_ALL', 'LANG'
        self.current_env_vars = {v: os.environ.get(v) for v in env_vars_to_tweak}
        for v in self.current_env_vars:
            os.environ[v] = 'en_US.UTF-8'

    def tearDown(self):
        # Restore the locale variables changed by setUp(). A variable that was
        # unset before the test is removed again, so the forced value does not
        # leak into later tests.
        if hasattr(self, 'current_env_vars'):
            for v in self.current_env_vars:
                if self.current_env_vars[v] is not None:
                    os.environ[v] = self.current_env_vars[v]
                else:
                    os.environ.pop(v, None)


@unittest.skipIf(lxml_etree is None, "The lxml library is not installed")
class LxmlXPath31FunctionsTest(XPath31FunctionsTest):
    etree = lxml_etree


class XPath31ConstructorsTest(test_xpath30.XPath30ConstructorsTest):

    def setUp(self):
        self.parser = XPath31Parser(namespaces=self.namespaces)


@unittest.skipIf(lxml_etree is None, "The lxml library is not installed")
class LxmlXPath31ConstructorsTest(XPath31ConstructorsTest):
    etree = lxml_etree


if __name__ == '__main__':
    unittest.main()
elementpath-3.0.2/tests/test_xpath_context.py000066400000000000000000000302231427546011100214660ustar00rootroot00000000000000#!/usr/bin/env python
#
# Copyright (c), 2018-2021, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
# # @author Davide Brunato # import unittest from copy import copy from unittest.mock import patch import xml.etree.ElementTree as ElementTree try: import lxml.etree as lxml_etree except ImportError: lxml_etree = None from elementpath import XPathContext, DocumentNode, ElementNode, datatypes, select class DummyXsdType: name = local_name = None def is_matching(self, name, default_namespace): pass def is_empty(self): pass def is_simple(self): pass def has_simple_content(self): pass def has_mixed_content(self): pass def is_element_only(self): pass def is_key(self): pass def is_qname(self): pass def is_notation(self): pass def decode(self, obj, *args, **kwargs): pass def validate(self, obj, *args, **kwargs): pass class XPathContextTest(unittest.TestCase): root = ElementTree.XML('Dickens') def test_basic_initialization(self): self.assertRaises(TypeError, XPathContext, None) def test_timezone_argument(self): context = XPathContext(self.root) self.assertIsNone(context.timezone) context = XPathContext(self.root, timezone='Z') self.assertIsInstance(context.timezone, datatypes.Timezone) def test_repr(self): self.assertEqual(repr(XPathContext(self.root)), f"XPathContext(root={self.root})") def test_copy(self): root = ElementTree.XML('') context = XPathContext(root) self.assertIsInstance(copy(context), XPathContext) self.assertIsNot(copy(context), context) @unittest.skipIf(lxml_etree is None, 'lxml library is not installed') def test_etree_property(self): root = ElementTree.XML('') context = XPathContext(root) self.assertEqual(context.etree.__name__, 'xml.etree.ElementTree') self.assertEqual(context.etree.__name__, 'xml.etree.ElementTree') # property caching root = lxml_etree.XML('') context = XPathContext(root) self.assertEqual(context.etree.__name__, 'lxml.etree') self.assertEqual(context.etree.__name__, 'lxml.etree') def test_is_principal_node_kind(self): root = ElementTree.XML('') context = XPathContext(root) self.assertTrue(hasattr(context.item.elem, 'tag')) 
self.assertTrue(context.is_principal_node_kind()) context.item = context.root.attributes[0] self.assertFalse(context.is_principal_node_kind()) context.axis = 'attribute' self.assertTrue(context.is_principal_node_kind()) def test_iter_product(self): context = XPathContext(self.root) def sel1(_context): yield from range(2) def sel2(_context): yield from range(3) expected = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)] self.assertListEqual(list(context.iter_product([sel1, sel2])), expected) self.assertEqual(context.variables, {}) self.assertListEqual(list(context.iter_product([sel1, sel2], [])), expected) self.assertEqual(context.variables, {}) self.assertListEqual(list(context.iter_product([sel1, sel2], ['a', 'b'])), expected) self.assertEqual(context.variables, {'a': 1, 'b': 2}) context.variables = {'a': 0, 'b': 0} self.assertListEqual(list(context.iter_product([sel1, sel2], ['a', 'b'])), expected) self.assertEqual(context.variables, {'a': 1, 'b': 2}) context.variables = {'a': 0, 'b': 0} self.assertListEqual(list(context.iter_product([sel1, sel2], ['a'])), expected) self.assertEqual(context.variables, {'a': 1, 'b': 0}) context.variables = {'a': 0, 'b': 0} self.assertListEqual(list(context.iter_product([sel1, sel2], ['c', 'b'])), expected) self.assertEqual(context.variables, {'a': 0, 'b': 2, 'c': 1}) context.variables = {'a': 0, 'b': 0} self.assertListEqual(list(context.iter_product([sel1, sel2], ['b'])), expected) self.assertEqual(context.variables, {'a': 0, 'b': 1}) def test_iter_attributes(self): root = ElementTree.XML('') context = XPathContext(root) attributes = context.root.attributes self.assertEqual(len(attributes), 2) self.assertListEqual(list(context.iter_attributes()), attributes) context.item = attributes[0] self.assertListEqual(list(context.iter_attributes()), attributes[:1]) with patch.object(DummyXsdType(), 'has_simple_content', return_value=True) as xsd_type: context = XPathContext(root) context.root.xsd_type = xsd_type 
self.assertListEqual(list(context.iter_attributes()), context.root.attributes) self.assertNotEqual(attributes, context.root.attributes) context.item = None self.assertListEqual(list(context.iter_attributes()), []) def test_iter_children_or_self(self): doc = ElementTree.ElementTree(self.root) context = XPathContext(doc) self.assertIsInstance(context.root, DocumentNode) self.assertIsInstance(context.root[0], ElementNode) self.assertListEqual(list(e.elem for e in context.iter_children_or_self()), [self.root]) context.item = context.root[0] # root element self.assertListEqual(list(context.iter_children_or_self()), [context.root[0].children[0]]) context.item = context.root # document node self.assertListEqual(list(e.elem for e in context.iter_children_or_self()), [self.root]) def test_iter_parent(self): root = ElementTree.XML('') context = XPathContext(root, item=None) self.assertListEqual(list(context.iter_parent()), []) context = XPathContext(root) self.assertListEqual(list(context.iter_parent()), []) with patch.object(DummyXsdType(), 'has_simple_content', return_value=True) as xsd_type: context = XPathContext(root, item=root) context.root.xsd_type = xsd_type self.assertListEqual(list(context.iter_parent()), []) root = ElementTree.XML('') context = XPathContext(root, item=None) self.assertListEqual(list(context.iter_parent()), []) context = XPathContext(root, item=root[2][0]) self.assertListEqual(list(e.elem for e in context.iter_parent()), [root[2]]) with patch.object(DummyXsdType(), 'is_empty', return_value=True) as xsd_type: context = XPathContext(root, item=root[2][0]) context.root[2][0].xsd_type = xsd_type self.assertListEqual(list(e.elem for e in context.iter_parent()), [root[2]]) def test_iter_siblings(self): root = ElementTree.XML('') context = XPathContext(root) self.assertListEqual(list(context.iter_siblings()), []) context = XPathContext(root, item=root[2]) self.assertListEqual(list(e.elem for e in context.iter_siblings()), list(root[3:])) with 
patch.object(DummyXsdType(), 'is_element_only', return_value=True) as xsd_type: context = XPathContext(root, item=root[2]) context.root[2].xsd_type = xsd_type self.assertListEqual(list(e.elem for e in context.iter_siblings()), list(root[3:])) context = XPathContext(root, item=root[2]) self.assertListEqual( list(e.elem for e in context.iter_siblings('preceding-sibling')), list(root[:2]) ) with patch.object(DummyXsdType(), 'is_element_only', return_value=True) as xsd_type: context = XPathContext(root, item=root[2]) context.root[2].xsd_type = xsd_type self.assertListEqual( list(e.elem for e in context.iter_siblings('preceding-sibling')), list(root[:2]) ) @unittest.skipIf(lxml_etree is None, 'lxml library is not installed') def test_iter_siblings__issue_44(self): root = lxml_etree.XML('text 1text 2 text 3') result = select(root, 'node()[1]/following-sibling::node()') self.assertListEqual(result, [root[0], 'text 2', root[1], ' text 3']) self.assertListEqual(result, root.xpath('node()[1]/following-sibling::node()')) def test_iter_descendants(self): root = ElementTree.XML('') context = XPathContext(root) attr = context.root.attributes[0] self.assertListEqual(list(e.elem for e in context.iter_descendants()), [root, root[0], root[1]]) context.item = attr self.assertListEqual(list(context.iter_descendants(axis='descendant')), []) context.item = attr self.assertListEqual(list(context.iter_descendants()), [attr]) with patch.object(DummyXsdType(), 'has_mixed_content', return_value=True) as xsd_type: context = XPathContext(root, item=root) context.root.xsd_type = xsd_type self.assertListEqual( list(e.elem for e in context.iter_descendants()), [root, root[0], root[1]] ) def test_iter_ancestors(self): root = ElementTree.XML('') context = XPathContext(root) attr = context.root.attributes[0] self.assertListEqual(list(context.iter_ancestors()), []) context.item = attr self.assertListEqual(list(context.iter_ancestors()), [context.root]) result = list(e.elem for e in XPathContext(root, 
item=root[1]).iter_ancestors()) self.assertListEqual(result, [root]) with patch.object(DummyXsdType(), 'has_mixed_content', return_value=True) as xsd_type: context = XPathContext(root, item=root[1]) context.root[1].xsd_type = xsd_type self.assertListEqual(list(context.iter_ancestors()), [context.root]) def test_iter_preceding(self): root = ElementTree.XML('') context = XPathContext(root, item=None) self.assertListEqual(list(context.iter_preceding()), []) context = XPathContext(root) self.assertListEqual(list(context.iter_preceding()), []) with patch.object(DummyXsdType(), 'has_simple_content', return_value=True) as xsd_type: context = XPathContext(root, item=root) context.root.xsd_type = xsd_type self.assertListEqual(list(context.iter_preceding()), []) context = XPathContext(root, item='text') self.assertListEqual(list(context.iter_preceding()), []) root = ElementTree.XML('') context = XPathContext(root, item=root[2][1]) self.assertListEqual(list(e.elem for e in context.iter_preceding()), [root[0], root[0][0], root[1], root[2][0]]) def test_iter_following(self): root = ElementTree.XML('') context = XPathContext(root) self.assertListEqual(list(context.iter_followings()), []) context = XPathContext(root) context.item = context.root.attributes[0] self.assertListEqual(list(context.iter_followings()), []) context = XPathContext(root, item=root[2]) self.assertListEqual(list(e.elem for e in context.iter_followings()), list(root[3:])) context = XPathContext(root, item=root[1]) result = [root[2], root[2][0], root[3], root[4]] self.assertListEqual(list(e.elem for e in context.iter_followings()), result) with patch.object(DummyXsdType(), 'has_mixed_content', return_value=True) as xsd_type: context = XPathContext(root, item=root[1]) context.root[1].xsd_type = xsd_type self.assertListEqual(list(e.elem for e in context.iter_followings()), result) if __name__ == '__main__': unittest.main() 
elementpath-3.0.2/tests/test_xpath_nodes.py000066400000000000000000000404701427546011100211170ustar00rootroot00000000000000
#!/usr/bin/env python
#
# Copyright (c), 2018-2021, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
import unittest
from unittest.mock import patch
import io
import xml.etree.ElementTree as ElementTree

from elementpath.etree import is_etree_element, etree_iter_strings, \
    etree_deep_equal, etree_iter_paths
from elementpath.xpath_nodes import DocumentNode, ElementNode, AttributeNode, TextNode, \
    NamespaceNode, CommentNode, ProcessingInstructionNode
from elementpath.xpath_context import XPathContext


class DummyXsdType:
    name = local_name = None

    def is_matching(self, name, default_namespace): pass
    def is_empty(self): pass
    def is_simple(self): pass
    def has_simple_content(self): pass
    def has_mixed_content(self): pass
    def is_element_only(self): pass
    def is_key(self): pass
    def is_qname(self): pass
    def is_notation(self): pass
    def decode(self, obj, *args, **kwargs): return int(obj)
    def validate(self, obj, *args, **kwargs): pass


class XPathNodesTest(unittest.TestCase):
    elem = ElementTree.XML('')

    def setUp(self):
        root = ElementTree.Element('root')
        self.context = XPathContext(root)  # Dummy context for creating nodes

    def test_is_etree_element_function(self):
        self.assertTrue(is_etree_element(self.elem))
        self.assertFalse(is_etree_element('text'))
        self.assertFalse(is_etree_element(None))

    def test_elem_iter_strings_function(self):
        root = ElementTree.XML('text1\ntext2tail1text3tail2')
        result = ['text1\n', 'text2', 'tail1', 'tail2', 'text3']
        self.assertListEqual(list(etree_iter_strings(root)), result)

        with patch.multiple(DummyXsdType, has_mixed_content=lambda x: True):
            xsd_type = DummyXsdType()
            typed_root = ElementNode(elem=root, xsd_type=xsd_type)
            self.assertListEqual(list(etree_iter_strings(typed_root.elem)), result)

        norm_result = ['text1', 'text2', 'tail1', 'tail2', 'text3']
        with patch.multiple(DummyXsdType, is_element_only=lambda x: True):
            xsd_type = DummyXsdType()
            typed_root = ElementNode(elem=root, xsd_type=xsd_type)
            self.assertListEqual(list(etree_iter_strings(typed_root.elem, True)), norm_result)

            comment = ElementTree.Comment('foo')
            root[1].append(comment)
            self.assertListEqual(list(etree_iter_strings(typed_root.elem, True)), norm_result)

        self.assertListEqual(list(etree_iter_strings(root)), result)

    def test_etree_deep_equal_function(self):
        root = ElementTree.XML('10end')
        self.assertTrue(etree_deep_equal(root, root))

        elem = ElementTree.XML('11end')
        self.assertFalse(etree_deep_equal(root, elem))

        elem = ElementTree.XML('1030end')
        self.assertFalse(etree_deep_equal(root, elem))

        elem = ElementTree.XML('10end')
        self.assertTrue(etree_deep_equal(root, elem))

        elem = ElementTree.XML('10end')
        self.assertFalse(etree_deep_equal(root, elem))

    def test_match_name_method(self):
        attr = AttributeNode('a1', '10', parent=None)
        self.assertTrue(attr.match_name('*'))
        self.assertTrue(attr.match_name('a1'))
        self.assertTrue(attr.match_name('*:a1'))
        self.assertFalse(attr.match_name('{foo}*'))
        self.assertFalse(attr.match_name('foo:*'))

        self.assertTrue(
            AttributeNode('{foo}a1', '10').match_name('{foo}*')
        )

        attr = AttributeNode('{http://xpath.test/ns}a1', '10', parent=None)
        self.assertTrue(attr.match_name('*:a1'))

    def test_node_base_uri(self):
        xml_test = ''
        self.assertEqual(ElementNode(ElementTree.XML(xml_test)).base_uri, '/')
        document = ElementTree.parse(io.StringIO(xml_test))
        self.assertIsNone(DocumentNode(document).base_uri)
        self.assertIsNone(ElementNode(self.elem).base_uri)
        self.assertIsNone(TextNode('a text node').base_uri)

    def test_node_document_uri_function(self):
        node = ElementNode(self.elem)
        self.assertIsNone(node.document_uri)

        xml_test = ''
        document = ElementTree.parse(io.StringIO(xml_test))
        node = DocumentNode(document)
        self.assertEqual(node.document_uri, '/root')

        xml_test = ''
        document = ElementTree.parse(io.StringIO(xml_test))
        node = DocumentNode(document)
        self.assertEqual(node.document_uri, 'http://xpath.test')

        xml_test = ''
        document = ElementTree.parse(io.StringIO(xml_test))
        node = DocumentNode(document)
        self.assertIsNone(node.document_uri)

        xml_test = ''
        document = ElementTree.parse(io.StringIO(xml_test))
        node = DocumentNode(document)
        self.assertIsNone(node.document_uri)

    def test_attribute_nodes(self):
        parent = self.context.root

        attribute = AttributeNode('id', '0212349350')
        self.assertEqual(repr(attribute),
                         "AttributeNode(name='id', value='0212349350')")
        self.assertNotEqual(attribute, AttributeNode('id', '0212349350'))
        self.assertEqual(attribute.as_item(), ('id', '0212349350'))
        self.assertNotEqual(attribute.as_item(), AttributeNode('id', '0212349350'))
        self.assertNotEqual(attribute, AttributeNode('id', '0212349350', parent))

        attribute = AttributeNode('id', '0212349350', parent)
        self.assertNotEqual(attribute, AttributeNode('id', '0212349350', parent))
        self.assertEqual(attribute.as_item(), ('id', '0212349350'))

        attribute = AttributeNode('value', '10', parent)
        self.assertEqual(repr(attribute), "AttributeNode(name='value', value='10')")

        with patch.multiple(DummyXsdType, is_simple=lambda x: True):
            xsd_type = DummyXsdType()
            attribute.xsd_type = xsd_type
            self.assertEqual(attribute.as_item(), ('value', '10'))

    def test_typed_element_nodes(self):
        element = ElementTree.Element('schema')

        with patch.multiple(DummyXsdType, is_simple=lambda x: True):
            xsd_type = DummyXsdType()
            context = XPathContext(element)
            context.root.xsd_type = xsd_type
            self.assertTrue(repr(context.root).startswith(
                "ElementNode(elem="))
        self.assertListEqual(elem.children, [x for x in elem])

        document = DocumentNode(ElementTree.parse(io.StringIO("")))
        self.assertListEqual(document.children, [])  # not built document
        document.children.append(ElementNode(document.value.getroot(), document))
        self.assertListEqual(document.children, [document.getroot()])
        self.assertIsNone(TextNode('a text node').children)

    def test_node_nilled_property(self):
        xml_test = ''
        self.assertTrue(ElementNode(ElementTree.XML(xml_test)).nilled)
        xml_test = ''
        self.assertFalse(ElementNode(ElementTree.XML(xml_test)).nilled)
        self.assertFalse(ElementNode(ElementTree.XML('')).nilled)
        self.assertFalse(TextNode('foo').nilled)

    def test_node_kind_property(self):
        document = DocumentNode(ElementTree.parse(io.StringIO(u'')))
        element = ElementNode(ElementTree.Element('schema'))
        attribute = AttributeNode('id', '0212349350')
        namespace = NamespaceNode('xs', 'http://www.w3.org/2001/XMLSchema')
        comment = CommentNode(ElementTree.Comment('nothing important'))
        pi = ProcessingInstructionNode(
            self.context, ElementTree.ProcessingInstruction('action', 'nothing to do')
        )
        text = TextNode('betelgeuse')

        self.assertEqual(document.kind, 'document')
        self.assertEqual(element.kind, 'element')
        self.assertEqual(attribute.kind, 'attribute')
        self.assertEqual(namespace.kind, 'namespace')
        self.assertEqual(comment.kind, 'comment')
        self.assertEqual(pi.kind, 'processing-instruction')
        self.assertEqual(text.kind, 'text')

        with patch.multiple(DummyXsdType, is_simple=lambda x: True):
            xsd_type = DummyXsdType()

            attribute = AttributeNode('id', '0212349350', xsd_type=xsd_type)
            self.assertEqual(attribute.kind, 'attribute')

            typed_element = ElementNode(element.elem, xsd_type=xsd_type)
            self.assertEqual(typed_element.kind, 'element')

    def test_name_property(self):
        root = self.context.root
        attr = AttributeNode('a1', '20')
        namespace = NamespaceNode('xs', 'http://www.w3.org/2001/XMLSchema')

        self.assertEqual(root.name, 'root')
        self.assertEqual(attr.name, 'a1')
        self.assertEqual(namespace.name, 'xs')

        with patch.multiple(DummyXsdType, is_simple=lambda x: True):
            xsd_type = DummyXsdType()

            typed_elem = ElementNode(elem=root.elem, xsd_type=xsd_type)
            self.assertEqual(typed_elem.name, 'root')

            typed_attr = AttributeNode('a1', value='20', xsd_type=xsd_type)
            self.assertEqual(typed_attr.name, 'a1')

    def test_path_property(self):
        root = ElementTree.XML('')

        context = XPathContext(root)

        self.assertEqual(context.root.path, '/A')
        self.assertEqual(context.root[0].path, '/A/B1')
        self.assertEqual(context.root[0][0].path, '/A/B1/C1')
        self.assertEqual(context.root[1].path, '/A/B2')
        self.assertEqual(context.root[2].path, '/A/B3')
        self.assertEqual(context.root[2][0].path, '/A/B3/C1')
        self.assertEqual(context.root[2][1].path, '/A/B3/C2')

        attr = context.root[2][1].attributes[0]
        self.assertEqual(attr.path, '/A/B3/C2/@max')

        document = ElementTree.ElementTree(root)
        context = XPathContext(root=document)
        self.assertEqual(context.root[0][2][0].path, '/A/B3/C1')

        root = ElementTree.XML('10')
        context = XPathContext(root)
        with patch.object(DummyXsdType(), 'is_simple', return_value=True) as xsd_type:
            elem = context.root[0]
            elem.xsd_type = xsd_type
            self.assertEqual(elem.path, '/A/B1')

        with patch.object(DummyXsdType(), 'is_simple', return_value=True) as xsd_type:
            context = XPathContext(root)
            attr = context.root[1].attributes[0]
            attr.xsd_type = xsd_type
            self.assertEqual(attr.path, '/A/B2/@min')

    def test_element_node_iter(self):
        root = ElementTree.XML('text1\ntext2text3')

        context = XPathContext(root)
        expected = [
            context.root, context.root.namespace_nodes[0],
            context.root[0],
            context.root[1], context.root[1].namespace_nodes[0],
            context.root[1].attributes[0], context.root[1][0],
            context.root[2], context.root[2].namespace_nodes[0],
            context.root[3], context.root[3].namespace_nodes[0],
            context.root[3][0], context.root[3][0].namespace_nodes[0],
            context.root[3][0][0]
        ]

        result = list(context.root.iter())
        self.assertListEqual(result, expected)

        root = ElementTree.XML('')
        context = XPathContext(root)

        # iter includes also xml namespace nodes
        self.assertListEqual(
            list(e.elem for e in context.root.iter() if isinstance(e, ElementNode)),
            list(root.iter())
        )

    def test_document_node_iter(self):
        root = ElementTree.XML('')
        doc = ElementTree.ElementTree(root)
        context = XPathContext(doc)

        self.assertListEqual(
            list(e.elem for e in context.root.iter() if isinstance(e, ElementNode)),
            list(doc.iter())
        )

    def test_etree_iter_paths(self):
        root = ElementTree.XML('')

        root[2].append(ElementTree.Comment('a comment'))
        root[2].append(ElementTree.Element('c3'))  # duplicated tag

        items = list(etree_iter_paths(root))
        self.assertListEqual(items, [
            (root, '.'),
            (root[0], './Q{}b1[1]'),
            (root[0][0], './Q{}b1[1]/Q{}c1[1]'),
            (root[0][1], './Q{}b1[1]/Q{}c2[1]'),
            (root[1], './Q{}b2[1]'),
            (root[2], './Q{}b3[1]'),
            (root[2][0], './Q{}b3[1]/Q{}c3[1]'),
            (root[2][1], './Q{}b3[1]/comment()[1]'),
            (root[2][2], './Q{}b3[1]/Q{}c3[2]')
        ])

        self.assertListEqual(list(etree_iter_paths(root, path='')), [
            (root, ''),
            (root[0], 'Q{}b1[1]'),
            (root[0][0], 'Q{}b1[1]/Q{}c1[1]'),
            (root[0][1], 'Q{}b1[1]/Q{}c2[1]'),
            (root[1], 'Q{}b2[1]'),
            (root[2], 'Q{}b3[1]'),
            (root[2][0], 'Q{}b3[1]/Q{}c3[1]'),
            (root[2][1], 'Q{}b3[1]/comment()[1]'),
            (root[2][2], 'Q{}b3[1]/Q{}c3[2]')
        ])

        self.assertListEqual(list(etree_iter_paths(root, path='/')), [
            (root, '/'),
            (root[0], '/Q{}b1[1]'),
            (root[0][0], '/Q{}b1[1]/Q{}c1[1]'),
            (root[0][1], '/Q{}b1[1]/Q{}c2[1]'),
            (root[1], '/Q{}b2[1]'),
            (root[2], '/Q{}b3[1]'),
            (root[2][0], '/Q{}b3[1]/Q{}c3[1]'),
            (root[2][1], '/Q{}b3[1]/comment()[1]'),
            (root[2][2], '/Q{}b3[1]/Q{}c3[2]')
        ])


if __name__ == '__main__':
    unittest.main()
elementpath-3.0.2/tests/test_xpath_token.py000066400000000000000000001115371427546011100211320ustar00rootroot00000000000000
#!/usr/bin/env python
#
# Copyright (c), 2018-2021, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
import unittest
from unittest.mock import patch
import io
import locale
import math
import xml.etree.ElementTree as ElementTree
from collections import namedtuple
from decimal import Decimal

try:
    import xmlschema
except ImportError:
    xmlschema = None
else:
    xmlschema.XMLSchema.meta_schema.build()

from elementpath.exceptions import MissingContextError
from elementpath.datatypes import UntypedAtomic
from elementpath.namespaces import XSD_NAMESPACE, XPATH_FUNCTIONS_NAMESPACE
from elementpath.xpath_nodes import ElementNode, AttributeNode, NamespaceNode, \
    CommentNode, ProcessingInstructionNode, TextNode, DocumentNode
from elementpath.xpath_token import UNICODE_CODEPOINT_COLLATION
from elementpath.helpers import ordinal
from elementpath.xpath_context import XPathContext, XPathSchemaContext
from elementpath.xpath1 import XPath1Parser
from elementpath.xpath2 import XPath2Parser


class DummyXsdType:
    name = local_name = None

    def is_matching(self, name, default_namespace): pass
    def is_empty(self): pass
    def is_simple(self): pass
    def has_simple_content(self): pass
    def has_mixed_content(self): pass
    def is_element_only(self): pass
    def is_key(self): pass
    def is_qname(self): pass
    def is_notation(self): pass
    def validate(self, obj, *args, **kwargs): pass

    @staticmethod
    def decode(obj, *args, **kwargs):
        return int(obj)


class Tagged(object):
    tag = 'root'

    def __repr__(self):
        return 'Tagged(tag=%r)' % self.tag


class XPath1TokenTest(unittest.TestCase):

    @classmethod
    def setUpClass(cls):
        cls.parser = XPath1Parser(namespaces={'xs': XSD_NAMESPACE, 'tst': "http://xpath.test/ns"})

    def test_ordinal_function(self):
        self.assertEqual(ordinal(1), '1st')
        self.assertEqual(ordinal(2), '2nd')
        self.assertEqual(ordinal(3), '3rd')
        self.assertEqual(ordinal(4), '4th')
        self.assertEqual(ordinal(11), '11th')
        self.assertEqual(ordinal(23), '23rd')
        self.assertEqual(ordinal(34), '34th')

    def test_arity_property(self):
        token = self.parser.parse('true()')
        self.assertEqual(token.symbol, 'true')
        self.assertEqual(token.label, 'function')
        self.assertEqual(token.arity, 0)

        token = self.parser.parse('2 + 5')
        self.assertEqual(token.symbol, '+')
        self.assertEqual(token.label, 'operator')
        self.assertEqual(token.arity, 2)

    def test_source_property(self):
        token = self.parser.parse('last()')
        self.assertEqual(token.symbol, 'last')
        self.assertEqual(token.label, 'function')
        self.assertEqual(token.source, 'last()')

        token = self.parser.parse('2.0')
        self.assertEqual(token.symbol, '(decimal)')
        self.assertEqual(token.label, 'literal')
        self.assertEqual(token.source, '2.0')

    def test_position(self):
        parser = XPath2Parser()

        token = parser.parse("(1, 2, 3, 4)")
        self.assertEqual(token.symbol, '(')
        self.assertEqual(token.position, (1, 1))

        token = parser.parse("(: Comment line :)\n\n (1, 2, 3, 4)")
        self.assertEqual(token.symbol, '(')
        self.assertEqual(token.position, (3, 2))

    def test_iter_method(self):
        token = self.parser.parse('2 + 5')
        items = [tk for tk in token.iter()]
        self.assertListEqual(items, [token[0], token, token[1]])

        token = self.parser.parse('/A/B[C]/D/@a')
        self.assertEqual(token.tree, '(/ (/ (/ (/ (A)) ([ (B) (C))) (D)) (@ (a)))')
        self.assertListEqual(list(tk.value for tk in token.iter()),
                             ['/', 'A', '/', 'B', '[', 'C', '/', 'D', '/', '@', 'a'])
        self.assertListEqual(list(tk.value for tk in token.iter('(name)')),
                             ['A', 'B', 'C', 'D', 'a'])

        self.assertListEqual(list(tk.source for tk in token.iter('/')),
                             ['/A', '/A/B[C]', '/A/B[C]/D', '/A/B[C]/D/@a'])

    def test_iter_leaf_elements_method(self):
        token = self.parser.parse('2 + 5')
        self.assertListEqual(list(token.iter_leaf_elements()), [])

        token = self.parser.parse('/A/B[C]/D/@a')
        self.assertListEqual(list(token.iter_leaf_elements()), [])

        token = self.parser.parse('/A/B[C]/D')
        self.assertListEqual(list(token.iter_leaf_elements()), ['D'])

        token = self.parser.parse('/A/B[C]')
        self.assertEqual(token.tree, '(/ (/ (A)) ([ (B) (C)))')
        self.assertListEqual(list(token.iter_leaf_elements()), ['B'])

    def test_get_argument_method(self):
        token = self.parser.symbol_table['true'](self.parser)

        self.assertIsNone(token.get_argument(2))
        with self.assertRaises(TypeError):
            token.get_argument(1, required=True)

    @patch.multiple(DummyXsdType,
                    is_simple=lambda x: False,
                    has_simple_content=lambda x: True)
    def test_select_results(self):
        token = self.parser.parse('.')
        elem = ElementTree.Element('A', attrib={'max': '30'})
        elem.text = '10'
        xsd_type = DummyXsdType()

        context = XPathContext(elem)
        self.assertListEqual(list(token.select_results(context)), [elem])

        context = XPathContext(elem, item=elem)
        context.root.xsd_type = xsd_type
        self.assertListEqual(list(token.select_results(context)), [elem])

        context = XPathContext(elem)
        context.item = context.root.attributes[0]
        self.assertListEqual(list(token.select_results(context)), ['30'])

        context = XPathContext(elem)
        context.item = context.root.attributes[0]
        context.item.xsd_type = xsd_type
        self.assertListEqual(list(token.select_results(context)), ['30'])

        context = XPathContext(elem, item=10)
        self.assertListEqual(list(token.select_results(context)), [10])

        context = XPathContext(elem, item='10')
        self.assertListEqual(list(token.select_results(context)), ['10'])

    def test_cast_to_double(self):
        token = self.parser.parse('.')
        self.assertEqual(token.cast_to_double(1), 1.0)

        with self.assertRaises(ValueError) as ctx:
            token.cast_to_double('nan')
        self.assertIn('FORG0001', str(ctx.exception))

        if self.parser.version != '1.0':
            self.parser._xsd_version = '1.1'
            self.assertEqual(token.cast_to_double('1'), 1.0)
            self.parser._xsd_version = '1.0'

    def test_atomization_function(self):
        root = ElementTree.Element('root')
        token = self.parser.parse('/unknown/.')
        context = XPathContext(root)
        self.assertListEqual(list(token.atomization(context)), [])

        if self.parser.version > '1.0':
            token = self.parser.parse('((), 1, 3, "a")')
            self.assertListEqual(list(token.atomization()), [1, 3, 'a'])

    def test_use_locale_context_manager(self):
        token = self.parser.parse('true()')
        with token.use_locale(UNICODE_CODEPOINT_COLLATION):
            self.assertEqual(locale.getlocale(locale.LC_COLLATE), ('en_US', 'UTF-8'))

        try:
            with token.use_locale('de_DE.UTF-8'):
                self.assertEqual(locale.getlocale(locale.LC_COLLATE), ('de_DE', 'UTF-8'))
        except locale.Error:
            pass  # Skip test if 'de_DE.UTF-8' is an unknown locale setting

        with self.assertRaises(TypeError) as cm:
            with token.use_locale(None):
                pass
        self.assertIn('XPTY0004', str(cm.exception))
        self.assertIn('collation cannot be an empty sequence', str(cm.exception))

    def test_boolean_value_function(self):
        token = self.parser.parse('true()')
        elem = ElementTree.Element('A')
        context = XPathContext(elem)

        self.assertTrue(token.boolean_value(context.root))
        self.assertFalse(token.boolean_value([]))
        self.assertTrue(token.boolean_value([context.root]))
        self.assertFalse(token.boolean_value([0]))
        self.assertTrue(token.boolean_value([1]))
        with self.assertRaises(TypeError):
            token.boolean_value([1, 1])
        self.assertFalse(token.boolean_value(0))
        self.assertTrue(token.boolean_value(1))
        self.assertTrue(token.boolean_value(1.0))
        self.assertFalse(token.boolean_value(None))

    @patch.multiple(DummyXsdType(),
                    is_simple=lambda x: False,
                    has_simple_content=lambda x: True)
    def test_data_value_function(self):
        token = self.parser.parse('true()')

        if self.parser.version != '1.0':
            xsd_type = DummyXsdType()
            context = XPathContext(ElementTree.XML('19'))
            context.root.xsd_type = xsd_type
            self.assertEqual(token.data_value(context.root), 19)

        context = XPathContext(ElementTree.XML(''))
        obj = AttributeNode('age', '19')
        self.assertEqual(token.data_value(obj), UntypedAtomic('19'))

        obj = NamespaceNode('tns', 'http://xpath.test/ns')
        self.assertEqual(token.data_value(obj), 'http://xpath.test/ns')

        obj = TextNode('19')
        self.assertEqual(token.data_value(obj), UntypedAtomic('19'))

        obj = ElementTree.XML('abcde')
        element_node = ElementNode(obj)
        self.assertEqual(token.data_value(element_node), UntypedAtomic('abcde'))

        obj = ElementTree.parse(io.StringIO('abcde'))
        document_node = DocumentNode(obj)
        self.assertEqual(token.data_value(document_node), UntypedAtomic('abcde'))

        obj = ElementTree.Comment("foo bar")
        comment_node = CommentNode(obj)
        self.assertEqual(token.data_value(comment_node), 'foo bar')

        obj = ElementTree.ProcessingInstruction('action', 'nothing to do')
        pi_node = ProcessingInstructionNode(obj)
        self.assertEqual(token.data_value(pi_node), 'action nothing to do')

        self.assertIsNone(token.data_value(None))
        self.assertEqual(token.data_value(19), 19)
        self.assertEqual(token.data_value('19'), '19')
        self.assertFalse(token.data_value(False))

        # Does not check type of non nodes, simply returns the object.
        tagged_object = Tagged()
        self.assertIs(token.data_value(tagged_object), tagged_object)

    def test_string_value_function(self):
        token = self.parser.parse('true()')

        document = ElementTree.parse(io.StringIO(u'123456789'))
        element = ElementTree.Element('schema')
        comment = ElementTree.Comment('nothing important')
        pi = ElementTree.ProcessingInstruction('action', 'nothing to do')

        document_node = XPathContext(document).root

        context = XPathContext(element)
        element_node = context.root
        attribute_node = AttributeNode('id', '0212349350')
        namespace_node = NamespaceNode('xs', 'http://www.w3.org/2001/XMLSchema')
        comment_node = CommentNode(comment)
        pi_node = ProcessingInstructionNode(pi)
        text_node = TextNode('betelgeuse')

        self.assertEqual(token.string_value(document_node), '123456789')
        self.assertEqual(token.string_value(element_node), '')
        self.assertEqual(token.string_value(attribute_node), '0212349350')
        self.assertEqual(token.string_value(namespace_node), 'http://www.w3.org/2001/XMLSchema')
        self.assertEqual(token.string_value(comment_node), 'nothing important')
        self.assertEqual(token.string_value(pi_node), 'action nothing to do')
        self.assertEqual(token.string_value(text_node), 'betelgeuse')
        self.assertEqual(token.string_value(None), '')

        self.assertEqual(token.string_value(Decimal(+1999)), '1999')
        self.assertEqual(token.string_value(Decimal('+1999')), '1999')
        self.assertEqual(token.string_value(Decimal('+19.0010')), '19.001')

        self.assertEqual(token.string_value(10), '10')
        self.assertEqual(token.string_value(1e99), '1E99')
        self.assertEqual(token.string_value(1e-05), '1E-05')
        self.assertEqual(token.string_value(1.00), '1')
        self.assertEqual(token.string_value(+19.0010), '19.001')

        self.assertEqual(token.string_value(float('nan')), 'NaN')
        self.assertEqual(token.string_value(float('inf')), 'INF')
        self.assertEqual(token.string_value(float('-inf')), '-INF')

        self.assertEqual(token.string_value(()), '()')

        tagged_object = Tagged()
        self.assertEqual(token.string_value(tagged_object), "Tagged(tag='root')")

        with patch.multiple(DummyXsdType, is_simple=lambda x: True):
            xsd_type = DummyXsdType()
            element.text = '10'
            typed_elem = ElementNode(elem=element, xsd_type=xsd_type)
            self.assertEqual(token.string_value(typed_elem), '10')
            self.assertEqual(token.data_value(typed_elem), 10)

    def test_number_value_function(self):
        token = self.parser.parse('true()')
        self.assertEqual(token.number_value("19"), 19)
        self.assertTrue(math.isnan(token.number_value("not a number")))

    def test_compare_operator(self):
        token1 = self.parser.parse('true()')
        token2 = self.parser.parse('false()')
        self.assertEqual(token1, token1)
        self.assertNotEqual(token1, token2)
        self.assertNotEqual(token2, 'false()')

    def test_expected_method(self):
        token = self.parser.parse('.')
        self.assertIsNone(token.expected('.'))

        with self.assertRaises(SyntaxError) as ctx:
            raise token.expected('*')
        self.assertIn('XPST0003', str(ctx.exception))

    def test_unexpected_method(self):
        token = self.parser.parse('.')
        self.assertIsNone(token.unexpected('*'))

        with self.assertRaises(SyntaxError) as ctx:
            raise token.unexpected('.')
        self.assertIn('XPST0003', str(ctx.exception))

        with self.assertRaises(SyntaxError) as ctx:
            raise token.unexpected('.', message="unknown error")
        self.assertIn('XPST0003', str(ctx.exception))
        self.assertIn('unknown error', str(ctx.exception))

        with self.assertRaises(TypeError) as ctx:
            raise token.unexpected('.', code='XPST0017')
        self.assertIn('XPST0017', str(ctx.exception))

    def test_xpath_error_code(self):
        parser = XPath2Parser()
        token = parser.parse('.')

        self.assertEqual(token.error_code('XPST0003'), 'err:XPST0003')
        parser.namespaces['error'] = parser.namespaces.pop('err')
        self.assertEqual(token.error_code('XPST0003'), 'error:XPST0003')
        parser.namespaces.pop('error')
        self.assertEqual(token.error_code('XPST0003'), 'XPST0003')

    def test_xpath_error(self):
        token = self.parser.parse('.')

        with self.assertRaises(ValueError) as ctx:
            raise token.error('xml:XPST0003')
        self.assertIn('XPTY0004', str(ctx.exception))
        self.assertIn("'http://www.w3.org/2005/xqt-errors' namespace is required",
                      str(ctx.exception))

        with self.assertRaises(ValueError) as ctx:
            raise token.error('err:err:XPST0003')
        self.assertIn('XPTY0004', str(ctx.exception))
        self.assertIn("is not a prefixed name", str(ctx.exception))

        with self.assertRaises(ValueError) as ctx:
            raise token.error('XPST9999')
        self.assertIn('XPTY0004', str(ctx.exception))
        self.assertIn("unknown XPath error code", str(ctx.exception))

    def test_xpath_error_shortcuts(self):
        token = self.parser.parse('.')

        with self.assertRaises(ValueError) as ctx:
            raise token.wrong_value()
        self.assertIn('FOCA0002', str(ctx.exception))

        with self.assertRaises(TypeError) as ctx:
            raise token.wrong_type()
        self.assertIn('FORG0006', str(ctx.exception))

        with self.assertRaises(MissingContextError) as ctx:
            raise token.missing_context()
        self.assertIn('XPDY0002', str(ctx.exception))

        with self.assertRaises(TypeError) as ctx:
            raise token.wrong_context_type()
        self.assertIn('XPTY0004', str(ctx.exception))

        with self.assertRaises(NameError) as ctx:
            raise token.missing_name()
        self.assertIn('XPST0008', str(ctx.exception))

        if self.parser.compatibility_mode:
            with self.assertRaises(NameError) as ctx:
                raise token.missing_axis()
            self.assertIn('XPST0010', str(ctx.exception))
        else:
            with self.assertRaises(SyntaxError) as ctx:
                raise token.missing_axis()
            self.assertIn('XPST0003', str(ctx.exception))

        with self.assertRaises(TypeError) as ctx:
            raise token.wrong_nargs()
        self.assertIn('XPST0017', str(ctx.exception))

        with self.assertRaises(TypeError) as ctx:
            raise token.wrong_sequence_type()
        self.assertIn('XPDY0050', str(ctx.exception))

        with self.assertRaises(NameError) as ctx:
            raise token.unknown_atomic_type()
        self.assertIn('XPST0051', str(ctx.exception))


class XPath2TokenTest(XPath1TokenTest):

    @classmethod
    def setUpClass(cls):
        cls.parser = XPath2Parser(namespaces={'xs': XSD_NAMESPACE, 'tst': "http://xpath.test/ns"})

    def test_bind_namespace_method(self):
        token = self.parser.parse('true()')
        self.assertIsNone(token.bind_namespace(XPATH_FUNCTIONS_NAMESPACE))
        with self.assertRaises(TypeError) as ctx:
            token.bind_namespace(XSD_NAMESPACE)
        self.assertIn('XPST0017', str(ctx.exception))
        self.assertIn("a name, a wildcard or a constructor function", str(ctx.exception))

        token = self.parser.parse("xs:string(10.1)")
        with self.assertRaises(TypeError) as ctx:
            token.bind_namespace(XSD_NAMESPACE)
        self.assertIn('XPST0017', str(ctx.exception))
        self.assertIn("a name, a wildcard or a constructor function", str(ctx.exception))

        self.assertIsNone(token[1].bind_namespace(XSD_NAMESPACE))
        with self.assertRaises(TypeError) as ctx:
            token[1].bind_namespace(XPATH_FUNCTIONS_NAMESPACE)
        self.assertIn("a function expected", str(ctx.exception))

        token = self.parser.parse("tst:foo")
        with self.assertRaises(TypeError) as ctx:
            token.bind_namespace('http://xpath.test/ns')
        self.assertIn('XPST0017', str(ctx.exception))
        self.assertIn("a name, a wildcard or a function", str(ctx.exception))

    @unittest.skipIf(xmlschema is None, "xmlschema library required.")
    def test_add_xsd_type(self):
        schema = xmlschema.XMLSchema(""" """)

        root_token = self.parser.parse('a1')
        self.assertIsNone(root_token.xsd_types)
        root_token.add_xsd_type(schema.elements['a1'])
        self.assertEqual(root_token.xsd_types, {'a1': schema.meta_schema.types['int']})

        self.parser.schema = xmlschema.xpath.XMLSchemaProxy(schema)
        try:
            root_token = self.parser.parse('a1')
            self.assertEqual(root_token.xsd_types, {'a1': schema.meta_schema.types['int']})
            root_token = self.parser.parse('a2')
            self.assertEqual(root_token.xsd_types, {'a2': schema.meta_schema.types['string']})
            root_token = self.parser.parse('a3')
            self.assertEqual(root_token.xsd_types, {'a3': schema.meta_schema.types['boolean']})

            root_token = self.parser.parse('*')
            self.assertEqual(root_token.xsd_types, {
                'a1': schema.meta_schema.types['int'],
                'a2': schema.meta_schema.types['string'],
                'a3': schema.meta_schema.types['boolean'],
            })

            # With the schema as base element all the global elements are added.
            root_token = self.parser.parse('.')
            self.assertEqual(root_token.xsd_types, {
                'a1': schema.meta_schema.types['int'],
                'a2': schema.meta_schema.types['string'],
                'a3': schema.meta_schema.types['boolean'],
            })

            self.parser.schema = xmlschema.xpath.XMLSchemaProxy(schema, schema.elements['a2'])
            root_token = self.parser.parse('.')
            self.assertEqual(root_token.xsd_types, {'a2': schema.meta_schema.types['string']})

        finally:
            self.parser.schema = None

        schema = xmlschema.XMLSchema(""" """)
        self.parser.schema = xmlschema.xpath.XMLSchemaProxy(schema, schema.elements['a'])

        try:
            root_token = self.parser.parse('.')
            self.assertEqual(root_token.xsd_types, {'a': schema.types['aType']})

            root_token = self.parser.parse('*')
            self.assertEqual(root_token.xsd_types, {
                'b1': schema.types['b1Type'],
                'b2': schema.types['b2Type'],
                'b3': schema.types['b3Type'],
            })

            root_token = self.parser.parse('b1')
            self.assertEqual(root_token.xsd_types, {'b1': schema.types['b1Type']})
            root_token = self.parser.parse('b2')
            self.assertEqual(root_token.xsd_types, {'b2': schema.types['b2Type']})

            root_token = self.parser.parse('b')
            self.assertIsNone(root_token.xsd_types)

            root_token = self.parser.parse('*/c1')
            self.assertEqual(root_token[0].xsd_types, {
                'b1': schema.types['b1Type'],
                'b2': schema.types['b2Type'],
                'b3': schema.types['b3Type'],
            })
            self.assertEqual(root_token[1].xsd_types, {'c1': [
                schema.meta_schema.types['int'],
                schema.meta_schema.types['string'],
                schema.meta_schema.types['boolean'],
            ]})

            root_token = self.parser.parse('*/c2')
            self.assertEqual(root_token[1].xsd_types, {'c2': schema.meta_schema.types['string']})

            root_token = self.parser.parse('*/*')
            self.assertEqual(root_token[1].xsd_types, {
                'c1': [schema.meta_schema.types['int'],
                       schema.meta_schema.types['string'],
                       schema.meta_schema.types['boolean']],
                'c2': schema.meta_schema.types['string']
            })

        finally:
            self.parser.schema = None

    @unittest.skipIf(xmlschema is None, "xmlschema library required.")
    def test_add_xsd_type_alternatives(self):
        schema = xmlschema.XMLSchema(""" """)
        schema_context = XPathSchemaContext(schema)

        root_token = self.parser.parse('root')
        self.assertIsNone(root_token.add_xsd_type('xs:string'))  # ignore non-schema items
        self.assertIsNone(root_token.xsd_types)

        xsd_type = root_token.add_xsd_type(schema.elements['root'])
        self.assertEqual(root_token.xsd_types, {'root': schema.meta_schema.types['int']})
        self.assertIs(xsd_type, schema.meta_schema.types['int'])

        root_token.xsd_types = None
        typed_element = schema_context.root.children[0]
        xsd_type = root_token.add_xsd_type(typed_element)
        self.assertEqual(root_token.xsd_types, {'root': schema.meta_schema.types['int']})
        self.assertIs(xsd_type, schema.meta_schema.types['int'])

        attribute = schema_context.root.attributes[0]
        attribute.xsd_type = schema.meta_schema.types['string']
        xsd_type = root_token.add_xsd_type(attribute)
        self.assertEqual(root_token.xsd_types, {'a': schema.meta_schema.types['string'],
                                                'root': schema.meta_schema.types['int']})
        self.assertIs(xsd_type, schema.meta_schema.types['string'])

    @unittest.skipIf(xmlschema is None, "xmlschema library required.")
    def test_select_xsd_nodes(self):
        schema = xmlschema.XMLSchema(""" """)

        self.parser.schema = xmlschema.xpath.XMLSchemaProxy(schema)
        try:
            root_token = self.parser.parse('.')
            self.assertEqual(root_token.xsd_types, {
                'root': schema.elements['root'].type,
            })

            context = XPathSchemaContext(root=schema, axis='self')
            self.assertListEqual(list(root_token.select_xsd_nodes(context, 'root')), [])

            tag = '{%s}schema' % XSD_NAMESPACE
            self.assertListEqual(
                list(e.elem for e in root_token.select_xsd_nodes(context, tag)), [schema]
            )

            context.item = None
            self.assertListEqual(list(root_token.select_xsd_nodes(context, 'root')), [])

            context.item = None
            result = list(root_token.select_xsd_nodes(context, tag))
            self.assertListEqual(result, [None])  # Schema as document node
        finally:
            self.parser.schema = None

    @unittest.skipIf(xmlschema is None, "xmlschema library required.")
    def test_match_xsd_type(self):
        schema = xmlschema.XMLSchema(""" """)

        self.parser.schema = xmlschema.xpath.XMLSchemaProxy(schema)

        try:
            root_token = self.parser.parse('root')
            self.assertEqual(root_token.xsd_types, {'root': schema.meta_schema.types['int']})

            context = XPathSchemaContext(root=schema)
            obj = list(root_token.select_xsd_nodes(context, 'root'))
            self.assertIsInstance(obj[0], ElementNode)
            self.assertEqual(root_token.xsd_types, {'root': schema.meta_schema.types['int']})

            context.axis = 'self'
            root_token.xsd_types = None
            list(root_token.select_xsd_nodes(context, 'root'))
            self.assertIsNone(root_token.xsd_types)

            context.axis = None
            obj = list(root_token.select_xsd_nodes(context, 'root'))
            self.assertIsInstance(obj[0], ElementNode)

            context = XPathSchemaContext(root=schema.meta_schema)
            obj = list(root_token.select_xsd_nodes(context, 'root'))
            self.assertListEqual(obj, [])

            root_token = self.parser.parse('@a')
            self.assertEqual(root_token[0].xsd_types, {'a': schema.meta_schema.types['string']})

            context = XPathSchemaContext(root=schema.meta_schema, axis='self')
            xsd_attribute = schema.attributes['a']
            context.item = AttributeNode('a', xsd_attribute, xsd_type=xsd_attribute.type)

            obj = list(root_token.select_xsd_nodes(context, 'a'))
            self.assertIsInstance(obj[0], AttributeNode)
            self.assertIsNotNone(obj[0].xsd_type)
            self.assertEqual(root_token[0].xsd_types, {'a': schema.meta_schema.types['string']})

            root_token.xsd_types = None
            context = XPathSchemaContext(root=schema)
            list(root_token.select_xsd_nodes(context, 'a'))
            self.assertIsNone(root_token.xsd_types)

            context = XPathSchemaContext(root=schema.meta_schema, axis='self')
            attribute = context.item = AttributeNode('a', schema.attributes['a'])

            obj = list(root_token.select_xsd_nodes(context, 'a'))
            self.assertIsInstance(obj[0], AttributeNode)
            self.assertEqual(obj[0], attribute)
            self.assertIsInstance(obj[0].value, xmlschema.XsdAttribute)
            self.assertIsInstance(obj[0].typed_value, str)
            self.assertEqual(root_token[0].xsd_types, {'a': schema.meta_schema.types['string']})

        finally:
            self.parser.schema = None

    @unittest.skipIf(xmlschema is None, "xmlschema library required.")
    def test_get_xsd_type(self):
        schema = xmlschema.XMLSchema(""" """)

        root_token = self.parser.parse('root')
        self.assertIsNone(root_token.xsd_types)
        self.assertIsNone(root_token.get_xsd_type('root'))

        self.parser.schema = xmlschema.xpath.XMLSchemaProxy(schema)

        try:
            root_token = self.parser.parse('root')
            self.assertEqual(root_token.xsd_types, {'root': schema.meta_schema.types['int']})

            xsd_type = root_token.get_xsd_type('root')
            self.assertEqual(xsd_type, schema.meta_schema.types['int'])
            self.assertIsNone(root_token.get_xsd_type('node'))

            TestElement = namedtuple('XsdElement', 'name xsd_version type')
            root_token.add_xsd_type(
                TestElement('node', '1.0', schema.meta_schema.types['float'])
            )
            root_token.add_xsd_type(
                TestElement('node', '1.0', schema.meta_schema.types['boolean'])
            )
            root_token.add_xsd_type(
                TestElement('node', '1.0', schema.meta_schema.types['decimal'])
            )

            xsd_type = root_token.get_xsd_type('node')
            self.assertEqual(xsd_type, schema.meta_schema.types['float'])

            xsd_type = root_token.get_xsd_type(AttributeNode('node', 'false'))
            self.assertEqual(xsd_type, schema.meta_schema.types['boolean'])
            xsd_type = root_token.get_xsd_type(AttributeNode('node', 'alpha'))
            self.assertEqual(xsd_type, schema.meta_schema.types['float'])

            elem = ElementTree.Element('node')
            elem.text = 'false'
            xsd_type = root_token.get_xsd_type(ElementNode(elem))
            self.assertEqual(xsd_type, schema.meta_schema.types['boolean'])

            typed_element = ElementNode(elem, xsd_type=xsd_type)
            self.assertIs(xsd_type, root_token.get_xsd_type(typed_element))

            elem.text = 'alpha'
            xsd_type = root_token.get_xsd_type(ElementNode(elem))
            self.assertEqual(xsd_type, schema.meta_schema.types['float'])

        finally:
            self.parser.schema = None

        schema = xmlschema.XMLSchema(""" """)
        self.parser.schema = xmlschema.xpath.XMLSchemaProxy(schema)

        try:
            root_token = self.parser.parse('a')
            elem = ElementTree.Element('a')
            elem.append(ElementTree.Element('b1'))
            elem.append(ElementTree.Element('b2'))
            elem[0].text = 14
            elem[1].text = 'true'

            self.assertEqual(
                root_token.get_xsd_type(ElementNode(elem)), schema.types['aType']
            )

            TestElement = namedtuple('XsdElement', 'name xsd_version type')
            root_token.add_xsd_type(TestElement('a', '1.0', schema.meta_schema.types['float']))
            self.assertEqual(
                root_token.get_xsd_type(ElementNode(elem)), schema.types['aType']
            )

            root_token.xsd_types['a'].insert(0, schema.meta_schema.types['boolean'])
            self.assertEqual(
                root_token.get_xsd_type(ElementNode(elem)), schema.types['aType']
            )

            del elem[1]
            self.assertEqual(root_token.get_xsd_type(ElementNode(elem)),
                             schema.meta_schema.types['boolean'])

        finally:
            self.parser.schema = None

    @unittest.skipIf(xmlschema is None, "xmlschema library required.")
    def test_get_typed_node(self):
        schema = xmlschema.XMLSchema(""" """)

        self.parser.schema = xmlschema.xpath.XMLSchemaProxy(schema)

        try:
            root_token = self.parser.parse('root')
            elem = ElementTree.Element('root')
            elem.text = '49'
            context = XPathContext(elem)

            node = root_token.get_typed_node(context.root)
            self.assertIsInstance(node, ElementNode)
            self.assertIsInstance(node.xsd_type, xmlschema.XsdType)
            self.assertEqual(node.typed_value, 49)
            self.assertIs(root_token.get_typed_node(node), node)

            # elem.text = 'beta'
            # with self.assertRaises(TypeError) as err:
            #     root_token.get_typed_node(elem)
            # self.assertIn('XPDY0050', str(err.exception))
            # self.assertIn('does not match sequence type', str(err.exception))

            root_token.xsd_types['root'] = schema.meta_schema.types['anySimpleType']
            elem.text = '36'
            context = XPathContext(elem)
            node = root_token.get_typed_node(context.root)
            self.assertIsInstance(node, ElementNode)
            self.assertIsInstance(node.xsd_type, xmlschema.XsdType)
            self.assertIsInstance(node.typed_value, UntypedAtomic)
            self.assertEqual(node.typed_value, 36)  # Convert untyped to int

            root_token.xsd_types['root'] = schema.meta_schema.types['anyType']
            context = XPathContext(elem)
            node = root_token.get_typed_node(context.root)
            self.assertIs(node.elem, elem)

            root_token = self.parser.parse('@a')
            self.assertEqual(root_token[0].xsd_types, {'a': schema.meta_schema.types['int']})

            elem = ElementTree.Element('root', a='10')
            context = XPathContext(elem)
            attribute = context.root.attributes[0]
            node = root_token[0].get_typed_node(attribute)
            self.assertIsInstance(node, AttributeNode)
            self.assertIsInstance(node.xsd_type, xmlschema.XsdType)
            self.assertEqual(node.value, '10')

            root_token[0].xsd_types['a'] = schema.meta_schema.types['anyType']
            node = root_token[0].get_typed_node(attribute)
            self.assertIsInstance(node, AttributeNode)
            self.assertIsInstance(node.xsd_type, xmlschema.XsdType)
            self.assertIsInstance(node.typed_value, int)
            self.assertEqual(node.value, '10')
            self.assertEqual(node.typed_value, 10)
        finally:
            self.parser.schema = None

    def test_string_value_function(self):
        super(XPath2TokenTest, self).test_string_value_function()

        if xmlschema is not None:
            schema = xmlschema.XMLSchema(""" """)
            token = self.parser.parse('.')
            self.parser.schema = xmlschema.xpath.XMLSchemaProxy(schema)
            context = XPathContext(schema)

            try:
                value = token.string_value(context.root[0])  # 'root' element
                self.assertIsInstance(value, str)
                self.assertEqual(value, '1')
            finally:
                self.parser.schema = None


if __name__ == '__main__':
    unittest.main()
elementpath-3.0.2/tests/xpath_test_class.py000066400000000000000000000266411427546011100211200ustar00rootroot00000000000000
#!/usr/bin/env python
#
# Copyright (c), 2018-2020, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
#
# Note: Many tests are built using the examples of the XPath standards,
#       published by W3C under the W3C Document License.
#
# References:
#   http://www.w3.org/TR/1999/REC-xpath-19991116/
#   http://www.w3.org/TR/2010/REC-xpath20-20101214/
#   http://www.w3.org/TR/2010/REC-xpath-functions-20101214/
#   https://www.w3.org/Consortium/Legal/2015/doc-license
#   https://www.w3.org/TR/charmod-norm/
#
import unittest
import math
from copy import copy
from contextlib import contextmanager
from xml.etree import ElementTree

from elementpath import ElementPathError, XPath2Parser, XPathContext, select
from elementpath.namespaces import XML_NAMESPACE, XSD_NAMESPACE, \
    XSI_NAMESPACE, XPATH_FUNCTIONS_NAMESPACE


class DummyXsdType:
    name = local_name = None

    def is_matching(self, name, default_namespace): pass
    def is_empty(self): pass
    def is_simple(self): pass
    def has_simple_content(self): pass
    def has_mixed_content(self): pass
    def is_element_only(self): pass
    def is_key(self): pass
    def is_qname(self): pass
    def is_notation(self): pass
    def decode(self, obj, *args, **kwargs): pass
    def validate(self, obj, *args, **kwargs): pass


# noinspection PyPropertyAccess
class XPathTestCase(unittest.TestCase):
    namespaces = {
        'xml': XML_NAMESPACE,
        'xs': XSD_NAMESPACE,
        'xsi': XSI_NAMESPACE,
        'fn': XPATH_FUNCTIONS_NAMESPACE,
        'eg': 'http://www.example.com/ns/',
        'tst': 'http://xpath.test/ns',
    }
    variables = {
        'values': [10, 20, 5],
        'myaddress':
            'admin@example.com',
        'word': 'alpha',
    }
    etree = ElementTree

    def setUp(self):
        self.parser = XPath2Parser(self.namespaces)

    #
    # Helper methods
    def check_tokenizer(self, path, expected):
        """
        Checks the list of lexemes generated by the parser tokenizer.

        :param path: the XPath expression to be checked.
        :param expected: a list with lexemes generated by the tokenizer.
        """
        self.assertEqual([
            lit or symbol or name or unexpected
            for lit, symbol, name, unexpected in self.parser.__class__.tokenizer.findall(path)
        ], expected)

    def check_token(self, symbol, expected_label=None, expected_str=None,
                    expected_repr=None, value=None):
        """
        Checks a token class of an XPath parser class. The instance of the token is created
        using the value argument and is then checked against the other optional arguments.

        :param symbol: the string that identifies the token class in the parser's symbol table.
        :param expected_label: the expected label for the token instance.
        :param expected_str: the expected string conversion of the token instance.
        :param expected_repr: the expected string representation of the token instance.
        :param value: the value used to create the token instance.
        """
        token = self.parser.symbol_table[symbol](self.parser, value)
        self.assertEqual(token.symbol, symbol)
        if expected_label is not None:
            self.assertEqual(token.label, expected_label)
        if expected_str is not None:
            self.assertEqual(str(token), expected_str)
        if expected_repr is not None:
            self.assertEqual(repr(token), expected_repr)

    def check_tree(self, path, expected):
        """
        Checks the tree string representation of a parsed path.

        :param path: an XPath expression.
        :param expected: the expected result string.
        """
        self.assertEqual(self.parser.parse(path).tree, expected)

    def check_source(self, path, expected):
        """
        Checks the source representation of a parsed path.

        :param path: an XPath expression.
        :param expected: the expected result string.
        """
        self.assertEqual(self.parser.parse(path).source, expected)

    def check_value(self, path, expected=None, context=None):
        """
        Checks the result of the *evaluate* method with an XPath expression. The evaluation
        is applied on the root token of the parsed XPath expression.

        :param path: an XPath expression.
        :param expected: the expected result. Can be a data instance to compare to the result, \
        a type to be used to check the type of the result, a function that accepts the result \
        as argument and returns a boolean value, an exception class that is raised by running \
        the evaluate method.
        :param context: an optional `XPathContext` instance to be passed to evaluate method.
        """
        context = copy(context)
        try:
            root_token = self.parser.parse(path)
        except ElementPathError as err:
            if isinstance(expected, type) and isinstance(err, expected):
                return
            raise

        if isinstance(expected, type) and issubclass(expected, Exception):
            self.assertRaises(expected, root_token.evaluate, context)
        elif isinstance(expected, float) and math.isnan(expected):
            value = root_token.evaluate(context)
            if isinstance(value, list):
                self.assertTrue(any(math.isnan(x) for x in value))
            else:
                self.assertTrue(math.isnan(value))
        elif isinstance(expected, list):
            self.assertListEqual(root_token.evaluate(context), expected)
        elif isinstance(expected, set):
            self.assertEqual(set(root_token.evaluate(context)), expected)
        elif not callable(expected):
            self.assertEqual(root_token.evaluate(context), expected)
        elif isinstance(expected, type):
            value = root_token.evaluate(context)
            self.assertIsInstance(value, expected)
        else:
            self.assertTrue(expected(root_token.evaluate(context)))

    def check_select(self, path, expected, context=None):
        """
        Checks the materialized result of the *select* method with an XPath expression.
        The selection is applied on the root token of the parsed XPath expression.

        :param path: an XPath expression.
        :param expected: the expected result.
        Can be a data instance to compare to the result, \
        a function that accepts the result as argument and returns a boolean value, an exception \
        class that is raised by running the select method.
        :param context: an optional `XPathContext` instance to be passed to the select method. \
        If no context is provided the method is called with a dummy context.
        """
        if context is None:
            context = XPathContext(root=self.etree.Element(u'dummy_root'))
        else:
            context = copy(context)

        root_token = self.parser.parse(path)
        if isinstance(expected, type) and issubclass(expected, Exception):
            self.assertRaises(expected, root_token.select, context)
        elif isinstance(expected, list):
            self.assertListEqual(list(root_token.select(context)), expected)
        elif isinstance(expected, set):
            self.assertEqual(set(root_token.select(context)), expected)
        elif not callable(expected):
            self.assertEqual(list(root_token.select(context)), expected)
        else:
            self.assertTrue(expected(list(root_token.parser.parse(path).select(context))))

    def check_selector(self, path, root, expected, namespaces=None, **kwargs):
        """
        Checks using the selector API, namely the *select* function at package level.

        :param path: an XPath expression.
        :param root: an Element or an ElementTree instance.
        :param expected: the expected result. Can be a data instance to compare to the result, \
        a type to be used to check the type of the result, a function that accepts the result \
        as argument and returns a boolean value, an exception class that is raised by calling \
        the select function.
        :param namespaces: an optional mapping from prefixes to namespace URIs.
        :param kwargs: other optional arguments for the parser class.
        """
        if isinstance(expected, type) and issubclass(expected, Exception):
            self.assertRaises(expected, select, root, path, namespaces,
                              self.parser.__class__, **kwargs)
        else:
            results = select(root, path, namespaces, self.parser.__class__, **kwargs)
            if isinstance(expected, list):
                self.assertListEqual(results, expected)
            elif isinstance(expected, set):
                self.assertEqual(set(results), expected)
            elif isinstance(expected, float) and math.isnan(expected):
                self.assertTrue(math.isnan(results))
            elif not callable(expected):
                self.assertEqual(results, expected)
            elif isinstance(expected, type):
                self.assertIsInstance(results, expected)
            else:
                self.assertTrue(expected(results))

    @contextmanager
    def schema_bound_parser(self, schema_proxy):
        # Bind the schema proxy to the parser; the binding is removed on exit.
        self.parser.schema = schema_proxy
        try:
            yield self.parser
        finally:
            self.parser.schema = None

    @contextmanager
    def xsd_version_parser(self, xsd_version):
        xsd_version, self.parser._xsd_version = self.parser._xsd_version, xsd_version
        try:
            yield self.parser
        finally:
            self.parser._xsd_version = xsd_version

    # Wrong XPath expression checker shortcuts
    def check_raise(self, path, exception_class, *message_parts, context=None):
        with self.assertRaises(exception_class) as error_context:
            root_token = self.parser.parse(path)
            root_token.evaluate(copy(context))

        for part in message_parts:
            self.assertIn(part, str(error_context.exception))

    def wrong_syntax(self, path, *message_parts, context=None):
        with self.assertRaises(SyntaxError) as error_context:
            root_token = self.parser.parse(path)
            root_token.evaluate(copy(context))

        for part in message_parts:
            self.assertIn(part, str(error_context.exception))

    def wrong_value(self, path, *message_parts, context=None):
        with self.assertRaises(ValueError) as error_context:
            root_token = self.parser.parse(path)
            root_token.evaluate(copy(context))

        for part in message_parts:
            self.assertIn(part, str(error_context.exception))

    def wrong_type(self, path, *message_parts, context=None):
        with self.assertRaises(TypeError) as error_context:
            root_token = self.parser.parse(path)
            root_token.evaluate(copy(context))

        for part in message_parts:
            self.assertIn(part, str(error_context.exception))

    def wrong_name(self, path, *message_parts, context=None):
        with self.assertRaises(NameError) as error_context:
            root_token = self.parser.parse(path)
            root_token.evaluate(copy(context))

        for part in message_parts:
            self.assertIn(part, str(error_context.exception))


if __name__ == '__main__':
    unittest.main()
elementpath-3.0.2/tox.ini000066400000000000000000000025751427546011100153470ustar00rootroot00000000000000
# Tox (http://tox.testrun.org/) is a tool for running tests
# in multiple virtualenvs. This configuration file will run the
# test suite on all supported python versions. To use it, "pip install tox"
# and then run "tox" from this directory.

[tox]
envlist =
    py{37,38,39,310,311}, pypy3, xmlschema{20}, docs, flake8,
    mypy-py{38,39,310,311}, pytest, coverage
skip_missing_interpreters = true
toxworkdir = {homedir}/.tox/elementpath

[testenv]
deps =
    lxml
    xmlschema>=2.0.0
    docs: Sphinx
    coverage: coverage
    xmlschema20: xmlschema~=2.0.0
commands = python -m unittest
whitelist_externals = make

[testenv:docs]
commands =
    make -C doc html SPHINXOPTS="-W -n"
    make -C doc latexpdf SPHINXOPTS="-W -n"
    make -C doc doctest SPHINXOPTS="-W -n"
    sphinx-build -W -n -T -b man doc build/sphinx/man

[flake8]
max-line-length = 100

[testenv:flake8]
deps =
    flake8
commands =
    flake8 elementpath
    flake8 tests

[testenv:mypy-py{38,39,310,311}]
deps =
    mypy==0.971
    lxml-stubs
commands =
    mypy --strict elementpath
    python tests/test_typing.py

[testenv:coverage]
commands =
    coverage run -p -m unittest
    coverage combine
    coverage report -m

[testenv:pytest]
deps =
    pytest
    pytest-randomly
    lxml
    xmlschema>=2.0.0
commands =
    pytest tests -ra

[testenv:build]
deps =
    setuptools
    wheel
    build
commands =
    python -m build