pax_global_header00006660000000000000000000000064125563453320014522gustar00rootroot0000000000000052 comment=720516a7e11db3181a2ee6225a0df054578d9c5b mwparserfromhell-0.4.2/000077500000000000000000000000001255634533200151165ustar00rootroot00000000000000mwparserfromhell-0.4.2/.coveragerc000066400000000000000000000002341255634533200172360ustar00rootroot00000000000000[report] exclude_lines = pragma: no cover raise NotImplementedError() partial_branches = pragma: no branch if py3k: if not py3k: if py26: mwparserfromhell-0.4.2/.gitignore000066400000000000000000000001661255634533200171110ustar00rootroot00000000000000*.pyc *.pyd *.so *.dll *.egg *.egg-info .coverage .DS_Store __pycache__ build dist docs/_build scripts/*.log htmlcov/ mwparserfromhell-0.4.2/.travis.yml000066400000000000000000000005161255634533200172310ustar00rootroot00000000000000language: python python: - 2.6 - 2.7 - 3.2 - 3.3 - 3.4 - 3.5-dev sudo: false install: - pip install coveralls - python setup.py build script: - coverage run --source=mwparserfromhell setup.py -q test after_success: - coveralls env: matrix: - WITHOUT_EXTENSION=0 - WITHOUT_EXTENSION=1 mwparserfromhell-0.4.2/CHANGELOG000066400000000000000000000165731255634533200163440ustar00rootroot00000000000000v0.4.2 (released July 30, 2015): - Fixed setup script not including header files in releases. - Fixed Windows binary uploads. v0.4.1 (released July 30, 2015): - The process for building Windows binaries has been fixed, and these should be distributed along with new releases. Windows users can now take advantage of C speedups without having a compiler of their own. - Added support for Python 3.5. - '<' and '>' are now disallowed in wikilink titles and template names. This includes when denoting tags, but not comments. - Fixed the behavior of preserve_spacing in Template.add() and keep_field in Template.remove() on parameters with hidden keys. - Removed _ListProxy.detach(). SmartLists now use weak references and their children are garbage-collected properly. - Fixed parser bugs involving: - templates with completely blank names; - templates with newlines and comments. - Heavy refactoring and fixes to the C tokenizer, including: - corrected a design flaw in text handling, allowing for substantial speed improvements when parsing long strings of plain text; - implemented new Python 3.3 PEP 393 Unicode APIs. - Fixed various bugs in SmartList, including one that was causing memory issues on 64-bit builds of Python 2 on Windows. - Fixed some bugs in the release scripts. v0.4 (released May 23, 2015): - The parser now falls back on pure Python mode if C extensions cannot be built. This fixes an issue that prevented some Windows users from installing the parser. - Added support for parsing wikicode tables (patches by David Winegar). - Added a script to test for memory leaks in scripts/memtest.py. - Added a script to do releases in scripts/release.sh. - skip_style_tags can now be passed to mwparserfromhell.parse() (previously, only Parser().parse() allowed it). - The 'recursive' argument to Wikicode's filter methods now accepts a third option, RECURSE_OTHERS, which recurses over all children except instances of 'forcetype' (for example, `code.filter_templates(code.RECURSE_OTHERS)` returns all un-nested templates). - The parser now understands HTML tag attributes quoted with single quotes. When setting a tag attribute's value, quotes will be added if necessary. As part of this, Attribute's 'quoted' attribute has been changed to 'quotes', and is now either a string or None. - Calling Template.remove() with a Parameter object that is not part of the template now raises ValueError instead of doing nothing. - Parameters with non-integer keys can no longer be created with 'showkey=False', nor have the value of this attribute be set to False later. - _ListProxy.destroy() has been changed to _ListProxy.detach(), and now works in a more useful way. - If something goes wrong while parsing, ParserError will now be raised. Previously, the parser would produce an unclear BadRoute exception or allow an incorrect node tree to be build. - Fixed parser bugs involving: - nested tags; - comments in template names; - tags inside of tags. - Added tests to ensure that parsed trees convert back to wikicode without unintentional modifications. - Added support for a NOWEB environment variable, which disables a unit test that makes a web call. - Test coverage has been improved, and some minor related bugs have been fixed. - Updated and fixed some documentation. v0.3.3 (released April 22, 2014): - Added support for Python 2.6 and 3.4. - Template.has() is now passed 'ignore_empty=False' by default instead of True. This fixes a bug when adding parameters to templates with empty fields, and is a breaking change if you rely on the default behavior. - The 'matches' argument of Wikicode's filter methods now accepts a function (taking one argument, a Node, and returning a bool) in addition to a regex. - Re-added 'flat' argument to Wikicode.get_sections(), fixed the order in which it returns sections, and made it faster. - Wikicode.matches() now accepts a tuple or list of strings/Wikicode objects instead of just a single string or Wikicode. - Given the frequency of issues with the (admittedly insufficient) tag parser, there's a temporary skip_style_tags argument to parse() that ignores '' and ''' until these issues are corrected. - Fixed a parser bug involving nested wikilinks and external links. - C code cleanup and speed improvements. v0.3.2 (released September 1, 2013): - Added support for Python 3.2 (along with current support for 3.3 and 2.7). - Renamed Template.remove()'s first argument from 'name' to 'param', which now accepts Parameter objects in addition to parameter name strings. v0.3.1 (released August 29, 2013): - Fixed a parser bug involving URLs nested inside other markup. - Fixed some typos. v0.3 (released August 24, 2013): - Added complete support for HTML Tags, including forms like foo, , and wiki-markup tags like bold ('''), italics (''), and lists (*, #, ; and :). - Added support for ExternalLinks (http://example.com/ and [http://example.com/ Example]). - Wikicode's filter methods are now passed 'recursive=True' by default instead of False. This is a breaking change if you rely on any filter() methods being non-recursive by default. - Added a matches() method to Wikicode for page/template name comparisons. - The 'obj' param of Wikicode.insert_before(), insert_after(), replace(), and remove() now accepts other Wikicode objects and strings representing parts of wikitext, instead of just nodes. These methods also make all possible substitutions instead of just one. - Renamed Template.has_param() to has() for consistency with Template's other methods; has_param() is now an alias. - The C tokenizer extension now works on Python 3 in addition to Python 2.7. - Various bugfixes, internal changes, and cleanup. v0.2 (released June 20, 2013): - The parser now fully supports Python 3 in addition to Python 2.7. - Added a C tokenizer extension that is significantly faster than its Python equivalent. It is enabled by default (if available) and can be toggled by setting `mwparserfromhell.parser.use_c` to a boolean value. - Added a complete set of unit tests covering parsing and wikicode manipulation. - Renamed Wikicode.filter_links() to filter_wikilinks() (applies to ifilter as well). - Added filter methods for Arguments, Comments, Headings, and HTMLEntities. - Added 'before' param to Template.add(); renamed 'force_nonconformity' to 'preserve_spacing'. - Added 'include_lead' param to Wikicode.get_sections(). - Removed 'flat' param from Wikicode.get_sections(). - Removed 'force_no_field' param from Template.remove(). - Added support for Travis CI. - Added note about Windows build issue in the README. - The tokenizer will limit itself to a realistic recursion depth to prevent errors and unreasonably long parse times. - Fixed how some nodes' attribute setters handle input. - Fixed multiple bugs in the tokenizer's handling of invalid markup. - Fixed bugs in the implementation of SmartList and StringMixIn. - Fixed some broken example code in the README; other copyedits. - Other bugfixes and code cleanup. v0.1.1 (released September 21, 2012): - Added support for Comments () and Wikilinks ([[foo]]). - Added corresponding ifilter_links() and filter_links() methods to Wikicode. - Fixed a bug when parsing incomplete templates. - Fixed strip_code() to affect the contents of headings. - Various copyedits in documentation and comments. v0.1 (released August 23, 2012): - Initial release. mwparserfromhell-0.4.2/LICENSE000066400000000000000000000020761255634533200161300ustar00rootroot00000000000000Copyright (C) 2012-2015 Ben Kurtovic Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. mwparserfromhell-0.4.2/MANIFEST.in000066400000000000000000000001471255634533200166560ustar00rootroot00000000000000include LICENSE CHANGELOG recursive-include mwparserfromhell *.h recursive-include tests *.py *.mwtest mwparserfromhell-0.4.2/README.rst000066400000000000000000000137571255634533200166220ustar00rootroot00000000000000mwparserfromhell ================ .. image:: https://img.shields.io/travis/earwig/mwparserfromhell/develop.svg :alt: Build Status :target: http://travis-ci.org/earwig/mwparserfromhell .. image:: https://img.shields.io/coveralls/earwig/mwparserfromhell/develop.svg :alt: Coverage Status :target: https://coveralls.io/r/earwig/mwparserfromhell **mwparserfromhell** (the *MediaWiki Parser from Hell*) is a Python package that provides an easy-to-use and outrageously powerful parser for MediaWiki_ wikicode. It supports Python 2 and Python 3. Developed by Earwig_ with contributions from `Σ`_, Legoktm_, and others. Full documentation is available on ReadTheDocs_. Development occurs on GitHub_. Installation ------------ The easiest way to install the parser is through the `Python Package Index`_; you can install the latest release with ``pip install mwparserfromhell`` (`get pip`_). On Windows, make sure you have the latest version of pip installed by running ``pip install --upgrade pip``. Alternatively, get the latest development version:: git clone https://github.com/earwig/mwparserfromhell.git cd mwparserfromhell python setup.py install You can run the comprehensive unit testing suite with ``python setup.py test -q``. Usage ----- Normal usage is rather straightforward (where ``text`` is page text):: >>> import mwparserfromhell >>> wikicode = mwparserfromhell.parse(text) ``wikicode`` is a ``mwparserfromhell.Wikicode`` object, which acts like an ordinary ``str`` object (or ``unicode`` in Python 2) with some extra methods. For example:: >>> text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?" >>> wikicode = mwparserfromhell.parse(text) >>> print(wikicode) I has a template! {{foo|bar|baz|eggs=spam}} See it? >>> templates = wikicode.filter_templates() >>> print(templates) ['{{foo|bar|baz|eggs=spam}}'] >>> template = templates[0] >>> print(template.name) foo >>> print(template.params) ['bar', 'baz', 'eggs=spam'] >>> print(template.get(1).value) bar >>> print(template.get("eggs").value) spam Since nodes can contain other nodes, getting nested templates is trivial:: >>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}" >>> mwparserfromhell.parse(text).filter_templates() ['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}'] You can also pass ``recursive=False`` to ``filter_templates()`` and explore templates manually. This is possible because nodes can contain additional ``Wikicode`` objects:: >>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}") >>> print(code.filter_templates(recursive=False)) ['{{foo|this {{includes a|template}}}}'] >>> foo = code.filter_templates(recursive=False)[0] >>> print(foo.get(1).value) this {{includes a|template}} >>> print(foo.get(1).value.filter_templates()[0]) {{includes a|template}} >>> print(foo.get(1).value.filter_templates()[0].get(1).value) template Templates can be easily modified to add, remove, or alter params. ``Wikicode`` objects can be treated like lists, with ``append()``, ``insert()``, ``remove()``, ``replace()``, and more. They also have a ``matches()`` method for comparing page or template names, which takes care of capitalization and whitespace:: >>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}" >>> code = mwparserfromhell.parse(text) >>> for template in code.filter_templates(): ... if template.name.matches("Cleanup") and not template.has("date"): ... template.add("date", "July 2012") ... >>> print(code) {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}} >>> code.replace("{{uncategorized}}", "{{bar-stub}}") >>> print(code) {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}} >>> print(code.filter_templates()) ['{{cleanup|date=July 2012}}', '{{bar-stub}}'] You can then convert ``code`` back into a regular ``str`` object (for saving the page!) by calling ``str()`` on it:: >>> text = str(code) >>> print(text) {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}} >>> text == code True Likewise, use ``unicode(code)`` in Python 2. Integration ----------- ``mwparserfromhell`` is used by and originally developed for EarwigBot_; ``Page`` objects have a ``parse`` method that essentially calls ``mwparserfromhell.parse()`` on ``page.get()``. If you're using Pywikibot_, your code might look like this:: import mwparserfromhell import pywikibot def parse(title): site = pywikibot.Site() page = pywikibot.Page(site, title) text = page.get() return mwparserfromhell.parse(text) If you're not using a library, you can parse any page using the following code (via the API_):: import json from urllib.parse import urlencode from urllib.request import urlopen import mwparserfromhell API_URL = "https://en.wikipedia.org/w/api.php" def parse(title): data = {"action": "query", "prop": "revisions", "rvlimit": 1, "rvprop": "content", "format": "json", "titles": title} raw = urlopen(API_URL, urlencode(data).encode()).read() res = json.loads(raw) text = res["query"]["pages"].values()[0]["revisions"][0]["*"] return mwparserfromhell.parse(text) .. _MediaWiki: http://mediawiki.org .. _ReadTheDocs: http://mwparserfromhell.readthedocs.org .. _Earwig: http://en.wikipedia.org/wiki/User:The_Earwig .. _Σ: http://en.wikipedia.org/wiki/User:%CE%A3 .. _Legoktm: http://en.wikipedia.org/wiki/User:Legoktm .. _GitHub: https://github.com/earwig/mwparserfromhell .. _Python Package Index: http://pypi.python.org .. _get pip: http://pypi.python.org/pypi/pip .. _EarwigBot: https://github.com/earwig/earwigbot .. _Pywikibot: https://www.mediawiki.org/wiki/Manual:Pywikibot .. _API: http://mediawiki.org/wiki/API mwparserfromhell-0.4.2/appveyor.yml000066400000000000000000000026531255634533200175140ustar00rootroot00000000000000# This config file is used by appveyor.com to build Windows release binaries version: 0.4.2-b{build} branches: only: - master skip_tags: true environment: global: # See: http://stackoverflow.com/a/13751649/163740 WRAPPER: "cmd /E:ON /V:ON /C .\\scripts\\win_wrapper.cmd" PIP: "%WRAPPER% %PYTHON%\\Scripts\\pip.exe" SETUPPY: "%WRAPPER% %PYTHON%\\python setup.py --with-extension" PYMOD: "%WRAPPER% %PYTHON%\\python -m" PYPI_USERNAME: "earwigbot" PYPI_PASSWORD: secure: gOIcvPxSC2ujuhwOzwj3v8xjq3CCYd8keFWVnguLM+gcL0e02qshDHy7gwZZwj0+ matrix: - PYTHON: "C:\\Python27" PYTHON_VERSION: "2.7" PYTHON_ARCH: "32" - PYTHON: "C:\\Python27-x64" PYTHON_VERSION: "2.7" PYTHON_ARCH: "64" - PYTHON: "C:\\Python33" PYTHON_VERSION: "3.3" PYTHON_ARCH: "32" - PYTHON: "C:\\Python33-x64" PYTHON_VERSION: "3.3" PYTHON_ARCH: "64" - PYTHON: "C:\\Python34" PYTHON_VERSION: "3.4" PYTHON_ARCH: "32" - PYTHON: "C:\\Python34-x64" PYTHON_VERSION: "3.4" PYTHON_ARCH: "64" install: - "%PIP% install wheel twine" build_script: - "%SETUPPY% build" test_script: - "%SETUPPY% -q test" after_test: - "%SETUPPY% bdist_wheel" on_success: - "%PYMOD% twine upload dist\\* -u %PYPI_USERNAME% -p %PYPI_PASSWORD%" artifacts: - path: dist\* deploy: off mwparserfromhell-0.4.2/docs/000077500000000000000000000000001255634533200160465ustar00rootroot00000000000000mwparserfromhell-0.4.2/docs/Makefile000066400000000000000000000127441255634533200175160ustar00rootroot00000000000000# Makefile for Sphinx documentation # # You can set these variables from the command line. SPHINXOPTS = SPHINXBUILD = sphinx-build PAPER = BUILDDIR = _build # Internal variables. PAPEROPT_a4 = -D latex_paper_size=a4 PAPEROPT_letter = -D latex_paper_size=letter ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . # the i18n builder cannot share the environment and doctrees with the others I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . .PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext help: @echo "Please use \`make ' where is one of" @echo " html to make standalone HTML files" @echo " dirhtml to make HTML files named index.html in directories" @echo " singlehtml to make a single large HTML file" @echo " pickle to make pickle files" @echo " json to make JSON files" @echo " htmlhelp to make HTML files and a HTML help project" @echo " qthelp to make HTML files and a qthelp project" @echo " devhelp to make HTML files and a Devhelp project" @echo " epub to make an epub" @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" @echo " latexpdf to make LaTeX files and run them through pdflatex" @echo " text to make text files" @echo " man to make manual pages" @echo " texinfo to make Texinfo files" @echo " info to make Texinfo files and run them through makeinfo" @echo " gettext to make PO message catalogs" @echo " changes to make an overview of all changed/added/deprecated items" @echo " linkcheck to check all external links for integrity" @echo " doctest to run all doctests embedded in the documentation (if enabled)" clean: -rm -rf $(BUILDDIR)/* html: $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." dirhtml: $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." singlehtml: $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml @echo @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." pickle: $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle @echo @echo "Build finished; now you can process the pickle files." json: $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json @echo @echo "Build finished; now you can process the JSON files." htmlhelp: $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp @echo @echo "Build finished; now you can run HTML Help Workshop with the" \ ".hhp project file in $(BUILDDIR)/htmlhelp." qthelp: $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp @echo @echo "Build finished; now you can run "qcollectiongenerator" with the" \ ".qhcp project file in $(BUILDDIR)/qthelp, like this:" @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/mwparserfromhell.qhcp" @echo "To view the help file:" @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/mwparserfromhell.qhc" devhelp: $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp @echo @echo "Build finished." @echo "To view the help file:" @echo "# mkdir -p $$HOME/.local/share/devhelp/mwparserfromhell" @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/mwparserfromhell" @echo "# devhelp" epub: $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub @echo @echo "Build finished. The epub file is in $(BUILDDIR)/epub." latex: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." @echo "Run \`make' in that directory to run these through (pdf)latex" \ "(use \`make latexpdf' here to do that automatically)." latexpdf: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo "Running LaTeX files through pdflatex..." $(MAKE) -C $(BUILDDIR)/latex all-pdf @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." text: $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text @echo @echo "Build finished. The text files are in $(BUILDDIR)/text." man: $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man @echo @echo "Build finished. The manual pages are in $(BUILDDIR)/man." texinfo: $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo @echo @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." @echo "Run \`make' in that directory to run these through makeinfo" \ "(use \`make info' here to do that automatically)." info: $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo @echo "Running Texinfo files through makeinfo..." make -C $(BUILDDIR)/texinfo info @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." gettext: $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale @echo @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." changes: $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes @echo @echo "The overview file is in $(BUILDDIR)/changes." linkcheck: $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck @echo @echo "Link check complete; look for any errors in the above output " \ "or in $(BUILDDIR)/linkcheck/output.txt." doctest: $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest @echo "Testing of doctests in the sources finished, look at the " \ "results in $(BUILDDIR)/doctest/output.txt." mwparserfromhell-0.4.2/docs/api/000077500000000000000000000000001255634533200166175ustar00rootroot00000000000000mwparserfromhell-0.4.2/docs/api/modules.rst000066400000000000000000000001251255634533200210170ustar00rootroot00000000000000mwparserfromhell ================ .. toctree:: :maxdepth: 6 mwparserfromhell mwparserfromhell-0.4.2/docs/api/mwparserfromhell.nodes.extras.rst000066400000000000000000000007421255634533200253610ustar00rootroot00000000000000extras Package ============== :mod:`extras` Package --------------------- .. automodule:: mwparserfromhell.nodes.extras :members: :undoc-members: :mod:`attribute` Module ----------------------- .. automodule:: mwparserfromhell.nodes.extras.attribute :members: :undoc-members: :show-inheritance: :mod:`parameter` Module ----------------------- .. automodule:: mwparserfromhell.nodes.extras.parameter :members: :undoc-members: :show-inheritance: mwparserfromhell-0.4.2/docs/api/mwparserfromhell.nodes.rst000066400000000000000000000031231255634533200240500ustar00rootroot00000000000000nodes Package ============= :mod:`nodes` Package -------------------- .. automodule:: mwparserfromhell.nodes .. autoclass:: mwparserfromhell.nodes.Node :special-members: :mod:`argument` Module ---------------------- .. automodule:: mwparserfromhell.nodes.argument :members: :undoc-members: :show-inheritance: :mod:`comment` Module --------------------- .. automodule:: mwparserfromhell.nodes.comment :members: :undoc-members: :show-inheritance: :mod:`external_link` Module --------------------------- .. automodule:: mwparserfromhell.nodes.external_link :members: :undoc-members: :show-inheritance: :mod:`heading` Module --------------------- .. automodule:: mwparserfromhell.nodes.heading :members: :undoc-members: :show-inheritance: :mod:`html_entity` Module ------------------------- .. automodule:: mwparserfromhell.nodes.html_entity :members: :undoc-members: :show-inheritance: :mod:`tag` Module ----------------- .. automodule:: mwparserfromhell.nodes.tag :members: :undoc-members: :show-inheritance: :mod:`template` Module ---------------------- .. automodule:: mwparserfromhell.nodes.template :members: :undoc-members: :show-inheritance: :mod:`text` Module ------------------ .. automodule:: mwparserfromhell.nodes.text :members: :undoc-members: :show-inheritance: :mod:`wikilink` Module ---------------------- .. automodule:: mwparserfromhell.nodes.wikilink :members: :undoc-members: :show-inheritance: Subpackages ----------- .. toctree:: mwparserfromhell.nodes.extras mwparserfromhell-0.4.2/docs/api/mwparserfromhell.parser.rst000066400000000000000000000014071255634533200242370ustar00rootroot00000000000000parser Package ============== :mod:`parser` Package --------------------- .. automodule:: mwparserfromhell.parser :members: :undoc-members: :mod:`builder` Module --------------------- .. automodule:: mwparserfromhell.parser.builder :members: :undoc-members: :private-members: :mod:`contexts` Module ---------------------- .. automodule:: mwparserfromhell.parser.contexts :members: :undoc-members: :mod:`tokenizer` Module ----------------------- .. automodule:: mwparserfromhell.parser.tokenizer :members: :undoc-members: :private-members: .. autoexception:: mwparserfromhell.parser.tokenizer.BadRoute :mod:`tokens` Module -------------------- .. automodule:: mwparserfromhell.parser.tokens :members: :undoc-members: mwparserfromhell-0.4.2/docs/api/mwparserfromhell.rst000066400000000000000000000021031255634533200227360ustar00rootroot00000000000000mwparserfromhell Package ======================== :mod:`mwparserfromhell` Package ------------------------------- .. automodule:: mwparserfromhell.__init__ :members: :undoc-members: :mod:`compat` Module -------------------- .. automodule:: mwparserfromhell.compat :members: :undoc-members: :mod:`definitions` Module ------------------------- .. automodule:: mwparserfromhell.definitions :members: :mod:`smart_list` Module ------------------------ .. automodule:: mwparserfromhell.smart_list :members: SmartList, _ListProxy :undoc-members: :show-inheritance: :mod:`string_mixin` Module -------------------------- .. automodule:: mwparserfromhell.string_mixin :members: :undoc-members: :mod:`utils` Module ------------------- .. automodule:: mwparserfromhell.utils :members: :undoc-members: :mod:`wikicode` Module ---------------------- .. automodule:: mwparserfromhell.wikicode :members: :undoc-members: :show-inheritance: Subpackages ----------- .. toctree:: mwparserfromhell.nodes mwparserfromhell.parser mwparserfromhell-0.4.2/docs/changelog.rst000066400000000000000000000232221255634533200205300ustar00rootroot00000000000000Changelog ========= v0.4.2 ------ `Released July 30, 2015 `_ (`changes `__): - Fixed setup script not including header files in releases. - Fixed Windows binary uploads. v0.4.1 ------ `Released July 30, 2015 `_ (`changes `__): - The process for building Windows binaries has been fixed, and these should be distributed along with new releases. Windows users can now take advantage of C speedups without having a compiler of their own. - Added support for Python 3.5. - ``<`` and ``>`` are now disallowed in wikilink titles and template names. This includes when denoting tags, but not comments. - Fixed the behavior of *preserve_spacing* in :meth:`.Template.add` and *keep_field* in :meth:`.Template.remove` on parameters with hidden keys. - Removed :meth:`._ListProxy.detach`. :class:`.SmartList`\ s now use weak references and their children are garbage-collected properly. - Fixed parser bugs involving: - templates with completely blank names; - templates with newlines and comments. - Heavy refactoring and fixes to the C tokenizer, including: - corrected a design flaw in text handling, allowing for substantial speed improvements when parsing long strings of plain text; - implemented new Python 3.3 `PEP 393 `_ Unicode APIs. - Fixed various bugs in :class:`.SmartList`, including one that was causing memory issues on 64-bit builds of Python 2 on Windows. - Fixed some bugs in the release scripts. v0.4 ---- `Released May 23, 2015 `_ (`changes `__): - The parser now falls back on pure Python mode if C extensions cannot be built. This fixes an issue that prevented some Windows users from installing the parser. - Added support for parsing wikicode tables (patches by David Winegar). - Added a script to test for memory leaks in :file:`scripts/memtest.py`. - Added a script to do releases in :file:`scripts/release.sh`. - *skip_style_tags* can now be passed to :func:`mwparserfromhell.parse() <.parse_anything>` (previously, only :meth:`.Parser.parse` allowed it). - The *recursive* argument to :class:`Wikicode's <.Wikicode>` :meth:`.filter` methods now accepts a third option, ``RECURSE_OTHERS``, which recurses over all children except instances of *forcetype* (for example, ``code.filter_templates(code.RECURSE_OTHERS)`` returns all un-nested templates). - The parser now understands HTML tag attributes quoted with single quotes. When setting a tag attribute's value, quotes will be added if necessary. As part of this, :class:`.Attribute`\ 's :attr:`~.Attribute.quoted` attribute has been changed to :attr:`~.Attribute.quotes`, and is now either a string or ``None``. - Calling :meth:`.Template.remove` with a :class:`.Parameter` object that is not part of the template now raises :exc:`ValueError` instead of doing nothing. - :class:`.Parameter`\ s with non-integer keys can no longer be created with *showkey=False*, nor have the value of this attribute be set to *False* later. - :meth:`._ListProxy.destroy` has been changed to :meth:`._ListProxy.detach`, and now works in a more useful way. - If something goes wrong while parsing, :exc:`.ParserError` will now be raised. Previously, the parser would produce an unclear :exc:`.BadRoute` exception or allow an incorrect node tree to be build. - Fixed parser bugs involving: - nested tags; - comments in template names; - tags inside of ```` tags. - Added tests to ensure that parsed trees convert back to wikicode without unintentional modifications. - Added support for a :envvar:`NOWEB` environment variable, which disables a unit test that makes a web call. - Test coverage has been improved, and some minor related bugs have been fixed. - Updated and fixed some documentation. v0.3.3 ------ `Released April 22, 2014 `_ (`changes `__): - Added support for Python 2.6 and 3.4. - :meth:`.Template.has` is now passed *ignore_empty=False* by default instead of *True*. This fixes a bug when adding parameters to templates with empty fields, **and is a breaking change if you rely on the default behavior.** - The *matches* argument of :class:`Wikicode's <.Wikicode>` :meth:`.filter` methods now accepts a function (taking one argument, a :class:`.Node`, and returning a bool) in addition to a regex. - Re-added *flat* argument to :meth:`.Wikicode.get_sections`, fixed the order in which it returns sections, and made it faster. - :meth:`.Wikicode.matches` now accepts a tuple or list of strings/:class:`.Wikicode` objects instead of just a single string or :class:`.Wikicode`. - Given the frequency of issues with the (admittedly insufficient) tag parser, there's a temporary *skip_style_tags* argument to :meth:`~.Parser.parse` that ignores ``''`` and ``'''`` until these issues are corrected. - Fixed a parser bug involving nested wikilinks and external links. - C code cleanup and speed improvements. v0.3.2 ------ `Released September 1, 2013 `_ (`changes `__): - Added support for Python 3.2 (along with current support for 3.3 and 2.7). - Renamed :meth:`.Template.remove`\ 's first argument from *name* to *param*, which now accepts :class:`.Parameter` objects in addition to parameter name strings. v0.3.1 ------ `Released August 29, 2013 `_ (`changes `__): - Fixed a parser bug involving URLs nested inside other markup. - Fixed some typos. v0.3 ---- `Released August 24, 2013 `_ (`changes `__): - Added complete support for HTML :class:`Tags <.Tag>`, including forms like ``foo``, ````, and wiki-markup tags like bold (``'''``), italics (``''``), and lists (``*``, ``#``, ``;`` and ``:``). - Added support for :class:`.ExternalLink`\ s (``http://example.com/`` and ``[http://example.com/ Example]``). - :class:`Wikicode's <.Wikicode>` :meth:`.filter` methods are now passed *recursive=True* by default instead of *False*. **This is a breaking change if you rely on any filter() methods being non-recursive by default.** - Added a :meth:`.matches` method to :class:`.Wikicode` for page/template name comparisons. - The *obj* param of :meth:`.Wikicode.insert_before`, :meth:`.insert_after`, :meth:`~.Wikicode.replace`, and :meth:`~.Wikicode.remove` now accepts :class:`.Wikicode` objects and strings representing parts of wikitext, instead of just nodes. These methods also make all possible substitutions instead of just one. - Renamed :meth:`.Template.has_param` to :meth:`~.Template.has` for consistency with :class:`.Template`\ 's other methods; :meth:`.has_param` is now an alias. - The C tokenizer extension now works on Python 3 in addition to Python 2.7. - Various bugfixes, internal changes, and cleanup. v0.2 ---- `Released June 20, 2013 `_ (`changes `__): - The parser now fully supports Python 3 in addition to Python 2.7. - Added a C tokenizer extension that is significantly faster than its Python equivalent. It is enabled by default (if available) and can be toggled by setting :attr:`mwparserfromhell.parser.use_c` to a boolean value. - Added a complete set of unit tests covering parsing and wikicode manipulation. - Renamed :meth:`.filter_links` to :meth:`.filter_wikilinks` (applies to :meth:`.ifilter` as well). - Added filter methods for :class:`Arguments <.Argument>`, :class:`Comments <.Comment>`, :class:`Headings <.Heading>`, and :class:`HTMLEntities <.HTMLEntity>`. - Added *before* param to :meth:`.Template.add`; renamed *force_nonconformity* to *preserve_spacing*. - Added *include_lead* param to :meth:`.Wikicode.get_sections`. - Removed *flat* param from :meth:`.get_sections`. - Removed *force_no_field* param from :meth:`.Template.remove`. - Added support for Travis CI. - Added note about Windows build issue in the README. - The tokenizer will limit itself to a realistic recursion depth to prevent errors and unreasonably long parse times. - Fixed how some nodes' attribute setters handle input. - Fixed multiple bugs in the tokenizer's handling of invalid markup. - Fixed bugs in the implementation of :class:`.SmartList` and :class:`.StringMixIn`. - Fixed some broken example code in the README; other copyedits. - Other bugfixes and code cleanup. v0.1.1 ------ `Released September 21, 2012 `_ (`changes `__): - Added support for :class:`Comments <.Comment>` (````) and :class:`Wikilinks <.Wikilink>` (``[[foo]]``). - Added corresponding :meth:`.ifilter_links` and :meth:`.filter_links` methods to :class:`.Wikicode`. - Fixed a bug when parsing incomplete templates. - Fixed :meth:`.strip_code` to affect the contents of headings. - Various copyedits in documentation and comments. v0.1 ---- `Released August 23, 2012 `_: - Initial release. mwparserfromhell-0.4.2/docs/conf.py000066400000000000000000000177041255634533200173560ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # mwparserfromhell documentation build configuration file, created by # sphinx-quickstart on Tue Aug 21 20:47:26 2012. # # This file is execfile()d with the current directory set to its containing dir. # # Note that not all possible configuration values are present in this # autogenerated file. # # All configuration values have a default; values that are commented out # serve to show the default. import sys, os # If extensions (or modules to document with autodoc) are in another directory, # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. sys.path.insert(0, os.path.abspath('..')) import mwparserfromhell # -- General configuration ----------------------------------------------------- # If your documentation needs a minimal Sphinx version, state it here. #needs_sphinx = '1.0' # Add any Sphinx extension module names here, as strings. They can be extensions # coming with Sphinx (named 'sphinx.ext.*') or your custom ones. extensions = ['sphinx.ext.autodoc', 'sphinx.ext.intersphinx', 'sphinx.ext.viewcode'] # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] # The suffix of source filenames. source_suffix = '.rst' # The encoding of source files. #source_encoding = 'utf-8-sig' # The master toctree document. master_doc = 'index' # General information about the project. project = u'mwparserfromhell' copyright = u'2012, 2013, 2014, 2015 Ben Kurtovic' # The version info for the project you're documenting, acts as replacement for # |version| and |release|, also used in various other places throughout the # built documents. # # The short X.Y version. version = ".".join(mwparserfromhell.__version__.split(".", 2)[:2]) # The full version, including alpha/beta/rc tags. release = mwparserfromhell.__version__ # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. #language = None # There are two options for replacing |today|: either, you set today to some # non-false value, then it is used: #today = '' # Else, today_fmt is used as the format for a strftime call. #today_fmt = '%B %d, %Y' # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. exclude_patterns = ['_build'] # The reST default role (used for this markup: `text`) to use for all documents. #default_role = None # If true, '()' will be appended to :func: etc. cross-reference text. #add_function_parentheses = True # If true, the current module name will be prepended to all description # unit titles (such as .. function::). #add_module_names = True # If true, sectionauthor and moduleauthor directives will be shown in the # output. They are ignored by default. #show_authors = False # The name of the Pygments (syntax highlighting) style to use. pygments_style = 'sphinx' # A list of ignored prefixes for module index sorting. #modindex_common_prefix = [] # -- Options for HTML output --------------------------------------------------- # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. html_theme = 'nature' # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the # documentation. #html_theme_options = {} # Add any paths that contain custom themes here, relative to this directory. #html_theme_path = [] # The name for this set of Sphinx documents. If None, it defaults to # " v documentation". #html_title = None # A shorter title for the navigation bar. Default is the same as html_title. #html_short_title = None # The name of an image file (relative to this directory) to place at the top # of the sidebar. #html_logo = None # The name of an image file (within the static path) to use as favicon of the # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 # pixels large. #html_favicon = None # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". html_static_path = ['_static'] # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, # using the given strftime format. #html_last_updated_fmt = '%b %d, %Y' # If true, SmartyPants will be used to convert quotes and dashes to # typographically correct entities. #html_use_smartypants = True # Custom sidebar templates, maps document names to template names. #html_sidebars = {} # Additional templates that should be rendered to pages, maps page names to # template names. #html_additional_pages = {} # If false, no module index is generated. #html_domain_indices = True # If false, no index is generated. #html_use_index = True # If true, the index is split into individual pages for each letter. #html_split_index = False # If true, links to the reST sources are added to the pages. #html_show_sourcelink = True # If true, "Created using Sphinx" is shown in the HTML footer. Default is True. #html_show_sphinx = True # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. #html_show_copyright = True # If true, an OpenSearch description file will be output, and all pages will # contain a tag referring to it. The value of this option must be the # base URL from which the finished HTML is served. #html_use_opensearch = '' # This is the file name suffix for HTML files (e.g. ".xhtml"). #html_file_suffix = None # Output file base name for HTML help builder. htmlhelp_basename = 'mwparserfromhelldoc' # -- Options for LaTeX output -------------------------------------------------- latex_elements = { # The paper size ('letterpaper' or 'a4paper'). #'papersize': 'letterpaper', # The font size ('10pt', '11pt' or '12pt'). #'pointsize': '10pt', # Additional stuff for the LaTeX preamble. #'preamble': '', } # Grouping the document tree into LaTeX files. List of tuples # (source start file, target name, title, author, documentclass [howto/manual]). latex_documents = [ ('index', 'mwparserfromhell.tex', u'mwparserfromhell Documentation', u'Ben Kurtovic', 'manual'), ] # The name of an image file (relative to this directory) to place at the top of # the title page. #latex_logo = None # For "manual" documents, if this is true, then toplevel headings are parts, # not chapters. #latex_use_parts = False # If true, show page references after internal links. #latex_show_pagerefs = False # If true, show URL addresses after external links. #latex_show_urls = False # Documents to append as an appendix to all manuals. #latex_appendices = [] # If false, no module index is generated. #latex_domain_indices = True # -- Options for manual page output -------------------------------------------- # One entry per manual page. List of tuples # (source start file, name, description, authors, manual section). man_pages = [ ('index', 'mwparserfromhell', u'mwparserfromhell Documentation', [u'Ben Kurtovic'], 1) ] # If true, show URL addresses after external links. #man_show_urls = False # -- Options for Texinfo output ------------------------------------------------ # Grouping the document tree into Texinfo files. List of tuples # (source start file, target name, title, author, # dir menu entry, description, category) texinfo_documents = [ ('index', 'mwparserfromhell', u'mwparserfromhell Documentation', u'Ben Kurtovic', 'mwparserfromhell', 'One line description of project.', 'Miscellaneous'), ] # Documents to append as an appendix to all manuals. #texinfo_appendices = [] # If false, no module index is generated. #texinfo_domain_indices = True # How to display URL addresses: 'footnote', 'no', or 'inline'. #texinfo_show_urls = 'footnote' # Example configuration for intersphinx: refer to the Python standard library. intersphinx_mapping = {'http://docs.python.org/': None} mwparserfromhell-0.4.2/docs/index.rst000066400000000000000000000030741255634533200177130ustar00rootroot00000000000000MWParserFromHell v\ |version| Documentation =========================================== :mod:`mwparserfromhell` (the *MediaWiki Parser from Hell*) is a Python package that provides an easy-to-use and outrageously powerful parser for MediaWiki_ wikicode. It supports Python 2 and Python 3. Developed by Earwig_ with contributions from `Σ`_, Legoktm_, and others. Development occurs on GitHub_. .. _MediaWiki: http://mediawiki.org .. _Earwig: http://en.wikipedia.org/wiki/User:The_Earwig .. _Σ: http://en.wikipedia.org/wiki/User:%CE%A3 .. _Legoktm: http://en.wikipedia.org/wiki/User:Legoktm .. _GitHub: https://github.com/earwig/mwparserfromhell Installation ------------ The easiest way to install the parser is through the `Python Package Index`_; you can install the latest release with ``pip install mwparserfromhell`` (`get pip`_). On Windows, make sure you have the latest version of pip installed by running ``pip install --upgrade pip``. Alternatively, get the latest development version:: git clone https://github.com/earwig/mwparserfromhell.git cd mwparserfromhell python setup.py install You can run the comprehensive unit testing suite with ``python setup.py test -q``. .. _Python Package Index: http://pypi.python.org .. _get pip: http://pypi.python.org/pypi/pip Contents -------- .. toctree:: :maxdepth: 2 usage integration changelog API Reference Indices and tables ------------------ * :ref:`genindex` * :ref:`modindex` * :ref:`search` mwparserfromhell-0.4.2/docs/integration.rst000066400000000000000000000026471255634533200211340ustar00rootroot00000000000000Integration =========== :mod:`mwparserfromhell` is used by and originally developed for EarwigBot_; :class:`~earwigbot.wiki.page.Page` objects have a :meth:`~earwigbot.wiki.page.Page.parse` method that essentially calls :func:`mwparserfromhell.parse() ` on :meth:`~earwigbot.wiki.page.Page.get`. If you're using Pywikibot_, your code might look like this:: import mwparserfromhell import pywikibot def parse(title): site = pywikibot.Site() page = pywikibot.Page(site, title) text = page.get() return mwparserfromhell.parse(text) If you're not using a library, you can parse any page using the following code (via the API_):: import json from urllib.parse import urlencode from urllib.request import urlopen import mwparserfromhell API_URL = "https://en.wikipedia.org/w/api.php" def parse(title): data = {"action": "query", "prop": "revisions", "rvlimit": 1, "rvprop": "content", "format": "json", "titles": title} raw = urlopen(API_URL, urlencode(data).encode()).read() res = json.loads(raw) text = res["query"]["pages"].values()[0]["revisions"][0]["*"] return mwparserfromhell.parse(text) .. _EarwigBot: https://github.com/earwig/earwigbot .. _Pywikibot: https://www.mediawiki.org/wiki/Manual:Pywikibot .. _API: http://mediawiki.org/wiki/API mwparserfromhell-0.4.2/docs/usage.rst000066400000000000000000000062651255634533200177150ustar00rootroot00000000000000Usage ===== Normal usage is rather straightforward (where ``text`` is page text):: >>> import mwparserfromhell >>> wikicode = mwparserfromhell.parse(text) ``wikicode`` is a :class:`mwparserfromhell.Wikicode <.Wikicode>` object, which acts like an ordinary ``str`` object (or ``unicode`` in Python 2) with some extra methods. For example:: >>> text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?" >>> wikicode = mwparserfromhell.parse(text) >>> print(wikicode) I has a template! {{foo|bar|baz|eggs=spam}} See it? >>> templates = wikicode.filter_templates() >>> print(templates) ['{{foo|bar|baz|eggs=spam}}'] >>> template = templates[0] >>> print(template.name) foo >>> print(template.params) ['bar', 'baz', 'eggs=spam'] >>> print(template.get(1).value) bar >>> print(template.get("eggs").value) spam Since nodes can contain other nodes, getting nested templates is trivial:: >>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}" >>> mwparserfromhell.parse(text).filter_templates() ['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}'] You can also pass *recursive=False* to :meth:`.filter_templates` and explore templates manually. This is possible because nodes can contain additional :class:`.Wikicode` objects:: >>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}") >>> print(code.filter_templates(recursive=False)) ['{{foo|this {{includes a|template}}}}'] >>> foo = code.filter_templates(recursive=False)[0] >>> print(foo.get(1).value) this {{includes a|template}} >>> print(foo.get(1).value.filter_templates()[0]) {{includes a|template}} >>> print(foo.get(1).value.filter_templates()[0].get(1).value) template Templates can be easily modified to add, remove, or alter params. :class:`.Wikicode` objects can be treated like lists, with :meth:`~.Wikicode.append`, :meth:`~.Wikicode.insert`, :meth:`~.Wikicode.remove`, :meth:`~.Wikicode.replace`, and more. They also have a :meth:`~.Wikicode.matches` method for comparing page or template names, which takes care of capitalization and whitespace:: >>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}" >>> code = mwparserfromhell.parse(text) >>> for template in code.filter_templates(): ... if template.name.matches("Cleanup") and not template.has("date"): ... template.add("date", "July 2012") ... >>> print(code) {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}} >>> code.replace("{{uncategorized}}", "{{bar-stub}}") >>> print(code) {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}} >>> print(code.filter_templates()) ['{{cleanup|date=July 2012}}', '{{bar-stub}}'] You can then convert ``code`` back into a regular :class:`str` object (for saving the page!) by calling :func:`str` on it:: >>> text = str(code) >>> print(text) {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}} >>> text == code True (Likewise, use :func:`unicode(code) ` in Python 2.) For more tips, check out :class:`Wikicode's full method list <.Wikicode>` and the :mod:`list of Nodes <.nodes>`. mwparserfromhell-0.4.2/mwparserfromhell/000077500000000000000000000000001255634533200205075ustar00rootroot00000000000000mwparserfromhell-0.4.2/mwparserfromhell/__init__.py000066400000000000000000000032571255634533200226270ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. """ `mwparserfromhell `_ (the MediaWiki Parser from Hell) is a Python package that provides an easy-to-use and outrageously powerful parser for `MediaWiki `_ wikicode. """ __author__ = "Ben Kurtovic" __copyright__ = "Copyright (C) 2012, 2013, 2014, 2015 Ben Kurtovic" __license__ = "MIT License" __version__ = "0.4.2" __email__ = "ben.kurtovic@gmail.com" from . import (compat, definitions, nodes, parser, smart_list, string_mixin, utils, wikicode) parse = utils.parse_anything mwparserfromhell-0.4.2/mwparserfromhell/compat.py000066400000000000000000000013701255634533200223450ustar00rootroot00000000000000# -*- coding: utf-8 -*- """ Implements support for both Python 2 and Python 3 by defining common types in terms of their Python 2/3 variants. For example, :class:`str` is set to :class:`unicode` on Python 2 but :class:`str` on Python 3; likewise, :class:`bytes` is :class:`str` on 2 but :class:`bytes` on 3. These types are meant to be imported directly from within the parser's modules. """ import sys py26 = (sys.version_info[0] == 2) and (sys.version_info[1] == 6) py3k = (sys.version_info[0] == 3) py32 = py3k and (sys.version_info[1] == 2) if py3k: bytes = bytes str = str range = range import html.entities as htmlentities else: bytes = str str = unicode range = xrange import htmlentitydefs as htmlentities del sys mwparserfromhell-0.4.2/mwparserfromhell/definitions.py000066400000000000000000000066121255634533200234010ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. """Contains data about certain markup, like HTML tags and external links.""" from __future__ import unicode_literals __all__ = ["get_html_tag", "is_parsable", "is_visible", "is_single", "is_single_only", "is_scheme"] URI_SCHEMES = { # [mediawiki/core.git]/includes/DefaultSettings.php @ 374a0ad943 "http": True, "https": True, "ftp": True, "ftps": True, "ssh": True, "sftp": True, "irc": True, "ircs": True, "xmpp": False, "sip": False, "sips": False, "gopher": True, "telnet": True, "nntp": True, "worldwind": True, "mailto": False, "tel": False, "sms": False, "news": False, "svn": True, "git": True, "mms": True, "bitcoin": False, "magnet": False, "urn": False, "geo": False } PARSER_BLACKLIST = [ # enwiki extensions @ 2013-06-28 "categorytree", "gallery", "hiero", "imagemap", "inputbox", "math", "nowiki", "pre", "score", "section", "source", "syntaxhighlight", "templatedata", "timeline" ] INVISIBLE_TAGS = [ # enwiki extensions @ 2013-06-28 "categorytree", "gallery", "imagemap", "inputbox", "math", "score", "section", "templatedata", "timeline" ] # [mediawiki/core.git]/includes/Sanitizer.php @ 87a0aef762 SINGLE_ONLY = ["br", "hr", "meta", "link", "img"] SINGLE = SINGLE_ONLY + ["li", "dt", "dd", "th", "td", "tr"] MARKUP_TO_HTML = { "#": "li", "*": "li", ";": "dt", ":": "dd" } def get_html_tag(markup): """Return the HTML tag associated with the given wiki-markup.""" return MARKUP_TO_HTML[markup] def is_parsable(tag): """Return if the given *tag*'s contents should be passed to the parser.""" return tag.lower() not in PARSER_BLACKLIST def is_visible(tag): """Return whether or not the given *tag* contains visible text.""" return tag.lower() not in INVISIBLE_TAGS def is_single(tag): """Return whether or not the given *tag* can exist without a close tag.""" return tag.lower() in SINGLE def is_single_only(tag): """Return whether or not the given *tag* must exist without a close tag.""" return tag.lower() in SINGLE_ONLY def is_scheme(scheme, slashes=True): """Return whether *scheme* is valid for external links.""" scheme = scheme.lower() if slashes: return scheme in URI_SCHEMES return scheme in URI_SCHEMES and not URI_SCHEMES[scheme] mwparserfromhell-0.4.2/mwparserfromhell/nodes/000077500000000000000000000000001255634533200216175ustar00rootroot00000000000000mwparserfromhell-0.4.2/mwparserfromhell/nodes/__init__.py000066400000000000000000000061511255634533200237330ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. """ This package contains :class:`.Wikicode` "nodes", which represent a single unit of wikitext, such as a Template, an HTML tag, a Heading, or plain text. The node "tree" is far from flat, as most types can contain additional :class:`.Wikicode` types within them - and with that, more nodes. For example, the name of a :class:`.Template` is a :class:`.Wikicode` object that can contain text or more templates. """ from __future__ import unicode_literals from ..compat import str from ..string_mixin import StringMixIn __all__ = ["Node", "Text", "Argument", "Heading", "HTMLEntity", "Tag", "Template"] class Node(StringMixIn): """Represents the base Node type, demonstrating the methods to override. :meth:`__unicode__` must be overridden. It should return a ``unicode`` or (``str`` in py3k) representation of the node. If the node contains :class:`.Wikicode` objects inside of it, :meth:`__children__` should be a generator that iterates over them. If the node is printable (shown when the page is rendered), :meth:`__strip__` should return its printable version, stripping out any formatting marks. It does not have to return a string, but something that can be converted to a string with ``str()``. Finally, :meth:`__showtree__` can be overridden to build a nice tree representation of the node, if desired, for :meth:`~.Wikicode.get_tree`. """ def __unicode__(self): raise NotImplementedError() def __children__(self): return yield # pragma: no cover (this is a generator that yields nothing) def __strip__(self, normalize, collapse): return None def __showtree__(self, write, get, mark): write(str(self)) from . import extras from .text import Text from .argument import Argument from .comment import Comment from .external_link import ExternalLink from .heading import Heading from .html_entity import HTMLEntity from .tag import Tag from .template import Template from .wikilink import Wikilink mwparserfromhell-0.4.2/mwparserfromhell/nodes/argument.py000066400000000000000000000057631255634533200240260ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals from . import Node from ..compat import str from ..utils import parse_anything __all__ = ["Argument"] class Argument(Node): """Represents a template argument substitution, like ``{{{foo}}}``.""" def __init__(self, name, default=None): super(Argument, self).__init__() self._name = name self._default = default def __unicode__(self): start = "{{{" + str(self.name) if self.default is not None: return start + "|" + str(self.default) + "}}}" return start + "}}}" def __children__(self): yield self.name if self.default is not None: yield self.default def __strip__(self, normalize, collapse): if self.default is not None: return self.default.strip_code(normalize, collapse) return None def __showtree__(self, write, get, mark): write("{{{") get(self.name) if self.default is not None: write(" | ") mark() get(self.default) write("}}}") @property def name(self): """The name of the argument to substitute.""" return self._name @property def default(self): """The default value to substitute if none is passed. This will be ``None`` if the argument wasn't defined with one. The MediaWiki parser handles this by rendering the argument itself in the result, complete braces. To have the argument render as nothing, set default to ``""`` (``{{{arg}}}`` vs. ``{{{arg|}}}``). """ return self._default @name.setter def name(self, value): self._name = parse_anything(value) @default.setter def default(self, default): if default is None: self._default = None else: self._default = parse_anything(default) mwparserfromhell-0.4.2/mwparserfromhell/nodes/comment.py000066400000000000000000000033451255634533200236400ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals from . import Node from ..compat import str __all__ = ["Comment"] class Comment(Node): """Represents a hidden HTML comment, like ````.""" def __init__(self, contents): super(Comment, self).__init__() self._contents = contents def __unicode__(self): return "" @property def contents(self): """The hidden text contained between ````.""" return self._contents @contents.setter def contents(self, value): self._contents = str(value) mwparserfromhell-0.4.2/mwparserfromhell/nodes/external_link.py000066400000000000000000000061701255634533200250340ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals from . import Node from ..compat import str from ..utils import parse_anything __all__ = ["ExternalLink"] class ExternalLink(Node): """Represents an external link, like ``[http://example.com/ Example]``.""" def __init__(self, url, title=None, brackets=True): super(ExternalLink, self).__init__() self._url = url self._title = title self._brackets = brackets def __unicode__(self): if self.brackets: if self.title is not None: return "[" + str(self.url) + " " + str(self.title) + "]" return "[" + str(self.url) + "]" return str(self.url) def __children__(self): yield self.url if self.title is not None: yield self.title def __strip__(self, normalize, collapse): if self.brackets: if self.title: return self.title.strip_code(normalize, collapse) return None return self.url.strip_code(normalize, collapse) def __showtree__(self, write, get, mark): if self.brackets: write("[") get(self.url) if self.title is not None: get(self.title) if self.brackets: write("]") @property def url(self): """The URL of the link target, as a :class:`.Wikicode` object.""" return self._url @property def title(self): """The link title (if given), as a :class:`.Wikicode` object.""" return self._title @property def brackets(self): """Whether to enclose the URL in brackets or display it straight.""" return self._brackets @url.setter def url(self, value): from ..parser import contexts self._url = parse_anything(value, contexts.EXT_LINK_URI) @title.setter def title(self, value): self._title = None if value is None else parse_anything(value) @brackets.setter def brackets(self, value): self._brackets = bool(value) mwparserfromhell-0.4.2/mwparserfromhell/nodes/extras/000077500000000000000000000000001255634533200231255ustar00rootroot00000000000000mwparserfromhell-0.4.2/mwparserfromhell/nodes/extras/__init__.py000066400000000000000000000025371255634533200252450ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. """ This package contains objects used by :class:`.Node`\ s, but that are not nodes themselves. This includes template parameters and HTML tag attributes. """ from .attribute import Attribute from .parameter import Parameter mwparserfromhell-0.4.2/mwparserfromhell/nodes/extras/attribute.py000066400000000000000000000122321255634533200255020ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals from ...compat import str from ...string_mixin import StringMixIn from ...utils import parse_anything __all__ = ["Attribute"] class Attribute(StringMixIn): """Represents an attribute of an HTML tag. This is used by :class:`.Tag` objects. For example, the tag ```` contains an Attribute whose name is ``"name"`` and whose value is ``"foo"``. """ def __init__(self, name, value=None, quotes='"', pad_first=" ", pad_before_eq="", pad_after_eq="", check_quotes=True): super(Attribute, self).__init__() if check_quotes and not quotes and self._value_needs_quotes(value): raise ValueError("given value {0!r} requires quotes".format(value)) self._name = name self._value = value self._quotes = quotes self._pad_first = pad_first self._pad_before_eq = pad_before_eq self._pad_after_eq = pad_after_eq def __unicode__(self): result = self.pad_first + str(self.name) + self.pad_before_eq if self.value is not None: result += "=" + self.pad_after_eq if self.quotes: return result + self.quotes + str(self.value) + self.quotes return result + str(self.value) return result @staticmethod def _value_needs_quotes(val): """Return the preferred quotes for the given value, or None.""" if val and any(char.isspace() for char in val): return ('"' in val and "'" in val) or ("'" if '"' in val else '"') return None def _set_padding(self, attr, value): """Setter for the value of a padding attribute.""" if not value: setattr(self, attr, "") else: value = str(value) if not value.isspace(): raise ValueError("padding must be entirely whitespace") setattr(self, attr, value) @staticmethod def coerce_quotes(quotes): """Coerce a quote type into an acceptable value, or raise an error.""" orig, quotes = quotes, str(quotes) if quotes else None if quotes not in [None, '"', "'"]: raise ValueError("{0!r} is not a valid quote type".format(orig)) return quotes @property def name(self): """The name of the attribute as a :class:`.Wikicode` object.""" return self._name @property def value(self): """The value of the attribute as a :class:`.Wikicode` object.""" return self._value @property def quotes(self): """How to enclose the attribute value. ``"``, ``'``, or ``None``.""" return self._quotes @property def pad_first(self): """Spacing to insert right before the attribute.""" return self._pad_first @property def pad_before_eq(self): """Spacing to insert right before the equal sign.""" return self._pad_before_eq @property def pad_after_eq(self): """Spacing to insert right after the equal sign.""" return self._pad_after_eq @name.setter def name(self, value): self._name = parse_anything(value) @value.setter def value(self, newval): if newval is None: self._value = None else: code = parse_anything(newval) quotes = self._value_needs_quotes(code) if quotes in ['"', "'"] or (quotes is True and not self.quotes): self._quotes = quotes self._value = code @quotes.setter def quotes(self, value): value = self.coerce_quotes(value) if not value and self._value_needs_quotes(self.value): raise ValueError("attribute value requires quotes") self._quotes = value @pad_first.setter def pad_first(self, value): self._set_padding("_pad_first", value) @pad_before_eq.setter def pad_before_eq(self, value): self._set_padding("_pad_before_eq", value) @pad_after_eq.setter def pad_after_eq(self, value): self._set_padding("_pad_after_eq", value) mwparserfromhell-0.4.2/mwparserfromhell/nodes/extras/parameter.py000066400000000000000000000060471255634533200254660ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals import re from ...compat import str from ...string_mixin import StringMixIn from ...utils import parse_anything __all__ = ["Parameter"] class Parameter(StringMixIn): """Represents a paramater of a template. For example, the template ``{{foo|bar|spam=eggs}}`` contains two Parameters: one whose name is ``"1"``, value is ``"bar"``, and ``showkey`` is ``False``, and one whose name is ``"spam"``, value is ``"eggs"``, and ``showkey`` is ``True``. """ def __init__(self, name, value, showkey=True): super(Parameter, self).__init__() if not showkey and not self.can_hide_key(name): raise ValueError("key {0!r} cannot be hidden".format(name)) self._name = name self._value = value self._showkey = showkey def __unicode__(self): if self.showkey: return str(self.name) + "=" + str(self.value) return str(self.value) @staticmethod def can_hide_key(key): """Return whether or not the given key can be hidden.""" return re.match(r"[1-9][0-9]*$", str(key).strip()) @property def name(self): """The name of the parameter as a :class:`.Wikicode` object.""" return self._name @property def value(self): """The value of the parameter as a :class:`.Wikicode` object.""" return self._value @property def showkey(self): """Whether to show the parameter's key (i.e., its "name").""" return self._showkey @name.setter def name(self, newval): self._name = parse_anything(newval) @value.setter def value(self, newval): self._value = parse_anything(newval) @showkey.setter def showkey(self, newval): newval = bool(newval) if not newval and not self.can_hide_key(self.name): raise ValueError("parameter key cannot be hidden") self._showkey = newval mwparserfromhell-0.4.2/mwparserfromhell/nodes/heading.py000066400000000000000000000046251255634533200235770ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals from . import Node from ..compat import str from ..utils import parse_anything __all__ = ["Heading"] class Heading(Node): """Represents a section heading in wikicode, like ``== Foo ==``.""" def __init__(self, title, level): super(Heading, self).__init__() self._title = title self._level = level def __unicode__(self): return ("=" * self.level) + str(self.title) + ("=" * self.level) def __children__(self): yield self.title def __strip__(self, normalize, collapse): return self.title.strip_code(normalize, collapse) def __showtree__(self, write, get, mark): write("=" * self.level) get(self.title) write("=" * self.level) @property def title(self): """The title of the heading, as a :class:`.Wikicode` object.""" return self._title @property def level(self): """The heading level, as an integer between 1 and 6, inclusive.""" return self._level @title.setter def title(self, value): self._title = parse_anything(value) @level.setter def level(self, value): value = int(value) if value < 1 or value > 6: raise ValueError(value) self._level = value mwparserfromhell-0.4.2/mwparserfromhell/nodes/html_entity.py000066400000000000000000000151601255634533200245340ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals from . import Node from ..compat import htmlentities, py3k, str __all__ = ["HTMLEntity"] class HTMLEntity(Node): """Represents an HTML entity, like `` ``, either named or unnamed.""" def __init__(self, value, named=None, hexadecimal=False, hex_char="x"): super(HTMLEntity, self).__init__() self._value = value if named is None: # Try to guess whether or not the entity is named try: int(value) self._named = False self._hexadecimal = False except ValueError: try: int(value, 16) self._named = False self._hexadecimal = True except ValueError: self._named = True self._hexadecimal = False else: self._named = named self._hexadecimal = hexadecimal self._hex_char = hex_char def __unicode__(self): if self.named: return "&{0};".format(self.value) if self.hexadecimal: return "&#{0}{1};".format(self.hex_char, self.value) return "&#{0};".format(self.value) def __strip__(self, normalize, collapse): if normalize: return self.normalize() return self if not py3k: @staticmethod def _unichr(value): """Implement builtin unichr() with support for non-BMP code points. On wide Python builds, this functions like the normal unichr(). On narrow builds, this returns the value's encoded surrogate pair. """ try: return unichr(value) except ValueError: # Test whether we're on the wide or narrow Python build. Check # the length of a non-BMP code point # (U+1F64A, SPEAK-NO-EVIL MONKEY): if len("\U0001F64A") == 1: # pragma: no cover raise # Ensure this is within the range we can encode: if value > 0x10FFFF: raise ValueError("unichr() arg not in range(0x110000)") code = value - 0x10000 if value < 0: # Invalid code point raise lead = 0xD800 + (code >> 10) trail = 0xDC00 + (code % (1 << 10)) return unichr(lead) + unichr(trail) @property def value(self): """The string value of the HTML entity.""" return self._value @property def named(self): """Whether the entity is a string name for a codepoint or an integer. For example, ``Σ``, ``Σ``, and ``Σ`` refer to the same character, but only the first is "named", while the others are integer representations of the codepoint. """ return self._named @property def hexadecimal(self): """If unnamed, this is whether the value is hexadecimal or decimal.""" return self._hexadecimal @property def hex_char(self): """If the value is hexadecimal, this is the letter denoting that. For example, the hex_char of ``"ሴ"`` is ``"x"``, whereas the hex_char of ``"ሴ"`` is ``"X"``. Lowercase and uppercase ``x`` are the only values supported. """ return self._hex_char @value.setter def value(self, newval): newval = str(newval) try: int(newval) except ValueError: try: int(newval, 16) except ValueError: if newval not in htmlentities.entitydefs: raise ValueError("entity value is not a valid name") self._named = True self._hexadecimal = False else: if int(newval, 16) < 0 or int(newval, 16) > 0x10FFFF: raise ValueError("entity value is not in range(0x110000)") self._named = False self._hexadecimal = True else: test = int(newval, 16 if self.hexadecimal else 10) if test < 0 or test > 0x10FFFF: raise ValueError("entity value is not in range(0x110000)") self._named = False self._value = newval @named.setter def named(self, newval): newval = bool(newval) if newval and self.value not in htmlentities.entitydefs: raise ValueError("entity value is not a valid name") if not newval: try: int(self.value, 16) except ValueError: err = "current entity value is not a valid Unicode codepoint" raise ValueError(err) self._named = newval @hexadecimal.setter def hexadecimal(self, newval): newval = bool(newval) if newval and self.named: raise ValueError("a named entity cannot be hexadecimal") self._hexadecimal = newval @hex_char.setter def hex_char(self, newval): newval = str(newval) if newval not in ("x", "X"): raise ValueError(newval) self._hex_char = newval def normalize(self): """Return the unicode character represented by the HTML entity.""" chrfunc = chr if py3k else HTMLEntity._unichr if self.named: return chrfunc(htmlentities.name2codepoint[self.value]) if self.hexadecimal: return chrfunc(int(self.value, 16)) return chrfunc(int(self.value)) mwparserfromhell-0.4.2/mwparserfromhell/nodes/tag.py000066400000000000000000000261511255634533200227510ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals from . import Node from .extras import Attribute from ..compat import str from ..definitions import is_visible from ..utils import parse_anything __all__ = ["Tag"] class Tag(Node): """Represents an HTML-style tag in wikicode, like ````.""" def __init__(self, tag, contents=None, attrs=None, wiki_markup=None, self_closing=False, invalid=False, implicit=False, padding="", closing_tag=None, wiki_style_separator=None, closing_wiki_markup=None): super(Tag, self).__init__() self._tag = tag if contents is None and not self_closing: self._contents = parse_anything("") else: self._contents = contents self._attrs = attrs if attrs else [] self._wiki_markup = wiki_markup self._self_closing = self_closing self._invalid = invalid self._implicit = implicit self._padding = padding if closing_tag: self._closing_tag = closing_tag else: self._closing_tag = tag self._wiki_style_separator = wiki_style_separator if closing_wiki_markup is not None: self._closing_wiki_markup = closing_wiki_markup elif wiki_markup and not self_closing: self._closing_wiki_markup = wiki_markup else: self._closing_wiki_markup = None def __unicode__(self): if self.wiki_markup: if self.attributes: attrs = "".join([str(attr) for attr in self.attributes]) else: attrs = "" padding = self.padding or "" separator = self.wiki_style_separator or "" close = self.closing_wiki_markup or "" if self.self_closing: return self.wiki_markup + attrs + padding + separator else: return self.wiki_markup + attrs + padding + separator + \ str(self.contents) + close result = ("" if self.implicit else "/>") else: result += self.padding + ">" + str(self.contents) result += "" return result def __children__(self): if not self.wiki_markup: yield self.tag for attr in self.attributes: yield attr.name if attr.value is not None: yield attr.value if self.contents: yield self.contents if not self.self_closing and not self.wiki_markup and self.closing_tag: yield self.closing_tag def __strip__(self, normalize, collapse): if self.contents and is_visible(self.tag): return self.contents.strip_code(normalize, collapse) return None def __showtree__(self, write, get, mark): write("" if self.implicit else "/>") else: write(">") get(self.contents) write("") @property def tag(self): """The tag itself, as a :class:`.Wikicode` object.""" return self._tag @property def contents(self): """The contents of the tag, as a :class:`.Wikicode` object.""" return self._contents @property def attributes(self): """The list of attributes affecting the tag. Each attribute is an instance of :class:`.Attribute`. """ return self._attrs @property def wiki_markup(self): """The wikified version of a tag to show instead of HTML. If set to a value, this will be displayed instead of the brackets. For example, set to ``''`` to replace ```` or ``----`` to replace ``
``. """ return self._wiki_markup @property def self_closing(self): """Whether the tag is self-closing with no content (like ``
``).""" return self._self_closing @property def invalid(self): """Whether the tag starts with a backslash after the opening bracket. This makes the tag look like a lone close tag. It is technically invalid and is only parsable Wikicode when the tag itself is single-only, like ``
`` and ````. See :func:`.definitions.is_single_only`. """ return self._invalid @property def implicit(self): """Whether the tag is implicitly self-closing, with no ending slash. This is only possible for specific "single" tags like ``
`` and ``
  • ``. See :func:`.definitions.is_single`. This field only has an effect if :attr:`self_closing` is also ``True``. """ return self._implicit @property def padding(self): """Spacing to insert before the first closing ``>``.""" return self._padding @property def closing_tag(self): """The closing tag, as a :class:`.Wikicode` object. This will usually equal :attr:`tag`, unless there is additional spacing, comments, or the like. """ return self._closing_tag @property def wiki_style_separator(self): """The separator between the padding and content in a wiki markup tag. Essentially the wiki equivalent of the TagCloseOpen. """ return self._wiki_style_separator @property def closing_wiki_markup(self): """The wikified version of the closing tag to show instead of HTML. If set to a value, this will be displayed instead of the close tag brackets. If tag is :attr:`self_closing` is ``True`` then this is not displayed. If :attr:`wiki_markup` is set and this has not been set, this is set to the value of :attr:`wiki_markup`. If this has been set and :attr:`wiki_markup` is set to a ``False`` value, this is set to ``None``. """ return self._closing_wiki_markup @tag.setter def tag(self, value): self._tag = self._closing_tag = parse_anything(value) @contents.setter def contents(self, value): self._contents = parse_anything(value) @wiki_markup.setter def wiki_markup(self, value): self._wiki_markup = str(value) if value else None if not value or not self.closing_wiki_markup: self._closing_wiki_markup = self._wiki_markup @self_closing.setter def self_closing(self, value): self._self_closing = bool(value) @invalid.setter def invalid(self, value): self._invalid = bool(value) @implicit.setter def implicit(self, value): self._implicit = bool(value) @padding.setter def padding(self, value): if not value: self._padding = "" else: value = str(value) if not value.isspace(): raise ValueError("padding must be entirely whitespace") self._padding = value @closing_tag.setter def closing_tag(self, value): self._closing_tag = parse_anything(value) @wiki_style_separator.setter def wiki_style_separator(self, value): self._wiki_style_separator = str(value) if value else None @closing_wiki_markup.setter def closing_wiki_markup(self, value): self._closing_wiki_markup = str(value) if value else None def has(self, name): """Return whether any attribute in the tag has the given *name*. Note that a tag may have multiple attributes with the same name, but only the last one is read by the MediaWiki parser. """ for attr in self.attributes: if attr.name == name.strip(): return True return False def get(self, name): """Get the attribute with the given *name*. The returned object is a :class:`.Attribute` instance. Raises :exc:`ValueError` if no attribute has this name. Since multiple attributes can have the same name, we'll return the last match, since all but the last are ignored by the MediaWiki parser. """ for attr in reversed(self.attributes): if attr.name == name.strip(): return attr raise ValueError(name) def add(self, name, value=None, quotes='"', pad_first=" ", pad_before_eq="", pad_after_eq=""): """Add an attribute with the given *name* and *value*. *name* and *value* can be anything parsable by :func:`.utils.parse_anything`; *value* can be omitted if the attribute is valueless. If *quotes* is not ``None``, it should be a string (either ``"`` or ``'``) that *value* will be wrapped in (this is recommended). ``None`` is only legal if *value* contains no spacing. *pad_first*, *pad_before_eq*, and *pad_after_eq* are whitespace used as padding before the name, before the equal sign (or after the name if no value), and after the equal sign (ignored if no value), respectively. """ if value is not None: value = parse_anything(value) quotes = Attribute.coerce_quotes(quotes) attr = Attribute(parse_anything(name), value, quotes) attr.pad_first = pad_first attr.pad_before_eq = pad_before_eq attr.pad_after_eq = pad_after_eq self.attributes.append(attr) return attr def remove(self, name): """Remove all attributes with the given *name*.""" attrs = [attr for attr in self.attributes if attr.name == name.strip()] if not attrs: raise ValueError(name) for attr in attrs: self.attributes.remove(attr) mwparserfromhell-0.4.2/mwparserfromhell/nodes/template.py000066400000000000000000000316731255634533200240160ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals from collections import defaultdict import re from . import HTMLEntity, Node, Text from .extras import Parameter from ..compat import range, str from ..utils import parse_anything __all__ = ["Template"] FLAGS = re.DOTALL | re.UNICODE class Template(Node): """Represents a template in wikicode, like ``{{foo}}``.""" def __init__(self, name, params=None): super(Template, self).__init__() self._name = name if params: self._params = params else: self._params = [] def __unicode__(self): if self.params: params = "|".join([str(param) for param in self.params]) return "{{" + str(self.name) + "|" + params + "}}" else: return "{{" + str(self.name) + "}}" def __children__(self): yield self.name for param in self.params: if param.showkey: yield param.name yield param.value def __showtree__(self, write, get, mark): write("{{") get(self.name) for param in self.params: write(" | ") mark() get(param.name) write(" = ") mark() get(param.value) write("}}") def _surface_escape(self, code, char): """Return *code* with *char* escaped as an HTML entity. The main use of this is to escape pipes (``|``) or equal signs (``=``) in parameter names or values so they are not mistaken for new parameters. """ replacement = str(HTMLEntity(value=ord(char))) for node in code.filter_text(recursive=False): if char in node: code.replace(node, node.replace(char, replacement), False) def _select_theory(self, theories): """Return the most likely spacing convention given different options. Given a dictionary of convention options as keys and their occurrence as values, return the convention that occurs the most, or ``None`` if there is no clear preferred style. """ if theories: values = tuple(theories.values()) best = max(values) confidence = float(best) / sum(values) if confidence >= 0.75: return tuple(theories.keys())[values.index(best)] def _get_spacing_conventions(self, use_names): """Try to determine the whitespace conventions for parameters. This will examine the existing parameters and use :meth:`_select_theory` to determine if there are any preferred styles for how much whitespace to put before or after the value. """ before_theories = defaultdict(lambda: 0) after_theories = defaultdict(lambda: 0) for param in self.params: if use_names: component = str(param.name) else: component = str(param.value) match = re.search(r"^(\s*).*?(\s*)$", component, FLAGS) before, after = match.group(1), match.group(2) before_theories[before] += 1 after_theories[after] += 1 before = self._select_theory(before_theories) after = self._select_theory(after_theories) return before, after def _blank_param_value(self, value): """Remove the content from *value* while keeping its whitespace. Replace *value*\ 's nodes with two text nodes, the first containing whitespace from before its content and the second containing whitespace from after its content. """ match = re.search(r"^(\s*).*?(\s*)$", str(value), FLAGS) value.nodes = [Text(match.group(1)), Text(match.group(2))] def _fix_dependendent_params(self, i): """Unhide keys if necessary after removing the param at index *i*.""" if not self.params[i].showkey: for param in self.params[i + 1:]: if not param.showkey: param.showkey = True def _remove_exact(self, needle, keep_field): """Remove a specific parameter, *needle*, from the template.""" for i, param in enumerate(self.params): if param is needle: if keep_field: self._blank_param_value(param.value) else: self._fix_dependendent_params(i) self.params.pop(i) return raise ValueError(needle) def _should_remove(self, i, name): """Look ahead for a parameter with the same name, but hidden. If one exists, we should remove the given one rather than blanking it. """ if self.params[i].showkey: following = self.params[i + 1:] better_matches = [after.name.strip() == name and not after.showkey for after in following] return any(better_matches) return False @property def name(self): """The name of the template, as a :class:`.Wikicode` object.""" return self._name @property def params(self): """The list of parameters contained within the template.""" return self._params @name.setter def name(self, value): self._name = parse_anything(value) def has(self, name, ignore_empty=False): """Return ``True`` if any parameter in the template is named *name*. With *ignore_empty*, ``False`` will be returned even if the template contains a parameter with the name *name*, if the parameter's value is empty. Note that a template may have multiple parameters with the same name, but only the last one is read by the MediaWiki parser. """ name = str(name).strip() for param in self.params: if param.name.strip() == name: if ignore_empty and not param.value.strip(): continue return True return False has_param = lambda self, name, ignore_empty=False: \ self.has(name, ignore_empty) has_param.__doc__ = "Alias for :meth:`has`." def get(self, name): """Get the parameter whose name is *name*. The returned object is a :class:`.Parameter` instance. Raises :exc:`ValueError` if no parameter has this name. Since multiple parameters can have the same name, we'll return the last match, since the last parameter is the only one read by the MediaWiki parser. """ name = str(name).strip() for param in reversed(self.params): if param.name.strip() == name: return param raise ValueError(name) def add(self, name, value, showkey=None, before=None, preserve_spacing=True): """Add a parameter to the template with a given *name* and *value*. *name* and *value* can be anything parsable by :func:`.utils.parse_anything`; pipes and equal signs are automatically escaped from *value* when appropriate. If *name* is already a parameter in the template, we'll replace its value. If *showkey* is given, this will determine whether or not to show the parameter's name (e.g., ``{{foo|bar}}``'s parameter has a name of ``"1"`` but it is hidden); otherwise, we'll make a safe and intelligent guess. If *before* is given (either a :class:`.Parameter` object or a name), then we will place the parameter immediately before this one. Otherwise, it will be added at the end. If *before* is a name and exists multiple times in the template, we will place it before the last occurrence. If *before* is not in the template, :exc:`ValueError` is raised. The argument is ignored if *name* is an existing parameter. If *preserve_spacing* is ``True``, we will try to preserve whitespace conventions around the parameter, whether it is new or we are updating an existing value. It is disabled for parameters with hidden keys, since MediaWiki doesn't strip whitespace in this case. """ name, value = parse_anything(name), parse_anything(value) self._surface_escape(value, "|") if self.has(name): self.remove(name, keep_field=True) existing = self.get(name) if showkey is not None: existing.showkey = showkey if not existing.showkey: self._surface_escape(value, "=") nodes = existing.value.nodes if preserve_spacing and existing.showkey: for i in range(2): # Ignore empty text nodes if not nodes[i]: nodes[i] = None existing.value = parse_anything([nodes[0], value, nodes[1]]) else: existing.value = value return existing if showkey is None: if Parameter.can_hide_key(name): int_name = int(str(name)) int_keys = set() for param in self.params: if not param.showkey: int_keys.add(int(str(param.name))) expected = min(set(range(1, len(int_keys) + 2)) - int_keys) if expected == int_name: showkey = False else: showkey = True else: showkey = True if not showkey: self._surface_escape(value, "=") if preserve_spacing and showkey: before_n, after_n = self._get_spacing_conventions(use_names=True) before_v, after_v = self._get_spacing_conventions(use_names=False) name = parse_anything([before_n, name, after_n]) value = parse_anything([before_v, value, after_v]) param = Parameter(name, value, showkey) if before: if not isinstance(before, Parameter): before = self.get(before) self.params.insert(self.params.index(before), param) else: self.params.append(param) return param def remove(self, param, keep_field=False): """Remove a parameter from the template, identified by *param*. If *param* is a :class:`.Parameter` object, it will be matched exactly, otherwise it will be treated like the *name* argument to :meth:`has` and :meth:`get`. If *keep_field* is ``True``, we will keep the parameter's name, but blank its value. Otherwise, we will remove the parameter completely. When removing a parameter with a hidden name, subsequent parameters with hidden names will be made visible. For example, removing ``bar`` from ``{{foo|bar|baz}}`` produces ``{{foo|2=baz}}`` because ``{{foo|baz}}`` is incorrect. If the parameter shows up multiple times in the template and *param* is not a :class:`.Parameter` object, we will remove all instances of it (and keep only one if *keep_field* is ``True`` - either the one with a hidden name, if it exists, or the first instance). """ if isinstance(param, Parameter): return self._remove_exact(param, keep_field) name = str(param).strip() removed = False to_remove = [] for i, param in enumerate(self.params): if param.name.strip() == name: if keep_field: if self._should_remove(i, name): to_remove.append(i) else: self._blank_param_value(param.value) keep_field = False else: self._fix_dependendent_params(i) to_remove.append(i) if not removed: removed = True if not removed: raise ValueError(name) for i in reversed(to_remove): self.params.pop(i) mwparserfromhell-0.4.2/mwparserfromhell/nodes/text.py000066400000000000000000000035061255634533200231610ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals from . import Node from ..compat import str __all__ = ["Text"] class Text(Node): """Represents ordinary, unformatted text with no special properties.""" def __init__(self, value): super(Text, self).__init__() self._value = value def __unicode__(self): return self.value def __strip__(self, normalize, collapse): return self def __showtree__(self, write, get, mark): write(str(self).encode("unicode_escape").decode("utf8")) @property def value(self): """The actual text itself.""" return self._value @value.setter def value(self, newval): self._value = str(newval) mwparserfromhell-0.4.2/mwparserfromhell/nodes/wikilink.py000066400000000000000000000053121255634533200240130ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals from . import Node from ..compat import str from ..utils import parse_anything __all__ = ["Wikilink"] class Wikilink(Node): """Represents an internal wikilink, like ``[[Foo|Bar]]``.""" def __init__(self, title, text=None): super(Wikilink, self).__init__() self._title = title self._text = text def __unicode__(self): if self.text is not None: return "[[" + str(self.title) + "|" + str(self.text) + "]]" return "[[" + str(self.title) + "]]" def __children__(self): yield self.title if self.text is not None: yield self.text def __strip__(self, normalize, collapse): if self.text is not None: return self.text.strip_code(normalize, collapse) return self.title.strip_code(normalize, collapse) def __showtree__(self, write, get, mark): write("[[") get(self.title) if self.text is not None: write(" | ") mark() get(self.text) write("]]") @property def title(self): """The title of the linked page, as a :class:`.Wikicode` object.""" return self._title @property def text(self): """The text to display (if any), as a :class:`.Wikicode` object.""" return self._text @title.setter def title(self, value): self._title = parse_anything(value) @text.setter def text(self, value): if value is None: self._text = None else: self._text = parse_anything(value) mwparserfromhell-0.4.2/mwparserfromhell/parser/000077500000000000000000000000001255634533200220035ustar00rootroot00000000000000mwparserfromhell-0.4.2/mwparserfromhell/parser/__init__.py000066400000000000000000000076401255634533200241230ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. """ This package contains the actual wikicode parser, split up into two main modules: the :mod:`.tokenizer` and the :mod:`.builder`. This module joins them together into one interface. """ class ParserError(Exception): """Exception raised when an internal error occurs while parsing. This does not mean that the wikicode was invalid, because invalid markup should still be parsed correctly. This means that the parser caught itself with an impossible internal state and is bailing out before other problems can happen. Its appearance indicates a bug. """ def __init__(self, extra): msg = "This is a bug and should be reported. Info: {0}.".format(extra) super(ParserError, self).__init__(msg) from .builder import Builder try: from ._tokenizer import CTokenizer use_c = True except ImportError: from .tokenizer import Tokenizer CTokenizer = None use_c = False __all__ = ["use_c", "Parser", "ParserError"] class Parser(object): """Represents a parser for wikicode. Actual parsing is a two-step process: first, the text is split up into a series of tokens by the :class:`.Tokenizer`, and then the tokens are converted into trees of :class:`.Wikicode` objects and :class:`.Node`\ s by the :class:`.Builder`. Instances of this class or its dependents (:class:`.Tokenizer` and :class:`.Builder`) should not be shared between threads. :meth:`parse` can be called multiple times as long as it is not done concurrently. In general, there is no need to do this because parsing should be done through :func:`mwparserfromhell.parse`, which creates a new :class:`.Parser` object as necessary. """ def __init__(self): if use_c and CTokenizer: self._tokenizer = CTokenizer() else: from .tokenizer import Tokenizer self._tokenizer = Tokenizer() self._builder = Builder() def parse(self, text, context=0, skip_style_tags=False): """Parse *text*, returning a :class:`.Wikicode` object tree. If given, *context* will be passed as a starting context to the parser. This is helpful when this function is used inside node attribute setters. For example, :class:`.ExternalLink`\ 's :attr:`~.ExternalLink.url` setter sets *context* to :mod:`contexts.EXT_LINK_URI <.contexts>` to prevent the URL itself from becoming an :class:`.ExternalLink`. If *skip_style_tags* is ``True``, then ``''`` and ``'''`` will not be parsed, but instead will be treated as plain text. If there is an internal error while parsing, :exc:`.ParserError` will be raised. """ tokens = self._tokenizer.tokenize(text, context, skip_style_tags) code = self._builder.build(tokens) return code mwparserfromhell-0.4.2/mwparserfromhell/parser/builder.py000066400000000000000000000300701255634533200240030ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals from . import tokens, ParserError from ..compat import str from ..nodes import (Argument, Comment, ExternalLink, Heading, HTMLEntity, Tag, Template, Text, Wikilink) from ..nodes.extras import Attribute, Parameter from ..smart_list import SmartList from ..wikicode import Wikicode __all__ = ["Builder"] _HANDLERS = { tokens.Text: lambda self, token: Text(token.text) } def _add_handler(token_type): """Create a decorator that adds a handler function to the lookup table.""" def decorator(func): """Add a handler function to the lookup table.""" _HANDLERS[token_type] = func return func return decorator class Builder(object): """Builds a tree of nodes out of a sequence of tokens. To use, pass a list of :class:`.Token`\ s to the :meth:`build` method. The list will be exhausted as it is parsed and a :class:`.Wikicode` object containing the node tree will be returned. """ def __init__(self): self._tokens = [] self._stacks = [] def _push(self): """Push a new node list onto the stack.""" self._stacks.append([]) def _pop(self): """Pop the current node list off of the stack. The raw node list is wrapped in a :class:`.SmartList` and then in a :class:`.Wikicode` object. """ return Wikicode(SmartList(self._stacks.pop())) def _write(self, item): """Append a node to the current node list.""" self._stacks[-1].append(item) def _handle_parameter(self, default): """Handle a case where a parameter is at the head of the tokens. *default* is the value to use if no parameter name is defined. """ key = None showkey = False self._push() while self._tokens: token = self._tokens.pop() if isinstance(token, tokens.TemplateParamEquals): key = self._pop() showkey = True self._push() elif isinstance(token, (tokens.TemplateParamSeparator, tokens.TemplateClose)): self._tokens.append(token) value = self._pop() if key is None: key = Wikicode(SmartList([Text(str(default))])) return Parameter(key, value, showkey) else: self._write(self._handle_token(token)) raise ParserError("_handle_parameter() missed a close token") @_add_handler(tokens.TemplateOpen) def _handle_template(self, token): """Handle a case where a template is at the head of the tokens.""" params = [] default = 1 self._push() while self._tokens: token = self._tokens.pop() if isinstance(token, tokens.TemplateParamSeparator): if not params: name = self._pop() param = self._handle_parameter(default) params.append(param) if not param.showkey: default += 1 elif isinstance(token, tokens.TemplateClose): if not params: name = self._pop() return Template(name, params) else: self._write(self._handle_token(token)) raise ParserError("_handle_template() missed a close token") @_add_handler(tokens.ArgumentOpen) def _handle_argument(self, token): """Handle a case where an argument is at the head of the tokens.""" name = None self._push() while self._tokens: token = self._tokens.pop() if isinstance(token, tokens.ArgumentSeparator): name = self._pop() self._push() elif isinstance(token, tokens.ArgumentClose): if name is not None: return Argument(name, self._pop()) return Argument(self._pop()) else: self._write(self._handle_token(token)) raise ParserError("_handle_argument() missed a close token") @_add_handler(tokens.WikilinkOpen) def _handle_wikilink(self, token): """Handle a case where a wikilink is at the head of the tokens.""" title = None self._push() while self._tokens: token = self._tokens.pop() if isinstance(token, tokens.WikilinkSeparator): title = self._pop() self._push() elif isinstance(token, tokens.WikilinkClose): if title is not None: return Wikilink(title, self._pop()) return Wikilink(self._pop()) else: self._write(self._handle_token(token)) raise ParserError("_handle_wikilink() missed a close token") @_add_handler(tokens.ExternalLinkOpen) def _handle_external_link(self, token): """Handle when an external link is at the head of the tokens.""" brackets, url = token.brackets, None self._push() while self._tokens: token = self._tokens.pop() if isinstance(token, tokens.ExternalLinkSeparator): url = self._pop() self._push() elif isinstance(token, tokens.ExternalLinkClose): if url is not None: return ExternalLink(url, self._pop(), brackets) return ExternalLink(self._pop(), brackets=brackets) else: self._write(self._handle_token(token)) raise ParserError("_handle_external_link() missed a close token") @_add_handler(tokens.HTMLEntityStart) def _handle_entity(self, token): """Handle a case where an HTML entity is at the head of the tokens.""" token = self._tokens.pop() if isinstance(token, tokens.HTMLEntityNumeric): token = self._tokens.pop() if isinstance(token, tokens.HTMLEntityHex): text = self._tokens.pop() self._tokens.pop() # Remove HTMLEntityEnd return HTMLEntity(text.text, named=False, hexadecimal=True, hex_char=token.char) self._tokens.pop() # Remove HTMLEntityEnd return HTMLEntity(token.text, named=False, hexadecimal=False) self._tokens.pop() # Remove HTMLEntityEnd return HTMLEntity(token.text, named=True, hexadecimal=False) @_add_handler(tokens.HeadingStart) def _handle_heading(self, token): """Handle a case where a heading is at the head of the tokens.""" level = token.level self._push() while self._tokens: token = self._tokens.pop() if isinstance(token, tokens.HeadingEnd): title = self._pop() return Heading(title, level) else: self._write(self._handle_token(token)) raise ParserError("_handle_heading() missed a close token") @_add_handler(tokens.CommentStart) def _handle_comment(self, token): """Handle a case where an HTML comment is at the head of the tokens.""" self._push() while self._tokens: token = self._tokens.pop() if isinstance(token, tokens.CommentEnd): contents = self._pop() return Comment(contents) else: self._write(self._handle_token(token)) raise ParserError("_handle_comment() missed a close token") def _handle_attribute(self, start): """Handle a case where a tag attribute is at the head of the tokens.""" name = quotes = None self._push() while self._tokens: token = self._tokens.pop() if isinstance(token, tokens.TagAttrEquals): name = self._pop() self._push() elif isinstance(token, tokens.TagAttrQuote): quotes = token.char elif isinstance(token, (tokens.TagAttrStart, tokens.TagCloseOpen, tokens.TagCloseSelfclose)): self._tokens.append(token) if name: value = self._pop() else: name, value = self._pop(), None return Attribute(name, value, quotes, start.pad_first, start.pad_before_eq, start.pad_after_eq, check_quotes=False) else: self._write(self._handle_token(token)) raise ParserError("_handle_attribute() missed a close token") @_add_handler(tokens.TagOpenOpen) def _handle_tag(self, token): """Handle a case where a tag is at the head of the tokens.""" close_tokens = (tokens.TagCloseSelfclose, tokens.TagCloseClose) implicit, attrs, contents, closing_tag = False, [], None, None wiki_markup, invalid = token.wiki_markup, token.invalid or False wiki_style_separator, closing_wiki_markup = None, wiki_markup self._push() while self._tokens: token = self._tokens.pop() if isinstance(token, tokens.TagAttrStart): attrs.append(self._handle_attribute(token)) elif isinstance(token, tokens.TagCloseOpen): wiki_style_separator = token.wiki_markup padding = token.padding or "" tag = self._pop() self._push() elif isinstance(token, tokens.TagOpenClose): closing_wiki_markup = token.wiki_markup contents = self._pop() self._push() elif isinstance(token, close_tokens): if isinstance(token, tokens.TagCloseSelfclose): closing_wiki_markup = token.wiki_markup tag = self._pop() self_closing = True padding = token.padding or "" implicit = token.implicit or False else: self_closing = False closing_tag = self._pop() return Tag(tag, contents, attrs, wiki_markup, self_closing, invalid, implicit, padding, closing_tag, wiki_style_separator, closing_wiki_markup) else: self._write(self._handle_token(token)) raise ParserError("_handle_tag() missed a close token") def _handle_token(self, token): """Handle a single token.""" try: return _HANDLERS[type(token)](self, token) except KeyError: err = "_handle_token() got unexpected {0}" raise ParserError(err.format(type(token).__name__)) def build(self, tokenlist): """Build a Wikicode object from a list tokens and return it.""" self._tokens = tokenlist self._tokens.reverse() self._push() while self._tokens: node = self._handle_token(self._tokens.pop()) self._write(node) return self._pop() del _add_handler mwparserfromhell-0.4.2/mwparserfromhell/parser/contexts.py000066400000000000000000000127421255634533200242320ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. """ This module contains various "context" definitions, which are essentially flags set during the tokenization process, either on the current parse stack (local contexts) or affecting all stacks (global contexts). They represent the context the tokenizer is in, such as inside a template's name definition, or inside a level-two heading. This is used to determine what tokens are valid at the current point and also if the current parsing route is invalid. The tokenizer stores context as an integer, with these definitions bitwise OR'd to set them, AND'd to check if they're set, and XOR'd to unset them. The advantage of this is that contexts can have sub-contexts (as ``FOO == 0b11`` will cover ``BAR == 0b10`` and ``BAZ == 0b01``). Local (stack-specific) contexts: * :const:`TEMPLATE` * :const:`TEMPLATE_NAME` * :const:`TEMPLATE_PARAM_KEY` * :const:`TEMPLATE_PARAM_VALUE` * :const:`ARGUMENT` * :const:`ARGUMENT_NAME` * :const:`ARGUMENT_DEFAULT` * :const:`WIKILINK` * :const:`WIKILINK_TITLE` * :const:`WIKILINK_TEXT` * :const:`EXT_LINK` * :const:`EXT_LINK_URI` * :const:`EXT_LINK_TITLE` * :const:`HEADING` * :const:`HEADING_LEVEL_1` * :const:`HEADING_LEVEL_2` * :const:`HEADING_LEVEL_3` * :const:`HEADING_LEVEL_4` * :const:`HEADING_LEVEL_5` * :const:`HEADING_LEVEL_6` * :const:`TAG` * :const:`TAG_OPEN` * :const:`TAG_ATTR` * :const:`TAG_BODY` * :const:`TAG_CLOSE` * :const:`STYLE` * :const:`STYLE_ITALICS` * :const:`STYLE_BOLD` * :const:`STYLE_PASS_AGAIN` * :const:`STYLE_SECOND_PASS` * :const:`DL_TERM` * :const:`SAFETY_CHECK` * :const:`HAS_TEXT` * :const:`FAIL_ON_TEXT` * :const:`FAIL_NEXT` * :const:`FAIL_ON_LBRACE` * :const:`FAIL_ON_RBRACE` * :const:`FAIL_ON_EQUALS` * :const:`HAS_TEMPLATE` * :const:`TABLE` * :const:`TABLE_OPEN` * :const:`TABLE_CELL_OPEN` * :const:`TABLE_CELL_STYLE` * :const:`TABLE_TD_LINE` * :const:`TABLE_TH_LINE` * :const:`TABLE_CELL_LINE_CONTEXTS` Global contexts: * :const:`GL_HEADING` Aggregate contexts: * :const:`FAIL` * :const:`UNSAFE` * :const:`DOUBLE` * :const:`NO_WIKILINKS` * :const:`NO_EXT_LINKS` """ # Local contexts: TEMPLATE_NAME = 1 << 0 TEMPLATE_PARAM_KEY = 1 << 1 TEMPLATE_PARAM_VALUE = 1 << 2 TEMPLATE = TEMPLATE_NAME + TEMPLATE_PARAM_KEY + TEMPLATE_PARAM_VALUE ARGUMENT_NAME = 1 << 3 ARGUMENT_DEFAULT = 1 << 4 ARGUMENT = ARGUMENT_NAME + ARGUMENT_DEFAULT WIKILINK_TITLE = 1 << 5 WIKILINK_TEXT = 1 << 6 WIKILINK = WIKILINK_TITLE + WIKILINK_TEXT EXT_LINK_URI = 1 << 7 EXT_LINK_TITLE = 1 << 8 EXT_LINK = EXT_LINK_URI + EXT_LINK_TITLE HEADING_LEVEL_1 = 1 << 9 HEADING_LEVEL_2 = 1 << 10 HEADING_LEVEL_3 = 1 << 11 HEADING_LEVEL_4 = 1 << 12 HEADING_LEVEL_5 = 1 << 13 HEADING_LEVEL_6 = 1 << 14 HEADING = (HEADING_LEVEL_1 + HEADING_LEVEL_2 + HEADING_LEVEL_3 + HEADING_LEVEL_4 + HEADING_LEVEL_5 + HEADING_LEVEL_6) TAG_OPEN = 1 << 15 TAG_ATTR = 1 << 16 TAG_BODY = 1 << 17 TAG_CLOSE = 1 << 18 TAG = TAG_OPEN + TAG_ATTR + TAG_BODY + TAG_CLOSE STYLE_ITALICS = 1 << 19 STYLE_BOLD = 1 << 20 STYLE_PASS_AGAIN = 1 << 21 STYLE_SECOND_PASS = 1 << 22 STYLE = STYLE_ITALICS + STYLE_BOLD + STYLE_PASS_AGAIN + STYLE_SECOND_PASS DL_TERM = 1 << 23 HAS_TEXT = 1 << 24 FAIL_ON_TEXT = 1 << 25 FAIL_NEXT = 1 << 26 FAIL_ON_LBRACE = 1 << 27 FAIL_ON_RBRACE = 1 << 28 FAIL_ON_EQUALS = 1 << 29 HAS_TEMPLATE = 1 << 30 SAFETY_CHECK = (HAS_TEXT + FAIL_ON_TEXT + FAIL_NEXT + FAIL_ON_LBRACE + FAIL_ON_RBRACE + FAIL_ON_EQUALS + HAS_TEMPLATE) TABLE_OPEN = 1 << 31 TABLE_CELL_OPEN = 1 << 32 TABLE_CELL_STYLE = 1 << 33 TABLE_ROW_OPEN = 1 << 34 TABLE_TD_LINE = 1 << 35 TABLE_TH_LINE = 1 << 36 TABLE_CELL_LINE_CONTEXTS = TABLE_TD_LINE + TABLE_TH_LINE + TABLE_CELL_STYLE TABLE = (TABLE_OPEN + TABLE_CELL_OPEN + TABLE_CELL_STYLE + TABLE_ROW_OPEN + TABLE_TD_LINE + TABLE_TH_LINE) # Global contexts: GL_HEADING = 1 << 0 # Aggregate contexts: FAIL = (TEMPLATE + ARGUMENT + WIKILINK + EXT_LINK_TITLE + HEADING + TAG + STYLE + TABLE) UNSAFE = (TEMPLATE_NAME + WIKILINK_TITLE + EXT_LINK_TITLE + TEMPLATE_PARAM_KEY + ARGUMENT_NAME + TAG_CLOSE) DOUBLE = TEMPLATE_PARAM_KEY + TAG_CLOSE + TABLE_ROW_OPEN NO_WIKILINKS = TEMPLATE_NAME + ARGUMENT_NAME + WIKILINK_TITLE + EXT_LINK_URI NO_EXT_LINKS = TEMPLATE_NAME + ARGUMENT_NAME + WIKILINK_TITLE + EXT_LINK mwparserfromhell-0.4.2/mwparserfromhell/parser/ctokenizer/000077500000000000000000000000001255634533200241605ustar00rootroot00000000000000mwparserfromhell-0.4.2/mwparserfromhell/parser/ctokenizer/common.h000066400000000000000000000074541255634533200256330ustar00rootroot00000000000000/* Copyright (C) 2012-2015 Ben Kurtovic Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #pragma once #ifndef PY_SSIZE_T_CLEAN #define PY_SSIZE_T_CLEAN // See: https://docs.python.org/2/c-api/arg.html #endif #include #include #include /* Compatibility macros */ #if PY_MAJOR_VERSION >= 3 #define IS_PY3K #endif #ifndef uint64_t #define uint64_t unsigned PY_LONG_LONG #endif #define malloc PyObject_Malloc // XXX: yuck #define realloc PyObject_Realloc #define free PyObject_Free /* Unicode support macros */ #if defined(IS_PY3K) && PY_MINOR_VERSION >= 3 #define PEP_393 #endif #ifdef PEP_393 #define Unicode Py_UCS4 #define PyUnicode_FROM_SINGLE(chr) \ PyUnicode_FromKindAndData(PyUnicode_4BYTE_KIND, &(chr), 1) #else #define Unicode Py_UNICODE #define PyUnicode_FROM_SINGLE(chr) \ PyUnicode_FromUnicode(&(chr), 1) #define PyUnicode_GET_LENGTH PyUnicode_GET_SIZE #endif /* Error handling macros */ #define BAD_ROUTE self->route_state #define BAD_ROUTE_CONTEXT self->route_context #define FAIL_ROUTE(context) { \ self->route_state = 1; \ self->route_context = context; \ } #define RESET_ROUTE() self->route_state = 0 /* Shared globals */ extern char** entitydefs; extern PyObject* NOARGS; extern PyObject* definitions; /* Structs */ typedef struct { Py_ssize_t capacity; Py_ssize_t length; #ifdef PEP_393 PyObject* object; int kind; void* data; #else Py_UNICODE* data; #endif } Textbuffer; struct Stack { PyObject* stack; uint64_t context; Textbuffer* textbuffer; struct Stack* next; }; typedef struct Stack Stack; typedef struct { PyObject* object; /* base PyUnicodeObject object */ Py_ssize_t length; /* length of object, in code points */ #ifdef PEP_393 int kind; /* object's kind value */ void* data; /* object's raw unicode buffer */ #else Py_UNICODE* buf; /* object's internal buffer */ #endif } TokenizerInput; typedef struct { PyObject_HEAD TokenizerInput text; /* text to tokenize */ Stack* topstack; /* topmost stack */ Py_ssize_t head; /* current position in text */ int global; /* global context */ int depth; /* stack recursion depth */ int cycles; /* total number of stack recursions */ int route_state; /* whether a BadRoute has been triggered */ uint64_t route_context; /* context when the last BadRoute was triggered */ int skip_style_tags; /* temp fix for the sometimes broken tag parser */ } Tokenizer; mwparserfromhell-0.4.2/mwparserfromhell/parser/ctokenizer/contexts.h000066400000000000000000000107631255634533200262070ustar00rootroot00000000000000/* Copyright (C) 2012-2015 Ben Kurtovic Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #pragma once /* Local contexts */ #define LC_TEMPLATE 0x0000000000000007 #define LC_TEMPLATE_NAME 0x0000000000000001 #define LC_TEMPLATE_PARAM_KEY 0x0000000000000002 #define LC_TEMPLATE_PARAM_VALUE 0x0000000000000004 #define LC_ARGUMENT 0x0000000000000018 #define LC_ARGUMENT_NAME 0x0000000000000008 #define LC_ARGUMENT_DEFAULT 0x0000000000000010 #define LC_WIKILINK 0x0000000000000060 #define LC_WIKILINK_TITLE 0x0000000000000020 #define LC_WIKILINK_TEXT 0x0000000000000040 #define LC_EXT_LINK 0x0000000000000180 #define LC_EXT_LINK_URI 0x0000000000000080 #define LC_EXT_LINK_TITLE 0x0000000000000100 #define LC_HEADING 0x0000000000007E00 #define LC_HEADING_LEVEL_1 0x0000000000000200 #define LC_HEADING_LEVEL_2 0x0000000000000400 #define LC_HEADING_LEVEL_3 0x0000000000000800 #define LC_HEADING_LEVEL_4 0x0000000000001000 #define LC_HEADING_LEVEL_5 0x0000000000002000 #define LC_HEADING_LEVEL_6 0x0000000000004000 #define LC_TAG 0x0000000000078000 #define LC_TAG_OPEN 0x0000000000008000 #define LC_TAG_ATTR 0x0000000000010000 #define LC_TAG_BODY 0x0000000000020000 #define LC_TAG_CLOSE 0x0000000000040000 #define LC_STYLE 0x0000000000780000 #define LC_STYLE_ITALICS 0x0000000000080000 #define LC_STYLE_BOLD 0x0000000000100000 #define LC_STYLE_PASS_AGAIN 0x0000000000200000 #define LC_STYLE_SECOND_PASS 0x0000000000400000 #define LC_DLTERM 0x0000000000800000 #define LC_SAFETY_CHECK 0x000000007F000000 #define LC_HAS_TEXT 0x0000000001000000 #define LC_FAIL_ON_TEXT 0x0000000002000000 #define LC_FAIL_NEXT 0x0000000004000000 #define LC_FAIL_ON_LBRACE 0x0000000008000000 #define LC_FAIL_ON_RBRACE 0x0000000010000000 #define LC_FAIL_ON_EQUALS 0x0000000020000000 #define LC_HAS_TEMPLATE 0x0000000040000000 #define LC_TABLE 0x0000001F80000000 #define LC_TABLE_CELL_LINE_CONTEXTS 0x0000001A00000000 #define LC_TABLE_OPEN 0x0000000080000000 #define LC_TABLE_CELL_OPEN 0x0000000100000000 #define LC_TABLE_CELL_STYLE 0x0000000200000000 #define LC_TABLE_ROW_OPEN 0x0000000400000000 #define LC_TABLE_TD_LINE 0x0000000800000000 #define LC_TABLE_TH_LINE 0x0000001000000000 /* Global contexts */ #define GL_HEADING 0x1 /* Aggregate contexts */ #define AGG_FAIL (LC_TEMPLATE | LC_ARGUMENT | LC_WIKILINK | LC_EXT_LINK_TITLE | LC_HEADING | LC_TAG | LC_STYLE | LC_TABLE_OPEN) #define AGG_UNSAFE (LC_TEMPLATE_NAME | LC_WIKILINK_TITLE | LC_EXT_LINK_TITLE | LC_TEMPLATE_PARAM_KEY | LC_ARGUMENT_NAME) #define AGG_DOUBLE (LC_TEMPLATE_PARAM_KEY | LC_TAG_CLOSE | LC_TABLE_ROW_OPEN) #define AGG_NO_WIKILINKS (LC_TEMPLATE_NAME | LC_ARGUMENT_NAME | LC_WIKILINK_TITLE | LC_EXT_LINK_URI) #define AGG_NO_EXT_LINKS (LC_TEMPLATE_NAME | LC_ARGUMENT_NAME | LC_WIKILINK_TITLE | LC_EXT_LINK) /* Tag contexts */ #define TAG_NAME 0x01 #define TAG_ATTR_READY 0x02 #define TAG_ATTR_NAME 0x04 #define TAG_ATTR_VALUE 0x08 #define TAG_QUOTED 0x10 #define TAG_NOTE_SPACE 0x20 #define TAG_NOTE_EQUALS 0x40 #define TAG_NOTE_QUOTE 0x80 mwparserfromhell-0.4.2/mwparserfromhell/parser/ctokenizer/tag_data.c000066400000000000000000000044661255634533200261020ustar00rootroot00000000000000/* Copyright (C) 2012-2015 Ben Kurtovic Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "tag_data.h" #include "contexts.h" /* Initialize a new TagData object. */ TagData* TagData_new(TokenizerInput* text) { #define ALLOC_BUFFER(name) \ name = Textbuffer_new(text); \ if (!name) { \ TagData_dealloc(self); \ return NULL; \ } TagData *self = malloc(sizeof(TagData)); if (!self) { PyErr_NoMemory(); return NULL; } self->context = TAG_NAME; ALLOC_BUFFER(self->pad_first) ALLOC_BUFFER(self->pad_before_eq) ALLOC_BUFFER(self->pad_after_eq) self->quoter = 0; self->reset = 0; return self; #undef ALLOC_BUFFER } /* Deallocate the given TagData object. */ void TagData_dealloc(TagData* self) { if (self->pad_first) Textbuffer_dealloc(self->pad_first); if (self->pad_before_eq) Textbuffer_dealloc(self->pad_before_eq); if (self->pad_after_eq) Textbuffer_dealloc(self->pad_after_eq); free(self); } /* Clear the internal buffers of the given TagData object. */ int TagData_reset_buffers(TagData* self) { if (Textbuffer_reset(self->pad_first) || Textbuffer_reset(self->pad_before_eq) || Textbuffer_reset(self->pad_after_eq)) return -1; return 0; } mwparserfromhell-0.4.2/mwparserfromhell/parser/ctokenizer/tag_data.h000066400000000000000000000027011255634533200260750ustar00rootroot00000000000000/* Copyright (C) 2012-2015 Ben Kurtovic Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #pragma once #include "common.h" #include "textbuffer.h" /* Structs */ typedef struct { uint64_t context; Textbuffer* pad_first; Textbuffer* pad_before_eq; Textbuffer* pad_after_eq; Unicode quoter; Py_ssize_t reset; } TagData; /* Functions */ TagData* TagData_new(TokenizerInput*); void TagData_dealloc(TagData*); int TagData_reset_buffers(TagData*); mwparserfromhell-0.4.2/mwparserfromhell/parser/ctokenizer/textbuffer.c000066400000000000000000000131101255634533200264760ustar00rootroot00000000000000/* Copyright (C) 2012-2015 Ben Kurtovic Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "textbuffer.h" #define INITIAL_CAPACITY 32 #define RESIZE_FACTOR 2 #define CONCAT_EXTRA 32 /* Internal allocation function for textbuffers. */ static int internal_alloc(Textbuffer* self, Unicode maxchar) { self->capacity = INITIAL_CAPACITY; self->length = 0; #ifdef PEP_393 self->object = PyUnicode_New(self->capacity, maxchar); if (!self->object) return -1; self->kind = PyUnicode_KIND(self->object); self->data = PyUnicode_DATA(self->object); #else (void) maxchar; // Unused self->data = malloc(sizeof(Unicode) * self->capacity); if (!self->data) return -1; #endif return 0; } /* Internal deallocation function for textbuffers. */ static void internal_dealloc(Textbuffer* self) { #ifdef PEP_393 Py_DECREF(self->object); #else free(self->data); #endif } /* Internal resize function. */ static int internal_resize(Textbuffer* self, Py_ssize_t new_cap) { #ifdef PEP_393 PyObject *newobj; void *newdata; newobj = PyUnicode_New(new_cap, PyUnicode_MAX_CHAR_VALUE(self->object)); if (!newobj) return -1; newdata = PyUnicode_DATA(newobj); memcpy(newdata, self->data, self->length * self->kind); Py_DECREF(self->object); self->object = newobj; self->data = newdata; #else if (!(self->data = realloc(self->data, sizeof(Unicode) * new_cap))) return -1; #endif self->capacity = new_cap; return 0; } /* Create a new textbuffer object. */ Textbuffer* Textbuffer_new(TokenizerInput* text) { Textbuffer* self = malloc(sizeof(Textbuffer)); Unicode maxchar = 0; #ifdef PEP_393 maxchar = PyUnicode_MAX_CHAR_VALUE(text->object); #endif if (!self) goto fail_nomem; if (internal_alloc(self, maxchar) < 0) goto fail_dealloc; return self; fail_dealloc: free(self); fail_nomem: PyErr_NoMemory(); return NULL; } /* Deallocate the given textbuffer. */ void Textbuffer_dealloc(Textbuffer* self) { internal_dealloc(self); free(self); } /* Reset a textbuffer to its initial, empty state. */ int Textbuffer_reset(Textbuffer* self) { Unicode maxchar = 0; #ifdef PEP_393 maxchar = PyUnicode_MAX_CHAR_VALUE(self->object); #endif internal_dealloc(self); if (internal_alloc(self, maxchar)) return -1; return 0; } /* Write a Unicode codepoint to the given textbuffer. */ int Textbuffer_write(Textbuffer* self, Unicode code) { if (self->length >= self->capacity) { if (internal_resize(self, self->capacity * RESIZE_FACTOR) < 0) return -1; } #ifdef PEP_393 PyUnicode_WRITE(self->kind, self->data, self->length++, code); #else self->data[self->length++] = code; #endif return 0; } /* Read a Unicode codepoint from the given index of the given textbuffer. This function does not check for bounds. */ Unicode Textbuffer_read(Textbuffer* self, Py_ssize_t index) { #ifdef PEP_393 return PyUnicode_READ(self->kind, self->data, index); #else return self->data[index]; #endif } /* Return the contents of the textbuffer as a Python Unicode object. */ PyObject* Textbuffer_render(Textbuffer* self) { #ifdef PEP_393 return PyUnicode_FromKindAndData(self->kind, self->data, self->length); #else return PyUnicode_FromUnicode(self->data, self->length); #endif } /* Concatenate the 'other' textbuffer onto the end of the given textbuffer. */ int Textbuffer_concat(Textbuffer* self, Textbuffer* other) { Py_ssize_t newlen = self->length + other->length; if (newlen > self->capacity) { if (internal_resize(self, newlen + CONCAT_EXTRA) < 0) return -1; } #ifdef PEP_393 assert(self->kind == other->kind); memcpy(((Py_UCS1*) self->data) + self->kind * self->length, other->data, other->length * other->kind); #else memcpy(self->data + self->length, other->data, other->length * sizeof(Unicode)); #endif self->length = newlen; return 0; } /* Reverse the contents of the given textbuffer. */ void Textbuffer_reverse(Textbuffer* self) { Py_ssize_t i, end = self->length - 1; Unicode tmp; for (i = 0; i < self->length / 2; i++) { #ifdef PEP_393 tmp = PyUnicode_READ(self->kind, self->data, i); PyUnicode_WRITE(self->kind, self->data, i, PyUnicode_READ(self->kind, self->data, end - i)); PyUnicode_WRITE(self->kind, self->data, end - i, tmp); #else tmp = self->data[i]; self->data[i] = self->data[end - i]; self->data[end - i] = tmp; #endif } } mwparserfromhell-0.4.2/mwparserfromhell/parser/ctokenizer/textbuffer.h000066400000000000000000000027161255634533200265150ustar00rootroot00000000000000/* Copyright (C) 2012-2015 Ben Kurtovic Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #pragma once #include "common.h" /* Functions */ Textbuffer* Textbuffer_new(TokenizerInput*); void Textbuffer_dealloc(Textbuffer*); int Textbuffer_reset(Textbuffer*); int Textbuffer_write(Textbuffer*, Unicode); Unicode Textbuffer_read(Textbuffer*, Py_ssize_t); PyObject* Textbuffer_render(Textbuffer*); int Textbuffer_concat(Textbuffer*, Textbuffer*); void Textbuffer_reverse(Textbuffer*); mwparserfromhell-0.4.2/mwparserfromhell/parser/ctokenizer/tok_parse.c000066400000000000000000002417341255634533200263260ustar00rootroot00000000000000/* Copyright (C) 2012-2015 Ben Kurtovic Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include "tok_parse.h" #include "contexts.h" #include "tag_data.h" #include "tok_support.h" #include "tokens.h" #define DIGITS "0123456789" #define HEXDIGITS "0123456789abcdefABCDEF" #define ALPHANUM "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" #define MAX_BRACES 255 #define MAX_ENTITY_SIZE 8 #define GET_HTML_TAG(markup) (markup == ':' ? "dd" : markup == ';' ? "dt" : "li") #define IS_PARSABLE(tag) (call_def_func("is_parsable", tag, NULL)) #define IS_SINGLE(tag) (call_def_func("is_single", tag, NULL)) #define IS_SINGLE_ONLY(tag) (call_def_func("is_single_only", tag, NULL)) #define IS_SCHEME(scheme, slashes) \ (call_def_func("is_scheme", scheme, slashes ? Py_True : Py_False)) typedef struct { PyObject* title; int level; } HeadingData; /* Forward declarations */ static int Tokenizer_parse_entity(Tokenizer*); static int Tokenizer_parse_comment(Tokenizer*); static int Tokenizer_handle_dl_term(Tokenizer*); static int Tokenizer_parse_tag(Tokenizer*); /* Determine whether the given code point is a marker. */ static int is_marker(Unicode this) { int i; for (i = 0; i < NUM_MARKERS; i++) { if (MARKERS[i] == this) return 1; } return 0; } /* Given a context, return the heading level encoded within it. */ static int heading_level_from_context(uint64_t n) { int level; n /= LC_HEADING_LEVEL_1; for (level = 1; n > 1; n >>= 1) level++; return level; } /* Call the given function in definitions.py, using 'in1' and 'in2' as parameters, and return its output as a bool. */ static int call_def_func(const char* funcname, PyObject* in1, PyObject* in2) { PyObject* func = PyObject_GetAttrString(definitions, funcname); PyObject* result = PyObject_CallFunctionObjArgs(func, in1, in2, NULL); int ans = (result == Py_True) ? 1 : 0; Py_DECREF(func); Py_DECREF(result); return ans; } /* Sanitize the name of a tag so it can be compared with others for equality. */ static PyObject* strip_tag_name(PyObject* token, int take_attr) { PyObject *text, *rstripped, *lowered; if (take_attr) { text = PyObject_GetAttrString(token, "text"); if (!text) return NULL; rstripped = PyObject_CallMethod(text, "rstrip", NULL); Py_DECREF(text); } else rstripped = PyObject_CallMethod(token, "rstrip", NULL); if (!rstripped) return NULL; lowered = PyObject_CallMethod(rstripped, "lower", NULL); Py_DECREF(rstripped); return lowered; } /* Parse a template at the head of the wikicode string. */ static int Tokenizer_parse_template(Tokenizer* self, int has_content) { PyObject *template; Py_ssize_t reset = self->head; uint64_t context = LC_TEMPLATE_NAME; if (has_content) context |= LC_HAS_TEMPLATE; template = Tokenizer_parse(self, context, 1); if (BAD_ROUTE) { self->head = reset; return 0; } if (!template) return -1; if (Tokenizer_emit_first(self, TemplateOpen)) { Py_DECREF(template); return -1; } if (Tokenizer_emit_all(self, template)) { Py_DECREF(template); return -1; } Py_DECREF(template); if (Tokenizer_emit(self, TemplateClose)) return -1; return 0; } /* Parse an argument at the head of the wikicode string. */ static int Tokenizer_parse_argument(Tokenizer* self) { PyObject *argument; Py_ssize_t reset = self->head; argument = Tokenizer_parse(self, LC_ARGUMENT_NAME, 1); if (BAD_ROUTE) { self->head = reset; return 0; } if (!argument) return -1; if (Tokenizer_emit_first(self, ArgumentOpen)) { Py_DECREF(argument); return -1; } if (Tokenizer_emit_all(self, argument)) { Py_DECREF(argument); return -1; } Py_DECREF(argument); if (Tokenizer_emit(self, ArgumentClose)) return -1; return 0; } /* Parse a template or argument at the head of the wikicode string. */ static int Tokenizer_parse_template_or_argument(Tokenizer* self) { unsigned int braces = 2, i; int has_content = 0; PyObject *tokenlist; self->head += 2; while (Tokenizer_read(self, 0) == '{' && braces < MAX_BRACES) { self->head++; braces++; } if (Tokenizer_push(self, 0)) return -1; while (braces) { if (braces == 1) { if (Tokenizer_emit_text_then_stack(self, "{")) return -1; return 0; } if (braces == 2) { if (Tokenizer_parse_template(self, has_content)) return -1; if (BAD_ROUTE) { RESET_ROUTE(); if (Tokenizer_emit_text_then_stack(self, "{{")) return -1; return 0; } break; } if (Tokenizer_parse_argument(self)) return -1; if (BAD_ROUTE) { RESET_ROUTE(); if (Tokenizer_parse_template(self, has_content)) return -1; if (BAD_ROUTE) { char text[MAX_BRACES + 1]; RESET_ROUTE(); for (i = 0; i < braces; i++) text[i] = '{'; text[braces] = '\0'; if (Tokenizer_emit_text_then_stack(self, text)) return -1; return 0; } else braces -= 2; } else braces -= 3; if (braces) { has_content = 1; self->head++; } } tokenlist = Tokenizer_pop(self); if (!tokenlist) return -1; if (Tokenizer_emit_all(self, tokenlist)) { Py_DECREF(tokenlist); return -1; } Py_DECREF(tokenlist); if (self->topstack->context & LC_FAIL_NEXT) self->topstack->context ^= LC_FAIL_NEXT; return 0; } /* Handle a template parameter at the head of the string. */ static int Tokenizer_handle_template_param(Tokenizer* self) { PyObject *stack; if (self->topstack->context & LC_TEMPLATE_NAME) { if (!(self->topstack->context & (LC_HAS_TEXT | LC_HAS_TEMPLATE))) { Tokenizer_fail_route(self); return -1; } self->topstack->context ^= LC_TEMPLATE_NAME; } else if (self->topstack->context & LC_TEMPLATE_PARAM_VALUE) self->topstack->context ^= LC_TEMPLATE_PARAM_VALUE; if (self->topstack->context & LC_TEMPLATE_PARAM_KEY) { stack = Tokenizer_pop_keeping_context(self); if (!stack) return -1; if (Tokenizer_emit_all(self, stack)) { Py_DECREF(stack); return -1; } Py_DECREF(stack); } else self->topstack->context |= LC_TEMPLATE_PARAM_KEY; if (Tokenizer_emit(self, TemplateParamSeparator)) return -1; if (Tokenizer_push(self, self->topstack->context)) return -1; return 0; } /* Handle a template parameter's value at the head of the string. */ static int Tokenizer_handle_template_param_value(Tokenizer* self) { PyObject *stack; stack = Tokenizer_pop_keeping_context(self); if (!stack) return -1; if (Tokenizer_emit_all(self, stack)) { Py_DECREF(stack); return -1; } Py_DECREF(stack); self->topstack->context ^= LC_TEMPLATE_PARAM_KEY; self->topstack->context |= LC_TEMPLATE_PARAM_VALUE; if (Tokenizer_emit(self, TemplateParamEquals)) return -1; return 0; } /* Handle the end of a template at the head of the string. */ static PyObject* Tokenizer_handle_template_end(Tokenizer* self) { PyObject* stack; if (self->topstack->context & LC_TEMPLATE_NAME) { if (!(self->topstack->context & (LC_HAS_TEXT | LC_HAS_TEMPLATE))) return Tokenizer_fail_route(self); } else if (self->topstack->context & LC_TEMPLATE_PARAM_KEY) { stack = Tokenizer_pop_keeping_context(self); if (!stack) return NULL; if (Tokenizer_emit_all(self, stack)) { Py_DECREF(stack); return NULL; } Py_DECREF(stack); } self->head++; stack = Tokenizer_pop(self); return stack; } /* Handle the separator between an argument's name and default. */ static int Tokenizer_handle_argument_separator(Tokenizer* self) { self->topstack->context ^= LC_ARGUMENT_NAME; self->topstack->context |= LC_ARGUMENT_DEFAULT; if (Tokenizer_emit(self, ArgumentSeparator)) return -1; return 0; } /* Handle the end of an argument at the head of the string. */ static PyObject* Tokenizer_handle_argument_end(Tokenizer* self) { PyObject* stack = Tokenizer_pop(self); self->head += 2; return stack; } /* Parse an internal wikilink at the head of the wikicode string. */ static int Tokenizer_parse_wikilink(Tokenizer* self) { Py_ssize_t reset; PyObject *wikilink; self->head += 2; reset = self->head - 1; wikilink = Tokenizer_parse(self, LC_WIKILINK_TITLE, 1); if (BAD_ROUTE) { RESET_ROUTE(); self->head = reset; if (Tokenizer_emit_text(self, "[[")) return -1; return 0; } if (!wikilink) return -1; if (Tokenizer_emit(self, WikilinkOpen)) { Py_DECREF(wikilink); return -1; } if (Tokenizer_emit_all(self, wikilink)) { Py_DECREF(wikilink); return -1; } Py_DECREF(wikilink); if (Tokenizer_emit(self, WikilinkClose)) return -1; return 0; } /* Handle the separator between a wikilink's title and its text. */ static int Tokenizer_handle_wikilink_separator(Tokenizer* self) { self->topstack->context ^= LC_WIKILINK_TITLE; self->topstack->context |= LC_WIKILINK_TEXT; if (Tokenizer_emit(self, WikilinkSeparator)) return -1; return 0; } /* Handle the end of a wikilink at the head of the string. */ static PyObject* Tokenizer_handle_wikilink_end(Tokenizer* self) { PyObject* stack = Tokenizer_pop(self); self->head += 1; return stack; } /* Parse the URI scheme of a bracket-enclosed external link. */ static int Tokenizer_parse_bracketed_uri_scheme(Tokenizer* self) { static const char* valid = "abcdefghijklmnopqrstuvwxyz0123456789+.-"; Textbuffer* buffer; PyObject* scheme; Unicode this; int slashes, i; if (Tokenizer_push(self, LC_EXT_LINK_URI)) return -1; if (Tokenizer_read(self, 0) == '/' && Tokenizer_read(self, 1) == '/') { if (Tokenizer_emit_text(self, "//")) return -1; self->head += 2; } else { buffer = Textbuffer_new(&self->text); if (!buffer) return -1; while ((this = Tokenizer_read(self, 0))) { i = 0; while (1) { if (!valid[i]) goto end_of_loop; if (this == valid[i]) break; i++; } Textbuffer_write(buffer, this); if (Tokenizer_emit_char(self, this)) { Textbuffer_dealloc(buffer); return -1; } self->head++; } end_of_loop: if (this != ':') { Textbuffer_dealloc(buffer); Tokenizer_fail_route(self); return 0; } if (Tokenizer_emit_char(self, ':')) { Textbuffer_dealloc(buffer); return -1; } self->head++; slashes = (Tokenizer_read(self, 0) == '/' && Tokenizer_read(self, 1) == '/'); if (slashes) { if (Tokenizer_emit_text(self, "//")) { Textbuffer_dealloc(buffer); return -1; } self->head += 2; } scheme = Textbuffer_render(buffer); Textbuffer_dealloc(buffer); if (!scheme) return -1; if (!IS_SCHEME(scheme, slashes)) { Py_DECREF(scheme); Tokenizer_fail_route(self); return 0; } Py_DECREF(scheme); } return 0; } /* Parse the URI scheme of a free (no brackets) external link. */ static int Tokenizer_parse_free_uri_scheme(Tokenizer* self) { static const char* valid = "abcdefghijklmnopqrstuvwxyz0123456789+.-"; Textbuffer *scheme_buffer = Textbuffer_new(&self->text); PyObject *scheme; Unicode chunk; Py_ssize_t i; int slashes, j; if (!scheme_buffer) return -1; // We have to backtrack through the textbuffer looking for our scheme since // it was just parsed as text: for (i = self->topstack->textbuffer->length - 1; i >= 0; i--) { chunk = Textbuffer_read(self->topstack->textbuffer, i); if (Py_UNICODE_ISSPACE(chunk) || is_marker(chunk)) goto end_of_loop; j = 0; do { if (!valid[j]) { Textbuffer_dealloc(scheme_buffer); FAIL_ROUTE(0); return 0; } } while (chunk != valid[j++]); Textbuffer_write(scheme_buffer, chunk); } end_of_loop: Textbuffer_reverse(scheme_buffer); scheme = Textbuffer_render(scheme_buffer); if (!scheme) { Textbuffer_dealloc(scheme_buffer); return -1; } slashes = (Tokenizer_read(self, 0) == '/' && Tokenizer_read(self, 1) == '/'); if (!IS_SCHEME(scheme, slashes)) { Py_DECREF(scheme); Textbuffer_dealloc(scheme_buffer); FAIL_ROUTE(0); return 0; } Py_DECREF(scheme); if (Tokenizer_push(self, self->topstack->context | LC_EXT_LINK_URI)) { Textbuffer_dealloc(scheme_buffer); return -1; } if (Tokenizer_emit_textbuffer(self, scheme_buffer)) return -1; if (Tokenizer_emit_char(self, ':')) return -1; if (slashes) { if (Tokenizer_emit_text(self, "//")) return -1; self->head += 2; } return 0; } /* Handle text in a free external link, including trailing punctuation. */ static int Tokenizer_handle_free_link_text( Tokenizer* self, int* parens, Textbuffer* tail, Unicode this) { #define PUSH_TAIL_BUFFER(tail, error) \ if (tail->length > 0) { \ if (Textbuffer_concat(self->topstack->textbuffer, tail)) \ return error; \ if (Textbuffer_reset(tail)) \ return error; \ } if (this == '(' && !(*parens)) { *parens = 1; PUSH_TAIL_BUFFER(tail, -1) } else if (this == ',' || this == ';' || this == '\\' || this == '.' || this == ':' || this == '!' || this == '?' || (!(*parens) && this == ')')) return Textbuffer_write(tail, this); else PUSH_TAIL_BUFFER(tail, -1) return Tokenizer_emit_char(self, this); } /* Return whether the current head is the end of a free link. */ static int Tokenizer_is_free_link(Tokenizer* self, Unicode this, Unicode next) { // Built from Tokenizer_parse()'s end sentinels: Unicode after = Tokenizer_read(self, 2); uint64_t ctx = self->topstack->context; return (!this || this == '\n' || this == '[' || this == ']' || this == '<' || this == '>' || (this == '\'' && next == '\'') || (this == '|' && ctx & LC_TEMPLATE) || (this == '=' && ctx & (LC_TEMPLATE_PARAM_KEY | LC_HEADING)) || (this == '}' && next == '}' && (ctx & LC_TEMPLATE || (after == '}' && ctx & LC_ARGUMENT)))); } /* Really parse an external link. */ static PyObject* Tokenizer_really_parse_external_link(Tokenizer* self, int brackets, Textbuffer* extra) { Unicode this, next; int parens = 0; if (brackets ? Tokenizer_parse_bracketed_uri_scheme(self) : Tokenizer_parse_free_uri_scheme(self)) return NULL; if (BAD_ROUTE) return NULL; this = Tokenizer_read(self, 0); if (!this || this == '\n' || this == ' ' || this == ']') return Tokenizer_fail_route(self); if (!brackets && this == '[') return Tokenizer_fail_route(self); while (1) { this = Tokenizer_read(self, 0); next = Tokenizer_read(self, 1); if (this == '&') { PUSH_TAIL_BUFFER(extra, NULL) if (Tokenizer_parse_entity(self)) return NULL; } else if (this == '<' && next == '!' && Tokenizer_read(self, 2) == '-' && Tokenizer_read(self, 3) == '-') { PUSH_TAIL_BUFFER(extra, NULL) if (Tokenizer_parse_comment(self)) return NULL; } else if (!brackets && Tokenizer_is_free_link(self, this, next)) { self->head--; return Tokenizer_pop(self); } else if (!this || this == '\n') return Tokenizer_fail_route(self); else if (this == '{' && next == '{' && Tokenizer_CAN_RECURSE(self)) { PUSH_TAIL_BUFFER(extra, NULL) if (Tokenizer_parse_template_or_argument(self)) return NULL; } else if (this == ']') return Tokenizer_pop(self); else if (this == ' ') { if (brackets) { if (Tokenizer_emit(self, ExternalLinkSeparator)) return NULL; self->topstack->context ^= LC_EXT_LINK_URI; self->topstack->context |= LC_EXT_LINK_TITLE; self->head++; return Tokenizer_parse(self, 0, 0); } if (Textbuffer_write(extra, ' ')) return NULL; return Tokenizer_pop(self); } else if (!brackets) { if (Tokenizer_handle_free_link_text(self, &parens, extra, this)) return NULL; } else { if (Tokenizer_emit_char(self, this)) return NULL; } self->head++; } } /* Remove the URI scheme of a new external link from the textbuffer. */ static int Tokenizer_remove_uri_scheme_from_textbuffer(Tokenizer* self, PyObject* link) { PyObject *text = PyObject_GetAttrString(PyList_GET_ITEM(link, 0), "text"), *split, *scheme; Py_ssize_t length; if (!text) return -1; split = PyObject_CallMethod(text, "split", "si", ":", 1); Py_DECREF(text); if (!split) return -1; scheme = PyList_GET_ITEM(split, 0); length = PyUnicode_GET_LENGTH(scheme); Py_DECREF(split); self->topstack->textbuffer->length -= length; return 0; } /* Parse an external link at the head of the wikicode string. */ static int Tokenizer_parse_external_link(Tokenizer* self, int brackets) { #define INVALID_CONTEXT self->topstack->context & AGG_NO_EXT_LINKS #define NOT_A_LINK \ if (!brackets && self->topstack->context & LC_DLTERM) \ return Tokenizer_handle_dl_term(self); \ return Tokenizer_emit_char(self, Tokenizer_read(self, 0)) Py_ssize_t reset = self->head; PyObject *link, *kwargs; Textbuffer *extra; if (INVALID_CONTEXT || !(Tokenizer_CAN_RECURSE(self))) { NOT_A_LINK; } extra = Textbuffer_new(&self->text); if (!extra) return -1; self->head++; link = Tokenizer_really_parse_external_link(self, brackets, extra); if (BAD_ROUTE) { RESET_ROUTE(); self->head = reset; Textbuffer_dealloc(extra); NOT_A_LINK; } if (!link) { Textbuffer_dealloc(extra); return -1; } if (!brackets) { if (Tokenizer_remove_uri_scheme_from_textbuffer(self, link)) { Textbuffer_dealloc(extra); Py_DECREF(link); return -1; } } kwargs = PyDict_New(); if (!kwargs) { Textbuffer_dealloc(extra); Py_DECREF(link); return -1; } PyDict_SetItemString(kwargs, "brackets", brackets ? Py_True : Py_False); if (Tokenizer_emit_kwargs(self, ExternalLinkOpen, kwargs)) { Textbuffer_dealloc(extra); Py_DECREF(link); return -1; } if (Tokenizer_emit_all(self, link)) { Textbuffer_dealloc(extra); Py_DECREF(link); return -1; } Py_DECREF(link); if (Tokenizer_emit(self, ExternalLinkClose)) { Textbuffer_dealloc(extra); return -1; } if (extra->length > 0) return Tokenizer_emit_textbuffer(self, extra); Textbuffer_dealloc(extra); return 0; } /* Parse a section heading at the head of the wikicode string. */ static int Tokenizer_parse_heading(Tokenizer* self) { Py_ssize_t reset = self->head; int best = 1, i, context, diff; HeadingData *heading; PyObject *level, *kwargs; self->global |= GL_HEADING; self->head += 1; while (Tokenizer_read(self, 0) == '=') { best++; self->head++; } context = LC_HEADING_LEVEL_1 << (best > 5 ? 5 : best - 1); heading = (HeadingData*) Tokenizer_parse(self, context, 1); if (BAD_ROUTE) { RESET_ROUTE(); self->head = reset + best - 1; for (i = 0; i < best; i++) { if (Tokenizer_emit_char(self, '=')) return -1; } self->global ^= GL_HEADING; return 0; } #ifdef IS_PY3K level = PyLong_FromSsize_t(heading->level); #else level = PyInt_FromSsize_t(heading->level); #endif if (!level) { Py_DECREF(heading->title); free(heading); return -1; } kwargs = PyDict_New(); if (!kwargs) { Py_DECREF(level); Py_DECREF(heading->title); free(heading); return -1; } PyDict_SetItemString(kwargs, "level", level); Py_DECREF(level); if (Tokenizer_emit_kwargs(self, HeadingStart, kwargs)) { Py_DECREF(heading->title); free(heading); return -1; } if (heading->level < best) { diff = best - heading->level; for (i = 0; i < diff; i++) { if (Tokenizer_emit_char(self, '=')) { Py_DECREF(heading->title); free(heading); return -1; } } } if (Tokenizer_emit_all(self, heading->title)) { Py_DECREF(heading->title); free(heading); return -1; } Py_DECREF(heading->title); free(heading); if (Tokenizer_emit(self, HeadingEnd)) return -1; self->global ^= GL_HEADING; return 0; } /* Handle the end of a section heading at the head of the string. */ static HeadingData* Tokenizer_handle_heading_end(Tokenizer* self) { Py_ssize_t reset = self->head; int best, i, current, level, diff; HeadingData *after, *heading; PyObject *stack; self->head += 1; best = 1; while (Tokenizer_read(self, 0) == '=') { best++; self->head++; } current = heading_level_from_context(self->topstack->context); level = current > best ? (best > 6 ? 6 : best) : (current > 6 ? 6 : current); after = (HeadingData*) Tokenizer_parse(self, self->topstack->context, 1); if (BAD_ROUTE) { RESET_ROUTE(); if (level < best) { diff = best - level; for (i = 0; i < diff; i++) { if (Tokenizer_emit_char(self, '=')) return NULL; } } self->head = reset + best - 1; } else { for (i = 0; i < best; i++) { if (Tokenizer_emit_char(self, '=')) { Py_DECREF(after->title); free(after); return NULL; } } if (Tokenizer_emit_all(self, after->title)) { Py_DECREF(after->title); free(after); return NULL; } Py_DECREF(after->title); level = after->level; free(after); } stack = Tokenizer_pop(self); if (!stack) return NULL; heading = malloc(sizeof(HeadingData)); if (!heading) { PyErr_NoMemory(); return NULL; } heading->title = stack; heading->level = level; return heading; } /* Actually parse an HTML entity and ensure that it is valid. */ static int Tokenizer_really_parse_entity(Tokenizer* self) { PyObject *kwargs, *charobj, *textobj; Unicode this; int numeric, hexadecimal, i, j, zeroes, test; char *valid, *text, *buffer, *def; #define FAIL_ROUTE_AND_EXIT() { \ Tokenizer_fail_route(self); \ free(text); \ return 0; \ } if (Tokenizer_emit(self, HTMLEntityStart)) return -1; self->head++; this = Tokenizer_read(self, 0); if (!this) { Tokenizer_fail_route(self); return 0; } if (this == '#') { numeric = 1; if (Tokenizer_emit(self, HTMLEntityNumeric)) return -1; self->head++; this = Tokenizer_read(self, 0); if (!this) { Tokenizer_fail_route(self); return 0; } if (this == 'x' || this == 'X') { hexadecimal = 1; kwargs = PyDict_New(); if (!kwargs) return -1; if (!(charobj = PyUnicode_FROM_SINGLE(this))) { Py_DECREF(kwargs); return -1; } PyDict_SetItemString(kwargs, "char", charobj); Py_DECREF(charobj); if (Tokenizer_emit_kwargs(self, HTMLEntityHex, kwargs)) return -1; self->head++; } else hexadecimal = 0; } else numeric = hexadecimal = 0; if (hexadecimal) valid = HEXDIGITS; else if (numeric) valid = DIGITS; else valid = ALPHANUM; text = calloc(MAX_ENTITY_SIZE, sizeof(char)); if (!text) { PyErr_NoMemory(); return -1; } i = 0; zeroes = 0; while (1) { this = Tokenizer_read(self, 0); if (this == ';') { if (i == 0) FAIL_ROUTE_AND_EXIT() break; } if (i == 0 && this == '0') { zeroes++; self->head++; continue; } if (i >= MAX_ENTITY_SIZE) FAIL_ROUTE_AND_EXIT() if (is_marker(this)) FAIL_ROUTE_AND_EXIT() j = 0; while (1) { if (!valid[j]) FAIL_ROUTE_AND_EXIT() if (this == valid[j]) break; j++; } text[i] = (char) this; self->head++; i++; } if (numeric) { sscanf(text, (hexadecimal ? "%x" : "%d"), &test); if (test < 1 || test > 0x10FFFF) FAIL_ROUTE_AND_EXIT() } else { i = 0; while (1) { def = entitydefs[i]; if (!def) // We've reached the end of the defs without finding it FAIL_ROUTE_AND_EXIT() if (strcmp(text, def) == 0) break; i++; } } if (zeroes) { buffer = calloc(strlen(text) + zeroes + 1, sizeof(char)); if (!buffer) { free(text); PyErr_NoMemory(); return -1; } for (i = 0; i < zeroes; i++) strcat(buffer, "0"); strcat(buffer, text); free(text); text = buffer; } textobj = PyUnicode_FromString(text); if (!textobj) { free(text); return -1; } free(text); kwargs = PyDict_New(); if (!kwargs) { Py_DECREF(textobj); return -1; } PyDict_SetItemString(kwargs, "text", textobj); Py_DECREF(textobj); if (Tokenizer_emit_kwargs(self, Text, kwargs)) return -1; if (Tokenizer_emit(self, HTMLEntityEnd)) return -1; return 0; } /* Parse an HTML entity at the head of the wikicode string. */ static int Tokenizer_parse_entity(Tokenizer* self) { Py_ssize_t reset = self->head; PyObject *tokenlist; if (Tokenizer_push(self, 0)) return -1; if (Tokenizer_really_parse_entity(self)) return -1; if (BAD_ROUTE) { RESET_ROUTE(); self->head = reset; if (Tokenizer_emit_char(self, '&')) return -1; return 0; } tokenlist = Tokenizer_pop(self); if (!tokenlist) return -1; if (Tokenizer_emit_all(self, tokenlist)) { Py_DECREF(tokenlist); return -1; } Py_DECREF(tokenlist); return 0; } /* Parse an HTML comment at the head of the wikicode string. */ static int Tokenizer_parse_comment(Tokenizer* self) { Py_ssize_t reset = self->head + 3; PyObject *comment; Unicode this; self->head += 4; if (Tokenizer_push(self, 0)) return -1; while (1) { this = Tokenizer_read(self, 0); if (!this) { comment = Tokenizer_pop(self); Py_XDECREF(comment); self->head = reset; return Tokenizer_emit_text(self, " TagOpenOpen = make("TagOpenOpen") # < TagAttrStart = make("TagAttrStart") TagAttrEquals = make("TagAttrEquals") # = TagAttrQuote = make("TagAttrQuote") # ", ' TagCloseOpen = make("TagCloseOpen") # > TagCloseSelfclose = make("TagCloseSelfclose") # /> TagOpenClose = make("TagOpenClose") # del make mwparserfromhell-0.4.2/mwparserfromhell/smart_list.py000066400000000000000000000353771255634533200232610ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. """ This module contains the :class:`.SmartList` type, as well as its :class:`._ListProxy` child, which together implement a list whose sublists reflect changes made to the main list, and vice-versa. """ from __future__ import unicode_literals from sys import maxsize from weakref import ref from .compat import py3k __all__ = ["SmartList"] def inheritdoc(method): """Set __doc__ of *method* to __doc__ of *method* in its parent class. Since this is used on :class:`.SmartList`, the "parent class" used is ``list``. This function can be used as a decorator. """ method.__doc__ = getattr(list, method.__name__).__doc__ return method class _SliceNormalizerMixIn(object): """MixIn that provides a private method to normalize slices.""" def _normalize_slice(self, key, clamp=False): """Return a slice equivalent to the input *key*, standardized.""" if key.start is None: start = 0 else: start = (len(self) + key.start) if key.start < 0 else key.start if key.stop is None or key.stop == maxsize: stop = len(self) if clamp else None else: stop = (len(self) + key.stop) if key.stop < 0 else key.stop return slice(start, stop, key.step or 1) class SmartList(_SliceNormalizerMixIn, list): """Implements the ``list`` interface with special handling of sublists. When a sublist is created (by ``list[i:j]``), any changes made to this list (such as the addition, removal, or replacement of elements) will be reflected in the sublist, or vice-versa, to the greatest degree possible. This is implemented by having sublists - instances of the :class:`._ListProxy` type - dynamically determine their elements by storing their slice info and retrieving that slice from the parent. Methods that change the size of the list also change the slice info. For example:: >>> parent = SmartList([0, 1, 2, 3]) >>> parent [0, 1, 2, 3] >>> child = parent[2:] >>> child [2, 3] >>> child.append(4) >>> child [2, 3, 4] >>> parent [0, 1, 2, 3, 4] """ def __init__(self, iterable=None): if iterable: super(SmartList, self).__init__(iterable) else: super(SmartList, self).__init__() self._children = {} def __getitem__(self, key): if not isinstance(key, slice): return super(SmartList, self).__getitem__(key) key = self._normalize_slice(key, clamp=False) sliceinfo = [key.start, key.stop, key.step] child = _ListProxy(self, sliceinfo) child_ref = ref(child, self._delete_child) self._children[id(child_ref)] = (child_ref, sliceinfo) return child def __setitem__(self, key, item): if not isinstance(key, slice): return super(SmartList, self).__setitem__(key, item) item = list(item) super(SmartList, self).__setitem__(key, item) key = self._normalize_slice(key, clamp=True) diff = len(item) + (key.start - key.stop) // key.step if not diff: return values = self._children.values if py3k else self._children.itervalues for child, (start, stop, step) in values(): if start > key.stop: self._children[id(child)][1][0] += diff if stop is not None and stop >= key.stop: self._children[id(child)][1][1] += diff def __delitem__(self, key): super(SmartList, self).__delitem__(key) if isinstance(key, slice): key = self._normalize_slice(key, clamp=True) else: key = slice(key, key + 1, 1) diff = (key.stop - key.start) // key.step values = self._children.values if py3k else self._children.itervalues for child, (start, stop, step) in values(): if start > key.start: self._children[id(child)][1][0] -= diff if stop is not None and stop >= key.stop: self._children[id(child)][1][1] -= diff if not py3k: def __getslice__(self, start, stop): return self.__getitem__(slice(start, stop)) def __setslice__(self, start, stop, iterable): self.__setitem__(slice(start, stop), iterable) def __delslice__(self, start, stop): self.__delitem__(slice(start, stop)) def __add__(self, other): return SmartList(list(self) + other) def __radd__(self, other): return SmartList(other + list(self)) def __iadd__(self, other): self.extend(other) return self def _delete_child(self, child_ref): """Remove a child reference that is about to be garbage-collected.""" del self._children[id(child_ref)] def _detach_children(self): """Remove all children and give them independent parent copies.""" children = [val[0] for val in self._children.values()] for child in children: child()._parent = list(self) self._children.clear() @inheritdoc def append(self, item): head = len(self) self[head:head] = [item] @inheritdoc def extend(self, item): head = len(self) self[head:head] = item @inheritdoc def insert(self, index, item): self[index:index] = [item] @inheritdoc def pop(self, index=None): if index is None: index = len(self) - 1 item = self[index] del self[index] return item @inheritdoc def remove(self, item): del self[self.index(item)] @inheritdoc def reverse(self): self._detach_children() super(SmartList, self).reverse() if py3k: @inheritdoc def sort(self, key=None, reverse=None): self._detach_children() kwargs = {} if key is not None: kwargs["key"] = key if reverse is not None: kwargs["reverse"] = reverse super(SmartList, self).sort(**kwargs) else: @inheritdoc def sort(self, cmp=None, key=None, reverse=None): self._detach_children() kwargs = {} if cmp is not None: kwargs["cmp"] = cmp if key is not None: kwargs["key"] = key if reverse is not None: kwargs["reverse"] = reverse super(SmartList, self).sort(**kwargs) class _ListProxy(_SliceNormalizerMixIn, list): """Implement the ``list`` interface by getting elements from a parent. This is created by a :class:`.SmartList` object when slicing. It does not actually store the list at any time; instead, whenever the list is needed, it builds it dynamically using the :meth:`_render` method. """ def __init__(self, parent, sliceinfo): super(_ListProxy, self).__init__() self._parent = parent self._sliceinfo = sliceinfo def __repr__(self): return repr(self._render()) def __lt__(self, other): if isinstance(other, _ListProxy): return self._render() < list(other) return self._render() < other def __le__(self, other): if isinstance(other, _ListProxy): return self._render() <= list(other) return self._render() <= other def __eq__(self, other): if isinstance(other, _ListProxy): return self._render() == list(other) return self._render() == other def __ne__(self, other): if isinstance(other, _ListProxy): return self._render() != list(other) return self._render() != other def __gt__(self, other): if isinstance(other, _ListProxy): return self._render() > list(other) return self._render() > other def __ge__(self, other): if isinstance(other, _ListProxy): return self._render() >= list(other) return self._render() >= other if py3k: def __bool__(self): return bool(self._render()) else: def __nonzero__(self): return bool(self._render()) def __len__(self): return (self._stop - self._start) // self._step def __getitem__(self, key): if isinstance(key, slice): key = self._normalize_slice(key, clamp=True) keystart = min(self._start + key.start, self._stop) keystop = min(self._start + key.stop, self._stop) adjusted = slice(keystart, keystop, key.step) return self._parent[adjusted] else: return self._render()[key] def __setitem__(self, key, item): if isinstance(key, slice): key = self._normalize_slice(key, clamp=True) keystart = min(self._start + key.start, self._stop) keystop = min(self._start + key.stop, self._stop) adjusted = slice(keystart, keystop, key.step) self._parent[adjusted] = item else: length = len(self) if key < 0: key = length + key if key < 0 or key >= length: raise IndexError("list assignment index out of range") self._parent[self._start + key] = item def __delitem__(self, key): if isinstance(key, slice): key = self._normalize_slice(key, clamp=True) keystart = min(self._start + key.start, self._stop) keystop = min(self._start + key.stop, self._stop) adjusted = slice(keystart, keystop, key.step) del self._parent[adjusted] else: length = len(self) if key < 0: key = length + key if key < 0 or key >= length: raise IndexError("list assignment index out of range") del self._parent[self._start + key] def __iter__(self): i = self._start while i < self._stop: yield self._parent[i] i += self._step def __reversed__(self): i = self._stop - 1 while i >= self._start: yield self._parent[i] i -= self._step def __contains__(self, item): return item in self._render() if not py3k: def __getslice__(self, start, stop): return self.__getitem__(slice(start, stop)) def __setslice__(self, start, stop, iterable): self.__setitem__(slice(start, stop), iterable) def __delslice__(self, start, stop): self.__delitem__(slice(start, stop)) def __add__(self, other): return SmartList(list(self) + other) def __radd__(self, other): return SmartList(other + list(self)) def __iadd__(self, other): self.extend(other) return self def __mul__(self, other): return SmartList(list(self) * other) def __rmul__(self, other): return SmartList(other * list(self)) def __imul__(self, other): self.extend(list(self) * (other - 1)) return self @property def _start(self): """The starting index of this list, inclusive.""" return self._sliceinfo[0] @property def _stop(self): """The ending index of this list, exclusive.""" if self._sliceinfo[1] is None: return len(self._parent) return self._sliceinfo[1] @property def _step(self): """The number to increase the index by between items.""" return self._sliceinfo[2] def _render(self): """Return the actual list from the stored start/stop/step.""" return list(self._parent)[self._start:self._stop:self._step] @inheritdoc def append(self, item): self._parent.insert(self._stop, item) @inheritdoc def count(self, item): return self._render().count(item) @inheritdoc def index(self, item, start=None, stop=None): if start is not None: if stop is not None: return self._render().index(item, start, stop) return self._render().index(item, start) return self._render().index(item) @inheritdoc def extend(self, item): self._parent[self._stop:self._stop] = item @inheritdoc def insert(self, index, item): if index < 0: index = len(self) + index self._parent.insert(self._start + index, item) @inheritdoc def pop(self, index=None): length = len(self) if index is None: index = length - 1 elif index < 0: index = length + index if index < 0 or index >= length: raise IndexError("pop index out of range") return self._parent.pop(self._start + index) @inheritdoc def remove(self, item): index = self.index(item) del self._parent[self._start + index] @inheritdoc def reverse(self): item = self._render() item.reverse() self._parent[self._start:self._stop:self._step] = item if py3k: @inheritdoc def sort(self, key=None, reverse=None): item = self._render() kwargs = {} if key is not None: kwargs["key"] = key if reverse is not None: kwargs["reverse"] = reverse item.sort(**kwargs) self._parent[self._start:self._stop:self._step] = item else: @inheritdoc def sort(self, cmp=None, key=None, reverse=None): item = self._render() kwargs = {} if cmp is not None: kwargs["cmp"] = cmp if key is not None: kwargs["key"] = key if reverse is not None: kwargs["reverse"] = reverse item.sort(**kwargs) self._parent[self._start:self._stop:self._step] = item del inheritdoc mwparserfromhell-0.4.2/mwparserfromhell/string_mixin.py000066400000000000000000000077511255634533200236050ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. """ This module contains the :class:`.StringMixIn` type, which implements the interface for the ``unicode`` type (``str`` on py3k) in a dynamic manner. """ from __future__ import unicode_literals from sys import getdefaultencoding from .compat import bytes, py26, py3k, str __all__ = ["StringMixIn"] def inheritdoc(method): """Set __doc__ of *method* to __doc__ of *method* in its parent class. Since this is used on :class:`.StringMixIn`, the "parent class" used is ``str``. This function can be used as a decorator. """ method.__doc__ = getattr(str, method.__name__).__doc__ return method class StringMixIn(object): """Implement the interface for ``unicode``/``str`` in a dynamic manner. To use this class, inherit from it and override the :meth:`__unicode__` method (same on py3k) to return the string representation of the object. The various string methods will operate on the value of :meth:`__unicode__` instead of the immutable ``self`` like the regular ``str`` type. """ if py3k: def __str__(self): return self.__unicode__() def __bytes__(self): return bytes(self.__unicode__(), getdefaultencoding()) else: def __str__(self): return bytes(self.__unicode__()) def __unicode__(self): raise NotImplementedError() def __repr__(self): return repr(self.__unicode__()) def __lt__(self, other): return self.__unicode__() < other def __le__(self, other): return self.__unicode__() <= other def __eq__(self, other): return self.__unicode__() == other def __ne__(self, other): return self.__unicode__() != other def __gt__(self, other): return self.__unicode__() > other def __ge__(self, other): return self.__unicode__() >= other if py3k: def __bool__(self): return bool(self.__unicode__()) else: def __nonzero__(self): return bool(self.__unicode__()) def __len__(self): return len(self.__unicode__()) def __iter__(self): for char in self.__unicode__(): yield char def __getitem__(self, key): return self.__unicode__()[key] def __reversed__(self): return reversed(self.__unicode__()) def __contains__(self, item): return str(item) in self.__unicode__() def __getattr__(self, attr): return getattr(self.__unicode__(), attr) if py3k: maketrans = str.maketrans # Static method can't rely on __getattr__ if py26: @inheritdoc def encode(self, encoding=None, errors=None): if encoding is None: encoding = getdefaultencoding() if errors is not None: return self.__unicode__().encode(encoding, errors) return self.__unicode__().encode(encoding) del inheritdoc mwparserfromhell-0.4.2/mwparserfromhell/utils.py000066400000000000000000000061251255634533200222250ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. """ This module contains accessory functions for other parts of the library. Parser users generally won't need stuff from here. """ from __future__ import unicode_literals from .compat import bytes, str from .nodes import Node from .smart_list import SmartList __all__ = ["parse_anything"] def parse_anything(value, context=0, skip_style_tags=False): """Return a :class:`.Wikicode` for *value*, allowing multiple types. This differs from :meth:`.Parser.parse` in that we accept more than just a string to be parsed. Unicode objects (strings in py3k), strings (bytes in py3k), integers (converted to strings), ``None``, existing :class:`.Node` or :class:`.Wikicode` objects, as well as an iterable of these types, are supported. This is used to parse input on-the-fly by various methods of :class:`.Wikicode` and others like :class:`.Template`, such as :meth:`wikicode.insert() <.Wikicode.insert>` or setting :meth:`template.name <.Template.name>`. Additional arguments are passed directly to :meth:`.Parser.parse`. """ from .parser import Parser from .wikicode import Wikicode if isinstance(value, Wikicode): return value elif isinstance(value, Node): return Wikicode(SmartList([value])) elif isinstance(value, str): return Parser().parse(value, context, skip_style_tags) elif isinstance(value, bytes): return Parser().parse(value.decode("utf8"), context, skip_style_tags) elif isinstance(value, int): return Parser().parse(str(value), context, skip_style_tags) elif value is None: return Wikicode(SmartList()) try: nodelist = SmartList() for item in value: nodelist += parse_anything(item, context, skip_style_tags).nodes return Wikicode(nodelist) except TypeError: error = "Needs string, Node, Wikicode, int, None, or iterable of these, but got {0}: {1}" raise ValueError(error.format(type(value).__name__, value)) mwparserfromhell-0.4.2/mwparserfromhell/wikicode.py000066400000000000000000000653631255634533200226740ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals from itertools import chain import re from .compat import py3k, range, str from .nodes import (Argument, Comment, ExternalLink, Heading, HTMLEntity, Node, Tag, Template, Text, Wikilink) from .string_mixin import StringMixIn from .utils import parse_anything __all__ = ["Wikicode"] FLAGS = re.IGNORECASE | re.DOTALL | re.UNICODE class Wikicode(StringMixIn): """A ``Wikicode`` is a container for nodes that operates like a string. Additionally, it contains methods that can be used to extract data from or modify the nodes, implemented in an interface similar to a list. For example, :meth:`index` can get the index of a node in the list, and :meth:`insert` can add a new node at that index. The :meth:`filter() ` series of functions is very useful for extracting and iterating over, for example, all of the templates in the object. """ RECURSE_OTHERS = 2 def __init__(self, nodes): super(Wikicode, self).__init__() self._nodes = nodes def __unicode__(self): return "".join([str(node) for node in self.nodes]) @staticmethod def _get_children(node, contexts=False, restrict=None, parent=None): """Iterate over all child :class:`.Node`\ s of a given *node*.""" yield (parent, node) if contexts else node if restrict and isinstance(node, restrict): return for code in node.__children__(): for child in code.nodes: sub = Wikicode._get_children(child, contexts, restrict, code) for result in sub: yield result @staticmethod def _slice_replace(code, index, old, new): """Replace the string *old* with *new* across *index* in *code*.""" nodes = [str(node) for node in code.get(index)] substring = "".join(nodes).replace(old, new) code.nodes[index] = parse_anything(substring).nodes @staticmethod def _build_matcher(matches, flags): """Helper for :meth:`_indexed_ifilter` and others. If *matches* is a function, return it. If it's a regex, return a wrapper around it that can be called with a node to do a search. If it's ``None``, return a function that always returns ``True``. """ if matches: if callable(matches): return matches return lambda obj: re.search(matches, str(obj), flags) return lambda obj: True def _indexed_ifilter(self, recursive=True, matches=None, flags=FLAGS, forcetype=None): """Iterate over nodes and their corresponding indices in the node list. The arguments are interpreted as for :meth:`ifilter`. For each tuple ``(i, node)`` yielded by this method, ``self.index(node) == i``. Note that if *recursive* is ``True``, ``self.nodes[i]`` might not be the node itself, but will still contain it. """ match = self._build_matcher(matches, flags) if recursive: restrict = forcetype if recursive == self.RECURSE_OTHERS else None def getter(i, node): for ch in self._get_children(node, restrict=restrict): yield (i, ch) inodes = chain(*(getter(i, n) for i, n in enumerate(self.nodes))) else: inodes = enumerate(self.nodes) for i, node in inodes: if (not forcetype or isinstance(node, forcetype)) and match(node): yield (i, node) def _do_strong_search(self, obj, recursive=True): """Search for the specific element *obj* within the node list. *obj* can be either a :class:`.Node` or a :class:`.Wikicode` object. If found, we return a tuple (*context*, *index*) where *context* is the :class:`.Wikicode` that contains *obj* and *index* is its index there, as a :class:`slice`. Note that if *recursive* is ``False``, *context* will always be ``self`` (since we only look for *obj* among immediate descendants), but if *recursive* is ``True``, then it could be any :class:`.Wikicode` contained by a node within ``self``. If *obj* is not found, :exc:`ValueError` is raised. """ if isinstance(obj, Node): mkslice = lambda i: slice(i, i + 1) if not recursive: return self, mkslice(self.index(obj)) for i, node in enumerate(self.nodes): for context, child in self._get_children(node, contexts=True): if obj is child: if not context: context = self return context, mkslice(context.index(child)) raise ValueError(obj) context, ind = self._do_strong_search(obj.get(0), recursive) for i in range(1, len(obj.nodes)): if obj.get(i) is not context.get(ind.start + i): raise ValueError(obj) return context, slice(ind.start, ind.start + len(obj.nodes)) def _do_weak_search(self, obj, recursive): """Search for an element that looks like *obj* within the node list. This follows the same rules as :meth:`_do_strong_search` with some differences. *obj* is treated as a string that might represent any :class:`.Node`, :class:`.Wikicode`, or combination of the two present in the node list. Thus, matching is weak (using string comparisons) rather than strong (using ``is``). Because multiple nodes can match *obj*, the result is a list of tuples instead of just one (however, :exc:`ValueError` is still raised if nothing is found). Individual matches will never overlap. The tuples contain a new first element, *exact*, which is ``True`` if we were able to match *obj* exactly to one or more adjacent nodes, or ``False`` if we found *obj* inside a node or incompletely spanning multiple nodes. """ obj = parse_anything(obj) if not obj or obj not in self: raise ValueError(obj) results = [] contexts = [self] while contexts: context = contexts.pop() i = len(context.nodes) - 1 while i >= 0: node = context.get(i) if obj.get(-1) == node: for j in range(-len(obj.nodes), -1): if obj.get(j) != context.get(i + j + 1): break else: i -= len(obj.nodes) - 1 index = slice(i, i + len(obj.nodes)) results.append((True, context, index)) elif recursive and obj in node: contexts.extend(node.__children__()) i -= 1 if not results: if not recursive: raise ValueError(obj) results.append((False, self, slice(0, len(self.nodes)))) return results def _get_tree(self, code, lines, marker, indent): """Build a tree to illustrate the way the Wikicode object was parsed. The method that builds the actual tree is ``__showtree__`` of ``Node`` objects. *code* is the ``Wikicode`` object to build a tree for. *lines* is the list to append the tree to, which is returned at the end of the method. *marker* is some object to be used to indicate that the builder should continue on from the last line instead of starting a new one; it should be any object that can be tested for with ``is``. *indent* is the starting indentation. """ def write(*args): """Write a new line following the proper indentation rules.""" if lines and lines[-1] is marker: # Continue from the last line lines.pop() # Remove the marker last = lines.pop() lines.append(last + " ".join(args)) else: lines.append(" " * 6 * indent + " ".join(args)) get = lambda code: self._get_tree(code, lines, marker, indent + 1) mark = lambda: lines.append(marker) for node in code.nodes: node.__showtree__(write, get, mark) return lines @classmethod def _build_filter_methods(cls, **meths): """Given Node types, build the corresponding i?filter shortcuts. The should be given as keys storing the method's base name paired with values storing the corresponding :class:`.Node` type. For example, the dict may contain the pair ``("templates", Template)``, which will produce the methods :meth:`ifilter_templates` and :meth:`filter_templates`, which are shortcuts for :meth:`ifilter(forcetype=Template) ` and :meth:`filter(forcetype=Template) `, respectively. These shortcuts are added to the class itself, with an appropriate docstring. """ doc = """Iterate over {0}. This is equivalent to :meth:`{1}` with *forcetype* set to :class:`~{2.__module__}.{2.__name__}`. """ make_ifilter = lambda ftype: (lambda self, *a, **kw: self.ifilter(forcetype=ftype, *a, **kw)) make_filter = lambda ftype: (lambda self, *a, **kw: self.filter(forcetype=ftype, *a, **kw)) for name, ftype in (meths.items() if py3k else meths.iteritems()): ifilter = make_ifilter(ftype) filter = make_filter(ftype) ifilter.__doc__ = doc.format(name, "ifilter", ftype) filter.__doc__ = doc.format(name, "filter", ftype) setattr(cls, "ifilter_" + name, ifilter) setattr(cls, "filter_" + name, filter) @property def nodes(self): """A list of :class:`.Node` objects. This is the internal data actually stored within a :class:`.Wikicode` object. """ return self._nodes @nodes.setter def nodes(self, value): if not isinstance(value, list): value = parse_anything(value).nodes self._nodes = value def get(self, index): """Return the *index*\ th node within the list of nodes.""" return self.nodes[index] def set(self, index, value): """Set the ``Node`` at *index* to *value*. Raises :exc:`IndexError` if *index* is out of range, or :exc:`ValueError` if *value* cannot be coerced into one :class:`.Node`. To insert multiple nodes at an index, use :meth:`get` with either :meth:`remove` and :meth:`insert` or :meth:`replace`. """ nodes = parse_anything(value).nodes if len(nodes) > 1: raise ValueError("Cannot coerce multiple nodes into one index") if index >= len(self.nodes) or -1 * index > len(self.nodes): raise IndexError("List assignment index out of range") if nodes: self.nodes[index] = nodes[0] else: self.nodes.pop(index) def index(self, obj, recursive=False): """Return the index of *obj* in the list of nodes. Raises :exc:`ValueError` if *obj* is not found. If *recursive* is ``True``, we will look in all nodes of ours and their descendants, and return the index of our direct descendant node within *our* list of nodes. Otherwise, the lookup is done only on direct descendants. """ strict = isinstance(obj, Node) equivalent = (lambda o, n: o is n) if strict else (lambda o, n: o == n) for i, node in enumerate(self.nodes): if recursive: for child in self._get_children(node): if equivalent(obj, child): return i elif equivalent(obj, node): return i raise ValueError(obj) def insert(self, index, value): """Insert *value* at *index* in the list of nodes. *value* can be anything parsable by :func:`.parse_anything`, which includes strings or other :class:`.Wikicode` or :class:`.Node` objects. """ nodes = parse_anything(value).nodes for node in reversed(nodes): self.nodes.insert(index, node) def insert_before(self, obj, value, recursive=True): """Insert *value* immediately before *obj*. *obj* can be either a string, a :class:`.Node`, or another :class:`.Wikicode` object (as created by :meth:`get_sections`, for example). If *obj* is a string, we will operate on all instances of that string within the code, otherwise only on the specific instance given. *value* can be anything parsable by :func:`.parse_anything`. If *recursive* is ``True``, we will try to find *obj* within our child nodes even if it is not a direct descendant of this :class:`.Wikicode` object. If *obj* is not found, :exc:`ValueError` is raised. """ if isinstance(obj, (Node, Wikicode)): context, index = self._do_strong_search(obj, recursive) context.insert(index.start, value) else: for exact, context, index in self._do_weak_search(obj, recursive): if exact: context.insert(index.start, value) else: obj = str(obj) self._slice_replace(context, index, obj, str(value) + obj) def insert_after(self, obj, value, recursive=True): """Insert *value* immediately after *obj*. *obj* can be either a string, a :class:`.Node`, or another :class:`.Wikicode` object (as created by :meth:`get_sections`, for example). If *obj* is a string, we will operate on all instances of that string within the code, otherwise only on the specific instance given. *value* can be anything parsable by :func:`.parse_anything`. If *recursive* is ``True``, we will try to find *obj* within our child nodes even if it is not a direct descendant of this :class:`.Wikicode` object. If *obj* is not found, :exc:`ValueError` is raised. """ if isinstance(obj, (Node, Wikicode)): context, index = self._do_strong_search(obj, recursive) context.insert(index.stop, value) else: for exact, context, index in self._do_weak_search(obj, recursive): if exact: context.insert(index.stop, value) else: obj = str(obj) self._slice_replace(context, index, obj, obj + str(value)) def replace(self, obj, value, recursive=True): """Replace *obj* with *value*. *obj* can be either a string, a :class:`.Node`, or another :class:`.Wikicode` object (as created by :meth:`get_sections`, for example). If *obj* is a string, we will operate on all instances of that string within the code, otherwise only on the specific instance given. *value* can be anything parsable by :func:`.parse_anything`. If *recursive* is ``True``, we will try to find *obj* within our child nodes even if it is not a direct descendant of this :class:`.Wikicode` object. If *obj* is not found, :exc:`ValueError` is raised. """ if isinstance(obj, (Node, Wikicode)): context, index = self._do_strong_search(obj, recursive) for i in range(index.start, index.stop): context.nodes.pop(index.start) context.insert(index.start, value) else: for exact, context, index in self._do_weak_search(obj, recursive): if exact: for i in range(index.start, index.stop): context.nodes.pop(index.start) context.insert(index.start, value) else: self._slice_replace(context, index, str(obj), str(value)) def append(self, value): """Insert *value* at the end of the list of nodes. *value* can be anything parsable by :func:`.parse_anything`. """ nodes = parse_anything(value).nodes for node in nodes: self.nodes.append(node) def remove(self, obj, recursive=True): """Remove *obj* from the list of nodes. *obj* can be either a string, a :class:`.Node`, or another :class:`.Wikicode` object (as created by :meth:`get_sections`, for example). If *obj* is a string, we will operate on all instances of that string within the code, otherwise only on the specific instance given. If *recursive* is ``True``, we will try to find *obj* within our child nodes even if it is not a direct descendant of this :class:`.Wikicode` object. If *obj* is not found, :exc:`ValueError` is raised. """ if isinstance(obj, (Node, Wikicode)): context, index = self._do_strong_search(obj, recursive) for i in range(index.start, index.stop): context.nodes.pop(index.start) else: for exact, context, index in self._do_weak_search(obj, recursive): if exact: for i in range(index.start, index.stop): context.nodes.pop(index.start) else: self._slice_replace(context, index, str(obj), "") def matches(self, other): """Do a loose equivalency test suitable for comparing page names. *other* can be any string-like object, including :class:`.Wikicode`, or a tuple of these. This operation is symmetric; both sides are adjusted. Specifically, whitespace and markup is stripped and the first letter's case is normalized. Typical usage is ``if template.name.matches("stub"): ...``. """ cmp = lambda a, b: (a[0].upper() + a[1:] == b[0].upper() + b[1:] if a and b else a == b) this = self.strip_code().strip() if isinstance(other, (tuple, list)): for obj in other: that = parse_anything(obj).strip_code().strip() if cmp(this, that): return True return False that = parse_anything(other).strip_code().strip() return cmp(this, that) def ifilter(self, recursive=True, matches=None, flags=FLAGS, forcetype=None): """Iterate over nodes in our list matching certain conditions. If *forcetype* is given, only nodes that are instances of this type (or tuple of types) are yielded. Setting *recursive* to ``True`` will iterate over all children and their descendants. ``RECURSE_OTHERS`` will only iterate over children that are not the instances of *forcetype*. ``False`` will only iterate over immediate children. ``RECURSE_OTHERS`` can be used to iterate over all un-nested templates, even if they are inside of HTML tags, like so: >>> code = mwparserfromhell.parse("{{foo}}{{foo|{{bar}}}}") >>> code.filter_templates(code.RECURSE_OTHERS) ["{{foo}}", "{{foo|{{bar}}}}"] *matches* can be used to further restrict the nodes, either as a function (taking a single :class:`.Node` and returning a boolean) or a regular expression (matched against the node's string representation with :func:`re.search`). If *matches* is a regex, the flags passed to :func:`re.search` are :const:`re.IGNORECASE`, :const:`re.DOTALL`, and :const:`re.UNICODE`, but custom flags can be specified by passing *flags*. """ gen = self._indexed_ifilter(recursive, matches, flags, forcetype) return (node for i, node in gen) def filter(self, *args, **kwargs): """Return a list of nodes within our list matching certain conditions. This is equivalent to calling :func:`list` on :meth:`ifilter`. """ return list(self.ifilter(*args, **kwargs)) def get_sections(self, levels=None, matches=None, flags=FLAGS, flat=False, include_lead=None, include_headings=True): """Return a list of sections within the page. Sections are returned as :class:`.Wikicode` objects with a shared node list (implemented using :class:`.SmartList`) so that changes to sections are reflected in the parent Wikicode object. Each section contains all of its subsections, unless *flat* is ``True``. If *levels* is given, it should be a iterable of integers; only sections whose heading levels are within it will be returned. If *matches* is given, it should be either a function or a regex; only sections whose headings match it (without the surrounding equal signs) will be included. *flags* can be used to override the default regex flags (see :meth:`ifilter`) if a regex *matches* is used. If *include_lead* is ``True``, the first, lead section (without a heading) will be included in the list; ``False`` will not include it; the default will include it only if no specific *levels* were given. If *include_headings* is ``True``, the section's beginning :class:`.Heading` object will be included; otherwise, this is skipped. """ title_matcher = self._build_matcher(matches, flags) matcher = lambda heading: (title_matcher(heading.title) and (not levels or heading.level in levels)) iheadings = self._indexed_ifilter(recursive=False, forcetype=Heading) sections = [] # Tuples of (index_of_first_node, section) open_headings = [] # Tuples of (index, heading), where index and # heading.level are both monotonically increasing # Add the lead section if appropriate: if include_lead or not (include_lead is not None or matches or levels): itr = self._indexed_ifilter(recursive=False, forcetype=Heading) try: first = next(itr)[0] sections.append((0, Wikicode(self.nodes[:first]))) except StopIteration: # No headings in page sections.append((0, Wikicode(self.nodes[:]))) # Iterate over headings, adding sections to the list as they end: for i, heading in iheadings: if flat: # With flat, all sections close at the next heading newly_closed, open_headings = open_headings, [] else: # Otherwise, figure out which sections have closed, if any closed_start_index = len(open_headings) for j, (start, last_heading) in enumerate(open_headings): if heading.level <= last_heading.level: closed_start_index = j break newly_closed = open_headings[closed_start_index:] del open_headings[closed_start_index:] for start, closed_heading in newly_closed: if matcher(closed_heading): sections.append((start, Wikicode(self.nodes[start:i]))) start = i if include_headings else (i + 1) open_headings.append((start, heading)) # Add any remaining open headings to the list of sections: for start, heading in open_headings: if matcher(heading): sections.append((start, Wikicode(self.nodes[start:]))) # Ensure that earlier sections are earlier in the returned list: return [section for i, section in sorted(sections)] def strip_code(self, normalize=True, collapse=True): """Return a rendered string without unprintable code such as templates. The way a node is stripped is handled by the :meth:`~.Node.__strip__` method of :class:`.Node` objects, which generally return a subset of their nodes or ``None``. For example, templates and tags are removed completely, links are stripped to just their display part, headings are stripped to just their title. If *normalize* is ``True``, various things may be done to strip code further, such as converting HTML entities like ``Σ``, ``Σ``, and ``Σ`` to ``Σ``. If *collapse* is ``True``, we will try to remove excess whitespace as well (three or more newlines are converted to two, for example). """ nodes = [] for node in self.nodes: stripped = node.__strip__(normalize, collapse) if stripped: nodes.append(str(stripped)) if collapse: stripped = "".join(nodes).strip("\n") while "\n\n\n" in stripped: stripped = stripped.replace("\n\n\n", "\n\n") return stripped else: return "".join(nodes) def get_tree(self): """Return a hierarchical tree representation of the object. The representation is a string makes the most sense printed. It is built by calling :meth:`_get_tree` on the :class:`.Wikicode` object and its children recursively. The end result may look something like the following:: >>> text = "Lorem ipsum {{foo|bar|{{baz}}|spam=eggs}}" >>> print(mwparserfromhell.parse(text).get_tree()) Lorem ipsum {{ foo | 1 = bar | 2 = {{ baz }} | spam = eggs }} """ marker = object() # Random object we can find with certainty in a list return "\n".join(self._get_tree(self, [], marker, 0)) Wikicode._build_filter_methods( arguments=Argument, comments=Comment, external_links=ExternalLink, headings=Heading, html_entities=HTMLEntity, tags=Tag, templates=Template, text=Text, wikilinks=Wikilink) mwparserfromhell-0.4.2/scripts/000077500000000000000000000000001255634533200166055ustar00rootroot00000000000000mwparserfromhell-0.4.2/scripts/README000066400000000000000000000002671255634533200174720ustar00rootroot00000000000000This directory contains support files used for *developing* mwparserfromhell, not running it. If you are looking for code examples, read the documentation or explore the source code. mwparserfromhell-0.4.2/scripts/memtest.py000066400000000000000000000141441255634533200206410ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. """ Tests for memory leaks in the CTokenizer. Python 2 and 3 compatible. This appears to work mostly fine under Linux, but gives an absurd number of false positives on OS X. I'm not sure why. Running the tests multiple times yields different results (tests don't always leak, and the amount they leak by varies). Increasing the number of loops results in a smaller bytes/loop value, too, indicating the increase in memory usage might be due to something else. Actual memory leaks typically leak very large amounts of memory (megabytes) and scale with the number of loops. """ from __future__ import unicode_literals, print_function from locale import LC_ALL, setlocale from multiprocessing import Process, Pipe from os import listdir, path import sys import psutil from mwparserfromhell.compat import py3k from mwparserfromhell.parser._tokenizer import CTokenizer if sys.version_info[0] == 2: range = xrange LOOPS = 10000 class Color(object): GRAY = "\x1b[30;1m" GREEN = "\x1b[92m" YELLOW = "\x1b[93m" RESET = "\x1b[0m" class MemoryTest(object): """Manages a memory test.""" def __init__(self): self._tests = [] self._load() def _parse_file(self, name, text): tests = text.split("\n---\n") counter = 1 digits = len(str(len(tests))) for test in tests: data = {"name": None, "label": None, "input": None, "output": None} for line in test.strip().splitlines(): if line.startswith("name:"): data["name"] = line[len("name:"):].strip() elif line.startswith("label:"): data["label"] = line[len("label:"):].strip() elif line.startswith("input:"): raw = line[len("input:"):].strip() if raw[0] == '"' and raw[-1] == '"': raw = raw[1:-1] raw = raw.encode("raw_unicode_escape") data["input"] = raw.decode("unicode_escape") number = str(counter).zfill(digits) fname = "test_{0}{1}_{2}".format(name, number, data["name"]) self._tests.append((fname, data["input"])) counter += 1 def _load(self): def load_file(filename): with open(filename, "rU") as fp: text = fp.read() if not py3k: text = text.decode("utf8") name = path.split(filename)[1][:0-len(extension)] self._parse_file(name, text) root = path.split(path.dirname(path.abspath(__file__)))[0] directory = path.join(root, "tests", "tokenizer") extension = ".mwtest" if len(sys.argv) > 2 and sys.argv[1] == "--use": for name in sys.argv[2:]: load_file(path.join(directory, name + extension)) sys.argv = [sys.argv[0]] # So unittest doesn't try to load these else: for filename in listdir(directory): if not filename.endswith(extension): continue load_file(path.join(directory, filename)) @staticmethod def _print_results(info1, info2): r1, r2 = info1.rss, info2.rss buff = 8192 if r2 - buff > r1: d = r2 - r1 p = float(d) / r1 bpt = d // LOOPS tmpl = "{0}LEAKING{1}: {2:n} bytes, {3:.2%} inc ({4:n} bytes/loop)" sys.stdout.write(tmpl.format(Color.YELLOW, Color.RESET, d, p, bpt)) else: sys.stdout.write("{0}OK{1}".format(Color.GREEN, Color.RESET)) def run(self): """Run the memory test suite.""" width = 1 for (name, _) in self._tests: if len(name) > width: width = len(name) tmpl = "{0}[{1:03}/{2}]{3} {4}: " for i, (name, text) in enumerate(self._tests, 1): sys.stdout.write(tmpl.format(Color.GRAY, i, len(self._tests), Color.RESET, name.ljust(width))) sys.stdout.flush() parent, child = Pipe() p = Process(target=_runner, args=(text, child)) p.start() try: proc = psutil.Process(p.pid) parent.recv() parent.send("OK") parent.recv() info1 = proc.get_memory_info() sys.stdout.flush() parent.send("OK") parent.recv() info2 = proc.get_memory_info() self._print_results(info1, info2) sys.stdout.flush() parent.send("OK") finally: proc.kill() print() def _runner(text, child): r1, r2 = range(250), range(LOOPS) for i in r1: CTokenizer().tokenize(text) child.send("OK") child.recv() child.send("OK") child.recv() for i in r2: CTokenizer().tokenize(text) child.send("OK") child.recv() if __name__ == "__main__": setlocale(LC_ALL, "") MemoryTest().run() mwparserfromhell-0.4.2/scripts/release.sh000077500000000000000000000115001255634533200205610ustar00rootroot00000000000000#! /usr/bin/env bash if [[ -z "$1" ]]; then echo "usage: $0 1.2.3" exit 1 fi VERSION=$1 SCRIPT_DIR=$(dirname "$0") RELEASE_DATE=$(date +"%B %d, %Y") check_git() { if [[ -n "$(git status --porcelain --untracked-files=no)" ]]; then echo "Aborting: dirty working directory." exit 1 fi if [[ "$(git rev-parse --abbrev-ref HEAD)" != "develop" ]]; then echo "Aborting: not on develop." exit 1 fi echo -n "Are you absolutely ready to release? [yN] " read confirm if [[ ${confirm,,} != "y" ]]; then exit 1 fi } update_version() { echo -n "Updating mwparserfromhell.__version__..." sed -e 's/__version__ = .*/__version__ = "'$VERSION'"/' -i "" mwparserfromhell/__init__.py echo " done." } update_appveyor() { filename="appveyor.yml" echo -n "Updating $filename..." sed -e "s/version: .*/version: $VERSION-b{build}/" -i "" $filename echo " done." } update_changelog() { filename="CHANGELOG" echo -n "Updating $filename..." sed -e "1s/.*/v$VERSION (released $RELEASE_DATE):/" -i "" $filename echo " done." } update_docs_changelog() { filename="docs/changelog.rst" echo -n "Updating $filename..." dashes=$(seq 1 $(expr ${#VERSION} + 1) | sed 's/.*/-/' | tr -d '\n') previous_lineno=$(expr $(grep -n -e "^---" $filename | sed '2q;d' | cut -d ':' -f 1) - 1) previous_version=$(sed $previous_lineno'q;d' $filename) sed \ -e "4s/.*/v$VERSION/" \ -e "5s/.*/$dashes/" \ -e "7s/.*/\`Released $RELEASE_DATE \`_/" \ -e "8s/.*/(\`changes \`__):/" \ -i "" $filename echo " done." } do_git_stuff() { echo -n "Git: committing, tagging, and merging release..." git commit -qam "release/$VERSION" git tag v$VERSION -s -m "version $VERSION" git checkout -q master git merge -q --no-ff develop -m "Merge branch 'develop'" echo -n " pushing..." git push -q --tags origin master git checkout -q develop git push -q origin develop echo " done." } upload_to_pypi() { echo -n "PyPI: uploading source tarball and docs..." python setup.py -q register sdist upload -s python setup.py -q upload_docs echo " done." } post_release() { echo echo "*** Release completed." echo "*** Update: https://github.com/earwig/mwparserfromhell/releases/tag/v$VERSION" echo "*** Verify: https://pypi.python.org/pypi/mwparserfromhell" echo "*** Verify: https://ci.appveyor.com/project/earwig/mwparserfromhell" echo "*** Verify: https://mwparserfromhell.readthedocs.org" echo "*** Press enter to sanity-check the release." read } test_release() { echo echo "Checking mwparserfromhell v$VERSION..." echo -n "Creating a virtualenv..." virtdir="mwparser-test-env" virtualenv -q $virtdir cd $virtdir source bin/activate echo " done." echo -n "Installing mwparserfromhell with pip..." pip -q install mwparserfromhell echo " done." echo -n "Checking version..." reported_version=$(python -c 'print __import__("mwparserfromhell").__version__') if [[ "$reported_version" != "$VERSION" ]]; then echo " error." echo "*** ERROR: mwparserfromhell is reporting its version as $reported_version, not $VERSION!" deactivate cd .. rm -rf $virtdir exit 1 else echo " done." fi pip -q uninstall -y mwparserfromhell echo -n "Downloading mwparserfromhell source tarball and GPG signature..." curl -sL "https://pypi.python.org/packages/source/m/mwparserfromhell/mwparserfromhell-$VERSION.tar.gz" -o "mwparserfromhell.tar.gz" curl -sL "https://pypi.python.org/packages/source/m/mwparserfromhell/mwparserfromhell-$VERSION.tar.gz.asc" -o "mwparserfromhell.tar.gz.asc" echo " done." echo "Verifying tarball..." gpg --verify mwparserfromhell.tar.gz.asc if [[ "$?" != "0" ]]; then echo "*** ERROR: GPG signature verification failed!" deactivate cd .. rm -rf $virtdir exit 1 fi tar -xf mwparserfromhell.tar.gz rm mwparserfromhell.tar.gz mwparserfromhell.tar.gz.asc cd mwparserfromhell-$VERSION echo "Running unit tests..." python setup.py -q test if [[ "$?" != "0" ]]; then echo "*** ERROR: Unit tests failed!" deactivate cd ../.. rm -rf $virtdir exit 1 fi echo -n "Everything looks good. Cleaning up..." deactivate cd ../.. rm -rf $virtdir echo " done." } echo "Preparing mwparserfromhell v$VERSION..." cd "$SCRIPT_DIR/.." check_git update_version update_appveyor update_changelog update_docs_changelog do_git_stuff upload_to_pypi post_release test_release echo "All done." exit 0 mwparserfromhell-0.4.2/scripts/win_wrapper.cmd000066400000000000000000000030601255634533200216260ustar00rootroot00000000000000:: To build extensions for 64 bit Python 3, we need to configure environment :: variables to use the MSVC 2010 C++ compilers from GRMSDKX_EN_DVD.iso of: :: MS Windows SDK for Windows 7 and .NET Framework 4 (SDK v7.1) :: :: To build extensions for 64 bit Python 2, we need to configure environment :: variables to use the MSVC 2008 C++ compilers from GRMSDKX_EN_DVD.iso of: :: MS Windows SDK for Windows 7 and .NET Framework 3.5 (SDK v7.0) :: :: 32 bit builds do not require specific environment configurations. :: :: Note: this script needs to be run with the /E:ON and /V:ON flags for the :: cmd interpreter, at least for (SDK v7.0) :: :: More details at: :: https://github.com/cython/cython/wiki/64BitCythonExtensionsOnWindows :: http://stackoverflow.com/a/13751649/163740 :: :: Author: Olivier Grisel :: License: CC0 1.0 Universal: http://creativecommons.org/publicdomain/zero/1.0/ @ECHO OFF SET COMMAND_TO_RUN=%* SET WIN_SDK_ROOT=C:\Program Files\Microsoft SDKs\Windows SET MAJOR_PYTHON_VERSION="%PYTHON_VERSION:~0,1%" IF %MAJOR_PYTHON_VERSION% == "2" ( SET WINDOWS_SDK_VERSION="v7.0" ) ELSE IF %MAJOR_PYTHON_VERSION% == "3" ( SET WINDOWS_SDK_VERSION="v7.1" ) ELSE ( ECHO Unsupported Python version: "%MAJOR_PYTHON_VERSION%" EXIT 1 ) IF "%PYTHON_ARCH%"=="64" ( SET DISTUTILS_USE_SDK=1 SET MSSdk=1 "%WIN_SDK_ROOT%\%WINDOWS_SDK_VERSION%\Setup\WindowsSdkVer.exe" -q -version:%WINDOWS_SDK_VERSION% "%WIN_SDK_ROOT%\%WINDOWS_SDK_VERSION%\Bin\SetEnv.cmd" /x64 /release call %COMMAND_TO_RUN% || EXIT 1 ) ELSE ( call %COMMAND_TO_RUN% || EXIT 1 ) mwparserfromhell-0.4.2/setup.py000066400000000000000000000103531255634533200166320ustar00rootroot00000000000000#! /usr/bin/env python # -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import print_function from distutils.errors import DistutilsError, CCompilerError from glob import glob from os import environ import sys if ((sys.version_info[0] == 2 and sys.version_info[1] < 6) or (sys.version_info[1] == 3 and sys.version_info[1] < 2)): raise RuntimeError("mwparserfromhell needs Python 2.6+ or 3.2+") from setuptools import setup, find_packages, Extension from setuptools.command.build_ext import build_ext from mwparserfromhell import __version__ from mwparserfromhell.compat import py26, py3k with open("README.rst", **({'encoding':'utf-8'} if py3k else {})) as fp: long_docs = fp.read() use_extension = True fallback = True # Allow env var WITHOUT_EXTENSION and args --with[out]-extension: env_var = environ.get("WITHOUT_EXTENSION") if "--without-extension" in sys.argv: use_extension = False elif "--with-extension" in sys.argv: fallback = False elif env_var is not None: if env_var == "1": use_extension = False elif env_var == "0": fallback = False # Remove the command line argument as it isn't understood by setuptools: sys.argv = [arg for arg in sys.argv if arg != "--without-extension" and arg != "--with-extension"] def build_ext_patched(self): try: build_ext_original(self) except (DistutilsError, CCompilerError) as exc: print("error: " + str(exc)) print("Falling back to pure Python mode.") del self.extensions[:] if fallback: build_ext.run, build_ext_original = build_ext_patched, build_ext.run # Project-specific part begins here: tokenizer = Extension("mwparserfromhell.parser._tokenizer", sources=glob("mwparserfromhell/parser/ctokenizer/*.c"), depends=glob("mwparserfromhell/parser/ctokenizer/*.h")) setup( name = "mwparserfromhell", packages = find_packages(exclude=("tests",)), ext_modules = [tokenizer] if use_extension else [], tests_require = ["unittest2"] if py26 else [], test_suite = "tests.discover", version = __version__, author = "Ben Kurtovic", author_email = "ben.kurtovic@gmail.com", url = "https://github.com/earwig/mwparserfromhell", description = "MWParserFromHell is a parser for MediaWiki wikicode.", long_description = long_docs, download_url = "https://github.com/earwig/mwparserfromhell/tarball/v{0}".format(__version__), keywords = "earwig mwparserfromhell wikipedia wiki mediawiki wikicode template parsing", license = "MIT License", classifiers = [ "Development Status :: 4 - Beta", "Environment :: Console", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.2", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Topic :: Text Processing :: Markup" ], ) mwparserfromhell-0.4.2/tests/000077500000000000000000000000001255634533200162605ustar00rootroot00000000000000mwparserfromhell-0.4.2/tests/MWPFHTestCase.tmlanguage000066400000000000000000000053671255634533200226560ustar00rootroot00000000000000 fileTypes mwtest name MWParserFromHell Test Case patterns match --- name markup.heading.divider.mwpfh captures 1 name keyword.other.name.mwpfh 2 name variable.other.name.mwpfh match (name:)\s*(\w*) name meta.name.mwpfh captures 1 name keyword.other.label.mwpfh 2 name comment.line.other.label.mwpfh match (label:)\s*(.*) name meta.label.mwpfh captures 1 name keyword.other.input.mwpfh 2 name string.quoted.double.input.mwpfh match (input:)\s*(.*) name meta.input.mwpfh captures 1 name keyword.other.output.mwpfh match (output:) name meta.output.mwpfh captures 1 name support.language.token.mwpfh match (\w+)\s*\( name meta.name.token.mwpfh captures 1 name variable.parameter.token.mwpfh match (\w+)\s*(=) name meta.name.parameter.token.mwpfh match ".*?" name string.quoted.double.mwpfh scopeName text.mwpfh uuid cd3e2ffa-a57d-4c40-954f-1a2e87ffd638 mwparserfromhell-0.4.2/tests/__init__.py000066400000000000000000000000311255634533200203630ustar00rootroot00000000000000# -*- coding: utf-8 -*- mwparserfromhell-0.4.2/tests/_test_tokenizer.py000066400000000000000000000142121255634533200220420ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import print_function, unicode_literals import codecs from os import listdir, path import sys from mwparserfromhell.compat import py3k, str from mwparserfromhell.parser import tokens from mwparserfromhell.parser.builder import Builder class _TestParseError(Exception): """Raised internally when a test could not be parsed.""" pass class TokenizerTestCase(object): """A base test case for tokenizers, whose tests are loaded dynamically. Subclassed along with unittest.TestCase to form TestPyTokenizer and TestCTokenizer. Tests are loaded dynamically from files in the 'tokenizer' directory. """ @staticmethod def _build_test_method(funcname, data): """Create and return a method to be treated as a test case method. *data* is a dict containing multiple keys: the *input* text to be tokenized, the expected list of tokens as *output*, and an optional *label* for the method's docstring. """ def inner(self): if hasattr(self, "roundtrip"): expected = data["input"] actual = str(Builder().build(data["output"][:])) else: expected = data["output"] actual = self.tokenizer().tokenize(data["input"]) self.assertEqual(expected, actual) if not py3k: inner.__name__ = funcname.encode("utf8") inner.__doc__ = data["label"] return inner @staticmethod def _parse_test(test, data): """Parse an individual *test*, storing its info in *data*.""" for line in test.strip().splitlines(): if line.startswith("name:"): data["name"] = line[len("name:"):].strip() elif line.startswith("label:"): data["label"] = line[len("label:"):].strip() elif line.startswith("input:"): raw = line[len("input:"):].strip() if raw[0] == '"' and raw[-1] == '"': raw = raw[1:-1] raw = raw.encode("raw_unicode_escape") data["input"] = raw.decode("unicode_escape") elif line.startswith("output:"): raw = line[len("output:"):].strip() try: data["output"] = eval(raw, vars(tokens)) except Exception as err: raise _TestParseError(err) @classmethod def _load_tests(cls, filename, name, text, restrict=None): """Load all tests in *text* from the file *filename*.""" tests = text.split("\n---\n") counter = 1 digits = len(str(len(tests))) for test in tests: data = {"name": None, "label": None, "input": None, "output": None} try: cls._parse_test(test, data) except _TestParseError as err: if data["name"]: error = "Could not parse test '{0}' in '{1}':\n\t{2}" print(error.format(data["name"], filename, err)) else: error = "Could not parse a test in '{0}':\n\t{1}" print(error.format(filename, err)) continue if not data["name"]: error = "A test in '{0}' was ignored because it lacked a name" print(error.format(filename)) continue if data["input"] is None or data["output"] is None: error = "Test '{0}' in '{1}' was ignored because it lacked an input or an output" print(error.format(data["name"], filename)) continue number = str(counter).zfill(digits) counter += 1 if restrict and data["name"] != restrict: continue fname = "test_{0}{1}_{2}".format(name, number, data["name"]) meth = cls._build_test_method(fname, data) setattr(cls, fname, meth) @classmethod def build(cls): """Load and install all tests from the 'tokenizer' directory.""" def load_file(filename, restrict=None): with codecs.open(filename, "rU", encoding="utf8") as fp: text = fp.read() name = path.split(filename)[1][:-len(extension)] cls._load_tests(filename, name, text, restrict) directory = path.join(path.dirname(__file__), "tokenizer") extension = ".mwtest" if len(sys.argv) > 2 and sys.argv[1] == "--use": for name in sys.argv[2:]: if "." in name: name, test = name.split(".", 1) else: test = None load_file(path.join(directory, name + extension), test) sys.argv = [sys.argv[0]] # So unittest doesn't try to parse this cls.skip_others = True else: for filename in listdir(directory): if not filename.endswith(extension): continue load_file(path.join(directory, filename)) cls.skip_others = False TokenizerTestCase.build() mwparserfromhell-0.4.2/tests/_test_tree_equality.py000066400000000000000000000150231255634533200227050ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals try: from unittest2 import TestCase except ImportError: from unittest import TestCase from mwparserfromhell.compat import range from mwparserfromhell.nodes import (Argument, Comment, Heading, HTMLEntity, Tag, Template, Text, Wikilink) from mwparserfromhell.nodes.extras import Attribute, Parameter from mwparserfromhell.smart_list import SmartList from mwparserfromhell.wikicode import Wikicode wrap = lambda L: Wikicode(SmartList(L)) wraptext = lambda *args: wrap([Text(t) for t in args]) class TreeEqualityTestCase(TestCase): """A base test case with support for comparing the equality of node trees. This adds a number of type equality functions, for Wikicode, Text, Templates, and Wikilinks. """ def assertNodeEqual(self, expected, actual): """Assert that two Nodes have the same type and have the same data.""" registry = { Argument: self.assertArgumentNodeEqual, Comment: self.assertCommentNodeEqual, Heading: self.assertHeadingNodeEqual, HTMLEntity: self.assertHTMLEntityNodeEqual, Tag: self.assertTagNodeEqual, Template: self.assertTemplateNodeEqual, Text: self.assertTextNodeEqual, Wikilink: self.assertWikilinkNodeEqual } for nodetype in registry: if isinstance(expected, nodetype): self.assertIsInstance(actual, nodetype) registry[nodetype](expected, actual) def assertArgumentNodeEqual(self, expected, actual): """Assert that two Argument nodes have the same data.""" self.assertWikicodeEqual(expected.name, actual.name) if expected.default is not None: self.assertWikicodeEqual(expected.default, actual.default) else: self.assertIs(None, actual.default) def assertCommentNodeEqual(self, expected, actual): """Assert that two Comment nodes have the same data.""" self.assertWikicodeEqual(expected.contents, actual.contents) def assertHeadingNodeEqual(self, expected, actual): """Assert that two Heading nodes have the same data.""" self.assertWikicodeEqual(expected.title, actual.title) self.assertEqual(expected.level, actual.level) def assertHTMLEntityNodeEqual(self, expected, actual): """Assert that two HTMLEntity nodes have the same data.""" self.assertEqual(expected.value, actual.value) self.assertIs(expected.named, actual.named) self.assertIs(expected.hexadecimal, actual.hexadecimal) self.assertEqual(expected.hex_char, actual.hex_char) def assertTagNodeEqual(self, expected, actual): """Assert that two Tag nodes have the same data.""" self.assertWikicodeEqual(expected.tag, actual.tag) if expected.contents is not None: self.assertWikicodeEqual(expected.contents, actual.contents) length = len(expected.attributes) self.assertEqual(length, len(actual.attributes)) for i in range(length): exp_attr = expected.attributes[i] act_attr = actual.attributes[i] self.assertWikicodeEqual(exp_attr.name, act_attr.name) if exp_attr.value is not None: self.assertWikicodeEqual(exp_attr.value, act_attr.value) self.assertEqual(exp_attr.quotes, act_attr.quotes) self.assertEqual(exp_attr.pad_first, act_attr.pad_first) self.assertEqual(exp_attr.pad_before_eq, act_attr.pad_before_eq) self.assertEqual(exp_attr.pad_after_eq, act_attr.pad_after_eq) self.assertEqual(expected.wiki_markup, actual.wiki_markup) self.assertIs(expected.self_closing, actual.self_closing) self.assertIs(expected.invalid, actual.invalid) self.assertIs(expected.implicit, actual.implicit) self.assertEqual(expected.padding, actual.padding) self.assertWikicodeEqual(expected.closing_tag, actual.closing_tag) def assertTemplateNodeEqual(self, expected, actual): """Assert that two Template nodes have the same data.""" self.assertWikicodeEqual(expected.name, actual.name) length = len(expected.params) self.assertEqual(length, len(actual.params)) for i in range(length): exp_param = expected.params[i] act_param = actual.params[i] self.assertWikicodeEqual(exp_param.name, act_param.name) self.assertWikicodeEqual(exp_param.value, act_param.value) self.assertIs(exp_param.showkey, act_param.showkey) def assertTextNodeEqual(self, expected, actual): """Assert that two Text nodes have the same data.""" self.assertEqual(expected.value, actual.value) def assertWikilinkNodeEqual(self, expected, actual): """Assert that two Wikilink nodes have the same data.""" self.assertWikicodeEqual(expected.title, actual.title) if expected.text is not None: self.assertWikicodeEqual(expected.text, actual.text) else: self.assertIs(None, actual.text) def assertWikicodeEqual(self, expected, actual): """Assert that two Wikicode objects have the same data.""" self.assertIsInstance(actual, Wikicode) length = len(expected.nodes) self.assertEqual(length, len(actual.nodes)) for i in range(length): self.assertNodeEqual(expected.get(i), actual.get(i)) mwparserfromhell-0.4.2/tests/compat.py000066400000000000000000000006721255634533200201220ustar00rootroot00000000000000# -*- coding: utf-8 -*- """ Serves the same purpose as mwparserfromhell.compat, but only for objects required by unit tests. This avoids unnecessary imports (like urllib) within the main library. """ from mwparserfromhell.compat import py3k if py3k: from io import StringIO from urllib.parse import urlencode from urllib.request import urlopen else: from StringIO import StringIO from urllib import urlencode, urlopen mwparserfromhell-0.4.2/tests/discover.py000066400000000000000000000011111255634533200204420ustar00rootroot00000000000000# -*- coding: utf-8 -*- """ Discover tests using ``unittest2` for Python 2.6. It appears the default distutils test suite doesn't play nice with ``setUpClass`` thereby making some tests fail. Using ``unittest2`` to load tests seems to work around that issue. http://stackoverflow.com/a/17004409/753501 """ import os.path from mwparserfromhell.compat import py26 if py26: import unittest2 as unittest else: import unittest def additional_tests(): project_root = os.path.split(os.path.dirname(__file__))[0] return unittest.defaultTestLoader.discover(project_root) mwparserfromhell-0.4.2/tests/test_argument.py000066400000000000000000000103461255634533200215170ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals try: import unittest2 as unittest except ImportError: import unittest from mwparserfromhell.compat import str from mwparserfromhell.nodes import Argument, Text from ._test_tree_equality import TreeEqualityTestCase, wrap, wraptext class TestArgument(TreeEqualityTestCase): """Test cases for the Argument node.""" def test_unicode(self): """test Argument.__unicode__()""" node = Argument(wraptext("foobar")) self.assertEqual("{{{foobar}}}", str(node)) node2 = Argument(wraptext("foo"), wraptext("bar")) self.assertEqual("{{{foo|bar}}}", str(node2)) def test_children(self): """test Argument.__children__()""" node1 = Argument(wraptext("foobar")) node2 = Argument(wraptext("foo"), wrap([Text("bar"), Text("baz")])) gen1 = node1.__children__() gen2 = node2.__children__() self.assertIs(node1.name, next(gen1)) self.assertIs(node2.name, next(gen2)) self.assertIs(node2.default, next(gen2)) self.assertRaises(StopIteration, next, gen1) self.assertRaises(StopIteration, next, gen2) def test_strip(self): """test Argument.__strip__()""" node = Argument(wraptext("foobar")) node2 = Argument(wraptext("foo"), wraptext("bar")) for a in (True, False): for b in (True, False): self.assertIs(None, node.__strip__(a, b)) self.assertEqual("bar", node2.__strip__(a, b)) def test_showtree(self): """test Argument.__showtree__()""" output = [] getter, marker = object(), object() get = lambda code: output.append((getter, code)) mark = lambda: output.append(marker) node1 = Argument(wraptext("foobar")) node2 = Argument(wraptext("foo"), wraptext("bar")) node1.__showtree__(output.append, get, mark) node2.__showtree__(output.append, get, mark) valid = [ "{{{", (getter, node1.name), "}}}", "{{{", (getter, node2.name), " | ", marker, (getter, node2.default), "}}}"] self.assertEqual(valid, output) def test_name(self): """test getter/setter for the name attribute""" name = wraptext("foobar") node1 = Argument(name) node2 = Argument(name, wraptext("baz")) self.assertIs(name, node1.name) self.assertIs(name, node2.name) node1.name = "héhehé" node2.name = "héhehé" self.assertWikicodeEqual(wraptext("héhehé"), node1.name) self.assertWikicodeEqual(wraptext("héhehé"), node2.name) def test_default(self): """test getter/setter for the default attribute""" default = wraptext("baz") node1 = Argument(wraptext("foobar")) node2 = Argument(wraptext("foobar"), default) self.assertIs(None, node1.default) self.assertIs(default, node2.default) node1.default = "buzz" node2.default = None self.assertWikicodeEqual(wraptext("buzz"), node1.default) self.assertIs(None, node2.default) if __name__ == "__main__": unittest.main(verbosity=2) mwparserfromhell-0.4.2/tests/test_attribute.py000066400000000000000000000117261255634533200217030ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals try: import unittest2 as unittest except ImportError: import unittest from mwparserfromhell.compat import str from mwparserfromhell.nodes import Template from mwparserfromhell.nodes.extras import Attribute from ._test_tree_equality import TreeEqualityTestCase, wrap, wraptext class TestAttribute(TreeEqualityTestCase): """Test cases for the Attribute node extra.""" def test_unicode(self): """test Attribute.__unicode__()""" node = Attribute(wraptext("foo")) self.assertEqual(" foo", str(node)) node2 = Attribute(wraptext("foo"), wraptext("bar")) self.assertEqual(' foo="bar"', str(node2)) node3 = Attribute(wraptext("a"), wraptext("b"), '"', "", " ", " ") self.assertEqual('a = "b"', str(node3)) node4 = Attribute(wraptext("a"), wraptext("b"), "'", "", " ", " ") self.assertEqual("a = 'b'", str(node4)) node5 = Attribute(wraptext("a"), wraptext("b"), None, "", " ", " ") self.assertEqual("a = b", str(node5)) node6 = Attribute(wraptext("a"), wrap([]), None, " ", "", " ") self.assertEqual(" a= ", str(node6)) def test_name(self): """test getter/setter for the name attribute""" name = wraptext("id") node = Attribute(name, wraptext("bar")) self.assertIs(name, node.name) node.name = "{{id}}" self.assertWikicodeEqual(wrap([Template(wraptext("id"))]), node.name) def test_value(self): """test getter/setter for the value attribute""" value = wraptext("foo") node = Attribute(wraptext("id"), value) self.assertIs(value, node.value) node.value = "{{bar}}" self.assertWikicodeEqual(wrap([Template(wraptext("bar"))]), node.value) node.value = None self.assertIs(None, node.value) node2 = Attribute(wraptext("id"), wraptext("foo"), None) node2.value = "foo bar baz" self.assertWikicodeEqual(wraptext("foo bar baz"), node2.value) self.assertEqual('"', node2.quotes) node2.value = 'foo "bar" baz' self.assertWikicodeEqual(wraptext('foo "bar" baz'), node2.value) self.assertEqual("'", node2.quotes) node2.value = "foo 'bar' baz" self.assertWikicodeEqual(wraptext("foo 'bar' baz"), node2.value) self.assertEqual('"', node2.quotes) node2.value = "fo\"o 'bar' b\"az" self.assertWikicodeEqual(wraptext("fo\"o 'bar' b\"az"), node2.value) self.assertEqual('"', node2.quotes) def test_quotes(self): """test getter/setter for the quotes attribute""" node1 = Attribute(wraptext("id"), wraptext("foo"), None) node2 = Attribute(wraptext("id"), wraptext("bar")) node3 = Attribute(wraptext("id"), wraptext("foo bar baz")) self.assertIs(None, node1.quotes) self.assertEqual('"', node2.quotes) node1.quotes = "'" node2.quotes = None self.assertEqual("'", node1.quotes) self.assertIs(None, node2.quotes) self.assertRaises(ValueError, setattr, node1, "quotes", "foobar") self.assertRaises(ValueError, setattr, node3, "quotes", None) self.assertRaises(ValueError, Attribute, wraptext("id"), wraptext("foo bar baz"), None) def test_padding(self): """test getter/setter for the padding attributes""" for pad in ["pad_first", "pad_before_eq", "pad_after_eq"]: node = Attribute(wraptext("id"), wraptext("foo"), **{pad: "\n"}) self.assertEqual("\n", getattr(node, pad)) setattr(node, pad, " ") self.assertEqual(" ", getattr(node, pad)) setattr(node, pad, None) self.assertEqual("", getattr(node, pad)) self.assertRaises(ValueError, setattr, node, pad, True) if __name__ == "__main__": unittest.main(verbosity=2) mwparserfromhell-0.4.2/tests/test_builder.py000066400000000000000000000525601255634533200213270ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals try: import unittest2 as unittest except ImportError: import unittest from mwparserfromhell.compat import py3k from mwparserfromhell.nodes import (Argument, Comment, ExternalLink, Heading, HTMLEntity, Tag, Template, Text, Wikilink) from mwparserfromhell.nodes.extras import Attribute, Parameter from mwparserfromhell.parser import tokens, ParserError from mwparserfromhell.parser.builder import Builder from ._test_tree_equality import TreeEqualityTestCase, wrap, wraptext class TestBuilder(TreeEqualityTestCase): """Tests for the builder, which turns tokens into Wikicode objects.""" def setUp(self): self.builder = Builder() def test_text(self): """tests for building Text nodes""" tests = [ ([tokens.Text(text="foobar")], wraptext("foobar")), ([tokens.Text(text="fóóbar")], wraptext("fóóbar")), ([tokens.Text(text="spam"), tokens.Text(text="eggs")], wraptext("spam", "eggs")), ] for test, valid in tests: self.assertWikicodeEqual(valid, self.builder.build(test)) def test_template(self): """tests for building Template nodes""" tests = [ ([tokens.TemplateOpen(), tokens.Text(text="foobar"), tokens.TemplateClose()], wrap([Template(wraptext("foobar"))])), ([tokens.TemplateOpen(), tokens.Text(text="spam"), tokens.Text(text="eggs"), tokens.TemplateClose()], wrap([Template(wraptext("spam", "eggs"))])), ([tokens.TemplateOpen(), tokens.Text(text="foo"), tokens.TemplateParamSeparator(), tokens.Text(text="bar"), tokens.TemplateClose()], wrap([Template(wraptext("foo"), params=[ Parameter(wraptext("1"), wraptext("bar"), showkey=False)])])), ([tokens.TemplateOpen(), tokens.Text(text="foo"), tokens.TemplateParamSeparator(), tokens.Text(text="bar"), tokens.TemplateParamEquals(), tokens.Text(text="baz"), tokens.TemplateClose()], wrap([Template(wraptext("foo"), params=[ Parameter(wraptext("bar"), wraptext("baz"))])])), ([tokens.TemplateOpen(), tokens.TemplateParamSeparator(), tokens.TemplateParamSeparator(), tokens.TemplateParamEquals(), tokens.TemplateParamSeparator(), tokens.TemplateClose()], wrap([Template(wrap([]), params=[ Parameter(wraptext("1"), wrap([]), showkey=False), Parameter(wrap([]), wrap([]), showkey=True), Parameter(wraptext("2"), wrap([]), showkey=False)])])), ([tokens.TemplateOpen(), tokens.Text(text="foo"), tokens.TemplateParamSeparator(), tokens.Text(text="bar"), tokens.TemplateParamEquals(), tokens.Text(text="baz"), tokens.TemplateParamSeparator(), tokens.Text(text="biz"), tokens.TemplateParamSeparator(), tokens.Text(text="buzz"), tokens.TemplateParamSeparator(), tokens.Text(text="3"), tokens.TemplateParamEquals(), tokens.Text(text="buff"), tokens.TemplateParamSeparator(), tokens.Text(text="baff"), tokens.TemplateClose()], wrap([Template(wraptext("foo"), params=[ Parameter(wraptext("bar"), wraptext("baz")), Parameter(wraptext("1"), wraptext("biz"), showkey=False), Parameter(wraptext("2"), wraptext("buzz"), showkey=False), Parameter(wraptext("3"), wraptext("buff")), Parameter(wraptext("3"), wraptext("baff"), showkey=False)])])), ] for test, valid in tests: self.assertWikicodeEqual(valid, self.builder.build(test)) def test_argument(self): """tests for building Argument nodes""" tests = [ ([tokens.ArgumentOpen(), tokens.Text(text="foobar"), tokens.ArgumentClose()], wrap([Argument(wraptext("foobar"))])), ([tokens.ArgumentOpen(), tokens.Text(text="spam"), tokens.Text(text="eggs"), tokens.ArgumentClose()], wrap([Argument(wraptext("spam", "eggs"))])), ([tokens.ArgumentOpen(), tokens.Text(text="foo"), tokens.ArgumentSeparator(), tokens.Text(text="bar"), tokens.ArgumentClose()], wrap([Argument(wraptext("foo"), wraptext("bar"))])), ([tokens.ArgumentOpen(), tokens.Text(text="foo"), tokens.Text(text="bar"), tokens.ArgumentSeparator(), tokens.Text(text="baz"), tokens.Text(text="biz"), tokens.ArgumentClose()], wrap([Argument(wraptext("foo", "bar"), wraptext("baz", "biz"))])), ] for test, valid in tests: self.assertWikicodeEqual(valid, self.builder.build(test)) def test_wikilink(self): """tests for building Wikilink nodes""" tests = [ ([tokens.WikilinkOpen(), tokens.Text(text="foobar"), tokens.WikilinkClose()], wrap([Wikilink(wraptext("foobar"))])), ([tokens.WikilinkOpen(), tokens.Text(text="spam"), tokens.Text(text="eggs"), tokens.WikilinkClose()], wrap([Wikilink(wraptext("spam", "eggs"))])), ([tokens.WikilinkOpen(), tokens.Text(text="foo"), tokens.WikilinkSeparator(), tokens.Text(text="bar"), tokens.WikilinkClose()], wrap([Wikilink(wraptext("foo"), wraptext("bar"))])), ([tokens.WikilinkOpen(), tokens.Text(text="foo"), tokens.Text(text="bar"), tokens.WikilinkSeparator(), tokens.Text(text="baz"), tokens.Text(text="biz"), tokens.WikilinkClose()], wrap([Wikilink(wraptext("foo", "bar"), wraptext("baz", "biz"))])), ] for test, valid in tests: self.assertWikicodeEqual(valid, self.builder.build(test)) def test_external_link(self): """tests for building ExternalLink nodes""" tests = [ ([tokens.ExternalLinkOpen(brackets=False), tokens.Text(text="http://example.com/"), tokens.ExternalLinkClose()], wrap([ExternalLink(wraptext("http://example.com/"), brackets=False)])), ([tokens.ExternalLinkOpen(brackets=True), tokens.Text(text="http://example.com/"), tokens.ExternalLinkClose()], wrap([ExternalLink(wraptext("http://example.com/"))])), ([tokens.ExternalLinkOpen(brackets=True), tokens.Text(text="http://example.com/"), tokens.ExternalLinkSeparator(), tokens.ExternalLinkClose()], wrap([ExternalLink(wraptext("http://example.com/"), wrap([]))])), ([tokens.ExternalLinkOpen(brackets=True), tokens.Text(text="http://example.com/"), tokens.ExternalLinkSeparator(), tokens.Text(text="Example"), tokens.ExternalLinkClose()], wrap([ExternalLink(wraptext("http://example.com/"), wraptext("Example"))])), ([tokens.ExternalLinkOpen(brackets=False), tokens.Text(text="http://example"), tokens.Text(text=".com/foo"), tokens.ExternalLinkClose()], wrap([ExternalLink(wraptext("http://example", ".com/foo"), brackets=False)])), ([tokens.ExternalLinkOpen(brackets=True), tokens.Text(text="http://example"), tokens.Text(text=".com/foo"), tokens.ExternalLinkSeparator(), tokens.Text(text="Example"), tokens.Text(text=" Web Page"), tokens.ExternalLinkClose()], wrap([ExternalLink(wraptext("http://example", ".com/foo"), wraptext("Example", " Web Page"))])), ] for test, valid in tests: self.assertWikicodeEqual(valid, self.builder.build(test)) def test_html_entity(self): """tests for building HTMLEntity nodes""" tests = [ ([tokens.HTMLEntityStart(), tokens.Text(text="nbsp"), tokens.HTMLEntityEnd()], wrap([HTMLEntity("nbsp", named=True, hexadecimal=False)])), ([tokens.HTMLEntityStart(), tokens.HTMLEntityNumeric(), tokens.Text(text="107"), tokens.HTMLEntityEnd()], wrap([HTMLEntity("107", named=False, hexadecimal=False)])), ([tokens.HTMLEntityStart(), tokens.HTMLEntityNumeric(), tokens.HTMLEntityHex(char="X"), tokens.Text(text="6B"), tokens.HTMLEntityEnd()], wrap([HTMLEntity("6B", named=False, hexadecimal=True, hex_char="X")])), ] for test, valid in tests: self.assertWikicodeEqual(valid, self.builder.build(test)) def test_heading(self): """tests for building Heading nodes""" tests = [ ([tokens.HeadingStart(level=2), tokens.Text(text="foobar"), tokens.HeadingEnd()], wrap([Heading(wraptext("foobar"), 2)])), ([tokens.HeadingStart(level=4), tokens.Text(text="spam"), tokens.Text(text="eggs"), tokens.HeadingEnd()], wrap([Heading(wraptext("spam", "eggs"), 4)])), ] for test, valid in tests: self.assertWikicodeEqual(valid, self.builder.build(test)) def test_comment(self): """tests for building Comment nodes""" tests = [ ([tokens.CommentStart(), tokens.Text(text="foobar"), tokens.CommentEnd()], wrap([Comment(wraptext("foobar"))])), ([tokens.CommentStart(), tokens.Text(text="spam"), tokens.Text(text="eggs"), tokens.CommentEnd()], wrap([Comment(wraptext("spam", "eggs"))])), ] for test, valid in tests: self.assertWikicodeEqual(valid, self.builder.build(test)) def test_tag(self): """tests for building Tag nodes""" tests = [ # ([tokens.TagOpenOpen(), tokens.Text(text="ref"), tokens.TagCloseOpen(padding=""), tokens.TagOpenClose(), tokens.Text(text="ref"), tokens.TagCloseClose()], wrap([Tag(wraptext("ref"), wrap([]), closing_tag=wraptext("ref"))])), # ([tokens.TagOpenOpen(), tokens.Text(text="ref"), tokens.TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), tokens.Text(text="name"), tokens.TagCloseOpen(padding=""), tokens.TagOpenClose(), tokens.Text(text="ref"), tokens.TagCloseClose()], wrap([Tag(wraptext("ref"), wrap([]), attrs=[Attribute(wraptext("name"))])])), # ([tokens.TagOpenOpen(), tokens.Text(text="ref"), tokens.TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), tokens.Text(text="name"), tokens.TagAttrEquals(), tokens.TagAttrQuote(char='"'), tokens.Text(text="abc"), tokens.TagCloseSelfclose(padding=" ")], wrap([Tag(wraptext("ref"), attrs=[Attribute(wraptext("name"), wraptext("abc"))], self_closing=True, padding=" ")])), #
    ([tokens.TagOpenOpen(), tokens.Text(text="br"), tokens.TagCloseSelfclose(padding="")], wrap([Tag(wraptext("br"), self_closing=True)])), #
  • ([tokens.TagOpenOpen(), tokens.Text(text="li"), tokens.TagCloseSelfclose(padding="", implicit=True)], wrap([Tag(wraptext("li"), self_closing=True, implicit=True)])), #
    ([tokens.TagOpenOpen(invalid=True), tokens.Text(text="br"), tokens.TagCloseSelfclose(padding="", implicit=True)], wrap([Tag(wraptext("br"), self_closing=True, invalid=True, implicit=True)])), #
    ([tokens.TagOpenOpen(invalid=True), tokens.Text(text="br"), tokens.TagCloseSelfclose(padding="")], wrap([Tag(wraptext("br"), self_closing=True, invalid=True)])), # [[Source]] ([tokens.TagOpenOpen(), tokens.Text(text="ref"), tokens.TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), tokens.Text(text="name"), tokens.TagAttrEquals(), tokens.TemplateOpen(), tokens.Text(text="abc"), tokens.TemplateClose(), tokens.TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), tokens.Text(text="foo"), tokens.TagAttrEquals(), tokens.TagAttrQuote(char='"'), tokens.Text(text="bar "), tokens.TemplateOpen(), tokens.Text(text="baz"), tokens.TemplateClose(), tokens.TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), tokens.Text(text="abc"), tokens.TagAttrEquals(), tokens.TemplateOpen(), tokens.Text(text="de"), tokens.TemplateClose(), tokens.Text(text="f"), tokens.TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), tokens.Text(text="ghi"), tokens.TagAttrEquals(), tokens.Text(text="j"), tokens.TemplateOpen(), tokens.Text(text="k"), tokens.TemplateClose(), tokens.TemplateOpen(), tokens.Text(text="l"), tokens.TemplateClose(), tokens.TagAttrStart(pad_first=" \n ", pad_before_eq=" ", pad_after_eq=" "), tokens.Text(text="mno"), tokens.TagAttrEquals(), tokens.TagAttrQuote(char="'"), tokens.TemplateOpen(), tokens.Text(text="p"), tokens.TemplateClose(), tokens.Text(text=" "), tokens.WikilinkOpen(), tokens.Text(text="q"), tokens.WikilinkClose(), tokens.Text(text=" "), tokens.TemplateOpen(), tokens.Text(text="r"), tokens.TemplateClose(), tokens.TagCloseOpen(padding=""), tokens.WikilinkOpen(), tokens.Text(text="Source"), tokens.WikilinkClose(), tokens.TagOpenClose(), tokens.Text(text="ref"), tokens.TagCloseClose()], wrap([Tag(wraptext("ref"), wrap([Wikilink(wraptext("Source"))]), [ Attribute(wraptext("name"), wrap([Template(wraptext("abc"))]), None), Attribute(wraptext("foo"), wrap([Text("bar "), Template(wraptext("baz"))]), pad_first=" "), Attribute(wraptext("abc"), wrap([Template(wraptext("de")), Text("f")]), None), Attribute(wraptext("ghi"), wrap([Text("j"), Template(wraptext("k")), Template(wraptext("l"))]), None), Attribute(wraptext("mno"), wrap([Template(wraptext("p")), Text(" "), Wikilink(wraptext("q")), Text(" "), Template(wraptext("r"))]), "'", " \n ", " ", " ")])])), # "''italic text''" ([tokens.TagOpenOpen(wiki_markup="''"), tokens.Text(text="i"), tokens.TagCloseOpen(), tokens.Text(text="italic text"), tokens.TagOpenClose(), tokens.Text(text="i"), tokens.TagCloseClose()], wrap([Tag(wraptext("i"), wraptext("italic text"), wiki_markup="''")])), # * bullet ([tokens.TagOpenOpen(wiki_markup="*"), tokens.Text(text="li"), tokens.TagCloseSelfclose(), tokens.Text(text=" bullet")], wrap([Tag(wraptext("li"), wiki_markup="*", self_closing=True), Text(" bullet")])), ] for test, valid in tests: self.assertWikicodeEqual(valid, self.builder.build(test)) def test_integration(self): """a test for building a combination of templates together""" # {{{{{{{{foo}}bar|baz=biz}}buzz}}usr|{{bin}}}} test = [tokens.TemplateOpen(), tokens.TemplateOpen(), tokens.TemplateOpen(), tokens.TemplateOpen(), tokens.Text(text="foo"), tokens.TemplateClose(), tokens.Text(text="bar"), tokens.TemplateParamSeparator(), tokens.Text(text="baz"), tokens.TemplateParamEquals(), tokens.Text(text="biz"), tokens.TemplateClose(), tokens.Text(text="buzz"), tokens.TemplateClose(), tokens.Text(text="usr"), tokens.TemplateParamSeparator(), tokens.TemplateOpen(), tokens.Text(text="bin"), tokens.TemplateClose(), tokens.TemplateClose()] valid = wrap( [Template(wrap([Template(wrap([Template(wrap([Template(wraptext( "foo")), Text("bar")]), params=[Parameter(wraptext("baz"), wraptext("biz"))]), Text("buzz")])), Text("usr")]), params=[ Parameter(wraptext("1"), wrap([Template(wraptext("bin"))]), showkey=False)])]) self.assertWikicodeEqual(valid, self.builder.build(test)) def test_integration2(self): """an even more audacious test for building a horrible wikicode mess""" # {{a|b|{{c|[[d]]{{{e}}}}}}}[[f|{{{g}}}]]{{i|j= }} test = [tokens.TemplateOpen(), tokens.Text(text="a"), tokens.TemplateParamSeparator(), tokens.Text(text="b"), tokens.TemplateParamSeparator(), tokens.TemplateOpen(), tokens.Text(text="c"), tokens.TemplateParamSeparator(), tokens.WikilinkOpen(), tokens.Text(text="d"), tokens.WikilinkClose(), tokens.ArgumentOpen(), tokens.Text(text="e"), tokens.ArgumentClose(), tokens.TemplateClose(), tokens.TemplateClose(), tokens.WikilinkOpen(), tokens.Text(text="f"), tokens.WikilinkSeparator(), tokens.ArgumentOpen(), tokens.Text(text="g"), tokens.ArgumentClose(), tokens.CommentStart(), tokens.Text(text="h"), tokens.CommentEnd(), tokens.WikilinkClose(), tokens.TemplateOpen(), tokens.Text(text="i"), tokens.TemplateParamSeparator(), tokens.Text(text="j"), tokens.TemplateParamEquals(), tokens.HTMLEntityStart(), tokens.Text(text="nbsp"), tokens.HTMLEntityEnd(), tokens.TemplateClose()] valid = wrap( [Template(wraptext("a"), params=[Parameter(wraptext("1"), wraptext( "b"), showkey=False), Parameter(wraptext("2"), wrap([Template( wraptext("c"), params=[Parameter(wraptext("1"), wrap([Wikilink( wraptext("d")), Argument(wraptext("e"))]), showkey=False)])]), showkey=False)]), Wikilink(wraptext("f"), wrap([Argument(wraptext( "g")), Comment(wraptext("h"))])), Template(wraptext("i"), params=[ Parameter(wraptext("j"), wrap([HTMLEntity("nbsp", named=True)]))])]) self.assertWikicodeEqual(valid, self.builder.build(test)) def test_parser_errors(self): """test whether ParserError gets thrown for bad input""" missing_closes = [ [tokens.TemplateOpen(), tokens.TemplateParamSeparator()], [tokens.TemplateOpen()], [tokens.ArgumentOpen()], [tokens.WikilinkOpen()], [tokens.ExternalLinkOpen()], [tokens.HeadingStart()], [tokens.CommentStart()], [tokens.TagOpenOpen(), tokens.TagAttrStart()], [tokens.TagOpenOpen()] ] func = self.assertRaisesRegex if py3k else self.assertRaisesRegexp msg = r"_handle_token\(\) got unexpected TemplateClose" func(ParserError, msg, self.builder.build, [tokens.TemplateClose()]) for test in missing_closes: self.assertRaises(ParserError, self.builder.build, test) if __name__ == "__main__": unittest.main(verbosity=2) mwparserfromhell-0.4.2/tests/test_comment.py000066400000000000000000000050611255634533200213350ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals try: import unittest2 as unittest except ImportError: import unittest from mwparserfromhell.compat import str from mwparserfromhell.nodes import Comment from ._test_tree_equality import TreeEqualityTestCase class TestComment(TreeEqualityTestCase): """Test cases for the Comment node.""" def test_unicode(self): """test Comment.__unicode__()""" node = Comment("foobar") self.assertEqual("", str(node)) def test_children(self): """test Comment.__children__()""" node = Comment("foobar") gen = node.__children__() self.assertRaises(StopIteration, next, gen) def test_strip(self): """test Comment.__strip__()""" node = Comment("foobar") for a in (True, False): for b in (True, False): self.assertIs(None, node.__strip__(a, b)) def test_showtree(self): """test Comment.__showtree__()""" output = [] node = Comment("foobar") node.__showtree__(output.append, None, None) self.assertEqual([""], output) def test_contents(self): """test getter/setter for the contents attribute""" node = Comment("foobar") self.assertEqual("foobar", node.contents) node.contents = "barfoo" self.assertEqual("barfoo", node.contents) if __name__ == "__main__": unittest.main(verbosity=2) mwparserfromhell-0.4.2/tests/test_ctokenizer.py000066400000000000000000000036701255634533200220540ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals try: import unittest2 as unittest except ImportError: import unittest try: from mwparserfromhell.parser._tokenizer import CTokenizer except ImportError: CTokenizer = None from ._test_tokenizer import TokenizerTestCase @unittest.skipUnless(CTokenizer, "C tokenizer not available") class TestCTokenizer(TokenizerTestCase, unittest.TestCase): """Test cases for the C tokenizer.""" @classmethod def setUpClass(cls): cls.tokenizer = CTokenizer if not TokenizerTestCase.skip_others: def test_uses_c(self): """make sure the C tokenizer identifies as using a C extension""" self.assertTrue(CTokenizer.USES_C) self.assertTrue(CTokenizer().USES_C) if __name__ == "__main__": unittest.main(verbosity=2) mwparserfromhell-0.4.2/tests/test_docs.py000066400000000000000000000141251255634533200206240ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import print_function, unicode_literals import json import os try: import unittest2 as unittest except ImportError: import unittest import mwparserfromhell from mwparserfromhell.compat import py3k, str from .compat import StringIO, urlencode, urlopen class TestDocs(unittest.TestCase): """Integration test cases for mwparserfromhell's documentation.""" def assertPrint(self, input, output): """Assertion check that *input*, when printed, produces *output*.""" buff = StringIO() print(input, end="", file=buff) buff.seek(0) self.assertEqual(output, buff.read()) def test_readme_1(self): """test a block of example code in the README""" text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?" wikicode = mwparserfromhell.parse(text) self.assertPrint(wikicode, "I has a template! {{foo|bar|baz|eggs=spam}} See it?") templates = wikicode.filter_templates() if py3k: self.assertPrint(templates, "['{{foo|bar|baz|eggs=spam}}']") else: self.assertPrint(templates, "[u'{{foo|bar|baz|eggs=spam}}']") template = templates[0] self.assertPrint(template.name, "foo") if py3k: self.assertPrint(template.params, "['bar', 'baz', 'eggs=spam']") else: self.assertPrint(template.params, "[u'bar', u'baz', u'eggs=spam']") self.assertPrint(template.get(1).value, "bar") self.assertPrint(template.get("eggs").value, "spam") def test_readme_2(self): """test a block of example code in the README""" text = "{{foo|{{bar}}={{baz|{{spam}}}}}}" temps = mwparserfromhell.parse(text).filter_templates() if py3k: res = "['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']" else: res = "[u'{{foo|{{bar}}={{baz|{{spam}}}}}}', u'{{bar}}', u'{{baz|{{spam}}}}', u'{{spam}}']" self.assertPrint(temps, res) def test_readme_3(self): """test a block of example code in the README""" code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}") if py3k: self.assertPrint(code.filter_templates(recursive=False), "['{{foo|this {{includes a|template}}}}']") else: self.assertPrint(code.filter_templates(recursive=False), "[u'{{foo|this {{includes a|template}}}}']") foo = code.filter_templates(recursive=False)[0] self.assertPrint(foo.get(1).value, "this {{includes a|template}}") self.assertPrint(foo.get(1).value.filter_templates()[0], "{{includes a|template}}") self.assertPrint(foo.get(1).value.filter_templates()[0].get(1).value, "template") def test_readme_4(self): """test a block of example code in the README""" text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}" code = mwparserfromhell.parse(text) for template in code.filter_templates(): if template.name.matches("Cleanup") and not template.has("date"): template.add("date", "July 2012") res = "{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}}" self.assertPrint(code, res) code.replace("{{uncategorized}}", "{{bar-stub}}") res = "{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}" self.assertPrint(code, res) if py3k: res = "['{{cleanup|date=July 2012}}', '{{bar-stub}}']" else: res = "[u'{{cleanup|date=July 2012}}', u'{{bar-stub}}']" self.assertPrint(code.filter_templates(), res) text = str(code) res = "{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}" self.assertPrint(text, res) self.assertEqual(text, code) @unittest.skipIf("NOWEB" in os.environ, "web test disabled by environ var") def test_readme_5(self): """test a block of example code in the README; includes a web call""" url1 = "https://en.wikipedia.org/w/api.php" url2 = "https://en.wikipedia.org/w/index.php?title={0}&action=raw" title = "Test" data = {"action": "query", "prop": "revisions", "rvlimit": 1, "rvprop": "content", "format": "json", "titles": title} try: raw = urlopen(url1, urlencode(data).encode("utf8")).read() except IOError: self.skipTest("cannot continue because of unsuccessful web call") res = json.loads(raw.decode("utf8")) text = list(res["query"]["pages"].values())[0]["revisions"][0]["*"] try: expected = urlopen(url2.format(title)).read().decode("utf8") except IOError: self.skipTest("cannot continue because of unsuccessful web call") actual = mwparserfromhell.parse(text) self.assertEqual(expected, actual) if __name__ == "__main__": unittest.main(verbosity=2) mwparserfromhell-0.4.2/tests/test_external_link.py000066400000000000000000000134201255634533200225300ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals try: import unittest2 as unittest except ImportError: import unittest from mwparserfromhell.compat import str from mwparserfromhell.nodes import ExternalLink, Text from ._test_tree_equality import TreeEqualityTestCase, wrap, wraptext class TestExternalLink(TreeEqualityTestCase): """Test cases for the ExternalLink node.""" def test_unicode(self): """test ExternalLink.__unicode__()""" node = ExternalLink(wraptext("http://example.com/"), brackets=False) self.assertEqual("http://example.com/", str(node)) node2 = ExternalLink(wraptext("http://example.com/")) self.assertEqual("[http://example.com/]", str(node2)) node3 = ExternalLink(wraptext("http://example.com/"), wrap([])) self.assertEqual("[http://example.com/ ]", str(node3)) node4 = ExternalLink(wraptext("http://example.com/"), wraptext("Example Web Page")) self.assertEqual("[http://example.com/ Example Web Page]", str(node4)) def test_children(self): """test ExternalLink.__children__()""" node1 = ExternalLink(wraptext("http://example.com/"), brackets=False) node2 = ExternalLink(wraptext("http://example.com/"), wrap([Text("Example"), Text("Page")])) gen1 = node1.__children__() gen2 = node2.__children__() self.assertEqual(node1.url, next(gen1)) self.assertEqual(node2.url, next(gen2)) self.assertEqual(node2.title, next(gen2)) self.assertRaises(StopIteration, next, gen1) self.assertRaises(StopIteration, next, gen2) def test_strip(self): """test ExternalLink.__strip__()""" node1 = ExternalLink(wraptext("http://example.com"), brackets=False) node2 = ExternalLink(wraptext("http://example.com")) node3 = ExternalLink(wraptext("http://example.com"), wrap([])) node4 = ExternalLink(wraptext("http://example.com"), wraptext("Link")) for a in (True, False): for b in (True, False): self.assertEqual("http://example.com", node1.__strip__(a, b)) self.assertEqual(None, node2.__strip__(a, b)) self.assertEqual(None, node3.__strip__(a, b)) self.assertEqual("Link", node4.__strip__(a, b)) def test_showtree(self): """test ExternalLink.__showtree__()""" output = [] getter, marker = object(), object() get = lambda code: output.append((getter, code)) mark = lambda: output.append(marker) node1 = ExternalLink(wraptext("http://example.com"), brackets=False) node2 = ExternalLink(wraptext("http://example.com"), wraptext("Link")) node1.__showtree__(output.append, get, mark) node2.__showtree__(output.append, get, mark) valid = [ (getter, node1.url), "[", (getter, node2.url), (getter, node2.title), "]"] self.assertEqual(valid, output) def test_url(self): """test getter/setter for the url attribute""" url = wraptext("http://example.com/") node1 = ExternalLink(url, brackets=False) node2 = ExternalLink(url, wraptext("Example")) self.assertIs(url, node1.url) self.assertIs(url, node2.url) node1.url = "mailto:héhehé@spam.com" node2.url = "mailto:héhehé@spam.com" self.assertWikicodeEqual(wraptext("mailto:héhehé@spam.com"), node1.url) self.assertWikicodeEqual(wraptext("mailto:héhehé@spam.com"), node2.url) def test_title(self): """test getter/setter for the title attribute""" title = wraptext("Example!") node1 = ExternalLink(wraptext("http://example.com/"), brackets=False) node2 = ExternalLink(wraptext("http://example.com/"), title) self.assertIs(None, node1.title) self.assertIs(title, node2.title) node2.title = None self.assertIs(None, node2.title) node2.title = "My Website" self.assertWikicodeEqual(wraptext("My Website"), node2.title) def test_brackets(self): """test getter/setter for the brackets attribute""" node1 = ExternalLink(wraptext("http://example.com/"), brackets=False) node2 = ExternalLink(wraptext("http://example.com/"), wraptext("Link")) self.assertFalse(node1.brackets) self.assertTrue(node2.brackets) node1.brackets = True node2.brackets = False self.assertTrue(node1.brackets) self.assertFalse(node2.brackets) self.assertEqual("[http://example.com/]", str(node1)) self.assertEqual("http://example.com/", str(node2)) if __name__ == "__main__": unittest.main(verbosity=2) mwparserfromhell-0.4.2/tests/test_heading.py000066400000000000000000000071251255634533200212750ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals try: import unittest2 as unittest except ImportError: import unittest from mwparserfromhell.compat import str from mwparserfromhell.nodes import Heading, Text from ._test_tree_equality import TreeEqualityTestCase, wrap, wraptext class TestHeading(TreeEqualityTestCase): """Test cases for the Heading node.""" def test_unicode(self): """test Heading.__unicode__()""" node = Heading(wraptext("foobar"), 2) self.assertEqual("==foobar==", str(node)) node2 = Heading(wraptext(" zzz "), 5) self.assertEqual("===== zzz =====", str(node2)) def test_children(self): """test Heading.__children__()""" node = Heading(wrap([Text("foo"), Text("bar")]), 3) gen = node.__children__() self.assertEqual(node.title, next(gen)) self.assertRaises(StopIteration, next, gen) def test_strip(self): """test Heading.__strip__()""" node = Heading(wraptext("foobar"), 3) for a in (True, False): for b in (True, False): self.assertEqual("foobar", node.__strip__(a, b)) def test_showtree(self): """test Heading.__showtree__()""" output = [] getter = object() get = lambda code: output.append((getter, code)) node1 = Heading(wraptext("foobar"), 3) node2 = Heading(wraptext(" baz "), 4) node1.__showtree__(output.append, get, None) node2.__showtree__(output.append, get, None) valid = ["===", (getter, node1.title), "===", "====", (getter, node2.title), "===="] self.assertEqual(valid, output) def test_title(self): """test getter/setter for the title attribute""" title = wraptext("foobar") node = Heading(title, 3) self.assertIs(title, node.title) node.title = "héhehé" self.assertWikicodeEqual(wraptext("héhehé"), node.title) def test_level(self): """test getter/setter for the level attribute""" node = Heading(wraptext("foobar"), 3) self.assertEqual(3, node.level) node.level = 5 self.assertEqual(5, node.level) self.assertRaises(ValueError, setattr, node, "level", 0) self.assertRaises(ValueError, setattr, node, "level", 7) self.assertRaises(ValueError, setattr, node, "level", "abc") self.assertRaises(ValueError, setattr, node, "level", False) if __name__ == "__main__": unittest.main(verbosity=2) mwparserfromhell-0.4.2/tests/test_html_entity.py000066400000000000000000000165431255634533200222420ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals try: import unittest2 as unittest except ImportError: import unittest from mwparserfromhell.compat import str from mwparserfromhell.nodes import HTMLEntity from ._test_tree_equality import TreeEqualityTestCase, wrap class TestHTMLEntity(TreeEqualityTestCase): """Test cases for the HTMLEntity node.""" def test_unicode(self): """test HTMLEntity.__unicode__()""" node1 = HTMLEntity("nbsp", named=True, hexadecimal=False) node2 = HTMLEntity("107", named=False, hexadecimal=False) node3 = HTMLEntity("6b", named=False, hexadecimal=True) node4 = HTMLEntity("6C", named=False, hexadecimal=True, hex_char="X") self.assertEqual(" ", str(node1)) self.assertEqual("k", str(node2)) self.assertEqual("k", str(node3)) self.assertEqual("l", str(node4)) def test_children(self): """test HTMLEntity.__children__()""" node = HTMLEntity("nbsp", named=True, hexadecimal=False) gen = node.__children__() self.assertRaises(StopIteration, next, gen) def test_strip(self): """test HTMLEntity.__strip__()""" node1 = HTMLEntity("nbsp", named=True, hexadecimal=False) node2 = HTMLEntity("107", named=False, hexadecimal=False) node3 = HTMLEntity("e9", named=False, hexadecimal=True) for a in (True, False): self.assertEqual("\xa0", node1.__strip__(True, a)) self.assertEqual(" ", node1.__strip__(False, a)) self.assertEqual("k", node2.__strip__(True, a)) self.assertEqual("k", node2.__strip__(False, a)) self.assertEqual("é", node3.__strip__(True, a)) self.assertEqual("é", node3.__strip__(False, a)) def test_showtree(self): """test HTMLEntity.__showtree__()""" output = [] node1 = HTMLEntity("nbsp", named=True, hexadecimal=False) node2 = HTMLEntity("107", named=False, hexadecimal=False) node3 = HTMLEntity("e9", named=False, hexadecimal=True) node1.__showtree__(output.append, None, None) node2.__showtree__(output.append, None, None) node3.__showtree__(output.append, None, None) res = [" ", "k", "é"] self.assertEqual(res, output) def test_value(self): """test getter/setter for the value attribute""" node1 = HTMLEntity("nbsp") node2 = HTMLEntity("107") node3 = HTMLEntity("e9") self.assertEqual("nbsp", node1.value) self.assertEqual("107", node2.value) self.assertEqual("e9", node3.value) node1.value = "ffa4" node2.value = 72 node3.value = "Sigma" self.assertEqual("ffa4", node1.value) self.assertFalse(node1.named) self.assertTrue(node1.hexadecimal) self.assertEqual("72", node2.value) self.assertFalse(node2.named) self.assertFalse(node2.hexadecimal) self.assertEqual("Sigma", node3.value) self.assertTrue(node3.named) self.assertFalse(node3.hexadecimal) node1.value = "10FFFF" node2.value = 110000 node2.value = 1114111 self.assertRaises(ValueError, setattr, node3, "value", "") self.assertRaises(ValueError, setattr, node3, "value", "foobar") self.assertRaises(ValueError, setattr, node3, "value", True) self.assertRaises(ValueError, setattr, node3, "value", -1) self.assertRaises(ValueError, setattr, node1, "value", 110000) self.assertRaises(ValueError, setattr, node1, "value", "1114112") self.assertRaises(ValueError, setattr, node1, "value", "12FFFF") def test_named(self): """test getter/setter for the named attribute""" node1 = HTMLEntity("nbsp") node2 = HTMLEntity("107") node3 = HTMLEntity("e9") self.assertTrue(node1.named) self.assertFalse(node2.named) self.assertFalse(node3.named) node1.named = 1 node2.named = 0 node3.named = 0 self.assertTrue(node1.named) self.assertFalse(node2.named) self.assertFalse(node3.named) self.assertRaises(ValueError, setattr, node1, "named", False) self.assertRaises(ValueError, setattr, node2, "named", True) self.assertRaises(ValueError, setattr, node3, "named", True) def test_hexadecimal(self): """test getter/setter for the hexadecimal attribute""" node1 = HTMLEntity("nbsp") node2 = HTMLEntity("107") node3 = HTMLEntity("e9") self.assertFalse(node1.hexadecimal) self.assertFalse(node2.hexadecimal) self.assertTrue(node3.hexadecimal) node1.hexadecimal = False node2.hexadecimal = True node3.hexadecimal = False self.assertFalse(node1.hexadecimal) self.assertTrue(node2.hexadecimal) self.assertFalse(node3.hexadecimal) self.assertRaises(ValueError, setattr, node1, "hexadecimal", True) def test_hex_char(self): """test getter/setter for the hex_char attribute""" node1 = HTMLEntity("e9") node2 = HTMLEntity("e9", hex_char="X") self.assertEqual("x", node1.hex_char) self.assertEqual("X", node2.hex_char) node1.hex_char = "X" node2.hex_char = "x" self.assertEqual("X", node1.hex_char) self.assertEqual("x", node2.hex_char) self.assertRaises(ValueError, setattr, node1, "hex_char", 123) self.assertRaises(ValueError, setattr, node1, "hex_char", "foobar") self.assertRaises(ValueError, setattr, node1, "hex_char", True) def test_normalize(self): """test getter/setter for the normalize attribute""" node1 = HTMLEntity("nbsp") node2 = HTMLEntity("107") node3 = HTMLEntity("e9") node4 = HTMLEntity("1f648") node5 = HTMLEntity("-2") node6 = HTMLEntity("110000", named=False, hexadecimal=True) self.assertEqual("\xa0", node1.normalize()) self.assertEqual("k", node2.normalize()) self.assertEqual("é", node3.normalize()) self.assertEqual("\U0001F648", node4.normalize()) self.assertRaises(ValueError, node5.normalize) self.assertRaises(ValueError, node6.normalize) if __name__ == "__main__": unittest.main(verbosity=2) mwparserfromhell-0.4.2/tests/test_parameter.py000066400000000000000000000063161255634533200216570ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals try: import unittest2 as unittest except ImportError: import unittest from mwparserfromhell.compat import str from mwparserfromhell.nodes import Text from mwparserfromhell.nodes.extras import Parameter from ._test_tree_equality import TreeEqualityTestCase, wrap, wraptext class TestParameter(TreeEqualityTestCase): """Test cases for the Parameter node extra.""" def test_unicode(self): """test Parameter.__unicode__()""" node = Parameter(wraptext("1"), wraptext("foo"), showkey=False) self.assertEqual("foo", str(node)) node2 = Parameter(wraptext("foo"), wraptext("bar")) self.assertEqual("foo=bar", str(node2)) def test_name(self): """test getter/setter for the name attribute""" name1 = wraptext("1") name2 = wraptext("foobar") node1 = Parameter(name1, wraptext("foobar"), showkey=False) node2 = Parameter(name2, wraptext("baz")) self.assertIs(name1, node1.name) self.assertIs(name2, node2.name) node1.name = "héhehé" node2.name = "héhehé" self.assertWikicodeEqual(wraptext("héhehé"), node1.name) self.assertWikicodeEqual(wraptext("héhehé"), node2.name) def test_value(self): """test getter/setter for the value attribute""" value = wraptext("bar") node = Parameter(wraptext("foo"), value) self.assertIs(value, node.value) node.value = "héhehé" self.assertWikicodeEqual(wraptext("héhehé"), node.value) def test_showkey(self): """test getter/setter for the showkey attribute""" node1 = Parameter(wraptext("1"), wraptext("foo"), showkey=False) node2 = Parameter(wraptext("foo"), wraptext("bar")) self.assertFalse(node1.showkey) self.assertTrue(node2.showkey) node1.showkey = True self.assertTrue(node1.showkey) node1.showkey = "" self.assertFalse(node1.showkey) self.assertRaises(ValueError, setattr, node2, "showkey", False) if __name__ == "__main__": unittest.main(verbosity=2) mwparserfromhell-0.4.2/tests/test_parser.py000066400000000000000000000072401255634533200211700ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals try: import unittest2 as unittest except ImportError: import unittest from mwparserfromhell import parser from mwparserfromhell.compat import range from mwparserfromhell.nodes import Tag, Template, Text, Wikilink from mwparserfromhell.nodes.extras import Parameter from ._test_tree_equality import TreeEqualityTestCase, wrap, wraptext class TestParser(TreeEqualityTestCase): """Tests for the Parser class itself, which tokenizes and builds nodes.""" def test_use_c(self): """make sure the correct tokenizer is used""" restore = parser.use_c if parser.use_c: self.assertTrue(parser.Parser()._tokenizer.USES_C) parser.use_c = False self.assertFalse(parser.Parser()._tokenizer.USES_C) parser.use_c = restore def test_parsing(self): """integration test for parsing overall""" text = "this is text; {{this|is=a|template={{with|[[links]]|in}}it}}" expected = wrap([ Text("this is text; "), Template(wraptext("this"), [ Parameter(wraptext("is"), wraptext("a")), Parameter(wraptext("template"), wrap([ Template(wraptext("with"), [ Parameter(wraptext("1"), wrap([Wikilink(wraptext("links"))]), showkey=False), Parameter(wraptext("2"), wraptext("in"), showkey=False) ]), Text("it") ])) ]) ]) actual = parser.Parser().parse(text) self.assertWikicodeEqual(expected, actual) def test_skip_style_tags(self): """test Parser.parse(skip_style_tags=True)""" def test(): with_style = parser.Parser().parse(text, skip_style_tags=False) without_style = parser.Parser().parse(text, skip_style_tags=True) self.assertWikicodeEqual(a, with_style) self.assertWikicodeEqual(b, without_style) text = "This is an example with ''italics''!" a = wrap([Text("This is an example with "), Tag(wraptext("i"), wraptext("italics"), wiki_markup="''"), Text("!")]) b = wraptext("This is an example with ''italics''!") restore = parser.use_c if parser.use_c: test() parser.use_c = False test() parser.use_c = restore if __name__ == "__main__": unittest.main(verbosity=2) mwparserfromhell-0.4.2/tests/test_pytokenizer.py000066400000000000000000000035071255634533200222610ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals try: import unittest2 as unittest except ImportError: import unittest from mwparserfromhell.parser.tokenizer import Tokenizer from ._test_tokenizer import TokenizerTestCase class TestPyTokenizer(TokenizerTestCase, unittest.TestCase): """Test cases for the Python tokenizer.""" @classmethod def setUpClass(cls): cls.tokenizer = Tokenizer if not TokenizerTestCase.skip_others: def test_uses_c(self): """make sure the Python tokenizer identifies as not using C""" self.assertFalse(Tokenizer.USES_C) self.assertFalse(Tokenizer().USES_C) if __name__ == "__main__": unittest.main(verbosity=2) mwparserfromhell-0.4.2/tests/test_roundtripping.py000066400000000000000000000030501255634533200225730ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals try: import unittest2 as unittest except ImportError: import unittest from ._test_tokenizer import TokenizerTestCase class TestRoundtripping(TokenizerTestCase, unittest.TestCase): """Test cases for roundtripping tokens back to wikitext.""" @classmethod def setUpClass(cls): cls.roundtrip = True if __name__ == "__main__": unittest.main(verbosity=2) mwparserfromhell-0.4.2/tests/test_smart_list.py000066400000000000000000000406621255634533200220620ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals try: import unittest2 as unittest except ImportError: import unittest from mwparserfromhell.compat import py3k, range from mwparserfromhell.smart_list import SmartList, _ListProxy class TestSmartList(unittest.TestCase): """Test cases for the SmartList class and its child, _ListProxy.""" def _test_get_set_del_item(self, builder): """Run tests on __get/set/delitem__ of a list built with *builder*.""" def assign(L, s1, s2, s3, val): L[s1:s2:s3] = val def delete(L, s1): del L[s1] list1 = builder([0, 1, 2, 3, "one", "two"]) list2 = builder(list(range(10))) self.assertEqual(1, list1[1]) self.assertEqual("one", list1[-2]) self.assertEqual([2, 3], list1[2:4]) self.assertRaises(IndexError, lambda: list1[6]) self.assertRaises(IndexError, lambda: list1[-7]) self.assertEqual([0, 1, 2], list1[:3]) self.assertEqual([0, 1, 2, 3, "one", "two"], list1[:]) self.assertEqual([3, "one", "two"], list1[3:]) self.assertEqual([3, "one", "two"], list1[3:100]) self.assertEqual(["one", "two"], list1[-2:]) self.assertEqual([0, 1], list1[:-4]) self.assertEqual([], list1[6:]) self.assertEqual([], list1[4:2]) self.assertEqual([0, 2, "one"], list1[0:5:2]) self.assertEqual([0, 2], list1[0:-3:2]) self.assertEqual([0, 1, 2, 3, "one", "two"], list1[::]) self.assertEqual([2, 3, "one", "two"], list1[2::]) self.assertEqual([0, 1, 2, 3], list1[:4:]) self.assertEqual([2, 3], list1[2:4:]) self.assertEqual([0, 2, 4, 6, 8], list2[::2]) self.assertEqual([2, 5, 8], list2[2::3]) self.assertEqual([0, 3], list2[:6:3]) self.assertEqual([2, 5, 8], list2[-8:9:3]) self.assertEqual([], list2[100000:1000:-100]) list1[3] = 100 self.assertEqual(100, list1[3]) list1[-3] = 101 self.assertEqual([0, 1, 2, 101, "one", "two"], list1) list1[5:] = [6, 7, 8] self.assertEqual([6, 7, 8], list1[5:]) self.assertEqual([0, 1, 2, 101, "one", 6, 7, 8], list1) list1[2:4] = [-1, -2, -3, -4, -5] self.assertEqual([0, 1, -1, -2, -3, -4, -5, "one", 6, 7, 8], list1) list1[0:-3] = [99] self.assertEqual([99, 6, 7, 8], list1) list2[0:6:2] = [100, 102, 104] self.assertEqual([100, 1, 102, 3, 104, 5, 6, 7, 8, 9], list2) list2[::3] = [200, 203, 206, 209] self.assertEqual([200, 1, 102, 203, 104, 5, 206, 7, 8, 209], list2) list2[::] = range(7) self.assertEqual([0, 1, 2, 3, 4, 5, 6], list2) self.assertRaises(ValueError, assign, list2, 0, 5, 2, [100, 102, 104, 106]) with self.assertRaises(IndexError): list2[7] = "foo" with self.assertRaises(IndexError): list2[-8] = "foo" del list2[2] self.assertEqual([0, 1, 3, 4, 5, 6], list2) del list2[-3] self.assertEqual([0, 1, 3, 5, 6], list2) self.assertRaises(IndexError, delete, list2, 100) self.assertRaises(IndexError, delete, list2, -6) list2[:] = range(10) del list2[3:6] self.assertEqual([0, 1, 2, 6, 7, 8, 9], list2) del list2[-2:] self.assertEqual([0, 1, 2, 6, 7], list2) del list2[:2] self.assertEqual([2, 6, 7], list2) list2[:] = range(10) del list2[2:8:2] self.assertEqual([0, 1, 3, 5, 7, 8, 9], list2) def _test_add_radd_iadd(self, builder): """Run tests on __r/i/add__ of a list built with *builder*.""" list1 = builder(range(5)) list2 = builder(range(5, 10)) self.assertEqual([0, 1, 2, 3, 4, 5, 6], list1 + [5, 6]) self.assertEqual([0, 1, 2, 3, 4], list1) self.assertEqual(list(range(10)), list1 + list2) self.assertEqual([-2, -1, 0, 1, 2, 3, 4], [-2, -1] + list1) self.assertEqual([0, 1, 2, 3, 4], list1) list1 += ["foo", "bar", "baz"] self.assertEqual([0, 1, 2, 3, 4, "foo", "bar", "baz"], list1) def _test_other_magic_methods(self, builder): """Run tests on other magic methods of a list built with *builder*.""" list1 = builder([0, 1, 2, 3, "one", "two"]) list2 = builder([]) list3 = builder([0, 2, 3, 4]) list4 = builder([0, 1, 2]) if py3k: self.assertEqual("[0, 1, 2, 3, 'one', 'two']", str(list1)) self.assertEqual(b"\x00\x01\x02", bytes(list4)) self.assertEqual("[0, 1, 2, 3, 'one', 'two']", repr(list1)) else: self.assertEqual("[0, 1, 2, 3, u'one', u'two']", unicode(list1)) self.assertEqual(b"[0, 1, 2, 3, u'one', u'two']", str(list1)) self.assertEqual(b"[0, 1, 2, 3, u'one', u'two']", repr(list1)) self.assertTrue(list1 < list3) self.assertTrue(list1 <= list3) self.assertFalse(list1 == list3) self.assertTrue(list1 != list3) self.assertFalse(list1 > list3) self.assertFalse(list1 >= list3) other1 = [0, 2, 3, 4] self.assertTrue(list1 < other1) self.assertTrue(list1 <= other1) self.assertFalse(list1 == other1) self.assertTrue(list1 != other1) self.assertFalse(list1 > other1) self.assertFalse(list1 >= other1) other2 = [0, 0, 1, 2] self.assertFalse(list1 < other2) self.assertFalse(list1 <= other2) self.assertFalse(list1 == other2) self.assertTrue(list1 != other2) self.assertTrue(list1 > other2) self.assertTrue(list1 >= other2) other3 = [0, 1, 2, 3, "one", "two"] self.assertFalse(list1 < other3) self.assertTrue(list1 <= other3) self.assertTrue(list1 == other3) self.assertFalse(list1 != other3) self.assertFalse(list1 > other3) self.assertTrue(list1 >= other3) self.assertTrue(bool(list1)) self.assertFalse(bool(list2)) self.assertEqual(6, len(list1)) self.assertEqual(0, len(list2)) out = [] for obj in list1: out.append(obj) self.assertEqual([0, 1, 2, 3, "one", "two"], out) out = [] for ch in list2: out.append(ch) self.assertEqual([], out) gen1 = iter(list1) out = [] for i in range(len(list1)): out.append(next(gen1)) self.assertRaises(StopIteration, next, gen1) self.assertEqual([0, 1, 2, 3, "one", "two"], out) gen2 = iter(list2) self.assertRaises(StopIteration, next, gen2) self.assertEqual(["two", "one", 3, 2, 1, 0], list(reversed(list1))) self.assertEqual([], list(reversed(list2))) self.assertTrue("one" in list1) self.assertTrue(3 in list1) self.assertFalse(10 in list1) self.assertFalse(0 in list2) self.assertEqual([], list2 * 5) self.assertEqual([], 5 * list2) self.assertEqual([0, 1, 2, 0, 1, 2, 0, 1, 2], list4 * 3) self.assertEqual([0, 1, 2, 0, 1, 2, 0, 1, 2], 3 * list4) list4 *= 2 self.assertEqual([0, 1, 2, 0, 1, 2], list4) def _test_list_methods(self, builder): """Run tests on the public methods of a list built with *builder*.""" list1 = builder(range(5)) list2 = builder(["foo"]) list3 = builder([("a", 5), ("d", 2), ("b", 8), ("c", 3)]) list1.append(5) list1.append(1) list1.append(2) self.assertEqual([0, 1, 2, 3, 4, 5, 1, 2], list1) self.assertEqual(0, list1.count(6)) self.assertEqual(2, list1.count(1)) list1.extend(range(5, 8)) self.assertEqual([0, 1, 2, 3, 4, 5, 1, 2, 5, 6, 7], list1) self.assertEqual(1, list1.index(1)) self.assertEqual(6, list1.index(1, 3)) self.assertEqual(6, list1.index(1, 3, 7)) self.assertRaises(ValueError, list1.index, 1, 3, 5) list1.insert(0, -1) self.assertEqual([-1, 0, 1, 2, 3, 4, 5, 1, 2, 5, 6, 7], list1) list1.insert(-1, 6.5) self.assertEqual([-1, 0, 1, 2, 3, 4, 5, 1, 2, 5, 6, 6.5, 7], list1) list1.insert(13, 8) self.assertEqual([-1, 0, 1, 2, 3, 4, 5, 1, 2, 5, 6, 6.5, 7, 8], list1) self.assertEqual(8, list1.pop()) self.assertEqual(7, list1.pop()) self.assertEqual([-1, 0, 1, 2, 3, 4, 5, 1, 2, 5, 6, 6.5], list1) self.assertEqual(-1, list1.pop(0)) self.assertEqual(5, list1.pop(5)) self.assertEqual(6.5, list1.pop(-1)) self.assertEqual([0, 1, 2, 3, 4, 1, 2, 5, 6], list1) self.assertEqual("foo", list2.pop()) self.assertRaises(IndexError, list2.pop) self.assertEqual([], list2) list1.remove(6) self.assertEqual([0, 1, 2, 3, 4, 1, 2, 5], list1) list1.remove(1) self.assertEqual([0, 2, 3, 4, 1, 2, 5], list1) list1.remove(1) self.assertEqual([0, 2, 3, 4, 2, 5], list1) self.assertRaises(ValueError, list1.remove, 1) list1.reverse() self.assertEqual([5, 2, 4, 3, 2, 0], list1) list1.sort() self.assertEqual([0, 2, 2, 3, 4, 5], list1) list1.sort(reverse=True) self.assertEqual([5, 4, 3, 2, 2, 0], list1) if not py3k: func = lambda x, y: abs(3 - x) - abs(3 - y) # Distance from 3 list1.sort(cmp=func) self.assertEqual([3, 4, 2, 2, 5, 0], list1) list1.sort(cmp=func, reverse=True) self.assertEqual([0, 5, 4, 2, 2, 3], list1) list3.sort(key=lambda i: i[1]) self.assertEqual([("d", 2), ("c", 3), ("a", 5), ("b", 8)], list3) list3.sort(key=lambda i: i[1], reverse=True) self.assertEqual([("b", 8), ("a", 5), ("c", 3), ("d", 2)], list3) def _dispatch_test_for_children(self, meth): """Run a test method on various different types of children.""" meth(lambda L: SmartList(list(L))[:]) meth(lambda L: SmartList([999] + list(L))[1:]) meth(lambda L: SmartList(list(L) + [999])[:-1]) meth(lambda L: SmartList([101, 102] + list(L) + [201, 202])[2:-2]) def test_docs(self): """make sure the methods of SmartList/_ListProxy have docstrings""" methods = ["append", "count", "extend", "index", "insert", "pop", "remove", "reverse", "sort"] for meth in methods: expected = getattr(list, meth).__doc__ smartlist_doc = getattr(SmartList, meth).__doc__ listproxy_doc = getattr(_ListProxy, meth).__doc__ self.assertEqual(expected, smartlist_doc) self.assertEqual(expected, listproxy_doc) def test_doctest(self): """make sure the test embedded in SmartList's docstring passes""" parent = SmartList([0, 1, 2, 3]) self.assertEqual([0, 1, 2, 3], parent) child = parent[2:] self.assertEqual([2, 3], child) child.append(4) self.assertEqual([2, 3, 4], child) self.assertEqual([0, 1, 2, 3, 4], parent) def test_parent_get_set_del(self): """make sure SmartList's getitem/setitem/delitem work""" self._test_get_set_del_item(SmartList) def test_parent_add(self): """make sure SmartList's add/radd/iadd work""" self._test_add_radd_iadd(SmartList) def test_parent_other_magics(self): """make sure SmartList's other magically implemented features work""" self._test_other_magic_methods(SmartList) def test_parent_methods(self): """make sure SmartList's non-magic methods work, like append()""" self._test_list_methods(SmartList) def test_child_get_set_del(self): """make sure _ListProxy's getitem/setitem/delitem work""" self._dispatch_test_for_children(self._test_get_set_del_item) def test_child_add(self): """make sure _ListProxy's add/radd/iadd work""" self._dispatch_test_for_children(self._test_add_radd_iadd) def test_child_other_magics(self): """make sure _ListProxy's other magically implemented features work""" self._dispatch_test_for_children(self._test_other_magic_methods) def test_child_methods(self): """make sure _ListProxy's non-magic methods work, like append()""" self._dispatch_test_for_children(self._test_list_methods) def test_influence(self): """make sure changes are propagated from parents to children""" parent = SmartList([0, 1, 2, 3, 4, 5]) child1 = parent[2:] child2 = parent[2:5] self.assertEqual([0, 1, 2, 3, 4, 5], parent) self.assertEqual([2, 3, 4, 5], child1) self.assertEqual([2, 3, 4], child2) self.assertEqual(2, len(parent._children)) parent.append(6) child1.append(7) child2.append(4.5) self.assertEqual([0, 1, 2, 3, 4, 4.5, 5, 6, 7], parent) self.assertEqual([2, 3, 4, 4.5, 5, 6, 7], child1) self.assertEqual([2, 3, 4, 4.5], child2) parent.insert(0, -1) parent.insert(4, 2.5) parent.insert(10, 6.5) self.assertEqual([-1, 0, 1, 2, 2.5, 3, 4, 4.5, 5, 6, 6.5, 7], parent) self.assertEqual([2, 2.5, 3, 4, 4.5, 5, 6, 6.5, 7], child1) self.assertEqual([2, 2.5, 3, 4, 4.5], child2) self.assertEqual(7, parent.pop()) self.assertEqual(6.5, child1.pop()) self.assertEqual(4.5, child2.pop()) self.assertEqual([-1, 0, 1, 2, 2.5, 3, 4, 5, 6], parent) self.assertEqual([2, 2.5, 3, 4, 5, 6], child1) self.assertEqual([2, 2.5, 3, 4], child2) parent.remove(-1) child1.remove(2.5) self.assertEqual([0, 1, 2, 3, 4, 5, 6], parent) self.assertEqual([2, 3, 4, 5, 6], child1) self.assertEqual([2, 3, 4], child2) self.assertEqual(0, parent.pop(0)) self.assertEqual([1, 2, 3, 4, 5, 6], parent) self.assertEqual([2, 3, 4, 5, 6], child1) self.assertEqual([2, 3, 4], child2) child2.reverse() self.assertEqual([1, 4, 3, 2, 5, 6], parent) self.assertEqual([4, 3, 2, 5, 6], child1) self.assertEqual([4, 3, 2], child2) parent.extend([7, 8]) child1.extend([8.1, 8.2]) child2.extend([1.9, 1.8]) self.assertEqual([1, 4, 3, 2, 1.9, 1.8, 5, 6, 7, 8, 8.1, 8.2], parent) self.assertEqual([4, 3, 2, 1.9, 1.8, 5, 6, 7, 8, 8.1, 8.2], child1) self.assertEqual([4, 3, 2, 1.9, 1.8], child2) child3 = parent[9:] self.assertEqual([8, 8.1, 8.2], child3) del parent[8:] self.assertEqual([1, 4, 3, 2, 1.9, 1.8, 5, 6], parent) self.assertEqual([4, 3, 2, 1.9, 1.8, 5, 6], child1) self.assertEqual([4, 3, 2, 1.9, 1.8], child2) self.assertEqual([], child3) del child1 self.assertEqual([1, 4, 3, 2, 1.9, 1.8, 5, 6], parent) self.assertEqual([4, 3, 2, 1.9, 1.8], child2) self.assertEqual([], child3) self.assertEqual(2, len(parent._children)) del child3 self.assertEqual([1, 4, 3, 2, 1.9, 1.8, 5, 6], parent) self.assertEqual([4, 3, 2, 1.9, 1.8], child2) self.assertEqual(1, len(parent._children)) parent.remove(1.9) parent.remove(1.8) self.assertEqual([1, 4, 3, 2, 5, 6], parent) self.assertEqual([4, 3, 2], child2) parent.reverse() self.assertEqual([6, 5, 2, 3, 4, 1], parent) self.assertEqual([4, 3, 2], child2) self.assertEqual(0, len(parent._children)) if __name__ == "__main__": unittest.main(verbosity=2) mwparserfromhell-0.4.2/tests/test_string_mixin.py000066400000000000000000000432311255634533200224060ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals from sys import getdefaultencoding from types import GeneratorType try: import unittest2 as unittest except ImportError: import unittest from mwparserfromhell.compat import bytes, py3k, py32, range, str from mwparserfromhell.string_mixin import StringMixIn class _FakeString(StringMixIn): def __init__(self, data): self._data = data def __unicode__(self): return self._data class TestStringMixIn(unittest.TestCase): """Test cases for the StringMixIn class.""" def test_docs(self): """make sure the various methods of StringMixIn have docstrings""" methods = [ "capitalize", "center", "count", "encode", "endswith", "expandtabs", "find", "format", "index", "isalnum", "isalpha", "isdecimal", "isdigit", "islower", "isnumeric", "isspace", "istitle", "isupper", "join", "ljust", "lower", "lstrip", "partition", "replace", "rfind", "rindex", "rjust", "rpartition", "rsplit", "rstrip", "split", "splitlines", "startswith", "strip", "swapcase", "title", "translate", "upper", "zfill"] if py3k: if not py32: methods.append("casefold") methods.extend(["format_map", "isidentifier", "isprintable", "maketrans"]) else: methods.append("decode") for meth in methods: expected = getattr("foo", meth).__doc__ actual = getattr(_FakeString("foo"), meth).__doc__ self.assertEqual(expected, actual) def test_types(self): """make sure StringMixIns convert to different types correctly""" fstr = _FakeString("fake string") self.assertEqual(str(fstr), "fake string") self.assertEqual(bytes(fstr), b"fake string") if py3k: self.assertEqual(repr(fstr), "'fake string'") else: self.assertEqual(repr(fstr), b"u'fake string'") self.assertIsInstance(str(fstr), str) self.assertIsInstance(bytes(fstr), bytes) if py3k: self.assertIsInstance(repr(fstr), str) else: self.assertIsInstance(repr(fstr), bytes) def test_comparisons(self): """make sure comparison operators work""" str1 = _FakeString("this is a fake string") str2 = _FakeString("this is a fake string") str3 = _FakeString("fake string, this is") str4 = "this is a fake string" str5 = "fake string, this is" self.assertFalse(str1 > str2) self.assertTrue(str1 >= str2) self.assertTrue(str1 == str2) self.assertFalse(str1 != str2) self.assertFalse(str1 < str2) self.assertTrue(str1 <= str2) self.assertTrue(str1 > str3) self.assertTrue(str1 >= str3) self.assertFalse(str1 == str3) self.assertTrue(str1 != str3) self.assertFalse(str1 < str3) self.assertFalse(str1 <= str3) self.assertFalse(str1 > str4) self.assertTrue(str1 >= str4) self.assertTrue(str1 == str4) self.assertFalse(str1 != str4) self.assertFalse(str1 < str4) self.assertTrue(str1 <= str4) self.assertFalse(str5 > str1) self.assertFalse(str5 >= str1) self.assertFalse(str5 == str1) self.assertTrue(str5 != str1) self.assertTrue(str5 < str1) self.assertTrue(str5 <= str1) def test_other_magics(self): """test other magically implemented features, like len() and iter()""" str1 = _FakeString("fake string") str2 = _FakeString("") expected = ["f", "a", "k", "e", " ", "s", "t", "r", "i", "n", "g"] self.assertTrue(str1) self.assertFalse(str2) self.assertEqual(11, len(str1)) self.assertEqual(0, len(str2)) out = [] for ch in str1: out.append(ch) self.assertEqual(expected, out) out = [] for ch in str2: out.append(ch) self.assertEqual([], out) gen1 = iter(str1) gen2 = iter(str2) self.assertIsInstance(gen1, GeneratorType) self.assertIsInstance(gen2, GeneratorType) out = [] for i in range(len(str1)): out.append(next(gen1)) self.assertRaises(StopIteration, next, gen1) self.assertEqual(expected, out) self.assertRaises(StopIteration, next, gen2) self.assertEqual("gnirts ekaf", "".join(list(reversed(str1)))) self.assertEqual([], list(reversed(str2))) self.assertEqual("f", str1[0]) self.assertEqual(" ", str1[4]) self.assertEqual("g", str1[10]) self.assertEqual("n", str1[-2]) self.assertRaises(IndexError, lambda: str1[11]) self.assertRaises(IndexError, lambda: str2[0]) self.assertTrue("k" in str1) self.assertTrue("fake" in str1) self.assertTrue("str" in str1) self.assertTrue("" in str1) self.assertTrue("" in str2) self.assertFalse("real" in str1) self.assertFalse("s" in str2) def test_other_methods(self): """test the remaining non-magic methods of StringMixIn""" str1 = _FakeString("fake string") self.assertEqual("Fake string", str1.capitalize()) self.assertEqual(" fake string ", str1.center(15)) self.assertEqual(" fake string ", str1.center(16)) self.assertEqual("qqfake stringqq", str1.center(15, "q")) self.assertEqual(1, str1.count("e")) self.assertEqual(0, str1.count("z")) self.assertEqual(1, str1.count("r", 7)) self.assertEqual(0, str1.count("r", 8)) self.assertEqual(1, str1.count("r", 5, 9)) self.assertEqual(0, str1.count("r", 5, 7)) if not py3k: str2 = _FakeString("fo") self.assertEqual(str1, str1.decode()) actual = _FakeString("\\U00010332\\U0001033f\\U00010344") self.assertEqual("𐌲𐌿𐍄", actual.decode("unicode_escape")) self.assertRaises(UnicodeError, str2.decode, "punycode") self.assertEqual("", str2.decode("punycode", "ignore")) str3 = _FakeString("𐌲𐌿𐍄") actual = b"\xF0\x90\x8C\xB2\xF0\x90\x8C\xBF\xF0\x90\x8D\x84" self.assertEqual(b"fake string", str1.encode()) self.assertEqual(actual, str3.encode("utf-8")) self.assertEqual(actual, str3.encode(encoding="utf-8")) if getdefaultencoding() == "ascii": self.assertRaises(UnicodeEncodeError, str3.encode) elif getdefaultencoding() == "utf-8": self.assertEqual(actual, str3.encode()) self.assertRaises(UnicodeEncodeError, str3.encode, "ascii") self.assertRaises(UnicodeEncodeError, str3.encode, "ascii", "strict") if getdefaultencoding() == "ascii": self.assertRaises(UnicodeEncodeError, str3.encode, errors="strict") elif getdefaultencoding() == "utf-8": self.assertEqual(actual, str3.encode(errors="strict")) self.assertEqual(b"", str3.encode("ascii", "ignore")) if getdefaultencoding() == "ascii": self.assertEqual(b"", str3.encode(errors="ignore")) elif getdefaultencoding() == "utf-8": self.assertEqual(actual, str3.encode(errors="ignore")) self.assertTrue(str1.endswith("ing")) self.assertFalse(str1.endswith("ingh")) str4 = _FakeString("\tfoobar") self.assertEqual("fake string", str1) self.assertEqual(" foobar", str4.expandtabs()) self.assertEqual(" foobar", str4.expandtabs(4)) self.assertEqual(3, str1.find("e")) self.assertEqual(-1, str1.find("z")) self.assertEqual(7, str1.find("r", 7)) self.assertEqual(-1, str1.find("r", 8)) self.assertEqual(7, str1.find("r", 5, 9)) self.assertEqual(-1, str1.find("r", 5, 7)) str5 = _FakeString("foo{0}baz") str6 = _FakeString("foo{abc}baz") str7 = _FakeString("foo{0}{abc}buzz") str8 = _FakeString("{0}{1}") self.assertEqual("fake string", str1.format()) self.assertEqual("foobarbaz", str5.format("bar")) self.assertEqual("foobarbaz", str6.format(abc="bar")) self.assertEqual("foobarbazbuzz", str7.format("bar", abc="baz")) self.assertRaises(IndexError, str8.format, "abc") if py3k: self.assertEqual("fake string", str1.format_map({})) self.assertEqual("foobarbaz", str6.format_map({"abc": "bar"})) self.assertRaises(ValueError, str5.format_map, {0: "abc"}) self.assertEqual(3, str1.index("e")) self.assertRaises(ValueError, str1.index, "z") self.assertEqual(7, str1.index("r", 7)) self.assertRaises(ValueError, str1.index, "r", 8) self.assertEqual(7, str1.index("r", 5, 9)) self.assertRaises(ValueError, str1.index, "r", 5, 7) str9 = _FakeString("foobar") str10 = _FakeString("foobar123") str11 = _FakeString("foo bar") self.assertTrue(str9.isalnum()) self.assertTrue(str10.isalnum()) self.assertFalse(str11.isalnum()) self.assertTrue(str9.isalpha()) self.assertFalse(str10.isalpha()) self.assertFalse(str11.isalpha()) str12 = _FakeString("123") str13 = _FakeString("\u2155") str14 = _FakeString("\u00B2") self.assertFalse(str9.isdecimal()) self.assertTrue(str12.isdecimal()) self.assertFalse(str13.isdecimal()) self.assertFalse(str14.isdecimal()) self.assertFalse(str9.isdigit()) self.assertTrue(str12.isdigit()) self.assertFalse(str13.isdigit()) self.assertTrue(str14.isdigit()) if py3k: self.assertTrue(str9.isidentifier()) self.assertTrue(str10.isidentifier()) self.assertFalse(str11.isidentifier()) self.assertFalse(str12.isidentifier()) str15 = _FakeString("") str16 = _FakeString("FooBar") self.assertTrue(str9.islower()) self.assertFalse(str15.islower()) self.assertFalse(str16.islower()) self.assertFalse(str9.isnumeric()) self.assertTrue(str12.isnumeric()) self.assertTrue(str13.isnumeric()) self.assertTrue(str14.isnumeric()) if py3k: str16B = _FakeString("\x01\x02") self.assertTrue(str9.isprintable()) self.assertTrue(str13.isprintable()) self.assertTrue(str14.isprintable()) self.assertTrue(str15.isprintable()) self.assertFalse(str16B.isprintable()) str17 = _FakeString(" ") str18 = _FakeString("\t \t \r\n") self.assertFalse(str1.isspace()) self.assertFalse(str9.isspace()) self.assertTrue(str17.isspace()) self.assertTrue(str18.isspace()) str19 = _FakeString("This Sentence Looks Like A Title") str20 = _FakeString("This sentence doesn't LookLikeATitle") self.assertFalse(str15.istitle()) self.assertTrue(str19.istitle()) self.assertFalse(str20.istitle()) str21 = _FakeString("FOOBAR") self.assertFalse(str9.isupper()) self.assertFalse(str15.isupper()) self.assertTrue(str21.isupper()) self.assertEqual("foobar", str15.join(["foo", "bar"])) self.assertEqual("foo123bar123baz", str12.join(("foo", "bar", "baz"))) self.assertEqual("fake string ", str1.ljust(15)) self.assertEqual("fake string ", str1.ljust(16)) self.assertEqual("fake stringqqqq", str1.ljust(15, "q")) str22 = _FakeString("ß") self.assertEqual("", str15.lower()) self.assertEqual("foobar", str16.lower()) self.assertEqual("ß", str22.lower()) if py3k and not py32: self.assertEqual("", str15.casefold()) self.assertEqual("foobar", str16.casefold()) self.assertEqual("ss", str22.casefold()) str23 = _FakeString(" fake string ") self.assertEqual("fake string", str1.lstrip()) self.assertEqual("fake string ", str23.lstrip()) self.assertEqual("ke string", str1.lstrip("abcdef")) self.assertEqual(("fa", "ke", " string"), str1.partition("ke")) self.assertEqual(("fake string", "", ""), str1.partition("asdf")) str24 = _FakeString("boo foo moo") self.assertEqual("real string", str1.replace("fake", "real")) self.assertEqual("bu fu moo", str24.replace("oo", "u", 2)) self.assertEqual(3, str1.rfind("e")) self.assertEqual(-1, str1.rfind("z")) self.assertEqual(7, str1.rfind("r", 7)) self.assertEqual(-1, str1.rfind("r", 8)) self.assertEqual(7, str1.rfind("r", 5, 9)) self.assertEqual(-1, str1.rfind("r", 5, 7)) self.assertEqual(3, str1.rindex("e")) self.assertRaises(ValueError, str1.rindex, "z") self.assertEqual(7, str1.rindex("r", 7)) self.assertRaises(ValueError, str1.rindex, "r", 8) self.assertEqual(7, str1.rindex("r", 5, 9)) self.assertRaises(ValueError, str1.rindex, "r", 5, 7) self.assertEqual(" fake string", str1.rjust(15)) self.assertEqual(" fake string", str1.rjust(16)) self.assertEqual("qqqqfake string", str1.rjust(15, "q")) self.assertEqual(("fa", "ke", " string"), str1.rpartition("ke")) self.assertEqual(("", "", "fake string"), str1.rpartition("asdf")) str25 = _FakeString(" this is a sentence with whitespace ") actual = ["this", "is", "a", "sentence", "with", "whitespace"] self.assertEqual(actual, str25.rsplit()) self.assertEqual(actual, str25.rsplit(None)) actual = ["", "", "", "this", "is", "a", "", "", "sentence", "with", "", "whitespace", ""] self.assertEqual(actual, str25.rsplit(" ")) actual = [" this is a", "sentence", "with", "whitespace"] self.assertEqual(actual, str25.rsplit(None, 3)) actual = [" this is a sentence with", "", "whitespace", ""] self.assertEqual(actual, str25.rsplit(" ", 3)) if py3k and not py32: actual = [" this is a", "sentence", "with", "whitespace"] self.assertEqual(actual, str25.rsplit(maxsplit=3)) self.assertEqual("fake string", str1.rstrip()) self.assertEqual(" fake string", str23.rstrip()) self.assertEqual("fake stri", str1.rstrip("ngr")) actual = ["this", "is", "a", "sentence", "with", "whitespace"] self.assertEqual(actual, str25.split()) self.assertEqual(actual, str25.split(None)) actual = ["", "", "", "this", "is", "a", "", "", "sentence", "with", "", "whitespace", ""] self.assertEqual(actual, str25.split(" ")) actual = ["this", "is", "a", "sentence with whitespace "] self.assertEqual(actual, str25.split(None, 3)) actual = ["", "", "", "this is a sentence with whitespace "] self.assertEqual(actual, str25.split(" ", 3)) if py3k and not py32: actual = ["this", "is", "a", "sentence with whitespace "] self.assertEqual(actual, str25.split(maxsplit=3)) str26 = _FakeString("lines\nof\ntext\r\nare\r\npresented\nhere") self.assertEqual(["lines", "of", "text", "are", "presented", "here"], str26.splitlines()) self.assertEqual(["lines\n", "of\n", "text\r\n", "are\r\n", "presented\n", "here"], str26.splitlines(True)) self.assertTrue(str1.startswith("fake")) self.assertFalse(str1.startswith("faker")) self.assertEqual("fake string", str1.strip()) self.assertEqual("fake string", str23.strip()) self.assertEqual("ke stri", str1.strip("abcdefngr")) self.assertEqual("fOObAR", str16.swapcase()) self.assertEqual("Fake String", str1.title()) if py3k: table1 = StringMixIn.maketrans({97: "1", 101: "2", 105: "3", 111: "4", 117: "5"}) table2 = StringMixIn.maketrans("aeiou", "12345") table3 = StringMixIn.maketrans("aeiou", "12345", "rts") self.assertEqual("f1k2 str3ng", str1.translate(table1)) self.assertEqual("f1k2 str3ng", str1.translate(table2)) self.assertEqual("f1k2 3ng", str1.translate(table3)) else: table = {97: "1", 101: "2", 105: "3", 111: "4", 117: "5"} self.assertEqual("f1k2 str3ng", str1.translate(table)) self.assertEqual("", str15.upper()) self.assertEqual("FOOBAR", str16.upper()) self.assertEqual("123", str12.zfill(3)) self.assertEqual("000123", str12.zfill(6)) if __name__ == "__main__": unittest.main(verbosity=2) mwparserfromhell-0.4.2/tests/test_tag.py000066400000000000000000000371751255634533200204610ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals try: import unittest2 as unittest except ImportError: import unittest from mwparserfromhell.compat import str from mwparserfromhell.nodes import Tag, Template, Text from mwparserfromhell.nodes.extras import Attribute from ._test_tree_equality import TreeEqualityTestCase, wrap, wraptext agen = lambda name, value: Attribute(wraptext(name), wraptext(value)) agennv = lambda name: Attribute(wraptext(name)) agennq = lambda name, value: Attribute(wraptext(name), wraptext(value), None) agenp = lambda name, v, a, b, c: Attribute(wraptext(name), v, '"', a, b, c) agenpnv = lambda name, a, b, c: Attribute(wraptext(name), None, '"', a, b, c) class TestTag(TreeEqualityTestCase): """Test cases for the Tag node.""" def test_unicode(self): """test Tag.__unicode__()""" node1 = Tag(wraptext("ref")) node2 = Tag(wraptext("span"), wraptext("foo"), [agen("style", "color: red;")]) node3 = Tag(wraptext("ref"), attrs=[agennq("name", "foo"), agenpnv("some_attr", " ", "", "")], self_closing=True) node4 = Tag(wraptext("br"), self_closing=True, padding=" ") node5 = Tag(wraptext("br"), self_closing=True, implicit=True) node6 = Tag(wraptext("br"), self_closing=True, invalid=True, implicit=True) node7 = Tag(wraptext("br"), self_closing=True, invalid=True, padding=" ") node8 = Tag(wraptext("hr"), wiki_markup="----", self_closing=True) node9 = Tag(wraptext("i"), wraptext("italics!"), wiki_markup="''") self.assertEqual("", str(node1)) self.assertEqual('foo', str(node2)) self.assertEqual("", str(node3)) self.assertEqual("
    ", str(node4)) self.assertEqual("
    ", str(node5)) self.assertEqual("
    ", str(node6)) self.assertEqual("
    ", str(node7)) self.assertEqual("----", str(node8)) self.assertEqual("''italics!''", str(node9)) def test_children(self): """test Tag.__children__()""" # foobar node1 = Tag(wraptext("ref"), wraptext("foobar")) # '''bold text''' node2 = Tag(wraptext("b"), wraptext("bold text"), wiki_markup="'''") # node3 = Tag(wraptext("img"), attrs=[agen("id", "foo"), agen("class", "bar"), agennv("selected")], self_closing=True, padding=" ") gen1 = node1.__children__() gen2 = node2.__children__() gen3 = node3.__children__() self.assertEqual(node1.tag, next(gen1)) self.assertEqual(node3.tag, next(gen3)) self.assertEqual(node3.attributes[0].name, next(gen3)) self.assertEqual(node3.attributes[0].value, next(gen3)) self.assertEqual(node3.attributes[1].name, next(gen3)) self.assertEqual(node3.attributes[1].value, next(gen3)) self.assertEqual(node3.attributes[2].name, next(gen3)) self.assertEqual(node1.contents, next(gen1)) self.assertEqual(node2.contents, next(gen2)) self.assertEqual(node1.closing_tag, next(gen1)) self.assertRaises(StopIteration, next, gen1) self.assertRaises(StopIteration, next, gen2) self.assertRaises(StopIteration, next, gen3) def test_strip(self): """test Tag.__strip__()""" node1 = Tag(wraptext("i"), wraptext("foobar")) node2 = Tag(wraptext("math"), wraptext("foobar")) node3 = Tag(wraptext("br"), self_closing=True) for a in (True, False): for b in (True, False): self.assertEqual("foobar", node1.__strip__(a, b)) self.assertEqual(None, node2.__strip__(a, b)) self.assertEqual(None, node3.__strip__(a, b)) def test_showtree(self): """test Tag.__showtree__()""" output = [] getter, marker = object(), object() get = lambda code: output.append((getter, code)) mark = lambda: output.append(marker) node1 = Tag(wraptext("ref"), wraptext("text"), [agen("name", "foo"), agennv("selected")]) node2 = Tag(wraptext("br"), self_closing=True, padding=" ") node3 = Tag(wraptext("br"), self_closing=True, invalid=True, implicit=True, padding=" ") node1.__showtree__(output.append, get, mark) node2.__showtree__(output.append, get, mark) node3.__showtree__(output.append, get, mark) valid = [ "<", (getter, node1.tag), (getter, node1.attributes[0].name), " = ", marker, (getter, node1.attributes[0].value), (getter, node1.attributes[1].name), ">", (getter, node1.contents), "", "<", (getter, node2.tag), "/>", ""] self.assertEqual(valid, output) def test_tag(self): """test getter/setter for the tag attribute""" tag = wraptext("ref") node = Tag(tag, wraptext("text")) self.assertIs(tag, node.tag) self.assertIs(tag, node.closing_tag) node.tag = "span" self.assertWikicodeEqual(wraptext("span"), node.tag) self.assertWikicodeEqual(wraptext("span"), node.closing_tag) self.assertEqual("text", node) def test_contents(self): """test getter/setter for the contents attribute""" contents = wraptext("text") node = Tag(wraptext("ref"), contents) self.assertIs(contents, node.contents) node.contents = "text and a {{template}}" parsed = wrap([Text("text and a "), Template(wraptext("template"))]) self.assertWikicodeEqual(parsed, node.contents) self.assertEqual("text and a {{template}}", node) def test_attributes(self): """test getter for the attributes attribute""" attrs = [agen("name", "bar")] node1 = Tag(wraptext("ref"), wraptext("foo")) node2 = Tag(wraptext("ref"), wraptext("foo"), attrs) self.assertEqual([], node1.attributes) self.assertIs(attrs, node2.attributes) def test_wiki_markup(self): """test getter/setter for the wiki_markup attribute""" node = Tag(wraptext("i"), wraptext("italic text")) self.assertIs(None, node.wiki_markup) node.wiki_markup = "''" self.assertEqual("''", node.wiki_markup) self.assertEqual("''italic text''", node) node.wiki_markup = False self.assertFalse(node.wiki_markup) self.assertEqual("italic text", node) def test_self_closing(self): """test getter/setter for the self_closing attribute""" node = Tag(wraptext("ref"), wraptext("foobar")) self.assertFalse(node.self_closing) node.self_closing = True self.assertTrue(node.self_closing) self.assertEqual("", node) node.self_closing = 0 self.assertFalse(node.self_closing) self.assertEqual("foobar", node) def test_invalid(self): """test getter/setter for the invalid attribute""" node = Tag(wraptext("br"), self_closing=True, implicit=True) self.assertFalse(node.invalid) node.invalid = True self.assertTrue(node.invalid) self.assertEqual("
    ", node) node.invalid = 0 self.assertFalse(node.invalid) self.assertEqual("
    ", node) def test_implicit(self): """test getter/setter for the implicit attribute""" node = Tag(wraptext("br"), self_closing=True) self.assertFalse(node.implicit) node.implicit = True self.assertTrue(node.implicit) self.assertEqual("
    ", node) node.implicit = 0 self.assertFalse(node.implicit) self.assertEqual("
    ", node) def test_padding(self): """test getter/setter for the padding attribute""" node = Tag(wraptext("ref"), wraptext("foobar")) self.assertEqual("", node.padding) node.padding = " " self.assertEqual(" ", node.padding) self.assertEqual("foobar", node) node.padding = None self.assertEqual("", node.padding) self.assertEqual("foobar", node) self.assertRaises(ValueError, setattr, node, "padding", True) def test_closing_tag(self): """test getter/setter for the closing_tag attribute""" tag = wraptext("ref") node = Tag(tag, wraptext("foobar")) self.assertIs(tag, node.closing_tag) node.closing_tag = "ref {{ignore me}}" parsed = wrap([Text("ref "), Template(wraptext("ignore me"))]) self.assertWikicodeEqual(parsed, node.closing_tag) self.assertEqual("foobar", node) def test_wiki_style_separator(self): """test getter/setter for wiki_style_separator attribute""" node = Tag(wraptext("table"), wraptext("\n")) self.assertIs(None, node.wiki_style_separator) node.wiki_style_separator = "|" self.assertEqual("|", node.wiki_style_separator) node.wiki_markup = "{" self.assertEqual("{|\n{", node) node2 = Tag(wraptext("table"), wraptext("\n"), wiki_style_separator="|") self.assertEqual("|", node.wiki_style_separator) def test_closing_wiki_markup(self): """test getter/setter for closing_wiki_markup attribute""" node = Tag(wraptext("table"), wraptext("\n")) self.assertIs(None, node.closing_wiki_markup) node.wiki_markup = "{|" self.assertEqual("{|", node.closing_wiki_markup) node.closing_wiki_markup = "|}" self.assertEqual("|}", node.closing_wiki_markup) self.assertEqual("{|\n|}", node) node.wiki_markup = "!!" self.assertEqual("|}", node.closing_wiki_markup) self.assertEqual("!!\n|}", node) node.wiki_markup = False self.assertFalse(node.closing_wiki_markup) self.assertEqual("\n
    ", node) node2 = Tag(wraptext("table"), wraptext("\n"), attrs=[agen("id", "foo")], wiki_markup="{|", closing_wiki_markup="|}") self.assertEqual("|}", node2.closing_wiki_markup) self.assertEqual('{| id="foo"\n|}', node2) def test_has(self): """test Tag.has()""" node = Tag(wraptext("ref"), wraptext("cite"), [agen("name", "foo")]) self.assertTrue(node.has("name")) self.assertTrue(node.has(" name ")) self.assertTrue(node.has(wraptext("name"))) self.assertFalse(node.has("Name")) self.assertFalse(node.has("foo")) attrs = [agen("id", "foo"), agenp("class", "bar", " ", "\n", "\n"), agen("foo", "bar"), agenpnv("foo", " ", " \n ", " \t")] node2 = Tag(wraptext("div"), attrs=attrs, self_closing=True) self.assertTrue(node2.has("id")) self.assertTrue(node2.has("class")) self.assertTrue(node2.has(attrs[1].pad_first + str(attrs[1].name) + attrs[1].pad_before_eq)) self.assertTrue(node2.has(attrs[3])) self.assertTrue(node2.has(str(attrs[3]))) self.assertFalse(node2.has("idclass")) self.assertFalse(node2.has("id class")) self.assertFalse(node2.has("id=foo")) def test_get(self): """test Tag.get()""" attrs = [agen("name", "foo")] node = Tag(wraptext("ref"), wraptext("cite"), attrs) self.assertIs(attrs[0], node.get("name")) self.assertIs(attrs[0], node.get(" name ")) self.assertIs(attrs[0], node.get(wraptext("name"))) self.assertRaises(ValueError, node.get, "Name") self.assertRaises(ValueError, node.get, "foo") attrs = [agen("id", "foo"), agenp("class", "bar", " ", "\n", "\n"), agen("foo", "bar"), agenpnv("foo", " ", " \n ", " \t")] node2 = Tag(wraptext("div"), attrs=attrs, self_closing=True) self.assertIs(attrs[0], node2.get("id")) self.assertIs(attrs[1], node2.get("class")) self.assertIs(attrs[1], node2.get( attrs[1].pad_first + str(attrs[1].name) + attrs[1].pad_before_eq)) self.assertIs(attrs[3], node2.get(attrs[3])) self.assertIs(attrs[3], node2.get(str(attrs[3]))) self.assertIs(attrs[3], node2.get(" foo")) self.assertRaises(ValueError, node2.get, "idclass") self.assertRaises(ValueError, node2.get, "id class") self.assertRaises(ValueError, node2.get, "id=foo") def test_add(self): """test Tag.add()""" node = Tag(wraptext("ref"), wraptext("cite")) node.add("name", "value") node.add("name", "value", quotes=None) node.add("name", "value", quotes="'") node.add("name") node.add(1, False) node.add("style", "{{foobar}}") node.add("name", "value", '"', "\n", " ", " ") attr1 = ' name="value"' attr2 = " name=value" attr3 = " name='value'" attr4 = " name" attr5 = ' 1="False"' attr6 = ' style="{{foobar}}"' attr7 = '\nname = "value"' self.assertEqual(attr1, node.attributes[0]) self.assertEqual(attr2, node.attributes[1]) self.assertEqual(attr3, node.attributes[2]) self.assertEqual(attr4, node.attributes[3]) self.assertEqual(attr5, node.attributes[4]) self.assertEqual(attr6, node.attributes[5]) self.assertEqual(attr7, node.attributes[6]) self.assertEqual(attr7, node.get("name")) self.assertWikicodeEqual(wrap([Template(wraptext("foobar"))]), node.attributes[5].value) self.assertEqual("".join(("cite
    ")), node) self.assertRaises(ValueError, node.add, "name", "foo", quotes="bar") self.assertRaises(ValueError, node.add, "name", "a bc d", quotes=None) def test_remove(self): """test Tag.remove()""" attrs = [agen("id", "foo"), agenp("class", "bar", " ", "\n", "\n"), agen("foo", "bar"), agenpnv("foo", " ", " \n ", " \t")] node = Tag(wraptext("div"), attrs=attrs, self_closing=True) node.remove("class") self.assertEqual('
    ', node) node.remove("foo") self.assertEqual('
    ', node) self.assertRaises(ValueError, node.remove, "foo") node.remove("id") self.assertEqual('
    ', node) if __name__ == "__main__": unittest.main(verbosity=2) mwparserfromhell-0.4.2/tests/test_template.py000066400000000000000000000520531255634533200215110ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals try: import unittest2 as unittest except ImportError: import unittest from mwparserfromhell.compat import str from mwparserfromhell.nodes import HTMLEntity, Template, Text from mwparserfromhell.nodes.extras import Parameter from ._test_tree_equality import TreeEqualityTestCase, wrap, wraptext pgens = lambda k, v: Parameter(wraptext(k), wraptext(v), showkey=True) pgenh = lambda k, v: Parameter(wraptext(k), wraptext(v), showkey=False) class TestTemplate(TreeEqualityTestCase): """Test cases for the Template node.""" def test_unicode(self): """test Template.__unicode__()""" node = Template(wraptext("foobar")) self.assertEqual("{{foobar}}", str(node)) node2 = Template(wraptext("foo"), [pgenh("1", "bar"), pgens("abc", "def")]) self.assertEqual("{{foo|bar|abc=def}}", str(node2)) def test_children(self): """test Template.__children__()""" node2p1 = Parameter(wraptext("1"), wraptext("bar"), showkey=False) node2p2 = Parameter(wraptext("abc"), wrap([Text("def"), Text("ghi")]), showkey=True) node1 = Template(wraptext("foobar")) node2 = Template(wraptext("foo"), [node2p1, node2p2]) gen1 = node1.__children__() gen2 = node2.__children__() self.assertEqual(node1.name, next(gen1)) self.assertEqual(node2.name, next(gen2)) self.assertEqual(node2.params[0].value, next(gen2)) self.assertEqual(node2.params[1].name, next(gen2)) self.assertEqual(node2.params[1].value, next(gen2)) self.assertRaises(StopIteration, next, gen1) self.assertRaises(StopIteration, next, gen2) def test_strip(self): """test Template.__strip__()""" node1 = Template(wraptext("foobar")) node2 = Template(wraptext("foo"), [pgenh("1", "bar"), pgens("abc", "def")]) for a in (True, False): for b in (True, False): self.assertEqual(None, node1.__strip__(a, b)) self.assertEqual(None, node2.__strip__(a, b)) def test_showtree(self): """test Template.__showtree__()""" output = [] getter, marker = object(), object() get = lambda code: output.append((getter, code)) mark = lambda: output.append(marker) node1 = Template(wraptext("foobar")) node2 = Template(wraptext("foo"), [pgenh("1", "bar"), pgens("abc", "def")]) node1.__showtree__(output.append, get, mark) node2.__showtree__(output.append, get, mark) valid = [ "{{", (getter, node1.name), "}}", "{{", (getter, node2.name), " | ", marker, (getter, node2.params[0].name), " = ", marker, (getter, node2.params[0].value), " | ", marker, (getter, node2.params[1].name), " = ", marker, (getter, node2.params[1].value), "}}"] self.assertEqual(valid, output) def test_name(self): """test getter/setter for the name attribute""" name = wraptext("foobar") node1 = Template(name) node2 = Template(name, [pgenh("1", "bar")]) self.assertIs(name, node1.name) self.assertIs(name, node2.name) node1.name = "asdf" node2.name = "téstïng" self.assertWikicodeEqual(wraptext("asdf"), node1.name) self.assertWikicodeEqual(wraptext("téstïng"), node2.name) def test_params(self): """test getter for the params attribute""" node1 = Template(wraptext("foobar")) plist = [pgenh("1", "bar"), pgens("abc", "def")] node2 = Template(wraptext("foo"), plist) self.assertEqual([], node1.params) self.assertIs(plist, node2.params) def test_has(self): """test Template.has()""" node1 = Template(wraptext("foobar")) node2 = Template(wraptext("foo"), [pgenh("1", "bar"), pgens("\nabc ", "def")]) node3 = Template(wraptext("foo"), [pgenh("1", "a"), pgens("b", "c"), pgens("1", "d")]) node4 = Template(wraptext("foo"), [pgenh("1", "a"), pgens("b", " ")]) self.assertFalse(node1.has("foobar", False)) self.assertTrue(node2.has(1, False)) self.assertTrue(node2.has("abc", False)) self.assertFalse(node2.has("def", False)) self.assertTrue(node3.has("1", False)) self.assertTrue(node3.has(" b ", False)) self.assertTrue(node4.has("b", False)) self.assertTrue(node3.has("b", True)) self.assertFalse(node4.has("b", True)) self.assertFalse(node1.has_param("foobar", False)) self.assertTrue(node2.has_param(1, False)) def test_get(self): """test Template.get()""" node1 = Template(wraptext("foobar")) node2p1 = pgenh("1", "bar") node2p2 = pgens("abc", "def") node2 = Template(wraptext("foo"), [node2p1, node2p2]) node3p1 = pgens("b", "c") node3p2 = pgens("1", "d") node3 = Template(wraptext("foo"), [pgenh("1", "a"), node3p1, node3p2]) node4p1 = pgens(" b", " ") node4 = Template(wraptext("foo"), [pgenh("1", "a"), node4p1]) self.assertRaises(ValueError, node1.get, "foobar") self.assertIs(node2p1, node2.get(1)) self.assertIs(node2p2, node2.get("abc")) self.assertRaises(ValueError, node2.get, "def") self.assertIs(node3p1, node3.get("b")) self.assertIs(node3p2, node3.get("1")) self.assertIs(node4p1, node4.get("b ")) def test_add(self): """test Template.add()""" node1 = Template(wraptext("a"), [pgens("b", "c"), pgenh("1", "d")]) node2 = Template(wraptext("a"), [pgens("b", "c"), pgenh("1", "d")]) node3 = Template(wraptext("a"), [pgens("b", "c"), pgenh("1", "d")]) node4 = Template(wraptext("a"), [pgens("b", "c"), pgenh("1", "d")]) node5 = Template(wraptext("a"), [pgens("b", "c"), pgens(" d ", "e")]) node6 = Template(wraptext("a"), [pgens("b", "c"), pgens("b", "d"), pgens("b", "e")]) node7 = Template(wraptext("a"), [pgens("b", "c"), pgenh("1", "d")]) node8p = pgenh("1", "d") node8 = Template(wraptext("a"), [pgens("b", "c"), node8p]) node9 = Template(wraptext("a"), [pgens("b", "c"), pgenh("1", "d")]) node10 = Template(wraptext("a"), [pgens("b", "c"), pgenh("1", "e")]) node11 = Template(wraptext("a"), [pgens("b", "c")]) node12 = Template(wraptext("a"), [pgens("b", "c")]) node13 = Template(wraptext("a"), [ pgens("\nb ", " c"), pgens("\nd ", " e"), pgens("\nf ", " g")]) node14 = Template(wraptext("a\n"), [ pgens("b ", "c\n"), pgens("d ", " e"), pgens("f ", "g\n"), pgens("h ", " i\n")]) node15 = Template(wraptext("a"), [ pgens("b ", " c\n"), pgens("\nd ", " e"), pgens("\nf ", "g ")]) node16 = Template(wraptext("a"), [ pgens("\nb ", " c"), pgens("\nd ", " e"), pgens("\nf ", " g")]) node17 = Template(wraptext("a"), [pgenh("1", "b")]) node18 = Template(wraptext("a"), [pgenh("1", "b")]) node19 = Template(wraptext("a"), [pgenh("1", "b")]) node20 = Template(wraptext("a"), [pgenh("1", "b"), pgenh("2", "c"), pgenh("3", "d"), pgenh("4", "e")]) node21 = Template(wraptext("a"), [pgenh("1", "b"), pgenh("2", "c"), pgens("4", "d"), pgens("5", "e")]) node22 = Template(wraptext("a"), [pgenh("1", "b"), pgenh("2", "c"), pgens("4", "d"), pgens("5", "e")]) node23 = Template(wraptext("a"), [pgenh("1", "b")]) node24 = Template(wraptext("a"), [pgenh("1", "b")]) node25 = Template(wraptext("a"), [pgens("b", "c")]) node26 = Template(wraptext("a"), [pgenh("1", "b")]) node27 = Template(wraptext("a"), [pgenh("1", "b")]) node28 = Template(wraptext("a"), [pgens("1", "b")]) node29 = Template(wraptext("a"), [ pgens("\nb ", " c"), pgens("\nd ", " e"), pgens("\nf ", " g")]) node30 = Template(wraptext("a\n"), [ pgens("b ", "c\n"), pgens("d ", " e"), pgens("f ", "g\n"), pgens("h ", " i\n")]) node31 = Template(wraptext("a"), [ pgens("b ", " c\n"), pgens("\nd ", " e"), pgens("\nf ", "g ")]) node32 = Template(wraptext("a"), [ pgens("\nb ", " c "), pgens("\nd ", " e "), pgens("\nf ", " g ")]) node33 = Template(wraptext("a"), [pgens("b", "c"), pgens("d", "e"), pgens("b", "f"), pgens("b", "h"), pgens("i", "j")]) node34 = Template(wraptext("a"), [pgens("1", "b"), pgens("x", "y"), pgens("1", "c"), pgens("2", "d")]) node35 = Template(wraptext("a"), [pgens("1", "b"), pgens("x", "y"), pgenh("1", "c"), pgenh("2", "d")]) node36 = Template(wraptext("a"), [pgens("b", "c"), pgens("d", "e"), pgens("f", "g")]) node37 = Template(wraptext("a"), [pgenh("1", "")]) node38 = Template(wraptext("abc")) node39 = Template(wraptext("a"), [pgenh("1", " b ")]) node40 = Template(wraptext("a"), [pgenh("1", " b"), pgenh("2", " c")]) node41 = Template(wraptext("a"), [pgens("1", " b"), pgens("2", " c")]) node1.add("e", "f", showkey=True) node2.add(2, "g", showkey=False) node3.add("e", "foo|bar", showkey=True) node4.add("e", "f", showkey=True, before="b") node5.add("f", "g", showkey=True, before=" d ") node6.add("f", "g", showkey=True, before="b") self.assertRaises(ValueError, node7.add, "e", "f", showkey=True, before="q") node8.add("e", "f", showkey=True, before=node8p) node9.add("e", "f", showkey=True, before=pgenh("1", "d")) self.assertRaises(ValueError, node10.add, "e", "f", showkey=True, before=pgenh("1", "d")) node11.add("d", "foo=bar", showkey=True) node12.add("1", "foo=bar", showkey=False) node13.add("h", "i", showkey=True) node14.add("j", "k", showkey=True) node15.add("h", "i", showkey=True) node16.add("h", "i", showkey=True, preserve_spacing=False) node17.add("2", "c") node18.add("3", "c") node19.add("c", "d") node20.add("5", "f") node21.add("3", "f") node22.add("6", "f") node23.add("c", "foo=bar") node24.add("2", "foo=bar") node25.add("b", "d") node26.add("1", "foo=bar") node27.add("1", "foo=bar", showkey=True) node28.add("1", "foo=bar", showkey=False) node29.add("d", "foo") node30.add("f", "foo") node31.add("f", "foo") node32.add("d", "foo", preserve_spacing=False) node33.add("b", "k") node34.add("1", "e") node35.add("1", "e") node36.add("d", "h", before="b") node37.add(1, "b") node38.add("1", "foo") self.assertRaises(ValueError, node38.add, "z", "bar", showkey=False) node39.add("1", "c") node40.add("3", "d") node41.add("3", "d") self.assertEqual("{{a|b=c|d|e=f}}", node1) self.assertEqual("{{a|b=c|d|g}}", node2) self.assertEqual("{{a|b=c|d|e=foo|bar}}", node3) self.assertIsInstance(node3.params[2].value.get(1), HTMLEntity) self.assertEqual("{{a|e=f|b=c|d}}", node4) self.assertEqual("{{a|b=c|f=g| d =e}}", node5) self.assertEqual("{{a|b=c|b=d|f=g|b=e}}", node6) self.assertEqual("{{a|b=c|d}}", node7) self.assertEqual("{{a|b=c|e=f|d}}", node8) self.assertEqual("{{a|b=c|e=f|d}}", node9) self.assertEqual("{{a|b=c|e}}", node10) self.assertEqual("{{a|b=c|d=foo=bar}}", node11) self.assertEqual("{{a|b=c|foo=bar}}", node12) self.assertIsInstance(node12.params[1].value.get(1), HTMLEntity) self.assertEqual("{{a|\nb = c|\nd = e|\nf = g|\nh = i}}", node13) self.assertEqual("{{a\n|b =c\n|d = e|f =g\n|h = i\n|j =k\n}}", node14) self.assertEqual("{{a|b = c\n|\nd = e|\nf =g |h =i}}", node15) self.assertEqual("{{a|\nb = c|\nd = e|\nf = g|h=i}}", node16) self.assertEqual("{{a|b|c}}", node17) self.assertEqual("{{a|b|3=c}}", node18) self.assertEqual("{{a|b|c=d}}", node19) self.assertEqual("{{a|b|c|d|e|f}}", node20) self.assertEqual("{{a|b|c|4=d|5=e|f}}", node21) self.assertEqual("{{a|b|c|4=d|5=e|6=f}}", node22) self.assertEqual("{{a|b|c=foo=bar}}", node23) self.assertEqual("{{a|b|foo=bar}}", node24) self.assertIsInstance(node24.params[1].value.get(1), HTMLEntity) self.assertEqual("{{a|b=d}}", node25) self.assertEqual("{{a|foo=bar}}", node26) self.assertIsInstance(node26.params[0].value.get(1), HTMLEntity) self.assertEqual("{{a|1=foo=bar}}", node27) self.assertEqual("{{a|foo=bar}}", node28) self.assertIsInstance(node28.params[0].value.get(1), HTMLEntity) self.assertEqual("{{a|\nb = c|\nd = foo|\nf = g}}", node29) self.assertEqual("{{a\n|b =c\n|d = e|f =foo\n|h = i\n}}", node30) self.assertEqual("{{a|b = c\n|\nd = e|\nf =foo }}", node31) self.assertEqual("{{a|\nb = c |\nd =foo|\nf = g }}", node32) self.assertEqual("{{a|b=k|d=e|i=j}}", node33) self.assertEqual("{{a|1=e|x=y|2=d}}", node34) self.assertEqual("{{a|x=y|e|d}}", node35) self.assertEqual("{{a|b=c|d=h|f=g}}", node36) self.assertEqual("{{a|b}}", node37) self.assertEqual("{{abc|foo}}", node38) self.assertEqual("{{a|c}}", node39) self.assertEqual("{{a| b| c|d}}", node40) self.assertEqual("{{a|1= b|2= c|3= d}}", node41) def test_remove(self): """test Template.remove()""" node1 = Template(wraptext("foobar")) node2 = Template(wraptext("foo"), [pgenh("1", "bar"), pgens("abc", "def")]) node3 = Template(wraptext("foo"), [pgenh("1", "bar"), pgens("abc", "def")]) node4 = Template(wraptext("foo"), [pgenh("1", "bar"), pgenh("2", "baz")]) node5 = Template(wraptext("foo"), [ pgens(" a", "b"), pgens("b", "c"), pgens("a ", "d")]) node6 = Template(wraptext("foo"), [ pgens(" a", "b"), pgens("b", "c"), pgens("a ", "d")]) node7 = Template(wraptext("foo"), [ pgens("1 ", "a"), pgens(" 1", "b"), pgens("2", "c")]) node8 = Template(wraptext("foo"), [ pgens("1 ", "a"), pgens(" 1", "b"), pgens("2", "c")]) node9 = Template(wraptext("foo"), [ pgens("1 ", "a"), pgenh("1", "b"), pgenh("2", "c")]) node10 = Template(wraptext("foo"), [ pgens("1 ", "a"), pgenh("1", "b"), pgenh("2", "c")]) node11 = Template(wraptext("foo"), [ pgens(" a", "b"), pgens("b", "c"), pgens("a ", "d")]) node12 = Template(wraptext("foo"), [ pgens(" a", "b"), pgens("b", "c"), pgens("a ", "d")]) node13 = Template(wraptext("foo"), [ pgens(" a", "b"), pgens("b", "c"), pgens("a ", "d")]) node14 = Template(wraptext("foo"), [ pgens(" a", "b"), pgens("b", "c"), pgens("a ", "d")]) node15 = Template(wraptext("foo"), [ pgens(" a", "b"), pgens("b", "c"), pgens("a ", "d")]) node16 = Template(wraptext("foo"), [ pgens(" a", "b"), pgens("b", "c"), pgens("a ", "d")]) node17 = Template(wraptext("foo"), [ pgens("1 ", "a"), pgenh("1", "b"), pgenh("2", "c")]) node18 = Template(wraptext("foo"), [ pgens("1 ", "a"), pgenh("1", "b"), pgenh("2", "c")]) node19 = Template(wraptext("foo"), [ pgens("1 ", "a"), pgenh("1", "b"), pgenh("2", "c")]) node20 = Template(wraptext("foo"), [ pgens("1 ", "a"), pgenh("1", "b"), pgenh("2", "c")]) node21 = Template(wraptext("foo"), [ pgens("a", "b"), pgens("c", "d"), pgens("e", "f"), pgens("a", "b"), pgens("a", "b")]) node22 = Template(wraptext("foo"), [ pgens("a", "b"), pgens("c", "d"), pgens("e", "f"), pgens("a", "b"), pgens("a", "b")]) node23 = Template(wraptext("foo"), [ pgens("a", "b"), pgens("c", "d"), pgens("e", "f"), pgens("a", "b"), pgens("a", "b")]) node24 = Template(wraptext("foo"), [ pgens("a", "b"), pgens("c", "d"), pgens("e", "f"), pgens("a", "b"), pgens("a", "b")]) node25 = Template(wraptext("foo"), [ pgens("a", "b"), pgens("c", "d"), pgens("e", "f"), pgens("a", "b"), pgens("a", "b")]) node26 = Template(wraptext("foo"), [ pgens("a", "b"), pgens("c", "d"), pgens("e", "f"), pgens("a", "b"), pgens("a", "b")]) node27 = Template(wraptext("foo"), [pgenh("1", "bar")]) node28 = Template(wraptext("foo"), [pgenh("1", "bar")]) node2.remove("1") node2.remove("abc") node3.remove(1, keep_field=True) node3.remove("abc", keep_field=True) node4.remove("1", keep_field=False) node5.remove("a", keep_field=False) node6.remove("a", keep_field=True) node7.remove(1, keep_field=True) node8.remove(1, keep_field=False) node9.remove(1, keep_field=True) node10.remove(1, keep_field=False) node11.remove(node11.params[0], keep_field=False) node12.remove(node12.params[0], keep_field=True) node13.remove(node13.params[1], keep_field=False) node14.remove(node14.params[1], keep_field=True) node15.remove(node15.params[2], keep_field=False) node16.remove(node16.params[2], keep_field=True) node17.remove(node17.params[0], keep_field=False) node18.remove(node18.params[0], keep_field=True) node19.remove(node19.params[1], keep_field=False) node20.remove(node20.params[1], keep_field=True) node21.remove("a", keep_field=False) node22.remove("a", keep_field=True) node23.remove(node23.params[0], keep_field=False) node24.remove(node24.params[0], keep_field=True) node25.remove(node25.params[3], keep_field=False) node26.remove(node26.params[3], keep_field=True) self.assertRaises(ValueError, node1.remove, 1) self.assertRaises(ValueError, node1.remove, "a") self.assertRaises(ValueError, node2.remove, "1") self.assertEqual("{{foo}}", node2) self.assertEqual("{{foo||abc=}}", node3) self.assertEqual("{{foo|2=baz}}", node4) self.assertEqual("{{foo|b=c}}", node5) self.assertEqual("{{foo| a=|b=c}}", node6) self.assertEqual("{{foo|1 =|2=c}}", node7) self.assertEqual("{{foo|2=c}}", node8) self.assertEqual("{{foo||c}}", node9) self.assertEqual("{{foo|2=c}}", node10) self.assertEqual("{{foo|b=c|a =d}}", node11) self.assertEqual("{{foo| a=|b=c|a =d}}", node12) self.assertEqual("{{foo| a=b|a =d}}", node13) self.assertEqual("{{foo| a=b|b=|a =d}}", node14) self.assertEqual("{{foo| a=b|b=c}}", node15) self.assertEqual("{{foo| a=b|b=c|a =}}", node16) self.assertEqual("{{foo|b|c}}", node17) self.assertEqual("{{foo|1 =|b|c}}", node18) self.assertEqual("{{foo|1 =a|2=c}}", node19) self.assertEqual("{{foo|1 =a||c}}", node20) self.assertEqual("{{foo|c=d|e=f}}", node21) self.assertEqual("{{foo|a=|c=d|e=f}}", node22) self.assertEqual("{{foo|c=d|e=f|a=b|a=b}}", node23) self.assertEqual("{{foo|a=|c=d|e=f|a=b|a=b}}", node24) self.assertEqual("{{foo|a=b|c=d|e=f|a=b}}", node25) self.assertEqual("{{foo|a=b|c=d|e=f|a=|a=b}}", node26) self.assertRaises(ValueError, node27.remove, node28.get(1)) if __name__ == "__main__": unittest.main(verbosity=2) mwparserfromhell-0.4.2/tests/test_text.py000066400000000000000000000055501255634533200206620ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals try: import unittest2 as unittest except ImportError: import unittest from mwparserfromhell.compat import str from mwparserfromhell.nodes import Text class TestText(unittest.TestCase): """Test cases for the Text node.""" def test_unicode(self): """test Text.__unicode__()""" node = Text("foobar") self.assertEqual("foobar", str(node)) node2 = Text("fóóbar") self.assertEqual("fóóbar", str(node2)) def test_children(self): """test Text.__children__()""" node = Text("foobar") gen = node.__children__() self.assertRaises(StopIteration, next, gen) def test_strip(self): """test Text.__strip__()""" node = Text("foobar") for a in (True, False): for b in (True, False): self.assertIs(node, node.__strip__(a, b)) def test_showtree(self): """test Text.__showtree__()""" output = [] node1 = Text("foobar") node2 = Text("fóóbar") node3 = Text("𐌲𐌿𐍄") node1.__showtree__(output.append, None, None) node2.__showtree__(output.append, None, None) node3.__showtree__(output.append, None, None) res = ["foobar", r"f\xf3\xf3bar", "\\U00010332\\U0001033f\\U00010344"] self.assertEqual(res, output) def test_value(self): """test getter/setter for the value attribute""" node = Text("foobar") self.assertEqual("foobar", node.value) self.assertIsInstance(node.value, str) node.value = "héhéhé" self.assertEqual("héhéhé", node.value) self.assertIsInstance(node.value, str) if __name__ == "__main__": unittest.main(verbosity=2) mwparserfromhell-0.4.2/tests/test_tokens.py000066400000000000000000000103661255634533200212020ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals try: import unittest2 as unittest except ImportError: import unittest from mwparserfromhell.compat import py3k from mwparserfromhell.parser import tokens class TestTokens(unittest.TestCase): """Test cases for the Token class and its subclasses.""" def test_issubclass(self): """check that all classes within the tokens module are really Tokens""" for name in tokens.__all__: klass = getattr(tokens, name) self.assertTrue(issubclass(klass, tokens.Token)) self.assertIsInstance(klass(), klass) self.assertIsInstance(klass(), tokens.Token) def test_attributes(self): """check that Token attributes can be managed properly""" token1 = tokens.Token() token2 = tokens.Token(foo="bar", baz=123) self.assertEqual("bar", token2.foo) self.assertEqual(123, token2.baz) self.assertFalse(token1.foo) self.assertFalse(token2.bar) token1.spam = "eggs" token2.foo = "ham" del token2.baz self.assertEqual("eggs", token1.spam) self.assertEqual("ham", token2.foo) self.assertFalse(token2.baz) self.assertRaises(KeyError, delattr, token2, "baz") def test_repr(self): """check that repr() on a Token works as expected""" token1 = tokens.Token() token2 = tokens.Token(foo="bar", baz=123) token3 = tokens.Text(text="earwig" * 100) hundredchars = ("earwig" * 100)[:97] + "..." self.assertEqual("Token()", repr(token1)) if py3k: token2repr1 = "Token(foo='bar', baz=123)" token2repr2 = "Token(baz=123, foo='bar')" token3repr = "Text(text='" + hundredchars + "')" else: token2repr1 = "Token(foo=u'bar', baz=123)" token2repr2 = "Token(baz=123, foo=u'bar')" token3repr = "Text(text=u'" + hundredchars + "')" token2repr = repr(token2) self.assertTrue(token2repr == token2repr1 or token2repr == token2repr2) self.assertEqual(token3repr, repr(token3)) def test_equality(self): """check that equivalent tokens are considered equal""" token1 = tokens.Token() token2 = tokens.Token() token3 = tokens.Token(foo="bar", baz=123) token4 = tokens.Text(text="asdf") token5 = tokens.Text(text="asdf") token6 = tokens.TemplateOpen(text="asdf") self.assertEqual(token1, token2) self.assertEqual(token2, token1) self.assertEqual(token4, token5) self.assertEqual(token5, token4) self.assertNotEqual(token1, token3) self.assertNotEqual(token2, token3) self.assertNotEqual(token4, token6) self.assertNotEqual(token5, token6) def test_repr_equality(self): "check that eval(repr(token)) == token" tests = [ tokens.Token(), tokens.Token(foo="bar", baz=123), tokens.Text(text="earwig") ] for token in tests: self.assertEqual(token, eval(repr(token), vars(tokens))) if __name__ == "__main__": unittest.main(verbosity=2) mwparserfromhell-0.4.2/tests/test_utils.py000066400000000000000000000055271255634533200210420ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals try: import unittest2 as unittest except ImportError: import unittest from mwparserfromhell.nodes import Template, Text from mwparserfromhell.utils import parse_anything from ._test_tree_equality import TreeEqualityTestCase, wrap, wraptext class TestUtils(TreeEqualityTestCase): """Tests for the utils module, which provides parse_anything().""" def test_parse_anything_valid(self): """tests for valid input to utils.parse_anything()""" tests = [ (wraptext("foobar"), wraptext("foobar")), (Template(wraptext("spam")), wrap([Template(wraptext("spam"))])), ("fóóbar", wraptext("fóóbar")), (b"foob\xc3\xa1r", wraptext("foobár")), (123, wraptext("123")), (True, wraptext("True")), (None, wrap([])), ([Text("foo"), Text("bar"), Text("baz")], wraptext("foo", "bar", "baz")), ([wraptext("foo"), Text("bar"), "baz", 123, 456], wraptext("foo", "bar", "baz", "123", "456")), ([[[([[((("foo",),),)], "bar"],)]]], wraptext("foo", "bar")) ] for test, valid in tests: self.assertWikicodeEqual(valid, parse_anything(test)) def test_parse_anything_invalid(self): """tests for invalid input to utils.parse_anything()""" self.assertRaises(ValueError, parse_anything, Ellipsis) self.assertRaises(ValueError, parse_anything, object) self.assertRaises(ValueError, parse_anything, object()) self.assertRaises(ValueError, parse_anything, type) self.assertRaises(ValueError, parse_anything, ["foo", [object]]) if __name__ == "__main__": unittest.main(verbosity=2) mwparserfromhell-0.4.2/tests/test_wikicode.py000066400000000000000000000542131255634533200214740ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals from functools import partial import re from types import GeneratorType try: import unittest2 as unittest except ImportError: import unittest from mwparserfromhell.compat import py3k, str from mwparserfromhell.nodes import (Argument, Comment, Heading, HTMLEntity, Node, Tag, Template, Text, Wikilink) from mwparserfromhell.smart_list import SmartList from mwparserfromhell.wikicode import Wikicode from mwparserfromhell import parse from ._test_tree_equality import TreeEqualityTestCase, wrap, wraptext class TestWikicode(TreeEqualityTestCase): """Tests for the Wikicode class, which manages a list of nodes.""" def test_unicode(self): """test Wikicode.__unicode__()""" code1 = parse("foobar") code2 = parse("Have a {{template}} and a [[page|link]]") self.assertEqual("foobar", str(code1)) self.assertEqual("Have a {{template}} and a [[page|link]]", str(code2)) def test_nodes(self): """test getter/setter for the nodes attribute""" code = parse("Have a {{template}}") self.assertEqual(["Have a ", "{{template}}"], code.nodes) L1 = SmartList([Text("foobar"), Template(wraptext("abc"))]) L2 = [Text("barfoo"), Template(wraptext("cba"))] L3 = "abc{{def}}" code.nodes = L1 self.assertIs(L1, code.nodes) code.nodes = L2 self.assertIs(L2, code.nodes) code.nodes = L3 self.assertEqual(["abc", "{{def}}"], code.nodes) self.assertRaises(ValueError, setattr, code, "nodes", object) def test_get(self): """test Wikicode.get()""" code = parse("Have a {{template}} and a [[page|link]]") self.assertIs(code.nodes[0], code.get(0)) self.assertIs(code.nodes[2], code.get(2)) self.assertRaises(IndexError, code.get, 4) def test_set(self): """test Wikicode.set()""" code = parse("Have a {{template}} and a [[page|link]]") code.set(1, "{{{argument}}}") self.assertEqual("Have a {{{argument}}} and a [[page|link]]", code) self.assertIsInstance(code.get(1), Argument) code.set(2, None) self.assertEqual("Have a {{{argument}}}[[page|link]]", code) code.set(-3, "This is an ") self.assertEqual("This is an {{{argument}}}[[page|link]]", code) self.assertRaises(ValueError, code.set, 1, "foo {{bar}}") self.assertRaises(IndexError, code.set, 3, "{{baz}}") self.assertRaises(IndexError, code.set, -4, "{{baz}}") def test_index(self): """test Wikicode.index()""" code = parse("Have a {{template}} and a [[page|link]]") self.assertEqual(0, code.index("Have a ")) self.assertEqual(3, code.index("[[page|link]]")) self.assertEqual(1, code.index(code.get(1))) self.assertRaises(ValueError, code.index, "foo") code = parse("{{foo}}{{bar|{{baz}}}}") self.assertEqual(1, code.index("{{bar|{{baz}}}}")) self.assertEqual(1, code.index("{{baz}}", recursive=True)) self.assertEqual(1, code.index(code.get(1).get(1).value, recursive=True)) self.assertRaises(ValueError, code.index, "{{baz}}", recursive=False) self.assertRaises(ValueError, code.index, code.get(1).get(1).value, recursive=False) def test_insert(self): """test Wikicode.insert()""" code = parse("Have a {{template}} and a [[page|link]]") code.insert(1, "{{{argument}}}") self.assertEqual( "Have a {{{argument}}}{{template}} and a [[page|link]]", code) self.assertIsInstance(code.get(1), Argument) code.insert(2, None) self.assertEqual( "Have a {{{argument}}}{{template}} and a [[page|link]]", code) code.insert(-3, Text("foo")) self.assertEqual( "Have a {{{argument}}}foo{{template}} and a [[page|link]]", code) code2 = parse("{{foo}}{{bar}}{{baz}}") code2.insert(1, "abc{{def}}ghi[[jk]]") self.assertEqual("{{foo}}abc{{def}}ghi[[jk]]{{bar}}{{baz}}", code2) self.assertEqual(["{{foo}}", "abc", "{{def}}", "ghi", "[[jk]]", "{{bar}}", "{{baz}}"], code2.nodes) code3 = parse("{{foo}}bar") code3.insert(1000, "[[baz]]") code3.insert(-1000, "derp") self.assertEqual("derp{{foo}}bar[[baz]]", code3) def _test_search(self, meth, expected): """Base test for insert_before(), insert_after(), and replace().""" code = parse("{{a}}{{b}}{{c}}{{d}}{{e}}") func = partial(meth, code) func("{{b}}", "x", recursive=True) func("{{d}}", "[[y]]", recursive=False) func(code.get(2), "z") self.assertEqual(expected[0], code) self.assertRaises(ValueError, func, "{{r}}", "n", recursive=True) self.assertRaises(ValueError, func, "{{r}}", "n", recursive=False) fake = parse("{{a}}").get(0) self.assertRaises(ValueError, func, fake, "n", recursive=True) self.assertRaises(ValueError, func, fake, "n", recursive=False) code2 = parse("{{a}}{{a}}{{a}}{{b}}{{b}}{{b}}") func = partial(meth, code2) func(code2.get(1), "c", recursive=False) func("{{a}}", "d", recursive=False) func(code2.get(-1), "e", recursive=True) func("{{b}}", "f", recursive=True) self.assertEqual(expected[1], code2) code3 = parse("{{a|{{b}}|{{c|d={{f}}}}}}") func = partial(meth, code3) obj = code3.get(0).params[0].value.get(0) self.assertRaises(ValueError, func, obj, "x", recursive=False) func(obj, "x", recursive=True) self.assertRaises(ValueError, func, "{{f}}", "y", recursive=False) func("{{f}}", "y", recursive=True) self.assertEqual(expected[2], code3) code4 = parse("{{a}}{{b}}{{c}}{{d}}{{e}}{{f}}{{g}}{{h}}{{i}}{{j}}") func = partial(meth, code4) fake = parse("{{b}}{{c}}") self.assertRaises(ValueError, func, fake, "q", recursive=False) self.assertRaises(ValueError, func, fake, "q", recursive=True) func("{{b}}{{c}}", "w", recursive=False) func("{{d}}{{e}}", "x", recursive=True) func(wrap(code4.nodes[-2:]), "y", recursive=False) func(wrap(code4.nodes[-2:]), "z", recursive=True) self.assertEqual(expected[3], code4) self.assertRaises(ValueError, func, "{{c}}{{d}}", "q", recursive=False) self.assertRaises(ValueError, func, "{{c}}{{d}}", "q", recursive=True) code5 = parse("{{a|{{b}}{{c}}|{{f|{{g}}={{h}}{{i}}}}}}") func = partial(meth, code5) self.assertRaises(ValueError, func, "{{b}}{{c}}", "x", recursive=False) func("{{b}}{{c}}", "x", recursive=True) obj = code5.get(0).params[1].value.get(0).params[0].value self.assertRaises(ValueError, func, obj, "y", recursive=False) func(obj, "y", recursive=True) self.assertEqual(expected[4], code5) code6 = parse("here is {{some text and a {{template}}}}") func = partial(meth, code6) self.assertRaises(ValueError, func, "text and", "ab", recursive=False) func("text and", "ab", recursive=True) self.assertRaises(ValueError, func, "is {{some", "cd", recursive=False) func("is {{some", "cd", recursive=True) self.assertEqual(expected[5], code6) code7 = parse("{{foo}}{{bar}}{{baz}}{{foo}}{{baz}}") func = partial(meth, code7) obj = wrap([code7.get(0), code7.get(2)]) self.assertRaises(ValueError, func, obj, "{{lol}}") func("{{foo}}{{baz}}", "{{lol}}") self.assertEqual(expected[6], code7) def test_insert_before(self): """test Wikicode.insert_before()""" meth = lambda code, *args, **kw: code.insert_before(*args, **kw) expected = [ "{{a}}xz{{b}}{{c}}[[y]]{{d}}{{e}}", "d{{a}}cd{{a}}d{{a}}f{{b}}f{{b}}ef{{b}}", "{{a|x{{b}}|{{c|d=y{{f}}}}}}", "{{a}}w{{b}}{{c}}x{{d}}{{e}}{{f}}{{g}}{{h}}yz{{i}}{{j}}", "{{a|x{{b}}{{c}}|{{f|{{g}}=y{{h}}{{i}}}}}}", "here cdis {{some abtext and a {{template}}}}", "{{foo}}{{bar}}{{baz}}{{lol}}{{foo}}{{baz}}"] self._test_search(meth, expected) def test_insert_after(self): """test Wikicode.insert_after()""" meth = lambda code, *args, **kw: code.insert_after(*args, **kw) expected = [ "{{a}}{{b}}xz{{c}}{{d}}[[y]]{{e}}", "{{a}}d{{a}}dc{{a}}d{{b}}f{{b}}f{{b}}fe", "{{a|{{b}}x|{{c|d={{f}}y}}}}", "{{a}}{{b}}{{c}}w{{d}}{{e}}x{{f}}{{g}}{{h}}{{i}}{{j}}yz", "{{a|{{b}}{{c}}x|{{f|{{g}}={{h}}{{i}}y}}}}", "here is {{somecd text andab a {{template}}}}", "{{foo}}{{bar}}{{baz}}{{foo}}{{baz}}{{lol}}"] self._test_search(meth, expected) def test_replace(self): """test Wikicode.replace()""" meth = lambda code, *args, **kw: code.replace(*args, **kw) expected = [ "{{a}}xz[[y]]{{e}}", "dcdffe", "{{a|x|{{c|d=y}}}}", "{{a}}wx{{f}}{{g}}z", "{{a|x|{{f|{{g}}=y}}}}", "here cd ab a {{template}}}}", "{{foo}}{{bar}}{{baz}}{{lol}}"] self._test_search(meth, expected) def test_append(self): """test Wikicode.append()""" code = parse("Have a {{template}}") code.append("{{{argument}}}") self.assertEqual("Have a {{template}}{{{argument}}}", code) self.assertIsInstance(code.get(2), Argument) code.append(None) self.assertEqual("Have a {{template}}{{{argument}}}", code) code.append(Text(" foo")) self.assertEqual("Have a {{template}}{{{argument}}} foo", code) self.assertRaises(ValueError, code.append, slice(0, 1)) def test_remove(self): """test Wikicode.remove()""" meth = lambda code, obj, value, **kw: code.remove(obj, **kw) expected = [ "{{a}}{{c}}", "", "{{a||{{c|d=}}}}", "{{a}}{{f}}", "{{a||{{f|{{g}}=}}}}", "here a {{template}}}}", "{{foo}}{{bar}}{{baz}}"] self._test_search(meth, expected) def test_matches(self): """test Wikicode.matches()""" code1 = parse("Cleanup") code2 = parse("\nstub") code3 = parse("") self.assertTrue(code1.matches("Cleanup")) self.assertTrue(code1.matches("cleanup")) self.assertTrue(code1.matches(" cleanup\n")) self.assertFalse(code1.matches("CLEANup")) self.assertFalse(code1.matches("Blah")) self.assertTrue(code2.matches("stub")) self.assertTrue(code2.matches("Stub")) self.assertFalse(code2.matches("StuB")) self.assertTrue(code1.matches(("cleanup", "stub"))) self.assertTrue(code2.matches(("cleanup", "stub"))) self.assertFalse(code2.matches(("StuB", "sTUb", "foobar"))) self.assertFalse(code2.matches(["StuB", "sTUb", "foobar"])) self.assertTrue(code2.matches(("StuB", "sTUb", "foo", "bar", "Stub"))) self.assertTrue(code2.matches(["StuB", "sTUb", "foo", "bar", "Stub"])) self.assertTrue(code3.matches("")) self.assertTrue(code3.matches("")) self.assertTrue(code3.matches(("a", "b", ""))) def test_filter_family(self): """test the Wikicode.i?filter() family of functions""" def genlist(gen): self.assertIsInstance(gen, GeneratorType) return list(gen) ifilter = lambda code: (lambda *a, **k: genlist(code.ifilter(*a, **k))) code = parse("a{{b}}c[[d]]{{{e}}}{{f}}[[g]]") for func in (code.filter, ifilter(code)): self.assertEqual(["a", "{{b}}", "b", "c", "[[d]]", "d", "{{{e}}}", "e", "{{f}}", "f", "[[g]]", "g"], func()) self.assertEqual(["{{{e}}}"], func(forcetype=Argument)) self.assertIs(code.get(4), func(forcetype=Argument)[0]) self.assertEqual(list("abcdefg"), func(forcetype=Text)) self.assertEqual([], func(forcetype=Heading)) self.assertRaises(TypeError, func, forcetype=True) funcs = [ lambda name, **kw: getattr(code, "filter_" + name)(**kw), lambda name, **kw: genlist(getattr(code, "ifilter_" + name)(**kw)) ] for get_filter in funcs: self.assertEqual(["{{{e}}}"], get_filter("arguments")) self.assertIs(code.get(4), get_filter("arguments")[0]) self.assertEqual([], get_filter("comments")) self.assertEqual([], get_filter("external_links")) self.assertEqual([], get_filter("headings")) self.assertEqual([], get_filter("html_entities")) self.assertEqual([], get_filter("tags")) self.assertEqual(["{{b}}", "{{f}}"], get_filter("templates")) self.assertEqual(list("abcdefg"), get_filter("text")) self.assertEqual(["[[d]]", "[[g]]"], get_filter("wikilinks")) code2 = parse("{{a|{{b}}|{{c|d={{f}}{{h}}}}}}") for func in (code2.filter, ifilter(code2)): self.assertEqual(["{{a|{{b}}|{{c|d={{f}}{{h}}}}}}"], func(recursive=False, forcetype=Template)) self.assertEqual(["{{a|{{b}}|{{c|d={{f}}{{h}}}}}}", "{{b}}", "{{c|d={{f}}{{h}}}}", "{{f}}", "{{h}}"], func(recursive=True, forcetype=Template)) code3 = parse("{{foobar}}{{FOO}}{{baz}}{{bz}}{{barfoo}}") for func in (code3.filter, ifilter(code3)): self.assertEqual(["{{foobar}}", "{{barfoo}}"], func(False, matches=lambda node: "foo" in node)) self.assertEqual(["{{foobar}}", "{{FOO}}", "{{barfoo}}"], func(False, matches=r"foo")) self.assertEqual(["{{foobar}}", "{{FOO}}"], func(matches=r"^{{foo.*?}}")) self.assertEqual(["{{foobar}}"], func(matches=r"^{{foo.*?}}", flags=re.UNICODE)) self.assertEqual(["{{baz}}", "{{bz}}"], func(matches=r"^{{b.*?z")) self.assertEqual(["{{baz}}"], func(matches=r"^{{b.+?z}}")) exp_rec = ["{{a|{{b}}|{{c|d={{f}}{{h}}}}}}", "{{b}}", "{{c|d={{f}}{{h}}}}", "{{f}}", "{{h}}"] exp_unrec = ["{{a|{{b}}|{{c|d={{f}}{{h}}}}}}"] self.assertEqual(exp_rec, code2.filter_templates()) self.assertEqual(exp_unrec, code2.filter_templates(recursive=False)) self.assertEqual(exp_rec, code2.filter_templates(recursive=True)) self.assertEqual(exp_rec, code2.filter_templates(True)) self.assertEqual(exp_unrec, code2.filter_templates(False)) self.assertEqual(["{{foobar}}"], code3.filter_templates( matches=lambda node: node.name.matches("Foobar"))) self.assertEqual(["{{baz}}", "{{bz}}"], code3.filter_templates(matches=r"^{{b.*?z")) self.assertEqual([], code3.filter_tags(matches=r"^{{b.*?z")) self.assertEqual([], code3.filter_tags(matches=r"^{{b.*?z", flags=0)) self.assertRaises(TypeError, code.filter_templates, a=42) self.assertRaises(TypeError, code.filter_templates, forcetype=Template) self.assertRaises(TypeError, code.filter_templates, 1, 0, 0, Template) code4 = parse("{{foo}}{{foo|{{bar}}}}") actual1 = code4.filter_templates(recursive=code4.RECURSE_OTHERS) actual2 = code4.filter_templates(code4.RECURSE_OTHERS) self.assertEqual(["{{foo}}", "{{foo|{{bar}}}}"], actual1) self.assertEqual(["{{foo}}", "{{foo|{{bar}}}}"], actual2) def test_get_sections(self): """test Wikicode.get_sections()""" page1 = parse("") page2 = parse("==Heading==") page3 = parse("===Heading===\nFoo bar baz\n====Gnidaeh====\n") p4_lead = "This is a lead.\n" p4_IA = "=== Section I.A ===\nSection I.A [[body]].\n" p4_IB1 = "==== Section I.B.1 ====\nSection I.B.1 body.\n\n•Some content.\n\n" p4_IB = "=== Section I.B ===\n" + p4_IB1 p4_I = "== Section I ==\nSection I body. {{and a|template}}\n" + p4_IA + p4_IB p4_II = "== Section II ==\nSection II body.\n\n" p4_IIIA1a = "===== Section III.A.1.a =====\nMore text.\n" p4_IIIA2ai1 = "======= Section III.A.2.a.i.1 =======\nAn invalid section!" p4_IIIA2 = "==== Section III.A.2 ====\nEven more text.\n" + p4_IIIA2ai1 p4_IIIA = "=== Section III.A ===\nText.\n" + p4_IIIA1a + p4_IIIA2 p4_III = "== Section III ==\n" + p4_IIIA page4 = parse(p4_lead + p4_I + p4_II + p4_III) self.assertEqual([""], page1.get_sections()) self.assertEqual(["", "==Heading=="], page2.get_sections()) self.assertEqual(["", "===Heading===\nFoo bar baz\n====Gnidaeh====\n", "====Gnidaeh====\n"], page3.get_sections()) self.assertEqual([p4_lead, p4_I, p4_IA, p4_IB, p4_IB1, p4_II, p4_III, p4_IIIA, p4_IIIA1a, p4_IIIA2, p4_IIIA2ai1], page4.get_sections()) self.assertEqual(["====Gnidaeh====\n"], page3.get_sections(levels=[4])) self.assertEqual(["===Heading===\nFoo bar baz\n====Gnidaeh====\n"], page3.get_sections(levels=(2, 3))) self.assertEqual(["===Heading===\nFoo bar baz\n"], page3.get_sections(levels=(2, 3), flat=True)) self.assertEqual([], page3.get_sections(levels=[0])) self.assertEqual(["", "====Gnidaeh====\n"], page3.get_sections(levels=[4], include_lead=True)) self.assertEqual(["===Heading===\nFoo bar baz\n====Gnidaeh====\n", "====Gnidaeh====\n"], page3.get_sections(include_lead=False)) self.assertEqual(["===Heading===\nFoo bar baz\n", "====Gnidaeh====\n"], page3.get_sections(flat=True, include_lead=False)) self.assertEqual([p4_IB1, p4_IIIA2], page4.get_sections(levels=[4])) self.assertEqual([p4_IA, p4_IB, p4_IIIA], page4.get_sections(levels=[3])) self.assertEqual([p4_IA, "=== Section I.B ===\n", "=== Section III.A ===\nText.\n"], page4.get_sections(levels=[3], flat=True)) self.assertEqual(["", ""], page2.get_sections(include_headings=False)) self.assertEqual(["\nSection I.B.1 body.\n\n•Some content.\n\n", "\nEven more text.\n" + p4_IIIA2ai1], page4.get_sections(levels=[4], include_headings=False)) self.assertEqual([], page4.get_sections(matches=r"body")) self.assertEqual([p4_I, p4_IA, p4_IB, p4_IB1], page4.get_sections(matches=r"Section\sI[.\s].*?")) self.assertEqual([p4_IA, p4_IIIA, p4_IIIA1a, p4_IIIA2, p4_IIIA2ai1], page4.get_sections(matches=r".*?a.*?")) self.assertEqual([p4_IIIA1a, p4_IIIA2ai1], page4.get_sections(matches=r".*?a.*?", flags=re.U)) self.assertEqual(["\nMore text.\n", "\nAn invalid section!"], page4.get_sections(matches=r".*?a.*?", flags=re.U, include_headings=False)) sections = page2.get_sections(include_headings=False) sections[0].append("Lead!\n") sections[1].append("\nFirst section!") self.assertEqual("Lead!\n==Heading==\nFirst section!", page2) page5 = parse("X\n== Foo ==\nBar\n== Baz ==\nBuzz") section = page5.get_sections(matches="Foo")[0] section.replace("\nBar\n", "\nBarf ") section.append("{{Haha}}\n") self.assertEqual("== Foo ==\nBarf {{Haha}}\n", section) self.assertEqual("X\n== Foo ==\nBarf {{Haha}}\n== Baz ==\nBuzz", page5) def test_strip_code(self): """test Wikicode.strip_code()""" # Since individual nodes have test cases for their __strip__ methods, # we're only going to do an integration test: code = parse("Foo [[bar]]\n\n{{baz}}\n\n[[a|b]] Σ") self.assertEqual("Foo bar\n\nb Σ", code.strip_code(normalize=True, collapse=True)) self.assertEqual("Foo bar\n\n\n\nb Σ", code.strip_code(normalize=True, collapse=False)) self.assertEqual("Foo bar\n\nb Σ", code.strip_code(normalize=False, collapse=True)) self.assertEqual("Foo bar\n\n\n\nb Σ", code.strip_code(normalize=False, collapse=False)) def test_get_tree(self): """test Wikicode.get_tree()""" # Since individual nodes have test cases for their __showtree___ # methods, and the docstring covers all possibilities for the output of # __showtree__, we'll test it only: code = parse("Lorem ipsum {{foo|bar|{{baz}}|spam=eggs}}") expected = "Lorem ipsum \n{{\n\t foo\n\t| 1\n\t= bar\n\t| 2\n\t= " + \ "{{\n\t\t\tbaz\n\t }}\n\t| spam\n\t= eggs\n}}" self.assertEqual(expected.expandtabs(4), code.get_tree()) if __name__ == "__main__": unittest.main(verbosity=2) mwparserfromhell-0.4.2/tests/test_wikilink.py000066400000000000000000000103301255634533200215070ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (C) 2012-2015 Ben Kurtovic # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE # SOFTWARE. from __future__ import unicode_literals try: import unittest2 as unittest except ImportError: import unittest from mwparserfromhell.compat import str from mwparserfromhell.nodes import Text, Wikilink from ._test_tree_equality import TreeEqualityTestCase, wrap, wraptext class TestWikilink(TreeEqualityTestCase): """Test cases for the Wikilink node.""" def test_unicode(self): """test Wikilink.__unicode__()""" node = Wikilink(wraptext("foobar")) self.assertEqual("[[foobar]]", str(node)) node2 = Wikilink(wraptext("foo"), wraptext("bar")) self.assertEqual("[[foo|bar]]", str(node2)) def test_children(self): """test Wikilink.__children__()""" node1 = Wikilink(wraptext("foobar")) node2 = Wikilink(wraptext("foo"), wrap([Text("bar"), Text("baz")])) gen1 = node1.__children__() gen2 = node2.__children__() self.assertEqual(node1.title, next(gen1)) self.assertEqual(node2.title, next(gen2)) self.assertEqual(node2.text, next(gen2)) self.assertRaises(StopIteration, next, gen1) self.assertRaises(StopIteration, next, gen2) def test_strip(self): """test Wikilink.__strip__()""" node = Wikilink(wraptext("foobar")) node2 = Wikilink(wraptext("foo"), wraptext("bar")) for a in (True, False): for b in (True, False): self.assertEqual("foobar", node.__strip__(a, b)) self.assertEqual("bar", node2.__strip__(a, b)) def test_showtree(self): """test Wikilink.__showtree__()""" output = [] getter, marker = object(), object() get = lambda code: output.append((getter, code)) mark = lambda: output.append(marker) node1 = Wikilink(wraptext("foobar")) node2 = Wikilink(wraptext("foo"), wraptext("bar")) node1.__showtree__(output.append, get, mark) node2.__showtree__(output.append, get, mark) valid = [ "[[", (getter, node1.title), "]]", "[[", (getter, node2.title), " | ", marker, (getter, node2.text), "]]"] self.assertEqual(valid, output) def test_title(self): """test getter/setter for the title attribute""" title = wraptext("foobar") node1 = Wikilink(title) node2 = Wikilink(title, wraptext("baz")) self.assertIs(title, node1.title) self.assertIs(title, node2.title) node1.title = "héhehé" node2.title = "héhehé" self.assertWikicodeEqual(wraptext("héhehé"), node1.title) self.assertWikicodeEqual(wraptext("héhehé"), node2.title) def test_text(self): """test getter/setter for the text attribute""" text = wraptext("baz") node1 = Wikilink(wraptext("foobar")) node2 = Wikilink(wraptext("foobar"), text) self.assertIs(None, node1.text) self.assertIs(text, node2.text) node1.text = "buzz" node2.text = None self.assertWikicodeEqual(wraptext("buzz"), node1.text) self.assertIs(None, node2.text) if __name__ == "__main__": unittest.main(verbosity=2) mwparserfromhell-0.4.2/tests/tokenizer/000077500000000000000000000000001255634533200202725ustar00rootroot00000000000000mwparserfromhell-0.4.2/tests/tokenizer/arguments.mwtest000066400000000000000000000061351255634533200235510ustar00rootroot00000000000000name: blank label: argument with no content input: "{{{}}}" output: [ArgumentOpen(), ArgumentClose()] --- name: blank_with_default label: argument with no content but a pipe input: "{{{|}}}" output: [ArgumentOpen(), ArgumentSeparator(), ArgumentClose()] --- name: basic label: simplest type of argument input: "{{{argument}}}" output: [ArgumentOpen(), Text(text="argument"), ArgumentClose()] --- name: default label: argument with a default value input: "{{{foo|bar}}}" output: [ArgumentOpen(), Text(text="foo"), ArgumentSeparator(), Text(text="bar"), ArgumentClose()] --- name: blank_with_multiple_defaults label: no content, multiple pipes input: "{{{|||}}}" output: [ArgumentOpen(), ArgumentSeparator(), Text(text="||"), ArgumentClose()] --- name: multiple_defaults label: multiple values separated by pipes input: "{{{foo|bar|baz}}}" output: [ArgumentOpen(), Text(text="foo"), ArgumentSeparator(), Text(text="bar|baz"), ArgumentClose()] --- name: newline label: newline as only content input: "{{{\n}}}" output: [ArgumentOpen(), Text(text="\n"), ArgumentClose()] --- name: right_braces label: multiple } scattered throughout text input: "{{{foo}b}a}r}}}" output: [ArgumentOpen(), Text(text="foo}b}a}r"), ArgumentClose()] --- name: right_braces_default label: multiple } scattered throughout text, with a default value input: "{{{foo}b}|}a}r}}}" output: [ArgumentOpen(), Text(text="foo}b}"), ArgumentSeparator(), Text(text="}a}r"), ArgumentClose()] --- name: nested label: an argument nested within another argument input: "{{{{{{foo}}}|{{{bar}}}}}}" output: [ArgumentOpen(), ArgumentOpen(), Text(text="foo"), ArgumentClose(), ArgumentSeparator(), ArgumentOpen(), Text(text="bar"), ArgumentClose(), ArgumentClose()] --- name: invalid_braces label: invalid argument: multiple braces that are not part of a template or argument input: "{{{foo{{[a}}}}}" output: [Text(text="{{{foo{{[a}}}}}")] --- name: incomplete_open_only label: incomplete arguments: just an open input: "{{{" output: [Text(text="{{{")] --- name: incomplete_open_text label: incomplete arguments: an open with some text input: "{{{foo" output: [Text(text="{{{foo")] --- name: incomplete_open_text_pipe label: incomplete arguments: an open, text, then a pipe input: "{{{foo|" output: [Text(text="{{{foo|")] --- name: incomplete_open_pipe label: incomplete arguments: an open, then a pipe input: "{{{|" output: [Text(text="{{{|")] --- name: incomplete_open_pipe_text label: incomplete arguments: an open, then a pipe, then text input: "{{{|foo" output: [Text(text="{{{|foo")] --- name: incomplete_open_pipes_text label: incomplete arguments: a pipe, then text then two pipes input: "{{{|f||" output: [Text(text="{{{|f||")] --- name: incomplete_open_partial_close label: incomplete arguments: an open, then one right brace input: "{{{{}" output: [Text(text="{{{{}")] --- name: incomplete_preserve_previous label: incomplete arguments: a valid argument followed by an invalid one input: "{{{foo}}} {{{bar" output: [ArgumentOpen(), Text(text="foo"), ArgumentClose(), Text(text=" {{{bar")] mwparserfromhell-0.4.2/tests/tokenizer/comments.mwtest000066400000000000000000000020041255634533200233600ustar00rootroot00000000000000name: blank label: a blank comment input: "" output: [CommentStart(), CommentEnd()] --- name: basic label: a basic comment input: "" output: [CommentStart(), Text(text=" comment "), CommentEnd()] --- name: tons_of_nonsense label: a comment with tons of ignorable garbage in it input: "" output: [CommentStart(), Text(text=" foo{{bar}}[[basé\n\n]{}{}{}{}]{{{{{{haha{{--a>aabsp;" output: [Text(text="&n"), CommentStart(), Text(text="foo"), CommentEnd(), Text(text="bsp;")] --- name: rich_tags label: a HTML tag with tons of other things in it input: "{{dubious claim}}[[Source]]" output: [TemplateOpen(), Text(text="dubious claim"), TemplateClose(), TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagAttrEquals(), TemplateOpen(), Text(text="abc"), TemplateClose(), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="foo"), TagAttrEquals(), TagAttrQuote(char="\""), Text(text="bar "), TemplateOpen(), Text(text="baz"), TemplateClose(), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="abc"), TagAttrEquals(), TemplateOpen(), Text(text="de"), TemplateClose(), Text(text="f"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="ghi"), TagAttrEquals(), Text(text="j"), TemplateOpen(), Text(text="k"), TemplateClose(), TemplateOpen(), Text(text="l"), TemplateClose(), TagAttrStart(pad_first=" \n ", pad_before_eq=" ", pad_after_eq=" "), Text(text="mno"), TagAttrEquals(), TagAttrQuote(char="\""), TemplateOpen(), Text(text="p"), TemplateClose(), Text(text=" "), WikilinkOpen(), Text(text="q"), WikilinkClose(), Text(text=" "), TemplateOpen(), Text(text="r"), TemplateClose(), TagCloseOpen(padding=""), WikilinkOpen(), Text(text="Source"), WikilinkClose(), TagOpenClose(), Text(text="ref"), TagCloseClose()] --- name: wildcard label: a wildcard assortment of various things input: "{{{{{{{{foo}}bar|baz=biz}}buzz}}usr|{{bin}}}}" output: [TemplateOpen(), TemplateOpen(), TemplateOpen(), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text="bar"), TemplateParamSeparator(), Text(text="baz"), TemplateParamEquals(), Text(text="biz"), TemplateClose(), Text(text="buzz"), TemplateClose(), Text(text="usr"), TemplateParamSeparator(), TemplateOpen(), Text(text="bin"), TemplateClose(), TemplateClose()] --- name: wildcard_redux label: an even wilder assortment of various things input: "{{a|b|{{c|[[d]]{{{e}}}}}}}[[f|{{{g}}}]]{{i|j= }}" output: [TemplateOpen(), Text(text="a"), TemplateParamSeparator(), Text(text="b"), TemplateParamSeparator(), TemplateOpen(), Text(text="c"), TemplateParamSeparator(), WikilinkOpen(), Text(text="d"), WikilinkClose(), ArgumentOpen(), Text(text="e"), ArgumentClose(), TemplateClose(), TemplateClose(), WikilinkOpen(), Text(text="f"), WikilinkSeparator(), ArgumentOpen(), Text(text="g"), ArgumentClose(), CommentStart(), Text(text="h"), CommentEnd(), WikilinkClose(), TemplateOpen(), Text(text="i"), TemplateParamSeparator(), Text(text="j"), TemplateParamEquals(), HTMLEntityStart(), Text(text="nbsp"), HTMLEntityEnd(), TemplateClose()] --- name: link_inside_dl label: an external link inside a def list, such that the external link is parsed input: ";;;mailto:example" output: [TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), ExternalLinkOpen(brackets=False), Text(text="mailto:example"), ExternalLinkClose()] --- name: link_inside_dl_2 label: an external link inside a def list, such that the external link is not parsed input: ";;;malito:example" output: [TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="malito"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="example")] --- name: link_inside_template label: an external link nested inside a template, before the end input: "{{URL|http://example.com}}" output: [TemplateOpen(), Text(text="URL"), TemplateParamSeparator(), ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose(), TemplateClose()] --- name: link_inside_template_2 label: an external link nested inside a template, before a separator input: "{{URL|http://example.com|foobar}}" output: [TemplateOpen(), Text(text="URL"), TemplateParamSeparator(), ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose(), TemplateParamSeparator(), Text(text="foobar"), TemplateClose()] --- name: link_inside_template_3 label: an external link nested inside a template, before an equal sign input: "{{URL|http://example.com=foobar}}" output: [TemplateOpen(), Text(text="URL"), TemplateParamSeparator(), ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose(), TemplateParamEquals(), Text(text="foobar"), TemplateClose()] --- name: link_inside_argument label: an external link nested inside an argument input: "{{{URL|http://example.com}}}" output: [ArgumentOpen(), Text(text="URL"), ArgumentSeparator(), ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose(), ArgumentClose()] --- name: link_inside_heading label: an external link nested inside a heading input: "==http://example.com==" output: [HeadingStart(level=2), ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose(), HeadingEnd()] --- name: link_inside_tag_body label: an external link nested inside the body of a tag input: "http://example.com" output: [TagOpenOpen(), Text(text="ref"), TagCloseOpen(padding=""), ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose(), TagOpenClose(), Text(text="ref"), TagCloseClose()] --- name: link_inside_tag_style label: an external link nested inside style tags input: "''http://example.com''" output: [TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose(), TagOpenClose(), Text(text="i"), TagCloseClose()] --- name: style_tag_inside_link label: style tags disrupting an external link input: "http://example.com/foo''bar''" output: [ExternalLinkOpen(brackets=False), Text(text="http://example.com/foo"), ExternalLinkClose(), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="bar"), TagOpenClose(), Text(text="i"), TagCloseClose()] --- name: comment_inside_link label: an HTML comment inside an external link input: "http://example.com/foobar" output: [ExternalLinkOpen(brackets=False), Text(text="http://example.com/foo"), CommentStart(), Text(text="comment"), CommentEnd(), Text(text="bar"), ExternalLinkClose()] --- name: bracketed_link_inside_template label: a bracketed external link nested inside a template, before the end input: "{{URL|[http://example.com}}]" output: [Text(text="{{URL|"), ExternalLinkOpen(brackets=True), Text(text="http://example.com}}"), ExternalLinkClose()] --- name: comment_inside_bracketed_link label: an HTML comment inside a bracketed external link input: "[http://example.com/foobar]" output: [ExternalLinkOpen(brackets=True), Text(text="http://example.com/foo"), CommentStart(), Text(text="comment"), CommentEnd(), Text(text="bar"), ExternalLinkClose()] --- name: wikilink_inside_external_link label: a wikilink inside an external link, which the parser considers valid (see issue #61) input: "[http://example.com/foo Foo [[Bar]]]" output: [ExternalLinkOpen(brackets=True), Text(text="http://example.com/foo"), ExternalLinkSeparator(), Text(text="Foo "), WikilinkOpen(), Text(text="Bar"), WikilinkClose(), ExternalLinkClose()] --- name: external_link_inside_wikilink label: an external link inside a wikilink, valid in the case of images (see issue #62) input: "[[File:Example.png|thumb|http://example.com]]" output: [WikilinkOpen(), Text(text="File:Example.png"), WikilinkSeparator(), Text(text="thumb|"), ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose(), WikilinkClose()] --- name: external_link_inside_wikilink_brackets label: an external link with brackets inside a wikilink input: "[[File:Example.png|thumb|[http://example.com Example]]]" output: [WikilinkOpen(), Text(text="File:Example.png"), WikilinkSeparator(), Text(text="thumb|"), ExternalLinkOpen(brackets=True), Text(text="http://example.com"), ExternalLinkSeparator(), Text(text="Example"), ExternalLinkClose(), WikilinkClose()] --- name: external_link_inside_wikilink_title label: an external link inside a wikilink title, which is invalid input: "[[File:Example.png http://example.com]]" output: [WikilinkOpen(), Text(text="File:Example.png http://example.com"), WikilinkClose()] --- name: italics_inside_external_link_inside_incomplete_list label: italic text inside an external link inside an incomplete list input: "
  • [http://www.example.com ''example'']" output: [TagOpenOpen(), Text(text="li"), TagCloseSelfclose(padding="", implicit=True), ExternalLinkOpen(brackets=True), Text(text="http://www.example.com"), ExternalLinkSeparator(), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="example"), TagOpenClose(), Text(text="i"), TagCloseClose(), ExternalLinkClose()] --- name: nodes_inside_external_link_after_punct label: various complex nodes inside an external link following punctuation input: "http://example.com/foo.{{bar}}baz.&biz;bingo" output: [ExternalLinkOpen(brackets=False), Text(text="http://example.com/foo."), TemplateOpen(), Text(text="bar"), TemplateClose(), Text(text="baz.&biz;"), CommentStart(), Text(text="hello"), CommentEnd(), Text(text="bingo"), ExternalLinkClose()] --- name: newline_and_comment_in_template_name label: a template name containing a newline followed by a comment input: "{{foobar\n}}" output: [TemplateOpen(), Text(text="foobar\n"), CommentStart(), Text(text=" comment "), CommentEnd(), TemplateClose()] --- name: newline_and_comment_in_template_name_2 label: a template name containing a newline followed by a comment input: "{{foobar\n|key=value}}" output: [TemplateOpen(), Text(text="foobar\n"), CommentStart(), Text(text=" comment "), CommentEnd(), TemplateParamSeparator(), Text(text="key"), TemplateParamEquals(), Text(text="value"), TemplateClose()] --- name: newline_and_comment_in_template_name_3 label: a template name containing a newline followed by a comment input: "{{foobar\n\n|key=value}}" output: [TemplateOpen(), Text(text="foobar\n"), CommentStart(), Text(text=" comment "), CommentEnd(), Text(text="\n"), TemplateParamSeparator(), Text(text="key"), TemplateParamEquals(), Text(text="value"), TemplateClose()] --- name: newline_and_comment_in_template_name_4 label: a template name containing a newline followed by a comment input: "{{foobar\ninvalid|key=value}}" output: [Text(text="{{foobar\n"), CommentStart(), Text(text=" comment "), CommentEnd(), Text(text="invalid|key=value}}")] --- name: newline_and_comment_in_template_name_5 label: a template name containing a newline followed by a comment input: "{{foobar\n\ninvalid|key=value}}" output: [Text(text="{{foobar\n"), CommentStart(), Text(text=" comment "), CommentEnd(), Text(text="\ninvalid|key=value}}")] --- name: newline_and_comment_in_template_name_6 label: a template name containing a newline followed by a comment input: "{{foobar\n\nfoobar\n}}" output: [TemplateOpen(), CommentStart(), Text(text=" comment "), CommentEnd(), Text(text="\nfoobar\n"), CommentStart(), Text(text=" comment "), CommentEnd(), TemplateClose()] --- name: tag_in_link_title label: HTML tags are invalid in link titles, even when complete input: "[[foobarbaz]]" output: [Text(text="[[foo"), TagOpenOpen(), Text(text="i"), TagCloseOpen(padding=""), Text(text="bar"), TagOpenClose(), Text(text="i"), TagCloseClose(), Text(text="baz]]")] --- name: tag_in_template_name label: HTML tags are invalid in template names, even when complete input: "{{foobarbaz}}" output: [Text(text="{{foo"), TagOpenOpen(), Text(text="i"), TagCloseOpen(padding=""), Text(text="bar"), TagOpenClose(), Text(text="i"), TagCloseClose(), Text(text="baz}}")] --- name: tag_in_link_text label: HTML tags are valid in link text input: "[[foo|barbaz]]" output: [WikilinkOpen(), Text(text="foo"), WikilinkSeparator(), TagOpenOpen(), Text(text="i"), TagCloseOpen(padding=""), Text(text="bar"), TagOpenClose(), Text(text="i"), TagCloseClose(), Text(text="baz"), WikilinkClose()] --- name: comment_in_link_title label: comments are valid in link titles input: "[[foobaz]]" output: [WikilinkOpen(), Text(text="foo"), CommentStart(), Text(text="bar"), CommentEnd(), Text(text="baz"), WikilinkClose()] --- name: incomplete_comment_in_link_title label: incomplete comments are invalid in link titles input: "[[foo