lxml_html_clean-0.1.1/.github/workflows/main.yml

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

name: Run Tox tests

jobs:
  tox_test:
    name: Tox test
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      - name: Run Tox tests
        id: test
        uses: fedora-python/tox-github-action@main
        with:
          tox_env: ${{ matrix.tox_env }}
          dnf_install: gcc libxml2-devel libxslt-devel
    strategy:
      matrix:
        tox_env:
          - py36
          - py37
          - py38
          - py39
          - py310
          - py311
          - py312
    # Use GitHub's Linux Docker host
    runs-on: ubuntu-latest


lxml_html_clean-0.1.1/.gitignore

__pycache__
*.pyc
*.pyo
.tox
dist/
docs/_build
build/
lxml_html_clean.egg-info/


lxml_html_clean-0.1.1/.readthedocs.yaml

# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

build:
  os: ubuntu-22.04
  tools:
    python: "3.12"

sphinx:
  configuration: docs/conf.py

python:
  install:
    - requirements: docs/requirements.txt


lxml_html_clean-0.1.1/CHANGES.rst

=========================
lxml_html_clean changelog
=========================

Unreleased
==========

0.1.1 (2024-04-05)
==================

Bugs fixed
----------

* The regular expression for image data URLs now supports multiple data URLs
  on a single line.

0.1.0 (2024-02-26)
==================

First official release of the split project.

Relevant changes from the lxml project before the split
========================================================

This part lists the lxml releases containing important changes related to the
HTML Cleaner functionality.

5.1.0 (2024-01-05)
==================

Bugs fixed
----------

* The HTML ``Cleaner()`` interpreted an accidentally provided string parameter
  for the ``host_whitelist`` as a list of characters and silently failed to
  reject any hosts.  Passing a non-collection is now rejected.

4.9.3 (2023-07-05)
==================

Bugs fixed
----------

* A memory leak in ``lxml.html.clean`` was resolved by switching to
  Cython 0.29.34+.

* URL checking in the HTML cleaner was improved.
  Patch by Tim McCormack.

4.6.5 (2021-12-12)
==================

Bugs fixed
----------

* A vulnerability (GHSL-2021-1038) in the HTML cleaner allowed sneaking script
  content through SVG images (CVE-2021-43818).

* A vulnerability (GHSL-2021-1037) in the HTML cleaner allowed sneaking script
  content through CSS imports and other crafted constructs (CVE-2021-43818).

4.6.3 (2021-03-21)
==================

Bugs fixed
----------

* A vulnerability (CVE-2021-28957) was discovered in the HTML Cleaner by
  Kevin Chung, which allowed JavaScript to pass through.  The cleaner now
  removes the HTML5 ``formaction`` attribute.
4.6.2 (2020-11-26)
==================

Bugs fixed
----------

* A vulnerability (CVE-2020-27783) was discovered in the HTML Cleaner by
  Yaniv Nizry, which allowed JavaScript to pass through.  The cleaner now
  removes more sneaky "style" content.

4.6.1 (2020-10-18)
==================

Bugs fixed
----------

* A vulnerability was discovered in the HTML Cleaner by Yaniv Nizry, which
  allowed JavaScript to pass through.  The cleaner now removes more sneaky
  "style" content.

4.5.2 (2020-07-09)
==================

Bugs fixed
----------

* ``Cleaner()`` now validates that only known configuration options can be set.

* ``Cleaner.clean_html()`` discarded comments and PIs regardless of the
  corresponding configuration option, if ``remove_unknown_tags`` was set.

4.2.5 (2018-09-09)
==================

Bugs fixed
----------

* JavaScript URLs that used URL escaping were not removed by the HTML cleaner.
  Security problem found by Omar Eissa.  (CVE-2018-19787)

4.0.0 (2017-09-17)
==================

Features added
--------------

* The modules ``lxml.builder``, ``lxml.html.diff`` and ``lxml.html.clean``
  are also compiled using Cython in order to speed them up.


lxml_html_clean-0.1.1/LICENSE.txt

Copyright (c) 2004 Infrae. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice,
   this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

3. Neither the name of Infrae nor the names of its contributors may be used
   to endorse or promote products derived from this software without specific
   prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL INFRAE OR CONTRIBUTORS BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


lxml_html_clean-0.1.1/README.md

# lxml_html_clean

## Motivation

This project was initially a part of [lxml](https://github.com/lxml/lxml).
Because the HTML cleaner is designed as blocklist-based, many reports about
possible security vulnerabilities were filed against lxml, which made the
project problematic for security-sensitive environments. We therefore decided
to extract the problematic part into a separate project.

## Installation

You can install this project directly via `pip install lxml_html_clean`,
or soon as an extra of lxml via `pip install lxml[html_clean]`. Either way
installs this project together with lxml itself.
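## Example

A minimal, illustrative example (the exact output markup can vary between
versions; cleaned fragments are typically wrapped in a `<div>`):

```python
from lxml_html_clean import clean_html

unsafe = '<p onclick="alert(1)">text</p><script>alert(2)</script>'
print(clean_html(unsafe))
# scripts and event-handler attributes are removed, e.g.:
# <div><p>text</p></div>
```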
## Documentation

[https://lxml-html-clean.readthedocs.io/](https://lxml-html-clean.readthedocs.io/)

## License

BSD-3-Clause


lxml_html_clean-0.1.1/docs/Makefile

# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = .
BUILDDIR      = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)


lxml_html_clean-0.1.1/docs/changes.rst

.. include:: ../CHANGES.rst


lxml_html_clean-0.1.1/docs/conf.py

# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

import sys
sys.path.insert(0, "..")

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = 'lxml_html_clean'
copyright = '2024, Lumír Balhar'
author = 'Lumír Balhar'

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = ['sphinx.ext.autodoc']
templates_path = ['_templates']
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']

autodoc_default_options = {
    'ignore-module-all': True,
    'private-members': True,
    'inherited-members': True,
}
autodoc_member_order = 'groupwise'

# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = 'sphinx_rtd_theme'
#html_static_path = ['_static']


lxml_html_clean-0.1.1/docs/index.rst

Welcome to lxml_html_clean's documentation!
===========================================

Motivation
----------

This project was initially a part of `lxml <https://github.com/lxml/lxml>`_.
Because the HTML cleaner is designed as blocklist-based, many reports about
possible security vulnerabilities were filed against lxml, which made the
project problematic for security-sensitive environments. We therefore decided
to extract the problematic part into a separate project.

Installation
------------

You can install this project directly via ``pip install lxml_html_clean``,
or soon as an extra of lxml via ``pip install lxml[html_clean]``. Either way
installs this project together with lxml itself.

Usage
=====

.. toctree::
   :maxdepth: 2

   usage

API
===

.. toctree::
   :maxdepth: 2

   lxml_html_clean

Changelog
=========
.. toctree::
   :maxdepth: 2

   changes

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`


lxml_html_clean-0.1.1/docs/lxml_html_clean.rst

lxml\_html\_clean package
=========================

lxml\_html\_clean.clean module
------------------------------

.. automodule:: lxml_html_clean.clean
   :members:
   :undoc-members:
   :show-inheritance:


lxml_html_clean-0.1.1/docs/modules.rst

lxml_html_clean
===============

.. toctree::
   :maxdepth: 4

   lxml_html_clean


lxml_html_clean-0.1.1/docs/requirements.txt

lxml
sphinx
sphinx-rtd-theme


lxml_html_clean-0.1.1/docs/usage.rst

Cleaning up HTML
================

The module ``lxml_html_clean`` provides a ``Cleaner`` class for cleaning up
HTML pages.  It supports removing embedded or script content, special tags,
CSS style annotations and much more.

Note: the HTML Cleaner in ``lxml_html_clean`` is **not** considered
appropriate **for security sensitive environments**.  See e.g.
`bleach <https://pypi.org/project/bleach/>`_ for an alternative.

Say, you have an overburdened web page from a hideous source which contains
lots of content that upsets browsers and tries to run unnecessary code on the
client side:

.. sourcecode:: pycon

    >>> html = '''\
    ... <html>
    ...  <head>
    ...    <script type="text/javascript" src="evil-site"></script>
    ...    <link rel="alternate" type="text/rss" src="evil-rss">
    ...    <style>
    ...      body {background-image: url(javascript:do_evil)};
    ...      div {color: expression(evil)};
    ...    </style>
    ...  </head>
    ...  <body onload="evil_function()">
    ...    <!-- I am interpreted for EVIL! -->
    ...    <a href="javascript:evil_function()">a link</a>
    ...    <a href="#" onclick="evil_function()">another link</a>
    ...    <p onclick="evil_function()">a paragraph</p>
    ...    <div style="display: none">secret EVIL!</div>
    ...    <object> of EVIL! </object>
    ...    <iframe src="evil-site"></iframe>
    ...    <form action="evil-site">
    ...      Password: <input type="password" name="password">
    ...    </form>
    ...    <blink>annoying EVIL!</blink>
    ...    <a href="evil-site">spam spam SPAM!</a>
    ...    <image src="evil!">
    ...  </body>
    ... </html>'''

To remove all the superfluous content from this unparsed document, use the
``clean_html`` function:

.. sourcecode:: pycon

    >>> from lxml_html_clean import clean_html
    >>> print(clean_html(html))
    <div><style>/* deleted */</style><body>
    <a href="">a link</a>
    <a href="#">another link</a>
    <p>a paragraph</p>
    <div>secret EVIL!</div>
     of EVIL!
    Password:
    annoying EVIL!<a href="evil-site">spam spam SPAM!</a>
    <img src="evil!"></body></div>
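``clean_html`` is a convenience shortcut: it runs a ``Cleaner`` created with
all default options.  A rough equivalent, spelled out (a sketch for
illustration, not the exact library internals):

.. sourcecode:: pycon

    >>> from lxml_html_clean import Cleaner
    >>> default_cleaner = Cleaner()                # all options at their defaults
    >>> result = default_cleaner.clean_html(html)  # same result as clean_html(html)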
The ``Cleaner`` class supports several keyword arguments to control exactly
which content is removed:

.. sourcecode:: pycon

    >>> from lxml_html_clean import Cleaner

    >>> cleaner = Cleaner(page_structure=False, links=False)
    >>> print(cleaner.clean_html(html))
    <html>
     <head>
      <link rel="alternate" type="text/rss" src="evil-rss">
      <style>/* deleted */</style>
     </head>
     <body>
      <a href="">a link</a>
      <a href="#">another link</a>
      <p>a paragraph</p>
      <div>secret EVIL!</div>
       of EVIL!
      Password:
      annoying EVIL!
      <a href="evil-site">spam spam SPAM!</a>
      <img src="evil!">
     </body>
    </html>

    >>> cleaner = Cleaner(style=True, links=True, add_nofollow=True,
    ...                   page_structure=False, safe_attrs_only=False)
    >>> print(cleaner.clean_html(html))
    <html>
     <head>
     </head>
     <body>
      <a href="">a link</a>
      <a href="#">another link</a>
      <p>a paragraph</p>
      <div>secret EVIL!</div>
       of EVIL!
      Password:
      annoying EVIL!
      <a href="evil-site" rel="nofollow">spam spam SPAM!</a>
      <img src="evil!">
     </body>
    </html>

You can also whitelist some otherwise dangerous content with
``Cleaner(host_whitelist=['www.youtube.com'])``, which would allow embedded
media from YouTube, while still filtering out embedded media from other sites.

See the docstring of ``Cleaner`` for the details of what can be cleaned.
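For example, a minimal host-whitelisting sketch (the iframe URL here is made
up for illustration):

.. sourcecode:: pycon

    >>> embed_cleaner = Cleaner(host_whitelist=['www.youtube.com'])
    >>> page = '<div><iframe src="https://www.youtube.com/embed/x"></iframe></div>'
    >>> cleaned = embed_cleaner.clean_html(page)  # the YouTube iframe is kept;
    ...                                           # iframes from other hosts are dropped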
autolink
--------

In addition to cleaning up malicious HTML, ``lxml_html_clean`` contains
functions to do other things to your HTML.  This includes autolinking::

   autolink(doc, ...)
   autolink_html(html, ...)

This finds anything that looks like a link (e.g., ``http://example.com``) in
the *text* of an HTML document, and turns it into an anchor.  It avoids making
bad links.

Links in the elements ``<textarea>``, ``<pre>``, ``<code>``, anything in the
class ``nolink``, and links that are already wrapped in ``<a>`` elements are
not autolinked:

.. sourcecode:: pycon

    >>> from lxml_html_clean import autolink_html
    >>> print(autolink_html('''
    ...    <div>A link in <textarea>http://foo.com</textarea></div>'''))
    <div>A link in <textarea>http://foo.com</textarea></div>
    >>> print(autolink_html('''
    ...    <div>A link in <a href="http://bar.com">http://bar.com</a></div>'''))
    <div>A link in <a href="http://bar.com">http://bar.com</a></div>
    >>> print(autolink_html('''
    ...    <div>A link in http://foo.com or
    ...    http://bar.com</div>'''))
    <div>A link in <a href="http://foo.com">http://foo.com</a> or
    <a href="http://bar.com">http://bar.com</a></div>
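While ``autolink_html`` takes and returns a string, ``autolink`` operates in
place on an already parsed tree.  A short sketch (the sample markup is made up
for illustration):

.. sourcecode:: pycon

    >>> from lxml.html import fromstring, tostring
    >>> from lxml_html_clean import autolink
    >>> doc = fromstring('<div>See http://example.com for details.</div>')
    >>> autolink(doc)                  # modifies the tree in place
    >>> b'<a href="http://example.com">' in tostring(doc)
    True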
There's also a word wrapping function that should probably be run after
autolink:

.. sourcecode:: pycon

    >>> from lxml_html_clean import word_break_html

    >>> def pascii(s):
    ...     print(s.encode('ascii', 'xmlcharrefreplace').decode('ascii'))

    >>> pascii(word_break_html(
    ...     u'''<div>Hey you
    ...     12345678901234567890123456789012345678901234567890</div>'''))
    <div>Hey you
    1234567890123456789012345678901234567890&#8203;1234567890</div>

Not everything is broken:

.. sourcecode:: pycon

    >>> pascii(word_break_html('''
    ...    <div>Hey you
    ...    <code>12345678901234567890123456789012345678901234567890</code></div>'''))
    <div>Hey you
    <code>12345678901234567890123456789012345678901234567890</code></div>
    >>> pascii(word_break_html('''
    ...    <a href="12345678901234567890123456789012345678901234567890">text</a>'''))
    <a href="12345678901234567890123456789012345678901234567890">text</a>


lxml_html_clean-0.1.1/tests/test_clean.py

import base64
import gzip
import io
import unittest

import lxml.html
from lxml_html_clean import Cleaner, clean_html


class CleanerTest(unittest.TestCase):

    def test_allow_tags(self):
        html = """
            <html>
            <head>
            </head>
            <body>
            <p>some text</p>
            <table>
            <tr>
            <td>hello</td><td>world</td>
            </tr>
            <tr>
            <td>hello</td><td>world</td>
            </tr>
            </table>
            <img>
            </body>
            </html>
            """

        html_root = lxml.html.document_fromstring(html)
        cleaner = Cleaner(
            remove_unknown_tags=False,
            allow_tags=['table', 'tr', 'td'])
        result = cleaner.clean_html(html_root)

        self.assertEqual(12-5+1, len(list(result.iter())))

    def test_allow_and_remove(self):
        with self.assertRaises(ValueError):
            Cleaner(allow_tags=['a'], remove_unknown_tags=True)

    def test_remove_unknown_tags(self):
        html = """<div><bar>lettuce, tomato, veggie patty</bar></div>"""
        clean_html = """<div>lettuce, tomato, veggie patty</div>"""
        cleaner = Cleaner(remove_unknown_tags=True)
        result = cleaner.clean_html(html)
        self.assertEqual(
            result,
            clean_html,
            msg="Unknown tags not removed. Got: %s" % result,
        )

    def test_safe_attrs_included(self):
        html = """<p><span style="color: #00ffff;">Cyan</span></p>"""

        safe_attrs = set(lxml.html.defs.safe_attrs)
        safe_attrs.add('style')

        cleaner = Cleaner(
            safe_attrs_only=True,
            safe_attrs=safe_attrs)
        result = cleaner.clean_html(html)

        self.assertEqual(html, result)

    def test_safe_attrs_excluded(self):
        html = """<p><span style="color: #00ffff;">Cyan</span></p>"""
        expected = """<p><span>Cyan</span></p>"""

        safe_attrs = set()
        cleaner = Cleaner(
            safe_attrs_only=True,
            safe_attrs=safe_attrs)
        result = cleaner.clean_html(html)

        self.assertEqual(expected, result)

    def test_clean_invalid_root_tag(self):
        # only testing that cleaning with invalid root tags works at all
        s = lxml.html.fromstring('parent <invalid tag>child</another>')
        self.assertEqual('parent child', clean_html(s).text_content())

        s = lxml.html.fromstring('<invalid tag>child</another>')
        self.assertEqual('child', clean_html(s).text_content())

    def test_clean_with_comments(self):
        html = """<p><!-- useless comment --><span style="color: #00ffff;">Cyan</span></p>"""
        s = lxml.html.fragment_fromstring(html)

        self.assertEqual(
            b'<p><span>Cyan</span></p>',
            lxml.html.tostring(clean_html(s)))
        self.assertEqual(
            '<p><span>Cyan</span></p>',
            clean_html(html))

        cleaner = Cleaner(comments=False)
        result = cleaner.clean_html(s)
        self.assertEqual(
            b'<p><!-- useless comment --><span>Cyan</span></p>',
            lxml.html.tostring(result))
        self.assertEqual(
            '<p><!-- useless comment --><span>Cyan</span></p>',
            cleaner.clean_html(html))

    def test_sneaky_noscript_in_style(self):
        # This gets parsed as <noscript> -> <style> -> "...</noscript>..."
        # and must not pass the fake </noscript> through into the output.
        html = '<noscript><style><a title="</noscript><img src=x onerror=alert(1)>">'
        s = lxml.html.fragment_fromstring(html)
        self.assertEqual(
            b'<noscript><style>/* deleted */</style></noscript>',
            lxml.html.tostring(clean_html(s)))

    def test_sneaky_js_in_math_style(self):
        # This gets parsed as <math> -> <style> -> "<img src=x onerror=alert(1)>",
        # thus passing any tag/script/whatever content through into the output.
        html = '<math><style><img src=x onerror=alert(1)></style></math>'
        s = lxml.html.fragment_fromstring(html)
        self.assertEqual(
            b'<math><style>/* deleted */</style></math>',
            lxml.html.tostring(clean_html(s)))

    def test_sneaky_import_in_style(self):
        # Prevent "@@importimport" -> "@import" replacement etc.
        style_codes = [
            "@@importimport(extstyle.css)",
            "@ @ import import(extstyle.css)",
            "@ @ importimport(extstyle.css)",
            "@@ import import(extstyle.css)",
            "@ @import import(extstyle.css)",
            "@@importimport()",
            "@@importimport() ()",
            "@/* ... */import()",
            "@im/* ... */port()",
            "@ @import/* ... */import()",
            "@ /* ... */ import()",
        ]
        for style_code in style_codes:
            html = '<style>%s</style>' % style_code
            s = lxml.html.fragment_fromstring(html)
            cleaned = lxml.html.tostring(clean_html(s))
            self.assertEqual(
                b'<style>/* deleted */</style>',
                cleaned,
                "%s -> %s" % (style_code, cleaned))

    def test_sneaky_schemes_in_style(self):
        style_codes = [
            "javasjavascript:cript:",
            "javascriptjavascript::",
            "javascriptjavascript:: :",
            "vbjavascript:cript:",
        ]
        for style_code in style_codes:
            html = '<style>%s</style>' % style_code
            s = lxml.html.fragment_fromstring(html)
            cleaned = lxml.html.tostring(clean_html(s))
            self.assertEqual(
                b'<style>/* deleted */</style>',
                cleaned,
                "%s -> %s" % (style_code, cleaned))

    def test_sneaky_urls_in_style(self):
        style_codes = [
            "url(data:image/svg+xml;base64,...)",
            "url(javasjavascript:cript:)",
            "url(javasjavascript:cript: ::)",
            "url(vbjavascript:cript:)",
            "url(vbjavascript:cript: :)",
        ]
        for style_code in style_codes:
            html = '<style>%s</style>' % style_code
            s = lxml.html.fragment_fromstring(html)
            cleaned = lxml.html.tostring(clean_html(s))
            self.assertEqual(
                b'<style>/* deleted */</style>',
                cleaned,
                "%s -> %s" % (style_code, cleaned))

    def test_svg_data_links(self):
        # Remove SVG images with potentially insecure content.
        svg = b'<svg onload="alert(123)" />'
        gzout = io.BytesIO()
        f = gzip.GzipFile(fileobj=gzout, mode='wb')
        f.write(svg)
        f.close()
        svgz = gzout.getvalue()
        svg_b64 = base64.b64encode(svg).decode('ASCII')
        svgz_b64 = base64.b64encode(svgz).decode('ASCII')
        urls = [
            "data:image/svg+xml;base64," + svg_b64,
            "data:image/svg+xml-compressed;base64," + svgz_b64,
        ]
        for url in urls:
            html = '<img src="%s">' % url
            s = lxml.html.fragment_fromstring(html)
            cleaned = lxml.html.tostring(clean_html(s))
            self.assertEqual(
                b'<img src="">',
                cleaned,
                "%s -> %s" % (url, cleaned))

    def test_image_data_links(self):
        data = b'123'
        data_b64 = base64.b64encode(data).decode('ASCII')
        urls = [
            "data:image/jpeg;base64," + data_b64,
            "data:image/apng;base64," + data_b64,
            "data:image/png;base64," + data_b64,
            "data:image/gif;base64," + data_b64,
            "data:image/webp;base64," + data_b64,
            "data:image/bmp;base64," + data_b64,
            "data:image/tiff;base64," + data_b64,
            "data:image/x-icon;base64," + data_b64,
        ]
        for url in urls:
            html = '<img src="%s">' % url
            s = lxml.html.fragment_fromstring(html)
            cleaned = lxml.html.tostring(clean_html(s))
            self.assertEqual(
                html.encode("UTF-8"),
                cleaned,
                "%s -> %s" % (url, cleaned))

    def test_image_data_links_in_style(self):
        data = b'123'
        data_b64 = base64.b64encode(data).decode('ASCII')
        urls = [
            "data:image/jpeg;base64," + data_b64,
            "data:image/apng;base64," + data_b64,
            "data:image/png;base64," + data_b64,
            "data:image/gif;base64," + data_b64,
            "data:image/webp;base64," + data_b64,
            "data:image/bmp;base64," + data_b64,
            "data:image/tiff;base64," + data_b64,
            "data:image/x-icon;base64," + data_b64,
        ]
        for url in urls:
            html = '<style> url(%s) </style>' % url
            s = lxml.html.fragment_fromstring(html)
            cleaned = lxml.html.tostring(clean_html(s))
            self.assertEqual(
                html.encode("UTF-8"),
                cleaned,
                "%s -> %s" % (url, cleaned))

    def test_image_data_links_in_inline_style(self):
        safe_attrs = set(lxml.html.defs.safe_attrs)
        safe_attrs.add('style')

        cleaner = Cleaner(
            safe_attrs_only=True,
            safe_attrs=safe_attrs)

        data = b'123'
        data_b64 = base64.b64encode(data).decode('ASCII')
        url = "url(data:image/jpeg;base64,%s)" % data_b64
        styles = [
            "background: %s" % url,
            "background: %s; background-image: %s" % (url, url),
        ]
        for style in styles:
            html = '<div style="%s"></div>' % style
            s = lxml.html.fragment_fromstring(html)
            cleaned = lxml.html.tostring(cleaner.clean_html(s))
            self.assertEqual(
                html.encode("UTF-8"),
                cleaned,
                "%s -> %s" % (style, cleaned))

    def test_formaction_attribute_in_button_input(self):
        # The formaction attribute overrides the form's action and should be
        # treated as a malicious link attribute
        html = ('<form id="test"><input type="text" formaction="javascript:alert(1)">'
                '</form><button form="test" formaction="javascript:alert(2)">')
        expected = ('<div><form id="test"><input type="text" formaction="">'
                    '</form><button form="test" formaction=""></button></div>')
        cleaner = Cleaner(
            forms=False,
            safe_attrs_only=False,
        )
        self.assertEqual(
            expected,
            cleaner.clean_html(html))

    def test_host_whitelist_slash_type_confusion(self):
        # Regression test: Accidentally passing a string when a 1-tuple was intended
        # creates a host_whitelist of the empty string; a malformed triple-slash
        # URL has an "empty host" according to urlsplit, and `"" in ""` passes.
        # So, don't allow user to accidentally pass a string for host_whitelist.
        html = '<div><iframe src="https:///evil.com/page"></iframe></div>'
        expected = '<div></div>'
        cleaner = Cleaner(frames=False, host_whitelist=["example.com"])
        self.assertEqual(expected, cleaner.clean_html(html))

    def test_host_whitelist_invalid(self):
        html = '<div><iframe src="https://example.com/page"></iframe></div>'
        with self.assertRaises(TypeError):
            Cleaner(frames=False, host_whitelist="example.com").clean_html(html)


lxml_html_clean-0.1.1/tests/test_clean.txt

>>> import re
>>> from lxml.html import fromstring, tostring
>>> from lxml_html_clean import clean, clean_html, Cleaner
>>> from lxml.html import usedoctest

>>> doc = '''\
... <html>
...  <head>
...    <script type="text/javascript" src="evil-site"></script>
...    <link rel="alternate" type="text/rss" src="evil-rss">
...    <style>
...      body {background-image: url(javascript:do_evil)};
...      div {color: expression(evil)};
...    </style>
...  </head>
...  <body onload="evil_function()">
...    <!-- I am interpreted for EVIL! -->
...    <a href="javascript:evil_function()">a link</a>
...    <a href="j\x01a\x02v\x03a\x04s\x05c\x06r\x07i\x0Ept:evil_function()">a control char link</a>
...    <a href="data:text/javascript,evil_function()">data</a>
...    <a href="#" onclick="evil_function()">another link</a>
...    <p onclick="evil_function()">a paragraph</p>
...    <div style="display: none">secret EVIL!</div>
...    <object> of EVIL! </object>
...    <iframe src="evil-site"></iframe>
...    <form action="evil-site">
...      Password: <input type="password" name="password">
...    </form>
...    <a href="evil-site">spam spam SPAM!</a>
...    <a href="evil-site">Author</a>
...    <image src="evil!">
...    Text
...  </body>
... </html>'''

>>> print(re.sub('[\x00-\x07\x0E]', '', doc))
<html>
 <head>
   <script type="text/javascript" src="evil-site"></script>
   <link rel="alternate" type="text/rss" src="evil-rss">
   <style>
     body {background-image: url(javascript:do_evil)};
     div {color: expression(evil)};
   </style>
 </head>
 <body onload="evil_function()">
   <!-- I am interpreted for EVIL! -->
   <a href="javascript:evil_function()">a link</a>
   <a href="javascript:evil_function()">a control char link</a>
   <a href="data:text/javascript,evil_function()">data</a>
   <a href="#" onclick="evil_function()">another link</a>
   <p onclick="evil_function()">a paragraph</p>
   <div style="display: none">secret EVIL!</div>
   <object> of EVIL! </object>
   <iframe src="evil-site"></iframe>
   <form action="evil-site">
     Password: <input type="password" name="password">
   </form>
   <a href="evil-site">spam spam SPAM!</a>
   <a href="evil-site">Author</a>
   <image src="evil!">
   Text
 </body>
</html>

>>> print(tostring(fromstring(doc)).decode("utf-8"))
<html>
 <head>
   <script type="text/javascript" src="evil-site"></script>
   <link rel="alternate" type="text/rss" src="evil-rss">
   <style>
     body {background-image: url(javascript:do_evil)};
     div {color: expression(evil)};
   </style>
 </head>
 <body onload="evil_function()">
   <!-- I am interpreted for EVIL! -->
   <a href="javascript:evil_function()">a link</a>
   <a href="javascript:evil_function()">a control char link</a>
   <a href="data:text/javascript,evil_function()">data</a>
   <a href="#" onclick="evil_function()">another link</a>
   <p onclick="evil_function()">a paragraph</p>
   <div style="display: none">secret EVIL!</div>
   <object> of EVIL! </object>
   <iframe src="evil-site"></iframe>
   <form action="evil-site">
     Password: <input type="password" name="password">
   </form>
   <a href="evil-site">spam spam SPAM!</a>
   <a href="evil-site">Author</a>
   <img src="evil!">
   Text
 </body>
</html>

>>> print(Cleaner(page_structure=False, comments=False).clean_html(doc))
<html>
 <head>
   <style>/* deleted */</style>
 </head>
 <body>
   <!-- I am interpreted for EVIL! -->
   <a href="">a link</a>
   <a href="">a control char link</a>
   <a href="">data</a>
   <a href="#">another link</a>
   <p>a paragraph</p>
   <div>secret EVIL!</div>
    of EVIL!
   Password:
   <a href="evil-site">spam spam SPAM!</a>
   <a href="evil-site">Author</a>
   <img src="evil!">
   Text
 </body>
</html>

>>> print(Cleaner(page_structure=False, safe_attrs_only=False).clean_html(doc))
<html>
 <head>
   <style>/* deleted */</style>
 </head>
 <body>
   <a href="">a link</a>
   <a href="">a control char link</a>
   <a href="">data</a>
   <a href="#">another link</a>
   <p>a paragraph</p>
   <div style="display: none">secret EVIL!</div>
    of EVIL!
   Password:
   <a href="evil-site">spam spam SPAM!</a>
   <a href="evil-site">Author</a>
   <img src="evil!">
   Text
 </body>
</html>

>>> print(Cleaner(style=True, inline_style=True, links=True, add_nofollow=True, page_structure=False, safe_attrs_only=False).clean_html(doc))
<html>
 <head>
 </head>
 <body>
   <a href="" rel="nofollow">a link</a>
   <a href="" rel="nofollow">a control char link</a>
   <a href="" rel="nofollow">data</a>
   <a href="#" rel="nofollow">another link</a>
   <p>a paragraph</p>
   <div>secret EVIL!</div>
    of EVIL!
   Password:
   <a href="evil-site" rel="nofollow">spam spam SPAM!</a>
   <a href="evil-site" rel="nofollow">Author</a>
   <img src="evil!">
   Text
 </body>
</html>

>>> print(Cleaner(style=True, inline_style=False, links=True, add_nofollow=True, page_structure=False, safe_attrs_only=False).clean_html(doc))
<html>
 <head>
 </head>
 <body>
   <a href="" rel="nofollow">a link</a>
   <a href="" rel="nofollow">a control char link</a>
   <a href="" rel="nofollow">data</a>
   <a href="#" rel="nofollow">another link</a>
   <p>a paragraph</p>
   <div style="display: none">secret EVIL!</div>
    of EVIL!
   Password:
   <a href="evil-site" rel="nofollow">spam spam SPAM!</a>
   <a href="evil-site" rel="nofollow">Author</a>
   <img src="evil!">
   Text
 </body>
</html>

>>> print(Cleaner(links=False, page_structure=False, javascript=True, host_whitelist=['example.com'], whitelist_tags=None).clean_html(doc))
<html>
 <head>
   <link rel="alternate" type="text/rss" src="evil-rss">
   <style>/* deleted */</style>
 </head>
 <body>
   <a href="">a link</a>
   <a href="">a control char link</a>
   <a href="">data</a>
   <a href="#">another link</a>
   <p>a paragraph</p>
   <div>secret EVIL!</div>
    of EVIL!
   Password:
   <a href="evil-site">spam spam SPAM!</a>
   <a href="evil-site">Author</a>
   <img src="evil!">
   Text
 </body>
</html>


lxml_html_clean-0.1.1/tests/test_clean_embed.txt

THIS FAILS IN libxml2 2.6.29 AND 2.6.30 !!

>>> from lxml.html import fromstring, tostring
>>> from lxml_html_clean import clean, clean_html, Cleaner
>>> from lxml.html import usedoctest

>>> def tostring(el):  # work-around for Py3 'bytes' type
...     from lxml.html import tostring
...     s = tostring(el)
...     if not isinstance(s, str):
...         s = s.decode('UTF-8')
...     return s

>>> doc_embed = '''<div>
... <object width="425" height="344">
... <param name="movie" value="http://www.youtube.com/v/some-video-id">
... <embed src="http://www.youtube.com/v/some-video-id" type="application/x-shockwave-flash" width="425" height="344"></embed>
... </object>
... </div>'''

>>> print(tostring(fromstring(doc_embed)))
<div>
<object width="425" height="344">
<param name="movie" value="http://www.youtube.com/v/some-video-id">
<embed src="http://www.youtube.com/v/some-video-id" type="application/x-shockwave-flash" width="425" height="344"></embed>
</object>
</div>

>>> print(Cleaner().clean_html(doc_embed))
<div>
</div>

>>> print(Cleaner(host_whitelist=['www.youtube.com']).clean_html(doc_embed))
<div>
<embed src="http://www.youtube.com/v/some-video-id" type="application/x-shockwave-flash" width="425" height="344"></embed>
</div>

>>> print(Cleaner(host_whitelist=['www.youtube.com'], whitelist_tags=None).clean_html(doc_embed))
<div>
<object width="425" height="344">
<param name="movie" value="http://www.youtube.com/v/some-video-id">
<embed src="http://www.youtube.com/v/some-video-id" type="application/x-shockwave-flash" width="425" height="344"></embed>
</object>
</div>
lxml_html_clean-0.1.1/tox.ini

[tox]
envlist = py36,py37,py38,py39,py310,py311,py312
skipsdist = True

[testenv]
commands =
    python -m unittest tests.test_clean
    python -m doctest tests/test_clean_embed.txt tests/test_clean.txt tests/test_autolink.txt
deps =
    lxml