html2text-2016.1.8/ 0000755 0001750 0001750 00000000000 12643763140 014600 5 ustar ubuntu ubuntu 0000000 0000000 html2text-2016.1.8/README.md 0000644 0001750 0001750 00000006250 12616413406 016057 0 ustar ubuntu ubuntu 0000000 0000000
# html2text

html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).

Usage: `html2text [(filename|url) [encoding]]`

| Option | Description |
|--------|-------------|
| `--version` | Show program's version number and exit |
| `-h`, `--help` | Show this help message and exit |
| `--ignore-links` | Don't include any formatting for links |
| `--escape-all` | Escape all special characters. Output is less readable, but avoids corner case formatting issues. |
| `--reference-links` | Use reference links instead of inline links to create markdown |
| `--mark-code` | Mark preformatted and code blocks with `[code]...[/code]` |

For a complete list of options see the [docs](docs/usage.md).

Or you can use it from within `Python`:

```
>>> import html2text
>>>
>>> print(html2text.html2text("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>"))
**Zed's** dead baby, _Zed's_ dead.
```

Or with some configuration options:

```
>>> import html2text
>>>
>>> h = html2text.HTML2Text()
>>> # Ignore converting links from HTML
>>> h.ignore_links = True
>>> print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))
Hello, world!
>>> # Don't ignore links anymore, I like links
>>> h.ignore_links = False
>>> print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))
Hello, [world](http://earth.google.com/)!
```
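To make the conversion above more concrete, here is a minimal sketch of the general idea using only the standard library's `html.parser`. It is a toy, not html2text's implementation: the class `TinyMarkdownConverter` and the helper `to_markdown` are hypothetical names, and only bold, emphasis, and links are handled.

```python
from html.parser import HTMLParser


class TinyMarkdownConverter(HTMLParser):
    """Toy converter (hypothetical): handles only bold, emphasis, and <a href>."""

    MARKERS = {"strong": "**", "b": "**", "em": "_", "i": "_"}

    def __init__(self, ignore_links=False):
        super().__init__(convert_charrefs=True)
        self.ignore_links = ignore_links
        self.parts = []   # output fragments, joined at the end
        self.hrefs = []   # stack of hrefs for the <a> tags currently open

    def handle_starttag(self, tag, attrs):
        if tag in self.MARKERS:
            self.parts.append(self.MARKERS[tag])
        elif tag == "a":
            href = dict(attrs).get("href")
            self.hrefs.append(href)
            if href and not self.ignore_links:
                self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in self.MARKERS:
            self.parts.append(self.MARKERS[tag])
        elif tag == "a" and self.hrefs:
            href = self.hrefs.pop()
            if href and not self.ignore_links:
                self.parts.append("](%s)" % href)

    def handle_data(self, data):
        self.parts.append(data)


def to_markdown(html, ignore_links=False):
    """Convert a small HTML fragment to Markdown-like text."""
    parser = TinyMarkdownConverter(ignore_links=ignore_links)
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts).strip()


print(to_markdown("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>"))
# -> **Zed's** dead baby, _Zed's_ dead.
print(to_markdown("<p>Hello, <a href='http://earth.google.com/'>world</a>!", ignore_links=True))
# -> Hello, world!
```

The real library covers far more (wrapping, lists, tables, escaping), so treat this only as an illustration of the tag-to-marker mapping.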
*Originally written by Aaron Swartz. This code is distributed under the GPLv3.*
## How to install
`html2text` is available on PyPI:
https://pypi.python.org/pypi/html2text
```
$ pip install html2text
```
## How to run unit tests

    PYTHONPATH=$PYTHONPATH:. coverage run --source=html2text setup.py test -v

To see the coverage results:

    coverage combine
    coverage html

then open the `./htmlcov/index.html` file in your browser.
## Documentation
Documentation lives [here](docs/index.md)
html2text-2016.1.8/html2text.egg-info/ 0000755 0001750 0001750 00000000000 12643763140 020225 5 ustar ubuntu ubuntu 0000000 0000000 html2text-2016.1.8/html2text.egg-info/not-zip-safe 0000644 0001750 0001750 00000000001 12471110761 022445 0 ustar ubuntu ubuntu 0000000 0000000
html2text-2016.1.8/html2text.egg-info/SOURCES.txt 0000644 0001750 0001750 00000004747 12643763140 022125 0 ustar ubuntu ubuntu 0000000 0000000
AUTHORS.rst
COPYING
ChangeLog.rst
MANIFEST.in
README.md
setup.cfg
setup.py
html2text/__init__.py
html2text/cli.py
html2text/compat.py
html2text/config.py
html2text/utils.py
html2text.egg-info/PKG-INFO
html2text.egg-info/SOURCES.txt
html2text.egg-info/dependency_links.txt
html2text.egg-info/entry_points.txt
html2text.egg-info/not-zip-safe
html2text.egg-info/top_level.txt
test/GoogleDocMassDownload.html
test/GoogleDocMassDownload.md
test/GoogleDocSaved.html
test/GoogleDocSaved.md
test/GoogleDocSaved_two.html
test/GoogleDocSaved_two.md
test/__init__.py
test/abbr_tag.html
test/abbr_tag.md
test/anchors.html
test/anchors.md
test/apos_element.html
test/apos_element.md
test/blockquote_example.html
test/blockquote_example.md
test/bodywidth_newline.html
test/bodywidth_newline.md
test/bold_inside_link.html
test/bold_inside_link.md
test/css_import_no_semicolon.html
test/css_import_no_semicolon.md
test/decript_tage.html
test/decript_tage.md
test/doc_with_table.html
test/doc_with_table.md
test/doc_with_table_bypass.html
test/doc_with_table_bypass.md
test/emdash-para.html
test/emdash-para.md
test/empty-link.html
test/empty-link.md
test/flip_emphasis.html
test/flip_emphasis.md
test/header_tags.html
test/header_tags.md
test/horizontal_rule.html
test/horizontal_rule.md
test/html-escaping.html
test/html-escaping.md
test/html_entities_out_of_text.html
test/html_entities_out_of_text.md
test/images_to_alt.html
test/images_to_alt.md
test/images_with_size.html
test/images_with_size.md
test/img-tag-with-link.html
test/img-tag-with-link.md
test/invalid_start.html
test/invalid_start.md
test/invalid_unicode.html
test/invalid_unicode.md
test/link_titles.html
test/link_titles.md
test/list_tags_example.html
test/list_tags_example.md
test/mark_code.html
test/mark_code.md
test/nbsp.html
test/nbsp.md
test/nbsp_unicode.html
test/nbsp_unicode.md
test/no_inline_links_example.html
test/no_inline_links_example.md
test/no_inline_links_images_to_alt.html
test/no_inline_links_images_to_alt.md
test/no_inline_links_nested.html
test/no_inline_links_nested.md
test/no_wrap_links.html
test/no_wrap_links.md
test/no_wrap_links_no_inline_links.html
test/no_wrap_links_no_inline_links.md
test/normal.html
test/normal.md
test/normal_escape_snob.html
test/normal_escape_snob.md
test/pre.html
test/pre.md
test/preformatted_in_list.html
test/preformatted_in_list.md
test/protect_links.html
test/protect_links.md
test/single_line_break.html
test/single_line_break.md
test/test_html2text.py
test/test_memleak.py
test/url-escaping.html
test/url-escaping.md
html2text-2016.1.8/html2text.egg-info/PKG-INFO 0000644 0001750 0001750 00000002115 12643763137 021327 0 ustar ubuntu ubuntu 0000000 0000000
Metadata-Version: 1.1
Name: html2text
Version: 2016.1.8
Summary: Turn HTML into equivalent Markdown-structured text.
Home-page: https://github.com/Alir3z4/html2text/
Author: Alireza Savand
Author-email: alireza.savand@gmail.com
License: GNU GPL 3
Description: UNKNOWN
Platform: OS Independent
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License (GPL)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.4
Classifier: Programming Language :: Python :: 2.5
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.0
Classifier: Programming Language :: Python :: 3.1
Classifier: Programming Language :: Python :: 3.2
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
html2text-2016.1.8/html2text.egg-info/entry_points.txt 0000644 0001750 0001750 00000000104 12643763137 023524 0 ustar ubuntu ubuntu 0000000 0000000
[console_scripts]
html2text=html2text.cli:main
html2text-2016.1.8/html2text.egg-info/dependency_links.txt 0000644 0001750 0001750 00000000001 12643763137 024301 0 ustar ubuntu ubuntu 0000000 0000000
html2text-2016.1.8/html2text.egg-info/top_level.txt 0000644 0001750 0001750 00000000012 12643763137 022756 0 ustar ubuntu ubuntu 0000000 0000000 html2text
html2text-2016.1.8/ChangeLog.rst 0000644 0001750 0001750 00000010071 12643762527 017170 0 ustar ubuntu ubuntu 0000000 0000000
2016.1.8
=========
----
* Feature #99: Removed duplicated initialisation.
* Fix #100: Get element style key error.
* Fix #101: Fix error end tag pop exception
* ``<s>``, ``<strike>``, ``<del>`` now rendered as ``~~text~~``.
2015.11.4
=========
----
* Fix #38: Long links wrapping controlled by `--no-wrap-links`.
* Note: `--no-wrap-links` implies `--reference-links`
* Feature #83: Add callback-on-tag.
* Fix #87: Decode errors can be handled via command line.
* Feature #95: Docs, decode errors spelling mistake.
* Fix #84: Make bodywidth kwarg overridable using config.
2015.6.21
=========
----
* Fix #31: HTML entities stay inside link.
* Fix #71: Coverage detects command line tests.
* Fix #39: Documentation update.
* Fix #61: Functionality added for optional use of automatic links.
* Feature #80: ``title`` attribute is preserved in both inline and reference links.
* Feature #82: More command line options. See docs.
2015.6.12
=========
----
* Feature #76: Making ``pre`` blocks clearer for further automatic formatting.
* Fix #71: Coverage detects tests carried out in ``subprocesses``
2015.6.6
========
----
* Fix #24: ``3.200.3`` vs ``2014.7.3`` output quirks.
* Fix #61: Malformed links in markdown output.
* Feature #62: Automatic version number.
* Fix #63: Nested code, anchor bug.
* Fix #64: Proper handling of anchors with content that starts with tags.
* Feature #67: Documentation all over the module.
* Feature #70: Adding tests for the module.
* Fix #73: Typo in config documentation.
2015.4.14
=========
----
* Feature #59: Write image tags with height and width attrs as raw html to retain dimensions
2015.4.13
=========
----
* Feature #56: Treat '-' file parameter as stdin.
* Feature #57: Retain escaping of html except within code or pre tags.
2015.2.18
=========
----
* Fix #38: Anchor tags with empty text or with ``<img>`` tags inside are no longer stripped.
2014.12.29
==========
----
* Feature #51: Add single line break option.
This feature is useful for ensuring that lots of extra line breaks do not
end up in the resulting Markdown file in situations like Evernote .enex
exports. Note that this only works properly if ``body-width`` is set
to ``0``.
2014.12.24
==========
----
* Feature #49: Added an ``images_to_alt`` option to discard images and keep only their alt text.
* Feature #50: Protect links, surrounding them with angle brackets to avoid breaking...
* Feature: Add ``setup.cfg`` file.
2014.12.5
=========
----
* Feature: Update `README.md` with usage examples.
* Fix #35: Remove `py_modules` from `setup.py`.
* Fix #36: Excludes tests from being installed as a separate module.
* Fix #37: Don't hardcode the path to the installed binary.
* Fix: Readme typo in running cli.
* Feature #40: Extract cli part to ``cli`` module.
* Feature #42: Bring python version compatibility to ``compat.py`` module.
* Feature #41: Extract utility/helper methods to ``utils`` module.
* Fix #45: Does not accept standard input when running under Python 3.
* Feature: Clean up ``ChangeLog.rst`` for version and date numbers.
2014.9.25
=========
----
* Feature #29, #27: Add simple table support with bypass option.
* Fix #20: Replace project website with: http://alir3z4.github.io/html2text/ .
2014.9.8
========
----
* Fix #28: missing ``html2text`` package in installation.
2014.9.7
========
----
* Fix ``unicode``/``type`` error in memory leak unit-test.
* Feature #16: Remove ``install_deps.py``.
* Feature #17: Add status badges via pypin.
* Feature #18: Add ``Python`` ``3.4`` to travis config file.
* Feature #19: Bring ``html2text`` to a separate module and take out the ``conf``/``constant`` variables.
* Feature #21: Remove meta vars from ``html2text.py`` file header.
* Fix: Fix TypeError when parsing tags like
. Fixed in #25.
2014.7.3
========
----
* Fix #8: Remove ``How to do a release`` section from README.md.
* Fix #11: Include test directory markdown, html files.
* Fix #13: memory leak in using ``handle`` while keeping the old instance of ``html2text``.
2014.4.5
========
----
* Fix #1: Add ``ChangeLog.rst`` file.
* Fix #2: Add ``AUTHORS.rst`` file.
html2text-2016.1.8/test/ 0000755 0001750 0001750 00000000000 12643763140 015557 5 ustar ubuntu ubuntu 0000000 0000000 html2text-2016.1.8/test/html_entities_out_of_text.md 0000644 0001750 0001750 00000000047 12616413406 023366 0 ustar ubuntu ubuntu 0000000 0000000 [allas: Country Manager](http://thth)
html2text-2016.1.8/test/doc_with_table_bypass.html 0000664 0001750 0001750 00000001066 12471110237 022772 0 ustar ubuntu ubuntu 0000000 0000000
This is a test document
With some text, code
, bolds and italics.
This is second header
Header 1
Header 2
Header 3
Content 1
Content 2
Image!