latexcodec-2.0.1/0000755005105600024240000000000013674355145013612 5ustar dma0mtdma00000000000000latexcodec-2.0.1/AUTHORS.rst0000644005105600024240000000076713674352317015471 0ustar dma0mtdma00000000000000Main authors: * David Eppstein - wrote the original LaTeX codec as a recipe on ActiveState http://code.activestate.com/recipes/252124-latex-codec/ * Peter Tröger - wrote the original latexcodec package, which contained a simple but very effective LaTeX encoder * Matthias Troffaes (matthias.troffaes@gmail.com) - wrote the lexer - integrated codec with the lexer for a simpler and more robust design - various bugfixes Contributors: * Michael Radziej * Philipp Spitzer latexcodec-2.0.1/CHANGELOG.rst0000644005105600024240000000603613674355130015632 0ustar dma0mtdma000000000000002.0.1 (23 July 2020) -------------------- * Drop Python 3.3 support. * Added a few more translations. 2.0.0 (14 January 2020) ----------------------- * Lexer now processes unicode directly, to fix various issues with multibyte encodings. This also simplifies the implementation. Many thanks to davidweichiang for reporting and implementing. * New detailed description of the package for the readme, to clarify the behaviour and design choices. Many thanks to tschantzmc for contributing this description (see issue #70). * Minor fix in decoding of LaTeX comments (see issue #72). * Support Python 3.9 (see issue #75). 1.0.7 (3 May 2019) ------------------ * More symbols (THIN SPACE, various accented characters). * Fix lexer issue with multibyte encodings (thanks to davidweichiang for reporting). 1.0.6 (18 January 2018) ----------------------- * More symbols (EM SPACE, MINUS SIGN, GREEK PHI SYMBOL, HYPHEN, alternate encodings of Swedish å and Å). 1.0.5 (16 June 2017) -------------------- * More maths symbols (naturals, reals, ...). * Fix lower case z with accents (reported by AndrewSwann, see issue #51). 1.0.4 (21 September 2016) ------------------------- * Fix encoding and decoding of percent sign (reported by jgosmann, see issue #48). 1.0.3 (26 March 2016) --------------------- * New ``'keep'`` error for the ulatex encoder to keep unicode characters that cannot be translated (contributed by xuhdev, see pull request #45). 1.0.2 (1 March 2016) -------------------- * New ``ulatex`` codec which works as a text transform on unicode strings. * Fix spacing when translating math (see issue #29, reported by beltiste). * Performance improvements in latex to unicode translation. * Support old-style math mode (see pull request #40, contributed by xuhdev). * Treat tab character as a space character (see discussion in issue #40, raised by xuhdev). 1.0.1 (24 September 2014) ------------------------- * br"\\par" is now decoded using two newlines (see issue #26, reported by Jorrit Wronski). * Fix encoding and decoding of the ogonek (see issue #24, reported by beltiste). 1.0.0 (5 August 2014) --------------------- * Add Python 3.4 support. * Fix "DZ" decoding (see issue #21, reported and fixed by Philipp Spitzer). 0.3.2 (17 April 2014) --------------------- * Fix underscore "\\_" encoding (see issue #17, reported and fixed by Michael Radziej). 0.3.1 (5 February 2014) ----------------------- * Drop Python 3.2 support. * Drop 2to3 and instead use six to support both Python 2 and 3 from a single code base. * Fix control space "\\ " decoding. * Fix LaTeX encoding of number sign "#" and other special ascii characters (see issues #11 and #13, reported by beltiste). 
0.3.0 (19 August 2013) ---------------------- * Copied lexer and codec from sphinxcontrib-bibtex. * Initial usage and API documentation. * Some small bugs fixed. 0.2 (28 September 2012) ----------------------- * Added an additional codec with brackets around special characters. 0.1 (26 May 2012) ----------------- * Initial release. latexcodec-2.0.1/INSTALL.rst0000644005105600024240000000622613674352317015456 0ustar dma0mtdma00000000000000Install the module with ``pip install latexcodec``, or from source using ``python setup.py install``. Minimal Example --------------- Simply import the :mod:`latexcodec` module to enable ``"latex"`` to be used as an encoding: .. code-block:: python import latexcodec text_latex = b"\\'el\\`eve" assert text_latex.decode("latex") == u"élève" text_unicode = u"ångström" assert text_unicode.encode("latex") == b'\\aa ngstr\\"om' There is also a ``ulatex`` encoding for text transforms. The simplest way to use this codec goes through the codecs module (as for all text transform codecs in Python): .. code-block:: python import codecs import latexcodec text_latex = u"\\'el\\`eve" assert codecs.decode(text_latex, "ulatex") == u"élève" text_unicode = u"ångström" assert codecs.encode(text_unicode, "ulatex") == u'\\aa ngstr\\"om' By default, the LaTeX input is assumed to be ascii, as per standard LaTeX. However, you can also specify an extra codec as ``latex+<encoding>`` or ``ulatex+<encoding>``, where ``<encoding>`` describes another encoding. In this case characters will be translated to and from that encoding whenever possible. The following code snippet demonstrates this behaviour: .. code-block:: python import latexcodec text_latex = b"\xfe" assert text_latex.decode("latex+latin1") == u"þ" assert text_latex.decode("latex+latin2") == u"ţ" text_unicode = u"ţ" assert text_unicode.encode("latex+latin1") == b'\\c t' # ţ is not latin1 assert text_unicode.encode("latex+latin2") == b'\xfe' # but it is latin2 When encoding using the ``ulatex`` codec, you have the option to pass through characters that cannot be encoded in the desired encoding, by using the ``'keep'`` error. This is a useful fallback if you want to encode as much as possible while still retaining the original characters wherever encoding fails. If instead you want to translate to LaTeX but keep as much of the unicode as possible, use the ``ulatex+utf8`` codec, which should never fail. .. code-block:: python import codecs import latexcodec text_unicode = u'⌨' # \u2328 = keyboard symbol, currently not translated try: # raises a value error as \u2328 cannot be encoded into latex codecs.encode(text_unicode, "ulatex+ascii") except ValueError: pass assert codecs.encode(text_unicode, "ulatex+ascii", "keep") == u'⌨' assert codecs.encode(text_unicode, "ulatex+utf8") == u'⌨' Limitations ----------- * Not all unicode characters are registered. If you find any missing, please report them on the tracker: https://github.com/mcmtroffaes/latexcodec/issues * Unicode combining characters are currently not handled. * By design, the codec never removes curly brackets. This is because it is very hard to guess whether brackets are part of a command or not (this would require a full latex parser). Moreover, bibtex uses curly brackets as a guard against case conversion, in which case automatic removal of curly brackets may not be desired at all, even if they are not part of a command. The sketch below illustrates this behaviour. 
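For instance, assuming the default translation table, the braces of a bracketed accent variant survive decoding (a minimal sketch):

.. code-block:: python

    import latexcodec

    # the braced accent variant decodes with its braces kept ...
    assert b"\\'{e}".decode("latex") == u"{é}"
    # ... while the unbraced variant decodes to the bare character
    assert b"\\'e".decode("latex") == u"é"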
Also see: http://stackoverflow.com/a/19754245/2863746 latexcodec-2.0.1/LICENSE.rst0000644005105600024240000000217113674355130015421 0ustar dma0mtdma00000000000000| latexcodec is a lexer and codec to work with LaTeX code in Python | Copyright (c) 2011-2020 by Matthias C. M. Troffaes Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. latexcodec-2.0.1/MANIFEST.in0000644005105600024240000000042313607357532015345 0ustar dma0mtdma00000000000000include VERSION include README.rst include INSTALL.rst include CHANGELOG.rst include LICENSE.rst include AUTHORS.rst include requirements.txt recursive-include doc * recursive-include test * global-exclude *.pyc global-exclude .gitignore prune doc/_build exclude .travis.yml latexcodec-2.0.1/PKG-INFO0000644005105600024240000001250413674355145014711 0ustar dma0mtdma00000000000000Metadata-Version: 1.2 Name: latexcodec Version: 2.0.1 Summary: A lexer and codec to work with LaTeX code in Python. Home-page: https://github.com/mcmtroffaes/latexcodec Author: Matthias C. M. Troffaes Author-email: matthias.troffaes@gmail.com License: MIT Download-URL: http://pypi.python.org/pypi/latexcodec Description: * Download: http://pypi.python.org/pypi/latexcodec/#downloads * Documentation: http://latexcodec.readthedocs.org/ * Development: http://github.com/mcmtroffaes/latexcodec/ .. |travis| image:: https://travis-ci.org/mcmtroffaes/latexcodec.png?branch=develop :target: https://travis-ci.org/mcmtroffaes/latexcodec :alt: travis-ci .. |codecov| image:: https://codecov.io/gh/mcmtroffaes/latexcodec/branch/develop/graph/badge.svg :target: https://codecov.io/gh/mcmtroffaes/latexcodec :alt: codecov The codec provides a convenient way of going between text written in LaTeX and unicode. Since it is not a LaTeX compiler, it is more appropriate for short chunks of text, such as a paragraph or the values of a BibTeX entry, and it is not appropriate for a full LaTeX document. In particular, its behavior on the LaTeX commands that do not simply select characters is intended to allow the unicode representation to be understandable by a human reader, but is not canonical and may require hand tuning to produce the desired effect. The encoder makes a best effort to replace unicode characters outside the range used as LaTeX input (ascii by default) with a LaTeX command that selects the character. More technically, the unicode code point is replaced by a LaTeX command that selects a glyph that reasonably represents the code point. Unicode characters with special uses in LaTeX are replaced by their LaTeX equivalents. 
For example, ====================== =================== original text encoded LaTeX ====================== =================== ``¥`` ``\yen`` ``ü`` ``\"u`` ``\N{NO-BREAK SPACE}`` ``~`` ``~`` ``\textasciitilde`` ``%`` ``\%`` ``#`` ``\#`` ``\textbf{x}`` ``\textbf{x}`` ====================== =================== The decoder makes a best effort to replace LaTeX commands that select characters with the unicode for the character they are selecting. For example, ===================== ====================== original LaTeX decoded unicode ===================== ====================== ``\yen`` ``¥`` ``\"u`` ``ü`` ``~`` ``\N{NO-BREAK SPACE}`` ``\textasciitilde`` ``~`` ``\%`` ``%`` ``\#`` ``#`` ``\textbf{x}`` ``\textbf {x}`` ``#`` ``#`` ===================== ====================== In addition, comments are dropped (including the final newline that marks the end of a comment), paragraphs are canonicalized into double newlines, and other newlines are left as is. Spacing after LaTeX commands is also canonicalized. For example, :: hi % bye there\par world \textbf {awesome} is decoded as :: hi there world \textbf {awesome} When decoding, LaTeX commands not directly selecting characters (for example, macros and formatting commands) are passed through unchanged. The same happens for LaTeX commands that select characters but are not yet recognized by the codec. Either case can result in a hybrid unicode string in which some characters are understood as literally the character and others as parts of unexpanded commands. Consequently, at times, backslashes will be left intact to denote the start of a potentially unrecognized control sequence. Given the numerous and changing packages providing such LaTeX commands, the codec will never be complete, and new translations of unrecognized unicode or unrecognized LaTeX symbols are always welcome. Platform: any Classifier: Development Status :: 5 - Production/Stable Classifier: Environment :: Console Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: MIT License Classifier: Operating System :: OS Independent Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 2 Classifier: Programming Language :: Python :: 2.7 Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3.4 Classifier: Programming Language :: Python :: 3.5 Classifier: Programming Language :: Python :: 3.6 Classifier: Programming Language :: Python :: 3.7 Classifier: Programming Language :: Python :: 3.8 Classifier: Topic :: Text Processing :: Markup :: LaTeX Classifier: Topic :: Text Processing :: Filters Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.* latexcodec-2.0.1/README.rst0000644005105600024240000000705113674352317015302 0ustar dma0mtdma00000000000000latexcodec ========== |travis| |codecov| A lexer and codec to work with LaTeX code in Python. * Download: http://pypi.python.org/pypi/latexcodec/#downloads * Documentation: http://latexcodec.readthedocs.org/ * Development: http://github.com/mcmtroffaes/latexcodec/ .. |travis| image:: https://travis-ci.org/mcmtroffaes/latexcodec.png?branch=develop :target: https://travis-ci.org/mcmtroffaes/latexcodec :alt: travis-ci .. |codecov| image:: https://codecov.io/gh/mcmtroffaes/latexcodec/branch/develop/graph/badge.svg :target: https://codecov.io/gh/mcmtroffaes/latexcodec :alt: codecov The codec provides a convenient way of going between text written in LaTeX and unicode. 
Since it is not a LaTeX compiler, it is more appropriate for short chunks of text, such as a paragraph or the values of a BibTeX entry, and it is not appropriate for a full LaTeX document. In particular, its behavior on the LaTeX commands that do not simply select characters is intended to allow the unicode representation to be understandable by a human reader, but is not canonical and may require hand tuning to produce the desired effect. The encoder makes a best effort to replace unicode characters outside the range used as LaTeX input (ascii by default) with a LaTeX command that selects the character. More technically, the unicode code point is replaced by a LaTeX command that selects a glyph that reasonably represents the code point. Unicode characters with special uses in LaTeX are replaced by their LaTeX equivalents. For example, ====================== =================== original text encoded LaTeX ====================== =================== ``¥`` ``\yen`` ``ü`` ``\"u`` ``\N{NO-BREAK SPACE}`` ``~`` ``~`` ``\textasciitilde`` ``%`` ``\%`` ``#`` ``\#`` ``\textbf{x}`` ``\textbf{x}`` ====================== =================== The decoder makes a best effort to replace LaTeX commands that select characters with the unicode for the character they are selecting. For example, ===================== ====================== original LaTeX decoded unicode ===================== ====================== ``\yen`` ``¥`` ``\"u`` ``ü`` ``~`` ``\N{NO-BREAK SPACE}`` ``\textasciitilde`` ``~`` ``\%`` ``%`` ``\#`` ``#`` ``\textbf{x}`` ``\textbf {x}`` ``#`` ``#`` ===================== ====================== In addition, comments are dropped (including the final newline that marks the end of a comment), paragraphs are canonicalized into double newlines, and other newlines are left as is. Spacing after LaTeX commands is also canonicalized. For example, :: hi % bye there\par world \textbf {awesome} is decoded as :: hi there world \textbf {awesome} When decoding, LaTeX commands not directly selecting characters (for example, macros and formatting commands) are passed through unchanged. The same happens for LaTeX commands that select characters but are not yet recognized by the codec. Either case can result in a hybrid unicode string in which some characters are understood as literally the character and others as parts of unexpanded commands. Consequently, at times, backslashes will be left intact to denote the start of a potentially unrecognized control sequence. Given the numerous and changing packages providing such LaTeX commands, the codec will never be complete, and new translations of unrecognized unicode or unrecognized LaTeX symbols are always welcome. latexcodec-2.0.1/VERSION0000644005105600024240000000000613674355130014650 0ustar dma0mtdma000000000000002.0.1 latexcodec-2.0.1/doc/0000755005105600024240000000000013674355145014357 5ustar dma0mtdma00000000000000latexcodec-2.0.1/doc/Makefile0000644005105600024240000001271412204362324016005 0ustar dma0mtdma00000000000000# Makefile for Sphinx documentation # # You can set these variables from the command line. SPHINXOPTS = SPHINXBUILD = sphinx-build PAPER = BUILDDIR = _build # Internal variables. PAPEROPT_a4 = -D latex_paper_size=a4 PAPEROPT_letter = -D latex_paper_size=letter ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . # the i18n builder cannot share the environment and doctrees with the others I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . 
.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext help: @echo "Please use \`make <target>' where <target> is one of" @echo " html to make standalone HTML files" @echo " dirhtml to make HTML files named index.html in directories" @echo " singlehtml to make a single large HTML file" @echo " pickle to make pickle files" @echo " json to make JSON files" @echo " htmlhelp to make HTML files and a HTML help project" @echo " qthelp to make HTML files and a qthelp project" @echo " devhelp to make HTML files and a Devhelp project" @echo " epub to make an epub" @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" @echo " latexpdf to make LaTeX files and run them through pdflatex" @echo " text to make text files" @echo " man to make manual pages" @echo " texinfo to make Texinfo files" @echo " info to make Texinfo files and run them through makeinfo" @echo " gettext to make PO message catalogs" @echo " changes to make an overview of all changed/added/deprecated items" @echo " linkcheck to check all external links for integrity" @echo " doctest to run all doctests embedded in the documentation (if enabled)" clean: -rm -rf $(BUILDDIR)/* html: $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." dirhtml: $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." singlehtml: $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml @echo @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." pickle: $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle @echo @echo "Build finished; now you can process the pickle files." json: $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json @echo @echo "Build finished; now you can process the JSON files." htmlhelp: $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp @echo @echo "Build finished; now you can run HTML Help Workshop with the" \ ".hhp project file in $(BUILDDIR)/htmlhelp." qthelp: $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp @echo @echo "Build finished; now you can run "qcollectiongenerator" with the" \ ".qhcp project file in $(BUILDDIR)/qthelp, like this:" @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/latexcodec.qhcp" @echo "To view the help file:" @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/latexcodec.qhc" devhelp: $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp @echo @echo "Build finished." @echo "To view the help file:" @echo "# mkdir -p $$HOME/.local/share/devhelp/latexcodec" @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/latexcodec" @echo "# devhelp" epub: $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub @echo @echo "Build finished. The epub file is in $(BUILDDIR)/epub." latex: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." @echo "Run \`make' in that directory to run these through (pdf)latex" \ "(use \`make latexpdf' here to do that automatically)." latexpdf: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo "Running LaTeX files through pdflatex..." $(MAKE) -C $(BUILDDIR)/latex all-pdf @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." text: $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text @echo @echo "Build finished. The text files are in $(BUILDDIR)/text." 
man: $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man @echo @echo "Build finished. The manual pages are in $(BUILDDIR)/man." texinfo: $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo @echo @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." @echo "Run \`make' in that directory to run these through makeinfo" \ "(use \`make info' here to do that automatically)." info: $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo @echo "Running Texinfo files through makeinfo..." make -C $(BUILDDIR)/texinfo info @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." gettext: $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale @echo @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." changes: $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes @echo @echo "The overview file is in $(BUILDDIR)/changes." linkcheck: $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck @echo @echo "Link check complete; look for any errors in the above output " \ "or in $(BUILDDIR)/linkcheck/output.txt." doctest: $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest @echo "Testing of doctests in the sources finished, look at the " \ "results in $(BUILDDIR)/doctest/output.txt." latexcodec-2.0.1/doc/api/0000755005105600024240000000000013674355145015130 5ustar dma0mtdma00000000000000latexcodec-2.0.1/doc/api/codec.rst0000644005105600024240000000004113674352317016730 0ustar dma0mtdma00000000000000.. automodule:: latexcodec.codec latexcodec-2.0.1/doc/api/lexer.rst0000644005105600024240000000004113674352317016772 0ustar dma0mtdma00000000000000.. automodule:: latexcodec.lexer latexcodec-2.0.1/doc/api.rst0000644005105600024240000000010413674352317015653 0ustar dma0mtdma00000000000000API ~~~ .. toctree:: :maxdepth: 2 api/codec api/lexer latexcodec-2.0.1/doc/authors.rst0000644005105600024240000000005613674352317016575 0ustar dma0mtdma00000000000000Authors ======= .. include:: ../AUTHORS.rst latexcodec-2.0.1/doc/changes.rst0000644005105600024240000000007613674352317016522 0ustar dma0mtdma00000000000000:tocdepth: 1 Changes ======= .. include:: ../CHANGELOG.rst latexcodec-2.0.1/doc/conf.py0000644005105600024240000000226413674355130015654 0ustar dma0mtdma00000000000000# -*- coding: utf-8 -*- # # latexcodec documentation build configuration file, created by # sphinx-quickstart on Wed Aug 3 15:45:22 2011. extensions = [ 'sphinx.ext.autodoc', 'sphinx.ext.doctest', 'sphinx.ext.intersphinx', 'sphinx.ext.todo', 'sphinx.ext.coverage', 'sphinx.ext.imgmath', 'sphinx.ext.viewcode'] source_suffix = '.rst' master_doc = 'index' project = u'latexcodec' copyright = u'2011-2014, Matthias C. M. Troffaes' with open("../VERSION") as version_file: release = version_file.read().strip() version = '.'.join(release.split('.')[:2]) exclude_patterns = ['_build'] pygments_style = 'sphinx' html_theme = 'default' htmlhelp_basename = 'latexcodecdoc' latex_documents = [ ('index', 'latexcodec.tex', u'latexcodec Documentation', u'Matthias C. M. Troffaes', 'manual'), ] man_pages = [ ('index', 'latexcodec', u'latexcodec Documentation', [u'Matthias C. M. Troffaes'], 1) ] texinfo_documents = [ ('index', 'latexcodec', u'latexcodec Documentation', u'Matthias C. M. 
Troffaes', 'latexcodec', 'A lexer and codec to work with LaTeX code in Python.', 'Miscellaneous'), ] intersphinx_mapping = { 'python': ('http://docs.python.org/', None), } latexcodec-2.0.1/doc/index.rst0000644005105600024240000000047213674352317016221 0ustar dma0mtdma00000000000000Welcome to latexcodec's documentation! ====================================== :Release: |release| :Date: |today| Contents -------- .. toctree:: :maxdepth: 2 quickstart api changes authors license Indices and tables ================== * :ref:`genindex` * :ref:`modindex` * :ref:`search` latexcodec-2.0.1/doc/license.rst0000644005105600024240000000044113674352317016530 0ustar dma0mtdma00000000000000License ======= .. include:: ../LICENSE.rst .. rubric:: Remark Versions 0.1 and 0.2 of the latexcodec package were written by Peter Tröger, and were released under the Academic Free License 3.0. The current version of the latexcodec package shares no code with those earlier versions. latexcodec-2.0.1/doc/make.bat0000644005105600024240000001176012177740034015761 0ustar dma0mtdma00000000000000@ECHO OFF REM Command file for Sphinx documentation if "%SPHINXBUILD%" == "" ( set SPHINXBUILD=sphinx-build ) set BUILDDIR=_build set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% . set I18NSPHINXOPTS=%SPHINXOPTS% . if NOT "%PAPER%" == "" ( set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS% set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS% ) if "%1" == "" goto help if "%1" == "help" ( :help echo.Please use `make ^<target^>` where ^<target^> is one of echo. html to make standalone HTML files echo. dirhtml to make HTML files named index.html in directories echo. singlehtml to make a single large HTML file echo. pickle to make pickle files echo. json to make JSON files echo. htmlhelp to make HTML files and a HTML help project echo. qthelp to make HTML files and a qthelp project echo. devhelp to make HTML files and a Devhelp project echo. epub to make an epub echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter echo. text to make text files echo. man to make manual pages echo. texinfo to make Texinfo files echo. gettext to make PO message catalogs echo. changes to make an overview of all changed/added/deprecated items echo. linkcheck to check all external links for integrity echo. doctest to run all doctests embedded in the documentation if enabled goto end ) if "%1" == "clean" ( for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i del /q /s %BUILDDIR%\* goto end ) if "%1" == "html" ( %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html if errorlevel 1 exit /b 1 echo. echo.Build finished. The HTML pages are in %BUILDDIR%/html. goto end ) if "%1" == "dirhtml" ( %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml if errorlevel 1 exit /b 1 echo. echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml. goto end ) if "%1" == "singlehtml" ( %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml if errorlevel 1 exit /b 1 echo. echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml. goto end ) if "%1" == "pickle" ( %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle if errorlevel 1 exit /b 1 echo. echo.Build finished; now you can process the pickle files. goto end ) if "%1" == "json" ( %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json if errorlevel 1 exit /b 1 echo. echo.Build finished; now you can process the JSON files. goto end ) if "%1" == "htmlhelp" ( %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp if errorlevel 1 exit /b 1 echo. 
echo.Build finished; now you can run HTML Help Workshop with the ^ .hhp project file in %BUILDDIR%/htmlhelp. goto end ) if "%1" == "qthelp" ( %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp if errorlevel 1 exit /b 1 echo. echo.Build finished; now you can run "qcollectiongenerator" with the ^ .qhcp project file in %BUILDDIR%/qthelp, like this: echo.^> qcollectiongenerator %BUILDDIR%\qthelp\latexcodec.qhcp echo.To view the help file: echo.^> assistant -collectionFile %BUILDDIR%\qthelp\latexcodec.qhc goto end ) if "%1" == "devhelp" ( %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp if errorlevel 1 exit /b 1 echo. echo.Build finished. goto end ) if "%1" == "epub" ( %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub if errorlevel 1 exit /b 1 echo. echo.Build finished. The epub file is in %BUILDDIR%/epub. goto end ) if "%1" == "latex" ( %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex if errorlevel 1 exit /b 1 echo. echo.Build finished; the LaTeX files are in %BUILDDIR%/latex. goto end ) if "%1" == "text" ( %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text if errorlevel 1 exit /b 1 echo. echo.Build finished. The text files are in %BUILDDIR%/text. goto end ) if "%1" == "man" ( %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man if errorlevel 1 exit /b 1 echo. echo.Build finished. The manual pages are in %BUILDDIR%/man. goto end ) if "%1" == "texinfo" ( %SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo if errorlevel 1 exit /b 1 echo. echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo. goto end ) if "%1" == "gettext" ( %SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale if errorlevel 1 exit /b 1 echo. echo.Build finished. The message catalogs are in %BUILDDIR%/locale. goto end ) if "%1" == "changes" ( %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes if errorlevel 1 exit /b 1 echo. echo.The overview file is in %BUILDDIR%/changes. goto end ) if "%1" == "linkcheck" ( %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck if errorlevel 1 exit /b 1 echo. echo.Link check complete; look for any errors in the above output ^ or in %BUILDDIR%/linkcheck/output.txt. goto end ) if "%1" == "doctest" ( %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest if errorlevel 1 exit /b 1 echo. echo.Testing of doctests in the sources finished, look at the ^ results in %BUILDDIR%/doctest/output.txt. goto end ) :end latexcodec-2.0.1/doc/quickstart.rst0000644005105600024240000000023113674352317017275 0ustar dma0mtdma00000000000000Getting Started =============== Overview -------- .. include:: ../README.rst :start-line: 5 Installation ------------ .. include:: ../INSTALL.rst latexcodec-2.0.1/latexcodec/0000755005105600024240000000000013674355145015725 5ustar dma0mtdma00000000000000latexcodec-2.0.1/latexcodec/__init__.py0000644005105600024240000000006413674352304020030 0ustar dma0mtdma00000000000000import latexcodec.codec latexcodec.codec.register() latexcodec-2.0.1/latexcodec/codec.py0000644005105600024240000013053613674355130017356 0ustar dma0mtdma00000000000000# -*- coding: utf-8 -*- """ LaTeX Codec ~~~~~~~~~~~ The :mod:`latexcodec.codec` module contains all classes and functions for LaTeX code translation. For practical use, you should only ever need to import the :mod:`latexcodec` module, which will automatically register the codec so it can be used by :meth:`str.encode`, :meth:`str.decode`, and any of the functions defined in the :mod:`codecs` module such as :func:`codecs.open` and so on. 
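For instance, a minimal sketch (the exact bytes assume the default translation table registered by this module):

.. code-block:: python

    import latexcodec  # importing registers the "latex" codec
    assert u"§".encode("latex") == b'\\S'
    assert b'\\S'.decode("latex") == u"§"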
The other functions and classes are exposed in case someone would want to extend them. .. autofunction:: register .. autofunction:: find_latex .. autoclass:: LatexIncrementalEncoder :show-inheritance: :members: .. autoclass:: LatexIncrementalDecoder :show-inheritance: :members: .. autoclass:: LatexCodec :show-inheritance: :members: .. autoclass:: LatexUnicodeTable :members: """ # Copyright (c) 2003, 2008 David Eppstein # Copyright (c) 2011-2020 Matthias C. M. Troffaes # # Permission is hereby granted, free of charge, to any person # obtaining a copy of this software and associated documentation # files (the "Software"), to deal in the Software without # restriction, including without limitation the rights to use, # copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the # Software is furnished to do so, subject to the following # conditions: # # The above copyright notice and this permission notice shall be # included in all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES # OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND # NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, # WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING # FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR # OTHER DEALINGS IN THE SOFTWARE. from __future__ import print_function import codecs from six import string_types, text_type from six.moves import range from latexcodec import lexer def register(): """Register the :func:`find_latex` codec search function. .. seealso:: :func:`codecs.register` """ codecs.register(find_latex) # returns the codec search function # this is used if latex_codec.py were to be placed in stdlib def getregentry(): """Encodings module API.""" return find_latex('latex') class LatexUnicodeTable: """Tabulates a translation between LaTeX and unicode.""" def __init__(self, lexer): self.lexer = lexer self.unicode_map = {} self.max_length = 0 self.latex_map = {} self.register_all() def register_all(self): """Register all symbols and their LaTeX equivalents (called by constructor). 
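For illustration, a small sketch of the tables this builds (assuming the default registrations below)::

            table = LatexUnicodeTable(lexer.LatexIncrementalDecoder())
            latex, tokens = table.latex_map[u'¥']   # latex == u'\\yen'
            assert table.unicode_map[tokens] == u'¥'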
""" # TODO complete this list # register special symbols self.register(u'\n\n', u' \\par', encode=False) self.register(u'\n\n', u'\\par', encode=False) self.register(u' ', u'\\ ', encode=False) self.register(u'\N{EM SPACE}', u'\\quad') self.register(u'\N{THIN SPACE}', u' ', decode=False) self.register(u'%', u'\\%') self.register(u'\N{EN DASH}', u'--') self.register(u'\N{EN DASH}', u'\\textendash') self.register(u'\N{EM DASH}', u'---') self.register(u'\N{EM DASH}', u'\\textemdash') self.register(u'\N{REPLACEMENT CHARACTER}', u"????", decode=False) self.register(u'\N{LEFT SINGLE QUOTATION MARK}', u'`', decode=False) self.register(u'\N{RIGHT SINGLE QUOTATION MARK}', u"'", decode=False) self.register(u'\N{LEFT DOUBLE QUOTATION MARK}', u'``') self.register(u'\N{RIGHT DOUBLE QUOTATION MARK}', u"''") self.register(u'\N{DOUBLE LOW-9 QUOTATION MARK}', u",,") self.register(u'\N{DOUBLE LOW-9 QUOTATION MARK}', u'\\glqq', encode=False) self.register(u'\N{LEFT-POINTING DOUBLE ANGLE QUOTATION MARK}', u'\\guillemotleft') self.register(u'\N{RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK}', u'\\guillemotright') self.register(u'\N{MODIFIER LETTER PRIME}', u"'", decode=False) self.register(u'\N{MODIFIER LETTER DOUBLE PRIME}', u"''", decode=False) self.register(u'\N{MODIFIER LETTER TURNED COMMA}', u'`', decode=False) self.register(u'\N{MODIFIER LETTER APOSTROPHE}', u"'", decode=False) self.register(u'\N{MODIFIER LETTER REVERSED COMMA}', u'`', decode=False) self.register(u'\N{DAGGER}', u'\\dag') self.register(u'\N{DOUBLE DAGGER}', u'\\ddag') self.register(u'\\', u'\\textbackslash', encode=False) self.register(u'\\', u'\\backslash', mode='math', encode=False) self.register(u'\N{TILDE OPERATOR}', u'\\sim', mode='math') self.register(u'\N{MODIFIER LETTER LOW TILDE}', u'\\texttildelow', package='textcomp') self.register(u'\N{SMALL TILDE}', u'\\~{}') self.register(u'~', u'\\textasciitilde') self.register(u'\N{BULLET}', u'\\bullet', mode='math') self.register(u'\N{BULLET}', u'\\textbullet', package='textcomp') self.register(u'\N{ASTERISK OPERATOR}', u'\\ast', mode='math') self.register(u'\N{NUMBER SIGN}', u'\\#') self.register(u'\N{LOW LINE}', u'\\_') self.register(u'\N{AMPERSAND}', u'\\&') self.register(u'\N{NO-BREAK SPACE}', u'~') self.register(u'\N{INVERTED EXCLAMATION MARK}', u'!`') self.register(u'\N{CENT SIGN}', u'\\not{c}') self.register(u'\N{POUND SIGN}', u'\\pounds') self.register(u'\N{POUND SIGN}', u'\\textsterling', package='textcomp') self.register(u'\N{YEN SIGN}', u'\\yen') self.register(u'\N{YEN SIGN}', u'\\textyen', package='textcomp') self.register(u'\N{SECTION SIGN}', u'\\S') self.register(u'\N{DIAERESIS}', u'\\"{}') self.register(u'\N{NOT SIGN}', u'\\neg') self.register(u'\N{HYPHEN}', u'-', decode=False) self.register(u'\N{SOFT HYPHEN}', u'\\-') self.register(u'\N{MACRON}', u'\\={}') self.register(u'\N{DEGREE SIGN}', u'^\\circ', mode='math') self.register(u'\N{DEGREE SIGN}', u'\\textdegree', package='textcomp') self.register(u'\N{MINUS SIGN}', u'-', mode='math') self.register(u'\N{PLUS-MINUS SIGN}', u'\\pm', mode='math') self.register(u'\N{PLUS-MINUS SIGN}', u'\\textpm', package='textcomp') self.register(u'\N{SUPERSCRIPT TWO}', u'^2', mode='math') self.register( u'\N{SUPERSCRIPT TWO}', u'\\texttwosuperior', package='textcomp') self.register(u'\N{SUPERSCRIPT THREE}', u'^3', mode='math') self.register( u'\N{SUPERSCRIPT THREE}', u'\\textthreesuperior', package='textcomp') self.register(u'\N{ACUTE ACCENT}', u"\\'{}") self.register(u'\N{MICRO SIGN}', u'\\mu', mode='math') self.register(u'\N{MICRO SIGN}', 
u'\\micro', package='gensymu') self.register(u'\N{PILCROW SIGN}', u'\\P') self.register(u'\N{MIDDLE DOT}', u'\\cdot', mode='math') self.register( u'\N{MIDDLE DOT}', u'\\textperiodcentered', package='textcomp') self.register(u'\N{CEDILLA}', u'\\c{}') self.register(u'\N{SUPERSCRIPT ONE}', u'^1', mode='math') self.register( u'\N{SUPERSCRIPT ONE}', u'\\textonesuperior', package='textcomp') self.register(u'\N{INVERTED QUESTION MARK}', u'?`') self.register(u'\N{LATIN CAPITAL LETTER A WITH GRAVE}', u'\\`A') self.register(u'\N{LATIN CAPITAL LETTER A WITH CIRCUMFLEX}', u'\\^A') self.register(u'\N{LATIN CAPITAL LETTER A WITH TILDE}', u'\\~A') self.register(u'\N{LATIN CAPITAL LETTER A WITH DIAERESIS}', u'\\"A') self.register(u'\N{LATIN CAPITAL LETTER A WITH RING ABOVE}', u'\\AA') self.register(u'\N{LATIN CAPITAL LETTER A WITH RING ABOVE}', u'\\r A', encode=False) self.register(u'\N{LATIN CAPITAL LETTER AE}', u'\\AE') self.register(u'\N{LATIN CAPITAL LETTER C WITH CEDILLA}', u'\\c C') self.register(u'\N{LATIN CAPITAL LETTER E WITH GRAVE}', u'\\`E') self.register(u'\N{LATIN CAPITAL LETTER E WITH ACUTE}', u"\\'E") self.register(u'\N{LATIN CAPITAL LETTER E WITH CIRCUMFLEX}', u'\\^E') self.register(u'\N{LATIN CAPITAL LETTER E WITH DIAERESIS}', u'\\"E') self.register(u'\N{LATIN CAPITAL LETTER I WITH GRAVE}', u'\\`I') self.register(u'\N{LATIN CAPITAL LETTER I WITH CIRCUMFLEX}', u'\\^I') self.register(u'\N{LATIN CAPITAL LETTER I WITH DIAERESIS}', u'\\"I') self.register(u'\N{LATIN CAPITAL LETTER N WITH TILDE}', u'\\~N') self.register(u'\N{LATIN CAPITAL LETTER O WITH GRAVE}', u'\\`O') self.register(u'\N{LATIN CAPITAL LETTER O WITH ACUTE}', u"\\'O") self.register(u'\N{LATIN CAPITAL LETTER O WITH CIRCUMFLEX}', u'\\^O') self.register(u'\N{LATIN CAPITAL LETTER O WITH TILDE}', u'\\~O') self.register(u'\N{LATIN CAPITAL LETTER O WITH DIAERESIS}', u'\\"O') self.register(u'\N{MULTIPLICATION SIGN}', u'\\times', mode='math') self.register(u'\N{LATIN CAPITAL LETTER O WITH STROKE}', u'\\O') self.register(u'\N{LATIN CAPITAL LETTER U WITH GRAVE}', u'\\`U') self.register(u'\N{LATIN CAPITAL LETTER U WITH ACUTE}', u"\\'U") self.register(u'\N{LATIN CAPITAL LETTER U WITH CIRCUMFLEX}', u'\\^U') self.register(u'\N{LATIN CAPITAL LETTER U WITH DIAERESIS}', u'\\"U') self.register(u'\N{LATIN CAPITAL LETTER Y WITH ACUTE}', u"\\'Y") self.register(u'\N{LATIN SMALL LETTER SHARP S}', u'\\ss') self.register(u'\N{LATIN SMALL LETTER A WITH GRAVE}', u'\\`a') self.register(u'\N{LATIN SMALL LETTER A WITH ACUTE}', u"\\'a") self.register(u'\N{LATIN SMALL LETTER A WITH CIRCUMFLEX}', u'\\^a') self.register(u'\N{LATIN SMALL LETTER A WITH TILDE}', u'\\~a') self.register(u'\N{LATIN SMALL LETTER A WITH DIAERESIS}', u'\\"a') self.register(u'\N{LATIN SMALL LETTER A WITH RING ABOVE}', u'\\aa') self.register(u'\N{LATIN SMALL LETTER A WITH RING ABOVE}', u'\\r a', encode=False) self.register(u'\N{LATIN SMALL LETTER AE}', u'\\ae') self.register(u'\N{LATIN SMALL LETTER C WITH CEDILLA}', u'\\c c') self.register(u'\N{LATIN SMALL LETTER E WITH GRAVE}', u'\\`e') self.register(u'\N{LATIN SMALL LETTER E WITH ACUTE}', u"\\'e") self.register(u'\N{LATIN SMALL LETTER E WITH CIRCUMFLEX}', u'\\^e') self.register(u'\N{LATIN SMALL LETTER E WITH DIAERESIS}', u'\\"e') self.register(u'\N{LATIN SMALL LETTER I WITH GRAVE}', u'\\`\\i') self.register(u'\N{LATIN SMALL LETTER I WITH GRAVE}', u'\\`i') self.register(u'\N{LATIN SMALL LETTER I WITH ACUTE}', u"\\'\\i") self.register(u'\N{LATIN SMALL LETTER I WITH ACUTE}', u"\\'i") self.register(u'\N{LATIN SMALL LETTER I WITH CIRCUMFLEX}', 
u'\\^\\i') self.register(u'\N{LATIN SMALL LETTER I WITH CIRCUMFLEX}', u'\\^i') self.register(u'\N{LATIN SMALL LETTER I WITH DIAERESIS}', u'\\"\\i') self.register(u'\N{LATIN SMALL LETTER I WITH DIAERESIS}', u'\\"i') self.register(u'\N{LATIN SMALL LETTER N WITH TILDE}', u'\\~n') self.register(u'\N{LATIN SMALL LETTER O WITH GRAVE}', u'\\`o') self.register(u'\N{LATIN SMALL LETTER O WITH ACUTE}', u"\\'o") self.register(u'\N{LATIN SMALL LETTER O WITH CIRCUMFLEX}', u'\\^o') self.register(u'\N{LATIN SMALL LETTER O WITH TILDE}', u'\\~o') self.register(u'\N{LATIN SMALL LETTER O WITH DIAERESIS}', u'\\"o') self.register(u'\N{DIVISION SIGN}', u'\\div', mode='math') self.register(u'\N{LATIN SMALL LETTER O WITH STROKE}', u'\\o') self.register(u'\N{LATIN SMALL LETTER U WITH GRAVE}', u'\\`u') self.register(u'\N{LATIN SMALL LETTER U WITH ACUTE}', u"\\'u") self.register(u'\N{LATIN SMALL LETTER U WITH CIRCUMFLEX}', u'\\^u') self.register(u'\N{LATIN SMALL LETTER U WITH DIAERESIS}', u'\\"u') self.register(u'\N{LATIN SMALL LETTER Y WITH ACUTE}', u"\\'y") self.register(u'\N{LATIN SMALL LETTER Y WITH DIAERESIS}', u'\\"y') self.register(u'\N{LATIN CAPITAL LETTER A WITH MACRON}', u'\\=A') self.register(u'\N{LATIN SMALL LETTER A WITH MACRON}', u'\\=a') self.register(u'\N{LATIN CAPITAL LETTER A WITH BREVE}', u'\\u A') self.register(u'\N{LATIN SMALL LETTER A WITH BREVE}', u'\\u a') self.register(u'\N{LATIN CAPITAL LETTER A WITH OGONEK}', u'\\k A') self.register(u'\N{LATIN SMALL LETTER A WITH OGONEK}', u'\\k a') self.register(u'\N{LATIN CAPITAL LETTER C WITH ACUTE}', u"\\'C") self.register(u'\N{LATIN SMALL LETTER C WITH ACUTE}', u"\\'c") self.register(u'\N{LATIN CAPITAL LETTER C WITH CIRCUMFLEX}', u'\\^C') self.register(u'\N{LATIN SMALL LETTER C WITH CIRCUMFLEX}', u'\\^c') self.register(u'\N{LATIN CAPITAL LETTER C WITH DOT ABOVE}', u'\\.C') self.register(u'\N{LATIN SMALL LETTER C WITH DOT ABOVE}', u'\\.c') self.register(u'\N{LATIN CAPITAL LETTER C WITH CARON}', u'\\v C') self.register(u'\N{LATIN SMALL LETTER C WITH CARON}', u'\\v c') self.register(u'\N{LATIN CAPITAL LETTER D WITH CARON}', u'\\v D') self.register(u'\N{LATIN SMALL LETTER D WITH CARON}', u'\\v d') self.register(u'\N{LATIN CAPITAL LETTER E WITH MACRON}', u'\\=E') self.register(u'\N{LATIN SMALL LETTER E WITH MACRON}', u'\\=e') self.register(u'\N{LATIN CAPITAL LETTER E WITH BREVE}', u'\\u E') self.register(u'\N{LATIN SMALL LETTER E WITH BREVE}', u'\\u e') self.register(u'\N{LATIN CAPITAL LETTER E WITH DOT ABOVE}', u'\\.E') self.register(u'\N{LATIN SMALL LETTER E WITH DOT ABOVE}', u'\\.e') self.register(u'\N{LATIN CAPITAL LETTER E WITH OGONEK}', u'\\k E') self.register(u'\N{LATIN SMALL LETTER E WITH OGONEK}', u'\\k e') self.register(u'\N{LATIN CAPITAL LETTER E WITH CARON}', u'\\v E') self.register(u'\N{LATIN SMALL LETTER E WITH CARON}', u'\\v e') self.register(u'\N{LATIN CAPITAL LETTER G WITH CIRCUMFLEX}', u'\\^G') self.register(u'\N{LATIN SMALL LETTER G WITH CIRCUMFLEX}', u'\\^g') self.register(u'\N{LATIN CAPITAL LETTER G WITH BREVE}', u'\\u G') self.register(u'\N{LATIN SMALL LETTER G WITH BREVE}', u'\\u g') self.register(u'\N{LATIN CAPITAL LETTER G WITH DOT ABOVE}', u'\\.G') self.register(u'\N{LATIN SMALL LETTER G WITH DOT ABOVE}', u'\\.g') self.register(u'\N{LATIN CAPITAL LETTER G WITH CEDILLA}', u'\\c G') self.register(u'\N{LATIN SMALL LETTER G WITH CEDILLA}', u'\\c g') self.register(u'\N{LATIN CAPITAL LETTER H WITH CIRCUMFLEX}', u'\\^H') self.register(u'\N{LATIN SMALL LETTER H WITH CIRCUMFLEX}', u'\\^h') self.register(u'\N{LATIN CAPITAL LETTER I WITH 
TILDE}', u'\\~I') self.register(u'\N{LATIN SMALL LETTER I WITH TILDE}', u'\\~\\i') self.register(u'\N{LATIN SMALL LETTER I WITH TILDE}', u'\\~i') self.register(u'\N{LATIN CAPITAL LETTER I WITH MACRON}', u'\\=I') self.register(u'\N{LATIN SMALL LETTER I WITH MACRON}', u'\\=\\i') self.register(u'\N{LATIN SMALL LETTER I WITH MACRON}', u'\\=i') self.register(u'\N{LATIN CAPITAL LETTER I WITH BREVE}', u'\\u I') self.register(u'\N{LATIN SMALL LETTER I WITH BREVE}', u'\\u\\i') self.register(u'\N{LATIN SMALL LETTER I WITH BREVE}', u'\\u i') self.register(u'\N{LATIN CAPITAL LETTER I WITH OGONEK}', u'\\k I') self.register(u'\N{LATIN SMALL LETTER I WITH OGONEK}', u'\\k i') self.register(u'\N{LATIN CAPITAL LETTER I WITH DOT ABOVE}', u'\\.I') self.register(u'\N{LATIN SMALL LETTER DOTLESS I}', u'\\i') self.register(u'\N{LATIN CAPITAL LIGATURE IJ}', u'IJ', decode=False) self.register(u'\N{LATIN SMALL LIGATURE IJ}', u'ij', decode=False) self.register(u'\N{LATIN CAPITAL LETTER J WITH CIRCUMFLEX}', u'\\^J') self.register(u'\N{LATIN SMALL LETTER J WITH CIRCUMFLEX}', u'\\^\\j') self.register(u'\N{LATIN SMALL LETTER J WITH CIRCUMFLEX}', u'\\^j') self.register(u'\N{LATIN CAPITAL LETTER K WITH CEDILLA}', u'\\c K') self.register(u'\N{LATIN SMALL LETTER K WITH CEDILLA}', u'\\c k') self.register(u'\N{LATIN CAPITAL LETTER L WITH ACUTE}', u"\\'L") self.register(u'\N{LATIN SMALL LETTER L WITH ACUTE}', u"\\'l") self.register(u'\N{LATIN CAPITAL LETTER L WITH CEDILLA}', u'\\c L') self.register(u'\N{LATIN SMALL LETTER L WITH CEDILLA}', u'\\c l') self.register(u'\N{LATIN CAPITAL LETTER L WITH CARON}', u'\\v L') self.register(u'\N{LATIN SMALL LETTER L WITH CARON}', u'\\v l') self.register(u'\N{LATIN CAPITAL LETTER L WITH STROKE}', u'\\L') self.register(u'\N{LATIN SMALL LETTER L WITH STROKE}', u'\\l') self.register(u'\N{LATIN CAPITAL LETTER N WITH ACUTE}', u"\\'N") self.register(u'\N{LATIN SMALL LETTER N WITH ACUTE}', u"\\'n") self.register(u'\N{LATIN CAPITAL LETTER N WITH CEDILLA}', u'\\c N') self.register(u'\N{LATIN SMALL LETTER N WITH CEDILLA}', u'\\c n') self.register(u'\N{LATIN CAPITAL LETTER N WITH CARON}', u'\\v N') self.register(u'\N{LATIN SMALL LETTER N WITH CARON}', u'\\v n') self.register(u'\N{LATIN CAPITAL LETTER O WITH MACRON}', u'\\=O') self.register(u'\N{LATIN SMALL LETTER O WITH MACRON}', u'\\=o') self.register(u'\N{LATIN CAPITAL LETTER O WITH BREVE}', u'\\u O') self.register(u'\N{LATIN SMALL LETTER O WITH BREVE}', u'\\u o') self.register( u'\N{LATIN CAPITAL LETTER O WITH DOUBLE ACUTE}', u'\\H O') self.register(u'\N{LATIN SMALL LETTER O WITH DOUBLE ACUTE}', u'\\H o') self.register(u'\N{LATIN CAPITAL LIGATURE OE}', u'\\OE') self.register(u'\N{LATIN SMALL LIGATURE OE}', u'\\oe') self.register(u'\N{LATIN CAPITAL LETTER R WITH ACUTE}', u"\\'R") self.register(u'\N{LATIN SMALL LETTER R WITH ACUTE}', u"\\'r") self.register(u'\N{LATIN CAPITAL LETTER R WITH CEDILLA}', u'\\c R') self.register(u'\N{LATIN SMALL LETTER R WITH CEDILLA}', u'\\c r') self.register(u'\N{LATIN CAPITAL LETTER R WITH CARON}', u'\\v R') self.register(u'\N{LATIN SMALL LETTER R WITH CARON}', u'\\v r') self.register(u'\N{LATIN CAPITAL LETTER S WITH ACUTE}', u"\\'S") self.register(u'\N{LATIN SMALL LETTER S WITH ACUTE}', u"\\'s") self.register(u'\N{LATIN CAPITAL LETTER S WITH CIRCUMFLEX}', u'\\^S') self.register(u'\N{LATIN SMALL LETTER S WITH CIRCUMFLEX}', u'\\^s') self.register(u'\N{LATIN CAPITAL LETTER S WITH CEDILLA}', u'\\c S') self.register(u'\N{LATIN SMALL LETTER S WITH CEDILLA}', u'\\c s') self.register(u'\N{LATIN CAPITAL LETTER S WITH CARON}', 
u'\\v S') self.register(u'\N{LATIN SMALL LETTER S WITH CARON}', u'\\v s') self.register(u'\N{LATIN CAPITAL LETTER T WITH CEDILLA}', u'\\c T') self.register(u'\N{LATIN SMALL LETTER T WITH CEDILLA}', u'\\c t') self.register(u'\N{LATIN CAPITAL LETTER T WITH CARON}', u'\\v T') self.register(u'\N{LATIN SMALL LETTER T WITH CARON}', u'\\v t') self.register(u'\N{LATIN CAPITAL LETTER U WITH TILDE}', u'\\~U') self.register(u'\N{LATIN SMALL LETTER U WITH TILDE}', u'\\~u') self.register(u'\N{LATIN CAPITAL LETTER U WITH MACRON}', u'\\=U') self.register(u'\N{LATIN SMALL LETTER U WITH MACRON}', u'\\=u') self.register(u'\N{LATIN CAPITAL LETTER U WITH BREVE}', u'\\u U') self.register(u'\N{LATIN SMALL LETTER U WITH BREVE}', u'\\u u') self.register(u'\N{LATIN CAPITAL LETTER U WITH RING ABOVE}', u'\\r U') self.register(u'\N{LATIN SMALL LETTER U WITH RING ABOVE}', u'\\r u') self.register( u'\N{LATIN CAPITAL LETTER U WITH DOUBLE ACUTE}', u'\\H U') self.register(u'\N{LATIN SMALL LETTER U WITH DOUBLE ACUTE}', u'\\H u') self.register(u'\N{LATIN CAPITAL LETTER U WITH OGONEK}', u'\\k U') self.register(u'\N{LATIN SMALL LETTER U WITH OGONEK}', u'\\k u') self.register(u'\N{LATIN CAPITAL LETTER W WITH CIRCUMFLEX}', u'\\^W') self.register(u'\N{LATIN SMALL LETTER W WITH CIRCUMFLEX}', u'\\^w') self.register(u'\N{LATIN CAPITAL LETTER Y WITH CIRCUMFLEX}', u'\\^Y') self.register(u'\N{LATIN SMALL LETTER Y WITH CIRCUMFLEX}', u'\\^y') self.register(u'\N{LATIN CAPITAL LETTER Y WITH DIAERESIS}', u'\\"Y') self.register(u'\N{LATIN CAPITAL LETTER Z WITH ACUTE}', u"\\'Z") self.register(u'\N{LATIN SMALL LETTER Z WITH ACUTE}', u"\\'z") self.register(u'\N{LATIN CAPITAL LETTER Z WITH DOT ABOVE}', u'\\.Z') self.register(u'\N{LATIN SMALL LETTER Z WITH DOT ABOVE}', u'\\.z') self.register(u'\N{LATIN CAPITAL LETTER Z WITH CARON}', u'\\v Z') self.register(u'\N{LATIN SMALL LETTER Z WITH CARON}', u'\\v z') self.register(u'\N{LATIN CAPITAL LETTER DZ WITH CARON}', u'D\\v Z') self.register( u'\N{LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON}', u'D\\v z') self.register(u'\N{LATIN SMALL LETTER DZ WITH CARON}', u'd\\v z') self.register(u'\N{LATIN CAPITAL LETTER LJ}', u'LJ', decode=False) self.register( u'\N{LATIN CAPITAL LETTER L WITH SMALL LETTER J}', u'Lj', decode=False) self.register(u'\N{LATIN SMALL LETTER LJ}', u'lj', decode=False) self.register(u'\N{LATIN CAPITAL LETTER NJ}', u'NJ', decode=False) self.register( u'\N{LATIN CAPITAL LETTER N WITH SMALL LETTER J}', u'Nj', decode=False) self.register(u'\N{LATIN SMALL LETTER NJ}', u'nj', decode=False) self.register(u'\N{LATIN CAPITAL LETTER A WITH CARON}', u'\\v A') self.register(u'\N{LATIN SMALL LETTER A WITH CARON}', u'\\v a') self.register(u'\N{LATIN CAPITAL LETTER I WITH CARON}', u'\\v I') self.register(u'\N{LATIN SMALL LETTER I WITH CARON}', u'\\v\\i') self.register(u'\N{LATIN CAPITAL LETTER O WITH CARON}', u'\\v O') self.register(u'\N{LATIN SMALL LETTER O WITH CARON}', u'\\v o') self.register(u'\N{LATIN CAPITAL LETTER U WITH CARON}', u'\\v U') self.register(u'\N{LATIN SMALL LETTER U WITH CARON}', u'\\v u') self.register(u'\N{LATIN CAPITAL LETTER G WITH CARON}', u'\\v G') self.register(u'\N{LATIN SMALL LETTER G WITH CARON}', u'\\v g') self.register(u'\N{LATIN CAPITAL LETTER K WITH CARON}', u'\\v K') self.register(u'\N{LATIN SMALL LETTER K WITH CARON}', u'\\v k') self.register(u'\N{LATIN CAPITAL LETTER O WITH OGONEK}', u'\\k O') self.register(u'\N{LATIN SMALL LETTER O WITH OGONEK}', u'\\k o') self.register(u'\N{LATIN SMALL LETTER J WITH CARON}', u'\\v\\j') self.register(u'\N{LATIN CAPITAL 
LETTER DZ}', u'DZ', decode=False) self.register( u'\N{LATIN CAPITAL LETTER D WITH SMALL LETTER Z}', u'Dz', decode=False) self.register(u'\N{LATIN SMALL LETTER DZ}', u'dz', decode=False) self.register(u'\N{LATIN CAPITAL LETTER G WITH ACUTE}', u"\\'G") self.register(u'\N{LATIN SMALL LETTER G WITH ACUTE}', u"\\'g") self.register(u'\N{LATIN CAPITAL LETTER AE WITH ACUTE}', u"\\'\\AE") self.register(u'\N{LATIN SMALL LETTER AE WITH ACUTE}', u"\\'\\ae") self.register( u'\N{LATIN CAPITAL LETTER O WITH STROKE AND ACUTE}', u"\\'\\O") self.register( u'\N{LATIN SMALL LETTER O WITH STROKE AND ACUTE}', u"\\'\\o") self.register(u'\N{LATIN CAPITAL LETTER ETH}', u'\\DH') self.register(u'\N{LATIN SMALL LETTER ETH}', u'\\dh') self.register(u'\N{LATIN CAPITAL LETTER THORN}', u'\\TH') self.register(u'\N{LATIN SMALL LETTER THORN}', u'\\th') self.register(u'\N{LATIN CAPITAL LETTER D WITH STROKE}', u'\\DJ') self.register(u'\N{LATIN SMALL LETTER D WITH STROKE}', u'\\dj') self.register(u'\N{LATIN CAPITAL LETTER D WITH DOT BELOW}', u'\\d D') self.register(u'\N{LATIN SMALL LETTER D WITH DOT BELOW}', u'\\d d') self.register(u'\N{LATIN CAPITAL LETTER L WITH DOT BELOW}', u'\\d L') self.register(u'\N{LATIN SMALL LETTER L WITH DOT BELOW}', u'\\d l') self.register(u'\N{LATIN CAPITAL LETTER M WITH DOT BELOW}', u'\\d M') self.register(u'\N{LATIN SMALL LETTER M WITH DOT BELOW}', u'\\d m') self.register(u'\N{LATIN CAPITAL LETTER N WITH DOT BELOW}', u'\\d N') self.register(u'\N{LATIN SMALL LETTER N WITH DOT BELOW}', u'\\d n') self.register(u'\N{LATIN CAPITAL LETTER R WITH DOT BELOW}', u'\\d R') self.register(u'\N{LATIN SMALL LETTER R WITH DOT BELOW}', u'\\d r') self.register(u'\N{LATIN CAPITAL LETTER S WITH DOT BELOW}', u'\\d S') self.register(u'\N{LATIN SMALL LETTER S WITH DOT BELOW}', u'\\d s') self.register(u'\N{LATIN CAPITAL LETTER T WITH DOT BELOW}', u'\\d T') self.register(u'\N{LATIN SMALL LETTER T WITH DOT BELOW}', u'\\d t') self.register(u'\N{LATIN CAPITAL LETTER S WITH COMMA BELOW}', u'\\textcommabelow S') self.register(u'\N{LATIN SMALL LETTER S WITH COMMA BELOW}', u'\\textcommabelow s') self.register(u'\N{LATIN CAPITAL LETTER T WITH COMMA BELOW}', u'\\textcommabelow T') self.register(u'\N{LATIN SMALL LETTER T WITH COMMA BELOW}', u'\\textcommabelow t') self.register(u'\N{PARTIAL DIFFERENTIAL}', u'\\partial', mode='math') self.register(u'\N{N-ARY PRODUCT}', u'\\prod', mode='math') self.register(u'\N{N-ARY SUMMATION}', u'\\sum', mode='math') self.register(u'\N{SQUARE ROOT}', u'\\surd', mode='math') self.register(u'\N{INFINITY}', u'\\infty', mode='math') self.register(u'\N{INTEGRAL}', u'\\int', mode='math') self.register(u'\N{INTERSECTION}', u'\\cap', mode='math') self.register(u'\N{UNION}', u'\\cup', mode='math') self.register(u'\N{RIGHTWARDS ARROW}', u'\\rightarrow', mode='math') self.register( u'\N{RIGHTWARDS DOUBLE ARROW}', u'\\Rightarrow', mode='math') self.register(u'\N{LEFTWARDS ARROW}', u'\\leftarrow', mode='math') self.register( u'\N{LEFTWARDS DOUBLE ARROW}', u'\\Leftarrow', mode='math') self.register(u'\N{LOGICAL OR}', u'\\vee', mode='math') self.register(u'\N{LOGICAL AND}', u'\\wedge', mode='math') self.register(u'\N{ALMOST EQUAL TO}', u'\\approx', mode='math') self.register(u'\N{NOT EQUAL TO}', u'\\neq', mode='math') self.register(u'\N{LESS-THAN OR EQUAL TO}', u'\\leq', mode='math') self.register(u'\N{GREATER-THAN OR EQUAL TO}', u'\\geq', mode='math') self.register(u'\N{MODIFIER LETTER CIRCUMFLEX ACCENT}', u'\\^{}') self.register(u'\N{CARON}', u'\\v{}') self.register(u'\N{BREVE}', u'\\u{}') 
self.register(u'\N{DOT ABOVE}', u'\\.{}') self.register(u'\N{RING ABOVE}', u'\\r{}') self.register(u'\N{OGONEK}', u'\\k{}') self.register(u'\N{DOUBLE ACUTE ACCENT}', u'\\H{}') self.register(u'\N{LATIN SMALL LIGATURE FI}', u'fi', decode=False) self.register(u'\N{LATIN SMALL LIGATURE FL}', u'fl', decode=False) self.register(u'\N{LATIN SMALL LIGATURE FF}', u'ff', decode=False) self.register(u'\N{GREEK SMALL LETTER ALPHA}', u'\\alpha', mode='math') self.register(u'\N{GREEK SMALL LETTER BETA}', u'\\beta', mode='math') self.register(u'\N{GREEK SMALL LETTER GAMMA}', u'\\gamma', mode='math') self.register(u'\N{GREEK SMALL LETTER DELTA}', u'\\delta', mode='math') self.register( u'\N{GREEK SMALL LETTER EPSILON}', u'\\epsilon', mode='math') self.register(u'\N{GREEK SMALL LETTER ZETA}', u'\\zeta', mode='math') self.register(u'\N{GREEK SMALL LETTER ETA}', u'\\eta', mode='math') self.register(u'\N{GREEK SMALL LETTER THETA}', u'\\theta', mode='math') self.register(u'\N{GREEK SMALL LETTER THETA}', u'\\texttheta', package='textgreek', encode=False) self.register(u'\N{GREEK SMALL LETTER IOTA}', u'\\iota', mode='math') self.register(u'\N{GREEK SMALL LETTER KAPPA}', u'\\kappa', mode='math') self.register( u'\N{GREEK SMALL LETTER LAMDA}', u'\\lambda', mode='math') # LAMDA not LAMBDA self.register(u'\N{GREEK SMALL LETTER MU}', u'\\mu', mode='math') self.register(u'\N{GREEK SMALL LETTER NU}', u'\\nu', mode='math') self.register(u'\N{GREEK SMALL LETTER XI}', u'\\xi', mode='math') self.register( u'\N{GREEK SMALL LETTER OMICRON}', u'\\omicron', mode='math') self.register(u'\N{GREEK SMALL LETTER PI}', u'\\pi', mode='math') self.register(u'\N{GREEK SMALL LETTER RHO}', u'\\rho', mode='math') self.register(u'\N{GREEK SMALL LETTER SIGMA}', u'\\sigma', mode='math') self.register(u'\N{GREEK SMALL LETTER TAU}', u'\\tau', mode='math') self.register( u'\N{GREEK SMALL LETTER UPSILON}', u'\\upsilon', mode='math') self.register(u'\N{GREEK SMALL LETTER PHI}', u'\\phi', mode='math') self.register(u'\N{GREEK PHI SYMBOL}', u'\\varphi', mode='math') self.register(u'\N{GREEK SMALL LETTER CHI}', u'\\chi', mode='math') self.register(u'\N{GREEK SMALL LETTER PSI}', u'\\psi', mode='math') self.register(u'\N{GREEK SMALL LETTER OMEGA}', u'\\omega', mode='math') self.register( u'\N{GREEK CAPITAL LETTER ALPHA}', u'\\Alpha', mode='math') self.register(u'\N{GREEK CAPITAL LETTER BETA}', u'\\Beta', mode='math') self.register( u'\N{GREEK CAPITAL LETTER GAMMA}', u'\\Gamma', mode='math') self.register( u'\N{GREEK CAPITAL LETTER DELTA}', u'\\Delta', mode='math') self.register( u'\N{GREEK CAPITAL LETTER EPSILON}', u'\\Epsilon', mode='math') self.register(u'\N{GREEK CAPITAL LETTER ZETA}', u'\\Zeta', mode='math') self.register(u'\N{GREEK CAPITAL LETTER ETA}', u'\\Eta', mode='math') self.register( u'\N{GREEK CAPITAL LETTER THETA}', u'\\Theta', mode='math') self.register(u'\N{GREEK CAPITAL LETTER IOTA}', u'\\Iota', mode='math') self.register( u'\N{GREEK CAPITAL LETTER KAPPA}', u'\\Kappa', mode='math') self.register( u'\N{GREEK CAPITAL LETTER LAMDA}', u'\\Lambda', mode='math') # LAMDA not LAMBDA self.register(u'\N{GREEK CAPITAL LETTER MU}', u'\\Mu', mode='math') self.register(u'\N{GREEK CAPITAL LETTER NU}', u'\\Nu', mode='math') self.register(u'\N{GREEK CAPITAL LETTER XI}', u'\\Xi', mode='math') self.register( u'\N{GREEK CAPITAL LETTER OMICRON}', u'\\Omicron', mode='math') self.register(u'\N{GREEK CAPITAL LETTER PI}', u'\\Pi', mode='math') self.register(u'\N{GREEK CAPITAL LETTER RHO}', u'\\Rho', mode='math') self.register( u'\N{GREEK CAPITAL LETTER 
SIGMA}', u'\\Sigma', mode='math') self.register(u'\N{GREEK CAPITAL LETTER TAU}', u'\\Tau', mode='math') self.register( u'\N{GREEK CAPITAL LETTER UPSILON}', u'\\Upsilon', mode='math') self.register(u'\N{GREEK CAPITAL LETTER PHI}', u'\\Phi', mode='math') self.register(u'\N{GREEK CAPITAL LETTER CHI}', u'\\Chi', mode='math') self.register(u'\N{GREEK CAPITAL LETTER PSI}', u'\\Psi', mode='math') self.register( u'\N{GREEK CAPITAL LETTER OMEGA}', u'\\Omega', mode='math') self.register(u'\N{COPYRIGHT SIGN}', u'\\copyright') self.register(u'\N{COPYRIGHT SIGN}', u'\\textcopyright') self.register(u'\N{LATIN CAPITAL LETTER A WITH ACUTE}', u"\\'A") self.register(u'\N{LATIN CAPITAL LETTER I WITH ACUTE}', u"\\'I") self.register(u'\N{HORIZONTAL ELLIPSIS}', u'\\ldots') self.register(u'\N{TRADE MARK SIGN}', u'^{TM}', mode='math') self.register( u'\N{TRADE MARK SIGN}', u'\\texttrademark', package='textcomp') self.register( u'\N{REGISTERED SIGN}', u'\\textregistered', package='textcomp') # \=O and \=o will be translated into Ō and ō before we can # match the full latex string... so decoding disabled for now self.register(u'Ǭ', text_type(r'\textogonekcentered{\=O}'), decode=False) self.register(u'ǭ', text_type(r'\textogonekcentered{\=o}'), decode=False) self.register(u'ℕ', text_type(r'\mathbb{N}'), mode='math') self.register(u'ℕ', text_type(r'\mathbb N'), mode='math', decode=False) self.register(u'ℤ', text_type(r'\mathbb{Z}'), mode='math') self.register(u'ℤ', text_type(r'\mathbb Z'), mode='math', decode=False) self.register(u'ℚ', text_type(r'\mathbb{Q}'), mode='math') self.register(u'ℚ', text_type(r'\mathbb Q'), mode='math', decode=False) self.register(u'ℝ', text_type(r'\mathbb{R}'), mode='math') self.register(u'ℝ', text_type(r'\mathbb R'), mode='math', decode=False) self.register(u'ℂ', text_type(r'\mathbb{C}'), mode='math') self.register(u'ℂ', text_type(r'\mathbb C'), mode='math', decode=False) def register(self, unicode_text, latex_text, mode='text', package=None, decode=True, encode=True): """Register a correspondence between *unicode_text* and *latex_text*. :param str unicode_text: A unicode character. :param str latex_text: Its corresponding LaTeX translation. :param str mode: LaTeX mode in which the translation applies (``'text'`` or ``'math'``). :param str package: LaTeX package requirements (currently ignored). :param bool decode: Whether this translation applies to decoding (default: ``True``). :param bool encode: Whether this translation applies to encoding (default: ``True``). """ if mode == 'math': # also register text version self.register(unicode_text, u'$' + latex_text + u'$', mode='text', package=package, decode=decode, encode=encode) self.register(unicode_text, text_type(r'\(') + latex_text + text_type(r'\)'), mode='text', package=package, decode=decode, encode=encode) # XXX for the time being, we do not perform in-math substitutions return if package is not None: # TODO implement packages pass # tokenize, and register unicode translation self.lexer.reset() self.lexer.state = 'M' tokens = tuple(self.lexer.get_tokens(latex_text, final=True)) if decode: if tokens not in self.unicode_map: self.max_length = max(self.max_length, len(tokens)) self.unicode_map[tokens] = unicode_text # also register token variant with brackets, if appropriate # for instance, "\'{e}" for "\'e", "\c{c}" for "\c c", etc. # note: we do not remove brackets (they sometimes matter, # e.g. 
bibtex uses them to prevent lower case transformation) if (len(tokens) == 2 and tokens[0].name.startswith(u'control') and tokens[1].name == u'chars'): alt_tokens = (tokens[0], self.lexer.curlylefttoken, tokens[1], self.lexer.curlyrighttoken) if alt_tokens not in self.unicode_map: self.max_length = max(self.max_length, len(alt_tokens)) self.unicode_map[alt_tokens] = u"{" + unicode_text + u"}" if encode and unicode_text not in self.latex_map: assert len(unicode_text) == 1 self.latex_map[unicode_text] = (latex_text, tokens) _LATEX_UNICODE_TABLE = LatexUnicodeTable(lexer.LatexIncrementalDecoder()) _ULATEX_UNICODE_TABLE = LatexUnicodeTable( lexer.UnicodeLatexIncrementalDecoder()) # incremental encoder does not need a buffer # but decoder does class LatexIncrementalEncoder(lexer.LatexIncrementalEncoder): """Translating incremental encoder for latex. Maintains a state to determine whether control spaces etc. need to be inserted. """ emptytoken = lexer.Token(u"unknown", u"") """The empty token.""" table = _LATEX_UNICODE_TABLE """Translation table.""" def __init__(self, errors='strict'): super(LatexIncrementalEncoder, self).__init__(errors=errors) self.reset() def reset(self): super(LatexIncrementalEncoder, self).reset() self.state = 'M' def get_space_bytes(self, bytes_): """Inserts space bytes in space eating mode.""" if self.state == 'S': # in space eating mode # control space needed? if bytes_.startswith(u' '): # replace by control space return u'\\ ', bytes_[1:] else: # insert space (it is eaten, but needed for separation) return u' ', bytes_ else: return u'', bytes_ def _get_latex_chars_tokens_from_char(self, c): # if ascii, try latex equivalents # (this covers \, #, &, and other special LaTeX characters) if ord(c) < 128: try: return self.table.latex_map[c] except KeyError: pass # next, try input encoding try: bytes_ = c.encode(self.inputenc, 'strict') except UnicodeEncodeError: pass else: return c, (lexer.Token(name=u'chars', text=c),) # next, try latex equivalents of common unicode characters try: return self.table.latex_map[c] except KeyError: # translation failed if self.errors == 'strict': raise UnicodeEncodeError( "latex", # codec c, # problematic input 0, 1, # location of problematic character "don't know how to translate {0} into latex" .format(repr(c))) elif self.errors == 'ignore': return u'', (self.emptytoken,) elif self.errors == 'replace': # use the \\char command # this assumes # \usepackage[T1]{fontenc} # \usepackage[utf8]{inputenc} bytes_ = u'{\\char' + str(ord(c)) + u'}' return bytes_, (lexer.Token(name=u'chars', text=bytes_),) elif self.errors == 'keep' and not self.binary_mode: return c, (lexer.Token(name=u'chars', text=c),) else: raise ValueError( "latex codec does not support {0} errors" .format(self.errors)) def get_latex_chars(self, unicode_, final=False): if not isinstance(unicode_, string_types): raise TypeError( "expected unicode for encode input, but got {0} instead" .format(unicode_.__class__.__name__)) # convert character by character for pos, c in enumerate(unicode_): bytes_, tokens = self._get_latex_chars_tokens_from_char(c) space, bytes_ = self.get_space_bytes(bytes_) # update state if tokens[-1].name == u'control_word': # we're eating spaces self.state = 'S' else: self.state = 'M' if space: yield space yield bytes_ class LatexIncrementalDecoder(lexer.LatexIncrementalDecoder): """Translating incremental decoder for LaTeX.""" table = _LATEX_UNICODE_TABLE """Translation table.""" def __init__(self, errors='strict'): lexer.LatexIncrementalDecoder.__init__(self, 
errors=errors) def reset(self): lexer.LatexIncrementalDecoder.reset(self) self.token_buffer = [] # python codecs API does not support multibuffer incremental decoders def getstate(self): raise NotImplementedError def setstate(self, state): raise NotImplementedError def get_unicode_tokens(self, chars, final=False): for token in self.get_tokens(chars, final=final): # at this point, token_buffer does not match anything self.token_buffer.append(token) # new token appended at the end, see if we have a match now # note: match is only possible at the *end* of the buffer # because all other positions have already been checked in # earlier iterations for i in range(len(self.token_buffer), 0, -1): last_tokens = tuple(self.token_buffer[-i:]) # last i tokens try: unicode_text = self.table.unicode_map[last_tokens] except KeyError: # no match: continue continue else: # match!! flush buffer, and translate last bit # exclude last i tokens for token in self.token_buffer[:-i]: yield self.decode_token(token) yield unicode_text self.token_buffer = [] break # flush tokens that can no longer match while len(self.token_buffer) >= self.table.max_length: yield self.decode_token(self.token_buffer.pop(0)) # also flush the buffer at the end if final: for token in self.token_buffer: yield self.decode_token(token) self.token_buffer = [] class LatexCodec(codecs.Codec): IncrementalEncoder = None IncrementalDecoder = None def encode(self, unicode_, errors='strict'): """Convert unicode string to LaTeX bytes.""" encoder = self.IncrementalEncoder(errors=errors) return ( encoder.encode(unicode_, final=True), len(unicode_), ) def decode(self, bytes_, errors='strict'): """Convert LaTeX bytes to unicode string.""" decoder = self.IncrementalDecoder(errors=errors) return ( decoder.decode(bytes_, final=True), len(bytes_), ) class UnicodeLatexIncrementalDecoder(LatexIncrementalDecoder): table = _ULATEX_UNICODE_TABLE binary_mode = False class UnicodeLatexIncrementalEncoder(LatexIncrementalEncoder): table = _ULATEX_UNICODE_TABLE binary_mode = False def find_latex(encoding): """Return a :class:`codecs.CodecInfo` instance for the requested LaTeX *encoding*, which must be equal to ``latex``, or to ``latex+`` where ```` describes another encoding. 
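
    For example, ``latex+latin1`` translates through ``latin1`` as the
    input encoding. A small usage sketch (Python 3 ``bytes`` reprs shown;
    any codec name known to Python may follow the ``+``)::

        >>> import latexcodec  # noqa: registers the codec search function
        >>> u"ü".encode("latex")
        b'\\"u'
        >>> u"ü".encode("latex+latin1")
        b'\xfc'
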
""" if u'_' in encoding: # Python 3.9 now normalizes "latex+latin1" to "latex_latin1" # https://bugs.python.org/issue37751 encoding, _, inputenc_ = encoding.partition(u"_") else: encoding, _, inputenc_ = encoding.partition(u"+") if not inputenc_: inputenc_ = "ascii" if encoding == "latex": IncEnc = LatexIncrementalEncoder DecEnc = LatexIncrementalDecoder elif encoding == "ulatex": IncEnc = UnicodeLatexIncrementalEncoder DecEnc = UnicodeLatexIncrementalDecoder else: return None class IncrementalEncoder_(IncEnc): inputenc = inputenc_ class IncrementalDecoder_(DecEnc): inputenc = inputenc_ class Codec(LatexCodec): IncrementalEncoder = IncrementalEncoder_ IncrementalDecoder = IncrementalDecoder_ class StreamWriter(Codec, codecs.StreamWriter): pass class StreamReader(Codec, codecs.StreamReader): pass return codecs.CodecInfo( encode=Codec().encode, decode=Codec().decode, incrementalencoder=Codec.IncrementalEncoder, incrementaldecoder=Codec.IncrementalDecoder, streamreader=StreamReader, streamwriter=StreamWriter, ) latexcodec-2.0.1/latexcodec/lexer.py0000644005105600024240000004124413674355130017415 0ustar dma0mtdma00000000000000# -*- coding: utf-8 -*- """ LaTeX Lexer ~~~~~~~~~~~ This module contains all classes for lexing LaTeX code, as well as general purpose base classes for incremental LaTeX decoders and encoders, which could be useful in case you are writing your own custom LaTeX codec. .. autoclass:: Token(name, text) .. autoclass:: LatexLexer :show-inheritance: :members: .. autoclass:: LatexIncrementalLexer :show-inheritance: :members: .. autoclass:: LatexIncrementalDecoder :show-inheritance: :members: .. autoclass:: LatexIncrementalEncoder :show-inheritance: :members: """ # Copyright (c) 2003, 2008 David Eppstein # Copyright (c) 2011-2020 Matthias C. M. Troffaes # # Permission is hereby granted, free of charge, to any person # obtaining a copy of this software and associated documentation # files (the "Software"), to deal in the Software without # restriction, including without limitation the rights to use, # copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the # Software is furnished to do so, subject to the following # conditions: # # The above copyright notice and this permission notice shall be # included in all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES # OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND # NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, # WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING # FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR # OTHER DEALINGS IN THE SOFTWARE. import codecs import collections import re from six import add_metaclass, binary_type, string_types import unicodedata Token = collections.namedtuple("Token", "name text") # implementation note: we derive from IncrementalDecoder because this # class serves excellently as a base class for incremental decoders, # but of course we don't decode yet until later class MetaRegexpLexer(type): """Metaclass for :class:`RegexpLexer`. Compiles tokens into a regular expression. 
""" def __init__(cls, name, bases, dct): super(MetaRegexpLexer, cls).__init__(name, bases, dct) regexp_string = (u"|".join( u"(?P<" + name + u">" + regexp + u")" for name, regexp in cls.tokens)) cls.regexp = re.compile(regexp_string, re.DOTALL) @add_metaclass(MetaRegexpLexer) class RegexpLexer(codecs.IncrementalDecoder): """Abstract base class for regexp based lexers.""" emptytoken = Token(u"unknown", u"") """The empty token.""" tokens = () """Tuple containing all token regular expressions.""" def __init__(self, errors='strict'): """Initialize the codec.""" self.errors = errors self.reset() def reset(self): """Reset state.""" # buffer for storing last (possibly incomplete) token self.raw_buffer = self.emptytoken def getstate(self): """Get state.""" return (self.raw_buffer.text, 0) def setstate(self, state): """Set state. The *state* must correspond to the return value of a previous :meth:`getstate` call. """ self.raw_buffer = Token('unknown', state[0]) def get_raw_tokens(self, chars, final=False): """Yield tokens without any further processing. Tokens are one of: - ``\\``: a control word (i.e. a command) - ``\\``: a control symbol (i.e. \\^ etc.) - ``#``: a parameter - a series of byte characters """ if self.raw_buffer.text: chars = self.raw_buffer.text + chars self.raw_buffer = self.emptytoken for match in self.regexp.finditer(chars): # yield the buffer token if self.raw_buffer.text: yield self.raw_buffer # fill buffer with next token self.raw_buffer = Token(match.lastgroup, match.group(0)) if final: for token in self.flush_raw_tokens(): yield token def flush_raw_tokens(self): """Flush the raw token buffer.""" if self.raw_buffer.text: yield self.raw_buffer self.raw_buffer = self.emptytoken class LatexLexer(RegexpLexer): """A very simple lexer for tex/latex.""" # implementation note: every token **must** be decodable by inputenc tokens = ( # match newlines and percent first, to ensure comments match correctly (u'control_symbol_x2', r'[\\][\\]|[\\]%'), # comment: for ease, and for speed, we handle it as a token (u'comment', r'%[^\n]*'), # control tokens # in latex, some control tokens skip following whitespace # ('control-word' and 'control-symbol') # others do not ('control-symbol-x') # XXX TBT says no control symbols skip whitespace (except '\ ') # XXX but tests reveal otherwise? (u'control_word', r'[\\][a-zA-Z]+'), (u'control_symbol', r'[\\][~' r"'" r'"` =^!.]'), # TODO should only match ascii (u'control_symbol_x', r'[\\][^a-zA-Z]'), # parameter tokens # also support a lone hash so we can lex things like '#a' (u'parameter', r'\#[0-9]|\#'), # any remaining characters; for ease we also handle space and # newline as tokens # XXX TBT does not mention \t to be a space character as well # XXX but tests reveal otherwise? (u'space', r' |\t'), (u'newline', r'\n'), (u'mathshift', r'[$][$]|[$]'), # note: some chars joined together to make it easier to detect # symbols that have a special function (i.e. --, ---, etc.) (u'chars', r'---|--|-|[`][`]' r"|['][']" r'|[?][`]|[!][`]' # separate chars because brackets are optional # e.g. fran\\c cais = fran\\c{c}ais in latex # so only way to detect \\c acting on c only is this way r'|(?![ %#$\n\t\\]).'), # trailing garbage which we cannot decode otherwise # (such as a lone '\' at the end of a buffer) # is never emitted, but used internally by the buffer (u'unknown', r'.'), ) """List of token names, and the regular expressions they match.""" class LatexIncrementalLexer(LatexLexer): """A very simple incremental lexer for tex/latex code. 
Roughly follows the state machine described in Tex By Topic, Chapter 2. The generated tokens satisfy: * no newline characters: paragraphs are separated by '\\par' * spaces following control tokens are compressed """ partoken = Token(u"control_word", u"\\par") spacetoken = Token(u"space", u" ") replacetoken = Token(u"chars", u"\ufffd") curlylefttoken = Token(u"chars", u"{") curlyrighttoken = Token(u"chars", u"}") def reset(self): super(LatexIncrementalLexer, self).reset() # three possible states: # newline (N), skipping spaces (S), and middle of line (M) self.state = 'N' # inline math mode? self.inline_math = False def getstate(self): # state 'M' is most common, so let that be zero return ( self.raw_buffer, {'M': 0, 'N': 1, 'S': 2}[self.state] | (4 if self.inline_math else 0) ) def setstate(self, state): self.raw_buffer = state[0] self.state = {0: 'M', 1: 'N', 2: 'S'}[state[1] & 3] self.inline_math = bool(state[1] & 4) def get_tokens(self, chars, final=False): """Yield tokens while maintaining a state. Also skip whitespace after control words and (some) control symbols. Replaces newlines by spaces and \\par commands depending on the context. """ # current position relative to the start of chars in the sequence # of bytes that have been decoded pos = -len(self.raw_buffer.text) for token in self.get_raw_tokens(chars, final=final): pos = pos + len(token.text) assert pos >= 0 # first token includes at least self.raw_buffer if token.name == u'newline': if self.state == 'N': # if state was 'N', generate new paragraph yield self.partoken elif self.state == 'S': # switch to 'N' state, do not generate a space self.state = 'N' elif self.state == 'M': # switch to 'N' state, generate a space self.state = 'N' yield self.spacetoken else: raise AssertionError( "unknown tex state {0!r}".format(self.state)) elif token.name == u'space': if self.state == 'N': # remain in 'N' state, no space token generated pass elif self.state == 'S': # remain in 'S' state, no space token generated pass elif self.state == 'M': # in M mode, generate the space, # but switch to space skip mode self.state = 'S' yield token else: raise AssertionError( "unknown state {0!r}".format(self.state)) elif token.name == u'mathshift': self.inline_math = not self.inline_math self.state = 'M' yield token elif token.name == u'parameter': self.state = 'M' yield token elif token.name == u'control_word': # go to space skip mode self.state = 'S' yield token elif token.name == u'control_symbol': # go to space skip mode self.state = 'S' yield token elif (token.name == u'control_symbol_x' or token.name == u'control_symbol_x2'): # don't skip following space, so go to M mode self.state = 'M' yield token elif token.name == u'comment': # no token is generated # note: comment does not include the newline self.state = 'S' elif token.name == 'chars': self.state = 'M' yield token elif token.name == u'unknown': if self.errors == 'strict': # current position within chars # this is the position right after the unknown token raise UnicodeDecodeError( "latex", # codec chars.encode('utf8'), # problematic input pos - len(token.text), # start of problematic token pos, # end of it "unknown token {0!r}".format(token.text)) elif self.errors == 'ignore': # do nothing pass elif self.errors == 'replace': yield self.replacetoken else: raise NotImplementedError( "error mode {0!r} not supported".format(self.errors)) else: raise AssertionError( "unknown token name {0!r}".format(token.name)) class LatexIncrementalDecoder(LatexIncrementalLexer): """Simple incremental decoder. 
Transforms lexed LaTeX tokens into unicode. To customize decoding, subclass and override :meth:`get_unicode_tokens`. """ inputenc = "ascii" """Input encoding. **Must** extend ascii.""" binary_mode = True """Whether this lexer processes binary data (bytes) or text data (unicode). """ def __init__(self, errors='strict'): super(LatexIncrementalDecoder, self).__init__(errors) self.decoder = codecs.getincrementaldecoder(self.inputenc)(errors) def decode_token(self, token): """Returns the decoded token text. .. note:: Control words get an extra space added at the back to make sure separation from the next token, so that decoded token sequences can be joined together. For example, the tokens ``u'\\hello'`` and ``u'world'`` will correctly result in ``u'\\hello world'`` (remember that LaTeX eats space following control words). If no space were added, this would wrongfully result in ``u'\\helloworld'``. """ text = token.text return text if token.name != u'control_word' else text + u' ' def get_unicode_tokens(self, chars, final=False): """Decode every token. Override to process the tokens in some other way (for example, for token translation). """ for token in self.get_tokens(chars, final=final): yield self.decode_token(token) def decode(self, bytes_, final=False): """Decode LaTeX *bytes_* into a unicode string. This implementation calls :meth:`get_unicode_tokens` and joins the resulting unicode strings together. """ if self.binary_mode: try: # in python 3, the token text can be a memoryview # which do not have a decode method; must cast to # bytes explicitly chars = self.decoder.decode(binary_type(bytes_), final=final) except UnicodeDecodeError as e: # API requires that the encode method raises a ValueError # in this case raise ValueError(e) else: chars = bytes_ return u''.join(self.get_unicode_tokens(chars, final=final)) class LatexIncrementalEncoder(codecs.IncrementalEncoder): """Simple incremental encoder for LaTeX. Transforms unicode into :class:`bytes`. To customize decoding, subclass and override :meth:`get_latex_bytes`. """ inputenc = "ascii" """Input encoding. **Must** extend ascii.""" binary_mode = True """Whether this lexer processes binary data (bytes) or text data (unicode). """ def __init__(self, errors='strict'): """Initialize the codec.""" self.errors = errors self.reset() def reset(self): """Reset state.""" # buffer for storing last (possibly incomplete) token self.buffer = u"" def getstate(self): """Get state.""" return self.buffer def setstate(self, state): """Set state. The *state* must correspond to the return value of a previous :meth:`getstate` call. """ self.buffer = state def get_unicode_tokens(self, unicode_, final=False): """Split unicode into tokens so that every token starts with a non-combining character. """ if not isinstance(unicode_, string_types): raise TypeError( "expected unicode for encode input, but got {0} instead" .format(unicode_.__class__.__name__)) for c in unicode_: if not unicodedata.combining(c): for token in self.flush_unicode_tokens(): yield token self.buffer += c if final: for token in self.flush_unicode_tokens(): yield token def flush_unicode_tokens(self): """Flush the buffer.""" if self.buffer: yield self.buffer self.buffer = u"" def get_latex_chars(self, unicode_, final=False): """Encode every character. Override to process the unicode in some other way (for example, for character translation). 
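
        A (hypothetical) subclass sketch, uppercasing every yielded
        token::

            class UpperLatexIncrementalEncoder(LatexIncrementalEncoder):
                def get_latex_chars(self, unicode_, final=False):
                    for token in self.get_unicode_tokens(
                            unicode_, final=final):
                        yield token.upper()
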
""" for token in self.get_unicode_tokens(unicode_, final=final): yield token def encode(self, unicode_, final=False): """Encode the *unicode_* string into LaTeX :class:`bytes`. This implementation calls :meth:`get_latex_chars` and joins the resulting :class:`bytes` together. """ chars = u''.join(self.get_latex_chars(unicode_, final=final)) if self.binary_mode: try: return chars.encode(self.inputenc, self.errors) except UnicodeEncodeError as e: # API requires that the encode method raises a ValueError # in this case raise ValueError(e) else: return chars class UnicodeLatexIncrementalDecoder(LatexIncrementalDecoder): binary_mode = False class UnicodeLatexIncrementalEncoder(LatexIncrementalEncoder): binary_mode = False latexcodec-2.0.1/latexcodec.egg-info/0000755005105600024240000000000013674355145017417 5ustar dma0mtdma00000000000000latexcodec-2.0.1/latexcodec.egg-info/PKG-INFO0000644005105600024240000001250413674355145020516 0ustar dma0mtdma00000000000000Metadata-Version: 1.2 Name: latexcodec Version: 2.0.1 Summary: A lexer and codec to work with LaTeX code in Python. Home-page: https://github.com/mcmtroffaes/latexcodec Author: Matthias C. M. Troffaes Author-email: matthias.troffaes@gmail.com License: MIT Download-URL: http://pypi.python.org/pypi/latexcodec Description: * Download: http://pypi.python.org/pypi/latexcodec/#downloads * Documentation: http://latexcodec.readthedocs.org/ * Development: http://github.com/mcmtroffaes/latexcodec/ .. |travis| image:: https://travis-ci.org/mcmtroffaes/latexcodec.png?branch=develop :target: https://travis-ci.org/mcmtroffaes/latexcodec :alt: travis-ci .. |codecov| image:: https://codecov.io/gh/mcmtroffaes/latexcodec/branch/develop/graph/badge.svg :target: https://codecov.io/gh/mcmtroffaes/latexcodec :alt: codecov The codec provides a convenient way of going between text written in LaTeX and unicode. Since it is not a LaTeX compiler, it is more appropriate for short chunks of text, such as a paragraph or the values of a BibTeX entry, and it is not appropriate for a full LaTeX document. In particular, its behavior on the LaTeX commands that do not simply select characters is intended to allow the unicode representation to be understandable by a human reader, but is not canonical and may require hand tuning to produce the desired effect. The encoder does a best effort to replace unicode characters outside of the range used as LaTeX input (ascii by default) with a LaTeX command that selects the character. More technically, the unicode code point is replaced by a LaTeX command that selects a glyph that reasonably represents the code point. Unicode characters with special uses in LaTeX are replaced by their LaTeX equivalents. For example, ====================== =================== original text encoded LaTeX ====================== =================== ``¥`` ``\yen`` ``ü`` ``\"u`` ``\N{NO-BREAK SPACE}`` ``~`` ``~`` ``\textasciitilde`` ``%`` ``\%`` ``#`` ``\#`` ``\textbf{x}`` ``\textbf{x}`` ====================== =================== The decoder does a best effort to replace LaTeX commands that select characters with the unicode for the character they are selecting. 
For example, ===================== ====================== original LaTeX decoded unicode ===================== ====================== ``\yen`` ``¥`` ``\"u`` ``ü`` ``~`` ``\N{NO-BREAK SPACE}`` ``\textasciitilde`` ``~`` ``\%`` ``%`` ``\#`` ``#`` ``\textbf{x}`` ``\textbf {x}`` ``#`` ``#`` ===================== ====================== In addition, comments are dropped (including the final newline that marks the end of a comment), paragraphs are canonicalized into double newlines, and other newlines are left as is. Spacing after LaTeX commands is also canonicalized. For example, :: hi % bye there\par world \textbf {awesome} is decoded as :: hi there world \textbf {awesome} When decoding, LaTeX commands not directly selecting characters (for example, macros and formatting commands) are passed through unchanged. The same happens for LaTeX commands that select characters but are not yet recognized by the codec. Either case can result in a hybrid unicode string in which some characters are understood as literally the character and others as parts of unexpanded commands. Consequently, at times, backslashes will be left intact for denoting the start of a potentially unrecognized control sequence. Given the numerous and changing packages providing such LaTeX commands, the codec will never be complete, and new translations of unrecognized unicode or unrecognized LaTeX symbols are always welcome. Platform: any Classifier: Development Status :: 5 - Production/Stable Classifier: Environment :: Console Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: MIT License Classifier: Operating System :: OS Independent Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 2 Classifier: Programming Language :: Python :: 2.7 Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3.4 Classifier: Programming Language :: Python :: 3.5 Classifier: Programming Language :: Python :: 3.6 Classifier: Programming Language :: Python :: 3.7 Classifier: Programming Language :: Python :: 3.8 Classifier: Topic :: Text Processing :: Markup :: LaTeX Classifier: Topic :: Text Processing :: Filters Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.* latexcodec-2.0.1/latexcodec.egg-info/SOURCES.txt0000644005105600024240000000115713674355145021307 0ustar dma0mtdma00000000000000AUTHORS.rst CHANGELOG.rst INSTALL.rst LICENSE.rst MANIFEST.in README.rst VERSION requirements.txt setup.cfg setup.py doc/Makefile doc/api.rst doc/authors.rst doc/changes.rst doc/conf.py doc/index.rst doc/license.rst doc/make.bat doc/quickstart.rst doc/api/codec.rst doc/api/lexer.rst latexcodec/__init__.py latexcodec/codec.py latexcodec/lexer.py latexcodec.egg-info/PKG-INFO latexcodec.egg-info/SOURCES.txt latexcodec.egg-info/dependency_links.txt latexcodec.egg-info/requires.txt latexcodec.egg-info/top_level.txt latexcodec.egg-info/zip-safe test/test_install_example.py test/test_latex_codec.py test/test_latex_lexer.pylatexcodec-2.0.1/latexcodec.egg-info/dependency_links.txt0000644005105600024240000000000113674355145023465 0ustar dma0mtdma00000000000000 latexcodec-2.0.1/latexcodec.egg-info/requires.txt0000644005105600024240000000001313674355145022011 0ustar dma0mtdma00000000000000six>=1.4.1 latexcodec-2.0.1/latexcodec.egg-info/top_level.txt0000644005105600024240000000001313674355145022143 0ustar dma0mtdma00000000000000latexcodec latexcodec-2.0.1/latexcodec.egg-info/zip-safe0000644005105600024240000000000113674355145021047 0ustar dma0mtdma00000000000000 
latexcodec-2.0.1/requirements.txt0000644005105600024240000000001213674352323017062 0ustar dma0mtdma00000000000000six>=1.4.1latexcodec-2.0.1/setup.cfg0000644005105600024240000000023113674355145015427 0ustar dma0mtdma00000000000000[nosetests] with-coverage = 1 cover-package = latexcodec cover-branches = 1 cover-html = 1 [wheel] universal = 1 [egg_info] tag_build = tag_date = 0 latexcodec-2.0.1/setup.py0000644005105600024240000000314713674355130015323 0ustar dma0mtdma00000000000000# -*- coding: utf-8 -*- import io from setuptools import setup, find_packages def readfile(filename): with io.open(filename, encoding="utf-8") as stream: return stream.read().split("\n") readme = readfile("README.rst")[5:] # skip title and badges requires = readfile("requirements.txt") version = readfile("VERSION")[0].strip() setup( name='latexcodec', version=version, url='https://github.com/mcmtroffaes/latexcodec', download_url='http://pypi.python.org/pypi/latexcodec', license='MIT', author='Matthias C. M. Troffaes', author_email='matthias.troffaes@gmail.com', description=readme[0], long_description="\n".join(readme[2:]), zip_safe=True, classifiers=[ 'Development Status :: 5 - Production/Stable', 'Environment :: Console', 'Intended Audience :: Developers', 'License :: OSI Approved :: MIT License', 'Operating System :: OS Independent', 'Programming Language :: Python', 'Programming Language :: Python :: 2', 'Programming Language :: Python :: 2.7', 'Programming Language :: Python :: 3', 'Programming Language :: Python :: 3.4', 'Programming Language :: Python :: 3.5', 'Programming Language :: Python :: 3.6', 'Programming Language :: Python :: 3.7', 'Programming Language :: Python :: 3.8', 'Topic :: Text Processing :: Markup :: LaTeX', 'Topic :: Text Processing :: Filters', ], platforms='any', packages=find_packages(), install_requires=requires, python_requires='>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*', ) latexcodec-2.0.1/test/0000755005105600024240000000000013674355145014571 5ustar dma0mtdma00000000000000latexcodec-2.0.1/test/test_install_example.py0000644005105600024240000000255013674352304021357 0ustar dma0mtdma00000000000000# -*- coding: utf-8 -*- def test_install_example_1(): import latexcodec # noqa text_latex = b"\\'el\\`eve" assert text_latex.decode("latex") == u"élève" text_unicode = u"ångström" assert text_unicode.encode("latex") == b'\\aa ngstr\\"om' def test_install_example_2(): import codecs import latexcodec # noqa text_latex = u"\\'el\\`eve" assert codecs.decode(text_latex, "ulatex") == u"élève" text_unicode = u"ångström" assert codecs.encode(text_unicode, "ulatex") == u'\\aa ngstr\\"om' def test_install_example_3(): import latexcodec # noqa text_latex = b"\xfe" assert text_latex.decode("latex+latin1") == u"þ" assert text_latex.decode("latex+latin2") == u"ţ" text_unicode = u"ţ" assert text_unicode.encode("latex+latin1") == b'\\c t' # ţ is not latin1 assert text_unicode.encode("latex+latin2") == b'\xfe' # but it is latin2 def test_install_example_4(): import codecs import latexcodec # noqa text_unicode = u'⌨' # \u2328 = keyboard symbol, currently not translated try: # raises a value error as \u2328 cannot be encoded into latex codecs.encode(text_unicode, "ulatex+ascii") except ValueError: pass assert codecs.encode(text_unicode, "ulatex+ascii", "keep") == u'⌨' assert codecs.encode(text_unicode, "ulatex+utf8") == u'⌨' latexcodec-2.0.1/test/test_latex_codec.py0000644005105600024240000004123313674352304020451 0ustar dma0mtdma00000000000000# -*- coding: utf-8 -*- """Tests for the latex codec.""" from 
__future__ import print_function import codecs import pytest from six import text_type, binary_type, BytesIO, PY2 from unittest import TestCase import latexcodec def test_getregentry(): assert latexcodec.codec.getregentry() is not None def test_find_latex(): assert latexcodec.codec.find_latex('hello') is None def test_latex_incremental_decoder_getstate(): encoder = codecs.getincrementaldecoder('latex')() with pytest.raises(NotImplementedError): encoder.getstate() def test_latex_incremental_decoder_setstate(): encoder = codecs.getincrementaldecoder('latex')() state = (u'', 0) with pytest.raises(NotImplementedError): encoder.setstate(state) def split_input(input_): """Helper function for testing the incremental encoder and decoder.""" if not isinstance(input_, (text_type, binary_type)): raise TypeError("expected unicode or bytes input") if input_: for i in range(len(input_)): if i + 1 < len(input_): yield input_[i:i + 1], False else: yield input_[i:i + 1], True else: yield input_, True class TestDecoder(TestCase): """Stateless decoder tests.""" maxDiff = None def decode(self, text_utf8, text_latex, inputenc=None): """Main test function.""" encoding = 'latex+' + inputenc if inputenc else 'latex' decoded, n = codecs.getdecoder(encoding)(text_latex) self.assertEqual((decoded, n), (text_utf8, len(text_latex))) def test_invalid_type(self): with pytest.raises(TypeError): codecs.getdecoder("latex")(object()) def test_invalid_code(self): with pytest.raises(ValueError): # b'\xe9' is invalid utf-8 code self.decode(u'', b'\xe9 ', 'utf-8') def test_null(self): self.decode(u'', b'') def test_maelstrom(self): self.decode(u"mælström", br'm\ae lstr\"om') def test_maelstrom_latin1(self): self.decode(u"mælström", b'm\\ae lstr\xf6m', 'latin1') def test_laren(self): self.decode( u"© låren av björn", br'\copyright\ l\aa ren av bj\"orn') def test_laren_brackets(self): self.decode( u"© l{å}ren av bj{ö}rn", br'\copyright\ l{\aa}ren av bj{\"o}rn') def test_laren_latin1(self): self.decode( u"© låren av björn", b'\\copyright\\ l\xe5ren av bj\xf6rn', 'latin1') def test_droitcivil(self): self.decode( u"Même s'il a fait l'objet d'adaptations suite à l'évolution, " u"la transformation sociale, économique et politique du pays, " u"le code civil fran{ç}ais est aujourd'hui encore le texte " u"fondateur " u"du droit civil français mais aussi du droit civil belge " u"ainsi que " u"de plusieurs autres droits civils.", b"M\\^eme s'il a fait l'objet d'adaptations suite " b"\\`a l'\\'evolution, \nla transformation sociale, " b"\\'economique et politique du pays, \nle code civil " b"fran\\c{c}ais est aujourd'hui encore le texte fondateur \n" b"du droit civil fran\\c cais mais aussi du droit civil " b"belge ainsi que \nde plusieurs autres droits civils.", ) def test_oeuf(self): self.decode( u"D'un point de vue diététique, l'œuf apaise la faim.", br"D'un point de vue di\'et\'etique, l'\oe uf apaise la faim.", ) def test_oeuf_latin1(self): self.decode( u"D'un point de vue diététique, l'œuf apaise la faim.", b"D'un point de vue di\xe9t\xe9tique, l'\\oe uf apaise la faim.", 'latin1' ) def test_alpha(self): self.decode(u"α", b"$\\alpha$") def test_maelstrom_multibyte_encoding(self): self.decode(u"\\c öké", b'\\c \xc3\xb6k\xc3\xa9', 'utf8') def test_serafin(self): self.decode(u"Seraf{\xed}n", b"Seraf{\\'i}n") def test_astrom(self): self.decode(u"{\xc5}str{\xf6}m", b'{\\AA}str{\\"o}m') def test_space_1(self): self.decode(u"ææ", br'\ae \ae') def test_space_2(self): self.decode(u"æ æ", br'\ae\ \ae') def test_space_3(self): self.decode(u"æ 
æ", br'\ae \quad \ae') def test_number_sign_1(self): self.decode(u"# hello", br'\#\ hello') def test_number_sign_2(self): # LaTeX does not absorb the space following '\#': # check decoding is correct self.decode(u"# hello", br'\# hello') def test_number_sign_3(self): # a single '#' is not valid LaTeX: # for the moment we ignore this error and return # unchanged self.decode(u"# hello", br'# hello') def test_underscore(self): self.decode(u"_", br'\_') def test_dz(self): self.decode(u"DZ", br'DZ') def test_newline(self): self.decode(u"hello world", b"hello\nworld") def test_par1(self): self.decode(u"hello\n\nworld", b"hello\n\nworld") def test_par2(self): self.decode(u"hello\n\nworld", b"hello\\par world") def test_par3(self): self.decode(u"hello\n\nworld", b"hello \\par world") def test_ogonek1(self): self.decode(u"ĄąĘęĮįǪǫŲų", br'\k A\k a\k E\k e\k I\k i\k O\k o\k U\k u') def test_ogonek2(self): # note: should decode into u"Ǭǭ" but can't support this yet... self.decode(u"\\textogonekcentered {Ō}\\textogonekcentered {ō}", br'\textogonekcentered{\=O}\textogonekcentered{\=o}') def test_math_spacing_dollar(self): self.decode(u'This is a ψ test.', br'This is a $\psi$ test.') def test_math_spacing_brace(self): self.decode(u'This is a ψ test.', br'This is a \(\psi\) test.') def test_double_math(self): # currently no attempt to translate maths inside $$ self.decode(u'This is a $$\\psi $$ test.', br'This is a $$\psi$$ test.') def test_tilde(self): self.decode(u'This is a ˜, ˷, ∼ and ~test.', (br'This is a \~{}, \texttildelow, ' br'$\sim$ and \textasciitilde test.')) def test_backslash(self): self.decode(u'This is a \\ \\test.', br'This is a $\backslash$ \textbackslash test.') def test_percent(self): self.decode(u'This is a % test.', br'This is a \% test.') def test_math_minus(self): self.decode(u'This is a − test.', br'This is a $-$ test.') def test_swedish_again(self): self.decode( u"l{å}ren l{Å}ren", br'l{\r a}ren l{\r A}ren') def test_double_quotes(self): self.decode(u"“a+b”", br"``a+b''") def test_double_quotes_unicode(self): self.decode(u"“á”", u"``á''".encode("utf8"), "utf8") def test_double_quotes_gb2312(self): self.decode(u"“你好”", u"``你好''".encode('gb2312'), 'gb2312') def test_theta(self): self.decode(u"θ", br"$\theta$") self.decode(u"θ", br"\texttheta") def test_decode_comment(self): self.decode(u"\\\\", br"\\%") self.decode(u"% abc \\\\\\\\% ghi", b"\\% abc\n\\\\% def\n\\\\\\% ghi") def test_decode_lower_quotes(self): self.decode(u"„", br",,") self.decode(u"„", br"\glqq") def test_decode_guillemet(self): self.decode(u"«quote»", br"\guillemotleft quote\guillemotright") class TestStreamDecoder(TestDecoder): """Stream decoder tests.""" def decode(self, text_utf8, text_latex, inputenc=None): encoding = 'latex+' + inputenc if inputenc else 'latex' stream = BytesIO(text_latex) reader = codecs.getreader(encoding)(stream) self.assertEqual(text_utf8, reader.read()) # in this test, BytesIO(object()) is eventually called # this is valid on Python 2, so we skip this test there def test_invalid_type(self): if PY2: pytest.skip("test not relevant for Python 2") else: TestDecoder.test_invalid_type(self) class TestIncrementalDecoder(TestDecoder): """Incremental decoder tests.""" def decode(self, text_utf8, text_latex, inputenc=None): encoding = 'latex+' + inputenc if inputenc else 'latex' decoder = codecs.getincrementaldecoder(encoding)() decoded_parts = ( decoder.decode(text_latex_part, final) for text_latex_part, final in split_input(text_latex)) self.assertEqual(text_utf8, u''.join(decoded_parts)) class 
TestEncoder(TestCase): """Stateless encoder tests.""" def encode(self, text_utf8, text_latex, inputenc=None, errors='strict'): """Main test function.""" encoding = 'latex+' + inputenc if inputenc else 'latex' encoded, n = codecs.getencoder(encoding)(text_utf8, errors=errors) self.assertEqual((encoded, n), (text_latex, len(text_utf8))) def test_invalid_type(self): with pytest.raises(TypeError): codecs.getencoder("latex")(object()) # note concerning test_invalid_code_* methods: # u'\u2328' (0x2328 = 9000) is unicode for keyboard symbol # we currently provide no translation for this into LaTeX code def test_invalid_code_strict(self): with pytest.raises(ValueError): self.encode(u'\u2328', b'', 'ascii', 'strict') def test_invalid_code_ignore(self): self.encode(u'\u2328', b'', 'ascii', 'ignore') def test_invalid_code_replace(self): self.encode(u'\u2328', b'{\\char9000}', 'ascii', 'replace') def test_invalid_code_baderror(self): with pytest.raises(ValueError): self.encode(u'\u2328', b'', 'ascii', '**baderror**') def test_null(self): self.encode(u'', b'') def test_maelstrom(self): self.encode(u"mælström", br'm\ae lstr\"om') def test_maelstrom_latin1(self): self.encode(u"mælström", b'm\xe6lstr\xf6m', 'latin1') def test_laren(self): self.encode( u"© låren av björn", br'\copyright\ l\aa ren av bj\"orn') def test_laren_latin1(self): self.encode( u"© låren av björn", b'\xa9 l\xe5ren av bj\xf6rn', 'latin1') def test_droitcivil(self): self.encode( u"Même s'il a fait l'objet d'adaptations suite à l'évolution, \n" u"la transformation sociale, économique et politique du pays, \n" u"le code civil fran{ç}ais est aujourd'hui encore le texte " u"fondateur \n" u"du droit civil français mais aussi du droit civil belge " u"ainsi que \n" u"de plusieurs autres droits civils.", b"M\\^eme s'il a fait l'objet d'adaptations suite " b"\\`a l'\\'evolution, \nla transformation sociale, " b"\\'economique et politique du pays, \nle code civil " b"fran{\\c c}ais est aujourd'hui encore le texte fondateur \n" b"du droit civil fran\\c cais mais aussi du droit civil " b"belge ainsi que \nde plusieurs autres droits civils.", ) def test_oeuf(self): self.encode( u"D'un point de vue diététique, l'œuf apaise la faim.", br"D'un point de vue di\'et\'etique, l'\oe uf apaise la faim.", ) def test_oeuf_latin1(self): self.encode( u"D'un point de vue diététique, l'œuf apaise la faim.", b"D'un point de vue di\xe9t\xe9tique, l'\\oe uf apaise la faim.", 'latin1' ) def test_alpha(self): self.encode(u"α", b"$\\alpha$") def test_serafin(self): self.encode(u"Seraf{\xed}n", b"Seraf{\\'\\i }n") def test_space_1(self): self.encode(u"ææ", br'\ae \ae') def test_space_2(self): self.encode(u"æ æ", br'\ae\ \ae') def test_space_3(self): self.encode(u"æ æ", br'\ae \quad \ae') def test_number_sign(self): # note: no need for control space after \# self.encode(u"# hello", br'\# hello') def test_underscore(self): self.encode(u"_", br'\_') def test_dz1(self): self.encode(u"DZ", br'DZ') def test_dz2(self): self.encode(u"DZ", br'DZ') def test_newline(self): self.encode(u"hello\nworld", b"hello\nworld") def test_par1(self): self.encode(u"hello\n\nworld", b"hello\n\nworld") def test_par2(self): self.encode(u"hello\\par world", b"hello\\par world") def test_ogonek1(self): self.encode(u"ĄąĘęĮįǪǫŲų", br'\k A\k a\k E\k e\k I\k i\k O\k o\k U\k u') def test_ogonek2(self): self.encode(u"Ǭǭ", br'\textogonekcentered{\=O}\textogonekcentered{\=o}') def test_math_spacing(self): self.encode(u'This is a ψ test.', br'This is a $\psi$ test.') def test_double_math(self): # currently 
no attempt to translate maths inside $$ self.encode(u'This is a $$\\psi$$ test.', br'This is a $$\psi$$ test.') def test_tilde(self): self.encode(u'This is a ˜, ˷, ∼ and ~test.', (br'This is a \~{}, \texttildelow , ' br'$\sim$ and \textasciitilde test.')) def test_percent(self): self.encode(u'This is a % test.', br'This is a \% test.') def test_hyphen(self): self.encode(u'This is a \N{HYPHEN} test.', br'This is a - test.') def test_math_minus(self): self.encode(u'This is a − test.', br'This is a $-$ test.') def test_double_quotes(self): self.encode(u"“a+b”", br"``a+b''") def test_double_quotes_unicode(self): self.encode(u"“á”", br"``\'a''") def test_thin_space(self): self.encode(u"a\u2009b", b"a b") def test_theta(self): self.encode(u"θ", br"$\theta$") def test_encode_lower_quotes(self): self.encode(u"„", br",,") def test_encode_guillemet(self): self.encode(u"«quote»", br"\guillemotleft quote\guillemotright") class TestStreamEncoder(TestEncoder): """Stream encoder tests.""" def encode(self, text_utf8, text_latex, inputenc=None, errors='strict'): encoding = 'latex+' + inputenc if inputenc else 'latex' stream = BytesIO() writer = codecs.getwriter(encoding)(stream, errors=errors) writer.write(text_utf8) self.assertEqual(text_latex, stream.getvalue()) class TestIncrementalEncoder(TestEncoder): """Incremental encoder tests.""" def encode(self, text_utf8, text_latex, inputenc=None, errors='strict'): encoding = 'latex+' + inputenc if inputenc else 'latex' encoder = codecs.getincrementalencoder(encoding)(errors=errors) encoded_parts = ( encoder.encode(text_utf8_part, final) for text_utf8_part, final in split_input(text_utf8)) self.assertEqual(text_latex, b''.join(encoded_parts)) class TestUnicodeDecoder(TestDecoder): def decode(self, text_utf8, text_latex, inputenc=None): """Main test function.""" text_latex = text_latex.decode(inputenc if inputenc else "ascii") decoded, n = codecs.getdecoder('ulatex')(text_latex) self.assertEqual((decoded, n), (text_utf8, len(text_latex))) class TestUnicodeEncoder(TestEncoder): def encode(self, text_utf8, text_latex, inputenc=None, errors='strict'): """Main test function.""" encoding = 'ulatex+' + inputenc if inputenc else 'ulatex' text_latex = text_latex.decode(inputenc if inputenc else 'ascii') encoded, n = codecs.getencoder(encoding)(text_utf8, errors=errors) self.assertEqual((encoded, n), (text_latex, len(text_utf8))) def uencode(self, text_utf8, text_ulatex, inputenc=None, errors='strict'): """Main test function.""" encoding = 'ulatex+' + inputenc if inputenc else 'ulatex' encoded, n = codecs.getencoder(encoding)(text_utf8, errors=errors) self.assertEqual((encoded, n), (text_ulatex, len(text_utf8))) def test_ulatex_ascii(self): self.uencode(u'# ψ', u'\\# $\\psi$', 'ascii') def test_ulatex_utf8(self): self.uencode(u'# ψ', u'\\# ψ', 'utf8') # the following tests rely on the fact that \u2328 is not in our # translation table def test_ulatex_ascii_invalid(self): with pytest.raises(ValueError): self.uencode(u'# \u2328', u'', 'ascii') def test_ulatex_utf8_invalid(self): self.uencode(u'# ψ \u2328', u'\\# ψ \u2328', 'utf8') def test_invalid_code_keep(self): self.uencode(u'# ψ \u2328', u'\\# $\\psi$ \u2328', 'ascii', 'keep') latexcodec-2.0.1/test/test_latex_lexer.py0000644005105600024240000003306113674352304020513 0ustar dma0mtdma00000000000000# -*- coding: utf-8 -*- """Tests for the tex lexer.""" import pytest from unittest import TestCase import six from latexcodec.lexer import ( LatexLexer, LatexIncrementalLexer, LatexIncrementalDecoder, 
UnicodeLatexIncrementalDecoder, LatexIncrementalEncoder, UnicodeLatexIncrementalEncoder, Token) class MockLexer(LatexLexer): tokens = ( ('chars', u'mock'), ('unknown', u'.'), ) class MockIncrementalDecoder(LatexIncrementalDecoder): tokens = ( ('chars', u'mock'), ('unknown', u'.'), ) def test_token_create_with_args(): t = Token('hello', u'world') assert t.name == 'hello' assert t.text == u'world' def test_token_assign_name(): with pytest.raises(AttributeError): t = Token('hello', u'world') t.name = 'test' def test_token_assign_text(): with pytest.raises(AttributeError): t = Token('hello', u'world') t.text = 'test' def test_token_assign_other(): with pytest.raises(AttributeError): t = Token('hello', u'world') t.blabla = 'test' class BaseLatexLexerTest(TestCase): errors = 'strict' Lexer = None def setUp(self): self.lexer = self.Lexer(errors=self.errors) def lex_it(self, latex_code, latex_tokens, final=False): tokens = self.lexer.get_raw_tokens(latex_code, final=final) self.assertEqual( list(token.text for token in tokens), latex_tokens) def tearDown(self): del self.lexer class LatexLexerTest(BaseLatexLexerTest): Lexer = LatexLexer def test_null(self): self.lex_it('', [], final=True) def test_hello(self): self.lex_it( u'hello! [#1] This \\is\\ \\^ a \ntest.\n' u' \nHey.\n\n\\# x \\#x', six.u(r'h|e|l|l|o|!| | |[|#1|]| |T|h|i|s| |\is|\ | | |\^| |a| ' '|\n|t|e|s|t|.|\n| | | | |\n|H|e|y|.|\n|\n' r'|\#| |x| |\#|x').split(u'|'), final=True ) def test_comment(self): self.lex_it( u'test% some comment\ntest', u't|e|s|t|% some comment|\n|t|e|s|t'.split(u'|'), final=True ) def test_comment_newline(self): self.lex_it( u'test% some comment\n\ntest', u't|e|s|t|% some comment|\n|\n|t|e|s|t'.split(u'|'), final=True ) def test_control(self): self.lex_it( u'\\hello\\world', u'\\hello|\\world'.split(u'|'), final=True ) def test_control_whitespace(self): self.lex_it( u'\\hello \\world ', u'\\hello| | | |\\world| | | '.split(u'|'), final=True ) def test_controlx(self): self.lex_it( u'\\#\\&', u'\\#|\\&'.split(u'|'), final=True ) def test_controlx_whitespace(self): self.lex_it( u'\\# \\& ', u'\\#| | | | |\\&| | | '.split(u'|'), final=True ) def test_buffer(self): self.lex_it( u'hi\\t', u'h|i'.split(u'|'), ) self.lex_it( 'here', [u'\\there'], final=True, ) def test_state(self): self.lex_it( u'hi\\t', u'h|i'.split(u'|'), ) state = self.lexer.getstate() self.lexer.reset() self.lex_it( u'here', u'h|e|r|e'.split(u'|'), final=True, ) self.lexer.setstate(state) self.lex_it( u'here', [u'\\there'], final=True, ) def test_decode(self): with pytest.raises(NotImplementedError): self.lexer.decode(b'') def test_final_backslash(self): self.lex_it( u'notsogood\\', u'n|o|t|s|o|g|o|o|d|\\'.split(u'|'), final=True ) def test_final_comment(self): self.lex_it( u'hello%', u'h|e|l|l|o|%'.split(u'|'), final=True ) def test_hash(self): self.lex_it(u'#', [u'#'], final=True) def test_tab(self): self.lex_it(u'\\c\tc', u'\\c|\t|c'.split(u'|'), final=True) def test_percent(self): self.lex_it(u'This is a \\% test.', u'T|h|i|s| |i|s| |a| |\\%| |t|e|s|t|.'.split(u'|'), final=True) self.lex_it(u'\\% %test', u'\\%| |%test'.split(u'|'), final=True) self.lex_it(u'\\% %test\nhi', u'\\%| |%test|\n|h|i'.split(u'|'), final=True) def test_double_quotes(self): self.lex_it(u"``a+b''", u"``|a|+|b|''".split(u'|'), final=True) class BaseLatexIncrementalDecoderTest(TestCase): """Tex lexer fixture.""" errors = 'strict' IncrementalDecoder = None def setUp(self): self.lexer = self.IncrementalDecoder(self.errors) def fix(self, s): return s if self.lexer.binary_mode 
else s.decode("ascii") def lex_it(self, latex_code, latex_tokens, final=False): tokens = self.lexer.get_tokens(latex_code, final=final) self.assertEqual( list(token.text for token in tokens), latex_tokens) def tearDown(self): del self.lexer class LatexIncrementalDecoderTest(BaseLatexIncrementalDecoderTest): IncrementalDecoder = LatexIncrementalDecoder def test_null(self): self.lex_it(u'', [], final=True) def test_hello(self): self.lex_it( u'hello! [#1] This \\is\\ \\^ a \ntest.\n' u' \nHey.\n\n\\# x \\#x', six.u(r'h|e|l|l|o|!| |[|#1|]| |T|h|i|s| |\is|\ |\^|a| ' r'|t|e|s|t|.| |\par|H|e|y|.| ' r'|\par|\#| |x| |\#|x').split(u'|'), final=True ) def test_comment(self): self.lex_it( u'test% some comment\ntest', u't|e|s|t|t|e|s|t'.split(u'|'), final=True ) def test_comment_newline(self): self.lex_it( u'test% some comment\n\ntest', u't|e|s|t|\\par|t|e|s|t'.split(u'|'), final=True ) def test_control(self): self.lex_it( u'\\hello\\world', u'\\hello|\\world'.split(u'|'), final=True ) def test_control_whitespace(self): self.lex_it( u'\\hello \\world ', u'\\hello|\\world'.split(u'|'), final=True ) def test_controlx(self): self.lex_it( u'\\#\\&', u'\\#|\\&'.split(u'|'), final=True ) def test_controlx_whitespace(self): self.lex_it( u'\\# \\& ', u'\\#| |\\&| '.split(u'|'), final=True ) def test_buffer(self): self.lex_it( u'hi\\t', u'h|i'.split(u'|'), ) self.lex_it( u'here', [u'\\there'], final=True, ) def test_buffer_decode(self): self.assertEqual( self.lexer.decode(self.fix(b'hello! [#1] This \\i')), u'hello! [#1] This ', ) self.assertEqual( self.lexer.decode(self.fix(b's\\ \\^ a \ntest.\n')), u'\\is \\ \\^a test.', ) self.assertEqual( self.lexer.decode( self.fix(b' \nHey.\n\n\\# x \\#x'), final=True), u' \\par Hey. \\par \\# x \\#x', ) def test_state_middle(self): self.lex_it( u'hi\\t', u'h|i'.split(u'|'), ) state = self.lexer.getstate() self.assertEqual(self.lexer.state, 'M') self.assertEqual(self.lexer.raw_buffer.name, 'control_word') self.assertEqual(self.lexer.raw_buffer.text, u'\\t') self.lexer.reset() self.assertEqual(self.lexer.state, 'N') self.assertEqual(self.lexer.raw_buffer.name, 'unknown') self.assertEqual(self.lexer.raw_buffer.text, u'') self.lex_it( u'here', u'h|e|r|e'.split(u'|'), final=True, ) self.lexer.setstate(state) self.assertEqual(self.lexer.state, 'M') self.assertEqual(self.lexer.raw_buffer.name, 'control_word') self.assertEqual(self.lexer.raw_buffer.text, u'\\t') self.lex_it( u'here', [u'\\there'], final=True, ) def test_state_inline_math(self): self.lex_it( u'hi$t', u'h|i|$'.split(u'|'), ) assert self.lexer.inline_math self.lex_it( u'here$', u't|h|e|r|e|$'.split(u'|'), final=True, ) assert not self.lexer.inline_math # counterintuitive? 
def test_final_backslash(self): with pytest.raises(UnicodeDecodeError): self.lex_it( u'notsogood\\', [u'notsogood'], final=True ) def test_final_comment(self): self.lex_it( u'hello%', u'h|e|l|l|o'.split(u'|'), final=True ) def test_hash(self): self.lex_it(u'#', [u'#'], final=True) def test_tab(self): self.lex_it(u'\\c\tc', u'\\c|c'.split(u'|'), final=True) class UnicodeLatexIncrementalDecoderTest(LatexIncrementalDecoderTest): IncrementalDecoder = UnicodeLatexIncrementalDecoder class LatexIncrementalDecoderReplaceTest(BaseLatexIncrementalDecoderTest): errors = 'replace' IncrementalDecoder = MockIncrementalDecoder def test_errors_replace(self): self.lex_it( u'helmocklo', u'\ufffd|\ufffd|\ufffd|mock|\ufffd|\ufffd'.split(u'|'), final=True ) class LatexIncrementalDecoderIgnoreTest(BaseLatexIncrementalDecoderTest): errors = 'ignore' IncrementalDecoder = MockIncrementalDecoder def test_errors_ignore(self): self.lex_it( u'helmocklo', u'mock'.split(u'|'), final=True ) class LatexIncrementalDecoderInvalidErrorTest(BaseLatexIncrementalDecoderTest): errors = '**baderror**' IncrementalDecoder = MockIncrementalDecoder def test_errors_invalid(self): with pytest.raises(NotImplementedError): self.lex_it( u'helmocklo', u'?|?|?|mock|?|?'.split(u'|'), final=True ) def test_invalid_token(): lexer = LatexIncrementalDecoder() # piggyback an implementation which results in invalid tokens lexer.get_raw_tokens = lambda bytes_, final: [Token('**invalid**', bytes_)] with pytest.raises(AssertionError): lexer.decode(b'hello') def test_invalid_state_1(): lexer = LatexIncrementalDecoder() # piggyback invalid state lexer.state = '**invalid**' with pytest.raises(AssertionError): lexer.decode(b'\n\n\n') def test_invalid_state_2(): lexer = LatexIncrementalDecoder() # piggyback invalid state lexer.state = '**invalid**' with pytest.raises(AssertionError): lexer.decode(b' ') class LatexIncrementalLexerTest(TestCase): errors = 'strict' def setUp(self): self.lexer = LatexIncrementalLexer(errors=self.errors) def lex_it(self, latex_code, latex_tokens, final=False): tokens = self.lexer.get_tokens(latex_code, final=final) self.assertEqual( list(token.text for token in tokens), latex_tokens) def tearDown(self): del self.lexer def test_newline(self): self.lex_it( u"hello\nworld", u"h|e|l|l|o| |w|o|r|l|d".split(u'|'), final=True) def test_par(self): self.lex_it( u"hello\n\nworld", u"h|e|l|l|o| |\\par|w|o|r|l|d".split(u'|'), final=True) class LatexIncrementalEncoderTest(TestCase): """Encoder test fixture.""" errors = 'strict' IncrementalEncoder = LatexIncrementalEncoder def setUp(self): self.encoder = self.IncrementalEncoder(self.errors) def encode(self, latex_code, latex_bytes, final=False): result = self.encoder.encode(latex_code, final=final) self.assertEqual(result, latex_bytes) def tearDown(self): del self.encoder def test_invalid_type(self): with pytest.raises(TypeError): self.encoder.encode(object(), final=True) def test_invalid_code(self): with pytest.raises(ValueError): # default encoding is ascii, \u00ff is not ascii translatable self.encoder.encode(u"\u00ff", final=True) def test_hello(self): self.encode( u'hello', b'hello' if self.encoder.binary_mode else u'hello', final=True) def test_unicode_tokens(self): self.assertEqual( list(self.encoder.get_unicode_tokens( u"ĄąĄ̊ą̊ĘęĮįǪǫǬǭŲųY̨y̨", final=True)), u"Ą|ą|Ą̊|ą̊|Ę|ę|Į|į|Ǫ|ǫ|Ǭ|ǭ|Ų|ų|Y̨|y̨".split(u"|")) def test_state(self): self.assertEqual( list(self.encoder.get_unicode_tokens( u"Ą", final=False)), []) state = self.encoder.getstate() self.encoder.reset() self.assertEqual( 
list(self.encoder.get_unicode_tokens( u"ABC", final=True)), [u"A", u"B", u"C"]) self.encoder.setstate(state) self.assertEqual( list(self.encoder.get_unicode_tokens( u"̊", final=True)), [u"Ą̊"]) class UnicodeLatexIncrementalEncoderTest(LatexIncrementalEncoderTest): IncrementalEncoder = UnicodeLatexIncrementalEncoder def test_invalid_code(self): pass
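

# A round-trip sketch (illustrative only, not part of the upstream test
# suite; it mirrors the codec-level tests in test_latex_codec.py):
#
#     import codecs
#     import latexcodec  # noqa
#     assert codecs.decode(
#         codecs.encode(u"mælström", "ulatex"), "ulatex") == u"mælström"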