pyglossary-3.2.1/0000755000175000017500000000000013577304644014224 5ustar emfoxemfox00000000000000pyglossary-3.2.1/AUTHORS0000755000175000017500000000115713575553425015304 0ustar emfoxemfox00000000000000Saeed Rasooli (ilius) Kubtek Xiaoqiang Wang Thomas Vogt Thanks to: Raul Fernandes and Karl Grill for reverse enginearing on BGL format Nilton Volpato for program python-progressbar Mehrdad Momeny for program MDic(a multilingual dictionary) and MDicConv Pier Carteri for program Py_Shell.py gnomek for the suggestions and feedback about Octopus MDic PyGlossary logo is created based on QStarDict logo and Python logo pyglossary-3.2.1/PKG-INFO0000644000175000017500000002317113577304644015325 0ustar emfoxemfox00000000000000Metadata-Version: 2.1 Name: pyglossary Version: 3.2.1 Summary: A tool for workig with dictionary databases Home-page: https://github.com/ilius/pyglossary Author: Saeed Rasooli Author-email: saeed.gnu@gmail.com License: GPLv3 Description: PyGlossary ========== PyGlossary is a tool for converting dictionary files aka glossaries, from/to various formats used by different dictionary applications Screenshots ----------- ![](https://raw.githubusercontent.com/ilius/pyglossary/resources/screenshots/30-gtk-bgl-stardict-nl-en.png) Linux - (New) Gtk3-based intreface ------------------------------------------------------------------------ ![](https://raw.githubusercontent.com/ilius/pyglossary/resources/screenshots/30-tk-bgl-mdict-fr-zh-win7.png) Windows - Tkinter-based interface ------------------------------------------------------------------------ ![](https://raw.githubusercontent.com/ilius/pyglossary/resources/screenshots/30-cmd-bgl-apple-ru-de.png) Linux - command line interface Supported formats ----------------- | Format | Extension | Read | Write | |-----------------------------------|---------------|:-----:|:-----:| | ABBYY Lingvo DSL | .dsl | ✔ | | | AppleDict Source | .xml | | ✔ | | Babylon | .bgl | ✔ | | | Babylon Source | .gls | | ✔ | | CC-CEDICT | | ✔ 
| | | CSV | .csv | ✔ | ✔ | | DictionaryForMIDs | | ✔ | ✔ | | DICTD dictionary server | .index | ✔ | ✔ | | Editable Linked List of Entries | .edlin | ✔ | ✔ | | FreeDict | .tei | | ✔ | | Gettext Source | .po | ✔ | ✔ | | Lingoes Source (LDF) | .ldf | ✔ | ✔ | | Octopus MDict | .mdx | ✔ | | | Octopus MDict Source | .txt | ✔ | ✔ | | Omnidic | | | ✔ | | Sdictionary Binary | .dct | ✔ | | | Sdictionary Source | .sdct | | ✔ | | SQL | .sql | | ✔ | | StarDict | .ifo | ✔ | ✔ | | Tabfile | .txt, .dic | ✔ | ✔ | | TreeDict | | | ✔ | | XDXF | .xdxf | ✔ | | Requirements ------------ PyGlossary uses **Python 3.x**, and works in practically all operating systems. While primarilly designed for *GNU/Linux*, it works on *Windows*, *Mac OS X* and other Unix-based operating systems as well. As shown in the screenshots, there are multiple User Interface types, ie. multiple ways to use the program. - **Gtk3-based interface**, uses [PyGI (Python Gobject Introspection)](http://pygobject.readthedocs.io/en/latest/getting_started.html) You can install it on: - Debian/Ubuntu: `apt install python3-gi python3-gi-cairo gir1.2-gtk-3.0` - openSUSE: `zypper install python3-gobject gtk3` - Fedora: `dnf install pygobject3 python3-gobject gtk3` - Archlinux: * `pacman -S python-gobject gtk3` * https://aur.archlinux.org/packages/pyglossary/ - Mac OS X: `brew install pygobject3 gtk+3` - Nix / NixOS: `nix-shell -p gnome3.gobjectIntrospection python37Packages.pygobject3 python37Packages.pycairo` - **Tkinter-based interface**, works in the lack of Gtk. Specially on Windows where Tkinter library is installed with the Python itself. 
You can also install it on: - Debian/Ubuntu: `apt-get install python3-tk tix` - openSUSE: `zypper install python3-tk tix` - Fedora: `yum install python3-tkinter tix` - Mac OS X: read - Nix / NixOS: `nix-shell -p python37Packages.tkinter tix` - **Command-line interface**, works in all operating systems without any specific requirements, just type: `python3 pyglossary.pyw --help` You may have to give `--no-progress-bar` option in Windows when converting glossaries (because the progress bar does not work properly in Windows command window) When you run the program without any command line arguments or options, PyGlossary tries to find PyGI, if it's installed, opens the Gtk3-based interface, if it's not, tries to find Tkinter and open the Tkinter-based interface. And exits with an error if neither are installed. But you can explicitly determine the user interface type using `--ui`, for example: python3 pyglossary.pyw --ui=gtk Or python3 pyglossary.pyw --ui=tk Format-specific Requirements ---------------------------- - **Reading from XDXF** `sudo pip3 install lxml` - **Writing to AppleDict** `sudo pip3 install lxml beautifulsoup4 html5lib` - **Reading from Babylon BGL**: Python 3.4 to 3.7 is recommended - **Reading from CC-CEDICT** `sudo pip3 install jinja2` - **Reading from Octopus Mdict (MDX)** + **python-lzo**, required for **some** MDX glossaries - First try converting your MDX file, and if failed (`AssertionError` probably), then you may need to install LZO library and Python binding: - **On Linux**, make sure `liblzo2-dev` or `liblzo2-devel` is installed and then run `sudo pip3 install python-lzo` - **On Windows**: + Open this page: https://www.lfd.uci.edu/~gohlke/pythonlibs/#python-lzo + If you are using Python 3.7 (32 bit) for example, click on `python_lzo‑1.12‑cp37‑cp37m‑win32.whl` + Open Start -> type Command -> right-click on Command Prompt -> Run as administrator + Run `pip install C:\....\python_lzo‑1.12‑cp37‑cp37m‑win32.whl` command, giving the path of 
downloaded file **Other Requirements for Mac OS X** If you want to convert glossaries into AppleDict format on Mac OS X, you also need: - GNU make as part of [Command Line Tools for Xcode](http://developer.apple.com/downloads). - Dictionary Development Kit as part of [Additional Tools for Xcode](http://developer.apple.com/downloads). Extract to `/Developer/Extras/Dictionary Development Kit` HOWTOs ------ ### Convert Babylon (bgl) to Mac OS X dictionary Let's assume the Babylon dict is at `~/Documents/Duden_Synonym/Duden_Synonym.BGL`: cd ~/Documents/Duden_Synonym/ python3 ~/Software/pyglossary/pyglossary.pyw --write-format=AppleDict Duden_Synonym.BGL Duden_Synonym-apple cd Duden_Synonym-apple make make install Launch Dictionary.app and test. ### Convert Octopus Mdict to Mac OS X dictionary Let's assume the MDict dict is at `~/Documents/Duden-Oxford/Duden-Oxford DEED ver.20110408.mdx`. Run the following command: cd ~/Documents/Duden-Oxford/ python3 ~/Software/pyglossary/pyglossary.pyw --write-format=AppleDict "Duden-Oxford DEED ver.20110408.mdx" "Duden-Oxford DEED ver.20110408-apple" cd "Duden-Oxford DEED ver.20110408-apple" make make install Launch Dictionary.app and test. Let's assume the MDict dict is at `~/Downloads/oald8/oald8.mdx`, along with the image/audio resources file `oald8.mdd`. Run the following commands: : cd ~/Downloads/oald8/ python3 ~/Software/pyglossary/pyglossary.pyw --write-format=AppleDict oald8.mdx oald8-apple cd oald8-apple This extracts dictionary into `oald8.xml` and data resources into folder `OtherResources`. Hyperlinks use relative path. : sed -i "" 's:src="/:src=":g' oald8.xml Convert audio file from SPX format to WAV format. You need package `speex` from [MacPorts](https://www.macports.org) : find OtherResources -name "*.spx" -execdir sh -c 'spx={};speexdec $spx ${spx%.*}.wav' \; sed -i "" 's|sound://\([/_a-zA-Z0-9]*\).spx|\1.wav|g' oald8.xml But be warned that the decoded WAVE audio can consume \~5 times more disk space! 
Compile and install. : make make install Launch Dictionary.app and test. Platform: UNKNOWN Description-Content-Type: text/markdown pyglossary-3.2.1/README.md0000644000175000017500000001725013577304507015506 0ustar emfoxemfox00000000000000PyGlossary ========== PyGlossary is a tool for converting dictionary files aka glossaries, from/to various formats used by different dictionary applications Screenshots ----------- ![](https://raw.githubusercontent.com/ilius/pyglossary/resources/screenshots/30-gtk-bgl-stardict-nl-en.png) Linux - (New) Gtk3-based intreface ------------------------------------------------------------------------ ![](https://raw.githubusercontent.com/ilius/pyglossary/resources/screenshots/30-tk-bgl-mdict-fr-zh-win7.png) Windows - Tkinter-based interface ------------------------------------------------------------------------ ![](https://raw.githubusercontent.com/ilius/pyglossary/resources/screenshots/30-cmd-bgl-apple-ru-de.png) Linux - command line interface Supported formats ----------------- | Format | Extension | Read | Write | |-----------------------------------|---------------|:-----:|:-----:| | ABBYY Lingvo DSL | .dsl | ✔ | | | AppleDict Source | .xml | | ✔ | | Babylon | .bgl | ✔ | | | Babylon Source | .gls | | ✔ | | CC-CEDICT | | ✔ | | | CSV | .csv | ✔ | ✔ | | DictionaryForMIDs | | ✔ | ✔ | | DICTD dictionary server | .index | ✔ | ✔ | | Editable Linked List of Entries | .edlin | ✔ | ✔ | | FreeDict | .tei | | ✔ | | Gettext Source | .po | ✔ | ✔ | | Lingoes Source (LDF) | .ldf | ✔ | ✔ | | Octopus MDict | .mdx | ✔ | | | Octopus MDict Source | .txt | ✔ | ✔ | | Omnidic | | | ✔ | | Sdictionary Binary | .dct | ✔ | | | Sdictionary Source | .sdct | | ✔ | | SQL | .sql | | ✔ | | StarDict | .ifo | ✔ | ✔ | | Tabfile | .txt, .dic | ✔ | ✔ | | TreeDict | | | ✔ | | XDXF | .xdxf | ✔ | | Requirements ------------ PyGlossary uses **Python 3.x**, and works in practically all operating systems. 
While primarily designed for *GNU/Linux*, it works on *Windows*, *Mac OS X* and other Unix-based operating systems as well. As shown in the screenshots, there are multiple user interface types, i.e. multiple ways to use the program. - **Gtk3-based interface**, uses [PyGI (Python Gobject Introspection)](http://pygobject.readthedocs.io/en/latest/getting_started.html) You can install it on: - Debian/Ubuntu: `apt install python3-gi python3-gi-cairo gir1.2-gtk-3.0` - openSUSE: `zypper install python3-gobject gtk3` - Fedora: `dnf install pygobject3 python3-gobject gtk3` - Archlinux: * `pacman -S python-gobject gtk3` * https://aur.archlinux.org/packages/pyglossary/ - Mac OS X: `brew install pygobject3 gtk+3` - Nix / NixOS: `nix-shell -p gnome3.gobjectIntrospection python37Packages.pygobject3 python37Packages.pycairo` - **Tkinter-based interface**, works when Gtk is not available, especially on Windows, where the Tkinter library is installed along with Python itself. You can also install it on: - Debian/Ubuntu: `apt-get install python3-tk tix` - openSUSE: `zypper install python3-tk tix` - Fedora: `yum install python3-tkinter tix` - Mac OS X: read - Nix / NixOS: `nix-shell -p python37Packages.tkinter tix` - **Command-line interface**, works in all operating systems without any specific requirements, just type: `python3 pyglossary.pyw --help` You may have to give the `--no-progress-bar` option on Windows when converting glossaries (because the progress bar does not work properly in the Windows command window). When you run the program without any command line arguments or options, PyGlossary tries to find PyGI; if it is installed, it opens the Gtk3-based interface; if not, it tries to find Tkinter and open the Tkinter-based interface. It exits with an error if neither is installed.
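The fallback order described above (try PyGI first, then Tkinter, then fail with an error) boils down to a try-import-then-fall-back pattern. Here is a minimal stdlib sketch of that pattern (not PyGlossary's actual startup code; the `pick_ui` helper and its module mapping are made up for illustration):

```python
import importlib

def pick_ui(preferred=("gtk", "tk"), modules=None):
    """Return the first UI name whose toolkit module can be imported."""
    # Map each UI name to the module it needs: PyGI is "gi",
    # Tkinter is "tkinter" (both named in the install lists above).
    modules = modules or {"gtk": "gi", "tk": "tkinter"}
    for ui in preferred:
        try:
            importlib.import_module(modules[ui])
        except ImportError:
            continue
        return ui
    raise SystemExit("Error: neither PyGI nor Tkinter is installed")
```
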
But you can explicitly determine the user interface type using `--ui`, for example: python3 pyglossary.pyw --ui=gtk Or python3 pyglossary.pyw --ui=tk Format-specific Requirements ---------------------------- - **Reading from XDXF** `sudo pip3 install lxml` - **Writing to AppleDict** `sudo pip3 install lxml beautifulsoup4 html5lib` - **Reading from Babylon BGL**: Python 3.4 to 3.7 is recommended - **Reading from CC-CEDICT** `sudo pip3 install jinja2` - **Reading from Octopus Mdict (MDX)** + **python-lzo**, required for **some** MDX glossaries - First try converting your MDX file, and if failed (`AssertionError` probably), then you may need to install LZO library and Python binding: - **On Linux**, make sure `liblzo2-dev` or `liblzo2-devel` is installed and then run `sudo pip3 install python-lzo` - **On Windows**: + Open this page: https://www.lfd.uci.edu/~gohlke/pythonlibs/#python-lzo + If you are using Python 3.7 (32 bit) for example, click on `python_lzo‑1.12‑cp37‑cp37m‑win32.whl` + Open Start -> type Command -> right-click on Command Prompt -> Run as administrator + Run `pip install C:\....\python_lzo‑1.12‑cp37‑cp37m‑win32.whl` command, giving the path of downloaded file **Other Requirements for Mac OS X** If you want to convert glossaries into AppleDict format on Mac OS X, you also need: - GNU make as part of [Command Line Tools for Xcode](http://developer.apple.com/downloads). - Dictionary Development Kit as part of [Additional Tools for Xcode](http://developer.apple.com/downloads). Extract to `/Developer/Extras/Dictionary Development Kit` HOWTOs ------ ### Convert Babylon (bgl) to Mac OS X dictionary Let's assume the Babylon dict is at `~/Documents/Duden_Synonym/Duden_Synonym.BGL`: cd ~/Documents/Duden_Synonym/ python3 ~/Software/pyglossary/pyglossary.pyw --write-format=AppleDict Duden_Synonym.BGL Duden_Synonym-apple cd Duden_Synonym-apple make make install Launch Dictionary.app and test. 
### Convert Octopus Mdict to Mac OS X dictionary Let's assume the MDict dict is at `~/Documents/Duden-Oxford/Duden-Oxford DEED ver.20110408.mdx`. Run the following command: cd ~/Documents/Duden-Oxford/ python3 ~/Software/pyglossary/pyglossary.pyw --write-format=AppleDict "Duden-Oxford DEED ver.20110408.mdx" "Duden-Oxford DEED ver.20110408-apple" cd "Duden-Oxford DEED ver.20110408-apple" make make install Launch Dictionary.app and test. Let's assume the MDict dict is at `~/Downloads/oald8/oald8.mdx`, along with the image/audio resources file `oald8.mdd`. Run the following commands: : cd ~/Downloads/oald8/ python3 ~/Software/pyglossary/pyglossary.pyw --write-format=AppleDict oald8.mdx oald8-apple cd oald8-apple This extracts dictionary into `oald8.xml` and data resources into folder `OtherResources`. Hyperlinks use relative path. : sed -i "" 's:src="/:src=":g' oald8.xml Convert audio file from SPX format to WAV format. You need package `speex` from [MacPorts](https://www.macports.org) : find OtherResources -name "*.spx" -execdir sh -c 'spx={};speexdec $spx ${spx%.*}.wav' \; sed -i "" 's|sound://\([/_a-zA-Z0-9]*\).spx|\1.wav|g' oald8.xml But be warned that the decoded WAVE audio can consume \~5 times more disk space! Compile and install. : make make install Launch Dictionary.app and test. 
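If you prefer Python over sed, the `sound://` rewrite in the last step can be reproduced with the standard `re` module; this sketch implements only that one substitution (the `rewrite_sound_links` helper is hypothetical, not part of PyGlossary):

```python
import re

def rewrite_sound_links(xml_text):
    # Equivalent of: sed 's|sound://\([/_a-zA-Z0-9]*\).spx|\1.wav|g'
    # i.e. rewrite sound://some/file.spx references to some/file.wav
    return re.sub(r"sound://([/_a-zA-Z0-9]*)\.spx", r"\1.wav", xml_text)

rewrite_sound_links('<a href="sound://audio/cat.spx">')
# -> '<a href="audio/cat.wav">'
```

Note that the empty string after `-i` in the sed commands above is required by the BSD sed that ships with Mac OS X; GNU sed would use a plain `-i`.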
pyglossary-3.2.1/about0000644000175000017500000000040313575553425015257 0ustar emfoxemfox00000000000000A tool for converting, modifying and workig with dictionary files aka glossaries, with various formats used by different dictionary applications Copyleft © 2008-2018 Saeed Rasooli PyGlossary is licensed by the GNU General Public License version 3 (or later) pyglossary-3.2.1/config.json0000644000175000017500000000054213575553425016366 0ustar emfoxemfox00000000000000{ "noProgressBar": false, "ui_autoSetFormat": true, "ui_autoSetOutputFileName": true, "lower": false, "utf8Check": true, "enable_alts": true, "reverse_matchWord": true, "reverse_showRel": "Percent", "reverse_saveStep": 1000, "reverse_minRel": 0.3, "reverse_maxNum": -1, "reverse_includeDefs": false } pyglossary-3.2.1/doc/0000755000175000017500000000000013577304644014771 5ustar emfoxemfox00000000000000pyglossary-3.2.1/doc/Babylon/0000755000175000017500000000000013577304644016357 5ustar emfoxemfox00000000000000pyglossary-3.2.1/doc/Babylon/BGL.svg0000644000175000017500000021603113575553425017510 0ustar emfoxemfox00000000000000 image/svg+xml signature gzip header 4B gzip g-unzip too many blocks each BGL block Byte block.type 2 block.length 3 Bytes block.length Bytes block.data Byte block.type 3 block.length 4 Bytes block.length Bytes block.data Byte block.type x x-4 Bytes 4 <= x < 16block.length = x-4 block.data headword definition Byte 2 Bytes len(headword) len(definition) block.data 2B position ofgzip header when block.type is in {1, 7, 10, 13} alternates (with modifications) len(alternate) Byte block.data when block.type = 11 definition Byte 4 Bytes len(headword) len(definition) alternates number of alternates 4 Bytes headword alternate Byte alternate Byte alternate len(alternate) len(alternate) (same structure as other block types) block block block block block block pyglossary-3.2.1/doc/DSL/0000755000175000017500000000000013577304644015413 5ustar 
emfoxemfox00000000000000pyglossary-3.2.1/doc/DSL/README.rst0000644000175000017500000000221713575553425017105 0ustar emfoxemfox00000000000000 {{COMMENT}}..{{/COMMENT}} MAINENTRY: entry word MULTIWORD: entry words STYLE-LEVEL: spoken DEFINITION: definition PRON: pronunciation PART-OF-SPEECH: word class INFLECTION: list of inflection types INFLECTION-TYPE: singular/plural for noun, comparative/superlative for adj INFLECTION-ENTRY: SENSE-NUM: meaning number HOMO-NUM: meaning number SYNTAX-CODING: Thesaurus: PATTERNS-COLLOCATIONS: EXAMPLE: Main entry: DIALECT: See also: Phrase: Phrasal Verb: [m#] Indent Level [*]...[/* ] Optional Text. Show only in full mode [p]...[/p] Label defined in abbrev.dsl [s]...[/s] Sound/Picture File Text Format ============== [c color_name]...[/c] Color Name, e.g red, orange, [b]...[/b] Bold [']...[/'] [u]...[/u] Underline [i]...[/i] Italic [sup]...[/sup] Superscript [sub]...[/sub] Subscript Text Zone ========= [trn]...[/trn] Translation [ex]...[/ex] Example [com]...[/com] Comment [!trs]...[/!trs] text between these tags will not be indexed [t]...[/t] Unknown [url]...[/url] URL Link <<...>> Reference pyglossary-3.2.1/doc/Octopus MDict/0000755000175000017500000000000013577304644017406 5ustar emfoxemfox00000000000000pyglossary-3.2.1/doc/Octopus MDict/MDD.svg0000644000175000017500000031317513575553425020546 0ustar emfoxemfox00000000000000 image/svg+xml Dict Info String UTF-16 Integer Unknown Number of Bytes Long Long Integer Long Long Integer Number of Entries Long Long Integer Long Long Integer Long Long Integer Number of Bytes Number of Bytes Unknown Unknown Unknown Long Long Integer Long Long Integer Number of Entries Long Long Integer Long Long Integer ............................. Long Long Integer Long Long Integer 4 bytes 8 bytes Chunk of bytes with size determined elsewhere Number of Bytes Number of Bytes Legend ............................. 
} } 02 00 00 00 Zlib Compressed Key Block 02 00 00 00 Same as the last 4 bytes of compressed block.Used for validation Same as the last 4 bytes of compressed block Zlib Compressed Record Block Decompress Key Id Key Text { ............................. 00 Key Id Key Text 00 Record Text Decompress ..................... { Record Text Key Blocks Record Block Record Block Long Long Integer ............................. Key Blocks } MDD File Structure Analysis } Note: Integer and Long Long Integer have big endian order. } } Number of Bytes Number of Bytes Number of Record Blocks Number of Key Blocks 00 00 pyglossary-3.2.1/doc/Octopus MDict/MDX.svg0000644000175000017500000051036313575553425020570 0ustar emfoxemfox00000000000000 image/svg+xml Dict Info String UTF-16 Integer Unknown Number of Bytes Number of Entries Number of Bytes Number of Bytes Unknown Key Text Key Block Info Unknown Number of Entries ............................. 4 bytes 4 bytes when version < 2.08 bytes when version >= 2.0 Chunk of bytes with size determined elsewhere Number of Bytes Number of Bytes Legend ............................. } } Sum of all following chars + 1Used for validation 00 00 00 00 Unknown 00 00 00 00 00 00 00 00 02 00 00 00 { Zlib Compressed Key Block Record Text 00 00 00 00 Unknown 0D 0A 00 02 00 00 00 { Record Text 0D 0A 00 Same as the last 4 bytes of compressed block.Used for validation Same as the last 4 bytes of compressed block Zlib Compressed Key Block Decompress Key Id Key Text { ............................. 00 Key Id Key Text 00 0D 0A 00 Record Text Decompress ............................. { 0D 0A 00 Record Text Key Blocks Record Block Record Block 00 ............................. Key Blocks } MDX File Structure Analysis } Note: Integer and Long Long Integer have big endian order. 
} } Number of Bytes Number of Bytes Number of Record Blocks Number of Key Blocks 01 00 00 00 LZO Compressed Key Block Adler32 Checksum 01 00 00 00 LZO Compressed Key Block Adler32 Checksum Number of Bytes Number of Bytes CompressedSize DecompressedSize 02 00 00 00 Zlib Compressed Key Block Info ............................. Version >= 2.0 Version <2.0 Header Key Block Record Block Only when Version > 2.0 Only when Version > 2.0 1 Byte when version < 2.0, 2 Bytes when version >= 2.0 } CompressedSize DecompressedSize Same as the last 4 bytes of compressed block. Used for validation pyglossary-3.2.1/doc/Octopus MDict/README.rst0000644000175000017500000000475413575553425021104 0ustar emfoxemfox00000000000000An Analysis of MDX/MDD File Format ================================== MDict is described as a multi-platform, open dictionary, but both claims are questionable: it is not available for every platform, e.g. OS X, Linux, and its dictionary file format is not open. This has not hindered its popularity, however, and many dictionaries have been created for it. This is an attempt to reveal the MDX/MDD file format, so that my favorite dictionaries, created by MDict users, can be used elsewhere. MDict Files =========== MDict stores the dictionary definitions, i.e. (key word, explanation) pairs, in an MDX file, and the dictionary reference data, e.g. images, pronunciations and stylesheets, in an MDD file. Although they hold different contents, these two file formats share the same structure. MDX File Format =============== .. image:: MDX.svg MDD File Format =============== .. image:: MDD.svg Example Programs ================ readmdict.py ------------ readmdict.py is an example implementation in Python. This program can read/extract MDX/MDD files. **NOTE:** python-lzo is required to read MDX files created with engine 1.2. Get the Windows version from http://www.lfd.uci.edu/~gohlke/pythonlibs/#python-lzo It can be used as a command line tool.
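Before the command-line examples, the compressed-block layout from the diagrams above can be illustrated with Python's standard library. This is a simplified sketch for zlib-compressed blocks only (the `decompress_block` helper is hypothetical, not readmdict's actual code):

```python
import zlib

def decompress_block(block):
    # Per the analysis above: bytes 0-3 hold the compression type
    # (02 00 00 00 means zlib), bytes 4-7 equal the last 4 bytes of
    # the compressed stream (used for validation), the rest is payload.
    comp_type, check, payload = block[:4], block[4:8], block[8:]
    if comp_type != b"\x02\x00\x00\x00":
        raise ValueError("only zlib blocks are handled in this sketch")
    if check != payload[-4:]:
        raise ValueError("block failed validation")
    return zlib.decompress(payload)
```

A real reader would also handle the LZO (`01 00 00 00`) and uncompressed (`00 00 00 00`) type markers shown in the diagrams.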
Suppose one has oald8.mdx and oald8.mdd:: $ python readmdict.py -x oald8.mdx This will create an *oald8.txt* dictionary file and a folder *data* for images and pronunciation audio files. On Windows, one can also double-click it and select the file in the popup dialog. Or as a module:: In [1]: from readmdict import MDX, MDD Read an MDX file and print the first entry:: In [2]: mdx = MDX('oald8.mdx') In [3]: items = mdx.items() In [4]: items.next() Out[4]: ('A', '.........') ``mdx`` is an object holding all info from an MDX file. ``items`` is an iterator producing 2-item tuples. Of each tuple, the first element is the entry text and the second is the explanation. Both are UTF-8 encoded strings. Read an MDD file and print the first entry:: In [5]: mdd = MDD('oald8.mdd') In [6]: items = mdd.items() In [7]: items.next() Out[7]: (u'\\pic\\accordion_concertina.jpg', '\xff\xd8\xff\xe0\x00\x10JFIF...........') ``mdd`` is an object holding all info from an MDD file. ``items`` is an iterator producing 2-item tuples. Of each tuple, the first element is the file name and the second element is the corresponding file content. The file name is encoded in UTF-8. The file content is a plain bytes array. pyglossary-3.2.1/doc/non-gui_examples/0000755000175000017500000000000013577304644020243 5ustar emfoxemfox00000000000000pyglossary-3.2.1/doc/non-gui_examples/any_to_txt.py0000755000175000017500000000016313575553425023011 0ustar emfoxemfox00000000000000#!/usr/bin/python import sys from pyglossary import Glossary g = Glossary() g.read(sys.argv[1]) g.writeTabfile() pyglossary-3.2.1/doc/non-gui_examples/oxford.py0000644000175000017500000000177713575553425022123 0ustar emfoxemfox00000000000000def takePhonetic_oxford_gb(glos): phonGlos = Glossary() ## phonetic glossary phonGlos.setInfo('name', glos.getInfo('name') + '_phonetic') for entry in glos: word = entry.getWord() defi = entry.getDefi() if not defi.startswith('/'): continue #### Now set the phonetic to the `ph` variable.
ph = '' for s in ( '/ adj', '/ v', '/ n', '/ adv', '/adj', '/v', '/n', '/adv', '/ n', '/ the', ): i = defi.find(s, 2, 85) if i==-1: continue else: ph = defi[:i+1] break ph = ph.replace(';', '\t')\ .replace(',', '\t')\ .replace(' ', '\t')\ .replace(' ', '\t')\ .replace(' ', '\t')\ .replace('//', '/')\ .replace('\t/\t', '\t')\ .replace('US\t', '\tUS: ')\ .replace('US', '\tUS: ')\ .replace('\t\t\t', '\t')\ .replace('\t\t', '\t')\ # .replace('/', '') # .replace('\\n ', '\\n') # .replace('\\n ', '\\n') if ph != '': phonGlos.addEntry(word, ph) return phonGlos pyglossary-3.2.1/help0000644000175000017500000000511113575553425015076 0ustar emfoxemfox00000000000000PyGlossary is a tool for working with dictionary databases (glossaries) Basic Usage: PyGI (Gtk3) Interface: To open PyGlossary window: ${CMD} PyGI is the default interface (so you never need to use "--ui=gtk" option) If PyGI was not found (not installed), then PyGlossary will fallback to Tkinter interface. Tkinter Interface: To open PyGlossary window: ${CMD} --ui=tk Usually good for Windows and Mac OS X Command Line Interface: To show this help: ${CMD} --help To show program version: ${CMD} --version To Convert: ${CMD} INPUT_FILE OUTPUT_FILE To Reverse: ${CMD} INPUT_FILE OUTPUT_FILE.txt --reverse Input and output files formats will be detected from extensions. 
You can also explicitly specify the input or output format, for example: ${CMD} mydic.utf8 mydic.ifo --read-format=tabfile ${CMD} mydic.utf8 mydic.ifo --read-format tabfile ${CMD} mydic.ifo mydic.utf8 --write-format=tabfile ${CMD} mydic.ifo mydic.utf8 --write-format tabfile General Options: Verbosity: -v0 or '--verbosity 0' for critical errors only -v1 or '--verbosity 1' for errors only -v2 or '--verbosity 2' for errors and warnings -v3 or '--verbosity 3' for errors, warnings and info -v4 or '--verbosity 4' for debug mode Appearance: --no-progress-bar and --no-color, useful for the Windows (non-Unix) command line Full Convert Usage: ${CMD} INPUT_FILE OUTPUT_FILE [-vN] [--read-format=FORMAT] [--write-format=FORMAT] [--sort|--no-sort] [--direct|--indirect] [--sort-cache-size=2000] [--utf8-check|--no-utf8-check] [--lower|--no-lower] [--read-options=READ_OPTIONS] [--write-options=WRITE_OPTIONS] Command line arguments and options (and arguments for options) are parsed with the GNU getopt method. You can also just type the extension of the output file instead of a full path, if you want to create it with the same name as the input file but another extension. For example: ${CMD} mydic.ifo txt instead of: ${CMD} mydic.ifo mydic.txt Compressing with gz, bz2 and zip is supported; just append one of these extensions to the file name, for example: ${CMD} mydic.ifo mydic.txt.gz or ${CMD} mydic.ifo txt.gz And if the input file has one of these extensions (gz, bz2, zip), it will be extracted before loading pyglossary-3.2.1/license-dialog0000644000175000017500000000140413575553425017026 0ustar emfoxemfox00000000000000PyGlossary - A tool for working with dictionary databases Copyright © 2008-2018 Saeed Rasooli This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. Or on Debian systems, from /usr/share/common-licenses/GPL. If not, see . pyglossary-3.2.1/license.txt0000644000175000017500000010451313575553425016414 0ustar emfoxemfox00000000000000 GNU GENERAL PUBLIC LICENSE Version 3, 29 June 2007 Copyright (C) 2007 Free Software Foundation, Inc. Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The GNU General Public License is a free, copyleft license for software and other kinds of works. The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change all versions of a program--to make sure it remains free software for all its users. We, the Free Software Foundation, use the GNU General Public License for most of our software; it applies also to any other work released this way by its authors. You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for them if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs, and that you know you can do these things. To protect your rights, we need to prevent others from denying you these rights or asking you to surrender the rights. 
Therefore, you have certain responsibilities if you distribute copies of the software, or if you modify it: responsibilities to respect the freedom of others. For example, if you distribute copies of such a program, whether gratis or for a fee, you must pass on to the recipients the same freedoms that you received. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. Developers that use the GNU GPL protect your rights with two steps: (1) assert copyright on the software, and (2) offer you this License giving you legal permission to copy, distribute and/or modify it. For the developers' and authors' protection, the GPL clearly explains that there is no warranty for this free software. For both users' and authors' sake, the GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions. Some devices are designed to deny users access to install or run modified versions of the software inside them, although the manufacturer can do so. This is fundamentally incompatible with the aim of protecting users' freedom to change the software. The systematic pattern of such abuse occurs in the area of products for individuals to use, which is precisely where it is most unacceptable. Therefore, we have designed this version of the GPL to prohibit the practice for those products. If such problems arise substantially in other domains, we stand ready to extend this provision to those domains in future versions of the GPL, as needed to protect the freedom of users. Finally, every program is threatened constantly by software patents. States should not allow patents to restrict development and use of software on general-purpose computers, but in those that do, we wish to avoid the special danger that patents applied to a free program could make it effectively proprietary. 
To prevent this, the GPL assures that patents cannot be used to render the program non-free. The precise terms and conditions for copying, distribution and modification follow. TERMS AND CONDITIONS 0. Definitions. "This License" refers to version 3 of the GNU General Public License. "Copyright" also means copyright-like laws that apply to other kinds of works, such as semiconductor masks. "The Program" refers to any copyrightable work licensed under this License. Each licensee is addressed as "you". "Licensees" and "recipients" may be individuals or organizations. To "modify" a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. The resulting work is called a "modified version" of the earlier work or a work "based on" the earlier work. A "covered work" means either the unmodified Program or a work based on the Program. To "propagate" a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well. To "convey" a work means any kind of propagation that enables other parties to make or receive copies. Mere interaction with a user through a computer network, with no transfer of a copy, is not conveying. An interactive user interface displays "Appropriate Legal Notices" to the extent that it includes a convenient and prominently visible feature that (1) displays an appropriate copyright notice, and (2) tells the user that there is no warranty for the work (except to the extent that warranties are provided), that licensees may convey the work under this License, and how to view a copy of this License. 
If the interface presents a list of user commands or options, such as a menu, a prominent item in the list meets this criterion. 1. Source Code. The "source code" for a work means the preferred form of the work for making modifications to it. "Object code" means any non-source form of a work. A "Standard Interface" means an interface that either is an official standard defined by a recognized standards body, or, in the case of interfaces specified for a particular programming language, one that is widely used among developers working in that language. The "System Libraries" of an executable work include anything, other than the work as a whole, that (a) is included in the normal form of packaging a Major Component, but which is not part of that Major Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. A "Major Component", in this context, means a major essential component (kernel, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it. The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities. However, it does not include the work's System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work. 
For example, Corresponding Source includes interface definition files associated with source files for the work, and the source code for shared libraries and dynamically linked subprograms that the work is specifically designed to require, such as by intimate data communication or control flow between those subprograms and other parts of the work. The Corresponding Source need not include anything that users can regenerate automatically from other parts of the Corresponding Source. The Corresponding Source for a work in source code form is that same work. 2. Basic Permissions. All rights granted under this License are granted for the term of copyright on the Program, and are irrevocable provided the stated conditions are met. This License explicitly affirms your unlimited permission to run the unmodified Program. The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work. This License acknowledges your rights of fair use or other equivalent, as provided by copyright law. You may make, run and propagate covered works that you do not convey, without conditions so long as your license otherwise remains in force. You may convey covered works to others for the sole purpose of having them make modifications exclusively for you, or provide you with facilities for running those works, provided that you comply with the terms of this License in conveying all material for which you do not control copyright. Those thus making or running the covered works for you must do so exclusively on your behalf, under your direction and control, on terms that prohibit them from making any copies of your copyrighted material outside their relationship with you. Conveying under any other circumstances is permitted solely under the conditions stated below. Sublicensing is not allowed; section 10 makes it unnecessary. 3. Protecting Users' Legal Rights From Anti-Circumvention Law. 
No covered work shall be deemed part of an effective technological measure under any applicable law fulfilling obligations under article 11 of the WIPO copyright treaty adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention of such measures. When you convey a covered work, you waive any legal power to forbid circumvention of technological measures to the extent such circumvention is effected by exercising rights under this License with respect to the covered work, and you disclaim any intention to limit operation or modification of the work as a means of enforcing, against the work's users, your or third parties' legal rights to forbid circumvention of technological measures. 4. Conveying Verbatim Copies. You may convey verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice; keep intact all notices stating that this License and any non-permissive terms added in accord with section 7 apply to the code; keep intact all notices of the absence of any warranty; and give all recipients a copy of this License along with the Program. You may charge any price or no price for each copy that you convey, and you may offer support or warranty protection for a fee. 5. Conveying Modified Source Versions. You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions: a) The work must carry prominent notices stating that you modified it, and giving a relevant date. b) The work must carry prominent notices stating that it is released under this License and any conditions added under section 7. This requirement modifies the requirement in section 4 to "keep intact all notices". 
c) You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it. d) If the work has interactive user interfaces, each must display Appropriate Legal Notices; however, if the Program has interactive interfaces that do not display Appropriate Legal Notices, your work need not make them do so. A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an "aggregate" if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate. 6. Conveying Non-Source Forms. You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways: a) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange. 
b) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge. c) Convey individual copies of the object code with a copy of the written offer to provide the Corresponding Source. This alternative is allowed only occasionally and noncommercially, and only if you received the object code with such an offer, in accord with subsection 6b. d) Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements. e) Convey the object code using peer-to-peer transmission, provided you inform other peers where the object code and Corresponding Source of the work are being offered to the general public at no charge under subsection 6d. 
A separable portion of the object code, whose source code is excluded from the Corresponding Source as a System Library, need not be included in conveying the object code work. A "User Product" is either (1) a "consumer product", which means any tangible personal property which is normally used for personal, family, or household purposes, or (2) anything designed or sold for incorporation into a dwelling. In determining whether a product is a consumer product, doubtful cases shall be resolved in favor of coverage. For a particular product received by a particular user, "normally used" refers to a typical or common use of that class of product, regardless of the status of the particular user or of the way in which the particular user actually uses, or expects or is expected to use, the product. A product is a consumer product regardless of whether the product has substantial commercial, industrial or non-consumer uses, unless such uses represent the only significant mode of use of the product. "Installation Information" for a User Product means any methods, procedures, authorization keys, or other information required to install and execute modified versions of a covered work in that User Product from a modified version of its Corresponding Source. The information must suffice to ensure that the continued functioning of the modified object code is in no case prevented or interfered with solely because modification has been made. If you convey an object code work under this section in, or with, or specifically for use in, a User Product, and the conveying occurs as part of a transaction in which the right of possession and use of the User Product is transferred to the recipient in perpetuity or for a fixed term (regardless of how the transaction is characterized), the Corresponding Source conveyed under this section must be accompanied by the Installation Information. 
But this requirement does not apply if neither you nor any third party retains the ability to install modified object code on the User Product (for example, the work has been installed in ROM). The requirement to provide Installation Information does not include a requirement to continue to provide support service, warranty, or updates for a work that has been modified or installed by the recipient, or for the User Product in which it has been modified or installed. Access to a network may be denied when the modification itself materially and adversely affects the operation of the network or violates the rules and protocols for communication across the network. Corresponding Source conveyed, and Installation Information provided, in accord with this section must be in a format that is publicly documented (and with an implementation available to the public in source code form), and must require no special password or key for unpacking, reading or copying. 7. Additional Terms. "Additional permissions" are terms that supplement the terms of this License by making exceptions from one or more of its conditions. Additional permissions that are applicable to the entire Program shall be treated as though they were included in this License, to the extent that they are valid under applicable law. If additional permissions apply only to part of the Program, that part may be used separately under those permissions, but the entire Program remains governed by this License without regard to the additional permissions. When you convey a copy of a covered work, you may at your option remove any additional permissions from that copy, or from any part of it. (Additional permissions may be written to require their own removal in certain cases when you modify the work.) You may place additional permissions on material, added by you to a covered work, for which you have or can give appropriate copyright permission. 
Notwithstanding any other provision of this License, for material you add to a covered work, you may (if authorized by the copyright holders of that material) supplement the terms of this License with terms: a) Disclaiming warranty or limiting liability differently from the terms of sections 15 and 16 of this License; or b) Requiring preservation of specified reasonable legal notices or author attributions in that material or in the Appropriate Legal Notices displayed by works containing it; or c) Prohibiting misrepresentation of the origin of that material, or requiring that modified versions of such material be marked in reasonable ways as different from the original version; or d) Limiting the use for publicity purposes of names of licensors or authors of the material; or e) Declining to grant rights under trademark law for use of some trade names, trademarks, or service marks; or f) Requiring indemnification of licensors and authors of that material by anyone who conveys the material (or modified versions of it) with contractual assumptions of liability to the recipient, for any liability that these contractual assumptions directly impose on those licensors and authors. All other non-permissive additional terms are considered "further restrictions" within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term. If a license document contains a further restriction but permits relicensing or conveying under this License, you may add to a covered work material governed by the terms of that license document, provided that the further restriction does not survive such relicensing or conveying. 
If you add terms to a covered work in accord with this section, you must place, in the relevant source files, a statement of the additional terms that apply to those files, or a notice indicating where to find the applicable terms. Additional terms, permissive or non-permissive, may be stated in the form of a separately written license, or stated as exceptions; the above requirements apply either way. 8. Termination. You may not propagate or modify a covered work except as expressly provided under this License. Any attempt otherwise to propagate or modify it is void, and will automatically terminate your rights under this License (including any patent licenses granted under the third paragraph of section 11). However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation. Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice. Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, you do not qualify to receive new licenses for the same material under section 10. 9. Acceptance Not Required for Having Copies. You are not required to accept this License in order to receive or run a copy of the Program. 
Ancillary propagation of a covered work occurring solely as a consequence of using peer-to-peer transmission to receive a copy likewise does not require acceptance. However, nothing other than this License grants you permission to propagate or modify any covered work. These actions infringe copyright if you do not accept this License. Therefore, by modifying or propagating a covered work, you indicate your acceptance of this License to do so. 10. Automatic Licensing of Downstream Recipients. Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License. You are not responsible for enforcing compliance by third parties with this License. An "entity transaction" is a transaction transferring control of an organization, or substantially all assets of one, or subdividing an organization, or merging organizations. If propagation of a covered work results from an entity transaction, each party to that transaction who receives a copy of the work also receives whatever licenses to the work the party's predecessor in interest had or could give under the previous paragraph, plus a right to possession of the Corresponding Source of the work from the predecessor in interest, if the predecessor has it or can get it with reasonable efforts. You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License. For example, you may not impose a license fee, royalty, or other charge for exercise of rights granted under this License, and you may not initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging that any patent claim is infringed by making, using, selling, offering for sale, or importing the Program or any portion of it. 11. Patents. A "contributor" is a copyright holder who authorizes use under this License of the Program or a work on which the Program is based. 
The work thus licensed is called the contributor's "contributor version". A contributor's "essential patent claims" are all patent claims owned or controlled by the contributor, whether already acquired or hereafter acquired, that would be infringed by some manner, permitted by this License, of making, using, or selling its contributor version, but do not include claims that would be infringed only as a consequence of further modification of the contributor version. For purposes of this definition, "control" includes the right to grant patent sublicenses in a manner consistent with the requirements of this License. Each contributor grants you a non-exclusive, worldwide, royalty-free patent license under the contributor's essential patent claims, to make, use, sell, offer for sale, import and otherwise run, modify and propagate the contents of its contributor version. In the following three paragraphs, a "patent license" is any express agreement or commitment, however denominated, not to enforce a patent (such as an express permission to practice a patent or covenant not to sue for patent infringement). To "grant" such a patent license to a party means to make such an agreement or commitment not to enforce a patent against the party. If you convey a covered work, knowingly relying on a patent license, and the Corresponding Source of the work is not available for anyone to copy, free of charge and under the terms of this License, through a publicly available network server or other readily accessible means, then you must either (1) cause the Corresponding Source to be so available, or (2) arrange to deprive yourself of the benefit of the patent license for this particular work, or (3) arrange, in a manner consistent with the requirements of this License, to extend the patent license to downstream recipients. 
"Knowingly relying" means you have actual knowledge that, but for the patent license, your conveying the covered work in a country, or your recipient's use of the covered work in a country, would infringe one or more identifiable patents in that country that you have reason to believe are valid. If, pursuant to or in connection with a single transaction or arrangement, you convey, or propagate by procuring conveyance of, a covered work, and grant a patent license to some of the parties receiving the covered work authorizing them to use, propagate, modify or convey a specific copy of the covered work, then the patent license you grant is automatically extended to all recipients of the covered work and works based on it. A patent license is "discriminatory" if it does not include within the scope of its coverage, prohibits the exercise of, or is conditioned on the non-exercise of one or more of the rights that are specifically granted under this License. You may not convey a covered work if you are a party to an arrangement with a third party that is in the business of distributing software, under which you make payment to the third party based on the extent of your activity of conveying the work, and under which the third party grants, to any of the parties who would receive the covered work from you, a discriminatory patent license (a) in connection with copies of the covered work conveyed by you (or copies made from those copies), or (b) primarily for and in connection with specific products or compilations that contain the covered work, unless you entered into that arrangement, or that patent license was granted, prior to 28 March 2007. Nothing in this License shall be construed as excluding or limiting any implied license or other defenses to infringement that may otherwise be available to you under applicable patent law. 12. No Surrender of Others' Freedom. 
If conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all. For example, if you agree to terms that obligate you to collect a royalty for further conveying from those to whom you convey the Program, the only way you could satisfy both those terms and this License would be to refrain entirely from conveying the Program. 13. Use with the GNU Affero General Public License. Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU Affero General Public License into a single combined work, and to convey the resulting work. The terms of this License will continue to apply to the part which is the covered work, but the special requirements of the GNU Affero General Public License, section 13, concerning interaction through a network will apply to the combination as such. 14. Revised Versions of this License. The Free Software Foundation may publish revised and/or new versions of the GNU General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies that a certain numbered version of the GNU General Public License "or any later version" applies to it, you have the option of following the terms and conditions either of that numbered version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the GNU General Public License, you may choose any version ever published by the Free Software Foundation. 
If the Program specifies that a proxy can decide which future versions of the GNU General Public License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Program. Later license versions may give you additional or different permissions. However, no additional obligations are imposed on any author or copyright holder as a result of your choosing to follow a later version. 15. Disclaimer of Warranty. THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. Limitation of Liability. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 17. Interpretation of Sections 15 and 16. 
If the disclaimer of warranty and limitation of liability provided above cannot be given local legal effect according to their terms, reviewing courts shall apply local law that most closely approximates an absolute waiver of all civil liability in connection with the Program, unless a warranty or assumption of liability accompanies a copy of the Program in return for a fee.

END OF TERMS AND CONDITIONS

How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms.

To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively state the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found.

<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.

Also add information on how to contact you by electronic and paper mail.

If the program does terminal interaction, make it output a short notice like this when it starts in an interactive mode:

<program> Copyright (C) <year> <name of author>
This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, your program's commands might be different; for a GUI interface, you would use an "about box".

You should also get your employer (if you work as a programmer) or school, if any, to sign a "copyright disclaimer" for the program, if necessary. For more information on this, and how to apply and follow the GNU GPL, see <https://www.gnu.org/licenses/>.

The GNU General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. But first, please read <https://www.gnu.org/philosophy/why-not-lgpl.html>.

pyglossary-3.2.1/pyglossary/0000755000175000017500000000000013577304644016440 0ustar emfoxemfox00000000000000
pyglossary-3.2.1/pyglossary/__init__.py0000644000175000017500000000012513575553425020550 0ustar emfoxemfox00000000000000
from .core import log, VERSION
from .glossary import Glossary

__version__ = VERSION

pyglossary-3.2.1/pyglossary/arabic_utils.py0000644000175000017500000000063513577304507021455 0ustar emfoxemfox00000000000000
def cleanWinArabicStr(u):
	"""
	u is a str (a Unicode string, not utf-8-encoded bytes)
	"""
	replaceList = [
		('ی', 'ي'),
		('ک', 'ك'),
		('ٔ', 'ء'),
		('\ufffd', ''),
	] + [(chr(i), chr(i+144)) for i in range(1632, 1642)]
	for item in replaceList:
		u = u.replace(item[0], item[1])
	return u


def recodeToWinArabic(u):
	"""
	u is a str (a Unicode string); returns bytes in windows-1256
	"""
	u = cleanWinArabicStr(u)
	return u.encode('windows-1256', 'replace')

pyglossary-3.2.1/pyglossary/core.py0000644000175000017500000001147113577304507017744 0ustar emfoxemfox00000000000000
import logging
import traceback
import inspect
from pprint import pformat
import sys
import os
from os.path import (
	join,
	isfile,
	isdir,
	exists,
	realpath,
	dirname,
)
import platform

VERSION = "3.2.1"


class MyLogger(logging.Logger):
	levelsByVerbosity = (
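The digit-shifting trick in `cleanWinArabicStr` above is easy to misread, so here is a minimal self-contained sketch of the same replacement table (a hypothetical standalone reproduction, not the package's own module). The offset of 144 code points maps the Arabic-Indic digits U+0660–U+0669 onto the Extended Arabic-Indic digits U+06F0–U+06F9:

```python
# Hypothetical standalone sketch of the replacement table built in
# cleanWinArabicStr above (illustrative names, not the package module).

def clean_win_arabic(u: str) -> str:
    replace_list = [
        ("\u06CC", "\u064A"),  # Farsi Yeh   -> Arabic Yeh
        ("\u06A9", "\u0643"),  # Farsi Keheh -> Arabic Kaf
        ("\u0654", "\u0621"),  # Hamza above -> Hamza
        ("\ufffd", ""),        # drop Unicode replacement characters
    ] + [
        # Arabic-Indic digits U+0660..U+0669 -> Extended Arabic-Indic
        # digits U+06F0..U+06F9 (an offset of 144 code points)
        (chr(i), chr(i + 144)) for i in range(1632, 1642)
    ]
    for src, dst in replace_list:
        u = u.replace(src, dst)
    return u

cleaned = clean_win_arabic("\u06CC" + chr(1632))  # Yeh + digit zero
```

`cleaned` is then `"\u064A\u06F0"`: both the letter and the digit have been remapped character-for-character.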
		logging.CRITICAL,
		logging.ERROR,
		logging.WARNING,
		logging.INFO,
		logging.DEBUG,
		logging.NOTSET,
	)
	levelNamesCap = [
		"Critical",
		"Error",
		"Warning",
		"Info",
		"Debug",
		"All",  # "Not-Set",
	]

	def setVerbosity(self, verbosity):
		self.setLevel(self.levelsByVerbosity[verbosity])
		self._verbosity = verbosity

	def getVerbosity(self):
		return getattr(self, "_verbosity", 3)  # FIXME

	def pretty(self, data, header=""):
		self.debug(header + pformat(data))

	def isDebug(self):
		return self.getVerbosity() >= 4


def format_var_dict(dct, indent=4, max_width=80):
	lines = []
	pre = " " * indent
	for key, value in dct.items():
		line = pre + key + " = " + repr(value)
		if len(line) > max_width:
			line = line[:max_width - 3] + "..."
		try:
			value_len = len(value)
		except TypeError:  # value has no len()
			pass
		else:
			line += "\n" + pre + "len(%s) = %s" % (key, value_len)
		lines.append(line)
	return "\n".join(lines)


def format_exception(exc_info=None, add_locals=False, add_globals=False):
	if not exc_info:
		exc_info = sys.exc_info()
	_type, value, tback = exc_info
	text = "".join(traceback.format_exception(_type, value, tback))
	if add_locals or add_globals:
		try:
			frame = inspect.getinnerframes(tback, context=0)[-1][0]
		except IndexError:
			pass
		else:
			if add_locals:
				text += "Traceback locals:\n%s\n" % format_var_dict(
					frame.f_locals,
				)
			if add_globals:
				text += "Traceback globals:\n%s\n" % format_var_dict(
					frame.f_globals,
				)
	return text


class StdLogHandler(logging.Handler):
	startRed = "\x1b[31m"
	endFormat = "\x1b[0;0;0m"  # len=8

	def __init__(self, noColor=False):
		logging.Handler.__init__(self)
		self.noColor = noColor

	def emit(self, record):
		msg = record.getMessage()
		###
		if record.exc_info:
			_type, value, tback = record.exc_info
			tback_text = format_exception(
				exc_info=record.exc_info,
				add_locals=(log.level <= logging.DEBUG),  # FIXME
				add_globals=False,
			)
			if not msg:
				msg = "unhandled exception:"
			msg += "\n"
			msg += tback_text
		###
		if record.levelname in ("CRITICAL", "ERROR"):
			if not self.noColor:
				msg = self.startRed + msg + self.endFormat
			fp = sys.stderr
		else:
			fp = sys.stdout
		###
		fp.write(msg + "\n")
		fp.flush()

#	def exception(self, msg):
#		if not self.noColor:
#			msg = self.startRed + msg + self.endFormat
#		sys.stderr.write(msg + "\n")
#		sys.stderr.flush()


def checkCreateConfDir():
	if not isdir(confDir):
		if exists(confDir):  # file, or anything other than directory
			os.rename(confDir, confDir + ".bak")  # we do not import old config
		os.mkdir(confDir)
	if not exists(userPluginsDir):
		os.mkdir(userPluginsDir)
	if not isfile(confJsonFile):
		with open(rootConfJsonFile) as srcFp, \
			open(confJsonFile, "w") as userFp:
			userFp.write(srcFp.read())

# __________________________________________________________________________ #

logging.setLoggerClass(MyLogger)
log = logging.getLogger("root")

sys.excepthook = lambda *exc_info: log.critical(
	format_exception(
		exc_info=exc_info,
		add_locals=(log.level <= logging.DEBUG),  # FIXME
		add_globals=False,
	)
)

sysName = platform.system()

# can set env var WARNINGS to:
# "error", "ignore", "always", "default", "module", "once"
if os.getenv("WARNINGS"):
	import warnings
	warnings.filterwarnings(os.getenv("WARNINGS"))

if hasattr(sys, "frozen"):
	rootDir = dirname(sys.executable)
	uiDir = join(rootDir, "ui")
else:
	uiDir = dirname(realpath(__file__))
	rootDir = dirname(uiDir)

dataDir = rootDir
if dataDir.endswith("dist-packages") or dataDir.endswith("site-packages"):
	dataDir = dirname(sys.argv[0])

appResDir = join(dataDir, "res")

if os.sep == "/":  # Operating system is Unix-like
	homeDir = os.getenv("HOME")
	user = os.getenv("USER")
	tmpDir = "/tmp"
	# os.name == "posix"  # FIXME
	if sysName == "Darwin":  # Mac OS X
		confDir = homeDir + "/Library/Preferences/PyGlossary"
		# or maybe: homeDir + "/Library/PyGlossary"
		# os.environ["OSTYPE"] == "darwin10.0"
		# os.environ["MACHTYPE"] == "x86_64-apple-darwin10.0"
		# platform.dist() == ("", "", "")
		# platform.release() == "10.3.0"
	else:  # GNU/Linux, ...
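The locals-capturing idea behind `format_exception` above can be shown in a tiny standalone form. This is a hypothetical demo (illustrative names, not the package's API): grab `sys.exc_info()`, format the traceback, and append the innermost frame's local variables.

```python
# Hypothetical standalone demo of the format_exception idea above.
import sys
import traceback
import inspect

def brief_exception(add_locals=False):
    _type, value, tback = sys.exc_info()
    text = "".join(traceback.format_exception(_type, value, tback))
    if add_locals and tback is not None:
        # innermost frame = where the exception was actually raised
        frame = inspect.getinnerframes(tback, context=0)[-1][0]
        text += "locals: %r\n" % (frame.f_locals,)
    return text

def fail():
    x = 1
    raise ValueError("boom")

try:
    fail()
except ValueError:
    report = brief_exception(add_locals=True)

print("ValueError: boom" in report)  # True
print("'x': 1" in report)            # True
```

The report contains both the usual `Traceback (most recent call last)` text and the raising frame's `{'x': 1}`, which is exactly what makes this style of handler useful for post-mortem debugging.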
		confDir = homeDir + "/.pyglossary"
elif os.sep == "\\":  # operating system is Windows
	homeDir = os.getenv("HOMEDRIVE") + os.getenv("HOMEPATH")
	user = os.getenv("USERNAME")
	tmpDir = os.getenv("TEMP")
	confDir = os.getenv("APPDATA") + "\\" + "PyGlossary"
else:
	raise RuntimeError(
		"Unknown path separator (os.sep==%r)" % os.sep +
		", unknown operating system!"
	)

confJsonFile = join(confDir, "config.json")
rootConfJsonFile = join(dataDir, "config.json")
userPluginsDir = join(confDir, "plugins")

pyglossary-3.2.1/pyglossary/entry.py

# -*- coding: utf-8 -*-
import re
import os
from os.path import (
	join,
	exists,
	dirname,
)
from tempfile import mktemp


class DataEntry(object):  # or Resource? FIXME
	def isData(self):
		return True

	def __init__(self, fname, data, inTmp=False):
		assert isinstance(fname, str)
		assert isinstance(data, bytes)
		assert isinstance(inTmp, bool)
		if inTmp:
			tmpPath = mktemp(prefix=fname + "_")
			with open(tmpPath, "wb") as toFile:
				toFile.write(data)
			data = b""  # self._data holds bytes, so use b"" (was "")
		else:
			tmpPath = None
		self._fname = fname
		self._data = data  # bytes instance
		self._tmpPath = tmpPath

	def getFileName(self):
		return self._fname

	def getData(self):
		if self._tmpPath:
			with open(self._tmpPath, "rb") as fromFile:
				return fromFile.read()
		else:
			return self._data

	def save(self, directory):
		fname = self._fname
		# fix filename depending on operating system? FIXME
		fpath = join(directory, fname)
		fdir = dirname(fpath)
		if not exists(fdir):
			os.makedirs(fdir)
		with open(fpath, "wb") as toFile:
			toFile.write(self.getData())
		return fpath

	def getWord(self):
		return self._fname

	def getWords(self):
		return [self._fname]

	def getDefi(self):
		return "File: %s" % self._fname

	def getDefis(self):
		return [self.getDefi()]

	def getDefiFormat(self):
		return "b"  # "m" or "b" (binary) FIXME

	def setDefiFormat(self, defiFormat):
		pass

	def detectDefiFormat(self):
		pass

	def addAlt(self, alt):
		pass

	def editFuncWord(self, func):
		pass  # modify fname?
# FIXME def editFuncDefi(self, func): pass def strip(self): pass def replaceInWord(self, source, target): pass def replaceInDefi(self, source, target): pass def replace(self, source, target): pass def getRaw(self): return ( self._fname, "DATA", self, ) class Entry(object): sep = "|" htmlPattern = re.compile( ".*(" + "|".join([ r"", r"]", r"]", r"]", ]) + ")", re.S, ) def isData(self): return False def _join(self, parts): return self.sep.join([ part.replace(self.sep, "\\"+self.sep) for part in parts ]) @staticmethod def getEntrySortKey(key=None): if key: return lambda entry: key(entry.getWords()[0]) else: return lambda entry: entry.getWords()[0] @staticmethod def getRawEntrySortKey(key=None): if key: return lambda x: key( x[0][0] if isinstance(x[0], (list, tuple)) else x[0] ) else: return lambda x: \ x[0][0] if isinstance(x[0], (list, tuple)) else x[0] def __init__(self, word, defi, defiFormat="m"): """ word: string or a list of strings (including alternate words) defi: string or a list of strings (including alternate definitions) defiFormat (optional): definition format: "m": plain text "h": html "x": xdxf """ # memory optimization: if isinstance(word, list): if len(word) == 1: word = word[0] elif not isinstance(word, str): raise TypeError("invalid word type %s" % type(word)) if isinstance(defi, list): if len(defi) == 1: defi = defi[0] elif not isinstance(defi, str): raise TypeError("invalid defi type %s" % type(defi)) if not defiFormat in ("m", "h", "x"): raise ValueError("invalid defiFormat %r" % defiFormat) self._word = word self._defi = defi self._defiFormat = defiFormat def getWord(self): """ returns string of word, and all the alternate words seperated by "|" """ if isinstance(self._word, str): return self._word else: return self._join(self._word) def getWords(self): """ returns list of the word and all the alternate words """ if isinstance(self._word, str): return [self._word] else: return self._word def getDefi(self): """ returns string of definition, and 
all the alternate definitions seperated by "|" """ if isinstance(self._defi, str): return self._defi else: return self._join(self._defi) def getDefis(self): """ returns list of the definition and all the alternate definitions """ if isinstance(self._defi, str): return [self._defi] else: return self._defi def getDefiFormat(self): """ returns definition format: "m": plain text "h": html "x": xdxf """ return self._defiFormat def setDefiFormat(self, defiFormat): """ defiFormat: "m": plain text "h": html "x": xdxf """ self._defiFormat = defiFormat def detectDefiFormat(self): if self._defiFormat != "m": return defi = self.getDefi().lower() if re.match(self.htmlPattern, defi): self._defiFormat = "h" def addAlt(self, alt): words = self.getWords() words.append(alt) self._word = words def editFuncWord(self, func): """ run function `func` on all the words `func` must accept only one string as argument and return the modified string """ if isinstance(self._word, str): self._word = func(self._word) else: self._word = tuple( func(st) for st in self._word ) def editFuncDefi(self, func): """ run function `func` on all the definitions `func` must accept only one string as argument and return the modified string """ if isinstance(self._defi, str): self._defi = func(self._defi) else: self._defi = tuple( func(st) for st in self._defi ) def strip(self): """ strip whitespaces from all words and definitions """ self.editFuncWord(str.strip) self.editFuncDefi(str.strip) def replaceInWord(self, source, target): """ replace string `source` with `target` in all words """ if isinstance(self._word, str): self._word = self._word.replace(source, target) else: self._word = tuple( st.replace(source, target) for st in self._word ) def replaceInDefi(self, source, target): """ replace string `source` with `target` in all definitions """ if isinstance(self._defi, str): self._defi = self._defi.replace(source, target) else: self._defi = tuple( st.replace(source, target) for st in self._defi ) def 
replace(self, source, target): """ replace string `source` with `target` in all words and definitions """ self.replaceInWord(source, target) self.replaceInDefi(source, target) def getRaw(self): """ returns a tuple (word, defi) or (word, defi, defiFormat) where both word and defi might be string or list of strings """ if self._defiFormat: return ( self._word, self._defi, self._defiFormat, ) else: return ( self._word, self._defi, ) @classmethod def fromRaw(cls, rawEntry, defaultDefiFormat="m"): """ rawEntry can be (word, defi) or (word, defi, defiFormat) where both word and defi can be string or list of strings if defiFormat is missing, defaultDefiFormat will be used creates and return an Entry object from `rawEntry` tuple """ word = rawEntry[0] defi = rawEntry[1] if defi == "DATA": try: dataEntry = rawEntry[2] # DataEntry instance except IndexError: pass else: # if isinstance(dataEntry, DataEntry) # FIXME return dataEntry try: defiFormat = rawEntry[2] except IndexError: defiFormat = defaultDefiFormat if isinstance(word, tuple): word = list(word) if isinstance(defi, tuple): defi = list(defi) return cls( word, defi, defiFormat=defiFormat, ) pyglossary-3.2.1/pyglossary/entry_filters.py0000644000175000017500000000560113577304507021703 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- import re from .text_utils import ( fixUtf8, ) class EntryFilter(object): name = '' desc = '' def __init__(self, glos): self.glos = glos def run(self, entry): """ returns an Entry object, or None to skip may return the same `entry`, or modify and return it, or return a new Entry object """ return entry class StripEntryFilter(EntryFilter): name = 'strip' desc = 'Strip Whitespaces' def run(self, entry): entry.strip() entry.replace('\r', '') return entry class NonEmptyWordFilter(EntryFilter): name = 'non_empty_word' desc = 'Non-empty Words' def run(self, entry): if not entry.getWord(): return # words = entry.getWords() # if not words: # return # wordsStr = ''.join([w.strip() for w in 
words]) # if not wordsStr: # return return entry class NonEmptyDefiFilter(EntryFilter): name = 'non_empty_defi' desc = 'Non-empty Definition' def run(self, entry): if not entry.getDefi(): return return entry class FixUnicodeFilter(EntryFilter): name = 'fix_unicode' desc = 'Fix Unicode' def run(self, entry): entry.editFuncWord(fixUtf8) entry.editFuncDefi(fixUtf8) return entry class LowerWordFilter(EntryFilter): name = 'lower_word' desc = 'Lowercase Words' def run(self, entry): entry.editFuncWord(str.lower) return entry class SkipDataEntryFilter(EntryFilter): name = 'skip_resources' desc = 'Skip Resources' def run(self, entry): if entry.isData(): return return entry class LangEntryFilter(EntryFilter): name = 'lang' desc = 'Language-dependent Filters' def run_fa(self, entry): from pyglossary.persian_utils import faEditStr entry.editFuncWord(faEditStr) entry.editFuncDefi(faEditStr) # RLM = '\xe2\x80\x8f' # defi = '\n'.join([RLM+line for line in defi.split('\n')]) # for GoldenDict ^^ FIXME return entry def run(self, entry): langs = ( self.glos.getInfo('sourceLang') + self.glos.getInfo('targetLang') ).lower() if 'persian' in langs or 'farsi' in langs: entry = self.run_fa(entry) return entry class CleanEntryFilter(EntryFilter): # FIXME name = 'clean' desc = 'Clean' def cleanDefi(self, st): st = st.replace('♦ ', '♦ ') st = re.sub('[\r\n]+', '\n', st) st = re.sub(' *\n *', '\n', st) """ This code may correct snippets like: - First sentence .Second sentence. -> First sentence. Second sentence. - First clause ,second clause. -> First clause, second clause. But there are cases when this code have undesirable effects ( '<' represented as '<' in HTML markup): - -> < Adj. > - -> < fig. 
> """ """ for j in range(3): for ch in ',.;': st = replacePostSpaceChar(st, ch) """ st = re.sub('♦\n+♦', '♦', st) if st.endswith(' (ilius) # This file is part of PyGlossary project, https://github.com/ilius/pyglossary # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 3, or (at your option) # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program. Or on Debian systems, from /usr/share/common-licenses/GPL # If not, see . import logging import sys import os from os.path import ( split, join, splitext, isfile, isdir, dirname, basename, abspath, ) from time import time as now import subprocess import re import pkgutil from collections import Counter from collections import OrderedDict as odict import io from .flags import * from . 
import core from .core import VERSION, userPluginsDir from .entry import Entry, DataEntry from .entry_filters import * from .sort_stream import hsortStreamList from .text_utils import ( fixUtf8, ) from .os_utils import indir homePage = "https://github.com/ilius/pyglossary" log = logging.getLogger("root") file = io.BufferedReader try: ModuleNotFoundError except NameError: ModuleNotFoundError = ImportError def get_ext(path): return splitext(path)[1].lower() class Glossary(object): """ Direct access to glos.data is droped Use `glos.addEntry(word, defi, [defiFormat])` where both word and defi can be list (including alternates) or string See help(glos.addEntry) Use `for entry in glos:` to iterate over entries (glossary data) See help(pyglossary.entry.Entry) for details """ # Should be changed according to plugins? FIXME infoKeysAliasDict = { "title": "name", "bookname": "name", "dbname": "name", ## "sourcelang": "sourceLang", "inputlang": "sourceLang", "origlang": "sourceLang", ## "targetlang": "targetLang", "outputlang": "targetLang", "destlang": "targetLang", ## "license": "copyright", } plugins = {} # format => pluginModule readFormats = [] writeFormats = [] readFunctions = {} readerClasses = {} writeFunctions = {} formatsDesc = {} formatsExt = {} formatsReadOptions = {} formatsWriteOptions = {} readExt = [] writeExt = [] readDesc = [] writeDesc = [] descFormat = {} descExt = {} extFormat = {} @classmethod def loadPlugins(cls, directory): """ executed on startup. 
as name implies, loads plugins from directory """ log.debug("Loading plugins from directory: %r" % directory) if not isdir(directory): log.error("Invalid plugin directory: %r" % directory) return sys.path.append(directory) for _, pluginName, _ in pkgutil.iter_modules([directory]): cls.loadPlugin(pluginName) sys.path.pop() @classmethod def loadPlugin(cls, pluginName): try: plugin = __import__(pluginName) except ModuleNotFoundError as e: log.warning("Module %r not found, skipping plugin %r", e.name, pluginName) return except Exception as e: log.exception("Error while importing plugin %s" % pluginName) return if (not hasattr(plugin, "enable")) or (not plugin.enable): log.debug("Plugin disabled or not a plugin: %s" % pluginName) return format = plugin.format extentions = plugin.extentions # FIXME: deprecate non-tuple values in plugin.extentions if isinstance(extentions, str): extentions = (extentions,) elif not isinstance(extentions, tuple): extentions = tuple(extentions) if hasattr(plugin, "description"): desc = plugin.description else: desc = "%s (%s)" % (format, extentions[0]) cls.plugins[format] = plugin cls.descFormat[desc] = format cls.descExt[desc] = extentions[0] for ext in extentions: cls.extFormat[ext] = format cls.formatsExt[format] = extentions cls.formatsDesc[format] = desc hasReadSupport = False try: Reader = plugin.Reader except AttributeError: pass else: for attr in ( "__init__", "open", "close", "__len__", "__iter__", ): if not hasattr(Reader, attr): log.error( "Invalid Reader class in \"%s\" plugin" % format + ", no \"%s\" method" % attr ) break else: cls.readerClasses[format] = Reader hasReadSupport = True try: cls.readFunctions[format] = plugin.read except AttributeError: pass else: hasReadSupport = True if hasReadSupport: cls.readFormats.append(format) cls.readExt.append(extentions) cls.readDesc.append(desc) cls.formatsReadOptions[format] = getattr(plugin, "readOptions", []) if hasattr(plugin, "write"): cls.writeFunctions[format] = plugin.write 
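`loadPlugin` above discovers a plugin's capabilities purely by duck typing: a module counts as a plugin if it has truthy `enable`, plus `format` and `extentions` (the attribute really is spelled that way in pyglossary), and optionally `Reader`/`read`/`write` and option lists. A minimal sketch of such a module, built here from a throwaway `types.ModuleType` so it is self-contained (the "Demo" format and its attributes are made up for illustration):

```python
import types

# A hypothetical minimal plugin module, mirroring the attributes
# that Glossary.loadPlugin looks up via getattr()/hasattr().
plugin = types.ModuleType("demo_plugin")
plugin.enable = True
plugin.format = "Demo"
plugin.extentions = (".demo",)  # note: pyglossary spells it "extentions"
plugin.description = "Demo (.demo)"
plugin.readOptions = []
plugin.writeOptions = ["encoding"]


def write(glos, filename, **options):
	# a write function receives the Glossary instance and the target path
	with open(filename, "w", encoding=options.get("encoding", "utf-8")) as fp:
		for entry in glos:
			fp.write(entry.getWord() + "\t" + entry.getDefi() + "\n")


plugin.write = write

# the kind of checks loadPlugin performs before registering:
assert getattr(plugin, "enable", False)
assert isinstance(plugin.extentions, tuple)
assert hasattr(plugin, "write")
```

Since this plugin has `write` but neither `Reader` nor `read`, `loadPlugin` would register it as write-only.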
cls.writeFormats.append(format) cls.writeExt.append(extentions) cls.writeDesc.append(desc) cls.formatsWriteOptions[format] = getattr( plugin, "writeOptions", [] ) return plugin def clear(self): self._info = odict() self._data = [] try: readers = self._readers except AttributeError: pass else: for reader in readers: try: reader.close() except Exception: log.exception("") self._readers = [] self._iter = None self._entryFilters = [] self._sortKey = None self._sortCacheSize = 1000 self._filename = "" self._defaultDefiFormat = "m" self._progressbar = True def __init__(self, info=None, ui=None): """ info: OrderedDict instance, or None no need to copy OrderedDict instance, we will not reference to it """ self.clear() if info: if not isinstance(info, (dict, odict)): raise TypeError( "Glossary: `info` has invalid type" ", dict or OrderedDict expected" ) for key, value in info.items(): self.setInfo(key, value) """ self._data is a list of tuples with length 2 or 3: (word, definition) (word, definition, defiFormat) where both word and definition can be a string, or list (containing word and alternates) defiFormat: format of the definition: "m": plain text "h": html "x": xdxf """ self.ui = ui def updateEntryFilters(self): self._entryFilters = [] pref = getattr(self.ui, "pref", {}) self._entryFilters.append(StripEntryFilter(self)) self._entryFilters.append(NonEmptyWordFilter(self)) if pref.get("skipResources", False): self._entryFilters.append(SkipDataEntryFilter(self)) if pref.get("utf8Check", True): self._entryFilters.append(FixUnicodeFilter(self)) if pref.get("lower", True): self._entryFilters.append(LowerWordFilter(self)) self._entryFilters.append(LangEntryFilter(self)) self._entryFilters.append(CleanEntryFilter(self)) self._entryFilters.append(NonEmptyWordFilter(self)) self._entryFilters.append(NonEmptyDefiFilter(self)) def __str__(self): return "glossary.Glossary" def addEntryObj(self, entry): self._data.append(entry.getRaw()) def newEntry(self, word, defi, 
defiFormat=None): """ create and return a new entry object """ if not defiFormat: defiFormat = self._defaultDefiFormat return Entry(word, defi, defiFormat) def addEntry(self, word, defi, defiFormat=None): """ create and add a new entry object to glossary """ self.addEntryObj(self.newEntry(word, defi, defiFormat)) def _loadedEntryGen(self): wordCount = len(self._data) progressbar = self.ui and self._progressbar if progressbar: self.progressInit("Writing") for index, rawEntry in enumerate(self._data): yield Entry.fromRaw( rawEntry, defaultDefiFormat=self._defaultDefiFormat ) if progressbar: self.progress(index, wordCount) if progressbar: self.progressEnd() def _readersEntryGen(self): for reader in self._readers: wordCount = 0 progressbar = False if self.ui and self._progressbar: try: wordCount = len(reader) except Exception: log.exception("") if wordCount: progressbar = True if progressbar: self.progressInit("Converting") try: for index, entry in enumerate(reader): yield entry if progressbar: self.progress(index, wordCount) finally: reader.close() if progressbar: self.progressEnd() def _applyEntryFiltersGen(self, gen): for entry in gen: if not entry: continue for entryFilter in self._entryFilters: entry = entryFilter.run(entry) if not entry: break else: yield entry def __iter__(self): if self._iter is None: log.error( "Trying to iterate over a blank Glossary" ", must call `glos.read` first" ) return iter([]) return self._iter def iterEntryBuckets(self, size): """ iterate over buckets of entries, with size `size` For example: for bucket in glos.iterEntryBuckets(100): assert len(bucket) == 100 for entry in bucket: print(entry.getWord()) print("-----------------") """ bucket = [] for entry in self: if len(bucket) >= size: yield bucket bucket = [] bucket.append(entry) yield bucket def setDefaultDefiFormat(self, defiFormat): self._defaultDefiFormat = defiFormat def getDefaultDefiFormat(self): return self._defaultDefiFormat def __len__(self): return len(self._data) + sum( 
len(reader) for reader in self._readers ) def infoKeys(self): return list(self._info.keys()) def getMostUsedDefiFormats(self, count=None): return Counter([ entry.getDefiFormat() for entry in self ]).most_common(count) # def formatInfoKeys(self, format):# FIXME def iterInfo(self): return self._info.items() def getInfo(self, key): key = str(key) try: key = self.infoKeysAliasDict[key.lower()] except KeyError: pass return self._info.get(key, "") # "" or None as default? FIXME def setInfo(self, key, value): # FIXME origKey = key key = fixUtf8(key) value = fixUtf8(value) try: key = self.infoKeysAliasDict[key.lower()] except KeyError: pass if origKey != key: log.debug("setInfo: %s -> %s" % (origKey, key)) self._info[key] = value def getExtraInfos(self, excludeKeys): """ excludeKeys: a list of (basic) info keys to be excluded returns an OrderedDict including the rest of info keys, with associated values """ excludeKeySet = set() for key in excludeKeys: excludeKeySet.add(key) try: excludeKeySet.add(self.infoKeysAliasDict[key.lower()]) except KeyError: pass extra = odict() for key, value in self._info.items(): if key in excludeKeySet: continue extra[key] = value return extra def getPref(self, name, default): if self.ui: return self.ui.pref.get(name, default) else: return default def newDataEntry(self, fname, data): inTmp = not self._readers return DataEntry(fname, data, inTmp) # ________________________________________________________________________# def read( self, filename, format="", direct=False, progressbar=True, **options ): """ filename (str): name/path of input file format (str): name of input format, or "" to detect from file extention direct (bool): enable direct mode """ filename = abspath(filename) # don't allow direct=False when there are readers # (read is called before with direct=True) if self._readers and not direct: raise ValueError( "there are already %s readers" % len(self._readers) + ", you can not read with direct=False mode" ) 
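The extension-to-format fallback that `read()` performs (scanning `Glossary.formatsExt` for the file's lowercased extension) can be sketched in isolation; the registry below is a made-up subset, since the real mapping is populated by `loadPlugin`:

```python
from os.path import splitext

# hypothetical subset of the format -> extensions registry
formatsExt = {
	"Stardict": (".ifo",),
	"Tabfile": (".txt", ".tab", ".dic"),
	"BabylonBgl": (".bgl",),
}


def detectReadFormat(filename):
	"""Mimic read()'s fallback: match the lowercased extension
	against every registered format's extension tuple."""
	ext = splitext(filename)[1].lower()
	for fmt, extList in formatsExt.items():
		if ext in extList:
			return fmt
	return None


print(detectReadFormat("dict-en-fa.bgl"))  # -> BabylonBgl
```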
self.updateEntryFilters() ### delFile = False ext = get_ext(filename) if ext in (".gz", ".bz2", ".zip"): if ext == ".bz2": output, error = subprocess.Popen( ["bzip2", "-dk", filename], stdout=subprocess.PIPE, ).communicate() # -k ==> keep original bz2 file # bunzip2 ~= bzip2 -d if error: log.error( error + "\n" + "Failed to decompress file \"%s\"" % filename ) return False else: filename = filename[:-4] ext = get_ext(filename) delFile = True elif ext == ".gz": output, error = subprocess.Popen( ["gzip", "-dc", filename], stdout=subprocess.PIPE, ).communicate() # -c ==> write to stdout (we want to keep original gz file) # gunzip ~= gzip -d if error: log.error( error + "\n" + "Failed to decompress file \"%s\"" % filename ) return False else: filename = filename[:-3] open(filename, "w").write(output) ext = get_ext(filename) delFile = True elif ext == ".zip": output, error = subprocess.Popen( ["unzip", filename, "-d", dirname(filename)], stdout=subprocess.PIPE, ).communicate() if error: log.error( error + "\n" + "Failed to decompress file \"%s\"" % filename ) return False else: filename = filename[:-4] ext = get_ext(filename) delFile = True if not format: for key in Glossary.formatsExt.keys(): if ext in Glossary.formatsExt[key]: format = key if not format: # if delFile: # os.remove(filename) log.error("Unknown extension \"%s\" for read support!" 
% ext) return False validOptionKeys = self.formatsReadOptions[format] for key in list(options.keys()): if key not in validOptionKeys: log.error( "Invalid read option \"%s\" " % key + "given for %s format" % format ) del options[key] filenameNoExt, ext = splitext(filename) if not ext.lower() in self.formatsExt[format]: filenameNoExt = filename self._filename = filenameNoExt if not self.getInfo("name"): self.setInfo("name", split(filename)[1]) self._progressbar = progressbar if format in self.readerClasses: Reader = self.readerClasses[format] reader = Reader(self) reader.open(filename, **options) if direct: self._readers.append(reader) log.info( "Using Reader class from %s plugin" % format + " for direct conversion without loading into memory" ) else: self.loadReader(reader) else: if direct: log.debug( "No `Reader` class found in %s plugin" % format + ", falling back to indirect mode" ) result = self.readFunctions[format].__call__( self, filename, **options ) # if not result:## FIXME # return False if delFile: os.remove(filename) self._updateIter() return True def loadReader(self, reader): """ iterates over `reader` object and loads the whole data into self._data must call `reader.open(filename)` before calling this function """ wordCount = 0 progressbar = False if self.ui and self._progressbar: try: wordCount = len(reader) except Exception: log.exception("") if wordCount: progressbar = True if progressbar: self.progressInit("Reading") try: for index, entry in enumerate(reader): if entry: self.addEntryObj(entry) if progressbar: self.progress(index, wordCount) finally: reader.close() if progressbar: self.progressEnd() return True def _inactivateDirectMode(self): """ loads all of `self._readers` into `self._data` closes readers and sets self._readers to [] """ for reader in self._readers: self.loadReader(reader) self._readers = [] def _updateIter(self, sort=False): """ updates self._iter depending on: 1- Wheather or not direct mode is On (self._readers not empty) or 
Off (self._readers empty) 2- Wheather sort is True, and if it is, checks for self._sortKey and self._sortCacheSize """ if self._readers: # direct mode if sort: sortKey = self._sortKey cacheSize = self._sortCacheSize log.info("Stream sorting enabled, cache size: %s" % cacheSize) # only sort by main word, or list of words + alternates? FIXME gen = hsortStreamList( self._readers, cacheSize, key=Entry.getEntrySortKey(sortKey), ) else: gen = self._readersEntryGen() else: gen = self._loadedEntryGen() self._iter = self._applyEntryFiltersGen(gen) def sortWords(self, key=None, cacheSize=None): # only sort by main word, or list of words + alternates? FIXME if self._readers: self._sortKey = key if cacheSize: self._sortCacheSize = cacheSize # FIXME else: self._data.sort( key=Entry.getRawEntrySortKey(key), ) self._updateIter(sort=True) def _detectOutput(self, filename="", format=""): """ returns (filename, format, archiveType) or None """ archiveType = "" if filename: ext = "" filenameNoExt, fext = splitext(filename) fext = fext.lower() if fext in (".gz", ".bz2", ".zip"): archiveType = fext[1:] filename = filenameNoExt fext = get_ext(filename) if not format: for fmt, extList in Glossary.formatsExt.items(): for e in extList: if format == e[1:] or format == e: format = fmt ext = e break if format: break if not format: for fmt, extList in Glossary.formatsExt.items(): if filename == fmt: format = filename ext = extList[0] filename = self._filename + ext break for e in extList: if filename == e[1:] or filename == e: format = fmt ext = e filename = self._filename + ext break if format: break if not format: for fmt, extList in Glossary.formatsExt.items(): if fext in extList: format = fmt ext = fext if not format: log.error("Unable to detect write format!") return else: # filename is empty if not self._filename: log.error("Invalid filename %r" % filename) return filename = self._filename # no extension if not format: log.error("No filename nor format is given for output file") return 
try: filename += Glossary.formatsExt[format][0] except KeyError: log.error("Invalid write format") return return filename, format, archiveType def write( self, filename, format, sort=None, sortKey=None, sortCacheSize=1000, **options ): """ sort (bool): True (enable sorting), False (disable sorting), None (auto, get from UI) sortKey (callable or None): key function for sorting takes a word as argument, which is str or list (with alternates) returns absolute path of output file, or None if failed """ if isdir(filename): filename = join(filename, basename(self._filename)) try: validOptionKeys = self.formatsWriteOptions[format] except KeyError: log.critical("No write support for \"%s\" format" % format) return for key in list(options.keys()): if key not in validOptionKeys: log.error( "Invalid write option \"%s\"" % key + " given for %s format" % format ) del options[key] plugin = self.plugins[format] sortOnWrite = plugin.sortOnWrite if sortOnWrite == ALWAYS: if sort is False: log.warning( "Writing %s requires sorting" % format + ", ignoring user sort=False option" ) if self._readers: log.warning( "Writing to %s format requires full sort" % format + ", falling back to indirect mode" ) self._inactivateDirectMode() log.info("Loaded %s entries" % len(self._data)) sort = True elif sortOnWrite == DEFAULT_YES: if sort is None: sort = True elif sortOnWrite == DEFAULT_NO: if sort is None: sort = False elif sortOnWrite == NEVER: if sort: log.warning( "Plugin prevents sorting before write" + ", ignoring user sort=True option" ) sort = False if sort: if sortKey is None: try: sortKey = plugin.sortKey except AttributeError: pass else: log.debug( "Using sort key function from %s plugin" % format ) elif sortOnWrite == ALWAYS: try: sortKey = plugin.sortKey except AttributeError: pass else: log.warning( "Ignoring user-defined sort order, " + "and using key function from %s plugin" % format ) self.sortWords( key=sortKey, cacheSize=sortCacheSize ) else: self._updateIter(sort=False) 
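`_detectOutput`'s first step, peeling one trailing archive suffix off the output filename, can be reproduced standalone (the helper name here is mine, not pyglossary's):

```python
from os.path import splitext


def splitArchiveSuffix(filename):
	"""Peel one trailing .gz/.bz2/.zip off an output filename,
	returning (filename, archiveType) as _detectOutput does."""
	filenameNoExt, fext = splitext(filename)
	fext = fext.lower()
	if fext in (".gz", ".bz2", ".zip"):
		return filenameNoExt, fext[1:]
	return filename, ""


print(splitArchiveSuffix("out.txt.gz"))  # -> ('out.txt', 'gz')
print(splitArchiveSuffix("out.txt"))     # -> ('out.txt', '')
```

The remaining extension (`.txt` above) is then matched against the format registry, and `write()` later hands the `archiveType` to `archiveOutDir`.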
filename = abspath(filename) log.info("Writing to file \"%s\"" % filename) try: self.writeFunctions[format].__call__(self, filename, **options) except Exception: log.exception("Exception while calling plugin\'s write function") return finally: self.clear() return filename def archiveOutDir(self, filename, archiveType): """ filename is the existing file path archiveType is the archive extention (without dot): "gz", "bz2", "zip" """ try: os.remove("%s.%s" % (filename, archiveType)) except OSError: pass if archiveType == "gz": output, error = subprocess.Popen( ["gzip", filename], stdout=subprocess.PIPE, ).communicate() if error: log.error( error + "\n" + "Failed to compress file \"%s\"" % filename ) elif archiveType == "bz2": output, error = subprocess.Popen( ["bzip2", filename], stdout=subprocess.PIPE, ).communicate() if error: log.error( error + "\n" + "Failed to compress file \"%s\"" % filename ) elif archiveType == "zip": dirn, name = split(filename) with indir(dirn): output, error = subprocess.Popen( ["zip", filename+".zip", name, "-m"], stdout=subprocess.PIPE, ).communicate() if error: log.error( error + "\n" + "Failed to compress file \"%s\"" % filename ) archiveFilename = "%s.%s" % (filename, archiveType) if isfile(archiveFilename): return archiveFilename else: return filename def convert( self, inputFilename, inputFormat="", direct=None, progressbar=True, outputFilename="", outputFormat="", sort=None, sortKey=None, sortCacheSize=1000, readOptions=None, writeOptions=None, ): """ returns absolute path of output file, or None if failed """ if not readOptions: readOptions = {} if not writeOptions: writeOptions = {} outputArgs = self._detectOutput( filename=outputFilename, format=outputFormat, ) if not outputArgs: log.error("Writing file \"%s\" failed." 
% outputFilename) return outputFilename, outputFormat, archiveType = outputArgs if direct is None: if sort is not True: direct = True # FIXME tm0 = now() if not self.read( inputFilename, format=inputFormat, direct=direct, progressbar=progressbar, **readOptions ): return log.info("") finalOutputFile = self.write( outputFilename, outputFormat, sort=sort, sortKey=sortKey, sortCacheSize=sortCacheSize, **writeOptions ) log.info("") if not finalOutputFile: log.error("Writing file \"%s\" failed." % outputFilename) return if archiveType: finalOutputFile = self.archiveOutDir(finalOutputFile, archiveType) log.info("Writing file \"%s\" done." % finalOutputFile) log.info("Running time of convert: %.1f seconds" % (now() - tm0)) return finalOutputFile # ________________________________________________________________________# def writeTxt( self, sep1, sep2, filename="", writeInfo=True, rplList=None, ext=".txt", head="", iterEntries=None, entryFilterFunc=None, outInfoKeysAliasDict=None, encoding="utf-8", newline="\n", resources=True, ): if rplList is None: rplList = [] if not filename: filename = self._filename + ext if not outInfoKeysAliasDict: outInfoKeysAliasDict = {} fp = open(filename, "w", encoding=encoding, newline=newline) fp.write(head) if writeInfo: for key, desc in self._info.items(): try: key = outInfoKeysAliasDict[key] except KeyError: pass for rpl in rplList: desc = desc.replace(rpl[0], rpl[1]) fp.write("##" + key + sep1 + desc + sep2) fp.flush() myResDir = filename + "_res" if not isdir(myResDir): os.mkdir(myResDir) if not iterEntries: iterEntries = self for entry in iterEntries: if entry.isData(): if resources: entry.save(myResDir) if entryFilterFunc: entry = entryFilterFunc(entry) if not entry: continue word = entry.getWord() defi = entry.getDefi() if word.startswith("#"): # FIXME continue # if self.getPref("enable_alts", True): # FIXME for rpl in rplList: defi = defi.replace(rpl[0], rpl[1]) fp.write(word + sep1 + defi + sep2) fp.close() if not 
os.listdir(myResDir): os.rmdir(myResDir) return True def writeTabfile(self, filename="", **kwargs): self.writeTxt( "\t", "\n", filename=filename, rplList=( ("\\", "\\\\"), ("\n", "\\n"), ("\t", "\\t"), ), ext=".txt", **kwargs ) def writeDict(self, filename="", writeInfo=False): # Used in "/usr/share/dict/" for some dictionarys such as "ding" self.writeTxt( " :: ", "\n", filename, writeInfo, ( ("\n", "\\n"), ), ".dict", ) def iterSqlLines( self, filename="", infoKeys=None, addExtraInfo=True, newline="\\n", transaction=False, ): newline = "
" infoDefLine = "CREATE TABLE dbinfo (" infoValues = [] if not infoKeys: infoKeys = [ "dbname", "author", "version", "direction", "origLang", "destLang", "license", "category", "description", ] for key in infoKeys: value = self.getInfo(key) value = value.replace("\'", "\'\'")\ .replace("\x00", "")\ .replace("\r", "")\ .replace("\n", newline) infoValues.append("\'" + value + "\'") infoDefLine += "%s char(%d), " % (key, len(value)) infoDefLine = infoDefLine[:-2] + ");" yield infoDefLine if addExtraInfo: yield ( "CREATE TABLE dbinfo_extra (" + "\'id\' INTEGER PRIMARY KEY NOT NULL, " + "\'name\' TEXT UNIQUE, \'value\' TEXT);" ) yield ( "CREATE TABLE word (\'id\' INTEGER PRIMARY KEY NOT NULL, " + "\'w\' TEXT, \'m\' TEXT);" ) if transaction: yield "BEGIN TRANSACTION;" yield "INSERT INTO dbinfo VALUES(%s);" % (",".join(infoValues)) if addExtraInfo: extraInfo = self.getExtraInfos(infoKeys) for index, (key, value) in enumerate(extraInfo.items()): yield ( "INSERT INTO dbinfo_extra " + "VALUES(%d, \'%s\', \'%s\');" % ( index + 1, key.replace("\'", "\'\'"), value.replace("\'", "\'\'"), ) ) for i, entry in enumerate(self): if entry.isData(): # FIXME continue word = entry.getWord() defi = entry.getDefi() word = word.replace("\'", "\'\'")\ .replace("\r", "").replace("\n", newline) defi = defi.replace("\'", "\'\'")\ .replace("\r", "").replace("\n", newline) yield "INSERT INTO word VALUES(%d, \'%s\', \'%s\');" % ( i+1, word, defi, ) if transaction: yield "END TRANSACTION;" yield "CREATE INDEX ix_word_w ON word(w COLLATE NOCASE);" # ________________________________________________________________________# def takeOutputWords(self, minWordLen=3): wordPattern = re.compile("[\w]{%d,}" % minWordLen, re.U) words = set() progressbar, self._progressbar = self._progressbar, False for entry in self: words.update(re.findall( wordPattern, entry.getDefi(), )) self._progressbar = progressbar return sorted(words) # ________________________________________________________________________# def 
progressInit(self, *args): if self.ui: self.ui.progressInit(*args) def progress(self, wordI, wordCount): if self.ui and wordI % (wordCount//500 + 1) == 0: self.ui.progress( min(wordI + 1, wordCount) / wordCount, "%d / %d completed" % (wordI, wordCount), ) def progressEnd(self): if self.ui: self.ui.progressEnd() # ________________________________________________________________________# def searchWordInDef( self, st, matchWord=True, sepChars=".,،", maxNum=100, minRel=0.0, minWordLen=3, includeDefs=False, showRel="Percent", ): # searches word "st" in definitions of the glossary splitPattern = re.compile( "|".join([re.escape(x) for x in sepChars]), re.U, ) wordPattern = re.compile("[\w]{%d,}" % minWordLen, re.U) outRel = [] for item in self._data: words, defi = item[:2] if isinstance(words, str): words = [words] if isinstance(defi, list): defi = "\n".join(defi) if st not in defi: continue for word in words: rel = 0 # relation value of word (0 <= rel <= 1) for part in re.split(splitPattern, defi): if not part: continue if matchWord: partWords = re.findall( wordPattern, part, ) if not partWords: continue rel = max( rel, partWords.count(st) / len(partWords) ) else: rel = max( rel, part.count(st) * len(st) / len(part) ) if rel <= minRel: continue if includeDefs: outRel.append((word, rel, defi)) else: outRel.append((word, rel)) outRel.sort( key=lambda x: x[1], reverse=True, ) n = len(outRel) if n > maxNum > 0: outRel = outRel[:maxNum] n = maxNum num = 0 out = [] if includeDefs: for j in range(n): numP = num w, num, m = outRel[j] m = m.replace("\n", "\\n").replace("\t", "\\t") onePer = int(1.0/num) if onePer == 1.0: out.append("%s\\n%s" % (w, m)) elif showRel == "Percent": out.append("%s(%%%d)\\n%s" % (w, 100*num, m)) elif showRel == "Percent At First": if num == numP: out.append("%s\\n%s" % (w, m)) else: out.append("%s(%%%d)\\n%s" % (w, 100*num, m)) else: out.append("%s\\n%s" % (w, m)) return out for j in range(n): numP = num w, num = outRel[j] onePer = int(1.0/num) if 
onePer == 1.0: out.append(w) elif showRel == "Percent": out.append("%s(%%%d)" % (w, 100*num)) elif showRel == "Percent At First": if num == numP: out.append(w) else: out.append("%s(%%%d)" % (w, 100*num)) else: out.append(w) return out def reverse( self, savePath="", words=None, includeDefs=False, reportStep=300, saveStep=1000, # set this to zero to disable auto saving **kwargs ): """ This is a generator Usage: for wordIndex in glos.reverse(...): pass Inside the `for` loop, you can pause by waiting (for input or a flag) or stop by breaking Potential keyword arguments: words = None ## None, or list reportStep = 300 saveStep = 1000 savePath = "" matchWord = True sepChars = ".,،" maxNum = 100 minRel = 0.0 minWordLen = 3 includeDefs = False showRel = "None" allowed values: "None", "Percent", "Percent At First" """ if not savePath: savePath = self.getInfo("name") + ".txt" if saveStep < 2: raise ValueError("saveStep must be more than 1") ui = self.ui if words: words = list(words) else: words = self.takeOutputWords() wordCount = len(words) log.info( "Reversing to file \"%s\"" % savePath + ", number of words: %s" % wordCount ) self.progressInit("Reversing") with open(savePath, "w") as saveFile: for wordI in range(wordCount): word = words[wordI] self.progress(wordI, wordCount) if wordI % saveStep == 0 and wordI > 0: saveFile.flush() result = self.searchWordInDef( word, includeDefs=includeDefs, **kwargs ) if result: try: if includeDefs: defi = "\\n\\n".join(result) else: defi = ", ".join(result) + "." 
					except Exception:
						log.exception("")
						log.pretty(result, "result = ")
						return
					saveFile.write("%s\t%s\n" % (word, defi))
				yield wordI
		self.progressEnd()
		yield wordCount


Glossary.loadPlugins(join(dirname(__file__), "plugins"))
Glossary.loadPlugins(userPluginsDir)

pyglossary-3.2.1/pyglossary/gregorian.py

# -*- coding: utf-8 -*-
#
# Copyright (C) Saeed Rasooli
# Copyright (C) 2007 Mehdi Bayazee
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program. If not, see <http://www.gnu.org/licenses/>.
# Also available in /usr/share/common-licenses/GPL on Debian systems
# or /usr/share/licenses/common/GPL3/license.txt on ArchLinux

# Gregorian calendar:
# http://en.wikipedia.org/wiki/Gregorian_calendar

from datetime import datetime

name = 'gregorian'
desc = 'Gregorian'
origLang = 'en'

monthName = (
	'January', 'February', 'March', 'April', 'May', 'June',
	'July', 'August', 'September', 'October', 'November', 'December',
)

monthNameAb = (
	'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
	'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec',
)


def getMonthName(m, y=None):
	return monthName.__getitem__(m - 1)


def getMonthNameAb(m, y=None):
	return monthNameAb.__getitem__(m - 1)


def getMonthsInYear(y):
	return 12


epoch = 1721426
minMonthLen = 29
maxMonthLen = 31
avgYearLen = 365.2425  # FIXME

options = ()


def save():
	pass


def isLeap(y):
	if y < 1:
		y += 1
	return y % 4 == 0 and not (y % 100 == 0 and y % 400 != 0)


def to_jd(year, month, day):
	if year > 0:  # > 1.5x faster
		return datetime(year, month, day).toordinal() + 1721425
	# Python 2.x and 3.x:
	if month <= 2:
		tm = 0
	elif isLeap(year):
		tm = -1
	else:
		tm = -2
	# Python >= 2.5:
	# tm = 0 if month <= 2 else (-1 if isLeap(year) else -2)
	return epoch - 1 + 365*(year-1) + (year-1)//4 - (year-1)//100 + \
		(year-1)//400 + (367*month-362)//12 + tm + day


def jd_to(jd):
	ordinal = int(jd) - 1721425
	if ordinal > 0:  # > 4x faster
		dt = datetime.fromordinal(ordinal)
		return (dt.year, dt.month, dt.day)
	# wjd = floor(jd - 0.5) + 0.5
	qc, dqc = divmod(jd - epoch, 146097)  # qc ~~ quadricent
	cent, dcent = divmod(dqc, 36524)
	quad, dquad = divmod(dcent, 1461)
	yindex = dquad//365  # divmod(dquad, 365)[0]
	year = qc*400 + cent*100 + quad*4 + yindex + (cent != 4 and yindex != 4)
	yearday = jd - to_jd(year, 1, 1)
	# Python 2.x and 3.x:
	if jd < to_jd(year, 3, 1):
		leapadj = 0
	elif isLeap(year):
		leapadj = 1
	else:
		leapadj = 2
	# Python >= 2.5:
	# leapadj = 0 if jd < to_jd(year, 3, 1) else (1 if isLeap(year) else 2)
	month = ((yearday+leapadj) * 12 + 373) // 367
	day = jd - to_jd(year, month, 1) +
1 return int(year), int(month), int(day) def getMonthLen(y, m): if m == 12: return to_jd(y+1, 1, 1) - to_jd(y, 12, 1) else: return to_jd(y, m+1, 1) - to_jd(y, m, 1) J0001 = to_jd(1, 1, 1) J1970 = to_jd(1970, 1, 1) pyglossary-3.2.1/pyglossary/html_utils.py0000644000175000017500000001442513577304507021202 0ustar emfoxemfox00000000000000import logging log = logging.getLogger('root') def toStr(s): return str(s, 'utf8') if isinstance(s, bytes) else str(s) html_entity2str = { "ldash": "–", "acirc": "â", "ecirc": "ê", "icirc": "î", "ocirc": "ô", "ucirc": "û", "ycirc": "ŷ", "uring": "ů", "wring": "ẘ", "yring": "ẙ", "agrave": "à", "egrave": "è", "igrave": "ì", "ograve": "ò", "ugrave": "ù", "ygrave": "ỳ", "atilde": "ã", "etilde": "ẽ", "itilde": "ĩ", "otilde": "õ", "utilde": "ũ", "ytilde": "ỹ", "auml": "ӓ", "euml": "ë", "iuml": "ï", "ouml": "ö", "uuml": "ü", "yuml": "ÿ", "ccedil": "ç", "aelig": "æ", "eth": "ð", "pound": "£", "deg": "°", "divide": "÷", "frac12": "½", "frac13": "⅓", "frac14": "¼", "frac23": "⅔", "frac34": "¾", "xfrac13": "⅓", "hearts": "♥", "diams": "♦", "spades": "♠", "clubs": "♣" } # Use build_name2codepoint_dict function to update this dictionary name2codepoint = { "aring": 0x00c5, # Å "gt": 0x003e, # > "sup": 0x2283, # ⊃ "ge": 0x2265, # ≥ "upsih": 0x03d2, # ϒ "asymp": 0x2248, # ≈ "radic": 0x221a, # √ "otimes": 0x2297, # ⊗ "aelig": 0x00c6, # Æ "sigmaf": 0x03c2, # ς "lrm": 0x200e, # ‎ "cedil": 0x00b8, # ¸ "kappa": 0x03ba, # κ "wring": 0x1e98, # ẘ "prime": 0x2032, # ′ "lceil": 0x2308, # ⌈ "iquest": 0x00bf, # ¿ "shy": 0x00ad, # ­ "sdot": 0x22c5, # ⋅ "lfloor": 0x230a, # ⌊ "brvbar": 0x00a6, # ¦ "egrave": 0x00c8, # È "sub": 0x2282, # ⊂ "iexcl": 0x00a1, # ¡ "ordf": 0x00aa, # ª "sum": 0x2211, # ∑ "ntilde": 0x00f1, # ñ "atilde": 0x00e3, # ã "theta": 0x03b8, # θ "equiv": 0x2261, # ≡ "nsub": 0x2284, # ⊄ "omicron": 0x039f, # Ο "yuml": 0x0178, # Ÿ "thinsp": 0x2009, #   "ecirc": 0x00ca, # Ê "bdquo": 0x201e, # „ "frac23": 0x2154, # ⅔ "emsp": 0x2003, #   "permil": 0x2030, # 
‰ "eta": 0x0397, # Η "forall": 0x2200, # ∀ "eth": 0x00d0, # Ð "rceil": 0x2309, # ⌉ "ldash": 0x2013, # – "divide": 0x00f7, # ÷ "igrave": 0x00cc, # Ì "pound": 0x00a3, # £ "frasl": 0x2044, # ⁄ "zeta": 0x03b6, # ζ "lowast": 0x2217, # ∗ "chi": 0x03a7, # Χ "cent": 0x00a2, # ¢ "perp": 0x22a5, # ⊥ "there4": 0x2234, # ∴ "pi": 0x03c0, # π "empty": 0x2205, # ∅ "euml": 0x00cb, # Ë "notin": 0x2209, # ∉ "uuml": 0x00fc, # ü "icirc": 0x00ee, # î "bull": 0x2022, # • "upsilon": 0x03a5, # Υ "ensp": 0x2002, #   "ccedil": 0x00c7, # Ç "cap": 0x2229, # ∩ "mu": 0x03bc, # μ "deg": 0x00b0, # ° "tau": 0x03c4, # τ "nabla": 0x2207, # ∇ "ucirc": 0x00db, # Û "ugrave": 0x00f9, # ù "cong": 0x2245, # ≅ "quot": 0x0022, # " "uacute": 0x00da, # Ú "acirc": 0x00c2, #  "sim": 0x223c, # ∼ "phi": 0x03a6, # Φ "diams": 0x2666, # ♦ "minus": 0x2212, # − "euro": 0x20ac, # € "thetasym": 0x03d1, # ϑ "iuml": 0x00cf, # Ï "sect": 0x00a7, # § "ldquo": 0x201c, # “ "hearts": 0x2665, # ♥ "oacute": 0x00f3, # ó "zwnj": 0x200c, # ‌ "yen": 0x00a5, # ¥ "ograve": 0x00d2, # Ò "uring": 0x016f, # ů "trade": 0x2122, # ™ "nbsp": 0x00a0, #   "tilde": 0x02dc, # ˜ "itilde": 0x0129, # ĩ "oelig": 0x0153, # œ "xfrac13": 0x2153, # ⅓ "le": 0x2264, # ≤ "auml": 0x00e4, # ä "cup": 0x222a, # ∪ "otilde": 0x00f5, # õ "lt": 0x003c, # < "ndash": 0x2013, # – "sbquo": 0x201a, # ‚ "real": 0x211c, # ℜ "psi": 0x03c8, # ψ "rsaquo": 0x203a, # › "darr": 0x2193, # ↓ "not": 0x00ac, # ¬ "amp": 0x0026, # & "oslash": 0x00f8, # ø "acute": 0x00b4, # ´ "zwj": 0x200d, # ‍ "alefsym": 0x2135, # ℵ "sup3": 0x00b3, # ³ "rdquo": 0x201d, # ” "laquo": 0x00ab, # « "micro": 0x00b5, # µ "ygrave": 0x1ef3, # ỳ "szlig": 0x00df, # ß "clubs": 0x2663, # ♣ "agrave": 0x00e0, # à "harr": 0x2194, # ↔ "frac14": 0x00bc, # ¼ "frac13": 0x2153, # ⅓ "frac12": 0x00bd, # ½ "utilde": 0x0169, # ũ "prop": 0x221d, # ∝ "circ": 0x02c6, # ˆ "ocirc": 0x00f4, # ô "uml": 0x00a8, # ¨ "prod": 0x220f, # ∏ "reg": 0x00ae, # ® "rlm": 0x200f, # ‏ "ycirc": 0x0177, # ŷ "infin": 0x221e, # ∞ "etilde": 0x1ebd, # 
ẽ "mdash": 0x2014, # — "uarr": 0x21d1, # ⇑ "times": 0x00d7, # × "rarr": 0x21d2, # ⇒ "yring": 0x1e99, # ẙ "or": 0x2228, # ∨ "gamma": 0x03b3, # γ "lambda": 0x03bb, # λ "rang": 0x232a, # 〉 "xi": 0x039e, # Ξ "dagger": 0x2021, # ‡ "image": 0x2111, # ℑ "hellip": 0x2026, # … "sube": 0x2286, # ⊆ "alpha": 0x03b1, # α "plusmn": 0x00b1, # ± "sup1": 0x00b9, # ¹ "sup2": 0x00b2, # ² "frac34": 0x00be, # ¾ "oline": 0x203e, # ‾ "loz": 0x25ca, # ◊ "iota": 0x03b9, # ι "iacute": 0x00cd, # Í "para": 0x00b6, # ¶ "ordm": 0x00ba, # º "epsilon": 0x03b5, # ε "weierp": 0x2118, # ℘ "part": 0x2202, # ∂ "delta": 0x03b4, # δ "copy": 0x00a9, # © "scaron": 0x0161, # š "lsquo": 0x2018, # ‘ "isin": 0x2208, # ∈ "supe": 0x2287, # ⊇ "and": 0x2227, # ∧ "ang": 0x2220, # ∠ "curren": 0x00a4, # ¤ "int": 0x222b, # ∫ "rfloor": 0x230b, # ⌋ "crarr": 0x21b5, # ↵ "exist": 0x2203, # ∃ "oplus": 0x2295, # ⊕ "piv": 0x03d6, # ϖ "ni": 0x220b, # ∋ "ne": 0x2260, # ≠ "lsaquo": 0x2039, # ‹ "yacute": 0x00fd, # ý "nu": 0x03bd, # ν "macr": 0x00af, # ¯ "larr": 0x2190, # ← "aacute": 0x00e1, # á "beta": 0x03b2, # β "fnof": 0x0192, # ƒ "rho": 0x03c1, # ρ "eacute": 0x00e9, # é "omega": 0x03c9, # ω "middot": 0x00b7, # · "lang": 0x2329, # 〈 "spades": 0x2660, # ♠ "rsquo": 0x2019, # ’ "thorn": 0x00fe, # þ "ouml": 0x00f6, # ö "raquo": 0x00bb, # » "sigma": 0x03c3, # σ "ytilde": 0x1ef9, # ỹ } def build_name2codepoint_dict(): """ Builds name to codepoint dictionary copy and paste the output to the name2codepoint dictionary name2str - name to utf-8 string dictionary """ import html.entities name2str = html_entity2str for k, v in html.entities.name2codepoint.items(): name2str[k.lower()] = chr(v).encode("utf-8") for key in sorted(name2str.keys()): value = toStr(name2str[key]) if len(value) > 1: raise ValueError("value = %r" % value) log.info(" \"{0}\": 0x{1:0>4x}, # {2}".format( key, ord(value), value, )) if __name__ == "__main__": build_name2codepoint_dict() 
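The name2codepoint table maps entity names to Unicode code points, so a decoder built on it only needs a regex over `&name;` tokens. A minimal sketch of that idea (the subset table and the `unescape_entities` helper here are illustrative, not part of html_utils.py):

```python
import re

# Illustrative subset of the name2codepoint table above; the helper
# below is a hypothetical sketch, not a pyglossary API.
name2codepoint = {
	"amp": 0x0026,
	"hearts": 0x2665,
	"frac12": 0x00bd,
	"ldash": 0x2013,
}

_entity_re = re.compile(r"&(\w+);")

def unescape_entities(text):
	# Replace known named entities with their characters;
	# leave unrecognized entities untouched.
	def _sub(m):
		cp = name2codepoint.get(m.group(1).lower())
		return chr(cp) if cp is not None else m.group(0)
	return _entity_re.sub(_sub, text)

print(unescape_entities("A &hearts; B"))       # A ♥ B
print(unescape_entities("&frac12; &nosuch;"))  # ½ &nosuch;
```

The parallel html_entity2str dictionary serves the same purpose with ready-made replacement strings instead of code points.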
pyglossary-3.2.1/pyglossary/json_utils.py

import sys

try:
	import json
except ImportError:
	import simplejson as json

from collections import OrderedDict


def dataToPrettyJson(data, ensure_ascii=False, sort_keys=False):
	return json.dumps(
		data,
		sort_keys=sort_keys,
		indent=2,
		ensure_ascii=ensure_ascii,
	)


def dataToCompactJson(data, ensure_ascii=False, sort_keys=False):
	return json.dumps(
		data,
		sort_keys=sort_keys,
		separators=(',', ':'),
		ensure_ascii=ensure_ascii,
	)


jsonToData = json.loads


def jsonToOrderedData(text):
	return json.JSONDecoder(
		object_pairs_hook=OrderedDict,
	).decode(text)


###############################


def loadJsonConf(module, confPath, decoders={}):
	from os.path import isfile
	###
	if not isfile(confPath):
		return
	###
	try:
		text = open(confPath).read()
	except Exception as e:
		print('failed to read file "%s": %s' % (confPath, e))
		return
	###
	try:
		data = json.loads(text)
	except Exception as e:
		print('invalid json file "%s": %s' % (confPath, e))
		return
	###
	if isinstance(module, str):
		module = sys.modules[module]
	for param, value in data.items():
		try:
			decoder = decoders[param]
		except KeyError:
			pass
		else:
			value = decoder(value)
		setattr(module, param, value)


def saveJsonConf(module, confPath, params, encoders={}):
	if isinstance(module, str):
		module = sys.modules[module]
	###
	data = OrderedDict()
	for param in params:
		value = getattr(module, param)
		try:
			encoder = encoders[param]
		except KeyError:
			pass
		else:
			value = encoder(value)
		data[param] = value
	###
	text = dataToPrettyJson(data)
	try:
		open(confPath, 'w').write(text)
	except Exception as e:
		print('failed to save file "%s": %s' % (confPath, e))
		return


def loadModuleJsonConf(module):
	if isinstance(module, str):
		module = sys.modules[module]
	###
	decoders = getattr(module, 'confDecoders', {})
	###
	try:
		sysConfPath = module.sysConfPath
	except AttributeError:
		pass
	else:
		loadJsonConf(
			module,
			sysConfPath,
			decoders,
		)
	####
	loadJsonConf(
		module,
		module.confPath,
		decoders,
	)
	# should use module.confParams to restrict json keys? FIXME


def saveModuleJsonConf(module):
	if isinstance(module, str):
		module = sys.modules[module]
	###
	saveJsonConf(
		module,
		module.confPath,
		module.confParams,
		getattr(module, 'confEncoders', {}),
	)


pyglossary-3.2.1/pyglossary/math_utils.py

def chBaseIntToList(number, base):
	result = []
	if number < 0:
		raise ValueError('number must be a positive integer')
	while True:
		number, rdigit = divmod(number, base)
		result = [rdigit] + result
		if number == 0:
			return result


pyglossary-3.2.1/pyglossary/os_utils.py

import os
import shutil


class indir(object):
	"""
	mkdir + chdir shortcut to use with `with` statement.

	>>> print(os.getcwd())  # -> "~/projects"
	>>> with indir('my_directory', create=True):
	>>>     print(os.getcwd())  # -> "~/projects/my_directory"
	>>>     # do some work inside new 'my_directory'...
	>>> print(os.getcwd())  # -> "~/projects"
	>>> # automatically return to previous directory.
""" def __init__(self, directory, create=False, clear=False): self.oldpwd = None self.dir = directory self.create = create self.clear = clear def __enter__(self): self.oldpwd = os.getcwd() if os.path.exists(self.dir): if self.clear: shutil.rmtree(self.dir) os.makedirs(self.dir) elif self.create: os.makedirs(self.dir) os.chdir(self.dir) def __exit__(self, exc_type, exc_val, exc_tb): os.chdir(self.oldpwd) self.oldpwd = None def my_url_show(link): import subprocess for path in ( '/usr/bin/gnome-www-browser', '/usr/bin/firefox', '/usr/bin/iceweasel', '/usr/bin/konqueror', ): if os.path.isfile(path): subprocess.call([path, link]) break """ try: from gnome import url_show except: try: from gnomevfs import url_show except: url_show = my_url_show """ def click_website(widget, link): my_url_show(link) pyglossary-3.2.1/pyglossary/persian_utils.py0000644000175000017500000000032213577304507021666 0ustar emfoxemfox00000000000000from .text_utils import replacePostSpaceChar def faEditStr(st): return replacePostSpaceChar( st.replace('ي', 'ی') .replace('ك', 'ک') .replace('ۂ', 'هٔ') .replace('ہ', 'ه'), '،', ) pyglossary-3.2.1/pyglossary/plugin_lib/0000755000175000017500000000000013577304644020564 5ustar emfoxemfox00000000000000pyglossary-3.2.1/pyglossary/plugin_lib/__init__.py0000644000175000017500000000000013575553425022664 0ustar emfoxemfox00000000000000pyglossary-3.2.1/pyglossary/plugin_lib/pureSalsa20.py0000644000175000017500000003107613575553425023247 0ustar emfoxemfox00000000000000#!/usr/bin/env python # coding: utf-8 """ Copyright by https://github.com/zhansliu/writemdict pureSalsa20.py -- a pure Python implementation of the Salsa20 cipher, ported to Python 3 v4.0: Added Python 3 support, dropped support for Python <= 2.5. // zhansliu Original comments below. ==================================================================== There are comments here by two authors about three pieces of software: comments by Larry Bugbee about Salsa20, the stream cipher by Daniel J. 
Bernstein (including comments about the speed of the C version) and pySalsa20, Bugbee's own Python wrapper for salsa20.c (including some references), and comments by Steve Witham about pureSalsa20, Witham's pure Python 2.5 implementation of Salsa20, which follows pySalsa20's API, and is in this file. Salsa20: a Fast Streaming Cipher (comments by Larry Bugbee) ----------------------------------------------------------- Salsa20 is a fast stream cipher written by Daniel Bernstein that basically uses a hash function and XOR making for fast encryption. (Decryption uses the same function.) Salsa20 is simple and quick. Some Salsa20 parameter values... design strength 128 bits key length 128 or 256 bits, exactly IV, aka nonce 64 bits, always chunk size must be in multiples of 64 bytes Salsa20 has two reduced versions, 8 and 12 rounds each. One benchmark (10 MB): 1.5GHz PPC G4 102/97/89 MB/sec for 8/12/20 rounds AMD Athlon 2500+ 77/67/53 MB/sec for 8/12/20 rounds (no I/O and before Python GC kicks in) Salsa20 is a Phase 3 finalist in the EU eSTREAM competition and appears to be one of the fastest ciphers. It is well documented so I will not attempt any injustice here. Please see "References" below. ...and Salsa20 is "free for any use". pySalsa20: a Python wrapper for Salsa20 (Comments by Larry Bugbee) ------------------------------------------------------------------ pySalsa20.py is a simple ctypes Python wrapper. Salsa20 is as it's name implies, 20 rounds, but there are two reduced versions, 8 and 12 rounds each. Because the APIs are identical, pySalsa20 is capable of wrapping all three versions (number of rounds hardcoded), including a special version that allows you to set the number of rounds with a set_rounds() function. Compile the version of your choice as a shared library (not as a Python extension), name and install it as libsalsa20.so. 
Sample usage: from pySalsa20 import Salsa20 s20 = Salsa20(key, IV) dataout = s20.encryptBytes(datain) # same for decrypt This is EXPERIMENTAL software and intended for educational purposes only. To make experimentation less cumbersome, pySalsa20 is also free for any use. THIS PROGRAM IS PROVIDED WITHOUT WARRANTY OR GUARANTEE OF ANY KIND. USE AT YOUR OWN RISK. Enjoy, Larry Bugbee bugbee@seanet.com April 2007 References: ----------- http://en.wikipedia.org/wiki/Salsa20 http://en.wikipedia.org/wiki/Daniel_Bernstein http://cr.yp.to/djb.html http://www.ecrypt.eu.org/stream/salsa20p3.html http://www.ecrypt.eu.org/stream/p3ciphers/salsa20/salsa20_p3source.zip Prerequisites for pySalsa20: ---------------------------- - Python 2.5 (haven't tested in 2.4) pureSalsa20: Salsa20 in pure Python 2.5 (comments by Steve Witham) ------------------------------------------------------------------ pureSalsa20 is the stand-alone Python code in this file. It implements the underlying Salsa20 core algorithm and emulates pySalsa20's Salsa20 class API (minus a bug(*)). pureSalsa20 is MUCH slower than libsalsa20.so wrapped with pySalsa20-- about 1/1000 the speed for Salsa20/20 and 1/500 the speed for Salsa20/8, when encrypting 64k-byte blocks on my computer. pureSalsa20 is for cases where portability is much more important than speed. I wrote it for use in a "structured" random number generator. There are comments about the reasons for this slowness in http://www.tiac.net/~sw/2010/02/PureSalsa20 Sample usage: from pureSalsa20 import Salsa20 s20 = Salsa20(key, IV) dataout = s20.encryptBytes(datain) # same for decrypt I took the test code from pySalsa20, added a bunch of tests including rough speed tests, and moved them into the file testSalsa20.py. To test both pySalsa20 and pureSalsa20, type python testSalsa20.py (*)The bug (?) in pySalsa20 is this. The rounds variable is global to the libsalsa20.so library and not switched when switching between instances of the Salsa20 class. 
s1 = Salsa20( key, IV, 20 ) s2 = Salsa20( key, IV, 8 ) In this example, with pySalsa20, both s1 and s2 will do 8 rounds of encryption. with pureSalsa20, s1 will do 20 rounds and s2 will do 8 rounds. Perhaps giving each instance its own nRounds variable, which is passed to the salsa20wordtobyte() function, is insecure. I'm not a cryptographer. pureSalsa20.py and testSalsa20.py are EXPERIMENTAL software and intended for educational purposes only. To make experimentation less cumbersome, pureSalsa20.py and testSalsa20.py are free for any use. Revisions: ---------- p3.2 Fixed bug that initialized the output buffer with plaintext! Saner ramping of nreps in speed test. Minor changes and print statements. p3.1 Took timing variability out of add32() and rot32(). Made the internals more like pySalsa20/libsalsa . Put the semicolons back in the main loop! In encryptBytes(), modify a byte array instead of appending. Fixed speed calculation bug. Used subclasses instead of patches in testSalsa20.py . Added 64k-byte messages to speed test to be fair to pySalsa20. p3 First version, intended to parallel pySalsa20 version 3. More references: ---------------- http://www.seanet.com/~bugbee/crypto/salsa20/ [pySalsa20] http://cr.yp.to/snuffle.html [The original name of Salsa20] http://cr.yp.to/snuffle/salsafamily-20071225.pdf [ Salsa20 design] http://www.tiac.net/~sw/2010/02/PureSalsa20 THIS PROGRAM IS PROVIDED WITHOUT WARRANTY OR GUARANTEE OF ANY KIND. USE AT YOUR OWN RISK. 
Cheers, Steve Witham sw at remove-this tiac dot net February, 2010 """ import sys assert(sys.version_info >= (2, 6)) if sys.version_info >= (3,): integer_types = (int,) python3 = True else: integer_types = (int, long) python3 = False from struct import Struct little_u64 = Struct( "= 2**64" ctx = self.ctx ctx[ 8],ctx[ 9] = little2_i32.unpack( little_u64.pack( counter ) ) def getCounter( self ): return little_u64.unpack( little2_i32.pack( *self.ctx[ 8:10 ] ) ) [0] def setRounds(self, rounds, testing=False ): assert testing or rounds in [8, 12, 20], 'rounds must be 8, 12, 20' self.rounds = rounds def encryptBytes(self, data): assert type(data) == bytes, 'data must be byte string' assert self._lastChunk64, 'previous chunk not multiple of 64 bytes' lendata = len(data) munged = bytearray(lendata) for i in range( 0, lendata, 64 ): h = salsa20_wordtobyte( self.ctx, self.rounds, checkRounds=False ) self.setCounter( ( self.getCounter() + 1 ) % 2**64 ) # Stopping at 2^70 bytes per nonce is user's responsibility. for j in range( min( 64, lendata - i ) ): if python3: munged[ i+j ] = data[ i+j ] ^ h[j] else: munged[ i+j ] = ord(data[ i+j ]) ^ ord(h[j]) self._lastChunk64 = not lendata % 64 return bytes(munged) decryptBytes = encryptBytes # encrypt and decrypt use same function #-------------------------------------------------------------------------- def salsa20_wordtobyte( input, nRounds=20, checkRounds=True ): """ Do nRounds Salsa20 rounds on a copy of input: list or tuple of 16 ints treated as little-endian unsigneds. Returns a 64-byte string. """ assert( type(input) in ( list, tuple ) and len(input) == 16 ) assert( not(checkRounds) or ( nRounds in [ 8, 12, 20 ] ) ) x = list( input ) def XOR( a, b ): return a ^ b ROTATE = rot32 PLUS = add32 for i in range( nRounds // 2 ): # These ...XOR...ROTATE...PLUS... 
lines are from ecrypt-linux.c # unchanged except for indents and the blank line between rounds: x[ 4] = XOR(x[ 4],ROTATE(PLUS(x[ 0],x[12]), 7)); x[ 8] = XOR(x[ 8],ROTATE(PLUS(x[ 4],x[ 0]), 9)); x[12] = XOR(x[12],ROTATE(PLUS(x[ 8],x[ 4]),13)); x[ 0] = XOR(x[ 0],ROTATE(PLUS(x[12],x[ 8]),18)); x[ 9] = XOR(x[ 9],ROTATE(PLUS(x[ 5],x[ 1]), 7)); x[13] = XOR(x[13],ROTATE(PLUS(x[ 9],x[ 5]), 9)); x[ 1] = XOR(x[ 1],ROTATE(PLUS(x[13],x[ 9]),13)); x[ 5] = XOR(x[ 5],ROTATE(PLUS(x[ 1],x[13]),18)); x[14] = XOR(x[14],ROTATE(PLUS(x[10],x[ 6]), 7)); x[ 2] = XOR(x[ 2],ROTATE(PLUS(x[14],x[10]), 9)); x[ 6] = XOR(x[ 6],ROTATE(PLUS(x[ 2],x[14]),13)); x[10] = XOR(x[10],ROTATE(PLUS(x[ 6],x[ 2]),18)); x[ 3] = XOR(x[ 3],ROTATE(PLUS(x[15],x[11]), 7)); x[ 7] = XOR(x[ 7],ROTATE(PLUS(x[ 3],x[15]), 9)); x[11] = XOR(x[11],ROTATE(PLUS(x[ 7],x[ 3]),13)); x[15] = XOR(x[15],ROTATE(PLUS(x[11],x[ 7]),18)); x[ 1] = XOR(x[ 1],ROTATE(PLUS(x[ 0],x[ 3]), 7)); x[ 2] = XOR(x[ 2],ROTATE(PLUS(x[ 1],x[ 0]), 9)); x[ 3] = XOR(x[ 3],ROTATE(PLUS(x[ 2],x[ 1]),13)); x[ 0] = XOR(x[ 0],ROTATE(PLUS(x[ 3],x[ 2]),18)); x[ 6] = XOR(x[ 6],ROTATE(PLUS(x[ 5],x[ 4]), 7)); x[ 7] = XOR(x[ 7],ROTATE(PLUS(x[ 6],x[ 5]), 9)); x[ 4] = XOR(x[ 4],ROTATE(PLUS(x[ 7],x[ 6]),13)); x[ 5] = XOR(x[ 5],ROTATE(PLUS(x[ 4],x[ 7]),18)); x[11] = XOR(x[11],ROTATE(PLUS(x[10],x[ 9]), 7)); x[ 8] = XOR(x[ 8],ROTATE(PLUS(x[11],x[10]), 9)); x[ 9] = XOR(x[ 9],ROTATE(PLUS(x[ 8],x[11]),13)); x[10] = XOR(x[10],ROTATE(PLUS(x[ 9],x[ 8]),18)); x[12] = XOR(x[12],ROTATE(PLUS(x[15],x[14]), 7)); x[13] = XOR(x[13],ROTATE(PLUS(x[12],x[15]), 9)); x[14] = XOR(x[14],ROTATE(PLUS(x[13],x[12]),13)); x[15] = XOR(x[15],ROTATE(PLUS(x[14],x[13]),18)); for i in range( len( input ) ): x[i] = PLUS( x[i], input[i] ) return little16_i32.pack( *x ) #--------------------------- 32-bit ops ------------------------------- def trunc32( w ): """ Return the bottom 32 bits of w as a Python int. This creates longs temporarily, but returns an int. 
""" w = int( ( w & 0x7fffFFFF ) | -( w & 0x80000000 ) ) assert type(w) == int return w def add32( a, b ): """ Add two 32-bit words discarding carry above 32nd bit, and without creating a Python long. Timing shouldn't vary. """ lo = ( a & 0xFFFF ) + ( b & 0xFFFF ) hi = ( a >> 16 ) + ( b >> 16 ) + ( lo >> 16 ) return ( -(hi & 0x8000) | ( hi & 0x7FFF ) ) << 16 | ( lo & 0xFFFF ) def rot32( w, nLeft ): """ Rotate 32-bit word left by nLeft or right by -nLeft without creating a Python long. Timing depends on nLeft but not on w. """ nLeft &= 31 # which makes nLeft >= 0 if nLeft == 0: return w # Note: now 1 <= nLeft <= 31. # RRRsLLLLLL There are nLeft RRR's, (31-nLeft) LLLLLL's, # => sLLLLLLRRR and one s which becomes the sign bit. RRR = ( ( ( w >> 1 ) & 0x7fffFFFF ) >> ( 31 - nLeft ) ) sLLLLLL = -( (1<<(31-nLeft)) & w ) | (0x7fffFFFF>>nLeft) & w return RRR | ( sLLLLLL << nLeft ) # --------------------------------- end ----------------------------------- pyglossary-3.2.1/pyglossary/plugin_lib/py34/0000755000175000017500000000000013577304644021363 5ustar emfoxemfox00000000000000pyglossary-3.2.1/pyglossary/plugin_lib/py34/__init__.py0000644000175000017500000000000013575553425023463 0ustar emfoxemfox00000000000000pyglossary-3.2.1/pyglossary/plugin_lib/py34/gzip_no_crc.py0000644000175000017500000005737213575553425024250 0ustar emfoxemfox00000000000000"""Functions that read and write gzipped files. The user of the file doesn't have to worry about the compression, but random access is not allowed.""" # based on Andrew Kuchling's minigzip.py distributed with the zlib module import struct, sys, time, os import zlib import builtins import io __all__ = ["GzipFile", "open", "compress", "decompress"] FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 1, 2, 4, 8, 16 READ, WRITE = 1, 2 def open(filename, mode="rb", compresslevel=9, encoding=None, errors=None, newline=None): """Open a gzip-compressed file in binary or text mode. 
The filename argument can be an actual filename (a str or bytes object), or an existing file object to read from or write to. The mode argument can be "r", "rb", "w", "wb", "x", "xb", "a" or "ab" for binary mode, or "rt", "wt", "xt" or "at" for text mode. The default mode is "rb", and the default compresslevel is 9. For binary mode, this function is equivalent to the GzipFile constructor: GzipFile(filename, mode, compresslevel). In this case, the encoding, errors and newline arguments must not be provided. For text mode, a GzipFile object is created, and wrapped in an io.TextIOWrapper instance with the specified encoding, error handling behavior, and line ending(s). """ if "t" in mode: if "b" in mode: raise ValueError("Invalid mode: %r" % (mode,)) else: if encoding is not None: raise ValueError("Argument 'encoding' not supported in binary mode") if errors is not None: raise ValueError("Argument 'errors' not supported in binary mode") if newline is not None: raise ValueError("Argument 'newline' not supported in binary mode") gz_mode = mode.replace("t", "") if isinstance(filename, (str, bytes)): binary_file = GzipFile(filename, gz_mode, compresslevel) elif hasattr(filename, "read") or hasattr(filename, "write"): binary_file = GzipFile(None, gz_mode, compresslevel, filename) else: raise TypeError("filename must be a str or bytes object, or a file") if "t" in mode: return io.TextIOWrapper(binary_file, encoding, errors, newline) else: return binary_file def write32u(output, value): # The L format writes the bit pattern correctly whether signed # or unsigned. output.write(struct.pack("' def _check_closed(self): """Raises a ValueError if the underlying file object has been closed. 
""" if self.closed: raise ValueError('I/O operation on closed file.') def _init_write(self, filename): self.name = filename self.crc = zlib.crc32(b"") & 0xffffffff self.size = 0 self.writebuf = [] self.bufsize = 0 def _write_gzip_header(self): self.fileobj.write(b'\037\213') # magic header self.fileobj.write(b'\010') # compression method try: # RFC 1952 requires the FNAME field to be Latin-1. Do not # include filenames that cannot be represented that way. fname = os.path.basename(self.name) if not isinstance(fname, bytes): fname = fname.encode('latin-1') if fname.endswith(b'.gz'): fname = fname[:-3] except UnicodeEncodeError: fname = b'' flags = 0 if fname: flags = FNAME self.fileobj.write(chr(flags).encode('latin-1')) mtime = self.mtime if mtime is None: mtime = time.time() write32u(self.fileobj, int(mtime)) self.fileobj.write(b'\002') self.fileobj.write(b'\377') if fname: self.fileobj.write(fname + b'\000') def _init_read(self): self.crc = zlib.crc32(b"") & 0xffffffff self.size = 0 def _read_exact(self, n): data = self.fileobj.read(n) while len(data) < n: b = self.fileobj.read(n - len(data)) if not b: raise EOFError("Compressed file ended before the " "end-of-stream marker was reached") data += b return data def _read_gzip_header(self): magic = self.fileobj.read(2) if magic == b'': return False if magic != b'\037\213': raise OSError('Not a gzipped file') method, flag, self.mtime = struct.unpack(" 0: self.fileobj.write(self.compress.compress(data)) self.size += len(data) self.crc = zlib.crc32(data, self.crc) & 0xffffffff self.offset += len(data) return len(data) def read(self, size=-1): self._check_closed() if self.mode != READ: import errno raise OSError(errno.EBADF, "read() on write-only GzipFile object") if self.extrasize <= 0 and self.fileobj is None: return b'' readsize = 1024 if size < 0: # get the whole thing while self._read(readsize): readsize = min(self.max_read_chunk, readsize * 2) size = self.extrasize else: # just get some more of it while size > 
self.extrasize: if not self._read(readsize): if size > self.extrasize: size = self.extrasize break readsize = min(self.max_read_chunk, readsize * 2) offset = self.offset - self.extrastart chunk = self.extrabuf[offset: offset + size] self.extrasize = self.extrasize - size self.offset += size return chunk def read1(self, size=-1): self._check_closed() if self.mode != READ: import errno raise OSError(errno.EBADF, "read1() on write-only GzipFile object") if self.extrasize <= 0 and self.fileobj is None: return b'' # For certain input data, a single call to _read() may not return # any data. In this case, retry until we get some data or reach EOF. while self.extrasize <= 0 and self._read(): pass if size < 0 or size > self.extrasize: size = self.extrasize offset = self.offset - self.extrastart chunk = self.extrabuf[offset: offset + size] self.extrasize -= size self.offset += size return chunk def peek(self, n): if self.mode != READ: import errno raise OSError(errno.EBADF, "peek() on write-only GzipFile object") # Do not return ridiculously small buffers, for one common idiom # is to call peek(1) and expect more bytes in return. if n < 100: n = 100 if self.extrasize == 0: if self.fileobj is None: return b'' # Ensure that we don't return b"" if we haven't reached EOF. # 1024 is the same buffering heuristic used in read() while self.extrasize == 0 and self._read(max(n, 1024)): pass offset = self.offset - self.extrastart remaining = self.extrasize assert remaining == len(self.extrabuf) - offset return self.extrabuf[offset:offset + n] def _unread(self, buf): self.extrasize = len(buf) + self.extrasize self.offset -= len(buf) def _read(self, size=1024): if self.fileobj is None: return False if self._new_member: # If the _new_member flag is set, we have to # jump to the next member, if there is one. 
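The member-boundary handling sketched in `_read()` above is what lets a single GzipFile transparently decompress several gzip members written back to back. A minimal illustration using the stdlib `gzip` module, which this vendored copy is based on:

```python
# Concatenated gzip members decompress as one continuous stream;
# the "_new_member" logic above jumps to the next header whenever
# a member's end-of-stream marker is reached.
import gzip

blob = gzip.compress(b"first,") + gzip.compress(b"second")
assert gzip.decompress(blob) == b"first,second"
```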
            self._init_read()
            if not self._read_gzip_header():
                return False
            self.decompress = zlib.decompressobj(-zlib.MAX_WBITS)
            self._new_member = False

        # Read a chunk of data from the file
        buf = self.fileobj.read(size)

        # If the EOF has been reached, flush the decompression object
        # and mark this object as finished.
        if buf == b"":
            uncompress = self.decompress.flush()
            # Prepend the already read bytes to the fileobj so they can be
            # seen by _read_eof()
            self.fileobj.prepend(self.decompress.unused_data, True)
            self._read_eof()
            self._add_read_data(uncompress)
            return False

        uncompress = self.decompress.decompress(buf)
        self._add_read_data(uncompress)

        if self.decompress.unused_data != b"":
            # Ending case: we've come to the end of a member in the file,
            # so seek back to the start of the unused data, finish up
            # this member, and read a new gzip header.
            # Prepend the already read bytes to the fileobj so they can be
            # seen by _read_eof() and _read_gzip_header()
            self.fileobj.prepend(self.decompress.unused_data, True)
            # Check the CRC and file size, and set the flag so we read
            # a new member on the next call
            self._read_eof()
            self._new_member = True
        return True

    def _add_read_data(self, data):
        self.crc = zlib.crc32(data, self.crc) & 0xffffffff
        offset = self.offset - self.extrastart
        self.extrabuf = self.extrabuf[offset:] + data
        self.extrasize = self.extrasize + len(data)
        self.extrastart = self.offset
        self.size = self.size + len(data)

    def _read_eof(self):
        # We've read to the end of the file.
        # We check that the computed CRC and size of the
        # uncompressed data matches the stored values.  Note that the size
        # stored is the true file size mod 2**32.
crc32, isize = struct.unpack(" 0: self.extrasize -= i - offset self.offset += i - offset return self.extrabuf[offset: i] size = sys.maxsize readsize = self.min_readsize else: readsize = size bufs = [] while size != 0: c = self.read(readsize) i = c.find(b'\n') # We set i=size to break out of the loop under two # conditions: 1) there's no newline, and the chunk is # larger than size, or 2) there is a newline, but the # resulting line would be longer than 'size'. if (size <= i) or (i == -1 and len(c) > size): i = size - 1 if i >= 0 or c == b'': bufs.append(c[:i + 1]) # Add portion of last chunk self._unread(c[i + 1:]) # Push back rest of chunk break # Append chunk to list, decrease 'size', bufs.append(c) size = size - len(c) readsize = min(size, readsize * 2) if readsize > self.min_readsize: self.min_readsize = min(readsize, self.min_readsize * 2, 512) return b''.join(bufs) # Return resulting line def compress(data, compresslevel=9): """Compress data in one shot and return the compressed string. Optional argument is the compression level, in range of 0-9. """ buf = io.BytesIO() with GzipFile(fileobj=buf, mode='wb', compresslevel=compresslevel) as f: f.write(data) return buf.getvalue() def decompress(data): """Decompress a gzip compressed string in one shot. Return the decompressed string. """ with GzipFile(fileobj=io.BytesIO(data)) as f: return f.read() def _test(): # Act like gzip; with -d, act like gunzip. # The input file is not deleted, however, nor are any other gzip # options or features supported. 
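The module-level `compress()`/`decompress()` helpers above wrap a GzipFile around an in-memory BytesIO buffer. A quick round trip with the stdlib `gzip` module, which exposes the same one-shot pair:

```python
# One-shot round trip mirroring the compress()/decompress() helpers
# defined above (stdlib gzip provides an identical API).
import gzip

data = b"abc" * 1000
blob = gzip.compress(data, compresslevel=9)

assert gzip.decompress(blob) == data
assert len(blob) < len(data)  # repetitive input compresses well
```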
args = sys.argv[1:] decompress = args and args[0] == "-d" if decompress: args = args[1:] if not args: args = ["-"] for arg in args: if decompress: if arg == "-": f = GzipFile(filename="", mode="rb", fileobj=sys.stdin.buffer) g = sys.stdout.buffer else: if arg[-3:] != ".gz": print("filename doesn't end in .gz:", repr(arg)) continue f = open(arg, "rb") g = builtins.open(arg[:-3], "wb") else: if arg == "-": f = sys.stdin.buffer g = GzipFile(filename="", mode="wb", fileobj=sys.stdout.buffer) else: f = builtins.open(arg, "rb") g = open(arg + ".gz", "wb") while True: chunk = f.read(1024) if not chunk: break g.write(chunk) if g is not sys.stdout.buffer: g.close() if f is not sys.stdin.buffer: f.close() if __name__ == '__main__': _test() pyglossary-3.2.1/pyglossary/plugin_lib/py35/0000755000175000017500000000000013577304644021364 5ustar emfoxemfox00000000000000pyglossary-3.2.1/pyglossary/plugin_lib/py35/__init__.py0000644000175000017500000000000013575553425023464 0ustar emfoxemfox00000000000000pyglossary-3.2.1/pyglossary/plugin_lib/py35/gzip_no_crc.py0000644000175000017500000004752213575553425024245 0ustar emfoxemfox00000000000000"""Functions that read and write gzipped files. The user of the file doesn't have to worry about the compression, but random access is not allowed.""" # based on Andrew Kuchling's minigzip.py distributed with the zlib module import logging log = logging.getLogger('root') import struct, sys, time, os import zlib import builtins import io import _compression __all__ = ["GzipFile", "open", "compress", "decompress"] FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 1, 2, 4, 8, 16 READ, WRITE = 1, 2 def open(filename, mode="rb", compresslevel=9, encoding=None, errors=None, newline=None): """Open a gzip-compressed file in binary or text mode. The filename argument can be an actual filename (a str or bytes object), or an existing file object to read from or write to. 
The mode argument can be "r", "rb", "w", "wb", "x", "xb", "a" or "ab" for binary mode, or "rt", "wt", "xt" or "at" for text mode. The default mode is "rb", and the default compresslevel is 9. For binary mode, this function is equivalent to the GzipFile constructor: GzipFile(filename, mode, compresslevel). In this case, the encoding, errors and newline arguments must not be provided. For text mode, a GzipFile object is created, and wrapped in an io.TextIOWrapper instance with the specified encoding, error handling behavior, and line ending(s). """ if "t" in mode: if "b" in mode: raise ValueError("Invalid mode: %r" % (mode,)) else: if encoding is not None: raise ValueError("Argument 'encoding' not supported in binary mode") if errors is not None: raise ValueError("Argument 'errors' not supported in binary mode") if newline is not None: raise ValueError("Argument 'newline' not supported in binary mode") gz_mode = mode.replace("t", "") if isinstance(filename, (str, bytes)): binary_file = GzipFile(filename, gz_mode, compresslevel) elif hasattr(filename, "read") or hasattr(filename, "write"): binary_file = GzipFile(None, gz_mode, compresslevel, filename) else: raise TypeError("filename must be a str or bytes object, or a file") if "t" in mode: return io.TextIOWrapper(binary_file, encoding, errors, newline) else: return binary_file def write32u(output, value): # The L format writes the bit pattern correctly whether signed # or unsigned. output.write(struct.pack("' def _init_write(self, filename): self.name = filename self.crc = zlib.crc32(b"") self.size = 0 self.writebuf = [] self.bufsize = 0 self.offset = 0 # Current file offset for seek(), tell(), etc def _write_gzip_header(self): self.fileobj.write(b'\037\213') # magic header self.fileobj.write(b'\010') # compression method try: # RFC 1952 requires the FNAME field to be Latin-1. Do not # include filenames that cannot be represented that way. 
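The text-vs-binary dispatch described in `open()`'s docstring can be sketched with the stdlib `gzip.open()` (same signature as the vendored copy); the temporary file path here is created only for the example:

```python
# Sketch: gzip.open() returns a plain GzipFile for binary modes and
# wraps it in io.TextIOWrapper for "t" modes, as the docstring says.
import gzip
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.gz")

with gzip.open(path, "wb") as f:            # binary mode: GzipFile
    f.write(b"hello")
with gzip.open(path, "rb") as f:
    assert f.read() == b"hello"

with gzip.open(path, "rt", encoding="utf-8") as f:  # text mode
    assert isinstance(f, io.TextIOWrapper)
    assert f.read() == "hello"
```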
fname = os.path.basename(self.name) if not isinstance(fname, bytes): fname = fname.encode('latin-1') if fname.endswith(b'.gz'): fname = fname[:-3] except UnicodeEncodeError: fname = b'' flags = 0 if fname: flags = FNAME self.fileobj.write(chr(flags).encode('latin-1')) mtime = self._write_mtime if mtime is None: mtime = time.time() write32u(self.fileobj, int(mtime)) self.fileobj.write(b'\002') self.fileobj.write(b'\377') if fname: self.fileobj.write(fname + b'\000') def write(self,data): self._check_not_closed() if self.mode != WRITE: import errno raise OSError(errno.EBADF, "write() on read-only GzipFile object") if self.fileobj is None: raise ValueError("write() on closed GzipFile object") if isinstance(data, bytes): length = len(data) else: # accept any data that supports the buffer protocol data = memoryview(data) length = data.nbytes if length > 0: self.fileobj.write(self.compress.compress(data)) self.size += length self.crc = zlib.crc32(data, self.crc) self.offset += length return length def read(self, size=-1): self._check_not_closed() if self.mode != READ: import errno raise OSError(errno.EBADF, "read() on write-only GzipFile object") return self._buffer.read(size) def read1(self, size=-1): """Implements BufferedIOBase.read1() Reads up to a buffer's worth of data is size is negative.""" self._check_not_closed() if self.mode != READ: import errno raise OSError(errno.EBADF, "read1() on write-only GzipFile object") if size < 0: size = io.DEFAULT_BUFFER_SIZE return self._buffer.read1(size) def peek(self, n): self._check_not_closed() if self.mode != READ: import errno raise OSError(errno.EBADF, "peek() on write-only GzipFile object") return self._buffer.peek(n) @property def closed(self): return self.fileobj is None def close(self): fileobj = self.fileobj if fileobj is None: return self.fileobj = None try: if self.mode == WRITE: fileobj.write(self.compress.flush()) write32u(fileobj, self.crc) # self.size may exceed 2GB, or even 4GB write32u(fileobj, self.size & 
0xffffffff) elif self.mode == READ: self._buffer.close() finally: myfileobj = self.myfileobj if myfileobj: self.myfileobj = None myfileobj.close() def flush(self,zlib_mode=zlib.Z_SYNC_FLUSH): self._check_not_closed() if self.mode == WRITE: # Ensure the compressor's buffer is flushed self.fileobj.write(self.compress.flush(zlib_mode)) self.fileobj.flush() def fileno(self): """Invoke the underlying file object's fileno() method. This will raise AttributeError if the underlying file object doesn't support fileno(). """ return self.fileobj.fileno() def rewind(self): '''Return the uncompressed stream file position indicator to the beginning of the file''' if self.mode != READ: raise OSError("Can't rewind in write mode") self._buffer.seek(0) def readable(self): return self.mode == READ def writable(self): return self.mode == WRITE def seekable(self): return True def seek(self, offset, whence=io.SEEK_SET): if self.mode == WRITE: if whence != io.SEEK_SET: if whence == io.SEEK_CUR: offset = self.offset + offset else: raise ValueError('Seek from end not supported') if offset < self.offset: raise OSError('Negative seek in write mode') count = offset - self.offset chunk = bytes(1024) for i in range(count // 1024): self.write(chunk) self.write(bytes(count % 1024)) elif self.mode == READ: self._check_not_closed() return self._buffer.seek(offset, whence) return self.offset def readline(self, size=-1): self._check_not_closed() return self._buffer.readline(size) class _GzipReader(_compression.DecompressReader): def __init__(self, fp): super().__init__(_PaddedFile(fp), zlib.decompressobj, wbits=-zlib.MAX_WBITS) # Set flag indicating start of a new member self._new_member = True self._last_mtime = None def _init_read(self): self._crc = zlib.crc32(b"") self._stream_size = 0 # Decompressed size of unconcatenated stream def _read_exact(self, n): '''Read exactly *n* bytes from `self._fp` This method is required because self._fp may be unbuffered, i.e. return short reads. 
''' data = self._fp.read(n) while len(data) < n: b = self._fp.read(n - len(data)) if not b: raise EOFError("Compressed file ended before the " "end-of-stream marker was reached") data += b return data def _read_gzip_header(self): magic = self._fp.read(2) if magic == b'': return False if magic != b'\037\213': raise OSError('Not a gzipped file (%r)' % magic) (method, flag, self._last_mtime) = struct.unpack("' def _init_write(self, filename): self.name = filename self.crc = zlib.crc32(b"") self.size = 0 self.writebuf = [] self.bufsize = 0 self.offset = 0 # Current file offset for seek(), tell(), etc def _write_gzip_header(self): self.fileobj.write(b'\037\213') # magic header self.fileobj.write(b'\010') # compression method try: # RFC 1952 requires the FNAME field to be Latin-1. Do not # include filenames that cannot be represented that way. fname = os.path.basename(self.name) if not isinstance(fname, bytes): fname = fname.encode('latin-1') if fname.endswith(b'.gz'): fname = fname[:-3] except UnicodeEncodeError: fname = b'' flags = 0 if fname: flags = FNAME self.fileobj.write(chr(flags).encode('latin-1')) mtime = self._write_mtime if mtime is None: mtime = time.time() write32u(self.fileobj, int(mtime)) self.fileobj.write(b'\002') self.fileobj.write(b'\377') if fname: self.fileobj.write(fname + b'\000') def write(self,data): self._check_not_closed() if self.mode != WRITE: import errno raise OSError(errno.EBADF, "write() on read-only GzipFile object") if self.fileobj is None: raise ValueError("write() on closed GzipFile object") if isinstance(data, bytes): length = len(data) else: # accept any data that supports the buffer protocol data = memoryview(data) length = data.nbytes if length > 0: self.fileobj.write(self.compress.compress(data)) self.size += length self.crc = zlib.crc32(data, self.crc) self.offset += length return length def read(self, size=-1): self._check_not_closed() if self.mode != READ: import errno raise OSError(errno.EBADF, "read() on write-only GzipFile 
object") return self._buffer.read(size) def read1(self, size=-1): """Implements BufferedIOBase.read1() Reads up to a buffer's worth of data is size is negative.""" self._check_not_closed() if self.mode != READ: import errno raise OSError(errno.EBADF, "read1() on write-only GzipFile object") if size < 0: size = io.DEFAULT_BUFFER_SIZE return self._buffer.read1(size) def peek(self, n): self._check_not_closed() if self.mode != READ: import errno raise OSError(errno.EBADF, "peek() on write-only GzipFile object") return self._buffer.peek(n) @property def closed(self): return self.fileobj is None def close(self): fileobj = self.fileobj if fileobj is None: return self.fileobj = None try: if self.mode == WRITE: fileobj.write(self.compress.flush()) write32u(fileobj, self.crc) # self.size may exceed 2GB, or even 4GB write32u(fileobj, self.size & 0xffffffff) elif self.mode == READ: self._buffer.close() finally: myfileobj = self.myfileobj if myfileobj: self.myfileobj = None myfileobj.close() def flush(self,zlib_mode=zlib.Z_SYNC_FLUSH): self._check_not_closed() if self.mode == WRITE: # Ensure the compressor's buffer is flushed self.fileobj.write(self.compress.flush(zlib_mode)) self.fileobj.flush() def fileno(self): """Invoke the underlying file object's fileno() method. This will raise AttributeError if the underlying file object doesn't support fileno(). 
""" return self.fileobj.fileno() def rewind(self): '''Return the uncompressed stream file position indicator to the beginning of the file''' if self.mode != READ: raise OSError("Can't rewind in write mode") self._buffer.seek(0) def readable(self): return self.mode == READ def writable(self): return self.mode == WRITE def seekable(self): return True def seek(self, offset, whence=io.SEEK_SET): if self.mode == WRITE: if whence != io.SEEK_SET: if whence == io.SEEK_CUR: offset = self.offset + offset else: raise ValueError('Seek from end not supported') if offset < self.offset: raise OSError('Negative seek in write mode') count = offset - self.offset chunk = b'\0' * 1024 for i in range(count // 1024): self.write(chunk) self.write(b'\0' * (count % 1024)) elif self.mode == READ: self._check_not_closed() return self._buffer.seek(offset, whence) return self.offset def readline(self, size=-1): self._check_not_closed() return self._buffer.readline(size) class _GzipReader(_compression.DecompressReader): def __init__(self, fp): super().__init__(_PaddedFile(fp), zlib.decompressobj, wbits=-zlib.MAX_WBITS) # Set flag indicating start of a new member self._new_member = True self._last_mtime = None def _init_read(self): self._crc = zlib.crc32(b"") self._stream_size = 0 # Decompressed size of unconcatenated stream def _read_exact(self, n): '''Read exactly *n* bytes from `self._fp` This method is required because self._fp may be unbuffered, i.e. return short reads. 
''' data = self._fp.read(n) while len(data) < n: b = self._fp.read(n - len(data)) if not b: raise EOFError("Compressed file ended before the " "end-of-stream marker was reached") data += b return data def _read_gzip_header(self): magic = self._fp.read(2) if magic == b'': return False if magic != b'\037\213': raise OSError('Not a gzipped file (%r)' % magic) (method, flag, self._last_mtime) = struct.unpack("' def _init_write(self, filename): self.name = filename self.crc = zlib.crc32(b"") self.size = 0 self.writebuf = [] self.bufsize = 0 self.offset = 0 # Current file offset for seek(), tell(), etc def _write_gzip_header(self): self.fileobj.write(b'\037\213') # magic header self.fileobj.write(b'\010') # compression method try: # RFC 1952 requires the FNAME field to be Latin-1. Do not # include filenames that cannot be represented that way. fname = os.path.basename(self.name) if not isinstance(fname, bytes): fname = fname.encode('latin-1') if fname.endswith(b'.gz'): fname = fname[:-3] except UnicodeEncodeError: fname = b'' flags = 0 if fname: flags = FNAME self.fileobj.write(chr(flags).encode('latin-1')) mtime = self._write_mtime if mtime is None: mtime = time.time() write32u(self.fileobj, int(mtime)) self.fileobj.write(b'\002') self.fileobj.write(b'\377') if fname: self.fileobj.write(fname + b'\000') def write(self,data): self._check_not_closed() if self.mode != WRITE: import errno raise OSError(errno.EBADF, "write() on read-only GzipFile object") if self.fileobj is None: raise ValueError("write() on closed GzipFile object") if isinstance(data, bytes): length = len(data) else: # accept any data that supports the buffer protocol data = memoryview(data) length = data.nbytes if length > 0: self.fileobj.write(self.compress.compress(data)) self.size += length self.crc = zlib.crc32(data, self.crc) self.offset += length return length def read(self, size=-1): self._check_not_closed() if self.mode != READ: import errno raise OSError(errno.EBADF, "read() on write-only GzipFile 
object") return self._buffer.read(size) def read1(self, size=-1): """Implements BufferedIOBase.read1() Reads up to a buffer's worth of data is size is negative.""" self._check_not_closed() if self.mode != READ: import errno raise OSError(errno.EBADF, "read1() on write-only GzipFile object") if size < 0: size = io.DEFAULT_BUFFER_SIZE return self._buffer.read1(size) def peek(self, n): self._check_not_closed() if self.mode != READ: import errno raise OSError(errno.EBADF, "peek() on write-only GzipFile object") return self._buffer.peek(n) @property def closed(self): return self.fileobj is None def close(self): fileobj = self.fileobj if fileobj is None: return self.fileobj = None try: if self.mode == WRITE: fileobj.write(self.compress.flush()) write32u(fileobj, self.crc) # self.size may exceed 2 GiB, or even 4 GiB write32u(fileobj, self.size & 0xffffffff) elif self.mode == READ: self._buffer.close() finally: myfileobj = self.myfileobj if myfileobj: self.myfileobj = None myfileobj.close() def flush(self,zlib_mode=zlib.Z_SYNC_FLUSH): self._check_not_closed() if self.mode == WRITE: # Ensure the compressor's buffer is flushed self.fileobj.write(self.compress.flush(zlib_mode)) self.fileobj.flush() def fileno(self): """Invoke the underlying file object's fileno() method. This will raise AttributeError if the underlying file object doesn't support fileno(). 
""" return self.fileobj.fileno() def rewind(self): '''Return the uncompressed stream file position indicator to the beginning of the file''' if self.mode != READ: raise OSError("Can't rewind in write mode") self._buffer.seek(0) def readable(self): return self.mode == READ def writable(self): return self.mode == WRITE def seekable(self): return True def seek(self, offset, whence=io.SEEK_SET): if self.mode == WRITE: if whence != io.SEEK_SET: if whence == io.SEEK_CUR: offset = self.offset + offset else: raise ValueError('Seek from end not supported') if offset < self.offset: raise OSError('Negative seek in write mode') count = offset - self.offset chunk = b'\0' * 1024 for i in range(count // 1024): self.write(chunk) self.write(b'\0' * (count % 1024)) elif self.mode == READ: self._check_not_closed() return self._buffer.seek(offset, whence) return self.offset def readline(self, size=-1): self._check_not_closed() return self._buffer.readline(size) class _GzipReader(_compression.DecompressReader): def __init__(self, fp): super().__init__(_PaddedFile(fp), zlib.decompressobj, wbits=-zlib.MAX_WBITS) # Set flag indicating start of a new member self._new_member = True self._last_mtime = None def _init_read(self): self._crc = zlib.crc32(b"") self._stream_size = 0 # Decompressed size of unconcatenated stream def _read_exact(self, n): '''Read exactly *n* bytes from `self._fp` This method is required because self._fp may be unbuffered, i.e. return short reads. 
''' data = self._fp.read(n) while len(data) < n: b = self._fp.read(n - len(data)) if not b: raise EOFError("Compressed file ended before the " "end-of-stream marker was reached") data += b return data def _read_gzip_header(self): magic = self._fp.read(2) if magic == b'': return False if magic != b'\037\213': raise OSError('Not a gzipped file (%r)' % magic) (method, flag, self._last_mtime) = struct.unpack(" # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, version 3 of the License. # # You can get a copy of GNU General Public License along this program # But you can always get it from http://www.gnu.org/licenses/gpl.txt # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. import logging log = logging.getLogger('root') from struct import pack, unpack from io import BytesIO import re import sys from .ripemd128 import ripemd128 from .pureSalsa20 import Salsa20 # zlib compression is used for engine version >=2.0 import zlib # LZO compression is used for engine version < 2.0 try: import lzo except ImportError: lzo = None print("LZO compression support is not available") # 2x3 compatible if sys.hexversion >= 0x03000000: str, unicode = bytes, str def _unescape_entities(text): """ unescape offending tags < > " & """ text = text.replace(b'<', b'<') text = text.replace(b'>', b'>') text = text.replace(b'"', b'"') text = text.replace(b'&', b'&') return text def _fast_decrypt(data, key): b = bytearray(data) key = bytearray(key) previous = 0x36 for i in range(len(b)): t = (b[i] >> 4 | b[i] << 4) & 0xff t = t ^ previous ^ (i & 0xff) ^ key[i % len(key)] previous = b[i] b[i] = t return bytes(b) def _mdx_decrypt(comp_block): key = ripemd128(comp_block[4:8] + pack(b' """ 
taglist = re.findall(b'(\w+)="(.*?)"', header, re.DOTALL) tagdict = {} for key, value in taglist: tagdict[key] = _unescape_entities(value) return tagdict def _decode_key_block_info(self, key_block_info_compressed): if self._version >= 2: # zlib compression assert(key_block_info_compressed[:4] == b'\x02\x00\x00\x00') # decrypt if needed if self._encrypt & 0x02: key_block_info_compressed = _mdx_decrypt(key_block_info_compressed) # decompress key_block_info = zlib.decompress(key_block_info_compressed[8:]) # adler checksum adler32 = unpack('>I', key_block_info_compressed[4:8])[0] assert(adler32 == zlib.adler32(key_block_info) & 0xffffffff) else: # no compression key_block_info = key_block_info_compressed # decode key_block_info_list = [] num_entries = 0 i = 0 if self._version >= 2: byte_format = '>H' byte_width = 2 text_term = 1 else: byte_format = '>B' byte_width = 1 text_term = 0 while i < len(key_block_info): # number of entries in current key block num_entries += unpack(self._number_format, key_block_info[i:i+self._number_width])[0] i += self._number_width # text head size text_head_size = unpack(byte_format, key_block_info[i:i+byte_width])[0] i += byte_width # text head if self._encoding != 'UTF-16': i += text_head_size + text_term else: i += (text_head_size + text_term) * 2 # text tail size text_tail_size = unpack(byte_format, key_block_info[i:i+byte_width])[0] i += byte_width # text tail if self._encoding != 'UTF-16': i += text_tail_size + text_term else: i += (text_tail_size + text_term) * 2 # key block compressed size key_block_compressed_size = unpack(self._number_format, key_block_info[i:i+self._number_width])[0] i += self._number_width # key block decompressed size key_block_decompressed_size = unpack(self._number_format, key_block_info[i:i+self._number_width])[0] i += self._number_width key_block_info_list += [(key_block_compressed_size, key_block_decompressed_size)] #assert(num_entries == self._num_entries) return key_block_info_list def 
_decode_key_block(self, key_block_compressed, key_block_info_list): key_list = [] i = 0 for compressed_size, decompressed_size in key_block_info_list: start = i end = i + compressed_size # 4 bytes : compression type key_block_type = key_block_compressed[start:start+4] # 4 bytes : adler checksum of decompressed key block adler32 = unpack('>I', key_block_compressed[start+4:start+8])[0] if key_block_type == b'\x00\x00\x00\x00': key_block = key_block_compressed[start+8:end] elif key_block_type == b'\x01\x00\x00\x00': if lzo is None: print("LZO compression is not supported") break # decompress key block header = b'\xf0' + pack('>I', decompressed_size) key_block = lzo.decompress(header + key_block_compressed[start+8:end]) elif key_block_type == b'\x02\x00\x00\x00': # decompress key block key_block = zlib.decompress(key_block_compressed[start+8:end]) # extract one single key block into a key list key_list += self._split_key_block(key_block) # notice that adler32 returns signed value assert(adler32 == zlib.adler32(key_block) & 0xffffffff) i += compressed_size return key_list def _split_key_block(self, key_block): key_list = [] key_start_index = 0 while key_start_index < len(key_block): # the corresponding record's offset in record block key_id = unpack(self._number_format, key_block[key_start_index:key_start_index+self._number_width])[0] # key text ends with '\x00' if self._encoding == 'UTF-16': delimiter = b'\x00\x00' width = 2 else: delimiter = b'\x00' width = 1 i = key_start_index + self._number_width while i < len(key_block): if key_block[i:i+width] == delimiter: key_end_index = i break i += width key_text = key_block[key_start_index+self._number_width:key_end_index]\ .decode(self._encoding, errors='ignore').encode('utf-8').strip() key_start_index = key_end_index + width key_list += [(key_id, key_text)] return key_list def _read_header(self): f = open(self._fname, 'rb') # number of bytes of header text header_bytes_size = unpack('>I', f.read(4))[0] header_bytes = 
f.read(header_bytes_size) # 4 bytes: adler32 checksum of header, in little endian adler32 = unpack('= 0x03000000: encoding = encoding.decode('utf-8') # GB18030 > GBK > GB2312 if encoding in ['GBK', 'GB2312']: encoding = 'GB18030' self._encoding = encoding # encryption flag # 0x00 - no encryption # 0x01 - encrypt record block # 0x02 - encrypt key info block if b'Encrypted' not in header_tag or header_tag[b'Encrypted'] == b'No': self._encrypt = 0 elif header_tag[b'Encrypted'] == b'Yes': self._encrypt = 1 else: self._encrypt = int(header_tag[b'Encrypted']) # stylesheet attribute if present takes form of: # style_number # 1-255 # style_begin # or '' # style_end # or '' # store stylesheet in dict in the form of # {'number' : ('style_begin', 'style_end')} self._stylesheet = {} if header_tag.get('StyleSheet'): lines = header_tag['StyleSheet'].splitlines() for i in range(0, len(lines), 3): self._stylesheet[lines[i]] = (lines[i+1], lines[i+2]) # before version 2.0, number is 4 bytes integer # version 2.0 and above uses 8 bytes self._version = float(header_tag[b'GeneratedByEngineVersion']) if self._version < 2.0: self._number_width = 4 self._number_format = '>I' else: self._number_width = 8 self._number_format = '>Q' return header_tag def _read_keys(self): f = open(self._fname, 'rb') f.seek(self._key_block_offset) # the following numbers could be encrypted if self._version >= 2.0: num_bytes = 8 * 5 else: num_bytes = 4 * 4 block = f.read(num_bytes) if self._encrypt & 1: if self._passcode is None: raise RuntimeError('user identification is needed to read encrypted file') regcode, userid = self._passcode if isinstance(userid, unicode): userid = userid.encode('utf8') if self.header[b'RegisterBy'] == b'EMail': encrypted_key = _decrypt_regcode_by_email(regcode, userid) else: encrypted_key = _decrypt_regcode_by_deviceid(regcode, userid) block = _salsa_decrypt(block, encrypted_key) # decode this block sf = BytesIO(block) # number of key blocks num_key_blocks = self._read_number(sf) 
# number of entries self._num_entries = self._read_number(sf) # number of bytes of key block info after decompression if self._version >= 2.0: key_block_info_decomp_size = self._read_number(sf) # number of bytes of key block info key_block_info_size = self._read_number(sf) # number of bytes of key block key_block_size = self._read_number(sf) # 4 bytes: adler checksum of previous 5 numbers if self._version >= 2.0: adler32 = unpack('>I', f.read(4))[0] assert adler32 == (zlib.adler32(block) & 0xffffffff) # read key block info, which indicates key block's compressed and decompressed size key_block_info = f.read(key_block_info_size) key_block_info_list = self._decode_key_block_info(key_block_info) assert(num_key_blocks == len(key_block_info_list)) # read key block key_block_compressed = f.read(key_block_size) # extract key block key_list = self._decode_key_block(key_block_compressed, key_block_info_list) self._record_block_offset = f.tell() f.close() return key_list def _read_keys_brutal(self): f = open(self._fname, 'rb') f.seek(self._key_block_offset) # the following numbers could be encrypted, disregard them! 
if self._version >= 2.0: num_bytes = 8 * 5 + 4 key_block_type = b'\x02\x00\x00\x00' else: num_bytes = 4 * 4 key_block_type = b'\x01\x00\x00\x00' block = f.read(num_bytes) # key block info # 4 bytes '\x02\x00\x00\x00' # 4 bytes adler32 checksum # unknown number of bytes follows until '\x02\x00\x00\x00' which marks the beginning of key block key_block_info = f.read(8) if self._version >= 2.0: assert key_block_info[:4] == b'\x02\x00\x00\x00' while True: fpos = f.tell() t = f.read(1024) index = t.find(key_block_type) if index != -1: key_block_info += t[:index] f.seek(fpos + index) break else: key_block_info += t key_block_info_list = self._decode_key_block_info(key_block_info) key_block_size = sum(list(zip(*key_block_info_list))[0]) # read key block key_block_compressed = f.read(key_block_size) # extract key block key_list = self._decode_key_block(key_block_compressed, key_block_info_list) self._record_block_offset = f.tell() f.close() self._num_entries = len(key_list) return key_list class MDD(MDict): """ MDict resource file format (*.MDD) reader. >>> mdd = MDD('example.mdd') >>> len(mdd) 208 >>> for filename,content in mdd.items(): ... 
print filename, content[:10] """ def __init__(self, fname, passcode=None): MDict.__init__(self, fname, encoding='UTF-16', passcode=passcode) def items(self): """Return a generator which in turn produce tuples in the form of (filename, content) """ return self._decode_record_block() def _decode_record_block(self): f = open(self._fname, 'rb') f.seek(self._record_block_offset) num_record_blocks = self._read_number(f) num_entries = self._read_number(f) assert(num_entries == self._num_entries) record_block_info_size = self._read_number(f) record_block_size = self._read_number(f) # record block info section record_block_info_list = [] size_counter = 0 for i in range(num_record_blocks): compressed_size = self._read_number(f) decompressed_size = self._read_number(f) record_block_info_list += [(compressed_size, decompressed_size)] size_counter += self._number_width * 2 assert(size_counter == record_block_info_size) # actual record block offset = 0 i = 0 size_counter = 0 for compressed_size, decompressed_size in record_block_info_list: record_block_compressed = f.read(compressed_size) # 4 bytes: compression type record_block_type = record_block_compressed[:4] # 4 bytes: adler32 checksum of decompressed record block adler32 = unpack('>I', record_block_compressed[4:8])[0] if record_block_type == b'\x00\x00\x00\x00': record_block = record_block_compressed[8:] elif record_block_type == b'\x01\x00\x00\x00': if lzo is None: print("LZO compression is not supported") break # decompress header = '\xf0' + pack('>I', decompressed_size) record_block = lzo.decompress(header + record_block_compressed[8:]) elif record_block_type == b'\x02\x00\x00\x00': # decompress record_block = zlib.decompress(record_block_compressed[8:]) # notice that adler32 return signed value assert(adler32 == zlib.adler32(record_block) & 0xffffffff) assert(len(record_block) == decompressed_size) # split record block according to the offset info from key block while i < len(self._key_list): record_start, key_text = 
self._key_list[i] # reach the end of current record block if record_start - offset >= len(record_block): break # record end index if i < len(self._key_list)-1: record_end = self._key_list[i+1][0] else: record_end = len(record_block) + offset i += 1 data = record_block[record_start-offset:record_end-offset] yield key_text, data offset += len(record_block) size_counter += compressed_size assert(size_counter == record_block_size) f.close() class MDX(MDict): """ MDict dictionary file format (*.MDD) reader. >>> mdx = MDX('example.mdx') >>> len(mdx) 42481 >>> for key,value in mdx.items(): ... print key, value[:10] """ def __init__(self, fname, encoding='', substyle=False, passcode=None): MDict.__init__(self, fname, encoding, passcode) self._substyle = substyle def items(self): """Return a generator which in turn produce tuples in the form of (key, value) """ return self._decode_record_block() def _substitute_stylesheet(self, txt): # substitute stylesheet definition txt_list = re.split('`\d+`', txt) txt_tag = re.findall('`\d+`', txt) txt_styled = txt_list[0] for j, p in enumerate(txt_list[1:]): key = txt_tag[j][1:-1] try: style = self._stylesheet[key] except KeyError: log.error('invalid stylesheet key "%s"'%key) continue if p and p[-1] == '\n': txt_styled = txt_styled + style[0] + p.rstrip() + style[1] + '\r\n' else: txt_styled = txt_styled + style[0] + p + style[1] return txt_styled def _decode_record_block(self): f = open(self._fname, 'rb') f.seek(self._record_block_offset) num_record_blocks = self._read_number(f) num_entries = self._read_number(f) assert(num_entries == self._num_entries) record_block_info_size = self._read_number(f) record_block_size = self._read_number(f) # record block info section record_block_info_list = [] size_counter = 0 for i in range(num_record_blocks): compressed_size = self._read_number(f) decompressed_size = self._read_number(f) record_block_info_list += [(compressed_size, decompressed_size)] size_counter += self._number_width * 2 
assert(size_counter == record_block_info_size) # actual record block data offset = 0 i = 0 size_counter = 0 for compressed_size, decompressed_size in record_block_info_list: record_block_compressed = f.read(compressed_size) # 4 bytes indicates block compression type record_block_type = record_block_compressed[:4] # 4 bytes adler checksum of uncompressed content adler32 = unpack('>I', record_block_compressed[4:8])[0] # no compression if record_block_type == b'\x00\x00\x00\x00': record_block = record_block_compressed[8:] # lzo compression elif record_block_type == b'\x01\x00\x00\x00': if lzo is None: print("LZO compression is not supported") break # decompress header = b'\xf0' + pack('>I', decompressed_size) record_block = lzo.decompress(header + record_block_compressed[8:]) # zlib compression elif record_block_type == b'\x02\x00\x00\x00': # decompress record_block = zlib.decompress(record_block_compressed[8:]) # notice that adler32 return signed value assert(adler32 == zlib.adler32(record_block) & 0xffffffff) assert(len(record_block) == decompressed_size) # split record block according to the offset info from key block while i < len(self._key_list): record_start, key_text = self._key_list[i] # reach the end of current record block if record_start - offset >= len(record_block): break # record end index if i < len(self._key_list)-1: record_end = self._key_list[i+1][0] else: record_end = len(record_block) + offset i += 1 record = record_block[record_start-offset:record_end-offset] # convert to utf-8 record = record.decode(self._encoding, errors='ignore').strip(unicode('\x00')).encode('utf-8') # substitute styles if self._substyle and self._stylesheet: record = self._substitute_stylesheet(record) yield key_text, record offset += len(record_block) size_counter += compressed_size assert(size_counter == record_block_size) f.close() if __name__ == '__main__': import sys import os import os.path import argparse import codecs def passcode(s): try: regcode, userid = 
s.split(',') except: raise argparse.ArgumentTypeError("Passcode must be regcode,userid") try: regcode = codecs.decode(regcode, 'hex') except: raise argparse.ArgumentTypeError("regcode must be a 32 bytes hexadecimal string") return regcode, userid parser = argparse.ArgumentParser() parser.add_argument('-x', '--extract', action="store_true", help='extract mdx to source format and extract files from mdd') parser.add_argument('-s', '--substyle', action="store_true", help='substitute style definition if present') parser.add_argument('-d', '--datafolder', default="data", help='folder to extract data files from mdd') parser.add_argument('-e', '--encoding', default="", help='folder to extract data files from mdd') parser.add_argument('-p', '--passcode', default=None, type=passcode, help='register_code,email_or_deviceid') parser.add_argument("filename", nargs='?', help="mdx file name") args = parser.parse_args() # use GUI to select file, default to extract if not args.filename: if sys.hexversion >= 0x03000000: import tkinter as tk import tkinter.filedialog as filedialog else: import Tkinter as tk import tkFileDialog as filedialog root = tk.Tk() root.withdraw() args.filename = filedialog.askopenfilename(parent=root) args.extract = True if not os.path.exists(args.filename): print("Please specify a valid MDX/MDD file") base, ext = os.path.splitext(args.filename) # read mdx file if ext.lower() == os.path.extsep + 'mdx': mdx = MDX(args.filename, args.encoding, args.substyle, args.passcode) if type(args.filename) is unicode: bfname = args.filename.encode('utf-8') else: bfname = args.filename print('======== %s ========' % bfname) print(' Number of Entries : %d' % len(mdx)) for key, value in mdx.header.items(): print(' %s : %s' % (key, value)) else: mdx = None # find companion mdd file mdd_filename = ''.join([base, os.path.extsep, 'mdd']) if os.path.exists(mdd_filename): mdd = MDD(mdd_filename, args.passcode) if type(mdd_filename) is unicode: bfname = mdd_filename.encode('utf-8') 
else: bfname = mdd_filename print('======== %s ========' % bfname) print(' Number of Entries : %d' % len(mdd)) for key, value in mdd.header.items(): print(' %s : %s' % (key, value)) else: mdd = None if args.extract: # write out glos if mdx: output_fname = ''.join([base, os.path.extsep, 'txt']) tf = open(output_fname, 'wb') for key, value in mdx.items(): tf.write(key) tf.write(b'\r\n') tf.write(value) if not value.endswith(b'\n'): tf.write(b'\r\n') tf.write(b'\r\n') tf.close() # write out style if mdx.header.get('StyleSheet'): style_fname = ''.join([base, '_style', os.path.extsep, 'txt']) sf = open(style_fname, 'wb') sf.write(b'\r\n'.join(mdx.header['StyleSheet'].splitlines())) sf.close() # write out optional data files if mdd: datafolder = os.path.join(os.path.dirname(args.filename), args.datafolder) if not os.path.exists(datafolder): os.makedirs(datafolder) for key, value in mdd.items(): fname = key.decode('utf-8').replace('\\', os.path.sep) dfname = datafolder + fname if not os.path.exists(os.path.dirname(dfname)): os.makedirs(os.path.dirname(dfname)) df = open(dfname, 'wb') df.write(value) df.close() pyglossary-3.2.1/pyglossary/plugin_lib/ripemd128.py0000644000175000017500000000677613575553425022672 0ustar emfoxemfox00000000000000""" Copyright by https://github.com/zhansliu/writemdict ripemd128.py - A simple ripemd128 library in pure Python. Supports both Python 2 (versions >= 2.6) and Python 3. 
Usage: from ripemd128 import ripemd128 digest = ripemd128(b"The quick brown fox jumps over the lazy dog") assert(digest == b"\x3f\xa9\xb5\x7f\x05\x3c\x05\x3f\xbe\x27\x35\xb2\x38\x0d\xb5\x96") """ import struct # follows this description: http://homes.esat.kuleuven.be/~bosselae/ripemd/rmd128.txt def f(j, x, y, z): assert(0 <= j and j < 64) if j < 16: return x ^ y ^ z elif j < 32: return (x & y) | (z & ~x) elif j < 48: return (x | (0xffffffff & ~y)) ^ z else: return (x & z) | (y & ~z) def K(j): assert(0 <= j and j < 64) if j < 16: return 0x00000000 elif j < 32: return 0x5a827999 elif j < 48: return 0x6ed9eba1 else: return 0x8f1bbcdc def Kp(j): assert(0 <= j and j < 64) if j < 16: return 0x50a28be6 elif j < 32: return 0x5c4dd124 elif j < 48: return 0x6d703ef3 else: return 0x00000000 def padandsplit(message): """ returns a two-dimensional array X[i][j] of 32-bit integers, where j ranges from 0 to 16. First pads the message to length in bytes is congruent to 56 (mod 64), by first adding a byte 0x80, and then padding with 0x00 bytes until the message length is congruent to 56 (mod 64). Then adds the little-endian 64-bit representation of the original length. Finally, splits the result up into 64-byte blocks, which are further parsed as 32-bit integers. """ origlen = len(message) padlength = 64 - ((origlen - 56) % 64) #minimum padding is 1! 
    message += b"\x80"
    message += b"\x00" * (padlength - 1)
    message += struct.pack("<Q", origlen * 8)
    # split into 64-byte blocks, each parsed as sixteen
    # little-endian 32-bit integers
    return [
        [
            struct.unpack("<L", message[i + j:i + j + 4])[0]
            for j in range(0, 64, 4)
        ]
        for i in range(0, len(message), 64)
    ]


def add(*args):
    return sum(args) & 0xffffffff


def rol(s, x):
    return ((x << s) | (x >> (32 - s))) & 0xffffffff


r = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15,
      7, 4,13, 1,10, 6,15, 3,12, 0, 9, 5, 2,14,11, 8,
      3,10,14, 4, 9,15, 8, 1, 2, 7, 0, 6,13,11, 5,12,
      1, 9,11,10, 0, 8,12, 4,13, 3, 7,15,14, 5, 6, 2]

rp = [ 5,14, 7, 0, 9, 2,11, 4,13, 6,15, 8, 1,10, 3,12,
       6,11, 3, 7, 0,13, 5,10,14,15, 8,12, 4, 9, 1, 2,
      15, 5, 1, 3, 7,14, 6, 9,11, 8,12, 2,10, 0, 4,13,
       8, 6, 4, 1, 3,11,15, 0, 5,12, 2,13, 9, 7,10,14]

s = [11,14,15,12, 5, 8, 7, 9,11,13,14,15, 6, 7, 9, 8,
      7, 6, 8,13,11, 9, 7,15, 7,12,15, 9,11, 7,13,12,
     11,13, 6, 7,14, 9,13,15,14, 8,13, 6, 5,12, 7, 5,
     11,12,14,15,14,15, 9, 8, 9,14, 5, 6, 8, 6, 5,12]

sp = [ 8, 9, 9,11,13,15,15, 5, 7, 7, 8,11,14,14,12, 6,
       9,13,15, 7,12, 8, 9,11, 7, 7,12, 7, 6,15,13,11,
       9, 7,15,11, 8, 6, 6,14,12,13, 5,14,13,13, 7, 5,
      15, 5, 8,11,14,14, 6,14, 6, 9,12, 9,12, 5,15, 8]


def ripemd128(message):
    h0 = 0x67452301
    h1 = 0xefcdab89
    h2 = 0x98badcfe
    h3 = 0x10325476
    X = padandsplit(message)
    for i in range(len(X)):
        (A, B, C, D) = (h0, h1, h2, h3)
        (Ap, Bp, Cp, Dp) = (h0, h1, h2, h3)
        for j in range(64):
            T = rol(s[j], add(A, f(j, B, C, D), X[i][r[j]], K(j)))
            (A, D, C, B) = (D, C, B, T)
            T = rol(sp[j], add(Ap, f(63 - j, Bp, Cp, Dp), X[i][rp[j]], Kp(j)))
            (Ap, Dp, Cp, Bp) = (Dp, Cp, Bp, T)
        T = add(h1, C, Dp)
        h1 = add(h2, D, Ap)
        h2 = add(h3, A, Bp)
        h3 = add(h0, B, Cp)
        h0 = T
    return struct.pack("<LLLL", h0, h1, h2, h3)

pyglossary-3.2.1/pyglossary/plugins/appledict/__init__.py
# -*- coding: utf-8 -*-
# appledict/__init__.py
# Output to Apple Dictionary xml sources for Dictionary Development Kit.
#
# Copyright (C) 2016 Saeed Rasooli (ilius)
# Copyright (C) 2016 Ratijas
# Copyright (C) 2012-2015 Xiaoqiang Wang
#
# This program is a free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, version 3 of the License.
#
# You can get a copy of GNU General Public License along this program
# But you can always get it from http://www.gnu.org/licenses/gpl.txt
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the # GNU General Public License for more details. import sys import os from os.path import abspath, basename import re import pkgutil import shutil from pyglossary.plugins.formats_common import * from ._dict import * import xdxf sys.setrecursionlimit(10000) enable = True format = "AppleDict" description = "AppleDict Source (xml)" extentions = [".xml"] readOptions = [] writeOptions = [ "cleanHTML", # bool "css", # str or None "xsl", # str or None "defaultPrefs", # dict or None, FIXME "prefsHTML", # str or None "frontBackMatter", # str or None "jing", # str or None "indexes", # str or None ] def abspath_or_None(path): return os.path.abspath(os.path.expanduser(path)) if path else None def write_header(glos, toFile, frontBackMatter): # write header toFile.write( '\n' '\n' ) if frontBackMatter: with open(frontBackMatter, "r") as front_back_matter: toFile.write(front_back_matter.read()) def format_default_prefs(defaultPrefs): """ :type defaultPrefs: dict or None as by 14th of Jan 2016, it is highly recommended that prefs should contain {"version": "1"}, otherwise Dictionary.app does not keep user changes between restarts. """ if not defaultPrefs: return "" if not isinstance(defaultPrefs, dict): raise TypeError("defaultPrefs not a dictionary: %r" % defaultPrefs) if str(defaultPrefs.get("version", None)) != "1": log.error("default prefs does not contain {'version': '1'}. prefs " "will not be persistent between Dictionary.app restarts.") return "\n".join("\t\t%s\n\t\t%s" % i for i in sorted(defaultPrefs.items())).strip() def write_css(fname, css_file): with open(fname, "wb") as toFile: if css_file: with open(css_file, "rb") as fromFile: toFile.write(fromFile.read()) else: toFile.write(pkgutil.get_data( __name__, "templates/Dictionary.css", )) def write( glos, dirPath, cleanHTML=True, css=None, xsl=None, defaultPrefs=None, prefsHTML=None, frontBackMatter=None, jing=None, indexes=None, ): """ write glossary to Apple dictionary .xml and supporting files. 
:type glos: pyglossary.glossary.Glossary :type dirPath: str, directory path, must not have extension :type cleanHTML: str :param cleanHTML: pass "yes" to use BeautifulSoup parser. :type css: str or None :param css: path to custom .css file :type xsl: str or None :param xsl: path to custom XSL transformations file. :type defaultPrefs: dict or None :param defaultPrefs: Default prefs in python dictionary literal format, i.e. {"key1": "value1", "key2": "value2", ...}. All keys and values must be quoted strings; not allowed characters (e.g. single/double quotes, equal sign "=", semicolon) must be escaped as hex code according to python string literal rules. :type prefsHTML: str or None :param prefsHTML: path to XHTML file with user interface for dictionary's preferences. refer to Apple's documentation for details. :type frontBackMatter: str or None :param frontBackMatter: path to XML file with top-level tag your front/back matter entry content :type jing: str or None :param jing: pass "yes" to run Jing check on generated XML. :type indexes: str or None :param indexes: Dictionary.app is dummy and by default it don't know how to perform flexible search. we can help it by manually providing additional indexes to dictionary entries. # for now no languages supported yet. """ if not isdir(dirPath): os.mkdir(dirPath) xdxf.xdxf_init() if cleanHTML: BeautifulSoup = get_beautiful_soup() if not BeautifulSoup: log.warning( "cleanHTML option passed but BeautifulSoup not found. 
" + "to fix this run `sudo pip3 install lxml beautifulsoup4 html5lib`" ) else: BeautifulSoup = None fileNameBase = basename(dirPath).replace(".", "_") filePathBase = join(dirPath, fileNameBase) # before chdir (outside indir block) css = abspath_or_None(css) xsl = abspath_or_None(xsl) prefsHTML = abspath_or_None(prefsHTML) frontBackMatter = abspath_or_None(frontBackMatter) generate_id = id_generator() generate_indexes = indexes_generator(indexes) glos.setDefaultDefiFormat("h") myResDir = join(dirPath, "OtherResources") if not isdir(myResDir): os.mkdir(myResDir) with open(filePathBase + ".xml", "w", encoding="utf8") as toFile: write_header(glos, toFile, frontBackMatter) for entryI, entry in enumerate(glos): if entry.isData(): entry.save(myResDir) continue words = entry.getWords() word, alts = words[0], words[1:] defi = entry.getDefi() long_title = _normalize.title_long( _normalize.title(word, BeautifulSoup) ) if not long_title: continue _id = next(generate_id) if BeautifulSoup: title_attr = BeautifulSoup.dammit.EntitySubstitution\ .substitute_xml(long_title, True) else: title_attr = str(long_title) content_title = long_title if entry.getDefiFormat() == "x": defi = xdxf.xdxf_to_html(defi) content_title = None content = format_clean_content(content_title, defi, BeautifulSoup) toFile.write( '\n' % (_id, title_attr) + generate_indexes(long_title, alts, content, BeautifulSoup) + content + "\n\n" ) toFile.write("\n") if xsl: shutil.copy(xsl, myResDir) if prefsHTML: shutil.copy(prefsHTML, myResDir) write_css(filePathBase + ".css", css) with open(join(dirPath, "Makefile"), "w") as toFile: toFile.write( toStr(pkgutil.get_data( __name__, "templates/Makefile", )) % {"dict_name": fileNameBase} ) copyright = glos.getInfo("copyright") if BeautifulSoup: # strip html tags copyright = str(BeautifulSoup.BeautifulSoup( copyright, "lxml" ).text) # if DCSDictionaryXSL provided but DCSDictionaryDefaultPrefs not # present in Info.plist, Dictionary.app will crash. 
with open(filePathBase + ".plist", "w", encoding="utf-8") as toFile: toFile.write( toStr(pkgutil.get_data( __name__, "templates/Info.plist", )) % { "CFBundleIdentifier": fileNameBase.replace(" ", ""), # identifier must be unique "CFBundleDisplayName": glos.getInfo("name"), "CFBundleName": fileNameBase, "DCSDictionaryCopyright": copyright, "DCSDictionaryManufacturerName": glos.getInfo("author"), "DCSDictionaryXSL": basename(xsl) if xsl else "", "DCSDictionaryDefaultPrefs": format_default_prefs(defaultPrefs), "DCSDictionaryPrefsHTML": basename(prefsHTML) if prefsHTML else "", "DCSDictionaryFrontMatterReferenceID": "DCSDictionaryFrontMatterReferenceID\n" "\tfront_back_matter" if frontBackMatter else "", } ) if jing == "yes": from .jing import run as jing_run jing_run(filePathBase + ".xml") pyglossary-3.2.1/pyglossary/plugins/appledict/_dict.py0000644000175000017500000001620413577304507023523 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # appledict/_dict.py # Output to Apple Dictionary xml sources for Dictionary Development Kit. # # Copyright (C) 2016 Saeed Rasooli (ilius) # Copyright (C) 2016 Ratijas # Copyright (C) 2012-2015 Xiaoqiang Wang # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, version 3 of the License. # # You can get a copy of GNU General Public License along this program # But you can always get it from http://www.gnu.org/licenses/gpl.txt # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. import logging import re import string from xml.sax.saxutils import unescape, quoteattr from . 
import _normalize from pyglossary.plugins.formats_common import log, toStr log = logging.getLogger("root") def get_beautiful_soup(): try: import bs4 as BeautifulSoup except ImportError: try: import BeautifulSoup except ImportError: return None if int(BeautifulSoup.__version__.split(".")[0]) < 4: raise ImportError( "BeautifulSoup is too old, required at least version 4, " + "%r found.\n" % BeautifulSoup.__version__ + "Please run `sudo pip3 install lxml beautifulsoup4 html5lib`" ) return BeautifulSoup digs = string.digits + string.ascii_letters def base36(x): """ simplified version of int2base http://stackoverflow.com/questions/2267362/convert-integer-to-a-string-in-a-given-numeric-base-in-python#2267446 """ digits = [] while x: digits.append(digs[x % 36]) x //= 36 digits.reverse() return "".join(digits) def id_generator(): cnt = 1 while True: s = "_%s" % base36(cnt) yield s cnt += 1 def indexes_generator(indexes_lang): """ factory that acts according to glossary language :param indexes_lang: str """ indexer = None """Callable[[Sequence[str], str], Sequence[str]]""" if indexes_lang: from . import indexes as idxs indexer = idxs.languages.get(indexes_lang, None) if not indexer: msg = "extended indexes not supported for the specified language: %s.\n"\ "following languages avaible: %s." 
%\ (indexes_lang, ", ".join(list(idxs.languages.keys()))) log.error(msg) raise ValueError(msg) def generate_indexes(title, alts, content, BeautifulSoup): indexes = [title] indexes.extend(alts) if BeautifulSoup: quoted_title = BeautifulSoup.dammit.EntitySubstitution.substitute_xml(title, True) else: quoted_title = '"%s"' % title.replace(">", ">").replace('"', """) if indexer: indexes = set(indexer(indexes, content)) normal_indexes = set() for idx in indexes: normal = _normalize.title(idx, BeautifulSoup) normal_indexes.add(_normalize.title_long(normal)) normal_indexes.add(_normalize.title_short(normal)) normal_indexes.discard(title) normal_indexes = [s for s in normal_indexes if s.strip()] # skip empty titles. everything could happen. s = "" % (quoted_title, quoted_title) if BeautifulSoup: for idx in normal_indexes: s += "" % ( BeautifulSoup.dammit.EntitySubstitution.substitute_xml(idx, True), quoted_title) else: for idx in normal_indexes: s += '' % ( idx.replace(">", ">").replace('"', """), quoted_title) return s return generate_indexes close_tag = re.compile("<(BR|HR)>", re.IGNORECASE) nonprintable = re.compile("[\x00-\x07\x0e-\x1f]") img_tag = re.compile("", re.IGNORECASE) em0_9_re = re.compile(r'
') em0_9_sub = r'
' em0_9_ex_re = re.compile(r'
') em0_9_ex_sub = r'
' href_re = re.compile(r'''href=(["'])(.*?)\1''') def href_sub(x): href = x.groups()[1] if href.startswith("http"): return x.group() if href.startswith("bword://"): href = href[len("bword://"):] return "href=%s" % quoteattr( "x-dictionary:d:" + unescape( href, {""": '"'}, ) ) def is_green(x): return "color:green" in x.get("style", "") margin_re = re.compile("margin-left:(\d)em") def remove_style(tag, line): s = "".join(tag["style"].replace(line, "").split(";")) if s: tag["style"] = s else: del tag["style"] def format_clean_content(title, body, BeautifulSoup): # heavily integrated with output of dsl reader plugin! # and with xdxf also. """ :param title: str | None """ # class="sec" => d:priority="2" # style="color:steelblue" => class="ex" # class="p" style="color:green" => class="p" # style="color:green" => class="c" # style="margin-left:{}em" => class="m{}" # => # xhtml is strict if BeautifulSoup: soup = BeautifulSoup.BeautifulSoup(body, "lxml", from_encoding="utf-8") # difference between "lxml" and "html.parser" if soup.body: soup = soup.body for tag in soup(class_="sec"): tag["class"].remove("sec") if not tag["class"]: del tag["class"] tag["d:priority"] = "2" for tag in soup(lambda x: "color:steelblue" in x.get("style", "")): remove_style(tag, "color:steelblue") if "ex" not in tag.get("class", []): tag["class"] = tag.get("class", []) + ["ex"] for tag in soup(is_green): remove_style(tag, "color:green") if "p" not in tag.get("class", ""): tag["class"] = tag.get("class", []) + ["c"] for tag in soup(True): if "style" in tag.attrs: m = margin_re.search(tag["style"]) if m: remove_style(tag, m.group(0)) tag["class"] = tag.get("class", []) + ["m" + m.group(1)] for tag in soup.select("[href]"): href = tag["href"] if href.startswith("bword://"): href = href[len("bword://"):] if not (href.startswith("http:") or href.startswith("https:")): tag["href"] = "x-dictionary:d:%s" % href for tag in soup("u"): tag.name = "span" tag["class"] = tag.get("class", []) + ["u"] for tag in 
soup("s"): tag.name = "del" if title: h1 = BeautifulSoup.Tag(name="h1") h1.string = title soup.insert(0, h1) # hence the name BeautifulSoup content = toStr(soup.encode_contents()) else: # somewhat analogue to what BeautifulSoup suppose to do body = em0_9_re.sub(em0_9_sub, body) body = em0_9_ex_re.sub(em0_9_ex_sub, body) body = href_re.sub(href_sub, body) body = body \ .replace('', '') \ .replace('', '') \ .replace('', '') \ .replace('', '') \ .replace('', '').replace('', '') \ .replace('', '').replace('', '') # nice header to display content = "
<h1>%s</h1>%s" % (title, body) if title else body

	content = close_tag.sub("<\g<1> />", content)
	content = img_tag.sub("<img \g<1>/>", content)
	content = content.replace("&nbsp;", "&#160;")
	content = nonprintable.sub("", content)
	return content

pyglossary-3.2.1/pyglossary/plugins/appledict/_normalize.py
# -*- coding: utf-8 -*-
# appledict/_normalize.py
# Output to Apple Dictionary xml sources for Dictionary Development Kit.
#
# Copyright (C) 2016 Saeed Rasooli (ilius)
# Copyright (C) 2016 Ratijas
# Copyright (C) 2012-2015 Xiaoqiang Wang
#
# This program is a free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, version 3 of the License.
#
# You can get a copy of GNU General Public License along this program
# But you can always get it from http://www.gnu.org/licenses/gpl.txt
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.

import re

_spaces_re = re.compile(r"[ \t\n]{2,}")
_title_re = re.compile('<[^<]+?>|"|[<>]|\xef\xbb\xbf')
_title_short_re = re.compile(r"\[.*?\]")


def spaces(s):
	"""
	strip off leading and trailing whitespaces and
	replace contiguous whitespaces with just one space.
""" return _spaces_re.sub(" ", s.strip()) _brackets_sub = ( ( re.compile(r"( *)\{( *)\\\[( *)"), # { \[ r"\1\2\3[", ), ( re.compile(r"( *)\\\]( *)\}( *)"), # \] } r"]\1\2\3", ), ( re.compile(r"( *)\{( *)\(( *)\}( *)"), # { ( } r"\1\2\3\4[", ), ( re.compile(r"( *)\{( *)\)( *)\}( *)"), # { ) } r"]\1\2\3\4", ), ( re.compile(r"( *)\{( *)\(( *)"), # { ( r"\1\2\3[", ), ( re.compile(r"( *)\)( *)\}( *)"), # ) } r"]\1\2\3", ), ( re.compile(r"( *)\{( *)"), # { r"\1\2[", ), ( re.compile(r"( *)\}( *)"), # } r"]\1\2", ), ( re.compile(r"{.*?}"), r"", ), ) def brackets(s): r""" replace all crazy brackets with square ones []. following combinations are to replace: { \[ ... \] } { ( } ... { ) } { ( ... ) } { ... } """ if "{" in s: for exp, sub in _brackets_sub: s = exp.sub(sub, s) return spaces(s) def truncate(text, length=449): """ trunct a string to given length :param str text: :return: truncated text :rtype: str """ content = re.sub("(\t|\n|\r)", " ", text) if len(text) > length: # find the next space after max_len chars (do not break inside a word) pos = content[:length].rfind(" ") if pos == -1: pos = length text = text[:pos] return text def title(title, BeautifulSoup): """ strip double quotes and html tags. 
""" if BeautifulSoup: title = title.replace("\xef\xbb\xbf", "") if len(title) > 1: # BeautifulSoup has a bug when markup <= 1 char length title = BeautifulSoup.BeautifulSoup( title, "html", ).get_text(strip=True) else: title = _title_re.sub("", title) title = title.replace("&", "&") title = brackets(title) title = truncate(title, 1126) return title def title_long(s): """ title_long("str[ing]") -> "string" """ return s.replace("[", "").replace("]", "") def title_short(s): """ title_short("str[ing]") -> "str" """ return spaces(_title_short_re.sub("", s)) pyglossary-3.2.1/pyglossary/plugins/appledict/indexes/0000755000175000017500000000000013577304644023525 5ustar emfoxemfox00000000000000pyglossary-3.2.1/pyglossary/plugins/appledict/indexes/__init__.py0000644000175000017500000000302713575553425025641 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # appledict/indexes/__init__.py # # Copyright (C) 2016 Ratijas # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, version 3 of the License. # # You can get a copy of GNU General Public License along this program # But you can always get it from http://www.gnu.org/licenses/gpl.txt # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. """ extended indexes generation with respect to source language. """ import os import pkgutil from pyglossary.plugins.formats_common import log __all__ = ["languages", "log"] languages = {} """ Dict[str, Callable[[Sequence[str], str], Sequence[str]]] submodules must register languages by adding (language name -> function) pairs to the mapping. 
function must follow the signature below:

	:param titles: flat iterable of title and alternative titles
	:param content: cleaned entry content
	:return: iterable of indexes (str).

use
```
from . import languages
# or
from appledict.indexes import languages
```
"""

here = os.path.dirname(os.path.abspath(__file__))

for _, module, _ in pkgutil.iter_modules([here]):
	try:
		__import__("%s.%s" % (__name__, module))
	except ImportError:
		log.exception(
			"error while importing indexes plugin %s" % module
		)

pyglossary-3.2.1/pyglossary/plugins/appledict/indexes/ru.py
# -*- coding: utf-8 -*-
# appledict/indexes/ru.py
#
# Copyright (C) 2016 Ratijas
#
# This program is a free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, version 3 of the License.
#
# You can get a copy of GNU General Public License along this program
# But you can always get it from http://www.gnu.org/licenses/gpl.txt
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.

"""
Russian indexes based on pymorphy.
"""

from . import languages
from pyglossary.plugins.formats_common import log

try:
	import pymorphy2
except ImportError:
	log.error("""module pymorphy2 is required to build extended russian indexes. \
you can download it here: http://pymorphy2.readthedocs.org/en/latest/. \
or run `pip3 install pymorphy2`.
""")
	raise
else:
	morphy = pymorphy2.MorphAnalyzer()


def ru(titles, _):
	"""
	gives a set of all declensions, cases and other forms of word `title`.
	note that it works only if title is one word.
:type titles: Sequence[str] :rtype: Set[str] """ indexes = set() indexes_norm = set() for title in titles: # in-place modification _ru(title, indexes, indexes_norm) return list(sorted(indexes)) def _ru(title, a, a_norm): # uppercase abbreviature if title.isupper(): return title_norm = normalize(title) # feature: put dot at the end to match only this word a.add(title) a.add(title + ".") a_norm.add(title_norm) # decline only one-word titles if len(title.split()) == 1: normal_forms = morphy.parse(title) if len(normal_forms) > 0: # forms of most probable match normal_form = normal_forms[0] for x in normal_form.lexeme: word = x.word # Apple Dictionary Services see no difference between # "й" and "и", "ё" and "е", so we're trying to avoid # "* Duplicate index. Skipped..." warning. # new: return indexes with original letters but check for # occurence against "normal forms". word_norm = normalize(word) if word_norm not in a_norm: a.add(word) a_norm.add(word_norm) def normalize(word): return word.lower().replace("й", "и").replace("ё", "е").replace("-", " ") languages["ru"] = ru pyglossary-3.2.1/pyglossary/plugins/appledict/indexes/zh.py0000644000175000017500000000505313577304507024521 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # appledict/indexes/zh.py # # Copyright (C) 2016 Ratijas # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, version 3 of the License. # # You can get a copy of GNU General Public License along this program # But you can always get it from http://www.gnu.org/licenses/gpl.txt # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. """ Chinese wildcard and pinyin indexes. """ import re import bs4 import colorize_pinyin as color from . 
import languages, log def zh(titles, content): """ chinese indexes. assuming that content is HTML and pinyin is inside second tag (first is

), we can try to parse pinyin and generate indexes with pinyin subwords separated by whitespaces - pinyin itself - pinyin with diacritics replaced by tone numbers multiple pronunciations separated by comma or semicolon are supported. """ indexes = set() for title in titles: # feature: put dot at the end to match only this word indexes.update({title, title + "。"}) # remove all non hieroglyph title = re.sub(r"[^\u4e00-\u9fff]", "", title) indexes.add(title) indexes.update(pinyin_indexes(content)) return indexes def pinyin_indexes(content): pinyin = find_pinyin(content) # assert type(pinyin) == unicode if not pinyin or pinyin == "_": return () indexes = set() # multiple pronunciations for pinyin in re.split(r",|;", pinyin): # find all pinyin ranges, use them to rip pinyin out py = [ r._slice(pinyin) for r in color.ranges_of_pinyin_in_string(pinyin) ] # maybe no pinyin here if not py: return () # just pinyin, with diacritics, separated by whitespace indexes.add("%s." % color.utf(" ".join(py))) # pinyin with diacritics replaced by tone numbers indexes.add("%s." % color.utf(" ".join( ["%s%d" % ( color.lowercase_string_by_removing_pinyin_tones(p), color.determine_tone(p)) for p in py]))) return indexes def find_pinyin(content): # assume that content is HTML and pinyin is inside second tag # (first is

) soup = bs4.BeautifulSoup(content.splitlines()[0], "lxml") if soup.body: soup = soup.body children = soup.children try: next(children) pinyin = next(children) except StopIteration: return None return pinyin.text languages["zh"] = zh pyglossary-3.2.1/pyglossary/plugins/appledict/jing/0000755000175000017500000000000013577304644023015 5ustar emfoxemfox00000000000000pyglossary-3.2.1/pyglossary/plugins/appledict/jing/DictionarySchema/0000755000175000017500000000000013577304644026243 5ustar emfoxemfox00000000000000pyglossary-3.2.1/pyglossary/plugins/appledict/jing/DictionarySchema/AppleDictionarySchema.rng0000755000175000017500000000421413575553425033170 0ustar emfoxemfox00000000000000 pyglossary-3.2.1/pyglossary/plugins/appledict/jing/DictionarySchema/modules/0000755000175000017500000000000013577304644027713 5ustar emfoxemfox00000000000000pyglossary-3.2.1/pyglossary/plugins/appledict/jing/DictionarySchema/modules/dict-struct.rng0000755000175000017500000000603113575553425032674 0ustar emfoxemfox00000000000000 1 pyglossary-3.2.1/pyglossary/plugins/appledict/jing/__init__.py0000644000175000017500000000050113575553425025123 0ustar emfoxemfox00000000000000"""checking XML files with Apple Dictionary Schema. this module can be run from command line with only argument -- file to be checked. otherwise, you need to import this module and call `run` function with the filename as its only argument. """ __all__ = ['run', 'JingTestError'] from .main import run, JingTestError pyglossary-3.2.1/pyglossary/plugins/appledict/jing/__main__.py0000644000175000017500000000061713575553425025114 0ustar emfoxemfox00000000000000"""main entry point""" import logging import os import sys sys.path.append(os.path.abspath(os.path.dirname(__file__))) from . 
import main log = logging.getLogger('root') console_output_handler = logging.StreamHandler(sys.stderr) console_output_handler.setFormatter(logging.Formatter( '%(asctime)s: %(message)s' )) log.addHandler(console_output_handler) log.setLevel(logging.INFO) main.main() pyglossary-3.2.1/pyglossary/plugins/appledict/jing/jing/0000755000175000017500000000000013577304644023744 5ustar emfoxemfox00000000000000pyglossary-3.2.1/pyglossary/plugins/appledict/jing/jing/readme.html0000755000175000017500000000526013575553425026076 0ustar emfoxemfox00000000000000 Jing version 20091111

Jing version 20091111

This directory contains version 20091111 of Jing, a validator for RELAX NG and other schema languages.

The directory bin contains jing.jar, which contains the code for Jing, ready to use with a Java runtime. For more information on how to use Jing, see this document.

Apart from jing.jar, the bin directory contains some third-party jar files, which are used for XML parsing (under a pre-1.4 JRE that does not provide the Java XML parsing extension) and for validating with schema languages other than RELAX NG:

saxon.jar
Comes from the Saxon 6.5.5 distribution. Used for Schematron 1.5 validation.
xercesImpl.jar
xml-apis.jar
Come from the Xerces2 Java 2.9.1 distribution. Used for W3C XML Schema validation and for XML parsing. Xerces2 Java is under the Apache License Version 2.0, which requires the following notice:
   Apache Xerces Java
   Copyright 1999-2007 The Apache Software Foundation

   This product includes software developed at
   The Apache Software Foundation (http://www.apache.org/).

   Portions of this software were originally based on the following:
     - software copyright (c) 1999, IBM Corporation., http://www.ibm.com.
     - software copyright (c) 1999, Sun Microsystems., http://www.sun.com.
     - voluntary contributions made by Paul Eng on behalf of the 
       Apache Software Foundation that were originally developed at iClick, Inc.,
       software copyright (c) 1999.
isorelax.jar
Comes from ISO RELAX 2004/11/11 distribution. Provides a bridge to validators that use the JARV interface.

The file src.zip contains the Java source code. This is for reference purposes, and doesn't contain the supporting files, such as build scripts and test cases, that are needed for working conveniently with the source code. If you want to make changes to Jing, you should check out the source code and supporting files from the project's Subversion repository.

pyglossary-3.2.1/pyglossary/plugins/appledict/jing/main.py0000644000175000017500000000413713575553425024321 0ustar emfoxemfox00000000000000"""Jing, a validator for RELAX NG and other schema languages.""" import logging from os import path import subprocess import sys __all__ = ['JingTestError', 'run', 'main'] log = logging.getLogger('root') log.setLevel(logging.DEBUG) class JingTestError(subprocess.CalledProcessError): """this exception raised when jing test failed, e.g. returned non-zero. the exit status will be stored in the `returncode` attribute. the `output` attribute also will store the output. """ def __init__(self, returncode, cmd, output): super(JingTestError, self).__init__(returncode, cmd, output) def __str__(self): return 'Jing check failed with exit code %d:\n%s\n%s' %\ (self.returncode, '-' * 80, self.output) def run(filename): """run(filename) check whether the file named `filename` conforms to `AppleDictionarySchema.rng`. :returns: None :raises: JingTestError """ here = path.abspath(path.dirname(__file__)) filename = path.abspath(filename) jing_jar_path = path.join(here, 'jing', 'bin', 'jing.jar') rng_path = path.join(here, 'DictionarySchema', 'AppleDictionarySchema.rng') # -Xmxn Specifies the maximum size, in bytes, of the memory allocation # pool. # -- from `man 1 java` args = ['java', '-Xmx2G', '-jar', jing_jar_path, rng_path, filename] cmd = ' '.join(args) log.info('running Jing check:') log.info(cmd) log.info('...') pipe = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT) returncode = pipe.wait() output = pipe.communicate()[0] if returncode != 0: if returncode < 0: log.error('Jing was terminated by signal %d' % -returncode) elif returncode > 0: log.error('Jing returned %d' % returncode) raise JingTestError(returncode, cmd, output) else: log.info('Jing check successfully passed!') def main(): """a command-line utility, runs Jing test on given dictionary XML file with Apple Dictionary Schema. 
""" if len(sys.argv) < 2: prog_name = path.basename(sys.argv[0]) print("usage:\n %s filename" % prog_name) exit(1) try: run(sys.argv[1]) except JingTestError as e: log.fatal(str(e)) exit(e.returncode) pyglossary-3.2.1/pyglossary/plugins/appledict/templates/0000755000175000017500000000000013577304644024064 5ustar emfoxemfox00000000000000pyglossary-3.2.1/pyglossary/plugins/appledict/templates/Dictionary.css0000644000175000017500000000210213577304507026674 0ustar emfoxemfox00000000000000@charset "UTF-8"; @namespace d url(http://www.apple.com/DTDs/DictionaryService-1.0.rng); d|entry { } h1 { font-size: 150%; } h3 { font-size: 100%; } .ex, .m, .m0, .m1, .m2, .m3, .m4, .m5, .m6, .m7, .m8, .m9 { display: block; } .m { margin-left: 0em; } .m0 { margin-left: 0em; } .m1 { margin-left: 1em; } .m2 { margin-left: 2em; } .m3 { margin-left: 3em; } .m4 { margin-left: 4em; } .m5 { margin-left: 5em; } .m6 { margin-left: 6em; } .m7 { margin-left: 7em; } .m8 { margin-left: 8em; } .m9 { margin-left: 9em; } .ex + br, .k + br { display: none; } .c { color: green; } .p { font-style: italic; color: green; } .ex { color: #666; } .u { text-decoration: underline; } /* xdxf support */ .k { color: black; font-weight: bold; display: block; } .tr { color: black; } .abr { color: #008000; font-style: italic; } .hideextra .extra { display: none; } .stress { color: #FF0000; } .kref { color: #000080; text-decoration: none; } .pr { color: #000080; white-space: nowrap; text-decoration: none; overflow: hidden; text-overflow: ellipsis; padding-right: 1ex; } pyglossary-3.2.1/pyglossary/plugins/appledict/templates/Info.plist0000644000175000017500000000171213575553425026036 0ustar emfoxemfox00000000000000 CFBundleDevelopmentRegion English CFBundleIdentifier %(CFBundleIdentifier)s CFBundleDisplayName %(CFBundleDisplayName)s CFBundleName %(CFBundleName)s CFBundleShortVersionString 1.0 DCSDictionaryCopyright %(DCSDictionaryCopyright)s. DCSDictionaryManufacturerName %(DCSDictionaryManufacturerName)s. 
DCSDictionaryXSL %(DCSDictionaryXSL)s DCSDictionaryDefaultPrefs %(DCSDictionaryDefaultPrefs)s DCSDictionaryPrefsHTML %(DCSDictionaryPrefsHTML)s %(DCSDictionaryFrontMatterReferenceID)s pyglossary-3.2.1/pyglossary/plugins/appledict/templates/Makefile0000644000175000017500000000241013577304507025517 0ustar emfoxemfox00000000000000# # Makefile # # # ########################### # You need to edit these values. DICT_NAME = "%(dict_name)s" DICT_SRC_PATH = "%(dict_name)s.xml" CSS_PATH = "%(dict_name)s.css" PLIST_PATH = "%(dict_name)s.plist" DICT_BUILD_OPTS = # Suppress adding supplementary key. # DICT_BUILD_OPTS = -s 0 # Suppress adding supplementary key. ########################### # The DICT_BUILD_TOOL_DIR value is used also in "build_dict.sh" script. # You need to set it when you invoke the script directly. DICT_BUILD_TOOL_DIR = "/Developer/Extras/Dictionary Development Kit" DICT_BUILD_TOOL_BIN = "$(DICT_BUILD_TOOL_DIR)/bin" ########################### DICT_DEV_KIT_OBJ_DIR = ./objects export DICT_DEV_KIT_OBJ_DIR DESTINATION_FOLDER = ~/Library/Dictionaries RM = /bin/rm ########################### all: "$(DICT_BUILD_TOOL_BIN)/build_dict.sh" $(DICT_BUILD_OPTS) $(DICT_NAME) $(DICT_SRC_PATH) $(CSS_PATH) $(PLIST_PATH) echo "Done." install: echo "Installing into $(DESTINATION_FOLDER)". mkdir -p $(DESTINATION_FOLDER) ditto --noextattr --norsrc $(DICT_DEV_KIT_OBJ_DIR)/$(DICT_NAME).dictionary $(DESTINATION_FOLDER)/$(DICT_NAME).dictionary touch $(DESTINATION_FOLDER) echo "Done." echo "To test the new dictionary, try Dictionary.app." 
clean: $(RM) -rf $(DICT_DEV_KIT_OBJ_DIR) pyglossary-3.2.1/pyglossary/plugins/babylon_bdc.py0000644000175000017500000000026313577304507022730 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- from formats_common import * enable = False format = "BabylonBdc" description = "Babylon (bdc)" extentions = [".bdc"] readOptions = [] writeOptions = [] pyglossary-3.2.1/pyglossary/plugins/babylon_bgl/0000755000175000017500000000000013577304644022373 5ustar emfoxemfox00000000000000pyglossary-3.2.1/pyglossary/plugins/babylon_bgl/__init__.py0000644000175000017500000000252613577304507024507 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # # Copyright © 2008-2016 Saeed Rasooli (ilius) # Copyright © 2011-2012 kubtek # This file is part of PyGlossary project, http://github.com/ilius/pyglossary # Thanks to Raul Fernandes and Karl Grill # for reverse engineering # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 3, or (at your option) # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program. Or on Debian systems, from /usr/share/common-licenses/GPL # If not, see . 
from formats_common import * from .bgl_reader import BglReader as Reader from .bgl_reader import readOptions enable = True format = "BabylonBgl" description = "Babylon (bgl)" extentions = [".bgl"] writeOptions = [] supportsAlternates = True # progressbar = DEFAULT_YES # FIXME: document type of read/write options # (that would be specified in command line) pyglossary-3.2.1/pyglossary/plugins/babylon_bgl/bgl_charset.py0000644000175000017500000000276013575553425025230 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # # Copyright © 2008-2016 Saeed Rasooli (ilius) # Copyright © 2011-2012 kubtek # This file is part of PyGlossary project, http://github.com/ilius/pyglossary # Thanks to Raul Fernandes and Karl Grill # for reverse engineering # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 3, or (at your option) # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program. Or on Debian systems, from /usr/share/common-licenses/GPL # If not, see . 
charsetByCode = { 0x41: "cp1252", # Default, 0x41 0x42: "cp1252", # Latin, 0x42 0x43: "cp1250", # Eastern European, 0x43 0x44: "cp1251", # Cyrillic, 0x44 0x45: "cp932", # Japanese, 0x45 0x46: "cp950", # Traditional Chinese, 0x46 0x47: "cp936", # Simplified Chinese, 0x47 0x48: "cp1257", # Baltic, 0x48 0x49: "cp1253", # Greek, 0x49 0x4A: "cp949", # Korean, 0x4A 0x4B: "cp1254", # Turkish, 0x4B 0x4C: "cp1255", # Hebrew, 0x4C 0x4D: "cp1256", # Arabic, 0x4D 0x4E: "cp874", # Thai, 0x4E } pyglossary-3.2.1/pyglossary/plugins/babylon_bgl/bgl_info.py0000644000175000017500000002104113577304507024520 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # # Copyright © 2008-2016 Saeed Rasooli (ilius) # Copyright © 2011-2012 kubtek # This file is part of PyGlossary project, http://github.com/ilius/pyglossary # Thanks to Raul Fernandes and Karl Grill # for reverse engineering # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 3, or (at your option) # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program. Or on Debian systems, from /usr/share/common-licenses/GPL # If not, see . 
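The charsetByCode table above maps Babylon's one-byte charset codes (the `sourceCharset`/`targetCharset` info records, codes 0x1a and 0x1b) to Python codec names. A minimal sketch of how such a code resolves to a decoded definition; `decode_definition` and the sample bytes are illustrative, not PyGlossary API:

```python
# Sketch: resolve a Babylon charset code to a Python codec and decode
# raw definition bytes with it.  The table mirrors a few entries of
# charsetByCode above; the fallback to cp1252 is an assumption.
charset_by_code = {
    0x41: "cp1252",  # Default
    0x44: "cp1251",  # Cyrillic
    0x49: "cp1253",  # Greek
}

def decode_definition(charset_code, raw):
    """Decode raw definition bytes using a Babylon charset code."""
    encoding = charset_by_code.get(charset_code, "cp1252")
    return raw.decode(encoding)

print(decode_definition(0x44, b"\xcf\xf0\xe8\xe2\xe5\xf2"))  # Привет
```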
from .bgl_language import languageByCode from .bgl_charset import charsetByCode from pyglossary.plugins.formats_common import log import pyglossary.gregorian as gregorian from pyglossary.text_utils import ( binStrToInt, ) def decodeBglBinTime(b_value): jd1970 = gregorian.to_jd(1970, 1, 1) djd, hm = divmod(binStrToInt(b_value), 24*60) year, month, day = gregorian.jd_to(djd + jd1970) hour, minute = divmod(hm, 60) return "%.2d/%.2d/%.2d, %.2d:%.2d" % (year, month, day, hour, minute) def languageInfoDecode(b_value): """ returns BabylonLanguage instance """ intValue = binStrToInt(b_value) try: return languageByCode[intValue] except KeyError: # dict lookup raises KeyError, not IndexError log.warning("read_type_3: unknown language code = %s" % intValue) return def charsetInfoDecode(b_value): value = b_value[0] try: return charsetByCode[value] except KeyError: log.warning("read_type_3: unknown charset %s" % value) def aboutInfoDecode(b_value): if not b_value: return aboutExt, _, aboutContents = b_value.partition(b"\x00") if not aboutExt: log.warning("read_type_3: about: no file extension") return return { "about_extension": aboutExt, "about": aboutContents, } def utf16InfoDecode(b_value): """ b_value is byte array returns str, or None (on errors) block type = 3 block format: <2 byte code1><2 byte code2> if code2 == 0: then the block ends if code2 == 1: then the block continues as follows: <4 byte len1> \x00 \x00 len1 - length of message in 2-byte chars """ if b_value[0] != 0: log.warning( "utf16InfoDecode: b_value=%s, null expected at 0" % list(b_value) ) return if b_value[1] == 0: if len(b_value) > 2: log.warning( "utf16InfoDecode: unexpected b_value size: %s" % len(b_value) ) return elif b_value[1] > 1: log.warning( "utf16InfoDecode: b_value=%s, unexpected byte at 1" % list(b_value) ) return # now b_value[1] == 1 size = 2 * binStrToInt(b_value[2:6]) if tuple(b_value[6:8]) != (0, 0): log.warning( "utf16InfoDecode: b_value=%s, null expected at 6:8" % list(b_value) ) if size != len(b_value)-8: log.warning(
"utf16InfoDecode: b_value=%s, size does not match" % list(b_value) ) return b_value[8:].decode("utf16") # str def flagsInfoDecode(b_value): """ returns a dict with these keys: utf8Encoding when this flag is set utf8 encoding is used for all articles when false, the encoding is set according to the source and target alphabet spellingAlternatives determines whether the glossary offers spelling alternatives for searched terms caseSensitive defines if the search for terms in this glossary is case sensitive see code 0x20 as well """ flags = binStrToInt(b_value) return { "utf8Encoding": (flags & 0x8000 != 0), "spellingAlternatives": (flags & 0x10000 == 0), "caseSensitive": (flags & 0x1000 != 0), } infoKeysByCode = { 0x01: "title", # glossary name 0x02: "author", # glossary author name, a list of "|"-separated values 0x03: "email", # glossary author e-mail 0x04: "copyright", # copyright message 0x07: "sourceLang", 0x08: "targetLang", 0x09: "description", # Glossary description 0x0a: "browsingEnabled", # 0: browsing disabled, 1: browsing enabled 0x0b: "iconData", # FIXME 0x0c: "bgl_numEntries", 0x11: "flags", # the value is a dict 0x14: "creationTime", 0x1a: "sourceCharset", 0x1b: "targetCharset", 0x1c: "middleUpdated", 0x2c: "purchaseLicenseMsg", 0x2d: "licenseExpiredMsg", 0x2e: "purchaseAddress", 0x30: "titleWide", 0x31: "authorWide", 0x33: "lastUpdated", 0x3b: "contractions", 0x3d: "fontName", # contains a value like "Arial Unicode MS" or "Tahoma" 0x41: "about", # (aboutExtention, aboutContents) 0x43: "length", # the length of the substring match in a term } infoKeyDecodeMethods = { "sourceLang": languageInfoDecode, "targetLang": languageInfoDecode, "browsingEnabled": lambda b_value: (b_value[0] != 0), "bgl_numEntries": binStrToInt, "creationTime": decodeBglBinTime, "middleUpdated": decodeBglBinTime, "lastUpdated": decodeBglBinTime, "sourceCharset": charsetInfoDecode, "targetCharset": charsetInfoDecode, "about": aboutInfoDecode, "length": binStrToInt, 
"purchaseLicenseMsg": utf16InfoDecode, "licenseExpiredMsg": utf16InfoDecode, "titleWide": utf16InfoDecode, "authorWide": utf16InfoDecode, # a list of "|"-separated values "flags": flagsInfoDecode, } """ bgl_numEntries (0x0c): bgl_numEntries does not always match the number of entries in the dictionary, but it's close to it. the difference is usually +- 1 or 2, in rare cases may be 9, 29 and more length (0x43) The length of the substring match in a term. For example, if your glossary contains the term "Dog" and the substring length is 2, search of the substrings "Do" or "og" will retrieve the term dog. Use substring length 0 for exact match. contractions (0x3b): contains a value like this: V-0#Verb|V-0.0#|V-0.1#Infinitive|V-0.1.1#|V-1.0#|V-1.1#|V-1.1.1#Present Simple|V-1.1.2#Present Simple (3rd pers. sing.)|V-2.0#|V-2.1#|V-2.1.1#Past Simple|V-3.0#|V-3.1#|V-3.1.1#Present Participle|V-4.0#|V-4.1#|V-4.1.1#Past Participle|V-5.0#|V-5.1#|V-5.1.1#Future|V2-0#|V2-0.0#|V2-0.1#Infinitive|V2-0.1.1#|V2-1.0#|V2-1.1#|V2-1.1.1#Present Simple (1st pers. sing.)|V2-1.1.2#Present Simple (2nd pers. sing. & plural forms)|V2-1.1.3#Present Simple (3rd pers. sing.)|V2-2.0#|V2-2.1#|V2-2.1.1#Past Simple (1st & 3rd pers. sing.)|V2-2.1.2#Past Simple (2nd pers. sing. & plural forms)|V2-3.0#|V2-3.1#|V2-3.1.1#Present Participle|V2-4.0#|V2-4.1#|V2-4.1.1#Past Participle|V2-5.0#|V2-5.1#|V2-5.1.1#Future||N-0#Noun|N-1.0#|N-1.1#|N-1.1.1#Singular|N-2.0#|N-2.1#|N-2.1.1#Plural|N4-1.0#|N4-1.1#|N4-1.1.1#Singular Masc.|N4-1.1.2#Singular Fem.|N4-2.0#|N4-2.1#|N4-2.1.1#Plural Masc.|N4-2.1.2#Plural Fem.||ADJ-0#Adjective|ADJ-1.0#|ADJ-1.1#|ADJ-1.1.1#Adjective|ADJ-1.1.2#Comparative|ADJ-1.1.3#Superlative|| value format: ( "#" [] "|")+ The value is in the second language, that is for Babylon Russian-English.BGL the value is in Russian, for Babylon English-Spanish.BGL the value is Spanish (I guess), etc. 
Glossary manual file (0x41) additional information about the dictionary in .txt format this may be short info like this: Biology Glossary Author name: Hafez Divandari Author email: hafezdivandari@gmail.com ------------------------------------------- A functional glossary for translating English biological articles to fluent Farsi ------------------------------------------- Copyright (c) 2009 All rights reserved. in .pdf format this may be a quite large document (about 30 pages), an introduction to the dictionary. It describes the structure of an article, the editors, and how to use the dictionary. format "\x00" file extension may be: ".txt", ".pdf" purchaseLicenseMsg (0x2c): contains a value like this: In order to view this glossary, you must purchase a license.
Click here to purchase. licenseExpiredMsg (0x2d): contains a value like this: Your license for this glossary has expired. In order to view this glossary, you must have a valid license.
Renew your license today. purchaseAddress (0x2e): contains a value like this: http://www.babylon.com/redirects/purchase.cgi?type=169&trid=BPCOT or mailto:larousse@babylon.com #elif keyCode==0x20: # # 0x30 - case sensitive search is disabled # # 0x31 - case sensitive search is enabled # # see code 0x11 as well # if b_value: # value = b_value[0] """ pyglossary-3.2.1/pyglossary/plugins/babylon_bgl/bgl_language.py0000644000175000017500000003104013575553425025353 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # # Copyright © 2008-2016 Saeed Rasooli (ilius) # Copyright © 2011-2012 kubtek # This file is part of PyGlossary project, http://github.com/ilius/pyglossary # Thanks to Raul Fernandes and Karl Grill # for reverse engineering # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 3, or (at your option) # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program. Or on Debian systems, from /usr/share/common-licenses/GPL # If not, see . """ language properties In this short note we describe how Babylon selects the encoding for key words, alternates and definitions. There are source and target encodings. The source encoding is used to encode keys and alternates, the target encoding is used to encode definitions. The source encoding is selected based on the source language of the dictionary, the target encoding is tied to the target language. Babylon Glossary Builder allows you to specify source and target languages. 
If you open a Builder project (a file with .gpr extension) in a text editor, you should find the following elements: Latin English Latin English Here bab:SourceLanguage is the source language that you select in the builder wizard, bab:SourceCharset is the corresponding charset. bab:TargetLanguage - target language, bab:TargetCharset - corresponding charset. Unfortunately, builder does not tell us what encoding corresponds to charset, but we can detect it. A few words about how definitions are encoded. If all chars of the definition fall into the target charset, Babylon uses that charset to encode the definition. If at least one char does not fall into the target charset, Babylon uses utf-8 encoding, wrapping the definition into and tags. You can make Babylon use utf-8 encoding for the whole dictionary, in that case all definitions, keys and alternates are encoded with utf-8. See Babylon Glossary Builder wizard, Glossary Properties tab, Advanced button, Use UTF-8 encoding check box. Definitions are not augmented with extra markup in this case, that is you'll not find charset tags in definitions. How can you tell which encoding was used for a particular definition in a .bgl file? You need to check the following conditions. Block type 3, code 0x11. If 0x8000 bit is set, the whole dictionary uses utf-8 encoding. If the definition starts with , that definition uses utf-8 encoding. Otherwise you need to consult the target encoding. Block type 3, code 0x1b. That field normally contains a 1-byte code of the target encoding. Codes fill the range of 0x41 to 0x4e. Babylon Builder generates codes 0x42 - 0x4e. How to generate code 0x41? Occasionally you may find that the field value is four zero bytes. In this case, I guess, the default encoding for the target language is used. Block type 3, code 0x08. That field contains a 4-byte code of the target language. The first three bytes are always zero, the last byte is the code. 
Playing with Babylon Glossary builder we can find language codes corresponding to the target language. The language codes fill the range of 0 to 0x3d. How to detect the target encoding? Here is the technique I've used. - Create a babylon glossary source file (a file with .gls extension) with the following contents. Start the file with utf-8 BOM for the builder to recognize the utf-8 encoding. Use the unicode code point as the key, and a single unicode char encoded in utf-8 as the definition. Create keys for all code points in the range 32 - 0x10000, or you may use a wider range. We do not use code points in the range 0-31, since they are control chars. You should skip the following three chars: & < >. Since the definition is supposed to contain html, these chars are replaced by & < > respectively. You should skip the char $ as well, it has special meaning in definitions (?). Skip all code points that cannot be encoded in utf-8 (not all code points in the range 32-0x10000 represent valid chars). - Now that you have a glossary source file, process it with builder selecting the desired target language. Make sure the "use utf-8" option is not set. You'll get a .bgl file. - Process the generated .bgl file with pyglossary. Skip all definitions that start with tag. Try to decode definitions using different encodings and match the result with the real value (key - code point char code). Thus you'll find the encoding having the best match. For example, you may do the following. Loop over all available encodings, loop over all definitions in the dictionary. Count the number of definitions that do not start with charset tag - total. Among them count the number of definitions that were correctly decoded - success. The encoding where total == success is the target encoding. There are a few problems I encountered. It looks like Python does not correctly implement cp932 and cp950 encodings. 
For Japanese charset I got 99.12% match, and for Traditional Chinese charset I got even less - 66.97%. To confirm my guess that Japanese is cp932 and Traditional Chinese is cp950 I built a C++ utility that worked on the data extracted from the .bgl dictionary. I used the WideCharToMultiByte function for conversion. The C++ utility confirmed the cp932 and cp950 encodings; I got a 100% match. """ class BabylonLanguage(object): """ Babylon language properties. name - bab:SourceLanguage, bab:TargetLanguage .gpr tags (English, French, Japanese) charset - bab:SourceCharset, bab:TargetCharset .gpr tags (Latin, Arabic, Cyrillic) encoding - Windows code page (cp1250, cp1251, cp1252) code - value of the type 3, code in .bgl file """ def __init__(self, name, charset, encoding, code): self.name = name self.charset = charset self.encoding = encoding self.code = code languages = ( BabylonLanguage( name="English", charset="Latin", encoding="cp1252", code=0x00, ), BabylonLanguage( name="French", charset="Latin", encoding="cp1252", code=0x01, ), BabylonLanguage( name="Italian", charset="Latin", encoding="cp1252", code=0x02, ), BabylonLanguage( name="Spanish", charset="Latin", encoding="cp1252", code=0x03, ), BabylonLanguage( name="Dutch", charset="Latin", encoding="cp1252", code=0x04, ), BabylonLanguage( name="Portuguese", charset="Latin", encoding="cp1252", code=0x05, ), BabylonLanguage( name="German", charset="Latin", encoding="cp1252", code=0x06, ), BabylonLanguage( name="Russian", charset="Cyrillic", encoding="cp1251", code=0x07, ), BabylonLanguage( name="Japanese", charset="Japanese", encoding="cp932", code=0x08, ), BabylonLanguage( name="Chinese (T)", charset="Traditional Chinese", encoding="cp950", code=0x09, ), BabylonLanguage( name="Chinese (S)", charset="Simplified Chinese", encoding="cp936", code=0x0a, ), BabylonLanguage( name="Greek", charset="Greek", encoding="cp1253", code=0x0b, ), BabylonLanguage( name="Korean", charset="Korean", encoding="cp949", code=0x0c, ), 
BabylonLanguage( name="Turkish", charset="Turkish", encoding="cp1254", code=0x0d, ), BabylonLanguage( name="Hebrew", charset="Hebrew", encoding="cp1255", code=0x0e, ), BabylonLanguage( name="Arabic", charset="Arabic", encoding="cp1256", code=0x0f, ), BabylonLanguage( name="Thai", charset="Thai", encoding="cp874", code=0x10, ), BabylonLanguage( name="Other", charset="Latin", encoding="cp1252", code=0x11, ), BabylonLanguage( name="Other Simplified Chinese dialects", charset="Simplified Chinese", encoding="cp936", code=0x12, ), BabylonLanguage( name="Other Traditional Chinese dialects", charset="Traditional Chinese", encoding="cp950", code=0x13, ), BabylonLanguage( name="Other Eastern-European languages", charset="Eastern European", encoding="cp1250", code=0x14, ), BabylonLanguage( name="Other Western-European languages", charset="Latin", encoding="cp1252", code=0x15, ), BabylonLanguage( name="Other Russian languages", charset="Cyrillic", encoding="cp1251", code=0x16, ), BabylonLanguage( name="Other Japanese languages", charset="Japanese", encoding="cp932", code=0x17, ), BabylonLanguage( name="Other Baltic languages", charset="Baltic", encoding="cp1257", code=0x18, ), BabylonLanguage( name="Other Greek languages", charset="Greek", encoding="cp1253", code=0x19, ), BabylonLanguage( name="Other Korean dialects", charset="Korean", encoding="cp949", code=0x1a, ), BabylonLanguage( name="Other Turkish dialects", charset="Turkish", encoding="cp1254", code=0x1b, ), BabylonLanguage( name="Other Thai dialects", charset="Thai", encoding="cp874", code=0x1c, ), BabylonLanguage( name="Polish", charset="Eastern European", encoding="cp1250", code=0x1d, ), BabylonLanguage( name="Hungarian", charset="Eastern European", encoding="cp1250", code=0x1e, ), BabylonLanguage( name="Czech", charset="Eastern European", encoding="cp1250", code=0x1f, ), BabylonLanguage( name="Lithuanian", charset="Baltic", encoding="cp1257", code=0x20, ), BabylonLanguage( name="Latvian", charset="Baltic", 
encoding="cp1257", code=0x21, ), BabylonLanguage( name="Catalan", charset="Latin", encoding="cp1252", code=0x22, ), BabylonLanguage( name="Croatian", charset="Eastern European", encoding="cp1250", code=0x23, ), BabylonLanguage( name="Serbian", charset="Eastern European", encoding="cp1250", code=0x24, ), BabylonLanguage( name="Slovak", charset="Eastern European", encoding="cp1250", code=0x25, ), BabylonLanguage( name="Albanian", charset="Latin", encoding="cp1252", code=0x26, ), BabylonLanguage( name="Urdu", charset="Arabic", encoding="cp1256", code=0x27, ), BabylonLanguage( name="Slovenian", charset="Eastern European", encoding="cp1250", code=0x28, ), BabylonLanguage( name="Estonian", charset="Latin", encoding="cp1252", code=0x29, ), BabylonLanguage( name="Bulgarian", charset="Eastern European", encoding="cp1250", code=0x2a, ), BabylonLanguage( name="Danish", charset="Latin", encoding="cp1252", code=0x2b, ), BabylonLanguage( name="Finnish", charset="Latin", encoding="cp1252", code=0x2c, ), BabylonLanguage( name="Icelandic", charset="Latin", encoding="cp1252", code=0x2d, ), BabylonLanguage( name="Norwegian", charset="Latin", encoding="cp1252", code=0x2e, ), BabylonLanguage( name="Romanian", charset="Latin", encoding="cp1252", code=0x2f, ), BabylonLanguage( name="Swedish", charset="Latin", encoding="cp1252", code=0x30, ), BabylonLanguage( name="Ukranian", charset="Cyrillic", encoding="cp1251", code=0x31, ), BabylonLanguage( name="Belarusian", charset="Cyrillic", encoding="cp1251", code=0x32, ), BabylonLanguage( name="Farsi", charset="Arabic", encoding="cp1256", code=0x33, ), BabylonLanguage( name="Basque", charset="Latin", encoding="cp1252", code=0x34, ), BabylonLanguage( name="Macedonian", charset="Eastern European", encoding="cp1250", code=0x35, ), BabylonLanguage( name="Afrikaans", charset="Latin", encoding="cp1252", code=0x36, ), BabylonLanguage( # Babylon Glossary Builder spells this language "Faeroese" name="Faroese", charset="Latin", encoding="cp1252", 
code=0x37, ), BabylonLanguage( name="Latin", charset="Latin", encoding="cp1252", code=0x38, ), BabylonLanguage( name="Esperanto", charset="Turkish", encoding="cp1254", code=0x39, ), BabylonLanguage( name="Tamazight", charset="Latin", encoding="cp1252", code=0x3a, ), BabylonLanguage( name="Armenian", charset="Latin", encoding="cp1252", code=0x3b, ), BabylonLanguage( name="Hindi", charset="Latin", encoding="cp1252", code=0x3c, ), BabylonLanguage( name="Somali", charset="Latin", encoding="cp1252", code=0x3d, ), ) languageByCode = {lang.code: lang for lang in languages} languageByName = {lang.name: lang for lang in languages} pyglossary-3.2.1/pyglossary/plugins/babylon_bgl/bgl_pos.py0000644000175000017500000000426213575553425024377 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # # Copyright © 2008-2016 Saeed Rasooli (ilius) # Copyright © 2011-2012 kubtek # This file is part of PyGlossary project, http://github.com/ilius/pyglossary # Thanks to Raul Fernandes and Karl Grill # for reverse engineering # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 3, or (at your option) # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program. Or on Debian systems, from /usr/share/common-licenses/GPL # If not, see . 
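The language table above is indexed two ways, by numeric code and by name. A minimal standalone sketch of that lookup-table pattern, using a hypothetical two-entry subset rather than the full Babylon table:

```python
# Sketch of the BabylonLanguage lookup-table pattern (illustrative subset only).

class BabylonLanguage(object):
	def __init__(self, name, charset, encoding, code):
		self.name = name
		self.charset = charset
		self.encoding = encoding
		self.code = code

languages = (
	BabylonLanguage(name="English", charset="Latin", encoding="cp1252", code=0x00),
	BabylonLanguage(name="Russian", charset="Cyrillic", encoding="cp1251", code=0x07),
)

# Index the tuple both ways, exactly as the module does:
languageByCode = {lang.code: lang for lang in languages}
languageByName = {lang.name: lang for lang in languages}

# A language code read from a type 3 block resolves to its Windows code page:
print(languageByCode[0x07].encoding)  # cp1251
print(languageByName["English"].charset)  # Latin
```

Building both dicts from one tuple keeps the data in a single place; the reader uses `languageByCode` when decoding type 3 blocks, while `languageByName` serves name-based lookups.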
partOfSpeechByCode = {
	# Use None for codes we have not seen yet
	# Use "" for codes we've seen but part of speech is unknown
	0x30: "noun",
	0x31: "adjective",
	0x32: "verb",
	0x33: "adverb",
	0x34: "interjection",
	0x35: "pronoun",
	0x36: "preposition",
	0x37: "conjunction",
	0x38: "suffix",
	0x39: "prefix",
	0x3A: "article",
	0x3B: "",
	# in Babylon Italian-English.BGL,
	# Babylon Spanish-English.BGL,
	# Babylon_Chinese_S_English.BGL
	# no indication of the part of speech
	0x3C: "abbreviation",
	# (short form: 'ר"ת')
	# (full form: "ר"ת: ראשי תיבות")
	# "ת'" # adjective
	# (full form: "ת': תואר")
	# "ש"ע" # noun
	# (full form: "ש"ע: שם עצם")
	0x3D: "masculine noun and adjective",
	0x3E: "feminine noun and adjective",
	0x3F: "masculine and feminine noun and adjective",
	0x40: "feminine noun",
	# (short form: "נ\'")
	# (full form: "נ': נקבה")
	0x41: "masculine and feminine noun",
	# 0x41: noun that may be used as masculine and feminine
	# (short form: "זו"נ")
	# (full form: "זו"נ: זכר ונקבה")
	0x42: "masculine noun",
	# (short form: 'ז\'')
	# (full form: "ז': זכר")
	0x43: "numeral",
	0x44: "participle",
	0x45: None,
	0x46: None,
	0x47: None,
}
pyglossary-3.2.1/pyglossary/plugins/babylon_bgl/bgl_reader.py0000644000175000017500000012772113577304507025043 0ustar emfoxemfox00000000000000
# -*- coding: utf-8 -*-
#
# Copyright © 2008-2016 Saeed Rasooli (ilius)
# Copyright © 2011-2012 kubtek
# This file is part of PyGlossary project, http://github.com/ilius/pyglossary
# Thanks to Raul Fernandes and Karl Grill
# for reverse engineering
#
# This program is a free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program. Or on Debian systems, from /usr/share/common-licenses/GPL
# If not, see <http://www.gnu.org/licenses/gpl.txt>.

import io
import gzip
import re
from collections import OrderedDict as odict

from pyglossary.plugins.formats_common import *  # FIXME

try:
	GzipFile = __import__(
		"pyglossary.plugin_lib.py%d%d.gzip_no_crc" % sys.version_info[:2],
		fromlist="GzipFile",
	).GzipFile
except ImportError:
	from gzip import GzipFile
	log.exception("")
	log.warning(
		"If things don't work well with BGL files, "
		"install Python 3.5 to 3.7 and try again"
	)

from pyglossary.text_utils import (
	binStrToInt,
	excMessage,
	isASCII,
)
from pyglossary.xml_utils import xml_escape

from .bgl_info import (
	infoKeysByCode,
	infoKeyDecodeMethods,
	charsetInfoDecode,
)
from .bgl_pos import partOfSpeechByCode
from .bgl_text import (
	replaceHtmlEntries,
	replaceHtmlEntriesInKeys,
	stripHtmlTags,
	removeControlChars,
	removeNewlines,
	normalizeNewlines,
	replaceAsciiCharRefs,
	fixImgLinks,
	stripDollarIndexes,
	unkownHtmlEntries,
)

file = io.BufferedReader

debugReadOptions = {
	"searchCharSamples",  # bool
	"collectMetadata2",  # bool
	"writeGz",  # bool
	"charSamplesPath",  # str, file path
	"msgLogPath",  # str, file path
	"rawDumpPath",  # str, file path
	"unpackedGzipPath",  # str, file path
}
readOptions = [
	"defaultEncodingOverwrite",  # str, encoding
	"sourceEncodingOverwrite",  # str, encoding
	"targetEncodingOverwrite",  # str, encoding
	"partOfSpeechColor",  # str, for example "ff0000" for red
	"noControlSequenceInDefi",  # bool
	"strictStringConvertion",  # bool
	"processHtmlInKey",  # bool
	"keyRStripChars",  # str, list of characters to strip (from right side)
] + sorted(debugReadOptions)

if os.sep == "/":  # Operating system is Unix-like
	tmpDir = "/tmp"
elif os.sep == "\\":  # Operating system is MS Windows
	tmpDir = os.getenv("TEMP")
else:
	raise RuntimeError(
		"Unknown path separator (os.sep==%r). " % os.sep +
		"What is your operating system?"
	)

charsetDecodePattern = re.compile(
	b"(<charset\\s+c\\=[\"']?(\\w)[\"']?>|</charset>)",
	re.I,
)


class BGLGzipFile(GzipFile):
	"""
	gzip_no_crc.py contains GzipFile class without CRC check.
	It prints a warning when CRC code does not match.
	The original method raises an exception in this case.
	Some dictionaries do not use CRC code, it is set to 0.
	"""
	def __init__(
		self,
		fileobj=None,
		closeFileobj=False,
		**kwargs
	):
		GzipFile.__init__(self, fileobj=fileobj, **kwargs)
		self.closeFileobj = closeFileobj

	def close(self):
		if self.closeFileobj:
			self.fileobj.close()


class Block(object):
	def __init__(self):
		self.data = b""
		self.type = ""
		# block offset in the gzip stream, for debugging
		self.offset = -1

	def __str__(self):
		return "Block type=%s, len(data)=%s" % (
			self.type,
			len(self.data),
		)


class FileOffS(file):
	"""
	A file class with an offset.

	This class provides an interface to a part of a file starting at
	a specified offset and ending at the end of the file, making it
	appear to be an independent file.
	The offset parameter of the constructor specifies the offset of the
	first byte of the modeled file.
	"""
	def __init__(self, filename, offset=0):
		fp = open(filename, "rb")
		file.__init__(self, fp)
		self._fp = fp
		self.offset = offset
		self.filesize = os.path.getsize(filename)
		file.seek(self, offset)  # OR self.seek(0)

	def close(self):
		self._fp.close()

	def seek(self, pos, whence=0):  # position, whence
		if whence == 0:  # relative to start of file
			file.seek(
				self,
				max(0, pos) + self.offset,
				0,
			)
		elif whence == 1:  # relative to current position
			file.seek(
				self,
				max(
					self.offset,
					self.tell() + pos,
				),
				0,
			)
		elif whence == 2:  # relative to end of file
			file.seek(self, pos, 2)
		else:
			raise ValueError("FileOffS.seek: bad whence=%s" % whence)

	def tell(self):
		return file.tell(self) - self.offset


class DefinitionFields(object):
	"""
	Fields of entry definition

	Entry definition consists of a number of fields.
	The most important of them are:
	defi - the main definition, mandatory, comes first.
part of speech title """ # nameByCode = { # } def __init__(self): # self.bytesByCode = {} # self.strByCode = {} self.encoding = None # encoding of the definition self.singleEncoding = True # singleEncoding=True if the definition was encoded with # a single encoding self.b_defi = None # bytes, main definition part of defi self.u_defi = None # str, main part of definition self.partOfSpeech = None # string representation of the part of speech, utf-8 self.b_title = None # bytes self.u_title = None # str self.b_title_trans = None # bytes self.u_title_trans = None # str self.b_transcription_50 = None # bytes self.u_transcription_50 = None # str self.code_transcription_50 = None self.b_transcription_60 = None # bytes self.u_transcription_60 = None # str self.code_transcription_60 = None self.b_field_1a = None # bytes self.u_field_1a = None # str self.b_field_07 = None # bytes self.b_field_06 = None # bytes self.b_field_13 = None # bytes class BglReader(object): ########################################################################## """ Dictionary properties --------------------- Dictionary (or glossary) properties are textual data like glossary name, glossary author name, glossary author e-mail, copyright message and glossary description. Most of the dictionaries have these properties set. Since they contain textual data we need to know the encoding. There may be other properties not listed here. I've enumerated only those that are available in Babylon Glossary builder. Playing with Babylon builder allows us detect how encoding is selected. If global utf-8 flag is set, utf-8 encoding is used for all properties. Otherwise the target encoding is used, that is the encoding corresponding to the target language. The chars that cannot be represented in the target encoding are replaced with question marks. Using this algorithm to decode dictionary properties you may encounter that some of them are decoded incorrectly. 
For example, it is clear that the property is in cp1251 encoding while the algorithm says we must use cp1252, and we get garbage after decoding. That is OK, the algorithm is correct. You may install that dictionary in Babylon and check dictionary properties. It shows the same garbage. Unfortunately, we cannot detect correct encoding in this case automatically. We may add a parameter the will overwrite the selected encoding, so the user may fix the encoding if needed. """ def __init__(self, glos): # no more arguments self._glos = glos self._filename = "" self.info = odict() self.numEntries = None #### self.sourceLang = "" self.targetLang = "" ## self.defaultCharset = "" self.sourceCharset = "" self.targetCharset = "" ## self.sourceEncoding = None self.targetEncoding = None #### self.bgl_numEntries = None self.wordLenMax = 0 self.defiMaxBytes = 0 ## self.metadata2 = None self.rawDumpFile = None self.msgLogFile = None self.samplesDumpFile = None ## self.stripSlashAltKeyPattern = re.compile(r"(^|\s)/(\w)", re.U) self.specialCharPattern = re.compile(r"[^\s\w.]", re.U) ### self.file = None # offset of gzip header, set in self.open() self.gzipOffset = None # must be a in RRGGBB format self.partOfSpeechColor = "007000" self.iconData = None def __len__(self): if self.numEntries is None: log.warning("len(reader) called while numEntries=None") return 0 return self.numEntries + self.numResources # open .bgl file, read signature, find and open gzipped content # self.file - ungzipped content def open( self, filename, defaultEncodingOverwrite=None, sourceEncodingOverwrite=None, targetEncodingOverwrite=None, partOfSpeechColor=None, noControlSequenceInDefi=False, strictStringConvertion=False, # process keys and alternates as HTML # Babylon does not interpret keys and alternates as HTML text, # however you may encounter many keys containing character references # and html tags. That is clearly a bug of the dictionary. 
# We must be very careful processing HTML tags in keys, not damage # normal keys. This option should be disabled by default, enabled # explicitly by user. Namely this option does the following: # - resolve character references # - strip HTML tags processHtmlInKey=False, # a string of characters that will be stripped from the end of the # key (and alternate), see str.rstrip function keyRStripChars=None, **kwargs ): if kwargs: for key in kwargs: if key in debugReadOptions: log.error( "BGL Reader: option %r is only usable" % key + "in debug mode, add -v4 to enable debug mode" ) else: log.error("BGL Reader: invalid option %r" % key) return False self._filename = filename self.defaultEncodingOverwrite = defaultEncodingOverwrite self.sourceEncodingOverwrite = sourceEncodingOverwrite self.targetEncodingOverwrite = targetEncodingOverwrite if partOfSpeechColor: self.partOfSpeechColor = partOfSpeechColor self.noControlSequenceInDefi = noControlSequenceInDefi self.strictStringConvertion = strictStringConvertion self.processHtmlInKey = processHtmlInKey self.keyRStripChars = keyRStripChars if not self.openGzip(): return False self.readInfo() self.setGlossaryInfo() return True def openGzip(self): with open(self._filename, "rb") as bglFile: if not bglFile: log.error("file pointer empty: %s" % bglFile) return False b_head = bglFile.read(6) if len(b_head) < 6 or not b_head[:4] in ( b"\x12\x34\x00\x01", b"\x12\x34\x00\x02", ): log.error("invalid header: %r" % b_head[:6]) return False self.gzipOffset = gzipOffset = binStrToInt(b_head[4:6]) log.debug("Position of gz header: %s" % gzipOffset) if gzipOffset < 6: log.error("invalid gzip header position: %s" % gzipOffset) return False self.file = BGLGzipFile( fileobj=FileOffS(self._filename, gzipOffset), closeFileobj=True, ) return True def readInfo(self): """ read meta information about the dictionary: author, description, source and target languages, etc (articles are not read) """ self.numEntries = 0 self.numBlocks = 0 
self.numResources = 0 block = Block() while not self.isEndOfDictData(): if not self.readBlock(block): break self.numBlocks += 1 if not block.data: continue if block.type == 0: self.readType0(block) elif block.type in (1, 7, 10, 11, 13): self.numEntries += 1 elif block.type == 2: self.numResources += 1 elif block.type == 3: self.readType3(block) else: # Unknown block.type log.debug( "Unkown Block type %r" % block.type + ", data_length = %s" % len(block.data) + ", number = %s" % self.numBlocks ) self.file.seek(0) self.detectEncoding() log.debug("numEntries = %s" % self.numEntries) if self.bgl_numEntries and self.bgl_numEntries != self.numEntries: # There are a number of cases when these numbers do not match. # The dictionary is OK, and these is no doubt that we might missed # an entry. # self.bgl_numEntries may be less than the number of entries # we've read. log.warning( "bgl_numEntries=%s" % self.bgl_numEntries + ", numEntries=%s" % self.numEntries ) self.numBlocks = 0 def setGlossaryInfo(self): glos = self._glos ### if self.sourceLang: glos.setInfo("sourceLang", self.sourceLang.name) if self.targetLang: glos.setInfo("targetLang", self.targetLang.name) ### for attr in ( "defaultCharset", "sourceCharset", "targetCharset", "defaultEncoding", "sourceEncoding", "targetEncoding", ): value = getattr(self, attr, None) if value: glos.setInfo("bgl_" + attr, value) ### glos.setInfo("sourceCharset", "UTF-8") glos.setInfo("targetCharset", "UTF-8") ### for key, value in self.info.items(): if key in { "creationTime", "middleUpdated", "lastUpdated", }: key = "bgl_" + key try: glos.setInfo(key, value) except: log.exception("key = %s" % key) def isEndOfDictData(self): """ Test for end of dictionary data. A bgl file stores dictionary data as a gzip compressed block. In other words, a bgl file stores a gzip data file inside. A gzip file consists of a series of "members". gzip data block in bgl consists of one member (I guess). 
Testing for block type returned by self.readBlock is not a reliable way to detect the end of gzip member. For example, consider "Airport Code Dictionary.BGL" dictionary. To reliably test for end of gzip member block we must use a number of undocumented variables of gzip.GzipFile class. self.file._new_member - true if the current member has been completely read from the input file self.file.extrasize - size of buffered data self.file.offset - offset in the input file after reading one gzip member current position in the input file is set to the first byte after gzip data We may get this offset: self.file_bgl.tell() The last 4 bytes of gzip block contains the size of the original (uncompressed) input data modulo 2^32 """ return False def close(self): if self.file: self.file.close() self.file = None def __del__(self): self.close() while unkownHtmlEntries: log.debug( "BGL: unknown html entity: %s" % unkownHtmlEntries.pop() ) # returns False if error def readBlock(self, block): block.offset = self.file.tell() length = self.readBytes(1) if length == -1: log.debug("readBlock: length = -1") return False block.type = length & 0xf length >>= 4 if length < 4: length = self.readBytes(length+1) if length == -1: log.error("readBlock: length = -1") return False else: length -= 4 self.file.flush() if length > 0: try: block.data = self.file.read(length) except: # struct.error: unpack requires a string argument of length 4 # FIXME log.exception( "failed to read block data" + ": numBlocks=%s" % self.numBlocks + ", length=%s" % length + ", filePos=%s" % self.file.tell() ) block.data = b"" return False else: block.data = b"" return True def readBytes(self, num): """ return -1 if error """ if num < 1 or num > 4: log.error("invalid argument num=%s" % num) return -1 self.file.flush() buf = self.file.read(num) if len(buf) == 0: log.debug("readBytes: end of file: len(buf)==0") return -1 if len(buf) != num: log.error( "readBytes: expected to read %s bytes" % num + ", but found %s bytes" % 
len(buf) ) return -1 return binStrToInt(buf) def readType0(self, block): code = block.data[0] if code == 2: # this number is vary close to self.bgl_numEntries, # but does not always equal to the number of entries # see self.readType3, code == 12 as well num = binStrToInt(block.data[1:]) elif code == 8: self.defaultCharset = charsetInfoDecode(block.data[1:]) if not self.defaultCharset: log.warning("defaultCharset is not valid") else: self.logUnknownBlock(block) return False return True def readType2(self, block): """ Process type 2 block Type 2 block is an embedded file (mostly Image or HTML). pass_num - pass number, may be 1 or 2 On the first pass self.sourceEncoding is not defined and we cannot decode file names. That is why the second pass is needed. The second pass is costly, it apparently increases total processing time. We should avoid the second pass if possible. Most of the dictionaries do not have valuable resources, and those that do, use file names consisting only of ASCII characters. We may process these resources on the second pass. If all files have been processed on the first pass, the second pass is not needed. All dictionaries I've processed so far use only ASCII chars in file names. Babylon glossary builder replaces names of files, like links to images, with what looks like a hash code of the file name, for example "8FFC5C68.png". 
returns: DataEntry instance if the resource was successfully processed or None """ # Embedded File (mostly Image or HTML) name = "" # Embedded file name pos = 0 # name: Len = block.data[pos] pos += 1 if pos+Len > len(block.data): log.warning("reading block type 2: name too long") return b_name = block.data[pos:pos+Len] pos += Len b_data = block.data[pos:] # if b_name in (b"C2EEF3F6.html", b"8EAF66FD.bmp"): # log.debug("Skipping non-useful file %r" % b_name) # return u_name = b_name.decode(self.sourceEncoding) return self._glos.newDataEntry( u_name, b_data, ) def readType3(self, block): """ reads block with type 3, and updates self.info returns None """ code, b_value = binStrToInt(block.data[:2]), block.data[2:] if not b_value: return # if not b_value.strip(b"\x00"): return # FIXME try: key = infoKeysByCode[code] except KeyError: if b_value.strip(b"\x00"): log.debug( "Unknown info type code=%#.2x" % code + ", b_value=%r" % b_value, ) return try: func = infoKeyDecodeMethods[key] except KeyError: value = b_value else: value = func(b_value) # `value` can be a bytes instance, # or str instance, depending on `key` FIXME if value: if isinstance(value, dict): self.info.update(value) elif key in { "sourceLang", "targetLang", "defaultCharset", "sourceCharset", "targetCharset", "sourceEncoding", "targetEncoding", "bgl_numEntries", "iconData", }: setattr(self, key, value) else: self.info[key] = value def detectEncoding(self): """ assign self.sourceEncoding and self.targetEncoding """ utf8Encoding = self.info.get("utf8Encoding", False) if self.sourceEncodingOverwrite: self.sourceEncoding = self.sourceEncodingOverwrite elif utf8Encoding: self.sourceEncoding = "utf8" elif self.sourceCharset: self.sourceEncoding = self.sourceCharset elif self.sourceLang: self.sourceEncoding = self.sourceLang.encoding else: self.sourceEncoding = "cp1252" if self.targetEncodingOverwrite: self.targetEncoding = self.targetEncodingOverwrite elif utf8Encoding: self.targetEncoding = "utf8" elif 
self.targetCharset: self.targetEncoding = self.targetCharset elif self.targetLang: self.targetEncoding = self.targetLang.encoding else: self.targetEncoding = "cp1252" # not used if self.defaultEncodingOverwrite: self.defaultEncoding = self.defaultEncodingOverwrite elif self.defaultCharset: self.defaultEncoding = self.defaultCharset else: self.defaultEncoding = "cp1252" def logUnknownBlock(self, block): log.debug( "Unknown block: type=%s" % block.type + ", number=%s" % self.numBlocks + ", data=%r" % block.data ) def __iter__(self): return self def __next__(self): if not self.file: raise StopIteration block = Block() while not self.isEndOfDictData(): if not self.readBlock(block): break if not block.data: continue if block.type == 2: return self.readType2(block) elif block.type == 11: succeed, u_word, u_alts, u_defi = self.readEntry_Type11(block) if not succeed: continue return self._glos.newEntry( [u_word] + u_alts, u_defi, ) elif block.type in (1, 7, 10, 11, 13): pos = 0 # word: succeed, pos, u_word, b_word = self.readEntryWord(block, pos) if not succeed: continue # defi: succeed, pos, u_defi, b_defi = self.readEntryDefi( block, pos, b_word, ) if not succeed: continue # now pos points to the first char after definition succeed, pos, u_alts = self.readEntryAlts( block, pos, b_word, u_word, ) if not succeed: continue return self._glos.newEntry( [u_word] + u_alts, u_defi, ) raise StopIteration def readEntryWord(self, block, pos): """ Read word part of entry. Return value is a list. 
(False, None, None, None) if error (True, pos, u_word, b_word) if OK u_word is a str instance (utf-8) b_word is a bytes instance """ Err = (False, None, None, None) if pos + 1 > len(block.data): log.error( "reading block offset=%#.2x" % block.offset + ", reading word size: pos + 1 > len(block.data)" ) return Err Len = block.data[pos] pos += 1 if pos + Len > len(block.data): log.error( "reading block offset=%#.2x" % block.offset + ", block.type=%s" % block.type + ", reading word: pos + Len > len(block.data)" ) return Err b_word = block.data[pos:pos+Len] u_word = self.processKey(b_word) """ Entry keys may contain html text, for example: ante< meridiem arm und reich c=t>2003; und etc. Babylon does not process keys as html, it display them as is. Html in keys is the problem of that particular dictionary. We should not process keys as html, since Babylon do not process them as such. """ pos += Len self.wordLenMax = max(self.wordLenMax, len(u_word)) return True, pos, u_word, b_word def readEntryDefi(self, block, pos, b_word): """ Read defi part of entry. Return value is a list. 
(False, None, None, None) if error (True, pos, u_defi, b_defi) if OK u_defi is a str instance (utf-8) b_defi is a bytes instance """ Err = (False, None, None, None) if pos + 2 > len(block.data): log.error( "reading block offset=%#.2x" % block.offset + ", reading defi size: pos + 2 > len(block.data)" ) return Err Len = binStrToInt(block.data[pos:pos+2]) pos += 2 if pos + Len > len(block.data): log.error( "reading block offset=%#.2x" % block.offset + ", block.type=%s" % block.type + ", reading defi: pos + Len > len(block.data)" ) return Err b_defi = block.data[pos:pos+Len] u_defi = self.processDefi(b_defi, b_word) self.defiMaxBytes = max(self.defiMaxBytes, len(b_defi)) pos += Len return True, pos, u_defi, b_defi def readEntryAlts(self, block, pos, b_word, u_word): """ returns: (False, None, None) if error (True, pos, u_alts) if succeed u_alts is a sorted list, items are str (utf-8) """ Err = (False, None, None) # use set instead of list to prevent duplicates u_alts = set() while pos < len(block.data): if pos + 1 > len(block.data): log.error( "reading block offset=%#.2x" % block.offset + ", reading alt size: pos + 1 > len(block.data)" ) return Err Len = block.data[pos] pos += 1 if pos + Len > len(block.data): log.error( "reading block offset=%#.2x" % block.offset + ", block.type=%s" % block.type + ", reading alt: pos + Len > len(block.data)" ) return Err b_alt = block.data[pos:pos+Len] u_alt = self.processAlternativeKey(b_alt, b_word) # Like entry key, alt is not processed as html by babylon, # so do we. 
u_alts.add(u_alt) pos += Len if u_word in u_alts: u_alts.remove(u_word) return True, pos, list(sorted(u_alts)) def readEntry_Type11(self, block): """return (succeed, u_word, u_alts, u_defi)""" Err = (False, None, None, None) pos = 0 # reading headword if pos + 5 > len(block.data): log.error( "reading block offset=%#.2x" % block.offset + ", reading word size: pos + 5 > len(block.data)" ) return Err wordLen = binStrToInt(block.data[pos:pos+5]) pos += 5 if pos + wordLen > len(block.data): log.error( "reading block offset=%#.2x" % block.offset + ", block.type=%s" % block.type + ", reading word: pos + wordLen > len(block.data)" ) return Err b_word = block.data[pos:pos+wordLen] u_word = self.processKey(b_word) pos += wordLen self.wordLenMax = max(self.wordLenMax, len(u_word)) # reading alts and defi if pos + 4 > len(block.data): log.error( "reading block offset=%#.2x" % block.offset + ", reading defi size: pos + 4 > len(block.data)" ) return Err altsCount = binStrToInt(block.data[pos:pos+4]) pos += 4 # reading alts # use set instead of list to prevent duplicates u_alts = set() for altIndex in range(altsCount): if pos + 4 > len(block.data): log.error( "reading block offset=%#.2x" % block.offset + ", reading alt size: pos + 4 > len(block.data)" ) return Err altLen = binStrToInt(block.data[pos:pos+4]) pos += 4 if altLen == 0: if pos + altLen != len(block.data): # no evidence log.warning( "reading block offset=%#.2x" % block.offset + ", reading alt size: pos + altLen != len(block.data)" ) break if pos + altLen > len(block.data): log.error( "reading block offset=%#.2x" % block.offset + ", block.type=%s" % block.type + ", reading alt: pos + altLen > len(block.data)" ) return Err b_alt = block.data[pos:pos+altLen] u_alt = self.processAlternativeKey(b_alt, b_word) # Like entry key, alt is not processed as html by babylon, # so do we. 
u_alts.add(u_alt) pos += altLen if u_word in u_alts: u_alts.remove(u_word) u_alts = list(sorted(u_alts)) # reading defi defiLen = binStrToInt(block.data[pos:pos+4]) pos += 4 if pos + defiLen > len(block.data): log.error( "reading block offset=%#.2x" % block.offset + ", block.type=%s" % block.type + ", reading defi: pos + defiLen > len(block.data)" ) return Err b_defi = block.data[pos:pos+defiLen] u_defi = self.processDefi(b_defi, b_word) self.defiMaxBytes = max(self.defiMaxBytes, len(b_defi)) pos += defiLen return True, u_word, u_alts, u_defi def charReferencesStat(self, b_text, encoding): pass def decodeCharsetTags(self, b_text, defaultEncoding): """ b_text is a bytes Decode html text taking into account charset tags and default encoding Return value: (u_text, defaultEncodingOnly) u_text is str defaultEncodingOnly parameter is false if the text contains parts encoded with non-default encoding (babylon character references '00E6;' do not count). """ b_parts = re.split(charsetDecodePattern, b_text) u_text = "" encodings = [] # stack of encodings defaultEncodingOnly = True for i, b_part in enumerate(b_parts): if i % 3 == 0: # text block encoding = encodings[-1] if encodings else defaultEncoding b_text2 = b_part if encoding == "babylon-reference": b_refs = b_text2.split(b";") for i_ref, b_ref in enumerate(b_refs): if not b_ref: if i_ref != len(b_refs)-1: log.debug( "decoding charset tags" + ", b_text=%r\n" % b_text + "blank character" + " reference (%r)\n" % b_text2 ) continue if not re.match(b"^[0-9a-fA-F]{4}$", b_ref): log.debug( "decoding charset tags, b_text=%r\n" % b_text + "invalid character" + " reference (%r)\n" % b_text2 ) continue u_text += chr(int(b_ref, 16)) else: self.charReferencesStat(b_text2, encoding) if encoding == "cp1252": b_text2 = replaceAsciiCharRefs(b_text2, encoding) if self.strictStringConvertion: try: u_text2 = b_text2.decode(encoding) except UnicodeError: log.debug( "decoding charset tags" + ", b_text=%r" % b_text + "\nfragment: %r" % 
b_text2 + "\nconversion error:\n%s" % excMessage() ) u_text2 = b_text2.decode(encoding, "replace") else: u_text2 = b_text2.decode(encoding, "replace") u_text += u_text2 if encoding != defaultEncoding: defaultEncodingOnly = False elif i % 3 == 1: # <charset c=...> or </charset> if b_part.startswith(b"</"): if encodings: encodings.pop() else: log.debug( "decoding charset tags, b_text=%r\n" % b_text + "unbalanced tag\n" ) else: # <charset c=...> b_type = b_parts[i+1].lower() # b_type is a bytes instance, with length 1 if b_type == b"t": encodings.append("babylon-reference") elif b_type == b"u": encodings.append("utf-8") elif b_type == b"k": encodings.append(self.sourceEncoding) elif b_type == b"e": encodings.append(self.sourceEncoding) elif b_type == b"g": # gbk or gb18030 encoding # (not enough data to make distinction) encodings.append("gbk") else: log.debug( "decoding charset tags, text = %r\n" % b_text + "unknown charset code = %#.2x\n" % ord(b_type) ) # add any encoding to prevent # "unbalanced tag" error encodings.append(defaultEncoding) else: # c attribute of charset tag if the previous tag was charset pass if encodings: log.debug( "decoding charset tags, text=%s\n" % b_text + "unclosed tag\n" ) return u_text, defaultEncodingOnly def processKey(self, b_word): """ b_word is a bytes instance returns u_word_main, as str instance (utf-8 encoding) """ b_word_main, strip_count = stripDollarIndexes(b_word) if strip_count > 1: log.debug( "processKey(%s):\n" % b_word + "number of dollar indexes = %s" % strip_count, ) # convert to unicode if self.strictStringConvertion: try: u_word_main = b_word_main.decode(self.sourceEncoding) except UnicodeError: log.debug( "processKey(%s):\n" % b_word + "conversion error:\n%s" % excMessage() ) u_word_main = b_word_main.decode( self.sourceEncoding, "ignore", ) else: u_word_main = b_word_main.decode(self.sourceEncoding, "ignore") if self.processHtmlInKey: # u_word_main_orig = u_word_main u_word_main = stripHtmlTags(u_word_main) u_word_main = replaceHtmlEntriesInKeys(u_word_main) #
if(re.match(".*[&<>].*", u_word_main_orig)): # log.debug("original text: " + u_word_main_orig + "\n" \ # + "new text: " + u_word_main + "\n") u_word_main = removeControlChars(u_word_main) u_word_main = removeNewlines(u_word_main) u_word_main = u_word_main.lstrip() u_word_main = u_word_main.rstrip(self.keyRStripChars) return u_word_main def processAlternativeKey(self, b_word, b_key): """ b_word is a bytes instance returns u_word_main, as str instance (utf-8 encoding) """ b_word_main, strip_count = stripDollarIndexes(b_word) # convert to unicode if self.strictStringConvertion: try: u_word_main = b_word_main.decode(self.sourceEncoding) except UnicodeError: log.debug( "processAlternativeKey(%s)\n" % b_word + "key = %s:\n" % b_key + "conversion error:\n%s" % excMessage() ) u_word_main = b_word_main.decode(self.sourceEncoding, "ignore") else: u_word_main = b_word_main.decode(self.sourceEncoding, "ignore") # strip "/" before words u_word_main = re.sub( self.stripSlashAltKeyPattern, r"\1\2", u_word_main, ) if self.processHtmlInKey: # u_word_main_orig = u_word_main u_word_main = stripHtmlTags(u_word_main) u_word_main = replaceHtmlEntriesInKeys(u_word_main) # if(re.match(".*[&<>].*", u_word_main_orig)): # log.debug("original text: " + u_word_main_orig + "\n" \ # + "new text: " + u_word_main + "\n") u_word_main = removeControlChars(u_word_main) u_word_main = removeNewlines(u_word_main) u_word_main = u_word_main.lstrip() u_word_main = u_word_main.rstrip(self.keyRStripChars) return u_word_main def processDefi(self, b_defi, b_key): """ b_defi: bytes b_key: bytes return: u_defi_format """ fields = DefinitionFields() self.collectDefiFields(b_defi, b_key, fields) fields.u_defi, fields.singleEncoding = self.decodeCharsetTags( fields.b_defi, self.targetEncoding, ) if fields.singleEncoding: fields.encoding = self.targetEncoding fields.u_defi = fixImgLinks(fields.u_defi) fields.u_defi = replaceHtmlEntries(fields.u_defi) fields.u_defi = removeControlChars(fields.u_defi) fields.u_defi = 
normalizeNewlines(fields.u_defi) fields.u_defi = fields.u_defi.strip() if fields.b_title: fields.u_title, singleEncoding = self.decodeCharsetTags( fields.b_title, self.sourceEncoding, ) fields.u_title = replaceHtmlEntries(fields.u_title) fields.u_title = removeControlChars(fields.u_title) if fields.b_title_trans: # sourceEncoding or targetEncoding ? fields.u_title_trans, singleEncoding = self.decodeCharsetTags( fields.b_title_trans, self.sourceEncoding, ) fields.u_title_trans = replaceHtmlEntries(fields.u_title_trans) fields.u_title_trans = removeControlChars(fields.u_title_trans) if fields.b_transcription_50: if fields.code_transcription_50 == 0x10: # contains values like this (char codes): # 00 18 00 19 00 1A 00 1B 00 1C 00 1D 00 1E 00 40 00 07 # this is not utf-16 # what is this? pass elif fields.code_transcription_50 == 0x1b: fields.u_transcription_50, singleEncoding = \ self.decodeCharsetTags( fields.b_transcription_50, self.sourceEncoding, ) fields.u_transcription_50 = \ replaceHtmlEntries(fields.u_transcription_50) fields.u_transcription_50 = \ removeControlChars(fields.u_transcription_50) elif fields.code_transcription_50 == 0x18: # incomplete text like: # t c=T>02D0;g0259;- # This defi normally contains fields.b_transcription_60 # in this case. 
				pass
			else:
				log.debug(
					"processDefi(%s)\n" % b_defi +
					"b_key = %s:\n" % b_key +
					"defi field 50, " +
					"unknown code: %#.2x" % fields.code_transcription_50
				)
		if fields.b_transcription_60:
			if fields.code_transcription_60 == 0x1b:
				fields.u_transcription_60, singleEncoding = \
					self.decodeCharsetTags(
						fields.b_transcription_60,
						self.sourceEncoding,
					)
				fields.u_transcription_60 = \
					replaceHtmlEntries(fields.u_transcription_60)
				fields.u_transcription_60 = \
					removeControlChars(fields.u_transcription_60)
			else:
				log.debug(
					"processDefi(%s)\n" % b_defi +
					"b_key = %s:\n" % b_key +
					"defi field 60, " +
					"unknown code: %#.2x" % fields.code_transcription_60,
				)
		if fields.b_field_1a:
			fields.u_field_1a, singleEncoding = self.decodeCharsetTags(
				fields.b_field_1a,
				self.sourceEncoding,
			)
		self.processDefiStat(fields, b_defi, b_key)
		u_defi_format = ""
		if fields.partOfSpeech or fields.u_title:
			if fields.partOfSpeech:
				u_defi_format += '<font color="%s">%s</font>' % (
					self.partOfSpeechColor,
					xml_escape(fields.partOfSpeech),
				)
			if fields.u_title:
				if u_defi_format:
					u_defi_format += " "
				u_defi_format += fields.u_title
			u_defi_format += "<br/>\n"
		if fields.u_title_trans:
			u_defi_format += fields.u_title_trans + "<br/>\n"
		if fields.u_transcription_50:
			u_defi_format += "[%s]<br/>\n" % fields.u_transcription_50
		if fields.u_transcription_60:
			u_defi_format += "[%s]<br/>\n" % fields.u_transcription_60
		if fields.u_defi:
			u_defi_format += fields.u_defi
		return u_defi_format

	def processDefiStat(self, fields, b_defi, b_key):
		pass

	def findDefiFieldsStart(self, b_defi):
		"""
		b_defi is a bytes instance
		Finds the beginning of the definition trailing fields.
		Return value is the index of the first char of the field set,
		or -1 if the field set is not found.
		Normally "\x14" should signal the beginning of the definition
		fields, but some articles may contain this character inside,
		so we can get a false match. As a workaround we check the
		following char: if "\x14" is followed by a space, we assume
		it is part of the article and continue searching.
		Unfortunately this does not help in many cases...
		"""
		if self.noControlSequenceInDefi:
			return -1
		index = -1
		while True:
			index = b_defi.find(
				0x14,
				index + 1,  # starting from next character
				-1,  # not the last character
			)
			if index == -1:
				break
			if b_defi[index + 1] != 0x20:  # b" "[0] == 0x20
				break
		return index

	def collectDefiFields(self, b_defi, b_key, fields):
		"""
		entry definition structure:
['\x14'[{field_code}{field_data}]*] {field_code} is one character {field_data} has arbitrary length """ # d0 is index of the '\x14 char in b_defi # d0 may be the last char of the string d0 = self.findDefiFieldsStart(b_defi) if d0 == -1: fields.b_defi = b_defi return fields.b_defi = b_defi[:d0] i = d0 + 1 while i < len(b_defi): if self.metadata2: self.metadata2.defiTrailingFields[b_defi[i]] += 1 if b_defi[i] == 0x02: # part of speech # "\x02" if fields.partOfSpeech: log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\n" % b_key + "duplicate part of speech item", ) if i+1 >= len(b_defi): log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\nb_defi ends after \\x02" % b_key ) return posCode = b_defi[i+1] try: fields.partOfSpeech = partOfSpeechByCode[posCode] except KeyError: log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\n" % b_key + "unknown part of speech code = %#.2x" % posCode ) return i += 2 elif b_defi[i] == 0x06: # \x06 if fields.b_field_06: log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\nduplicate type 6" % b_key ) if i+1 >= len(b_defi): log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\nb_defi ends after \\x06" % b_key ) return fields.b_field_06 = b_defi[i+1] i += 2 elif b_defi[i] == 0x07: # \x07 # Found in 4 Hebrew dictionaries. I do not understand. if i+3 > len(b_defi): log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\ntoo few data after \\x07" % b_key ) return fields.b_field_07 = b_defi[i+1:i+3] i += 3 elif b_defi[i] == 0x13: # "\x13" # known values: # 03 06 0D C7 # 04 00 00 00 44 # ... 
# 04 00 00 00 5F if i + 1 >= len(b_defi): log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\ntoo few data after \\x13" % b_key ) return Len = b_defi[i+1] i += 2 if Len == 0: log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\nblank data after \\x13" % b_key ) continue if i+Len > len(b_defi): log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\ntoo few data after \\x13" % b_key ) return fields.b_field_13 = b_defi[i:i+Len] i += Len elif b_defi[i] == 0x18: # \x18 if fields.b_title: log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\nduplicate entry title item" % b_key ) if i+1 >= len(b_defi): log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\nb_defi ends after \\x18" % b_key ) return i += 1 Len = b_defi[i] i += 1 if Len == 0: # log.debug( # "collecting definition fields, b_defi = %r\n" % b_defi + # "b_key = %r:\nblank entry title" % b_key # ) continue if i + Len > len(b_defi): log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\ntitle is too long" % b_key ) return fields.b_title = b_defi[i:i+Len] i += Len elif b_defi[i] == 0x1a: # "\x1a" # found only in Hebrew dictionaries, I do not understand. if i + 1 >= len(b_defi): log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %s:\ntoo few data after \\x1a" % b_key ) return Len = b_defi[i+1] i += 2 if Len == 0: log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\nblank data after \\x1a" % b_key ) continue if i+Len > len(b_defi): log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\ntoo few data after \\x1a" % b_key ) return fields.b_field_1a = b_defi[i:i+Len] i += Len elif b_defi[i] == 0x28: # "\x28" # title with transcription? 
if i + 2 >= len(b_defi): log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\ntoo few data after \\x28" % b_key ) return i += 1 Len = binStrToInt(b_defi[i:i+2]) i += 2 if Len == 0: log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\nblank data after \\x28" % b_key ) continue if i+Len > len(b_defi): log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\ntoo few data after \\x28" % b_key ) return fields.b_title_trans = b_defi[i:i+Len] i += Len elif 0x40 <= b_defi[i] <= 0x4f: # [\x41-\x4f] # often contains digits as text: # 56 # ælps - key Alps # 48@i # has no apparent influence on the article code = b_defi[i] Len = b_defi[i] - 0x3f if i+2+Len > len(b_defi): log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\ntoo few data after \\x40+" % b_key ) return i += 2 b_text = b_defi[i:i+Len] i += Len log.debug( "\nunknown definition field %#.2x" % code + ", b_text=%r" % b_text ) elif b_defi[i] == 0x50: # \x50 if i + 2 >= len(b_defi): log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\ntoo few data after \\x50" % b_key ) return fields.code_transcription_50 = b_defi[i+1] Len = b_defi[i+2] i += 3 if Len == 0: log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\nblank data after \\x50" % b_key ) continue if i+Len > len(b_defi): log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\ntoo few data after \\x50" % b_key ) return fields.b_transcription_50 = b_defi[i:i+Len] i += Len elif b_defi[i] == 0x60: # "\x60" if i + 4 > len(b_defi): log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\ntoo few data after \\x60" % b_key ) return fields.code_transcription_60 = b_defi[i+1] i += 2 Len = binStrToInt(b_defi[i:i+2]) i += 2 if Len == 0: log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = 
%r:\nblank data after \\x60" % b_key ) continue if i+Len > len(b_defi): log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\ntoo few data after \\x60" % b_key ) return fields.b_transcription_60 = b_defi[i:i+Len] i += Len else: log.debug( "collecting definition fields, " + "b_defi = %r\n" % b_defi + "b_key = %r:\n" % b_key + "unknown control char. Char code = %#.2x" % b_defi[i] ) return pyglossary-3.2.1/pyglossary/plugins/babylon_bgl/bgl_reader_debug.py0000755000175000017500000003426413577304507026213 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # # Copyright © 2008-2016 Saeed Rasooli (ilius) # Copyright © 2011-2012 kubtek # This file is part of PyGlossary project, http://github.com/ilius/pyglossary # Thanks to Raul Fernandes and Karl Grill # for reverse engineering # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 3, or (at your option) # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program. Or on Debian systems, from /usr/share/common-licenses/GPL # If not, see . 
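The `collectDefiFields` loop above walks a tag-length-value stream: after the `"\x14"` marker, each trailing field is a one-byte code followed by code-specific data (a bare byte for part of speech `\x02`, a 1-byte length prefix for the `\x18` title, a subcode plus 2-byte big-endian length for the `\x60` transcription). A minimal standalone sketch of that layout, with a deliberately simplified field set and hypothetical names (this is not PyGlossary's actual API, and real BGL files carry more field codes than shown here):

```python
def parse_trailing_fields(b_fields: bytes) -> dict:
	"""Parse a simplified BGL-style trailing-field stream.
	Stops at the first unknown code, as the real reader does."""
	fields = {}
	i = 0
	while i < len(b_fields):
		code = b_fields[i]
		if code == 0x02:  # part of speech: one code byte
			if i + 1 >= len(b_fields):
				break  # real reader logs "b_defi ends after \x02" and bails
			fields["pos_code"] = b_fields[i + 1]
			i += 2
		elif code == 0x18:  # title: 1-byte length prefix, then bytes
			n = b_fields[i + 1]
			fields["title"] = b_fields[i + 2:i + 2 + n]
			i += 2 + n
		elif code == 0x60:  # transcription: subcode + 2-byte big-endian length
			sub = b_fields[i + 1]
			n = int.from_bytes(b_fields[i + 2:i + 4], "big")
			fields["transcription"] = (sub, b_fields[i + 4:i + 4 + n])
			i += 4 + n
		else:
			# unknown field code: stop, as the reader above does
			break
	return fields

fields = parse_trailing_fields(b"\x02\x07\x18\x03abc")
# fields["pos_code"] == 0x07, fields["title"] == b"abc"
```

The real reader additionally validates every length against the remaining buffer before slicing, which is what all the "too few data after \xNN" log branches above are doing.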
from .bgl_reader import BglReader class MetaData(object): def __init__(self): self.blocks = [] self.numEntries = None self.numBlocks = None self.numFiles = None self.gzipStartOffset = None self.gzipEndOffset = None self.fileSize = None self.bglHeader = None # data before gzip header class MetaDataBlock(object): def __init__(self, data, _type): self.data = data self.type = _type class MetaDataRange(object): def __init__(self, _type, count): self.type = _type self.count = count class MetaData2(object): """ Second pass metadata. We need to scan all definitions in order to collect these statistical data. """ def __init__(self): # defiTrailingFields[i] - number of fields with code i found self.defiTrailingFields = [0] * 256 self.isDefiASCII = True # isDefiASCII = true if all definitions contain only ASCII chars """ We apply a number of tests to each definition, excluding those with overwritten encoding (they start with ). defiProcessedCount - total number of definitions processed defiUtf8Count - number of definitions in utf8 encoding defiAsciiCount - number of definitions containing only ASCII chars """ self.defiProcessedCount = 0 self.defiUtf8Count = 0 self.defiAsciiCount = 0 self.charRefs = dict() # encoding -> [ 0 ] * 257 class GzipWithCheck(object): """ gzip.GzipFile with check. It checks that unpacked data match what was packed. """ def __init__(self, fileobj, unpackedPath, reader, closeFileobj=False): """ constructor fileobj - gzip file - archive unpackedPath - path of a file containing original data, for testing. reader - reference to BglReader class instance, used for logging. 
""" self.file = BGLGzipFile( fileobj=fileobj, closeFileobj=closeFileobj, ) self.unpackedFile = open(unpackedPath, "rb") self.reader = reader def __del__(self): self.close() def close(self): if self.file: self.file.close() self.file = None if self.unpackedFile: self.unpackedFile.close() self.unpackedFile = None def read(self, size=-1): buf1 = self.file.read(size) buf2 = self.unpackedFile.read(size) if buf1 != buf2: self.reader.msgLogFileWrite( "GzipWithCheck.read: !=: size = %s, (%s) (%s)" % ( buf1, buf2, size, ) ) # else: # self.reader.msgLogFileWrite( # "GzipWithCheck.read: ==: size = %s, (%s) (%s)" % ( # buf1, # buf2, # size, # ) # ) return buf1 def seek(self, offset, whence=os.SEEK_SET): self.file.seek(offset, whence) self.unpackedFile.seek(offset, whence) # self.reader.msgLogFileWrite( # "GzipWithCheck.seek: offset = %s, whence = %s" % (offset, whence) # ) def tell(self): pos1 = self.file.tell() pos2 = self.unpackedFile.tell() if pos1 != pos2: self.reader.msgLogFileWrite( "GzipWithCheck.tell: !=: %s %s" % (pos1, pos2) ) # else: # self.reader.msgLogFileWrite( # "GzipWithCheck.tell: ==: %s %s" % (pos1, pos2) # ) return pos1 def flush(self): if os.sep == "\\": pass # a bug in Windows # after file.flush, file.read returns garbage else: self.file.flush() self.unpackedFile.flush() class DebugBglReader(BglReader): def open( self, filename, collectMetadata2=False, searchCharSamples=False, writeGz=False, rawDumpPath=None, unpackedGzipPath=None, charSamplesPath=None, msgLogPath=None, **kwargs ): if not BglReader.open(self, filename, **kwargs): return self.metadata2 = MetaData2() if collectMetadata2 else None self.targetCharsArray = ([False] * 256) if searchCharSamples else None self.writeGz = writeGz self.rawDumpPath = rawDumpPath self.unpackedGzipPath = unpackedGzipPath self.charSamplesPath = charSamplesPath self.msgLogPath = msgLogPath if self.rawDumpPath: self.rawDumpFile = open(self.rawDumpPath, "w") if self.charSamplesPath: self.samplesDumpFile = 
open(self.charSamplesPath, "w") if self.msgLogPath: self.msgLogFile = open(self.msgLogPath, "w") self.charRefStatPattern = re.compile(b"(&#\\w+;)", re.I) def openGzip(self): with open(self._filename, "rb") as bglFile: if not bglFile: log.error("file pointer empty: %s" % bglFile) return False buf = bglFile.read(6) if len(buf) < 6 or not buf[:4] in ( b"\x12\x34\x00\x01", b"\x12\x34\x00\x02", ): log.error("invalid header: %s" % buf[:6]) return False self.gzipOffset = gzipOffset = binStrToInt(buf[4:6]) log.debug("Position of gz header: i=%s" % gzipOffset) if gzipOffset < 6: log.error("invalid gzip header position: %s" % gzipOffset) return False if self.writeGz: self.dataFile = self._filename+"-data.gz" try: f2 = open(self.dataFile, "wb") except IOError: log.exception("error while opening gzip data file") self.dataFile = join( tmpDir, os.path.split(self.m_filename)[-1] + "-data.gz" ) f2 = open(self.dataFile, "wb") bglFile.seek(i) f2.write(bglFile.read()) f2.close() self.file = gzip.open(self.dataFile, "rb") else: f2 = FileOffS(self._filename, gzipOffset) if self.unpackedGzipPath: self.file = GzipWithCheck( f2, self.unpackedGzipPath, self, closeFileobj=True, ) else: self.file = BGLGzipFile( fileobj=f2, closeFileobj=True, ) def close(self): BglReader.close(self) if self.rawDumpFile: self.rawDumpFile.close() self.rawDumpFile = None if self.msgLogFile: self.msgLogFile.close() self.msgLogFile = None if self.samplesDumpFile: self.samplesDumpFile.close() self.samplesDumpFile = None def __del__(self): BglReader.__del__(self) def readEntryWord(self, block, pos): succeed, pos, u_word, b_word = \ BglReader.readEntryWord(self, block, pos) if not succeed: return self.rawDumpFileWriteText("\n\nblock type = %s\nkey = " % block.type) self.rawDumpFileWriteData(b_word) def readEntryDefi(self, block, pos, b_key): succeed, pos, u_defi, b_defi = \ BglReader.readEntryDefi(self, block, pos, b_key) if not succeed: return self.rawDumpFileWriteText("\ndefi = ") self.rawDumpFileWriteData(b_defi) 
""" def readEntryAlts(self, block, pos, b_key, key): succeed, pos, alts, b_alts = \ BglReader.readEntryAlts(self, block, pos, b_key, key) if not succeed: return for b_alt in b_alts: self.rawDumpFileWriteText("\nalt = ") self.rawDumpFileWriteData(b_alt) """ def charReferencesStat(self, b_text, encoding): """ b_text is bytes instance """ # “ # ċ if not self.metadata2: return if encoding not in self.metadata2.charRefs: self.metadata2.charRefs[encoding] = [0] * 257 charRefs = self.metadata2.charRefs[encoding] for index, b_part in enumerate(re.split( self.charRefStatPattern, b_text, )): if index % 2 != 1: continue try: if b_part[:3].lower() == "&#x": code = int(b_part[3:-1], 16) else: code = int(b_part[2:-1]) except (ValueError, OverflowError): continue if code <= 0: continue code = min(code, 256) charRefs[code] += 1 def processDefiStat(self, fields, b_defi, b_key): if fields.singleEncoding: self.findAndPrintCharSamples( fields.b_defi, "defi, key = %s" + b_key, fields.encoding, ) if self.metadata2: self.metadata2.defiProcessedCount += 1 if isASCII(fields.b_defi): self.metadata2.defiAsciiCount += 1 try: fields.b_defi.decode("utf8") except UnicodeError: pass else: self.metadata2.defiUtf8Count += 1 if self.metadata2 and self.metadata2.isDefiASCII: if not isASCII(fields.u_defi): self.metadata2.isDefiASCII = False # write text to dump file as is def rawDumpFileWriteText(self, text): # FIXME text = toStr(text) if self.rawDumpFile: self.rawDumpFile.write(text) # write data to dump file unambiguously representing control chars # escape "\" with "\\" # print control chars as "\xhh" def rawDumpFileWriteData(self, text): text = toStr(text) # the next function escapes too many chars, for example, it escapes äöü # self.rawDumpFile.write(text.encode("unicode_escape")) if self.rawDumpFile: self.rawDumpFile.write(text) def msgLogFileWrite(self, text): text = toStr(text) if self.msgLogFile: offset = self.msgLogFile.tell() # print offset in the log file to facilitate navigating this # 
log in hex editor # intended usage: # the log file is opened in a text editor and hex editor # use text editor to read error messages, use hex editor to # inspect char codes offsets allows to quickly jump to the right # place of the file hex editor self.msgLogFile.write("\noffset = {0:#X}\n" % offset) self.msgLogFile.write(text+"\n") else: log.debug(text) def samplesDumpFileWrite(self, text): text = toStr(text) if self.samplesDumpFile: offset = self.samplesDumpFile.tell() self.samplesDumpFile.write("\noffset = {0:#X}\n" % offset) self.samplesDumpFile.write(text+"\n") else: log.debug(text) def dumpBlocks(self, dumpPath): import pickle self.file.seek(0) metaData = MetaData() metaData.numFiles = 0 metaData.gzipStartOffset = self.gzipOffset self.numEntries = 0 self.numBlocks = 0 range_type = None range_count = 0 block = Block() while not self.isEndOfDictData(): log.debug( "readBlock: " + "offset %#X, " % self.file.tell() + "unpacked offset %#X" % self.file.unpackedFile.tell() ) if not self.readBlock(block): break self.numBlocks += 1 if block.type in (1, 7, 10, 11, 13): self.numEntries += 1 elif block.type == 2: # Embedded File (mostly Image or HTML) metaData.numFiles += 1 if block.type in (1, 2, 7, 10, 11, 13): if range_type == block.type: range_count += 1 else: if range_count > 0: mblock = MetaDataRange(range_type, range_count) metaData.blocks.append(mblock) range_count = 0 range_type = block.type range_count = 1 else: if range_count > 0: mblock = MetaDataRange(range_type, range_count) metaData.blocks.append(mblock) range_count = 0 mblock = MetaDataBlock(block.data, block.type) metaData.blocks.append(mblock) if range_count > 0: mblock = MetaDataRange(range_type, range_count) metaData.blocks.append(mblock) range_count = 0 metaData.numEntries = self.numEntries metaData.numBlocks = self.numBlocks metaData.gzipEndOffset = self.file_bgl.tell() metaData.fileSize = os.path.getsize(self._filename) with open(self._filename, "rb") as f: metaData.bglHeader = 
f.read(self.gzipOffset) with open(dumpPath, "wb") as f: pickle.dump(metaData, f) self.file.seek(0) def dumpMetadata2(self, dumpPath): import pickle if not self.metadata2: return with open(dumpPath, "wb") as f: pickle.dump(self.metadata2, f) def processDefiStat(self, fields, defi, b_key): BglReader.processDefiStat(self, fields, defi, b_key) if fields.b_title: self.rawDumpFileWriteText("\ndefi title: ") self.rawDumpFileWriteData(fields.b_title) if fields.b_title_trans: self.rawDumpFileWriteText("\ndefi title trans: ") self.rawDumpFileWriteData(fields.b_title_trans) if fields.b_transcription_50: self.rawDumpFileWriteText( "\ndefi transcription_50 (%#x): " % fields.code_transcription_50 ) self.rawDumpFileWriteData(fields.b_transcription_50) if fields.b_transcription_60: self.rawDumpFileWriteText( "\ndefi transcription_60 (%#x): " % fields.code_transcription_60 ) self.rawDumpFileWriteData(fields.b_transcription_60) if fields.b_field_1a: self.rawDumpFileWriteText("\ndefi field_1a: ") self.rawDumpFileWriteData(fields.b_field_1a) if fields.b_field_13: self.rawDumpFileWriteText( "\ndefi field_13 bytes: %r" % fields.b_field_13 ) if fields.b_field_07: self.rawDumpFileWriteText("\ndefi field_07: ") self.rawDumpFileWriteData(fields.b_field_07) if fields.b_field_06: self.rawDumpFileWriteText( "\ndefi field_06: %s" % fields.b_field_06 ) # search for new chars in data # if new chars are found, mark them with a special sequence in the text # and print result into msg log def findAndPrintCharSamples(self, b_data, hint, encoding): assert isinstance(b_data, bytes) if not self.targetCharsArray: return offsets = self.findCharSamples(b_data) if len(offsets) == 0: return res = "" utf8 = (encoding.lower() == "utf8") i = 0 for o in offsets: j = o if utf8: while b_data[j] & 0xc0 == 0x80: j -= 1 res += b_data[i:j] res += "!!!--+!!!" 
i = j res += b_data[j:] offsets_str = " ".join([str(el) for el in offsets]) self.samplesDumpFileWrite( "charSample(%s)\noffsets = %s\nmarked = %s\norig = %s\n" % ( hint, offsets_str, res, b_data, ) ) def findCharSamples(self, b_data): """ Find samples of chars in b_data. Search for chars in data that have not been marked so far in the targetCharsArray array, mark new chars. Returns a list of offsets in b_data May return an empty list. """ res = [] if not isinstance(b_data, bytes): log.error("findCharSamples: b_data is not a bytes instance") return res if not self.targetCharsArray: log.error("findCharSamples: self.targetCharsArray == None") return res for i, char in enumerate(b_data): if x < 128: continue if not self.targetCharsArray[x]: self.targetCharsArray[x] = True res.append(i) return res pyglossary-3.2.1/pyglossary/plugins/babylon_bgl/bgl_text.py0000644000175000017500000001757213577304507024567 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # # Copyright © 2008-2016 Saeed Rasooli (ilius) # Copyright © 2011-2012 kubtek # This file is part of PyGlossary project, http://github.com/ilius/pyglossary # Thanks to Raul Fernandes and Karl Grill # for reverse engineering # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 3, or (at your option) # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program. Or on Debian systems, from /usr/share/common-licenses/GPL # If not, see . 
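`GzipWithCheck` above pairs the gzip stream with a pre-unpacked copy of the same data and compares every `read` (and mirrors `seek`/`tell`/`flush`), so silent decompression bugs surface immediately instead of corrupting entries downstream. A minimal self-contained sketch of that lockstep-verification idea, with illustrative names rather than the class above:

```python
import gzip
import io

class CheckedReader:
	"""Read two streams in lockstep and count mismatching reads,
	the way GzipWithCheck compares the gzip stream against the
	pre-unpacked reference file (which logs instead of counting)."""

	def __init__(self, packedFile, referenceFile):
		self.packedFile = packedFile
		self.referenceFile = referenceFile
		self.mismatches = 0

	def read(self, size=-1):
		b1 = self.packedFile.read(size)
		b2 = self.referenceFile.read(size)
		if b1 != b2:
			self.mismatches += 1  # the real class logs size and both buffers
		return b1

raw = b"hello world" * 10
packed = gzip.compress(raw)
reader = CheckedReader(
	gzip.GzipFile(fileobj=io.BytesIO(packed)),  # decompressing side
	io.BytesIO(raw),  # reference side
)
data = reader.read()
# data == raw and reader.mismatches == 0 when decompression is faithful
```

The design trades double I/O for a guarantee that any divergence is caught at the exact read where it happens, which is why the debug reader uses it only when an `unpackedGzipPath` is supplied.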
import re from pyglossary.plugins.formats_common import log from pyglossary.xml_utils import xml_escape u_pat_html_entry = re.compile("(?:&#x|&#|&)(\\w+);?", re.I) u_pat_html_entry_key = re.compile("(?:&#x|&#|&)(\\w+);", re.I) b_pat_ascii_char_ref = re.compile(b"(&#\\w+;)", re.I) unkownHtmlEntries = set() def replaceHtmlEntryNoEscapeCB(u_match): """ u_match: instance of _sre.SRE_Match Replace character entity with the corresponding character Return the original string if conversion fails. Use this as a replace function of re.sub. """ import html.entities from pyglossary.html_utils import name2codepoint u_text = u_match.group(0) u_name = u_match.group(1) if log.isDebug(): assert isinstance(u_text, str) and isinstance(u_name, str) u_res = None if u_text[:2] == "&#": # character reference try: if u_text[:3].lower() == "&#x": code = int(u_name, 16) else: code = int(u_name) if code <= 0: raise ValueError() u_res = chr(code) except (ValueError, OverflowError): u_res = chr(0xFFFD) # replacement character elif u_text[0] == "&": # named entity try: u_res = chr(html.entities.name2codepoint[u_name]) except KeyError: try: u_res = chr(name2codepoint[u_name.lower()]) except KeyError: """ Babylon dictionaries contain a lot of non-standard entity, references for example, csdot, fllig, nsm, cancer, thlig, tsdot, upslur... This not just a typo. These entries repeat over and over again. Perhaps they had meaning in the source dictionary that was converted to Babylon, but now the meaning is lost. Babylon does render them as is, that is, for example, &csdot; despite other references like & are replaced with corresponding characters. """ unkownHtmlEntries.add(u_text) u_res = u_text else: raise ArgumentError() return u_res def replaceHtmlEntryCB(u_match): """ u_match: instance of _sre.SRE_Match Same as replaceHtmlEntryNoEscapeCB, but escapes result string Only <, >, & characters are escaped. 
""" u_res = replaceHtmlEntryNoEscapeCB(u_match) if u_match.group(0) == u_res: # conversion failed return u_res else: return xml_escape(u_res) def replaceDingbat(u_match): """ u_match: instance of _sre.SRE_Match replace chars \\u008c-\\u0095 with \\u2776-\\u277f """ ch = u_match.group(0) code = ch + (0x2776-0x8c) return chr(code) def escapeNewlinesCallback(u_match): """ u_match: instance of _sre.SRE_Match """ ch = u_match.group(0) if ch == "\n": return "\\n" if ch == "\r": return "\\r" if ch == "\\": return "\\\\" return ch def replaceHtmlEntries(u_text): # &ldash; # “ # ċ if log.isDebug(): assert isinstance(u_text, str) return re.sub( u_pat_html_entry, replaceHtmlEntryCB, u_text, ) def replaceHtmlEntriesInKeys(u_text): # &ldash; # “ # ċ if log.isDebug(): assert isinstance(u_text, str) return re.sub( u_pat_html_entry_key, replaceHtmlEntryNoEscapeCB, u_text, ) def escapeNewlines(u_text): """ convert text to c-escaped string: \ -> \\ new line -> \n or \r """ if log.isDebug(): assert isinstance(u_text, str) return re.sub( "[\\r\\n\\\\]", escapeNewlinesCallback, u_text, ) def stripHtmlTags(u_text): if log.isDebug(): assert isinstance(text, str) return re.sub( "(?:<[/a-zA-Z].*?(?:>|$))+", " ", u_text, ) def removeControlChars(u_text): # \x09 - tab # \x0a - line feed # \x0b - vertical tab # \x0d - carriage return if log.isDebug(): assert isinstance(u_text, str) return re.sub( "[\x00-\x08\x0c\x0e-\x1f]", "", u_text, ) def removeNewlines(u_text): if log.isDebug(): assert isinstance(u_text, str) return re.sub( "[\r\n]+", " ", u_text, ) def normalizeNewlines(u_text): """ convert new lines to unix style and remove consecutive new lines """ if log.isDebug(): assert isinstance(u_text, str) return re.sub( "[\r\n]+", "\n", u_text, ) def replaceAsciiCharRefs(b_text, encoding): # “ # ċ if log.isDebug(): assert isinstance(b_text, bytes) b_parts = re.split(b_pat_ascii_char_ref, b_text) for i_part, b_part in enumerate(b_parts): if i_part % 2 != 1: continue # reference try: if 
b_part[:3].lower() == "&#x": code = int(b_part[3:-1], 16) else: code = int(b_part[2:-1]) if code <= 0: raise ValueError() except (ValueError, OverflowError): code = -1 if code < 128 or code > 255: continue # no need to escape "<", ">", "&" b_parts[i_part] = bytes([code]) return b"".join(b_parts) def fixImgLinks(u_text): """ Fix img tag links src attribute value of image tag is often enclosed in \x1e - \x1f characters. For example: . Naturally the control characters are not part of the image source name. They may be used to quickly find all names of resources. This function strips all such characters. Control characters \x1e and \x1f are useless in html text, so we may safely remove all of them, irrespective of context. """ if log.isDebug(): assert isinstance(u_text, str) return u_text.replace("\x1e", "").replace("\x1f", "") def stripDollarIndexes(b_word): if log.isDebug(): assert isinstance(b_word, bytes) i = 0 b_word_main = b"" strip_count = 0 # number of sequences found # strip $$ sequences while True: d0 = b_word.find(b"$", i) if d0 == -1: b_word_main += b_word[i:] break d1 = b_word.find(b"$", d0+1) if d1 == -1: # log.debug( # "stripDollarIndexes(%s):\npaired $ is not found" % b_word # ) b_word_main += b_word[i:] break if d1 == d0+1: """ You may find keys (or alternative keys) like these: sur l'arbre$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ obscurantiste$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ They all end on a sequence of b'$', key length including dollars is always 60 chars. 
You may find keys like these: extremidade-$$$-$$$-linha .FIRM$$$$$$$$$$$$$ etc summary: we must remove any sequence of dollar signs longer than 1 chars """ # log.debug("stripDollarIndexes(%s):\nfound $$"%b_word) b_word_main += b_word[i:d0] i = d1 + 1 while i < len(b_word) and b_word[i] == ord(b"$"): i += 1 if i >= len(b_word): break continue if b_word[d0+1:d1].strip(b"0123456789"): # if has at least one non-digit char # log.debug( # "stripDollarIndexes(%s):\nnon-digit between $$'%b_word # ) b_word_main += b_word[i:d1] i = d1 continue if d1+1 < len(b_word) and b_word[d1+1] != 0x20: """ Examples: make do$4$/make /do potere$1$
<BR>
See also notes... volere$1$<BR>
<BR>
See also notes... Ihre$1$Ihres """ log.debug( "stripDollarIndexes(%s):\n" % b_word + "second $ is followed by non-space" ) pass b_word_main += b_word[i:d0] i = d1+1 strip_count += 1 return b_word_main, strip_count pyglossary-3.2.1/pyglossary/plugins/babylon_bgl/gzip_no_crc.patch0000644000175000017500000000032513575553425025711 0ustar emfoxemfox000000000000007a8,10 > import logging > log = logging.getLogger('root') > 498c501 < raise OSError("CRC check failed %s != %s" % (hex(crc32), --- > log.warning("CRC check failed %s != %s" % (hex(crc32), pyglossary-3.2.1/pyglossary/plugins/babylon_source.py0000644000175000017500000000345013577304507023501 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # Source Glossary for "Babylon Builder". # A plain text file. Not binary like BGL files. from formats_common import * enable = True format = "BabylonSource" description = "Babylon Source (gls)" extentions = [".gls", ".babylon"] readOptions = [] writeOptions = [ "writeInfo", # bool "newline", # str, or choice ("\r\n", "\n", or "\r") "encoding", # str "resources", # bool ] def entryCleanWinArabic(entry): from pyglossary.arabic_utils import cleanWinArabicStr entry.editFuncWord(cleanWinArabicStr) entry.editFuncDefi(cleanWinArabicStr) return entry def write( glos, filename, writeInfo=True, newline="", encoding="", resources=True, ): g = glos entryFilterFunc = None if encoding.lower() in ("", "utf8", "utf-8"): encoding = "UTF-8" elif encoding.lower() in ( "arabic", "windows-1256", "windows-arabic", "arabic-windows", "arabic windows", "windows arabic", ): encoding = "windows-1256" entryFilterFunc = entryCleanWinArabic if not newline: newline = "\r\n" if not newline: newline = "\n" head = "" if writeInfo: head += "\n".join([ "### Glossary title:%s" % g.getInfo("name"), "### Author:%s" % g.getInfo("author"), "### Description:%s" % g.getInfo("description"), "### Source language:%s" % g.getInfo("inputlang"), "### Source alphabet:%s" % encoding, "### Target language:%s" % 
g.getInfo("outputlang"), "### Target alphabet:%s" % encoding, "### Browsing enabled?Yes", "### Type of glossary:00000000", "### Case sensitive words?0", "", "### Glossary section:", "", ]) g.writeTxt( "\n", "\n\n", filename=filename, writeInfo=False, rplList=( ("\n", "<BR>
"), ), ext=".gls", head=head, entryFilterFunc=entryFilterFunc, encoding=encoding, newline=newline, resources=resources, ) pyglossary-3.2.1/pyglossary/plugins/cc_cedict/0000755000175000017500000000000013577304644022021 5ustar emfoxemfox00000000000000pyglossary-3.2.1/pyglossary/plugins/cc_cedict/.gitignore0000644000175000017500000000003713575553425024012 0ustar emfoxemfox00000000000000.*.swp __pycache__ *.pyc venv pyglossary-3.2.1/pyglossary/plugins/cc_cedict/__init__.py0000644000175000017500000000320713577304507024132 0ustar emfoxemfox00000000000000import re from formats_common import log from . import conv enable = True format = "CC-CEDICT" description = "CC-CEDICT" # this typo is part of the API used by PyGlossary; don't change it! extentions = (".u8",) entry_count_reg = re.compile(r"#! entries=(\d+)") class Reader: def __init__(self, glos): self._glos = glos self.file = None self.total_entries = self.entries_left = None def open(self, filename, encoding="utf-8"): if self.file is not None: self.file.close() self.file = open(filename, "r", encoding=encoding) for line in self.file: match = entry_count_reg.match(line) if match is not None: count = match.groups()[0] self.total_entries = self.entries_left = int(count) break else: self.close() raise RuntimeError("CC-CEDICT: could not find entry count") def close(self): if self.file is not None: self.file.close() self.file = None self.total_entries = self.entries_left = None def __len__(self): if self.total_entries is None: raise RuntimeError("CC-CEDICT: len(reader) called "\ "while reader is not open") return self.total_entries def __iter__(self): if self.file is None: raise RuntimeError("CC-CEDICT: tried to iterate over entries "\ "while reader is not open") for line in self.file: if not line.startswith("#"): if self.entries_left == 0: log.warning("more entries than the header claimed?!") self.entries_left -= 1 parts = conv.parse_line(line) if parts is None: log.warning("bad line: %s", line) continue names, article = 
conv.make_entry(*parts) entry = self._glos.newEntry(names, article, defiFormat="h") yield entry pyglossary-3.2.1/pyglossary/plugins/cc_cedict/article.html0000644000175000017500000000120113575553425024325 0ustar emfoxemfox00000000000000{% macro colorize(syllables, tones) %}
{% for syllable, tone in zip(syllables, tones) %} {{syllable}} {% endfor %}
{% endmacro %}
{{colorize(simp, tones)}} {% if trad != simp %}  /  {{colorize(trad, tones)}} {% endif %}
{{colorize(pinyin, tones)}}
    {% for defn in defns %}
  • {{defn}}
  • {% endfor %}
pyglossary-3.2.1/pyglossary/plugins/cc_cedict/conv.py0000644000175000017500000000313113575553425023337 0ustar emfoxemfox00000000000000import re import os from .pinyin import convert from .summarize import summarize line_reg = re.compile(r"^([^ ]+) ([^ ]+) \[([^\]]+)\] /(.+)/$") jinja_env = None script_dir = os.path.dirname(__file__) COLORS = { "": "black", "1": "red", "2": "orange", "3": "green", "4": "blue", "5": "black", } try: ModuleNotFoundError except NameError: ModuleNotFoundError = ImportError def load_jinja(): global jinja_env try: import jinja2 except ModuleNotFoundError as e: e.msg += ", run `sudo pip3 install jinja2` to install" raise e from .jinja2htmlcompress import HTMLCompress jinja_env = jinja2.Environment( loader=jinja2.FileSystemLoader(script_dir), extensions=[HTMLCompress], ) def parse_line(line): line = line.strip() match = line_reg.match(line) if match is None: return None trad, simp, pinyin, eng = match.groups() pinyin = pinyin.replace("u:", "v") eng = eng.split("/") return trad, simp, pinyin, eng def make_entry(trad, simp, pinyin, eng): eng_names = list(map(summarize, eng)) names = [simp, trad, pinyin] + eng_names article = render_article(trad, simp, pinyin, eng) return names, article def render_article(trad, simp, pinyin, eng): if jinja_env is None: load_jinja() pinyin_tones = [convert(syl) for syl in pinyin.split()] nice_pinyin = [] tones = [] for syllable in pinyin.split(): nice_syllable, tone = convert(syllable) nice_pinyin.append(nice_syllable) tones.append(tone) template = jinja_env.get_template("article.html") return template.render( zip=zip, COLORS=COLORS, trad=trad, simp=simp, pinyin=nice_pinyin, tones=tones, defns=eng, ) pyglossary-3.2.1/pyglossary/plugins/cc_cedict/jinja2htmlcompress.py0000644000175000017500000001773413575553425026226 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # copied from https://github.com/mitsuhiko/jinja2-htmlcompress """ jinja2htmlcompress ~~~~~~~~~~~~~~~~~~ A Jinja2 extension that eliminates 
useless whitespace at template compilation time without extra overhead. :copyright: (c) 2011 by Armin Ronacher. :license: BSD, see bottom of file for more details. """ import re from jinja2.ext import Extension from jinja2.lexer import Token, describe_token from jinja2 import TemplateSyntaxError _tag_re = re.compile(r'(?:<(/?)([a-zA-Z0-9_-]+)\s*|(>\s*))(?s)') _ws_normalize_re = re.compile(r'[ \t\r\n]+') class StreamProcessContext(object): def __init__(self, stream): self.stream = stream self.token = None self.stack = [] def fail(self, message): raise TemplateSyntaxError(message, self.token.lineno, self.stream.name, self.stream.filename) def _make_dict_from_listing(listing): rv = {} for keys, value in listing: for key in keys: rv[key] = value return rv class HTMLCompress(Extension): isolated_elements = set(['script', 'style', 'noscript', 'textarea']) void_elements = set(['br', 'img', 'area', 'hr', 'param', 'input', 'embed', 'col']) block_elements = set(['div', 'p', 'form', 'ul', 'ol', 'li', 'table', 'tr', 'tbody', 'thead', 'tfoot', 'tr', 'td', 'th', 'dl', 'dt', 'dd', 'blockquote', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'pre']) breaking_rules = _make_dict_from_listing([ (['p'], set(['#block'])), (['li'], set(['li'])), (['td', 'th'], set(['td', 'th', 'tr', 'tbody', 'thead', 'tfoot'])), (['tr'], set(['tr', 'tbody', 'thead', 'tfoot'])), (['thead', 'tbody', 'tfoot'], set(['thead', 'tbody', 'tfoot'])), (['dd', 'dt'], set(['dl', 'dt', 'dd'])) ]) def is_isolated(self, stack): for tag in reversed(stack): if tag in self.isolated_elements: return True return False def is_breaking(self, tag, other_tag): breaking = self.breaking_rules.get(other_tag) return breaking and (tag in breaking or ('#block' in breaking and tag in self.block_elements)) def enter_tag(self, tag, ctx): while ctx.stack and self.is_breaking(tag, ctx.stack[-1]): self.leave_tag(ctx.stack[-1], ctx) if tag not in self.void_elements: ctx.stack.append(tag) def leave_tag(self, tag, ctx): if not ctx.stack: 
ctx.fail('Tried to leave "%s" but something closed ' 'it already' % tag) if tag == ctx.stack[-1]: ctx.stack.pop() return for idx, other_tag in enumerate(reversed(ctx.stack)): if other_tag == tag: for num in range(idx + 1): ctx.stack.pop() elif not self.breaking_rules.get(other_tag): break def normalize(self, ctx): pos = 0 buffer = [] def write_data(value): if not self.is_isolated(ctx.stack): value = _ws_normalize_re.sub(' ', value.strip()) buffer.append(value) for match in _tag_re.finditer(ctx.token.value): closes, tag, sole = match.groups() preamble = ctx.token.value[pos:match.start()] write_data(preamble) if sole: write_data(sole) else: buffer.append(match.group()) (closes and self.leave_tag or self.enter_tag)(tag, ctx) pos = match.end() write_data(ctx.token.value[pos:]) return ''.join(buffer) def filter_stream(self, stream): ctx = StreamProcessContext(stream) for token in stream: if token.type != 'data': yield token continue ctx.token = token value = self.normalize(ctx) yield Token(token.lineno, 'data', value) class SelectiveHTMLCompress(HTMLCompress): def filter_stream(self, stream): ctx = StreamProcessContext(stream) strip_depth = 0 while 1: if stream.current.type == 'block_begin': if stream.look().test('name:strip') or \ stream.look().test('name:endstrip'): stream.skip() if stream.current.value == 'strip': strip_depth += 1 else: strip_depth -= 1 if strip_depth < 0: ctx.fail('Unexpected tag endstrip') stream.skip() if stream.current.type != 'block_end': ctx.fail('expected end of block, got %s' % describe_token(stream.current)) stream.skip() if strip_depth > 0 and stream.current.type == 'data': ctx.token = stream.current value = self.normalize(ctx) yield Token(stream.current.lineno, 'data', value) else: yield stream.current next(stream) def test(): from jinja2 import Environment env = Environment(extensions=[HTMLCompress]) tmpl = env.from_string(''' {{ title }}
  • {{ title }}
    Test Foo
  • {{ title }} ''') print(tmpl.render(title=42, href='index.html')) env = Environment(extensions=[SelectiveHTMLCompress]) tmpl = env.from_string(''' Normal unchanged stuff {% strip %}Stripped test test {{ foo }} Normal again {{ foo }}

    Foo
    Bar Baz

    Moep Test Moep

    {% endstrip %} ''') print(tmpl.render(foo=42)) if __name__ == '__main__': test() # What follows is the original LICENSE file; there was not actually an AUTHORS # file in the original repo, so its absence here should not be an issue. # Copyright (c) 2011 by Armin Ronacher, see AUTHORS for more details. # # Some rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are # met: # # * Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # # * Redistributions in binary form must reproduce the above # copyright notice, this list of conditions and the following # disclaimer in the documentation and/or other materials provided # with the distribution. # # * The names of the contributors may not be used to endorse or # promote products derived from this software without specific # prior written permission. # # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR # A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT # OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, # DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
pyglossary-3.2.1/pyglossary/plugins/cc_cedict/pinyin.py0000644000175000017500000000137713575553425023712 0ustar emfoxemfox00000000000000# coding=utf-8 # based on https://github.com/zkoch/CEDICT_Parser TONES = { "a1":"ā", "a2":"á", "a3":"ǎ", "a4":"à", "e1":"ē", "e2":"é", "e3":"ě", "e4":"è", "i1":"ī", "i2":"í", "i3":"ǐ", "i4":"ì", "o1":"ō", "o2":"ó", "o3":"ǒ", "o4":"ò", "u1":"ū", "u2":"ú", "u3":"ǔ", "u4":"ù", "v1":"ǖ", "v2":"ǘ", "v3":"ǚ", "v4":"ǜ", } # using v for the umlauted u VOWELS = ("a", "e", "o", "iu", "ui", "i", "u", "v") def convert(word): tone = word[-1] pinyin = word[0:-1].lower() result = pinyin if tone == "5": return pinyin, tone elif tone not in ("1", "2", "3", "4"): return word, "" for vowel in VOWELS: if vowel in pinyin: vowel1 = vowel[-1] result = pinyin.replace(vowel1, TONES[vowel1 + tone]) break return result, tone pyglossary-3.2.1/pyglossary/plugins/cc_cedict/summarize.py0000644000175000017500000000273113575553425024413 0ustar emfoxemfox00000000000000import re import string parenthetical = re.compile(r"\([^)]+?\)") punct_table = {ord(p): " " for p in string.punctuation if p not in "-'"} stops = 
{"i","me","my","myself","we","our","ours","ourselves","you","your","yours","yourself","yourselves","he","him","his","himself","she","her","hers","herself","it","its","itself","they","them","their","theirs","themselves","what","which","who","whom","this","that","these","those","am","is","are","was","were","be","been","being","have","has","had","having","do","does","did","doing","a","an","the","and","but","if","or","because","as","until","while","of","at","by","for","with","about","against","between","into","through","during","before","after","above","below","to","from","up","down","in","out","on","off","over","under","again","further","then","once","here","there","when","where","why","how","all","any","both","each","few","more","most","other","some","such","no","nor","not","only","own","same","so","than","too","very","s","t","can","will","just","don","should","now","d","ll","m","o","re","ve","y","ain","aren","couldn","didn","doesn","hadn","hasn","haven","isn","ma","mightn","mustn","needn","shan","shouldn","wasn","weren","won","wouldn"} def summarize(phrase): phrase = parenthetical.sub("", phrase) phrase = phrase.translate(punct_table) words = phrase.split() relevant_words = [word for word in words if word not in stops] if not relevant_words: relevant_words = words summary = " ".join(relevant_words[:10]) return summary pyglossary-3.2.1/pyglossary/plugins/csv_pyg.py0000644000175000017500000000704713577304507022153 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # # Copyright © 2013 Saeed Rasooli (ilius) # This file is part of PyGlossary project, https://github.com/ilius/pyglossary # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 3, or (at your option) # any later version. 
# # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program. Or on Debian systems, from /usr/share/common-licenses/GPL # If not, see . from formats_common import * import csv from pyglossary.file_utils import fileCountLines enable = True format = "Csv" description = "CSV" extentions = [".csv"] readOptions = [ "encoding", # str ] writeOptions = [ "encoding", # str "resources", # bool ] supportsAlternates = True class Reader(object): def __init__(self, glos): self._glos = glos self.clear() def clear(self): self._filename = "" self._file = None self._leadingLinesCount = 0 self._wordCount = None self._pos = -1 self._csvReader = None self._resDir = "" self._resFileNames = [] def open(self, filename, encoding="utf-8"): self._filename = filename self._file = open(filename, "r", encoding=encoding) self._csvReader = csv.reader( self._file, dialect="excel", ) self._resDir = filename + "_res" if isdir(self._resDir): self._resFileNames = os.listdir(self._resDir) else: self._resDir = "" self._resFileNames = [] def close(self): if self._file: try: self._file.close() except: log.exception("error while closing csv file") self.clear() def __len__(self): if self._wordCount is None: log.debug("Try not to use len(reader) as it takes extra time") self._wordCount = fileCountLines(self._filename) - \ self._leadingLinesCount return self._wordCount + len(self._resFileNames) def __iter__(self): if not self._csvReader: log.error("%s is not open, can not iterate" % self) raise StopIteration wordCount = 0 for row in self._csvReader: wordCount += 1 if not row: yield None # update progressbar continue try: word = row[0] defi = row[1] except IndexError: log.error("invalid row: %r" % row) yield None # update progressbar 
continue try: alts = row[2].split(",") except IndexError: pass else: word = [word] + alts yield self._glos.newEntry(word, defi) self._wordCount = wordCount resDir = self._resDir for fname in self._resFileNames: with open(join(resDir, fname), "rb") as fromFile: yield self._glos.newDataEntry( fname, fromFile.read(), ) def write(glos, filename, encoding="utf-8", resources=True): resDir = filename + "_res" if not isdir(resDir): os.mkdir(resDir) with open(filename, "w", encoding=encoding) as csvfile: writer = csv.writer( csvfile, dialect="excel", quoting=csv.QUOTE_ALL, # FIXME ) for entry in glos: if entry.isData(): if resources: entry.save(resDir) continue words = entry.getWords() if not words: continue word, alts = words[0], words[1:] defi = entry.getDefi() row = [ word, defi, ] if alts: row.append(",".join(alts)) writer.writerow(row) if not os.listdir(resDir): os.rmdir(resDir) pyglossary-3.2.1/pyglossary/plugins/dicformids.py0000644000175000017500000001210513577304507022613 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- import re from string import Template from tabfile import Reader as TabfileReader from formats_common import * enable = True format = "Dicformids" description = "DictionaryForMIDs" extentions = [".mids"] readOptions = [] writeOptions = [] PROP_TEMPLATE = """#DictionaryForMIDs property file infoText=$name, author: $author indexFileMaxSize=$indexFileMaxSize\n language1IndexNumberOfSourceEntries=%wordCount language1DictionaryUpdateClassName=de.kugihan.dictionaryformids.dictgen.DictionaryUpdate indexCharEncoding=ISO-8859-1 dictionaryFileSeparationCharacter='\\t' language2NormationClassName=de.kugihan.dictionaryformids.translation.Normation language2DictionaryUpdateClassName=de.kugihan.dictionaryformids.dictgen.DictionaryUpdate logLevel=0 language1FilePostfix=$directoryPostfix dictionaryCharEncoding=UTF-8 numberOfAvailableLanguages=2 language1IsSearchable=true language2GenerateIndex=false dictionaryFileMaxSize=$dicMaxSize 
language2FilePostfix=$language2FilePostfix searchListFileMaxSize=20000 language2IsSearchable=false fileEncodingFormat=plain_format1 language1HasSeparateDictionaryFile=true searchListCharEncoding=ISO-8859-1 searchListFileSeparationCharacter='\t' indexFileSeparationCharacter='\t' language1DisplayText=$inputlang language2HasSeparateDictionaryFile=false dictionaryGenerationInputCharEncoding=UTF-8 language1GenerateIndex=true language2DisplayText=$outputlang language1NormationClassName=de.kugihan.dictionaryformids.translation.NormationEng """ class Reader(object): def __init__(self, glos): self._glos = glos self._tabFileNames = [] self._tabFileReader = None def open(self, dirname): self._dirname = dirname dicFiles = [] orderFileNames = [] for fname in os.listdir(dirname): if not fname.startswith("directory"): continue try: num = re.findall("\d+", fname)[-1] except IndexError: pass else: orderFileNames.append((num, fname)) orderFileNames.sort( key=lambda x: x[0], reverse=True, ) self._tabFileNames = [x[1] for x in orderFileNames] self.nextTabFile() def __len__(self): # FIXME raise NotImplementedError def __iter__(self): return self def __next__(self): for _ in range(10): try: return next(self._tabFileReader) except StopIteration: self._tabFileReader.close() self.nextTabFile() def nextTabFile(self): try: tabFileName = self._tabFileNames.pop() except IndexError: raise StopIteration self._tabFileReader = TabfileReader(self._glos, hasInfo=False) self._tabFileReader.open(join(self._dirname, tabFileName)) def close(self): if self._tabFileReader: try: self._tabFileReader.close() except: pass self._tabFileReader = None self._tabFileNames = [] class Writer(object): def __init__(self, glos): self._glos = glos self.linesPerDirectoryFile = 500 # 200 self.indexFileMaxSize = 32722 # 30000 self.directoryPostfix = "" self.indexPostfix = "Eng" self.dirname = "" def open(self, dirname): self.dirname = dirname if not os.path.isdir(dirname): os.mkdir(dirname) def writeGetIndexGen(self): 
dicMaxSize = 0 wordCount = 0 for dicIndex, entryList in enumerate( self._glos.iterEntryBuckets( self.linesPerDirectoryFile ) ): # assert len(entryList) == 200 dicFp = open(join( self.dirname, "directory%s%d.csv" % ( self.directoryPostfix, dicIndex+1, ), ), "w") for entry in entryList: if entry.isData(): # FIXME continue wordCount += 1 word = entry.getWord() defi = entry.getDefi() dicLine = "%s\t%s\n" % (word, defi) dicPos = dicFp.tell() dicFp.write(dicLine) yield word, dicIndex+1, dicPos dicMaxSize = max(dicMaxSize, dicFp.tell()) dicFp.close() self.dicMaxSize = dicMaxSize self.wordCount = wordCount def writeProbs(self): glos = self._glos with open(join( self.dirname, "DictionaryForMIDs.properties", ), "w") as fp: fp.write(Template(PROP_TEMPLATE).substitute( name=glos.getInfo("name"), author=glos.getInfo("author"), indexFileMaxSize=self.indexFileMaxSize, wordCount=self.wordCount, directoryPostfix=self.directoryPostfix, dicMaxSize=self.dicMaxSize+1, language2FilePostfix="fa", # FIXME inputlang=glos.getInfo("inputlang"), outputlang=glos.getInfo("outputlang"), )) # open(join( # self.dirname, # "searchlist%s.csv"%self.directoryPostfix # ), "w") # FIXME def nextIndex(self): try: self.indexFp.close() except AttributeError: self.indexIndex = 0 self.indexIndex += 1 fname = "index%s%d.csv" % (self.indexPostfix, self.indexIndex) fpath = join(self.dirname, fname) self.indexFp = open(fpath, "w") def write(self): self.nextIndex() for word, dicIndex, dicPos in self.writeGetIndexGen(): indexLine = "%s\t%d-%d-B\n" % ( word, dicIndex + 1, dicPos, ) if ( self.indexFp.tell() + len(indexLine) ) > self.indexFileMaxSize - 10: self.nextIndex() self.indexFp.write(indexLine) self.indexFp.close() self.writeProbs() # def close(self): # pass def write(glos, filename): writer = Writer(glos) writer.open(filename) writer.write() pyglossary-3.2.1/pyglossary/plugins/dict_org.py0000644000175000017500000001111213577304507022257 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- from 
formats_common import * from pyglossary.file_utils import fileCountLines enable = True format = "DictOrg" description = "DICT.org file format (.index)" extentions = [".index"] readOptions = [] writeOptions = [ "dictzip", # bool "install", # bool ] sortOnWrite = DEFAULT_YES b64_chars = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/" b64_chars_ord = {c: i for i, c in enumerate(b64_chars)} def intToIndexStr(n, retry=0): chars = [] while True: chars.append(b64_chars[n & 0x3f]) n >>= 6 if n == 0: break return bytes(reversed(chars)) def indexStrToInt(st): n = 0 for i, c in enumerate(reversed(list(st))): k = b64_chars_ord[c] assert 0 <= k < 64 n |= (k << 6*i) # += is safe # |= is probably a little faster # |= is also safe because n has lesser that 6*i bits. why? ## FIXME return n def installToDictd(filename, title=""): """ filename is without extention (neither .index or .dict or .dict.dz) """ import shutil targetDir = "/usr/share/dictd/" log.info("Installing %r to DICTD server" % filename) if os.path.isfile(filename + ".dict.dz"): dictPostfix = ".dict.dz" elif os.path.isfile(filename + ".dict"): dictPostfix = ".dict" else: log.error("No .dict file, could not install dictd file %r" % filename) return False if not filename.startswith(targetDir): shutil.copy(filename + ".index", targetDir) shutil.copy(filename + dictPostfix, targetDir) fname = split(filename)[1] if not title: title = fname open("/var/lib/dictd/db.list", "a").write(""" database %s { data %s index %s } """ % ( title, join(targetDir, fname + dictPostfix), join(targetDir, fname + ".index"), )) class Reader(object): def __init__(self, glos): self._glos = glos self._filename = "" self._indexFp = None self._dictFp = None self._leadingLinesCount = 0 self._len = None def open(self, filename): import gzip if filename.endswith(".index"): filename = filename[:-6] self._filename = filename self._indexFp = open(filename+".index", "rb") if os.path.isfile(filename+".dict.dz"): self._dictFp = 
gzip.open(filename+".dict.dz") else: self._dictFp = open(filename+".dict", "rb") def close(self): if self._indexFp is not None: try: self._indexFp.close() except: log.exception("error while closing index file") self._indexFp = None if self._dictFp is not None: try: self._dictFp.close() except: log.exception("error while closing dict file") self._dictFp = None def __len__(self): if self._len is None: log.debug("Try not to use len(reader) as it takes extra time") self._len = fileCountLines( self._filename + ".index" ) - self._leadingLinesCount return self._len def __iter__(self): if not self._indexFp: log.error("reader is not open, can not iterate") raise StopIteration # read info from header of dict file # FIXME word = "" sumLen = 0 wrongSortedN = 0 wordCount = 0 # __________________ IMPORTANT PART __________________ # for line in self._indexFp: line = line.strip() if not line: continue parts = line.split(b"\t") assert len(parts) == 3 word = parts[0].replace(b"
<BR>", b"\\n")\ .replace(b"<br>
    ", b"\\n") sumLen2 = indexStrToInt(parts[1]) if sumLen2 != sumLen: wrongSortedN += 1 sumLen = sumLen2 defiLen = indexStrToInt(parts[2]) self._dictFp.seek(sumLen) defi = self._dictFp.read(defiLen) defi = defi.replace(b"
<BR>", b"\n").replace(b"<br>
    ", b"\n") sumLen += defiLen yield self._glos.newEntry(toStr(word), toStr(defi)) wordCount += 1 # ____________________________________________________ # if wrongSortedN > 0: log.warning("Warning: wrong sorting count: %d" % wrongSortedN) self._len = wordCount def write(glos, filename, dictzip=True, install=True): # FIXME from pyglossary.text_utils import runDictzip (filename_nox, ext) = splitext(filename) if ext.lower() == ".index": filename = filename_nox indexFd = open(filename+".index", "wb") dictFd = open(filename+".dict", "wb") dictMark = 0 for entry in glos: if entry.isData(): # does dictd support resources? and how? FIXME continue word = toBytes(entry.getWord()) defi = toBytes(entry.getDefi()) lm = len(defi) indexFd.write( word + b"\t" + intToIndexStr(dictMark) + b"\t" + intToIndexStr(lm) + b"\n" ) # FIXME dictFd.write(toBytes(defi)) dictMark += lm indexFd.close() dictFd.close() # for key, value in glos.iterInfo(): # if not value: # continue # pass # FIXME if dictzip: runDictzip(filename) if install: installToDictd(filename, glos.getInfo("name").replace(" ", "_")) pyglossary-3.2.1/pyglossary/plugins/dsl/0000755000175000017500000000000013577304644020703 5ustar emfoxemfox00000000000000pyglossary-3.2.1/pyglossary/plugins/dsl/__init__.py0000644000175000017500000002344013577304507023015 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # dsl/__init__.py # Read ABBYY Lingvo DSL dictionary format # # Copyright (C) 2013 Xiaoqiang Wang # Copyright (C) 2013-2016 Saeed Rasooli # Copyright (C) 2016 Ratijas # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, version 3 of the License. 
# # You can get a copy of GNU General Public License along this program # But you can always get it from http://www.gnu.org/licenses/gpl.txt # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. import re import html.entities from xml.sax.saxutils import escape, quoteattr from formats_common import * from . import flawless_dsl enable = True format = "ABBYYLingvoDSL" description = "ABBYY Lingvo DSL (dsl)" extentions = [".dsl"] readOptions = ["encoding", "audio", "onlyFixMarkUp"] writeOptions = [] __all__ = ["read"] # {{{ # modified to work around codepoints that are not supported by `unichr`. # http://effbot.org/zone/re-sub.htm#unescape-html # January 15, 2003 | Fredrik Lundh # Removes HTML or XML character references and entities from a text string. # # @param text The HTML (or XML) source text. # @return The plain text, as a Unicode string, if necessary. def unescape(text): def fixup(m): text = m.group(0) if text[:2] == "&#": # character reference try: if text[:3] == "&#x": i = int(text[3:-1], 16) else: i = int(text[2:-1]) except ValueError: pass else: try: return chr(i) except ValueError: return ("\\U%08x" % i)\ .decode("unicode-escape").encode("utf-8") else: # named entity try: text = chr(html.entities.name2codepoint[text[1:-1]]) except KeyError: pass return text # leave as is return re.sub("&#?\w+;", fixup, text) # }}} def make_a_href(s): return "%s" % (quoteattr(s), escape(s)) def ref_sub(x): return make_a_href(unescape(x.groups()[0])) # order matters, a lot. shortcuts = [ # canonical: m > * > ex > i > c ( r'[i][c](.*?)[/c][/i]', r'\g<1>' ), ( r'[m(\d)][ex](.*?)[/ex][/m]', r'
    \g<2>
    ' ), ( r'[m(\d)][*][ex](.*?)[/ex][/*][/m]', r'
    \g<2>
    ' ), ( r'[*][ex](.*?)[/ex][/*]', r'\g<1>' ), ( r'[m1](?:-{2,})[/m]', '
    ' ), ( r'[m(\d)](?:-{2,})[/m]', r'
    ' ), ] shortcuts = [ ( re.compile(repl.replace('[', r'\[').replace('*]', r'\*]')), sub ) for (repl, sub) in shortcuts ] # precompiled regexs re_brackets_blocks = re.compile(r'\{\{[^}]*\}\}') re_lang_open = re.compile(r'(?
    {}
    [m{}] =>
    [*] => [ex] => [c] => [p] => ['] => [b] => [i] => [u] => [sup] => [sub] => [ref] \ [url] } => {} <<...>> / [s] => [s] => {} [t] => {{...}} \ [trn] | [!trn] | [trs] } => remove [!trs] | [lang ...] | [com] / """ # remove {{...}} blocks line = re_brackets_blocks.sub("", line) # remove trn tags # re_trn = re.compile("\[\/?!?tr[ns]\]") line = line \ .replace("[trn]", "") \ .replace("[/trn]", "") \ .replace("[trs]", "") \ .replace("[/trs]", "") \ .replace("[!trn]", "") \ .replace("[/!trn]", "") \ .replace("[!trs]", "") \ .replace("[/!trs]", "") # remove lang tags line = re_lang_open.sub("", line).replace("[/lang]", "") # remove com tags line = line.replace("[com]", "").replace("[/com]", "") # remove t tags line = line.replace( "[t]", "" ) line = line.replace("[/t]", "") line = _parse(line) line = re.sub(r"\\$", "
<br/>", line) # paragraph, part one: before shortcuts. line = line.replace("[m]", "[m1]") # if line somewhere contains "[m_]" tag like # "[b]I[/b][m1] [c][i]conj.[/i][/c][/m][m1]1) ...[/m]" # then leave it alone. only wrap in "[m1]" when no "m" tag found at all. if not re_m_open.search(line): line = "[m1]%s[/m]" % line line = apply_shortcuts(line) # paragraph, part two: any [m] tags not handled by shortcuts left? line = re_m.sub(r'
<div style="margin-left:\g<1>em">\g<2></div>
    ', line) # text formats line = line.replace("[']", '').replace("[/']", '') line = line.replace('[b]', '').replace('[/b]', '') line = line.replace('[i]', '').replace('[/i]', '') line = line.replace('[u]', '').replace('[/u]', '') line = line.replace('[sup]', '').replace('[/sup]', '') line = line.replace('[sub]', '').replace('[/sub]', '') # color line = line.replace('[c]', r'') line = re_c_open_color.sub(r'', line) line = line.replace('[/c]', r'') # example zone line = line.replace('[ex]', r'') line = line.replace('[/ex]', r'') # secondary zone line = line.replace('[*]', '')\ .replace('[/*]', '') # abbrev. label line = line.replace('[p]', '') line = line.replace('[/p]', '') # cross reference line = line.replace('[ref]', '<<').replace('[/ref]', '>>') line = line.replace('[url]', '<<').replace('[/url]', '>>') line = re.sub('<<(.*?)>>', ref_sub, line) # sound file if audio: sound_tag = r'' \ '' \ '' else: sound_tag = "" line = re_sound.sub(sound_tag, line) # image file line = re_img.sub( r'\g<1>\g<2>', line, ) # \[...\] line = line.replace(r'\[', '[').replace(r'\]', ']') return line def unwrap_quotes(s): return wrapped_in_quotes_re.sub(r'\2', s) def read(glos, fname, **options): encoding = options.get("encoding") audio = (options.get("audio", "no") == "yes") onlyFixMarkUp = (options.get("onlyFixMarkUp", "no") == "yes") if onlyFixMarkUp: def clean_tags(line, audio): return _parse(line) else: clean_tags = _clean_tags def setInfo(key, value): glos.setInfo(key, unwrap_quotes(value)) current_key = "" current_key_alters = [] current_text = [] line_type = "header" unfinished_line = "" if not encoding: for testEncoding in ("utf-8", "utf-16"): with open(fname, "r", encoding=testEncoding) as fp: try: for i in range(10): fp.readline() except UnicodeDecodeError: log.info("Encoding of DSL file is not %s" % testEncoding) continue else: log.info("Encoding of DSL file detected: %s" % testEncoding) encoding = testEncoding break if not encoding: raise ValueError("Could not detect 
encoding of DSL file, specify it by: --read-options encoding=ENCODING") fp = open(fname, "r", encoding=encoding) for line in fp: line = line.rstrip() if not line: continue # header if line.startswith("#"): if line.startswith("#NAME"): setInfo("title", line[6:]) elif line.startswith("#INDEX_LANGUAGE"): setInfo("sourceLang", line[16:]) elif line.startswith("#CONTENTS_LANGUAGE"): setInfo("targetLang", line[19:]) line_type = "header" # texts elif line.startswith(" ") or line.startswith("\t"): line_type = "text" line = unfinished_line + line.lstrip() # some ill formated source may have tags spanned into # multiple lines # try to match opening and closing tags tags_open = re.findall(r'(? start new title else: # append previous entry if line_type == "text": if unfinished_line: # line may be skipped if ill formated current_text.append(clean_tags(unfinished_line, audio)) glos.addEntry( [current_key] + current_key_alters, "\n".join(current_text), ) # start new entry current_key = line current_key_alters = [] current_text = [] unfinished_line = "" line_type = "title" fp.close() # last entry if line_type == "text": glos.addEntry( [current_key] + current_key_alters, "\n".join(current_text), ) pyglossary-3.2.1/pyglossary/plugins/dsl/flawless_dsl/0000755000175000017500000000000013577304644023365 5ustar emfoxemfox00000000000000pyglossary-3.2.1/pyglossary/plugins/dsl/flawless_dsl/__init__.py0000644000175000017500000000147613575553425025507 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # flawless_dsl/__init__.py # # # Copyright (C) 2016 Ratijas # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, version 3 of the License. 
# # You can get a copy of GNU General Public License along this program # But you can always get it from http://www.gnu.org/licenses/gpl.txt # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. """ only clean flawless DSL markup on output! """ from . import layer from . import tag from .main import ( FlawlessDSLParser, parse, ) pyglossary-3.2.1/pyglossary/plugins/dsl/flawless_dsl/layer.py0000644000175000017500000000455213575553425025062 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # flawless_dsl/layer.py # # Copyright (C) 2016 Ratijas # Copyright (C) 2016 Saeed Rasooli # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, version 3 of the License. # # You can get a copy of GNU General Public License along this program # But you can always get it from http://www.gnu.org/licenses/gpl.txt # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. """ internal stuff. Layer class """ from . import tag class Layer(object): __slots__ = ['tags', 'text'] def __init__(self, stack): stack.append(self) self.tags = set() self.text = '' def __contains__(self, tag): """ :param tag: tag.Tag :return: bool """ return tag in self.tags def __repr__(self): return 'Layer({%s}, %r)' % (', '.join(map(str, self.tags)), self.text) def __eq__(self, other): """ mostly for unittest. 
""" return self.text == other.text and self.tags == other.tags i_and_c = {tag.Tag('i', 'i'), tag.Tag('c', 'c')} p_tag = tag.Tag('p', 'p') def close_tags(stack, tags, layer_index=-1): """ close given tags on layer with index `layer_index`. :param stack: Iterable[Layer] :param layer_index: int :param tags: Iterable[tag.Tag] :return: None """ if layer_index == -1: layer_index = len(stack) - 1 layer = stack[layer_index] if layer.text: tags = set.intersection(layer.tags, tags) if not tags: return # shortcut: [i][c] equivalent to [p] if tags.issuperset(i_and_c): tags -= i_and_c tags.add(p_tag) layer.tags -= i_and_c # no need to layer.tags.add() ordered_tags = tag.canonical_order(tags) layer.text = '[%s]%s[/%s]' % ( ']['.join([x.opening for x in ordered_tags]), layer.text, '][/'.join([x.closing for x in reversed(ordered_tags)])) # remove tags from layer layer.tags -= tags if layer.tags or layer_index == 0: return superlayer = stack[layer_index - 1] superlayer.text += layer.text del stack[layer_index] def close_layer(stack): """ close top layer on stack. """ if not stack: return tags = stack[-1].tags close_tags(stack, tags) pyglossary-3.2.1/pyglossary/plugins/dsl/flawless_dsl/main.py0000644000175000017500000001646313575553425024676 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # flawless_dsl/main.py # # Copyright (C) 2016 Ratijas # Copyright (C) 2016 Saeed Rasooli # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, version 3 of the License. # # You can get a copy of GNU General Public License along this program # But you can always get it from http://www.gnu.org/licenses/gpl.txt # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
""" exposed API lives here. """ import copy import re from . import tag as _tag from . import layer as _layer def process_closing_tags(stack, tags): """ close `tags`, closing some inner layers if necessary. :param stack: Iterable[layer.Layer] :param tags: Iterable[str] """ index = len(stack) - 1 for tag in copy.copy(tags): index_for_tag = _tag.index_of_layer_containing_tag(stack, tag) if index_for_tag is not None: index = min(index, index_for_tag) else: tags.remove(tag) if not tags: return to_open = set() for layer in stack[:index:-1]: for lt in layer.tags: if lt.closing not in tags: to_open.add(lt) _layer.close_layer(stack) to_close = set() layer = stack[index] for lt in layer.tags: if lt.closing in tags: to_close.add(lt) _layer.close_tags(stack, to_close, index) if to_open: _layer.Layer(stack) stack[-1].tags = to_open OPEN = 1 CLOSE = 2 TEXT = 3 BRACKET_L = "\0\1" BRACKET_R = "\0\2" # precompiled regexs re_m_tag_with_content = re.compile(r"(\[m\d\])(.*?)(\[/m\])") re_non_escaped_bracket = re.compile(r"(?= 1: # close all layers. [m*] tags can only appear # at top layer. # note: do not reopen tags that were marked as # closed already. 
to_open = set.union(*( {t for t in l.tags if t.closing not in closings} for l in stack )) for i in range(len(stack)): _layer.close_layer(stack) # assert len(stack) == 1 # assert not stack[0].tags _layer.Layer(stack) stack[-1].tags = to_open elif state is CLOSE: process_closing_tags(stack, closings) if not stack or stack[-1].text: _layer.Layer(stack) stack[-1].tags.add(item) state = OPEN continue elif item_t is CLOSE: if state in (OPEN, TEXT): closings.clear() closings.add(item) state = CLOSE continue elif item_t is TEXT: if state is CLOSE: process_closing_tags(stack, closings) if not stack: _layer.Layer(stack) stack[-1].text += item state = TEXT continue if state is CLOSE and closings: process_closing_tags(stack, closings) # shutdown unclosed tags return "".join([l.text for l in stack]) def put_brackets_away(self, line): """put away \[, \] and brackets that does not belong to any of given tags. :rtype: str """ clean_line = "" startswith_tag = _startswith_tag_cache.get(self.tags, None) if startswith_tag is None: openings = "|".join("%s%s" % (_[1], _[2]) for _ in self.tags) closings = "|".join(_[1] for _ in self.tags) startswith_tag = re.compile( r"(?:(?:%s)|/(?:%s))\]" % (openings, closings) ) _startswith_tag_cache[self.tags] = startswith_tag for i, chunk in enumerate(re_non_escaped_bracket.split(line)): if i != 0: m = startswith_tag.match(chunk) if m: clean_line += "[%s%s" % ( m.group(), chunk[m.end():].replace("[", BRACKET_L) .replace("]", BRACKET_R) ) else: clean_line += BRACKET_L + chunk.replace("[", BRACKET_L)\ .replace("]", BRACKET_R) else: # firsr chunk clean_line += chunk.replace("[", BRACKET_L)\ .replace("]", BRACKET_R) return clean_line @staticmethod def bring_brackets_back(line): return line.replace(BRACKET_L, "[").replace(BRACKET_R, "]") def parse(line, tags=None): """parse DSL markup. WARNING! `parse` function is not optimal because it creates new parser instance on each call. consider cache one [per thread] instance of FlawlessDSLParser in your code. 
""" import warnings warnings.warn( "`parse` function is not optimal because it creates new parser " "instance on each call.\n" "consider cache one [per thread] instance of FlawlessDSLParser " "in your code." ) if tags: parser = FlawlessDSLParser(tags) else: parser = FlawlessDSLParser() return parser.parse(line) pyglossary-3.2.1/pyglossary/plugins/dsl/flawless_dsl/tag.py0000644000175000017500000000405613575553425024520 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # flawless_dsl/tag.py # # Copyright (C) 2016 Ratijas # Copyright (C) 2016 Saeed Rasooli # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, version 3 of the License. # # You can get a copy of GNU General Public License along this program # But you can always get it from http://www.gnu.org/licenses/gpl.txt # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. """ internal stuff. Tag class """ from collections import namedtuple Tag = namedtuple('Tag', ['opening', 'closing']) Tag.__repr__ = lambda tag: \ 'Tag(%r)' % tag.opening if tag.opening == tag.closing \ else 'Tag(%r, %r)' % tag predefined = [ 'm', '*', 'ex', 'i', 'c', ] def was_opened(stack, tag): """ check if tag was opened at some layer before. :param stack: Iterable[layer.Layer] :param tag: tag.Tag :return: bool """ if not len(stack): return False layer = stack[-1] if tag in layer: return True return was_opened(stack[:-1], tag) def canonical_order(tags): """ arrange tags in canonical way, where (outermost to innermost): m > * > ex > i > c with all other tags follow them in alphabetical order. 
:param tags: Iterable[Tag] :return: List """ result = [] tags = list(tags) for predef in predefined: t = next((t for t in tags if t.closing == predef), None) if t: result.append(t) tags.remove(t) result.extend(sorted(tags, key=lambda x: x.opening)) return result def index_of_layer_containing_tag(stack, tag): """ return zero based index of layer with `tag` or None :param stack: Iterable[layer.Layer] :param tag: str :return: int | None """ for i, layer in enumerate(reversed(stack)): for t in layer.tags: if t.closing == tag: return len(stack) - i - 1 return None pyglossary-3.2.1/pyglossary/plugins/dsl/flawless_dsl/tests.py0000644000175000017500000003214713575553425025111 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # flawless_dsl/tests.py # # Copyright (C) 2016 Ratijas # Copyright (C) 2016 Saeed Rasooli # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, version 3 of the License. # # You can get a copy of GNU General Public License along this program # But you can always get it from http://www.gnu.org/licenses/gpl.txt # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. """ test everything. 
""" import unittest import os import sys from functools import partial sys.path.insert(0, os.path.dirname(os.path.dirname(__file__))) from flawless_dsl import layer, tag from flawless_dsl.main import ( process_closing_tags, FlawlessDSLParser, parse, BRACKET_L, BRACKET_R, ) tag_i = tag.Tag('i', 'i') tag_m = tag.Tag('m1', 'm') tag_p = tag.Tag('p', 'p') tag_s = tag.Tag('s', 's') class LayerTestCase(unittest.TestCase): def setUp(self): pass def test_new_layer(self): stack = [] l = layer.Layer(stack) self.assertEqual(1, len(stack)) self.assertEqual(l, stack[0]) def test_was_opened_AND_close_tags(self): stack = [] l1, l2 = layer.Layer(stack), layer.Layer(stack) l1.text = '...' l2.tags, l2.text = {tag_i}, ',,,' self.assertTrue(tag.was_opened(stack, tag_i)) self.assertFalse(tag.was_opened(stack, tag.Tag('c green', 'c'))) layer.close_tags(stack, {tag_i}, len(stack) - 1) expected = [] l = layer.Layer(expected) l.text = '...[i],,,[/i]' self.assertEqual(expected, stack) def test_close_layer(self): stack = [] l1, l2, l3 = layer.Layer(stack), layer.Layer(stack), layer.Layer(stack) l1.tags, l1.text = {tag_m}, '...' l2.tags, l2.text = {tag_i}, ',,,' l3.tags, l3.text = {tag_p, tag_s}, '+++' expected = [] l1, l2 = layer.Layer(expected), layer.Layer(expected) l1.tags, l1.text = {tag_m}, '...' 
l2.tags, l2.text = {tag_i}, ',,,[%s][%s]+++[/%s][/%s]' % ( tag_p.opening, tag_s.opening, tag_s.closing, tag_p.closing) layer.close_layer(stack) self.assertEqual(expected, stack) class CanonicalOrderTestCase(unittest.TestCase): def setUp(self): pass def test_no_tags(self): tags = {} expected = [] result = tag.canonical_order(tags) self.assertEqual(expected, result) def test_one_tag_not_predefined(self): tags = {tag_p} expected = [tag_p] result = tag.canonical_order(tags) self.assertEqual(expected, result) def test_one_tag_predefined(self): tags = {tag_i} expected = [tag_i] result = tag.canonical_order(tags) self.assertEqual(expected, result) def test_many_tags_not_predefined(self): tags = {tag_p, tag_s} expected = [tag_p, tag_s] result = tag.canonical_order(tags) self.assertEqual(expected, result) def test_many_tags_predefined(self): tags = {tag_m, tag_p} expected = [tag_m, tag_p] result = tag.canonical_order(tags) self.assertEqual(expected, result) def test_many_tags_mixed(self): tags = {tag_m, tag_i, tag_s, tag_p} expected = [tag_m, tag_i, tag_p, tag_s] result = tag.canonical_order(tags) self.assertEqual(expected, result) class ProcessClosingTagsTestCase(unittest.TestCase): def setUp(self): pass def test_index_of_layer_containing_tag(self): stack = [] l1, l2, l3 = layer.Layer(stack), layer.Layer(stack), layer.Layer(stack) l1.tags, l1.text = {tag_m}, '...' l2.tags, l2.text = {tag_i, tag_s}, ',,,' l3.tags, l3.text = {tag_p}, '---' fn = partial(tag.index_of_layer_containing_tag, stack) self.assertEqual(0, fn(tag_m.closing)) self.assertEqual(1, fn(tag_i.closing)) self.assertEqual(1, fn(tag_s.closing)) self.assertEqual(2, fn(tag_p.closing)) def test_close_one(self): stack = [] l1, l2 = layer.Layer(stack), layer.Layer(stack) l1.tags, l1.text = (), '...' 
l2.tags, l2.text = {tag_p}, ',,,' expected = [] l = layer.Layer(expected) l.tags, l.text = (), '...[%s],,,[/%s]' % tag_p closings = {tag_p.closing} process_closing_tags(stack, closings) self.assertEqual(expected, stack) class PutBracketsAwayTestCase(unittest.TestCase): def setUp(self): tags = frozenset({ 'b', '\'', 'c', 'i', 'sup', 'sub', 'ex', 'p', '*', ('m', '\d'), }) parser = FlawlessDSLParser(tags) self.put_brackets_away = parser.put_brackets_away def testStandaloneLeftEscapedAtTheBeginning(self): before = "[..." after = "%s..." % BRACKET_L self.assertEqual(after, self.put_brackets_away(before)) def testStandaloneRightEscapedAtTheBeginning(self): before = "]..." after = "%s..." % BRACKET_R self.assertEqual(after, self.put_brackets_away(before)) def testStandaloneLeftEscaped(self): before = r"...\[,,," after = r"...\%s,,," % BRACKET_L self.assertEqual(after, self.put_brackets_away(before)) def testStandaloneRightEscaped(self): before = r"...\],,," after = r"...\%s,,," % BRACKET_R self.assertEqual(after, self.put_brackets_away(before)) def testStandaloneLeftNonEscaped(self): before = "...[,,," after = "...%s,,," % BRACKET_L self.assertEqual(after, self.put_brackets_away(before)) def testStandaloneRightNonEscaped(self): before = "...],,," after = "...%s,,," % BRACKET_R self.assertEqual(after, self.put_brackets_away(before)) def testStandaloneLeftNonEscapedBeforeTagName(self): before = "...[p ,,," after = "...%sp ,,," % BRACKET_L self.assertEqual(after, self.put_brackets_away(before)) def testStandaloneRightNonEscapedAfterTagName(self): before = "c]..." after = "c%s..." 
% BRACKET_R self.assertEqual(after, self.put_brackets_away(before)) def testPairEscaped(self): before = r"...\[the\],,," after = r"...\%sthe\%s,,," % (BRACKET_L, BRACKET_R) self.assertEqual(after, self.put_brackets_away(before)) def testPairEscapedAroundTagName(self): before = r"...\[i\],,," after = r"...\%si\%s,,," % (BRACKET_L, BRACKET_R) self.assertEqual(after, self.put_brackets_away(before)) def testPairEscapedAroundClosingTagName(self): before = r"...\[/i\],,," after = r"...\%s/i\%s,,," % (BRACKET_L, BRACKET_R) self.assertEqual(after, self.put_brackets_away(before)) def testMixed(self): before = r"[i]...\[on \]\[the] to[p][/i]" after = r"[i]...\{L}on \{R}\{L}the{R} to[p][/i]".format( L=BRACKET_L, R=BRACKET_R, ) self.assertEqual(after, self.put_brackets_away(before)) def testEverythingEscaped(self): before = """ change it to \[b\]...\[c\]...\[/c\]\[/b\]\[c\]...\[/c\]""" after = before self.assertEqual(after, parse(before)) class FlawlessDSLParserTestCase(unittest.TestCase): def setUp(self): self.split_join = lambda x: FlawlessDSLParser.join_paragraphs( *FlawlessDSLParser.split_line_by_paragraphs(x)) def testStartsWithStandaloneClosed(self): before = """[/p]...""" after = """...""" self.assertEqual(after, parse(before)) def testStandaloneClosedAtTheBeginning(self): before = """...[/p],,,""" after = """...,,,""" self.assertEqual(after, parse(before)) def testStandaloneClosedAtTheBeginningBeforeMarkup(self): before = """...[/p],,,[i][b]+++[/b][/i]---""" after = """...,,,[i][b]+++[/b][/i]---""" self.assertEqual(after, parse(before)) def testEndsWithStandaloneOpened(self): before = """...[i]""" after = """...""" self.assertEqual(after, parse(before)) def testStandaloneOpenedAtTheEnd(self): before = """...[i],,,""" after = """...,,,""" self.assertEqual(after, parse(before)) def testStandaloneOpenedAtTheEndAfterMarkup(self): before = """...[i][b],,,[/b][/i]+++[i]---""" after = """...[i][b],,,[/b][/i]+++---""" self.assertEqual(after, parse(before)) def 
testWrongOrder2(self): before = """...[i][b],,,[/i][/b]+++""" after = """...[i][b],,,[/b][/i]+++""" self.assertEqual(after, parse(before)) def testWrongOrder3(self): before = """...[i][c],,,[b]+++[/i][/c][/b]---""" after = """...[p],,,[b]+++[/b][/p]---""" self.assertEqual(after, parse(before)) def testOpenOneCloseAnother(self): before = """...[i],,,[/p]+++""" after = """...,,,+++""" self.assertEqual(after, parse(before)) def testStartsWtihClosingAndEndsWithOpening(self): before = """[/c]...[i]""" after = """...""" self.assertEqual(after, parse(before)) def testValidEmptyTagsDestructionOne(self): before = """...[i][/i],,,""" after = """...,,,""" self.assertEqual(after, parse(before)) def testValidEmptyTagsDestructionMany(self): before = """...[b][c][i][/i][/c][/b],,,""" after = """...,,,""" self.assertEqual(after, parse(before)) def testBrokenEmptyTagsDestructionMany(self): before = """...[b][i][c][/b][/c][/i],,,""" after = """...,,,""" self.assertEqual(after, parse(before)) def testNestedWithBrokenOutter(self): before = """[i][p]...[/p][/c]""" after = """[p]...[/p]""" self.assertEqual(after, parse(before)) def testHorriblyBrokenTags(self): before = """[/c]...[i][/p],,,[/i]+++[b]""" after = """...[i],,,[/i]+++""" self.assertEqual(after, parse(before)) def testWrongOrder2_WithConent(self): before = """[b]...[c red]...[/b]...[/c]""" after = """[b]...[c red]...[/c][/b][c red]...[/c]""" self.assertEqual(after, parse(before)) def testWrongOrderWithTextBefore(self): before = "[c]...[i],,,[/c][/i]" after = "[c]...[i],,,[/i][/c]" self.assertEqual(after, parse(before)) def testRespect_m_TagsProperly(self): before = """ [m1]for tags like: [p]n[/c][/i][/p], the line needs scan again[/m]""" after = """ [m1]for tags like: [p]n[/p], the line needs scan again[/m]""" self.assertEqual(after, parse(before)) def testNoTagsDoNothing(self): before = after = """no tags, do nothing""" self.assertEqual(after, parse(before)) def testValidNestedTags(self): before = 
"""...[i][c][b]...[/b][/c][/i]...""" after = """...[b][p]...[/p][/b]...""" self.assertEqual(after, parse(before)) def testBrokenNestedTags(self): before = """...[b][i][c]...[/b][/c][/i]...""" after = """...[b][p]...[/p][/b]...""" self.assertEqual(after, parse(before)) def testEscapedBrackets(self): before = after = r"""on \[the\] top""" self.assertEqual(after, parse(before)) def testPoorlyEscapedBracketsWithTags(self): before = r"""...\[c],,,[/c]+++""" after = r"""...\[c],,,+++""" self.assertEqual(after, parse(before)) def testPoorlyEscapedBracketsWithTags2(self): before = r"""on \[the\] [b]roof[/b]]""" after = r"""on \[the\] [b]roof[/b]]""" self.assertEqual(after, parse(before)) def testValidRealDictionaryArticle(self): # zh => ru, http://bkrs.info/slovo.php?ch=和田 before = after = """和田 [m1][p]г. и уезд[/p] Хотан ([i]Синьцзян-Уйгурский[c] авт.[/c] р-н, КНР[/i])[/m]\ [m2][*][ex]和田玉 Хотанский нефрит[/ex][/*][/m]""" self.assertEqual(after, parse(before)) def testBrokenRealDictionaryArticle(self): # zh => ru, http://bkrs.info/slovo.php?ch=一一相应 before = """一一相应 yīyī xiāngyìng [m1][c][i]мат.[/c][/i] взаимнооднозначное соответствие[/m]""" after = """一一相应 yīyī xiāngyìng [m1][p]мат.[/p] взаимнооднозначное соответствие[/m]""" self.assertEqual(after, parse(before)) def testBrokenManyRealDictionaryArticle(self): # zh => ru, http://bkrs.info/slovo.php?ch=一轮 before = """一轮 yīlún [m1]1) одна очередь[/m][m1]2) цикл ([i]в 12 лет[/i])[/m][m1]3) диск ([c][i]напр.[/c] луны[/i])[/m]\ [m1]4) [c] [i]спорт[/c][/i] раунд, круг ([i]встречи спортсменов[/i])[/m]\ [m1]5) [c] [i]дипл.[/c][/i] раунд ([i]переговоров[/i])[/m]""" after = """一轮 yīlún [m1]1) одна очередь[/m][m1]2) цикл ([i]в 12 лет[/i])[/m][m1]3) диск ([i][c]напр.[/c] луны[/i])[/m]\ [m1]4) [c] [i]спорт[/i][/c] раунд, круг ([i]встречи спортсменов[/i])[/m]\ [m1]5) [c] [i]дипл.[/i][/c] раунд ([i]переговоров[/i])[/m]""" self.assertEqual(after, parse(before)) def testSameTagsNested(self): before = "...[p],,,[p]+++[/p]---[/p]```" after = 
"...[p],,,+++[/p]---```" self.assertEqual(after, parse(before)) def testOneLastTextLetter(self): before = after = "b" self.assertEqual(after, parse(before)) def testOneLastTextLetterAfterTag(self): before = after = "...[b],,,[/b]b" self.assertEqual(after, parse(before)) def testTagMInsideAnotherTag(self): # tag order. before = "[c][m1]...[/m][/c]" after = "[m1][c]...[/c][/m]" self.assertEqual(after, parse(before)) def testTagMInsideAnotherTagAfterText(self): before = "[c]...[m1],,,[/m][/c]" after = "[c]...[/c][m1][c],,,[/c][/m]" self.assertEqual(after, parse(before)) def testTagMDeepInside(self): before = "...[i],,,[b]+++[c green][/b]---[m1]```[/i][/c][/m]..." after = "...[i],,,[b]+++[/b][c green]---[/c][/i][m1][i][c green]```[/c][/i][/m]..." self.assertEqual(after, parse(before)) def testTagMInsideBroken(self): before = "[m1][*]- [ref]...[/ref][/m][m1]- [ref],,,[/ref][/*][/m]" after = "[m1][*]- [ref]...[/ref][/*][/m][m1][*]- [ref],,,[/ref][/*][/m]" self.assertEqual(after, parse(before)) if __name__ == '__main__': unittest.main() pyglossary-3.2.1/pyglossary/plugins/edlin.py0000644000175000017500000001604413577304507021571 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # edlin.py # # Copyright © 2016 Saeed Rasooli (ilius) # This file is part of PyGlossary project, https://github.com/ilius/pyglossary # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 3, or (at your option) # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program. Or on Debian systems, from /usr/share/common-licenses/GPL # If not, see . 
from formats_common import * from pyglossary.text_utils import ( escapeNTB, unescapeNTB, splitByBarUnescapeNTB, ) enable = True format = "Edlin" description = "Editable Linked List of Entries" extentions = [".edlin"] readOptions = [] writeOptions = [ "encoding", # str "havePrevLink", # bool ] def makeDir(direc): if not isdir(direc): os.makedirs(direc) class Reader(object): def __init__(self, glos): self._glos = glos self._clear() def close(self): self._clear() def _clear(self): self._filename = "" self._encoding = "utf-8" self._havePrevLink = True self._wordCount = None self._rootPath = None self._resDir = "" self._resFileNames = [] def open(self, filename, encoding="utf-8"): from pyglossary.json_utils import jsonToOrderedData if isdir(filename): infoFname = join(filename, "info.json") elif isfile(filename): infoFname = filename filename = dirname(filename) else: raise ValueError( "error while opening %r" % filename + ": no such file or directory" ) self._filename = filename self._encoding = encoding with open(infoFname, "r", encoding=encoding) as infoFp: infoJson = infoFp.read() info = jsonToOrderedData(infoJson) self._wordCount = info.pop("wordCount") self._havePrevLink = info.pop("havePrevLink") self._rootPath = info.pop("root") for key, value in info.items(): self._glos.setInfo(key, value) self._resDir = join(filename, "res") if isdir(self._resDir): self._resFileNames = os.listdir(self._resDir) else: self._resDir = "" self._resFileNames = [] def __len__(self): if self._wordCount is None: log.error("called len() on a reader which is not open") return 0 return self._wordCount + len(self._resFileNames) def __iter__(self): if not self._rootPath: log.error("iterating over a reader which is not open") raise StopIteration wordCount = 0 nextPath = self._rootPath while nextPath != "END": wordCount += 1 # before or after reading word and defi # (and skipping empty entry)? 
FIXME with open( join(self._filename, nextPath), "r", encoding=self._encoding, ) as fromFile: header = fromFile.readline().rstrip() if self._havePrevLink: self._prevPath, nextPath = header.split(" ") else: nextPath = header word = fromFile.readline() if not word: yield None # update progressbar continue defi = fromFile.read() if not defi: log.warning( "Edlin Reader: no definition for word %r" % word + ", skipping" ) yield None # update progressbar continue word = word.rstrip() defi = defi.rstrip() if self._glos.getPref("enable_alts", True): word = splitByBarUnescapeNTB(word) if len(word) == 1: word = word[0] else: word = unescapeNTB(word, bar=True) # defi = unescapeNTB(defi) yield self._glos.newEntry(word, defi) if wordCount != self._wordCount: log.warning( "%s words found, " % wordCount + "wordCount in info.json was %s" % self._wordCount ) self._wordCount = wordCount resDir = self._resDir for fname in self._resFileNames: with open(join(resDir, fname), "rb") as fromFile: yield self._glos.newDataEntry( fname, fromFile.read(), ) class Writer(object): def __init__(self, glos): self._glos = glos self._clear() def close(self): self._clear() def _clear(self): self._filename = "" self._encoding = "utf-8" self._hashSet = set() # self._wordCount = None def open(self, filename, encoding="utf-8", havePrevLink=True): if exists(filename): raise ValueError("directory %r already exists" % filename) self._filename = filename self._encoding = encoding self._havePrevLink = havePrevLink self._resDir = join(filename, "res") os.makedirs(filename) os.mkdir(self._resDir) def hashToPath(self, h): return h[:2] + "/" + h[2:] def getEntryHash(self, entry): """ return hash string for given entry don't call it twice for one entry, if you do you will get a different hash string """ from hashlib import sha1 _hash = sha1(toBytes(entry.getWord())).hexdigest()[:8] if _hash not in self._hashSet: self._hashSet.add(_hash) return _hash index = 0 while True: tmp_hash = _hash + hex(index)[2:] if tmp_hash 
not in self._hashSet: self._hashSet.add(tmp_hash) return tmp_hash index += 1 def saveEntry(self, thisEntry, thisHash, prevHash, nextHash): dpath = join(self._filename, thisHash[:2]) makeDir(dpath) with open( join(dpath, thisHash[2:]), "w", encoding=self._encoding, ) as toFile: nextPath = self.hashToPath(nextHash) if nextHash else "END" if self._havePrevLink: prevPath = self.hashToPath(prevHash) if prevHash else "START" header = prevPath + " " + nextPath else: header = nextPath toFile.write("\n".join([ header, thisEntry.getWord(), thisEntry.getDefi(), ])) def _iterNonDataEntries(self): for entry in self._glos: if entry.isData(): entry.save(self._resDir) else: yield entry def write(self): from collections import OrderedDict as odict from pyglossary.json_utils import dataToPrettyJson glosIter = iter(self._iterNonDataEntries()) try: thisEntry = next(glosIter) except StopIteration: raise ValueError("glossary is empty") count = 1 rootHash = thisHash = self.getEntryHash(thisEntry) prevHash = None for nextEntry in glosIter: nextHash = self.getEntryHash(nextEntry) self.saveEntry(thisEntry, thisHash, prevHash, nextHash) thisEntry = nextEntry prevHash, thisHash = thisHash, nextHash count += 1 self.saveEntry(thisEntry, thisHash, prevHash, None) with open( join(self._filename, "info.json"), "w", encoding=self._encoding, ) as toFile: info = odict() info["name"] = self._glos.getInfo("name") info["root"] = self.hashToPath(rootHash) info["havePrevLink"] = self._havePrevLink info["wordCount"] = count # info["modified"] = for key, value in self._glos.getExtraInfos(( "name", "root", "havePrevLink", "wordCount", )).items(): info[key] = value toFile.write(dataToPrettyJson(info)) def write(glos, filename, **kwargs): writer = Writer(glos) writer.open( filename, **kwargs ) writer.write() writer.close() pyglossary-3.2.1/pyglossary/plugins/formats_common.py0000644000175000017500000000117213577304507023515 0ustar emfoxemfox00000000000000from formats_common import * import sys import os from 
os.path import ( join, split, splitext, isfile, isdir, exists, ) import logging log = logging.getLogger("root") from pprint import pformat from paths import rootDir sys.path.insert(0, rootDir) from pyglossary.flags import * from pyglossary import core from pyglossary.file_utils import FileLineWrapper from pyglossary.text_utils import toStr, toBytes from pyglossary.os_utils import indir enable = False format = "Unknown" description = "Unknown" extentions = [] readOptions = [] writeOptions = [] supportsAlternates = False sortOnWrite = DEFAULT_NO sortKey = None pyglossary-3.2.1/pyglossary/plugins/freedict.py0000644000175000017500000000200113577304507022247 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- from formats_common import * enable = True format = "Freedict" description = "FreeDict (tei)" extentions = [".tei"] readOptions = [] def write(glos, filename): fp = open(filename, "w") fp.write(""" ]> %s converted withPyGlossary

    freedict.de

    %s

    """ % (glos.getInfo("title"), filename)) for entry in glos: if entry.isData(): # FIXME continue word = entry.getWord() defi = entry.getDefi() fp.write("""
    %s
    n %s
    """ % (word, defi)) fp.write("
    ") fp.close() pyglossary-3.2.1/pyglossary/plugins/gettext_mo.py0000644000175000017500000000026713577304507022655 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- from formats_common import * enable = False format = "GettextMo" description = "Gettext Binary (mo)" extentions = [".mo"] readOptions = [] writeOptions = [] pyglossary-3.2.1/pyglossary/plugins/gettext_po.py0000644000175000017500000000440413577304507022655 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- from formats_common import * enable = True format = "GettextPo" description = "Gettext Source (po)" extentions = [".po"] readOptions = [] writeOptions = [ "resources", # bool ] class Reader(object): def __init__(self, glos, hasInfo=True): self._glos = glos self.clear() def clear(self): self._filename = "" self._file = None self._wordCount = None self._resDir = "" self._resFileNames = [] def open(self, filename): self._filename = filename self._file = open(filename) self._resDir = filename + "_res" if isdir(self._resDir): self._resFileNames = os.listdir(self._resDir) else: self._resDir = "" self._resFileNames = [] def close(self): if self._file: self._file.close() self.clear() def __len__(self): from pyglossary.file_utils import fileCountLines if self._wordCount is None: log.debug("Try not to use len(reader) as it takes extra time") self._wordCount = fileCountLines( self._filename, newline="\nmsgid", ) return self._wordCount def __iter__(self): from polib import unescape as po_unescape word = "" defi = "" msgstr = False wordCount = 0 for line in self._file: line = line.strip() if not line: continue if line.startswith("#"): continue if line.startswith("msgid "): if word: yield self._glos.newEntry(word, defi) wordCount += 1 word = "" defi = "" word = po_unescape(line[6:]) msgstr = False elif line.startswith("msgstr "): if msgstr: log.error("msgid omitted!") defi = po_unescape(line[7:]) msgstr = True else: if msgstr: defi += po_unescape(line) else: word += po_unescape(line) if word: yield 
self._glos.newEntry(word, defi) wordCount += 1 self._wordCount = wordCount def write(glos, filename, resources=True): from polib import escape as po_escape with open(filename, "w") as toFile: toFile.write("#\nmsgid ""\nmsgstr ""\n") for key, value in glos.iterInfo(): toFile.write('"%s: %s\\n"\n' % (key, value)) for entry in glos: if entry.isData(): if resources: entry.save(filename + "_res") continue word = entry.getWord() defi = entry.getDefi() toFile.write("msgid %s\nmsgstr %s\n\n" % ( po_escape(word), po_escape(defi), )) pyglossary-3.2.1/pyglossary/plugins/lingoes_ldf.py0000644000175000017500000000567713577304507022775 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- from formats_common import * from pyglossary.text_reader import TextGlossaryReader from pyglossary.file_utils import fileCountLines enable = True format = "LingoesLDF" description = "Lingoes Source (LDF)" extentions = [".ldf"] readOptions = [] writeOptions = [ "newline", # str, or choice ("\r\n", "\n", or "\r") "resources", # bool ] infoKeys = [ "title", "description", "author", "email", "website", "copyright", ] class Reader(TextGlossaryReader): def __len__(self): if self._wordCount is None: log.debug("Try not to use len(reader) as it takes extra time") self._wordCount = fileCountLines( self._filename, newline="\n\n", ) - self._leadingLinesCount return self._wordCount def isInfoWord(self, word): if isinstance(word, str): return word.startswith("#") else: return False def fixInfoWord(self, word): if isinstance(word, str): return word.lstrip("#") else: return word def loadInfo(self): # FIXME pass def nextPair(self): if not self._file: raise StopIteration entryLines = [] while True: line = self._file.readline() if not line: raise StopIteration line = line.rstrip("\n\r") # FIXME if line: entryLines.append(line) continue # now `line` is empty, process `entryLines` if not entryLines: return if len(entryLines) < 2: log.error( "invalid block near line %s" % fileObj.line + " in file %s" % filename ) 
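# The GettextPo reader above accumulates msgid/msgstr blocks line by line,
# treating bare quoted lines as continuations of whichever field was seen
# last. A minimal, self-contained sketch of that state machine follows; it
# substitutes a naive quote-strip for polib.unescape (which the plugin
# actually uses), so PO escape sequences are not handled here.

```python
def parse_po_pairs(lines):
    """Yield (word, defi) pairs from PO-style lines.

    Simplified sketch: surrounding quotes are stripped naively instead of
    using polib.unescape, and the duplicate-msgstr error check is omitted.
    """
    def unquote(s):
        s = s.strip()
        if len(s) >= 2 and s.startswith('"') and s.endswith('"'):
            s = s[1:-1]
        return s

    word = ""
    defi = ""
    in_msgstr = False
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if line.startswith("msgid "):
            if word:
                yield word, defi  # flush the previous entry
            word = unquote(line[6:])
            defi = ""
            in_msgstr = False
        elif line.startswith("msgstr "):
            defi = unquote(line[7:])
            in_msgstr = True
        else:
            # bare "..." line: continuation of the last-seen field
            if in_msgstr:
                defi += unquote(line)
            else:
                word += unquote(line)
    if word:
        yield word, defi
```

# Note the real reader also logs an error when a second msgstr appears
# without an intervening msgid; the sketch omits that check for brevity.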
				return
			word = entryLines[0]
			defi = "\n".join(entryLines[1:])
			defi = defi.replace("
    ", "\n") # FIXME word = [p.strip() for p in word.split("|")] return word, defi def read(glos, filename): glos.setDefaultDefiFormat("h") fileObj = FileLineWrapper(open(filename)) entryLines = [] def addDataEntry(entryLines): if not entryLines: return if len(entryLines) < 2: log.error( "invalid block near line %s" % fileObj.line + " in file %s" % filename ) return word = entryLines[0] defi = "\n".join(entryLines[1:]) defi = defi.replace("
    ", "\n") # FIXME word = [p.strip() for p in word.split("|")] glos.addEntry( word, defi, ) for line in fileObj: line = line.strip() if not line.startswith("###"): if line: entryLines.append(line) break parts = line[3:].split(":") if not parts: continue key = parts[0].lower() value = " ".join(parts[1:]).strip() glos.setInfo(key, value) # info lines finished for line in fileObj: line = line.strip() if line: entryLines.append(line) else: addDataEntry(entryLines) entryLines = [] addDataEntry(entryLines) def write( glos, filename, newline="\n", resources=True, ): g = glos head = "\n".join([ "###%s: %s" % ( key.capitalize(), g.getInfo(key), ) for key in infoKeys ]) head += "\n" g.writeTxt( "\n", "\n\n", filename=filename, writeInfo=False, rplList=( ("\n", "
    "), ), ext=".ldf", head=head, newline=newline, resources=resources, ) pyglossary-3.2.1/pyglossary/plugins/octopus_mdict.py0000644000175000017500000000545313577304507023354 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # octopus_mdic.py # Read Octopus MDict dictionary format, mdx(dictionary)/mdd(data) # # Copyright (C) 2013 Xiaoqiang Wang # Copyright (C) 2013-2016 Saeed Rasooli # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, version 3 of the License. # # You can get a copy of GNU General Public License along this program # But you can always get it from http://www.gnu.org/licenses/gpl.txt # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. from formats_common import * import os from os.path import splitext, isfile, isdir, extsep, basename, dirname enable = True format = "OctopusMdict" description = "Octopus MDict" extentions = [".mdx"] readOptions = [ "encoding", # str "substyle", # bool ] writeOptions = [] class Reader(object): def __init__(self, glos): self._glos = glos self.clear() def clear(self): self._filename = "" self._encoding = "" self._substyle = True self._mdx = None self._mdd = None self._mddFilename = "" def open(self, filename, **options): from pyglossary.plugin_lib.readmdict import MDX, MDD self._filename = filename self._encoding = options.get("encoding", "") self._substyle = options.get("substyle", True) self._mdx = MDX(filename, self._encoding, self._substyle) filenameNoExt, ext = splitext(self._filename) mddFilename = "".join([filenameNoExt, extsep, "mdd"]) if isfile(mddFilename): self._mdd = MDD(mddFilename) self._mddFilename = mddFilename log.debug("mdx.header = " + pformat(self._mdx.header)) # for key, value in 
self._mdx.header.items(): # key = key.lower() # self._glos.setInfo(key, value) try: title = self._mdx.header[b"Title"] except KeyError: pass else: self._glos.setInfo("title", title) self._glos.setInfo( "description", self._mdx.header.get(b"Description", ""), ) def __iter__(self): if self._mdx is None: log.error("trying to iterate on a closed MDX file") else: for word, defi in self._mdx.items(): word = toStr(word) defi = toStr(defi) yield self._glos.newEntry(word, defi) self._mdx = None if self._mdd: for b_fname, b_data in self._mdd.items(): fname = toStr(b_fname) fname = fname.replace("\\", os.sep).lstrip(os.sep) yield self._glos.newDataEntry(fname, b_data) self._mdd = None def __len__(self): if self._mdx is None: log.error( "OctopusMdict: called len(reader) while reader is not open" ) return 0 return len(self._mdx) def close(self): self.clear() pyglossary-3.2.1/pyglossary/plugins/octopus_mdict_source.py0000644000175000017500000000452513577304507024733 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # http://www.octopus-studio.com/download.en.htm from formats_common import * from collections import OrderedDict enable = True format = "OctopusMdictSource" description = "Octopus MDict Source" extentions = [".mtxt"] readOptions = [ "encoding", # str "links", # bool, support for inconsecutive links, #68 ] writeOptions = [ "resources", # bool ] def read( glos, filename, encoding="utf-8", links=False, ): with open(filename, encoding=encoding) as fp: text = fp.read() text = text.replace("\r\n", "\n") text = text.replace("entry://", "bword://") lastEntry = None linksDict = {} entryDict = OrderedDict() for section in text.split(""): lines = section.strip().split("\n") if len(lines) < 2: continue word = lines[0] defi = "\n".join(lines[1:]) if defi.startswith("@@@LINK="): mainWord = defi.partition("=")[2] if links: linksDict[word] = mainWord elif lastEntry and lastEntry.getWords()[0] == mainWord: lastEntry.addAlt(word) else: log.error("alternate is not ride after word: 
%s" % defi) continue entry = glos.newEntry(word, defi) if links: # do not call glos.addEntry or glos.addEntryObj # because we will need to modify entries at the end # and we need to keep a OrderedDict of entries (ordered list of entries, and dict of entries by word) entryDict[word] = entry else: if lastEntry: # now that we know there are no more alternate forms of lastEntry glos.addEntryObj(lastEntry) lastEntry = entry if links: for sourceWord, targetWord in linksDict.items(): targetEntry = entryDict.get(targetWord) if targetEntry is None: log.error("Link to non-existing word %s" % targetWord) continue targetEntry.addAlt(sourceWord) for entry in entryDict.values(): glos.addEntryObj(entry) else: if lastEntry: glos.addEntryObj(lastEntry) def writeEntryGen(glos): for entry in glos: words = entry.getWords() defis = entry.getDefis() yield glos.newEntry(words[0], defis) for alt in words[1:]: yield glos.newEntry( alt, "@@@LINK=%s" % words[0], ) def write( glos, filename, resources=True, ): glos.writeTxt( "\n", "\n\n", filename=filename, writeInfo=False, rplList=[ ("bword://", "entry://"), ], ext=".mtxt", head="", iterEntries=writeEntryGen(glos), newline="\r\n", resources=resources, ) pyglossary-3.2.1/pyglossary/plugins/omnidic.py0000644000175000017500000000177313577304507022123 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- from formats_common import * enable = True format = 'Omnidic' description = 'Omnidic' extentions = ['.omni', '.omnidic'] readOptions = [] writeOptions = [] def write(glos, filename, dicIndex=16): if not isinstance(dicIndex, int): raise TypeError( 'invalid dicIndex=%r, must be integer' % dicIndex ) with indir(filename, create=True): indexFp = open(str(dicIndex), 'w') for bucketIndex, bucket in enumerate(glos.iterEntryBuckets(100)): if bucketIndex == 0: bucketFilename = '%s99' % dicIndex else: bucketFilename = '%s%s' % ( dicIndex, bucketIndex * 100 + len(bucket) - 1, ) indexFp.write('%s#%s#%s\n' % ( bucket[0].getWord(), bucket[-1].getWord(), 
				bucketFilename,
			))
			bucketFileObj = open(bucketFilename, 'w')
			for entry in bucket:
				word = entry.getWord()
				defi = entry.getDefi()
				defi = defi.replace('\n', ' ')  # FIXME
				bucketFileObj.write('%s#%s\n' % (word, defi))
			bucketFileObj.close()
		indexFp.close()


pyglossary-3.2.1/pyglossary/plugins/paths.py

from os.path import realpath, dirname, join, isdir
import sys

if hasattr(sys, 'frozen'):
	rootDir = dirname(sys.executable)
else:
	rootDir = dirname(dirname(realpath(__file__)))


pyglossary-3.2.1/pyglossary/plugins/sdict.py

# -*- coding: utf-8 -*-
# sdict.py
# Loader engine for AXMASoft's open dictionary format
#
# Copyright (C) 2010-2016 Saeed Rasooli (ilius)
# Copyright (C) 2006-2008 Igor Tkach, as part of SDict Viewer:
# http://sdictviewer.sf.net
#
# This program is a free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, version 3 of the License.
#
# You can get a copy of GNU General Public License along this program
# But you can always get it from http://www.gnu.org/licenses/gpl.txt
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
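# The unpack() format strings in the SDict reader below were mangled when
# this dump was generated (everything from the "<" character onward was
# eaten, presumably by tag stripping). SDict header fields are little-endian
# 32-bit integers, so the reads are most likely of the unpack("<I", ...)
# form. A self-contained sketch of that pattern; the 12-byte layout used
# here is illustrative, not the real SDict header layout:

```python
import io
from struct import pack, unpack


def read_uint32_le(f):
    # read one little-endian unsigned 32-bit integer from a binary stream
    return unpack("<I", f.read(4))[0]


# build a fake 12-byte header: a 4-byte signature plus two offsets
buf = io.BytesIO(pack("<4sII", b"sdct", 43, 1024))
sig = buf.read(4)                      # b"sdct"
title_offset = read_uint32_le(buf)     # 43
articles_offset = read_uint32_le(buf)  # 1024
```

# The "<" prefix forces little-endian byte order regardless of the host
# CPU, which is why it appears on every fixed-offset read in such readers.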
from struct import unpack from formats_common import * enable = True format = "Sdict" description = "Sdictionary Binary(dct)" extentions = [".dct"] readOptions = [ "encoding", # str ] writeOptions = [] class GzipCompression(object): def __str__(self): return "gzip" def decompress(self, string): import zlib return zlib.decompress(string) class Bzip2Compression(object): def __str__(self): return "bzip2" def decompress(self, string): import bz2 return bz2.decompress(string) class NoCompression(object): def __str__(self): return "no compression" def decompress(self, string): return string compressions = [ NoCompression(), GzipCompression(), Bzip2Compression(), ] def read_raw(s, fe): return s[fe.offset:fe.offset + fe.length] def read_str(s, fe): read_raw(s, fe).replace(b"\x00", b"") def read_int(s, fe=None): return unpack("> 4 self.num_of_words = read_int(st, self.f_num_of_words) self.title_offset = read_int(st, self.f_title) self.copyright_offset = read_int(st, self.f_copyright) self.version_offset = read_int(st, self.f_version) self.articles_offset = read_int(st, self.f_articles) self.short_index_offset = read_int(st, self.f_short_index) self.full_index_offset = read_int(st, self.f_full_index) class Reader(object): def __init__(self, glos): self._glos = glos self.clear() def clear(self): self._file = None self._filename = "" self._encoding = "" self._header = Header() def open(self, filename, encoding="utf-8"): self._file = open(filename, "rb") self._header.parse(self._file.read(43)) self._compression = compressions[self._header.compressionType] self.short_index = self.readShortIndex() self._glos.setInfo( "name", self.readUnit(self._header.title_offset), ) self._glos.setInfo( "version", self.readUnit(self._header.version_offset) ) self._glos.setInfo( "copyright", self.readUnit(self._header.copyright_offset), ) log.debug("SDict word count: %s" % len(self)) # correct? 
FIXME def close(self): self._file.close() self.clear() def __len__(self): return self._header.num_of_words def readUnit(self, pos): f = self._file f.seek(pos) record_length = read_int(f.read(4)) return self._compression.decompress(f.read(record_length)) def readShortIndex(self): self._file.seek(self._header.short_index_offset) s_index_depth = self._header.short_index_depth index_entry_len = (s_index_depth+1)*4 short_index_str = self._file.read( index_entry_len * self._header.short_index_length ) short_index_str = self._compression.decompress(short_index_str) index_length = self._header.short_index_length short_index = [{} for i in range(s_index_depth+2)] depth_range = range(s_index_depth) for i in range(index_length): entry_start = start_index = i*index_entry_len short_word = "" try: for j in depth_range: # inlined unpack yields ~20% performance gain # compared to calling read_int() uchar_code = unpack( "", "\n").replace("
    ", "\n") yield self._glos.newEntry(word, defi) def readFullIndexItem(self, pointer): try: f = self._file f.seek(pointer) s = f.read(8) next_word = unpack("= self._header.articles_offset: log.error( "Warning: attempt to read word from " "illegal position in dict file" ) return None log.exception("") def readArticle(self, pointer): return self.readUnit(self._header.articles_offset + pointer) pyglossary-3.2.1/pyglossary/plugins/sdict_source.py0000644000175000017500000000172413577304507023163 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # Source Glossary for "Sdictionary" (http://sdict.org) # It has extention '.sdct' from formats_common import * enable = True format = 'SdictSource' description = 'Sdictionary Source (sdct)' extentions = ['.sdct'] readOptions = [] writeOptions = [ 'writeInfo', # bool 'newline', # str, or choice ('\r\n', '\n', or '\r') 'resources', # bool ] def write( glos, filename, writeInfo=True, newline='\n', resources=True, ): head = '' if writeInfo: head += '
<header>\n'
		head += 'title = %s\n' % glos.getInfo('name')
		head += 'author = %s\n' % glos.getInfo('author')
		head += 'description = %s\n' % glos.getInfo('description')
		head += 'w_lang = %s\n' % glos.getInfo('inputlang')
		head += 'a_lang = %s\n' % glos.getInfo('outputlang')
		head += '
</header>\n#\n#\n#\n'
	glos.writeTxt(
		'___',
		'\n',
		filename,
		writeInfo=False,
		rplList=(
			('\n', '
    '), ), ext='.sdct', head=head, newline=newline, resources=resources, ) pyglossary-3.2.1/pyglossary/plugins/sql.py0000644000175000017500000000056413577304507021275 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- from formats_common import * enable = True format = 'Sql' description = 'SQL' extentions = ['.sql'] readOptions = [] writeOptions = [ 'encoding', # str ] def write( glos, filename, encoding='utf-8', ): with open(filename, 'w', encoding=encoding) as fp: for line in glos.iterSqlLines( transaction=False, ): fp.write(line + '\n') pyglossary-3.2.1/pyglossary/plugins/stardict.py0000644000175000017500000004066213577304507022316 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- import sys import os from os.path import ( dirname, realpath, ) import re import gzip from time import time as now from collections import Counter from pyglossary.text_utils import ( intToBinStr, binStrToInt, runDictzip, ) from formats_common import * enable = True format = "Stardict" description = "StarDict (ifo)" extentions = [".ifo"] readOptions = [] writeOptions = [ "dictzip", # bool "sametypesequence", # str, "h" for html, "m" for plain text ] sortOnWrite = ALWAYS # sortKey also is defined in line 52 supportsAlternates = True infoKeys = ( "bookname", "author", "email", "website", "description", "date", ) def sortKeyBytes(b_word): """ b_word is a bytes instance """ assert isinstance(b_word, bytes) return ( b_word.lower(), b_word, ) def sortKey(word): """ word is a str instance """ assert isinstance(word, str) return sortKeyBytes(word.encode("utf-8")) def newlinesToSpace(text): return re.sub("[\n\r]+", " ", text) def newlinesToBr(text): return re.sub("\n\r?|\r\n?", "
    ", text) def verifySameTypeSequence(s): if not s: return True if not s.isalpha(): log.error("Invalid sametypesequence option") return False return True class Reader(object): def __init__(self, glos): self._glos = glos self.clear() """ indexData format indexData[i] - i-th record in index file indexData[i][0] - b_word (bytes) indexData[i][1] - definition block offset in dict file indexData[i][2] - definition block size in dict file REMOVE: indexData[i][3] - list of definitions indexData[i][3][j][0] - definition data indexData[i][3][j][1] - definition type - "h", "m" or "x" indexData[i][4] - list of synonyms (strings) synDict: a dict { wordIndex -> altList } """ def close(self): if self._dictFile: self._dictFile.close() self.clear() def clear(self): self._dictFile = None self._filename = "" # base file path, no extension self._indexData = [] self._synDict = {} self._sametypesequence = "" self._resDir = "" self._resFileNames = [] self._wordCount = None def open(self, filename): if splitext(filename)[1].lower() == ".ifo": self._filename = splitext(filename)[0] else: self._filename = filename self._filename = realpath(self._filename) self.readIfoFile() sametypesequence = self._glos.getInfo("sametypesequence") if not verifySameTypeSequence(sametypesequence): return False self._indexData = self.readIdxFile() self._wordCount = len(self._indexData) self._synDict = self.readSynFile() self._sametypesequence = sametypesequence if isfile(self._filename + ".dict.dz"): self._dictFile = gzip.open(self._filename+".dict.dz", mode="rb") else: self._dictFile = open(self._filename+".dict", mode="rb") self._resDir = join(dirname(self._filename), "res") if isdir(self._resDir): self._resFileNames = os.listdir(self._resDir) else: self._resDir = "" self._resFileNames = [] # self.readResources() def __len__(self): if self._wordCount is None: raise RuntimeError( "StarDict: len(reader) called while reader is not open" ) return self._wordCount + len(self._resFileNames) def readIfoFile(self): 
""" .ifo file is a text file in utf-8 encoding """ with open(self._filename+".ifo", "r") as ifoFile: for line in ifoFile: line = line.strip() if not line: continue if line == "StarDict's dict ifo file": continue key, eq, value = line.partition("=") if not (key and value): log.warning("Invalid ifo file line: %s" % line) continue self._glos.setInfo(key, value) def readIdxFile(self): if isfile(self._filename+".idx.gz"): with gzip.open(self._filename+".idx.gz") as idxFile: idxBytes = idxFile.read() else: with open(self._filename+".idx", "rb") as idxFile: idxBytes = idxFile.read() indexData = [] pos = 0 while pos < len(idxBytes): beg = pos pos = idxBytes.find(b"\x00", beg) if pos < 0: log.error("Index file is corrupted") break b_word = idxBytes[beg:pos] pos += 1 if pos + 8 > len(idxBytes): log.error("Index file is corrupted") break offset = binStrToInt(idxBytes[pos:pos+4]) pos += 4 size = binStrToInt(idxBytes[pos:pos+4]) pos += 4 indexData.append([b_word, offset, size]) return indexData def __iter__(self): indexData = self._indexData synDict = self._synDict sametypesequence = self._sametypesequence dictFile = self._dictFile if not dictFile: log.error("%s is not open, can not iterate" % self) raise StopIteration if not indexData: log.warning("indexData is empty") raise StopIteration for wordIndex, (b_word, defiOffset, defiSize) in enumerate(indexData): if not b_word: continue dictFile.seek(defiOffset) if dictFile.tell() != defiOffset: log.error( "Unable to read definition for word \"%s\"" % b_word ) continue b_defiBlock = dictFile.read(defiSize) if len(b_defiBlock) != defiSize: log.error( "Unable to read definition for word \"%s\"" % b_word ) continue if sametypesequence: defisData = self.parseDefiBlockCompact( b_defiBlock, sametypesequence, ) else: defisData = self.parseDefiBlockGeneral(b_defiBlock) if defisData is None: log.error("Data file is corrupted. 
Word \"%s\"" % b_word) continue # defisData is a list of (b_defi, defiFormatCode) tuples defis = [] defiFormats = [] for b_defi, defiFormatCode in defisData: defis.append(b_defi.decode("utf-8")) defiFormats.append( { "m": "m", "t": "m", "y": "m", "g": "h", "h": "h", "x": "x", }.get(chr(defiFormatCode), "") ) # FIXME defiFormat = defiFormats[0] # defiFormat = Counter(defiFormats).most_common(1)[0][0] if not defiFormat: log.warning( "Definition format %s is not supported" % defiFormat ) word = b_word.decode("utf-8") try: alts = synDict[wordIndex] except KeyError: # synDict is dict pass else: word = [word] + alts if len(defis) == 1: defis = defis[0] yield self._glos.newEntry( word, defis, defiFormat=defiFormat, ) if isdir(self._resDir): for fname in os.listdir(self._resDir): fpath = join(self._resDir, fname) with open(fpath, "rb") as fromFile: yield self._glos.newDataEntry( fname, fromFile.read(), ) def readSynFile(self): """ return synDict, a dict { wordIndex -> altList } """ if not isfile(self._filename+".syn"): return {} with open(self._filename+".syn", "rb") as synFile: synBytes = synFile.read() synBytesLen = len(synBytes) synDict = {} pos = 0 while pos < synBytesLen: beg = pos pos = synBytes.find(b"\x00", beg) if pos < 0: log.error("Synonym file is corrupted") break b_alt = synBytes[beg:pos] # b_alt is bytes pos += 1 if pos + 4 > len(synBytes): log.error("Synonym file is corrupted") break wordIndex = binStrToInt(synBytes[pos:pos+4]) pos += 4 if wordIndex >= self._wordCount: log.error( "Corrupted synonym file. " + "Word \"%s\" references invalid item" % b_alt ) continue s_alt = b_alt.decode("utf-8") # s_alt is str try: synDict[wordIndex].append(s_alt) except KeyError: synDict[wordIndex] = [s_alt] return synDict def parseDefiBlockCompact(self, b_block, sametypesequence): """ Parse definition block when sametypesequence option is specified. 
Return a list of (b_defi, defiFormatCode) tuples where b_defi is a bytes instance and defiFormatCode is int, so: defiFormat = chr(defiFormatCode) """ assert isinstance(b_block, bytes) sametypesequence = sametypesequence.encode("utf-8") assert len(sametypesequence) > 0 res = [] i = 0 for t in sametypesequence[:-1]: if i >= len(b_block): return None if bytes([t]).islower(): beg = i i = b_block.find(b"\x00", beg) if i < 0: return None res.append((b_block[beg:i], t)) i += 1 else: assert bytes([t]).isupper() if i + 4 > len(b_block): return None size = binStrToInt(b_block[i:i+4]) i += 4 if i + size > len(b_block): return None res.append((b_block[i:i+size], t)) i += size if i >= len(b_block): return None t = sametypesequence[-1] if bytes([t]).islower(): if 0 in b_block[i:]: return None res.append((b_block[i:], t)) else: assert bytes([t]).isupper() res.append((b_block[i:], t)) return res def parseDefiBlockGeneral(self, b_block): """ Parse definition block when sametypesequence option is not specified. Return a list of (b_defi, defiFormatCode) tuples where b_defi is a bytes instance and defiFormatCode is int, so: defiFormat = chr(defiFormatCode) """ res = [] i = 0 while i < len(b_block): t = b_block[i] if not bytes([t]).isalpha(): return None i += 1 if bytes([t]).islower(): beg = i i = b_block.find(b"\x00", beg) if i < 0: return None res.append((b_block[beg:i], t)) i += 1 else: assert bytes([t]).isupper() if i + 4 > len(b_block): return None size = binStrToInt(b_block[i:i+4]) i += 4 if i + size > len(b_block): return None res.append((b_block[i:i+size], t)) i += size return res # def readResources(self): # if not isdir(self._resDir): # resInfoPath = join(baseDirPath, "res.rifo") # if isfile(resInfoPath): # log.warning( # "StarDict resource database is not supported. 
Skipping" # ) class Writer(object): def __init__(self, glos): self._glos = glos def write( self, filename, dictzip=True, sametypesequence=None, ): fileBasePath = "" ## if splitext(filename)[1].lower() == ".ifo": fileBasePath = splitext(filename)[0] elif filename.endswith(os.sep): if not isdir(filename): os.makedirs(filename) fileBasePath = join(filename, split(filename[:-1])[-1]) elif isdir(filename): fileBasePath = join(filename, split(filename)[-1]) ## if fileBasePath: fileBasePath = realpath(fileBasePath) self._filename = fileBasePath self._resDir = join(dirname(self._filename), "res") if sametypesequence: log.debug("Using write option sametypesequence=%s" % sametypesequence) self.writeCompact(sametypesequence) # elif self.glossaryHasAdditionalDefinitions(): # self.writeGeneral() else: # defiFormat = self.detectMainDefinitionFormat() # if defiFormat == None: self.writeGeneral() # else: # self.writeCompact(defiFormat) if dictzip: runDictzip(self._filename) def writeCompact(self, defiFormat): """ Build StarDict dictionary with sametypesequence option specified. Every item definition consists of a single article. All articles have the same format, specified in defiFormat parameter. 
Parameters: defiFormat - format of article definition: h - html, m - plain text """ dictMark = 0 altIndexList = [] # list of tuples (b"alternate", wordIndex) dictFile = open(self._filename+".dict", "wb") idxFile = open(self._filename+".idx", "wb") indexFileSize = 0 t0 = now() wordCount = 0 if not isdir(self._resDir): os.mkdir(self._resDir) entryI = -1 for entry in self._glos: if entry.isData(): entry.save(self._resDir) continue entryI += 1 words = entry.getWords() # list of strs word = words[0] # str defis = entry.getDefis() # list of strs b_dictBlock = b"" for alt in words[1:]: altIndexList.append((alt.encode("utf-8"), entryI)) b_dictBlock += (defis[0]).encode("utf-8") for altDefi in defis[1:]: b_dictBlock += b"\x00" + (altDefi).encode("utf-8") dictFile.write(b_dictBlock) blockLen = len(b_dictBlock) b_idxBlock = word.encode("utf-8") + b"\x00" + \ intToBinStr(dictMark, 4) + \ intToBinStr(blockLen, 4) idxFile.write(b_idxBlock) dictMark += blockLen indexFileSize += len(b_idxBlock) wordCount += 1 dictFile.close() idxFile.close() if not os.listdir(self._resDir): os.rmdir(self._resDir) log.info("Writing dict file took %.2f seconds", now() - t0) log.debug("defiFormat = " + pformat(defiFormat)) self.writeSynFile(altIndexList) self.writeIfoFile(wordCount, indexFileSize, len(altIndexList),defiFormat) def writeGeneral(self): """ Build StarDict dictionary in general case. Every item definition may consist of an arbitrary number of articles. sametypesequence option is not used. 
""" dictMark = 0 altIndexList = [] # list of tuples (b"alternate", wordIndex) dictFile = open(self._filename+".dict", "wb") idxFile = open(self._filename+".idx", "wb") indexFileSize = 0 t0 = now() wordCount = 0 defiFormatCounter = Counter() if not isdir(self._resDir): os.mkdir(self._resDir) entryI = -1 for entry in self._glos: if entry.isData(): entry.save(self._resDir) continue entryI += 1 words = entry.getWords() # list of strs word = words[0] # str defis = entry.getDefis() # list of strs entry.detectDefiFormat() # call no more than once defiFormat = entry.getDefiFormat() defiFormatCounter[defiFormat] += 1 if defiFormat not in ("m", "h"): defiFormat = "m" assert isinstance(defiFormat, str) and len(defiFormat) == 1 b_dictBlock = b"" for alt in words[1:]: altIndexList.append((alt.encode("utf-8"), entryI)) b_dictBlock += (defiFormat + defis[0]).encode("utf-8") + b"\x00" for altDefi in defis[1:]: b_dictBlock += (defiFormat + altDefi).encode("utf-8") + b"\x00" dictFile.write(b_dictBlock) blockLen = len(b_dictBlock) b_idxBlock = word.encode("utf-8") + b"\x00" + \ intToBinStr(dictMark, 4) + \ intToBinStr(blockLen, 4) idxFile.write(b_idxBlock) dictMark += blockLen indexFileSize += len(b_idxBlock) wordCount += 1 dictFile.close() idxFile.close() if not os.listdir(self._resDir): os.rmdir(self._resDir) log.info("Writing dict file took %.2f seconds", now() - t0) log.debug("defiFormatsCount = " + pformat(defiFormatCounter.most_common())) self.writeSynFile(altIndexList) self.writeIfoFile(wordCount, indexFileSize, len(altIndexList)) def writeSynFile(self, altIndexList): """ Build .syn file """ if not altIndexList: return log.info("Sorting %s synonyms..." 
% len(altIndexList)) t0 = now() altIndexList.sort( key=lambda x: sortKeyBytes(x[0]) ) # 28 seconds with old sort key (converted from custom cmp) # 0.63 seconds with my new sort key # 0.20 seconds without key function (default sort) log.info("Sorting %s synonyms took %.2f seconds" % ( len(altIndexList), now() - t0, )) log.info("Writing %s synonyms..." % len(altIndexList)) t0 = now() with open(self._filename+".syn", "wb") as synFile: synFile.write(b"".join([ b_alt + b"\x00" + intToBinStr(wordIndex, 4) for b_alt, wordIndex in altIndexList ])) log.info("Writing %s synonyms took %.2f seconds" % ( len(altIndexList), now() - t0, )) def writeIfoFile( self, wordCount, indexFileSize, synwordcount, sametypesequence=None, ): """ Build .ifo file """ ifoStr = "StarDict's dict ifo file\n" \ + "version=3.0.0\n" \ + "bookname=%s\n" % newlinesToSpace(self._glos.getInfo("name")) \ + "wordcount=%s\n" % wordCount \ + "idxfilesize=%s\n" % indexFileSize if sametypesequence is not None: ifoStr += "sametypesequence=%s\n" % sametypesequence if synwordcount > 0: ifoStr += "synwordcount=%s\n" % synwordcount for key in infoKeys: if key in ( "bookname", "wordcount", "idxfilesize", "sametypesequence", ): continue value = self._glos.getInfo(key) if value == "": continue if key == "description": value = newlinesToBr(value) else: value = newlinesToSpace(value) ifoStr += "%s=%s\n" % (key, value) with open(self._filename+".ifo", "w", encoding="utf-8") as ifoFile: ifoFile.write(ifoStr) def glossaryHasAdditionalDefinitions(self): """ Search for additional definitions in the glossary. We need to know if the glossary contains additional definitions to make the decision on the format of the StarDict dictionary. """ for entry in self._glos: if len(entry.getDefis()) > 1: return True return False def detectMainDefinitionFormat(self): """ Scan main definitions of the glossary. Return format common to all definitions: "h" or "m" If definitions has different formats return None. 
""" self._glos.setDefaultDefiFormat("m") formatsCount = self._glos.getMostUsedDefiFormats() if not formatsCount: return None if len(formatsCount) > 1: # FIXME return None return formatsCount[0] def write(glos, filename, **kwargs): writer = Writer(glos) writer.write(filename, **kwargs) pyglossary-3.2.1/pyglossary/plugins/stardict_tests.py0000644000175000017500000001076413577304507023540 0ustar emfoxemfox00000000000000import unittest import locale import random from functools import cmp_to_key def toBytes(s): return bytes(s, "utf8") if isinstance(s, str) else bytes(s) def sortKeyBytes(ba): """ ba is a bytes instance """ assert isinstance(ba, bytes) # WRONG: ba.lower() + ba return ( ba.lower(), ba, ) def stardictStrCmp(s1, s2): """ use this function to sort index items in StarDict dictionary s1 and s2 must be utf-8 encoded strings """ s1 = toBytes(s1) s2 = toBytes(s2) a = asciiStrCaseCmp(s1, s2) if a == 0: return strCmp(s1, s2) else: return a # the slow way in Python 3 (where there is no cmp arg in list.sort) sortKeyOld = cmp_to_key(stardictStrCmp) # TOO SLOW def asciiStrCaseCmp(ba1, ba2): """ ba1 and ba2 are instances of bytes imitate g_ascii_strcasecmp function of glib library gstrfuncs.c file """ commonLen = min(len(ba1), len(ba2)) for i in range(commonLen): c1 = asciiLower(ba1[i]) c2 = asciiLower(ba2[i]) if c1 != c2: return c1 - c2 return len(ba1) - len(ba2) def strCmp(ba1, ba2): """ ba1 and ba2 are instances of bytes imitate strcmp of standard C library Attention! You may have a temptation to replace this function with built-in cmp() function. Hold on! Most probably these two function behave identically now, but cmp does not document how it compares strings. There is no guaranty it will not be changed in future. Since we need predictable sorting order in StarDict dictionary, we need to preserve this function despite the fact there are other ways to implement it. 
""" commonLen = min(len(ba1), len(ba2)) for i in range(commonLen): c1 = ba1[i] c2 = ba2[i] if c1 != c2: return c1 - c2 return len(ba1) - len(ba2) def isAsciiAlpha(c): """ c is int """ return ord("A") <= c <= ord("Z") or ord("a") <= c <= ord("z") def isAsciiLower(c): return ord("a") <= c <= ord("z") def isAsciiUpper(c): """ c is int imitate ISUPPER macro of glib library gstrfuncs.c file """ return ord("A") <= c <= ord("Z") def asciiLower(c): """ c is int returns int (ascii character code) imitate TOLOWER macro of glib library gstrfuncs.c file This function converts upper case Latin letters to corresponding lower case letters, other chars are not changed. c must be non-Unicode string of length 1. You may apply this function to individual bytes of non-Unicode string. The following encodings are allowed: single byte encoding like koi8-r, cp1250, cp1251, cp1252, etc, and utf-8 encoding. Attention! Python Standard Library provides str.lower() method. It is not a correct replacement for this function. For non-unicode string str.lower() is locale dependent, it not only converts Latin letters to lower case, but also locale specific letters will be converted. 
""" return c - ord("A") + ord("a") if isAsciiUpper(c) else c def getRandomBytes(avgLen, sigma): length = round(random.gauss(avgLen, sigma)) return bytes([ random.choice(range(256)) for _ in range(length) ]) class AsciiLowerUpperTest(unittest.TestCase): def set_locale_iter(self): for localeName in locale.locale_alias.values(): try: locale.setlocale(locale.LC_ALL, localeName) except Exception as e: if "unsupported locale setting" not in str(e): print(e) continue yield localeName def test_isalpha(self): for _ in self.set_locale_iter(): for code in range(256): self.assertEqual( isAsciiAlpha(code), bytes([code]).isalpha(), ) def test_islower(self): for _ in self.set_locale_iter(): for code in range(256): self.assertEqual( isAsciiLower(code), bytes([code]).islower(), ) def test_isupper(self): for _ in self.set_locale_iter(): for code in range(256): self.assertEqual( isAsciiUpper(code), bytes([code]).isupper(), ) def test_lower(self): for _ in self.set_locale_iter(): for code in range(256): self.assertEqual( asciiLower(code), ord(bytes([code]).lower()), ) class SortRandomTest(unittest.TestCase): def set_locale_iter(self): for localeName in locale.locale_alias.values(): try: locale.setlocale(locale.LC_ALL, localeName) except Exception as e: if "unsupported locale setting" not in str(e): print(e) continue # print(localeName) yield localeName def test_sort_1(self): bsList = [ getRandomBytes(30, 10) for _ in range(100) ] for _ in self.set_locale_iter(): self.assertEqual( sorted( bsList, key=sortKeyOld, ), sorted( bsList, key=sortKeyBytes, ) ) if __name__ == "__main__": unittest.main() pyglossary-3.2.1/pyglossary/plugins/tabfile.py0000644000175000017500000000270013577304507022076 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- from formats_common import * from pyglossary.text_reader import TextGlossaryReader from pyglossary.text_utils import escapeNTB, unescapeNTB, splitByBarUnescapeNTB enable = True format = "Tabfile" description = "Tabfile (txt, dic)" extentions = 
[".txt", ".tab", ".dic"]
readOptions = [
	"encoding",
]
writeOptions = [
	"encoding",  # str
	"writeInfo",  # bool
	"resources",  # bool
]


class Reader(TextGlossaryReader):
	def isInfoWord(self, word):
		if isinstance(word, str):
			return word.startswith("#")
		else:
			return False

	def fixInfoWord(self, word):
		if isinstance(word, str):
			return word.lstrip("#")
		else:
			return word

	def nextPair(self):
		if not self._file:
			raise StopIteration
		line = self._file.readline()
		if not line:
			raise StopIteration
		line = line.strip()  # This also removes trailing newline
		if not line:
			return
		###
		word, tab, defi = line.partition("\t")
		if not tab:
			log.error(
				"Warning: line starting with \"%s\" has no tab!" % line[:10]
			)
			return
		###
		if self._glos.getPref("enable_alts", True):
			word = splitByBarUnescapeNTB(word)
			if len(word) == 1:
				word = word[0]
		else:
			word = unescapeNTB(word, bar=True)
		###
		defi = unescapeNTB(defi)
		###
		return word, defi


def write(
	glos,
	filename,
	encoding="utf-8",
	writeInfo=True,
	resources=True,
):
	return glos.writeTabfile(
		filename,
		encoding=encoding,
		writeInfo=writeInfo,
		resources=resources,
	)

pyglossary-3.2.1/pyglossary/plugins/testformat.py0000644000175000017500000000274313577304507022667 0ustar emfoxemfox00000000000000
# -*- coding: utf-8 -*-

from formats_common import *

enable = False
format = 'Test'
description = 'Test Format File (.test)'
extentions = ['.test', '.tst']
readOptions = []
writeOptions = []


def read(glos, filename):
	# glos is a Glossary object, filename is a string
	log.info('reading from format %s using plugin' % format)
	count = 100  # get number of entries from input file (depending on your format)
	for i in range(count):
		# here get word and definition from file (depending on your format)
		word = 'word_%s' % i
		defi = 'definition %s' % i
		glos.addEntry(word, defi)
	# here read info from file and set to Glossary object
	glos.setInfo('name', 'Test')
	glos.setInfo('description', 'Test glossary created by a PyGlossary plugin')
	glos.setInfo('author', 'Me')
	glos.setInfo('copyright', 'GPL')
	return True  # reading input file was successful


def write(glos, filename):
	# glos is a Glossary object, filename is a string
	log.info('writing to format %s using plugin' % format)
	for entry in glos:
		word = entry.getWord()
		defi = entry.getDefi()
		# here write word and defi to the output file (depending on
		# your format)
	# here read info from Glossary object
	name = glos.getInfo('name')
	description = glos.getInfo('description')
	author = glos.getInfo('author')
	copyright = glos.getInfo('copyright')
	# if an info key doesn't exist, getInfo returns empty string
	# now write info to the output file (depending on your output format)
	return True  # writing output file was successful

pyglossary-3.2.1/pyglossary/plugins/treedict.py0000644000175000017500000000307013577304507022274 0ustar emfoxemfox00000000000000
# -*- coding: utf-8 -*-

from formats_common import *

import shutil
import subprocess

enable = True
format = "Treedict"
description = "TreeDict"
extentions = [".tree", ".treedict"]
readOptions = []
writeOptions = [
	"encoding",  # str
]


def write(glos, filename, encoding="utf-8", archive="tar.bz2", sep=os.sep):
	if os.path.exists(filename):
		if os.path.isdir(filename):
			if os.listdir(filename):
				log.warning("Warning: directory \"%s\" is not empty." % filename)
		else:
			raise IOError("\"%s\" is not a directory" % filename)
	for entry in glos:
		defi = entry.getDefi()
		for word in entry.getWords():
			if not word:
				log.error("empty word")
				continue
			chars = list(word)
			try:
				os.makedirs(filename + os.sep + sep.join(chars[:-1]))
			except:
				pass
			entryFname = "%s%s%s.m" % (
				filename,
				os.sep,
				sep.join(chars),
			)
			try:
				with open(entryFname, "a", encoding=encoding) as entryFp:
					entryFp.write(defi)
			except:
				log.exception("")
	if archive:
		if archive == "tar.gz":
			(output, error) = subprocess.Popen(
				["tar", "-czf", filename+".tar.gz", filename],
				stdout=subprocess.PIPE
			).communicate()
		elif archive == "tar.bz2":
			(output, error) = subprocess.Popen(
				["tar", "-cjf", filename+".tar.bz2", filename],
				stdout=subprocess.PIPE
			).communicate()
		elif archive == "zip":
(output, error) = subprocess.Popen( ["zip", "-r", filename+".zip", filename], stdout=subprocess.PIPE ).communicate() else: log.error("Undefined archive format: \"%s\"" % archive) try: shutil.rmtree(filename, ignore_errors=True) except: pass pyglossary-3.2.1/pyglossary/plugins/wikipedia_dump.py0000644000175000017500000001201213577304507023460 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- from time import time as now import re try: from BeautifulSoup import BeautifulSoup except ImportError: from bs4 import BeautifulSoup from formats_common import * enable = True format = "WikipediaDump" description = "Wikipedia Dump(Static HTML)" extentions = [".wiki"] readOptions = [] writeOptions = [ "encoding", # str ] class Reader(object): specialPattern = re.compile(r".*~[^_]") def __init__(self, glos): self._glos = glos self._rootDir = "" self._articlesDir = "" self._len = None self._specialCount = 0 # self._alts = {} # { word => alts } # where alts is str (one word), or list of strs # we can't recognize alternates unless we keep all data in memory # or scan the whole directiry and read all files twice def open(self, rootDir): if not isdir(rootDir): raise IOError("%s is not a directory" % rootDir) self._rootDir = rootDir self._articlesDir = join(self._rootDir, "articles") def close(self): self._rootDir = "" self._articlesDir = "" self._len = None # self._alts = {} def __len__(self): if not self._articlesDir: log.error( "WikipediaDump: called len(reader) while it's not open" ) return 0 if self._len is None: t0 = now() log.info("Counting articles...") self._len = sum( len(files) for _, _, files in os.walk(self._articlesDir) ) log.debug("Counting articles took %.2f seconds" % (now() - t0)) log.info("Found %s articles" % self._len) return self._len def __iter__(self): if not self._articlesDir: log.error( "WikipediaDump: trying to iterate over reader" " while it's not open" ) raise StopIteration for dirpath, dirs, files in os.walk(self._articlesDir): # dirpathRel = 
dirpath[len(self._articlesDir):].lstrip("/")
		# dirParts = dirpathRel.split(os.sep)  # test on windows FIXME
		# prefix = "".join([
		# 	chr(int(x, 16)) if len(x)==2 else x
		# 	for x in dirParts
		# ])
		dirs.sort()
		files.sort()
		for fname_html in files:
			fpath = join(dirpath, fname_html)
			fname, ext = splitext(fname_html)
			# fpathRel = join(dirpathRel, fname_html)
			if ext != ".html":
				log.warning("unknown article extension: %s" % fname_html)
				continue
			if self.isSpecialByPath(fname):  # , prefix
				# log.debug("Skipping special page file: %s" % fpathRel)
				self._specialCount += 1
				yield None  # updates progressbar
				continue
			word = fname.replace("_", " ").replace("~", ":")
			defi = self.parseArticle(word, fpath)
			if not defi:
				yield None  # updates progressbar
				continue
			yield self._glos.newEntry(word, defi)
		log.info("Skipped %s special page files" % self._specialCount)

	def isSpecialByPath(self, fname):  # , d_prefix
		"""
			fname: str
			d_prefix: str, with length of 3
				this is the joint string version of directory relative path
		"""
		return re.match(self.specialPattern, fname)
		# assert len(d_prefix) == 3
		# f_prefix = fname[:3].lower()
		# if f_prefix == d_prefix:
		# 	return False
		# if len(f_prefix) < 3:
		# 	if f_prefix == d_prefix.rstrip("_"):
		# 		return False
		# 	return True
		# if "~" not in fname:
		# 	return False
		# if fname[0] == dirParts[0]:
		# 	return False
		# if list(fname[:3]) != dirParts:
		# 	log.debug("dirParts=%r, fname=%r" % (dirParts, fname))
		# l_fname = fname.lower()
		# for ext in ("png", "jpg", "jpeg", "gif", "svg", "pdf", "js"):
		# 	if "."
+ ext in l_fname: # return True # return True def parseArticle(self, word, fpath): try: with open(fpath) as fileObj: text = fileObj.read() except UnicodeDecodeError: log.error("error decoding file %r, not UTF-8" % fpath) return except: log.exception("error reading file %r" % fpath) return root = BeautifulSoup(text, "lxml") body = root.body if not body: return # if body.p and body.p.text.startswith("Redirecting to "): # toWord = body.p.text[len("Redirecting to "):] # try: # fromWords = self._alts[toWord] # except KeyError: # self._alts[toWord] = word # else: # if isinstance(fromWords, str): # self._alts[toWord] = [fromWords, word] # else: # fromWords.append(word) # return # try: # alts = self._alts[word] if body.p and body.p.text.startswith("Redirecting to "): toWord = body.p.text[len("Redirecting to "):] return "↳ %s" % (toWord, toWord) # "↳" does not look good for RTL languages FIXME try: content = body.find(id="column-content").find(id="content") except AttributeError: content = None if not content: log.warning("could not find \"content\" element: %s" % fpath) return try: firstHeading = content.find("h1", class_="firstHeading").text except AttributeError: log.warning("could not find \"firstHeading\" element: %s" % fpath) return if firstHeading != word: log.debug("word=%r, firstHeading=%r" % (firstHeading, word)) bodyContent = content.find(id="bodyContent") if not bodyContent: log.warning("could not find \"bodyContent\" element: %s" % fpath) return # FIXME return "".join([str(tag) for tag in bodyContent.contents]) pyglossary-3.2.1/pyglossary/plugins/xdxf/0000755000175000017500000000000013577304644021072 5ustar emfoxemfox00000000000000pyglossary-3.2.1/pyglossary/plugins/xdxf/__init__.py0000644000175000017500000001121113577304507023175 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # xdxf/__init__.py """xdxf file format reader and utils to convert xdxf to html.""" # # Copyright (C) 2016 Ratijas # # some parts of this file include code from: # Aard Dictionary 
Tools . # Copyright (C) 2008-2009 Igor Tkach # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, version 3 of the License. # # You can get a copy of GNU General Public License along this program # But you can always get it from http://www.gnu.org/licenses/gpl.txt # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. from os import path from formats_common import * enable = True format = "Xdxf" description = "XDXF" extentions = [".xdxf", ".xml"] readOptions = [] writeOptions = [] etree = None XML = None tostring = None transform = None def import_xml_stuff(): from lxml import etree as _etree global etree, XML, tostring etree = _etree XML = etree.XML tostring = etree.tostring def read(glos, filename): """ new format All meta information about the dictionary: its title, author etc. article 1 article 2 article 3 article 4 ... old format ... ... article 1 article 2 article 3 article 4 ... 
""" # import_xml_stuff() with open(filename, "rb") as f: xdxf = XML(f.read()) if len(xdxf) == 2: # new format read_metadata_new(glos, xdxf) read_xdxf_new(glos, xdxf) else: # old format read_metadata_old(glos, xdxf) read_xdxf_old(glos, xdxf) def read_metadata_old(glos, xdxf): full_name = xdxf.find("full_name").text description = xdxf.find("description").text if full_name: glos.setInfo("name", full_name) if description: glos.setInfo("description", description) def read_xdxf_old(glos, xdxf): add_articles(glos, xdxf.iterfind("ar")) def read_metadata_new(glos, xdxf): meta_info = xdxf.find("meta_info") if meta_info is None: raise ValueError("meta_info not found") title = meta_info.find("full_title").text if not title: title = meta_info.find("title").text description = meta_info.find("description").text if title: glos.setInfo("name", title) if description: glos.setInfo("description", description) def read_xdxf_new(glos, xdxf): add_articles(glos, xdxf.find("lexicon").iterfind("ar")) def add_articles(glos, articles): """ :param articles: iterator on tags :return: None """ glos.setDefaultDefiFormat("x") for article in articles: article.tail = None defi = tostring(article, encoding="utf-8") # ... 
defi = defi[4:-5].strip() glos.addEntry( [toStr(w) for w in titles(article)], toStr(defi), ) def titles(article): """ :param article: tag :return: (title (str) | None, alternative titles (set)) """ from itertools import combinations titles = [] for title_element in article.findall("k"): n_opts = len([c for c in title_element if c.tag == "opt"]) if n_opts: for j in range(n_opts + 1): for comb in combinations(list(range(n_opts)), j): titles.append(_mktitle(title_element, comb)) else: titles.append(_mktitle(title_element)) return titles def _mktitle(title_element, include_opts=()): title = title_element.text opt_i = -1 for c in title_element: if c.tag == "nu" and c.tail: if title: title += c.tail else: title = c.tail if c.tag == "opt": opt_i += 1 if opt_i in include_opts: if title: title += c.text else: title = c.text if c.tail: if title: title += c.tail else: title = c.tail return title.strip() def xdxf_init(): """ call this only once, before `xdxf_to_html`. """ global transform import_xml_stuff() xsl = path.join(path.dirname(__file__), "xdxf.xsl") with open(xsl, "r") as f: xslt_root_txt = f.read() xslt_root = etree.XML(xslt_root_txt) transform = etree.XSLT(xslt_root) def xdxf_to_html(xdxf_text): """ make sure to call `xdxf_init()` first. :param xdxf_text: xdxf formatted string :return: html formatted string """ from io import StringIO xdxf_txt = "%s" % xdxf_text f = StringIO(xdxf_txt) doc = etree.parse(f) result_tree = transform(doc) return tostring(result_tree, encoding="utf-8") pyglossary-3.2.1/pyglossary/plugins/xdxf/xdxf.xsl0000644000175000017500000000744213575553425022603 0ustar emfoxemfox00000000000000

    pyglossary-3.2.1/pyglossary/sort_stream.py0000644000175000017500000000372413577304507021360 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- from heapq import heappush, heappop from heapq import merge import logging log = logging.getLogger('root') def hsortStream(stream, maxHeapSize, key=None): """ stream: a generator or iterable maxHeapSize: int, maximum size of heap key: a key function, as in `list.sort` method, or `sorted` function if key is None, we consume less memory the sort is Stable (unlike normal heapsort) because we include the index (after item / output of key function) """ hp = [] if key: for index, item in enumerate(stream): if len(hp) >= maxHeapSize: yield heappop(hp)[2] heappush(hp, ( key(item), # for sorting order index, # for sort being Stable item, # for fetching result )) while hp: yield heappop(hp)[2] else: # consume less memory for index, item in enumerate(stream): if len(hp) >= maxHeapSize: yield heappop(hp)[0] heappush(hp, ( item, # for sorting order, and fetching result index, # for sort being Stable )) while hp: yield heappop(hp)[0] def hsortStreamList(streams, *args, **kwargs): streams = [ hsortStream(stream, *args, **kwargs) for stream in streams ] return merge(*tuple(streams)) def stdinIntegerStream(): while True: line = input(' Input item: ') if not line: break yield int(line) def stdinStringStream(): while True: line = raw_input(' Input item: ') if not line: break yield line def randomChoiceGenerator(choices, count): import random for _ in range(count): yield random.choice(choices) def test_hsortStreamList(count=10): for item in hsortStreamList( [ randomChoiceGenerator(range(0, 50), count), randomChoiceGenerator(range(30, 50), count), randomChoiceGenerator(range(10, 40), count), ], maxHeapSize=5, key=None, ): print(item) def main(): test_hsortStreamList() # stream = stdinIntegerStream() # for line in hsortStream(stream, 3): # print('------ Placed item: %s'%line) if __name__ == '__main__': main() 
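The `hsortStream` generator in sort_stream.py above performs a bounded-memory heap sort: it yields fully sorted output as long as no item sits more than `maxHeapSize` positions away from its sorted place (or the heap is at least as large as the stream). A minimal self-contained sketch, with the no-key branch copied inline so it runs without the pyglossary package:

```python
from heapq import heappush, heappop


def hsortStream(stream, maxHeapSize):
	# no-key branch of pyglossary.sort_stream.hsortStream:
	# keep at most maxHeapSize items in a heap, emitting the
	# smallest one whenever the heap is full
	hp = []
	for index, item in enumerate(stream):
		if len(hp) >= maxHeapSize:
			yield heappop(hp)[0]
		heappush(hp, (item, index))  # index keeps the sort stable
	while hp:
		yield heappop(hp)[0]


# every item is at most 2 positions from its sorted place,
# so a heap of size 3 is enough to sort the stream fully
print(list(hsortStream(iter([3, 1, 2, 5, 4, 7, 6]), 3)))
# → [1, 2, 3, 4, 5, 6, 7]
```

With a heap smaller than an item's displacement the output is only partially sorted, which is why the glossary code picks `maxHeapSize` to match how disordered its input streams can be.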
pyglossary-3.2.1/pyglossary/text_reader.py0000644000175000017500000000361713577304507021325 0ustar emfoxemfox00000000000000
from pyglossary.file_utils import fileCountLines
from pyglossary.entry import Entry

import logging
log = logging.getLogger('root')


class TextGlossaryReader(object):
	def __init__(self, glos, hasInfo=True):
		self._glos = glos
		self._filename = ''
		self._file = None
		self._hasInfo = hasInfo
		self._leadingLinesCount = 0
		self._pendingEntries = []
		self._wordCount = None
		self._pos = -1

	def open(self, filename, encoding='utf-8'):
		self._filename = filename
		self._file = open(filename, 'r', encoding=encoding)
		if self._hasInfo:
			self.loadInfo()

	def close(self):
		if not self._file:
			return
		try:
			self._file.close()
		except:
			log.exception('error while closing file "%s"' % self._filename)
		self._file = None

	def loadInfo(self):
		self._pendingEntries = []
		self._leadingLinesCount = 0
		try:
			while True:
				wordDefi = self.nextPair()
				if not wordDefi:
					continue
				word, defi = wordDefi
				if not self.isInfoWord(word):
					self._pendingEntries.append(Entry(word, defi))
					break
				self._leadingLinesCount += 1
				word = self.fixInfoWord(word)
				if not word:
					continue
				self._glos.setInfo(word, defi)
		except StopIteration:
			pass

	def __next__(self):
		self._pos += 1
		try:
			return self._pendingEntries.pop(0)
		except IndexError:
			pass
		###
		try:
			wordDefi = self.nextPair()
		except StopIteration as e:
			self._wordCount = self._pos
			raise e
		if not wordDefi:
			return
		word, defi = wordDefi
		###
		return Entry(word, defi)

	def __len__(self):
		if self._wordCount is None:
			log.debug('Try not to use len(reader) as it takes extra time')
			self._wordCount = fileCountLines(self._filename) - \
				self._leadingLinesCount
		return self._wordCount

	def __iter__(self):
		return self

	def isInfoWord(self, word):
		raise NotImplementedError

	def fixInfoWord(self, word):
		raise NotImplementedError

	def nextPair(self):
		raise NotImplementedError

pyglossary-3.2.1/pyglossary/text_utils.py0000644000175000017500000001104713577304507021217 0ustar emfoxemfox00000000000000
# -*-
coding: utf-8 -*- # text_utils.py # # Copyright © 2008-2010 Saeed Rasooli (ilius) # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 3, or (at your option) # any later version. # # You can get a copy of GNU General Public License along this program # But you can always get it from http://www.gnu.org/licenses/gpl.txt # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. import string import sys import os import re import logging from . import core log = logging.getLogger("root") startRed = "\x1b[31m" endFormat = "\x1b[0;0;0m" # len=8 def toBytes(s): return bytes(s, "utf8") if isinstance(s, str) else bytes(s) def toStr(s): return str(s, "utf8") if isinstance(s, bytes) else str(s) def fixUtf8(st): return toBytes(st).replace(b"\x00", b"").decode("utf-8", "replace") pattern_n_us = re.compile(r"((? 
0: bs.insert(0, n & 0xff) n >>= 8 return bytes(bs).rjust(stLen, b"\x00") def binStrToInt(bs): bs = toBytes(bs) n = 0 for c in bs: n = (n << 8) + c return n # ___________________________________________ # def urlToPath(url): if not url.startswith("file://"): return url path = url[7:] if path[-2:] == "\r\n": path = path[:-2] elif path[-1] == "\r": path = path[:-1] # here convert html unicode symbols to utf8 string: if "%" not in path: return path path2 = "" n = len(path) i = 0 while i < n: if path[i] == "%" and i < n-2: path2 += chr(eval("0x%s" % path[i+1:i+3])) i += 3 else: path2 += path[i] i += 1 return path2 def replacePostSpaceChar(st, ch): return st.replace(" "+ch, ch).replace(ch, ch+" ").replace(ch+" ", ch+" ") def runDictzip(filename): import subprocess dictzipCmd = "/usr/bin/dictzip" # Save in pref FIXME if not os.path.isfile(dictzipCmd): return False if filename[-4:] == ".ifo": filename = filename[:-4] (out, err) = subprocess.Popen( [dictzipCmd, filename+".dict"], stdout=subprocess.PIPE ).communicate() # out = p3[1].read() # err = p3[2].read() # log.debug("dictzip command: \"%s\""%dictzipCmd) # if err: # log.error("dictzip error: %s"%err.replace("\n", " ")) # if out: # log.error("dictzip error: %s"%out.replace("\n", " ")) def isControlChar(y): # y: char code if y < 32 and chr(y) not in "\t\n\r\v": return True # according to ISO-8859-1 if 128 <= y <= 159: return True return False def isASCII(data, exclude=None): if exclude is None: exclude = [] for c in data: co = ord(c) if co >= 128 and co not in exclude: return False return True def formatByteStr(text): out = "" for c in text: out += "{0:0>2x}".format(ord(c)) + " " return out pyglossary-3.2.1/pyglossary/text_utils_extra.py0000644000175000017500000000434713575553425022432 0ustar emfoxemfox00000000000000def intToBinStr(n, stLen=0): bs = b'' while n > 0: bs = bytes([n & 255]) + bs n >>= 8 return bs.rjust(stLen, b'\x00') def intToBinStr(n, stLen=0): return bytes(( (n >> (i << 3)) & 0xff for i in 
range(int(ceil(log(n, 256)))-1, -1, -1) )).rjust(stLen, b'\x00') def binStrToInt(bs): n = 0 for c in bs: n = (n << 8) + c return n def textProgress(n=100, t=0.1): import time for i in range(n): sys.stdout.write('\b\b\b\b##%3d' % (i+1)) time.sleep(t) sys.stdout.write('\b\b\b') def locate(lst, val): n = len(lst) if n == 0: return if val < lst[0]: return -0.5 if val == lst[0]: return 0 if val == lst[-1]: return n-1 if val > lst[-1]: return n-0.5 si = 0 # start index ei = n # end index while ei-si > 1: mi = (ei+si)/2 # middle index if lst[mi] == val: return mi elif lst[mi] > val: ei = mi continue else: si = mi continue if ei-si == 1: return si+0.5 def locate2(lst, val, ind=1): n = len(lst) if n == 0: return if val < lst[0][ind]: return -0.5 if val == lst[0][ind]: return 0 if val == lst[-1][ind]: return n-1 if val > lst[-1][ind]: return n-0.5 si = 0 ei = n while ei-si > 1: mi = (ei+si)/2 if lst[mi][ind] == val: return mi elif lst[mi][ind] > val: ei = mi continue else: si = mi continue if ei-si == 1: return si+0.5 def xml2dict(xmlText): from xml.etree.ElementTree import XML, tostring xmlElems = XML(xmlText) for elem in xmlElems: elemText = tostring(elem) try: elem[0] elemElems = xml2dict() except: pass def sortby(lst, n, reverse=False): nlist = [(x[n], x) for x in lst] nlist.sort(None, None, reverse) return [val for (key, val) in nlist] def sortby_inplace(lst, n, reverse=False): lst[:] = [(x[n], x) for x in lst] lst.sort(None, None, reverse) lst[:] = [val for (key, val) in lst] return def chBaseIntToStr(number, base): """ reverse function of int(str, base) and long(str, base) """ if not 2 <= base <= 36: raise ValueError('base must be in 2..36') abc = string.digits + string.ascii_letters result = '' if number < 0: number = -number sign = '-' else: sign = '' while True: number, rdigit = divmod(number, base) result = abc[rdigit] + result if number == 0: return sign + result pyglossary-3.2.1/pyglossary/xml_utils.py0000644000175000017500000000220413577304507021026 0ustar 
emfoxemfox00000000000000
# from xml.sax.saxutils import escape as xml_escape
# from xml.sax.saxutils import unescape as xml_unescape


def __dict_replace(s, d):
	"""Replace substrings of a string using a dictionary."""
	for key, value in d.items():
		s = s.replace(key, value)
	return s


def xml_escape(data, entities=None):
	"""Escape &, <, and > in a string of data.

	You can escape other strings of data by passing a dictionary as
	the optional entities parameter. The keys and values must all be
	strings; each key will be replaced with its corresponding value.
	"""
	if entities is None:
		entities = {}
	# must do ampersand first
	data = data.replace("&", "&amp;")
	data = data.replace(">", "&gt;")
	data = data.replace("<", "&lt;")
	if entities:
		data = __dict_replace(data, entities)
	return data


def xml_unescape(data, entities=None):
	"""Unescape &amp;, &lt;, and &gt; in a string of data.

	You can unescape other strings of data by passing a dictionary as
	the optional entities parameter. The keys and values must all be
	strings; each key will be replaced with its corresponding value.
	"""
	if entities is None:
		entities = {}
	data = data.replace("&lt;", "<")
	data = data.replace("&gt;", ">")
	if entities:
		data = __dict_replace(data, entities)
	# must do ampersand last
	return data.replace("&amp;", "&")

pyglossary-3.2.1/pyglossary.desktop0000755000175000017500000000040213575553425020033 0ustar emfoxemfox00000000000000
#!/usr/bin/env xdg-open
[Desktop Entry]
Name=PyGlossary
GenericName=Glossary Converter
Comment=Working on glossaries
Exec=pyglossary
Terminal=false
Type=Application
StartupNotify=true
Icon=pyglossary
Categories=Education;
X-GNOME-FullName=Glossary Converter
pyglossary-3.2.1/pyglossary.egg-info/0000755000175000017500000000000013577304644020132 5ustar emfoxemfox00000000000000
pyglossary-3.2.1/pyglossary.egg-info/PKG-INFO0000644000175000017500000002317113577304644021233 0ustar emfoxemfox00000000000000
Metadata-Version: 2.1
Name: pyglossary
Version: 3.2.1
Summary: A tool for working with dictionary databases
Home-page: https://github.com/ilius/pyglossary
Author: Saeed Rasooli
Author-email: saeed.gnu@gmail.com
License: GPLv3
Description: PyGlossary
==========

PyGlossary
is a tool for converting dictionary files aka glossaries, from/to
various formats used by different dictionary applications

Screenshots
-----------

![](https://raw.githubusercontent.com/ilius/pyglossary/resources/screenshots/30-gtk-bgl-stardict-nl-en.png)

Linux - (New) Gtk3-based interface

------------------------------------------------------------------------

![](https://raw.githubusercontent.com/ilius/pyglossary/resources/screenshots/30-tk-bgl-mdict-fr-zh-win7.png)

Windows - Tkinter-based interface

------------------------------------------------------------------------

![](https://raw.githubusercontent.com/ilius/pyglossary/resources/screenshots/30-cmd-bgl-apple-ru-de.png)

Linux - command line interface

Supported formats
-----------------

| Format                            | Extension     | Read  | Write |
|-----------------------------------|---------------|:-----:|:-----:|
| ABBYY Lingvo DSL                  | .dsl          |   ✔   |       |
| AppleDict Source                  | .xml          |       |   ✔   |
| Babylon                           | .bgl          |   ✔   |       |
| Babylon Source                    | .gls          |       |   ✔   |
| CC-CEDICT                         |               |   ✔   |       |
| CSV                               | .csv          |   ✔   |   ✔   |
| DictionaryForMIDs                 |               |   ✔   |   ✔   |
| DICTD dictionary server           | .index        |   ✔   |   ✔   |
| Editable Linked List of Entries   | .edlin        |   ✔   |   ✔   |
| FreeDict                          | .tei          |       |   ✔   |
| Gettext Source                    | .po           |   ✔   |   ✔   |
| Lingoes Source (LDF)              | .ldf          |   ✔   |   ✔   |
| Octopus MDict                     | .mdx          |   ✔   |       |
| Octopus MDict Source              | .txt          |   ✔   |   ✔   |
| Omnidic                           |               |       |   ✔   |
| Sdictionary Binary                | .dct          |   ✔   |       |
| Sdictionary Source                | .sdct         |       |   ✔   |
| SQL                               | .sql          |       |   ✔   |
| StarDict                          | .ifo          |   ✔   |   ✔   |
| Tabfile                           | .txt, .dic    |   ✔   |   ✔   |
| TreeDict                          |               |       |   ✔   |
| XDXF                              | .xdxf         |   ✔   |       |

Requirements
------------

PyGlossary uses **Python 3.x**, and works in practically all operating
systems. While primarily designed for *GNU/Linux*, it works on
*Windows*, *Mac OS X* and other Unix-based operating systems as well.

As shown in the screenshots, there are multiple User Interface types,
i.e. multiple ways to use the program.
- **Gtk3-based interface**, uses [PyGI (Python Gobject Introspection)](http://pygobject.readthedocs.io/en/latest/getting_started.html) You can install it on: - Debian/Ubuntu: `apt install python3-gi python3-gi-cairo gir1.2-gtk-3.0` - openSUSE: `zypper install python3-gobject gtk3` - Fedora: `dnf install pygobject3 python3-gobject gtk3` - Archlinux: * `pacman -S python-gobject gtk3` * https://aur.archlinux.org/packages/pyglossary/ - Mac OS X: `brew install pygobject3 gtk+3` - Nix / NixOS: `nix-shell -p gnome3.gobjectIntrospection python37Packages.pygobject3 python37Packages.pycairo` - **Tkinter-based interface**, works in the lack of Gtk. Specially on Windows where Tkinter library is installed with the Python itself. You can also install it on: - Debian/Ubuntu: `apt-get install python3-tk tix` - openSUSE: `zypper install python3-tk tix` - Fedora: `yum install python3-tkinter tix` - Mac OS X: read - Nix / NixOS: `nix-shell -p python37Packages.tkinter tix` - **Command-line interface**, works in all operating systems without any specific requirements, just type: `python3 pyglossary.pyw --help` You may have to give `--no-progress-bar` option in Windows when converting glossaries (because the progress bar does not work properly in Windows command window) When you run the program without any command line arguments or options, PyGlossary tries to find PyGI, if it's installed, opens the Gtk3-based interface, if it's not, tries to find Tkinter and open the Tkinter-based interface. And exits with an error if neither are installed. 
But you can explicitly choose the user interface type using `--ui`, for example:

	python3 pyglossary.pyw --ui=gtk

Or

	python3 pyglossary.pyw --ui=tk

Format-specific Requirements
----------------------------

- **Reading from XDXF**

	`sudo pip3 install lxml`

- **Writing to AppleDict**

	`sudo pip3 install lxml beautifulsoup4 html5lib`

- **Reading from Babylon BGL**: Python 3.4 to 3.7 is recommended

- **Reading from CC-CEDICT**

	`sudo pip3 install jinja2`

- **Reading from Octopus Mdict (MDX)**
	+ **python-lzo**, required for **some** MDX glossaries
		- First try converting your MDX file; if it fails (probably with an `AssertionError`), you may need to install the LZO library and its Python binding:
		- **On Linux**, make sure `liblzo2-dev` or `liblzo2-devel` is installed and then run `sudo pip3 install python-lzo`
		- **On Windows**:
			+ Open this page: https://www.lfd.uci.edu/~gohlke/pythonlibs/#python-lzo
			+ If you are using Python 3.7 (32 bit) for example, click on `python_lzo‑1.12‑cp37‑cp37m‑win32.whl`
			+ Open Start -> type Command -> right-click on Command Prompt -> Run as administrator
			+ Run the command `pip install C:\....\python_lzo‑1.12‑cp37‑cp37m‑win32.whl`, giving the path of the downloaded file

**Other Requirements for Mac OS X**

If you want to convert glossaries into AppleDict format on Mac OS X, you also need:

- GNU make as part of [Command Line Tools for Xcode](http://developer.apple.com/downloads).
- Dictionary Development Kit as part of [Additional Tools for Xcode](http://developer.apple.com/downloads). Extract to `/Developer/Extras/Dictionary Development Kit`

HOWTOs
------

### Convert Babylon (bgl) to Mac OS X dictionary

Let's assume the Babylon dict is at `~/Documents/Duden_Synonym/Duden_Synonym.BGL`:

	cd ~/Documents/Duden_Synonym/
	python3 ~/Software/pyglossary/pyglossary.pyw --write-format=AppleDict Duden_Synonym.BGL Duden_Synonym-apple
	cd Duden_Synonym-apple
	make
	make install

Launch Dictionary.app and test.
### Convert Octopus Mdict to Mac OS X dictionary

Let's assume the MDict dict is at `~/Documents/Duden-Oxford/Duden-Oxford DEED ver.20110408.mdx`.

Run the following commands:

	cd ~/Documents/Duden-Oxford/
	python3 ~/Software/pyglossary/pyglossary.pyw --write-format=AppleDict "Duden-Oxford DEED ver.20110408.mdx" "Duden-Oxford DEED ver.20110408-apple"
	cd "Duden-Oxford DEED ver.20110408-apple"
	make
	make install

Launch Dictionary.app and test.

Let's assume the MDict dict is at `~/Downloads/oald8/oald8.mdx`, along with the image/audio resources file `oald8.mdd`.

Run the following commands:

	cd ~/Downloads/oald8/
	python3 ~/Software/pyglossary/pyglossary.pyw --write-format=AppleDict oald8.mdx oald8-apple
	cd oald8-apple

This extracts the dictionary into `oald8.xml` and the data resources into the folder `OtherResources`. Hyperlinks should use relative paths:

	sed -i "" 's:src="/:src=":g' oald8.xml

Convert the audio files from SPX format to WAV format. You need the package `speex` from [MacPorts](https://www.macports.org):

	find OtherResources -name "*.spx" -execdir sh -c 'spx={};speexdec $spx ${spx%.*}.wav' \;
	sed -i "" 's|sound://\([/_a-zA-Z0-9]*\).spx|\1.wav|g' oald8.xml

But be warned that the decoded WAVE audio can consume ~5 times more disk space!

Compile and install:

	make
	make install

Launch Dictionary.app and test.
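The `sed` invocation in the HOWTO above that rewrites `src="/` to `src="` simply turns absolute resource paths into relative ones so they resolve against `OtherResources`. For readers not comfortable with `sed`, the same rewrite could be done in Python; this is an illustrative sketch of that one transformation, not part of PyGlossary:

```python
def make_paths_relative(xml_text):
    # Equivalent of: sed 's:src="/:src=":g'
    # Drop the leading slash from every src attribute value.
    return xml_text.replace('src="/', 'src="')
```

You would read `oald8.xml`, pass its contents through `make_paths_relative`, and write the result back.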
Platform: UNKNOWN Description-Content-Type: text/markdown pyglossary-3.2.1/pyglossary.egg-info/SOURCES.txt0000644000175000017500000001017713577304644022024 0ustar emfoxemfox00000000000000AUTHORS README.md about config.json help license-dialog license.txt pyglossary.desktop pyglossary.pyw setup.py doc/Babylon/BGL.svg doc/DSL/README.rst doc/Octopus MDict/MDD.svg doc/Octopus MDict/MDX.svg doc/Octopus MDict/README.rst doc/non-gui_examples/any_to_txt.py doc/non-gui_examples/oxford.py pyglossary/__init__.py pyglossary/arabic_utils.py pyglossary/core.py pyglossary/entry.py pyglossary/entry_filters.py pyglossary/exir.py pyglossary/file_utils.py pyglossary/flags.py pyglossary/glossary.py pyglossary/gregorian.py pyglossary/html_utils.py pyglossary/json_utils.py pyglossary/math_utils.py pyglossary/os_utils.py pyglossary/persian_utils.py pyglossary/sort_stream.py pyglossary/text_reader.py pyglossary/text_utils.py pyglossary/text_utils_extra.py pyglossary/xml_utils.py pyglossary.egg-info/PKG-INFO pyglossary.egg-info/SOURCES.txt pyglossary.egg-info/dependency_links.txt pyglossary.egg-info/top_level.txt pyglossary/plugin_lib/__init__.py pyglossary/plugin_lib/pureSalsa20.py pyglossary/plugin_lib/readmdict.py pyglossary/plugin_lib/ripemd128.py pyglossary/plugin_lib/py34/__init__.py pyglossary/plugin_lib/py34/gzip_no_crc.py pyglossary/plugin_lib/py35/__init__.py pyglossary/plugin_lib/py35/gzip_no_crc.py pyglossary/plugin_lib/py36/__init__.py pyglossary/plugin_lib/py36/gzip_no_crc.py pyglossary/plugin_lib/py37/gzip_no_crc.py pyglossary/plugins/babylon_bdc.py pyglossary/plugins/babylon_source.py pyglossary/plugins/csv_pyg.py pyglossary/plugins/dicformids.py pyglossary/plugins/dict_org.py pyglossary/plugins/edlin.py pyglossary/plugins/formats_common.py pyglossary/plugins/freedict.py pyglossary/plugins/gettext_mo.py pyglossary/plugins/gettext_po.py pyglossary/plugins/lingoes_ldf.py pyglossary/plugins/octopus_mdict.py pyglossary/plugins/octopus_mdict_source.py 
pyglossary/plugins/omnidic.py pyglossary/plugins/paths.py pyglossary/plugins/sdict.py pyglossary/plugins/sdict_source.py pyglossary/plugins/sql.py pyglossary/plugins/stardict.py pyglossary/plugins/stardict_tests.py pyglossary/plugins/tabfile.py pyglossary/plugins/testformat.py pyglossary/plugins/treedict.py pyglossary/plugins/wikipedia_dump.py pyglossary/plugins/appledict/__init__.py pyglossary/plugins/appledict/_dict.py pyglossary/plugins/appledict/_normalize.py pyglossary/plugins/appledict/indexes/__init__.py pyglossary/plugins/appledict/indexes/ru.py pyglossary/plugins/appledict/indexes/zh.py pyglossary/plugins/appledict/jing/__init__.py pyglossary/plugins/appledict/jing/__main__.py pyglossary/plugins/appledict/jing/main.py pyglossary/plugins/appledict/jing/DictionarySchema/AppleDictionarySchema.rng pyglossary/plugins/appledict/jing/DictionarySchema/modules/dict-struct.rng pyglossary/plugins/appledict/jing/jing/readme.html pyglossary/plugins/appledict/templates/Dictionary.css pyglossary/plugins/appledict/templates/Info.plist pyglossary/plugins/appledict/templates/Makefile pyglossary/plugins/babylon_bgl/__init__.py pyglossary/plugins/babylon_bgl/bgl_charset.py pyglossary/plugins/babylon_bgl/bgl_info.py pyglossary/plugins/babylon_bgl/bgl_language.py pyglossary/plugins/babylon_bgl/bgl_pos.py pyglossary/plugins/babylon_bgl/bgl_reader.py pyglossary/plugins/babylon_bgl/bgl_reader_debug.py pyglossary/plugins/babylon_bgl/bgl_text.py pyglossary/plugins/babylon_bgl/gzip_no_crc.patch pyglossary/plugins/cc_cedict/.gitignore pyglossary/plugins/cc_cedict/__init__.py pyglossary/plugins/cc_cedict/article.html pyglossary/plugins/cc_cedict/conv.py pyglossary/plugins/cc_cedict/jinja2htmlcompress.py pyglossary/plugins/cc_cedict/pinyin.py pyglossary/plugins/cc_cedict/summarize.py pyglossary/plugins/dsl/__init__.py pyglossary/plugins/dsl/flawless_dsl/__init__.py pyglossary/plugins/dsl/flawless_dsl/layer.py pyglossary/plugins/dsl/flawless_dsl/main.py 
pyglossary/plugins/dsl/flawless_dsl/tag.py pyglossary/plugins/dsl/flawless_dsl/tests.py pyglossary/plugins/xdxf/__init__.py pyglossary/plugins/xdxf/xdxf.xsl res/mimapps.list res/pyglossary.ico res/pyglossary.png res/pyglossary.xbm res/resize-16.png res/resize.png ui/__init__.py ui/base.py ui/paths.py ui/progressbar.py ui/ui_cmd.py ui/ui_gtk.py ui/ui_qt.py ui/ui_tk.py ui/gtk3_utils/__init__.py ui/gtk3_utils/dialog.py ui/gtk3_utils/resize_button.py ui/gtk3_utils/utils.pypyglossary-3.2.1/pyglossary.egg-info/dependency_links.txt0000644000175000017500000000000113577304644024200 0ustar emfoxemfox00000000000000 pyglossary-3.2.1/pyglossary.egg-info/top_level.txt0000644000175000017500000000001313577304644022656 0ustar emfoxemfox00000000000000pyglossary pyglossary-3.2.1/pyglossary.pyw0000755000175000017500000002012613577304507017203 0ustar emfoxemfox00000000000000#!/usr/bin/env python3 # -*- coding: utf-8 -*- # ui_main.py # # Copyright © 2008-2010 Saeed Rasooli (ilius) # This file is part of PyGlossary project, https://github.com/ilius/pyglossary # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 3, or (at your option) # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program. Or on Debian systems, from /usr/share/common-licenses/GPL # If not, see . 
import os
import sys
import argparse
import builtins
from os.path import dirname, join, realpath
from pprint import pformat
import logging

from pyglossary import core  # essential
from pyglossary import VERSION
from pyglossary.text_utils import startRed, endFormat

# The first thing to do is to set up the logger.
# Other modules are also using the logger 'root', so it's essential to set it
# up prior to importing anything else; with the exception of pyglossary.core,
# which sets up the logger class, and so should be imported before actually
# initializing the logger.
# The verbosity level may be given on the command line, so we have to parse
# arguments before setting up the logger.
# Once more:
# - import system modules like os, sys, argparse etc. and pyglossary.core
# - parse args
# - set up logger
# - import submodules
# - other code

# no-progress-bar only for command line UI
# FIXME: load ui-dependent available options from ui modules
# (for example ui_cmd.available_options)
# the only problem is that it has to "import gtk" before it gets the
# "ui_gtk.available_options"

# FIXME
# -v (verbose or version?)
# -r (reverse or read-options)

parser = argparse.ArgumentParser(add_help=False)

parser.add_argument(
	'-v', '--verbosity',
	action='store',
	dest='verbosity',
	type=int,
	choices=(0, 1, 2, 3, 4),
	required=False,
	default=3,  # FIXME
)
parser.add_argument(
	'--version',
	action='version',
	version='PyGlossary %s' % VERSION,
)
parser.add_argument(
	'-h', '--help',
	dest='help',
	action='store_true',
)
parser.add_argument(
	'-u', '--ui',
	dest='ui_type',
	default='auto',
	choices=(
		'cmd',
		'gtk',
		'tk',
		# 'qt',
		'auto',
		'none',
	),
)
parser.add_argument(
	'-r', '--read-options',
	dest='readOptions',
	default='',
)
parser.add_argument(
	'-w', '--write-options',
	dest='writeOptions',
	default='',
)
parser.add_argument(
	'--read-format',
	dest='inputFormat',
)
parser.add_argument(
	'--write-format',
	dest='outputFormat',
	action='store',
)
parser.add_argument(
	'--direct',
	dest='direct',
	action='store_true',
	default=None,
	help='if possible, convert directly without loading into memory',
)
parser.add_argument(
	'--indirect',
	dest='direct',
	action='store_false',
	default=None,
	help='disable `direct` mode, load full data into memory before writing'
	', this is default',
)
parser.add_argument(
	'--reverse',
	dest='reverse',
	action='store_true',
)
parser.add_argument(
	'--no-progress-bar',
	dest='progressbar',
	action='store_false',
	default=None,
)
parser.add_argument(
	'--sort',
	dest='sort',
	action='store_true',
	default=None,
)
parser.add_argument(
	'--no-sort',
	dest='sort',
	action='store_false',
	default=None,
)
parser.add_argument(
	'--sort-cache-size',
	dest='sortCacheSize',
	type=int,
	default=None,
)
parser.add_argument(
	'--utf8-check',
	dest='utf8Check',
	action='store_true',
	default=None,
)
parser.add_argument(
	'--no-utf8-check',
	dest='utf8Check',
	action='store_false',
	default=None,
)
parser.add_argument(
	'--lower',
	dest='lower',
	action='store_true',
	default=None,
	help='lowercase words before writing',
)
parser.add_argument(
	'--no-lower',
	dest='lower',
	action='store_false',
	default=None,
	help='don\'t lowercase words before writing',
)
parser.add_argument(
	'--skip-resources',
	dest='skipResources',
	action='store_true',
	default=None,
	help='skip resources (images, audio, etc)',
)
parser.add_argument(
	'--no-color',
	dest='noColor',
	action='store_true',
)
parser.add_argument(
	'inputFilename',
	action='store',
	default='',
	nargs='?',
)
parser.add_argument(
	'outputFilename',
	action='store',
	default='',
	nargs='?',
)

args = parser.parse_args()

log = logging.getLogger('root')
log.setVerbosity(args.verbosity)
log.addHandler(
	core.StdLogHandler(noColor=args.noColor),
)
# with the logger set up, we can import other pyglossary modules, so they
# can do their logging in the right way

core.checkCreateConfDir()

##############################

from pyglossary.glossary import Glossary
from ui.ui_cmd import COMMAND, help, parseFormatOptionsStr

##############################


def dashToCamelCase(text):
	# converts "hello-PYTHON-user" to "helloPythonUser"
	parts = text.split('-')
	parts[0] = parts[0].lower()
	for i in range(1, len(parts)):
		parts[i] = parts[i].capitalize()
	return ''.join(parts)


ui_list = (
	'gtk',
	'tk',
	'qt',
)

# log.info('PyGlossary %s' % VERSION)

if args.help:
	help()
	sys.exit(0)

if os.sep != '/':
	args.noColor = True  # only used in ui_cmd for now

readOptions = parseFormatOptionsStr(args.readOptions)
writeOptions = parseFormatOptionsStr(args.writeOptions)

"""
examples for read and write options:
--read-options testOption=stringValue
--read-options enableFoo=True
--read-options fooList=[1,2,3]
--read-options 'fooList=[1, 2, 3]'
--read-options 'testOption=stringValue; enableFoo=True; fooList=[1, 2, 3]'
--read-options 'testOption=stringValue;enableFoo=True;fooList=[1,2,3]'
"""

# FIXME
prefOptionsKeys = (
	# 'verbosity',
	'utf8Check',
	'lower',
	'skipResources',
)

convertOptionsKeys = (
	'direct',
	'progressbar',
	'sort',
	'sortCacheSize',
	# 'sortKey',  # or sortAlg FIXME
)

prefOptions = {}
for param in prefOptionsKeys:
	value = getattr(args, param, None)
	if value is not None:
		prefOptions[param] = value
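The option strings accepted by `--read-options` / `--write-options` (parsed by `parseFormatOptionsStr` from `ui.ui_cmd`) follow the `key=value` pairs separated by `;` shown in the docstring above. A minimal, hypothetical sketch of such a parser (not the actual `ui_cmd` implementation) could look like this:

```python
import ast


def parse_format_options(text):
	# Split "key=value; key2=value2" into a dict; values that look like
	# Python literals (True, [1, 2, 3], ...) are evaluated, anything
	# else is kept as a plain string.
	options = {}
	for pair in text.split(';'):
		pair = pair.strip()
		if not pair:
			continue
		key, _, value = pair.partition('=')
		try:
			value = ast.literal_eval(value)
		except (ValueError, SyntaxError):
			pass  # keep it as a plain string
		options[key.strip()] = value
	return options
```

For example, `parse_format_options('enableFoo=True; fooList=[1, 2, 3]')` yields a dict with a boolean and a list, matching the docstring examples above.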
convertOptions = {}
for param in convertOptionsKeys:
	value = getattr(args, param, None)
	if value is not None:
		convertOptions[param] = value

log.pretty(prefOptions, 'prefOptions = ')
log.pretty(readOptions, 'readOptions = ')
log.pretty(writeOptions, 'writeOptions = ')
log.pretty(convertOptions, 'convertOptions = ')

"""
ui_type: User interface type
Possible values:
	cmd - Command line interface, this UI will be selected automatically
		if you give both input and output files
	gtk - GTK interface
	tk - Tkinter interface
	qt - Qt interface
	auto - Use the first available UI
"""
ui_type = args.ui_type

if args.inputFilename:
	if args.outputFilename and ui_type != 'none':
		ui_type = 'cmd'  # silently? FIXME
else:
	if ui_type == 'cmd':
		log.error('no input file given, try --help')
		exit(1)

if ui_type == 'none':
	if args.reverse:
		log.error('--reverse does not work with --ui=none')
		sys.exit(1)
	glos = Glossary()
	glos.convert(
		args.inputFilename,
		inputFormat=args.inputFormat,
		outputFilename=args.outputFilename,
		outputFormat=args.outputFormat,
		readOptions=readOptions,
		writeOptions=writeOptions,
		**convertOptions
	)
	sys.exit(0)
elif ui_type == 'cmd':
	from ui import ui_cmd
	sys.exit(0 if ui_cmd.UI().run(
		args.inputFilename,
		outputFilename=args.outputFilename,
		inputFormat=args.inputFormat,
		outputFormat=args.outputFormat,
		reverse=args.reverse,
		prefOptions=prefOptions,
		readOptions=readOptions,
		writeOptions=writeOptions,
		convertOptions=convertOptions,
	) else 1)

if ui_type == 'auto':
	ui_module = None
	for ui_type2 in ui_list:
		try:
			ui_module = getattr(
				__import__('ui.ui_%s' % ui_type2),
				'ui_%s' % ui_type2,
			)
		except ImportError:
			log.exception('error while importing UI module:')  # FIXME
		else:
			break
	if ui_module is None:
		log.error(
			'no user interface module found! '
			'try "%s -h" to see command line usage' % sys.argv[0]
		)
		sys.exit(1)
else:
	ui_module = getattr(
		__import__('ui.ui_%s' % ui_type),
		'ui_%s' % ui_type,
	)

sys.exit(0 if ui_module.UI(**prefOptions).run(
	editPath=args.inputFilename,
	readOptions=readOptions,
) else 1)
pyglossary-3.2.1/res/0000755000175000017500000000000013577304644015015 5ustar emfoxemfox00000000000000pyglossary-3.2.1/res/mimapps.list0000755000175000017500000000021713575553425017364 0ustar emfoxemfox00000000000000application/x-extension-bgl=pyglossary.desktop;
application/x-sqlite3=sqlitebrowser.desktop;pyglossary.desktop;
text/plain=pyglossary.desktop;
pyglossary-3.2.1/res/pyglossary.ico0000644000175000017500000002267613575553425017735 0ustar emfoxemfox00000000000000[binary icon data omitted]
+MmmUTFFHJKKKKLLVfOPPPPPPQQQ"ɖ```AAA(((h-~z,i [DFFFFGJKKKSjPLNNPPPPPQR~~~|ycvQiAb0Z!QIEFFGIQmTKKLMMNOPPP[* 555VVV{{{srYgAa2]%UL RrWJKKKKLMMMOPV5a_NuscxyfjJj1g&YP IJKKKKLLLM[MDWNA` P@gJOX'VTNKKKLMf=6I?6MF9XF9[E6[+E5_YH3gM7m^Jzue}kjL\0Q~RQS<6(L;,S?.X8?+\jC.cO9obN~n8[0 Z=Y )6(K 6'N"6$TI5 U}?)`Q:rjUz#R'\[&57(K1I/-M[.Q<%^?&d,Z0[(  "$<0!G0>?9????pyglossary-3.2.1/res/pyglossary.png0000755000175000017500000000672413575553425017754 0ustar emfoxemfox00000000000000PNG  IHDR00WsRGBbKGD pHYs B(xtIME $4)C TIDATh͙]lysff\KHвl'hrkM M4($ F/\MAQM: rQ4i"umװ\GeK(Q(KrgbgDIܝ}{|g'`*7:<4 ֫W`ރxv4ь4|QFI)o~cݟs{OOY)#O fN=T;=5`hȜ_ ? c,fF g}ci?-Lؗ:tax@04fA\E Nq'?7K?!'G3䥵Yw2 3=Q;w|q QRָͅ..S[.S/'~!5hTZȣi?lxa83{I2(OEFՍ5V:4J-C~=B#?*ApldxS{{䋅\afWbhl/6C5,L}DMr]ɏ)@%c<7\.BdX<4;;SL'9:닋lQ^"\ݨoG4l!tjxBP31^ yR(TޢriJd z3DܤiL ²ahO|3eJH3cJ#")}}8F{mUk"# }Os ֗0%CfH՛j!6ڹCe\dw#~D}OE[~{H2I `|2 >G(<o ` :m4BM[lbCEE cƮmpwY&уrHDr^Ǿi?|8ƿ㻜߿I/nq5u/+FD4Q Lq{xǾ}tdqdQɥytF^,[kZwtiٲp~Wdiɤ- 0ƠP팊XB~ro U 4])%qS(x_~LY,8ιt:!Pb: ٞGPݓWߢgٱD*>Qu&'3aǴgdASH#ͰW6x??Zjp\p209Փ9h"fasl"d\ {4s "qthZ*˜SиI $7GK-\!vZKuؑVi o85T,㭆xg -xT苚^%$L7[ƘucLa``C@V;{18FR~W:MN]!0c0 z\Ƣh8FW-sREzZkiuм_y鯥8F -m s"\mkqwUsVwM&"'ZGCvY_Y-!n ðu+U:B[jd]}ֶb\g(Xi;9\[ ]Q8Nѯu;E;u&D\xʷq@[B90e ~J$}>NdoZg;#f)㋧PMZl=^c6`{fzTG28ѕ [x,q/UY/nL90bnfn}d࠾[y;z=m41W8>400Y4 u1ZbxS<7u>"Sc1bƏ*HF(:_z+& vg=_~V)1vQ7M5^̐>jTm%q= R/cB*#+COa$qfZfIE0 4j$;01e}f"K`ڢ̠lz})Nk[ ֞fYUֶh^ڠM'olKvܟ#~S=2'P (`h|7N1"i4\5Ɯ3|kT׈6clh=8ϨAOp*z5 @8(:l$`zM.F"+8mHG;@Ԋ$N8ە.=3_F[Y$oO#nvgQbwUK^?\QYIENDB`pyglossary-3.2.1/res/pyglossary.xbm0000755000175000017500000003433313575553425017753 0ustar emfoxemfox00000000000000#define pyglossary_width 129 #define pyglossary_height 136 static unsigned char pyglossary_bits[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x01, 0x77, 0xef, 0xde, 0xbd, 0x7b, 0xf7, 0xee, 0xdd, 0xbb, 0x77, 0x6f, 0xa1, 0x77, 0xef, 0xde, 0xbd, 0x01, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xbd, 0x00, 0xff, 0xff, 0xff, 0xff, 0x00, 0xff, 
0xfe, 0xfd, 0xfb, 0xf7, 0xef, 0xdf, 0xbf, 0x7f, 0x7f, 0x03, 0x00, 0xff, 0xfe, 0xfd, 0xfb, 0x01, 0xef, 0xdf, 0xbf, 0x7f, 0xff, 0xfe, 0xfd, 0xfb, 0xf7, 0x15, 0x00, 0x00, 0xef, 0xdf, 0xbf, 0xff, 0x01, 0xfe, 0xfd, 0xfb, 0xf7, 0xef, 0xdf, 0xbf, 0x7f, 0xbf, 0x02, 0xc0, 0x07, 0xfe, 0xfd, 0xfb, 0x77, 0x01, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x0b, 0x00, 0xfe, 0x07, 0xfe, 0xff, 0xff, 0xff, 0x01, 0xdb, 0xbb, 0x77, 0xef, 0xde, 0xbd, 0x7b, 0x57, 0x00, 0xf0, 0xff, 0x0f, 0xee, 0xbb, 0xb7, 0xff, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x05, 0x80, 0xff, 0xff, 0x0f, 0xbe, 0xff, 0xff, 0xee, 0x01, 0x7f, 0x7f, 0xff, 0xfe, 0xfd, 0xfb, 0x2b, 0x00, 0xfc, 0xff, 0xff, 0x0f, 0xfc, 0x7f, 0xff, 0xff, 0x01, 0xff, 0xf7, 0xef, 0xdf, 0xbf, 0x6f, 0x05, 0xe0, 0xff, 0xff, 0xff, 0x00, 0xf8, 0xf7, 0xff, 0xfd, 0x00, 0xed, 0xff, 0xfd, 0xfb, 0xf7, 0x17, 0x00, 0xff, 0xff, 0xff, 0x07, 0x00, 0xb8, 0xfd, 0xed, 0xef, 0x01, 0xff, 0xde, 0xff, 0xff, 0xbf, 0x00, 0xf0, 0xff, 0xff, 0x7f, 0x00, 0x00, 0xf8, 0xff, 0xff, 0xff, 0x01, 0xff, 0xff, 0x6f, 0xdf, 0x0a, 0xa8, 0xff, 0xff, 0xff, 0x03, 0x00, 0x00, 0xf8, 0xef, 0xbf, 0xbf, 0x01, 0xee, 0xfd, 0xfe, 0x7b, 0x81, 0xfa, 0xff, 0xff, 0x3f, 0x00, 0x00, 0x00, 0xd0, 0xdf, 0xfd, 0xfb, 0x00, 0xff, 0xff, 0xff, 0x06, 0xe8, 0xff, 0xff, 0xff, 0x01, 0x00, 0x00, 0x00, 0xf8, 0xfe, 0xff, 0xfe, 0x01, 0xdf, 0xdb, 0x97, 0x00, 0xff, 0xff, 0xff, 0x0f, 0x00, 0x00, 0x00, 0x00, 0xf0, 0xff, 0xef, 0xff, 0x01, 0xff, 0x7f, 0x01, 0xfa, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x6f, 0xff, 0x77, 0x01, 0xfd, 0x17, 0xa0, 0xff, 0xff, 0xff, 0x07, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0xfe, 0xfd, 0xff, 0x01, 0x2f, 0x00, 0xea, 0xff, 0xff, 0x3f, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xd0, 0xff, 0x6f, 0xff, 0x00, 0x1f, 0xc0, 0xff, 0xff, 0xff, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0xff, 0xff, 0xef, 0x01, 0x06, 0xfd, 0xff, 0xff, 0x1f, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe0, 0xb6, 0xff, 0xfe, 0x01, 0xc3, 0xff, 0xff, 0xff, 0x01, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 
0x00, 0xf0, 0xff, 0xf7, 0xff, 0x00, 0xe3, 0xff, 0xff, 0x07, 0x28, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe0, 0x7f, 0xff, 0xed, 0x01, 0xe3, 0xff, 0xff, 0x20, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0xfb, 0xdf, 0xff, 0x01, 0xe3, 0xff, 0x07, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xa0, 0xff, 0xfe, 0xdf, 0x01, 0xc3, 0x7f, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe0, 0xef, 0xbf, 0xff, 0x01, 0x87, 0x10, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe0, 0xfe, 0xfd, 0xfd, 0x01, 0x03, 0x82, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe0, 0xff, 0xff, 0xbf, 0x01, 0x07, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc0, 0xf7, 0xfb, 0xfb, 0x01, 0x47, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe0, 0xbf, 0xbf, 0xff, 0x01, 0x07, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc0, 0xfe, 0xf7, 0xf7, 0x01, 0x07, 0x50, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc0, 0xf7, 0xff, 0xbf, 0x01, 0x87, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc0, 0xbf, 0xbf, 0xff, 0x01, 0x07, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc0, 0xff, 0xfb, 0xfb, 0x01, 0x0f, 0x12, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0xfb, 0x7e, 0xbf, 0x01, 0x87, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc0, 0xff, 0xff, 0xff, 0x01, 0x0f, 0x04, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0xef, 0xf7, 0xfb, 0x01, 0x0e, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0xff, 0xff, 0xdf, 0x01, 0x07, 0x08, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0xbd, 0x6f, 0xff, 0x01, 0x0f, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0x7f, 0xff, 0xff, 0x01, 0x0f, 0x88, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0xff, 0xff, 0xdd, 0x00, 0x0f, 0x22, 0x00, 0x00, 
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0xfb, 0xee, 0xff, 0x01, 0x1f, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0x01, 0x0f, 0x42, 0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0xef, 0xbf, 0xed, 0x00, 0x1f, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xfe, 0xff, 0x01, 0x1f, 0x44, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0xfe, 0xfb, 0xff, 0x01, 0x1f, 0x00, 0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf7, 0xdf, 0xb7, 0x01, 0x1f, 0x12, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0x01, 0x00, 0x00, 0x00, 0xdf, 0x7f, 0xff, 0x00, 0x3e, 0x40, 0x08, 0x00, 0x00, 0x00, 0x00, 0xfc, 0xff, 0x0f, 0x00, 0x00, 0x00, 0xff, 0xfe, 0xff, 0x01, 0x3f, 0x00, 0x01, 0x00, 0x00, 0x00, 0xc0, 0xff, 0xff, 0x3f, 0x00, 0x00, 0x00, 0xfe, 0xff, 0xfe, 0x01, 0x3f, 0x44, 0x02, 0x00, 0x00, 0x00, 0xfe, 0x5f, 0xc0, 0x7f, 0x21, 0x00, 0x00, 0xf7, 0xfd, 0x77, 0x01, 0x3b, 0x10, 0x08, 0x00, 0x00, 0xf0, 0x7f, 0x01, 0x00, 0xa5, 0xde, 0x02, 0x00, 0xfe, 0xef, 0xff, 0x01, 0x3f, 0x80, 0x00, 0x00, 0x00, 0xfc, 0x3f, 0x00, 0xc0, 0xda, 0x52, 0xad, 0x00, 0xde, 0xff, 0xef, 0x00, 0x3f, 0x10, 0x02, 0x00, 0x00, 0x7c, 0x3f, 0x00, 0xb0, 0x56, 0x55, 0x55, 0x05, 0xfe, 0xde, 0xfe, 0x01, 0x7f, 0x84, 0x08, 0x00, 0x00, 0x00, 0x7e, 0x00, 0xc0, 0xb5, 0xaa, 0xaa, 0x02, 0xfe, 0xff, 0xff, 0x01, 0x7b, 0x20, 0x00, 0x00, 0x00, 0x00, 0x3e, 0x00, 0x58, 0xd0, 0x56, 0x55, 0x2d, 0xf6, 0xfd, 0xef, 0x00, 0x3f, 0x08, 0x01, 0x00, 0x00, 0x00, 0x7e, 0x00, 0x54, 0xa0, 0xaa, 0x0a, 0x52, 0xfe, 0xdf, 0xfd, 0x01, 0x7f, 0x20, 0x10, 0x00, 0x00, 0x00, 0x7e, 0x00, 0x0a, 0x40, 0x55, 0x55, 0x95, 0xde, 0xff, 0xff, 0x01, 0x7b, 0x88, 0x04, 0x00, 0x00, 0x00, 0x7e, 0x00, 0x14, 0x80, 0xaa, 0xaa, 0x42, 0xfe, 0xfe, 0xef, 0x00, 0x7f, 0x00, 0x00, 0x00, 0x00, 0x00, 0x7e, 0x00, 0x14, 0x80, 0xaa, 0xaa, 0x94, 0xfc, 0x6f, 0xff, 0x01, 0xff, 0x20, 0x08, 0x00, 0x00, 0x00, 0xfc, 0x00, 0x2c, 0x40, 0x55, 0x92, 0x52, 0xde, 
0xfb, 0xdf, 0x01, 0xfb, 0x00, 0x21, 0x00, 0x00, 0x00, 0x7e, 0x00, 0x0a, 0x40, 0x55, 0x55, 0xa5, 0xfc, 0xff, 0xfd, 0x00, 0xff, 0x40, 0x04, 0x00, 0x00, 0x00, 0xfc, 0x00, 0x14, 0xa0, 0xaa, 0x2a, 0x95, 0xfe, 0xff, 0xff, 0x01, 0xf7, 0x10, 0x10, 0x00, 0x00, 0x00, 0x7c, 0x00, 0x74, 0x50, 0x55, 0xa5, 0x24, 0x74, 0xbf, 0xff, 0x01, 0xff, 0x81, 0x00, 0x00, 0x00, 0x00, 0xfc, 0x00, 0x4c, 0xab, 0xaa, 0x94, 0x4a, 0xfc, 0xfb, 0xed, 0x00, 0xef, 0x21, 0x24, 0x00, 0x00, 0x00, 0xfc, 0x00, 0x58, 0xad, 0xaa, 0x2a, 0xa9, 0xfc, 0xff, 0xff, 0x01, 0x7f, 0x01, 0x02, 0x00, 0x00, 0x00, 0xf8, 0x00, 0x54, 0x55, 0xa5, 0x50, 0x12, 0xec, 0xf7, 0xff, 0x01, 0xff, 0x81, 0x50, 0x00, 0x00, 0x00, 0xfc, 0x00, 0x54, 0x55, 0x55, 0x0a, 0x55, 0x70, 0xdf, 0xdd, 0x00, 0xfe, 0x23, 0x04, 0x00, 0x00, 0x00, 0xf8, 0x01, 0x00, 0x00, 0xbc, 0x52, 0x22, 0xa0, 0xfd, 0xff, 0x01, 0xf7, 0x81, 0x10, 0x00, 0x00, 0x00, 0xf8, 0x00, 0x00, 0x00, 0xb4, 0x48, 0x4a, 0x00, 0xc0, 0xff, 0x01, 0xff, 0x03, 0x82, 0x00, 0x00, 0x00, 0xf8, 0x01, 0x00, 0x00, 0x5c, 0xaa, 0x28, 0x00, 0x00, 0xba, 0x01, 0xbe, 0x03, 0x20, 0x00, 0x00, 0x00, 0xf8, 0x01, 0x00, 0x00, 0xaa, 0x92, 0x42, 0xf0, 0x1f, 0xfc, 0x00, 0xf7, 0x81, 0x08, 0x00, 0x00, 0x00, 0x50, 0xaa, 0x2a, 0x55, 0x95, 0x24, 0x15, 0xf0, 0xff, 0xf0, 0x01, 0xff, 0x03, 0x82, 0x00, 0x00, 0x00, 0x98, 0xdb, 0xaa, 0xaa, 0x2a, 0x55, 0x24, 0xf0, 0xff, 0xf1, 0x01, 0xff, 0x47, 0x21, 0x00, 0x00, 0x00, 0x54, 0x56, 0x55, 0x55, 0xa1, 0x24, 0x49, 0xf0, 0xff, 0x63, 0x01, 0xef, 0x03, 0x04, 0x00, 0x00, 0x00, 0xcb, 0x58, 0x55, 0x95, 0x4a, 0x49, 0x12, 0xf8, 0xff, 0xe3, 0x01, 0xff, 0x07, 0x01, 0x01, 0x00, 0x00, 0xb6, 0xaa, 0xaa, 0x52, 0x29, 0x95, 0x24, 0xf0, 0xff, 0xc7, 0x00, 0xbd, 0x47, 0x28, 0x00, 0x00, 0x80, 0x55, 0xd6, 0x88, 0xa8, 0x52, 0x22, 0x49, 0xf8, 0xff, 0xc7, 0x01, 0xff, 0x07, 0x01, 0x00, 0x00, 0x00, 0xa8, 0x2a, 0x22, 0x0a, 0xa5, 0x54, 0x12, 0xf0, 0xff, 0x8f, 0x01, 0x7b, 0x07, 0x94, 0x00, 0x00, 0x40, 0xb1, 0xad, 0xaa, 0xa2, 0x4a, 0x45, 0x52, 0xf8, 0xff, 0x8f, 0x00, 0xff, 0x0f, 0x01, 0x00, 0x00, 0x80, 0x56, 
0x55, 0x55, 0x29, 0x95, 0x28, 0x09, 0xf0, 0xff, 0x9f, 0x01, 0xf7, 0x07, 0x92, 0x00, 0x00, 0x40, 0xb1, 0xaa, 0xaa, 0x52, 0xa9, 0x4a, 0x52, 0xf8, 0xff, 0x9f, 0x01, 0xff, 0x8e, 0x00, 0x00, 0x00, 0xa0, 0xad, 0x55, 0x95, 0x4a, 0x12, 0x90, 0x08, 0xf8, 0xff, 0x1f, 0x01, 0xff, 0x0f, 0x2a, 0x01, 0x00, 0x60, 0x55, 0x55, 0x55, 0x15, 0xa5, 0x25, 0x29, 0xf8, 0xff, 0x9f, 0x00, 0xf7, 0x07, 0x00, 0x00, 0x00, 0xc0, 0x5a, 0xad, 0xaa, 0x4a, 0x2a, 0x92, 0x22, 0xf8, 0xff, 0x3f, 0x01, 0xbf, 0x1f, 0x4a, 0x02, 0x00, 0x20, 0x16, 0x50, 0x55, 0x90, 0xa4, 0x24, 0x0a, 0xfc, 0xff, 0x1f, 0x01, 0xff, 0x0f, 0x01, 0x00, 0x00, 0xd0, 0x4a, 0x55, 0xa5, 0xaa, 0x12, 0x49, 0x10, 0xfe, 0xff, 0x3f, 0x00, 0x7b, 0x1f, 0x94, 0x00, 0x00, 0xa0, 0x6a, 0x55, 0x55, 0x4a, 0x52, 0x49, 0x05, 0xff, 0xff, 0x3f, 0x01, 0xff, 0x0f, 0x02, 0x04, 0x00, 0xd0, 0xa8, 0xaa, 0xaa, 0x94, 0x04, 0x12, 0x01, 0xff, 0xff, 0x3f, 0x00, 0xf7, 0x1e, 0x94, 0x00, 0x00, 0x30, 0x57, 0x55, 0x40, 0x21, 0x51, 0x20, 0xc0, 0xff, 0xff, 0x3f, 0x01, 0xff, 0x1f, 0x03, 0x00, 0x00, 0xa0, 0xaa, 0xaa, 0x2a, 0x00, 0x00, 0x00, 0xc0, 0xff, 0xff, 0x3f, 0x00, 0xef, 0x1f, 0x28, 0x01, 0x00, 0x50, 0xa1, 0xaa, 0x04, 0x00, 0x00, 0x00, 0xf8, 0xff, 0xff, 0x3f, 0x01, 0xff, 0x3e, 0x02, 0x04, 0x00, 0x20, 0x5a, 0x55, 0x05, 0x40, 0x92, 0x02, 0xff, 0xff, 0xff, 0x3f, 0x00, 0xfe, 0x1f, 0x54, 0x00, 0x00, 0x90, 0xaa, 0xaa, 0x80, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x3f, 0x01, 0xef, 0x3f, 0x00, 0x01, 0x00, 0x60, 0x55, 0x55, 0xc1, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x3f, 0x00, 0x7f, 0x3b, 0x2a, 0x00, 0x00, 0x50, 0x55, 0x25, 0xf0, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x3f, 0x01, 0xff, 0x5f, 0x84, 0x04, 0x00, 0xa0, 0xaa, 0xaa, 0xf8, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x3f, 0x00, 0xf6, 0x3f, 0x10, 0x00, 0x00, 0x40, 0x55, 0x15, 0xfc, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x1f, 0x01, 0xff, 0x3e, 0x4a, 0x08, 0x00, 0xa0, 0xaa, 0x2a, 0xfc, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x3f, 0x01, 0xef, 0x7f, 0x10, 0x21, 0x00, 0xc0, 0x2a, 0x09, 0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x9f, 0x01, 
0xff, 0x3d, 0x48, 0x00, 0x00, 0x40, 0x55, 0x15, 0xfc, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x1f, 0x01, 0xfe, 0x7f, 0x10, 0x09, 0x00, 0x80, 0xaa, 0x12, 0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x8f, 0x01, 0xef, 0x77, 0x20, 0x00, 0x00, 0x40, 0x55, 0x05, 0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x8f, 0x01, 0xbf, 0xff, 0x44, 0x02, 0x00, 0x80, 0xaa, 0x1a, 0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x8f, 0x00, 0xfe, 0x7f, 0x10, 0x11, 0x00, 0x80, 0x4a, 0x02, 0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xc7, 0x01, 0xff, 0xfe, 0x20, 0x04, 0x00, 0x00, 0x55, 0x15, 0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xe7, 0x01, 0xfd, 0x77, 0x48, 0x01, 0x00, 0x00, 0xaa, 0x0a, 0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xc7, 0x01, 0xdf, 0xff, 0x00, 0x09, 0x00, 0x00, 0x94, 0x14, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xf1, 0x01, 0xfb, 0xff, 0x28, 0x20, 0x00, 0x00, 0xa0, 0x12, 0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xe1, 0x01, 0x7f, 0xf7, 0xa1, 0x02, 0x00, 0x00, 0x54, 0x0a, 0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xb8, 0x01, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00, 0x80, 0x24, 0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f, 0xf8, 0x01, 0xee, 0xdf, 0xa9, 0x0a, 0x00, 0x00, 0x00, 0x00, 0xfe, 0xff, 0xdf, 0xaa, 0x24, 0x22, 0x01, 0xfe, 0x01, 0x7f, 0xff, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0xfe, 0xff, 0x1f, 0x00, 0x00, 0x00, 0x00, 0xff, 0x01, 0xff, 0xfb, 0x51, 0x01, 0x00, 0x00, 0x00, 0x00, 0xfe, 0xff, 0x1f, 0x00, 0x00, 0x48, 0xf2, 0xb7, 0x01, 0xee, 0xff, 0x01, 0x14, 0x00, 0x00, 0x00, 0x00, 0xfe, 0xff, 0xbf, 0xff, 0x5f, 0xf8, 0xff, 0xff, 0x01, 0xbf, 0xdf, 0xa1, 0x00, 0x00, 0x00, 0x00, 0x00, 0xfe, 0xff, 0xff, 0xff, 0xff, 0xf8, 0xff, 0xfe, 0x01, 0xff, 0xfd, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0xfe, 0xff, 0xff, 0xff, 0xff, 0xf9, 0xdb, 0xef, 0x01, 0xfd, 0xff, 0x83, 0x24, 0x00, 0x00, 0x00, 0x00, 0xfe, 0xff, 0xff, 0xff, 0xff, 0xb1, 0xff, 0xff, 0x01, 0xff, 0xdb, 0x22, 0x00, 0x00, 0x00, 0x00, 0x00, 0xfe, 0xff, 0xff, 0xff, 0xff, 0xf9, 0x7f, 0xdf, 0x00, 0xb7, 0xff, 0x87, 0x02, 0x00, 0x00, 0x00, 0x00, 0xfe, 0xff, 
0xff, 0x7f, 0xfc, 0xf9, 0xfd, 0xfd, 0x01, 0xff, 0xff, 0x03, 0x08, 0x00, 0x00, 0x00, 0x34, 0xfe, 0xff, 0xff, 0x1f, 0xf8, 0xb9, 0xef, 0xff, 0x01, 0xfe, 0xff, 0x0f, 0x01, 0x00, 0x00, 0xa0, 0x0e, 0xfe, 0xff, 0xff, 0x8f, 0xf0, 0xf8, 0xff, 0xff, 0x00, 0x6f, 0xdb, 0x07, 0x00, 0x00, 0x00, 0xda, 0x3f, 0xfe, 0xff, 0xff, 0x8f, 0xf3, 0xf9, 0xff, 0xf6, 0x01, 0xff, 0xff, 0x0f, 0x00, 0x00, 0xe8, 0xff, 0x7f, 0xfe, 0xff, 0xff, 0x8f, 0xf1, 0xfc, 0xee, 0xdf, 0x01, 0xfe, 0xbf, 0x0d, 0x00, 0xd0, 0xf7, 0xfe, 0xfe, 0xfc, 0xff, 0xff, 0x1f, 0xf1, 0xd8, 0xff, 0xff, 0x00, 0xff, 0xbd, 0x17, 0x40, 0xbf, 0xfe, 0xfb, 0x6d, 0xfc, 0xff, 0xff, 0x1f, 0xf8, 0xfc, 0x7f, 0xff, 0x01, 0xed, 0x77, 0x55, 0xff, 0xf5, 0xdf, 0xdf, 0xff, 0xf8, 0xff, 0xff, 0x3f, 0x7c, 0xfc, 0xee, 0xfb, 0x01, 0xbf, 0xef, 0xee, 0xda, 0xff, 0xff, 0xff, 0xff, 0xf9, 0xff, 0xff, 0xff, 0x3f, 0xfe, 0xff, 0xef, 0x00, 0xff, 0xde, 0xfb, 0xff, 0xff, 0xfd, 0xff, 0xdb, 0xe2, 0xff, 0xff, 0xff, 0x1f, 0xf6, 0xfd, 0xff, 0x01, 0xfe, 0xbd, 0xdf, 0xff, 0xdf, 0xbf, 0xdd, 0xff, 0xc1, 0xff, 0xff, 0xff, 0x8f, 0xdf, 0x7f, 0xff, 0x01, 0xf7, 0xff, 0xff, 0xdb, 0xfe, 0xfb, 0xff, 0xff, 0x0f, 0xff, 0xff, 0xff, 0x83, 0xff, 0xf7, 0xb7, 0x01, 0xff, 0xff, 0xbf, 0xff, 0xfb, 0xff, 0xff, 0xfe, 0x0f, 0xf8, 0xff, 0x7f, 0xe0, 0xff, 0xff, 0xff, 0x00, 0xef, 0xbd, 0xfd, 0xff, 0xdf, 0xef, 0xf6, 0xf7, 0x7f, 0x80, 0xff, 0x07, 0x70, 0xdb, 0xde, 0xfe, 0x01, 0xde, 0xfb, 0xff, 0xef, 0xff, 0xfe, 0xdf, 0xbf, 0xfd, 0x01, 0x00, 0x00, 0xfe, 0xff, 0xff, 0xff, 0x01, 0xff, 0xff, 0xbb, 0x7d, 0xff, 0xff, 0xff, 0xff, 0xdf, 0x0f, 0x00, 0xe0, 0xff, 0xff, 0xfb, 0xdb, 0x01 }; pyglossary-3.2.1/res/resize-16.png0000644000175000017500000000124713575553425017255 0ustar emfoxemfox00000000000000PNG  IHDRasRGBbKGD pHYs  tIME.rtEXtCommentCreated with The GIMPd%nIDAT8˕?hSaE2ZLڈ(ɢɞ8ԬB  \ m]2t)}|ǡ~{΅S6p;<`()h_W`}>p.ϓJ$@nK[__p?222455et:-`(V*IfYn;[[[ԭVos|>XEIGd$ukF\$5 ߀WKKKO| <ѳb^.JS(3An\.icc c +w{}t%Fgw:|>zmm zwLpVALp9Nl6!.ZflooOR'}E.:83{' "!iD>a3 s0 1fr9^8>>vLFSxf8fsZph5 \?_ ; 
pyglossary-3.2.1/res/resize.png
[binary data: PNG image omitted]
pyglossary-3.2.1/setup.cfg
[egg_info]
tag_build = 
tag_date = 0
pyglossary-3.2.1/setup.py
#!/usr/bin/env python3

try:
    import py2exe
except ImportError:
    py2exe = None

import glob
import sys
import os
from os.path import join, dirname, exists, isdir
import re
import logging

import setuptools
from distutils.core import setup
from distutils.command.install import install

from pyglossary.glossary import VERSION

log = logging.getLogger("root")

relRootDir = "share/pyglossary"


class my_install(install):
    def run(self):
        install.run(self)
        if os.sep == "/":
            binPath = join(self.install_scripts, "pyglossary")
            log.info("creating script file \"%s\"" % binPath)
            if not exists(self.install_scripts):
                os.makedirs(self.install_scripts)
                # let it fail on wrong permissions.
            else:
                if not isdir(self.install_scripts):
                    raise OSError(
                        "installation path already exists " +
                        "but is not a directory: %s" % self.install_scripts
                    )
            open(binPath, "w").write(
                join(self.install_data, relRootDir, "pyglossary.pyw") +
                " \"$@\""  # pass options from command line
            )
            os.chmod(binPath, 0o755)


data_files = [
    (relRootDir, [
        "about",
        "license.txt",
        "license-dialog",
        "help",
        "pyglossary.pyw",
        "AUTHORS",
        "config.json",
    ]),
    (relRootDir+"/ui", glob.glob("ui/*.py")),
    (relRootDir+"/ui/glade", glob.glob("ui/glade/*")),
    (relRootDir+"/ui/gtk3_utils", glob.glob("ui/gtk3_utils/*.py")),
    (relRootDir+"/res", glob.glob("res/*")),
    ("share/doc/pyglossary", []),
    ("share/doc/pyglossary/non-gui_examples",
        glob.glob("doc/non-gui_examples/*")),
    ("share/doc/pyglossary/Babylon", glob.glob("doc/Babylon/*")),
    ("share/doc/pyglossary/DSL", glob.glob("doc/DSL/*")),
    ("share/doc/pyglossary/Octopus MDict",
        glob.glob("doc/Octopus MDict/*")),
    ("share/applications", ["pyglossary.desktop"]),
    ("share/pixmaps", ["res/pyglossary.png"]),
]


def files(folder):
    for path in glob.glob(folder+"/*"):
        if os.path.isfile(path):
            yield path


if py2exe:
    py2exeoptions = {
        "script_args": ["py2exe"],
        "bundle_files": 1,
        "windows": [
            {
                "script": "pyglossary.pyw",
                "icon_resources": [
                    (1, "res/pyglossary.ico"),
                ],
            },
        ],
        "zipfile": None,
        "options": {
            "py2exe": {
                "packages": [
                    "pyglossary",
                    "ui.ui_tk",
                    # "ui.ui_gtk",
                    # "BeautifulSoup",
                    # "xml",
                    "Tkinter",
                    "tkFileDialog",
                    "Tix",
                    # "gtk", "glib", "gobject",
                ],
            },
        },
    }
    data_files = [
        ("tcl/tix8.1", files(join(sys.prefix, "tcl", "tix8.4.3"))),
        ("tcl/tix8.1/bitmaps",
            files(join(sys.prefix, "tcl", "tix8.4.3", "bitmaps"))),
        ("tcl/tix8.1/pref",
            files(join(sys.prefix, "tcl", "tix8.4.3", "pref"))),
        ("tcl/tcl8.1/init.tcl",
            [join(sys.prefix, "tcl", "tix8.4.3", "init.tcl")]),
        ("", ["about", "license-dialog", "help"]),
        ("ui", glob.glob("ui/*.py")),
        ("ui/glade", glob.glob("ui/glade/*")),
        ("res", glob.glob("res/*")),
        ("plugins", glob.glob("pyglossary/plugins/*")),
        ("plugin_lib",
            glob.glob("pyglossary/plugin_lib/*")),
        ("doc/pyglossary", [
            "doc/bgl_structure.svgz",
        ]),
        ("doc/pyglossary/non-gui_examples",
            glob.glob("doc/non-gui_examples/*")),
    ]
    for pyVer in ("34", "35", "36"):
        relPath = "plugin_lib/py%s" % pyVer
        data_files.append((
            relPath,
            glob.glob("pyglossary/" + relPath + "/*.py",
        )))
else:
    py2exeoptions = {}

with open("README.md", "r") as fh:
    long_description = fh.read()

setup(
    name="pyglossary",
    version=VERSION,
    cmdclass={
        "install": my_install,
    },
    description="A tool for working with dictionary databases",
    long_description_content_type="text/markdown",
    long_description=long_description,
    author="Saeed Rasooli",
    author_email="saeed.gnu@gmail.com",
    license="GPLv3",
    url="https://github.com/ilius/pyglossary",
    scripts=[
        # "pyglossary.pyw",
    ],
    packages=[
        "pyglossary",
    ],
    package_data={
        "pyglossary": [
            "plugins/*.py",
            "plugin_lib/*.py",
            "plugin_lib/py*/*.py",
        ] + [
            # safest way found so far to include every resource of plugins
            # producing plugins/pkg/*, plugins/pkg/sub1/*, ...
            # except .pyc/.pyo
            re.sub(
                r"^.*?pyglossary%s(?=plugins)" % (
                    "\\\\" if os.sep == "\\" else os.sep
                ),
                "",
                join(dirpath, f),
            )
            for top in glob.glob(
                join(dirname(__file__), "pyglossary", "plugins")
            )
            for dirpath, _, files in os.walk(top)
            for f in files
            if not (f.endswith(".pyc") or f.endswith(".pyo"))
        ],
    },
    data_files=data_files,
    **py2exeoptions
)
pyglossary-3.2.1/ui/
pyglossary-3.2.1/ui/__init__.py
pyglossary-3.2.1/ui/base.py
# -*- coding: utf-8 -*-
##
## Copyright © 2012 Saeed Rasooli (ilius)
## This file is part of PyGlossary project, https://github.com/ilius/pyglossary
##
## This program is a free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3, or (at your option)
## any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License along
## with this program. Or on Debian systems, from /usr/share/common-licenses/GPL
## If not, see .
from os.path import join, isfile

from pyglossary.core import (
    rootConfJsonFile,
    confJsonFile,
    rootDir,
    dataDir,
    appResDir,
)
from pyglossary.glossary import *
from pyglossary.json_utils import jsonToData


def fread(path):
    with open(path, encoding="utf-8") as fp:
        return fp.read()


logo = join(appResDir, 'pyglossary.png')
aboutText = fread(join(dataDir, 'about'))
licenseText = fread(join(dataDir, 'license-dialog'))
authors = fread(join(dataDir, 'AUTHORS')).split('\n')


class UIBase(object):
    prefKeys = (
        'noProgressBar',  ## command line
        'ui_autoSetFormat',
        'ui_autoSetOutputFileName',
        'lower',
        'utf8Check',
        'enable_alts',
        ## Reverse Options:
        'reverse_matchWord',
        'reverse_showRel',
        'reverse_saveStep',
        'reverse_minRel',
        'reverse_maxNum',
        'reverse_includeDefs',
    )

    def pref_load(self, **options):
        data = jsonToData(fread(rootConfJsonFile))
        if isfile(confJsonFile):
            try:
                userData = jsonToData(fread(confJsonFile))
            except Exception:
                log.exception(
                    'error while loading user config file "%s"' % confJsonFile
                )
            else:
                data.update(userData)
        for key in self.prefKeys:
            try:
                self.pref[key] = data.pop(key)
            except KeyError:
                pass
        for key, value in data.items():
            log.warning('unknown config key "%s"' % key)
        for key, value in options.items():
            if key in self.prefKeys:
                self.pref[key] = value
        return True

    def progressEnd(self):
        self.progress(1.0)
pyglossary-3.2.1/ui/gtk3_utils/
pyglossary-3.2.1/ui/gtk3_utils/__init__.py
from gi.repository import Gtk as gtk
from gi.repository import Gdk as gdk
pyglossary-3.2.1/ui/gtk3_utils/dialog.py
from gi.repository import Gtk as gtk
from gi.repository import Gdk as gdk


class MyDialog(object):
    def startWaiting(self):
        self.queue_draw()
        self.vbox.set_sensitive(False)
        self.get_window().set_cursor(gdk.Cursor.new(gdk.CursorType.WATCH))
        while gtk.events_pending():
            gtk.main_iteration_do(False)

    def endWaiting(self):
        self.get_window().set_cursor(gdk.Cursor.new(gdk.CursorType.LEFT_PTR))
        self.vbox.set_sensitive(True)

    def waitingDo(self, func, *args, **kwargs):
        self.startWaiting()
        try:
            func(*args, **kwargs)
        except Exception as e:
            raise e
        finally:
            self.endWaiting()
pyglossary-3.2.1/ui/gtk3_utils/resize_button.py
from . import *
from .utils import *


class ResizeButton(gtk.EventBox):
    def __init__(self, win, edge=gdk.WindowEdge.SOUTH_EAST):
        gtk.EventBox.__init__(self)
        self.win = win
        self.edge = edge
        ###
        self.image = imageFromFile('resize.png')
        self.add(self.image)
        self.connect('button-press-event', self.buttonPress)

    def buttonPress(self, obj, gevent):
        self.win.begin_resize_drag(
            self.edge,
            gevent.button,
            int(gevent.x_root),
            int(gevent.y_root),
            gevent.time,
        )
pyglossary-3.2.1/ui/gtk3_utils/utils.py
from . import *
from os.path import isabs, join

from pyglossary.core import appResDir


def set_tooltip(widget, text):
    try:
        widget.set_tooltip_text(text)  # PyGTK 2.12 or above
    except AttributeError:
        try:
            widget.set_tooltip(gtk.Tooltips(), text)
        except:
            myRaise(__file__)


def imageFromFile(path):  # the file must exist
    if not isabs(path):
        path = join(appResDir, path)
    im = gtk.Image()
    try:
        im.set_from_file(path)
    except:
        myRaise()
    return im
pyglossary-3.2.1/ui/paths.py
from os.path import realpath, dirname, join, isdir
import sys

uiDir = ''
if hasattr(sys, 'frozen'):
    rootDir = dirname(sys.executable)
    uiDir = join(rootDir, 'ui')
else:
    uiDir = dirname(realpath(__file__))
    rootDir = dirname(uiDir)

appResDir = join(rootDir, 'res')
pyglossary-3.2.1/ui/progressbar.py
# -*- coding: utf-8 -*-
# progressbar.py - Text progressbar library for python
#
# Copyright (C) 2006 Nilton Volpato
# (until version 2.2)
# Copyright (C) 2009-2010 Saeed Rasooli
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library. Or on Debian systems, from:
# /usr/share/common-licenses/LGPL
# If not, see .

"""
Text progressbar library for python.

This library provides a text mode progressbar.
This is typically used to display the progress of a long-running
operation, providing a visual clue that processing is underway.

The ProgressBar class manages the progress, and the format of the line
is given by a number of widgets. A widget is an object that may
display differently depending on the state of the progress.
There are three types of widget:
- a string, which always shows itself;
- a ProgressBarWidget, which may return a different value every time
  its update method is called; and
- a ProgressBarWidgetHFill, which is like ProgressBarWidget, except it
  expands to fill the remaining width of the line.

The progressbar module is very easy to use, yet very powerful. It also
automatically supports features like auto-resizing when available.
"""

# Changelog
#
# 2009-??-??: used in PyGlossary by Saeed Rasooli with some modifications
# 2006-05-07: v2.2 fixed bug in windows
# 2005-12-04: v2.1 autodetect terminal width, added start method
# 2005-12-04: v2.0 everything is now a widget (wow!)
# 2005-12-03: v1.0 rewrite using widgets
# 2005-06-02: v0.5 rewrite
# 2004-??-??: v0.1 first version

import sys
import time
from array import array
import signal
import logging

log = logging.getLogger('root')


class ProgressBarWidget(object):
    """
    This is an element of ProgressBar formatting.

    The ProgressBar object will call its update value when an update
    is needed. Its size may change between calls, but the results will
    not be good if the size changes drastically and repeatedly.
    """
    def update(self):
        """
        Returns the string representing the widget.

        The parameter pbar is a reference to the calling ProgressBar,
        where one can access attributes of the class for knowing how
        the update must be made.

        At least this function must be overridden.
        """
        pass

    def __str__(self):
        return self.update()

    def __add__(self, other):
        if isinstance(other, bytes):
            # decode bytes before concatenating with str
            return str(self) + other.decode('utf-8')
        return str(self) + str(other)

    def __radd__(self, other):
        if isinstance(other, bytes):
            return other.decode('utf-8') + str(self)
        return str(other) + str(self)


class ProgressBarWidgetHFill(object):
    """
    This is a variable width element of ProgressBar formatting.

    The ProgressBar object will call its update value, informing the
    width this object must be made. This is like TeX \\hfill, it will
    expand to fill the line. You can use more than one in the same
    line, and they will all have the same width, and together will
    fill the line.
    """
    def update(self, width):
        """
        Returns the string representing the widget.

        The parameter pbar is a reference to the calling ProgressBar,
        where one can access attributes of the class for knowing how
        the update must be made. The parameter width is the total
        horizontal width the widget must have.

        At least this function must be overridden.
        """
        pass


class ETA(ProgressBarWidget):
    """
    Widget for the Estimated Time of Arrival
    """
    def __init__(self, text='ETA: '):
        self.text = text

    def format_time(self, seconds):
        return time.strftime('%H:%M:%S', time.gmtime(seconds))

    def update(self):
        pbar = self.pbar
        if pbar.currval == 0:
            return 'ETA: --:--:--'
        elif pbar.finished:
            return 'Time: %s' % self.format_time(pbar.seconds_elapsed)
        else:
            elapsed = pbar.seconds_elapsed
            eta = elapsed * pbar.maxval / pbar.currval - elapsed
            return '%s%s' % (self.text, self.format_time(eta))


class FileTransferSpeed(ProgressBarWidget):
    """
    Widget for showing the transfer speed (useful for file transfers).
    """
    def __init__(self):
        self.fmt = '%6.2f %s'
        self.units = ['B', 'K', 'M', 'G', 'T', 'P']

    def update(self):
        pbar = self.pbar
        if pbar.seconds_elapsed < 2e-6:  # == 0:
            bps = 0.0
        else:
            bps = float(pbar.currval) / pbar.seconds_elapsed
        spd = bps
        for u in self.units:
            if spd < 1000:
                break
            spd /= 1000
        return self.fmt % (spd, u+'/s')


class RotatingMarker(ProgressBarWidget):
    """
    A rotating marker for filling the bar of progress.
    """
    def __init__(self, markers='|/-\\'):
        # Some cool examples for markers:
        # u'░▒▓█'
        # u'⬅⬉⬆⬈➡⬊⬇⬋', u'⬌⬉⬆⬈⬌⬊⬇⬋', u'➚➙➘➙', u'➝➞➡➞'
        # u' ⚊⚌☰⚌⚊', u' ⚋⚊⚍⚌☱☰☱⚌⚍⚊⚋',
        # '<(|)>)|(', u'❘❙❚❙', u'❢❣❤❣'
        self.markers = markers
        self.curmark = -1

    def __len__(self):
        return 1

    def update(self):
        pbar = self.pbar
        if pbar.finished:
            return self.markers[0]
        self.curmark = (self.curmark + 1) % len(self.markers)
        return self.markers[self.curmark]


class Percentage(ProgressBarWidget):
    """
    Just the percentage done.
    """
    def update(self):
        pbar = self.pbar
        return '%5.1f' % pbar.percentage()


class Bar(ProgressBarWidgetHFill):
    """
    The bar of progress. It will stretch to fill the line.
    """
    def __init__(self, marker='#', left='|', right='|'):
        self.marker = marker
        self.left = left
        self.right = right

    def _format_marker(self, pbar):
        if isinstance(self.marker, str):
            return self.marker
        else:
            return self.marker.update(pbar)

    def update(self, width):
        width = int(width)
        pbar = self.pbar
        percent = pbar.percentage()
        cwidth = width - len(self.left) - len(self.right)
        marked_width = int(percent * cwidth / 100.0)
        m = self._format_marker(pbar)
        bar = (self.left + (m*marked_width).ljust(cwidth) + self.right)
        return bar


class ReverseBar(Bar):
    """
    The reverse bar of progress, or bar of regress.
    :)
    """
    def update(self, width):
        pbar = self.pbar
        percent = pbar.percentage()
        cwidth = width - len(self.left) - len(self.right)
        marked_width = int(percent * cwidth / 100.0)
        m = self._format_marker(pbar)
        bar = (self.left + (m*marked_width).rjust(cwidth) + self.right)
        return bar


class ProgressBar(object):
    """
    This is the ProgressBar class, it updates and prints the bar.

    The term_width parameter may be an integer. Or None, in which case
    it will try to guess it, if it fails it will default to 80 columns.

    The simple use is like this:
    >>> pbar = ProgressBar().start()
    >>> for i in xrange(100):
    ...     # do something
    ...     pbar.update(i+1)
    ...
    >>> pbar.finish()

    But anything you want to do is possible (well, almost anything).
    You can supply different widgets of any type in any order. And you
    can even write your own widgets! There are many widgets already
    shipped and you should experiment with them.

    When implementing a widget update method you may access any
    attribute or function of the ProgressBar object calling the
    widget's update method.
    The most important attributes you would like to access are:
    - currval: current value of the progress, 0 <= currval <= maxval
    - maxval: maximum (and final) value of the progress
    - finished: True if the bar has finished (reached 100%), False o/w
    - start_time: first time update() method of ProgressBar was called
    - seconds_elapsed: seconds elapsed since start_time
    - percentage(): percentage of the progress (this is a method)
    """
    default_widgets = [Percentage(), ' ', Bar()]

    def __init__(
        self,
        maxval=100.0,
        widgets=None,
        update_step=0.1,
        term_width=None,
        fd=sys.stderr,
    ):
        assert maxval > 0
        self.maxval = maxval
        if widgets is None:
            widgets = self.default_widgets
        self.widgets = widgets
        for w in self.widgets:
            # log.debug( type(w) is ProgressBarWidget )
            # if not isinstance(w, str):
            try:
                w.pbar = self
            except:
                pass
        self.update_step = update_step
        self.fd = fd
        self.signal_set = False
        if term_width is None:
            try:
                self.handle_resize(None, None)
                signal.signal(signal.SIGWINCH, self.handle_resize)
                self.signal_set = True
            except:
                self.term_width = 79
        else:
            self.term_width = term_width
        self.currval = 0
        self.finished = False
        self.prev_percentage = -1
        self.start_time = None
        self.seconds_elapsed = 0

    def handle_resize(self, signum, frame):
        try:
            from fcntl import ioctl
            import termios
        except:
            pass
        else:
            h, w = array('h', ioctl(self.fd, termios.TIOCGWINSZ, '\0'*8))[:2]
            self.term_width = w

    def percentage(self):
        """
        Returns the percentage of the progress.
        """
        return self.currval*100.0 / self.maxval

    def _format_widgets(self):
        r = []
        hfill_inds = []
        num_hfill = 0
        currwidth = 0
        for (i, w) in enumerate(self.widgets):
            if isinstance(w, ProgressBarWidgetHFill):
                r.append(w)
                hfill_inds.append(i)
                num_hfill += 1
            elif isinstance(w, str):  # OR isinstance(w, (str, unicode))
                r.append(w)
                currwidth += len(w)
            else:
                weval = w.update()
                currwidth += len(weval)
                r.append(weval)
        for iw in hfill_inds:
            r[iw] = r[iw].update((self.term_width-currwidth)/num_hfill)
        return r

    def _format_line(self):
        return ''.join(self._format_widgets()).ljust(self.term_width)

    def _need_update(self):
        # return int(self.percentage()) != int(self.prev_percentage)
        return int(self.percentage() / self.update_step) != \
            int(self.prev_percentage / self.update_step)

    def update(self, value):
        """
        Updates the progress bar to a new value.
        """
        assert 0 <= value <= self.maxval
        self.currval = value
        if not self._need_update() or self.finished:
            return
        if not self.start_time:
            self.start_time = time.time()
        self.seconds_elapsed = time.time() - self.start_time
        self.prev_percentage = self.percentage()
        if value != self.maxval:
            self.fd.write(self._format_line() + '\r')
        else:
            self.finished = True
            self.fd.write(self._format_line() + '\n')

    def start(self):
        """
        Start measuring time, and prints the bar at 0%.

        It returns self so you can use it like this:
        >>> pbar = ProgressBar().start()
        >>> for i in xrange(100):
        ...     # do something
        ...     pbar.update(i+1)
        ...
        >>> pbar.finish()
        """
        self.update(0)
        return self

    def finish(self):
        """
        Used to tell the progress is finished.
        """
        self.update(self.maxval)
        if self.signal_set:
            signal.signal(signal.SIGWINCH, signal.SIG_DFL)


if __name__ == '__main__':
    def example1():
        pbar = ProgressBar(
            widgets=[
                'Test: ', Bar(),
                ' ', RotatingMarker(),
                Percentage(), ' ', ETA(),
            ],
            maxval=1.0,
            update_step=0.2,
        )
        pbar.start()
        for i in range(1000):
            # do something
            time.sleep(0.1)
            pbar.update(i/1000.0)
        pbar.finish()
        print('')
    example1()
pyglossary-3.2.1/ui/ui_cmd.py
# -*- coding: utf-8 -*-
# ui_cmd.py
#
# Copyright © 2008-2010 Saeed Rasooli (ilius)
# This file is part of PyGlossary project, https://github.com/ilius/pyglossary
#
# This program is a free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program. Or on Debian systems, from /usr/share/common-licenses/GPL
# If not, see .

from os.path import join
import time
import signal

from pyglossary.glossary import *
from .base import *
from .
import progressbar as pb if os.sep == "\\": # Operating system is Windows startRed = "" endFormat = "" startBold = "" startUnderline = "" endFormat = "" else: startRed = "\x1b[31m" endFormat = "\x1b[0;0;0m" startBold = "\x1b[1m" # Start Bold # len=4 startUnderline = "\x1b[4m" # Start Underline # len=4 endFormat = "\x1b[0;0;0m" # End Format # len=8 # redOnGray = "\x1b[0;1;31;47m" COMMAND = "pyglossary" # COMMAND = sys.argv[0] def getColWidth(subject, strings): return max( len(x) for x in [subject] + strings ) def getFormatsTable(names, header): descriptions = [ Glossary.formatsDesc[name] for name in names ] extentions = [ " ".join(Glossary.formatsExt[name]) for name in names ] nameWidth = getColWidth("Name", names) descriptionWidth = getColWidth("Description", descriptions) extentionsWidth = getColWidth("Extentions", extentions) lines = ["\n"] lines.append("%s%s%s" % (startBold, header, endFormat)) lines.append( " | ".join([ "Name".center(nameWidth), "Description".center(descriptionWidth), "Extentions".center(extentionsWidth) ]) ) lines.append( "-+-".join([ "-" * nameWidth, "-" * descriptionWidth, "-" * extentionsWidth, ]) ) for index, name in enumerate(names): lines.append( " | ".join([ name.ljust(nameWidth), descriptions[index].ljust(descriptionWidth), extentions[index].ljust(extentionsWidth) ]) ) return "\n".join(lines) def help(): import string with open(join(dataDir, "help")) as fp: text = fp.read() text = text.replace("", startBold)\ .replace("", startUnderline)\ .replace("", endFormat)\ .replace("", endFormat) text = string.Template(text).substitute( CMD=COMMAND, ) text += getFormatsTable(Glossary.readFormats, "Supported input formats:") text += getFormatsTable(Glossary.writeFormats, "Supported output formats:") print(text) def parseFormatOptionsStr(st): st = st.strip() if not st: return {} opt = {} parts = st.split(";") for part in parts: try: (key, value) = part.split("=") except ValueError: log.error("bad option syntax: %s" % part) continue key = 
key.strip() value = value.strip() # if it is string form of a number or boolean or tuple ... try: newValue = eval(value) except: pass else: if isinstance(newValue, ( bool, int, float, tuple, list, dict, )): value = newValue opt[key] = value return opt class NullObj(object): def __getattr__(self, attr): return self def __setattr__(self, attr, value): pass def __call__(self, *args, **kwargs): pass class UI(UIBase): def __init__(self, **options): self.pref = {} # log.debug(self.pref) self.pbar = NullObj() self._toPause = False def onSigInt(self, *args): if self._toPause: log.info("\nOperation Canceled") sys.exit(0) else: self._toPause = True log.info("\nPlease wait...") def setText(self, text): self.pbar.widgets[0] = text def progressInit(self, title): rot = pb.RotatingMarker() self.pbar = pb.ProgressBar( widgets=[ title, pb.Bar(marker="█", right=rot), pb.Percentage(), "% ", pb.ETA(), ], maxval=1.0, update_step=0.5, ) rot.pbar = self.pbar def progress(self, rat, text=""): self.pbar.update(rat) def reverseLoop(self, *args, **kwargs): reverseKwArgs = {} for key in ( "words", "matchWord", "showRel", "includeDefs", "reportStep", "saveStep", "maxNum", "minRel", "minWordLen" ): try: reverseKwArgs[key] = self.pref["reverse_" + key] except KeyError: pass reverseKwArgs.update(kwargs) if not self._toPause: log.info("Reversing glossary... (Press Ctrl+C to pause/stop)") for wordI in self.glos.reverse(**reverseKwArgs): if self._toPause: log.info( "Reverse is paused." 
" Press Enter to continue, and Ctrl+C to exit" ) input() self._toPause = False def run( self, inputFilename, outputFilename="", inputFormat="", outputFormat="", reverse=False, prefOptions=None, readOptions=None, writeOptions=None, convertOptions=None, ): if not prefOptions: prefOptions = {} if not readOptions: readOptions = {} if not writeOptions: writeOptions = {} if not convertOptions: convertOptions = {} self.pref_load(**prefOptions) if inputFormat: # inputFormat = inputFormat.capitalize() if inputFormat not in Glossary.readFormats: log.error("invalid read format %s" % inputFormat) if outputFormat: # outputFormat = outputFormat.capitalize() if outputFormat not in Glossary.writeFormats: log.error("invalid write format %s" % outputFormat) log.error("try: %s --help" % COMMAND) return 1 if not outputFilename: if reverse: pass elif outputFormat: try: ext = Glossary.formatsExt[outputFormat][0] except (KeyError, IndexError): log.error("invalid write format %s" % outputFormat) log.error("try: %s --help" % COMMAND) return 1 else: outputFilename = os.path.splitext(inputFilename)[0] + ext else: log.error("neither output file nor output format is given") log.error("try: %s --help" % COMMAND) return 1 glos = self.glos = Glossary(ui=self) if reverse: signal.signal(signal.SIGINT, self.onSigInt) # good place? 
FIXME readOptions["direct"] = False if not glos.read( inputFilename, format=inputFormat, **readOptions ): log.error("reading input file was failed!") return False self.setText("Reversing: ") self.pbar.update_step = 0.1 self.reverseLoop(savePath=outputFilename) else: finalOutputFile = self.glos.convert( inputFilename, inputFormat=inputFormat, outputFilename=outputFilename, outputFormat=outputFormat, readOptions=readOptions, writeOptions=writeOptions, **convertOptions ) return bool(finalOutputFile) return True pyglossary-3.2.1/ui/ui_gtk.py0000644000175000017500000005257113577304507016505 0ustar emfoxemfox00000000000000# -*- coding: utf-8 -*- # ui_gtk.py # # Copyright © 2008-2010 Saeed Rasooli (ilius) # Thanks to Pier Carteri for program Py_Shell.py # Thanks to Milad Rastian for program pySQLiteGUI # # This program is a free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 3, or (at your option) # any later version. # # You can get a copy of GNU General Public License along this program # But you can always get it from http://www.gnu.org/licenses/gpl.txt # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
import shutil import sys import os from os.path import join, isfile, isabs, splitext import logging import traceback from pyglossary.text_utils import urlToPath from pyglossary.os_utils import click_website from pyglossary.glossary import * from .base import * from pyglossary import core import gi gi.require_version("Gtk", "3.0") from .gtk3_utils import * from .gtk3_utils.utils import * from .gtk3_utils.dialog import MyDialog from .gtk3_utils.resize_button import ResizeButton # from gi.repository import GdkPixbuf log = logging.getLogger("root") gtk.Window.set_default_icon_from_file(logo) _ = str # later replace with translator function pixDir = join(dataDir, "res") # FIXME def getCopressedFileExt(fpath): fname, ext = splitext(fpath.lower()) if ext in (".gz", ".bz2", ".zip"): fname, ext = splitext(fname) return ext def buffer_get_text(b): return b.get_text( b.get_start_iter(), b.get_end_iter(), True, ) def pack(box, child, expand=False, fill=False, padding=0): if isinstance(box, gtk.Box): box.pack_start(child, expand, fill, padding) elif isinstance(box, gtk.CellLayout): box.pack_start(child, expand) else: raise TypeError("pack: unkown type %s" % type(box)) def imageFromFile(path): # the file must exist if not isabs(path): path = join(pixDir, path) im = gtk.Image() try: im.set_from_file(path) except: myRaise() return im def color_parse(colorStr): rgba = gdk.RGBA() if not rgba.parse(colorStr): raise ValueError("bad color string %r" % colorStr) return rgba.to_color() class FormatComboBox(gtk.ComboBox): def __init__(self): gtk.ComboBox.__init__(self) self.model = gtk.ListStore( str, # format name, hidden # GdkPixbuf.Pixbuf,# icon str, # format description, shown ) self.set_model(self.model) cell = gtk.CellRendererText() cell.set_visible(False) pack(self, cell) self.add_attribute(cell, "text", 0) # cell = gtk.CellRendererPixbuf() # pack(self, cell, False) # self.add_attribute(cell, "pixbuf", 1) cell = gtk.CellRendererText() pack(self, cell, True) self.add_attribute(cell, 
"text", 1) def addFormat(self, _format): self.get_model().append(( _format, # icon, Glossary.formatsDesc[_format], )) def getActive(self): index = gtk.ComboBox.get_active(self) if index is None or index < 0: return "" return self.get_model()[index][0] def setActive(self, _format): for i, row in enumerate(self.get_model()): if row[0] == _format: gtk.ComboBox.set_active(self, i) return class InputFormatComboBox(FormatComboBox): def __init__(self): FormatComboBox.__init__(self) for _format in Glossary.readFormats: self.addFormat(_format) class OutputFormatComboBox(FormatComboBox): def __init__(self): FormatComboBox.__init__(self) for _format in Glossary.writeFormats: self.addFormat(_format) class GtkTextviewLogHandler(logging.Handler): def __init__(self, treeview_dict): logging.Handler.__init__(self) self.buffers = {} for levelname in ( "CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG", ): textview = treeview_dict[levelname] buff = textview.get_buffer() tag = gtk.TextTag.new(levelname) buff.get_tag_table().add(tag) self.buffers[levelname] = buff def getTag(self, levelname): return self.buffers[levelname].get_tag_table().lookup(levelname) def setColor(self, levelname, color): # FIXME self.getTag(levelname).set_property("foreground-gdk", color) def emit(self, record): msg = record.getMessage() # msg = msg.replace("\x00", "") if record.exc_info: _type, value, tback = record.exc_info tback_text = "".join( traceback.format_exception(_type, value, tback) ) if msg: msg += "\n" msg += tback_text buff = self.buffers[record.levelname] buff.insert_with_tags_by_name( buff.get_end_iter(), msg + "\n", record.levelname, ) class GtkSingleTextviewLogHandler(GtkTextviewLogHandler): def __init__(self, textview): GtkTextviewLogHandler.__init__(self, { "CRITICAL": textview, "ERROR": textview, "WARNING": textview, "INFO": textview, "DEBUG": textview, }) class BrowseButton(gtk.Button): def __init__( self, setFilePathFunc, label="Browse", actionSave=False, title="Select File", ): 
gtk.Button.__init__(self) self.set_label(label) self.set_image(gtk.Image.new_from_icon_name( "document-save" if actionSave else "document-open", gtk.IconSize.BUTTON, )) self.actionSave = actionSave self.setFilePathFunc = setFilePathFunc self.title = title self.connect("clicked", self.onClick) def onClick(self, widget): fcd = gtk.FileChooserDialog( transient_for=self.get_toplevel(), action=gtk.FileChooserAction.SAVE if self.actionSave else gtk.FileChooserAction.OPEN, title=self.title, ) fcd.add_button(gtk.STOCK_CANCEL, gtk.ResponseType.CANCEL) fcd.add_button(gtk.STOCK_OK, gtk.ResponseType.OK) fcd.connect("response", lambda w, e: fcd.hide()) fcd.connect( "file-activated", lambda w: fcd.response(gtk.ResponseType.OK) ) if fcd.run() == gtk.ResponseType.OK: self.setFilePathFunc(fcd.get_filename()) fcd.destroy() class UI(gtk.Dialog, MyDialog, UIBase): def write(self, tag): # FIXME pass def status(self, msg): # try: # _id = self.statusMsgDict[msg] # except KeyError: # _id = self.statusMsgDict[msg] = self.statusNewId # self.statusNewId += 1 _id = self.statusBar.get_context_id(msg) self.statusBar.push(_id, msg) def __init__(self, **options): gtk.Dialog.__init__(self) self.set_title("PyGlossary (Gtk3)") self.resize(800, 800) self.connect("delete-event", self.onDeleteEvent) self.prefPages = [] # self.statusNewId = 0 # self.statusMsgDict = {}## message -> id ##### self.pref = {} self.pref_load(**options) ##### self.assert_quit = False self.path = "" self.glos = Glossary(ui=self) # ____________________ Tab 1 - Convert ____________________ # sizeGroup = gtk.SizeGroup(mode=gtk.SizeGroupMode.HORIZONTAL) #### vbox = gtk.VBox() vbox.label = _("Convert") vbox.icon = "" # "*.png" self.prefPages.append(vbox) ###### hbox = gtk.HBox(spacing=3) hbox.label = gtk.Label(label=_("Input Format")+":") pack(hbox, hbox.label) sizeGroup.add_widget(hbox.label) hbox.label.set_property("xalign", 0) self.convertInputFormatCombo = InputFormatComboBox() pack(hbox, self.convertInputFormatCombo) pack(vbox, 
hbox) ### hbox = gtk.HBox(spacing=3) hbox.label = gtk.Label(label=_("Input File")+":") pack(hbox, hbox.label) sizeGroup.add_widget(hbox.label) hbox.label.set_property("xalign", 0) self.convertInputEntry = gtk.Entry() pack(hbox, self.convertInputEntry, 1, 1) button = BrowseButton( self.convertInputEntry.set_text, label="Browse", actionSave=False, title="Select Input File", ) pack(hbox, button) pack(vbox, hbox) ## self.convertInputEntry.connect( "changed", self.convertInputEntryChanged, ) ##### vbox.sep1 = gtk.Label(label="") vbox.sep1.show() pack(vbox, vbox.sep1) ##### hbox = gtk.HBox(spacing=3) hbox.label = gtk.Label(label=_("Output Format")+":") pack(hbox, hbox.label) sizeGroup.add_widget(hbox.label) hbox.label.set_property("xalign", 0) self.convertOutputFormatCombo = OutputFormatComboBox() pack(hbox, self.convertOutputFormatCombo) pack(vbox, hbox) ### hbox = gtk.HBox(spacing=3) hbox.label = gtk.Label(label=_("Output File")+":") pack(hbox, hbox.label) sizeGroup.add_widget(hbox.label) hbox.label.set_property("xalign", 0) self.convertOutputEntry = gtk.Entry() pack(hbox, self.convertOutputEntry, 1, 1) button = BrowseButton( self.convertOutputEntry.set_text, label="Browse", actionSave=True, title="Select Output File", ) pack(hbox, button) pack(vbox, hbox) ## self.convertOutputEntry.connect( "changed", self.convertOutputEntryChanged, ) ##### hbox = gtk.HBox(spacing=10) label = gtk.Label(label="") pack(hbox, label, 1, 1, 10) self.convertButton = gtk.Button() self.convertButton.set_label("Convert") self.convertButton.connect("clicked", self.convertClicked) pack(hbox, self.convertButton, 1, 1, 10) pack(vbox, hbox, 0, 0, 15) # ____________________ Tab 2 - Reverse ____________________ # self.reverseStatus = "" #### sizeGroup = gtk.SizeGroup(mode=gtk.SizeGroupMode.HORIZONTAL) #### vbox = gtk.VBox() vbox.label = _("Reverse") vbox.icon = "" # "*.png" # self.prefPages.append(vbox) ###### hbox = gtk.HBox(spacing=3) hbox.label = gtk.Label(label=_("Input Format")+":") pack(hbox, 
hbox.label) sizeGroup.add_widget(hbox.label) hbox.label.set_property("xalign", 0) self.reverseInputFormatCombo = InputFormatComboBox() pack(hbox, self.reverseInputFormatCombo) pack(vbox, hbox) ### hbox = gtk.HBox(spacing=3) hbox.label = gtk.Label(label=_("Input File")+":") pack(hbox, hbox.label) sizeGroup.add_widget(hbox.label) hbox.label.set_property("xalign", 0) self.reverseInputEntry = gtk.Entry() pack(hbox, self.reverseInputEntry, 1, 1) button = BrowseButton( self.reverseInputEntry.set_text, label="Browse", actionSave=False, title="Select Input File", ) pack(hbox, button) pack(vbox, hbox) ## self.reverseInputEntry.connect( "changed", self.reverseInputEntryChanged, ) ##### vbox.sep1 = gtk.Label(label="") vbox.sep1.show() pack(vbox, vbox.sep1) ##### hbox = gtk.HBox(spacing=3) hbox.label = gtk.Label(label=_("Output Tabfile")+":") pack(hbox, hbox.label) sizeGroup.add_widget(hbox.label) hbox.label.set_property("xalign", 0) self.reverseOutputEntry = gtk.Entry() pack(hbox, self.reverseOutputEntry, 1, 1) button = BrowseButton( self.reverseOutputEntry.set_text, label="Browse", actionSave=True, title="Select Output File", ) pack(hbox, button) pack(vbox, hbox) ## self.reverseOutputEntry.connect( "changed", self.reverseOutputEntryChanged, ) ##### hbox = gtk.HBox(spacing=3) label = gtk.Label(label="") pack(hbox, label, 1, 1, 5) ### self.reverseStartButton = gtk.Button() self.reverseStartButton.set_label(_("Start")) self.reverseStartButton.connect("clicked", self.reverseStartClicked) pack(hbox, self.reverseStartButton, 1, 1, 2) ### self.reversePauseButton = gtk.Button() self.reversePauseButton.set_label(_("Pause")) self.reversePauseButton.set_sensitive(False) self.reversePauseButton.connect("clicked", self.reversePauseClicked) pack(hbox, self.reversePauseButton, 1, 1, 2) ### self.reverseResumeButton = gtk.Button() self.reverseResumeButton.set_label(_("Resume")) self.reverseResumeButton.set_sensitive(False) self.reverseResumeButton.connect("clicked", 
self.reverseResumeClicked) pack(hbox, self.reverseResumeButton, 1, 1, 2) ### self.reverseStopButton = gtk.Button() self.reverseStopButton.set_label(_("Stop")) self.reverseStopButton.set_sensitive(False) self.reverseStopButton.connect("clicked", self.reverseStopClicked) pack(hbox, self.reverseStopButton, 1, 1, 2) ### pack(vbox, hbox, 0, 0, 5) ##### # ____________________________________________________________ # notebook = gtk.Notebook() self.notebook = notebook ######### for vbox in self.prefPages: l = gtk.Label(label=vbox.label) l.set_use_underline(True) vb = gtk.VBox(spacing=3) if vbox.icon: vbox.image = imageFromFile(vbox.icon) pack(vb, vbox.image) pack(vb, l) vb.show_all() notebook.append_page(vbox, vb) try: notebook.set_tab_reorderable(vbox, True) except AttributeError: pass ####################### # notebook.set_property("homogeneous", True) # not in gtk3 FIXME # notebook.set_property("tab-border", 5) # not in gtk3 FIXME # notebook.set_property("tab-hborder", 15) # not in gtk3 FIXME pack(self.vbox, notebook, 0, 0) # for i in ui.prefPagesOrder: # try: # j = prefPagesOrder[i] # except IndexError: # continue # notebook.reorder_child(self.prefPages[i], j) # ____________________________________________________________ # self.consoleTextview = textview = gtk.TextView() swin = gtk.ScrolledWindow() swin.set_policy(gtk.PolicyType.AUTOMATIC, gtk.PolicyType.AUTOMATIC) swin.set_border_width(0) swin.add(textview) pack(self.vbox, swin, 1, 1) ### handler = GtkSingleTextviewLogHandler(textview) log.addHandler(handler) ### textview.override_background_color( gtk.StateFlags.NORMAL, gdk.RGBA(0, 0, 0, 1), ) ### handler.setColor("CRITICAL", color_parse("red")) handler.setColor("ERROR", color_parse("red")) handler.setColor("WARNING", color_parse("yellow")) handler.setColor("INFO", color_parse("white")) handler.setColor("DEBUG", color_parse("white")) ### textview.get_buffer().set_text("Output & Error Console:\n") textview.set_editable(False) # 
____________________________________________________________ # self.progressTitle = "" self.progressBar = pbar = gtk.ProgressBar() pbar.set_fraction(0) # pbar.set_text(_("Progress Bar")) # pbar.get_style_context() # pbar.set_property("height-request", 20) pack(self.vbox, pbar, 0, 0) ############ hbox = gtk.HBox(spacing=5) clearButton = gtk.Button( use_stock=gtk.STOCK_CLEAR, always_show_image=True, label=_("Clear"), ) clearButton.show_all() # image = gtk.Image() # image.set_from_stock(gtk.STOCK_CLEAR, gtk.IconSize.MENU) # clearButton.add(image) clearButton.set_border_width(0) clearButton.connect("clicked", self.consoleClearButtonClicked) set_tooltip(clearButton, "Clear Console") pack(hbox, clearButton, 0, 0) #### # hbox.sepLabel1 = gtk.Label(label="") # pack(hbox, hbox.sepLabel1, 1, 1) ###### hbox.verbosityLabel = gtk.Label(label=_("Verbosity")+":") pack(hbox, hbox.verbosityLabel, 0, 0) ## self.verbosityCombo = combo = gtk.ComboBoxText() for level, levelName in enumerate(log.levelNamesCap): combo.append_text("%s - %s" % ( level, _(levelName) )) combo.set_active(log.getVerbosity()) combo.set_border_width(0) combo.connect("changed", self.verbosityComboChanged) pack(hbox, combo, 0, 0) #### # hbox.sepLabel2 = gtk.Label(label="") # pack(hbox, hbox.sepLabel2, 1, 1) #### self.statusBar = sbar = gtk.Statusbar() pack(hbox, self.statusBar, 1, 1) #### hbox.resizeButton = ResizeButton(self) pack(hbox, hbox.resizeButton, 0, 0) ###### pack(self.vbox, hbox, 0, 0) # ____________________________________________________________ # self.vbox.show_all() ######## self.status("Select input file") def run(self, editPath=None, readOptions=None): if readOptions is None: readOptions = {} # if editPath: # self.notebook.set_current_page(3) # log.info("Opening file "%s" for edit. 
please wait..."%editPath) # while gtk.events_pending(): # gtk.main_iteration_do(False) # self.dbe_open(editPath, **readOptions) gtk.Dialog.present(self) gtk.main() def onDeleteEvent(self, widget, event): self.destroy() gtk.main_quit() def consoleClearButtonClicked(self, widget=None): self.consoleTextview.get_buffer().set_text("") def verbosityComboChanged(self, widget=None): verbosity = self.verbosityCombo.get_active() # or int(self.verbosityCombo.get_active_text()) log.setVerbosity(verbosity) def convertClicked(self, widget=None): inPath = self.convertInputEntry.get_text() if not inPath: self.status("Input file path is empty!") log.critical("Input file path is empty!") return inFormat = self.convertInputFormatCombo.getActive() if inFormat: inFormatDesc = Glossary.formatsDesc[inFormat] else: inFormatDesc = "" # log.critical("Input format is empty!");return outPath = self.convertOutputEntry.get_text() if not outPath: self.status("Output file path is empty!") log.critical("Output file path is empty!") return outFormat = self.convertOutputFormatCombo.getActive() if outFormat: outFormatDesc = Glossary.formatsDesc[outFormat] else: outFormatDesc = "" # log.critical("Output format is empty!");return while gtk.events_pending(): gtk.main_iteration_do(False) self.convertButton.set_sensitive(False) self.progressTitle = "Converting" try: # if inFormat=="Omnidic": # dicIndex = self.xml.get_widget("spinbutton_omnidic_i")\ # .get_value_as_int() # ex = self.glos.readOmnidic(inPath, dicIndex=dicIndex) # else: finalOutputFile = self.glos.convert( inPath, inputFormat=inFormat, outputFilename=outPath, outputFormat=outFormat, ) if finalOutputFile: self.status("Convert finished") else: self.status("Convert failed") return bool(finalOutputFile) finally: self.convertButton.set_sensitive(True) self.assert_quit = False self.progressTitle = "" return True def convertInputEntryChanged(self, widget=None): inPath = self.convertInputEntry.get_text() inFormat = 
self.convertInputFormatCombo.getActive() if inPath.startswith("file://"): inPath = urlToPath(inPath) self.convertInputEntry.set_text(inPath) inExt = getCopressedFileExt(inPath) inFormatNew = Glossary.extFormat.get(inExt) if not inFormatNew: return if not isfile(inPath): return if self.pref["ui_autoSetFormat"] and not inFormat: self.convertInputFormatCombo.setActive(inFormatNew) self.status("Select output file") if self.pref["ui_autoSetOutputFileName"]: outFormat = self.convertOutputFormatCombo.getActive() outPath = self.convertOutputEntry.get_text() if outFormat: if not outPath and "." in inPath: outPath = splitext(inPath)[0] + \ Glossary.formatsExt[outFormat][0] self.convertOutputEntry.set_text(outPath) self.status("Press \"Convert\"") def convertOutputEntryChanged(self, widget=None): outPath = self.convertOutputEntry.get_text() outFormat = self.convertOutputFormatCombo.getActive() if not outPath: return # outFormat = self.combobox_o.get_active_text() if outPath.startswith("file://"): outPath = urlToPath(outPath) self.convertOutputEntry.set_text(outPath) if self.pref["ui_autoSetFormat"] and not outFormat: outExt = getCopressedFileExt(outPath) try: outFormatNew = Glossary.extFormat[outExt] except KeyError: pass else: self.convertOutputFormatCombo.setActive(outFormatNew) if self.convertOutputFormatCombo.getActive(): self.status("Press \"Convert\"") else: self.status("Select output format") def reverseLoad(self): pass def reverseStartLoop(self): pass def reverseStart(self): if not self.reverseLoad(): return ### self.reverseStatus = "doing" self.reverseStartLoop() ### self.reverseStartButton.set_sensitive(False) self.reversePauseButton.set_sensitive(True) self.reverseResumeButton.set_sensitive(False) self.reverseStopButton.set_sensitive(True) def reverseStartClicked(self, widget=None): self.waitingDo(self.reverseStart) def reversePause(self): self.reverseStatus = "pause" ### self.reverseStartButton.set_sensitive(False) self.reversePauseButton.set_sensitive(False) 
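The reverse-tab handlers (`reverseStart`, `reversePause`, `reverseResume`, `reverseStop`) each just flip the sensitivity of the same four buttons according to the current `reverseStatus`. The pattern can be summarized as a small Gtk-free table; this is an illustrative sketch, not part of PyGlossary, and the helper name is hypothetical:

```python
# Button sensitivity per reverse-mode state, mirroring the four
# reverse* handlers: each tuple is (start, pause, resume, stop).
BUTTON_STATES = {
	"doing": (False, True, False, True),   # running
	"pause": (False, False, True, True),   # paused
	"stop":  (True, False, False, False),  # stopped
}


def buttonSensitivity(status):
	"""Return which of the four reverse-tab buttons should be clickable."""
	return dict(zip(
		("start", "pause", "resume", "stop"),
		BUTTON_STATES[status],
	))
```

Keeping the table in one place like this would avoid repeating the four `set_sensitive` calls in every handler.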
		self.reverseResumeButton.set_sensitive(True)
		self.reverseStopButton.set_sensitive(True)

	def reversePauseClicked(self, widget=None):
		self.waitingDo(self.reversePause)

	def reverseResume(self):
		self.reverseStatus = "doing"
		###
		self.reverseStartButton.set_sensitive(False)
		self.reversePauseButton.set_sensitive(True)
		self.reverseResumeButton.set_sensitive(False)
		self.reverseStopButton.set_sensitive(True)

	def reverseResumeClicked(self, widget=None):
		self.waitingDo(self.reverseResume)

	def reverseStop(self):
		self.reverseStatus = "stop"
		###
		self.reverseStartButton.set_sensitive(True)
		self.reversePauseButton.set_sensitive(False)
		self.reverseResumeButton.set_sensitive(False)
		self.reverseStopButton.set_sensitive(False)

	def reverseStopClicked(self, widget=None):
		self.waitingDo(self.reverseStop)

	def reverseInputEntryChanged(self, widget=None):
		inPath = self.reverseInputEntry.get_text()
		inFormat = self.reverseInputFormatCombo.getActive()
		if inPath.startswith("file://"):
			inPath = urlToPath(inPath)
			self.reverseInputEntry.set_text(inPath)
		inExt = getCopressedFileExt(inPath)
		inFormatNew = Glossary.extFormat.get(inExt)
		if inFormatNew and self.pref["ui_autoSetFormat"] and not inFormat:
			self.reverseInputFormatCombo.setActive(inFormatNew)
		if self.pref["ui_autoSetOutputFileName"]:
			outExt = ".txt"
			outPath = self.reverseOutputEntry.get_text()
			if inFormatNew and not outPath:
				outPath = splitext(inPath)[0] + "-reversed" + outExt
				self.reverseOutputEntry.set_text(outPath)

	def reverseOutputEntryChanged(self, widget=None):
		pass

	def progressInit(self, title):
		self.progressTitle = title

	def progress(self, rat, text=None):
		if not text:
			text = "%%%d" % (rat*100)
		text += " - %s" % self.progressTitle
		self.progressBar.set_fraction(rat)
		# self.progressBar.set_text(text)  # not working
		self.status(text)
		while gtk.events_pending():
			gtk.main_iteration_do(False)

pyglossary-3.2.1/ui/ui_qt.py
# -*- coding: utf-8 -*-
# ui_qt.py
#
#
# Copyright © 2010 Saeed Rasooli (ilius)
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3, or (at your option)
# any later version.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, you can always get it from
# http://www.gnu.org/licenses/gpl.txt
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.

import sys  # needed for the saved sys.stderr/sys.stdout below

from pyglossary.glossary import *
from .base import *
from os.path import join

from PyQt4 import QtGui as qt
from PyQt4 import QtCore as qc

stderr_saved = sys.stderr
stdout_saved = sys.stdout

# startBold = '\x1b[1m'  # Start Bold  # len=4
# startUnderline = '\x1b[4m'  # Start Underline  # len=4
endFormat = '\x1b[0;0;0m'  # End Format  # len=8
# redOnGray = '\x1b[0;1;31;47m'
startRed = '\x1b[31m'

noneItem = 'Not Selected'


class QVirtualFile(object):
	def __init__(self, qtext, mode):
		self.qtext = qtext
		self.mode = mode

	def write(self, text):
		self.qtext.insertPlainText(text)
		if self.mode == 'stdout':
			stdout_saved.write(text)
		elif self.mode == 'stderr':
			stderr_saved.write(startRed + text + endFormat)

	def writelines(self, lines):
		for line in lines:
			self.write(line)

	def flush(self):
		pass

	def isatty(self):
		return 1

	def fileno(self):
		pass


class UI(qt.QWidget, UIBase):
	def __init__(self, ipath, **options):
		qt.QWidget.__init__(self)
		self.setWindowTitle('PyGlossary (Qt)')
		self.setWindowIcon(qt.QIcon(join(uiDir, 'pyglossary.png')))
		######################
		self.running = False
		self.glos = Glossary(ui=self)
		self.pref = {}
		self.pref_load()
		self.pathI = ''
		self.pathO = ''
		self.fcd_dir = join(homeDir, 'Desktop')
		######################
		vbox = qt.QVBoxLayout()
		self.setLayout(vbox)
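`QVirtualFile` above tees everything written to `sys.stdout`/`sys.stderr` into the Qt text widget while still forwarding it to the saved real stream, wrapping stderr text in an ANSI red escape. The same tee pattern can be shown without Qt; this is a minimal sketch where a plain list stands in for the widget (all names here are hypothetical):

```python
import io

startRed = "\x1b[31m"      # same ANSI escapes as in ui_qt.py
endFormat = "\x1b[0;0;0m"


class VirtualFile(object):
	# `sink` stands in for the Qt text widget (a list here);
	# `realStream` is the saved original stdout/stderr stream.
	def __init__(self, sink, realStream, mode):
		self.sink = sink
		self.realStream = realStream
		self.mode = mode

	def write(self, text):
		self.sink.append(text)  # widget.insertPlainText(text) in the real UI
		if self.mode == "stderr":
			text = startRed + text + endFormat
		self.realStream.write(text)

	def flush(self):
		pass


# usage: capture output while still forwarding to the "real" stream
captured = []
fake_stderr = io.StringIO()
vf = VirtualFile(captured, fake_stderr, "stderr")
vf.write("boom")
```

After the call, `captured` holds the plain text and `fake_stderr` holds the ANSI-colored copy, mirroring how the Qt UI shows output in the widget and in the launching terminal at once.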
pyglossary-3.2.1/ui/ui_tk.py
# -*- coding: utf-8 -*-
# ui_tk.py
#
# Copyright © 2009-2010 Saeed Rasooli (ilius)
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3, or (at your option)
# any later version.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, you can always get it from
# http://www.gnu.org/licenses/gpl.txt
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.

from pyglossary.core import homeDir
from pyglossary.glossary import *
from pyglossary.text_utils import urlToPath
from .base import *
from os.path import join
import os  # os.name is used below
import logging
import traceback

import tkinter as tk
from tkinter import filedialog
from tkinter import tix

log = logging.getLogger("root")

# startBold = "\x1b[1m"  # Start Bold  # len=4
# startUnderline = "\x1b[4m"  # Start Underline  # len=4
endFormat = "\x1b[0;0;0m"  # End Format  # len=8
# redOnGray = "\x1b[0;1;31;47m"
startRed = "\x1b[31m"

noneItem = "Not Selected"

bitmapLogo = join(dataDir, "res", "pyglossary.ico") if "nt" == os.name \
	else "@" + join(dataDir, "res", "pyglossary.xbm")


def set_window_icon(window):
	# window.wm_iconbitmap(bitmap=bitmapLogo)
	window.iconphoto(
		True,
		tk.PhotoImage(file=join(dataDir, "res", "pyglossary.png")),
	)


class TkTextLogHandler(logging.Handler):
	def __init__(self, tktext):
		logging.Handler.__init__(self)
		#####
		tktext.tag_config("CRITICAL", foreground="#ff0000")
		tktext.tag_config("ERROR", foreground="#ff0000")
		tktext.tag_config("WARNING", foreground="#ffff00")
		tktext.tag_config("INFO", foreground="#00ff00")
		tktext.tag_config("DEBUG", foreground="#ffffff")
		###
		self.tktext = tktext

	def
emit(self, record): msg = record.getMessage() ### if record.exc_info: _type, value, tback = record.exc_info tback_text = "".join( traceback.format_exception(_type, value, tback) ) if msg: msg += "\n" msg += tback_text ### self.tktext.insert( "end", msg + "\n", record.levelname, ) # Monkey-patch Tkinter # http://stackoverflow.com/questions/5191830/python-exception-logging def CallWrapper__call__(self, *args): """ Apply first function SUBST to arguments, than FUNC. """ if self.subst: args = self.subst(*args) try: return self.func(*args) except: log.exception("Exception in Tkinter callback:") tk.CallWrapper.__call__ = CallWrapper__call__ class ProgressBar(tix.Frame): """ This comes from John Grayson's book "Python and Tkinter programming" Edited by Saeed Rasooli """ def __init__( self, rootWin=None, orientation="horizontal", min_=0, max_=100, width=100, height=18, appearance="sunken", fillColor="blue", background="gray", labelColor="yellow", labelFont="Verdana", labelFormat="%d%%", value=0, bd=2, ): # preserve various values self.rootWin = rootWin self.orientation = orientation self.min = min_ self.max = max_ self.width = width self.height = height self.fillColor = fillColor self.labelFont = labelFont self.labelColor = labelColor self.background = background self.labelFormat = labelFormat self.value = value tix.Frame.__init__(self, rootWin, relief=appearance, bd=bd) self.canvas = tix.Canvas( self, height=height, width=width, bd=0, highlightthickness=0, background=background, ) self.scale = self.canvas.create_rectangle( 0, 0, width, height, fill=fillColor, ) self.label = self.canvas.create_text( width/2, height/2, text="", anchor="c", fill=labelColor, font=self.labelFont, ) self.update() self.bind("", self.update) self.canvas.pack(side="top", fill="x", expand="no") def updateProgress(self, newVal, newMax=None, text=""): if newMax: self.max = newMax self.value = newVal self.update(None, text) def update(self, event=None, labelText=""): # Trim the values to be between 
min and max value = self.value if value > self.max: value = self.max if value < self.min: value = self.min # Adjust the rectangle width = int(self.winfo_width()) # width = self.width ratio = float(value)/self.max if self.orientation == "horizontal": self.canvas.coords( self.scale, 0, 0, width * ratio, self.height, ) else: self.canvas.coords( self.scale, 0, self.height * (1 - ratio), width, self.height, ) # Now update the colors self.canvas.itemconfig(self.scale, fill=self.fillColor) self.canvas.itemconfig(self.label, fill=self.labelColor) # And update the label if not labelText: labelText = self.labelFormat % int(ratio * 100) self.canvas.itemconfig(self.label, text=labelText) # FIXME: # self.canvas.move(self.label, width/2, self.height/2) # self.canvas.scale(self.label, 0, 0, float(width)/self.width, 1) self.canvas.update_idletasks() class UI(tix.Frame, UIBase): def __init__(self, path="", **options): self.glos = Glossary(ui=self) self.pref = {} self.pref_load(**options) ############################################# rootWin = self.rootWin = tix.Tk() tix.Frame.__init__(self, rootWin) rootWin.title("PyGlossary (Tkinter)") rootWin.resizable(True, False) ######## set_window_icon(rootWin) ######## self.pack(fill="x") # rootWin.bind("", self.resized) ###################### self.glos = Glossary(ui=self) self.pref = {} self.pref_load() self.pathI = "" self.pathO = "" self.fcd_dir = join(homeDir, "Desktop") ###################### vpaned = tk.PanedWindow(self, orient=tk.VERTICAL) notebook = tix.NoteBook(vpaned) notebook.add("tab1", label="Convert", underline=0) notebook.add("tab2", label="Reverse", underline=0) convertFrame = tix.Frame(notebook.tab1) ###################### frame = tix.Frame(convertFrame) ## label = tix.Label(frame, text="Read from format") label.pack(side="left") ## comboVar = tk.StringVar() combo = tk.OptionMenu(frame, comboVar, *Glossary.readDesc) # comboVar.set(Glossary.readDesc[0]) comboVar.set(noneItem) combo.pack(side="left") self.combobox_i = comboVar 
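`ProgressBar.update` above clamps the current value into `[min, max]`, converts it to a fill ratio (`value / max`), and renders the label through `labelFormat`. The arithmetic can be isolated from Tkinter; this is an illustrative Tk-free sketch with a hypothetical helper name, using the class's defaults:

```python
def progressGeometry(value, min_=0, max_=100, width=400, labelFormat="%d%%"):
	# clamp value into [min_, max_], exactly as ProgressBar.update does
	value = min(max(value, min_), max_)
	ratio = float(value) / max_
	# horizontal orientation: the filled rectangle spans width * ratio;
	# the label shows the percentage via labelFormat
	return width * ratio, labelFormat % int(ratio * 100)
```

Note the ratio is computed against `max` alone (not `max - min`), faithfully mirroring the original code.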
## frame.pack(fill="x") ################### frame = tix.Frame(convertFrame) ## label = tix.Label(frame, text=" Path:") label.pack(side="left") ## entry = tix.Entry(frame) entry.pack(side="left", fill="x", expand=True) entry.bind_all("", self.entry_changed) self.entry_i = entry ## button = tix.Button( frame, text="Browse", command=self.browse_i, # bg="#f0f000", # activebackground="#f6f622", ) button.pack(side="left") ## frame.pack(fill="x") ###################### frame = tix.Frame(convertFrame) ## label = tix.Label(frame, text="Write to format ") label.pack(side="left") ## comboVar = tk.StringVar() combo = tk.OptionMenu(frame, comboVar, *Glossary.writeDesc) # comboVar.set(Glossary.writeDesc[0]) comboVar.set(noneItem) combo.pack(side="left") combo.bind("", self.combobox_o_changed) self.combobox_o = comboVar ## frame.pack(fill="x") ################### frame = tix.Frame(convertFrame) ## label = tix.Label(frame, text=" Path:") label.pack(side="left") ## entry = tix.Entry(frame) entry.pack(side="left", fill="x", expand=True) # entry.bind_all("", self.entry_changed) self.entry_o = entry ## button = tix.Button( frame, text="Browse", command=self.browse_o, # bg="#f0f000", # activebackground="#f6f622", ) button.pack(side="left") ## frame.pack(fill="x") ####### frame = tix.Frame(convertFrame) label = tix.Label(frame, text=" "*15) label.pack( side="left", fill="x", expand=True, ) button = tix.Button( frame, text="Convert", command=self.convert, # bg="#00e000", # activebackground="#22f022", ) button.pack( side="left", fill="x", expand=True, ) ### frame.pack(fill="x") ###### convertFrame.pack(fill="x") vpaned.add(notebook) ################# console = tix.Text(vpaned, height=15, background="#000000") # self.consoleH = 15 # sbar = Tix.Scrollbar( # vpaned, # orien=Tix.VERTICAL, # command=console.yview # ) # sbar.grid ( row=0, column=1) # console["yscrollcommand"] = sbar.set # console.grid() console.pack(fill="both", expand=True) log.addHandler( TkTextLogHandler(console), ) 
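`TkTextLogHandler`, registered with `log.addHandler` above, is a standard `logging.Handler` that appends each record to the Tk text widget tagged with its level name, so the tag colors configured in `__init__` apply per level. The same routing works against any sink; a widget-free sketch for illustration (class and logger names are hypothetical):

```python
import logging


class ListLogHandler(logging.Handler):
	"""Collect (levelname, message) pairs, the way TkTextLogHandler
	calls tktext.insert("end", msg + "\\n", record.levelname)."""
	def __init__(self):
		logging.Handler.__init__(self)
		self.lines = []

	def emit(self, record):
		self.lines.append((record.levelname, record.getMessage() + "\n"))


sketch_log = logging.getLogger("ui_tk_sketch")
sketch_log.setLevel(logging.DEBUG)
handler = ListLogHandler()
sketch_log.addHandler(handler)
sketch_log.warning("low disk space")
```

Because the handler keys its behavior on `record.levelname`, the same mechanism drives both the Tk tag coloring here and the per-level buffers in the Gtk `GtkTextviewLogHandler`.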
console.insert("end", "Console:\n") #### vpaned.add(console) vpaned.pack(fill="both", expand=True) self.console = console ################## frame2 = tix.Frame(self) clearB = tix.Button( frame2, text="Clear", command=self.console_clear, # bg="black", # fg="#ffff00", # activebackground="#333333", # activeforeground="#ffff00", ) clearB.pack(side="left") #### label = tix.Label(frame2, text="Verbosity") label.pack(side="left") ## comboVar = tk.StringVar() combo = tk.OptionMenu( frame2, comboVar, 0, 1, 2, 3, 4, ) comboVar.set(log.getVerbosity()) comboVar.trace("w", self.verbosityChanged) combo.pack(side="left") self.verbosityCombo = comboVar ##### pbar = ProgressBar(frame2, width=400) pbar.pack(side="left", fill="x", expand=True) self.pbar = pbar frame2.pack(fill="x") self.progressTitle = "" ############# # vpaned.grid() # bottomFrame.grid() # self.grid() ##################### # lbox = Tix.Listbox(convertFrame) # lbox.insert(0, "aaaaaaaa", "bbbbbbbbbbbbbbbbbbbb") # lbox.pack(fill="x") ############## frame3 = tix.Frame(self) aboutB = tix.Button( frame3, text="About", command=self.about_clicked, # bg="#e000e0", # activebackground="#f030f0", ) aboutB.pack(side="right") closeB = tix.Button( frame3, text="Close", command=self.quit, # bg="#ff0000", # activebackground="#ff5050", ) closeB.pack(side="right") frame3.pack(fill="x") # __________________________ Reverse Tab __________________________ # revFrame = tix.Frame(notebook.tab2) revFrame.pack(fill="x") ###################### frame = tix.Frame(revFrame) ## label = tix.Label(frame, text="Read from format") label.pack(side="left") ## comboVar = tk.StringVar() combo = tk.OptionMenu(frame, comboVar, *Glossary.readDesc) # comboVar.set(Glossary.readDesc[0]) comboVar.set(noneItem) combo.pack(side="left") self.combobox_r_i = comboVar ## frame.pack(fill="x") ################### frame = tix.Frame(revFrame) ## label = tix.Label(frame, text=" Path:") label.pack(side="left") ## entry = tix.Entry(frame) entry.pack(side="left", fill="x", 
expand=True) # entry.bind_all("", self.entry_r_i_changed) self.entry_r_i = entry ## button = tix.Button( frame, text="Browse", command=self.r_browse_i, # bg="#f0f000", # activebackground="#f6f622", ) button.pack(side="left") ## button = tix.Button( frame, text="Load", command=self.r_load, # bg="#7777ff", ) button.pack(side="left") ### frame.pack(fill="x") ################### frame = tix.Frame(revFrame) ## label = tix.Label(frame, text="Output Tabfile") label.pack(side="left") ### entry = tix.Entry(frame) entry.pack(side="left", fill="x", expand=True) # entry.bind_all("", self.entry_r_i_changed) self.entry_r_o = entry ## button = tix.Button( frame, text="Browse", command=self.r_browse_o, # bg="#f0f000", # activebackground="#f6f622", ) button.pack(side="left") ## frame.pack(fill="x") ############################## if path: self.entry_i.insert(0, path) self.entry_changed() self.load() def verbosityChanged(self, index, value, op): log.setVerbosity( int(self.verbosityCombo.get()) ) def about_clicked(self): about = tix.Toplevel(width=600) # bg="#0f0" does not work about.title("About PyGlossary") about.resizable(False, False) set_window_icon(about) ### msg1 = tix.Message( about, width=350, text="PyGlossary %s (Tkinter)" % VERSION, font=("DejaVu Sans", 13, "bold"), ) msg1.pack(fill="x", expand=True) ### msg2 = tix.Message( about, width=350, text=aboutText, font=("DejaVu Sans", 9, "bold"), justify=tix.CENTER, ) msg2.pack(fill="x", expand=True) ### msg3 = tix.Message( about, width=350, text=homePage, font=("DejaVu Sans", 8, "bold"), fg="#3333ff", ) msg3.pack(fill="x", expand=True) ### msg4 = tix.Message( about, width=350, text="Install PyGTK to have a better interface!", font=("DejaVu Sans", 8, "bold"), fg="#00aa00", ) msg4.pack(fill="x", expand=True) ########### frame = tix.Frame(about) ### button = tix.Button( frame, text="Close", command=about.destroy, # bg="#ff0000", # activebackground="#ff5050", ) button.pack(side="right") ### button = tix.Button( frame, text="License", 
			command=self.about_license_clicked,
			# bg="#00e000",
			# activebackground="#22f022",
		)
		button.pack(side="right")
		###
		button = tix.Button(
			frame,
			text="Credits",
			command=self.about_credits_clicked,
			# bg="#0000ff",
			# activebackground="#5050ff",
		)
		button.pack(side="right")
		###
		frame.pack(fill="x")

	def about_credits_clicked(self):
		about = tix.Toplevel()  # bg="#0f0" does not work
		about.title("Credits")
		about.resizable(False, False)
		set_window_icon(about)
		###
		msg1 = tix.Message(
			about,
			width=500,
			text="\n".join(authors),
			font=("DejaVu Sans", 9, "bold"),
		)
		msg1.pack(fill="x", expand=True)
		###########
		frame = tix.Frame(about)
		closeB = tix.Button(
			frame,
			text="Close",
			command=about.destroy,
			# bg="#ff0000",
			# activebackground="#ff5050",
		)
		closeB.pack(side="right")
		frame.pack(fill="x")

	def about_license_clicked(self):
		about = tix.Toplevel()  # bg="#0f0" does not work
		about.title("License")
		about.resizable(False, False)
		set_window_icon(about)
		###
		msg1 = tix.Message(
			about,
			width=420,
			text=licenseText,
			font=("DejaVu Sans", 9, "bold"),
		)
		msg1.pack(fill="x", expand=True)
		###########
		frame = tix.Frame(about)
		closeB = tix.Button(
			frame,
			text="Close",
			command=about.destroy,
			# bg="#ff0000",
			# activebackground="#ff5050",
		)
		closeB.pack(side="right")
		frame.pack(fill="x")

	def quit(self):
		self.rootWin.destroy()

	def resized(self, event):
		dh = self.rootWin.winfo_height() - self.winfo_height()
		# log.debug(dh, self.consoleH)
		# if dh > 20:
		# 	self.consoleH += 1
		# 	self.console["height"] = self.consoleH
		# 	self.console["width"] = int(self.console["width"]) + 1
		# 	self.console.grid()
		# for x in dir(self):
		# 	if "info" in x:
		# 		log.debug(x)

	def combobox_o_changed(self, event):
		# log.debug(self.combobox_o.get())
		formatD = self.combobox_o.get()
		if formatD == noneItem:
			return
		format = Glossary.descFormat[formatD]
		"""
		if format == "Omnidic":
			self.xml.get_widget("label_omnidic_o").show()
			self.xml.get_widget("spinbutton_omnidic_o").show()
		else:
			self.xml.get_widget("label_omnidic_o").hide()
			self.xml.get_widget("spinbutton_omnidic_o").hide()
		if format == "Babylon":
			self.xml.get_widget("label_enc").show()
			self.xml.get_widget("comboentry_enc").show()
		else:
			self.xml.get_widget("label_enc").hide()
			self.xml.get_widget("comboentry_enc").hide()
		"""
		if not self.pref["ui_autoSetOutputFileName"]:  # and format is None:
			return
		pathI = self.entry_i.get()
		pathO = self.entry_o.get()
		formatOD = self.combobox_o.get()
		if formatOD is None:
			return
		if pathO:
			return
		if "." not in pathI:
			return
		extO = Glossary.descExt[formatOD]
		pathO = "".join(os.path.splitext(pathI)[:-1]) + extO
		# self.entry_o.delete(0, "end")
		self.entry_o.insert(0, pathO)

	def entry_changed(self, event=None):
		# log.debug("entry_changed")
		# char = event.keysym
		pathI = self.entry_i.get()
		if self.pathI != pathI:
			formatD = self.combobox_i.get()
			if pathI.startswith("file://"):
				pathI = urlToPath(pathI)
				self.entry_i.delete(0, "end")
				self.entry_i.insert(0, pathI)
			if self.pref["ui_autoSetFormat"]:  # format==noneItem:
				ext = os.path.splitext(pathI)[-1].lower()
				if ext in (".gz", ".bz2", ".zip"):
					ext = os.path.splitext(pathI[:-len(ext)])[-1].lower()
				for i in range(len(Glossary.readExt)):
					if ext in Glossary.readExt[i]:
						self.combobox_i.set(Glossary.readDesc[i])
						break
			if self.pref["ui_autoSetOutputFileName"]:  # format==noneItem:
				# pathI = self.entry_i.get()
				formatOD = self.combobox_o.get()
				pathO = self.entry_o.get()
				if formatOD != noneItem and not pathO and "." in pathI:
					extO = Glossary.descExt[formatOD]
					pathO = "".join(os.path.splitext(pathI)[:-1]) + extO
					self.entry_o.delete(0, "end")
					self.entry_o.insert(0, pathO)
			self.pathI = pathI
		##############################################
		pathO = self.entry_o.get()
		if self.pathO != pathO:
			formatD = self.combobox_o.get()
			if pathO.startswith("file://"):
				pathO = urlToPath(pathO)
				self.entry_o.delete(0, "end")
				self.entry_o.insert(0, pathO)
			if self.pref["ui_autoSetFormat"]:  # format==noneItem:
				ext = os.path.splitext(pathO)[-1].lower()
				if ext in (".gz", ".bz2", ".zip"):
					ext = os.path.splitext(pathO[:-len(ext)])[-1].lower()
				for i in range(len(Glossary.writeExt)):
					if ext in Glossary.writeExt[i]:
						self.combobox_o.set(Glossary.writeDesc[i])
						break
			self.pathO = pathO

	def browse_i(self):
		path = filedialog.askopenfilename(initialdir=self.fcd_dir)
		if path:
			self.entry_i.delete(0, "end")
			self.entry_i.insert(0, path)
			self.entry_changed()
			self.fcd_dir = os.path.dirname(path)  # FIXME

	def browse_o(self):
		path = filedialog.asksaveasfilename()
		if path:
			self.entry_o.delete(0, "end")
			self.entry_o.insert(0, path)
			self.entry_changed()
			self.fcd_dir = os.path.dirname(path)  # FIXME

	def convert(self):
		inPath = self.entry_i.get()
		if not inPath:
			log.critical("Input file path is empty!")
			return
		inFormatDesc = self.combobox_i.get()
		if inFormatDesc == noneItem:
			# log.critical("Input format is empty!"); return
			inFormat = ""
		else:
			inFormat = Glossary.descFormat[inFormatDesc]
		outPath = self.entry_o.get()
		if not outPath:
			log.critical("Output file path is empty!")
			return
		outFormatDesc = self.combobox_o.get()
		if outFormatDesc in (noneItem, ""):
			log.critical("Output format is empty!")
			return
		outFormat = Glossary.descFormat[outFormatDesc]
		finalOutputFile = self.glos.convert(
			inPath,
			inputFormat=inFormat,
			outputFilename=outPath,
			outputFormat=outFormat,
		)
		# if finalOutputFile:
		# 	self.status("Convert finished")
		# else:
		# 	self.status("Convert failed")
		return bool(finalOutputFile)

	def run(self, editPath=None, readOptions=None):
		if readOptions is None:
			readOptions = {}
		# editPath and readOptions are for DB Editor,
		# which is not implemented
		self.mainloop()

	def progressInit(self, title):
		self.progressTitle = title

	def progress(self, rat, text=""):
		if not text:
			text = "%%%d" % (rat * 100)
		text += " - %s" % self.progressTitle
		self.pbar.updateProgress(rat * 100, None, text)
		# self.pbar.value = rat * 100
		# self.pbar.update()
		self.rootWin.update()

	def console_clear(self, event=None):
		self.console.delete("1.0", "end")
		self.console.insert("end", "Console:\n")

	def r_browse_i(self):
		pass

	def r_browse_o(self):
		pass

	def r_load(self):
		pass


if __name__ == "__main__":
	import sys
	if len(sys.argv) > 1:
		path = sys.argv[1]
	else:
		path = ""
	ui = UI(path)
	ui.run()