pyenchant-3.3.0rc1/Changelog

Changelog
=========

3.3.0 (unreleased)
------------------

* Add tokenizer for the German language
* Improve support for macOS M1 architecture
* Add support for Python 3.11
* Remove support for Python 3.6
* Numerous documentation updates
* Start adding type annotations (still a work in progress)
* For the `enchant.checker` package: always set up the SpellChecker.
* Display project URLs on ``pypi.org``
* Sort all imports with ``isort``
* Numerous test cleanups
* Update FSF address in LICENSE.txt
* Windows wheels:

  * Use ``enchant`` archive generated from GitHub Actions
  * Bump ``enchant`` from 2.2.7 to 3.4.4

3.2.2 (2021-10-05)
------------------

* Add support for Python 3.10

3.2.1 (2021-06-24)
------------------

* Fix ``Dict.__del__`` sometimes raising `TypeError` upon exit (#98). Patch by @rr-
* Default development branch is now called ``main``
* Bump ``black`` to 21.6b0

3.2.0 (2020-12-08)
------------------

* Add support for Python 3.9
* Add trove classifiers for all supported Python versions
* Run ``pyupgrade`` across the code base
* Update documentation about MacPorts

3.1.1 (2020-05-31)
------------------

* On Windows, set PATH instead of calling ``SetDllDirectory`` before loading
  the Enchant C library. This allows PyEnchant to co-exist with other
  libraries in the same program. Fix #207.

3.1.0 (2020-05-20)
------------------

* Add ``enchant.get_user_config_dir()``
* Fix: ``enchant.get_enchant_version()`` now returns a ``str``, not some ``bytes``

3.0.1 (2020-03-01)
------------------

* Add missing LICENSE.txt in source distribution

3.0.0 (2020-03-01)
------------------

Highlights
++++++++++

* Uncouple PyEnchant version from the Enchant version. This release should be
  compatible with Enchant 1.6 to 2.2
* Fix using PyEnchant with Enchant >= 2.0
* Add support for pypy3, Python 3.7 and Python 3.8
* New website, hosted on https://pyenchant.github.io/pyenchant/
* Add `enchant.set_prefix_dir()`

Breaking changes
++++++++++++++++

* Drop support for Python2
* **macOS**: The C enchant library is no longer embedded inside the wheel -
  you should install the C enchant library with ``brew`` or ``ports``.

Clean ups
+++++++++

* Port test suite to ``pytest``.
* Add ``tbump`` configuration to simplify the release process
* Format code with ``black``.
* Remove compatibility layers with Python2 from ``enchant.utils``
* Use ``flake8`` to catch some errors during CI
* Fix some PEP8 naming violations
* Switch to GitHub Actions for CI

2.0.0 (2017-12-10)
------------------

* Removed deprecated `is_in_session` method, for compatibility with enchant 2.

1.6.6 (2014-06-16)
------------------

* New website and documentation, generated with Hyde and Sphinx.
* Fix ``SpellChecker.replace()`` when the replacement is shorter than the
  erroneous word; previously this would corrupt the internal state of the
  tokenizer. Thanks Steckelfisch.
* Make Dict class pickle-safe. Among other things, this should help with
  strange deadlocks when used with the multiprocessing module.
* Ability to import the module even when the enchant C library isn't
  installed, by setting PYENCHANT_IGNORE_MISSING_LIB env var.
* New utility function "trim_suggestions", useful for trimming the list of
  suggestions to a fixed maximum length.
* Change the way DeprecationWarnings are issued, to point to the line in user
  code rather than inside pyenchant. Thanks eriolv.
* Add GetSpellChecker() method to wxSpellCheckerDialog. Thanks bjosey.

1.6.5 (2010-12-14)
------------------

* restore compatibility with Python 3 (including 3.2 beta1).
* fix unittest DeprecationWarnings on Python 3.
* statically compile libstdc++ into pre-built windows binaries.

1.6.4 (2010-12-13)
------------------

* DictWithPWL: use pwl and pel to adjust the words returned by suggest().
* Fix tokenization of utf8 bytes in a mutable character array.
* get_tokenizer(): pass None as language tag to get default tokenizer.
* prevent build-related files from being included in the source tarball.

1.6.3 (2010-08-17)
------------------

* Bundle pre-compiled libraries for Mac OSX 10.4 and later.
* Improved handling of unicode paths on win32.
* Changed DLL loading logic for win32, to ensure that we don't accidentally
  load older versions of e.g. glib that may be on the DLL search path.
* Added function get_enchant_version() to retrieve the version string for
  the underlying enchant library.

1.6.2 (2010-05-29)
------------------

* Upgraded bundled enchant to v1.6.0.
* Fixed bug in printf() utility function; all input args are now converted
  to strings before printing.

1.6.1 (2010-03-06)
------------------

* Fixed loading of enchant DLL on win32 without pkg_resources installed.
* Fixed HTMLChunker to handle unescaped < and > characters that are clearly
  not part of a tag.

1.6.0 (2010-02-23)
------------------

* Upgraded to enchant v1.5.0:

  * new Broker methods get_param() and set_param() allow runtime
    customisation of provider data

* Added the concept of 'chunkers' to enchant.tokenize.get_tokenizer().
  These serve to split the text into large chunks of checkable tokens.
* implemented a simple HTMLChunker class
* Moved error classes into 'enchant.errors' for easier importing
* Moved testcases into separate files so they're not loaded by default
* Allowed SpellChecker to use default language if none is specified
* Improved compatibility with Python 3

1.5.3 (2009-05-02)
------------------

* Fixed termination conditions in English tokenization loop.
* Improved unicode detection in English tokenizer.
* Made enchant spellcheck all of its docstrings as part of the unittest suite.

1.5.2 (2009-04-27)
------------------

* Modify utils.get_resource_filename and utils.win32_data_files for
  compatibility with py2exe (which was broken in the move to ctypes).
  Thanks to Stephen George for the fix.

1.5.1 (2009-01-08)
------------------

* SpellChecker.add_to_personal renamed to SpellChecker.add and fixed to use
  the corresponding Dict method.

1.5.0 (2008-11-25)
------------------

* Migrated from SWIG to ctypes

  * now runs under PyPy!
  * also opens possibilities for Jython, IronPython, ...
* Compatibility updates for Python 3.0, mostly around unicode strings
* Dropped compatibility with Python 2.2

1.4.2 (2008-06-18)
------------------

* upgrade to enchant v1.4.2
* windows version can now be installed at a path containing unicode characters

1.4.0 (2008-04-18)
------------------

* upgrade to enchant v1.4.0, with new functionality and APIs:

  * All dictionary providers now use a shared default personal word file
    (largely obsoleting the DictWithPWL class)
  * Ability to exclude words using Dict.remove, remove_from_session
  * Dict.add_to_personal renamed to Dict.add
  * Dict.is_added/Dict.is_removed for checking membership of word lists
  * unicode PWL filenames now handled correctly on Windows

* upgrade bundled glib DLLs in Windows version

1.3.1 (2007-12-19)
------------------

* treat combining unicode marks as letters during tokenization
* cleanup of wxSpellCheckerDialog, thanks to Phil Mayes
* upgrades of bundled components in Windows version:

  * upgraded glib DLLs
  * latest dictionaries from OpenOffice.org
  * latest version of Hunspell

1.3.0 (2006-12-29)
------------------

* Re-worked the tokenization API to allow filters but still remove
  non-alpha-numeric characters from words by default. This introduces some
  minor backward-incompatibilities to the API, hence the full minor version
  bump.
* 'fallback' argument to get_tokenizer() was removed, just catch the Error
  and re-try with whatever is appropriate for your application.
* filters should be passed into get_tokenizer() as the second argument,
  rather than applied as separate functions.
* Basic whitespace-and-punctuation tokenization separated from the
  language-specific parts.
* Internal details of Filter classes expanded and generalized
* English tokenization rules reverted to 1.1.5 version

1.2.0 (2006-11-05)
------------------

* Implemented "filters" that allow tokenization to skip common word forms
  such as URLs, WikiWords, email addresses etc.
* Now ships with enchant-1.3.0, meaning:

  * PWLs can return a useful list of suggestions rather than the empty list
  * Hunspell replaces MySpell as the default Windows backend
  * Tokenization doesn't split words at non-alpha characters by default

* GtkSpellCheckerDialog contributed by Fredrik Corneliusson
* Removed deprecated functionality:

  * Dict.add_to_personal
  * All registry handling functionality from enchant.utils
  * enchant.utils.SpellChecker (use enchant.checker.SpellChecker)

* Removed PyPWL, as native enchant PWLs can now suggest corrections

1.1.5 (2006-01-19)
------------------

* Fix hang in included MySpell (Windows distribution)
* Workaround for some MySpell/unicode problems
* Update to latest setuptools ez_setup.py

1.1.4 (2006-01-09)
------------------

* No longer need to use the registry under Windows
* Moved to setuptools for managing distribution
* Implemented unittest TestCases, works with `python setup.py test`
* Plugins on Windows moved to "enchant" subdirectory
* SpellChecker now coerces to/from unicode automatically
* Use python default encoding rather than UTF-8 where appropriate
* Various documentation cleanups
* bug fixes:

  * (1230151): count of live instances done by normalized key
  * Accept unicode strings as broker orderings

1.1.3 (2005-06-15)
------------------

* support for Python 2.2
* use 'locale' module to look up default language if none specified
* more and better regression tests
* mark deprecated interfaces with warnings
* removed parameter to Dict constructor, with lots of reshuffling behind
  the scenes
* add DictNotFoundError as a subclass of Error
* Remove de_AT from languages in the Windows version, it was causing errors
* bug fixes:

  * memory leak in DictWithPWL._free()
  * incorrect cache handling for PWLs


pyenchant-3.3.0rc1/LICENSE.txt

GNU LESSER GENERAL PUBLIC LICENSE
Version 2.1, February 1999

Copyright (C) 1991, 1999 Free Software Foundation, Inc.
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

[This is the first released version of the Lesser GPL. It also counts as the
successor of the GNU Library Public License, version 2, hence the version
number 2.1.]

Preamble

The licenses for most software are designed to take away your freedom to
share and change it. By contrast, the GNU General Public Licenses are
intended to guarantee your freedom to share and change free software--to
make sure the software is free for all its users.

This license, the Lesser General Public License, applies to some specially
designated software packages--typically libraries--of the Free Software
Foundation and other authors who decide to use it. You can use it too, but
we suggest you first think carefully about whether this license or the
ordinary General Public License is the better strategy to use in any
particular case, based on the explanations below.

When we speak of free software, we are referring to freedom of use, not
price. Our General Public Licenses are designed to make sure that you have
the freedom to distribute copies of free software (and charge for this
service if you wish); that you receive source code or can get it if you want
it; that you can change the software and use pieces of it in new free
programs; and that you are informed that you can do these things.
To protect your rights, we need to make restrictions that forbid distributors to deny you these rights or to ask you to surrender these rights. These restrictions translate to certain responsibilities for you if you distribute copies of the library or if you modify it. For example, if you distribute copies of the library, whether gratis or for a fee, you must give the recipients all the rights that we gave you. You must make sure that they, too, receive or can get the source code. If you link other code with the library, you must provide complete object files to the recipients, so that they can relink them with the library after making changes to the library and recompiling it. And you must show them these terms so they know their rights. We protect your rights with a two-step method: (1) we copyright the library, and (2) we offer you this license, which gives you legal permission to copy, distribute and/or modify the library. To protect each distributor, we want to make it very clear that there is no warranty for the free library. Also, if the library is modified by someone else and passed on, the recipients should know that what they have is not the original version, so that the original author's reputation will not be affected by problems that might be introduced by others. Finally, software patents pose a constant threat to the existence of any free program. We wish to make sure that a company cannot effectively restrict the users of a free program by obtaining a restrictive license from a patent holder. Therefore, we insist that any patent license obtained for a version of the library must be consistent with the full freedom of use specified in this license. Most GNU software, including some libraries, is covered by the ordinary GNU General Public License. This license, the GNU Lesser General Public License, applies to certain designated libraries, and is quite different from the ordinary General Public License. We use this license for certain libraries in order to permit linking those libraries into non-free programs. When a program is linked with a library, whether statically or using a shared library, the combination of the two is legally speaking a combined work, a derivative of the original library. The ordinary General Public License therefore permits such linking only if the entire combination fits its criteria of freedom. The Lesser General Public License permits more lax criteria for linking other code with the library. We call this license the "Lesser" General Public License because it does Less to protect the user's freedom than the ordinary General Public License. It also provides other free software developers Less of an advantage over competing non-free programs. These disadvantages are the reason we use the ordinary General Public License for many libraries. However, the Lesser license provides advantages in certain special circumstances. For example, on rare occasions, there may be a special need to encourage the widest possible use of a certain library, so that it becomes a de-facto standard. To achieve this, non-free programs must be allowed to use the library. A more frequent case is that a free library does the same job as widely used non-free libraries. In this case, there is little to gain by limiting the free library to free software only, so we use the Lesser General Public License. In other cases, permission to use a particular library in non-free programs enables a greater number of people to use a large body of free software. 
For example, permission to use the GNU C Library in non-free programs enables many more people to use the whole GNU operating system, as well as its variant, the GNU/Linux operating system. Although the Lesser General Public License is Less protective of the users' freedom, it does ensure that the user of a program that is linked with the Library has the freedom and the wherewithal to run that program using a modified version of the Library. The precise terms and conditions for copying, distribution and modification follow. Pay close attention to the difference between a "work based on the library" and a "work that uses the library". The former contains code derived from the library, whereas the latter must be combined with the library in order to run. GNU LESSER GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License Agreement applies to any software library or other program which contains a notice placed by the copyright holder or other authorized party saying it may be distributed under the terms of this Lesser General Public License (also called "this License"). Each licensee is addressed as "you". A "library" means a collection of software functions and/or data prepared so as to be conveniently linked with application programs (which use some of those functions and data) to form executables. The "Library", below, refers to any such software library or work which has been distributed under these terms. A "work based on the Library" means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language. (Hereinafter, translation is included without limitation in the term "modification".) "Source code" for a work means the preferred form of the work for making modifications to it. For a library, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the library. Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running a program using the Library is not restricted, and output from such a program is covered only if its contents constitute a work based on the Library (independent of the use of the Library in a tool for writing it). Whether that is true depends on what the Library does and what the program that uses the Library does. 1. You may copy and distribute verbatim copies of the Library's complete source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and distribute a copy of this License along with the Library. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Library or any portion of it, thus forming a work based on the Library, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) The modified work must itself be a software library. 
b) You must cause the files modified to carry prominent notices stating that you changed the files and the date of any change. c) You must cause the whole of the work to be licensed at no charge to all third parties under the terms of this License. d) If a facility in the modified Library refers to a function or a table of data to be supplied by an application program that uses the facility, other than as an argument passed when the facility is invoked, then you must make a good faith effort to ensure that, in the event an application does not supply such function or table, the facility still operates, and performs whatever part of its purpose remains meaningful. (For example, a function in a library to compute square roots has a purpose that is entirely well-defined independent of the application. Therefore, Subsection 2d requires that any application-supplied function or table used by this function must be optional: if the application does not supply it, the square root function must still compute square roots.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Library, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Library, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Library. In addition, mere aggregation of another work not based on the Library with the Library (or with a work based on the Library) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may opt to apply the terms of the ordinary GNU General Public License instead of this License to a given copy of the Library. To do this, you must alter all the notices that refer to this License, so that they refer to the ordinary GNU General Public License, version 2, instead of to this License. (If a newer version than version 2 of the ordinary GNU General Public License has appeared, then you can specify that version instead if you wish.) Do not make any other change in these notices. Once this change is made in a given copy, it is irreversible for that copy, so the ordinary GNU General Public License applies to all subsequent copies and derivative works made from that copy. This option is useful when you wish to copy part of the code of the Library into a program that is not a library. 4. You may copy and distribute the Library (or a portion or derivative of it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange. 
If distribution of object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place satisfies the requirement to distribute the source code, even though third parties are not compelled to copy the source along with the object code. 5. A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a "work that uses the Library". Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License. However, linking a "work that uses the Library" with the Library creates an executable that is a derivative of the Library (because it contains portions of the Library), rather than a "work that uses the library". The executable is therefore covered by this License. Section 6 states terms for distribution of such executables. When a "work that uses the Library" uses material from a header file that is part of the Library, the object code for the work may be a derivative work of the Library even though the source code is not. Whether this is true is especially significant if the work can be linked without the Library, or if the work is itself a library. The threshold for this to be true is not precisely defined by law. If such an object file uses only numerical parameters, data structure layouts and accessors, and small macros and small inline functions (ten lines or less in length), then the use of the object file is unrestricted, regardless of whether it is legally a derivative work. (Executables containing this object code plus portions of the Library will still fall under Section 6.) Otherwise, if the work is a derivative of the Library, you may distribute the object code for the work under the terms of Section 6. Any executables containing that work also fall under Section 6, whether or not they are linked directly with the Library itself. 6. As an exception to the Sections above, you may also combine or link a "work that uses the Library" with the Library to produce a work containing portions of the Library, and distribute that work under terms of your choice, provided that the terms permit modification of the work for the customer's own use and reverse engineering for debugging such modifications. You must give prominent notice with each copy of the work that the Library is used in it and that the Library and its use are covered by this License. You must supply a copy of this License. If the work during execution displays copyright notices, you must include the copyright notice for the Library among them, as well as a reference directing the user to the copy of this License. Also, you must do one of these things: a) Accompany the work with the complete corresponding machine-readable source code for the Library including whatever changes were used in the work (which must be distributed under Sections 1 and 2 above); and, if the work is an executable linked with the Library, with the complete machine-readable "work that uses the Library", as object code and/or source code, so that the user can modify the Library and then relink to produce a modified executable containing the modified Library. (It is understood that the user who changes the contents of definitions files in the Library will not necessarily be able to recompile the application to use the modified definitions.) b) Use a suitable shared library mechanism for linking with the Library. 
A suitable mechanism is one that (1) uses at run time a copy of the library already present on the user's computer system, rather than copying library functions into the executable, and (2) will operate properly with a modified version of the library, if the user installs one, as long as the modified version is interface-compatible with the version that the work was made with. c) Accompany the work with a written offer, valid for at least three years, to give the same user the materials specified in Subsection 6a, above, for a charge no more than the cost of performing this distribution. d) If distribution of the work is made by offering access to copy from a designated place, offer equivalent access to copy the above specified materials from the same place. e) Verify that the user has already received a copy of these materials or that you have already sent this user a copy. For an executable, the required form of the "work that uses the Library" must include any data and utility programs needed for reproducing the executable from it. However, as a special exception, the materials to be distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. It may happen that this requirement contradicts the license restrictions of other proprietary libraries that do not normally accompany the operating system. Such a contradiction means you cannot use both them and the Library together in an executable that you distribute. 7. You may place library facilities that are a work based on the Library side-by-side in a single library together with other library facilities not covered by this License, and distribute such a combined library, provided that the separate distribution of the work based on the Library and of the other library facilities is otherwise permitted, and provided that you do these two things: a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities. This must be distributed under the terms of the Sections above. b) Give prominent notice with the combined library of the fact that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work. 8. You may not copy, modify, sublicense, link with, or distribute the Library except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, link with, or distribute the Library is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 9. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Library or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Library (or any work based on the Library), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Library or works based on it. 10. 
Each time you redistribute the Library (or any work based on the Library), the recipient automatically receives a license from the original licensor to copy, distribute, link with or modify the Library subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties with this License. 11. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Library at all. For example, if a patent license would not permit royalty-free redistribution of the Library by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Library. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply, and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 12. If the distribution and/or use of the Library is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Library under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 13. The Free Software Foundation may publish revised and/or new versions of the Lesser General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Library specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Library does not specify a license version number, you may choose any version ever published by the Free Software Foundation. 14. If you wish to incorporate parts of the Library into other free programs whose distribution conditions are incompatible with these, write to the author to ask for permission. 
For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Libraries If you develop a new library, and you want it to be of the greatest possible use to the public, we recommend making it free software that everyone can redistribute and change. You can do so by permitting redistribution under these terms (or, alternatively, under the terms of the ordinary General Public License). To apply these terms, attach the following notices to the library. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Also add information on how to contact you by electronic and paper mail. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the library, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the library `Frob' (a library for tweaking knobs) written by James Random Hacker. , 1 April 1990 Ty Coon, President of Vice That's all there is to it! 
pyenchant-3.3.0rc1/MANIFEST.in

include Changelog
include README.rst
graft enchant/data/
include LICENSE.txt


pyenchant-3.3.0rc1/PKG-INFO

Metadata-Version: 2.1
Name: pyenchant
Version: 3.3.0rc1
Summary: Python bindings for the Enchant spellchecking system
Home-page: https://pyenchant.github.io/pyenchant/
Author: Dimitri Merejkowsky
Author-email: d.merej@gmail.com
License: LGPL
Project-URL: Changelog, https://pyenchant.github.io/pyenchant/changelog.html
Project-URL: Source, https://github.com/pyenchant/pyenchant
Project-URL: Tracker, https://github.com/pyenchant/pyenchant/issues
Keywords: spelling spellcheck enchant
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.7
License-File: LICENSE.txt

pyenchant: Python bindings for the Enchant spellchecker
========================================================

.. image:: https://img.shields.io/pypi/v/pyenchant.svg
   :target: https://pypi.org/project/pyenchant

.. image:: https://img.shields.io/pypi/pyversions/pyenchant.svg
   :target: https://pypi.org/project/pyenchant

.. image:: https://github.com/pyenchant/pyenchant/workflows/tests/badge.svg
   :target: https://github.com/pyenchant/pyenchant/actions

.. image:: https://builds.sr.ht/~dmerej/pyenchant.svg
   :target: https://builds.sr.ht/~dmerej/pyenchant

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/psf/black

This package provides a set of Python language bindings for the Enchant
spellchecking library. For more information, visit the project website:

http://pyenchant.github.io/pyenchant/

What is Enchant?
----------------

Enchant is used to check the spelling of words and suggest corrections for
words that are misspelled. It can use many popular spellchecking packages to
perform this task, including ispell, aspell and MySpell. It is quite flexible
at handling multiple dictionaries and multiple languages.

More information is available on the Enchant website:

https://abiword.github.io/enchant/

How do I use it?
----------------

For Windows users, install the pre-built binary packages using pip::

    pip install pyenchant

These packages bundle a pre-built copy of the underlying enchant library.
Users on other platforms will need to install "enchant" using their system
package manager (brew on macOS).
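As a quick sanity check of the installation, a minimal spellchecking session
looks roughly like the following sketch (it assumes a US English dictionary
is available; the exact suggestion list depends on the provider in use)::

    >>> import enchant
    >>> d = enchant.Dict("en_US")   # request a dictionary for US English
    >>> d.check("enchant")          # correctly spelled words return True
    True
    >>> d.suggest("enchnt")         # ask for corrections of a misspelling
    ['enchant', 'enchants', ...]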
Once the software is installed, python's on-line help facilities can
get you started. Launch python and issue the following commands:

>>> import enchant
>>> help(enchant)

Who is responsible for all this?
--------------------------------

The credit for Enchant itself goes to Dom Lachowicz. Find out more details
on the Enchant website listed above. Full marks to Dom for producing such a
high-quality library.

The glue to pull Enchant into Python via ctypes was written by Ryan Kelly.
He needed a decent spellchecker for another project he was working on, and
all the solutions turned up by Google were either extremely non-portable
(e.g. opening a pipe to ispell) or had completely disappeared from the web
(what happened to SnakeSpell?) It was also a great excuse to teach himself
about SWIG, ctypes, and even a little bit of the Python/C API.

Finally, after Ryan stepped down from the project, Dimitri Merejkowsky
became the new maintainer.


pyenchant-3.3.0rc1/README.rst

pyenchant: Python bindings for the Enchant spellchecker
========================================================

.. image:: https://img.shields.io/pypi/v/pyenchant.svg
   :target: https://pypi.org/project/pyenchant

.. image:: https://img.shields.io/pypi/pyversions/pyenchant.svg
   :target: https://pypi.org/project/pyenchant

.. image:: https://github.com/pyenchant/pyenchant/workflows/tests/badge.svg
   :target: https://github.com/pyenchant/pyenchant/actions

.. image:: https://builds.sr.ht/~dmerej/pyenchant.svg
   :target: https://builds.sr.ht/~dmerej/pyenchant

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/psf/black

This package provides a set of Python language bindings for the Enchant
spellchecking library. For more information, visit the project website:

http://pyenchant.github.io/pyenchant/

What is Enchant?
----------------

Enchant is used to check the spelling of words and suggest corrections for
words that are misspelled. It can use many popular spellchecking packages to
perform this task, including ispell, aspell and MySpell. It is quite flexible
at handling multiple dictionaries and multiple languages.

More information is available on the Enchant website:

https://abiword.github.io/enchant/

How do I use it?
----------------

For Windows users, install the pre-built binary packages using pip::

    pip install pyenchant

These packages bundle a pre-built copy of the underlying enchant library.
Users on other platforms will need to install "enchant" using their system
package manager (brew on macOS).

Once the software is installed, python's on-line help facilities can
get you started. Launch python and issue the following commands:

>>> import enchant
>>> help(enchant)

Who is responsible for all this?
--------------------------------

The credit for Enchant itself goes to Dom Lachowicz. Find out more details
on the Enchant website listed above. Full marks to Dom for producing such a
high-quality library.

The glue to pull Enchant into Python via ctypes was written by Ryan Kelly.
He needed a decent spellchecker for another project he was working on, and
all the solutions turned up by Google were either extremely non-portable
(e.g. opening a pipe to ispell) or had completely disappeared from the web
(what happened to SnakeSpell?) It was also a great excuse to teach himself
about SWIG, ctypes, and even a little bit of the Python/C API.
Finally, after Ryan stepped down from the project, Dimitri Merejkowsky
became the new maintainer.


pyenchant-3.3.0rc1/enchant/__init__.py

# pyenchant
#
# Copyright (C) 2004-2011, Ryan Kelly
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the
# Free Software Foundation, Inc., 59 Temple Place - Suite 330,
# Boston, MA 02111-1307, USA.
#
# In addition, as a special exception, you are
# given permission to link the code of this program with
# non-LGPL Spelling Provider libraries (eg: a MSFT Office
# spell checker backend) and distribute linked combinations including
# the two. You must obey the GNU Lesser General Public License in all
# respects for all of the code used other than said providers. If you modify
# this file, you may extend this exception to your version of the
# file, but you are not obligated to do so. If you do not wish to
# do so, delete this exception statement from your version.
#
"""
enchant: Access to the enchant spellchecking library
=====================================================

This module provides several classes for performing spell checking
via the Enchant spellchecking library.

For more details on Enchant, visit the project website:
https://abiword.github.io/enchant/

Spellchecking is performed using :py:class:`Dict` objects, which represent
a language dictionary. Their use is best demonstrated by a quick example::

    >>> import enchant
    >>> d = enchant.Dict("en_US")   # create dictionary for US English
    >>> d.check("enchant")
    True
    >>> d.check("enchnt")
    False
    >>> d.suggest("enchnt")
    ['enchant', 'enchants', 'enchanter', 'penchant', 'incant', 'enchain', 'enchanted']

Languages are identified by standard string tags such as "en" (English) and
"fr" (French). Specific language dialects can be specified by including an
additional code - for example, "en_AU" refers to Australian English. The
latter form is preferred as it is more widely supported.

To check whether a dictionary exists for a given language, the function
:py:func:`dict_exists` is available. Dictionaries may also be created using
the function :py:func:`request_dict`.

A finer degree of control over the dictionaries and how they are created can
be obtained using one or more :py:class:`Broker` objects. These objects are
responsible for locating dictionaries for a specific language.

Note that unicode strings are expected throughout the entire API.
Bytestrings should not be passed into any function.

Errors that occur in this module are reported by raising subclasses
of :py:exc:`~.errors.Error`.
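The availability of a dictionary can also be tested before one is requested.
The following sketch assumes a US English dictionary is installed; the
results depend on the providers and languages available on your system::

    >>> enchant.dict_exists("en_US")
    True
    >>> broker = enchant.Broker()
    >>> "en_US" in broker.list_languages()
    True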
""" _DOC_ERRORS = ["enchnt", "enchnt", "incant", "fr"] __version__ = "3.3.0rc1" import os import warnings try: from enchant import _enchant as _e except ImportError: if not os.environ.get("PYENCHANT_IGNORE_MISSING_LIB", False): raise _e = None # type: ignore from typing import Any, List, NoReturn, Optional, Tuple, Type, Union # noqa F401 from enchant.errors import * # noqa F401,F403 from enchant.errors import DictNotFoundError, Error from enchant.pypwl import PyPWL from enchant.utils import get_default_language class ProviderDesc: """Simple class describing an Enchant provider. Each provider has the following information associated with it: * :py:attr:`name`: Internal provider name (e.g. "aspell") * :py:attr:`desc`: Human-readable description (e.g. "Aspell Provider") * :py:attr:`file`: Location of the library containing the provider """ _DOC_ERRORS = ["desc"] def __init__(self, name: str, desc: str, file: str) -> None: self.name = name self.desc = desc self.file = file def __str__(self) -> str: return "" % self.desc def __repr__(self) -> str: return str(self) def __eq__(self, pd): """Equality operator on ProviderDesc objects.""" if not isinstance(pd, ProviderDesc): return False return self.name == pd.name and self.desc == pd.desc and self.file == pd.file def __hash__(self): """Hash operator on ProviderDesc objects.""" return hash(self.name + self.desc + self.file) class _EnchantObject: """Base class for enchant objects. This class implements some general functionality for interfacing with the '_enchant' C-library in a consistent way. All public objects from the 'enchant' module are subclasses of this class. All enchant objects have an attribute :py:attr:`_this` which contains the pointer to the underlying C-library object. The method :py:meth:`_check_this` can be called to ensure that this point is not None, raising an exception if it is. """ def __init__(self) -> None: """_EnchantObject constructor.""" self._this = None # To be importable when enchant C lib is missing, we need # to create a dummy default broker. if _e is not None: self._init_this() def _check_this(self, msg: str = None) -> None: """Check that :py:attr:`_this` is set to a pointer, rather than `None`.""" if self._this is None: if msg is None: msg = "%s unusable: the underlying C-library object has been freed." msg = msg % (self.__class__.__name__,) raise Error(msg) def _init_this(self) -> None: """Initialise the underlying C-library object pointer.""" raise NotImplementedError def _raise_error( self, default: str = "Unspecified Error", eclass: Type[Error] = Error ) -> NoReturn: """Raise an exception based on available error messages. This method causes an :py:exc:`~.errors.Error` to be raised. Subclasses should override it to retrieve an error indication from the underlying API if possible. If such a message cannot be retrieved, the argument value `default` is used. The class of the exception can be specified using the argument `eclass` """ raise eclass(default) _raise_error._DOC_ERRORS = ["eclass"] # type: ignore def __getstate__(self): """Customize pickling of PyEnchant objects. Since it's not safe for multiple objects to share the same C-library object, we make sure it's unset when pickling. """ state = self.__dict__.copy() state["_this"] = None return state def __setstate__(self, state): self.__dict__.update(state) self._init_this() class Broker(_EnchantObject): """Broker object for the Enchant spellchecker. `Broker` objects are responsible for locating and managing dictionaries. 
Unless custom functionality is required, there is no need to use `Broker` objects directly. The py:mod:`enchant` module provides a default broker object so that :py:class:`Dict` objects can be created directly. The most important methods of this class include: * :py:meth:`dict_exists`: check existence of a specific language dictionary * :py:meth:`request_dict`: obtain a dictionary for specific language * :py:meth:`set_ordering`: specify which dictionaries to try for a given language. """ def __init__(self) -> None: """Broker object constructor. This method is the constructor for the `Broker` object. No arguments are required. """ super().__init__() def _init_this(self) -> None: self._this = _e.broker_init() if not self._this: raise Error("Could not initialise an enchant broker.") self._live_dicts = {} def __del__(self) -> None: """Broker object destructor.""" # Calling free() might fail if python is shutting down try: self._free() except (AttributeError, TypeError): pass def __getstate__(self): state = super().__getstate__() state.pop("_live_dicts") return state def _raise_error( self, default: str = "Unspecified Error", eclass: Type[Error] = Error ) -> NoReturn: """Overrides _EnchantObject._raise_error to check broker errors.""" err = _e.broker_get_error(self._this) if err == "" or err is None: raise eclass(default) raise eclass(err.decode()) def _free(self) -> None: """Free system resource associated with a Broker object. This method can be called to free the underlying system resources associated with a `Broker` object. It is called automatically when the object is garbage collected. If called explicitly, the `Broker` and any associated `Dict` objects must no longer be used. """ if self._this is not None: # During shutdown, this finalizer may be called before # some Dict finalizers. Ensure all pointers are freed. for (dict, count) in list(self._live_dicts.items()): while count: self._free_dict_data(dict) count -= 1 _e.broker_free(self._this) self._this = None def request_dict(self, tag: str = None) -> "Dict": """Request a :py:class:`Dict` object for the language specified by `tag`. This method constructs and returns a :py:class:`Dict` object for the requested language. `tag` should be a string of the appropriate form for specifying a language, such as "fr" (French) or "en_AU" (Australian English). The existence of a specific language can be tested using the method :py:meth:`dict_exists`. If `tag` is not given or is `None`, an attempt is made to determine the current language in use. If this cannot be determined, :py:exc:`~.errors.Error` is raised. .. note:: this method is functionally equivalent to calling the :py:class:`Dict()` constructor and passing in the `broker` argument. """ return Dict(tag, self) request_dict._DOC_ERRORS = ["fr"] # type: ignore def _request_dict_data(self, tag: str) -> _e.t_dict: """Request raw C pointer data for a dictionary. This method call passes on the call to the C library, and does some internal bookkeeping. """ self._check_this() new_dict = _e.broker_request_dict(self._this, tag.encode()) if new_dict is None: e_str = "Dictionary for language '%s' could not be found\n" e_str += "Please check https://pyenchant.github.io/pyenchant/ for details" self._raise_error(e_str % (tag,), DictNotFoundError) if new_dict not in self._live_dicts: self._live_dicts[new_dict] = 1 else: self._live_dicts[new_dict] += 1 return new_dict def request_pwl_dict(self, pwl: str) -> "Dict": """Request a Dict object for a personal word list. 
This method behaves as :py:meth:`request_dict` but rather than returning a dictionary for a specific language, it returns a dictionary referencing a personal word list. A personal word list is a file of custom dictionary entries, one word per line. """ self._check_this() new_dict = _e.broker_request_pwl_dict(self._this, pwl.encode()) if new_dict is None: e_str = "Personal Word List file '%s' could not be loaded" self._raise_error(e_str % (pwl,)) if new_dict not in self._live_dicts: self._live_dicts[new_dict] = 1 else: self._live_dicts[new_dict] += 1 d = Dict(False) d._switch_this(new_dict, self) return d def _free_dict(self, dict: "Dict") -> None: """Free memory associated with a dictionary. This method frees system resources associated with a `Dict` object. It is equivalent to calling the object`s `free' method. Once this method has been called on a dictionary, it must not be used again. """ self._free_dict_data(dict._this) dict._this = None dict._broker = None def _free_dict_data(self, dict: _e.t_dict) -> None: """Free the underlying pointer for a dict.""" self._check_this() _e.broker_free_dict(self._this, dict) self._live_dicts[dict] -= 1 if self._live_dicts[dict] == 0: del self._live_dicts[dict] def dict_exists(self, tag: str) -> bool: """Check availability of a dictionary. This method checks whether there is a dictionary available for the language specified by `tag`. It returns `True` if a dictionary is available, and `False` otherwise. """ self._check_this() val = _e.broker_dict_exists(self._this, tag.encode()) return bool(val) def set_ordering(self, tag: str, ordering: str) -> None: """Set dictionary preferences for a language. The Enchant library supports the use of multiple dictionary programs and multiple languages. This method specifies which dictionaries the broker should prefer when dealing with a given language. `tag` must be an appropriate language specification and `ordering` is a string listing the dictionaries in order of preference. For example a valid ordering might be "aspell,myspell,ispell". The value of `tag` can also be set to "*" to set a default ordering for all languages for which one has not been set explicitly. """ self._check_this() _e.broker_set_ordering(self._this, tag.encode(), ordering.encode()) def describe(self) -> List[ProviderDesc]: """Return list of provider descriptions. This method returns a list of descriptions of each of the dictionary providers available. Each entry in the list is a :py:class:`ProviderDesc` object. """ self._check_this() self.__describe_result = [] _e.broker_describe(self._this, self.__describe_callback) return [ProviderDesc(*r) for r in self.__describe_result] def __describe_callback(self, name: bytes, desc: bytes, file: bytes) -> None: """Collector callback for dictionary description. This method is used as a callback into the _enchant function `enchant_broker_describe`. It collects the given arguments in a tuple and appends them to the list :py:attr:`__describe_result`. """ name = name.decode() desc = desc.decode() file = file.decode() self.__describe_result.append((name, desc, file)) def list_dicts(self) -> List[Tuple[str, ProviderDesc]]: """Return list of available dictionaries. This method returns a list of dictionaries available to the broker. Each entry in the list is a two-tuple of the form: (tag,provider) where `tag` is the language lag for the dictionary and `provider` is a :py:class:`ProviderDesc` object describing the provider through which that dictionary can be obtained. 
""" self._check_this() self.__list_dicts_result = [] _e.broker_list_dicts(self._this, self.__list_dicts_callback) return [(r[0], ProviderDesc(*r[1])) for r in self.__list_dicts_result] def __list_dicts_callback(self, tag, name, desc, file): """Collector callback for listing dictionaries. This method is used as a callback into the _enchant function `enchant_broker_list_dicts`. It collects the given arguments into an appropriate tuple and appends them to :py:attr:`__list_dicts_result`. """ tag = tag.decode() name = name.decode() desc = desc.decode() file = file.decode() self.__list_dicts_result.append((tag, (name, desc, file))) def list_languages(self) -> List[str]: """List languages for which dictionaries are available. This function returns a list of language tags for which a dictionary is available. """ langs = [] for (tag, prov) in self.list_dicts(): if tag not in langs: langs.append(tag) return langs def __describe_dict(self, dict_data): """Get the description tuple for a dict data object. `dict_data` must be a C-library pointer to an enchant dictionary. The return value is a tuple of the form: (,,,) """ # Define local callback function cb_result = [] def cb_func(tag, name, desc, file): tag = tag.decode() name = name.decode() desc = desc.decode() file = file.decode() cb_result.append((tag, name, desc, file)) # Actually call the describer function _e.dict_describe(dict_data, cb_func) return cb_result[0] __describe_dict._DOC_ERRORS = ["desc"] # type: ignore def get_param(self, name: str) -> Any: """Get the value of a named parameter on this broker. Parameters are used to provide runtime information to individual provider backends. See the method :py:meth:`set_param` for more details. .. warning:: This method does **not** work when using the Enchant C library version 2.0 and above """ param = _e.broker_get_param(self._this, name.encode()) if param is not None: param = param.decode() return param get_param._DOC_ERRORS = ["param"] # type: ignore def set_param(self, name: str, value: Any) -> None: """Set the value of a named parameter on this broker. Parameters are used to provide runtime information to individual provider backends. .. warning:: This method does **not** work when using the Enchant C library version 2.0 and above """ name = name.encode() if value is not None: value = value.encode() _e.broker_set_param(self._this, name, value) class Dict(_EnchantObject): """Dictionary object for the Enchant spellchecker. Dictionary objects are responsible for checking the spelling of words and suggesting possible corrections. Each dictionary is owned by a :py:class:`Broker` object, but unless a new :py:class:`Broker` has explicitly been created then this will be the :py:mod:`enchant` module default :py:class:`Broker` and is of little interest. 
The important methods of this class include: * :py:meth:`check()`: check whether a word is spelled correctly * :py:meth:`suggest()`: suggest correct spellings for a word * :py:meth:`add()`: add a word to the user's personal dictionary * :py:meth:`remove()`: add a word to the user's personal exclude list * :py:meth:`add_to_session()`: add a word to the current spellcheck session * :py:meth:`store_replacement()`: indicate a replacement for a given word Information about the dictionary is available using the following attributes: * :py:attr:`tag`: the language tag of the dictionary * :py:attr:`provider`: a :py:class:`ProviderDesc` object for the dictionary provider """ def __init__( self, tag: Optional[str] = None, broker: Optional[Broker] = None ) -> None: """Dict object constructor. A dictionary belongs to a specific language, identified by the string `tag`. If the tag is not given or is `None`, an attempt to determine the language currently in use is made using the :py:mod:`locale` module. If the current language cannot be determined, :py:exc:`~.errors.Error` is raised. If `tag` is instead given the value of `False`, a 'dead' Dict object is created without any reference to a language. This is typically only useful within PyEnchant itself. Any other non-string value for `tag` raises :py:exc:`~.errors.Error`. Each dictionary must also have an associated `Broker` object which obtains the dictionary information from the underlying system. This may be specified using `broker`. If not given, the default broker is used. """ # Initialise misc object attributes to None self.provider = None # If no tag was given, use the default language if tag is None: tag = get_default_language() if tag is None: raise Error( "No tag specified and default language could not be determined." ) self.tag = tag # If no broker was given, use the default broker if broker is None: broker = _broker self._broker = broker # Now let the superclass initialise the C-library object super().__init__() def _init_this(self) -> None: # Create dead object if False was given as the tag. # Otherwise, use the broker to get C-library pointer data. self._this = None if self.tag: this = self._broker._request_dict_data(self.tag) self._switch_this(this, self._broker) def __del__(self) -> None: """Dict object destructor.""" # Calling free() might fail if python is shutting down try: self._free() except (AttributeError, TypeError): pass def _switch_this(self, this, broker: Broker) -> None: """Switch the underlying C-library pointer for this object. As all useful state for a `Dict` is stored by the underlying C-library pointer, it is very convenient to allow this to be switched at run-time. Pass a new dict data object into this method to affect the necessary changes. The creating `Broker` object (at the Python level) must also be provided. This should *never* *ever* be used by application code. It's a convenience for developers only, replacing the clunkier `data` parameter to `__init__` from earlier versions. """ # Free old dict data Dict._free(self) # Hook in the new stuff self._this = this self._broker = broker # Update object properties desc = self.__describe(check_this=False) self.tag = desc[0] self.provider = ProviderDesc(*desc[1:]) _switch_this._DOC_ERRORS = ["init"] # type: ignore def _check_this(self, msg: Optional[str] = None) -> None: """Extend `_EnchantObject._check_this()` to check Broker validity. It is possible for the managing Broker object to be freed without freeing the `Dict`. 
Thus validity checking must take into account `self._broker._this` as well as `self._this`. """ if self._broker is None or self._broker._this is None: self._this = None super()._check_this(msg) def _raise_error( self, default: str = "Unspecified Error", eclass: Type[Error] = Error ) -> NoReturn: """Overrides `_EnchantObject._raise_error` to check dict errors.""" err = _e.dict_get_error(self._this) if err == "" or err is None: raise eclass(default) raise eclass(err.decode()) def _free(self) -> None: """Free the system resources associated with a `Dict` object. This method frees underlying system resources for a `Dict` object. Once it has been called, the `Dict` object must no longer be used. It is called automatically when the object is garbage collected. """ if self._this is not None: # The broker may have been freed before the dict. # It will have freed the underlying pointers already. if self._broker is not None and self._broker._this is not None: self._broker._free_dict(self) def check(self, word: str) -> bool: """Check spelling of a word. This method takes a word in the dictionary language and returns `True` if it is correctly spelled, and `False` otherwise. """ self._check_this() # Enchant asserts that the word is non-empty. # Check it up-front to avoid nasty warnings on stderr. if len(word) == 0: raise ValueError("can't check spelling of empty string") val = _e.dict_check(self._this, word.encode()) if val == 0: return True if val > 0: return False self._raise_error() def suggest(self, word: str) -> List[str]: """Suggest possible spellings for a word. This method tries to guess the correct spelling for a given word, returning the possibilities in a list. """ self._check_this() # Enchant asserts that the word is non-empty. # Check it up-front to avoid nasty warnings on stderr. if len(word) == 0: raise ValueError("can't suggest spellings for empty string") suggs = _e.dict_suggest(self._this, word.encode()) return [w.decode() for w in suggs] def add(self, word: str) -> None: """Add a word to the user's personal word list.""" self._check_this() _e.dict_add(self._this, word.encode()) def remove(self, word: str) -> None: """Add a word to the user's personal exclude list.""" self._check_this() _e.dict_remove(self._this, word.encode()) def add_to_pwl(self, word: str) -> None: """Add a word to the user's personal word list.""" warnings.warn( "Dict.add_to_pwl is deprecated, please use Dict.add", category=DeprecationWarning, stacklevel=2, ) self._check_this() _e.dict_add_to_pwl(self._this, word.encode()) def add_to_session(self, word: str) -> None: """Add a word to the session personal list.""" self._check_this() _e.dict_add_to_session(self._this, word.encode()) def remove_from_session(self, word: str) -> None: """Add a word to the session exclude list.""" self._check_this() _e.dict_remove_from_session(self._this, word.encode()) def is_added(self, word: str) -> bool: """Check whether a word is in the personal word list.""" self._check_this() return _e.dict_is_added(self._this, word.encode()) def is_removed(self, word: str) -> bool: """Check whether a word is in the personal exclude list.""" self._check_this() return _e.dict_is_removed(self._this, word.encode()) def store_replacement(self, mis: str, cor: str) -> None: """Store a replacement spelling for a miss-spelled word. This method makes a suggestion to the spellchecking engine that the miss-spelled word `mis` is in fact correctly spelled as `cor`. 
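A rough sketch of how this fits with the other checking methods (assuming an "en_US" dictionary is installed; the exact suggestion lists depend on the provider backend in use)::

    import enchant

    d = enchant.Dict("en_US")
    print(d.check("Helo"))        # False: misspelled
    print(d.suggest("Helo"))      # e.g. ['Hello', 'Help', ...]
    d.store_replacement("Helo", "Hello")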
Such a suggestion will typically mean that `cor` appears early in the list of suggested spellings offered for later instances of `mis`. """ if not mis: raise ValueError("can't store replacement for an empty string") if not cor: raise ValueError("can't store empty string as a replacement") self._check_this() _e.dict_store_replacement(self._this, mis.encode(), cor.encode()) store_replacement._DOC_ERRORS = ["mis", "mis"] # type: ignore def __describe(self, check_this: bool = True) -> Tuple[str, str, str, str]: """Return a tuple describing the dictionary. This method returns a four-element tuple describing the underlying spellchecker system providing the dictionary. It will contain the following strings: * language tag * name of dictionary provider * description of dictionary provider * dictionary file Direct use of this method is not recommended - instead, access this information through the attributes :py:attr:`tag` and :py:attr:`provider`. """ if check_this: self._check_this() _e.dict_describe(self._this, self.__describe_callback) return self.__describe_result def __describe_callback(self, tag, name, desc, file): """Collector callback for dictionary description. This method is used as a callback into the `_enchant` function `enchant_dict_describe`. It collects the given arguments in a tuple and stores them in the attribute :py:attr:`__describe_result`. """ tag = tag.decode() name = name.decode() desc = desc.decode() file = file.decode() self.__describe_result = (tag, name, desc, file) class DictWithPWL(Dict): """Dictionary with separately-managed personal word list. .. note:: As of version 1.4.0, enchant manages a per-user pwl and exclude list. This class is now only needed if you want to explicitly maintain a separate word list in addition to the default one. This class behaves as the standard class :py:class:`Dict`, but also manages a personal word list stored in a separate file. The file must be specified at creation time by the `pwl` argument to the constructor. Words added to the dictionary are automatically appended to the pwl file. A personal exclude list can also be managed, by passing another filename to the constructor in the optional `pel` argument. If this is not given, requests to exclude words are ignored. If either `pwl` or `pel` are `None`, an in-memory word list is used. This will prevent calls to :py:meth:`add()` and :py:meth:`remove()` from affecting the user's default word lists. The `Dict` object managing the PWL is available as the :py:attr:`pwl` attribute. The `Dict` object managing the PEL is available as the :py:attr:`pel` attribute. To create a `DictWithPWL` from the user's default language, use `None` as the `tag` argument. """ _DOC_ERRORS = ["pel", "pel", "PEL", "pel"] def __init__( self, tag: str, pwl: str = None, pel: str = None, broker: Broker = None ) -> None: """DictWithPWL constructor. The argument `pwl`, if not `None,` names a file containing the personal word list. If this file does not exist, it is created with default permissions. The argument `pel`, if not `None,` names a file containing the personal exclude list. If this file does not exist, it is created with default permissions. 
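An illustrative sketch ("mywords.txt" and "excluded.txt" are hypothetical file names and will be created if missing; an "en_US" dictionary is assumed to be installed)::

    from enchant import DictWithPWL

    d = DictWithPWL("en_US", pwl="mywords.txt", pel="excluded.txt")
    d.add("pyenchant")              # appended to mywords.txt
    print(d.check("pyenchant"))     # True: found via the personal word list
    d.remove("irregardless")        # appended to excluded.txt
    print(d.check("irregardless"))  # False: excluded, even if a provider accepts it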
""" super().__init__(tag, broker) if pwl is not None: if not os.path.exists(pwl): f = open(pwl, "wt") f.close() del f self.pwl = self._broker.request_pwl_dict(pwl) else: self.pwl = PyPWL() if pel is not None: if not os.path.exists(pel): f = open(pel, "wt") f.close() del f self.pel = self._broker.request_pwl_dict( pel ) # type: Union[None, _e.t_dict, PyPWL] else: self.pel = PyPWL() def _check_this(self, msg: Optional[str] = None) -> None: """Extend :py:meth:`Dict._check_this()` to check PWL validity.""" if self.pwl is None: self._free() if self.pel is None: self._free() super()._check_this(msg) self.pwl._check_this(msg) self.pel._check_this(msg) def _free(self) -> None: """Extend :py:meth:`Dict._free()` to free the PWL as well.""" if self.pwl is not None: self.pwl._free() self.pwl = None if self.pel is not None: self.pel._free() self.pel = None super()._free() def check(self, word: str) -> bool: """Check spelling of a word. This method takes a word in the dictionary language and returns `True` if it is correctly spelled, and `False` otherwise. It checks both the dictionary and the personal word list. """ if self.pel.check(word): return False if self.pwl.check(word): return True if super().check(word): return True return False def suggest(self, word: str) -> List[str]: """Suggest possible spellings for a word. This method tries to guess the correct spelling for a given word, returning the possibilities in a list. """ suggs = super().suggest(word) suggs.extend([w for w in self.pwl.suggest(word) if w not in suggs]) for i in range(len(suggs) - 1, -1, -1): if self.pel.check(suggs[i]): del suggs[i] return suggs def add(self, word: str) -> None: """Add a word to the associated personal word list. This method adds the given word to the personal word list, and automatically saves the list to disk. """ self._check_this() self.pwl.add(word) self.pel.remove(word) def remove(self, word: str) -> None: """Add a word to the associated exclude list.""" self._check_this() self.pwl.remove(word) self.pel.add(word) def add_to_pwl(self, word: str) -> None: """Add a word to the associated personal word list. This method adds the given word to the personal word list, and automatically saves the list to disk. """ self._check_this() self.pwl.add_to_pwl(word) self.pel.remove(word) def is_added(self, word: str) -> bool: """Check whether a word is in the personal word list.""" self._check_this() return self.pwl.is_added(word) def is_removed(self, word: str) -> bool: """Check whether a word is in the personal exclude list.""" self._check_this() return self.pel.is_added(word) ## Create a module-level default broker object, and make its important ## methods available at the module level. _broker = Broker() request_dict = _broker.request_dict request_pwl_dict = _broker.request_pwl_dict dict_exists = _broker.dict_exists list_dicts = _broker.list_dicts list_languages = _broker.list_languages get_param = _broker.get_param set_param = _broker.set_param # Expose the "get_version" function. def get_enchant_version() -> str: """Get the version string for the underlying enchant library.""" return _e.get_version().decode() # Expose the "set_prefix_dir" function. def set_prefix_dir(path: str) -> None: """Set the prefix used by the Enchant library to find its plugins Called automatically when the Python library is imported when required. 
""" return _e.set_prefix_dir(path) set_prefix_dir._DOC_ERRORS = ["plugins"] def get_user_config_dir() -> str: """Return the path that will be used by some Enchant providers to look for custom dictionaries. """ return _e.get_user_config_dir().decode() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1684682588.0 pyenchant-3.3.0rc1/enchant/_enchant.py0000644000175000017500000003226214432433534017235 0ustar00dmerejdmerej# pyenchant # # Copyright (C) 2004-2008, Ryan Kelly # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the # Free Software Foundation, Inc., 59 Temple Place - Suite 330, # Boston, MA 02111-1307, USA. # # In addition, as a special exception, you are # given permission to link the code of this program with # non-LGPL Spelling Provider libraries (eg: a MSFT Office # spell checker backend) and distribute linked combinations including # the two. You must obey the GNU Lesser General Public License in all # respects for all of the code used other than said providers. If you modify # this file, you may extend this exception to your version of the # file, but you are not obligated to do so. If you do not wish to # do so, delete this exception statement from your version. # """ enchant._enchant: ctypes-based wrapper for enchant C library This module implements the low-level interface to the underlying C library for enchant. The interface is based on ctypes and tries to do as little as possible while making the higher-level components easier to write. The following conveniences are provided that differ from the underlying C API: * the "enchant" prefix has been removed from all functions, since python has a proper module system * callback functions do not take a user_data argument, since python has proper closures that can manage this internally * string lengths are not passed into functions such as dict_check, since python strings know how long they are """ import ctypes import ctypes.util import os import os.path import platform import sys import textwrap from ctypes import CFUNCTYPE, POINTER, c_char_p, c_int, c_size_t, c_void_p, pointer from typing import Callable, List, Optional, TypeVar # noqa F401 def from_prefix(prefix: str) -> str: find_message("finding from prefix ", prefix) assert os.path.exists(prefix), prefix + " does not exist" bin_path = os.path.join(prefix, "bin") enchant_dll_path = os.path.join(bin_path, "libenchant-2.dll") assert os.path.exists(enchant_dll_path), enchant_dll_path + " does not exist" # Make sure all the dlls found next to libenchant-2.dll # (libglib-2.0-0.dll, libgmodule-2.0-0.dll, ...) 
can be # used without having to modify %PATH% new_path = bin_path + os.pathsep + os.environ["PATH"] find_message("Prepending ", bin_path, " to %PATH%") os.environ["PATH"] = new_path return enchant_dll_path def from_env_var(library_path: str) -> str: find_message("using PYENCHANT_LIBRARY_PATH env var") assert os.path.exists(library_path), library_path + " does not exist" return library_path def from_package_resources() -> Optional[str]: if sys.platform != "win32": return None bits, _ = platform.architecture() if bits == "64bit": subdir = "mingw64" # hopefully this is compatible else: subdir = "mingw32" # ditto this_path = os.path.dirname(os.path.abspath(__file__)) data_path = os.path.join(this_path, "data", subdir) find_message("looking in ", data_path) if os.path.exists(data_path): return from_prefix(data_path) return None def from_system() -> Optional[str]: # Note: keep enchant-2 first find_message("looking in system") candidates = [ "enchant-2", "libenchant-2", "enchant", "libenchant", "enchant-1", "libenchant-1", ] # On M1 macs, Homebrew is in /opt, which isn't on the default # DYLD_LIBRARY_PATH. Look for the homebrew path explicitly. if sys.platform == "darwin" and platform.machine() == "arm64": candidates.extend( [f"/opt/homebrew/lib/{candidate}" for candidate in candidates] ) for name in candidates: find_message("with name ", name) res = ctypes.util.find_library(name) if res: return res return None VERBOSE_FIND = False def find_message(*args: str) -> None: if not VERBOSE_FIND: return print("pyenchant:: ", *args, sep="") def find_c_enchant_lib() -> Optional[str]: verbose = os.environ.get("PYENCHANT_VERBOSE_FIND") if verbose: global VERBOSE_FIND VERBOSE_FIND = True prefix = os.environ.get("PYENCHANT_ENCHANT_PREFIX") if prefix: return from_prefix(prefix) library_path = os.environ.get("PYENCHANT_LIBRARY_PATH") if library_path: return from_env_var(library_path) from_package = from_package_resources() if from_package: return from_package # Last chance return from_system() enchant_lib_path = find_c_enchant_lib() if enchant_lib_path is None: msg = textwrap.dedent( """\ The 'enchant' C library was not found and maybe needs to be installed. See https://pyenchant.github.io/pyenchant/install.html for details """ ) raise ImportError(msg) find_message("loading library ", enchant_lib_path) e = ctypes.cdll.LoadLibrary(enchant_lib_path) # Always assume the found enchant C dll is inside # the correct directory layout prefix_dir = os.path.dirname(os.path.dirname(enchant_lib_path)) if hasattr(e, "enchant_set_prefix_dir") and prefix_dir: find_message("setting prefix ", prefix_dir) e.enchant_set_prefix_dir(prefix_dir.encode()) def callback(restype, *argtypes): """Factory for generating callback function prototypes. This is factored into a factory so I can easily change the definition for experimentation or debugging. 
""" return CFUNCTYPE(restype, *argtypes) t_broker_desc_func = callback(None, c_char_p, c_char_p, c_char_p, c_void_p) t_dict_desc_func = callback(None, c_char_p, c_char_p, c_char_p, c_char_p, c_void_p) # Simple typedefs for readability t_broker = c_void_p t_dict = c_void_p _B = TypeVar("_B", bound=t_broker) _D = TypeVar("_D", bound=t_dict) # Now we can define the types of each function we are going to use broker_init = e.enchant_broker_init broker_init.argtypes = [] broker_init.restype = t_broker broker_free = e.enchant_broker_free broker_free.argtypes = [t_broker] broker_free.restype = None broker_request_dict = e.enchant_broker_request_dict broker_request_dict.argtypes = [t_broker, c_char_p] broker_request_dict.restype = t_dict broker_request_pwl_dict = e.enchant_broker_request_pwl_dict broker_request_pwl_dict.argtypes = [t_broker, c_char_p] broker_request_pwl_dict.restype = t_dict broker_free_dict = e.enchant_broker_free_dict broker_free_dict.argtypes = [t_broker, t_dict] broker_free_dict.restype = None broker_dict_exists = e.enchant_broker_dict_exists broker_dict_exists.argtypes = [t_broker, c_char_p] broker_dict_exists.restype = c_int broker_set_ordering = e.enchant_broker_set_ordering broker_set_ordering.argtypes = [t_broker, c_char_p, c_char_p] broker_set_ordering.restype = None broker_get_error = e.enchant_broker_get_error broker_get_error.argtypes = [t_broker] broker_get_error.restype = c_char_p broker_describe1 = e.enchant_broker_describe broker_describe1.argtypes = [t_broker, t_broker_desc_func, c_void_p] broker_describe1.restype = None def broker_describe(broker: _B, cbfunc) -> None: def cbfunc1(*args): cbfunc(*args[:-1]) broker_describe1(broker, t_broker_desc_func(cbfunc1), None) broker_list_dicts1 = e.enchant_broker_list_dicts broker_list_dicts1.argtypes = [t_broker, t_dict_desc_func, c_void_p] broker_list_dicts1.restype = None def broker_list_dicts(broker: _B, cbfunc) -> None: def cbfunc1(*args): cbfunc(*args[:-1]) broker_list_dicts1(broker, t_dict_desc_func(cbfunc1), None) try: broker_get_param = e.enchant_broker_get_param # type: Callable[[_B, bytes], bytes] except AttributeError: # Make the lookup error occur at runtime def broker_get_param(broker: _B, name: bytes) -> bytes: return e.enchant_broker_get_param(broker, name) else: broker_get_param.argtypes = [t_broker, c_char_p] # type: ignore broker_get_param.restype = c_char_p # type: ignore try: broker_set_param = ( e.enchant_broker_set_param ) # type: Callable[[_B, bytes, bytes], None] except AttributeError: # Make the lookup error occur at runtime def broker_set_param(broker: _B, name: bytes, value: bytes) -> None: return e.enchant_broker_set_param(broker, name, value) else: broker_set_param.argtypes = [t_broker, c_char_p, c_char_p] # type: ignore broker_set_param.restype = None # type: ignore try: get_version = e.enchant_get_version # type: Callable[[], bytes] except AttributeError: # Make the lookup error occur at runtime def get_version() -> bytes: return e.enchant_get_version() else: get_version.argtypes = [] # type: ignore get_version.restype = c_char_p # type: ignore try: set_prefix_dir = e.enchant_set_prefix_dir # type: Callable[[bytes], None] except AttributeError: # Make the lookup error occur at runtime def set_prefix_dir(path: bytes): return e.enchant_set_prefix_dir(path) else: set_prefix_dir.argtypes = [c_char_p] # type: ignore set_prefix_dir.restype = None # type: ignore try: get_user_config_dir = e.enchant_get_user_config_dir # type: Callable[[], bytes] except AttributeError: # Make the lookup error occur 
at runtime def get_user_config_dir() -> bytes: return e.enchant_get_user_config_dir() else: get_user_config_dir.argtypes = [] # type: ignore get_user_config_dir.restype = c_char_p # type: ignore dict_check1 = e.enchant_dict_check dict_check1.argtypes = [t_dict, c_char_p, c_size_t] dict_check1.restype = c_int def dict_check(dict: _D, word: bytes) -> int: return dict_check1(dict, word, len(word)) dict_suggest1 = e.enchant_dict_suggest dict_suggest1.argtypes = [t_dict, c_char_p, c_size_t, POINTER(c_size_t)] dict_suggest1.restype = POINTER(c_char_p) def dict_suggest(dict: _D, word: bytes) -> List[bytes]: num_suggs_p = pointer(c_size_t(0)) suggs_c = dict_suggest1(dict, word, len(word), num_suggs_p) suggs = [] n = 0 while n < num_suggs_p.contents.value: suggs.append(suggs_c[n]) n = n + 1 if num_suggs_p.contents.value > 0: dict_free_string_list(dict, suggs_c) return suggs dict_add1 = e.enchant_dict_add dict_add1.argtypes = [t_dict, c_char_p, c_size_t] dict_add1.restype = None def dict_add(dict: _D, word: bytes) -> None: return dict_add1(dict, word, len(word)) dict_add_to_pwl1 = e.enchant_dict_add dict_add_to_pwl1.argtypes = [t_dict, c_char_p, c_size_t] dict_add_to_pwl1.restype = None def dict_add_to_pwl(dict: _D, word: bytes) -> None: return dict_add_to_pwl1(dict, word, len(word)) dict_add_to_session1 = e.enchant_dict_add_to_session dict_add_to_session1.argtypes = [t_dict, c_char_p, c_size_t] dict_add_to_session1.restype = None def dict_add_to_session(dict: _D, word: bytes) -> None: return dict_add_to_session1(dict, word, len(word)) dict_remove1 = e.enchant_dict_remove dict_remove1.argtypes = [t_dict, c_char_p, c_size_t] dict_remove1.restype = None def dict_remove(dict: _D, word: bytes) -> None: return dict_remove1(dict, word, len(word)) dict_remove_from_session1 = e.enchant_dict_remove_from_session dict_remove_from_session1.argtypes = [t_dict, c_char_p, c_size_t] dict_remove_from_session1.restype = c_int def dict_remove_from_session(dict: _D, word: bytes) -> int: return dict_remove_from_session1(dict, word, len(word)) dict_is_added1 = e.enchant_dict_is_added dict_is_added1.argtypes = [t_dict, c_char_p, c_size_t] dict_is_added1.restype = c_int def dict_is_added(dict: _D, word: bytes) -> int: return dict_is_added1(dict, word, len(word)) dict_is_removed1 = e.enchant_dict_is_removed dict_is_removed1.argtypes = [t_dict, c_char_p, c_size_t] dict_is_removed1.restype = c_int def dict_is_removed(dict: _D, word: bytes) -> int: return dict_is_removed1(dict, word, len(word)) dict_store_replacement1 = e.enchant_dict_store_replacement dict_store_replacement1.argtypes = [t_dict, c_char_p, c_size_t, c_char_p, c_size_t] dict_store_replacement1.restype = None def dict_store_replacement(dict: _D, mis: bytes, cor: bytes) -> None: return dict_store_replacement1(dict, mis, len(mis), cor, len(cor)) dict_free_string_list = e.enchant_dict_free_string_list dict_free_string_list.argtypes = [t_dict, POINTER(c_char_p)] dict_free_string_list.restype = None dict_get_error = e.enchant_dict_get_error dict_get_error.argtypes = [t_dict] dict_get_error.restype = c_char_p dict_describe1 = e.enchant_dict_describe dict_describe1.argtypes = [t_dict, t_dict_desc_func, c_void_p] dict_describe1.restype = None def dict_describe(dict: _D, cbfunc) -> None: def cbfunc1(tag, name, desc, file, data): cbfunc(tag, name, desc, file) dict_describe1(dict, t_dict_desc_func(cbfunc1), None) ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1687012851.8770554 
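# Illustrative sketch (not part of the library): the environment variables
# read by find_c_enchant_lib() above control which C library gets loaded.
# They must be set before "import enchant" runs; the paths are hypothetical.
import os

os.environ["PYENCHANT_VERBOSE_FIND"] = "1"      # trace the search on stdout
os.environ["PYENCHANT_LIBRARY_PATH"] = "/usr/local/lib/libenchant-2.so"
# On Windows, an installation prefix containing bin/libenchant-2.dll can be
# used instead:
# os.environ["PYENCHANT_ENCHANT_PREFIX"] = r"C:\enchant"

import enchant  # noqa: E402  (imported after setting the variables on purpose)

print(enchant.get_enchant_version())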
pyenchant-3.3.0rc1/enchant/checker/0000755000175000017500000000000014443342764016511 5ustar00dmerejdmerej././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1684682588.0 pyenchant-3.3.0rc1/enchant/checker/CmdLineChecker.py0000644000175000017500000003001614432433534021655 0ustar00dmerejdmerej# pyenchant # # Copyright (C) 2004-2008, Ryan Kelly # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the # Free Software Foundation, Inc., 59 Temple Place - Suite 330, # Boston, MA 02111-1307, USA. # # In addition, as a special exception, you are # given permission to link the code of this program with # non-LGPL Spelling Provider libraries (eg: a MSFT Office # spell checker backend) and distribute linked combinations including # the two. You must obey the GNU Lesser General Public License in all # respects for all of the code used other than said providers. If you modify # this file, you may extend this exception to your version of the # file, but you are not obligated to do so. If you do not wish to # do so, delete this exception statement from your version. # """ enchant.checker.CmdLineChecker: Command-Line spell checker This module provides the class :py:class:`CmdLineChecker`, which interactively spellchecks a piece of text by interacting with the user on the command line. It can also be run as a script to spellcheck a file. """ import sys from argparse import ArgumentParser from typing import Optional from enchant.checker import SpellChecker # Helpers colors = { "normal": "\x1b[0m", "black": "\x1b[30m", "red": "\x1b[31m", "green": "\x1b[32m", "yellow": "\x1b[33m", "blue": "\x1b[34m", "purple": "\x1b[35m", "cyan": "\x1b[36m", "grey": "\x1b[90m", "gray": "\x1b[90m", "bold": "\x1b[1m", } def color(string: str, color: str = "normal", prefix: str = "") -> str: """ Change text color for the Linux terminal. Args: string (str): String to colorify color (str): Color to colorify the string in the following list: black, red, green, yellow, blue, purple, cyan, gr[ae]y prefix (str): Prefix to add to string (ex: Beginning of line graphics) """ if sys.stdout.isatty(): return colors[color] + prefix + string + colors["normal"] else: return prefix + string def success(string: str) -> str: return "[" + color("+", color="green") + "] " + string def error(string: str) -> str: return "[" + color("!", color="red") + "] " + string def warning(string: str) -> str: return "[" + color("*", color="yellow") + "] " + string def info(string: str) -> str: return "[" + color(".", color="blue") + "] " + string class CmdLineChecker: """A simple command-line spell checker. This class implements a simple command-line spell checker. It must be given a SpellChecker instance to operate on, and interacts with the user by printing instructions on stdout and reading commands from stdin. 
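An illustrative sketch of driving it programmatically (assuming an "en_US" dictionary is installed; :py:meth:`run` is interactive and reads commands from stdin)::

    from enchant.checker import SpellChecker
    from enchant.checker.CmdLineChecker import CmdLineChecker

    chkr = SpellChecker("en_US")
    chkr.set_text("This is sme text with a fw speling errors in it.")
    cmdln = CmdLineChecker(chkr)
    cmdln.run()                 # prompts on stdout, reads commands from stdin
    print(chkr.get_text())      # corrected text after the session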
""" _DOC_ERRORS = ["stdout", "stdin"] def __init__(self, checker: SpellChecker) -> None: self._stop = False self._checker = checker def get_checker(self, chkr: SpellChecker) -> SpellChecker: return self._checker def run(self) -> None: """Run the spellchecking loop.""" self._stop = False for err in self._checker: self.error = err self.print_error() self.print_suggestions() status = self.read_command() while not status and not self._stop: status = self.read_command() if self._stop: break def print_error(self) -> None: """print the spelling error to the console. Prints the misspelled word along with 100 characters of context on either side. This number was arbitrarily chosen and could be modified to be tunable or changed entirely. It seems to be enough context to be helpful though """ error_string = self._build_context( self.error.get_text(), self.error.word, self.error.wordpos ) print(error("ERROR: %s" % color(self.error.word, color="red"))) print(info("")) print(info(error_string)) print(info("")) @staticmethod def _build_context(text: str, error_word: str, error_start: int) -> str: """creates the context line. This function will search forward and backward from the error word to find the nearest newlines. it will return this line with the error word colored red.""" start_newline = text.rfind("\n", 0, error_start) end_newline = text.find("\n", error_start) return text[start_newline + 1 : end_newline].replace( error_word, color(error_word, color="red") ) def print_suggestions(self) -> None: """Prints out the suggestions for a given error. This function will add vertical pipes to separate choices as well as the index of the replacement as expected by the replace function. I don't believe zero indexing is a problem as long as the user can see the numbers :) """ result = "" suggestions = self.error.suggest() for index, sugg in enumerate(suggestions): if index == 0: result = ( result + color(str(index), color="yellow") + ": " + color(sugg, color="bold") ) else: result = ( result + " | " + color(str(index), color="yellow") + ": " + color(sugg, color="bold") ) print(info("HOW ABOUT:"), result) def print_help(self) -> None: print( info( color("0", color="yellow") + ".." + color("N", color="yellow") + ":\t" + color("replace", color="bold") + " with the numbered suggestion" ) ) print( info( color("R", color="cyan") + color("0", color="yellow") + ".." 
+ color("R", color="cyan") + color("N", color="yellow") + ":\t" + color("always replace", color="bold") + " with the numbered suggestion" ) ) print( info( color("i", color="cyan") + ":\t\t" + color("ignore", color="bold") + " this word" ) ) print( info( color("I", color="cyan") + ":\t\t" + color("always ignore", color="bold") + " this word" ) ) print( info( color("a", color="cyan") + ":\t\t" + color("add", color="bold") + " word to personal dictionary" ) ) print( info( color("e", color="cyan") + ":\t\t" + color("edit", color="bold") + " the word" ) ) print( info( color("q", color="cyan") + ":\t\t" + color("quit", color="bold") + " checking" ) ) print( info( color("h", color="cyan") + ":\t\tprint this " + color("help", color="bold") + " message" ) ) print(info("----------------------------------------------------")) self.print_suggestions() def read_command(self) -> bool: cmd = input(">> ") cmd = cmd.strip() if cmd.isdigit(): repl = int(cmd) suggs = self.error.suggest() if repl >= len(suggs): print(warning("No suggestion number"), repl) return False print( success( "Replacing '%s' with '%s'" % ( color(self.error.word, color="red"), color(suggs[repl], color="green"), ) ) ) self.error.replace(suggs[repl]) return True if cmd[0] == "R": if not cmd[1:].isdigit(): print(warning("Badly formatted command (try 'help')")) return False repl = int(cmd[1:]) suggs = self.error.suggest() if repl >= len(suggs): print(warning("No suggestion number"), repl) return False self.error.replace_always(suggs[repl]) return True if cmd == "i": return True if cmd == "I": self.error.ignore_always() return True if cmd == "a": self.error.add() return True if cmd == "e": replacement = get_input(info("New Word: ")) self.error.replace(replacement.strip()) return True if cmd == "q": self._stop = True return True if "help".startswith(cmd.lower()): self.print_help() return False print(warning("Badly formatted command (try 'help')")) return False def run_on_file( self, infile: str, outfile: Optional[str] = None, enc: Optional[str] = None ) -> None: """Run spellchecking on the named file. This method can be used to run the spellchecker over the named file. If `outfile` is not given, the corrected contents replace the contents of `infile`. If `outfile` is given, the corrected contents will be written to that file. Use "-" to have the contents written to stdout. If `enc` is given, it specifies the encoding used to read the file's contents into a unicode string. The output will be written in the same encoding. """ inStr = open(infile, "r", encoding=enc).read() self._checker.set_text(inStr) begin_msg = "Beginning spell check of %s" % infile print(info(begin_msg)) print(info("-" * len(begin_msg))) self.run() print(success("Completed spell check of %s" % infile)) outStr = self._checker.get_text() if outfile is None: outF = open(infile, "w", encoding=enc) elif outfile == "-": outF = sys.stdout else: outF = open(outfile, "w", encoding=enc) outF.write(outStr) outF.close() run_on_file._DOC_ERRORS = ["outfile", "infile", "outfile", "stdout"] # type: ignore def _run_as_script() -> None: """Run the command-line spellchecker as a script. This function allows the spellchecker to be invoked from the command-line to check spelling in a file. 
""" parser = ArgumentParser() parser.add_argument( "-o", "--output", dest="outfile", metavar="FILE", help="write changes into FILE" ) parser.add_argument( "-l", "--lang", dest="lang", metavar="TAG", default="en_US", help="use language idenfified by TAG", ) parser.add_argument( "-e", "--encoding", dest="enc", metavar="ENC", help="file is unicode with encoding ENC", ) parser.add_argument("infile", metavar="FILE", help="Input file name to check") args = parser.parse_args() # Create and run the checker chkr = SpellChecker(args.lang) cmdln = CmdLineChecker(chkr) cmdln.run_on_file(args.infile, args.outfile, args.enc) if __name__ == "__main__": _run_as_script() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1684682588.0 pyenchant-3.3.0rc1/enchant/checker/GtkSpellCheckerDialog.py0000644000175000017500000002367714432433534023226 0ustar00dmerejdmerej# GtkSpellCheckerDialog for pyenchant # # Copyright (C) 2004-2005, Fredrik Corneliusson # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the # Free Software Foundation, Inc., 59 Temple Place - Suite 330, # Boston, MA 02111-1307, USA. # # In addition, as a special exception, you are # given permission to link the code of this program with # non-LGPL Spelling Provider libraries (eg: a MSFT Office # spell checker backend) and distribute linked combinations including # the two. You must obey the GNU Lesser General Public License in all # respects for all of the code used other than said providers. If you modify # this file, you may extend this exception to your version of the # file, but you are not obligated to do so. If you do not wish to # do so, delete this exception statement from your version. 
# from typing import Any, Iterable, List, Optional import gtk from enchant.checker import SpellChecker # columns COLUMN_SUGGESTION = 0 def create_list_view(col_label: str) -> gtk.TreeView: # create list widget list_ = gtk.ListStore(str) list_view = gtk.TreeView(model=list_) list_view.set_rules_hint(True) list_view.get_selection().set_mode(gtk.SELECTION_SINGLE) # Add Colums renderer = gtk.CellRendererText() renderer.set_data("column", COLUMN_SUGGESTION) column = gtk.TreeViewColumn(col_label, renderer, text=COLUMN_SUGGESTION) list_view.append_column(column) return list_view class GtkSpellCheckerDialog(gtk.Window): def __init__(self, checker: SpellChecker, *args: Any, **kwargs: Any) -> None: super().__init__(*args, **kwargs) self.set_title("Spell check") self.set_default_size(350, 200) self._checker = checker self._numContext = 40 self.errors = None # create accel group accel_group = gtk.AccelGroup() self.add_accel_group(accel_group) # list of widgets to disable if there's no spell error left self._conditional_widgets = [] # type: List[gtk.Widget] conditional = self._conditional_widgets.append # layout mainbox = gtk.VBox(spacing=5) hbox = gtk.HBox(spacing=5) self.add(mainbox) mainbox.pack_start(hbox, padding=5) box1 = gtk.VBox(spacing=5) hbox.pack_start(box1, padding=5) conditional(box1) # unrecognized word text_view_lable = gtk.Label("Unrecognized word") text_view_lable.set_justify(gtk.JUSTIFY_LEFT) box1.pack_start(text_view_lable, False, False) text_view = gtk.TextView() text_view.set_wrap_mode(gtk.WRAP_WORD) text_view.set_editable(False) text_view.set_cursor_visible(False) self.error_text = text_view.get_buffer() text_buffer = text_view.get_buffer() text_buffer.create_tag("fg_black", foreground="black") text_buffer.create_tag("fg_red", foreground="red") box1.pack_start(text_view) # Change to change_to_box = gtk.HBox() box1.pack_start(change_to_box, False, False) change_to_label = gtk.Label("Change to:") self.replace_text = gtk.Entry() text_view_lable.set_justify(gtk.JUSTIFY_LEFT) change_to_box.pack_start(change_to_label, False, False) change_to_box.pack_start(self.replace_text) # scrolled window sw = gtk.ScrolledWindow() sw.set_shadow_type(gtk.SHADOW_ETCHED_IN) sw.set_policy(gtk.POLICY_AUTOMATIC, gtk.POLICY_AUTOMATIC) box1.pack_start(sw) self.suggestion_list_view = create_list_view("Suggestions") self.suggestion_list_view.connect("button_press_event", self._onButtonPress) self.suggestion_list_view.connect("cursor-changed", self._onSuggestionChanged) sw.add(self.suggestion_list_view) # ---Buttons---#000000#FFFFFF---------------------------------------------------- button_box = gtk.VButtonBox() hbox.pack_start(button_box, False, False) # Ignore button = gtk.Button("Ignore") button.connect("clicked", self._onIgnore) button.add_accelerator( "activate", accel_group, gtk.keysyms.Return, 0, gtk.ACCEL_VISIBLE ) button_box.pack_start(button) conditional(button) # Ignore all button = gtk.Button("Ignore All") button.connect("clicked", self._onIgnoreAll) button_box.pack_start(button) conditional(button) # Replace button = gtk.Button("Replace") button.connect("clicked", self._onReplace) button_box.pack_start(button) conditional(button) # Replace all button = gtk.Button("Replace All") button.connect("clicked", self._onReplaceAll) button_box.pack_start(button) conditional(button) # Recheck button button = gtk.Button("_Add") button.connect("clicked", self._onAdd) button_box.pack_start(button) conditional(button) # Close button button = gtk.Button(stock=gtk.STOCK_CLOSE) button.connect("clicked", 
self._onClose) button.add_accelerator( "activate", accel_group, gtk.keysyms.Escape, 0, gtk.ACCEL_VISIBLE ) button_box.pack_end(button) # dictionary label self._dict_lable = gtk.Label("Dictionary:%s" % (checker.dict.tag,)) mainbox.pack_start(self._dict_lable, False, False, padding=5) mainbox.show_all() def _onIgnore(self, w: gtk.Widget, *args: Any) -> None: print(["ignore"]) self._advance() def _onIgnoreAll(self, w: gtk.Widget, *args: Any) -> None: print(["ignore all"]) self._checker.ignore_always() self._advance() def _onReplace(self, *args: Any) -> None: print(["Replace"]) repl = self._getRepl() self._checker.replace(repl) self._advance() def _onReplaceAll(self, *args: Any) -> None: print(["Replace all"]) repl = self._getRepl() self._checker.replace_always(repl) self._advance() def _onAdd(self, *args: Any) -> None: """Callback for the "add" button.""" self._checker.add() self._advance() def _onClose(self, w: gtk.Widget, *args: Any) -> bool: self.emit("delete_event", gtk.gdk.Event(gtk.gdk.BUTTON_PRESS)) return True def _onButtonPress(self, widget: gtk.Widget, event) -> None: if event.type == gtk.gdk._2BUTTON_PRESS: print(["Double click!"]) self._onReplace() def _onSuggestionChanged(self, widget: gtk.Widget, *args: Any) -> None: selection = self.suggestion_list_view.get_selection() model, iter = selection.get_selected() if iter: suggestion = model.get_value(iter, COLUMN_SUGGESTION) self.replace_text.set_text(suggestion) def _getRepl(self) -> str: """Get the chosen replacement string.""" repl = self.replace_text.get_text() repl = self._checker.coerce_string(repl) return repl def _fillSuggestionList(self, suggestions: Iterable[str]) -> None: model = self.suggestion_list_view.get_model() model.clear() for suggestion in suggestions: value = "%s" % (suggestion,) model.append([value]) def updateUI(self) -> None: self._advance() def _disableButtons(self) -> None: for w in self._conditional_widgets: w.set_sensitive(False) def _enableButtons(self) -> None: for w in self._conditional_widgets: w.set_sensitive(True) def _advance(self) -> None: """Advance to the next error. This method advances the SpellChecker to the next error, if any. It then displays the error and some surrounding context, and well as listing the suggested replacements. """ # Disable interaction if no checker if self._checker is None: self._disableButtons() self.emit("check-done") return # Advance to next error, disable if not available try: self._checker.next() except StopIteration: self._disableButtons() self.error_text.set_text("") self._fillSuggestionList([]) self.replace_text.set_text("") return self._enableButtons() # Display error context with erroneous word in red self.error_text.set_text("") iter = self.error_text.get_iter_at_offset(0) append = self.error_text.insert_with_tags_by_name lContext = self._checker.leading_context(self._numContext) tContext = self._checker.trailing_context(self._numContext) append(iter, lContext, "fg_black") append(iter, self._checker.word, "fg_red") append(iter, tContext, "fg_black") # Display suggestions in the replacements list suggs = self._checker.suggest() self._fillSuggestionList(suggs) if suggs: self.replace_text.set_text(suggs[0]) else: self.replace_text.set_text("") def _test() -> None: from enchant.checker import SpellChecker text = "This is sme text with a fw speling errors in it. Here are a fw more to tst it ut." 
print(["BEFORE:", text]) chkr = SpellChecker("en_US", text) chk_dlg = GtkSpellCheckerDialog(chkr) chk_dlg.show() chk_dlg.connect("delete_event", gtk.main_quit) chk_dlg.updateUI() gtk.main() if __name__ == "__main__": _test() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1684682588.0 pyenchant-3.3.0rc1/enchant/checker/__init__.py0000644000175000017500000003571314432433534020625 0ustar00dmerejdmerej# pyenchant # # Copyright (C) 2004-2008, Ryan Kelly # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the # Free Software Foundation, Inc., 59 Temple Place - Suite 330, # Boston, MA 02111-1307, USA. # # In addition, as a special exception, you are # given permission to link the code of this program with # non-LGPL Spelling Provider libraries (eg: a MSFT Office # spell checker backend) and distribute linked combinations including # the two. You must obey the GNU Lesser General Public License in all # respects for all of the code used other than said providers. If you modify # this file, you may extend this exception to your version of the # file, but you are not obligated to do so. If you do not wish to # do so, delete this exception statement from your version. # """ enchant.checker: High-level spellchecking functionality ======================================================== This package is designed to host higher-level spellchecking functionality than is available in the base enchant package. It should make writing applications that follow common usage idioms significantly easier. The most useful class is :py:class:`SpellChecker`, which implements a spellchecking loop over a block of text. It is capable of modifying the text in-place if given an array of characters to work with. This package also contains several interfaces to the SpellChecker class, such as a wxPython GUI dialog and a command-line interface. """ import array import warnings from typing import List, Optional, Type, Union import enchant from enchant import Dict from enchant.errors import * # noqa F401,F403 from enchant.errors import ( DefaultLanguageNotFoundError, DictNotFoundError, TokenizerNotFoundError, ) from enchant.tokenize import Chunker, Filter, get_tokenizer, tokenize from enchant.utils import get_default_language class SpellChecker: """Class implementing stateful spellchecking behaviour. This class is designed to implement a spell-checking loop over a block of text, correcting/ignoring/replacing words as required. This loop is implemented using an iterator paradigm so it can be embedded inside other loops of control. The `SpellChecker` object is stateful, and the appropriate methods must be called to alter its state and affect the progress of the spell checking session. At any point during the checking session, the attribute :py:attr:`word` will hold the current erroneously spelled word under consideration. 
The action to take on this word is determined by calling methods such as :py:meth:`replace`, :py:meth:`replace_always` and :py:meth:`ignore_always`. Once this is done, calling :py:meth:`next` advances to the next misspelled word. As a quick (and rather silly) example, the following code replaces each misspelled word with the string "SPAM": >>> text = "This is sme text with a fw speling errors in it." >>> chkr = SpellChecker("en_US",text) >>> for err in chkr: ... err.replace("SPAM") ... >>> chkr.get_text() 'This is SPAM text with a SPAM SPAM errors in it.' >>> Internally, the `SpellChecker` always works with arrays of (possibly unicode) character elements. This allows the in-place modification of the string as it is checked, and is the closest thing Python has to a mutable string. The text can be set as any of a normal string, unicode string, character array or unicode character array. The :py:meth:`get_text` method will return the modified array object if an array is used, or a new string object if a string it used. Words input to the `SpellChecker` may be either plain strings or unicode objects. They will be converted to the same type as the text being checked, using python's default encoding/decoding settings. If using an array of characters with this object and the array is modified outside of the spellchecking loop, use the method :py:meth:`set_offset` to reposition the internal loop pointer to make sure it doesn't skip any words. """ _DOC_ERRORS = ["sme", "fw", "speling", "chkr", "chkr", "chkr"] def __init__( self, lang: Union[Dict, str] = None, text: Optional[str] = None, tokenize: Union[Type[tokenize], Filter] = None, chunkers: List[Chunker] = None, filters: List[Filter] = None, ) -> None: """Constructor for the `SpellChecker` class. `SpellChecker` objects can be created in two ways, depending on the nature of the first argument. If it is a string, it specifies a language tag from which a dictionary is created. Otherwise, it must be an :py:class:`enchant.Dict` object to be used. Optional keyword arguments are: :param text: to set the text to be checked at creation time :param tokenize: a custom tokenization function to use :param chunkers: a list of chunkers to apply during tokenization :param filters: a list of filters to apply during tokenization If `tokenize` is not given and the first argument is a :py:class:`Dict`, its `tag` attribute must be a language tag so that a tokenization function can be created automatically. If this attribute is missing the user's default language will be used. 
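An illustrative sketch of these options (assuming an "en_US" dictionary is installed, and using the EmailFilter, URLFilter and HTMLChunker helpers from :py:mod:`enchant.tokenize`)::

    import enchant
    from enchant.checker import SpellChecker
    from enchant.tokenize import EmailFilter, HTMLChunker, URLFilter

    chkr = SpellChecker(
        enchant.Dict("en_US"),
        chunkers=[HTMLChunker],
        filters=[EmailFilter, URLFilter],
    )
    chkr.set_text("Visit http://example.com or mail bob@example.com for sme <b>speling</b> tips.")
    for err in chkr:
        suggestions = err.suggest()
        if suggestions:
            err.replace(suggestions[0])   # accept the first suggestion
    print(chkr.get_text())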
""" if lang is None: lang = get_default_language() if isinstance(lang, (str, bytes)): try: dict = enchant.Dict(lang) except DictNotFoundError: raise DefaultLanguageNotFoundError(lang) from None else: dict = lang try: lang = dict.tag except AttributeError: lang = get_default_language() if lang is None: raise DefaultLanguageNotFoundError from None self.lang = lang self.dict = dict if tokenize is None: try: tokenize = get_tokenizer(lang, chunkers, filters) except TokenizerNotFoundError: # Fall back to default tokenization if no match for 'lang' tokenize = get_tokenizer(None, chunkers, filters) self._tokenize = tokenize self.word = None self.wordpos = None self._ignore_words = {} self._replace_words = {} # Default to the empty string as the text to be checked self._text = array.array("u") self._use_tostring = False self._tokens = iter([]) if text is not None: self.set_text(text) def __iter__(self): """Each SpellChecker object is its own iterator""" return self def set_text(self, text: str) -> None: """Set the text to be spell-checked. This method must be called, or the `text` argument supplied to the constructor, before calling the method :py:meth:`next()`. """ # Convert to an array object if necessary if isinstance(text, (str, bytes)): if type(text) is str: self._text = array.array("u", text) else: self._text = array.array("c", text) self._use_tostring = True else: self._text = text self._use_tostring = False self._tokens = self._tokenize(self._text) def get_text(self) -> str: """Return the spell-checked text.""" if self._use_tostring: return self._array_to_string(self._text) return self._text def _array_to_string(self, text): """Format an internal array as a standard string.""" if text.typecode == "u": return text.tounicode() return text.tostring() def wants_unicode(self) -> bool: """Check whether the checker wants unicode strings. This method will return `True` if the checker wants unicode strings as input, `False` if it wants normal strings. It's important to provide the correct type of string to the checker. """ return self._text.typecode == "u" def coerce_string(self, text: str, enc: Optional[str] = None) -> str: """Coerce string into the required type. This method can be used to automatically ensure that strings are of the correct type required by this checker - either unicode or standard. If there is a mismatch, conversion is done using python's default encoding unless another encoding is specified. """ if self.wants_unicode(): if not isinstance(text, str): if enc is None: return text.decode() else: return text.decode(enc) return text if not isinstance(text, bytes): if enc is None: return text.encode() else: return text.encode(enc) return text def __next__(self): return self.next() def next(self) -> "SpellChecker": """Process text up to the next spelling error. This method is designed to support the iterator protocol. Each time it is called, it will advance the :py:attr:`word` attribute to the next spelling error in the text. When no more errors are found, it will raise :py:exc:`StopIteration`. The method will always return `self`, so that it can be used sensibly in common idioms such as:: for err in checker: err.do_something() """ # Find the next spelling error. 
# The uncaught StopIteration from next(self._tokens) # will provide the StopIteration for this method while True: (word, pos) = next(self._tokens) # decode back to a regular string word = self._array_to_string(word) if self.dict.check(word): continue if word in self._ignore_words: continue self.word = word self.wordpos = pos if word in self._replace_words: self.replace(self._replace_words[word]) continue break return self def replace(self, repl: str) -> None: """Replace the current erroneous word with the given string.""" repl = self.coerce_string(repl) a_repl = array.array(self._text.typecode, repl) if repl: self.dict.store_replacement(self.word, repl) self._text[self.wordpos : self.wordpos + len(self.word)] = a_repl incr = len(repl) - len(self.word) self._tokens.set_offset(self._tokens.offset + incr, replaced=True) def replace_always(self, word: str, repl: Optional[str] = None) -> None: """Always replace given word with given replacement. If a single argument is given, this is used to replace the current erroneous word. If two arguments are given, that combination is added to the list for future use. """ if repl is None: repl = word word = self.word repl = self.coerce_string(repl) word = self.coerce_string(word) self._replace_words[word] = repl if self.word == word: self.replace(repl) def ignore_always(self, word: Optional[str] = None) -> None: """Add given word to list of words to ignore. If no word is given, the current erroneous word is added. """ if word is None: word = self.word word = self.coerce_string(word) if word not in self._ignore_words: self._ignore_words[word] = True def add_to_personal(self, word: Optional[str] = None) -> None: """Add given word to the personal word list. If no word is given, the current erroneous word is added. """ warnings.warn( "SpellChecker.add_to_personal is deprecated, " "please use SpellChecker.add", category=DeprecationWarning, stacklevel=2, ) self.add(word) def add(self, word: Optional[str] = None) -> None: """Add given word to the personal word list. If no word is given, the current erroneous word is added. """ if word is None: word = self.word self.dict.add(word) def suggest(self, word: Optional[str] = None) -> List[str]: """Return suggested spellings for the given word. If no word is given, the current erroneous word is used. """ if word is None: word = self.word suggs = self.dict.suggest(word) return suggs def check(self, word: str) -> bool: """Check correctness of the given word.""" return self.dict.check(word) def set_offset(self, off: int, whence: int = 0) -> None: """Set the offset of the tokenization routine. For more details on the purpose of the tokenization offset, see the documentation of the module :py:mod:`enchant.tokenize`. The optional argument `whence` indicates the method by which to change the offset: * 0 (the default) treats `off` as an increment * 1 treats `off` as a distance from the start * 2 treats `off` as a distance from the end """ if whence == 0: self._tokens.set_offset(self._tokens.offset + off) elif whence == 1: assert off > 0 self._tokens.set_offset(off) elif whence == 2: assert off > 0 self._tokens.set_offset(len(self._text) - 1 - off) else: raise ValueError("Invalid value for whence: %s" % (whence,)) def leading_context(self, chars: int) -> str: """Get `chars` characters of leading context. This method returns up to `chars` characters of leading context - the text that occurs in the string immediately before the current erroneous word. 
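An illustrative sketch of a checking loop that prints some context around each error (assuming an "en_US" dictionary is installed)::

    from enchant.checker import SpellChecker

    chkr = SpellChecker("en_US", "Teh cat sat on teh mat, with a speling error.")
    for err in chkr:
        print(err.leading_context(12), "[", err.word, "]", err.trailing_context(12))
        if err.word.lower() == "teh":
            err.replace_always("the")   # replace now and remember for later occurrences
        else:
            err.ignore_always()         # skip this word for the rest of the run
    print(chkr.get_text())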
""" start = max(self.wordpos - chars, 0) context = self._text[start : self.wordpos] return self._array_to_string(context) def trailing_context(self, chars: int) -> str: """Get `chars` characters of trailing context. This method returns up to `chars` characters of trailing context - the text that occurs in the string immediately after the current erroneous word. """ start = self.wordpos + len(self.word) end = min(start + chars, len(self._text)) context = self._text[start:end] return self._array_to_string(context) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1684682588.0 pyenchant-3.3.0rc1/enchant/checker/wxSpellCheckerDialog.py0000644000175000017500000002462214432433534023126 0ustar00dmerejdmerej# pyenchant # # Copyright (C) 2004-2008, Ryan Kelly # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the # Free Software Foundation, Inc., 59 Temple Place - Suite 330, # Boston, MA 02111-1307, USA. # # In addition, as a special exception, you are # given permission to link the code of this program with # non-LGPL Spelling Provider libraries (eg: a MSFT Office # spell checker backend) and distribute linked combinations including # the two. You must obey the GNU Lesser General Public License in all # respects for all of the code used other than said providers. If you modify # this file, you may extend this exception to your version of the # file, but you are not obligated to do so. If you do not wish to # do so, delete this exception statement from your version. # # Major code cleanup and re-write thanks to Phil Mayes, 2007 # """ enchant.checker.wxSpellCheckerDialog: wxPython spellchecker interface This module provides the class :py:class:`wxSpellCheckerDialog`, which provides a wxPython dialog that can be used as an interface to a spell checking session. Currently it is intended as a proof-of-concept and demonstration class, but it should be suitable for general-purpose use in a program. The class must be given an :py:class:`enchant.checker.SpellChecker` object with which to operate. It can (in theory...) be used in modal and non-modal modes. Use `Show()` when operating on an array of characters as it will modify the array in place, meaning other work can be done at the same time. Use `ShowModal()` when operating on a static string. """ _DOC_ERRORS = ["ShowModal"] from typing import Any, Optional import wx from enchant.checker import SpellChecker class wxSpellCheckerDialog(wx.Dialog): """Simple spellcheck dialog for wxPython This class implements a simple spellcheck interface for wxPython, in the form of a dialog. It's intended mainly of an example of how to do this, although it should be useful for applications that just need a simple graphical spellchecker. 
To use, a :py:class:`SpellChecker` instance must be created and passed to the dialog before it is shown: >>> chkr = SpellChecker("en_AU",text) >>> dlg = wxSpellCheckerDialog(chkr,None,-1,"") >>> dlg.Show() This is most useful when the text to be checked is in the form of a character array, as it will be modified in place as the user interacts with the dialog. For checking strings, the final result will need to be obtained from the `SpellChecker` object: >>> chkr = SpellChecker("en_AU",text) >>> dlg = wxSpellCheckerDialog(chkr,None,-1,"") >>> dlg.ShowModal() >>> text = chkr.get_text() Currently the checker must deal with strings of the same type as returned by wxPython - unicode or normal string depending on the underlying system. This needs to be fixed, somehow... """ _DOC_ERRORS = [ "dlg", "chkr", "dlg", "chkr", "dlg", "dlg", "chkr", "dlg", "chkr", "dlg", "ShowModal", "dlg", ] # Remember dialog size across invocations by storing it on the class sz = (300, 70) def __init__( self, checker: SpellChecker, parent: Optional[Any] = None, id: int = -1, title: str = "Checking Spelling...", size=wxSpellCheckerDialog.sz, style=wx.DEFAULT_DIALOG_STYLE | wx.RESIZE_BORDER, ) -> None: self._numContext = 40 self._checker = checker self._buttonsEnabled = True self.error_text = wx.TextCtrl( self, -1, "", style=wx.TE_MULTILINE | wx.TE_READONLY | wx.TE_RICH ) self.replace_text = wx.TextCtrl(self, -1, "", style=wx.TE_PROCESS_ENTER) self.replace_list = wx.ListBox(self, -1, style=wx.LB_SINGLE) self.InitLayout() wx.EVT_LISTBOX(self, self.replace_list.GetId(), self.OnReplSelect) wx.EVT_LISTBOX_DCLICK(self, self.replace_list.GetId(), self.OnReplace) def InitLayout(self) -> None: """Lay out controls and add buttons.""" sizer = wx.BoxSizer(wx.HORIZONTAL) txtSizer = wx.BoxSizer(wx.VERTICAL) btnSizer = wx.BoxSizer(wx.VERTICAL) replaceSizer = wx.BoxSizer(wx.HORIZONTAL) txtSizer.Add( wx.StaticText(self, -1, "Unrecognised Word:"), 0, wx.LEFT | wx.TOP, 5 ) txtSizer.Add(self.error_text, 1, wx.ALL | wx.EXPAND, 5) replaceSizer.Add( wx.StaticText(self, -1, "Replace with:"), 0, wx.ALL | wx.ALIGN_CENTER_VERTICAL, 5, ) replaceSizer.Add(self.replace_text, 1, wx.ALL | wx.ALIGN_CENTER_VERTICAL, 5) txtSizer.Add(replaceSizer, 0, wx.EXPAND, 0) txtSizer.Add(self.replace_list, 2, wx.ALL | wx.EXPAND, 5) sizer.Add(txtSizer, 1, wx.EXPAND, 0) self.buttons = [] for label, action, tip in ( ("Ignore", self.OnIgnore, "Ignore this word and continue"), ( "Ignore All", self.OnIgnoreAll, "Ignore all instances of this word and continue", ), ("Replace", self.OnReplace, "Replace this word"), ("Replace All", self.OnReplaceAll, "Replace all instances of this word"), ("Add", self.OnAdd, "Add this word to the dictionary"), ("Done", self.OnDone, "Finish spell-checking and accept changes"), ): btn = wx.Button(self, -1, label) btn.SetToolTip(wx.ToolTip(tip)) btnSizer.Add(btn, 0, wx.ALIGN_RIGHT | wx.ALL, 4) btn.Bind(wx.EVT_BUTTON, action) self.buttons.append(btn) sizer.Add(btnSizer, 0, wx.ALL | wx.EXPAND, 5) self.SetAutoLayout(True) self.SetSizer(sizer) sizer.Fit(self) def Advance(self) -> bool: """Advance to the next error. This method advances the SpellChecker to the next error, if any. It then displays the error and some surrounding context, and well as listing the suggested replacements. 
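The method returns `True` if a new error was found and displayed, and
`False` once checking is complete. The button callbacks rely on this,
performing their action and then advancing (a sketch mirroring the
pattern used elsewhere in this class)::

    def OnIgnoreAll(self, evt):
        self._checker.ignore_always()
        self.Advance()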
""" # Advance to next error, disable if not available try: self._checker.next() except StopIteration: self.EnableButtons(False) self.error_text.SetValue("") self.replace_list.Clear() self.replace_text.SetValue("") if self.IsModal(): # test needed for SetSpellChecker call # auto-exit when checking complete self.EndModal(wx.ID_OK) return False self.EnableButtons() # Display error context with erroneous word in red. # Restoring default style was misbehaving under win32, so # I am forcing the rest of the text to be black. self.error_text.SetValue("") self.error_text.SetDefaultStyle(wx.TextAttr(wx.BLACK)) lContext = self._checker.leading_context(self._numContext) self.error_text.AppendText(lContext) self.error_text.SetDefaultStyle(wx.TextAttr(wx.RED)) self.error_text.AppendText(self._checker.word) self.error_text.SetDefaultStyle(wx.TextAttr(wx.BLACK)) tContext = self._checker.trailing_context(self._numContext) self.error_text.AppendText(tContext) # Display suggestions in the replacements list suggs = self._checker.suggest() self.replace_list.Set(suggs) self.replace_text.SetValue(suggs and suggs[0] or "") return True def EnableButtons(self, state: bool = True) -> None: """Enable the checking-related buttons""" if state != self._buttonsEnabled: for btn in self.buttons[:-1]: btn.Enable(state) self._buttonsEnabled = state def GetRepl(self) -> str: """Get the chosen replacement string.""" repl = self.replace_text.GetValue() return repl def OnAdd(self, evt: Any) -> None: """Callback for the "add" button.""" self._checker.add() self.Advance() def OnDone(self, evt: Any) -> None: """Callback for the "close" button.""" wxSpellCheckerDialog.sz = self.error_text.GetSizeTuple() if self.IsModal(): self.EndModal(wx.ID_OK) else: self.Close() def OnIgnore(self, evt: Any) -> None: """Callback for the "ignore" button. This simply advances to the next error. """ self.Advance() def OnIgnoreAll(self, evt: Any) -> None: """Callback for the "ignore all" button.""" self._checker.ignore_always() self.Advance() def OnReplace(self, evt: Any) -> None: """Callback for the "replace" button.""" repl = self.GetRepl() if repl: self._checker.replace(repl) self.Advance() def OnReplaceAll(self, evt: Any) -> None: """Callback for the "replace all" button.""" repl = self.GetRepl() self._checker.replace_always(repl) self.Advance() def OnReplSelect(self, evt: Any) -> None: """Callback when a new replacement option is selected.""" sel = self.replace_list.GetSelection() if sel == -1: return opt = self.replace_list.GetString(sel) self.replace_text.SetValue(opt) def _test(): class TestDialog(wxSpellCheckerDialog): def __init__(self, *args): super().__init__(*args) wx.EVT_CLOSE(self, self.OnClose) def OnClose(self, evnt): print(["AFTER:", dlg._checker.get_text()]) self.Destroy() from enchant.checker import SpellChecker text = "This is sme text with a fw speling errors in it. Here are a fw more to tst it ut." 
print(["BEFORE:", text]) chkr = SpellChecker("en_US", text) app = wx.App(False) dlg = TestDialog(chkr) dlg.Show() app.MainLoop() if __name__ == "__main__": _test() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1684682588.0 pyenchant-3.3.0rc1/enchant/errors.py0000644000175000017500000000405314432433534016767 0ustar00dmerejdmerej# pyenchant # # Copyright (C) 2004-2008, Ryan Kelly # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the # Free Software Foundation, Inc., 59 Temple Place - Suite 330, # Boston, MA 02111-1307, USA. # # In addition, as a special exception, you are # given permission to link the code of this program with # non-LGPL Spelling Provider libraries (eg: a MSFT Office # spell checker backend) and distribute linked combinations including # the two. You must obey the GNU Lesser General Public License in all # respects for all of the code used other than said providers. If you modify # this file, you may extend this exception to your version of the # file, but you are not obligated to do so. If you do not wish to # do so, delete this exception statement from your version. # """ enchant.errors: Error class definitions for the enchant library ================================================================ All error classes are defined in this separate sub-module, so that they can safely be imported without causing circular dependencies. """ class Error(Exception): """Base exception class for the enchant module.""" pass class DictNotFoundError(Error): """Exception raised when a requested dictionary could not be found.""" pass class TokenizerNotFoundError(Error): """Exception raised when a requested tokenizer could not be found.""" pass class DefaultLanguageNotFoundError(Error): """Exception raised when a default language could not be found.""" pass ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1684682588.0 pyenchant-3.3.0rc1/enchant/pypwl.py0000644000175000017500000002273114432433534016631 0ustar00dmerejdmerej# pyenchant # # Copyright (C) 2004-2011 Ryan Kelly # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the # Free Software Foundation, Inc., 59 Temple Place - Suite 330, # Boston, MA 02111-1307, USA. 
# # In addition, as a special exception, you are # given permission to link the code of this program with # non-LGPL Spelling Provider libraries (eg: a MSFT Office # spell checker backend) and distribute linked combinations including # the two. You must obey the GNU Lesser General Public License in all # respects for all of the code used other than said providers. If you modify # this file, you may extend this exception to your version of the # file, but you are not obligated to do so. If you do not wish to # do so, delete this exception statement from your version. # """ pypwl: pure-python personal word list in the style of Enchant ============================================================== This module provides a pure-python version of the personal word list functionality found in the spellchecking package Enchant. While the same effect can be achieved (with better performance) using the python bindings for Enchant, it requires a C extension. This pure-python implementation uses the same algorithm but without any external dependencies or C code (in fact, it was the author's original prototype for the C version found in Enchant). """ import os import warnings from typing import Dict, Iterable, Iterator, List, Optional # noqa F401 class Trie: """Class implementing a trie-based dictionary of words. A Trie is a recursive data structure storing words by their prefix. "Fuzzy matching" can be done by allowing a certain number of missteps when traversing the Trie. """ def __init__(self, words: Iterable[str] = ()) -> None: self._eos = False # whether I am the end of a word self._keys = {} # type: Dict[str, Trie] # letters at this level of the trie for w in words: self.insert(w) def insert(self, word: str) -> None: if word == "": self._eos = True else: key = word[0] try: subtrie = self[key] except KeyError: subtrie = Trie() self[key] = subtrie subtrie.insert(word[1:]) def remove(self, word: str) -> None: if word == "": self._eos = False else: key = word[0] try: subtrie = self[key] except KeyError: pass else: subtrie.remove(word[1:]) def search(self, word: str, nerrs: int = 0) -> List[str]: """Search for the given word, possibly making errors. This method searches the trie for the given `word`, making precisely `nerrs` errors. It returns a list of words found. """ res = [] # type: List[str] # Terminate if we've run out of errors if nerrs < 0: return res # Precise match at the end of the word if nerrs == 0 and word == "": if self._eos: res.append("") # Precisely match word[0] try: subtrie = self[word[0]] subres = subtrie.search(word[1:], nerrs) for w in subres: w2 = word[0] + w if w2 not in res: res.append(w2) except (IndexError, KeyError): pass # match with deletion of word[0] try: subres = self.search(word[1:], nerrs - 1) for w in subres: if w not in res: res.append(w) except (IndexError,): pass # match with insertion before word[0] try: for k in self._keys: subres = self[k].search(word, nerrs - 1) for w in subres: w2 = k + w if w2 not in res: res.append(w2) except (IndexError, KeyError): pass # match on substitution of word[0] try: for k in self._keys: subres = self[k].search(word[1:], nerrs - 1) for w in subres: w2 = k + w if w2 not in res: res.append(w2) except (IndexError, KeyError): pass # All done! 
return res search._DOC_ERRORS = ["nerrs"] # type: ignore def __getitem__(self, key: str) -> "Trie": return self._keys[key] def __setitem__(self, key: str, val: "Trie") -> None: self._keys[key] = val def __iter__(self) -> Iterator[str]: if self._eos: yield "" for k in self._keys: for w2 in self._keys[k]: yield k + w2 class PyPWL: """Pure-python implementation of Personal Word List dictionary. This class emulates the PWL objects provided by PyEnchant, but implemented purely in python. """ def __init__(self, pwl: Optional[str] = None) -> None: """PyPWL constructor. This method takes as its only argument the name of a file containing the personal word list, one word per line. Entries will be read from this file, and new entries will be written to it automatically. If `pwl` is not specified or None, the list is maintained in memory only. """ self.provider = None self._words = Trie() if pwl is not None: self.pwl = os.path.abspath(pwl) # type: Optional[str] self.tag = self.pwl pwl_f = open(pwl) for ln in pwl_f: word = ln.strip() self.add_to_session(word) pwl_f.close() else: self.pwl = None self.tag = "PyPWL" def check(self, word: str) -> bool: """Check spelling of a word. This method takes a word in the dictionary language and returns `True` if it is correctly spelled, and `False` otherwise. """ res = self._words.search(word) return bool(res) def suggest(self, word: str) -> List[str]: """Suggest possible spellings for a word. This method tries to guess the correct spelling for a given word, returning the possibilities in a list. """ limit = 10 maxdepth = 5 # Iterative deepening until we get enough matches depth = 0 res = self._words.search(word, depth) while len(res) < limit and depth < maxdepth: depth += 1 for w in self._words.search(word, depth): if w not in res: res.append(w) # Limit number of suggs return res[:limit] def add(self, word: str) -> None: """Add a word to the user's personal dictionary. For a PWL, this means appending it to the file. """ if self.pwl is not None: pwl_f = open(self.pwl, "a") pwl_f.write("%s\n" % (word.strip(),)) pwl_f.close() self.add_to_session(word) def add_to_pwl(self, word: str) -> None: """Add a word to the user's personal dictionary. For a PWL, this means appending it to the file. """ warnings.warn( "PyPWL.add_to_pwl is deprecated, please use PyPWL.add", category=DeprecationWarning, stacklevel=2, ) self.add(word) def remove(self, word: str) -> None: """Add a word to the user's personal exclude list.""" # There's no exclude list for a stand-alone PWL. # Just remove it from the list. self._words.remove(word) if self.pwl is not None: pwl_f = open(self.pwl, "wt") for w in self._words: pwl_f.write("%s\n" % (w.strip(),)) pwl_f.close() def add_to_session(self, word: str) -> None: """Add a word to the session list.""" self._words.insert(word) def store_replacement(self, mis: str, cor: str) -> None: """Store a replacement spelling for a miss-spelled word. This method makes a suggestion to the spellchecking engine that the miss-spelled word `mis` is in fact correctly spelled as `cor`. Such a suggestion will typically mean that `cor` appears early in the list of suggested spellings offered for later instances of `mis`. 
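For example (illustrative only; as noted in the implementation below, this
pure-python word list currently accepts the hint but does not act on it)::

    pwl.store_replacement("helo", "hello")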
""" # Too much work for this simple spellchecker pass store_replacement._DOC_ERRORS = ["mis", "mis"] # type: ignore def is_added(self, word: str) -> bool: """Check whether a word is in the personal word list.""" return self.check(word) def is_removed(self, word: str) -> bool: """Check whether a word is in the personal exclude list.""" return False # No-op methods to support internal use as a Dict() replacement def _check_this(self, msg: str) -> None: pass def _free(self) -> None: pass ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1687012851.8770554 pyenchant-3.3.0rc1/enchant/tokenize/0000755000175000017500000000000014443342764016735 5ustar00dmerejdmerej././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1684682588.0 pyenchant-3.3.0rc1/enchant/tokenize/__init__.py0000644000175000017500000005153614432433534021052 0ustar00dmerejdmerej# pyenchant # # Copyright (C) 2004-2009, Ryan Kelly # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the # Free Software Foundation, Inc., 59 Temple Place - Suite 330, # Boston, MA 02111-1307, USA. # # In addition, as a special exception, you are # given permission to link the code of this program with # non-LGPL Spelling Provider libraries (eg: a MSFT Office # spell checker backend) and distribute linked combinations including # the two. You must obey the GNU Lesser General Public License in all # respects for all of the code used other than said providers. If you modify # this file, you may extend this exception to your version of the # file, but you are not obligated to do so. If you do not wish to # do so, delete this exception statement from your version. # """ enchant.tokenize: String tokenization functions for PyEnchant ================================================================ An important task in spellchecking is breaking up large bodies of text into their constituent words, each of which is then checked for correctness. This package provides Python functions to split strings into words according to the rules of a particular language. Each tokenization function accepts a string as its only positional argument, and returns an iterator that yields tuples of the following form, one for each word found:: (,) The meanings of these fields should be clear: `word` is the word that was found and `pos` is the position within the text at which the word began (zero indexed, of course). The function will work on any string-like object that supports array-slicing; in particular character-array objects from the :py:mod:`array` module may be used. The iterator also provides the attribute :py:attr:`~.tokenize.offset` which gives the current position of the tokenizer inside the string being split, and the method :py:meth:`~tokenize.set_offset` for manually adjusting this position. This can be used for example if the string's contents have changed during the tokenization process. 
To obtain an appropriate tokenization function for the language identified by `tag`, use the function :py:func:`get_tokenizer`:: tknzr = get_tokenizer("en_US") for (word,pos) in tknzr("text to be tokenized goes here") do_something(word) This library is designed to be easily extendible by third-party authors. To register a tokenization function for the language `tag`, implement it as the function `tokenize` within the module `enchant.tokenize.`. The function :py:func:`get_tokenizer` will automatically detect it. Note that the underscore must be used as the tag component separator in this case, in order to form a valid python module name. (e.g. "en_US" rather than "en-US") Currently, a tokenizer has only been implemented for the English language. Based on the author's limited experience, this should be at least partially suitable for other languages. This module also provides various implementations of Chunkers and Filters. These classes are designed to make it easy to work with text in a variety of common formats, by detecting and excluding parts of the text that don't need to be checked. A :py:class:`Chunker` is a class designed to break a body of text into large chunks of checkable content; for example the :py:class:`HTMLChunker` class extracts the text content from all HTML tags but excludes the tags themselves. A :py:class:`Filter` is a class designed to skip individual words during the checking process; for example the :py:class:`URLFilter` class skips over any words that have the format of a URL. For example, to spellcheck an HTML document it is necessary to split the text into chunks based on HTML tags, and to filter out common word forms such as URLs and WikiWords. This would look something like the following:: tknzr = get_tokenizer("en_US",(HTMLChunker,),(URLFilter,WikiWordFilter))) text = "the url is http://example.com" for (word,pos) in tknzer(text): ...check each word and react accordingly... """ _DOC_ERRORS = [ "pos", "pos", "tknzr", "URLFilter", "WikiWordFilter", "tkns", "tknzr", "pos", "tkns", ] import array import re import warnings from typing import Callable, Iterable, List, Optional, Tuple, Type, Union, cast from enchant.errors import TokenizerNotFoundError Token = Tuple[str, int] # For backwards-compatibility. This will eventually be removed, but how # does one mark a module-level constant as deprecated? Error = TokenizerNotFoundError class tokenize: # noqa: N801 """Base class for all tokenizer objects. Each tokenizer must be an iterator and provide the :py:attr:`offset` attribute as described in the documentation for this module. While tokenizers are in fact classes, they should be treated like functions, and so are named using lower_case rather than the CamelCase more traditional of class names. 
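A minimal custom tokenizer might look like the following sketch, which
simply splits a plain string on whitespace (illustrative only; see
:py:class:`basic_tokenize` for a fuller implementation)::

    class whitespace_tokenize(tokenize):
        def next(self):
            text = self._text
            offset = self._offset
            # skip any leading whitespace
            while offset < len(text) and text[offset].isspace():
                offset += 1
            start = offset
            # consume the word itself
            while offset < len(text) and not text[offset].isspace():
                offset += 1
            self._offset = offset
            if start == offset:
                raise StopIteration()
            return (text[start:offset], start)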
""" _DOC_ERRORS = ["CamelCase"] def __init__(self, text: str) -> None: self._text = text self._offset = 0 def __next__(self) -> Token: return self.next() def next(self) -> Token: raise NotImplementedError() def __iter__(self): return self def set_offset(self, offset: int, replaced: bool = False) -> None: self._offset = offset def _get_offset(self) -> int: return self._offset def _set_offset(self, offset: int) -> None: msg = ( "changing a tokenizers :py:attr:`offset` attribute is deprecated;" " use the :py:meth:`set_offset` method" ) warnings.warn(msg, category=DeprecationWarning, stacklevel=2) self.set_offset(offset) offset = property(_get_offset, _set_offset) def get_tokenizer( tag: str = None, chunkers: Iterable[Union[Type["Chunker"], Type["Filter"]]] = None, filters: Iterable[Type["Filter"]] = None, ) -> tokenize: """Locate an appropriate tokenizer by language tag. This requires importing the function `tokenize` from an appropriate module. Modules tried are named after the language tag, tried in the following order: * the entire tag (e.g. "en_AU.py") * the base country code of the tag (e.g. "en.py") If the language tag is `None`, a default tokenizer (actually the English one) is returned. It's unicode aware and should work OK for most latin-derived languages. If a suitable function cannot be found, raises :py:exc:`~.errors.TokenizerNotFoundError`. If given and not `None`, `chunkers` and `filters` must be lists of chunker classes and filter classes respectively. These will be applied to the tokenizer during creation. """ if tag is None: tag = "en" # "filters" used to be the second argument. Try to catch cases # where it is given positionally and issue a DeprecationWarning. if chunkers is not None and filters is None: chunkers = list(chunkers) if chunkers: try: chunkers_are_filters = issubclass(chunkers[0], Filter) except TypeError: pass else: if chunkers_are_filters: msg = ( "passing 'filters' as a non-keyword argument " "to get_tokenizer() is deprecated" ) warnings.warn(msg, category=DeprecationWarning, stacklevel=2) filters = cast(List[Type[Filter]], chunkers) chunkers = None # Ensure only '_' used as separator tag = tag.replace("-", "_") # First try the whole tag tk_func = _try_tokenizer(tag) if tk_func is None: # Try just the base base = tag.split("_")[0] tk_func = _try_tokenizer(base) if tk_func is None: msg = "No tokenizer found for language '%s'" % (tag,) raise TokenizerNotFoundError(msg) # Given the language-specific tokenizer, we now build up the # end result as follows: # * chunk the text using any given chunkers in turn # * begin with basic whitespace tokenization # * apply each of the given filters in turn # * apply language-specific rules tokenizer = basic_tokenize if chunkers is not None: chunkers = list(chunkers) for i in range(len(chunkers) - 1, -1, -1): tokenizer = wrap_tokenizer(chunkers[i], tokenizer) if filters is not None: for f in filters: tokenizer = f(tokenizer) tokenizer = wrap_tokenizer(tokenizer, tk_func) return tokenizer get_tokenizer._DOC_ERRORS = ["py", "py"] # type: ignore class empty_tokenize(tokenize): # noqa: N801 """Tokenizer class that yields no elements.""" _DOC_ERRORS = [] # type: ignore def __init__(self) -> None: super().__init__("") def next(self): raise StopIteration() class unit_tokenize(tokenize): # noqa: N801 """Tokenizer class that yields the text as a single token.""" _DOC_ERRORS = [] # type: ignore def __init__(self, text: str) -> None: super().__init__(text) self._done = False def next(self): if self._done: raise StopIteration() self._done 
= True return (self._text, 0) class basic_tokenize(tokenize): # noqa: N801 """Tokenizer class that performs very basic word-finding. This tokenizer does the most basic thing that could work - it splits text into words based on whitespace boundaries, and removes basic punctuation symbols from the start and end of each word. """ _DOC_ERRORS = [] # type: ignore # Chars to remove from start/end of words strip_from_start = '"' + "'`([" strip_from_end = '"' + "'`]).!,?;:" def next(self): text = self._text offset = self._offset while True: if offset >= len(text): break # Find start of next word while offset < len(text) and text[offset].isspace(): offset += 1 s_pos = offset # Find end of word while offset < len(text) and not text[offset].isspace(): offset += 1 e_pos = offset self._offset = offset # Strip chars from font/end of word while s_pos < len(text) and text[s_pos] in self.strip_from_start: s_pos += 1 while 0 < e_pos and text[e_pos - 1] in self.strip_from_end: e_pos -= 1 # Return if word isn't empty if s_pos < e_pos: return (text[s_pos:e_pos], s_pos) raise StopIteration() def _try_tokenizer(mod_name: str) -> Optional[Callable]: """Look for a tokenizer in the named module. Returns the function if found, None otherwise. """ mod_base = "enchant.tokenize." func_name = "tokenize" mod_name = mod_base + mod_name try: mod = __import__(mod_name, globals(), {}, func_name) return getattr(mod, func_name) except ImportError: return None _Filter = Union[Type[tokenize], "Filter"] def wrap_tokenizer(tk1: _Filter, tk2: _Filter) -> "Filter": """Wrap one tokenizer inside another. This function takes two tokenizer functions `tk1` and `tk2`, and returns a new tokenizer function that passes the output of `tk1` through `tk2` before yielding it to the calling code. """ # This logic is already implemented in the Filter class. # We simply use tk2 as the _split() method for a filter # around tk1. tkw = Filter(tk1) tkw._split = tk2 # type: ignore return tkw wrap_tokenizer._DOC_ERRORS = ["tk", "tk", "tk", "tk"] # type: ignore class Chunker(tokenize): """Base class for text chunking functions. A chunker is designed to chunk text into large blocks of tokens. It has the same interface as a tokenizer but is for a different purpose. """ pass class Filter: """Base class for token filtering functions. A filter is designed to wrap a tokenizer (or another :py:class:`Filter`) and do two things: * skip over tokens * split tokens into sub-tokens Subclasses have two basic options for customising their behaviour. The method :py:meth:`_skip` may be overridden to return `True` for words that should be skipped, and `False` otherwise. The method :py:meth:`_split` may be overridden as tokenization function that will be applied to further tokenize any words that aren't skipped. """ def __init__(self, tokenizer: _Filter) -> None: """Filter class constructor.""" self._tokenizer = tokenizer def __call__(self, *args, **kwds): tkn = self._tokenizer(*args, **kwds) return self._TokenFilter(tkn, self._skip, self._split) def _skip(self, word: str) -> bool: """Filter method for identifying skippable tokens. If this method returns `True`, the given word will be skipped by the filter. This should be overridden in subclasses to produce the desired functionality. The default behaviour is not to skip any words. """ return False def _split(self, word: str) -> tokenize: """Filter method for sub-tokenization of tokens. This method must be a tokenization function that will split the given word into sub-tokens according to the needs of the filter. 
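For example, a filter might break hyphenated words into their parts
(an illustrative sketch; no such filter ships with this module)::

    class HyphenFilter(Filter):
        def _split(self, word):
            # a hyphen and a space have the same length, so positions line up
            return basic_tokenize(word.replace("-", " "))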
The default behaviour is not to split any words. """ return unit_tokenize(word) class _TokenFilter: """Private inner class implementing the tokenizer-wrapping logic. This might seem convoluted, but we're trying to create something akin to a meta-class - when `Filter(tknzr)` is called it must return a *callable* that can then be applied to a particular string to perform the tokenization. Since we need to manage a lot of state during tokenization, returning a class is the best option. """ _DOC_ERRORS = ["tknzr"] def __init__( self, tokenizer: _Filter, skip: Callable[[str], bool], split: Callable[[str], tokenize], ) -> None: self._skip = skip self._split = split self._tokenizer = tokenizer # for managing state of sub-tokenization self._curtok = empty_tokenize() self._curword = "" self._curpos = 0 def __iter__(self): return self def __next__(self): return self.next() def next(self) -> Token: # Try to get the next sub-token from word currently being split. # If unavailable, move on to the next word and try again. while True: try: (word, pos) = next(self._curtok) return (word, pos + self._curpos) except StopIteration: (word, pos) = next(self._tokenizer) while self._skip(self._to_string(word)): (word, pos) = next(self._tokenizer) self._curword = word self._curpos = pos self._curtok = self._split(word) def _to_string(self, word) -> str: if type(word) is array.array: if word.typecode == "u": return word.tounicode() elif word.typecode == "c": return word.tostring() return word # Pass on access to 'offset' to the underlying tokenizer. def _get_offset(self) -> int: return self._tokenizer.offset def _set_offset(self, offset: int) -> None: msg = ( "changing a tokenizers 'offset' attribute is deprecated;" " use the 'set_offset' method" ) warnings.warn(msg, category=DeprecationWarning, stacklevel=2) self.set_offset(offset) offset = property(_get_offset, _set_offset) def set_offset(self, val, replaced: bool = False) -> None: old_offset = self._tokenizer.offset self._tokenizer.set_offset(val, replaced=replaced) # If we move forward within the current word, also set on _curtok. # Otherwise, throw away _curtok and set to empty iterator. keep_curtok = True curtok_offset = val - self._curpos if old_offset > val: keep_curtok = False if curtok_offset < 0: keep_curtok = False if curtok_offset >= len(self._curword): keep_curtok = False if keep_curtok and not replaced: self._curtok.set_offset(curtok_offset) else: self._curtok = empty_tokenize() self._curword = "" self._curpos = 0 # Pre-defined chunkers and filters start here class URLFilter(Filter): r"""Filter skipping over URLs. This filter skips any words matching the following regular expression: ^[a-zA-Z]+:\/\/[^\s].* That is, any words that are URLs. """ _DOC_ERRORS = ["zA"] _pattern = re.compile(r"^[a-zA-Z]+:\/\/[^\s].*") def _skip(self, word: str) -> bool: if self._pattern.match(word): return True return False class WikiWordFilter(Filter): r"""Filter skipping over WikiWords. This filter skips any words matching the following regular expression: ^([A-Z]\w+[A-Z]+\w+) That is, any words that are WikiWords. """ _pattern = re.compile(r"^([A-Z]\w+[A-Z]+\w+)") def _skip(self, word: str) -> bool: if self._pattern.match(word): return True return False class EmailFilter(Filter): r"""Filter skipping over email addresses. This filter skips any words matching the following regular expression: ^.+@[^\.].*\.[a-z]{2,}$ That is, any words that resemble email addresses. 
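For example (a sketch)::

    tknzr = get_tokenizer("en_US", filters=(EmailFilter,))
    for (word, pos) in tknzr("contact us at team@example.com today"):
        print(word)   # every word except the address is yielded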
""" _pattern = re.compile(r"^.+@[^\.].*\.[a-z]{2,}$") def _skip(self, word: str) -> bool: if self._pattern.match(word): return True return False class MentionFilter(Filter): r"""Filter skipping over @mention. This filter skips any words matching the following regular expression: (\A|\s)@(\w+) That is, any words that are @mention. """ _DOC_ERRORS = ["zA"] _pattern = re.compile(r"(\A|\s)@(\w+)") def _skip(self, word: str) -> bool: if self._pattern.match(word): return True return False class HashtagFilter(Filter): r"""Filter skipping over #hashtag. This filter skips any words matching the following regular expression: (\A|\s)#(\w+) That is, any words that are #hashtag. """ _DOC_ERRORS = ["zA"] _pattern = re.compile(r"(\A|\s)#(\w+)") def _skip(self, word: str) -> bool: if self._pattern.match(word): return True return False class HTMLChunker(Chunker): """Chunker for breaking up HTML documents into chunks of checkable text. The operation of this chunker is very simple - anything between a "<" and a ">" will be ignored. Later versions may improve the algorithm slightly. """ def next(self) -> Token: text = self._text offset = self.offset while True: if offset >= len(text): break # Skip to the end of the current tag, if any. if text[offset] == "<": maybe_tag = offset if self._is_tag(text, offset): while text[offset] != ">": offset += 1 if offset == len(text): offset = maybe_tag + 1 break else: offset += 1 else: offset = maybe_tag + 1 s_pos = offset # Find the start of the next tag. while offset < len(text) and text[offset] != "<": offset += 1 self._offset = offset # Return if chunk isn't empty if s_pos < offset: return (text[s_pos:offset], s_pos) raise StopIteration() def _is_tag(self, text: str, offset: int) -> bool: if offset + 1 < len(text): if text[offset + 1].isalpha(): return True if text[offset + 1] == "/": return True return False # TODO: LaTeXChunker ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1684682588.0 pyenchant-3.3.0rc1/enchant/tokenize/de.py0000644000175000017500000000353614432433534017700 0ustar00dmerejdmerej# pyenchant # # Copyright (C) 2022, Nico Gulden, Univention GmbH # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the # Free Software Foundation, Inc., 59 Temple Place - Suite 330, # Boston, MA 02111-1307, USA. # # In addition, as a special exception, you are # given permission to link the code of this program with # non-LGPL Spelling Provider libraries (eg: a MSFT Office # spell checker backend) and distribute linked combinations including # the two. You must obey the GNU Lesser General Public License in all # respects for all of the code used other than said providers. If you modify # this file, you may extend this exception to your version of the # file, but you are not obligated to do so. If you do not wish to # do so, delete this exception statement from your version. 
# """ enchant.tokenize.de: Tokenizer for the German language This module implements a PyEnchant text tokenizer for the German language, based on very simple rules. """ from typing import Container, Optional from enchant.tokenize.en import tokenize as tokenizer_en from .en import _TextLike class tokenize(tokenizer_en): # noqa: N801 def __init__( self, text: _TextLike, valid_chars: Optional[Container[str]] = ("-", ".") ) -> None: super().__init__(text, valid_chars) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1684682588.0 pyenchant-3.3.0rc1/enchant/tokenize/en.py0000644000175000017500000001642714432433534017715 0ustar00dmerejdmerej# pyenchant # # Copyright (C) 2004-2008, Ryan Kelly # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the # Free Software Foundation, Inc., 59 Temple Place - Suite 330, # Boston, MA 02111-1307, USA. # # In addition, as a special exception, you are # given permission to link the code of this program with # non-LGPL Spelling Provider libraries (eg: a MSFT Office # spell checker backend) and distribute linked combinations including # the two. You must obey the GNU Lesser General Public License in all # respects for all of the code used other than said providers. If you modify # this file, you may extend this exception to your version of the # file, but you are not obligated to do so. If you do not wish to # do so, delete this exception statement from your version. # """ enchant.tokenize.en: Tokenizer for the English language This module implements a PyEnchant text tokenizer for the English language, based on very simple rules. """ import unicodedata from typing import Any, Callable, Container, Union # noqa F401 import enchant.tokenize _TextLike = Union[bytes, str, bytearray] _BinaryLike = Union[bytes, bytearray] class tokenize(enchant.tokenize.tokenize): # noqa: N801 """Iterator splitting text into words, reporting position. This iterator takes a text string as input, and yields tuples representing each distinct word found in the text. The tuples take the form: (,) Where `word` is the word string found and `pos` is the position of the start of the word within the text. The optional argument `valid_chars` may be used to specify a list of additional characters that can form part of a word. By default, this list contains only the apostrophe ('). Note that these characters cannot appear at the start or end of a word. """ _DOC_ERRORS = ["pos", "pos"] def __init__(self, text: _TextLike, valid_chars: Container[str] = None) -> None: self._valid_chars = valid_chars # type: Container[str] # type: ignore self._text = text # type: ignore self._offset = 0 # Select proper implementation of self._consume_alpha. # 'text' isn't necessarily a string (it could be e.g. a mutable array) # so we can't use isinstance(text, str) to detect unicode. # Instead we typetest the first character of the text. 
# If there's no characters then it doesn't matter what implementation # we use since it won't be called anyway. try: char1 = text[0] except IndexError: self._initialize_for_binary() else: if isinstance(char1, str): self._initialize_for_unicode() else: self._initialize_for_binary() def _initialize_for_binary(self) -> None: self._consume_alpha = self._consume_alpha_b # type: Callable[[Any, int], int] if self._valid_chars is None: self._valid_chars = ("'",) def _initialize_for_unicode(self) -> None: self._consume_alpha = self._consume_alpha_u if self._valid_chars is None: # XXX TODO: this doesn't seem to work correctly with the # MySpell provider, disabling for now. # Allow unicode typographic apostrophe # self._valid_chars = (u"'",u"\u2019") self._valid_chars = ("'",) def _consume_alpha_b(self, text: _BinaryLike, offset: int) -> int: """Consume an alphabetic character from the given bytestring. Given a bytestring and the current offset, this method returns the number of characters occupied by the next alphabetic character in the string. Non-ASCII bytes are interpreted as utf-8 and can result in multiple characters being consumed. """ assert offset < len(text) if text[offset] >= 0x80: return self._consume_alpha_utf8(text, offset) if chr(text[offset]).isalpha(): return 1 return 0 def _consume_alpha_utf8(self, text: _BinaryLike, offset: int) -> int: """Consume a sequence of utf8 bytes forming an alphabetic character.""" incr = 2 u = "" while not u and incr <= 4: try: try: # In the common case this will be a string u = text[offset : offset + incr].decode("utf8") except AttributeError: # Looks like it was e.g. a mutable char array. try: s = text[offset : offset + incr].tostring() # type: ignore except AttributeError: s = bytes(c for c in text[offset : offset + incr]) u = s.decode("utf8") except UnicodeDecodeError: incr += 1 if not u: return 0 if u.isalpha(): return incr if unicodedata.category(u)[0] == "M": return incr return 0 def _consume_alpha_u(self, text: str, offset: int) -> int: """Consume an alphabetic character from the given unicode string. Given a unicode string and the current offset, this method returns the number of characters occupied by the next alphabetic character in the string. Trailing combining characters are consumed as a single letter. 
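For example (illustrative)::

    accented = "a" + chr(0x301) + "bc"   # "a" followed by a combining acute accent
    self._consume_alpha_u(accented, 0)   # returns 2
    self._consume_alpha_u("abc", 0)      # returns 1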
""" assert offset < len(text) incr = 0 if text[offset].isalpha(): incr = 1 while offset + incr < len(text): if unicodedata.category(text[offset + incr])[0] != "M": break incr += 1 return incr def next(self) -> enchant.tokenize.Token: text = self._text offset = self._offset while offset < len(text): # Find start of next word (must be alpha) while offset < len(text): incr = self._consume_alpha(text, offset) if incr: break offset += 1 cur_pos = offset # Find end of word using, allowing valid_chars while offset < len(text): incr = self._consume_alpha(text, offset) if not incr: if text[offset] in self._valid_chars: incr = 1 else: break offset += incr # Return if word isn't empty if cur_pos != offset: # Make sure word doesn't end with a valid_char while text[offset - 1] in self._valid_chars: offset = offset - 1 self._offset = offset return (text[cur_pos:offset], cur_pos) self._offset = offset raise StopIteration() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1684682588.0 pyenchant-3.3.0rc1/enchant/utils.py0000644000175000017500000001104014432433534016605 0ustar00dmerejdmerej# pyenchant # # Copyright (C) 2004-2008 Ryan Kelly # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the # Free Software Foundation, Inc., 59 Temple Place - Suite 330, # Boston, MA 02111-1307, USA. # # In addition, as a special exception, you are # given permission to link the code of this program with # non-LGPL Spelling Provider libraries (eg: a MSFT Office # spell checker backend) and distribute linked combinations including # the two. You must obey the GNU Lesser General Public License in all # respects for all of the code used other than said providers. If you modify # this file, you may extend this exception to your version of the # file, but you are not obligated to do so. If you do not wish to # do so, delete this exception statement from your version. # """ enchant.utils: Misc utilities for the enchant package ======================================================== This module provides miscellaneous utilities for use with the enchant spellchecking package. Currently available functionality includes: * functions for dealing with locale/language settings * ability to list supporting data files (win32 only) * functions for bundling supporting data files from a build """ import locale from typing import Callable, Iterable, List, Optional, Sequence # noqa F401 from enchant.errors import * # noqa F401,F403 from enchant.errors import Error def levenshtein(s1: str, s2: str) -> int: """Calculate the Levenshtein distance between two strings. This is straight from `Wikipedia `_. 
""" if len(s1) < len(s2): return levenshtein(s2, s1) if not s1: return len(s2) previous_row = range(len(s2) + 1) # type: Sequence[int] for i, c1 in enumerate(s1): current_row = [i + 1] for j, c2 in enumerate(s2): insertions = previous_row[j + 1] + 1 deletions = current_row[j] + 1 substitutions = previous_row[j] + (c1 != c2) current_row.append(min(insertions, deletions, substitutions)) previous_row = current_row return previous_row[-1] def trim_suggestions( word: str, suggs: Iterable[str], maxlen: int, calcdist: Callable[[str, str], int] = None, ) -> List[str]: """Trim a list of suggestions to a maximum length. If the list of suggested words is too long, you can use this function to trim it down to a maximum length. It tries to keep the "best" suggestions based on similarity to the original word. If the optional `calcdist` argument is provided, it must be a callable taking two words and returning the distance between them. It will be used to determine which words to retain in the list. The default is a simple Levenshtein distance. """ if calcdist is None: calcdist = levenshtein decorated = [(calcdist(word, s), s) for s in suggs] decorated.sort() return [s for (l, s) in decorated[:maxlen]] def get_default_language(default: Optional[str] = None) -> Optional[str]: """Determine the user's default language, if possible. This function uses the :py:mod:`locale` module to try to determine the user's preferred language. The return value is as follows: * if a locale is available for the `LC_MESSAGES` category, that language is used * if a default locale is available, that language is used * if the keyword argument `default` is given, it is used * if nothing else works, `None` is returned Note that determining the user's language is in general only possible if they have set the necessary environment variables on their system. 
""" try: tag = locale.getlocale()[0] if tag is None: tag = locale.getdefaultlocale()[0] if tag is None: raise Error("No default language available") return tag except Exception: pass return default get_default_language._DOC_ERRORS = ["LC"] # type: ignore ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1687012851.8770554 pyenchant-3.3.0rc1/pyenchant.egg-info/0000755000175000017500000000000014443342764017150 5ustar00dmerejdmerej././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1687012851.0 pyenchant-3.3.0rc1/pyenchant.egg-info/PKG-INFO0000644000175000017500000000772414443342763020256 0ustar00dmerejdmerejMetadata-Version: 2.1 Name: pyenchant Version: 3.3.0rc1 Summary: Python bindings for the Enchant spellchecking system Home-page: https://pyenchant.github.io/pyenchant/ Author: Dimitri Merejkowsky Author-email: d.merej@gmail.com License: LGPL Project-URL: Changelog, https://pyenchant.github.io/pyenchant/changelog.html Project-URL: Source, https://github.com/pyenchant/pyenchant Project-URL: Tracker, https://github.com/pyenchant/pyenchant/issues Keywords: spelling spellcheck enchant Classifier: Development Status :: 5 - Production/Stable Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL) Classifier: Operating System :: OS Independent Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3 :: Only Classifier: Programming Language :: Python :: 3.7 Classifier: Programming Language :: Python :: 3.8 Classifier: Programming Language :: Python :: 3.9 Classifier: Programming Language :: Python :: 3.10 Classifier: Programming Language :: Python :: 3.11 Classifier: Programming Language :: Python :: 3.12 Classifier: Programming Language :: Python :: Implementation :: CPython Classifier: Programming Language :: Python :: Implementation :: PyPy Classifier: Topic :: Software Development :: Libraries Classifier: Topic :: Text Processing :: Linguistic Requires-Python: >=3.7 License-File: LICENSE.txt pyenchant: Python bindings for the Enchant spellchecker ======================================================== .. image:: https://img.shields.io/pypi/v/pyenchant.svg :target: https://pypi.org/project/pyenchant .. image:: https://img.shields.io/pypi/pyversions/pyenchant.svg :target: https://pypi.org/project/pyenchant .. image:: https://github.com/pyenchant/pyenchant/workflows/tests/badge.svg :target: https://github.com/pyenchant/pyenchant/actions .. image:: https://builds.sr.ht/~dmerej/pyenchant.svg :target: https://builds.sr.ht/~dmerej/pyenchant .. image:: https://img.shields.io/badge/code%20style-black-000000.svg :target: https://github.com/psf/black This package provides a set of Python language bindings for the Enchant spellchecking library. For more information, visit the project website: http://pyenchant.github.io/pyenchant/ What is Enchant? ---------------- Enchant is used to check the spelling of words and suggest corrections for words that are miss-spelled. It can use many popular spellchecking packages to perform this task, including ispell, aspell and MySpell. It is quite flexible at handling multiple dictionaries and multiple languages. More information is available on the Enchant website: https://abiword.github.io/enchant/ How do I use it? 
---------------- For Windows users, install the pre-built binary packages using pip:: pip install pyenchant These packages bundle a pre-built copy of the underlying enchant library. Users on other platforms will need to install "enchant" using their system package manager (brew on macOS). Once the software is installed, python's on-line help facilities can get you started. Launch python and issue the following commands: >>> import enchant >>> help(enchant) Who is responsible for all this? -------------------------------- The credit for Enchant itself goes to Dom Lachowicz. Find out more details on the Enchant website listed above. Full marks to Dom for producing such a high-quality library. The glue to pull Enchant into Python via ctypes was written by Ryan Kelly. He needed a decent spellchecker for another project he was working on, and all the solutions turned up by Google were either extremely non-portable (e.g. opening a pipe to ispell) or had completely disappeared from the web (what happened to SnakeSpell?) It was also a great excuse to teach himself about SWIG, ctypes, and even a little bit of the Python/C API. Finally, after Ryan stepped down from the project, Dimitri Merejkowsky became the new maintainer. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1687012851.0 pyenchant-3.3.0rc1/pyenchant.egg-info/SOURCES.txt0000644000175000017500000000131514443342763021033 0ustar00dmerejdmerejChangelog LICENSE.txt MANIFEST.in README.rst pyproject.toml setup.cfg setup.py enchant/__init__.py enchant/_enchant.py enchant/errors.py enchant/pypwl.py enchant/utils.py enchant/checker/CmdLineChecker.py enchant/checker/GtkSpellCheckerDialog.py enchant/checker/__init__.py enchant/checker/wxSpellCheckerDialog.py enchant/tokenize/__init__.py enchant/tokenize/de.py enchant/tokenize/en.py pyenchant.egg-info/PKG-INFO pyenchant.egg-info/SOURCES.txt pyenchant.egg-info/dependency_links.txt pyenchant.egg-info/top_level.txt tests/test_broker.py tests/test_checker.py tests/test_dict.py tests/test_docstrings.py tests/test_misc.py tests/test_multiprocessing.py tests/test_pwl.py tests/test_tokenize.py tests/test_utils.py././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1687012851.0 pyenchant-3.3.0rc1/pyenchant.egg-info/dependency_links.txt0000644000175000017500000000000114443342763023215 0ustar00dmerejdmerej ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1687012851.0 pyenchant-3.3.0rc1/pyenchant.egg-info/top_level.txt0000644000175000017500000000001014443342763021670 0ustar00dmerejdmerejenchant ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1684682588.0 pyenchant-3.3.0rc1/pyproject.toml0000644000175000017500000000003714432433534016373 0ustar00dmerejdmerej[tool.isort] profile = "black" ././@PaxHeader0000000000000000000000000000003300000000000010211 xustar0027 mtime=1687012851.880389 pyenchant-3.3.0rc1/setup.cfg0000644000175000017500000000337014443342764015311 0ustar00dmerejdmerej[metadata] name = pyenchant version = 3.3.0rc1 description = Python bindings for the Enchant spellchecking system long_description = file: README.rst author = Dimitri Merejkowsky author_email = d.merej@gmail.com url = https://pyenchant.github.io/pyenchant/ project_urls = Changelog=https://pyenchant.github.io/pyenchant/changelog.html Source=https://github.com/pyenchant/pyenchant Tracker=https://github.com/pyenchant/pyenchant/issues license = LGPL keywords = spelling spellcheck enchant classifiers = Development 
Status :: 5 - Production/Stable Intended Audience :: Developers License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL) Operating System :: OS Independent Programming Language :: Python Programming Language :: Python :: 3 Programming Language :: Python :: 3 :: Only Programming Language :: Python :: 3.7 Programming Language :: Python :: 3.8 Programming Language :: Python :: 3.9 Programming Language :: Python :: 3.10 Programming Language :: Python :: 3.11 Programming Language :: Python :: 3.12 Programming Language :: Python :: Implementation :: CPython Programming Language :: Python :: Implementation :: PyPy Topic :: Software Development :: Libraries Topic :: Text Processing :: Linguistic [options] packages = find: python_requires = >=3.7 include_package_data = true [flake8] ignore = E203 E266 E231 E302 E402 E501 W503 exclude = enchant/checker/wxSpellCheckerDialog.py enchant/checker/GtkSpellCheckerDialog.py enchant/checker/CmdLineChecker.py [mypy] python_version = 3.7 files = . [mypy-gtk] ignore_missing_imports = True [mypy-pytest] ignore_missing_imports = True [mypy-setuptools] ignore_missing_imports = True [mypy-wx] ignore_missing_imports = True [egg_info] tag_build = tag_date = 0 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1684682588.0 pyenchant-3.3.0rc1/setup.py0000644000175000017500000000004614432433534015171 0ustar00dmerejdmerejfrom setuptools import setup setup() ././@PaxHeader0000000000000000000000000000003300000000000010211 xustar0027 mtime=1687012851.880389 pyenchant-3.3.0rc1/tests/0000755000175000017500000000000014443342764014627 5ustar00dmerejdmerej././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1684682588.0 pyenchant-3.3.0rc1/tests/test_broker.py0000644000175000017500000000627114432433534017524 0ustar00dmerejdmerejimport pytest from enchant import Broker @pytest.fixture def broker(): res = Broker() yield res del res def test_all_langs_are_available(broker): """Test whether all advertised languages are in fact available.""" for lang in broker.list_languages(): if not broker.dict_exists(lang): assert False, "language '" + lang + "' advertised but non-existent" def test_provs_are_available(broker): """Test whether all advertised providers are in fact available.""" for (lang, prov) in broker.list_dicts(): assert broker.dict_exists(lang) if not broker.dict_exists(lang): assert False, "language '" + lang + "' advertised but non-existent" if prov not in broker.describe(): assert False, "provier '" + str(prov) + "' advertised but non-existent" def test_prov_ordering(broker): """Test that provider ordering works correctly.""" langs = {} provs = [] # Find the providers for each language, and a list of all providers for (tag, prov) in broker.list_dicts(): # Skip hyphenation dictionaries installed by OOo if tag.startswith("hyph_") and prov.name == "myspell": continue # Canonicalize separators tag = tag.replace("-", "_") langs[tag] = [] # NOTE: we are excluding Zemberek here as it appears to return # a broker for any language, even nonexistent ones if prov not in provs and prov.name != "zemberek": provs.append(prov) for prov in provs: for tag in langs: b2 = Broker() b2.set_ordering(tag, prov.name) try: d = b2.request_dict(tag) if d.provider != prov: raise ValueError() langs[tag].append(prov) # TODO: bare except except: # noqa pass # Check availability using a single entry in ordering for tag in langs: for prov in langs[tag]: b2 = Broker() b2.set_ordering(tag, prov.name) d = b2.request_dict(tag) 
assert (d.provider, tag) == (prov, tag) del d del b2 # Place providers that don't have the language in the ordering for tag in langs: for prov in langs[tag]: order = prov.name for prov2 in provs: if prov2 not in langs[tag]: order = prov2.name + "," + order b2 = Broker() b2.set_ordering(tag, order) d = b2.request_dict(tag) assert (d.provider, tag, order) == (prov, tag, order) del d del b2 def test_get_set_param(broker): """ Scenario: Either broker.set_param(key, value) works Or broker.set_param(key, value) throws AttributeError """ key = "pyenchant.unittest" value = "testing" error = None try: broker.set_param(key, value) except AttributeError as e: error = e if error: return else: assert broker.get_param("pyenchant.unittest") == "testing" other_broker = Broker() assert other_broker.get_param("pyenchant.unittest") is None ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1684682588.0 pyenchant-3.3.0rc1/tests/test_checker.py0000644000175000017500000002230514432433534017640 0ustar00dmerejdmerej# pyenchant # # Copyright (C) 2004-2009, Ryan Kelly # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the # Free Software Foundation, Inc., 59 Temple Place - Suite 330, # Boston, MA 02111-1307, USA. # # In addition, as a special exception, you are # given permission to link the code of this program with # non-LGPL Spelling Provider libraries (eg: a MSFT Office # spell checker backend) and distribute linked combinations including # the two. You must obey the GNU Lesser General Public License in all # respects for all of the code used other than said providers. If you modify # this file, you may extend this exception to your version of the # file, but you are not obligated to do so. If you do not wish to # do so, delete this exception statement from your version. import array import pytest import enchant import enchant.tokenize from enchant.checker import SpellChecker from enchant.errors import DefaultLanguageNotFoundError from enchant.utils import get_default_language def test_basic(): """Test a basic run of the SpellChecker class.""" text = """This is sme text with a few speling erors in it. Its gret for checking wheather things are working proprly with the SpellChecker class. 
Not gret for much elss though.""" chkr = SpellChecker("en_US", text=text) for n, err in enumerate(chkr): if n == 0: # Fix up "sme" -> "some" properly assert err.word == "sme" assert err.wordpos == 8 assert "some" in err.suggest() err.replace("some") if n == 1: # Ignore "speling" assert err.word == "speling" if n == 2: # Check context around "erors", and replace assert err.word == "erors" assert err.leading_context(5) == "ling " assert err.trailing_context(5) == " in i" err.replace("errors") if n == 3: # Replace-all on gret as it appears twice assert err.word == "gret" err.replace_always("great") if n == 4: # First encounter with "wheather", move offset back assert err.word == "wheather" err.set_offset(-1 * len(err.word)) if n == 5: # Second encounter, fix up "wheather' assert err.word == "wheather" err.replace("whether") if n == 6: # Just replace "proprly", but also add an ignore # for "SpellChecker" assert err.word == "proprly" err.replace("properly") err.ignore_always("SpellChecker") if n == 7: # The second "gret" should have been replaced # So it's now on "elss" assert err.word == "elss" err.replace("else") if n > 7: pytest.fail("Extraneous spelling errors were found") text2 = """This is some text with a few speling errors in it. Its great for checking whether things are working properly with the SpellChecker class. Not great for much else though.""" assert chkr.get_text() == text2 def test_filters(): """Test SpellChecker with the 'filters' argument.""" text = """I contain WikiWords that ShouldBe skipped by the filters""" chkr = SpellChecker("en_US", text=text, filters=[enchant.tokenize.WikiWordFilter]) for err in chkr: # There are no errors once the WikiWords are skipped pytest.fail("Extraneous spelling errors were found") assert chkr.get_text() == text def test_chunkers(): """Test SpellChecker with the 'chunkers' argument.""" text = """I contain tags that should be skipped""" chkr = SpellChecker("en_US", text=text, chunkers=[enchant.tokenize.HTMLChunker]) for err in chkr: # There are no errors when the tag is skipped pytest.fail("Extraneous spelling errors were found") assert chkr.get_text() == text class TestChunkersAndFilters: """Test SpellChecker with the 'chunkers' and 'filters' arguments.""" text = """I contain tags that should be skipped along with a = 0 assert len(en_us_dict.suggest("Thiiiis")) >= 0 def test_unicode1(en_us_dict): """Test checking/suggesting for unicode strings""" # TODO: find something that actually returns suggestions us1 = r"he\u2149lo" assert type(us1) is str assert not en_us_dict.check(us1) for s in en_us_dict.suggest(us1): assert type(s) is str def test_session(en_us_dict): """Test that adding words to the session works as required.""" assert not en_us_dict.check("Lozz") assert not en_us_dict.is_added("Lozz") en_us_dict.add_to_session("Lozz") assert en_us_dict.is_added("Lozz") assert en_us_dict.check("Lozz") en_us_dict.remove_from_session("Lozz") assert not en_us_dict.check("Lozz") assert not en_us_dict.is_added("Lozz") en_us_dict.remove_from_session("hello") assert not en_us_dict.check("hello") assert en_us_dict.is_removed("hello") # TODO: fixture please en_us_dict.add_to_session("hello") def test_add_remove(en_us_dict): """Test adding/removing from default user dictionary.""" nonsense = "kxhjsddsi" assert not en_us_dict.check(nonsense) en_us_dict.add(nonsense) assert en_us_dict.is_added(nonsense) assert en_us_dict.check(nonsense) en_us_dict.remove(nonsense) assert not en_us_dict.is_added(nonsense) assert not en_us_dict.check(nonsense) 
en_us_dict.remove("pineapple") assert not en_us_dict.check("pineapple") assert en_us_dict.is_removed("pineapple") assert not en_us_dict.is_added("pineapple") en_us_dict.add("pineapple") assert en_us_dict.check("pineapple") def test_default_lang(en_us_dict): """Test behaviour of default language selection.""" def_lang = get_default_language() if def_lang is None: # If no default language, shouldn't work with pytest.raises(Error): Dict() else: # If there is a default language, should use it # Of course, no need for the dict to actually exist try: d = Dict() assert d.tag == def_lang except DictNotFoundError: pass def test_pickling(en_us_dict): """Test that pickling doesn't corrupt internal state.""" d1 = Dict("en_US") assert d1.check("hello") d2 = pickle.loads(pickle.dumps(d1)) assert d1.check("hello") assert d2.check("hello") d1._free() assert d2.check("hello") ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1684682588.0 pyenchant-3.3.0rc1/tests/test_docstrings.py0000644000175000017500000000662314432433534020420 0ustar00dmerejdmerej"""Test the spelling on all docstrings we can find in this module. This serves two purposes - to provide a lot of test data for the checker routines, and to make sure we don't suffer the embarrassment of having spelling errors in a spellchecking package! """ import os WORDS = [ "spellchecking", "utf", "dict", "unicode", "bytestring", "bytestrings", "str", "pyenchant", "ascii", "utils", "setup", "distutils", "pkg", "filename", "tokenization", "tuple", "tuples", "tokenizer", "tokenizers", "testcase", "testcases", "whitespace", "wxpython", "spellchecker", "dialog", "urls", "wikiwords", "enchantobject", "providerdesc", "spellcheck", "pwl", "aspell", "myspell", "docstring", "docstrings", "stopiteration", "pwls", "pypwl", "dictwithpwl", "skippable", "dicts", "dict's", "filenames", "fr", "trie", "api", "ctypes", "wxspellcheckerdialog", "stateful", "cmdlinechecker", "spellchecks", "callback", "clunkier", "iterator", "ispell", "cor", "backends", "subclasses", "initialise", "runtime", "py", "meth", "attr", "func", "exc", "enchant", ] def test_docstrings(): """Test that all our docstrings are error-free.""" import enchant import enchant.checker import enchant.checker.CmdLineChecker import enchant.pypwl import enchant.tokenize import enchant.tokenize.en import enchant.utils try: import enchant.checker.GtkSpellCheckerDialog except ImportError: pass try: import enchant.checker.WxSpellCheckerDialog except ImportError: pass errors = [] # Naive recursion here would blow the stack, instead we # simulate it with our own stack tocheck = [enchant] checked = [] while tocheck: obj = tocheck.pop() checked.append(obj) newobjs = list(_check_docstrings(obj, errors)) tocheck.extend([obj for obj in newobjs if obj not in checked]) assert not errors def _check_docstrings(obj, errors): import enchant if hasattr(obj, "__doc__"): skip_errors = [w for w in getattr(obj, "_DOC_ERRORS", [])] chkr = enchant.checker.SpellChecker( "en_US", obj.__doc__, filters=[enchant.tokenize.URLFilter] ) for err in chkr: if len(err.word) == 1: continue if err.word.lower() in WORDS: continue if skip_errors and skip_errors[0] == err.word: skip_errors.pop(0) continue errors.append((obj, err.word, err.wordpos)) # Find and yield all child objects that should be checked for name in dir(obj): if name.startswith("__"): continue child = getattr(obj, name) if hasattr(child, "__file__"): if not hasattr(globals(), "__file__"): continue if not child.__file__.startswith(os.path.dirname(__file__)): 
continue else: cmod = getattr(child, "__module__", None) if not cmod: cclass = getattr(child, "__class__", None) cmod = getattr(cclass, "__module__", None) if cmod and not cmod.startswith("enchant"): continue yield child ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1684682588.0 pyenchant-3.3.0rc1/tests/test_misc.py0000644000175000017500000000051314432433534017164 0ustar00dmerejdmerejimport enchant def test_get_user_config_dir(): """ Scenario: Either broker.get_user_config_dir() works (enchant >= 2.0), or it throws AttributeError (enchant < 2.0) """ try: user_dir = enchant.get_user_config_dir() assert user_dir except AttributeError: assert True ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1684682588.0 pyenchant-3.3.0rc1/tests/test_multiprocessing.py0000644000175000017500000000101214432433534021453 0ustar00dmerejdmerejimport sys from multiprocessing import Pool import pytest import enchant def check_words(words): d = enchant.Dict("en-US") for word in words: d.check(word) return True @pytest.mark.skipif( sys.implementation.name == "pypy" and sys.platform == "win32", reason="hangs for an unknown reason", ) def test_can_use_multiprocessing(): words = ["hello" for i in range(1000)] input = [words for i in range(1000)] pool = Pool(10) assert all(pool.imap_unordered(check_words, input)) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1684682588.0 pyenchant-3.3.0rc1/tests/test_pwl.py0000644000175000017500000000663714432433534017050 0ustar00dmerejdmerejimport sys import pytest from enchant import DictWithPWL, PyPWL, request_pwl_dict @pytest.fixture def pwl_path(tmp_path): res = tmp_path / ("pwl.txt") res.write_text("") return res def set_pwl_contents(path, contents): """Set the contents of the PWL file.""" path.write_text("\n".join(contents)) def get_pwl_contents(path): """Retrieve the contents of the PWL file.""" contents = path.read_text() return [c.strip() for c in contents.splitlines()] def test_check(pwl_path): """Test that basic checking works for PWLs.""" set_pwl_contents(pwl_path, ["Sazz", "Lozz"]) d = request_pwl_dict(str(pwl_path)) assert d.check("Sazz") assert d.check("Lozz") assert not d.check("hello") @pytest.mark.skipif( sys.implementation.name == "pypy" and sys.platform == "win32", reason="failing for an unknown reason", ) # This test only fails on mypy3 and Windows. 
Not sure if it's # a bug in PyEnchant, Enchant or pypy3 def test_unicodefn(tmp_path): """Test that unicode PWL filenames are accepted.""" unicode_path = tmp_path / "테스트" set_pwl_contents(unicode_path, ["Lozz"]) d = request_pwl_dict(str(unicode_path)) assert d.check("Lozz") assert d def test_add(pwl_path): """Test that adding words to a PWL works correctly.""" d = request_pwl_dict(str(pwl_path)) assert not d.check("Flagen") d.add("Esquilax") d.add("Esquilam") assert d.check("Esquilax") assert "Esquilax" in get_pwl_contents(pwl_path) assert d.is_added("Esquilax") def test_suggestions(pwl_path): """Test getting suggestions from a PWL.""" set_pwl_contents(pwl_path, ["Sazz", "Lozz"]) d = request_pwl_dict(str(pwl_path)) assert "Sazz" in d.suggest("Saz") assert "Lozz" in d.suggest("laz") assert "Sazz" in d.suggest("laz") d.add("Flagen") assert "Flagen" in d.suggest("Flags") assert "sazz" not in d.suggest("Flags") def test_dwpwl(tmp_path, pwl_path): """Test functionality of DictWithPWL.""" set_pwl_contents(pwl_path, ["Sazz", "Lozz"]) other_path = tmp_path / "pel.txt" d = DictWithPWL("en_US", str(pwl_path), str(other_path)) assert d.check("Sazz") assert d.check("Lozz") assert d.check("hello") assert not d.check("helo") assert not d.check("Flagen") d.add("Flagen") assert d.check("Flagen") assert "Flagen" in get_pwl_contents(pwl_path) assert "Flagen" in d.suggest("Flagn") assert "hello" in d.suggest("helo") d.remove("hello") assert not d.check("hello") assert "hello" not in d.suggest("helo") d.remove("Lozz") assert not d.check("Lozz") def test_dwpwl_empty(tmp_path): """Test functionality of DictWithPWL using transient dicts.""" d = DictWithPWL("en_US", None, None) assert d.check("hello") assert not d.check("helo") assert not d.check("Flagen") d.add("Flagen") assert d.check("Flagen") d.remove("hello") assert not d.check("hello") d.add("hello") assert d.check("hello") def test_pypwl(tmp_path): """Test our pure-python PWL implementation.""" d = PyPWL() assert list(d._words) == [] d.add("hello") d.add("there") d.add("duck") ws = list(d._words) assert len(ws) == 3 assert "hello" in ws assert "there" in ws assert "duck" in ws d.remove("duck") d.remove("notinthere") ws = list(d._words) assert len(ws) == 2 assert "hello" in ws assert "there" in ws ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1684682588.0 pyenchant-3.3.0rc1/tests/test_tokenize.py0000644000175000017500000003143514432433534020070 0ustar00dmerejdmerej# pyenchant # # Copyright (C) 2004-2008, Ryan Kelly # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the # Free Software Foundation, Inc., 59 Temple Place - Suite 330, # Boston, MA 02111-1307, USA. # # In addition, as a special exception, you are # given permission to link the code of this program with # non-LGPL Spelling Provider libraries (eg: a MSFT Office # spell checker backend) and distribute linked combinations including # the two. 
You must obey the GNU Lesser General Public License in all # respects for all of the code used other than said providers. If you modify # this file, you may extend this exception to your version of the # file, but you are not obligated to do so. If you do not wish to # do so, delete this exception statement from your version. # import textwrap import pytest from enchant.tokenize import ( EmailFilter, HTMLChunker, URLFilter, WikiWordFilter, basic_tokenize, empty_tokenize, get_tokenizer, wrap_tokenizer, ) from enchant.tokenize.en import tokenize as tokenize_en def test_basic_tokenize(): """Simple regression test for basic white-space tokenization.""" input = """This is a paragraph. It's not very special, but it's designed 2 show how the splitter works with many-different combos of words. Also need to "test" the (handling) of 'quoted' words.""" assert list(basic_tokenize(input)) == [ ("This", 0), ("is", 5), ("a", 8), ("paragraph", 10), ("It's", 22), ("not", 27), ("very", 31), ("special", 36), ("but", 45), ("it's", 49), ("designed", 54), ("2", 63), ("show", 65), ("how", 70), ("the", 74), ("splitter", 78), ("works", 87), ("with", 93), ("many-different", 98), ("combos", 113), ("of", 120), ("words", 123), ("Also", 130), ("need", 135), ("to", 140), ("test", 144), ("the", 150), ("handling", 155), ("of", 165), ("quoted", 169), ("words", 177), ] def test_tokenize_strip(): """Test special-char-stripping edge-cases in basic_tokenize.""" input = "((' \"\" 'text' has (lots) of (special chars} >>]" assert list(basic_tokenize(input)) == [ ("", 4), ("text", 15), ("has", 21), ("lots", 26), ("of", 32), ("special", 36), ("chars}", 44), (">>", 51), ] def test_wrap_tokenizer(): """Test wrapping of one tokenizer with another.""" input = "this-string will be split@according to diff'rnt rules" from enchant.tokenize import en tknzr = wrap_tokenizer(basic_tokenize, en.tokenize) tknzr = tknzr(input) assert tknzr._tokenizer.__class__ == basic_tokenize assert tknzr._tokenizer.offset == 0 for (n, (word, pos)) in enumerate(tknzr): if n == 0: assert pos == 0 assert word == "this" if n == 1: assert pos == 5 assert word == "string" if n == 2: assert pos == 12 assert word == "will" # Test setting offset to a previous token tknzr.set_offset(5) assert tknzr.offset == 5 assert tknzr._tokenizer.offset == 5 assert tknzr._curtok.__class__ == empty_tokenize if n == 3: assert word == "string" assert pos == 5 if n == 4: assert pos == 12 assert word == "will" if n == 5: assert pos == 17 assert word == "be" # Test setting offset past the current token tknzr.set_offset(20) assert tknzr.offset == 20 assert tknzr._tokenizer.offset == 20 assert tknzr._curtok.__class__ == empty_tokenize if n == 6: assert pos == 20 assert word == "split" if n == 7: assert pos == 26 assert word == "according" # Test setting offset to middle of current token tknzr.set_offset(23) assert tknzr.offset == 23 assert tknzr._tokenizer.offset == 23 if n == 8: assert pos == 23 assert word == "it" # OK, I'm pretty happy with the behaviour, no need to # continue testing the rest of the string @pytest.fixture def test_text(): text = """this text with http://url.com and SomeLinksLike ftp://my.site.com.au/some/file AndOthers not:/quite.a.url with-an@aemail.address as well""" return text def test_url_filter(test_text): """Test filtering of URLs""" tknzr = get_tokenizer("en_US", filters=(URLFilter,)) assert list(tknzr(test_text)) == [ ("this", 0), ("text", 5), ("with", 10), ("and", 30), ("SomeLinksLike", 34), ("AndOthers", 93), ("not", 103), ("quite", 108), ("a", 114), ("url", 
116), ("with", 134), ("an", 139), ("aemail", 142), ("address", 149), ("as", 157), ("well", 160), ] def test_wiki_word_filter(test_text): """Test filtering of WikiWords""" tknzr = get_tokenizer("en_US", filters=(WikiWordFilter,)) assert list(tknzr(test_text)) == [ ("this", 0), ("text", 5), ("with", 10), ("http", 15), ("url", 22), ("com", 26), ("and", 30), ("ftp", 62), ("my", 68), ("site", 71), ("com", 76), ("au", 80), ("some", 83), ("file", 88), ("not", 103), ("quite", 108), ("a", 114), ("url", 116), ("with", 134), ("an", 139), ("aemail", 142), ("address", 149), ("as", 157), ("well", 160), ] def test_email_filter(test_text): """Test filtering of email addresses""" tknzr = get_tokenizer("en_US", filters=(EmailFilter,)) assert list(tknzr(test_text)) == [ ("this", 0), ("text", 5), ("with", 10), ("http", 15), ("url", 22), ("com", 26), ("and", 30), ("SomeLinksLike", 34), ("ftp", 62), ("my", 68), ("site", 71), ("com", 76), ("au", 80), ("some", 83), ("file", 88), ("AndOthers", 93), ("not", 103), ("quite", 108), ("a", 114), ("url", 116), ("as", 157), ("well", 160), ] def test_combined_filter(test_text): """Test several filters combined""" tknzr = get_tokenizer("en_US", filters=(URLFilter, WikiWordFilter, EmailFilter)) assert list(tknzr(test_text)) == [ ("this", 0), ("text", 5), ("with", 10), ("and", 30), ("not", 103), ("quite", 108), ("a", 114), ("url", 116), ("as", 157), ("well", 160), ] def test_html_chunker(): """Test filtering of URLs""" text = """hellomy titlethis is a simple HTML document for

testing purposes

. It < contains > various <-- special characters. """ tknzr = get_tokenizer("en_US", chunkers=(HTMLChunker,)) assert list(tknzr(text)) == [ ("hello", 0), ("my", 24), ("title", 27), ("this", 53), ("is", 58), ("a", 61), ("simple", 80), ("HTML", 91), ("document", 96), ("for", 105), ("test", 113), ("ing", 120), ("purposes", 128), ("It", 154), ("contains", 159), ("various", 170), ("special", 182), ("characters", 190), ] def test_tokenize_en(): """Simple regression test for English tokenization.""" input = """This is a paragraph. It's not very special, but it's designed 2 show how the splitter works with many-different combos of words. Also need to "test" the handling of 'quoted' words.""" assert list(tokenize_en(input)) == [ ("This", 0), ("is", 5), ("a", 8), ("paragraph", 10), ("It's", 22), ("not", 27), ("very", 31), ("special", 36), ("but", 45), ("it's", 49), ("designed", 54), ("show", 65), ("how", 70), ("the", 74), ("splitter", 78), ("works", 87), ("with", 93), ("many", 98), ("different", 103), ("combos", 113), ("of", 120), ("words", 123), ("Also", 130), ("need", 135), ("to", 140), ("test", 144), ("the", 150), ("handling", 154), ("of", 163), ("quoted", 167), ("words", 175), ] def test_unicode_basic(): """Test tokenization of a basic unicode string.""" input = "Ik ben geïnteresseerd in de coördinatie van mijn knieën, maar kan niet één à twee enquêtes vinden die recht doet aan mijn carrière op Curaçao" output = input.split(" ") output[8] = output[8][0:-1] for (itm_o, itm_v) in zip(output, tokenize_en(input)): assert itm_o == itm_v[0] assert input[itm_v[1] :].startswith(itm_o) def test_bug1591450(): """Check for tokenization regressions identified in bug #1591450.""" input = """Testing markup and {y:i}so-forth...leading dots and trail--- well, you get-the-point. Also check numbers: 999 1,000 12:00 .45. Done?""" assert list(tokenize_en(input)) == [ ("Testing", 0), ("i", 9), ("markup", 11), ("i", 19), ("and", 22), ("y", 27), ("i", 29), ("so", 31), ("forth", 34), ("leading", 42), ("dots", 50), ("and", 55), ("trail", 59), ("well", 68), ("you", 74), ("get", 78), ("the", 82), ("point", 86), ("Also", 93), ("check", 98), ("numbers", 104), ("Done", 134), ] def test_bug2785373(): """Testcases for bug #2785373""" input = "So, one dey when I wes 17, I left." for _ in tokenize_en(input): pass input = "So, one dey when I wes 17, I left." for _ in tokenize_en(input): pass def test_finnish_text(): """Test tokenizing some Finnish text. This really should work since there are no special rules to apply, just lots of non-ascii characters. """ text = textwrap.dedent( """\ Tämä on kappale. Eipä ole kovin 2 nen, mutta tarkoitus on näyttää miten sanastaja toimii useiden-erilaisten sanaryppäiden kimpussa. Pitääpä vielä 'tarkistaa' sanat jotka "lainausmerkeissä". Heittomerkki ja vaa'an. Ulkomaisia sanoja süss, spaß. 
""" ) assert list(tokenize_en(text)) == [ ("Tämä", 0), ("on", 5), ("kappale", 8), ("Eipä", 17), ("ole", 22), ("kovin", 26), ("nen", 34), ("mutta", 39), ("tarkoitus", 45), ("on", 55), ("näyttää", 58), ("miten", 66), ("sanastaja", 72), ("toimii", 83), ("useiden", 90), ("erilaisten", 98), ("sanaryppäiden", 109), ("kimpussa", 123), ("Pitääpä", 133), ("vielä", 141), ("tarkistaa", 148), ("sanat", 159), ("jotka", 165), ("lainausmerkeissä", 172), ("Heittomerkki", 191), ("ja", 204), ("vaa'an", 207), ("Ulkomaisia", 215), ("sanoja", 226), ("süss", 233), ("spaß", 239), ] def test_typographic_apostrophe(): """ "Typographic apostrophes should be word separators in English.""" text = "They\u2019re here" assert list(tokenize_en(text)) == [ ("They", 0), ("re", 5), ("here", 8), ] @pytest.mark.parametrize( "text,expected", [ ("", []), (b"", []), (bytearray(), []), ("a", [("a", 0)]), (b"a", [(b"a", 0)]), (bytearray((97,)), [(bytearray((97,)), 0)]), ("ä", [("ä", 0)]), (b"\xc3", []), (b"\xc3\xa4", [(b"\xc3\xa4", 0)]), (bytearray((0xC3,)), []), (bytearray((0xC3, 0xA4)), [(bytearray((0xC3, 0xA4)), 0)]), ], ) def test_tokenize_en_byte(text, expected): """Test tokenizing bytes.""" assert list(tokenize_en(text)) == expected ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1684682588.0 pyenchant-3.3.0rc1/tests/test_utils.py0000644000175000017500000000077514432433534017403 0ustar00dmerejdmerejfrom enchant.utils import trim_suggestions def test_trim_suggestions(): word = "gud" suggs = ["good", "god", "bad+"] assert trim_suggestions(word, suggs, 40) == ["god", "good", "bad+"] assert trim_suggestions(word, suggs, 4) == ["god", "good", "bad+"] assert trim_suggestions(word, suggs, 3) == ["god", "good", "bad+"] assert trim_suggestions(word, suggs, 2) == ["god", "good"] assert trim_suggestions(word, suggs, 1) == ["god"] assert trim_suggestions(word, suggs, 0) == []