pyenchant-1.6.5/LICENSE.txt

GNU LESSER GENERAL PUBLIC LICENSE Version 2.1, February 1999 Copyright (C) 1991, 1999 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. [This is the first released version of the Lesser GPL. It also counts as the successor of the GNU Library Public License, version 2, hence the version number 2.1.] Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public Licenses are intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This license, the Lesser General Public License, applies to some specially designated software packages--typically libraries--of the Free Software Foundation and other authors who decide to use it. You can use it too, but we suggest you first think carefully about whether this license or the ordinary General Public License is the better strategy to use in any particular case, based on the explanations below. When we speak of free software, we are referring to freedom of use, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish); that you receive source code or can get it if you want it; that you can change the software and use pieces of it in new free programs; and that you are informed that you can do these things. To protect your rights, we need to make restrictions that forbid distributors to deny you these rights or to ask you to surrender these rights.
These restrictions translate to certain responsibilities for you if you distribute copies of the library or if you modify it. For example, if you distribute copies of the library, whether gratis or for a fee, you must give the recipients all the rights that we gave you. You must make sure that they, too, receive or can get the source code. If you link other code with the library, you must provide complete object files to the recipients, so that they can relink them with the library after making changes to the library and recompiling it. And you must show them these terms so they know their rights. We protect your rights with a two-step method: (1) we copyright the library, and (2) we offer you this license, which gives you legal permission to copy, distribute and/or modify the library. To protect each distributor, we want to make it very clear that there is no warranty for the free library. Also, if the library is modified by someone else and passed on, the recipients should know that what they have is not the original version, so that the original author's reputation will not be affected by problems that might be introduced by others. Finally, software patents pose a constant threat to the existence of any free program. We wish to make sure that a company cannot effectively restrict the users of a free program by obtaining a restrictive license from a patent holder. Therefore, we insist that any patent license obtained for a version of the library must be consistent with the full freedom of use specified in this license. Most GNU software, including some libraries, is covered by the ordinary GNU General Public License. This license, the GNU Lesser General Public License, applies to certain designated libraries, and is quite different from the ordinary General Public License. We use this license for certain libraries in order to permit linking those libraries into non-free programs. 
When a program is linked with a library, whether statically or using a shared library, the combination of the two is legally speaking a combined work, a derivative of the original library. The ordinary General Public License therefore permits such linking only if the entire combination fits its criteria of freedom. The Lesser General Public License permits more lax criteria for linking other code with the library. We call this license the "Lesser" General Public License because it does Less to protect the user's freedom than the ordinary General Public License. It also provides other free software developers Less of an advantage over competing non-free programs. These disadvantages are the reason we use the ordinary General Public License for many libraries. However, the Lesser license provides advantages in certain special circumstances. For example, on rare occasions, there may be a special need to encourage the widest possible use of a certain library, so that it becomes a de-facto standard. To achieve this, non-free programs must be allowed to use the library. A more frequent case is that a free library does the same job as widely used non-free libraries. In this case, there is little to gain by limiting the free library to free software only, so we use the Lesser General Public License. In other cases, permission to use a particular library in non-free programs enables a greater number of people to use a large body of free software. For example, permission to use the GNU C Library in non-free programs enables many more people to use the whole GNU operating system, as well as its variant, the GNU/Linux operating system. Although the Lesser General Public License is Less protective of the users' freedom, it does ensure that the user of a program that is linked with the Library has the freedom and the wherewithal to run that program using a modified version of the Library. The precise terms and conditions for copying, distribution and modification follow. 
Pay close attention to the difference between a "work based on the library" and a "work that uses the library". The former contains code derived from the library, whereas the latter must be combined with the library in order to run. GNU LESSER GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License Agreement applies to any software library or other program which contains a notice placed by the copyright holder or other authorized party saying it may be distributed under the terms of this Lesser General Public License (also called "this License"). Each licensee is addressed as "you". A "library" means a collection of software functions and/or data prepared so as to be conveniently linked with application programs (which use some of those functions and data) to form executables. The "Library", below, refers to any such software library or work which has been distributed under these terms. A "work based on the Library" means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language. (Hereinafter, translation is included without limitation in the term "modification".) "Source code" for a work means the preferred form of the work for making modifications to it. For a library, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the library. Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running a program using the Library is not restricted, and output from such a program is covered only if its contents constitute a work based on the Library (independent of the use of the Library in a tool for writing it). 
Whether that is true depends on what the Library does and what the program that uses the Library does. 1. You may copy and distribute verbatim copies of the Library's complete source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and distribute a copy of this License along with the Library. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Library or any portion of it, thus forming a work based on the Library, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) The modified work must itself be a software library. b) You must cause the files modified to carry prominent notices stating that you changed the files and the date of any change. c) You must cause the whole of the work to be licensed at no charge to all third parties under the terms of this License. d) If a facility in the modified Library refers to a function or a table of data to be supplied by an application program that uses the facility, other than as an argument passed when the facility is invoked, then you must make a good faith effort to ensure that, in the event an application does not supply such function or table, the facility still operates, and performs whatever part of its purpose remains meaningful. (For example, a function in a library to compute square roots has a purpose that is entirely well-defined independent of the application. Therefore, Subsection 2d requires that any application-supplied function or table used by this function must be optional: if the application does not supply it, the square root function must still compute square roots.) 
These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Library, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Library, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Library. In addition, mere aggregation of another work not based on the Library with the Library (or with a work based on the Library) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may opt to apply the terms of the ordinary GNU General Public License instead of this License to a given copy of the Library. To do this, you must alter all the notices that refer to this License, so that they refer to the ordinary GNU General Public License, version 2, instead of to this License. (If a newer version than version 2 of the ordinary GNU General Public License has appeared, then you can specify that version instead if you wish.) Do not make any other change in these notices. Once this change is made in a given copy, it is irreversible for that copy, so the ordinary GNU General Public License applies to all subsequent copies and derivative works made from that copy. This option is useful when you wish to copy part of the code of the Library into a program that is not a library. 4. 
You may copy and distribute the Library (or a portion or derivative of it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange. If distribution of object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place satisfies the requirement to distribute the source code, even though third parties are not compelled to copy the source along with the object code. 5. A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a "work that uses the Library". Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License. However, linking a "work that uses the Library" with the Library creates an executable that is a derivative of the Library (because it contains portions of the Library), rather than a "work that uses the library". The executable is therefore covered by this License. Section 6 states terms for distribution of such executables. When a "work that uses the Library" uses material from a header file that is part of the Library, the object code for the work may be a derivative work of the Library even though the source code is not. Whether this is true is especially significant if the work can be linked without the Library, or if the work is itself a library. The threshold for this to be true is not precisely defined by law. 
If such an object file uses only numerical parameters, data structure layouts and accessors, and small macros and small inline functions (ten lines or less in length), then the use of the object file is unrestricted, regardless of whether it is legally a derivative work. (Executables containing this object code plus portions of the Library will still fall under Section 6.) Otherwise, if the work is a derivative of the Library, you may distribute the object code for the work under the terms of Section 6. Any executables containing that work also fall under Section 6, whether or not they are linked directly with the Library itself. 6. As an exception to the Sections above, you may also combine or link a "work that uses the Library" with the Library to produce a work containing portions of the Library, and distribute that work under terms of your choice, provided that the terms permit modification of the work for the customer's own use and reverse engineering for debugging such modifications. You must give prominent notice with each copy of the work that the Library is used in it and that the Library and its use are covered by this License. You must supply a copy of this License. If the work during execution displays copyright notices, you must include the copyright notice for the Library among them, as well as a reference directing the user to the copy of this License. Also, you must do one of these things: a) Accompany the work with the complete corresponding machine-readable source code for the Library including whatever changes were used in the work (which must be distributed under Sections 1 and 2 above); and, if the work is an executable linked with the Library, with the complete machine-readable "work that uses the Library", as object code and/or source code, so that the user can modify the Library and then relink to produce a modified executable containing the modified Library. 
(It is understood that the user who changes the contents of definitions files in the Library will not necessarily be able to recompile the application to use the modified definitions.) b) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (1) uses at run time a copy of the library already present on the user's computer system, rather than copying library functions into the executable, and (2) will operate properly with a modified version of the library, if the user installs one, as long as the modified version is interface-compatible with the version that the work was made with. c) Accompany the work with a written offer, valid for at least three years, to give the same user the materials specified in Subsection 6a, above, for a charge no more than the cost of performing this distribution. d) If distribution of the work is made by offering access to copy from a designated place, offer equivalent access to copy the above specified materials from the same place. e) Verify that the user has already received a copy of these materials or that you have already sent this user a copy. For an executable, the required form of the "work that uses the Library" must include any data and utility programs needed for reproducing the executable from it. However, as a special exception, the materials to be distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. It may happen that this requirement contradicts the license restrictions of other proprietary libraries that do not normally accompany the operating system. Such a contradiction means you cannot use both them and the Library together in an executable that you distribute. 7. 
You may place library facilities that are a work based on the Library side-by-side in a single library together with other library facilities not covered by this License, and distribute such a combined library, provided that the separate distribution of the work based on the Library and of the other library facilities is otherwise permitted, and provided that you do these two things: a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities. This must be distributed under the terms of the Sections above. b) Give prominent notice with the combined library of the fact that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work. 8. You may not copy, modify, sublicense, link with, or distribute the Library except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, link with, or distribute the Library is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 9. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Library or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Library (or any work based on the Library), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Library or works based on it. 10. Each time you redistribute the Library (or any work based on the Library), the recipient automatically receives a license from the original licensor to copy, distribute, link with or modify the Library subject to these terms and conditions. 
You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties with this License. 11. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Library at all. For example, if a patent license would not permit royalty-free redistribution of the Library by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Library. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply, and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 12. 
If the distribution and/or use of the Library is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Library under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 13. The Free Software Foundation may publish revised and/or new versions of the Lesser General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Library specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Library does not specify a license version number, you may choose any version ever published by the Free Software Foundation. 14. If you wish to incorporate parts of the Library into other free programs whose distribution conditions are incompatible with these, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Libraries If you develop a new library, and you want it to be of the greatest possible use to the public, we recommend making it free software that everyone can redistribute and change. You can do so by permitting redistribution under these terms (or, alternatively, under the terms of the ordinary General Public License). To apply these terms, attach the following notices to the library. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. 
    <one line to give the library's name and a brief idea of what it does.>
    Copyright (C) <year>  <name of author>

    This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.

    This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

    You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

Also add information on how to contact you by electronic and paper mail.

You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the library, if necessary. Here is a sample; alter the names:

  Yoyodyne, Inc., hereby disclaims all copyright interest in the library `Frob' (a library for tweaking knobs) written by James Random Hacker.

  <signature of Ty Coon>, 1 April 1990
  Ty Coon, President of Vice

That's all there is to it!

pyenchant-1.6.5/setup.py

#
# This is the pyenchant setuptools script.
# Originally developed by Ryan Kelly, 2004.
#
# This script is placed in the public domain.
#

import distribute_setup
distribute_setup.use_setuptools()

from setuptools import setup, find_packages, Extension
from distutils.archive_util import make_archive

import sys
import os
import shutil

setup_kwds = {}
if sys.version_info > (3,):
    setup_kwds["use_2to3"] = True


# Location of the prebuilt binaries, if available
if sys.platform == "win32":
    BINDEPS = ".\\tools\\pyenchant-bdist-win32-sources\\build"
    DYLIB_EXT = ".dll"
elif sys.platform == "darwin":
    BINDEPS = "./tools/pyenchant-bdist-osx-sources/build"
    DYLIB_EXT = ".dylib"


# Package MetaData
NAME = "pyenchant"
DESCRIPTION = "Python bindings for the Enchant spellchecking system"
AUTHOR = "Ryan Kelly"
AUTHOR_EMAIL = "ryan@rfk.id.au"
URL = "http://www.rfk.id.au/software/pyenchant/"
LICENSE = "LGPL"
KEYWORDS = "spelling spellcheck enchant"
CLASSIFIERS = [
    "Development Status :: 5 - Production/Stable",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)",
    "Operating System :: OS Independent",
    "Programming Language :: Python :: 2",
    "Programming Language :: Python :: 3",
    "Topic :: Software Development :: Libraries",
    "Topic :: Text Processing :: Linguistic",
    ]

# Module Lists
PACKAGES = find_packages()
EXT_MODULES = []
PKG_DATA = {}
EAGER_RES = []


#
# Helper functions for packaging dynamic libs on OSX.
#

def osx_make_lib_relocatable(libpath,bundle_dir=None):
    """Make an OSX dynamic lib re-locatable by changing dep paths.

    This function adjusts the path information stored in the given dynamic
    library, so that it can be bundled into a directory and redistributed.
    It returns a list of any dependencies that must also be included in the
    bundle directory.
    """
    if sys.platform != "darwin":
        raise RuntimeError("only works on osx")
    import subprocess
    import shutil
    def do(*cmd):
        subprocess.Popen(cmd).wait()
    def bt(*cmd):
        # universal_newlines=True makes the output text rather than bytes,
        # so the "\n" split below also works on Python 3.
        return subprocess.Popen(cmd,stdout=subprocess.PIPE,
                                universal_newlines=True).stdout.read()
    (dirnm,nm) = os.path.split(libpath)
    if bundle_dir is None:
        bundle_dir = dirnm
    # Fix the installed name of the lib to be relative to rpath.
    if libpath.endswith(".dylib"):
        do("install_name_tool","-id","@loader_path/"+nm,libpath)
    # Fix references to any non-core dependencies, and copy them into
    # the target dir so they will be fixed up in turn.
    deps = []
    deplines = bt("otool","-L",libpath).split("\n")
    if libpath.endswith(".dylib"):
        deplines = deplines[2:]
    else:
        deplines = deplines[1:]
    for dep in deplines:
        dep = dep.strip()
        if not dep:
            continue
        dep = dep.split()[0]
        if dep.startswith("/System/") or dep.startswith("/usr/"):
            continue
        depnm = os.path.basename(dep)
        numdirs = len(dirnm[len(bundle_dir):].split("/")) - 1
        loadpath = "@loader_path/" + ("../"*numdirs) + depnm
        do("install_name_tool","-change",dep,loadpath,libpath)
        deps.append(dep)
    return deps


def osx_bundle_lib(libpath):
    """Bundle dependencies into the same directory as the given library."""
    if sys.platform != "darwin":
        raise RuntimeError("only works on osx")
    bundle_dir = os.path.dirname(libpath)
    for nm in os.listdir(bundle_dir):
        oldpath = os.path.join(bundle_dir,nm)
        if oldpath != libpath and os.path.isfile(oldpath):
            os.unlink(oldpath)
    todo = osx_make_lib_relocatable(libpath,bundle_dir)
    for deppath in todo:
        depnm = os.path.basename(deppath)
        bdeppath = os.path.join(bundle_dir,depnm)
        if not os.path.exists(bdeppath):
            shutil.copy2(deppath,bdeppath)
            todo.extend(osx_make_lib_relocatable(bdeppath,bundle_dir))


#
# Build and distribution information is different on Windows and OSX.
#
# There's the possibility of including pre-built support DLLs
# for the Windows installer. They will be included if the directory
# exists when this script is run.
# They are copied into the package directory so setuptools can locate them.
#
if sys.platform in ("win32","darwin",):
    PKG_DATA["enchant"] = ["*"+DYLIB_EXT,"lib/*"+DYLIB_EXT,
                           "lib/enchant/*"+DYLIB_EXT,
                           "lib/enchant/*.so",
                           "lib/enchant/*.txt",
                           "share/enchant/myspell/*.*",
                           "share/enchant/ispell/*.*"]
    EAGER_RES = ["enchant/lib", "enchant/share"]
    # Copy local DLLs across if available
    if os.path.exists(BINDEPS):
        # Main enchant DLL
        libDir = os.path.join(BINDEPS,"lib")
        for fName in os.listdir(libDir):
            if "enchant" in fName and fName.endswith(DYLIB_EXT):
                print("COPYING: " + fName)
                if sys.platform == "win32":
                    libroot = os.path.join(".","enchant")
                    EAGER_RES.append("enchant/" + fName)
                else:
                    libroot = os.path.join(".","enchant","lib")
                    EAGER_RES.append("enchant/lib/" + fName)
                shutil.copy(os.path.join(libDir,fName),libroot)
                break
        # Dependencies. On win32 we just bundle everything, on OSX we call
        # a helper function that tracks (and re-writes) dependencies
        if sys.platform == "darwin":
            osx_bundle_lib(os.path.join(libroot,fName))
            for fName in os.listdir(libroot):
                EAGER_RES.append("enchant/lib/" + fName)
        else:
            for fName in os.listdir(libDir):
                if fName.endswith(DYLIB_EXT):
                    print("COPYING: " + fName)
                    libroot = os.path.join(".","enchant")
                    EAGER_RES.append("enchant/" + fName)
                    shutil.copy(os.path.join(libDir,fName),libroot)
        # Enchant plugins
        plugDir = os.path.join(BINDEPS,"lib","enchant")
        for fName in os.listdir(plugDir):
            if fName.endswith(DYLIB_EXT) or fName.endswith(".so"):
                print("COPYING: " + fName)
                EAGER_RES.append("enchant/lib/enchant/" + fName)
                fDest = os.path.join(".","enchant","lib","enchant",fName)
                shutil.copy(os.path.join(plugDir,fName),fDest)
                if sys.platform == "darwin":
                    osx_make_lib_relocatable(fDest,libroot)
        # Local Dictionaries
        dictPath = os.path.join(BINDEPS,"share","enchant","myspell")
        if os.path.isdir(dictPath):
            for dictName in os.listdir(dictPath):
                if dictName[-3:] in ["txt","dic","aff"]:
                    print("COPYING: " + dictName)
                    shutil.copy(os.path.join(dictPath,dictName),
                        os.path.join(".","enchant","share","enchant","myspell"))
        dictPath = os.path.join(BINDEPS,"share","enchant","ispell")
        if os.path.isdir(dictPath):
            for dictName in os.listdir(dictPath):
                if dictName.endswith("hash") or dictName == "README.txt":
                    print("COPYING: " + dictName)
                    shutil.copy(os.path.join(dictPath,dictName),
                        os.path.join(".","enchant","share","enchant","ispell"))


## Now we can import enchant to get at version info
import enchant
VERSION = enchant.__version__


##
## Main call to setup() function
##

setup(name=NAME,
      version=VERSION,
      author=AUTHOR,
      author_email=AUTHOR_EMAIL,
      url=URL,
      description=DESCRIPTION,
      license=LICENSE,
      keywords=KEYWORDS,
      classifiers=CLASSIFIERS,
      packages=PACKAGES,
      package_data=PKG_DATA,
      eager_resources=EAGER_RES,
      include_package_data=True,
      test_suite="enchant.tests.buildtestsuite",
     )


dist_dir = os.path.join(os.path.dirname(__file__),"dist")
if os.path.exists(dist_dir):
    for nm in os.listdir(dist_dir):
        # Rename any eggs to make it clear they're platform-specific.
        # This isn't done by default because we don't build any extension modules,
        # but rather bundle our libs as data_files.
        if nm.endswith("py%d.%d.egg" % sys.version_info[:2]):
            if sys.platform == "win32":
                platform = "win32"
            elif sys.platform == "darwin":
                platform = "macosx-10.4-universal"
            else:
                continue
            newname = nm.rsplit(".",1)[0] + "-" + platform + ".egg"
            newpath = os.path.join(dist_dir,newname)
            if os.path.exists(newpath):
                os.unlink(newpath)
            os.rename(os.path.join(dist_dir,nm),newpath)
        # Rename any mpkgs to give better platform info, and zip them up
        # for easy uploading to PyPI.
        elif nm.endswith(".mpkg"):
            if sys.platform != "darwin":
                continue
            platform = "macosx-10.4-universal"
            if platform in nm:
                continue
            newname = nm.rsplit("macosx",1)[0] + platform + ".mpkg"
            newpath = os.path.join(dist_dir,newname)
            if os.path.exists(newpath):
                shutil.rmtree(newpath)
            os.rename(os.path.join(dist_dir,nm),newpath)
            if os.path.exists(newpath+".zip"):
                os.unlink(newpath+".zip")
            make_archive(newpath,"zip",dist_dir,newname)
            shutil.rmtree(newpath)
pyenchant-1.6.5/TODO.txt0000644000175000017500000000040311437044404013146 0ustar rfkrfk
PyEnchant:

  * implement a tkSpellCheckerDialog to complement the wx one
  * Expand CmdLineChecker and make it ispell-compatible
  * rename "tokenize", it tends to conflict with the top-level stdlib
    module of the same name (although I've no idea why)

pyenchant-1.6.5/distribute_setup.py0000644000175000017500000003566511501534256015633 0ustar rfkrfk
#!python
"""Bootstrap distribute installation

If you want to use setuptools in your package's setup.py, just include this
file in the same directory with it, and add this to the top of your setup.py::

    from distribute_setup import use_setuptools
    use_setuptools()

If you want to require a specific version of setuptools, set a download
mirror, or use an alternate download directory, you can do so by supplying
the appropriate options to ``use_setuptools()``.

This file can also be run as a script to install or upgrade setuptools.
""" import os import sys import time import fnmatch import tempfile import tarfile from distutils import log try: from site import USER_SITE except ImportError: USER_SITE = None try: import subprocess def _python_cmd(*args): args = (sys.executable,) + args return subprocess.call(args) == 0 except ImportError: # will be used for python 2.3 def _python_cmd(*args): args = (sys.executable,) + args # quoting arguments if windows if sys.platform == 'win32': def quote(arg): if ' ' in arg: return '"%s"' % arg return arg args = [quote(arg) for arg in args] return os.spawnl(os.P_WAIT, sys.executable, *args) == 0 DEFAULT_VERSION = "0.6.10" DEFAULT_URL = "http://pypi.python.org/packages/source/d/distribute/" SETUPTOOLS_FAKED_VERSION = "0.6c11" SETUPTOOLS_PKG_INFO = """\ Metadata-Version: 1.0 Name: setuptools Version: %s Summary: xxxx Home-page: xxx Author: xxx Author-email: xxx License: xxx Description: xxx """ % SETUPTOOLS_FAKED_VERSION def _install(tarball): # extracting the tarball tmpdir = tempfile.mkdtemp() log.warn('Extracting in %s', tmpdir) old_wd = os.getcwd() try: os.chdir(tmpdir) tar = tarfile.open(tarball) _extractall(tar) tar.close() # going in the directory subdir = os.path.join(tmpdir, os.listdir(tmpdir)[0]) os.chdir(subdir) log.warn('Now working in %s', subdir) # installing log.warn('Installing Distribute') if not _python_cmd('setup.py', 'install'): log.warn('Something went wrong during the installation.') log.warn('See the error message above.') finally: os.chdir(old_wd) def _build_egg(egg, tarball, to_dir): # extracting the tarball tmpdir = tempfile.mkdtemp() log.warn('Extracting in %s', tmpdir) old_wd = os.getcwd() try: os.chdir(tmpdir) tar = tarfile.open(tarball) _extractall(tar) tar.close() # going in the directory subdir = os.path.join(tmpdir, os.listdir(tmpdir)[0]) os.chdir(subdir) log.warn('Now working in %s', subdir) # building an egg log.warn('Building a Distribute egg in %s', to_dir) _python_cmd('setup.py', '-q', 'bdist_egg', '--dist-dir', to_dir) 
finally: os.chdir(old_wd) # returning the result log.warn(egg) if not os.path.exists(egg): raise IOError('Could not build the egg.') def _do_download(version, download_base, to_dir, download_delay): egg = os.path.join(to_dir, 'distribute-%s-py%d.%d.egg' % (version, sys.version_info[0], sys.version_info[1])) if not os.path.exists(egg): tarball = download_setuptools(version, download_base, to_dir, download_delay) _build_egg(egg, tarball, to_dir) sys.path.insert(0, egg) import setuptools setuptools.bootstrap_install_from = egg def use_setuptools(version=DEFAULT_VERSION, download_base=DEFAULT_URL, to_dir=os.curdir, download_delay=15, no_fake=True): # making sure we use the absolute path to_dir = os.path.abspath(to_dir) was_imported = 'pkg_resources' in sys.modules or \ 'setuptools' in sys.modules try: try: import pkg_resources if not hasattr(pkg_resources, '_distribute'): if not no_fake: _fake_setuptools() raise ImportError except ImportError: return _do_download(version, download_base, to_dir, download_delay) try: pkg_resources.require("distribute>="+version) return except pkg_resources.VersionConflict: e = sys.exc_info()[1] if was_imported: sys.stderr.write( "The required version of distribute (>=%s) is not available,\n" "and can't be installed while this script is running. Please\n" "install a more recent version first, using\n" "'easy_install -U distribute'." 
"\n\n(Currently using %r)\n" % (version, e.args[0])) sys.exit(2) else: del pkg_resources, sys.modules['pkg_resources'] # reload ok return _do_download(version, download_base, to_dir, download_delay) except pkg_resources.DistributionNotFound: return _do_download(version, download_base, to_dir, download_delay) finally: if not no_fake: _create_fake_setuptools_pkg_info(to_dir) def download_setuptools(version=DEFAULT_VERSION, download_base=DEFAULT_URL, to_dir=os.curdir, delay=15): """Download distribute from a specified location and return its filename `version` should be a valid distribute version number that is available as an egg for download under the `download_base` URL (which should end with a '/'). `to_dir` is the directory where the egg will be downloaded. `delay` is the number of seconds to pause before an actual download attempt. """ # making sure we use the absolute path to_dir = os.path.abspath(to_dir) try: from urllib.request import urlopen except ImportError: from urllib2 import urlopen tgz_name = "distribute-%s.tar.gz" % version url = download_base + tgz_name saveto = os.path.join(to_dir, tgz_name) src = dst = None if not os.path.exists(saveto): # Avoid repeated downloads try: log.warn("Downloading %s", url) src = urlopen(url) # Read/write all in one block, so we don't create a corrupt file # if the download is interrupted. 
data = src.read() dst = open(saveto, "wb") dst.write(data) finally: if src: src.close() if dst: dst.close() return os.path.realpath(saveto) def _patch_file(path, content): """Will backup the file then patch it""" existing_content = open(path).read() if existing_content == content: # already patched log.warn('Already patched.') return False log.warn('Patching...') _rename_path(path) f = open(path, 'w') try: f.write(content) finally: f.close() return True def _same_content(path, content): return open(path).read() == content def _no_sandbox(function): def __no_sandbox(*args, **kw): try: from setuptools.sandbox import DirectorySandbox def violation(*args): pass DirectorySandbox._old = DirectorySandbox._violation DirectorySandbox._violation = violation patched = True except ImportError: patched = False try: return function(*args, **kw) finally: if patched: DirectorySandbox._violation = DirectorySandbox._old del DirectorySandbox._old return __no_sandbox @_no_sandbox def _rename_path(path): new_name = path + '.OLD.%s' % time.time() log.warn('Renaming %s into %s', path, new_name) os.rename(path, new_name) return new_name def _remove_flat_installation(placeholder): if not os.path.isdir(placeholder): log.warn('Unkown installation at %s', placeholder) return False found = False for file in os.listdir(placeholder): if fnmatch.fnmatch(file, 'setuptools*.egg-info'): found = True break if not found: log.warn('Could not locate setuptools*.egg-info') return log.warn('Removing elements out of the way...') pkg_info = os.path.join(placeholder, file) if os.path.isdir(pkg_info): patched = _patch_egg_dir(pkg_info) else: patched = _patch_file(pkg_info, SETUPTOOLS_PKG_INFO) if not patched: log.warn('%s already patched.', pkg_info) return False # now let's move the files out of the way for element in ('setuptools', 'pkg_resources.py', 'site.py'): element = os.path.join(placeholder, element) if os.path.exists(element): _rename_path(element) else: log.warn('Could not find the %s element of 
the ' 'Setuptools distribution', element) return True def _after_install(dist): log.warn('After install bootstrap.') placeholder = dist.get_command_obj('install').install_purelib _create_fake_setuptools_pkg_info(placeholder) @_no_sandbox def _create_fake_setuptools_pkg_info(placeholder): if not placeholder or not os.path.exists(placeholder): log.warn('Could not find the install location') return pyver = '%s.%s' % (sys.version_info[0], sys.version_info[1]) setuptools_file = 'setuptools-%s-py%s.egg-info' % \ (SETUPTOOLS_FAKED_VERSION, pyver) pkg_info = os.path.join(placeholder, setuptools_file) if os.path.exists(pkg_info): log.warn('%s already exists', pkg_info) return log.warn('Creating %s', pkg_info) f = open(pkg_info, 'w') try: f.write(SETUPTOOLS_PKG_INFO) finally: f.close() pth_file = os.path.join(placeholder, 'setuptools.pth') log.warn('Creating %s', pth_file) f = open(pth_file, 'w') try: f.write(os.path.join(os.curdir, setuptools_file)) finally: f.close() def _patch_egg_dir(path): # let's check if it's already patched pkg_info = os.path.join(path, 'EGG-INFO', 'PKG-INFO') if os.path.exists(pkg_info): if _same_content(pkg_info, SETUPTOOLS_PKG_INFO): log.warn('%s already patched.', pkg_info) return False _rename_path(path) os.mkdir(path) os.mkdir(os.path.join(path, 'EGG-INFO')) pkg_info = os.path.join(path, 'EGG-INFO', 'PKG-INFO') f = open(pkg_info, 'w') try: f.write(SETUPTOOLS_PKG_INFO) finally: f.close() return True def _before_install(): log.warn('Before install bootstrap.') _fake_setuptools() def _under_prefix(location): if 'install' not in sys.argv: return True args = sys.argv[sys.argv.index('install')+1:] for index, arg in enumerate(args): for option in ('--root', '--prefix'): if arg.startswith('%s=' % option): top_dir = arg.split('root=')[-1] return location.startswith(top_dir) elif arg == option: if len(args) > index: top_dir = args[index+1] return location.startswith(top_dir) elif option == '--user' and USER_SITE is not None: return 
location.startswith(USER_SITE) return True def _fake_setuptools(): log.warn('Scanning installed packages') try: import pkg_resources except ImportError: # we're cool log.warn('Setuptools or Distribute does not seem to be installed.') return ws = pkg_resources.working_set try: setuptools_dist = ws.find(pkg_resources.Requirement.parse('setuptools', replacement=False)) except TypeError: # old distribute API setuptools_dist = ws.find(pkg_resources.Requirement.parse('setuptools')) if setuptools_dist is None: log.warn('No setuptools distribution found') return # detecting if it was already faked setuptools_location = setuptools_dist.location log.warn('Setuptools installation detected at %s', setuptools_location) # if --root or --preix was provided, and if # setuptools is not located in them, we don't patch it if not _under_prefix(setuptools_location): log.warn('Not patching, --root or --prefix is installing Distribute' ' in another location') return # let's see if its an egg if not setuptools_location.endswith('.egg'): log.warn('Non-egg installation') res = _remove_flat_installation(setuptools_location) if not res: return else: log.warn('Egg installation') pkg_info = os.path.join(setuptools_location, 'EGG-INFO', 'PKG-INFO') if (os.path.exists(pkg_info) and _same_content(pkg_info, SETUPTOOLS_PKG_INFO)): log.warn('Already patched.') return log.warn('Patching...') # let's create a fake egg replacing setuptools one res = _patch_egg_dir(setuptools_location) if not res: return log.warn('Patched done.') _relaunch() def _relaunch(): log.warn('Relaunching...') # we have to relaunch the process args = [sys.executable] + sys.argv sys.exit(subprocess.call(args)) def _extractall(self, path=".", members=None): """Extract all members from the archive to the current working directory and set owner, modification time and permissions on directories afterwards. `path' specifies a different directory to extract to. 
`members' is optional and must be a subset of the list returned by getmembers(). """ import copy import operator from tarfile import ExtractError directories = [] if members is None: members = self for tarinfo in members: if tarinfo.isdir(): # Extract directories with a safe mode. directories.append(tarinfo) tarinfo = copy.copy(tarinfo) tarinfo.mode = 448 # decimal for oct 0700 self.extract(tarinfo, path) # Reverse sort directories. if sys.version_info < (2, 4): def sorter(dir1, dir2): return cmp(dir1.name, dir2.name) directories.sort(sorter) directories.reverse() else: directories.sort(key=operator.attrgetter('name'), reverse=True) # Set correct owner, mtime and filemode on directories. for tarinfo in directories: dirpath = os.path.join(path, tarinfo.name) try: self.chown(tarinfo, dirpath) self.utime(tarinfo, dirpath) self.chmod(tarinfo, dirpath) except ExtractError: e = sys.exc_info()[1] if self.errorlevel > 1: raise else: self._dbg(1, "tarfile: %s" % e) def main(argv, version=DEFAULT_VERSION): """Install or upgrade setuptools and EasyInstall""" tarball = download_setuptools() _install(tarball) if __name__ == '__main__': main(sys.argv[1:]) pyenchant-1.6.5/PKG-INFO0000644000175000017500000000135211501654510012736 0ustar rfkrfkMetadata-Version: 1.0 Name: pyenchant Version: 1.6.5 Summary: Python bindings for the Enchant spellchecking system Home-page: http://www.rfk.id.au/software/pyenchant/ Author: Ryan Kelly Author-email: ryan@rfk.id.au License: LGPL Description: UNKNOWN Keywords: spelling spellcheck enchant Platform: UNKNOWN Classifier: Development Status :: 5 - Production/Stable Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL) Classifier: Operating System :: OS Independent Classifier: Programming Language :: Python :: 2 Classifier: Programming Language :: Python :: 3 Classifier: Topic :: Software Development :: Libraries Classifier: Topic :: Text Processing :: Linguistic 
pyenchant-1.6.5/enchant/0000755000175000017500000000000011501654510013260 5ustar rfkrfkpyenchant-1.6.5/enchant/pypwl.py0000644000175000017500000002242311501534256015014 0ustar rfkrfk# pyenchant # # Copyright (C) 2004-2008 Ryan Kelly # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the # Free Software Foundation, Inc., 59 Temple Place - Suite 330, # Boston, MA 02111-1307, USA. # # In addition, as a special exception, you are # given permission to link the code of this program with # non-LGPL Spelling Provider libraries (eg: a MSFT Office # spell checker backend) and distribute linked combinations including # the two. You must obey the GNU Lesser General Public License in all # respects for all of the code used other than said providers. If you modify # this file, you may extend this exception to your version of the # file, but you are not obligated to do so. If you do not wish to # do so, delete this exception statement from your version. # """ pypwl: pure-python personal word list in the style of Enchant This module provides a pure-python version of the personal word list functionality found in the spellchecking package Enchant. While the same effect can be achieved (with better performance) using the python bindings for Enchant, it requires a C extension. 
    This pure-python implementation uses the same algorithm but without any
    external dependencies or C code (in fact, it was the author's original
    prototype for the C version found in Enchant).

"""

from __future__ import generators

import os
import warnings  # used by the deprecated PyPWL methods below


class Trie:
    """Class implementing a trie-based dictionary of words.

    A Trie is a recursive data structure storing words by their prefix.
    "Fuzzy matching" can be done by allowing a certain number of missteps
    when traversing the Trie.
    """

    def __init__(self,words=()):
        self._eos = False    # whether I am the end of a word
        self._keys = {}      # letters at this level of the trie
        for w in words:
            self.insert(w)

    def insert(self,word):
        if word == "":
            self._eos = True
        else:
            key = word[0]
            try:
                subtrie = self[key]
            except KeyError:
                subtrie = Trie()
                self[key] = subtrie
            subtrie.insert(word[1:])

    def remove(self,word):
        if word == "":
            self._eos = False
        else:
            key = word[0]
            try:
                subtrie = self[key]
            except KeyError:
                pass
            else:
                subtrie.remove(word[1:])

    def search(self,word,nerrs=0):
        """Search for the given word, possibly making errors.

        This method searches the trie for the given `word`, making
        precisely `nerrs` errors.  It returns a list of words found.
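
A minimal, self-contained sketch of the fuzzy-matching idea described here.
MiniTrie is a hypothetical stand-in written for illustration only; it is not
part of pyenchant, and it returns a set rather than a list.

```python
class MiniTrie:
    # Hypothetical illustration class, not the Trie defined in this module.

    def __init__(self, words=()):
        self.eos = False   # True if a stored word ends at this node
        self.keys = {}     # maps a letter to the sub-trie below it
        for w in words:
            self.insert(w)

    def insert(self, word):
        if word == "":
            self.eos = True
        else:
            self.keys.setdefault(word[0], MiniTrie()).insert(word[1:])

    def search(self, word, nerrs=0):
        # Collect stored words reachable from `word` with `nerrs` missteps.
        found = set()
        if nerrs < 0:
            return found
        if word == "":
            if nerrs == 0 and self.eos:
                found.add("")
        else:
            head, rest = word[0], word[1:]
            # Follow the first letter exactly (no error spent).
            if head in self.keys:
                found.update(head + w for w in self.keys[head].search(rest, nerrs))
            # Treat the first letter as spurious and skip it.
            found.update(self.search(rest, nerrs - 1))
        for k, sub in self.keys.items():
            # Assume letter k is missing from the input at this position.
            found.update(k + w for w in sub.search(word, nerrs - 1))
            # Assume the first input letter should have been k.
            if word:
                found.update(k + w for w in sub.search(word[1:], nerrs - 1))
        return found


t = MiniTrie(["hello", "help", "held"])
print(sorted(t.search("hello")))     # ['hello']
print(sorted(t.search("helo", 1)))   # ['held', 'hello', 'help']
```

Every branch that spends an error calls search with nerrs - 1, so the
recursion stops as soon as the error budget goes negative.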
""" res = [] # Terminate if we've run out of errors if nerrs < 0: return res # Precise match at the end of the word if nerrs == 0 and word == "": if self._eos: res.append("") # Precisely match word[0] try: subtrie = self[word[0]] subres = subtrie.search(word[1:],nerrs) for w in subres: w2 = word[0] + w if w2 not in res: res.append(w2) except (IndexError, KeyError): pass # match with deletion of word[0] try: subres = self.search(word[1:],nerrs-1) for w in subres: if w not in res: res.append(w) except (IndexError,): pass # match with insertion before word[0] try: for k in self._keys: subres = self[k].search(word,nerrs-1) for w in subres: w2 = k+w if w2 not in res: res.append(w2) except (IndexError,KeyError): pass # match on substitution of word[0] try: for k in self._keys: subres = self[k].search(word[1:],nerrs-1) for w in subres: w2 = k+w if w2 not in res: res.append(w2) except (IndexError,KeyError): pass # All done! return res search._DOC_ERRORS = ["nerrs"] def __getitem__(self,key): return self._keys[key] def __setitem__(self,key,val): self._keys[key] = val def __iter__(self): if self._eos: yield "" for k in self._keys: for w2 in self._keys[k]: yield k + w2 class PyPWL: """Pure-python implementation of Personal Word List dictionary. This class emulates the PWL objects provided by PyEnchant, but implemented purely in python. """ def __init__(self,pwl=None): """PyPWL constructor. This method takes as its only argument the name of a file containing the personal word list, one word per line. Entries will be read from this file, and new entries will be written to it automatically. If is not specified or None, the list is maintained in memory only. """ self.provider = None self._words = Trie() if pwl is not None: self.pwl = os.path.abspath(pwl) self.tag = self.pwl pwlF = file(pwl) for ln in pwlF: word = ln.strip() self.add_to_session(word) pwlF.close() else: self.pwl = None self.tag = "PyPWL" def check(self,word): """Check spelling of a word. 
        This method takes a word in the dictionary language and returns
        True if it is correctly spelled, and False otherwise.
        """
        res = self._words.search(word)
        return bool(res)

    def suggest(self,word):
        """Suggest possible spellings for a word.

        This method tries to guess the correct spelling for a given
        word, returning the possibilities in a list.
        """
        limit = 10
        maxdepth = 5
        # Iterative deepening until we get enough matches
        depth = 0
        res = self._words.search(word,depth)
        while len(res) < limit and depth < maxdepth:
            depth += 1
            for w in self._words.search(word,depth):
                if w not in res:
                    res.append(w)
        # Limit number of suggs
        return res[:limit]

    def add(self,word):
        """Add a word to the user's personal dictionary.

        For a PWL, this means appending it to the file.
        """
        if self.pwl is not None:
            # open() works on both Python 2 and 3; file() is Python 2 only
            pwlF = open(self.pwl,"a")
            pwlF.write("%s\n" % (word.strip(),))
            pwlF.close()
        self.add_to_session(word)

    def add_to_pwl(self,word):
        """Add a word to the user's personal dictionary.

        For a PWL, this means appending it to the file.
        """
        warnings.warn("PyPWL.add_to_pwl is deprecated, please use PyPWL.add",
                      category=DeprecationWarning)
        self.add(word)

    def remove(self,word):
        """Add a word to the user's personal exclude list."""
        # There's no exclude list for a stand-alone PWL.
        # Just remove it from the list.
        self._words.remove(word)
        if self.pwl is not None:
            pwlF = open(self.pwl,"wt")
            for w in self._words:
                pwlF.write("%s\n" % (w.strip(),))
            pwlF.close()

    def add_to_session(self,word):
        """Add a word to the session list."""
        self._words.insert(word)

    def is_in_session(self,word):
        """Check whether a word is in the session list."""
        warnings.warn("PyPWL.is_in_session is deprecated, please use PyPWL.is_added",
                      category=DeprecationWarning)
        # Consider all words to be in the session list
        return self.check(word)

    def store_replacement(self,mis,cor):
        """Store a replacement spelling for a misspelled word.

        This method makes a suggestion to the spellchecking engine that the
        misspelled word `mis` is in fact correctly spelled as `cor`.
        Such a suggestion will typically mean that `cor` appears early in the
        list of suggested spellings offered for later instances of `mis`.
        """
        # Too much work for this simple spellchecker
        pass
    store_replacement._DOC_ERRORS = ["mis","mis"]

    def is_added(self,word):
        """Check whether a word is in the personal word list."""
        return self.check(word)

    def is_removed(self,word):
        """Check whether a word is in the personal exclude list."""
        return False

    #  No-op methods to support internal use as a Dict() replacement

    def _check_this(self,msg):
        pass

    def _free(self):
        pass

pyenchant-1.6.5/enchant/utils.py0000644000175000017500000002362711501534256015010 0ustar rfkrfk
# pyenchant
#
# Copyright (C) 2004-2008 Ryan Kelly
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the
# Free Software Foundation, Inc., 59 Temple Place - Suite 330,
# Boston, MA 02111-1307, USA.
#
# In addition, as a special exception, you are
# given permission to link the code of this program with
# non-LGPL Spelling Provider libraries (eg: a MSFT Office
# spell checker backend) and distribute linked combinations including
# the two.  You must obey the GNU Lesser General Public License in all
# respects for all of the code used other than said providers.  If you modify
# this file, you may extend this exception to your version of the
# file, but you are not obligated to do so.  If you do not wish to
# do so, delete this exception statement from your version.
# """ enchant.utils: Misc utilities for the enchant package This module provides miscellaneous utilities for use with the enchant spellchecking package. Currently available functionality includes: * string/unicode compatibility wrappers * functions for dealing with locale/language settings * ability to list supporting data files (win32 only) * functions for bundling supporting data files from a build """ import os import sys import codecs from enchant.errors import * # Attempt to access local language information try: import locale except ImportError: locale = None # # Unicode/Bytes compatabilty wrappers. # # These allow us to support both Python 2.x and Python 3.x from # the same codebase. # # We provide explicit type objects "bytes" and "unicode" that can be # used to construct instances of the appropriate type. The class # "EnchantStr" derives from the default "str" type and implements the # necessary logic for encoding/decoding as strings are passed into # the underlying C library (where they must always be utf-8 encoded # byte strings). # try: unicode = unicode except NameError: str = str unicode = str bytes = bytes basestring = (str,bytes) else: str = str unicode = unicode bytes = str basestring = basestring def raw_unicode(raw): """Make a unicode string from a raw string. This function takes a string containing unicode escape characters, and returns the corresponding unicode string. Useful for writing unicode string literals in your python source while being upwards- compatible with Python 3. For example, instead of doing this: s = u"hello\u2149" # syntax error in Python 3 Or this: s = "hello\u2149" # not what you want in Python 2.x You can do this: s = raw_unicode(r"hello\u2149") # works everywhere! """ return raw.encode("utf8").decode("unicode-escape") def raw_bytes(raw): """Make a bytes object out of a raw string. This is analogous to raw_unicode, but processes byte escape characters to produce a bytes object. 
""" return codecs.escape_decode(raw)[0] class EnchantStr(str): """String subclass for interfacing with enchant C library. This class encapsulates the logic for interfacing between python native string/unicode objects and the underlying enchant library, which expects all strings to be UTF-8 character arrays. It is a subclass of the default string class 'str' - on Python 2.x that makes it an ascii string, on Python 3.x it is a unicode object. Initialise it with a string or unicode object, and use the encode() method to obtain an object suitable for passing to the underlying C library. When strings are read back into python, use decode(s) to translate them back into the appropriate python-level string type. This allows us to following the common Python 2.x idiom of returning unicode when unicode is passed in, and byte strings otherwise. It also lets the interface be upwards-compatible with Python 3, in which string objects are unicode by default. """ def __new__(cls,value): """EnchantStr data constructor. This method records whether the initial string was unicode, then simply passes it along to the default string constructor. 
""" if type(value) is unicode: was_unicode = True if str is not unicode: value = value.encode("utf-8") else: was_unicode = False if str is not bytes: raise Error("Don't pass bytestrings to pyenchant") self = str.__new__(cls,value) self._was_unicode = was_unicode return self def encode(self): """Encode this string into a form usable by the enchant C library.""" if str is unicode: return str.encode(self,"utf-8") else: return self def decode(self,value): """Decode a string returned by the enchant C library.""" if self._was_unicode: if str is unicode: # On some python3 versions, ctypes converts c_char_p # to str() rather than bytes() if isinstance(value,str): value = value.encode() return value.decode("utf-8") else: return value.decode("utf-8") else: return value def printf(values,sep=" ",end="\n",file=None): """Compatability wrapper from print statement/function. This function is a simple Python2/Python3 compatability wrapper for printing to stdout. """ if file is None: file = sys.stdout file.write(sep.join(map(str,values))) file.write(end) try: next = next except NameError: def next(iter): """Compatability wrapper for advancing an iterator.""" return iter.next() try: xrange = xrange except NameError: xrange = range def get_default_language(default=None): """Determine the user's default language, if possible. This function uses the 'locale' module to try to determine the user's preferred language. The return value is as follows: * if a locale is available for the LC_MESSAGES category, that language is used * if a default locale is available, that language is used * if the keyword argument is given, it is used * if nothing else works, None is returned Note that determining the user's language is in general only possible if they have set the necessary environment variables on their system. 
""" try: import locale tag = locale.getlocale()[0] if tag is None: tag = locale.getdefaultlocale()[0] if tag is None: raise Error("No default language available") return tag except Exception: pass return default get_default_language._DOC_ERRORS = ["LC"] def get_resource_filename(resname): """Get the absolute path to the named resource file. This serves widely the same purpose as pkg_resources.resource_filename(), but tries to avoid loading pkg_resources unless we're actually in an egg. """ path = os.path.dirname(os.path.abspath(__file__)) path = os.path.join(path,resname) if os.path.exists(path): return path if hasattr(sys, "frozen"): exe_path = unicode(sys.executable,sys.getfilesystemencoding()) exe_dir = os.path.dirname(exe_path) path = os.path.join(exe_dir, resname) if os.path.exists(path): return path else: import pkg_resources try: path = pkg_resources.resource_filename("enchant",resname) except KeyError: pass else: path = os.path.abspath(path) if os.path.exists(path): return path raise Error("Could not locate resource '%s'" % (resname,)) def win32_data_files(): """Get list of supporting data files, for use with setup.py This function returns a list of the supporting data files available to the running version of PyEnchant. This is in the format expected by the data_files argument of the distutils setup function. It's very useful, for example, for including the data files in an executable produced by py2exe. 
Only really tested on the win32 platform (it's the only platform for which we ship our own supporting data files) """ # Include the main enchant DLL try: libEnchant = get_resource_filename("libenchant.dll") except Error: libEnchant = get_resource_filename("libenchant-1.dll") mainDir = os.path.dirname(libEnchant) dataFiles = [('',[libEnchant])] # And some specific supporting DLLs for dll in os.listdir(mainDir): if not dll.endswith(".dll"): continue for prefix in ("iconv","intl","libglib","libgmodule"): if dll.startswith(prefix): break else: continue dataFiles[0][1].append(os.path.join(mainDir,dll)) # And anything found in the supporting data directories dataDirs = ("share/enchant/myspell","share/enchant/ispell","lib/enchant") for dataDir in dataDirs: files = [] fullDir = os.path.join(mainDir,os.path.normpath(dataDir)) for fn in os.listdir(fullDir): fullFn = os.path.join(fullDir,fn) if os.path.isfile(fullFn): files.append(fullFn) dataFiles.append((dataDir,files)) return dataFiles win32_data_files._DOC_ERRORS = ["py","py","exe"] pyenchant-1.6.5/enchant/checker/0000755000175000017500000000000011501654510014664 5ustar rfkrfkpyenchant-1.6.5/enchant/checker/CmdLineChecker.py0000644000175000017500000001606511501534256020052 0ustar rfkrfk# pyenchant # # Copyright (C) 2004-2008, Ryan Kelly # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. 
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the
# Free Software Foundation, Inc., 59 Temple Place - Suite 330,
# Boston, MA 02111-1307, USA.
#
# In addition, as a special exception, you are
# given permission to link the code of this program with
# non-LGPL Spelling Provider libraries (eg: a MSFT Office
# spell checker backend) and distribute linked combinations including
# the two. You must obey the GNU Lesser General Public License in all
# respects for all of the code used other than said providers. If you modify
# this file, you may extend this exception to your version of the
# file, but you are not obligated to do so. If you do not wish to
# do so, delete this exception statement from your version.
#
"""

    enchant.checker.CmdLineChecker: Command-Line spell checker

    This module provides the class CmdLineChecker, which interactively
    spellchecks a piece of text by interacting with the user on the
    command line. It can also be run as a script to spellcheck a file.

"""

import sys

from enchant.checker import SpellChecker
from enchant.utils import printf


class CmdLineChecker:
    """A simple command-line spell checker.

    This class implements a simple command-line spell checker. It must
    be given a SpellChecker instance to operate on, and interacts with
    the user by printing instructions on stdout and reading commands from
    stdin.
    """
    _DOC_ERRORS = ["stdout","stdin"]

    def __init__(self):
        self._stop = False
        self._checker = None

    def set_checker(self,chkr):
        self._checker = chkr

    def get_checker(self,chkr=None):
        # 'chkr' is unused; it stays as an optional argument so that
        # existing callers which pass it keep working.
        return self._checker

    def run(self):
        """Run the spellchecking loop."""
        self._stop = False
        for err in self._checker:
            self.error = err
            printf(["ERROR:", err.word])
            printf(["HOW ABOUT:", err.suggest()])
            status = self.read_command()
            while not status and not self._stop:
                status = self.read_command()
            if self._stop:
                break

    def print_help(self):
        printf(["0..N: replace with the numbered suggestion"])
        printf(["R0..RN: always replace with the numbered suggestion"])
        printf(["i: ignore this word"])
        printf(["I: always ignore this word"])
        printf(["a: add word to personal dictionary"])
        printf(["e: edit the word"])
        printf(["q: quit checking"])
        printf(["h: print this help message"])
        printf(["----------------------------------------------------"])
        printf(["HOW ABOUT:", self.error.suggest()])

    def read_command(self):
        cmd = raw_input(">> ")
        cmd = cmd.strip()

        if cmd.isdigit():
            repl = int(cmd)
            suggs = self.error.suggest()
            if repl >= len(suggs):
                printf(["No suggestion number", repl])
                return False
            printf(["Replacing '%s' with '%s'" % (self.error.word,suggs[repl])])
            self.error.replace(suggs[repl])
            return True

        # Use startswith() rather than cmd[0] so that empty input
        # falls through to the help message instead of raising IndexError.
        if cmd.startswith("R"):
            if not cmd[1:].isdigit():
                printf(["Badly formatted command (try 'help')"])
                return False
            repl = int(cmd[1:])
            suggs = self.error.suggest()
            if repl >= len(suggs):
                printf(["No suggestion number", repl])
                return False
            self.error.replace_always(suggs[repl])
            return True

        if cmd == "i":
            return True

        if cmd == "I":
            self.error.ignore_always()
            return True

        if cmd == "a":
            self.error.add()
            return True

        if cmd == "e":
            repl = raw_input("New Word: ")
            self.error.replace(repl.strip())
            return True

        if cmd == "q":
            self._stop = True
            return True

        if "help".startswith(cmd.lower()):
            self.print_help()
            return False

        printf(["Badly formatted command (try 'help')"])
        return False

    def run_on_file(self,infile,outfile=None,enc=None):
        """Run spellchecking on the named file.

        This method can be used to run the spellchecker over the named file.
        If <outfile> is not given, the corrected contents replace the contents
        of <infile>. If <outfile> is given, the corrected contents will be
        written to that file. Use "-" to have the contents written to stdout.
        If <enc> is given, it specifies the encoding used to read the
        file's contents into a unicode string. The output will be written
        in the same encoding.
        """
        inStr = "".join(file(infile,"r").readlines())
        if enc is not None:
            inStr = inStr.decode(enc)
        self._checker.set_text(inStr)
        self.run()
        outStr = self._checker.get_text()
        if enc is not None:
            outStr = outStr.encode(enc)
        if outfile is None:
            outF = file(infile,"w")
        elif outfile == "-":
            outF = sys.stdout
        else:
            outF = file(outfile,"w")
        outF.write(outStr)
        outF.close()
    run_on_file._DOC_ERRORS = ["outfile","infile","outfile","stdout"]


def _run_as_script():
    """Run the command-line spellchecker as a script.

    This function allows the spellchecker to be invoked from the command-line
    to check spelling in a file.
    """
    # Check necessary command-line options
    from optparse import OptionParser
    op = OptionParser()
    op.add_option("-o","--output",dest="outfile",metavar="FILE",
                  help="write changes into FILE")
    op.add_option("-l","--lang",dest="lang",metavar="TAG",default="en_US",
                  help="use language identified by TAG")
    op.add_option("-e","--encoding",dest="enc",metavar="ENC",
                  help="file is unicode with encoding ENC")
    (opts,args) = op.parse_args()
    # Sanity check
    if len(args) < 1:
        raise ValueError("Must name a file to check")
    if len(args) > 1:
        raise ValueError("Can only check a single file")
    # Create and run the checker
    chkr = SpellChecker(opts.lang)
    cmdln = CmdLineChecker()
    cmdln.set_checker(chkr)
    cmdln.run_on_file(args[0],opts.outfile,opts.enc)


if __name__ == "__main__":
    _run_as_script()

pyenchant-1.6.5/enchant/checker/wxSpellCheckerDialog.py
# pyenchant
#
# Copyright (C) 2004-2008, Ryan Kelly
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the
# Free Software Foundation, Inc., 59 Temple Place - Suite 330,
# Boston, MA 02111-1307, USA.
#
# In addition, as a special exception, you are
# given permission to link the code of this program with
# non-LGPL Spelling Provider libraries (eg: a MSFT Office
# spell checker backend) and distribute linked combinations including
# the two.
# You must obey the GNU Lesser General Public License in all
# respects for all of the code used other than said providers. If you modify
# this file, you may extend this exception to your version of the
# file, but you are not obligated to do so. If you do not wish to
# do so, delete this exception statement from your version.
#
# Major code cleanup and re-write thanks to Phil Mayes, 2007
#
"""

    enchant.checker.wxSpellCheckerDialog: wxPython spellchecker interface

    This module provides the class wxSpellCheckerDialog, which provides
    a wxPython dialog that can be used as an interface to a spell checking
    session. Currently it is intended as a proof-of-concept and demonstration
    class, but it should be suitable for general-purpose use in a program.

    The class must be given an enchant.checker.SpellChecker object with
    which to operate. It can (in theory...) be used in modal and non-modal
    modes. Use Show() when operating on an array of characters as it will
    modify the array in place, meaning other work can be done at the same
    time. Use ShowModal() when operating on a static string.

"""
_DOC_ERRORS = ["ShowModal"]

import wx

from enchant.utils import printf


class wxSpellCheckerDialog(wx.Dialog):
    """Simple spellcheck dialog for wxPython

    This class implements a simple spellcheck interface for wxPython,
    in the form of a dialog. It's intended mainly as an example of
    how to do this, although it should be useful for applications that
    just need a simple graphical spellchecker.

    To use, a SpellChecker instance must be created and passed to the
    dialog before it is shown:

        >>> dlg = wxSpellCheckerDialog(None,-1,"")
        >>> chkr = SpellChecker("en_AU",text)
        >>> dlg.SetSpellChecker(chkr)
        >>> dlg.Show()

    This is most useful when the text to be checked is in the form of
    a character array, as it will be modified in place as the user
    interacts with the dialog.
For checking strings, the final result will need to be obtained from the SpellChecker object: >>> dlg = wxSpellCheckerDialog(None,-1,"") >>> chkr = SpellChecker("en_AU",text) >>> dlg.SetSpellChecker(chkr) >>> dlg.ShowModal() >>> text = dlg.GetSpellChecker().get_text() Currently the checker must deal with strings of the same type as returned by wxPython - unicode or normal string depending on the underlying system. This needs to be fixed, somehow... """ _DOC_ERRORS = ["dlg","chkr","dlg","SetSpellChecker","chkr","dlg", "dlg","chkr","dlg","SetSpellChecker","chkr","dlg", "ShowModal","dlg","GetSpellChecker"] # Remember dialog size across invocations by storing it on the class sz = (300,70) def __init__(self, parent=None,id=-1,title="Checking Spelling..."): wx.Dialog.__init__(self, parent, id, title, size=wxSpellCheckerDialog.sz, style=wx.DEFAULT_DIALOG_STYLE|wx.RESIZE_BORDER) self._numContext = 40 self._checker = None self._buttonsEnabled = True self.error_text = wx.TextCtrl(self, -1, "", style=wx.TE_MULTILINE|wx.TE_READONLY|wx.TE_RICH) self.replace_text = wx.TextCtrl(self, -1, "", style=wx.TE_PROCESS_ENTER) self.replace_list = wx.ListBox(self, -1, style=wx.LB_SINGLE) self.InitLayout() wx.EVT_LISTBOX(self,self.replace_list.GetId(),self.OnReplSelect) wx.EVT_LISTBOX_DCLICK(self,self.replace_list.GetId(),self.OnReplace) def InitLayout(self): """Lay out controls and add buttons.""" sizer = wx.BoxSizer(wx.HORIZONTAL) txtSizer = wx.BoxSizer(wx.VERTICAL) btnSizer = wx.BoxSizer(wx.VERTICAL) replaceSizer = wx.BoxSizer(wx.HORIZONTAL) txtSizer.Add(wx.StaticText(self, -1, "Unrecognised Word:"), 0, wx.LEFT|wx.TOP, 5) txtSizer.Add(self.error_text, 1, wx.ALL|wx.EXPAND, 5) replaceSizer.Add(wx.StaticText(self, -1, "Replace with:"), 0, wx.ALL|wx.ALIGN_CENTER_VERTICAL, 5) replaceSizer.Add(self.replace_text, 1, wx.ALL|wx.ALIGN_CENTER_VERTICAL, 5) txtSizer.Add(replaceSizer, 0, wx.EXPAND, 0) txtSizer.Add(self.replace_list, 2, wx.ALL|wx.EXPAND, 5) sizer.Add(txtSizer, 1, wx.EXPAND, 0) 
        self.buttons = []
        for label, action, tip in (\
            ("Ignore", self.OnIgnore, "Ignore this word and continue"),
            ("Ignore All", self.OnIgnoreAll, "Ignore all instances of this word and continue"),
            ("Replace", self.OnReplace, "Replace this word"),
            ("Replace All", self.OnReplaceAll, "Replace all instances of this word"),
            ("Add", self.OnAdd, "Add this word to the dictionary"),
            ("Done", self.OnDone, "Finish spell-checking and accept changes"),
            ):
            btn = wx.Button(self, -1, label)
            btn.SetToolTip(wx.ToolTip(tip))
            btnSizer.Add(btn, 0, wx.ALIGN_RIGHT|wx.ALL, 4)
            btn.Bind(wx.EVT_BUTTON, action)
            self.buttons.append(btn)
        sizer.Add(btnSizer, 0, wx.ALL|wx.EXPAND, 5)
        self.SetAutoLayout(True)
        self.SetSizer(sizer)
        sizer.Fit(self)

    def Advance(self):
        """Advance to the next error.

        This method advances the SpellChecker to the next error, if
        any. It then displays the error and some surrounding context,
        as well as listing the suggested replacements.
        """
        # Disable interaction if no checker
        if self._checker is None:
            self.EnableButtons(False)
            return False
        # Advance to next error, disable if not available
        try:
            self._checker.next()
        except StopIteration:
            self.EnableButtons(False)
            self.error_text.SetValue("")
            self.replace_list.Clear()
            self.replace_text.SetValue("")
            if self.IsModal(): # test needed for SetSpellChecker call
                # auto-exit when checking complete
                self.EndModal(wx.ID_OK)
            return False
        self.EnableButtons()
        # Display error context with erroneous word in red.
        # Restoring default style was misbehaving under win32, so
        # I am forcing the rest of the text to be black.
self.error_text.SetValue("") self.error_text.SetDefaultStyle(wx.TextAttr(wx.BLACK)) lContext = self._checker.leading_context(self._numContext) self.error_text.AppendText(lContext) self.error_text.SetDefaultStyle(wx.TextAttr(wx.RED)) self.error_text.AppendText(self._checker.word) self.error_text.SetDefaultStyle(wx.TextAttr(wx.BLACK)) tContext = self._checker.trailing_context(self._numContext) self.error_text.AppendText(tContext) # Display suggestions in the replacements list suggs = self._checker.suggest() self.replace_list.Set(suggs) self.replace_text.SetValue(suggs and suggs[0] or '') return True def EnableButtons(self, state=True): """Enable the checking-related buttons""" if state != self._buttonsEnabled: for btn in self.buttons[:-1]: btn.Enable(state) self._buttonsEnabled = state def GetRepl(self): """Get the chosen replacement string.""" repl = self.replace_text.GetValue() return repl def OnAdd(self, evt): """Callback for the "add" button.""" self._checker.add() self.Advance() def OnDone(self, evt): """Callback for the "close" button.""" wxSpellCheckerDialog.sz = self.error_text.GetSizeTuple() if self.IsModal(): self.EndModal(wx.ID_OK) else: self.Close() def OnIgnore(self, evt): """Callback for the "ignore" button. This simply advances to the next error. 
""" self.Advance() def OnIgnoreAll(self, evt): """Callback for the "ignore all" button.""" self._checker.ignore_always() self.Advance() def OnReplace(self, evt): """Callback for the "replace" button.""" repl = self.GetRepl() if repl: self._checker.replace(repl) self.Advance() def OnReplaceAll(self, evt): """Callback for the "replace all" button.""" repl = self.GetRepl() self._checker.replace_always(repl) self.Advance() def OnReplSelect(self, evt): """Callback when a new replacement option is selected.""" sel = self.replace_list.GetSelection() if sel == -1: return opt = self.replace_list.GetString(sel) self.replace_text.SetValue(opt) def SetSpellChecker(self,chkr): """Set the spell checker, advancing to the first error. Return True if error(s) to correct, else False.""" self._checker = chkr return self.Advance() def _test(): class TestDialog(wxSpellCheckerDialog): def __init__(self,*args): wxSpellCheckerDialog.__init__(self,*args) wx.EVT_CLOSE(self,self.OnClose) def OnClose(self,evnt): if self._checker is not None: printf(["AFTER:", self._checker.get_text()]) self.Destroy() from enchant.checker import SpellChecker text = "This is sme text with a fw speling errors in it. Here are a fw more to tst it ut." printf(["BEFORE:", text]) app = wx.PySimpleApp() dlg = TestDialog() chkr = SpellChecker("en_US",text) dlg.SetSpellChecker(chkr) dlg.Show() app.MainLoop() if __name__ == "__main__": _test() pyenchant-1.6.5/enchant/checker/GtkSpellCheckerDialog.py0000644000175000017500000002356211501534256021404 0ustar rfkrfk# GtkSpellCheckerDialog for pyenchant # # Copyright (C) 2004-2005, Fredrik Corneliusson # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. 
# # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the # Free Software Foundation, Inc., 59 Temple Place - Suite 330, # Boston, MA 02111-1307, USA. # # In addition, as a special exception, you are # given permission to link the code of this program with # non-LGPL Spelling Provider libraries (eg: a MSFT Office # spell checker backend) and distribute linked combinations including # the two. You must obey the GNU Lesser General Public License in all # respects for all of the code used other than said providers. If you modify # this file, you may extend this exception to your version of the # file, but you are not obligated to do so. If you do not wish to # do so, delete this exception statement from your version. 
#

import gtk
import gobject

from enchant.utils import printf, unicode

# columns
COLUMN_SUGGESTION = 0

def create_list_view(col_label,):
    # create list widget
    list_ = gtk.ListStore(str)
    list_view = gtk.TreeView(model=list_)

    list_view.set_rules_hint(True)
    list_view.get_selection().set_mode(gtk.SELECTION_SINGLE)
    # Add Columns
    renderer = gtk.CellRendererText()
    renderer.set_data("column", COLUMN_SUGGESTION)
    column = gtk.TreeViewColumn(col_label, renderer,text=COLUMN_SUGGESTION)
    list_view.append_column(column)
    return list_view


class GtkSpellCheckerDialog(gtk.Window):
    def __init__(self, *args,**kwargs):
        gtk.Window.__init__(self,*args,**kwargs)
        self.set_title('Spell check')
        self.set_default_size(350, 200)

        self._checker = None
        self._numContext = 40

        self.errors = None

        # create accel group
        accel_group = gtk.AccelGroup()
        self.add_accel_group(accel_group)

        # list of widgets to disable if there's no spell error left
        self._conditional_widgets = []
        conditional = self._conditional_widgets.append

        # layout
        mainbox = gtk.VBox(spacing=5)
        hbox = gtk.HBox(spacing=5)
        self.add(mainbox)
        mainbox.pack_start(hbox,padding=5)

        box1 = gtk.VBox(spacing=5)
        hbox.pack_start(box1,padding=5)
        conditional(box1)

        # unrecognised word
        text_view_lable = gtk.Label('Unrecognised word')
        text_view_lable.set_justify(gtk.JUSTIFY_LEFT)
        box1.pack_start(text_view_lable,False,False)

        text_view = gtk.TextView()
        text_view.set_wrap_mode(gtk.WRAP_WORD)
        text_view.set_editable(False)
        text_view.set_cursor_visible(False)
        self.error_text = text_view.get_buffer()
        text_buffer = text_view.get_buffer()
        text_buffer.create_tag("fg_black", foreground="black")
        text_buffer.create_tag("fg_red", foreground="red")

        box1.pack_start(text_view)

        # Change to
        change_to_box = gtk.HBox()
        box1.pack_start(change_to_box,False,False)

        change_to_label = gtk.Label('Change to:')
        self.replace_text = gtk.Entry()
        text_view_lable.set_justify(gtk.JUSTIFY_LEFT)
        change_to_box.pack_start(change_to_label,False,False)
        change_to_box.pack_start(self.replace_text)

        # scrolled window
        sw = gtk.ScrolledWindow()
        sw.set_shadow_type(gtk.SHADOW_ETCHED_IN)
        sw.set_policy(gtk.POLICY_AUTOMATIC, gtk.POLICY_AUTOMATIC)
        box1.pack_start(sw)

        self.suggestion_list_view = create_list_view('Suggestions')
        self.suggestion_list_view.connect("button_press_event", self._onButtonPress)
        self.suggestion_list_view.connect("cursor-changed", self._onSuggestionChanged)
        sw.add(self.suggestion_list_view)

        #---Buttons---#000000#FFFFFF----------------------------------------------------
        button_box = gtk.VButtonBox()
        hbox.pack_start(button_box, False, False)

        # Ignore
        button = gtk.Button("Ignore")
        button.connect("clicked", self._onIgnore)
        button.add_accelerator("activate", accel_group,
                               gtk.keysyms.Return, 0, gtk.ACCEL_VISIBLE)
        button_box.pack_start(button)
        conditional(button)

        # Ignore all
        button = gtk.Button("Ignore All")
        button.connect("clicked", self._onIgnoreAll)
        button_box.pack_start(button)
        conditional(button)

        # Replace
        button = gtk.Button("Replace")
        button.connect("clicked", self._onReplace)
        button_box.pack_start(button)
        conditional(button)

        # Replace all
        button = gtk.Button("Replace All")
        button.connect("clicked", self._onReplaceAll)
        button_box.pack_start(button)
        conditional(button)

        # Add button
        button = gtk.Button("_Add")
        button.connect("clicked", self._onAdd)
        button_box.pack_start(button)
        conditional(button)

        # Close button
        button = gtk.Button(stock=gtk.STOCK_CLOSE)
        button.connect("clicked", self._onClose)
        button.add_accelerator("activate", accel_group,
                               gtk.keysyms.Escape, 0, gtk.ACCEL_VISIBLE)
        button_box.pack_end(button)

        # dictionary label
        self._dict_lable = gtk.Label('')
        mainbox.pack_start(self._dict_lable,False,False,padding=5)

        mainbox.show_all()

    def _onIgnore(self,w,*args):
        printf(["ignore"])
        self._advance()

    def _onIgnoreAll(self,w,*args):
        printf(["ignore all"])
        self._checker.ignore_always()
        self._advance()

    def _onReplace(self,*args):
        printf(["Replace"])
        repl = self._getRepl()
        self._checker.replace(repl)
        self._advance()

    def _onReplaceAll(self,*args):
        printf(["Replace all"])
        repl = self._getRepl()
        self._checker.replace_always(repl)
        self._advance()

    def _onAdd(self,*args):
        """Callback for the "add" button."""
        self._checker.add()
        self._advance()

    def _onClose(self,w,*args):
        self.emit('delete_event',gtk.gdk.Event(gtk.gdk.BUTTON_PRESS))
        return True

    def _onButtonPress(self,widget,event):
        if event.type == gtk.gdk._2BUTTON_PRESS:
            printf(["Double click!"])
            self._onReplace()

    def _onSuggestionChanged(self,widget,*args):
        selection = self.suggestion_list_view.get_selection()
        model, iter = selection.get_selected()
        if iter:
            suggestion = model.get_value(iter, COLUMN_SUGGESTION)
            self.replace_text.set_text(suggestion)

    def _getRepl(self):
        """Get the chosen replacement string."""
        repl = self.replace_text.get_text()
        repl = self._checker.coerce_string(repl)
        return repl

    def _fillSuggestionList(self,suggestions):
        model = self.suggestion_list_view.get_model()
        model.clear()
        for suggestion in suggestions:
            value = unicode("%s"%(suggestion,))
            model.append([value,])

    def setSpellChecker(self,checker):
        assert checker,"checker can't be None"
        self._checker = checker
        self._dict_lable.set_text('Dictionary:%s'%(checker.dict.tag,))

    def getSpellChecker(self,checker=None):
        # 'checker' is unused; it stays as an optional argument so that
        # existing callers which pass it keep working.
        return self._checker

    def updateUI(self):
        self._advance()

    def _disableButtons(self):
        for w in self._conditional_widgets:
            w.set_sensitive(False)

    def _enableButtons(self):
        for w in self._conditional_widgets:
            w.set_sensitive(True)

    def _advance(self):
        """Advance to the next error.

        This method advances the SpellChecker to the next error, if
        any. It then displays the error and some surrounding context,
        as well as listing the suggested replacements.
""" # Disable interaction if no checker if self._checker is None: self._disableButtons() self.emit('check-done') return # Advance to next error, disable if not available try: self._checker.next() except StopIteration: self._disableButtons() self.error_text.set_text("") self._fillSuggestionList([]) self.replace_text.set_text("") return self._enableButtons() # Display error context with erroneous word in red self.error_text.set_text('') iter = self.error_text.get_iter_at_offset(0) append = self.error_text.insert_with_tags_by_name lContext = self._checker.leading_context(self._numContext) tContext = self._checker.trailing_context(self._numContext) append(iter, lContext, 'fg_black') append(iter, self._checker.word, 'fg_red') append(iter, tContext, 'fg_black') # Display suggestions in the replacements list suggs = self._checker.suggest() self._fillSuggestionList(suggs) if suggs: self.replace_text.set_text(suggs[0]) else: self.replace_text.set_text("") def _test(): from enchant.checker import SpellChecker text = "This is sme text with a fw speling errors in it. Here are a fw more to tst it ut." printf(["BEFORE:", text]) chk_dlg = GtkSpellCheckerDialog() chk_dlg.show() chk_dlg.connect('delete_event', gtk.main_quit) chkr = SpellChecker("en_US",text) chk_dlg.setSpellChecker(chkr) chk_dlg.updateUI() gtk.main() if __name__ == "__main__": _test() pyenchant-1.6.5/enchant/checker/tests.py0000644000175000017500000002303011501534256016402 0ustar rfkrfk# pyenchant # # Copyright (C) 2004-2009, Ryan Kelly # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the # Free Software Foundation, Inc., 59 Temple Place - Suite 330, # Boston, MA 02111-1307, USA. # # In addition, as a special exception, you are # given permission to link the code of this program with # non-LGPL Spelling Provider libraries (eg: a MSFT Office # spell checker backend) and distribute linked combinations including # the two. You must obey the GNU Lesser General Public License in all # respects for all of the code used other than said providers. If you modify # this file, you may extend this exception to your version of the # file, but you are not obligated to do so. If you do not wish to # do so, delete this exception statement from your version. # """ enchant.checker.tests: Unittests for enchant SpellChecker class """ import unittest import enchant import enchant.tokenize from enchant.utils import * from enchant.errors import * from enchant.checker import * class TestChecker(unittest.TestCase): """TestCases for checking behaviour of SpellChecker class.""" def test_basic(self): """Test a basic run of the SpellChecker class.""" text = """This is sme text with a few speling erors in it. Its gret for checking wheather things are working proprly with the SpellChecker class. 
Not gret for much elss though.""" chkr = SpellChecker("en_US",text=text) for n,err in enumerate(chkr): if n == 0: # Fix up "sme" -> "some" properly self.assertEqual(err.word,"sme") self.assertEqual(err.wordpos,8) self.assertTrue("some" in err.suggest()) err.replace("some") if n == 1: # Ignore "speling" self.assertEqual(err.word,"speling") if n == 2: # Check context around "erors", and replace self.assertEqual(err.word,"erors") self.assertEqual(err.leading_context(5),"ling ") self.assertEqual(err.trailing_context(5)," in i") err.replace(raw_unicode("errors")) if n == 3: # Replace-all on gret as it appears twice self.assertEqual(err.word,"gret") err.replace_always("great") if n == 4: # First encounter with "wheather", move offset back self.assertEqual(err.word,"wheather") err.set_offset(-1*len(err.word)) if n == 5: # Second encounter, fix up "wheather' self.assertEqual(err.word,"wheather") err.replace("whether") if n == 6: # Just replace "proprly", but also add an ignore # for "SpellChecker" self.assertEqual(err.word,"proprly") err.replace("properly") err.ignore_always("SpellChecker") if n == 7: # The second "gret" should have been replaced # So it's now on "elss" self.assertEqual(err.word,"elss") err.replace("else") if n > 7: self.fail("Extraneous spelling errors were found") text2 = """This is some text with a few speling errors in it. Its great for checking whether things are working properly with the SpellChecker class. 
Not great for much else though.""" self.assertEqual(chkr.get_text(),text2) def test_filters(self): """Test SpellChecker with the 'filters' argument.""" text = """I contain WikiWords that ShouldBe skipped by the filters""" chkr = SpellChecker("en_US",text=text, filters=[enchant.tokenize.WikiWordFilter]) for err in chkr: # There are no errors once the WikiWords are skipped self.fail("Extraneous spelling errors were found") self.assertEqual(chkr.get_text(),text) def test_chunkers(self): """Test SpellChecker with the 'chunkers' argument.""" text = """I contain tags that should be skipped""" chkr = SpellChecker("en_US",text=text, chunkers=[enchant.tokenize.HTMLChunker]) for err in chkr: # There are no errors when the tag is skipped self.fail("Extraneous spelling errors were found") self.assertEqual(chkr.get_text(),text) def test_chunkers_and_filters(self): """Test SpellChecker with the 'chunkers' and 'filters' arguments.""" text = """I contain tags that should be skipped along with a >> text = "This is sme text with a fw speling errors in it." >>> chkr = SpellChecker("en_US",text) >>> for err in chkr: ... err.replace("SPAM") ... >>> chkr.get_text() 'This is SPAM text with a SPAM SPAM errors in it.' >>> Internally, the SpellChecker always works with arrays of (possibly unicode) character elements. This allows the in-place modification of the string as it is checked, and is the closest thing Python has to a mutable string. The text can be set as any of a normal string, unicode string, character array or unicode character array. The 'get_text' method will return the modified array object if an array is used, or a new string object if a string it used. Words input to the SpellChecker may be either plain strings or unicode objects. They will be converted to the same type as the text being checked, using python's default encoding/decoding settings. 
    If using an array of characters with this object and the
    array is modified outside of the spellchecking loop, use the
    'set_offset' method to reposition the internal loop pointer
    to make sure it doesn't skip any words.

    """
    _DOC_ERRORS = ["sme","fw","speling","chkr","chkr","chkr"]

    def __init__(self,lang=None,text=None,tokenize=None,chunkers=None,filters=None):
        """Constructor for the SpellChecker class.

        SpellChecker objects can be created in two ways, depending on
        the nature of the first argument. If it is a string, it
        specifies a language tag from which a dictionary is created.
        Otherwise, it must be an enchant Dict object to be used.

        Optional keyword arguments are:

            * text: to set the text to be checked at creation time
            * tokenize: a custom tokenization function to use
            * chunkers: a list of chunkers to apply during tokenization
            * filters: a list of filters to apply during tokenization

        If <tokenize> is not given and the first argument is a Dict,
        its 'tag' attribute must be a language tag so that a tokenization
        function can be created automatically. If this attribute is missing
        the user's default language will be used.
""" if lang is None: lang = get_default_language() if isinstance(lang,basestring): dict = enchant.Dict(lang) else: dict = lang try: lang = dict.tag except AttributeError: lang = get_default_language() if lang is None: raise DefaultLanguageNotFoundError self.lang = lang self.dict = dict if tokenize is None: try: tokenize = get_tokenizer(lang,chunkers,filters) except TokenizerNotFoundError: # Fall back to default tokenization if no match for 'lang' tokenize = get_tokenizer(None,chunkers,filters) self._tokenize = tokenize self.word = None self.wordpos = None self._ignore_words = {} self._replace_words = {} # Default to the empty string as the text to be checked self._text = array.array('u') self._use_tostring = False self._tokens = iter([]) if text is not None: self.set_text(text) def __iter__(self): """Each SpellChecker object is its own iterator""" return self def set_text(self,text): """Set the text to be spell-checked. This method must be called, or the 'text' argument supplied to the constructor, before calling the 'next()' method. """ # Convert to an array object if necessary if isinstance(text,basestring): if type(text) is unicode: self._text = array.array('u',text) else: self._text = array.array('c',text) self._use_tostring = True else: self._text = text self._use_tostring = False self._tokens = self._tokenize(self._text) def get_text(self): """Return the spell-checked text.""" if self._use_tostring: return self._array_to_string(self._text) return self._text def _array_to_string(self,text): """Format an internal array as a standard string.""" if text.typecode == 'u': return text.tounicode() return text.tostring() def wants_unicode(self): """Check whether the checker wants unicode strings. This method will return True if the checker wants unicode strings as input, False if it wants normal strings. It's important to provide the correct type of string to the checker. 
""" if self._text.typecode == 'u': return True return False def coerce_string(self,text,enc=None): """Coerce string into the required type. This method can be used to automatically ensure that strings are of the correct type required by this checker - either unicode or standard. If there is a mismatch, conversion is done using python's default encoding unless another encoding is specified. """ if self.wants_unicode(): if not isinstance(text,unicode): if enc is None: return text.decode() else: return text.decode(enc) return text if not isinstance(text,bytes): if enc is None: return text.encode() else: return text.encode(enc) return text def __next__(self): return self.next() def next(self): """Process text up to the next spelling error. This method is designed to support the iterator protocol. Each time it is called, it will advance the 'word' attribute to the next spelling error in the text. When no more errors are found, it will raise StopIteration. The method will always return self, so that it can be used sensibly in common idioms such as: for err in checker: err.do_something() """ # Find the next spelling error. # The uncaught StopIteration from next(self._tokens) # will provide the StopIteration for this method while True: (word,pos) = next(self._tokens) # decode back to a regular string word = self._array_to_string(word) if self.dict.check(word): continue if word in self._ignore_words: continue self.word = word self.wordpos = pos if word in self._replace_words: self.replace(self._replace_words[word]) continue break return self def replace(self,repl): """Replace the current erroneous word with the given string.""" repl = self.coerce_string(repl) aRepl = array.array(self._text.typecode,repl) self.dict.store_replacement(self.word,repl) self._text[self.wordpos:self.wordpos+len(self.word)] = aRepl self._tokens.offset = self._tokens.offset + (len(repl)-len(self.word)) def replace_always(self,word,repl=None): """Always replace given word with given replacement. 
If a single argument is given, this is used to replace the current erroneous word. If two arguments are given, that combination is added to the list for future use. """ if repl is None: repl = word word = self.word repl = self.coerce_string(repl) word = self.coerce_string(word) self._replace_words[word] = repl if self.word == word: self.replace(repl) def ignore_always(self,word=None): """Add given word to list of words to ignore. If no word is given, the current erroneous word is added. """ if word is None: word = self.word word = self.coerce_string(word) if word not in self._ignore_words: self._ignore_words[word] = True def add_to_personal(self,word=None): """Add given word to the personal word list. If no word is given, the current erroneous word is added. """ warnings.warn("SpellChecker.add_to_personal is deprecated, please use SpellChecker.add",category=DeprecationWarning) self.add(word) def add(self,word=None): """Add given word to the personal word list. If no word is given, the current erroneous word is added. """ if word is None: word = self.word self.dict.add(word) def suggest(self,word=None): """Return suggested spellings for the given word. If no word is given, the current erroneous word is used. """ if word is None: word = self.word suggs = self.dict.suggest(word) return suggs def check(self,word): """Check correctness of the given word.""" return self.dict.check(word) def set_offset(self,off,whence=0): """Set the offset of the tokenization routine. For more details on the purpose of the tokenization offset, see the documentation of the 'enchant.tokenize' module. 
The optional argument whence indicates the method by which to change the offset: * 0 (the default) treats as an increment * 1 treats as a distance from the start * 2 treats as a distance from the end """ if whence == 0: self._tokens.offset = self._tokens.offset + off elif whence == 1: assert(off > 0) self._tokens.offset= off elif whence == 2: assert(off > 0) self._tokens.offset = len(self._text) - 1 - off else: raise ValueError("Invalid value for whence: %s"%(whence,)) def leading_context(self,chars): """Get characters of leading context. This method returns up to characters of leading context - the text that occurs in the string immediately before the current erroneous word. """ start = max(self.wordpos - chars,0) context = self._text[start:self.wordpos] return self._array_to_string(context) def trailing_context(self,chars): """Get characters of trailing context. This method returns up to characters of trailing context - the text that occurs in the string immediately after the current erroneous word. """ start = self.wordpos + len(self.word) end = min(start + chars,len(self._text)) context = self._text[start:end] return self._array_to_string(context) pyenchant-1.6.5/enchant/lib/0000755000175000017500000000000011501654510014026 5ustar rfkrfkpyenchant-1.6.5/enchant/lib/enchant/0000755000175000017500000000000011501654510015446 5ustar rfkrfkpyenchant-1.6.5/enchant/lib/enchant/README.txt0000644000175000017500000000014411235022427017143 0ustar rfkrfk This directory contains the plugin DLLs for enchant when installed on a Microsoft Windows system. pyenchant-1.6.5/enchant/errors.py0000644000175000017500000000375611501534256015165 0ustar rfkrfk# pyenchant # # Copyright (C) 2004-2008, Ryan Kelly # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. 
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the
# Free Software Foundation, Inc., 59 Temple Place - Suite 330,
# Boston, MA 02111-1307, USA.
#
# In addition, as a special exception, you are
# given permission to link the code of this program with
# non-LGPL Spelling Provider libraries (eg: a MSFT Office
# spell checker backend) and distribute linked combinations including
# the two.  You must obey the GNU Lesser General Public License in all
# respects for all of the code used other than said providers.  If you modify
# this file, you may extend this exception to your version of the
# file, but you are not obligated to do so.  If you do not wish to
# do so, delete this exception statement from your version.
#
"""

enchant.errors:  Error class definitions for the enchant library

All error classes are defined in this separate sub-module, so that they
can safely be imported without causing circular dependencies.

""" class Error(Exception): """Base exception class for the enchant module.""" pass class DictNotFoundError(Error): """Exception raised when a requested dictionary could not be found.""" pass class TokenizerNotFoundError(Error): """Exception raised when a requested tokenizer could not be found.""" pass class DefaultLanguageNotFoundError(Error): """Exception raised when a default language could not be found.""" pass pyenchant-1.6.5/enchant/tokenize/0000755000175000017500000000000011501654510015110 5ustar rfkrfkpyenchant-1.6.5/enchant/tokenize/en.py0000644000175000017500000001464511501534256016102 0ustar rfkrfk# pyenchant # # Copyright (C) 2004-2008, Ryan Kelly # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the # Free Software Foundation, Inc., 59 Temple Place - Suite 330, # Boston, MA 02111-1307, USA. # # In addition, as a special exception, you are # given permission to link the code of this program with # non-LGPL Spelling Provider libraries (eg: a MSFT Office # spell checker backend) and distribute linked combinations including # the two. You must obey the GNU Lesser General Public License in all # respects for all of the code used other than said providers. If you modify # this file, you may extend this exception to your version of the # file, but you are not obligated to do so. If you do not wish to # do so, delete this exception statement from your version. 
#
"""

enchant.tokenize.en:  Tokenizer for the English language

This module implements a PyEnchant text tokenizer for the English
language, based on very simple rules.

"""

import unicodedata

import enchant.tokenize
from enchant.utils import unicode


class tokenize(enchant.tokenize.tokenize):
    """Iterator splitting text into words, reporting position.

    This iterator takes a text string as input, and yields tuples
    representing each distinct word found in the text.  The tuples
    take the form:

        (<word>,<pos>)

    Where <word> is the word string found and <pos> is the position
    of the start of the word within the text.

    The optional argument <valid_chars> may be used to specify a
    list of additional characters that can form part of a word.
    By default, this list contains only the apostrophe (').  Note that
    these characters cannot appear at the start or end of a word.
    """

    _DOC_ERRORS = ["pos","pos"]

    def __init__(self,text,valid_chars=("'",)):
        self._valid_chars = valid_chars
        self._text = text
        self.offset = 0
        # Select proper implementation of self._consume_alpha.
        # 'text' isn't necessarily a string (it could be e.g. a mutable array)
        # so we can't use isinstance(text,unicode) to detect unicode.
        # Instead we typetest the first character of the text.
        # If there are no characters then it doesn't matter what implementation
        # we use since it won't be called anyway.
        try:
            char1 = text[0]
        except IndexError:
            self._consume_alpha = self._consume_alpha_b
        else:
            if isinstance(char1,unicode):
                self._consume_alpha = self._consume_alpha_u
            else:
                self._consume_alpha = self._consume_alpha_b

    def _consume_alpha_b(self,text,offset):
        """Consume an alphabetic character from the given bytestring.

        Given a bytestring and the current offset, this method returns
        the number of characters occupied by the next alphabetic character
        in the string.  Non-ASCII bytes are interpreted as utf-8 and can
        result in multiple characters being consumed.
""" assert offset < len(text) if text[offset].isalpha(): return 1 elif text[offset] >= "\x80": return self._consume_alpha_utf8(text,offset) return 0 def _consume_alpha_utf8(self,text,offset): """Consume a sequence of utf8 bytes forming an alphabetic character.""" incr = 2 u = "" while not u and incr <= 4: try: try: # In the common case this will be a string u = text[offset:offset+incr].decode("utf8") except AttributeError: # Looks like it was e.g. a mutable char array. try: s = text[offset:offset+incr].tostring() except AttributeError: s = "".join([c for c in text[offset:offset+incr]]) u = s.decode("utf8") except UnicodeDecodeError: incr += 1 if not u: return 0 if u.isalpha(): return incr if unicodedata.category(u)[0] == "M": return incr return 0 def _consume_alpha_u(self,text,offset): """Consume an alphabetic character from the given unicode string. Given a unicode string and the current offset, this method returns the number of characters occupied by the next alphabetic character in the string. Trailing combining characters are consumed as a single letter. 
""" assert offset < len(text) incr = 0 if text[offset].isalpha(): incr = 1 while offset + incr < len(text): if unicodedata.category(text[offset+incr])[0] != "M": break incr += 1 return incr def next(self): text = self._text offset = self.offset while offset < len(text): # Find start of next word (must be alpha) while offset < len(text): incr = self._consume_alpha(text,offset) if incr: break offset += 1 curPos = offset # Find end of word using, allowing valid_chars while offset < len(text): incr = self._consume_alpha(text,offset) if not incr: if text[offset] in self._valid_chars: incr = 1 else: break offset += incr # Return if word isnt empty if(curPos != offset): # Make sure word doesn't end with a valid_char while text[offset-1] in self._valid_chars: offset = offset - 1 self.offset = offset return (text[curPos:offset],curPos) self.offset = offset raise StopIteration() pyenchant-1.6.5/enchant/tokenize/tests.py0000644000175000017500000003643111501534256016637 0ustar rfkrfk# pyenchant # # Copyright (C) 2004-2008, Ryan Kelly # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the # Free Software Foundation, Inc., 59 Temple Place - Suite 330, # Boston, MA 02111-1307, USA. 
# # In addition, as a special exception, you are # given permission to link the code of this program with # non-LGPL Spelling Provider libraries (eg: a MSFT Office # spell checker backend) and distribute linked combinations including # the two. You must obey the GNU Lesser General Public License in all # respects for all of the code used other than said providers. If you modify # this file, you may extend this exception to your version of the # file, but you are not obligated to do so. If you do not wish to # do so, delete this exception statement from your version. # """ enchant.tokenize.tests: unittests for enchant tokenization functions. """ import unittest import array from enchant.tokenize import * from enchant.tokenize.en import tokenize as tokenize_en from enchant.utils import raw_unicode, unicode, bytes class TestTokenization(unittest.TestCase): """TestCases for testing the basic tokenization functionality.""" def test_basic_tokenize(self): """Simple regression test for basic white-space tokenization.""" input = """This is a paragraph. It's not very special, but it's designed 2 show how the splitter works with many-different combos of words. 
Also need to "test" the (handling) of 'quoted' words.""" output = [ ("This",0),("is",5),("a",8),("paragraph",10),("It's",22), ("not",27),("very",31),("special",36),("but",45),("it's",49), ("designed",54),("2",63), ("show",65),("how",70),("the",74), ("splitter",78),("works",87),("with",93),("many-different",98), ("combos",113),("of",120),("words",123), ("Also",130),("need",135), ("to",140),("test",144),("the",150),("handling",155), ("of",165),("quoted",169),("words",177) ] self.assertEqual(output,[i for i in basic_tokenize(input)]) for (itmO,itmV) in zip(output,basic_tokenize(input)): self.assertEqual(itmO,itmV) def test_tokenize_strip(self): """Test special-char-stripping edge-cases in basic_tokenize.""" input = "((' \"\" 'text' has (lots) of (special chars} >>]" output = [ ("",4),("text",15),("has",21),("lots",26),("of",32), ("special",36),("chars}",44),(">>",51)] self.assertEqual(output,[i for i in basic_tokenize(input)]) for (itmO,itmV) in zip(output,basic_tokenize(input)): self.assertEqual(itmO,itmV) def test_wrap_tokenizer(self): """Test wrapping of one tokenizer with another.""" input = "this-string will be split@according to diff'rnt rules" from enchant.tokenize import en tknzr = wrap_tokenizer(basic_tokenize,en.tokenize) tknzr = tknzr(input) self.assertEqual(tknzr._tokenizer.__class__,basic_tokenize) self.assertEqual(tknzr._tokenizer.offset,0) for (n,(word,pos)) in enumerate(tknzr): if n == 0: self.assertEqual(pos,0) self.assertEqual(word,"this") if n == 1: self.assertEqual(pos,5) self.assertEqual(word,"string") if n == 2: self.assertEqual(pos,12) self.assertEqual(word,"will") # Test setting offset to a previous token tknzr.offset = 5 self.assertEqual(tknzr.offset,5) self.assertEqual(tknzr._tokenizer.offset,5) self.assertEqual(tknzr._curtok.__class__,empty_tokenize) if n == 3: self.assertEqual(word,"string") self.assertEqual(pos,5) if n == 4: self.assertEqual(pos,12) self.assertEqual(word,"will") if n == 5: self.assertEqual(pos,17) 
self.assertEqual(word,"be") # Test setting offset past the current token tknzr.offset = 20 self.assertEqual(tknzr.offset,20) self.assertEqual(tknzr._tokenizer.offset,20) self.assertEqual(tknzr._curtok.__class__,empty_tokenize) if n == 6: self.assertEqual(pos,20) self.assertEqual(word,"split") if n == 7: self.assertEqual(pos,26) self.assertEqual(word,"according") # Test setting offset to middle of current token tknzr.offset = 23 self.assertEqual(tknzr.offset,23) self.assertEqual(tknzr._tokenizer.offset,23) self.assertEqual(tknzr._curtok.offset,3) if n == 8: self.assertEqual(pos,23) self.assertEqual(word,"it") # OK, I'm pretty happy with the behaviour, no need to # continue testing the rest of the string class TestFilters(unittest.TestCase): """TestCases for the various Filter subclasses.""" text = """this text with http://url.com and SomeLinksLike ftp://my.site.com.au/some/file AndOthers not:/quite.a.url with-an@aemail.address as well""" def setUp(self): pass def test_URLFilter(self): """Test filtering of URLs""" tkns = get_tokenizer("en_US",filters=(URLFilter,))(self.text) out = [t for t in tkns] exp = [("this",0),("text",5),("with",10),("and",30), ("SomeLinksLike",34),("AndOthers",93),("not",103),("quite",108), ("a",114),("url",116),("with",134),("an",139),("aemail",142), ("address",149),("as",157),("well",160)] self.assertEqual(out,exp) def test_WikiWordFilter(self): """Test filtering of WikiWords""" tkns = get_tokenizer("en_US",filters=(WikiWordFilter,))(self.text) out = [t for t in tkns] exp = [("this",0),("text",5),("with",10),("http",15),("url",22),("com",26), ("and",30), ("ftp",62),("my",68),("site",71),("com",76),("au",80), ("some",83),("file",88),("not",103),("quite",108), ("a",114),("url",116),("with",134),("an",139),("aemail",142), ("address",149),("as",157),("well",160)] self.assertEqual(out,exp) def test_EmailFilter(self): """Test filtering of email addresses""" tkns = get_tokenizer("en_US",filters=(EmailFilter,))(self.text) out = [t for t in tkns] exp 
= [("this",0),("text",5),("with",10),("http",15),("url",22),("com",26),
               ("and",30),("SomeLinksLike",34),
               ("ftp",62),("my",68),("site",71),("com",76),("au",80),
               ("some",83),("file",88),("AndOthers",93),("not",103),("quite",108),
               ("a",114),("url",116),
               ("as",157),("well",160)]
        self.assertEqual(out,exp)

    def test_CombinedFilter(self):
        """Test several filters combined"""
        tkns=get_tokenizer("en_US",filters=(URLFilter,WikiWordFilter,EmailFilter))(self.text)
        out = [t for t in tkns]
        exp = [("this",0),("text",5),("with",10),
               ("and",30),("not",103),("quite",108),
               ("a",114),("url",116),
               ("as",157),("well",160)]
        self.assertEqual(out,exp)


class TestChunkers(unittest.TestCase):
    """TestCases for the various Chunker subclasses."""

    def test_HTMLChunker(self):
        """Test chunking of HTML text"""
        text = """hello<html><head><title>my title</title></head><body>this is a
                <b>simple</b> HTML document for <p> test<i>ing</i> purposes</p>.
                It < contains > various <-- special characters.
                """
        tkns = get_tokenizer("en_US",chunkers=(HTMLChunker,))(text)
        out = [t for t in tkns]
        exp = [("hello",0),("my",24),("title",27),("this",53),("is",58),
               ("a",61),("simple",82),("HTML",93),("document",98),("for",107),
               ("test",115),("ing",122),("purposes",130),("It",160),
               ("contains",165),("various",176),("special",188),
               ("characters",196)]
        self.assertEqual(out,exp)
        for (word,pos) in out:
            self.assertEqual(text[pos:pos+len(word)],word)


class TestTokenizeEN(unittest.TestCase):
    """TestCases for checking behaviour of English tokenization."""

    def test_tokenize_en(self):
        """Simple regression test for English tokenization."""
        input = """This is a paragraph.  It's not very special, but it's designed
2 show how the splitter works with many-different combos of words. Also need to "test"
the handling of 'quoted' words."""
        output = [
                  ("This",0),("is",5),("a",8),("paragraph",10),("It's",22),
                  ("not",27),("very",31),("special",36),("but",45),("it's",49),
                  ("designed",54),("show",65),("how",70),("the",74),
                  ("splitter",78),("works",87),("with",93),("many",98),
                  ("different",103),("combos",113),("of",120),("words",123),
                  ("Also",130),("need",135),
                  ("to",140),("test",144),("the",150),("handling",154),
                  ("of",163),("quoted",167),("words",175)
                 ]
        for (itmO,itmV) in zip(output,tokenize_en(input)):
            self.assertEqual(itmO,itmV)

    def test_unicodeBasic(self):
        """Test tokenization of a basic unicode string."""
        input = raw_unicode(r"Ik ben ge\u00EFnteresseerd in de co\u00F6rdinatie van mijn knie\u00EBn, maar kan niet \u00E9\u00E9n \u00E0 twee enqu\u00EAtes vinden die recht doet aan mijn carri\u00E8re op Cura\u00E7ao")
        output = input.split(" ")
        output[8] = output[8][0:-1]
        for (itmO,itmV) in zip(output,tokenize_en(input)):
            self.assertEqual(itmO,itmV[0])
            self.assertTrue(input[itmV[1]:].startswith(itmO))

    def test_unicodeCombining(self):
        """Test tokenization with unicode combining symbols."""
        input = raw_unicode(r"Ik ben gei\u0308nteresseerd in de
co\u00F6rdinatie van mijn knie\u00EBn, maar kan niet e\u0301e\u0301n \u00E0 twee enqu\u00EAtes vinden die recht doet aan mijn carri\u00E8re op Cura\u00E7ao") output = input.split(" ") output[8] = output[8][0:-1] for (itmO,itmV) in zip(output,tokenize_en(input)): self.assertEqual(itmO,itmV[0]) self.assertTrue(input[itmV[1]:].startswith(itmO)) def test_utf8_bytes(self): """Test tokenization of UTF8-encoded bytes (bug #2500184).""" # Python3 doesn't support bytestrings, don't run this test if str is unicode: return input = "A r\xc3\xa9sum\xc3\xa9, also spelled resum\xc3\xa9 or resume" output = input.split(" ") output[1] = output[1][0:-1] for (itmO,itmV) in zip(output,tokenize_en(input)): self.assertEqual(itmO,itmV[0]) self.assertTrue(input[itmV[1]:].startswith(itmO)) def test_utf8_bytes_at_end(self): """Test tokenization of UTF8-encoded bytes at end of word.""" # Python3 doesn't support bytestrings, don't run this test if str is unicode: return input = "A r\xc3\xa9sum\xc3\xa9, also spelled resum\xc3\xa9 or resume" output = input.split(" ") output[1] = output[1][0:-1] for (itmO,itmV) in zip(output,tokenize_en(input)): self.assertEqual(itmO,itmV[0]) def test_utf8_bytes_in_an_array(self): """Test tokenization of UTF8-encoded bytes stored in an array.""" # Python3 doesn't support bytestrings, don't run this test if str is unicode: return input = "A r\xc3\xa9sum\xc3\xa9, also spelled resum\xc3\xa9 or resume" output = input.split(" ") output[1] = output[1][0:-1] input = array.array('c',input) output = [array.array('c',w) for w in output] for (itmO,itmV) in zip(output,tokenize_en(array.array('c',input))): self.assertEqual(itmO,itmV[0]) self.assertEqual(input[itmV[1]:itmV[1]+len(itmV[0])],itmO) def test_bug1591450(self): """Check for tokenization regressions identified in bug #1591450.""" input = """Testing markup and {y:i}so-forth...leading dots and trail--- well, you get-the-point. Also check numbers: 999 1,000 12:00 .45. 
Done?""" output = [ ("Testing",0),("i",9),("markup",11),("i",19),("and",22), ("y",27),("i",29),("so",31),("forth",34),("leading",42), ("dots",50),("and",55),("trail",59),("well",68), ("you",74),("get",78),("the",82),("point",86), ("Also",93),("check",98),("numbers",104),("Done",134), ] for (itmO,itmV) in zip(output,tokenize_en(input)): self.assertEqual(itmO,itmV) def test_bug2785373(self): """Testcases for bug #2785373""" input = "So, one dey when I wes 17, I left." for _ in tokenize_en(input): pass input = raw_unicode("So, one dey when I wes 17, I left.") for _ in tokenize_en(input): pass def test_finnish_text(self): """Test tokenizing some Finnish text. This really should work since there are no special rules to apply, just lots of non-ascii characters. """ inputT = raw_unicode('T\\xe4m\\xe4 on kappale. Eip\\xe4 ole kovin 2 nen, mutta tarkoitus on n\\xe4ytt\\xe4\\xe4 miten sanastaja \\ntoimii useiden-erilaisten sanarypp\\xe4iden kimpussa.\\nPit\\xe4\\xe4p\\xe4 viel\\xe4 \'tarkistaa\' sanat jotka "lainausmerkeiss\\xe4". 
Heittomerkki ja vaa\'an.\\nUlkomaisia sanoja s\\xfcss, spa\\xdf.') outputT = [ (raw_unicode('T\\xe4m\\xe4'),0), (raw_unicode('on'),5), (raw_unicode('kappale'),8), (raw_unicode('Eip\\xe4'),17), (raw_unicode('ole'),22), (raw_unicode('kovin'),26), (raw_unicode('nen'),34), (raw_unicode('mutta'),39), (raw_unicode('tarkoitus'),45), (raw_unicode('on'),55), (raw_unicode('n\\xe4ytt\\xe4\\xe4'),58), (raw_unicode('miten'),66), (raw_unicode('sanastaja'),72), (raw_unicode('toimii'),83), (raw_unicode('useiden'),90), (raw_unicode('erilaisten'),98), (raw_unicode('sanarypp\\xe4iden'),109), (raw_unicode('kimpussa'),123), (raw_unicode('Pit\\xe4\\xe4p\\xe4'),133), (raw_unicode('viel\\xe4'),141), (raw_unicode('tarkistaa'),148), (raw_unicode('sanat'),159), (raw_unicode('jotka'),165), (raw_unicode('lainausmerkeiss\\xe4'),172), (raw_unicode('Heittomerkki'),191), (raw_unicode('ja'),204), (raw_unicode("vaa'an"),207), (raw_unicode('Ulkomaisia'),215), (raw_unicode('sanoja'),226), (raw_unicode('s\\xfcss'),233), (raw_unicode('spa\\xdf'),239),] for (itmO,itmV) in zip(outputT,tokenize_en(inputT)): self.assertEqual(itmO,itmV) pyenchant-1.6.5/enchant/tokenize/__init__.py0000644000175000017500000004336111501534256017234 0ustar rfkrfk# pyenchant # # Copyright (C) 2004-2009, Ryan Kelly # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the # Free Software Foundation, Inc., 59 Temple Place - Suite 330, # Boston, MA 02111-1307, USA. 
#
# In addition, as a special exception, you are
# given permission to link the code of this program with
# non-LGPL Spelling Provider libraries (eg: a MSFT Office
# spell checker backend) and distribute linked combinations including
# the two.  You must obey the GNU Lesser General Public License in all
# respects for all of the code used other than said providers.  If you modify
# this file, you may extend this exception to your version of the
# file, but you are not obligated to do so.  If you do not wish to
# do so, delete this exception statement from your version.
#
"""

enchant.tokenize:  String tokenization functions for PyEnchant

An important task in spellchecking is breaking up large bodies of
text into their constituent words, each of which is then checked
for correctness.  This package provides Python functions to split
strings into words according to the rules of a particular language.

Each tokenization function accepts a string as its only positional
argument, and returns an iterator that yields tuples of the following
form, one for each word found:

    (<word>,<pos>)

The meanings of these fields should be clear: <word> is the word
that was found and <pos> is the position within the text at which
the word began (zero indexed, of course).  The function will work
on any string-like object that supports array-slicing; in particular
character-array objects from the 'array' module may be used.

The iterator also provides the attribute 'offset' which may be used to
get/set the current position of the tokenizer inside the string being
split.  This can be used for example if the string's contents have
changed during the tokenization process.

To obtain an appropriate tokenization function for the language
identified by <tag>, use the function 'get_tokenizer(tag)':

    tknzr = get_tokenizer("en_US")
    for (word,pos) in tknzr("text to be tokenized goes here"):
        do_something(word)

This library is designed to be easily extendible by third-party authors.
To register a tokenization function for the language <tag>, implement
it as the function 'tokenize' within the module enchant.tokenize.<tag>.
The 'get_tokenizer' function will automatically detect it.  Note that the
underscore must be used as the tag component separator in this case, in
order to form a valid Python module name (e.g. "en_US" rather than "en-US").

Currently, a tokenizer has only been implemented for the English language.
Based on the author's limited experience, this should be at least partially
suitable for other languages.

This module also provides various implementations of "Chunkers" and "Filters".
These classes are designed to make it easy to work with text in a variety
of common formats, by detecting and excluding parts of the text that don't
need to be checked.

A Chunker is a class designed to break a body of text into large chunks
of checkable content; for example the HTMLChunker class extracts the
text content from all HTML tags but excludes the tags themselves.
A Filter is a class designed to skip individual words during the checking
process; for example the URLFilter class skips over any words that
have the format of a URL.

For example, to spellcheck an HTML document it is necessary to split the
text into chunks based on HTML tags, and to filter out common word forms
such as URLs and WikiWords.  This would look something like the following:

    tknzr = get_tokenizer("en_US",(HTMLChunker,),(URLFilter,WikiWordFilter))

    text = "the url is http://example.com"
    for (word,pos) in tknzr(text):
        ...check each word and react accordingly...

"""
_DOC_ERRORS = ["pos","pos","tknzr","URLFilter","WikiWordFilter",
               "tkns","tknzr","pos","tkns"]

import re
import warnings

import enchant
from enchant.utils import next, xrange
from enchant.errors import *

# For backwards-compatibility.  This will eventually be removed, but how
# does one mark a module-level constant as deprecated?
Error = TokenizerNotFoundError


class tokenize:
    """Base class for all tokenizer objects.
    Each tokenizer must be an iterator and provide the 'offset'
    attribute as described in the documentation for this module.

    While tokenizers are in fact classes, they should be treated
    like functions, and so are named using lower_case rather than
    the CamelCase that is more traditional for class names.
    """
    _DOC_ERRORS = ["CamelCase"]

    def __init__(self,text):
        self._text = text
        self.offset = 0

    def __next__(self):
        return self.next()

    def next(self):
        raise NotImplementedError()

    def __iter__(self):
        return self


def get_tokenizer(tag=None,chunkers=None,filters=None):
    """Locate an appropriate tokenizer by language tag.

    This requires importing the function 'tokenize' from an appropriate
    module.  Modules tried are named after the language tag, tried in the
    following order:
        * the entire tag (e.g. "en_AU.py")
        * the base language code of the tag (e.g. "en.py")

    If the language tag is None, a default tokenizer (actually the English
    one) is returned.  It's unicode aware and should work OK for most
    latin-derived languages.

    If a suitable function cannot be found, raises TokenizerNotFoundError.

    If given and not None, 'chunkers' and 'filters' must be lists of chunker
    classes and filter classes respectively.  These will be applied to the
    tokenizer during creation.
    """
    if tag is None:
        tag = "en"
    # "filters" used to be the second argument.  Try to catch cases
    # where it is given positionally and issue a DeprecationWarning.
if chunkers is not None and filters is None: chunkers = list(chunkers) if chunkers: try: chunkers_are_filters = issubclass(chunkers[0],Filter) except TypeError: pass else: if chunkers_are_filters: msg = "passing 'filters' as a non-keyword argument "\ "to get_tokenizer() is deprecated" warnings.warn(msg,category=DeprecationWarning) filters = chunkers chunkers = None # Ensure only '_' used as separator tag = tag.replace("-","_") # First try the whole tag tkFunc = _try_tokenizer(tag) if tkFunc is None: # Try just the base base = tag.split("_")[0] tkFunc = _try_tokenizer(base) if tkFunc is None: msg = "No tokenizer found for language '%s'" % (tag,) raise TokenizerNotFoundError(msg) # Given the language-specific tokenizer, we now build up the # end result as follows: # * chunk the text using any given chunkers in turn # * begin with basic whitespace tokenization # * apply each of the given filters in turn # * apply language-specific rules tokenizer = basic_tokenize if chunkers is not None: chunkers = list(chunkers) for i in xrange(len(chunkers)-1,-1,-1): tokenizer = wrap_tokenizer(chunkers[i],tokenizer) if filters is not None: for f in filters: tokenizer = f(tokenizer) tokenizer = wrap_tokenizer(tokenizer,tkFunc) return tokenizer get_tokenizer._DOC_ERRORS = ["py","py"] class empty_tokenize(tokenize): """Tokenizer class that yields no elements.""" _DOC_ERRORS = [] def __init__(self): tokenize.__init__(self,"") def next(self): raise StopIteration() class unit_tokenize(tokenize): """Tokenizer class that yields the text as a single token.""" _DOC_ERRORS = [] def __init__(self,text): tokenize.__init__(self,text) self._done = False def next(self): if self._done: raise StopIteration() self._done = True return (self._text,0) class basic_tokenize(tokenize): """Tokenizer class that performs very basic word-finding. 
    This tokenizer does the most basic thing that could work - it splits
    text into words based on whitespace boundaries, and removes basic
    punctuation symbols from the start and end of each word.
    """
    _DOC_ERRORS = []

    # Chars to remove from start/end of words
    strip_from_start = '"' + "'`(["
    strip_from_end = '"' + "'`]).!,?;:"

    def next(self):
        text = self._text
        offset = self.offset
        while True:
            if offset >= len(text):
                break
            # Find start of next word
            while offset < len(text) and text[offset].isspace():
                offset += 1
            sPos = offset
            # Find end of word
            while offset < len(text) and not text[offset].isspace():
                offset += 1
            ePos = offset
            self.offset = offset
            # Strip chars from front/end of word
            while sPos < len(text) and text[sPos] in self.strip_from_start:
                sPos += 1
            while 0 < ePos and text[ePos-1] in self.strip_from_end:
                ePos -= 1
            # Return if word isn't empty
            if(sPos < ePos):
                return (text[sPos:ePos],sPos)
        raise StopIteration()


def _try_tokenizer(modName):
    """Look for a tokenizer in the named module.

    Returns the function if found, None otherwise.
    """
    modBase = "enchant.tokenize."
    funcName = "tokenize"
    modName = modBase + modName
    try:
        mod = __import__(modName,globals(),{},funcName)
        return getattr(mod,funcName)
    except ImportError:
        return None


def wrap_tokenizer(tk1,tk2):
    """Wrap one tokenizer inside another.

    This function takes two tokenizer functions 'tk1' and 'tk2',
    and returns a new tokenizer function that passes the output
    of tk1 through tk2 before yielding it to the calling code.
    """
    # This logic is already implemented in the Filter class.
    # We simply use tk2 as the _split() method for a filter
    # around tk1.
    tkW = Filter(tk1)
    tkW._split = tk2
    return tkW
wrap_tokenizer._DOC_ERRORS = ["tk","tk","tk","tk"]


class Chunker(tokenize):
    """Base class for text chunking functions.

    A chunker is designed to chunk text into large blocks of tokens.  It
    has the same interface as a tokenizer but is for a different purpose.
    """
    pass


class Filter:
    """Base class for token filtering functions.
    A filter is designed to wrap a tokenizer (or another filter) and do
    two things:

      * skip over tokens
      * split tokens into sub-tokens

    Subclasses have two basic options for customising their behaviour.  The
    method _skip(word) may be overridden to return True for words that
    should be skipped, and False otherwise.  The method _split(word) may
    be overridden as a tokenization function that will be applied to
    further tokenize any words that aren't skipped.
    """

    def __init__(self,tokenizer):
        """Filter class constructor."""
        self._tokenizer = tokenizer

    def __call__(self,*args,**kwds):
        tkn = self._tokenizer(*args,**kwds)
        return self._TokenFilter(tkn,self._skip,self._split)

    def _skip(self,word):
        """Filter method for identifying skippable tokens.

        If this method returns True, the given word will be skipped by
        the filter.  This should be overridden in subclasses to produce the
        desired functionality.  The default behaviour is not to skip any words.
        """
        return False

    def _split(self,word):
        """Filter method for sub-tokenization of tokens.

        This method must be a tokenization function that will split the
        given word into sub-tokens according to the needs of the filter.
        The default behaviour is not to split any words.
        """
        return unit_tokenize(word)

    class _TokenFilter(object):
        """Private inner class implementing the tokenizer-wrapping logic.

        This might seem convoluted, but we're trying to create something
        akin to a meta-class - when Filter(tknzr) is called it must return
        a *callable* that can then be applied to a particular string to
        perform the tokenization.  Since we need to manage a lot of state
        during tokenization, returning a class is the best option.
        """
        _DOC_ERRORS = ["tknzr"]

        def __init__(self,tokenizer,skip,split):
            self._skip = skip
            self._split = split
            self._tokenizer = tokenizer
            # for managing state of sub-tokenization
            self._curtok = empty_tokenize()
            self._curword = ""
            self._curpos = 0

        def __iter__(self):
            return self

        def __next__(self):
            return self.next()

        def next(self):
            # Try to get the next sub-token from word currently being split.
            # If unavailable, move on to the next word and try again.
            try:
                (word,pos) = next(self._curtok)
                return (word,pos + self._curpos)
            except StopIteration:
                (word,pos) = next(self._tokenizer)
                while self._skip(word):
                    (word,pos) = next(self._tokenizer)
                self._curword = word
                self._curpos = pos
                self._curtok = self._split(word)
                return self.next()

        # Pass on access to 'offset' to the underlying tokenizer.
        def _getOffset(self):
            return self._tokenizer.offset

        def _setOffset(self,val):
            self._tokenizer.offset = val
            # If we stay within the current word, also set on _curtok.
            # Otherwise, throw away _curtok and set to empty iterator.
            subval = val - self._curpos
            if subval >= 0 and subval < len(self._curword):
                self._curtok.offset = subval
            else:
                self._curtok = empty_tokenize()
                self._curword = ""
                self._curpos = 0
        offset = property(_getOffset,_setOffset)


#  Pre-defined chunkers and filters start here

class URLFilter(Filter):
    """Filter skipping over URLs.

    This filter skips any words matching the following regular expression:

           ^[a-zA-Z]+:\/\/[^\s].*

    That is, any words that are URLs.
    """
    _DOC_ERRORS = ["zA"]
    _pattern = re.compile(r"^[a-zA-Z]+:\/\/[^\s].*")

    def _skip(self,word):
        if self._pattern.match(word):
            return True
        return False


class WikiWordFilter(Filter):
    """Filter skipping over WikiWords.

    This filter skips any words matching the following regular expression:

           ^([A-Z]\w+[A-Z]+\w+)

    That is, any words that are WikiWords.
    """
    _pattern = re.compile(r"^([A-Z]\w+[A-Z]+\w+)")

    def _skip(self,word):
        if self._pattern.match(word):
            return True
        return False


class EmailFilter(Filter):
    """Filter skipping over email addresses.

    This filter skips any words matching the following regular expression:

           ^.+@[^\.].*\.[a-z]{2,}$

    That is, any words that resemble email addresses.
    """
    _pattern = re.compile(r"^.+@[^\.].*\.[a-z]{2,}$")

    def _skip(self,word):
        if self._pattern.match(word):
            return True
        return False


class HTMLChunker(Chunker):
    """Chunker for breaking up HTML documents into chunks of checkable text.

    The operation of this chunker is very simple - anything between a "<"
    and a ">" will be ignored.  Later versions may improve the algorithm
    slightly.
    """

    def next(self):
        text = self._text
        offset = self.offset
        while True:
            if offset >= len(text):
                break
            # Skip to the end of the current tag, if any.
            if text[offset] == "<":
                maybeTag = offset
                if self._is_tag(text,offset):
                    while text[offset] != ">":
                        offset += 1
                        if offset == len(text):
                            offset = maybeTag+1
                            break
                    else:
                        offset += 1
                else:
                    offset = maybeTag+1
            sPos = offset
            # Find the start of the next tag.
            while offset < len(text) and text[offset] != "<":
                offset += 1
            ePos = offset
            self.offset = offset
            # Return if chunk isn't empty
            if(sPos < offset):
                return (text[sPos:offset],sPos)
        raise StopIteration()

    def _is_tag(self,text,offset):
        if offset+1 < len(text):
            if text[offset+1].isalpha():
                return True
            if text[offset+1] == "/":
                return True
        return False

#TODO: LaTeXChunker

pyenchant-1.6.5/enchant/tests.py

# pyenchant
#
# Copyright (C) 2004-2009, Ryan Kelly
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the
# Free Software Foundation, Inc., 59 Temple Place - Suite 330,
# Boston, MA 02111-1307, USA.
#
# In addition, as a special exception, you are
# given permission to link the code of this program with
# non-LGPL Spelling Provider libraries (eg: a MSFT Office
# spell checker backend) and distribute linked combinations including
# the two.  You must obey the GNU Lesser General Public License in all
# respects for all of the code used other than said providers.  If you modify
# this file, you may extend this exception to your version of the
# file, but you are not obligated to do so.  If you do not wish to
# do so, delete this exception statement from your version.
#
"""

    enchant.tests:  testcases for pyenchant

"""

import os
import sys
import unittest
try:
    import subprocess
except ImportError:
    subprocess = None

import enchant
from enchant import *
from enchant import _enchant as _e
from enchant.utils import unicode, raw_unicode, printf


def runcmd(cmd):
    if subprocess is not None:
        kwds = dict(stdout=subprocess.PIPE,stderr=subprocess.PIPE,shell=True)
        p = subprocess.Popen(cmd,**kwds)
        (stdout,stderr) = p.communicate()
        if p.returncode:
            if sys.version_info[0] >= 3:
                stderr = stderr.decode(sys.getdefaultencoding(),"replace")
            sys.stderr.write(stderr)
        return p.returncode
    else:
        return os.system(cmd)


class TestBroker(unittest.TestCase):
    """Test cases for the proper functioning of Broker objects.

    These tests assume that there is at least one working provider
    with a dictionary for the "en_US" language.
    """

    def setUp(self):
        self.broker = Broker()

    def tearDown(self):
        del self.broker

    def test_HasENUS(self):
        """Test that the en_US language is available."""
        self.assertTrue(self.broker.dict_exists("en_US"))

    def test_LangsAreAvail(self):
        """Test whether all advertised languages are in fact available."""
        for lang in self.broker.list_languages():
            if not self.broker.dict_exists(lang):
                assert False, "language '"+lang+"' advertised but non-existent"

    def test_ProvsAreAvail(self):
        """Test whether all advertised providers are in fact available."""
        for (lang,prov) in self.broker.list_dicts():
            self.assertTrue(self.broker.dict_exists(lang))
            if not self.broker.dict_exists(lang):
                assert False, "language '"+lang+"' advertised but non-existent"
            if prov not in self.broker.describe():
                assert False, "provider '"+str(prov)+"' advertised but non-existent"

    def test_ProvOrdering(self):
        """Test that provider ordering works correctly."""
        langs = {}
        provs = []
        # Find the providers for each language, and a list of all providers
        for (tag,prov) in self.broker.list_dicts():
            # Skip hyphenation dictionaries installed by OOo
            if tag.startswith("hyph_") and prov.name == "myspell":
                continue
            # Canonicalize separators
            tag = tag.replace("-","_")
            langs[tag] = []
            # NOTE: we are excluding Zemberek here as it appears to return
            #       a broker for any language, even nonexistent ones
            if prov not in provs and prov.name != "zemberek":
                provs.append(prov)
        for prov in provs:
            for tag in langs:
                b2 = Broker()
                b2.set_ordering(tag,prov.name)
                try:
                    d = b2.request_dict(tag)
                    if d.provider != prov:
                        raise ValueError()
                    langs[tag].append(prov)
                except:
                    pass
        # Check availability using a single entry in ordering
        for tag in langs:
            for prov in langs[tag]:
                b2 = Broker()
                b2.set_ordering(tag,prov.name)
                d = b2.request_dict(tag)
                self.assertEqual((d.provider,tag),(prov,tag))
                del d
                del b2
        # Place providers that don't have the language in the ordering
        for tag in langs:
            for prov in langs[tag]:
                order = prov.name
                for prov2 in provs:
                    if prov2 not in langs[tag]:
                        order = prov2.name + "," + order
                b2 = Broker()
                b2.set_ordering(tag,order)
                d = b2.request_dict(tag)
                self.assertEqual((d.provider,tag,order),(prov,tag,order))
                del d
                del b2

    def test_UnicodeTag(self):
        """Test that unicode language tags are accepted"""
        d1 = self.broker._request_dict_data(raw_unicode("en_US"))
        self.assertTrue(d1)
        _e.broker_free_dict(self.broker._this,d1)
        d1 = Dict(raw_unicode("en_US"))
        self.assertTrue(d1)

    def test_GetSetParam(self):
        try:
            self.broker.get_param("pyenchant.unittest")
        except AttributeError:
            return
        self.assertEqual(self.broker.get_param("pyenchant.unittest"),None)
        self.broker.set_param("pyenchant.unittest","testing")
        self.assertEqual(self.broker.get_param("pyenchant.unittest"),"testing")
        self.assertEqual(Broker().get_param("pyenchant.unittest"),None)


class TestDict(unittest.TestCase):
    """Test cases for the proper functioning of Dict objects.

    These tests assume that there is at least one working provider
    with a dictionary for the "en_US" language.
    """

    def setUp(self):
        self.dict = Dict("en_US")

    def tearDown(self):
        del self.dict

    def test_HasENUS(self):
        """Test that the en_US language is available through default broker."""
        self.assertTrue(dict_exists("en_US"))

    def test_check(self):
        """Test that check() works on some common words."""
        self.assertTrue(self.dict.check("hello"))
        self.assertTrue(self.dict.check("test"))
        self.assertFalse(self.dict.check("helo"))
        self.assertFalse(self.dict.check("testt"))

    def test_broker(self):
        """Test that the dict's broker is set correctly."""
        self.assertTrue(self.dict._broker is enchant._broker)

    def test_tag(self):
        """Test that the dict's tag is set correctly."""
        self.assertEqual(self.dict.tag,"en_US")

    def test_suggest(self):
        """Test that suggest() gets simple suggestions right."""
        self.assertTrue(self.dict.check("hello"))
        self.assertTrue("hello" in self.dict.suggest("helo"))

    def test_suggestHang1(self):
        """Test whether suggest() hangs on some inputs (Bug #1404196)"""
        self.assertTrue(len(self.dict.suggest("Thiis")) >= 0)
        self.assertTrue(len(self.dict.suggest("Thiiis")) >= 0)
        self.assertTrue(len(self.dict.suggest("Thiiiis")) >= 0)

    def test_unicode1(self):
        """Test checking/suggesting for unicode strings"""
        # TODO: find something that actually returns suggestions
        us1 = raw_unicode(r"he\u2149lo")
        self.assertTrue(type(us1) is unicode)
        self.assertFalse(self.dict.check(us1))
        for s in self.dict.suggest(us1):
            self.assertTrue(type(s) is unicode)

    def test_session(self):
        """Test that adding words to the session works as required."""
        self.assertFalse(self.dict.check("Lozz"))
        self.assertFalse(self.dict.is_added("Lozz"))
        self.dict.add_to_session("Lozz")
        self.assertTrue(self.dict.is_added("Lozz"))
        self.assertTrue(self.dict.check("Lozz"))
        self.dict.remove_from_session("Lozz")
        self.assertFalse(self.dict.check("Lozz"))
        self.assertFalse(self.dict.is_added("Lozz"))
        self.dict.remove_from_session("hello")
        self.assertFalse(self.dict.check("hello"))
        self.assertTrue(self.dict.is_removed("hello"))
        self.dict.add_to_session("hello")

    def test_AddRemove(self):
        """Test adding/removing from default user dictionary."""
        nonsense = "kxhjsddsi"
        self.assertFalse(self.dict.check(nonsense))
        self.dict.add(nonsense)
        self.assertTrue(self.dict.is_added(nonsense))
        self.assertTrue(self.dict.check(nonsense))
        self.dict.remove(nonsense)
        self.assertFalse(self.dict.is_added(nonsense))
        self.assertFalse(self.dict.check(nonsense))
        self.dict.remove("pineapple")
        self.assertFalse(self.dict.check("pineapple"))
        self.assertTrue(self.dict.is_removed("pineapple"))
        self.assertFalse(self.dict.is_added("pineapple"))
        self.dict.add("pineapple")
        self.assertTrue(self.dict.check("pineapple"))

    def test_DefaultLang(self):
        """Test behaviour of default language selection."""
        defLang = utils.get_default_language()
        if defLang is None:
            # If no default language, shouldn't work
            self.assertRaises(Error,Dict)
        else:
            # If there is a default language, should use it
            # Of course, no need for the dict to actually exist
            try:
                d = Dict()
                self.assertEqual(d.tag,defLang)
            except DictNotFoundError:
                pass


class TestPWL(unittest.TestCase):
    """Test cases for the proper functioning of PWLs and DictWithPWL objects.

    These tests assume that there is at least one working provider
    with a dictionary for the "en_US" language.
    """

    def setUp(self):
        self._tempDir = self._mkdtemp()
        self._fileName = "pwl.txt"

    def tearDown(self):
        import shutil
        shutil.rmtree(self._tempDir)

    def _mkdtemp(self):
        import tempfile
        return tempfile.mkdtemp()

    def _path(self,nm=None):
        if nm is None:
            nm = self._fileName
        nm = os.path.join(self._tempDir,nm)
        if not os.path.exists(nm):
            open(nm,'w').close()
        return nm

    def setPWLContents(self,contents):
        """Set the contents of the PWL file."""
        pwlFile = open(self._path(),"w")
        for ln in contents:
            pwlFile.write(ln)
            pwlFile.write("\n")
        pwlFile.flush()
        pwlFile.close()

    def getPWLContents(self):
        """Retrieve the contents of the PWL file."""
        pwlFile = open(self._path(),"r")
        contents = pwlFile.readlines()
        pwlFile.close()
        return [c.strip() for c in contents]

    def test_check(self):
        """Test that basic checking works for PWLs."""
        self.setPWLContents(["Sazz","Lozz"])
        d = request_pwl_dict(self._path())
        self.assertTrue(d.check("Sazz"))
        self.assertTrue(d.check("Lozz"))
        self.assertFalse(d.check("hello"))

    def test_UnicodeFN(self):
        """Test that unicode PWL filenames are accepted."""
        d = request_pwl_dict(unicode(self._path()))
        self.assertTrue(d)

    def test_add(self):
        """Test that adding words to a PWL works correctly."""
        d = request_pwl_dict(self._path())
        self.assertFalse(d.check("Flagen"))
        d.add("Esquilax")
        d.add("Esquilam")
        self.assertTrue(d.check("Esquilax"))
        self.assertTrue("Esquilax" in self.getPWLContents())
        self.assertTrue(d.is_added("Esquilax"))

    def test_suggestions(self):
        """Test getting suggestions from a PWL."""
        self.setPWLContents(["Sazz","Lozz"])
        d = request_pwl_dict(self._path())
        self.assertTrue("Sazz" in d.suggest("Saz"))
        self.assertTrue("Lozz" in d.suggest("laz"))
        self.assertTrue("Sazz" in d.suggest("laz"))
        d.add("Flagen")
        self.assertTrue("Flagen" in d.suggest("Flags"))
        self.assertFalse("sazz" in d.suggest("Flags"))

    def test_DWPWL(self):
        """Test functionality of DictWithPWL."""
        self.setPWLContents(["Sazz","Lozz"])
        d = DictWithPWL("en_US",self._path(),self._path("pel.txt"))
        self.assertTrue(d.check("Sazz"))
        self.assertTrue(d.check("Lozz"))
        self.assertTrue(d.check("hello"))
        self.assertFalse(d.check("helo"))
        self.assertFalse(d.check("Flagen"))
        d.add("Flagen")
        self.assertTrue(d.check("Flagen"))
        self.assertTrue("Flagen" in self.getPWLContents())
        self.assertTrue("Flagen" in d.suggest("Flagn"))
        self.assertTrue("hello" in d.suggest("helo"))
        d.remove("hello")
        self.assertFalse(d.check("hello"))
        self.assertTrue("hello" not in d.suggest("helo"))
        d.remove("Lozz")
        self.assertFalse(d.check("Lozz"))

    def test_DWPWL_empty(self):
        """Test functionality of DictWithPWL using transient dicts."""
        d = DictWithPWL("en_US",None,None)
        self.assertTrue(d.check("hello"))
        self.assertFalse(d.check("helo"))
        self.assertFalse(d.check("Flagen"))
        d.add("Flagen")
        self.assertTrue(d.check("Flagen"))
        d.remove("hello")
        self.assertFalse(d.check("hello"))
        d.add("hello")
        self.assertTrue(d.check("hello"))

    def test_PyPWL(self):
        """Test our pure-python PWL implementation."""
        d = PyPWL()
        self.assertTrue(list(d._words) == [])
        d.add("hello")
        d.add("there")
        d.add("duck")
        ws = list(d._words)
        self.assertTrue(len(ws) == 3)
        self.assertTrue("hello" in ws)
        self.assertTrue("there" in ws)
        self.assertTrue("duck" in ws)
        d.remove("duck")
        d.remove("notinthere")
        ws = list(d._words)
        self.assertTrue(len(ws) == 2)
        self.assertTrue("hello" in ws)
        self.assertTrue("there" in ws)

    def test_UnicodeCharsInPath(self):
        """Test that unicode chars in PWL paths are accepted."""
        self._fileName = raw_unicode(r"test_\xe5\xe4\xf6_ing")
        d = request_pwl_dict(self._path())
        self.assertTrue(d)


class TestDocStrings(unittest.TestCase):
    """Test the spelling on all docstrings we can find in this module.
    This serves two purposes - to provide a lot of test data for the
    checker routines, and to make sure we don't suffer the embarrassment
    of having spelling errors in a spellchecking package!
    """

    WORDS = ["spellchecking","utf","dict","unicode","bytestring","bytestrings",
             "str","pyenchant","ascii", "utils","setup","distutils","pkg",
             "filename", "tokenization", "tuple", "tuples", "tokenizer",
             "tokenizers","testcase","testcases","whitespace","wxpython",
             "spellchecker","dialog","urls","wikiwords","enchantobject",
             "providerdesc", "spellcheck", "pwl", "aspell", "myspell",
             "docstring", "docstrings", "stopiteration", "pwls","pypwl",
             "dictwithpwl","skippable","dicts","dict's","filenames",
             "trie","api","ctypes","wxspellcheckerdialog","stateful",
             "cmdlinechecker","spellchecks","callback","clunkier","iterator",
             "ispell","cor","backends"]

    def test_docstrings(self):
        """Test that all our docstrings are error-free."""
        import enchant
        import enchant.utils
        import enchant.pypwl
        import enchant.tokenize
        import enchant.tokenize.en
        import enchant.checker
        import enchant.checker.CmdLineChecker
        try:
            import enchant.checker.GtkSpellCheckerDialog
        except ImportError:
            pass
        try:
            import enchant.checker.wxSpellCheckerDialog
        except ImportError:
            pass
        errors = []
        #  Naive recursion here would blow the stack, instead we
        #  simulate it with our own stack
        tocheck = [enchant]
        checked = []
        while tocheck:
            obj = tocheck.pop()
            checked.append(obj)
            newobjs = list(self._check_docstrings(obj,errors))
            tocheck.extend([obj for obj in newobjs if obj not in checked])
        self.assertEqual(len(errors),0)

    def _check_docstrings(self,obj,errors):
        import enchant
        if hasattr(obj,"__doc__"):
            skip_errors = [w for w in getattr(obj,"_DOC_ERRORS",[])]
            chkr = enchant.checker.SpellChecker("en_AU",obj.__doc__,filters=[enchant.tokenize.URLFilter])
            for err in chkr:
                if len(err.word) == 1:
                    continue
                if err.word.lower() in self.WORDS:
                    continue
                if skip_errors and skip_errors[0] == err.word:
                    skip_errors.pop(0)
                    continue
                errors.append((obj,err.word,err.wordpos))
                msg = "\nDOCSTRING SPELLING ERROR: %s %s %d %s\n" % (obj,err.word,err.wordpos,chkr.suggest())
                printf([msg],file=sys.stderr)
        #  Find and yield all child objects that should be checked
        for name in dir(obj):
            if name.startswith("__"):
                continue
            child = getattr(obj,name)
            if hasattr(child,"__file__"):
                if not hasattr(globals(),"__file__"):
                    continue
                if not child.__file__.startswith(os.path.dirname(__file__)):
                    continue
            else:
                cmod = getattr(child,"__module__",None)
                if not cmod:
                    cclass = getattr(child,"__class__",None)
                    cmod = getattr(cclass,"__module__",None)
                if cmod and not cmod.startswith("enchant"):
                    continue
            yield child


class TestInstallEnv(unittest.TestCase):
    """Run all testcases in a variety of install environments."""

    def setUp(self):
        self._tempDir = self._mkdtemp()
        self._insDir = "build"

    def tearDown(self):
        import shutil
        shutil.rmtree(self._tempDir)

    def _mkdtemp(self):
        import tempfile
        return tempfile.mkdtemp()

    def install(self):
        import os, sys, shutil
        insdir = os.path.join(self._tempDir,self._insDir)
        os.makedirs(insdir)
        shutil.copytree("enchant",os.path.join(insdir,"enchant"))

    def runtests(self):
        import os, sys
        insdir = os.path.join(self._tempDir,self._insDir)
        if str is not unicode and isinstance(insdir,unicode):
            insdir = insdir.encode(sys.getfilesystemencoding())
        os.environ["PYTHONPATH"] = insdir
        script = os.path.join(insdir,"enchant","__init__.py")
        res = runcmd("\"%s\" %s" % (sys.executable,script,))
        self.assertEqual(res,0)

    def test_basic(self):
        """Test proper functioning of TestInstallEnv suite."""
        self.install()
        self.runtests()
    test_basic._DOC_ERRORS = ["TestInstallEnv"]

    def test_UnicodeInstallPath(self):
        """Test installation in a path containing unicode chars."""
        self._insDir = raw_unicode(r'test_\xe5\xe4\xf6_ing')
        self.install()
        self.runtests()


class TestPy2exe(unittest.TestCase):
    """Run all testcases inside a py2exe executable"""
    _DOC_ERRORS = ["py","exe"]

    def setUp(self):
        self._tempDir = self._mkdtemp()

    def tearDown(self):
        import shutil
        shutil.rmtree(self._tempDir)

    def test_py2exe(self):
        """Test pyenchant running inside a py2exe executable."""
        import os, sys, shutil
        from os import path
        from os.path import dirname
        try:
            import py2exe
        except ImportError:
            return
        os.environ["PYTHONPATH"] = dirname(dirname(__file__))
        setup_py = path.join(dirname(__file__),"..","tools","setup.py2exe.py")
        if not path.exists(setup_py):
            return
        buildCmd = '%s %s -q py2exe --dist-dir="%s"'
        buildCmd = buildCmd % (sys.executable,setup_py,self._tempDir)
        res = runcmd(buildCmd)
        self.assertEqual(res,0)
        testCmd = self._tempDir + "\\test_pyenchant.exe"
        self.assertTrue(os.path.exists(testCmd))
        res = runcmd(testCmd)
        self.assertEqual(res,0)
    test_py2exe._DOC_ERRORS = ["py","exe"]

    def _mkdtemp(self):
        import tempfile
        return tempfile.mkdtemp()


def buildtestsuite(recurse=True):
    from enchant.checker.tests import TestChecker
    from enchant.tokenize.tests import TestTokenization, TestFilters
    from enchant.tokenize.tests import TestTokenizeEN
    suite = unittest.TestSuite()
    if recurse:
        suite.addTest(unittest.makeSuite(TestInstallEnv))
        suite.addTest(unittest.makeSuite(TestPy2exe))
    suite.addTest(unittest.makeSuite(TestBroker))
    suite.addTest(unittest.makeSuite(TestDict))
    suite.addTest(unittest.makeSuite(TestPWL))
    suite.addTest(unittest.makeSuite(TestDocStrings))
    suite.addTest(unittest.makeSuite(TestChecker))
    suite.addTest(unittest.makeSuite(TestTokenization))
    suite.addTest(unittest.makeSuite(TestTokenizeEN))
    suite.addTest(unittest.makeSuite(TestFilters))
    return suite


def runtestsuite(recurse=False):
    return unittest.TextTestRunner(verbosity=0).run(buildtestsuite(recurse=recurse))

pyenchant-1.6.5/enchant/__init__.py

# pyenchant
#
# Copyright (C) 2004-2008, Ryan Kelly
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the
# Free Software Foundation, Inc., 59 Temple Place - Suite 330,
# Boston, MA 02111-1307, USA.
#
# In addition, as a special exception, you are
# given permission to link the code of this program with
# non-LGPL Spelling Provider libraries (eg: a MSFT Office
# spell checker backend) and distribute linked combinations including
# the two.  You must obey the GNU Lesser General Public License in all
# respects for all of the code used other than said providers.  If you modify
# this file, you may extend this exception to your version of the
# file, but you are not obligated to do so.  If you do not wish to
# do so, delete this exception statement from your version.
#
"""

    enchant:  Access to the enchant spellchecking library

    This module provides several classes for performing spell checking
    via the Enchant spellchecking library.  For more details on Enchant,
    visit the project website:

        http://www.abisource.com/enchant/

    Spellchecking is performed using 'Dict' objects, which represent
    a language dictionary.  Their use is best demonstrated by a quick
    example:

        >>> import enchant
        >>> d = enchant.Dict("en_US")   # create dictionary for US English
        >>> d.check("enchant")
        True
        >>> d.check("enchnt")
        False
        >>> d.suggest("enchnt")
        ['enchant', 'enchants', 'enchanter', 'penchant', 'incant', 'enchain', 'enchanted']

    Languages are identified by standard string tags such as "en" (English)
    and "fr" (French).  Specific language dialects can be specified by
    including an additional code - for example, "en_AU" refers to Australian
    English.
    The latter form is preferred as it is more widely supported.

    To check whether a dictionary exists for a given language, the function
    'dict_exists' is available.  Dictionaries may also be created using the
    function 'request_dict'.

    A finer degree of control over the dictionaries and how they are created
    can be obtained using one or more 'Broker' objects.  These objects are
    responsible for locating dictionaries for a specific language.

    In Python 2.x, unicode strings are supported transparently in the
    standard manner - if a unicode string is given as an argument, the
    result will be a unicode string.  Note that Enchant works in UTF-8
    internally, so passing an ASCII string to a dictionary for a language
    requiring Unicode may result in UTF-8 strings being returned.

    In Python 3.x unicode strings are expected throughout.  Bytestrings
    should not be passed into any functions.

    Errors that occur in this module are reported by raising subclasses
    of 'Error'.

"""
_DOC_ERRORS = ['enchnt','enchnt', 'fr']

# Make version info available
__ver_major__ = 1
__ver_minor__ = 6
__ver_patch__ = 5
__ver_sub__ = ""
__version__ = "%d.%d.%d%s" % (__ver_major__,__ver_minor__,
                              __ver_patch__,__ver_sub__)

import os

from enchant import _enchant as _e
from enchant.errors import *
from enchant.utils import EnchantStr, get_default_language
from enchant.pypwl import PyPWL

#  Due to the unfortunate name collision between the enchant "tokenize" module
#  and the stdlib "tokenize" module, certain values of sys.path can cause
#  the former to override the latter and break the "warnings" module.
#  This hacks around it by making a dummy "warnings" module.
try:
    import warnings
except ImportError:
    class warnings(object):
        def warn(self,*args,**kwds):
            pass
    warnings = warnings()


class ProviderDesc(object):
    """Simple class describing an Enchant provider.

    Each provider has the following information associated with it:

        * name:        Internal provider name (e.g. "aspell")
        * desc:        Human-readable description (e.g. "Aspell Provider")
        * file:        Location of the library containing the provider

    """
    _DOC_ERRORS = ["desc"]

    def __init__(self,name,desc,file):
        self.name = name
        self.desc = desc
        self.file = file

    def __str__(self):
        return "<Enchant: %s>" % self.desc

    def __repr__(self):
        return str(self)

    def __eq__(self,pd):
        """Equality operator on ProviderDesc objects."""
        return (self.name == pd.name and \
                self.desc == pd.desc and \
                self.file == pd.file)

    def __hash__(self):
        """Hash operator on ProviderDesc objects."""
        return hash(self.name + self.desc + self.file)


class _EnchantObject(object):
    """Base class for enchant objects.

    This class implements some general functionality for interfacing with
    the '_enchant' C-library in a consistent way.  All public objects
    from the 'enchant' module are subclasses of this class.

    All enchant objects have an attribute '_this' which contains the
    pointer to the underlying C-library object.  The method '_check_this'
    can be called to ensure that this pointer is not None, raising an
    exception if it is.
    """

    def __init__(self):
        """_EnchantObject constructor."""
        self._this = None

    def _check_this(self,msg=None):
        """Check that self._this is set to a pointer, rather than None."""
        if msg is None:
            msg = "%s unusable: the underlying C-library object has been freed."
            msg = msg % (self.__class__.__name__,)
        if self._this is None:
            raise Error(msg)

    def _raise_error(self,default="Unspecified Error",eclass=Error):
        """Raise an exception based on available error messages.

        This method causes an Error to be raised.  Subclasses should
        override it to retrieve an error indication from the underlying
        API if possible.  If such a message cannot be retrieved, the
        argument value <default> is used.  The class of the exception
        can be specified using the argument <eclass>.
        """
        raise eclass(default)
    _raise_error._DOC_ERRORS = ["eclass"]


class Broker(_EnchantObject):
    """Broker object for the Enchant spellchecker.

    Broker objects are responsible for locating and managing dictionaries.
    Unless custom functionality is required, there is no need to use Broker
    objects directly.  The 'enchant' module provides a default broker object
    so that 'Dict' objects can be created directly.

    The most important methods of this class include:

        * dict_exists:   check existence of a specific language dictionary
        * request_dict:  obtain a dictionary for a specific language
        * set_ordering:  specify which dictionaries to try for a
                         given language.

    """

    def __init__(self):
        """Broker object constructor.

        This method is the constructor for the 'Broker' object.  No
        arguments are required.
        """
        _EnchantObject.__init__(self)
        self._this = _e.broker_init()
        if not self._this:
            raise Error("Could not initialise an enchant broker.")

    def __del__(self):
        """Broker object destructor."""
        if _e is not None:
            self._free()

    def _raise_error(self,default="Unspecified Error",eclass=Error):
        """Overrides _EnchantObject._raise_error to check broker errors."""
        err = _e.broker_get_error(self._this)
        if err == "" or err is None:
            raise eclass(default)
        raise eclass(err)

    def _free(self):
        """Free system resource associated with a Broker object.

        This method can be called to free the underlying system resources
        associated with a Broker object.  It is called automatically when
        the object is garbage collected.  If called explicitly, the
        Broker and any associated Dict objects must no longer be used.
        """
        if self._this is not None:
            _e.broker_free(self._this)
            self._this = None

    def request_dict(self,tag=None):
        """Request a Dict object for the language specified by <tag>.

        This method constructs and returns a Dict object for the
        requested language.  'tag' should be a string of the appropriate
        form for specifying a language, such as "fr" (French) or "en_AU"
        (Australian English).  The existence of a specific language can
        be tested using the 'dict_exists' method.

        If <tag> is not given or is None, an attempt is made to determine
        the current language in use.  If this cannot be determined, Error
        is raised.

        NOTE:  this method is functionally equivalent to calling the Dict()
               constructor and passing in the <broker> argument.

        """
        return Dict(tag,self)
    request_dict._DOC_ERRORS = ["fr"]

    def _request_dict_data(self,tag):
        """Request raw C pointer data for a dictionary.

        This method passes the call on to the C library, and does
        some internal bookkeeping.
        """
        self._check_this()
        tag = EnchantStr(tag)
        new_dict = _e.broker_request_dict(self._this,tag.encode())
        if new_dict is None:
            eStr = "Dictionary for language '%s' could not be found"
            self._raise_error(eStr % (tag,),DictNotFoundError)
        return new_dict

    def request_pwl_dict(self,pwl):
        """Request a Dict object for a personal word list.

        This method behaves as 'request_dict' but rather than returning
        a dictionary for a specific language, it returns a dictionary
        referencing a personal word list.  A personal word list is a file
        of custom dictionary entries, one word per line.
        """
        self._check_this()
        pwl = EnchantStr(pwl)
        new_dict = _e.broker_request_pwl_dict(self._this,pwl.encode())
        if new_dict is None:
            eStr = "Personal Word List file '%s' could not be loaded"
            self._raise_error(eStr % (pwl,))
        d = Dict(False)
        d._switch_this(new_dict,self)
        return d

    def _free_dict(self,dict):
        """Free memory associated with a dictionary.

        This method frees system resources associated with a Dict object.
        It is equivalent to calling the object's 'free' method.  Once this
        method has been called on a dictionary, it must not be used again.
        """
        self._check_this()
        _e.broker_free_dict(self._this,dict._this)
        dict._this = None
        dict._broker = None

    def dict_exists(self,tag):
        """Check availability of a dictionary.

        This method checks whether there is a dictionary available for
        the language specified by 'tag'.  It returns True if a dictionary
        is available, and False otherwise.
        """
        self._check_this()
        tag = EnchantStr(tag)
        val = _e.broker_dict_exists(self._this,tag.encode())
        return bool(val)

    def set_ordering(self,tag,ordering):
        """Set dictionary preferences for a language.
        The Enchant library supports the use of multiple dictionary programs
        and multiple languages.  This method specifies which dictionaries
        the broker should prefer when dealing with a given language.  'tag'
        must be an appropriate language specification and 'ordering' is a
        string listing the dictionaries in order of preference.  For example
        a valid ordering might be "aspell,myspell,ispell".
        The value of 'tag' can also be set to "*" to set a default ordering
        for all languages for which one has not been set explicitly.
        """
        self._check_this()
        tag = EnchantStr(tag)
        ordering = EnchantStr(ordering)
        _e.broker_set_ordering(self._this,tag.encode(),ordering.encode())

    def describe(self):
        """Return list of provider descriptions.

        This method returns a list of descriptions of each of the
        dictionary providers available.  Each entry in the list is a
        ProviderDesc object.
        """
        self._check_this()
        self.__describe_result = []
        _e.broker_describe(self._this,self.__describe_callback)
        return [ ProviderDesc(*r) for r in self.__describe_result]

    def __describe_callback(self,name,desc,file):
        """Collector callback for dictionary description.

        This method is used as a callback into the _enchant function
        'enchant_broker_describe'.  It collects the given arguments in
        a tuple and appends them to the list '__describe_result'.
        """
        s = EnchantStr("")
        name = s.decode(name)
        desc = s.decode(desc)
        file = s.decode(file)
        self.__describe_result.append((name,desc,file))

    def list_dicts(self):
        """Return list of available dictionaries.

        This method returns a list of dictionaries available to the
        broker.  Each entry in the list is a two-tuple of the form:

            (tag,provider)

        where <tag> is the language tag for the dictionary and
        <provider> is a ProviderDesc object describing the provider
        through which that dictionary can be obtained.
""" self._check_this() self.__list_dicts_result = [] _e.broker_list_dicts(self._this,self.__list_dicts_callback) return [ (r[0],ProviderDesc(*r[1])) for r in self.__list_dicts_result] def __list_dicts_callback(self,tag,name,desc,file): """Collector callback for listing dictionaries. This method is used as a callback into the _enchant function 'enchant_broker_list_dicts'. It collects the given arguments into an appropriate tuple and appends them to '__list_dicts_result'. """ s = EnchantStr("") tag = s.decode(tag) name = s.decode(name) desc = s.decode(desc) file = s.decode(file) self.__list_dicts_result.append((tag,(name,desc,file))) def list_languages(self): """List languages for which dictionaries are available. This function returns a list of language tags for which a dictionary is available. """ langs = [] for (tag,prov) in self.list_dicts(): if tag not in langs: langs.append(tag) return langs def __describe_dict(self,dict_data): """Get the description tuple for a dict data object. must be a C-library pointer to an enchant dictionary. The return value is a tuple of the form: (,,,) """ # Define local callback function cb_result = [] def cb_func(tag,name,desc,file): s = EnchantStr("") tag = s.decode(tag) name = s.decode(name) desc = s.decode(desc) file = s.decode(file) cb_result.append((tag,name,desc,file)) # Actually call the describer function _e.dict_describe(dict_data,cb_func) return cb_result[0] __describe_dict._DOC_ERRORS = ["desc"] def get_param(self,name): """Get the value of a named parameter on this broker. Parameters are used to provide runtime information to individual provider backends. See the method 'set_param' for more details. """ name = EnchantStr(name) return name.decode(_e.broker_get_param(self._this,name.encode())) get_param._DOC_ERRORS = ["param"] def set_param(self,name,value): """Set the value of a named parameter on this broker. Parameters are used to provide runtime information to individual provider backends. 
        For example, the myspell provider will search any directories given
        in the "enchant.myspell.dictionary.path" parameter when looking for
        its dictionary files.
        """
        name = EnchantStr(name)
        value = EnchantStr(value)
        _e.broker_set_param(self._this,name.encode(),value.encode())


class Dict(_EnchantObject):
    """Dictionary object for the Enchant spellchecker.

    Dictionary objects are responsible for checking the spelling of words
    and suggesting possible corrections.  Each dictionary is owned by a
    Broker object, but unless a new Broker has explicitly been created
    then this will be the 'enchant' module default Broker and is of little
    interest.

    The important methods of this class include:

        * check():              check whether a word is spelled correctly
        * suggest():            suggest correct spellings for a word
        * add():                add a word to the user's personal dictionary
        * remove():             add a word to the user's personal exclude list
        * add_to_session():     add a word to the current spellcheck session
        * store_replacement():  indicate a replacement for a given word

    Information about the dictionary is available using the following
    attributes:

        * tag:        the language tag of the dictionary
        * provider:   a ProviderDesc object for the dictionary provider

    """

    def __init__(self,tag=None,broker=None):
        """Dict object constructor.

        A dictionary belongs to a specific language, identified by the
        string <tag>.  If the tag is not given or is None, an attempt to
        determine the language currently in use is made using the 'locale'
        module.  If the current language cannot be determined, Error is
        raised.

        If <tag> is instead given the value of False, a 'dead' Dict object
        is created without any reference to a language.  This is typically
        only useful within PyEnchant itself.  Any other non-string value
        for <tag> raises Error.

        Each dictionary must also have an associated Broker object which
        obtains the dictionary information from the underlying system. This
        may be specified using <broker>.  If not given, the default broker
        is used.
""" # Superclass initialisation _EnchantObject.__init__(self) # Initialise object attributes to None self._broker = None self.tag = None self.provider = None # Create dead object if False was given if tag is False: self._this = None else: if tag is None: tag = get_default_language() if tag is None: err = "No tag specified and default language could not " err = err + "be determined." raise Error(err) # Use module-level broker if none given if broker is None: broker = _broker # Use the broker to get C-library pointer data self._switch_this(broker._request_dict_data(tag),broker) def __del__(self): """Dict object destructor.""" # Calling free() might fail if python is shutting down try: self._free() except AttributeError: pass def _switch_this(self,this,broker): """Switch the underlying C-library pointer for this object. As all useful state for a Dict is stored by the underlying C-library pointer, it is very convenient to allow this to be switched at run-time. Pass a new dict data object into this method to affect the necessary changes. The creating Broker object (at the Python level) must also be provided. This should *never* *ever* be used by application code. It's a convenience for developers only, replacing the clunkier parameter to __init__ from earlier versions. """ # Free old dict data Dict._free(self) # Hook in the new stuff self._this = this self._broker = broker # Update object properties desc = self.__describe(check_this=False) self.tag = desc[0] self.provider = ProviderDesc(*desc[1:]) _switch_this._DOC_ERRORS = ["init"] def _check_this(self,msg=None): """Extend _EnchantObject._check_this() to check Broker validity. It is possible for the managing Broker object to be freed without freeing the Dict. Thus validity checking must take into account self._broker._this as well as self._this. 
""" if self._broker is None or self._broker._this is None: self._this = None _EnchantObject._check_this(self,msg) def _raise_error(self,default="Unspecified Error",eclass=Error): """Overrides _EnchantObject._raise_error to check dict errors.""" err = _e.dict_get_error(self._this) if err == "" or err is None: raise eclass(default) raise eclass(err) def _free(self): """Free the system resources associated with a Dict object. This method frees underlying system resources for a Dict object. Once it has been called, the Dict object must no longer be used. It is called automatically when the object is garbage collected. """ if self._broker is not None: self._broker._free_dict(self) def check(self,word): """Check spelling of a word. This method takes a word in the dictionary language and returns True if it is correctly spelled, and false otherwise. """ self._check_this() word = EnchantStr(word) val = _e.dict_check(self._this,word.encode()) if val == 0: return True if val > 0: return False self._raise_error() def suggest(self,word): """Suggest possible spellings for a word. This method tries to guess the correct spelling for a given word, returning the possibilities in a list. 
""" self._check_this() word = EnchantStr(word) suggs = _e.dict_suggest(self._this,word.encode()) return [word.decode(w) for w in suggs] def add(self,word): """Add a word to the user's personal word list.""" self._check_this() word = EnchantStr(word) _e.dict_add(self._this,word.encode()) def remove(self,word): """Add a word to the user's personal exclude list.""" self._check_this() word = EnchantStr(word) _e.dict_remove(self._this,word.encode()) def add_to_pwl(self,word): """Add a word to the user's personal word list.""" warnings.warn("Dict.add_to_pwl is deprecated, please use Dict.add",category=DeprecationWarning) self._check_this() word = EnchantStr(word) _e.dict_add_to_pwl(self._this,word.encode()) def add_to_session(self,word): """Add a word to the session personal list.""" self._check_this() word = EnchantStr(word) _e.dict_add_to_session(self._this,word.encode()) def remove_from_session(self,word): """Add a word to the session exclude list.""" self._check_this() word = EnchantStr(word) _e.dict_remove_from_session(self._this,word.encode()) def is_added(self,word): """Check whether a word is in the personal word list.""" self._check_this() word = EnchantStr(word) return _e.dict_is_added(self._this,word.encode()) def is_removed(self,word): """Check whether a word is in the personal exclude list.""" self._check_this() word = EnchantStr(word) return _e.dict_is_removed(self._this,word.encode()) def is_in_session(self,word): """Check whether a word is in the session list.""" warnings.warn("Dict.is_in_session is deprecated, please use Dict.is_added",category=DeprecationWarning) self._check_this() word = EnchantStr(word) return _e.dict_is_in_session(self._this,word.encode()) def store_replacement(self,mis,cor): """Store a replacement spelling for a miss-spelled word. This method makes a suggestion to the spellchecking engine that the miss-spelled word is in fact correctly spelled as . 
Such a suggestion will typically mean that appears early in the list of suggested spellings offered for later instances of . """ self._check_this() mis = EnchantStr(mis) cor = EnchantStr(cor) _e.dict_store_replacement(self._this,mis.encode(),cor.encode()) store_replacement._DOC_ERRORS = ["mis","mis"] def __describe(self,check_this=True): """Return a tuple describing the dictionary. This method returns a four-element tuple describing the underlying spellchecker system providing the dictionary. It will contain the following strings: * language tag * name of dictionary provider * description of dictionary provider * dictionary file Direct use of this method is not recommended - instead, access this information through the 'tag' and 'provider' attributes. """ if check_this: self._check_this() _e.dict_describe(self._this,self.__describe_callback) return self.__describe_result def __describe_callback(self,tag,name,desc,file): """Collector callback for dictionary description. This method is used as a callback into the _enchant function 'enchant_dict_describe'. It collects the given arguments in a tuple and stores them in the attribute '__describe_result'. """ s = EnchantStr("") tag = s.decode(tag) name = s.decode(name) desc = s.decode(desc) file = s.decode(file) self.__describe_result = (tag,name,desc,file) class DictWithPWL(Dict): """Dictionary with separately-managed personal word list. NOTE: As of version 1.4.0, enchant manages a per-user pwl and exclude list. This class is now only needed if you want to explicitly maintain a separate word list in addition to the default one. This class behaves as the standard Dict class, but also manages a personal word list stored in a separate file. The file must be specified at creation time by the 'pwl' argument to the constructor. Words added to the dictionary are automatically appended to the pwl file. A personal exclude list can also be managed, by passing another filename to the constructor in the optional 'pel' argument. 
If this is not given, requests to exclude words are ignored. If either 'pwl' or 'pel' are None, an in-memory word list is used. This will prevent calls to add() and remove() from affecting the user's default word lists. The Dict object managing the PWL is available as the 'pwl' attribute. The Dict object managing the PEL is available as the 'pel' attribute. To create a DictWithPWL from the user's default language, use None as the 'tag' argument. """ _DOC_ERRORS = ["pel","pel","PEL","pel"] def __init__(self,tag,pwl=None,pel=None,broker=None): """DictWithPWL constructor. The argument 'pwl', if not None, names a file containing the personal word list. If this file does not exist, it is created with default permissions. The argument 'pel', if not None, names a file containing the personal exclude list. If this file does not exist, it is created with default permissions. """ Dict.__init__(self,tag,broker) if pwl is not None: if not os.path.exists(pwl): f = open(pwl,"wt") f.close() del f self.pwl = self._broker.request_pwl_dict(pwl) else: self.pwl = PyPWL() if pel is not None: if not os.path.exists(pel): f = open(pel,"wt") f.close() del f self.pel = self._broker.request_pwl_dict(pel) else: self.pel = PyPWL() def _check_this(self,msg=None): """Extend Dict._check_this() to check PWL validity.""" if self.pwl is None: self._free() if self.pel is None: self._free() Dict._check_this(self,msg) self.pwl._check_this(msg) self.pel._check_this(msg) def _free(self): """Extend Dict._free() to free the PWL as well.""" if self.pwl is not None: self.pwl._free() self.pwl = None if self.pel is not None: self.pel._free() self.pel = None Dict._free(self) def check(self,word): """Check spelling of a word. This method takes a word in the dictionary language and returns True if it is correctly spelled, and false otherwise. It checks both the dictionary and the personal word list. 
""" if self.pel.check(word): return False if self.pwl.check(word): return True if Dict.check(self,word): return True return False def suggest(self,word): """Suggest possible spellings for a word. This method tries to guess the correct spelling for a given word, returning the possibilities in a list. """ suggs = Dict.suggest(self,word) suggs.extend([w for w in self.pwl.suggest(word) if w not in suggs]) for i in range(len(suggs)-1,-1,-1): if self.pel.check(suggs[i]): del suggs[i] return suggs def add(self,word): """Add a word to the associated personal word list. This method adds the given word to the personal word list, and automatically saves the list to disk. """ self._check_this() self.pwl.add(word) self.pel.remove(word) def remove(self,word): """Add a word to the associated exclude list.""" self._check_this() self.pwl.remove(word) self.pel.add(word) def add_to_pwl(self,word): """Add a word to the associated personal word list. This method adds the given word to the personal word list, and automatically saves the list to disk. """ self._check_this() self.pwl.add_to_pwl(word) self.pel.remove(word) def is_added(self,word): """Check whether a word is in the personal word list.""" self._check_this() return self.pwl.is_added(word) def is_removed(self,word): """Check whether a word is in the personal exclude list.""" self._check_this() return self.pel.is_added(word) ## Create a module-level default broker object, and make its important ## methods available at the module level. _broker = Broker() request_dict = _broker.request_dict request_pwl_dict = _broker.request_pwl_dict dict_exists = _broker.dict_exists list_dicts = _broker.list_dicts list_languages = _broker.list_languages get_param = _broker.get_param set_param = _broker.set_param # Expose the "get_version" function. 
def get_enchant_version():
    """Get the version string for the underlying enchant library."""
    return _e.get_version()


# Run unit tests when called from command-line
if __name__ == "__main__":
    import sys
    import enchant.tests
    res = enchant.tests.runtestsuite()
    if len(res.errors) > 0 or len(res.failures) > 0:
        sys.exit(1)
    sys.exit(0)

pyenchant-1.6.5/enchant/share/
pyenchant-1.6.5/enchant/share/enchant/
pyenchant-1.6.5/enchant/share/enchant/myspell/
pyenchant-1.6.5/enchant/share/enchant/myspell/README.txt

This directory contains dictionaries for the myspell backend to enchant.

pyenchant-1.6.5/enchant/share/enchant/ispell/
pyenchant-1.6.5/enchant/share/enchant/ispell/README.txt
pyenchant-1.6.5/enchant/share/enchant/README.txt

This directory contains dictionary files for Enchant when installed on a
Microsoft Windows system.  Each subdirectory contains dictionaries for
a particular spellchecking system.

pyenchant-1.6.5/enchant/_enchant.py
# pyenchant
#
# Copyright (C) 2004-2008, Ryan Kelly
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
# # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the # Free Software Foundation, Inc., 59 Temple Place - Suite 330, # Boston, MA 02111-1307, USA. # # In addition, as a special exception, you are # given permission to link the code of this program with # non-LGPL Spelling Provider libraries (eg: a MSFT Office # spell checker backend) and distribute linked combinations including # the two. You must obey the GNU Lesser General Public License in all # respects for all of the code used other than said providers. If you modify # this file, you may extend this exception to your version of the # file, but you are not obligated to do so. If you do not wish to # do so, delete this exception statement from your version. # """ enchant._enchant: ctypes-based wrapper for enchant C library This module implements the low-level interface to the underlying C library for enchant. The interface is based on ctypes and tries to do as little as possible while making the higher-level components easier to write. The following conveniences are provided that differ from the underlying C API: * the "enchant" prefix has been removed from all functions, since python has a proper module system * callback functions do not take a user_data argument, since python has proper closures that can manage this internally * string lengths are not passed into functions such as dict_check, since python strings know how long they are """ import sys, os, os.path from ctypes import * from ctypes.util import find_library from enchant import utils from enchant.errors import * from enchant.utils import unicode # Locate and load the enchant dll. # We've got several options based on the host platform. 
e = None def _e_path_possibilities(): """Generator yielding possible locations of the enchant library.""" yield os.environ.get("PYENCHANT_LIBRARY_PATH") yield find_library("enchant") yield find_library("libenchant") yield find_library("libenchant-1") if sys.platform == 'darwin': # enchant lib installed by macports yield "/opt/local/lib/libenchant.dylib" # On win32 we ship a bundled version of the enchant DLLs. # Use them if they're present. if sys.platform == "win32": e_path = None try: e_path = utils.get_resource_filename("libenchant.dll") except (Error,ImportError): try: e_path = utils.get_resource_filename("libenchant-1.dll") except (Error,ImportError): pass if e_path is not None: # We need to use LoadLibraryEx with LOAD_WITH_ALTERED_SEARCH_PATH so # that we don't accidentally suck in other versions of e.g. glib. if not isinstance(e_path,unicode): e_path = unicode(e_path,sys.getfilesystemencoding()) LoadLibraryEx = windll.kernel32.LoadLibraryExW LOAD_WITH_ALTERED_SEARCH_PATH = 0x00000008 e_handle = LoadLibraryEx(e_path,None,LOAD_WITH_ALTERED_SEARCH_PATH) if not e_handle: raise WinError() e = CDLL(e_path,handle=e_handle) # On darwin we ship a bundled version of the enchant DLLs. # Use them if they're present. if e is None and sys.platform == "darwin": try: e_path = utils.get_resource_filename("lib/libenchant.1.dylib") except (Error,ImportError): pass else: # Enchant doesn't natively support relocatable binaries on OSX. # We fake it by patching the enchant source to expose a char**, which # we can write the runtime path into ourelves. e = CDLL(e_path) try: e_dir = os.path.dirname(os.path.dirname(e_path)) prefix_dir = POINTER(c_char_p).in_dll(e,"enchant_prefix_dir_p") prefix_dir.contents = c_char_p(e_dir) except AttributeError: e = None # Not found yet, search various standard system locations. 
if e is None:
    for e_path in _e_path_possibilities():
        if e_path is not None:
            try:
                e = cdll.LoadLibrary(e_path)
            except OSError:
                pass
            else:
                break

# No usable enchant install was found :-(
if e is None:
    raise ImportError("enchant C library not found")


# Define various callback function types

t_broker_desc_func = CFUNCTYPE(None,c_char_p,c_char_p,c_char_p,c_void_p)
t_dict_desc_func = CFUNCTYPE(None,c_char_p,c_char_p,c_char_p,c_char_p,c_void_p)


# Simple typedefs for readability

t_broker = c_void_p
t_dict = c_void_p


# Now we can define the types of each function we are going to use

broker_init = e.enchant_broker_init
broker_init.argtypes = []
broker_init.restype = t_broker

broker_free = e.enchant_broker_free
broker_free.argtypes = [t_broker]
broker_free.restype = None

broker_request_dict = e.enchant_broker_request_dict
broker_request_dict.argtypes = [t_broker,c_char_p]
broker_request_dict.restype = t_dict

broker_request_pwl_dict = e.enchant_broker_request_pwl_dict
broker_request_pwl_dict.argtypes = [t_broker,c_char_p]
broker_request_pwl_dict.restype = t_dict

broker_free_dict = e.enchant_broker_free_dict
broker_free_dict.argtypes = [t_broker,t_dict]
broker_free_dict.restype = None

broker_dict_exists = e.enchant_broker_dict_exists
broker_dict_exists.argtypes = [t_broker,c_char_p]
broker_dict_exists.restype = c_int

broker_set_ordering = e.enchant_broker_set_ordering
broker_set_ordering.argtypes = [t_broker,c_char_p,c_char_p]
broker_set_ordering.restype = None

broker_get_error = e.enchant_broker_get_error
broker_get_error.argtypes = [t_broker]
broker_get_error.restype = c_char_p

broker_describe1 = e.enchant_broker_describe
broker_describe1.argtypes = [t_broker,t_broker_desc_func,c_void_p]
broker_describe1.restype = None
def broker_describe(broker,cbfunc):
    # Drop the trailing user_data argument before calling the Python callback
    def cbfunc1(*args):
        cbfunc(*args[:-1])
    broker_describe1(broker,t_broker_desc_func(cbfunc1),None)

broker_list_dicts1 = e.enchant_broker_list_dicts
broker_list_dicts1.argtypes = [t_broker,t_dict_desc_func,c_void_p]
broker_list_dicts1.restype = None
def broker_list_dicts(broker,cbfunc):
    # Drop the trailing user_data argument before calling the Python callback
    def cbfunc1(*args):
        cbfunc(*args[:-1])
    broker_list_dicts1(broker,t_dict_desc_func(cbfunc1),None)

try:
    broker_get_param = e.enchant_broker_get_param
except AttributeError:
    #  Make the lookup error occur at runtime
    def broker_get_param(broker,param_name):
        return e.enchant_broker_get_param(broker,param_name)
else:
    broker_get_param.argtypes = [t_broker,c_char_p]
    broker_get_param.restype = c_char_p

try:
    broker_set_param = e.enchant_broker_set_param
except AttributeError:
    #  Make the lookup error occur at runtime.
    #  The fallback accepts the same three arguments as the real function,
    #  so callers get the AttributeError rather than a TypeError.
    def broker_set_param(broker,param_name,param_value):
        return e.enchant_broker_set_param(broker,param_name,param_value)
else:
    broker_set_param.argtypes = [t_broker,c_char_p,c_char_p]
    broker_set_param.restype = None

try:
    get_version = e.enchant_get_version
except AttributeError:
    #  Make the lookup error occur at runtime
    def get_version():
        return e.enchant_get_version()
else:
    get_version.argtypes = []
    get_version.restype = c_char_p


dict_check1 = e.enchant_dict_check
dict_check1.argtypes = [t_dict,c_char_p,c_size_t]
dict_check1.restype = c_int
def dict_check(dict,word):
    return dict_check1(dict,word,len(word))

dict_suggest1 = e.enchant_dict_suggest
dict_suggest1.argtypes = [t_dict,c_char_p,c_size_t,POINTER(c_size_t)]
dict_suggest1.restype = POINTER(c_char_p)
def dict_suggest(dict,word):
    numSuggsP = pointer(c_size_t(0))
    suggs_c = dict_suggest1(dict,word,len(word),numSuggsP)
    # Copy the suggestions into a Python list before freeing the C array
    suggs = []
    n = 0
    while n < numSuggsP.contents.value:
        suggs.append(suggs_c[n])
        n = n + 1
    if numSuggsP.contents.value > 0:
        dict_free_string_list(dict,suggs_c)
    return suggs

dict_add1 = e.enchant_dict_add
dict_add1.argtypes = [t_dict,c_char_p,c_size_t]
dict_add1.restype = None
def dict_add(dict,word):
    return dict_add1(dict,word,len(word))

dict_add_to_pwl1 = e.enchant_dict_add
dict_add_to_pwl1.argtypes = [t_dict,c_char_p,c_size_t]
dict_add_to_pwl1.restype = None
def dict_add_to_pwl(dict,word):
    return dict_add_to_pwl1(dict,word,len(word))

dict_add_to_session1 =
e.enchant_dict_add_to_session dict_add_to_session1.argtypes = [t_dict,c_char_p,c_size_t] dict_add_to_session1.restype = None def dict_add_to_session(dict,word): return dict_add_to_session1(dict,word,len(word)) dict_remove1 = e.enchant_dict_remove dict_remove1.argtypes = [t_dict,c_char_p,c_size_t] dict_remove1.restype = None def dict_remove(dict,word): return dict_remove1(dict,word,len(word)) dict_remove_from_session1 = e.enchant_dict_remove_from_session dict_remove_from_session1.argtypes = [t_dict,c_char_p,c_size_t] dict_remove_from_session1.restype = c_int def dict_remove_from_session(dict,word): return dict_remove_from_session1(dict,word,len(word)) dict_is_added1 = e.enchant_dict_is_added dict_is_added1.argtypes = [t_dict,c_char_p,c_size_t] dict_is_added1.restype = c_int def dict_is_added(dict,word): return dict_is_added1(dict,word,len(word)) dict_is_removed1 = e.enchant_dict_is_removed dict_is_removed1.argtypes = [t_dict,c_char_p,c_size_t] dict_is_removed1.restype = c_int def dict_is_removed(dict,word): return dict_is_removed1(dict,word,len(word)) dict_is_in_session1 = e.enchant_dict_is_in_session dict_is_in_session1.argtypes = [t_dict,c_char_p,c_size_t] dict_is_in_session1.restype = c_int def dict_is_in_session(dict,word): return dict_is_in_session1(dict,word,len(word)) dict_store_replacement1 = e.enchant_dict_store_replacement dict_store_replacement1.argtypes = [t_dict,c_char_p,c_size_t,c_char_p,c_size_t] dict_store_replacement1.restype = None def dict_store_replacement(dict,mis,cor): return dict_store_replacement1(dict,mis,len(mis),cor,len(cor)) dict_free_string_list = e.enchant_dict_free_string_list dict_free_string_list.argtypes = [t_dict,POINTER(c_char_p)] dict_free_string_list.restype = None dict_get_error = e.enchant_dict_get_error dict_get_error.argtypes = [t_dict] dict_get_error.restype = c_char_p dict_describe1 = e.enchant_dict_describe dict_describe1.argtypes = [t_dict,t_dict_desc_func,c_void_p] dict_describe1.restype = None def 
dict_describe(dict,cbfunc): def cbfunc1(*args): cbfunc(*args[:-1]) dict_describe1(dict,t_dict_desc_func(cbfunc1),None) pyenchant-1.6.5/MANIFEST.in0000644000175000017500000000022111500762627013401 0ustar rfkrfk include *.txt include tools/*.py include distribute_setup.py prune tools/pyenchant-bdist-win32-sources prune tools/pyenchant-bdist-osx-sources pyenchant-1.6.5/tools/0000755000175000017500000000000011501654510013000 5ustar rfkrfkpyenchant-1.6.5/tools/shootout.py0000644000175000017500000001155511501534256015251 0ustar rfkrfk#!python # # Written by Ryan Kelly, 2005. This script is placed in the public domain. # # Arrange a short shootout to determine the best spellchecker of them all!! # # This script runs a batch of tests against each enchant spellchecker # provider, collecting statistics as it goes. The tests are read from # a text file, one per line, of the format " " where # is the misspelled word and the correct spelling. Each must be # a single word. # # The statistics printed at the end of the run are: # # EXISTED: percentage of correct words which the provider # reported as being correct # # SUGGESTED: percentage of misspelled words for which the correct # spelling was suggested # # SUGGP: percentage of misspelled words whose correct spelling # existed, for which the correct spelling was suggested # (this is simply 100*SUGGESTED/EXISTED) # # FIRST: percentage of misspelled words for which the correct # spelling was the first suggested correction. 
#
#    FIRST5:   percentage of misspelled words for which the correct
#              spelling was in the first five suggested corrections
#
#    FIRST10:  percentage of misspelled words for which the correct
#              spelling was in the first ten suggested corrections
#
#    AVERAGE DIST TO CORRECTION:  the average location of the correct
#                                 spelling within the suggestions list,
#                                 over those words for which the correct
#                                 spelling was found
#

import enchant
import enchant.utils

# List of providers to test
# Providers can also be named "pypwl:" followed by the encoding
# function to use for PyPWL.  All PyPWL instances will use the
# file named by 'wordsfile' as their word list
providers = ("aspell","pypwl",)

# File containing test cases, and the language they are in
# A suitable file can be found at http://aspell.net/test/batch0.tab
datafile = "batch0.tab"
lang = "en_US"
#wordsfile = "/usr/share/dict/words"
wordsfile = "words"

# Corrections to make to the 'correct' words in the tests
# This is so we can use unmodified tests published by third parties
corrections = (("caesar","Caesar"),("confucianism","Confucianism"),("february","February"),("gandhi","Gandhi"),("muslims","Muslims"),("israel","Israel"))

# List of dictionary objects to test
dicts = []
# Number of correct words missed by each dictionary
missed = []
# Number of corrections not suggested by each dictionary
incorrect = []
# Number of places to find correct suggestion, or -1 if not found
dists = []

# Create each dictionary object
for prov in providers:
    if prov == "pypwl":
        d = enchant.request_pwl_dict(wordsfile)
    else:
        b = enchant.Broker()
        b.set_ordering(lang,prov)
        d = b.request_dict(lang)
        if not d.provider.name == prov:
            raise RuntimeError("Provider '%s' has no dictionary for '%s'"%(prov,lang))
        del b
    dicts.append(d)
    missed.append([])
    incorrect.append([])
    dists.append([])

# Actually run the tests
testcases = open(datafile,"r")
testnum = 0
for testcase in testcases:
    # Skip lines starting with "#"
    if testcase[0] == "#":
        continue
    # Split into words
    words = testcase.split()
    # Skip blank lines and tests that have multi-word corrections
    if len(words) != 2:
        continue
    mis, cor = words
    # Make any custom corrections
    for (old,new) in corrections:
        if old == cor:
            cor = new
            break
    # Actually do the test
    testnum += 1
    print("TEST %d : %s -> %s" % (testnum, mis, cor))
    for dictnum,dct in enumerate(dicts):
        # Check whether it contains the correct word
        if not dct.check(cor):
            missed[dictnum].append(cor)
        # Check on the suggestions provided
        suggs = dct.suggest(mis)
        if cor not in suggs:
            incorrect[dictnum].append((mis,cor))
            dists[dictnum].append(-1)
        else:
            dists[dictnum].append(suggs.index(cor))
testcases.close()
numtests = testnum

# Print a report for each provider
for pnum,prov in enumerate(providers):
    print("=======================================")
    exdists = [d for d in dists[pnum] if d >= 0]
    print("PROVIDER: %s" % (prov,))
    print(" EXISTED: %.1f"%(((numtests - len(missed[pnum]))*100.0)/numtests,))
    print(" SUGGESTED: %.1f"%((len(exdists)*100.0)/numtests,))
    print(" SUGGP: %.1f"%((len(exdists)*100.0)/(numtests - len(missed[pnum])),))
    print(" FIRST: %.1f"%((len([d for d in exdists if d == 0])*100.0)/numtests,))
    print(" FIRST5: %.1f"%((len([d for d in exdists if d < 5])*100.0)/numtests,))
    print(" FIRST10: %.1f"%((len([d for d in exdists if d < 10])*100.0)/numtests,))
    print(" AVERAGE DIST TO CORRECTION: %.2f" % (float(sum(exdists))/len(exdists),))
print("=======================================")

pyenchant-1.6.5/tools/wx_example.py0000644000175000017500000000135211501534256015530 0ustar rfkrfk

import wx

from enchant.checker import SpellChecker
from enchant.checker.wxSpellCheckerDialog import wxSpellCheckerDialog

# Retrieve the text to be checked
text = "this is some smple text with a few erors in it"
print("[INITIAL TEXT:] %s" % (text,))

# Need to have an App before any windows will be shown
app = wx.PySimpleApp()

# Construct the dialog, and the SpellChecker it is to use
dlg = wxSpellCheckerDialog(None)
chkr = SpellChecker("en_US",text)
dlg.SetSpellChecker(chkr)

# Display the dialog, allowing user interaction
if dlg.ShowModal() == wx.ID_OK:
    # Checking completed successfully
    # Retrieve the modified text
    print("[FINAL TEXT:] %s" % (chkr.get_text(),))
else:
    # Checking was cancelled
    print("[CHECKING CANCELLED]")

pyenchant-1.6.5/tools/setup.py2exe.py0000666000175000017500000000100311501534256015727 0ustar rfkrfk
#
#  A simple example of how to use pyenchant with py2exe.
#  This script is also used in unittests to test py2exe integration.
#

from distutils.core import setup
import py2exe

from enchant.utils import win32_data_files

setup(
    name="PyEnchant py2exe demo",
    version="0.0.1",
    # Include the necessary supporting data files
    data_files = win32_data_files(),
    # Make a "test_pyenchant.exe" that runs the unittest suite
    console=[dict(script="enchant\\__init__.py",dest_base="test_pyenchant")],
)

pyenchant-1.6.5/pyenchant.egg-info/0000755000175000017500000000000011501654510015323 5ustar rfkrfk
pyenchant-1.6.5/pyenchant.egg-info/SOURCES.txt0000644000175000017500000000144311501654510017211 0ustar rfkrfk
LICENSE.txt
MANIFEST.in
README.txt
TODO.txt
distribute_setup.py
setup.py
enchant/__init__.py
enchant/_enchant.py
enchant/errors.py
enchant/pypwl.py
enchant/tests.py
enchant/utils.py
enchant/checker/CmdLineChecker.py
enchant/checker/GtkSpellCheckerDialog.py
enchant/checker/__init__.py
enchant/checker/tests.py
enchant/checker/wxSpellCheckerDialog.py
enchant/lib/enchant/README.txt
enchant/share/enchant/README.txt
enchant/share/enchant/ispell/README.txt
enchant/share/enchant/myspell/README.txt
enchant/tokenize/__init__.py
enchant/tokenize/en.py
enchant/tokenize/tests.py
pyenchant.egg-info/PKG-INFO
pyenchant.egg-info/SOURCES.txt
pyenchant.egg-info/dependency_links.txt
pyenchant.egg-info/eager_resources.txt
pyenchant.egg-info/top_level.txt
tools/setup.py2exe.py
tools/shootout.py
tools/wx_example.py

pyenchant-1.6.5/pyenchant.egg-info/eager_resources.txt0000644000175000017500000000000111501654510021230 0ustar rfkrfk

pyenchant-1.6.5/pyenchant.egg-info/PKG-INFO0000644000175000017500000000135211501654510016421 0ustar rfkrfk
Metadata-Version: 1.0
Name: pyenchant
Version: 1.6.5
Summary: Python bindings for the Enchant spellchecking system
Home-page: http://www.rfk.id.au/software/pyenchant/
Author: Ryan Kelly
Author-email: ryan@rfk.id.au
License: LGPL
Description: UNKNOWN
Keywords: spelling spellcheck enchant
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Text Processing :: Linguistic

pyenchant-1.6.5/pyenchant.egg-info/top_level.txt0000644000175000017500000000001011501654510020044 0ustar rfkrfk
enchant

pyenchant-1.6.5/pyenchant.egg-info/dependency_links.txt0000644000175000017500000000000111501654510021371 0ustar rfkrfk

pyenchant-1.6.5/setup.cfg0000644000175000017500000000007311501654510013461 0ustar rfkrfk
[egg_info]
tag_build =
tag_date = 0
tag_svn_revision = 0

pyenchant-1.6.5/README.txt0000644000175000017500000000617311340622067013350 0ustar rfkrfk

pyenchant:  Python bindings for the Enchant spellchecker
========================================================

This package provides a set of Python language bindings for the Enchant
spellchecking library.  For more information, visit the project website:

    http://www.rfk.id.au/software/pyenchant/


What is Enchant?
----------------

Enchant is used to check the spelling of words and suggest corrections
for words that are misspelled.  It can use many popular spellchecking
packages to perform this task, including ispell, aspell and MySpell.  It
is quite flexible at handling multiple dictionaries and multiple languages.

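As a rough illustration of the check-and-suggest workflow described above,
the sketch below uses a tiny stand-in dictionary rather than Enchant itself,
so it runs without the library installed.  The names TinyDict and
find_misspellings are hypothetical and exist only for this example; the
check() and suggest() method names mirror the calls that tools/shootout.py
makes on its dictionary objects, and difflib merely stands in for a real
suggestion engine such as aspell or MySpell.

```python
import difflib


class TinyDict(object):
    """Hypothetical stand-in that exposes check() and suggest()."""

    def __init__(self, words):
        # Keep a sorted, de-duplicated word list
        self.words = sorted(set(words))

    def check(self, word):
        # True if the word appears in the word list
        return word in self.words

    def suggest(self, word):
        # Closest known words, best match first (assumption: difflib
        # similarity is only a placeholder for a real provider's ranking)
        return difflib.get_close_matches(word, self.words, n=5, cutoff=0.6)


def find_misspellings(dictionary, words):
    """Return a list of (word, suggestions) for each word failing check()."""
    return [(w, dictionary.suggest(w)) for w in words if not dictionary.check(w)]


d = TinyDict(["this", "is", "some", "sample", "text", "with", "a", "few", "errors"])
for word, suggestions in find_misspellings(d, "this is some smple text".split()):
    print("%s -> %s" % (word, ", ".join(suggestions)))
```

The dictionary objects that tools/shootout.py obtains from
enchant.request_pwl_dict() or Broker.request_dict() are driven through the
same two methods, so one of those should be usable in place of the stand-in
once PyEnchant is installed.
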
More information is available on the Enchant website:

    http://www.abisource.com/enchant/


How do I use it?
----------------

For Windows users, there is an executable installer program which can be
used to install the software with a minimum of effort.  Other users will
need to install from source.

Once the software is installed, python's on-line help facilities can
get you started.  Launch python and issue the following commands:

    >>> import enchant
    >>> help(enchant)


Installing with the Windows Installer
-------------------------------------

Download and run the windows installer program.  It will automatically
detect your python installation and set up pyenchant accordingly.

The windows installer version provides a pre-compiled enchant library
as well as several supporting libraries.  Several commonly-used
dictionaries are installed into:

    \Lib\site-packages\enchant\share\enchant\myspell.

Additional language dictionaries are provided by the OpenOffice.org
project, and can be downloaded from:

    http://wiki.services.openoffice.org/wiki/Dictionaries

Download the appropriate zip file for the language of interest, and unzip
its contents into the "myspell" directory mentioned above.


Installing from Source
----------------------

First, you must already have the enchant library installed on your system.
You will also need access to a C compiler.

This package is distributed using the Python 'setuptools' framework.
If you have the necessary prerequisites, all that should be required to
install is to execute the following command in the current directory:

    python setup.py install


Who is responsible for all this?
--------------------------------

The credit for Enchant itself goes to Dom Lachowicz.  Find out more details
on the Enchant website listed above.  Full marks to Dom for producing such
a high-quality library.

The glue to pull Enchant into Python via ctypes was written by me, Ryan Kelly.
I needed a decent spellchecker for another project I am working on, and
all the solutions turned up by Google were either extremely non-portable
(e.g. opening a pipe to ispell) or had completely disappeared from the web
(what happened to SnakeSpell?)  It was also a great excuse to teach myself
about SWIG, ctypes, and even a little bit of the Python/C API.

Bugs can be filed on the project's github page:

    http://github.com/rfk/pyenchant/issues

Comments, suggestions, other feedback can be directed to the pyenchant-users
mailing list:

    pyenchant-users@googlegroups.com