datrie-0.8.2/CHANGES.rst

CHANGES
=======

0.8.2 (2020-03-25)
------------------

* Future-proof Python support by making Cython a build-time dependency and
  removing Cython-generated C files from the repo (and sdist).
* Fix ``collections.abc.MutableMapping`` import.
* CI and test updates.
* Adjust library name to unbreak some linkers.

0.8.1 (skipped)
---------------

This version was intentionally skipped.

0.8 (2019-07-03)
----------------

* Python 3.7 compatibility; extension is rebuilt with Cython 0.29.11;
* ``Trie.get`` function;
* Python 2.6 and 3.3 support is dropped;
* removed patch to libdatrie which is no longer required;
* testing and CI fixes.

0.7.1 (2016-03-12)
------------------

* updated the bundled C library to version 0.2.9;
* implemented ``Trie.__len__`` in terms of ``trie_enumerate``;
* rebuilt Cython wrapper with Cython 0.23.4;
* changed ``Trie`` to implement ``collections.abc.MutableMapping``;
* fixed ``Trie`` pickling, which segfaulted on Python 2.X.

0.7 (2014-02-18)
----------------

* bundled libdatrie C library is updated to version 0.2.8;
* new ``.suffixes()`` method (thanks Ahmed T. Youssef);
* wrapper is rebuilt with Cython 0.20.1.

0.6.1 (2013-09-21)
------------------

* fixed build for Visual Studio (thanks Gabi Davar).

0.6 (2013-07-09)
----------------

* datrie is rebuilt with Cython 0.19.1;
* ``iter_prefix_values``, ``prefix_values`` and ``longest_prefix_value``
  methods for ``datrie.BaseTrie`` and ``datrie.Trie``
  (thanks Jared Suttles).
0.5.1 (2013-01-30)
------------------

* Recently introduced memory leak in ``longest_prefix`` and
  ``longest_prefix_item`` is fixed.

0.5 (2013-01-29)
----------------

* ``longest_prefix`` and ``longest_prefix_item`` methods are fixed;
* datrie is rebuilt with Cython 0.18;
* misleading benchmark results in README are fixed;
* ``State._walk`` is renamed to ``State.walk_char``.

0.4.2 (2012-09-02)
------------------

* Update to latest libdatrie; this makes the ``.keys()`` method a bit slower
  but removes the key length limitation.

0.4.1 (2012-07-29)
------------------

* cPickle is used for saving/loading ``datrie.Trie`` if it is available.

0.4 (2012-07-27)
----------------

* ``libdatrie`` improvements and bugfixes, including C iterator API support;
* custom iteration support using ``datrie.State`` and ``datrie.Iterator``;
* speed improvements: ``__len__``, ``keys``, ``values`` and ``items``
  methods should be up to 2x faster;
* keys longer than 32768 are not supported in this release.

0.3 (2012-07-21)
----------------

There are no new features or speed improvements in this release.

* ``datrie.new`` is deprecated; use ``datrie.Trie`` with the same arguments;
* small test & benchmark improvements.

0.2 (2012-07-16)
----------------

* ``datrie.Trie`` items can have any Python object as a value
  (``Trie`` from 0.1.x becomes ``datrie.BaseTrie``);
* ``longest_prefix`` and ``longest_prefix_items`` are fixed;
* ``save`` & ``load`` are rewritten;
* ``setdefault`` method.

0.1.1 (2012-07-13)
------------------

* Windows support (upstream libdatrie changes are merged);
* license is changed from LGPL v3 to LGPL v2.1 to match the libdatrie license.

0.1 (2012-07-12)
----------------

Initial release.
datrie-0.8.2/COPYING

GNU LESSER GENERAL PUBLIC LICENSE Version 2.1, February 1999 Copyright (C) 1991, 1999 Free Software Foundation, Inc. 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. [This is the first released version of the Lesser GPL. It also counts as the successor of the GNU Library Public License, version 2, hence the version number 2.1.] Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public Licenses are intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This license, the Lesser General Public License, applies to some specially designated software packages--typically libraries--of the Free Software Foundation and other authors who decide to use it. You can use it too, but we suggest you first think carefully about whether this license or the ordinary General Public License is the better strategy to use in any particular case, based on the explanations below. When we speak of free software, we are referring to freedom of use, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish); that you receive source code or can get it if you want it; that you can change the software and use pieces of it in new free programs; and that you are informed that you can do these things. To protect your rights, we need to make restrictions that forbid distributors to deny you these rights or to ask you to surrender these rights.
These restrictions translate to certain responsibilities for you if you distribute copies of the library or if you modify it. For example, if you distribute copies of the library, whether gratis or for a fee, you must give the recipients all the rights that we gave you. You must make sure that they, too, receive or can get the source code. If you link other code with the library, you must provide complete object files to the recipients, so that they can relink them with the library after making changes to the library and recompiling it. And you must show them these terms so they know their rights. We protect your rights with a two-step method: (1) we copyright the library, and (2) we offer you this license, which gives you legal permission to copy, distribute and/or modify the library. To protect each distributor, we want to make it very clear that there is no warranty for the free library. Also, if the library is modified by someone else and passed on, the recipients should know that what they have is not the original version, so that the original author's reputation will not be affected by problems that might be introduced by others. Finally, software patents pose a constant threat to the existence of any free program. We wish to make sure that a company cannot effectively restrict the users of a free program by obtaining a restrictive license from a patent holder. Therefore, we insist that any patent license obtained for a version of the library must be consistent with the full freedom of use specified in this license. Most GNU software, including some libraries, is covered by the ordinary GNU General Public License. This license, the GNU Lesser General Public License, applies to certain designated libraries, and is quite different from the ordinary General Public License. We use this license for certain libraries in order to permit linking those libraries into non-free programs. 
When a program is linked with a library, whether statically or using a shared library, the combination of the two is legally speaking a combined work, a derivative of the original library. The ordinary General Public License therefore permits such linking only if the entire combination fits its criteria of freedom. The Lesser General Public License permits more lax criteria for linking other code with the library. We call this license the "Lesser" General Public License because it does Less to protect the user's freedom than the ordinary General Public License. It also provides other free software developers Less of an advantage over competing non-free programs. These disadvantages are the reason we use the ordinary General Public License for many libraries. However, the Lesser license provides advantages in certain special circumstances. For example, on rare occasions, there may be a special need to encourage the widest possible use of a certain library, so that it becomes a de-facto standard. To achieve this, non-free programs must be allowed to use the library. A more frequent case is that a free library does the same job as widely used non-free libraries. In this case, there is little to gain by limiting the free library to free software only, so we use the Lesser General Public License. In other cases, permission to use a particular library in non-free programs enables a greater number of people to use a large body of free software. For example, permission to use the GNU C Library in non-free programs enables many more people to use the whole GNU operating system, as well as its variant, the GNU/Linux operating system. Although the Lesser General Public License is Less protective of the users' freedom, it does ensure that the user of a program that is linked with the Library has the freedom and the wherewithal to run that program using a modified version of the Library. The precise terms and conditions for copying, distribution and modification follow. 
Pay close attention to the difference between a "work based on the library" and a "work that uses the library". The former contains code derived from the library, whereas the latter must be combined with the library in order to run. GNU LESSER GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License Agreement applies to any software library or other program which contains a notice placed by the copyright holder or other authorized party saying it may be distributed under the terms of this Lesser General Public License (also called "this License"). Each licensee is addressed as "you". A "library" means a collection of software functions and/or data prepared so as to be conveniently linked with application programs (which use some of those functions and data) to form executables. The "Library", below, refers to any such software library or work which has been distributed under these terms. A "work based on the Library" means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language. (Hereinafter, translation is included without limitation in the term "modification".) "Source code" for a work means the preferred form of the work for making modifications to it. For a library, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the library. Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running a program using the Library is not restricted, and output from such a program is covered only if its contents constitute a work based on the Library (independent of the use of the Library in a tool for writing it). 
Whether that is true depends on what the Library does and what the program that uses the Library does. 1. You may copy and distribute verbatim copies of the Library's complete source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and distribute a copy of this License along with the Library. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Library or any portion of it, thus forming a work based on the Library, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) The modified work must itself be a software library. b) You must cause the files modified to carry prominent notices stating that you changed the files and the date of any change. c) You must cause the whole of the work to be licensed at no charge to all third parties under the terms of this License. d) If a facility in the modified Library refers to a function or a table of data to be supplied by an application program that uses the facility, other than as an argument passed when the facility is invoked, then you must make a good faith effort to ensure that, in the event an application does not supply such function or table, the facility still operates, and performs whatever part of its purpose remains meaningful. (For example, a function in a library to compute square roots has a purpose that is entirely well-defined independent of the application. Therefore, Subsection 2d requires that any application-supplied function or table used by this function must be optional: if the application does not supply it, the square root function must still compute square roots.) 
These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Library, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Library, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Library. In addition, mere aggregation of another work not based on the Library with the Library (or with a work based on the Library) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may opt to apply the terms of the ordinary GNU General Public License instead of this License to a given copy of the Library. To do this, you must alter all the notices that refer to this License, so that they refer to the ordinary GNU General Public License, version 2, instead of to this License. (If a newer version than version 2 of the ordinary GNU General Public License has appeared, then you can specify that version instead if you wish.) Do not make any other change in these notices. Once this change is made in a given copy, it is irreversible for that copy, so the ordinary GNU General Public License applies to all subsequent copies and derivative works made from that copy. This option is useful when you wish to copy part of the code of the Library into a program that is not a library. 4. 
You may copy and distribute the Library (or a portion or derivative of it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange. If distribution of object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place satisfies the requirement to distribute the source code, even though third parties are not compelled to copy the source along with the object code. 5. A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a "work that uses the Library". Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License. However, linking a "work that uses the Library" with the Library creates an executable that is a derivative of the Library (because it contains portions of the Library), rather than a "work that uses the library". The executable is therefore covered by this License. Section 6 states terms for distribution of such executables. When a "work that uses the Library" uses material from a header file that is part of the Library, the object code for the work may be a derivative work of the Library even though the source code is not. Whether this is true is especially significant if the work can be linked without the Library, or if the work is itself a library. The threshold for this to be true is not precisely defined by law. 
If such an object file uses only numerical parameters, data structure layouts and accessors, and small macros and small inline functions (ten lines or less in length), then the use of the object file is unrestricted, regardless of whether it is legally a derivative work. (Executables containing this object code plus portions of the Library will still fall under Section 6.) Otherwise, if the work is a derivative of the Library, you may distribute the object code for the work under the terms of Section 6. Any executables containing that work also fall under Section 6, whether or not they are linked directly with the Library itself. 6. As an exception to the Sections above, you may also combine or link a "work that uses the Library" with the Library to produce a work containing portions of the Library, and distribute that work under terms of your choice, provided that the terms permit modification of the work for the customer's own use and reverse engineering for debugging such modifications. You must give prominent notice with each copy of the work that the Library is used in it and that the Library and its use are covered by this License. You must supply a copy of this License. If the work during execution displays copyright notices, you must include the copyright notice for the Library among them, as well as a reference directing the user to the copy of this License. Also, you must do one of these things: a) Accompany the work with the complete corresponding machine-readable source code for the Library including whatever changes were used in the work (which must be distributed under Sections 1 and 2 above); and, if the work is an executable linked with the Library, with the complete machine-readable "work that uses the Library", as object code and/or source code, so that the user can modify the Library and then relink to produce a modified executable containing the modified Library. 
(It is understood that the user who changes the contents of definitions files in the Library will not necessarily be able to recompile the application to use the modified definitions.) b) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (1) uses at run time a copy of the library already present on the user's computer system, rather than copying library functions into the executable, and (2) will operate properly with a modified version of the library, if the user installs one, as long as the modified version is interface-compatible with the version that the work was made with. c) Accompany the work with a written offer, valid for at least three years, to give the same user the materials specified in Subsection 6a, above, for a charge no more than the cost of performing this distribution. d) If distribution of the work is made by offering access to copy from a designated place, offer equivalent access to copy the above specified materials from the same place. e) Verify that the user has already received a copy of these materials or that you have already sent this user a copy. For an executable, the required form of the "work that uses the Library" must include any data and utility programs needed for reproducing the executable from it. However, as a special exception, the materials to be distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. It may happen that this requirement contradicts the license restrictions of other proprietary libraries that do not normally accompany the operating system. Such a contradiction means you cannot use both them and the Library together in an executable that you distribute. 7. 
You may place library facilities that are a work based on the Library side-by-side in a single library together with other library facilities not covered by this License, and distribute such a combined library, provided that the separate distribution of the work based on the Library and of the other library facilities is otherwise permitted, and provided that you do these two things: a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities. This must be distributed under the terms of the Sections above. b) Give prominent notice with the combined library of the fact that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work. 8. You may not copy, modify, sublicense, link with, or distribute the Library except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, link with, or distribute the Library is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 9. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Library or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Library (or any work based on the Library), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Library or works based on it. 10. Each time you redistribute the Library (or any work based on the Library), the recipient automatically receives a license from the original licensor to copy, distribute, link with or modify the Library subject to these terms and conditions. 
You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties with this License. 11. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Library at all. For example, if a patent license would not permit royalty-free redistribution of the Library by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Library. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply, and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 12. 
If the distribution and/or use of the Library is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Library under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 13. The Free Software Foundation may publish revised and/or new versions of the Lesser General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Library specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Library does not specify a license version number, you may choose any version ever published by the Free Software Foundation. 14. If you wish to incorporate parts of the Library into other free programs whose distribution conditions are incompatible with these, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Libraries If you develop a new library, and you want it to be of the greatest possible use to the public, we recommend making it free software that everyone can redistribute and change. You can do so by permitting redistribution under these terms (or, alternatively, under the terms of the ordinary General Public License). To apply these terms, attach the following notices to the library. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. 
<one line to give the library's name and a brief idea of what it does.> Copyright (C) <year> <name of author> This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA Also add information on how to contact you by electronic and paper mail. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the library, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the library `Frob' (a library for tweaking knobs) written by James Random Hacker. <signature of Ty Coon>, 1 April 1990 Ty Coon, President of Vice That's all there is to it!
datrie-0.8.2/MANIFEST.in

include README.rst
include CHANGES.rst
include COPYING
include tox.ini
include tox-bench.ini
include update_c.sh
recursive-include libdatrie *.h
recursive-include libdatrie *.c
include tests/words100k.txt.zip
recursive-include tests *.py
include src/datrie.pyx
include src/cdatrie.pxd
include src/stdio_ext.pxd
exclude src/datrie.c

datrie-0.8.2/PKG-INFO

Metadata-Version: 1.2
Name: datrie
Version: 0.8.2
Summary: Super-fast, efficiently stored Trie for Python.
Home-page: https://github.com/kmike/datrie
Author: Mikhail Korobov
Author-email: kmike84@gmail.com
License: LGPLv2+
Description:

datrie |travis| |appveyor|
==========================

.. |travis| image:: https://travis-ci.org/pytries/datrie.svg
   :target: https://travis-ci.org/pytries/datrie

.. |appveyor| image:: https://ci.appveyor.com/api/projects/status/6bpvhllpjhlau7x0?svg=true
   :target: https://ci.appveyor.com/project/superbobry/datrie

Super-fast, efficiently stored Trie for Python (2.x and 3.x).
Uses `libdatrie`_.

.. _libdatrie: https://linux.thai.net/~thep/datrie/datrie.html

Installation
============

::

    pip install datrie

Usage
=====

Create a new trie capable of storing items with lower-case ASCII keys::

    >>> import string
    >>> import datrie
    >>> trie = datrie.Trie(string.ascii_lowercase)

The ``trie`` variable is a dict-like object that can have unicode keys of
certain ranges and Python objects as values.

In addition to implementing the mapping interface, tries facilitate
finding the items for a given prefix, and vice versa, finding the
items whose keys are prefixes of a given string.
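The two query directions just described (items under a prefix, and keys that are prefixes of a word) can be illustrated with a minimal pure-Python sketch. It models the semantics only, under stated assumptions: ``DictTrie`` is a hypothetical dict-of-dicts class, not part of datrie (which wraps libdatrie's double-array trie in Cython), and its method names merely mirror the datrie methods demonstrated in this README.

```python
from typing import Any, Dict, List, Tuple

# Sentinel label marking "a stored key ends at this node"; it can never
# collide with a one-character string label.
_END = object()


class DictTrie:
    """Hypothetical nested-dict trie; NOT datrie's implementation."""

    def __init__(self) -> None:
        self._root: Dict[Any, Any] = {}

    def __setitem__(self, key: str, value: Any) -> None:
        node = self._root
        for ch in key:
            node = node.setdefault(ch, {})
        node[_END] = value

    def _walk(self, prefix: str) -> Any:
        """Follow `prefix` from the root; return the node or None."""
        node = self._root
        for ch in prefix:
            node = node.get(ch)
            if node is None:
                return None
        return node

    def __getitem__(self, key: str) -> Any:
        node = self._walk(key)
        if node is None or _END not in node:
            raise KeyError(key)
        return node[_END]

    def __contains__(self, key: object) -> bool:
        if not isinstance(key, str):
            return False
        node = self._walk(key)
        return node is not None and _END in node

    def items(self, prefix: str = "") -> List[Tuple[str, Any]]:
        """Items whose keys start with `prefix` (sorted by key here)."""
        start = self._walk(prefix)
        if start is None:
            return []
        found: List[Tuple[str, Any]] = []
        stack = [(prefix, start)]
        while stack:
            key, node = stack.pop()
            for label, child in node.items():
                if label is _END:
                    found.append((key, child))  # child is the stored value
                else:
                    stack.append((key + label, child))
        found.sort(key=lambda kv: kv[0])
        return found

    def suffixes(self, prefix: str = "") -> List[str]:
        """Keys starting with `prefix`, with `prefix` stripped off."""
        return [key[len(prefix):] for key, _ in self.items(prefix)]

    def prefix_items(self, word: str) -> List[Tuple[str, Any]]:
        """Items whose keys are prefixes of `word`, shortest first."""
        found: List[Tuple[str, Any]] = []
        node = self._root
        if _END in node:  # the empty key is a prefix of every word
            found.append(("", node[_END]))
        for i, ch in enumerate(word):
            node = node.get(ch)
            if node is None:
                break
            if _END in node:
                found.append((word[: i + 1], node[_END]))
        return found

    def longest_prefix_item(self, word: str) -> Tuple[str, Any]:
        """Item with the longest key that is a prefix of `word`."""
        found = self.prefix_items(word)
        if not found:
            raise KeyError(word)
        return found[-1]
```

Each query walks at most one path of nodes per character, which is the core trie idea; a nested-dict layout like this spends far more memory per node than libdatrie's double-array representation, so it is meant for reading rather than production use.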
As a common special case, finding the longest-prefix item is also supported.

.. warning::

   For efficiency you must define allowed character range(s) while
   creating trie. ``datrie`` doesn't check if keys are in allowed
   ranges at runtime, so be careful! Invalid keys are OK at lookup time
   but values won't be stored correctly for such keys.

Add some values to it (datrie keys must be unicode; the examples
are for Python 2.x)::

    >>> trie[u'foo'] = 5
    >>> trie[u'foobar'] = 10
    >>> trie[u'bar'] = 'bar value'
    >>> trie.setdefault(u'foobar', 15)
    10

Check if u'foo' is in trie::

    >>> u'foo' in trie
    True

Get a value::

    >>> trie[u'foo']
    5

Find all prefixes of a word::

    >>> trie.prefixes(u'foobarbaz')
    [u'foo', u'foobar']

    >>> trie.prefix_items(u'foobarbaz')
    [(u'foo', 5), (u'foobar', 10)]

    >>> trie.iter_prefixes(u'foobarbaz')
    <generator object ...>

    >>> trie.iter_prefix_items(u'foobarbaz')
    <generator object ...>

Find the longest prefix of a word::

    >>> trie.longest_prefix(u'foo')
    u'foo'

    >>> trie.longest_prefix(u'foobarbaz')
    u'foobar'

    >>> trie.longest_prefix(u'gaz')
    KeyError: u'gaz'

    >>> trie.longest_prefix(u'gaz', default=u'vasia')
    u'vasia'

    >>> trie.longest_prefix_item(u'foobarbaz')
    (u'foobar', 10)

Check if the trie has keys with a given prefix::

    >>> trie.has_keys_with_prefix(u'fo')
    True

    >>> trie.has_keys_with_prefix(u'FO')
    False

Get all items with a given prefix from a trie::

    >>> trie.keys(u'fo')
    [u'foo', u'foobar']

    >>> trie.items(u'ba')
    [(u'bar', 'bar value')]

    >>> trie.values(u'foob')
    [10]

Get the suffixes of all keys that start with a given prefix (this example
uses a different trie, whose keys are u'pro', u'producer', u'producers',
u'product', u'production', u'productivity' and u'prof')::

    >>> trie.suffixes()
    [u'pro', u'producer', u'producers', u'product', u'production', u'productivity', u'prof']

    >>> trie.suffixes(u'prod')
    [u'ucer', u'ucers', u'uct', u'uction', u'uctivity']

Save & load a trie (values must be picklable)::

    >>> trie.save('my.trie')
    >>> trie2 = datrie.Trie.load('my.trie')

Trie and BaseTrie
=================

There are two Trie classes in datrie package: ``datrie.Trie`` and
``datrie.BaseTrie``.
``datrie.BaseTrie`` is slightly faster and uses less memory, but it can
store only integers in the range -2147483648 <= x <= 2147483647 (signed
32-bit). ``datrie.Trie`` is a bit slower but can store any Python object
as a value.

If you don't need values, or integer values are enough, use
``datrie.BaseTrie``::

    import datrie
    import string
    trie = datrie.BaseTrie(string.ascii_lowercase)

Custom iteration
================

If the built-in trie methods don't fit your needs, you can use
``datrie.State`` and ``datrie.Iterator`` to implement custom traversal.

.. note::

    If you use ``datrie.BaseTrie`` you need ``datrie.BaseState`` and
    ``datrie.BaseIterator`` for custom traversal.

For example, let's find all suffixes of ``'fo'`` for our trie and get
the values::

    >>> state = datrie.State(trie)
    >>> state.walk(u'fo')
    >>> it = datrie.Iterator(state)
    >>> while it.next():
    ...     print(it.key())
    ...     print(it.data())
    o
    5
    obar
    10

Performance
===========

Performance is measured for ``datrie.Trie`` against Python's dict with
100k unique unicode words (English and Russian) as keys and '1' numbers
as values.

``datrie.Trie`` uses about 5 MB of memory for 100k words; Python's dict
uses about 22 MB for the same data, according to my unscientific tests.

This trie implementation is 2-6 times slower than Python's dict
on ``__getitem__``.
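The "M ops/sec" figures in the benchmark results are throughput: lookups performed divided by elapsed time. The sketch below shows one way such a figure could be measured for ``__getitem__``. It is not the project's benchmark script (that is run via ``tox -c tox-bench.ini``); the helper name ``getitem_ops_per_sec`` is hypothetical, and the code assumes Python 3.3+ for ``time.perf_counter``. It accepts any mapping, so a populated ``datrie.Trie`` can be passed in place of a dict.

```python
import time


def getitem_ops_per_sec(mapping, keys, repeat=3):
    """Return the best observed ``mapping[key]`` lookup rate in ops/sec.

    Hypothetical helper: every key in ``keys`` must be present in
    ``mapping``. Each round looks up every key once; the fastest of
    ``repeat`` rounds is used so noise from other processes matters less.
    """
    keys = list(keys)
    if not keys:
        raise ValueError("keys must not be empty")
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        for key in keys:
            mapping[key]  # the lookup being timed; the result is discarded
        best = min(best, time.perf_counter() - start)
    # Guard against a zero timer delta on tiny inputs.
    return len(keys) / max(best, 1e-9)


if __name__ == "__main__":
    words = [u"word%d" % i for i in range(100000)]
    table = dict.fromkeys(words, 1)
    rate = getitem_ops_per_sec(table, words)
    # Same unit as the tables: "1.000M ops/sec" == 1 000 000 ops per second.
    print("dict __getitem__: %.3fM ops/sec" % (rate / 1e6))
```

To time a trie instead, pass a populated ``datrie.Trie`` as ``mapping``; for the ``word%d`` keys above its alphabet would need digits as well as lowercase letters, e.g. ``datrie.Trie(string.ascii_lowercase + string.digits)``. Absolute numbers from a loop like this depend heavily on the machine and interpreter, so compare dict and trie within the same run.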
Benchmark results (MacBook Air i5 1.8GHz,
"1.000M ops/sec" == "1 000 000 operations per second")::

    Python 2.6:
    dict __getitem__: 7.107M ops/sec
    trie __getitem__: 2.478M ops/sec

    Python 2.7:
    dict __getitem__: 6.550M ops/sec
    trie __getitem__: 2.474M ops/sec

    Python 3.2:
    dict __getitem__: 8.185M ops/sec
    trie __getitem__: 2.684M ops/sec

    Python 3.3:
    dict __getitem__: 7.050M ops/sec
    trie __getitem__: 2.755M ops/sec

Looking for prefixes of a given word is almost as fast as
``__getitem__`` (results are for Python 3.3)::

    trie.iter_prefix_items (hits): 0.461M ops/sec
    trie.prefix_items (hits): 0.743M ops/sec
    trie.prefix_items loop (hits): 0.629M ops/sec
    trie.iter_prefixes (hits): 0.759M ops/sec
    trie.iter_prefixes (misses): 1.538M ops/sec
    trie.iter_prefixes (mixed): 1.359M ops/sec
    trie.has_keys_with_prefix (hits): 1.896M ops/sec
    trie.has_keys_with_prefix (misses): 2.590M ops/sec
    trie.longest_prefix (hits): 1.710M ops/sec
    trie.longest_prefix (misses): 1.506M ops/sec
    trie.longest_prefix (mixed): 1.520M ops/sec
    trie.longest_prefix_item (hits): 1.276M ops/sec
    trie.longest_prefix_item (misses): 1.292M ops/sec
    trie.longest_prefix_item (mixed): 1.379M ops/sec

Looking for all words starting with a given prefix is mostly limited
by the overall result count (this could be improved in the future,
since a lot of time is spent decoding strings from utf_32_le to
Python's unicode)::

    trie.items(prefix="xxx"), avg_len(res)==415: 0.609K ops/sec
    trie.keys(prefix="xxx"), avg_len(res)==415: 0.642K ops/sec
    trie.values(prefix="xxx"), avg_len(res)==415: 4.974K ops/sec

    trie.items(prefix="xxxxx"), avg_len(res)==17: 14.781K ops/sec
    trie.keys(prefix="xxxxx"), avg_len(res)==17: 15.766K ops/sec
    trie.values(prefix="xxxxx"), avg_len(res)==17: 96.456K ops/sec

    trie.items(prefix="xxxxxxxx"), avg_len(res)==3: 75.165K ops/sec
    trie.keys(prefix="xxxxxxxx"), avg_len(res)==3: 77.225K ops/sec
    trie.values(prefix="xxxxxxxx"), avg_len(res)==3: 320.755K ops/sec

    trie.items(prefix="xxxxx..xx"), avg_len(res)==1.4: 173.591K ops/sec
    trie.keys(prefix="xxxxx..xx"), avg_len(res)==1.4: 180.678K ops/sec
    trie.values(prefix="xxxxx..xx"), avg_len(res)==1.4: 503.392K ops/sec

    trie.items(prefix="xxx"), NON_EXISTING: 2023.647K ops/sec
    trie.keys(prefix="xxx"), NON_EXISTING: 1976.928K ops/sec
    trie.values(prefix="xxx"), NON_EXISTING: 2060.372K ops/sec

Random inserts are very slow compared to dict; this is a limitation
of double-array tries. Updates are quite fast. If you want to build a
trie, consider sorting the keys before insertion (sorted inserts are
more than 10x faster than random ones in the results below)::

    dict __setitem__ (updates): 6.497M ops/sec
    trie __setitem__ (updates): 2.633M ops/sec
    dict __setitem__ (inserts, random): 5.808M ops/sec
    trie __setitem__ (inserts, random): 0.053M ops/sec
    dict __setitem__ (inserts, sorted): 5.749M ops/sec
    trie __setitem__ (inserts, sorted): 0.624M ops/sec
    dict setdefault (updates): 3.455M ops/sec
    trie setdefault (updates): 1.910M ops/sec
    dict setdefault (inserts): 3.466M ops/sec
    trie setdefault (inserts): 0.053M ops/sec

Other results (note that ``len(trie)`` is currently implemented
using trie traversal)::

    dict __contains__ (hits): 6.801M ops/sec
    trie __contains__ (hits): 2.816M ops/sec
    dict __contains__ (misses): 5.470M ops/sec
    trie __contains__ (misses): 4.224M ops/sec

    dict __len__: 334336.269 ops/sec
    trie __len__: 22.900 ops/sec

    dict values(): 406.507 ops/sec
    trie values(): 20.864 ops/sec
    dict keys(): 189.298 ops/sec
    trie keys(): 2.773 ops/sec
    dict items(): 48.734 ops/sec
    trie items(): 2.611 ops/sec

Please take these benchmark results with a grain of salt; this
is a very simple benchmark and may not cover your use case.

Current Limitations
===================

* keys must be unicode (no implicit conversion for byte strings
  under Python 2.x, sorry);
* there are no iterator versions of keys/values/items (this is not
  implemented yet);
* it is painfully slow and possibly buggy under PyPy;
* the library is not tested with narrow Python builds.

Contributing
============

Development happens at github: https://github.com/pytries/datrie.
Feel free to submit ideas, bugs, pull requests. Running tests and benchmarks ---------------------------- Make sure `tox`_ is installed and run :: $ tox from the source checkout. Tests should pass under Python 2.7 and 3.4+. :: $ tox -c tox-bench.ini runs benchmarks. If you've changed anything in the source code then make sure `cython`_ is installed and run :: $ update_c.sh before each ``tox`` command. Please note that benchmarks are not included in the release tar.gz's because benchmark data is large and this saves a lot of bandwidth; use source checkouts from github or bitbucket for the benchmarks. .. _cython: https://cython.org/ .. _tox: https://tox.readthedocs.io/ Authors & Contributors ---------------------- See https://github.com/pytries/datrie/graphs/contributors. This module is based on `libdatrie`_ C library by Theppitak Karoonboonyanan and is inspired by `fast_trie`_ Ruby bindings, `PyTrie`_ pure Python implementation and `Tree::Trie`_ Perl implementation; some docs and API ideas are borrowed from these projects. .. _fast_trie: https://github.com/tyler/trie .. _PyTrie: https://github.com/gsakkis/pytrie .. _Tree::Trie: https://metacpan.org/pod/release/AVIF/Tree-Trie-1.9/Trie.pm License ======= Licensed under LGPL v2.1. CHANGES ======= 0.8.2 (2020-03-25) ------------------ * Future-proof Python support by making cython a build time dependency and removing cython generated c files from the repo (and sdist). * Fix collections.abc.MutableMapping import * CI and test updates * Adjust library name to unbreak some linkers 0.8.1 (skipped) --------------- This version intentionally skipped 0.8 (2019-07-03) ---------------- * Python 3.7 compatibility; extension is rebuilt with Cython 0.29.11. * Trie.get function; * Python 2.6 and 3.3 support is dropped; * removed patch to libdatrie which is no longer required; * testing and CI fixes. 
0.7.1 (2016-03-12) ------------------ * updated the bundled C library to version 0.2.9; * implemented ``Trie.__len__`` in terms of ``trie_enumerate``; * rebuilt Cython wrapper with Cython 0.23.4; * changed ``Trie`` to implement ``collections.abc.MutableMapping``; * fixed ``Trie`` pickling, which segfaulted on Python2.X. 0.7 (2014-02-18) ---------------- * bundled libdatrie C library is updated to version 0.2.8; * new `.suffixes()` method (thanks Ahmed T. Youssef); * wrapper is rebuilt with Cython 0.20.1. 0.6.1 (2013-09-21) ------------------ * fixed build for Visual Studio (thanks Gabi Davar). 0.6 (2013-07-09) ---------------- * datrie is rebuilt with Cython 0.19.1; * ``iter_prefix_values``, ``prefix_values`` and ``longest_prefix_value`` methods for ``datrie.BaseTrie`` and ``datrie.Trie`` (thanks Jared Suttles). 0.5.1 (2013-01-30) ------------------ * Recently introduced memory leak in ``longest_prefix`` and ``longest_prefix_item`` is fixed. 0.5 (2013-01-29) ---------------- * ``longest_prefix`` and ``longest_prefix_item`` methods are fixed; * datrie is rebuilt with Cython 0.18; * misleading benchmark results in README are fixed; * State._walk is renamed to State.walk_char. 0.4.2 (2012-09-02) ------------------ * Update to latest libdatrie; this makes ``.keys()`` method a bit slower but removes a keys length limitation. 0.4.1 (2012-07-29) ------------------ * cPickle is used for saving/loading ``datrie.Trie`` if it is available. 0.4 (2012-07-27) ---------------- * ``libdatrie`` improvements and bugfixes, including C iterator API support; * custom iteration support using ``datrie.State`` and ``datrie.Iterator``. * speed improvements: ``__length__``, ``keys``, ``values`` and ``items`` methods should be up to 2x faster. * keys longer than 32768 are not supported in this release. 0.3 (2012-07-21) ---------------- There are no new features or speed improvements in this release. 
* ``datrie.new`` is deprecated; use ``datrie.Trie`` with the same arguments; * small test & benchmark improvements. 0.2 (2012-07-16) ---------------- * ``datrie.Trie`` items can have any Python object as a value (``Trie`` from 0.1.x becomes ``datrie.BaseTrie``); * ``longest_prefix`` and ``longest_prefix_items`` are fixed; * ``save`` & ``load`` are rewritten; * ``setdefault`` method. 0.1.1 (2012-07-13) ------------------ * Windows support (upstream libdatrie changes are merged); * license is changed from LGPL v3 to LGPL v2.1 to match the libdatrie license. 0.1 (2012-07-12) ---------------- Initial release. Platform: UNKNOWN Classifier: Development Status :: 4 - Beta Classifier: Intended Audience :: Developers Classifier: Intended Audience :: Science/Research Classifier: License :: OSI Approved :: GNU Lesser General Public License v2 or later (LGPLv2+) Classifier: Programming Language :: Cython Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 2 Classifier: Programming Language :: Python :: 2.7 Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3.4 Classifier: Programming Language :: Python :: 3.5 Classifier: Programming Language :: Python :: 3.6 Classifier: Programming Language :: Python :: 3.7 Classifier: Programming Language :: Python :: 3.8 Classifier: Programming Language :: Python :: Implementation :: CPython Classifier: Topic :: Software Development :: Libraries :: Python Modules Classifier: Topic :: Scientific/Engineering :: Information Analysis Classifier: Topic :: Text Processing :: Linguistic Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.* ././@PaxHeader0000000000000000000000000000002600000000000011453 xustar000000000000000022 mtime=1585186492.0 datrie-0.8.2/README.rst0000644000175000017500000002434400000000000016146 0ustar00tcaswelltcaswell00000000000000datrie |travis| |appveyor| ========================== .. 
|travis| image:: https://travis-ci.org/pytries/datrie.svg :target: https://travis-ci.org/pytries/datrie .. |appveyor| image:: https://ci.appveyor.com/api/projects/status/6bpvhllpjhlau7x0?svg=true :target: https://ci.appveyor.com/project/superbobry/datrie Super-fast, efficiently stored Trie for Python (2.x and 3.x). Uses `libdatrie`_. .. _libdatrie: https://linux.thai.net/~thep/datrie/datrie.html Installation ============ :: pip install datrie Usage ===== Create a new trie capable of storing items with lower-case ascii keys:: >>> import string >>> import datrie >>> trie = datrie.Trie(string.ascii_lowercase) ``trie`` variable is a dict-like object that can have unicode keys of certain ranges and Python objects as values. In addition to implementing the mapping interface, tries facilitate finding the items for a given prefix, and vice versa, finding the items whose keys are prefixes of a given string. As a common special case, finding the longest-prefix item is also supported. .. warning:: For efficiency you must define allowed character range(s) while creating trie. ``datrie`` doesn't check if keys are in allowed ranges at runtime, so be careful! Invalid keys are OK at lookup time but values won't be stored correctly for such keys. 
Add some values to it (datrie keys must be unicode; the examples are for Python 2.x):: >>> trie[u'foo'] = 5 >>> trie[u'foobar'] = 10 >>> trie[u'bar'] = 'bar value' >>> trie.setdefault(u'foobar', 15) 10 Check if u'foo' is in trie:: >>> u'foo' in trie True Get a value:: >>> trie[u'foo'] 5 Find all prefixes of a word:: >>> trie.prefixes(u'foobarbaz') [u'foo', u'foobar'] >>> trie.prefix_items(u'foobarbaz') [(u'foo', 5), (u'foobar', 10)] >>> trie.iter_prefixes(u'foobarbaz') >>> trie.iter_prefix_items(u'foobarbaz') Find the longest prefix of a word:: >>> trie.longest_prefix(u'foo') u'foo' >>> trie.longest_prefix(u'foobarbaz') u'foobar' >>> trie.longest_prefix(u'gaz') KeyError: u'gaz' >>> trie.longest_prefix(u'gaz', default=u'vasia') u'vasia' >>> trie.longest_prefix_item(u'foobarbaz') (u'foobar', 10) Check if the trie has keys with a given prefix:: >>> trie.has_keys_with_prefix(u'fo') True >>> trie.has_keys_with_prefix(u'FO') False Get all items with a given prefix from a trie:: >>> trie.keys(u'fo') [u'foo', u'foobar'] >>> trie.items(u'ba') [(u'bar', 'bar value')] >>> trie.values(u'foob') [10] Get all suffixes of certain word starting with a given prefix from a trie:: >>> trie.suffixes() [u'pro', u'producer', u'producers', u'product', u'production', u'productivity', u'prof'] >>> trie.suffixes(u'prod') [u'ucer', u'ucers', u'uct', u'uction', u'uctivity'] Save & load a trie (values must be picklable):: >>> trie.save('my.trie') >>> trie2 = datrie.Trie.load('my.trie') Trie and BaseTrie ================= There are two Trie classes in datrie package: ``datrie.Trie`` and ``datrie.BaseTrie``. ``datrie.BaseTrie`` is slightly faster and uses less memory but it can store only integer numbers -2147483648 <= x <= 2147483647. ``datrie.Trie`` is a bit slower but can store any Python object as a value. 
If you don't need values or integer values are OK then use ``datrie.BaseTrie``:: import datrie import string trie = datrie.BaseTrie(string.ascii_lowercase) Custom iteration ================ If the built-in trie methods don't fit you can use ``datrie.State`` and ``datrie.Iterator`` to implement custom traversal. .. note:: If you use ``datrie.BaseTrie`` you need ``datrie.BaseState`` and ``datrie.BaseIterator`` for custom traversal. For example, let's find all suffixes of ``'fo'`` for our trie and get the values:: >>> state = datrie.State(trie) >>> state.walk(u'foo') >>> it = datrie.Iterator(state) >>> while it.next(): ... print(it.key()) ... print(it.data)) o 5 obar 10 Performance =========== Performance is measured for ``datrie.Trie`` against Python's dict with 100k unique unicode words (English and Russian) as keys and '1' numbers as values. ``datrie.Trie`` uses about 5M memory for 100k words; Python's dict uses about 22M for this according to my unscientific tests. This trie implementation is 2-6 times slower than python's dict on __getitem__. 
Benchmark results (macbook air i5 1.8GHz, "1.000M ops/sec" == "1 000 000 operations per second"):: Python 2.6: dict __getitem__: 7.107M ops/sec trie __getitem__: 2.478M ops/sec Python 2.7: dict __getitem__: 6.550M ops/sec trie __getitem__: 2.474M ops/sec Python 3.2: dict __getitem__: 8.185M ops/sec trie __getitem__: 2.684M ops/sec Python 3.3: dict __getitem__: 7.050M ops/sec trie __getitem__: 2.755M ops/sec Looking for prefixes of a given word is almost as fast as ``__getitem__`` (results are for Python 3.3):: trie.iter_prefix_items (hits): 0.461M ops/sec trie.prefix_items (hits): 0.743M ops/sec trie.prefix_items loop (hits): 0.629M ops/sec trie.iter_prefixes (hits): 0.759M ops/sec trie.iter_prefixes (misses): 1.538M ops/sec trie.iter_prefixes (mixed): 1.359M ops/sec trie.has_keys_with_prefix (hits): 1.896M ops/sec trie.has_keys_with_prefix (misses): 2.590M ops/sec trie.longest_prefix (hits): 1.710M ops/sec trie.longest_prefix (misses): 1.506M ops/sec trie.longest_prefix (mixed): 1.520M ops/sec trie.longest_prefix_item (hits): 1.276M ops/sec trie.longest_prefix_item (misses): 1.292M ops/sec trie.longest_prefix_item (mixed): 1.379M ops/sec Looking for all words starting with a given prefix is mostly limited by overall result count (this can be improved in future because a lot of time is spent decoding strings from utf_32_le to Python's unicode):: trie.items(prefix="xxx"), avg_len(res)==415: 0.609K ops/sec trie.keys(prefix="xxx"), avg_len(res)==415: 0.642K ops/sec trie.values(prefix="xxx"), avg_len(res)==415: 4.974K ops/sec trie.items(prefix="xxxxx"), avg_len(res)==17: 14.781K ops/sec trie.keys(prefix="xxxxx"), avg_len(res)==17: 15.766K ops/sec trie.values(prefix="xxxxx"), avg_len(res)==17: 96.456K ops/sec trie.items(prefix="xxxxxxxx"), avg_len(res)==3: 75.165K ops/sec trie.keys(prefix="xxxxxxxx"), avg_len(res)==3: 77.225K ops/sec trie.values(prefix="xxxxxxxx"), avg_len(res)==3: 320.755K ops/sec trie.items(prefix="xxxxx..xx"), avg_len(res)==1.4: 173.591K ops/sec 
trie.keys(prefix="xxxxx..xx"), avg_len(res)==1.4: 180.678K ops/sec trie.values(prefix="xxxxx..xx"), avg_len(res)==1.4: 503.392K ops/sec trie.items(prefix="xxx"), NON_EXISTING: 2023.647K ops/sec trie.keys(prefix="xxx"), NON_EXISTING: 1976.928K ops/sec trie.values(prefix="xxx"), NON_EXISTING: 2060.372K ops/sec Random insert time is very slow compared to dict, this is the limitation of double-array tries; updates are quite fast. If you want to build a trie, consider sorting keys before the insertion:: dict __setitem__ (updates): 6.497M ops/sec trie __setitem__ (updates): 2.633M ops/sec dict __setitem__ (inserts, random): 5.808M ops/sec trie __setitem__ (inserts, random): 0.053M ops/sec dict __setitem__ (inserts, sorted): 5.749M ops/sec trie __setitem__ (inserts, sorted): 0.624M ops/sec dict setdefault (updates): 3.455M ops/sec trie setdefault (updates): 1.910M ops/sec dict setdefault (inserts): 3.466M ops/sec trie setdefault (inserts): 0.053M ops/sec Other results (note that ``len(trie)`` is currently implemented using trie traversal):: dict __contains__ (hits): 6.801M ops/sec trie __contains__ (hits): 2.816M ops/sec dict __contains__ (misses): 5.470M ops/sec trie __contains__ (misses): 4.224M ops/sec dict __len__: 334336.269 ops/sec trie __len__: 22.900 ops/sec dict values(): 406.507 ops/sec trie values(): 20.864 ops/sec dict keys(): 189.298 ops/sec trie keys(): 2.773 ops/sec dict items(): 48.734 ops/sec trie items(): 2.611 ops/sec Please take this benchmark results with a grain of salt; this is a very simple benchmark and may not cover your use case. Current Limitations =================== * keys must be unicode (no implicit conversion for byte strings under Python 2.x, sorry); * there are no iterator versions of keys/values/items (this is not implemented yet); * it is painfully slow and maybe buggy under pypy; * library is not tested with narrow Python builds. Contributing ============ Development happens at github: https://github.com/pytries/datrie. 
Feel free to submit ideas, bugs, pull requests. Running tests and benchmarks ---------------------------- Make sure `tox`_ is installed and run :: $ tox from the source checkout. Tests should pass under Python 2.7 and 3.4+. :: $ tox -c tox-bench.ini runs benchmarks. If you've changed anything in the source code then make sure `cython`_ is installed and run :: $ update_c.sh before each ``tox`` command. Please note that benchmarks are not included in the release tar.gz's because benchmark data is large and this saves a lot of bandwidth; use source checkouts from github or bitbucket for the benchmarks. .. _cython: https://cython.org/ .. _tox: https://tox.readthedocs.io/ Authors & Contributors ---------------------- See https://github.com/pytries/datrie/graphs/contributors. This module is based on `libdatrie`_ C library by Theppitak Karoonboonyanan and is inspired by `fast_trie`_ Ruby bindings, `PyTrie`_ pure Python implementation and `Tree::Trie`_ Perl implementation; some docs and API ideas are borrowed from these projects. .. _fast_trie: https://github.com/tyler/trie .. _PyTrie: https://github.com/gsakkis/pytrie .. _Tree::Trie: https://metacpan.org/pod/release/AVIF/Tree-Trie-1.9/Trie.pm License ======= Licensed under LGPL v2.1. ././@PaxHeader0000000000000000000000000000003400000000000011452 xustar000000000000000028 mtime=1585193107.7761154 datrie-0.8.2/datrie.egg-info/0000755000175000017500000000000000000000000017412 5ustar00tcaswelltcaswell00000000000000././@PaxHeader0000000000000000000000000000002600000000000011453 xustar000000000000000022 mtime=1585193107.0 datrie-0.8.2/datrie.egg-info/PKG-INFO0000644000175000017500000004422400000000000020515 0ustar00tcaswelltcaswell00000000000000Metadata-Version: 1.2 Name: datrie Version: 0.8.2 Summary: Super-fast, efficiently stored Trie for Python. 
Home-page: https://github.com/kmike/datrie Author: Mikhail Korobov Author-email: kmike84@gmail.com License: LGPLv2+ Description: datrie |travis| |appveyor| ========================== .. |travis| image:: https://travis-ci.org/pytries/datrie.svg :target: https://travis-ci.org/pytries/datrie .. |appveyor| image:: https://ci.appveyor.com/api/projects/status/6bpvhllpjhlau7x0?svg=true :target: https://ci.appveyor.com/project/superbobry/datrie Super-fast, efficiently stored Trie for Python (2.x and 3.x). Uses `libdatrie`_. .. _libdatrie: https://linux.thai.net/~thep/datrie/datrie.html Installation ============ :: pip install datrie Usage ===== Create a new trie capable of storing items with lower-case ascii keys:: >>> import string >>> import datrie >>> trie = datrie.Trie(string.ascii_lowercase) ``trie`` variable is a dict-like object that can have unicode keys of certain ranges and Python objects as values. In addition to implementing the mapping interface, tries facilitate finding the items for a given prefix, and vice versa, finding the items whose keys are prefixes of a given string. As a common special case, finding the longest-prefix item is also supported. .. warning:: For efficiency you must define allowed character range(s) while creating trie. ``datrie`` doesn't check if keys are in allowed ranges at runtime, so be careful! Invalid keys are OK at lookup time but values won't be stored correctly for such keys. 
Add some values to it (datrie keys must be unicode; the examples are for Python 2.x):: >>> trie[u'foo'] = 5 >>> trie[u'foobar'] = 10 >>> trie[u'bar'] = 'bar value' >>> trie.setdefault(u'foobar', 15) 10 Check if u'foo' is in trie:: >>> u'foo' in trie True Get a value:: >>> trie[u'foo'] 5 Find all prefixes of a word:: >>> trie.prefixes(u'foobarbaz') [u'foo', u'foobar'] >>> trie.prefix_items(u'foobarbaz') [(u'foo', 5), (u'foobar', 10)] >>> trie.iter_prefixes(u'foobarbaz') >>> trie.iter_prefix_items(u'foobarbaz') Find the longest prefix of a word:: >>> trie.longest_prefix(u'foo') u'foo' >>> trie.longest_prefix(u'foobarbaz') u'foobar' >>> trie.longest_prefix(u'gaz') KeyError: u'gaz' >>> trie.longest_prefix(u'gaz', default=u'vasia') u'vasia' >>> trie.longest_prefix_item(u'foobarbaz') (u'foobar', 10) Check if the trie has keys with a given prefix:: >>> trie.has_keys_with_prefix(u'fo') True >>> trie.has_keys_with_prefix(u'FO') False Get all items with a given prefix from a trie:: >>> trie.keys(u'fo') [u'foo', u'foobar'] >>> trie.items(u'ba') [(u'bar', 'bar value')] >>> trie.values(u'foob') [10] Get all suffixes of certain word starting with a given prefix from a trie:: >>> trie.suffixes() [u'pro', u'producer', u'producers', u'product', u'production', u'productivity', u'prof'] >>> trie.suffixes(u'prod') [u'ucer', u'ucers', u'uct', u'uction', u'uctivity'] Save & load a trie (values must be picklable):: >>> trie.save('my.trie') >>> trie2 = datrie.Trie.load('my.trie') Trie and BaseTrie ================= There are two Trie classes in datrie package: ``datrie.Trie`` and ``datrie.BaseTrie``. ``datrie.BaseTrie`` is slightly faster and uses less memory but it can store only integer numbers -2147483648 <= x <= 2147483647. ``datrie.Trie`` is a bit slower but can store any Python object as a value. 
If you don't need values or integer values are OK then use ``datrie.BaseTrie``:: import datrie import string trie = datrie.BaseTrie(string.ascii_lowercase) Custom iteration ================ If the built-in trie methods don't fit you can use ``datrie.State`` and ``datrie.Iterator`` to implement custom traversal. .. note:: If you use ``datrie.BaseTrie`` you need ``datrie.BaseState`` and ``datrie.BaseIterator`` for custom traversal. For example, let's find all suffixes of ``'fo'`` for our trie and get the values:: >>> state = datrie.State(trie) >>> state.walk(u'foo') >>> it = datrie.Iterator(state) >>> while it.next(): ... print(it.key()) ... print(it.data)) o 5 obar 10 Performance =========== Performance is measured for ``datrie.Trie`` against Python's dict with 100k unique unicode words (English and Russian) as keys and '1' numbers as values. ``datrie.Trie`` uses about 5M memory for 100k words; Python's dict uses about 22M for this according to my unscientific tests. This trie implementation is 2-6 times slower than python's dict on __getitem__. 
Benchmark results (macbook air i5 1.8GHz, "1.000M ops/sec" == "1 000 000 operations per second"):: Python 2.6: dict __getitem__: 7.107M ops/sec trie __getitem__: 2.478M ops/sec Python 2.7: dict __getitem__: 6.550M ops/sec trie __getitem__: 2.474M ops/sec Python 3.2: dict __getitem__: 8.185M ops/sec trie __getitem__: 2.684M ops/sec Python 3.3: dict __getitem__: 7.050M ops/sec trie __getitem__: 2.755M ops/sec Looking for prefixes of a given word is almost as fast as ``__getitem__`` (results are for Python 3.3):: trie.iter_prefix_items (hits): 0.461M ops/sec trie.prefix_items (hits): 0.743M ops/sec trie.prefix_items loop (hits): 0.629M ops/sec trie.iter_prefixes (hits): 0.759M ops/sec trie.iter_prefixes (misses): 1.538M ops/sec trie.iter_prefixes (mixed): 1.359M ops/sec trie.has_keys_with_prefix (hits): 1.896M ops/sec trie.has_keys_with_prefix (misses): 2.590M ops/sec trie.longest_prefix (hits): 1.710M ops/sec trie.longest_prefix (misses): 1.506M ops/sec trie.longest_prefix (mixed): 1.520M ops/sec trie.longest_prefix_item (hits): 1.276M ops/sec trie.longest_prefix_item (misses): 1.292M ops/sec trie.longest_prefix_item (mixed): 1.379M ops/sec Looking for all words starting with a given prefix is mostly limited by overall result count (this can be improved in future because a lot of time is spent decoding strings from utf_32_le to Python's unicode):: trie.items(prefix="xxx"), avg_len(res)==415: 0.609K ops/sec trie.keys(prefix="xxx"), avg_len(res)==415: 0.642K ops/sec trie.values(prefix="xxx"), avg_len(res)==415: 4.974K ops/sec trie.items(prefix="xxxxx"), avg_len(res)==17: 14.781K ops/sec trie.keys(prefix="xxxxx"), avg_len(res)==17: 15.766K ops/sec trie.values(prefix="xxxxx"), avg_len(res)==17: 96.456K ops/sec trie.items(prefix="xxxxxxxx"), avg_len(res)==3: 75.165K ops/sec trie.keys(prefix="xxxxxxxx"), avg_len(res)==3: 77.225K ops/sec trie.values(prefix="xxxxxxxx"), avg_len(res)==3: 320.755K ops/sec trie.items(prefix="xxxxx..xx"), avg_len(res)==1.4: 173.591K ops/sec 
trie.keys(prefix="xxxxx..xx"), avg_len(res)==1.4: 180.678K ops/sec trie.values(prefix="xxxxx..xx"), avg_len(res)==1.4: 503.392K ops/sec trie.items(prefix="xxx"), NON_EXISTING: 2023.647K ops/sec trie.keys(prefix="xxx"), NON_EXISTING: 1976.928K ops/sec trie.values(prefix="xxx"), NON_EXISTING: 2060.372K ops/sec Random insert time is very slow compared to dict, this is the limitation of double-array tries; updates are quite fast. If you want to build a trie, consider sorting keys before the insertion:: dict __setitem__ (updates): 6.497M ops/sec trie __setitem__ (updates): 2.633M ops/sec dict __setitem__ (inserts, random): 5.808M ops/sec trie __setitem__ (inserts, random): 0.053M ops/sec dict __setitem__ (inserts, sorted): 5.749M ops/sec trie __setitem__ (inserts, sorted): 0.624M ops/sec dict setdefault (updates): 3.455M ops/sec trie setdefault (updates): 1.910M ops/sec dict setdefault (inserts): 3.466M ops/sec trie setdefault (inserts): 0.053M ops/sec Other results (note that ``len(trie)`` is currently implemented using trie traversal):: dict __contains__ (hits): 6.801M ops/sec trie __contains__ (hits): 2.816M ops/sec dict __contains__ (misses): 5.470M ops/sec trie __contains__ (misses): 4.224M ops/sec dict __len__: 334336.269 ops/sec trie __len__: 22.900 ops/sec dict values(): 406.507 ops/sec trie values(): 20.864 ops/sec dict keys(): 189.298 ops/sec trie keys(): 2.773 ops/sec dict items(): 48.734 ops/sec trie items(): 2.611 ops/sec Please take this benchmark results with a grain of salt; this is a very simple benchmark and may not cover your use case. Current Limitations =================== * keys must be unicode (no implicit conversion for byte strings under Python 2.x, sorry); * there are no iterator versions of keys/values/items (this is not implemented yet); * it is painfully slow and maybe buggy under pypy; * library is not tested with narrow Python builds. Contributing ============ Development happens at github: https://github.com/pytries/datrie. 
Feel free to submit ideas, bugs, pull requests. Running tests and benchmarks ---------------------------- Make sure `tox`_ is installed and run :: $ tox from the source checkout. Tests should pass under Python 2.7 and 3.4+. :: $ tox -c tox-bench.ini runs benchmarks. If you've changed anything in the source code then make sure `cython`_ is installed and run :: $ update_c.sh before each ``tox`` command. Please note that benchmarks are not included in the release tar.gz's because benchmark data is large and this saves a lot of bandwidth; use source checkouts from github or bitbucket for the benchmarks. .. _cython: https://cython.org/ .. _tox: https://tox.readthedocs.io/ Authors & Contributors ---------------------- See https://github.com/pytries/datrie/graphs/contributors. This module is based on `libdatrie`_ C library by Theppitak Karoonboonyanan and is inspired by `fast_trie`_ Ruby bindings, `PyTrie`_ pure Python implementation and `Tree::Trie`_ Perl implementation; some docs and API ideas are borrowed from these projects. .. _fast_trie: https://github.com/tyler/trie .. _PyTrie: https://github.com/gsakkis/pytrie .. _Tree::Trie: https://metacpan.org/pod/release/AVIF/Tree-Trie-1.9/Trie.pm License ======= Licensed under LGPL v2.1. CHANGES ======= 0.8.2 (2020-03-25) ------------------ * Future-proof Python support by making cython a build time dependency and removing cython generated c files from the repo (and sdist). * Fix collections.abc.MutableMapping import * CI and test updates * Adjust library name to unbreak some linkers 0.8.1 (skipped) --------------- This version intentionally skipped 0.8 (2019-07-03) ---------------- * Python 3.7 compatibility; extension is rebuilt with Cython 0.29.11. * Trie.get function; * Python 2.6 and 3.3 support is dropped; * removed patch to libdatrie which is no longer required; * testing and CI fixes. 
0.7.1 (2016-03-12) ------------------ * updated the bundled C library to version 0.2.9; * implemented ``Trie.__len__`` in terms of ``trie_enumerate``; * rebuilt Cython wrapper with Cython 0.23.4; * changed ``Trie`` to implement ``collections.abc.MutableMapping``; * fixed ``Trie`` pickling, which segfaulted on Python2.X. 0.7 (2014-02-18) ---------------- * bundled libdatrie C library is updated to version 0.2.8; * new `.suffixes()` method (thanks Ahmed T. Youssef); * wrapper is rebuilt with Cython 0.20.1. 0.6.1 (2013-09-21) ------------------ * fixed build for Visual Studio (thanks Gabi Davar). 0.6 (2013-07-09) ---------------- * datrie is rebuilt with Cython 0.19.1; * ``iter_prefix_values``, ``prefix_values`` and ``longest_prefix_value`` methods for ``datrie.BaseTrie`` and ``datrie.Trie`` (thanks Jared Suttles). 0.5.1 (2013-01-30) ------------------ * Recently introduced memory leak in ``longest_prefix`` and ``longest_prefix_item`` is fixed. 0.5 (2013-01-29) ---------------- * ``longest_prefix`` and ``longest_prefix_item`` methods are fixed; * datrie is rebuilt with Cython 0.18; * misleading benchmark results in README are fixed; * State._walk is renamed to State.walk_char. 0.4.2 (2012-09-02) ------------------ * Update to latest libdatrie; this makes ``.keys()`` method a bit slower but removes a keys length limitation. 0.4.1 (2012-07-29) ------------------ * cPickle is used for saving/loading ``datrie.Trie`` if it is available. 0.4 (2012-07-27) ---------------- * ``libdatrie`` improvements and bugfixes, including C iterator API support; * custom iteration support using ``datrie.State`` and ``datrie.Iterator``. * speed improvements: ``__length__``, ``keys``, ``values`` and ``items`` methods should be up to 2x faster. * keys longer than 32768 are not supported in this release. 0.3 (2012-07-21) ---------------- There are no new features or speed improvements in this release. 
* ``datrie.new`` is deprecated; use ``datrie.Trie`` with the same arguments;
* small test & benchmark improvements.

0.2 (2012-07-16)
----------------

* ``datrie.Trie`` items can have any Python object as a value
  (``Trie`` from 0.1.x becomes ``datrie.BaseTrie``);
* ``longest_prefix`` and ``longest_prefix_items`` are fixed;
* ``save`` & ``load`` are rewritten;
* new ``setdefault`` method.

0.1.1 (2012-07-13)
------------------

* Windows support (upstream libdatrie changes are merged);
* license is changed from LGPL v3 to LGPL v2.1 to match the libdatrie license.

0.1 (2012-07-12)
----------------

Initial release.
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU Lesser General Public License v2 or later (LGPLv2+)
Classifier: Programming Language :: Cython
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*

datrie-0.8.2/datrie.egg-info/SOURCES.txt
CHANGES.rst
COPYING
MANIFEST.in
README.rst
pyproject.toml
setup.cfg
setup.py
tox-bench.ini
tox.ini
update_c.sh
datrie.egg-info/PKG-INFO
datrie.egg-info/SOURCES.txt
datrie.egg-info/dependency_links.txt
datrie.egg-info/top_level.txt
libdatrie/datrie/alpha-map-private.h
libdatrie/datrie/alpha-map.c
libdatrie/datrie/alpha-map.h
libdatrie/datrie/darray.c
libdatrie/datrie/darray.h
libdatrie/datrie/dstring-private.h
libdatrie/datrie/dstring.c
libdatrie/datrie/dstring.h
libdatrie/datrie/fileutils.c
libdatrie/datrie/fileutils.h
libdatrie/datrie/tail.c
libdatrie/datrie/tail.h
libdatrie/datrie/trie-private.h
libdatrie/datrie/trie-string.c
libdatrie/datrie/trie-string.h
libdatrie/datrie/trie.c
libdatrie/datrie/trie.h
libdatrie/datrie/triedefs.h
libdatrie/datrie/typedefs.h
libdatrie/tests/test_file.c
libdatrie/tests/test_iterator.c
libdatrie/tests/test_nonalpha.c
libdatrie/tests/test_null_trie.c
libdatrie/tests/test_store-retrieve.c
libdatrie/tests/test_term_state.c
libdatrie/tests/test_walk.c
libdatrie/tests/utils.c
libdatrie/tests/utils.h
libdatrie/tools/trietool.c
src/cdatrie.pxd
src/datrie.pyx
src/stdio_ext.pxd
tests/__init__.py
tests/test_iteration.py
tests/test_random.py
tests/test_state.py
tests/test_trie.py

datrie-0.8.2/datrie.egg-info/dependency_links.txt

datrie-0.8.2/datrie.egg-info/top_level.txt
datrie

datrie-0.8.2/libdatrie/
datrie-0.8.2/libdatrie/datrie/
datrie-0.8.2/libdatrie/datrie/alpha-map-private.h
/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
/*
 * libdatrie - Double-Array Trie Library
 * Copyright (C) 2006 Theppitak Karoonboonyanan
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
* * You should have received a copy of the GNU Lesser General Public * License along with this library; if not, write to the Free Software * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */ /* * alpha-map-private.h - private APIs for alpha-map * Created: 2008-12-04 * Author: Theppitak Karoonboonyanan */ #ifndef __ALPHA_MAP_PRIVATE_H #define __ALPHA_MAP_PRIVATE_H #include #include "alpha-map.h" AlphaMap * alpha_map_fread_bin (FILE *file); int alpha_map_fwrite_bin (const AlphaMap *alpha_map, FILE *file); TrieIndex alpha_map_char_to_trie (const AlphaMap *alpha_map, AlphaChar ac); AlphaChar alpha_map_trie_to_char (const AlphaMap *alpha_map, TrieChar tc); TrieChar * alpha_map_char_to_trie_str (const AlphaMap *alpha_map, const AlphaChar *str); AlphaChar * alpha_map_trie_to_char_str (const AlphaMap *alpha_map, const TrieChar *str); #endif /* __ALPHA_MAP_PRIVATE_H */ /* vi:ts=4:ai:expandtab */ ././@PaxHeader0000000000000000000000000000002600000000000011453 xustar000000000000000022 mtime=1585089005.0 datrie-0.8.2/libdatrie/datrie/alpha-map.c0000644000175000017500000003407700000000000021676 0ustar00tcaswelltcaswell00000000000000/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */ /* * libdatrie - Double-Array Trie Library * Copyright (C) 2006 Theppitak Karoonboonyanan * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. 
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 */

/*
 * alpha-map.c - map between character codes and trie alphabet
 * Created: 2006-08-19
 * Author: Theppitak Karoonboonyanan
 */

#include
#include
#include
#include
#include

#include "alpha-map.h"
#include "alpha-map-private.h"
#include "trie-private.h"
#include "fileutils.h"

/**
 * @brief Alphabet string length
 *
 * @param str : the null-terminated AlphaChar string to measure
 *
 * @return the number of characters in @a str.
 */
int
alpha_char_strlen (const AlphaChar *str)
{
    const AlphaChar *p;

    for (p = str; *p; p++)
        ;
    return p - str;
}

/**
 * @brief Compare alphabet strings
 *
 * @param str1, str2 : the null-terminated AlphaChar strings to compare
 *
 * @return negative if @a str1 < @a str2;
 *         0 if @a str1 == @a str2;
 *         positive if @a str1 > @a str2
 *
 * Available since: 0.2.7
 */
int
alpha_char_strcmp (const AlphaChar *str1, const AlphaChar *str2)
{
    while (*str1 && *str1 == *str2) {
        str1++; str2++;
    }
    if (*str1 < *str2)
        return -1;
    if (*str1 > *str2)
        return 1;
    return 0;
}

/*------------------------------*
 *    PRIVATE DATA DEFINITIONS  *
 *------------------------------*/

typedef struct _AlphaRange {
    struct _AlphaRange *next;

    AlphaChar           begin;
    AlphaChar           end;
} AlphaRange;

struct _AlphaMap {
    AlphaRange     *first_range;

    /* work area */
    /* alpha-to-trie map */
    AlphaChar       alpha_begin;
    AlphaChar       alpha_end;
    int             alpha_map_sz;
    TrieIndex      *alpha_to_trie_map;

    /* trie-to-alpha map */
    int             trie_map_sz;
    AlphaChar      *trie_to_alpha_map;
};

/*-----------------------------------*
 *    PRIVATE METHODS DECLARATIONS   *
 *-----------------------------------*/
static int  alpha_map_get_total_ranges (const AlphaMap *alpha_map);
static int  alpha_map_add_range_only (AlphaMap *alpha_map,
                                      AlphaChar begin, AlphaChar end);
static int  alpha_map_recalc_work_area (AlphaMap *alpha_map);
/*-----------------------------*
 *    METHODS IMPLEMENTATIONS  *
 *-----------------------------*/

#define ALPHAMAP_SIGNATURE  0xD9FCD9FC

/* AlphaMap Header:
 * - INT32: signature
 * - INT32: total ranges
 *
 * Ranges:
 * - INT32: range begin
 * - INT32: range end
 */

/**
 * @brief Create a new alphabet map
 *
 * @return a pointer to the newly created alphabet map, NULL on failure
 *
 * Create a new empty alphabet map. The map contents can then be added with
 * alpha_map_add_range().
 *
 * The created object must be freed with alpha_map_free().
 */
AlphaMap *
alpha_map_new ()
{
    AlphaMap   *alpha_map;

    alpha_map = (AlphaMap *) malloc (sizeof (AlphaMap));
    if (UNLIKELY (!alpha_map))
        return NULL;

    alpha_map->first_range = NULL;

    /* work area */
    alpha_map->alpha_begin = 0;
    alpha_map->alpha_end = 0;
    alpha_map->alpha_map_sz = 0;
    alpha_map->alpha_to_trie_map = NULL;

    alpha_map->trie_map_sz = 0;
    alpha_map->trie_to_alpha_map = NULL;

    return alpha_map;
}

/**
 * @brief Create a clone of an alphabet map
 *
 * @param a_map : the source alphabet map to clone
 *
 * @return a pointer to the alphabet map clone, NULL on failure
 *
 * The created object must be freed with alpha_map_free().
 */
AlphaMap *
alpha_map_clone (const AlphaMap *a_map)
{
    AlphaMap   *alpha_map;
    AlphaRange *range;

    alpha_map = alpha_map_new ();
    if (UNLIKELY (!alpha_map))
        return NULL;

    for (range = a_map->first_range; range; range = range->next) {
        if (alpha_map_add_range_only (alpha_map, range->begin, range->end) != 0)
            goto exit_map_created;
    }

    if (alpha_map_recalc_work_area (alpha_map) != 0)
        goto exit_map_created;

    return alpha_map;

exit_map_created:
    alpha_map_free (alpha_map);
    return NULL;
}

/**
 * @brief Free an alphabet map object
 *
 * @param alpha_map : the alphabet map object to free
 *
 * Destroy the @a alpha_map and free its allocated memory.
*/ void alpha_map_free (AlphaMap *alpha_map) { AlphaRange *p, *q; p = alpha_map->first_range; while (p) { q = p->next; free (p); p = q; } /* work area */ if (alpha_map->alpha_to_trie_map) free (alpha_map->alpha_to_trie_map); if (alpha_map->trie_to_alpha_map) free (alpha_map->trie_to_alpha_map); free (alpha_map); } AlphaMap * alpha_map_fread_bin (FILE *file) { long save_pos; uint32 sig; int32 total, i; AlphaMap *alpha_map; /* check signature */ save_pos = ftell (file); if (!file_read_int32 (file, (int32 *) &sig) || ALPHAMAP_SIGNATURE != sig) goto exit_file_read; alpha_map = alpha_map_new (); if (UNLIKELY (!alpha_map)) goto exit_file_read; /* read number of ranges */ if (!file_read_int32 (file, &total)) goto exit_map_created; /* read character ranges */ for (i = 0; i < total; i++) { int32 b, e; if (!file_read_int32 (file, &b) || !file_read_int32 (file, &e)) goto exit_map_created; alpha_map_add_range_only (alpha_map, b, e); } /* work area */ if (UNLIKELY (alpha_map_recalc_work_area (alpha_map) != 0)) goto exit_map_created; return alpha_map; exit_map_created: alpha_map_free (alpha_map); exit_file_read: fseek (file, save_pos, SEEK_SET); return NULL; } static int alpha_map_get_total_ranges (const AlphaMap *alpha_map) { int n; AlphaRange *range; for (n = 0, range = alpha_map->first_range; range; range = range->next) { ++n; } return n; } int alpha_map_fwrite_bin (const AlphaMap *alpha_map, FILE *file) { AlphaRange *range; if (!file_write_int32 (file, ALPHAMAP_SIGNATURE) || !file_write_int32 (file, alpha_map_get_total_ranges (alpha_map))) { return -1; } for (range = alpha_map->first_range; range; range = range->next) { if (!file_write_int32 (file, range->begin) || !file_write_int32 (file, range->end)) { return -1; } } return 0; } static int alpha_map_add_range_only (AlphaMap *alpha_map, AlphaChar begin, AlphaChar end) { AlphaRange *q, *r, *begin_node, *end_node; if (begin > end) return -1; begin_node = end_node = 0; /* Skip first ranges till 'begin' is covered */ for (q = 
0, r = alpha_map->first_range; r && r->begin <= begin; q = r, r = r->next) { if (begin <= r->end) { /* 'r' covers 'begin' -> take 'r' as beginning point */ begin_node = r; break; } if (r->end + 1 == begin) { /* 'begin' is next to 'r'-end * -> extend 'r'-end to cover 'begin' */ r->end = begin; begin_node = r; break; } } if (!begin_node && r && r->begin <= end + 1) { /* ['begin', 'end'] overlaps into 'r'-begin * or 'r' is next to 'end' if r->begin == end + 1 * -> extend 'r'-begin to include the range */ r->begin = begin; begin_node = r; } /* Run upto the first range that exceeds 'end' */ while (r && r->begin <= end + 1) { if (end <= r->end) { /* 'r' covers 'end' -> take 'r' as ending point */ end_node = r; } else if (r != begin_node) { /* ['begin', 'end'] covers the whole 'r' -> remove 'r' */ if (q) { q->next = r->next; free (r); r = q->next; } else { alpha_map->first_range = r->next; free (r); r = alpha_map->first_range; } continue; } q = r; r = r->next; } if (!end_node && q && begin <= q->end) { /* ['begin', 'end'] overlaps 'q' at the end * -> extend 'q'-end to include the range */ q->end = end; end_node = q; } if (begin_node && end_node) { if (begin_node != end_node) { /* Merge begin_node and end_node ranges together */ assert (begin_node->next == end_node); begin_node->end = end_node->end; begin_node->next = end_node->next; free (end_node); } } else if (!begin_node && !end_node) { /* ['begin', 'end'] overlaps with none of the ranges * -> insert a new range */ AlphaRange *range = (AlphaRange *) malloc (sizeof (AlphaRange)); if (UNLIKELY (!range)) return -1; range->begin = begin; range->end = end; /* insert it between 'q' and 'r' */ if (q) { q->next = range; } else { alpha_map->first_range = range; } range->next = r; } return 0; } static int alpha_map_recalc_work_area (AlphaMap *alpha_map) { AlphaRange *range; /* free old existing map */ if (alpha_map->alpha_to_trie_map) { free (alpha_map->alpha_to_trie_map); alpha_map->alpha_to_trie_map = NULL; } if 
(alpha_map->trie_to_alpha_map) { free (alpha_map->trie_to_alpha_map); alpha_map->trie_to_alpha_map = NULL; } range = alpha_map->first_range; if (range) { const AlphaChar alpha_begin = range->begin; int n_cells, i; AlphaChar a; TrieChar trie_last = 0; TrieChar tc; /* reconstruct alpha-to-trie map */ alpha_map->alpha_begin = alpha_begin; while (range->next) { range = range->next; } alpha_map->alpha_end = range->end; alpha_map->alpha_map_sz = n_cells = range->end - alpha_begin + 1; alpha_map->alpha_to_trie_map = (TrieIndex *) malloc (n_cells * sizeof (TrieIndex)); if (UNLIKELY (!alpha_map->alpha_to_trie_map)) goto error_alpha_map_not_created; for (i = 0; i < n_cells; i++) { alpha_map->alpha_to_trie_map[i] = TRIE_INDEX_MAX; } for (range = alpha_map->first_range; range; range = range->next) { for (a = range->begin; a <= range->end; a++) { alpha_map->alpha_to_trie_map[a - alpha_begin] = ++trie_last; } } /* reconstruct trie-to-alpha map */ alpha_map->trie_map_sz = n_cells = trie_last + 1; alpha_map->trie_to_alpha_map = (AlphaChar *) malloc (n_cells * sizeof (AlphaChar)); if (UNLIKELY (!alpha_map->trie_to_alpha_map)) goto error_alpha_map_created; alpha_map->trie_to_alpha_map[0] = 0; tc = 1; for (range = alpha_map->first_range; range; range = range->next) { for (a = range->begin; a <= range->end; a++) { alpha_map->trie_to_alpha_map[tc++] = a; } } } return 0; error_alpha_map_created: free (alpha_map->alpha_to_trie_map); alpha_map->alpha_to_trie_map = NULL; error_alpha_map_not_created: return -1; } /** * @brief Add a range to alphabet map * * @param alpha_map : the alphabet map object * @param begin : the first character of the range * @param end : the last character of the range * * @return 0 on success, non-zero on failure * * Add a range of character codes from @a begin to @a end to the * alphabet set. 
*/ int alpha_map_add_range (AlphaMap *alpha_map, AlphaChar begin, AlphaChar end) { int res = alpha_map_add_range_only (alpha_map, begin, end); if (res != 0) return res; return alpha_map_recalc_work_area (alpha_map); } TrieIndex alpha_map_char_to_trie (const AlphaMap *alpha_map, AlphaChar ac) { TrieIndex alpha_begin; if (UNLIKELY (0 == ac)) return 0; if (UNLIKELY (!alpha_map->alpha_to_trie_map)) return TRIE_INDEX_MAX; alpha_begin = alpha_map->alpha_begin; if (alpha_begin <= ac && ac <= alpha_map->alpha_end) { return alpha_map->alpha_to_trie_map[ac - alpha_begin]; } return TRIE_INDEX_MAX; } AlphaChar alpha_map_trie_to_char (const AlphaMap *alpha_map, TrieChar tc) { if (tc < alpha_map->trie_map_sz) return alpha_map->trie_to_alpha_map[tc]; return ALPHA_CHAR_ERROR; } TrieChar * alpha_map_char_to_trie_str (const AlphaMap *alpha_map, const AlphaChar *str) { TrieChar *trie_str, *p; trie_str = (TrieChar *) malloc (alpha_char_strlen (str) + 1); if (UNLIKELY (!trie_str)) return NULL; for (p = trie_str; *str; p++, str++) { TrieIndex tc = alpha_map_char_to_trie (alpha_map, *str); if (TRIE_INDEX_MAX == tc) goto error_str_allocated; *p = (TrieChar) tc; } *p = 0; return trie_str; error_str_allocated: free (trie_str); return NULL; } AlphaChar * alpha_map_trie_to_char_str (const AlphaMap *alpha_map, const TrieChar *str) { AlphaChar *alpha_str, *p; alpha_str = (AlphaChar *) malloc ((strlen ((const char *)str) + 1) * sizeof (AlphaChar)); if (UNLIKELY (!alpha_str)) return NULL; for (p = alpha_str; *str; p++, str++) { *p = (AlphaChar) alpha_map_trie_to_char (alpha_map, *str); } *p = 0; return alpha_str; } /* vi:ts=4:ai:expandtab */ ././@PaxHeader0000000000000000000000000000002600000000000011453 xustar000000000000000022 mtime=1585089005.0 datrie-0.8.2/libdatrie/datrie/alpha-map.h0000644000175000017500000000553000000000000021673 0ustar00tcaswelltcaswell00000000000000/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */ /* * libdatrie - Double-Array Trie Library * 
Copyright (C) 2006 Theppitak Karoonboonyanan
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 */

/*
 * alpha-map.h - map between character codes and trie alphabet
 * Created: 2006-08-19
 * Author: Theppitak Karoonboonyanan
 */

#ifndef __ALPHA_MAP_H
#define __ALPHA_MAP_H

#include
#include "typedefs.h"
#include "triedefs.h"

#ifdef __cplusplus
extern "C" {
#endif

/**
 * @file alpha-map.h
 * @brief AlphaMap data type and functions
 *
 * AlphaMap is a mapping between AlphaChar and TrieChar. AlphaChar is the
 * alphabet character used in words of a target language, while TrieChar
 * is a small integer with a packed range of values that is actually used
 * in trie state transition calculations.
 *
 * Since a double-array trie relies on a sparse state transition table,
 * a small set of input characters keeps the table small, i.e. with a
 * small number of columns. But in real life, alphabet characters can
 * span non-continuous ranges of values. The unused slots between them
 * waste space in the table and increase the chance of unused array
 * cells.
 *
 * AlphaMap is thus defined for mapping between non-continuous ranges of
 * AlphaChar values and a packed, continuous range of TrieChar values.
* * In this implementation, TrieChar is defined as a single-byte integer, * which means the largest AlphaChar set that is supported is of 255 * values, as the special value of 0 is reserved for null-termination code. */ /** * @brief AlphaMap data type */ typedef struct _AlphaMap AlphaMap; AlphaMap * alpha_map_new (); AlphaMap * alpha_map_clone (const AlphaMap *a_map); void alpha_map_free (AlphaMap *alpha_map); int alpha_map_add_range (AlphaMap *alpha_map, AlphaChar begin, AlphaChar end); int alpha_char_strlen (const AlphaChar *str); int alpha_char_strcmp (const AlphaChar *str1, const AlphaChar *str2); #ifdef __cplusplus } #endif #endif /* __ALPHA_MAP_H */ /* vi:ts=4:ai:expandtab */ ././@PaxHeader0000000000000000000000000000002600000000000011453 xustar000000000000000022 mtime=1585089005.0 datrie-0.8.2/libdatrie/datrie/darray.c0000644000175000017500000005271500000000000021317 0ustar00tcaswelltcaswell00000000000000/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */ /* * libdatrie - Double-Array Trie Library * Copyright (C) 2006 Theppitak Karoonboonyanan * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. 
* * You should have received a copy of the GNU Lesser General Public * License along with this library; if not, write to the Free Software * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */ /* * darray.c - Double-array trie structure * Created: 2006-08-13 * Author: Theppitak Karoonboonyanan */ #include #include #ifndef _MSC_VER /* for SIZE_MAX */ # include #endif #include #include "trie-private.h" #include "darray.h" #include "fileutils.h" /*----------------------------------* * INTERNAL TYPES DECLARATIONS * *----------------------------------*/ struct _Symbols { short num_symbols; TrieChar symbols[TRIE_CHAR_MAX + 1]; }; static Symbols * symbols_new (); static void symbols_add (Symbols *syms, TrieChar c); #define symbols_add_fast(s,c) ((s)->symbols[(s)->num_symbols++] = c) /*-----------------------------------* * PRIVATE METHODS DECLARATIONS * *-----------------------------------*/ #define da_get_free_list(d) (1) static Bool da_check_free_cell (DArray *d, TrieIndex s); static Bool da_has_children (const DArray *d, TrieIndex s); static TrieIndex da_find_free_base (DArray *d, const Symbols *symbols); static Bool da_fit_symbols (DArray *d, TrieIndex base, const Symbols *symbols); static void da_relocate_base (DArray *d, TrieIndex s, TrieIndex new_base); static Bool da_extend_pool (DArray *d, TrieIndex to_index); static void da_alloc_cell (DArray *d, TrieIndex cell); static void da_free_cell (DArray *d, TrieIndex cell); /* ==================== BEGIN IMPLEMENTATION PART ==================== */ /*------------------------------------* * INTERNAL TYPES IMPLEMENTATIONS * *------------------------------------*/ static Symbols * symbols_new () { Symbols *syms; syms = (Symbols *) malloc (sizeof (Symbols)); if (UNLIKELY (!syms)) return NULL; syms->num_symbols = 0; return syms; } void symbols_free (Symbols *syms) { free (syms); } static void symbols_add (Symbols *syms, TrieChar c) { short lower, upper; lower = 0; upper = syms->num_symbols; while (lower 
< upper) {
        short middle;

        middle = (lower + upper)/2;
        if (c > syms->symbols[middle])
            lower = middle + 1;
        else if (c < syms->symbols[middle])
            upper = middle;
        else
            return;
    }
    if (lower < syms->num_symbols) {
        memmove (syms->symbols + lower + 1, syms->symbols + lower,
                 syms->num_symbols - lower);
    }
    syms->symbols[lower] = c;
    syms->num_symbols++;
}

int
symbols_num (const Symbols *syms)
{
    return syms->num_symbols;
}

TrieChar
symbols_get (const Symbols *syms, int index)
{
    return syms->symbols[index];
}

/*------------------------------*
 *    PRIVATE DATA DEFINITIONS  *
 *------------------------------*/

typedef struct {
    TrieIndex   base;
    TrieIndex   check;
} DACell;

struct _DArray {
    TrieIndex   num_cells;
    DACell     *cells;
};

/*-----------------------------*
 *    METHODS IMPLEMENTATIONS  *
 *-----------------------------*/

#define DA_SIGNATURE 0xDAFCDAFC

/* DA Header:
 * - Cell 0: SIGNATURE, number of cells
 * - Cell 1: free circular-list pointers
 * - Cell 2: root node
 * - Cell 3: DA pool begin
 */
#define DA_POOL_BEGIN 3

/**
 * @brief Create a new double-array object
 *
 * Create a new empty double-array object.
 */
DArray *
da_new ()
{
    DArray     *d;

    d = (DArray *) malloc (sizeof (DArray));
    if (UNLIKELY (!d))
        return NULL;

    d->num_cells = DA_POOL_BEGIN;
    d->cells     = (DACell *) malloc (d->num_cells * sizeof (DACell));
    if (UNLIKELY (!d->cells))
        goto exit_da_created;
    d->cells[0].base = DA_SIGNATURE;
    d->cells[0].check = d->num_cells;
    d->cells[1].base = -1;
    d->cells[1].check = -1;
    d->cells[2].base = DA_POOL_BEGIN;
    d->cells[2].check = 0;

    return d;

exit_da_created:
    free (d);
    return NULL;
}

/**
 * @brief Read double-array data from file
 *
 * @param file : the file to read
 *
 * @return a pointer to the opened double-array, NULL on failure
 *
 * Read double-array data from the opened file, starting from the current
 * file pointer until the end of the double-array data block. On return,
 * the file pointer is left at the position after the read block.
*/ DArray * da_fread (FILE *file) { long save_pos; DArray *d = NULL; TrieIndex n; /* check signature */ save_pos = ftell (file); if (!file_read_int32 (file, &n) || DA_SIGNATURE != (uint32) n) goto exit_file_read; d = (DArray *) malloc (sizeof (DArray)); if (UNLIKELY (!d)) goto exit_file_read; /* read number of cells */ if (!file_read_int32 (file, &d->num_cells)) goto exit_da_created; if (d->num_cells > SIZE_MAX / sizeof (DACell)) goto exit_da_created; d->cells = (DACell *) malloc (d->num_cells * sizeof (DACell)); if (UNLIKELY (!d->cells)) goto exit_da_created; d->cells[0].base = DA_SIGNATURE; d->cells[0].check= d->num_cells; for (n = 1; n < d->num_cells; n++) { if (!file_read_int32 (file, &d->cells[n].base) || !file_read_int32 (file, &d->cells[n].check)) { goto exit_da_cells_created; } } return d; exit_da_cells_created: free (d->cells); exit_da_created: free (d); exit_file_read: fseek (file, save_pos, SEEK_SET); return NULL; } /** * @brief Free double-array data * * @param d : the double-array data * * Free the given double-array data. */ void da_free (DArray *d) { free (d->cells); free (d); } /** * @brief Write double-array data * * @param d : the double-array data * @param file : the file to write to * * @return 0 on success, non-zero on failure * * Write double-array data to the given @a file, starting from the current * file pointer. On return, the file pointer is left after the double-array * data block. */ int da_fwrite (const DArray *d, FILE *file) { TrieIndex i; for (i = 0; i < d->num_cells; i++) { if (!file_write_int32 (file, d->cells[i].base) || !file_write_int32 (file, d->cells[i].check)) { return -1; } } return 0; } /** * @brief Get root state * * @param d : the double-array data * * @return root state of the @a index set, or TRIE_INDEX_ERROR on failure * * Get root state for stepwise walking. 
*/ TrieIndex da_get_root (const DArray *d) { /* can be calculated value for multi-index trie */ return 2; } /** * @brief Get BASE cell * * @param d : the double-array data * @param s : the double-array state to get data * * @return the BASE cell value for the given state * * Get BASE cell value for the given state. */ TrieIndex da_get_base (const DArray *d, TrieIndex s) { return LIKELY (s < d->num_cells) ? d->cells[s].base : TRIE_INDEX_ERROR; } /** * @brief Get CHECK cell * * @param d : the double-array data * @param s : the double-array state to get data * * @return the CHECK cell value for the given state * * Get CHECK cell value for the given state. */ TrieIndex da_get_check (const DArray *d, TrieIndex s) { return LIKELY (s < d->num_cells) ? d->cells[s].check : TRIE_INDEX_ERROR; } /** * @brief Set BASE cell * * @param d : the double-array data * @param s : the double-array state to get data * @param val : the value to set * * Set BASE cell for the given state to the given value. */ void da_set_base (DArray *d, TrieIndex s, TrieIndex val) { if (LIKELY (s < d->num_cells)) { d->cells[s].base = val; } } /** * @brief Set CHECK cell * * @param d : the double-array data * @param s : the double-array state to get data * @param val : the value to set * * Set CHECK cell for the given state to the given value. */ void da_set_check (DArray *d, TrieIndex s, TrieIndex val) { if (LIKELY (s < d->num_cells)) { d->cells[s].check = val; } } /** * @brief Walk in double-array structure * * @param d : the double-array structure * @param s : current state * @param c : the input character * * @return boolean indicating success * * Walk the double-array trie from state @a *s, using input character @a c. * If there exists an edge from @a *s with arc labeled @a c, this function * returns TRUE and @a *s is updated to the new state. Otherwise, it returns * FALSE and @a *s is left unchanged. 
*/ Bool da_walk (const DArray *d, TrieIndex *s, TrieChar c) { TrieIndex next; next = da_get_base (d, *s) + c; if (da_get_check (d, next) == *s) { *s = next; return TRUE; } return FALSE; } /** * @brief Insert a branch from trie node * * @param d : the double-array structure * @param s : the state to add branch to * @param c : the character for the branch label * * @return the index of the new node * * Insert a new arc labelled with character @a c from the trie node * represented by index @a s in double-array structure @a d. * Note that it assumes that no such arc exists before inserting. */ TrieIndex da_insert_branch (DArray *d, TrieIndex s, TrieChar c) { TrieIndex base, next; base = da_get_base (d, s); if (base > 0) { next = base + c; /* if already there, do not actually insert */ if (da_get_check (d, next) == s) return next; /* if (base + c) > TRIE_INDEX_MAX which means 'next' is overflow, * or cell [next] is not free, relocate to a free slot */ if (base > TRIE_INDEX_MAX - c || !da_check_free_cell (d, next)) { Symbols *symbols; TrieIndex new_base; /* relocate BASE[s] */ symbols = da_output_symbols (d, s); symbols_add (symbols, c); new_base = da_find_free_base (d, symbols); symbols_free (symbols); if (UNLIKELY (TRIE_INDEX_ERROR == new_base)) return TRIE_INDEX_ERROR; da_relocate_base (d, s, new_base); next = new_base + c; } } else { Symbols *symbols; TrieIndex new_base; symbols = symbols_new (); symbols_add (symbols, c); new_base = da_find_free_base (d, symbols); symbols_free (symbols); if (UNLIKELY (TRIE_INDEX_ERROR == new_base)) return TRIE_INDEX_ERROR; da_set_base (d, s, new_base); next = new_base + c; } da_alloc_cell (d, next); da_set_check (d, next, s); return next; } static Bool da_check_free_cell (DArray *d, TrieIndex s) { return da_extend_pool (d, s) && da_get_check (d, s) < 0; } static Bool da_has_children (const DArray *d, TrieIndex s) { TrieIndex base; TrieIndex c, max_c; base = da_get_base (d, s); if (TRIE_INDEX_ERROR == base || base < 0) return FALSE; 
max_c = MIN_VAL (TRIE_CHAR_MAX, d->num_cells - base); for (c = 0; c <= max_c; c++) { if (da_get_check (d, base + c) == s) return TRUE; } return FALSE; } Symbols * da_output_symbols (const DArray *d, TrieIndex s) { Symbols *syms; TrieIndex base; TrieIndex c, max_c; syms = symbols_new (); base = da_get_base (d, s); max_c = MIN_VAL (TRIE_CHAR_MAX, d->num_cells - base); for (c = 0; c <= max_c; c++) { if (da_get_check (d, base + c) == s) symbols_add_fast (syms, (TrieChar) c); } return syms; } static TrieIndex da_find_free_base (DArray *d, const Symbols *symbols) { TrieChar first_sym; TrieIndex s; /* find first free cell that is beyond the first symbol */ first_sym = symbols_get (symbols, 0); s = -da_get_check (d, da_get_free_list (d)); while (s != da_get_free_list (d) && s < (TrieIndex) first_sym + DA_POOL_BEGIN) { s = -da_get_check (d, s); } if (s == da_get_free_list (d)) { for (s = first_sym + DA_POOL_BEGIN; ; ++s) { if (!da_extend_pool (d, s)) return TRIE_INDEX_ERROR; if (da_get_check (d, s) < 0) break; } } /* search for next free cell that fits the symbols set */ while (!da_fit_symbols (d, s - first_sym, symbols)) { /* extend pool before getting exhausted */ if (-da_get_check (d, s) == da_get_free_list (d)) { if (UNLIKELY (!da_extend_pool (d, d->num_cells))) return TRIE_INDEX_ERROR; } s = -da_get_check (d, s); } return s - first_sym; } static Bool da_fit_symbols (DArray *d, TrieIndex base, const Symbols *symbols) { int i; for (i = 0; i < symbols_num (symbols); i++) { TrieChar sym = symbols_get (symbols, i); /* if (base + sym) > TRIE_INDEX_MAX which means it's overflow, * or cell [base + sym] is not free, the symbol is not fit. 
         */
        if (base > TRIE_INDEX_MAX - sym || !da_check_free_cell (d, base + sym))
            return FALSE;
    }
    return TRUE;
}

static void
da_relocate_base (DArray *d, TrieIndex s, TrieIndex new_base)
{
    TrieIndex old_base;
    Symbols *symbols;
    int i;

    old_base = da_get_base (d, s);
    symbols = da_output_symbols (d, s);

    for (i = 0; i < symbols_num (symbols); i++) {
        TrieIndex old_next, new_next, old_next_base;

        old_next = old_base + symbols_get (symbols, i);
        new_next = new_base + symbols_get (symbols, i);
        old_next_base = da_get_base (d, old_next);

        /* allocate new next node and copy BASE value */
        da_alloc_cell (d, new_next);
        da_set_check (d, new_next, s);
        da_set_base (d, new_next, old_next_base);

        /* old_next node is now moved to new_next
         * so, all cells belonging to old_next
         * must be given to new_next
         */
        /* preventing the case of TAIL pointer */
        if (old_next_base > 0) {
            TrieIndex c, max_c;

            max_c = MIN_VAL (TRIE_CHAR_MAX, d->num_cells - old_next_base);
            for (c = 0; c <= max_c; c++) {
                if (da_get_check (d, old_next_base + c) == old_next)
                    da_set_check (d, old_next_base + c, new_next);
            }
        }

        /* free old_next node */
        da_free_cell (d, old_next);
    }

    symbols_free (symbols);

    /* finally, make BASE[s] point to new_base */
    da_set_base (d, s, new_base);
}

static Bool
da_extend_pool (DArray *d, TrieIndex to_index)
{
    void *new_block;
    TrieIndex new_begin;
    TrieIndex i;
    TrieIndex free_tail;

    if (UNLIKELY (to_index <= 0 || TRIE_INDEX_MAX <= to_index))
        return FALSE;

    if (to_index < d->num_cells)
        return TRUE;

    new_block = realloc (d->cells, (to_index + 1) * sizeof (DACell));
    if (UNLIKELY (!new_block))
        return FALSE;

    d->cells = (DACell *) new_block;
    new_begin = d->num_cells;
    d->num_cells = to_index + 1;

    /* initialize new free list */
    for (i = new_begin; i < to_index; i++) {
        da_set_check (d, i, -(i + 1));
        da_set_base (d, i + 1, -i);
    }

    /* merge the new circular list to the old */
    free_tail = -da_get_base (d, da_get_free_list (d));
    da_set_check (d, free_tail, -new_begin);
    da_set_base (d, new_begin, -free_tail);
    da_set_check (d, to_index, -da_get_free_list (d));
    da_set_base (d, da_get_free_list (d), -to_index);

    /* update header cell */
    d->cells[0].check = d->num_cells;

    return TRUE;
}

/**
 * @brief Prune the single branch
 *
 * @param d : the double-array structure
 * @param s : the dangling state to prune off
 *
 * Prune off a non-separate path up from the final state @a s.
 * If @a s still has some children states, it does nothing. Otherwise,
 * it deletes the node and all its parents which become non-separate.
 */
void
da_prune (DArray *d, TrieIndex s)
{
    da_prune_upto (d, da_get_root (d), s);
}

/**
 * @brief Prune the single branch up to given parent
 *
 * @param d : the double-array structure
 * @param p : the parent up to which to be pruned
 * @param s : the dangling state to prune off
 *
 * Prune off a non-separate path up from the final state @a s to the
 * given parent @a p. The pruning stops when either the parent @a p
 * is met, or a first non-separate node is found.
 */
void
da_prune_upto (DArray *d, TrieIndex p, TrieIndex s)
{
    while (p != s && !da_has_children (d, s)) {
        TrieIndex parent;

        parent = da_get_check (d, s);
        da_free_cell (d, s);
        s = parent;
    }
}

static void
da_alloc_cell (DArray *d, TrieIndex cell)
{
    TrieIndex prev, next;

    prev = -da_get_base (d, cell);
    next = -da_get_check (d, cell);

    /* remove the cell from free list */
    da_set_check (d, prev, -next);
    da_set_base (d, next, -prev);
}

static void
da_free_cell (DArray *d, TrieIndex cell)
{
    TrieIndex i, prev;

    /* find insertion point */
    i = -da_get_check (d, da_get_free_list (d));
    while (i != da_get_free_list (d) && i < cell)
        i = -da_get_check (d, i);

    prev = -da_get_base (d, i);

    /* insert cell before i */
    da_set_check (d, cell, -i);
    da_set_base (d, cell, -prev);
    da_set_check (d, prev, -cell);
    da_set_base (d, i, -cell);
}

/**
 * @brief Find first separate node in a sub-trie
 *
 * @param d : the double-array structure
 * @param root : the sub-trie root to search from
 * @param keybuff : the TrieString buffer for incrementally calculating key
 *
 * @return index to the first separate node; TRIE_INDEX_ERROR on any failure
 *
 * Find the first separate node under a sub-trie rooted at @a root.
 *
 * On return, @a keybuff is appended with the key characters which walk from
 * @a root to the separate node. This is for incrementally calculating the
 * transition key, which is more efficient than later totally reconstructing
 * key from the given separate node.
 *
 * Available since: 0.2.6
 */
TrieIndex
da_first_separate (DArray *d, TrieIndex root, TrieString *keybuff)
{
    TrieIndex base;
    TrieIndex c, max_c;

    while ((base = da_get_base (d, root)) >= 0) {
        max_c = MIN_VAL (TRIE_CHAR_MAX, d->num_cells - base);
        for (c = 0; c <= max_c; c++) {
            if (da_get_check (d, base + c) == root)
                break;
        }

        if (c > max_c)
            return TRIE_INDEX_ERROR;

        trie_string_append_char (keybuff, c);
        root = base + c;
    }

    return root;
}

/**
 * @brief Find next separate node in a sub-trie
 *
 * @param d : the double-array structure
 * @param root : the sub-trie root to search from
 * @param sep : the current separate node
 * @param keybuff : the TrieString buffer for incrementally calculating key
 *
 * @return index to the next separate node; TRIE_INDEX_ERROR if no more
 * separate nodes are found
 *
 * Find the next separate node under a sub-trie rooted at @a root starting
 * from the current separate node @a sep.
 *
 * On return, @a keybuff is incrementally updated from the key which walks
 * to the previous separate node to the one which walks to the new separate node.
 * So, it is assumed to be initialized by at least one da_first_separate()
 * call before. This incremental key calculation is more efficient than later
 * totally reconstructing key from the given separate node.
 *
 * Available since: 0.2.6
 */
TrieIndex
da_next_separate (DArray *d, TrieIndex root, TrieIndex sep, TrieString *keybuff)
{
    TrieIndex parent;
    TrieIndex base;
    TrieIndex c, max_c;

    while (sep != root) {
        parent = da_get_check (d, sep);
        base = da_get_base (d, parent);
        c = sep - base;

        trie_string_cut_last (keybuff);

        /* find next sibling of sep */
        max_c = MIN_VAL (TRIE_CHAR_MAX, d->num_cells - base);
        while (++c <= max_c) {
            if (da_get_check (d, base + c) == parent) {
                trie_string_append_char (keybuff, c);
                return da_first_separate (d, base + c, keybuff);
            }
        }

        sep = parent;
    }

    return TRIE_INDEX_ERROR;
}

/* vi:ts=4:ai:expandtab */
././@PaxHeader0000000000000000000000000000002600000000000011453 xustar000000000000000022 mtime=1585089005.0 datrie-0.8.2/libdatrie/datrie/darray.h0000644000175000017500000000574600000000000021326 0ustar00tcaswelltcaswell00000000000000
/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
/*
 * libdatrie - Double-Array Trie Library
 * Copyright (C) 2006 Theppitak Karoonboonyanan
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 */

/*
 * darray.h - Double-array trie structure
 * Created: 2006-08-11
 * Author: Theppitak Karoonboonyanan
 */

#ifndef __DARRAY_H
#define __DARRAY_H

#include "triedefs.h"
#include "trie-string.h"

/**
 * @file darray.h
 * @brief Double-array trie structure
 */

/**
 * @brief Symbol set structure type
 */
typedef struct _Symbols Symbols;

void symbols_free (Symbols *syms);
int symbols_num (const Symbols *syms);
TrieChar symbols_get (const Symbols *syms, int index);

/**
 * @brief Double-array structure type
 */
typedef struct _DArray DArray;

DArray * da_new ();

DArray * da_fread (FILE *file);

void da_free (DArray *d);

int da_fwrite (const DArray *d, FILE *file);

TrieIndex da_get_root (const DArray *d);

TrieIndex da_get_base (const DArray *d, TrieIndex s);

TrieIndex da_get_check (const DArray *d, TrieIndex s);

void da_set_base (DArray *d, TrieIndex s, TrieIndex val);

void da_set_check (DArray *d, TrieIndex s, TrieIndex val);

Bool da_walk (const DArray *d, TrieIndex *s, TrieChar c);

Symbols * da_output_symbols (const DArray *d, TrieIndex s);

/**
 * @brief Test walkability in double-array structure
 *
 * @param d : the double-array structure
 * @param s : current state
 * @param c : the input character
 *
 * @return boolean indicating walkability
 *
 * Test if there is a transition from state @a s with input character @a c.
 */
/* Bool da_is_walkable (DArray *d, TrieIndex s, TrieChar c); */
#define da_is_walkable(d,s,c) \
    (da_get_check ((d), da_get_base ((d), (s)) + (c)) == (s))

TrieIndex da_insert_branch (DArray *d, TrieIndex s, TrieChar c);

void da_prune (DArray *d, TrieIndex s);

void da_prune_upto (DArray *d, TrieIndex p, TrieIndex s);

TrieIndex da_first_separate (DArray *d, TrieIndex root, TrieString *keybuff);

TrieIndex da_next_separate (DArray *d, TrieIndex root, TrieIndex sep,
                            TrieString *keybuff);

#endif /* __DARRAY_H */

/* vi:ts=4:ai:expandtab */
././@PaxHeader0000000000000000000000000000002600000000000011453 xustar000000000000000022 mtime=1585089005.0 datrie-0.8.2/libdatrie/datrie/dstring-private.h0000644000175000017500000000244700000000000023161 0ustar00tcaswelltcaswell00000000000000
/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
/*
 * libdatrie - Double-Array Trie Library
 * Copyright (C) 2006 Theppitak Karoonboonyanan
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 */

/*
 * dstring-private.h - Dynamic string type
 * Created: 2012-08-02
 * Author: Theppitak Karoonboonyanan
 */

#ifndef __DSTRING_PRIVATE_H
#define __DSTRING_PRIVATE_H

#include "typedefs.h"

struct _DString {
    int char_size;
    int str_len;
    int alloc_size;
    void * val;
};

#endif /* __DSTRING_PRIVATE_H */

/* vi:ts=4:ai:expandtab */
././@PaxHeader0000000000000000000000000000002600000000000011453 xustar000000000000000022 mtime=1585089005.0 datrie-0.8.2/libdatrie/datrie/dstring.c0000644000175000017500000000766400000000000021512 0ustar00tcaswelltcaswell00000000000000
/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
/*
 * libdatrie - Double-Array Trie Library
 * Copyright (C) 2006 Theppitak Karoonboonyanan
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 */

/*
 * dstring.c - Dynamic string type
 * Created: 2012-08-01
 * Author: Theppitak Karoonboonyanan
 */

#include "dstring.h"
#include "dstring-private.h"

#include "trie-private.h"
#include
#include

DString *
dstring_new (int char_size, int n_elm)
{
    DString *ds;

    ds = (DString *) malloc (sizeof (DString));
    if (UNLIKELY (!ds))
        return NULL;

    ds->alloc_size = char_size * n_elm;
    ds->val = malloc (ds->alloc_size);
    if (!ds->val) {
        free (ds);
        return NULL;
    }

    ds->char_size = char_size;
    ds->str_len = 0;

    return ds;
}

void
dstring_free (DString *ds)
{
    free (ds->val);
    free (ds);
}

int
dstring_length (const DString *ds)
{
    return ds->str_len;
}

const void *
dstring_get_val (const DString *ds)
{
    return ds->val;
}

void *
dstring_get_val_rw (DString *ds)
{
    return ds->val;
}

void
dstring_clear (DString *ds)
{
    ds->str_len = 0;
}

static Bool
dstring_ensure_space (DString *ds, int size)
{
    if (ds->alloc_size < size) {
        int re_size = MAX_VAL (ds->alloc_size * 2, size);
        void *re_ptr = realloc (ds->val, re_size);
        if (UNLIKELY (!re_ptr))
            return FALSE;
        ds->val = re_ptr;
        ds->alloc_size = re_size;
    }

    return TRUE;
}

Bool
dstring_copy (DString *dst, const DString *src)
{
    if (!dstring_ensure_space (dst, (src->str_len + 1) * src->char_size))
        return FALSE;

    memcpy (dst->val, src->val, (src->str_len + 1) * src->char_size);

    dst->char_size = src->char_size;
    dst->str_len = src->str_len;

    return TRUE;
}

Bool
dstring_append (DString *dst, const DString *src)
{
    if (dst->char_size != src->char_size)
        return FALSE;

    if (!dstring_ensure_space (dst, (dst->str_len + src->str_len + 1)
                                    * dst->char_size))
    {
        return FALSE;
    }

    memcpy ((char *)dst->val + (dst->char_size * dst->str_len), src->val,
            (src->str_len + 1) * dst->char_size);

    dst->str_len += src->str_len;

    return TRUE;
}

Bool
dstring_append_string (DString *ds, const void *data, int len)
{
    if (!dstring_ensure_space (ds, (ds->str_len + len + 1) * ds->char_size))
        return FALSE;

    memcpy ((char *)ds->val + (ds->char_size * ds->str_len), data,
            ds->char_size * len);

    ds->str_len += len;

    return TRUE;
}

Bool
dstring_append_char (DString *ds, const void *data)
{
    if (!dstring_ensure_space (ds, (ds->str_len + 2) * ds->char_size))
        return FALSE;

    memcpy ((char *)ds->val + (ds->char_size * ds->str_len), data,
            ds->char_size);

    ds->str_len++;

    return TRUE;
}

Bool
dstring_terminate (DString *ds)
{
    if (!dstring_ensure_space (ds, (ds->str_len + 2) * ds->char_size))
        return FALSE;

    memset ((char *)ds->val + (ds->char_size * ds->str_len), 0, ds->char_size);

    return TRUE;
}

Bool
dstring_cut_last (DString *ds)
{
    if (0 == ds->str_len)
        return FALSE;

    ds->str_len--;

    return TRUE;
}

/* vi:ts=4:ai:expandtab */
././@PaxHeader0000000000000000000000000000002600000000000011453 xustar000000000000000022 mtime=1585089005.0 datrie-0.8.2/libdatrie/datrie/dstring.h0000644000175000017500000000345500000000000021511 0ustar00tcaswelltcaswell00000000000000
/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
/*
 * libdatrie - Double-Array Trie Library
 * Copyright (C) 2006 Theppitak Karoonboonyanan
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 */

/*
 * dstring.h - Dynamic string type
 * Created: 2012-08-01
 * Author: Theppitak Karoonboonyanan
 */

#ifndef __DSTRING_H
#define __DSTRING_H

#include "typedefs.h"

typedef struct _DString DString;

DString * dstring_new (int char_size, int n_elm);
void dstring_free (DString *ds);

int dstring_length (const DString *ds);
const void * dstring_get_val (const DString *ds);
void * dstring_get_val_rw (DString *ds);

void dstring_clear (DString *ds);

Bool dstring_copy (DString *dst, const DString *src);
Bool dstring_append (DString *dst, const DString *src);
Bool dstring_append_string (DString *ds, const void *data, int len);
Bool dstring_append_char (DString *ds, const void *data);
Bool dstring_terminate (DString *ds);
Bool dstring_cut_last (DString *ds);

#endif /* __DSTRING_H */

/* vi:ts=4:ai:expandtab */
././@PaxHeader0000000000000000000000000000002600000000000011453 xustar000000000000000022 mtime=1585089005.0 datrie-0.8.2/libdatrie/datrie/fileutils.c0000644000175000017500000000521700000000000022030 0ustar00tcaswelltcaswell00000000000000
/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
/*
 * libdatrie - Double-Array Trie Library
 * Copyright (C) 2006 Theppitak Karoonboonyanan
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 */

/*
 * fileutils.c - File utility functions
 * Created: 2006-08-15
 * Author: Theppitak Karoonboonyanan
 */

#include
#include

#include "fileutils.h"

/* ==================== BEGIN IMPLEMENTATION PART ==================== */

/*--------------------------------*
 * FUNCTIONS IMPLEMENTATIONS *
 *--------------------------------*/

Bool
file_read_int32 (FILE *file, int32 *o_val)
{
    unsigned char buff[4];

    if (fread (buff, 4, 1, file) == 1) {
        *o_val = (buff[0] << 24) | (buff[1] << 16) | (buff[2] << 8) | buff[3];
        return TRUE;
    }

    return FALSE;
}

Bool
file_write_int32 (FILE *file, int32 val)
{
    unsigned char buff[4];

    buff[0] = (val >> 24) & 0xff;
    buff[1] = (val >> 16) & 0xff;
    buff[2] = (val >> 8) & 0xff;
    buff[3] = val & 0xff;

    return (fwrite (buff, 4, 1, file) == 1);
}

Bool
file_read_int16 (FILE *file, int16 *o_val)
{
    unsigned char buff[2];

    if (fread (buff, 2, 1, file) == 1) {
        *o_val = (buff[0] << 8) | buff[1];
        return TRUE;
    }

    return FALSE;
}

Bool
file_write_int16 (FILE *file, int16 val)
{
    unsigned char buff[2];

    buff[0] = val >> 8;
    buff[1] = val & 0xff;

    return (fwrite (buff, 2, 1, file) == 1);
}

Bool
file_read_int8 (FILE *file, int8 *o_val)
{
    return (fread (o_val, sizeof (int8), 1, file) == 1);
}

Bool
file_write_int8 (FILE *file, int8 val)
{
    return (fwrite (&val, sizeof (int8), 1, file) == 1);
}

Bool
file_read_chars (FILE *file, char *buff, int len)
{
    return (fread (buff, sizeof (char), len, file) == len);
}

Bool
file_write_chars (FILE *file, const char *buff, int len)
{
    return (fwrite (buff, sizeof (char), len, file) == len);
}

/* vi:ts=4:ai:expandtab */
././@PaxHeader0000000000000000000000000000002600000000000011453 xustar000000000000000022 mtime=1585089005.0 datrie-0.8.2/libdatrie/datrie/fileutils.h0000644000175000017500000000315100000000000022030 0ustar00tcaswelltcaswell00000000000000
/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
/*
 * libdatrie - Double-Array Trie Library
 * Copyright (C) 2006 Theppitak Karoonboonyanan
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 */

/*
 * fileutils.h - File utility functions
 * Created: 2006-08-14
 * Author: Theppitak Karoonboonyanan
 */

#ifndef __FILEUTILS_H
#define __FILEUTILS_H

#include
#include

Bool file_read_int32 (FILE *file, int32 *o_val);
Bool file_write_int32 (FILE *file, int32 val);

Bool file_read_int16 (FILE *file, int16 *o_val);
Bool file_write_int16 (FILE *file, int16 val);

Bool file_read_int8 (FILE *file, int8 *o_val);
Bool file_write_int8 (FILE *file, int8 val);

Bool file_read_chars (FILE *file, char *buff, int len);
Bool file_write_chars (FILE *file, const char *buff, int len);

#endif /* __FILEUTILS_H */

/* vi:ts=4:ai:expandtab */
././@PaxHeader0000000000000000000000000000002600000000000011453 xustar000000000000000022 mtime=1585089005.0 datrie-0.8.2/libdatrie/datrie/tail.c0000644000175000017500000003151000000000000020754 0ustar00tcaswelltcaswell00000000000000
/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
/*
 * libdatrie - Double-Array Trie Library
 * Copyright (C) 2006 Theppitak Karoonboonyanan
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 */

/*
 * tail.c - trie tail for keeping suffixes
 * Created: 2006-08-15
 * Author: Theppitak Karoonboonyanan
 */

#include
#include
#ifndef _MSC_VER /* for SIZE_MAX */
# include
#endif
#include

#include "tail.h"
#include "trie-private.h"
#include "fileutils.h"

/*----------------------------------*
 * INTERNAL TYPES DECLARATIONS *
 *----------------------------------*/

/*-----------------------------------*
 * PRIVATE METHODS DECLARATIONS *
 *-----------------------------------*/

static TrieIndex tail_alloc_block (Tail *t);
static void tail_free_block (Tail *t, TrieIndex block);

/* ==================== BEGIN IMPLEMENTATION PART ==================== */

/*------------------------------------*
 * INTERNAL TYPES IMPLEMENTATIONS *
 *------------------------------------*/

/*------------------------------*
 * PRIVATE DATA DEFINITIONS *
 *------------------------------*/

typedef struct {
    TrieIndex next_free;
    TrieData data;
    TrieChar *suffix;
} TailBlock;

struct _Tail {
    TrieIndex num_tails;
    TailBlock *tails;
    TrieIndex first_free;
};

/*-----------------------------*
 * METHODS IMPLEMENTATIONS *
 *-----------------------------*/

#define TAIL_SIGNATURE 0xDFFCDFFC
#define TAIL_START_BLOCKNO 1

/* Tail Header:
 * INT32: signature
 * INT32: pointer to first free slot
 * INT32: number of tail blocks
 *
 * Tail Blocks:
 * INT32: pointer to next free block (-1 for allocated blocks)
 * INT32: data for the key
 * INT16: length
 * BYTES[length]: suffix string (no terminating '\0')
 */

/**
 * @brief Create a new tail object
 *
 * Create a new empty tail object.
 */
Tail *
tail_new ()
{
    Tail *t;

    t = (Tail *) malloc (sizeof (Tail));
    if (UNLIKELY (!t))
        return NULL;

    t->first_free = 0;
    t->num_tails = 0;
    t->tails = NULL;

    return t;
}

/**
 * @brief Read tail data from file
 *
 * @param file : the file to read
 *
 * @return a pointer to the opened tail data, NULL on failure
 *
 * Read tail data from the opened file, starting from the current
 * file pointer until the end of tail data block. On return, the
 * file pointer is left at the position after the read block.
 */
Tail *
tail_fread (FILE *file)
{
    long save_pos;
    Tail *t;
    TrieIndex i;
    uint32 sig;

    /* check signature */
    save_pos = ftell (file);
    if (!file_read_int32 (file, (int32 *) &sig) || TAIL_SIGNATURE != sig)
        goto exit_file_read;

    t = (Tail *) malloc (sizeof (Tail));
    if (UNLIKELY (!t))
        goto exit_file_read;

    if (!file_read_int32 (file, &t->first_free) ||
        !file_read_int32 (file, &t->num_tails))
    {
        goto exit_tail_created;
    }
    if (t->num_tails > SIZE_MAX / sizeof (TailBlock))
        goto exit_tail_created;
    t->tails = (TailBlock *) malloc (t->num_tails * sizeof (TailBlock));
    if (UNLIKELY (!t->tails))
        goto exit_tail_created;
    for (i = 0; i < t->num_tails; i++) {
        int16 length;

        if (!file_read_int32 (file, &t->tails[i].next_free) ||
            !file_read_int32 (file, &t->tails[i].data) ||
            !file_read_int16 (file, &length))
        {
            goto exit_in_loop;
        }

        t->tails[i].suffix = (TrieChar *) malloc (length + 1);
        if (UNLIKELY (!t->tails[i].suffix))
            goto exit_in_loop;
        if (length > 0) {
            if (!file_read_chars (file, (char *)t->tails[i].suffix, length)) {
                free (t->tails[i].suffix);
                goto exit_in_loop;
            }
        }
        t->tails[i].suffix[length] = '\0';
    }

    return t;

exit_in_loop:
    while (i > 0) {
        free (t->tails[--i].suffix);
    }
    free (t->tails);
exit_tail_created:
    free (t);
exit_file_read:
    fseek (file, save_pos, SEEK_SET);
    return NULL;
}

/**
 * @brief Free tail data
 *
 * @param t : the tail data
 *
 * Free the given tail data.
 */
void
tail_free (Tail *t)
{
    TrieIndex i;

    if (t->tails) {
        for (i = 0; i < t->num_tails; i++)
            if (t->tails[i].suffix)
                free (t->tails[i].suffix);
        free (t->tails);
    }
    free (t);
}

/**
 * @brief Write tail data
 *
 * @param t : the tail data
 * @param file : the file to write to
 *
 * @return 0 on success, non-zero on failure
 *
 * Write tail data to the given @a file, starting from the current file
 * pointer. On return, the file pointer is left after the tail data block.
 */
int
tail_fwrite (const Tail *t, FILE *file)
{
    TrieIndex i;

    if (!file_write_int32 (file, TAIL_SIGNATURE) ||
        !file_write_int32 (file, t->first_free) ||
        !file_write_int32 (file, t->num_tails))
    {
        return -1;
    }
    for (i = 0; i < t->num_tails; i++) {
        int16 length;

        if (!file_write_int32 (file, t->tails[i].next_free) ||
            !file_write_int32 (file, t->tails[i].data))
        {
            return -1;
        }

        length = t->tails[i].suffix ? strlen ((const char *)t->tails[i].suffix)
                                    : 0;
        if (!file_write_int16 (file, length))
            return -1;
        if (length > 0 &&
            !file_write_chars (file, (char *)t->tails[i].suffix, length))
        {
            return -1;
        }
    }

    return 0;
}

/**
 * @brief Get suffix
 *
 * @param t : the tail data
 * @param index : the index of the suffix
 *
 * @return pointer to the indexed suffix string.
 *
 * Get suffix from tail with given @a index. The returned string is a pointer
 * to internal storage, which should be accessed read-only by the caller.
 * No need to free() it.
 */
const TrieChar *
tail_get_suffix (const Tail *t, TrieIndex index)
{
    index -= TAIL_START_BLOCKNO;
    return LIKELY (index < t->num_tails) ? t->tails[index].suffix : NULL;
}

/**
 * @brief Set suffix of existing entry
 *
 * @param t : the tail data
 * @param index : the index of the suffix
 * @param suffix : the new suffix
 *
 * @return boolean indicating success
 *
 * Set suffix of existing entry of given @a index in tail.
 */
Bool
tail_set_suffix (Tail *t, TrieIndex index, const TrieChar *suffix)
{
    index -= TAIL_START_BLOCKNO;
    if (LIKELY (index < t->num_tails)) {
        /* suffix and t->tails[index].suffix may overlap;
         * so, dup it before it's overwritten
         */
        TrieChar *tmp = NULL;
        if (suffix) {
            tmp = (TrieChar *) strdup ((const char *)suffix);
            if (UNLIKELY (!tmp))
                return FALSE;
        }
        if (t->tails[index].suffix)
            free (t->tails[index].suffix);
        t->tails[index].suffix = tmp;

        return TRUE;
    }
    return FALSE;
}

/**
 * @brief Add a new suffix
 *
 * @param t : the tail data
 * @param suffix : the new suffix
 *
 * @return the index of the newly added suffix,
 * or TRIE_INDEX_ERROR on failure.
 *
 * Add a new suffix entry to tail.
 */
TrieIndex
tail_add_suffix (Tail *t, const TrieChar *suffix)
{
    TrieIndex new_block;

    new_block = tail_alloc_block (t);
    if (UNLIKELY (TRIE_INDEX_ERROR == new_block))
        return TRIE_INDEX_ERROR;

    tail_set_suffix (t, new_block, suffix);

    return new_block;
}

static TrieIndex
tail_alloc_block (Tail *t)
{
    TrieIndex block;

    if (0 != t->first_free) {
        block = t->first_free;
        t->first_free = t->tails[block].next_free;
    } else {
        void *new_block;

        block = t->num_tails;

        new_block = realloc (t->tails, (t->num_tails + 1) * sizeof (TailBlock));
        if (UNLIKELY (!new_block))
            return TRIE_INDEX_ERROR;

        t->tails = (TailBlock *) new_block;
        ++t->num_tails;
    }
    t->tails[block].next_free = -1;
    t->tails[block].data = TRIE_DATA_ERROR;
    t->tails[block].suffix = NULL;

    return block + TAIL_START_BLOCKNO;
}

static void
tail_free_block (Tail *t, TrieIndex block)
{
    TrieIndex i, j;

    block -= TAIL_START_BLOCKNO;

    if (block >= t->num_tails)
        return;

    t->tails[block].data = TRIE_DATA_ERROR;
    if (NULL != t->tails[block].suffix) {
        free (t->tails[block].suffix);
        t->tails[block].suffix = NULL;
    }

    /* find insertion point */
    j = 0;
    for (i = t->first_free; i != 0 && i < block; i = t->tails[i].next_free)
        j = i;

    /* insert free block between j and i */
    t->tails[block].next_free = i;
    if (0 != j)
        t->tails[j].next_free = block;
    else
        t->first_free = block;
}

/**
 * @brief Get data associated to suffix entry
 *
 * @param t : the tail data
 * @param index : the index of the suffix
 *
 * @return the data associated to the suffix entry
 *
 * Get data associated to suffix entry @a index in tail data.
 */
TrieData
tail_get_data (const Tail *t, TrieIndex index)
{
    index -= TAIL_START_BLOCKNO;
    return LIKELY (index < t->num_tails)
             ? t->tails[index].data : TRIE_DATA_ERROR;
}

/**
 * @brief Set data associated to suffix entry
 *
 * @param t : the tail data
 * @param index : the index of the suffix
 * @param data : the data to set
 *
 * @return boolean indicating success
 *
 * Set data associated to suffix entry @a index in tail data.
 */
Bool
tail_set_data (Tail *t, TrieIndex index, TrieData data)
{
    index -= TAIL_START_BLOCKNO;
    if (LIKELY (index < t->num_tails)) {
        t->tails[index].data = data;
        return TRUE;
    }
    return FALSE;
}

/**
 * @brief Delete suffix entry
 *
 * @param t : the tail data
 * @param index : the index of the suffix to delete
 *
 * Delete suffix entry from the tail data.
 */
void
tail_delete (Tail *t, TrieIndex index)
{
    tail_free_block (t, index);
}

/**
 * @brief Walk in tail with a string
 *
 * @param t : the tail data
 * @param s : the tail data index
 * @param suffix_idx : pointer to current character index in suffix
 * @param str : the string to use in walking
 * @param len : total characters in @a str to walk
 *
 * @return total number of characters successfully walked
 *
 * Walk in the tail data @a t at entry @a s, from given character position
 * @a *suffix_idx, using @a len characters of given string @a str. On return,
 * @a *suffix_idx is updated to the position after the last successful walk,
 * and the function returns the total number of characters successfully walked.
 */
int
tail_walk_str (const Tail *t, TrieIndex s,
               short *suffix_idx, const TrieChar *str, int len)
{
    const TrieChar *suffix;
    int i;
    short j;

    suffix = tail_get_suffix (t, s);
    if (UNLIKELY (!suffix))
        return FALSE;

    i = 0;
    j = *suffix_idx;
    while (i < len) {
        if (str[i] != suffix[j])
            break;
        ++i;
        /* stop and stay at null-terminator */
        if (0 == suffix[j])
            break;
        ++j;
    }
    *suffix_idx = j;
    return i;
}

/**
 * @brief Walk in tail with a character
 *
 * @param t : the tail data
 * @param s : the tail data index
 * @param suffix_idx : pointer to current character index in suffix
 * @param c : the character to use in walking
 *
 * @return boolean indicating success
 *
 * Walk in the tail data @a t at entry @a s, from given character position
 * @a *suffix_idx, using given character @a c. If the walk is successful,
 * it returns TRUE, and @a *suffix_idx is updated to the next character.
 * Otherwise, it returns FALSE, and @a *suffix_idx is left unchanged.
 */
Bool
tail_walk_char (const Tail *t, TrieIndex s, short *suffix_idx, TrieChar c)
{
    const TrieChar *suffix;
    TrieChar suffix_char;

    suffix = tail_get_suffix (t, s);
    if (UNLIKELY (!suffix))
        return FALSE;

    suffix_char = suffix[*suffix_idx];
    if (suffix_char == c) {
        if (0 != suffix_char)
            ++*suffix_idx;
        return TRUE;
    }
    return FALSE;
}

/* vi:ts=4:ai:expandtab */
././@PaxHeader0000000000000000000000000000002600000000000011453 xustar000000000000000022 mtime=1585089005.0 datrie-0.8.2/libdatrie/datrie/tail.h0000644000175000017500000000575000000000000020770 0ustar00tcaswelltcaswell00000000000000
/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
/*
 * libdatrie - Double-Array Trie Library
 * Copyright (C) 2006 Theppitak Karoonboonyanan
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
* * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library; if not, write to the Free Software * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */ /* * tail.h - trie tail for keeping suffixes * Created: 2006-08-12 * Author: Theppitak Karoonboonyanan */ #ifndef __TAIL_H #define __TAIL_H #include "triedefs.h" /** * @file tail.h * @brief trie tail for keeping suffixes */ /** * @brief Tail data type */ typedef struct _Tail Tail; Tail * tail_new (void); Tail * tail_fread (FILE *file); void tail_free (Tail *t); int tail_fwrite (const Tail *t, FILE *file); const TrieChar * tail_get_suffix (const Tail *t, TrieIndex index); Bool tail_set_suffix (Tail *t, TrieIndex index, const TrieChar *suffix); TrieIndex tail_add_suffix (Tail *t, const TrieChar *suffix); TrieData tail_get_data (const Tail *t, TrieIndex index); Bool tail_set_data (Tail *t, TrieIndex index, TrieData data); void tail_delete (Tail *t, TrieIndex index); int tail_walk_str (const Tail *t, TrieIndex s, short *suffix_idx, const TrieChar *str, int len); Bool tail_walk_char (const Tail *t, TrieIndex s, short *suffix_idx, TrieChar c); /** * @brief Test walkability in tail with a character * * @param t : the tail data * @param s : the tail data index * @param suffix_idx : current character index in suffix * @param c : the character to test walkability * * @return boolean indicating walkability * * Test if the character @a c can be used to walk from given character * position @a suffix_idx of entry @a s of the tail data @a t.
*/ /* Bool tail_is_walkable_char (Tail *t, TrieIndex s, short suffix_idx, const TrieChar c); */ #define tail_is_walkable_char(t,s,suffix_idx,c) \ (tail_get_suffix ((t), (s)) [suffix_idx] == (c)) #endif /* __TAIL_H */ /* vi:ts=4:ai:expandtab */
datrie-0.8.2/libdatrie/datrie/trie-private.h
/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */ /* * libdatrie - Double-Array Trie Library * Copyright (C) 2006 Theppitak Karoonboonyanan * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details.
* * You should have received a copy of the GNU Lesser General Public * License along with this library; if not, write to the Free Software * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */ /* * trie-private.h - Private utilities for trie implementation * Created: 2007-08-25 * Author: Theppitak Karoonboonyanan */ #ifndef __TRIE_PRIVATE_H #define __TRIE_PRIVATE_H #include /** * @file trie-private.h * @brief Private utilities for trie implementation */ /** * @brief LIKELY and UNLIKELY macros for hinting the compiler * about the expected result of a Boolean expression, for the sake of * optimization */ #if defined(__GNUC__) && (__GNUC__ > 2) && defined(__OPTIMIZE__) #define LIKELY(expr) (__builtin_expect (!!(expr), 1)) #define UNLIKELY(expr) (__builtin_expect (!!(expr), 0)) #else #define LIKELY(expr) (expr) #define UNLIKELY(expr) (expr) #endif /** * @brief Minimum value macro */ #define MIN_VAL(a,b) ((a)<(b)?(a):(b)) /** * @brief Maximum value macro */ #define MAX_VAL(a,b) ((a)>(b)?(a):(b)) #endif /* __TRIE_PRIVATE_H */ /* vi:ts=4:ai:expandtab */ ././@PaxHeader0000000000000000000000000000002600000000000011453 xustar000000000000000022 mtime=1585089005.0 datrie-0.8.2/libdatrie/datrie/trie-string.c0000644000175000017500000000503300000000000022273 0ustar00tcaswelltcaswell00000000000000/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */ /* * libdatrie - Double-Array Trie Library * Copyright (C) 2006 Theppitak Karoonboonyanan * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library; if not, write to the Free Software * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */ /* * trie-string.c - Dynamic string type for Trie alphabets * Created: 2012-08-02 * Author: Theppitak Karoonboonyanan */ #include "trie-string.h" #include "dstring-private.h" #include "triedefs.h" #include struct _TrieString { DString ds; }; TrieString * trie_string_new (int n_elm) { return (TrieString *) dstring_new (sizeof (TrieChar), n_elm); } void trie_string_free (TrieString *ts) { dstring_free ((DString *)ts); } int trie_string_length (const TrieString *ts) { return dstring_length ((DString *)ts); } const void * trie_string_get_val (const TrieString *ts) { return dstring_get_val ((DString *)ts); } void * trie_string_get_val_rw (TrieString *ts) { return dstring_get_val_rw ((DString *)ts); } void trie_string_clear (TrieString *ts) { dstring_clear ((DString *)ts); } Bool trie_string_copy (TrieString *dst, const TrieString *src) { return dstring_copy ((DString *)dst, (const DString *)src); } Bool trie_string_append (TrieString *dst, const TrieString *src) { return dstring_append ((DString *)dst, (const DString *)src); } Bool trie_string_append_string (TrieString *ts, const TrieChar *str) { return dstring_append_string ((DString *)ts, str, strlen ((const char *)str)); } Bool trie_string_append_char (TrieString *ts, TrieChar tc) { return dstring_append_char ((DString *)ts, &tc); } Bool trie_string_terminate (TrieString *ts) { return dstring_terminate ((DString *)ts); } Bool trie_string_cut_last (TrieString *ts) { return dstring_cut_last ((DString *)ts); } /* vi:ts=4:ai:expandtab */
datrie-0.8.2/libdatrie/datrie/trie-string.h
/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */ /* * libdatrie - Double-Array Trie Library * Copyright (C) 2006 Theppitak Karoonboonyanan * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library; if not, write to the Free Software * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */ /* * trie-string.h - Dynamic string type for Trie alphabets * Created: 2012-08-02 * Author: Theppitak Karoonboonyanan */ #ifndef __TRIE_STRING_H #define __TRIE_STRING_H #include "dstring.h" #include "triedefs.h" typedef struct _TrieString TrieString; TrieString * trie_string_new (int n_elm); void trie_string_free (TrieString *ts); int trie_string_length (const TrieString *ts); const void * trie_string_get_val (const TrieString *ts); void * trie_string_get_val_rw (TrieString *ts); void trie_string_clear (TrieString *ts); Bool trie_string_copy (TrieString *dst, const TrieString *src); Bool trie_string_append (TrieString *dst, const TrieString *src); Bool trie_string_append_string (TrieString *ts, const TrieChar *str); Bool trie_string_append_char (TrieString *ts, TrieChar tc); Bool trie_string_terminate (TrieString *ts); Bool trie_string_cut_last (TrieString *ts); #endif /* __TRIE_STRING_H */ /* vi:ts=4:ai:expandtab */
datrie-0.8.2/libdatrie/datrie/trie.c
/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */ /* * libdatrie - Double-Array Trie Library * Copyright (C) 2006 Theppitak Karoonboonyanan * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library; if not, write to the Free Software * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */ /* * trie.c - Trie data type and functions * Created: 2006-08-11 * Author: Theppitak Karoonboonyanan */ #include #include #include "trie.h" #include "trie-private.h" #include "fileutils.h" #include "alpha-map.h" #include "alpha-map-private.h" #include "darray.h" #include "tail.h" #include "trie-string.h" /** * @brief Trie structure */ struct _Trie { AlphaMap *alpha_map; DArray *da; Tail *tail; Bool is_dirty; }; /** * @brief TrieState structure */ struct _TrieState { const Trie *trie; /**< the corresponding trie */ TrieIndex index; /**< index in double-array/tail structures */ short suffix_idx; /**< suffix character offset, if in suffix */ short is_suffix; /**< whether it is currently in suffix part */ }; /** * @brief TrieIterator structure */ struct _TrieIterator { const TrieState *root; /**< the state to start iteration from */ TrieState *state; /**< the current state */ TrieString *key; /**< buffer for calculating the entry key */ }; /*------------------------*
INTERNAL FUNCTIONS * *------------------------*/ #define trie_da_is_separate(da,s) (da_get_base ((da), (s)) < 0) #define trie_da_get_tail_index(da,s) (-da_get_base ((da), (s))) #define trie_da_set_tail_index(da,s,v) (da_set_base ((da), (s), -(v))) static TrieState * trie_state_new (const Trie *trie, TrieIndex index, short suffix_idx, short is_suffix); static Bool trie_store_conditionally (Trie *trie, const AlphaChar *key, TrieData data, Bool is_overwrite); static Bool trie_branch_in_branch (Trie *trie, TrieIndex sep_node, const TrieChar *suffix, TrieData data); static Bool trie_branch_in_tail (Trie *trie, TrieIndex sep_node, const TrieChar *suffix, TrieData data); /*-----------------------* * GENERAL FUNCTIONS * *-----------------------*/ /** * @brief Create a new trie * * @param alpha_map : the alphabet set for the trie * * @return a pointer to the newly created trie, NULL on failure * * Create a new empty trie object based on the given @a alpha_map alphabet * set. The trie contents can then be added and deleted with trie_store() and * trie_delete() respectively. * * The created object must be freed with trie_free(). */ Trie * trie_new (const AlphaMap *alpha_map) { Trie *trie; trie = (Trie *) malloc (sizeof (Trie)); if (UNLIKELY (!trie)) return NULL; trie->alpha_map = alpha_map_clone (alpha_map); if (UNLIKELY (!trie->alpha_map)) goto exit_trie_created; trie->da = da_new (); if (UNLIKELY (!trie->da)) goto exit_alpha_map_created; trie->tail = tail_new (); if (UNLIKELY (!trie->tail)) goto exit_da_created; trie->is_dirty = TRUE; return trie; exit_da_created: da_free (trie->da); exit_alpha_map_created: alpha_map_free (trie->alpha_map); exit_trie_created: free (trie); return NULL; } /** * @brief Create a new trie by loading from a file * * @param path : the path to the file * * @return a pointer to the created trie, NULL on failure * * Create a new trie and initialize its contents by loading from the file at * given @a path. 
* * The created object must be freed with trie_free(). */ Trie * trie_new_from_file (const char *path) { Trie *trie; FILE *trie_file; trie_file = fopen (path, "rb"); if (!trie_file) return NULL; trie = trie_fread (trie_file); fclose (trie_file); return trie; } /** * @brief Create a new trie by reading from an open file * * @param file : the handle of the open file * * @return a pointer to the created trie, NULL on failure * * Create a new trie and initialize its contents by reading from the open * @a file. After reading, the file pointer is left at the end of the trie data. * This can be useful for reading embedded trie index as part of a file data. * * The created object must be freed with trie_free(). * * Available since: 0.2.4 */ Trie * trie_fread (FILE *file) { Trie *trie; trie = (Trie *) malloc (sizeof (Trie)); if (UNLIKELY (!trie)) return NULL; if (NULL == (trie->alpha_map = alpha_map_fread_bin (file))) goto exit_trie_created; if (NULL == (trie->da = da_fread (file))) goto exit_alpha_map_created; if (NULL == (trie->tail = tail_fread (file))) goto exit_da_created; trie->is_dirty = FALSE; return trie; exit_da_created: da_free (trie->da); exit_alpha_map_created: alpha_map_free (trie->alpha_map); exit_trie_created: free (trie); return NULL; } /** * @brief Free a trie object * * @param trie : the trie object to free * * Destruct the @a trie and free its allocated memory. */ void trie_free (Trie *trie) { alpha_map_free (trie->alpha_map); da_free (trie->da); tail_free (trie->tail); free (trie); } /** * @brief Save a trie to file * * @param trie : the trie * * @param path : the path to the file * * @return 0 on success, non-zero on failure * * Create a new file at the given @a path and write @a trie data to it. * If @a path already exists, its contents will be replaced. 
*/ int trie_save (Trie *trie, const char *path) { FILE *file; int res = 0; file = fopen (path, "wb+"); if (!file) return -1; res = trie_fwrite (trie, file); fclose (file); return res; } /** * @brief Write trie data to an open file * * @param trie : the trie * * @param file : the open file * * @return 0 on success, non-zero on failure * * Write @a trie data to @a file which is opened for writing. * After writing, the file pointer is left at the end of the trie data. * This can be useful for embedding trie index as part of a file data. * * Available since: 0.2.4 */ int trie_fwrite (Trie *trie, FILE *file) { if (alpha_map_fwrite_bin (trie->alpha_map, file) != 0) return -1; if (da_fwrite (trie->da, file) != 0) return -1; if (tail_fwrite (trie->tail, file) != 0) return -1; trie->is_dirty = FALSE; return 0; } /** * @brief Check pending changes * * @param trie : the trie object * * @return TRUE if there are pending changes, FALSE otherwise * * Check if the @a trie is dirty with some pending changes and needs saving * to synchronize with the file. */ Bool trie_is_dirty (const Trie *trie) { return trie->is_dirty; } /*------------------------------* * GENERAL QUERY OPERATIONS * *------------------------------*/ /** * @brief Retrieve an entry from trie * * @param trie : the trie * @param key : the key for the entry to retrieve * @param o_data : the storage for storing the entry data on return * * @return boolean value indicating the existence of the entry. * * Retrieve an entry for the given @a key from @a trie. On return, * if @a key is found and @a o_data is not NULL, @a *o_data is set * to the data associated to @a key. 
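To make the branch-walking loop in trie_retrieve() concrete, here is a toy, self-contained double array built by hand; all of its values are hypothetical. It assumes the classic Aoe transition rule: t = base[s] + c is accepted only when check[t] == s. Raw codes are 0 (terminator), 1 ('a') and 2 ('b'), and the data is stored directly in a negative base. In libdatrie a negative base instead marks a separate node whose tail index is -base, and the rest of the key is matched against the tail. That tail step is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Toy double array holding "a" -> 10 and "ab" -> 20. Cells 0 and 4 are
// unused; cell 1 is the root. A negative base marks a leaf with data -base.
enum { DA_SIZE = 7 };
static const int base[DA_SIZE]  = { 0, 1, 3, -10, 0, 6, -20 };
static const int check[DA_SIZE] = { 0, 0, 1,   2, 0, 2,   5 };

// Hypothetical alphabet map: returns -1 for characters outside the alphabet.
static int char_to_code (char c)
{
    switch (c) {
    case '\0': return 0;       // terminator
    case 'a':  return 1;
    case 'b':  return 2;
    default:   return -1;
    }
}

// One transition: t = base[s] + code is valid only if check[t] == s.
static bool toy_da_walk (int *s, int code)
{
    int t;
    if (base[*s] < 0)
        return false;          // leaves have no outgoing transitions
    t = base[*s] + code;
    if (t < 0 || t >= DA_SIZE || check[t] != *s)
        return false;
    *s = t;
    return true;
}

// Walk every key character plus the terminator, then read the leaf's data.
static bool toy_retrieve (const char *key, int *o_data)
{
    int s = 1;                 // root cell
    const char *p = key;

    assert (key != NULL);
    for (;;) {
        int code = char_to_code (*p);
        if (code < 0 || !toy_da_walk (&s, code))
            return false;
        if ('\0' == *p)
            break;
        ++p;
    }
    if (base[s] >= 0)
        return false;          // defensive: not a leaf
    if (o_data)
        *o_data = -base[s];
    return true;
}
```

With this table, "a" yields 10 and "ab" yields 20. "b", "aa" and the empty key fail their check test. "abc" fails on the unmapped 'c'.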
*/ Bool trie_retrieve (const Trie *trie, const AlphaChar *key, TrieData *o_data) { TrieIndex s; short suffix_idx; const AlphaChar *p; /* walk through branches */ s = da_get_root (trie->da); for (p = key; !trie_da_is_separate (trie->da, s); p++) { TrieIndex tc = alpha_map_char_to_trie (trie->alpha_map, *p); if (TRIE_INDEX_MAX == tc) return FALSE; if (!da_walk (trie->da, &s, (TrieChar) tc)) return FALSE; if (0 == *p) break; } /* walk through tail */ s = trie_da_get_tail_index (trie->da, s); suffix_idx = 0; for ( ; ; p++) { TrieIndex tc = alpha_map_char_to_trie (trie->alpha_map, *p); if (TRIE_INDEX_MAX == tc) return FALSE; if (!tail_walk_char (trie->tail, s, &suffix_idx, (TrieChar) tc)) return FALSE; if (0 == *p) break; } /* found, set the val and return */ if (o_data) *o_data = tail_get_data (trie->tail, s); return TRUE; } /** * @brief Store a value for an entry to trie * * @param trie : the trie * @param key : the key for the entry to store * @param data : the data associated with the entry * * @return boolean value indicating the success of the operation * * Store @a data for the given @a key in @a trie. If @a key does not * exist in @a trie, it will be added. If it does, its current data will * be overwritten. */ Bool trie_store (Trie *trie, const AlphaChar *key, TrieData data) { return trie_store_conditionally (trie, key, data, TRUE); } /** * @brief Store a value for an entry to trie only if the key is not present * * @param trie : the trie * @param key : the key for the entry to store * @param data : the data associated with the entry * * @return boolean value indicating the success of the operation * * Store @a data for the given @a key in @a trie. If @a key does not * exist in @a trie, it will be added. If it does, the function will * return failure and the existing value will not be touched. 
* * This can be useful in multi-threaded applications: a single call * replaces a separate check-then-store sequence, so that particular race * is avoided. The trie itself performs no locking, however, so concurrent * access still needs external synchronization.
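trie_store() and trie_store_if_absent() differ only in the is_overwrite flag they pass to trie_store_conditionally(). The sketch below shows that contract on a hypothetical fixed-size array map. ToyMap and its functions are illustrative only, not libdatrie API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { MAP_CAP = 8 };

// Hypothetical fixed-capacity map used only to show the is_overwrite flag.
typedef struct {
    const char *keys[MAP_CAP];  // pointers are stored; caller keeps keys alive
    int         vals[MAP_CAP];
    int         n;
} ToyMap;

static bool toy_store_conditionally (ToyMap *m, const char *key, int val,
                                     bool is_overwrite)
{
    int i;

    assert (m != NULL && key != NULL);
    for (i = 0; i < m->n; i++) {
        if (0 == strcmp (m->keys[i], key)) {
            if (!is_overwrite)
                return false;   // key present: existing value left untouched
            m->vals[i] = val;
            return true;
        }
    }
    if (m->n == MAP_CAP)
        return false;           // no room for a new key
    m->keys[m->n] = key;
    m->vals[m->n] = val;
    m->n++;
    return true;
}

static bool toy_store (ToyMap *m, const char *key, int val)
{
    return toy_store_conditionally (m, key, val, true);
}

static bool toy_store_if_absent (ToyMap *m, const char *key, int val)
{
    return toy_store_conditionally (m, key, val, false);
}
```

A second toy_store_if_absent() on the same key returns false and keeps the first value. toy_store() replaces it.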
* * Available since: 0.2.4 */ Bool trie_store_if_absent (Trie *trie, const AlphaChar *key, TrieData data) { return trie_store_conditionally (trie, key, data, FALSE); } static Bool trie_store_conditionally (Trie *trie, const AlphaChar *key, TrieData data, Bool is_overwrite) { TrieIndex s, t; short suffix_idx; const AlphaChar *p, *sep; /* walk through branches */ s = da_get_root (trie->da); for (p = key; !trie_da_is_separate (trie->da, s); p++) { TrieIndex tc = alpha_map_char_to_trie (trie->alpha_map, *p); if (TRIE_INDEX_MAX == tc) return FALSE; if (!da_walk (trie->da, &s, (TrieChar) tc)) { TrieChar *key_str; Bool res; key_str = alpha_map_char_to_trie_str (trie->alpha_map, p); if (!key_str) return FALSE; res = trie_branch_in_branch (trie, s, key_str, data); free (key_str); return res; } if (0 == *p) break; } /* walk through tail */ sep = p; t = trie_da_get_tail_index (trie->da, s); suffix_idx = 0; for ( ; ; p++) { TrieIndex tc = alpha_map_char_to_trie (trie->alpha_map, *p); if (TRIE_INDEX_MAX == tc) return FALSE; if (!tail_walk_char (trie->tail, t, &suffix_idx, (TrieChar) tc)) { TrieChar *tail_str; Bool res; tail_str = alpha_map_char_to_trie_str (trie->alpha_map, sep); if (!tail_str) return FALSE; res = trie_branch_in_tail (trie, s, tail_str, data); free (tail_str); return res; } if (0 == *p) break; } /* duplicated key, overwrite val if flagged */ if (!is_overwrite) { return FALSE; } tail_set_data (trie->tail, t, data); trie->is_dirty = TRUE; return TRUE; } static Bool trie_branch_in_branch (Trie *trie, TrieIndex sep_node, const TrieChar *suffix, TrieData data) { TrieIndex new_da, new_tail; new_da = da_insert_branch (trie->da, sep_node, *suffix); if (TRIE_INDEX_ERROR == new_da) return FALSE; if ('\0' != *suffix) ++suffix; new_tail = tail_add_suffix (trie->tail, suffix); tail_set_data (trie->tail, new_tail, data); trie_da_set_tail_index (trie->da, new_da, new_tail); trie->is_dirty = TRUE; return TRUE; } static Bool trie_branch_in_tail (Trie *trie, TrieIndex sep_node, 
const TrieChar *suffix, TrieData data) { TrieIndex old_tail, old_da, s; const TrieChar *old_suffix, *p; /* adjust separate point in old path */ old_tail = trie_da_get_tail_index (trie->da, sep_node); old_suffix = tail_get_suffix (trie->tail, old_tail); if (!old_suffix) return FALSE; for (p = old_suffix, s = sep_node; *p == *suffix; p++, suffix++) { TrieIndex t = da_insert_branch (trie->da, s, *p); if (TRIE_INDEX_ERROR == t) goto fail; s = t; } old_da = da_insert_branch (trie->da, s, *p); if (TRIE_INDEX_ERROR == old_da) goto fail; if ('\0' != *p) ++p; tail_set_suffix (trie->tail, old_tail, p); trie_da_set_tail_index (trie->da, old_da, old_tail); /* insert the new branch at the new separate point */ return trie_branch_in_branch (trie, s, suffix, data); fail: /* failed, undo previous insertions and return error */ da_prune_upto (trie->da, sep_node, s); trie_da_set_tail_index (trie->da, sep_node, old_tail); return FALSE; } /** * @brief Delete an entry from trie * * @param trie : the trie * @param key : the key for the entry to delete * * @return boolean value indicating whether the key exists and is removed * * Delete an entry for the given @a key from @a trie. 
*/ Bool trie_delete (Trie *trie, const AlphaChar *key) { TrieIndex s, t; short suffix_idx; const AlphaChar *p; /* walk through branches */ s = da_get_root (trie->da); for (p = key; !trie_da_is_separate (trie->da, s); p++) { TrieIndex tc = alpha_map_char_to_trie (trie->alpha_map, *p); if (TRIE_INDEX_MAX == tc) return FALSE; if (!da_walk (trie->da, &s, (TrieChar) tc)) return FALSE; if (0 == *p) break; } /* walk through tail */ t = trie_da_get_tail_index (trie->da, s); suffix_idx = 0; for ( ; ; p++) { TrieIndex tc = alpha_map_char_to_trie (trie->alpha_map, *p); if (TRIE_INDEX_MAX == tc) return FALSE; if (!tail_walk_char (trie->tail, t, &suffix_idx, (TrieChar) tc)) return FALSE; if (0 == *p) break; } tail_delete (trie->tail, t); da_set_base (trie->da, s, TRIE_INDEX_ERROR); da_prune (trie->da, s); trie->is_dirty = TRUE; return TRUE; } /** * @brief Enumerate entries in trie * * @param trie : the trie * @param enum_func : the callback function to be called on each key * @param user_data : user-supplied data to send as an argument to @a enum_func * * @return boolean value indicating whether all the keys are visited * * Enumerate all entries in trie. For each entry, the user-supplied * @a enum_func callback function is called, with the entry key and data. * Returning FALSE from such callback will stop enumeration and return FALSE. 
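The sketch below models the enum_func contract on a plain array of entries: return true to keep going, false to stop. As in trie_enumerate(), the overall result is whatever the last callback returned. ToyEnumFunc, ToyEntry, toy_enumerate() and sum_until_limit() are hypothetical names, and the entries are an array rather than a trie.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical callback type: return true to continue, false to stop.
typedef bool (*ToyEnumFunc) (const char *key, int data, void *user_data);

typedef struct { const char *key; int data; } ToyEntry;

static bool toy_enumerate (const ToyEntry *entries, size_t n,
                           ToyEnumFunc enum_func, void *user_data)
{
    size_t i;
    bool cont = true;

    assert (enum_func != NULL);
    for (i = 0; cont && i < n; i++)
        cont = enum_func (entries[i].key, entries[i].data, user_data);
    return cont;               // false: the callback stopped the walk
}

// Example callback: sum data values, stopping once the sum reaches a limit.
typedef struct { int sum; int limit; } SumState;

static bool sum_until_limit (const char *key, int data, void *user_data)
{
    SumState *st = user_data;

    (void) key;
    st->sum += data;
    return st->sum < st->limit;
}
```

Passing state through user_data keeps the callback free of globals, which is the role the user_data parameter plays above.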
*/ Bool trie_enumerate (const Trie *trie, TrieEnumFunc enum_func, void *user_data) { TrieState *root; TrieIterator *iter; Bool cont = TRUE; root = trie_root (trie); if (UNLIKELY (!root)) return FALSE; iter = trie_iterator_new (root); if (UNLIKELY (!iter)) goto exit_root_created; while (cont && trie_iterator_next (iter)) { AlphaChar *key = trie_iterator_get_key (iter); TrieData data = trie_iterator_get_data (iter); cont = (*enum_func) (key, data, user_data); free (key); } trie_iterator_free (iter); trie_state_free (root); return cont; exit_root_created: trie_state_free (root); return FALSE; } /*-------------------------------* * STEPWISE QUERY OPERATIONS * *-------------------------------*/ /** * @brief Get root state of a trie * * @param trie : the trie * * @return the root state of the trie * * Get root state of @a trie, for stepwise walking. * * The returned state is allocated and must be freed with trie_state_free() */ TrieState * trie_root (const Trie *trie) { return trie_state_new (trie, da_get_root (trie->da), 0, FALSE); } /*----------------* * TRIE STATE * *----------------*/ static TrieState * trie_state_new (const Trie *trie, TrieIndex index, short suffix_idx, short is_suffix) { TrieState *s; s = (TrieState *) malloc (sizeof (TrieState)); if (UNLIKELY (!s)) return NULL; s->trie = trie; s->index = index; s->suffix_idx = suffix_idx; s->is_suffix = is_suffix; return s; } /** * @brief Copy trie state to another * * @param dst : the destination state * @param src : the source state * * Copy trie state data from @a src to @a dst. All existing data in @a dst * is overwritten. */ void trie_state_copy (TrieState *dst, const TrieState *src) { /* May be deep copy if necessary, not the case for now */ *dst = *src; } /** * @brief Clone a trie state * * @param s : the state to clone * * @return a duplicated instance of @a s * * Make a copy of trie state.
* * The returned state is allocated and must be freed with trie_state_free() */ TrieState * trie_state_clone (const TrieState *s) { return trie_state_new (s->trie, s->index, s->suffix_idx, s->is_suffix); } /** * @brief Free a trie state * * @param s : the state to free * * Free the trie state. */ void trie_state_free (TrieState *s) { free (s); } /** * @brief Rewind a trie state * * @param s : the state to rewind * * Put the state at root. */ void trie_state_rewind (TrieState *s) { s->index = da_get_root (s->trie->da); s->is_suffix = FALSE; } /** * @brief Walk the trie from the state * * @param s : current state * @param c : key character for walking * * @return boolean value indicating the success of the walk * * Walk the trie stepwise, using a given character @a c. * On return, the state @a s is updated to the new state if successfully walked. */ Bool trie_state_walk (TrieState *s, AlphaChar c) { TrieIndex tc = alpha_map_char_to_trie (s->trie->alpha_map, c); if (UNLIKELY (TRIE_INDEX_MAX == tc)) return FALSE; if (!s->is_suffix) { Bool ret; ret = da_walk (s->trie->da, &s->index, (TrieChar) tc); if (ret && trie_da_is_separate (s->trie->da, s->index)) { s->index = trie_da_get_tail_index (s->trie->da, s->index); s->suffix_idx = 0; s->is_suffix = TRUE; } return ret; } else { return tail_walk_char (s->trie->tail, s->index, &s->suffix_idx, (TrieChar) tc); } } /** * @brief Test walkability of character from state * * @param s : the state to check * @param c : the input character * * @return boolean indicating walkability * * Test if there is a transition from state @a s with input character @a c. 
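A common use of stepwise walking is longest-prefix matching: walk one character at a time and remember the last terminal state seen. The sketch below shows that pattern on a tiny hypothetical child-table trie, not a double array. ToyState, toy_walk(), toy_is_walkable() and toy_longest_prefix() are illustrative names. Terminal states are flagged with a per-node data field here. libdatrie's trie_state_get_data() instead tests for a TRIE_CHAR_TERM transition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical trie over 'a'..'c' storing "a" -> 1 and "abc" -> 3.
// Node 0 is the root; a next_node entry of 0 means "no transition"
// (the root is never a child, so 0 is free to act as the sentinel).
enum { N_NODES = 4, N_SYMS = 3 };
static const int next_node[N_NODES][N_SYMS] = {
    { 1, 0, 0 },   // root -a-> 1
    { 0, 2, 0 },   // 1 -b-> 2
    { 0, 0, 3 },   // 2 -c-> 3
    { 0, 0, 0 },
};
static const int node_data[N_NODES] = { -1, 1, -1, 3 };  // -1: not terminal

typedef struct { int node; } ToyState;

static bool toy_is_walkable (const ToyState *s, char c)
{
    int sym = c - 'a';
    return sym >= 0 && sym < N_SYMS && next_node[s->node][sym] != 0;
}

static bool toy_walk (ToyState *s, char c)
{
    if (!toy_is_walkable (s, c))
        return false;          // state unchanged on failure
    s->node = next_node[s->node][c - 'a'];
    return true;
}

// Returns the length of the longest stored key that prefixes `text`
// (0 if none) and stores its data in *o_data when o_data is non-NULL.
static size_t toy_longest_prefix (const char *text, int *o_data)
{
    ToyState s = { 0 };        // start at the root, like trie_root()
    size_t i, best = 0;

    assert (text != NULL);
    for (i = 0; text[i] && toy_walk (&s, text[i]); i++) {
        if (node_data[s.node] >= 0) {
            best = i + 1;
            if (o_data)
                *o_data = node_data[s.node];
        }
    }
    return best;
}
```

Because a failed walk leaves the state untouched, the loop can simply stop at the first non-walkable character without rewinding.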
*/ Bool trie_state_is_walkable (const TrieState *s, AlphaChar c) { TrieIndex tc = alpha_map_char_to_trie (s->trie->alpha_map, c); if (UNLIKELY (TRIE_INDEX_MAX == tc)) return FALSE; if (!s->is_suffix) return da_is_walkable (s->trie->da, s->index, (TrieChar) tc); else return tail_is_walkable_char (s->trie->tail, s->index, s->suffix_idx, (TrieChar) tc); } /** * @brief Get all walkable characters from state * * @param s : the state to get * @param chars : the storage for the result * @param chars_nelm : the size of @a chars[] in number of elements * * @return total walkable characters * * Get the list of all walkable characters from state @a s. At most * @a chars_nelm walkable characters are stored in @a chars[] on return. * * The function returns the actual number of walkable characters from @a s. * Note that this may not equal the number of characters stored in @a chars[] * if @a chars_nelm is less than the actual number. * * Available since: 0.2.6 */ int trie_state_walkable_chars (const TrieState *s, AlphaChar chars[], int chars_nelm) { int syms_num = 0; if (!s->is_suffix) { Symbols *syms = da_output_symbols (s->trie->da, s->index); int i; syms_num = symbols_num (syms); for (i = 0; i < syms_num && i < chars_nelm; i++) { TrieChar tc = symbols_get (syms, i); chars[i] = alpha_map_trie_to_char (s->trie->alpha_map, tc); } symbols_free (syms); } else { const TrieChar *suffix = tail_get_suffix (s->trie->tail, s->index); if (chars_nelm > 0) chars[0] = alpha_map_trie_to_char (s->trie->alpha_map, suffix[s->suffix_idx]); syms_num = 1; } return syms_num; } /** * @brief Check for single path * * @param s : the state to check * * @return boolean value indicating whether it is in a single path * * Check if the given state is in a single path, that is, there is no other * branch from it to leaf.
*/ Bool trie_state_is_single (const TrieState *s) { return s->is_suffix; } /** * @brief Get data from terminal state * * @param s : a terminal state * * @return the data associated with the terminal state @a s, * or TRIE_DATA_ERROR if @a s is not a terminal state * * Get value from a terminal state of trie. Getting value from a non-terminal * state will result in TRIE_DATA_ERROR. */ TrieData trie_state_get_data (const TrieState *s) { if (!s) { return TRIE_DATA_ERROR; } if (!s->is_suffix) { TrieIndex index = s->index; /* walk a terminal char to get the data from tail */ if (da_walk (s->trie->da, &index, TRIE_CHAR_TERM)) { if (trie_da_is_separate (s->trie->da, index)) { index = trie_da_get_tail_index (s->trie->da, index); return tail_get_data (s->trie->tail, index); } } } else { if (tail_is_walkable_char (s->trie->tail, s->index, s->suffix_idx, TRIE_CHAR_TERM)) { return tail_get_data (s->trie->tail, s->index); } } return TRIE_DATA_ERROR; } /*---------------------* * ENTRY ITERATION * *---------------------*/ /** * @brief Create a new trie iterator * * @param s : the TrieState to start iteration from * * @return a pointer to the newly created TrieIterator, or NULL on failure * * Create a new trie iterator for iterating entries of a sub-trie rooted at * state @a s. * * Use it with the result of trie_root() to iterate the whole trie. * * The created object must be freed with trie_iterator_free(). * * Available since: 0.2.6 */ TrieIterator * trie_iterator_new (TrieState *s) { TrieIterator *iter; iter = (TrieIterator *) malloc (sizeof (TrieIterator)); if (UNLIKELY (!iter)) return NULL; iter->root = s; iter->state = NULL; iter->key = NULL; return iter; } /** * @brief Free a trie iterator * * @param iter : the trie iterator to free * * Destruct the iterator @a iter and free its allocated memory. 
* * Available since: 0.2.6 */ void trie_iterator_free (TrieIterator *iter) { if (iter->state) { trie_state_free (iter->state); } if (iter->key) { trie_string_free (iter->key); } free (iter); } /** * @brief Move trie iterator to the next entry * * @param iter : an iterator * * @return boolean value indicating the availability of the entry * * Move trie iterator to the next entry. * On return, the iterator @a iter is updated to reference the new entry * if successfully moved. * * Available since: 0.2.6 */ Bool trie_iterator_next (TrieIterator *iter) { TrieState *s = iter->state; TrieIndex sep; /* first iteration */ if (!s) { s = iter->state = trie_state_clone (iter->root); if (UNLIKELY (!s)) return FALSE; /* for tail state, we are already at the only entry */ if (s->is_suffix) return TRUE; iter->key = trie_string_new (20); sep = da_first_separate (s->trie->da, s->index, iter->key); if (TRIE_INDEX_ERROR == sep) return FALSE; s->index = sep; return TRUE; } /* no next entry for tail state */ if (s->is_suffix) return FALSE; /* iter->state is a separate node */ sep = da_next_separate (s->trie->da, iter->root->index, s->index, iter->key); if (TRIE_INDEX_ERROR == sep) return FALSE; s->index = sep; return TRUE; } /** * @brief Get key for a trie iterator * * @param iter : an iterator * * @return the allocated key string; NULL on failure * * Get key for the current entry referenced by the trie iterator @a iter. * * The returned string must be freed with free().
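The key is assembled from two parts: the branch prefix accumulated in the iterator's key buffer and the tail suffix. Both are mapped back to alphabet characters, and the result is 0-terminated. The sketch below shows that composition under stated assumptions. AlphaChar is treated as a 32-bit code and TrieChar as a raw byte. The hypothetical toy_trie_to_char() stands in for alpha_map_trie_to_char() with a made-up mapping of raw 1..26 to 'a'..'z'.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t AlphaChar;      // assumption: 32-bit alphabet characters
typedef unsigned char TrieChar;  // assumption: raw byte codes

// Hypothetical mapping standing in for alpha_map_trie_to_char().
static AlphaChar toy_trie_to_char (TrieChar tc)
{
    return (AlphaChar) ('a' + tc - 1);
}

// Concatenate `key_len` raw prefix codes (not NUL-terminated) and the
// NUL-terminated raw `suffix` into a new 0-terminated AlphaChar string.
// The caller must free() the result; returns NULL if allocation fails.
static AlphaChar *toy_build_key (const TrieChar *prefix, size_t key_len,
                                 const TrieChar *suffix)
{
    size_t suffix_len, i;
    AlphaChar *key, *p;

    assert (suffix != NULL && (prefix != NULL || key_len == 0));
    suffix_len = strlen ((const char *) suffix);
    key = malloc (sizeof (AlphaChar) * (key_len + suffix_len + 1));
    if (!key)
        return NULL;
    p = key;
    for (i = 0; i < key_len; i++)
        *p++ = toy_trie_to_char (prefix[i]);
    for (i = 0; i < suffix_len; i++)
        *p++ = toy_trie_to_char (suffix[i]);
    *p = 0;
    return key;
}
```

Sizing the buffer as prefix length plus suffix length plus one leaves room for the terminating 0. That terminator is why the result can be handed to code expecting a 0-terminated AlphaChar string.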
* * Available since: 0.2.6 */ AlphaChar * trie_iterator_get_key (const TrieIterator *iter) { const TrieState *s; const TrieChar *tail_str; AlphaChar *alpha_key, *alpha_p; s = iter->state; if (!s) return NULL; /* if s is in tail, root == s */ if (s->is_suffix) { tail_str = tail_get_suffix (s->trie->tail, s->index); if (!tail_str) return NULL; tail_str += s->suffix_idx; alpha_key = (AlphaChar *) malloc (sizeof (AlphaChar) * (strlen ((const char *)tail_str) + 1)); if (UNLIKELY (!alpha_key)) return NULL; alpha_p = alpha_key; } else { TrieIndex tail_idx; int i, key_len; const TrieChar *key_p; tail_idx = trie_da_get_tail_index (s->trie->da, s->index); tail_str = tail_get_suffix (s->trie->tail, tail_idx); if (!tail_str) return NULL; key_len = trie_string_length (iter->key); key_p = trie_string_get_val (iter->key); alpha_key = (AlphaChar *) malloc ( sizeof (AlphaChar) * (key_len + strlen ((const char *)tail_str) + 1) ); if (UNLIKELY (!alpha_key)) return NULL; alpha_p = alpha_key; for (i = key_len; i > 0; i--) { *alpha_p++ = alpha_map_trie_to_char (s->trie->alpha_map, *key_p++); } } while (*tail_str) { *alpha_p++ = alpha_map_trie_to_char (s->trie->alpha_map, *tail_str++); } *alpha_p = 0; return alpha_key; } /** * @brief Get data for the entry referenced by an iterator * * @param iter : an iterator * * @return the data associated with the entry referenced by iterator @a iter, * or TRIE_DATA_ERROR if @a iter does not reference a unique entry * * Get value for the entry referenced by an iterator. Getting value from an * un-iterated (or broken for any reason) iterator will result in * TRIE_DATA_ERROR.
 *
 * Available since: 0.2.6
 */
TrieData
trie_iterator_get_data (const TrieIterator *iter)
{
    const TrieState *s = iter->state;
    TrieIndex        tail_index;

    if (!s)
        return TRIE_DATA_ERROR;

    if (!s->is_suffix) {
        if (!trie_da_is_separate (s->trie->da, s->index))
            return TRIE_DATA_ERROR;

        tail_index = trie_da_get_tail_index (s->trie->da, s->index);
    } else {
        tail_index = s->index;
    }

    return tail_get_data (s->trie->tail, tail_index);
}

/* vi:ts=4:ai:expandtab */

datrie-0.8.2/libdatrie/datrie/trie.h
/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
/*
 * libdatrie - Double-Array Trie Library
 * Copyright (C) 2006 Theppitak Karoonboonyanan
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 */

/*
 * trie.h - Trie data type and functions
 * Created: 2006-08-11
 * Author:  Theppitak Karoonboonyanan
 */

#ifndef __TRIE_H
#define __TRIE_H

#include
#include

#ifdef __cplusplus
extern "C" {
#endif

/**
 * @file trie.h
 * @brief Trie data type and functions
 *
 * Trie is a kind of digital search tree, an efficient indexing method with
 * O(1) time complexity for searching with respect to the number of stored
 * keys (lookup cost grows only with the key length). Comparable in efficiency
 * to hashing, trie also provides flexibility on incremental matching and key
 * spelling manipulation. This makes it ideal for lexical analyzers, as well
 * as spelling dictionaries.
 *
 * This library is an implementation of double-array structure for representing
 * trie, as proposed by Junichi Aoe. The details of the implementation can be
 * found at http://linux.thai.net/~thep/datrie/datrie.html
 *
 * A Trie is associated with an AlphaMap, a map between actual alphabet
 * characters and the raw characters used to walk through trie.
 * You can define the alphabet set by adding ranges of character codes
 * to it before associating it to a trie. Keys added to the trie must
 * consist only of characters in those ranges. Note that the size of the
 * alphabet set is limited to 256 (TRIE_CHAR_MAX + 1), and the AlphaMap
 * will map the alphabet characters to raw codes in the range 0..255
 * (0..TRIE_CHAR_MAX). The alphabet character ranges need not be contiguous,
 * but the mapped raw codes will be contiguous, for the sake of compactness
 * of the trie.
 *
 * A new Trie can be created in memory using trie_new(), saved to file using
 * trie_save(), and loaded later with trie_new_from_file().
 * It can even be embedded in another file using trie_fwrite() and read back
 * using trie_fread().
 * After use, Trie objects must be freed using trie_free().
 *
 * Operations on trie include:
 *
 * - Add/delete entries with trie_store() and trie_delete()
 * - Retrieve entries with trie_retrieve()
 * - Walk through trie stepwise with TrieState and its functions
 *   (trie_root(), trie_state_walk(), trie_state_rewind(),
 *   trie_state_clone(), trie_state_copy(),
 *   trie_state_is_walkable(), trie_state_walkable_chars(),
 *   trie_state_is_single(), trie_state_get_data().
 *   And do not forget to free TrieState objects with trie_state_free()
 *   after use.)
 * - Enumerate all keys using trie_enumerate()
 * - Iterate entries using TrieIterator and its functions
 *   (trie_iterator_new(), trie_iterator_next(), trie_iterator_get_key(),
 *   trie_iterator_get_data().
 *   And do not forget to free TrieIterator objects with trie_iterator_free()
 *   after use.)
 */

/**
 * @brief Trie data type
 */
typedef struct _Trie Trie;

/**
 * @brief Trie enumeration function
 *
 * @param key       : the key of the entry
 * @param key_data  : the data of the entry
 * @param user_data : the user-supplied data on enumerate call
 *
 * @return TRUE to continue enumeration, FALSE to stop
 */
typedef Bool (*TrieEnumFunc) (const AlphaChar *key,
                              TrieData         key_data,
                              void            *user_data);

/**
 * @brief Trie walking state
 */
typedef struct _TrieState TrieState;

/**
 * @brief Trie iteration state
 */
typedef struct _TrieIterator TrieIterator;

/*-----------------------*
 *   GENERAL FUNCTIONS   *
 *-----------------------*/

Trie *
trie_new (const AlphaMap *alpha_map);

Trie *
trie_new_from_file (const char *path);

Trie *
trie_fread (FILE *file);

void
trie_free (Trie *trie);

int
trie_save (Trie *trie, const char *path);

int
trie_fwrite (Trie *trie, FILE *file);

Bool
trie_is_dirty (const Trie *trie);

/*------------------------------*
 *   GENERAL QUERY OPERATIONS   *
 *------------------------------*/

Bool
trie_retrieve (const Trie *trie, const AlphaChar *key, TrieData *o_data);

Bool
trie_store (Trie *trie, const AlphaChar *key, TrieData data);

Bool
trie_store_if_absent (Trie *trie, const AlphaChar *key, TrieData data);

Bool
trie_delete (Trie *trie, const AlphaChar *key);

Bool
trie_enumerate (const Trie *trie, TrieEnumFunc enum_func, void *user_data);

/*-------------------------------*
 *   STEPWISE QUERY OPERATIONS   *
 *-------------------------------*/

TrieState *
trie_root (const Trie *trie);

/*----------------*
 *   TRIE STATE   *
 *----------------*/

TrieState *
trie_state_clone (const TrieState *s);

void
trie_state_copy (TrieState *dst, const TrieState *src);

void
trie_state_free (TrieState *s);

void
trie_state_rewind (TrieState *s);

Bool
trie_state_walk (TrieState *s, AlphaChar c);

Bool
trie_state_is_walkable (const TrieState *s, AlphaChar c);

int
trie_state_walkable_chars (const TrieState *s,
                           AlphaChar        chars[],
                           int              chars_nelm);

/**
 * @brief Check for terminal state
 *
 * @param s : the state to check
 *
 * @return boolean value indicating whether it is a terminal state
 *
 * Check if the given state is a terminal state. A terminal state is a trie
 * state that terminates a key, and stores a value associated with it.
 */
#define trie_state_is_terminal(s) trie_state_is_walkable((s),TRIE_CHAR_TERM)

Bool
trie_state_is_single (const TrieState *s);

/**
 * @brief Check for leaf state
 *
 * @param s : the state to check
 *
 * @return boolean value indicating whether it is a leaf state
 *
 * Check if the given state is a leaf state. A leaf state is a terminal state
 * that has no other branch.
 */
#define trie_state_is_leaf(s) \
    (trie_state_is_single(s) && trie_state_is_terminal(s))

TrieData
trie_state_get_data (const TrieState *s);

/*----------------------*
 *    ENTRY ITERATION   *
 *----------------------*/

TrieIterator *
trie_iterator_new (TrieState *s);

void
trie_iterator_free (TrieIterator *iter);

Bool
trie_iterator_next (TrieIterator *iter);

AlphaChar *
trie_iterator_get_key (const TrieIterator *iter);

TrieData
trie_iterator_get_data (const TrieIterator *iter);

#ifdef __cplusplus
}
#endif

#endif /* __TRIE_H */

/* vi:ts=4:ai:expandtab */

datrie-0.8.2/libdatrie/datrie/triedefs.h
/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
/*
 * libdatrie - Double-Array Trie Library
 * Copyright (C) 2006 Theppitak Karoonboonyanan
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 */

/*
 * triedefs.h - General typedefs for trie
 * Created: 2006-08-11
 * Author:  Theppitak Karoonboonyanan
 */

#ifndef __TRIEDEFS_H
#define __TRIEDEFS_H

#include

/**
 * @file triedefs.h
 * @brief General typedefs for trie
 */

/**
 * @brief Alphabet character type for use as input/output strings of trie keys
 */
typedef uint32 AlphaChar;

/**
 * @brief Error value for alphabet character
 */
#define ALPHA_CHAR_ERROR (~(AlphaChar)0)

/**
 * @brief Raw character type mapped into packed set from AlphaChar,
 * for use in actual trie transition calculations
 */
typedef unsigned char TrieChar;
/**
 * @brief Trie terminator character
 */
#define TRIE_CHAR_TERM '\0'
#define TRIE_CHAR_MAX  255

/**
 * @brief Type of index into Trie double-array and tail structures
 */
typedef int32 TrieIndex;
/**
 * @brief Trie error index
 */
#define TRIE_INDEX_ERROR 0
/**
 * @brief Maximum trie index value
 */
#define TRIE_INDEX_MAX   0x7fffffff

/**
 * @brief Type of value associated with trie entries
 */
typedef int32 TrieData;
/**
 * @brief Trie error data
 */
#define TRIE_DATA_ERROR  -1

#endif /* __TRIEDEFS_H */

/* vi:ts=4:ai:expandtab */

datrie-0.8.2/libdatrie/datrie/typedefs.h
/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
/*
 * libdatrie - Double-Array Trie Library
 * Copyright (C) 2006 Theppitak Karoonboonyanan
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 */

/*
 * typedefs.h - general types
 * Created : 11 Aug 2006
 * Author : Theppitak Karoonboonyanan
 */

#ifndef __TYPEDEFS_H
#define __TYPEDEFS_H

#include

typedef enum { FALSE = 0, TRUE = 1 } Bool;

# if UCHAR_MAX == 0xff
#   ifndef UINT8_TYPEDEF
#     define UINT8_TYPEDEF
      typedef unsigned char  uint8;
#   endif /* UINT8_TYPEDEF */
# endif /* UCHAR_MAX */

# if SCHAR_MAX == 0x7f
#   ifndef INT8_TYPEDEF
#     define INT8_TYPEDEF
      typedef signed char    int8;
#   endif /* INT8_TYPEDEF */
# endif /* SCHAR_MAX */

# if UINT_MAX == 0xffff
#   ifndef UINT16_TYPEDEF
#     define UINT16_TYPEDEF
      typedef unsigned int   uint16;
#   endif /* UINT16_TYPEDEF */
# endif /* UINT_MAX */

# if INT_MAX == 0x7fff
#   ifndef INT16_TYPEDEF
#     define INT16_TYPEDEF
      typedef int            int16;
#   endif /* INT16_TYPEDEF */
# endif /* INT_MAX */

# if USHRT_MAX == 0xffff
#   ifndef UINT16_TYPEDEF
#     define UINT16_TYPEDEF
      typedef unsigned short uint16;
#   endif /* UINT16_TYPEDEF */
# endif /* USHRT_MAX */

# if SHRT_MAX == 0x7fff
#   ifndef INT16_TYPEDEF
#     define INT16_TYPEDEF
      typedef short          int16;
#   endif /* INT16_TYPEDEF */
# endif /* SHRT_MAX */

# if UINT_MAX == 0xffffffff
#   ifndef UINT32_TYPEDEF
#     define UINT32_TYPEDEF
      typedef unsigned int   uint32;
#   endif /* UINT32_TYPEDEF */
# endif /* UINT_MAX */

# if INT_MAX == 0x7fffffff
#   ifndef INT32_TYPEDEF
#     define INT32_TYPEDEF
      typedef int            int32;
#   endif /* INT32_TYPEDEF */
# endif /* INT_MAX */

# if ULONG_MAX == 0xffffffff
#   ifndef UINT32_TYPEDEF
#     define UINT32_TYPEDEF
      typedef unsigned long  uint32;
#   endif /* UINT32_TYPEDEF */
# endif /* ULONG_MAX */

# if LONG_MAX == 0x7fffffff
#   ifndef INT32_TYPEDEF
#     define INT32_TYPEDEF
      typedef long           int32;
#   endif /* INT32_TYPEDEF */
# endif /* LONG_MAX */

# ifndef UINT8_TYPEDEF
#   error "uint8 type is undefined!"
# endif
# ifndef INT8_TYPEDEF
#   error "int8 type is undefined!"
# endif
# ifndef UINT16_TYPEDEF
#   error "uint16 type is undefined!"
# endif
# ifndef INT16_TYPEDEF
#   error "int16 type is undefined!"
# endif
# ifndef UINT32_TYPEDEF
#   error "uint32 type is undefined!"
# endif
# ifndef INT32_TYPEDEF
#   error "int32 type is undefined!"
# endif

typedef uint8  byte;
typedef uint16 word;
typedef uint32 dword;

#endif /* __TYPEDEFS_H */

/* vi:ts=4:ai:expandtab */

datrie-0.8.2/libdatrie/tests/test_file.c
/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
/*
 * libdatrie - Double-Array Trie Library
 * Copyright (C) 2013 Theppitak Karoonboonyanan
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 */

/*
 * test_file.c - Test for datrie file operations
 * Created: 2013-10-16
 * Author:  Theppitak Karoonboonyanan
 */

#include
#include "utils.h"
#include
#include

#define TRIE_FILENAME "test.tri"

static Bool
trie_enum_mark_rec (const AlphaChar *key, TrieData key_data, void *user_data)
{
    Bool    *is_failed = (Bool *)user_data;
    TrieData src_data;

    src_data = dict_src_get_data (key);
    if (TRIE_DATA_ERROR == src_data) {
        printf ("Extra entry in file: key '%ls', data %d.\n", key, key_data);
        *is_failed = TRUE;
    } else if (src_data != key_data) {
        printf ("Data mismatch for: key '%ls', expected %d, got %d.\n",
                key, src_data, key_data);
        *is_failed = TRUE;
    } else {
        dict_src_set_data (key, TRIE_DATA_READ);
    }

    return TRUE;
}

int
main ()
{
    Trie    *test_trie;
    DictRec *dict_p;
    Bool     is_failed;

    msg_step ("Preparing trie");
    test_trie = en_trie_new ();
    if (!test_trie) {
        printf ("Failed to allocate test trie.\n");
        goto err_trie_not_created;
    }

    /* add all words from the source dictionary */
    for (dict_p = dict_src; dict_p->key; dict_p++) {
        if (!trie_store (test_trie, dict_p->key, dict_p->data)) {
            printf ("Failed to add key '%ls', data %d.\n",
                    dict_p->key, dict_p->data);
            goto err_trie_created;
        }
    }

    /* save & close */
    msg_step ("Saving trie to file");
    unlink (TRIE_FILENAME);  /* error ignored */
    if (trie_save (test_trie, TRIE_FILENAME) != 0) {
        printf ("Failed to save trie to file '%s'.\n", TRIE_FILENAME);
        goto err_trie_created;
    }
    trie_free (test_trie);

    /* reload from file */
    msg_step ("Reloading trie from the saved file");
    test_trie = trie_new_from_file (TRIE_FILENAME);
    if (!test_trie) {
        printf ("Failed to reload saved trie from '%s'.\n", TRIE_FILENAME);
        /* the in-memory trie is already freed and test_trie is NULL here,
         * so skip trie_free() and only remove the saved file */
        unlink (TRIE_FILENAME);
        goto err_trie_not_created;
    }

    /* enumerate & check */
    msg_step ("Checking trie contents");
    is_failed = FALSE;
    /* mark entries found in file */
    if (!trie_enumerate (test_trie, trie_enum_mark_rec, (void *)&is_failed)) {
        printf ("Failed to enumerate trie file contents.\n");
        goto err_trie_saved;
    }
    /* check for unmarked entries, (i.e. missed in file) */
    for (dict_p = dict_src; dict_p->key; dict_p++) {
        if (dict_p->data != TRIE_DATA_READ) {
            printf ("Entry missed in file: key '%ls', data %d.\n",
                    dict_p->key, dict_p->data);
            is_failed = TRUE;
        }
    }
    if (is_failed) {
        printf ("Errors found in trie saved contents.\n");
        goto err_trie_saved;
    }

    unlink (TRIE_FILENAME);
    trie_free (test_trie);
    return 0;

err_trie_saved:
    unlink (TRIE_FILENAME);
err_trie_created:
    trie_free (test_trie);
err_trie_not_created:
    return 1;
}

/* vi:ts=4:ai:expandtab */

datrie-0.8.2/libdatrie/tests/test_iterator.c
/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
/*
 * libdatrie - Double-Array Trie Library
 * Copyright (C) 2013 Theppitak Karoonboonyanan
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 */

/*
 * test_iterator.c - Test for datrie iterator operations
 * Created: 2013-10-16
 * Author:  Theppitak Karoonboonyanan
 */

#include
#include "utils.h"
#include
#include

int
main ()
{
    Trie         *test_trie;
    DictRec      *dict_p;
    TrieState    *trie_root_state;
    TrieIterator *trie_it;
    Bool          is_failed;

    msg_step ("Preparing trie");
    test_trie = en_trie_new ();
    if (!test_trie) {
        fprintf (stderr, "Failed to create test trie\n");
        goto err_trie_not_created;
    }

    /* store */
    msg_step ("Adding data to trie");
    for (dict_p = dict_src; dict_p->key; dict_p++) {
        if (!trie_store (test_trie, dict_p->key, dict_p->data)) {
            printf ("Failed to add key '%ls', data %d.\n",
                    dict_p->key, dict_p->data);
            goto err_trie_created;
        }
    }

    /* iterate & check */
    msg_step ("Iterating and checking trie contents");
    trie_root_state = trie_root (test_trie);
    if (!trie_root_state) {
        printf ("Failed to get trie root state\n");
        goto err_trie_created;
    }
    trie_it = trie_iterator_new (trie_root_state);
    if (!trie_it) {
        printf ("Failed to get trie iterator\n");
        goto err_trie_root_created;
    }

    is_failed = FALSE;
    while (trie_iterator_next (trie_it)) {
        AlphaChar *key;
        TrieData   key_data, src_data;

        key = trie_iterator_get_key (trie_it);
        if (!key) {
            printf ("Failed to get key from trie iterator\n");
            is_failed = TRUE;
            continue;
        }
        key_data = trie_iterator_get_data (trie_it);
        if (TRIE_DATA_ERROR == key_data) {
            printf ("Failed to get data from trie iterator for key '%ls'\n",
                    key);
            is_failed = TRUE;
        }
        /* mark entries found in trie */
        src_data = dict_src_get_data (key);
        if (TRIE_DATA_ERROR == src_data) {
            printf ("Extra entry in trie: key '%ls', data %d.\n",
                    key, key_data);
            is_failed = TRUE;
        } else if (src_data != key_data) {
            printf ("Data mismatch for: key '%ls', expected %d, got %d.\n",
                    key, src_data, key_data);
            is_failed = TRUE;
        } else {
            dict_src_set_data (key, TRIE_DATA_READ);
        }

        free (key);
    }

    /* check for unmarked entries, (i.e. missed in trie) */
    for (dict_p = dict_src; dict_p->key; dict_p++) {
        if (dict_p->data != TRIE_DATA_READ) {
            printf ("Entry missed in trie: key '%ls', data %d.\n",
                    dict_p->key, dict_p->data);
            is_failed = TRUE;
        }
    }

    if (is_failed) {
        printf ("Errors found in trie iteration.\n");
        goto err_trie_it_created;
    }

    trie_iterator_free (trie_it);
    trie_state_free (trie_root_state);
    trie_free (test_trie);
    return 0;

err_trie_it_created:
    trie_iterator_free (trie_it);
err_trie_root_created:
    trie_state_free (trie_root_state);
err_trie_created:
    trie_free (test_trie);
err_trie_not_created:
    return 1;
}

/* vi:ts=4:ai:expandtab */

datrie-0.8.2/libdatrie/tests/test_nonalpha.c
/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
/*
 * libdatrie - Double-Array Trie Library
 * Copyright (C) 2014 Theppitak Karoonboonyanan
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 */

/*
 * test_nonalpha.c - Test for datrie behaviors on non-alphabet inputs
 * Created: 2014-01-06
 * Author:  Theppitak Karoonboonyanan
 */

#include
#include "utils.h"
#include

const AlphaChar *nonalpha_src[] = {
    (AlphaChar *)L"a6acus",
    (AlphaChar *)L"a5acus",
    NULL
};

int
main ()
{
    Trie             *test_trie;
    DictRec          *dict_p;
    const AlphaChar **nonalpha_key;
    TrieData          trie_data;
    Bool              is_fail;

    msg_step ("Preparing trie");
    test_trie = en_trie_new ();
    if (!test_trie) {
        fprintf (stderr, "Failed to create test trie\n");
        goto err_trie_not_created;
    }

    /* store */
    msg_step ("Adding data to trie");
    for (dict_p = dict_src; dict_p->key; dict_p++) {
        if (!trie_store (test_trie, dict_p->key, dict_p->data)) {
            printf ("Failed to add key '%ls', data %d.\n",
                    dict_p->key, dict_p->data);
            goto err_trie_created;
        }
    }

    /* test storing keys with non-alphabet chars */
    is_fail = FALSE;
    for (nonalpha_key = nonalpha_src; *nonalpha_key; nonalpha_key++) {
        if (trie_retrieve (test_trie, *nonalpha_key, &trie_data)) {
            printf ("False duplication on key '%ls', with existing data %d.\n",
                    *nonalpha_key, trie_data);
            is_fail = TRUE;
        }
        if (trie_store (test_trie, *nonalpha_key, TRIE_DATA_UNREAD)) {
            printf ("Wrongly added key '%ls' containing non-alphabet char\n",
                    *nonalpha_key);
            is_fail = TRUE;
        }
    }

    if (is_fail)
        goto err_trie_created;

    trie_free (test_trie);
    return 0;

err_trie_created:
    trie_free (test_trie);
err_trie_not_created:
    return 1;
}

/* vi:ts=4:ai:expandtab */

datrie-0.8.2/libdatrie/tests/test_null_trie.c
/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
/*
 * libdatrie - Double-Array Trie Library
 * Copyright (C) 2015 Theppitak Karoonboonyanan
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 */

/*
 * test_null_trie.c - Test for datrie iteration on empty trie
 * Created: 2015-04-21
 * Author:  Theppitak Karoonboonyanan
 */

#include
#include "utils.h"
#include
#include

int
main ()
{
    Trie         *test_trie;
    TrieState    *trie_root_state;
    TrieIterator *trie_it;
    Bool          is_failed;

    msg_step ("Preparing empty trie");
    test_trie = en_trie_new ();
    if (!test_trie) {
        fprintf (stderr, "Failed to create test trie\n");
        goto err_trie_not_created;
    }

    /* iterate & check */
    msg_step ("Iterating");
    trie_root_state = trie_root (test_trie);
    if (!trie_root_state) {
        printf ("Failed to get trie root state\n");
        goto err_trie_created;
    }
    trie_it = trie_iterator_new (trie_root_state);
    if (!trie_it) {
        printf ("Failed to get trie iterator\n");
        goto err_trie_root_created;
    }

    is_failed = FALSE;
    while (trie_iterator_next (trie_it)) {
        AlphaChar *key;

        printf ("Got entry from empty trie, which is weird!\n");

        key = trie_iterator_get_key (trie_it);
        if (key) {
            printf ("Got key from empty trie, which is weird! (key='%ls')\n",
                    key);
            is_failed = TRUE;
            free (key);
        }
    }

    if (is_failed) {
        printf ("Errors found in empty trie iteration.\n");
        goto err_trie_it_created;
    }

    trie_iterator_free (trie_it);
    trie_state_free (trie_root_state);
    trie_free (test_trie);
    return 0;

err_trie_it_created:
    trie_iterator_free (trie_it);
err_trie_root_created:
    trie_state_free (trie_root_state);
err_trie_created:
    trie_free (test_trie);
err_trie_not_created:
    return 1;
}

/* vi:ts=4:ai:expandtab */

datrie-0.8.2/libdatrie/tests/test_store-retrieve.c
/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
/*
 * libdatrie - Double-Array Trie Library
 * Copyright (C) 2013 Theppitak Karoonboonyanan
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 */

/*
 * test_store-retrieve.c - Test for datrie store/retrieve operations
 * Created: 2013-10-16
 * Author:  Theppitak Karoonboonyanan
 */

#include
#include "utils.h"
#include
#include
#include

int
main ()
{
    Trie         *test_trie;
    DictRec      *dict_p;
    TrieData      trie_data;
    Bool          is_failed;
    int           n_entries, n_dels, i;
    TrieState    *trie_root_state;
    TrieIterator *trie_it;

    msg_step ("Preparing trie");
    test_trie = en_trie_new ();
    if (!test_trie) {
        fprintf (stderr, "Failed to create test trie\n");
        goto err_trie_not_created;
    }

    /* store */
    msg_step ("Adding data to trie");
    for (dict_p = dict_src; dict_p->key; dict_p++) {
        if (!trie_store (test_trie, dict_p->key, dict_p->data)) {
            printf ("Failed to add key '%ls', data %d.\n",
                    dict_p->key, dict_p->data);
            goto err_trie_created;
        }
    }

    /* retrieve */
    msg_step ("Retrieving data from trie");
    is_failed = FALSE;
    for (dict_p = dict_src; dict_p->key; dict_p++) {
        if (!trie_retrieve (test_trie, dict_p->key, &trie_data)) {
            printf ("Failed to retrieve key '%ls'.\n", dict_p->key);
            is_failed = TRUE;
            continue;  /* trie_data is not valid; skip the data comparison */
        }
        if (trie_data != dict_p->data) {
            printf ("Wrong data for key '%ls'; expected %d, got %d.\n",
                    dict_p->key, dict_p->data, trie_data);
            is_failed = TRUE;
        }
    }
    if (is_failed) {
        printf ("Trie store/retrieval test failed.\n");
        goto err_trie_created;
    }

    /* delete */
    msg_step ("Deleting some entries from trie");
    n_entries = dict_src_n_entries ();
    srand (time (NULL));
    for (n_dels = n_entries/3 + 1; n_dels > 0; n_dels--) {
        /* pick an undeleted entry */
        do {
            i = rand () % n_entries;
        } while (TRIE_DATA_READ == dict_src[i].data);

        printf ("Deleting '%ls'\n", dict_src[i].key);
        if (!trie_delete (test_trie, dict_src[i].key)) {
            printf ("Failed to delete '%ls'\n", dict_src[i].key);
            is_failed = TRUE;
        }
        dict_src[i].data = TRIE_DATA_READ;
    }
    if (is_failed) {
        printf ("Trie deletion test failed.\n");
        goto err_trie_created;
    }

    /* retrieve */
    msg_step ("Retrieving data from trie again after deletions");
    for (dict_p = dict_src; dict_p->key; dict_p++) {
        /* skip deleted entries */
        if (TRIE_DATA_READ == dict_p->data)
            continue;

        if (!trie_retrieve (test_trie, dict_p->key, &trie_data)) {
            printf ("Failed to retrieve key '%ls'.\n", dict_p->key);
            is_failed = TRUE;
            continue;  /* trie_data is not valid; skip the data comparison */
        }
        if (trie_data != dict_p->data) {
            printf ("Wrong data for key '%ls'; expected %d, got %d.\n",
                    dict_p->key, dict_p->data, trie_data);
            is_failed = TRUE;
        }
    }
    if (is_failed) {
        printf ("Trie retrieval-after-deletion test failed.\n");
        goto err_trie_created;
    }

    /* enumerate & check */
    msg_step ("Iterating trie contents after deletions");
    trie_root_state = trie_root (test_trie);
    if (!trie_root_state) {
        printf ("Failed to get trie root state\n");
        goto err_trie_created;
    }
    trie_it = trie_iterator_new (trie_root_state);
    if (!trie_it) {
        printf ("Failed to get trie iterator\n");
        goto err_trie_root_created;
    }

    while (trie_iterator_next (trie_it)) {
        AlphaChar *key;
        TrieData   key_data, src_data;

        key = trie_iterator_get_key (trie_it);
        if (!key) {
            printf ("Failed to get key from trie iterator\n");
            is_failed = TRUE;
            continue;
        }
        key_data = trie_iterator_get_data (trie_it);
        if (TRIE_DATA_ERROR == key_data) {
            printf ("Failed to get data from trie iterator for key '%ls'\n",
                    key);
            is_failed = TRUE;
        }
        /* mark entries found in trie */
        src_data = dict_src_get_data (key);
        if (TRIE_DATA_ERROR == src_data) {
            printf ("Extra entry in trie: key '%ls', data %d.\n",
                    key, key_data);
            is_failed = TRUE;
        } else if (src_data != key_data) {
            printf ("Data mismatch for: key '%ls', expected %d, got %d.\n",
                    key, src_data, key_data);
            is_failed = TRUE;
        } else {
            dict_src_set_data (key, TRIE_DATA_READ);
        }

        free (key);
    }

    /* check for unmarked entries, (i.e. missed in trie) */
    for (dict_p = dict_src; dict_p->key; dict_p++) {
        if (dict_p->data != TRIE_DATA_READ) {
            printf ("Entry missed in trie: key '%ls', data %d.\n",
                    dict_p->key, dict_p->data);
            is_failed = TRUE;
        }
    }

    if (is_failed) {
        printf ("Errors found in trie iteration after deletions.\n");
        goto err_trie_it_created;
    }

    trie_iterator_free (trie_it);
    trie_state_free (trie_root_state);
    trie_free (test_trie);
    return 0;

err_trie_it_created:
    trie_iterator_free (trie_it);
err_trie_root_created:
    trie_state_free (trie_root_state);
err_trie_created:
    trie_free (test_trie);
err_trie_not_created:
    return 1;
}

/* vi:ts=4:ai:expandtab */

datrie-0.8.2/libdatrie/tests/test_term_state.c
/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
/*
 * libdatrie - Double-Array Trie Library
 * Copyright (C) 2018 Theppitak Karoonboonyanan
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
* * You should have received a copy of the GNU Lesser General Public * License along with this library; if not, write to the Free Software * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */ /* * test_term_state.c - Test data retrieval from terminal state * Created: 2018-03-29 * Author: Theppitak Karoonboonyanan */ #include #include "utils.h" #include #include /* * Test trie * * (1) -a-> (2) -b-> (3) -#-> [4] {data=1} * | * +---c-> (5) -#-> [6] {data=2} * */ int main () { Trie *test_trie; TrieState *trie_state; TrieData data; Bool is_failed; msg_step ("Preparing trie"); test_trie = en_trie_new (); if (!test_trie) { printf ("Fail to create test trie\n"); goto err_trie_not_created; } /* populate trie */ msg_step ("Populating trie with test set"); if (!trie_store (test_trie, (AlphaChar *)L"ab", 1)) { printf ("Failed to add key 'ab', data 1.\n"); goto err_trie_created; } if (!trie_store (test_trie, (AlphaChar *)L"abc", 2)) { printf ("Failed to add key 'abc', data 2.\n"); goto err_trie_created; } is_failed = FALSE; /* try retrieving data */ msg_step ("Preparing root state"); trie_state = trie_root (test_trie); if (!trie_state) { printf ("Failed to get trie root state\n"); goto err_trie_created; } msg_step ("Try walking from root with 'a'"); if (!trie_state_walk (trie_state, (AlphaChar)L'a')) { printf ("Failed to walk from root with 'a'.\n"); is_failed = TRUE; } data = trie_state_get_data (trie_state); if (data != TRIE_DATA_ERROR) { printf ("Retrieved data at 'a' is %d, not %d.\n", data, TRIE_DATA_ERROR); is_failed = TRUE; } msg_step ("Try walking further with 'b'"); if (!trie_state_walk (trie_state, (AlphaChar)L'b')) { printf ("Failed to continue walking with 'b'.\n"); is_failed = TRUE; } data = trie_state_get_data (trie_state); if (data != 1) { printf ("Retrieved data for key 'ab' is %d, not 1.\n", data); is_failed = TRUE; } msg_step ("Try walking further with 'c'"); if (!trie_state_walk (trie_state, (AlphaChar)L'c')) { printf ("Failed to 
continue walking with 'c'.\n"); is_failed = TRUE; } data = trie_state_get_data (trie_state); if (data != 2) { printf ("Retrieved data for key 'abc' is %d, not 2.\n", data); is_failed = TRUE; } trie_state_free (trie_state); if (is_failed) { printf ("Errors found in terminal state data retrieval.\n"); goto err_trie_created; } trie_free (test_trie); return 0; err_trie_created: trie_free (test_trie); err_trie_not_created: return 1; } /* vi:ts=4:ai:expandtab */

datrie-0.8.2/libdatrie/tests/test_walk.c

/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */ /* * libdatrie - Double-Array Trie Library * Copyright (C) 2013 Theppitak Karoonboonyanan * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details.
* * You should have received a copy of the GNU Lesser General Public * License along with this library; if not, write to the Free Software * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */ /* * test_walk.c - Test for datrie walking operations * Created: 2013-10-16 * Author: Theppitak Karoonboonyanan */ #include #include "utils.h" #include /* * Sample trie in http://linux.thai.net/~thep/datrie/datrie.html * * +---o-> (3) -o-> (4) -l-> [5] * | * | +---i-> (7) -z-> (8) -e-> [9] * | | * (1) -p-> (2) -r-> (6) -e-> (10) -v-> (11) -i-> (12) -e-> (13) -w-> [14] * | | * | +---p-> (15) -a-> (16) -r-> (17) -e-> [18] * | * +---o-> (19) -d-> (20) -u-> (21) -c-> (22) -e-> [23] * | * +---g-> (24) -r-> (25) -e-> (26) -s-> (27) -s-> [28] * */ DictRec walk_dict[] = { {(AlphaChar *)L"pool", TRIE_DATA_UNREAD}, {(AlphaChar *)L"prize", TRIE_DATA_UNREAD}, {(AlphaChar *)L"preview", TRIE_DATA_UNREAD}, {(AlphaChar *)L"prepare", TRIE_DATA_UNREAD}, {(AlphaChar *)L"produce", TRIE_DATA_UNREAD}, {(AlphaChar *)L"progress", TRIE_DATA_UNREAD}, {(AlphaChar *)NULL, TRIE_DATA_ERROR}, }; static Bool is_walkables_include (AlphaChar c, const AlphaChar *walkables, int n_elm) { while (n_elm > 0) { if (walkables[--n_elm] == c) return TRUE; } return FALSE; } static void print_walkables (const AlphaChar *walkables, int n_elm) { int i; printf ("{"); for (i = 0; i < n_elm; i++) { if (i > 0) { printf (", "); } printf ("'%lc'", walkables[i]); } printf ("}"); } #define ALPHABET_SIZE 256 int main () { Trie *test_trie; DictRec *dict_p; TrieState *s, *t, *u; AlphaChar walkables[ALPHABET_SIZE]; int n; Bool is_failed; TrieData data; msg_step ("Preparing trie"); test_trie = en_trie_new (); if (!test_trie) { fprintf (stderr, "Fail to create test trie\n"); goto err_trie_not_created; } /* store */ for (dict_p = walk_dict; dict_p->key; dict_p++) { if (!trie_store (test_trie, dict_p->key, dict_p->data)) { printf ("Failed to add key '%ls', data %d.\n", dict_p->key, dict_p->data); goto 
err_trie_created; } } printf ( "Now the trie structure is supposed to be:\n" "\n" " +---o-> (3) -o-> (4) -l-> [5]\n" " |\n" " | +---i-> (7) -z-> (8) -e-> [9]\n" " | |\n" "(1) -p-> (2) -r-> (6) -e-> (10) -v-> (11) -i-> (12) -e-> (13) -w-> [14]\n" " | |\n" " | +---p-> (15) -a-> (16) -r-> (17) -e-> [18]\n" " |\n" " +---o-> (19) -d-> (20) -u-> (21) -c-> (22) -e-> [23]\n" " |\n" " +---g-> (24) -r-> (25) -e-> (26) -s-> (27) -s-> [28]\n" "\n" ); /* walk */ msg_step ("Test walking"); s = trie_root (test_trie); if (!s) { printf ("Failed to get trie root state\n"); goto err_trie_created; } msg_step ("Test walking with 'p'"); if (!trie_state_is_walkable (s, L'p')) { printf ("Trie state is not walkable with 'p'\n"); goto err_trie_state_s_created; } if (!trie_state_walk (s, L'p')) { printf ("Failed to walk with 'p'\n"); goto err_trie_state_s_created; } msg_step ("Now at (2), walkable chars should be {'o', 'r'}"); is_failed = FALSE; n = trie_state_walkable_chars (s, walkables, ALPHABET_SIZE); if (2 != n) { printf ("Walkable chars should be exactly 2, got %d\n", n); is_failed = TRUE; } if (!is_walkables_include (L'o', walkables, n)) { printf ("Walkable chars do not include 'o'\n"); is_failed = TRUE; } if (!is_walkables_include (L'r', walkables, n)) { printf ("Walkable chars do not include 'r'\n"); is_failed = TRUE; } if (is_failed) { printf ("Walkables = "); print_walkables (walkables, n); printf ("\n"); goto err_trie_state_s_created; } msg_step ("Try walking from (2) with 'o' to (3)"); t = trie_state_clone (s); if (!t) { printf ("Failed to clone trie state\n"); goto err_trie_state_s_created; } if (!trie_state_walk (t, L'o')) { printf ("Failed to walk from (2) with 'o' to (3)\n"); goto err_trie_state_t_created; } if (!trie_state_is_single (t)) { printf ("(3) should be single, but isn't.\n"); goto err_trie_state_t_created; } msg_step ("Try walking from (3) with 'o' to (4)"); if (!trie_state_walk (t, L'o')) { printf ("Failed to walk from (3) with 'o' to (4)\n"); goto 
err_trie_state_t_created; } if (!trie_state_is_single (t)) { printf ("(4) should be single, but isn't.\n"); goto err_trie_state_t_created; } msg_step ("Try walking from (4) with 'l' to (5)"); if (!trie_state_walk (t, L'l')) { printf ("Failed to walk from (4) with 'l' to (5)\n"); goto err_trie_state_t_created; } if (!trie_state_is_terminal (t)) { printf ("(5) should be terminal, but isn't.\n"); goto err_trie_state_t_created; } /* get key & data */ msg_step ("Try getting data from (5)"); data = trie_state_get_data (t); if (TRIE_DATA_ERROR == data) { printf ("Failed to get data from (5)\n"); goto err_trie_state_t_created; } if (TRIE_DATA_UNREAD != data) { printf ("Mismatched data from (5), expected %d, got %d\n", TRIE_DATA_UNREAD, data); goto err_trie_state_t_created; } /* walk s from (2) with 'r' to (6) */ msg_step ("Try walking from (2) with 'r' to (6)"); if (!trie_state_walk (s, L'r')) { printf ("Failed to walk from (2) with 'r' to (6)\n"); goto err_trie_state_t_created; } msg_step ("Now at (6), walkable chars should be {'e', 'i', 'o'}"); is_failed = FALSE; n = trie_state_walkable_chars (s, walkables, ALPHABET_SIZE); if (3 != n) { printf ("Walkable chars should be exactly 3, got %d\n", n); is_failed = TRUE; } if (!is_walkables_include (L'e', walkables, n)) { printf ("Walkable chars do not include 'e'\n"); is_failed = TRUE; } if (!is_walkables_include (L'i', walkables, n)) { printf ("Walkable chars do not include 'i'\n"); is_failed = TRUE; } if (!is_walkables_include (L'o', walkables, n)) { printf ("Walkable chars do not include 'o'\n"); is_failed = TRUE; } if (is_failed) { printf ("Walkables = "); print_walkables (walkables, n); printf ("\n"); goto err_trie_state_t_created; } /* walk from s (6) with "ize" */ msg_step ("Try walking from (6) with 'i' to (7)"); trie_state_copy (t, s); if (!trie_state_walk (t, L'i')) { printf ("Failed to walk from (6) with 'i' to (7)\n"); goto err_trie_state_t_created; } msg_step ("Try walking from (7) with 'z' to (8)"); if 
(!trie_state_walk (t, L'z')) { printf ("Failed to walk from (7) with 'z' to (8)\n"); goto err_trie_state_t_created; } if (!trie_state_is_single (t)) { printf ("(8) should be single, but isn't.\n"); goto err_trie_state_t_created; } msg_step ("Try walking from (8) with 'e' to (9)"); if (!trie_state_walk (t, L'e')) { printf ("Failed to walk from (8) with 'e' to (9)\n"); goto err_trie_state_t_created; } if (!trie_state_is_terminal (t)) { printf ("(9) should be terminal, but isn't.\n"); goto err_trie_state_t_created; } msg_step ("Try getting data from (9)"); data = trie_state_get_data (t); if (TRIE_DATA_ERROR == data) { printf ("Failed to get data from (9)\n"); goto err_trie_state_t_created; } if (TRIE_DATA_UNREAD != data) { printf ("Mismatched data from (9), expected %d, got %d\n", TRIE_DATA_UNREAD, data); goto err_trie_state_t_created; } /* walk from u = s (6) with 'e' to (10) */ msg_step ("Try walking from (6) with 'e' to (10)"); u = trie_state_clone (s); if (!u) { printf ("Failed to clone trie state\n"); goto err_trie_state_t_created; } if (!trie_state_walk (u, L'e')) { printf ("Failed to walk from (6) with 'e' to (10)\n"); goto err_trie_state_u_created; } /* walkable chars from (10) should be {'p', 'v'} */ msg_step ("Now at (10), walkable chars should be {'p', 'v'}"); is_failed = FALSE; n = trie_state_walkable_chars (u, walkables, ALPHABET_SIZE); if (2 != n) { printf ("Walkable chars should be exactly 2, got %d\n", n); is_failed = TRUE; } if (!is_walkables_include (L'p', walkables, n)) { printf ("Walkable chars do not include 'p'\n"); is_failed = TRUE; } if (!is_walkables_include (L'v', walkables, n)) { printf ("Walkable chars do not include 'v'\n"); is_failed = TRUE; } if (is_failed) { printf ("Walkables = "); print_walkables (walkables, n); printf ("\n"); goto err_trie_state_u_created; } /* walk from u (10) with "view" */ msg_step ("Try walking from (10) with 'v' to (11)"); trie_state_copy (t, u); if (!trie_state_walk (t, L'v')) { printf ("Failed to walk from
(10) with 'v' to (11)\n"); goto err_trie_state_u_created; } if (!trie_state_is_single (t)) { printf ("(11) should be single, but isn't.\n"); goto err_trie_state_u_created; } msg_step ("Try walking from (11) with 'i' to (12)"); if (!trie_state_walk (t, L'i')) { printf ("Failed to walk from (11) with 'i' to (12)\n"); goto err_trie_state_u_created; } msg_step ("Try walking from (12) with 'e' to (13)"); if (!trie_state_walk (t, L'e')) { printf ("Failed to walk from (12) with 'e' to (13)\n"); goto err_trie_state_u_created; } msg_step ("Try walking from (13) with 'w' to (14)"); if (!trie_state_walk (t, L'w')) { printf ("Failed to walk from (13) with 'w' to (14)\n"); goto err_trie_state_u_created; } if (!trie_state_is_terminal (t)) { printf ("(14) should be terminal, but isn't.\n"); goto err_trie_state_u_created; } msg_step ("Try getting data from (14)"); data = trie_state_get_data (t); if (TRIE_DATA_ERROR == data) { printf ("Failed to get data from (14)\n"); goto err_trie_state_u_created; } if (TRIE_DATA_UNREAD != data) { printf ("Mismatched data from (14), expected %d, got %d\n", TRIE_DATA_UNREAD, data); goto err_trie_state_u_created; } /* walk from u (10) with "pare" */ msg_step ("Try walking from (10) with 'p' to (15)"); trie_state_copy (t, u); if (!trie_state_walk (t, L'p')) { printf ("Failed to walk from (10) with 'p' to (15)\n"); goto err_trie_state_u_created; } if (!trie_state_is_single (t)) { printf ("(15) should be single, but isn't.\n"); goto err_trie_state_u_created; } msg_step ("Try walking from (15) with 'a' to (16)"); if (!trie_state_walk (t, L'a')) { printf ("Failed to walk from (15) with 'a' to (16)\n"); goto err_trie_state_u_created; } msg_step ("Try walking from (16) with 'r' to (17)"); if (!trie_state_walk (t, L'r')) { printf ("Failed to walk from (16) with 'r' to (17)\n"); goto err_trie_state_u_created; } msg_step ("Try walking from (17) with 'e' to (18)"); if (!trie_state_walk (t, L'e')) { printf ("Failed to walk from (17) with 'e' to (18)\n"); goto 
err_trie_state_u_created; } if (!trie_state_is_terminal (t)) { printf ("(18) should be terminal, but isn't.\n"); goto err_trie_state_u_created; } msg_step ("Try getting data from (18)"); data = trie_state_get_data (t); if (TRIE_DATA_ERROR == data) { printf ("Failed to get data from (18)\n"); goto err_trie_state_u_created; } if (TRIE_DATA_UNREAD != data) { printf ("Mismatched data from (18), expected %d, got %d\n", TRIE_DATA_UNREAD, data); goto err_trie_state_u_created; } trie_state_free (u); /* walk s from (6) with 'o' to (19) */ msg_step ("Try walking from (6) with 'o' to (19)"); if (!trie_state_walk (s, L'o')) { printf ("Failed to walk from (6) with 'o' to (19)\n"); goto err_trie_state_t_created; } msg_step ("Now at (19), walkable chars should be {'d', 'g'}"); is_failed = FALSE; n = trie_state_walkable_chars (s, walkables, ALPHABET_SIZE); if (2 != n) { printf ("Walkable chars should be exactly 2, got %d\n", n); is_failed = TRUE; } if (!is_walkables_include (L'd', walkables, n)) { printf ("Walkable chars do not include 'd'\n"); is_failed = TRUE; } if (!is_walkables_include (L'g', walkables, n)) { printf ("Walkable chars do not include 'g'\n"); is_failed = TRUE; } if (is_failed) { printf ("Walkables = "); print_walkables (walkables, n); printf ("\n"); goto err_trie_state_t_created; } /* walk from s (19) with "duce" */ msg_step ("Try walking from (19) with 'd' to (20)"); trie_state_copy (t, s); if (!trie_state_walk (t, L'd')) { printf ("Failed to walk from (19) with 'd' to (20)\n"); goto err_trie_state_t_created; } if (!trie_state_is_single (t)) { printf ("(20) should be single, but isn't.\n"); goto err_trie_state_t_created; } msg_step ("Try walking from (20) with 'u' to (21)"); if (!trie_state_walk (t, L'u')) { printf ("Failed to walk from (20) with 'u' to (21)\n"); goto err_trie_state_t_created; } msg_step ("Try walking from (21) with 'c' to (22)"); if (!trie_state_walk (t, L'c')) { printf ("Failed to walk from (21) with 'c' to (22)\n"); goto 
err_trie_state_t_created; } msg_step ("Try walking from (22) with 'e' to (23)"); if (!trie_state_walk (t, L'e')) { printf ("Failed to walk from (22) with 'e' to (23)\n"); goto err_trie_state_t_created; } if (!trie_state_is_terminal (t)) { printf ("(23) should be terminal, but isn't.\n"); goto err_trie_state_t_created; } msg_step ("Try getting data from (23)"); data = trie_state_get_data (t); if (TRIE_DATA_ERROR == data) { printf ("Failed to get data from (23)\n"); goto err_trie_state_t_created; } if (TRIE_DATA_UNREAD != data) { printf ("Mismatched data from (23), expected %d, got %d\n", TRIE_DATA_UNREAD, data); goto err_trie_state_t_created; } trie_state_free (t); /* walk from s (19) with "gress" */ msg_step ("Try walking from (19) with 'g' to (24)"); if (!trie_state_walk (s, L'g')) { printf ("Failed to walk from (19) with 'g' to (24)\n"); goto err_trie_state_s_created; } if (!trie_state_is_single (s)) { printf ("(24) should be single, but isn't.\n"); goto err_trie_state_s_created; } msg_step ("Try walking from (24) with 'r' to (25)"); if (!trie_state_walk (s, L'r')) { printf ("Failed to walk from (24) with 'r' to (25)\n"); goto err_trie_state_s_created; } msg_step ("Try walking from (25) with 'e' to (26)"); if (!trie_state_walk (s, L'e')) { printf ("Failed to walk from (25) with 'e' to (26)\n"); goto err_trie_state_s_created; } msg_step ("Try walking from (26) with 's' to (27)"); if (!trie_state_walk (s, L's')) { printf ("Failed to walk from (26) with 's' to (27)\n"); goto err_trie_state_s_created; } msg_step ("Try walking from (27) with 's' to (28)"); if (!trie_state_walk (s, L's')) { printf ("Failed to walk from (27) with 's' to (28)\n"); goto err_trie_state_s_created; } if (!trie_state_is_terminal (s)) { printf ("(28) should be terminal, but isn't.\n"); goto err_trie_state_s_created; } msg_step ("Try getting data from (28)"); data = trie_state_get_data (s); if (TRIE_DATA_ERROR == data) { printf ("Failed to get data from (28)\n"); goto err_trie_state_s_created; 
} if (TRIE_DATA_UNREAD != data) { printf ("Mismatched data from (28), expected %d, got %d\n", TRIE_DATA_UNREAD, data); goto err_trie_state_s_created; } trie_state_free (s); trie_free (test_trie); return 0; err_trie_state_u_created: trie_state_free (u); err_trie_state_t_created: trie_state_free (t); err_trie_state_s_created: trie_state_free (s); err_trie_created: trie_free (test_trie); err_trie_not_created: return 1; } /* vi:ts=4:ai:expandtab */

datrie-0.8.2/libdatrie/tests/utils.c

/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */ /* * libdatrie - Double-Array Trie Library * Copyright (C) 2013 Theppitak Karoonboonyanan * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details.
* * You should have received a copy of the GNU Lesser General Public * License along with this library; if not, write to the Free Software * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */ /* * utils.c - Utility functions for datrie test cases * Created: 2013-10-16 * Author: Theppitak Karoonboonyanan */ #include #include "utils.h" /*---------------------* * Debugging helpers * *---------------------*/ void msg_step (const char *msg) { printf ("=> %s...\n", msg); } /*-------------------------* * Trie creation helpers * *-------------------------*/ static AlphaMap * en_alpha_map_new () { AlphaMap *en_map; en_map = alpha_map_new (); if (!en_map) goto err_map_not_created; if (alpha_map_add_range (en_map, 0x0061, 0x007a) != 0) goto err_map_created; return en_map; err_map_created: alpha_map_free (en_map); err_map_not_created: return NULL; } Trie * en_trie_new () { AlphaMap *en_map; Trie *en_trie; en_map = en_alpha_map_new (); if (!en_map) goto err_map_not_created; en_trie = trie_new (en_map); if (!en_trie) goto err_map_created; alpha_map_free (en_map); return en_trie; err_map_created: alpha_map_free (en_map); err_map_not_created: return NULL; } /*---------------------------* * Dict source for testing * *---------------------------*/ DictRec dict_src[] = { {(AlphaChar *)L"a", TRIE_DATA_UNREAD}, {(AlphaChar *)L"abacus", TRIE_DATA_UNREAD}, {(AlphaChar *)L"abandon", TRIE_DATA_UNREAD}, {(AlphaChar *)L"accident", TRIE_DATA_UNREAD}, {(AlphaChar *)L"accredit", TRIE_DATA_UNREAD}, {(AlphaChar *)L"algorithm", TRIE_DATA_UNREAD}, {(AlphaChar *)L"ammonia", TRIE_DATA_UNREAD}, {(AlphaChar *)L"angel", TRIE_DATA_UNREAD}, {(AlphaChar *)L"angle", TRIE_DATA_UNREAD}, {(AlphaChar *)L"azure", TRIE_DATA_UNREAD}, {(AlphaChar *)L"bat", TRIE_DATA_UNREAD}, {(AlphaChar *)L"bet", TRIE_DATA_UNREAD}, {(AlphaChar *)L"best", TRIE_DATA_UNREAD}, {(AlphaChar *)L"home", TRIE_DATA_UNREAD}, {(AlphaChar *)L"house", TRIE_DATA_UNREAD}, {(AlphaChar *)L"hut", TRIE_DATA_UNREAD}, 
{(AlphaChar *)L"king", TRIE_DATA_UNREAD}, {(AlphaChar *)L"kite", TRIE_DATA_UNREAD}, {(AlphaChar *)L"name", TRIE_DATA_UNREAD}, {(AlphaChar *)L"net", TRIE_DATA_UNREAD}, {(AlphaChar *)L"network", TRIE_DATA_UNREAD}, {(AlphaChar *)L"nut", TRIE_DATA_UNREAD}, {(AlphaChar *)L"nutshell", TRIE_DATA_UNREAD}, {(AlphaChar *)L"quality", TRIE_DATA_UNREAD}, {(AlphaChar *)L"quantum", TRIE_DATA_UNREAD}, {(AlphaChar *)L"quantity", TRIE_DATA_UNREAD}, {(AlphaChar *)L"quartz", TRIE_DATA_UNREAD}, {(AlphaChar *)L"quick", TRIE_DATA_UNREAD}, {(AlphaChar *)L"quiz", TRIE_DATA_UNREAD}, {(AlphaChar *)L"run", TRIE_DATA_UNREAD}, {(AlphaChar *)L"tape", TRIE_DATA_UNREAD}, {(AlphaChar *)L"test", TRIE_DATA_UNREAD}, {(AlphaChar *)L"what", TRIE_DATA_UNREAD}, {(AlphaChar *)L"when", TRIE_DATA_UNREAD}, {(AlphaChar *)L"where", TRIE_DATA_UNREAD}, {(AlphaChar *)L"which", TRIE_DATA_UNREAD}, {(AlphaChar *)L"who", TRIE_DATA_UNREAD}, {(AlphaChar *)L"why", TRIE_DATA_UNREAD}, {(AlphaChar *)L"zebra", TRIE_DATA_UNREAD}, {(AlphaChar *)NULL, TRIE_DATA_ERROR}, }; int dict_src_n_entries () { return sizeof (dict_src) / sizeof (dict_src[0]) - 1; } TrieData dict_src_get_data (const AlphaChar *key) { const DictRec *dict_p; for (dict_p = dict_src; dict_p->key; dict_p++) { if (alpha_char_strcmp (dict_p->key, key) == 0) { return dict_p->data; } } return TRIE_DATA_ERROR; } int dict_src_set_data (const AlphaChar *key, TrieData data) { DictRec *dict_p; for (dict_p = dict_src; dict_p->key; dict_p++) { if (alpha_char_strcmp (dict_p->key, key) == 0) { dict_p->data = data; return 0; } } return -1; } /* vi:ts=4:ai:expandtab */

datrie-0.8.2/libdatrie/tests/utils.h

/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */ /* * libdatrie - Double-Array Trie Library * Copyright (C) 2013 Theppitak Karoonboonyanan * * This
library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library; if not, write to the Free Software * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */ /* * utils.h - Utility functions for datrie test cases * Created: 2013-10-16 * Author: Theppitak Karoonboonyanan */ #include /*---------------------* * Debugging helpers * *---------------------*/ void msg_step (const char *msg); /*-------------------------* * Trie creation helpers * *-------------------------*/ Trie * en_trie_new (); /*---------------------------* * Dict source for testing * *---------------------------*/ typedef struct _DictRec DictRec; struct _DictRec { AlphaChar *key; TrieData data; }; #define TRIE_DATA_UNREAD 1 #define TRIE_DATA_READ 2 extern DictRec dict_src[]; int dict_src_n_entries (); TrieData dict_src_get_data (const AlphaChar *key); int dict_src_set_data (const AlphaChar *key, TrieData data); /* vi:ts=4:ai:expandtab */

datrie-0.8.2/libdatrie/tools/trietool.c

/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil;
c-basic-offset: 4 -*- */ /* * trietool.c - Trie manipulation tool * Created: 2006-08-15 * Author: Theppitak Karoonboonyanan */ #include #include #include #include #include #include #if defined(HAVE_LOCALE_CHARSET) # include #elif defined (HAVE_LANGINFO_CODESET) # include # define locale_charset() nl_langinfo(CODESET) #endif #include #include #include #include /* iconv encoding name for AlphaChar string */ #define ALPHA_ENC "UCS-4LE" #define N_ELEMENTS(a) (sizeof(a)/sizeof((a)[0])) typedef struct { const char *path; const char *trie_name; iconv_t to_alpha_conv; iconv_t from_alpha_conv; Trie *trie; } ProgEnv; static void init_conv (ProgEnv *env); static size_t conv_to_alpha (ProgEnv *env, const char *in, AlphaChar *out, size_t out_size); static size_t conv_from_alpha (ProgEnv *env, const AlphaChar *in, char *out, size_t out_size); static void close_conv (ProgEnv *env); static int prepare_trie (ProgEnv *env); static int close_trie (ProgEnv *env); static int decode_switch (int argc, char *argv[], ProgEnv *env); static int decode_command (int argc, char *argv[], ProgEnv *env); static int command_add (int argc, char *argv[], ProgEnv *env); static int command_add_list (int argc, char *argv[], ProgEnv *env); static int command_delete (int argc, char *argv[], ProgEnv *env); static int command_delete_list (int argc, char *argv[], ProgEnv *env); static int command_query (int argc, char *argv[], ProgEnv *env); static int command_list (int argc, char *argv[], ProgEnv *env); static void usage (const char *prog_name, int exit_status); static char *string_trim (char *s); int main (int argc, char *argv[]) { int i; ProgEnv env; int ret; env.path = "."; init_conv (&env); i = decode_switch (argc, argv, &env); if (i == argc) usage (argv[0], EXIT_FAILURE); env.trie_name = argv[i++]; if (prepare_trie (&env) != 0) exit (EXIT_FAILURE); ret = decode_command (argc - i, argv + i, &env); if (close_trie (&env) != 0) exit (EXIT_FAILURE); close_conv (&env); return ret; } static void init_conv 
(ProgEnv *env) { const char *prev_locale; const char *locale_codeset; prev_locale = setlocale (LC_CTYPE, ""); locale_codeset = locale_charset(); setlocale (LC_CTYPE, prev_locale); env->to_alpha_conv = iconv_open (ALPHA_ENC, locale_codeset); env->from_alpha_conv = iconv_open (locale_codeset, ALPHA_ENC); } static size_t conv_to_alpha (ProgEnv *env, const char *in, AlphaChar *out, size_t out_size) { char *in_p = (char *) in; char *out_p = (char *) out; size_t in_left = strlen (in); size_t out_left; size_t res; const unsigned char *byte_p; assert (sizeof (AlphaChar) == 4); assert (out_size > 0); /* reserve one AlphaChar for the terminating 0 */ out_left = (out_size - 1) * sizeof (AlphaChar); /* convert to UCS-4LE */ res = iconv (env->to_alpha_conv, (char **) &in_p, &in_left, &out_p, &out_left); if (res == (size_t) -1) return res; /* convert UCS-4LE to AlphaChar string */ res = 0; for (byte_p = (const unsigned char *) out; res < out_size - 1 && byte_p + 3 < (unsigned char*) out_p; byte_p += 4) { out[res++] = byte_p[0] | (byte_p[1] << 8) | (byte_p[2] << 16) | (byte_p[3] << 24); } out[res] = 0; return res; } static size_t conv_from_alpha (ProgEnv *env, const AlphaChar *in, char *out, size_t out_size) { size_t n_chars = alpha_char_strlen (in); size_t in_left = n_chars * sizeof (AlphaChar); size_t out_left; size_t i, res; unsigned char *ucs4; char *ucs4_p; assert (sizeof (AlphaChar) == 4); assert (out_size > 0); /* convert AlphaChar to UCS-4LE in a scratch buffer, leaving the caller's const input untouched */ ucs4 = (unsigned char *) malloc (in_left + 4); if (!ucs4) { *out = 0; return (size_t) -1; } for (i = 0; i < n_chars; i++) { ucs4[4 * i] = in[i] & 0xff; ucs4[4 * i + 1] = (in[i] >> 8) & 0xff; ucs4[4 * i + 2] = (in[i] >> 16) & 0xff; ucs4[4 * i + 3] = (in[i] >> 24) & 0xff; } /* convert UCS-4LE to locale codeset, reserving one byte for the terminator */ ucs4_p = (char *) ucs4; out_left = out_size - 1; res = iconv (env->from_alpha_conv, &ucs4_p, &in_left, &out, &out_left); *out = 0; free (ucs4); return res; } static void close_conv (ProgEnv *env) { iconv_close (env->to_alpha_conv); iconv_close (env->from_alpha_conv); } static int prepare_trie (ProgEnv *env) { char buff[256]; snprintf (buff, sizeof (buff), "%s/%s.tri", env->path, env->trie_name); env->trie = trie_new_from_file (buff); if (!env->trie) { FILE *sbm; AlphaMap *alpha_map; snprintf (buff, sizeof (buff),
"%s/%s.abm", env->path, env->trie_name); sbm = fopen (buff, "r"); if (!sbm) { fprintf (stderr, "Cannot open alphabet map file %s\n", buff); return -1; } alpha_map = alpha_map_new (); if (!alpha_map) { fprintf (stderr, "Cannot allocate alphabet map\n"); fclose (sbm); return -1; } while (fgets (buff, sizeof (buff), sbm)) { int b, e; /* read the range * format: [b,e] * where: b = begin char, e = end char; both in hex values */ if (sscanf (buff, " [ %x , %x ] ", &b, &e) != 2) continue; if (b > e) { fprintf (stderr, "Range begin (%x) > range end (%x)\n", b, e); continue; } alpha_map_add_range (alpha_map, b, e); } env->trie = trie_new (alpha_map); alpha_map_free (alpha_map); fclose (sbm); if (!env->trie) { fprintf (stderr, "Cannot create trie from alphabet map\n"); return -1; } } return 0; } static int close_trie (ProgEnv *env) { if (trie_is_dirty (env->trie)) { char path[256]; snprintf (path, sizeof (path), "%s/%s.tri", env->path, env->trie_name); if (trie_save (env->trie, path) != 0) { fprintf (stderr, "Cannot save trie to %s\n", path); return -1; } } trie_free (env->trie); return 0; } static int decode_switch (int argc, char *argv[], ProgEnv *env) { int opt_idx; for (opt_idx = 1; opt_idx < argc && *argv[opt_idx] == '-'; opt_idx++) { if (strcmp (argv[opt_idx], "-h") == 0 || strcmp (argv[opt_idx], "--help") == 0) { usage (argv[0], EXIT_FAILURE); } else if (strcmp (argv[opt_idx], "-V") == 0 || strcmp (argv[opt_idx], "--version") == 0) { printf ("%s\n", VERSION); exit (EXIT_FAILURE); } else if (strcmp (argv[opt_idx], "-p") == 0 || strcmp (argv[opt_idx], "--path") == 0) { if (opt_idx + 1 >= argc) { fprintf (stderr, "Option %s requires a directory argument\n", argv[opt_idx]); exit (EXIT_FAILURE); } env->path = argv[++opt_idx]; } else if (strcmp (argv[opt_idx], "--") == 0) { ++opt_idx; break; } else { fprintf (stderr, "Unknown option: %s\n", argv[opt_idx]); exit (EXIT_FAILURE); } } return opt_idx; } static int decode_command (int argc, char *argv[], ProgEnv *env) { int opt_idx; /* each branch advances opt_idx past the command name and the arguments it consumed, so the loop itself must not increment it */ for (opt_idx = 0; opt_idx < argc; ) { if (strcmp (argv[opt_idx], "add") == 0) { ++opt_idx; opt_idx += command_add (argc - opt_idx, argv + opt_idx, env); } else if (strcmp (argv[opt_idx], "add-list") == 0) { ++opt_idx; opt_idx += command_add_list (argc - opt_idx, argv + opt_idx, env); } else if (strcmp
(argv[opt_idx], "delete") == 0) { ++opt_idx; opt_idx += command_delete (argc - opt_idx, argv + opt_idx, env); } else if (strcmp (argv[opt_idx], "delete-list") == 0) { ++opt_idx; opt_idx += command_delete_list (argc - opt_idx, argv + opt_idx, env); } else if (strcmp (argv[opt_idx], "query") == 0) { ++opt_idx; opt_idx += command_query (argc - opt_idx, argv + opt_idx, env); } else if (strcmp (argv[opt_idx], "list") == 0) { ++opt_idx; opt_idx += command_list (argc - opt_idx, argv + opt_idx, env); } else { fprintf (stderr, "Unknown command: %s\n", argv[opt_idx]); return EXIT_FAILURE; } } return EXIT_SUCCESS; } static int command_add (int argc, char *argv[], ProgEnv *env) { int opt_idx; opt_idx = 0; while (opt_idx < argc) { const char *key; AlphaChar key_alpha[256]; TrieData data; key = argv[opt_idx++]; data = (opt_idx < argc) ? atoi (argv[opt_idx++]) : TRIE_DATA_ERROR; conv_to_alpha (env, key, key_alpha, N_ELEMENTS (key_alpha)); if (!trie_store (env->trie, key_alpha, data)) { fprintf (stderr, "Failed to add entry '%s' with data %d\n", key, data); } } return opt_idx; } static int command_add_list (int argc, char *argv[], ProgEnv *env) { const char *enc_name, *input_name; int opt_idx; iconv_t saved_conv; FILE *input; char line[256]; enc_name = 0; opt_idx = 0; saved_conv = env->to_alpha_conv; if (argc > 0 && (strcmp (argv[0], "-e") == 0 || strcmp (argv[0], "--encoding") == 0)) { if (++opt_idx >= argc) { fprintf (stderr, "add-list option \"%s\" requires encoding name\n", argv[0]); return opt_idx; } enc_name = argv[opt_idx++]; } if (opt_idx >= argc) { fprintf (stderr, "add-list requires input word list file name\n"); return opt_idx; } input_name = argv[opt_idx++]; if (enc_name) { iconv_t conv = iconv_open (ALPHA_ENC, enc_name); if ((iconv_t) -1 == conv) { fprintf (stderr, "Conversion from \"%s\" to \"%s\" is not supported.\n", enc_name, ALPHA_ENC); return opt_idx; } env->to_alpha_conv = conv; } input = fopen (input_name, "r"); if (!input) { fprintf (stderr, "add-list: Cannot open input file
\"%s\"\n", input_name); goto exit_iconv_openned; } while (fgets (line, sizeof line, input)) { char *key, *data; AlphaChar key_alpha[256]; TrieData data_val; key = string_trim (line); if ('\0' != *key) { /* find key boundary */ for (data = key; *data && !strchr ("\t,", *data); ++data) ; /* mark key ending and find data begin */ if ('\0' != *data) { *data++ = '\0'; while (isspace ((unsigned char) *data)) ++data; } /* decode data */ data_val = ('\0' != *data) ? atoi (data) : TRIE_DATA_ERROR; /* store the key */ conv_to_alpha (env, key, key_alpha, N_ELEMENTS (key_alpha)); if (!trie_store (env->trie, key_alpha, data_val)) fprintf (stderr, "Failed to add key '%s' with data %d.\n", key, data_val); } } fclose (input); exit_iconv_openned: if (enc_name) { iconv_close (env->to_alpha_conv); env->to_alpha_conv = saved_conv; } return opt_idx; } static int command_delete (int argc, char *argv[], ProgEnv *env) { int opt_idx; for (opt_idx = 0; opt_idx < argc; opt_idx++) { AlphaChar key_alpha[256]; conv_to_alpha (env, argv[opt_idx], key_alpha, N_ELEMENTS (key_alpha)); if (!trie_delete (env->trie, key_alpha)) { fprintf (stderr, "No entry '%s'.
Not deleted.\n", argv[opt_idx]); } } return opt_idx; } static int command_delete_list (int argc, char *argv[], ProgEnv *env) { const char *enc_name, *input_name; int opt_idx; iconv_t saved_conv; FILE *input; char line[256]; enc_name = 0; opt_idx = 0; saved_conv = env->to_alpha_conv; if (argc > 0 && (strcmp (argv[0], "-e") == 0 || strcmp (argv[0], "--encoding") == 0)) { if (++opt_idx >= argc) { fprintf (stderr, "delete-list option \"%s\" requires encoding name\n", argv[0]); return opt_idx; } enc_name = argv[opt_idx++]; } if (opt_idx >= argc) { fprintf (stderr, "delete-list requires input word list file name\n"); return opt_idx; } input_name = argv[opt_idx++]; if (enc_name) { iconv_t conv = iconv_open (ALPHA_ENC, enc_name); if ((iconv_t) -1 == conv) { fprintf (stderr, "Conversion from \"%s\" to \"%s\" is not supported.\n", enc_name, ALPHA_ENC); return opt_idx; } env->to_alpha_conv = conv; } input = fopen (input_name, "r"); if (!input) { fprintf (stderr, "delete-list: Cannot open input file \"%s\"\n", input_name); goto exit_iconv_openned; } while (fgets (line, sizeof line, input)) { char *p; p = string_trim (line); if ('\0' != *p) { AlphaChar key_alpha[256]; conv_to_alpha (env, p, key_alpha, N_ELEMENTS (key_alpha)); if (!trie_delete (env->trie, key_alpha)) { fprintf (stderr, "No entry '%s'.
Not deleted.\n", p); } } } fclose (input); exit_iconv_openned: if (enc_name) { iconv_close (env->to_alpha_conv); env->to_alpha_conv = saved_conv; } return opt_idx; } static int command_query (int argc, char *argv[], ProgEnv *env) { AlphaChar key_alpha[256]; TrieData data; if (argc == 0) { fprintf (stderr, "query: No key specified.\n"); return 0; } conv_to_alpha (env, argv[0], key_alpha, N_ELEMENTS (key_alpha)); if (trie_retrieve (env->trie, key_alpha, &data)) { printf ("%d\n", data); } else { fprintf (stderr, "query: Key '%s' not found.\n", argv[0]); } return 1; } static Bool list_enum_func (const AlphaChar *key, TrieData key_data, void *user_data) { ProgEnv *env = (ProgEnv *) user_data; char key_locale[1024]; conv_from_alpha (env, key, key_locale, N_ELEMENTS (key_locale)); printf ("%s\t%d\n", key_locale, key_data); return TRUE; } static int command_list (int argc, char *argv[], ProgEnv *env) { trie_enumerate (env->trie, list_enum_func, (void *) env); return 0; } static void usage (const char *prog_name, int exit_status) { printf ("%s - double-array trie manipulator\n", prog_name); printf ("Usage: %s [OPTION]... 
TRIE CMD ARG ...\n", prog_name); printf ( "Options:\n" " -p, --path DIR set trie directory to DIR [default=.]\n" " -h, --help display this help and exit\n" " -V, --version output version information and exit\n" "\n" "Commands:\n" " add WORD DATA ...\n" " Add WORD with DATA to trie\n" " add-list [OPTION] LISTFILE\n" " Add words and data listed in LISTFILE to trie\n" " Options:\n" " -e, --encoding ENC specify character encoding of LISTFILE\n" " delete WORD ...\n" " Delete WORD from trie\n" " delete-list [OPTION] LISTFILE\n" " Delete words listed in LISTFILE from trie\n" " Options:\n" " -e, --encoding ENC specify character encoding of LISTFILE\n" " query WORD\n" " Query WORD data from trie\n" " list\n" " List all words in trie\n" ); exit (exit_status); } static char * string_trim (char *s) { char *p; /* skip leading white spaces */ while (*s && isspace (*s)) ++s; /* trim trailing white spaces */ p = s + strlen (s) - 1; while (isspace (*p)) --p; *++p = '\0'; return s; } /* vi:ts=4:ai:expandtab */ ././@PaxHeader0000000000000000000000000000002600000000000011453 xustar000000000000000022 mtime=1585190616.0 datrie-0.8.2/pyproject.toml0000644000175000017500000000012200000000000017357 0ustar00tcaswelltcaswell00000000000000[build-system] requires = [ "setuptools>=40.8.0", "wheel", "Cython" ] ././@PaxHeader0000000000000000000000000000003400000000000011452 xustar000000000000000028 mtime=1585193107.7794485 datrie-0.8.2/setup.cfg0000644000175000017500000000007700000000000016275 0ustar00tcaswelltcaswell00000000000000[aliases] test = pytest [egg_info] tag_build = tag_date = 0 ././@PaxHeader0000000000000000000000000000002600000000000011453 xustar000000000000000022 mtime=1585190616.0 datrie-0.8.2/setup.py0000755000175000017500000000421500000000000016167 0ustar00tcaswelltcaswell00000000000000#! 
/usr/bin/env python """Super-fast, efficiently stored Trie for Python.""" import glob import os from setuptools import setup, Extension from Cython.Build import cythonize LIBDATRIE_DIR = 'libdatrie' LIBDATRIE_FILES = sorted(glob.glob(os.path.join(LIBDATRIE_DIR, "datrie", "*.c"))) DESCRIPTION = __doc__ LONG_DESCRIPTION = open('README.rst').read() + open('CHANGES.rst').read() LICENSE = 'LGPLv2+' CLASSIFIERS = [ 'Development Status :: 4 - Beta', 'Intended Audience :: Developers', 'Intended Audience :: Science/Research', 'License :: OSI Approved :: GNU Lesser General Public License v2 or later (LGPLv2+)', 'Programming Language :: Cython', 'Programming Language :: Python', 'Programming Language :: Python :: 2', 'Programming Language :: Python :: 2.7', 'Programming Language :: Python :: 3', 'Programming Language :: Python :: 3.4', 'Programming Language :: Python :: 3.5', 'Programming Language :: Python :: 3.6', 'Programming Language :: Python :: 3.7', 'Programming Language :: Python :: 3.8', 'Programming Language :: Python :: Implementation :: CPython', 'Topic :: Software Development :: Libraries :: Python Modules', 'Topic :: Scientific/Engineering :: Information Analysis', 'Topic :: Text Processing :: Linguistic' ] ext_modules = cythonize( 'src/datrie.pyx', 'src/cdatrie.pxd', 'src/stdio_ext.pxd', annotate=True, include_path=[os.path.join(os.path.dirname(os.path.abspath(__file__)), "src")], language_level=2 ) for m in ext_modules: m.include_dirs=[LIBDATRIE_DIR] setup(name="datrie", version="0.8.2", description=DESCRIPTION, long_description=LONG_DESCRIPTION, author='Mikhail Korobov', author_email='kmike84@gmail.com', license=LICENSE, url='https://github.com/kmike/datrie', classifiers=CLASSIFIERS, libraries=[('datrie', { "sources": LIBDATRIE_FILES, "include_dirs": [LIBDATRIE_DIR]})], ext_modules=ext_modules, python_requires=">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*", setup_requires=["pytest-runner", 'Cython>=0.28'], tests_require=["pytest", "hypothesis"]) 
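The `add-list` command in trietool.c reads one entry per line: the key runs up to the first TAB or comma, whitespace after the separator is skipped, and the rest is decoded with `atoi`; a line without data is stored with `TRIE_DATA_ERROR`. The following is a minimal Python 3 sketch of that parsing rule, not part of datrie. `MISSING_DATA = None` is a hypothetical placeholder for `TRIE_DATA_ERROR` (whose numeric value lives in the C headers), and `str.strip` only approximates C's `isspace`-based trimming.

```python
import re

# Hypothetical stand-in for libdatrie's TRIE_DATA_ERROR sentinel.
MISSING_DATA = None

_LEADING_INT = re.compile(r"[+-]?\d+")


def atoi_like(text):
    """Mimic C atoi(): parse a leading integer after whitespace, else 0."""
    match = _LEADING_INT.match(text.lstrip())
    return int(match.group()) if match else 0


def parse_list_line(line):
    """Split one add-list line into (key, data); return None for blank lines.

    The key ends at the first TAB or comma; the remainder (after leading
    whitespace) is decoded as the integer data.
    """
    line = line.strip()  # roughly what string_trim() does
    if not line:
        return None
    cut = min((i for i in (line.find("\t"), line.find(",")) if i >= 0),
              default=len(line))
    key, rest = line[:cut], line[cut + 1:].lstrip()
    data = atoi_like(rest) if rest else MISSING_DATA
    return key, data
```

As in the C loop, a key followed by a separator but no digits decodes to `0` (what `atoi` returns), while a key with nothing after it gets the sentinel.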
datrie-0.8.2/src/cdatrie.pxd

# cython: profile=False
from libc cimport stdio

cdef extern from "../libdatrie/datrie/triedefs.h":
    ctypedef int AlphaChar  # it should be utf32 letter
    ctypedef unsigned char TrieChar  # 1 byte
    ctypedef int TrieIndex
    ctypedef int TrieData  # int

cdef extern from "../libdatrie/datrie/alpha-map.h":

    struct AlphaMap:
        pass

    AlphaMap * alpha_map_new()
    void alpha_map_free (AlphaMap *alpha_map)
    AlphaMap * alpha_map_clone (AlphaMap *a_map)

    int alpha_map_add_range (AlphaMap *alpha_map, AlphaChar begin, AlphaChar end)
    int alpha_char_strlen (AlphaChar *str)


cdef extern from "../libdatrie/datrie/trie.h":

    ctypedef struct Trie:
        pass

    ctypedef struct TrieState:
        pass

    ctypedef struct TrieIterator:
        pass

    ctypedef int TrieData

    ctypedef bint (*TrieEnumFunc) (AlphaChar *key,
                                   TrieData key_data,
                                   void *user_data)

    int TRIE_CHAR_TERM
    int TRIE_DATA_ERROR

    # ========== GENERAL FUNCTIONS ==========

    Trie * trie_new (AlphaMap *alpha_map)

    Trie * trie_new_from_file (char *path)

    Trie * trie_fread (stdio.FILE *file)

    void trie_free (Trie *trie)

    int trie_save (Trie *trie, char *path)

    int trie_fwrite (Trie *trie, stdio.FILE *file)

    bint trie_is_dirty (Trie *trie)


    # =========== GENERAL QUERY OPERATIONS =========

    bint trie_retrieve (Trie *trie, AlphaChar *key, TrieData *o_data)

    bint trie_store (Trie *trie, AlphaChar *key, TrieData data)

    bint trie_store_if_absent (Trie *trie, AlphaChar *key, TrieData data)

    bint trie_delete (Trie *trie, AlphaChar *key)

    bint trie_enumerate (Trie *trie, TrieEnumFunc enum_func, void *user_data);

    # ======== STEPWISE QUERY OPERATIONS ========

    TrieState * trie_root (Trie *trie)


    # ========= TRIE STATE ===============

    TrieState * trie_state_clone (TrieState *s)

    void trie_state_copy (TrieState *dst, TrieState *src)

    void trie_state_free (TrieState *s)

    void trie_state_rewind (TrieState *s)

    bint trie_state_walk (TrieState *s, AlphaChar c)

    bint trie_state_is_walkable (TrieState *s, AlphaChar c)

    bint trie_state_is_terminal(TrieState * s)

    bint trie_state_is_single (TrieState *s)

    bint trie_state_is_leaf(TrieState* s)

    TrieData trie_state_get_data (TrieState *s)


    # ============== ITERATION ===================

    TrieIterator* trie_iterator_new (TrieState *s)

    void trie_iterator_free (TrieIterator *iter)

    bint trie_iterator_next (TrieIterator *iter)

    AlphaChar * trie_iterator_get_key (TrieIterator *iter)

    TrieData trie_iterator_get_data (TrieIterator *iter)

datrie-0.8.2/src/datrie.pyx

# cython: profile=False
"""
Cython wrapper for libdatrie.
"""

from cpython.version cimport PY_MAJOR_VERSION
from cython.operator import dereference as deref
from libc.stdlib cimport malloc, free
from libc cimport stdio
from libc cimport string
cimport stdio_ext
cimport cdatrie

import itertools
import warnings
import sys
import tempfile

try:
    from collections.abc import MutableMapping
except ImportError:
    from collections import MutableMapping

try:
    import cPickle as pickle
except ImportError:
    import pickle


class DatrieError(Exception):
    pass


RAISE_KEY_ERROR = object()
RERAISE_KEY_ERROR = object()
DELETED_OBJECT = object()


cdef class BaseTrie:
    """
    Wrapper for libdatrie's trie.

    Keys are unicode strings, values are integers -2147483648 <= x <= 2147483647.
    """

    cdef AlphaMap alpha_map
    cdef cdatrie.Trie *_c_trie

    def __init__(self, alphabet=None, ranges=None, AlphaMap alpha_map=None, _create=True):
        """
        For efficiency trie needs to know what unicode symbols
        it should be able to store so this constructor requires
        either ``alphabet`` (a string/iterable with all allowed characters),
        ``ranges`` (a list of (begin, end) pairs, e.g. [('a', 'z')])
        or ``alpha_map`` (:class:`datrie.AlphaMap` instance).
        """
        if self._c_trie is not NULL:
            return

        if not _create:
            return

        if alphabet is None and ranges is None and alpha_map is None:
            raise ValueError(
                "Please provide alphabet, ranges or alpha_map argument.")

        if alpha_map is None:
            alpha_map = AlphaMap(alphabet, ranges)

        self.alpha_map = alpha_map
        self._c_trie = cdatrie.trie_new(alpha_map._c_alpha_map)
        if self._c_trie is NULL:
            raise MemoryError()

    def __dealloc__(self):
        if self._c_trie is not NULL:
            cdatrie.trie_free(self._c_trie)

    def update(self, other=(), **kwargs):
        if PY_MAJOR_VERSION == 2:
            if kwargs:
                raise TypeError("keyword arguments are not supported.")

        if hasattr(other, "keys"):
            for key in other:
                self[key] = other[key]
        else:
            for key, value in other:
                self[key] = value

        for key in kwargs:
            self[key] = kwargs[key]

    def clear(self):
        cdef AlphaMap alpha_map = self.alpha_map.copy()
        _c_trie = cdatrie.trie_new(alpha_map._c_alpha_map)
        if _c_trie is NULL:
            raise MemoryError()

        cdatrie.trie_free(self._c_trie)
        self._c_trie = _c_trie

    cpdef bint is_dirty(self):
        """
        Returns True if the trie is dirty with some pending changes
        and needs saving to synchronize with the file.
        """
        return cdatrie.trie_is_dirty(self._c_trie)

    def save(self, path):
        """
        Saves this trie.
        """
        with open(path, "wb", 0) as f:
            self.write(f)

    def write(self, f):
        """
        Writes a trie to a file. File-like objects without real
        file descriptors are not supported.
        """
        f.flush()

        cdef stdio.FILE* f_ptr = stdio_ext.fdopen(f.fileno(), "w")
        if f_ptr == NULL:
            raise IOError("Can't open file descriptor")

        cdef int res = cdatrie.trie_fwrite(self._c_trie, f_ptr)
        if res == -1:
            raise IOError("Can't write to file")

        stdio.fflush(f_ptr)

    @classmethod
    def load(cls, path):
        """
        Loads a trie from file.
        """
        with open(path, "rb", 0) as f:
            return cls.read(f)

    @classmethod
    def read(cls, f):
        """
        Creates a new Trie by reading it from file.
        File-like objects without real file descriptors are not supported.

        # XXX: does it work properly in subclasses?
        """
        cdef BaseTrie trie = cls(_create=False)
        trie._c_trie = _load_from_file(f)
        return trie

    def __reduce__(self):
        with tempfile.NamedTemporaryFile() as f:
            self.write(f)
            f.seek(0)
            state = f.read()
            return BaseTrie, (None, None, None, False), state

    def __setstate__(self, bytes state):
        assert self._c_trie is NULL
        with tempfile.NamedTemporaryFile() as f:
            f.write(state)
            f.flush()
            f.seek(0)
            self._c_trie = _load_from_file(f)

    def __setitem__(self, unicode key, cdatrie.TrieData value):
        self._setitem(key, value)

    cdef void _setitem(self, unicode key, cdatrie.TrieData value):
        cdef cdatrie.AlphaChar* c_key = new_alpha_char_from_unicode(key)
        try:
            cdatrie.trie_store(self._c_trie, c_key, value)
        finally:
            free(c_key)

    def __getitem__(self, unicode key):
        return self._getitem(key)

    def get(self, unicode key, default=None):
        try:
            return self._getitem(key)
        except KeyError:
            return default

    # "except? -1": -1 is a valid stored value, so it only signals an
    # error when a Python exception is actually set.
    cdef cdatrie.TrieData _getitem(self, unicode key) except? -1:
        cdef cdatrie.TrieData data
        cdef cdatrie.AlphaChar* c_key = new_alpha_char_from_unicode(key)

        try:
            found = cdatrie.trie_retrieve(self._c_trie, c_key, &data)
        finally:
            free(c_key)

        if not found:
            raise KeyError(key)
        return data

    def __contains__(self, unicode key):
        cdef cdatrie.AlphaChar* c_key = new_alpha_char_from_unicode(key)
        try:
            return cdatrie.trie_retrieve(self._c_trie, c_key, NULL)
        finally:
            free(c_key)

    def __delitem__(self, unicode key):
        self._delitem(key)

    def pop(self, unicode key, default=None):
        try:
            value = self[key]
            self._delitem(key)
            return value
        except KeyError:
            return default

    cpdef bint _delitem(self, unicode key) except -1:
        """
        Deletes the entry for the given key from the trie.
        Returns True when the entry is removed; raises ``KeyError``
        if the key is not present.
        """
        cdef cdatrie.AlphaChar* c_key = new_alpha_char_from_unicode(key)
        try:
            found = cdatrie.trie_delete(self._c_trie, c_key)
        finally:
            free(c_key)

        if not found:
            raise KeyError(key)
        return True

    @staticmethod
    cdef int len_enumerator(cdatrie.AlphaChar *key, cdatrie.TrieData key_data,
                            void *counter_ptr):
        (<int *>counter_ptr)[0] += 1
        return True

    def __len__(self):
        cdef int counter = 0
        cdatrie.trie_enumerate(self._c_trie,
                               <cdatrie.TrieEnumFunc>(self.len_enumerator),
                               &counter)
        return counter

    def __richcmp__(self, other, int op):
        if op == 2:    # ==
            if other is self:
                return True
            elif not isinstance(other, BaseTrie):
                return False

            for key in self:
                # a key missing from ``other`` means "not equal", not KeyError
                if key not in other or self[key] != other[key]:
                    return False

            # XXX this can be written more efficiently via explicit iterators.
            return len(self) == len(other)
        elif op == 3:  # !=
            return not (self == other)

        raise TypeError("unorderable types: {0} and {1}".format(
            self.__class__, other.__class__))

    def setdefault(self, unicode key, cdatrie.TrieData value):
        return self._setdefault(key, value)

    cdef cdatrie.TrieData _setdefault(self, unicode key, cdatrie.TrieData value):
        cdef cdatrie.AlphaChar* c_key = new_alpha_char_from_unicode(key)
        cdef cdatrie.TrieData data

        try:
            found = cdatrie.trie_retrieve(self._c_trie, c_key, &data)
            if found:
                return data
            else:
                cdatrie.trie_store(self._c_trie, c_key, value)
                return value
        finally:
            free(c_key)

    def iter_prefixes(self, unicode key):
        '''
        Returns an iterator over the keys of this trie that are prefixes
        of ``key``.
        '''
        cdef cdatrie.TrieState* state = cdatrie.trie_root(self._c_trie)
        if state == NULL:
            raise MemoryError()

        cdef int index = 1
        try:
            for char in key:
                if not cdatrie.trie_state_walk(state, char):
                    return
                if cdatrie.trie_state_is_terminal(state):
                    yield key[:index]
                index += 1
        finally:
            cdatrie.trie_state_free(state)

    def iter_prefix_items(self, unicode key):
        '''
        Returns an iterator over the items (``(key,value)`` tuples)
        of this trie that are associated with keys that are prefixes of ``key``.
        '''
        cdef cdatrie.TrieState* state = cdatrie.trie_root(self._c_trie)
        if state == NULL:
            raise MemoryError()

        cdef int index = 1
        try:
            for char in key:
                if not cdatrie.trie_state_walk(state, char):
                    return
                if cdatrie.trie_state_is_terminal(state):  # word is found
                    yield key[:index], cdatrie.trie_state_get_data(state)
                index += 1
        finally:
            cdatrie.trie_state_free(state)

    def iter_prefix_values(self, unicode key):
        '''
        Returns an iterator over the values of this trie that are associated
        with keys that are prefixes of ``key``.
        '''
        cdef cdatrie.TrieState* state = cdatrie.trie_root(self._c_trie)
        if state == NULL:
            raise MemoryError()

        try:
            for char in key:
                if not cdatrie.trie_state_walk(state, char):
                    return
                if cdatrie.trie_state_is_terminal(state):
                    yield cdatrie.trie_state_get_data(state)
        finally:
            cdatrie.trie_state_free(state)

    def prefixes(self, unicode key):
        '''
        Returns a list with keys of this trie that are prefixes of ``key``.
        '''
        cdef cdatrie.TrieState* state = cdatrie.trie_root(self._c_trie)
        if state == NULL:
            raise MemoryError()

        cdef list result = []
        cdef int index = 1
        try:
            for char in key:
                if not cdatrie.trie_state_walk(state, char):
                    break
                if cdatrie.trie_state_is_terminal(state):
                    result.append(key[:index])
                index += 1
            return result
        finally:
            cdatrie.trie_state_free(state)

    cpdef suffixes(self, unicode prefix=u''):
        """
        Returns a list of this trie's suffixes.
        If ``prefix`` is not empty, returns only the suffixes of words prefixed by ``prefix``.
        """
        cdef bint success
        cdef list res = []
        cdef BaseState state = BaseState(self)

        if prefix is not None:
            success = state.walk(prefix)
            if not success:
                return res

        cdef BaseIterator iter = BaseIterator(state)
        while iter.next():
            res.append(iter.key())

        return res

    def prefix_items(self, unicode key):
        '''
        Returns a list of the items (``(key,value)`` tuples)
        of this trie that are associated with keys that are
        prefixes of ``key``.
        '''
        return self._prefix_items(key)

    cdef list _prefix_items(self, unicode key):
        cdef cdatrie.TrieState* state = cdatrie.trie_root(self._c_trie)
        if state == NULL:
            raise MemoryError()

        cdef list result = []
        cdef int index = 1
        try:
            for char in key:
                if not cdatrie.trie_state_walk(state, char):
                    break
                if cdatrie.trie_state_is_terminal(state):  # word is found
                    result.append(
                        (key[:index],
                         cdatrie.trie_state_get_data(state))
                    )
                index += 1
            return result
        finally:
            cdatrie.trie_state_free(state)

    def prefix_values(self, unicode key):
        '''
        Returns a list of the values of this trie that are associated
        with keys that are prefixes of ``key``.
        '''
        return self._prefix_values(key)

    cdef list _prefix_values(self, unicode key):
        cdef cdatrie.TrieState* state = cdatrie.trie_root(self._c_trie)
        if state == NULL:
            raise MemoryError()

        cdef list result = []
        try:
            for char in key:
                if not cdatrie.trie_state_walk(state, char):
                    break
                if cdatrie.trie_state_is_terminal(state):  # word is found
                    result.append(cdatrie.trie_state_get_data(state))
            return result
        finally:
            cdatrie.trie_state_free(state)

    def longest_prefix(self, unicode key, default=RAISE_KEY_ERROR):
        """
        Returns the longest key in this trie that is a prefix of ``key``.

        If the trie doesn't contain any prefix of ``key``:
          - if ``default`` is given, returns it,
          - otherwise raises ``KeyError``.
        """
        cdef cdatrie.TrieState* state = cdatrie.trie_root(self._c_trie)
        if state == NULL:
            raise MemoryError()

        cdef int index = 0, last_terminal_index = 0

        try:
            for ch in key:
                if not cdatrie.trie_state_walk(state, ch):
                    break

                index += 1
                if cdatrie.trie_state_is_terminal(state):
                    last_terminal_index = index

            if not last_terminal_index:
                if default is RAISE_KEY_ERROR:
                    raise KeyError(key)
                return default

            return key[:last_terminal_index]
        finally:
            cdatrie.trie_state_free(state)

    def longest_prefix_item(self, unicode key, default=RAISE_KEY_ERROR):
        """
        Returns the item (``(key,value)`` tuple) associated with the longest
        key in this trie that is a prefix of ``key``.

        If the trie doesn't contain any prefix of ``key``:
          - if ``default`` is given, returns it,
          - otherwise raises ``KeyError``.
        """
        return self._longest_prefix_item(key, default)

    cdef _longest_prefix_item(self, unicode key, default=RAISE_KEY_ERROR):
        cdef cdatrie.TrieState* state = cdatrie.trie_root(self._c_trie)
        if state == NULL:
            raise MemoryError()

        cdef int index = 0, last_terminal_index = 0, data

        try:
            for ch in key:
                if not cdatrie.trie_state_walk(state, ch):
                    break

                index += 1
                if cdatrie.trie_state_is_terminal(state):
                    last_terminal_index = index
                    data = cdatrie.trie_state_get_data(state)

            if not last_terminal_index:
                if default is RAISE_KEY_ERROR:
                    raise KeyError(key)
                return default

            return key[:last_terminal_index], data

        finally:
            cdatrie.trie_state_free(state)

    def longest_prefix_value(self, unicode key, default=RAISE_KEY_ERROR):
        """
        Returns the value associated with the longest key in this trie that is
        a prefix of ``key``.
        If the trie doesn't contain any prefix of ``key``:
          - if ``default`` is given, returns it,
          - otherwise raises ``KeyError``.
        """
        return self._longest_prefix_value(key, default)

    cdef _longest_prefix_value(self, unicode key, default=RAISE_KEY_ERROR):
        cdef cdatrie.TrieState* state = cdatrie.trie_root(self._c_trie)
        if state == NULL:
            raise MemoryError()

        cdef int data = 0
        cdef char found = 0

        try:
            for ch in key:
                if not cdatrie.trie_state_walk(state, ch):
                    break

                if cdatrie.trie_state_is_terminal(state):
                    found = 1
                    data = cdatrie.trie_state_get_data(state)

            if not found:
                if default is RAISE_KEY_ERROR:
                    raise KeyError(key)
                return default

            return data

        finally:
            cdatrie.trie_state_free(state)

    def has_keys_with_prefix(self, unicode prefix):
        """
        Returns True if any key in the trie begins with ``prefix``.
        """
        cdef cdatrie.TrieState* state = cdatrie.trie_root(self._c_trie)
        if state == NULL:
            raise MemoryError()
        try:
            for char in prefix:
                if not cdatrie.trie_state_walk(state, char):
                    return False
            return True
        finally:
            cdatrie.trie_state_free(state)

    cpdef items(self, unicode prefix=None):
        """
        Returns a list of this trie's items (``(key,value)`` tuples).

        If ``prefix`` is not None, returns only the items
        associated with keys prefixed by ``prefix``.
        """
        cdef bint success
        cdef list res = []
        cdef BaseState state = BaseState(self)

        if prefix is not None:
            success = state.walk(prefix)
            if not success:
                return res

        cdef BaseIterator iter = BaseIterator(state)

        if prefix is None:
            while iter.next():
                res.append((iter.key(), iter.data()))
        else:
            while iter.next():
                res.append((prefix+iter.key(), iter.data()))

        return res

    def __iter__(self):
        cdef BaseIterator iter = BaseIterator(BaseState(self))
        while iter.next():
            yield iter.key()

    cpdef keys(self, unicode prefix=None):
        """
        Returns a list of this trie's keys.

        If ``prefix`` is not None, returns only the keys prefixed by ``prefix``.
        """
        cdef bint success
        cdef list res = []
        cdef BaseState state = BaseState(self)

        if prefix is not None:
            success = state.walk(prefix)
            if not success:
                return res

        cdef BaseIterator iter = BaseIterator(state)

        if prefix is None:
            while iter.next():
                res.append(iter.key())
        else:
            while iter.next():
                res.append(prefix+iter.key())

        return res

    cpdef values(self, unicode prefix=None):
        """
        Returns a list of this trie's values.

        If ``prefix`` is not None, returns only the values
        associated with keys prefixed by ``prefix``.
        """
        cdef bint success
        cdef list res = []
        cdef BaseState state = BaseState(self)

        if prefix is not None:
            success = state.walk(prefix)
            if not success:
                return res

        cdef BaseIterator iter = BaseIterator(state)
        while iter.next():
            res.append(iter.data())
        return res

    cdef _index_to_value(self, cdatrie.TrieData index):
        return index


cdef class Trie(BaseTrie):
    """
    Wrapper for libdatrie's trie.
    Keys are unicode strings, values are Python objects.
    """

    cdef list _values

    def __init__(self, alphabet=None, ranges=None, AlphaMap alpha_map=None, _create=True):
        """
        For efficiency trie needs to know what unicode symbols
        it should be able to store so this constructor requires
        either ``alphabet`` (a string/iterable with all allowed characters),
        ``ranges`` (a list of (begin, end) pairs, e.g. [('a', 'z')])
        or ``alpha_map`` (:class:`datrie.AlphaMap` instance).
        """
        self._values = []
        super(Trie, self).__init__(alphabet, ranges, alpha_map, _create)

    def __reduce__(self):
        with tempfile.NamedTemporaryFile() as f:
            # Trie.write already appends the pickled _values after the
            # trie data, so they must not be dumped a second time here.
            self.write(f)
            f.seek(0)
            state = f.read()
            return Trie, (None, None, None, False), state

    def __setstate__(self, bytes state):
        assert self._c_trie is NULL
        with tempfile.NamedTemporaryFile() as f:
            f.write(state)
            f.flush()
            f.seek(0)
            self._c_trie = _load_from_file(f)
            self._values = pickle.load(f)

    def __getitem__(self, unicode key):
        cdef cdatrie.TrieData index = self._getitem(key)
        return self._values[index]

    def get(self, unicode key, default=None):
        cdef cdatrie.TrieData index
        try:
            index = self._getitem(key)
            return self._values[index]
        except KeyError:
            return default

    def __setitem__(self, unicode key, object value):
        cdef cdatrie.TrieData next_index = len(self._values)
        cdef cdatrie.TrieData index = self._setdefault(key, next_index)
        if index == next_index:
            self._values.append(value)  # insert
        else:
            self._values[index] = value  # update

    def setdefault(self, unicode key, object value):
        cdef cdatrie.TrieData next_index = len(self._values)
        cdef cdatrie.TrieData index = self._setdefault(key, next_index)
        if index == next_index:
            self._values.append(value)  # insert
            return value
        else:
            return self._values[index]  # lookup

    def __delitem__(self, unicode key):
        # XXX: this could be faster (key is encoded twice here)
        cdef cdatrie.TrieData index = self._getitem(key)
        self._values[index] = DELETED_OBJECT
        self._delitem(key)

    def write(self, f):
        """
        Writes a trie to a file. File-like objects without real
        file descriptors are not supported.
        """
        super(Trie, self).write(f)
        pickle.dump(self._values, f)

    @classmethod
    def read(cls, f):
        """
        Creates a new Trie by reading it from file.
        File-like objects without real file descriptors
        are not supported.
        """
        cdef Trie trie = super(Trie, cls).read(f)
        trie._values = pickle.load(f)
        return trie

    cpdef items(self, unicode prefix=None):
        """
        Returns a list of this trie's items (``(key,value)`` tuples).

        If ``prefix`` is not None, returns only the items
        associated with keys prefixed by ``prefix``.
        """

        # the following code is
        #
        #    [(k, self._values[v]) for (k,v) in BaseTrie.items(self, prefix)]
        #
        # but inlined for speed.

        cdef bint success
        cdef list res = []
        cdef BaseState state = BaseState(self)

        if prefix is not None:
            success = state.walk(prefix)
            if not success:
                return res

        cdef BaseIterator iter = BaseIterator(state)

        if prefix is None:
            while iter.next():
                res.append((iter.key(), self._values[iter.data()]))
        else:
            while iter.next():
                res.append((prefix+iter.key(), self._values[iter.data()]))

        return res

    cpdef values(self, unicode prefix=None):
        """
        Returns a list of this trie's values.

        If ``prefix`` is not None, returns only the values
        associated with keys prefixed by ``prefix``.
        """

        # the following code is
        #
        #     [self._values[v] for v in BaseTrie.values(self, prefix)]
        #
        # but inlined for speed.

        cdef list res = []
        cdef BaseState state = BaseState(self)
        cdef bint success

        if prefix is not None:
            success = state.walk(prefix)
            if not success:
                return res

        cdef BaseIterator iter = BaseIterator(state)

        while iter.next():
            res.append(self._values[iter.data()])

        return res

    def longest_prefix_item(self, unicode key, default=RAISE_KEY_ERROR):
        """
        Returns the item (``(key,value)`` tuple) associated with the longest
        key in this trie that is a prefix of ``key``.

        If the trie doesn't contain any prefix of ``key``:
          - if ``default`` is given, returns it,
          - otherwise raises ``KeyError``.
        """
        cdef res = self._longest_prefix_item(key, RERAISE_KEY_ERROR)
        if res is RERAISE_KEY_ERROR:  # error
            if default is RAISE_KEY_ERROR:
                raise KeyError(key)
            return default

        return res[0], self._values[res[1]]

    def longest_prefix_value(self, unicode key, default=RAISE_KEY_ERROR):
        """
        Returns the value associated with the longest key in this trie that is
        a prefix of ``key``.

        If the trie doesn't contain any prefix of ``key``:
          - if ``default`` is given, returns it,
          - otherwise raises ``KeyError``.
        """
        cdef res = self._longest_prefix_value(key, RERAISE_KEY_ERROR)
        if res is RERAISE_KEY_ERROR:  # error
            if default is RAISE_KEY_ERROR:
                raise KeyError(key)
            return default

        return self._values[res]

    def prefix_items(self, unicode key):
        '''
        Returns a list of the items (``(key,value)`` tuples)
        of this trie that are associated with keys that are
        prefixes of ``key``.
        '''
        return [(k, self._values[v]) for (k, v) in self._prefix_items(key)]

    def iter_prefix_items(self, unicode key):
        for k, v in super(Trie, self).iter_prefix_items(key):
            yield k, self._values[v]

    def prefix_values(self, unicode key):
        '''
        Returns a list of the values of this trie that are associated
        with keys that are prefixes of ``key``.
        '''
        return [self._values[v] for v in self._prefix_values(key)]

    def iter_prefix_values(self, unicode key):
        for v in super(Trie, self).iter_prefix_values(key):
            yield self._values[v]

    cdef _index_to_value(self, cdatrie.TrieData index):
        return self._values[index]


cdef class _TrieState:
    cdef cdatrie.TrieState* _state
    cdef BaseTrie _trie

    def __cinit__(self, BaseTrie trie):
        self._state = cdatrie.trie_root(trie._c_trie)
        if self._state is NULL:
            raise MemoryError()
        self._trie = trie

    def __dealloc__(self):
        if self._state is not NULL:
            cdatrie.trie_state_free(self._state)

    cpdef walk(self, unicode to):
        cdef bint res
        for ch in to:
            if not self.walk_char(<cdatrie.AlphaChar> ch):
                return False
        return True

    cdef bint walk_char(self, cdatrie.AlphaChar char):
        """
        Walks the trie stepwise, using a given character ``char``.
        On return, the state is updated to the new state if successfully walked.
        Returns boolean value indicating the success of the walk.
        """
        return cdatrie.trie_state_walk(self._state, char)

    cpdef copy_to(self, _TrieState state):
        """
        Copies trie state to another
        """
        cdatrie.trie_state_copy(state._state, self._state)

    cpdef rewind(self):
        """
        Puts the state at root
        """
        cdatrie.trie_state_rewind(self._state)

    cpdef bint is_terminal(self):
        return cdatrie.trie_state_is_terminal(self._state)

    cpdef bint is_single(self):
        return cdatrie.trie_state_is_single(self._state)

    cpdef bint is_leaf(self):
        return cdatrie.trie_state_is_leaf(self._state)

    def __unicode__(self):
        return u"data:%d, term:%s, leaf:%s, single: %s" % (
            self.data(),
            self.is_terminal(),
            self.is_leaf(),
            self.is_single(),
        )

    def __repr__(self):
        return self.__unicode__()  # XXX: this is incorrect under Python 2.x


cdef class BaseState(_TrieState):
    """
    cdatrie.TrieState wrapper. It can be used for custom trie traversal.
    """
    cpdef int data(self):
        return cdatrie.trie_state_get_data(self._state)


cdef class State(_TrieState):

    def __cinit__(self, Trie trie):
        # This override only narrows the argument type to Trie.
        # _TrieState.__cinit__ has already run with the same argument and
        # created the root state; allocating another one here would leak it.
        pass

    cpdef data(self):
        cdef cdatrie.TrieData data = cdatrie.trie_state_get_data(self._state)
        return self._trie._index_to_value(data)


cdef class _TrieIterator:
    cdef cdatrie.TrieIterator* _iter
    cdef _TrieState _root

    def __cinit__(self, _TrieState state):
        self._root = state  # prevent garbage collection of state
        self._iter = cdatrie.trie_iterator_new(state._state)
        if self._iter is NULL:
            raise MemoryError()

    def __dealloc__(self):
        if self._iter is not NULL:
            cdatrie.trie_iterator_free(self._iter)

    cpdef bint next(self):
        return cdatrie.trie_iterator_next(self._iter)

    cpdef unicode key(self):
        cdef cdatrie.AlphaChar* key = cdatrie.trie_iterator_get_key(self._iter)
        try:
            return unicode_from_alpha_char(key)
        finally:
            free(key)


cdef class BaseIterator(_TrieIterator):
    """
    cdatrie.TrieIterator wrapper. It can be used for custom datrie.BaseTrie
    traversal.
    """
    cpdef cdatrie.TrieData data(self):
        return cdatrie.trie_iterator_get_data(self._iter)


cdef class Iterator(_TrieIterator):
    """
    cdatrie.TrieIterator wrapper. It can be used for custom datrie.Trie
    traversal.
    """
    def __cinit__(self, State state):
        # This override only narrows the argument type to State.
        # _TrieIterator.__cinit__ has already run with the same argument and
        # created the iterator; allocating another one here would leak it.
        pass

    cpdef data(self):
        cdef cdatrie.TrieData data = cdatrie.trie_iterator_get_data(self._iter)
        return self._root._trie._index_to_value(data)


cdef (cdatrie.Trie* ) _load_from_file(f) except NULL:
    cdef int fd = f.fileno()
    cdef stdio.FILE* f_ptr = stdio_ext.fdopen(fd, "r")
    if f_ptr == NULL:
        raise IOError()
    cdef cdatrie.Trie* trie = cdatrie.trie_fread(f_ptr)
    if trie == NULL:
        raise DatrieError("Can't load trie from stream")

    cdef int f_pos = stdio.ftell(f_ptr)
    f.seek(f_pos)

    return trie

#cdef (cdatrie.Trie*) _load_from_file(path) except NULL:
#    str_path = path.encode(sys.getfilesystemencoding())
#    cdef char* c_path = str_path
#    cdef cdatrie.Trie* trie = cdatrie.trie_new_from_file(c_path)
#    if trie is NULL:
#        raise DatrieError("Can't load trie from file")
#
#    return trie


# ============================ AlphaMap & utils ================================

cdef class AlphaMap:
    """
    Alphabet map.

    For sparse data compactness, the trie alphabet set should
    be continuous, but that is usually not the case in general
    character sets. Therefore, a map between the input character
    and the low-level alphabet set for the trie is created in the
    middle. You will have to define your input character set by
    listing their continuous ranges of character codes before
    creating a trie. Then, each character will be automatically
    assigned internal codes of continuous values.
""" cdef cdatrie.AlphaMap *_c_alpha_map def __cinit__(self): self._c_alpha_map = cdatrie.alpha_map_new() def __dealloc__(self): if self._c_alpha_map is not NULL: cdatrie.alpha_map_free(self._c_alpha_map) def __init__(self, alphabet=None, ranges=None, _create=True): if not _create: return if ranges is not None: for range in ranges: self.add_range(*range) if alphabet is not None: self.add_alphabet(alphabet) cdef AlphaMap copy(self): cdef AlphaMap clone = AlphaMap(_create=False) clone._c_alpha_map = cdatrie.alpha_map_clone(self._c_alpha_map) if clone._c_alpha_map is NULL: raise MemoryError() return clone def add_alphabet(self, alphabet): """ Adds all chars from iterable to the alphabet set. """ for begin, end in alphabet_to_ranges(alphabet): self._add_range(begin, end) def add_range(self, begin, end): """ Add a range of character codes from ``begin`` to ``end`` to the alphabet set. ``begin`` - the first character of the range; ``end`` - the last character of the range. """ self._add_range(ord(begin), ord(end)) cpdef _add_range(self, cdatrie.AlphaChar begin, cdatrie.AlphaChar end): if begin > end: raise DatrieError('range begin > end') code = cdatrie.alpha_map_add_range(self._c_alpha_map, begin, end) if code != 0: raise MemoryError() cdef cdatrie.AlphaChar* new_alpha_char_from_unicode(unicode txt): """ Converts Python unicode string to libdatrie's AlphaChar* format. libdatrie wants null-terminated array of 4-byte LE symbols. The caller should free the result of this function. """ cdef int txt_len = len(txt) cdef int size = (txt_len + 1) * sizeof(cdatrie.AlphaChar) # allocate buffer cdef cdatrie.AlphaChar* data = malloc(size) if data is NULL: raise MemoryError() # Copy text contents to buffer. # XXX: is it safe? 
The safe alternative is to decode txt # to utf32_le and then use memcpy to copy the content: # # py_str = txt.encode('utf_32_le') # cdef char* c_str = py_str # string.memcpy(data, c_str, size-1) # # but the following is much (say 10x) faster and this # function is really in a hot spot. cdef int i = 0 for char in txt: data[i] = char i+=1 # Buffer must be null-terminated (last 4 bytes must be zero). data[txt_len] = 0 return data cdef unicode unicode_from_alpha_char(cdatrie.AlphaChar* key, int len=0): """ Converts libdatrie's AlphaChar* to Python unicode. """ cdef int length = len if length == 0: length = cdatrie.alpha_char_strlen(key)*sizeof(cdatrie.AlphaChar) cdef char* c_str = key return c_str[:length].decode('utf_32_le') def to_ranges(lst): """ Converts a list of numbers to a list of ranges:: >>> numbers = [1,2,3,5,6] >>> list(to_ranges(numbers)) [(1, 3), (5, 6)] """ for a, b in itertools.groupby(enumerate(lst), lambda t: t[1] - t[0]): b = list(b) yield b[0][1], b[-1][1] def alphabet_to_ranges(alphabet): for begin, end in to_ranges(sorted(map(ord, iter(alphabet)))): yield begin, end def new(alphabet=None, ranges=None, AlphaMap alpha_map=None): warnings.warn('datrie.new is deprecated; please use datrie.Trie.', DeprecationWarning) return Trie(alphabet, ranges, alpha_map) MutableMapping.register(Trie) MutableMapping.register(BaseTrie) ././@PaxHeader0000000000000000000000000000002600000000000011453 xustar000000000000000022 mtime=1585088538.0 datrie-0.8.2/src/stdio_ext.pxd0000644000175000017500000000014600000000000017757 0ustar00tcaswelltcaswell00000000000000from libc cimport stdio cdef extern from "stdio.h" nogil: stdio.FILE *fdopen(int fd, char *mode) ././@PaxHeader0000000000000000000000000000003400000000000011452 xustar000000000000000028 mtime=1585193107.7794485 datrie-0.8.2/tests/0000755000175000017500000000000000000000000015612 5ustar00tcaswelltcaswell00000000000000././@PaxHeader0000000000000000000000000000002600000000000011453 xustar000000000000000022 
mtime=1585088538.0 datrie-0.8.2/tests/__init__.py0000644000175000017500000000007600000000000017726 0ustar00tcaswelltcaswell00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import././@PaxHeader0000000000000000000000000000002600000000000011453 xustar000000000000000022 mtime=1585088538.0 datrie-0.8.2/tests/test_iteration.py0000644000175000017500000000416600000000000021230 0ustar00tcaswelltcaswell00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, unicode_literals import string import datrie WORDS = ['producers', 'pool', 'prepare', 'preview', 'prize', 'produce', 'producer', 'progress'] def _trie(): trie = datrie.Trie(ranges=[(chr(0), chr(127))]) for index, word in enumerate(WORDS, 1): trie[word] = index return trie def test_base_trie_data(): trie = datrie.BaseTrie(string.printable) trie['x'] = 1 trie['xo'] = 2 state = datrie.BaseState(trie) state.walk('x') it = datrie.BaseIterator(state) it.next() assert it.data() == 1 state.walk('o') it = datrie.BaseIterator(state) it.next() assert it.data() == 2 def test_next(): trie = _trie() state = datrie.State(trie) it = datrie.Iterator(state) values = [] while it.next(): values.append(it.data()) assert len(values) == 8 assert values == [2, 3, 4, 5, 6, 7, 1, 8] def test_next_non_root(): trie = _trie() state = datrie.State(trie) state.walk('pr') it = datrie.Iterator(state) values = [] while it.next(): values.append(it.data()) assert len(values) == 7 assert values == [3, 4, 5, 6, 7, 1, 8] def test_next_tail(): trie = _trie() state = datrie.State(trie) state.walk('poo') it = datrie.Iterator(state) values = [] while it.next(): values.append(it.data()) assert values == [2] def test_keys(): trie = _trie() state = datrie.State(trie) it = datrie.Iterator(state) keys = [] while it.next(): keys.append(it.key()) assert keys == sorted(WORDS) def test_keys_non_root(): trie = _trie() state = datrie.State(trie) state.walk('pro') it = datrie.Iterator(state) keys = [] while it.next(): 
        keys.append(it.key())

    assert keys == ['duce', 'ducer', 'ducers', 'gress']


def test_keys_tail():
    trie = _trie()
    state = datrie.State(trie)
    state.walk('pro')
    it = datrie.Iterator(state)

    keys = []
    while it.next():
        keys.append(it.key())

    assert keys == ['duce', 'ducer', 'ducers', 'gress']

# File: datrie-0.8.2/tests/test_random.py
# -*- coding: utf-8 -*-

from __future__ import absolute_import, unicode_literals

import pickle
import string

import datrie
import hypothesis.strategies as st
from hypothesis import given

printable_strings = st.lists(st.text(string.printable))


@given(printable_strings)
def test_contains(words):
    trie = datrie.Trie(string.printable)
    for i, word in enumerate(set(words)):
        trie[word] = i + 1

    for i, word in enumerate(set(words)):
        assert word in trie
        assert trie[word] == trie.get(word) == i + 1


@given(printable_strings)
def test_len(words):
    trie = datrie.Trie(string.printable)
    for i, word in enumerate(set(words)):
        trie[word] = i

    assert len(trie) == len(set(words))


@given(printable_strings)
def test_pickle_unpickle(words):
    trie = datrie.Trie(string.printable)
    for i, word in enumerate(set(words)):
        trie[word] = i

    trie = pickle.loads(pickle.dumps(trie))
    for i, word in enumerate(set(words)):
        assert word in trie
        assert trie[word] == i


@given(printable_strings)
def test_pop(words):
    words = set(words)
    trie = datrie.Trie(string.printable)
    for i, word in enumerate(words):
        trie[word] = i

    for i, word in enumerate(words):
        assert trie.pop(word) == i
        assert trie.pop(word, 42) == trie.get(word, 42) == 42


@given(printable_strings)
def test_clear(words):
    words = set(words)
    trie = datrie.Trie(string.printable)
    for i, word in enumerate(words):
        trie[word] = i

    assert len(trie) == len(words)
    trie.clear()
    assert not trie
    assert len(trie) == 0

    # make sure the trie works afterwards.
    for i, word in enumerate(words):
        trie[word] = i
        assert trie[word] == i

# File: datrie-0.8.2/tests/test_state.py
# -*- coding: utf-8 -*-

from __future__ import absolute_import, unicode_literals

import datrie


def _trie():
    trie = datrie.Trie(ranges=[(chr(0), chr(127))])
    trie['f'] = 1
    trie['fo'] = 2
    trie['fa'] = 3
    trie['faur'] = 4
    trie['fauxiiiip'] = 5
    trie['fauzox'] = 10
    trie['fauzoy'] = 20
    return trie


def test_trie_state():
    trie = _trie()
    state = datrie.State(trie)
    state.walk('f')
    assert state.data() == 1
    state.walk('o')
    assert state.data() == 2

# File: datrie-0.8.2/tests/test_trie.py
# -*- coding: utf-8 -*-

from __future__ import absolute_import, unicode_literals

import pickle
import random
import string
import sys
import tempfile

import datrie
import pytest


def test_trie():
    trie = datrie.Trie(string.printable)
    assert trie.is_dirty()

    assert 'foo' not in trie
    assert 'Foo' not in trie

    trie['foo'] = '5'
    assert 'foo' in trie
    assert trie['foo'] == '5'

    trie['Foo'] = 10
    assert trie['Foo'] == 10
    assert trie['foo'] == '5'
    del trie['foo']

    assert 'foo' not in trie
    assert 'Foo' in trie
    assert trie['Foo'] == 10

    with pytest.raises(KeyError):
        trie['bar']


def test_trie_invalid_alphabet():
    t = datrie.Trie('abc')
    t['a'] = 'a'
    t['b'] = 'b'
    t['c'] = 'c'

    for k in 'abc':
        assert t[k] == k

    with pytest.raises(KeyError):
        t['d']

    with pytest.raises(KeyError):
        t['e']


def test_trie_save_load():
    fd, fname = tempfile.mkstemp()
    trie = datrie.Trie(string.printable)
    trie['foobar'] = 1
    trie['foovar'] = 2
    trie['baz'] = 3
    trie['fo'] = 4
    trie['Foo'] = 'vasia'
    trie.save(fname)
    del trie

    trie2 = datrie.Trie.load(fname)
    assert trie2['foobar'] == 1
    assert trie2['baz'] == 3
    assert trie2['fo'] == 4
    assert trie2['foovar'] == 2
    assert trie2['Foo'] == 'vasia'


def test_save_load_base():
    fd, fname = tempfile.mkstemp()
    trie = datrie.BaseTrie(alphabet=string.printable)
    trie['foobar'] = 1
    trie['foovar'] = 2
    trie['baz'] = 3
    trie['fo'] = 4
    trie.save(fname)

    trie2 = datrie.BaseTrie.load(fname)
    assert trie2['foobar'] == 1
    assert trie2['baz'] == 3
    assert trie2['fo'] == 4
    assert trie2['foovar'] == 2


def test_trie_file_io():
    fd, fname = tempfile.mkstemp()

    trie = datrie.BaseTrie(string.printable)
    trie['foobar'] = 1
    trie['foo'] = 2

    extra_data = ['foo', 'bar']

    with open(fname, "wb", 0) as f:
        pickle.dump(extra_data, f)
        trie.write(f)
        pickle.dump(extra_data, f)

    with open(fname, "rb", 0) as f:
        extra_data2 = pickle.load(f)
        trie2 = datrie.BaseTrie.read(f)
        extra_data3 = pickle.load(f)

    assert extra_data2 == extra_data
    assert extra_data3 == extra_data
    assert trie2['foobar'] == 1
    assert trie2['foo'] == 2
    assert len(trie2) == len(trie)


def test_trie_unicode():
    # trie for lowercase Russian characters
    trie = datrie.Trie(ranges=[('а', 'я')])
    trie['а'] = 1
    trie['б'] = 2
    trie['аб'] = 'vasia'

    assert trie['а'] == 1
    assert trie['б'] == 2
    assert trie['аб'] == 'vasia'


def test_trie_ascii():
    trie = datrie.Trie(string.ascii_letters)
    trie['x'] = 1
    trie['y'] = 'foo'
    trie['xx'] = 2

    assert trie['x'] == 1
    assert trie['y'] == 'foo'
    assert trie['xx'] == 2


def test_trie_items():
    trie = datrie.Trie(string.ascii_lowercase)
    trie['foo'] = 10
    trie['bar'] = 'foo'
    trie['foobar'] = 30
    assert trie.values() == ['foo', 10, 30]
    assert trie.items() == [('bar', 'foo'), ('foo', 10), ('foobar', 30)]
    assert trie.keys() == ['bar', 'foo', 'foobar']


def test_trie_iter():
    trie = datrie.Trie(string.ascii_lowercase)
    assert list(trie) == []

    trie['foo'] = trie['bar'] = trie['foobar'] = 42
    assert list(trie) == ['bar', 'foo', 'foobar']


def test_trie_comparison():
    trie = datrie.Trie(string.ascii_lowercase)
    assert trie == trie
    assert trie == datrie.Trie(string.ascii_lowercase)

    other = datrie.Trie(string.ascii_lowercase)
    trie['foo'] = 42
    other['foo'] = 24
    assert trie != other

    other['foo'] = trie['foo']
    assert trie == other

    other['bar'] = 42
    assert trie != other

    with pytest.raises(TypeError):
        trie < other  # same for other comparisons


def test_trie_update():
    trie = datrie.Trie(string.ascii_lowercase)
    trie.update([("foo", 42)])
    assert trie["foo"] == 42
    trie.update({"bar": 123})
    assert trie["bar"] == 123

    if sys.version_info[0] == 2:
        with pytest.raises(TypeError):
            trie.update(bar=24)
    else:
        trie.update(bar=24)
        assert trie["bar"] == 24


def test_trie_suffixes():
    trie = datrie.Trie(string.ascii_lowercase)

    trie['pro'] = 1
    trie['prof'] = 2
    trie['product'] = 3
    trie['production'] = 4
    trie['producer'] = 5
    trie['producers'] = 6
    trie['productivity'] = 7

    assert trie.suffixes('pro') == [
        '', 'ducer', 'ducers', 'duct', 'duction', 'ductivity', 'f'
    ]


def test_trie_len():
    trie = datrie.Trie(string.ascii_lowercase)
    words = ['foo', 'f', 'faa', 'bar', 'foobar']
    for word in words:
        trie[word] = None
    assert len(trie) == len(words)

    # Calling len on an empty trie caused segfault, see #17 on GitHub.
    trie = datrie.Trie(string.ascii_lowercase)
    assert len(trie) == 0


def test_setdefault():
    trie = datrie.Trie(string.ascii_lowercase)
    assert trie.setdefault('foo', 5) == 5
    assert trie.setdefault('foo', 4) == 5
    assert trie.setdefault('foo', 5) == 5
    assert trie.setdefault('bar', 'vasia') == 'vasia'
    assert trie.setdefault('bar', 3) == 'vasia'
    assert trie.setdefault('bar', 7) == 'vasia'


class TestPrefixLookups(object):
    def _trie(self):
        trie = datrie.Trie(string.ascii_lowercase)
        trie['foo'] = 10
        trie['bar'] = 20
        trie['foobar'] = 30
        trie['foovar'] = 40
        trie['foobarzartic'] = None
        return trie

    def test_trie_keys_prefix(self):
        trie = self._trie()
        assert trie.keys('foobarz') == ['foobarzartic']
        assert trie.keys('foobarzart') == ['foobarzartic']
        assert trie.keys('foo') == ['foo', 'foobar', 'foobarzartic', 'foovar']
        assert trie.keys('foobar') == ['foobar', 'foobarzartic']
        assert trie.keys('') == [
            'bar', 'foo', 'foobar', 'foobarzartic', 'foovar'
        ]
        assert trie.keys('x') == []

    def test_trie_items_prefix(self):
        trie = self._trie()
        assert trie.items('foobarz') == [('foobarzartic', None)]
        assert trie.items('foobarzart') == [('foobarzartic', None)]
        assert trie.items('foo') == [
            ('foo', 10),
            ('foobar', 30),
            ('foobarzartic', None),
            ('foovar', 40)
        ]
        assert trie.items('foobar') == [('foobar', 30), ('foobarzartic', None)]
        assert trie.items('') == [
            ('bar', 20),
            ('foo', 10),
            ('foobar', 30),
            ('foobarzartic', None),
            ('foovar', 40)
        ]
        assert trie.items('x') == []

    def test_trie_values_prefix(self):
        trie = self._trie()
        assert trie.values('foobarz') == [None]
        assert trie.values('foobarzart') == [None]
        assert trie.values('foo') == [10, 30, None, 40]
        assert trie.values('foobar') == [30, None]
        assert trie.values('') == [20, 10, 30, None, 40]
        assert trie.values('x') == []


class TestPrefixSearch(object):

    WORDS = ['producers', 'producersz', 'pr', 'pool', 'prepare', 'preview',
             'prize', 'produce', 'producer', 'progress']

    def _trie(self):
        trie = datrie.Trie(string.ascii_lowercase)
        for index, word in enumerate(self.WORDS, 1):
            trie[word] = index

        return trie

    def test_trie_iter_prefixes(self):
        trie = self._trie()
        trie['pr'] = 'foo'

        prefixes = trie.iter_prefixes('producers')
        assert list(prefixes) == ['pr', 'produce', 'producer', 'producers']

        no_prefixes = trie.iter_prefixes('vasia')
        assert list(no_prefixes) == []

        values = trie.iter_prefix_values('producers')
        assert list(values) == ['foo', 8, 9, 1]

        no_prefixes = trie.iter_prefix_values('vasia')
        assert list(no_prefixes) == []

        items = trie.iter_prefix_items('producers')
        assert next(items) == ('pr', 'foo')
        assert next(items) == ('produce', 8)
        assert next(items) == ('producer', 9)
        assert next(items) == ('producers', 1)

        no_prefixes = trie.iter_prefix_items('vasia')
        assert list(no_prefixes) == []

    def test_trie_prefixes(self):
        trie = self._trie()

        prefixes = trie.prefixes('producers')
        assert prefixes == ['pr', 'produce', 'producer', 'producers']

        values = trie.prefix_values('producers')
        assert values == [3, 8, 9, 1]

        items = trie.prefix_items('producers')
        assert items == [('pr', 3), ('produce', 8),
                         ('producer', 9), ('producers', 1)]

        assert trie.prefixes('vasia') == []
        assert trie.prefix_values('vasia') == []
        assert trie.prefix_items('vasia') == []

    def test_has_keys_with_prefix(self):
        trie = self._trie()

        for word in self.WORDS:
            assert trie.has_keys_with_prefix(word)
            assert trie.has_keys_with_prefix(word[:-1])

        assert trie.has_keys_with_prefix('p')
        assert trie.has_keys_with_prefix('poo')
        assert trie.has_keys_with_prefix('pr')
        assert trie.has_keys_with_prefix('priz')

        assert not trie.has_keys_with_prefix('prizey')
        assert not trie.has_keys_with_prefix('ops')
        assert not trie.has_keys_with_prefix('progn')

    def test_longest_prefix(self):
        trie = self._trie()

        for word in self.WORDS:
            assert trie.longest_prefix(word) == word

        assert trie.longest_prefix('pooler') == 'pool'
        assert trie.longest_prefix('producers') == 'producers'
        assert trie.longest_prefix('progressor') == 'progress'

        assert trie.longest_prefix('paol', default=None) is None
        assert trie.longest_prefix('p', default=None) is None
        assert trie.longest_prefix('z', default=None) is None

        with pytest.raises(KeyError):
            trie.longest_prefix('z')

    def test_longest_prefix_bug(self):
        trie = self._trie()
        assert trie.longest_prefix("print") == "pr"
        assert trie.longest_prefix_value("print") == 3
        assert trie.longest_prefix_item("print") == ("pr", 3)

    def test_longest_prefix_item(self):
        trie = self._trie()

        for index, word in enumerate(self.WORDS, 1):
            assert trie.longest_prefix_item(word) == (word, index)

        assert trie.longest_prefix_item('pooler') == ('pool', 4)
        assert trie.longest_prefix_item('producers') == ('producers', 1)
        assert trie.longest_prefix_item('progressor') == ('progress', 10)

        dummy = (None, None)
        assert trie.longest_prefix_item('paol', default=dummy) == dummy
        assert trie.longest_prefix_item('p', default=dummy) == dummy
        assert trie.longest_prefix_item('z', default=dummy) == dummy

        with pytest.raises(KeyError):
            trie.longest_prefix_item('z')

    def test_longest_prefix_value(self):
        trie = self._trie()

        for index, word in enumerate(self.WORDS, 1):
            assert trie.longest_prefix_value(word) == index

        assert trie.longest_prefix_value('pooler') == 4
        assert trie.longest_prefix_value('producers') == 1
        assert trie.longest_prefix_value('progressor') == 10

        assert trie.longest_prefix_value('paol', default=None) is None
        assert trie.longest_prefix_value('p', default=None) is None
        assert trie.longest_prefix_value('z', default=None) is None

        with pytest.raises(KeyError):
            trie.longest_prefix_value('z')


def test_trie_fuzzy():
    russian = 'абвгдеёжзиклмнопрстуфхцчъыьэюя'
    alphabet = russian.upper() + string.ascii_lowercase
    words = list({
        "".join(random.choice(alphabet) for x in range(random.randint(8, 16)))
        for y in range(1000)
    })

    trie = datrie.Trie(alphabet)

    enumerated_words = list(enumerate(words))

    for index, word in enumerated_words:
        trie[word] = index

    assert len(trie) == len(words)

    random.shuffle(enumerated_words)
    for index, word in enumerated_words:
        assert word in trie, word
        assert trie[word] == index, (word, index)

# File: datrie-0.8.2/tox-bench.ini
[tox]
envlist = py27,py34,py35,py36,py37

[testenv]
commands=
    python bench/speed.py

# File: datrie-0.8.2/tox.ini
[tox]
envlist = py27,py34,py35,py36,py37,py38

[testenv]
deps =
    hypothesis
    pytest
    cython
commands=
    py.test []

# File: datrie-0.8.2/update_c.sh
#!/bin/sh

cython src/datrie.pyx src/cdatrie.pxd src/stdio_ext.pxd -a