larch-1.20131130/.bzrignore:

fsck-larch.1
doc/_build/doctrees
doc/_build/html

larch-1.20131130/COPYING:

GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007

Copyright (C) 2007 Free Software Foundation, Inc.

Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

Preamble

The GNU General Public License is a free, copyleft license for software and other kinds of works.

The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change all versions of a program--to make sure it remains free software for all its users. We, the Free Software Foundation, use the GNU General Public License for most of our software; it applies also to any other work released this way by its authors. You can apply it to your programs, too.

When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for them if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs, and that you know you can do these things.

To protect your rights, we need to prevent others from denying you these rights or asking you to surrender the rights. Therefore, you have certain responsibilities if you distribute copies of the software, or if you modify it: responsibilities to respect the freedom of others.
For example, if you distribute copies of such a program, whether gratis or for a fee, you must pass on to the recipients the same freedoms that you received. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.

Developers that use the GNU GPL protect your rights with two steps: (1) assert copyright on the software, and (2) offer you this License giving you legal permission to copy, distribute and/or modify it.

For the developers' and authors' protection, the GPL clearly explains that there is no warranty for this free software. For both users' and authors' sake, the GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions.

Some devices are designed to deny users access to install or run modified versions of the software inside them, although the manufacturer can do so. This is fundamentally incompatible with the aim of protecting users' freedom to change the software. The systematic pattern of such abuse occurs in the area of products for individuals to use, which is precisely where it is most unacceptable. Therefore, we have designed this version of the GPL to prohibit the practice for those products. If such problems arise substantially in other domains, we stand ready to extend this provision to those domains in future versions of the GPL, as needed to protect the freedom of users.

Finally, every program is threatened constantly by software patents. States should not allow patents to restrict development and use of software on general-purpose computers, but in those that do, we wish to avoid the special danger that patents applied to a free program could make it effectively proprietary. To prevent this, the GPL assures that patents cannot be used to render the program non-free.

The precise terms and conditions for copying, distribution and modification follow.

TERMS AND CONDITIONS

0. Definitions.

"This License" refers to version 3 of the GNU General Public License.

"Copyright" also means copyright-like laws that apply to other kinds of works, such as semiconductor masks.

"The Program" refers to any copyrightable work licensed under this License. Each licensee is addressed as "you". "Licensees" and "recipients" may be individuals or organizations.

To "modify" a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. The resulting work is called a "modified version" of the earlier work or a work "based on" the earlier work.

A "covered work" means either the unmodified Program or a work based on the Program.

To "propagate" a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well.

To "convey" a work means any kind of propagation that enables other parties to make or receive copies. Mere interaction with a user through a computer network, with no transfer of a copy, is not conveying.

An interactive user interface displays "Appropriate Legal Notices" to the extent that it includes a convenient and prominently visible feature that (1) displays an appropriate copyright notice, and (2) tells the user that there is no warranty for the work (except to the extent that warranties are provided), that licensees may convey the work under this License, and how to view a copy of this License. If the interface presents a list of user commands or options, such as a menu, a prominent item in the list meets this criterion.

1. Source Code.

The "source code" for a work means the preferred form of the work for making modifications to it.
"Object code" means any non-source form of a work.

A "Standard Interface" means an interface that either is an official standard defined by a recognized standards body, or, in the case of interfaces specified for a particular programming language, one that is widely used among developers working in that language.

The "System Libraries" of an executable work include anything, other than the work as a whole, that (a) is included in the normal form of packaging a Major Component, but which is not part of that Major Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. A "Major Component", in this context, means a major essential component (kernel, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it.

The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities. However, it does not include the work's System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work. For example, Corresponding Source includes interface definition files associated with source files for the work, and the source code for shared libraries and dynamically linked subprograms that the work is specifically designed to require, such as by intimate data communication or control flow between those subprograms and other parts of the work.

The Corresponding Source need not include anything that users can regenerate automatically from other parts of the Corresponding Source.

The Corresponding Source for a work in source code form is that same work.

2. Basic Permissions.

All rights granted under this License are granted for the term of copyright on the Program, and are irrevocable provided the stated conditions are met. This License explicitly affirms your unlimited permission to run the unmodified Program. The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work. This License acknowledges your rights of fair use or other equivalent, as provided by copyright law.

You may make, run and propagate covered works that you do not convey, without conditions so long as your license otherwise remains in force. You may convey covered works to others for the sole purpose of having them make modifications exclusively for you, or provide you with facilities for running those works, provided that you comply with the terms of this License in conveying all material for which you do not control copyright. Those thus making or running the covered works for you must do so exclusively on your behalf, under your direction and control, on terms that prohibit them from making any copies of your copyrighted material outside their relationship with you.

Conveying under any other circumstances is permitted solely under the conditions stated below. Sublicensing is not allowed; section 10 makes it unnecessary.

3. Protecting Users' Legal Rights From Anti-Circumvention Law.

No covered work shall be deemed part of an effective technological measure under any applicable law fulfilling obligations under article 11 of the WIPO copyright treaty adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention of such measures.
When you convey a covered work, you waive any legal power to forbid circumvention of technological measures to the extent such circumvention is effected by exercising rights under this License with respect to the covered work, and you disclaim any intention to limit operation or modification of the work as a means of enforcing, against the work's users, your or third parties' legal rights to forbid circumvention of technological measures.

4. Conveying Verbatim Copies.

You may convey verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice; keep intact all notices stating that this License and any non-permissive terms added in accord with section 7 apply to the code; keep intact all notices of the absence of any warranty; and give all recipients a copy of this License along with the Program.

You may charge any price or no price for each copy that you convey, and you may offer support or warranty protection for a fee.

5. Conveying Modified Source Versions.

You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions:

a) The work must carry prominent notices stating that you modified it, and giving a relevant date.

b) The work must carry prominent notices stating that it is released under this License and any conditions added under section 7. This requirement modifies the requirement in section 4 to "keep intact all notices".

c) You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged.
This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it.

d) If the work has interactive user interfaces, each must display Appropriate Legal Notices; however, if the Program has interactive interfaces that do not display Appropriate Legal Notices, your work need not make them do so.

A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an "aggregate" if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate.

6. Conveying Non-Source Forms.

You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways:

a) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange.
b) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge.

c) Convey individual copies of the object code with a copy of the written offer to provide the Corresponding Source. This alternative is allowed only occasionally and noncommercially, and only if you received the object code with such an offer, in accord with subsection 6b.

d) Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements.

e) Convey the object code using peer-to-peer transmission, provided you inform other peers where the object code and Corresponding Source of the work are being offered to the general public at no charge under subsection 6d.
A separable portion of the object code, whose source code is excluded from the Corresponding Source as a System Library, need not be included in conveying the object code work.

A "User Product" is either (1) a "consumer product", which means any tangible personal property which is normally used for personal, family, or household purposes, or (2) anything designed or sold for incorporation into a dwelling. In determining whether a product is a consumer product, doubtful cases shall be resolved in favor of coverage. For a particular product received by a particular user, "normally used" refers to a typical or common use of that class of product, regardless of the status of the particular user or of the way in which the particular user actually uses, or expects or is expected to use, the product. A product is a consumer product regardless of whether the product has substantial commercial, industrial or non-consumer uses, unless such uses represent the only significant mode of use of the product.

"Installation Information" for a User Product means any methods, procedures, authorization keys, or other information required to install and execute modified versions of a covered work in that User Product from a modified version of its Corresponding Source. The information must suffice to ensure that the continued functioning of the modified object code is in no case prevented or interfered with solely because modification has been made.

If you convey an object code work under this section in, or with, or specifically for use in, a User Product, and the conveying occurs as part of a transaction in which the right of possession and use of the User Product is transferred to the recipient in perpetuity or for a fixed term (regardless of how the transaction is characterized), the Corresponding Source conveyed under this section must be accompanied by the Installation Information.
But this requirement does not apply if neither you nor any third party retains the ability to install modified object code on the User Product (for example, the work has been installed in ROM).

The requirement to provide Installation Information does not include a requirement to continue to provide support service, warranty, or updates for a work that has been modified or installed by the recipient, or for the User Product in which it has been modified or installed. Access to a network may be denied when the modification itself materially and adversely affects the operation of the network or violates the rules and protocols for communication across the network.

Corresponding Source conveyed, and Installation Information provided, in accord with this section must be in a format that is publicly documented (and with an implementation available to the public in source code form), and must require no special password or key for unpacking, reading or copying.

7. Additional Terms.

"Additional permissions" are terms that supplement the terms of this License by making exceptions from one or more of its conditions. Additional permissions that are applicable to the entire Program shall be treated as though they were included in this License, to the extent that they are valid under applicable law. If additional permissions apply only to part of the Program, that part may be used separately under those permissions, but the entire Program remains governed by this License without regard to the additional permissions.

When you convey a copy of a covered work, you may at your option remove any additional permissions from that copy, or from any part of it. (Additional permissions may be written to require their own removal in certain cases when you modify the work.) You may place additional permissions on material, added by you to a covered work, for which you have or can give appropriate copyright permission.
Notwithstanding any other provision of this License, for material you add to a covered work, you may (if authorized by the copyright holders of that material) supplement the terms of this License with terms:

a) Disclaiming warranty or limiting liability differently from the terms of sections 15 and 16 of this License; or

b) Requiring preservation of specified reasonable legal notices or author attributions in that material or in the Appropriate Legal Notices displayed by works containing it; or

c) Prohibiting misrepresentation of the origin of that material, or requiring that modified versions of such material be marked in reasonable ways as different from the original version; or

d) Limiting the use for publicity purposes of names of licensors or authors of the material; or

e) Declining to grant rights under trademark law for use of some trade names, trademarks, or service marks; or

f) Requiring indemnification of licensors and authors of that material by anyone who conveys the material (or modified versions of it) with contractual assumptions of liability to the recipient, for any liability that these contractual assumptions directly impose on those licensors and authors.

All other non-permissive additional terms are considered "further restrictions" within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term. If a license document contains a further restriction but permits relicensing or conveying under this License, you may add to a covered work material governed by the terms of that license document, provided that the further restriction does not survive such relicensing or conveying.
If you add terms to a covered work in accord with this section, you must place, in the relevant source files, a statement of the additional terms that apply to those files, or a notice indicating where to find the applicable terms.

Additional terms, permissive or non-permissive, may be stated in the form of a separately written license, or stated as exceptions; the above requirements apply either way.

8. Termination.

You may not propagate or modify a covered work except as expressly provided under this License. Any attempt otherwise to propagate or modify it is void, and will automatically terminate your rights under this License (including any patent licenses granted under the third paragraph of section 11).

However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation.

Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice.

Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, you do not qualify to receive new licenses for the same material under section 10.

9. Acceptance Not Required for Having Copies.

You are not required to accept this License in order to receive or run a copy of the Program.
Ancillary propagation of a covered work occurring solely as a consequence of using peer-to-peer transmission to receive a copy likewise does not require acceptance. However, nothing other than this License grants you permission to propagate or modify any covered work. These actions infringe copyright if you do not accept this License. Therefore, by modifying or propagating a covered work, you indicate your acceptance of this License to do so.

10. Automatic Licensing of Downstream Recipients.

Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License. You are not responsible for enforcing compliance by third parties with this License.

An "entity transaction" is a transaction transferring control of an organization, or substantially all assets of one, or subdividing an organization, or merging organizations. If propagation of a covered work results from an entity transaction, each party to that transaction who receives a copy of the work also receives whatever licenses to the work the party's predecessor in interest had or could give under the previous paragraph, plus a right to possession of the Corresponding Source of the work from the predecessor in interest, if the predecessor has it or can get it with reasonable efforts.

You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License. For example, you may not impose a license fee, royalty, or other charge for exercise of rights granted under this License, and you may not initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging that any patent claim is infringed by making, using, selling, offering for sale, or importing the Program or any portion of it.

11. Patents.

A "contributor" is a copyright holder who authorizes use under this License of the Program or a work on which the Program is based.
The work thus licensed is called the contributor's "contributor version".

A contributor's "essential patent claims" are all patent claims owned or controlled by the contributor, whether already acquired or hereafter acquired, that would be infringed by some manner, permitted by this License, of making, using, or selling its contributor version, but do not include claims that would be infringed only as a consequence of further modification of the contributor version. For purposes of this definition, "control" includes the right to grant patent sublicenses in a manner consistent with the requirements of this License.

Each contributor grants you a non-exclusive, worldwide, royalty-free patent license under the contributor's essential patent claims, to make, use, sell, offer for sale, import and otherwise run, modify and propagate the contents of its contributor version.

In the following three paragraphs, a "patent license" is any express agreement or commitment, however denominated, not to enforce a patent (such as an express permission to practice a patent or covenant not to sue for patent infringement). To "grant" such a patent license to a party means to make such an agreement or commitment not to enforce a patent against the party.

If you convey a covered work, knowingly relying on a patent license, and the Corresponding Source of the work is not available for anyone to copy, free of charge and under the terms of this License, through a publicly available network server or other readily accessible means, then you must either (1) cause the Corresponding Source to be so available, or (2) arrange to deprive yourself of the benefit of the patent license for this particular work, or (3) arrange, in a manner consistent with the requirements of this License, to extend the patent license to downstream recipients.
"Knowingly relying" means you have actual knowledge that, but for the patent license, your conveying the covered work in a country, or your recipient's use of the covered work in a country, would infringe one or more identifiable patents in that country that you have reason to believe are valid.

If, pursuant to or in connection with a single transaction or arrangement, you convey, or propagate by procuring conveyance of, a covered work, and grant a patent license to some of the parties receiving the covered work authorizing them to use, propagate, modify or convey a specific copy of the covered work, then the patent license you grant is automatically extended to all recipients of the covered work and works based on it.

A patent license is "discriminatory" if it does not include within the scope of its coverage, prohibits the exercise of, or is conditioned on the non-exercise of one or more of the rights that are specifically granted under this License. You may not convey a covered work if you are a party to an arrangement with a third party that is in the business of distributing software, under which you make payment to the third party based on the extent of your activity of conveying the work, and under which the third party grants, to any of the parties who would receive the covered work from you, a discriminatory patent license (a) in connection with copies of the covered work conveyed by you (or copies made from those copies), or (b) primarily for and in connection with specific products or compilations that contain the covered work, unless you entered into that arrangement, or that patent license was granted, prior to 28 March 2007.

Nothing in this License shall be construed as excluding or limiting any implied license or other defenses to infringement that may otherwise be available to you under applicable patent law.

12. No Surrender of Others' Freedom.
If conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all. For example, if you agree to terms that obligate you to collect a royalty for further conveying from those to whom you convey the Program, the only way you could satisfy both those terms and this License would be to refrain entirely from conveying the Program.

13. Use with the GNU Affero General Public License.

Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU Affero General Public License into a single combined work, and to convey the resulting work. The terms of this License will continue to apply to the part which is the covered work, but the special requirements of the GNU Affero General Public License, section 13, concerning interaction through a network will apply to the combination as such.

14. Revised Versions of this License.

The Free Software Foundation may publish revised and/or new versions of the GNU General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.

Each version is given a distinguishing version number. If the Program specifies that a certain numbered version of the GNU General Public License "or any later version" applies to it, you have the option of following the terms and conditions either of that numbered version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the GNU General Public License, you may choose any version ever published by the Free Software Foundation.
If the Program specifies that a proxy can decide which future versions of the GNU General Public License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Program.

Later license versions may give you additional or different permissions. However, no additional obligations are imposed on any author or copyright holder as a result of your choosing to follow a later version.

15. Disclaimer of Warranty.

THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

16. Limitation of Liability.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

17. Interpretation of Sections 15 and 16.
If the disclaimer of warranty and limitation of liability provided above cannot be given local legal effect according to their terms, reviewing courts shall apply local law that most closely approximates an absolute waiver of all civil liability in connection with the Program, unless a warranty or assumption of liability accompanies a copy of the Program in return for a fee.

END OF TERMS AND CONDITIONS

How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms.

To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively state the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found.

Copyright (C)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see .

Also add information on how to contact you by electronic and paper mail.

If the program does terminal interaction, make it output a short notice like this when it starts in an interactive mode:

Copyright (C)
This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, your program's commands might be different; for a GUI interface, you would use an "about box". You should also get your employer (if you work as a programmer) or school, if any, to sign a "copyright disclaimer" for the program, if necessary. For more information on this, and how to apply and follow the GNU GPL, see . The GNU General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. But first, please read . larch-1.20131130/Makefile0000644000175000017500000000253512246332521014527 0ustar jenkinsjenkins# Makefile for B-tree implementation # Copyright 2010 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . 
all: fsck-larch.1
	$(MAKE) -C doc html

fsck-larch.1: fsck-larch.1.in fsck-larch
	./fsck-larch --generate-manpage=fsck-larch.1.in > fsck-larch.1

check:
	python -m CoverageTestRunner --ignore-missing-from=without-tests
	rm .coverage
	./insert-remove-test tempdir 100
	rm -r tempdir larch.log
	cmdtest tests
	./codec-speed -n1000
	./idpath-speed 1 t.idspeed-test 10 5 1 && rm -r t.idspeed-test
	./refcount-speed --refs=1000
	./speed-test --location t.speed-test --keys 1000

clean:
	rm -f .coverage *.py[co] larch/*.py[co] insert.prof lookup.prof
	rm -rf build tempdir larch.log example.tree t.idspeed-test t.speed-test
	rm -f fsck-larch.1
	$(MAKE) -C doc clean

larch-1.20131130/NEWS0000644000175000017500000004230212246332521013562 0ustar jenkinsjenkins
NEWS for larch
==============

These are the release notes for larch, a Python implementation of a
copy-on-write B-tree, designed by Ohad Rodeh.

Version 1.20131130
------------------

* Serious bug fixed: the "KeyError" crash for reference counts. This
  was a false memory use optimisation, which triggered a rare bug in
  related code. Repeatable test case by Rob Kendrick, and helpful
  analysis by Itamar Turing-Trauring.

* Serious bug fixed: another "node missing" bug. This crash was caused
  by a bug that overwrote on-disk reference count groups with zeroes.
  Repeatable test case by Rob Kendrick.

* Fixes to fsck from Antoine Brenner.

Version 1.20130808
------------------

* Bug fix in how Larch handles partly-committed B-tree journals in
  read-only mode. Previously, this would result in a crash if, say, a
  node had been removed, but the B-tree metadata hadn't been committed.
  Recipe for reproducing the bug found by Damien Couroussé.

Version 1.20130316
------------------

* Fsck now checks for missing nodes, and optionally fixes them by
  deleting references to them.

* Node numbers are now reported in hexadecimal, to make them easier to
  find on disk. Patch by Damien Couroussé.

* The `speed-test` script now works again.
* The initialiser API for `NodeStore` and `NodeStoreMemory` now has a
  new first argument, `allow_writes`, to make them compatible with the
  `NodeStoreDisk` initialiser. Patch by Antoine Brenner.

* Improved error messages for nodes that seem to be missing.

Version 1.20121216
------------------

* Make fsck progress reporting a bit more fine-grained.

Version 1.20121006
------------------

* Critical bug fix: an indentation problem in the Python code was
  fixed. A line was indented wrongly, so it was not included in the
  right block, and therefore did not have access to the variable
  created in that block.

* Bug fix: The Debian packaging was missing a build dependency on
  cmdtest.

Version 1.20120527, released 2012-05-27
---------------------------------------

* New version scheme. Thank you, Joey Hess.

* The on-disk data structures and file formats are now declared frozen.
  An automatic test has been added to verify that things do not break.

Version 0.31, released 2012-05-08
---------------------------------

* Optimize journal use to have fewer round-trips. This matters when the
  journal is stored on high-latency storage, such as an SFTP server.

* Missing key or node sizes in the metadata are now handled better.

* All exceptions thrown by larch are now subclasses of `larch.Error`.

* The on-disk format is now frozen.

Version 0.30, released 2012-04-29
---------------------------------

* `NodeStoreDisk` is now explicitly in read-only or read-write mode. In
  read-only mode it does not replay or roll back the journal, or care
  about any changes made there.

Version 0.29, released 2012-04-15
---------------------------------

* Larch now uses the `larch.FormatProblem` exception when an on-disk
  node store is missing a format version, or has the wrong version.
  This helps Obnam print a better error message.
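The journal and read-only behaviour described in the notes above can be illustrated with a small toy sketch. All names here are invented for illustration; this is not larch's real API, just the general idea: uncommitted writes live in a journal, commit folds them into the main store, rollback discards them, and a read-only opener ignores the journal entirely.

```python
class ToyJournal(object):
    """Toy illustration of journalled updates (not larch's real code).

    Writes go to a journal first; commit() folds them into the main
    store, rollback() discards them. A store opened read-only simply
    ignores any uncommitted journal entries, which is roughly the
    behaviour the 0.30 notes describe for NodeStoreDisk.
    """

    def __init__(self, allow_writes):
        self.allow_writes = allow_writes
        self.store = {}     # committed key/value pairs
        self.journal = {}   # uncommitted changes

    def put(self, key, value):
        assert self.allow_writes, 'store was opened read-only'
        self.journal[key] = value

    def get(self, key):
        # Read-only users see only committed state.
        if self.allow_writes and key in self.journal:
            return self.journal[key]
        return self.store[key]

    def commit(self):
        self.store.update(self.journal)
        self.journal.clear()

    def rollback(self):
        self.journal.clear()
```

The key property is that a crash (or a rollback) before commit leaves only committed state visible, and read-only users never observe half-finished changes.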
Version 0.28, released 2012-03-25
---------------------------------

* Changes to on-disk storage are now done via a journal data structure
  to protect against corruption from crashing. Previously, if a program
  using Larch crashed at an unfortunate moment, the on-disk state would
  not be consistent: some nodes might have been removed, for example,
  but other nodes or the list of root nodes still referred to them.
  This is now fixed.

  The on-disk data format version has not changed: the on-disk format
  is compatible with old versions of larch, as long as there are no
  uncommitted changes from a new version.

  The API has changed, however, and it is now necessary for Larch users
  to call the `Forest.commit` method to force changes to be stored on
  disk. Failure to do so will cause changes to be lost, and many
  annoyed bug reports.

Version 0.27, released 2012-02-18
---------------------------------

* Merged in some fsck `WorkItem` changes from Obnam, so that Obnam can
  share the code.

Version 0.26, released 2011-12-18
---------------------------------

* `open_forest` now works even if the requested node size is different
  from the node size used with an existing tree. The requested size is
  just ignored, rather than causing an error. This is useful for, say,
  Obnam, when the user decides to change the node size setting, or when
  Obnam's default node size for a tree grows. This way, things work.

Version 0.25, released 2011-10-02
---------------------------------

* `fsck` can now optionally attempt to fix B-trees with missing nodes.
  The index nodes referring to the missing nodes get adjusted to drop
  the reference.

Version 0.24, released 2011-09-17
---------------------------------

* The Debian packaging now installs the fsck-larch manual page.

Version 0.23, released 2011-08-19
---------------------------------

* The default size of the LRU cache used by NodeStoreDisk is now 500
  instead of 100.
  This provides much better performance with large trees: 37% vs 99%
  cache hit rates with speed-test for 100k keys.

* The `BTree.lookup_range` method now returns a list, not a generator.
  It turned out to be very surprising to return a generator, especially
  with the documentation saying a list was returned. (Thanks, Jo
  Shields, for the bug report.)

Version 0.22, released 2011-08-03
---------------------------------

* The library now declares which on-disk format it supports, so that
  B-trees stored with an incompatible format can be detected.

* `fsck-larch` now has a `--trace` option, and the library has a few
  more tracing statements.

Version 0.21, released 2011-08-02
---------------------------------

* Better error messages from `fsck-larch`.

* Bug fix: `fsck-larch` no longer reports nodes as missing when they
  aren't in use by all B-trees in a forest.

* `fsck-larch` now has a manual page.

* More `tracing.trace` statements to make it easier to track when nodes
  are created and destroyed.

* B-tree nodes are now stored in a more efficient way on disk (several
  levels of directories, not just one). This is a compatibility
  breaker: old B-trees aren't readable with new larch, and B-trees
  created with the new larch aren't readable by old larch.

* A couple of memory leaks fixed: both were caches that grew
  effectively without bounds.

* Least-Recently-Used caches now log their hit/miss statistics
  explicitly. Previously this was done in the `__del__` method, but
  those did not get called deterministically, so the logging did not
  always happen.

Version 0.20, released 2011-07-20
---------------------------------

* `pydoc larch` should now work better.

* Changes to larch benchmarking scripts (to make them use cliapp).

* `fsck-larch` improvements:

  - now uses cliapp, for better usability
  - now automatically detects a forest's key and node sizes, so the
    user no longer needs to give them manually.
  - some more checks
  - installed as part of the Debian package

* API documentation with Sphinx.
  As part of that, the API was cleaned up a bit with regard to public
  and private methods (the latter being prefixed by underscores now).

* The separate LRU cache implementation is now included in larch, to
  avoid yet another dependency, and to avoid polluting PyPI.

* Various speedups.

* `BTree.count_range` method added, for speed.

* Library version number is now `larch.__version__` instead of
  `larch.version`, to follow Python conventions.

Version 0.19, released 2011-03-21
---------------------------------

* The `NodeStoreDisk` class now uses a separate VFS class for I/O,
  rather than methods in `NodeStoreDisk` itself that required
  subclassing with method overrides. A separate VFS class is clearer
  and simpler to use. As a bonus, the API is now compatible with the
  Obnam VFS API.

* Forest metadata now includes key and node sizes, and there's a
  factory function that checks that the sizes given to it match the
  existing ones. This should reduce potential errors.

* Renamed from btree to larch, to avoid name clashes with other
  projects.

Version 0.18, released 2011-02-18
---------------------------------

* Fix memory leak.

Version 0.17, released 2011-02-13
---------------------------------

* Use the python-tracing library to add logging of node creation and
  removal and other operations. The library makes it possible for the
  user to enable and disable logging for specific code modules, and
  defaults to being disabled, so that the logging will not affect
  normal execution speed.

* `codec-speed` now reports MiB/s, instead of seconds, giving an easy
  way to compare encoding and decoding speeds with, say, hard disk or
  network speeds.

* B-tree now again modifies nodes, and does so by first removing them
  from the node store's upload queue. This returns the library back to
  useful speeds.

Version 0.16.2, released 2011-01-07
-----------------------------------

* Really fix problems with nodes growing too big while in the upload
  queue.
  The previous fixes meant well, but didn't really do the trick. Now we
  stop modifying nodes at all while in the upload queue, which is good
  for several reasons. Performance in this release degrades a lot,
  until there is time to optimize the code. However, correctness is
  more important than performance.

Version 0.16.1, released 2011-01-07
-----------------------------------

* Bug fix: Remove the temporary node used while splitting a leaf node
  that has grown too large.

* Bug fix: Since we do, in fact, modify nodes in-place when they are
  not shared between trees, it was possible for a node to be put into
  the node store upload queue, and later modified. This is not a
  problem as such, but when inserting a value into a leaf node, we
  modify it in place, making it too large, and then create one or two
  new nodes. If the too-large node was at the beginning of the upload
  queue, creating the new node might push it out, resulting in a bug.

  We fix this by moving the too-large node to the end of the queue,
  ensuring it will not be pushed out. A cleaner solution would be to
  not modify the node if it will grow too big, but that change will
  need to wait for a later date.

* BTree now checks that nodes are not too big when they are put into
  the node store. The node store already did the checking, but only at
  the point where it was ready to write the node to disk, and that
  meant the problem was caught at a time that was hard to debug.

* BTree now sets an explicit maximum size for the values inserted into
  the tree: slightly less than half the size of a node. This is
  necessary for leaf node splitting to work correctly without requiring
  more than two nodes to hold everything.

Version 0.16, released 2011-01-07
---------------------------------

* The code to split over-full leaf nodes into two is fixed. Before
  version 0.14 we had a problem where one of the two halves might still
  be too big.
  The code in 0.14 fixed that, but introduced a new problem where one
  of the new nodes would be very small. Now they should be just right.
  No, I have not read Goldilocks recently, why do you ask?

Version 0.15, released 2011-01-02
---------------------------------

* This version replaces all use of my custom `bsearch` function with
  the `bisect` module in the Python standard library. This speeds up
  all operations, some more than others. In-memory benchmarks with
  `speed-test` sped up from about 20% for `remove_range` up to 240% for
  `lookup`. No other changes, but I felt this warranted a release on
  its own.

Version 0.14, released 2010-12-29
---------------------------------

This version seems to work well enough for Obnam to do backups of real
systems. It is, however, not as fast as one would wish.

Bug fixes:

* When a tree was made shallower after a remove operation, the code
  used to assume the direct children of the root node would not be
  shared with other trees. This is obviously a wrong assumption. I must
  have been distracted by XKCD when writing the code.

* A bug in cloning (shadowing) nodes when doing copy-on-write was
  fixed. The code now increments the reference counts of an index
  node's children correctly.

* The cached encoded size of nodes now gets cleared by
  `remove_index_range`.

* Leaf nodes are now split based on size, not key count. Key counts are
  OK for index nodes, whose values are all of the same size. However,
  leaf node values may vary wildly. Sometimes it happens that after
  splitting, one of the halves is still too large.

* `Forest.remove_tree` now actually removes the unshared nodes of the
  tree that is removed.

* `BTree.remove_range` was quite fast, but not always correct. The code
  was just tricky enough that I was unable to find the actual fault, so
  I rewrote the method in a simplistic, but slow way. A speed
  improvement for this needs to happen in a future version.
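The `bisect`-based lookup mentioned in the 0.15 notes above can be sketched as follows. This is a hedged illustration using only the standard library, not larch's actual code; the function name and signature are invented. Node keys are kept sorted, so the standard module finds a key in O(log n) instead of scanning.

```python
import bisect

def leaf_lookup(keys, values, key):
    # keys is kept sorted, so bisect_left finds the insertion point in
    # O(log n); an exact match, if present, sits at that index.
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return values[i]
    raise KeyError(key)
```

Replacing a hand-rolled binary search with `bisect` also moves the hot loop into C, which is where the reported 20% to 240% speedups come from.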
Speed improvements:

* When a node is cloned, its previously computed size is now
  remembered. Since the clone is identical to the original node, except
  for the id, the size will be the same.

New features and stuff:

* fsck-btree: a rudimentary checker of the B-tree data structures. This
  will undoubtedly be improved in the future, but even the simple
  checking it does now has already helped when debugging things.

* Some parts of the code that used to be excluded from test coverage
  now have tests. Now 19 statements remain that are excluded from
  coverage.

* Some other code prettification has happened, including some docstring
  improvements.

* `BTree.remove_range` and `lookup_range` now check that their
  arguments are of the correct key size for that tree.

* `Node` got a new method, `find_pairs`.

* `BTree.dump`, which is useful for debugging, is now nicer to use.

* `NodeStoreDisk` no longer forces the use of `fsync` when it writes
  files. It is not btree's responsibility to decide that on behalf of
  all users. Those who want it can subclass and override the method.

* `RefcountStore` and `UploadQueue` are their own modules, and have
  much better test coverage now. `UploadQueue` got rewritten in terms
  of `LRUCache`.

* New `BTree.range_is_empty` method, for those (few) cases where one
  just needs to know if there are any keys, and where getting all keys
  with `lookup_range` would be slow.

* `BTree.lookup_range` is now a generator, which should reduce memory
  consumption and thus speed things up in cases where a very large
  number of keys are about to be returned.

Removed stuff:

* The NodeStore API no longer wants a `listdir` method. It has been
  removed from NodeStoreDisk.

* `RefcountStore` no longer logs statistics. They did not seem to be
  useful.

* `IndexNode` no longer explicitly checks the types of its arguments.
  This was wasting CPU cycles, and it did not once find an actual bug.
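The upload-queue-on-top-of-an-LRU-structure idea mentioned above can be sketched with an ordered dict. This is a toy illustration only, with invented names and simplified behaviour, not larch's `UploadQueue`: nodes wait in the queue before being written out, overwriting a queued node just replaces the pending entry (so the stale version is never encoded or written), and the oldest entry is flushed when the queue is full.

```python
from collections import OrderedDict

class ToyUploadQueue(object):
    """Toy sketch of an upload queue built on an LRU-style structure
    (illustration only; larch's UploadQueue differs)."""

    def __init__(self, flush, max_len):
        self.flush = flush          # callback: flush(node_id, node)
        self.max_len = max_len
        self.pending = OrderedDict()

    def put(self, node_id, node):
        if node_id in self.pending:
            # Drop the stale pending version; it will never be written.
            del self.pending[node_id]
        self.pending[node_id] = node
        while len(self.pending) > self.max_len:
            # Queue is full: write out the oldest pending node.
            old_id, old_node = self.pending.popitem(last=False)
            self.flush(old_id, old_node)
```

The speedup comes from the overwrite case: a node that is modified again while still queued costs nothing, because only its final version ever reaches the encoder and the disk.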
Version 0.13, released 2010-07-13
---------------------------------

* Speed-related improvement: The size of NodeStoreDisk's LRU cache for
  nodes is now user-settable.

Version 0.12, released 2010-07-11
---------------------------------

* Some speed optimizations.

* The BTree objects no longer have a root_id attribute. Instead, use
  BTree.root.id (if BTree.root is not None).

Version 0.11, released 2010-07-05
---------------------------------

* Now includes a small example program, written for the Wellington
  Python User Group meeting where the first talk about this library was
  given.

* `NodeStoreDisk` stores nodes in a simple directory hierarchy, instead
  of a flat one, to better handle very large trees.

* `NodeStoreDisk` now has a much larger default upload queue size (now
  1024, was 64), to better handle large trees.

* `speed-test` now also tests `remove` and `remove_range` methods.
  Further, it reports both CPU and wall clock timings, and has been
  refactored a bit. Results for `lookup_range` are no longer comparable
  with old versions, since the measured scenario has changed.

* `remove_range` has been rewritten and is now much faster.

Version 0.10, released 2010-06-29
---------------------------------

* Storage of node reference counts is now more efficient.

* NodeStoreDisk stores reference counts and nodes in files in a
  subdirectory of the directory root, which is an incompatible change,
  so trees made by earlier versions can't be used anymore. (That's what
  you get for using alpha level code. Obviously, once btree is declared
  to be ready for production, the on-disk data structures will be
  supported in future versions.)

* Nodes that exist in only one tree are modified in place, rather than
  via copy-on-write. This is impure, but brings a big speed boost.

* New test script: insert-remove-test. "make check" automatically runs
  it.

* Many optimizations to NodeCodec and elsewhere, by Richard Braakman.
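The "more efficient storage of node reference counts" in 0.10 above, and the on-disk reference count groups mentioned in the 1.20131130 notes, can be illustrated with a toy encoder. The group size, field width, and byte order here are assumptions made for the sketch; this is the general packing idea, not larch's `RefcountStore` format: counts for a contiguous range of node ids are packed into one small binary blob instead of one file per counter.

```python
import struct

GROUP_SIZE = 4  # toy group size; an assumption, not larch's value

def encode_group(counts, start_id):
    # Pack one unsigned 16-bit counter per node id in the group;
    # ids missing from the dict count as zero references.
    group = [counts.get(start_id + i, 0) for i in range(GROUP_SIZE)]
    return struct.pack('<%dH' % GROUP_SIZE, *group)

def decode_group(blob):
    return list(struct.unpack('<%dH' % GROUP_SIZE, blob))
```

Grouping counters this way means one read or write per group of nodes rather than per node, which is what makes the storage more efficient.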
Version 0.9, released 2010-05-26
--------------------------------

* Several speed optimizations.

* One of these requires a Least-Recently-Used cache, which is provided
  by my python-lru package. That is now a dependency.

* NodeStoreDisk now has an upload queue. This is so that when a node
  gets immediately overwritten, it can be removed from the queue, i.e.,
  removed before it gets encoded and written at all. This provides
  quite significant speedups.

* A bug in reference counting of index nodes was fixed. This allows the
  speedups from the upload queue to actually occur.

* On-disk node encodings have changed. Anything written by previous
  versions is now unusable.

larch-1.20131130/README0000644000175000017500000001211712246332521013744 0ustar jenkinsjenkins
Larch, a Python B-tree library
==============================

This is an implementation of a particular kind of B-tree, based on
research by Ohad Rodeh. See "B-trees, Shadowing, and Clones" (link
below) for details on the data structure. This is the same data
structure that btrfs uses. Note that my implementation probably differs
from what the paper describes or what btrfs implements.

The distinctive feature of this B-tree implementation is that a node is
never modified. Instead, all updates are done by copy-on-write. Among
other things, this makes it easy to clone a tree, and modify only the
clone, while other processes access the original tree. This is utterly
wonderful for my backup application, and that's the reason I wrote
larch in the first place.

(The previous paragraph contains a small lie: larch does modify nodes
in place, if they are not shared between trees. This is necessary for
performance.)

I have tried to keep the implementation generic and flexible, so that
you may use it in a variety of situations. For example, the tree itself
does not decide where its nodes are stored: you provide a class that
does that for it.
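As a toy illustration of that pluggable-storage idea: the tree only ever asks a store object to save, fetch, and delete encoded nodes by id, so the storage can live in memory, on disk, or anywhere else. The method names below are invented for the sketch; see larch's NodeStore interface for the real API.

```python
class ToyMemoryNodeStore(object):
    """Toy stand-in for a node store: a dict from node id to the
    node's encoded bytes. Swapping this for a class backed by files
    (or SFTP, or anything else) would not change the tree code at all.
    Illustration only; not larch's actual NodeStore interface."""

    def __init__(self):
        self._nodes = {}

    def put_node(self, node_id, encoded_node):
        self._nodes[node_id] = encoded_node

    def get_node(self, node_id):
        return self._nodes[node_id]

    def remove_node(self, node_id):
        del self._nodes[node_id]
```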
I have two implementations of the NodeStore class, one for in-memory
and one for on-disk storage.

Documentation is sparse. Docstrings and reading the code are your best
hope. There is also an example program. The speed-test benchmark
program may also be useful. (Help and feedback and prodding welcome!)

See the file example.py for an example.

* Homepage:
* Version control: `git clone git://git.liw.fi/larch`
* Rodeh paper:

Stability
---------

The larch on-disk file format and data structures are frozen as of
version 0.31. That means any B-trees stored on disk with that version,
or any later version, will be readable by future versions of larch.

This prompts a new version numbering scheme for Larch. In the future,
the version number will be of the form `FORMAT.DATE`, where `FORMAT` is
the on-disk format version for Larch (currently 1), and `DATE` is the
date of the release.

Build and install
-----------------

setup.py should work as usual for distutils. But it might not: I have
only used it enough to make the Debian package build.

You can also use the Debian packaging, if on Debian or a derivative.
Just run "debuild -us -uc" to build the package. Or you can get
packages from my site (see for details).

Hacking
-------

The actual tree code is in the larch directory, laid out as a normal
Python package.

* `tree.py` is the actual tree implementation
* `nodes.py` has tree nodes
* `nodestore.py` defines the interface for storing nodes on disk or
  wherever; `nodestore_disk.py` and `nodestore_memory.py` are two
  implementations of the interface
* `codec.py` handles encoding of nodes for on-disk storage, and
  decoding too
* `forest.py` handles creation of new trees and committing things to a
  node store

Run `make check` to run the test suite. You will need my
CoverageTestRunner and LRUCache, and extrautils packages:

* * * *

The unit test modules are paired with their corresponding code modules:
for code module `foo.py` there exists a unit test module
`foo_tests.py`.
CoverageTestRunner makes sure each code module's unit tests achieve
100% test coverage for that module, not counting explicitly excluded
parts of the code. You can also use `nosetests` to run the tests.

I have two scripts to run simplistic benchmarks:

* `codec-speed 1000 no` tests speed of the NodeCodec class, for
  encoding tree nodes for on-disk storage.
* `speed-test` tests insert and lookup speeds in trees. Use a `--keys`
  setting that is reasonably large, e.g., a million or ten.

You may want to run either or both before and after making any changes,
to see if you're making things better or worse.

If you have any patches, please send them to me (). I prefer to merge
from git repositories, but `git format-patch` or just plain diffs are
always OK.

Bugs and other things to hack
-----------------------------

See for the bug tracker.

If you're interested in hacking the larch code, speed improvements are
always interesting. See the bug tracker for things known to be slow, or
run `speed-test` on large numbers of keys, and use profiling to see
where time is wasted.

Legalese
--------

Copyright 2010, 2011, 2012, 2013 Lars Wirzenius

This program is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 of the License, or (at your
option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License along
with this program. If not, see .
larch-1.20131130/codec-speed0000755000175000017500000001307312246332521015167 0ustar jenkinsjenkins#!/usr/bin/python # Copyright 2010-2011 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . import cliapp import cProfile import csv import os import subprocess import sys import time import larch class CodecBenchmark(cliapp.Application): def add_settings(self): self.settings.string(['csv'], 'append a CSV row to FILE', metavar='FILE') self.settings.boolean(['profile'], 'run benchmark under profiling?') self.settings.integer(['keys', 'n'], 'run benchmark with N keys (default: %default)', metavar='N', default=10000) def process_args(self, args): n = self.settings['keys'] do_profile = self.settings['profile'] # Prepare data for tests. key_size = 19 value_size = 128 node_size = 64*1024 codec = larch.NodeCodec(key_size) key_fmt = '%%0%dd' % key_size keys = [key_fmt % i for i in range(n)] value_fmt = '%%0%dd' % value_size leaf_values = [value_fmt % i for i in range(n)] index_values = range(n) leaf = larch.LeafNode(42, keys, leaf_values) encoded_leaf = codec.encode_leaf(leaf) index = larch.IndexNode(42, keys, index_values) encoded_index = codec.encode_index(index) # Measure and report. 
print 'num_operations: %d' % n leaf_size = self.measure(n, len(encoded_leaf), lambda: codec.leaf_size(keys, leaf_values), do_profile, 'leaf_size') encode_leaf = self.measure(n, len(encoded_leaf), lambda: codec.encode_leaf(leaf), do_profile, 'encode_leaf') decode_leaf = self.measure(n, len(encoded_leaf), lambda: codec.decode(encoded_leaf), do_profile, 'decode_leaf') encode_index = self.measure(n, len(encoded_index), lambda: codec.encode_index(index), do_profile, 'encode_index') decode_index = self.measure(n, len(encoded_index), lambda: codec.decode_index(encoded_index), do_profile, 'decode_index') if do_profile: print 'View *.prof with ./viewprof for profiling results.' if self.settings['csv']: self.append_csv(n, leaf_size, encode_leaf, decode_leaf, encode_index, decode_index) def measure(self, n, unit_size, func, do_profile, profname): def helper(): for i in range(n): func() start_time = time.time() start = time.clock() if do_profile: globaldict = globals().copy() localdict = locals().copy() cProfile.runctx('helper()', globaldict, localdict, '%s.prof' % profname) else: helper() end = time.clock() end_time = time.time() def speed(secs): MiB = 1024**2 return float(n * unit_size) / float(secs) / MiB cpu_speed = speed(end - start) wall_speed = speed(end_time - start_time) fmt = '%12s %6.0f MiB/s (CPU) %6.0f MiBs (wall)' print fmt % (profname, cpu_speed, wall_speed) return cpu_speed def append_csv(self, keys, leaf_size, encode_leaf, decode_leaf, encode_index, decode_index): write_title = not os.path.exists(self.settings['csv']) f = open(self.settings['csv'], 'a') self.writer = csv.writer(f, lineterminator='\n') if write_title: self.writer.writerow(('revno', 'keys', 'leaf_size (MiB/s)', 'encode_leaf (MiB/s)', 'decode_leaf (MiB/s)', 'encode_index (MiB/s)', 'decode_index (MiB/s)')) if os.path.exists('.bzr'): p = subprocess.Popen(['bzr', 'revno'], stdout=subprocess.PIPE) out, err = p.communicate() if p.returncode != 0: raise cliapp.AppException('bzr failed') revno = 
out.strip() else: revno = '?' self.writer.writerow((revno, keys, self.format(leaf_size), self.format(encode_leaf), self.format(decode_leaf), self.format(encode_index), self.format(decode_index))) f.close() def format(self, value): return '%.1f' % value if __name__ == '__main__': CodecBenchmark().run() larch-1.20131130/doc/0000755000175000017500000000000012246332521013627 5ustar jenkinsjenkinslarch-1.20131130/doc/Makefile0000644000175000017500000000606412246332521015275 0ustar jenkinsjenkins# Makefile for Sphinx documentation # # You can set these variables from the command line. SPHINXOPTS = SPHINXBUILD = sphinx-build PAPER = BUILDDIR = _build # Internal variables. PAPEROPT_a4 = -D latex_paper_size=a4 PAPEROPT_letter = -D latex_paper_size=letter ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . .PHONY: help clean html dirhtml pickle json htmlhelp qthelp latex changes linkcheck doctest help: @echo "Please use \`make ' where is one of" @echo " html to make standalone HTML files" @echo " dirhtml to make HTML files named index.html in directories" @echo " pickle to make pickle files" @echo " json to make JSON files" @echo " htmlhelp to make HTML files and a HTML help project" @echo " qthelp to make HTML files and a qthelp project" @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" @echo " changes to make an overview of all changed/added/deprecated items" @echo " linkcheck to check all external links for integrity" @echo " doctest to run all doctests embedded in the documentation (if enabled)" clean: -rm -rf $(BUILDDIR)/* html: $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." dirhtml: $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml @echo @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." 
pickle: $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle @echo @echo "Build finished; now you can process the pickle files." json: $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json @echo @echo "Build finished; now you can process the JSON files." htmlhelp: $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp @echo @echo "Build finished; now you can run HTML Help Workshop with the" \ ".hhp project file in $(BUILDDIR)/htmlhelp." qthelp: $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp @echo @echo "Build finished; now you can run "qcollectiongenerator" with the" \ ".qhcp project file in $(BUILDDIR)/qthelp, like this:" @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/larch.qhcp" @echo "To view the help file:" @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/larch.qhc" latex: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." @echo "Run \`make all-pdf' or \`make all-ps' in that directory to" \ "run these through (pdf)latex." changes: $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes @echo @echo "The overview file is in $(BUILDDIR)/changes." linkcheck: $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck @echo @echo "Link check complete; look for any errors in the above output " \ "or in $(BUILDDIR)/linkcheck/output.txt." doctest: $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest @echo "Testing of doctests in the sources finished, look at the " \ "results in $(BUILDDIR)/doctest/output.txt." larch-1.20131130/doc/btree.rst0000644000175000017500000000013712246332521015463 0ustar jenkinsjenkinsIndividual trees ================ .. automodule:: larch.tree :members: :undoc-members: larch-1.20131130/doc/codec.rst0000644000175000017500000000012612246332521015435 0ustar jenkinsjenkinsNode codecs =========== .. 
automodule:: larch.codec :members: :undoc-members: larch-1.20131130/doc/conf.py0000644000175000017500000001435012246332521015131 0ustar jenkinsjenkins# -*- coding: utf-8 -*- # # larch documentation build configuration file, created by # sphinx-quickstart on Mon Jun 6 12:42:38 2011. # # This file is execfile()d with the current directory set to its containing dir. # # Note that not all possible configuration values are present in this # autogenerated file. # # All configuration values have a default; values that are commented out # serve to show the default. import sys, os # If extensions (or modules to document with autodoc) are in another directory, # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. sys.path.insert(0, os.path.abspath('.')) sys.path.insert(0, os.path.abspath('..')) # -- General configuration ----------------------------------------------------- # Add any Sphinx extension module names here, as strings. They can be extensions # coming with Sphinx (named 'sphinx.ext.*') or your custom ones. extensions = ['sphinx.ext.autodoc'] # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] # The suffix of source filenames. source_suffix = '.rst' # The encoding of source files. #source_encoding = 'utf-8' # The master toctree document. master_doc = 'index' # General information about the project. project = u'larch' copyright = u'2011, Lars Wirzenius' # The version info for the project you're documenting, acts as replacement for # |version| and |release|, also used in various other places throughout the # built documents. # # The short X.Y version. import larch version = larch.__version__ # The full version, including alpha/beta/rc tags. release = version # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. 
#language = None # There are two options for replacing |today|: either, you set today to some # non-false value, then it is used: #today = '' # Else, today_fmt is used as the format for a strftime call. #today_fmt = '%B %d, %Y' # List of documents that shouldn't be included in the build. #unused_docs = [] # List of directories, relative to source directory, that shouldn't be searched # for source files. exclude_trees = ['_build'] # The reST default role (used for this markup: `text`) to use for all documents. #default_role = None # If true, '()' will be appended to :func: etc. cross-reference text. #add_function_parentheses = True # If true, the current module name will be prepended to all description # unit titles (such as .. function::). #add_module_names = True # If true, sectionauthor and moduleauthor directives will be shown in the # output. They are ignored by default. #show_authors = False # The name of the Pygments (syntax highlighting) style to use. pygments_style = 'sphinx' # A list of ignored prefixes for module index sorting. #modindex_common_prefix = [] # -- Options for HTML output --------------------------------------------------- # The theme to use for HTML and HTML Help pages. Major themes that come with # Sphinx are currently 'default' and 'sphinxdoc'. html_theme = 'default' # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the # documentation. #html_theme_options = {} # Add any paths that contain custom themes here, relative to this directory. #html_theme_path = [] # The name for this set of Sphinx documents. If None, it defaults to # " v documentation". #html_title = None # A shorter title for the navigation bar. Default is the same as html_title. #html_short_title = None # The name of an image file (relative to this directory) to place at the top # of the sidebar. 
#html_logo = None # The name of an image file (within the static path) to use as favicon of the # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 # pixels large. #html_favicon = None # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". html_static_path = ['_static'] # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, # using the given strftime format. #html_last_updated_fmt = '%b %d, %Y' # If true, SmartyPants will be used to convert quotes and dashes to # typographically correct entities. #html_use_smartypants = True # Custom sidebar templates, maps document names to template names. #html_sidebars = {} # Additional templates that should be rendered to pages, maps page names to # template names. #html_additional_pages = {} # If false, no module index is generated. #html_use_modindex = True # If false, no index is generated. #html_use_index = True # If true, the index is split into individual pages for each letter. #html_split_index = False # If true, links to the reST sources are added to the pages. #html_show_sourcelink = True # If true, an OpenSearch description file will be output, and all pages will # contain a tag referring to it. The value of this option must be the # base URL from which the finished HTML is served. #html_use_opensearch = '' # If nonempty, this is the file name suffix for HTML files (e.g. ".xhtml"). #html_file_suffix = '' # Output file base name for HTML help builder. htmlhelp_basename = 'larchdoc' # -- Options for LaTeX output -------------------------------------------------- # The paper size ('letter' or 'a4'). #latex_paper_size = 'letter' # The font size ('10pt', '11pt' or '12pt'). #latex_font_size = '10pt' # Grouping the document tree into LaTeX files. 
List of tuples # (source start file, target name, title, author, documentclass [howto/manual]). latex_documents = [ ('index', 'larch.tex', u'larch Documentation', u'Lars Wirzenius', 'manual'), ] # The name of an image file (relative to this directory) to place at the top of # the title page. #latex_logo = None # For "manual" documents, if this is true, then toplevel headings are parts, # not chapters. #latex_use_parts = False # Additional stuff for the LaTeX preamble. #latex_preamble = '' # Documents to append as an appendix to all manuals. #latex_appendices = [] # If false, no module index is generated. #latex_use_modindex = True larch-1.20131130/doc/forest.rst0000644000175000017500000000014112246332521015667 0ustar jenkinsjenkinsForests of trees ================ .. automodule:: larch.forest :members: :undoc-members: larch-1.20131130/doc/index.rst0000644000175000017500000000576612246332521015506 0ustar jenkinsjenkinsWelcome to larch's documentation! ================================= This is an implementation of a particular kind of B-tree, based on research by Ohad Rodeh. See "B-trees, Shadowing, and Clones" (link below) for details on the data structure. This is the same data structure that btrfs uses. The distinctive feature of these B-trees is that a node is never modified. Instead, all updates are done by copy-on-write. Among other things, this makes it easy to clone a tree, and modify only the clone, while other processes access the original tree. This is utterly wonderful for my backup application, and that's the reason I wrote larch in the first place. I have tried to keep the implementation generic and flexible, so that you may use it in a variety of situations. For example, the tree itself does not decide where its nodes are stored: you provide a class that does that for it. I have two implementations of the NodeStore class, one for in-memory and one for on-disk storage.
* Homepage: http://liw.fi/larch/ * Rodeh paper: http://liw.fi/larch/ohad-btrees-shadowing-clones.pdf Classes and functions ===================== Anything that is available as ``larch.foo.bar`` is also available as ``larch.bar``. .. toctree:: :maxdepth: 2 forest btree node codec nodestore refcountstore uploadqueue lru ondisk Quick start =========== A **forest** is a collection of related **trees**: cloning a tree is only possible within a forest. Thus, also, only trees in the same forest can share nodes. All **keys** in all trees in a forest must be strings of the same size. **Values** are strings and are stored in **nodes**, and can be of any size, almost up to half the size of a node. When creating a forest, you must specify the sizes of keys and nodes, and the directory in which everything gets stored:: import larch key_size = 3 node_size = 4096 dirname = 'example.tree' forest = larch.open_forest(key_size=key_size, node_size=node_size, dirname=dirname) The above will create a new forest, if necessary, or open an existing one (which must have been created using the same key and node sizes). To create a new tree:: tree = forest.new_tree() Alternatively, to clone an existing tree if one exists:: if forest.trees: tree = forest.new_tree(forest.trees[-1]) else: tree = forest.new_tree() To insert some data into the tree:: for key in ['abc', 'foo', 'bar']: tree.insert(key, key.upper()) To look up the value for one key:: print tree.lookup('foo') To look up a range:: for key, value in tree.lookup_range('aaa', 'zzz'): print key, value To remove a key:: tree.remove('abc') To remove a range:: tree.remove_range('aaa', 'zzz') You probably don't need to worry about anything other than the ``Forest`` and ``BTree`` classes, unless you want to provide your own ``NodeStore`` instance.
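Putting the quick-start pieces together, here is a sketch of a complete session. This is illustrative only: it assumes the larch library is installed, and it reuses the example key size, node size, and directory name from above::

    import larch

    forest = larch.open_forest(
        key_size=3, node_size=4096, dirname='example.tree')

    # Clone the newest tree if one exists, otherwise start a fresh one.
    if forest.trees:
        tree = forest.new_tree(forest.trees[-1])
    else:
        tree = forest.new_tree()

    for key in ['abc', 'foo', 'bar']:
        tree.insert(key, key.upper())
    print tree.lookup('foo')

    forest.commit()

Remember that changes are guaranteed to be persistent only after ``forest.commit()`` has returned successfully.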
Indices and tables ================== * :ref:`genindex` * :ref:`modindex` * :ref:`search` larch-1.20131130/doc/lru.rst0000644000175000017500000000017612246332521015167 0ustar jenkinsjenkinsLeast recently used object cache ================================ .. automodule:: larch.lru :members: :undoc-members: larch-1.20131130/doc/node.rst0000644000175000017500000000014012246332521015301 0ustar jenkinsjenkinsIndividual nodes ================ .. automodule:: larch.nodes :members: :undoc-members: larch-1.20131130/doc/nodestore.rst0000644000175000017500000000013412246332521016361 0ustar jenkinsjenkinsNode storage ============ .. automodule:: larch.nodestore :members: :undoc-members: larch-1.20131130/doc/ondisk.rst0000644000175000017500000000472712246332521015662 0ustar jenkinsjenkinsOn-disk file format and data structure ====================================== Larch B-tree forests are stored on the disk as follows: * Each forest is stored in a separate directory tree structure. * The file ``metadata`` is in the INI format, and stores metadata about the forest, including the identifiers of the root nodes of the trees. * The ``nodes`` directory contains the actual nodes. * The ``refcounts`` directory contains reference counts for nodes. The metadata file ----------------- The following values are stored in the metadata file: * ``format`` is ``1/1``: node store format version 1 and node codec version 1. * ``node_size`` is the maximum size of an encoded node, in bytes. * ``key_size`` is the size of the keys, in bytes. * ``last_id`` is the latest allocated node identifier in the forest. * ``root_ids`` lists the identifiers of the root nodes of the trees. 
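For illustration, the metadata file of a small forest with two trees might look like the following. The section name and all values here are hypothetical, shown only to make the fields listed above concrete:

```ini
; hypothetical example of a forest metadata file
[metadata]
format = 1/1
node_size = 4096
key_size = 3
last_id = 279
root_ids = 104,279
```
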
Node files ---------- Leaf nodes are encoded as follows: * the first four bytes are 'O', 'R', 'B', 'L' * 64-bit unsigned integer giving the node identifier * 32-bit unsigned integer giving the number of key/value pairs * all keys catenated * lengths of values as 32-bit unsigned integers * all values catenated Index nodes are encoded as follows: * the first four bytes are 'O', 'R', 'B', 'I' * 64-bit unsigned integer giving the node identifier * 32-bit unsigned integer giving the number of keys * all keys catenated * all child ids catenated, where each id is a 64-bit unsigned integer All integers are in big-endian order. All root nodes are index nodes, so decoding can start knowing that the first node is an index node. Each node is stored in a file under the ``nodes`` subdirectory, using multiple levels of directories to avoid having too many files in the same directory. The basename of the file is the node identifier in hexadecimal. The directory levels are user adjustable; see the ``IdPath`` class for details. Reference counts ---------------- Each node has a reference count. When the count drops to zero, the node file gets removed. Reference counts are stored in files under the ``refcounts`` subdirectory. Each file there contains up to 32768 reference counts, each stored as a 16-bit unsigned big-endian integer. Thus, the reference count for node i is in the file named ``refcount-N``, where N is i rounded down to the nearest multiple of 32768. A refcount file that is full of zeroes is removed. When looking up refcounts, if the file does not exist, the count is assumed to be zero. This avoids having to store refcounts for deleted nodes indefinitely. larch-1.20131130/doc/refcountstore.rst0000644000175000017500000000016612246332521017266 0ustar jenkinsjenkinsReference count storage ======================= .. automodule:: larch.refcountstore :members: :undoc-members: larch-1.20131130/doc/uploadqueue.rst0000644000175000017500000000016212246332521016711 0ustar jenkinsjenkinsUpload queue for nodes ====================== ..
automodule:: larch.uploadqueue :members: :undoc-members: larch-1.20131130/example.py0000644000175000017500000000441312246332521015071 0ustar jenkinsjenkins# Copyright 2010 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . import hashlib import os import sys import larch def compute(filename): f = open(filename) md5 = hashlib.md5() while True: data = f.read(1024) if not data: break md5.update(data) f.close() return md5.hexdigest() def open_tree(allow_writes, dirname): key_size = len(compute('/dev/null')) node_size = 4096 forest = larch.open_forest( allow_writes=allow_writes, key_size=key_size, node_size=node_size, dirname=dirname) if forest.trees: tree = forest.trees[0] else: tree = forest.new_tree() return forest, tree def add(filenames): forest, tree = open_tree(allow_writes=True, dirname='example.tree') for filename in filenames: checksum = compute(filename) tree.insert(checksum, filename) forest.commit() def find(checksums): forest, tree = open_tree(allow_writes=False, dirname='example.tree') for checksum in checksums: filename = tree.lookup(checksum) print checksum, filename def list_checksums(): forest, tree = open_tree(allow_writes=False, dirname='example.tree') key_size = len(compute('/dev/null')) minkey = '0' * key_size maxkey = 'f' * key_size for checksum, filename in tree.lookup_range(minkey, maxkey): print checksum, filename def main(): if sys.argv[1] == 'add': add(sys.argv[2:]) elif sys.argv[1] 
== 'find': find(sys.argv[2:]) elif sys.argv[1] == 'list': list_checksums() else: raise Exception('Unknown operation %s' % sys.argv[1]) if __name__ == '__main__': main() larch-1.20131130/fsck-larch0000755000175000017500000000465412246332521015036 0ustar jenkinsjenkins#!/usr/bin/python # Copyright 2010, 2011 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . # To debug, one can create a tracing logfile by adding arguments like: # --trace=fsck --log=fsck.logfile import cliapp import logging import sys import tracing import ttystatus import larch.fsck class Fsck(cliapp.Application): def add_settings(self): self.settings.string_list(['trace'], 'add PATTERN to trace patterns', metavar='PATTERN') self.settings.boolean(['fix'], 'fix problems found?') def process_args(self, args): for pattern in self.settings['trace']: tracing.trace_add_pattern(pattern) at_least_one_error = False for dirname in args: self.errors = False forest = larch.open_forest( allow_writes=self.settings['fix'], dirname=dirname) self.ts = ttystatus.TerminalStatus(period=0.1) self.ts['item'] = None self.ts['items'] = 0 self.ts['last_id'] = forest.last_id self.ts.format( 'Checking %Counter(item)/%Integer(last_id): %String(item)') self.ts.notify('fsck-larch for %s' % dirname) fsck = larch.fsck.Fsck(forest, self.warning, self.error, self.settings['fix']) fsck.run_fsck( ts = self.ts ) self.ts.finish() if self.errors: at_least_one_error = True else: 
print 'fsck-larch for %s: No errors found' % dirname if self.errors: sys.exit(1) def error(self, msg): self.errors = True self.ts.notify(msg) logging.error(msg) def warning(self, msg): self.ts.notify(msg) logging.warning(msg) if __name__ == '__main__': Fsck().run() larch-1.20131130/fsck-larch.1.in0000644000175000017500000000213212246332521015564 0ustar jenkinsjenkins.\" Copyright 2011 Lars Wirzenius .\" .\" This program is free software: you can redistribute it and/or modify .\" it under the terms of the GNU General Public License as published by .\" the Free Software Foundation, either version 3 of the License, or .\" (at your option) any later version. .\" .\" This program is distributed in the hope that it will be useful, .\" but WITHOUT ANY WARRANTY; without even the implied warranty of .\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the .\" GNU General Public License for more details. .\" .\" You should have received a copy of the GNU General Public License .\" along with this program. If not, see . .\" .TH FSCK-LARCH 1 .SH NAME fsck-larch \- verify that a larch B-tree is internally consistent .SH SYNOPSIS .SH DESCRIPTION .B fsck-larch reads an on-disk, committed B-tree created by the .B larch Python library, and verifies that it is internally consistent. It reports any problems it finds, but does not currently fix them. .SH OPTIONS .SH "SEE ALSO" Larch home page .RI ( http://liw.fi/larch/ ). larch-1.20131130/idpath-speed0000755000175000017500000000253712246332521015366 0ustar jenkinsjenkins#!/usr/bin/python # Copyright 2010 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. 
# # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . import os import sys import time import larch def main(): n = int(sys.argv[1]) dirname = sys.argv[2] depth = int(sys.argv[3]) bits = int(sys.argv[4]) skip = int(sys.argv[5]) idpath = larch.IdPath(dirname, depth, bits, skip) start = time.time() for i in xrange(n): path = idpath.convert(i) dirname = os.path.dirname(path) if not os.path.exists(dirname): os.makedirs(dirname) with open(path, 'w'): pass end = time.time() duration = end - start speed = n / duration print '%d ids, %.1f seconds, %.1f ids/s' % (n, duration, speed) if __name__ == '__main__': main() larch-1.20131130/insert-remove-test0000755000175000017500000000745512246332521016573 0ustar jenkinsjenkins#!/usr/bin/python # Copyright 2010 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . # Exercise my B-tree implementation, for simple benchmarking purposes. # The benchmark gets a location and an operation count as command line # arguments. # # If the location is the empty string, an in-memory node store is used. # Otherwise it must be a non-existent directory name.
# # The benchmark will do the given number of insertions into the tree, and # measure the speed of that. Then it will remove all but one of the keys, and # measure the speed of the removals. import cProfile import logging import os import random import shutil import sys import time import larch def do_it(keys, func, final): start = time.clock() for key in keys: func(key) final() end = time.clock() return end - start def assert_refcounts_are_one(tree): def helper(node_id): refcount = tree.node_store.rs.get_refcount(node_id) node = tree._get_node(node_id) assert refcount == 1, 'type=%s id=%d refcount=%d' % (repr(node), node_id, refcount) if isinstance(node, larch.IndexNode): for child_id in node.values(): helper(child_id) helper(tree.root.id) def do_insert(tree, key, value): logging.debug('do_insert(%s)' % (repr(key))) if tree.root is not None: assert_refcounts_are_one(tree) tree.insert(key, value) assert_refcounts_are_one(tree) def do_remove(tree, key): logging.debug('do_remove(%s)' % (repr(key))) assert_refcounts_are_one(tree) tree.remove(key) assert_refcounts_are_one(tree) def main(): if True: import logging import tracing tracing.trace_add_pattern('tree') logging.basicConfig(filename='larch.log', level=logging.DEBUG) location = sys.argv[1] n = int(sys.argv[2]) key_size = 19 value_size = 128 node_size = 300 codec = larch.NodeCodec(key_size) if location == '': ns = larch.NodeStoreMemory(node_size, codec) else: if os.path.exists(location): raise Exception('%s exists already' % location) os.mkdir(location) ns = larch.NodeStoreDisk(True, node_size, codec, dirname=location) forest = larch.Forest(ns) tree = forest.new_tree() logging.debug('min keys: %d' % tree.min_index_length) logging.debug('max keys: %d' % tree.max_index_length) # Create list of keys. keys = ['%0*d' % (key_size, i) for i in xrange(n)] # Do inserts.
value = 'x' * value_size logging.debug('start inserts') do_it(keys, lambda key: do_insert(tree, key, value), lambda: forest.commit()) logging.debug('# nodes: %d' % len(ns.list_nodes())) logging.debug('nodes: %s' % sorted(ns.list_nodes())) print '# nodes after inserts:', len(ns.list_nodes()) # Remove all but one key. logging.debug('start removes') do_it(keys[1:], lambda key: do_remove(tree, key), lambda: forest.commit()) logging.debug('# nodes: %d' % len(ns.list_nodes())) logging.debug('nodes: %s' % sorted(ns.list_nodes())) print '# nodes after removes:', len(ns.list_nodes()) assert len(ns.list_nodes()) == 2 if __name__ == '__main__': main() larch-1.20131130/larch/0000755000175000017500000000000012246332521014153 5ustar jenkinsjenkinslarch-1.20131130/larch/__init__.py0000644000175000017500000000270712246332521016272 0ustar jenkinsjenkins# Copyright 2010, 2011, 2012 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . 
__version__ = '1.20131130' class Error(Exception): def __str__(self): return self.msg from nodes import FrozenNode, Node, LeafNode, IndexNode from codec import NodeCodec, CodecError from tree import BTree, KeySizeMismatch, ValueTooLarge from forest import (Forest, open_forest, BadKeySize, BadNodeSize, MetadataMissingKey) from nodestore import (NodeStore, NodeStoreTests, NodeMissing, NodeTooBig, NodeExists, NodeCannotBeModified) from refcountstore import RefcountStore from lru import LRUCache from uploadqueue import UploadQueue from idpath import IdPath from journal import Journal, ReadOnlyMode from nodestore_disk import NodeStoreDisk, LocalFS, FormatProblem from nodestore_memory import NodeStoreMemory __all__ = locals() larch-1.20131130/larch/codec.py0000644000175000017500000001414212246332521015604 0ustar jenkinsjenkins# Copyright 2010 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . import struct import larch class CodecError(larch.Error): def __init__(self, msg): self.msg = msg class NodeCodec(object): '''Encode and decode nodes from their binary format. Node identifiers are assumed to fit into 64 bits. Leaf node values are assumed to fit into 4 gibibytes. ''' format = 1 # We use the struct module for encoding and decoding. For speed, # we construct the format string all at once, so that there is only # one call to struct.pack or struct.unpack for one node. 
This brought # a thousandfold speedup over doing it one field at a time. However, # the code is not quite as clear as it might be, since no symbolic # names are used for anything anymore. Patches welcome. def __init__(self, key_bytes): self.key_bytes = key_bytes self.leaf_header = struct.Struct('!4sQI') self.index_header = struct.Struct('!4sQI') # space for key and length of value is needed for each pair self.leaf_pair_fixed_size = key_bytes + struct.calcsize('!I') self.index_pair_size = key_bytes + struct.calcsize('!Q') def leaf_size(self, keys, values): '''Return size of a leaf node with the given pairs.''' return (self.leaf_header.size + len(keys) * self.leaf_pair_fixed_size + len(''.join([value for value in values]))) def leaf_size_delta_add(self, old_size, value): '''Return size of node that gets a new key/value pair added. ``old_size`` is the old size of the node. The key must not already have existed in the node. ''' delta = self.leaf_pair_fixed_size + len(value) return old_size + delta def leaf_size_delta_replace(self, old_size, old_value, new_value): '''Return size of node that gets a value replaced.''' return old_size + len(new_value) - len(old_value) def encode_leaf(self, node): '''Encode a leaf node as a byte string.''' keys = node.keys() values = node.values() return (self.leaf_header.pack('ORBL', node.id, len(keys)) + ''.join(keys) + struct.pack('!%dI' % len(values), *map(len, values)) + ''.join(values)) def decode_leaf(self, encoded): '''Decode a leaf node from its encoded byte string.''' buf = buffer(encoded) cookie, node_id, num_pairs = self.leaf_header.unpack_from(buf) if cookie != 'ORBL': raise CodecError('Leaf node does not begin with magic cookie ' '(should be ORBL, is %s)' % repr(cookie)) fmt = '!'
+ ('%ds' % self.key_bytes) * num_pairs + 'I' * num_pairs items = struct.unpack_from(fmt, buf, self.leaf_header.size) keys = items[:num_pairs] lengths = items[num_pairs:num_pairs*2] values = [] offset = self.leaf_header.size + self.leaf_pair_fixed_size * num_pairs append = values.append for length in lengths: append(encoded[offset:offset + length]) offset += length return larch.LeafNode(node_id, keys, values) def max_index_pairs(self, node_size): '''Return number of index pairs that fit in a node of a given size.''' return (node_size - self.index_header.size) / self.index_pair_size def index_size(self, keys, values): '''Return size of an index node with the given pairs.''' return self.index_header.size + self.index_pair_size * len(keys) def encode_index(self, node): '''Encode an index node as a byte string.''' keys = node.keys() child_ids = node.values() return (self.index_header.pack('ORBI', node.id, len(keys)) + ''.join(keys) + struct.pack('!%dQ' % len(child_ids), *child_ids)) def decode_index(self, encoded): '''Decode an index node from its encoded byte string.''' buf = buffer(encoded) cookie, node_id, num_pairs = self.index_header.unpack_from(buf) if cookie != 'ORBI': raise CodecError('Index node does not begin with magic cookie ' '(should be ORBI, is %s)' % repr(cookie)) fmt = '!' 
+ ('%ds' % self.key_bytes) * num_pairs + 'Q' * num_pairs items = struct.unpack_from(fmt, buf, self.index_header.size) keys = items[:num_pairs] child_ids = items[num_pairs:] assert len(keys) == len(child_ids) for x in child_ids: assert type(x) == int return larch.IndexNode(node_id, keys, child_ids) def encode(self, node): '''Encode a node of any type.''' if isinstance(node, larch.LeafNode): return self.encode_leaf(node) else: return self.encode_index(node) def decode(self, encoded): '''Decode node of any type.''' if encoded.startswith('ORBL'): return self.decode_leaf(encoded) elif encoded.startswith('ORBI'): return self.decode_index(encoded) else: raise CodecError('Unknown magic cookie in encoded node (%s)' % repr(encoded[:4])) def size(self, node): '''Return encoded size of a node, regardless of type.''' keys = node.keys() values = node.values() if isinstance(node, larch.LeafNode): return self.leaf_size(keys, values) else: return self.index_size(keys, values) larch-1.20131130/larch/codec_tests.py0000644000175000017500000001050512246332521017025 0ustar jenkinsjenkins# Copyright 2010 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . 
import unittest import larch class NodeCodecTests(unittest.TestCase): def setUp(self): self.leaf = larch.LeafNode(1234, ['foo', 'yoo'], ['bar', 'yoyo']) self.index = larch.IndexNode(5678, ['bar', 'foo'], [1234, 7890]) self.codec = larch.NodeCodec(3) def test_has_format_version(self): self.assertNotEqual(self.codec.format, None) def test_returns_reasonable_size_for_empty_leaf(self): self.assert_(self.codec.leaf_size([], []) > 10) def test_returns_reasonable_size_for_empty_index(self): self.assert_(self.codec.index_size([], []) > 10) def test_returns_reasonable_size_for_empty_leaf_generic(self): leaf = larch.LeafNode(0, [], []) self.assert_(self.codec.size(leaf) > 10) def test_returns_ok_delta_for_added_key_value(self): leaf = larch.LeafNode(0, [], []) old_size = self.codec.leaf_size(leaf.keys(), leaf.values()) new_size = self.codec.leaf_size_delta_add(old_size, 'bar') self.assert_(new_size > old_size + len('foo') + len('bar')) def test_returns_ok_delta_for_changed_value_of_same_size(self): leaf = larch.LeafNode(0, ['foo'], ['bar']) old_size = self.codec.leaf_size(leaf.keys(), leaf.values()) new_size = self.codec.leaf_size_delta_replace(old_size, 'bar', 'xxx') self.assertEqual(new_size, old_size) def test_returns_ok_delta_for_changed_value_of_larger_size(self): leaf = larch.LeafNode(0, ['foo'], ['bar']) old_size = self.codec.leaf_size(leaf.keys(), leaf.values()) new_size = self.codec.leaf_size_delta_replace(old_size, 'bar', 'foobar') self.assertEqual(new_size, old_size + len('foobar') - len('foo')) def test_returns_ok_delta_for_changed_value_of_shorter_size(self): leaf = larch.LeafNode(0, ['foo'], ['bar']) old_size = self.codec.leaf_size(leaf.keys(), leaf.values()) new_size = self.codec.leaf_size_delta_replace(old_size, 'bar', '') self.assertEqual(new_size, old_size - len('foo')) def test_returns_reasonable_size_for_empty_index_generic(self): index = larch.IndexNode(0, [], []) self.assert_(self.codec.size(index) > 10) def test_leaf_round_trip_ok(self): encoded = 
self.codec.encode_leaf(self.leaf) decoded = self.codec.decode_leaf(encoded) self.assertEqual(decoded, self.leaf) def test_index_round_trip_ok(self): encoded = self.codec.encode_index(self.index) decoded = self.codec.decode_index(encoded) self.assertEqual(decoded.keys(), self.index.keys()) self.assertEqual(decoded.values(), self.index.values()) self.assertEqual(decoded, self.index) def test_generic_round_trip_ok_for_leaf(self): encoded = self.codec.encode(self.leaf) self.assertEqual(self.codec.decode(encoded), self.leaf) def test_generic_round_trip_ok_for_index(self): encoded = self.codec.encode(self.index) self.assertEqual(self.codec.decode(encoded), self.index) def test_decode_leaf_raises_error_for_garbage(self): self.assertRaises(larch.CodecError, self.codec.decode_leaf, 'x'*1000) def test_decode_index_raises_error_for_garbage(self): self.assertRaises(larch.CodecError, self.codec.decode_index, 'x'*1000) def test_decode_raises_error_for_garbage(self): self.assertRaises(larch.CodecError, self.codec.decode, 'x'*1000) def test_returns_reasonable_max_number_of_index_pairs(self): # Header is 16 bytes. A pair is key_bytes + 8 = 11. self.assertEqual(self.codec.max_index_pairs(32), 1) self.assertEqual(self.codec.max_index_pairs(64), 4) larch-1.20131130/larch/forest.py0000644000175000017500000001552212246332521016034 0ustar jenkinsjenkins# Copyright 2010 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see .
import tracing import larch class MetadataMissingKey(larch.Error): def __init__(self, key_name): self.msg = 'larch forest metadata missing "%s"' % key_name class BadKeySize(larch.Error): def __init__(self, store_key_size, wanted_key_size): self.msg = ('Node store has key size %s, program wanted %s' % (store_key_size, wanted_key_size)) class BadNodeSize(larch.Error): def __init__(self, store_node_size, wanted_node_size): self.msg = ('Node store has node size %s, program wanted %s' % (store_node_size, wanted_node_size)) class Forest(object): '''A collection of related B-trees. Trees in the same forest can share nodes. Cloned trees are always created in the same forest as the original. Cloning trees is very fast: only the root node is modified. Trees can be modified in place. Modifying a tree is done using copy-on-write, so modifying a clone does not modify the original (and vice versa). You can have up to 65535 clones of a tree. The list of trees in the forest is stored in the ``trees`` property as a list of trees in the order in which they were created. ''' def __init__(self, node_store): tracing.trace('new larch.Forest with node_store=%s' % repr(node_store)) self.node_store = node_store self.trees = [] self.last_id = 0 self._read_metadata() def _read_metadata(self): tracing.trace('reading metadata') keys = self.node_store.get_metadata_keys() tracing.trace('metadata keys: %s' % repr(keys)) if 'last_id' in keys: self.last_id = int(self.node_store.get_metadata('last_id')) tracing.trace('last_id = %s' % self.last_id) if 'root_ids' in keys: s = self.node_store.get_metadata('root_ids') tracing.trace('root_ids: %s', s) if s.strip(): root_ids = [int(x) for x in s.split(',')] self.trees = [larch.BTree(self, self.node_store, root_id) for root_id in root_ids] tracing.trace('root_ids: %s' % repr(root_ids)) else: self.trees = [] tracing.trace('empty root_ids') def new_id(self): '''Generate next node id for this forest. Trees should use this whenever they create new nodes. 
The ids generated by this method are guaranteed to be unique (as long as commits happen OK). ''' self.last_id += 1 tracing.trace('new id = %d' % self.last_id) return self.last_id def new_tree(self, old=None): '''Create a new tree. If old is None, a completely new tree is created. Otherwise, a clone of an existing one is created. ''' tracing.trace('new tree (old=%s)' % repr(old)) if old: old_root = self.node_store.get_node(old.root.id) keys = old_root.keys() values = old_root.values() else: keys = [] values = [] t = larch.BTree(self, self.node_store, None) t._set_root(t._new_index(keys, values)) self.trees.append(t) tracing.trace('new tree root id: %s' % t.root.id) return t def remove_tree(self, tree): '''Remove a tree from the forest.''' tracing.trace('removing tree with root id %d' % tree.root.id) tree._decrement(tree.root.id) self.trees.remove(tree) def commit(self): '''Make sure all changes are stored into the node store. Changes made to the forest are guaranteed to be persistent only if commit is called successfully. ''' tracing.trace('committing forest') self.node_store.set_metadata('last_id', self.last_id) root_ids = ','.join('%d' % t.root.id for t in self.trees if t.root is not None) self.node_store.set_metadata('root_ids', root_ids) self.node_store.set_metadata('key_size', self.node_store.codec.key_bytes) self.node_store.set_metadata('node_size', self.node_store.node_size) self.node_store.save_refcounts() self.node_store.commit() def open_forest(allow_writes=None, key_size=None, node_size=None, codec=None, node_store=None, **kwargs): '''Create or open a forest. ``key_size`` and ``node_size`` are retrieved from the forest, unless given. If given, they must match exactly. If the forest does not yet exist, the sizes **must** be given. ``codec`` is the class to be used for the node codec, defaults to ``larch.NodeCodec``. Similarly, ``node_store`` is the node store class, defaults to ``larch.NodeStoreDisk``. 
All other keyword arguments are given the the ``node_store`` class initializer. ''' tracing.trace('opening forest') assert allow_writes is not None codec = codec or larch.NodeCodec node_store = node_store or larch.NodeStoreDisk if key_size is None or node_size is None: # Open a temporary node store for reading metadata. # For this, we can use any values for node and key sizes, # since we won't be accessing nodes or keys. c_temp = codec(42) ns_temp = node_store(False, 42, c_temp, **kwargs) if 'key_size' not in ns_temp.get_metadata_keys(): raise MetadataMissingKey('key_size') if 'node_size' not in ns_temp.get_metadata_keys(): raise MetadataMissingKey('node_size') if key_size is None: key_size = int(ns_temp.get_metadata('key_size')) if node_size is None: node_size = int(ns_temp.get_metadata('node_size')) c = codec(key_size) ns = node_store(allow_writes, node_size, c, **kwargs) def check_size(keyname, wanted, exception): if keyname not in ns.get_metadata_keys(): return value = int(ns.get_metadata(keyname)) if value != wanted: raise exception(value, wanted) check_size('key_size', key_size, BadKeySize) return Forest(ns) larch-1.20131130/larch/forest_tests.py0000644000175000017500000002023412246332521017252 0ustar jenkinsjenkins# Copyright 2010 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . 
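The `open_forest` function above probes stored metadata when sizes are not given, and verifies a match when they are. A minimal sketch of that probe-then-verify pattern, using a plain dict in place of a real node store's metadata; `ProbeError` and `resolve_size` are hypothetical names standing in for larch's `MetadataMissingKey`/`BadKeySize` checks:

```python
# Probe-then-verify: if the caller gave no size, it must exist in the
# stored metadata; if the caller gave one, any stored value must agree.

class ProbeError(Exception):
    pass

def resolve_size(metadata, key, wanted):
    if wanted is None:
        if key not in metadata:
            raise ProbeError('metadata missing %r' % key)
        return int(metadata[key])
    if key in metadata and int(metadata[key]) != wanted:
        raise ProbeError('%s is %s, wanted %s' % (key, metadata[key], wanted))
    return wanted

meta = {'key_size': '3', 'node_size': '64'}
print(resolve_size(meta, 'key_size', None))  # 3
print(resolve_size(meta, 'node_size', 64))   # 64
```

Note that, as in `open_forest`, a brand-new store has no metadata at all, which is why both sizes are mandatory when creating a forest.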
import os import shutil import tempfile import unittest import larch class ForestTests(unittest.TestCase): def setUp(self): self.codec = larch.NodeCodec(3) self.ns = larch.NodeStoreMemory( allow_writes=True, node_size=64, codec=self.codec) self.forest = larch.Forest(self.ns) def test_new_node_ids_grow(self): id1 = self.forest.new_id() id2 = self.forest.new_id() self.assertEqual(id1 + 1, id2) def test_has_no_trees_initially(self): self.assertEqual(self.forest.trees, []) def test_creates_a_tree(self): t = self.forest.new_tree() self.assert_(isinstance(t, larch.BTree)) self.assertEqual(self.forest.trees, [t]) def test_clones_a_tree(self): t1 = self.forest.new_tree() t2 = self.forest.new_tree(t1) self.assertNotEqual(t1.root.id, t2.root.id) def test_clones_can_be_changed_independently(self): t1 = self.forest.new_tree() t2 = self.forest.new_tree(t1) t1.insert('foo', 'foo') self.assertNotEqual(t1.root.id, t2.root.id) def test_clones_do_not_clash_in_new_node_ids(self): t1 = self.forest.new_tree() t2 = self.forest.new_tree(t1) node1 = t1._new_leaf([], []) node2 = t2._new_leaf([], []) self.assertEqual(node1.id + 1, node2.id) def test_is_persistent(self): t1 = self.forest.new_tree() t1.insert('foo', 'bar') self.forest.commit() f2 = larch.Forest(self.ns) self.assertEqual([t.root.id for t in f2.trees], [t1.root.id]) def test_removes_trees(self): t1 = self.forest.new_tree() self.forest.remove_tree(t1) self.assertEqual(self.forest.trees, []) def test_remove_tree_removes_nodes_for_tree_as_well(self): t = self.forest.new_tree() t.insert('foo', 'bar') self.forest.commit() self.assertNotEqual(self.ns.list_nodes(), []) self.forest.remove_tree(t) self.assertEqual(self.ns.list_nodes(), []) def test_changes_work_across_commit(self): t1 = self.forest.new_tree() t1.insert('000', 'foo') t1.insert('001', 'bar') t2 = self.forest.new_tree(t1) t2.insert('002', 'foobar') t2.remove('000') self.forest.commit() f2 = larch.Forest(self.ns) t1a, t2a = f2.trees self.assertEqual(t1.root.id, t1a.root.id) 
self.assertEqual(t2.root.id, t2a.root.id) self.assertEqual(t1a.lookup('000'), 'foo') self.assertEqual(t1a.lookup('001'), 'bar') self.assertRaises(KeyError, t2a.lookup, '000') self.assertEqual(t2a.lookup('001'), 'bar') self.assertEqual(t2a.lookup('002'), 'foobar') def test_committing_single_empty_tree_works(self): self.forest.new_tree() self.assertEqual(self.forest.commit(), None) def test_read_metadata_works_after_removed_and_committed(self): t1 = self.forest.new_tree() t1.insert('foo', 'foo') self.forest.commit() self.forest.remove_tree(t1) self.forest.commit() f2 = larch.Forest(self.ns) self.assertEqual(f2.trees, []) def test_commit_puts_key_and_node_sizes_in_metadata(self): self.forest.commit() self.assertEqual(self.ns.get_metadata('key_size'), 3) self.assertEqual(self.ns.get_metadata('node_size'), 64) class OpenForestTests(unittest.TestCase): def setUp(self): self.key_size = 3 self.node_size = 64 self.tempdir = tempfile.mkdtemp() def tearDown(self): shutil.rmtree(self.tempdir) def test_creates_new_forest(self): f = larch.open_forest(key_size=self.key_size, node_size=self.node_size, dirname=self.tempdir, allow_writes=True) self.assertEqual(f.node_store.codec.key_bytes, self.key_size) self.assertEqual(f.node_store.node_size, self.node_size) def test_fail_if_metadata_missing_key_size(self): with open(os.path.join(self.tempdir, 'metadata'), 'w') as f: f.write('[metadata]\n') f.write('format=1/1\n') f.write('node_size=%s\n' % self.node_size) self.assertRaises(larch.MetadataMissingKey, larch.open_forest, key_size=self.key_size, node_size=None, dirname=self.tempdir, allow_writes=False) def test_fail_if_metadata_missing_node_size(self): with open(os.path.join(self.tempdir, 'metadata'), 'w') as f: f.write('[metadata]\n') f.write('format=1/1\n') f.write('key_size=%s\n' % self.key_size) self.assertRaises(larch.MetadataMissingKey, larch.open_forest, key_size=self.key_size, node_size=None, dirname=self.tempdir, allow_writes=False) def 
test_fail_if_existing_tree_has_incompatible_key_size(self): f = larch.open_forest(key_size=self.key_size, node_size=self.node_size, dirname=self.tempdir, allow_writes=True) f.commit() self.assertRaises(larch.BadKeySize, larch.open_forest, key_size=self.key_size + 1, node_size=self.node_size, dirname=self.tempdir, allow_writes=True) def test_opens_existing_tree_with_incompatible_node_size(self): f = larch.open_forest(allow_writes=True, key_size=self.key_size, node_size=self.node_size, dirname=self.tempdir) f.commit() new_size = self.node_size + 1 f2 = larch.open_forest(key_size=self.key_size, node_size=new_size, dirname=self.tempdir, allow_writes=True) self.assertEqual(int(f2.node_store.get_metadata('node_size')), self.node_size) def test_opens_existing_tree_with_compatible_key_and_node_size(self): f = larch.open_forest(key_size=self.key_size, node_size=self.node_size, dirname=self.tempdir, allow_writes=True) f.commit() f2 = larch.open_forest(key_size=self.key_size, node_size=self.node_size, dirname=self.tempdir, allow_writes=True) self.assert_(True) def test_opens_existing_tree_without_node_and_key_sizes_given(self): f = larch.open_forest(allow_writes=True, key_size=self.key_size, node_size=self.node_size, dirname=self.tempdir) f.commit() f2 = larch.open_forest(dirname=self.tempdir, allow_writes=True) self.assertEqual(f2.node_store.node_size, self.node_size) self.assertEqual(f2.node_store.codec.key_bytes, self.key_size) def test_fails_with_new_tree_unless_node_and_key_sizes_given(self): self.assertRaises(AssertionError, larch.open_forest, dirname=self.tempdir) class BadKeySizeTests(unittest.TestCase): def test_both_sizes_in_error_message(self): e = larch.BadKeySize(123, 456) self.assert_('123' in str(e)) self.assert_('456' in str(e)) class BadNodeSizeTests(unittest.TestCase): def test_both_sizes_in_error_message(self): e = larch.BadNodeSize(123, 456) self.assert_('123' in str(e)) self.assert_('456' in str(e)) 
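The clone tests above depend on copy-on-write: `new_tree(old)` copies only the root node's keys and values, so both trees initially point at the same children, and a write to one tree copies just the affected node. This toy model uses plain dicts rather than larch nodes, purely to show why modifying a clone cannot disturb the original:

```python
# Node ids map to node contents; id 2 is shared by both roots at first.
nodes = {1: {'keys': ['a'], 'children': [2]},
         2: {'value': 'old'}}

def clone_root(nodes, root_id, new_id):
    # Cloning copies only the root; the child list still points at
    # the same shared nodes.
    src = nodes[root_id]
    nodes[new_id] = {'keys': list(src['keys']),
                     'children': list(src['children'])}
    return new_id

def modify(nodes, root_id, new_child_id):
    # Copy-on-write: give this tree a private copy of the shared
    # child before changing it.
    shared_id = nodes[root_id]['children'][0]
    nodes[new_child_id] = dict(nodes[shared_id])
    nodes[new_child_id]['value'] = 'new'
    nodes[root_id]['children'][0] = new_child_id

clone = clone_root(nodes, 1, 3)
modify(nodes, clone, 4)
print(nodes[nodes[1]['children'][0]]['value'])      # old
print(nodes[nodes[clone]['children'][0]]['value'])  # new
```

The real implementation additionally keeps per-node reference counts so shared nodes are only deleted when no tree points at them, which is what `remove_tree` relies on.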
larch-1.20131130/larch/fsck.py0000755000175000017500000001723412246332521015465 0ustar jenkinsjenkins# Copyright 2010, 2011 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . import logging import sys import tracing import ttystatus import larch class Error(larch.Error): def __init__(self, msg): self.msg = 'Assertion failed: %s' % msg class WorkItem(object): '''A work item for fsck. Subclass can optionally set the ``name`` attribute; the class name is used by default. 
''' def __str__(self): if hasattr(self, 'name'): return self.name else: return self.__class__.__name__ def do(self): pass def __iter__(self): return iter([self]) def warning(self, msg): self.fsck.warning('warning: %s: %s' % (self.name, msg)) def error(self, msg): self.fsck.error('ERROR: %s: %s' % (self.name, msg)) def get_node(self, node_id): tracing.trace('node_id=%s' % node_id) try: return self.fsck.forest.node_store.get_node(node_id) except larch.NodeMissing: self.error( 'forest %s: node %s is missing' % (self.fsck.forest_name, node_id)) def start_modification(self, node): self.fsck.forest.node_store.start_modification(node) def put_node(self, node): tracing.trace('node.id=%s' % node.id) return self.fsck.forest.node_store.put_node(node) class CheckIndexNode(WorkItem): def __init__(self, fsck, node): self.fsck = fsck self.node = node self.name = ( 'CheckIndexNode: checking index node %s in %s' % (self.node.id, self.fsck.forest_name)) def do(self): tracing.trace('node.id=%s' % self.node.id) if type(self.node) != larch.IndexNode: self.error( 'forest %s: node %s: ' 'Expected to get an index node, got %s instead' % (self.fsck.forest_name, self.node.id, type(self.node))) return if len(self.node) == 0: self.error('forest %s: index node %s: No children' % (self.fsck.forest_name, self.node.id)) return # Increase refcounts for all children, and check that the child # nodes exist. If the children are index nodes, create work # items to check those. Leaf nodes get no further checking. drop_keys = [] for key in self.node: child_id = self.node[key] seen_already = child_id in self.fsck.refcounts self.fsck.count(child_id) if not seen_already: child = self.get_node(child_id) if child is None: drop_keys.append(key) elif type(child) == larch.IndexNode: yield CheckIndexNode(self.fsck, child) # Fix references to missing children by dropping them. 
if self.fsck.fix and drop_keys: self.start_modification(self.node) for key in drop_keys: self.node.remove(key) self.warning('index node %s: dropped key %s' % (self.node.id, key)) self.put_node(self.node) class CheckForest(WorkItem): def __init__(self, fsck): self.fsck = fsck self.name = 'CheckForest: forest %s' % self.fsck.forest_name def do(self): tracing.trace("CheckForest: checking forest %s" % self.name ) for tree in self.fsck.forest.trees: self.fsck.count(tree.root.id) root_node = self.get_node(tree.root.id) tracing.trace('root_node.id=%s' % root_node.id) yield CheckIndexNode(self.fsck, root_node) class CheckRefcounts(WorkItem): def __init__(self, fsck): self.fsck = fsck self.name = 'CheckRefcounts: refcounts in %s' % self.fsck.forest_name def do(self): tracing.trace( 'CheckRefcounts : %s nodes to check' % len(self.fsck.refcounts) ) for node_id in self.fsck.refcounts: tracing.trace('CheckRefcounts checking node %s' % node_id) refcount = self.fsck.forest.node_store.get_refcount(node_id) if refcount != self.fsck.refcounts[node_id]: self.error( 'forest %s: node %s: refcount is %s but should be %s' % (self.fsck.forest_name, node_id, refcount, self.fsck.refcounts[node_id])) if self.fsck.fix: self.fsck.forest.node_store.set_refcount( node_id, self.fsck.refcounts[node_id]) self.warning('node %s: refcount was set to %s' % (node_id, self.fsck.refcounts[node_id])) class CommitForest(WorkItem): def __init__(self, fsck): self.fsck = fsck self.name = ('CommitForest: committing fixes to %s' % self.fsck.forest_name) def do(self): tracing.trace('committing changes to %s' % self.fsck.forest_name) self.fsck.forest.commit() class Fsck(object): '''Verify internal consistency of a larch.Forest.''' def __init__(self, forest, warning, error, fix): self.forest = forest self.forest_name = getattr( forest.node_store, 'dirname', 'in-memory forest') self.warning = warning self.error = error self.fix = fix self.refcounts = {} def find_work(self): yield CheckForest(self) yield 
CheckRefcounts(self) if self.fix: yield CommitForest(self) def count(self, node_id): self.refcounts[node_id] = self.refcounts.get(node_id, 0) + 1 def run_work(self, work_generators, ts=None): """run work_generator.do() recursively as needed work_generators : list of generators (eg list( self.find_work() )) who return objects with .do() methods that either return None or other generators. if a ttystatus.TerminalStatus instance is passed as ts, report fsck progress via ts """ while work_generators: work_generator = work_generators.pop(0) for work in work_generator: if ts: ts.increase('items', 1) ts['item'] = work generator_or_none = work.do() if generator_or_none: # Run new work before carrying-on with work_generators # (required for proper refcount check) work_generators.insert(0,generator_or_none) def run_fsck(self, ts=None): """Runs full fsck if a ttystatus.TerminalStatus instance is passed as ts, report fsck progress via ts item/items updates """ # Make sure that we pass list( self.find_work() ) and not # [ self.find_work() ] so that when CheckForest.do() returns # work generators, the work generators are actually called # before the CheckRefcounts check. work_generators = list( self.find_work() ) self.run_work(work_generators, ts=ts) larch-1.20131130/larch/idpath.py0000644000175000017500000000303512246332521015777 0ustar jenkinsjenkins# Copyright 2011 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . 
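The scheduling trick in `run_work` above is that a work item's `do()` may hand back more work, and that new work is inserted at the *front* of the queue, ahead of previously queued generators; this is what guarantees all node checks finish before the refcount pass, which was queued last. A stripped-down sketch (not larch's classes) of that loop:

```python
# Work items record their name when done; items with children return
# a new generator of work, which the loop runs before anything that
# was queued earlier.

class Work(object):
    def __init__(self, name, children=()):
        self.name = name
        self.children = children

    def do(self):
        done.append(self.name)
        if self.children:
            return iter(self.children)

done = []
root = Work('root', [Work('child', [Work('grandchild')]), Work('sibling')])
queue = [iter([root])]
while queue:
    generator = queue.pop(0)
    for work in generator:
        new_work = work.do()
        if new_work:
            # New work jumps ahead of previously queued generators.
            queue.insert(0, new_work)
print(done)  # ['root', 'child', 'sibling', 'grandchild']
```

Note the order: the current generator is still drained before the newly inserted one runs (hence `sibling` before `grandchild`), but both run before anything queued behind them, matching the comment in `run_fsck` about running node checks before `CheckRefcounts`.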
import os


class IdPath(object):

    '''Convert a numeric id to a pathname.

    The ids are stored in a directory hierarchy, the depth of which
    can be adjusted by a parameter to the class.

    The ids are assumed to be non-negative integers.

    '''

    def __init__(self, dirname, depth, bits_per_level, skip_bits):
        self.dirname = dirname
        self.depth = depth
        self.bits_per_level = bits_per_level
        self.skip_bits = skip_bits

    def convert(self, identifier):
        def level_bits(level):
            level_mask = 2**self.bits_per_level - 1
            n = self.skip_bits + level * self.bits_per_level
            return (identifier >> n) & level_mask
        subdirs = ['%d' % level_bits(i) for i in range(self.depth)]
        parts = [self.dirname] + subdirs + ['%x' % identifier]
        return os.path.join(*parts)

larch-1.20131130/larch/idpath_tests.py0000644000175000017500000000351412246332521017223 0ustar jenkinsjenkins
# Copyright 2011 Lars Wirzenius
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see .
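The bit arithmetic in `IdPath.convert`, traced by hand as a standalone function. The parameters here are made up for illustration: depth 3, one bit per level, no skipped bits, so id 5 (binary `101`) yields subdirectories `'1'`, `'0'`, `'1'` from its low bits upward, plus a hex basename:

```python
import os

# Standalone rendering of IdPath.convert: each directory level takes
# bits_per_level bits from the id, starting at skip_bits; the file's
# basename is the full id in hex.

def convert(dirname, identifier, depth=3, bits_per_level=1, skip_bits=0):
    mask = 2 ** bits_per_level - 1
    subdirs = ['%d' % ((identifier >> (skip_bits + i * bits_per_level)) & mask)
               for i in range(depth)]
    return os.path.join(dirname, *(subdirs + ['%x' % identifier]))

print(convert('store', 5))   # store/1/0/1/5
print(convert('store', 10))  # store/0/1/0/a
```

Using the low bits for the top directory levels spreads consecutive ids across different subdirectories, which keeps any one directory from filling up as ids grow.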
import os import shutil import tempfile import unittest import larch class IdPathTests(unittest.TestCase): def setUp(self): self.tempdir = tempfile.mkdtemp() self.depth = 3 bits = 1 skip = 0 self.idpath = larch.IdPath(self.tempdir, self.depth, bits, skip) def tearDown(self): shutil.rmtree(self.tempdir) def test_returns_string(self): self.assertEqual(type(self.idpath.convert(1)), str) def test_starts_with_designated_path(self): path = self.idpath.convert(1) self.assert_(path.startswith(self.tempdir)) def test_different_ids_return_different_values(self): path1 = self.idpath.convert(42) path2 = self.idpath.convert(1024) self.assertNotEqual(path1, path2) def test_same_id_returns_same_path(self): path1 = self.idpath.convert(42) path2 = self.idpath.convert(42) self.assertEqual(path1, path2) def test_uses_desired_depth(self): path = self.idpath.convert(1) subpath = path[len(self.tempdir + os.sep):] subdir = os.path.dirname(subpath) self.assertEqual(len(subdir.split(os.sep)), self.depth) larch-1.20131130/larch/journal.py0000644000175000017500000002440712246332521016206 0ustar jenkinsjenkins# Copyright 2012 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . 
import errno import logging import os import tracing import larch class ReadOnlyMode(larch.Error): # pragma: no cover def __init__(self): self.msg = 'Larch B-tree is in read-only mode, no changes allowed' class Journal(object): '''A journal layer on top of a virtual filesystem. The journal solves the problem of updating on-disk data structures atomically. Changes are first written to a journal, and then moved from there to the real location. If the program or system crashes, the changes can be completed later on, or rolled back, depending on what's needed for consistency. The journal works as follows: * ``x`` is the real filename * ``new/x`` is a new or modified file * ``delete/x`` is a deleted file, created there as a flag file Commit does this: * for every ``delete/x``, remove ``x`` * for every ``new/x`` except ``new/metadata``, move to ``x`` * move ``new/metadata`` to ``metadata`` Rollback does this: * remove every ``new/x`` * remove every ``delete/x`` When a journalled node store is opened, if ``new/metadata`` exists, the commit happens. Otherwise a rollback happens. This guarantees that the on-disk state is consistent. We only provide enough of a filesystem interface as is needed by NodeStoreDisk. For example, we do not care about directory removal. The journal can be opened in read-only mode, in which case it ignores any changes in ``new`` and ``delete``, and does not try to rollback or commit at start. 
''' flag_basename = 'metadata' def __init__(self, allow_writes, fs, storedir): logging.debug('Initializing Journal for %s' % storedir) self.allow_writes = allow_writes self.fs = fs self.storedir = storedir if not self.storedir.endswith(os.sep): self.storedir += os.sep self.newdir = os.path.join(self.storedir, 'new/') self.deletedir = os.path.join(self.storedir, 'delete/') self.flag_file = os.path.join(self.storedir, self.flag_basename) self.new_flag = os.path.join(self.newdir, self.flag_basename) self.new_flag_seen = self.fs.exists(self.new_flag) tracing.trace('self.new_flag_seen: %s' % self.new_flag_seen) if self.allow_writes: if self.new_flag_seen: logging.debug('Automatically committing remaining changes') self.commit() else: logging.debug('Automatically rolling back remaining changes') self.rollback() else: logging.debug('Not committing/rolling back since read-only') self.new_files = set() self.deleted_files = set() def _require_rw(self): '''Raise error if modifications are not allowed.''' if not self.allow_writes: raise ReadOnlyMode() def _relative(self, filename): '''Return the part of filename that is relative to storedir.''' assert filename.startswith(self.storedir) return filename[len(self.storedir):] def _new(self, filename): '''Return name for a new file whose final name is filename.''' return os.path.join(self.newdir, self._relative(filename)) def _deleted(self, filename): '''Return name for temporary name for file to be deleted.''' return os.path.join(self.deletedir, self._relative(filename)) def _realname(self, journaldir, filename): '''Return real name for a file in a journal temporary directory.''' assert filename.startswith(journaldir) return os.path.join(self.storedir, filename[len(journaldir):]) def _is_in_new(self, filename): new = self._new(filename) return new in self.new_files or self.fs.exists(new) def _is_in_deleted(self, filename): deleted = self._deleted(filename) return deleted in self.deleted_files or self.fs.exists(deleted) def 
exists(self, filename): if self.allow_writes or self.new_flag_seen: if self._is_in_new(filename): return True elif self._is_in_deleted(filename): return False return self.fs.exists(filename) def makedirs(self, dirname): tracing.trace(dirname) self._require_rw() x = self._new(dirname) self.fs.makedirs(x) self.new_files.add(x) def overwrite_file(self, filename, contents): tracing.trace(filename) self._require_rw() new = self._new(filename) self.fs.overwrite_file(new, contents) self.new_files.add(new) def cat(self, filename): tracing.trace('filename=%s' % filename) tracing.trace('allow_writes=%s' % self.allow_writes) tracing.trace('new_flag_seen=%s' % self.new_flag_seen) if self.allow_writes or self.new_flag_seen: if self._is_in_new(filename): return self.fs.cat(self._new(filename)) elif self._is_in_deleted(filename): raise OSError( errno.ENOENT, os.strerror(errno.ENOENT), filename) return self.fs.cat(filename) def remove(self, filename): tracing.trace(filename) self._require_rw() new = self._new(filename) deleted = self._deleted(filename) if new in self.new_files: self.fs.remove(new) self.new_files.remove(new) elif deleted in self.deleted_files: raise OSError(errno.ENOENT, os.strerror(errno.ENOENT), filename) else: self.fs.overwrite_file(deleted, '') self.deleted_files.add(deleted) def list_files(self, dirname): '''List all files. Files only, no directories. 
''' assert not dirname.startswith(self.newdir) assert not dirname.startswith(self.deletedir) if self.allow_writes or self.new_flag_seen: if self.fs.exists(dirname): for x in self.climb(dirname, files_only=True): if self.exists(x): yield x new = self._new(dirname) if self.fs.exists(new): for x in self.climb(new, files_only=True): yield self._realname(self.newdir, x) else: if self.fs.exists(dirname): for x in self.climb(dirname, files_only=True): in_new = x.startswith(self.newdir) in_deleted = x.startswith(self.deletedir) if not in_new and not in_deleted: yield x def climb(self, dirname, files_only=False): basenames = self.fs.listdir(dirname) filenames = [] for basename in basenames: pathname = os.path.join(dirname, basename) if self.fs.isdir(pathname): for x in self.climb(pathname, files_only=files_only): yield x else: filenames.append(pathname) for filename in filenames: yield filename if not files_only: yield dirname def _clear_directory(self, dirname): tracing.trace(dirname) for pathname in self.climb(dirname): if pathname != dirname: if self.fs.isdir(pathname): self.fs.rmdir(pathname) else: self.fs.remove(pathname) def _vivify(self, dirname, exclude): tracing.trace('dirname: %s' % dirname) tracing.trace('exclude: %s' % repr(exclude)) all_excludes = [dirname] + exclude for pathname in self.climb(dirname): if pathname not in all_excludes: r = self._realname(dirname, pathname) parent = os.path.dirname(r) if self.fs.isdir(pathname): if not self.fs.exists(r): if not self.fs.exists(parent): self.fs.makedirs(parent) self.fs.rename(pathname, r) else: if not self.fs.exists(parent): self.fs.makedirs(parent) self.fs.rename(pathname, r) def rollback(self): tracing.trace('%s start' % self.storedir) self._require_rw() if self.fs.exists(self.newdir): self._clear_directory(self.newdir) if self.fs.exists(self.deletedir): self._clear_directory(self.deletedir) self.new_files = set() self.deleted_files = set() tracing.trace('%s done' % self.storedir) def _really_delete(self, 
deletedir): tracing.trace(deletedir) for pathname in self.climb(deletedir, files_only=True): if pathname != deletedir: realname = self._realname(deletedir, pathname) try: self.fs.remove(realname) except OSError, e: # pragma: no cover if e.errno not in (errno.ENOENT, errno.EISDIR): raise self.fs.remove(pathname) def commit(self, skip=[]): tracing.trace('%s start' % self.storedir) self._require_rw() if self.fs.exists(self.deletedir): self._really_delete(self.deletedir) if self.fs.exists(self.newdir): skip = [self._new(x) for x in skip] self._vivify(self.newdir, [self.new_flag] + skip) if not skip and self.fs.exists(self.new_flag): self.fs.rename(self.new_flag, self.flag_file) self.new_files = set() self.deleted_files = set() tracing.trace('%s done' % self.storedir) larch-1.20131130/larch/journal_tests.py0000644000175000017500000002323212246332521017423 0ustar jenkinsjenkins# Copyright 2012 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . 
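The commit protocol the `Journal` docstring describes can be run by hand with plain directories: changes are staged under `new/` first, and commit renames them into place, so a reader of the real names never sees a half-written file. This is only a toy of the move-into-place step, without the `delete/` flag files or the metadata ordering:

```python
import os
import shutil
import tempfile

store = tempfile.mkdtemp()
newdir = os.path.join(store, 'new')
os.makedirs(newdir)

# Stage the change under new/ instead of touching the real name.
with open(os.path.join(newdir, 'node-1'), 'w') as f:
    f.write('fresh contents')

# Commit: move every new/x into place as x.
for name in os.listdir(newdir):
    os.rename(os.path.join(newdir, name), os.path.join(store, name))

with open(os.path.join(store, 'node-1')) as f:
    contents = f.read()
print(contents)  # fresh contents
shutil.rmtree(store)
```

Rollback is the mirror image: simply delete everything under `new/` and `delete/`, which is why the real journal can recover either way after a crash depending on whether `new/metadata` survived.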
import os import shutil import tempfile import unittest import larch class JournalTests(unittest.TestCase): def setUp(self): self.tempdir = tempfile.mkdtemp() self.fs = larch.LocalFS() self.j = larch.Journal(True, self.fs, self.tempdir) def tearDown(self): shutil.rmtree(self.tempdir) def join(self, *args): return os.path.join(self.tempdir, *args) def test_constructs_new_filename(self): self.assertEqual(self.j._new(self.join('foo')), self.join('new', 'foo')) def test_constructs_deleted_filename(self): self.assertEqual(self.j._deleted(self.join('foo')), self.join('delete', 'foo')) def test_does_not_know_random_directory_initially(self): self.assertFalse(self.j.exists(self.join('foo'))) def test_creates_directory(self): dirname = self.join('foo/bar') self.j.makedirs(dirname) self.assertTrue(self.j.exists(dirname)) def test_rollback_undoes_directory_creation(self): dirname = self.join('foo/bar') self.j.makedirs(dirname) self.j.rollback() self.assertFalse(self.j.exists(dirname)) def test_rollback_keeps_committed_directory(self): dirname = self.join('foo/bar') self.j.makedirs(dirname) self.j.commit() self.j.rollback() self.assertTrue(self.j.exists(dirname)) def test_rollback_works_without_changes(self): self.assertEqual(self.j.rollback(), None) def test_creates_new_file(self): filename = self.join('foo/bar') self.j.overwrite_file(filename, 'bar') self.assertEqual(self.j.cat(filename), 'bar') def test_rollback_undoes_new_file(self): filename = self.join('foo/bar') self.j.overwrite_file(filename, 'bar') self.j.rollback() self.assertFalse(self.j.exists(filename)) def test_commits_new_file(self): filename = self.join('foo/bar') self.j.overwrite_file(filename, 'bar') self.j.commit() self.j.rollback() self.assertEqual(self.j.cat(filename), 'bar') def test_creates_new_file_after_commit(self): filename = self.join('foo/bar') self.j.overwrite_file(filename, 'bar') self.j.commit() self.j.overwrite_file(filename, 'yo') self.assertEqual(self.j.cat(filename), 'yo') def 
test_cat_does_not_find_deleted_file(self): filename = self.join('foo/bar') self.j.overwrite_file(filename, 'bar') self.j.commit() self.j.remove(filename) self.assertRaises(OSError, self.j.cat, filename) def test_rollback_brings_back_old_file(self): filename = self.join('foo/bar') self.j.overwrite_file(filename, 'bar') self.j.commit() self.j.overwrite_file(filename, 'yo') self.j.rollback() self.assertEqual(self.j.cat(filename), 'bar') def test_removes_uncommitted_file(self): filename = self.join('foo/bar') self.j.overwrite_file(filename, 'bar') self.j.remove(filename) self.assertFalse(self.j.exists(filename)) def test_rollback_undoes_removal_of_uncommitted_file(self): filename = self.join('foo/bar') self.j.overwrite_file(filename, 'bar') self.j.remove(filename) self.j.rollback() self.assertFalse(self.j.exists(filename)) def test_commits_file_removal(self): filename = self.join('foo/bar') self.j.overwrite_file(filename, 'bar') self.j.remove(filename) self.j.commit() self.j.rollback() self.assertFalse(self.j.exists(filename)) def test_removes_committed_file(self): filename = self.join('foo/bar') self.j.overwrite_file(filename, 'bar') self.j.commit() self.j.remove(filename) self.assertFalse(self.j.exists(filename)) def test_removing_committed_file_twice_causes_oserror(self): filename = self.join('foo/bar') self.j.overwrite_file(filename, 'bar') self.j.commit() self.j.remove(filename) self.assertRaises(OSError, self.j.remove, filename) def test_rollback_brings_back_committed_file(self): filename = self.join('foo/bar') self.j.overwrite_file(filename, 'bar') self.j.commit() self.j.remove(filename) self.j.rollback() self.assertEqual(self.j.cat(filename), 'bar') def test_commits_removal_of_committed_file(self): filename = self.join('foo/bar') self.j.overwrite_file(filename, 'bar') self.j.commit() self.j.remove(filename) self.j.commit() self.j.rollback() self.assertFalse(self.j.exists(filename)) def test_commits_metadata(self): metadata = self.join('metadata') 
self.j.overwrite_file(metadata, 'yuck') self.j.commit() self.assertEqual(self.fs.cat(metadata), 'yuck') def test_unflagged_commit_means_new_instance_rollbacks(self): filename = self.join('foo/bar') self.j.overwrite_file(filename, 'bar') j2 = larch.Journal(True, self.fs, self.tempdir) self.assertFalse(j2.exists(filename)) def test_partial_commit_finished_by_new_instance(self): filename = self.join('foo/bar') metadata = self.join('metadata') self.j.overwrite_file(filename, 'bar') self.j.overwrite_file(metadata, '') self.j.commit(skip=[filename]) j2 = larch.Journal(True, self.fs, self.tempdir) self.assertTrue(j2.exists(filename)) class ReadOnlyJournalTests(unittest.TestCase): def setUp(self): self.tempdir = tempfile.mkdtemp() self.fs = larch.LocalFS() self.rw = larch.Journal(True, self.fs, self.tempdir) self.ro = larch.Journal(False, self.fs, self.tempdir) def tearDown(self): shutil.rmtree(self.tempdir) def join(self, *args): return os.path.join(self.tempdir, *args) def test_does_not_know_random_directory_initially(self): self.assertFalse(self.ro.exists(self.join('foo'))) def test_creating_directory_raises_error(self): self.assertRaises(larch.ReadOnlyMode, self.ro.makedirs, 'foo') def test_calling_rollback_raises_error(self): self.assertRaises(larch.ReadOnlyMode, self.ro.rollback) def test_readonly_mode_does_not_check_for_directory_creation(self): dirname = self.join('foo/bar') self.rw.makedirs(dirname) self.assertFalse(self.ro.exists(dirname)) def test_write_file_raises_error(self): self.assertRaises(larch.ReadOnlyMode, self.ro.overwrite_file, 'foo', 'bar') def test_readonly_mode_does_not_check_for_new_file(self): filename = self.join('foo') self.rw.overwrite_file(filename, 'bar') self.assertFalse(self.ro.exists(filename)) def test_readonly_mode_does_not_check_for_modified_file(self): filename = self.join('foo') self.rw.overwrite_file(filename, 'first') self.rw.commit() self.assertEqual(self.ro.cat(filename), 'first') self.rw.overwrite_file(filename,
'second') self.assertEqual(self.ro.cat(filename), 'first') def test_readonly_mode_does_not_know_file_is_deleted_in_journal(self): filename = self.join('foo/bar') self.rw.overwrite_file(filename, 'bar') self.rw.commit() self.rw.remove(filename) self.assertEqual(self.ro.cat(filename), 'bar') def test_lists_no_files_initially(self): dirname = self.join('foo') self.assertEqual(list(self.ro.list_files(dirname)), []) def test_lists_files_correctly_when_no_changes(self): dirname = self.join('foo') filename = self.join('foo/bar') self.rw.overwrite_file(filename, 'bar') self.rw.commit() self.assertEqual(list(self.ro.list_files(dirname)), [filename]) def test_lists_added_file_correctly(self): dirname = self.join('foo') filename = self.join('foo/bar') self.rw.overwrite_file(filename, 'bar') self.assertEqual(list(self.rw.list_files(dirname)), [filename]) self.assertEqual(list(self.ro.list_files(dirname)), []) def test_lists_added_file_correctly_when_dir_existed_already(self): dirname = self.join('foo') filename = self.join('foo/bar') filename2 = self.join('foo/foobar') self.rw.overwrite_file(filename, 'bar') self.rw.commit() self.rw.overwrite_file(filename2, 'yoyo') self.assertEqual(sorted(list(self.rw.list_files(dirname))), sorted([filename, filename2])) self.assertEqual(list(self.ro.list_files(dirname)), [filename]) def test_lists_removed_file_correctly(self): dirname = self.join('foo') filename = self.join('foo/bar') self.rw.overwrite_file(filename, 'bar') self.rw.commit() self.rw.remove(filename) self.assertEqual(list(self.rw.list_files(dirname)), []) self.assertEqual(list(self.ro.list_files(dirname)), [filename]) larch-1.20131130/larch/lru.py0000644000175000017500000001013712246332521015331 0ustar jenkinsjenkins# Copyright 2010 Lars Wirzenius, Richard Braakman # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or #
(at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . import heapq import logging class LRUCache(object): '''A least-recently-used cache. This class caches objects, based on keys. The cache has a fixed size, in number of objects. When a new object is added to the cache, the least recently used old object is dropped. Each object is associated with a key, and use is defined as retrieval of the object using the key. Two hooks are provided for: for removing an object by user request, and when it is automatically removed due to cache overflow. Either hook is called with the key and object as arguments. ''' def __init__(self, max_size, remove_hook=None, forget_hook=None): self.max_size = max_size # Together, obj_before and obj_after form a random access # double-linked sequence. None used as the sentinel on both ends. 
self.obj_before = dict() self.obj_after = dict() self.obj_before[None] = None self.obj_after[None] = None self.ids = dict() # maps key to object self.objs = dict() # maps object to key self.remove_hook = remove_hook self.forget_hook = forget_hook self.hits = 0 self.misses = 0 def log_stats(self): # pragma: no cover logging.debug('LRUCache %s: hits=%s misses=%s' % (self, self.hits, self.misses)) def __len__(self): return len(self.ids) def keys(self): '''List keys for objects in cache.''' return self.ids.keys() def add(self, key, obj): '''Add new item to cache.''' if key in self.ids: self.remove(key) before = self.obj_before[None] self.obj_before[None] = obj self.obj_before[obj] = before self.obj_after[before] = obj self.obj_after[obj] = None self.ids[key] = obj self.objs[obj] = key while len(self.ids) > self.max_size: self._forget_oldest() def _forget_oldest(self): obj = self.obj_after[None] key = self.objs[obj] self._remove(key) if self.forget_hook: self.forget_hook(key, obj) def _remove(self, key): obj = self.ids[key] before = self.obj_before[obj] after = self.obj_after[obj] self.obj_before[after] = before self.obj_after[before] = after del self.obj_before[obj] del self.obj_after[obj] del self.ids[key] del self.objs[obj] def get(self, key): '''Retrieve item from cache. Return object associated with key, or None. ''' if key in self.ids: self.hits += 1 obj = self.ids[key] self.remove(key) self.add(key, obj) return obj else: self.misses += 1 return None def remove(self, key): '''Remove an item from the cache. Return True if item was in cache, False otherwise. ''' if key in self.ids: obj = self.ids[key] self._remove(key) if self.remove_hook: self.remove_hook(key, obj) return True else: return False def remove_oldest(self): '''Remove oldest object. Return key and object. 
''' obj = self.obj_after[None] key = self.objs[obj] self.remove(key) return key, obj larch-1.20131130/larch/lru_tests.py0000644000175000017500000000754412246332521016563 0ustar jenkinsjenkins# Copyright 2010 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . import unittest import larch class LRUCacheTests(unittest.TestCase): def setUp(self): self.cache = larch.LRUCache(4) self.cache.remove_hook = self.remove_hook self.cache.forget_hook = self.forget_hook self.removed = [] self.forgotten = [] def remove_hook(self, key, obj): self.removed.append((key, obj)) def forget_hook(self, key, obj): self.forgotten.append((key, obj)) def test_does_not_have_remove_hook_initially(self): cache = larch.LRUCache(4) self.assertEqual(cache.remove_hook, None) def test_sets_remove_hook_via_init(self): cache = larch.LRUCache(4, remove_hook=self.remove_hook) self.assertEqual(cache.remove_hook, self.remove_hook) def test_does_not_have_forget_hook_initially(self): cache = larch.LRUCache(4) self.assertEqual(cache.forget_hook, None) def test_sets_forget_hook_via_init(self): cache = larch.LRUCache(4, forget_hook=self.forget_hook) self.assertEqual(cache.forget_hook, self.forget_hook) def test_does_not_contain_object_initially(self): self.assertEqual(self.cache.get('foo'), None) def test_does_contain_object_after_it_is_added(self): self.cache.add('foo', 'bar') self.assertEqual(self.cache.get('foo'), 'bar') def 
test_oldest_object_dropped_first(self): for i in range(self.cache.max_size + 1): self.cache.add(i, i) self.assertEqual(self.cache.get(0), None) self.assertEqual(self.forgotten, [(0, 0)]) for i in range(1, self.cache.max_size + 1): self.assertEqual(self.cache.get(i), i) def test_getting_object_prevents_it_from_being_dropped(self): for i in range(self.cache.max_size + 1): self.cache.add(i, i) self.cache.get(0) self.assertEqual(self.cache.get(1), None) self.assertEqual(self.forgotten, [(1, 1)]) for i in [0] + range(2, self.cache.max_size + 1): self.assertEqual(self.cache.get(i), i) def test_adding_key_twice_changes_object(self): self.cache.add('foo', 'foo') self.cache.add('foo', 'bar') self.assertEqual(self.cache.get('foo'), 'bar') def test_removes_object(self): self.cache.add('foo', 'bar') gotit = self.cache.remove('foo') self.assertEqual(gotit, True) self.assertEqual(self.cache.get('foo'), None) self.assertEqual(self.removed, [('foo', 'bar')]) def test_remove_returns_False_for_unknown_object(self): self.assertEqual(self.cache.remove('foo'), False) def test_removes_oldest_object(self): self.cache.add(0, 0) self.cache.add(1, 1) self.assertEqual(self.cache.remove_oldest(), (0, 0)) self.assertEqual(self.cache.get(0), None) def test_length_is_initially_zero(self): self.assertEqual(len(self.cache), 0) def test_length_is_correct_after_adds(self): self.cache.add(0, 0) self.assertEqual(len(self.cache), 1) def test_has_initially_no_keys(self): self.assertEqual(self.cache.keys(), []) def test_has_keys_after_add(self): self.cache.add(0, 1) self.assertEqual(self.cache.keys(), [0]) larch-1.20131130/larch/nodes.py0000644000175000017500000001571212246332521015643 0ustar jenkinsjenkins# Copyright 2010 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. 
# # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see <http://www.gnu.org/licenses/>. import bisect import larch class FrozenNode(larch.Error): '''User tried to modify node that is frozen.''' def __init__(self, node): self.msg = 'Node %s is frozen against modifications' % node.id class Node(object): '''Abstract base class for index and leaf nodes. A node may be initialized with a list of (key, value) pairs. For leaf nodes, the values are the actual values. For index nodes, they are references to other nodes. A node can be indexed using keys, and give the corresponding value. Setting key/value pairs cannot be done using indexing. However, ``key in node`` does work, as does iteration over all keys. ``len(node)`` returns the number of keys. Two nodes compare equal if they have the same key/value pairs. The node ids do not need to match. Nodes can be modified, but only if the ``frozen`` property is false. If it is set to true, any attempt at modifying the node causes the ``FrozenNode`` exception to be raised.
''' def __init__(self, node_id, keys, values): self._keys = list(keys) self._values = list(values) self._dict = dict() for i in range(len(keys)): self._dict[keys[i]] = values[i] self.id = node_id self.size = None self.frozen = False def __getitem__(self, key): return self._dict[key] def __contains__(self, key): return key in self._dict def __eq__(self, other): return self._keys == other._keys and self._values == other._values def __iter__(self): for key in self._keys: yield key def __len__(self): return len(self._keys) def __nonzero__(self): return True def keys(self): '''Return keys in the node, sorted.''' return self._keys def values(self): '''Return value in the node, in same order as keys.''' return self._values def first_key(self): '''Return smallest key in the node.''' return self._keys[0] def find_potential_range(self, minkey, maxkey): '''Find pairs whose key is in desired range. ``minkey`` and ``maxkey`` are inclusive. We take into account that for index nodes, a child's key really represents a range of keys, from the key up to (but not including) the next child's key. The last child's key represents a range up to infinity. Thus we return the first child, if its key lies between ``minkey`` and ``maxkey``, and the last child, if its key is at most ``maxkey``. 
''' def helper(key, default): x = bisect.bisect_left(self._keys, key) if x < len(self._keys): if self._keys[x] > key: if x == 0: x = default else: x -= 1 else: if x == 0: x = None else: x -= 1 return x i = helper(minkey, 0) j = helper(maxkey, None) if j is None: i = None return i, j def _error_if_frozen(self): if self.frozen: raise FrozenNode(self) def add(self, key, value): '''Insert a key/value pair into the right place in a node.''' self._error_if_frozen() i = bisect.bisect_left(self._keys, key) if i < len(self._keys) and self._keys[i] == key: self._keys[i] = key self._values[i] = value else: self._keys.insert(i, key) self._values.insert(i, value) self._dict[key] = value self.size = None def remove(self, key): '''Remove a key from the node. Raise KeyError if key does not exist in node. ''' self._error_if_frozen() i = bisect.bisect_left(self._keys, key) if i >= len(self._keys) or self._keys[i] != key: raise KeyError(key) del self._keys[i] del self._values[i] del self._dict[key] self.size = None def remove_index_range(self, lo, hi): '''Remove keys given a range of indexes into pairs. lo and hi are inclusive. ''' self._error_if_frozen() del self._keys[lo:hi+1] del self._values[lo:hi+1] self.size = None class LeafNode(Node): '''Leaf node in the tree. A leaf node contains key/value pairs (both strings), and has no children. ''' def find_keys_in_range(self, minkey, maxkey): '''Find pairs whose key is in desired range. ``minkey`` and ``maxkey`` are inclusive. ''' i = bisect.bisect_left(self._keys, minkey) j = bisect.bisect_left(self._keys, maxkey) if j < len(self._keys) and self._keys[j] == maxkey: j += 1 return self._keys[i:j] class IndexNode(Node): '''Index node in the tree. An index node contains pairs of keys and references to other nodes (node ids, which are integers). The other nodes may be either index nodes or leaf nodes. 
''' def find_key_for_child_containing(self, key): '''Return key for the child that contains ``key``.''' i = bisect.bisect_left(self._keys, key) if i < len(self._keys): if self._keys[i] == key: return key elif i: return self._keys[i-1] elif i: return self._keys[i-1] def find_children_in_range(self, minkey, maxkey): '''Find all children whose key is in the range. ``minkey`` and ``maxkey`` are inclusive. Note that a child might be returned even if not all of its keys are in the range, just some of them. Also, we consider potential keys here, not actual keys. We have no way to retrieve the children to check which keys they actually have, so instead we return which keys might have the desired keys, and the caller can go look at those. ''' i, j = self.find_potential_range(minkey, maxkey) if i is not None and j is not None: return self._values[i:j+1] else: return [] larch-1.20131130/larch/nodes_tests.py0000644000175000017500000002767612246332521017101 0ustar jenkinsjenkins# Copyright 2010 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . 
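The leaf-node range query in ``LeafNode.find_keys_in_range`` above reduces to two ``bisect_left`` probes over the sorted key list, plus one adjustment to make the upper bound inclusive. The same logic, extracted as a standalone function for illustration (a plain function over a sorted list, not larch's node classes):

```python
import bisect

def find_keys_in_range(keys, minkey, maxkey):
    # keys must be sorted; minkey and maxkey are inclusive.
    i = bisect.bisect_left(keys, minkey)
    j = bisect.bisect_left(keys, maxkey)
    if j < len(keys) and keys[j] == maxkey:
        j += 1  # include an exact match on the upper bound
    return keys[i:j]

keys = ['bar', 'foo']
assert find_keys_in_range(keys, 'aaa', 'ccc') == ['bar']
assert find_keys_in_range(keys, 'bar', 'foo') == ['bar', 'foo']
assert find_keys_in_range(keys, 'ggg', 'ggg') == []
```

Because both probes are binary searches, a range lookup inside one leaf costs O(log n) plus the size of the returned slice.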
import unittest import larch class FrozenNodeTests(unittest.TestCase): def test_node_id_is_in_error_message(self): node = larch.nodes.Node(123, [], []) e = larch.FrozenNode(node) self.assert_('123' in str(e)) class NodeTests(unittest.TestCase): def setUp(self): self.node_id = 12765 self.pairs = [('key2', 'value2'), ('key1', 'value1')] self.pairs.sort() self.keys = [k for k, v in self.pairs] self.values = [v for k, v in self.pairs] self.node = larch.nodes.Node(self.node_id, self.keys, self.values) def test_has_id(self): self.assertEqual(self.node.id, self.node_id) def test_empty_node_is_still_true(self): empty = larch.nodes.Node(self.node_id, [], []) self.assert_(empty) def test_has_no_size(self): self.assertEqual(self.node.size, None) def test_has_each_pair(self): for key, value in self.pairs: self.assertEqual(self.node[key], value) def test_raises_keyerror_for_missing_key(self): self.assertRaises(KeyError, self.node.__getitem__, 'notexist') def test_contains_each_key(self): for key, value in self.pairs: self.assert_(key in self.node) def test_does_not_contain_wrong_key(self): self.assertFalse('notexist' in self.node) def test_is_equal_to_itself(self): self.assert_(self.node == self.node) def test_iterates_over_all_keys(self): self.assertEqual([k for k in self.node], sorted(k for k, v in self.pairs)) def test_has_correct_length(self): self.assertEqual(len(self.node), len(self.pairs)) def test_has_keys(self): self.assertEqual(self.node.keys(), sorted(k for k, v in self.pairs)) def test_sorts_keys(self): self.assertEqual(self.node.keys(), sorted(k for k, v in self.pairs)) def test_has_values(self): self.assertEqual(self.node.values(), [v for k, v in sorted(self.pairs)]) def test_returns_correct_first_key(self): self.assertEqual(self.node.first_key(), 'key1') def test_returns_keys_and_values(self): self.assertEqual(self.node.keys(), self.keys) self.assertEqual(self.node.values(), self.values) def test_adds_key_value_pair_to_empty_node(self): node = larch.nodes.Node(0, 
[], []) node.add('foo', 'bar') self.assertEqual(node.keys(), ['foo']) self.assertEqual(node.values(), ['bar']) self.assertEqual(node['foo'], 'bar') def test_adds_key_value_pair_to_end_of_node_of_one_element(self): node = larch.nodes.Node(0, ['foo'], ['bar']) node.add('foo2', 'bar2') self.assertEqual(node.keys(), ['foo', 'foo2']) self.assertEqual(node.values(), ['bar', 'bar2']) self.assertEqual(node['foo2'], 'bar2') def test_adds_key_value_pair_to_beginning_of_node_of_one_element(self): node = larch.nodes.Node(0, ['foo'], ['bar']) node.add('bar', 'bar') self.assertEqual(node.keys(), ['bar', 'foo']) self.assertEqual(node.values(), ['bar', 'bar']) self.assertEqual(node['bar'], 'bar') def test_adds_key_value_pair_to_middle_of_node_of_two_elements(self): node = larch.nodes.Node(0, ['bar', 'foo'], ['bar', 'bar']) node.add('duh', 'bar') self.assertEqual(node.keys(), ['bar', 'duh', 'foo']) self.assertEqual(node.values(), ['bar', 'bar', 'bar']) self.assertEqual(node['duh'], 'bar') def test_add_replaces_value_for_existing_key(self): node = larch.nodes.Node(0, ['bar', 'foo'], ['bar', 'bar']) node.add('bar', 'xxx') self.assertEqual(node.keys(), ['bar', 'foo']) self.assertEqual(node.values(), ['xxx', 'bar']) self.assertEqual(node['bar'], 'xxx') def test_add_resets_cached_size(self): node = larch.nodes.Node(0, [], []) node.size = 1234 node.add('foo', 'bar') self.assertEqual(node.size, None) def test_removes_first_key(self): node = larch.nodes.Node(0, ['bar', 'duh', 'foo'], ['bar', 'bar', 'bar']) node.remove('bar') self.assertEqual(node.keys(), ['duh', 'foo']) self.assertEqual(node.values(), ['bar', 'bar']) self.assertRaises(KeyError, node.__getitem__, 'bar') def test_removes_last_key(self): node = larch.nodes.Node(0, ['bar', 'duh', 'foo'], ['bar', 'bar', 'bar']) node.remove('foo') self.assertEqual(node.keys(), ['bar', 'duh']) self.assertEqual(node.values(), ['bar', 'bar']) self.assertRaises(KeyError, node.__getitem__, 'foo') def test_removes_middle_key(self): node = 
larch.nodes.Node(0, ['bar', 'duh', 'foo'], ['bar', 'bar', 'bar']) node.remove('duh') self.assertEqual(node.keys(), ['bar', 'foo']) self.assertEqual(node.values(), ['bar', 'bar']) self.assertRaises(KeyError, node.__getitem__, 'duh') def test_raises_exception_when_removing_unknown_key(self): node = larch.nodes.Node(0, ['bar', 'duh', 'foo'], ['bar', 'bar', 'bar']) self.assertRaises(KeyError, node.remove, 'yo') def test_remove_resets_cached_size(self): node = larch.nodes.Node(0, ['foo'], ['bar']) node.size = 1234 node.remove('foo') self.assertEqual(node.size, None) def test_removes_index_range(self): node = larch.nodes.Node(0, ['bar', 'duh', 'foo'], ['bar', 'bar', 'bar']) node.size = 12375654 node.remove_index_range(1, 5) self.assertEqual(node.keys(), ['bar']) self.assertEqual(node.values(), ['bar']) self.assertEqual(node.size, None) def test_finds_keys_in_range(self): # The children's keys are 'bar' and 'foo'. We need to test for # every combination of minkey and maxkey being less than, equal, # or greater than either child key (as long as minkey <= maxkey). 
node = larch.LeafNode(0, ['bar', 'foo'], ['bar', 'foo']) find = node.find_keys_in_range self.assertEqual(find('aaa', 'aaa'), []) self.assertEqual(find('aaa', 'bar'), ['bar']) self.assertEqual(find('aaa', 'ccc'), ['bar']) self.assertEqual(find('aaa', 'foo'), ['bar', 'foo']) self.assertEqual(find('aaa', 'ggg'), ['bar', 'foo']) self.assertEqual(find('bar', 'bar'), ['bar']) self.assertEqual(find('bar', 'ccc'), ['bar']) self.assertEqual(find('bar', 'foo'), ['bar', 'foo']) self.assertEqual(find('bar', 'ggg'), ['bar', 'foo']) self.assertEqual(find('ccc', 'ccc'), []) self.assertEqual(find('ccc', 'foo'), ['foo']) self.assertEqual(find('ccc', 'ggg'), ['foo']) self.assertEqual(find('foo', 'foo'), ['foo']) self.assertEqual(find('foo', 'ggg'), ['foo']) self.assertEqual(find('ggg', 'ggg'), []) def test_finds_no_potential_range_in_empty_node(self): node = larch.LeafNode(0, [], []) self.assertEqual(node.find_potential_range('aaa', 'bbb'), (None, None)) def test_finds_potential_ranges(self): # The children's keys are 'bar' and 'foo'. We need to test for # every combination of minkey and maxkey being less than, equal, # or greater than either child key (as long as minkey <= maxkey). node = larch.LeafNode(0, ['bar', 'foo'], ['bar', 'foo']) find = node.find_potential_range self.assertEqual(find('aaa', 'aaa'), (None, None)) self.assertEqual(find('aaa', 'bar'), (0, 0)) self.assertEqual(find('aaa', 'ccc'), (0, 0)) self.assertEqual(find('aaa', 'foo'), (0, 1)) self.assertEqual(find('aaa', 'ggg'), (0, 1)) self.assertEqual(find('bar', 'bar'), (0, 0)) self.assertEqual(find('bar', 'ccc'), (0, 0)) self.assertEqual(find('bar', 'foo'), (0, 1)) self.assertEqual(find('bar', 'ggg'), (0, 1)) self.assertEqual(find('ccc', 'ccc'), (0, 0)) self.assertEqual(find('ccc', 'foo'), (0, 1)) self.assertEqual(find('ccc', 'ggg'), (0, 1)) self.assertEqual(find('foo', 'foo'), (1, 1)) self.assertEqual(find('foo', 'ggg'), (1, 1)) # This one is a bit special. 
The last key may refer to a # child that is an index node, so it _might_ have keys # in the desired range. self.assertEqual(find('ggg', 'ggg'), (1, 1)) def test_is_not_frozen(self): self.assertEqual(self.node.frozen, False) def test_freezing_makes_add_raise_error(self): self.node.frozen = True self.assertRaises(larch.FrozenNode, self.node.add, 'foo', 'bar') def test_freezing_makes_remove_raise_error(self): self.node.frozen = True self.assertRaises(larch.FrozenNode, self.node.remove, 'foo') def test_freezing_makes_remove_index_range_raise_error(self): self.node.frozen = True self.assertRaises(larch.FrozenNode, self.node.remove_index_range, 0, 1) class IndexNodeTests(unittest.TestCase): def setUp(self): self.leaf1 = larch.LeafNode(0, ['bar'], ['bar']) self.leaf2 = larch.LeafNode(1, ['foo'], ['foo']) self.index_id = 1234 self.index = larch.IndexNode(self.index_id, ['bar', 'foo'], [self.leaf1.id, self.leaf2.id]) def test_find_key_for_child_containing(self): find = self.index.find_key_for_child_containing self.assertEqual(find('aaa'), None) self.assertEqual(find('bar'), 'bar') self.assertEqual(find('bar2'), 'bar') self.assertEqual(find('foo'), 'foo') self.assertEqual(find('foo2'), 'foo') def test_returns_none_when_no_child_contains_key(self): self.assertEqual(self.index.find_key_for_child_containing('a'), None) def test_finds_no_key_when_node_is_empty(self): empty = larch.IndexNode(0, [], []) self.assertEqual(empty.find_key_for_child_containing('f00'), None) def test_finds_no_children_in_range_when_empty(self): empty = larch.IndexNode(0, [], []) self.assertEqual(empty.find_children_in_range('bar', 'foo'), []) def test_finds_children_in_ranges(self): # The children's keys are 'bar' and 'foo'. We need to test for # every combination of minkey and maxkey being less than, equal, # or greater than either child key (as long as minkey <= maxkey). 
find = self.index.find_children_in_range bar = self.leaf1.id foo = self.leaf2.id self.assertEqual(find('aaa', 'aaa'), []) self.assertEqual(find('aaa', 'bar'), [bar]) self.assertEqual(find('aaa', 'ccc'), [bar]) self.assertEqual(find('aaa', 'foo'), [bar, foo]) self.assertEqual(find('aaa', 'ggg'), [bar, foo]) self.assertEqual(find('bar', 'bar'), [bar]) self.assertEqual(find('bar', 'ccc'), [bar]) self.assertEqual(find('bar', 'foo'), [bar, foo]) self.assertEqual(find('bar', 'ggg'), [bar, foo]) self.assertEqual(find('ccc', 'ccc'), [bar]) self.assertEqual(find('ccc', 'foo'), [bar, foo]) self.assertEqual(find('ccc', 'ggg'), [bar, foo]) self.assertEqual(find('foo', 'foo'), [foo]) self.assertEqual(find('foo', 'ggg'), [foo]) self.assertEqual(find('ggg', 'ggg'), [foo]) larch-1.20131130/larch/nodestore.py0000644000175000017500000003074412246332521016537 0ustar jenkinsjenkins# Copyright 2010, 2011 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . 
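The ``find_key_for_child_containing`` behaviour exercised by the tests above is a ``bisect_left`` lookup that rounds down to the preceding child key, because each child key stands for the half-open range up to the next child's key. An equivalent standalone sketch (a plain function over a sorted key list, not larch's ``IndexNode``):

```python
import bisect

def key_for_child_containing(keys, key):
    # keys is the sorted list of child keys; each child covers the
    # range from its own key up to (but not including) the next one.
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return key          # exact match on a child key
    if i > 0:
        return keys[i - 1]  # round down to the preceding child
    return None             # key sorts before the first child

keys = ['bar', 'foo']
assert key_for_child_containing(keys, 'aaa') is None
assert key_for_child_containing(keys, 'bar2') == 'bar'
assert key_for_child_containing(keys, 'foo2') == 'foo'
```

Note the asymmetry with leaf lookups: there is no upper bound here, so any key at or beyond the last child key maps to the last child.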
import larch class NodeMissing(larch.Error): '''A node cannot be found from a NodeStore.''' def __init__(self, node_store, node_id, error=None, error_msg=''): if error is not None: error_msg = (': %s: %s: %s' % (error.errno, error.strerror, error.filename)) self.msg = ('Node %s cannot be found in the node store %s%s' % (hex(node_id), node_store, error_msg)) class NodeTooBig(larch.Error): '''User tried to put a node that was too big into the store.''' def __init__(self, node, node_size): self.msg = ('%s %s is too big (%d bytes)' % (node.__class__.__name__, hex(node.id), node_size)) class NodeExists(larch.Error): '''User tried to put a node that already exists in the store.''' def __init__(self, node_id): self.msg = 'Node %s is already in the store' % hex(node_id) class NodeCannotBeModified(larch.Error): '''User called start_modification on node that cannot be modified.''' def __init__(self, node_id): self.msg = 'Node %s cannot be modified' % hex(node_id) class NodeStore(object): # pragma: no cover '''Abstract base class for storing nodes externally. The ``BTree`` class itself does not handle external storage of nodes. Instead, it is given an object that implements the API in this class. An actual implementation might keep nodes in memory, or store them on disk using a filesystem, or a database. Node stores deal with nodes as byte strings: the ``codec`` encodes them before handing them to the store, and decodes them when it gets them from the store. Each node has an identifier that is unique within the store. The identifier is an integer, and the caller makes the following guarantees about it: * it is a non-negative integer * new nodes are assigned the next consecutive one * it is never re-used Further, the caller makes the following guarantees about the encoded nodes: * they have a strict upper size limit * the tree attempts to fill nodes as close to the limit as possible The size limit is given to the node store at initialization time. 
It is accessible via the ``node_size`` property. Implementations of this API must handle that in some suitable way, preferably by inheriting from this class and calling its initializer. ``self.max_value_size`` gives the maximum size of a value stored in a node. A node store additionally stores some metadata, as key/value pairs, where both key and value is a shortish string. The whole pair must fit into a node, but more than one node can be used for metadata. ''' def __init__(self, allow_writes, node_size, codec): self.allow_writes = allow_writes self.node_size = node_size self.codec = codec self.max_value_size = (node_size / 2) - codec.leaf_header.size def max_index_pairs(self): '''Max number of index pairs in an index node.''' return self.codec.max_index_pairs(self.node_size) def set_metadata(self, key, value): '''Set a metadata key/value pair.''' def get_metadata(self, key): '''Return value that corresponds to a key.''' def get_metadata_keys(self): '''Return list of all metadata keys.''' def remove_metadata(self, key): '''Remove a metadata key, and its corresponding value.''' def save_metadata(self): '''Save metadata persistently, if applicable. Not all node stores are persistent, and this method is not relevant to them. However, if the user does not call this method, none of the changes they make will be stored persistently even with a persistent store. ''' def put_node(self, node): '''Put a new node into the store.''' def get_node(self, node_id): '''Return a node from the store. Raise the ``NodeMissing`` exception if the node is not in the store (has never been, or has been removed). Raise other errors as suitable. ''' def can_be_modified(self, node): '''Can a node be modified?''' return self.get_refcount(node.id) == 1 def start_modification(self, node): '''Start modification of a node. User must call this before modifying a node in place. If a node cannot be modified, ``NodeCannotBeModified`` exception will be raised. 
''' def remove_node(self, node_id): '''Remove a node from the store.''' def list_nodes(self): '''Return list of ids of all nodes in store.''' def get_refcount(self, node_id): '''Return the reference count for a node.''' def set_refcount(self, node_id, refcount): '''Set the reference count for a node.''' def save_refcounts(self): '''Save refcounts to disk. This method only applies to node stores that persist. ''' def commit(self): '''Make sure all changes are committed to the store. Until this is called, there's no guarantee that any of the changes since the previous commit are persistent. ''' class NodeStoreTests(object): # pragma: no cover '''Re-usable tests for ``NodeStore`` implementations. The ``NodeStore`` base class can't be usefully instantiated itself. Instead you are supposed to sub-class it and implement the API in a suitable way for yourself. This class implements a number of tests that the API implementation must pass. The implementation's own test class should inherit from this class, and ``unittest.TestCase``. The test sub-class should define a setUp method that sets the following: * ``self.ns`` to an instance of the API implementation sub-class * ``self.node_size`` to the node size Key size (``self.key_bytes``) is always 3. ''' key_bytes = 3 def assertEqualNodes(self, n1, n2): '''Assert that two nodes are equal. Equal means same keys, and same values for keys. Nodes can be either leaf or index ones.
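The mix-in arrangement that the ``NodeStoreTests`` docstring describes can be sketched as follows. This is a reduced illustration, not larch's real code: ``MyStore`` and ``MyStoreTests`` are hypothetical names, and the mix-in body is cut down to a single test.

```python
import unittest

# Stand-in for larch's NodeStoreTests mix-in, reduced to one test.
class NodeStoreTests(object):
    key_bytes = 3

    def test_sets_node_size(self):
        self.assertEqual(self.ns.node_size, self.node_size)

# Hypothetical NodeStore implementation under test.
class MyStore(object):
    def __init__(self, node_size):
        self.node_size = node_size

# The documented pattern: inherit from both unittest.TestCase and
# the shared mix-in, and let setUp provide self.ns and self.node_size.
class MyStoreTests(unittest.TestCase, NodeStoreTests):
    def setUp(self):
        self.node_size = 4096
        self.ns = MyStore(self.node_size)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(MyStoreTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
assert result.wasSuccessful()
```

Because the mix-in is a plain ``object`` subclass, it contributes tests without being collected as a test case on its own; only concrete subclasses that also inherit ``unittest.TestCase`` get run.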
''' self.assertEqual(sorted(n1.keys()), sorted(n2.keys())) for key in n1: self.assertEqual(n1[key], n2[key]) def test_sets_node_size(self): self.assertEqual(self.ns.node_size, self.node_size) def test_sets_max_value_size(self): self.assert_(self.ns.max_value_size > 1) self.assert_(self.ns.max_value_size < self.node_size / 2) def test_sets_metadata(self): self.ns.set_metadata('foo', 'bar') self.assert_('foo' in self.ns.get_metadata_keys()) self.assertEqual(self.ns.get_metadata('foo'), 'bar') def test_sets_existing_metadata(self): self.ns.set_metadata('foo', 'bar') self.ns.set_metadata('foo', 'foobar') self.assert_('foo' in self.ns.get_metadata_keys()) self.assertEqual(self.ns.get_metadata('foo'), 'foobar') def test_removes_metadata(self): self.ns.set_metadata('foo', 'bar') self.ns.remove_metadata('foo') self.assert_('foo' not in self.ns.get_metadata_keys()) def test_sets_several_metadata_keys(self): old_keys = self.ns.get_metadata_keys() pairs = dict(('%d' % i, '%0128d' % i) for i in range(1024)) for key, value in pairs.iteritems(): self.ns.set_metadata(key, value) self.assertEqual(sorted(self.ns.get_metadata_keys()), sorted(pairs.keys() + old_keys)) for key, value in pairs.iteritems(): self.assertEqual(self.ns.get_metadata(key), value) def test_raises_error_when_getting_unknown_key(self): self.assertRaises(KeyError, self.ns.get_metadata, 'foo') def test_raises_error_when_removing_unknown_key(self): self.assertRaises(KeyError, self.ns.remove_metadata, 'foo') def test_has_no_node_zero_initially(self): self.assertRaises(NodeMissing, self.ns.get_node, 0) def test_lists_no_nodes_initially(self): self.assertEqual(self.ns.list_nodes(), []) def test_puts_and_gets_same(self): node = larch.LeafNode(0, [], []) self.ns.put_node(node) self.ns.commit() self.assertEqualNodes(self.ns.get_node(0), node) def test_put_freezes_node(self): node = larch.LeafNode(0, [], []) self.ns.put_node(node) self.assert_(node.frozen) def test_get_freezes_node(self): node = larch.LeafNode(0, [], []) 
self.ns.put_node(node) node2 = self.ns.get_node(0) self.assert_(node2.frozen) def test_node_not_in_store_can_not_be_modified(self): node = larch.LeafNode(0, [], []) self.assertFalse(self.ns.can_be_modified(node)) def test_node_with_refcount_0_can_not_be_modified(self): node = larch.LeafNode(0, [], []) self.ns.put_node(node) self.ns.set_refcount(node.id, 0) self.assertFalse(self.ns.can_be_modified(node)) def test_node_with_refcount_1_can_be_modified(self): node = larch.LeafNode(0, [], []) self.ns.put_node(node) self.ns.set_refcount(node.id, 1) self.assertTrue(self.ns.can_be_modified(node)) def test_node_with_refcount_2_can_not_be_modified(self): node = larch.LeafNode(0, [], []) self.ns.put_node(node) self.ns.set_refcount(node.id, 2) self.assertFalse(self.ns.can_be_modified(node)) def test_unfreezes_node_when_modification_starts(self): node = larch.LeafNode(0, [], []) self.ns.put_node(node) self.ns.set_refcount(node.id, 1) self.ns.start_modification(node) self.assertFalse(node.frozen) def test_removes_node(self): node = larch.LeafNode(0, [], []) self.ns.put_node(node) self.ns.commit() self.ns.remove_node(0) self.assertRaises(NodeMissing, self.ns.get_node, 0) self.assertEqual(self.ns.list_nodes(), []) def test_removes_node_from_upload_queue_if_one_exists(self): node = larch.LeafNode(0, [], []) self.ns.put_node(node) self.ns.remove_node(0) self.assertRaises(NodeMissing, self.ns.get_node, 0) self.assertEqual(self.ns.list_nodes(), []) def test_lists_node_zero(self): node = larch.LeafNode(0, [], []) self.ns.put_node(node) self.ns.commit() node_ids = self.ns.list_nodes() self.assertEqual(node_ids, [node.id]) def test_put_allows_to_overwrite_a_node(self): node = larch.LeafNode(0, [], []) self.ns.put_node(node) node = larch.LeafNode(0, ['foo'], ['bar']) self.ns.put_node(node) new = self.ns.get_node(0) self.assertEqual(new.keys(), ['foo']) self.assertEqual(new.values(), ['bar']) def test_put_allows_to_overwrite_a_node_after_upload_queue_push(self): node = larch.LeafNode(0, 
[], []) self.ns.put_node(node) self.ns.commit() node = larch.LeafNode(0, ['foo'], ['bar']) self.ns.put_node(node) self.ns.commit() new = self.ns.get_node(0) self.assertEqual(new.keys(), ['foo']) self.assertEqual(new.values(), ['bar']) def test_remove_raises_nodemissing_if_node_does_not_exist(self): self.assertRaises(NodeMissing, self.ns.remove_node, 0) def test_returns_zero_count_for_unknown_node_id(self): self.assertEqual(self.ns.get_refcount(123), 0) def test_sets_refcount(self): self.ns.set_refcount(0, 123) self.assertEqual(self.ns.get_refcount(0), 123) def test_updates_refcount(self): self.ns.set_refcount(0, 123) self.ns.set_refcount(0, 0) self.assertEqual(self.ns.get_refcount(0), 0) larch-1.20131130/larch/nodestore_disk.py0000644000175000017500000002354412246332521017551 0ustar jenkinsjenkins# Copyright 2010, 2011 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see <http://www.gnu.org/licenses/>. import ConfigParser import logging import os import StringIO import struct import tempfile import traceback import tracing import larch DIR_DEPTH = 3 DIR_BITS = 12 DIR_SKIP = 13 class FormatProblem(larch.Error): # pragma: no cover def __init__(self, msg): self.msg = msg class LocalFS(object): # pragma: no cover '''Access to local filesystem. The ``NodeStoreDisk`` class will use a class with this interface to do disk operations. This class implements access to the local filesystem.
''' def makedirs(self, dirname): '''Create directories, similar to os.makedirs.''' if not os.path.exists(dirname): os.makedirs(dirname) def rmdir(self, dirname): '''Remove an empty directory.''' os.rmdir(dirname) def cat(self, filename): '''Return contents of a file.''' return file(filename).read() def overwrite_file(self, filename, contents): '''Write data to disk. File may exist already.''' dirname = os.path.dirname(filename) if not os.path.exists(dirname): os.makedirs(dirname) fd, tempname = tempfile.mkstemp(dir=dirname) os.write(fd, contents) os.close(fd) os.rename(tempname, filename) def exists(self, filename): '''Does a file exist already?''' return os.path.exists(filename) def isdir(self, filename): '''Does filename exist, and is it a directory?''' return os.path.isdir(filename) def rename(self, old, new): '''Rename a file.''' os.rename(old, new) def remove(self, filename): '''Remove a file.''' os.remove(filename) def listdir(self, dirname): '''Return basenames from directory.''' return os.listdir(dirname) class NodeStoreDisk(larch.NodeStore): '''An implementation of the larch.NodeStore API for on-disk storage. The caller will specify a directory in which the nodes will be stored. Each node is stored in its own file, named after the node identifier. The ``vfs`` optional argument to the initializer can be used to override filesystem access. By default, the local filesystem is used, but any class can be substituted. ''' # The on-disk format version is format_base combined with whatever # format the codec specifies.
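``LocalFS.overwrite_file`` above uses the standard atomic-replace idiom: write the data to a temporary file created in the destination directory, then rename it over the target, so readers see either the old contents or the new, never a half-written file. A standalone sketch of the same idiom (modern Python, bytes contents assumed; not larch's exact code):

```python
import os
import tempfile

def overwrite_file(filename, contents):
    '''Atomically replace filename with contents (bytes).'''
    # Create the temporary file in the destination directory so that
    # the final rename stays on one filesystem and remains atomic.
    dirname = os.path.dirname(filename) or '.'
    fd, tempname = tempfile.mkstemp(dir=dirname)
    try:
        os.write(fd, contents)
    finally:
        os.close(fd)
    os.rename(tempname, filename)  # atomic replace on POSIX
```

Renaming within the same directory is what makes this safe; writing the temporary file to, say, ``/tmp`` could cross a filesystem boundary, where ``os.rename`` is no longer atomic (or fails outright).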
format_base = 1 nodedir = 'nodes' def __init__(self, allow_writes, node_size, codec, dirname=None, upload_max=1024, lru_size=500, vfs=None, format=None): tracing.trace('new NodeStoreDisk: %s', dirname) assert dirname is not None if format is not None: tracing.trace('forcing format_base: %s', format) self.format_base = format larch.NodeStore.__init__( self, allow_writes=allow_writes, node_size=node_size, codec=codec) self.dirname = dirname self.metadata_name = os.path.join(dirname, 'metadata') self.metadata = None self.rs = larch.RefcountStore(self) self.cache_size = lru_size self.cache = larch.LRUCache(self.cache_size) self.upload_max = upload_max self.upload_queue = larch.UploadQueue(self._really_put_node, self.upload_max) self.vfs = vfs if vfs != None else LocalFS() self.journal = larch.Journal(allow_writes, self.vfs, dirname) self.idpath = larch.IdPath(os.path.join(dirname, self.nodedir), DIR_DEPTH, DIR_BITS, DIR_SKIP) @property def format_version(self): return '%s/%s' % (self.format_base, self.codec.format) def _load_metadata(self): if self.metadata is None: tracing.trace('load metadata') self.metadata = ConfigParser.ConfigParser() self.metadata.add_section('metadata') if self.journal.exists(self.metadata_name): tracing.trace('metadata file (%s) exists, reading it' % self.metadata_name) data = self.journal.cat(self.metadata_name) f = StringIO.StringIO(data) self.metadata.readfp(f) self._verify_metadata() else: self.metadata.set('metadata', 'format', self.format_version) def _verify_metadata(self): if not self.metadata.has_option('metadata', 'format'): raise FormatProblem('larch on-disk format missing ' '(old version?): %s' % self.dirname) format = self.metadata.get('metadata', 'format') if format != self.format_version: raise FormatProblem('larch on-disk format is incompatible ' '(is %s, should be %s): %s' % (format, self.format_version, self.dirname)) def get_metadata_keys(self): self._load_metadata() return self.metadata.options('metadata') def 
get_metadata(self, key): self._load_metadata() if self.metadata.has_option('metadata', key): return self.metadata.get('metadata', key) else: raise KeyError(key) def set_metadata(self, key, value): self._load_metadata() self.metadata.set('metadata', key, value) tracing.trace('key=%s value=%s', repr(key), repr(value)) def remove_metadata(self, key): self._load_metadata() if self.metadata.has_option('metadata', key): self.metadata.remove_option('metadata', key) else: raise KeyError(key) def save_metadata(self): tracing.trace('saving metadata') self._load_metadata() f = StringIO.StringIO() self.metadata.write(f) self.journal.overwrite_file(self.metadata_name, f.getvalue()) def pathname(self, node_id): return self.idpath.convert(node_id) def put_node(self, node): tracing.trace('putting node %s into cache and upload queue' % node.id) node.frozen = True self.cache.add(node.id, node) self.upload_queue.put(node) def push_upload_queue(self): tracing.trace('pushing upload queue') self.upload_queue.push() self.cache.log_stats() self.cache = larch.LRUCache(self.cache_size) def _really_put_node(self, node): tracing.trace('really put node %s' % node.id) encoded_node = self.codec.encode(node) if len(encoded_node) > self.node_size: raise larch.NodeTooBig(node, len(encoded_node)) name = self.pathname(node.id) tracing.trace('node %s to be stored in %s' % (node.id, name)) self.journal.overwrite_file(name, encoded_node) def get_node(self, node_id): tracing.trace('getting node %s' % node_id) node = self.cache.get(node_id) if node is not None: tracing.trace('cache hit: %s' % node_id) return node node = self.upload_queue.get(node_id) if node is not None: tracing.trace('upload queue hit: %s' % node_id) return node name = self.pathname(node_id) tracing.trace('reading node %s from file %s' % (node_id, name)) try: encoded = self.journal.cat(name) except (IOError, OSError), e: logging.debug('Error reading node: %s: %s: %s' % (e.errno, e.strerror, e.filename or name)) 
logging.debug(traceback.format_exc()) raise larch.NodeMissing(self.dirname, node_id, error=e) else: node = self.codec.decode(encoded) node.frozen = True self.cache.add(node.id, node) return node def start_modification(self, node): tracing.trace('start modifying node %s' % node.id) self.upload_queue.remove(node.id) node.frozen = False def remove_node(self, node_id): tracing.trace('removing node %s (incl. cache and upload queue)' % node_id) self.cache.remove(node_id) got_it = self.upload_queue.remove(node_id) name = self.pathname(node_id) if self.journal.exists(name): self.journal.remove(name) elif not got_it: raise larch.NodeMissing( self.dirname, node_id, error_msg='attempted to remove node that is not ' 'in journal or in upload queue') def list_nodes(self): queued = self.upload_queue.list_ids() nodedir = os.path.join(self.dirname, self.nodedir) uploaded = [] if self.journal.exists(nodedir): for filename in self.journal.list_files(nodedir): uploaded.append(int(os.path.basename(filename), 16)) return queued + uploaded def get_refcount(self, node_id): return self.rs.get_refcount(node_id) def set_refcount(self, node_id, refcount): self.rs.set_refcount(node_id, refcount) def save_refcounts(self): tracing.trace('saving refcounts') self.rs.save_refcounts() def commit(self): self.push_upload_queue() self.save_metadata() self.journal.commit() larch-1.20131130/larch/nodestore_disk_tests.py0000644000175000017500000001016712246332521020770 0ustar jenkinsjenkins# Copyright 2010 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details.
# # You should have received a copy of the GNU General Public License # along with this program. If not, see . import os import shutil import tempfile import unittest import larch import nodestore_disk class NodeStoreDiskTests(unittest.TestCase, larch.NodeStoreTests): def setUp(self): self.node_size = 4096 self.codec = larch.NodeCodec(self.key_bytes) self.tempdir = tempfile.mkdtemp() self.ns = self.new_ns() def tearDown(self): shutil.rmtree(self.tempdir) def new_ns(self, format=None): return nodestore_disk.NodeStoreDisk(True, self.node_size, self.codec, dirname=self.tempdir, format=format) def test_metadata_has_format_version(self): self.assertEqual(self.ns.get_metadata('format'), self.ns.format_version) def test_metadata_format_version_is_persistent(self): self.ns.save_metadata() ns2 = self.new_ns() self.assertEqual(ns2.get_metadata('format'), ns2.format_version) def test_refuses_to_open_if_format_version_is_old(self): old = self.new_ns(format=0) old.save_metadata() new = self.new_ns(format=1) self.assertRaises(larch.Error, new.get_metadata, 'format') def test_refuses_to_open_if_format_version_is_not_there(self): self.ns.remove_metadata('format') self.ns.save_metadata() ns2 = self.new_ns() self.assertRaises(larch.Error, ns2.get_metadata, 'format') def test_has_persistent_metadata(self): self.ns.set_metadata('foo', 'bar') self.ns.save_metadata() ns2 = self.new_ns() self.assertEqual(ns2.get_metadata('foo'), 'bar') def test_metadata_does_not_persist_without_saving(self): self.ns.set_metadata('foo', 'bar') ns2 = self.new_ns() self.assertEqual(ns2.get_metadata_keys(), ['format']) def test_refcounts_persist(self): self.ns.set_refcount(0, 1234) self.per_group = 2 self.ns.save_refcounts() self.ns.journal.commit() ns2 = self.new_ns() self.assertEqual(self.ns.get_refcount(0), 1234) self.assertEqual(ns2.get_refcount(0), 1234) def test_put_refuses_too_large_a_node(self): node = larch.LeafNode(0, ['000'], ['x' * (self.node_size + 1)]) def helper(node): self.ns.put_node(node) 
self.ns.commit() self.assertRaises(larch.NodeTooBig, helper, node) def test_puts_and_gets_same_with_cache_emptied(self): node = larch.LeafNode(0, [], []) self.ns.put_node(node) self.ns.cache = larch.LRUCache(100) self.assertEqualNodes(self.ns.get_node(0), node) def test_put_uploads_queue_overflow(self): self.ns.upload_max = 2 self.ns.upload_queue.max = self.ns.upload_max ids = range(self.ns.upload_max + 1) for i in ids: node = larch.LeafNode(i, [], []) self.ns.put_node(node) self.assertEqual(sorted(self.ns.list_nodes()), ids) for node_id in ids: self.ns.cache.remove(node_id) self.assertEqual(self.ns.get_node(node_id).id, node_id) def test_gets_node_from_disk(self): node = larch.LeafNode(0, [], []) self.ns.put_node(node) self.ns.commit() ns2 = self.new_ns() node2 = ns2.get_node(node.id) self.assertEqual(node.id, node2.id) self.assertEqual(node.keys(), node2.keys()) self.assertEqual(node.values(), node2.values()) larch-1.20131130/larch/nodestore_memory.py0000644000175000017500000000433312246332521020122 0ustar jenkinsjenkins# Copyright 2010, 2011 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . import larch class NodeStoreMemory(larch.NodeStore): '''An implementation of larch.NodeStore API for in-memory storage. All nodes are kept in memory. This is for demonstration and testing purposes only. 
''' def __init__(self,allow_writes, node_size, codec): larch.NodeStore.__init__( self, allow_writes=allow_writes, node_size=node_size, codec=codec) self.nodes = dict() self.refcounts = dict() self.metadata = dict() def get_metadata_keys(self): return self.metadata.keys() def get_metadata(self, key): return self.metadata[key] def set_metadata(self, key, value): self.metadata[key] = value def remove_metadata(self, key): del self.metadata[key] def put_node(self, node): node.frozen = True self.nodes[node.id] = node def get_node(self, node_id): if node_id in self.nodes: return self.nodes[node_id] else: raise larch.NodeMissing(repr(self), node_id) def start_modification(self, node): node.frozen = False def remove_node(self, node_id): if node_id in self.nodes: del self.nodes[node_id] else: raise larch.NodeMissing(repr(self), node_id) def list_nodes(self): return self.nodes.keys() def get_refcount(self, node_id): return self.refcounts.get(node_id, 0) def set_refcount(self, node_id, refcount): self.refcounts[node_id] = refcount larch-1.20131130/larch/nodestore_memory_tests.py0000644000175000017500000000201012246332521021332 0ustar jenkinsjenkins# Copyright 2010 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . 
import unittest import larch import nodestore_memory class NodeStoreMemoryTests(unittest.TestCase, larch.NodeStoreTests): def setUp(self): self.node_size = 4096 self.codec = larch.NodeCodec(self.key_bytes) self.ns = nodestore_memory.NodeStoreMemory( allow_writes=True, node_size=self.node_size, codec=self.codec) larch-1.20131130/larch/refcountstore.py0000644000175000017500000001062712246332521017435 0ustar jenkinsjenkins# Copyright 2010 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . import logging import os import StringIO import struct import tempfile import tracing import larch def encode_refcounts(refcounts, start_id, how_many, keys): fmt = '!QH' + 'H' * how_many args = [start_id, how_many] + ([0] * how_many) for key in keys: args[2 + key - start_id] = refcounts[key] return struct.pack(fmt, *args) def decode_refcounts(encoded): n = struct.calcsize('!QH') start_id, how_many = struct.unpack('!QH', encoded[:n]) counts = struct.unpack('!' + 'H' * how_many, encoded[n:]) return zip(range(start_id, start_id + how_many), counts) class RefcountStore(object): '''Store node reference counts. Each node has a reference count, which gets stored on disk. Reference counts are grouped into blocks of ``self.per_group`` counts, and each group is stored in its own file. This balances the per-file overhead with the overhead of keeping a lot of unneeded reference counts in memory. 
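The pack format used by ``encode_refcounts`` and ``decode_refcounts`` above is a network-order header of an 8-byte start id and a 2-byte group size (``!QH``), followed by one 16-bit count per node id in the group. A slightly simplified round-trip sketch, which fills in the counts directly rather than taking a separate ``keys`` argument as the original does:

```python
import struct

def encode_refcounts(refcounts, start_id, how_many):
    # Header: 8-byte start id, 2-byte group size; body: one 16-bit
    # count per node id in [start_id, start_id + how_many).
    fmt = '!QH' + 'H' * how_many
    counts = [refcounts.get(start_id + i, 0) for i in range(how_many)]
    return struct.pack(fmt, start_id, how_many, *counts)

def decode_refcounts(encoded):
    n = struct.calcsize('!QH')
    start_id, how_many = struct.unpack('!QH', encoded[:n])
    counts = struct.unpack('!' + 'H' * how_many, encoded[n:])
    return list(zip(range(start_id, start_id + how_many), counts))

pairs = decode_refcounts(encode_refcounts({5: 7}, 0, 8))
assert pairs[5] == (5, 7)   # the one set count survives the round trip
assert pairs[0] == (0, 0)   # unset ids decode as zero
```

Note that ``H`` caps each count at 65535, and unset ids within a group are stored as explicit zeroes, which is why all-zero groups are skipped entirely rather than written to disk.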
Only those blocks that are used get loaded into memory. Blocks that are full of zeroes are not stored in files, to save disk space. ''' per_group = 2**15 refcountdir = 'refcounts' def __init__(self, node_store): self.node_store = node_store self.refcounts = dict() self.dirty = set() def get_refcount(self, node_id): '''Return reference count for a given node.''' if node_id not in self.refcounts: group = self._load_refcount_group(self._start_id(node_id)) if group is None: self.refcounts[node_id] = 0 else: for x, count in group: if x not in self.dirty: self.refcounts[x] = count return self.refcounts[node_id] def set_refcount(self, node_id, refcount): '''Set the reference count for a given node.''' tracing.trace('setting refcount for %s to %s' % (node_id, refcount)) self.refcounts[node_id] = refcount self.dirty.add(node_id) def save_refcounts(self): '''Save all modified refcounts.''' tracing.trace('saving refcounts (len(dirty) = %s)' % (len(self.dirty))) if self.dirty: dirname = os.path.join(self.node_store.dirname, self.refcountdir) if not self.node_store.journal.exists(dirname): self.node_store.journal.makedirs(dirname) ids = sorted(self.dirty) all_ids_in_memory = set(self.refcounts.keys()) for start_id in range(self._start_id(ids[0]), self._start_id(ids[-1]) + 1, self.per_group): all_ids_in_group = set( range(start_id, start_id + self.per_group)) keys = all_ids_in_group.intersection(all_ids_in_memory) if keys: encoded = encode_refcounts( self.refcounts, start_id, self.per_group, keys) filename = self._group_filename(start_id) self.node_store.journal.overwrite_file(filename, encoded) # We re-initialize these so that they don't grow indefinitely. 
self.refcounts = dict() self.dirty = set() def _load_refcount_group(self, start_id): filename = self._group_filename(start_id) if self.node_store.journal.exists(filename): encoded = self.node_store.journal.cat(filename) return decode_refcounts(encoded) def _group_filename(self, start_id): return os.path.join(self.node_store.dirname, self.refcountdir, 'refcounts-%d' % start_id) def _start_id(self, node_id): return (node_id / self.per_group) * self.per_group larch-1.20131130/larch/refcountstore_tests.py0000644000175000017500000000675612246332521020667 0ustar jenkinsjenkins# Copyright 2010 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . 
import os import shutil import tempfile import unittest import larch import larch.nodestore_disk class DummyNodeStore(object): def __init__(self, dirname): self.dirname = dirname self.journal = self def makedirs(self, dirname): if not os.path.exists(dirname): os.makedirs(dirname) def cat(self, filename): return file(filename).read() def overwrite_file(self, filename, contents): file(filename, 'w').write(contents) def exists(self, filename): return os.path.exists(filename) def rename(self, old, new): os.rename(old, new) def remove(self, filename): os.remove(filename) class RefcountStoreTests(unittest.TestCase): def setUp(self): self.dirname = tempfile.mkdtemp() self.rs = self.new_rs() def tearDown(self): shutil.rmtree(self.dirname) def new_rs(self): return larch.RefcountStore(DummyNodeStore(self.dirname)) def test_returns_zero_for_unset_refcount(self): self.assertEqual(self.rs.get_refcount(123), 0) def test_sets_refcount(self): self.rs.set_refcount(123, 1) self.assertEqual(self.rs.get_refcount(123), 1) def test_updates_refcount(self): self.rs.set_refcount(123, 1) self.rs.set_refcount(123, 2) self.assertEqual(self.rs.get_refcount(123), 2) def test_refcounts_are_not_saved_automatically(self): self.rs.set_refcount(123, 1) rs2 = self.new_rs() self.assertEqual(rs2.get_refcount(123), 0) def test_saves_refcounts(self): self.rs.set_refcount(123, 1) self.rs.save_refcounts() rs2 = self.new_rs() self.assertEqual(rs2.get_refcount(123), 1) def test_save_refcounts_works_without_changes(self): self.assertEqual(self.rs.save_refcounts(), None) def test_refcount_group_encode_decode_round_trip_works(self): refs = range(2048) for ref in refs: self.rs.set_refcount(ref, ref) encoded = larch.refcountstore.encode_refcounts( self.rs.refcounts, 0, 1024, range(1024)) decoded = larch.refcountstore.decode_refcounts(encoded) self.assertEqual(decoded, [(x, x) for x in refs[:1024]]) def test_group_returns_correct_start_id_for_node_zero(self): self.assertEqual(self.rs._start_id(0), 0) def 
test_group_returns_correct_start_id_for_last_id_in_group(self): self.assertEqual(self.rs._start_id(self.rs.per_group - 1), 0) def test_group_returns_correct_start_id_for_first_in_second_group(self): self.assertEqual(self.rs._start_id(self.rs.per_group), self.rs.per_group) def test_group_returns_correct_start_id_for_second_in_second_group(self): self.assertEqual(self.rs._start_id(self.rs.per_group + 1), self.rs.per_group) larch-1.20131130/larch/tree.py0000644000175000017500000005335512246332521015477 0ustar jenkinsjenkins# Copyright 2010, 2011 Lars Wirzenius # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see <http://www.gnu.org/licenses/>. import bisect import tracing import larch '''A simple B-tree implementation.''' class KeySizeMismatch(larch.Error): '''User tried to use key of wrong size.''' def __init__(self, key, wanted_size): self.msg = ('Key %s is of wrong length (%d, should be %d)' % (repr(key), len(key), wanted_size)) class ValueTooLarge(larch.Error): '''User tried to use a value that is too large for a node.''' def __init__(self, value, max_size): self.msg = ('Value %s is too long (%d, max %d)' % (repr(value), len(value), max_size)) class BTree(object): '''A balanced search tree (copy-on-write B-tree). The tree belongs to a forest. The tree nodes are stored in an external node store; see the ``NodeStore`` class. ``root_id`` gives the id of the root node of the tree. The root node must be unique to this tree, as it is modified in place.
``root_id`` may also be ``None``, in which case a new node is created automatically to serve as the root node. ''' def __init__(self, forest, node_store, root_id): self.forest = forest self.node_store = node_store self.max_index_length = self.node_store.max_index_pairs() self.min_index_length = self.max_index_length / 2 if root_id is None: self.root = None else: self.root = self._get_node(root_id) tracing.trace('init BTree %s with root_id %s' % (self, root_id)) def _check_key_size(self, key): if len(key) != self.node_store.codec.key_bytes: raise KeySizeMismatch(key, self.node_store.codec.key_bytes) def _check_value_size(self, value): if len(value) > self.node_store.max_value_size: raise ValueTooLarge(value, self.node_store.max_value_size) def _new_id(self): '''Generate a new node identifier.''' return self.forest.new_id() def _new_leaf(self, keys, values): '''Create a new leaf node.''' node = larch.LeafNode(self._new_id(), keys, values) tracing.trace('id=%s' % node.id) return node def _new_index(self, keys, values): '''Create a new index node.''' index = larch.IndexNode(self._new_id(), keys, values) for child_id in values: self._increment(child_id) tracing.trace('id=%s' % index.id) return index def _set_root(self, new_root): '''Replace existing root node.''' tracing.trace('new_root.id=%s' % new_root.id) if self.root is not None and self.root.id != new_root.id: tracing.trace('decrement old root %s' % self.root.id) self._decrement(self.root.id) self._put_node(new_root) self.root = new_root tracing.trace('setting node %s refcount to 1' % self.root.id) self.node_store.set_refcount(self.root.id, 1) def _get_node(self, node_id): '''Return node corresponding to a node id.''' return self.node_store.get_node(node_id) def _put_node(self, node): '''Put node into node store.''' tracing.trace('node.id=%s' % node.id) return self.node_store.put_node(node) def _leaf_size(self, node): if node.size is None: node.size = self.node_store.codec.leaf_size(node.keys(), node.values()) 
return node.size def lookup(self, key): '''Return value corresponding to ``key``. If the key is not in the tree, raise ``KeyError``. ''' tracing.trace('looking up %s' % repr(key)) tracing.trace('tree is %s (root id %s)', self, self.root.id if self.root else None) self._check_key_size(key) node = self.root while isinstance(node, larch.IndexNode): k = node.find_key_for_child_containing(key) # If k is None, then the indexing of node will cause KeyError # to be returned, just like we want to. This saves us from # having to test for it separately. node = self._get_node(node[k]) if isinstance(node, larch.LeafNode): return node[key] raise KeyError(key) def lookup_range(self, minkey, maxkey): '''Return list of (key, value) pairs for all keys in a range. ``minkey`` and ``maxkey`` are included in range. ''' tracing.trace('looking up range %s .. %s' % (repr(minkey), repr(maxkey))) tracing.trace('tree is %s (root id %s)', self, self.root.id if self.root else None) self._check_key_size(minkey) self._check_key_size(maxkey) if self.root is None: return [] else: return [pair for pair in self._lookup_range(self.root.id, minkey, maxkey)] def _lookup_range(self, node_id, minkey, maxkey): node = self._get_node(node_id) if isinstance(node, larch.LeafNode): for key in node.find_keys_in_range(minkey, maxkey): yield key, node[key] else: assert isinstance(node, larch.IndexNode) result = [] for child_id in node.find_children_in_range(minkey, maxkey): for pair in self._lookup_range(child_id, minkey, maxkey): yield pair def count_range(self, minkey, maxkey): '''Return number of keys in range.''' tracing.trace('count_range(%s, %s)' % (repr(minkey), repr(maxkey))) tracing.trace('tree is %s (root id %s)', self, self.root.id if self.root else None) self._check_key_size(minkey) self._check_key_size(maxkey) if self.root is None: return 0 return self._count_range(self.root.id, minkey, maxkey) def _count_range(self, node_id, minkey, maxkey): node = self._get_node(node_id) if isinstance(node, 
larch.LeafNode): return len(list(node.find_keys_in_range(minkey, maxkey))) else: assert isinstance(node, larch.IndexNode) count = 0 for child_id in node.find_children_in_range(minkey, maxkey): count += self._count_range(child_id, minkey, maxkey) return count def range_is_empty(self, minkey, maxkey): '''Is a range empty in the tree? This is faster than doing a range lookup for the same range, and checking if there are any keys returned. ''' tracing.trace('range_is_empty(%s, %s)' % (repr(minkey), repr(maxkey))) tracing.trace('tree is %s (root id %s)', self, self.root.id if self.root else None) self._check_key_size(minkey) self._check_key_size(maxkey) if self.root is None: return True return self._range_is_empty(self.root.id, minkey, maxkey) def _range_is_empty(self, node_id, minkey, maxkey): node = self._get_node(node_id) if isinstance(node, larch.LeafNode): return node.find_keys_in_range(minkey, maxkey) == [] else: assert isinstance(node, larch.IndexNode) for child_id in node.find_children_in_range(minkey, maxkey): if not self._range_is_empty(child_id, minkey, maxkey): return False return True def _shadow(self, node): '''Shadow a node: make it possible to modify it in-place.''' tracing.trace('node.id=%s' % node.id) if self.node_store.can_be_modified(node): tracing.trace('can be modified in place') self.node_store.start_modification(node) new = node elif isinstance(node, larch.IndexNode): tracing.trace('new index node') new = self._new_index(node.keys(), node.values()) else: tracing.trace('new leaf node') new = self._new_leaf(node.keys(), node.values()) new.size = node.size tracing.trace('returning new.id=%s' % new.id) return new def insert(self, key, value): '''Insert a new key/value pair into the tree. If the key already existed in the tree, the old value is silently forgotten. 
''' tracing.trace('key=%s' % repr(key)) tracing.trace('value=%s' % repr(value)) tracing.trace('tree is %s (root id %s)', self, self.root.id if self.root else None) self._check_key_size(key) self._check_value_size(value) # Is the tree empty? This needs special casing to keep # _insert_into_index simpler. if self.root is None or len(self.root) == 0: tracing.trace('tree is empty') leaf = self._new_leaf([key], [value]) self._put_node(leaf) if self.root is None: new_root = self._new_index([key], [leaf.id]) else: new_root = self._shadow(self.root) new_root.add(key, leaf.id) self._increment(leaf.id) else: tracing.trace('tree is not empty') kids = self._insert_into_index(self.root, key, value) # kids contains either one or more index nodes. If one, # we use that as the new root. Otherwise, we create a new one, # making the tree higher because we must. assert len(kids) > 0 for kid in kids: assert type(kid) == larch.IndexNode if len(kids) == 1: new_root = kids[0] tracing.trace('only one kid: id=%s' % new_root.id) else: keys = [kid.first_key() for kid in kids] values = [kid.id for kid in kids] new_root = self._new_index(keys, values) tracing.trace('create new root: id=%s' % new_root.id) self._set_root(new_root) def _insert_into_index(self, old_index, key, value): '''Insert key, value into an index node. Return list of replacement nodes. Might be just the same node, or a single new node, or two nodes, one of which might be the same node. Note that this method never makes the tree higher, that is the job of the caller. If two nodes are returned, they are siblings at the same height as the original node. 
''' tracing.trace('old_index.id=%s' % old_index.id) new_index = self._shadow(old_index) child_key = new_index.find_key_for_child_containing(key) if child_key is None: child_key = new_index.first_key() child = self._get_node(new_index[child_key]) if isinstance(child, larch.IndexNode): new_kids = self._insert_into_index(child, key, value) else: new_kids = self._insert_into_leaf(child, key, value) new_index.remove(child_key) do_dec = True for kid in new_kids: new_index.add(kid.first_key(), kid.id) if kid.id != child.id: self._increment(kid.id) else: do_dec = False if do_dec: # pragma: no cover self._decrement(child.id) if len(new_index) > self.max_index_length: tracing.trace('need to split index node id=%s' % new_index.id) n = len(new_index) / 2 keys = new_index.keys()[n:] values = new_index.values()[n:] new = larch.IndexNode(self._new_id(), keys, values) tracing.trace('new index node id=%s' % new.id) new_index.remove_index_range(n, len(new_index)) self._put_node(new_index) self._put_node(new) return [new_index, new] else: tracing.trace('no need to split index node id=%s' % new_index.id) self._put_node(new_index) return [new_index] def _insert_into_leaf(self, leaf, key, value): '''Insert a key/value pair into a leaf node. Return value is like for _insert_into_index. 
        '''
        tracing.trace('leaf.id=%s' % leaf.id)

        codec = self.node_store.codec
        max_size = self.node_store.node_size
        size = self._leaf_size

        new = self._shadow(leaf)
        old_size = size(new)
        if key in new:
            old_value = new[key]
            new.add(key, value)
            new.size = codec.leaf_size_delta_replace(old_size, old_value,
                                                     value)
        else:
            new.add(key, value)
            new.size = codec.leaf_size_delta_add(old_size, value)

        if size(new) <= max_size:
            tracing.trace('leaf did not grow too big')
            leaves = [new]
        else:
            tracing.trace('leaf grew too big, splitting')
            keys = new.keys()
            values = new.values()
            n = len(keys) / 2
            new2 = self._new_leaf(keys[n:], values[n:])
            for key in new2:
                new.remove(key)
            if size(new2) > max_size:  # pragma: no cover
                while size(new2) > max_size:
                    key = new2.keys()[0]
                    new.add(key, new2[key])
                    new2.remove(key)
            elif size(new) > max_size:  # pragma: no cover
                while size(new) > max_size:
                    key = new.keys()[-1]
                    new2.add(key, new[key])
                    new.remove(key)
            leaves = [new, new2]
        for x in leaves:
            self._put_node(x)
        return leaves

    def remove(self, key):
        '''Remove ``key`` and its associated value from tree.

        If the key is not in the tree, ``KeyError`` is raised.
''' tracing.trace('key=%s' % repr(key)) tracing.trace('tree is %s (root id %s)', self, self.root.id if self.root else None) self._check_key_size(key) if self.root is None: tracing.trace('no root') raise KeyError(key) new_root = self._remove_from_index(self.root, key) self._set_root(new_root) self._reduce_height() def _remove_from_index(self, old_index, key): tracing.trace('old_index.id=%s' % old_index.id) tracing.trace('tree is %s (root id %s)', self, self.root.id if self.root else None) child_key = old_index.find_key_for_child_containing(key) new_index = self._shadow(old_index) child = self._get_node(new_index[child_key]) if isinstance(child, larch.IndexNode): new_kid = self._remove_from_index(child, key) new_index.remove(child_key) if len(new_kid) > 0: self._add_or_merge_index(new_index, new_kid) else: if new_kid.id != child.id: # pragma: no cover self._decrement(new_kid.id) self._decrement(child.id) else: assert isinstance(child, larch.LeafNode) leaf = self._shadow(child) leaf.remove(key) self._put_node(leaf) new_index.remove(child_key) if len(leaf) > 0: self._add_or_merge_leaf(new_index, leaf) else: tracing.trace('new leaf is empty, forgetting it') if leaf.id != child.id: # pragma: no cover self._decrement(leaf.id) self._decrement(child.id) self._put_node(new_index) return new_index def _add_or_merge_index(self, parent, index): self._add_or_merge(parent, index, self._merge_index) def _add_or_merge_leaf(self, parent, leaf): self._add_or_merge(parent, leaf, self._merge_leaf) def _add_or_merge(self, parent, node, merge): assert not parent.frozen assert node.frozen keys = parent.keys() key = node.first_key() i = bisect.bisect_left(keys, key) new_node = None if i > 0: new_node = merge(parent, node, i-1) if new_node is None and i < len(keys): new_node = merge(parent, node, i) if new_node is None: new_node = node assert new_node is not None self._put_node(new_node) parent.add(new_node.first_key(), new_node.id) self._increment(new_node.id) if new_node != node: # 
pragma: no cover # We made a new node, so get rid of the old one. tracing.trace('decrementing unused node id=%s' % node.id) self._decrement(node.id) def _merge_index(self, parent, node, sibling_index): def merge_indexes_p(a, b): return len(a) + len(b) <= self.max_index_length def add_to_index(n, k, v): n.add(k, v) self._increment(v) return self._merge_nodes(parent, node, sibling_index, merge_indexes_p, add_to_index) def _merge_leaf(self, parent, node, sibling_index): def merge_leaves_p(a, b): a_size = self._leaf_size(a) b_size = self._leaf_size(b) return a_size + b_size <= self.node_store.node_size def add_to_leaf(n, k, v): n.add(k, v) return self._merge_nodes(parent, node, sibling_index, merge_leaves_p, add_to_leaf) def _merge_nodes(self, parent, node, sibling_index, merge_p, add): sibling_key = parent.keys()[sibling_index] sibling_id = parent[sibling_key] sibling = self._get_node(sibling_id) if merge_p(node, sibling): tracing.trace('merging nodes %s and %s' % (node.id, sibling.id)) new_node = self._shadow(node) for k in sibling: add(new_node, k, sibling[k]) self._put_node(new_node) parent.remove(sibling_key) tracing.trace('decrementing now-unused sibling %s' % sibling.id) self._decrement(sibling.id) return new_node else: return None def remove_range(self, minkey, maxkey): '''Remove all keys in the given range. Range is inclusive. ''' tracing.trace('minkey=%s maxkey=%s' % (repr(minkey), repr(maxkey))) tracing.trace('tree is %s (root id %s)', self, self.root.id if self.root else None) self._check_key_size(minkey) self._check_key_size(maxkey) keys = [k for k, v in self.lookup_range(minkey, maxkey)] for key in keys: self.remove(key) def _reduce_height(self): # After removing things, the top of the tree might consist of a # list of index nodes with only a single child, which is also # an index node. These can and should be removed, for efficiency. # Further, since we've modified all of these nodes, they can all # be modified in place. 
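The comment above describes why `_reduce_height` exists: after removals, the top of the tree can degenerate into a chain of index nodes that each have a single child. A standalone sketch of that root-chain collapse (this is illustrative only — the dict-based node model, `reduce_height` helper, and tuple node encoding here are hypothetical, not larch's API):

```python
# Sketch (not larch's API): collapse single-child index chains at the
# root. Nodes are modeled as ('index', child_ids) or ('leaf', values).

def reduce_height(nodes, root_id, refcounts):
    """Return the new root id after discarding index nodes that have
    exactly one child which is itself an unshared index node."""
    while True:
        kind, children = nodes[root_id]
        if kind != 'index' or len(children) != 1:
            break
        child_id = children[0]
        # A shared child may belong to another tree snapshot; keep it.
        if refcounts.get(child_id, 1) != 1:
            break
        if nodes[child_id][0] == 'leaf':
            break
        del nodes[root_id]   # old root is no longer referenced
        root_id = child_id
    return root_id

nodes = {
    1: ('index', [2]),       # root with a single child
    2: ('index', [3]),       # single child again
    3: ('index', [4, 5]),    # two children: the chain stops here
    4: ('leaf', ['a']),
    5: ('leaf', ['b']),
}
refcounts = {2: 1, 3: 1, 4: 1, 5: 1}
new_root = reduce_height(nodes, 1, refcounts)   # collapses 1 and 2
```

As in the real code, the walk stops at the first node with more than one child, a shared child, or a leaf child, so lookups from the new root behave identically.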
        tracing.trace('start reducing height')
        while len(self.root) == 1:
            tracing.trace('self.root.id=%s' % self.root.id)
            key = self.root.first_key()
            child_id = self.root[key]
            assert self.node_store.get_refcount(self.root.id) == 1

            if self.node_store.get_refcount(child_id) != 1:
                tracing.trace('only child is shared')
                break

            child = self._get_node(child_id)
            if isinstance(child, larch.LeafNode):
                tracing.trace('only child is a leaf node')
                break

            # We can just make the child be the new root node.
            assert type(child) == larch.IndexNode

            # Prevent child from getting removed when parent's refcount
            # gets decremented. set_root will set the refcount to be 1.
            tracing.trace('setting node %s refcount to 2' % child.id)
            self.node_store.set_refcount(child.id, 2)

            self._set_root(child)
        tracing.trace('done reducing height')

    def _increment(self, node_id):
        '''Non-recursively increment refcount for a node.'''
        refcount = self.node_store.get_refcount(node_id)
        refcount += 1
        self.node_store.set_refcount(node_id, refcount)
        tracing.trace('node %s refcount grew to %s' % (node_id, refcount))

    def _decrement(self, node_id):
        '''Recursively, lazily decrement refcounts for a node and children.'''
        tracing.trace('decrementing node %s refcount' % node_id)
        refcount = self.node_store.get_refcount(node_id)
        if refcount > 1:
            refcount -= 1
            self.node_store.set_refcount(node_id, refcount)
            tracing.trace('node %s refcount now %s' % (node_id, refcount))
        else:
            tracing.trace('node %s refcount %s, removing node' %
                          (node_id, refcount))
            node = self._get_node(node_id)
            if isinstance(node, larch.IndexNode):
                tracing.trace('reducing refcounts for children')
                for child_id in node.values():
                    self._decrement(child_id)
            self.node_store.remove_node(node_id)
            self.node_store.set_refcount(node_id, 0)

larch-1.20131130/larch/tree_tests.py:

# Copyright 2010 Lars Wirzenius
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU
General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . import random import sys import unittest import larch class DummyForest(object): def __init__(self): self.last_id = 0 def new_id(self): self.last_id += 1 return self.last_id class DummyNodeStore(larch.NodeStoreMemory): def find_nodes(self): return self.nodes.keys() class KeySizeMismatchTests(unittest.TestCase): def setUp(self): self.err = larch.KeySizeMismatch('foo', 4) def test_error_message_contains_key(self): self.assert_('foo' in str(self.err)) def test_error_message_contains_wanted_size(self): self.assert_('4' in str(self.err)) class ValueTooLargeTests(unittest.TestCase): def setUp(self): self.err = larch.ValueTooLarge('foobar', 3) def test_error_message_contains_value(self): self.assert_('foobar' in str(self.err)) def test_error_message_contains_max_size(self): self.assert_('3' in str(self.err)) class BTreeTests(unittest.TestCase): def setUp(self): # We use a small node size so that all code paths are traversed # during testing. Use coverage.py to make sure they do. 
self.codec = larch.NodeCodec(3) self.ns = DummyNodeStore( allow_writes=True, node_size=64, codec=self.codec) self.forest = DummyForest() self.tree = larch.BTree(self.forest, self.ns, None) self.dump = False def test_shadow_increments_childrens_refcounts(self): leaf = self.tree._new_leaf(['foo'], ['bar']) index = self.tree._new_index([leaf.first_key()], [leaf.id]) self.assertEqual(self.ns.get_refcount(leaf.id), 1) self.ns.set_refcount(index.id, 2) clone = self.tree._shadow(index) self.assertEqual(self.ns.get_refcount(leaf.id), 2) def test_shadow_returns_new_leaf_if_cannot_be_modified(self): node = self.tree._new_leaf(['foo'], ['bar']) self.tree._put_node(node) self.ns.set_refcount(node.id, 2) node2 = self.tree._shadow(node) self.assertNotEqual(node2.id, node.id) def test_shadow_returns_new_index_if_cannot_be_modified(self): node = self.tree._new_index(['foo'], [1]) self.tree._put_node(node) self.ns.set_refcount(node.id, 2) node2 = self.tree._shadow(node) self.assertNotEqual(node2.id, node.id) def test_shadow_returns_same_node_that_can_be_modified(self): node = self.tree._new_index(['foo'], [1]) self.tree._put_node(node) self.ns.set_refcount(node.id, 1) node2 = self.tree._shadow(node) self.assertEqual(node2.id, node.id) def test_new_leaf_does_not_put_node_into_store(self): leaf = self.tree._new_leaf([], []) self.assertRaises(larch.NodeMissing, self.tree._get_node, leaf.id) def test_new_index_does_not_put_node_into_store(self): index = self.tree._new_index([], []) self.assertRaises(larch.NodeMissing, self.tree._get_node, index.id) def test_new_index_increments_childrens_refcounts(self): leaf = self.tree._new_leaf([], []) self.tree._put_node(leaf) self.assertEqual(self.ns.get_refcount(leaf.id), 0) self.tree._new_index(['foo'], [leaf.id]) self.assertEqual(self.ns.get_refcount(leaf.id), 1) def test_insert_changes_root_id(self): self.tree.insert('foo', 'bar') self.assertNotEqual(self.tree.root.id, 0) def test_is_empty(self): self.assertEqual(self.tree.root, None) def 
test_lookup_for_missing_key_raises_error(self): self.assertRaises(KeyError, self.tree.lookup, 'foo') def test_lookup_with_wrong_size_key_raises_error(self): self.assertRaises(larch.KeySizeMismatch, self.tree.lookup, '') def test_insert_inserts_key(self): self.tree.insert('foo', 'bar') self.assertEqual(self.tree.lookup('foo'), 'bar') def test_insert_inserts_empty_value(self): self.tree.insert('foo', '') self.assertEqual(self.tree.lookup('foo'), '') def test_insert_replaces_value_for_existing_key(self): self.tree.insert('foo', 'foo') self.tree.insert('foo', 'bar') self.assertEqual(self.tree.lookup('foo'), 'bar') def test_insert_with_wrong_size_key_raises_error(self): self.assertRaises(larch.KeySizeMismatch, self.tree.insert, '', '') def test_insert_with_too_large_value_raises_error(self): self.assertRaises(larch.ValueTooLarge, self.tree.insert, 'xxx', 'x' * (self.ns.max_value_size + 1)) def test_remove_from_empty_tree_raises_keyerror(self): self.assertRaises(KeyError, self.tree.remove, 'foo') def test_remove_of_missing_key_raises_keyerror(self): self.tree.insert('bar', 'bar') self.assertRaises(KeyError, self.tree.remove, 'foo') def test_remove_removes_key(self): self.tree.insert('foo', 'bar') self.tree.remove('foo') self.assertRaises(KeyError, self.tree.lookup, 'foo') def get_roots_first_child(self): child_key = self.tree.root.first_key() child_id = self.tree.root[child_key] return self.ns.get_node(child_id) def test_remove_with_wrong_size_key_raises_error(self): self.assertRaises(larch.KeySizeMismatch, self.tree.remove, '') def keys_are_in_range(self, node, lower, upper, level=0): indent = 2 if isinstance(node, larch.LeafNode): if self.dump: print '%*sleaf keys %s' % (level*indent, '', node.keys()) for key in node.keys(): if key < lower or key >= upper: return False else: keys = node.keys() if self.dump: print '%*sin range; index keys = %s' % (level*indent, '', keys), 'lower..upper:', lower, upper if keys != sorted(keys): return False for i, key in enumerate(keys): 
if key < lower or key >= upper: return False if i+1 == len(keys): up = upper else: up = keys[i+1] if self.dump: print '%*sin child, keys should be in %s..%s' % (level*indent, '', key, up) if not self.keys_are_in_range(self.tree._get_node(node[key]), key, up, level+1): return False return True def find_largest_key(self, node): if isinstance(node, larch.LeafNode): return max(node.keys()) else: return max(node.keys() + [self.find_largest_key(self.tree._get_node(node[key])) for key in node.keys()]) def nextkey(self, key): assert type(key) == str if key == '': return '\0' if key[-1] == '\xff': return key + '\0' else: return key[:-1] + chr(ord(key[-1]) + 1) def proper_search_tree(self, node): if not node.keys(): return True minkey = node.keys()[0] maxkey = self.find_largest_key(node) if self.dump: print; print 'proper tree', minkey, self.nextkey(maxkey) return self.keys_are_in_range(node, minkey, self.nextkey(maxkey)) def test_insert_many_respects_ordering_requirement(self): ints = range(100) random.shuffle(ints) for i in ints: key = '%03d' % i value = key self.tree.insert(key, value) self.assertEqual(self.tree.lookup(key), value) self.assert_(self.proper_search_tree(self.tree.root), 'key#%d failed' % (1 + ints.index(i))) def test_remove_many_works(self): ints = range(100) random.shuffle(ints) for i in ints: key = '%03d' % i value = key self.tree.insert(key, value) self.assertEqual(self.tree.lookup(key), value) self.tree.remove(key) self.assertRaises(KeyError, self.tree.lookup, key) self.assert_(self.proper_search_tree(self.tree.root), msg='insert of %d in %s failed to keep tree ok' % (i, ints)) def test_reduce_height_makes_tree_lower(self): self.tree.insert('foo', 'bar') old_root = self.tree.root extra_root = self.tree._new_index([old_root.first_key()], [old_root.id]) self.tree._set_root(extra_root) # Fix old root's refcount, since it got incremented to 2. 
self.ns.set_refcount(old_root.id, 1) self.assertEqual(self.tree.root, extra_root) self.tree._reduce_height() self.assertEqual(self.tree.root, old_root) def test_reduce_height_does_not_lower_tree_when_children_are_shared(self): self.tree.insert('foo', 'bar') old_root = self.tree.root extra_root = self.tree._new_index([old_root.first_key()], [old_root.id]) self.tree._set_root(extra_root) # Make old root's refcount be 2, so it looks like it is shared # between trees. self.ns.set_refcount(old_root.id, 2) self.assertEqual(self.tree.root, extra_root) self.tree._reduce_height() self.assertEqual(self.tree.root, extra_root) def dump_tree(self, node, f=sys.stdout, level=0): if not self.dump: return indent = 4 if isinstance(node, larch.LeafNode): f.write('%*sLeaf:' % (level*indent, '')) for key in node.keys(): f.write(' %s=%s' % (key, node[key])) f.write('\n') else: assert isinstance(node, larch.IndexNode) f.write('%*sIndex:\n' % (level*indent, '')) for key in node.keys(): f.write('%*s%s:\n' % ((level+1)*indent, '', key)) self.dump_tree(self.tree._get_node(node[key]), level=level+2) def test_insert_many_remove_many_works(self): keys = ['%03d' % i for i in range(100)] random.shuffle(keys) for key in keys: self.tree.insert(key, key) self.assert_(self.proper_search_tree(self.tree.root)) if self.dump: print print self.dump_tree(self.tree.root) print for key in keys: if self.dump: print print 'removing', key self.dump_tree(self.tree.root) try: self.tree.remove(key) except: self.dump = True self.dump_tree(self.tree.root) ret = self.proper_search_tree(self.tree.root) print 'is it?', ret raise self.assert_(self.proper_search_tree(self.tree.root)) if self.dump: print print self.assertEqual(self.tree.root.keys(), []) def test_remove_merges_leaf_with_left_sibling(self): keys = ['%03d' % i for i in range(3)] for key in keys: self.tree.insert(key, 'x') self.assertEqual(self.tree.remove(keys[1]), None) def test_persists(self): self.tree.insert('foo', 'bar') tree2 = larch.BTree(self.forest, 
self.ns, self.tree.root.id) self.assertEqual(tree2.lookup('foo'), 'bar') def test_last_node_id_persists(self): self.tree.insert('foo', 'bar') # make tree has root node1 = self.tree._new_leaf([], []) tree2 = larch.BTree(self.forest, self.ns, self.tree.root.id) node2 = tree2._new_leaf([], []) self.assertEqual(node1.id + 1, node2.id) def test_lookup_range_returns_empty_list_if_nothing_found(self): self.assertEqual(list(self.tree.lookup_range('bar', 'foo')), []) def create_tree_for_range(self): for key in ['%03d' % i for i in range(2, 10, 2)]: self.tree.insert(key, key) def test_lookup_between_keys_raises_keyerror(self): self.create_tree_for_range() self.assertRaises(KeyError, self.tree.lookup, '000') def test_lookup_range_returns_empty_list_if_before_smallest_key(self): self.create_tree_for_range() self.assertEqual(list(self.tree.lookup_range('000', '001')), []) def test_lookup_range_returns_empty_list_if_after_largest_key(self): self.create_tree_for_range() self.assertEqual(list(self.tree.lookup_range('010', '999')), []) def test_lookup_range_returns_empty_list_if_between_keys(self): self.create_tree_for_range() self.assertEqual(list(self.tree.lookup_range('003', '003')), []) def test_lookup_range_returns_single_item_in_range(self): self.create_tree_for_range() self.assertEqual(list(self.tree.lookup_range('002', '002')), [('002', '002')]) def test_lookup_range_returns_single_item_in_range_exclusive(self): self.create_tree_for_range() self.assertEqual(list(self.tree.lookup_range('001', '003')), [('002', '002')]) def test_lookup_range_returns_two_items_in_range(self): self.create_tree_for_range() self.assertEqual(sorted(self.tree.lookup_range('002', '004')), [('002', '002'), ('004', '004')]) def test_lookup_range_returns_all_items_in_range(self): self.create_tree_for_range() self.assertEqual(sorted(self.tree.lookup_range('000', '999')), [('002', '002'), ('004', '004'), ('006', '006'), ('008', '008')]) def test_count_range_returns_zero_for_empty_tree(self): 
self.assertEqual(self.tree.count_range('000', '000'), 0) def test_count_range_returns_zero_for_empty_range_at_beginning(self): self.create_tree_for_range() self.assertEqual(self.tree.count_range('000', '000'), 0) def test_count_range_returns_zero_for_empty_range_in_middle(self): self.create_tree_for_range() self.assertEqual(self.tree.count_range('003', '003'), 0) def test_count_range_returns_zero_for_empty_range_at_end(self): self.create_tree_for_range() self.assertEqual(self.tree.count_range('009', '009'), 0) def test_count_range_returns_one_for_range_with_one_key(self): self.create_tree_for_range() self.assertEqual(self.tree.count_range('002', '002'), 1) def test_count_range_returns_one_for_range_with_one_key_part_2(self): self.create_tree_for_range() self.assertEqual(self.tree.count_range('001', '003'), 1) def test_count_range_returns_correct_result_for_longer_range(self): self.create_tree_for_range() self.assertEqual(self.tree.count_range('000', '009'), 4) def test_range_is_empty_returns_true_for_empty_tree(self): self.assertTrue(self.tree.range_is_empty('bar', 'foo')) def test_range_is_empty_works_for_nonempty_tree(self): self.create_tree_for_range() self.assertEqual(self.tree.range_is_empty('000', '000'), True) self.assertEqual(self.tree.range_is_empty('000', '001'), True) self.assertEqual(self.tree.range_is_empty('000', '002'), False) self.assertEqual(self.tree.range_is_empty('000', '003'), False) self.assertEqual(self.tree.range_is_empty('000', '004'), False) self.assertEqual(self.tree.range_is_empty('000', '005'), False) self.assertEqual(self.tree.range_is_empty('000', '006'), False) self.assertEqual(self.tree.range_is_empty('000', '007'), False) self.assertEqual(self.tree.range_is_empty('000', '008'), False) self.assertEqual(self.tree.range_is_empty('000', '009'), False) self.assertEqual(self.tree.range_is_empty('000', '999'), False) self.assertEqual(self.tree.range_is_empty('001', '001'), True) self.assertEqual(self.tree.range_is_empty('001', '002'), 
False) self.assertEqual(self.tree.range_is_empty('001', '003'), False) self.assertEqual(self.tree.range_is_empty('001', '004'), False) self.assertEqual(self.tree.range_is_empty('001', '005'), False) self.assertEqual(self.tree.range_is_empty('001', '006'), False) self.assertEqual(self.tree.range_is_empty('001', '007'), False) self.assertEqual(self.tree.range_is_empty('001', '008'), False) self.assertEqual(self.tree.range_is_empty('001', '009'), False) self.assertEqual(self.tree.range_is_empty('001', '999'), False) self.assertEqual(self.tree.range_is_empty('002', '002'), False) self.assertEqual(self.tree.range_is_empty('002', '003'), False) self.assertEqual(self.tree.range_is_empty('002', '004'), False) self.assertEqual(self.tree.range_is_empty('002', '005'), False) self.assertEqual(self.tree.range_is_empty('002', '006'), False) self.assertEqual(self.tree.range_is_empty('002', '007'), False) self.assertEqual(self.tree.range_is_empty('002', '008'), False) self.assertEqual(self.tree.range_is_empty('002', '009'), False) self.assertEqual(self.tree.range_is_empty('002', '999'), False) self.assertEqual(self.tree.range_is_empty('003', '003'), True) self.assertEqual(self.tree.range_is_empty('003', '004'), False) self.assertEqual(self.tree.range_is_empty('003', '005'), False) self.assertEqual(self.tree.range_is_empty('003', '006'), False) self.assertEqual(self.tree.range_is_empty('003', '007'), False) self.assertEqual(self.tree.range_is_empty('003', '008'), False) self.assertEqual(self.tree.range_is_empty('003', '009'), False) self.assertEqual(self.tree.range_is_empty('003', '999'), False) self.assertEqual(self.tree.range_is_empty('004', '004'), False) self.assertEqual(self.tree.range_is_empty('004', '005'), False) self.assertEqual(self.tree.range_is_empty('004', '006'), False) self.assertEqual(self.tree.range_is_empty('004', '007'), False) self.assertEqual(self.tree.range_is_empty('004', '008'), False) self.assertEqual(self.tree.range_is_empty('004', '009'), False) 
self.assertEqual(self.tree.range_is_empty('004', '999'), False) self.assertEqual(self.tree.range_is_empty('005', '005'), True) self.assertEqual(self.tree.range_is_empty('005', '006'), False) self.assertEqual(self.tree.range_is_empty('005', '007'), False) self.assertEqual(self.tree.range_is_empty('005', '008'), False) self.assertEqual(self.tree.range_is_empty('005', '009'), False) self.assertEqual(self.tree.range_is_empty('005', '999'), False) self.assertEqual(self.tree.range_is_empty('006', '006'), False) self.assertEqual(self.tree.range_is_empty('006', '007'), False) self.assertEqual(self.tree.range_is_empty('006', '008'), False) self.assertEqual(self.tree.range_is_empty('006', '009'), False) self.assertEqual(self.tree.range_is_empty('006', '999'), False) self.assertEqual(self.tree.range_is_empty('007', '007'), True) self.assertEqual(self.tree.range_is_empty('007', '008'), False) self.assertEqual(self.tree.range_is_empty('007', '009'), False) self.assertEqual(self.tree.range_is_empty('007', '999'), False) self.assertEqual(self.tree.range_is_empty('008', '008'), False) self.assertEqual(self.tree.range_is_empty('008', '009'), False) self.assertEqual(self.tree.range_is_empty('008', '999'), False) self.assertEqual(self.tree.range_is_empty('009', '009'), True) self.assertEqual(self.tree.range_is_empty('009', '999'), True) self.assertEqual(self.tree.range_is_empty('999', '999'), True) def test_remove_range_removes_everything(self): for key in ['%03d' % i for i in range(1000)]: self.tree.insert(key, key) self.tree.remove_range('000', '999') self.assertEqual(list(self.tree.lookup_range('000', '999')), []) def test_remove_range_removes_single_key_in_middle(self): self.create_tree_for_range() self.tree.remove_range('004', '004') self.assertEqual(list(self.tree.lookup_range('000', '999')), [('002', '002'), ('006', '006'), ('008', '008')]) def test_remove_range_removes_from_beginning_of_keys(self): self.create_tree_for_range() self.tree.remove_range('000', '004') 
self.assertEqual(list(self.tree.lookup_range('000', '999')), [('006', '006'), ('008', '008')]) def test_remove_range_removes_from_middle_of_keys(self): self.create_tree_for_range() self.tree.remove_range('003', '007') self.assertEqual(list(self.tree.lookup_range('000', '999')), [('002', '002'), ('008', '008')]) def test_remove_range_removes_from_end_of_keys(self): self.create_tree_for_range() self.tree.remove_range('007', '009') self.assertEqual(list(self.tree.lookup_range('000', '999')), [('002', '002'), ('004', '004'), ('006', '006')]) def test_remove_range_removes_from_empty_tree(self): self.create_tree_for_range() self.tree.remove_range('000', '999') self.tree.remove_range('007', '009') self.assertEqual(list(self.tree.lookup_range('000', '999')), []) def test_bug_remove_range_when_only_key_is_larger_than_maxkey(self): self.tree.insert('555', '555') self.tree.remove_range('000', '111') self.assertEqual(list(self.tree.lookup_range('000', '999')), [('555', '555')]) class BTreeDecrementTests(unittest.TestCase): def setUp(self): # We use a small node size so that all code paths are traversed # during testing. Use coverage.py to make sure they do. 
self.codec = larch.NodeCodec(3) self.ns = DummyNodeStore( allow_writes=True, node_size=64, codec=self.codec) self.forest = DummyForest() self.tree = larch.BTree(self.forest, self.ns, None) self.tree.insert('foo', 'bar') def test_store_has_two_nodes(self): self.assertEqual(len(self.ns.find_nodes()), 2) def test_initially_everything_has_refcount_1(self): for node_id in self.ns.find_nodes(): self.assertEqual(self.ns.get_refcount(node_id), 1) def test_decrement_removes_everything(self): self.tree._decrement(self.tree.root.id) self.assertEqual(len(self.ns.find_nodes()), 0) def test_decrement_does_not_remove_anything(self): self.ns.set_refcount(self.tree.root.id, 2) self.tree._decrement(self.tree.root.id) self.assertEqual(len(self.ns.find_nodes()), 2) class BTreeBalanceTests(unittest.TestCase): def setUp(self): ns = DummyNodeStore( allow_writes=True, node_size=4096, codec=larch.NodeCodec(2)) forest = DummyForest() self.tree = larch.BTree(forest, ns, None) self.keys = ['%02d' % i for i in range(10)] self.depth = None def leaves_at_same_depth(self, node, depth=0): if isinstance(node, larch.LeafNode): if self.depth is None: self.depth = depth return self.depth == depth else: assert isinstance(node, larch.IndexNode) for key in node: child = self.tree._get_node(node[key]) if not self.leaves_at_same_depth(child, depth + 1): return False return True def indexes_filled_right_amount(self, node, isroot=True): if isinstance(node, larch.IndexNode): if not isroot: if len(node) < self.fanout or len(node) > 2 * self.fanout + 1: return False for key in node: child = self.tree._get_node(node[key]) ok = self.indexes_filled_right_amount(child, isroot=False) if not ok: return False return True def test_insert_puts_every_leaf_at_same_depth(self): for key in self.keys: self.tree.insert(key, key) self.depth = None self.assert_(self.leaves_at_same_depth(self.tree.root), 'key#%d failed' % (self.keys.index(key) + 1)) def test_insert_fills_every_index_node_the_right_amount(self): 
        self.assert_(self.indexes_filled_right_amount(self.tree.root))
        for key in self.keys:
            self.tree.insert(key, key)
            self.assert_(self.indexes_filled_right_amount(self.tree.root))

    def test_remove_keeps_every_leaf_at_same_depth(self):
        for key in self.keys:
            self.tree.insert(key, key)
        for key in self.keys:
            self.tree.remove(key)
            self.assert_(self.leaves_at_same_depth(self.tree.root))

larch-1.20131130/larch/uploadqueue.py

# Copyright 2010  Lars Wirzenius
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

import logging
import os
import StringIO
import struct
import tempfile

import larch


class UploadQueue(object):

    '''Queue of objects waiting to be uploaded to the store.

    We don't upload nodes directly, because it frequently happens that
    a node gets modified or deleted soon after it is created. It makes
    sense to wait a bit so we can avoid the costly upload operation.

    This class holds the nodes in a queue, and uploads them if they
    get pushed out of the queue.

    ``really_put`` is the function to call to really upload a node.
    ``max_length`` is the maximum number of nodes to keep in the queue.

    '''

    def __init__(self, really_put, max_length):
        self.really_put = really_put
        self._max_length = max_length
        self._create_lru()

    def _create_lru(self):
        self.lru = larch.LRUCache(
            self._max_length, forget_hook=self._push_oldest)

    def put(self, node):
        '''Put a node into the queue.'''
        self.lru.add(node.id, node)

    def _push_oldest(self, node_id, node):
        self.really_put(node)

    def push(self):
        '''Upload all nodes in the queue.'''
        while len(self.lru) > 0:
            node_id, node = self.lru.remove_oldest()
            self.really_put(node)
        self.lru.log_stats()
        self._create_lru()

    def remove(self, node_id):
        '''Remove a node from the queue given its id.'''
        return self.lru.remove(node_id)

    def list_ids(self):
        '''List identifiers of all nodes in the queue.'''
        return self.lru.keys()

    def get(self, node_id):
        '''Get a node given its id.'''
        return self.lru.get(node_id)

larch-1.20131130/larch/uploadqueue_tests.py

# Copyright 2010  Lars Wirzenius
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

import os
import shutil
import tempfile
import unittest

import larch


class UploadQueueTests(unittest.TestCase):

    def setUp(self):
        self.max_queue = 2
        self.nodes = []
        self.uq = larch.UploadQueue(self.really_put, self.max_queue)
        self.node = larch.LeafNode(1, [], [])

    def really_put(self, node):
        self.nodes.append(node)

    def test_has_no_nodes_initially(self):
        self.assertEqual(self.uq.list_ids(), [])

    def test_get_returns_None_for_nonexistent_node(self):
        self.assertEqual(self.uq.get(self.node.id), None)

    def test_puts_node(self):
        self.uq.put(self.node)
        self.assertEqual(self.uq.list_ids(), [self.node.id])
        self.assertEqual(self.uq.get(self.node.id), self.node)

    def test_put_replaces_existing_node(self):
        node2 = larch.LeafNode(1, ['foo'], ['bar'])
        self.uq.put(self.node)
        self.uq.put(node2)
        self.assertEqual(self.uq.get(self.node.id), node2)

    def test_remove_returns_false_for_nonexistent_node(self):
        self.assertEqual(self.uq.remove(self.node.id), False)

    def test_remove_removes_node(self):
        self.uq.put(self.node)
        self.uq.remove(self.node.id)
        self.assertEqual(self.uq.list_ids(), [])
        self.assertEqual(self.uq.get(self.node.id), None)

    def test_does_not_push_first_node(self):
        self.uq.put(self.node)
        self.assertEqual(self.nodes, [])

    def test_does_not_push_second_node(self):
        self.uq.put(self.node)
        self.uq.put(larch.LeafNode(2, [], []))
        self.assertEqual(self.nodes, [])

    def test_pushes_first_node_after_third_is_pushed(self):
        self.uq.put(self.node)
        self.uq.put(larch.LeafNode(2, [], []))
        self.uq.put(larch.LeafNode(3, [], []))
        self.assertEqual(self.nodes, [self.node])

    def test_pushes_oldest_even_if_recently_used(self):
        self.uq.put(self.node)
        self.uq.put(larch.LeafNode(2, [], []))
        self.uq.get(self.node.id)
        self.uq.put(larch.LeafNode(3, [], []))
        self.assertEqual(self.nodes, [self.node])

    def test_pushes_out_only_node_when_requested(self):
        self.uq.put(self.node)
        self.uq.push()
        self.assertEqual(self.nodes, [self.node])

larch-1.20131130/refcount-speed
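The behaviour that `UploadQueue`'s docstring describes — hold nodes in an LRU, and really upload only what falls off the end of the queue — can be sketched without larch itself. This stand-in (`TinyUploadQueue` is an assumed name) replaces `larch.LRUCache` with a plain `collections.OrderedDict`; it illustrates the eviction idea only, not the real class's API.

```python
from collections import OrderedDict


class TinyUploadQueue(object):
    # Minimal stand-in for larch.UploadQueue: nodes sit in an LRU
    # and are really uploaded only when pushed out of the queue.
    def __init__(self, really_put, max_length):
        self.really_put = really_put
        self.max_length = max_length
        self.lru = OrderedDict()  # node_id -> node, oldest first

    def put(self, node_id, node):
        self.lru.pop(node_id, None)   # re-putting refreshes position
        self.lru[node_id] = node
        if len(self.lru) > self.max_length:
            # Evict the oldest node; only now does the upload happen.
            old_id, old_node = self.lru.popitem(last=False)
            self.really_put(old_node)


uploaded = []
q = TinyUploadQueue(uploaded.append, max_length=2)
q.put(1, 'node-1')
q.put(2, 'node-2')
print(uploaded)     # -> [] (nothing uploaded yet)
q.put(3, 'node-3')  # queue overflows, oldest node is uploaded
print(uploaded)     # -> ['node-1']
```

This mirrors the tests above: the first two puts upload nothing, and the third pushes the oldest node out.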
#!/usr/bin/python
# Copyright 2011  Lars Wirzenius
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

# Exercise the refcount encoding and decoding code, for simple
# benchmarking purposes.
#
# The number of refcounts to test with is set with --refs, and the
# number of repetitions of each operation with --times. The benchmark
# measures the speed of encoding and of decoding the refcounts.

import cliapp
import cProfile
import csv
import gc
import logging
import os
import random
import shutil
import subprocess
import sys
import time
import tracing

import larch


class RefcountSpeedTest(cliapp.Application):

    def add_settings(self):
        self.settings.boolean(['profile'], 'profile with cProfile?')
        self.settings.boolean(['log-memory-use'], 'log VmRSS?')
        self.settings.string(['trace'],
            'code module in which to do trace logging')
        self.settings.integer(['refs'],
            'how many refs to test with (default is %default)',
            default=2**15)
        self.settings.integer(['times'],
            'how many times to test each op (default is %default)',
            default=1000)

    def process_args(self, args):
        if self.settings['trace']:
            tracing.trace_add_pattern(self.settings['trace'])

        n = self.settings['refs']
        refcounts = {}
        for i in xrange(n):
            refcounts[i] = i

        # Helper functions.
        nop = lambda *args: None

        # Calibrate.
        looptime = self.measure(nop, 'calibrate')

        num_refcounts = len(refcounts)
        keys = refcounts.keys()
        encode = self.measure(
            lambda: larch.refcountstore.encode_refcounts(
                refcounts, 0, num_refcounts, keys),
            'encode')
        encoded = larch.refcountstore.encode_refcounts(
            refcounts, 0, num_refcounts, keys)
        decode = self.measure(
            lambda: larch.refcountstore.decode_refcounts(encoded),
            'decode')

        # Report
        def speed(result):
            return n / (result - looptime)

        def report(label, result):
            print '%-12s: %5.3f s (%8.1f/s)' % \
                (label, result, speed(result))

        print 'refs: %d' % self.settings['refs']
        print 'times: %d' % self.settings['times']
        report('encode', encode)
        report('decode', decode)

    def measure(self, func, profname):

        def log_memory_use(stage):
            if self.settings['log-memory-use']:
                logging.info('%s memory use: %s' % (profname, stage))
                logging.info('  VmRSS: %s KiB' % self.vmrss())
                logging.info('  # objects: %d' % len(gc.get_objects()))
                logging.info('  # garbage: %d' % len(gc.garbage))

        def helper():
            n = self.settings['times']
            log_memory_use('at start')
            finished = False
            while not finished:
                for i in xrange(n):
                    func()
                if time.clock() > start:
                    # At least one time unit passed - this is enough.
                    finished = True
                else:
                    # Not a single time unit passed: we need more
                    # iterations. Multiply 'times' by 10 and execute
                    # the remaining 9 loops.
                    self.settings['times'] *= 10
                    n *= 9
            log_memory_use('after calls')

        print 'measuring', profname
        start = time.clock()
        if self.settings['profile']:
            globaldict = globals().copy()
            localdict = locals().copy()
            cProfile.runctx('helper()', globaldict, localdict,
                            '%s.prof' % profname)
        else:
            helper()
        end = time.clock()

        return end - start

    def vmrss(self):
        f = open('/proc/self/status')
        rss = 0
        for line in f:
            if line.startswith('VmRSS'):
                rss = line.split()[1]
        f.close()
        return rss

    def format(self, value):
        return '%.0f' % value


if __name__ == '__main__':
    RefcountSpeedTest().run()

larch-1.20131130/setup.py

# Copyright 2010, 2011, 2012  Lars Wirzenius
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

from distutils.core import setup

import larch

setup(
    name='larch',
    version=larch.__version__,
    description='copy-on-write B-tree data structure',
    long_description='''\
An implementation of a particular kind of B-tree, based on research by
Ohad Rodeh. This is the same data structure that btrfs uses, but in a
new, pure-Python implementation.

The distinctive feature of this B-tree is that a node is never
(conceptually) modified. Instead, all updates are done by
copy-on-write. This makes it easy to clone a tree, and modify only the
clone, while other processes access the original tree.

The implementation is generic and flexible, so that you may use it in
a variety of situations.
For example, the tree itself does not decide where its nodes are
stored: you provide a class that does that for it. The library
contains two implementations, one for in-memory and one for on-disk
storage.
''',
    classifiers=[
        'Development Status :: 4 - Beta',
        'Intended Audience :: Developers',
        'License :: OSI Approved :: GNU General Public License (GPL)',
        'Operating System :: OS Independent',
        'Programming Language :: Python :: 2',
        'Topic :: Software Development :: Libraries',
    ],
    author='Lars Wirzenius',
    author_email='liw@liw.fi',
    url='http://liw.fi/larch/',
    packages=['larch'],
    scripts=['fsck-larch'],
)

larch-1.20131130/speed-test

#!/usr/bin/python
# Copyright 2010, 2011  Lars Wirzenius
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

# Exercise my B-tree implementation, for simple benchmarking purposes.
# The benchmark gets a location and a number of keys to use as command
# line arguments --location=LOCATION and --keys=KEYS.
#
# To debug, one can create a tracing logfile by adding arguments like:
# --trace=refcount --log=refcount.logfile
#
# If the location is the empty string, an in-memory node store is used.
# Otherwise it must be a non-existent directory name.
#
# The benchmark will do the given number of insertions into the tree, and
# measure the speed of that. Then it will look up each of those, and
# measure the lookups.
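The calibrate-then-measure pattern the benchmark comments describe can be sketched on its own: time an empty loop over the same items first, then subtract that loop overhead from the real measurement. The names `measure` and `nop` echo the script, but the workload here is made up for illustration.

```python
import time


def measure(items, func):
    # Time func over every item; unlike the script's measure(),
    # this skips profiling, memory logging and the finalize step.
    start = time.time()
    for item in items:
        func(item)
    return time.time() - start


items = range(100000)
nop = lambda item: None            # no-op helper, as in the script

looptime = measure(items, nop)     # calibrate: cost of the bare loop
elapsed = measure(items, lambda i: i * i)  # a stand-in workload

net = elapsed - looptime
# Guard against a machine so fast that the difference is zero or
# negative, as the script's speed() does with float("infinity").
ops_per_sec = float('inf') if net <= 0 else len(items) / net
```

Subtracting `looptime` keeps the per-operation rate from being skewed by interpreter loop overhead, which matters when the operation itself is cheap.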
import cliapp
import cProfile
import csv
import gc
import logging
import os
import random
import shutil
import subprocess
import sys
import time
import tracing

import larch


class SpeedTest(cliapp.Application):

    def add_settings(self):
        self.settings.boolean(['profile'], 'profile with cProfile?')
        self.settings.boolean(['log-memory-use'], 'log VmRSS?')
        self.settings.string(['trace'],
            'code module in which to do trace logging')
        self.settings.integer(['keys'],
            'how many keys to test with (default is %default)',
            default=1000)
        self.settings.string(['location'],
            'where to store B-tree on disk (in-memory test if not set)')
        self.settings.string(['csv'],
            'append a CSV row to FILE', metavar='FILE')

    def process_args(self, args):
        if self.settings['trace']:
            tracing.trace_add_pattern(self.settings['trace'])

        key_size = 19
        value_size = 128
        node_size = 64*1024

        n = self.settings['keys']
        location = self.settings['location']

        if n is None:
            raise Exception('You must set number of keys with --keys')

        if not location:
            forest = larch.open_forest(
                allow_writes=True, key_size=key_size, node_size=node_size,
                node_store=larch.NodeStoreMemory)
        else:
            if os.path.exists(location):
                raise Exception('%s exists already' % location)
            os.mkdir(location)
            forest = larch.open_forest(
                allow_writes=True, key_size=key_size, node_size=node_size,
                dirname=location)

        tree = forest.new_tree()

        # Create list of keys.
        keys = ['%0*d' % (key_size, i) for i in xrange(n)]

        ranges = []
        range_len = 10
        for i in range(0, len(keys) - range_len):
            ranges.append((keys[i], keys[i+range_len-1]))

        # Helper functions.
        nop = lambda *args: None

        # Calibrate.
        looptime = self.measure(keys, nop, nop, 'calibrate')

        # Measure inserts.
        random.shuffle(keys)
        value = 'x' * value_size
        insert = self.measure(keys, lambda key: tree.insert(key, value),
                              forest.commit, 'insert')

        # Measure lookups.
        random.shuffle(keys)
        lookup = self.measure(keys, tree.lookup, nop, 'lookup')

        # Measure range lookups.
        random.shuffle(ranges)
        lookup_range = self.measure(ranges,
            lambda x: list(tree.lookup_range(x[0], x[1])),
            nop, 'lookup_range')

        # Measure count of range lookup results.
        len_lookup_range = self.measure(ranges,
            lambda x: len(list(tree.lookup_range(x[0], x[1]))),
            nop, 'len_lookup_range')

        # Measure count range.
        count_range = self.measure(ranges,
            lambda x: tree.count_range(x[0], x[1]),
            nop, 'count_range')

        # Measure inserts into existing tree.
        random.shuffle(keys)
        insert2 = self.measure(keys, lambda key: tree.insert(key, value),
                               forest.commit, 'insert2')

        # Measure removes from tree.
        random.shuffle(keys)
        remove = self.measure(keys, tree.remove, forest.commit, 'remove')

        # Measure remove_range. This requires building a new tree.
        keys.sort()
        for key in keys:
            tree.insert(key, value)
        random.shuffle(ranges)
        remove_range = self.measure(ranges,
            lambda x: tree.remove_range(x[0], x[1]),
            forest.commit, 'remove_range')

        # Report
        def speed(result, i):
            if result[i] == looptime[i]:
                # Computer too fast for the number of "keys" used...
                return float("infinity")
            else:
                return n / (result[i] - looptime[i])

        def report(label, result):
            cpu, wall = result
            print '%-16s: %5.3f s (%8.1f/s) CPU; %5.3f s (%8.1f/s) wall' % \
                (label, cpu, speed(result, 0), wall, speed(result, 1))

        print 'location:', location if location else 'memory'
        print 'num_operations: %d' % n
        report('insert', insert)
        report('lookup', lookup)
        report('lookup_range', lookup_range)
        report('len_lookup_range', len_lookup_range)
        report('count_range', count_range)
        report('insert2', insert2)
        report('remove', remove)
        report('remove_range', remove_range)

        if self.settings['profile']:
            print 'View *.prof with ./viewprof for profiling results.'

        if self.settings['csv']:
            self.append_csv(n,
                            speed(insert, 0),
                            speed(insert2, 0),
                            speed(lookup, 0),
                            speed(lookup_range, 0),
                            speed(remove, 0),
                            speed(remove_range, 0))

        # Clean up.
        if location:
            shutil.rmtree(location)

    def measure(self, items, func, finalize, profname):

        def log_memory_use(stage):
            if self.settings['log-memory-use']:
                logging.info('%s memory use: %s' % (profname, stage))
                logging.info('  VmRSS: %s KiB' % self.vmrss())
                logging.info('  # objects: %d' % len(gc.get_objects()))
                logging.info('  # garbage: %d' % len(gc.garbage))

        def helper():
            log_memory_use('at start')
            for item in items:
                func(item)
            log_memory_use('after calls')
            finalize()
            log_memory_use('after finalize')

        print 'measuring', profname
        start_time = time.time()
        start = time.clock()
        if self.settings['profile']:
            globaldict = globals().copy()
            localdict = locals().copy()
            cProfile.runctx('helper()', globaldict, localdict,
                            '%s.prof' % profname)
        else:
            helper()
        end = time.clock()
        end_time = time.time()

        return end - start, end_time - start_time

    def vmrss(self):
        f = open('/proc/self/status')
        rss = 0
        for line in f:
            if line.startswith('VmRSS'):
                rss = line.split()[1]
        f.close()
        return rss

    def append_csv(self, keys, insert, insert2, lookup, lookup_range,
                   remove, remove_range):
        write_title = not os.path.exists(self.settings['csv'])
        f = open(self.settings['csv'], 'a')
        self.writer = csv.writer(f, lineterminator='\n')
        if write_title:
            self.writer.writerow(('revno', 'keys', 'insert (random)',
                                  'insert (seq)', 'lookup', 'lookup_range',
                                  'remove', 'remove_range'))
        if os.path.exists('.bzr'):
            p = subprocess.Popen(['bzr', 'revno'], stdout=subprocess.PIPE)
            out, err = p.communicate()
            if p.returncode != 0:
                raise cliapp.AppException('bzr failed')
            revno = out.strip()
        else:
            revno = '?'
        self.writer.writerow((revno,
                              keys,
                              self.format(insert),
                              self.format(insert2),
                              self.format(lookup),
                              self.format(lookup_range),
                              self.format(remove),
                              self.format(remove_range)))
        f.close()

    def format(self, value):
        return '%.0f' % value


if __name__ == '__main__':
    SpeedTest().run()

larch-1.20131130/test-backwards-compatibility

#!/usr/bin/python
# Copyright 2012  Lars Wirzenius
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

'''Test backwards compatibility of an on-disk B-tree.

This program tests that a Larch on-disk B-tree is backwards compatible
with previous versions, at least to the extent that it can be read
from. This program operates in one of two modes:

* it can generate a new B-tree to be stored as test data for the future
* it can read an existing tree and verify that it can read it right

The generated B-tree is actually a forest, and contains four trees.
The first tree has the following keys:

* key size is 4 bytes
* keys are 0, 1, 2, ..., 1023, converted into binary strings with struct
* values are 0, 1, 2, ..., 1023, converted into text strings with '%d' % i
* node size is 128 bytes

The second tree is a clone of the first one, but with all odd-numbered
keys removed.

The third tree is a clone of the second one, but with all odd-numbered
keys and values added back.

The fourth tree is a clone of the third one, but with all even-numbered
keys removed.
'''


import cliapp
import os
import shutil
import struct
import tarfile
import tempfile

import larch


class BackwardsCompatibilityTester(cliapp.Application):

    key_size = 4
    node_size = 128
    num_keys = 1024

    keys1 = range(num_keys)
    remove2 = range(1, num_keys, 2)
    keys2 = [i for i in keys1 if i not in remove2]
    keys3 = keys2
    remove4 = range(0, num_keys, 2)
    keys4 = [i for i in keys3 if i not in remove4]

    def setup(self):
        self.dirname = tempfile.mkdtemp()

    def teardown(self):
        shutil.rmtree(self.dirname)

    def key(self, i):
        return struct.pack('!L', i)

    def value(self, i):
        return '%d' % i

    def cmd_generate(self, args):
        '''Generate a Larch B-tree forest'''

        forest = larch.open_forest(key_size=self.key_size,
                                   node_size=self.node_size,
                                   dirname=self.dirname,
                                   allow_writes=True)

        # First tree.
        t = forest.new_tree()
        for i in self.keys1:
            t.insert(self.key(i), self.value(i))

        # Second tree.
        t = forest.new_tree(t)
        for i in self.remove2:
            t.remove(self.key(i))

        # Third tree.
        t = forest.new_tree(t)
        for i in self.keys3:
            t.insert(self.key(i), self.value(i))

        # Fourth tree.
        t = forest.new_tree(t)
        for i in self.remove4:
            t.remove(self.key(i))

        # Commit and make into a tarball.
        forest.commit()
        tf = tarfile.open(fileobj=self.output, mode='w:gz')
        tf.add(self.dirname, arcname='.')
        tf.close()

    def cmd_verify(self, args):
        forest_dirname = os.path.join(self.dirname, 'forest')

        for filename in args:
            os.mkdir(forest_dirname)
            tf = tarfile.open(filename)
            tf.extractall(path=forest_dirname)
            tf.close()

            forest = larch.open_forest(dirname=forest_dirname,
                                       allow_writes=False)
            if len(forest.trees) != 4:
                raise cliapp.AppException('Need 4 trees, not %d' %
                                          len(forest.trees))
            self.verify_tree(forest.trees[0], self.keys1)
            self.verify_tree(forest.trees[1], self.keys2)
            self.verify_tree(forest.trees[2], self.keys3)
            self.verify_tree(forest.trees[3], self.keys4)

            shutil.rmtree(forest_dirname)
            self.output.write('%s is OK\n' % filename)

    def verify_tree(self, tree, keys):
        minkey = self.key(0)
        maxkey = self.key(2**(8*self.key_size) - 1)

        i = 0
        for key, value in tree.lookup_range(minkey, maxkey):
            if key != self.key(keys[i]):
                raise cliapp.AppException('Wanted key %s, got %s' %
                                          (keys[i], repr(key)))
            if value != self.value(keys[i]):
                raise cliapp.AppException('Wanted value %s, got %s' %
                                          (keys[i], repr(value)))
            i += 1


BackwardsCompatibilityTester().run()

larch-1.20131130/test-data/
larch-1.20131130/test-data/format-nodestore1-codec1.tar.gz
[binary tar.gz data omitted]
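A note on the key encoding used by the tester above: `struct.pack('!L', i)` produces fixed-width 4-byte big-endian strings (matching `key_size = 4`), and big-endian packing preserves integer order under byte-wise comparison, which is what `verify_tree`'s `lookup_range` over packed keys relies on. A small sketch (under Python 3, where the result is `bytes`):

```python
import struct


def key(i):
    # Same packing as BackwardsCompatibilityTester.key: a 4-byte
    # big-endian unsigned integer.
    return struct.pack('!L', i)


# Fixed width plus big-endian byte order means byte comparison of
# packed keys agrees with numeric comparison of the integers, so a
# range scan over packed keys visits them in numeric order.
assert key(1) < key(2) < key(256) < key(2**32 - 1)
assert len(key(0)) == 4
```

Little-endian packing (`'<L'`) would break this property: for example the bytes for 256 would sort before those for 1.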
e6^_5WWϬ+EX{2H[N\Frܤr# $XvKJ[2Z!3^N迚j5?WH1b p08d.x0FVH/ {[?z,jZS Fۢ|/w#ǹ_ٟH?}Z\~SѿU\[H//C8_Yc`C2F }TM9oI~>N"*Hg/rSWp^Jj-5hOάa;? [l[6`[gm_,aκ8 Dz?r Uު`yߎ5w/R+Ujp \7s9ᝬDq0tBM9WKS ^Iyf"}X߱:Q!Pp8GY~,vLWw=?;RDVU(t\?˧JF{OtZϢ\Z/`5u\u"`98|]by{n_r'^%Ut/p}/E֗RZ7嬲5otӽX_oDe4\O bME=vq0 s{cVJl_XϭSc@}8Hu7p>y. mxHa0V/W'?ޞ_rQXnMAs_;׾^ju_5@FGq/0yg%Sܪ?rsʍ&`t.-sGy%84^+en~A#p%qCdy%BEդ]_}_.bwd}=:qG?sbT3^׀k gC|lp>*x. set -eu for tarball in test-data/*.tar.gz do ./test-backwards-compatibility verify "$tarball" done larch-1.20131130/tests/backwards-compatible.stdout0000644000175000017500000000006012246332521021542 0ustar jenkinsjenkinstest-data/format-nodestore1-codec1.tar.gz is OK larch-1.20131130/without-tests0000644000175000017500000000013712246332521015651 0ustar jenkinsjenkins./larch/__init__.py ./larch/nodestore.py ./setup.py ./example.py ./doc/conf.py ./larch/fsck.py