grokmirror-0.3.5 (git commit 10792b75a3b6788926bcde1762e2fc8dd82af446)

grokmirror-0.3.5/.gitignore
*.pyc
*.swp
*.pdf
*~

grokmirror-0.3.5/COPYING
GNU GENERAL PUBLIC LICENSE Version 3, 29 June 2007 Copyright (C) 2007 Free Software Foundation, Inc. Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The GNU General Public License is a free, copyleft license for software and other kinds of works. The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change all versions of a program--to make sure it remains free software for all its users. We, the Free Software Foundation, use the GNU General Public License for most of our software; it applies also to any other work released this way by its authors. You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for them if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs, and that you know you can do these things. To protect your rights, we need to prevent others from denying you these rights or asking you to surrender the rights.
Therefore, you have certain responsibilities if you distribute copies of the software, or if you modify it: responsibilities to respect the freedom of others. For example, if you distribute copies of such a program, whether gratis or for a fee, you must pass on to the recipients the same freedoms that you received. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. Developers that use the GNU GPL protect your rights with two steps: (1) assert copyright on the software, and (2) offer you this License giving you legal permission to copy, distribute and/or modify it. For the developers' and authors' protection, the GPL clearly explains that there is no warranty for this free software. For both users' and authors' sake, the GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions. Some devices are designed to deny users access to install or run modified versions of the software inside them, although the manufacturer can do so. This is fundamentally incompatible with the aim of protecting users' freedom to change the software. The systematic pattern of such abuse occurs in the area of products for individuals to use, which is precisely where it is most unacceptable. Therefore, we have designed this version of the GPL to prohibit the practice for those products. If such problems arise substantially in other domains, we stand ready to extend this provision to those domains in future versions of the GPL, as needed to protect the freedom of users. Finally, every program is threatened constantly by software patents. States should not allow patents to restrict development and use of software on general-purpose computers, but in those that do, we wish to avoid the special danger that patents applied to a free program could make it effectively proprietary. 
To prevent this, the GPL assures that patents cannot be used to render the program non-free. The precise terms and conditions for copying, distribution and modification follow. TERMS AND CONDITIONS 0. Definitions. "This License" refers to version 3 of the GNU General Public License. "Copyright" also means copyright-like laws that apply to other kinds of works, such as semiconductor masks. "The Program" refers to any copyrightable work licensed under this License. Each licensee is addressed as "you". "Licensees" and "recipients" may be individuals or organizations. To "modify" a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. The resulting work is called a "modified version" of the earlier work or a work "based on" the earlier work. A "covered work" means either the unmodified Program or a work based on the Program. To "propagate" a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well. To "convey" a work means any kind of propagation that enables other parties to make or receive copies. Mere interaction with a user through a computer network, with no transfer of a copy, is not conveying. An interactive user interface displays "Appropriate Legal Notices" to the extent that it includes a convenient and prominently visible feature that (1) displays an appropriate copyright notice, and (2) tells the user that there is no warranty for the work (except to the extent that warranties are provided), that licensees may convey the work under this License, and how to view a copy of this License. 
If the interface presents a list of user commands or options, such as a menu, a prominent item in the list meets this criterion. 1. Source Code. The "source code" for a work means the preferred form of the work for making modifications to it. "Object code" means any non-source form of a work. A "Standard Interface" means an interface that either is an official standard defined by a recognized standards body, or, in the case of interfaces specified for a particular programming language, one that is widely used among developers working in that language. The "System Libraries" of an executable work include anything, other than the work as a whole, that (a) is included in the normal form of packaging a Major Component, but which is not part of that Major Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. A "Major Component", in this context, means a major essential component (kernel, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it. The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities. However, it does not include the work's System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work. 
For example, Corresponding Source includes interface definition files associated with source files for the work, and the source code for shared libraries and dynamically linked subprograms that the work is specifically designed to require, such as by intimate data communication or control flow between those subprograms and other parts of the work. The Corresponding Source need not include anything that users can regenerate automatically from other parts of the Corresponding Source. The Corresponding Source for a work in source code form is that same work. 2. Basic Permissions. All rights granted under this License are granted for the term of copyright on the Program, and are irrevocable provided the stated conditions are met. This License explicitly affirms your unlimited permission to run the unmodified Program. The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work. This License acknowledges your rights of fair use or other equivalent, as provided by copyright law. You may make, run and propagate covered works that you do not convey, without conditions so long as your license otherwise remains in force. You may convey covered works to others for the sole purpose of having them make modifications exclusively for you, or provide you with facilities for running those works, provided that you comply with the terms of this License in conveying all material for which you do not control copyright. Those thus making or running the covered works for you must do so exclusively on your behalf, under your direction and control, on terms that prohibit them from making any copies of your copyrighted material outside their relationship with you. Conveying under any other circumstances is permitted solely under the conditions stated below. Sublicensing is not allowed; section 10 makes it unnecessary. 3. Protecting Users' Legal Rights From Anti-Circumvention Law. 
No covered work shall be deemed part of an effective technological measure under any applicable law fulfilling obligations under article 11 of the WIPO copyright treaty adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention of such measures. When you convey a covered work, you waive any legal power to forbid circumvention of technological measures to the extent such circumvention is effected by exercising rights under this License with respect to the covered work, and you disclaim any intention to limit operation or modification of the work as a means of enforcing, against the work's users, your or third parties' legal rights to forbid circumvention of technological measures. 4. Conveying Verbatim Copies. You may convey verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice; keep intact all notices stating that this License and any non-permissive terms added in accord with section 7 apply to the code; keep intact all notices of the absence of any warranty; and give all recipients a copy of this License along with the Program. You may charge any price or no price for each copy that you convey, and you may offer support or warranty protection for a fee. 5. Conveying Modified Source Versions. You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions: a) The work must carry prominent notices stating that you modified it, and giving a relevant date. b) The work must carry prominent notices stating that it is released under this License and any conditions added under section 7. This requirement modifies the requirement in section 4 to "keep intact all notices". 
c) You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it. d) If the work has interactive user interfaces, each must display Appropriate Legal Notices; however, if the Program has interactive interfaces that do not display Appropriate Legal Notices, your work need not make them do so. A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an "aggregate" if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate. 6. Conveying Non-Source Forms. You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways: a) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange. 
b) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge. c) Convey individual copies of the object code with a copy of the written offer to provide the Corresponding Source. This alternative is allowed only occasionally and noncommercially, and only if you received the object code with such an offer, in accord with subsection 6b. d) Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements. e) Convey the object code using peer-to-peer transmission, provided you inform other peers where the object code and Corresponding Source of the work are being offered to the general public at no charge under subsection 6d. 
A separable portion of the object code, whose source code is excluded from the Corresponding Source as a System Library, need not be included in conveying the object code work. A "User Product" is either (1) a "consumer product", which means any tangible personal property which is normally used for personal, family, or household purposes, or (2) anything designed or sold for incorporation into a dwelling. In determining whether a product is a consumer product, doubtful cases shall be resolved in favor of coverage. For a particular product received by a particular user, "normally used" refers to a typical or common use of that class of product, regardless of the status of the particular user or of the way in which the particular user actually uses, or expects or is expected to use, the product. A product is a consumer product regardless of whether the product has substantial commercial, industrial or non-consumer uses, unless such uses represent the only significant mode of use of the product. "Installation Information" for a User Product means any methods, procedures, authorization keys, or other information required to install and execute modified versions of a covered work in that User Product from a modified version of its Corresponding Source. The information must suffice to ensure that the continued functioning of the modified object code is in no case prevented or interfered with solely because modification has been made. If you convey an object code work under this section in, or with, or specifically for use in, a User Product, and the conveying occurs as part of a transaction in which the right of possession and use of the User Product is transferred to the recipient in perpetuity or for a fixed term (regardless of how the transaction is characterized), the Corresponding Source conveyed under this section must be accompanied by the Installation Information. 
But this requirement does not apply if neither you nor any third party retains the ability to install modified object code on the User Product (for example, the work has been installed in ROM). The requirement to provide Installation Information does not include a requirement to continue to provide support service, warranty, or updates for a work that has been modified or installed by the recipient, or for the User Product in which it has been modified or installed. Access to a network may be denied when the modification itself materially and adversely affects the operation of the network or violates the rules and protocols for communication across the network. Corresponding Source conveyed, and Installation Information provided, in accord with this section must be in a format that is publicly documented (and with an implementation available to the public in source code form), and must require no special password or key for unpacking, reading or copying. 7. Additional Terms. "Additional permissions" are terms that supplement the terms of this License by making exceptions from one or more of its conditions. Additional permissions that are applicable to the entire Program shall be treated as though they were included in this License, to the extent that they are valid under applicable law. If additional permissions apply only to part of the Program, that part may be used separately under those permissions, but the entire Program remains governed by this License without regard to the additional permissions. When you convey a copy of a covered work, you may at your option remove any additional permissions from that copy, or from any part of it. (Additional permissions may be written to require their own removal in certain cases when you modify the work.) You may place additional permissions on material, added by you to a covered work, for which you have or can give appropriate copyright permission. 
Notwithstanding any other provision of this License, for material you add to a covered work, you may (if authorized by the copyright holders of that material) supplement the terms of this License with terms: a) Disclaiming warranty or limiting liability differently from the terms of sections 15 and 16 of this License; or b) Requiring preservation of specified reasonable legal notices or author attributions in that material or in the Appropriate Legal Notices displayed by works containing it; or c) Prohibiting misrepresentation of the origin of that material, or requiring that modified versions of such material be marked in reasonable ways as different from the original version; or d) Limiting the use for publicity purposes of names of licensors or authors of the material; or e) Declining to grant rights under trademark law for use of some trade names, trademarks, or service marks; or f) Requiring indemnification of licensors and authors of that material by anyone who conveys the material (or modified versions of it) with contractual assumptions of liability to the recipient, for any liability that these contractual assumptions directly impose on those licensors and authors. All other non-permissive additional terms are considered "further restrictions" within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term. If a license document contains a further restriction but permits relicensing or conveying under this License, you may add to a covered work material governed by the terms of that license document, provided that the further restriction does not survive such relicensing or conveying. 
If you add terms to a covered work in accord with this section, you must place, in the relevant source files, a statement of the additional terms that apply to those files, or a notice indicating where to find the applicable terms. Additional terms, permissive or non-permissive, may be stated in the form of a separately written license, or stated as exceptions; the above requirements apply either way. 8. Termination. You may not propagate or modify a covered work except as expressly provided under this License. Any attempt otherwise to propagate or modify it is void, and will automatically terminate your rights under this License (including any patent licenses granted under the third paragraph of section 11). However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation. Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice. Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, you do not qualify to receive new licenses for the same material under section 10. 9. Acceptance Not Required for Having Copies. You are not required to accept this License in order to receive or run a copy of the Program. 
Ancillary propagation of a covered work occurring solely as a consequence of using peer-to-peer transmission to receive a copy likewise does not require acceptance. However, nothing other than this License grants you permission to propagate or modify any covered work. These actions infringe copyright if you do not accept this License. Therefore, by modifying or propagating a covered work, you indicate your acceptance of this License to do so. 10. Automatic Licensing of Downstream Recipients. Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License. You are not responsible for enforcing compliance by third parties with this License. An "entity transaction" is a transaction transferring control of an organization, or substantially all assets of one, or subdividing an organization, or merging organizations. If propagation of a covered work results from an entity transaction, each party to that transaction who receives a copy of the work also receives whatever licenses to the work the party's predecessor in interest had or could give under the previous paragraph, plus a right to possession of the Corresponding Source of the work from the predecessor in interest, if the predecessor has it or can get it with reasonable efforts. You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License. For example, you may not impose a license fee, royalty, or other charge for exercise of rights granted under this License, and you may not initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging that any patent claim is infringed by making, using, selling, offering for sale, or importing the Program or any portion of it. 11. Patents. A "contributor" is a copyright holder who authorizes use under this License of the Program or a work on which the Program is based. 
The work thus licensed is called the contributor's "contributor version". A contributor's "essential patent claims" are all patent claims owned or controlled by the contributor, whether already acquired or hereafter acquired, that would be infringed by some manner, permitted by this License, of making, using, or selling its contributor version, but do not include claims that would be infringed only as a consequence of further modification of the contributor version. For purposes of this definition, "control" includes the right to grant patent sublicenses in a manner consistent with the requirements of this License. Each contributor grants you a non-exclusive, worldwide, royalty-free patent license under the contributor's essential patent claims, to make, use, sell, offer for sale, import and otherwise run, modify and propagate the contents of its contributor version. In the following three paragraphs, a "patent license" is any express agreement or commitment, however denominated, not to enforce a patent (such as an express permission to practice a patent or covenant not to sue for patent infringement). To "grant" such a patent license to a party means to make such an agreement or commitment not to enforce a patent against the party. If you convey a covered work, knowingly relying on a patent license, and the Corresponding Source of the work is not available for anyone to copy, free of charge and under the terms of this License, through a publicly available network server or other readily accessible means, then you must either (1) cause the Corresponding Source to be so available, or (2) arrange to deprive yourself of the benefit of the patent license for this particular work, or (3) arrange, in a manner consistent with the requirements of this License, to extend the patent license to downstream recipients. 
"Knowingly relying" means you have actual knowledge that, but for the patent license, your conveying the covered work in a country, or your recipient's use of the covered work in a country, would infringe one or more identifiable patents in that country that you have reason to believe are valid. If, pursuant to or in connection with a single transaction or arrangement, you convey, or propagate by procuring conveyance of, a covered work, and grant a patent license to some of the parties receiving the covered work authorizing them to use, propagate, modify or convey a specific copy of the covered work, then the patent license you grant is automatically extended to all recipients of the covered work and works based on it. A patent license is "discriminatory" if it does not include within the scope of its coverage, prohibits the exercise of, or is conditioned on the non-exercise of one or more of the rights that are specifically granted under this License. You may not convey a covered work if you are a party to an arrangement with a third party that is in the business of distributing software, under which you make payment to the third party based on the extent of your activity of conveying the work, and under which the third party grants, to any of the parties who would receive the covered work from you, a discriminatory patent license (a) in connection with copies of the covered work conveyed by you (or copies made from those copies), or (b) primarily for and in connection with specific products or compilations that contain the covered work, unless you entered into that arrangement, or that patent license was granted, prior to 28 March 2007. Nothing in this License shall be construed as excluding or limiting any implied license or other defenses to infringement that may otherwise be available to you under applicable patent law. 12. No Surrender of Others' Freedom. 
If conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all. For example, if you agree to terms that obligate you to collect a royalty for further conveying from those to whom you convey the Program, the only way you could satisfy both those terms and this License would be to refrain entirely from conveying the Program. 13. Use with the GNU Affero General Public License. Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU Affero General Public License into a single combined work, and to convey the resulting work. The terms of this License will continue to apply to the part which is the covered work, but the special requirements of the GNU Affero General Public License, section 13, concerning interaction through a network will apply to the combination as such. 14. Revised Versions of this License. The Free Software Foundation may publish revised and/or new versions of the GNU General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies that a certain numbered version of the GNU General Public License "or any later version" applies to it, you have the option of following the terms and conditions either of that numbered version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the GNU General Public License, you may choose any version ever published by the Free Software Foundation. 
If the Program specifies that a proxy can decide which future versions of the GNU General Public License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Program. Later license versions may give you additional or different permissions. However, no additional obligations are imposed on any author or copyright holder as a result of your choosing to follow a later version. 15. Disclaimer of Warranty. THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. Limitation of Liability. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 17. Interpretation of Sections 15 and 16. 
If the disclaimer of warranty and limitation of liability provided above cannot be given local legal effect according to their terms, reviewing courts shall apply local law that most closely approximates an absolute waiver of all civil liability in connection with the Program, unless a warranty or assumption of liability accompanies a copy of the Program in return for a fee. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively state the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . Also add information on how to contact you by electronic and paper mail. If the program does terminal interaction, make it output a short notice like this when it starts in an interactive mode: Copyright (C) This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. 
The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, your program's commands might be different; for a GUI interface, you would use an "about box". You should also get your employer (if you work as a programmer) or school, if any, to sign a "copyright disclaimer" for the program, if necessary. For more information on this, and how to apply and follow the GNU GPL, see . The GNU General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. But first, please read .
grokmirror-0.3.5/MANIFEST.in000066400000000000000000000000741220344016600154510ustar00rootroot00000000000000
include COPYING
include *.rst
include *.conf
include *.spec
grokmirror-0.3.5/README.rst000066400000000000000000000203331220344016600154020ustar00rootroot00000000000000
GROKMIRROR
==========
--------------------------------------------
Framework to smartly mirror git repositories
--------------------------------------------

:Author: mricon@kernel.org
:Date: 2013-05-27
:Copyright: The Linux Foundation and contributors
:License: GPLv3+
:Version: 0.3.5

DESCRIPTION
-----------
Grokmirror was written to make mirroring large git repository
collections more efficient. Grokmirror uses the manifest file published
by the master mirror in order to figure out which repositories to
clone, and to track which repositories require updating. The process is
extremely lightweight and efficient both for the master and for the
mirrors.

CONCEPTS
--------
Grokmirror master publishes a json-formatted manifest file containing
information about all git repositories that it carries.
The format of the manifest file is as follows::

    {
      "/path/to/bare/repository.git": {
        "description": "Repository description",
        "reference": "/path/to/reference/repository.git",
        "modified": timestamp,
        "symlinks": [
          "/location/to/symlink",
          ...
        ]
      },
      ...
    }

The manifest file is usually gzip-compressed to preserve bandwidth.

Each time one of the git repositories is updated, an appropriate git
hook updates the manifest file, so manifest.js always contains the most
up-to-date information about the repositories provided by the git
server and their last-modified dates.

The mirroring clients regularly poll the manifest.js file and download
the updated manifest only if it is newer than the locally stored copy
(using the ``Last-Modified`` and ``If-Modified-Since`` HTTP headers).
After downloading the updated manifest.js file, the mirrors parse it to
find out which repositories have been updated and which new
repositories have been added.

For all newly added repositories, the clients will do::

    git clone --mirror git://server/path/to/repository.git \
        /local/path/to/repository.git

For all updated repositories, the clients will do::

    GIT_DIR=/local/path/to/repository.git git remote update

When run with ``--purge``, the clients will also purge any repositories
no longer present in the manifest file received from the server.

Shared repositories
~~~~~~~~~~~~~~~~~~~
Grokmirror will automatically recognize when repositories share objects
via alternates. For example, if repositoryB is a shared clone of
repositoryA (that is, it was cloned using ``git clone -s repositoryA``),
the manifest entry for repositoryB will name repositoryA as its
reference, so grokmirror will mirror repositoryA first, and then mirror
repositoryB with the ``--reference`` flag. This greatly reduces
bandwidth and disk use for large repositories.

See man git-clone_ for more info.

.. _git-clone: https://www.kernel.org/pub/software/scm/git/docs/git-clone.html

SERVER SETUP
------------
Install grokmirror on the server using your preferred method.

**IMPORTANT: Currently, only bare git repositories are supported.**

You will need to add a hook to each of your repositories that updates
the manifest whenever the repository is modified. This can be either a
post-receive or a post-update hook. The hook must call the following
command::

    /usr/bin/grok-manifest -m /repos/manifest.js.gz -t /repos -n `pwd`

The **-m** flag is the path to the manifest.js file. The git process
must be able to write to it and to the directory the file is in (it
creates a manifest.js.randomstring file first, and then moves it in
place of the old one for atomicity).

The **-t** flag tells grokmirror which irrelevant toplevel disk path to
trim. For example, if your repository is in
/var/lib/git/repository.git, but it is exported as
git://server/repository.git, then you specify ``-t /var/lib/git``.

The **-n** flag tells grokmirror to use the current timestamp instead
of the timestamp of the most recent commit (this is much faster).

Before enabling the hook, you will need to generate the manifest.js of
all your git repositories. To do that, run the same command, but omit
the -n and the \`pwd\` argument. For example::

    /usr/bin/grok-manifest -m /repos/manifest.js.gz -t /repos

The last component you need to set up is automatic purging of deleted
repositories from the manifest. As this can't be done from a git hook,
you can either run the ``--purge`` command from cron::

    /usr/bin/grok-manifest -m /repos/manifest.js.gz -t /repos -p

Or add it to your gitolite's ``rm`` ADC using the ``--remove`` flag::

    /usr/bin/grok-manifest -m /repos/manifest.js.gz -t /repos -x $repo.git

If you would like grok-manifest to honor the ``git-daemon-export-ok``
magic file and only add to the manifest those repositories specifically
marked as exportable, pass the ``--check-export-ok`` flag.
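To make the effect of **-t** and ``--check-export-ok`` concrete, here is a minimal Python 3 sketch of how an on-disk repository path could become a manifest key. The helper names (``manifest_key``, ``is_exported``, ``select_repos``) are hypothetical; the trimming mirrors the ``gitdir.replace(toplevel, '', 1)`` call in grok-manifest, but this is an illustration under those assumptions, not the tool's own code.

```python
import os


def manifest_key(gitdir, toplevel):
    """Turn an on-disk repository path into its manifest key by trimming
    the toplevel disk path (the job of grok-manifest's -t flag)."""
    # Like grok-manifest, remove only the first occurrence of toplevel.
    return gitdir.replace(toplevel, '', 1)


def is_exported(gitdir):
    """True if the repository contains the git-daemon-export-ok magic file."""
    return os.path.exists(os.path.join(gitdir, 'git-daemon-export-ok'))


def select_repos(gitdirs, toplevel, check_export_ok=False):
    """Return the manifest keys of the repositories that should be listed."""
    keys = []
    for gitdir in gitdirs:
        if check_export_ok and not is_exported(gitdir):
            # Not marked as exportable: keep it out of the manifest.
            continue
        keys.append(manifest_key(gitdir, toplevel))
    return keys
```

For example, ``manifest_key('/var/lib/git/repository.git', '/var/lib/git')`` gives ``'/repository.git'``. Passing the toplevel without a trailing slash keeps the leading ``/`` that manifest keys carry.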

See ``git-daemon(1)`` for more info on the ``git-daemon-export-ok``
file.

MIRROR SETUP
------------
Install grokmirror on the mirror using your preferred method.

Locate repos.conf and modify it to reflect your needs. The default
configuration file is heavily commented.

Add a cron job that runs grok-pull as frequently as you like. For
example, add the following to ``/etc/cron.d/grokmirror.cron``::

    # Run grok-pull every minute as user "mirror"
    * * * * * mirror /usr/bin/grok-pull -p -c /etc/grokmirror/repos.conf

Make sure the user "mirror" (or whichever user you specified) is able
to write to the toplevel, log and lock locations specified in
repos.conf.

If you already have repositories in a hierarchy that matches the
upstream mirror and you'd like to reuse them instead of re-downloading
everything from the master, pass the ``-r`` flag to tell grok-pull that
it's okay to reuse existing repos. This will delete any existing
remotes defined in each repository and set the new origin to match what
is configured in repos.conf.

GROK-FSCK
---------
Git repositories can get corrupted whether they are frequently updated
or not, which is why it is useful to routinely check them using "git
fsck". Grokmirror ships with a "grok-fsck" utility that runs "git fsck"
on all mirrored git repositories. It is meant to be run nightly from
cron, and will do its best to randomly stagger the checks so that only
a subset of repositories is checked each night. Any errors will be sent
to the user set in MAILTO.

To enable grok-fsck, first locate the fsck.conf file and edit it to
match your setup; for example, it must know where you keep your local
manifest.
Then, add the following to ``/etc/cron.d/grok-fsck.cron``::

    # Make sure MAILTO is set, for error reports
    MAILTO=root
    # Run nightly, at 2AM
    00 02 * * * mirror /usr/bin/grok-fsck -c /etc/grokmirror/fsck.conf

You can force a full run using the ``-f`` flag, but unless you only
have a few smallish git repositories, this is not recommended, as it
may take several hours to complete.

Before it runs, grok-fsck will put an advisory lock in the git
directory being checked (repository.git/grokmirror.lock). Grok-pull
will recognize the lock and will postpone any incoming updates to that
repository until the next grok-pull run.

FAQ
---
Why is it called "grok mirror"?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Because it's developed at kernel.org and "grok" is a mirror of "korg".
Also, because it groks git mirroring.

Why not just use rsync?
~~~~~~~~~~~~~~~~~~~~~~~
Rsync is extremely inefficient for the purpose of mirroring git trees
that mostly consist of a lot of small files that very rarely change.
Since rsync must examine every file during each run, this results in a
lot of disk thrashing.

Additionally, if several repositories share objects with each other,
rsync will produce broken git repositories unless the disk paths are
exactly the same on both the remote and the local mirror.

It is also a bit silly, considering git provides its own extremely
efficient mechanism for specifying what changed between revision X and
revision Y.

Why not just run "git pull" from cron every minute?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is not a complete mirroring strategy, as it won't notify you when
the remote mirror adds new repositories. It is also not very nice to
the remote server, especially one that carries hundreds of
repositories.

Additionally, this will not automatically take care of shared
repositories for you. See "Shared repositories" under "CONCEPTS".
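The manifest-driven approach that this FAQ contrasts with plain ``git pull`` can be sketched as below. This is a minimal Python 3 sketch, assuming the manifest layout shown under CONCEPTS; the helpers ``load_manifest``, ``diff_manifests`` and ``clone_order`` are hypothetical and are not grok-pull's actual implementation. It shows how comparing two manifests reveals new, updated and removed repositories, and how the ``reference`` key lets shared repositories be cloned after the repository they borrow objects from.

```python
import gzip
import json


def load_manifest(path):
    """Read manifest.js, or a gzip-compressed manifest.js.gz, into a dict."""
    opener = gzip.open if path.endswith('.gz') else open
    with opener(path, 'rt') as fh:
        return json.load(fh)


def diff_manifests(local, remote):
    """Compare the stored manifest with a freshly downloaded one.

    Returns three sorted lists of repository paths:
      new:     listed upstream but not mirrored yet (to be cloned)
      updated: newer "modified" timestamp upstream (to be fetched)
      removed: no longer listed upstream (candidates for purging)
    """
    new = sorted(p for p in remote if p not in local)
    updated = sorted(
        p for p in remote
        if p in local
        and remote[p].get('modified', 0) > local[p].get('modified', 0))
    removed = sorted(p for p in local if p not in remote)
    return new, updated, removed


def clone_order(paths, manifest):
    """Order repositories so that one named in another's "reference" key
    is cloned first, letting the later clone borrow its objects."""
    pending = set(paths)
    ordered = []
    placed = set()

    def place(path, trail):
        if path in placed or path in trail:
            return  # already queued, or a reference cycle
        ref = manifest.get(path, {}).get('reference')
        if ref in pending:
            place(ref, trail | {path})
        placed.add(path)
        ordered.append(path)

    for path in sorted(pending):
        place(path, frozenset())
    return ordered
```

Because new repositories are visible as keys missing from the local copy, nothing has to be configured by hand when the master adds a repository; a loop of ``git pull`` commands has no equivalent source of that information.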
grokmirror-0.3.5/fsck.conf000066400000000000000000000045701220344016600155150ustar00rootroot00000000000000
# You can have multiple sections, just name them appropriately
[kernel.org]
# Where is the manifest containing the list of repositories?
manifest = /var/lib/git/mirror/manifest.js.gz
#
# Where are the repositories kept?
#toplevel = /var/lib/git/mirror
toplevel = /var/lib/git/mirror
#
# Where do we put the logs?
#log = /var/log/mirror/kernelorg-fsck.log
log = /var/log/mirror/kernelorg-fsck.log
#
# Log level can be "info" or "debug"
#loglevel = info
loglevel = info
#
# Make sure there is only one instance of grok-fsck running by
# trying to exclusive-lock this file before we do anything.
lock = /var/lock/mirror/kernelorg-fsck.lock
#
# Where to keep the status file
#statusfile = /var/lib/mirror/kernelorg-fsck.js
statusfile = /var/lib/mirror/kernelorg-fsck.js
#
# How often should we check each repository, in days.
# Any newly added repository will have its first check at a random point
# between 0 and $frequency days from now, and then every $frequency days
# after that, to ensure that not all repositories are checked on the same day.
# Don't set to less than 7 unless you only mirror a few repositories
# (or really like to thrash your disks).
#frequency = 30
frequency = 30
#
# Not all repositories take a long time to check. This option lets you
# allocate some time after each run to check small repositories. Grok-fsck
# will then sort all repos not already checked during the usual run by their
# elapsed time, and run an fsck check on each of them until the allotted time
# for quick checks has elapsed. Repositories that have never been checked are
# left to their regular schedule.
# Note that if you only have a few repos and checking all of them takes less
# than this much time, this effectively makes "frequency" irrelevant.
#quick_checks_max_min = 5
quick_checks_max_min = 5
#
# Some errors are relatively benign and can be safely ignored. Add matching
# substrings to this field to ignore them.
ignore_errors = dangling commit
                dangling blob
                notice: HEAD points to an unborn branch
                notice: No default references
                contains zero-padded file modes
#
# Enable this option to repack the repository after running git-fsck.
# This will allow you to save some space if you have shared repositories.
# To check if you have shared repositories, look at your manifest.js.gz to
# see if any repository definition has a "reference" key.
#repack = yes
#
# Default repack flags are -a -d -l -q, but you can specify your own here
#repack_flags = -a -d -l -q
grokmirror-0.3.5/grok-dumb-pull.py000077500000000000000000000161031220344016600171310ustar00rootroot00000000000000
#!/usr/bin/python -tt
# Copyright (C) 2013 by The Linux Foundation and contributors
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see .
import os
import sys
import grokmirror
import logging
import time
import fnmatch
import subprocess
import shutil
import calendar

from fcntl import lockf, LOCK_EX, LOCK_UN, LOCK_NB

from git import Repo

logger = logging.getLogger(__name__)


def git_remote_update(args, env):
    logger.debug('Running: GIT_DIR=%s %s' % (env['GIT_DIR'], ' '.join(args)))

    (output, error) = subprocess.Popen(args, stdout=subprocess.PIPE,
                                       stderr=subprocess.PIPE,
                                       env=env).communicate()

    error = error.strip()

    if error:
        # Put things we recognize into debug
        debug = []
        warn = []
        for line in error.split('\n'):
            if line.find('From ') == 0:
                debug.append(line)
            elif line.find('-> ') > 0:
                debug.append(line)
            else:
                warn.append(line)
        if debug:
            logger.debug('Stderr: %s' % '\n'.join(debug))
        if warn:
            logger.warning('Stderr: %s' % '\n'.join(warn))


def dumb_pull_repo(gitdir, remotes, svn=False):
    # verify it's a git repo and fetch all remotes
    logger.debug('Will pull %s with following remotes: %s' % (gitdir, remotes))
    try:
        repo = Repo(gitdir)
        assert repo.bare == True
    except:
        logger.critical('Error opening %s.' % gitdir)
        logger.critical('Make sure it is a bare git repository.')
        sys.exit(1)

    try:
        grokmirror.lock_repo(gitdir, nonblocking=True)
    except IOError, ex:
        logger.info('Could not obtain exclusive lock on %s' % gitdir)
        logger.info('\tAssuming another process is running.')
        return False

    env = {'GIT_DIR': gitdir}
    if svn:
        logger.debug('Using git-svn for %s' % gitdir)

        for remote in remotes:
            # arghie-argh-argh
            if remote == '*':
                remote = '--all'

            logger.info('Running git-svn fetch %s in %s' % (remote, gitdir))
            args = ['/usr/bin/git', 'svn', 'fetch', remote]
            git_remote_update(args, env)

        grokmirror.unlock_repo(gitdir)
        return True

    # Not an svn remote
    hasremotes = repo.git.remote()
    if not len(hasremotes.strip()):
        logger.info('Repository %s has no defined remotes!'
                    % gitdir)
        # Release the repository lock taken above before bailing out
        grokmirror.unlock_repo(gitdir)
        return False

    didwork = False
    logger.debug('existing remotes: %s' % hasremotes)
    for remote in remotes:
        remotefound = False
        for hasremote in hasremotes.split('\n'):
            if fnmatch.fnmatch(hasremote, remote):
                remotefound = True
                logger.debug('existing remote %s matches requested %s' % (
                    hasremote, remote))
                args = ['/usr/bin/git', 'remote', 'update', hasremote]
                logger.info('Updating remote %s in %s' % (hasremote, gitdir))

                git_remote_update(args, env)
                didwork = True

        if not remotefound:
            logger.info('Could not find any remotes matching %s in %s' % (
                remote, gitdir))

    grokmirror.unlock_repo(gitdir)

    return didwork


def run_post_update_hook(hookscript, gitdir):
    if hookscript == '':
        return
    if not os.access(hookscript, os.X_OK):
        logger.warning('post_update_hook %s is not executable' % hookscript)
        return

    args = [hookscript, gitdir]
    logger.debug('Running: %s' % ' '.join(args))
    (output, error) = subprocess.Popen(args, stdout=subprocess.PIPE,
                                       stderr=subprocess.PIPE).communicate()

    error = error.strip()
    output = output.strip()
    if error:
        # Put hook stderr into warning
        logger.warning('Hook Stderr: %s' % error)
    if output:
        # Put hook stdout into info
        logger.info('Hook Stdout: %s' % output)


if __name__ == '__main__':
    from optparse import OptionParser

    usage = '''usage: %prog [options] /path/to/repos
Bluntly fetch remotes in specified git repositories.
'''
    parser = OptionParser(usage=usage, version=grokmirror.VERSION)
    parser.add_option('-v', '--verbose', dest='verbose', action='store_true',
                      default=False,
                      help='Be verbose and tell us what you are doing')
    parser.add_option('-s', '--svn', dest='svn', action='store_true',
                      default=False,
                      help='The remotes for these repositories are Subversion')
    parser.add_option('-r', '--remote-names', dest='remotes', action='append',
                      default=[],
                      help='Only fetch remotes matching this name (accepts globbing, '
                           'can be passed multiple times)')
    parser.add_option('-u', '--post-update-hook', dest='posthook', default='',
                      help='Run this hook after each repository is updated. Passes '
                           'full path to the repository as the sole argument.')
    parser.add_option('-l', '--logfile', dest='logfile', default=None,
                      help='Put debug logs into this file')

    (opts, args) = parser.parse_args()

    if not len(args):
        parser.error('You must provide at least a path to the repos to pull')

    if not len(opts.remotes):
        opts.remotes = ['*']

    logger.setLevel(logging.DEBUG)

    ch = logging.StreamHandler()
    formatter = logging.Formatter('%(message)s')
    ch.setFormatter(formatter)

    if opts.verbose:
        ch.setLevel(logging.INFO)
    else:
        ch.setLevel(logging.CRITICAL)

    logger.addHandler(ch)

    if opts.logfile is not None:
        ch = logging.FileHandler(opts.logfile)
        formatter = logging.Formatter(
            "[%(process)d] %(asctime)s - %(levelname)s - %(message)s")
        ch.setFormatter(formatter)

        ch.setLevel(logging.DEBUG)
        logger.addHandler(ch)

    # push our logger into grokmirror to override the default
    grokmirror.logger = logger

    # Find all repositories we are to pull
    for entry in args:
        if entry[-4:] == '.git':
            if not os.path.exists(entry):
                logger.critical('%s does not exist' % entry)
                continue

            logger.debug('Found %s' % entry)
            didwork = dumb_pull_repo(entry, opts.remotes, svn=opts.svn)
            if didwork:
                run_post_update_hook(opts.posthook, entry)

        else:
            logger.debug('Finding all git repos in %s' % entry)
            for founddir in grokmirror.find_all_gitdirs(entry):
                didwork = dumb_pull_repo(founddir, opts.remotes, svn=opts.svn)
                if didwork:
                    run_post_update_hook(opts.posthook, founddir)
grokmirror-0.3.5/grok-fsck.py000077500000000000000000000277451220344016600161740ustar00rootroot00000000000000
#!/usr/bin/python -tt
# Copyright (C) 2013 by The Linux Foundation and contributors
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see .

import os
import sys
import grokmirror
import logging
import time
import json
import subprocess
import random

import time
import datetime

from fcntl import lockf, LOCK_EX, LOCK_UN, LOCK_NB

# default basic logger. We override it later.
logger = logging.getLogger(__name__)


def run_git_repack(fullpath, config):
    if 'repack' not in config.keys() or config['repack'] != 'yes':
        return

    if 'repack_flags' not in config.keys():
        config['repack_flags'] = '-a -d -l -q'

    flags = config['repack_flags'].split()

    env = {'GIT_DIR': fullpath}
    args = ['/usr/bin/git', 'repack'] + flags
    logger.info('Repacking %s' % fullpath)

    logger.debug('Running: GIT_DIR=%s %s' % (env['GIT_DIR'], ' '.join(args)))

    (output, error) = subprocess.Popen(args, stdout=subprocess.PIPE,
                                       stderr=subprocess.PIPE,
                                       env=env).communicate()

    error = error.strip()

    # Repack shouldn't normally return any errors, so report all of them
    # as critical
    if error:
        logger.critical('Repacking %s returned errors:' % fullpath)
        for entry in error.split('\n'):
            logger.critical("\t%s" % entry)


def run_git_fsck(fullpath, config):
    # Lock the git repository so no other grokmirror process attempts to
    # modify it while we're running git ops. If we miss this window, we
    # may not check the repo again for a long time, so block until the lock
    # is available.
    try:
        grokmirror.lock_repo(fullpath, nonblocking=False)
    except IOError, ex:
        logger.info('Could not obtain exclusive lock on %s' % fullpath)
        logger.info('Will run next time')
        return

    env = {'GIT_DIR': fullpath}
    args = ['/usr/bin/git', 'fsck', '--full']
    logger.info('Checking %s' % fullpath)

    logger.debug('Running: GIT_DIR=%s %s' % (env['GIT_DIR'], ' '.join(args)))

    (output, error) = subprocess.Popen(args, stdout=subprocess.PIPE,
                                       stderr=subprocess.PIPE,
                                       env=env).communicate()

    error = error.strip()

    if error:
        # Put things we recognize as fairly benign into debug
        debug = []
        warn = []
        for line in error.split('\n'):
            ignored = False
            for estring in config['ignore_errors']:
                if line.find(estring) != -1:
                    ignored = True
                    debug.append(line)
                    break
            if not ignored:
                warn.append(line)

        if debug:
            logger.debug('Stderr: %s' % '\n'.join(debug))
        if warn:
            logger.critical('%s has critical errors:' % fullpath)
            for entry in warn:
                logger.critical("\t%s" % entry)

    run_git_repack(fullpath, config)

    grokmirror.unlock_repo(fullpath)


def fsck_mirror(name, config, opts):
    global logger
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)

    if 'log' in config.keys():
        ch = logging.FileHandler(config['log'])
        formatter = logging.Formatter(
            "[%(process)d] %(asctime)s - %(levelname)s - %(message)s")
        ch.setFormatter(formatter)
        loglevel = logging.INFO

        if 'loglevel' in config.keys():
            if config['loglevel'] == 'debug':
                loglevel = logging.DEBUG

        ch.setLevel(loglevel)
        logger.addHandler(ch)

    ch = logging.StreamHandler()
    formatter = logging.Formatter('%(message)s')
    ch.setFormatter(formatter)

    if opts.verbose:
        ch.setLevel(logging.INFO)
    else:
        ch.setLevel(logging.CRITICAL)

    logger.addHandler(ch)

    # push it into grokmirror to override the default logger
    grokmirror.logger = logger

    logger.info('Running grok-fsck for [%s]' % name)

    # Lock the tree to make sure we only run one instance
    logger.debug('Attempting to obtain lock on %s' % config['lock'])
    flockh = open(config['lock'], 'w')
    try:
        lockf(flockh, LOCK_EX | LOCK_NB)
    except IOError, ex:
        logger.info('Could not obtain exclusive lock on %s' % config['lock'])
        logger.info('Assuming another process is running.')
        return 0

    manifest = grokmirror.read_manifest(config['manifest'])

    if os.path.exists(config['statusfile']):
        logger.info('Reading status from %s' % config['statusfile'])
        stfh = open(config['statusfile'], 'r')
        try:
            # Format of the status file:
            #  {
            #    '/full/path/to/repository': {
            #      'lastcheck': 'YYYY-MM-DD' or 'never',
            #      'nextcheck': 'YYYY-MM-DD',
            #      's_elapsed': seconds taken by the last check,
            #    },
            #    ...
            #  }
            status = json.load(stfh)
        except:
            # Broken!
            logger.critical('Failed to parse %s' % config['statusfile'])
            lockf(flockh, LOCK_UN)
            flockh.close()
            return 1
    else:
        status = {}

    frequency = int(config['frequency'])

    today = datetime.datetime.today()

    workdone = False

    # Go through the manifest and compare with status
    for gitdir in manifest.keys():
        fullpath = os.path.join(config['toplevel'], gitdir.lstrip('/'))
        if fullpath not in status.keys():
            # Newly added repository
            # Randomize next check between now and frequency
            delay = random.randint(0, frequency)
            nextdate = today + datetime.timedelta(days=delay)
            nextcheck = nextdate.strftime('%F')
            status[fullpath] = {
                'lastcheck': 'never',
                'nextcheck': nextcheck,
            }
            logger.info('Added new repository %s with next check on %s' % (
                gitdir, nextcheck))
            workdone = True

    # Go through status and queue checks for all the dirs that are due today
    # (unless --force, which is EVERYTHING)
    todayiso = today.strftime('%F')
    for fullpath in status.keys():
        # Check to make sure it's still in the manifest
        gitdir = fullpath.replace(config['toplevel'], '', 1)
        gitdir = '/' + gitdir.lstrip('/')

        if gitdir not in manifest.keys():
            del status[fullpath]
            logger.info('Removed %s which is no longer in manifest' % gitdir)
            continue

        # If nextcheck is before today, set it to today
        # XXX: If a system comes up after being in downtime for a while, this
        #      may cause pain for them, so perhaps use randomization here?
        nextcheck = datetime.datetime.strptime(status[fullpath]['nextcheck'],
                                               '%Y-%m-%d')
        if opts.force or nextcheck <= today:
            logger.debug('Queueing to check %s' % fullpath)
            # Calculate elapsed seconds
            startt = time.time()
            run_git_fsck(fullpath, config)
            endt = time.time()

            status[fullpath]['lastcheck'] = todayiso
            status[fullpath]['s_elapsed'] = round(endt - startt, 2)

            if opts.force:
                # Use randomization for next check, again
                delay = random.randint(1, frequency)
            else:
                delay = frequency

            nextdate = today + datetime.timedelta(days=delay)
            status[fullpath]['nextcheck'] = nextdate.strftime('%F')
            workdone = True

    # Do quickie checks
    if 'quick_checks_max_min' in config.keys():
        # Convert to seconds for ease of tracking
        max_time = int(config['quick_checks_max_min']) * 60
        logger.debug('max_time=%s' % max_time)
        if max_time < 60:
            logger.warning('quick_checks_max_min must be at least 1 minute')
            max_time = 60

        logger.info('Performing quick checks')
        # Find the smallest s_elapsed not yet checked today
        # and run a check on it until we either run out of time
        # or repos to check.
        total_elapsed_time = 0
        quickies_checked = 0

        while True:
            # use this var to track which repo is smallest on s_elapsed
            least_elapsed = None
            repo_to_check = None

            for fullpath in status.keys():
                if status[fullpath]['lastcheck'] in ('never', todayiso):
                    # never been checked or checked today, skip
                    continue

                if 's_elapsed' not in status[fullpath].keys():
                    # something happened to the s_elapsed entry?
                    continue

                prev_elapsed = status[fullpath]['s_elapsed']
                if total_elapsed_time + prev_elapsed > max_time:
                    # would take us too long to check it, skip
                    continue

                if least_elapsed is None or prev_elapsed < least_elapsed:
                    least_elapsed = prev_elapsed
                    repo_to_check = fullpath

            if repo_to_check is None:
                if quickies_checked == 0:
                    logger.info('No repos qualified for quick checks')
                else:
                    logger.info('Quick-checked %s repos in %s seconds' % (
                        quickies_checked, total_elapsed_time))
                break

            # check repo and record the necessary bits
            startt = time.time()
            run_git_fsck(repo_to_check, config)
            endt = time.time()

            # We don't adjust nextcheck, since it kinda becomes meaningless
            status[repo_to_check]['lastcheck'] = todayiso
            status[repo_to_check]['s_elapsed'] = round(endt - startt, 2)
            total_elapsed_time += status[repo_to_check]['s_elapsed']
            quickies_checked += 1
            workdone = True

    # Write out the new status
    if workdone:
        logger.info('Writing new status file in %s' % config['statusfile'])
        stfh = open(config['statusfile'], 'w')
        json.dump(status, stfh, indent=2)
        stfh.close()

    lockf(flockh, LOCK_UN)
    flockh.close()


if __name__ == '__main__':
    from optparse import OptionParser
    from ConfigParser import ConfigParser

    usage = '''usage: %prog -c fsck.conf
Run a git-fsck check on grokmirror-managed repositories.
'''
    parser = OptionParser(usage=usage, version=grokmirror.VERSION)
    parser.add_option('-v', '--verbose', dest='verbose', action='store_true',
                      default=False,
                      help='Be verbose and tell us what you are doing')
    parser.add_option('-f', '--force', dest='force', action='store_true',
                      default=False,
                      help='Force immediate run on all repositories.')
    parser.add_option('-c', '--config', dest='config',
                      help='Location of fsck.conf')

    (opts, args) = parser.parse_args()

    if not opts.config:
        parser.error('You must provide the path to the config file')

    ini = ConfigParser()
    ini.read(opts.config)

    retval = 0

    for section in ini.sections():
        config = {}
        for (option, value) in ini.items(section):
            config[option] = value

        if 'ignore_errors' not in config:
            config['ignore_errors'] = [
                'dangling commit',
                'dangling blob',
                'notice: HEAD points to an unborn branch',
                'notice: No default references',
                'contains zero-padded file modes',
            ]
        else:
            ignore_errors = []
            for estring in config['ignore_errors'].split('\n'):
                estring = estring.strip()
                # Skip blank entries: an empty substring would match
                # (and silence) every error line
                if estring:
                    ignore_errors.append(estring)
            config['ignore_errors'] = ignore_errors

        fsck_mirror(section, config, opts)
grokmirror-0.3.5/grok-manifest.py000077500000000000000000000242521220344016600170420ustar00rootroot00000000000000
#!/usr/bin/python -tt
# Copyright (C) 2013 by The Linux Foundation and contributors
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see .
import os
import sys
import logging
import time

import grokmirror

from git import Repo

logger = logging.getLogger(__name__)


def update_manifest(manifest, toplevel, gitdir, usenow):
    path = gitdir.replace(toplevel, '', 1)

    # Try to open git dir
    logger.debug('Examining %s' % gitdir)
    try:
        repo = Repo(gitdir)
        assert repo.bare == True
    except:
        logger.critical('Error opening %s.' % gitdir)
        logger.critical('Make sure it is a bare git repository.')
        sys.exit(1)

    # Ignore it if it's an empty git repository
    try:
        if len(repo.heads) == 0:
            logger.info('%s has no heads, ignoring' % gitdir)
            return
    except:
        # Errors when listing heads usually mean the repository is no good
        logger.info('Error listing heads in %s, ignoring' % gitdir)
        return

    try:
        description = repo.description
    except:
        description = 'Unnamed repository'

    try:
        rcr = repo.config_reader()
        owner = rcr.get('gitweb', 'owner')
    except:
        owner = None

    modified = 0

    if not usenow:
        for branch in repo.branches:
            try:
                if branch.commit.committed_date > modified:
                    modified = branch.commit.committed_date
                    # Older versions of GitPython returned time.struct_time
                    if type(modified) == time.struct_time:
                        modified = int(time.mktime(modified))
            except:
                pass

    if modified == 0:
        modified = int(time.time())

    reference = None

    if len(repo.alternates) == 1:
        # use this to hint which repo to use as reference when cloning
        alternate = repo.alternates[0]
        if alternate.find(toplevel) == 0:
            reference = alternate.replace(toplevel, '').replace('/objects', '')

    if path not in manifest.keys():
        logger.info('Adding %s to manifest' % path)
        manifest[path] = {}
    else:
        logger.info('Updating %s in the manifest' % path)

    manifest[path]['owner'] = owner
    manifest[path]['description'] = description
    manifest[path]['reference'] = reference
    manifest[path]['modified'] = modified


def set_symlinks(manifest, toplevel, symlinks):
    for symlink in symlinks:
        target = os.path.realpath(symlink)
        if target.find(toplevel) < 0:
            logger.info('Symlink %s points outside toplevel, ignored' % symlink)
            continue
        tgtgitdir = target.replace(toplevel, '')
        if tgtgitdir not in manifest.keys():
            logger.info('Symlink %s points to %s, which we do not recognize'
                        % (symlink, target))
            continue

        relative = symlink.replace(toplevel, '')
        if 'symlinks' in manifest[tgtgitdir].keys():
            if relative not in manifest[tgtgitdir]['symlinks']:
                logger.info('Recording symlink %s->%s' % (relative, tgtgitdir))
                manifest[tgtgitdir]['symlinks'].append(relative)
        else:
            manifest[tgtgitdir]['symlinks'] = [relative]
            logger.info('Recording symlink %s to %s' % (relative, tgtgitdir))

        # Now go through all repos and fix any references pointing to the
        # symlinked location.
        for gitdir in manifest.keys():
            if manifest[gitdir]['reference'] == relative:
                logger.info('Adjusted symlinked reference for %s: %s->%s'
                            % (gitdir, relative, tgtgitdir))
                manifest[gitdir]['reference'] = tgtgitdir


def purge_manifest(manifest, toplevel, gitdirs):
    for oldrepo in manifest.keys():
        if os.path.join(toplevel, oldrepo.lstrip('/')) not in gitdirs:
            logger.info('Purged deleted %s\n' % oldrepo)
            del manifest[oldrepo]


if __name__ == '__main__':
    from optparse import OptionParser

    usage = '''usage: %prog -m manifest.js[.gz] -t /path [/path/to/bare.git]
Create or update manifest.js with the latest repository information.
''' parser = OptionParser(usage=usage, version=grokmirror.VERSION) parser.add_option('-m', '--manifest', dest='manifile', help='Location of manifest.js or manifest.js.gz') parser.add_option('-t', '--toplevel', dest='toplevel', help='Top dir where all repositories reside') parser.add_option('-l', '--logfile', dest='logfile', default=None, help='When specified, will put debug logs in this location') parser.add_option('-n', '--use-now', dest='usenow', action='store_true', default=False, help='Use current timestamp instead of parsing commits') parser.add_option('-c', '--check-export-ok', dest='check_export_ok', action='store_true', default=False, help='Export only repositories marked as git-daemon-export-ok') parser.add_option('-p', '--purge', dest='purge', action='store_true', default=False, help='Purge deleted git repositories from manifest') parser.add_option('-x', '--remove', dest='remove', action='store_true', default=False, help='Remove repositories passed as arguments from manifest') parser.add_option('-y', '--pretty', dest='pretty', action='store_true', default=False, help='Pretty-print manifest (sort keys and add indentation)') parser.add_option('-i', '--ignore-paths', dest='ignore', action='append', default=[], help='When finding git dirs, ignore these paths ' '(can be used multiple times, accepts shell-style globbing)') parser.add_option('-w', '--wait-for-manifest', dest='wait', action='store_true', default=False, help='When running with arguments, wait if manifest is not there ' '(can be useful when multiple writers are writing the manifest)') parser.add_option('-v', '--verbose', dest='verbose', action='store_true', default=False, help='Be verbose and tell us what you are doing') (opts, args) = parser.parse_args() if not opts.manifile: parser.error('You must provide the path to the manifest file') if not opts.toplevel: parser.error('You must provide the toplevel path') if not len(args) and opts.wait: parser.error('--wait option only makes sense when dirs 
are passed') logger.setLevel(logging.DEBUG) ch = logging.StreamHandler() formatter = logging.Formatter('%(message)s') ch.setFormatter(formatter) if opts.verbose: ch.setLevel(logging.INFO) else: ch.setLevel(logging.CRITICAL) logger.addHandler(ch) if opts.logfile is not None: ch = logging.FileHandler(opts.logfile) formatter = logging.Formatter("[%(process)d] %(asctime)s - %(levelname)s - %(message)s") ch.setFormatter(formatter) ch.setLevel(logging.DEBUG) logger.addHandler(ch) # push our logger into grokmirror to override the default grokmirror.logger = logger grokmirror.manifest_lock(opts.manifile) manifest = grokmirror.read_manifest(opts.manifile, wait=opts.wait) # If manifest is empty, don't use current timestamp if not len(manifest.keys()): opts.usenow = False if opts.remove and len(args): # Remove the repos as required, write new manifest and exit for fullpath in args: repo = fullpath.replace(opts.toplevel, '', 1) if repo in manifest.keys(): del manifest[repo] logger.info('Repository %s removed from manifest' % repo) else: logger.info('Repository %s not in manifest' % repo) # XXX: need to add logic to make sure we don't break the world # by removing a repository used as a reference for others # also make sure we clean up any dangling symlinks grokmirror.write_manifest(opts.manifile, manifest, pretty=opts.pretty) grokmirror.manifest_unlock(opts.manifile) sys.exit(0) if opts.purge or not len(args) or not len(manifest.keys()): # We automatically purge when we do a full tree walk gitdirs = grokmirror.find_all_gitdirs(opts.toplevel, ignore=opts.ignore) purge_manifest(manifest, opts.toplevel, gitdirs) if len(manifest.keys()) and len(args): # limit ourselves to passed dirs only when there is something # in the manifest. This precaution makes sure we regenerate the # whole file when there is nothing in it or it can't be parsed.
gitdirs = args symlinks = [] for gitdir in gitdirs: # check to make sure this gitdir is ok to export if (opts.check_export_ok and not os.path.exists(os.path.join(gitdir, 'git-daemon-export-ok'))): # is it currently in the manifest? repo = gitdir.replace(opts.toplevel, '', 1) if repo in manifest.keys(): logger.info('Repository %s is no longer exported, ' 'removing from manifest' % repo) del manifest[repo] # XXX: need to add logic to make sure we don't break the world # by removing a repository used as a reference for others # also make sure we clean up any dangling symlinks continue if os.path.islink(gitdir): symlinks.append(gitdir) else: update_manifest(manifest, opts.toplevel, gitdir, opts.usenow) if len(symlinks): set_symlinks(manifest, opts.toplevel, symlinks) grokmirror.write_manifest(opts.manifile, manifest, pretty=opts.pretty) grokmirror.manifest_unlock(opts.manifile) grokmirror-0.3.5/grok-pull.py000077500000000000000000000600021220344016600162010ustar00rootroot00000000000000#!/usr/bin/python -tt # Copyright (C) 2013 by The Linux Foundation and contributors # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see <http://www.gnu.org/licenses/>.
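For each candidate directory, grok-manifest's main loop makes one of three decisions. It drops the directory when `-c` is given and no `git-daemon-export-ok` file is present. It defers the directory to `set_symlinks` when it is a symlink. Otherwise it refreshes the directory's entry via `update_manifest`. The block below is a minimal Python 3 sketch of that partitioning step only. `partition_gitdirs` is a hypothetical helper, not part of grokmirror. Unlike the original, which deletes manifest keys in place, it returns the keys to remove.

```python
import os


def partition_gitdirs(gitdirs, toplevel, check_export_ok=False):
    """Split candidate repository paths into three groups.

    Returns (to_update, symlinks, unexported). unexported holds
    manifest-style keys (toplevel prefix stripped) for repositories that
    lack a git-daemon-export-ok file while check_export_ok is enabled.
    """
    to_update, symlinks, unexported = [], [], []
    for gitdir in gitdirs:
        marker = os.path.join(gitdir, 'git-daemon-export-ok')
        if check_export_ok and not os.path.exists(marker):
            # Same key derivation as gitdir.replace(opts.toplevel, '', 1)
            unexported.append(gitdir.replace(toplevel, '', 1))
            continue
        if os.path.islink(gitdir):
            # Symlinked repos are recorded later against their targets
            symlinks.append(gitdir)
        else:
            to_update.append(gitdir)
    return to_update, symlinks, unexported
```

A symlink is checked for the export marker through its target. A symlinked repository therefore counts as exported exactly when the repository it points to is exported.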
import os import sys import grokmirror import logging import urllib2 import time import gzip import json import fnmatch import subprocess import shutil import calendar import threading import Queue from fcntl import lockf, LOCK_EX, LOCK_UN, LOCK_NB from StringIO import StringIO from git import Repo # default basic logger. We override it later. logger = logging.getLogger(__name__) class PullerThread(threading.Thread): def __init__(self, in_queue, out_queue, config, thread_name): threading.Thread.__init__(self) self.in_queue = in_queue self.out_queue = out_queue self.toplevel = config['toplevel'] self.hookscript = config['post_update_hook'] self.myname = thread_name def run(self): while True: (gitdir, modified) = self.in_queue.get() fullpath = os.path.join(self.toplevel, gitdir.lstrip('/')) try: grokmirror.lock_repo(fullpath, nonblocking=True) logger.info('[Thread-%s] Updating %s' % (self.myname, gitdir)) pull_repo(self.toplevel, gitdir) set_agefile(self.toplevel, gitdir, modified) run_post_update_hook(self.hookscript, self.toplevel, gitdir) grokmirror.unlock_repo(fullpath) self.out_queue.put((gitdir, True)) except IOError, ex: logger.info('Could not obtain exclusive lock on %s' % gitdir) logger.info('\tAssuming grok-fsck is running, will try later.') self.out_queue.put((gitdir, False)) logger.debug('[Thread-%s] done pulling %s' % (self.myname, gitdir)) self.in_queue.task_done() def fix_remotes(gitdir, toplevel, site): # Remove all existing remotes and set new origin repo = Repo(os.path.join(toplevel, gitdir.lstrip('/'))) remotes = repo.git.remote() if len(remotes.strip()): logger.debug('existing remotes: %s' % remotes) for name in remotes.split('\n'): logger.debug('\tremoving remote: %s' % name) repo.git.remote('rm', name) # set my origin origin = os.path.join(site, gitdir.lstrip('/')) repo.git.remote('add', '--mirror', 'origin', origin) logger.debug('\tset new origin as %s' % origin) def set_owner_description(toplevel, gitdir, owner, description): if owner is None 
and description is None: # Let the default git values be there, then return fullpath = os.path.join(toplevel, gitdir.lstrip('/')) repo = Repo(fullpath) if description is not None and repo.description != description: logger.debug('Setting %s description to: %s' % (gitdir, description)) repo.description = description if owner is not None: logger.debug('Setting %s owner to: %s' % (gitdir, owner)) repo.git.config('gitweb.owner', owner) def set_agefile(toplevel, gitdir, last_modified): # set agefile, which can be used by cgit to show idle times # cgit recommends it to be yyyy-mm-dd hh:mm:ss cgit_fmt = time.strftime('%F %T', time.localtime(last_modified)) agefile = os.path.join(toplevel, gitdir.lstrip('/'), 'info/web/last-modified') if not os.path.exists(os.path.dirname(agefile)): os.makedirs(os.path.dirname(agefile)) fh = open(agefile, 'w') fh.write('%s\n' % cgit_fmt) fh.close() logger.debug('Wrote "%s" into %s' % (cgit_fmt, agefile)) def run_post_update_hook(hookscript, toplevel, gitdir): if hookscript == '': return if not os.access(hookscript, os.X_OK): logger.warning('post_update_hook %s is not executable' % hookscript) return fullpath = os.path.join(toplevel, gitdir.lstrip('/')) args = [hookscript, fullpath] logger.debug('Running: %s' % ' '.join(args)) (output, error) = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate() error = error.strip() output = output.strip() if error: # Put hook stderror into warning logger.warning('Hook Stderr: %s' % error) if output: # Put hook stdout into info logger.info('Hook Stdout: %s' % output) def pull_repo(toplevel, gitdir): env = {'GIT_DIR': os.path.join(toplevel, gitdir.lstrip('/'))} args = ['/usr/bin/git', 'remote', 'update', '--prune'] logger.debug('Running: GIT_DIR=%s %s' % (env['GIT_DIR'], ' '.join(args))) (output, error) = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, env=env).communicate() error = error.strip() if error: # Put things we recognize into debug debug = 
[] warn = [] for line in error.split('\n'): if line.find('From ') == 0: debug.append(line) elif line.find('-> ') > 0: debug.append(line) else: warn.append(line) if debug: logger.debug('Stderr: %s' % '\n'.join(debug)) if warn: logger.warning('Stderr: %s' % '\n'.join(warn)) def clone_repo(toplevel, gitdir, site, reference=None): source = os.path.join(site, gitdir.lstrip('/')) dest = os.path.join(toplevel, gitdir.lstrip('/')) args = ['/usr/bin/git', 'clone', '--mirror'] if reference is not None: reference = os.path.join(toplevel, reference.lstrip('/')) args.append('--reference') args.append(reference) args.append(source) args.append(dest) logger.info('Cloning %s into %s' % (source, dest)) if reference is not None: logger.info('With reference to %s' % reference) logger.debug('Running: %s' % ' '.join(args)) (output, error) = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate() error = error.strip() if error: # Put things we recognize into debug debug = [] warn = [] for line in error.split('\n'): if line.find('cloned an empty repository') > 0: debug.append(line) else: warn.append(line) if debug: logger.debug('Stderr: %s' % '\n'.join(debug)) if warn: logger.warning('Stderr: %s' % '\n'.join(warn)) def clone_order(to_clone, manifest, to_clone_sorted, existing): # recursively go through the list and resolve dependencies new_to_clone = [] num_received = len(to_clone) logger.debug('Another clone_order loop') for gitdir in to_clone: reference = manifest[gitdir]['reference'] logger.debug('reference: %s' % reference) if (reference in existing or reference in to_clone_sorted or reference is None): logger.debug('%s: reference found in existing' % gitdir) to_clone_sorted.append(gitdir) else: logger.debug('%s: reference not found' % gitdir) new_to_clone.append(gitdir) if len(new_to_clone) == 0 or len(new_to_clone) == num_received: # we can resolve no more dependencies, break out logger.debug('Finished resolving dependencies, quitting') if 
len(new_to_clone): logger.debug('Unresolved: %s' % new_to_clone) to_clone_sorted.extend(new_to_clone) return logger.debug('Going for another clone_order loop') clone_order(new_to_clone, manifest, to_clone_sorted, existing) def write_projects_list(manifest, config): import tempfile import shutil if 'projectslist' not in config.keys(): return if config['projectslist'] == '': return plpath = config['projectslist'] trimtop = '' if 'projectslist_trimtop' in config.keys(): trimtop = config['projectslist_trimtop'] add_symlinks = False if ('projectslist_symlinks' in config.keys() and config['projectslist_symlinks'] == 'yes'): add_symlinks = True (dirname, basename) = os.path.split(plpath) (fd, tmpfile) = tempfile.mkstemp(prefix=basename, dir=dirname) logger.info('Writing new %s' % plpath) try: # Reuse the descriptor returned by mkstemp instead of leaking it fh = os.fdopen(fd, 'w') for gitdir in manifest.keys(): if trimtop and gitdir.find(trimtop) == 0: pgitdir = gitdir[len(trimtop):] else: pgitdir = gitdir # Always remove leading slash, otherwise cgit breaks pgitdir = pgitdir.lstrip('/') fh.write('%s\n' % pgitdir) if add_symlinks and 'symlinks' in manifest[gitdir].keys(): # Do the same for symlinks # XXX: Should make this configurable, perhaps for symlink in manifest[gitdir]['symlinks']: if trimtop and symlink.find(trimtop) == 0: symlink = symlink[len(trimtop):] symlink = symlink.lstrip('/') fh.write('%s\n' % symlink) fh.close() os.chmod(tmpfile, 0644) shutil.move(tmpfile, plpath) finally: # If something failed, don't leave tempfiles trailing around if os.path.exists(tmpfile): os.unlink(tmpfile) def pull_mirror(name, config, opts): global logger logger = logging.getLogger(name) logger.setLevel(logging.DEBUG) if 'log' in config.keys(): ch = logging.FileHandler(config['log']) formatter = logging.Formatter("[%(process)d] %(asctime)s - %(levelname)s - %(message)s") ch.setFormatter(formatter) loglevel = logging.INFO if 'loglevel' in config.keys(): if config['loglevel'] == 'debug': 
loglevel = logging.DEBUG ch.setLevel(loglevel)
logger.addHandler(ch) ch = logging.StreamHandler() formatter = logging.Formatter('%(message)s') ch.setFormatter(formatter) if opts.verbose: ch.setLevel(logging.INFO) else: ch.setLevel(logging.CRITICAL) logger.addHandler(ch) # push it into grokmirror to override the default logger grokmirror.logger = logger # Lock the tree to make sure we only run one instance logger.debug('Attempting to obtain lock on %s' % config['lock']) flockh = open(config['lock'], 'w') try: lockf(flockh, LOCK_EX | LOCK_NB) except IOError, ex: logger.debug('Could not obtain exclusive lock on %s' % config['lock']) logger.info('Skipping [%s] - update already running' % name) return 0 logger.info('Checking [%s]' % name) mymanifest = config['mymanifest'] if config['manifest'].find('file:///') == 0: manifile = config['manifest'].replace('file://', '') if not os.path.exists(manifile): logger.critical('Remote manifest not found in %s! Quitting!' % config['manifest']) return 1 fstat = os.stat(manifile) last_modified = fstat[8] if os.path.exists(config['mymanifest']): fstat = os.stat(config['mymanifest']) my_last_modified = fstat[8] if not (opts.force or opts.nomtime) and last_modified <= my_last_modified: logger.info('Manifest file unchanged. Quitting.') lockf(flockh, LOCK_UN) flockh.close() return 0 logger.info('Reading new manifest from %s' % manifile) manifest = grokmirror.read_manifest(manifile) # Don't accept empty manifests -- that indicates something is wrong if not len(manifest.keys()): logger.critical('Remote manifest empty or unparseable! 
Quitting.') return 1 else: # Load it from remote host using http and header magic logger.info('Fetching remote manifest from %s' % config['manifest']) request = urllib2.Request(config['manifest']) opener = urllib2.build_opener() # Find out if we need to run at all first if not (opts.force or opts.nomtime) and os.path.exists(mymanifest): fstat = os.stat(mymanifest) mtime = fstat[8] logger.debug('mtime on %s is: %s' % (mymanifest, mtime)) my_last_modified = time.strftime('%a, %d %b %Y %H:%M:%S GMT', time.gmtime(mtime)) logger.debug('Our last-modified is: %s' % my_last_modified) request.add_header('If-Modified-Since', my_last_modified) try: ufh = opener.open(request, timeout=30) except urllib2.HTTPError, ex: if ex.code == 304: logger.info('Server says we have the latest manifest. Quitting.') lockf(flockh, LOCK_UN) flockh.close() return 0 logger.critical('Could not fetch %s' % config['manifest']) logger.critical('Server returned: %s' % ex) lockf(flockh, LOCK_UN) flockh.close() return 1 except urllib2.URLError, ex: logger.critical('Could not fetch %s' % config['manifest']) logger.critical('Error was: %s' % ex) lockf(flockh, LOCK_UN) flockh.close() return 1 last_modified = ufh.headers.get('Last-Modified') last_modified = time.strptime(last_modified, '%a, %d %b %Y %H:%M:%S %Z') last_modified = calendar.timegm(last_modified) # We don't use read_manifest for the remote manifest, as it can be # anything, really. For now, blindly open it with gzipfile if it ends # with .gz. XXX: some http servers will auto-deflate such files. 
if config['manifest'].find('.gz') > 0: fh = gzip.GzipFile(fileobj=StringIO(ufh.read())) else: fh = ufh try: manifest = json.load(fh) except: logger.critical('Failed to parse %s' % config['manifest']) lockf(flockh, LOCK_UN) flockh.close() return 1 mymanifest = grokmirror.read_manifest(mymanifest) includes = config['include'].split('\n') excludes = config['exclude'].split('\n') # We keep culled, because that becomes our new manifest culled = {} for gitdir in manifest.keys(): # does it fall under include? for include in includes: if fnmatch.fnmatch(gitdir, include): # Yes, but does it fall under excludes? excluded = False for exclude in excludes: if fnmatch.fnmatch(gitdir, exclude): excluded = True break if excluded: continue culled[gitdir] = manifest[gitdir] to_clone = [] to_pull = [] existing = [] toplevel = config['toplevel'] if not os.access(toplevel, os.W_OK): logger.critical('Toplevel %s does not exist or is not writable' % toplevel) sys.exit(1) for gitdir in culled.keys(): fullpath = os.path.join(toplevel, gitdir.lstrip('/')) if gitdir in mymanifest.keys(): # Is the directory in place, too? if os.path.exists(fullpath): # Is it newer than what we have in our old manifest? if gitdir in mymanifest.keys(): # This code is hurky and needs to be cleaned up desc = culled[gitdir].get('description') owner = culled[gitdir].get('owner') mydesc = mymanifest[gitdir].get('description') myowner = mymanifest[gitdir].get('owner') if owner is None: owner = config['default_owner'] if myowner is None: myowner = config['default_owner'] if (owner != myowner or desc != mydesc): # we can do this right away without waiting set_owner_description(toplevel, gitdir, owner, desc) if (opts.force or culled[gitdir]['modified'] > mymanifest[gitdir]['modified']): to_pull.append(gitdir) else: logger.debug('Repo %s unchanged in manifest' % gitdir) existing.append(gitdir) continue # do we have the dir in place? 
if os.path.exists(fullpath): if not opts.reuse: logger.critical('Found existing repository in %s' % fullpath) logger.critical('Run with -r to use existing repos') sys.exit(1) logger.info('Found existing %s, will set new origin' % gitdir) # Accept it, but fix remotes so they are pointing to our origin fix_remotes(gitdir, toplevel, config['site']) to_pull.append(gitdir) continue to_clone.append(gitdir) hookscript = config['post_update_hook'] in_queue = Queue.Queue() out_queue = Queue.Queue() # start out the number of threads we want if 'pull_threads' in config.keys(): pull_threads = int(config['pull_threads']) if pull_threads < 1: logger.info('The pull_threads option is less than 1, forcing to 1') pull_threads = 1 else: # be conservative logger.info('The pull_threads option is not set, consider setting it') pull_threads = 5 logger.info('Will use %d threads to pull repos' % pull_threads) for gitdir in to_pull: in_queue.put((gitdir, culled[gitdir]['modified'])) for i in range(pull_threads): t = PullerThread(in_queue, out_queue, config, i) t.setDaemon(True) t.start() # wait till it's all done in_queue.join() logger.info('All threads finished.') while not out_queue.empty(): # see if any of it failed (gitdir, status) = out_queue.get() if status == False: # To make sure we check this again during next run, # fudge the manifest accordingly. 
logger.debug('Will recheck %s during next run' % gitdir) culled[gitdir] = mymanifest[gitdir] # this is rather hackish, but effective last_modified -= 1 if to_clone: # we use "existing" to track which repos can be used as references existing.extend(to_pull) to_clone_sorted = [] clone_order(to_clone, manifest, to_clone_sorted, existing) for gitdir in to_clone_sorted: reference = culled[gitdir]['reference'] if reference is not None and reference in existing: clone_repo(toplevel, gitdir, config['site'], reference=reference) else: clone_repo(toplevel, gitdir, config['site']) # check dir to make sure cloning succeeded and then add to existing if os.path.exists(os.path.join(toplevel, gitdir.lstrip('/'))): logger.debug('Cloning of %s succeeded, adding to existing' % gitdir) existing.append(gitdir) desc = culled[gitdir].get('description') owner = culled[gitdir].get('owner') if owner is None: owner = config['default_owner'] set_owner_description(toplevel, gitdir, owner, desc) set_agefile(toplevel, gitdir, culled[gitdir]['modified']) run_post_update_hook(hookscript, toplevel, gitdir) # loop through all entries and find any symlinks we need to set # We also collect all symlinks to do purging correctly symlinks = [] for gitdir in culled.keys(): if 'symlinks' in culled[gitdir].keys(): source = os.path.join(config['toplevel'], gitdir.lstrip('/')) for symlink in culled[gitdir]['symlinks']: if symlink not in symlinks: symlinks.append(symlink) target = os.path.join(config['toplevel'], symlink.lstrip('/')) if not os.path.exists(target) and os.path.exists(source): logger.info('Symlinking %s -> %s' % (target, source)) # Make sure the leading dirs are in place if not os.path.exists(os.path.dirname(target)): os.makedirs(os.path.dirname(target)) os.symlink(source, target) if opts.purge: for founddir in grokmirror.find_all_gitdirs(config['toplevel']): gitdir = founddir.replace(config['toplevel'], '') if gitdir not in culled.keys() and gitdir not in symlinks: if os.path.islink(founddir): 
logger.info('Removing unreferenced symlink %s' % gitdir) os.unlink(founddir) else: logger.info('Purging %s' % gitdir) shutil.rmtree(founddir) # Once we're done, save culled as our new manifest grokmirror.write_manifest(config['mymanifest'], culled, mtime=last_modified, pretty=opts.pretty) # write out projects.list, if asked to write_projects_list(culled, config) logger.debug('Unlocking %s' % config['lock']) lockf(flockh, LOCK_UN) flockh.close() return 127 if __name__ == '__main__': from optparse import OptionParser from ConfigParser import ConfigParser usage = '''usage: %prog -c repos.conf Create a grok mirror using the repository configuration found in repos.conf ''' parser = OptionParser(usage=usage, version=grokmirror.VERSION) parser.add_option('-v', '--verbose', dest='verbose', action='store_true', default=False, help='Be verbose and tell us what you are doing') parser.add_option('-n', '--no-mtime-check', dest='nomtime', action='store_true', default=False, help='Run without checking manifest mtime.') parser.add_option('-f', '--force', dest='force', action='store_true', default=False, help='Force full git update regardless of last-modified times. 
' 'Also useful when repos.conf has changed.') parser.add_option('-p', '--purge', dest='purge', action='store_true', default=False, help='Remove any git trees that are no longer in manifest.') parser.add_option('-y', '--pretty', dest='pretty', action='store_true', default=False, help='Pretty-print manifest (sort keys and add indentation)') parser.add_option('-r', '--reuse-existing-repos', dest='reuse', action='store_true', default=False, help='If any existing repositories are found on disk, set new ' 'remote origin and reuse') parser.add_option('-c', '--config', dest='config', help='Location of repos.conf') (opts, args) = parser.parse_args() if not opts.config: parser.error('You must provide the path to the config file') ini = ConfigParser() ini.read(opts.config) retval = 0 for section in ini.sections(): config = {} for (option, value) in ini.items(section): config[option] = value if 'default_owner' not in config.keys(): config['default_owner'] = 'Grokmirror User' if 'post_update_hook' not in config.keys(): config['post_update_hook'] = '' if 'include' not in config.keys(): config['include'] = '*' if 'exclude' not in config.keys(): config['exclude'] = '' sect_retval = pull_mirror(section, config, opts) if sect_retval == 1: # Fatal error encountered at some point retval = 1 elif sect_retval == 127: # Successful run with contents modified retval = 127 sys.exit(retval) grokmirror-0.3.5/grokmirror/000077500000000000000000000000001220344016600161075ustar00rootroot00000000000000grokmirror-0.3.5/grokmirror/__init__.py000066400000000000000000000137261220344016600202310ustar00rootroot00000000000000# Copyright (C) 2013 by The Linux Foundation and contributors # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. 
# # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see <http://www.gnu.org/licenses/>. import os import sys import time import json import fnmatch import logging from fcntl import lockf, LOCK_EX, LOCK_UN, LOCK_NB from StringIO import StringIO VERSION = '0.3.5' MANIFEST_LOCKH = None REPO_LOCKH = {} # default logger. Will probably be overridden. logger = logging.getLogger(__name__) def lock_repo(fullpath, nonblocking=False): repolock = os.path.join(fullpath, 'grokmirror.lock') logger.debug('Attempting to exclusive-lock %s' % repolock) lockfh = open(repolock, 'w') if nonblocking: flags = LOCK_EX | LOCK_NB else: flags = LOCK_EX lockf(lockfh, flags) global REPO_LOCKH REPO_LOCKH[fullpath] = lockfh def unlock_repo(fullpath): global REPO_LOCKH if fullpath in REPO_LOCKH.keys(): logger.debug('Unlocking %s' % fullpath) lockf(REPO_LOCKH[fullpath], LOCK_UN) REPO_LOCKH[fullpath].close() del REPO_LOCKH[fullpath] def is_bare_git_repo(path): """ Return True if path (which is already verified to be a directory) sufficiently resembles a bare git repo (good enough to fool git itself). """ logger.debug('Checking if %s is a git repository' % path) if (os.path.isdir(os.path.join(path, 'objects')) and os.path.isdir(os.path.join(path, 'refs')) and os.path.isfile(os.path.join(path, 'HEAD'))): return True logger.debug('Skipping %s: not a git repository' % path) return False def find_all_gitdirs(toplevel, ignore=[]): logger.info('Finding bare git repos in %s' % toplevel) logger.debug('Ignore list: %s' % ' '.join(ignore)) gitdirs = [] for root, dirs, files in os.walk(toplevel, topdown=True): if not len(dirs): continue torm = [] for name in dirs: # Should we ignore this dir?
ignored = False for ignoredir in ignore: if fnmatch.fnmatch(os.path.join(root, name), ignoredir): torm.append(name) ignored = True break if not ignored and is_bare_git_repo(os.path.join(root, name)): logger.debug('Found %s' % os.path.join(root, name)) gitdirs.append(os.path.join(root, name)) torm.append(name) for name in torm: # don't recurse into the found *.git dirs dirs.remove(name) return gitdirs def manifest_lock(manifile): (dirname, basename) = os.path.split(manifile) global MANIFEST_LOCKH MANIFEST_LOCKH = open(os.path.join(dirname, '.%s.lock' % basename), 'w') logger.debug('Attempting to lock the manifest') lockf(MANIFEST_LOCKH, LOCK_EX) logger.debug('Manifest lock obtained') def manifest_unlock(manifile): global MANIFEST_LOCKH if MANIFEST_LOCKH is not None: logger.debug('Unlocking manifest') lockf(MANIFEST_LOCKH, LOCK_UN) MANIFEST_LOCKH.close() def read_manifest(manifile, wait=False): while True: if not wait or os.path.exists(manifile): break logger.info('Manifest file not yet found, waiting...') # Unlock the manifest so other processes aren't waiting for us was_locked = False if MANIFEST_LOCKH is not None: was_locked = True manifest_unlock(manifile) time.sleep(1) if was_locked: manifest_lock(manifile) if not os.path.exists(manifile): logger.info('%s not found, assuming initial run' % manifile) return {} if manifile.find('.gz') > 0: import gzip fh = gzip.open(manifile, 'rb') else: fh = open(manifile, 'r') logger.info('Reading %s' % manifile) try: manifest = json.load(fh) except: # We'll regenerate the file entirely on failure to parse logger.critical('Unable to parse %s, will regenerate' % manifile) manifest = {} fh.close() logger.debug('Manifest contains %s entries' % len(manifest.keys())) return manifest def write_manifest(manifile, manifest, mtime=None, pretty=False): import tempfile import shutil import gzip logger.info('Writing new %s' % manifile) (dirname, basename) = os.path.split(manifile) (fd, tmpfile) = tempfile.mkstemp(prefix=basename, 
dir=dirname) fh = os.fdopen(fd, 'w', 0) logger.debug('Created a temporary file in %s' % tmpfile) logger.debug('Writing to %s' % tmpfile) try: if manifile.find('.gz') > 0: gfh = gzip.GzipFile(fileobj=fh, mode='wb') if pretty: json.dump(manifest, gfh, indent=2, sort_keys=True) else: json.dump(manifest, gfh) gfh.close() else: if pretty: json.dump(manifest, fh, indent=2, sort_keys=True) else: json.dump(manifest, fh) os.fsync(fd) fh.close() os.chmod(tmpfile, 0644) if mtime is not None: logger.debug('Setting mtime to %s' % mtime) os.utime(tmpfile, (mtime, mtime)) logger.debug('Moving %s to %s' % (tmpfile, manifile)) shutil.move(tmpfile, manifile) finally: # If something failed, don't leave these trailing around if os.path.exists(tmpfile): logger.debug('Removing %s' % tmpfile) os.unlink(tmpfile) grokmirror-0.3.5/man/000077500000000000000000000000001220344016600144655ustar00rootroot00000000000000grokmirror-0.3.5/man/grok-dumb-pull.1000066400000000000000000000060111220344016600174060ustar00rootroot00000000000000.\" Man page generated from reStructuredText. . .TH GROK-DUMB-PULL 1 "2013-05-27" "0.3" "" .SH NAME GROK-DUMB-PULL \- Update git repositories not managed by grokmirror . .nr rst2man-indent-level 0 . .de1 rstReportMargin \\$1 \\n[an-margin] level \\n[rst2man-indent-level] level margin: \\n[rst2man-indent\\n[rst2man-indent-level]] - \\n[rst2man-indent0] \\n[rst2man-indent1] \\n[rst2man-indent2] .. .de1 INDENT .\" .rstReportMargin pre: . RS \\$1 . nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin] . nr rst2man-indent-level +1 .\" .rstReportMargin post: .. .de UNINDENT . RE .\" indent \\n[an-margin] .\" old: \\n[rst2man-indent\\n[rst2man-indent-level]] .nr rst2man-indent-level -1 .\" new: \\n[rst2man-indent\\n[rst2man-indent-level]] .in \\n[rst2man-indent\\n[rst2man-indent-level]]u .. 
.SH SYNOPSIS .INDENT 0.0 .INDENT 3.5 grok\-dumb\-pull [options] /path/to/repos .UNINDENT .UNINDENT .SH DESCRIPTION .sp This is a satellite utility that updates repositories not exported via grokmirror manifest. You will need to manually clone these repositories using "git clone \-\-mirror" and then define a cronjob to update them as frequently as you require. Grok\-dumb\-pull will bluntly execute "git remote update" in each of them. .SH OPTIONS .INDENT 0.0 .INDENT 3.5 .INDENT 0.0 .TP .B \-\-version show program\(aqs version number and exit .TP .B \-h, \-\-help show this help message and exit .TP .B \-v, \-\-verbose Be verbose and tell us what you are doing .TP .B \-s, \-\-svn The remotes for these repositories are Subversion .TP .BI \-r \ REMOTES, \ \-\-remote\-names\fB= REMOTES Only fetch remotes matching this name (accepts globbing, can be passed multiple times) .TP .BI \-u \ POSTHOOK, \ \-\-post\-update\-hook\fB= POSTHOOK Run this hook after each repository is updated. Passes full path to the repository as the sole argument. .TP .BI \-l \ LOGFILE, \ \-\-logfile\fB= LOGFILE Put debug logs into this file .UNINDENT .UNINDENT .UNINDENT .SH EXAMPLES .sp The following will update all bare git repositories found in /path/to/repos hourly, and /path/to/special/repo.git daily, fetching only the "github" remote: .INDENT 0.0 .INDENT 3.5 .sp .nf .ft C MAILTO=root # Update all repositories found in /path/to/repos hourly 0 * * * * mirror /usr/bin/grok\-dumb\-pull /path/to/repos # Update /path/to/special/repo.git daily, fetching "github" remote 0 0 * * * mirror /usr/bin/grok\-dumb\-pull \-r github /path/to/special/repo.git .ft P .fi .UNINDENT .UNINDENT .sp Make sure the user "mirror" (or whichever user you specified) is able to write to the repos specified. 
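The man page says grok-dumb-pull finds bare repositories under a path and runs "git remote update" in each one. With -r, only remotes whose names match the given globs are fetched. grok-dumb-pull's own source is not shown here, so the block below is a rough Python 3 sketch of that behaviour, not its implementation. `find_bare_repos` and `update_argv` are hypothetical helpers. The bare-repo test copies the objects/, refs/ and HEAD heuristic from grokmirror's is_bare_git_repo. The sketch assumes the caller already has each repository's remote names, for example from the output of `git remote`.

```python
import fnmatch
import os


def is_bare_git_repo(path):
    # Same heuristic as grokmirror.is_bare_git_repo
    return (os.path.isdir(os.path.join(path, 'objects'))
            and os.path.isdir(os.path.join(path, 'refs'))
            and os.path.isfile(os.path.join(path, 'HEAD')))


def find_bare_repos(toplevel):
    """Return sorted paths of bare repos at or below toplevel."""
    if is_bare_git_repo(toplevel):
        # A single repository was passed, as in the daily cron example
        return [toplevel]
    found = []
    for root, dirs, _files in os.walk(toplevel):
        for name in list(dirs):
            path = os.path.join(root, name)
            if is_bare_git_repo(path):
                found.append(path)
                dirs.remove(name)  # do not descend into the repo itself
    return sorted(found)


def update_argv(gitdir, remotes, patterns=None):
    """Build a 'git remote update' command line for one repository.

    Without patterns, git updates its default set of remotes. With
    patterns (like -r), only matching remote names are passed. Returns
    None when no remote matches, meaning the repo should be skipped.
    """
    argv = ['git', '--git-dir=' + gitdir, 'remote', 'update']
    if not patterns:
        return argv
    selected = [r for r in remotes
                if any(fnmatch.fnmatch(r, p) for p in patterns)]
    if not selected:
        return None
    return argv + selected
```

Each non-None argv could then be passed to `subprocess.call()` from the user that owns the repositories, which is the same constraint the cron example above places on the "mirror" user.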
.SH SEE ALSO .INDENT 0.0 .INDENT 3.5 .INDENT 0.0 .IP \(bu 2 grok\-pull(1) .IP \(bu 2 grok\-manifest(1) .IP \(bu 2 grok\-fsck(1) .IP \(bu 2 git(1) .UNINDENT .UNINDENT .UNINDENT .SH SUPPORT .sp Please send support requests to the mailing list: .INDENT 0.0 .INDENT 3.5 .sp .nf .ft C http://lists.kernel.org/mailman/listinfo/grokmirror .ft P .fi .UNINDENT .UNINDENT .SH AUTHOR mricon@kernel.org License: GPLv3+ .SH COPYRIGHT The Linux Foundation and contributors .\" Generated by docutils manpage writer. . grokmirror-0.3.5/man/grok-dumb-pull.1.rst000066400000000000000000000043131220344016600202200ustar00rootroot00000000000000GROK-DUMB-PULL ============== ------------------------------------------------- Update git repositories not managed by grokmirror ------------------------------------------------- :Author: mricon@kernel.org :Date: 2013-05-27 :Copyright: The Linux Foundation and contributors :License: GPLv3+ :Version: 0.3 :Manual section: 1 SYNOPSIS -------- grok-dumb-pull [options] /path/to/repos DESCRIPTION ----------- This is a satellite utility that updates repositories not exported via grokmirror manifest. You will need to manually clone these repositories using "git clone --mirror" and then define a cronjob to update them as frequently as you require. Grok-dumb-pull will bluntly execute "git remote update" in each of them. OPTIONS ------- --version show program's version number and exit -h, --help show this help message and exit -v, --verbose Be verbose and tell us what you are doing -s, --svn The remotes for these repositories are Subversion -r REMOTES, --remote-names=REMOTES Only fetch remotes matching this name (accepts globbing, can be passed multiple times) -u POSTHOOK, --post-update-hook=POSTHOOK Run this hook after each repository is updated. Passes full path to the repository as the sole argument. 
  -l LOGFILE, --logfile=LOGFILE
                        Put debug logs into this file

EXAMPLES
--------
The following will update all bare git repositories found in
/path/to/repos hourly, and /path/to/special/repo.git daily, fetching
only the "github" remote::

    MAILTO=root
    # Update all repositories found in /path/to/repos hourly
    0 * * * * mirror /usr/bin/grok-dumb-pull /path/to/repos

    # Update /path/to/special/repo.git daily, fetching "github" remote
    0 0 * * * mirror /usr/bin/grok-dumb-pull -r github /path/to/special/repo.git

Make sure the user "mirror" (or whichever user you specified) is able to
write to the repos specified.

SEE ALSO
--------
  * grok-pull(1)
  * grok-manifest(1)
  * grok-fsck(1)
  * git(1)

SUPPORT
-------
Please send support requests to the mailing list::

    http://lists.kernel.org/mailman/listinfo/grokmirror
grokmirror-0.3.5/man/grok-fsck.1000066400000000000000000000054031220344016600164370ustar00rootroot00000000000000
.\" Man page generated from reStructuredText.
.
.TH GROK-FSCK 1 "2013-04-26" "0.3" ""
.SH NAME
GROK-FSCK \- Check mirrored repositories for corruption
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.SH SYNOPSIS
.INDENT 0.0
.INDENT 3.5
grok\-fsck \-c /path/to/fsck.conf
.UNINDENT
.UNINDENT
.SH DESCRIPTION
.sp
Git repositories can get corrupted whether they are frequently updated
or not, which is why it is useful to routinely check them using
"git fsck".
Grokmirror ships with a "grok\-fsck" utility that will run "git fsck" on
all mirrored git repositories. It is supposed to be run nightly from
cron, and will do its best to randomly stagger the checks so only a
subset of repositories is checked each night. Any errors will be sent
to the user set in MAILTO.
.SH OPTIONS
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.B \-\-version
show program\(aqs version number and exit
.TP
.B \-h, \-\-help
show this help message and exit
.TP
.B \-v, \-\-verbose
Be verbose and tell us what you are doing
.TP
.B \-f, \-\-force
Force immediate run on all repositories.
.TP
.BI \-c \ CONFIG, \ \-\-config\fB= CONFIG
Location of fsck.conf
.UNINDENT
.UNINDENT
.UNINDENT
.SH EXAMPLES
.sp
Locate fsck.conf and modify it to reflect your needs. The default
configuration file is heavily commented.
.sp
Set up a cron job to run nightly and to email any discovered errors to
root:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
# Make sure MAILTO is set, for error reports
MAILTO=root
# Run nightly, at 2AM
00 02 * * * mirror /usr/bin/grok\-fsck \-c /etc/grokmirror/fsck.conf
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
You can force a full run using the \fB\-f\fP flag, but unless you only
have a few smallish git repositories, it\(aqs not recommended, as it may
take several hours to complete.
.SH SEE ALSO
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
grok\-manifest(1)
.IP \(bu 2
grok\-pull(1)
.IP \(bu 2
git(1)
.UNINDENT
.UNINDENT
.UNINDENT
.SH SUPPORT
.sp
Please send support requests to the mailing list:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
http://lists.kernel.org/mailman/listinfo/grokmirror
.ft P
.fi
.UNINDENT
.UNINDENT
.SH AUTHOR
mricon@kernel.org

License: GPLv3+
.SH COPYRIGHT
The Linux Foundation and contributors
.\" Generated by docutils manpage writer.
.
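The page only says that grok-fsck randomly staggers its checks so that each night covers a subset of repositories. The Python sketch below shows one way such staggering could work. It is an illustration, not grok-fsck's implementation: the frequency_days parameter and both function names are hypothetical stand-ins for whatever fsck.conf actually configures. Every repository gets a random first check within the window and is then pushed frequency_days ahead after each check.

```python
import datetime
import random


def initial_schedule(repos, frequency_days, today, rng=random):
    """Give each repo a random first check date within the next frequency_days."""
    return {
        repo: today + datetime.timedelta(days=rng.randrange(frequency_days))
        for repo in repos
    }


def nightly_run(schedule, frequency_days, today, check):
    """Check every repo whose date has arrived, then push it frequency_days ahead.

    `check` is a callable taking a repo path (e.g. a wrapper around "git fsck").
    Returns the list of repos checked tonight.
    """
    checked = []
    for repo, due in sorted(schedule.items()):
        if due <= today:
            check(repo)
            schedule[repo] = today + datetime.timedelta(days=frequency_days)
            checked.append(repo)
    return checked
```

Because first checks are spread uniformly over the window, a nightly cron run under this scheme touches roughly len(repos) / frequency_days repositories instead of all of them at once.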
grokmirror-0.3.5/man/grok-fsck.1.rst000066400000000000000000000035721220344016600172530ustar00rootroot00000000000000
GROK-FSCK
=========
------------------------------------------
Check mirrored repositories for corruption
------------------------------------------

:Author: mricon@kernel.org
:Date: 2013-04-26
:Copyright: The Linux Foundation and contributors
:License: GPLv3+
:Version: 0.3
:Manual section: 1

SYNOPSIS
--------
grok-fsck -c /path/to/fsck.conf

DESCRIPTION
-----------
Git repositories can get corrupted whether they are frequently updated
or not, which is why it is useful to routinely check them using
"git fsck". Grokmirror ships with a "grok-fsck" utility that will run
"git fsck" on all mirrored git repositories. It is supposed to be run
nightly from cron, and will do its best to randomly stagger the checks so
only a subset of repositories is checked each night. Any errors will be
sent to the user set in MAILTO.

OPTIONS
-------
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -v, --verbose         Be verbose and tell us what you are doing
  -f, --force           Force immediate run on all repositories.
  -c CONFIG, --config=CONFIG
                        Location of fsck.conf

EXAMPLES
--------
Locate fsck.conf and modify it to reflect your needs. The default
configuration file is heavily commented.

Set up a cron job to run nightly and to email any discovered errors to
root::

    # Make sure MAILTO is set, for error reports
    MAILTO=root
    # Run nightly, at 2AM
    00 02 * * * mirror /usr/bin/grok-fsck -c /etc/grokmirror/fsck.conf

You can force a full run using the ``-f`` flag, but unless you only have
a few smallish git repositories, it's not recommended, as it may take
several hours to complete.

SEE ALSO
--------
  * grok-manifest(1)
  * grok-pull(1)
  * git(1)

SUPPORT
-------
Please send support requests to the mailing list::

    http://lists.kernel.org/mailman/listinfo/grokmirror
grokmirror-0.3.5/man/grok-manifest.1000066400000000000000000000106471220344016600173250ustar00rootroot00000000000000
.\" Man page generated from reStructuredText.
.
.TH GROK-MANIFEST 1 "2013-04-26" "0.3" ""
.SH NAME
GROK-MANIFEST \- Create manifest for use with grokmirror
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.SH SYNOPSIS
.INDENT 0.0
.INDENT 3.5
grok\-manifest [opts] \-m manifest.js[.gz] \-t /path [/path/to/bare.git]
.UNINDENT
.UNINDENT
.SH DESCRIPTION
.sp
Call grok\-manifest from a git post\-update or post\-receive hook to create
the latest repository manifest. This manifest file is downloaded by
mirror slaves (if newer than what they already have) and used to clone
or pull only the repositories that have changed since the mirror\(aqs
last run.
.SH OPTIONS
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.B \-\-version
show program\(aqs version number and exit
.TP
.B \-h\fP,\fB \-\-help
show this help message and exit
.TP
.BI \-m \ MANIFILE\fP,\fB \ \-\-manifest\fB= MANIFILE
Location of manifest.js or manifest.js.gz
.TP
.BI \-t \ TOPLEVEL\fP,\fB \ \-\-toplevel\fB= TOPLEVEL
Top dir where all repositories reside
.TP
.BI \-l \ LOGFILE\fP,\fB \ \-\-logfile\fB= LOGFILE
When specified, will put debug logs in this location
.TP
.B \-c\fP,\fB \-\-check\-export\-ok
Honor the git\-daemon\-export\-ok magic file and do not export
repositories not marked as such
.TP
.B \-n\fP,\fB \-\-use\-now
Use current timestamp instead of parsing commits
.TP
.B \-p\fP,\fB \-\-purge
Purge deleted git repositories from manifest
.TP
.B \-x\fP,\fB \-\-remove
Remove repositories passed as arguments from the manifest file
.TP
.B \-y\fP,\fB \-\-pretty
Pretty\-print the generated manifest (sort repos and add indentation).
This is much slower, so it should be used with caution on large
collections.
.TP
.B \-w\fP,\fB \-\-wait\-for\-manifest
When running with arguments, wait if manifest is not there (can be
useful when multiple writers are writing to the manifest file via NFS)
.TP
.BI \-i \ IGNORE\fP,\fB \ \-\-ignore\-paths\fB= IGNORE
When finding git dirs, ignore these paths (can be used multiple times,
accepts shell\-style globbing)
.TP
.B \-v\fP,\fB \-\-verbose
Be verbose and tell us what you are doing
.UNINDENT
.UNINDENT
.UNINDENT
.SH EXAMPLES
.sp
The examples assume that the repositories are located in /repos. If your
repositories are in \fB/var/lib/git\fP, adjust both the \fB\-m\fP and
\fB\-t\fP flags accordingly.
.sp
Initial manifest generation:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
/usr/bin/grok\-manifest \-m /repos/manifest.js.gz \-t /repos
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Inside the git hook:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
/usr/bin/grok\-manifest \-m /repos/manifest.js.gz \-t /repos \-n \(gapwd\(ga
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
To purge deleted repositories, use the \fB\-p\fP flag when running from
cron:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
/usr/bin/grok\-manifest \-m /repos/manifest.js.gz \-t /repos \-p
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
You can also add it to gitolite\(aqs "rm" ADC using the \fB\-x\fP flag:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
/usr/bin/grok\-manifest \-m /repos/manifest.js.gz \-t /repos \-x $repo.git
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
To troubleshoot potential problems, you can pass the \fB\-l\fP parameter
to grok\-manifest; just make sure the user executing the hook command
(user git or gitolite, for example) is able to write to that location:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
/usr/bin/grok\-manifest \-m /repos/manifest.js.gz \-t /repos \e
    \-l /var/log/git/grok\-manifest\-hook.log \-n \(gapwd\(ga
.ft P
.fi
.UNINDENT
.UNINDENT
.SH SEE ALSO
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
grok\-pull(1)
.IP \(bu 2
git(1)
.UNINDENT
.UNINDENT
.UNINDENT
.SH SUPPORT
.sp
Please send support requests to the mailing list:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
http://lists.kernel.org/mailman/listinfo/grokmirror
.ft P
.fi
.UNINDENT
.UNINDENT
.SH AUTHOR
mricon@kernel.org

License: GPLv3+
.SH COPYRIGHT
The Linux Foundation and contributors
.\" Generated by docutils manpage writer.
.
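To illustrate how a mirror could use the manifest to clone or pull only what changed, here is a small Python sketch. It assumes, without confirmation from this page, that the manifest is JSON (gzip-compressed when the name ends in .gz) mapping each repository path to a dict carrying a numeric "modified" timestamp. The names load_manifest and repos_to_update are hypothetical, not grokmirror's API.

```python
import gzip
import json


def load_manifest(path):
    """Read manifest.js or manifest.js.gz into a dict (JSON layout is assumed)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def repos_to_update(remote, local):
    """Return (to_clone, to_pull) by comparing per-repo 'modified' timestamps.

    to_clone: repos present upstream but missing locally.
    to_pull:  repos whose upstream timestamp is newer than the local copy's.
    """
    to_clone, to_pull = [], []
    for repo, info in sorted(remote.items()):
        if repo not in local:
            to_clone.append(repo)
        elif info.get("modified", 0) > local[repo].get("modified", 0):
            to_pull.append(repo)
    return to_clone, to_pull
```

Comparing timestamps per repository keeps each run cheap: unchanged repositories are skipped without contacting git at all.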
grokmirror-0.3.5/man/grok-manifest.1.rst000066400000000000000000000066011220344016600201270ustar00rootroot00000000000000
GROK-MANIFEST
=============
---------------------------------------
Create manifest for use with grokmirror
---------------------------------------

:Author: mricon@kernel.org
:Date: 2013-04-26
:Copyright: The Linux Foundation and contributors
:License: GPLv3+
:Version: 0.3
:Manual section: 1

SYNOPSIS
--------
grok-manifest [opts] -m manifest.js[.gz] -t /path [/path/to/bare.git]

DESCRIPTION
-----------
Call grok-manifest from a git post-update or post-receive hook to create
the latest repository manifest. This manifest file is downloaded by
mirror slaves (if newer than what they already have) and used to clone
or pull only the repositories that have changed since the mirror's last
run.

OPTIONS
-------
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -m MANIFILE, --manifest=MANIFILE
                        Location of manifest.js or manifest.js.gz
  -t TOPLEVEL, --toplevel=TOPLEVEL
                        Top dir where all repositories reside
  -l LOGFILE, --logfile=LOGFILE
                        When specified, will put debug logs in this location
  -c, --check-export-ok
                        Honor the git-daemon-export-ok magic file and do not
                        export repositories not marked as such
  -n, --use-now         Use current timestamp instead of parsing commits
  -p, --purge           Purge deleted git repositories from manifest
  -x, --remove          Remove repositories passed as arguments from the
                        manifest file
  -y, --pretty          Pretty-print the generated manifest (sort repos and
                        add indentation). This is much slower, so it should
                        be used with caution on large collections.
  -w, --wait-for-manifest
                        When running with arguments, wait if manifest is not
                        there (can be useful when multiple writers are
                        writing to the manifest file via NFS)
  -i IGNORE, --ignore-paths=IGNORE
                        When finding git dirs, ignore these paths (can be
                        used multiple times, accepts shell-style globbing)
  -v, --verbose         Be verbose and tell us what you are doing

EXAMPLES
--------
The examples assume that the repositories are located in /repos. If your
repositories are in ``/var/lib/git``, adjust both the ``-m`` and ``-t``
flags accordingly.

Initial manifest generation::

    /usr/bin/grok-manifest -m /repos/manifest.js.gz -t /repos

Inside the git hook::

    /usr/bin/grok-manifest -m /repos/manifest.js.gz -t /repos -n `pwd`

To purge deleted repositories, use the ``-p`` flag when running from cron::

    /usr/bin/grok-manifest -m /repos/manifest.js.gz -t /repos -p

You can also add it to gitolite's "rm" ADC using the ``-x`` flag::

    /usr/bin/grok-manifest -m /repos/manifest.js.gz -t /repos -x $repo.git

To troubleshoot potential problems, you can pass the ``-l`` parameter to
grok-manifest; just make sure the user executing the hook command (user
git or gitolite, for example) is able to write to that location::

    /usr/bin/grok-manifest -m /repos/manifest.js.gz -t /repos \
        -l /var/log/git/grok-manifest-hook.log -n `pwd`

SEE ALSO
--------
  * grok-pull(1)
  * git(1)

SUPPORT
-------
Please send support requests to the mailing list::

    http://lists.kernel.org/mailman/listinfo/grokmirror
grokmirror-0.3.5/man/grok-pull.1000066400000000000000000000063141220344016600164670ustar00rootroot00000000000000
.\" Man page generated from reStructuredText.
.
.TH GROK-PULL 1 "2013-04-26" "0.3" ""
.SH NAME
GROK-PULL \- Clone or update local git repositories
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.SH SYNOPSIS
.INDENT 0.0
.INDENT 3.5
grok\-pull \-c /path/to/repos.conf
.UNINDENT
.UNINDENT
.SH DESCRIPTION
.sp
This utility runs from a cronjob and downloads the latest manifest from
the grokmirror master. If there are new repositories or changes in the
existing repositories, grok\-pull will perform the necessary git commands
to clone or fetch the required data from the master.
.sp
At the end of its run, grok\-pull will generate its own manifest file,
which can then be used for further mirroring.
.SH OPTIONS
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.B \-\-version
show program\(aqs version number and exit
.TP
.B \-h\fP,\fB \-\-help
show this help message and exit
.TP
.B \-v\fP,\fB \-\-verbose
Be verbose and tell us what you are doing
.TP
.B \-n\fP,\fB \-\-no\-mtime\-check
Run without checking manifest mtime.
.TP
.B \-f\fP,\fB \-\-force
Force full git update regardless of last\-modified times. Also useful
when repos.conf has changed.
.TP
.B \-p\fP,\fB \-\-purge
Remove any git trees that are no longer in manifest.
.TP
.B \-y\fP,\fB \-\-pretty
Pretty\-print the generated manifest (sort repos and add indentation).
This is much slower, so it should be used with caution on large
collections.
.TP
.B \-r\fP,\fB \-\-reuse\-existing\-repos
If any existing repositories are found on disk, set a new remote origin
and reuse them.
.TP
.BI \-c \ CONFIG\fP,\fB \ \-\-config\fB= CONFIG
Location of repos.conf
.UNINDENT
.UNINDENT
.UNINDENT
.SH EXAMPLES
.sp
Locate repos.conf and modify it to reflect your needs. The default
configuration file is heavily commented.
.sp
Add a cronjob to run as frequently as you like.
For example, add the following to \fB/etc/cron.d/grokmirror.cron\fP:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
# Run grok\-pull every minute as user "mirror"
* * * * * mirror /usr/bin/grok\-pull \-p \-c /etc/grokmirror/repos.conf
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Make sure the user "mirror" (or whichever user you specified) is able to
write to the toplevel, log and lock locations specified in repos.conf.
.SH SEE ALSO
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
grok\-manifest(1)
.IP \(bu 2
grok\-fsck(1)
.IP \(bu 2
git(1)
.UNINDENT
.UNINDENT
.UNINDENT
.SH SUPPORT
.sp
Please send support requests to the mailing list:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
http://lists.kernel.org/mailman/listinfo/grokmirror
.ft P
.fi
.UNINDENT
.UNINDENT
.SH AUTHOR
mricon@kernel.org

License: GPLv3+
.SH COPYRIGHT
The Linux Foundation and contributors
.\" Generated by docutils manpage writer.
.
grokmirror-0.3.5/man/grok-pull.1.rst000066400000000000000000000045731220344016600173030ustar00rootroot00000000000000
GROK-PULL
=========
--------------------------------------
Clone or update local git repositories
--------------------------------------

:Author: mricon@kernel.org
:Date: 2013-04-26
:Copyright: The Linux Foundation and contributors
:License: GPLv3+
:Version: 0.3
:Manual section: 1

SYNOPSIS
--------
grok-pull -c /path/to/repos.conf

DESCRIPTION
-----------
This utility runs from a cronjob and downloads the latest manifest from
the grokmirror master. If there are new repositories or changes in the
existing repositories, grok-pull will perform the necessary git commands
to clone or fetch the required data from the master.

At the end of its run, grok-pull will generate its own manifest file,
which can then be used for further mirroring.

OPTIONS
-------
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -v, --verbose         Be verbose and tell us what you are doing
  -n, --no-mtime-check  Run without checking manifest mtime.
  -f, --force           Force full git update regardless of last-modified
                        times. Also useful when repos.conf has changed.
  -p, --purge           Remove any git trees that are no longer in manifest.
  -y, --pretty          Pretty-print the generated manifest (sort repos and
                        add indentation). This is much slower, so it should
                        be used with caution on large collections.
  -r, --reuse-existing-repos
                        If any existing repositories are found on disk, set
                        a new remote origin and reuse them.
  -c CONFIG, --config=CONFIG
                        Location of repos.conf

EXAMPLES
--------
Locate repos.conf and modify it to reflect your needs. The default
configuration file is heavily commented.

Add a cronjob to run as frequently as you like. For example, add the
following to ``/etc/cron.d/grokmirror.cron``::

    # Run grok-pull every minute as user "mirror"
    * * * * * mirror /usr/bin/grok-pull -p -c /etc/grokmirror/repos.conf

Make sure the user "mirror" (or whichever user you specified) is able to
write to the toplevel, log and lock locations specified in repos.conf.

SEE ALSO
--------
  * grok-manifest(1)
  * grok-fsck(1)
  * git(1)

SUPPORT
-------
Please send support requests to the mailing list::

    http://lists.kernel.org/mailman/listinfo/grokmirror
grokmirror-0.3.5/python-grokmirror.spec000066400000000000000000000052271220344016600203100ustar00rootroot00000000000000
%if 0%{?fedora} > 12
%global with_python3 0
%else
%{!?python_sitelib: %global python_sitelib %(%{__python} -c "from distutils.sysconfig import get_python_lib; print get_python_lib()")}
%endif

Name: python-grokmirror
Version: 0.3.5
Release: 1%{?dist}
Summary: Framework to smartly mirror git repositories

License: GPLv3+
URL: https://git.kernel.org/cgit/utils/grokmirror/grokmirror.git
Source0: https://www.kernel.org/pub/software/network/grokmirror/grokmirror-%{version}.tar.xz

BuildArch: noarch
BuildRequires: python2-devel, python-setuptools
Requires: GitPython

%description
Grokmirror was written to make mirroring large git repository collections
more efficient.
Grokmirror uses the manifest file published by the master mirror in order
to figure out which repositories to clone, and to track which
repositories require updating. The process is extremely lightweight and
efficient both for the master and for the mirrors.

%prep
%setup -q -n grokmirror-%{version}

%build
%{__python} setup.py build

%install
rm -rf %{buildroot}
%{__python} setup.py install -O1 --skip-build --root %{buildroot}
%{__mkdir_p} -m 0755 \
    %{buildroot}%{_bindir} \
    %{buildroot}%{_mandir}/man1

%{__install} -m 0755 grok-manifest.py %{buildroot}/%{_bindir}/grok-manifest
%{__install} -m 0755 grok-pull.py %{buildroot}/%{_bindir}/grok-pull
%{__install} -m 0755 grok-fsck.py %{buildroot}/%{_bindir}/grok-fsck
%{__install} -m 0755 grok-dumb-pull.py %{buildroot}/%{_bindir}/grok-dumb-pull
%{__install} -m 0644 man/*.1 %{buildroot}/%{_mandir}/man1/

%files
%doc README.rst COPYING repos.conf fsck.conf
%{python_sitelib}/grokmirror/
%{python_sitelib}/*.egg-info
%{_bindir}/grok-*
%{_mandir}/*/*

%changelog
* Fri Aug 16 2013 Konstantin Ryabitsev - 0.3.5-1
- Update to 0.3.5 containing minor feature enhancements

* Mon Jun 14 2013 Konstantin Ryabitsev - 0.3.4-1
- Update to 0.3.4 containing minor bugfixes

* Mon May 27 2013 Konstantin Ryabitsev - 0.3.3-1
- Update to 0.3.3 containing bugfixes and new features

* Mon May 13 2013 Konstantin Ryabitsev - 0.3.2-1
- Update to 0.3.2 containing important bugfixes and minor new features

* Mon May 13 2013 Konstantin Ryabitsev - 0.3.1-1
- Update to 0.3.1 containing important bugfixes

* Mon May 06 2013 Konstantin Ryabitsev - 0.3-1
- Preparing for 0.3 with new features.

* Thu Apr 25 2013 Konstantin Ryabitsev - 0.2-1
- Version 0.2 with new features and manpages.

* Wed Apr 03 2013 Konstantin Ryabitsev - 0.1-1
- Initial packaging
grokmirror-0.3.5/repos.conf000066400000000000000000000063011220344016600157110ustar00rootroot00000000000000
# You can pull from multiple grok mirrors; just create
# a separate section for each mirror. The name can be anything.
[kernel.org]
# The host part of the mirror you're pulling from.
#site = git://git.kernel.org
site = git://git.kernel.org
#
# Where the grok manifest is published. The following protocols
# are supported at this time:
#   http:// or https:// using If-Modified-Since http header
#   file:// (when manifest file is on NFS, for example)
#manifest = http://git.kernel.org/manifest.js.gz
manifest = http://git.kernel.org/manifest.js.gz
#
# Where are we going to put the mirror on our disk?
#toplevel = /var/lib/git/mirror
toplevel = /var/lib/git/mirror
#
# Where do we store our own manifest? Usually in the toplevel.
#mymanifest = /var/lib/git/mirror/manifest.js.gz
mymanifest = /var/lib/git/mirror/manifest.js.gz
#
# Write out projects.list that can be used by gitweb or cgit.
# Leave blank if you don't want a projects.list.
#projectslist = /var/lib/git/mirror/projects.list
projectslist = /var/lib/git/mirror/projects.list
#
# When generating projects.list, start at this subpath instead
# of at the toplevel. Useful when mirroring kernel or when generating
# multiple gitweb/cgit configurations for the same tree.
#projectslist_trimtop = /pub/scm/
projectslist_trimtop = /pub/scm/
#
# When generating projects.list, also create entries for symlinks.
# Otherwise we assume they are just legacy and keep them out of
# web interfaces.
#projectslist_symlinks = yes
projectslist_symlinks = no
#
# A simple hook to execute whenever a repository is modified.
# It passes the full path to the git repository modified as the only
# argument.
#post_update_hook = /usr/local/bin/make-git-fairies-appear
post_update_hook =
#
# If owner is not specified in the manifest, who should be listed
# as the default owner in tools like gitweb or cgit?
#default_owner = Grokmirror User
default_owner = Grokmirror User
#
# Where do we put the logs?
#log = /var/log/mirror/kernelorg.log
log = /var/log/mirror/kernelorg.log
#
# Log level can be "info" or "debug"
#loglevel = info
loglevel = info
#
# To prevent multiple grok-pull instances from running at the same
# time, we first obtain an exclusive lock.
#lock = /var/lock/mirror/kernelorg.lock
lock = /var/lock/mirror/kernelorg.lock
#
# To speed up updates, grok-pull will use multiple threads. Please be
# considerate of the mirror you're pulling from and don't set this very
# high. You may also run into per-IP multiple session limits, so leave this
# number at a nice low setting.
#pull_threads = 5
pull_threads = 5
#
# Use shell globbing to list the repositories you would like to mirror.
# If you want to mirror everything, just say "*". Separate multiple entries
# with newline plus tab. Examples:
#
# mirror everything:
#include = *
#
# mirror just the main kernel sources:
#include = /pub/scm/linux/kernel/git/torvalds/linux.git
#          /pub/scm/linux/kernel/git/stable/linux-stable.git
#          /pub/scm/linux/kernel/git/next/linux-next.git
#
# mirror just git:
#include = /pub/scm/git/*
include = *
#
# This is processed after the include. Use it to exclude specific
# entries from an all-inclusive glob above. E.g., to exclude all linux-2.4
# git sources:
#exclude = */linux-2.4*
exclude =
grokmirror-0.3.5/setup.py000066400000000000000000000013621220344016600154260ustar00rootroot00000000000000
#!/usr/bin/python -tt
import os
from distutils.core import setup

# Utility function to read the README file.
# Used for the long_description. It's nice, because now 1) we have a top level
# README file and 2) it's easier to type in the README file than to put a raw
# string in below ...
def read(fname):
    # Use a context manager so the file handle is closed after reading
    with open(os.path.join(os.path.dirname(__file__), fname)) as fh:
        return fh.read()

VERSION='0.3.5'
NAME='grokmirror'

setup(
    version=VERSION,
    url='https://www.kernel.org/pub/software/network/grokmirror',
    name=NAME,
    description='Smartly mirror git repositories that use grokmirror',
    author='Konstantin Ryabitsev',
    author_email='mricon@kernel.org',
    packages=[NAME],
    license='GPLv3+',
    long_description=read('README.rst'),
)
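The include and exclude settings in repos.conf above are shell globs: include is applied first and exclude second. The Python sketch below shows one way to do that filtering with the standard configparser module, whose indented continuation lines become newline-separated values and so match the "newline plus tab" convention, together with fnmatch. The function names are hypothetical and this is not grok-pull's actual code. Note that fnmatch's * also matches /, so include = * covers nested paths.

```python
import configparser
import fnmatch


def parse_globs(value):
    """Split a (possibly multi-line) config value into a list of glob patterns."""
    return [line.strip() for line in value.splitlines() if line.strip()]


def load_filters(path, section):
    """Read include/exclude globs for one mirror section of a repos.conf-style file."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read(path)
    sec = cfg[section]
    return parse_globs(sec.get("include", "*")), parse_globs(sec.get("exclude", ""))


def is_wanted(repo, include, exclude):
    """Apply include first, then drop anything that also matches exclude."""
    if not any(fnmatch.fnmatch(repo, pat) for pat in include):
        return False
    return not any(fnmatch.fnmatch(repo, pat) for pat in exclude)
```

With the shipped defaults (include = * and an empty exclude), every repository path would be wanted.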