grokmirror-2.0.11/.gitignore
============================

.idea
*.pyc
*.swp
*.pdf
*~
dist
build
*.egg-info

grokmirror-2.0.11/CHANGELOG.rst
===============================

v2.0.11 (2021-08-06)
--------------------

- Hotfix for a pull_threads bug causing any configured value to result in
  1 thread.

v2.0.10 (2021-07-27)
--------------------

- Roll back new hook inclusion (will go into 2.1.0)
- Roll back code cleanups (will go into 2.1.0)
- Fix a grok-fsck regression introduced in 2.0.9
- Fix pull_threads auto-detection on single-cpu systems

v2.0.9 (2021-07-13)
-------------------

- Add initial support for post_clone_complete_hook, which fires only after
  all new clones have been completed.
- Fix a grok-manifest traceback due to unicode errors in the repo
  description file.
- Minor code cleanups.

v2.0.8 (2021-03-11)
-------------------

- Fixes around symlink handling in manifest files. Adding and deleting
  symlinks should properly work again.
- Don't require an [fsck] section in the config file (though you'd almost
  always want it there).

v2.0.7 (2021-01-19)
-------------------

- A slew of small fixes improving performance on very large repository
  collections (CAF internally is 32,500).

v2.0.6 (2021-01-07)
-------------------

- Use fsck.extra_repack_flags when doing quick post-clone repacks
- Store objects in objstore after a grok-dumb-pull call on a repo that
  uses objstore repositories

v2.0.5 (2020-11-25)
-------------------

- Prioritize baseline repositories when finding related objstore repos.
- Minor fixes.

v2.0.4 (2020-11-06)
-------------------

- Add support for using git plumbing for objstore operations, enabled via
  core.objstore_uses_plumbing.
  This significantly speeds up fetching objects into objstore during pull
  operations. Fsck operations will continue to use the porcelain
  "git fetch", since speed is less important in those cases and it's best
  to opt for maximum safety. As a benchmark, with the
  remote.preload_bundle_url and core.objstore_uses_plumbing settings
  enabled, cloning a full replica of git.kernel.org takes less than an
  hour, as opposed to over a day.

v2.0.3 (2020-11-04)
-------------------

- Refuse to delete ffonly repos
- Add a new experimental bundle_preload feature for generating objstore
  repo bundles and using them to preload objstores on the mirrors

v2.0.2 (2020-10-06)
-------------------

- Provide the pi-piper utility for piping new messages from public-inbox
  repositories. It can be specified as post_update_hook::

      post_update_hook = /usr/bin/grok-pi-piper -c ~/.config/pi-piper.conf

- Add the -r option to grok-manifest to ignore specific refs when
  calculating the repository fingerprint. This is mostly useful for
  mirroring from gerrit.

v2.0.1 (2020-09-30)
-------------------

- Fix potential corruption when migrating repositories with existing
  alternates to the new object storage format
- Improve grok-fsck console output to be less misleading for large repo
  collections (it was misreporting objstore/total repo numbers)
- Use a faster repo search algorithm that doesn't needlessly recurse into
  git repos themselves, once found

v2.0.0 (2020-09-21)
-------------------

Major rewrite to improve shared object storage and replication for VERY
LARGE repository collections (codeaurora.org is ~30,000 repositories,
mostly various forks of Android). See UPGRADING.rst for the upgrade
strategy. Below are some major highlights.

- Drop support for python < 3.6
- Introduce "object storage" repositories that benefit from git-pack
  delta islands and improve the overall disk storage footprint (depending
  on the number of forks).
- Drop the dependency on GitPython; use git calls directly for all
  operations
- Remove progress bars to slim down dependencies (drops enlighten)
- Make grok-pull operate in daemon mode (with -o) (see contrib for
  systemd unit files). This is more efficient than the cron mode when run
  very frequently.
- Provide a socket listener for pubsub push updates (see contrib for
  Google pubsubv1.py).
- Merge fsck.conf and repos.conf into a single config file. This requires
  creating a new configuration file after the upgrade. See UPGRADING.rst
  for details.
- Record and propagate HEAD position using the manifest file.
- Add a grok-bundle command to create clone.bundle files for
  CDN-offloaded cloning (mostly used by Android's repo command).
- Add an SELinux policy for EL7 (see contrib).

v1.2.2 (2019-10-23)
-------------------

- Small bugfixes
- Generate a commit-graph file if the version of git is new enough to
  support it. This is done during grok-fsck any time we decide that the
  repository needs to be repacked. You can force this off by setting
  commitgraph=never in the config.

v1.2.1 (2019-03-11)
-------------------

- Minor feature improvement changing how precious=yes works. Grokmirror
  will now turn preciousObjects off for the duration of the repack. We
  still protect shared repositories against inadvertent object pruning by
  outside processes, but this allows us to clean up loose objects and
  obsolete packs. To have the 1.2.0 behaviour back, set precious=always,
  but it is only really useful in very rare cases.

v1.2.0 (2019-02-14)
-------------------

- Make sure to set gc.auto=0 on repositories to avoid pruning repos that
  are acting as alternates to others. We run our own prune during fsck,
  so there is no need to auto-gc, ever (unless you didn't set up
  grok-fsck, in which case you're not doing it right).
- Rework the repack code to be more clever -- instead of repacking based
  purely on dates, we now track the number of loose objects and the
  number of generated packs.
  Many of the settings are hardcoded for the moment while testing, but
  will probably end up settable via global and per-repository config
  settings.
- The following fsck.conf settings no longer have any effect:

  - repack_flags (replaced with extra_repack_flags)
  - full_repack_flags (replaced with extra_repack_flags_full)
  - full_repack_every (we now figure it out ourselves)

- Move git command invocation routines into a central function to reduce
  the amount of code duplication. You can also set the path to the git
  binary using the GITBIN env variable or by simply adding it to your
  path.
- Add a "reclone_on_errors" setting in fsck.conf. If fsck/repack/prune
  comes across a matching error, it will mark the repository for
  recloning, and it will be cloned anew from the master the next time
  grok-pull runs. This is useful for auto-correcting corruption on the
  mirrors. You can also manually request a reclone by creating a
  "grokmirror.reclone" file in a repository.
- Set extensions.preciousObjects for repositories used with git
  alternates if precious=yes is set in fsck.conf. This helps further
  protect shared repos from erroneous pruning (e.g. done manually by an
  administrator).

v1.1.1 (2018-07-25)
-------------------

- Quickfix a bug that was causing repositories to never be repacked due
  to miscalculated fingerprints.
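The manual reclone request described in the v1.2.0 notes is just a marker file. A minimal sketch, using a throwaway directory in place of a real bare mirror path (which would be something like a hypothetical /srv/git/linux.git):

```shell
# Demo in a throwaway directory standing in for a bare mirror repository.
REPO=$(mktemp -d)

# Creating this marker asks grokmirror to treat the repository as corrupt
# and clone it anew from the master on the next grok-pull run.
touch "$REPO/grokmirror.reclone"

ls "$REPO"
```

The next grok-pull run notices the marker and re-fetches the repository from scratch, which is the same path taken when reclone_on_errors trips automatically.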
v1.1.0 (2018-04-24)
-------------------

- Make Python3 compatible (thanks to QuLogic for most of the work)
- Rework grok-fsck to improve functionality:

  - run repack and prune before fsck, for optimal safety
  - add a --connectivity flag to run fsck with --connectivity-only
  - add --repack-all-quick to trigger a quick repack of all repos
  - add --repack-all-full to trigger a full repack of all repositories
    using the defined full_repack_flags from fsck.conf
  - always run fsck with --no-dangling, because mirror admins are not
    responsible for cleaning those up anyway
  - no longer lock repos when running repack/prune/fsck, because these
    operations are safe as long as they are done by git itself
  - fix grok-pull so it no longer purges repos that are providing
    alternates to others
  - fix grok-fsck so it's more paranoid when pruning repos providing
    alternates to others (it checks all repos on disk, not just the
    manifest)
  - in verbose mode, most commands will draw progress bars (handy with
    very large collections of repositories)

grokmirror-2.0.11/LICENSE.txt
=============================

GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007

Copyright (C) 2007 Free Software Foundation, Inc.

Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

Preamble

The GNU General Public License is a free, copyleft license for software and other kinds of works.

The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change all versions of a program--to make sure it remains free software for all its users. We, the Free Software Foundation, use the GNU General Public License for most of our software; it applies also to any other work released this way by its authors. You can apply it to your programs, too.
When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for them if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs, and that you know you can do these things.

To protect your rights, we need to prevent others from denying you these rights or asking you to surrender the rights. Therefore, you have certain responsibilities if you distribute copies of the software, or if you modify it: responsibilities to respect the freedom of others.

For example, if you distribute copies of such a program, whether gratis or for a fee, you must pass on to the recipients the same freedoms that you received. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.

Developers that use the GNU GPL protect your rights with two steps: (1) assert copyright on the software, and (2) offer you this License giving you legal permission to copy, distribute and/or modify it.

For the developers' and authors' protection, the GPL clearly explains that there is no warranty for this free software. For both users' and authors' sake, the GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions.

Some devices are designed to deny users access to install or run modified versions of the software inside them, although the manufacturer can do so. This is fundamentally incompatible with the aim of protecting users' freedom to change the software. The systematic pattern of such abuse occurs in the area of products for individuals to use, which is precisely where it is most unacceptable. Therefore, we have designed this version of the GPL to prohibit the practice for those products.
If such problems arise substantially in other domains, we stand ready to extend this provision to those domains in future versions of the GPL, as needed to protect the freedom of users.

Finally, every program is threatened constantly by software patents. States should not allow patents to restrict development and use of software on general-purpose computers, but in those that do, we wish to avoid the special danger that patents applied to a free program could make it effectively proprietary. To prevent this, the GPL assures that patents cannot be used to render the program non-free.

The precise terms and conditions for copying, distribution and modification follow.

TERMS AND CONDITIONS

0. Definitions.

"This License" refers to version 3 of the GNU General Public License.

"Copyright" also means copyright-like laws that apply to other kinds of works, such as semiconductor masks.

"The Program" refers to any copyrightable work licensed under this License. Each licensee is addressed as "you". "Licensees" and "recipients" may be individuals or organizations.

To "modify" a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. The resulting work is called a "modified version" of the earlier work or a work "based on" the earlier work.

A "covered work" means either the unmodified Program or a work based on the Program.

To "propagate" a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well.

To "convey" a work means any kind of propagation that enables other parties to make or receive copies.
Mere interaction with a user through a computer network, with no transfer of a copy, is not conveying.

An interactive user interface displays "Appropriate Legal Notices" to the extent that it includes a convenient and prominently visible feature that (1) displays an appropriate copyright notice, and (2) tells the user that there is no warranty for the work (except to the extent that warranties are provided), that licensees may convey the work under this License, and how to view a copy of this License. If the interface presents a list of user commands or options, such as a menu, a prominent item in the list meets this criterion.

1. Source Code.

The "source code" for a work means the preferred form of the work for making modifications to it. "Object code" means any non-source form of a work.

A "Standard Interface" means an interface that either is an official standard defined by a recognized standards body, or, in the case of interfaces specified for a particular programming language, one that is widely used among developers working in that language.

The "System Libraries" of an executable work include anything, other than the work as a whole, that (a) is included in the normal form of packaging a Major Component, but which is not part of that Major Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. A "Major Component", in this context, means a major essential component (kernel, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it.

The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities.
However, it does not include the work's System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work. For example, Corresponding Source includes interface definition files associated with source files for the work, and the source code for shared libraries and dynamically linked subprograms that the work is specifically designed to require, such as by intimate data communication or control flow between those subprograms and other parts of the work.

The Corresponding Source need not include anything that users can regenerate automatically from other parts of the Corresponding Source.

The Corresponding Source for a work in source code form is that same work.

2. Basic Permissions.

All rights granted under this License are granted for the term of copyright on the Program, and are irrevocable provided the stated conditions are met. This License explicitly affirms your unlimited permission to run the unmodified Program. The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work. This License acknowledges your rights of fair use or other equivalent, as provided by copyright law.

You may make, run and propagate covered works that you do not convey, without conditions so long as your license otherwise remains in force. You may convey covered works to others for the sole purpose of having them make modifications exclusively for you, or provide you with facilities for running those works, provided that you comply with the terms of this License in conveying all material for which you do not control copyright. Those thus making or running the covered works for you must do so exclusively on your behalf, under your direction and control, on terms that prohibit them from making any copies of your copyrighted material outside their relationship with you.
Conveying under any other circumstances is permitted solely under the conditions stated below. Sublicensing is not allowed; section 10 makes it unnecessary.

3. Protecting Users' Legal Rights From Anti-Circumvention Law.

No covered work shall be deemed part of an effective technological measure under any applicable law fulfilling obligations under article 11 of the WIPO copyright treaty adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention of such measures.

When you convey a covered work, you waive any legal power to forbid circumvention of technological measures to the extent such circumvention is effected by exercising rights under this License with respect to the covered work, and you disclaim any intention to limit operation or modification of the work as a means of enforcing, against the work's users, your or third parties' legal rights to forbid circumvention of technological measures.

4. Conveying Verbatim Copies.

You may convey verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice; keep intact all notices stating that this License and any non-permissive terms added in accord with section 7 apply to the code; keep intact all notices of the absence of any warranty; and give all recipients a copy of this License along with the Program.

You may charge any price or no price for each copy that you convey, and you may offer support or warranty protection for a fee.

5. Conveying Modified Source Versions.

You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions:

a) The work must carry prominent notices stating that you modified it, and giving a relevant date.
b) The work must carry prominent notices stating that it is released under this License and any conditions added under section 7. This requirement modifies the requirement in section 4 to "keep intact all notices".

c) You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it.

d) If the work has interactive user interfaces, each must display Appropriate Legal Notices; however, if the Program has interactive interfaces that do not display Appropriate Legal Notices, your work need not make them do so.

A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an "aggregate" if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate.

6. Conveying Non-Source Forms.

You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways:

a) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange.
b) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge.

c) Convey individual copies of the object code with a copy of the written offer to provide the Corresponding Source. This alternative is allowed only occasionally and noncommercially, and only if you received the object code with such an offer, in accord with subsection 6b.

d) Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements.

e) Convey the object code using peer-to-peer transmission, provided you inform other peers where the object code and Corresponding Source of the work are being offered to the general public at no charge under subsection 6d.
A separable portion of the object code, whose source code is excluded from the Corresponding Source as a System Library, need not be included in conveying the object code work.

A "User Product" is either (1) a "consumer product", which means any tangible personal property which is normally used for personal, family, or household purposes, or (2) anything designed or sold for incorporation into a dwelling. In determining whether a product is a consumer product, doubtful cases shall be resolved in favor of coverage. For a particular product received by a particular user, "normally used" refers to a typical or common use of that class of product, regardless of the status of the particular user or of the way in which the particular user actually uses, or expects or is expected to use, the product. A product is a consumer product regardless of whether the product has substantial commercial, industrial or non-consumer uses, unless such uses represent the only significant mode of use of the product.

"Installation Information" for a User Product means any methods, procedures, authorization keys, or other information required to install and execute modified versions of a covered work in that User Product from a modified version of its Corresponding Source. The information must suffice to ensure that the continued functioning of the modified object code is in no case prevented or interfered with solely because modification has been made.

If you convey an object code work under this section in, or with, or specifically for use in, a User Product, and the conveying occurs as part of a transaction in which the right of possession and use of the User Product is transferred to the recipient in perpetuity or for a fixed term (regardless of how the transaction is characterized), the Corresponding Source conveyed under this section must be accompanied by the Installation Information.
But this requirement does not apply if neither you nor any third party retains the ability to install modified object code on the User Product (for example, the work has been installed in ROM).

The requirement to provide Installation Information does not include a requirement to continue to provide support service, warranty, or updates for a work that has been modified or installed by the recipient, or for the User Product in which it has been modified or installed. Access to a network may be denied when the modification itself materially and adversely affects the operation of the network or violates the rules and protocols for communication across the network.

Corresponding Source conveyed, and Installation Information provided, in accord with this section must be in a format that is publicly documented (and with an implementation available to the public in source code form), and must require no special password or key for unpacking, reading or copying.

7. Additional Terms.

"Additional permissions" are terms that supplement the terms of this License by making exceptions from one or more of its conditions. Additional permissions that are applicable to the entire Program shall be treated as though they were included in this License, to the extent that they are valid under applicable law. If additional permissions apply only to part of the Program, that part may be used separately under those permissions, but the entire Program remains governed by this License without regard to the additional permissions.

When you convey a copy of a covered work, you may at your option remove any additional permissions from that copy, or from any part of it. (Additional permissions may be written to require their own removal in certain cases when you modify the work.) You may place additional permissions on material, added by you to a covered work, for which you have or can give appropriate copyright permission.
Notwithstanding any other provision of this License, for material you add to a covered work, you may (if authorized by the copyright holders of that material) supplement the terms of this License with terms:

a) Disclaiming warranty or limiting liability differently from the terms of sections 15 and 16 of this License; or

b) Requiring preservation of specified reasonable legal notices or author attributions in that material or in the Appropriate Legal Notices displayed by works containing it; or

c) Prohibiting misrepresentation of the origin of that material, or requiring that modified versions of such material be marked in reasonable ways as different from the original version; or

d) Limiting the use for publicity purposes of names of licensors or authors of the material; or

e) Declining to grant rights under trademark law for use of some trade names, trademarks, or service marks; or

f) Requiring indemnification of licensors and authors of that material by anyone who conveys the material (or modified versions of it) with contractual assumptions of liability to the recipient, for any liability that these contractual assumptions directly impose on those licensors and authors.

All other non-permissive additional terms are considered "further restrictions" within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term. If a license document contains a further restriction but permits relicensing or conveying under this License, you may add to a covered work material governed by the terms of that license document, provided that the further restriction does not survive such relicensing or conveying.
If you add terms to a covered work in accord with this section, you must place, in the relevant source files, a statement of the additional terms that apply to those files, or a notice indicating where to find the applicable terms.

Additional terms, permissive or non-permissive, may be stated in the form of a separately written license, or stated as exceptions; the above requirements apply either way.

8. Termination.

You may not propagate or modify a covered work except as expressly provided under this License. Any attempt otherwise to propagate or modify it is void, and will automatically terminate your rights under this License (including any patent licenses granted under the third paragraph of section 11).

However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation.

Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice.

Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, you do not qualify to receive new licenses for the same material under section 10.

9. Acceptance Not Required for Having Copies.

You are not required to accept this License in order to receive or run a copy of the Program.
Ancillary propagation of a covered work occurring solely as a consequence of using peer-to-peer transmission to receive a copy likewise does not require acceptance. However, nothing other than this License grants you permission to propagate or modify any covered work. These actions infringe copyright if you do not accept this License. Therefore, by modifying or propagating a covered work, you indicate your acceptance of this License to do so.

10. Automatic Licensing of Downstream Recipients.

Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License. You are not responsible for enforcing compliance by third parties with this License.

An "entity transaction" is a transaction transferring control of an organization, or substantially all assets of one, or subdividing an organization, or merging organizations. If propagation of a covered work results from an entity transaction, each party to that transaction who receives a copy of the work also receives whatever licenses to the work the party's predecessor in interest had or could give under the previous paragraph, plus a right to possession of the Corresponding Source of the work from the predecessor in interest, if the predecessor has it or can get it with reasonable efforts.

You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License. For example, you may not impose a license fee, royalty, or other charge for exercise of rights granted under this License, and you may not initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging that any patent claim is infringed by making, using, selling, offering for sale, or importing the Program or any portion of it.

11. Patents.

A "contributor" is a copyright holder who authorizes use under this License of the Program or a work on which the Program is based.
The work thus licensed is called the contributor's "contributor version". A contributor's "essential patent claims" are all patent claims owned or controlled by the contributor, whether already acquired or hereafter acquired, that would be infringed by some manner, permitted by this License, of making, using, or selling its contributor version, but do not include claims that would be infringed only as a consequence of further modification of the contributor version. For purposes of this definition, "control" includes the right to grant patent sublicenses in a manner consistent with the requirements of this License. Each contributor grants you a non-exclusive, worldwide, royalty-free patent license under the contributor's essential patent claims, to make, use, sell, offer for sale, import and otherwise run, modify and propagate the contents of its contributor version. In the following three paragraphs, a "patent license" is any express agreement or commitment, however denominated, not to enforce a patent (such as an express permission to practice a patent or covenant not to sue for patent infringement). To "grant" such a patent license to a party means to make such an agreement or commitment not to enforce a patent against the party. If you convey a covered work, knowingly relying on a patent license, and the Corresponding Source of the work is not available for anyone to copy, free of charge and under the terms of this License, through a publicly available network server or other readily accessible means, then you must either (1) cause the Corresponding Source to be so available, or (2) arrange to deprive yourself of the benefit of the patent license for this particular work, or (3) arrange, in a manner consistent with the requirements of this License, to extend the patent license to downstream recipients. 
"Knowingly relying" means you have actual knowledge that, but for the patent license, your conveying the covered work in a country, or your recipient's use of the covered work in a country, would infringe one or more identifiable patents in that country that you have reason to believe are valid. If, pursuant to or in connection with a single transaction or arrangement, you convey, or propagate by procuring conveyance of, a covered work, and grant a patent license to some of the parties receiving the covered work authorizing them to use, propagate, modify or convey a specific copy of the covered work, then the patent license you grant is automatically extended to all recipients of the covered work and works based on it. A patent license is "discriminatory" if it does not include within the scope of its coverage, prohibits the exercise of, or is conditioned on the non-exercise of one or more of the rights that are specifically granted under this License. You may not convey a covered work if you are a party to an arrangement with a third party that is in the business of distributing software, under which you make payment to the third party based on the extent of your activity of conveying the work, and under which the third party grants, to any of the parties who would receive the covered work from you, a discriminatory patent license (a) in connection with copies of the covered work conveyed by you (or copies made from those copies), or (b) primarily for and in connection with specific products or compilations that contain the covered work, unless you entered into that arrangement, or that patent license was granted, prior to 28 March 2007. Nothing in this License shall be construed as excluding or limiting any implied license or other defenses to infringement that may otherwise be available to you under applicable patent law. 12. No Surrender of Others' Freedom. 
If conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all. For example, if you agree to terms that obligate you to collect a royalty for further conveying from those to whom you convey the Program, the only way you could satisfy both those terms and this License would be to refrain entirely from conveying the Program. 13. Use with the GNU Affero General Public License. Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU Affero General Public License into a single combined work, and to convey the resulting work. The terms of this License will continue to apply to the part which is the covered work, but the special requirements of the GNU Affero General Public License, section 13, concerning interaction through a network will apply to the combination as such. 14. Revised Versions of this License. The Free Software Foundation may publish revised and/or new versions of the GNU General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies that a certain numbered version of the GNU General Public License "or any later version" applies to it, you have the option of following the terms and conditions either of that numbered version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the GNU General Public License, you may choose any version ever published by the Free Software Foundation. 
If the Program specifies that a proxy can decide which future versions of the GNU General Public License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Program. Later license versions may give you additional or different permissions. However, no additional obligations are imposed on any author or copyright holder as a result of your choosing to follow a later version. 15. Disclaimer of Warranty. THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. Limitation of Liability. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 17. Interpretation of Sections 15 and 16. 
If the disclaimer of warranty and limitation of liability provided above cannot be given local legal effect according to their terms, reviewing courts shall apply local law that most closely approximates an absolute waiver of all civil liability in connection with the Program, unless a warranty or assumption of liability accompanies a copy of the Program in return for a fee.

END OF TERMS AND CONDITIONS

How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms.

To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively state the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found.

    <one line to give the program's name and a brief idea of what it does.>
    Copyright (C) <year>  <name of author>

    This program is free software: you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by the
    Free Software Foundation, either version 3 of the License, or (at your
    option) any later version.

    This program is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
    General Public License for more details.

    You should have received a copy of the GNU General Public License along
    with this program. If not, see <https://www.gnu.org/licenses/>.

Also add information on how to contact you by electronic and paper mail.

If the program does terminal interaction, make it output a short notice like this when it starts in an interactive mode:

    <program>  Copyright (C) <year>  <name of author>
    This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
    This is free software, and you are welcome to redistribute it
    under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, your program's commands might be different; for a GUI interface, you would use an "about box".

You should also get your employer (if you work as a programmer) or school, if any, to sign a "copyright disclaimer" for the program, if necessary. For more information on this, and how to apply and follow the GNU GPL, see <https://www.gnu.org/licenses/>.

The GNU General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. But first, please read <https://www.gnu.org/licenses/why-not-lgpl.html>.

grokmirror-2.0.11/MANIFEST.in

include LICENSE.txt
include *.rst
include *.conf

grokmirror-2.0.11/README.rst

GROKMIRROR
==========

--------------------------------------------
Framework to smartly mirror git repositories
--------------------------------------------

:Author: konstantin@linuxfoundation.org
:Date: 2020-09-18
:Copyright: The Linux Foundation and contributors
:License: GPLv3+
:Version: 2.0.0

DESCRIPTION
-----------

Grokmirror was written to make replicating large git repository collections more efficient. Grokmirror uses the manifest file published by the origin server in order to figure out which repositories to clone, and to track which repositories require updating. The process is lightweight and efficient both for the primary and for the replicas.

CONCEPTS
--------

The origin server publishes a json-formatted manifest file containing information about all git repositories that it carries.
The format of the manifest file is as follows::

    {
      "/path/to/bare/repository.git": {
        "description": "Repository description",
        "head":        "ref: refs/heads/branchname",
        "reference":   "/path/to/reference/repository.git",
        "forkgroup":   "forkgroup-guid",
        "modified":    timestamp,
        "fingerprint": sha1sum(git show-ref),
        "symlinks": [
          "/location/to/symlink",
          ...
        ],
      }
      ...
    }

The manifest file is usually gzip-compressed to preserve bandwidth. Each time a commit is made to one of the git repositories, it automatically updates the manifest file using an appropriate git hook, so the manifest.js file should always contain the most up-to-date information about the state of all repositories.

The mirroring clients will poll the manifest.js file and download the updated manifest if it is newer than the locally stored copy (using ``Last-Modified`` and ``If-Modified-Since`` http headers). After downloading the updated manifest.js file, the mirrors will parse it to find out which repositories have been updated and which new repositories have been added.

Object Storage Repositories
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Grokmirror 2.0 introduces the concept of "object storage repositories", which aims to optimize how repository forks are stored on disk and served to the cloning clients.

When grok-fsck runs, it will automatically recognize related repositories by analyzing their root commits. If it finds two or more related repositories, it will set up a unified "object storage" repo and fetch all refs from each related repository into it. For example, you can have two forks of linux.git::

    torvalds/linux.git:
      refs/heads/master
      refs/tags/v5.0-rc3
      ...

and its fork::

    maintainer/linux.git:
      refs/heads/master
      refs/heads/devbranch
      refs/tags/v5.0-rc3
      ...

Grok-fsck will set up an object storage repository and fetch all refs from both repositories::

    objstore/[random-guid-name].git
      refs/virtual/[sha1-of-torvalds/linux.git:12]/heads/master
      refs/virtual/[sha1-of-torvalds/linux.git:12]/tags/v5.0-rc3
      ...
      refs/virtual/[sha1-of-maintainer/linux.git:12]/heads/master
      refs/virtual/[sha1-of-maintainer/linux.git:12]/heads/devbranch
      refs/virtual/[sha1-of-maintainer/linux.git:12]/tags/v5.0-rc3
      ...

Then both torvalds/linux.git and maintainer/linux.git will be configured to use objstore/[random-guid-name].git via objects/info/alternates and repacked to just contain metadata and no objects.

The alternates repository will be repacked with "delta islands" enabled, which should help optimize clone operations for each "sibling" repository.

Please see the example grokmirror.conf for more details about configuring objstore repositories.

ORIGIN SETUP
------------

Install grokmirror on the origin server using your preferred way.

**IMPORTANT: Only bare git repositories are supported.**

You will need to add a hook to each one of your repositories that would update the manifest upon repository modification. This can either be a post-receive hook, or a post-update hook. The hook must call the following command::

    /usr/bin/grok-manifest -m /var/www/html/manifest.js.gz \
        -t /var/lib/gitolite3/repositories -n `pwd`

The **-m** flag is the path to the manifest.js file. The git process must be able to write to it and to the directory the file is in (it creates a manifest.js.randomstring file first, and then moves it in place of the old one for atomicity).

The **-t** flag is to help grokmirror trim the irrelevant toplevel disk path, so it is trimmed from the top.

The **-n** flag tells grokmirror to use the current timestamp instead of the exact timestamp of the commit (much faster this way).

Before enabling the hook, you will need to generate the manifest.js of all your git repositories. In order to do that, run the same command, but omit the -n and the \`pwd\` argument. E.g.::

    /usr/bin/grok-manifest -m /var/www/html/manifest.js.gz \
        -t /var/lib/gitolite3/repositories

The last component you need to set up is to automatically purge deleted repositories from the manifest.
As this can't be added to a git hook, you can either run the ``--purge`` command from cron::

    /usr/bin/grok-manifest -m /var/www/html/manifest.js.gz \
        -t /var/lib/gitolite3/repositories -p

Or add it to your gitolite's ``D`` command using the ``--remove`` flag::

    /usr/bin/grok-manifest -m /var/www/html/manifest.js.gz \
        -t /var/lib/gitolite3/repositories -x $repo.git

If you would like grok-manifest to honor the ``git-daemon-export-ok`` magic file and only add to the manifest those repositories specifically marked as exportable, pass the ``--check-export-ok`` flag. See ``git-daemon(1)`` for more info on the ``git-daemon-export-ok`` file.

You will need to have some kind of httpd server to serve the manifest file.

REPLICA SETUP
-------------

Install grokmirror on the replica using your preferred way.

Locate grokmirror.conf and modify it to reflect your needs. The default configuration file is heavily commented to explain what each option does.

Make sure the user "mirror" (or whichever user you specified) is able to write to the toplevel and log locations specified in grokmirror.conf.

You can either run grok-pull manually, from cron, or as a systemd-managed daemon (see contrib). If you do it more frequently than once every few hours, you should definitely run it as a daemon in order to improve performance.

GROK-FSCK
---------

Git repositories should be routinely repacked and checked for corruption. This utility will perform the necessary optimizations and report any problems to the email defined via fsck.report_to ('root' by default). It should run weekly from cron or from the systemd timer (see contrib). Please examine the example grokmirror.conf file for various things you can tweak.

FAQ
---

Why is it called "grok mirror"?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Because it's developed at kernel.org and "grok" is a mirror of "korg". Also, because it groks git mirroring.

Why not just use rsync?
~~~~~~~~~~~~~~~~~~~~~~~

Rsync is extremely inefficient for the purpose of mirroring git trees that mostly consist of a lot of small files that very rarely change. Since rsync must calculate checksums on each file during each run, it mostly results in a lot of disk thrashing.

Additionally, if several repositories share objects between each-other, unless the disk paths are exactly the same on both the remote and local mirror, this will result in broken git repositories.

It is also a bit silly, considering git provides its own extremely efficient mechanism for specifying what changed between revision X and revision Y.

grokmirror-2.0.11/UPGRADING.rst

Upgrading from Grokmirror 1.x to 2.x
------------------------------------

Grokmirror-2.0 introduced major changes to how repositories are organized, so it deliberately breaks the upgrade path in order to force admins to make proper decisions. Installing the newer version on top of the old one will break replication, as it will refuse to work with old configuration files.

Manifest compatibility
----------------------

Manifest files generated by grokmirror-1.x will continue to work on grokmirror-2.x replicas. Similarly, manifest files generated by grokmirror-2.x origin servers will work on grokmirror-1.x replicas. In other words, upgrading the origin servers and replicas does not need to happen at the same time.

While grokmirror-2.x adds more entries to the manifest file (e.g. "forkgroup" and "head" records), they will be ignored by grokmirror-1.x replicas.

Upgrading the origin server
---------------------------

Breaking changes affecting the origin server are related to grok-fsck runs. Existing grok-manifest hooks should continue to work without any changes required.

Grok-fsck will now automatically recognize related repositories by comparing the output of ``git rev-list --max-parents=0 --all``.
When two or more repositories are recognized as forks of each-other, a new "object storage" repository will be set up that will contain refs from all siblings. After that, individual repositories will be repacked to only contain repository metadata (and loose objects in need of pruning). Existing repositories that already use alternates will be automatically migrated to objstore repositories during the first grok-fsck run.

If you have a small collection of repositories, or if the vast majority of them aren't forks of each-other, then the upgrade can be done live with little impact. If the opposite is true and most of your repositories are forks, then the initial grok-fsck run will take a lot of time and resources to complete, as repositories will be automatically repacked to take advantage of the new object storage layout. Doing so without preparation can significantly impact the availability of your server, so you should plan the upgrade appropriately.

Recommended scenario for large collections with lots of forks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Set up a temporary system with fast disk IO and plenty of CPUs and RAM. Repacking will go a lot faster on fast systems with plenty of IO cycles.

2. Install grokmirror-2 and configure it to replicate from the origin **INTO THE SAME PATH AS ON THE ORIGIN SERVER**. If your origin server is hosting repos out of /var/lib/gitolite3/repositories, then your migration replica should be configured with toplevel in /var/lib/gitolite3/repositories. This is important, because when the "alternates" file is created, it specifies a full path to the location of the object storage directory and moving repositories into different locations post-migration will result in breakage. *Avoid using symlinks for this purpose*, as grokmirror-2 will realpath them before using internally.

3. Perform initial grok-pull replication from the current origin server to the migration replica.
This should set up all repositories currently using alternates as objstore repositories.

4. Once the initial replication is complete, run grok-fsck on the new hierarchy. This should properly repack all new object storage repositories to benefit from delta islands, plus automatically find all repositories that are forks of each-other but aren't already set up for alternates. The initial grok-fsck process may take a LONG time to run, depending on the size of your repository collection.

5. Schedule migration downtime.

6. Right before downtime, run grok-pull to get the latest updates.

7. At the start of downtime, block access to the origin server, so no pushes are allowed to go through. Run final grok-pull on the migration replica.

8. Back up your existing hierarchy, because you know you should, or move it out of the way if you have enough disk space for this.

9. Copy the new hierarchy from the migration replica (e.g. using rsync).

10. Run any necessary steps such as "gitolite setup" in order to set things up.

11. Rerun grok-manifest on the toplevel in order to generate the fresh manifest.js.gz file.

12. Create a new grokmirror.conf for fsck runs (grokmirror-1.x configuration files are purposefully not supported).

13. Enable the grok-fsck timer.

Upgrading the replicas
----------------------

The above procedure should also be considered for upgrading the replicas, unless you have a small collection that doesn't use a lot of forks and alternates. You can find out if that is the case by running ``find . -name alternates`` at the top of your mirrored tree. If the number of returned hits is significant, then the first time grok-fsck runs, it will spend a lot of time repacking the repositories to benefit from the new layout. On the upside, you can expect significant storage use reduction after this conversion is completed.
If your replica is providing continuous access for members of your development team, then you may want to perform this conversion prior to upgrading grokmirror on your production server, in order to reduce the impact on server load. Just follow the instructions from the section above.

Converting the configuration file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Grokmirror-1.x used two different config files -- one for grok-pull and another for grok-fsck. This separation only really made sense on the origin server and was cumbersome for the replicas, since they ended up duplicating a lot of configuration options between the two config files.

Grokmirror-1.x:

- separate configuration files for grok-pull and grok-fsck
- multiple origin servers can be listed in one file

Grokmirror-2.x:

- one configuration file for all grokmirror tools
- one origin server per configuration file

Grokmirror-2.x will refuse to run with configuration files created for the previous version, so you will need to create a new configuration file in order to continue using it after upgrading. Most configuration options will be familiar to you from version 1.x, and the rest are documented in the grokmirror.conf file provided with the distribution.

Converting from cron to daemon operation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Grokmirror-1.x expected grok-pull to run from cron, but this had a set of important limitations. In contrast, grokmirror-2.x is written to run grok-pull as a daemon. It is strongly recommended to switch away from cron-based regular runs if you do them more frequently than once every few hours, as this will result in more efficient operation. See the set of systemd unit files included in the contrib directory for where to get started.

Grok-fsck can continue to run from cron if you prefer, or you can run it from a systemd timer as well.
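When grok-pull runs in daemon mode with the ``socket`` option enabled in its ``[pull]`` section, other tooling can request an immediate update of a single repository by writing that repository's manifest path to the UNIX socket (this is the same mechanism the pubsub listener in contrib uses). A minimal Python sketch of such a client, assuming a hypothetical socket path and repository name::

```python
import socket


def request_pull(sockfile: str, repo: str) -> None:
    """Ask a daemonized grok-pull to update one repository.

    sockfile: the path configured via the "socket" option in the [pull]
              section (hypothetical example: /run/grokmirror/pull.sock).
    repo:     a repository path as it appears in the manifest.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(sockfile)
        # The daemon expects just the repo path as the payload
        client.send(repo.encode())
```

The daemon ignores anything that does not match a repository defined in the manifest, so sending an unknown path is harmless.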
grokmirror-2.0.11/contrib/gitolite/get-grok-manifest.command

#!/bin/bash
# This is a command to install in gitolite's local-code.
# Don't forget to enable it via .gitolite.rc
#
# Change this to where grok-manifest is writing manifest.js
MANIFILE="/var/www/html/grokmirror/manifest.js.gz"

if [[ -z "$GL_USER" ]]; then
    echo "ERROR: GL_USER is unset. Run me via ssh, please."
    exit 1
fi

# Make sure we only accept credential replication from the mirrors
for MIRROR in $(GL_USER='' gitolite mirror list slaves gitolite-admin); do
    if [[ $GL_USER == "server-${MIRROR}" ]]; then
        AOK="yes"
        break
    fi
done

if [[ -z "$AOK" ]]; then
    echo "You are not allowed to do this"
    exit 1
fi

if [[ ! -s $MANIFILE ]]; then
    echo "Manifest file not found"
    exit 1
fi

R_LASTMOD=$1
if [[ -z "$R_LASTMOD" ]]; then
    R_LASTMOD=0
fi

L_LASTMOD=$(stat --printf='%Y' $MANIFILE)
if [[ $L_LASTMOD -le $R_LASTMOD ]]; then
    exit 127
fi

if [[ $MANIFILE == *.gz ]]; then
    zcat $MANIFILE
else
    cat $MANIFILE
fi
exit 0

grokmirror-2.0.11/contrib/gitolite/grok-get-gl-manifest.sh

#!/bin/bash
# This is executed by grok-pull if manifest_command is defined.
# You should install the other file as one of your commands in local-code
# and enable it in .gitolite.rc
PRIMARY=$(gitolite mirror list master gitolite-admin)
STATEFILE="$(gitolite query-rc GL_ADMIN_BASE)/.${PRIMARY}.manifest.lastupd"
GL_COMMAND=get-grok-manifest

if [[ -s $STATEFILE ]] && [[ $1 != '--force' ]]; then
    LASTUPD=$(cat $STATEFILE)
fi

NOWSTAMP=$(date +'%s')
ssh $PRIMARY $GL_COMMAND $LASTUPD
ECODE=$?
if [[ $ECODE == 0 ]]; then
    echo $NOWSTAMP > $STATEFILE
fi
exit $ECODE

grokmirror-2.0.11/contrib/grok-fsck@.service

[Unit]
Description=Grok-fsck service for %I
Documentation=https://github.com/mricon/grokmirror

[Service]
Type=oneshot
Environment="EXTRA_FSCK_OPTS="
EnvironmentFile=-/etc/sysconfig/grokmirror.default
EnvironmentFile=-/etc/sysconfig/grokmirror.%i
ExecStart=/usr/bin/grok-fsck -c /etc/grokmirror/%i.conf $EXTRA_FSCK_OPTS
CPUSchedulingPolicy=batch
# To override these users, create a drop-in systemd conf file in
# /etc/systemd/system/grok-fsck@[foo].service.d/10-usergroup.conf:
# [Service]
# User=yourpreference
# Group=yourpreference
User=mirror
Group=mirror

grokmirror-2.0.11/contrib/grok-fsck@.timer

[Unit]
Description=Grok-fsck timer for %I
Documentation=https://github.com/mricon/grokmirror

[Timer]
OnCalendar=Sat 04:00

[Install]
WantedBy=timers.target

grokmirror-2.0.11/contrib/grok-pull@.service

[Unit]
Description=Grok-pull service for %I
After=network.target
Documentation=https://github.com/mricon/grokmirror

[Service]
Environment="EXTRA_PULL_OPTS="
EnvironmentFile=-/etc/sysconfig/grokmirror.default
EnvironmentFile=-/etc/sysconfig/grokmirror.%i
ExecStart=/usr/bin/grok-pull -o -c /etc/grokmirror/%i.conf $EXTRA_PULL_OPTS
Type=simple
Restart=on-failure
# To override these users, create a drop-in systemd conf file in
# /etc/systemd/system/grok-pull@[foo].service.d/10-usergroup.conf:
# [Service]
# User=yourpreference
# Group=yourpreference
User=mirror
Group=mirror

[Install]
WantedBy=multi-user.target

grokmirror-2.0.11/contrib/logrotate

/var/log/grokmirror/*.log {
    missingok
    notifempty
    delaycompress
}
grokmirror-2.0.11/contrib/pubsubv1.py

#!/usr/bin/env python3
# Implements a Google pubsub v1 push listener, see:
# https://cloud.google.com/pubsub/docs/push
#
# In order to work, grok-pull must be running as a daemon service with
# the "socket" option enabled in the configuration.
#
# The pubsub message should contain two attributes:
# {
#   "message": {
#     "attributes": {
#       "proj": "projname",
#       "repo": "/path/to/repo.git"
#     }
#   }
# }
#
# "proj" value should map to a "$proj.conf" file in /etc/grokmirror
# (you can override that default via the GROKMIRROR_CONFIG_DIR env var).
# "repo" value should match a repo defined in the manifest file as understood
# by the running grok-pull daemon (it will ignore anything else)
#
# Any other attributes or the "data" field are ignored.

import falcon
import json
import os
import socket
import re

from configparser import ConfigParser, ExtendedInterpolation

# Some sanity defaults
MAX_PROJ_LEN = 32
MAX_REPO_LEN = 1024


# noinspection PyBroadException
class PubsubListener(object):

    def on_get(self, req, resp):
        resp.status = falcon.HTTP_200
        resp.body = "We don't serve GETs here\n"

    def on_post(self, req, resp):
        if not req.content_length:
            resp.status = falcon.HTTP_500
            resp.body = 'Payload required\n'
            return

        try:
            doc = json.load(req.stream)
        except:
            resp.status = falcon.HTTP_500
            resp.body = 'Failed to parse payload as json\n'
            return

        try:
            proj = doc['message']['attributes']['proj']
            repo = doc['message']['attributes']['repo']
        except (KeyError, TypeError):
            resp.status = falcon.HTTP_500
            resp.body = 'Not a pubsub v1 payload\n'
            return

        if len(proj) > MAX_PROJ_LEN or len(repo) > MAX_REPO_LEN:
            resp.status = falcon.HTTP_500
            resp.body = 'Repo or project value too long\n'
            return

        # Proj shouldn't contain slashes or whitespace
        if re.search(r'[\s/]', proj):
            resp.status = falcon.HTTP_500
            resp.body = 'Invalid characters in project name\n'
            return

        # Repo shouldn't contain whitespace
        if re.search(r'\s', repo):
            resp.status = falcon.HTTP_500
            resp.body = 'Invalid characters in repo name\n'
            return

        confdir = os.environ.get('GROKMIRROR_CONFIG_DIR', '/etc/grokmirror')
        cfgfile = os.path.join(confdir, '{}.conf'.format(proj))
        if not os.access(cfgfile, os.R_OK):
            resp.status = falcon.HTTP_500
            resp.body = 'Invalid project name\n'
            return

        config = ConfigParser(interpolation=ExtendedInterpolation())
        config.read(cfgfile)

        if 'pull' not in config or not config['pull'].get('socket'):
            resp.status = falcon.HTTP_500
            resp.body = 'Invalid project configuration (no socket defined)\n'
            return

        sockfile = config['pull'].get('socket')
        if not os.access(sockfile, os.W_OK):
            resp.status = falcon.HTTP_500
            resp.body = 'Invalid project configuration (socket does not exist or is not writable)\n'
            return

        try:
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
                client.connect(sockfile)
                client.send(repo.encode())
        except:
            resp.status = falcon.HTTP_500
            resp.body = 'Unable to communicate with the socket\n'
            return

        resp.status = falcon.HTTP_204


app = falcon.API()
pl = PubsubListener()
app.add_route('/pubsub_v1', pl)

grokmirror-2.0.11/contrib/python-grokmirror.spec

%global srcname grokmirror
%global groupname mirror
%global username mirror
%global userhome %{_sharedstatedir}/grokmirror

Name: python-%{srcname}
Version: 2.0.8
Release: 1%{?dist}
Summary: Framework to smartly mirror git repositories

License: GPLv3+
URL: https://git.kernel.org/pub/scm/utils/grokmirror/grokmirror.git
Source0: https://www.kernel.org/pub/software/network/grokmirror/grokmirror-%{version}.tar.xz

BuildArch: noarch

%global _description %{expand:
Grokmirror was written to make mirroring large git repository collections
more efficient. Grokmirror uses the manifest file published by the master
mirror in order to figure out which repositories to clone, and to track
which repositories require updating.
The process is extremely lightweight and efficient both for the master and for the mirrors.} %description %_description %package -n python3-%{srcname} Summary: %{summary} Requires(pre): shadow-utils Requires: git-core, python3-packaging, python3-requests BuildRequires: python3-devel, python3-setuptools BuildRequires: systemd Obsoletes: python-%{srcname} < 2, python2-%{srcname} < 2 %description -n python3-%{srcname} %_description %prep %autosetup -n %{srcname}-%{version} %build %py3_build %install %py3_install %{__mkdir_p} -m 0755 \ %{buildroot}%{userhome} \ %{buildroot}%{_sysconfdir}/%{srcname} \ %{buildroot}%{_sysconfdir}/logrotate.d \ %{buildroot}%{_unitdir} \ %{buildroot}%{_bindir} \ %{buildroot}%{_tmpfilesdir} \ %{buildroot}%{_mandir}/man1 \ %{buildroot}%{_localstatedir}/log/%{srcname} \ %{buildroot}/run/%{srcname} %{__install} -m 0644 man/*.1 %{buildroot}/%{_mandir}/man1/ %{__install} -m 0644 contrib/*.service %{buildroot}/%{_unitdir}/ %{__install} -m 0644 contrib/*.timer %{buildroot}/%{_unitdir}/ %{__install} -m 0644 contrib/logrotate %{buildroot}/%{_sysconfdir}/logrotate.d/grokmirror %{__install} -m 0644 grokmirror.conf %{buildroot}/%{_sysconfdir}/%{srcname}/grokmirror.conf.example echo "d /run/%{srcname} 0755 %{username} %{groupname}" > %{buildroot}/%{_tmpfilesdir}/%{srcname}.conf %pre -n python3-%{srcname} getent group %{groupname} >/dev/null || groupadd -r %{groupname} getent passwd %{username} >/dev/null || \ useradd -r -g %{groupname} -d %{userhome} -s /sbin/nologin \ -c "Grokmirror user" %{username} exit 0 %files -n python3-%{srcname} %license LICENSE.txt %doc README.rst grokmirror.conf pi-piper.conf %dir %attr(0750, %{username}, %{groupname}) %{userhome} %dir %attr(0755, %{username}, %{groupname}) %{_localstatedir}/log/%{srcname}/ %dir %attr(0755, %{username}, %{groupname}) /run/%{srcname}/ %config %{_sysconfdir}/%{srcname}/* %config %{_sysconfdir}/logrotate.d/* %{_tmpfilesdir}/%{srcname}.conf %{_unitdir}/* %{python3_sitelib}/%{srcname}-*.egg-info/ 
%{python3_sitelib}/%{srcname}/ %{_bindir}/* %{_mandir}/*/* %changelog * Thu Mar 11 2021 Konstantin Ryabitsev - 2.0.8-1 - Update to 2.0.8 with fixes to symlink handling in manifests * Tue Jan 19 2021 Konstantin Ryabitsev - 2.0.7-1 - Update to 2.0.7 with improvements for very large repo collections * Thu Jan 07 2021 Konstantin Ryabitsev - 2.0.6-1 - Update to 2.0.6 with minor new features * Wed Nov 25 2020 Konstantin Ryabitsev - 2.0.5-1 - Update to 2.0.5 with minor new features * Wed Nov 04 2020 Konstantin Ryabitsev - 2.0.4-1 - Update to 2.0.4 with minor new features * Wed Nov 04 2020 Konstantin Ryabitsev - 2.0.3-1 - Update to 2.0.3 with minor new features * Tue Oct 06 2020 Konstantin Ryabitsev - 2.0.2-1 - Update to 2.0.2 - Install pi-piper into bindir * Wed Sep 30 2020 Konstantin Ryabitsev - 2.0.1-1 - Update to 2.0.1 * Mon Sep 21 2020 Konstantin Ryabitsev - 2.0.0-1 - Initial 2.0.0 packaging grokmirror-2.0.11/contrib/ref-updated000066400000000000000000000020501410330145700175460ustar00rootroot00000000000000#!/bin/bash # Gerrit's hook system is very different from standard git, so # minor modifications to the hook are required to make it work. # Place this file in your gerrit/hooks/ref-updated and modify the # variables below to make it work for you. GERRIT_HOME=/var/lib/gerrit GERRIT_GIT=/srv/gerrit/git GROK_MANIFEST_BIN=/usr/bin/grok-manifest GROK_MANIFEST_LOG=${GERRIT_HOME}/logs/grok-manifest.log # You'll need to place this where you can serve it with httpd # Make sure the gerrit process can write to this location GROK_MANIFEST=/var/www/html/grokmirror/manifest.js.gz # Yank out the project out of the passed params args=$(getopt -l "project:" -- "$@") eval set -- "$args" while [ $# -ge 1 ]; do case "$1" in --) # No more options left. 
shift break ;; --project) project="$2" shift ;; esac shift done ${GROK_MANIFEST_BIN} -y -w -l ${GROK_MANIFEST_LOG} \ -m ${GROK_MANIFEST} \ -t ${GERRIT_GIT} \ -n "${GERRIT_GIT}/${project}.git" grokmirror-2.0.11/contrib/selinux/000077500000000000000000000000001410330145700171155ustar00rootroot00000000000000grokmirror-2.0.11/contrib/selinux/el7/000077500000000000000000000000001410330145700176045ustar00rootroot00000000000000grokmirror-2.0.11/contrib/selinux/el7/grokmirror.fc000066400000000000000000000005121410330145700223110ustar00rootroot00000000000000/usr/bin/grok-.* -- gen_context(system_u:object_r:grokmirror_exec_t,s0) /var/lib/grokmirror(/.*)? gen_context(system_u:object_r:grokmirror_var_lib_t,s0) /var/run/grokmirror(/.*)? gen_context(system_u:object_r:grokmirror_var_run_t,s0) /var/log/grokmirror(/.*)? gen_context(system_u:object_r:grokmirror_log_t,s0) grokmirror-2.0.11/contrib/selinux/el7/grokmirror.te000066400000000000000000000107171410330145700223410ustar00rootroot00000000000000################## # Author: Konstantin Ryabitsev # policy_module(grokmirror, 1.1.1) require { type gitosis_var_lib_t; type git_sys_content_t; type net_conf_t; type httpd_t; type ssh_home_t; type passwd_file_t; type postfix_etc_t; } ################## # Declarations type grokmirror_t; type grokmirror_exec_t; init_daemon_domain(grokmirror_t, grokmirror_exec_t) type grokmirror_var_lib_t; files_type(grokmirror_var_lib_t) type grokmirror_log_t; logging_log_file(grokmirror_log_t) type grokmirror_var_run_t; files_pid_file(grokmirror_var_run_t) type grokmirror_tmpfs_t; files_tmpfs_file(grokmirror_tmpfs_t) gen_tunable(grokmirror_connect_ssh, false) gen_tunable(grokmirror_connect_all_unreserved, false) # Uncomment to put these domains into permissive mode permissive grokmirror_t; ################## # Daemons policy domain_use_interactive_fds(grokmirror_t) files_read_etc_files(grokmirror_t) miscfiles_read_localization(grokmirror_t) # Logging append_files_pattern(grokmirror_t, grokmirror_log_t, 
grokmirror_log_t) create_files_pattern(grokmirror_t, grokmirror_log_t, grokmirror_log_t) setattr_files_pattern(grokmirror_t, grokmirror_log_t, grokmirror_log_t) logging_log_filetrans(grokmirror_t, grokmirror_log_t, { file dir }) logging_send_syslog_msg(grokmirror_t) # Allow managing anything grokmirror_var_lib_t manage_dirs_pattern(grokmirror_t, grokmirror_var_lib_t, grokmirror_var_lib_t) manage_files_pattern(grokmirror_t, grokmirror_var_lib_t, grokmirror_var_lib_t) manage_lnk_files_pattern(grokmirror_t, grokmirror_var_lib_t, grokmirror_var_lib_t) manage_sock_files_pattern(grokmirror_t, grokmirror_var_lib_t, grokmirror_var_lib_t) # Allow managing git repositories manage_files_pattern(grokmirror_t, gitosis_var_lib_t, gitosis_var_lib_t) manage_lnk_files_pattern(grokmirror_t, gitosis_var_lib_t, gitosis_var_lib_t) manage_dirs_pattern(grokmirror_t, gitosis_var_lib_t, gitosis_var_lib_t) manage_sock_files_pattern(grokmirror_t, gitosis_var_lib_t, gitosis_var_lib_t) manage_files_pattern(grokmirror_t, git_sys_content_t, git_sys_content_t) manage_lnk_files_pattern(grokmirror_t, git_sys_content_t, git_sys_content_t) manage_dirs_pattern(grokmirror_t, git_sys_content_t, git_sys_content_t) manage_sock_files_pattern(grokmirror_t, git_sys_content_t, git_sys_content_t) # Allow executing bin (for git, mostly) corecmd_exec_bin(grokmirror_t) libs_exec_ldconfig(grokmirror_t) # Allow managing httpd content in case the manifest is stored there apache_manage_sys_content(grokmirror_t) # git wants to access system state and other bits kernel_dontaudit_read_system_state(grokmirror_t) # Allow connecting to http, git corenet_tcp_connect_http_port(grokmirror_t) corenet_tcp_connect_git_port(grokmirror_t) corenet_tcp_bind_generic_node(grokmirror_t) corenet_tcp_sendrecv_generic_node(grokmirror_t) # git needs to dns-resolve sysnet_dns_name_resolve(grokmirror_t) # Allow reading .netrc files read_files_pattern(grokmirror_t, net_conf_t, net_conf_t) # Post-hooks can use grep, which requires execmem 
allow grokmirror_t self:process execmem; fs_getattr_tmpfs(grokmirror_t) manage_files_pattern(grokmirror_t, grokmirror_tmpfs_t, grokmirror_tmpfs_t) fs_tmpfs_filetrans(grokmirror_t, grokmirror_tmpfs_t, file) # Listener socket file manage_dirs_pattern(grokmirror_t, grokmirror_var_run_t, grokmirror_var_run_t) manage_files_pattern(grokmirror_t, grokmirror_var_run_t, grokmirror_var_run_t) manage_sock_files_pattern(grokmirror_t, grokmirror_var_run_t, grokmirror_var_run_t) files_pid_filetrans(grokmirror_t, grokmirror_var_run_t, { dir file sock_file }) # allow httpd to write to the listener socket allow httpd_t grokmirror_t:unix_stream_socket connectto; # Some bogus dontaudits # ssh tries to open /etc/mailname, which the postfix module labels oddly dontaudit grokmirror_t postfix_etc_t:file { getattr open read }; tunable_policy(`grokmirror_connect_all_unreserved',` corenet_sendrecv_all_client_packets(grokmirror_t) corenet_tcp_connect_all_unreserved_ports(grokmirror_t) ') tunable_policy(`grokmirror_connect_ssh',` corenet_sendrecv_ssh_client_packets(grokmirror_t) corenet_tcp_connect_ssh_port(grokmirror_t) corenet_tcp_sendrecv_ssh_port(grokmirror_t) ssh_exec(grokmirror_t) ssh_read_user_home_files(grokmirror_t) # for the controlmaster socket manage_sock_files_pattern(grokmirror_t, ssh_home_t, ssh_home_t) allow grokmirror_t self:unix_stream_socket connectto; allow grokmirror_t passwd_file_t:file { getattr open read }; ') grokmirror-2.0.11/grokmirror.conf000066400000000000000000000340011410330145700170300ustar00rootroot00000000000000# Grokmirror 2.x and above have a single config file per each set # of mirrored repos, instead of a separate repos.conf and fsck.conf # with multiple sections. # # You can use ${varname} interpolation within the same section # or ${sectname:varname} from any other section. [core] # # Where are our mirrored repositories kept? toplevel = /var/lib/git/mirror # # Where should we keep our manifest file? 
manifest = ${toplevel}/manifest.js.gz # # Where should we put our log? Make sure it is logrotated, # otherwise it will grow indefinitely. log = ${toplevel}/log # # Options are "info" and "debug" for all the debug data (lots!) loglevel = info # # Grokmirror version 2.x and above can automatically recognize related repositories # by analyzing root commits. If it finds two or more related repositories, it can set # up a unified "object storage" repo and fetch all refs from each related repository. # For example, you can have two forks of linux.git: # foo/bar/linux.git: # refs/heads/master # refs/heads/devbranch # refs/tags/v5.0-rc3 # ... # baz/quux/linux.git: # refs/heads/master # refs/heads/devbranch # refs/tags/v5.0-rc3 # ... # Grokmirror will set up an object storage repository and fetch all refs from # both repositories: # objstore/[random-guid-name].git # refs/virtual/[sha1-of-foo/bar/linux.git:12]/heads/master # refs/virtual/[sha1-of-foo/bar/linux.git:12]/heads/devbranch # refs/virtual/[sha1-of-foo/bar/linux.git:12]/tags/v5.0-rc3 # ... # refs/virtual/[sha1-of-baz/quux/linux.git:12]/heads/master # refs/virtual/[sha1-of-baz/quux/linux.git:12]/heads/devbranch # refs/virtual/[sha1-of-baz/quux/linux.git:12]/tags/v5.0-rc3 # ... # # This will dramatically improve storage on disk, as original repositories will be # repacked to almost nothing. Grokmirror will repack the object storage repository # with --delta-islands to help optimize packs for efficient clones. objstore = ${toplevel}/objstore # # When copying objects into objstore repositories, we will use regular git # porcelain commands, such as git fetch. However, this tends to be slow due to # git erring on the side of caution when calculating haves and wants, so if you # are running a busy mirror and want to save a lot of cycles, you will want to # enable the setting below, which will use internal git plumbing for much more # direct object copying between repos. 
#objstore_uses_plumbing = yes # # Due to the nature of git alternates, if two repositories share all their objects # with an "object storage" repo, any object from repoA can be retrieved from repoB # via most web UIs if someone knows the object hash. # E.g. this is how this trick works on Github: # https://github.com/torvalds/linux/blob/b4061a10fc29010a610ff2b5b20160d7335e69bf/drivers/hid/hid-samsung.c#L113-L118 # # If you have private repositories that should absolutely not reveal any objects, # add them here using shell-style globbing. They will still be set up for alternates # if we find common roots with public repositories, but we won't fetch any objects # from these repos into refs/virtual/*. # # Leave blank if you don't have any private repos (or don't offer a web UI). #private = */private/* # Used by grok-manifest (and others for "pretty"). These options can be # overridden using matching command-line switches to grok-manifest. [manifest] # Enable to save pretty-printed js (larger and slower, but easier to debug) pretty = no # List of repositories to ignore -- can take multiple entries with newline+tab # and accepts shell globbing. ignore = /testing/* /private/* # Enable to fetch objects into objstore repos after commit. This can be useful if # someone tries to push the same objects to a sibling repository, but may significantly # slow down post-commit hook operation, negating any speed gains. If set to no, the # objects will be fetched during regular grok-fsck runs. fetch_objstore = no # Only include repositories that have git-daemon-export-ok. check_export_ok = no # Used by grok-pull, mostly [remote] # The host part of the mirror you're pulling from. site = https://git.kernel.org # # Where the grok manifest is published. 
The following protocols # are supported at this time: # http:// or https:// using If-Modified-Since http header # file:// (when manifest file is on NFS, for example) # NB: You can no longer specify username:password as part of the URL with # grokmirror 2.x and above. You can use a netrc file for this purpose. manifest = ${site}/manifest.js.gz # # As an alternative to setting a manifest URL, you can define a manifest_command. # It has three possible outcomes: # exit code 0 + full remote manifest on stdout (must be valid json) # exit code 1 + error message on stdout # exit code 127 + nothing on stdout if remote manifest hasn't changed # It should also accept '--force' as a single argument to force manifest retrieval # even if it hasn't changed. # See contrib/gitolite/* for example commands to use with gitolite. #manifest_command = /usr/local/bin/grok-get-gl-manifest.sh # # If the remote is providing pre-generated preload bundles, list the path # here. This is only useful if you're mirroring the entire repository # collection and not just a handful of select repos. #preload_bundle_url = https://some-cdn-site.com/preload/ # Used by grok-pull [pull] # # Write out projects.list that can be used by gitweb or cgit. # Leave blank if you don't want a projects.list. projectslist = ${core:toplevel}/projects.list # # When generating projects.list, start at this subpath instead # of at the toplevel. Useful when mirroring kernel or when generating # multiple gitweb/cgit configurations for the same tree. projectslist_trimtop = # # When generating projects.list, also create entries for symlinks. # Otherwise we assume they are just legacy and keep them out of # web interfaces. projectslist_symlinks = no # # A simple hook to execute whenever a repository is modified. # It passes the full path to the git repository modified as the final # argument. You can define multiple hooks if you separate them by # newline+whitespace. 
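# For example, the grokmirror 2.0.2 changelog suggests using the bundled
# grok-pi-piper utility to pipe new messages out of mirrored public-inbox
# repositories (path assumes the standard packaged install location):
#post_update_hook = /usr/bin/grok-pi-piper -c ~/.config/pi-piper.conf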
post_update_hook = # # Should we purge repositories that are not present in the remote # manifest? If set to "no" this can be overridden via the -p flag to # grok-pull (useful if you have a very large collection of repos # and don't want to walk the entire tree on each manifest run). # See also: purgeprotect. purge = yes # # There may be repositories that aren't replicated with grokmirror that # you don't want to be purged. You can list them below using bash-style # globbing. Separate multiple entries using newline+whitespace. #nopurge = /gitolite-admin.git # # This prevents catastrophic mirror purges when our upstream gives us a # manifest that is dramatically smaller than ours. The default is to # refuse the purge if the remote manifest has over 5% fewer repositories # than what we have, or in other words, if we have 100 repos and the # remote manifest has shrunk to 95 repos or fewer, we refuse to purge, # suspecting that something has gone wrong. You can set purgeprotect to # a higher percentage, or override it entirely with --force-purge # commandline flag. purgeprotect = 5 # # If owner is not specified in the manifest, who should be listed # as the default owner in tools like gitweb or cgit? #default_owner = Grokmirror User default_owner = Grokmirror User # # By default, we'll call the upstream origin "_grokmirror", but you can set your # own name here (e.g. just call it "origin") remotename = _grokmirror # # To speed up updates, grok-pull will use multiple threads. Please be # considerate to the mirror you're pulling from and don't set this very # high. You may also run into per-ip multiple session limits, so leave # this number at a nice low setting. pull_threads = 5 # # If git fetch fails, we will retry up to this many times before # giving up and marking that repository as failed. retries = 3 # # Use shell-globbing to list the repositories you would like to mirror. # If you want to mirror everything, just say "*". 
Separate multiple entries # with newline plus tab. Examples: # # mirror everything: #include = * # # mirror just the main kernel sources: #include = /pub/scm/linux/kernel/git/torvalds/linux.git # /pub/scm/linux/kernel/git/stable/linux.git # /pub/scm/linux/kernel/git/next/linux-next.git include = * # # This is processed after the include, in case you want to exclude some # specific entries from an all-inclusive globbing above. E.g., to # exclude all linux-2.4 git sources: #exclude = */linux-2.4* exclude = # # List repositories that should always reject forced pushes. #ffonly = */torvalds/linux.git # # If you enable the following option and run grok-pull with -o, # grok-pull will run continuously and will periodically recheck the # remote manifest for new updates. See contrib for an example systemd # service you can set up to continuously update your local mirror. The # value is in seconds. #refresh = 900 # # If you enable refresh, you can also enable the socket listener that # allows for rapid push notifications from your primary mirror. The # socket expects repository names matching what is in the local # manifest, followed by a newline. E.g.: # /pub/scm/linux/kernel/git/torvalds/linux.git\n # # Anything not matching a repository in the local manifest will be ignored. # See contrib for example pubsub listener. #socket = ${core:toplevel}/.updater.socket # Used by grok-fsck [fsck] # # How often should we check each repository, in days. Any newly added # repository will have the first check within a random period of 0 and # $frequency, and then every $frequency after that, to assure that not # all repositories are checked on the same day. Don't set to less than # 7 unless you only mirror a few repositories (or really like to thrash # your disks). frequency = 30 # # Where to keep the status file statusfile = ${core:toplevel}/fsck.status.js # # Some errors are relatively benign and can be safely ignored. Add # matching substrings to this field to ignore them. 
ignore_errors = notice: warning: disabling bitmap writing ignoring extra bitmap file missingTaggerEntry missingSpaceBeforeDate # # If the fsck process finds errors that match any of these strings # during its run, it will ask grok-pull to reclone this repository when # it runs next. Only useful for minion mirrors, not for mirror masters. reclone_on_errors = fatal: bad tree object fatal: Failed to traverse parents missing commit missing blob missing tree broken link # # Should we repack the repositories? You almost always want this on, # unless you are doing something really odd. repack = yes # # We set proper flags for repacking depending if the repo is using # alternates or not, and whether this is a full repack or not. We will # also always build bitmaps (when it makes sense), to make cloning # faster. You can add other flags (e.g. --threads and --window-memory) # via the following parameter: extra_repack_flags = # # These flags are added *in addition* to extra_repack_flags extra_repack_flags_full = --window=250 --depth=50 # # If git version is new enough to support generating commit graphs, we # will always generate them, though if your git version is older than # 2.24.0, the graphs won't be automatically used unless core.commitgraph # is set to true. You can turn off graph generation by setting the # commitgraph option to "no". Graph generation will be skipped for # child repos that use alternates. commitgraph = yes # # Run git-prune to remove obsolete loose objects. Grokmirror will make # sure this is a safe operation when it comes to objstore repos, so you # should leave this enabled. prune = yes # # Grokmirror is extremely careful about not pruning the repositories # that are used by others via git alternates. However, it cannot prevent # some other git process (not grokmirror-managed) from inadvertently # running "git prune/gc". For example, this may happen if an admin # mistypes a command in the wrong directory. 
Setting precious=yes will # add extensions.preciousObjects=true to the git configuration file in # such repositories, which will help prevent repository corruption # between grok-fsck runs. # # When set to "yes", grokmirror will temporarily turn this feature off # when running scheduled repacks in order to be able to delete redundant # packs and loose objects that have already been packed. This is usually # a safe operation when done by grok-fsck itself. However, if you set # this to "always", grokmirror will leave this enabled even during # grok-fsck runs, for maximum paranoia. Be warned, that this will result # in ever-growing git repositories, so it only makes sense in very rare # situations, such as for backup purposes. precious = yes # # If you have a lot of forks using the same objstore repo, you may end # up with thousands of refs being negotiated during each remote update. # This tends to result in higher load and bigger negotiation transfers. # Setting the "baselines" option allows you to designate a set of repos # that are likely to have most of the relevant objects and ignore the # rest of the objstore refs. This is done using the # core.alternateRefsPrefixes feature (see git-config). baselines = */kernel/git/next/linux-next.git # # Objstore repos are repacked with delta island support (see man # git-config), but if you have one repo that is a lot more likely to be # cloned than all the other ones, you can designate it as "islandCore", # which will give it priority when creating packs. islandcores = */kernel/git/torvalds/linux.git # # Generate preload bundles for objstore repos and put them into this # location. Unless you are running a major mirroring hub site, you # do not want this enabled. See corresponding preload_bundle_url # entry in the [remote] section. #preload_bundle_outdir = /some/http/accessible/path # # If there are any critical errors, the report will be sent to root. 
You # can change the settings below to configure report delivery to suit # your needs: #report_to = root #report_from = root #report_subject = git fsck errors on my beautiful replica #report_mailhost = localhost grokmirror-2.0.11/grokmirror/000077500000000000000000000000001410330145700161635ustar00rootroot00000000000000grokmirror-2.0.11/grokmirror/__init__.py000066400000000000000000001120071410330145700202750ustar00rootroot00000000000000# -*- coding: utf-8 -*- # Copyright (C) 2013-2020 by The Linux Foundation and contributors # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . import os import sys import time import json import fnmatch import subprocess import requests import logging import logging.handlers import hashlib import pathlib import uuid import tempfile import shutil import gzip import datetime from fcntl import lockf, LOCK_EX, LOCK_UN, LOCK_NB from requests.adapters import HTTPAdapter from requests.packages.urllib3.util.retry import Retry VERSION = '2.0.11' MANIFEST_LOCKH = None REPO_LOCKH = dict() GITBIN = '/usr/bin/git' # default logger. Will be overridden. 
logger = logging.getLogger(__name__) _alt_repo_map = None # Used to store our requests session REQSESSION = None OBST_PREAMBULE = ('# WARNING: This is a grokmirror object storage repository.\n' '# Deleting or moving it will cause corruption in the following repositories\n' '# (caution, this list may be incomplete):\n') def get_requests_session(): global REQSESSION if REQSESSION is None: REQSESSION = requests.session() retry = Retry(connect=3, backoff_factor=0.5) adapter = HTTPAdapter(max_retries=retry) REQSESSION.mount('http://', adapter) REQSESSION.mount('https://', adapter) REQSESSION.headers.update({'User-Agent': 'grokmirror/%s' % VERSION}) return REQSESSION def get_config_from_git(fullpath, regexp, defaults=None): args = ['config', '-z', '--get-regexp', regexp] ecode, out, err = run_git_command(fullpath, args) gitconfig = defaults if not gitconfig: gitconfig = dict() if not out: return gitconfig for line in out.split('\x00'): if not line: continue key, value = line.split('\n', 1) try: chunks = key.split('.') cfgkey = chunks[-1] gitconfig[cfgkey.lower()] = value except ValueError: logger.debug('Ignoring git config entry %s', line) return gitconfig def set_git_config(fullpath, param, value, operation='--replace-all'): args = ['config', operation, param, value] ecode, out, err = run_git_command(fullpath, args) return ecode def git_newer_than(minver): from packaging import version (retcode, output, error) = run_git_command(None, ['--version']) ver = output.split()[-1] return version.parse(ver) >= version.parse(minver) def run_shell_command(cmdargs, stdin=None, decode=True): logger.debug('Running: %s', ' '.join(cmdargs)) child = subprocess.Popen(cmdargs, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE) output, error = child.communicate(input=stdin) if decode: output = output.decode().strip() error = error.decode().strip() return child.returncode, output, error def run_git_command(fullpath, args, stdin=None, decode=True): if 'GITBIN' in 
os.environ: _git = os.environ['GITBIN'] else: _git = GITBIN if not (os.path.isfile(_git) and os.access(_git, os.X_OK)): # we hope for the best by using 'git' without full path _git = 'git' if fullpath is not None: cmdargs = [_git, '--no-pager', '--git-dir', fullpath] + args else: cmdargs = [_git, '--no-pager'] + args return run_shell_command(cmdargs, stdin, decode=decode) def _lockname(fullpath): lockpath = os.path.dirname(fullpath) lockname = '.%s.lock' % os.path.basename(fullpath) if not os.path.exists(lockpath): os.makedirs(lockpath) repolock = os.path.join(lockpath, lockname) return repolock def lock_repo(fullpath, nonblocking=False): repolock = _lockname(fullpath) logger.debug('Attempting to exclusive-lock %s', repolock) lockfh = open(repolock, 'w') if nonblocking: flags = LOCK_EX | LOCK_NB else: flags = LOCK_EX lockf(lockfh, flags) global REPO_LOCKH REPO_LOCKH[fullpath] = lockfh def unlock_repo(fullpath): global REPO_LOCKH if fullpath in REPO_LOCKH.keys(): logger.debug('Unlocking %s', fullpath) lockf(REPO_LOCKH[fullpath], LOCK_UN) REPO_LOCKH[fullpath].close() del REPO_LOCKH[fullpath] def is_bare_git_repo(path): """ Return True if path (which is already verified to be a directory) sufficiently resembles a bare git repo (good enough to fool git itself). 
""" logger.debug('Checking if %s is a git repository', path) if (os.path.isdir(os.path.join(path, 'objects')) and os.path.isdir(os.path.join(path, 'refs')) and os.path.isfile(os.path.join(path, 'HEAD'))): return True logger.debug('Skipping %s: not a git repository', path) return False def get_repo_timestamp(toplevel, gitdir): ts = 0 fullpath = os.path.join(toplevel, gitdir.lstrip('/')) tsfile = os.path.join(fullpath, 'grokmirror.timestamp') if os.path.exists(tsfile): with open(tsfile, 'rb') as tsfh: contents = tsfh.read() try: ts = int(contents) logger.debug('Timestamp for %s: %s', gitdir, ts) except ValueError: logger.warning('Was not able to parse timestamp in %s', tsfile) else: logger.debug('No existing timestamp for %s', gitdir) return ts def set_repo_timestamp(toplevel, gitdir, ts): fullpath = os.path.join(toplevel, gitdir.lstrip('/')) tsfile = os.path.join(fullpath, 'grokmirror.timestamp') with open(tsfile, 'wt') as tsfh: tsfh.write('%d' % ts) logger.debug('Recorded timestamp for %s: %s', gitdir, ts) def get_repo_obj_info(fullpath): args = ['count-objects', '-v'] retcode, output, error = run_git_command(fullpath, args) obj_info = dict() if output: for line in output.split('\n'): key, value = line.split(':') obj_info[key] = value.strip() return obj_info def get_repo_defs(toplevel, gitdir, usenow=False, ignorerefs=None): fullpath = os.path.join(toplevel, gitdir.lstrip('/')) description = None try: descfile = os.path.join(fullpath, 'description') with open(descfile, 'rb') as fh: contents = fh.read().strip() if len(contents) and contents.find(b'edit this file') < 0: # We don't need to tell mirrors to edit this file description = contents.decode(errors='replace') except IOError: pass entries = get_config_from_git(fullpath, r'gitweb\..*') owner = entries.get('owner', None) modified = 0 if not usenow: args = ['for-each-ref', '--sort=-committerdate', '--format=%(committerdate:iso-strict)', '--count=1'] ecode, out, err = run_git_command(fullpath, args) if len(out): 
            try:
                modified = datetime.datetime.fromisoformat(out)
            except AttributeError:
                # Python 3.6 doesn't have fromisoformat
                # remove : from the TZ info
                out = out[:-3] + out[-2:]
                modified = datetime.datetime.strptime(out, '%Y-%m-%dT%H:%M:%S%z')

    if not modified:
        modified = datetime.datetime.now()

    head = None
    try:
        with open(os.path.join(fullpath, 'HEAD')) as fh:
            head = fh.read().strip()
    except IOError:
        pass

    forkgroup = None
    altrepo = get_altrepo(fullpath)
    if altrepo and os.path.exists(os.path.join(altrepo, 'grokmirror.objstore')):
        forkgroup = os.path.basename(altrepo)[:-4]

    # we need a way to quickly compare whether mirrored repositories match
    # what is in the master manifest. To this end, we calculate a so-called
    # "state fingerprint" -- basically the output of "git show-ref | sha1sum".
    # git show-ref output is deterministic and should accurately list all refs
    # and their relation to heads/tags/etc.
    fingerprint = get_repo_fingerprint(toplevel, gitdir, force=True, ignorerefs=ignorerefs)
    # Record it in the repo for other use
    set_repo_fingerprint(toplevel, gitdir, fingerprint)
    repoinfo = {
        'modified': int(modified.timestamp()),
        'fingerprint': fingerprint,
        'head': head,
    }
    # Don't add empty things to manifest
    if owner:
        repoinfo['owner'] = owner
    if description:
        repoinfo['description'] = description
    if forkgroup:
        repoinfo['forkgroup'] = forkgroup

    return repoinfo


def get_altrepo(fullpath):
    altfile = os.path.join(fullpath, 'objects', 'info', 'alternates')
    altdir = None
    try:
        with open(altfile, 'r') as fh:
            contents = fh.read().strip()
        if len(contents) > 8 and contents[-8:] == '/objects':
            altdir = os.path.realpath(contents[:-8])
    except IOError:
        pass
    return altdir


def set_altrepo(fullpath, altdir):
    # I assume you already checked if this is a sane operation to perform
    altfile = os.path.join(fullpath, 'objects', 'info', 'alternates')
    objpath = os.path.join(altdir, 'objects')
    if os.path.isdir(objpath):
        with open(altfile, 'w') as fh:
            fh.write(objpath + '\n')
    else:
        logger.critical('objdir %s does not exist, not setting alternates file %s', objpath, altfile)


def get_rootsets(toplevel, obstdir):
    top_roots = dict()
    obst_roots = dict()
    topdirs = find_all_gitdirs(toplevel, normalize=True, exclude_objstore=True)
    obstdirs = find_all_gitdirs(obstdir, normalize=True, exclude_objstore=False)
    for fullpath in topdirs:
        roots = get_repo_roots(fullpath)
        if roots:
            top_roots[fullpath] = roots
    for fullpath in obstdirs:
        if fullpath in obst_roots:
            continue
        roots = get_repo_roots(fullpath)
        if roots:
            obst_roots[fullpath] = roots

    return top_roots, obst_roots


def get_repo_roots(fullpath, force=False):
    if not os.path.exists(fullpath):
        logger.debug('Cannot check roots in %s, as it does not exist', fullpath)
        return None
    rfile = os.path.join(fullpath, 'grokmirror.roots')
    if not force and os.path.exists(rfile):
        with open(rfile, 'rt') as rfh:
            content = rfh.read()
            roots = set(content.split('\n'))
    else:
        logger.debug('Generating roots for %s', fullpath)
        ecode, out, err = run_git_command(fullpath, ['rev-list', '--max-parents=0', '--all'])
        if ecode > 0:
            logger.debug('Error listing roots in %s', fullpath)
            return None
        if not len(out):
            logger.debug('No roots in %s', fullpath)
            return None
        # save it for future use
        with open(rfile, 'w') as rfh:
            rfh.write(out)
            logger.debug('Wrote %s', rfile)
        roots = set(out.split('\n'))
    return roots


def setup_bare_repo(fullpath):
    args = ['init', '--bare', fullpath]
    ecode, out, err = run_git_command(None, args)
    if ecode > 0:
        logger.critical('Unable to bare-init %s', fullpath)
        return False
    # Remove .sample files from hooks, because they are just dead weight
    hooksdir = os.path.join(fullpath, 'hooks')
    for child in pathlib.Path(hooksdir).iterdir():
        if child.suffix == '.sample':
            child.unlink()
    # We never want auto-gc anywhere
    set_git_config(fullpath, 'gc.auto', '0')
    # We don't care about FETCH_HEAD information and writing to it just
    # wastes IO cycles
    os.symlink('/dev/null', os.path.join(fullpath, 'FETCH_HEAD'))
    return True


def setup_objstore_repo(obstdir, name=None):
    if name is None:
        name = str(uuid.uuid4())
    pathlib.Path(obstdir).mkdir(parents=True, exist_ok=True)
    obstrepo = os.path.join(obstdir, '%s.git' % name)
    logger.debug('Creating objstore repo in %s', obstrepo)
    lock_repo(obstrepo)
    if not setup_bare_repo(obstrepo):
        sys.exit(1)
    # All our objects are precious -- we only turn this off when repacking
    set_git_config(obstrepo, 'core.repositoryformatversion', '1')
    set_git_config(obstrepo, 'extensions.preciousObjects', 'true')
    # Set maximum compression, though perhaps we should make this configurable
    set_git_config(obstrepo, 'pack.compression', '9')
    # Set island configs
    set_git_config(obstrepo, 'repack.useDeltaIslands', 'true')
    set_git_config(obstrepo, 'repack.writeBitmaps', 'true')
    set_git_config(obstrepo, 'pack.island', 'refs/virtual/([0-9a-f]+)/', operation='--add')

    telltale = os.path.join(obstrepo, 'grokmirror.objstore')
    with open(telltale, 'w') as fh:
        fh.write(OBST_PREAMBULE)

    unlock_repo(obstrepo)
    return obstrepo


def objstore_virtref(fullpath):
    fullpath = os.path.realpath(fullpath)
    vh = hashlib.sha1()
    vh.update(fullpath.encode())
    return vh.hexdigest()[:12]


def objstore_trim_virtref(obstrepo, virtref):
    args = ['for-each-ref', '--format', 'delete %(refname)', f'refs/virtual/{virtref}']
    ecode, out, err = run_git_command(obstrepo, args)
    if ecode == 0 and len(out):
        out += '\n'
        args = ['update-ref', '--stdin']
        run_git_command(obstrepo, args, stdin=out.encode())


def remove_from_objstore(obstrepo, fullpath):
    # is fullpath still using us?
    altrepo = get_altrepo(fullpath)
    if altrepo and os.path.realpath(obstrepo) == os.path.realpath(altrepo):
        # Repack the child first, using minimal flags
        args = ['repack', '-abq']
        ecode, out, err = run_git_command(fullpath, args)
        if ecode > 0:
            logger.debug('Could not repack child repo %s for removal from %s', fullpath, obstrepo)
            return False
        os.unlink(os.path.join(fullpath, 'objects', 'info', 'alternates'))

    virtref = objstore_virtref(fullpath)
    objstore_trim_virtref(obstrepo, virtref)
    args = ['remote', 'remove', virtref]
    run_git_command(obstrepo, args)
    try:
        os.unlink(os.path.join(obstrepo, 'grokmirror.%s.fingerprint' % virtref))
    except (IOError, FileNotFoundError):
        pass
    return True


def list_repo_remotes(fullpath, withurl=False):
    args = ['remote']
    if withurl:
        args.append('-v')
    ecode, out, err = run_git_command(fullpath, args)
    if not len(out):
        logger.debug('Could not list remotes in %s', fullpath)
        return list()

    if not withurl:
        return out.split('\n')

    remotes = list()
    for line in out.split('\n'):
        entry = tuple(line.split()[:2])
        if entry not in remotes:
            remotes.append(entry)
    return remotes


def add_repo_to_objstore(obstrepo, fullpath):
    virtref = objstore_virtref(fullpath)
    remotes = list_repo_remotes(obstrepo)
    if virtref in remotes:
        logger.debug('%s is already set up for objstore in %s', fullpath, obstrepo)
        return False

    args = ['remote', 'add', virtref, fullpath, '--no-tags']
    ecode, out, err = run_git_command(obstrepo, args)
    if ecode > 0:
        logger.critical('Could not add remote to %s', obstrepo)
        sys.exit(1)
    set_git_config(obstrepo, 'remote.%s.fetch' % virtref, '+refs/*:refs/virtual/%s/*' % virtref)
    telltale = os.path.join(obstrepo, 'grokmirror.objstore')
    knownsiblings = set()
    if os.path.exists(telltale):
        with open(telltale) as fh:
            for line in fh.readlines():
                line = line.strip()
                if not len(line) or line[0] == '#':
                    continue
                if os.path.isdir(line):
                    knownsiblings.add(line)
    knownsiblings.add(fullpath)
    with open(telltale, 'w') as fh:
        fh.write(OBST_PREAMBULE)
        fh.write('\n'.join(sorted(list(knownsiblings))) + '\n')
    return True


def _fetch_objstore_repo_using_plumbing(srcrepo, obstrepo, virtref):
    # Copies objects to objstore repos using direct git plumbing
    # as opposed to using "fetch". See discussion here:
    # http://lore.kernel.org/git/20200720173220.GB2045458@coredump.intra.peff.net
    # First, hardlink all objects and packs
    srcobj = os.path.join(srcrepo, 'objects')
    dstobj = os.path.join(obstrepo, 'objects')
    torm = set()
    for root, dirs, files in os.walk(srcobj, topdown=True):
        if 'info' in dirs:
            dirs.remove('info')
        subpath = root.replace(srcobj, '').lstrip('/')
        for file in files:
            srcpath = os.path.join(root, file)
            if file.endswith('.bitmap'):
                torm.add(srcpath)
                continue
            dstpath = os.path.join(dstobj, subpath, file)
            if not os.path.exists(dstpath):
                pathlib.Path(os.path.dirname(dstpath)).mkdir(parents=True, exist_ok=True)
                os.link(srcpath, dstpath)
            torm.add(srcpath)

    # Now we generate a list of refs on both sides
    srcargs = ['for-each-ref', f'--format=%(objectname) refs/virtual/{virtref}/%(refname:lstrip=1)']
    ecode, out, err = run_git_command(srcrepo, srcargs)
    if ecode > 0:
        logger.debug('Could not for-each-ref %s: %s', srcrepo, err)
        return False
    srcset = set(out.strip().split('\n'))
    dstargs = ['for-each-ref', f'--format=%(objectname) %(refname)', f'refs/virtual/{virtref}']
    ecode, out, err = run_git_command(obstrepo, dstargs)
    if ecode > 0:
        logger.debug('Could not for-each-ref %s: %s', obstrepo, err)
        return False
    dstset = set(out.strip().split('\n'))

    # Now we create a stdin list of commands for update-ref
    mapping = dict()
    newset = srcset.difference(dstset)
    if newset:
        for refline in newset:
            obj, ref = refline.split(' ', 1)
            mapping[ref] = obj
    commands = ''
    oldset = dstset.difference(srcset)
    if oldset:
        for refline in oldset:
            if not len(refline):
                continue
            obj, ref = refline.split(' ', 1)
            if ref in mapping:
                commands += f'update {ref} {mapping[ref]} {obj}\n'
                mapping.pop(ref)
            else:
                commands += f'delete {ref} {obj}\n'
    for ref, obj in mapping.items():
        commands += f'create {ref} {obj}\n'

    logger.debug('stdin=%s', commands)
    args = ['update-ref', '--stdin']
    ecode, out, err = run_git_command(obstrepo, args, stdin=commands.encode())
    if ecode > 0:
        logger.debug('Could not update-ref %s: %s', obstrepo, err)
        return False

    for file in torm:
        os.unlink(file)

    return True


def fetch_objstore_repo(obstrepo, fullpath=None, pack_refs=False, use_plumbing=False):
    my_remotes = list_repo_remotes(obstrepo, withurl=True)
    if fullpath:
        virtref = objstore_virtref(fullpath)
        if (virtref, fullpath) in my_remotes:
            remotes = {(virtref, fullpath)}
        else:
            logger.debug('%s is not in remotes for %s', fullpath, obstrepo)
            return False
    else:
        remotes = my_remotes

    success = True
    for (virtref, url) in remotes:
        if use_plumbing:
            success = _fetch_objstore_repo_using_plumbing(url, obstrepo, virtref)
        else:
            ecode, out, err = run_git_command(obstrepo, ['fetch', virtref, '--prune'])
            if ecode > 0:
                success = False
        if success:
            r_fp = os.path.join(url, 'grokmirror.fingerprint')
            if os.path.exists(r_fp):
                l_fp = os.path.join(obstrepo, 'grokmirror.%s.fingerprint' % virtref)
                shutil.copy(r_fp, l_fp)
            if pack_refs:
                try:
                    lock_repo(obstrepo, nonblocking=True)
                    run_git_command(obstrepo, ['pack-refs'])
                    unlock_repo(obstrepo)
                except IOError:
                    # Next run will take care of it
                    pass
        else:
            logger.info('Could not fetch objects from %s to %s', url, obstrepo)

    return success


def is_private_repo(config, fullpath):
    privmasks = config['core'].get('private', '')
    if not len(privmasks):
        return False
    for privmask in privmasks.split('\n'):
        # Does this repo match privrepo
        if fnmatch.fnmatch(fullpath, privmask.strip()):
            return True

    return False


def find_siblings(fullpath, my_roots, known_roots, exact=False):
    siblings = set()
    for gitpath, gitroots in known_roots.items():
        # Of course we're going to match ourselves
        if fullpath == gitpath or not my_roots or not gitroots or not len(gitroots.intersection(my_roots)):
            continue
        if gitroots == my_roots:
            siblings.add(gitpath)
            continue
        if exact:
            continue
        if gitroots.issubset(my_roots) or my_roots.issubset(gitroots):
            siblings.add(gitpath)
            continue
        sumdiff = len(gitroots.difference(my_roots)) + len(my_roots.difference(gitroots))
        # If we only differ by a single root, consider us siblings
        if sumdiff <= 2:
            siblings.add(gitpath)
            continue

    return siblings


def find_best_obstrepo(mypath, obst_roots, toplevel, baselines, minratio=0.2):
    # We want to find a repo with best intersect len to total roots len ratio,
    # but we'll ignore any repos where the ratio is too low, in order not to lump
    # together repositories that have very weak common histories.
    myroots = get_repo_roots(mypath)
    if not myroots:
        return None
    obstrepo = None
    bestratio = 0
    for path, roots in obst_roots.items():
        if path == mypath or not roots:
            continue
        icount = len(roots.intersection(myroots))
        if icount == 0:
            # No match at all
            continue
        # Baseline repos win over the ratio logic
        if len(baselines):
            # Any of its member siblings match baselines?
            s_remotes = list_repo_remotes(path, withurl=True)
            for virtref, childpath in s_remotes:
                gitdir = '/' + os.path.relpath(childpath, toplevel)
                for baseline in baselines:
                    # Does this repo match a baseline
                    if fnmatch.fnmatch(gitdir, baseline):
                        # Use this one
                        return path
        ratio = icount / len(roots)
        if ratio < minratio:
            continue
        if ratio > bestratio:
            obstrepo = path
            bestratio = ratio

    return obstrepo


def get_obstrepo_mapping(obstdir):
    mapping = dict()
    if not os.path.isdir(obstdir):
        return mapping
    for child in pathlib.Path(obstdir).iterdir():
        if child.is_dir() and child.suffix == '.git':
            obstrepo = child.as_posix()
            ecode, out, err = run_git_command(obstrepo, ['remote', '-v'])
            if ecode > 0:
                # weird
                continue
            lines = out.split('\n')
            for line in lines:
                chunks = line.split()
                if len(chunks) < 2:
                    continue
                name, url = chunks[:2]
                if url in mapping:
                    continue
                # Does it still exist?
                if not os.path.isdir(url):
                    continue
                mapping[url] = obstrepo

    return mapping


def find_objstore_repo_for(obstdir, fullpath):
    if not os.path.isdir(obstdir):
        return None

    logger.debug('Finding an objstore repo matching %s', fullpath)
    virtref = objstore_virtref(fullpath)
    for child in pathlib.Path(obstdir).iterdir():
        if child.is_dir() and child.suffix == '.git':
            obstrepo = child.as_posix()
            remotes = list_repo_remotes(obstrepo)
            if virtref in remotes:
                logger.debug('Found %s', child.name)
                return obstrepo

    logger.debug('No matching objstore repos for %s', fullpath)
    return None


def get_forkgroups(obstdir, toplevel):
    forkgroups = dict()
    if not os.path.exists(obstdir):
        return forkgroups
    for child in pathlib.Path(obstdir).iterdir():
        if child.is_dir() and child.suffix == '.git':
            forkgroup = child.stem
            forkgroups[forkgroup] = set()
            obstrepo = child.as_posix()
            remotes = list_repo_remotes(obstrepo, withurl=True)
            for virtref, url in remotes:
                if url.find(toplevel) != 0:
                    continue
                forkgroups[forkgroup].add(url)
    return forkgroups


def get_repo_fingerprint(toplevel, gitdir, force=False, ignorerefs=None):
    fullpath = os.path.join(toplevel, gitdir.lstrip('/'))
    if not os.path.exists(fullpath):
        logger.debug('Cannot fingerprint %s, as it does not exist', fullpath)
        return None

    fpfile = os.path.join(fullpath, 'grokmirror.fingerprint')
    if not force and os.path.exists(fpfile):
        with open(fpfile, 'r') as fpfh:
            fingerprint = fpfh.read()
        logger.debug('Fingerprint for %s: %s', gitdir, fingerprint)
    else:
        logger.debug('Generating fingerprint for %s', gitdir)
        ecode, out, err = run_git_command(fullpath, ['show-ref'])
        if ecode > 0 or not len(out):
            logger.debug('No heads in %s, nothing to fingerprint.', fullpath)
            return None

        if ignorerefs:
            hasher = hashlib.sha1()
            for line in out.split('\n'):
                rhash, rname = line.split(maxsplit=1)
                ignored = False
                for ignoreref in ignorerefs:
                    if fnmatch.fnmatch(rname, ignoreref):
                        ignored = True
                        break
                if ignored:
                    continue
                hasher.update(line.encode() + b'\n')
            fingerprint = hasher.hexdigest()
        else:
            # We add the final "\n" to be compatible with cmdline output
            # of git-show-ref
            fingerprint = hashlib.sha1(out.encode() + b'\n').hexdigest()

        # Save it for future use
        if not force:
            set_repo_fingerprint(toplevel, gitdir, fingerprint)

    return fingerprint


def set_repo_fingerprint(toplevel, gitdir, fingerprint=None):
    fullpath = os.path.join(toplevel, gitdir.lstrip('/'))
    fpfile = os.path.join(fullpath, 'grokmirror.fingerprint')
    if fingerprint is None:
        fingerprint = get_repo_fingerprint(toplevel, gitdir, force=True)

    with open(fpfile, 'wt') as fpfh:
        fpfh.write('%s' % fingerprint)
    logger.debug('Recorded fingerprint for %s: %s', gitdir, fingerprint)
    return fingerprint


def get_altrepo_map(toplevel, refresh=False):
    global _alt_repo_map
    if _alt_repo_map is None or refresh:
        logger.info(' search: finding all repos using alternates')
        _alt_repo_map = dict()
        tp = pathlib.Path(toplevel)
        for subp in tp.glob('**/*.git'):
            if subp.is_symlink():
                # Don't care about symlinks for altrepo mapping
                continue
            fullpath = subp.resolve().as_posix()
            altrepo = get_altrepo(fullpath)
            if not altrepo:
                continue
            if altrepo not in _alt_repo_map:
                _alt_repo_map[altrepo] = set()
            _alt_repo_map[altrepo].add(fullpath)
    return _alt_repo_map


def is_alt_repo(toplevel, refrepo):
    amap = get_altrepo_map(toplevel)

    looking_for = os.path.realpath(os.path.join(toplevel, refrepo.strip('/')))
    if looking_for in amap:
        return True
    return False


def is_obstrepo(fullpath, obstdir=None):
    if obstdir:
        # At this point, both should be normalized
        return fullpath.find(obstdir) == 0
    # Just check if it has a grokmirror.objstore file in the repo
    return os.path.exists(os.path.join(fullpath, 'grokmirror.objstore'))


def find_all_gitdirs(toplevel, ignore=None, normalize=False, exclude_objstore=True):
    global _alt_repo_map
    if _alt_repo_map is None:
        _alt_repo_map = dict()
        build_amap = True
    else:
        build_amap = False

    if ignore is None:
        ignore = set()

    logger.info(' search: finding all repos in %s', toplevel)
    logger.debug('Ignore list: %s', ' '.join(ignore))
    gitdirs = set()
    for root, dirs, files in os.walk(toplevel, topdown=True):
        if not len(dirs):
            continue

        torm = set()
        for name in dirs:
            fullpath = os.path.join(root, name)
            # Should we ignore this dir?
            ignored = False
            for ignoredir in ignore:
                if fnmatch.fnmatch(fullpath, ignoredir):
                    torm.add(name)
                    ignored = True
                    break
            if ignored:
                continue
            if not is_bare_git_repo(fullpath):
                continue
            if exclude_objstore and os.path.exists(os.path.join(fullpath, 'grokmirror.objstore')):
                continue
            if normalize:
                fullpath = os.path.realpath(fullpath)

            logger.debug('Found %s', os.path.join(root, name))
            gitdirs.add(fullpath)
            torm.add(name)

            if build_amap:
                altrepo = get_altrepo(fullpath)
                if not altrepo:
                    continue
                if altrepo not in _alt_repo_map:
                    _alt_repo_map[altrepo] = set()
                _alt_repo_map[altrepo].add(fullpath)

        for name in torm:
            # don't recurse into the found *.git dirs
            dirs.remove(name)

    return gitdirs


def manifest_lock(manifile):
    global MANIFEST_LOCKH
    if MANIFEST_LOCKH is not None:
        logger.debug('Manifest %s already locked', manifile)

    manilock = _lockname(manifile)
    MANIFEST_LOCKH = open(manilock, 'w')
    logger.debug('Attempting to lock %s', manilock)
    lockf(MANIFEST_LOCKH, LOCK_EX)
    logger.debug('Manifest lock obtained')


def manifest_unlock(manifile):
    global MANIFEST_LOCKH
    if MANIFEST_LOCKH is not None:
        logger.debug('Unlocking manifest %s', manifile)
        # noinspection PyTypeChecker
        lockf(MANIFEST_LOCKH, LOCK_UN)
        # noinspection PyUnresolvedReferences
        MANIFEST_LOCKH.close()
        MANIFEST_LOCKH = None


def read_manifest(manifile, wait=False):
    while True:
        if not wait or os.path.exists(manifile):
            break
        logger.info(' manifest: manifest does not exist yet, waiting ...')
        # Unlock the manifest so other processes aren't waiting for us
        was_locked = False
        if MANIFEST_LOCKH is not None:
            was_locked = True
            manifest_unlock(manifile)
        time.sleep(1)
        if was_locked:
            manifest_lock(manifile)

    if not os.path.exists(manifile):
        logger.info(' manifest: no local manifest, assuming initial run')
        return dict()

    if manifile.find('.gz') > 0:
        fh = gzip.open(manifile, 'rb')
    else:
        fh = open(manifile, 'rb')

    logger.debug('Reading %s', manifile)
    jdata = fh.read().decode('utf-8')
    fh.close()

    # noinspection PyBroadException
    try:
        manifest = json.loads(jdata)
    except:
        # We'll regenerate the file entirely on failure to parse
        logger.critical('Unable to parse %s, will regenerate', manifile)
        manifest = dict()

    logger.debug('Manifest contains %s entries', len(manifest.keys()))

    return manifest


def write_manifest(manifile, manifest, mtime=None, pretty=False):
    logger.debug('Writing new %s', manifile)

    (dirname, basename) = os.path.split(manifile)
    (fd, tmpfile) = tempfile.mkstemp(prefix=basename, dir=dirname)
    fh = os.fdopen(fd, 'wb', 0)
    logger.debug('Created a temporary file in %s', tmpfile)
    logger.debug('Writing to %s', tmpfile)
    try:
        if pretty:
            jdata = json.dumps(manifest, indent=2, sort_keys=True)
        else:
            jdata = json.dumps(manifest)

        jdata = jdata.encode('utf-8')
        if manifile.endswith('.gz'):
            gfh = gzip.GzipFile(fileobj=fh, mode='wb')
            gfh.write(jdata)
            gfh.close()
        else:
            fh.write(jdata)

        os.fsync(fd)
        fh.close()
        # set mode to current umask
        curmask = os.umask(0)
        os.chmod(tmpfile, 0o0666 ^ curmask)
        os.umask(curmask)
        if mtime is not None:
            logger.debug('Setting mtime to %s', mtime)
            os.utime(tmpfile, (mtime, mtime))
        logger.debug('Moving %s to %s', tmpfile, manifile)
        shutil.move(tmpfile, manifile)
    finally:
        # If something failed, don't leave these trailing around
        if os.path.exists(tmpfile):
            logger.debug('Removing %s', tmpfile)
            os.unlink(tmpfile)


def load_config_file(cfgfile):
    from configparser import ConfigParser, ExtendedInterpolation
    if not os.path.exists(cfgfile):
        sys.stderr.write('ERROR: File does not exist: %s\n' % cfgfile)
        sys.exit(1)
    config = ConfigParser(interpolation=ExtendedInterpolation())
    config.read(cfgfile)

    if 'core' not in config:
        sys.stderr.write('ERROR: Section [core] must exist in: %s\n' % cfgfile)
        sys.stderr.write(' Perhaps this is a grokmirror-1.x config file?\n')
        sys.exit(1)

    toplevel = os.path.realpath(os.path.expanduser(config['core'].get('toplevel')))
    if not os.access(toplevel, os.W_OK):
        logger.critical('Toplevel %s does not exist or is not writable', toplevel)
        sys.exit(1)
    # Just in case we did expanduser
    config['core']['toplevel'] = toplevel

    obstdir = config['core'].get('objstore', None)
    if obstdir is None:
        obstdir = os.path.join(toplevel, 'objstore')
        config['core']['objstore'] = obstdir

    # Handle some other defaults
    manifile = config['core'].get('manifest')
    if not manifile:
        config['core']['manifest'] = os.path.join(toplevel, 'manifest.js.gz')

    fstat = os.stat(cfgfile)
    # stick last config file modification date into the config object,
    # so we can catch config file updates
    config.last_modified = fstat[8]

    return config


def is_precious(fullpath):
    args = ['config', '--get', 'extensions.preciousObjects']
    retcode, output, error = run_git_command(fullpath, args)
    if output.strip().lower() in ('yes', 'true', '1'):
        return True
    return False


def get_repack_level(obj_info, max_loose_objects=1200, max_packs=20, pc_loose_objects=10, pc_loose_size=10):
    # for now, hardcode the maximum loose objects and packs
    # XXX: we can probably set this in git config values?
    #      I don't think this makes sense as a global setting, because
    #      optimal values will depend on the size of the repo as a whole
    packs = int(obj_info['packs'])
    count_loose = int(obj_info['count'])

    needs_repack = 0

    # first, compare against max values:
    if packs >= max_packs:
        logger.debug('Triggering full repack because packs > %s', max_packs)
        needs_repack = 2
    elif count_loose >= max_loose_objects:
        logger.debug('Triggering quick repack because loose objects > %s', max_loose_objects)
        needs_repack = 1
    else:
        # is the number of loose objects or their size more than 10% of
        # the overall total?
        in_pack = int(obj_info['in-pack'])
        size_loose = int(obj_info['size'])
        size_pack = int(obj_info['size-pack'])
        total_obj = count_loose + in_pack
        total_size = size_loose + size_pack
        # If we have an alternate, then add those numbers in
        alternate = obj_info.get('alternate')
        if alternate and len(alternate) > 8 and alternate[-8:] == '/objects':
            alt_obj_info = get_repo_obj_info(alternate[:-8])
            total_obj += int(alt_obj_info['in-pack'])
            total_size += int(alt_obj_info['size-pack'])

        # set some arbitrary "worth bothering" limits so we don't
        # continuously repack tiny repos.
        if total_obj > 500 and count_loose / total_obj * 100 >= pc_loose_objects:
            logger.debug('Triggering repack because loose objects > %s%% of total', pc_loose_objects)
            needs_repack = 1
        elif total_size > 1024 and size_loose / total_size * 100 >= pc_loose_size:
            logger.debug('Triggering repack because loose size > %s%% of total', pc_loose_size)
            needs_repack = 1

    return needs_repack


def init_logger(subcommand, logfile, loglevel, verbose):
    global logger

    logger = logging.getLogger('grokmirror')
    logger.setLevel(logging.DEBUG)

    if logfile:
        ch = logging.handlers.WatchedFileHandler(os.path.expanduser(logfile))
        formatter = logging.Formatter(subcommand + '[%(process)d] %(asctime)s - %(levelname)s - %(message)s')
        ch.setFormatter(formatter)
        ch.setLevel(loglevel)
        logger.addHandler(ch)

    ch = logging.StreamHandler()
    formatter = logging.Formatter('%(message)s')
    ch.setFormatter(formatter)
    if verbose:
        ch.setLevel(logging.INFO)
    else:
        ch.setLevel(logging.CRITICAL)
    logger.addHandler(ch)

    return logger

grokmirror-2.0.11/grokmirror/bundle.py

# -*- coding: utf-8 -*-
# Copyright (C) 2013-2020 by The Linux Foundation and contributors
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

import sys
import os
import logging
import fnmatch

import grokmirror

from pathlib import Path

# default basic logger. We override it later.
logger = logging.getLogger(__name__)


def get_repo_size(fullpath):
    reposize = 0
    obj_info = grokmirror.get_repo_obj_info(fullpath)
    if 'alternate' in obj_info:
        altpath = grokmirror.get_altrepo(fullpath)
        reposize = get_repo_size(altpath)
    reposize += int(obj_info['size'])
    reposize += int(obj_info['size-pack'])

    logger.debug('%s size: %s', fullpath, reposize)
    return reposize


def generate_bundles(config, outdir, gitargs, revlistargs, maxsize, include):
    # uses advisory lock, so its safe even if we die unexpectedly
    manifest = grokmirror.read_manifest(config['core'].get('manifest'))
    toplevel = os.path.realpath(config['core'].get('toplevel'))
    if gitargs:
        gitargs = gitargs.split()
    if revlistargs:
        revlistargs = revlistargs.split()

    for repo in manifest.keys():
        logger.debug('Checking %s', repo)
        # Does it match our globbing pattern?
        found = False
        for tomatch in include:
            if fnmatch.fnmatch(repo, tomatch) or fnmatch.fnmatch(repo, tomatch.lstrip('/')):
                found = True
                break
        if not found:
            logger.debug('%s does not match include list, skipping', repo)
            continue

        repo = repo.lstrip('/')
        fullpath = os.path.join(toplevel, repo)
        bundledir = os.path.join(outdir, repo.replace('.git', ''))
        Path(bundledir).mkdir(parents=True, exist_ok=True)

        repofpr = grokmirror.get_repo_fingerprint(toplevel, repo)
        logger.debug('%s fingerprint is %s', repo, repofpr)

        # Do we have a bundle file already?
        bfile = os.path.join(bundledir, 'clone.bundle')
        bfprfile = os.path.join(bundledir, '.fingerprint')
        logger.debug('Looking for %s', bfile)
        if os.path.exists(bfile):
            # Do we have a bundle fingerprint?
            logger.debug('Found existing bundle in %s', bfile)
            if os.path.exists(bfprfile):
                with open(bfprfile) as fh:
                    bfpr = fh.read().strip()
                logger.debug('Read bundle fingerprint from %s: %s', bfprfile, bfpr)
                if bfpr == repofpr:
                    logger.info(' skipped: %s (unchanged)', repo)
                    continue

        logger.debug('checking size of %s', repo)
        total_size = get_repo_size(fullpath)/1024/1024
        if total_size > maxsize:
            logger.info(' skipped: %s (%s > %s)', repo, total_size, maxsize)
            continue

        fullargs = gitargs + ['bundle', 'create', bfile] + revlistargs
        logger.debug('Full git args: %s', fullargs)
        logger.info(' generate: %s', bfile)
        ecode, out, err = grokmirror.run_git_command(fullpath, fullargs)
        if ecode == 0:
            with open(bfprfile, 'w') as fh:
                fh.write(repofpr)
            logger.debug('Wrote %s into %s', repofpr, bfprfile)

    return 0


def parse_args():
    import argparse
    # noinspection PyTypeChecker
    op = argparse.ArgumentParser(prog='grok-bundle',
                                 description='Generate clone.bundle files for use with "repo"',
                                 formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    op.add_argument('-v', '--verbose', action='store_true',
                    default=False,
                    help='Be verbose and tell us what you are doing')
    op.add_argument('-c', '--config', required=True,
                    help='Location of the configuration file')
    op.add_argument('-o', '--outdir', required=True,
                    help='Location where to store bundle files')
    op.add_argument('-g', '--gitargs', default='-c core.compression=9',
                    help='extra args to pass to git')
    op.add_argument('-r', '--revlistargs', default='--branches HEAD',
                    help='Rev-list args to use')
    op.add_argument('-s', '--maxsize', type=int, default=2,
                    help='Maximum size of git repositories to bundle (in GiB)')
    op.add_argument('-i', '--include', nargs='*', default='*',
                    help='List repositories to bundle (accepts shell globbing)')
    op.add_argument('--version', action='version', version=grokmirror.VERSION)

    opts = op.parse_args()

    return opts


def grok_bundle(cfgfile, outdir, gitargs, revlistargs, maxsize, include, verbose=False):
    global logger

    config = grokmirror.load_config_file(cfgfile)

    logfile = config['core'].get('log', None)
    if config['core'].get('loglevel', 'info') == 'debug':
        loglevel = logging.DEBUG
    else:
        loglevel = logging.INFO

    logger = grokmirror.init_logger('bundle', logfile, loglevel, verbose)

    return generate_bundles(config, outdir, gitargs, revlistargs, maxsize, include)


def command():
    opts = parse_args()

    retval = grok_bundle(
        opts.config,
        opts.outdir,
        opts.gitargs,
        opts.revlistargs,
        opts.maxsize,
        opts.include,
        verbose=opts.verbose)

    sys.exit(retval)


if __name__ == '__main__':
    command()

grokmirror-2.0.11/grokmirror/dumb_pull.py

# -*- coding: utf-8 -*-
# Copyright (C) 2013-2018 by The Linux Foundation and contributors
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
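Inside grok-dumb-pull, remote names given with `-r` are treated as shell-style globs and matched against the repository's configured remotes with `fnmatch`. The following standalone sketch (my own function name; the remote names are made-up examples) isolates that matching loop:

```python
import fnmatch


def match_remotes(existing, patterns):
    # Return the existing remote names that match any of the glob
    # patterns, mirroring the matching loop in dumb_pull_repo().
    matched = []
    for myremote in existing:
        for pattern in patterns:
            if fnmatch.fnmatch(myremote, pattern):
                matched.append(myremote)
                break
    return matched


print(match_remotes(['origin', 'github', 'gitlab'], ['git*']))  # ['github', 'gitlab']
```

Passing `['*']` (the default when no `-r` is given) matches every configured remote.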
import os

import grokmirror
import logging
import fnmatch
import subprocess

logger = logging.getLogger(__name__)


def git_rev_parse_all(gitdir):
    args = ['rev-parse', '--all']
    retcode, output, error = grokmirror.run_git_command(gitdir, args)

    if error:
        # Put things we recognize into debug
        debug = list()
        warn = list()
        for line in error.split('\n'):
            warn.append(line)
        if debug:
            logger.debug('Stderr: %s', '\n'.join(debug))
        if warn:
            logger.warning('Stderr: %s', '\n'.join(warn))

    return output


def git_remote_update(args, fullpath):
    retcode, output, error = grokmirror.run_git_command(fullpath, args)

    if error:
        # Put things we recognize into debug
        debug = list()
        warn = list()
        for line in error.split('\n'):
            if line.find('From ') == 0:
                debug.append(line)
            elif line.find('-> ') > 0:
                debug.append(line)
            else:
                warn.append(line)
        if debug:
            logger.debug('Stderr: %s', '\n'.join(debug))
        if warn:
            logger.warning('Stderr: %s', '\n'.join(warn))


def dumb_pull_repo(gitdir, remotes, svn=False):
    # verify it's a git repo and fetch all remotes
    logger.debug('Will pull %s with following remotes: %s', gitdir, remotes)
    old_revs = git_rev_parse_all(gitdir)

    try:
        grokmirror.lock_repo(gitdir, nonblocking=True)
    except IOError:
        logger.info('Could not obtain exclusive lock on %s', gitdir)
        logger.info('\tAssuming another process is running.')
        return False

    if svn:
        logger.debug('Using git-svn for %s', gitdir)

        for remote in remotes:
            # arghie-argh-argh
            if remote == '*':
                remote = '--all'

            logger.info('Running git-svn fetch %s in %s', remote, gitdir)
            args = ['svn', 'fetch', remote]
            git_remote_update(args, gitdir)

    else:
        # Not an svn remote
        myremotes = grokmirror.list_repo_remotes(gitdir)
        if not len(myremotes):
            logger.info('Repository %s has no defined remotes!', gitdir)
            return False

        logger.debug('existing remotes: %s', myremotes)
        for remote in remotes:
            remotefound = False
            for myremote in myremotes:
                if fnmatch.fnmatch(myremote, remote):
                    remotefound = True
                    logger.debug('existing remote %s matches %s', myremote, remote)
args = ['remote', 'update', myremote, '--prune'] logger.info('Updating remote %s in %s', myremote, gitdir) git_remote_update(args, gitdir) if not remotefound: logger.info('Could not find any remotes matching %s in %s', remote, gitdir) new_revs = git_rev_parse_all(gitdir) grokmirror.unlock_repo(gitdir) if old_revs == new_revs: logger.debug('No new revs, no updates') return False logger.debug('New revs found -- new content pulled') return True def run_post_update_hook(hookscript, gitdir): if hookscript == '': return if not os.access(hookscript, os.X_OK): logger.warning('post_update_hook %s is not executable', hookscript) return args = [hookscript, gitdir] logger.debug('Running: %s', ' '.join(args)) (output, error) = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate() error = error.decode().strip() output = output.decode().strip() if error: # Put hook stderror into warning logger.warning('Hook Stderr: %s', error) if output: # Put hook stdout into info logger.info('Hook Stdout: %s', output) def parse_args(): import argparse # noinspection PyTypeChecker op = argparse.ArgumentParser(prog='grok-dumb-pull', description='Fetch remotes in repositories not managed by grokmirror', formatter_class=argparse.ArgumentDefaultsHelpFormatter) op.add_argument('-v', '--verbose', dest='verbose', action='store_true', default=False, help='Be verbose and tell us what you are doing') op.add_argument('-s', '--svn', dest='svn', action='store_true', default=False, help='The remotes for these repositories are Subversion') op.add_argument('-r', '--remote-names', dest='remotes', action='append', default=None, help='Only fetch remotes matching this name (accepts shell globbing)') op.add_argument('-u', '--post-update-hook', dest='posthook', default='', help='Run this hook after each repository is updated.') op.add_argument('-l', '--logfile', dest='logfile', default=None, help='Put debug logs into this file') op.add_argument('--version', action='version', 
                    version=grokmirror.VERSION)
    op.add_argument('paths', nargs='+',
                    help='Full path(s) of the repos to pull')

    opts = op.parse_args()

    if not len(opts.paths):
        op.error('You must provide at least a path to the repos to pull')

    return opts


def dumb_pull(paths, verbose=False, svn=False, remotes=None, posthook='', logfile=None):
    global logger

    loglevel = logging.INFO
    logger = grokmirror.init_logger('dumb-pull', logfile, loglevel, verbose)

    if remotes is None:
        remotes = ['*']

    # Find all repositories we are to pull
    for entry in paths:
        if entry[-4:] == '.git':
            if not os.path.exists(entry):
                logger.critical('%s does not exist', entry)
                continue

            logger.debug('Found %s', entry)
            didwork = dumb_pull_repo(entry, remotes, svn=svn)
            if didwork:
                run_post_update_hook(posthook, entry)
        else:
            logger.debug('Finding all git repos in %s', entry)
            for founddir in grokmirror.find_all_gitdirs(entry):
                didwork = dumb_pull_repo(founddir, remotes, svn=svn)
                if didwork:
                    run_post_update_hook(posthook, founddir)


def command():
    opts = parse_args()

    return dumb_pull(
        opts.paths, verbose=opts.verbose, svn=opts.svn, remotes=opts.remotes,
        posthook=opts.posthook, logfile=opts.logfile)


if __name__ == '__main__':
    command()

grokmirror-2.0.11/grokmirror/fsck.py

# -*- coding: utf-8 -*-
# Copyright (C) 2013-2020 by The Linux Foundation and contributors
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
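The remote-selection logic in dumb_pull_repo() above treats each -r/--remote-names
argument as a shell glob that is matched against the repository's configured remote
names via fnmatch. A minimal standalone sketch of that matching (the match_remotes
helper is illustrative only, not part of grokmirror):

```python
import fnmatch


def match_remotes(existing, patterns):
    # Mimic dumb_pull_repo(): every -r pattern is a shell glob matched
    # against each configured remote name; matching remotes get fetched.
    matched = []
    for pattern in patterns:
        for remote in existing:
            if fnmatch.fnmatch(remote, pattern):
                matched.append(remote)
    return matched


# With remotes ['origin', 'github', 'gitlab'], the glob 'git*' selects
# github and gitlab, but not origin.
print(match_remotes(['origin', 'github', 'gitlab'], ['git*']))
# → ['github', 'gitlab']
```

This is also why passing no -r arguments defaults to ['*'] in dumb_pull(): the
bare glob matches every configured remote.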
import os
import grokmirror
import logging
import time
import json
import random
import datetime
import shutil
import gc
import fnmatch
import io
import smtplib

from pathlib import Path
from email.message import EmailMessage

from fcntl import lockf, LOCK_EX, LOCK_UN, LOCK_NB

# default basic logger. We override it later.
logger = logging.getLogger(__name__)


def log_errors(fullpath, cmdargs, lines):
    logger.critical('%s reports errors:', fullpath)
    with open(os.path.join(fullpath, 'grokmirror.fsck.err'), 'w') as fh:
        fh.write('# Date: %s\n' % datetime.datetime.today().strftime('%F'))
        fh.write('# Cmd : git %s\n' % ' '.join(cmdargs))
        count = 0
        for line in lines:
            fh.write('%s\n' % line)
            logger.critical('\t%s', line)
            count += 1
            if count > 10:
                logger.critical('\t [ %s more lines skipped ]', len(lines) - 10)
                logger.critical('\t [ see %s/grokmirror.fsck.err ]', os.path.basename(fullpath))
                break


def gen_preload_bundle(fullpath, config):
    outdir = config['fsck'].get('preload_bundle_outdir')
    Path(outdir).mkdir(parents=True, exist_ok=True)
    bname = '%s.bundle' % os.path.basename(fullpath)[:-4]
    args = ['bundle', 'create', os.path.join(outdir, bname), '--all']
    logger.info(' bundling: %s', bname)
    grokmirror.run_git_command(fullpath, args)


def get_blob_set(fullpath):
    bset = set()
    size = 0
    blobcache = os.path.join(fullpath, 'grokmirror.blobs')
    if os.path.exists(blobcache):
        # Did it age out? Hardcode to 30 days.
        expage = time.time() - 86400*30
        st = os.stat(blobcache)
        if st.st_mtime < expage:
            os.unlink(blobcache)

    try:
        with open(blobcache) as fh:
            while True:
                line = fh.readline()
                if not len(line):
                    break
                if line[0] == '#':
                    continue
                chunks = line.strip().split()
                bhash = chunks[0]
                bsize = int(chunks[1])
                size += bsize
                bset.add((bhash, bsize))
        return bset, size
    except FileNotFoundError:
        pass

    # This only makes sense for repos not using alternates, so make sure you check first
    logger.info(' bloblist: %s', fullpath)
    gitargs = ['cat-file', '--batch-all-objects', '--batch-check', '--unordered']
    retcode, output, error = grokmirror.run_git_command(fullpath, gitargs)
    if retcode == 0:
        with open(blobcache, 'w') as fh:
            fh.write('# Blobs and sizes used for sibling calculation\n')
            for line in output.split('\n'):
                if line.find(' blob ') < 0:
                    continue
                chunks = line.strip().split()
                fh.write(f'{chunks[0]} {chunks[2]}\n')
                bhash = chunks[0]
                bsize = int(chunks[2])
                size += bsize
                bset.add((bhash, bsize))

    return bset, size


def check_sibling_repos_by_blobs(bset1, bsize1, bset2, bsize2, ratio):
    iset = bset1.intersection(bset2)
    if not len(iset):
        return False
    isize = 0
    for bhash, bsize in iset:
        isize += bsize
    # Both repos should share at least ratio % of blobs in them
    ratio1 = int(isize / bsize1 * 100)
    logger.debug('isize=%s, bsize1=%s, ratio1=%s', isize, bsize1, ratio1)
    ratio2 = int(isize / bsize2 * 100)
    logger.debug('isize=%s, bsize2=%s ratio2=%s', isize, bsize2, ratio2)
    if ratio1 >= ratio and ratio2 >= ratio:
        return True
    return False


def find_siblings_by_blobs(obstrepo, obstdir, ratio=75):
    siblings = set()
    oset, osize = get_blob_set(obstrepo)
    for srepo in grokmirror.find_all_gitdirs(obstdir, normalize=True, exclude_objstore=False):
        if srepo == obstrepo:
            continue
        logger.debug('Comparing blobs between %s and %s', obstrepo, srepo)
        sset, ssize = get_blob_set(srepo)
        if check_sibling_repos_by_blobs(oset, osize, sset, ssize, ratio):
            logger.info(' siblings: %s and %s', os.path.basename(obstrepo),
                        os.path.basename(srepo))
            siblings.add(srepo)

    return siblings


def merge_siblings(siblings, amap):
    mdest = None
    rcount = 0
    # Who has the most remotes?
    for sibling in set(siblings):
        if sibling not in amap or not len(amap[sibling]):
            # Orphaned sibling, ignore it -- it will get cleaned up
            siblings.remove(sibling)
            continue
        s_remotes = grokmirror.list_repo_remotes(sibling)
        if len(s_remotes) > rcount:
            mdest = sibling
            rcount = len(s_remotes)

    # Migrate all siblings into the repo with most remotes
    siblings.remove(mdest)
    for sibling in siblings:
        logger.info('%s: merging into %s', os.path.basename(sibling), os.path.basename(mdest))
        s_remotes = grokmirror.list_repo_remotes(sibling, withurl=True)
        for virtref, childpath in s_remotes:
            if childpath not in amap[sibling]:
                # The child repo isn't even using us
                args = ['remote', 'remove', virtref]
                grokmirror.run_git_command(sibling, args)
                continue
            logger.info(' moving: %s', childpath)
            success = grokmirror.add_repo_to_objstore(mdest, childpath)
            if not success:
                logger.critical('Could not add %s to %s', childpath, mdest)
                continue
            logger.info(' : fetching into %s', os.path.basename(mdest))
            success = grokmirror.fetch_objstore_repo(mdest, childpath)
            if not success:
                logger.critical('Failed to fetch %s from %s to %s', childpath,
                                os.path.basename(sibling), os.path.basename(mdest))
                continue
            logger.info(' : repointing alternates')
            grokmirror.set_altrepo(childpath, mdest)
            amap[sibling].remove(childpath)
            amap[mdest].add(childpath)
            args = ['remote', 'remove', virtref]
            grokmirror.run_git_command(sibling, args)
        logger.info(' : done')

    return mdest


def check_reclone_error(fullpath, config, errors):
    reclone = None
    toplevel = os.path.realpath(config['core'].get('toplevel'))
    errlist = config['fsck'].get('reclone_on_errors', '').split('\n')
    for line in errors:
        for estring in errlist:
            if line.find(estring) != -1:
                # is this repo used for alternates?
                gitdir = '/' + os.path.relpath(fullpath, toplevel).lstrip('/')
                if grokmirror.is_alt_repo(toplevel, gitdir):
                    logger.critical('\tused for alternates, not requesting auto-reclone')
                    return
                else:
                    reclone = line
                    logger.critical('\trequested auto-reclone')
                break
        if reclone is not None:
            break
    if reclone is None:
        return

    set_repo_reclone(fullpath, reclone)


def get_repo_size(fullpath):
    oi = grokmirror.get_repo_obj_info(fullpath)
    kbsize = 0
    for field in ['size', 'size-pack', 'size-garbage']:
        try:
            kbsize += int(oi[field])
        except (KeyError, ValueError):
            pass
    logger.debug('%s size: %s kb', fullpath, kbsize)
    return kbsize


def get_human_size(kbsize):
    num = kbsize
    for unit in ['Ki', 'Mi', 'Gi']:
        if abs(num) < 1024.0:
            return "%3.2f %sB" % (num, unit)
        num /= 1024.0
    return "%.2f TiB" % num


def set_repo_reclone(fullpath, reason):
    rfile = os.path.join(fullpath, 'grokmirror.reclone')
    # Have we already requested a reclone?
    if os.path.exists(rfile):
        logger.debug('Already requested repo reclone for %s', fullpath)
        return

    with open(rfile, 'w') as rfh:
        rfh.write('Requested by grok-fsck due to error: %s' % reason)


def run_git_prune(fullpath, config):
    # WARNING: We assume you've already verified that it's safe to do so
    prune_ok = True
    isprecious = grokmirror.is_precious(fullpath)
    if isprecious:
        set_precious_objects(fullpath, False)

    # We set expire to yesterday in order to avoid race conditions
    # in repositories that are actively being accessed at the time of
    # running the prune job.
    args = ['prune', '--expire=yesterday']
    logger.info(' prune: pruning')
    retcode, output, error = grokmirror.run_git_command(fullpath, args)

    if error:
        # Put things we recognize as fairly benign into debug
        debug = list()
        warn = list()
        ierrors = set([x.strip() for x in config['fsck'].get('ignore_errors', '').split('\n')])
        for line in error.split('\n'):
            ignored = False
            for estring in ierrors:
                if line.find(estring) != -1:
                    ignored = True
                    debug.append(line)
                    break
            if not ignored:
                warn.append(line)
        if debug:
            logger.debug('Stderr: %s', '\n'.join(debug))
        if warn:
            prune_ok = False
            log_errors(fullpath, args, warn)
            check_reclone_error(fullpath, config, warn)

    if isprecious:
        set_precious_objects(fullpath, True)

    return prune_ok


def is_safe_to_prune(fullpath, config):
    if config['fsck'].get('prune', 'yes') != 'yes':
        logger.debug('Pruning disabled in config file')
        return False

    toplevel = os.path.realpath(config['core'].get('toplevel'))
    obstdir = os.path.realpath(config['core'].get('objstore'))
    gitdir = '/' + os.path.relpath(fullpath, toplevel).lstrip('/')
    if grokmirror.is_obstrepo(fullpath, obstdir):
        # We only prune if all repos pointing to us are public
        urls = set(grokmirror.list_repo_remotes(fullpath, withurl=True))
        mine = set([x[1] for x in urls])
        amap = grokmirror.get_altrepo_map(toplevel)
        if mine != amap[fullpath]:
            logger.debug('Cannot prune %s because it is used by non-public repos', gitdir)
            return False
    elif grokmirror.is_alt_repo(toplevel, gitdir):
        logger.debug('Cannot prune %s because it is used as alternates by other repos', gitdir)
        return False

    logger.debug('%s should be safe to prune', gitdir)
    return True


def run_git_repack(fullpath, config, level=1, prune=True):
    # Returns false if we hit any errors on the way
    repack_ok = True
    obstdir = os.path.realpath(config['core'].get('objstore'))
    toplevel = os.path.realpath(config['core'].get('toplevel'))
    gitdir = '/' + os.path.relpath(fullpath, toplevel).lstrip('/')
    ierrors = set([x.strip() for x in config['fsck'].get('ignore_errors',
                                                          '').split('\n')])

    if prune:
        # Make sure it's safe to do so
        prune = is_safe_to_prune(fullpath, config)

    if config['fsck'].get('precious', '') == 'always':
        always_precious = True
        set_precious_objects(fullpath, enabled=True)
    else:
        always_precious = False
        set_precious_objects(fullpath, enabled=False)

    set_precious_after = False
    gen_commitgraph = True

    # Figure out what our repack flags should be.
    repack_flags = list()
    rregular = config['fsck'].get('extra_repack_flags', '').split()
    if len(rregular):
        repack_flags += rregular

    full_repack_flags = ['-f', '--pack-kept-objects']
    rfull = config['fsck'].get('extra_repack_flags_full', '').split()
    if len(rfull):
        full_repack_flags += rfull

    if grokmirror.is_obstrepo(fullpath, obstdir):
        set_precious_after = True
        repack_flags.append('-a')
        if not prune and not always_precious:
            repack_flags.append('-k')

    elif grokmirror.is_alt_repo(toplevel, gitdir):
        set_precious_after = True
        if grokmirror.get_altrepo(fullpath):
            gen_commitgraph = False
            logger.warning(' warning : has alternates and is used by others for alternates')
            logger.warning(' : this can cause grandchild corruption')
            repack_flags.append('-A')
            repack_flags.append('-l')
        else:
            repack_flags.append('-a')
            repack_flags.append('-b')
            if not always_precious:
                repack_flags.append('-k')

    elif grokmirror.get_altrepo(fullpath):
        # we are a "child repo"
        gen_commitgraph = False
        repack_flags.append('-l')
        repack_flags.append('-A')
        if prune:
            repack_flags.append('--unpack-unreachable=yesterday')

    else:
        # we have no relationships with other repos
        repack_flags.append('-a')
        repack_flags.append('-b')
        if prune:
            repack_flags.append('--unpack-unreachable=yesterday')

    if level > 1:
        logger.info(' repack: performing a full repack for optimal deltas')
        repack_flags += full_repack_flags

    if not always_precious:
        repack_flags.append('-d')

    # If we have a logs dir, then run reflog expire
    if os.path.isdir(os.path.join(fullpath, 'logs')):
        args = ['reflog', 'expire', '--all', '--stale-fix']
        logger.info(' reflog: expiring reflogs')
        grokmirror.run_git_command(fullpath, args)

    args = ['repack'] + repack_flags
    logger.info(' repack: repacking with "%s"', ' '.join(repack_flags))

    # We always tack on -q
    args.append('-q')

    retcode, output, error = grokmirror.run_git_command(fullpath, args)

    # With newer versions of git, repack may return warnings that are safe to ignore
    # so use the same strategy to weed out things we aren't interested in seeing
    if error:
        # Put things we recognize as fairly benign into debug
        debug = list()
        warn = list()
        for line in error.split('\n'):
            ignored = False
            for estring in ierrors:
                if line.find(estring) != -1:
                    ignored = True
                    debug.append(line)
                    break
            if not ignored:
                warn.append(line)
        if debug:
            logger.debug('Stderr: %s', '\n'.join(debug))
        if warn:
            repack_ok = False
            log_errors(fullpath, args, warn)
            check_reclone_error(fullpath, config, warn)

    if not repack_ok:
        # No need to repack refs if repo is broken
        if set_precious_after:
            set_precious_objects(fullpath, enabled=True)
        return False

    if gen_commitgraph and config['fsck'].get('commitgraph', 'yes') == 'yes':
        grokmirror.set_git_config(fullpath, 'core.commitgraph', 'true')
        run_git_commit_graph(fullpath)

    # repacking refs requires a separate command, so run it now
    args = ['pack-refs']
    if level > 1:
        logger.info(' packrefs: repacking all refs')
        args.append('--all')
    else:
        logger.info(' packrefs: repacking refs')
    retcode, output, error = grokmirror.run_git_command(fullpath, args)

    # pack-refs shouldn't return anything, but use the same ignore_errors block
    # to weed out any future potential benign warnings
    if error:
        # Put things we recognize as fairly benign into debug
        debug = list()
        warn = list()
        for line in error.split('\n'):
            ignored = False
            for estring in ierrors:
                if line.find(estring) != -1:
                    ignored = True
                    debug.append(line)
                    break
            if not ignored:
                warn.append(line)
        if debug:
            logger.debug('Stderr: %s', '\n'.join(debug))
        if warn:
            repack_ok = False
            log_errors(fullpath, args, warn)
            check_reclone_error(fullpath, config, warn)

    if prune:
        repack_ok = run_git_prune(fullpath, config)

    if set_precious_after:
        set_precious_objects(fullpath, enabled=True)

    return repack_ok


def run_git_fsck(fullpath, config, conn_only=False):
    args = ['fsck', '--no-progress', '--no-dangling', '--no-reflogs']
    obstdir = os.path.realpath(config['core'].get('objstore'))
    # If it's got an obstrepo, always run as connectivity-only
    altrepo = grokmirror.get_altrepo(fullpath)
    if altrepo and grokmirror.is_obstrepo(altrepo, obstdir):
        logger.debug('Repo uses objstore, forcing connectivity-only')
        conn_only = True
    if conn_only:
        args.append('--connectivity-only')
        logger.info(' fsck: running with --connectivity-only')
    else:
        logger.info(' fsck: running full checks')

    retcode, output, error = grokmirror.run_git_command(fullpath, args)

    if output or error:
        # Put things we recognize as fairly benign into debug
        debug = list()
        warn = list()
        ierrors = set([x.strip() for x in config['fsck'].get('ignore_errors', '').split('\n')])
        for line in output.split('\n') + error.split('\n'):
            if not len(line.strip()):
                continue
            ignored = False
            for estring in ierrors:
                if line.find(estring) != -1:
                    ignored = True
                    debug.append(line)
                    break
            if not ignored:
                warn.append(line)
        if debug:
            logger.debug('Stderr: %s', '\n'.join(debug))
        if warn:
            log_errors(fullpath, args, warn)
            check_reclone_error(fullpath, config, warn)


def run_git_commit_graph(fullpath):
    # Does our version of git support commit-graph?
    if not grokmirror.git_newer_than('2.18.0'):
        logger.debug('Git version too old, not generating commit-graph')
        return False

    logger.info(' graph: generating commit-graph')
    args = ['commit-graph', 'write']
    retcode, output, error = grokmirror.run_git_command(fullpath, args)
    if retcode == 0:
        return True

    return False


def set_precious_objects(fullpath, enabled=True):
    # It's better to just set it blindly without checking first,
    # as this results in one fewer shell-out.
    logger.debug('Setting preciousObjects for %s', fullpath)
    if enabled:
        poval = 'true'
    else:
        poval = 'false'
    grokmirror.set_git_config(fullpath, 'extensions.preciousObjects', poval)


def check_precious_objects(fullpath):
    return grokmirror.is_precious(fullpath)


def get_repack_level(obj_info, max_loose_objects=1200, max_packs=20, pc_loose_objects=10, pc_loose_size=10):
    # for now, hardcode the maximum loose objects and packs
    # XXX: we can probably set this in git config values?
    #      I don't think this makes sense as a global setting, because
    #      optimal values will depend on the size of the repo as a whole
    packs = int(obj_info['packs'])
    count_loose = int(obj_info['count'])

    needs_repack = 0

    # first, compare against max values:
    if packs >= max_packs:
        logger.debug('Triggering full repack because packs > %s', max_packs)
        needs_repack = 2
    elif count_loose >= max_loose_objects:
        logger.debug('Triggering quick repack because loose objects > %s', max_loose_objects)
        needs_repack = 1
    else:
        # is the number of loose objects or their size more than 10% of
        # the overall total?
        in_pack = int(obj_info['in-pack'])
        size_loose = int(obj_info['size'])
        size_pack = int(obj_info['size-pack'])
        total_obj = count_loose + in_pack
        total_size = size_loose + size_pack
        # set some arbitrary "worth bothering" limits so we don't
        # continuously repack tiny repos.
        if total_obj > 500 and count_loose / total_obj * 100 >= pc_loose_objects:
            logger.debug('Triggering repack because loose objects > %s%% of total', pc_loose_objects)
            needs_repack = 1
        elif total_size > 1024 and size_loose / total_size * 100 >= pc_loose_size:
            logger.debug('Triggering repack because loose size > %s%% of total', pc_loose_size)
            needs_repack = 1

    return needs_repack


def fsck_mirror(config, force=False, repack_only=False, conn_only=False,
                repack_all_quick=False, repack_all_full=False):
    if repack_all_quick or repack_all_full:
        force = True

    statusfile = config['fsck'].get('statusfile')
    if not statusfile:
        logger.critical('Please define fsck.statusfile in the config')
        return 1

    st_dir = os.path.dirname(statusfile)
    if not os.path.isdir(os.path.dirname(statusfile)):
        logger.critical('Directory %s is absent', st_dir)
        return 1

    # Lock the tree to make sure we only run one instance
    lockfile = os.path.join(st_dir, '.%s.lock' % os.path.basename(statusfile))
    logger.debug('Attempting to obtain lock on %s', lockfile)
    flockh = open(lockfile, 'w')
    try:
        lockf(flockh, LOCK_EX | LOCK_NB)
    except IOError:
        logger.info('Could not obtain exclusive lock on %s', lockfile)
        logger.info('Assuming another process is running.')
        return 0

    manifile = config['core'].get('manifest')
    logger.info('Analyzing %s', manifile)
    grokmirror.manifest_lock(manifile)
    manifest = grokmirror.read_manifest(manifile)

    if os.path.exists(statusfile):
        logger.info(' status: reading %s', statusfile)
        stfh = open(statusfile, 'r')
        # noinspection PyBroadException
        try:
            # Format of the status file:
            # {
            #   '/full/path/to/repository': {
            #     'lastcheck': 'YYYY-MM-DD' or 'never',
            #     'nextcheck': 'YYYY-MM-DD',
            #     'lastrepack': 'YYYY-MM-DD',
            #     'fingerprint': 'sha-1',
            #     's_elapsed': seconds,
            #     'quick_repack_count': times,
            #   },
            #   ...
            # }
            status = json.loads(stfh.read())
        except:
            logger.critical('Failed to parse %s', statusfile)
            lockf(flockh, LOCK_UN)
            flockh.close()
            return 1
    else:
        status = dict()

    frequency = config['fsck'].getint('frequency', 30)

    today = datetime.datetime.today()
    todayiso = today.strftime('%F')

    if force:
        # Use randomization for next check, again
        checkdelay = random.randint(1, frequency)
    else:
        checkdelay = frequency

    commitgraph = config['fsck'].getboolean('commitgraph', True)

    # Is our git version new enough to support it?
    if commitgraph and not grokmirror.git_newer_than('2.18.0'):
        logger.info('Git version too old to support commit graphs, disabling')
        config['fsck']['commitgraph'] = 'no'

    # Go through the manifest and compare with status
    toplevel = os.path.realpath(config['core'].get('toplevel'))
    changed = False
    for gitdir in list(manifest):
        fullpath = os.path.join(toplevel, gitdir.lstrip('/'))
        # Does it exist?
        if not os.path.isdir(fullpath):
            # Remove it from manifest and status
            manifest.pop(gitdir)
            status.pop(fullpath)
            changed = True
            continue

        if fullpath not in status.keys():
            # Newly added repository
            if not force:
                # Randomize next check between now and frequency
                delay = random.randint(0, frequency)
                nextdate = today + datetime.timedelta(days=delay)
                nextcheck = nextdate.strftime('%F')
            else:
                nextcheck = todayiso

            status[fullpath] = {
                'lastcheck': 'never',
                'nextcheck': nextcheck,
                'fingerprint': grokmirror.get_repo_fingerprint(toplevel, gitdir),
            }
            logger.info('%s:', fullpath)
            logger.info(' added: next check on %s', nextcheck)

    if 'manifest' in config:
        pretty = config['manifest'].getboolean('pretty', False)
    else:
        pretty = False

    if changed:
        grokmirror.write_manifest(manifile, manifest, pretty=pretty)

    grokmirror.manifest_unlock(manifile)

    # record newly found repos in the status file
    logger.debug('Updating status file in %s', statusfile)
    with open(statusfile, 'w') as stfh:
        stfh.write(json.dumps(status, indent=2))

    # Go through status and find all repos that need work done on them.
    to_process = set()

    total_checked = 0
    total_elapsed = 0
    space_saved = 0

    cfg_repack = config['fsck'].getboolean('repack', True)
    # Can be "always", which is why we don't getboolean
    cfg_precious = config['fsck'].get('precious', 'yes')

    obstdir = os.path.realpath(config['core'].get('objstore'))
    logger.info(' search: getting parent commit info from all repos, may take a while')
    top_roots, obst_roots = grokmirror.get_rootsets(toplevel, obstdir)
    amap = grokmirror.get_altrepo_map(toplevel)

    fetched_obstrepos = set()
    obst_changes = False
    analyzed = 0
    queued = 0
    logger.info('Analyzing %s (%s repos)', toplevel, len(status))
    stattime = time.time()
    baselines = [x.strip() for x in config['fsck'].get('baselines', '').split('\n')]
    for fullpath in list(status):
        # Give me a status every 5 seconds
        if time.time() - stattime >= 5:
            logger.info(' ---: %s/%s analyzed, %s queued', analyzed, len(status), queued)
            stattime = time.time()
        start_size = get_repo_size(fullpath)
        analyzed += 1
        # We do obstrepos separately below, as logic is different
        if grokmirror.is_obstrepo(fullpath, obstdir):
            logger.debug('Skipping %s (obstrepo)', fullpath)
            continue

        # Check to make sure it's still in the manifest
        gitdir = fullpath.replace(toplevel, '', 1)
        gitdir = '/' + gitdir.lstrip('/')
        if gitdir not in manifest:
            status.pop(fullpath)
            logger.debug('%s is gone, no longer in manifest', gitdir)
            continue

        # Make sure FETCH_HEAD is pointing to /dev/null
        fetch_headf = os.path.join(fullpath, 'FETCH_HEAD')
        if not os.path.islink(fetch_headf):
            logger.debug(' replacing FETCH_HEAD with symlink to /dev/null')
            try:
                os.unlink(fetch_headf)
            except FileNotFoundError:
                pass
            os.symlink('/dev/null', fetch_headf)

        # Objstore migration routines
        # Are we using objstore?
        altdir = grokmirror.get_altrepo(fullpath)
        is_private = grokmirror.is_private_repo(config, gitdir)
        if grokmirror.is_alt_repo(toplevel, gitdir):
            # Don't prune any repos that are parents -- until migration is fully complete
            m_prune = False
        else:
            m_prune = True

        if not altdir and not os.path.exists(os.path.join(fullpath, 'grokmirror.do-not-objstore')):
            # Do we match any obstdir repos?
            obstrepo = grokmirror.find_best_obstrepo(fullpath, obst_roots, toplevel, baselines)
            if obstrepo:
                obst_changes = True
                # Yes, set ourselves up to be using that obstdir
                logger.info('%s: can use %s', gitdir, os.path.basename(obstrepo))
                grokmirror.set_altrepo(fullpath, obstrepo)
                if not is_private:
                    grokmirror.add_repo_to_objstore(obstrepo, fullpath)
                    # Fetch into the obstrepo
                    logger.info(' fetch: fetching %s', gitdir)
                    grokmirror.fetch_objstore_repo(obstrepo, fullpath)
                    obst_roots[obstrepo] = grokmirror.get_repo_roots(obstrepo, force=True)
                run_git_repack(fullpath, config, level=1, prune=m_prune)
                space_saved += start_size - get_repo_size(fullpath)
            else:
                # Do we have any toplevel siblings?
                obstrepo = None
                my_roots = grokmirror.get_repo_roots(fullpath)
                top_siblings = grokmirror.find_siblings(fullpath, my_roots, top_roots)
                if len(top_siblings):
                    # Am I a private repo?
                    if is_private:
                        # Are there any non-private siblings?
                        for top_sibling in top_siblings:
                            # Are you a private repo?
                            if grokmirror.is_private_repo(config, top_sibling):
                                continue
                            # Great, make an objstore repo out of this sibling
                            obstrepo = grokmirror.setup_objstore_repo(obstdir)
                            logger.info('%s: can use %s', gitdir, os.path.basename(obstrepo))
                            logger.info(' init: new objstore repo %s', os.path.basename(obstrepo))
                            grokmirror.add_repo_to_objstore(obstrepo, top_sibling)
                            # Fetch into the obstrepo
                            logger.info(' fetch: fetching %s', top_sibling)
                            grokmirror.fetch_objstore_repo(obstrepo, top_sibling)
                            obst_roots[obstrepo] = grokmirror.get_repo_roots(obstrepo, force=True)
                            # It doesn't matter if this fails, because repacking is still safe
                            # Other siblings will match in their own due course
                            break
                    else:
                        # Make an objstore repo out of myself
                        obstrepo = grokmirror.setup_objstore_repo(obstdir)
                        logger.info('%s: can use %s', gitdir, os.path.basename(obstrepo))
                        logger.info(' init: new objstore repo %s', os.path.basename(obstrepo))
                        grokmirror.add_repo_to_objstore(obstrepo, fullpath)

                if obstrepo:
                    obst_changes = True
                    # Set alternates to the obstrepo
                    grokmirror.set_altrepo(fullpath, obstrepo)
                    if not is_private:
                        # Fetch into the obstrepo
                        logger.info(' fetch: fetching %s', gitdir)
                        grokmirror.fetch_objstore_repo(obstrepo, fullpath)
                    run_git_repack(fullpath, config, level=1, prune=m_prune)
                    space_saved += start_size - get_repo_size(fullpath)
                    obst_roots[obstrepo] = grokmirror.get_repo_roots(obstrepo, force=True)

        elif not os.path.isdir(altdir):
            logger.critical(' reclone: %s (alternates repo gone)', gitdir)
            set_repo_reclone(fullpath, 'Alternates repository gone')
            continue

        elif altdir.find(obstdir) != 0:
            # We have an alternates repo, but it's not an objstore repo
            # Probably left over from grokmirror-1.x
            # Do we have any matching obstrepos?
            obstrepo = grokmirror.find_best_obstrepo(fullpath, obst_roots, toplevel, baselines)
            if obstrepo:
                logger.info('%s: migrating to %s', gitdir, os.path.basename(obstrepo))
                if altdir not in fetched_obstrepos:
                    # We're already sharing objects with altdir, so no need to check if it's private
                    grokmirror.add_repo_to_objstore(obstrepo, altdir)
                    logger.info(' fetch: fetching %s (previous parent)', os.path.relpath(altdir, toplevel))
                    success = grokmirror.fetch_objstore_repo(obstrepo, altdir)
                    fetched_obstrepos.add(altdir)
                    if success:
                        set_precious_objects(altdir, enabled=False)
                        pre_size = get_repo_size(altdir)
                        run_git_repack(altdir, config, level=1, prune=False)
                        space_saved += pre_size - get_repo_size(altdir)
                    else:
                        logger.critical('Unsuccessful fetching %s into %s', altdir, os.path.basename(obstrepo))
                        obstrepo = None
            else:
                # Make a new obstrepo out of mommy
                obstrepo = grokmirror.setup_objstore_repo(obstdir)
                logger.info('%s: migrating to %s', gitdir, os.path.basename(obstrepo))
                logger.info(' init: new objstore repo %s', os.path.basename(obstrepo))
                grokmirror.add_repo_to_objstore(obstrepo, altdir)
                logger.info(' fetch: fetching %s (previous parent)', os.path.relpath(altdir, toplevel))
                success = grokmirror.fetch_objstore_repo(obstrepo, altdir)
                fetched_obstrepos.add(altdir)
                if success:
                    grokmirror.set_altrepo(altdir, obstrepo)
                    # mommy is no longer precious
                    set_precious_objects(altdir, enabled=False)
                    # Don't prune, because there may be objects others are still borrowing
                    # It can only be pruned once the full migration is completed
                    pre_size = get_repo_size(altdir)
                    run_git_repack(altdir, config, level=1, prune=False)
                    space_saved += pre_size - get_repo_size(altdir)
                else:
                    logger.critical('Unsuccessful fetching %s into %s', altdir, os.path.basename(obstrepo))
                    obstrepo = None

            if obstrepo:
                obst_changes = True
                if not is_private:
                    # Fetch into the obstrepo
                    grokmirror.add_repo_to_objstore(obstrepo, fullpath)
                    logger.info(' fetch: fetching %s', gitdir)
                    if grokmirror.fetch_objstore_repo(obstrepo,
                                                      fullpath):
                        grokmirror.set_altrepo(fullpath, obstrepo)
                        set_precious_objects(fullpath, enabled=False)
                        run_git_repack(fullpath, config, level=1, prune=m_prune)
                        space_saved += start_size - get_repo_size(fullpath)
                else:
                    # Grab all the objects from the previous parent, since we can't simply
                    # fetch ourselves into the obstrepo (we're private).
                    args = ['repack', '-a']
                    logger.info(' fetch: restoring private repo %s', gitdir)
                    if grokmirror.run_git_command(fullpath, args):
                        grokmirror.set_altrepo(fullpath, obstrepo)
                        set_precious_objects(fullpath, enabled=False)
                        # Now repack ourselves to get rid of any public objects
                        run_git_repack(fullpath, config, level=1, prune=m_prune)
                obst_roots[obstrepo] = grokmirror.get_repo_roots(obstrepo, force=True)

        elif altdir.find(obstdir) == 0 and not is_private:
            # Make sure this repo is properly set up with obstrepo
            # (e.g. it could have been cloned/copied and obstrepo is not tracking it yet)
            obstrepo = altdir
            s_remotes = grokmirror.list_repo_remotes(obstrepo, withurl=True)
            found = False
            for virtref, childpath in s_remotes:
                if childpath == fullpath:
                    found = True
                    break
            if not found:
                # Set it up properly
                grokmirror.add_repo_to_objstore(obstrepo, fullpath)
                logger.info(' reconfig: %s to fetch into %s', gitdir, os.path.basename(obstrepo))

        obj_info = grokmirror.get_repo_obj_info(fullpath)
        try:
            packs = int(obj_info['packs'])
            count_loose = int(obj_info['count'])
        except KeyError:
            logger.warning('Unable to count objects in %s, skipping' % fullpath)
            continue

        schedcheck = datetime.datetime.strptime(status[fullpath]['nextcheck'], '%Y-%m-%d')
        nextcheck = today + datetime.timedelta(days=checkdelay)

        if not cfg_repack:
            # don't look at me if you turned off repack
            logger.debug('Not repacking because repack=no in config')
            repack_level = None
        elif repack_all_full and (count_loose > 0 or packs > 1):
            logger.debug('repack_level=2 due to repack_all_full')
            repack_level = 2
        elif repack_all_quick and count_loose > 0:
            logger.debug('repack_level=1 due to repack_all_quick')
            repack_level = 1
        elif status[fullpath].get('fingerprint') != grokmirror.get_repo_fingerprint(toplevel, gitdir):
            logger.debug('Checking repack level of %s', fullpath)
            repack_level = get_repack_level(obj_info)
        else:
            repack_level = None

        # trigger a level-1 repack if it's regular check time and the fingerprint has changed
        if (not repack_level and schedcheck <= today
                and status[fullpath].get('fingerprint') != grokmirror.get_repo_fingerprint(toplevel, gitdir)):
            status[fullpath]['nextcheck'] = nextcheck.strftime('%F')
            logger.info(' aged: %s (forcing repack)', fullpath)
            repack_level = 1

        # If we're not already repacking the repo, run a prune if we find garbage in it
        if obj_info['garbage'] != '0' and not repack_level and is_safe_to_prune(fullpath, config):
            logger.info(' garbage: %s (%s files, %s KiB)', gitdir, obj_info['garbage'], obj_info['size-garbage'])
            try:
                grokmirror.lock_repo(fullpath, nonblocking=True)
                run_git_prune(fullpath, config)
                grokmirror.unlock_repo(fullpath)
            except IOError:
                pass

        if repack_level and (cfg_precious == 'always' and check_precious_objects(fullpath)):
            # if we have preciousObjects, then we only repack based on the same
            # schedule as fsck.
            logger.debug('preciousObjects is set')
            # for repos with preciousObjects, we use the fsck schedule for repacking
            if schedcheck <= today:
                logger.debug('Time for a full periodic repack of a preciousObjects repo')
                status[fullpath]['nextcheck'] = nextcheck.strftime('%F')
                repack_level = 2
            else:
                logger.debug('Not repacking preciousObjects repo outside of schedule')
                repack_level = None

        if repack_level:
            queued += 1
            to_process.add((fullpath, 'repack', repack_level))
            if repack_level > 1:
                logger.info(' queued: %s (full repack)', fullpath)
            else:
                logger.info(' queued: %s (repack)', fullpath)
        elif repack_only or repack_all_quick or repack_all_full:
            continue
        elif schedcheck <= today or force:
            queued += 1
            to_process.add((fullpath, 'fsck', None))
            logger.info(' queued: %s (fsck)', fullpath)

    logger.info(' done: %s analyzed, %s queued', analyzed, queued)

    if obst_changes:
        # Refresh the alt repo map cache
        amap = grokmirror.get_altrepo_map(toplevel, refresh=True)

    obstrepos = grokmirror.find_all_gitdirs(obstdir, normalize=True, exclude_objstore=False)

    analyzed = 0
    queued = 0
    logger.info('Analyzing %s (%s repos)', obstdir, len(obstrepos))
    objstore_uses_plumbing = config['core'].getboolean('objstore_uses_plumbing', False)
    islandcores = [x.strip() for x in config['fsck'].get('islandcores', '').split('\n')]
    stattime = time.time()
    for obstrepo in obstrepos:
        if time.time() - stattime >= 5:
            logger.info(' ---: %s/%s analyzed, %s queued', analyzed, len(obstrepos), queued)
            stattime = time.time()
        analyzed += 1
        logger.debug('Processing objstore repo: %s', os.path.basename(obstrepo))
        my_roots = grokmirror.get_repo_roots(obstrepo)
        if obstrepo in amap and len(amap[obstrepo]):
            # Is it redundant with any other objstore repos?
strategy = config['fsck'].get('obstrepo_merge_strategy', 'exact') if strategy == 'blobs': siblings = find_siblings_by_blobs(obstrepo, obstdir, ratio=75) else: exact_merge = True if strategy == 'loose': exact_merge = False siblings = grokmirror.find_siblings(obstrepo, my_roots, obst_roots, exact=exact_merge) if len(siblings): siblings.add(obstrepo) mdest = merge_siblings(siblings, amap) obst_changes = True if mdest in status: # Force full repack of merged obstrepos status[mdest]['nextcheck'] = todayiso # Recalculate my roots my_roots = grokmirror.get_repo_roots(obstrepo, force=True) obst_roots[obstrepo] = my_roots # Not an else, because the previous step may have migrated things if obstrepo not in amap or not len(amap[obstrepo]): obst_changes = True # XXX: Is there a possible race condition here if grok-pull cloned a new repo # while we were migrating this one? logger.info('%s: deleting (no longer used by anything)', os.path.basename(obstrepo)) if obstrepo in amap: amap.pop(obstrepo) shutil.rmtree(obstrepo) continue # Record the latest sibling info in the tracking file telltale = os.path.join(obstrepo, 'grokmirror.objstore') with open(telltale, 'w') as fh: fh.write(grokmirror.OBST_PREAMBULE) fh.write('\n'.join(sorted(list(amap[obstrepo]))) + '\n') my_remotes = grokmirror.list_repo_remotes(obstrepo, withurl=True) # Use the first child repo as our "reference" entry in manifest refrepo = None # Use for the alternateRefsPrefixes value baseline_refs = set() set_islandcore = False new_islandcore = False valid_virtrefs = set() for virtref, childpath in my_remotes: # Is it still relevant? if childpath not in amap[obstrepo]: # Remove it and let prune take care of it grokmirror.remove_from_objstore(obstrepo, childpath) logger.info('%s: removed remote %s (no longer used)', os.path.basename(obstrepo), childpath) continue valid_virtrefs.add(virtref) # Does it need fetching? 
fetch = True l_fpf = os.path.join(obstrepo, 'grokmirror.%s.fingerprint' % virtref) r_fpf = os.path.join(childpath, 'grokmirror.fingerprint') try: with open(l_fpf) as fh: l_fp = fh.read().strip() with open(r_fpf) as fh: r_fp = fh.read().strip() if l_fp == r_fp: fetch = False except IOError: pass gitdir = '/' + os.path.relpath(childpath, toplevel) if fetch: grokmirror.lock_repo(obstrepo, nonblocking=False) logger.info(' fetch: %s -> %s', gitdir, os.path.basename(obstrepo)) success = grokmirror.fetch_objstore_repo(obstrepo, childpath, use_plumbing=objstore_uses_plumbing) if not success and objstore_uses_plumbing: # Try using git porcelain grokmirror.fetch_objstore_repo(obstrepo, childpath) grokmirror.unlock_repo(obstrepo) if gitdir not in manifest: continue # Do we need to set any alternateRefsPrefixes? for baseline in baselines: # Does this repo match a baseline if fnmatch.fnmatch(gitdir, baseline): baseline_refs.add('refs/virtual/%s/heads/' % virtref) break # Do we need to set islandCore? if not set_islandcore: is_islandcore = False for islandcore in islandcores: # Does this repo match a baseline if fnmatch.fnmatch(gitdir, islandcore): is_islandcore = True break if is_islandcore: set_islandcore = True # is it already set to that? 
entries = grokmirror.get_config_from_git(obstrepo, r'pack\.island*') if entries.get('islandcore') != virtref: new_islandcore = True logger.info(' reconfig: %s (islandCore to %s)', os.path.basename(obstrepo), virtref) grokmirror.set_git_config(obstrepo, 'pack.islandCore', virtref) if refrepo is None: # Legacy "reference=" setting in manifest refrepo = gitdir manifest[gitdir]['reference'] = None else: manifest[gitdir]['reference'] = refrepo manifest[gitdir]['forkgroup'] = os.path.basename(obstrepo[:-4]) if len(baseline_refs): # sort the list, so we have deterministic value br = list(baseline_refs) br.sort() refpref = ' '.join(br) # Go through all remotes and set their alternateRefsPrefixes for s_virtref, s_childpath in my_remotes: # is it already set to that? entries = grokmirror.get_config_from_git(s_childpath, r'core\.alternate*') if entries.get('alternaterefsprefixes') != refpref: s_gitdir = '/' + os.path.relpath(s_childpath, toplevel) logger.info(' reconfig: %s (baseline)', s_gitdir) grokmirror.set_git_config(s_childpath, 'core.alternateRefsPrefixes', refpref) repack_requested = False if os.path.exists(os.path.join(obstrepo, 'grokmirror.repack')): repack_requested = True # Go through all our refs and find all stale virtrefs args = ['for-each-ref', '--format=%(refname)', 'refs/virtual/'] trimmed_virtrefs = set() ecode, out, err = grokmirror.run_git_command(obstrepo, args) if ecode == 0 and out: for line in out.split('\n'): chunks = line.split('/') if len(chunks) < 3: # Where did this come from? 
logger.debug('Weird ref %s in objstore repo %s', line, obstrepo) continue virtref = chunks[2] if virtref not in valid_virtrefs and virtref not in trimmed_virtrefs: logger.info(' trim: stale virtref %s', virtref) grokmirror.objstore_trim_virtref(obstrepo, virtref) trimmed_virtrefs.add(virtref) if obstrepo not in status or new_islandcore or trimmed_virtrefs or repack_requested: # We don't use obstrepo fingerprints, so we set it to None status[obstrepo] = { 'lastcheck': 'never', 'nextcheck': todayiso, 'fingerprint': None, } # Always full-repack brand new obstrepos repack_level = 2 else: obj_info = grokmirror.get_repo_obj_info(obstrepo) repack_level = get_repack_level(obj_info) nextcheck = datetime.datetime.strptime(status[obstrepo]['nextcheck'], '%Y-%m-%d') if repack_level > 1 and nextcheck > today: # Don't do full repacks outside of schedule repack_level = 1 if repack_level: queued += 1 to_process.add((obstrepo, 'repack', repack_level)) if repack_level > 1: logger.info(' queued: %s (full repack)', os.path.basename(obstrepo)) else: logger.info(' queued: %s (repack)', os.path.basename(obstrepo)) elif repack_only or repack_all_quick or repack_all_full: continue elif (nextcheck <= today or force) and not repack_only: queued += 1 status[obstrepo]['nextcheck'] = nextcheck.strftime('%F') to_process.add((obstrepo, 'fsck', None)) logger.info(' queued: %s (fsck)', os.path.basename(obstrepo)) logger.info(' done: %s analyzed, %s queued', analyzed, queued) if obst_changes: # We keep the same mtime, because the repos themselves haven't changed grokmirror.manifest_lock(manifile) # Re-read manifest, so we can update reference and forkgroup data disk_manifest = grokmirror.read_manifest(manifile) # Go through my manifest and update and changes in forkgroup data for gitdir in manifest: if gitdir not in disk_manifest: # What happened here? 
continue if 'reference' in manifest[gitdir]: disk_manifest[gitdir]['reference'] = manifest[gitdir]['reference'] if 'forkgroup' in manifest[gitdir]: disk_manifest[gitdir]['forkgroup'] = manifest[gitdir]['forkgroup'] grokmirror.write_manifest(manifile, disk_manifest, pretty=pretty) grokmirror.manifest_unlock(manifile) if not len(to_process): logger.info('No repos need attention.') return # Delete some vars that are huge for large repo sets -- we no longer need them and the # next step will likely eat lots of ram. del obst_roots del top_roots gc.collect() logger.info('Processing %s repositories', len(to_process)) for fullpath, action, repack_level in to_process: logger.info('%s:', fullpath) start_size = get_repo_size(fullpath) checkdelay = frequency if not force else random.randint(1, frequency) nextcheck = today + datetime.timedelta(days=checkdelay) # Calculate elapsed seconds startt = time.time() # Wait till the repo is available and lock it for the duration of checks, # otherwise there may be false-positives if a mirrored repo is updated # in the middle of fsck or repack. grokmirror.lock_repo(fullpath, nonblocking=False) if action == 'repack': if run_git_repack(fullpath, config, repack_level): status[fullpath]['lastrepack'] = todayiso if repack_level > 1: try: os.unlink(os.path.join(fullpath, 'grokmirror.repack')) except FileNotFoundError: pass status[fullpath]['lastfullrepack'] = todayiso status[fullpath]['lastcheck'] = todayiso status[fullpath]['nextcheck'] = nextcheck.strftime('%F') # Do we need to generate a preload bundle? 
                if config['fsck'].get('preload_bundle_outdir') and grokmirror.is_obstrepo(fullpath, obstdir):
                    gen_preload_bundle(fullpath, config)
                logger.info('    next: %s', status[fullpath]['nextcheck'])
            else:
                logger.warning('Repacking %s was unsuccessful', fullpath)
                grokmirror.unlock_repo(fullpath)
                continue

        elif action == 'fsck':
            run_git_fsck(fullpath, config, conn_only)
            status[fullpath]['lastcheck'] = todayiso
            status[fullpath]['nextcheck'] = nextcheck.strftime('%F')
            logger.info('    next: %s', status[fullpath]['nextcheck'])

        gitdir = '/' + os.path.relpath(fullpath, toplevel)
        status[fullpath]['fingerprint'] = grokmirror.get_repo_fingerprint(toplevel, gitdir)

        # noinspection PyTypeChecker
        elapsed = int(time.time()-startt)
        status[fullpath]['s_elapsed'] = elapsed

        # We're done with the repo now
        grokmirror.unlock_repo(fullpath)
        total_checked += 1
        total_elapsed += elapsed
        saved = start_size - get_repo_size(fullpath)
        space_saved += saved
        if saved > 0:
            logger.info('    done: %ss, %s saved', elapsed, get_human_size(saved))
        else:
            logger.info('    done: %ss', elapsed)

        if space_saved > 0:
            logger.info('     ---: %s done, %s queued, %s saved', total_checked,
                        len(to_process)-total_checked, get_human_size(space_saved))
        else:
            logger.info('     ---: %s done, %s queued', total_checked, len(to_process)-total_checked)

        # Write status file after each check, so if the process dies, we won't
        # have to recheck all the repos we've already checked
        logger.debug('Updating status file in %s', statusfile)
        with open(statusfile, 'w') as stfh:
            stfh.write(json.dumps(status, indent=2))

    logger.info('Processed %s repos in %0.2fs', total_checked, total_elapsed)

    with open(statusfile, 'w') as stfh:
        stfh.write(json.dumps(status, indent=2))

    lockf(flockh, LOCK_UN)
    flockh.close()


def parse_args():
    import argparse
    # noinspection PyTypeChecker
    op = argparse.ArgumentParser(prog='grok-fsck',
                                 description='Optimize and check mirrored repositories',
                                 formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    op.add_argument('-v', '--verbose', dest='verbose', action='store_true', default=False,
                    help='Be verbose and tell us what you are doing')
    op.add_argument('-f', '--force', dest='force', action='store_true', default=False,
                    help='Force immediate run on all repositories')
    op.add_argument('-c', '--config', dest='config', required=True,
                    help='Location of the configuration file')
    op.add_argument('--repack-only', dest='repack_only', action='store_true', default=False,
                    help='Only find and repack repositories that need optimizing')
    op.add_argument('--connectivity-only', dest='conn_only', action='store_true', default=False,
                    help='Only check connectivity when running fsck checks')
    op.add_argument('--repack-all-quick', dest='repack_all_quick', action='store_true', default=False,
                    help='(Assumes --force): Do a quick repack of all repos')
    op.add_argument('--repack-all-full', dest='repack_all_full', action='store_true', default=False,
                    help='(Assumes --force): Do a full repack of all repos')
    op.add_argument('--version', action='version', version=grokmirror.VERSION)

    opts = op.parse_args()

    if opts.repack_all_quick and opts.repack_all_full:
        op.error('Pick either --repack-all-full or --repack-all-quick')

    return opts


def grok_fsck(cfgfile, verbose=False, force=False, repack_only=False, conn_only=False,
              repack_all_quick=False, repack_all_full=False):
    global logger

    config = grokmirror.load_config_file(cfgfile)

    obstdir = config['core'].get('objstore', None)
    if obstdir is None:
        obstdir = os.path.join(config['core'].get('toplevel'), 'objstore')
        config['core']['objstore'] = obstdir

    logfile = config['core'].get('log', None)
    if config['core'].get('loglevel', 'info') == 'debug':
        loglevel = logging.DEBUG
    else:
        loglevel = logging.INFO

    logger = grokmirror.init_logger('fsck', logfile, loglevel, verbose)

    rh = io.StringIO()
    ch = logging.StreamHandler(stream=rh)
    formatter = logging.Formatter('%(message)s')
    ch.setFormatter(formatter)
    ch.setLevel(logging.CRITICAL)
    logger.addHandler(ch)

    fsck_mirror(config, force, repack_only, conn_only, repack_all_quick, repack_all_full)

    report = rh.getvalue()
    if len(report):
        msg = EmailMessage()
        msg.set_content(report)
        subject = config['fsck'].get('report_subject')
        if not subject:
            import platform
            subject = 'grok-fsck errors on {} ({})'.format(platform.node(), cfgfile)
        msg['Subject'] = subject
        from_addr = config['fsck'].get('report_from', 'root')
        msg['From'] = from_addr
        report_to = config['fsck'].get('report_to', 'root')
        msg['To'] = report_to
        mailhost = config['fsck'].get('report_mailhost', 'localhost')
        s = smtplib.SMTP(mailhost)
        s.send_message(msg)
        s.quit()


def command():
    opts = parse_args()

    return grok_fsck(opts.config, opts.verbose, opts.force, opts.repack_only, opts.conn_only,
                     opts.repack_all_quick, opts.repack_all_full)


if __name__ == '__main__':
    command()

grokmirror-2.0.11/grokmirror/manifest.py

# -*- coding: utf-8 -*-
# Copyright (C) 2013-2020 by The Linux Foundation and contributors
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
import os
import sys
import logging
import datetime

import grokmirror

logger = logging.getLogger(__name__)

objstore_uses_plumbing = False


def update_manifest(manifest, toplevel, fullpath, usenow, ignorerefs):
    logger.debug('Examining %s', fullpath)
    if not grokmirror.is_bare_git_repo(fullpath):
        logger.critical('Error opening %s.', fullpath)
        logger.critical('Make sure it is a bare git repository.')
        sys.exit(1)

    gitdir = '/' + os.path.relpath(fullpath, toplevel)
    repoinfo = grokmirror.get_repo_defs(toplevel, gitdir, usenow=usenow, ignorerefs=ignorerefs)
    # Ignore it if it's an empty git repository
    if not repoinfo['fingerprint']:
        logger.info(' manifest: ignored %s (no heads)', gitdir)
        return

    if gitdir not in manifest:
        # In grokmirror-1.x we didn't normalize paths to be always with a leading '/', so
        # check the manifest for both and make sure we only save the path with a leading /
        if gitdir.lstrip('/') in manifest:
            manifest[gitdir] = manifest.pop(gitdir.lstrip('/'))
            logger.info(' manifest: updated %s', gitdir)
        else:
            logger.info(' manifest: added %s', gitdir)
            manifest[gitdir] = dict()
    else:
        logger.info(' manifest: updated %s', gitdir)

    altrepo = grokmirror.get_altrepo(fullpath)
    reference = None
    if manifest[gitdir].get('forkgroup', None) != repoinfo.get('forkgroup', None):
        # Use the first remote listed in the forkgroup as our reference, just so
        # grokmirror-1.x clients continue to work without doing full clones
        remotes = grokmirror.list_repo_remotes(altrepo, withurl=True)
        if len(remotes):
            urls = list(x[1] for x in remotes)
            urls.sort()
            reference = '/' + os.path.relpath(urls[0], toplevel)
    else:
        reference = manifest[gitdir].get('reference', None)

    if altrepo and not reference and not repoinfo.get('forkgroup'):
        # Not an objstore repo
        reference = '/' + os.path.relpath(altrepo, toplevel)

    manifest[gitdir].update(repoinfo)
    # Always write a reference entry even if it's None, as grok-1.x clients expect it
    manifest[gitdir]['reference'] = reference


def set_symlinks(manifest, toplevel, symlinks):
    for symlink in symlinks:
        target = os.path.realpath(symlink)
        if not os.path.exists(target):
            logger.critical(' manifest: symlink %s is broken, ignored', symlink)
            continue
        relative = '/' + os.path.relpath(symlink, toplevel)
        if target.find(toplevel) < 0:
            logger.critical(' manifest: symlink %s points outside toplevel, ignored', relative)
            continue
        tgtgitdir = '/' + os.path.relpath(target, toplevel)
        if tgtgitdir not in manifest:
            logger.critical(' manifest: symlink %s points to %s, which we do not recognize', relative, tgtgitdir)
            continue
        if 'symlinks' in manifest[tgtgitdir]:
            if relative not in manifest[tgtgitdir]['symlinks']:
                logger.info(' manifest: symlinked %s->%s', relative, tgtgitdir)
                manifest[tgtgitdir]['symlinks'].append(relative)
            else:
                logger.info(' manifest: %s->%s is already in manifest', relative, tgtgitdir)
        else:
            manifest[tgtgitdir]['symlinks'] = [relative]
            logger.info(' manifest: symlinked %s->%s', relative, tgtgitdir)

        # Now go through all repos and fix any references pointing to the
        # symlinked location. We shouldn't need to do anything with forkgroups.
        for gitdir in list(manifest):
            if gitdir == relative:
                logger.info(' manifest: removing %s (replaced by a symlink)', gitdir)
                manifest.pop(gitdir)
                continue
            if manifest[gitdir]['reference'] == relative:
                logger.info(' manifest: symlinked %s->%s', relative, tgtgitdir)
                manifest[gitdir]['reference'] = tgtgitdir


def purge_manifest(manifest, toplevel, gitdirs):
    for oldrepo in list(manifest):
        if os.path.join(toplevel, oldrepo.lstrip('/')) not in gitdirs:
            logger.info(' manifest: purged %s (gone)', oldrepo)
            manifest.pop(oldrepo)


def parse_args():
    global objstore_uses_plumbing
    import argparse
    # noinspection PyTypeChecker
    op = argparse.ArgumentParser(prog='grok-manifest',
                                 description='Create or update a manifest file',
                                 formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    op.add_argument('--cfgfile', dest='cfgfile', default=None,
                    help='Path to grokmirror.conf containing at least a [core] section')
    op.add_argument('-m', '--manifest', dest='manifile',
                    help='Location of manifest.js or manifest.js.gz')
    op.add_argument('-t', '--toplevel', dest='toplevel',
                    help='Top dir where all repositories reside')
    op.add_argument('-l', '--logfile', dest='logfile', default=None,
                    help='When specified, will put debug logs in this location')
    op.add_argument('-n', '--use-now', dest='usenow', action='store_true', default=False,
                    help='Use current timestamp instead of parsing commits')
    op.add_argument('-c', '--check-export-ok', dest='check_export_ok', action='store_true', default=False,
                    help='Export only repositories marked as git-daemon-export-ok')
    op.add_argument('-p', '--purge', dest='purge', action='store_true', default=False,
                    help='Purge deleted git repositories from manifest')
    op.add_argument('-x', '--remove', dest='remove', action='store_true', default=False,
                    help='Remove repositories passed as arguments from manifest')
    op.add_argument('-y', '--pretty', dest='pretty', action='store_true', default=False,
                    help='Pretty-print manifest (sort keys and add indentation)')
    op.add_argument('-i', '--ignore-paths', dest='ignore', action='append', default=None,
                    help='When finding git dirs, ignore these paths (accepts shell-style globbing)')
    op.add_argument('-r', '--ignore-refs', dest='ignore_refs', action='append', default=None,
                    help='Refs to exclude from fingerprint calculation (e.g. refs/meta/*)')
    op.add_argument('-w', '--wait-for-manifest', dest='wait', action='store_true', default=False,
                    help='When running with arguments, wait if manifest is not there '
                         '(can be useful when multiple writers are writing the manifest)')
    op.add_argument('-o', '--fetch-objstore', dest='fetchobst', action='store_true', default=False,
                    help='Fetch updates into objstore repo (if used)')
    op.add_argument('-v', '--verbose', dest='verbose', action='store_true', default=False,
                    help='Be verbose and tell us what you are doing')
    op.add_argument('--version', action='version', version=grokmirror.VERSION)
    op.add_argument('paths', nargs='*', help='Full path(s) to process')

    opts = op.parse_args()

    if opts.cfgfile:
        config = grokmirror.load_config_file(opts.cfgfile)
        if not opts.manifile:
            opts.manifile = config['core'].get('manifest')
        if not opts.toplevel:
            opts.toplevel = os.path.realpath(config['core'].get('toplevel'))
        if not opts.logfile:
            opts.logfile = config['core'].get('logfile')
        objstore_uses_plumbing = config['core'].getboolean('objstore_uses_plumbing', False)

        if 'manifest' in config:
            if not opts.ignore:
                opts.ignore = [x.strip() for x in config['manifest'].get('ignore', '').split('\n')]
            if not opts.check_export_ok:
                opts.check_export_ok = config['manifest'].getboolean('check_export_ok', False)
            if not opts.pretty:
                opts.pretty = config['manifest'].getboolean('pretty', False)
            if not opts.fetchobst:
                opts.fetchobst = config['manifest'].getboolean('fetch_objstore', False)

    if not opts.manifile:
        op.error('You must provide the path to the manifest file')
    if not opts.toplevel:
        op.error('You must provide the toplevel path')

    if opts.ignore is None:
        opts.ignore = list()

    if not len(opts.paths) and opts.wait:
        op.error('--wait option only makes sense when dirs are passed')

    return opts


def grok_manifest(manifile, toplevel, paths=None, logfile=None, usenow=False,
                  check_export_ok=False, purge=False, remove=False, pretty=False,
                  ignore=None, wait=False, verbose=False, fetchobst=False, ignorerefs=None):
    global logger
    loglevel = logging.INFO
    logger = grokmirror.init_logger('manifest', logfile, loglevel, verbose)

    startt = datetime.datetime.now()
    if paths is None:
        paths = list()
    if ignore is None:
        ignore = list()

    grokmirror.manifest_lock(manifile)
    manifest = grokmirror.read_manifest(manifile, wait=wait)

    toplevel = os.path.realpath(toplevel)

    # If manifest is empty, don't use current timestamp
    if not len(manifest):
        usenow = False

    if remove and len(paths):
        # Remove the repos as required, write new manifest and exit
        for fullpath in paths:
            repo = '/' + os.path.relpath(fullpath, toplevel)
            if repo in manifest:
                manifest.pop(repo)
                logger.info(' manifest: removed %s', repo)
            else:
                # Is it in any of the symlinks?
                found = False
                for gitdir in manifest:
                    if 'symlinks' in manifest[gitdir] and repo in manifest[gitdir]['symlinks']:
                        found = True
                        manifest[gitdir]['symlinks'].remove(repo)
                        if not len(manifest[gitdir]['symlinks']):
                            manifest[gitdir].pop('symlinks')
                        logger.info(' manifest: removed symlink %s->%s', repo, gitdir)
                if not found:
                    logger.info(' manifest: %s not in manifest', repo)

        # XXX: need to add logic to make sure we don't break the world
        #      by removing a repository used as a reference for others
        grokmirror.write_manifest(manifile, manifest, pretty=pretty)
        grokmirror.manifest_unlock(manifile)
        return 0

    gitdirs = list()

    if purge or not len(paths) or not len(manifest):
        # We automatically purge when we do a full tree walk
        for gitdir in grokmirror.find_all_gitdirs(toplevel, ignore=ignore, exclude_objstore=True):
            gitdirs.append(gitdir)
        purge_manifest(manifest, toplevel, gitdirs)

    if len(manifest) and len(paths):
        # limit ourselves to passed dirs only when there is something
        # in the manifest. This precaution makes sure we regenerate the
        # whole file when there is nothing in it or it can't be parsed.
        for apath in paths:
            arealpath = os.path.realpath(apath)
            if apath != arealpath and os.path.islink(apath):
                gitdirs.append(apath)
            else:
                gitdirs.append(arealpath)

    symlinks = list()
    tofetch = set()
    for gitdir in gitdirs:
        # check to make sure this gitdir is ok to export
        if check_export_ok and not os.path.exists(os.path.join(gitdir, 'git-daemon-export-ok')):
            # is it currently in the manifest?
            repo = '/' + os.path.relpath(gitdir, toplevel)
            if repo in list(manifest):
                logger.info(' manifest: removed %s (no longer exported)', repo)
                manifest.pop(repo)
            # XXX: need to add logic to make sure we don't break the world
            #      by removing a repository used as a reference for others
            #      also make sure we clean up any dangling symlinks
            continue

        if os.path.islink(gitdir):
            symlinks.append(gitdir)
        else:
            update_manifest(manifest, toplevel, gitdir, usenow, ignorerefs)
            if fetchobst:
                # Do it after we're done with manifest, to avoid keeping it locked
                tofetch.add(gitdir)

    if len(symlinks):
        set_symlinks(manifest, toplevel, symlinks)

    grokmirror.write_manifest(manifile, manifest, pretty=pretty)
    grokmirror.manifest_unlock(manifile)

    fetched = set()
    for gitdir in tofetch:
        altrepo = grokmirror.get_altrepo(gitdir)
        if altrepo in fetched:
            continue
        if altrepo and grokmirror.is_obstrepo(altrepo):
            try:
                grokmirror.lock_repo(altrepo, nonblocking=True)
                logger.info(' manifest: objstore %s -> %s', gitdir, os.path.basename(altrepo))
                grokmirror.fetch_objstore_repo(altrepo, gitdir, use_plumbing=objstore_uses_plumbing)
                grokmirror.unlock_repo(altrepo)
                fetched.add(altrepo)
            except IOError:
                # grok-fsck will fetch this one, then
                pass

    elapsed = datetime.datetime.now() - startt
    if len(gitdirs) > 1:
        logger.info('Updated %s records in %ds', len(gitdirs), elapsed.total_seconds())
    else:
        logger.info('Done in %0.2fs', elapsed.total_seconds())


def command():
    opts = parse_args()

    return grok_manifest(
        opts.manifile, opts.toplevel, paths=opts.paths, logfile=opts.logfile,
        usenow=opts.usenow, check_export_ok=opts.check_export_ok, purge=opts.purge,
        remove=opts.remove, pretty=opts.pretty, ignore=opts.ignore, wait=opts.wait,
        verbose=opts.verbose, fetchobst=opts.fetchobst, ignorerefs=opts.ignore_refs)


if __name__ == '__main__':
    command()

grokmirror-2.0.11/grokmirror/pi_piper.py

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# This is a ready-made
# post_update_hook script for piping messages from
# mirrored public-inbox repositories to arbitrary commands (e.g. procmail).
#
__author__ = 'Konstantin Ryabitsev'

import os
import sys
import grokmirror
import fnmatch
import logging
import shlex

from typing import Optional

# default basic logger. We override it later.
logger = logging.getLogger(__name__)


def git_get_message_from_pi(fullpath: str, commit_id: str) -> bytes:
    logger.debug('Getting %s:m from %s', commit_id, fullpath)
    args = ['show', f'{commit_id}:m']
    ecode, out, err = grokmirror.run_git_command(fullpath, args, decode=False)
    if ecode > 0:
        logger.debug('Could not get the message, error below')
        logger.debug(err.decode())
        raise KeyError('Could not find %s in %s' % (commit_id, fullpath))
    return out


def git_get_new_revs(fullpath: str, pipelast: Optional[int] = None) -> list:
    statf = os.path.join(fullpath, 'pi-piper.latest')
    if pipelast:
        rev_range = '-n %d' % pipelast
    else:
        with open(statf, 'r') as fh:
            latest = fh.read().strip()
        rev_range = f'{latest}..'

    args = ['rev-list', '--pretty=oneline', '--reverse', rev_range, 'master']
    ecode, out, err = grokmirror.run_git_command(fullpath, args)
    if ecode > 0:
        raise KeyError('Could not iterate %s in %s' % (rev_range, fullpath))

    newrevs = list()
    if out:
        for line in out.split('\n'):
            (commit_id, logmsg) = line.split(' ', 1)
            logger.debug('commit_id=%s, subject=%s', commit_id, logmsg)
            newrevs.append((commit_id, logmsg))

    return newrevs


def reshallow(repo: str, commit_id: str) -> int:
    with open(os.path.join(repo, 'shallow'), 'w') as fh:
        fh.write(commit_id)
        fh.write('\n')
    logger.info('   prune: %s', repo)
    ecode, out, err = grokmirror.run_git_command(repo, ['gc', '--prune=now'])
    return ecode


def init_piper_tracking(repo: str, shallow: bool) -> bool:
    logger.info('Initial setup for %s', repo)
    args = ['rev-list', '-n', '1', 'master']
    ecode, out, err = grokmirror.run_git_command(repo, args)
    if ecode > 0 or not out:
        logger.info('Could not list revs in %s', repo)
        return False
    # Just write latest into the tracking file and return
    latest = out.strip()
    statf = os.path.join(repo, 'pi-piper.latest')
    with open(statf, 'w') as fh:
        fh.write(latest)
    if shallow:
        reshallow(repo, latest)
    return True


def run_pi_repo(repo: str, pipedef: str, dryrun: bool = False, shallow: bool = False,
                pipelast: Optional[int] = None) -> None:
    logger.info('Checking %s', repo)
    sp = shlex.shlex(pipedef, posix=True)
    sp.whitespace_split = True
    args = list(sp)
    if not os.access(args[0], os.EX_OK):
        logger.critical('Cannot execute %s', pipedef)
        sys.exit(1)

    statf = os.path.join(repo, 'pi-piper.latest')
    if not os.path.exists(statf):
        if dryrun:
            logger.info('Would have set up piper for %s [DRYRUN]', repo)
            return
        if not init_piper_tracking(repo, shallow):
            logger.critical('Unable to set up piper for %s', repo)
        return

    try:
        revlist = git_get_new_revs(repo, pipelast=pipelast)
    except KeyError:
        # this could have happened if the public-inbox repository
        # got rebased, e.g. due to GDPR-induced history editing.
        # For now, bluntly handle this by getting rid of our
        # status file and pretending we just started new.
        # XXX: in reality, we could handle this better by keeping track
        #      of the subject line of the latest message we processed, and
        #      then going through history to find the new commit-id of that
        #      message. Unless, of course, that's the exact message that got
        #      deleted in the first place. :/
        #      This also makes it hard with shallow repos, since we'd have
        #      to unshallow them first in order to find that message.
        logger.critical('Assuming the repository got rebased, dropping all history.')
        os.unlink(statf)
        if not dryrun:
            init_piper_tracking(repo, shallow)
        revlist = git_get_new_revs(repo)

    if not revlist:
        return

    logger.info('Processing %s commits', len(revlist))

    latest_good = None
    ecode = 0
    for commit_id, subject in revlist:
        msgbytes = git_get_message_from_pi(repo, commit_id)
        if msgbytes:
            if dryrun:
                logger.info('  piping: %s (%s b) [DRYRUN]', commit_id, len(msgbytes))
                logger.debug(' subject: %s', subject)
            else:
                logger.info('  piping: %s (%s b)', commit_id, len(msgbytes))
                logger.debug(' subject: %s', subject)
                ecode, out, err = grokmirror.run_shell_command(args, stdin=msgbytes)
                if ecode > 0:
                    logger.info('Error running %s', pipedef)
                    logger.info(err)
                    break
        latest_good = commit_id

    if latest_good and not dryrun:
        with open(statf, 'w') as fh:
            fh.write(latest_good)
        logger.info('Wrote %s', statf)
        if ecode == 0 and shallow:
            reshallow(repo, latest_good)

    sys.exit(ecode)


def command():
    import argparse
    from configparser import ConfigParser, ExtendedInterpolation

    global logger

    # noinspection PyTypeChecker
    op = argparse.ArgumentParser(prog='grok-pi-piper',
                                 description='Pipe new messages from public-inbox repositories to arbitrary commands',
                                 formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    op.add_argument('-v', '--verbose', action='store_true', default=False,
                    help='Be verbose and tell us what you are doing')
    op.add_argument('-d', '--dry-run', dest='dryrun', action='store_true', default=False,
                    help='Do a dry-run and just show what would be done')
    op.add_argument('-c', '--config', required=True,
                    help='Location of the configuration file')
    op.add_argument('-l', '--pipe-last', dest='pipelast', type=int, default=None,
                    help='Force pipe last NN messages in the list, regardless of tracking')
    op.add_argument('repo',
                    help='Full path to foo/git/N.git public-inbox repository')
    op.add_argument('--version', action='version', version=grokmirror.VERSION)

    opts = op.parse_args()

    cfgfile = os.path.expanduser(opts.config)
    if not os.path.exists(cfgfile):
        sys.stderr.write('ERROR: File does not exist: %s\n' % cfgfile)
        sys.exit(1)
    config = ConfigParser(interpolation=ExtendedInterpolation())
    config.read(os.path.expanduser(cfgfile))

    # Find out the section that we want from the config file
    section = 'DEFAULT'
    for sectname in config.sections():
        if fnmatch.fnmatch(opts.repo, f'*/{sectname}/git/*.git'):
            section = sectname

    pipe = config[section].get('pipe')
    if pipe == 'None':
        # Quick exit
        sys.exit(0)

    logfile = config[section].get('log')
    if config[section].get('loglevel') == 'debug':
        loglevel = logging.DEBUG
    else:
        loglevel = logging.INFO

    shallow = config[section].getboolean('shallow', False)  # noqa

    logger = grokmirror.init_logger('pull', logfile, loglevel, opts.verbose)

    run_pi_repo(opts.repo, pipe, dryrun=opts.dryrun, shallow=shallow, pipelast=opts.pipelast)


if __name__ == '__main__':
    command()

grokmirror-2.0.11/grokmirror/pull.py

# -*- coding: utf-8 -*-
# Copyright (C) 2013-2020 by The Linux Foundation and contributors
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

import os
import stat
import sys
import grokmirror
import logging
import requests
import time
import gzip
import json
import fnmatch
import shutil
import tempfile
import signal
import shlex
import calendar
import uuid

import multiprocessing as mp
import queue

from socketserver import UnixStreamServer, StreamRequestHandler, ThreadingMixIn

# default basic logger. We override it later.
logger = logging.getLogger(__name__)


class SignalHandler:

    def __init__(self, config, sw, dws, pws, done):
        self.config = config
        self.sw = sw
        self.dws = dws
        self.pws = pws
        self.done = done
        self.killed = False

    def _handler(self, signum, frame):
        self.killed = True
        logger.debug('Received signum=%s, frame=%s', signum, frame)
        # if self.sw:
        #     self.sw.terminate()
        #     self.sw.join()
        # for dw in self.dws:
        #     if dw and dw.is_alive():
        #         dw.terminate()
        #         dw.join()
        # for pw in self.pws:
        #     if pw and pw.is_alive():
        #         pw.terminate()
        #         pw.join()
        if len(self.done):
            update_manifest(self.config, self.done)
        logger.info('Exiting on signal %s', signum)
        sys.exit(0)

    def __enter__(self):
        self.old_sigint = signal.signal(signal.SIGINT, self._handler)
        self.old_sigterm = signal.signal(signal.SIGTERM, self._handler)

    def __exit__(self, sigtype, value, traceback):
        if self.killed:
            sys.exit(0)
        signal.signal(signal.SIGINT, self.old_sigint)
        signal.signal(signal.SIGTERM, self.old_sigterm)


class Handler(StreamRequestHandler):

    def handle(self):
        config = self.server.config
        manifile = config['core'].get('manifest')
        while True:
            # noinspection PyBroadException
            try:
                gitdir = self.rfile.readline().strip().decode()
                # Do we know anything about this path?
                manifest = grokmirror.read_manifest(manifile)
                if gitdir in manifest:
                    logger.info(' listener: %s', gitdir)
                    repoinfo = manifest[gitdir]
                    # Set fingerprint to None to force a run
                    repoinfo['fingerprint'] = None
                    repoinfo['modified'] = int(time.time())
                    self.server.q_mani.put((gitdir, repoinfo, 'pull'))
                elif gitdir:
                    logger.info(' listener: %s (not known, ignored)', gitdir)
                    return
                else:
                    return
            except:
                return


class ThreadedUnixStreamServer(ThreadingMixIn, UnixStreamServer):
    pass


def build_optimal_forkgroups(l_manifest, r_manifest, toplevel, obstdir):
    r_forkgroups = dict()
    for gitdir in set(r_manifest.keys()):
        fullpath = os.path.join(toplevel, gitdir.lstrip('/'))
        # our forkgroup info wins, because our own grok-fsck may have found better siblings
        # unless we're cloning, in which case we have nothing to go by except remote info
        if gitdir in l_manifest:
            reference = l_manifest[gitdir].get('reference', None)
            forkgroup = l_manifest[gitdir].get('forkgroup', None)
            if reference is not None:
                r_manifest[gitdir]['reference'] = reference
            if forkgroup is not None:
                r_manifest[gitdir]['forkgroup'] = forkgroup
        else:
            reference = r_manifest[gitdir].get('reference', None)
            forkgroup = r_manifest[gitdir].get('forkgroup', None)

        if reference and not forkgroup:
            # probably a grokmirror-1.x manifest
            r_fullpath = os.path.join(toplevel, reference.lstrip('/'))
            for fg, fps in r_forkgroups.items():
                if r_fullpath in fps:
                    forkgroup = fg
                    break
            if not forkgroup:
                # I guess we get to make a new one!
                forkgroup = str(uuid.uuid4())
                r_forkgroups[forkgroup] = {r_fullpath}

        if forkgroup is not None:
            if forkgroup not in r_forkgroups:
                r_forkgroups[forkgroup] = set()
            r_forkgroups[forkgroup].add(fullpath)

    # Compare their forkgroups and my forkgroups in case we have a more optimal strategy
    forkgroups = grokmirror.get_forkgroups(obstdir, toplevel)
    for r_fg, r_siblings in r_forkgroups.items():
        # if we have an intersection between their forkgroups and our forkgroups, then we use ours
        found = False
        for l_fg, l_siblings in forkgroups.items():
            if l_siblings == r_siblings:
                # No changes there
                continue
            if len(l_siblings.intersection(r_siblings)):
                l_siblings.update(r_siblings)
                found = True
                break
        if not found:
            # We don't have any matches in existing repos, so make a new forkgroup
            forkgroups[r_fg] = r_siblings

    return forkgroups


def spa_worker(config, q_spa, pauseonload):
    toplevel = os.path.realpath(config['core'].get('toplevel'))
    cpus = mp.cpu_count()
    saidpaused = False
    while True:
        if pauseonload:
            load = os.getloadavg()
            if load[0] > cpus:
                if not saidpaused:
                    logger.info(' spa: paused (system load), %s waiting', q_spa.qsize())
                    saidpaused = True
                time.sleep(5)
                continue
        saidpaused = False
        try:
            (gitdir, actions) = q_spa.get(timeout=1)
        except queue.Empty:
            sys.exit(0)

        logger.debug('spa_worker: gitdir=%s, actions=%s', gitdir, actions)
        fullpath = os.path.join(toplevel, gitdir.lstrip('/'))
        try:
            grokmirror.lock_repo(fullpath, nonblocking=True)
        except IOError:
            # We'll get it during grok-fsck
            continue

        if not q_spa.empty():
            logger.info(' spa: 1 active, %s waiting', q_spa.qsize())
        else:
            logger.info(' spa: 1 active')
        done = list()
        for action in actions:
            if action in done:
                continue
            done.append(action)
            if action == 'objstore':
                altrepo = grokmirror.get_altrepo(fullpath)
                # Should we use plumbing for this?
                use_plumbing = config['core'].getboolean('objstore_uses_plumbing', False)
                grokmirror.fetch_objstore_repo(altrepo, fullpath, use_plumbing=use_plumbing)
            elif action == 'repack':
                logger.debug('quick-repacking %s', fullpath)
                args = ['repack', '-Adlq']
                if 'fsck' in config:
                    extraflags = config['fsck'].get('extra_repack_flags', '').split()
                    if len(extraflags):
                        args += extraflags
                ecode, out, err = grokmirror.run_git_command(fullpath, args)
                if ecode > 0:
                    logger.debug('Could not repack %s', fullpath)
            elif action == 'packrefs':
                args = ['pack-refs']
                ecode, out, err = grokmirror.run_git_command(fullpath, args)
                if ecode > 0:
                    logger.debug('Could not pack-refs %s', fullpath)
            elif action == 'packrefs-all':
                args = ['pack-refs', '--all']
                ecode, out, err = grokmirror.run_git_command(fullpath, args)
                if ecode > 0:
                    logger.debug('Could not pack-refs %s', fullpath)

        grokmirror.unlock_repo(fullpath)
        logger.info(' spa: %s (done: %s)', gitdir, ', '.join(done))


def objstore_repo_preload(config, obstrepo):
    purl = config['remote'].get('preload_bundle_url')
    if not purl:
        return
    bname = os.path.basename(obstrepo)[:-4]
    obstdir = os.path.realpath(config['core'].get('objstore'))
    burl = '%s/%s.bundle' % (purl.rstrip('/'), bname)
    bfile = os.path.join(obstdir, '%s.bundle' % bname)
    try:
        sess = grokmirror.get_requests_session()
        resp = sess.get(burl, stream=True)
        resp.raise_for_status()
        logger.info(' objstore: downloading %s.bundle', bname)
        with open(bfile, 'wb') as fh:
            for chunk in resp.iter_content(chunk_size=8192):
                fh.write(chunk)
        resp.close()
    except:  # noqa
        # Make sure we don't leave .bundle files lying around
        # Should we add logic to resume downloads here in the future?
        if os.path.exists(bfile):
            os.unlink(bfile)
        return

    # Now we clone from it into the objstore repo
    ecode, out, err = grokmirror.run_git_command(obstrepo, ['remote', 'add', '--mirror=fetch', '_preload', bfile])
    if ecode == 0:
        logger.info(' objstore: preloading %s.bundle', bname)
        args = ['remote', 'update', '_preload']
        ecode, out, err = grokmirror.run_git_command(obstrepo, args)
        if ecode > 0:
            logger.info(' objstore: failed to preload from %s.bundle', bname)
        else:
            # now pack refs and generate a commit graph
            grokmirror.run_git_command(obstrepo, ['pack-refs', '--all'])
            if grokmirror.git_newer_than('2.18.0'):
                grokmirror.run_git_command(obstrepo, ['commit-graph', 'write'])
            logger.info(' objstore: successful preload from %s.bundle', bname)

    # Regardless of what happened, we remove _preload and the bundle, then move on
    grokmirror.run_git_command(obstrepo, ['remote', 'rm', '_preload'])
    os.unlink(bfile)


def pull_worker(config, q_pull, q_spa, q_done):
    toplevel = os.path.realpath(config['core'].get('toplevel'))
    obstdir = os.path.realpath(config['core'].get('objstore'))
    maxretries = config['pull'].getint('retries', 3)
    site = config['remote'].get('site')
    remotename = config['pull'].get('remotename', '_grokmirror')
    # Should we use plumbing for objstore operations?
    objstore_uses_plumbing = config['core'].getboolean('objstore_uses_plumbing', False)

    while True:
        try:
            (gitdir, repoinfo, action, q_action) = q_pull.get(timeout=1)
        except queue.Empty:
            sys.exit(0)

        logger.debug('pull_worker: gitdir=%s, action=%s', gitdir, action)
        fullpath = os.path.join(toplevel, gitdir.lstrip('/'))
        success = True
        spa_actions = list()

        try:
            grokmirror.lock_repo(fullpath, nonblocking=True)
        except IOError:
            # Take a quick nap and put it back into queue
            logger.info(' defer: %s (locked)', gitdir)
            time.sleep(5)
            q_pull.put((gitdir, repoinfo, action, q_action))
            continue

        altrepo = grokmirror.get_altrepo(fullpath)
        obstrepo = None
        if altrepo and grokmirror.is_obstrepo(altrepo, obstdir):
            obstrepo = altrepo

        if action == 'purge':
            # Is it a symlink?
            if os.path.islink(fullpath):
                logger.info(' purge: %s', gitdir)
                os.unlink(fullpath)
            else:
                # is anything using us for alternates?
                if grokmirror.is_alt_repo(toplevel, gitdir):
                    logger.debug('Not purging %s because it is used by other repos via alternates', fullpath)
                else:
                    logger.info(' purge: %s', gitdir)
                    shutil.rmtree(fullpath)

        if action == 'fix_params':
            logger.info(' reconfig: %s', gitdir)
            set_repo_params(fullpath, repoinfo)

        if action == 'fix_remotes':
            logger.info(' reorigin: %s', gitdir)
            success = fix_remotes(toplevel, gitdir, site, config)
            if success:
                set_repo_params(fullpath, repoinfo)
                action = 'pull'
            else:
                success = False

        if action == 'reclone':
            logger.info(' reclone: %s', gitdir)
            try:
                altrepo = grokmirror.get_altrepo(fullpath)
                shutil.move(fullpath, '%s.reclone' % fullpath)
                shutil.rmtree('%s.reclone' % fullpath)
                grokmirror.setup_bare_repo(fullpath)
                fix_remotes(toplevel, gitdir, site, config)
                set_repo_params(fullpath, repoinfo)
                if altrepo:
                    grokmirror.set_altrepo(fullpath, altrepo)
                action = 'pull'
            except (PermissionError, IOError) as ex:
                logger.critical('Unable to remove %s: %s', fullpath, str(ex))
                success = False

        if action in ('pull', 'objstore_migrate'):
            r_fp = repoinfo.get('fingerprint')
            my_fp = grokmirror.get_repo_fingerprint(toplevel, gitdir, force=True)

            if obstrepo:
                o_obj_info = grokmirror.get_repo_obj_info(obstrepo)
                if o_obj_info.get('count') == '0' and o_obj_info.get('in-pack') == '0' and not my_fp:
                    # Try to preload the objstore repo directly
                    objstore_repo_preload(config, obstrepo)

            if r_fp != my_fp:
                # Make sure we have the remote set up
                if action == 'pull' and remotename not in grokmirror.list_repo_remotes(fullpath):
                    logger.info(' reorigin: %s', gitdir)
                    fix_remotes(toplevel, gitdir, site, config)
                logger.info(' fetch: %s', gitdir)
                retries = 1
                while True:
                    success = pull_repo(fullpath, remotename)
                    if success:
                        break
                    retries += 1
                    if retries > maxretries:
                        break
                    logger.info(' refetch: %s (try #%s)', gitdir, retries)

                if success:
                    run_post_update_hook(toplevel, gitdir, config['pull'].get('post_update_hook', ''))
                    post_pull_fp = grokmirror.get_repo_fingerprint(toplevel, gitdir, force=True)
                    repoinfo['fingerprint'] = post_pull_fp
                    altrepo = grokmirror.get_altrepo(fullpath)
                    if post_pull_fp != my_fp:
                        grokmirror.set_repo_fingerprint(toplevel, gitdir, fingerprint=post_pull_fp)
                        if altrepo and grokmirror.is_obstrepo(altrepo, obstdir) and not repoinfo.get('private'):
                            # do we have any objects in the objstore repo?
                            o_obj_info = grokmirror.get_repo_obj_info(altrepo)
                            if o_obj_info.get('count') == '0' and o_obj_info.get('in-pack') == '0':
                                # We fetch right now, as other repos may be waiting on these objects
                                logger.info(' objstore: %s', gitdir)
                                grokmirror.fetch_objstore_repo(altrepo, fullpath, use_plumbing=objstore_uses_plumbing)
                                if not objstore_uses_plumbing:
                                    spa_actions.append('repack')
                            else:
                                # We lazy-fetch in the spa
                                spa_actions.append('objstore')
                                if my_fp is None and not objstore_uses_plumbing:
                                    # Initial clone, trigger a repack after objstore
                                    spa_actions.append('repack')

                        if my_fp is None:
                            # This was the initial clone, so pack all refs
                            spa_actions.append('packrefs-all')

                        if not grokmirror.is_precious(fullpath):
                            # See if doing a quick repack would be beneficial
                            obj_info = grokmirror.get_repo_obj_info(fullpath)
                            if grokmirror.get_repack_level(obj_info):
                                # We only do quick repacks, so we don't care about precise level
                                spa_actions.append('repack')
                                spa_actions.append('packrefs')

                    modified = repoinfo.get('modified')
                    if modified is not None:
                        set_agefile(toplevel, gitdir, modified)
            else:
                logger.debug('FP match, not pulling %s', gitdir)

        if action == 'objstore_migrate':
            spa_actions.append('objstore')
            spa_actions.append('repack')

        grokmirror.unlock_repo(fullpath)

        symlinks = repoinfo.get('symlinks')
        if os.path.exists(fullpath) and symlinks:
            for symlink in symlinks:
                target = os.path.join(toplevel, symlink.lstrip('/'))
                if os.path.islink(target):
                    # are you pointing to where we need you?
                    if os.path.realpath(target) != fullpath:
                        # Remove symlink and recreate below
                        logger.debug('Removed existing wrong symlink %s', target)
                        os.unlink(target)
                elif os.path.exists(target):
                    logger.warning('Deleted repo %s, because it is now a symlink to %s' % (target, fullpath))
                    shutil.rmtree(target)

                # Here we re-check if we still need to do anything
                if not os.path.exists(target):
                    logger.info(' symlink: %s -> %s', symlink, gitdir)
                    # Make sure the leading dirs are in place
                    if not os.path.exists(os.path.dirname(target)):
                        os.makedirs(os.path.dirname(target))
                    os.symlink(fullpath, target)

        q_done.put((gitdir, repoinfo, q_action, success))
        if spa_actions:
            q_spa.put((gitdir, spa_actions))


def cull_manifest(manifest, config):
    includes = config['pull'].get('include', '*').split('\n')
    excludes = config['pull'].get('exclude', '').split('\n')

    culled = dict()
    for gitdir, repoinfo in manifest.items():
        if not repoinfo.get('fingerprint'):
            logger.critical('Repo without fingerprint info (skipped): %s', gitdir)
            continue
        # does it fall under include?
        for include in includes:
            if fnmatch.fnmatch(gitdir, include):
                # Yes, but does it fall under excludes?
                excluded = False
                for exclude in excludes:
                    if fnmatch.fnmatch(gitdir, exclude):
                        excluded = True
                        break
                if excluded:
                    continue
                culled[gitdir] = manifest[gitdir]

    return culled


def fix_remotes(toplevel, gitdir, site, config):
    remotename = config['pull'].get('remotename', '_grokmirror')
    fullpath = os.path.join(toplevel, gitdir.lstrip('/'))
    # Set our remote
    if remotename in grokmirror.list_repo_remotes(fullpath):
        logger.debug('\tremoving remote: %s', remotename)
        ecode, out, err = grokmirror.run_git_command(fullpath, ['remote', 'remove', remotename])
        if ecode > 0:
            logger.critical('FATAL: Could not remove remote %s from %s', remotename, fullpath)
            return False

    # set my remote URL
    url = os.path.join(site, gitdir.lstrip('/'))
    ecode, out, err = grokmirror.run_git_command(fullpath, ['remote', 'add', '--mirror=fetch', remotename, url])
    if ecode > 0:
        logger.critical('FATAL: Could not set %s to %s in %s', remotename, url, fullpath)
        return False

    ffonly = False
    for globpatt in set([x.strip() for x in config['pull'].get('ffonly', '').split('\n')]):
        if fnmatch.fnmatch(gitdir, globpatt):
            ffonly = True
            break
    if ffonly:
        grokmirror.set_git_config(fullpath, 'remote.{}.fetch'.format(remotename), 'refs/*:refs/*')
        logger.debug('\tset %s as %s (ff-only)', remotename, url)
    else:
        logger.debug('\tset %s as %s', remotename, url)
    return True


def set_repo_params(fullpath, repoinfo):
    owner = repoinfo.get('owner')
    description = repoinfo.get('description')
    head = repoinfo.get('head')

    if owner is None and description is None and head is None:
        # Let the default git values be there, then
        return

    if description is not None:
        descfile = os.path.join(fullpath, 'description')
        contents = None
        if os.path.exists(descfile):
            with open(descfile) as fh:
                contents = fh.read()
        if contents != description:
            logger.debug('Setting %s description to: %s', fullpath, description)
            with open(descfile, 'w') as fh:
                fh.write(description)

    if owner is not None:
        logger.debug('Setting %s owner to: %s', fullpath, owner)
        grokmirror.set_git_config(fullpath, 'gitweb.owner', owner)

    if head is not None:
        headfile = os.path.join(fullpath, 'HEAD')
        contents = None
        if os.path.exists(headfile):
            with open(headfile) as fh:
                contents = fh.read().rstrip()
        if contents != head:
            logger.debug('Setting %s HEAD to: %s', fullpath, head)
            with open(headfile, 'w') as fh:
                fh.write('{}\n'.format(head))


def set_agefile(toplevel, gitdir, last_modified):
    grokmirror.set_repo_timestamp(toplevel, gitdir, last_modified)
    # set agefile, which can be used by cgit to show idle times
    # cgit recommends it to be yyyy-mm-dd hh:mm:ss
    cgit_fmt = time.strftime('%F %T', time.localtime(last_modified))
    agefile = os.path.join(toplevel, gitdir.lstrip('/'), 'info/web/last-modified')
    if not os.path.exists(os.path.dirname(agefile)):
        os.makedirs(os.path.dirname(agefile))
    with open(agefile, 'wt') as fh:
        fh.write('%s\n' % cgit_fmt)
    logger.debug('Wrote "%s" into %s', cgit_fmt, agefile)


def run_post_update_hook(toplevel, gitdir, hookscripts):
    if not len(hookscripts):
        return
    for hookscript in hookscripts.split('\n'):
        hookscript = os.path.expanduser(hookscript.strip())
        sp = shlex.shlex(hookscript, posix=True)
        sp.whitespace_split = True
        args = list(sp)
        logger.info(' hook: %s', ' '.join(args))
        if not os.access(args[0], os.X_OK):
            logger.warning('post_update_hook %s is not executable', hookscript)
            continue
        fullpath = os.path.join(toplevel, gitdir.lstrip('/'))
        args.append(fullpath)
        logger.debug('Running: %s', ' '.join(args))
        ecode, output, error = grokmirror.run_shell_command(args)
        if error:
            # Put hook stderr into warning
            logger.warning('Hook Stderr (%s): %s', gitdir, error)
        if output:
            # Put hook stdout into info
            logger.info('Hook Stdout (%s): %s', gitdir, output)


def pull_repo(fullpath, remotename):
    args = ['remote', 'update', remotename, '--prune']
    retcode, output, error = grokmirror.run_git_command(fullpath, args)

    success = False
    if retcode == 0:
        success = True

    if error:
        # Put things we recognize into debug
        debug = list()
        warn = list()
        for line in error.split('\n'):
            if line.find('From ') == 0:
                debug.append(line)
            elif line.find('-> ') > 0:
                debug.append(line)
            elif line.find('remote: warning:') == 0:
                debug.append(line)
            elif line.find('ControlSocket') >= 0:
                debug.append(line)
            elif not success:
                warn.append(line)
            else:
                debug.append(line)
        if debug:
            logger.debug('Stderr (%s): %s', fullpath, '\n'.join(debug))
        if warn:
            logger.warning('Stderr (%s): %s', fullpath, '\n'.join(warn))

    return success


def write_projects_list(config, manifest):
    plpath = config['pull'].get('projectslist', '')
    if not plpath:
        return

    trimtop = config['pull'].get('projectslist_trimtop', '')
    add_symlinks = config['pull'].getboolean('projectslist_symlinks', False)

    (dirname, basename) = os.path.split(plpath)
    (fd, tmpfile) = tempfile.mkstemp(prefix=basename, dir=dirname)
    try:
        fh = os.fdopen(fd, 'wb', 0)
        for gitdir in manifest:
            if trimtop and gitdir.startswith(trimtop):
                pgitdir = gitdir[len(trimtop):]
            else:
                pgitdir = gitdir
            # Always remove leading slash, otherwise cgit breaks
            pgitdir = pgitdir.lstrip('/')
            fh.write('{}\n'.format(pgitdir).encode())

            if add_symlinks and 'symlinks' in manifest[gitdir]:
                # Do the same for symlinks
                # XXX: Should make this configurable, perhaps
                for symlink in manifest[gitdir]['symlinks']:
                    if trimtop and symlink.startswith(trimtop):
                        symlink = symlink[len(trimtop):]
                    symlink = symlink.lstrip('/')
                    fh.write('{}\n'.format(symlink).encode())

        os.fsync(fd)
        fh.close()
        # set mode to current umask
        curmask = os.umask(0)
        os.chmod(tmpfile, 0o0666 ^ curmask)
        os.umask(curmask)
        shutil.move(tmpfile, plpath)
    finally:
        # If something failed, don't leave tempfiles trailing around
        if os.path.exists(tmpfile):
            os.unlink(tmpfile)
    logger.info(' projlist: wrote %s', plpath)


def fill_todo_from_manifest(config, q_mani, nomtime=False, forcepurge=False):
    # l_ = local, r_ = remote
    l_mani_path = config['core'].get('manifest')
    r_mani_cmd = config['remote'].get('manifest_command')
    if r_mani_cmd:
        if not os.access(r_mani_cmd, os.X_OK):
            logger.critical('Remote manifest command is not executable: %s', r_mani_cmd)
            sys.exit(1)
        logger.info(' manifest: executing %s', r_mani_cmd)
        cmdargs = [r_mani_cmd]
        if nomtime:
            cmdargs += ['--force']
        (ecode, output, error) = grokmirror.run_shell_command(cmdargs)
        if ecode == 0:
            try:
                r_manifest = json.loads(output)
            except json.JSONDecodeError as ex:
                logger.warning('Failed to parse output from %s', r_mani_cmd)
                logger.warning('Error was: %s', ex)
                raise IOError('Failed to parse output from %s (%s)' % (r_mani_cmd, ex))
        elif ecode == 127:
            logger.info(' manifest: unchanged')
            return
        elif ecode == 1:
            logger.warning('Executing %s failed, exiting', r_mani_cmd)
            raise IOError('Failed executing %s' % r_mani_cmd)
        else:
            # Non-fatal errors for all other exit codes
            logger.warning(' manifest: executing %s returned %s', r_mani_cmd, ecode)
            return

        if not len(r_manifest):
            logger.warning(' manifest: empty, ignoring')
            raise IOError('Empty manifest returned by %s' % r_mani_cmd)

    else:
        r_mani_status_path = os.path.join(os.path.dirname(l_mani_path), '.%s.remote' % os.path.basename(l_mani_path))
        try:
            with open(r_mani_status_path, 'r') as fh:
                r_mani_status = json.loads(fh.read())
        except (IOError, json.JSONDecodeError):
            logger.debug('Could not read %s', r_mani_status_path)
            r_mani_status = dict()
        r_last_fetched = r_mani_status.get('last-fetched', 0)
        config_last_modified = r_mani_status.get('config-last-modified', 0)
        if config_last_modified != config.last_modified:
            nomtime = True
        r_mani_url = config['remote'].get('manifest')
        logger.info(' manifest: fetching %s', r_mani_url)
        if r_mani_url.find('file:///') == 0:
            r_mani_url = r_mani_url.replace('file://', '')
            if not os.path.exists(r_mani_url):
                logger.critical('Remote manifest not found in %s! Quitting!', r_mani_url)
                raise IOError('Remote manifest not found in %s' % r_mani_url)

            fstat = os.stat(r_mani_url)
            r_last_modified = fstat[8]
            if r_last_fetched:
                logger.debug('mtime on %s is: %s', r_mani_url, fstat[8])
                if not nomtime and r_last_modified <= r_last_fetched:
                    logger.info(' manifest: unchanged')
                    return

            logger.info('Reading new manifest from %s', r_mani_url)
            r_manifest = grokmirror.read_manifest(r_mani_url)
            # Don't accept empty manifests -- that indicates something is wrong
            if not len(r_manifest):
                logger.warning('Remote manifest empty or unparseable! Quitting.')
                raise IOError('Empty manifest in %s' % r_mani_url)

        else:
            session = grokmirror.get_requests_session()

            # Find out if we need to run at all first
            headers = dict()
            if r_last_fetched and not nomtime:
                last_modified_h = time.strftime('%a, %d %b %Y %H:%M:%S GMT', time.gmtime(r_last_fetched))
                logger.debug('Our last-modified is: %s', last_modified_h)
                headers['If-Modified-Since'] = last_modified_h

            try:
                # 30 seconds to connect, 5 minutes between reads
                res = session.get(r_mani_url, headers=headers, timeout=(30, 300))
            except requests.exceptions.RequestException as ex:
                logger.warning('Could not fetch %s', r_mani_url)
                logger.warning('Server returned: %s', ex)
                raise IOError('Remote server returned an error: %s' % ex)

            if res.status_code == 304:
                # No change to the manifest, nothing to do
                logger.info(' manifest: unchanged')
                return

            if res.status_code > 200:
                logger.warning('Could not fetch %s', r_mani_url)
                logger.warning('Server returned status: %s', res.status_code)
                raise IOError('Remote server returned an error: %s' % res.status_code)

            r_last_modified = res.headers['Last-Modified']
            r_last_modified = time.strptime(r_last_modified, '%a, %d %b %Y %H:%M:%S %Z')
            r_last_modified = calendar.timegm(r_last_modified)

            # We don't use read_manifest for the remote manifest, as it can be
            # anything, really. For now, blindly open it with gzipfile if it ends
            # with .gz. XXX: some http servers will auto-deflate such files.
            try:
                if r_mani_url.rfind('.gz') > 0:
                    import io
                    fh = gzip.GzipFile(fileobj=io.BytesIO(res.content))
                    jdata = fh.read().decode()
                else:
                    jdata = res.content

                res.close()
                # Don't hold session open, since we don't refetch manifest very frequently
                session.close()
                r_manifest = json.loads(jdata)
            except Exception as ex:
                logger.warning('Failed to parse %s', r_mani_url)
                logger.warning('Error was: %s', ex)
                raise IOError('Failed to parse %s (%s)' % (r_mani_url, ex))

        # Record for the next run
        with open(r_mani_status_path, 'w') as fh:
            r_mani_status = {
                'source': r_mani_url,
                'last-fetched': r_last_modified,
                'config-last-modified': config.last_modified,
            }
            json.dump(r_mani_status, fh)

    l_manifest = grokmirror.read_manifest(l_mani_path)
    r_culled = cull_manifest(r_manifest, config)
    logger.info(' manifest: %s relevant entries', len(r_culled))

    toplevel = os.path.realpath(config['core'].get('toplevel'))
    obstdir = os.path.realpath(config['core'].get('objstore'))
    forkgroups = build_optimal_forkgroups(l_manifest, r_culled, toplevel, obstdir)
    privmasks = set([x.strip() for x in config['core'].get('private', '').split('\n')])

    # populate private/forkgroup info in r_culled
    for forkgroup, siblings in forkgroups.items():
        for s_fullpath in siblings:
            s_gitdir = '/' + os.path.relpath(s_fullpath, toplevel)
            is_private = False
            for privmask in privmasks:
                # Does this repo match privrepo
                if fnmatch.fnmatch(s_gitdir, privmask):
                    is_private = True
                    break
            if s_gitdir in r_culled:
                r_culled[s_gitdir]['forkgroup'] = forkgroup
                r_culled[s_gitdir]['private'] = is_private

    seen = set()
    to_migrate = set()
    # Used to track symlinks so we can properly avoid purging them
    all_symlinks = set()
    for gitdir, repoinfo in r_culled.items():
        symlinks = repoinfo.get('symlinks')
        if symlinks and isinstance(symlinks, list):
            all_symlinks.update(set(symlinks))
        if gitdir in seen:
            continue
        seen.add(gitdir)
        fullpath = os.path.join(toplevel, gitdir.lstrip('/'))
        forkgroup = repoinfo.get('forkgroup')
        # Is the directory in place?
        if os.path.exists(fullpath):
            # Did grok-fsck request to reclone it?
            rfile = os.path.join(fullpath, 'grokmirror.reclone')
            if os.path.exists(rfile):
                logger.debug('Reclone requested for %s:', gitdir)
                q_mani.put((gitdir, repoinfo, 'reclone'))
                with open(rfile, 'r') as rfh:
                    reason = rfh.read()
                    logger.debug(' %s', reason)
                continue

            if gitdir not in l_manifest:
                q_mani.put((gitdir, repoinfo, 'fix_remotes'))
                continue

            r_desc = r_culled[gitdir].get('description')
            r_owner = r_culled[gitdir].get('owner')
            r_head = r_culled[gitdir].get('head')

            l_desc = l_manifest[gitdir].get('description')
            l_owner = l_manifest[gitdir].get('owner')
            l_head = l_manifest[gitdir].get('head')

            if l_owner is None:
                l_owner = config['pull'].get('default_owner', 'Grokmirror')
            if r_owner is None:
                r_owner = config['pull'].get('default_owner', 'Grokmirror')

            if r_desc != l_desc or r_owner != l_owner or r_head != l_head:
                q_mani.put((gitdir, repoinfo, 'fix_params'))

            if symlinks and isinstance(symlinks, list):
                # Are all symlinks in place?
                for symlink in symlinks:
                    linkpath = os.path.join(toplevel, symlink.lstrip('/'))
                    if not os.path.islink(linkpath) or os.path.realpath(linkpath) != fullpath:
                        q_mani.put((gitdir, repoinfo, 'fix_params'))
                        break

            my_fingerprint = grokmirror.get_repo_fingerprint(toplevel, gitdir)
            if my_fingerprint != l_manifest[gitdir].get('fingerprint'):
                logger.debug('Fingerprint discrepancy, forcing a fetch')
                q_mani.put((gitdir, repoinfo, 'pull'))
                continue

            if my_fingerprint == r_culled[gitdir]['fingerprint']:
                logger.debug('Fingerprints match, skipping %s', gitdir)
                continue

            logger.debug('No fingerprint match, will pull %s', gitdir)
            q_mani.put((gitdir, repoinfo, 'pull'))
            continue

        if not forkgroup:
            # no-sibling repo
            q_mani.put((gitdir, repoinfo, 'init'))
            continue

        obstrepo = os.path.join(obstdir, '%s.git' % forkgroup)
        if os.path.isdir(obstrepo):
            # Init with an existing obstrepo, easy case
            q_mani.put((gitdir, repoinfo, 'init'))
            continue

        # Do we have any existing siblings that were cloned without obstrepo?
        # This would happen when an initial fork is created of an existing repo.
        found_existing = False
        public_siblings = set()
        for s_fullpath in forkgroups[forkgroup]:
            s_gitdir = '/' + os.path.relpath(s_fullpath, toplevel)
            if s_gitdir == gitdir:
                continue
            # can't simply rely on r_culled 'private' info, as this repo may only exist locally
            is_private = False
            for privmask in privmasks:
                # Does this repo match privrepo
                if fnmatch.fnmatch(s_gitdir, privmask):
                    is_private = True
                    break
            if is_private:
                # Can't use this sibling for anything, as it's private
                continue

            if os.path.isdir(s_fullpath):
                found_existing = True
                if s_gitdir not in to_migrate:
                    # Plan to migrate it to objstore
                    logger.debug('reusing existing %s as new obstrepo %s', s_gitdir, obstrepo)
                    s_repoinfo = grokmirror.get_repo_defs(toplevel, s_gitdir, usenow=True)
                    s_repoinfo['forkgroup'] = forkgroup
                    s_repoinfo['private'] = False
                    # Stick it into queue before the new clone
                    q_mani.put((s_gitdir, s_repoinfo, 'objstore_migrate'))
                    seen.add(s_gitdir)
                    to_migrate.add(s_gitdir)
                break

            if s_gitdir in r_culled:
                public_siblings.add(s_gitdir)

        if found_existing:
            q_mani.put((gitdir, repoinfo, 'init'))
            continue

        if repoinfo['private'] and len(public_siblings):
            # Clone public siblings first
            for s_gitdir in public_siblings:
                if s_gitdir not in seen:
                    q_mani.put((s_gitdir, r_culled[s_gitdir], 'init'))
                    seen.add(s_gitdir)

        # Finally, clone ourselves.
        q_mani.put((gitdir, repoinfo, 'init'))

    if config['pull'].getboolean('purge', False):
        nopurge = config['pull'].get('nopurge', '').split('\n')
        to_purge = set()
        found_repos = 0
        for founddir in grokmirror.find_all_gitdirs(toplevel, exclude_objstore=True):
            gitdir = '/' + os.path.relpath(founddir, toplevel)
            found_repos += 1

            if gitdir not in r_culled and gitdir not in all_symlinks:
                exclude = False
                for entry in nopurge:
                    if fnmatch.fnmatch(gitdir, entry):
                        exclude = True
                        break
                # Refuse to purge ffonly repos
                for globpatt in set([x.strip() for x in config['pull'].get('ffonly', '').split('\n')]):
                    if fnmatch.fnmatch(gitdir, globpatt):
                        # Woah, these are not supposed to be deleted, ever
                        logger.critical('Refusing to purge ffonly repo %s', gitdir)
                        exclude = True
                        break
                if not exclude:
                    logger.debug('Adding %s to to_purge', gitdir)
                    to_purge.add(gitdir)

        if len(to_purge):
            # Purge-protection engage
            purge_limit = int(config['pull'].getint('purgeprotect', 5))
            if purge_limit < 1 or purge_limit > 99:
                logger.critical('Warning: "%s" is not valid for purgeprotect.', purge_limit)
                logger.critical('Please set to a number between 1 and 99.')
                logger.critical('Defaulting to purgeprotect=5.')
                purge_limit = 5

            purge_pc = int(len(to_purge) * 100 / found_repos)
            logger.debug('purgeprotect=%s', purge_limit)
            logger.debug('purge percentage=%s', purge_pc)

            if not forcepurge and purge_pc >= purge_limit:
                logger.critical('Refusing to purge %s repos (%s%%)', len(to_purge), purge_pc)
                logger.critical('Set purgeprotect to a higher percentage, or override with --force-purge.')
            else:
                for gitdir in to_purge:
                    logger.debug('Queued %s for purging', gitdir)
                    q_mani.put((gitdir, None, 'purge'))
        else:
            logger.debug('No repositories need purging')


def update_manifest(config, entries):
    manifile = config['core'].get('manifest')
    grokmirror.manifest_lock(manifile)
    manifest = grokmirror.read_manifest(manifile)
    changed = False
    while len(entries):
        gitdir, repoinfo, action, success = entries.pop()
        if not success:
            continue
        if action == 'purge':
            # Remove entry from manifest
            try:
                manifest.pop(gitdir)
                changed = True
            except KeyError:
                pass
            continue

        try:
            # does not belong in the manifest
            repoinfo.pop('private')
        except KeyError:
            pass

        for key, val in dict(repoinfo).items():
            # Clean up grok-2.0 null values
            if key in ('head', 'forkgroup') and val is None:
                repoinfo.pop(key)

        # Make sure 'reference' is present to prevent grok-1.x breakage
        if 'reference' not in repoinfo:
            repoinfo['reference'] = None

        manifest[gitdir] = repoinfo
        changed = True

    if changed:
        if 'manifest' in config:
            pretty = config['manifest'].getboolean('pretty', False)
        else:
            pretty = False
        grokmirror.write_manifest(manifile, manifest, pretty=pretty)
        logger.info(' manifest: wrote %s (%d entries)', manifile, len(manifest))
        # write out projects.list, if asked to
        write_projects_list(config, manifest)

    grokmirror.manifest_unlock(manifile)


def socket_worker(config, q_mani, sockfile):
    logger.info(' listener: listening on socket %s', sockfile)
    curmask = os.umask(0)
    with ThreadedUnixStreamServer(sockfile, Handler) as server:
        os.umask(curmask)
        # Stick some objects into the server
        server.q_mani = q_mani
        server.config = config
        server.serve_forever()


def showstats(q_todo, q_pull, q_spa, good, bad, pws, dws):
    stats = list()
    if good:
        stats.append('%s fetched' % good)
    if pws:
        stats.append('%s active' % len(pws))
    if not q_pull.empty():
        stats.append('%s queued' % q_pull.qsize())
    if not q_todo.empty():
        stats.append('%s waiting' % q_todo.qsize())
    if len(dws) or not q_spa.empty():
        stats.append('%s in spa' % (q_spa.qsize() + len(dws)))
    if bad:
        stats.append('%s errors' % bad)
    logger.info(' ---: %s', ', '.join(stats))


def manifest_worker(config, q_mani, nomtime=False):
    starttime = int(time.time())
    fill_todo_from_manifest(config, q_mani, nomtime=nomtime)
    refresh = config['pull'].getint('refresh', 300)
    left = refresh - int(time.time() - starttime)
    if left > 0:
        logger.info(' manifest: sleeping %ss', left)


def pull_mirror(config, nomtime=False, forcepurge=False, runonce=False):
    toplevel = os.path.realpath(config['core'].get('toplevel'))
    obstdir = os.path.realpath(config['core'].get('objstore'))
    refresh = config['pull'].getint('refresh', 300)

    q_mani = mp.Queue()
    q_todo = mp.Queue()
    q_pull = mp.Queue()
    q_done = mp.Queue()
    q_spa = mp.Queue()

    sw = None
    sockfile = config['pull'].get('socket')
    if sockfile and not runonce:
        if os.path.exists(sockfile):
            mode = os.stat(sockfile).st_mode
            if stat.S_ISSOCK(mode):
                os.unlink(sockfile)
            else:
                raise IOError('File exists but is not a socket: %s' % sockfile)
        sw = mp.Process(target=socket_worker, args=(config, q_mani, sockfile))
        sw.daemon = True
        sw.start()

    pws = list()
    dws = list()
    mws = list()
    actions = set()
    # Run in the main thread if we have runonce
    if runonce:
        fill_todo_from_manifest(config, q_mani, nomtime=nomtime, forcepurge=forcepurge)
        if not q_mani.qsize():
            return 0
    else:
        # force nomtime to True the first time
        nomtime = True
    lastrun = 0

    pull_threads = config['pull'].getint('pull_threads', 0)
    if pull_threads < 1 and mp.cpu_count() > 1:
        # take half of available CPUs by default
        pull_threads = int(mp.cpu_count() / 2)
    elif pull_threads < 1:
        pull_threads = 1

    busy = set()
    done = list()
    good = 0
    bad = 0
    loopmark = None
    with SignalHandler(config, sw, dws, pws, done):
        while True:
            for pw in pws:
                if pw and not pw.is_alive():
                    pws.remove(pw)
                    logger.info(' worker: terminated (%s remaining)', len(pws))
                    showstats(q_todo, q_pull, q_spa, good, bad, pws, dws)
            for dw in dws:
                if dw and not dw.is_alive():
                    dws.remove(dw)
                    showstats(q_todo, q_pull, q_spa, good, bad, pws, dws)
            for mw in mws:
                if mw and not mw.is_alive():
                    mws.remove(mw)

            if not q_spa.empty() and not len(dws):
                if runonce:
                    pauseonload = False
                else:
                    pauseonload = True
                dw = mp.Process(target=spa_worker, args=(config, q_spa, pauseonload))
                dw.daemon = True
                dw.start()
                dws.append(dw)

            if not q_pull.empty() and len(pws) < pull_threads:
                pw = mp.Process(target=pull_worker, args=(config, q_pull, q_spa, q_done))
                pw.daemon = True
                pw.start()
                pws.append(pw)
logger.info(' worker: started (%s running)', len(pws)) # Any new results? try: while True: gitdir, repoinfo, q_action, success = q_done.get_nowait() try: actions.remove((gitdir, q_action)) except KeyError: pass forkgroup = repoinfo.get('forkgroup') if forkgroup and forkgroup in busy: busy.remove(forkgroup) done.append((gitdir, repoinfo, q_action, success)) if success: good += 1 else: bad += 1 logger.info(' done: %s', gitdir) showstats(q_todo, q_pull, q_spa, good, bad, pws, dws) if len(done) >= 100: # Write manifest every 100 repos update_manifest(config, done) except queue.Empty: pass # Anything new in the manifest queue? try: new_updates = 0 while True: gitdir, repoinfo, action = q_mani.get_nowait() if (gitdir, action) in actions: logger.debug('already in the queue: %s, %s', gitdir, action) continue if action == 'pull' and (gitdir, 'init') in actions: logger.debug('already in the queue as init: %s, %s', gitdir, action) continue actions.add((gitdir, action)) q_todo.put((gitdir, repoinfo, action)) new_updates += 1 logger.debug('queued: %s, %s', gitdir, action) if new_updates: logger.info(' manifest: %s new updates', new_updates) except queue.Empty: pass if not runonce and not len(mws) and q_todo.empty() and q_pull.empty() and time.time() - lastrun >= refresh: if done: update_manifest(config, done) mw = mp.Process(target=manifest_worker, args=(config, q_mani, nomtime)) nomtime = False mw.daemon = True mw.start() mws.append(mw) lastrun = int(time.time()) # Finally, deal with q_todo try: gitdir, repoinfo, q_action = q_todo.get_nowait() logger.debug('main_thread: got %s/%s from q_todo', gitdir, q_action) except queue.Empty: if q_mani.empty() and q_done.empty(): if not len(pws): if done: update_manifest(config, done) if runonce: # Wait till spa is done while True: if q_spa.empty(): for dw in dws: dw.join() return 0 time.sleep(1) if len(pws): # Don't run a hot loop waiting on results time.sleep(5) else: # Shorter sleep if everything is idle time.sleep(1) continue if 
repoinfo is None: repoinfo = dict() fullpath = os.path.join(toplevel, gitdir.lstrip('/')) forkgroup = repoinfo.get('forkgroup') if gitdir in busy or (forkgroup is not None and forkgroup in busy): # Stick it back into the queue q_todo.put((gitdir, repoinfo, q_action)) if loopmark is None: loopmark = gitdir elif loopmark == gitdir: # We've looped around all waiting repos, so back off and don't run # a hot waiting loop. time.sleep(5) continue if gitdir == loopmark: loopmark = None if q_action == 'objstore_migrate': # Add forkgroup to busy, so we don't run any pulls until it's done busy.add(repoinfo['forkgroup']) obstrepo = grokmirror.setup_objstore_repo(obstdir, name=forkgroup) grokmirror.add_repo_to_objstore(obstrepo, fullpath) grokmirror.set_altrepo(fullpath, obstrepo) if q_action != 'init': # Easy actions that don't require priority logic q_pull.put((gitdir, repoinfo, q_action, q_action)) continue try: grokmirror.lock_repo(fullpath, nonblocking=True) except IOError: if not runonce: q_todo.put((gitdir, repoinfo, q_action)) continue if not grokmirror.setup_bare_repo(fullpath): logger.critical('Unable to bare-init %s', fullpath) q_done.put((gitdir, repoinfo, q_action, False)) continue fix_remotes(toplevel, gitdir, config['remote'].get('site'), config) set_repo_params(fullpath, repoinfo) grokmirror.unlock_repo(fullpath) forkgroup = repoinfo.get('forkgroup') if not forkgroup: logger.debug('no-sibling clone: %s', gitdir) q_pull.put((gitdir, repoinfo, 'pull', q_action)) continue obstrepo = os.path.join(obstdir, '%s.git' % forkgroup) if os.path.isdir(obstrepo): logger.debug('clone %s with existing obstrepo %s', gitdir, obstrepo) grokmirror.set_altrepo(fullpath, obstrepo) if not repoinfo['private']: grokmirror.add_repo_to_objstore(obstrepo, fullpath) q_pull.put((gitdir, repoinfo, 'pull', q_action)) continue # Set up a new obstrepo and make sure it's not used until the initial # pull is done logger.debug('cloning %s with new obstrepo %s', gitdir, obstrepo) 
busy.add(forkgroup) obstrepo = grokmirror.setup_objstore_repo(obstdir, name=forkgroup) grokmirror.set_altrepo(fullpath, obstrepo) if not repoinfo['private']: grokmirror.add_repo_to_objstore(obstrepo, fullpath) q_pull.put((gitdir, repoinfo, 'pull', q_action)) return 0 def parse_args(): import argparse # noinspection PyTypeChecker op = argparse.ArgumentParser(prog='grok-pull', description='Create or update a git repository collection mirror', formatter_class=argparse.ArgumentDefaultsHelpFormatter) op.add_argument('-v', '--verbose', dest='verbose', action='store_true', default=False, help='Be verbose and tell us what you are doing') op.add_argument('-n', '--no-mtime-check', dest='nomtime', action='store_true', default=False, help='Run without checking manifest mtime') op.add_argument('-p', '--purge', dest='purge', action='store_true', default=False, help='Remove any git trees that are no longer in manifest') op.add_argument('--force-purge', dest='forcepurge', action='store_true', default=False, help='Force purge despite significant repo deletions') op.add_argument('-o', '--continuous', dest='runonce', action='store_false', default=True, help='Run continuously (no effect if refresh is not set in config)') op.add_argument('-c', '--config', dest='config', required=True, help='Location of the configuration file') op.add_argument('--version', action='version', version=grokmirror.VERSION) return op.parse_args() def grok_pull(cfgfile, verbose=False, nomtime=False, purge=False, forcepurge=False, runonce=False): global logger config = grokmirror.load_config_file(cfgfile) if config['pull'].get('refresh', None) is None: runonce = True logfile = config['core'].get('log', None) if config['core'].get('loglevel', 'info') == 'debug': loglevel = logging.DEBUG else: loglevel = logging.INFO if purge: # Override the pull.purge setting config['pull']['purge'] = 'yes' logger = grokmirror.init_logger('pull', logfile, loglevel, verbose) return pull_mirror(config, nomtime, forcepurge, 
runonce) def command(): opts = parse_args() retval = grok_pull( opts.config, opts.verbose, opts.nomtime, opts.purge, opts.forcepurge, opts.runonce) sys.exit(retval) if __name__ == '__main__': command() grokmirror-2.0.11/man/000077500000000000000000000000001410330145700145415ustar00rootroot00000000000000grokmirror-2.0.11/man/grok-bundle.1000066400000000000000000000052401410330145700170350ustar00rootroot00000000000000.\" Man page generated from reStructuredText. . .TH GROK-BUNDLE 1 "2020-09-04" "2.0.0" "" .SH NAME GROK-BUNDLE \- Create clone.bundle files for use with "repo" . .nr rst2man-indent-level 0 . .de1 rstReportMargin \\$1 \\n[an-margin] level \\n[rst2man-indent-level] level margin: \\n[rst2man-indent\\n[rst2man-indent-level]] - \\n[rst2man-indent0] \\n[rst2man-indent1] \\n[rst2man-indent2] .. .de1 INDENT .\" .rstReportMargin pre: . RS \\$1 . nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin] . nr rst2man-indent-level +1 .\" .rstReportMargin post: .. .de UNINDENT . RE .\" indent \\n[an-margin] .\" old: \\n[rst2man-indent\\n[rst2man-indent-level]] .nr rst2man-indent-level -1 .\" new: \\n[rst2man-indent\\n[rst2man-indent-level]] .in \\n[rst2man-indent\\n[rst2man-indent-level]]u .. .SH SYNOPSIS .INDENT 0.0 .INDENT 3.5 grok\-bundle [options] \-c grokmirror.conf \-o path .UNINDENT .UNINDENT .SH DESCRIPTION .sp Android\(aqs "repo" tool will check for the presence of clone.bundle files before performing a fresh git clone. This is done in order to offload most of the git traffic to a CDN and reduce the load on git servers themselves. .sp This command will generate clone.bundle files in a hierarchy expected by repo. You can then sync the output directory to a CDN service. 
.SH OPTIONS .INDENT 0.0 .INDENT 3.5 .INDENT 0.0 .TP .B \-h\fP,\fB \-\-help show this help message and exit .TP .B \-v\fP,\fB \-\-verbose Be verbose and tell us what you are doing (default: False) .TP .BI \-c \ CONFIG\fP,\fB \ \-\-config \ CONFIG Location of the configuration file .TP .BI \-o \ OUTDIR\fP,\fB \ \-\-outdir \ OUTDIR Location where to store bundle files .TP .BI \-g \ GITARGS\fP,\fB \ \-\-gitargs \ GITARGS extra args to pass to git (default: \-c core.compression=9) .TP .BI \-r \ REVLISTARGS\fP,\fB \ \-\-revlistargs \ REVLISTARGS Rev\-list args to use (default: \-\-branches HEAD) .TP .BI \-s \ MAXSIZE\fP,\fB \ \-\-maxsize \ MAXSIZE Maximum size of git repositories to bundle (in GiB) (default: 2) .TP .BI \-i\fP,\fB \-\-include \ INCLUDE List repositories to bundle (accepts shell globbing) (default: *) .UNINDENT .UNINDENT .UNINDENT .SH EXAMPLES .INDENT 0.0 .INDENT 3.5 grok\-bundle \-c grokmirror.conf \-o /var/www/bundles \-i /pub/scm/linux/kernel/git/torvalds/linux.git /pub/scm/linux/kernel/git/stable/linux.git /pub/scm/linux/kernel/git/next/linux\-next.git .UNINDENT .UNINDENT .SH SEE ALSO .INDENT 0.0 .IP \(bu 2 grok\-pull(1) .IP \(bu 2 grok\-manifest(1) .IP \(bu 2 grok\-fsck(1) .IP \(bu 2 git(1) .UNINDENT .SH SUPPORT .sp Email \fI\%tools@linux.kernel.org\fP\&. .SH AUTHOR mricon@kernel.org License: GPLv3+ .SH COPYRIGHT The Linux Foundation and contributors .\" Generated by docutils manpage writer. . 
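The grok-bundle man page above caps bundled repositories at a ``-s/--maxsize`` value expressed in GiB. A minimal sketch of how such a size gate could look in Python — the function names here are illustrative assumptions, not grokmirror's actual code:

```python
import os


def repo_disk_size(path):
    """Approximate 'du -sb' for a bare repository by summing file sizes,
    skipping symlinks so alternates are not double-counted."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fpath = os.path.join(root, name)
            if not os.path.islink(fpath):
                total += os.path.getsize(fpath)
    return total


def within_bundle_limit(path, maxsize_gib=2):
    """Return True if the repository is small enough to bundle,
    mirroring the -s/--maxsize GiB cutoff described above."""
    return repo_disk_size(path) <= maxsize_gib * 1024 ** 3
```

Repositories failing the check would simply be skipped, so the CDN only carries bundles that meaningfully offload clone traffic.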
grokmirror-2.0.11/man/grok-bundle.1.rst000066400000000000000000000036041410330145700176460ustar00rootroot00000000000000GROK-BUNDLE =========== ------------------------------------------------- Create clone.bundle files for use with "repo" ------------------------------------------------- :Author: mricon@kernel.org :Date: 2020-09-04 :Copyright: The Linux Foundation and contributors :License: GPLv3+ :Version: 2.0.0 :Manual section: 1 SYNOPSIS -------- grok-bundle [options] -c grokmirror.conf -o path DESCRIPTION ----------- Android's "repo" tool will check for the presence of clone.bundle files before performing a fresh git clone. This is done in order to offload most of the git traffic to a CDN and reduce the load on git servers themselves. This command will generate clone.bundle files in a hierarchy expected by repo. You can then sync the output directory to a CDN service. OPTIONS ------- -h, --help show this help message and exit -v, --verbose Be verbose and tell us what you are doing (default: False) -c CONFIG, --config CONFIG Location of the configuration file -o OUTDIR, --outdir OUTDIR Location where to store bundle files -g GITARGS, --gitargs GITARGS extra args to pass to git (default: -c core.compression=9) -r REVLISTARGS, --revlistargs REVLISTARGS Rev-list args to use (default: --branches HEAD) -s MAXSIZE, --maxsize MAXSIZE Maximum size of git repositories to bundle (in GiB) (default: 2) -i, --include INCLUDE List repositories to bundle (accepts shell globbing) (default: \*) EXAMPLES -------- grok-bundle -c grokmirror.conf -o /var/www/bundles -i /pub/scm/linux/kernel/git/torvalds/linux.git /pub/scm/linux/kernel/git/stable/linux.git /pub/scm/linux/kernel/git/next/linux-next.git SEE ALSO -------- * grok-pull(1) * grok-manifest(1) * grok-fsck(1) * git(1) SUPPORT ------- Email tools@linux.kernel.org. grokmirror-2.0.11/man/grok-dumb-pull.1000066400000000000000000000056011410330145700174660ustar00rootroot00000000000000.\" Man page generated from reStructuredText. . 
.TH GROK-DUMB-PULL 1 "2020-08-14" "2.0.0" "" .SH NAME GROK-DUMB-PULL \- Update git repositories not managed by grokmirror . .nr rst2man-indent-level 0 . .de1 rstReportMargin \\$1 \\n[an-margin] level \\n[rst2man-indent-level] level margin: \\n[rst2man-indent\\n[rst2man-indent-level]] - \\n[rst2man-indent0] \\n[rst2man-indent1] \\n[rst2man-indent2] .. .de1 INDENT .\" .rstReportMargin pre: . RS \\$1 . nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin] . nr rst2man-indent-level +1 .\" .rstReportMargin post: .. .de UNINDENT . RE .\" indent \\n[an-margin] .\" old: \\n[rst2man-indent\\n[rst2man-indent-level]] .nr rst2man-indent-level -1 .\" new: \\n[rst2man-indent\\n[rst2man-indent-level]] .in \\n[rst2man-indent\\n[rst2man-indent-level]]u .. .SH SYNOPSIS .INDENT 0.0 .INDENT 3.5 grok\-dumb\-pull [options] /path/to/repos .UNINDENT .UNINDENT .SH DESCRIPTION .sp This is a satellite utility that updates repositories not exported via grokmirror manifest. You will need to manually clone these repositories using "git clone \-\-mirror" and then define a cronjob to update them as frequently as you require. Grok\-dumb\-pull will bluntly execute "git remote update" in each of them. .SH OPTIONS .INDENT 0.0 .INDENT 3.5 .INDENT 0.0 .TP .B \-\-version show program\(aqs version number and exit .TP .B \-h\fP,\fB \-\-help show this help message and exit .TP .B \-v\fP,\fB \-\-verbose Be verbose and tell us what you are doing .TP .B \-s\fP,\fB \-\-svn The remotes for these repositories are Subversion .TP .BI \-r \ REMOTES\fP,\fB \ \-\-remote\-names\fB= REMOTES Only fetch remotes matching this name (accepts globbing, can be passed multiple times) .TP .BI \-u \ POSTHOOK\fP,\fB \ \-\-post\-update\-hook\fB= POSTHOOK Run this hook after each repository is updated. Passes full path to the repository as the sole argument. 
.TP .BI \-l \ LOGFILE\fP,\fB \ \-\-logfile\fB= LOGFILE Put debug logs into this file .UNINDENT .UNINDENT .UNINDENT .SH EXAMPLES .sp The following will update all bare git repositories found in /path/to/repos hourly, and /path/to/special/repo.git daily, fetching only the "github" remote: .INDENT 0.0 .INDENT 3.5 .sp .nf .ft C MAILTO=root # Update all repositories found in /path/to/repos hourly 0 * * * * mirror /usr/bin/grok\-dumb\-pull /path/to/repos # Update /path/to/special/repo.git daily, fetching "github" remote 0 0 * * * mirror /usr/bin/grok\-dumb\-pull \-r github /path/to/special/repo.git .ft P .fi .UNINDENT .UNINDENT .sp Make sure the user "mirror" (or whichever user you specified) is able to write to the repos specified. .SH SEE ALSO .INDENT 0.0 .IP \(bu 2 grok\-pull(1) .IP \(bu 2 grok\-manifest(1) .IP \(bu 2 grok\-fsck(1) .IP \(bu 2 git(1) .UNINDENT .SH SUPPORT .sp Email \fI\%tools@linux.kernel.org\fP\&. .SH AUTHOR mricon@kernel.org License: GPLv3+ .SH COPYRIGHT The Linux Foundation and contributors .\" Generated by docutils manpage writer. . grokmirror-2.0.11/man/grok-dumb-pull.1.rst000066400000000000000000000041671410330145700203030ustar00rootroot00000000000000GROK-DUMB-PULL ============== ------------------------------------------------- Update git repositories not managed by grokmirror ------------------------------------------------- :Author: mricon@kernel.org :Date: 2020-08-14 :Copyright: The Linux Foundation and contributors :License: GPLv3+ :Version: 2.0.0 :Manual section: 1 SYNOPSIS -------- grok-dumb-pull [options] /path/to/repos DESCRIPTION ----------- This is a satellite utility that updates repositories not exported via grokmirror manifest. You will need to manually clone these repositories using "git clone --mirror" and then define a cronjob to update them as frequently as you require. Grok-dumb-pull will bluntly execute "git remote update" in each of them. 
OPTIONS ------- --version show program's version number and exit -h, --help show this help message and exit -v, --verbose Be verbose and tell us what you are doing -s, --svn The remotes for these repositories are Subversion -r REMOTES, --remote-names=REMOTES Only fetch remotes matching this name (accepts globbing, can be passed multiple times) -u POSTHOOK, --post-update-hook=POSTHOOK Run this hook after each repository is updated. Passes full path to the repository as the sole argument. -l LOGFILE, --logfile=LOGFILE Put debug logs into this file EXAMPLES -------- The following will update all bare git repositories found in /path/to/repos hourly, and /path/to/special/repo.git daily, fetching only the "github" remote:: MAILTO=root # Update all repositories found in /path/to/repos hourly 0 * * * * mirror /usr/bin/grok-dumb-pull /path/to/repos # Update /path/to/special/repo.git daily, fetching "github" remote 0 0 * * * mirror /usr/bin/grok-dumb-pull -r github /path/to/special/repo.git Make sure the user "mirror" (or whichever user you specified) is able to write to the repos specified. SEE ALSO -------- * grok-pull(1) * grok-manifest(1) * grok-fsck(1) * git(1) SUPPORT ------- Email tools@linux.kernel.org. grokmirror-2.0.11/man/grok-fsck.1000066400000000000000000000045041410330145700165140ustar00rootroot00000000000000.\" Man page generated from reStructuredText. . .TH GROK-FSCK 1 "2020-08-14" "2.0.0" "" .SH NAME GROK-FSCK \- Optimize mirrored repositories and check for corruption . .nr rst2man-indent-level 0 . .de1 rstReportMargin \\$1 \\n[an-margin] level \\n[rst2man-indent-level] level margin: \\n[rst2man-indent\\n[rst2man-indent-level]] - \\n[rst2man-indent0] \\n[rst2man-indent1] \\n[rst2man-indent2] .. .de1 INDENT .\" .rstReportMargin pre: . RS \\$1 . nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin] . nr rst2man-indent-level +1 .\" .rstReportMargin post: .. .de UNINDENT . 
RE .\" indent \\n[an-margin] .\" old: \\n[rst2man-indent\\n[rst2man-indent-level]] .nr rst2man-indent-level -1 .\" new: \\n[rst2man-indent\\n[rst2man-indent-level]] .in \\n[rst2man-indent\\n[rst2man-indent-level]]u .. .SH SYNOPSIS .INDENT 0.0 .INDENT 3.5 grok\-fsck \-c /path/to/grokmirror.conf .UNINDENT .UNINDENT .SH DESCRIPTION .sp Git repositories should be routinely repacked and checked for corruption. This utility will perform the necessary optimizations and report any problems to the email defined via fsck.report_to (\(aqroot\(aq by default). It should run weekly from cron or from the systemd timer (see contrib). .sp Please examine the example grokmirror.conf file for various things you can tweak. .SH OPTIONS .INDENT 0.0 .INDENT 3.5 .INDENT 0.0 .TP .B \-\-version show program\(aqs version number and exit .TP .B \-h\fP,\fB \-\-help show this help message and exit .TP .B \-v\fP,\fB \-\-verbose Be verbose and tell us what you are doing .TP .B \-f\fP,\fB \-\-force Force immediate run on all repositories. .TP .BI \-c \ CONFIG\fP,\fB \ \-\-config\fB= CONFIG Location of fsck.conf .TP .B \-\-repack\-only Only find and repack repositories that need optimizing (nightly run mode) .TP .B \-\-connectivity (Assumes \-\-force): Run git fsck on all repos, but only check connectivity .TP .B \-\-repack\-all\-quick (Assumes \-\-force): Do a quick repack of all repos .TP .B \-\-repack\-all\-full (Assumes \-\-force): Do a full repack of all repos .UNINDENT .UNINDENT .UNINDENT .SH SEE ALSO .INDENT 0.0 .IP \(bu 2 grok\-manifest(1) .IP \(bu 2 grok\-pull(1) .IP \(bu 2 git(1) .UNINDENT .SH SUPPORT .sp Email \fI\%tools@linux.kernel.org\fP\&. .SH AUTHOR mricon@kernel.org License: GPLv3+ .SH COPYRIGHT The Linux Foundation and contributors .\" Generated by docutils manpage writer. . 
grokmirror-2.0.11/man/grok-fsck.1.rst000066400000000000000000000031541410330145700173230ustar00rootroot00000000000000GROK-FSCK ========= ------------------------------------------------------- Optimize mirrored repositories and check for corruption ------------------------------------------------------- :Author: mricon@kernel.org :Date: 2020-08-14 :Copyright: The Linux Foundation and contributors :License: GPLv3+ :Version: 2.0.0 :Manual section: 1 SYNOPSIS -------- grok-fsck -c /path/to/grokmirror.conf DESCRIPTION ----------- Git repositories should be routinely repacked and checked for corruption. This utility will perform the necessary optimizations and report any problems to the email defined via fsck.report_to ('root' by default). It should run weekly from cron or from the systemd timer (see contrib). Please examine the example grokmirror.conf file for various things you can tweak. OPTIONS ------- --version show program's version number and exit -h, --help show this help message and exit -v, --verbose Be verbose and tell us what you are doing -f, --force Force immediate run on all repositories. -c CONFIG, --config=CONFIG Location of fsck.conf --repack-only Only find and repack repositories that need optimizing (nightly run mode) --connectivity (Assumes --force): Run git fsck on all repos, but only check connectivity --repack-all-quick (Assumes --force): Do a quick repack of all repos --repack-all-full (Assumes --force): Do a full repack of all repos SEE ALSO -------- * grok-manifest(1) * grok-pull(1) * git(1) SUPPORT ------- Email tools@linux.kernel.org. grokmirror-2.0.11/man/grok-manifest.1000066400000000000000000000115061410330145700173740ustar00rootroot00000000000000.\" Man page generated from reStructuredText. . .TH GROK-MANIFEST 1 "2020-08-14" "2.0.0" "" .SH NAME GROK-MANIFEST \- Create manifest for use with grokmirror . .nr rst2man-indent-level 0 . 
.de1 rstReportMargin \\$1 \\n[an-margin] level \\n[rst2man-indent-level] level margin: \\n[rst2man-indent\\n[rst2man-indent-level]] - \\n[rst2man-indent0] \\n[rst2man-indent1] \\n[rst2man-indent2] .. .de1 INDENT .\" .rstReportMargin pre: . RS \\$1 . nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin] . nr rst2man-indent-level +1 .\" .rstReportMargin post: .. .de UNINDENT . RE .\" indent \\n[an-margin] .\" old: \\n[rst2man-indent\\n[rst2man-indent-level]] .nr rst2man-indent-level -1 .\" new: \\n[rst2man-indent\\n[rst2man-indent-level]] .in \\n[rst2man-indent\\n[rst2man-indent-level]]u .. .SH SYNOPSIS .INDENT 0.0 .INDENT 3.5 grok\-manifest [opts] \-m manifest.js[.gz] \-t /path [/path/to/bare.git] .UNINDENT .UNINDENT .SH DESCRIPTION .sp Call grok\-manifest from a git post\-update or post\-receive hook to create the latest repository manifest. This manifest file is downloaded by mirroring systems (if manifest is newer than what they already have) and used to only clone/pull the repositories that have changed since the grok\-pull\(aqs last run. 
.SH OPTIONS .INDENT 0.0 .INDENT 3.5 .INDENT 0.0 .TP .B \-\-version show program\(aqs version number and exit .TP .B \-h\fP,\fB \-\-help show this help message and exit .TP .BI \-\-cfgfile\fB= CFGFILE Path to grokmirror.conf containing a [manifest] section .TP .BI \-m \ MANIFILE\fP,\fB \ \-\-manifest\fB= MANIFILE Location of manifest.js or manifest.js.gz .TP .BI \-t \ TOPLEVEL\fP,\fB \ \-\-toplevel\fB= TOPLEVEL Top dir where all repositories reside .TP .BI \-l \ LOGFILE\fP,\fB \ \-\-logfile\fB= LOGFILE When specified, will put debug logs in this location .TP .B \-c\fP,\fB \-\-check\-export\-ok Honor the git\-daemon\-export\-ok magic file and do not export repositories not marked as such .TP .B \-n\fP,\fB \-\-use\-now Use current timestamp instead of parsing commits .TP .B \-p\fP,\fB \-\-purge Purge deleted git repositories from manifest .TP .B \-x\fP,\fB \-\-remove Remove repositories passed as arguments from the manifest file .TP .B \-y\fP,\fB \-\-pretty Pretty\-print the generated manifest (sort repos and add indentation). This is much slower, so should be used with caution on large collections. .TP .B \-w\fP,\fB \-\-wait\-for\-manifest When running with arguments, wait if manifest is not there (can be useful when multiple writers are writing to the manifest file via NFS) .TP .BI \-i \ IGNORE\fP,\fB \ \-\-ignore\-paths\fB= IGNORE When finding git dirs, ignore these paths (can be used multiple times, accepts shell\-style globbing) .TP .B \-o\fP,\fB \-\-fetch\-objstore Fetch updates into objstore repo (if used) .TP .B \-v\fP,\fB \-\-verbose Be verbose and tell us what you are doing .UNINDENT .UNINDENT .UNINDENT .sp You can set some of these options in a config file that you can pass via \fB\-\-cfgfile\fP option. See example grokmirror.conf file for documentation. Values passed via cmdline flags will override the corresponding config file values. .SH EXAMPLES .sp The examples assume that the repositories are located in \fB/var/lib/gitolite3/repositories\fP\&. 
.sp Initial manifest generation: .INDENT 0.0 .INDENT 3.5 .sp .nf .ft C /usr/bin/grok\-manifest \-m /var/www/html/manifest.js.gz \e \-t /var/lib/gitolite3/repositories .ft P .fi .UNINDENT .UNINDENT .sp Inside the git hook: .INDENT 0.0 .INDENT 3.5 .sp .nf .ft C /usr/bin/grok\-manifest \-m /var/www/html/manifest.js.gz \e \-t /var/lib/gitolite3/repositories \-n \(gapwd\(ga .ft P .fi .UNINDENT .UNINDENT .sp To purge deleted repositories from the manifest, use the \fB\-p\fP flag when running from cron: .INDENT 0.0 .INDENT 3.5 .sp .nf .ft C /usr/bin/grok\-manifest \-m /var/www/html/manifest.js.gz \e \-t /var/lib/gitolite3/repositories \-p .ft P .fi .UNINDENT .UNINDENT .sp You can also add it to the gitolite\(aqs \fBD\fP command using the \fB\-x\fP flag: .INDENT 0.0 .INDENT 3.5 .sp .nf .ft C /usr/bin/grok\-manifest \-m /var/www/html/manifest.js.gz \e \-t /var/lib/gitolite3/repositories \e \-x $repo.git .ft P .fi .UNINDENT .UNINDENT .sp To troubleshoot potential problems, you can pass \fB\-l\fP parameter to grok\-manifest, just make sure the user executing the hook command (user git or gitolite, for example) is able to write to that location: .INDENT 0.0 .INDENT 3.5 .sp .nf .ft C /usr/bin/grok\-manifest \-m /var/www/html/manifest.js.gz \e \-t /var/lib/gitolite3/repositories \e \-l /var/log/grokmirror/grok\-manifest.log \-n \(gapwd\(ga .ft P .fi .UNINDENT .UNINDENT .SH SEE ALSO .INDENT 0.0 .IP \(bu 2 grok\-pull(1) .IP \(bu 2 git(1) .UNINDENT .SH SUPPORT .sp Email \fI\%tools@linux.kernel.org\fP\&. .SH AUTHOR mricon@kernel.org License: GPLv3+ .SH COPYRIGHT The Linux Foundation and contributors .\" Generated by docutils manpage writer. . 
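The manifest.js[.gz] file that grok-manifest produces is a JSON object keyed by repository path, optionally gzip-compressed. A minimal sketch of a consumer-side loader (a hypothetical helper, not grokmirror's own ``read_manifest``):

```python
import gzip
import json


def load_manifest(path):
    """Load manifest.js or manifest.js.gz into a dict keyed by repository
    path (e.g. '/pub/scm/linux/kernel/git/torvalds/linux.git')."""
    if path.endswith('.gz'):
        with gzip.open(path, 'rt', encoding='utf-8') as fh:
            return json.load(fh)
    with open(path, encoding='utf-8') as fh:
        return json.load(fh)
```

A mirror only needs to fetch and parse this file when its mtime is newer than the local copy, which is what keeps grok-pull's polling cheap.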
grokmirror-2.0.11/man/grok-manifest.1.rst000066400000000000000000000075611410330145700202110ustar00rootroot00000000000000GROK-MANIFEST ============= --------------------------------------- Create manifest for use with grokmirror --------------------------------------- :Author: mricon@kernel.org :Date: 2020-08-14 :Copyright: The Linux Foundation and contributors :License: GPLv3+ :Version: 2.0.0 :Manual section: 1 SYNOPSIS -------- grok-manifest [opts] -m manifest.js[.gz] -t /path [/path/to/bare.git] DESCRIPTION ----------- Call grok-manifest from a git post-update or post-receive hook to create the latest repository manifest. This manifest file is downloaded by mirroring systems (if manifest is newer than what they already have) and used to only clone/pull the repositories that have changed since the grok-pull's last run. OPTIONS ------- --version show program's version number and exit -h, --help show this help message and exit --cfgfile=CFGFILE Path to grokmirror.conf containing a [manifest] section -m MANIFILE, --manifest=MANIFILE Location of manifest.js or manifest.js.gz -t TOPLEVEL, --toplevel=TOPLEVEL Top dir where all repositories reside -l LOGFILE, --logfile=LOGFILE When specified, will put debug logs in this location -c, --check-export-ok Honor the git-daemon-export-ok magic file and do not export repositories not marked as such -n, --use-now Use current timestamp instead of parsing commits -p, --purge Purge deleted git repositories from manifest -x, --remove Remove repositories passed as arguments from the manifest file -y, --pretty Pretty-print the generated manifest (sort repos and add indentation). This is much slower, so should be used with caution on large collections. 
-w, --wait-for-manifest When running with arguments, wait if manifest is not there (can be useful when multiple writers are writing to the manifest file via NFS) -i IGNORE, --ignore-paths=IGNORE When finding git dirs, ignore these paths (can be used multiple times, accepts shell-style globbing) -o, --fetch-objstore Fetch updates into objstore repo (if used) -v, --verbose Be verbose and tell us what you are doing You can set some of these options in a config file that you can pass via ``--cfgfile`` option. See example grokmirror.conf file for documentation. Values passed via cmdline flags will override the corresponding config file values. EXAMPLES -------- The examples assume that the repositories are located in ``/var/lib/gitolite3/repositories``. Initial manifest generation:: /usr/bin/grok-manifest -m /var/www/html/manifest.js.gz \ -t /var/lib/gitolite3/repositories Inside the git hook:: /usr/bin/grok-manifest -m /var/www/html/manifest.js.gz \ -t /var/lib/gitolite3/repositories -n `pwd` To purge deleted repositories from the manifest, use the ``-p`` flag when running from cron:: /usr/bin/grok-manifest -m /var/www/html/manifest.js.gz \ -t /var/lib/gitolite3/repositories -p You can also add it to the gitolite's ``D`` command using the ``-x`` flag:: /usr/bin/grok-manifest -m /var/www/html/manifest.js.gz \ -t /var/lib/gitolite3/repositories \ -x $repo.git To troubleshoot potential problems, you can pass ``-l`` parameter to grok-manifest, just make sure the user executing the hook command (user git or gitolite, for example) is able to write to that location:: /usr/bin/grok-manifest -m /var/www/html/manifest.js.gz \ -t /var/lib/gitolite3/repositories \ -l /var/log/grokmirror/grok-manifest.log -n `pwd` SEE ALSO -------- * grok-pull(1) * git(1) SUPPORT ------- Email tools@linux.kernel.org. grokmirror-2.0.11/man/grok-pi-piper.1000066400000000000000000000061461410330145700173170ustar00rootroot00000000000000.\" Man page generated from reStructuredText. . 
.TH GROK-PI-PIPER 1 "2020-10-07" "2.0.2" ""
.SH NAME
GROK-PI-PIPER \- Hook script for piping new messages from public-inbox repos
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.SH SYNOPSIS
.INDENT 0.0
.INDENT 3.5
grok\-pi\-piper [\-h] [\-v] [\-d] \-c CONFIG [\-l PIPELAST] [\-\-version] repo
.UNINDENT
.UNINDENT
.SH DESCRIPTION
.sp
This is a ready\-made hook script that can be called from
pull.post_update_hook when mirroring public\-inbox repositories. It will
pipe all newly received messages to arbitrary commands defined in the
config file. The simplest configuration for lore.kernel.org is:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
~/.config/pi\-piper.conf
\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
[DEFAULT]
pipe = /usr/bin/procmail
# Prune successfully processed messages
shallow = yes

~/.procmailrc
\-\-\-\-\-\-\-\-\-\-\-\-\-
DEFAULT=$HOME/Maildir/
# Don\(aqt deliver cross\-posted duplicates
:0 Wh: .msgid.lock
| formail \-D 8192 .msgid.cache

~/.config/lore.conf
\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
[core]
toplevel = ~/.local/share/grokmirror/lore
log = ${toplevel}/grokmirror.log

[remote]
site = https://lore.kernel.org
manifest = https://lore.kernel.org/manifest.js.gz

[pull]
post_update_hook = ~/.local/bin/grok\-pi\-piper \-c ~/.config/pi\-piper.conf
include = /list\-you\-want/*
          /another\-list/*
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
It assumes that grokmirror was installed from pip. If you installed it via
some other means, please check the path for the grok\-pi\-piper script.
.sp
Note that the initial clone may take a long time, even if you set
shallow=yes.
.sp
See pi\-piper.conf for other config options.
.SH OPTIONS
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.B \-h\fP,\fB \-\-help
show this help message and exit
.TP
.B \-v\fP,\fB \-\-verbose
Be verbose and tell us what you are doing (default: False)
.TP
.B \-d\fP,\fB \-\-dry\-run
Do a dry\-run and just show what would be done (default: False)
.TP
.BI \-c \ CONFIG\fP,\fB \ \-\-config \ CONFIG
Location of the configuration file (default: None)
.TP
.BI \-l \ PIPELAST\fP,\fB \ \-\-pipe\-last \ PIPELAST
Force pipe last NN messages in the list, regardless of tracking
(default: None)
.TP
.B \-\-version
show program\(aqs version number and exit
.UNINDENT
.UNINDENT
.UNINDENT
.SH SEE ALSO
.INDENT 0.0
.IP \(bu 2
grok\-pull(1)
.IP \(bu 2
git(1)
.UNINDENT
.SH SUPPORT
.sp
Email \fI\%tools@linux.kernel.org\fP\&.
.SH AUTHOR
mricon@kernel.org

License: GPLv3+
.SH COPYRIGHT
The Linux Foundation and contributors
.\" Generated by docutils manpage writer.
.

grokmirror-2.0.11/man/grok-pi-piper.1.rst
GROK-PI-PIPER
=============

-----------------------------------------------------------
Hook script for piping new messages from public-inbox repos
-----------------------------------------------------------

:Author: mricon@kernel.org
:Date: 2020-10-07
:Copyright: The Linux Foundation and contributors
:License: GPLv3+
:Version: 2.0.2
:Manual section: 1

SYNOPSIS
--------

grok-pi-piper [-h] [-v] [-d] -c CONFIG [-l PIPELAST] [--version] repo

DESCRIPTION
-----------

This is a ready-made hook script that can be called from
pull.post_update_hook when mirroring public-inbox repositories. It will
pipe all newly received messages to arbitrary commands defined in the
config file.
The simplest configuration for lore.kernel.org is::

    ~/.config/pi-piper.conf
    -----------------------
    [DEFAULT]
    pipe = /usr/bin/procmail
    # Prune successfully processed messages
    shallow = yes

    ~/.procmailrc
    -------------
    DEFAULT=$HOME/Maildir/
    # Don't deliver cross-posted duplicates
    :0 Wh: .msgid.lock
    | formail -D 8192 .msgid.cache

    ~/.config/lore.conf
    -------------------
    [core]
    toplevel = ~/.local/share/grokmirror/lore
    log = ${toplevel}/grokmirror.log

    [remote]
    site = https://lore.kernel.org
    manifest = https://lore.kernel.org/manifest.js.gz

    [pull]
    post_update_hook = ~/.local/bin/grok-pi-piper -c ~/.config/pi-piper.conf
    include = /list-you-want/*
              /another-list/*

It assumes that grokmirror was installed from pip. If you installed it via
some other means, please check the path for the grok-pi-piper script.

Note that the initial clone may take a long time, even if you set
shallow=yes.

See pi-piper.conf for other config options.

OPTIONS
-------

-h, --help
  show this help message and exit

-v, --verbose
  Be verbose and tell us what you are doing (default: False)

-d, --dry-run
  Do a dry-run and just show what would be done (default: False)

-c CONFIG, --config CONFIG
  Location of the configuration file (default: None)

-l PIPELAST, --pipe-last PIPELAST
  Force pipe last NN messages in the list, regardless of tracking
  (default: None)

--version
  show program's version number and exit

SEE ALSO
--------

* grok-pull(1)
* git(1)

SUPPORT
-------

Email tools@linux.kernel.org.

grokmirror-2.0.11/man/grok-pull.1
.\" Man page generated from reStructuredText.
.
.TH GROK-PULL 1 "2020-08-14" "2.0.0" ""
.SH NAME
GROK-PULL \- Clone or update local git repositories
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.SH SYNOPSIS
.INDENT 0.0
.INDENT 3.5
grok\-pull \-c /path/to/grokmirror.conf
.UNINDENT
.UNINDENT
.SH DESCRIPTION
.sp
Grok\-pull is the main tool for replicating repository updates from the
grokmirror primary server to the mirrors.
.sp
Grok\-pull has two modes of operation \-\- one\-time and continuous
(daemonized). In one\-time operation mode, it downloads the latest manifest
and applies any outstanding updates. If there are new repositories or
changes in the existing repositories, grok\-pull will perform the necessary
git commands to clone or fetch the required data from the master. Once all
updates are applied, it will write its own manifest and exit. In this mode,
grok\-pull can be run manually or from cron.
.sp
In continuous operation mode (when run with \-o), grok\-pull will continue
running after all updates have been applied and will periodically
re\-download the manifest from the server to check for new updates. For this
to work, you must set pull.refresh in grokmirror.conf to the number of
seconds you would like it to wait between refreshes.
.sp
If pull.socket is specified, grok\-pull will also listen on a socket for any
push updates (relative repository path as present in the manifest file,
terminated with newlines). This can be used for pubsub subscriptions (see
contrib).
.SH OPTIONS
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.B \-\-version
show program\(aqs version number and exit
.TP
.B \-h\fP,\fB \-\-help
show this help message and exit
.TP
.B \-v\fP,\fB \-\-verbose
Be verbose and tell us what you are doing
.TP
.B \-n\fP,\fB \-\-no\-mtime\-check
Run without checking manifest mtime.
.TP
.B \-o\fP,\fB \-\-continuous
Run continuously (no effect if refresh is not set)
.TP
.BI \-c \ CONFIG\fP,\fB \ \-\-config\fB= CONFIG
Location of the configuration file
.TP
.B \-p\fP,\fB \-\-purge
Remove any git trees that are no longer in manifest.
.TP
.B \-\-force\-purge
Force purge operation despite significant repo deletions
.UNINDENT
.UNINDENT
.UNINDENT
.SH EXAMPLES
.sp
Use grokmirror.conf and modify it to reflect your needs. The example
configuration file is heavily commented. To invoke, run:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
grok\-pull \-v \-c /path/to/grokmirror.conf
.ft P
.fi
.UNINDENT
.UNINDENT
.SH SEE ALSO
.INDENT 0.0
.IP \(bu 2
grok\-manifest(1)
.IP \(bu 2
grok\-fsck(1)
.IP \(bu 2
git(1)
.UNINDENT
.SH SUPPORT
.sp
Please email \fI\%tools@linux.kernel.org\fP\&.
.SH AUTHOR
mricon@kernel.org

License: GPLv3+
.SH COPYRIGHT
The Linux Foundation and contributors
.\" Generated by docutils manpage writer.
.

grokmirror-2.0.11/man/grok-pull.1.rst
GROK-PULL
=========

--------------------------------------
Clone or update local git repositories
--------------------------------------

:Author: mricon@kernel.org
:Date: 2020-08-14
:Copyright: The Linux Foundation and contributors
:License: GPLv3+
:Version: 2.0.0
:Manual section: 1

SYNOPSIS
--------

grok-pull -c /path/to/grokmirror.conf

DESCRIPTION
-----------

Grok-pull is the main tool for replicating repository updates from the
grokmirror primary server to the mirrors.

Grok-pull has two modes of operation -- one-time and continuous (daemonized).
In one-time operation mode, it downloads the latest manifest and applies any
outstanding updates. If there are new repositories or changes in the
existing repositories, grok-pull will perform the necessary git commands to
clone or fetch the required data from the master. Once all updates are
applied, it will write its own manifest and exit. In this mode, grok-pull
can be run manually or from cron.

In continuous operation mode (when run with -o), grok-pull will continue
running after all updates have been applied and will periodically
re-download the manifest from the server to check for new updates. For this
to work, you must set pull.refresh in grokmirror.conf to the number of
seconds you would like it to wait between refreshes.

If pull.socket is specified, grok-pull will also listen on a socket for any
push updates (relative repository path as present in the manifest file,
terminated with newlines). This can be used for pubsub subscriptions (see
contrib).

OPTIONS
-------

--version
  show program's version number and exit

-h, --help
  show this help message and exit

-v, --verbose
  Be verbose and tell us what you are doing

-n, --no-mtime-check
  Run without checking manifest mtime.

-o, --continuous
  Run continuously (no effect if refresh is not set)

-c CONFIG, --config=CONFIG
  Location of the configuration file

-p, --purge
  Remove any git trees that are no longer in manifest.

--force-purge
  Force purge operation despite significant repo deletions

EXAMPLES
--------

Use grokmirror.conf and modify it to reflect your needs. The example
configuration file is heavily commented. To invoke, run::

    grok-pull -v -c /path/to/grokmirror.conf

SEE ALSO
--------

* grok-manifest(1)
* grok-fsck(1)
* git(1)

SUPPORT
-------

Please email tools@linux.kernel.org.
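The pull.socket protocol described in the man page above is deliberately minimal: a client writes the relative repository path, as it appears in the manifest, terminated with a newline. A sketch of such a notifier in Python follows; the helper name, the socket path, and the repo path are illustrative assumptions, not part of grokmirror itself:

```python
import socket


def notify_grok_pull(socket_path, repo_path):
    """Ask a running grok-pull daemon to update one repository.

    Writes the relative repo path (as present in the manifest file),
    terminated with a newline, to the UNIX socket configured via
    pull.socket in grokmirror.conf.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(repo_path.encode("utf-8") + b"\n")


# Example call (both paths are hypothetical):
# notify_grok_pull("/run/grokmirror/pull.sock", "/pub/scm/git/git.git")
```

A pubsub bridge (see contrib) would call such a helper once per received repository-update event.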
grokmirror-2.0.11/pi-piper.conf
# These will be overridden by any sections below
[DEFAULT]
# To start piping public-inbox messages into your inbox, simply
# install procmail and add the following line to your ~/.procmailrc:
#   DEFAULT=$HOME/Maildir/
# You can now read your mail with "mutt -f ~/Maildir/"
pipe = /usr/bin/procmail

# Once you've successfully piped the messages, you generally
# don't need them any more. If you set shallow = yes, then
# the repository will be configured as "shallow" and all successfully
# processed messages will be pruned from the repo.
# This will greatly reduce disk space usage, especially on large archives.
# You can always get any number of them back, e.g. by running:
#   git fetch _grokmirror master --deepen 100
shallow = yes

# You can use ~/ for paths in your home dir, or omit for no log
#log = ~/pi-piper.log

# Can be "info" or "debug". Note that debug will have message bodies as well.
#loglevel = info

# Overrides for any defaults. You may not need any if all you want is to
# pipe all mirrored public-inboxes to procmail.
#
# Naming:
# We will perform simple shell-style globbing using the following rule:
#   /{section}/git/*.git
# so, for a section that matches /alsa-devel/git/0.git, name it "alsa-devel"

[alsa-devel]
# Use a different config file for this one
pipe = /usr/bin/procmail /path/to/some/other/procmailrc

[lkml]
# Setting pipe = None allows ignoring this particular list
pipe = None

grokmirror-2.0.11/requirements.txt
packaging
requests

grokmirror-2.0.11/setup.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright (C) 2013-2020 by The Linux Foundation and contributors
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
import os
import re

from setuptools import setup


def read(fname):
    return open(os.path.join(os.path.dirname(__file__), fname)).read()


def find_version(source):
    version_file = read(source)
    version_match = re.search(r"^VERSION = ['\"]([^'\"]*)['\"]", version_file, re.M)
    if version_match:
        return version_match.group(1)
    raise RuntimeError("Unable to find version string.")


NAME = 'grokmirror'
VERSION = find_version('grokmirror/__init__.py')

setup(
    version=VERSION,
    url='https://git.kernel.org/pub/scm/utils/grokmirror/grokmirror.git',
    download_url='https://www.kernel.org/pub/software/network/grokmirror/%s-%s.tar.xz' % (NAME, VERSION),
    name=NAME,
    description='Smartly mirror git repositories that use grokmirror',
    author='Konstantin Ryabitsev',
    author_email='konstantin@linuxfoundation.org',
    packages=[NAME],
    license='GPLv3+',
    long_description=read('README.rst'),
    long_description_content_type='text/x-rst',
    keywords=['git', 'mirroring', 'repositories'],
    project_urls={
        'Source': 'https://git.kernel.org/pub/scm/utils/grokmirror/grokmirror.git',
        'Tracker': 'https://github.com/mricon/grokmirror/issues',
    },
    install_requires=[
        'requests',
    ],
    python_requires='>=3.6',
    entry_points={
        'console_scripts': [
            "grok-dumb-pull=grokmirror.dumb_pull:command",
            "grok-pull=grokmirror.pull:command",
            "grok-fsck=grokmirror.fsck:command",
            "grok-manifest=grokmirror.manifest:command",
            "grok-bundle=grokmirror.bundle:command",
            "grok-pi-piper=grokmirror.pi_piper:command",
        ]
    }
)
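The find_version() helper in setup.py locates the package version by running a multiline regex over grokmirror/__init__.py. The pattern can be exercised standalone; in this sketch the file contents are replaced by a made-up inline string, and the regex is the same one setup.py uses:

```python
import re


def find_version_in(text):
    # Same pattern setup.py applies to grokmirror/__init__.py:
    # a top-level VERSION = '...' (or "...") assignment, anchored
    # at the start of a line via re.M.
    version_match = re.search(r"^VERSION = ['\"]([^'\"]*)['\"]", text, re.M)
    if version_match:
        return version_match.group(1)
    raise RuntimeError("Unable to find version string.")


sample = "# grokmirror package\nVERSION = '2.0.11'\n"
print(find_version_in(sample))  # → 2.0.11
```

Reading the version out of the source file this way avoids importing the package at build time, before its dependencies are installed.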