pax_global_header 0000666 0000000 0000000 00000000064 14103301457 0014510 g ustar 00root root 0000000 0000000 52 comment=85e95cdb3f9dd0e066c9f35959ead12136fe904c
grokmirror-2.0.11/ 0000775 0000000 0000000 00000000000 14103301457 0013766 5 ustar 00root root 0000000 0000000 grokmirror-2.0.11/.gitignore 0000664 0000000 0000000 00000000061 14103301457 0015753 0 ustar 00root root 0000000 0000000 .idea
*.pyc
*.swp
*.pdf
*~
dist
build
*.egg-info
grokmirror-2.0.11/CHANGELOG.rst 0000664 0000000 0000000 00000017301 14103301457 0016011 0 ustar 00root root 0000000 0000000 v2.0.11 (2021-08-06)
--------------------
- Hotfix for pull_threads bug causing any config value to result in 1
thread.
v2.0.10 (2021-07-27)
--------------------
- Roll back new hook inclusion (will go into 2.1.0)
- Roll back code cleanups (will go into 2.1.0)
- Fix a grok-fsck regression introduced in 2.0.9
- Fix pull_threads auto-detection on single-cpu systems
v2.0.9 (2021-07-13)
-------------------
- Add initial support for post_clone_complete_hook that fires only after
all new clones have been completed.
- Fix grok-manifest traceback due to unicode errors in the repo
description file.
- Minor code cleanups.
v2.0.8 (2021-03-11)
-------------------
- Fixes around symlink handling in manifest files. Adding and deleting
symlinks should properly work again.
- Don't require [fsck] section in the config file (though you'd almost
always want it there).
v2.0.7 (2021-01-19)
-------------------
- A slew of small fixes improving performance on very large repository
collections (CAF internally is 32,500).
v2.0.6 (2021-01-07)
-------------------
- Use fsck.extra_repack_flags when doing quick post-clone repacks
- Store objects in objstore after grok-dumb-pull call on a repo that uses
objstore repositories
v2.0.5 (2020-11-25)
-------------------
- Prioritize baseline repositories when finding related objstore repos.
- Minor fixes.
v2.0.4 (2020-11-06)
-------------------
- Add support to use git plumbing for objstore operations, via enabling
core.objstore_uses_plumbing. This allows to significantly speed up
fetching objects into objstore during pull operations. Fsck operations
will continue to use porcelain "git fetch", since speed is less important
in those cases and it's best to opt for maximum safety. As a benchmark,
with remote.preload_bundle_url and core.objstore_uses_plumbing settings
enabled, cloning a full replica of git.kernel.org takes less than an hour
as opposed to over a day.
v2.0.3 (2020-11-04)
-------------------
- Refuse to delete ffonly repos
- Add new experimental bundle_preload feature for generating objstore
repo bundles and using them to preload objstores on the mirrors
v2.0.2 (2020-10-06)
-------------------
- Provide pi-piper utility for piping new messages from public-inbox
repositories. It can be specified as post_update_hook:
post_update_hook = /usr/bin/grok-pi-piper -c ~/.config/pi-piper.conf
- Add -r option to grok-manifest to ignore specific refs when calculating
repository fingerprint. This is mostly useful for mirroring from gerrit.
v2.0.1 (2020-09-30)
-------------------
- fix potential corruption when migrating repositories with existing
alternates to new object storage format
- improve grok-fsck console output to be less misleading for large repo
collections (was misreporting obstrepo/total repo numbers)
- use a faster repo search algorithm that doesn't needlessly recurse
into git repos themselves, once found
v2.0.0 (2020-09-21)
-------------------
Major rewrite to improve shared object storage and replication for VERY
LARGE repository collections (codeaurora.org is ~30,000 repositories,
which are mostly various forks of Android).
See UPGRADING.rst for the upgrade strategy.
Below are some major highlights.
- Drop support for python < 3.6
- Introduce "object storage" repositories that benefit from git-pack
delta islands and improve overall disk storage footprint (depending on
the number of forks).
- Drop dependency on GitPython, use git calls directly for all operations
- Remove progress bars to slim down dependencies (drops enlighten)
- Make grok-pull operate in daemon mode (with -o) (see contrib for
systemd unit files). This is more efficient than the cron mode when
run very frequently.
- Provide a socket listener for pubsub push updates (see contrib for
Google pubsubv1.py).
- Merge fsck.conf and repos.conf into a single config file. This
requires creating a new configuration file after the upgrade. See
UPGRADING.rst for details.
- Record and propagate HEAD position using the manifest file.
- Add grok-bundle command to create clone.bundle files for CDN-offloaded
cloning (mostly used by Android's repo command).
- Add SELinux policy for EL7 (see contrib).
v1.2.2 (2019-10-23)
-------------------
- Small bugfixes
- Generate commit-graph file if the version of git is new
enough to support it. This is done during grok-fsck any time we
decide that the repository needs to be repacked. You can force
this off by setting commitgraph=never in config.
v1.2.1 (2019-03-11)
-------------------
- Minor feature improvement changing how precious=yes works.
Grokmirror will now turn preciousObjects off for the duration
of the repack. We still protect shared repositories against
inadvertent object pruning by outside processes, but this
allows us to clean up loose objects and obsolete packs.
To have the 1.2.0 behaviour back, set precious=always, but it
is only really useful in very rare cases.
v1.2.0 (2019-02-14)
-------------------
- Make sure to set gc.auto=0 on repositories to avoid pruning repos
that are acting as alternates to others. We run our own prune
during fsck, so there is no need to auto-gc, ever (unless you
didn't set up grok-fsck, in which case you're not doing it right).
- Rework the repack code to be more clever -- instead of repacking
based purely on dates, we now track the number of loose objects
and the number of generated packs. Many of the settings are
hardcoded for the moment while testing, but will probably end up
settable via global and per-repository config settings.
- The following fsck.conf settings have no further effect:
- repack_flags (replaced with extra_repack_flags)
- full_repack_flags (replaced with extra_repack_flags_full)
- full_repack_every (we now figure it out ourselves)
- Move git command invocation routines into a central function to
reduce the amount of code duplication. You can also set the path
to the git binary using the GITBIN env variable or by simply
adding it to your path.
- Add "reclone_on_errors" setting in fsck.conf. If fsck/repack/prune
comes across a matching error, it will mark the repository for
recloning and it will be cloned anew from the master the next time
grok-pull runs. This is useful for auto-correcting corruption on the
mirrors. You can also manually request a reclone by creating a
"grokmirror.reclone" file in a repository.
- Set extensions.preciousObjects for repositories used with git
alternates if precious=yes is set in fsck.conf. This helps further
protect shared repos from erroneous pruning (e.g. done manually by
an administrator).
v1.1.1 (2018-07-25)
-------------------
- Quickfix a bug that was causing repositories to never be repacked
due to miscalculated fingerprints.
v1.1.0 (2018-04-24)
-------------------
- Make Python3 compatible (thanks to QuLogic for most of the work)
- Rework grok-fsck to improve functionality:
- run repack and prune before fsck, for optimal safety
- add --connectivity flag to run fsck with --connectivity-only
- add --repack-all-quick to trigger a quick repack of all repos
- add --repack-all-full to trigger a full repack of all repositories
using the defined full_repack_flags from fsck.conf
- always run fsck with --no-dangling, because mirror admins are not
responsible for cleaning those up anyway
- no longer locking repos when running repack/prune/fsck, because
these operations are safe as long as they are done by git itself
- fix grok-pull so it no longer purges repos that are providing
alternates to others
- fix grok-fsck so it's more paranoid when pruning repos providing
alternates to others (checks all repos on disk, not just manifest)
- in verbose mode, most commands will draw progress bars (handy with
very large connections of repositories)
grokmirror-2.0.11/LICENSE.txt 0000664 0000000 0000000 00000104513 14103301457 0015615 0 ustar 00root root 0000000 0000000 GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
Copyright (C) 2007 Free Software Foundation, Inc.
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The GNU General Public License is a free, copyleft license for
software and other kinds of works.
The licenses for most software and other practical works are designed
to take away your freedom to share and change the works. By contrast,
the GNU General Public License is intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users. We, the Free Software Foundation, use the
GNU General Public License for most of our software; it applies also to
any other work released this way by its authors. You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.
To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights. Therefore, you have
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must pass on to the recipients the same
freedoms that you received. You must make sure that they, too, receive
or can get the source code. And you must show them these terms so they
know their rights.
Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
giving you legal permission to copy, distribute and/or modify it.
For the developers' and authors' protection, the GPL clearly explains
that there is no warranty for this free software. For both users' and
authors' sake, the GPL requires that modified versions be marked as
changed, so that their problems will not be attributed erroneously to
authors of previous versions.
Some devices are designed to deny users access to install or run
modified versions of the software inside them, although the manufacturer
can do so. This is fundamentally incompatible with the aim of
protecting users' freedom to change the software. The systematic
pattern of such abuse occurs in the area of products for individuals to
use, which is precisely where it is most unacceptable. Therefore, we
have designed this version of the GPL to prohibit the practice for those
products. If such problems arise substantially in other domains, we
stand ready to extend this provision to those domains in future versions
of the GPL, as needed to protect the freedom of users.
Finally, every program is threatened constantly by software patents.
States should not allow patents to restrict development and use of
software on general-purpose computers, but in those that do, we wish to
avoid the special danger that patents applied to a free program could
make it effectively proprietary. To prevent this, the GPL assures that
patents cannot be used to render the program non-free.
The precise terms and conditions for copying, distribution and
modification follow.
TERMS AND CONDITIONS
0. Definitions.
"This License" refers to version 3 of the GNU General Public License.
"Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.
"The Program" refers to any copyrightable work licensed under this
License. Each licensee is addressed as "you". "Licensees" and
"recipients" may be individuals or organizations.
To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy. The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.
A "covered work" means either the unmodified Program or a work based
on the Program.
To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy. Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.
To "convey" a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.
An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License. If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.
1. Source Code.
The "source code" for a work means the preferred form of the work
for making modifications to it. "Object code" means any non-source
form of a work.
A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.
The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form. A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.
The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities. However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work. For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.
The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.
The Corresponding Source for a work in source code form is that
same work.
2. Basic Permissions.
All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met. This License explicitly affirms your unlimited
permission to run the unmodified Program. The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work. This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.
You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force. You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright. Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.
Conveying under any other circumstances is permitted solely under
the conditions stated below. Sublicensing is not allowed; section 10
makes it unnecessary.
3. Protecting Users' Legal Rights From Anti-Circumvention Law.
No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article
11 of the WIPO copyright treaty adopted on 20 December 1996, or
similar laws prohibiting or restricting circumvention of such
measures.
When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention
is effected by exercising rights under this License with respect to
the covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's
users, your or third parties' legal rights to forbid circumvention of
technological measures.
4. Conveying Verbatim Copies.
You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the code;
keep intact all notices of the absence of any warranty; and give all
recipients a copy of this License along with the Program.
You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.
5. Conveying Modified Source Versions.
You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:
a) The work must carry prominent notices stating that you modified
it, and giving a relevant date.
b) The work must carry prominent notices stating that it is
released under this License and any conditions added under section
7. This requirement modifies the requirement in section 4 to
"keep intact all notices".
c) You must license the entire work, as a whole, under this
License to anyone who comes into possession of a copy. This
License will therefore apply, along with any applicable section 7
additional terms, to the whole of the work, and all its parts,
regardless of how they are packaged. This License gives no
permission to license the work in any other way, but it does not
invalidate such permission if you have separately received it.
d) If the work has interactive user interfaces, each must display
Appropriate Legal Notices; however, if the Program has interactive
interfaces that do not display Appropriate Legal Notices, your
work need not make them do so.
A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit. Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.
6. Conveying Non-Source Forms.
You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:
a) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by the
Corresponding Source fixed on a durable physical medium
customarily used for software interchange.
b) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by a
written offer, valid for at least three years and valid for as
long as you offer spare parts or customer support for that product
model, to give anyone who possesses the object code either (1) a
copy of the Corresponding Source for all the software in the
product that is covered by this License, on a durable physical
medium customarily used for software interchange, for a price no
more than your reasonable cost of physically performing this
conveying of source, or (2) access to copy the
Corresponding Source from a network server at no charge.
c) Convey individual copies of the object code with a copy of the
written offer to provide the Corresponding Source. This
alternative is allowed only occasionally and noncommercially, and
only if you received the object code with such an offer, in accord
with subsection 6b.
d) Convey the object code by offering access from a designated
place (gratis or for a charge), and offer equivalent access to the
Corresponding Source in the same way through the same place at no
further charge. You need not require recipients to copy the
Corresponding Source along with the object code. If the place to
copy the object code is a network server, the Corresponding Source
may be on a different server (operated by you or a third party)
that supports equivalent copying facilities, provided you maintain
clear directions next to the object code saying where to find the
Corresponding Source. Regardless of what server hosts the
Corresponding Source, you remain obligated to ensure that it is
available for as long as needed to satisfy these requirements.
e) Convey the object code using peer-to-peer transmission, provided
you inform other peers where the object code and Corresponding
Source of the work are being offered to the general public at no
charge under subsection 6d.
A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
included in conveying the object code work.
A "User Product" is either (1) a "consumer product", which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling. In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage. For a particular
product received by a particular user, "normally used" refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product. A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.
"Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source. The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.
If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as
part of a transaction in which the right of possession and use of the
User Product is transferred to the recipient in perpetuity or for a
fixed term (regardless of how the transaction is characterized), the
Corresponding Source conveyed under this section must be accompanied
by the Installation Information. But this requirement does not apply
if neither you nor any third party retains the ability to install
modified object code on the User Product (for example, the work has
been installed in ROM).
The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed. Access to a
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
protocols for communication across the network.
Corresponding Source conveyed, and Installation Information provided,
in accord with this section must be in a format that is publicly
documented (and with an implementation available to the public in
source code form), and must require no special password or key for
unpacking, reading or copying.
7. Additional Terms.
"Additional permissions" are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall
be treated as though they were included in this License, to the extent
that they are valid under applicable law. If additional permissions
apply only to part of the Program, that part may be used separately
under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.
When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part of
it. (Additional permissions may be written to require their own
removal in certain cases when you modify the work.) You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.
Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:
a) Disclaiming warranty or limiting liability differently from the
terms of sections 15 and 16 of this License; or
b) Requiring preservation of specified reasonable legal notices or
author attributions in that material or in the Appropriate Legal
Notices displayed by works containing it; or
c) Prohibiting misrepresentation of the origin of that material, or
requiring that modified versions of such material be marked in
reasonable ways as different from the original version; or
d) Limiting the use for publicity purposes of names of licensors or
authors of the material; or
e) Declining to grant rights under trademark law for use of some
trade names, trademarks, or service marks; or
f) Requiring indemnification of licensors and authors of that
material by anyone who conveys the material (or modified versions of
it) with contractual assumptions of liability to the recipient, for
any liability that these contractual assumptions directly impose on
those licensors and authors.
All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10. If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further
restriction, you may remove that term. If a license document contains
a further restriction but permits relicensing or conveying under this
License, you may add to a covered work material governed by the terms
of that license document, provided that the further restriction does
not survive such relicensing or conveying.
If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.
Additional terms, permissive or non-permissive, may be stated in the
form of a separately written license, or stated as exceptions;
the above requirements apply either way.
8. Termination.
You may not propagate or modify a covered work except as expressly
provided under this License. Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights under
this License (including any patent licenses granted under the third
paragraph of section 11).
However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the copyright
holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.
Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License. If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.
9. Acceptance Not Required for Having Copies.
You are not required to accept this License in order to receive or
run a copy of the Program. Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission
to receive a copy likewise does not require acceptance. However,
nothing other than this License grants you permission to propagate or
modify any covered work. These actions infringe copyright if you do
not accept this License. Therefore, by modifying or propagating a
covered work, you indicate your acceptance of this License to do so.
10. Automatic Licensing of Downstream Recipients.
Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License. You are not responsible
for enforcing compliance by third parties with this License.
An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations. If propagation of a covered
work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or could
give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if
the predecessor has it or can get it with reasonable efforts.
You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License. For example, you may
not impose a license fee, royalty, or other charge for exercise of
rights granted under this License, and you may not initiate litigation
(including a cross-claim or counterclaim in a lawsuit) alleging that
any patent claim is infringed by making, using, selling, offering for
sale, or importing the Program or any portion of it.
11. Patents.
A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based. The
work thus licensed is called the contributor's "contributor version".
A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner, permitted
by this License, of making, using, or selling its contributor version,
but do not include claims that would be infringed only as a
consequence of further modification of the contributor version. For
purposes of this definition, "control" includes the right to grant
patent sublicenses in a manner consistent with the requirements of
this License.
Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to
make, use, sell, offer for sale, import and otherwise run, modify and
propagate the contents of its contributor version.
In the following three paragraphs, a "patent license" is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement). To "grant" such a patent license to a
party means to make such an agreement or commitment not to enforce a
patent against the party.
If you convey a covered work, knowingly relying on a patent license,
and the Corresponding Source of the work is not available for anyone
to copy, free of charge and under the terms of this License, through a
publicly available network server or other readily accessible means,
then you must either (1) cause the Corresponding Source to be so
available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner
consistent with the requirements of this License, to extend the patent
license to downstream recipients. "Knowingly relying" means you have
actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.
If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate, modify
or convey a specific copy of the covered work, then the patent license
you grant is automatically extended to all recipients of the covered
work and works based on it.
A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that are
specifically granted under this License. You may not convey a covered
work if you are a party to an arrangement with a third party that is
in the business of distributing software, under which you make payment
to the third party based on the extent of your activity of conveying
the work, and under which the third party grants, to any of the
parties who would receive the covered work from you, a discriminatory
patent license (a) in connection with copies of the covered work
conveyed by you (or copies made from those copies), or (b) primarily
for and in connection with specific products or compilations that
contain the covered work, unless you entered into that arrangement,
or that patent license was granted, prior to 28 March 2007.
Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.
12. No Surrender of Others' Freedom.
If conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot convey a
covered work so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you may
not convey it at all. For example, if you agree to terms that obligate you
to collect a royalty for further conveying from those to whom you convey
the Program, the only way you could satisfy both those terms and this
License would be to refrain entirely from conveying the Program.
13. Use with the GNU Affero General Public License.
Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU Affero General Public License into a single
combined work, and to convey the resulting work. The terms of this
License will continue to apply to the part which is the covered work,
but the special requirements of the GNU Affero General Public License,
section 13, concerning interaction through a network will apply to the
combination as such.
14. Revised Versions of this License.
The Free Software Foundation may publish revised and/or new versions of
the GNU General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the
Program specifies that a certain numbered version of the GNU General
Public License "or any later version" applies to it, you have the
option of following the terms and conditions either of that numbered
version or of any later version published by the Free Software
Foundation. If the Program does not specify a version number of the
GNU General Public License, you may choose any version ever published
by the Free Software Foundation.
If the Program specifies that a proxy can decide which future
versions of the GNU General Public License can be used, that proxy's
public statement of acceptance of a version permanently authorizes you
to choose that version for the Program.
Later license versions may give you additional or different
permissions. However, no additional obligations are imposed on any
author or copyright holder as a result of your choosing to follow a
later version.
15. Disclaimer of Warranty.
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. Limitation of Liability.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.
17. Interpretation of Sections 15 and 16.
If the disclaimer of warranty and limitation of liability provided
above cannot be given local legal effect according to their terms,
reviewing courts shall apply local law that most closely approximates
an absolute waiver of all civil liability in connection with the
Program, unless a warranty or assumption of liability accompanies a
copy of the Program in return for a fee.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
Copyright (C)
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see .
Also add information on how to contact you by electronic and paper mail.
If the program does terminal interaction, make it output a short
notice like this when it starts in an interactive mode:
Copyright (C)
This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, your program's commands
might be different; for a GUI interface, you would use an "about box".
You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU GPL, see
.
The GNU General Public License does not permit incorporating your program
into proprietary programs. If your program is a subroutine library, you
may consider it more useful to permit linking proprietary applications with
the library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License. But first, please read
.
grokmirror-2.0.11/MANIFEST.in 0000664 0000000 0000000 00000000061 14103301457 0015521 0 ustar 00root root 0000000 0000000 include LICENSE.txt
include *.rst
include *.conf
grokmirror-2.0.11/README.rst 0000664 0000000 0000000 00000016650 14103301457 0015465 0 ustar 00root root 0000000 0000000 GROKMIRROR
==========
--------------------------------------------
Framework to smartly mirror git repositories
--------------------------------------------
:Author: konstantin@linuxfoundation.org
:Date: 2020-09-18
:Copyright: The Linux Foundation and contributors
:License: GPLv3+
:Version: 2.0.0
DESCRIPTION
-----------
Grokmirror was written to make replicating large git repository
collections more efficient. Grokmirror uses the manifest file published
by the origin server in order to figure out which repositories to clone,
and to track which repositories require updating. The process is
lightweight and efficient both for the primary and for the replicas.
CONCEPTS
--------
The origin server publishes a json-formatted manifest file containing
information about all git repositories that it carries. The format of
the manifest file is as follows::
{
"/path/to/bare/repository.git": {
"description": "Repository description",
"head": "ref: refs/heads/branchname",
"reference": "/path/to/reference/repository.git",
"forkgroup": "forkgroup-guid",
"modified": timestamp,
"fingerprint": sha1sum(git show-ref),
"symlinks": [
"/location/to/symlink",
...
],
}
...
}
The manifest file is usually gzip-compressed to preserve bandwidth.
Each time a commit is made to one of the git repositories, it
automatically updates the manifest file using an appropriate git hook,
so the manifest.js file should always contain the most up-to-date
information about the state of all repositories.
The mirroring clients will poll the manifest.js file and download the
updated manifest if it is newer than the locally stored copy (using
``Last-Modified`` and ``If-Modified-Since`` http headers). After
downloading the updated manifest.js file, the mirrors will parse it to
find out which repositories have been updated and which new repositories
have been added.
Object Storage Repositories
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Grokmirror 2.0 introduces the concept of "object storage repositories",
which aims to optimize how repository forks are stored on disk and
served to the cloning clients.
When grok-fsck runs, it will automatically recognize related
repositories by analyzing their root commits. If it finds two or more
related repositories, it will set up a unified "object storage" repo and
fetch all refs from each related repository into it.
For example, you can have two forks of linux.git:
torvalds/linux.git:
refs/heads/master
refs/tags/v5.0-rc3
...
and its fork:
maintainer/linux.git:
refs/heads/master
refs/heads/devbranch
refs/tags/v5.0-rc3
...
Grok-fsck will set up an object storage repository and fetch all refs from
both repositories:
objstore/[random-guid-name].git
refs/virtual/[sha1-of-torvalds/linux.git:12]/heads/master
refs/virtual/[sha1-of-torvalds/linux.git:12]/tags/v5.0-rc3
...
refs/virtual/[sha1-of-maintainer/linux.git:12]/heads/master
refs/virtual/[sha1-of-maintainer/linux.git:12]/heads/devbranch
refs/virtual/[sha1-of-maintainer/linux.git:12]/tags/v5.0-rc3
...
Then both torvalds/linux.git and maintainer/linux.git with be configured
to use objstore/[random-guid-name].git via objects/info/alternates
and repacked to just contain metadata and no objects.
The alternates repository will be repacked with "delta islands" enabled,
which should help optimize clone operations for each "sibling"
repository.
Please see the example grokmirror.conf for more details about
configuring objstore repositories.
ORIGIN SETUP
------------
Install grokmirror on the origin server using your preferred way.
**IMPORTANT: Only bare git repositories are supported.**
You will need to add a hook to each one of your repositories that would
update the manifest upon repository modification. This can either be a
post-receive hook, or a post-update hook. The hook must call the
following command::
/usr/bin/grok-manifest -m /var/www/html/manifest.js.gz \
-t /var/lib/gitolite3/repositories -n `pwd`
The **-m** flag is the path to the manifest.js file. The git process
must be able to write to it and to the directory the file is in (it
creates a manifest.js.randomstring file first, and then moves it in
place of the old one for atomicity).
The **-t** flag is to help grokmirror trim the irrelevant toplevel disk
path, so it is trimmed from the top.
The **-n** flag tells grokmirror to use the current timestamp instead of
the exact timestamp of the commit (much faster this way).
Before enabling the hook, you will need to generate the manifest.js of
all your git repositories. In order to do that, run the same command,
but omit the -n and the \`pwd\` argument. E.g.::
/usr/bin/grok-manifest -m /var/www/html/manifest.js.gz \
-t /var/lib/gitolite3/repositories
The last component you need to set up is to automatically purge deleted
repositories from the manifest. As this can't be added to a git hook,
you can either run the ``--purge`` command from cron::
/usr/bin/grok-manifest -m /var/www/html/manifest.js.gz \
-t /var/lib/gitolite3/repositories -p
Or add it to your gitolite's ``D`` command using the ``--remove`` flag::
/usr/bin/grok-manifest -m /var/www/html/manifest.js.gz \
-t /var/lib/gitolite3/repositories -x $repo.git
If you would like grok-manifest to honor the ``git-daemon-export-ok``
magic file and only add to the manifest those repositories specifically
marked as exportable, pass the ``--check-export-ok`` flag. See
``git-daemon(1)`` for more info on ``git-daemon-export-ok`` file.
You will need to have some kind of httpd server to serve the manifest
file.
REPLICA SETUP
-------------
Install grokmirror on the replica using your preferred way.
Locate grokmirror.conf and modify it to reflect your needs. The default
configuration file is heavily commented to explain what each option
does.
Make sure the user "mirror" (or whichever user you specified) is able to
write to the toplevel and log locations specified in grokmirror.conf.
You can either run grok-pull manually, from cron, or as a
systemd-managed daemon (see contrib). If you do it more frequently than
once every few hours, you should definitely run it as a daemon in order
to improve performance.
GROK-FSCK
---------
Git repositories should be routinely repacked and checked for
corruption. This utility will perform the necessary optimizations and
report any problems to the email defined via fsck.report_to ('root' by
default). It should run weekly from cron or from the systemd timer (see
contrib).
Please examine the example grokmirror.conf file for various things you
can tweak.
FAQ
---
Why is it called "grok mirror"?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Because it's developed at kernel.org and "grok" is a mirror of "korg".
Also, because it groks git mirroring.
Why not just use rsync?
~~~~~~~~~~~~~~~~~~~~~~~
Rsync is extremely inefficient for the purpose of mirroring git trees
that mostly consist of a lot of small files that very rarely change.
Since rsync must calculate checksums on each file during each run, it
mostly results in a lot of disk thrashing.
Additionally, if several repositories share objects between each-other,
unless the disk paths are exactly the same on both the remote and local
mirror, this will result in broken git repositories.
It is also a bit silly, considering git provides its own extremely
efficient mechanism for specifying what changed between revision X and
revision Y.
grokmirror-2.0.11/UPGRADING.rst 0000664 0000000 0000000 00000015467 14103301457 0016055 0 ustar 00root root 0000000 0000000 Upgrading from Grokmirror 1.x to 2.x
------------------------------------
Grokmirror-2.0 introduced major changes to how repositories are
organized, so it deliberately breaks the upgrade path in order to force
admins to make proper decisions. Installing the newer version on top of
the old one will break replication, as it will refuse to work with old
configuration files.
Manifest compatibility
----------------------
Manifest files generated by grokmirror-1.x will continue to work on
grokmirror-2.x replicas. Similarly, manifest files generated by
grokmirror-2.x origin servers will work on grokmirror-1.x replicas.
In other words, upgrading the origin servers and replicas does not need
to happen at the same time. While grokmirror-2.x adds more entries to
the manifest file (e.g. "forkgroup" and "head" records), they will be
ignored by grokmirror-1.x replicas.
Upgrading the origin server
---------------------------
Breaking changes affecting the origin server are related to grok-fsck
runs. Existing grok-manifest hooks should continue to work without any
changes required.
Grok-fsck will now automatically recognize related repositories by
comparing the output of ``git rev-list --max-parents=0 --all``. When two
or more repositories are recognized as forks of each-other, a new
"object storage" repository will be set up that will contain refs from
all siblings. After that, individual repositories will be repacked to
only contain repository metadata (and loose objects in need of pruning).
Existing repositories that already use alternates will be automatically
migrated to objstore repositories during the first grok-fsck run. If you
have a small collection of repositories, or if the vast majority of them
aren't forks of each-other, then the upgrade can be done live with
little impact.
If the opposite is true and most of your repositories are forks, then
the initial grok-fsck run will take a lot of time and resources to
complete, as repositories will be automatically repacked to take
advantage of the new object storage layout. Doing so without preparation
can significantly impact the availability of your server, so you should
plan the upgrade appropriately.
Recommended scenario for large collections with lots of forks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Set up a temporary system with fast disk IO and plenty of CPUs
and RAM. Repacking will go a lot faster on fast systems with plenty
of IO cycles.
2. Install grokmirror-2 and configure it to replicate from the origin
**INTO THE SAME PATH AS ON THE ORIGIN SERVER**. If your origin server
is hosting repos out of /var/lib/gitolite3/repositories, then your
migration replica should be configured with toplevel in
/var/lib/gitolite3/repositories. This is important, because when the
"alternates" file is created, it specifies a full path to the
location of the object storage directory and moving repositories into
different locations post-migration will result in breakage. *Avoid
using symlinks for this purpose*, as grokmirror-2 will realpath them
before using internally.
3. Perform initial grok-pull replication from the current origin server
to the migration replica. This should set up all repositories
currently using alternates as objstore repositories.
4. Once the initial replication is complete, run grok-fsck on the new
hierarchy. This should properly repack all new object storage
repositories to benefit from delta islands, plus automatically find
all repositories that are forks of each-other but aren't already set
up for alternates. The initial grok-fsck process may take a LONG time
to run, depending on the size of your repository collection.
5. Schedule migration downtime.
6. Right before downtime, run grok-pull to get the latest updates.
7. At the start of downtime, block access to the origin server, so no
pushes are allowed to go through. Run final grok-pull on the
migration replica.
8. Back up your existing hierarchy, because you know you should, or move
it out of the way if you have enough disk space for this.
9. Copy the new hierarchy from the migration replica (e.g. using rsync).
10. Run any necessary steps such as "gitolite setup" in order to set
things up.
11. Rerun grok-manifest on the toplevel in order to generate the fresh
manifest.js.gz file.
12. Create a new grokmirror.conf for fsck runs (grokmirror-1.x
configuration files are purposefully not supported).
13. Enable the grok-fsck timer.
Upgrading the replicas
----------------------
The above procedure should also be considered for upgrading the
replicas, unless you have a small collection that doesn't use a lot of
forks and alternates. You can find out if that is the case by running
``find . -name alternates`` at the top of your mirrored tree. If the
number of returned hits is significant, then the first time grok-fsck
runs, it will spend a lot of time repacking the repositories to benefit
from the new layout. On the upside, you can expect significant storage
use reduction after this conversion is completed.
If your replica is providing continuous access for members of your
development team, then you may want to perform this conversion prior to
upgrading grokmirror on your production server, in order to reduce the
impact on server load. Just follow the instructions from the section
above.
Converting the configuration file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Grokmirror-1.x used two different config files -- one for grok-pull and
another for grok-fsck. This separation only really made sense on the
origin server and was cumbersome for the replicas, since they ended up
duplicating a lot of configuration options between the two config files.
Grokmirror-1.x:
- separate configuration files for grok-pull and grok-fsck
- multiple origin servers can be listed in one file
Grokmirror-2.x:
- one configuration file for all grokmirror tools
- one origin server per configuration file
Grokmirror-2.x will refuse to run with configuration files created for
the previous version, so you will need to create a new configuration
file in order to continue using it after upgrading. Most configuration
options will be familiar to you from version 1.x, and the rest are
documented in the grokmirror.conf file provided with the distribution.
Converting from cron to daemon operation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Grokmirror-1.x expected grok-pull to run from cron, but this had a set
of important limitations. In contrast, grokmirror-2.x is written to run
grok-pull as a daemon. It is strongly recommended to switch away from
cron-based regular runs if you do them more frequently than once every
few hours, as this will result in more efficient operation. See the set
of systemd unit files included in the contrib directory for where to get
started.
Grok-fsck can continue to run from cron if you prefer, or you can run it
from a systemd timer as well.
grokmirror-2.0.11/contrib/ 0000775 0000000 0000000 00000000000 14103301457 0015426 5 ustar 00root root 0000000 0000000 grokmirror-2.0.11/contrib/gitolite/ 0000775 0000000 0000000 00000000000 14103301457 0017246 5 ustar 00root root 0000000 0000000 grokmirror-2.0.11/contrib/gitolite/get-grok-manifest.command 0000664 0000000 0000000 00000001725 14103301457 0024136 0 ustar 00root root 0000000 0000000 #!/bin/bash
# This is a command to install in gitolite's local-code.
# Don't forget to enable it via .gitolite.rc
#
# Change this to where grok-manifest is writing manifest.js
MANIFILE="/var/www/html/grokmirror/manifest.js.gz"
if [[ -z "$GL_USER" ]]; then
echo "ERROR: GL_USER is unset. Run me via ssh, please."
exit 1
fi
# Make sure we only accept credential replication from the mirrors
for MIRROR in $(GL_USER='' gitolite mirror list slaves gitolite-admin); do
if [[ $GL_USER == "server-${MIRROR}" ]]; then
AOK="yes"
break
fi
done
if [[ -z "$AOK" ]]; then
echo "You are not allowed to do this"
exit 1
fi
if [[ ! -s $MANIFILE ]]; then
echo "Manifest file not found"
exit 1
fi
R_LASTMOD=$1
if [[ -z "$R_LASTMOD" ]]; then
R_LASTMOD=0
fi
L_LASTMOD=$(stat --printf='%Y' $MANIFILE)
if [[ $L_LASTMOD -le $R_LASTMOD ]]; then
exit 127
fi
if [[ $MANIFILE == *.gz ]]; then
zcat $MANIFILE
else
cat $MANIFILE
fi
exit 0
grokmirror-2.0.11/contrib/gitolite/grok-get-gl-manifest.sh 0000664 0000000 0000000 00000001073 14103301457 0023526 0 ustar 00root root 0000000 0000000 #!/bin/bash
# This is executed by grok-pull if manifest_command is defined.
# You should install the other file as one of your commands in local-code
# and enable it in .gitolite.rc
PRIMARY=$(gitolite mirror list master gitolite-admin)
STATEFILE="$(gitolite query-rc GL_ADMIN_BASE)/.${PRIMARY}.manifest.lastupd"
GL_COMMAND=get-grok-manifest
if [[ -s $STATEFILE ]] && [[ $1 != '--force' ]]; then
LASTUPD=$(cat $STATEFILE)
fi
NOWSTAMP=$(date +'%s')
ssh $PRIMARY $GL_COMMAND $LASTUPD
ECODE=$?
if [[ $ECODE == 0 ]]; then
echo $NOWSTAMP > $STATEFILE
fi
exit $ECODE
grokmirror-2.0.11/contrib/grok-fsck@.service 0000664 0000000 0000000 00000001060 14103301457 0020773 0 ustar 00root root 0000000 0000000 [Unit]
Description=Grok-fsck service for %I
Documentation=https://github.com/mricon/grokmirror
[Service]
Type=oneshot
Environment="EXTRA_FSCK_OPTS="
EnvironmentFile=-/etc/sysconfig/grokmirror.default
EnvironmentFile=-/etc/sysconfig/grokmirror.%i
ExecStart=/usr/bin/grok-fsck -c /etc/grokmirror/%i.conf $EXTRA_FSCK_OPTS
CPUSchedulingPolicy=batch
# To override these users, create a drop-in systemd conf file in
# /etc/systemd/system/grok-fsck@[foo].service.d/10-usergroup.conf:
# [Service]
# User=yourpreference
# Group=yourpreference
User=mirror
Group=mirror
grokmirror-2.0.11/contrib/grok-fsck@.timer 0000664 0000000 0000000 00000000235 14103301457 0020456 0 ustar 00root root 0000000 0000000 [Unit]
Description=Grok-fsck timer for %I
Documentation=https://github.com/mricon/grokmirror
[Timer]
OnCalendar=Sat 04:00
[Install]
WantedBy=timers.target
grokmirror-2.0.11/contrib/grok-pull@.service 0000664 0000000 0000000 00000001146 14103301457 0021026 0 ustar 00root root 0000000 0000000 [Unit]
Description=Grok-pull service for %I
After=network.target
Documentation=https://github.com/mricon/grokmirror
[Service]
Environment="EXTRA_PULL_OPTS="
EnvironmentFile=-/etc/sysconfig/grokmirror.default
EnvironmentFile=-/etc/sysconfig/grokmirror.%i
ExecStart=/usr/bin/grok-pull -o -c /etc/grokmirror/%i.conf $EXTRA_PULL_OPTS
Type=simple
Restart=on-failure
# To override these users, create a drop-in systemd conf file in
# /etc/systemd/system/grok-pull@[foo].service.d/10-usergroup.conf:
# [Service]
# User=yourpreference
# Group=yourpreference
User=mirror
Group=mirror
[Install]
WantedBy=multi-user.target
grokmirror-2.0.11/contrib/logrotate 0000664 0000000 0000000 00000000115 14103301457 0017346 0 ustar 00root root 0000000 0000000 /var/log/grokmirror/*.log {
missingok
notifempty
delaycompress
}
grokmirror-2.0.11/contrib/pubsubv1.py 0000664 0000000 0000000 00000007162 14103301457 0017555 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python3
# Implements a Google pubsub v1 push listener, see:
# https://cloud.google.com/pubsub/docs/push
#
# In order to work, grok-pull must be running as a daemon service with
# the "socket" option enabled in the configuration.
#
# The pubsub message should contain two attributes:
# {
# "message": {
# "attributes": {
# "proj": "projname",
# "repo": "/path/to/repo.git"
# }
# }
# }
#
# "proj" value should map to a "$proj.conf" file in /etc/grokmirror
# (you can override that default via the GROKMIRROR_CONFIG_DIR env var).
# "repo" value should match a repo defined in the manifest file as understood
# by the running grok-pull daemon (it will ignore anything else)
#
# Any other attributes or the "data" field are ignored.
import falcon
import json
import os
import socket
import re
from configparser import ConfigParser, ExtendedInterpolation
# Some sanity defaults
MAX_PROJ_LEN = 32
MAX_REPO_LEN = 1024
# noinspection PyBroadException
class PubsubListener(object):
def on_get(self, req, resp):
resp.status = falcon.HTTP_200
resp.body = "We don't serve GETs here\n"
def on_post(self, req, resp):
if not req.content_length:
resp.status = falcon.HTTP_500
resp.body = 'Payload required\n'
return
try:
doc = json.load(req.stream)
except:
resp.status = falcon.HTTP_500
resp.body = 'Failed to parse payload as json\n'
return
try:
proj = doc['message']['attributes']['proj']
repo = doc['message']['attributes']['repo']
except (KeyError, TypeError):
resp.status = falcon.HTTP_500
resp.body = 'Not a pubsub v1 payload\n'
return
if len(proj) > MAX_PROJ_LEN or len(repo) > MAX_REPO_LEN:
resp.status = falcon.HTTP_500
resp.body = 'Repo or project value too long\n'
return
# Proj shouldn't contain slashes or whitespace
if re.search(r'[\s/]', proj):
resp.status = falcon.HTTP_500
resp.body = 'Invalid characters in project name\n'
return
# Repo shouldn't contain whitespace
if re.search(r'\s', proj):
resp.status = falcon.HTTP_500
resp.body = 'Invalid characters in repo name\n'
return
confdir = os.environ.get('GROKMIRROR_CONFIG_DIR', '/etc/grokmirror')
cfgfile = os.path.join(confdir, '{}.conf'.format(proj))
if not os.access(cfgfile, os.R_OK):
resp.status = falcon.HTTP_500
resp.body = 'Invalid project name\n'
return
config = ConfigParser(interpolation=ExtendedInterpolation())
config.read(cfgfile)
if 'pull' not in config or not config['pull'].get('socket'):
resp.status = falcon.HTTP_500
resp.body = 'Invalid project configuration (no socket defined)\n'
return
sockfile = config['pull'].get('socket')
if not os.access(sockfile, os.W_OK):
resp.status = falcon.HTTP_500
resp.body = 'Invalid project configuration (socket does not exist or is not writable)\n'
return
try:
with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
client.connect(sockfile)
client.send(repo.encode())
except:
resp.status = falcon.HTTP_500
resp.body = 'Unable to communicate with the socket\n'
return
resp.status = falcon.HTTP_204
app = falcon.API()
pl = PubsubListener()
app.add_route('/pubsub_v1', pl)
grokmirror-2.0.11/contrib/python-grokmirror.spec 0000664 0000000 0000000 00000007736 14103301457 0022033 0 ustar 00root root 0000000 0000000 %global srcname grokmirror
%global groupname mirror
%global username mirror
%global userhome %{_sharedstatedir}/grokmirror
Name: python-%{srcname}
Version: 2.0.8
Release: 1%{?dist}
Summary: Framework to smartly mirror git repositories
License: GPLv3+
URL: https://git.kernel.org/pub/scm/utils/grokmirror/grokmirror.git
Source0: https://www.kernel.org/pub/software/network/grokmirror/grokmirror-%{version}.tar.xz
BuildArch: noarch
%global _description %{expand:
Grokmirror was written to make mirroring large git repository
collections more efficient. Grokmirror uses the manifest file published
by the master mirror in order to figure out which repositories to
clone, and to track which repositories require updating. The process is
extremely lightweight and efficient both for the master and for the
mirrors.}
%description %_description
%package -n python3-%{srcname}
Summary: %{summary}
Requires(pre): shadow-utils
Requires: git-core, python3-packaging, python3-requests
BuildRequires: python3-devel, python3-setuptools
BuildRequires: systemd
Obsoletes: python-%{srcname} < 2, python2-%{srcname} < 2
%description -n python3-%{srcname} %_description
%prep
%autosetup -n %{srcname}-%{version}
%build
%py3_build
%install
%py3_install
%{__mkdir_p} -m 0755 \
%{buildroot}%{userhome} \
%{buildroot}%{_sysconfdir}/%{srcname} \
%{buildroot}%{_sysconfdir}/logrotate.d \
%{buildroot}%{_unitdir} \
%{buildroot}%{_bindir} \
%{buildroot}%{_tmpfilesdir} \
%{buildroot}%{_mandir}/man1 \
%{buildroot}%{_localstatedir}/log/%{srcname} \
%{buildroot}/run/%{srcname}
%{__install} -m 0644 man/*.1 %{buildroot}/%{_mandir}/man1/
%{__install} -m 0644 contrib/*.service %{buildroot}/%{_unitdir}/
%{__install} -m 0644 contrib/*.timer %{buildroot}/%{_unitdir}/
%{__install} -m 0644 contrib/logrotate %{buildroot}/%{_sysconfdir}/logrotate.d/grokmirror
%{__install} -m 0644 grokmirror.conf %{buildroot}/%{_sysconfdir}/%{srcname}/grokmirror.conf.example
echo "d /run/%{srcname} 0755 %{username} %{groupname}" > %{buildroot}/%{_tmpfilesdir}/%{srcname}.conf
%pre -n python3-%{srcname}
getent group %{groupname} >/dev/null || groupadd -r %{groupname}
getent passwd %{username} >/dev/null || \
useradd -r -g %{groupname} -d %{userhome} -s /sbin/nologin \
-c "Grokmirror user" %{username}
exit 0
%files -n python3-%{srcname}
%license LICENSE.txt
%doc README.rst grokmirror.conf pi-piper.conf
%dir %attr(0750, %{username}, %{groupname}) %{userhome}
%dir %attr(0755, %{username}, %{groupname}) %{_localstatedir}/log/%{srcname}/
%dir %attr(0755, %{username}, %{groupname}) /run/%{srcname}/
%config %{_sysconfdir}/%{srcname}/*
%config %{_sysconfdir}/logrotate.d/*
%{_tmpfilesdir}/%{srcname}.conf
%{_unitdir}/*
%{python3_sitelib}/%{srcname}-*.egg-info/
%{python3_sitelib}/%{srcname}/
%{_bindir}/*
%{_mandir}/*/*
%changelog
* Thu Mar 11 2021 Konstantin Ryabitsev - 2.0.8-1
- Update to 2.0.8 with fixes to symlink handling in manifests
* Tue Jan 19 2021 Konstantin Ryabitsev - 2.0.7-1
- Update to 2.0.7 with improvements for very large repo collections
* Thu Jan 07 2021 Konstantin Ryabitsev - 2.0.6-1
- Update to 2.0.6 with minor new features
* Wed Nov 25 2020 Konstantin Ryabitsev - 2.0.5-1
- Update to 2.0.5 with minor new features
* Wed Nov 04 2020 Konstantin Ryabitsev - 2.0.4-1
- Update to 2.0.4 with minor new features
* Wed Nov 04 2020 Konstantin Ryabitsev - 2.0.3-1
- Update to 2.0.3 with minor new features
* Tue Oct 06 2020 Konstantin Ryabitsev - 2.0.2-1
- Update to 2.0.2
- Install pi-piper into bindir
* Wed Sep 30 2020 Konstantin Ryabitsev - 2.0.1-1
- Update to 2.0.1
* Mon Sep 21 2020 Konstantin Ryabitsev - 2.0.0-1
- Initial 2.0.0 packaging
grokmirror-2.0.11/contrib/ref-updated 0000664 0000000 0000000 00000002050 14103301457 0017546 0 ustar 00root root 0000000 0000000 #!/bin/bash
# Gerrit's hook system is very different from standard git, so
# minor modifications to the hook are required to make it work.
# Place this file in your gerrit/hooks/ref-updated and modify the
# variables below to make it work for you.
GERRIT_HOME=/var/lib/gerrit
GERRIT_GIT=/srv/gerrit/git
GROK_MANIFEST_BIN=/usr/bin/grok-manifest
GROK_MANIFEST_LOG=${GERRIT_HOME}/logs/grok-manifest.log
# You'll need to place this where you can serve it with httpd
# Make sure the gerrit process can write to this location
GROK_MANIFEST=/var/www/html/grokmirror/manifest.js.gz
# Yank out the project out of the passed params
args=$(getopt -l "project:" -- "$@")
eval set -- "$args"
while [ $# -ge 1 ]; do
case "$1" in
--)
# No more options left.
shift
break
;;
--project)
project="$2"
shift
;;
esac
shift
done
${GROK_MANIFEST_BIN} -y -w -l ${GROK_MANIFEST_LOG} \
-m ${GROK_MANIFEST} \
-t ${GERRIT_GIT} \
-n "${GERRIT_GIT}/${project}.git"
grokmirror-2.0.11/contrib/selinux/ 0000775 0000000 0000000 00000000000 14103301457 0017115 5 ustar 00root root 0000000 0000000 grokmirror-2.0.11/contrib/selinux/el7/ 0000775 0000000 0000000 00000000000 14103301457 0017604 5 ustar 00root root 0000000 0000000 grokmirror-2.0.11/contrib/selinux/el7/grokmirror.fc 0000664 0000000 0000000 00000000512 14103301457 0022311 0 ustar 00root root 0000000 0000000 /usr/bin/grok-.* -- gen_context(system_u:object_r:grokmirror_exec_t,s0)
/var/lib/grokmirror(/.*)? gen_context(system_u:object_r:grokmirror_var_lib_t,s0)
/var/run/grokmirror(/.*)? gen_context(system_u:object_r:grokmirror_var_run_t,s0)
/var/log/grokmirror(/.*)? gen_context(system_u:object_r:grokmirror_log_t,s0)
grokmirror-2.0.11/contrib/selinux/el7/grokmirror.te 0000664 0000000 0000000 00000010717 14103301457 0022341 0 ustar 00root root 0000000 0000000 ##################
# Author: Konstantin Ryabitsev
#
policy_module(grokmirror, 1.1.1)
require {
type gitosis_var_lib_t;
type git_sys_content_t;
type net_conf_t;
type httpd_t;
type ssh_home_t;
type passwd_file_t;
type postfix_etc_t;
}
##################
# Declarations
type grokmirror_t;
type grokmirror_exec_t;
init_daemon_domain(grokmirror_t, grokmirror_exec_t)
type grokmirror_var_lib_t;
files_type(grokmirror_var_lib_t)
type grokmirror_log_t;
logging_log_file(grokmirror_log_t)
type grokmirror_var_run_t;
files_pid_file(grokmirror_var_run_t)
type grokmirror_tmpfs_t;
files_tmpfs_file(grokmirror_tmpfs_t)
gen_tunable(grokmirror_connect_ssh, false)
gen_tunable(grokmirror_connect_all_unreserved, false)
# Uncomment to put these domains into permissive mode
permissive grokmirror_t;
##################
# Daemons policy
domain_use_interactive_fds(grokmirror_t)
files_read_etc_files(grokmirror_t)
miscfiles_read_localization(grokmirror_t)
# Logging
append_files_pattern(grokmirror_t, grokmirror_log_t, grokmirror_log_t)
create_files_pattern(grokmirror_t, grokmirror_log_t, grokmirror_log_t)
setattr_files_pattern(grokmirror_t, grokmirror_log_t, grokmirror_log_t)
logging_log_filetrans(grokmirror_t, grokmirror_log_t, { file dir })
logging_send_syslog_msg(grokmirror_t)
# Allow managing anything grokmirror_var_lib_t
manage_dirs_pattern(grokmirror_t, grokmirror_var_lib_t, grokmirror_var_lib_t)
manage_files_pattern(grokmirror_t, grokmirror_var_lib_t, grokmirror_var_lib_t)
manage_lnk_files_pattern(grokmirror_t, grokmirror_var_lib_t, grokmirror_var_lib_t)
manage_sock_files_pattern(grokmirror_t, grokmirror_var_lib_t, grokmirror_var_lib_t)
# Allow managing git repositories
manage_files_pattern(grokmirror_t, gitosis_var_lib_t, gitosis_var_lib_t)
manage_lnk_files_pattern(grokmirror_t, gitosis_var_lib_t, gitosis_var_lib_t)
manage_dirs_pattern(grokmirror_t, gitosis_var_lib_t, gitosis_var_lib_t)
manage_sock_files_pattern(grokmirror_t, gitosis_var_lib_t, gitosis_var_lib_t)
manage_files_pattern(grokmirror_t, git_sys_content_t, git_sys_content_t)
manage_lnk_files_pattern(grokmirror_t, git_sys_content_t, git_sys_content_t)
manage_dirs_pattern(grokmirror_t, git_sys_content_t, git_sys_content_t)
manage_sock_files_pattern(grokmirror_t, git_sys_content_t, git_sys_content_t)
# Allow executing bin (for git, mostly)
corecmd_exec_bin(grokmirror_t)
libs_exec_ldconfig(grokmirror_t)
# Allow managing httpd content in case the manifest is stored there
apache_manage_sys_content(grokmirror_t)
# git wants to access system state and other bits
kernel_dontaudit_read_system_state(grokmirror_t)
# Allow connecting to http, git
corenet_tcp_connect_http_port(grokmirror_t)
corenet_tcp_connect_git_port(grokmirror_t)
corenet_tcp_bind_generic_node(grokmirror_t)
corenet_tcp_sendrecv_generic_node(grokmirror_t)
# git needs to dns-resolve
sysnet_dns_name_resolve(grokmirror_t)
# Allow reading .netrc files
read_files_pattern(grokmirror_t, net_conf_t, net_conf_t)
# Post-hooks can use grep, which requires execmem
allow grokmirror_t self:process execmem;
fs_getattr_tmpfs(grokmirror_t)
manage_files_pattern(grokmirror_t, grokmirror_tmpfs_t, grokmirror_tmpfs_t)
fs_tmpfs_filetrans(grokmirror_t, grokmirror_tmpfs_t, file)
# Listener socket file
manage_dirs_pattern(grokmirror_t, grokmirror_var_run_t, grokmirror_var_run_t)
manage_files_pattern(grokmirror_t, grokmirror_var_run_t, grokmirror_var_run_t)
manage_sock_files_pattern(grokmirror_t, grokmirror_var_run_t, grokmirror_var_run_t)
files_pid_filetrans(grokmirror_t, grokmirror_var_run_t, { dir file sock_file })
# allow httpd to write to the listener socket
allow httpd_t grokmirror_t:unix_stream_socket connectto;
# Some bogus dontaudits
# ssh tries to open /etc/mailname, which the postfix module labels oddly
dontaudit grokmirror_t postfix_etc_t:file { getattr open read };
tunable_policy(`grokmirror_connect_all_unreserved',`
corenet_sendrecv_all_client_packets(grokmirror_t)
corenet_tcp_connect_all_unreserved_ports(grokmirror_t)
')
tunable_policy(`grokmirror_connect_ssh',`
corenet_sendrecv_ssh_client_packets(grokmirror_t)
corenet_tcp_connect_ssh_port(grokmirror_t)
corenet_tcp_sendrecv_ssh_port(grokmirror_t)
ssh_exec(grokmirror_t)
ssh_read_user_home_files(grokmirror_t)
# for the controlmaster socket
manage_sock_files_pattern(grokmirror_t, ssh_home_t, ssh_home_t)
allow grokmirror_t self:unix_stream_socket connectto;
allow grokmirror_t passwd_file_t:file { getattr open read };
')
grokmirror-2.0.11/grokmirror.conf 0000664 0000000 0000000 00000034001 14103301457 0017030 0 ustar 00root root 0000000 0000000 # Grokmirror 2.x and above have a single config file per each set
# of mirrored repos, instead of a separate repos.conf and fsck.conf
# with multiple sections.
#
# You can use ${varname} interpolation within the same section
# or ${sectname:varname} from any other section.
[core]
#
# Where are our mirrored repositories kept?
toplevel = /var/lib/git/mirror
#
# Where should we keep our manifest file?
manifest = ${toplevel}/manifest.js.gz
#
# Where should we put our log? Make sure it is logrotated,
# otherwise it will grow indefinitely.
log = ${toplevel}/log
#
# Options are "info" and "debug" for all the debug data (lots!)
loglevel = info
#
# Grokmirror version 2.x and above can automatically recognize related repositories
# by analyzing root commits. If it finds two or more related repositories, it can set
# up a unified "object storage" repo and fetch all refs from each related repository.
# For example, you can have two forks of linux.git:
# foo/bar/linux.git:
# refs/heads/master
# refs/heads/devbranch
# refs/tags/v5.0-rc3
# ...
# baz/quux/linux.git:
# refs/heads/master
# refs/heads/devbranch
# refs/tags/v5.0-rc3
# ...
# Grokmirror will set up an object storage repository and fetch all refs from
# both repositories:
# objstore/[random-guid-name].git
# refs/virtual/[sha1-of-foo/bar/linux.git:12]/heads/master
# refs/virtual/[sha1-of-foo/bar/linux.git:12]/heads/devbranch
# refs/virtual/[sha1-of-foo/bar/linux.git:12]/tags/v5.0-rc3
# ...
# refs/virtual/[sha1-of-baz/quux/linux.git:12]/heads/master
# refs/virtual/[sha1-of-baz/quux/linux.git:12]/heads/devbranch
# refs/virtual/[sha1-of-baz/quux/linux.git:12]/tags/v5.0-rc3
# ...
#
# This will dramatically improve storage on disk, as original repositories will be
# repacked to almost nothing. Grokmirror will repack the object storage repository
# with --delta-islands to help optimize packs for efficient clones.
objstore = ${toplevel}/objstore
#
# When copying objects into objstore repositories, we will use regular git
# porcelain commands, such as git fetch. However, this tends to be slow due to
# git erring on the side of caution when calculating haves and wants, so if you
# are running a busy mirror and want to save a lot of cycles, you will want to
# enable the setting below, which will use internal git plumbing for much more
# direct object copying between repos.
#objstore_uses_plumbing = yes
#
# Due to the nature of git alternates, if two repositories share all their objects
# with an "object storage" repo, any object from repoA can be retrieved from repoB
# via most web UIs if someone knows the object hash.
# E.g. this is how this trick works on Github:
# https://github.com/torvalds/linux/blob/b4061a10fc29010a610ff2b5b20160d7335e69bf/drivers/hid/hid-samsung.c#L113-L118
#
# If you have private repositories that should absolutely not reveal any objects,
# add them here using shell-style globbing. They will still be set up for alternates
# if we find common roots with public repositories, but we won't fetch any objects
# from these repos into refs/virtual/*.
#
# Leave blank if you don't have any private repos (or don't offer a web UI).
#private = */private/*
# Used by grok-manifest (and others for "pretty"). These options can be
# overridden using matching command-line switches to grok-manifest.
[manifest]
# Enable to save pretty-printed js (larger and slower, but easier to debug)
pretty = no
# List of repositories to ignore -- can take multiple entries with newline+tab
# and accepts shell globbing.
ignore = /testing/*
/private/*
# Enable to fetch objects into objstore repos after commit. This can be useful if
# someone tries to push the same objects to a sibling repository, but may significantly
# slow down post-commit hook operation, negating any speed gains. If set to no, the
# objects will be fetched during regular grok-fsck runs.
fetch_objstore = no
# Only include repositories that have git-daemon-export-ok.
check_export_ok = no
# Used by grok-pull, mostly
[remote]
# The host part of the mirror you're pulling from.
site = https://git.kernel.org
#
# Where the grok manifest is published. The following protocols
# are supported at this time:
# http:// or https:// using If-Modified-Since http header
# file:// (when manifest file is on NFS, for example)
# NB: You can no longer specify username:password as part of the URL with
# grokmirror 2.x and above. You can use a netrc file for this purpose.
manifest = ${site}/manifest.js.gz
#
# As an alternative to setting a manifest URL, you can define a manifest_command.
# It has three possible outcomes:
# exit code 0 + full remote manifest on stdout (must be valid json)
# exit code 1 + error message on stdout
# exit code 127 + nothing on stdout if remote manifest hasn't changed
# It should also accept '--force' as a single argument to force manifest retrieval
# even if it hasn't changed.
# See contrib/gitolite/* for example commands to use with gitolite.
#manifest_command = /usr/local/bin/grok-get-gl-manifest.sh
#
# If the remote is providing pre-generated preload bundles, list the path
# here. This is only useful if you're mirroring the entire repository
# collection and not just a handful of select repos.
#preload_bundle_url = https://some-cdn-site.com/preload/
# Used by grok-pull
[pull]
#
# Write out projects.list that can be used by gitweb or cgit.
# Leave blank if you don't want a projects.list.
projectslist = ${core:toplevel}/projects.list
#
# When generating projects.list, start at this subpath instead
# of at the toplevel. Useful when mirroring kernel or when generating
# multiple gitweb/cgit configurations for the same tree.
projectslist_trimtop =
#
# When generating projects.list, also create entries for symlinks.
# Otherwise we assume they are just legacy and keep them out of
# web interfaces.
projectslist_symlinks = no
#
# A simple hook to execute whenever a repository is modified.
# It passes the full path to the git repository modified as the final
# argument. You can define multiple hooks if you separate them by
# newline+whitespace.
post_update_hook =
#
# Should we purge repositories that are not present in the remote
# manifest? If set to "no" this can be overridden via the -p flag to
# grok-pull (useful if you have a very large collection of repos
# and don't want to walk the entire tree on each manifest run).
# See also: purgeprotect.
purge = yes
#
# There may be repositories that aren't replicated with grokmirror that
# you don't want to be purged. You can list them below using bash-style
# globbing. Separate multiple entries using newline+whitespace.
#nopurge = /gitolite-admin.git
#
# This prevents catastrophic mirror purges when our upstream gives us a
# manifest that is dramatically smaller than ours. The default is to
# refuse the purge if the remote manifest has over 5% fewer repositories
# than what we have, or in other words, if we have 100 repos and the
# remote manifest has shrunk to 95 repos or fewer, we refuse to purge,
# suspecting that something has gone wrong. You can set purgeprotect to
# a higher percentage, or override it entirely with --force-purge
# commandline flag.
purgeprotect = 5
#
# If owner is not specified in the manifest, who should be listed
# as the default owner in tools like gitweb or cgit?
#default_owner = Grokmirror User
default_owner = Grokmirror User
#
# By default, we'll call the upstream origin "_grokmirror", but you can set your
# own name here (e.g. just call it "origin")
remotename = _grokmirror
#
# To speed up updates, grok-pull will use multiple threads. Please be
# considerate to the mirror you're pulling from and don't set this very
# high. You may also run into per-ip multiple session limits, so leave
# this number at a nice low setting.
pull_threads = 5
#
# If git fetch fails, we will retry up to this many times before
# giving up and marking that repository as failed.
retries = 3
#
# Use shell-globbing to list the repositories you would like to mirror.
# If you want to mirror everything, just say "*". Separate multiple entries
# with newline plus tab. Examples:
#
# mirror everything:
#include = *
#
# mirror just the main kernel sources:
#include = /pub/scm/linux/kernel/git/torvalds/linux.git
# /pub/scm/linux/kernel/git/stable/linux.git
# /pub/scm/linux/kernel/git/next/linux-next.git
include = *
#
# This is processed after the include. If you want to exclude some
# specific entries from an all-inclusive globbing above. E.g., to
# exclude all linux-2.4 git sources:
#exclude = */linux-2.4*
exclude =
#
# List repositories that should always reject forced pushes.
#ffonly = */torvalds/linux.git
#
# If you enable the following option and run grok-pull with -o,
# grok-pull will run continuously and will periodically recheck the
# remote maniefest for new updates. See contrib for an example systemd
# service you can set up to continuously update your local mirror. The
# value is in seconds.
#refresh = 900
#
# If you enable refresh, you can also enable the socket listener that
# allows for rapid push notifications from your primary mirror. The
# socket expects repository names matching what is in the local
# manifest, followed by a newline. E.g.:
# /pub/scm/linux/kernel/git/torvalds/linux.git\n
#
# Anything not matching a repository in the local manifest will be ignored.
# See contrib for example pubsub listener.
#socket = ${core:toplevel}/.updater.socket
# Used by grok-fsck
[fsck]
#
# How often should we check each repository, in days. Any newly added
# repository will have the first check within a random period of 0 and
# $frequency, and then every $frequency after that, to assure that not
# all repositories are checked on the same day. Don't set to less than
# 7 unless you only mirror a few repositories (or really like to thrash
# your disks).
frequency = 30
#
# Where to keep the status file
statusfile = ${core:toplevel}/fsck.status.js
#
# Some errors are relatively benign and can be safely ignored. Add
# matching substrings to this field to ignore them.
ignore_errors = notice:
warning: disabling bitmap writing
ignoring extra bitmap file
missingTaggerEntry
missingSpaceBeforeDate
#
# If the fsck process finds errors that match any of these strings
# during its run, it will ask grok-pull to reclone this repository when
# it runs next. Only useful for minion mirrors, not for mirror masters.
reclone_on_errors = fatal: bad tree object
fatal: Failed to traverse parents
missing commit
missing blob
missing tree
broken link
#
# Should we repack the repositories? You almost always want this on,
# unless you are doing something really odd.
repack = yes
#
# We set proper flags for repacking depending if the repo is using
# alternates or not, and whether this is a full repack or not. We will
# also always build bitmaps (when it makes sense), to make cloning
# faster. You can add other flags (e.g. --threads and --window-memory)
# via the following parameter:
extra_repack_flags =
#
# These flags are added *in addition* to extra_repack_flags
extra_repack_flags_full = --window=250 --depth=50
#
# If git version is new enough to support generating commit graphs, we
# will always generate them, though if your git version is older than
# 2.24.0, the graphs won't be automatically used unless core.commitgraph
# is set to true. You can turn off graph generation by setting the
# commitgraph option to "no". Graph generation will be skipped for
# child repos that use alternates.
commitgraph = yes
#
# Run git-prune to remove obsolete loose objects. Grokmirror will make
# sure this is a safe operation when it comes to objstore repos, so you
# should leave this enabled.
prune = yes
#
# Grokmirror is extremely careful about not pruning the repositories
# that are used by others via git alternates. However, it cannot prevent
# some other git process (not grokmirror-managed) from inadvertently
# running "git prune/gc". For example, this may happen if an admin
# mistypes a command in the wrong directory. Setting precious=yes will
# add extensions.preciousObjects=true to the git configuration file in
# such repositories, which will help prevent repository corruption
# between grok-fsck runs.
#
# When set to "yes", grokmirror will temporarily turn this feature off
# when running scheduled repacks in order to be able to delete redundant
# packs and loose objects that have already been packed. This is usually
# a safe operation when done by grok-fsck itself. However, if you set
# this to "always", grokmirror will leave this enabled even during
# grok-fsck runs, for maximum paranoia. Be warned, that this will result
# in ever-growing git repositories, so it only makes sense in very rare
# situations, such as for backup purposes.
precious = yes
#
# If you have a lot of forks using the same objstore repo, you may end
# up with thousands of refs being negotiated during each remote update.
# This tends to result in higher load and bigger negotiation transfers.
# Setting the "baselines" option allows you to designate a set of repos
# that are likely to have most of the relevant objects and ignore the
# rest of the objstore refs. This is done using the
# core.alternateRefsPrefixes feature (see git-config).
baselines = */kernel/git/next/linux-next.git
#
# Objstore repos are repacked with delta island support (see man
# git-config), but if you have one repo that is a lot more likely to be
# cloned than all the other ones, you can designate it as "islandCore",
# which will give it priority when creating packs.
islandcores = */kernel/git/torvalds/linux.git
#
# Generate preload bundles for objstore repos and put them into this
# location. Unless you are running a major mirroring hub site, you
# do not want this enabled. See corresponding preload_bundle_url
# entry in the [remote] section.
#preload_bundle_outdir = /some/http/accessible/path
#
# If there are any critical errors, the report will be sent to root. You
# can change the settings below to configure report delivery to suit
# your needs:
#report_to = root
#report_from = root
#report_subject = git fsck errors on my beautiful replica
#report_mailhost = localhost
grokmirror-2.0.11/grokmirror/ 0000775 0000000 0000000 00000000000 14103301457 0016163 5 ustar 00root root 0000000 0000000 grokmirror-2.0.11/grokmirror/__init__.py 0000664 0000000 0000000 00000112007 14103301457 0020275 0 ustar 00root root 0000000 0000000 # -*- coding: utf-8 -*-
# Copyright (C) 2013-2020 by The Linux Foundation and contributors
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see .
import os
import sys
import time
import json
import fnmatch
import subprocess
import requests
import logging
import logging.handlers
import hashlib
import pathlib
import uuid
import tempfile
import shutil
import gzip
import datetime
from fcntl import lockf, LOCK_EX, LOCK_UN, LOCK_NB
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry
VERSION = '2.0.11'
MANIFEST_LOCKH = None
REPO_LOCKH = dict()
GITBIN = '/usr/bin/git'
# default logger. Will be overridden.
logger = logging.getLogger(__name__)
_alt_repo_map = None
# Used to store our requests session
REQSESSION = None
OBST_PREAMBULE = ('# WARNING: This is a grokmirror object storage repository.\n'
'# Deleting or moving it will cause corruption in the following repositories\n'
'# (caution, this list may be incomplete):\n')
def get_requests_session():
global REQSESSION
if REQSESSION is None:
REQSESSION = requests.session()
retry = Retry(connect=3, backoff_factor=0.5)
adapter = HTTPAdapter(max_retries=retry)
REQSESSION.mount('http://', adapter)
REQSESSION.mount('https://', adapter)
REQSESSION.headers.update({'User-Agent': 'grokmirror/%s' % VERSION})
return REQSESSION
def get_config_from_git(fullpath, regexp, defaults=None):
args = ['config', '-z', '--get-regexp', regexp]
ecode, out, err = run_git_command(fullpath, args)
gitconfig = defaults
if not gitconfig:
gitconfig = dict()
if not out:
return gitconfig
for line in out.split('\x00'):
if not line:
continue
key, value = line.split('\n', 1)
try:
chunks = key.split('.')
cfgkey = chunks[-1]
gitconfig[cfgkey.lower()] = value
except ValueError:
logger.debug('Ignoring git config entry %s', line)
return gitconfig
def set_git_config(fullpath, param, value, operation='--replace-all'):
args = ['config', operation, param, value]
ecode, out, err = run_git_command(fullpath, args)
return ecode
def git_newer_than(minver):
from packaging import version
(retcode, output, error) = run_git_command(None, ['--version'])
ver = output.split()[-1]
return version.parse(ver) >= version.parse(minver)
def run_shell_command(cmdargs, stdin=None, decode=True):
logger.debug('Running: %s', ' '.join(cmdargs))
child = subprocess.Popen(cmdargs, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, error = child.communicate(input=stdin)
if decode:
output = output.decode().strip()
error = error.decode().strip()
return child.returncode, output, error
def run_git_command(fullpath, args, stdin=None, decode=True):
if 'GITBIN' in os.environ:
_git = os.environ['GITBIN']
else:
_git = GITBIN
if not os.path.isfile(_git) and os.access(_git, os.X_OK):
# we hope for the best by using 'git' without full path
_git = 'git'
if fullpath is not None:
cmdargs = [_git, '--no-pager', '--git-dir', fullpath] + args
else:
cmdargs = [_git, '--no-pager'] + args
return run_shell_command(cmdargs, stdin, decode=decode)
def _lockname(fullpath):
lockpath = os.path.dirname(fullpath)
lockname = '.%s.lock' % os.path.basename(fullpath)
if not os.path.exists(lockpath):
os.makedirs(lockpath)
repolock = os.path.join(lockpath, lockname)
return repolock
def lock_repo(fullpath, nonblocking=False):
repolock = _lockname(fullpath)
logger.debug('Attempting to exclusive-lock %s', repolock)
lockfh = open(repolock, 'w')
if nonblocking:
flags = LOCK_EX | LOCK_NB
else:
flags = LOCK_EX
lockf(lockfh, flags)
global REPO_LOCKH
REPO_LOCKH[fullpath] = lockfh
def unlock_repo(fullpath):
global REPO_LOCKH
if fullpath in REPO_LOCKH.keys():
logger.debug('Unlocking %s', fullpath)
lockf(REPO_LOCKH[fullpath], LOCK_UN)
REPO_LOCKH[fullpath].close()
del REPO_LOCKH[fullpath]
def is_bare_git_repo(path):
"""
Return True if path (which is already verified to be a directory)
sufficiently resembles a base git repo (good enough to fool git
itself).
"""
logger.debug('Checking if %s is a git repository', path)
if (os.path.isdir(os.path.join(path, 'objects')) and
os.path.isdir(os.path.join(path, 'refs')) and
os.path.isfile(os.path.join(path, 'HEAD'))):
return True
logger.debug('Skipping %s: not a git repository', path)
return False
def get_repo_timestamp(toplevel, gitdir):
ts = 0
fullpath = os.path.join(toplevel, gitdir.lstrip('/'))
tsfile = os.path.join(fullpath, 'grokmirror.timestamp')
if os.path.exists(tsfile):
with open(tsfile, 'rb') as tsfh:
contents = tsfh.read()
try:
ts = int(contents)
logger.debug('Timestamp for %s: %s', gitdir, ts)
except ValueError:
logger.warning('Was not able to parse timestamp in %s', tsfile)
else:
logger.debug('No existing timestamp for %s', gitdir)
return ts
def set_repo_timestamp(toplevel, gitdir, ts):
fullpath = os.path.join(toplevel, gitdir.lstrip('/'))
tsfile = os.path.join(fullpath, 'grokmirror.timestamp')
with open(tsfile, 'wt') as tsfh:
tsfh.write('%d' % ts)
logger.debug('Recorded timestamp for %s: %s', gitdir, ts)
def get_repo_obj_info(fullpath):
args = ['count-objects', '-v']
retcode, output, error = run_git_command(fullpath, args)
obj_info = dict()
if output:
for line in output.split('\n'):
key, value = line.split(':')
obj_info[key] = value.strip()
return obj_info
def get_repo_defs(toplevel, gitdir, usenow=False, ignorerefs=None):
fullpath = os.path.join(toplevel, gitdir.lstrip('/'))
description = None
try:
descfile = os.path.join(fullpath, 'description')
with open(descfile, 'rb') as fh:
contents = fh.read().strip()
if len(contents) and contents.find(b'edit this file') < 0:
# We don't need to tell mirrors to edit this file
description = contents.decode(errors='replace')
except IOError:
pass
entries = get_config_from_git(fullpath, r'gitweb\..*')
owner = entries.get('owner', None)
modified = 0
if not usenow:
args = ['for-each-ref', '--sort=-committerdate', '--format=%(committerdate:iso-strict)', '--count=1']
ecode, out, err = run_git_command(fullpath, args)
if len(out):
try:
modified = datetime.datetime.fromisoformat(out)
except AttributeError:
# Python 3.6 doesn't have fromisoformat
# remove : from the TZ info
out = out[:-3] + out[-2:]
modified = datetime.datetime.strptime(out, '%Y-%m-%dT%H:%M:%S%z')
if not modified:
modified = datetime.datetime.now()
head = None
try:
with open(os.path.join(fullpath, 'HEAD')) as fh:
head = fh.read().strip()
except IOError:
pass
forkgroup = None
altrepo = get_altrepo(fullpath)
if altrepo and os.path.exists(os.path.join(altrepo, 'grokmirror.objstore')):
forkgroup = os.path.basename(altrepo)[:-4]
# we need a way to quickly compare whether mirrored repositories match
# what is in the master manifest. To this end, we calculate a so-called
# "state fingerprint" -- basically the output of "git show-ref | sha1sum".
# git show-ref output is deterministic and should accurately list all refs
# and their relation to heads/tags/etc.
fingerprint = get_repo_fingerprint(toplevel, gitdir, force=True, ignorerefs=ignorerefs)
# Record it in the repo for other use
set_repo_fingerprint(toplevel, gitdir, fingerprint)
repoinfo = {
'modified': int(modified.timestamp()),
'fingerprint': fingerprint,
'head': head,
}
# Don't add empty things to manifest
if owner:
repoinfo['owner'] = owner
if description:
repoinfo['description'] = description
if forkgroup:
repoinfo['forkgroup'] = forkgroup
return repoinfo
def get_altrepo(fullpath):
altfile = os.path.join(fullpath, 'objects', 'info', 'alternates')
altdir = None
try:
with open(altfile, 'r') as fh:
contents = fh.read().strip()
if len(contents) > 8 and contents[-8:] == '/objects':
altdir = os.path.realpath(contents[:-8])
except IOError:
pass
return altdir
def set_altrepo(fullpath, altdir):
# I assume you already checked if this is a sane operation to perform
altfile = os.path.join(fullpath, 'objects', 'info', 'alternates')
objpath = os.path.join(altdir, 'objects')
if os.path.isdir(objpath):
with open(altfile, 'w') as fh:
fh.write(objpath + '\n')
else:
logger.critical('objdir %s does not exist, not setting alternates file %s', objpath, altfile)
def get_rootsets(toplevel, obstdir):
top_roots = dict()
obst_roots = dict()
topdirs = find_all_gitdirs(toplevel, normalize=True, exclude_objstore=True)
obstdirs = find_all_gitdirs(obstdir, normalize=True, exclude_objstore=False)
for fullpath in topdirs:
roots = get_repo_roots(fullpath)
if roots:
top_roots[fullpath] = roots
for fullpath in obstdirs:
if fullpath in obst_roots:
continue
roots = get_repo_roots(fullpath)
if roots:
obst_roots[fullpath] = roots
return top_roots, obst_roots
def get_repo_roots(fullpath, force=False):
if not os.path.exists(fullpath):
logger.debug('Cannot check roots in %s, as it does not exist', fullpath)
return None
rfile = os.path.join(fullpath, 'grokmirror.roots')
if not force and os.path.exists(rfile):
with open(rfile, 'rt') as rfh:
content = rfh.read()
roots = set(content.split('\n'))
else:
logger.debug('Generating roots for %s', fullpath)
ecode, out, err = run_git_command(fullpath, ['rev-list', '--max-parents=0', '--all'])
if ecode > 0:
logger.debug('Error listing roots in %s', fullpath)
return None
if not len(out):
logger.debug('No roots in %s', fullpath)
return None
# save it for future use
with open(rfile, 'w') as rfh:
rfh.write(out)
logger.debug('Wrote %s', rfile)
roots = set(out.split('\n'))
return roots
def setup_bare_repo(fullpath):
args = ['init', '--bare', fullpath]
ecode, out, err = run_git_command(None, args)
if ecode > 0:
logger.critical('Unable to bare-init %s', fullpath)
return False
# Remove .sample files from hooks, because they are just dead weight
hooksdir = os.path.join(fullpath, 'hooks')
for child in pathlib.Path(hooksdir).iterdir():
if child.suffix == '.sample':
child.unlink()
# We never want auto-gc anywhere
set_git_config(fullpath, 'gc.auto', '0')
# We don't care about FETCH_HEAD information and writing to it just
# wastes IO cycles
os.symlink('/dev/null', os.path.join(fullpath, 'FETCH_HEAD'))
return True
def setup_objstore_repo(obstdir, name=None):
if name is None:
name = str(uuid.uuid4())
pathlib.Path(obstdir).mkdir(parents=True, exist_ok=True)
obstrepo = os.path.join(obstdir, '%s.git' % name)
logger.debug('Creating objstore repo in %s', obstrepo)
lock_repo(obstrepo)
if not setup_bare_repo(obstrepo):
sys.exit(1)
# All our objects are precious -- we only turn this off when repacking
set_git_config(obstrepo, 'core.repositoryformatversion', '1')
set_git_config(obstrepo, 'extensions.preciousObjects', 'true')
# Set maximum compression, though perhaps we should make this configurable
set_git_config(obstrepo, 'pack.compression', '9')
# Set island configs
set_git_config(obstrepo, 'repack.useDeltaIslands', 'true')
set_git_config(obstrepo, 'repack.writeBitmaps', 'true')
set_git_config(obstrepo, 'pack.island', 'refs/virtual/([0-9a-f]+)/', operation='--add')
telltale = os.path.join(obstrepo, 'grokmirror.objstore')
with open(telltale, 'w') as fh:
fh.write(OBST_PREAMBULE)
unlock_repo(obstrepo)
return obstrepo
def objstore_virtref(fullpath):
fullpath = os.path.realpath(fullpath)
vh = hashlib.sha1()
vh.update(fullpath.encode())
return vh.hexdigest()[:12]
def objstore_trim_virtref(obstrepo, virtref):
args = ['for-each-ref', '--format', 'delete %(refname)', f'refs/virtual/{virtref}']
ecode, out, err = run_git_command(obstrepo, args)
if ecode == 0 and len(out):
out += '\n'
args = ['update-ref', '--stdin']
run_git_command(obstrepo, args, stdin=out.encode())
def remove_from_objstore(obstrepo, fullpath):
# is fullpath still using us?
altrepo = get_altrepo(fullpath)
if altrepo and os.path.realpath(obstrepo) == os.path.realpath(altrepo):
# Repack the child first, using minimal flags
args = ['repack', '-abq']
ecode, out, err = run_git_command(fullpath, args)
if ecode > 0:
logger.debug('Could not repack child repo %s for removal from %s', fullpath, obstrepo)
return False
os.unlink(os.path.join(fullpath, 'objects', 'info', 'alternates'))
virtref = objstore_virtref(fullpath)
objstore_trim_virtref(obstrepo, virtref)
args = ['remote', 'remove', virtref]
run_git_command(obstrepo, args)
try:
os.unlink(os.path.join(obstrepo, 'grokmirror.%s.fingerprint' % virtref))
except (IOError, FileNotFoundError):
pass
return True
def list_repo_remotes(fullpath, withurl=False):
args = ['remote']
if withurl:
args.append('-v')
ecode, out, err = run_git_command(fullpath, args)
if not len(out):
logger.debug('Could not list remotes in %s', fullpath)
return list()
if not withurl:
return out.split('\n')
remotes = list()
for line in out.split('\n'):
entry = tuple(line.split()[:2])
if entry not in remotes:
remotes.append(entry)
return remotes
def add_repo_to_objstore(obstrepo, fullpath):
virtref = objstore_virtref(fullpath)
remotes = list_repo_remotes(obstrepo)
if virtref in remotes:
logger.debug('%s is already set up for objstore in %s', fullpath, obstrepo)
return False
args = ['remote', 'add', virtref, fullpath, '--no-tags']
ecode, out, err = run_git_command(obstrepo, args)
if ecode > 0:
logger.critical('Could not add remote to %s', obstrepo)
sys.exit(1)
set_git_config(obstrepo, 'remote.%s.fetch' % virtref, '+refs/*:refs/virtual/%s/*' % virtref)
telltale = os.path.join(obstrepo, 'grokmirror.objstore')
knownsiblings = set()
if os.path.exists(telltale):
with open(telltale) as fh:
for line in fh.readlines():
line = line.strip()
if not len(line) or line[0] == '#':
continue
if os.path.isdir(line):
knownsiblings.add(line)
knownsiblings.add(fullpath)
with open(telltale, 'w') as fh:
fh.write(OBST_PREAMBULE)
fh.write('\n'.join(sorted(list(knownsiblings))) + '\n')
return True
def _fetch_objstore_repo_using_plumbing(srcrepo, obstrepo, virtref):
# Copies objects to objstore repos using direct git plumbing
# as opposed to using "fetch". See discussion here:
# http://lore.kernel.org/git/20200720173220.GB2045458@coredump.intra.peff.net
# First, hardlink all objects and packs
srcobj = os.path.join(srcrepo, 'objects')
dstobj = os.path.join(obstrepo, 'objects')
torm = set()
for root, dirs, files in os.walk(srcobj, topdown=True):
if 'info' in dirs:
dirs.remove('info')
subpath = root.replace(srcobj, '').lstrip('/')
for file in files:
srcpath = os.path.join(root, file)
if file.endswith('.bitmap'):
torm.add(srcpath)
continue
dstpath = os.path.join(dstobj, subpath, file)
if not os.path.exists(dstpath):
pathlib.Path(os.path.dirname(dstpath)).mkdir(parents=True, exist_ok=True)
os.link(srcpath, dstpath)
torm.add(srcpath)
# Now we generate a list of refs on both sides
srcargs = ['for-each-ref', f'--format=%(objectname) refs/virtual/{virtref}/%(refname:lstrip=1)']
ecode, out, err = run_git_command(srcrepo, srcargs)
if ecode > 0:
logger.debug('Could not for-each-ref %s: %s', srcrepo, err)
return False
srcset = set(out.strip().split('\n'))
dstargs = ['for-each-ref', f'--format=%(objectname) %(refname)', f'refs/virtual/{virtref}']
ecode, out, err = run_git_command(obstrepo, dstargs)
if ecode > 0:
logger.debug('Could not for-each-ref %s: %s', obstrepo, err)
return False
dstset = set(out.strip().split('\n'))
# Now we create a stdin list of commands for update-ref
mapping = dict()
newset = srcset.difference(dstset)
if newset:
for refline in newset:
obj, ref = refline.split(' ', 1)
mapping[ref] = obj
commands = ''
oldset = dstset.difference(srcset)
if oldset:
for refline in oldset:
if not len(refline):
continue
obj, ref = refline.split(' ', 1)
if ref in mapping:
commands += f'update {ref} {mapping[ref]} {obj}\n'
mapping.pop(ref)
else:
commands += f'delete {ref} {obj}\n'
for ref, obj in mapping.items():
commands += f'create {ref} {obj}\n'
logger.debug('stdin=%s', commands)
args = ['update-ref', '--stdin']
ecode, out, err = run_git_command(obstrepo, args, stdin=commands.encode())
if ecode > 0:
logger.debug('Could not update-ref %s: %s', obstrepo, err)
return False
for file in torm:
os.unlink(file)
return True
def fetch_objstore_repo(obstrepo, fullpath=None, pack_refs=False, use_plumbing=False):
my_remotes = list_repo_remotes(obstrepo, withurl=True)
if fullpath:
virtref = objstore_virtref(fullpath)
if (virtref, fullpath) in my_remotes:
remotes = {(virtref, fullpath)}
else:
logger.debug('%s is not in remotes for %s', fullpath, obstrepo)
return False
else:
remotes = my_remotes
success = True
for (virtref, url) in remotes:
if use_plumbing:
success = _fetch_objstore_repo_using_plumbing(url, obstrepo, virtref)
else:
ecode, out, err = run_git_command(obstrepo, ['fetch', virtref, '--prune'])
if ecode > 0:
success = False
if success:
r_fp = os.path.join(url, 'grokmirror.fingerprint')
if os.path.exists(r_fp):
l_fp = os.path.join(obstrepo, 'grokmirror.%s.fingerprint' % virtref)
shutil.copy(r_fp, l_fp)
if pack_refs:
try:
lock_repo(obstrepo, nonblocking=True)
run_git_command(obstrepo, ['pack-refs'])
unlock_repo(obstrepo)
except IOError:
# Next run will take care of it
pass
else:
logger.info('Could not fetch objects from %s to %s', url, obstrepo)
return success
def is_private_repo(config, fullpath):
privmasks = config['core'].get('private', '')
if not len(privmasks):
return False
for privmask in privmasks.split('\n'):
# Does this repo match privrepo
if fnmatch.fnmatch(fullpath, privmask.strip()):
return True
return False
def find_siblings(fullpath, my_roots, known_roots, exact=False):
siblings = set()
for gitpath, gitroots in known_roots.items():
# Of course we're going to match ourselves
if fullpath == gitpath or not my_roots or not gitroots or not len(gitroots.intersection(my_roots)):
continue
if gitroots == my_roots:
siblings.add(gitpath)
continue
if exact:
continue
if gitroots.issubset(my_roots) or my_roots.issubset(gitroots):
siblings.add(gitpath)
continue
sumdiff = len(gitroots.difference(my_roots)) + len(my_roots.difference(gitroots))
# If we only differ by a single root, consider us siblings
if sumdiff <= 2:
siblings.add(gitpath)
continue
return siblings
def find_best_obstrepo(mypath, obst_roots, toplevel, baselines, minratio=0.2):
# We want to find a repo with best intersect len to total roots len ratio,
# but we'll ignore any repos where the ratio is too low, in order not to lump
# together repositories that have very weak common histories.
myroots = get_repo_roots(mypath)
if not myroots:
return None
obstrepo = None
bestratio = 0
for path, roots in obst_roots.items():
if path == mypath or not roots:
continue
icount = len(roots.intersection(myroots))
if icount == 0:
# No match at all
continue
# Baseline repos win over the ratio logic
if len(baselines):
# Any of its member siblings match baselines?
s_remotes = list_repo_remotes(path, withurl=True)
for virtref, childpath in s_remotes:
gitdir = '/' + os.path.relpath(childpath, toplevel)
for baseline in baselines:
# Does this repo match a baseline
if fnmatch.fnmatch(gitdir, baseline):
# Use this one
return path
ratio = icount / len(roots)
if ratio < minratio:
continue
if ratio > bestratio:
obstrepo = path
bestratio = ratio
return obstrepo
def get_obstrepo_mapping(obstdir):
mapping = dict()
if not os.path.isdir(obstdir):
return mapping
for child in pathlib.Path(obstdir).iterdir():
if child.is_dir() and child.suffix == '.git':
obstrepo = child.as_posix()
ecode, out, err = run_git_command(obstrepo, ['remote', '-v'])
if ecode > 0:
# weird
continue
lines = out.split('\n')
for line in lines:
chunks = line.split()
if len(chunks) < 2:
continue
name, url = chunks[:2]
if url in mapping:
continue
# Does it still exist?
if not os.path.isdir(url):
continue
mapping[url] = obstrepo
return mapping
def find_objstore_repo_for(obstdir, fullpath):
if not os.path.isdir(obstdir):
return None
logger.debug('Finding an objstore repo matching %s', fullpath)
virtref = objstore_virtref(fullpath)
for child in pathlib.Path(obstdir).iterdir():
if child.is_dir() and child.suffix == '.git':
obstrepo = child.as_posix()
remotes = list_repo_remotes(obstrepo)
if virtref in remotes:
logger.debug('Found %s', child.name)
return obstrepo
logger.debug('No matching objstore repos for %s', fullpath)
return None
def get_forkgroups(obstdir, toplevel):
forkgroups = dict()
if not os.path.exists(obstdir):
return forkgroups
for child in pathlib.Path(obstdir).iterdir():
if child.is_dir() and child.suffix == '.git':
forkgroup = child.stem
forkgroups[forkgroup] = set()
obstrepo = child.as_posix()
remotes = list_repo_remotes(obstrepo, withurl=True)
for virtref, url in remotes:
if url.find(toplevel) != 0:
continue
forkgroups[forkgroup].add(url)
return forkgroups
def get_repo_fingerprint(toplevel, gitdir, force=False, ignorerefs=None):
fullpath = os.path.join(toplevel, gitdir.lstrip('/'))
if not os.path.exists(fullpath):
logger.debug('Cannot fingerprint %s, as it does not exist', fullpath)
return None
fpfile = os.path.join(fullpath, 'grokmirror.fingerprint')
if not force and os.path.exists(fpfile):
with open(fpfile, 'r') as fpfh:
fingerprint = fpfh.read()
logger.debug('Fingerprint for %s: %s', gitdir, fingerprint)
else:
logger.debug('Generating fingerprint for %s', gitdir)
ecode, out, err = run_git_command(fullpath, ['show-ref'])
if ecode > 0 or not len(out):
logger.debug('No heads in %s, nothing to fingerprint.', fullpath)
return None
if ignorerefs:
hasher = hashlib.sha1()
for line in out.split('\n'):
rhash, rname = line.split(maxsplit=1)
ignored = False
for ignoreref in ignorerefs:
if fnmatch.fnmatch(rname, ignoreref):
ignored = True
break
if ignored:
continue
hasher.update(line.encode() + b'\n')
fingerprint = hasher.hexdigest()
else:
# We add the final "\n" to be compatible with cmdline output
# of git-show-ref
fingerprint = hashlib.sha1(out.encode() + b'\n').hexdigest()
# Save it for future use
if not force:
set_repo_fingerprint(toplevel, gitdir, fingerprint)
return fingerprint
def set_repo_fingerprint(toplevel, gitdir, fingerprint=None):
fullpath = os.path.join(toplevel, gitdir.lstrip('/'))
fpfile = os.path.join(fullpath, 'grokmirror.fingerprint')
if fingerprint is None:
fingerprint = get_repo_fingerprint(toplevel, gitdir, force=True)
with open(fpfile, 'wt') as fpfh:
fpfh.write('%s' % fingerprint)
logger.debug('Recorded fingerprint for %s: %s', gitdir, fingerprint)
return fingerprint
def get_altrepo_map(toplevel, refresh=False):
global _alt_repo_map
if _alt_repo_map is None or refresh:
logger.info(' search: finding all repos using alternates')
_alt_repo_map = dict()
tp = pathlib.Path(toplevel)
for subp in tp.glob('**/*.git'):
if subp.is_symlink():
# Don't care about symlinks for altrepo mapping
continue
fullpath = subp.resolve().as_posix()
altrepo = get_altrepo(fullpath)
if not altrepo:
continue
if altrepo not in _alt_repo_map:
_alt_repo_map[altrepo] = set()
_alt_repo_map[altrepo].add(fullpath)
return _alt_repo_map
def is_alt_repo(toplevel, refrepo):
amap = get_altrepo_map(toplevel)
looking_for = os.path.realpath(os.path.join(toplevel, refrepo.strip('/')))
if looking_for in amap:
return True
return False
def is_obstrepo(fullpath, obstdir=None):
if obstdir:
# At this point, both should be normalized
return fullpath.find(obstdir) == 0
# Just check if it has a grokmirror.objstore file in the repo
return os.path.exists(os.path.join(fullpath, 'grokmirror.objstore'))
def find_all_gitdirs(toplevel, ignore=None, normalize=False, exclude_objstore=True):
global _alt_repo_map
if _alt_repo_map is None:
_alt_repo_map = dict()
build_amap = True
else:
build_amap = False
if ignore is None:
ignore = set()
logger.info(' search: finding all repos in %s', toplevel)
logger.debug('Ignore list: %s', ' '.join(ignore))
gitdirs = set()
for root, dirs, files in os.walk(toplevel, topdown=True):
if not len(dirs):
continue
torm = set()
for name in dirs:
fullpath = os.path.join(root, name)
# Should we ignore this dir?
ignored = False
for ignoredir in ignore:
if fnmatch.fnmatch(fullpath, ignoredir):
torm.add(name)
ignored = True
break
if ignored:
continue
if not is_bare_git_repo(fullpath):
continue
if exclude_objstore and os.path.exists(os.path.join(fullpath, 'grokmirror.objstore')):
continue
if normalize:
fullpath = os.path.realpath(fullpath)
logger.debug('Found %s', os.path.join(root, name))
gitdirs.add(fullpath)
torm.add(name)
if build_amap:
altrepo = get_altrepo(fullpath)
if not altrepo:
continue
if altrepo not in _alt_repo_map:
_alt_repo_map[altrepo] = set()
_alt_repo_map[altrepo].add(fullpath)
for name in torm:
# don't recurse into the found *.git dirs
dirs.remove(name)
return gitdirs
def manifest_lock(manifile):
global MANIFEST_LOCKH
if MANIFEST_LOCKH is not None:
logger.debug('Manifest %s already locked', manifile)
manilock = _lockname(manifile)
MANIFEST_LOCKH = open(manilock, 'w')
logger.debug('Attempting to lock %s', manilock)
lockf(MANIFEST_LOCKH, LOCK_EX)
logger.debug('Manifest lock obtained')
def manifest_unlock(manifile):
global MANIFEST_LOCKH
if MANIFEST_LOCKH is not None:
logger.debug('Unlocking manifest %s', manifile)
# noinspection PyTypeChecker
lockf(MANIFEST_LOCKH, LOCK_UN)
# noinspection PyUnresolvedReferences
MANIFEST_LOCKH.close()
MANIFEST_LOCKH = None
def read_manifest(manifile, wait=False):
while True:
if not wait or os.path.exists(manifile):
break
logger.info(' manifest: manifest does not exist yet, waiting ...')
# Unlock the manifest so other processes aren't waiting for us
was_locked = False
if MANIFEST_LOCKH is not None:
was_locked = True
manifest_unlock(manifile)
time.sleep(1)
if was_locked:
manifest_lock(manifile)
if not os.path.exists(manifile):
logger.info(' manifest: no local manifest, assuming initial run')
return dict()
if manifile.find('.gz') > 0:
fh = gzip.open(manifile, 'rb')
else:
fh = open(manifile, 'rb')
logger.debug('Reading %s', manifile)
jdata = fh.read().decode('utf-8')
fh.close()
# noinspection PyBroadException
try:
manifest = json.loads(jdata)
except:
# We'll regenerate the file entirely on failure to parse
logger.critical('Unable to parse %s, will regenerate', manifile)
manifest = dict()
logger.debug('Manifest contains %s entries', len(manifest.keys()))
return manifest
def write_manifest(manifile, manifest, mtime=None, pretty=False):
logger.debug('Writing new %s', manifile)
(dirname, basename) = os.path.split(manifile)
(fd, tmpfile) = tempfile.mkstemp(prefix=basename, dir=dirname)
fh = os.fdopen(fd, 'wb', 0)
logger.debug('Created a temporary file in %s', tmpfile)
logger.debug('Writing to %s', tmpfile)
try:
if pretty:
jdata = json.dumps(manifest, indent=2, sort_keys=True)
else:
jdata = json.dumps(manifest)
jdata = jdata.encode('utf-8')
if manifile.endswith('.gz'):
gfh = gzip.GzipFile(fileobj=fh, mode='wb')
gfh.write(jdata)
gfh.close()
else:
fh.write(jdata)
os.fsync(fd)
fh.close()
# set mode to current umask
curmask = os.umask(0)
os.chmod(tmpfile, 0o0666 ^ curmask)
os.umask(curmask)
if mtime is not None:
logger.debug('Setting mtime to %s', mtime)
os.utime(tmpfile, (mtime, mtime))
logger.debug('Moving %s to %s', tmpfile, manifile)
shutil.move(tmpfile, manifile)
finally:
# If something failed, don't leave these trailing around
if os.path.exists(tmpfile):
logger.debug('Removing %s', tmpfile)
os.unlink(tmpfile)
def load_config_file(cfgfile):
from configparser import ConfigParser, ExtendedInterpolation
if not os.path.exists(cfgfile):
sys.stderr.write('ERORR: File does not exist: %s\n' % cfgfile)
sys.exit(1)
config = ConfigParser(interpolation=ExtendedInterpolation())
config.read(cfgfile)
if 'core' not in config:
sys.stderr.write('ERROR: Section [core] must exist in: %s\n' % cfgfile)
sys.stderr.write(' Perhaps this is a grokmirror-1.x config file?\n')
sys.exit(1)
toplevel = os.path.realpath(os.path.expanduser(config['core'].get('toplevel')))
if not os.access(toplevel, os.W_OK):
logger.critical('Toplevel %s does not exist or is not writable', toplevel)
sys.exit(1)
# Just in case we did expanduser
config['core']['toplevel'] = toplevel
obstdir = config['core'].get('objstore', None)
if obstdir is None:
obstdir = os.path.join(toplevel, 'objstore')
config['core']['objstore'] = obstdir
# Handle some other defaults
manifile = config['core'].get('manifest')
if not manifile:
config['core']['manifest'] = os.path.join(toplevel, 'manifest.js.gz')
fstat = os.stat(cfgfile)
# stick last config file modification date into the config object,
# so we can catch config file updates
config.last_modified = fstat[8]
return config
def is_precious(fullpath):
args = ['config', '--get', 'extensions.preciousObjects']
retcode, output, error = run_git_command(fullpath, args)
if output.strip().lower() in ('yes', 'true', '1'):
return True
return False
def get_repack_level(obj_info, max_loose_objects=1200, max_packs=20, pc_loose_objects=10, pc_loose_size=10):
# for now, hardcode the maximum loose objects and packs
# XXX: we can probably set this in git config values?
# I don't think this makes sense as a global setting, because
# optimal values will depend on the size of the repo as a whole
packs = int(obj_info['packs'])
count_loose = int(obj_info['count'])
needs_repack = 0
# first, compare against max values:
if packs >= max_packs:
logger.debug('Triggering full repack because packs > %s', max_packs)
needs_repack = 2
elif count_loose >= max_loose_objects:
logger.debug('Triggering quick repack because loose objects > %s', max_loose_objects)
needs_repack = 1
else:
# is the number of loose objects or their size more than 10% of
# the overall total?
in_pack = int(obj_info['in-pack'])
size_loose = int(obj_info['size'])
size_pack = int(obj_info['size-pack'])
total_obj = count_loose + in_pack
total_size = size_loose + size_pack
# If we have an alternate, then add those numbers in
alternate = obj_info.get('alternate')
if alternate and len(alternate) > 8 and alternate[-8:] == '/objects':
alt_obj_info = get_repo_obj_info(alternate[:-8])
total_obj += int(alt_obj_info['in-pack'])
total_size += int(alt_obj_info['size-pack'])
# set some arbitrary "worth bothering" limits so we don't
# continuously repack tiny repos.
if total_obj > 500 and count_loose / total_obj * 100 >= pc_loose_objects:
logger.debug('Triggering repack because loose objects > %s%% of total', pc_loose_objects)
needs_repack = 1
elif total_size > 1024 and size_loose / total_size * 100 >= pc_loose_size:
logger.debug('Triggering repack because loose size > %s%% of total', pc_loose_size)
needs_repack = 1
return needs_repack
def init_logger(subcommand, logfile, loglevel, verbose):
global logger
logger = logging.getLogger('grokmirror')
logger.setLevel(logging.DEBUG)
if logfile:
ch = logging.handlers.WatchedFileHandler(os.path.expanduser(logfile))
formatter = logging.Formatter(subcommand + '[%(process)d] %(asctime)s - %(levelname)s - %(message)s')
ch.setFormatter(formatter)
ch.setLevel(loglevel)
logger.addHandler(ch)
ch = logging.StreamHandler()
formatter = logging.Formatter('%(message)s')
ch.setFormatter(formatter)
if verbose:
ch.setLevel(logging.INFO)
else:
ch.setLevel(logging.CRITICAL)
logger.addHandler(ch)
return logger
grokmirror-2.0.11/grokmirror/bundle.py 0000664 0000000 0000000 00000013637 14103301457 0020020 0 ustar 00root root 0000000 0000000 # -*- coding: utf-8 -*-
# Copyright (C) 2013-2020 by The Linux Foundation and contributors
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see .
import sys
import os
import logging
import fnmatch
import grokmirror
from pathlib import Path
# default basic logger. We override it later.
logger = logging.getLogger(__name__)
def get_repo_size(fullpath):
reposize = 0
obj_info = grokmirror.get_repo_obj_info(fullpath)
if 'alternate' in obj_info:
altpath = grokmirror.get_altrepo(fullpath)
reposize = get_repo_size(altpath)
reposize += int(obj_info['size'])
reposize += int(obj_info['size-pack'])
logger.debug('%s size: %s', fullpath, reposize)
return reposize
def generate_bundles(config, outdir, gitargs, revlistargs, maxsize, include):
# uses advisory lock, so its safe even if we die unexpectedly
manifest = grokmirror.read_manifest(config['core'].get('manifest'))
toplevel = os.path.realpath(config['core'].get('toplevel'))
if gitargs:
gitargs = gitargs.split()
if revlistargs:
revlistargs = revlistargs.split()
for repo in manifest.keys():
logger.debug('Checking %s', repo)
# Does it match our globbing pattern?
found = False
for tomatch in include:
if fnmatch.fnmatch(repo, tomatch) or fnmatch.fnmatch(repo, tomatch.lstrip('/')):
found = True
break
if not found:
logger.debug('%s does not match include list, skipping', repo)
continue
repo = repo.lstrip('/')
fullpath = os.path.join(toplevel, repo)
bundledir = os.path.join(outdir, repo.replace('.git', ''))
Path(bundledir).mkdir(parents=True, exist_ok=True)
repofpr = grokmirror.get_repo_fingerprint(toplevel, repo)
logger.debug('%s fingerprint is %s', repo, repofpr)
# Do we have a bundle file already?
bfile = os.path.join(bundledir, 'clone.bundle')
bfprfile = os.path.join(bundledir, '.fingerprint')
logger.debug('Looking for %s', bfile)
if os.path.exists(bfile):
# Do we have a bundle fingerprint?
logger.debug('Found existing bundle in %s', bfile)
if os.path.exists(bfprfile):
with open(bfprfile) as fh:
bfpr = fh.read().strip()
logger.debug('Read bundle fingerprint from %s: %s', bfprfile, bfpr)
if bfpr == repofpr:
logger.info(' skipped: %s (unchanged)', repo)
continue
logger.debug('checking size of %s', repo)
total_size = get_repo_size(fullpath)/1024/1024
if total_size > maxsize:
logger.info(' skipped: %s (%s > %s)', repo, total_size, maxsize)
continue
fullargs = gitargs + ['bundle', 'create', bfile] + revlistargs
logger.debug('Full git args: %s', fullargs)
logger.info(' generate: %s', bfile)
ecode, out, err = grokmirror.run_git_command(fullpath, fullargs)
if ecode == 0:
with open(bfprfile, 'w') as fh:
fh.write(repofpr)
logger.debug('Wrote %s into %s', repofpr, bfprfile)
return 0
def parse_args():
import argparse
# noinspection PyTypeChecker
op = argparse.ArgumentParser(prog='grok-bundle',
description='Generate clone.bundle files for use with "repo"',
formatter_class=argparse.ArgumentDefaultsHelpFormatter)
op.add_argument('-v', '--verbose', action='store_true',
default=False,
help='Be verbose and tell us what you are doing')
op.add_argument('-c', '--config',
required=True,
help='Location of the configuration file')
op.add_argument('-o', '--outdir',
required=True,
help='Location where to store bundle files')
op.add_argument('-g', '--gitargs',
default='-c core.compression=9',
help='extra args to pass to git')
op.add_argument('-r', '--revlistargs',
default='--branches HEAD',
help='Rev-list args to use')
op.add_argument('-s', '--maxsize', type=int,
default=2,
help='Maximum size of git repositories to bundle (in GiB)')
op.add_argument('-i', '--include', nargs='*',
default='*',
help='List repositories to bundle (accepts shell globbing)')
op.add_argument('--version', action='version', version=grokmirror.VERSION)
opts = op.parse_args()
return opts
def grok_bundle(cfgfile, outdir, gitargs, revlistargs, maxsize, include, verbose=False):
global logger
config = grokmirror.load_config_file(cfgfile)
logfile = config['core'].get('log', None)
if config['core'].get('loglevel', 'info') == 'debug':
loglevel = logging.DEBUG
else:
loglevel = logging.INFO
logger = grokmirror.init_logger('bundle', logfile, loglevel, verbose)
return generate_bundles(config, outdir, gitargs, revlistargs, maxsize, include)
def command():
opts = parse_args()
retval = grok_bundle(
opts.config, opts.outdir, opts.gitargs, opts.revlistargs, opts.maxsize, opts.include, verbose=opts.verbose)
sys.exit(retval)
if __name__ == '__main__':
command()
grokmirror-2.0.11/grokmirror/dumb_pull.py 0000775 0000000 0000000 00000016177 14103301457 0020537 0 ustar 00root root 0000000 0000000 # -*- coding: utf-8 -*-
# Copyright (C) 2013-2018 by The Linux Foundation and contributors
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see .
import os
import grokmirror
import logging
import fnmatch
import subprocess
logger = logging.getLogger(__name__)
def git_rev_parse_all(gitdir):
args = ['rev-parse', '--all']
retcode, output, error = grokmirror.run_git_command(gitdir, args)
if error:
# Put things we recognize into debug
debug = list()
warn = list()
for line in error.split('\n'):
warn.append(line)
if debug:
logger.debug('Stderr: %s', '\n'.join(debug))
if warn:
logger.warning('Stderr: %s', '\n'.join(warn))
return output
def git_remote_update(args, fullpath):
retcode, output, error = grokmirror.run_git_command(fullpath, args)
if error:
# Put things we recognize into debug
debug = list()
warn = list()
for line in error.split('\n'):
if line.find('From ') == 0:
debug.append(line)
elif line.find('-> ') > 0:
debug.append(line)
else:
warn.append(line)
if debug:
logger.debug('Stderr: %s', '\n'.join(debug))
if warn:
logger.warning('Stderr: %s', '\n'.join(warn))
def dumb_pull_repo(gitdir, remotes, svn=False):
# verify it's a git repo and fetch all remotes
logger.debug('Will pull %s with following remotes: %s', gitdir, remotes)
old_revs = git_rev_parse_all(gitdir)
try:
grokmirror.lock_repo(gitdir, nonblocking=True)
except IOError:
logger.info('Could not obtain exclusive lock on %s', gitdir)
logger.info('\tAssuming another process is running.')
return False
if svn:
logger.debug('Using git-svn for %s', gitdir)
for remote in remotes:
# arghie-argh-argh
if remote == '*':
remote = '--all'
logger.info('Running git-svn fetch %s in %s', remote, gitdir)
args = ['svn', 'fetch', remote]
git_remote_update(args, gitdir)
else:
# Not an svn remote
myremotes = grokmirror.list_repo_remotes(gitdir)
if not len(myremotes):
logger.info('Repository %s has no defined remotes!', gitdir)
return False
logger.debug('existing remotes: %s', myremotes)
for remote in remotes:
remotefound = False
for myremote in myremotes:
if fnmatch.fnmatch(myremote, remote):
remotefound = True
logger.debug('existing remote %s matches %s', myremote, remote)
args = ['remote', 'update', myremote, '--prune']
logger.info('Updating remote %s in %s', myremote, gitdir)
git_remote_update(args, gitdir)
if not remotefound:
logger.info('Could not find any remotes matching %s in %s', remote, gitdir)
new_revs = git_rev_parse_all(gitdir)
grokmirror.unlock_repo(gitdir)
if old_revs == new_revs:
logger.debug('No new revs, no updates')
return False
logger.debug('New revs found -- new content pulled')
return True
def run_post_update_hook(hookscript, gitdir):
if hookscript == '':
return
if not os.access(hookscript, os.X_OK):
logger.warning('post_update_hook %s is not executable', hookscript)
return
args = [hookscript, gitdir]
logger.debug('Running: %s', ' '.join(args))
(output, error) = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()
error = error.decode().strip()
output = output.decode().strip()
if error:
# Put hook stderror into warning
logger.warning('Hook Stderr: %s', error)
if output:
# Put hook stdout into info
logger.info('Hook Stdout: %s', output)
def parse_args():
import argparse
# noinspection PyTypeChecker
op = argparse.ArgumentParser(prog='grok-dumb-pull',
description='Fetch remotes in repositories not managed by grokmirror',
formatter_class=argparse.ArgumentDefaultsHelpFormatter)
op.add_argument('-v', '--verbose', dest='verbose', action='store_true',
default=False,
help='Be verbose and tell us what you are doing')
op.add_argument('-s', '--svn', dest='svn', action='store_true',
default=False,
help='The remotes for these repositories are Subversion')
op.add_argument('-r', '--remote-names', dest='remotes', action='append',
default=None,
help='Only fetch remotes matching this name (accepts shell globbing)')
op.add_argument('-u', '--post-update-hook', dest='posthook',
default='',
help='Run this hook after each repository is updated.')
op.add_argument('-l', '--logfile', dest='logfile',
default=None,
help='Put debug logs into this file')
op.add_argument('--version', action='version', version=grokmirror.VERSION)
op.add_argument('paths', nargs='+', help='Full path(s) of the repos to pull')
opts = op.parse_args()
if not len(opts.paths):
op.error('You must provide at least a path to the repos to pull')
return opts
def dumb_pull(paths, verbose=False, svn=False, remotes=None, posthook='', logfile=None):
global logger
loglevel = logging.INFO
logger = grokmirror.init_logger('dumb-pull', logfile, loglevel, verbose)
if remotes is None:
remotes = ['*']
# Find all repositories we are to pull
for entry in paths:
if entry[-4:] == '.git':
if not os.path.exists(entry):
logger.critical('%s does not exist', entry)
continue
logger.debug('Found %s', entry)
didwork = dumb_pull_repo(entry, remotes, svn=svn)
if didwork:
run_post_update_hook(posthook, entry)
else:
logger.debug('Finding all git repos in %s', entry)
for founddir in grokmirror.find_all_gitdirs(entry):
didwork = dumb_pull_repo(founddir, remotes, svn=svn)
if didwork:
run_post_update_hook(posthook, founddir)
def command():
opts = parse_args()
return dumb_pull(
opts.paths, verbose=opts.verbose, svn=opts.svn, remotes=opts.remotes,
posthook=opts.posthook, logfile=opts.logfile)
if __name__ == '__main__':
command()
grokmirror-2.0.11/grokmirror/fsck.py 0000775 0000000 0000000 00000161105 14103301457 0017472 0 ustar 00root root 0000000 0000000 # -*- coding: utf-8 -*-
# Copyright (C) 2013-2020 by The Linux Foundation and contributors
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see .
import os
import grokmirror
import logging
import time
import json
import random
import datetime
import shutil
import gc
import fnmatch
import io
import smtplib
from pathlib import Path
from email.message import EmailMessage
from fcntl import lockf, LOCK_EX, LOCK_UN, LOCK_NB
# default basic logger. We override it later.
logger = logging.getLogger(__name__)
def log_errors(fullpath, cmdargs, lines):
logger.critical('%s reports errors:', fullpath)
with open(os.path.join(fullpath, 'grokmirror.fsck.err'), 'w') as fh:
fh.write('# Date: %s\n' % datetime.datetime.today().strftime('%F'))
fh.write('# Cmd : git %s\n' % ' '.join(cmdargs))
count = 0
for line in lines:
fh.write('%s\n' % line)
logger.critical('\t%s', line)
count += 1
if count > 10:
logger.critical('\t [ %s more lines skipped ]', len(lines) - 10)
logger.critical('\t [ see %s/grokmirror.fsck.err ]', os.path.basename(fullpath))
break
def gen_preload_bundle(fullpath, config):
outdir = config['fsck'].get('preload_bundle_outdir')
Path(outdir).mkdir(parents=True, exist_ok=True)
bname = '%s.bundle' % os.path.basename(fullpath)[:-4]
args = ['bundle', 'create', os.path.join(outdir, bname), '--all']
logger.info(' bundling: %s', bname)
grokmirror.run_git_command(fullpath, args)
def get_blob_set(fullpath):
bset = set()
size = 0
blobcache = os.path.join(fullpath, 'grokmirror.blobs')
if os.path.exists(blobcache):
# Did it age out? Hardcode to 30 days.
expage = time.time() - 86400*30
st = os.stat(blobcache)
if st.st_mtime < expage:
os.unlink(blobcache)
try:
with open(blobcache) as fh:
while True:
line = fh.readline()
if not len(line):
break
if line[0] == '#':
continue
chunks = line.strip().split()
bhash = chunks[0]
bsize = int(chunks[1])
size += bsize
bset.add((bhash, bsize))
return bset, size
except FileNotFoundError:
pass
# This only makes sense for repos not using alternates, so make sure you check first
logger.info(' bloblist: %s', fullpath)
gitargs = ['cat-file', '--batch-all-objects', '--batch-check', '--unordered']
retcode, output, error = grokmirror.run_git_command(fullpath, gitargs)
if retcode == 0:
with open(blobcache, 'w') as fh:
fh.write('# Blobs and sizes used for sibling calculation\n')
for line in output.split('\n'):
if line.find(' blob ') < 0:
continue
chunks = line.strip().split()
fh.write(f'{chunks[0]} {chunks[2]}\n')
bhash = chunks[0]
bsize = int(chunks[2])
size += bsize
bset.add((bhash, bsize))
return bset, size
def check_sibling_repos_by_blobs(bset1, bsize1, bset2, bsize2, ratio):
iset = bset1.intersection(bset2)
if not len(iset):
return False
isize = 0
for bhash, bsize in iset:
isize += bsize
# Both repos should share at least ratio % of blobs in them
ratio1 = int(isize / bsize1 * 100)
logger.debug('isize=%s, bsize1=%s, ratio1=%s', isize, bsize1, ratio1)
ratio2 = int(isize / bsize2 * 100)
logger.debug('isize=%s, bsize2=%s ratio2=%s', isize, bsize2, ratio1)
if ratio1 >= ratio and ratio2 >= ratio:
return True
return False
def find_siblings_by_blobs(obstrepo, obstdir, ratio=75):
siblings = set()
oset, osize = get_blob_set(obstrepo)
for srepo in grokmirror.find_all_gitdirs(obstdir, normalize=True, exclude_objstore=False):
if srepo == obstrepo:
continue
logger.debug('Comparing blobs between %s and %s', obstrepo, srepo)
sset, ssize = get_blob_set(srepo)
if check_sibling_repos_by_blobs(oset, osize, sset, ssize, ratio):
logger.info(' siblings: %s and %s', os.path.basename(obstrepo), os.path.basename(srepo))
siblings.add(srepo)
return siblings
def merge_siblings(siblings, amap):
mdest = None
rcount = 0
# Who has the most remotes?
for sibling in set(siblings):
if sibling not in amap or not len(amap[sibling]):
# Orphaned sibling, ignore it -- it will get cleaned up
siblings.remove(sibling)
continue
s_remotes = grokmirror.list_repo_remotes(sibling)
if len(s_remotes) > rcount:
mdest = sibling
rcount = len(s_remotes)
# Migrate all siblings into the repo with most remotes
siblings.remove(mdest)
for sibling in siblings:
logger.info('%s: merging into %s', os.path.basename(sibling), os.path.basename(mdest))
s_remotes = grokmirror.list_repo_remotes(sibling, withurl=True)
for virtref, childpath in s_remotes:
if childpath not in amap[sibling]:
# The child repo isn't even using us
args = ['remote', 'remove', virtref]
grokmirror.run_git_command(sibling, args)
continue
logger.info(' moving: %s', childpath)
success = grokmirror.add_repo_to_objstore(mdest, childpath)
if not success:
logger.critical('Could not add %s to %s', childpath, mdest)
continue
logger.info(' : fetching into %s', os.path.basename(mdest))
success = grokmirror.fetch_objstore_repo(mdest, childpath)
if not success:
logger.critical('Failed to fetch %s from %s to %s', childpath, os.path.basename(sibling),
os.path.basename(mdest))
continue
logger.info(' : repointing alternates')
grokmirror.set_altrepo(childpath, mdest)
amap[sibling].remove(childpath)
amap[mdest].add(childpath)
args = ['remote', 'remove', virtref]
grokmirror.run_git_command(sibling, args)
logger.info(' : done')
return mdest
def check_reclone_error(fullpath, config, errors):
reclone = None
toplevel = os.path.realpath(config['core'].get('toplevel'))
errlist = config['fsck'].get('reclone_on_errors', '').split('\n')
for line in errors:
for estring in errlist:
if line.find(estring) != -1:
# is this repo used for alternates?
gitdir = '/' + os.path.relpath(fullpath, toplevel).lstrip('/')
if grokmirror.is_alt_repo(toplevel, gitdir):
logger.critical('\tused for alternates, not requesting auto-reclone')
return
else:
reclone = line
logger.critical('\trequested auto-reclone')
break
if reclone is not None:
break
if reclone is None:
return
set_repo_reclone(fullpath, reclone)
def get_repo_size(fullpath):
oi = grokmirror.get_repo_obj_info(fullpath)
kbsize = 0
for field in ['size', 'size-pack', 'size-garbage']:
try:
kbsize += int(oi[field])
except (KeyError, ValueError):
pass
logger.debug('%s size: %s kb', fullpath, kbsize)
return kbsize
def get_human_size(kbsize):
num = kbsize
for unit in ['Ki', 'Mi', 'Gi']:
if abs(num) < 1024.0:
return "%3.2f %sB" % (num, unit)
num /= 1024.0
return "%.2f%s TiB" % num
def set_repo_reclone(fullpath, reason):
rfile = os.path.join(fullpath, 'grokmirror.reclone')
# Have we already requested a reclone?
if os.path.exists(rfile):
logger.debug('Already requested repo reclone for %s', fullpath)
return
with open(rfile, 'w') as rfh:
rfh.write('Requested by grok-fsck due to error: %s' % reason)
def run_git_prune(fullpath, config):
# WARNING: We assume you've already verified that it's safe to do so
prune_ok = True
isprecious = grokmirror.is_precious(fullpath)
if isprecious:
set_precious_objects(fullpath, False)
# We set expire to yesterday in order to avoid race conditions
# in repositories that are actively being accessed at the time of
# running the prune job.
args = ['prune', '--expire=yesterday']
logger.info(' prune: pruning')
retcode, output, error = grokmirror.run_git_command(fullpath, args)
if error:
# Put things we recognize as fairly benign into debug
debug = list()
warn = list()
ierrors = set([x.strip() for x in config['fsck'].get('ignore_errors', '').split('\n')])
for line in error.split('\n'):
ignored = False
for estring in ierrors:
if line.find(estring) != -1:
ignored = True
debug.append(line)
break
if not ignored:
warn.append(line)
if debug:
logger.debug('Stderr: %s', '\n'.join(debug))
if warn:
prune_ok = False
log_errors(fullpath, args, warn)
check_reclone_error(fullpath, config, warn)
if isprecious:
set_precious_objects(fullpath, True)
return prune_ok
def is_safe_to_prune(fullpath, config):
if config['fsck'].get('prune', 'yes') != 'yes':
logger.debug('Pruning disabled in config file')
return False
toplevel = os.path.realpath(config['core'].get('toplevel'))
obstdir = os.path.realpath(config['core'].get('objstore'))
gitdir = '/' + os.path.relpath(fullpath, toplevel).lstrip('/')
if grokmirror.is_obstrepo(fullpath, obstdir):
# We only prune if all repos pointing to us are public
urls = set(grokmirror.list_repo_remotes(fullpath, withurl=True))
mine = set([x[1] for x in urls])
amap = grokmirror.get_altrepo_map(toplevel)
if mine != amap[fullpath]:
logger.debug('Cannot prune %s because it is used by non-public repos', gitdir)
return False
elif grokmirror.is_alt_repo(toplevel, gitdir):
logger.debug('Cannot prune %s because it is used as alternates by other repos', gitdir)
return False
logger.debug('%s should be safe to prune', gitdir)
return True
def run_git_repack(fullpath, config, level=1, prune=True):
# Returns false if we hit any errors on the way
repack_ok = True
obstdir = os.path.realpath(config['core'].get('objstore'))
toplevel = os.path.realpath(config['core'].get('toplevel'))
gitdir = '/' + os.path.relpath(fullpath, toplevel).lstrip('/')
ierrors = set([x.strip() for x in config['fsck'].get('ignore_errors', '').split('\n')])
if prune:
# Make sure it's safe to do so
prune = is_safe_to_prune(fullpath, config)
if config['fsck'].get('precious', '') == 'always':
always_precious = True
set_precious_objects(fullpath, enabled=True)
else:
always_precious = False
set_precious_objects(fullpath, enabled=False)
set_precious_after = False
gen_commitgraph = True
# Figure out what our repack flags should be.
repack_flags = list()
rregular = config['fsck'].get('extra_repack_flags', '').split()
if len(rregular):
repack_flags += rregular
full_repack_flags = ['-f', '--pack-kept-objects']
rfull = config['fsck'].get('extra_repack_flags_full', '').split()
if len(rfull):
full_repack_flags += rfull
if grokmirror.is_obstrepo(fullpath, obstdir):
set_precious_after = True
repack_flags.append('-a')
if not prune and not always_precious:
repack_flags.append('-k')
elif grokmirror.is_alt_repo(toplevel, gitdir):
set_precious_after = True
if grokmirror.get_altrepo(fullpath):
gen_commitgraph = False
logger.warning(' warning : has alternates and is used by others for alternates')
logger.warning(' : this can cause grandchild corruption')
repack_flags.append('-A')
repack_flags.append('-l')
else:
repack_flags.append('-a')
repack_flags.append('-b')
if not always_precious:
repack_flags.append('-k')
elif grokmirror.get_altrepo(fullpath):
# we are a "child repo"
gen_commitgraph = False
repack_flags.append('-l')
repack_flags.append('-A')
if prune:
repack_flags.append('--unpack-unreachable=yesterday')
else:
# we have no relationships with other repos
repack_flags.append('-a')
repack_flags.append('-b')
if prune:
repack_flags.append('--unpack-unreachable=yesterday')
if level > 1:
logger.info(' repack: performing a full repack for optimal deltas')
repack_flags += full_repack_flags
if not always_precious:
repack_flags.append('-d')
# If we have a logs dir, then run reflog expire
if os.path.isdir(os.path.join(fullpath, 'logs')):
args = ['reflog', 'expire', '--all', '--stale-fix']
logger.info(' reflog: expiring reflogs')
grokmirror.run_git_command(fullpath, args)
args = ['repack'] + repack_flags
logger.info(' repack: repacking with "%s"', ' '.join(repack_flags))
# We always tack on -q
args.append('-q')
retcode, output, error = grokmirror.run_git_command(fullpath, args)
# With newer versions of git, repack may return warnings that are safe to ignore
# so use the same strategy to weed out things we aren't interested in seeing
if error:
# Put things we recognize as fairly benign into debug
debug = list()
warn = list()
for line in error.split('\n'):
ignored = False
for estring in ierrors:
if line.find(estring) != -1:
ignored = True
debug.append(line)
break
if not ignored:
warn.append(line)
if debug:
logger.debug('Stderr: %s', '\n'.join(debug))
if warn:
repack_ok = False
log_errors(fullpath, args, warn)
check_reclone_error(fullpath, config, warn)
if not repack_ok:
# No need to repack refs if repo is broken
if set_precious_after:
set_precious_objects(fullpath, enabled=True)
return False
if gen_commitgraph and config['fsck'].get('commitgraph', 'yes') == 'yes':
grokmirror.set_git_config(fullpath, 'core.commitgraph', 'true')
run_git_commit_graph(fullpath)
# repacking refs requires a separate command, so run it now
args = ['pack-refs']
if level > 1:
logger.info(' packrefs: repacking all refs')
args.append('--all')
else:
logger.info(' packrefs: repacking refs')
retcode, output, error = grokmirror.run_git_command(fullpath, args)
# pack-refs shouldn't return anything, but use the same ignore_errors block
# to weed out any future potential benign warnings
if error:
# Put things we recognize as fairly benign into debug
debug = list()
warn = list()
for line in error.split('\n'):
ignored = False
for estring in ierrors:
if line.find(estring) != -1:
ignored = True
debug.append(line)
break
if not ignored:
warn.append(line)
if debug:
logger.debug('Stderr: %s', '\n'.join(debug))
if warn:
repack_ok = False
log_errors(fullpath, args, warn)
check_reclone_error(fullpath, config, warn)
if prune:
repack_ok = run_git_prune(fullpath, config)
if set_precious_after:
set_precious_objects(fullpath, enabled=True)
return repack_ok
def run_git_fsck(fullpath, config, conn_only=False):
args = ['fsck', '--no-progress', '--no-dangling', '--no-reflogs']
obstdir = os.path.realpath(config['core'].get('objstore'))
# If it's got an obstrepo, always run as connectivity-only
altrepo = grokmirror.get_altrepo(fullpath)
if altrepo and grokmirror.is_obstrepo(altrepo, obstdir):
logger.debug('Repo uses objstore, forcing connectivity-only')
conn_only = True
if conn_only:
args.append('--connectivity-only')
logger.info(' fsck: running with --connectivity-only')
else:
logger.info(' fsck: running full checks')
retcode, output, error = grokmirror.run_git_command(fullpath, args)
if output or error:
# Put things we recognize as fairly benign into debug
debug = list()
warn = list()
ierrors = set([x.strip() for x in config['fsck'].get('ignore_errors', '').split('\n')])
for line in output.split('\n') + error.split('\n'):
if not len(line.strip()):
continue
ignored = False
for estring in ierrors:
if line.find(estring) != -1:
ignored = True
debug.append(line)
break
if not ignored:
warn.append(line)
if debug:
logger.debug('Stderr: %s', '\n'.join(debug))
if warn:
log_errors(fullpath, args, warn)
check_reclone_error(fullpath, config, warn)
def run_git_commit_graph(fullpath):
# Does our version of git support commit-graph?
if not grokmirror.git_newer_than('2.18.0'):
logger.debug('Git version too old, not generating commit-graph')
logger.info(' graph: generating commit-graph')
args = ['commit-graph', 'write']
retcode, output, error = grokmirror.run_git_command(fullpath, args)
if retcode == 0:
return True
return False
def set_precious_objects(fullpath, enabled=True):
# It's better to just set it blindly without checking first,
# as this results in one fewer shell-out.
logger.debug('Setting preciousObjects for %s', fullpath)
if enabled:
poval = 'true'
else:
poval = 'false'
grokmirror.set_git_config(fullpath, 'extensions.preciousObjects', poval)
def check_precious_objects(fullpath):
return grokmirror.is_precious(fullpath)
def get_repack_level(obj_info, max_loose_objects=1200, max_packs=20, pc_loose_objects=10, pc_loose_size=10):
# for now, hardcode the maximum loose objects and packs
# XXX: we can probably set this in git config values?
# I don't think this makes sense as a global setting, because
# optimal values will depend on the size of the repo as a whole
packs = int(obj_info['packs'])
count_loose = int(obj_info['count'])
needs_repack = 0
# first, compare against max values:
if packs >= max_packs:
logger.debug('Triggering full repack because packs > %s', max_packs)
needs_repack = 2
elif count_loose >= max_loose_objects:
logger.debug('Triggering quick repack because loose objects > %s', max_loose_objects)
needs_repack = 1
else:
# is the number of loose objects or their size more than 10% of
# the overall total?
in_pack = int(obj_info['in-pack'])
size_loose = int(obj_info['size'])
size_pack = int(obj_info['size-pack'])
total_obj = count_loose + in_pack
total_size = size_loose + size_pack
# set some arbitrary "worth bothering" limits so we don't
# continuously repack tiny repos.
if total_obj > 500 and count_loose / total_obj * 100 >= pc_loose_objects:
logger.debug('Triggering repack because loose objects > %s%% of total', pc_loose_objects)
needs_repack = 1
elif total_size > 1024 and size_loose / total_size * 100 >= pc_loose_size:
logger.debug('Triggering repack because loose size > %s%% of total', pc_loose_size)
needs_repack = 1
return needs_repack
def fsck_mirror(config, force=False, repack_only=False, conn_only=False,
repack_all_quick=False, repack_all_full=False):
if repack_all_quick or repack_all_full:
force = True
statusfile = config['fsck'].get('statusfile')
if not statusfile:
logger.critical('Please define fsck.statusfile in the config')
return 1
st_dir = os.path.dirname(statusfile)
if not os.path.isdir(os.path.dirname(statusfile)):
logger.critical('Directory %s is absent', st_dir)
return 1
# Lock the tree to make sure we only run one instance
lockfile = os.path.join(st_dir, '.%s.lock' % os.path.basename(statusfile))
logger.debug('Attempting to obtain lock on %s', lockfile)
flockh = open(lockfile, 'w')
try:
lockf(flockh, LOCK_EX | LOCK_NB)
except IOError:
logger.info('Could not obtain exclusive lock on %s', lockfile)
logger.info('Assuming another process is running.')
return 0
manifile = config['core'].get('manifest')
logger.info('Analyzing %s', manifile)
grokmirror.manifest_lock(manifile)
manifest = grokmirror.read_manifest(manifile)
if os.path.exists(statusfile):
logger.info(' status: reading %s', statusfile)
stfh = open(statusfile, 'r')
# noinspection PyBroadException
try:
# Format of the status file:
# {
# '/full/path/to/repository': {
# 'lastcheck': 'YYYY-MM-DD' or 'never',
# 'nextcheck': 'YYYY-MM-DD',
# 'lastrepack': 'YYYY-MM-DD',
# 'fingerprint': 'sha-1',
# 's_elapsed': seconds,
# 'quick_repack_count': times,
# },
# ...
# }
status = json.loads(stfh.read())
except:
logger.critical('Failed to parse %s', statusfile)
lockf(flockh, LOCK_UN)
flockh.close()
return 1
else:
status = dict()
frequency = config['fsck'].getint('frequency', 30)
today = datetime.datetime.today()
todayiso = today.strftime('%F')
if force:
# Use randomization for next check, again
checkdelay = random.randint(1, frequency)
else:
checkdelay = frequency
commitgraph = config['fsck'].getboolean('commitgraph', True)
# Is our git version new enough to support it?
if commitgraph and not grokmirror.git_newer_than('2.18.0'):
logger.info('Git version too old to support commit graphs, disabling')
config['fsck']['commitgraph'] = 'no'
# Go through the manifest and compare with status
toplevel = os.path.realpath(config['core'].get('toplevel'))
changed = False
for gitdir in list(manifest):
fullpath = os.path.join(toplevel, gitdir.lstrip('/'))
# Does it exist?
if not os.path.isdir(fullpath):
# Remove it from manifest and status
manifest.pop(gitdir)
status.pop(fullpath)
changed = True
continue
if fullpath not in status.keys():
# Newly added repository
if not force:
# Randomize next check between now and frequency
delay = random.randint(0, frequency)
nextdate = today + datetime.timedelta(days=delay)
nextcheck = nextdate.strftime('%F')
else:
nextcheck = todayiso
status[fullpath] = {
'lastcheck': 'never',
'nextcheck': nextcheck,
'fingerprint': grokmirror.get_repo_fingerprint(toplevel, gitdir),
}
logger.info('%s:', fullpath)
logger.info(' added: next check on %s', nextcheck)
if 'manifest' in config:
pretty = config['manifest'].getboolean('pretty', False)
else:
pretty = False
if changed:
grokmirror.write_manifest(manifile, manifest, pretty=pretty)
grokmirror.manifest_unlock(manifile)
# record newly found repos in the status file
logger.debug('Updating status file in %s', statusfile)
with open(statusfile, 'w') as stfh:
stfh.write(json.dumps(status, indent=2))
# Go through status and find all repos that need work done on them.
to_process = set()
total_checked = 0
total_elapsed = 0
space_saved = 0
cfg_repack = config['fsck'].getboolean('repack', True)
# Can be "always", which is why we don't getboolean
cfg_precious = config['fsck'].get('precious', 'yes')
obstdir = os.path.realpath(config['core'].get('objstore'))
logger.info(' search: getting parent commit info from all repos, may take a while')
top_roots, obst_roots = grokmirror.get_rootsets(toplevel, obstdir)
amap = grokmirror.get_altrepo_map(toplevel)
fetched_obstrepos = set()
obst_changes = False
analyzed = 0
queued = 0
logger.info('Analyzing %s (%s repos)', toplevel, len(status))
stattime = time.time()
baselines = [x.strip() for x in config['fsck'].get('baselines', '').split('\n')]
for fullpath in list(status):
# Give me a status every 5 seconds
if time.time() - stattime >= 5:
logger.info(' ---: %s/%s analyzed, %s queued', analyzed, len(status), queued)
stattime = time.time()
start_size = get_repo_size(fullpath)
analyzed += 1
# We do obstrepos separately below, as logic is different
if grokmirror.is_obstrepo(fullpath, obstdir):
logger.debug('Skipping %s (obstrepo)')
continue
# Check to make sure it's still in the manifest
gitdir = fullpath.replace(toplevel, '', 1)
gitdir = '/' + gitdir.lstrip('/')
if gitdir not in manifest:
status.pop(fullpath)
logger.debug('%s is gone, no longer in manifest', gitdir)
continue
# Make sure FETCH_HEAD is pointing to /dev/null
fetch_headf = os.path.join(fullpath, 'FETCH_HEAD')
if not os.path.islink(fetch_headf):
logger.debug(' replacing FETCH_HEAD with symlink to /dev/null')
try:
os.unlink(fetch_headf)
except FileNotFoundError:
pass
os.symlink('/dev/null', fetch_headf)
# Objstore migration routines
# Are we using objstore?
altdir = grokmirror.get_altrepo(fullpath)
is_private = grokmirror.is_private_repo(config, gitdir)
if grokmirror.is_alt_repo(toplevel, gitdir):
# Don't prune any repos that are parents -- until migration is fully complete
m_prune = False
else:
m_prune = True
if not altdir and not os.path.exists(os.path.join(fullpath, 'grokmirror.do-not-objstore')):
# Do we match any obstdir repos?
obstrepo = grokmirror.find_best_obstrepo(fullpath, obst_roots, toplevel, baselines)
if obstrepo:
obst_changes = True
# Yes, set ourselves up to be using that obstdir
logger.info('%s: can use %s', gitdir, os.path.basename(obstrepo))
grokmirror.set_altrepo(fullpath, obstrepo)
if not is_private:
grokmirror.add_repo_to_objstore(obstrepo, fullpath)
# Fetch into the obstrepo
logger.info(' fetch: fetching %s', gitdir)
grokmirror.fetch_objstore_repo(obstrepo, fullpath)
obst_roots[obstrepo] = grokmirror.get_repo_roots(obstrepo, force=True)
run_git_repack(fullpath, config, level=1, prune=m_prune)
space_saved += start_size - get_repo_size(fullpath)
else:
# Do we have any toplevel siblings?
obstrepo = None
my_roots = grokmirror.get_repo_roots(fullpath)
top_siblings = grokmirror.find_siblings(fullpath, my_roots, top_roots)
if len(top_siblings):
# Am I a private repo?
if is_private:
# Are there any non-private siblings?
for top_sibling in top_siblings:
# Are you a private repo?
if grokmirror.is_private_repo(config, top_sibling):
continue
# Great, make an objstore repo out of this sibling
obstrepo = grokmirror.setup_objstore_repo(obstdir)
logger.info('%s: can use %s', gitdir, os.path.basename(obstrepo))
logger.info(' init: new objstore repo %s', os.path.basename(obstrepo))
grokmirror.add_repo_to_objstore(obstrepo, top_sibling)
# Fetch into the obstrepo
logger.info(' fetch: fetching %s', top_sibling)
grokmirror.fetch_objstore_repo(obstrepo, top_sibling)
obst_roots[obstrepo] = grokmirror.get_repo_roots(obstrepo, force=True)
# It doesn't matter if this fails, because repacking is still safe
# Other siblings will match in their own due course
break
else:
# Make an objstore repo out of myself
obstrepo = grokmirror.setup_objstore_repo(obstdir)
logger.info('%s: can use %s', gitdir, os.path.basename(obstrepo))
logger.info(' init: new objstore repo %s', os.path.basename(obstrepo))
grokmirror.add_repo_to_objstore(obstrepo, fullpath)
if obstrepo:
obst_changes = True
# Set alternates to the obstrepo
grokmirror.set_altrepo(fullpath, obstrepo)
if not is_private:
# Fetch into the obstrepo
logger.info(' fetch: fetching %s', gitdir)
grokmirror.fetch_objstore_repo(obstrepo, fullpath)
run_git_repack(fullpath, config, level=1, prune=m_prune)
space_saved += start_size - get_repo_size(fullpath)
obst_roots[obstrepo] = grokmirror.get_repo_roots(obstrepo, force=True)
elif not os.path.isdir(altdir):
logger.critical(' reclone: %s (alternates repo gone)', gitdir)
set_repo_reclone(fullpath, 'Alternates repository gone')
continue
elif altdir.find(obstdir) != 0:
# We have an alternates repo, but it's not an objstore repo
# Probably left over from grokmirror-1.x
# Do we have any matching obstrepos?
obstrepo = grokmirror.find_best_obstrepo(fullpath, obst_roots, toplevel, baselines)
if obstrepo:
logger.info('%s: migrating to %s', gitdir, os.path.basename(obstrepo))
if altdir not in fetched_obstrepos:
# We're already sharing objects with altdir, so no need to check if it's private
grokmirror.add_repo_to_objstore(obstrepo, altdir)
logger.info(' fetch: fetching %s (previous parent)', os.path.relpath(altdir, toplevel))
success = grokmirror.fetch_objstore_repo(obstrepo, altdir)
fetched_obstrepos.add(altdir)
if success:
set_precious_objects(altdir, enabled=False)
pre_size = get_repo_size(altdir)
run_git_repack(altdir, config, level=1, prune=False)
space_saved += pre_size - get_repo_size(altdir)
else:
logger.critical('Unsuccessful fetching %s into %s', altdir, os.path.basename(obstrepo))
obstrepo = None
else:
# Make a new obstrepo out of mommy
obstrepo = grokmirror.setup_objstore_repo(obstdir)
logger.info('%s: migrating to %s', gitdir, os.path.basename(obstrepo))
logger.info(' init: new objstore repo %s', os.path.basename(obstrepo))
grokmirror.add_repo_to_objstore(obstrepo, altdir)
logger.info(' fetch: fetching %s (previous parent)', os.path.relpath(altdir, toplevel))
success = grokmirror.fetch_objstore_repo(obstrepo, altdir)
fetched_obstrepos.add(altdir)
if success:
grokmirror.set_altrepo(altdir, obstrepo)
# mommy is no longer precious
set_precious_objects(altdir, enabled=False)
# Don't prune, because there may be objects others are still borrowing
# It can only be pruned once the full migration is completed
pre_size = get_repo_size(altdir)
run_git_repack(altdir, config, level=1, prune=False)
space_saved += pre_size - get_repo_size(altdir)
else:
logger.critical('Unsuccessful fetching %s into %s', altdir, os.path.basename(obstrepo))
obstrepo = None
if obstrepo:
obst_changes = True
if not is_private:
# Fetch into the obstrepo
grokmirror.add_repo_to_objstore(obstrepo, fullpath)
logger.info(' fetch: fetching %s', gitdir)
if grokmirror.fetch_objstore_repo(obstrepo, fullpath):
grokmirror.set_altrepo(fullpath, obstrepo)
set_precious_objects(fullpath, enabled=False)
run_git_repack(fullpath, config, level=1, prune=m_prune)
space_saved += start_size - get_repo_size(fullpath)
else:
# Grab all the objects from the previous parent, since we can't simply
# fetch ourselves into the obstrepo (we're private).
args = ['repack', '-a']
logger.info(' fetch: restoring private repo %s', gitdir)
if grokmirror.run_git_command(fullpath, args):
grokmirror.set_altrepo(fullpath, obstrepo)
set_precious_objects(fullpath, enabled=False)
# Now repack ourselves to get rid of any public objects
run_git_repack(fullpath, config, level=1, prune=m_prune)
obst_roots[obstrepo] = grokmirror.get_repo_roots(obstrepo, force=True)
elif altdir.find(obstdir) == 0 and not is_private:
# Make sure this repo is properly set up with obstrepo
# (e.g. it could have been cloned/copied and obstrepo is not tracking it yet)
obstrepo = altdir
s_remotes = grokmirror.list_repo_remotes(obstrepo, withurl=True)
found = False
for virtref, childpath in s_remotes:
if childpath == fullpath:
found = True
break
if not found:
# Set it up properly
grokmirror.add_repo_to_objstore(obstrepo, fullpath)
logger.info(' reconfig: %s to fetch into %s', gitdir, os.path.basename(obstrepo))
obj_info = grokmirror.get_repo_obj_info(fullpath)
try:
packs = int(obj_info['packs'])
count_loose = int(obj_info['count'])
except KeyError:
logger.warning('Unable to count objects in %s, skipping' % fullpath)
continue
schedcheck = datetime.datetime.strptime(status[fullpath]['nextcheck'], '%Y-%m-%d')
nextcheck = today + datetime.timedelta(days=checkdelay)
if not cfg_repack:
# don't look at me if you turned off repack
logger.debug('Not repacking because repack=no in config')
repack_level = None
elif repack_all_full and (count_loose > 0 or packs > 1):
logger.debug('repack_level=2 due to repack_all_full')
repack_level = 2
elif repack_all_quick and count_loose > 0:
logger.debug('repack_level=1 due to repack_all_quick')
repack_level = 1
elif status[fullpath].get('fingerprint') != grokmirror.get_repo_fingerprint(toplevel, gitdir):
logger.debug('Checking repack level of %s', fullpath)
repack_level = get_repack_level(obj_info)
else:
repack_level = None
# trigger a level-1 repack if it's regular check time and the fingerprint has changed
if (not repack_level and schedcheck <= today
and status[fullpath].get('fingerprint') != grokmirror.get_repo_fingerprint(toplevel, gitdir)):
status[fullpath]['nextcheck'] = nextcheck.strftime('%F')
logger.info(' aged: %s (forcing repack)', fullpath)
repack_level = 1
# If we're not already repacking the repo, run a prune if we find garbage in it
if obj_info['garbage'] != '0' and not repack_level and is_safe_to_prune(fullpath, config):
logger.info(' garbage: %s (%s files, %s KiB)', gitdir, obj_info['garbage'], obj_info['size-garbage'])
try:
grokmirror.lock_repo(fullpath, nonblocking=True)
run_git_prune(fullpath, config)
grokmirror.unlock_repo(fullpath)
except IOError:
pass
if repack_level and (cfg_precious == 'always' and check_precious_objects(fullpath)):
# if we have preciousObjects, then we only repack based on the same
# schedule as fsck.
logger.debug('preciousObjects is set')
# for repos with preciousObjects, we use the fsck schedule for repacking
if schedcheck <= today:
logger.debug('Time for a full periodic repack of a preciousObjects repo')
status[fullpath]['nextcheck'] = nextcheck.strftime('%F')
repack_level = 2
else:
logger.debug('Not repacking preciousObjects repo outside of schedule')
repack_level = None
if repack_level:
queued += 1
to_process.add((fullpath, 'repack', repack_level))
if repack_level > 1:
logger.info(' queued: %s (full repack)', fullpath)
else:
logger.info(' queued: %s (repack)', fullpath)
elif repack_only or repack_all_quick or repack_all_full:
continue
elif schedcheck <= today or force:
queued += 1
to_process.add((fullpath, 'fsck', None))
logger.info(' queued: %s (fsck)', fullpath)
logger.info(' done: %s analyzed, %s queued', analyzed, queued)
if obst_changes:
# Refresh the alt repo map cache
amap = grokmirror.get_altrepo_map(toplevel, refresh=True)
obstrepos = grokmirror.find_all_gitdirs(obstdir, normalize=True, exclude_objstore=False)
analyzed = 0
queued = 0
logger.info('Analyzing %s (%s repos)', obstdir, len(obstrepos))
objstore_uses_plumbing = config['core'].getboolean('objstore_uses_plumbing', False)
islandcores = [x.strip() for x in config['fsck'].get('islandcores', '').split('\n')]
stattime = time.time()
for obstrepo in obstrepos:
if time.time() - stattime >= 5:
logger.info(' ---: %s/%s analyzed, %s queued', analyzed, len(obstrepos), queued)
stattime = time.time()
analyzed += 1
logger.debug('Processing objstore repo: %s', os.path.basename(obstrepo))
my_roots = grokmirror.get_repo_roots(obstrepo)
if obstrepo in amap and len(amap[obstrepo]):
# Is it redundant with any other objstore repos?
strategy = config['fsck'].get('obstrepo_merge_strategy', 'exact')
if strategy == 'blobs':
siblings = find_siblings_by_blobs(obstrepo, obstdir, ratio=75)
else:
exact_merge = True
if strategy == 'loose':
exact_merge = False
siblings = grokmirror.find_siblings(obstrepo, my_roots, obst_roots, exact=exact_merge)
if len(siblings):
siblings.add(obstrepo)
mdest = merge_siblings(siblings, amap)
obst_changes = True
if mdest in status:
# Force full repack of merged obstrepos
status[mdest]['nextcheck'] = todayiso
# Recalculate my roots
my_roots = grokmirror.get_repo_roots(obstrepo, force=True)
obst_roots[obstrepo] = my_roots
# Not an else, because the previous step may have migrated things
if obstrepo not in amap or not len(amap[obstrepo]):
obst_changes = True
# XXX: Is there a possible race condition here if grok-pull cloned a new repo
# while we were migrating this one?
logger.info('%s: deleting (no longer used by anything)', os.path.basename(obstrepo))
if obstrepo in amap:
amap.pop(obstrepo)
shutil.rmtree(obstrepo)
continue
# Record the latest sibling info in the tracking file
telltale = os.path.join(obstrepo, 'grokmirror.objstore')
with open(telltale, 'w') as fh:
fh.write(grokmirror.OBST_PREAMBULE)
fh.write('\n'.join(sorted(list(amap[obstrepo]))) + '\n')
my_remotes = grokmirror.list_repo_remotes(obstrepo, withurl=True)
# Use the first child repo as our "reference" entry in manifest
refrepo = None
# Use for the alternateRefsPrefixes value
baseline_refs = set()
set_islandcore = False
new_islandcore = False
valid_virtrefs = set()
for virtref, childpath in my_remotes:
# Is it still relevant?
if childpath not in amap[obstrepo]:
# Remove it and let prune take care of it
grokmirror.remove_from_objstore(obstrepo, childpath)
logger.info('%s: removed remote %s (no longer used)', os.path.basename(obstrepo), childpath)
continue
valid_virtrefs.add(virtref)
# Does it need fetching?
fetch = True
l_fpf = os.path.join(obstrepo, 'grokmirror.%s.fingerprint' % virtref)
r_fpf = os.path.join(childpath, 'grokmirror.fingerprint')
try:
with open(l_fpf) as fh:
l_fp = fh.read().strip()
with open(r_fpf) as fh:
r_fp = fh.read().strip()
if l_fp == r_fp:
fetch = False
except IOError:
pass
gitdir = '/' + os.path.relpath(childpath, toplevel)
if fetch:
grokmirror.lock_repo(obstrepo, nonblocking=False)
logger.info(' fetch: %s -> %s', gitdir, os.path.basename(obstrepo))
success = grokmirror.fetch_objstore_repo(obstrepo, childpath, use_plumbing=objstore_uses_plumbing)
if not success and objstore_uses_plumbing:
# Try using git porcelain
grokmirror.fetch_objstore_repo(obstrepo, childpath)
grokmirror.unlock_repo(obstrepo)
if gitdir not in manifest:
continue
# Do we need to set any alternateRefsPrefixes?
for baseline in baselines:
# Does this repo match a baseline
if fnmatch.fnmatch(gitdir, baseline):
baseline_refs.add('refs/virtual/%s/heads/' % virtref)
break
# Do we need to set islandCore?
if not set_islandcore:
is_islandcore = False
for islandcore in islandcores:
# Does this repo match a baseline
if fnmatch.fnmatch(gitdir, islandcore):
is_islandcore = True
break
if is_islandcore:
set_islandcore = True
# is it already set to that?
entries = grokmirror.get_config_from_git(obstrepo, r'pack\.island*')
if entries.get('islandcore') != virtref:
new_islandcore = True
logger.info(' reconfig: %s (islandCore to %s)', os.path.basename(obstrepo), virtref)
grokmirror.set_git_config(obstrepo, 'pack.islandCore', virtref)
if refrepo is None:
# Legacy "reference=" setting in manifest
refrepo = gitdir
manifest[gitdir]['reference'] = None
else:
manifest[gitdir]['reference'] = refrepo
manifest[gitdir]['forkgroup'] = os.path.basename(obstrepo[:-4])
if len(baseline_refs):
# sort the list, so we have deterministic value
br = list(baseline_refs)
br.sort()
refpref = ' '.join(br)
# Go through all remotes and set their alternateRefsPrefixes
for s_virtref, s_childpath in my_remotes:
# is it already set to that?
entries = grokmirror.get_config_from_git(s_childpath, r'core\.alternate*')
if entries.get('alternaterefsprefixes') != refpref:
s_gitdir = '/' + os.path.relpath(s_childpath, toplevel)
logger.info(' reconfig: %s (baseline)', s_gitdir)
grokmirror.set_git_config(s_childpath, 'core.alternateRefsPrefixes', refpref)
repack_requested = False
if os.path.exists(os.path.join(obstrepo, 'grokmirror.repack')):
repack_requested = True
# Go through all our refs and find all stale virtrefs
args = ['for-each-ref', '--format=%(refname)', 'refs/virtual/']
trimmed_virtrefs = set()
ecode, out, err = grokmirror.run_git_command(obstrepo, args)
if ecode == 0 and out:
for line in out.split('\n'):
chunks = line.split('/')
if len(chunks) < 3:
# Where did this come from?
logger.debug('Weird ref %s in objstore repo %s', line, obstrepo)
continue
virtref = chunks[2]
if virtref not in valid_virtrefs and virtref not in trimmed_virtrefs:
logger.info(' trim: stale virtref %s', virtref)
grokmirror.objstore_trim_virtref(obstrepo, virtref)
trimmed_virtrefs.add(virtref)
if obstrepo not in status or new_islandcore or trimmed_virtrefs or repack_requested:
# We don't use obstrepo fingerprints, so we set it to None
status[obstrepo] = {
'lastcheck': 'never',
'nextcheck': todayiso,
'fingerprint': None,
}
# Always full-repack brand new obstrepos
repack_level = 2
else:
obj_info = grokmirror.get_repo_obj_info(obstrepo)
repack_level = get_repack_level(obj_info)
nextcheck = datetime.datetime.strptime(status[obstrepo]['nextcheck'], '%Y-%m-%d')
if repack_level > 1 and nextcheck > today:
# Don't do full repacks outside of schedule
repack_level = 1
if repack_level:
queued += 1
to_process.add((obstrepo, 'repack', repack_level))
if repack_level > 1:
logger.info(' queued: %s (full repack)', os.path.basename(obstrepo))
else:
logger.info(' queued: %s (repack)', os.path.basename(obstrepo))
elif repack_only or repack_all_quick or repack_all_full:
continue
elif (nextcheck <= today or force) and not repack_only:
queued += 1
status[obstrepo]['nextcheck'] = nextcheck.strftime('%F')
to_process.add((obstrepo, 'fsck', None))
logger.info(' queued: %s (fsck)', os.path.basename(obstrepo))
logger.info(' done: %s analyzed, %s queued', analyzed, queued)
if obst_changes:
# We keep the same mtime, because the repos themselves haven't changed
grokmirror.manifest_lock(manifile)
# Re-read manifest, so we can update reference and forkgroup data
disk_manifest = grokmirror.read_manifest(manifile)
# Go through my manifest and update and changes in forkgroup data
for gitdir in manifest:
if gitdir not in disk_manifest:
# What happened here?
continue
if 'reference' in manifest[gitdir]:
disk_manifest[gitdir]['reference'] = manifest[gitdir]['reference']
if 'forkgroup' in manifest[gitdir]:
disk_manifest[gitdir]['forkgroup'] = manifest[gitdir]['forkgroup']
grokmirror.write_manifest(manifile, disk_manifest, pretty=pretty)
grokmirror.manifest_unlock(manifile)
if not len(to_process):
logger.info('No repos need attention.')
return
# Delete some vars that are huge for large repo sets -- we no longer need them and the
# next step will likely eat lots of ram.
del obst_roots
del top_roots
gc.collect()
logger.info('Processing %s repositories', len(to_process))
for fullpath, action, repack_level in to_process:
logger.info('%s:', fullpath)
start_size = get_repo_size(fullpath)
checkdelay = frequency if not force else random.randint(1, frequency)
nextcheck = today + datetime.timedelta(days=checkdelay)
# Calculate elapsed seconds
startt = time.time()
# Wait till the repo is available and lock it for the duration of checks,
# otherwise there may be false-positives if a mirrored repo is updated
# in the middle of fsck or repack.
grokmirror.lock_repo(fullpath, nonblocking=False)
if action == 'repack':
if run_git_repack(fullpath, config, repack_level):
status[fullpath]['lastrepack'] = todayiso
if repack_level > 1:
try:
os.unlink(os.path.join(fullpath, 'grokmirror.repack'))
except FileNotFoundError:
pass
status[fullpath]['lastfullrepack'] = todayiso
status[fullpath]['lastcheck'] = todayiso
status[fullpath]['nextcheck'] = nextcheck.strftime('%F')
# Do we need to generate a preload bundle?
if config['fsck'].get('preload_bundle_outdir') and grokmirror.is_obstrepo(fullpath, obstdir):
gen_preload_bundle(fullpath, config)
logger.info(' next: %s', status[fullpath]['nextcheck'])
else:
logger.warning('Repacking %s was unsuccessful', fullpath)
grokmirror.unlock_repo(fullpath)
continue
elif action == 'fsck':
run_git_fsck(fullpath, config, conn_only)
status[fullpath]['lastcheck'] = todayiso
status[fullpath]['nextcheck'] = nextcheck.strftime('%F')
logger.info(' next: %s', status[fullpath]['nextcheck'])
gitdir = '/' + os.path.relpath(fullpath, toplevel)
status[fullpath]['fingerprint'] = grokmirror.get_repo_fingerprint(toplevel, gitdir)
# noinspection PyTypeChecker
elapsed = int(time.time()-startt)
status[fullpath]['s_elapsed'] = elapsed
# We're done with the repo now
grokmirror.unlock_repo(fullpath)
total_checked += 1
total_elapsed += elapsed
saved = start_size - get_repo_size(fullpath)
space_saved += saved
if saved > 0:
logger.info(' done: %ss, %s saved', elapsed, get_human_size(saved))
else:
logger.info(' done: %ss', elapsed)
if space_saved > 0:
logger.info(' ---: %s done, %s queued, %s saved', total_checked,
len(to_process)-total_checked, get_human_size(space_saved))
else:
logger.info(' ---: %s done, %s queued', total_checked, len(to_process)-total_checked)
# Write status file after each check, so if the process dies, we won't
# have to recheck all the repos we've already checked
logger.debug('Updating status file in %s', statusfile)
with open(statusfile, 'w') as stfh:
stfh.write(json.dumps(status, indent=2))
logger.info('Processed %s repos in %0.2fs', total_checked, total_elapsed)
with open(statusfile, 'w') as stfh:
stfh.write(json.dumps(status, indent=2))
lockf(flockh, LOCK_UN)
flockh.close()
def parse_args():
import argparse
# noinspection PyTypeChecker
op = argparse.ArgumentParser(prog='grok-fsck',
description='Optimize and check mirrored repositories',
formatter_class=argparse.ArgumentDefaultsHelpFormatter)
op.add_argument('-v', '--verbose', dest='verbose', action='store_true',
default=False,
help='Be verbose and tell us what you are doing')
op.add_argument('-f', '--force', dest='force',
action='store_true', default=False,
help='Force immediate run on all repositories')
op.add_argument('-c', '--config', dest='config',
required=True,
help='Location of the configuration file')
op.add_argument('--repack-only', dest='repack_only',
action='store_true', default=False,
help='Only find and repack repositories that need optimizing')
op.add_argument('--connectivity-only', dest='conn_only',
action='store_true', default=False,
help='Only check connectivity when running fsck checks')
op.add_argument('--repack-all-quick', dest='repack_all_quick',
action='store_true', default=False,
help='(Assumes --force): Do a quick repack of all repos')
op.add_argument('--repack-all-full', dest='repack_all_full',
action='store_true', default=False,
help='(Assumes --force): Do a full repack of all repos')
op.add_argument('--version', action='version', version=grokmirror.VERSION)
opts = op.parse_args()
if opts.repack_all_quick and opts.repack_all_full:
op.error('Pick either --repack-all-full or --repack-all-quick')
return opts
def grok_fsck(cfgfile, verbose=False, force=False, repack_only=False, conn_only=False,
repack_all_quick=False, repack_all_full=False):
global logger
config = grokmirror.load_config_file(cfgfile)
obstdir = config['core'].get('objstore', None)
if obstdir is None:
obstdir = os.path.join(config['core'].get('toplevel'), 'objstore')
config['core']['objstore'] = obstdir
logfile = config['core'].get('log', None)
if config['core'].get('loglevel', 'info') == 'debug':
loglevel = logging.DEBUG
else:
loglevel = logging.INFO
logger = grokmirror.init_logger('fsck', logfile, loglevel, verbose)
rh = io.StringIO()
ch = logging.StreamHandler(stream=rh)
formatter = logging.Formatter('%(message)s')
ch.setFormatter(formatter)
ch.setLevel(logging.CRITICAL)
logger.addHandler(ch)
fsck_mirror(config, force, repack_only, conn_only, repack_all_quick, repack_all_full)
report = rh.getvalue()
if len(report):
msg = EmailMessage()
msg.set_content(report)
subject = config['fsck'].get('report_subject')
if not subject:
import platform
subject = 'grok-fsck errors on {} ({})'.format(platform.node(), cfgfile)
msg['Subject'] = subject
from_addr = config['fsck'].get('report_from', 'root')
msg['From'] = from_addr
report_to = config['fsck'].get('report_to', 'root')
msg['To'] = report_to
mailhost = config['fsck'].get('report_mailhost', 'localhost')
s = smtplib.SMTP(mailhost)
s.send_message(msg)
s.quit()
def command():
opts = parse_args()
return grok_fsck(opts.config, opts.verbose, opts.force, opts.repack_only, opts.conn_only,
opts.repack_all_quick, opts.repack_all_full)
if __name__ == '__main__':
command()
grokmirror-2.0.11/grokmirror/manifest.py 0000775 0000000 0000000 00000035102 14103301457 0020347 0 ustar 00root root 0000000 0000000 # -*- coding: utf-8 -*-
# Copyright (C) 2013-2020 by The Linux Foundation and contributors
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see .
import os
import sys
import logging
import datetime
import grokmirror
logger = logging.getLogger(__name__)
objstore_uses_plumbing = False
def update_manifest(manifest, toplevel, fullpath, usenow, ignorerefs):
logger.debug('Examining %s', fullpath)
if not grokmirror.is_bare_git_repo(fullpath):
logger.critical('Error opening %s.', fullpath)
logger.critical('Make sure it is a bare git repository.')
sys.exit(1)
gitdir = '/' + os.path.relpath(fullpath, toplevel)
repoinfo = grokmirror.get_repo_defs(toplevel, gitdir, usenow=usenow, ignorerefs=ignorerefs)
# Ignore it if it's an empty git repository
if not repoinfo['fingerprint']:
logger.info(' manifest: ignored %s (no heads)', gitdir)
return
if gitdir not in manifest:
# In grokmirror-1.x we didn't normalize paths to be always with a leading '/', so
# check the manifest for both and make sure we only save the path with a leading /
if gitdir.lstrip('/') in manifest:
manifest[gitdir] = manifest.pop(gitdir.lstrip('/'))
logger.info(' manifest: updated %s', gitdir)
else:
logger.info(' manifest: added %s', gitdir)
manifest[gitdir] = dict()
else:
logger.info(' manifest: updated %s', gitdir)
altrepo = grokmirror.get_altrepo(fullpath)
reference = None
if manifest[gitdir].get('forkgroup', None) != repoinfo.get('forkgroup', None):
# Use the first remote listed in the forkgroup as our reference, just so
# grokmirror-1.x clients continue to work without doing full clones
remotes = grokmirror.list_repo_remotes(altrepo, withurl=True)
if len(remotes):
urls = list(x[1] for x in remotes)
urls.sort()
reference = '/' + os.path.relpath(urls[0], toplevel)
else:
reference = manifest[gitdir].get('reference', None)
if altrepo and not reference and not repoinfo.get('forkgroup'):
# Not an objstore repo
reference = '/' + os.path.relpath(altrepo, toplevel)
manifest[gitdir].update(repoinfo)
# Always write a reference entry even if it's None, as grok-1.x clients expect it
manifest[gitdir]['reference'] = reference
def set_symlinks(manifest, toplevel, symlinks):
for symlink in symlinks:
target = os.path.realpath(symlink)
if not os.path.exists(target):
logger.critical(' manifest: symlink %s is broken, ignored', symlink)
continue
relative = '/' + os.path.relpath(symlink, toplevel)
if target.find(toplevel) < 0:
logger.critical(' manifest: symlink %s points outside toplevel, ignored', relative)
continue
tgtgitdir = '/' + os.path.relpath(target, toplevel)
if tgtgitdir not in manifest:
logger.critical(' manifest: symlink %s points to %s, which we do not recognize', relative, tgtgitdir)
continue
if 'symlinks' in manifest[tgtgitdir]:
if relative not in manifest[tgtgitdir]['symlinks']:
logger.info(' manifest: symlinked %s->%s', relative, tgtgitdir)
manifest[tgtgitdir]['symlinks'].append(relative)
else:
logger.info(' manifest: %s->%s is already in manifest', relative, tgtgitdir)
else:
manifest[tgtgitdir]['symlinks'] = [relative]
logger.info(' manifest: symlinked %s->%s', relative, tgtgitdir)
# Now go through all repos and fix any references pointing to the
# symlinked location. We shouldn't need to do anything with forkgroups.
for gitdir in manifest:
if manifest[gitdir] == relative:
logger.info(' manifest: removing %s (replaced by a symlink)', gitdir)
manifest.pop(gitdir)
continue
if manifest[gitdir]['reference'] == relative:
logger.info(' manifest: symlinked %s->%s', relative, tgtgitdir)
manifest[gitdir]['reference'] = tgtgitdir
def purge_manifest(manifest, toplevel, gitdirs):
for oldrepo in list(manifest):
if os.path.join(toplevel, oldrepo.lstrip('/')) not in gitdirs:
logger.info(' manifest: purged %s (gone)', oldrepo)
manifest.remove(oldrepo)
def parse_args():
global objstore_uses_plumbing
import argparse
# noinspection PyTypeChecker
op = argparse.ArgumentParser(prog='grok-manifest',
description='Create or update a manifest file',
formatter_class=argparse.ArgumentDefaultsHelpFormatter)
op.add_argument('--cfgfile', dest='cfgfile',
default=None,
help='Path to grokmirror.conf containing at least a [core] section')
op.add_argument('-m', '--manifest', dest='manifile',
help='Location of manifest.js or manifest.js.gz')
op.add_argument('-t', '--toplevel', dest='toplevel',
help='Top dir where all repositories reside')
op.add_argument('-l', '--logfile', dest='logfile',
default=None,
help='When specified, will put debug logs in this location')
op.add_argument('-n', '--use-now', dest='usenow', action='store_true',
default=False,
help='Use current timestamp instead of parsing commits')
op.add_argument('-c', '--check-export-ok', dest='check_export_ok',
action='store_true', default=False,
help='Export only repositories marked as git-daemon-export-ok')
op.add_argument('-p', '--purge', dest='purge', action='store_true',
default=False,
help='Purge deleted git repositories from manifest')
op.add_argument('-x', '--remove', dest='remove', action='store_true',
default=False,
help='Remove repositories passed as arguments from manifest')
op.add_argument('-y', '--pretty', dest='pretty', action='store_true',
default=False,
help='Pretty-print manifest (sort keys and add indentation)')
op.add_argument('-i', '--ignore-paths', dest='ignore', action='append',
default=None,
help='When finding git dirs, ignore these paths (accepts shell-style globbing)')
op.add_argument('-r', '--ignore-refs', dest='ignore_refs', action='append', default=None,
help='Refs to exclude from fingerprint calculation (e.g. refs/meta/*)')
op.add_argument('-w', '--wait-for-manifest', dest='wait',
action='store_true', default=False,
help='When running with arguments, wait if manifest is not there '
'(can be useful when multiple writers are writing the manifest)')
op.add_argument('-o', '--fetch-objstore', dest='fetchobst',
action='store_true', default=False,
help='Fetch updates into objstore repo (if used)')
op.add_argument('-v', '--verbose', dest='verbose', action='store_true',
default=False,
help='Be verbose and tell us what you are doing')
op.add_argument('--version', action='version', version=grokmirror.VERSION)
op.add_argument('paths', nargs='*', help='Full path(s) to process')
opts = op.parse_args()
if opts.cfgfile:
config = grokmirror.load_config_file(opts.cfgfile)
if not opts.manifile:
opts.manifile = config['core'].get('manifest')
if not opts.toplevel:
opts.toplevel = os.path.realpath(config['core'].get('toplevel'))
if not opts.logfile:
opts.logfile = config['core'].get('logfile')
objstore_uses_plumbing = config['core'].getboolean('objstore_uses_plumbing', False)
if 'manifest' in config:
if not opts.ignore:
opts.ignore = [x.strip() for x in config['manifest'].get('ignore', '').split('\n')]
if not opts.check_export_ok:
opts.check_export_ok = config['manifest'].getboolean('check_export_ok', False)
if not opts.pretty:
opts.pretty = config['manifest'].getboolean('pretty', False)
if not opts.fetchobst:
opts.fetchobst = config['manifest'].getboolean('fetch_objstore', False)
if not opts.manifile:
op.error('You must provide the path to the manifest file')
if not opts.toplevel:
op.error('You must provide the toplevel path')
if opts.ignore is None:
opts.ignore = list()
if not len(opts.paths) and opts.wait:
op.error('--wait option only makes sense when dirs are passed')
return opts
def grok_manifest(manifile, toplevel, paths=None, logfile=None, usenow=False,
check_export_ok=False, purge=False, remove=False,
pretty=False, ignore=None, wait=False, verbose=False, fetchobst=False,
ignorerefs=None):
global logger
loglevel = logging.INFO
logger = grokmirror.init_logger('manifest', logfile, loglevel, verbose)
startt = datetime.datetime.now()
if paths is None:
paths = list()
if ignore is None:
ignore = list()
grokmirror.manifest_lock(manifile)
manifest = grokmirror.read_manifest(manifile, wait=wait)
toplevel = os.path.realpath(toplevel)
# If manifest is empty, don't use current timestamp
if not len(manifest):
usenow = False
if remove and len(paths):
# Remove the repos as required, write new manfiest and exit
for fullpath in paths:
repo = '/' + os.path.relpath(fullpath, toplevel)
if repo in manifest:
manifest.pop(repo)
logger.info(' manifest: removed %s', repo)
else:
# Is it in any of the symlinks?
found = False
for gitdir in manifest:
if 'symlinks' in manifest[gitdir] and repo in manifest[gitdir]['symlinks']:
found = True
manifest[gitdir]['symlinks'].remove(repo)
if not len(manifest[gitdir]['symlinks']):
manifest[gitdir].pop('symlinks')
logger.info(' manifest: removed symlink %s->%s', repo, gitdir)
if not found:
logger.info(' manifest: %s not in manifest', repo)
# XXX: need to add logic to make sure we don't break the world
# by removing a repository used as a reference for others
grokmirror.write_manifest(manifile, manifest, pretty=pretty)
grokmirror.manifest_unlock(manifile)
return 0
gitdirs = list()
if purge or not len(paths) or not len(manifest):
# We automatically purge when we do a full tree walk
for gitdir in grokmirror.find_all_gitdirs(toplevel, ignore=ignore, exclude_objstore=True):
gitdirs.append(gitdir)
purge_manifest(manifest, toplevel, gitdirs)
if len(manifest) and len(paths):
# limit ourselves to passed dirs only when there is something
# in the manifest. This precaution makes sure we regenerate the
# whole file when there is nothing in it or it can't be parsed.
for apath in paths:
arealpath = os.path.realpath(apath)
if apath != arealpath and os.path.islink(apath):
gitdirs.append(apath)
else:
gitdirs.append(arealpath)
symlinks = list()
tofetch = set()
for gitdir in gitdirs:
# check to make sure this gitdir is ok to export
if check_export_ok and not os.path.exists(os.path.join(gitdir, 'git-daemon-export-ok')):
# is it curently in the manifest?
repo = '/' + os.path.relpath(gitdir, toplevel)
if repo in list(manifest):
logger.info(' manifest: removed %s (no longer exported)', repo)
manifest.pop(repo)
# XXX: need to add logic to make sure we don't break the world
# by removing a repository used as a reference for others
# also make sure we clean up any dangling symlinks
continue
if os.path.islink(gitdir):
symlinks.append(gitdir)
else:
update_manifest(manifest, toplevel, gitdir, usenow, ignorerefs)
if fetchobst:
# Do it after we're done with manifest, to avoid keeping it locked
tofetch.add(gitdir)
if len(symlinks):
set_symlinks(manifest, toplevel, symlinks)
grokmirror.write_manifest(manifile, manifest, pretty=pretty)
grokmirror.manifest_unlock(manifile)
fetched = set()
for gitdir in tofetch:
altrepo = grokmirror.get_altrepo(gitdir)
if altrepo in fetched:
continue
if altrepo and grokmirror.is_obstrepo(altrepo):
try:
grokmirror.lock_repo(altrepo, nonblocking=True)
logger.info(' manifest: objstore %s -> %s', gitdir, os.path.basename(altrepo))
grokmirror.fetch_objstore_repo(altrepo, gitdir, use_plumbing=objstore_uses_plumbing)
grokmirror.unlock_repo(altrepo)
fetched.add(altrepo)
except IOError:
# grok-fsck will fetch this one, then
pass
elapsed = datetime.datetime.now() - startt
if len(gitdirs) > 1:
logger.info('Updated %s records in %ds', len(gitdirs), elapsed.total_seconds())
else:
logger.info('Done in %0.2fs', elapsed.total_seconds())
def command():
opts = parse_args()
return grok_manifest(
opts.manifile, opts.toplevel, paths=opts.paths, logfile=opts.logfile,
usenow=opts.usenow, check_export_ok=opts.check_export_ok,
purge=opts.purge, remove=opts.remove, pretty=opts.pretty,
ignore=opts.ignore, wait=opts.wait, verbose=opts.verbose,
fetchobst=opts.fetchobst, ignorerefs=opts.ignore_refs)
if __name__ == '__main__':
command()
grokmirror-2.0.11/grokmirror/pi_piper.py 0000775 0000000 0000000 00000017143 14103301457 0020355 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# This is a ready-made post_update_hook script for piping messages from
# mirrored public-inbox repositories to arbitrary commands (e.g. procmail).
#
__author__ = 'Konstantin Ryabitsev '
import os
import sys
import grokmirror
import fnmatch
import logging
import shlex
from typing import Optional
# default basic logger. We override it later.
logger = logging.getLogger(__name__)
def git_get_message_from_pi(fullpath: str, commit_id: str) -> bytes:
logger.debug('Getting %s:m from %s', commit_id, fullpath)
args = ['show', f'{commit_id}:m']
ecode, out, err = grokmirror.run_git_command(fullpath, args, decode=False)
if ecode > 0:
logger.debug('Could not get the message, error below')
logger.debug(err.decode())
raise KeyError('Could not find %s in %s' % (commit_id, fullpath))
return out
def git_get_new_revs(fullpath: str, pipelast: Optional[int] = None) -> list:
statf = os.path.join(fullpath, 'pi-piper.latest')
if pipelast:
rev_range = '-n %d' % pipelast
else:
with open(statf, 'r') as fh:
latest = fh.read().strip()
rev_range = f'{latest}..'
args = ['rev-list', '--pretty=oneline', '--reverse', rev_range, 'master']
ecode, out, err = grokmirror.run_git_command(fullpath, args)
if ecode > 0:
raise KeyError('Could not iterate %s in %s' % (rev_range, fullpath))
newrevs = list()
if out:
for line in out.split('\n'):
(commit_id, logmsg) = line.split(' ', 1)
logger.debug('commit_id=%s, subject=%s', commit_id, logmsg)
newrevs.append((commit_id, logmsg))
return newrevs
def reshallow(repo: str, commit_id: str) -> int:
with open(os.path.join(repo, 'shallow'), 'w') as fh:
fh.write(commit_id)
fh.write('\n')
logger.info(' prune: %s ', repo)
ecode, out, err = grokmirror.run_git_command(repo, ['gc', '--prune=now'])
return ecode
def init_piper_tracking(repo: str, shallow: bool) -> bool:
logger.info('Initial setup for %s', repo)
args = ['rev-list', '-n', '1', 'master']
ecode, out, err = grokmirror.run_git_command(repo, args)
if ecode > 0 or not out:
logger.info('Could not list revs in %s', repo)
return False
# Just write latest into the tracking file and return
latest = out.strip()
statf = os.path.join(repo, 'pi-piper.latest')
with open(statf, 'w') as fh:
fh.write(latest)
if shallow:
reshallow(repo, latest)
return True
def run_pi_repo(repo: str, pipedef: str, dryrun: bool = False, shallow: bool = False,
pipelast: Optional[int] = None) -> None:
logger.info('Checking %s', repo)
sp = shlex.shlex(pipedef, posix=True)
sp.whitespace_split = True
args = list(sp)
if not os.access(args[0], os.EX_OK):
logger.critical('Cannot execute %s', pipedef)
sys.exit(1)
statf = os.path.join(repo, 'pi-piper.latest')
if not os.path.exists(statf):
if dryrun:
logger.info('Would have set up piper for %s [DRYRUN]', repo)
return
if not init_piper_tracking(repo, shallow):
logger.critical('Unable to set up piper for %s', repo)
return
try:
revlist = git_get_new_revs(repo, pipelast=pipelast)
except KeyError:
# this could have happened if the public-inbox repository
# got rebased, e.g. due to GDPR-induced history editing.
# For now, bluntly handle this by getting rid of our
# status file and pretending we just started new.
# XXX: in reality, we could handle this better by keeping track
# of the subject line of the latest message we processed, and
# then going through history to find the new commit-id of that
# message. Unless, of course, that's the exact message that got
# deleted in the first place. :/
# This also makes it hard with shallow repos, since we'd have
# to unshallow them first in order to find that message.
logger.critical('Assuming the repository got rebased, dropping all history.')
os.unlink(statf)
if not dryrun:
init_piper_tracking(repo, shallow)
revlist = git_get_new_revs(repo)
if not revlist:
return
logger.info('Processing %s commits', len(revlist))
latest_good = None
ecode = 0
for commit_id, subject in revlist:
msgbytes = git_get_message_from_pi(repo, commit_id)
if msgbytes:
if dryrun:
logger.info(' piping: %s (%s b) [DRYRUN]', commit_id, len(msgbytes))
logger.debug(' subject: %s', subject)
else:
logger.info(' piping: %s (%s b)', commit_id, len(msgbytes))
logger.debug(' subject: %s', subject)
ecode, out, err = grokmirror.run_shell_command(args, stdin=msgbytes)
if ecode > 0:
logger.info('Error running %s', pipedef)
logger.info(err)
break
latest_good = commit_id
if latest_good and not dryrun:
with open(statf, 'w') as fh:
fh.write(latest_good)
logger.info('Wrote %s', statf)
if ecode == 0 and shallow:
reshallow(repo, latest_good)
sys.exit(ecode)
def command():
import argparse
from configparser import ConfigParser, ExtendedInterpolation
global logger
# noinspection PyTypeChecker
op = argparse.ArgumentParser(prog='grok-pi-piper',
description='Pipe new messages from public-inbox repositories to arbitrary commands',
formatter_class=argparse.ArgumentDefaultsHelpFormatter)
op.add_argument('-v', '--verbose', action='store_true',
default=False,
help='Be verbose and tell us what you are doing')
op.add_argument('-d', '--dry-run', dest='dryrun', action='store_true',
default=False,
help='Do a dry-run and just show what would be done')
op.add_argument('-c', '--config', required=True,
help='Location of the configuration file')
op.add_argument('-l', '--pipe-last', dest='pipelast', type=int, default=None,
help='Force pipe last NN messages in the list, regardless of tracking')
op.add_argument('repo',
help='Full path to foo/git/N.git public-inbox repository')
op.add_argument('--version', action='version', version=grokmirror.VERSION)
opts = op.parse_args()
cfgfile = os.path.expanduser(opts.config)
if not cfgfile:
sys.stderr.write('ERORR: File does not exist: %s\n' % cfgfile)
sys.exit(1)
config = ConfigParser(interpolation=ExtendedInterpolation())
config.read(os.path.expanduser(cfgfile))
# Find out the section that we want from the config file
section = 'DEFAULT'
for sectname in config.sections():
if fnmatch.fnmatch(opts.repo, f'*/{sectname}/git/*.git'):
section = sectname
pipe = config[section].get('pipe')
if pipe == 'None':
# Quick exit
sys.exit(0)
logfile = config[section].get('log')
if config[section].get('loglevel') == 'debug':
loglevel = logging.DEBUG
else:
loglevel = logging.INFO
shallow = config[section].getboolean('shallow', False) # noqa
logger = grokmirror.init_logger('pull', logfile, loglevel, opts.verbose)
run_pi_repo(opts.repo, pipe, dryrun=opts.dryrun, shallow=shallow, pipelast=opts.pipelast)
if __name__ == '__main__':
command()
grokmirror-2.0.11/grokmirror/pull.py 0000775 0000000 0000000 00000155555 14103301457 0017534 0 ustar 00root root 0000000 0000000 # -*- coding: utf-8 -*-
# Copyright (C) 2013-2020 by The Linux Foundation and contributors
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see .
import os
import stat
import sys
import grokmirror
import logging
import requests
import time
import gzip
import json
import fnmatch
import shutil
import tempfile
import signal
import shlex
import calendar
import uuid
import multiprocessing as mp
import queue
from socketserver import UnixStreamServer, StreamRequestHandler, ThreadingMixIn
# default basic logger. We override it later.
logger = logging.getLogger(__name__)
class SignalHandler:
def __init__(self, config, sw, dws, pws, done):
self.config = config
self.sw = sw
self.dws = dws
self.pws = pws
self.done = done
self.killed = False
def _handler(self, signum, frame):
self.killed = True
logger.debug('Received signum=%s, frame=%s', signum, frame)
# if self.sw:
# self.sw.terminate()
# self.sw.join()
# for dw in self.dws:
# if dw and dw.is_alive():
# dw.terminate()
# dw.join()
# for pw in self.pws:
# if pw and pw.is_alive():
# pw.terminate()
# pw.join()
if len(self.done):
update_manifest(self.config, self.done)
logger.info('Exiting on signal %s', signum)
sys.exit(0)
def __enter__(self):
self.old_sigint = signal.signal(signal.SIGINT, self._handler)
self.old_sigterm = signal.signal(signal.SIGTERM, self._handler)
def __exit__(self, sigtype, value, traceback):
if self.killed:
sys.exit(0)
signal.signal(signal.SIGINT, self.old_sigint)
signal.signal(signal.SIGTERM, self.old_sigterm)
class Handler(StreamRequestHandler):
def handle(self):
config = self.server.config
manifile = config['core'].get('manifest')
while True:
# noinspection PyBroadException
try:
gitdir = self.rfile.readline().strip().decode()
# Do we know anything about this path?
manifest = grokmirror.read_manifest(manifile)
if gitdir in manifest:
logger.info(' listener: %s', gitdir)
repoinfo = manifest[gitdir]
# Set fingerprint to None to force a run
repoinfo['fingerprint'] = None
repoinfo['modified'] = int(time.time())
self.server.q_mani.put((gitdir, repoinfo, 'pull'))
elif gitdir:
logger.info(' listener: %s (not known, ignored)', gitdir)
return
else:
return
except:
return
class ThreadedUnixStreamServer(ThreadingMixIn, UnixStreamServer):
pass
def build_optimal_forkgroups(l_manifest, r_manifest, toplevel, obstdir):
r_forkgroups = dict()
for gitdir in set(r_manifest.keys()):
fullpath = os.path.join(toplevel, gitdir.lstrip('/'))
# our forkgroup info wins, because our own grok-fcsk may have found better siblings
# unless we're cloning, in which case we have nothing to go by except remote info
if gitdir in l_manifest:
reference = l_manifest[gitdir].get('reference', None)
forkgroup = l_manifest[gitdir].get('forkgroup', None)
if reference is not None:
r_manifest[gitdir]['reference'] = reference
if forkgroup is not None:
r_manifest[gitdir]['forkgroup'] = forkgroup
else:
reference = r_manifest[gitdir].get('reference', None)
forkgroup = r_manifest[gitdir].get('forkgroup', None)
if reference and not forkgroup:
# probably a grokmirror-1.x manifest
r_fullpath = os.path.join(toplevel, reference.lstrip('/'))
for fg, fps in r_forkgroups.items():
if r_fullpath in fps:
forkgroup = fg
break
if not forkgroup:
# I guess we get to make a new one!
forkgroup = str(uuid.uuid4())
r_forkgroups[forkgroup] = {r_fullpath}
if forkgroup is not None:
if forkgroup not in r_forkgroups:
r_forkgroups[forkgroup] = set()
r_forkgroups[forkgroup].add(fullpath)
# Compare their forkgroups and my forkgroups in case we have a more optimal strategy
forkgroups = grokmirror.get_forkgroups(obstdir, toplevel)
for r_fg, r_siblings in r_forkgroups.items():
# if we have an intersection between their forkgroups and our forkgroups, then we use ours
found = False
for l_fg, l_siblings in forkgroups.items():
if l_siblings == r_siblings:
# No changes there
continue
if len(l_siblings.intersection(r_siblings)):
l_siblings.update(r_siblings)
found = True
break
if not found:
# We don't have any matches in existing repos, so make a new forkgroup
forkgroups[r_fg] = r_siblings
return forkgroups
def spa_worker(config, q_spa, pauseonload):
toplevel = os.path.realpath(config['core'].get('toplevel'))
cpus = mp.cpu_count()
saidpaused = False
while True:
if pauseonload:
load = os.getloadavg()
if load[0] > cpus:
if not saidpaused:
logger.info(' spa: paused (system load), %s waiting', q_spa.qsize())
saidpaused = True
time.sleep(5)
continue
saidpaused = False
try:
(gitdir, actions) = q_spa.get(timeout=1)
except queue.Empty:
sys.exit(0)
logger.debug('spa_worker: gitdir=%s, actions=%s', gitdir, actions)
fullpath = os.path.join(toplevel, gitdir.lstrip('/'))
try:
grokmirror.lock_repo(fullpath, nonblocking=True)
except IOError:
# We'll get it during grok-fsck
continue
if not q_spa.empty():
logger.info(' spa: 1 active, %s waiting', q_spa.qsize())
else:
logger.info(' spa: 1 active')
done = list()
for action in actions:
if action in done:
continue
done.append(action)
if action == 'objstore':
altrepo = grokmirror.get_altrepo(fullpath)
# Should we use plumbing for this?
use_plumbing = config['core'].getboolean('objstore_uses_plumbing', False)
grokmirror.fetch_objstore_repo(altrepo, fullpath, use_plumbing=use_plumbing)
elif action == 'repack':
logger.debug('quick-repacking %s', fullpath)
args = ['repack', '-Adlq']
if 'fsck' in config:
extraflags = config['fsck'].get('extra_repack_flags', '').split()
if len(extraflags):
args += extraflags
ecode, out, err = grokmirror.run_git_command(fullpath, args)
if ecode > 0:
logger.debug('Could not repack %s', fullpath)
elif action == 'packrefs':
args = ['pack-refs']
ecode, out, err = grokmirror.run_git_command(fullpath, args)
if ecode > 0:
logger.debug('Could not pack-refs %s', fullpath)
elif action == 'packrefs-all':
args = ['pack-refs', '--all']
ecode, out, err = grokmirror.run_git_command(fullpath, args)
if ecode > 0:
logger.debug('Could not pack-refs %s', fullpath)
grokmirror.unlock_repo(fullpath)
logger.info(' spa: %s (done: %s)', gitdir, ', '.join(done))
def objstore_repo_preload(config, obstrepo):
purl = config['remote'].get('preload_bundle_url')
if not purl:
return
bname = os.path.basename(obstrepo)[:-4]
obstdir = os.path.realpath(config['core'].get('objstore'))
burl = '%s/%s.bundle' % (purl.rstrip('/'), bname)
bfile = os.path.join(obstdir, '%s.bundle' % bname)
try:
sess = grokmirror.get_requests_session()
resp = sess.get(burl, stream=True)
resp.raise_for_status()
logger.info(' objstore: downloading %s.bundle', bname)
with open(bfile, 'wb') as fh:
for chunk in resp.iter_content(chunk_size=8192):
fh.write(chunk)
resp.close()
except: # noqa
# Make sure we don't leave .bundle files lying around
# Should we add logic to resume downloads here in the future?
if os.path.exists(bfile):
os.unlink(bfile)
return
# Now we clone from it into the objstore repo
ecode, out, err = grokmirror.run_git_command(obstrepo, ['remote', 'add', '--mirror=fetch', '_preload', bfile])
if ecode == 0:
logger.info(' objstore: preloading %s.bundle', bname)
args = ['remote', 'update', '_preload']
ecode, out, err = grokmirror.run_git_command(obstrepo, args)
if ecode > 0:
logger.info(' objstore: failed to preload from %s.bundle', bname)
else:
# now pack refs and generate a commit graph
grokmirror.run_git_command(obstrepo, ['pack-refs', '--all'])
if grokmirror.git_newer_than('2.18.0'):
grokmirror.run_git_command(obstrepo, ['commit-graph', 'write'])
logger.info(' objstore: successful preload from %s.bundle', bname)
# Regardless of what happened, we remove _preload and the bundle, then move on
grokmirror.run_git_command(obstrepo, ['remote', 'rm', '_preload'])
os.unlink(bfile)
def pull_worker(config, q_pull, q_spa, q_done):
toplevel = os.path.realpath(config['core'].get('toplevel'))
obstdir = os.path.realpath(config['core'].get('objstore'))
maxretries = config['pull'].getint('retries', 3)
site = config['remote'].get('site')
remotename = config['pull'].get('remotename', '_grokmirror')
# Should we use plumbing for objstore operations?
objstore_uses_plumbing = config['core'].getboolean('objstore_uses_plumbing', False)
while True:
try:
(gitdir, repoinfo, action, q_action) = q_pull.get(timeout=1)
except queue.Empty:
sys.exit(0)
logger.debug('pull_worker: gitdir=%s, action=%s', gitdir, action)
fullpath = os.path.join(toplevel, gitdir.lstrip('/'))
success = True
spa_actions = list()
try:
grokmirror.lock_repo(fullpath, nonblocking=True)
except IOError:
# Take a quick nap and put it back into queue
logger.info(' defer: %s (locked)', gitdir)
time.sleep(5)
q_pull.put((gitdir, repoinfo, action, q_action))
continue
altrepo = grokmirror.get_altrepo(fullpath)
obstrepo = None
if altrepo and grokmirror.is_obstrepo(altrepo, obstdir):
obstrepo = altrepo
if action == 'purge':
# Is it a symlink?
if os.path.islink(fullpath):
logger.info(' purge: %s', gitdir)
os.unlink(fullpath)
else:
# is anything using us for alternates?
if grokmirror.is_alt_repo(toplevel, gitdir):
logger.debug('Not purging %s because it is used by other repos via alternates', fullpath)
else:
logger.info(' purge: %s', gitdir)
shutil.rmtree(fullpath)
if action == 'fix_params':
logger.info(' reconfig: %s', gitdir)
set_repo_params(fullpath, repoinfo)
if action == 'fix_remotes':
logger.info(' reorigin: %s', gitdir)
success = fix_remotes(toplevel, gitdir, site, config)
if success:
set_repo_params(fullpath, repoinfo)
action = 'pull'
else:
success = False
if action == 'reclone':
logger.info(' reclone: %s', gitdir)
try:
altrepo = grokmirror.get_altrepo(fullpath)
shutil.move(fullpath, '%s.reclone' % fullpath)
shutil.rmtree('%s.reclone' % fullpath)
grokmirror.setup_bare_repo(fullpath)
fix_remotes(toplevel, gitdir, site, config)
set_repo_params(fullpath, repoinfo)
if altrepo:
grokmirror.set_altrepo(fullpath, altrepo)
action = 'pull'
except (PermissionError, IOError) as ex:
logger.critical('Unable to remove %s: %s', fullpath, str(ex))
success = False
if action in ('pull', 'objstore_migrate'):
r_fp = repoinfo.get('fingerprint')
my_fp = grokmirror.get_repo_fingerprint(toplevel, gitdir, force=True)
if obstrepo:
o_obj_info = grokmirror.get_repo_obj_info(obstrepo)
if o_obj_info.get('count') == '0' and o_obj_info.get('in-pack') == '0' and not my_fp:
# Try to preload the objstore repo directly
objstore_repo_preload(config, obstrepo)
if r_fp != my_fp:
# Make sure we have the remote set up
if action == 'pull' and remotename not in grokmirror.list_repo_remotes(fullpath):
logger.info(' reorigin: %s', gitdir)
fix_remotes(toplevel, gitdir, site, config)
logger.info(' fetch: %s', gitdir)
retries = 1
while True:
success = pull_repo(fullpath, remotename)
if success:
break
retries += 1
if retries > maxretries:
break
logger.info(' refetch: %s (try #%s)', gitdir, retries)
if success:
run_post_update_hook(toplevel, gitdir, config['pull'].get('post_update_hook', ''))
post_pull_fp = grokmirror.get_repo_fingerprint(toplevel, gitdir, force=True)
repoinfo['fingerprint'] = post_pull_fp
altrepo = grokmirror.get_altrepo(fullpath)
if post_pull_fp != my_fp:
grokmirror.set_repo_fingerprint(toplevel, gitdir, fingerprint=post_pull_fp)
if altrepo and grokmirror.is_obstrepo(altrepo, obstdir) and not repoinfo.get('private'):
# do we have any objects in the objstore repo?
o_obj_info = grokmirror.get_repo_obj_info(altrepo)
if o_obj_info.get('count') == '0' and o_obj_info.get('in-pack') == '0':
# We fetch right now, as other repos may be waiting on these objects
logger.info(' objstore: %s', gitdir)
grokmirror.fetch_objstore_repo(altrepo, fullpath, use_plumbing=objstore_uses_plumbing)
if not objstore_uses_plumbing:
spa_actions.append('repack')
else:
# We lazy-fetch in the spa
spa_actions.append('objstore')
if my_fp is None and not objstore_uses_plumbing:
# Initial clone, trigger a repack after objstore
spa_actions.append('repack')
if my_fp is None:
# This was the initial clone, so pack all refs
spa_actions.append('packrefs-all')
if not grokmirror.is_precious(fullpath):
# See if doing a quick repack would be beneficial
obj_info = grokmirror.get_repo_obj_info(fullpath)
if grokmirror.get_repack_level(obj_info):
# We only do quick repacks, so we don't care about precise level
spa_actions.append('repack')
spa_actions.append('packrefs')
modified = repoinfo.get('modified')
if modified is not None:
set_agefile(toplevel, gitdir, modified)
else:
logger.debug('FP match, not pulling %s', gitdir)
if action == 'objstore_migrate':
spa_actions.append('objstore')
spa_actions.append('repack')
grokmirror.unlock_repo(fullpath)
symlinks = repoinfo.get('symlinks')
if os.path.exists(fullpath) and symlinks:
for symlink in symlinks:
target = os.path.join(toplevel, symlink.lstrip('/'))
if os.path.islink(target):
# are you pointing to where we need you?
if os.path.realpath(target) != fullpath:
# Remove symlink and recreate below
logger.debug('Removed existing wrong symlink %s', target)
os.unlink(target)
elif os.path.exists(target):
logger.warning('Deleted repo %s, because it is now a symlink to %s' % (target, fullpath))
shutil.rmtree(target)
# Here we re-check if we still need to do anything
if not os.path.exists(target):
logger.info(' symlink: %s -> %s', symlink, gitdir)
# Make sure the leading dirs are in place
if not os.path.exists(os.path.dirname(target)):
os.makedirs(os.path.dirname(target))
os.symlink(fullpath, target)
q_done.put((gitdir, repoinfo, q_action, success))
if spa_actions:
q_spa.put((gitdir, spa_actions))
def cull_manifest(manifest, config):
includes = config['pull'].get('include', '*').split('\n')
excludes = config['pull'].get('exclude', '').split('\n')
culled = dict()
for gitdir, repoinfo in manifest.items():
if not repoinfo.get('fingerprint'):
logger.critical('Repo without fingerprint info (skipped): %s', gitdir)
continue
# does it fall under include?
for include in includes:
if fnmatch.fnmatch(gitdir, include):
# Yes, but does it fall under excludes?
excluded = False
for exclude in excludes:
if fnmatch.fnmatch(gitdir, exclude):
excluded = True
break
if excluded:
continue
culled[gitdir] = manifest[gitdir]
return culled
def fix_remotes(toplevel, gitdir, site, config):
remotename = config['pull'].get('remotename', '_grokmirror')
fullpath = os.path.join(toplevel, gitdir.lstrip('/'))
# Set our remote
if remotename in grokmirror.list_repo_remotes(fullpath):
logger.debug('\tremoving remote: %s', remotename)
ecode, out, err = grokmirror.run_git_command(fullpath, ['remote', 'remove', remotename])
if ecode > 0:
logger.critical('FATAL: Could not remove remote %s from %s', remotename, fullpath)
return False
# set my remote URL
url = os.path.join(site, gitdir.lstrip('/'))
ecode, out, err = grokmirror.run_git_command(fullpath, ['remote', 'add', '--mirror=fetch', remotename, url])
if ecode > 0:
logger.critical('FATAL: Could not set %s to %s in %s', remotename, url, fullpath)
return False
ffonly = False
for globpatt in set([x.strip() for x in config['pull'].get('ffonly', '').split('\n')]):
if fnmatch.fnmatch(gitdir, globpatt):
ffonly = True
break
if ffonly:
grokmirror.set_git_config(fullpath, 'remote.{}.fetch'.format(remotename), 'refs/*:refs/*')
logger.debug('\tset %s as %s (ff-only)', remotename, url)
else:
logger.debug('\tset %s as %s', remotename, url)
return True
def set_repo_params(fullpath, repoinfo):
owner = repoinfo.get('owner')
description = repoinfo.get('description')
head = repoinfo.get('head')
if owner is None and description is None and head is None:
# Let the default git values be there, then
return
if description is not None:
descfile = os.path.join(fullpath, 'description')
contents = None
if os.path.exists(descfile):
with open(descfile) as fh:
contents = fh.read()
if contents != description:
logger.debug('Setting %s description to: %s', fullpath, description)
with open(descfile, 'w') as fh:
fh.write(description)
if owner is not None:
logger.debug('Setting %s owner to: %s', fullpath, owner)
grokmirror.set_git_config(fullpath, 'gitweb.owner', owner)
if head is not None:
headfile = os.path.join(fullpath, 'HEAD')
contents = None
if os.path.exists(headfile):
with open(headfile) as fh:
contents = fh.read().rstrip()
if contents != head:
logger.debug('Setting %s HEAD to: %s', fullpath, head)
with open(headfile, 'w') as fh:
fh.write('{}\n'.format(head))
def set_agefile(toplevel, gitdir, last_modified):
grokmirror.set_repo_timestamp(toplevel, gitdir, last_modified)
# set agefile, which can be used by cgit to show idle times
# cgit recommends it to be yyyy-mm-dd hh:mm:ss
cgit_fmt = time.strftime('%F %T', time.localtime(last_modified))
agefile = os.path.join(toplevel, gitdir.lstrip('/'), 'info/web/last-modified')
if not os.path.exists(os.path.dirname(agefile)):
os.makedirs(os.path.dirname(agefile))
with open(agefile, 'wt') as fh:
fh.write('%s\n' % cgit_fmt)
logger.debug('Wrote "%s" into %s', cgit_fmt, agefile)
def run_post_update_hook(toplevel, gitdir, hookscripts):
if not len(hookscripts):
return
for hookscript in hookscripts.split('\n'):
hookscript = os.path.expanduser(hookscript.strip())
sp = shlex.shlex(hookscript, posix=True)
sp.whitespace_split = True
args = list(sp)
logger.info(' hook: %s', ' '.join(args))
if not os.access(args[0], os.X_OK):
logger.warning('post_update_hook %s is not executable', hookscript)
continue
fullpath = os.path.join(toplevel, gitdir.lstrip('/'))
args.append(fullpath)
logger.debug('Running: %s', ' '.join(args))
ecode, output, error = grokmirror.run_shell_command(args)
if error:
# Put hook stderror into warning
logger.warning('Hook Stderr (%s): %s', gitdir, error)
if output:
# Put hook stdout into info
logger.info('Hook Stdout (%s): %s', gitdir, output)
def pull_repo(fullpath, remotename):
args = ['remote', 'update', remotename, '--prune']
retcode, output, error = grokmirror.run_git_command(fullpath, args)
success = False
if retcode == 0:
success = True
if error:
# Put things we recognize into debug
debug = list()
warn = list()
for line in error.split('\n'):
if line.find('From ') == 0:
debug.append(line)
elif line.find('-> ') > 0:
debug.append(line)
elif line.find('remote: warning:') == 0:
debug.append(line)
elif line.find('ControlSocket') >= 0:
debug.append(line)
elif not success:
warn.append(line)
else:
debug.append(line)
if debug:
logger.debug('Stderr (%s): %s', fullpath, '\n'.join(debug))
if warn:
logger.warning('Stderr (%s): %s', fullpath, '\n'.join(warn))
return success
def write_projects_list(config, manifest):
plpath = config['pull'].get('projectslist', '')
if not plpath:
return
trimtop = config['pull'].get('projectslist_trimtop', '')
add_symlinks = config['pull'].getboolean('projectslist_symlinks', False)
(dirname, basename) = os.path.split(plpath)
(fd, tmpfile) = tempfile.mkstemp(prefix=basename, dir=dirname)
try:
fh = os.fdopen(fd, 'wb', 0)
for gitdir in manifest:
if trimtop and gitdir.startswith(trimtop):
pgitdir = gitdir[len(trimtop):]
else:
pgitdir = gitdir
# Always remove leading slash, otherwise cgit breaks
pgitdir = pgitdir.lstrip('/')
fh.write('{}\n'.format(pgitdir).encode())
if add_symlinks and 'symlinks' in manifest[gitdir]:
# Do the same for symlinks
# XXX: Should make this configurable, perhaps
for symlink in manifest[gitdir]['symlinks']:
if trimtop and symlink.startswith(trimtop):
symlink = symlink[len(trimtop):]
symlink = symlink.lstrip('/')
fh.write('{}\n'.format(symlink).encode())
os.fsync(fd)
fh.close()
# set mode to current umask
curmask = os.umask(0)
os.chmod(tmpfile, 0o0666 ^ curmask)
os.umask(curmask)
shutil.move(tmpfile, plpath)
finally:
# If something failed, don't leave tempfiles trailing around
if os.path.exists(tmpfile):
os.unlink(tmpfile)
logger.info(' projlist: wrote %s', plpath)
def fill_todo_from_manifest(config, q_mani, nomtime=False, forcepurge=False):
# l_ = local, r_ = remote
l_mani_path = config['core'].get('manifest')
r_mani_cmd = config['remote'].get('manifest_command')
if r_mani_cmd:
if not os.access(r_mani_cmd, os.X_OK):
logger.critical('Remote manifest command is not executable: %s', r_mani_cmd)
sys.exit(1)
logger.info(' manifest: executing %s', r_mani_cmd)
cmdargs = [r_mani_cmd]
if nomtime:
cmdargs += ['--force']
(ecode, output, error) = grokmirror.run_shell_command(cmdargs)
if ecode == 0:
try:
r_manifest = json.loads(output)
except json.JSONDecodeError as ex:
logger.warning('Failed to parse output from %s', r_mani_cmd)
logger.warning('Error was: %s', ex)
raise IOError('Failed to parse output from %s (%s)' % (r_mani_cmd, ex))
elif ecode == 127:
logger.info(' manifest: unchanged')
return
elif ecode == 1:
logger.warning('Executing %s failed, exiting', r_mani_cmd, ecode)
raise IOError('Failed executing %s' % r_mani_cmd)
else:
# Non-fatal errors for all other exit codes
logger.warning(' manifest: executing %s returned %s', r_mani_cmd, ecode)
return
if not len(r_manifest):
logger.warning(' manifest: empty, ignoring')
raise IOError('Empty manifest returned by %s' % r_mani_cmd)
else:
r_mani_status_path = os.path.join(os.path.dirname(l_mani_path), '.%s.remote' % os.path.basename(l_mani_path))
try:
with open(r_mani_status_path, 'r') as fh:
r_mani_status = json.loads(fh.read())
except (IOError, json.JSONDecodeError):
logger.debug('Could not read %s', r_mani_status_path)
r_mani_status = dict()
r_last_fetched = r_mani_status.get('last-fetched', 0)
config_last_modified = r_mani_status.get('config-last-modified', 0)
if config_last_modified != config.last_modified:
nomtime = True
r_mani_url = config['remote'].get('manifest')
logger.info(' manifest: fetching %s', r_mani_url)
if r_mani_url.find('file:///') == 0:
r_mani_url = r_mani_url.replace('file://', '')
if not os.path.exists(r_mani_url):
logger.critical('Remote manifest not found in %s! Quitting!', r_mani_url)
raise IOError('Remote manifest not found in %s' % r_mani_url)
fstat = os.stat(r_mani_url)
r_last_modified = fstat[8]
if r_last_fetched:
logger.debug('mtime on %s is: %s', r_mani_url, fstat[8])
if not nomtime and r_last_modified <= r_last_fetched:
logger.info(' manifest: unchanged')
return
logger.info('Reading new manifest from %s', r_mani_url)
r_manifest = grokmirror.read_manifest(r_mani_url)
# Don't accept empty manifests -- that indicates something is wrong
if not len(r_manifest):
logger.warning('Remote manifest empty or unparseable! Quitting.')
raise IOError('Empty manifest in %s' % r_mani_url)
else:
session = grokmirror.get_requests_session()
# Find out if we need to run at all first
headers = dict()
if r_last_fetched and not nomtime:
last_modified_h = time.strftime('%a, %d %b %Y %H:%M:%S GMT', time.gmtime(r_last_fetched))
logger.debug('Our last-modified is: %s', last_modified_h)
headers['If-Modified-Since'] = last_modified_h
try:
# 30 seconds to connect, 5 minutes between reads
res = session.get(r_mani_url, headers=headers, timeout=(30, 300))
except requests.exceptions.RequestException as ex:
logger.warning('Could not fetch %s', r_mani_url)
logger.warning('Server returned: %s', ex)
raise IOError('Remote server returned an error: %s' % ex)
if res.status_code == 304:
# No change to the manifest, nothing to do
logger.info(' manifest: unchanged')
return
if res.status_code > 200:
logger.warning('Could not fetch %s', r_mani_url)
logger.warning('Server returned status: %s', res.status_code)
raise IOError('Remote server returned an error: %s' % res.status_code)
r_last_modified = res.headers['Last-Modified']
r_last_modified = time.strptime(r_last_modified, '%a, %d %b %Y %H:%M:%S %Z')
r_last_modified = calendar.timegm(r_last_modified)
# We don't use read_manifest for the remote manifest, as it can be
# anything, really. For now, blindly open it with gzipfile if it ends
# with .gz. XXX: some http servers will auto-deflate such files.
try:
if r_mani_url.rfind('.gz') > 0:
import io
fh = gzip.GzipFile(fileobj=io.BytesIO(res.content))
jdata = fh.read().decode()
else:
jdata = res.content
res.close()
# Don't hold session open, since we don't refetch manifest very frequently
session.close()
r_manifest = json.loads(jdata)
except Exception as ex:
logger.warning('Failed to parse %s', r_mani_url)
logger.warning('Error was: %s', ex)
raise IOError('Failed to parse %s (%s)' % (r_mani_url, ex))
# Record for the next run
with open(r_mani_status_path, 'w') as fh:
r_mani_status = {
'source': r_mani_url,
'last-fetched': r_last_modified,
'config-last-modified': config.last_modified,
}
json.dump(r_mani_status, fh)
l_manifest = grokmirror.read_manifest(l_mani_path)
r_culled = cull_manifest(r_manifest, config)
logger.info(' manifest: %s relevant entries', len(r_culled))
toplevel = os.path.realpath(config['core'].get('toplevel'))
obstdir = os.path.realpath(config['core'].get('objstore'))
forkgroups = build_optimal_forkgroups(l_manifest, r_culled, toplevel, obstdir)
privmasks = set([x.strip() for x in config['core'].get('private', '').split('\n')])
# populate private/forkgroup info in r_culled
for forkgroup, siblings in forkgroups.items():
for s_fullpath in siblings:
s_gitdir = '/' + os.path.relpath(s_fullpath, toplevel)
is_private = False
for privmask in privmasks:
# Does this repo match privrepo
if fnmatch.fnmatch(s_gitdir, privmask):
is_private = True
break
if s_gitdir in r_culled:
r_culled[s_gitdir]['forkgroup'] = forkgroup
r_culled[s_gitdir]['private'] = is_private
seen = set()
to_migrate = set()
# Used to track symlinks so we can properly avoid purging them
all_symlinks = set()
for gitdir, repoinfo in r_culled.items():
symlinks = repoinfo.get('symlinks')
if symlinks and isinstance(symlinks, list):
all_symlinks.update(set(symlinks))
if gitdir in seen:
continue
seen.add(gitdir)
fullpath = os.path.join(toplevel, gitdir.lstrip('/'))
forkgroup = repoinfo.get('forkgroup')
# Is the directory in place?
if os.path.exists(fullpath):
# Did grok-fsck request to reclone it?
rfile = os.path.join(fullpath, 'grokmirror.reclone')
if os.path.exists(rfile):
logger.debug('Reclone requested for %s:', gitdir)
q_mani.put((gitdir, repoinfo, 'reclone'))
with open(rfile, 'r') as rfh:
reason = rfh.read()
logger.debug(' %s', reason)
continue
if gitdir not in l_manifest:
q_mani.put((gitdir, repoinfo, 'fix_remotes'))
continue
r_desc = r_culled[gitdir].get('description')
r_owner = r_culled[gitdir].get('owner')
r_head = r_culled[gitdir].get('head')
l_desc = l_manifest[gitdir].get('description')
l_owner = l_manifest[gitdir].get('owner')
l_head = l_manifest[gitdir].get('head')
if l_owner is None:
l_owner = config['pull'].get('default_owner', 'Grokmirror')
if r_owner is None:
r_owner = config['pull'].get('default_owner', 'Grokmirror')
if r_desc != l_desc or r_owner != l_owner or r_head != l_head:
q_mani.put((gitdir, repoinfo, 'fix_params'))
if symlinks and isinstance(symlinks, list):
# Are all symlinks in place?
for symlink in symlinks:
linkpath = os.path.join(toplevel, symlink.lstrip('/'))
if not os.path.islink(linkpath) or os.path.realpath(linkpath) != fullpath:
q_mani.put((gitdir, repoinfo, 'fix_params'))
break
my_fingerprint = grokmirror.get_repo_fingerprint(toplevel, gitdir)
if my_fingerprint != l_manifest[gitdir].get('fingerprint'):
logger.debug('Fingerprint discrepancy, forcing a fetch')
q_mani.put((gitdir, repoinfo, 'pull'))
continue
if my_fingerprint == r_culled[gitdir]['fingerprint']:
logger.debug('Fingerprints match, skipping %s', gitdir)
continue
logger.debug('No fingerprint match, will pull %s', gitdir)
q_mani.put((gitdir, repoinfo, 'pull'))
continue
if not forkgroup:
# no-sibling repo
q_mani.put((gitdir, repoinfo, 'init'))
continue
obstrepo = os.path.join(obstdir, '%s.git' % forkgroup)
if os.path.isdir(obstrepo):
# Init with an existing obstrepo, easy case
q_mani.put((gitdir, repoinfo, 'init'))
continue
# Do we have any existing siblings that were cloned without obstrepo?
# This would happen when an initial fork is created of an existing repo.
found_existing = False
public_siblings = set()
for s_fullpath in forkgroups[forkgroup]:
s_gitdir = '/' + os.path.relpath(s_fullpath, toplevel)
if s_gitdir == gitdir:
continue
# can't simply rely on r_culled 'private' info, as this repo may only exist locally
is_private = False
for privmask in privmasks:
# Does this repo match privrepo
if fnmatch.fnmatch(s_gitdir, privmask):
is_private = True
break
if is_private:
# Can't use this sibling for anything, as it's private
continue
if os.path.isdir(s_fullpath):
found_existing = True
if s_gitdir not in to_migrate:
# Plan to migrate it to objstore
logger.debug('reusing existing %s as new obstrepo %s', s_gitdir, obstrepo)
s_repoinfo = grokmirror.get_repo_defs(toplevel, s_gitdir, usenow=True)
s_repoinfo['forkgroup'] = forkgroup
s_repoinfo['private'] = False
# Stick it into queue before the new clone
q_mani.put((s_gitdir, s_repoinfo, 'objstore_migrate'))
seen.add(s_gitdir)
to_migrate.add(s_gitdir)
break
if s_gitdir in r_culled:
public_siblings.add(s_gitdir)
if found_existing:
q_mani.put((gitdir, repoinfo, 'init'))
continue
if repoinfo['private'] and len(public_siblings):
# Clone public siblings first
for s_gitdir in public_siblings:
if s_gitdir not in seen:
q_mani.put((s_gitdir, r_culled[s_gitdir], 'init'))
seen.add(s_gitdir)
# Finally, clone ourselves.
q_mani.put((gitdir, repoinfo, 'init'))
if config['pull'].getboolean('purge', False):
nopurge = config['pull'].get('nopurge', '').split('\n')
to_purge = set()
found_repos = 0
for founddir in grokmirror.find_all_gitdirs(toplevel, exclude_objstore=True):
gitdir = '/' + os.path.relpath(founddir, toplevel)
found_repos += 1
if gitdir not in r_culled and gitdir not in all_symlinks:
exclude = False
for entry in nopurge:
if fnmatch.fnmatch(gitdir, entry):
exclude = True
break
# Refuse to purge ffonly repos
for globpatt in set([x.strip() for x in config['pull'].get('ffonly', '').split('\n')]):
if fnmatch.fnmatch(gitdir, globpatt):
# Woah, these are not supposed to be deleted, ever
logger.critical('Refusing to purge ffonly repo %s', gitdir)
exclude = True
break
if not exclude:
logger.debug('Adding %s to to_purge', gitdir)
to_purge.add(gitdir)
if len(to_purge):
# Purge-protection engage
purge_limit = int(config['pull'].getint('purgeprotect', 5))
if purge_limit < 1 or purge_limit > 99:
logger.critical('Warning: "%s" is not valid for purgeprotect.', purge_limit)
logger.critical('Please set to a number between 1 and 99.')
logger.critical('Defaulting to purgeprotect=5.')
purge_limit = 5
purge_pc = int(len(to_purge) * 100 / found_repos)
logger.debug('purgeprotect=%s', purge_limit)
logger.debug('purge prercentage=%s', purge_pc)
if not forcepurge and purge_pc >= purge_limit:
logger.critical('Refusing to purge %s repos (%s%%)', len(to_purge), purge_pc)
logger.critical('Set purgeprotect to a higher percentage, or override with --force-purge.')
else:
for gitdir in to_purge:
logger.debug('Queued %s for purging', gitdir)
q_mani.put((gitdir, None, 'purge'))
else:
logger.debug('No repositories need purging')
def update_manifest(config, entries):
manifile = config['core'].get('manifest')
grokmirror.manifest_lock(manifile)
manifest = grokmirror.read_manifest(manifile)
changed = False
while len(entries):
gitdir, repoinfo, action, success = entries.pop()
if not success:
continue
if action == 'purge':
# Remove entry from manifest
try:
manifest.pop(gitdir)
changed = True
except KeyError:
pass
continue
try:
# does not belong in the manifest
repoinfo.pop('private')
except KeyError:
pass
for key, val in dict(repoinfo).items():
# Clean up grok-2.0 null values
if key in ('head', 'forkgroup') and val is None:
repoinfo.pop(key)
# Make sure 'reference' is present to prevent grok-1.x breakage
if 'reference' not in repoinfo:
repoinfo['reference'] = None
manifest[gitdir] = repoinfo
changed = True
if changed:
if 'manifest' in config:
pretty = config['manifest'].getboolean('pretty', False)
else:
pretty = False
grokmirror.write_manifest(manifile, manifest, pretty=pretty)
logger.info(' manifest: wrote %s (%d entries)', manifile, len(manifest))
# write out projects.list, if asked to
write_projects_list(config, manifest)
grokmirror.manifest_unlock(manifile)
def socket_worker(config, q_mani, sockfile):
logger.info(' listener: listening on socket %s', sockfile)
curmask = os.umask(0)
with ThreadedUnixStreamServer(sockfile, Handler) as server:
os.umask(curmask)
# Stick some objects into the server
server.q_mani = q_mani
server.config = config
server.serve_forever()
def showstats(q_todo, q_pull, q_spa, good, bad, pws, dws):
stats = list()
if good:
stats.append('%s fetched' % good)
if pws:
stats.append('%s active' % len(pws))
if not q_pull.empty():
stats.append('%s queued' % q_pull.qsize())
if not q_todo.empty():
stats.append('%s waiting' % q_todo.qsize())
if len(dws) or not q_spa.empty():
stats.append('%s in spa' % (q_spa.qsize() + len(dws)))
if bad:
stats.append('%s errors' % bad)
logger.info(' ---: %s', ', '.join(stats))
def manifest_worker(config, q_mani, nomtime=False):
starttime = int(time.time())
fill_todo_from_manifest(config, q_mani, nomtime=nomtime)
refresh = config['pull'].getint('refresh', 300)
left = refresh - int(time.time() - starttime)
if left > 0:
logger.info(' manifest: sleeping %ss', left)
def pull_mirror(config, nomtime=False, forcepurge=False, runonce=False):
toplevel = os.path.realpath(config['core'].get('toplevel'))
obstdir = os.path.realpath(config['core'].get('objstore'))
refresh = config['pull'].getint('refresh', 300)
q_mani = mp.Queue()
q_todo = mp.Queue()
q_pull = mp.Queue()
q_done = mp.Queue()
q_spa = mp.Queue()
sw = None
sockfile = config['pull'].get('socket')
if sockfile and not runonce:
if os.path.exists(sockfile):
mode = os.stat(sockfile).st_mode
if stat.S_ISSOCK(mode):
os.unlink(sockfile)
else:
raise IOError('File exists but is not a socket: %s' % sockfile)
sw = mp.Process(target=socket_worker, args=(config, q_mani, sockfile))
sw.daemon = True
sw.start()
pws = list()
dws = list()
mws = list()
actions = set()
# Run in the main thread if we have runonce
if runonce:
fill_todo_from_manifest(config, q_mani, nomtime=nomtime, forcepurge=forcepurge)
if not q_mani.qsize():
return 0
else:
# force nomtime to True the first time
nomtime = True
lastrun = 0
pull_threads = config['pull'].getint('pull_threads', 0)
if pull_threads < 1 and mp.cpu_count() > 1:
# take half of available CPUs by default
pull_threads = int(mp.cpu_count() / 2)
elif pull_threads < 1:
pull_threads = 1
busy = set()
done = list()
good = 0
bad = 0
loopmark = None
with SignalHandler(config, sw, dws, pws, done):
while True:
for pw in pws:
if pw and not pw.is_alive():
pws.remove(pw)
logger.info(' worker: terminated (%s remaining)', len(pws))
showstats(q_todo, q_pull, q_spa, good, bad, pws, dws)
for dw in dws:
if dw and not dw.is_alive():
dws.remove(dw)
showstats(q_todo, q_pull, q_spa, good, bad, pws, dws)
for mw in mws:
if mw and not mw.is_alive():
mws.remove(mw)
if not q_spa.empty() and not len(dws):
if runonce:
pauseonload = False
else:
pauseonload = True
dw = mp.Process(target=spa_worker, args=(config, q_spa, pauseonload))
dw.daemon = True
dw.start()
dws.append(dw)
if not q_pull.empty() and len(pws) < pull_threads:
pw = mp.Process(target=pull_worker, args=(config, q_pull, q_spa, q_done))
pw.daemon = True
pw.start()
pws.append(pw)
logger.info(' worker: started (%s running)', len(pws))
# Any new results?
try:
while True:
gitdir, repoinfo, q_action, success = q_done.get_nowait()
try:
actions.remove((gitdir, q_action))
except KeyError:
pass
forkgroup = repoinfo.get('forkgroup')
if forkgroup and forkgroup in busy:
busy.remove(forkgroup)
done.append((gitdir, repoinfo, q_action, success))
if success:
good += 1
else:
bad += 1
logger.info(' done: %s', gitdir)
showstats(q_todo, q_pull, q_spa, good, bad, pws, dws)
if len(done) >= 100:
# Write manifest every 100 repos
update_manifest(config, done)
except queue.Empty:
pass
# Anything new in the manifest queue?
try:
new_updates = 0
while True:
gitdir, repoinfo, action = q_mani.get_nowait()
if (gitdir, action) in actions:
logger.debug('already in the queue: %s, %s', gitdir, action)
continue
if action == 'pull' and (gitdir, 'init') in actions:
logger.debug('already in the queue as init: %s, %s', gitdir, action)
continue
actions.add((gitdir, action))
q_todo.put((gitdir, repoinfo, action))
new_updates += 1
logger.debug('queued: %s, %s', gitdir, action)
if new_updates:
logger.info(' manifest: %s new updates', new_updates)
except queue.Empty:
pass
if not runonce and not len(mws) and q_todo.empty() and q_pull.empty() and time.time() - lastrun >= refresh:
if done:
update_manifest(config, done)
mw = mp.Process(target=manifest_worker, args=(config, q_mani, nomtime))
nomtime = False
mw.daemon = True
mw.start()
mws.append(mw)
lastrun = int(time.time())
# Finally, deal with q_todo
try:
gitdir, repoinfo, q_action = q_todo.get_nowait()
logger.debug('main_thread: got %s/%s from q_todo', gitdir, q_action)
except queue.Empty:
if q_mani.empty() and q_done.empty():
if not len(pws):
if done:
update_manifest(config, done)
if runonce:
# Wait till spa is done
while True:
if q_spa.empty():
for dw in dws:
dw.join()
return 0
time.sleep(1)
if len(pws):
# Don't run a hot loop waiting on results
time.sleep(5)
else:
# Shorter sleep if everything is idle
time.sleep(1)
continue
if repoinfo is None:
repoinfo = dict()
fullpath = os.path.join(toplevel, gitdir.lstrip('/'))
forkgroup = repoinfo.get('forkgroup')
if gitdir in busy or (forkgroup is not None and forkgroup in busy):
# Stick it back into the queue
q_todo.put((gitdir, repoinfo, q_action))
if loopmark is None:
loopmark = gitdir
elif loopmark == gitdir:
# We've looped around all waiting repos, so back off and don't run
# a hot waiting loop.
time.sleep(5)
continue
if gitdir == loopmark:
loopmark = None
if q_action == 'objstore_migrate':
# Add forkgroup to busy, so we don't run any pulls until it's done
busy.add(repoinfo['forkgroup'])
obstrepo = grokmirror.setup_objstore_repo(obstdir, name=forkgroup)
grokmirror.add_repo_to_objstore(obstrepo, fullpath)
grokmirror.set_altrepo(fullpath, obstrepo)
if q_action != 'init':
# Easy actions that don't require priority logic
q_pull.put((gitdir, repoinfo, q_action, q_action))
continue
try:
grokmirror.lock_repo(fullpath, nonblocking=True)
except IOError:
if not runonce:
q_todo.put((gitdir, repoinfo, q_action))
continue
if not grokmirror.setup_bare_repo(fullpath):
logger.critical('Unable to bare-init %s', fullpath)
q_done.put((gitdir, repoinfo, q_action, False))
continue
fix_remotes(toplevel, gitdir, config['remote'].get('site'), config)
set_repo_params(fullpath, repoinfo)
grokmirror.unlock_repo(fullpath)
forkgroup = repoinfo.get('forkgroup')
if not forkgroup:
logger.debug('no-sibling clone: %s', gitdir)
q_pull.put((gitdir, repoinfo, 'pull', q_action))
continue
obstrepo = os.path.join(obstdir, '%s.git' % forkgroup)
if os.path.isdir(obstrepo):
logger.debug('clone %s with existing obstrepo %s', gitdir, obstrepo)
grokmirror.set_altrepo(fullpath, obstrepo)
if not repoinfo['private']:
grokmirror.add_repo_to_objstore(obstrepo, fullpath)
q_pull.put((gitdir, repoinfo, 'pull', q_action))
continue
# Set up a new obstrepo and make sure it's not used until the initial
# pull is done
logger.debug('cloning %s with new obstrepo %s', gitdir, obstrepo)
busy.add(forkgroup)
obstrepo = grokmirror.setup_objstore_repo(obstdir, name=forkgroup)
grokmirror.set_altrepo(fullpath, obstrepo)
if not repoinfo['private']:
grokmirror.add_repo_to_objstore(obstrepo, fullpath)
q_pull.put((gitdir, repoinfo, 'pull', q_action))
return 0
def parse_args():
import argparse
# noinspection PyTypeChecker
op = argparse.ArgumentParser(prog='grok-pull',
description='Create or update a git repository collection mirror',
formatter_class=argparse.ArgumentDefaultsHelpFormatter)
op.add_argument('-v', '--verbose', dest='verbose', action='store_true',
default=False,
help='Be verbose and tell us what you are doing')
op.add_argument('-n', '--no-mtime-check', dest='nomtime',
action='store_true', default=False,
help='Run without checking manifest mtime')
op.add_argument('-p', '--purge', dest='purge',
action='store_true', default=False,
help='Remove any git trees that are no longer in manifest')
op.add_argument('--force-purge', dest='forcepurge',
action='store_true', default=False,
help='Force purge despite significant repo deletions')
op.add_argument('-o', '--continuous', dest='runonce',
action='store_false', default=True,
help='Run continuously (no effect if refresh is not set in config)')
op.add_argument('-c', '--config', dest='config',
required=True,
help='Location of the configuration file')
op.add_argument('--version', action='version', version=grokmirror.VERSION)
return op.parse_args()
def grok_pull(cfgfile, verbose=False, nomtime=False, purge=False, forcepurge=False, runonce=False):
global logger
config = grokmirror.load_config_file(cfgfile)
if config['pull'].get('refresh', None) is None:
runonce = True
logfile = config['core'].get('log', None)
if config['core'].get('loglevel', 'info') == 'debug':
loglevel = logging.DEBUG
else:
loglevel = logging.INFO
if purge:
# Override the pull.purge setting
config['pull']['purge'] = 'yes'
logger = grokmirror.init_logger('pull', logfile, loglevel, verbose)
return pull_mirror(config, nomtime, forcepurge, runonce)
def command():
opts = parse_args()
retval = grok_pull(
opts.config, opts.verbose, opts.nomtime, opts.purge, opts.forcepurge, opts.runonce)
sys.exit(retval)
if __name__ == '__main__':
command()
grokmirror-2.0.11/man/ 0000775 0000000 0000000 00000000000 14103301457 0014541 5 ustar 00root root 0000000 0000000 grokmirror-2.0.11/man/grok-bundle.1 0000664 0000000 0000000 00000005240 14103301457 0017035 0 ustar 00root root 0000000 0000000 .\" Man page generated from reStructuredText.
.
.TH GROK-BUNDLE 1 "2020-09-04" "2.0.0" ""
.SH NAME
GROK-BUNDLE \- Create clone.bundle files for use with "repo"
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.SH SYNOPSIS
.INDENT 0.0
.INDENT 3.5
grok\-bundle [options] \-c grokmirror.conf \-o path
.UNINDENT
.UNINDENT
.SH DESCRIPTION
.sp
Android\(aqs "repo" tool will check for the presence of clone.bundle files
before performing a fresh git clone. This is done in order to offload
most of the git traffic to a CDN and reduce the load on git servers
themselves.
.sp
This command will generate clone.bundle files in a hierarchy expected by
repo. You can then sync the output directory to a CDN service.
.SH OPTIONS
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.B \-h\fP,\fB \-\-help
show this help message and exit
.TP
.B \-v\fP,\fB \-\-verbose
Be verbose and tell us what you are doing (default: False)
.TP
.BI \-c \ CONFIG\fP,\fB \ \-\-config \ CONFIG
Location of the configuration file
.TP
.BI \-o \ OUTDIR\fP,\fB \ \-\-outdir \ OUTDIR
Location where to store bundle files
.TP
.BI \-g \ GITARGS\fP,\fB \ \-\-gitargs \ GITARGS
extra args to pass to git (default: \-c core.compression=9)
.TP
.BI \-r \ REVLISTARGS\fP,\fB \ \-\-revlistargs \ REVLISTARGS
Rev\-list args to use (default: \-\-branches HEAD)
.TP
.BI \-s \ MAXSIZE\fP,\fB \ \-\-maxsize \ MAXSIZE
Maximum size of git repositories to bundle (in GiB) (default: 2)
.TP
.BI \-i\fP,\fB \-\-include \ INCLUDE
List repositories to bundle (accepts shell globbing) (default: *)
.UNINDENT
.UNINDENT
.UNINDENT
.SH EXAMPLES
.INDENT 0.0
.INDENT 3.5
grok\-bundle \-c grokmirror.conf \-o /var/www/bundles \-i /pub/scm/linux/kernel/git/torvalds/linux.git /pub/scm/linux/kernel/git/stable/linux.git /pub/scm/linux/kernel/git/next/linux\-next.git
.UNINDENT
.UNINDENT
.SH SEE ALSO
.INDENT 0.0
.IP \(bu 2
grok\-pull(1)
.IP \(bu 2
grok\-manifest(1)
.IP \(bu 2
grok\-fsck(1)
.IP \(bu 2
git(1)
.UNINDENT
.SH SUPPORT
.sp
Email \fI\%tools@linux.kernel.org\fP\&.
.SH AUTHOR
mricon@kernel.org
License: GPLv3+
.SH COPYRIGHT
The Linux Foundation and contributors
.\" Generated by docutils manpage writer.
.
grokmirror-2.0.11/man/grok-bundle.1.rst 0000664 0000000 0000000 00000003604 14103301457 0017646 0 ustar 00root root 0000000 0000000 GROK-BUNDLE
===========
-------------------------------------------------
Create clone.bundle files for use with "repo"
-------------------------------------------------
:Author: mricon@kernel.org
:Date: 2020-09-04
:Copyright: The Linux Foundation and contributors
:License: GPLv3+
:Version: 2.0.0
:Manual section: 1
SYNOPSIS
--------
grok-bundle [options] -c grokmirror.conf -o path
DESCRIPTION
-----------
Android's "repo" tool will check for the presence of clone.bundle files
before performing a fresh git clone. This is done in order to offload
most of the git traffic to a CDN and reduce the load on git servers
themselves.
This command will generate clone.bundle files in a hierarchy expected by
repo. You can then sync the output directory to a CDN service.
OPTIONS
-------
-h, --help show this help message and exit
-v, --verbose Be verbose and tell us what you are doing (default: False)
-c CONFIG, --config CONFIG
Location of the configuration file
-o OUTDIR, --outdir OUTDIR
Location where to store bundle files
-g GITARGS, --gitargs GITARGS
extra args to pass to git (default: -c core.compression=9)
-r REVLISTARGS, --revlistargs REVLISTARGS
Rev-list args to use (default: --branches HEAD)
-s MAXSIZE, --maxsize MAXSIZE
Maximum size of git repositories to bundle (in GiB) (default: 2)
-i, --include INCLUDE
List repositories to bundle (accepts shell globbing) (default: \*)
EXAMPLES
--------
grok-bundle -c grokmirror.conf -o /var/www/bundles -i /pub/scm/linux/kernel/git/torvalds/linux.git /pub/scm/linux/kernel/git/stable/linux.git /pub/scm/linux/kernel/git/next/linux-next.git
SEE ALSO
--------
* grok-pull(1)
* grok-manifest(1)
* grok-fsck(1)
* git(1)
SUPPORT
-------
Email tools@linux.kernel.org.
grokmirror-2.0.11/man/grok-dumb-pull.1 0000664 0000000 0000000 00000005601 14103301457 0017466 0 ustar 00root root 0000000 0000000 .\" Man page generated from reStructuredText.
.
.TH GROK-DUMB-PULL 1 "2020-08-14" "2.0.0" ""
.SH NAME
GROK-DUMB-PULL \- Update git repositories not managed by grokmirror
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.SH SYNOPSIS
.INDENT 0.0
.INDENT 3.5
grok\-dumb\-pull [options] /path/to/repos
.UNINDENT
.UNINDENT
.SH DESCRIPTION
.sp
This is a satellite utility that updates repositories not exported via
grokmirror manifest. You will need to manually clone these repositories
using "git clone \-\-mirror" and then define a cronjob to update them as
frequently as you require. Grok\-dumb\-pull will bluntly execute "git
remote update" in each of them.
.SH OPTIONS
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.B \-\-version
show program\(aqs version number and exit
.TP
.B \-h\fP,\fB \-\-help
show this help message and exit
.TP
.B \-v\fP,\fB \-\-verbose
Be verbose and tell us what you are doing
.TP
.B \-s\fP,\fB \-\-svn
The remotes for these repositories are Subversion
.TP
.BI \-r \ REMOTES\fP,\fB \ \-\-remote\-names\fB= REMOTES
Only fetch remotes matching this name (accepts globbing,
can be passed multiple times)
.TP
.BI \-u \ POSTHOOK\fP,\fB \ \-\-post\-update\-hook\fB= POSTHOOK
Run this hook after each repository is updated. Passes
full path to the repository as the sole argument.
.TP
.BI \-l \ LOGFILE\fP,\fB \ \-\-logfile\fB= LOGFILE
Put debug logs into this file
.UNINDENT
.UNINDENT
.UNINDENT
.SH EXAMPLES
.sp
The following will update all bare git repositories found in
/path/to/repos hourly, and /path/to/special/repo.git daily, fetching
only the "github" remote:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
MAILTO=root
# Update all repositories found in /path/to/repos hourly
0 * * * * mirror /usr/bin/grok\-dumb\-pull /path/to/repos
# Update /path/to/special/repo.git daily, fetching "github" remote
0 0 * * * mirror /usr/bin/grok\-dumb\-pull \-r github /path/to/special/repo.git
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Make sure the user "mirror" (or whichever user you specified) is able to
write to the repos specified.
.SH SEE ALSO
.INDENT 0.0
.IP \(bu 2
grok\-pull(1)
.IP \(bu 2
grok\-manifest(1)
.IP \(bu 2
grok\-fsck(1)
.IP \(bu 2
git(1)
.UNINDENT
.SH SUPPORT
.sp
Email \fI\%tools@linux.kernel.org\fP\&.
.SH AUTHOR
mricon@kernel.org
License: GPLv3+
.SH COPYRIGHT
The Linux Foundation and contributors
.\" Generated by docutils manpage writer.
.
grokmirror-2.0.11/man/grok-dumb-pull.1.rst 0000664 0000000 0000000 00000004167 14103301457 0020303 0 ustar 00root root 0000000 0000000 GROK-DUMB-PULL
==============
-------------------------------------------------
Update git repositories not managed by grokmirror
-------------------------------------------------
:Author: mricon@kernel.org
:Date: 2020-08-14
:Copyright: The Linux Foundation and contributors
:License: GPLv3+
:Version: 2.0.0
:Manual section: 1
SYNOPSIS
--------
grok-dumb-pull [options] /path/to/repos
DESCRIPTION
-----------
This is a satellite utility that updates repositories not exported via
grokmirror manifest. You will need to manually clone these repositories
using "git clone --mirror" and then define a cronjob to update them as
frequently as you require. Grok-dumb-pull will bluntly execute "git
remote update" in each of them.
OPTIONS
-------
--version show program's version number and exit
-h, --help show this help message and exit
-v, --verbose Be verbose and tell us what you are doing
-s, --svn The remotes for these repositories are Subversion
-r REMOTES, --remote-names=REMOTES
Only fetch remotes matching this name (accepts globbing,
can be passed multiple times)
-u POSTHOOK, --post-update-hook=POSTHOOK
Run this hook after each repository is updated. Passes
full path to the repository as the sole argument.
-l LOGFILE, --logfile=LOGFILE
Put debug logs into this file
EXAMPLES
--------
The following will update all bare git repositories found in
/path/to/repos hourly, and /path/to/special/repo.git daily, fetching
only the "github" remote::
MAILTO=root
# Update all repositories found in /path/to/repos hourly
0 * * * * mirror /usr/bin/grok-dumb-pull /path/to/repos
# Update /path/to/special/repo.git daily, fetching "github" remote
0 0 * * * mirror /usr/bin/grok-dumb-pull -r github /path/to/special/repo.git
Make sure the user "mirror" (or whichever user you specified) is able to
write to the repos specified.
SEE ALSO
--------
* grok-pull(1)
* grok-manifest(1)
* grok-fsck(1)
* git(1)
SUPPORT
-------
Email tools@linux.kernel.org.
grokmirror-2.0.11/man/grok-fsck.1 0000664 0000000 0000000 00000004504 14103301457 0016514 0 ustar 00root root 0000000 0000000 .\" Man page generated from reStructuredText.
.
.TH GROK-FSCK 1 "2020-08-14" "2.0.0" ""
.SH NAME
GROK-FSCK \- Optimize mirrored repositories and check for corruption
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.SH SYNOPSIS
.INDENT 0.0
.INDENT 3.5
grok\-fsck \-c /path/to/grokmirror.conf
.UNINDENT
.UNINDENT
.SH DESCRIPTION
.sp
Git repositories should be routinely repacked and checked for
corruption. This utility will perform the necessary optimizations and
report any problems to the email defined via fsck.report_to (\(aqroot\(aq by
default). It should run weekly from cron or from the systemd timer (see
contrib).
.sp
Please examine the example grokmirror.conf file for various things you
can tweak.
.SH OPTIONS
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.B \-\-version
show program\(aqs version number and exit
.TP
.B \-h\fP,\fB \-\-help
show this help message and exit
.TP
.B \-v\fP,\fB \-\-verbose
Be verbose and tell us what you are doing
.TP
.B \-f\fP,\fB \-\-force
Force immediate run on all repositories.
.TP
.BI \-c \ CONFIG\fP,\fB \ \-\-config\fB= CONFIG
Location of fsck.conf
.TP
.B \-\-repack\-only
Only find and repack repositories that need
optimizing (nightly run mode)
.TP
.B \-\-connectivity
(Assumes \-\-force): Run git fsck on all repos,
but only check connectivity
.TP
.B \-\-repack\-all\-quick
(Assumes \-\-force): Do a quick repack of all repos
.TP
.B \-\-repack\-all\-full
(Assumes \-\-force): Do a full repack of all repos
.UNINDENT
.UNINDENT
.UNINDENT
.SH SEE ALSO
.INDENT 0.0
.IP \(bu 2
grok\-manifest(1)
.IP \(bu 2
grok\-pull(1)
.IP \(bu 2
git(1)
.UNINDENT
.SH SUPPORT
.sp
Email \fI\%tools@linux.kernel.org\fP\&.
.SH AUTHOR
mricon@kernel.org
License: GPLv3+
.SH COPYRIGHT
The Linux Foundation and contributors
.\" Generated by docutils manpage writer.
.
grokmirror-2.0.11/man/grok-fsck.1.rst 0000664 0000000 0000000 00000003154 14103301457 0017323 0 ustar 00root root 0000000 0000000 GROK-FSCK
=========
-------------------------------------------------------
Optimize mirrored repositories and check for corruption
-------------------------------------------------------
:Author: mricon@kernel.org
:Date: 2020-08-14
:Copyright: The Linux Foundation and contributors
:License: GPLv3+
:Version: 2.0.0
:Manual section: 1
SYNOPSIS
--------
grok-fsck -c /path/to/grokmirror.conf
DESCRIPTION
-----------
Git repositories should be routinely repacked and checked for
corruption. This utility will perform the necessary optimizations and
report any problems to the email defined via fsck.report_to ('root' by
default). It should run weekly from cron or from the systemd timer (see
contrib).
Please examine the example grokmirror.conf file for various things you
can tweak.
OPTIONS
-------
--version show program's version number and exit
-h, --help show this help message and exit
-v, --verbose Be verbose and tell us what you are doing
-f, --force Force immediate run on all repositories.
-c CONFIG, --config=CONFIG
Location of fsck.conf
--repack-only Only find and repack repositories that need
optimizing (nightly run mode)
--connectivity (Assumes --force): Run git fsck on all repos,
but only check connectivity
--repack-all-quick (Assumes --force): Do a quick repack of all repos
--repack-all-full (Assumes --force): Do a full repack of all repos
SEE ALSO
--------
* grok-manifest(1)
* grok-pull(1)
* git(1)
SUPPORT
-------
Email tools@linux.kernel.org.
grokmirror-2.0.11/man/grok-manifest.1 0000664 0000000 0000000 00000011506 14103301457 0017374 0 ustar 00root root 0000000 0000000 .\" Man page generated from reStructuredText.
.
.TH GROK-MANIFEST 1 "2020-08-14" "2.0.0" ""
.SH NAME
GROK-MANIFEST \- Create manifest for use with grokmirror
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.SH SYNOPSIS
.INDENT 0.0
.INDENT 3.5
grok\-manifest [opts] \-m manifest.js[.gz] \-t /path [/path/to/bare.git]
.UNINDENT
.UNINDENT
.SH DESCRIPTION
.sp
Call grok\-manifest from a git post\-update or post\-receive hook to create
the latest repository manifest. This manifest file is downloaded by
mirroring systems (if manifest is newer than what they already have) and
used to only clone/pull the repositories that have changed since the
grok\-pull\(aqs last run.
.SH OPTIONS
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.B \-\-version
show program\(aqs version number and exit
.TP
.B \-h\fP,\fB \-\-help
show this help message and exit
.TP
.BI \-\-cfgfile\fB= CFGFILE
Path to grokmirror.conf containing a [manifest] section
.TP
.BI \-m \ MANIFILE\fP,\fB \ \-\-manifest\fB= MANIFILE
Location of manifest.js or manifest.js.gz
.TP
.BI \-t \ TOPLEVEL\fP,\fB \ \-\-toplevel\fB= TOPLEVEL
Top dir where all repositories reside
.TP
.BI \-l \ LOGFILE\fP,\fB \ \-\-logfile\fB= LOGFILE
When specified, will put debug logs in this location
.TP
.B \-c\fP,\fB \-\-check\-export\-ok
Honor the git\-daemon\-export\-ok magic file and
do not export repositories not marked as such
.TP
.B \-n\fP,\fB \-\-use\-now
Use current timestamp instead of parsing commits
.TP
.B \-p\fP,\fB \-\-purge
Purge deleted git repositories from manifest
.TP
.B \-x\fP,\fB \-\-remove
Remove repositories passed as arguments from
the manifest file
.TP
.B \-y\fP,\fB \-\-pretty
Pretty\-print the generated manifest (sort repos
and add indentation). This is much slower, so
should be used with caution on large
collections.
.TP
.B \-w\fP,\fB \-\-wait\-for\-manifest
When running with arguments, wait if manifest is not
there (can be useful when multiple writers are writing
to the manifest file via NFS)
.TP
.BI \-i \ IGNORE\fP,\fB \ \-\-ignore\-paths\fB= IGNORE
When finding git dirs, ignore these paths (can be used
multiple times, accepts shell\-style globbing)
.TP
.B \-o\fP,\fB \-\-fetch\-objstore
Fetch updates into objstore repo (if used)
.TP
.B \-v\fP,\fB \-\-verbose
Be verbose and tell us what you are doing
.UNINDENT
.UNINDENT
.UNINDENT
.sp
You can set some of these options in a config file that you can pass via
\fB\-\-cfgfile\fP option. See example grokmirror.conf file for
documentation. Values passed via cmdline flags will override the
corresponding config file values.
.SH EXAMPLES
.sp
The examples assume that the repositories are located in
\fB/var/lib/gitolite3/repositories\fP\&.
.sp
Initial manifest generation:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
/usr/bin/grok\-manifest \-m /var/www/html/manifest.js.gz \e
\-t /var/lib/gitolite3/repositories
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Inside the git hook:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
/usr/bin/grok\-manifest \-m /var/www/html/manifest.js.gz \e
\-t /var/lib/gitolite3/repositories \-n \(gapwd\(ga
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
To purge deleted repositories from the manifest, use the \fB\-p\fP flag
when running from cron:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
/usr/bin/grok\-manifest \-m /var/www/html/manifest.js.gz \e
\-t /var/lib/gitolite3/repositories \-p
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
You can also add it to the gitolite\(aqs \fBD\fP command using the \fB\-x\fP flag:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
/usr/bin/grok\-manifest \-m /var/www/html/manifest.js.gz \e
\-t /var/lib/gitolite3/repositories \e
\-x $repo.git
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
To troubleshoot potential problems, you can pass \fB\-l\fP parameter to
grok\-manifest, just make sure the user executing the hook command (user
git or gitolite, for example) is able to write to that location:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
/usr/bin/grok\-manifest \-m /var/www/html/manifest.js.gz \e
\-t /var/lib/gitolite3/repositories \e
\-l /var/log/grokmirror/grok\-manifest.log \-n \(gapwd\(ga
.ft P
.fi
.UNINDENT
.UNINDENT
.SH SEE ALSO
.INDENT 0.0
.IP \(bu 2
grok\-pull(1)
.IP \(bu 2
git(1)
.UNINDENT
.SH SUPPORT
.sp
Email \fI\%tools@linux.kernel.org\fP\&.
.SH AUTHOR
mricon@kernel.org
License: GPLv3+
.SH COPYRIGHT
The Linux Foundation and contributors
.\" Generated by docutils manpage writer.
.
grokmirror-2.0.11/man/grok-manifest.1.rst 0000664 0000000 0000000 00000007561 14103301457 0020211 0 ustar 00root root 0000000 0000000 GROK-MANIFEST
=============
---------------------------------------
Create manifest for use with grokmirror
---------------------------------------
:Author: mricon@kernel.org
:Date: 2020-08-14
:Copyright: The Linux Foundation and contributors
:License: GPLv3+
:Version: 2.0.0
:Manual section: 1
SYNOPSIS
--------
grok-manifest [opts] -m manifest.js[.gz] -t /path [/path/to/bare.git]
DESCRIPTION
-----------
Call grok-manifest from a git post-update or post-receive hook to create
the latest repository manifest. This manifest file is downloaded by
mirroring systems (if manifest is newer than what they already have) and
used to only clone/pull the repositories that have changed since the
grok-pull's last run.
OPTIONS
-------
--version show program's version number and exit
-h, --help show this help message and exit
--cfgfile=CFGFILE Path to grokmirror.conf containing a [manifest] section
-m MANIFILE, --manifest=MANIFILE
Location of manifest.js or manifest.js.gz
-t TOPLEVEL, --toplevel=TOPLEVEL
Top dir where all repositories reside
-l LOGFILE, --logfile=LOGFILE
When specified, will put debug logs in this location
-c, --check-export-ok
Honor the git-daemon-export-ok magic file and
do not export repositories not marked as such
-n, --use-now Use current timestamp instead of parsing commits
-p, --purge Purge deleted git repositories from manifest
-x, --remove Remove repositories passed as arguments from
the manifest file
-y, --pretty Pretty-print the generated manifest (sort repos
and add indentation). This is much slower, so
should be used with caution on large
collections.
-w, --wait-for-manifest
When running with arguments, wait if manifest is not
there (can be useful when multiple writers are writing
to the manifest file via NFS)
-i IGNORE, --ignore-paths=IGNORE
When finding git dirs, ignore these paths (can be used
multiple times, accepts shell-style globbing)
-o, --fetch-objstore Fetch updates into objstore repo (if used)
-v, --verbose Be verbose and tell us what you are doing
You can set some of these options in a config file that you can pass via
``--cfgfile`` option. See example grokmirror.conf file for
documentation. Values passed via cmdline flags will override the
corresponding config file values.
EXAMPLES
--------
The examples assume that the repositories are located in
``/var/lib/gitolite3/repositories``.
Initial manifest generation::
/usr/bin/grok-manifest -m /var/www/html/manifest.js.gz \
-t /var/lib/gitolite3/repositories
Inside the git hook::
/usr/bin/grok-manifest -m /var/www/html/manifest.js.gz \
-t /var/lib/gitolite3/repositories -n `pwd`
To purge deleted repositories from the manifest, use the ``-p`` flag
when running from cron::
/usr/bin/grok-manifest -m /var/www/html/manifest.js.gz \
-t /var/lib/gitolite3/repositories -p
You can also add it to the gitolite's ``D`` command using the ``-x`` flag::
/usr/bin/grok-manifest -m /var/www/html/manifest.js.gz \
-t /var/lib/gitolite3/repositories \
-x $repo.git
To troubleshoot potential problems, you can pass ``-l`` parameter to
grok-manifest, just make sure the user executing the hook command (user
git or gitolite, for example) is able to write to that location::
/usr/bin/grok-manifest -m /var/www/html/manifest.js.gz \
-t /var/lib/gitolite3/repositories \
-l /var/log/grokmirror/grok-manifest.log -n `pwd`
SEE ALSO
--------
* grok-pull(1)
* git(1)
SUPPORT
-------
Email tools@linux.kernel.org.
grokmirror-2.0.11/man/grok-pi-piper.1 0000664 0000000 0000000 00000006146 14103301457 0017317 0 ustar 00root root 0000000 0000000 .\" Man page generated from reStructuredText.
.
.TH GROK-PI-PIPER 1 "2020-10-07" "2.0.2" ""
.SH NAME
GROK-PI-PIPER \- Hook script for piping new messages from public-inbox repos
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.SH SYNOPSIS
.INDENT 0.0
.INDENT 3.5
grok\-pi\-piper [\-h] [\-v] [\-d] \-c CONFIG [\-l PIPELAST] [\-\-version] repo
.UNINDENT
.UNINDENT
.SH DESCRIPTION
.sp
This is a ready\-made hook script that can be called from
pull.post_update_hook when mirroring public\-inbox repositories. It will
pipe all newly received messages to arbitrary commands defined in the
config file. The simplest configuration for lore.kernel.org is:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
~/.config/pi\-piper.conf
\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
[DEFAULT]
pipe = /usr/bin/procmail
# Prune successfully processed messages
shallow = yes
~/.procmailrc
\-\-\-\-\-\-\-\-\-\-\-\-\-
DEFAULT=$HOME/Maildir/
# Don\(aqt deliver cross\-posted duplicates
:0 Wh: .msgid.lock
| formail \-D 8192 .msgid.cache
~/.config/lore.conf
\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
[core]
toplevel = ~/.local/share/grokmirror/lore
log = ${toplevel}/grokmirror.log
[remote]
site = https://lore.kernel.org
manifest = https://lore.kernel.org/manifest.js.gz
[pull]
post_update_hook = ~/.local/bin/grok\-pi\-piper \-c ~/.config/pi\-piper.conf
include = /list\-you\-want/*
/another\-list/*
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
It assumes that grokmirror was installed from pip. If you installed it
via some other means, please check the path for the grok\-pi\-piper
script.
.sp
Note, that initial clone may take a long time, even if you set
shallow=yes.
.sp
See pi\-piper.conf for other config options.
.SH OPTIONS
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.B \-h\fP,\fB \-\-help
show this help message and exit
.TP
.B \-v\fP,\fB \-\-verbose
Be verbose and tell us what you are doing (default: False)
.TP
.B \-d\fP,\fB \-\-dry\-run
Do a dry\-run and just show what would be done (default: False)
.TP
.BI \-c \ CONFIG\fP,\fB \ \-\-config \ CONFIG
Location of the configuration file (default: None)
.TP
.BI \-l \ PIPELAST\fP,\fB \ \-\-pipe\-last \ PIPELAST
Force pipe last NN messages in the list, regardless of tracking (default: None)
.TP
.B \-\-version
show program\(aqs version number and exit
.UNINDENT
.UNINDENT
.UNINDENT
.SH SEE ALSO
.INDENT 0.0
.IP \(bu 2
grok\-pull(1)
.IP \(bu 2
git(1)
.UNINDENT
.SH SUPPORT
.sp
Email \fI\%tools@linux.kernel.org\fP\&.
.SH AUTHOR
mricon@kernel.org
License: GPLv3+
.SH COPYRIGHT
The Linux Foundation and contributors
.\" Generated by docutils manpage writer.
.
grokmirror-2.0.11/man/grok-pi-piper.1.rst 0000664 0000000 0000000 00000004525 14103301457 0020125 0 ustar 00root root 0000000 0000000 GROK-PI-PIPER
=============
-----------------------------------------------------------
Hook script for piping new messages from public-inbox repos
-----------------------------------------------------------
:Author: mricon@kernel.org
:Date: 2020-10-07
:Copyright: The Linux Foundation and contributors
:License: GPLv3+
:Version: 2.0.2
:Manual section: 1
SYNOPSIS
--------
grok-pi-piper [-h] [-v] [-d] -c CONFIG [-l PIPELAST] [--version] repo
DESCRIPTION
-----------
This is a ready-made hook script that can be called from
pull.post_update_hook when mirroring public-inbox repositories. It will
pipe all newly received messages to arbitrary commands defined in the
config file. The simplest configuration for lore.kernel.org is::
~/.config/pi-piper.conf
-----------------------
[DEFAULT]
pipe = /usr/bin/procmail
# Prune successfully processed messages
shallow = yes
~/.procmailrc
-------------
DEFAULT=$HOME/Maildir/
# Don't deliver cross-posted duplicates
:0 Wh: .msgid.lock
| formail -D 8192 .msgid.cache
~/.config/lore.conf
-------------------
[core]
toplevel = ~/.local/share/grokmirror/lore
log = ${toplevel}/grokmirror.log
[remote]
site = https://lore.kernel.org
manifest = https://lore.kernel.org/manifest.js.gz
[pull]
post_update_hook = ~/.local/bin/grok-pi-piper -c ~/.config/pi-piper.conf
include = /list-you-want/*
/another-list/*
It assumes that grokmirror was installed from pip. If you installed it
via some other means, please check the path for the grok-pi-piper
script.
Note, that initial clone may take a long time, even if you set
shallow=yes.
See pi-piper.conf for other config options.
OPTIONS
-------
-h, --help show this help message and exit
-v, --verbose Be verbose and tell us what you are doing (default: False)
-d, --dry-run Do a dry-run and just show what would be done (default: False)
-c CONFIG, --config CONFIG
Location of the configuration file (default: None)
-l PIPELAST, --pipe-last PIPELAST
Force pipe last NN messages in the list, regardless of tracking (default: None)
--version show program's version number and exit
SEE ALSO
--------
* grok-pull(1)
* git(1)
SUPPORT
-------
Email tools@linux.kernel.org.
grokmirror-2.0.11/man/grok-pull.1 0000664 0000000 0000000 00000006404 14103301457 0016543 0 ustar 00root root 0000000 0000000 .\" Man page generated from reStructuredText.
.
.TH GROK-PULL 1 "2020-08-14" "2.0.0" ""
.SH NAME
GROK-PULL \- Clone or update local git repositories
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.SH SYNOPSIS
.INDENT 0.0
.INDENT 3.5
grok\-pull \-c /path/to/grokmirror.conf
.UNINDENT
.UNINDENT
.SH DESCRIPTION
.sp
Grok\-pull is the main tool for replicating repository updates from the
grokmirror primary server to the mirrors.
.sp
Grok\-pull has two modes of operation \-\- onetime and continous
(daemonized). In one\-time operation mode, it downloads the latest
manifest and applies any outstanding updates. If there are new
repositories or changes in the existing repositories, grok\-pull will
perform the necessary git commands to clone or fetch the required data
from the master. Once all updates are applied, it will write its own
manifest and exit. In this mode, grok\-pull can be run manually or from
cron.
.sp
In continuous operation mode (when run with \-o), grok\-pull will continue
running after all updates have been applied and will periodically
re\-download the manifest from the server to check for new updates. For
this to work, you must set pull.refresh in grokmirror.conf to the amount
of seconds you would like it to wait between refreshes.
.sp
If pull.socket is specified, grok\-pull will also listen on a socket for
any push updates (relative repository path as present in the manifest
file, terminated with newlines). This can be used for pubsub
subscriptions (see contrib).
.SH OPTIONS
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.B \-\-version
show program\(aqs version number and exit
.TP
.B \-h\fP,\fB \-\-help
show this help message and exit
.TP
.B \-v\fP,\fB \-\-verbose
Be verbose and tell us what you are doing
.TP
.B \-n\fP,\fB \-\-no\-mtime\-check
Run without checking manifest mtime.
.TP
.B \-o\fP,\fB \-\-continuous
Run continuously (no effect if refresh is not set)
.TP
.BI \-c \ CONFIG\fP,\fB \ \-\-config\fB= CONFIG
Location of the configuration file
.TP
.B \-p\fP,\fB \-\-purge
Remove any git trees that are no longer in manifest.
.TP
.B \-\-force\-purge
Force purge operation despite significant repo deletions
.UNINDENT
.UNINDENT
.UNINDENT
.SH EXAMPLES
.sp
Use grokmirror.conf and modify it to reflect your needs. The example
configuration file is heavily commented. To invoke, run:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
grok\-pull \-v \-c /path/to/grokmirror.conf
.ft P
.fi
.UNINDENT
.UNINDENT
.SH SEE ALSO
.INDENT 0.0
.IP \(bu 2
grok\-manifest(1)
.IP \(bu 2
grok\-fsck(1)
.IP \(bu 2
git(1)
.UNINDENT
.SH SUPPORT
.sp
Please email \fI\%tools@linux.kernel.org\fP\&.
.SH AUTHOR
mricon@kernel.org
License: GPLv3+
.SH COPYRIGHT
The Linux Foundation and contributors
.\" Generated by docutils manpage writer.
.
grokmirror-2.0.11/man/grok-pull.1.rst 0000664 0000000 0000000 00000004603 14103301457 0017351 0 ustar 00root root 0000000 0000000 GROK-PULL
=========
--------------------------------------
Clone or update local git repositories
--------------------------------------
:Author: mricon@kernel.org
:Date: 2020-08-14
:Copyright: The Linux Foundation and contributors
:License: GPLv3+
:Version: 2.0.0
:Manual section: 1
SYNOPSIS
--------
grok-pull -c /path/to/grokmirror.conf
DESCRIPTION
-----------
Grok-pull is the main tool for replicating repository updates from the
grokmirror primary server to the mirrors.
Grok-pull has two modes of operation -- onetime and continous
(daemonized). In one-time operation mode, it downloads the latest
manifest and applies any outstanding updates. If there are new
repositories or changes in the existing repositories, grok-pull will
perform the necessary git commands to clone or fetch the required data
from the master. Once all updates are applied, it will write its own
manifest and exit. In this mode, grok-pull can be run manually or from
cron.
In continuous operation mode (when run with -o), grok-pull will continue
running after all updates have been applied and will periodically
re-download the manifest from the server to check for new updates. For
this to work, you must set pull.refresh in grokmirror.conf to the amount
of seconds you would like it to wait between refreshes.
If pull.socket is specified, grok-pull will also listen on a socket for
any push updates (relative repository path as present in the manifest
file, terminated with newlines). This can be used for pubsub
subscriptions (see contrib).
OPTIONS
-------
--version show program's version number and exit
-h, --help show this help message and exit
-v, --verbose Be verbose and tell us what you are doing
-n, --no-mtime-check Run without checking manifest mtime.
-o, --continuous Run continuously (no effect if refresh is not set)
-c CONFIG, --config=CONFIG
Location of the configuration file
-p, --purge Remove any git trees that are no longer in manifest.
--force-purge Force purge operation despite significant repo deletions
EXAMPLES
--------
Use grokmirror.conf and modify it to reflect your needs. The example
configuration file is heavily commented. To invoke, run::
grok-pull -v -c /path/to/grokmirror.conf
SEE ALSO
--------
* grok-manifest(1)
* grok-fsck(1)
* git(1)
SUPPORT
-------
Please email tools@linux.kernel.org.
grokmirror-2.0.11/pi-piper.conf 0000664 0000000 0000000 00000002606 14103301457 0016366 0 ustar 00root root 0000000 0000000 # These will be overriden by any sections below
[DEFAULT]
# To start piping public-inbox messages into your inbox, simply
# install procmail and add the following line to your ~/.procmailrc:
# DEFAULT=$HOME/Maildir/
# You can now read your mail with "mutt -f ~/Maildir/"
pipe = /usr/bin/procmail
# Once you've successfully piped the messages, you generally
# don't need them any more. If you set shallow = yes, then
# the repository will be configured as "shallow" and all succesffully
# processed messages will be pruned from the repo.
# This will greatly reduce disk space usage, especially on large archives.
# You can always get any number of them back, e.g. by running:
# git fetch _grokmirror master --deepen 100
shallow = yes
# You can use ~/ for paths in your home dir, or omit for no log
#log = ~/pi-piper.log
# Can be "info" or "debug". Note, that debug will have message bodies as well.
#loglevel = info
# Overrides for any defaults. You may not need any if all you want is to pipe all mirrored
# public-inboxes to procmail.
# Naming:
# We will perform simple shell-style globbing using the following rule:
# /{section}/git/*.git,
# so, for a section that matches /alsa-devel/git/0.git, name it "alsa-devel"
[alsa-devel]
# Use a different config file for this one
pipe = /usr/bin/procmail /path/to/some/other/procmailrc
[lkml]
# Setting pipe = None allows ignoring this particular list
pipe = None grokmirror-2.0.11/requirements.txt 0000664 0000000 0000000 00000000022 14103301457 0017244 0 ustar 00root root 0000000 0000000 packaging
requests grokmirror-2.0.11/setup.py 0000664 0000000 0000000 00000004600 14103301457 0015500 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright (C) 2013-2020 by The Linux Foundation and contributors
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see .
import os
import re
from setuptools import setup
def read(fname):
return open(os.path.join(os.path.dirname(__file__), fname)).read()
def find_version(source):
version_file = read(source)
version_match = re.search(r"^VERSION = ['\"]([^'\"]*)['\"]", version_file, re.M)
if version_match:
return version_match.group(1)
raise RuntimeError("Unable to find version string.")
NAME = 'grokmirror'
VERSION = find_version('grokmirror/__init__.py')
setup(
version=VERSION,
url='https://git.kernel.org/pub/scm/utils/grokmirror/grokmirror.git',
download_url='https://www.kernel.org/pub/software/network/grokmirror/%s-%s.tar.xz' % (NAME, VERSION),
name=NAME,
description='Smartly mirror git repositories that use grokmirror',
author='Konstantin Ryabitsev',
author_email='konstantin@linuxfoundation.org',
packages=[NAME],
license='GPLv3+',
long_description=read('README.rst'),
long_description_content_type='text/x-rst',
keywords=['git', 'mirroring', 'repositories'],
project_urls={
'Source': 'https://git.kernel.org/pub/scm/utils/grokmirror/grokmirror.git',
'Tracker': 'https://github.com/mricon/grokmirror/issues',
},
install_requires=[
'requests',
],
python_requires='>=3.6',
entry_points={
'console_scripts': [
"grok-dumb-pull=grokmirror.dumb_pull:command",
"grok-pull=grokmirror.pull:command",
"grok-fsck=grokmirror.fsck:command",
"grok-manifest=grokmirror.manifest:command",
"grok-bundle=grokmirror.bundle:command",
"grok-pi-piper=grokmirror.pi_piper:command",
]
}
)