guerillabackup-0.5.0/LICENSE
GNU LESSER GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
Copyright (C) 2007 Free Software Foundation, Inc.
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
This version of the GNU Lesser General Public License incorporates
the terms and conditions of version 3 of the GNU General Public
License, supplemented by the additional permissions listed below.
0. Additional Definitions.
As used herein, "this License" refers to version 3 of the GNU Lesser
General Public License, and the "GNU GPL" refers to version 3 of the GNU
General Public License.
"The Library" refers to a covered work governed by this License,
other than an Application or a Combined Work as defined below.
An "Application" is any work that makes use of an interface provided
by the Library, but which is not otherwise based on the Library.
Defining a subclass of a class defined by the Library is deemed a mode
of using an interface provided by the Library.
A "Combined Work" is a work produced by combining or linking an
Application with the Library. The particular version of the Library
with which the Combined Work was made is also called the "Linked
Version".
The "Minimal Corresponding Source" for a Combined Work means the
Corresponding Source for the Combined Work, excluding any source code
for portions of the Combined Work that, considered in isolation, are
based on the Application, and not on the Linked Version.
The "Corresponding Application Code" for a Combined Work means the
object code and/or source code for the Application, including any data
and utility programs needed for reproducing the Combined Work from the
Application, but excluding the System Libraries of the Combined Work.
1. Exception to Section 3 of the GNU GPL.
You may convey a covered work under sections 3 and 4 of this License
without being bound by section 3 of the GNU GPL.
2. Conveying Modified Versions.
If you modify a copy of the Library, and, in your modifications, a
facility refers to a function or data to be supplied by an Application
that uses the facility (other than as an argument passed when the
facility is invoked), then you may convey a copy of the modified
version:
a) under this License, provided that you make a good faith effort to
ensure that, in the event an Application does not supply the
function or data, the facility still operates, and performs
whatever part of its purpose remains meaningful, or
b) under the GNU GPL, with none of the additional permissions of
this License applicable to that copy.
3. Object Code Incorporating Material from Library Header Files.
The object code form of an Application may incorporate material from
a header file that is part of the Library. You may convey such object
code under terms of your choice, provided that, if the incorporated
material is not limited to numerical parameters, data structure
layouts and accessors, or small macros, inline functions and templates
(ten or fewer lines in length), you do both of the following:
a) Give prominent notice with each copy of the object code that the
Library is used in it and that the Library and its use are
covered by this License.
b) Accompany the object code with a copy of the GNU GPL and this license
document.
4. Combined Works.
You may convey a Combined Work under terms of your choice that,
taken together, effectively do not restrict modification of the
portions of the Library contained in the Combined Work and reverse
engineering for debugging such modifications, if you also do each of
the following:
a) Give prominent notice with each copy of the Combined Work that
the Library is used in it and that the Library and its use are
covered by this License.
b) Accompany the Combined Work with a copy of the GNU GPL and this license
document.
c) For a Combined Work that displays copyright notices during
execution, include the copyright notice for the Library among
these notices, as well as a reference directing the user to the
copies of the GNU GPL and this license document.
d) Do one of the following:
0) Convey the Minimal Corresponding Source under the terms of this
License, and the Corresponding Application Code in a form
suitable for, and under terms that permit, the user to
recombine or relink the Application with a modified version of
the Linked Version to produce a modified Combined Work, in the
manner specified by section 6 of the GNU GPL for conveying
Corresponding Source.
1) Use a suitable shared library mechanism for linking with the
Library. A suitable mechanism is one that (a) uses at run time
a copy of the Library already present on the user's computer
system, and (b) will operate properly with a modified version
of the Library that is interface-compatible with the Linked
Version.
e) Provide Installation Information, but only if you would otherwise
be required to provide such information under section 6 of the
GNU GPL, and only to the extent that such information is
necessary to install and execute a modified version of the
Combined Work produced by recombining or relinking the
Application with a modified version of the Linked Version. (If
you use option 4d0, the Installation Information must accompany
the Minimal Corresponding Source and Corresponding Application
Code. If you use option 4d1, you must provide the Installation
Information in the manner specified by section 6 of the GNU GPL
for conveying Corresponding Source.)
5. Combined Libraries.
You may place library facilities that are a work based on the
Library side by side in a single library together with other library
facilities that are not Applications and are not covered by this
License, and convey such a combined library under terms of your
choice, if you do both of the following:
a) Accompany the combined library with a copy of the same work based
on the Library, uncombined with any other library facilities,
conveyed under the terms of this License.
b) Give prominent notice with the combined library that part of it
is a work based on the Library, and explaining where to find the
accompanying uncombined form of the same work.
6. Revised Versions of the GNU Lesser General Public License.
The Free Software Foundation may publish revised and/or new versions
of the GNU Lesser General Public License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns.
Each version is given a distinguishing version number. If the
Library as you received it specifies that a certain numbered version
of the GNU Lesser General Public License "or any later version"
applies to it, you have the option of following the terms and
conditions either of that published version or of any later version
published by the Free Software Foundation. If the Library as you
received it does not specify a version number of the GNU Lesser
General Public License, you may choose any version of the GNU Lesser
General Public License ever published by the Free Software Foundation.
If the Library as you received it specifies that a proxy can decide
whether future versions of the GNU Lesser General Public License shall
apply, that proxy's public statement of acceptance of any version is
permanent authorization for you to choose that version for the
Library.
guerillabackup-0.5.0/README.md
# GuerillaBackup:
GuerillaBackup is a minimalistic backup toolbox for asynchronous,
local-coordinated, distributed, resilient and secure backup generation,
data distribution, verification, storage and deletion suited for
rugged environments. GuerillaBackup could be the right solution
for you if you want
* distributed backup data generation under control of the source
system owner, assuming that the owner knows best which data is worth
backing up and which policies (retention time,
copy count, encryption, non-repudiation) should be applied
* operation with limited bandwidth, unstable network connectivity,
limited storage space
* data confidentiality, integrity, availability guarantees even
with a limited number of compromised or malicious backup processing
nodes
* limited trust between backup data source and sink system(s)
When you need the following features, you might look for a standard
free or commercial backup solution:
* central control of backup and retention policies
* central unlimited access to all data
* operation under stable conditions with a solid network, sufficient
storage and mutual trust between backup data source and sink
# Getting started:
For those who just want to get started quickly, the following trail
might be best:
* Build (see "Building" below) the software or install the binary
package from file ("dpkg -i guerillabackup_[version]_all.deb")
or repository ("apt-get install guerillabackup").
* Follow the steps from "doc/Installation.txt", section "General
GuerillaBackup Configuration".
* If not everything is fine yet, check "doc/FAQs.txt" to see if
your problem is already known.
* If it is still not working, please file a bug/feature request on GitHub,
see "Resources" section below.
# Building:
* Build a native Debian test package using the default template:
see data/debian.template/Readme.txt
# Resources:
* Bugs, feature requests: https://github.com/halfdog/guerillabackup/issues
# Documentation:
* doc/Design.txt: GuerillaBackup design documentation
* doc/Implementation.txt: GuerillaBackup implementation documentation
* doc/Installation.txt: GuerillaBackup end user installation
documentation
* Manual pages: doc/gb-backup-generator.1.xml and doc/gb-transfer-service.1.xml
here on GitHub, usually "man (gb-backup-generator|gb-transfer-service)"
when installed from the package repository
* doc/FAQs.txt: GuerillaBackup frequently asked questions
guerillabackup-0.5.0/data/debian.template/NEWS
guerillabackup (0.5.0) unstable; urgency=low
A general overhaul of package structure was done to improve
compatibility with Debian packaging standards.
-- halfdog Thu, 14 Sep 2023 21:00:00 +0000
guerillabackup-0.5.0/data/debian.template/Readme.txt
This directory contains the Debian packaging files to build a
native package. They are kept in the data directory: quilt type
Debian package building cannot use any files included in the
DEBIAN directory of an upstream orig.tar.gz.
To build a native package using the template files, use following
commands:
projectBaseDir="... directory with GuerillaBackup source ..."
tmpDir="$(mktemp -d)"
mkdir -- "${tmpDir}/guerillabackup"
cp -aT -- "${projectBaseDir}" "${tmpDir}/guerillabackup"
mv -i -- "${tmpDir}/guerillabackup/data/debian.template" "${tmpDir}/guerillabackup/debian" < /dev/null
cd "${tmpDir}/guerillabackup"
dpkg-buildpackage -us -uc
guerillabackup-0.5.0/data/debian.template/changelog
guerillabackup (0.5.0) unstable; urgency=low
Features:
* Added systemd unit hardening.
Refactoring:
* Renamed binaries, pathnames according to Debian package inclusion
recommendations.
* Removed obsolete "/var/run" directories.
Bugfixes:
* gb-backup-generator:
* Provide working default configuration.
* gb-storage-tool:
* Avoid Exception, use exit(1) on error instead.
* Improved error message.
-- halfdog Thu, 14 Sep 2023 21:00:00 +0000
guerillabackup (0.4.0) unstable; urgency=low
Features:
* StorageTool:
* Added "Size" check policy to detect backups with abnormal size.
* Improved messages for interval policy violations and how to fix.
* Warn about files not having any applicable policies defined.
* Made policy inheritance control more explicit, improved
documentation.
Bugfixes:
* BackupGenerator:
* Fixed invalid executable path in systemd service templates.
* Fixed backup generation pipeline race condition on asynchronous
shutdown.
* Applied pylint.
* StorageTool:
* Removed anti-file-deletion protection left in code accidentally.
* Fixed Interval policy status handling when applying retention
policies.
* Fixed deletion mark handling with concurrent retention
policies.
* Fixed exception attempting element retrieval from nonexisting
source.
* Fixed error message typos.
-- halfdog Thu, 19 Jan 2023 14:10:45 +0000
guerillabackup (0.3.0) unstable; urgency=low
Features:
* Added StorageTool policy support to verify sane backup intervals
and to apply data retention policies.
-- halfdog Wed, 30 Nov 2022 15:55:59 +0000
guerillabackup (0.2.0) unstable; urgency=low
Features:
* Added StorageTool to check storage data status, currently
only checking for invalid file names in the storage directory.
Bugfixes:
* Improved TransferService error messages and formatting mistake
in man page.
-- halfdog Wed, 15 Jun 2022 07:57:19 +0000
guerillabackup (0.1.1) unstable; urgency=low
* Bugfixes:
* Correct handling of undefined start condition
-- halfdog Sat, 15 Jan 2022 08:56:30 +0000
guerillabackup (0.1.0) unstable; urgency=low
* Features:
* Added flexible backup generator run condition support
-- halfdog Sun, 5 Sep 2021 18:31:00 +0000
guerillabackup (0.0.2) unstable; urgency=low
* Fixes:
* Fixed wrong full tar backup interval defaults in template
* Features:
* Added TransferService clean shutdown on [Ctrl]-C
* Misc:
* Applied lintian/pylint suggestions
-- halfdog Sat, 24 Oct 2020 11:08:00 +0000
guerillabackup (0.0.1) unstable; urgency=low
* Fixes from Debian mentors review process
* Removed postrm script template
* Changed Debian package section from misc to extra
* Moved file/directory permission setting changes from postinst
to package building rules
* Manpage text corrections after spellchecking
* Features:
* Python 2 to 3 transition applying pylint for coding style
and syntax error detection, code refactoring
* Enforce gnupg encryption exit status check
* Bugfixes:
* Improved IO-handling to external processes during shutdown
* Improved transfer protocol error handling
* Disable console output buffering when not operating on TTYs
* Improved tar backup error status handling, cleanup
* Handle broken full/inc backup timing configuration gracefully
* Close file descriptors after file transfer or on shutdown
due to protocol errors
-- halfdog Thu, 19 Jul 2018 20:57:00 +0000
guerillabackup (0.0.0) unstable; urgency=low
* Initial packaging of guerillabackup
-- halfdog Fri, 30 Dec 2016 00:00:00 +0000
guerillabackup-0.5.0/data/debian.template/control
Source: guerillabackup
Section: misc
Priority: optional
Maintainer: halfdog
Build-Depends: debhelper-compat (=13), dh-python, docbook-xsl, docbook-xml, xsltproc
Standards-Version: 4.6.2
Rules-Requires-Root: no
Homepage: https://github.com/halfdog/guerillabackup
Vcs-Git: https://github.com/halfdog/guerillabackup.git
Vcs-Browser: https://github.com/halfdog/guerillabackup
Package: guerillabackup
Architecture: all
Depends: python3, ${misc:Depends}
Description: resilient, distributed backup and archiving solution
GuerillaBackup is a backup solution for tailoring special purpose
backup data flows, e.g. in rugged environments, with special legal
constraints (privacy regulations, cross border data storage), need
for integration of custom data processing, auditing or quality
assurance code. It is kept small to ease code audits and does
not attempt to duplicate features that stable and trusted high quality
encryption, networking or storage solutions already provide. So
for example it can be integrated with your choice of trusted transports
(ssh, SSL, xmpp, etc.) to transfer backups to other nodes according
to predefined or custom transfer policies. GuerillaBackup ensures
security by encrypting data at the source so that only the holder
of the corresponding key can decrypt the data.
.
WARNING: If you are familiar with GDPR, ITIL-service-strategy/design,
ISO-27k, ... this software will allow you to create custom solutions
fulfilling your needs for high quality, legally sound backup and
archiving data flows. If you do NOT run production with those
(or similar terms) in mind, you should look out for something
else.
.
See /usr/share/doc/guerillabackup/Design.txt.gz section "Requirements"
for more information.
guerillabackup-0.5.0/data/debian.template/copyright
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: guerillabackup
Upstream-Contact: me@halfdog.net
Source: https://github.com/halfdog/guerillabackup.git
Files: *
Copyright: 2016-2023 halfdog
License: LGPL-3.0+
License: LGPL-3.0+
This package is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 3 of the License, or (at your option) any later version.
.
This package is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
.
On Debian systems, the complete text of the GNU Lesser General
Public License can be found in "/usr/share/common-licenses/LGPL-3".
guerillabackup-0.5.0/data/debian.template/dirs
etc/guerillabackup/lib-enabled
var/lib/guerillabackup
var/lib/guerillabackup/data
var/lib/guerillabackup/state
guerillabackup-0.5.0/data/debian.template/guerillabackup.install
src/lib usr/lib/guerillabackup
src/gb-backup-generator usr/bin
src/gb-storage-tool usr/bin
src/gb-transfer-service usr/bin
data/etc/* etc/guerillabackup
data/init/systemd/guerillabackup-generator.service lib/systemd/system
data/init/systemd/guerillabackup-transfer.service lib/systemd/system
doc/Design.txt usr/share/doc/guerillabackup
doc/Implementation.txt usr/share/doc/guerillabackup
doc/Installation.txt usr/share/doc/guerillabackup
doc/BackupGeneratorUnit.py.template usr/share/doc/guerillabackup
guerillabackup-0.5.0/data/debian.template/guerillabackup.lintian-overrides
# We use a non-standard dir permission to prohibit any non-root
# user to access keys or data by default.
guerillabackup binary: non-standard-dir-perm
# Ignore repeated path segment to have "guerillabackup" Python
# package in an otherwise empty directory so that modifying
# the Python search path to include "lib" will not include any
# other files from this or other packages in the search path.
guerillabackup binary: repeated-path-segment guerillabackup [usr/lib/guerillabackup/lib/guerillabackup/]
guerillabackup binary: repeated-path-segment lib [usr/lib/guerillabackup/lib/]
# Currently only systemd startup is provided, therefore ignore
# the warning but document it here.
guerillabackup binary: package-supports-alternative-init-but-no-init.d-script
guerillabackup-0.5.0/data/debian.template/guerillabackup.manpages
debian/gb-backup-generator.1
debian/gb-storage-tool.1
debian/gb-transfer-service.1
guerillabackup-0.5.0/data/debian.template/postinst
#!/bin/sh
# postinst script for guerillabackup
#
# see: dh_installdeb(1)
set -e
# summary of how this script can be called:
# * `configure'
# * `abort-upgrade'
# * `abort-remove' `in-favour'
#
# * `abort-remove'
# * `abort-deconfigure' `in-favour'
# `removing'
#
# for details, see https://www.debian.org/doc/debian-policy/ or
# the debian-policy package
# dh_installdeb will replace this with shell code automatically
# generated by other debhelper scripts.
#DEBHELPER#
# Start the service only when it was already enabled before updating.
# This is required as "dh_systemd_start" is disabled in rules
# file, thus not restarting the services. See rules file for more
# information.
if test -d /run/systemd/system && test -e /run/guerillabackup.dpkg-update.run-state; then
  daemonReloadedFlag="false"
  for serviceName in guerillabackup-generator.service guerillabackup-transfer.service; do
    if grep -q -e "${serviceName}: active" -- /run/guerillabackup.dpkg-update.run-state; then
      if [ "${daemonReloadedFlag}" != "true" ]; then
        systemctl --system daemon-reload >/dev/null || true
        daemonReloadedFlag="true"
      fi
      deb-systemd-invoke restart "${serviceName}" >/dev/null || true
    fi
  done
fi
rm -f -- /run/guerillabackup.dpkg-update.run-state
exit 0
guerillabackup-0.5.0/data/debian.template/prerm
#!/bin/sh
# prerm script for guerillabackup
#
# see: dh_installdeb(1)
set -e
# summary of how this script can be called:
# * `remove'
# * `upgrade'
# * `failed-upgrade'
# * `remove' `in-favour'
# * `deconfigure' `in-favour'
# `removing'
#
# for details, see https://www.debian.org/doc/debian-policy/ or
# the debian-policy package
# Capture the state of all currently running services before stop.
rm -f -- /run/guerillabackup.dpkg-update.run-state
if test -d /run/systemd/system; then
  for serviceName in guerillabackup-generator.service guerillabackup-transfer.service; do
    echo "${serviceName}: $(systemctl is-active "${serviceName}")" >> /run/guerillabackup.dpkg-update.run-state
    deb-systemd-invoke stop "${serviceName}" > /dev/null
  done
fi
# dh_installdeb will replace this with shell code automatically
# generated by other debhelper scripts.
#DEBHELPER#
exit 0
guerillabackup-0.5.0/data/debian.template/rules
#!/usr/bin/make -f
# -*- makefile -*-
# Uncomment this to turn on verbose mode.
# export DH_VERBOSE=1
%:
	dh $@ --with=python3

override_dh_auto_build:
	xsltproc --nonet \
	  --param make.year.ranges 1 \
	  --param make.single.year.ranges 1 \
	  --param man.charmap.use.subset 0 \
	  -o debian/ \
	  http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl \
	  doc/gb-backup-generator.1.xml doc/gb-storage-tool.1.xml \
	  doc/gb-transfer-service.1.xml
	dh_auto_build
# Do not enable the services on fresh install by default. The
# user should do that manually for those services, he really wants
# to run. Also do not start the services after install or update.
# Without this option, all units would be started during upgrade,
# even those not enabled. When user did not enable them, dpkg
# should respect that. Those enabled will still be started by
# custom postinst code.
override_dh_installsystemd:
	dh_installsystemd --no-enable --no-start

override_dh_fixperms:
	dh_fixperms
	chmod -R 00700 -- debian/guerillabackup/var/lib/guerillabackup
	chmod 00700 -- debian/guerillabackup/etc/guerillabackup/keys
guerillabackup-0.5.0/data/debian.template/source/format
3.0 (native)
guerillabackup-0.5.0/data/etc/config.template
# GuerillaBackup main configuration file.
# General parameters influence behavior of various backup elements,
# e.g. source units, sinks or the generator itself. All those
# parameters start with "General" to indicate their global relevance.
# This is the default persistency storage base directory for all
# components. All components will create files or subdirectories
# starting with the component class name unless changed within
# configuration. See also "ComponentPersistencyPrefix" in unit
# or subunit configuration files.
# GeneralPersistencyBaseDir = '/var/lib/guerillabackup/state'
# This is the default runtime data directory for all components.
# It is used to create sockets, PID files and similar, that need
# not to be preserved between reboots.
# GeneralRuntimeDataDir = '/run/guerillabackup'
# This parameter defines the default pipeline element to use to
# compress backup data of any kind before sending it to downstream
# processing, usually encryption or the sink. When enabling compression
# and encryption, you may want to disable the additional compression
# step included in many encryption tools, e.g. via "--compress-algo
# none" in gpg.
# GeneralDefaultCompressionElement = guerillabackup.OSProcessPipelineElement('/bin/bzip2', ['/bin/bzip2', '-c9'])
# This parameter defines the default encryption pipeline element
# to use to encrypt backup data of any kind before sending it
# to donwstream processing. For security reasons, a unit might
# use an alternative encryption element, e.g. with different options
# or keys, but it should NEVER ignore the parameter, even when
# unit-specific encryption is disabled. Hence the unit shall never
# generate uncencrypted data while this parameter is not also
# overriden in the unit-specific configuration. See also function
# "getDefaultDownstreamPipeline" documentation.
# GeneralDefaultEncryptionElement = guerillabackup.GpgEncryptionPipelineElement('some key name')
# Debugging settings:
# This flag enables test mode for all configurable components
# in the data pipe from source to sink. As testing of most features
# requires running real backups, the test mode will cause
# an abort at the very last moment before completion. Well-behaved
# components will roll back most of their actions under these circumstances.
# GeneralDebugTestModeFlag = False
# Generator specific settings: Those settings configure the local
# default backup generator.
# Use this sink for storage of backup data elements. The class
# has to have a constructor only taking one argument, that is
# the generator configuration context as defined by the SinkInterface.
# When empty, the guerillabackup.DefaultFileSystemSink is used.
# GeneratorSinkClass = guerillabackup.DefaultFileSystemSink
# Use this directory for storage of the backup data elements generated
# locally. The default location is "/var/lib/guerillabackup/data".
# You may want also to enable transfer services using this directory
# as source to copy or move backup data to an offsite location.
DefaultFileSystemSinkBaseDir = '/var/lib/guerillabackup/data'
# This parameter defines the conditions that have to be met to
# run any backup unit. The settings is intended to avoid running
# units at unfavorable times, e.g. during machine maintenance,
# immediately during boot time high CPU/disk activity but also
# when there is abnormally high load on the machine. When the
# condition is not met yet it will be reevaluated at the next
# scheduler run, usually some seconds later.
# DefaultUnitRunCondition = guerillabackup.LogicalAndCondition([
# guerillabackup.MinPowerOnTimeCondition(600),
# guerillabackup.AverageLoadLimitCondition(0.5, 240)])
# Unit specific default and specific settings can be found in
# the units directory.
# Transfer service configuration: this part of configuration does
# not take effect automatically, a transfer service has to be
# started loading this configuration file. When security considerations
# prohibit use of same configuration, e.g. due to inaccessibility
# of configuration file because of permission settings, then this
# file should be copied to "config-[agent name]" instead.
# Storage directory used by this transfer service. When not present,
# the DefaultFileSystemSinkBaseDir is used instead.
# TransferServiceStorageBaseDir = '/var/spool/guerillabackup/transfer'
# Class to load to define the transfer receiver policy.
# TransferReceiverPolicyClass = guerillabackup.Transfer.ReceiverStoreDataTransferPolicy
# Arguments for creating the named transfer policy to pass after
# the configuration context.
# TransferReceiverPolicyInitArgs = None
# Class to load to define the transfer sender policy.
# TransferSenderPolicyClass = guerillabackup.Transfer.SenderMoveDataTransferPolicy
# Arguments for creating the named transfer policy to pass after
# the configuration context.
# TransferSenderPolicyInitArgs = [False]
guerillabackup-0.5.0/data/etc/keys/Readme.txt
Use this directory to collect all the encryption keys required
for guerillabackup tool operation. By default, the directory is
only readable by the root user to protect the keys stored here.
Change this only when you know what you are doing.
This directory is also the default home directory for GnuPG, as
GnuPG cannot operate without a home directory without emitting warnings.
guerillabackup-0.5.0/data/etc/storage-tool-config.json.template
# This is the gb-storage-tool configuration template. See also the
# gb-storage-tool man page for more information.
{
# These are the default policies to apply to resources in the
# data directory when they are first seen. Policies are passed
# on from a configuration to included subconfigurations. An included
# configuration may override a policy by defining another one
# on the same resource but with higher priority. To disable policy
# inheritance, add a null policy as first element in the list.
"Policies": [
{
"Sources": "^(.*/)?root$",
"Inherit": true,
"List": [
{
"Name": "Interval",
"Priority": 100,
"FullMin": "6d20H",
"FullMax": "6d28H",
"IncMin": "20H",
"IncMax": "28H"
}, {
"Name": "LevelRetention",
"Levels": [
# Keep daily backups for 30 days, including incremental backups.
{
"KeepCount": 30,
"Interval": "day",
"TimeRef": "latest",
"KeepInc": true
},
# Keep weekly backups for 3 months, approx. 13 backups.
{
"KeepCount": 13,
"Interval": "day",
"AlignModulus": 7
},
# Keep monthly backups for 12 months.
{
"KeepCount": 12,
"Interval": "month"
},
# Keep 3-month backups for 3 years, total 12 backups.
{
"KeepCount": 12,
"Interval": "month",
"AlignModulus": 3,
"AlignValue": 1
},
# Keep yearly backups.
{
"KeepCount": 10,
"Interval": "year"
}
]
}
]
},
{
"Sources": "^(.*/)?var/log/.*$",
"List": [
{
"Name": "Interval",
"Priority": 100,
"FullMin": "20H",
"FullMax": "28H"
}
]
}
],
# This is the data directory for this configuration. All files
# not within the data directory of another (sub-)configuration
# have to be sane backup resource files or otherwise covered
# by a policy, usually the "Ignore" policy in the status file.
"DataDir": "/var/lib/guerillabackup/data",
# Ignore those files in the data directory. Ignoring nonexisting
# files will cause a warning.
"Ignore": [
],
# This is the status file defining the current status associated
# with files in "DataDir" when required.
"Status": "/var/lib/guerillabackup/state/storage-tool-status.json"
# Include a list of sub-configuration files for backup storages
# spread out over multiple unrelated data directories or to split
# one huge configuration into multiple smaller ones.
# "Include": [
# "/...[another storage].../storage-tool-config.json"
# ]
}
guerillabackup-0.5.0/data/etc/units/LogfileBackupUnit.config.template
# LogFileBackupUnit configuration template
# This list contains tuples with five elements per logfile backup
# input. The meaning of each value is:
# * Input directory: absolute directory name to search for logfiles.
# * Input file regex: regular expression to select compressed
# or uncompressed logfiles for inclusion. When the regex contains
# a named group "oldserial", a file with empty serial is handled
# as newest while file with largest serial value is the oldest.
# With named group "serial", oldest file will have smallest
# serial number, e.g. with date or timestamp file extensions.
# When a named group "compress" is found, the match content,
# e.g. "gz" or "bz2", will be used to find a decompressor and
# decompress the file before processing.
# * Source URL transformation: If None, the first named group
# of the "input file regex" is appended to the input directory
# name and used as source URL. When not starting with a "/",
# the transformation string is the name to include literally
# in the URL after the "input directory" name.
# * Policy: If not none, include this string as handling policy
# within the manifest.
# * Encryption key name: If not None, encrypt the input using
# the named key.
LogBackupUnitInputList = []
# Include old (rotated) default syslog files, where serial number
# was already appended. Accept also the compressed variants.
# LogBackupUnitInputList.append((
# '/var/log',
# '^(auth\\.log|daemon\\.log|debug|kern\\.log|mail\\.err|mail\\.info|' \
# 'mail\\.log|mail\\.warn|messages|syslog)\\.(?P<oldserial>[0-9]+)' \
# '(?:\\.(?P<compress>gz))?$',
# None, None, None))
# Other logs and backup files:
# LogBackupUnitInputList.append((
# '/var/log',
# '^(alternatives\\.log|btmp|dmesg|dpkg\\.log|wtmp)\\.' \
# '(?P<oldserial>[0-9]+)(?:\\.(?P<compress>gz))?$',
# None, None, None))
# Apt logs:
# LogBackupUnitInputList.append((
# '/var/log/apt',
# '^([a-z]+\\.log)\\.(?P<oldserial>[0-9]+)(?:\\.(?P<compress>gz))?$',
# None, None, None))
# Apache logs:
# LogBackupUnitInputList.append((
# '/var/log/apache2',
# '^([0-9a-zA-Z.-]+)\\.(?P<oldserial>[0-9]+)(?:\\.(?P<compress>gz))?$',
# None, None, None))
# Firewall logs:
# LogBackupUnitInputList.append((
# '/var/log/ulog',
# '^(ulogd\\.pcap)\\.(?P<oldserial>[0-9]+)(?:\\.(?P<compress>gz))?$',
# None, None, None))
# Tomcat logs:
# LogBackupUnitInputList.append((
# '/var/log/tomcat8',
# '^(catalina\\.out)\\.(?P<oldserial>[0-9]+)(?:\\.(?P<compress>gz))?$',
# None, None, None))
# LogBackupUnitInputList.append((
# '/var/log/tomcat8',
# '^(catalina)\\.(?P<serial>[0-9-]{10})\\.log(?:\\.(?P<compress>gz))?$',
# 'catalina.log', None, None))
# LogBackupUnitInputList.append((
# '/var/log/tomcat8',
# '^(localhost)\\.(?P<serial>[0-9-]{10})\\.log(?:\\.(?P<compress>gz))?$',
# 'localhost.log', None, None))
# LogBackupUnitInputList.append((
# '/var/log/tomcat8',
# '^(localhost_access_log)\\.(?P<serial>[0-9-]{10})\\.txt(?:\\.(?P<compress>gz))?$',
# 'localhost_access_log.txt', None, None))
guerillabackup-0.5.0/data/etc/units/Readme.txt
This directory contains all the loaded units plus configuration
parameter overrides, if available. When not available, the main
backup generator configuration, usually "/etc/guerillabackup/config",
is passed to each unit unmodified.
A valid unit can be a symlink to a guerillabackup core unit, e.g.
/usr/lib/guerillabackup/lib/guerillabackup/LogfileBackupUnit.py
but also a local unit definition written into a plain file.
To be loaded, the unit definition file name has to contain only
numbers and letters. An associated configuration file has the
same name with suffix ".config" appended.
This "Readme.txt" and all files named ".template" are ignored.
guerillabackup-0.5.0/data/etc/units/TarBackupUnit.config.template
# TarBackupUnit configuration template
# This list contains dictionaries with configuration parameters
# for each tar backup to run. All tar backups of one unit are
# run sequentially. Configuration parameters are:
# * PreBackupCommand: execute this command given as list of arguments
# before starting the backup, e.g. create a filesystem or virtual
# machine snapshot, perform cleanup.
# * PostBackupCommand: execute this command after starting the
# backup.
# * Root: root directory of tar backup, "/" when missing.
# * Include: list of paths to include, ["."] when missing.
# * Exclude: list of patterns to exclude from backup (see tar
# documentation "--exclude"). When missing and Root is "/",
# list ["./var/lib/guerillabackup/data"] is used.
# * IgnoreBackupRaces: flag to indicate if races during backup
# are acceptable, e.g. because the directories are modified while the backup is running.
# * FullBackupTiming: tuple with minimum and maximum interval
# between full backup invocations and modulo base and offset,
# all in seconds. Without modulo invocation (all values None),
# full backups will run as soon as minimum interval is exceeded.
# With modulo timing, modulo trigger is ignored when below minimum
# time. When the gap exceeds the maximum interval, an immediate backup is
# started.
# * IncBackupTiming: When set, incremental backups are created
# to fill the time between full backups. Timings are specified
# as tuple with same meaning as in FullBackupTiming parameter.
# This will also trigger generation of tar file indices when
# running full backups.
# * FullOverrideCommand: when set, parameters Exclude, Include,
# Root are ignored and exactly the given command is executed.
# * IncOverrideCommand: when set, parameters Exclude, Include,
# Root are ignored and exactly the given command is executed.
# * KeepIndices: number of old incremental tar backup indices
# to keep. With -1 keep all, otherwise keep only the given number.
# Default is 0.
# * Policy: If not none, include this string as handling policy
# * EncryptionKey: If not None, encrypt the input using the named
# key. Otherwise default encryption key from global configuration
# might be used.
TarBackupUnitConfigList = {}
# TarBackupUnitConfigList['/root'] = {
# 'PreBackupCommand': ['/usr/bin/touch', '/tmp/prebackup'],
# 'PostBackupCommand': ['/usr/bin/touch', '/tmp/postbackup'],
# 'Root': '/',
# 'Include': ['.'],
# 'Exclude': ['./proc', './sys', './var/lib/guerillabackup/data'],
# 'IgnoreBackupRaces': False,
# Schedule one root directory full backup every week.
# 'FullBackupTiming': [(7*24-4)*3600, (7*24+4)*3600, 7*24*3600, 0],
# Create a daily incremental backup when machine is up.
# 'IncBackupTiming': [20*3600, 28*3600, 24*3600, 0],
# 'Policy': 'default', 'EncryptionKey': None}
guerillabackup-0.5.0/data/init/systemd/guerillabackup-generator.service
[Unit]
Description="Guerillabackup backup generator service"
Documentation=man:gb-backup-generator(1)
After=network.target
[Service]
Type=simple
ExecStart=/usr/bin/gb-backup-generator
Restart=always
# Enable strict hardening by default: the settings here should
# be compatible with the backup generator units provided by the
# software package, but non-standard units may require these
# settings to be relaxed.
LockPersonality=true
MemoryDenyWriteExecute=true
# Do not provide a private view on devices as usually the devices
# should also end up unmodified in the backup when included by
# the backup source selection.
PrivateDevices=false
# Do not exclude the temporary directories from backup here but
# using the source selection.
PrivateTmp=false
ProtectClock=true
ProtectControlGroups=true
ProtectHostname=true
ProtectKernelLogs=true
ProtectKernelModules=true
ProtectKernelTunables=true
ProtectSystem=full
RestrictNamespaces=true
RestrictRealtime=true
[Install]
WantedBy=multi-user.target
guerillabackup-0.5.0/data/init/systemd/guerillabackup-transfer.service
[Unit]
Description="Guerillabackup data transfer service"
Documentation=man:gb-transfer-service(1)
After=network.target
[Service]
Type=simple
ExecStart=/usr/bin/gb-transfer-service
Restart=always
# Enable strict hardening by default.
LockPersonality=true
MemoryDenyWriteExecute=true
PrivateDevices=true
PrivateTmp=true
ProtectClock=true
ProtectControlGroups=true
ProtectHostname=true
ProtectKernelLogs=true
ProtectKernelModules=true
ProtectKernelTunables=true
ProtectSystem=full
RestrictNamespaces=true
RestrictRealtime=true
[Install]
WantedBy=multi-user.target
guerillabackup-0.5.0/data/init/upstart/guerillabackup-generator.conf
# guerillabackup - Start the backup generator service
description "Guerillabackup backup generator service"
start on filesystem
stop on starting rcS
respawn
exec /usr/bin/gb-backup-generator
guerillabackup-0.5.0/doc/BackupGeneratorUnit.py.template
# This file is a template to create own backup generator unit
# definitions, that are in fact just plain python code. You may
# also use other guerillabackup core units as basis for your new
# code, e.g. /usr/lib/guerillabackup/lib/guerillabackup/LogfileBackupUnit.py
"""Your module docstring here ..."""
import errno
import json
import os
import guerillabackup
# Declare the keys to access configuration parameters in the configuration
# data dictionary here.
CONFIG_SOME_KEY = 'SomeBackupUnitSomeParameter'
class SomeBackupUnit(guerillabackup.SchedulableGeneratorUnitInterface):
"""Add documentation about this class here."""
def __init__(self, unitName, configContext):
"""Initialize this unit using the given configuration....
@param unitName The name of the activated unit main file in
/etc/guerillabackup/units."""
# Keep the unitName, it is usefull to create unique persistency
# directory names.
self.unitName = unitName
self.configContext = configContext
# Each unit has to handle the test mode flag, so extract it here.
self.testModeFlag = configContext.get(
guerillabackup.CONFIG_GENERAL_DEBUG_TEST_MODE_KEY, False)
if not isinstance(self.testModeFlag, bool):
raise Exception('Configuration parameter %s has to be ' \
'boolean' % guerillabackup.CONFIG_GENERAL_DEBUG_TEST_MODE_KEY)
# Open a persistency directory.
self.persistencyDirFd = guerillabackup.openPersistencyFile(
configContext, os.path.join('generators', self.unitName),
os.O_DIRECTORY|os.O_RDONLY|os.O_CREAT|os.O_EXCL|os.O_NOFOLLOW|os.O_NOCTTY, 0o600)
handle = None
try:
handle = guerillabackup.secureOpenAt(
self.persistencyDirFd, 'state.current',
fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY)
except OSError as openError:
if openError.errno != errno.ENOENT:
raise
if handle != None:
stateData = b''
while True:
data = os.read(handle, 1<<20)
if len(data) == 0:
break
stateData += data
os.close(handle)
stateInfo = json.loads(str(stateData, 'ascii'))
if ((not isinstance(stateInfo, list)) or (len(stateInfo) != 2) or
(not isinstance(stateInfo[0], int)) or
(not isinstance(stateInfo[1], dict))):
raise Exception('Persistency data structure mismatch')
...
# Now use the persistency information.
def getNextInvocationTime(self):
"""Get the time in seconds until this unit should called again.
If a unit does not know (yet) as invocation needs depend on
external events, it should report a reasonable low value to
be queried again soon.
@return 0 if the unit should be invoked immediately, the seconds
to go otherwise."""
# Calculate the next invocation time.
maxIntervalDelta = 600.0
...
return maxIntervalDelta
def invokeUnit(self, sink):
"""Invoke this unit to create backup elements and pass them
on to the sink. Even when indicated via getNextInvocationTime,
the unit may decide, that it is not yet ready and not write
any element to the sink.
@return None if currently there is nothing to write to the
source, a number of seconds to retry invocation if the unit
assumes, that there is data to be processed but processing
cannot start yet, e.g. due to locks held by other parties
or resource, e.g. network storages, currently not available."""
...
# Declare the main unit class so that the backup generator can
# instantiate it.
backupGeneratorUnitClass = SomeBackupUnit
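The template above only reads the persistency state. For symmetry, here is a
minimal sketch of writing the state back, e.g. at the end of invokeUnit. It is
only an illustration: it uses plain os calls with dir_fd instead of the
guerillabackup helper functions, and the temporary file name is an assumption
of this sketch.

import json
import os

def writeUnitState(persistencyDirFd, stateInfo):
  """Serialize stateInfo, a list of [int, dict] as expected by the
  __init__ above, and atomically replace 'state.current'."""
  stateData = json.dumps(stateInfo).encode('ascii')
  # Write to a temporary file first so that a crash cannot leave
  # a truncated state file behind.
  handle = os.open(
      'state.current.new', os.O_WRONLY|os.O_CREAT|os.O_TRUNC|os.O_NOFOLLOW,
      mode=0o600, dir_fd=persistencyDirFd)
  try:
    os.write(handle, stateData)
  finally:
    os.close(handle)
  os.rename(
      'state.current.new', 'state.current',
      src_dir_fd=persistencyDirFd, dst_dir_fd=persistencyDirFd)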
guerillabackup-0.5.0/doc/Design.txt
Terms:
======
The following terms shall be used within requirements and design
to describe the components.
* Backup data element: A complete, atomic backup data storage
unit representing a defined complete state (full) or the change
from a previous state (incremental). Each element is linked
to a single source identified by a backup data element ID.
* Backup data element ID: An unique identifier for backup data
element storages to refer to a stored element. The storage has
to be able to derive the corresponding "source URL" from the
ID.
* BackupGenerator: The tool implementing the "Backup Scheduler"
and "Sink" functionality to trigger execution of registered
"Generator Unit" elements.
* Backup Scheduler: The scheduler will invoke backup generation
of a given "Generator Unit", thus triggering the backup storage
to a given sink.
* Generator Unit: When invoked by a BackupGenerator, the unit
delivers backup data elements from one or more "Sources". This
does not imply, that the unit has direct access to the "Source",
it may also retrieve elements from other generators or intermediate
caching or storage units.
* Source: A source is an identified data entity with a defined
state in time. At some timepoints, backup data elements can
be produced to represent that state or changes to that state
to an extent depending on the source properties. The series
of backup data elements produced for a single source are identified
by a common "Source URL".
* Source Multiplexer: A source multiplexer can retrieve or generate
"backup data elements" from one or more sources and deliver
deliver them to a sink multiplexer.
* Source URL: See design.
User stories:
=============
* N-way redundant storage synchronization:
There are n machines, all producing "backup data elements". All
these machines communicate one with another. The backup data elements
from one machine should be stored on at least two more machines.
The source synchronization policy is to announce each element
to all other machines until one of those confirms transmission.
On transmission, the source keeps information about which elements
were successfully stored by the remote receiver. As soon as the
required number of copies is reached, the file is announced only
to those agents that have already fetched it.
The receiver synchronization policy is to ask each machine for
data elements. If the element is already present, it will not
be fetched again, otherwise the local policy may decide to start
a fetch procedure immediately. To conserve bandwidth and local
resources, the policy may also refuse to fetch some elements now
and retry later, thus giving another slower instance the chance
to fetch the file by itself. For local elements not announced
any more by the remote source, the receiver will move them to
the attic and delete after some time.
When an agent does not attempt to synchronize for an extended
period of time, the source will not count copies made by this
agent to the total number of copies any more. Thus the source
may start announcing the same backup data element to other agents
again.
Requirements:
=============
* [Req:SourceIdentification]: Each backup data source, that is
an endpoint producing backups, shall have a unique address,
both for linking stored backups to the source but also to apply
policies to data from a single source or to control behaviour
of a source.
* [Req:SecureDataLocalDataTransfers]: Avoid copying of files between
different user contexts to protect against filesystem based
attacks, privilege escalation.
* [Req:SynchronousGenerationAndStreaming]: Allow streaming of
backups from generator context to other local or remote context
immediately during generation.
* [Req:Spooling]: Support spooling of "backup data elements" on
intermediate storage location, which is needed when final storage
location and source are not permanently connected during backup
generation. As "backup data elements" may need to be generated
timely, support writing to spool location and fetching from
there.
* [Req:DetectSpoolingManipulations]: A malicious spool instance
shall not be able to remove or modify spooled backup data.
* [Req:EncryptedBackups]: Support encryption of backup data.
* [Req:Metainfo]: Allow transport of "backup data element"
meta information:
* Backup type ([Req:MetainfoBackupType]):
* full: the file contains the complete copy of the data
* inc: the backup has to be applied to the previous full backup
and possibly all incremental backups in between.
* storage data checksums
* handling policy
* fields for future use
* [Req:NonRepudiation]: Ensure non-repudiation even for backup
files in spool. This means, that as soon as a backup file was
received by another party, the source shall not be able to deny
having produced it.
* [Req:OrderedProcessing]: With spooling, files from one source
might not be transmitted in correct order. Some processing operations,
e.g. awstats, might need to see all files in correct order.
Thus where relevant, processor at end of pipeline shall be able
to verify all files have arrived and are properly ordered.
* [Req:DecentralizedScheduling]: Administrator at source shall
be able to trigger immediate backup and change scheduling if
not disabled upstream when using streaming generation (see also
[Req:SynchronousGenerationAndStreaming]).
* [Req:DecentralizedPolicing]: Administrator at source shall be
able to generate a single set of cyclic backups with non-default
policy tags.
* [Req:ModularGeneratorUnitConfiguration]: Modularized configuration
at backup generator level shall guarantee independent adding,
changing or removal of configuration files but also ...
* [Req:ModularGeneratorUnitCustomCode]: ... easy inclusion of
user-specific custom code without touching the application core.
* [Req:StorageBackupDataElementAttributes]: Apart from data element
attributes, storage shall be able to keep track of additional
storage attributes to manage data for current applications,
e.g. policy based synchronization between multiple storages
but also for future applications.
Design:
=======
* Source URL ([Req:SourceIdentification]): Each backup source
is identified by a unique UNIX-pathlike string starting with
'/' and path components consisting only of characters from the
set [A-Za-z0-9%.-] separated by slashes. The path components
'.' and '..' are forbidden for security reasons. The URL must
not end with a slash. A source may decide to use the '%' or
any other character for escaping when creating source URLs from
any other kind of input with broader set of allowed characters,
e.g. file names. The source must not rely on any downstream
processor to treat it any special.
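A minimal sketch, not part of the design itself, of how a component could
validate a source URL against the rules above:

import re

SOURCE_URL_REGEX = re.compile('^(?:/[A-Za-z0-9%.-]+)+$')

def isValidSourceUrl(url):
  """Check a source URL against the rules above: it has to start
  with '/', contain only components built from [A-Za-z0-9%.-],
  not end with a slash and not use '.' or '..' components."""
  if SOURCE_URL_REGEX.match(url) is None:
    return False
  return all(part not in ('.', '..') for part in url.split('/')[1:])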
* File storage: The storage file name is the resource name but
with ISO-like timestamp (second precision) and optional serial
number prepended to the last path part of the source URL followed
by the backup type ([Req:MetainfoBackupType]) and suffix '.data'.
Inclusion of the backup type in file name simplifies manual
removal without use of staging tools.
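As an illustration of this naming rule, a sketch that derives such a file
name; the exact separator characters and timestamp format are assumptions of
the sketch, not part of the design:

import datetime

def buildStorageFileName(sourceUrl, timestamp, backupType, serial=None):
  """Build a storage file name as described above: ISO-like
  timestamp (and optional serial number) prepended to the last
  path part of the source URL, followed by the backup type and
  the '.data' suffix."""
  dirName, baseName = sourceUrl.rsplit('/', 1)
  timeStr = datetime.datetime.utcfromtimestamp(timestamp).strftime('%Y%m%d%H%M%S')
  serialStr = '' if serial is None else '-%d' % serial
  return '%s/%s%s-%s-%s.data' % (dirName, timeStr, serialStr, baseName, backupType)

# Example: buildStorageFileName('/var/log/syslog', 1694725200, 'full')
# might yield '/var/log/20230914210000-syslog-full.data'.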
* Unit configuration: A "generator unit" of a given type might
have to be added multiple times but with different configuration.
Together with the configuration, each of the units might also
require a separate persistency or locking directory. Therefore
following scheme shall be used:
* A unit is activated by adding a symlink to an guerillabackup
core unit definition file ([Req:ModularGeneratorUnitConfiguration])
or creating a custom definition ([Req:ModularGeneratorUnitCustomCode]).
* A valid "generator unit" has to declare the main class object
to instantiate so that the generator can locate the class.
* Existence of a configuration file with the same name as the unit file,
just suffix changed to ".config", will cause this configuration
to be added as overlay to the global backup generator configuration.
* Data processing pipeline design:
* Data processing pipelines shall be used to allow creation
of customized processing pipelines.
* To support parallel processing and multithreading, processing
pipelines can be built using synchronous and asynchronous
pipeline elements.
* Data processing has to be protected against two kinds of errors,
that is, blocking IO within one component while action from
another component would be required. Thus blocking, if any,
requires a timeout under all circumstances to avoid that an
asynchronous process enters an error state while blocking.
* Even when a pipeline instance in the middle of the complete
pipeline has terminated, nothing can be inferred about the
termination behaviour of the up- or downstream elements, it
is still required to await normal termination or errors when
triggering processing: an asynchronous process may terminate
after an unpredictable amount of time due to calculation or
IO activites not interfering with the pipeline data streams.
* Due to the asynchronous nature of pipeline processing and
the use of file descriptors for optimization, closing of those
descriptors may only occur after the last component using
a descriptor has released it. The downstream component alone
is allowed to close it and is also in charge of closing it.
If a downstream component is multithreaded and one thread
is ready to close it, another one still needs it, then the
downstream component has to solve that problem on its own.
* When all synchronous pipeline elements are stuck, select on
all blocking file descriptors from those elements until the
first one is ready for IO again.
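A minimal sketch of the last point, assuming hypothetically that each stuck
synchronous element can report the file descriptor it blocks on and the
direction it waits for:

import select

def waitForAnyElement(stuckElements, timeout=1.0):
  """Wait until at least one stuck synchronous element can continue.
  stuckElements is assumed to be a list of (element, fd, isReadBlocked)
  tuples; the elements that became ready are returned."""
  readFds = [fd for _, fd, isRead in stuckElements if isRead]
  writeFds = [fd for _, fd, isRead in stuckElements if not isRead]
  readable, writable, _ = select.select(readFds, writeFds, [], timeout)
  readySet = set(readable) | set(writable)
  return [element for element, fd, _ in stuckElements if fd in readySet]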
* Scheduling: The scheduler contains a registry of known sources.
Each source is responsible to store scheduling information required
for operation, e.g. when source was last run, when it is scheduled
to be run again.
Each source has to provide methods to schedule and run it.
Each source has default processing pipeline associated, e.g.
compression, encryption, signing.
A source, that does not support multiple parallel invocation
has to provide locking support. While the older source process
shall continue processing uninterrupted, the newer one may indicate
a retry timeout.
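A simplified scheduler loop sketch using only the unit interface from
doc/BackupGeneratorUnit.py.template (getNextInvocationTime, invokeUnit);
locking, persistency and error handling are omitted:

import time

def runSchedulerLoop(units, sink):
  """Repeatedly invoke each registered unit when it reports itself
  as due. units is a list of objects implementing the interface
  from BackupGeneratorUnit.py.template."""
  while True:
    sleepTime = 10.0
    for unit in units:
      delta = unit.getNextInvocationTime()
      if delta <= 0:
        retryTime = unit.invokeUnit(sink)
        if retryTime is not None:
          # The unit could not run yet, e.g. due to locks; ask again later.
          sleepTime = min(sleepTime, retryTime)
      else:
        sleepTime = min(sleepTime, delta)
    time.sleep(sleepTime)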
* Encryption and signing: GPG shall be used to create encrypted
and signed files or detached signatures for immediate transmission
to trusted third party.
* Metainformation files ([Req:Metainfo]): This file has a simple
structure just containing a json-serialized dictionary with
metainformation key-value pairs. The file name is derived from
the main backup file by appending ".info". As json serialization
of binary data is inefficient regarding space, this data shall
be encoded base64 before writing (see the sketch after the field list below).
* BackupType ([Req:MetainfoBackupType]): Mandatory field with
type of the backup, only string "full" and "inc" supported.
* DataUuid ([Req:OrderedProcessing]): Optional unique identifier
of this file, e.g. when it may be possible, that datasets
with same timestamp might be available from a single source.
To allow reuse of DataUuid also for [Req:DetectSpoolingManipulations],
it has to be unique not only for a single source but for all
sources from a machine or even globally. This has to hold
true also when two source produce identical data. Otherwise
items with same Uuid could be swapped. It should be the same
when rerunning exactly the same backup for a single source
with identical state and data twice, e.g. when source wrote
data to sink before failing to complete the last steps.
* HandlingPolicy: Optional list of strings from set [A-Za-z0-9 ./-]
defining how a downstream backup sink or storage maintenance
process should handle the file. No list or an empty list is
allowed. Receiver has to know how to deal with a given policy.
* MetaDataSignature ([Req:DetectSpoolingManipulations]): A base64
encoded binary PGP signature on the whole metadata json artefact
with MetaDataSignature field already present but set to null
and TransferAttributes field missing.
* Predecessor ([Req:OrderedProcessing]): String with UUID of
file to be processed before this one.
* StorageFileChecksumSha512: The base64-encoded binary SHA-512
checksum of the stored backup file. As the file might be encrypted,
this does not need to match the checksum of the embedded content.
* StorageFileSignature: Base64 encoded data of a binary PGP
signature made on the storage file immediately after creation.
While signatures embedded in the encrypted storage file itself
are of little use to detect manipulation on the source system
between creation and retrieval, this detached signature can be
easily copied from the source system to another more trustworthy
machine, e.g. by sending it as mail or writing it to remote
syslog. ([Req:NonRepudiation])
* Timestamp: Backup content timestamp in seconds since 1970.
This field is mandatory.
* Storage:
* The storage uses one JSON artefact per stored backup data
element containing a list of two items: the first one is the
element's metainfo, the second one the dictionary containing
the attributes.
* Synchronization:
* Daemons shall monitor the local storage to offer backup data
elements to remote synchronization agents based on the local
policy.
* Daemons shall also fetch remote resources when offered and
local storage is acceptable according to policy.
* TransferAttributes: A dictionary with attributes set by the
transfer processes and used by backup data element announcement
and storage policies to optimize operation. Each attribute
holds a list of attribute values for each remote transfer
agent and probably also the transfer source.
* Transmission protocol: For interaction between the components,
a minimalistic transmission protocol shall be used.
* The protocol implementation shall be easily replaceable.
* The default JSON protocol implementation only forwards
data to the ServerProtocolInterface implementation. Each
request is just a list with the method name to call; the
remaining list items are used as call arguments. The supported
method names are the same as in the ServerProtocolInterface
interface. As sending of large JSON responses is problematic,
the protocol supports multipart responses, sending chunks of
data.
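Example: to illustrate the structures above, a metainformation
(".info") file is a JSON-serialized dictionary. All values below
are placeholders, not output of a real backup run:
  {
    "BackupType": "full",
    "DataUuid": "placeholder-uuid-0001",
    "Predecessor": "placeholder-uuid-0000",
    "StorageFileChecksumSha512": "[base64 SHA-512 of the .data file]",
    "Timestamp": 1545294000
  }
The storage then keeps one JSON artefact per backup data element,
a list whose first item is the metainfo dictionary and whose
second item is the attribute dictionary, e.g.:
  [{"BackupType": "full", ...}, {"SomeAttribute": ["value"]}]
A request of the default JSON transmission protocol is a list
with the method name (a placeholder name below) followed by the
call arguments:
  ["someMethodName", "argument0", "argument1"]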
BackupGenerator Design:
=======================
* The generator can write to a single but configurable sink,
thus enabling both local file storage and remote fetching.
* It keeps track of the Schedulable Source Units.
* It provides configuration support to the source units.
* It provides persistency support to the source units.
StorageTool Design:
===================
The tool checks and modifies a storage using this workflow:
* Load the master configuration and all included configurations.
* Locate the storage data and meta information files in each
configuration data directory.
* Check for applicable but yet unused policy templates and apply
them to files matching the selection pattern.
The following policies can be applied to each source:
* Backup interval policy: this policy checks if full and incremental
backups are created and transferred with sane intervals between
each element and also in relation to the current time (i.e. that
backups are generated at all).
* Integrity policy: this policy checks that the hash value of
the backup data matches the one in the metadata and that all
metadata blocks are chained together appropriately, thus detecting
manipulation of backup data and metadata in the storage. (Not
implemented yet)
* Retention policies: these policies check which backup data
elements should be kept and which ones could be pruned from
the storage to free space but also to comply with regulations
regarding data retention, e.g. GDPR. Such policies also interact
with the mechanisms to really delete the data without causing
inconsistencies in other policies, see below.
Of all policies the retention policies are special as they
may release backup data elements still needed by other policies
when checking compliance. For example, deleting elements will
always break the integrity policy as it shall detect any
modification of backup data by design. Therefore applying any
policy is a two-step process: first all policies are applied
(checked) and retention policies may mark some elements for
deletion. Before performing the deletion each policy is invoked
again to extract any information from the to-be-deleted elements
that the policy will need when applied again to the same storage
after the elements were already deleted. Thus any storage state
that was valid according to a policy will stay valid even after
deletion of some elements, as sketched below.
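A minimal Python sketch of this two-phase application is shown
below. The policy and storage interfaces used here are hypothetical
and only illustrate the procedure; they do not mirror the actual
gb-storage-tool classes.
  def applyAllPolicies(policies, elements, storage):
    # Phase 1: apply (check) every policy; retention policies may
    # flag elements for deletion.
    markedForDeletion = []
    for policy in policies:
      policy.apply(elements, markedForDeletion)
    # Phase 2: let every policy extract whatever state it needs from
    # the flagged elements so that later runs on the pruned storage
    # still see a valid state.
    for policy in policies:
      policy.noteDeletion(markedForDeletion)
    for element in markedForDeletion:
      storage.delete(element)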
guerillabackup-0.5.0/doc/FAQs.txt 0000664 0000000 0000000 00000010117 14501370353 0016607 0 ustar 00root root 0000000 0000000 Introduction:
=============
This document contains questions already asked regarding GuerillaBackup.
Each entry gives only an introductory answer and references to
core documentation regarding the question.
Questions regarding whole software suite:
=========================================
* Q: When should I use GuerillaBackup?
A: Use it when you want to craft a backup toolchain fulfilling
special needs of your environment. Use it when you know that
you just cannot simply "make backups and restore them" in a highly
complex distributed system as done with a notebook and a USB backup
drive. Therefore you most likely have performed a thorough risk
analysis, developed mitigation strategies, derived backup and
recovery plans. And now you need a flexible toolchain to be
configured according to your specification and integrated into your
software ecosystem, most likely using fully automated installation
(e.g. ansible) and operation.
Do NOT use it if you are looking for a 3-click graphical solution
to synchronize your notebook desktop data to another location
every some weeks or so.
* Q: How do I restore backups?
A: Sarcastic answer: "exactly the one validated way you defined
in your backup plans and standard operating procedures".
Realistic answer: most likely you will have configured a simple
tar backup during installation that might be also encrypted using
the GnuPG encryption example. See "SoftwareOperation.txt" for
some answers regarding restore procedures.
gb-backup-generator:
====================
* Q: Where do I find generated archives and backups?
A: When using only "gb-backup-generator" without any transfers configured
(see "gb-transfer-service"), you will find them at the sink location
you configured when creating your generator configuration. Starting
from a default configuration, you will usually use a file system
sink storing to "/var/lib/guerillabackup/data". See "gb-backup-generator"
man page on general generator configuration and configuration
sections in "/usr/share/doc/guerillabackup/Installation.txt".
* Q: What are the PGP keys for, who maintains them?
A: Backup storages contain all the valuable data from various backup
sources and are thus a very interesting target for data theft. Therefore
e.g. backup storage media (disks, tapes) have to be tracked, then
wiped and destroyed to prevent data leakage. GuerillaBackup supports
public key encryption at the source, thus an attack on the central
storage system or theft of storage media cannot reveal any relevant
information to the adversary. This makes e.g. backup media handling
easier and thus more secure, and reduces costs. You could even synchronize
your backups to the cloud without relevant data leakage risks. The
private key is required only to restore backups and should be kept
safe, even offline at the best.
To use this feature, enable the "GeneralDefaultEncryptionElement"
from "/etc/guerillabackup/config.template" (see template for more
information). Define your protection levels, generate the required
keys and install them where needed.
gb-transfer-service:
====================
* Q: How do both sides authenticate?
A: According to the specification, the "gb-transfer-service"
implementation creates a UNIX domain socket which can be protected
by standard means.
In default configuration it can only be accessed by user root.
To grant access to that socket remotely use the very same techniques
and tools you use for other network services also. Quite useful is
e.g. to forward the UNIX domain socket via SSH both on interactive
or non-interactive connections (depends on your use case, see
man "ssh", "-L" option) or use "socat UNIX-CONNECT:... OPENSSL"
to use PKI/certificate based access control and data encryption.
See "gb-transfer-service" man page for more information.
* Q: Which network protocols are supported?
A: "gb-transfer-service" requires bidirectional communication but
does not aim to reimplement all the high-quality network communication
tools out there already. Instead it provides means to easily
integrate your preferred tools for network tunneling, network
access control. See "Q: How do both sides authenticate?" for
more information.
guerillabackup-0.5.0/doc/Implementation.txt 0000664 0000000 0000000 00000025645 14501370353 0021016 0 ustar 00root root 0000000 0000000 Introduction:
=============
This document provides information on the implementation side
design decisions and the blueprint of the implementation itself.
Directory structure:
====================
* /etc/guerillabackup: This is the default configuration directory.
* config: This is the main GuerillaBackup configuration file.
Settings can be overridden e.g. in unit configuration files.
* keys: This directory is the default backup encryption key
location. Currently this is the home directory of a GnuPG
key store.
* lib-enabled: This directory is included in the site-path by
default. Add symbolic links to include specific Python packages
or machine/organisation specific code.
* units: The units directory contains the enabled backup data
generation units. To enable a unit, a symbolic link to the
unit definition file has to be created. The name of the symbolic
link has to consist only of letters and numbers. For units with
an associated configuration file named "[unitname].config",
configuration parameters from the main configuration file
can be overridden within the unit-specific configuration.
* /var/lib/guerillabackup: This directory is usually only readable
by the root user unless transfer agents with a different UID are
configured.
* data: where backup data from local backups is stored, usually
by the default sink.
* state: State persistency directory for all backup procedures.
* state/generators/[UnitName]: File or directory to store state
data for a given backup unit.
* state/agents: Directory to store additional information of
local backup data processing or remote transfer agents.
* /run/guerillabackup: This directory is used to keep data, only
needed while guerillabackup tools are running. This data can
be discarded on reboot.
* transfer.socket: Default socket location for "gb-transfer-service".
Library functions:
==================
* Configuration loading:
Configuration loading happens in 2 stages:
* Loading of the main configuration.
* Loading of a component/module specific overlay configuration.
This allows tools performing modularised tasks, e.g. a backup
generator processing different sources, to apply user-defined
configuration alterations to the configuration of a single unit.
The overlay configuration is then merged with the main configuration.
Defaults have to be set in the main configuration. A tool may
refuse to start when required default values are missing in the
configuration.
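The following is a simplified sketch of this two-stage loading,
modelled on how the gb-backup-generator source included in this
package does it (error handling and removal of "__builtins__"
omitted); the unit configuration file name is only an example:
  import os
  import guerillabackup
  # Stage 1: load the main configuration into a fresh dictionary.
  mainConfig = {'guerillabackup': guerillabackup}
  guerillabackup.execConfigFile('/etc/guerillabackup/config', mainConfig)
  # Stage 2: clone the main configuration so that modules cannot
  # modify it and execute the unit-specific overlay on the clone.
  unitConfig = dict(mainConfig)
  unitConfigFileName = '/etc/guerillabackup/units/TarBackupUnit.config'
  if os.path.exists(unitConfigFileName):
    guerillabackup.execConfigFile(unitConfigFileName, unitConfig)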
Backup Generator:
=================
* Process pipelines:
Pipeline implementation is designed to support both operating
system processes using only file descriptors for streaming,
pure Python processes, that need to be run in a separate thread
or are polled for normal operation and a mixture of both, i.e.
only one of input or output is an operating system pipe. Because
of the last case, a pipeline instance may behave like a synchronous
component at the beginning until the synchronous input was processed
and the end of input data was reached. From that moment on till
all processing is finished it behaves like an asynchronous component.
* State handling:
A pipeline instance may only change its state when the doProcess()
or isRunning() method is called. It is forbidden to invoke doProcess()
on an instance not running any more. Therefore, after isRunning()
returned true, it is safe to call doProcess() (a minimal polling
sketch is given at the end of this section).
* Error handling:
The standard way to get processing errors is by calling the doProcess
method, even when the process is asynchronous. On error, the method
should always return the same error message for a broken process
until stop() is called.
One error variant is that operating system processes did not
read all input from their input pipes and some data remains in
buffers. This error has to be reported to the caller either from
doProcess() or stop(), whatever comes first. The correct detection
of unprocessed input data may fail if a downstream component
is stopped while the upstream is running and writing data to
a pipe after the checks.
* Pipeline element implementation:
* DigestPipelineElement:
A synchronous element creating a checksum of all data passing
through it.
* GpgEncryptionPipelineElement:
A pipeline element returning a generic OSProcessPipelineExecutionInstance
to perform encryption using GnuPG.
* OSProcessPipelineElement:
This pipeline element will create an operating system level process
wrapped in a OSProcessPipelineExecutionInstance to perform the
transformation. The instance may be fully asynchronous when it
is connected only by operating system pipes but is synchronous
when connected via stdin/stdout data moving.
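The following is an illustrative polling loop showing how a
caller might drive such pipeline instances according to the
state and error handling rules above. Only the method names
doProcess(), isRunning() and stop() are taken from this document;
the error reporting details are simplified assumptions:
  def drivePipeline(pipelineInstances):
    errors = []
    while any(instance.isRunning() for instance in pipelineInstances):
      for instance in pipelineInstances:
        # doProcess() may only be invoked on instances still running.
        if not instance.isRunning():
          continue
        error = instance.doProcess()
        if error is not None:
          # A real caller would stop the broken instance and its up-
          # and downstream elements here.
          errors.append(error)
      # A real implementation would select() on the blocking file
      # descriptors of stuck synchronous elements instead of spinning.
    for instance in pipelineInstances:
      # stop() may still report remaining errors, e.g. unprocessed
      # input data; collect them as well.
      error = instance.stop()
      if error is not None:
        errors.append(error)
    return errors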
Policy Based Data Synchronisation:
==================================
To support various requirements, e.g. decentralised backup generation
with secure spooling and asynchronous transfers, a component for
synchronisation is required: the "gb-transfer-service". See
"doc/Design.txt" section "Synchronisation" for design information.
The implementation of "gb-transfer-service" orchestrates all
components related to the following functional blocks:
* ConnectorService: This service provides functions to establish
connectivity to other "gb-transfer-service" instances. Currently
only "SocketConnectorService" together with protocol handler
"JsonStreamServerProtocolRequestHandler" is supported. The
service has to care about authentication and basic service
access authorisation.
* Policies: Policies define, how the "gb-transfer-service" should
interact with other "gb-transfer-service" instances. There are
two types of policies, "ReceiverTransferPolicy" for incoming
transfers and "SenderTransferPolicy" for transmitting data.
See "ReceiverStoreDataTransferPolicy", "SenderMoveDataTransferPolicy",
for currently supported policies.
* Storage: A storage to store, fetch and delete StorageBackupDataElements.
Some storages may support storing of custom annotation data
per element. This can then be used in policies to perform policy
decisions, e.g. to prioritise sending of files according to tags.
Current storage implementation is "DefaultFileStorage".
* TransferAgent: The agent keeps track of all current connections
created via the ConnectorService. It may control load balancing
between multiple connections. Current available agent implementation
is "SimpleTransferAgent".
Classes and interfaces:
* ClientProtocolInterface:
Classes implementing this interface are passed to the TransferAgent
by the ConnectorService to allow outbound calls to the other agent.
* ConnectorService:
A service to establish in or outbound connections to an active
TransferAgent. Implementation will vary depending on underlying
protocol, e.g. TCP, socket, ... and authentication type, which
is also handled by the ConnectorService.
* DefaultFileStorage:
This storage implementation stores all relevant information on
the filesystem, supporting locking and extra attribute handling.
It uses the element name to create the storage file names, appending
"data", "info" or "lock" to it for content, meta information
storage and locking. Extra attribute data is stored by using
the attribute name as file extension. Thus the extensions from
above, but also ones containing dashes or dots, are not allowed
as attribute names.
* JsonStreamServerProtocolRequestHandler:
This handler implements a minimalistic JSON protocol to invoke
ServerProtocolInterface methods. See "doc/Design.txt" section
"Transmission protocol" for protocol design information.
* ReceiverStoreDataTransferPolicy:
This class defines a receiver policy, that attempts to fetch all
data elements offered by the remote transfer agent.
* ReceiverTransferPolicy:
This is the common superinterface of all receiver transfer policies.
* ServerProtocolInterface:
This is the server side protocol adaptor to be provided to the
transfer service to forward remote requests to the local SenderPolicy.
* SenderMoveDataTransferPolicy(SenderTransferPolicy):
This is a simple sender transfer policy just advertising all resources
for transfer and removing them or marking them as transferred as
soon as remote side confirms successful transfer. A file with a
mark will not be offered for download any more.
* applyPolicy(): deletes the file when transfer was successful.
* SenderTransferPolicy:
This is the common superinterface of all sender side transfer
policies. A policy implementation may need to adjust its internal
state after data was transferred.
* queryBackupDataElements(): return an iterator over all elements
eligible for transfer by the current policy. The query may
support remote side supplied query data for optimisation.
This should of course only be used when the remote side knows
the policy.
* applyPolicy(): update internal state after data transfer
was rejected, attempted or even successful.
* SocketConnectorService:
This is currently the only ConnectorService available. It accepts
incoming connections on a local UNIX socket. Authentication and
socket access authorisation has to be handled UNIX permissions
or integration with other tools, e.g. "socat". For each incoming
connection it uses a "JsonStreamServerProtocolRequestHandler"
protocol handler.
* TransferAgent:
This class provides the core functionality for in and outbound
transfers. It has a single sender or receiver transfer policy
or both attached. Protocol connections are attached to it using
a ConnectorService. The agent does not care about authentication
any more: everything relevant for authorisation has to be provided
by the ConnectorService and stored to the TransferContext.
gb-storage-tool:
================
The workflow described in "Design.txt" is implemented as follows:
* Load the master configuration and all included configurations:
This really creates just the "StorageConfig" and validates that
directories or files referenced by the configuration exist.
It also loads the "StorageStatus" (if any) but does not validate
any entries inside it.
A subconfiguration is only loaded after initialisation of the
parent configuration was completed.
See "StorageTool.loadConfiguration", "StorageConfig.__init__".
* Locate the storage data and meta information files in each
configuration data directory:
This step is done for the complete configuration tree with main
configuration as root and all included subconfiguration branches
and leaves. Starting with the most specific (leaf) configurations,
all files in the data directory are listed and associated with
the configuration loading them. After processing the leaves,
the same is done for the branch configurations including those
leaves, thus adding files not yet covered by the leaves.
When all available storage files are known, they are grouped
into backup data elements (data and metadata) before also
grouping elements from the same source.
See "StorageTool.initializeStorage", "StorageConfig.initializeStorage".
* Check for applicable but yet unused policy templates and apply
them to files matching the selection pattern:
guerillabackup-0.5.0/doc/Installation.txt 0000664 0000000 0000000 00000011442 14501370353 0020460 0 ustar 00root root 0000000 0000000 Manual Installation:
====================
This installation guide describes how to perform a manual installation
of GuerillaBackup.
* Create backup generation directory structures:
mkdir -m 0700 -p /etc/guerillabackup/units /var/lib/guerillabackup/data /var/lib/guerillabackup/state
cp -aT src /usr/lib/guerillabackup
General GuerillaBackup Configuration:
=====================================
All tools require a general configuration file, which usually
is identical for backup generation and transfer. The default
location is "/etc/guerillabackup/config". It can be derived from
"/etc/guerillabackup/config.template".
The file contains configuration parameters that influence the behavior
of various backup elements, e.g. source units, sinks or the generator
itself. All those parameters start with "General" to indicate
their global relevance.
See "/etc/guerillabackup/config.template" template file for extensive
comments regarding each parameter.
Configuration of gb-backup-generator:
=====================================
* Configure generator units:
The unit configuration directory "/etc/guerillabackup/units" contains
templates for all available units. The documentation for unit
configuration parameters can be found within the template itself.
To enable a unit, the configuration has to be created and the
unit code has to be activated. See the "gb-backup-generator" manual
page for
more details.
* Enable a default logfile archiving component:
ln -s -- /usr/lib/guerillabackup/lib/guerillabackup/LogfileBackupUnit.py /etc/guerillabackup/units/LogfileBackupUnit
cp /etc/guerillabackup/units/LogfileBackupUnit.config.template /etc/guerillabackup/units/LogfileBackupUnit.config
Enable log data directories by editing "LogfileBackupUnit.config".
* Add a cyclic tar backup component:
ln -s -- /usr/lib/guerillabackup/lib/guerillabackup/TarBackupUnit.py /etc/guerillabackup/units/TarBackupUnit
cp /etc/guerillabackup/units/TarBackupUnit.config.template /etc/guerillabackup/units/TarBackupUnit.config
Add tar backup configurations needed on the source system to the
configuration file.
* Perform a generator test run in foreground mode:
Start the backup generator directly:
/usr/bin/gb-backup-generator
The tool should not emit any errors during normal operation while
running. Once the CPU is idle again, check that all backup volumes
were generated as expected by verifying existence of backup files
in the sink directory. You might use
find /var/lib/guerillabackup -type f | sort
for that.
* Enable automatic startup of the generator after boot:
* On systemd systems:
mkdir -p /etc/systemd/system
cp data/init/systemd/guerillabackup.service /etc/systemd/system/guerillabackup.service
systemctl enable guerillabackup.service
systemctl start guerillabackup.service
* On upstart systems:
cp data/init/upstart/guerillabackup.conf /etc/init/guerillabackup.conf
* As cronjob after reboot:
cat <<EOF >/etc/cron.d/guerillabackup
@reboot root (/usr/bin/gb-backup-generator < /dev/null >> /var/log/guerillabackup.log 2>&1 &)
EOF
Configuration of gb-transfer-service:
=====================================
* Configure the service:
The main configuration can be found in "/etc/guerillabackup/config".
The most simplified transfer scheme is just a sender and receiver
to move backup data. Transfer can be started independently from
backup generation when conditions are favourable, e.g. connectivity
or bandwidth availability.
The upstream source documentation contains two testcases for this
scenario, "SenderOnlyTransferService" and "ReceiverOnlyTransferService".
* Sender configuration:
Just enable "TransferSenderPolicyClass" and "TransferSenderPolicyInitArgs"
for a default move-only sender policy.
* Receiver configuration:
While the sender often requires root privileges to read the backup
data files to avoid privacy issues with backup content, the receiver
on the other hand is usually running on a suitable intermediate
transfer hop or final data sink, where isolation is easier. In
such scenarios, "/etc/guerillabackup/config.template" can be copied
and used with any user ID. To use it, adjust "GeneralRuntimeDataDir"
and "TransferServiceStorageBaseDir" appropriately, e.g.
GeneralRuntimeDataDir = '/[user data directory]/run'
TransferServiceStorageBaseDir = '/[user data directory]/[host]'
The receiver policies have to be enabled also by enabling
"TransferReceiverPolicyClass" and "TransferReceiverPolicyInitArgs".
The service is then started using
/usr/bin/gb-transfer-service --Config [configfile]
* Automatic startup:
Activation is similar to "Configuration of gb-backup-generator", only
the systemd unit name "guerillabackup-transfer.service" has to
be used for systemd.
* Initiate the transfer:
Transfer will start as soon as a connection between the two
gb-transfer-service instances is established. See "gb-transfer-service"
manual page for more information on that.
guerillabackup-0.5.0/doc/SoftwareOperation.txt 0000664 0000000 0000000 00000015303 14501370353 0021472 0 ustar 00root root 0000000 0000000 Introduction:
=============
This document deals with software operation after initial installation,
e.g. regarding:
* Analyze system failures, restoring operation
* Using archives and backups
Using archives and backups:
===========================
* Restoring backups:
WORD OF WARNING: If you are planning to restore backups "free-handed"
(without any defined, written, validated and trained operation
procedures), using this software is most likely inefficient and
risky and you might search for another solution. Otherwise this
section gives you hints what should be considered when defining
your own backup procedures.
NOTE: Future releases may contain written procedures for some
common use cases, so that only selection of relevant use cases,
validation and training needs to be done by you or your organization.
General procedure:
To restore data you have to unwind all steps performed during
backup generation. As GuerillaBackup is a toolchain with nearly
unlimited flexibility in backup generation, processing and transfer
procedures, those "unwinding" procedures might be very special
for your system and cannot be covered here. When running a setup
following many recommendations from the installation guide,
the following steps should be considered:
1) Locate the (one) storage of the backup to restore
2) Validate backup integrity with (suspected) adversaries
3) Decrypt the data if encrypted
4) Unpack the data to a storage system for that type of data
5) Merge the data according to data type, validate result
* 1) Locate the (one) storage of the backup to restore:
In distributed setups, "gb-transfer-service" (see man page) will
have synchronized the backups between one or more storages according
to the transfer policies in place. In small environments you
might be able to locate the data checking your configuration.
On larger setups relevant (transfer-, retention-, access- ...)
policy and data location should be stored in your configuration
management database that was also used for automated installation
of the guerillabackup system.
As backup data retrieval is a security critical step, you should
also consider links to your ISMS here. Theft of backup data might
be easier than stealing data from the live systems, e.g. by fooling
the system operator performing the restore procedure.
The name of the backup storage files depends on your configuration.
With the example configuration from the installation guide, just
a simple tar-backup of the file system root is generated. With
the default storage module, a pair of files exists for each backup
timepoint, e.g.
20181220083942-root-full.data
20181220083942-root-full.info
These files are usually located in "/var/lib/guerillabackup/data"
by default. When using incremental backups, you usually will need
to collect all following "-inc.(data|info)" files up to the restore
point you want to reach.
* 2) Validate backup integrity with (suspected) adversaries:
There are two types of data integrity violations possible:
* The backup data file itself contains corrupted data violating
the data type specification, e.g. an invalid compression format.
* The data itself is not corrupted on format level (so it can
be restored technically) but its content was modified and
does not match the data on the source system at the given
timepoint.
The first issue can be addressed by checking the ".info" file
(a simple JSON file) and comparing the "StorageFileChecksumSha512"
value with the checksum of the ".data" file.
If you are considering attacks on your backup system, the latter
data integrity violation can only be detected by checking the
chaining of all your backup archives. For that, the "Predecessor"
field from the ".info" file can be used. Manipulation of a single
backup archive will break the chain at that point. The attacker
would also have to manipulate all ".info" files from that position
on to create a consistent picture at least on the backup storage
system.
As the ".info" files are small, a paranoid configuration should
also send them to off-site locations where at least the last
version of each backup source can be retrieved for comparison.
Currently the guerillabackup package does not yet contain a tool
for those different data integrity validation procedures. You may
want to check "Design.txt", section "Metainformation files", on
how to create a small tool for yourself; a minimal sketch of such
a check is given at the end of this document.
* 3) Decrypt the data if encrypted:
With the GnuPG plugin, data is encrypted using the public key
configured during installation. As backup data archives might
be way too large to quickly decrypt them to intermediate storage
during restore and because unencrypted content should only be
seen by the target system operating on that data, not the backup
or some intermediate system (data leakage prevention), thus streaming
the encrypted data to the target system and decryption on the
target should be the way to go. When playing it safe, the private
key for decryption is stored on a hardware token and cannot (and
should never) be copied to the target system.
For that use-case GnuPG provides the feature of "session-key-extraction".
It is sufficient to decrypt only the first few kB of the backup
archive using "--show-session-key" on the machine holding the
PKI-token. On the target system the corresponding "--override-session-key"
option allows decrypting the backup data stream on the fly as
it arrives. Refer to your GnuPG documentation for more information.
So the data receiving pipeline on the data target system might
look something like
[backup data receive/retrieve command] |
gpg "--override-session-key [key]" --decrypt |
[backup data specific restore/merge command(s) - see below]
* 4) Unpack the data to a storage system for that type of data:
To unpack the data you need to apply the command opposite to
the one you configured to create the backup data, e.g. "pg_restore"
vs "pg_dump".
The same is true for tar-backups created by the recommended
configuration from the installation guide. On a full backup with
default "bzip2" compression usually this will suite most:
... data strem souce] | tar -C [mergelocation] --numeric-owner -xj
* mergelocation: when the target system is "hot" during restore
(other processes accessing files might be active) or complex
data merging is required, data should never be extracted to the
final location immediately due to data quality and security
risks.
With incremental backups the "...-full.data" file has to be extracted
first, followed by all incremental data files.
See "tar" manual pages for more information.
* 5) Merge the data according to data type, validate result:
These are the fine arts of system operators maintaining highly
reliable systems and clearly out of scope of this document. Make
a plan, write it down, validate it, train your staff.
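Appendix: as mentioned in step 2 above, no ready-made integrity
validation tool is shipped yet. The following is a minimal sketch
of such a check based on the metainfo format described in
"Design.txt"; verify the exact file layout and chaining scheme of
your installation before relying on it.
  # Illustrative sketch: verify that the SHA-512 checksum stored in
  # a ".info" file matches the corresponding ".data" file.
  import base64
  import hashlib
  import json
  import sys
  def checkElement(baseName):
    """Check one backup data element, given its path without the
    ".data"/".info" suffix. Returns True when the checksum matches."""
    with open(baseName + '.info', 'rb') as infoFile:
      parsed = json.loads(infoFile.read())
    # Depending on the storage, the ".info" file may contain just the
    # metainfo dictionary or a list of [metainfo, attributes].
    metaInfo = parsed[0] if isinstance(parsed, list) else parsed
    digest = hashlib.sha512()
    with open(baseName + '.data', 'rb') as dataFile:
      for block in iter(lambda: dataFile.read(1 << 20), b''):
        digest.update(block)
    expected = base64.b64decode(metaInfo['StorageFileChecksumSha512'])
    return digest.digest() == expected
  if __name__ == '__main__':
    for name in sys.argv[1:]:
      print('%s: %s' % (
          name, 'OK' if checkElement(name) else 'CHECKSUM MISMATCH'))
Invoked e.g. as "[tool] 20181220083942-root-full" from within the
data directory, it reports whether data and metadata still match.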
guerillabackup-0.5.0/doc/gb-backup-generator.1.xml 0000664 0000000 0000000 00000014777 14501370353 0021774 0 ustar 00root root 0000000 0000000
]>
&dhtitle;&dhpackage;&dhfirstname;&dhsurname;Wrote this manual page.&dhemail;2016-2023&dhusername;This manual page was written for guerillabackup system
on Linux systems, e.g. Debian.Permission is granted to copy, distribute and/or modify this
document under the terms of the Lesser GNU General Public
License, Version 3.On Debian systems, the complete text of the Lesser
GNU General Public License can be found in
/usr/share/common-licenses/LGPL-3.GB-BACKUP-GENERATOR&dhsection;gb-backup-generatorProgram to generate backups or archives using
configured generator units according to given schedule.gb-backup-generatorDESCRIPTIONThis is the manual page for the gb-backup-generator
command. For more details see documentation at
/usr/share/doc/guerillabackup. The generator is responsible
to keep track over all scheduled backup tasks (units), to
invoke them and write the created backup data stream to the
data sink, usually the file system. The generator supports
generation of encrypted backups, management of information
about the backed-up element, adds hashes to support detection
of missing and manipulated backups. With that functions, confidentiality
and integrity can be protected, also providing non-repudiation
features.OPTIONSThis optional parameter specifies an alternative
configuration loading directory instead of /etc/guerillabackup.
The directory has to contain the main configuration
file (config), the units subdirectory.
FILES/etc/guerillabackup/configThe main configuration file for all guerillabackup
tools. Use /etc/guerillabackup/config.template to create
it. The template also contains the documentation for
each available parameter./etc/guerillabackup/units/[name]The units directory contains the enabled backup
data generation units. To enable a unit, a symbolic
link to the unit definition file has to be created.
The name of the symbolic link has to consist only of
letters and numbers. For example, to enable LogfileBackupUnit
for log-file archiving, one could use
"ln -s -- /usr/lib/guerillabackup/lib/guerillabackup/LogfileBackupUnit.py LogfileBackupUnit".For units with an associated configuration file
named "[unitname].config", configuration parameters
from the main configuration file can be overridden within
the unit-specific configuration. For all standard units,
/etc/guerillabackup/units contains templates for unit
configuration files.It is also possible to link a the same unit definition
file more than once using different symbolic link names.
Usually this only makes sense when each of those units
has a different unit configuration file./etc/systemd/system/guerillabackup-generator.serviceOn systemd installations, this is the systemd
configuration for automatic startup of the gb-backup-generator
service. Usually it is not enabled by default. To enable
use "systemctl enable guerillabackup-generator.service".
BUGS
For guerillabackup setups installed from packages, e.g.
.deb or .rpm files usually installed via package management
software, e.g. apt-get, aptitude, rpm, yast, please report
bugs to the package maintainer.
For setups from unpackaged software trunk, please report
at .SEE ALSOgb-transfer-service1
guerillabackup-0.5.0/doc/gb-storage-tool.1.xml 0000664 0000000 0000000 00000044014 14501370353 0021145 0 ustar 00root root 0000000 0000000
]>
&dhtitle;&dhpackage;&dhfirstname;&dhsurname;Wrote this manual page.&dhemail;2022-2023&dhusername;This manual page was written for guerillabackup system
on Linux systems, e.g. Debian.Permission is granted to copy, distribute and/or modify this
document under the terms of the Lesser GNU General Public
License, Version 3.On Debian systems, the complete text of the Lesser
GNU General Public License can be found in
/usr/share/common-licenses/LGPL-3.GB-STORAGE-TOOL&dhsection;gb-storage-toolManage guerillabackup backup data storagesgb-storage-toolDESCRIPTIONThis is the manual page for the gb-storage-tool
command. The tool is used to perform operations on backup
file storage locations as used by gb-backup-generator
or gb-transfer-service to store backup data.Currently the tool supports checking storage file naming
to identify incomplete backups due to aborts during backup
generation or transfer e.g. by reboots or crashes. To ignore
files for a reason, e.g. notes, add entries to the status
file, e.g.For all files defining valid backup data elements,
configurable policies are applied. See POLICIES section below
for supported policies.OPTIONSThis optional parameter specifies an alternative
configuration file instead of /etc/guerillabackup/storage-tool-config.json.
This optional parameter will make gb-storage-tool
perform policy checks only but will not modify the
storage, e.g. by deleting files flagged for deletion
by a retention policy.
POLICIESgb-storage-tool can apply multiple policies to each backup
data source but it is only possible to have one policy of
a given type (see policy types below). Which policies to
apply is defined by the gb-storage-tool configuration file "Policies"
parameter. A regular expression is used to select which sources
policies should be applied to with the first matching expression
taking precedence. For each regular expression a list of
polices with parameters is defined. See
/data/etc/storage-tool-config.json.template for examples.To ease policy selection in large setups, policy inheritance
can be used. A included configuration (see "Include" configuration
parameter) may also define policies, which can extend or
override the policies from the parent configuration(s) but
also policies defined just earlier in the same configuration.
The overriding policy definition has to have a higher priority,
otherwise it will be ignored. To disable policy inheritance
a subconfiguration may set the "Inherit" configuration parameter
to false (default is true). This will also prevent any policies
defined earlier in the very same configuration to be ignored.
Thus to disable inheritance for all sources in a configuration,
the first entry in the policy list should match all sources
(.*) and disable inheritance.Each policy defined in the gb-storage-tool configuration
file may also keep policy status information in the status
file. The status data is usually updated as the policy is
applied unless there is a significant policy violation. That
will require the user either to fix the root cause of the
violation (e.g. backup data was found to be missing) or the
user may update the status to ignore the violation. The later
cannot be done interactively via gb-storage-tool yet, one has
to adjust the storage status configuration manually. Therefore
the user has to create or update the status configuration
with the backup element name (the filename relative to
the data directory without any suffix) as key and the status
information for the policy as described below (and sometimes
given as hint on the console too).gb-storage-tool supports following policies:Interval:Verify that all sources generate backups at expected
rates and all backups were transferred successfully.
Thus this policy eases spotting of system failures
in the backup system. An example policy configuration
is:
...
"Policies": [
{
"Sources": "^(.*/)?root$",
"Inherit": false,
"List": [
{
"Name": "Interval",
"Priority": 100,
"FullMin": "6d20H",
"FullMax": "6d28H",
"IncMin": "20H",
"IncMax": "28H"
}, {
...This configuration specifies that to all backups
from source with name "root" (the default backup created
by the gb-backup-generator) an Interval
policy shall be applied. The policy will expect full
backups every 7 days +- 4 hours and incremental backups
each day +- 4 hours.When policy validation fails for a given source,
the policy configuration may be adjusted but also the
violation may be ignored by updating the check status.
Thus the validation error will not be reported any
more in the next run. The status data in that case
may look like:
...
"20200102000000-root-full": {
"Interval": {
"Ignore": "both"
}
},
...This status indicates that both interval
checks for the interval from the previous full and
incremental backup to the backup named above should
be disabled. To disable only one type of check, the
"full" or "inc" type keyword is used instead of "both".While above is fine to ignore singular policy
violations, also the policy itself may be adjusted.
This is useful when e.g. the backup generation intervals
where changed at the source. The status data in that
case could look like:
...
"20200102000000-root-full": {
"Interval": {
"Config": {
"FullMax": "29d28H",
"FullMin": "29d20H",
"IncMax": "6d28H",
"IncMin": "6d20H"
}
}
},
...LevelRetention:This defines a retention policy defined by retention
levels, e.g. on first level keep each backup for 30
days, next level keep 12 weekly backups, on the next
level keep 12 monthly backups, then 12 every three
month and from that on only yearly ones.
...
"Policies": [
{
"Sources": "^(.*/)?root$",
"List": [
{
"Name": "LevelRetention",
"Levels": [
# Keep weekly backups for 30 days, including incremental backups.
{
"KeepCount": 30,
"Interval": "day",
"TimeRef": "latest",
"KeepInc": true
},
# Keep weekly backups for 3 months, approx. 13 backups.
{
"KeepCount": 13,
"Interval": "day",
"AlignModulus": 7
},
...
{
"KeepCount": 12,
"Interval": "month",
"AlignModulus": 3,
"AlignValue": 1
},
...This configuration defines, that on the finest
level, backups for 30 days should be kept counting
from the most recent on ("TimeRef": "latest"), including
incremental backups ("KeepInc": true). Thus for machines
not producing backups any more, the most recent ones
are kept unconditionally. On the next level, 13 weekly backups are kept,
which may overlap with backups already kept due to
the first level configuration from above. But here
only full backups will be kept, that were generated
after every 7th day due to "AlignModulus", preferring
the one generated on day 0. At another level, only one backup is kept every
three months, preferring the one from the months numbered
1, 4, 7, 10 due to "AlignModulus" and "AlignValue".
Hence the first backup in January, April, ... should
be kept.Size:This policy checks that backup data sizes are
as expected as size changes may indicate problems, e.g.
a size increase due to archives, database dumps, local
file backups ... forgotten by the administrator (thus
wasting backup space but sometimes also causing security
issues due to the lack of equally strict access permissions
on those files compared to their source), size increase
due to rampant processes filling up database tables
or log files in retry loops (also monitoring should
catch that), core dumps accumulating, ...A "Size" policy can be defined for both full and
incremental backups. For each backup type, the accepted
size range can be defined by absolute or relative values.
Without providing an expected size, the size of the
first backup of that type seen is used. Therefore for
servers without accumulating data, following policy
could be defined:
...
"Policies": [
{
"Sources": "^(.*/)?root$",
"List": [
{
"Name": "Size",
"Priority": 0,
"FullSizeMinRel": 0.9,
"FullSizeMaxRel": 1.1,
"IncSizeMin": 100000,
"IncSizeMaxRel": 10.0
}, {
...This configuration will check sizes of "root"
backups using the first full and incremental size as
reference. Full backups may vary in size between 90%
and 110% while incremental backups have to be at least
100kb large but may vary 10-fold in size. All supported
policy parameters are:Specify the expected full backup size. When missing
the size of first full backup seen is used as default.Specify the absolute maximum backup size. You cannot
use "FullSizeMaxRel" at the same time.Specify the absolute minimum backup size. You cannot
use "FullSizeMinRel" at the same time.Same as "Full..." parameters just for incremental
backups. See above.Specify the maximum backup size in relation to the
expected size (see "FullSizeExpect"). You cannot use "FullSizeMax"
at the same time.Specify the minimum backup size in relation to the
expected size (see "FullSizeExpect"). You cannot use "FullSizeMin"
at the same time.Specify the expected incremental backup size in relation
to the expected full backup size (see "FullSizeExpect").
You cannot use "IncSizeExpect" at the same time.Same as "Full..." parameters just for incremental
backups. See above.When policy validation fails for a given source,
the policy configuration may be adjusted but also the
violation may be ignored by updating the check status.
Thus the validation error will not be reported any
more in the next run. The status data in that case
may look like:
...
"20200102000000-root-full": {
"Size": {
"Ignore": true
}
},
...While above is fine to ignore singular policy
violations, also the policy itself may be adjusted.
This is useful when e.g. the size of backups changed
due to installing of new software or services. The
updated policy configuration can then be attached
to the first element it should apply to:
...
"20200102000000-root-full": {
"Size": {
"Config": {
"FullSizeExpect": 234567890,
"FullSizeMinRel": 0.9,
"FullSizeMaxRel": 1.1,
"IncSizeMin": 100000,
"IncSizeMaxRel": 10.0
}
}
},
...FILES/etc/guerillabackup/storage-tool-config.jsonThe default configuration file for gb-storage-tool
tool. Use storage-tool-config.json.template to create
it. The template also contains the documentation for
each available parameter. The most relevant parameters
for gb-storage-tool are DataDir, Include
and Status./var/lib/guerillabackup/state/storage-tool-status.jsonThis is the recommended location for the toplevel
gb-storage-tool status file. The file has to contain valid
JSON data but also comment lines starting with #. See
POLICIES section above for description of policy specific
status data.BUGS
For guerillabackup setups installed from packages, e.g.
.deb or .rpm files usually installed via package management
software, e.g. apt-get, aptitude, rpm, yast, please report
bugs to the package maintainer.
For setups from unpackaged software trunk, please report
at .SEE ALSOgb-transfer-service1
guerillabackup-0.5.0/doc/gb-transfer-service.1.xml 0000664 0000000 0000000 00000016734 14501370353 0022020 0 ustar 00root root 0000000 0000000
]>
&dhtitle;&dhpackage;&dhfirstname;&dhsurname;Wrote this manual page.&dhemail;2016-2023&dhusername;This manual page was written for guerillabackup system
on Linux systems, e.g. Debian.Permission is granted to copy, distribute and/or modify this
document under the terms of the Lesser GNU General Public
License, Version 3.On Debian systems, the complete text of the Lesser
GNU General Public License can be found in
/usr/share/common-licenses/LGPL-3.GB-TRANSFER-SERVICE&dhsection;gb-transfer-serviceSynchronise guerillabackup backup data storagesgb-transfer-serviceDESCRIPTIONThis is the manual page for the gb-transfer-service
command. For more details see packaged documentation at
/usr/share/doc/guerillabackup. The service has two main
purposes: providing a stream-based protocol for interaction
with other gb-transfer-service instances and application of
storage and retrieval policies for data synchronisation.The network part uses a local (AF_UNIX) socket to listen
for incoming connections (see /run/guerillabackup/transfer.socket
below). There is no authentication magic or likely-to-be-flawed
custom-made crypto included in that part: any process allowed
to open the socket can talk the protocol. For connectivity
and authentication, use your favourite (trusted) tools. Good
starting points are socat with OPENSSL X509 client/server
certificate checks on one side and
UNIX-CONNECT:/run/guerillabackup/transfer.socket for the
other one. When using SSH to forward such connections, you
should consider key-based authentication with command forcing
(command="/usr/bin/socat - UNIX-CONNECT:/run/guerillabackup/transfer.socket")
and default security options (restrict).The policies are the other domain of the gb-transfer-service.
They define the authorisation rules granting access to backup
data elements but do NOT grant access to the remote file system
as such or allow creation or restore of backups. That is the
domain of gb-backup-generator tool. The policy also defines, which
backup elements should be copied or moved to other storages.
Each gb-transfer-service may have two polices: one defining, what
should be sent to other instances (sender policy) and what
should be received (receiver policy). Without defining a policy
for a transfer direction, no data will be sent in that direction.
Currently there are two predefined policies:ReceiverStoreDataTransferPolicy: this policy attempts
to create a copy of each file offered by a remote sender and
keeps it, even after the sender stopped providing it. This
policy is useful to fetch all files from a remote storage.SenderMoveDataTransferPolicy: this policy offers all
backup files in the local storage for transfer. Depending
on the settings, files are deleted after sending or just flagged
as sent after successful transfer.A policy implements one of the policy interfaces, that
are ReceiverTransferPolicy and SenderTransferPolicy. You may
create a custom policy when the predefined do not match your
requirements.OPTIONSThis optional parameter specifies an alternative
configuration file instead of /etc/guerillabackup/config.
FILES/etc/guerillabackup/configThe main configuration file for all guerillabackup
tools. Use /etc/guerillabackup/config.template to create
it. The template also contains the documentation for
each available parameter. The most relevant parameters
for gb-transfer-service are TransferServiceStorageBaseDir,
TransferReceiverPolicyClass, TransferReceiverPolicyInitArgs,
TransferSenderPolicyClass, TransferSenderPolicyInitArgs./run/guerillabackup/transfer.socketThis is the default socket file name to connect
two gb-transfer-service instances. The path can be changed
by modification of "GeneralRuntimeDataDir" configuration
property from default "/run/guerillabackup". By
default, the socket is only accessible to privileged
users and the user, who created it (mode 0600). You
might change permissions after startup to grant access
to other users also.BUGS
For guerillabackup setups installed from packages, e.g.
.deb or .rpm files usually installed via package management
software, e.g. apt-get, aptitude, rpm, yast, please report
bugs to the package maintainer.
For setups from unpackaged software trunk, please report
at .SEE ALSOgb-backup-generator1
guerillabackup-0.5.0/src/ 0000775 0000000 0000000 00000000000 14501370353 0015276 5 ustar 00root root 0000000 0000000 guerillabackup-0.5.0/src/gb-backup-generator 0000775 0000000 0000000 00000012562 14501370353 0021051 0 ustar 00root root 0000000 0000000 #!/usr/bin/python3 -BEsStt
"""This tool allows to generate handle scheduling of sources and
generation of backup data elements by writing them to a sink."""
import os
import re
import sys
import time
import traceback
# Adjust the Python sites path to include only the guerillabackup
# library addons, thus avoiding a large set of python site packages
# to be included in code run with root privileges. Also remove
# the local directory from the site path.
sys.path = sys.path[1:]+['/usr/lib/guerillabackup/lib', '/etc/guerillabackup/lib-enabled']
import guerillabackup
def runUnits(unitList, defaultUnitRunCondition, backupSink):
"""Run all units in the list in an endless loop."""
while True:
immediateUnitList = []
nextInvocationTime = 3600
for unit in unitList:
unitInvocationTime = unit.getNextInvocationTime()
if unitInvocationTime == 0:
immediateUnitList.append(unit)
nextInvocationTime = min(nextInvocationTime, unitInvocationTime)
if len(immediateUnitList) != 0:
# Really run the tasks when there was no condition defined or
# the condition is met.
if (defaultUnitRunCondition is None) or \
(defaultUnitRunCondition.evaluate()):
for unit in immediateUnitList:
unit.invokeUnit(backupSink)
# Clear the next invocation time, we do not know how long we spent
# inside the scheduled units.
nextInvocationTime = 0
else:
# The unit is ready but the condition was not met. Evaluate the
# conditions again in 10 seconds.
nextInvocationTime = 10
if nextInvocationTime > 0:
time.sleep(nextInvocationTime)
def main():
"""This is the program main function."""
backupConfigDirName = '/etc/guerillabackup'
unitNameRegex = re.compile('^[0-9A-Za-z]+$')
argPos = 1
while argPos < len(sys.argv):
argName = sys.argv[argPos]
argPos += 1
if not argName.startswith('--'):
print('Invalid argument "%s"' % argName, file=sys.stderr)
sys.exit(1)
if argName == '--ConfigDir':
backupConfigDirName = sys.argv[argPos]
argPos += 1
continue
if argName == '--Help':
print(
'Usage: %s [OPTION]\n' \
' --ConfigDir [dir]: Use custom configuration directory not\n' \
' default "/etc/guerillabackup/"\n' % (sys.argv[0]))
sys.exit(0)
print('Unknown parameter "%s", try "--Help"' % argName, file=sys.stderr)
sys.exit(1)
# Make stdout, stderr unbuffered to avoid data lingering in buffers
# when output is piped to another program.
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 1)
sys.stderr = os.fdopen(sys.stderr.fileno(), 'w', 1)
mainConfig = {}
mainConfigFileName = os.path.join(backupConfigDirName, 'config')
if not os.path.exists(backupConfigDirName):
print('Configuration file %s does not exist' % repr(mainConfigFileName), file=sys.stderr)
sys.exit(1)
try:
mainConfig = {'guerillabackup': guerillabackup}
guerillabackup.execConfigFile(mainConfigFileName, mainConfig)
mainConfig.__delitem__('__builtins__')
except Exception as loadException:
print('Failed to load configuration "%s": %s' % (
mainConfigFileName, str(loadException)), file=sys.stderr)
traceback.print_tb(sys.exc_info()[2])
sys.exit(1)
# Initialize the sink.
backupSinkClass = mainConfig.get(
'GeneratorSinkClass', guerillabackup.DefaultFileSystemSink)
backupSink = backupSinkClass(mainConfig)
# Get the default condition for running any unit when ready.
defaultUnitRunCondition = mainConfig.get('DefaultUnitRunCondition', None)
# Now search the unit directory and load all units that should
# be scheduled.
unitList = []
unitDir = os.path.join(backupConfigDirName, 'units')
unitDirFileList = os.listdir(unitDir)
for unitFileName in unitDirFileList[:]:
if unitFileName.endswith('.config'):
# Ignore config files for now, will be loaded when handling the
# unit main file.
continue
if (unitFileName == 'Readme.txt') or (unitFileName.endswith('.template')):
# Ignore main templates and Readme.txt also.
unitDirFileList.remove(unitFileName)
continue
matcher = unitNameRegex.match(unitFileName)
if matcher is None:
continue
unitDirFileList.remove(unitFileName)
# See if there is a configuration file to load before initializing
# the unit. Clone the main configuration anyway to avoid accidental
# modification by units.
unitConfig = dict(mainConfig)
unitConfigFileName = os.path.join(unitDir, '%s.config' % unitFileName)
if os.path.exists(unitConfigFileName):
try:
unitDirFileList.remove(unitFileName+'.config')
except:
pass
guerillabackup.execConfigFile(unitConfigFileName, unitConfig)
unitConfig.__delitem__('__builtins__')
# Load the code within a new namespace and create unit object
# from class with the same name as the file.
localsDict = {}
guerillabackup.execConfigFile(os.path.join(unitDir, unitFileName), localsDict)
unitClass = localsDict[guerillabackup.GENERATOR_UNIT_CLASS_KEY]
unitObject = unitClass(unitFileName, unitConfig)
unitList.append(unitObject)
for unhandledFileName in unitDirFileList:
print('WARNING: File %s/%s is not unit definition nor unit configuration ' \
'for activated unit' % (unitDir, unhandledFileName), file=sys.stderr)
# Now all units are loaded, start the scheduling.
runUnits(unitList, defaultUnitRunCondition, backupSink)
if __name__ == '__main__':
main()
guerillabackup-0.5.0/src/gb-storage-tool 0000775 0000000 0000000 00000101367 14501370353 0020241 0 ustar 00root root 0000000 0000000 #!/usr/bin/python3 -BEsStt
"""This tool performs operations on a local data storage, at
the moment only checking that all file names are sane, warning
about unexpected files, e.g. from failed transfers."""
import datetime
import json
import os
import re
import sys
# Adjust the Python sites path to include only the guerillabackup
# library addons, thus avoiding a large set of python site packages
# to be included in code run with root privileges. Also remove
# the local directory from the site path.
sys.path = sys.path[1:]+['/usr/lib/guerillabackup/lib', '/etc/guerillabackup/lib-enabled']
import guerillabackup.Utils
from guerillabackup.storagetool.PolicyTypeInterval import PolicyTypeInterval
from guerillabackup.storagetool.PolicyTypeLevelRetention import PolicyTypeLevelRetention
from guerillabackup.storagetool.PolicyTypeSize import PolicyTypeSize
class PolicyGroup():
"""A policy group is a list of policies to be applied to sources
matching a given regular expression."""
def __init__(self, groupConfig):
"""Create a new policy configuration group."""
if not isinstance(groupConfig, dict):
raise Exception('Policies entry not a dictionary')
self.inheritFlag = True
self.resourceRegex = re.compile(groupConfig['Sources'])
if 'Inherit' in groupConfig:
self.inheritFlag = groupConfig['Inherit']
if not isinstance(self.inheritFlag, bool):
raise Exception(
'Inherit policy configuration flag has to be ' \
'true or false (defaults to true)')
self.policyList = []
for policyConfig in groupConfig['List']:
policy = {
PolicyTypeInterval.POLICY_NAME: PolicyTypeInterval,
PolicyTypeLevelRetention.POLICY_NAME: PolicyTypeLevelRetention,
PolicyTypeSize.POLICY_NAME: PolicyTypeSize
}.get(policyConfig['Name'], None)
if policy is None:
raise Exception(
'Unknown policy type "%s"' % policyConfig['Name'])
self.policyList.append(policy(policyConfig))
def isApplicableToSource(self, sourceName):
"""Check if this policy is applicable to a given source."""
return self.resourceRegex.match(sourceName) is not None
def isPolicyInheritanceEnabled(self):
"""Check if policy inheritance is enabled for this group."""
return self.inheritFlag
def getPolicies(self):
"""Get the list of policies in this group."""
return self.policyList
class StorageConfig():
"""This class implements the configuration of a storage location
without the status information."""
def __init__(self, fileName, parentConfig):
"""Create a storage configuration from a given file name.
@param fileName the filename where to load the storage configuration.
@param parentConfig this is the parent configuration that
included this configuration and therefore may also define
policies."""
self.configFileName = fileName
self.parentConfig = parentConfig
self.dataDir = None
# This is the list of files to ignore within the data directory.
self.ignoreFileList = []
# Keep all policies in a list as we need to loop over all of
# them anyway.
self.policyList = None
self.storageStatusFileName = None
self.storageStatus = None
self.includedConfigList = []
if not os.path.exists(self.configFileName):
raise Exception(
'Configuration file "%s" does not exist' % self.configFileName)
config = guerillabackup.Utils.jsonLoadWithComments(self.configFileName)
includeFileList = []
if not isinstance(config, dict):
raise Exception()
for configName, configData in config.items():
if configName == 'DataDir':
if not isinstance(configData, str):
raise Exception()
self.dataDir = self.canonicalizePathname(configData)
if not os.path.isdir(self.dataDir):
raise Exception(
'Data directory %s in configuration %s does ' \
'not exist or is not a directory' % (
self.dataDir, self.configFileName))
continue
if configName == 'Ignore':
self.ignoreFileList = configData
continue
if configName == 'Include':
includeFileList = configData
continue
if configName == 'Policies':
self.initPolicies(configData)
continue
if configName == 'Status':
if not isinstance(configData, str):
raise Exception()
self.storageStatusFileName = self.canonicalizePathname(configData)
if os.path.exists(self.storageStatusFileName):
if not os.path.isfile(self.storageStatusFileName):
raise Exception('Status file "%s" has to be file' % configData)
self.storageStatus = StorageStatus(
self,
guerillabackup.Utils.jsonLoadWithComments(
self.storageStatusFileName))
continue
raise Exception('Invalid configuration section "%s"' % configName)
# Always have a status object even when not loaded from file.
if self.storageStatus is None:
self.storageStatus = StorageStatus(self, None)
# Initialize the included configuration only after initialization
# of this object was completed because we pass this object as
# parent so it might already get used.
for includeFileName in includeFileList:
includeFileName = self.canonicalizePathname(includeFileName)
if not os.path.isfile(includeFileName):
raise Exception(
'Included configuration %s in configuration %s ' \
'does not exist or is not a file' % (
includeFileName, self.configFileName))
try:
includedConfig = StorageConfig(includeFileName, self)
self.includedConfigList.append(includedConfig)
except:
print(
'Failed to load configuration "%s" included from "%s".' % (
includeFileName, self.configFileName),
file=sys.stderr)
raise
def getConfigFileName(self):
"""Get the name of the file defining this configuration."""
return self.configFileName
def canonicalizePathname(self, pathname):
"""Canonicalize a pathname which might be a name relative
to the configuration file directory or a noncanonical absolute
path."""
if not os.path.isabs(pathname):
pathname = os.path.join(os.path.dirname(self.configFileName), pathname)
return os.path.realpath(pathname)
def getStatusFileName(self):
"""Get the name of the status file to use to keep validation
status information between validation runs."""
return self.storageStatusFileName
def getStatus(self):
"""Get the status object associated with this configuration."""
return self.storageStatus
def initPolicies(self, policyConfig):
"""Initialize all policies from the given JSON policy configuration.
The configuration may contain multiple policy definitions
per resource name regular expression, but that is just for
convenience and not reflected in the policy data structures."""
if not isinstance(policyConfig, list):
raise Exception('Policies not a list')
policyList = []
for policyGroupConfig in policyConfig:
policyList.append(PolicyGroup(policyGroupConfig))
self.policyList = policyList
def getDataDirectoryRelativePath(self, pathname):
"""Get the pathname of a file relative to the data directory.
@param pathname the absolute path name to resolve."""
return os.path.relpath(pathname, self.dataDir)
def initializeStorage(self, storageFileDict):
"""Initialize the storage by locating all files in this data
storage directory and also apply already known status information."""
# First load included configuration data to assign the most specific
# configuration and status to those resources.
for includeConfig in self.includedConfigList:
includeConfig.initializeStorage(storageFileDict)
# Walk the data directory to include any files not included yet.
for dirName, subDirs, subFiles in os.walk(self.dataDir):
for fileName in subFiles:
fileName = os.path.join(dirName, fileName)
fileInfo = storageFileDict.get(fileName, None)
if fileInfo is not None:
refConfig = fileInfo.getConfig()
if refConfig == self:
raise Exception('Logic error')
if self.dataDir == refConfig.dataDir:
raise Exception(
'Data directory "%s" part of at least two ' \
'(included) configurations' % self.dataDir)
if (not refConfig.dataDir.startswith(self.dataDir)) or \
(refConfig.dataDir[len(self.dataDir)] != '/'):
raise Exception(
'Directory tree inconsistency due to logic ' \
'error or concurrent (malicious) modifications')
else:
# Add the file and reference to this configuration or override
# a less specific previous one.
storageFileDict[fileName] = StorageFileInfo(self.storageStatus)
# Now detect all the valid resources and group them.
self.storageStatus.updateResources(storageFileDict)
self.storageStatus.validate(storageFileDict)
def getSourcePolicies(self, sourceName):
"""Get the list of policies to be applied to a source."""
result = []
if self.parentConfig is not None:
# No need to clone the list as the topmost configuration will
# have started with a newly created list anyway.
result = self.parentConfig.getSourcePolicies(sourceName)
for policyGroup in self.policyList:
# Ignore policies not matching the source.
if not policyGroup.isApplicableToSource(sourceName):
continue
if not policyGroup.isPolicyInheritanceEnabled():
result = []
for policy in policyGroup.getPolicies():
for refPos, refPolicy in enumerate(result):
refPolicy = result[refPos]
if refPolicy.getPolicyName() == policy.getPolicyName():
if refPolicy.getPriority() < policy.getPriority():
result[refPos] = policy
policy = None
break
if policy is not None:
result.append(policy)
return result
class BackupDataElement():
"""This class stores information about a single backup data
element that is the reference to the backup data itself but
also the meta information. The class is designed in a way to
support current file based backup data element storage but
also future storages where the meta information may end up
in a relational database and the backup data is kept in a storage
system."""
# This name is used to store deletion status information in the
# storage status information in the same way as policy data. Unlike
policy data, the delete status data is never serialized.
DELETE_STATUS_POLICY = 'Delete'
def __init__(self, sourceStatus, dateTimeId, elementType):
"""Create a element as part of the status of one backup source.
@param sourceStatus this is the reference to the complete
status of the backup source this element is belonging to.
@param dateTimeId the datetime string when this element was
created and optional ID number part. It has to contain at
least a 14 digit timestamp but may include more digits
which then sorted according to their integer value."""
self.sourceStatus = sourceStatus
if (len(dateTimeId) < 14) or (not dateTimeId.isnumeric()):
raise Exception('Datetime too short or not numeric')
self.dateTimeId = dateTimeId
if elementType not in ('full', 'inc'):
raise Exception()
self.type = elementType
self.dataFileName = None
self.dataLength = None
self.infoFileName = None
# This dictionary contains policy data associated with this element.
# The key is the name of the policy holding the data.
self.policyData = {}
def getSourceStatus(self):
"""Get the complete source status information this backup
data element is belonging to.
@return the source status object."""
return self.sourceStatus
def getDateTimeId(self):
"""Get the date, time and ID string of this element."""
return self.dateTimeId
def setFile(self, fileName, fileType):
"""Set a file of given type to define this backup data element."""
if fileType == 'data':
if self.dataFileName is not None:
raise Exception(
'Logic error redefining data file for "%s"' % (
self.sourceStatus.getSourceName()))
if not fileName.endswith('.data'):
raise Exception()
self.dataFileName = fileName
elif fileType == 'info':
if self.infoFileName is not None:
raise Exception()
self.infoFileName = fileName
else:
raise Exception('Invalid file type %s' % fileType)
def getElementName(self):
"""Get the name of the element in the storage."""
sourceName = self.sourceStatus.getSourceName()
elementStart = sourceName.rfind('/') +1
partStr = '%s-%s-%s' % (
self.dateTimeId, sourceName[elementStart:], self.type)
if elementStart:
return '%s%s' % (sourceName[:elementStart], partStr)
return partStr
def getDatetimeSeconds(self):
"""Get the datetime part of this element as seconds since
epoche."""
dateTime = datetime.datetime.strptime(self.dateTimeId[:14], '%Y%m%d%H%M%S')
# FIXME: no UTC conversion yet.
return int(dateTime.timestamp())
def getDatetimeTuple(self):
"""Get the datetime of this element as a tuple.
@return a tuple with the datetime part as string and the
serial part as integer or -1 when there is no serial part."""
dateTime = self.dateTimeId[:14]
serial = -1
if len(self.dateTimeId) > 14:
serialStr = self.dateTimeId[14:]
if (serialStr[0] == '0') and (len(serialStr) > 1):
raise Exception()
serial = int(serialStr)
return (dateTime, serial)
def getType(self):
"""Get the type of this backup data element."""
return self.type
def getDataLength(self):
"""Get the length of the binary backup data of this element."""
if self.dataLength is None:
self.dataLength = os.stat(self.dataFileName).st_size
return self.dataLength
def getStatusData(self):
"""Get the complete status data associated with this element.
Currently this data is identical to the complete policy data.
@return the status data or None when there is no status data
associated with this element."""
return self.policyData
def getPolicyData(self, policyName):
"""Get the policy data for a given policy name.
@return the data or None when there was no data defined yet."""
return self.policyData.get(policyName, None)
def setPolicyData(self, policyName, data):
"""Set the policy data for a given policy name."""
self.policyData[policyName] = data
def updatePolicyData(self, policyName, data):
"""Update the policy data for a given policy name by adding
or overriding the data.
@return the updated policy data"""
if not isinstance(data, dict):
raise Exception()
policyData = self.policyData.get(policyName, None)
if policyData is None:
policyData = dict(data)
self.policyData[policyName] = policyData
else:
policyData.update(data)
return policyData
def removePolicyData(self, policyName):
"""Remove the policy data for a given policy name if it exists."""
if policyName in self.policyData:
del self.policyData[policyName]
def initPolicyData(self, data):
"""Initialize the complete policy data of this backup data
element. This function may only be called while there is
no policy data defined yet."""
if self.policyData:
raise Exception(
'Attempted to initialize policy data twice for %s' % (
repr(self.dataFileName)))
self.policyData = data
def findUnusedPolicyData(self, policyNames):
"""Check if this element contains policy data not belonging
to any policy.
@return the name of the first unused policy found or None."""
for key in self.policyData.keys():
if key not in policyNames:
return key
return None
def markForDeletion(self, deleteFlag):
"""Mark this element for deletion if it was not marked at
all or marked in the same way.
@param deleteFlag the deletion mark to set or None to remove
the current mark.
@raise Exception if there is already a conflicting mark."""
if (deleteFlag is not None) and (not isinstance(deleteFlag, bool)):
raise Exception()
policyData = self.getPolicyData(BackupDataElement.DELETE_STATUS_POLICY)
if (policyData is not None) and (not isinstance(policyData, bool)):
raise Exception()
if deleteFlag is None:
if policyData:
self.removePolicyData(BackupDataElement.DELETE_STATUS_POLICY)
else:
if (policyData is not None) and (policyData != deleteFlag):
raise Exception()
self.setPolicyData(BackupDataElement.DELETE_STATUS_POLICY, deleteFlag)
def isMarkedForDeletion(self):
"""Check if this element is marked for deletion."""
policyData = self.getPolicyData(BackupDataElement.DELETE_STATUS_POLICY)
if (policyData is not None) and (not isinstance(policyData, bool)):
raise Exception()
return bool(policyData)
def delete(self):
"""Delete all resources associated with this backup data
element."""
os.unlink(self.dataFileName)
os.unlink(self.infoFileName)
class BackupSourceStatus():
"""This class stores status information about a single backup
source, e.g. all BackupDataElements belonging to this source,
current policy status information ..."""
def __init__(self, storageStatus, sourceName):
self.storageStatus = storageStatus
# This is the unique name of this source.
self.sourceName = sourceName
# This dictionary contains all backup data elements belonging
# to this source with datetime and type as key.
self.dataElements = {}
def getStorageStatus(self):
"""Get the complete storage status containing the status
of this backup source.
@return the storage status object."""
return self.storageStatus
def getSourceName(self):
"""Get the name of the source that created the backup data
elements.
@return the name of the source."""
return self.sourceName
def addFile(self, fileName, dateTimeId, elementType, fileType):
"""Add a file to this storage status. With the current file
storage model, two files will define a backup data element.
@param fileName the file to add.
@param dateTimeId the datetime and additional serial number
information.
@param elementType the type of element to create, i.e. full
or incremental.
@param fileType the type of the file, i.e. data or info."""
key = (dateTimeId, elementType)
element = self.dataElements.get(key, None)
if element is None:
element = BackupDataElement(self, dateTimeId, elementType)
self.dataElements[key] = element
element.setFile(fileName, fileType)
def getDataElementList(self):
"""Get the sorted list of all data elements."""
result = list(self.dataElements.values())
result.sort(key=lambda x: x.getDatetimeTuple())
return result
def findElementByKey(self, dateTimeId, elementType):
"""Find an element by the identification key values.
@return the element or None when not found."""
return self.dataElements.get((dateTimeId, elementType), None)
def getElementIdString(self, element):
"""Get the identification string of a given element of this
source."""
idStr = ''
pathEndPos = self.sourceName.rfind('/')
if pathEndPos >= 0:
idStr = '%s/%s-%s-%s' % (
self.sourceName[:pathEndPos],
element.getDateTimeId(),
self.sourceName[pathEndPos+1:],
element.getType())
else:
idStr = '%s-%s-%s' % (
element.getDateTimeId(), self.sourceName,
element.getType())
return idStr
def removeDeleted(self):
"""Remove all elements that are marked deleted. The method
should only be invoked after applying all deletion policies."""
for key in list(self.dataElements.keys()):
element = self.dataElements[key]
if element.isMarkedForDeletion():
del self.dataElements[key]
def serializeStatus(self):
"""Serialize the status of all files belonging to this source.
@return a dictionary with the status information."""
status = {}
for element in self.dataElements.values():
statusData = element.getStatusData()
# Do not serialize the deletion policy data.
if BackupDataElement.DELETE_STATUS_POLICY in statusData:
statusData = dict(statusData)
del statusData[BackupDataElement.DELETE_STATUS_POLICY]
if not statusData:
statusData = None
if statusData:
status[self.getElementIdString(element)] = statusData
return status
class StorageStatus():
"""This class keeps track about the status of one storage location,
that are all tracked resources but also unrelated files within
the storage location."""
RESOURCE_NAME_REGEX = re.compile(
'^(?P<datetime>[0-9]{14,})-(?P<name>[0-9A-Za-z.-]+)-' \
'(?P<type>full|inc)\\.(?P<element>data|info)$')
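# Example matching file name (hypothetical):
# '20230101120000-root-full.data' where an optional serial is
# appended directly after the 14 digit timestamp.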
def __init__(self, config, statusData):
self.config = config
self.statusData = statusData
if self.statusData is None:
self.statusData = {}
# This dictionary contains the name of each source (not file)
# found in this storage as key and the BackupSourceStatus element
# bundling all relevant information about the source.
self.trackedSources = {}
def getConfig(self):
"""Get the configuration managing this status."""
return self.config
def getStatusFileName(self):
"""Get the file name holding this backup status data."""
return self.config.getStatusFileName()
def findElementByName(self, name):
"""Find a backup element tracked by this status object by
name.
@return the element when found or None"""
relFileName = name + '.data'
nameStartPos = relFileName.rfind('/') + 1
match = StorageStatus.RESOURCE_NAME_REGEX.match(
relFileName[nameStartPos:])
if match is None:
raise Exception(
'Invalid element name "%s"' % str(name))
sourceName = match.group('name')
if nameStartPos != 0:
sourceName = relFileName[:nameStartPos] + sourceName
if sourceName not in self.trackedSources:
return None
sourceStatus = self.trackedSources[sourceName]
return sourceStatus.findElementByKey(
match.group('datetime'), match.group('type'))
def validate(self, storageFileDict):
"""Validate that all files tracked in the status can be still
found in the list of all storage files."""
for elemName in self.statusData.keys():
targetFileName = os.path.join(self.config.dataDir, elemName + '.data')
if targetFileName not in storageFileDict:
raise Exception(
'Invalid status of nonexisting file "%s.data" ' \
'in data directory "%s"' % (
elemName, self.config.dataDir))
def updateResources(self, storageFileDict):
"""Update the status of valid sources.
@param storageFileDict the dictionary keeping track of known
files from all configurations."""
ignoreSet = set()
ignoreSet.update(self.config.ignoreFileList)
for fileName, fileInfo in storageFileDict.items():
if fileInfo.getStatus() != self:
continue
relFileName = self.config.getDataDirectoryRelativePath(fileName)
if relFileName in ignoreSet:
ignoreSet.remove(relFileName)
continue
nameStartPos = relFileName.rfind('/') + 1
match = StorageStatus.RESOURCE_NAME_REGEX.match(
relFileName[nameStartPos:])
if match is None:
print(
'File "%s" (absolute "%s") should be ignored by ' \
'config "%s".' % (
fileInfo.getConfig().getDataDirectoryRelativePath(fileName),
fileName, fileInfo.getConfig().getConfigFileName()),
file=sys.stderr)
continue
sourceName = match.group('name')
if nameStartPos != 0:
sourceName = relFileName[:nameStartPos] + sourceName
sourceStatus = self.trackedSources.get(sourceName, None)
if sourceStatus is None:
sourceStatus = BackupSourceStatus(self, sourceName)
self.trackedSources[sourceName] = sourceStatus
sourceStatus.addFile(
fileName, match.group('datetime'), match.group('type'),
match.group('element'))
fileInfo.setBackupSource(sourceStatus)
for unusedIgnoreFile in ignoreSet:
print(
'WARNING: Nonexisting file "%s" ignored in configuration "%s".' % (
unusedIgnoreFile, self.config.getConfigFileName()),
file=sys.stderr)
for elemName, statusData in self.statusData.items():
element = self.findElementByName(elemName)
if element is None:
raise Exception(
'Invalid status of nonexisting element "%s" ' \
'in data directory "%s"' % (
elemName, self.config.dataDir))
element.initPolicyData(statusData)
def applyPolicies(self):
"""Apply the policy to this storage and all storages defined
in subconfigurations. For each storage this will check all
policy templates if one or more of them should be applied
to known sources managed by this configuration."""
for includeConfig in self.config.includedConfigList:
includeConfig.getStatus().applyPolicies()
for sourceName, sourceStatus in self.trackedSources.items():
policyList = self.config.getSourcePolicies(sourceName)
if len(policyList) == 0:
print('WARNING: no policies for "%s" in "%s"' % (
sourceName, self.config.getConfigFileName()), file=sys.stderr)
policyNames = set()
policyNames.add(BackupDataElement.DELETE_STATUS_POLICY)
for policy in policyList:
policyNames.add(policy.getPolicyName())
policy.apply(sourceStatus)
# Now check if any element is marked for deletion. In the same
# round detect policy status not belonging to any policy.
deleteList = []
for element in sourceStatus.getDataElementList():
if element.findUnusedPolicyData(policyNames):
print(
'WARNING: Unused policy data for "%s" in ' \
'element "%s".' % (
element.findUnusedPolicyData(policyNames),
element.getElementName()), file=sys.stderr)
if element.isMarkedForDeletion():
deleteList.append(element)
if not deleteList:
continue
# So there are deletions, show them and ask for confirmation.
print('Backup data to be kept/deleted for "%s" in storage "%s":' %
(sourceName, self.config.getConfigFileName()))
for element in sourceStatus.getDataElementList():
marker = '*'
if element.isMarkedForDeletion():
marker = ' '
print(
'%s %s %s' % (marker, element.getDateTimeId(), element.getType()))
inputText = None
if StorageTool.INTERACTIVE_MODE == 'keyboard':
inputText = input('Delete elements in "%s" (y/n)? ' % sourceName)
elif StorageTool.INTERACTIVE_MODE == 'force-no':
pass
elif StorageTool.INTERACTIVE_MODE == 'force-yes':
inputText = 'y'
if inputText != 'y':
# Deletion was not confirmed.
continue
# So there are deletions. Deleting data may corrupt the status
# data if there is any software or system error while processing
# the deletions. Therefore save the current status in a temporary
# file before modifying the data by applying deletion policies.
# Verify that there was no logic flaw assigning sources to the
# wrong storage.
if sourceStatus.getStorageStatus() != self:
raise Exception()
# Now save the current storage status to a temporary file as
# applying policies might have modified it already.
statusFileNamePreDelete = self.save(suffix='.pre-delete')
for policy in policyList:
policy.delete(sourceStatus)
# All policies were invoked, so remove the deleted elements from
# the status.
sourceStatus.removeDeleted()
# So at least updating the status data has worked, so save the
# status data.
statusFileNamePostDelete = self.save(suffix='.post-delete')
# Now just delete the files. File system errors or system crashes
# will still cause an inconsistent state, but that is easy to
# detect.
for element in deleteList:
element.delete()
os.unlink(statusFileNamePreDelete)
os.rename(statusFileNamePostDelete, self.config.getStatusFileName())
def save(self, suffix=None):
"""Save the current storage status to the status file or
a new file derived from the status file name by adding a
suffix. When saving to the status file, the old status file
may exist and is replaced. Saving to files with suffix is
only intended to create temporary files to avoid status data
corruption during critical operations. These files shall
be removed or renamed by the caller as soon as not needed
any more.
@return the file name the status data was written to."""
# First serialize all status data.
statusData = {}
for sourceStatus in self.trackedSources.values():
statusData.update(sourceStatus.serializeStatus())
targetFileName = self.config.getStatusFileName()
if suffix is not None:
targetFileName += suffix
if os.path.exists(targetFileName):
raise Exception()
targetFile = open(targetFileName, 'wb')
targetFile.write(bytes(json.dumps(
statusData, indent=2, sort_keys=True), 'ascii'))
targetFile.close()
return targetFileName
class StorageFileInfo():
"""This class stores information about one file found in the
file data storage directory."""
def __init__(self, status):
# This is the storage configuration authoritative for defining
# the status and policy of this file.
self.status = status
# This is the backup source that created the file data.
self.backupSource = None
def getStatus(self):
"""Get the authoritative status for this file."""
return self.status
def setBackupSource(self, backupSource):
"""Set the backup source this file belongs to."""
if self.backupSource is not None:
raise Exception('Logic error')
self.backupSource = backupSource
def getConfig(self):
"""Get the configuration associated with this storage file."""
return self.status.config
class StorageTool():
"""This class implements the storage tool main functions."""
# Use a singleton variable to define interactive behaviour.
INTERACTIVE_MODE = 'keyboard'
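# 'keyboard' asks for confirmation before deletions, 'force-no'
# never confirms (used by --DryRun), 'force-yes' confirms
# automatically.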
def __init__(self):
"""Create a StorageTool object with default configuration.
The object has to be properly initialized by loading configuration
data from files."""
self.configFileName = '/etc/guerillabackup/storage-tool-config.json'
self.config = None
# This is the dictionary of all known storage files found in
# the data directory of the main configuration or any subconfiguration.
self.storageFileDict = {}
def parseCommandLine(self):
"""This function will parse command line arguments and update
settings before loading of configuration."""
if self.config is not None:
raise Exception('Cannot reload configuration')
argPos = 1
while argPos < len(sys.argv):
argName = sys.argv[argPos]
argPos += 1
if not argName.startswith('--'):
raise Exception('Invalid argument "%s"' % argName)
if argName == '--Config':
self.configFileName = sys.argv[argPos]
argPos += 1
continue
if argName == '--Help':
print(
'Usage: %s [options]\n' \
'* --Config [file]: Use this configuration file not the default\n' \
' file at "/etc/guerillabackup/storage-tool-config.json".\n' \
'* --DryRun: Just report check results but do not ' \
'modify storage.\n' \
'* --Help: This output' % sys.argv[0],
file=sys.stderr)
sys.exit(0)
if argName == '--DryRun':
StorageTool.INTERACTIVE_MODE = 'force-no'
continue
print(
'Unknown parameter "%s", use "--Help" or see man page.' % argName,
file=sys.stderr)
sys.exit(1)
def loadConfiguration(self):
"""Load the configuration from the specified configuration file."""
if self.config is not None:
raise Exception('Cannot reload configuration')
self.config = StorageConfig(self.configFileName, None)
def initializeStorage(self):
"""Initialize the storage by locating all files in data storage
directories and also apply already known status information."""
self.config.initializeStorage(self.storageFileDict)
def applyPolicies(self):
"""Check all policy templates if one or more should be applied
to any known resource."""
self.config.getStatus().applyPolicies()
def main():
"""This is the program main function."""
tool = StorageTool()
tool.parseCommandLine()
tool.loadConfiguration()
# Now all recursive configurations are loaded. First initialize
# the storage.
tool.initializeStorage()
# Next check for files not already covered by policies according
# to status data. Suggest status changes for those.
tool.applyPolicies()
print('All policies applied.', file=sys.stderr)
if __name__ == '__main__':
main()
guerillabackup-0.5.0/src/gb-transfer-service 0000775 0000000 0000000 00000013027 14501370353 0021077 0 ustar 00root root 0000000 0000000 #!/usr/bin/python3 -BEsStt
"""This file defines a simple transfer service, that supports
inbound connections via a listening socket or receiving of file
descriptors if supported. An authentication helper may then provide
the agentId information. Otherwise it is just the base64 encoded
binary struct sockAddr information extracted from the file descriptor.
Authorization has to be performed outside the this service."""
import sys
# Adjust the Python sites path to include only the guerillabackup
# library addons, thus avoiding a large set of python site packages
# to be included in code run with root privileges. Also remove
# the local directory from the site path.
sys.path = sys.path[1:] + ['/usr/lib/guerillabackup/lib', '/etc/guerillabackup/lib-enabled']
import errno
import os
import signal
import guerillabackup
from guerillabackup.DefaultFileStorage import DefaultFileStorage
from guerillabackup.Transfer import SimpleTransferAgent
from guerillabackup.Transfer import SocketConnectorService
class TransferApplicationContext():
def __init__(self):
"""Initialize this application context without loading any
configuration. That has to be done separately e.g. by invoking
initFromSysArgs."""
self.serviceConfigFileName = '/etc/guerillabackup/config'
self.mainConfig = None
self.connectorService = None
self.forceShutdownFlag = False
def initFromSysArgs(self):
"""This method initializes the application context from the
system command line arguments but does not run the service
yet. Any errors during initialization will cause the program
to be terminated."""
argPos = 1
while argPos < len(sys.argv):
argName = sys.argv[argPos]
argPos += 1
if not argName.startswith('--'):
print('Invalid argument "%s"' % argName, file=sys.stderr)
sys.exit(1)
if argName == '--Config':
self.serviceConfigFileName = sys.argv[argPos]
argPos += 1
continue
print('Unknown parameter "%s"' % argName, file=sys.stderr)
sys.exit(1)
if not os.path.exists(self.serviceConfigFileName):
print('Configuration file %s does not exist' % (
repr(self.serviceConfigFileName),), file=sys.stderr)
sys.exit(1)
self.mainConfig = {}
try:
self.mainConfig = {'guerillabackup': guerillabackup}
guerillabackup.execConfigFile(
self.serviceConfigFileName, self.mainConfig)
except Exception as loadException:
print('Failed to load configuration %s: %s' % (
repr(self.serviceConfigFileName), str(loadException)), file=sys.stderr)
import traceback
traceback.print_tb(sys.exc_info()[2])
sys.exit(1)
def createPolicy(self, classNameKey, initArgsKey):
"""Create a policy with given keys."""
policyClass = self.mainConfig.get(classNameKey, None)
if policyClass is None:
return None
policyInitArgs = self.mainConfig.get(initArgsKey, None)
if policyInitArgs is None:
return policyClass(self.mainConfig)
policyInitArgs = [self.mainConfig]+policyInitArgs
return policyClass(*policyInitArgs)
def startService(self):
"""This method starts the transfer service."""
# Make stdout, stderr line buffered to avoid data lingering in
# buffers when output is piped to another program.
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 1)
sys.stderr = os.fdopen(sys.stderr.fileno(), 'w', 1)
# Initialize the storage.
storageDirName = self.mainConfig.get('TransferServiceStorageBaseDir', None)
if storageDirName is None:
storageDirName = self.mainConfig.get(
guerillabackup.DefaultFileSystemSink.SINK_BASEDIR_KEY, None)
if storageDirName is None:
print('No storage configured, use configuration key "%s"' % (
guerillabackup.DefaultFileSystemSink.SINK_BASEDIR_KEY),
file=sys.stderr)
sys.exit(1)
if not os.path.isdir(storageDirName):
print('Storage directory %s does not exist or is inaccessible' % repr(storageDirName),
file=sys.stderr)
sys.exit(1)
storage = DefaultFileStorage(storageDirName, self.mainConfig)
transferAgent = SimpleTransferAgent()
runtimeDataDirPathname = guerillabackup.getRuntimeDataDirPathname(
self.mainConfig)
receiverPolicy = self.createPolicy(
guerillabackup.TRANSFER_RECEIVER_POLICY_CLASS_KEY,
guerillabackup.TRANSFER_RECEIVER_POLICY_INIT_ARGS_KEY)
senderPolicy = self.createPolicy(
guerillabackup.TRANSFER_SENDER_POLICY_CLASS_KEY,
guerillabackup.TRANSFER_SENDER_POLICY_INIT_ARGS_KEY)
try:
os.mkdir(runtimeDataDirPathname, 0o700)
except OSError as mkdirError:
if mkdirError.errno != errno.EEXIST:
raise
self.connectorService = SocketConnectorService(
os.path.join(runtimeDataDirPathname, 'transfer.socket'),
receiverPolicy, senderPolicy, storage, transferAgent)
signal.signal(signal.SIGINT, self.shutdown)
signal.signal(signal.SIGHUP, self.shutdown)
signal.signal(signal.SIGTERM, self.shutdown)
self.connectorService.run()
def shutdown(self, signum, frame):
"""This function triggers shutdown of the service. By default
when invoked for the first time, the method will still wait
10 seconds for any ongoing operations to complete. When invoked
twice that will trigger immediate service shutdown."""
forceShutdownTime = 10
if self.forceShutdownFlag:
forceShutdownTime = 0
self.connectorService.shutdown(forceShutdownTime=forceShutdownTime)
self.forceShutdownFlag = True
applicationContext = TransferApplicationContext()
applicationContext.initFromSysArgs()
applicationContext.startService()
guerillabackup-0.5.0/src/lib/ 0000775 0000000 0000000 00000000000 14501370353 0016044 5 ustar 00root root 0000000 0000000 guerillabackup-0.5.0/src/lib/guerillabackup/ 0000775 0000000 0000000 00000000000 14501370353 0021036 5 ustar 00root root 0000000 0000000 guerillabackup-0.5.0/src/lib/guerillabackup/BackupElementMetainfo.py 0000664 0000000 0000000 00000005322 14501370353 0025614 0 ustar 00root root 0000000 0000000 """This module contains only the class for in memory storage of
backup data element metadata."""
import base64
import json
class BackupElementMetainfo():
"""This class is used to store backup data element metadata
in memory."""
def __init__(self, valueDict=None):
"""Create a a new instance.
@param if not None, use this dictionary to initialize the
object. Invocation without a dictionary should only be used
internally during deserialization."""
self.valueDict = valueDict
if valueDict != None:
self.assertMetaInfoSpecificationConforming()
def get(self, keyName):
"""Get the value for a given key.
@return None when no value for the key was found."""
return self.valueDict.get(keyName, None)
def serialize(self):
"""Serialize the content of this object.
@return the ascii-encoded JSON serialization of this object."""
dumpMetainfo = {}
for key, value in self.valueDict.items():
if key in [
'DataUuid', 'MetaDataSignature', 'Predecessor',
'StorageFileChecksumSha512', 'StorageFileSignature']:
if value != None:
value = str(base64.b64encode(value), 'ascii')
dumpMetainfo[key] = value
return json.dumps(dumpMetainfo, sort_keys=True).encode('ascii')
def assertMetaInfoSpecificationConforming(self):
"""Make sure, that meta information values are conforming
to the minimal requirements from the specification for the
in-memory object variant of meta information."""
timestamp = self.valueDict.get('Timestamp', None)
if (timestamp is None) or not isinstance(timestamp, int) or (timestamp < 0):
raise Exception('Timestamp not found or not a non-negative integer')
backupType = self.valueDict.get('BackupType', None)
if backupType not in ['full', 'inc']:
raise Exception('BackupType missing or invalid')
checksum = self.valueDict.get('StorageFileChecksumSha512', None)
if checksum != None:
if not isinstance(checksum, bytes) or (len(checksum) != 64):
raise Exception('Invalid checksum type or length')
@staticmethod
def unserialize(serializedMetaInfoData):
"""Create a BackupElementMetainfo object from serialized data.
@param serializedMetaInfoData binary ascii-encoded JSON data"""
valueDict = json.loads(str(serializedMetaInfoData, 'ascii'))
for key, value in valueDict.items():
if key in [
'DataUuid', 'MetaDataSignature', 'Predecessor',
'StorageFileChecksumSha512', 'StorageFileSignature']:
if value != None:
value = base64.b64decode(value)
valueDict[key] = value
metaInfo = BackupElementMetainfo()
metaInfo.valueDict = valueDict
metaInfo.assertMetaInfoSpecificationConforming()
return metaInfo
guerillabackup-0.5.0/src/lib/guerillabackup/DefaultFileStorage.py 0000664 0000000 0000000 00000033717 14501370353 0025134 0 ustar 00root root 0000000 0000000 """This module provides a default file storage that allows storage
of new elements using the sink interface. The storage uses 3 files,
the main data file, an info file holding the meta information
and a lock file to allow race-free operation when multiple processes
use the same storage directory."""
import errno
import os
import stat
import guerillabackup
from guerillabackup.BackupElementMetainfo import BackupElementMetainfo
class DefaultFileStorage(
guerillabackup.DefaultFileSystemSink, guerillabackup.StorageInterface):
"""This is the interface of all stores for backup data elements
providing access to content data and metainfo but also additional
storage attributes. The main difference to a generator unit
is, that data is just retrieved but not generated on invocation."""
def __init__(self, storageDirName, configContext):
"""Initialize this store with parameters from the given configuration
context."""
self.storageDirName = None
self.openStorageDir(storageDirName, configContext)
def getBackupDataElement(self, elementId):
"""Retrieve a single stored backup data element from the storage.
@throws Exception when an incompatible query, update or read
is in progress."""
return FileStorageBackupDataElement(self.storageDirFd, elementId)
def getBackupDataElementForMetaData(self, sourceUrl, metaData):
"""Retrieve a single stored backup data element from the storage.
@param sourceUrl the URL identifying the source that produced
the stored data elements.
@param metaData metaData dictionary for the element of interest.
@throws Exception when an incompatible query, update or read
is in progress.
@return the element or None if no matching element was found."""
# At first get an iterator over all elements in file system that
# might match the given query.
guerillabackup.assertSourceUrlSpecificationConforming(sourceUrl)
elementIdParts = \
guerillabackup.DefaultFileSystemSink.internalGetElementIdParts(
sourceUrl, metaData)
# Now search the directory for all files conforming to the specification.
# As there may exist multiple files with the same time stamp and
# type, load also the meta data and check if it matches the query.
elementDirFd = None
if len(elementIdParts[0]) == 0:
elementDirFd = os.dup(self.storageDirFd)
else:
try:
elementDirFd = guerillabackup.secureOpenAt(
self.storageDirFd, elementIdParts[0][1:], symlinksAllowedFlag=False,
dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY,
dirCreateMode=0o700,
fileOpenFlags=os.O_DIRECTORY|os.O_RDONLY|os.O_NOFOLLOW|os.O_CREAT|os.O_EXCL|os.O_NOCTTY)
except OSError as dirOpenError:
# Directory does not exist, so there cannot be any valid element.
if dirOpenError.errno == errno.ENOENT:
return None
raise
searchPrefix = elementIdParts[2]
searchSuffix = '-%s-%s.data' % (elementIdParts[1], elementIdParts[3])
result = None
try:
fileList = guerillabackup.listDirAt(elementDirFd)
for fileName in fileList:
if ((not fileName.startswith(searchPrefix)) or
(not fileName.endswith(searchSuffix))):
continue
# Just verify that the serial part is really an integer but no
# need to handle the exception. This would indicate storage corruption,
# so we need to stop anyway.
serialStr = fileName[len(searchPrefix):-len(searchSuffix)]
if serialStr != '':
int(serialStr)
# So file might match, load the meta data.
metaDataFd = -1
fileMetaInfo = None
try:
metaDataFd = guerillabackup.secureOpenAt(
elementDirFd, './%s.info' % fileName[:-5],
symlinksAllowedFlag=False,
dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY,
dirCreateMode=None,
fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY)
metaInfoData = guerillabackup.readFully(metaDataFd)
fileMetaInfo = BackupElementMetainfo.unserialize(metaInfoData)
finally:
if metaDataFd >= 0:
os.close(metaDataFd)
if fileMetaInfo.get('DataUuid') != metaData.get('DataUuid'):
continue
elementId = '%s/%s' % (elementIdParts[0], fileName[:-5])
result = FileStorageBackupDataElement(self.storageDirFd, elementId)
break
finally:
os.close(elementDirFd)
return result
def queryBackupDataElements(self, query):
"""Query this storage.
@param query if None, return an iterator over all stored elements.
Otherwise query has to be a function returning True or False
for StorageBackupDataElementInterface elements.
@return BackupDataElementQueryResult iterator for this query.
@throws Exception if there are any open queries or updates
preventing response."""
return FileBackupDataElementQueryResult(self.storageDirFd, query)
class FileStorageBackupDataElement(
guerillabackup.StorageBackupDataElementInterface):
"""This class implements a file based backup data element."""
def __init__(self, storageDirFd, elementId):
"""Create a file based backup data element and make sure the
storage files are at least accessible without reading or validating
the content."""
# Extract the source URL from the elementId.
fileNameSepPos = elementId.rfind('/')
if (fileNameSepPos < 0) or (elementId[0] != '/'):
raise Exception('Invalid elementId without a separator')
lastNameStart = elementId.find('-', fileNameSepPos)
lastNameEnd = elementId.rfind('-')
if ((lastNameStart < 0) or (lastNameEnd < 0) or
(lastNameStart+1 >= lastNameEnd)):
raise Exception('Malformed last name in elementId')
self.sourceUrl = elementId[:fileNameSepPos+1]+elementId[lastNameStart+1:lastNameEnd]
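# Example (hypothetical): elementId '/20230101120000-root-full'
# yields sourceUrl '/root'.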
guerillabackup.assertSourceUrlSpecificationConforming(self.sourceUrl)
# Now try to create the StorageBackupDataElementInterface element.
self.storageDirFd = storageDirFd
# Just stat the data and info file, that are mandatory.
os.stat('.'+elementId+'.data', dir_fd=self.storageDirFd)
os.stat('.'+elementId+'.info', dir_fd=self.storageDirFd)
self.elementId = elementId
# Cache the metainfo once loaded.
self.metaInfo = None
def getElementId(self):
"""Get the storage element ID of this data element."""
return self.elementId
def getSourceUrl(self):
"""Get the source URL of the storage element."""
return self.sourceUrl
def getMetaData(self):
"""Get only the metadata part of this element.
@return a BackupElementMetainfo object"""
if self.metaInfo != None:
return self.metaInfo
metaInfoData = b''
metaDataFd = -1
try:
metaDataFd = self.openElementFile('info')
metaInfoData = guerillabackup.readFully(metaDataFd)
self.metaInfo = BackupElementMetainfo.unserialize(metaInfoData)
finally:
if metaDataFd >= 0:
os.close(metaDataFd)
return self.metaInfo
def getDataStream(self):
"""Get a stream to read data from that element.
@return a file descriptor for reading this stream."""
dataFd = self.openElementFile('data')
return dataFd
def assertExtraDataName(self, name):
"""Make sure that file extension is a known one."""
if ((name in ['', 'data', 'info', 'lock']) or (name.find('/') >= 0) or
(name.find('-') >= 0) or (name.find('.') >= 0)):
raise Exception('Invalid extra data name')
def setExtraData(self, name, value):
"""Attach or detach extra data to this storage element. This
function is intended for agents to use the storage to persist
this specific data also.
@param value the extra data content or None to remove the
element."""
self.assertExtraDataName(name)
valueFileName = '.'+self.elementId+'.'+name
if value is None:
try:
os.unlink(valueFileName, dir_fd=self.storageDirFd)
except OSError as unlinkError:
if unlinkError.errno != errno.ENOENT:
raise
return
# . and - are forbidden in name, so such a temporary file should
# be collision free.
temporaryExtraDataFileName = '.%s.%s-%d' % (
self.elementId, name, os.getpid())
extraDataFd = guerillabackup.secureOpenAt(
self.storageDirFd, temporaryExtraDataFileName,
symlinksAllowedFlag=False,
dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY,
dirCreateMode=None,
fileOpenFlags=os.O_WRONLY|os.O_CREAT|os.O_EXCL|os.O_TRUNC|os.O_NOFOLLOW|os.O_NOCTTY)
try:
os.write(extraDataFd, value)
os.close(extraDataFd)
extraDataFd = -1
extraDataFileName = '.%s.%s' % (self.elementId, name)
try:
os.unlink(extraDataFileName, dir_fd=self.storageDirFd)
except OSError as unlinkError:
if unlinkError.errno != errno.ENOENT:
raise
os.link(
temporaryExtraDataFileName, extraDataFileName,
src_dir_fd=self.storageDirFd, dst_dir_fd=self.storageDirFd,
follow_symlinks=False)
# Do not let "finally" do the cleanup to on late failures to avoid
# deletion of both versions.
os.unlink(temporaryExtraDataFileName, dir_fd=self.storageDirFd)
finally:
if extraDataFd >= 0:
os.close(extraDataFd)
os.unlink(temporaryExtraDataFileName, dir_fd=self.storageDirFd)
def getExtraData(self, name):
"""@return None when no extra data was found, the content
otherwise"""
self.assertExtraDataName(name)
value = None
extraDataFd = -1
try:
extraDataFd = self.openElementFile(name)
value = guerillabackup.readFully(extraDataFd)
except OSError as readError:
if readError.errno != errno.ENOENT:
raise
finally:
if extraDataFd >= 0:
os.close(extraDataFd)
return value
def delete(self):
"""Delete this data element. This will remove all files for
this element. The resource should be locked by the process
attempting removal if concurrent access is possible."""
lastFileSepPos = self.elementId.rfind('/')
dirFd = guerillabackup.secureOpenAt(
self.storageDirFd, '.'+self.elementId[:lastFileSepPos],
symlinksAllowedFlag=False,
dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY,
dirCreateMode=None,
fileOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY)
try:
fileNamePrefix = self.elementId[lastFileSepPos+1:]
for fileName in guerillabackup.listDirAt(dirFd):
if fileName.startswith(fileNamePrefix):
os.unlink(fileName, dir_fd=dirFd)
finally:
os.close(dirFd)
def lock(self):
"""Lock this backup data element.
@throws Exception if the element does not exist any more or
cannot be locked"""
lockFd = self.openElementFile(
'lock',
fileOpenFlags=os.O_WRONLY|os.O_CREAT|os.O_EXCL|os.O_NOFOLLOW|os.O_NOCTTY)
os.close(lockFd)
def unlock(self):
"""Unlock this backup data element."""
os.unlink('.'+self.elementId+'.lock', dir_fd=self.storageDirFd)
def openElementFile(self, name, fileOpenFlags=None):
"""Open the element file with given name.
@param fileOpenFlags when None, open the file readonly without
creating it.
@return the file descriptor to the new file."""
if fileOpenFlags is None:
fileOpenFlags = os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY
valueFileName = '.'+self.elementId+'.'+name
elementFd = guerillabackup.secureOpenAt(
self.storageDirFd, valueFileName, symlinksAllowedFlag=False,
dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY,
dirCreateMode=None,
fileOpenFlags=fileOpenFlags)
return elementFd
class FileBackupDataElementQueryResult(guerillabackup.BackupDataElementQueryResult):
"""This class provides results from querying a file based backup
data element storage."""
def __init__(self, storageDirFd, queryFunction):
self.queryFunction = queryFunction
self.storageDirFd = storageDirFd
# Create a stack with files and directory resources not listed yet.
# Each entry is a tuple with the file name prefix and the list
# of files.
self.dirStack = [('.', ['.'])]
def getNextElement(self):
"""Get the next backup data element from this query iterator.
@return a StorageBackupDataElementInterface object."""
while len(self.dirStack) != 0:
lastDirStackElement = self.dirStack[-1]
if len(lastDirStackElement[1]) == 0:
del self.dirStack[-1]
continue
# Check the type of the first element included in the list.
testName = lastDirStackElement[1][0]
del lastDirStackElement[1][0]
testPath = lastDirStackElement[0]+'/'+testName
if lastDirStackElement[0] == '.':
testPath = testName
# Stat without following links.
statData = os.stat(
testPath, dir_fd=self.storageDirFd, follow_symlinks=False)
if stat.S_ISDIR(statData.st_mode):
# Add an additional level to the stack.
fileList = guerillabackup.listDirAt(self.storageDirFd, testPath)
if len(fileList) != 0:
self.dirStack.append((testPath, fileList))
continue
if not stat.S_ISREG(statData.st_mode):
raise Exception('Found unexpected storage data elements ' \
'with stat data 0x%x' % statData.st_mode)
# So this is a normal file. Find the common prefix and remove
# all other files belonging to the same element from the list.
testNamePrefixPos = testName.rfind('.')
if testNamePrefixPos < 0:
raise Exception('Malformed element name %s' % repr(testPath))
testNamePrefix = testName[:testNamePrefixPos+1]
for testPos in range(len(lastDirStackElement[1])-1, -1, -1):
if lastDirStackElement[1][testPos].startswith(testNamePrefix):
del lastDirStackElement[1][testPos]
# Create the element anyway, it is needed for the query.
elementId = '/'
if lastDirStackElement[0] != '.':
elementId += lastDirStackElement[0]+'/'
elementId += testNamePrefix[:-1]
dataElement = FileStorageBackupDataElement(self.storageDirFd, elementId)
if (self.queryFunction != None) and (not self.queryFunction(dataElement)):
continue
return dataElement
return None
guerillabackup-0.5.0/src/lib/guerillabackup/DefaultFileSystemSink.py 0000664 0000000 0000000 00000022005 14501370353 0025625 0 ustar 00root root 0000000 0000000 """This module defines the classes for writing backup data elements
to the file system."""
import datetime
import errno
import hashlib
import os
import random
import guerillabackup
class DefaultFileSystemSink(guerillabackup.SinkInterface):
"""This class defines a sink to store backup data elements to
the filesystem. In test mode it will unlink the data file during
close and report an error."""
SINK_BASEDIR_KEY = 'DefaultFileSystemSinkBaseDir'
def __init__(self, configContext):
self.testModeFlag = False
storageDirName = configContext.get(
DefaultFileSystemSink.SINK_BASEDIR_KEY, None)
if storageDirName is None:
raise Exception('Mandatory sink configuration parameter ' \
'%s missing' % DefaultFileSystemSink.SINK_BASEDIR_KEY)
self.storageDirName = None
self.storageDirFd = -1
self.openStorageDir(storageDirName, configContext)
def openStorageDir(self, storageDirName, configContext):
"""Open the storage behind the sink. This method may only
be called once."""
if self.storageDirName != None:
raise Exception('Already defined')
self.storageDirName = storageDirName
self.storageDirFd = guerillabackup.secureOpenAt(
-1, self.storageDirName, symlinksAllowedFlag=False,
dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY,
dirCreateMode=None,
fileOpenFlags=os.O_DIRECTORY|os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY,
fileCreateMode=0o700)
self.testModeFlag = configContext.get(
guerillabackup.CONFIG_GENERAL_DEBUG_TEST_MODE_KEY, False)
if not isinstance(self.testModeFlag, bool):
raise Exception('Configuration parameter %s has to be ' \
'boolean' % guerillabackup.CONFIG_GENERAL_DEBUG_TEST_MODE_KEY)
def getSinkHandle(self, sourceUrl):
"""Get a handle to perform transfer of a single backup data
element to a sink."""
return DefaultFileSystemSinkHandle(
self.storageDirFd, self.testModeFlag, sourceUrl)
@staticmethod
def internalGetElementIdParts(sourceUrl, metaInfo):
"""Get the parts forming the element ID as tuple. The tuple
elements are directory part, timestamp string, storage file
name main part including the backup type. The storage file
name can be created easily by adding separators, an optional
serial after the timestamp and the file type suffix.
@return the tuple with all fields filled when metaInfo is not
None, otherwise only directory part is filled. The directory
will be an empty string for top level elements or the absolute
sourceUrl path up to but excluding the last slash."""
fileTimestampStr = None
backupTypeStr = None
if metaInfo != None:
fileTimestampStr = datetime.datetime.fromtimestamp(
metaInfo.get('Timestamp')).strftime('%Y%m%d%H%M%S')
backupTypeStr = metaInfo.get('BackupType')
lastPartSplitPos = sourceUrl.rfind('/')
return (
sourceUrl[:lastPartSplitPos], sourceUrl[lastPartSplitPos+1:],
fileTimestampStr, backupTypeStr)
class DefaultFileSystemSinkHandle(guerillabackup.SinkHandleInterface):
"""This class defines a handle for writing a backup data to
the file system."""
def __init__(self, storageDirFd, testModeFlag, sourceUrl):
"""Create a temporary storage file and a handle to it."""
self.testModeFlag = testModeFlag
self.sourceUrl = sourceUrl
guerillabackup.assertSourceUrlSpecificationConforming(sourceUrl)
self.elementIdParts = DefaultFileSystemSink.internalGetElementIdParts(
sourceUrl, None)
self.storageDirFd = None
if self.elementIdParts[0] == '':
self.storageDirFd = os.dup(storageDirFd)
else:
self.storageDirFd = guerillabackup.secureOpenAt(
storageDirFd, self.elementIdParts[0][1:], symlinksAllowedFlag=False,
dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY,
dirCreateMode=0o700,
fileOpenFlags=os.O_DIRECTORY|os.O_RDONLY|os.O_NOFOLLOW|os.O_CREAT|os.O_EXCL|os.O_NOCTTY,
fileCreateMode=0o700)
# Generate a temporary file name in the same directory.
while True:
self.tmpFileName = 'tmp-%s-%d' % (self.elementIdParts[1], random.randint(0, 1<<30))
try:
self.streamFd = guerillabackup.secureOpenAt(
self.storageDirFd, self.tmpFileName, symlinksAllowedFlag=False,
dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY,
dirCreateMode=None,
fileOpenFlags=os.O_RDWR|os.O_NOFOLLOW|os.O_CREAT|os.O_EXCL|os.O_NOCTTY,
fileCreateMode=0o600)
break
except OSError as openError:
if openError.errno != errno.EEXIST:
os.close(self.storageDirFd)
raise
def getSinkStream(self):
"""Get the file descriptor to write directly to the open backup
data element at the sink, if available.
@return the file descriptor or None when not supported."""
if self.streamFd is None:
raise Exception('Illegal state, already closed')
return self.streamFd
def write(self, data):
"""Write data to the open backup data element at the sink."""
os.write(self.streamFd, data)
def close(self, metaInfo):
"""Close the backup data element at the sink and receive any
pending or current error associated with the writing process.
When there is sufficient risk that data written to the sink
might have been corrupted during transit or storage, the
sink may decide to perform a verification operation while
closing and return any verification errors here also.
@param metaInfo a python object with additional information
about this backup data element. This information is added
at the end of the sink procedure to allow inclusion of checksum
or signature fields created on the fly while writing. See
design and implementation documentation for requirements on
those objects."""
if self.streamFd is None:
raise Exception('Illegal state, already closed')
self.elementIdParts = DefaultFileSystemSink.internalGetElementIdParts(
self.sourceUrl, metaInfo)
# The file name main part between timestamp (with serial) and
# suffix as string.
fileNameMainStr = '%s-%s' % (self.elementIdParts[1], self.elementIdParts[3])
fileChecksum = metaInfo.get('StorageFileChecksumSha512')
metaInfoStr = metaInfo.serialize()
try:
if fileChecksum != None:
# Reread the file and create checksum.
os.lseek(self.streamFd, 0, os.SEEK_SET)
digestAlgo = hashlib.sha512()
while True:
data = os.read(self.streamFd, 1<<20)
if len(data) == 0:
break
digestAlgo.update(data)
if fileChecksum != digestAlgo.digest():
raise Exception('Checksum mismatch')
# Link the name to the final pathname.
serial = -1
storageFileName = None
while True:
if serial < 0:
storageFileName = '%s-%s.data' % (
self.elementIdParts[2], fileNameMainStr)
else:
storageFileName = '%s%d-%s.data' % (
self.elementIdParts[2], serial, fileNameMainStr)
serial += 1
try:
os.link(
self.tmpFileName, storageFileName, src_dir_fd=self.storageDirFd,
dst_dir_fd=self.storageDirFd, follow_symlinks=False)
break
except OSError as linkError:
if linkError.errno != errno.EEXIST:
raise
# Now unlink the old file. With malicious actors we cannot be
# sure to unlink the file we have currently opened, but in worst
# case some malicious symlink is removed.
os.unlink(self.tmpFileName, dir_fd=self.storageDirFd)
# Now create the meta-information file. As the data file acted
# as a lock, there is nothing to fail except for severe system
# failure or malicious activity. So do not attempt to correct
# any errors at this stage. Create a temporary version first and
# then link it to have an atomic completion operation instead of
# risking that another system could pick up the incomplete info
# file.
metaInfoFileName = storageFileName[:-4]+'info'
metaInfoFd = guerillabackup.secureOpenAt(
self.storageDirFd, metaInfoFileName+'.tmp',
symlinksAllowedFlag=False,
dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY,
dirCreateMode=None,
fileOpenFlags=os.O_RDWR|os.O_NOFOLLOW|os.O_CREAT|os.O_EXCL|os.O_NOCTTY,
fileCreateMode=0o600)
os.write(metaInfoFd, metaInfoStr)
os.close(metaInfoFd)
if self.testModeFlag:
# Unlink all artefacts when operating in test mode to avoid
# accidental persistence of test data.
os.unlink(storageFileName, dir_fd=self.storageDirFd)
os.unlink(metaInfoFileName+'.tmp', dir_fd=self.storageDirFd)
raise Exception('No storage in test mode')
os.link(
metaInfoFileName+'.tmp', metaInfoFileName,
src_dir_fd=self.storageDirFd, dst_dir_fd=self.storageDirFd,
follow_symlinks=False)
os.unlink(metaInfoFileName+'.tmp', dir_fd=self.storageDirFd)
finally:
os.close(self.storageDirFd)
self.storageDirFd = None
os.close(self.streamFd)
self.streamFd = None
guerillabackup-0.5.0/src/lib/guerillabackup/DigestPipelineElement.py 0000664 0000000 0000000 00000017245 14501370353 0025640 0 ustar 00root root 0000000 0000000 """This module contains only the classes for pipelined digest
calculation."""
import fcntl
import hashlib
import os
import guerillabackup
class DigestPipelineElement(
guerillabackup.TransformationPipelineElementInterface):
"""This class create pipeline instances for digest generation.
The instances will forward incoming data unmodified to allow
digest generation on the fly."""
def __init__(self, digestClass=hashlib.sha512):
self.digestClass = digestClass
def getExecutionInstance(self, upstreamProcessOutput):
"""Get an execution instance for this transformation element.
@param upstreamProcessOutput this is the output of the upstream
process, that will be wired as input of the newly created
process instance."""
return DigestPipelineExecutionInstance(
self.digestClass, upstreamProcessOutput)
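# A hedged usage sketch (illustrative only): within this project such
# elements are usually instantiated and driven together with the other
# pipeline elements, e.g.
#   digestElement = DigestPipelineElement(hashlib.sha512)
#   instances = guerillabackup.instantiateTransformationPipeline(
#       [digestElement], upstreamOutput, sinkStream, doStartFlag=True)
#   guerillabackup.runTransformationPipeline(instances)
#   checksum = instances[-1].getDigestData()
# Here upstreamOutput is assumed to be an object implementing
# TransformationProcessOutputInterface and sinkStream a writable file
# descriptor provided by the caller.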
class DigestPipelineExecutionInstance(
guerillabackup.TransformationProcessInterface):
"""This is the digest execution instance class created when
instantiating the pipeline."""
def __init__(self, digestClass, upstreamProcessOutput):
self.digest = digestClass()
self.digestData = None
# Keep the upstream process output until end of stream is reached.
self.upstreamProcessOutput = upstreamProcessOutput
self.processOutput = None
# Output stream for direct writing.
self.processOutputStream = None
self.processOutputBuffer = ''
def getProcessOutput(self):
"""Get the output connector of this transformation process."""
if self.processOutputStream is not None:
raise Exception('No access to process output in stream mode')
if self.processOutput is None:
self.processOutput = DigestOutputInterface(self)
return self.processOutput
def setProcessOutputStream(self, processOutputStream):
"""Some processes may also support setting of an output stream
file descriptor. This is especially useful if the process
is the last one in a pipeline and hence could write directly
to a file or network descriptor.
@throw Exception if this process does not support setting
of output stream descriptors."""
if self.processOutput != None:
raise Exception('No setting of output stream after call to getProcessOutput')
# This module has no asynchronous operation mode, so writing to
# a given output stream in doProcess has to be non-blocking to
# avoid deadlock.
flags = fcntl.fcntl(processOutputStream, fcntl.F_GETFL)
fcntl.fcntl(processOutputStream, fcntl.F_SETFL, flags|os.O_NONBLOCK)
self.processOutputStream = processOutputStream
def isAsynchronous(self):
"""A asynchronous process just needs to be started and will
perform data processing on streams without any further interaction
while running."""
return False
def start(self):
"""Start this execution process."""
if (self.processOutput is None) and (self.processOutputStream is None):
raise Exception('Not connected')
if self.digest is None:
raise Exception('Cannot restart again')
# Nothing to do with that type of process.
def stop(self):
"""Stop this execution process when still running.
@return None when the instance was already stopped, information
about stopping, e.g. the stop error message when the process
was really stopped."""
stopException = None
if self.processOutputBuffer != None:
data = self.upstreamProcessOutput.readData(64)
self.upstreamProcessOutput.close()
if data != None:
stopException = Exception('Upstream output still open, there might be unprocessed data')
if self.digest is None:
return None
self.digestData = self.digest.digest()
self.digest = None
return stopException
def isRunning(self):
"""See if this process instance is still running."""
return self.digest != None
def doProcess(self):
"""This method triggers the data transformation operation
of this component. For components in synchronous mode, the
method will attempt to move data from input to output. Asynchronous
components will just check the processing status and may raise
an exception, when processing terminated with errors. As such
a component might not be able to detect the amount of data
really moved since last invocation, the component may report
a fake single byte move.
@throws Exception if an uncorrectable transformation state
was reached and transformation cannot proceed, even though
end of input data was not yet seen. Raise exception also when
process was not started or already stopped.
@return the number of bytes read or written or at least a
value greater than zero if any data was processed. A value of zero
indicates that currently data processing was not possible
due to filled buffers but should be attempted again. A value
below zero indicates that all input data was processed and
output buffers were flushed already."""
if self.digest is None:
return -1
movedDataLength = 0
if ((self.upstreamProcessOutput != None) and
(len(self.processOutputBuffer) == 0)):
self.processOutputBuffer = self.upstreamProcessOutput.readData(1<<16)
if self.processOutputBuffer is None:
self.upstreamProcessOutput.close()
self.upstreamProcessOutput = None
self.digestData = self.digest.digest()
self.digest = None
return -1
movedDataLength = len(self.processOutputBuffer)
if self.processOutputStream != None:
writeLength = os.write(self.processOutputStream, self.processOutputBuffer)
movedDataLength += writeLength
self.digest.update(self.processOutputBuffer[:writeLength])
if writeLength == len(self.processOutputBuffer):
self.processOutputBuffer = ''
else:
self.processOutputBuffer = self.processOutputBuffer[writeLength:]
return movedDataLength
def getBlockingStreams(self, readStreamList, writeStreamList):
"""Collect the file descriptors that are currently blocking
this synchronous component."""
if ((self.upstreamProcessOutput != None) and
(len(self.processOutputBuffer) == 0) and
(self.upstreamProcessOutput.getOutputStreamDescriptor() != None)):
readStreamList.append(
self.upstreamProcessOutput.getOutputStreamDescriptor())
if ((self.processOutputStream != None) and
(self.processOutputBuffer != None) and
(len(self.processOutputBuffer) != 0)):
writeStreamList.append(self.processOutputStream)
def getDigestData(self):
"""Get the data from this digest after processing was completed."""
if self.digest != None:
raise Exception('Digest processing not yet completed')
return self.digestData
class DigestOutputInterface(
guerillabackup.TransformationProcessOutputInterface):
"""Digest pipeline element output class."""
def __init__(self, executionInstance):
self.executionInstance = executionInstance
def getOutputStreamDescriptor(self):
"""Get the file descriptor to read output from this output
interface. This is not available for that type of digest element."""
return None
def readData(self, length):
"""Read data from this output.
@return at most length bytes of data, zero-length data
if nothing available at the moment and None when end of input
was reached."""
if self.executionInstance.processOutputBuffer is None:
return None
returnData = self.executionInstance.processOutputBuffer
if length < len(self.executionInstance.processOutputBuffer):
returnData = self.executionInstance.processOutputBuffer[:length]
self.executionInstance.processOutputBuffer = \
self.executionInstance.processOutputBuffer[length:]
else:
self.executionInstance.processOutputBuffer = ''
return returnData
guerillabackup-0.5.0/src/lib/guerillabackup/GpgEncryptionPipelineElement.py 0000664 0000000 0000000 00000003164 14501370353 0027204 0 ustar 00root root 0000000 0000000 """This module provides support for a GnuPG based encryption pipeline
element."""
import guerillabackup
from guerillabackup.OSProcessPipelineElement import OSProcessPipelineExecutionInstance
class GpgEncryptionPipelineElement(
guerillabackup.TransformationPipelineElementInterface):
"""This class create pipeline instances for PGP encryption of
data stream using GnuPG."""
# Those are the default arguments beside key name.
gpgDefaultCallArguments = [
'/usr/bin/gpg', '--batch', '--lock-never',
'--no-options', '--homedir', '/etc/guerillabackup/keys',
'--trust-model', 'always', '--throw-keyids', '--no-emit-version',
'--encrypt']
def __init__(self, keyName, callArguments=gpgDefaultCallArguments):
"""Create the pipeline element.
@param keyName the name of the GnuPG key used as hidden recipient
for encryption.
@param callArguments when defined, pass those arguments to gpg
when encrypting. Otherwise gpgDefaultCallArguments are used."""
self.keyName = keyName
self.callArguments = callArguments
def getExecutionInstance(self, upstreamProcessOutput):
"""Get an execution instance for this transformation element.
@param upstreamProcessOutput this is the output of the upstream
process, that will be wired as input of the newly created
process instance."""
return OSProcessPipelineExecutionInstance(
self.callArguments[0],
self.callArguments+['--hidden-recipient', self.keyName],
upstreamProcessOutput, allowedExitStatusList=[0])
def replaceKey(self, newKeyName):
"""Return an encryption element with same gpg invocation arguments
but key name replaced."""
return GpgEncryptionPipelineElement(newKeyName, self.callArguments)
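# A hedged usage sketch (illustrative only, the key names are hypothetical):
#   gpgElement = GpgEncryptionPipelineElement('some-backup-key')
#   sameArgsOtherKey = gpgElement.replaceKey('another-backup-key')
# The element is then used like any other
# TransformationPipelineElementInterface implementation, e.g. as part of
# the downstream pipeline returned by
# guerillabackup.getDefaultDownstreamPipeline() when an encryption key
# name is configured.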
guerillabackup-0.5.0/src/lib/guerillabackup/LogfileBackupUnit.py 0000664 0000000 0000000 00000054226 14501370353 0024770 0 ustar 00root root 0000000 0000000 """This module provides all classes required for logfile backup."""
import base64
import errno
import hashlib
import json
import os
import re
import sys
import time
import traceback
import guerillabackup
from guerillabackup.BackupElementMetainfo import BackupElementMetainfo
from guerillabackup.TransformationProcessOutputStream import TransformationProcessOutputStream
# This is the key to the list of source files to include using
# a LogfileBackupUnit. The list is extracted from the configContext
# at invocation time of a backup unit, not at creation time. The
# structure of the parameter content is a list of source description
# entries. Each entry in the list is used to create an input description
# object of class LogfileBackupUnitInputDescription.
CONFIG_INPUT_LIST_KEY = 'LogBackupUnitInputList'
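# A hypothetical example value for this parameter (directory, regular
# expression, policy and key name are illustrative only); each entry is
# the 5-value tuple described in LogfileBackupUnitInputDescription below:
#   [('/var/log', '^(?P<source>syslog|auth\\.log)'
#     '(?:\\.(?P<oldserial>\\d+))?(?:\\.(?P<compress>gz))?$',
#     None, None, None)]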
class LogfileBackupUnitInputDescription():
"""This class stores information about one set of logfiles to
be processed."""
def __init__(self, descriptionTuple):
"""Initialize a single input description using a 5-value tuple,
e.g. extracted directly from the CONFIG_INPUT_LIST_KEY parameter.
@param descriptionTuple the tuple, the meaning of the 5 values
to be extracted is:
* Input directory: directory to search for logfiles
* Input file regex: regular expression to select compressed
or uncompressed logfiles for inclusion.
* Source URL transformation: If None, the first named group
of the "input file regex" is used as source URL. When not
starting with a "/", the transformation string is the name
to include literally in the URL after the "input directory"
name.
* Policy: If not None, include this string as handling policy
within the manifest.
* Encryption key name: If not None, encrypt the input using
the named key."""
# Accept list also.
if ((not isinstance(descriptionTuple, tuple)) and
(not isinstance(descriptionTuple, list))):
raise Exception('Input description has to be list or tuple')
if len(descriptionTuple) != 5:
raise Exception('Input description has to be tuple with 5 elements')
self.inputDirectoryName = os.path.normpath(descriptionTuple[0])
# "//..." is a normalized path, get rid of double slashes.
self.sourceUrlPath = self.inputDirectoryName.replace('//', '/')
if self.sourceUrlPath[-1] != '/':
self.sourceUrlPath += '/'
self.inputFileRegex = re.compile(descriptionTuple[1])
self.sourceTransformationPattern = descriptionTuple[2]
try:
if self.sourceTransformationPattern is None:
guerillabackup.assertSourceUrlSpecificationConforming(
self.sourceUrlPath+'testname')
elif self.sourceTransformationPattern[0] != '/':
guerillabackup.assertSourceUrlSpecificationConforming(
self.sourceUrlPath+self.sourceTransformationPattern)
else:
guerillabackup.assertSourceUrlSpecificationConforming(
self.sourceTransformationPattern)
except Exception as assertException:
raise Exception('Source URL transformation malformed: '+assertException.args[0])
self.handlingPolicyName = descriptionTuple[3]
self.encryptionKeyName = descriptionTuple[4]
def getTransformedSourceName(self, matcher):
"""Get the source name for logfiles matching the input description."""
if self.sourceTransformationPattern is None:
return self.sourceUrlPath+matcher.group(1)
if self.sourceTransformationPattern[0] != '/':
return self.sourceUrlPath+self.sourceTransformationPattern
return self.sourceTransformationPattern
def tryIntConvert(value):
"""Try to convert a value to an integer for sorting. When conversion
fails, the value itself is returned, thus sorting will be performed
lexicographically afterwards."""
try:
return int(value)
except:
return value
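# Illustrative examples of the resulting mixed-type sort keys:
#   tryIntConvert('10') -> 10 (integer)
#   tryIntConvert('gz') -> 'gz' (unchanged string)
# This way serial numbers sort numerically, e.g. "9" before "10", instead
# of the lexicographic order "10" before "9".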
class LogfileSourceInfo():
"""This class provides support to collect logfiles from one
LogfileBackupUnitInputDescription that all map to the same source
URL. This is needed to process all of them in the correct order,
starting with the oldest one."""
def __init__(self, sourceUrl):
self.sourceUrl = sourceUrl
self.serialTypesConsistentFlag = True
self.serialType = None
# List of tracked file information records. Each record is a tuple
# with 3 values: file name, regex matcher and serial data.
self.fileList = []
def addFile(self, fileName, matcher):
"""Add a logfile that will be mapped to the source URL of
this group."""
groupDict = matcher.groupdict()
serialType = None
if 'serial' in groupDict:
serialType = 'serial'
if 'oldserial' in groupDict:
if serialType != None:
self.serialTypesConsistentFlag = False
else:
serialType = 'oldserial'
if self.serialType is None:
self.serialType = serialType
elif self.serialType != serialType:
self.serialTypesConsistentFlag = False
serialData = []
if serialType != None:
serialValue = groupDict[serialType]
if (serialValue != None) and (len(serialValue) != 0):
serialData = [tryIntConvert(x) for x in re.findall('(\\d+|\\D+)', serialValue)]
# This is not very efficient but try to detect duplicate serialData
# values here already and tag the whole list as inconsistent.
# This may happen with broken regular expressions or when mixing
# compressed and uncompressed files with same serial.
for elemFileName, elemMatcher, elemSerialData in self.fileList:
if elemSerialData == serialData:
self.serialTypesConsistentFlag = False
self.fileList.append((fileName, matcher, serialData,))
def getSortedFileList(self):
"""Get the sorted file list starting with the oldest entry.
The oldest one should be moved to backup first."""
if not self.serialTypesConsistentFlag:
raise Exception('No sorting in inconsistent state')
fileList = sorted(self.fileList, key=lambda x: x[2])
if self.serialType is None:
if len(fileList) > 1:
raise Exception('No serial type and more than one file')
elif self.serialType == 'serial':
# Larger serial numbers denote newer files, only elements without
# serial data have to be moved to the end.
moveCount = 0
while (moveCount < len(fileList)) and (len(fileList[moveCount][2]) == 0):
moveCount += 1
fileList = fileList[moveCount:]+fileList[:moveCount]
elif self.serialType == 'oldserial':
# Larger serial numbers denote older files. File without serial would
# be first, so just reverse is sufficient.
fileList.reverse()
else:
raise Exception('Unsupported serial type %s' % self.serialType)
return fileList
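# Illustrative example (file names hypothetical): with "oldserial" style
# rotation, where larger serial numbers denote older files, the files
# ['app.log', 'app.log.1', 'app.log.2.gz'] are returned in the order
# ['app.log.2.gz', 'app.log.1', 'app.log'], so the oldest file is
# processed and backed up first.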
class LogfileBackupUnit(guerillabackup.SchedulableGeneratorUnitInterface):
"""This class allows to schedule regular searches in a list
of log file directories for files matching a pattern. If files
are found and not open for writing any more, they are processed
according to specified transformation pipeline and deleted afterwards.
The unit will keep track of the last UUID reported for each
resource and generate a new one for each handled file using
json-serialized state data. The state data is a list with the
timestamp of the last run as seconds since 1970, the next list
value contains a dictionary with the resource name for each
logfile group as key and the last UUID as value."""
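# The persisted state written to "state.current" is JSON; a hypothetical
# example with a single tracked resource could look like:
#   [1700000000, {"/var/log/syslog": "c29tZS11dWlkLWRhdGE="}]
# where the first value is the timestamp of the last invocation and the
# dictionary maps each source URL to the base64 encoded UUID of the last
# element written for it.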
def __init__(self, unitName, configContext):
"""Initialize this unit using the given configuration."""
self.unitName = unitName
self.configContext = configContext
# This is the maximum interval in seconds between two invocations.
# When last invocation was more than that number of seconds in
# the past, the unit will attempt invocation at first possible
# moment.
self.maxInvocationInterval = 3600
# When this value is not zero, the unit will attempt to trigger
# invocation always at the same time using this value as modulus.
self.moduloInvocationUnit = 3600
# This is the invocation offset when modulus timing is enabled.
self.moduloInvocationTime = 0
# As immediate invocation cannot be guaranteed, this value defines
# the size of the window, within that the unit should still be
# invoked, even when the targeted time slot has already passed
# by.
self.moduloInvocationTimeWindow = 10
self.testModeFlag = configContext.get(
guerillabackup.CONFIG_GENERAL_DEBUG_TEST_MODE_KEY, False)
if not isinstance(self.testModeFlag, bool):
raise Exception('Configuration parameter %s has to be ' \
'boolean' % guerillabackup.CONFIG_GENERAL_DEBUG_TEST_MODE_KEY)
# Timestamp of last invocation end.
self.lastInvocationTime = -1
# Map from resource name to UUID of most recent file processed.
# The UUID is kept internally as binary data string. Only for
# persistency, data will be base64 encoded.
self.resourceUuidMap = {}
self.persistencyDirFd = guerillabackup.openPersistencyFile(
configContext, os.path.join('generators', self.unitName),
os.O_DIRECTORY|os.O_RDONLY|os.O_CREAT|os.O_EXCL|os.O_NOFOLLOW|os.O_NOCTTY, 0o700)
handle = None
try:
handle = guerillabackup.secureOpenAt(
self.persistencyDirFd, 'state.current',
fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY)
except OSError as openError:
if openError.errno != errno.ENOENT:
raise
# See if the state.previous file exists, if yes, the unit is likely
# to be broken. Refuse to do anything while in this state.
try:
os.stat(
'state.previous', dir_fd=self.persistencyDirFd, follow_symlinks=False)
raise Exception('Persistency data inconsistencies: found stale previous state file')
except OSError as statError:
if statError.errno != errno.ENOENT:
raise
# So there is only the current state file, if any.
stateInfo = None
if handle != None:
stateData = b''
while True:
data = os.read(handle, 1<<20)
if len(data) == 0:
break
stateData += data
os.close(handle)
stateInfo = json.loads(str(stateData, 'ascii'))
if ((not isinstance(stateInfo, list)) or (len(stateInfo) != 2) or
(not isinstance(stateInfo[0], int)) or
(not isinstance(stateInfo[1], dict))):
raise Exception('Persistency data structure mismatch')
self.lastInvocationTime = stateInfo[0]
self.resourceUuidMap = stateInfo[1]
for url, uuidData in self.resourceUuidMap.items():
self.resourceUuidMap[url] = base64.b64decode(uuidData)
def getNextInvocationTime(self):
"""Get the time in seconds until this unit should called again.
If a unit does not know (yet), as invocation needs depend on
external events, it should report a reasonably low value to
be queried again soon.
@return 0 if the unit should be invoked immediately, the seconds
to go otherwise."""
currentTime = int(time.time())
maxIntervalDelta = self.lastInvocationTime+self.maxInvocationInterval-currentTime
# Already overdue, activate immediately.
if maxIntervalDelta <= 0:
return 0
# No modulo time operation, just return the next delta value.
if self.moduloInvocationUnit == 0:
return maxIntervalDelta
# See if currentTime is within invocation window
moduloIntervalDelta = (currentTime%self.moduloInvocationUnit)-self.moduloInvocationTime
if moduloIntervalDelta < 0:
moduloIntervalDelta += self.moduloInvocationUnit
# See if immediate modulo invocation is possible.
if moduloIntervalDelta < self.moduloInvocationTimeWindow:
# We could be within the window, but only if last invocation happened
# during the previous modulo unit.
lastInvocationUnit = (self.lastInvocationTime-self.moduloInvocationTime)//self.moduloInvocationUnit
currentInvocationUnit = (currentTime-self.moduloInvocationTime)//self.moduloInvocationUnit
if lastInvocationUnit != currentInvocationUnit:
return 0
# We are still within the same invocation interval. Fall through
# to the out-of-window case to calculate the next invocation time.
moduloIntervalDelta = self.moduloInvocationUnit-moduloIntervalDelta
return min(maxIntervalDelta, moduloIntervalDelta)
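# Worked example (values illustrative), assuming the defaults above
# (maxInvocationInterval=3600, moduloInvocationUnit=3600,
# moduloInvocationTime=0, moduloInvocationTimeWindow=10): at
# currentTime=7205 with lastInvocationTime=3700 the last run happened in
# the previous hour slot and 7205%3600=5 is inside the window, so 0 is
# returned and the unit is invoked immediately. With
# lastInvocationTime=7202 the unit already ran in the current slot, so
# the method falls through and returns min(3597, 3595)=3595 seconds
# until the next slot.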
def processInput(self, unitInput, sink):
"""Process a single input description by searching for files
that could be written to the sink."""
inputDirectoryFd = None
getFileOpenerInformationErrorMode = guerillabackup.OPENER_INFO_FAIL_ON_ERROR
if os.geteuid() != 0:
getFileOpenerInformationErrorMode = guerillabackup.OPENER_INFO_IGNORE_ACCESS_ERRORS
try:
inputDirectoryFd = guerillabackup.secureOpenAt(
None, unitInput.inputDirectoryName,
fileOpenFlags=os.O_DIRECTORY|os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY)
sourceDict = {}
for fileName in guerillabackup.listDirAt(inputDirectoryFd):
matcher = unitInput.inputFileRegex.match(fileName)
if matcher is None:
continue
sourceUrl = unitInput.getTransformedSourceName(matcher)
sourceInfo = sourceDict.get(sourceUrl, None)
if sourceInfo is None:
sourceInfo = LogfileSourceInfo(sourceUrl)
sourceDict[sourceUrl] = sourceInfo
sourceInfo.addFile(fileName, matcher)
# Now we know all files to be included for each URL. Sort them
# to fulfill Req:OrderedProcessing and start with the oldest.
for sourceUrl, sourceInfo in sourceDict.items():
if not sourceInfo.serialTypesConsistentFlag:
print('Inconsistent serial types in %s, ignoring ' \
'source.' % sourceInfo.sourceUrl, file=sys.stderr)
continue
# Get the downstream transformation pipeline elements.
downstreamPipelineElements = \
guerillabackup.getDefaultDownstreamPipeline(
self.configContext, unitInput.encryptionKeyName)
fileList = sourceInfo.getSortedFileList()
fileInfoList = guerillabackup.getFileOpenerInformation(
['%s/%s' % (unitInput.inputDirectoryName, x[0]) for x in fileList],
getFileOpenerInformationErrorMode)
for fileListIndex in range(0, len(fileList)):
fileName, matcher, serialData = fileList[fileListIndex]
# Make sure, that the file is not written any more.
logFilePathName = os.path.join(
unitInput.inputDirectoryName, fileName)
isOpenForWritingFlag = False
if fileInfoList[fileListIndex] != None:
for pid, fdInfoList in fileInfoList[fileListIndex]:
for fdNum, fdOpenFlags in fdInfoList:
if fdOpenFlags == 0o100001:
print('File %s is still written by pid %d, ' \
'fd %d' % (logFilePathName, pid, fdNum), file=sys.stderr)
isOpenForWritingFlag = True
elif fdOpenFlags != 0o100000:
print('File %s unknown open flags 0x%x by pid %d, ' \
'fd %d' % (
logFilePathName, fdOpenFlags, pid, fdNum), file=sys.stderr)
isOpenForWritingFlag = True
# Files have to be processed in correct order, so we have to stop
# here.
if isOpenForWritingFlag:
break
completePipeline = downstreamPipelineElements
compressionType = matcher.groupdict().get('compress', None)
if compressionType != None:
# Source file is compressed, prepend a suffix/content-specific
# decompression element.
compressionElement = None
if compressionType == 'gz':
compressionElement = guerillabackup.OSProcessPipelineElement(
'/bin/gzip', ['/bin/gzip', '-cd'])
else:
raise Exception('Unknown compression type %s for file %s/%s' % (
compressionType, unitInput.inputDirectoryName, fileName))
completePipeline = [compressionElement]+completePipeline[:]
logFileFd = guerillabackup.secureOpenAt(
inputDirectoryFd, fileName,
fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY)
logFileStatData = os.fstat(logFileFd)
# By wrapping the logFileFd into this object, the first pipeline
# element will close it. So we do not need to care here.
logFileOutput = TransformationProcessOutputStream(logFileFd)
sinkHandle = sink.getSinkHandle(sourceInfo.sourceUrl)
sinkStream = sinkHandle.getSinkStream()
# Get the list of started pipeline instances.
pipelineInstances = guerillabackup.instantiateTransformationPipeline(
completePipeline, logFileOutput, sinkStream, doStartFlag=True)
guerillabackup.runTransformationPipeline(pipelineInstances)
digestData = pipelineInstances[-1].getDigestData()
metaInfoDict = {}
metaInfoDict['BackupType'] = 'full'
if unitInput.handlingPolicyName != None:
metaInfoDict['HandlingPolicy'] = [unitInput.handlingPolicyName]
lastUuid = self.resourceUuidMap.get(sourceInfo.sourceUrl, None)
currentUuidDigest = hashlib.sha512()
if lastUuid != None:
metaInfoDict['Predecessor'] = lastUuid
currentUuidDigest.update(lastUuid)
# Add the compressed file digest. The consequence is that it
# will not be completely obvious when the same file was processed
# twice with encryption enabled and processing failed in a late
# phase. Therefore identical file content cannot be detected.
currentUuidDigest.update(digestData)
# Also include the timestamp and original filename of the source
# file in the UUID calculation: Otherwise retransmissions of files
# with identical content cannot be distinguished.
currentUuidDigest.update(bytes('%d %s' % (
logFileStatData.st_mtime, fileName), sys.getdefaultencoding()))
currentUuid = currentUuidDigest.digest()
metaInfoDict['DataUuid'] = currentUuid
metaInfoDict['StorageFileChecksumSha512'] = digestData
metaInfoDict['Timestamp'] = int(logFileStatData.st_mtime)
metaInfo = BackupElementMetainfo(metaInfoDict)
sinkHandle.close(metaInfo)
if self.testModeFlag:
raise Exception('No completion of logfile backup in test mode')
# Delete the logfile.
os.unlink(fileName, dir_fd=inputDirectoryFd)
# Update the UUID map as last step: if any of the steps above
# would fail, currentUuid generated in next run will be identical
# to this. Sorting out the duplicates will be easy.
self.resourceUuidMap[sourceInfo.sourceUrl] = currentUuid
finally:
if inputDirectoryFd != None:
os.close(inputDirectoryFd)
def invokeUnit(self, sink):
"""Invoke this unit to create backup elements and pass them
on to the sink. Even when indicated via getNextInvocationTime,
the unit may decide, that it is not yet ready and not write
any element to the sink.
@return None if currently there is nothing to write to the
sink, a number of seconds to retry invocation if the unit
assumes, that there is data to be processed but processing
cannot start yet, e.g. due to locks held by other parties
or resources, e.g. network storage, currently not available."""
nextInvocationDelta = self.getNextInvocationTime()
invocationAttemptedFlag = False
try:
if nextInvocationDelta == 0:
# We are now ready for processing. Get the list of source directories
# and search patterns to locate the target files.
unitInputListConfig = self.configContext.get(CONFIG_INPUT_LIST_KEY, None)
invocationAttemptedFlag = True
nextInvocationDelta = None
if unitInputListConfig is None:
print('Suspected configuration error: LogfileBackupUnit ' \
'enabled but %s configuration list empty' % CONFIG_INPUT_LIST_KEY,
file=sys.stderr)
else:
for configItem in unitInputListConfig:
unitInput = None
try:
unitInput = LogfileBackupUnitInputDescription(configItem)
except Exception as configReadException:
print('LogfileBackupUnit: failed to use configuration ' \
'%s: %s' % (
repr(configItem), configReadException.args[0]),
file=sys.stderr)
continue
# Configuration parsing worked, start processing the inputs.
self.processInput(unitInput, sink)
finally:
if invocationAttemptedFlag:
try:
# Update the timestamp.
self.lastInvocationTime = int(time.time())
# Write back the new state information immediately after invocation
# to avoid data loss when program crashes immediately afterwards.
# Keep one old version of state file.
try:
os.unlink('state.old', dir_fd=self.persistencyDirFd)
except OSError as relinkError:
if relinkError.errno != errno.ENOENT:
raise
try:
os.link(
'state.current', 'state.old', src_dir_fd=self.persistencyDirFd,
dst_dir_fd=self.persistencyDirFd, follow_symlinks=False)
except OSError as relinkError:
if relinkError.errno != errno.ENOENT:
raise
try:
os.unlink('state.current', dir_fd=self.persistencyDirFd)
except OSError as relinkError:
if relinkError.errno != errno.ENOENT:
raise
handle = guerillabackup.secureOpenAt(
self.persistencyDirFd, 'state.current',
fileOpenFlags=os.O_WRONLY|os.O_CREAT|os.O_EXCL|os.O_NOFOLLOW|os.O_NOCTTY,
fileCreateMode=0o600)
writeResourceUuidMap = {}
for url, uuidData in self.resourceUuidMap.items():
writeResourceUuidMap[url] = str(base64.b64encode(uuidData), 'ascii')
os.write(
handle,
json.dumps([
self.lastInvocationTime,
writeResourceUuidMap]).encode('ascii'))
os.close(handle)
except Exception as stateSaveException:
# Writing of state information failed. Print out the state information
# for manual reconstruction as last resort.
print('Writing of state information failed: %s\nCurrent ' \
'state: %s' % (
str(stateSaveException),
repr([self.lastInvocationTime, self.resourceUuidMap])),
file=sys.stderr)
traceback.print_tb(sys.exc_info()[2])
raise
# Declare the main unit class so that the backup generator can
# instantiate it.
backupGeneratorUnitClass = LogfileBackupUnit
guerillabackup-0.5.0/src/lib/guerillabackup/OSProcessPipelineElement.py 0000664 0000000 0000000 00000033741 14501370353 0026300 0 ustar 00root root 0000000 0000000 """This module contains classes for creation of asynchronous
OS process based pipeline elements."""
import fcntl
import os
import subprocess
import guerillabackup
from guerillabackup.TransformationProcessOutputStream import NullProcessOutputStream
from guerillabackup.TransformationProcessOutputStream import TransformationProcessOutputStream
class OSProcessPipelineElement(
guerillabackup.TransformationPipelineElementInterface):
"""This is the interface to define data transformation pipeline
elements, e.g. for compression, encryption, signing. To really
start execution of a transformation pipeline, transformation
process instances have to be created for each pipe element."""
def __init__(self, executable, execArgs, allowedExitStatusList=None):
"""Create the OSProcessPipelineElement element.
@param allowedExitStatusList when not defined, only command
exit code of 0 is accepted to indicate normal termination."""
self.executable = executable
self.execArgs = execArgs
if not guerillabackup.isValueListOfType(self.execArgs, str):
raise Exception('execArgs have to be list of strings')
if allowedExitStatusList is None:
allowedExitStatusList = [0]
self.allowedExitStatusList = allowedExitStatusList
def getExecutionInstance(self, upstreamProcessOutput):
"""Get an execution instance for this transformation element.
@param upstreamProcessOutput this is the output of the upstream
process, that will be wired as input of the newly created
process instance."""
return OSProcessPipelineExecutionInstance(
self.executable, self.execArgs, upstreamProcessOutput,
self.allowedExitStatusList)
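# A hedged usage sketch (illustrative only, mirroring the gzip
# decompression element created in LogfileBackupUnit):
#   gzipElement = OSProcessPipelineElement('/bin/gzip', ['/bin/gzip', '-cd'])
#   instance = gzipElement.getExecutionInstance(upstreamOutput)
# The execution instance is then connected via setProcessOutputStream()
# or getProcessOutput() and driven together with the other pipeline
# instances, e.g. by guerillabackup.runTransformationPipeline().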
class OSProcessPipelineExecutionInstance(
guerillabackup.TransformationProcessInterface):
"""This class defines the execution instance of an OSProcessPipeline
element."""
STATE_NOT_STARTED = 0
STATE_RUNNING = 1
# This state reached when the process has already terminated but
# input/output shutdown is still pending.
STATE_SHUTDOWN = 2
STATE_ENDED = 3
def __init__(self, executable, execArgs, upstreamProcessOutput, allowedExitStatusList):
self.executable = executable
self.execArgs = execArgs
self.upstreamProcessOutput = upstreamProcessOutput
if self.upstreamProcessOutput is None:
# Avoid reading from real stdin, use replacement output.
self.upstreamProcessOutput = NullProcessOutputStream()
self.upstreamProcessOutputBuffer = b''
self.inputPipe = None
self.allowedExitStatusList = allowedExitStatusList
# Simple state tracking to be more consistent on multiple invocations
# of the same method. States are "not started", "running", "shutdown"
# and "ended".
self.processState = OSProcessPipelineExecutionInstance.STATE_NOT_STARTED
self.process = None
# Process output instance of this process only when no output
# file descriptor is set.
self.processOutput = None
# This exception holds any processing error until doProcess()
# or stop() is called.
self.processingException = None
def createProcess(self, outputFd):
"""Create the process.
@param outputFd if not None, use this as output stream descriptor."""
# Create the process file descriptor pairs manually. Otherwise
# it is not possible to wait() for the process first and continue
# to read from the other side of the pipe after garbage collection
# of the process object.
self.inputPipe = None
outputPipeFds = None
if outputFd is None:
outputPipeFds = os.pipe2(os.O_CLOEXEC)
outputFd = outputPipeFds[1]
if self.upstreamProcessOutput.getOutputStreamDescriptor() is None:
self.process = subprocess.Popen(
self.execArgs, executable=self.executable, stdin=subprocess.PIPE,
stdout=outputFd)
self.inputPipe = self.process.stdin
flags = fcntl.fcntl(self.inputPipe.fileno(), fcntl.F_GETFL)
fcntl.fcntl(self.inputPipe.fileno(), fcntl.F_SETFL, flags|os.O_NONBLOCK)
else:
self.process = subprocess.Popen(
self.execArgs, executable=self.executable,
stdin=self.upstreamProcessOutput.getOutputStreamDescriptor(),
stdout=outputFd)
self.processOutput = None
if outputPipeFds is not None:
# Close the write side now.
os.close(outputPipeFds[1])
self.processOutput = TransformationProcessOutputStream(
outputPipeFds[0])
def getProcessOutput(self):
"""Get the output connector of this transformation process."""
if self.processState != OSProcessPipelineExecutionInstance.STATE_NOT_STARTED:
raise Exception('Output manipulation only when not started yet')
if self.process is None:
self.createProcess(None)
if self.processOutput is None:
raise Exception('No access to process output in stream mode')
return self.processOutput
def setProcessOutputStream(self, processOutputStream):
"""Some processes may also support setting of an output stream
file descriptor. This is especially useful if the process
is the last one in a pipeline and hence could write directly
to a file or network descriptor.
@throw Exception if this process does not support setting
of output stream descriptors."""
if self.processState != OSProcessPipelineExecutionInstance.STATE_NOT_STARTED:
raise Exception('Output manipulation only when not started yet')
if self.process is not None:
raise Exception('No setting of output stream after previous ' \
'setting or call to getProcessOutput')
self.createProcess(processOutputStream)
def checkConnected(self):
"""Check if this process instance is already connected to
an output, e.g. via getProcessOutput or setProcessOutputStream."""
if (self.processState == OSProcessPipelineExecutionInstance.STATE_NOT_STARTED) and \
(self.process is None):
raise Exception('Operation mode not known while not fully connected')
# Process instance only created when connected, so everything OK.
def isAsynchronous(self):
"""A asynchronous process just needs to be started and will
perform data processing on streams without any further interaction
while running."""
self.checkConnected()
return self.inputPipe is None
def start(self):
"""Start this execution process."""
if self.processState != OSProcessPipelineExecutionInstance.STATE_NOT_STARTED:
raise Exception('Already started')
self.checkConnected()
# The process itself was already started when being connected.
# Just update the state here.
self.processState = OSProcessPipelineExecutionInstance.STATE_RUNNING
def stop(self):
"""Stop this execution process when still running.
@return None when the instance was already stopped, information
about stopping, e.g. the stop error message when the process
was really stopped."""
if self.processState == OSProcessPipelineExecutionInstance.STATE_NOT_STARTED:
raise Exception('Not started')
# We are already stopped, do nothing here.
if self.processState == OSProcessPipelineExecutionInstance.STATE_ENDED:
return None
# Clear any pending processing exceptions. This is the last chance
# for reporting anyway.
stopException = self.processingException
self.processingException = None
if self.processState == OSProcessPipelineExecutionInstance.STATE_RUNNING:
# The process was not stopped yet, do it. There is a small chance
# that we send a signal to a dead process before waiting on it
# and hence we would have a normal termination here. Ignore that
# case, a stop() indicates need for abnormal termination with
# risk of data loss.
self.process.kill()
self.process.wait()
self.process = None
self.processState = OSProcessPipelineExecutionInstance.STATE_SHUTDOWN
# Now we are in STATE_SHUTDOWN.
self.finishProcess()
# If there was no previous exception, copy any exception from
# finishing the process.
if stopException is None:
stopException = self.processingException
self.processingException = None
return stopException
def isRunning(self):
"""See if this process instance is still running.
@return False if instance was not yet started or already stopped.
If there are any unreported pending errors from execution,
this method will return True until doProcess() or stop() is
called at least once."""
if self.processState in [
OSProcessPipelineExecutionInstance.STATE_NOT_STARTED,
OSProcessPipelineExecutionInstance.STATE_ENDED]:
return False
if self.processingException is not None:
# There is a pending exception, which is cleared only in doProcess()
# or stop(), so pretend that the process is still running.
return True
if self.processState == OSProcessPipelineExecutionInstance.STATE_RUNNING:
(pid, status) = os.waitpid(self.process.pid, os.WNOHANG)
if pid == 0:
return True
self.process = None
self.processState = OSProcessPipelineExecutionInstance.STATE_SHUTDOWN
if (status&0xff) != 0:
self.processingException = Exception('Process end by signal %d, ' \
'status 0x%x' % (status&0xff, status))
elif (status>>8) not in self.allowedExitStatusList:
self.processingException = Exception('Process end with unexpected ' \
'exit status %d' % (status>>8))
# We are in shutdown here. See if we can finish that phase immediately.
if self.processingException is None:
try:
self.upstreamProcessOutput.close()
self.processState = OSProcessPipelineExecutionInstance.STATE_ENDED
except Exception as closeException:
self.processingException = closeException
# Pretend that we are still running so that pending exception
# is reported with next doProcess() call.
return self.processState != OSProcessPipelineExecutionInstance.STATE_ENDED
def doProcess(self):
"""This method triggers the data transformation operation
of this component. For components in synchronous mode, the
method will attempt to move data from input to output. Asynchronous
components will just check the processing status and may raise
an exception, when processing terminated with errors. As such
a component might not be able to detect the amount of data
really moved since last invocation, the component may report
a fake single byte move.
@throws Exception if an uncorrectable transformation state
was reached and transformation cannot proceed, even though
end of input data was not yet seen. Raise exception also when
process was not started or already stopped.
@return the number of bytes read or written or at least a
value greater than zero if any data was processed. A value of zero
indicates that currently data processing was not possible
due to filled buffers but should be attempted again. A value
below zero indicates that all input data was processed and
output buffers were flushed already."""
if self.processState == OSProcessPipelineExecutionInstance.STATE_NOT_STARTED:
raise Exception('Not started')
if self.processState == OSProcessPipelineExecutionInstance.STATE_ENDED:
# This must be a logic error attempting to process data when
# already stopped.
raise Exception('Process %s already stopped' % self.executable)
if self.processingException is not None:
processingException = self.processingException
self.processingException = None
# We are dead here anyway, close inputs and outputs ignoring any
# data possibly lost.
self.finishProcess()
self.processState = OSProcessPipelineExecutionInstance.STATE_ENDED
raise processingException
if self.inputPipe is not None:
if len(self.upstreamProcessOutputBuffer) == 0:
self.upstreamProcessOutputBuffer = self.upstreamProcessOutput.readData(1<<16)
if self.upstreamProcessOutputBuffer is None:
self.inputPipe.close()
self.inputPipe = None
if ((self.upstreamProcessOutputBuffer is not None) and
(len(self.upstreamProcessOutputBuffer) != 0)):
writeLength = self.inputPipe.write(self.upstreamProcessOutputBuffer)
if writeLength == len(self.upstreamProcessOutputBuffer):
self.upstreamProcessOutputBuffer = b''
else:
self.upstreamProcessOutputBuffer = self.upstreamProcessOutputBuffer[writeLength:]
return writeLength
if self.isRunning():
# Pretend that we are still waiting for more input, thus polling
# may continue when at least another component moved data.
return 0
# All pipes are empty and no more processing is possible.
return -1
def getBlockingStreams(self, readStreamList, writeStreamList):
"""Collect the file descriptors that are currently blocking
this synchronous component."""
# The upstream input can be ignored when really a file descriptor,
# it is wired to this process for asynchronous use anyway. When
# not a file descriptor, writing to the input pipe may block.
if self.inputPipe is not None:
writeStreamList.append(self.inputPipe.fileno())
def finishProcess(self):
"""This method cleans up the current process after operating
system process termination but maybe before handling of pending
exceptions. The method will set the processingException when
finishProcess caused any errors. An error is also when this
method is called while upstream did not close the upstream
output stream yet."""
readData = self.upstreamProcessOutput.readData(64)
if readData is not None:
if len(readData) == 0:
self.processingException = Exception('Upstream did not finish yet, data might be lost')
else:
self.processingException = Exception('Not all upstream data processed')
# Upstream is delivering data that was not processed. Close the
# output so that upstream will also notice when attempting to
# write to will receive an exception.
self.upstreamProcessOutput.close()
self.upstreamProcessOutput = None
# Only report lost data for a non-empty buffer; always clear it.
if ((self.upstreamProcessOutputBuffer is not None) and
(len(self.upstreamProcessOutputBuffer) != 0)):
self.processingException = Exception(
'Output buffers to process not drained yet, %d bytes lost' %
len(self.upstreamProcessOutputBuffer))
self.upstreamProcessOutputBuffer = None
self.process = None
guerillabackup-0.5.0/src/lib/guerillabackup/TarBackupUnit.py 0000664 0000000 0000000 00000077745 14501370353 0024150 0 ustar 00root root 0000000 0000000 """This unit provides support for full and incremental tar backups.
All tar backups within the same unit are created sequentially,
starting with the one most overdue.
For incremental backups, backup indices are kept in the persistency
storage of this backup unit. Those files might be compressed by
the backup unit and are kept for a limited timespan. An external
process might also remove them without causing damage."""
import base64
import datetime
import errno
import hashlib
import json
import os
import subprocess
import sys
import time
import traceback
import guerillabackup
from guerillabackup.BackupElementMetainfo import BackupElementMetainfo
# This is the key to the list of tar backups to schedule and perform
# using the given unit. Each entry has to be a dictionary containing
# entries to create TarBackupUnitDescription objects. See there
# for more information.
CONFIG_LIST_KEY = 'TarBackupUnitConfigList'
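# A hypothetical example value for this parameter (source URL, paths and
# timing values are illustrative only); keys are source URLs, values are
# parameter dictionaries as described in TarBackupUnitDescription below:
#   {'/rootfs': {
#       'Root': '/', 'Include': ['.'],
#       'Exclude': ['./var/lib/guerillabackup/data'],
#       'FullBackupTiming': [6*86400, 8*86400, 7*86400, 0],
#       'IncBackupTiming': [82800, 90000, 86400, 0]}}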
class TarBackupUnitDescription:
"""This class collects all the information about a single tar
backup to be scheduled by a unit."""
def __init__(self, sourceUrl, descriptionDict):
"""Initialize a single tar backup description using values
from a dictionary, e.g. extracted directly from the CONFIG_LIST_KEY
parameter.
@param sourceUrl the URL which will identify the backups from
this subunit. It is used also to store persistency information
for that unit.
@param descriptionDict dictionary containing the parameters
for a single tar backup task.
* PreBackupCommand: execute this command given as list of
arguments before starting the backup, e.g. create a filesystem
or virtual machine snapshot, perform cleanup.
* PostBackupCommand: execute this command after starting the
backup.
* Root: root directory of tar backup, "/" when missing.
* Include: list of paths to include, ["."] when missing.
* Exclude: list of patterns to exclude from backup (see tar
documentation "--exclude"). When missing and Root is "/",
list ["./var/lib/guerillabackup/data"] is used.
* IgnoreBackupRaces: flag to indicate if races during backup
are acceptable, e.g. because the directories are modified,
files changed or removed. When set, such races will not
result in non-zero exit status. Off course, it would be
more sensible to deploy a snapshot based backup variant
using the Pre/PostBackupCommand functions.
* FullBackupTiming: tuple with minimum and maximum interval
between full backup invocations and modulo base and offset,
all in seconds. Without modulo invocation (both modulo values None),
full backups will run as soon as minimum interval is exceeded.
With modulo timing, modulo trigger is ignored when below
minimum time. When gap above maximum interval, immediate
backup is started.
* IncBackupTiming: When set, incremental backups are created
to fill the time between full backups. Timings are specified
as tuple with same meaning as in FullBackupTiming parameter.
This will also trigger generation of tar file indices when
running full backups.
* FullOverrideCommand: when set, parameters Exclude, Include,
Root are ignored and exactly the given command is executed.
* IncOverrideCommand: when set, parameters Exclude, Include,
Root are ignored and exactly the given command is executed.
* KeepIndices: number of old incremental tar backup indices
to keep. With -1 keep all, otherwise keep only the given
number. Default is 0.
* Policy: If not None, include this string as handling policy
within the manifest.
* EncryptionKey: If not None, encrypt the input using the
named key. Otherwise default encryption key from global
configuration might be used."""
if not isinstance(sourceUrl, str):
raise Exception('Source URL has to be string')
guerillabackup.assertSourceUrlSpecificationConforming(sourceUrl)
self.sourceUrl = sourceUrl
if not isinstance(descriptionDict, dict):
raise Exception('Input description has to be dictionary')
self.preBackupCommandList = None
self.postBackupCommandList = None
self.backupRoot = None
self.backupIncludeList = None
self.backupExcludeList = None
self.ignoreBackupRacesFlag = False
self.fullBackupTiming = None
self.incBackupTiming = None
self.fullBackupOverrideCommand = None
self.incBackupOverrideCommand = None
self.handlingPolicyName = None
self.encryptionKeyName = None
self.keepOldIndicesCount = 0
for configKey, configValue in descriptionDict.items():
if ((configKey == 'PreBackupCommand') or
(configKey == 'PostBackupCommand') or
(configKey == 'FullOverrideCommand') or
(configKey == 'IncOverrideCommand')):
if not guerillabackup.isValueListOfType(configValue, str):
raise Exception('Parameter %s has to be list of string' % configKey)
if configKey == 'PreBackupCommand':
self.preBackupCommandList = configValue
elif configKey == 'PostBackupCommand':
self.postBackupCommandList = configValue
elif configKey == 'FullOverrideCommand':
self.fullBackupOverrideCommand = configValue
elif configKey == 'IncOverrideCommand':
self.incBackupOverrideCommand = configValue
else:
raise Exception('Logic error')
elif configKey == 'Root':
self.backupRoot = configValue
elif configKey == 'Include':
if not (isinstance(configValue, list) or
isinstance(configValue, tuple)):
raise Exception(
'Parameter %s has to be list or tuple' % configKey)
self.backupIncludeList = configValue
elif configKey == 'Exclude':
if not (isinstance(configValue, list) or
isinstance(configValue, tuple)):
raise Exception(
'Parameter %s has to be list or tuple' % configKey)
self.backupExcludeList = configValue
elif configKey == 'IgnoreBackupRaces':
self.ignoreBackupRacesFlag = configValue
elif ((configKey == 'FullBackupTiming') or
(configKey == 'IncBackupTiming')):
if (not isinstance(configValue, list)) or (len(configValue) != 4):
raise Exception(
'Parameter %s has to be list with 4 values' % configKey)
if configValue[0] is None:
raise Exception('Parameter %s minimum interval value must not be None' % configKey)
for timeValue in configValue:
if (timeValue != None) and (not isinstance(timeValue, int)):
raise Exception(
'Parameter %s contains non-number element' % configKey)
if configValue[2] != None:
if ((configValue[2] <= 0) or (configValue[3] < 0) or
(configValue[3] >= configValue[2])):
raise Exception(
'Parameter %s modulo timing values invalid' % configKey)
if configKey == 'FullBackupTiming':
self.fullBackupTiming = configValue
else: self.incBackupTiming = configValue
elif configKey == 'KeepIndices':
if not isinstance(configValue, int):
raise Exception('KeepIndices has to be integer value')
self.keepOldIndicesCount = configValue
elif configKey == 'Policy':
self.handlingPolicyName = configValue
elif configKey == 'EncryptionKey':
self.encryptionKeyName = configValue
else:
raise Exception('Unsupported parameter %s' % configKey)
if self.fullBackupTiming is None:
raise Exception('Mandatory FullBackupTiming parameter missing')
# The remaining values are not from the unit configuration but
# unit state persistency instead.
self.lastFullBackupTime = None
self.lastAnyBackupTime = None
self.lastUuidValue = None
# When not None, delay execution of any backup for that resource
# beyond the given time. This is intended for backups that failed
# to run to avoid invocation loops. This value is not persistent
# between multiple software invocations.
self.nextRetryTime = None
def getNextInvocationInfo(self, currentTime):
"""Get the next invocation time for this unit description.
@return a tuple with the the number of seconds till invocation
or a negative value if invocation is already overdue and the
type of backup to generate, full or incremental (inc)."""
# See if backup generation is currently blocked.
if self.nextRetryTime != None:
if currentTime < self.nextRetryTime:
return (self.nextRetryTime-currentTime, None)
self.nextRetryTime = None
if self.lastFullBackupTime is None:
return (-self.fullBackupTiming[1], 'full')
# Use this as default value. At invocation time, unit may still
# decide if backup generation is really necessary.
lastOffset = currentTime-self.lastFullBackupTime
result = (self.fullBackupTiming[1]-lastOffset, 'full')
if self.fullBackupTiming[2] != None:
# This is the delta to the previous (negative) or next (positive)
# preferred timepoint.
delta = self.fullBackupTiming[3]-(currentTime%self.fullBackupTiming[2])
if delta < 0:
# Add default modulo value if preferred backup timepoint is in
# the past.
delta += self.fullBackupTiming[2]
if delta+lastOffset < self.fullBackupTiming[0]:
# If time from last backup to next preferred time is below minimum,
# then add again the default modulo value. Most likely this will
# increase the backup time to be above the maximum backup interval.
delta += self.fullBackupTiming[2]
if delta+lastOffset > self.fullBackupTiming[1]:
# If time from last backup to next preferred time is above maximum,
# use the maximum interval to calculate a new delta.
delta = self.fullBackupTiming[1]-lastOffset
if delta < result[0]:
result = (delta, 'full')
# When a full backup is overdue, report it. Do not care if there
# are incremental backups more overdue.
if result[0] <= 0:
return result
if self.incBackupTiming != None:
lastOffset = currentTime-self.lastAnyBackupTime
if self.incBackupTiming[2] is None:
# Normal minimum, maximum timing mode.
delta = self.incBackupTiming[0]-lastOffset
if delta < result[0]:
result = (delta, 'inc')
else:
delta = self.incBackupTiming[3]-(currentTime%self.incBackupTiming[2])
if delta < 0:
delta += self.incBackupTiming[2]
if delta+lastOffset < self.incBackupTiming[0]:
delta += self.incBackupTiming[2]
if delta+lastOffset > self.incBackupTiming[1]:
delta = self.incBackupTiming[1]-lastOffset
if delta < result[0]:
result = (delta, 'inc')
return result
def getBackupCommand(self, backupType, indexPathname):
"""Get the command to execute to create the backup.
@param backupType use this mode to create the backup.
@param indexPathname path to the index file name, None when
backup without indexing is requested."""
if (backupType == 'full') and (self.fullBackupOverrideCommand != None):
return self.fullBackupOverrideCommand
if (backupType == 'inc') and (self.incBackupOverrideCommand != None):
return self.incBackupOverrideCommand
backupCommand = ['tar']
if self.ignoreBackupRacesFlag:
backupCommand.append('--ignore-failed-read')
backupCommand.append('-C')
backupCommand.append(self.backupRoot)
if self.incBackupTiming != None:
backupCommand.append('--listed-incremental')
backupCommand.append(indexPathname)
if self.backupExcludeList != None:
for excludePattern in self.backupExcludeList:
backupCommand.append('--exclude=%s' % excludePattern)
backupCommand.append('-c')
backupCommand.append('--')
backupCommand += self.backupIncludeList
return backupCommand
def getJsonData(self):
"""Return the state of this object in a format suitable for
JSON serialization."""
return [
self.lastFullBackupTime, self.lastAnyBackupTime,
str(base64.b64encode(self.lastUuidValue), 'ascii')]
class TarBackupUnit(guerillabackup.SchedulableGeneratorUnitInterface):
"""This class allows to schedule regular runs of tar backups.
The unit supports different scheduling modes: modulo timing
and maximum gap timing. The time stamps considered are always
those from starting the tar backup, not end time.
The unit will keep track of the last UUID reported for each
resource and generate a new one for each produced file using
json-serialized state data. The state data is a dictionary with
the resource name as key to refer to a tuple of values: the
timestamp of the last full backup run as seconds since 1970, the
timestamp of the last backup run of any type and the UUID value
of the last run."""
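# The persisted state in "state.current" is a JSON dictionary keyed by
# source URL; a hypothetical entry in the format produced by
# TarBackupUnitDescription.getJsonData() could look like:
#   {"/rootfs": [1700000000, 1700086400, "c29tZS11dWlkLWRhdGE="]}
# holding the last full backup timestamp, the timestamp of the last
# backup of any type and the base64 encoded UUID of the last element.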
def __init__(self, unitName, configContext):
"""Initialize this unit using the given configuration."""
self.unitName = unitName
self.configContext = configContext
self.testModeFlag = configContext.get(guerillabackup.CONFIG_GENERAL_DEBUG_TEST_MODE_KEY, False)
if not isinstance(self.testModeFlag, bool):
raise Exception('Configuration parameter %s has to be ' \
'boolean' % guerillabackup.CONFIG_GENERAL_DEBUG_TEST_MODE_KEY)
backupConfigList = configContext.get(CONFIG_LIST_KEY, None)
if (backupConfigList is None) or (not isinstance(backupConfigList, dict)):
raise Exception('Configuration parameter %s missing or of wrong type' % CONFIG_LIST_KEY)
self.backupUnitDescriptions = {}
for sourceUrl, configDef in backupConfigList.items():
self.backupUnitDescriptions[sourceUrl] = TarBackupUnitDescription(
sourceUrl, configDef)
# Start loading the persistency information.
persistencyDirFd = None
persistencyFileHandle = None
stateData = None
try:
persistencyDirFd = guerillabackup.openPersistencyFile(
configContext, os.path.join('generators', self.unitName),
os.O_DIRECTORY|os.O_RDONLY|os.O_CREAT|os.O_EXCL|os.O_NOFOLLOW|os.O_NOCTTY, 0o700)
try:
persistencyFileHandle = guerillabackup.secureOpenAt(
persistencyDirFd, 'state.current',
fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY)
except OSError as openError:
if openError.errno != errno.ENOENT:
raise
# See if the state.previous file exists, if yes, the unit is likely
# to be broken. Refuse to do anything while in this state.
try:
os.stat(
'state.previous', dir_fd=persistencyDirFd, follow_symlinks=False)
raise Exception(
'Persistency data inconsistencies: found stale previous state file')
except OSError as statError:
if statError.errno != errno.ENOENT:
raise
# So there is only the current state file, if any.
if persistencyFileHandle != None:
stateData = b''
while True:
data = os.read(persistencyFileHandle, 1<<20)
if len(data) == 0:
break
stateData += data
os.close(persistencyFileHandle)
persistencyFileHandle = None
finally:
if persistencyFileHandle != None:
os.close(persistencyFileHandle)
if persistencyDirFd != None:
os.close(persistencyDirFd)
# Start mangling of data after closing all file handles.
if stateData is None:
print('%s: first time activation, no persistency data found' % self.unitName, file=sys.stderr)
else:
stateInfo = json.loads(str(stateData, 'ascii'))
if not isinstance(stateInfo, dict):
raise Exception('Persistency data structure mismatch')
for url, stateData in stateInfo.items():
description = self.backupUnitDescriptions.get(url, None)
if description is None:
# Ignore this state, user might have removed a single tar backup
# configuration without deleting the UUID and timing data.
print('No tar backup configuration for %s resource state data %s' % (
url, repr(stateData)), file=sys.stderr)
continue
description.lastFullBackupTime = stateData[0]
description.lastAnyBackupTime = stateData[1]
# The UUID is kept internally as binary data string. Only for
# persistency, data will be base64 encoded.
description.lastUuidValue = base64.b64decode(stateData[2])
def findNextInvocationUnit(self):
"""Find the next unit to invoke.
@return a tuple containing the seconds till next invocation
and the corresponding TarBackupUnitDescription. Next invocation
time might be negative if unit invocation is already overdue."""
currentTime = int(time.time())
nextInvocationTime = None
nextDescription = None
for url, description in self.backupUnitDescriptions.items():
info = description.getNextInvocationInfo(currentTime)
if (nextInvocationTime is None) or (info[0] < nextInvocationTime):
nextInvocationTime = info[0]
nextDescription = description
if nextInvocationTime is None:
return None
return (nextInvocationTime, nextDescription)
def getNextInvocationTime(self):
"""Get the time in seconds until this unit should called again.
If a unit does not know (yet) as invocation needs depend on
external events, it should report a reasonable low value to
be queried again soon.
@return 0 if the unit should be invoked immediately, the seconds
to go otherwise."""
nextUnitInfo = self.findNextInvocationUnit()
if nextUnitInfo is None:
return 3600
if nextUnitInfo[0] < 0:
return 0
return nextUnitInfo[0]
def processInput(self, tarUnitDescription, sink, persistencyDirFd):
"""Process a single input description by creating the tar
stream and updating the indices, if any. When successful,
persistency information about this subunit is also updated."""
# Keep time of invocation check and start of backup procedure
# also for updating the unit data.
currentTime = int(time.time())
(invocationTime, backupType) = tarUnitDescription.getNextInvocationInfo(
currentTime)
indexFilenamePrefix = None
indexFilePathname = None
nextIndexFileName = None
if tarUnitDescription.incBackupTiming != None:
# We will have to create an index, open the index directory at
# first.
indexFilenamePrefix = tarUnitDescription.sourceUrl[1:].replace('/', '-')
# Make sure the filename cannot get longer than 256 bytes, even
# with ".index(.bz2).yyyymmddhhmmss" (25 chars) appended.
if len(indexFilenamePrefix) > 231:
indexFilenamePrefix = indexFilenamePrefix[:231]
# Create the new index file.
nextIndexFileName = '%s.index.next' % indexFilenamePrefix
nextIndexFileHandle = guerillabackup.secureOpenAt(
persistencyDirFd, nextIndexFileName,
fileOpenFlags=os.O_WRONLY|os.O_CREAT|os.O_EXCL|os.O_NOFOLLOW|os.O_NOCTTY,
fileCreateMode=0o600)
indexFilePathname = os.path.join(
guerillabackup.getPersistencyBaseDirPathname(self.configContext),
'generators', self.unitName, nextIndexFileName)
if backupType == 'inc':
# See if there is an old index. When missing, change the mode
# to "full".
indexStatResult = None
try:
indexStatResult = os.stat(
'%s.index' % indexFilenamePrefix, dir_fd=persistencyDirFd,
follow_symlinks=False)
except OSError as statError:
if statError.errno != errno.ENOENT:
raise
if indexStatResult is None:
backupType = 'full'
else:
# Copy content from current index to new one.
currentIndexFileHandle = guerillabackup.secureOpenAt(
persistencyDirFd, '%s.index' % indexFilenamePrefix,
fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY)
while True:
data = os.read(currentIndexFileHandle, 1<<20)
if len(data) == 0:
break
os.write(nextIndexFileHandle, data)
os.close(currentIndexFileHandle)
os.close(nextIndexFileHandle)
# Everything is prepared for backup, start it.
if tarUnitDescription.preBackupCommandList != None:
if self.testModeFlag:
print('No invocation of PreBackupCommand in test mode', file=sys.stderr)
else:
process = subprocess.Popen(tarUnitDescription.preBackupCommandList)
returnCode = process.wait()
if returnCode != 0:
raise Exception('Pre backup command %s failed in %s, source %s' % (
repr(tarUnitDescription.preBackupCommandList)[1:-1],
self.unitName, tarUnitDescription.sourceUrl))
# Start the unit itself.
backupCommand = tarUnitDescription.getBackupCommand(
backupType, indexFilePathname)
# Accept creation of tar archives only with zero exit status or
# return code 1, when files were concurrently modified and those
# races should be ignored.
allowedExitStatusList = [0]
if tarUnitDescription.ignoreBackupRacesFlag:
allowedExitStatusList.append(1)
completePipleline = [guerillabackup.OSProcessPipelineElement(
'/bin/tar', backupCommand, allowedExitStatusList)]
# Get the downstream transformation pipeline elements.
completePipleline += guerillabackup.getDefaultDownstreamPipeline(
self.configContext, tarUnitDescription.encryptionKeyName)
# Build the transformation pipeline instance.
sinkHandle = sink.getSinkHandle(tarUnitDescription.sourceUrl)
sinkStream = sinkHandle.getSinkStream()
# Get the list of started pipeline instances.
pipelineInstances = guerillabackup.instantiateTransformationPipeline(
completePipleline, None, sinkStream, doStartFlag=True)
try:
guerillabackup.runTransformationPipeline(pipelineInstances)
except:
# Just cleanup the incomplete index file when incremental mode
# was requested.
if not nextIndexFileName is None:
os.unlink(nextIndexFileName, dir_fd=persistencyDirFd)
raise
digestData = pipelineInstances[-1].getDigestData()
metaInfoDict = {}
metaInfoDict['BackupType'] = backupType
if tarUnitDescription.handlingPolicyName != None:
metaInfoDict['HandlingPolicy'] = [tarUnitDescription.handlingPolicyName]
lastUuid = tarUnitDescription.lastUuidValue
currentUuidDigest = hashlib.sha512()
if lastUuid != None:
metaInfoDict['Predecessor'] = lastUuid
currentUuidDigest.update(lastUuid)
# Add the compressed file digest to make UUID different for different
# content.
currentUuidDigest.update(digestData)
# Also include the timestamp and source URL in the UUID calculation
# to make UUID different for backup of identical data at (nearly)
# same time.
currentUuidDigest.update(bytes('%d %s' % (
currentTime, tarUnitDescription.sourceUrl), sys.getdefaultencoding()))
currentUuid = currentUuidDigest.digest()
metaInfoDict['DataUuid'] = currentUuid
metaInfoDict['StorageFileChecksumSha512'] = digestData
metaInfoDict['Timestamp'] = currentTime
metaInfo = BackupElementMetainfo(metaInfoDict)
sinkHandle.close(metaInfo)
if self.testModeFlag:
raise Exception('No completion of tar backup in test mode')
if tarUnitDescription.postBackupCommandList != None:
process = subprocess.Popen(tarUnitDescription.postBackupCommandList)
returnCode = process.wait()
if returnCode != 0:
# Still raise an exception and thus prohibit completion of this
# tar backup. The PostBackupCommand itself cannot influence the
# backup created before, but the failure might indicate that the
# corresponding PreBackupCommand was problematic. Thus let the
# user resolve the problem manually.
raise Exception('Post backup command %s failed in %s, source %s' % (
repr(tarUnitDescription.postBackupCommandList)[1:-1],
self.unitName, tarUnitDescription.sourceUrl))
if tarUnitDescription.incBackupTiming != None:
# See if there is an old index to compress and move, but only
# if it should be really kept. Currently fstatat function is not
# available, so use open/fstat instead.
currentIndexFd = None
currentIndexName = '%s.index' % indexFilenamePrefix
try:
currentIndexFd = guerillabackup.secureOpenAt(
persistencyDirFd, currentIndexName,
fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY)
except OSError as indexOpenError:
if indexOpenError.errno != errno.ENOENT:
raise
targetFileName = None
if currentIndexFd != None:
if tarUnitDescription.keepOldIndicesCount == 0:
os.close(currentIndexFd)
os.unlink(currentIndexName, dir_fd=persistencyDirFd)
else:
statData = os.fstat(currentIndexFd)
targetFileTime = int(statData.st_mtime)
targetFileHandle = None
while True:
date = datetime.datetime.fromtimestamp(targetFileTime)
dateStr = date.strftime('%Y%m%d%H%M%S')
targetFileName = '%s.index.bz2.%s' % (indexFilenamePrefix, dateStr)
try:
targetFileHandle = guerillabackup.secureOpenAt(
persistencyDirFd, targetFileName,
fileOpenFlags=os.O_WRONLY|os.O_CREAT|os.O_EXCL|os.O_NOFOLLOW|os.O_NOCTTY,
fileCreateMode=0o600)
break
except OSError as indexBackupOpenError:
if indexBackupOpenError.errno != errno.EEXIST:
raise
targetFileTime += 1
# Now both handles are valid, use external bzip2 binary to perform
# compression.
process = subprocess.Popen(
['/bin/bzip2', '-c9'], stdin=currentIndexFd,
stdout=targetFileHandle)
returnCode = process.wait()
if returnCode != 0:
raise Exception('Failed to compress the old index: %s' % returnCode)
os.close(currentIndexFd)
# FIXME: we should use utime with targetFileHandle as pathlike
# object, only available in Python3.6 and later.
os.utime(
'/proc/self/fd/%d' % targetFileHandle,
(statData.st_mtime, statData.st_mtime))
os.close(targetFileHandle)
os.unlink(currentIndexName, dir_fd=persistencyDirFd)
# Now previous index was compressed or deleted, link the next
# index to the current position.
os.link(
nextIndexFileName, currentIndexName, src_dir_fd=persistencyDirFd,
dst_dir_fd=persistencyDirFd, follow_symlinks=False)
os.unlink(nextIndexFileName, dir_fd=persistencyDirFd)
if tarUnitDescription.keepOldIndicesCount != -1:
# So we should apply limits to the number of index backups.
fileList = []
searchPrefix = '%s.index.bz2.' % indexFilenamePrefix
searchLength = len(searchPrefix)+14
for fileName in guerillabackup.listDirAt(persistencyDirFd):
if ((len(fileName) != searchLength) or
(not fileName.startswith(searchPrefix))):
continue
fileList.append(fileName)
fileList.sort()
if len(fileList) > tarUnitDescription.keepOldIndicesCount:
# Make sure that the new index file was sorted last. When not,
# the current state could indicate clock/time problems on the
# machine. Refuse to process the indices and issue a warning.
indexBackupPos = fileList.index(targetFileName)
if indexBackupPos+1 != len(fileList):
raise Exception('Sorting of old backup indices inconsistent, refusing cleanup')
for fileName in fileList[:-tarUnitDescription.keepOldIndicesCount]:
os.unlink(fileName, dir_fd=persistencyDirFd)
# Update the UUID map as last step: if any of the steps above
# would fail, currentUuid generated in next run will be identical
# to this. Sorting out the duplicates will be easy.
tarUnitDescription.lastUuidValue = currentUuid
# Update the timestamp.
tarUnitDescription.lastAnyBackupTime = currentTime
if backupType == 'full':
tarUnitDescription.lastFullBackupTime = currentTime
# Write the new persistency data before returning.
self.updateStateData(persistencyDirFd)
def invokeUnit(self, sink):
"""Invoke this unit to create backup elements and pass them
on to the sink. Even when indicated via getNextInvocationTime,
the unit may decide, that it is not yet ready and not write
any element to the sink.
@return None if currently there is nothing to write to the
sink, a number of seconds to retry invocation if the unit
assumes, that there is data to be processed but processing
cannot start yet, e.g. due to locks held by other parties
or resource, e.g. network storages, currently not available.
@throw Exception if the unit internal logic failed in any
uncorrectable ways. Even when invoker decides to continue
processing, it must not reinvoke this unit before complete
reload."""
persistencyDirFd = None
try:
while True:
nextUnitInfo = self.findNextInvocationUnit()
if nextUnitInfo is None:
return None
if nextUnitInfo[0] > 0:
return nextUnitInfo[0]
if persistencyDirFd is None:
# Not opened yet, do it now.
persistencyDirFd = guerillabackup.openPersistencyFile(
self.configContext, os.path.join('generators', self.unitName),
os.O_DIRECTORY|os.O_RDONLY|os.O_CREAT|os.O_EXCL|os.O_NOFOLLOW|os.O_NOCTTY, 0o600)
# Try to process the current tar backup unit. There should be
# no state change to persist or cleanup, just let any exception
# be passed on to caller.
try:
self.processInput(nextUnitInfo[1], sink, persistencyDirFd)
except Exception as processException:
print('%s: Error processing tar %s, disabling it temporarily\n%s' % (
self.unitName,
repr(nextUnitInfo[1].sourceUrl), processException),
file=sys.stderr)
traceback.print_tb(sys.exc_info()[2])
nextUnitInfo[1].nextRetryTime = time.time()+3600
finally:
if persistencyDirFd != None:
try:
os.close(persistencyDirFd)
persistencyDirFd = None
except Exception as closeException:
print('FATAL: Internal Error: failed to close persistency ' \
'directory handle %d: %s' % (
persistencyDirFd, str(closeException)),
file=sys.stderr)
def updateStateData(self, persistencyDirFd):
"""Replace the current state data file with one containing
the current unit internal state.
@throw Exception if writing fails for any reason. The unit
will be in an uncorrectable state afterwards."""
# Create the data structures for writing.
stateData = {}
for sourceUrl, description in self.backupUnitDescriptions.items():
stateData[sourceUrl] = description.getJsonData()
writeData = bytes(json.dumps(stateData), 'ascii')
# Try to replace the current state file. At first unlink the old
# one.
try:
os.unlink('state.old', dir_fd=persistencyDirFd)
except OSError as unlinkError:
if unlinkError.errno != errno.ENOENT:
raise
# Link the current to the old one.
try:
os.link(
'state.current', 'state.old', src_dir_fd=persistencyDirFd,
dst_dir_fd=persistencyDirFd, follow_symlinks=False)
except OSError as relinkError:
if relinkError.errno != errno.ENOENT:
raise
# Unlink the current state. Thus we can then use O_EXCL on create.
try:
os.unlink('state.current', dir_fd=persistencyDirFd)
except OSError as relinkError:
if relinkError.errno != errno.ENOENT:
raise
# Create the new file.
fileHandle = None
try:
fileHandle = guerillabackup.secureOpenAt(
persistencyDirFd, 'state.current',
fileOpenFlags=os.O_WRONLY|os.O_CREAT|os.O_EXCL|os.O_NOFOLLOW|os.O_NOCTTY,
fileCreateMode=0o600)
os.write(fileHandle, writeData)
# Also close handle within try, except block to catch also delayed
# errors after write.
os.close(fileHandle)
fileHandle = None
except Exception as stateSaveException:
# Writing of state information failed. Print out the state information
# for manual reconstruction as last resort.
print('Writing of state information failed: %s\nCurrent state: ' \
'%s' % (str(stateSaveException), repr(writeData)), file=sys.stderr)
traceback.print_tb(sys.exc_info()[2])
raise
finally:
if fileHandle != None:
os.close(fileHandle)
# Declare the main unit class so that the backup generator can
# instantiate it.
backupGeneratorUnitClass = TarBackupUnit
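# Illustrative sketch, not part of the original unit: shows how a generator
# might drive this unit via the scheduling methods defined above. "sink" and
# "configContext" are assumed to be supplied by the caller; the unit name
# used here is hypothetical.
def _sketchRunTarUnitOnce(configContext, sink):
  """Instantiate the unit and invoke it once if it reports being due.
  @return the number of seconds until the next invocation attempt."""
  unit = backupGeneratorUnitClass('TarBackupUnit', configContext)
  if unit.getNextInvocationTime() == 0:
    # invokeUnit() returns None when nothing was written or a retry
    # delay in seconds, see its docstring above.
    retryTime = unit.invokeUnit(sink)
    if retryTime is not None:
      return retryTime
  return unit.getNextInvocationTime()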
guerillabackup-0.5.0/src/lib/guerillabackup/Transfer.py 0000664 0000000 0000000 00000155577 14501370353 0023220 0 ustar 00root root 0000000 0000000 """This module contains a collection of interfaces and classes
for agent-based transfer and synchronization."""
import errno
import json
import os
import select
import socket
import struct
import sys
import threading
import time
import guerillabackup
from guerillabackup.BackupElementMetainfo import BackupElementMetainfo
class TransferContext():
"""This class stores all information about a remote TransferAgent
while it is attached to this TransferAgent. It is the responsibility
of the class creating a new context to authenticate the remote
side and to assign the correct agent id if needed.
@param localStorage local storage to be used by policies for data
retrieval and storage."""
def __init__(
self, agentId, receiverTransferPolicy, senderTransferPolicy,
localStorage):
self.agentId = agentId
self.receiverTransferPolicy = receiverTransferPolicy
self.senderTransferPolicy = senderTransferPolicy
self.localStorage = localStorage
self.clientProtocolAdapter = None
self.serverProtocolAdapter = None
self.shutdownOfferedFlag = False
self.shutdownAcceptedFlag = False
def connect(self, clientProtocolAdapter, serverProtocolAdapter):
"""Connect this context with the client and server adapters."""
self.clientProtocolAdapter = clientProtocolAdapter
self.serverProtocolAdapter = serverProtocolAdapter
def offerShutdown(self):
"""Offer protocol shutdown to the other side."""
if self.clientProtocolAdapter is None:
raise Exception('Cannot offer shutdown while not connected')
if self.shutdownOfferedFlag:
raise Exception('Shutdown already offered')
self.clientProtocolAdapter.offerShutdown()
self.shutdownOfferedFlag = True
def waitForShutdown(self):
"""Wait for the remote side to offer a shutdown and accept
it."""
if self.shutdownAcceptedFlag:
return
self.clientProtocolAdapter.waitForShutdown()
self.shutdownAcceptedFlag = True
def isShutdownAccepted(self):
"""Check if we already accepted a remote shutdown offer."""
return self.shutdownAcceptedFlag
class ProtocolDataElementStream:
"""This is the interface of any client protocol stream to a
remote data element."""
def read(self, readLength=0):
"""Read data from the current data element.
@param readLength if not zero, return a chunk of data with
at most the given length. When zero, return chunks with the
default size of the underlying IO layer.
@return the data read or an empty byte string when the end
of the stream was reached."""
raise Exception('Interface method called')
def close(self):
"""Close the stream. This call might discard data already
buffered within this object or the underlying IO layer. This
method has to be invoked also when the end of the stream was
reached."""
raise Exception('Interface method called')
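# Illustrative sketch, not part of the original module: reads a
# ProtocolDataElementStream to the end using the interface defined above.
# close() is also invoked after the end of the stream was reached, as the
# interface requires.
def _sketchReadWholeStream(dataStream):
  """Return the complete content of the given stream as bytes."""
  streamData = b''
  while True:
    chunk = dataStream.read()
    if len(chunk) == 0:
      break
    streamData += chunk
  dataStream.close()
  return streamData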
class ClientProtocolInterface:
"""This is the client side protocol adapter to initiate retrieval
of remote data from a remote SenderTransferPolicy. Each method
of the interface but also the returned FileInfo objects may
raise an IOError('Connection closed') to indicate connection
failures."""
def getRemotePolicyInfo(self):
"""Get information about the remote policy. The local agent
may then check if remote SenderTransferPolicy is compatible
to local settings and ReceiverTransferPolicy.
@return information about the remote sender policy or None
when no sender policy is installed, thus requesting remote
files is impossible."""
raise Exception('Interface method called')
def startTransaction(self, queryData):
"""Start or restart a query transaction to retrive files from
beginning on, even when skipped in previous round. The query
pointer is positioned before the first FileInfo to retrieve.
@param query data to send as query to remote side.
@throws exception if transation start is not possible or query
data was not understood by remote side."""
raise Exception('Interface method called')
def nextDataElement(self, wasStoredFlag=False):
"""Move to the next FileInfo.
@param wasStoredFlag if true, indicate that the previous file
was stored successfully on local side.
@return True if a next FileInfo is available, False otherwise."""
raise Exception('Interface method called')
def getDataElementInfo(self):
"""Get information about the currently selected data element.
The method can be invoked more than once on the same data
element. Extraction of associated stream data is only possible
until proceeding to the next FileInfo using nextDataElement().
@return a tuple with the source URL, metadata and the attribute
dictionary visible to the client or None when no element is
currently selected."""
raise Exception('Interface method called')
def getDataElementStream(self):
"""Get a stream to read from the remote data element. While
stream is open, no other client protocol methods can be called.
@throws Exception if no transaction is open or no current
data element selected for transfer.
@return an instance of ProtocolDataElementStream for reading."""
raise Exception('Interface method called')
def getFileInfos(self, count):
"""Get the next file infos from the remote side. This method
is equivalent to calling nextDataElement() and getDataElementInfo()
count times. This will also finish any currently open FileInfo
indicating no successful storage to the remote side."""
raise Exception('Interface method called')
def offerShutdown(self):
"""Offer remote side to shutdown the connection. The other
side has to confirm the offer to allow real shutdown."""
raise Exception('Interface method called')
def waitForShutdown(self):
"""Wait for the remote side to offer a shutdown. As we cannot
force remote side to offer shutdown using, this method may
block."""
raise Exception('Interface method called')
def forceShutdown(self):
"""Force an immediate shutdown without prior anouncement just
by terminating all connections and releasing all resources."""
raise Exception('Interface method called')
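# Illustrative sketch, not part of the original module: the typical pull
# loop over a connected ClientProtocolInterface implementation, e.g. the
# JsonStreamClientProtocolAdapter defined further below; compare also
# ReceiverStoreDataTransferPolicy.applyPolicy().
def _sketchListRemoteElements(clientProtocolAdapter):
  """Return the source URLs of all elements offered by the remote side
  without confirming storage of any of them."""
  sourceUrlList = []
  clientProtocolAdapter.startTransaction(None)
  while clientProtocolAdapter.nextDataElement(False):
    (sourceUrl, metaInfo, attributeDict) = \
        clientProtocolAdapter.getDataElementInfo()
    sourceUrlList.append(sourceUrl)
  return sourceUrlList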
class ServerProtocolInterface:
"""This is the server side protocol adapter to be provided to
the transfer service to forward remote requests to the local
SenderPolicy. Methods are named identically but have different
service contract as in ClientProtocolInterface."""
def getPolicyInfo(self):
"""Get information about the remote SenderTransferPolicy.
The local agent may then check if the policy is compatible
to local settings and ReceiverTransferPolicy.
@return information about the installed SenderTransferPolicy
or None without a policy."""
raise Exception('Interface method called')
def startTransaction(self, queryData):
"""Start or restart a query transaction to retrive files from
beginning on, even when skipped in previous round. The query
pointer is positioned before the first FileInfo to retrieve.
@param query data received from remote side.
@throws Exception if transation start is not possible or query
data was not understood."""
raise Exception('Interface method called')
def nextDataElement(self, wasStoredFlag=False):
"""Move to the next FileInfo.
@param wasStoredFlag if true, indicate that the previous file
was stored successfully on local side.
@return True if a next FileInfo is available, False otherwise."""
raise Exception('Interface method called')
def getDataElementInfo(self):
"""Get information about the currently selected data element.
The method can be invoked more than once on the same data
element. Extraction of associated stream data is only possible
until proceeding to the next FileInfo using nextDataElement().
@return a tuple with the source URL, metadata and the attribute
dictionary visible to the client or None when no element is
currently selected."""
raise Exception('Interface method called')
def getDataElementStream(self):
"""Get a stream to read the currently selected data element.
While stream is open, no other protocol methods can be called.
@throws Exception if no transaction is open or no current
data element selected for transfer."""
raise Exception('Interface method called')
class SenderTransferPolicy():
"""This is the common superinterface of all sender side transfer
policies. A policy implementation has to perform internal consistency
checks after data modification as needed; the applyPolicy call
only notifies the policy about state changes due to transfers."""
def getPolicyInfo(self):
"""Get information about the sender policy."""
raise Exception('Interface method called')
def queryBackupDataElements(self, transferContext, queryData):
"""Query the local sender transfer policy to return a query
result with elements to be transferred to the remote side.
@param queryData when None, return all elements to be transfered.
Otherwise apply the policy specific query data to limit the
number of elements.
@return BackupDataElementQueryResult"""
raise Exception('Interface method called')
def applyPolicy(self, transferContext, backupDataElement, wasStoredFlag):
"""Apply this policy to adopt to changes due to access to
the a backup data element within a storage context.
@param backupDataElement a backup data element instance of
StorageBackupDataElementInterface returned by the queryBackupDataElements
method of this policy.
@param wasStoredFlag flag indicating if the remote side also
fetched the data of this object."""
raise Exception('Interface method called')
class ReceiverTransferPolicy:
"""This is the common superinterface of all receiver transfer
policies."""
def isSenderPolicyCompatible(self, policyInfo):
"""Check if a remote sender policy is compatible to this receiver
policy.
@return True when compatible."""
raise Exception('Interface method called')
def applyPolicy(self, transferContext):
"""Apply this policy for the given transfer context. This
method should be invoked only after checking that policies
are compliant. The policy will then use access to remote side
within transferContext to fetch data elements and modify local
and possibly also remote storage."""
raise Exception('Interface method called')
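# Illustrative sketch, not part of the original module: a minimal receiver
# policy skeleton accepting only a hypothetical 'ExampleSenderPolicy'; see
# ReceiverStoreDataTransferPolicy below for a complete implementation that
# really fetches data.
class _SketchReceiverPolicy(ReceiverTransferPolicy):
  def isSenderPolicyCompatible(self, policyInfo):
    return policyInfo == 'ExampleSenderPolicy'
  def applyPolicy(self, transferContext):
    # A real policy would pull elements here via
    # transferContext.clientProtocolAdapter, see below.
    pass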
class SenderMoveDataTransferPolicy(SenderTransferPolicy):
"""This is a simple sender transfer policy just advertising
all resources for transfer and removing them or marking them
as transferred as soon as the remote side confirms successful transfer.
A file with a mark will not be offered for download any more."""
def __init__(self, configContext, markAsDoneOnlyFlag=False):
self.configContext = configContext
self.markAsDoneOnlyFlag = markAsDoneOnlyFlag
if self.markAsDoneOnlyFlag:
raise Exception('FIXME: no persistency support for marking yet')
def getPolicyInfo(self):
"""Get information about the sender policy."""
return 'SenderMoveDataTransferPolicy'
def queryBackupDataElements(self, transferContext, queryData):
"""Query the local sender transfer policy to return a query
result with elements to be transferred to the remote side.
@param queryData when None, return all elements to be transfered.
Otherwise apply the policy specific query data to limit the
number of elements.
@return BackupDataElementQueryResult"""
query = None
if queryData != None:
if not isinstance(queryData, list):
raise Exception('Unsupported query data')
queryType = queryData[0]
if queryType == 'SourceUrl':
raise Exception('Not yet')
else:
raise Exception('Unsupported query type')
return transferContext.localStorage.queryBackupDataElements(query)
def applyPolicy(self, transferContext, backupDataElement, wasStoredFlag):
"""Apply this policy to adopt to changes due to access to
the a backup data element within a storage context.
@param backupDataElement a backup data element instance of
StorageBackupDataElementInterface returned by the queryBackupDataElements
method of this policy.
@param wasStoredFlag flag indicating if the remote side also
fetched the data of this object."""
# When other side did not confirm receiving the data, keep this
# element active.
if not wasStoredFlag:
return
if self.markAsDoneOnlyFlag:
raise Exception('FIXME: no persistency support for marking yet')
# Remove the element from the storage.
backupDataElement.delete()
class ReceiverStoreDataTransferPolicy(ReceiverTransferPolicy):
"""This class defines a receiver policy, that attempts to fetch
all data elements offered by the remote transfer agent."""
def __init__(self, configContext):
"""Create this policy using the given configuration."""
self.configContext = configContext
def isSenderPolicyCompatible(self, policyInfo):
"""Check if a remote sender policy is compatible to this receiver
policy.
@return True when compatible."""
return policyInfo == 'SenderMoveDataTransferPolicy'
def applyPolicy(self, transferContext):
"""Apply this policy for the given transfer context. This
method should be invoked only after checking that policies
are compliant. The policy will then use access to remote side
within transferContext to fetch data elements and modify local
and possibly also remote storage."""
# Just start a transaction and make sure that each remote element
# is also present in local storage, fetching any missing ones.
transferContext.clientProtocolAdapter.startTransaction(None)
while transferContext.clientProtocolAdapter.nextDataElement(True):
(sourceUrl, metaInfo, attributeDict) = \
transferContext.clientProtocolAdapter.getDataElementInfo()
# Now we know about the remote object. See if it is already available
# within local storage.
localElement = transferContext.localStorage.getBackupDataElementForMetaData(
sourceUrl, metaInfo)
if localElement != None:
# Element is already available, not attempting to copy.
continue
# Create the sink to store the element.
sinkHandle = transferContext.localStorage.getSinkHandle(sourceUrl)
dataStream = transferContext.clientProtocolAdapter.getDataElementStream()
while True:
streamData = dataStream.read()
if streamData == b'':
break
sinkHandle.write(streamData)
sinkHandle.close(metaInfo)
class TransferAgent():
"""The TransferAgent keeps track of all currently open transfer
contexts and orchestrates transfer."""
def addConnection(self, transferContext):
"""Add a connection to the local agent."""
raise Exception('Interface method called')
def shutdown(self, forceShutdownTime=-1):
"""Trigger shutdown of this TransferAgent and all open connections
established by it. The method call shall return as fast as
possible as it might be invoked via signal handlers, which
should not be blocked. If shutdown requires activities with
uncertain duration, e.g. remote service acknowledging clean
shutdown, those tasks shall be performed in another thread,
e.g. the main thread handling the connections.
@param forceShutdownTime when 0 this method will immediately
end all service activity just undoing obvious intermediate
state, e.g. deleting temporary files, but will not notify
remote side for a clean shutdown or wait for current processes
to complete. A value greater than zero indicates the intent to
terminate within that given amount of time."""
raise Exception('Interface method called')
class SimpleTransferAgent(TransferAgent):
"""This is a minimalistic implementation of a transfer agent.
It is capable of single-threaded transfers only, thus only a
single connection can be attached to this agent."""
def __init__(self):
"""Create the local agent."""
self.singletonContext = None
self.lock = threading.Lock()
def addConnection(self, transferContext):
"""Add a connection to the local transfer agent. As this agent
is only single-threaded, the method will only return after
this connection has been closed."""
with self.lock:
if self.singletonContext is not None:
raise Exception(
'%s cannot handle multiple connections in parallel' % (
self.__class__.__name__))
self.singletonContext = transferContext
try:
if not self.ensurePolicyCompliance(transferContext):
print('Incompatible policies detected, shutting down', file=sys.stderr)
elif transferContext.receiverTransferPolicy != None:
# So remote sender policy is compliant to local storage policy.
# Recheck local policy until we are done.
transferContext.receiverTransferPolicy.applyPolicy(transferContext)
# Indicate local shutdown to other side.
transferContext.offerShutdown()
# Await remote shutdown confirmation.
transferContext.waitForShutdown()
except OSError as communicationError:
if communicationError.errno == errno.ECONNRESET:
print('%s' % communicationError.args[1], file=sys.stderr)
else:
raise
finally:
transferContext.clientProtocolAdapter.forceShutdown()
with self.lock:
self.singletonContext = None
def ensurePolicyCompliance(self, transferContext):
"""Check that remote sending policy is compliant to local
receiver policy or both policies are not set."""
policyInfo = transferContext.clientProtocolAdapter.getRemotePolicyInfo()
if transferContext.receiverTransferPolicy is None:
# We are not expecting to receive anything, so remote policy can
# never match local one.
return policyInfo is None
return transferContext.receiverTransferPolicy.isSenderPolicyCompatible(
policyInfo)
def shutdown(self, forceShutdownTime=-1):
"""Trigger shutdown of this TransferAgent and all open connections
established by it.
@param forceShutdownTime when 0 this method will immediately
end all service activity just undoing obvious intermediate
state, e.g. deleting temporary files, but will not notify
remote side for a clean shutdown or wait for current processes
to complete. A value greater than zero indicates the intent to
terminate within that given amount of time."""
transferContext = None
with self.lock:
if self.singletonContext is None:
return
transferContext = self.singletonContext
if forceShutdownTime == 0:
# Immediate shutdown was requested.
transferContext.clientProtocolAdapter.forceShutdown()
else:
transferContext.offerShutdown()
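# Illustrative sketch, not part of the original module: SimpleTransferAgent
# handles exactly one connection at a time, so addConnection() only returns
# after the connection has finished. "transferContext" is assumed to be a
# fully connected TransferContext, e.g. as built in
# SocketConnectorService.run() below.
def _sketchServeSingleConnection(transferContext):
  agent = SimpleTransferAgent()
  # Blocks until policy application and the shutdown handshake completed.
  agent.addConnection(transferContext)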
class DefaultTransferAgentServerProtocolAdapter(ServerProtocolInterface):
"""This class provides a default protocol adapter only relaying
requests to the sender policy within the given transfer context."""
def __init__(self, transferContext, remoteStorageNotificationFunction=None):
"""Create the default adapter. This adapter just publishes
all DataElements from local storage but does not provide any
support for attributes or element deletion.
@param remoteStorageNotificationFunction when not None, this
function is invoked with context, FileInfo and wasStoredFlag
before moving to the next resource."""
self.transferContext = transferContext
self.remoteStorageNotificationFunction = remoteStorageNotificationFunction
self.transactionIterator = None
self.currentDataElement = None
def getPolicyInfo(self):
"""Get information about the remote SenderTransferPolicy.
The local agent may then check if the policy is compatible
to local settings and ReceiverTransferPolicy.
@return information about the installed SenderTransferPolicy
or None without a policy."""
if self.transferContext.senderTransferPolicy is None:
return None
return self.transferContext.senderTransferPolicy.getPolicyInfo()
def startTransaction(self, queryData):
"""Start or restart a query transaction to retrive files from
beginning on, even when skipped in previous round. The query
pointer is positioned before the first FileInfo to retrieve.
@param query data received from remote side.
@throws exception if transation start is not possible or query
data was not understood."""
if self.currentDataElement != None:
self.currentDataElement.invalidate()
self.currentDataElement = None
self.transactionIterator = \
self.transferContext.senderTransferPolicy.queryBackupDataElements(
self.transferContext, queryData)
def nextDataElement(self, wasStoredFlag=False):
"""Move to the next FileInfo.
@param wasStoredFlag if true, indicate that the previous file
was stored successfully on local side.
@return True if a next FileInfo is available, False otherwise."""
if self.currentDataElement != None:
self.transferContext.senderTransferPolicy.applyPolicy(
self.transferContext, self.currentDataElement, wasStoredFlag)
if self.remoteStorageNotificationFunction != None:
self.remoteStorageNotificationFunction(
self.transferContext, self.currentDataElement, wasStoredFlag)
self.currentDataElement.invalidate()
self.currentDataElement = None
if self.transactionIterator is None:
return False
dataElement = self.transactionIterator.getNextElement()
if dataElement is None:
self.transactionIterator = None
return False
self.currentDataElement = WrappedStorageBackupDataElementFileInfo(
dataElement)
return True
def getDataElementInfo(self):
"""Get information about the currently selected data element.
The method can be invoked more than once on the same data
element. Extraction of associated stream data is only possible
until proceeding to the next FileInfo using nextDataElement().
@return a tuple with the source URL, metadata and the attribute
dictionary visible to the client."""
if self.currentDataElement is None:
return None
return (self.currentDataElement.getSourceUrl(), self.currentDataElement.getMetaInfo(), None)
def getDataElementStream(self):
"""Get a stream to read the currently selected data element.
While stream is open, no other protocol methods can be called.
@throws Exception if no transaction is open or no current
data element selected for transfer."""
if self.currentDataElement is None:
raise Exception('No data element selected')
return self.currentDataElement.getDataStream()
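# Illustrative sketch, not part of the original module: server side wiring
# as also performed in SocketConnectorService.run() below. The adapter above
# is put behind a JSON request handler which a stream multiplexer then
# drives; inputFd/outputFd are assumed to be connected socket or pipe file
# descriptors supplied by the caller.
def _sketchWireServerSide(transferContext, inputFd, outputFd):
  serverProtocolAdapter = DefaultTransferAgentServerProtocolAdapter(
      transferContext)
  requestHandler = JsonStreamServerProtocolRequestHandler(
      serverProtocolAdapter)
  return StreamRequestResponseMultiplexer(inputFd, outputFd, requestHandler)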
class WrappedStorageBackupDataElementFileInfo():
"""This is a simple wrapper over a a StorageBackupDataElement
to retrieve requested data directly from storage. It does not
support any attributes as those are usually policy specific."""
def __init__(self, dataElement):
if not isinstance(dataElement, guerillabackup.StorageBackupDataElementInterface):
raise Exception('Cannot wrap object not implementing StorageBackupDataElementInterface')
self.dataElement = dataElement
def getSourceUrl(self):
"""Get the source URL of this file object."""
return self.dataElement.getSourceUrl()
def getMetaInfo(self):
"""Get only the metadata part of this element.
@return a BackupElementMetainfo object"""
return self.dataElement.getMetaData()
def getAttributes(self):
"""Get the additional attributes of this file info object.
Currently attributes are not supported for wrapped objects."""
return None
def setAttribute(self, name, value):
"""Set an attribute for this file info."""
raise Exception('Not supported')
def getDataStream(self):
"""Get a stream to read data from that element.
@return a file descriptor for reading this stream."""
return self.dataElement.getDataStream()
def delete(self):
"""Delete the backup data element behind this object."""
self.dataElement.delete()
self.dataElement = None
def invalidate(self):
"""Make sure all resources associated with this element are
released.
@throws Exception if element is currently in use, e.g. read."""
# Nothing to invalidate here when using primitive, uncached wrapping.
pass
class MultipartResponseIterator():
"""This is the interface for all response iterators. Those should
be used, where the response is too large for a single response
data block or where means for interruption of an ongoing large
response are needed."""
def getNextPart(self):
"""Get the next part from this iterator. After detecting
that no more parts are available or calling release(), the
caller must not attempt to invoke the method again.
@return the part data or None when no more parts are available."""
raise Exception('Interface method called')
def release(self):
"""This method releases all resources associated with this
iterator if the iterator end was not yet reached in getNextPart().
All future calls to getNextPart() or release() will cause
exceptions."""
raise Exception('Interface method called')
class StreamRequestResponseMultiplexer():
"""This class allows to send requests and reponses via a single
bidirectional in any connection. The class is not thread-safe
and hence does not support concurrent method calls. The transfer
protocol consists of a few packet types:
* 'A' for aborting a running streaming request.
* 'S' for sending a request to the remote side server interface
* 'P' for stream response parts before receiving a final 'R'
packet. Thus a 'P' packet identifies a stream response. Therefore
for zero byte stream responses, there has to be at least a
single 'P' packet of zero size. The final 'R' packet
at the end of the stream always has to be of zero size.
* 'R' for the remote response packet containing the response
data.
Any exception due to protocol violations, multiplex connection
IO errors or requestHandler processing failures will cause the
multiplexer to shut down all functionality immediately for security
reasons."""
def __init__(self, inputFd, outputFd, requestHandler):
"""Create this multiplexer based on the given input and output
file descriptors.
@param requestHandler a request handler to process the incoming
requests."""
self.inputFd = inputFd
self.outputFd = outputFd
self.requestHandler = requestHandler
# This flag stays true until the response for the last request
# was received.
self.responsePendingFlag = False
# When true, this multiplexer is currently receiving a stream
# response from the remote side.
self.inStreamResponseFlag = False
# This iterator will be non-None while an incoming request is returning
# a group of packets as response.
self.responsePartIterator = None
self.shutdownOfferSentFlag = False
self.shutdownOfferReceivedFlag = False
# Input buffer of data from remote side.
self.remoteData = b''
self.remoteDataLength = -1
def sendRequest(self, requestData):
"""Send request data to the remote side and await the result.
The method will block until the remote data was received and
will process incoming requests while waiting. The method relies
on the caller to have fetched all continuation response parts
for the previous requests using handleRequests, before submitting
a new request.
@raise Exception if the multiplexer is in an invalid state.
@return response data when a response was pending and data
was received within time. The data is a tuple containing
the binary content and a boolean value indicating if the received
data is a complete response or belongs to a stream."""
if (requestData is None) or (len(requestData) == 0):
raise Exception('No request data given')
if self.responsePendingFlag:
raise Exception('Cannot queue another request while response is pending')
if self.shutdownOfferSentFlag:
raise Exception('Shutdown already offered')
return self.internalIOHandler(requestData, 1000)
def handleRequests(self, selectTime):
"""Handle incoming requests by waiting for incoming data for
the given amount of time.
@return None when only request handling was performed, or the
continuation data for the last request if such data was received."""
if ((self.shutdownOfferSentFlag) and (self.shutdownOfferReceivedFlag) and
not self.responsePendingFlag):
raise Exception('Cannot handle requests after shutdown')
return self.internalIOHandler(None, selectTime)
def offerShutdown(self):
"""Offer the shutdown to the remote side. This method has
the same limitations and requirements as the sendRequest method.
It does not return any data."""
if self.shutdownOfferSentFlag:
raise Exception('Already offered')
self.shutdownOfferSentFlag = True
result = self.internalIOHandler(b'', 600)
if result[1]:
raise Exception('Received unexpected stream response')
if len(result[0]) != 0:
raise Exception('Unexpected response data on shutdown offer request')
def wasShutdownOffered(self):
"""Check if shutdown was already offered by this side."""
return self.shutdownOfferSentFlag
def wasShutdownRequested(self):
"""Check if remote side has alredy requested shutdown."""
return self.shutdownOfferReceivedFlag
def close(self):
"""Close this multiplexer by closing also all underlying streams."""
if self.inputFd == -1:
raise Exception('Already closed')
# We might have been transferring stream parts when being shut down.
# Make sure to release the iterator to avoid resource leaks.
if self.responsePartIterator != None:
self.responsePartIterator.release()
self.responsePartIterator = None
pendingException = None
try:
os.close(self.inputFd)
except Exception as closeException:
# Without any program logic errors, exceptions here are rare
# and problematic, therefore report them immediately.
print(
'Closing of input stream failed: %s' % str(closeException),
file=sys.stderr)
pendingException = closeException
if self.outputFd != self.inputFd:
try:
os.close(self.outputFd)
except Exception as closeException:
print(
'Closing of output stream failed: %s' % str(closeException),
file=sys.stderr)
pendingException = closeException
self.inputFd = -1
self.outputFd = -1
self.shutdownOfferSentFlag = True
self.shutdownOfferReceivedFlag = True
if pendingException != None:
raise pendingException
def internalIOHandler(self, requestData, maxHandleTime):
"""Perform multiplexer IO operations. When requestData was
given, it will be sent before waiting for a response and probably
handling incoming requests.
@param maxHandleTime the maximum time to stay inside this
function waiting for input data. When 0, attempt to handle
only data without blocking. Writing is not affected by this
parameter and will be attempted until successful or fatal
error was detected.
@return response data when a response was pending and data
was received within time. The data is a tuple containing
the binary content and a boolean value indicating if the received
data is a complete response or belongs to a stream.
@throws OSError when low-level IO communication with the remote
peer failed or was ended prematurely.
@throws Exception when protocol violations were detected."""
sendQueue = []
writeSelectFds = []
if requestData is None:
if self.shutdownOfferReceivedFlag and not self.inStreamResponseFlag:
raise Exception(
'Request handling attempted after remote shutdown was offered')
else:
sendQueue.append([b'S'+struct.pack(' (1<<20)):
raise Exception('Invalid input data chunk length 0x%x' % self.remoteDataLength)
self.remoteDataLength += 5
if self.remoteDataLength != 5:
# We read exactly 5 bytes but need more. Let the outer loop do that.
return (True, True, None)
if len(self.remoteData) < self.remoteDataLength:
readData = os.read(
self.inputFd, self.remoteDataLength-len(self.remoteData))
if len(readData) == 0:
if self.responsePendingFlag:
raise Exception('End of data while awaiting response')
raise Exception('End of data with partial input')
self.remoteData += readData
if len(self.remoteData) != self.remoteDataLength:
return (True, True, None)
# Check the code.
if self.remoteData[0] not in b'PRS':
raise Exception('Invalid packet type 0x%x' % self.remoteData[0])
if not self.remoteData.startswith(b'S'):
# Not sending a request but receiving some kind of response.
if not self.responsePendingFlag:
raise Exception(
'Received %s packet while not awaiting response' % repr(self.remoteData[0:1]))
if self.remoteData.startswith(b'P'):
# So this is the first or an additional fragment for a previous response.
self.inStreamResponseFlag = True
else:
self.responsePendingFlag = False
self.inStreamResponseFlag = False
responseData = (self.remoteData[5:], self.inStreamResponseFlag)
self.remoteData = b''
self.remoteDataLength = -1
# We could receive multiple response packets while the send queue was
# not yet flushed and still contains outgoing responses to already
# processed requests. Although all those response parts would belong
# to the same request, fusing them is not an option as that could
# exhaust memory. So we have to delay returning the response data
# until the send queue is emptied.
return (len(sendQueue) != 0, True, responseData)
# So this was another incoming request. Handle it and queue the
# response data.
if self.responsePartIterator != None:
raise Exception(
'Cannot handle additional request while previous one not done')
if self.shutdownOfferReceivedFlag:
raise Exception('Received request even after shutdown was offered')
if self.remoteDataLength == 5:
# This is a remote shutdown offer, the last request from the remote
# side. Prepare for shutdown.
self.shutdownOfferReceivedFlag = True
sendQueue.append([b'R\x00\x00\x00\x00', 0])
if not self.responsePendingFlag:
del readSelectFds[0]
else:
handlerResponse = self.requestHandler.handleRequest(
self.remoteData[5:])
handlerResponseType = b'R'
if isinstance(handlerResponse, MultipartResponseIterator):
self.responsePartIterator = handlerResponse
handlerResponse = handlerResponse.getNextPart()
handlerResponseType = b'P'
# Empty files might not even return a single part. But stream
# responses have to contain at least one 'P' type packet, so send
# an empty one.
if handlerResponse is None:
handlerResponse = b''
sendQueue.append([handlerResponseType+struct.pack(' readLength):
self.dataBuffer = result[readLength:]
result = result[:readLength]
if len(result) == 0:
self.endOfStreamFlag = True
return result
def close(self):
"""Close the stream. This call might discard data already
buffered within this object or the underlying IO layer. This
method has to be invoked also when the end of the stream was
reached."""
if self.endOfStreamFlag:
return
self.clientProtocolAdapter.__internalStreamAbort()
self.endOfStreamFlag = True
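# Illustrative sketch, not part of the original module: builds one packet of
# the transfer protocol described in the StreamRequestResponseMultiplexer
# docstring above, a one byte type ('A', 'S', 'P' or 'R') followed by a
# 4-byte payload length. The little-endian length encoding is an assumption
# here; a zero-length 'R' packet then matches the b'R\x00\x00\x00\x00'
# literal used in the multiplexer code.
def _sketchPackPacket(packetType, payload=b''):
  return packetType+struct.pack('<I', len(payload))+payload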
class JsonStreamClientProtocolAdapter(ClientProtocolInterface):
"""This class forwards client protocol requests via stream file
descriptors to the remote side."""
def __init__(self, streamMultiplexer):
"""Create this adapter based on the given input and output
file descriptors."""
self.streamMultiplexer = streamMultiplexer
self.inStreamReadingFlag = False
def getRemotePolicyInfo(self):
"""Get information about the remote policy. The local agent
may then check if remote SenderTransferPolicy is compatible
to local settings and ReceiverTransferPolicy.
@return information about the remote sender policy or None
when no sender policy is installed, thus requesting remote
files is impossible."""
if self.inStreamReadingFlag:
raise Exception('Illegal state')
(responseData, inStreamReadingFlag) = self.streamMultiplexer.sendRequest(
bytes(json.dumps(['getPolicyInfo']), 'ascii'))
if inStreamReadingFlag:
raise Exception('Protocol error')
return json.loads(str(responseData, 'ascii'))
def startTransaction(self, queryData):
"""Start or restart a query transaction to retrive files from
beginning on, even when skipped in previous round. The query
pointer is positioned before the first FileInfo to retrieve.
@param query data to send as query to remote side.
@throws exception if transation start is not possible or query
data was not understood by remote side."""
if self.inStreamReadingFlag:
raise Exception('Illegal state')
(responseData, inStreamReadingFlag) = self.streamMultiplexer.sendRequest(
bytes(json.dumps(['startTransaction', queryData]), 'ascii'))
if inStreamReadingFlag:
raise Exception('Protocol error')
if len(responseData) != 0:
raise Exception('Unexpected response received')
def nextDataElement(self, wasStoredFlag=False):
"""Move to the next FileInfo.
@param wasStoredFlag if true, indicate that the previous file
was stored successfully on local side.
@return True if a next FileInfo is available, False otherwise."""
if self.inStreamReadingFlag:
raise Exception('Illegal state')
(responseData, inStreamReadingFlag) = self.streamMultiplexer.sendRequest(
bytes(json.dumps(['nextDataElement', wasStoredFlag]), 'ascii'))
if inStreamReadingFlag:
raise Exception('Protocol error')
return json.loads(str(responseData, 'ascii'))
def getDataElementInfo(self):
"""Get information about the currently selected FileInfo.
Extraction of information and stream may only be possible
until proceeding to the next FileInfo using nextDataElement().
@return a tuple with the source URL, metadata and the attribute
dictionary visible to the client."""
if self.inStreamReadingFlag:
raise Exception('Illegal state')
(responseData, inStreamReadingFlag) = self.streamMultiplexer.sendRequest(
bytes(json.dumps(['getDataElementInfo']), 'ascii'))
if inStreamReadingFlag:
raise Exception('Protocol error')
result = json.loads(str(responseData, 'ascii'))
result[1] = BackupElementMetainfo.unserialize(
bytes(result[1], 'ascii'))
return result
def getDataElementStream(self):
"""Get a stream to read from currently selected data element.
While stream is open, no other protocol methods can be called.
@throws Exception if no transaction is open or no current
data element selected for transfer."""
self.inStreamReadingFlag = True
return JsonClientProtocolDataElementStream(self)
def getFileInfos(self, count):
"""Get the next file infos from the remote side. This method
is equivalent to calling nextDataElement() and getDataElementInfo()
count times. This will also finish any currently open FileInfo
indicating no successful storage to the remote side."""
raise Exception('Interface method called')
def offerShutdown(self):
"""Offer remote side to shutdown the connection. The other
side has to confirm the offer to allow real shutdown."""
if self.inStreamReadingFlag:
raise Exception('Illegal state')
self.streamMultiplexer.offerShutdown()
def waitForShutdown(self):
"""Wait for the remote side to offer a shutdown. As we cannot
force remote side to offer shutdown using the given multiplex
mode, just wait until we receive the offer."""
while not self.streamMultiplexer.shutdownOfferReceivedFlag:
responseData = self.streamMultiplexer.handleRequests(600)
if responseData != None:
raise Exception(
'Did not expect to receive data while waiting for shutdown')
self.streamMultiplexer.close()
self.streamMultiplexer = None
def forceShutdown(self):
"""Force an immediate shutdown without prior anouncement just
by terminating all connections and releasing all resources."""
if self.streamMultiplexer != None:
self.streamMultiplexer.close()
self.streamMultiplexer = None
def internalReadDataStream(self, startStreamFlag=False):
"""Read a remote backup data element as stream."""
if not self.inStreamReadingFlag:
raise Exception('Illegal state')
responseData = None
inStreamReadingFlag = None
if startStreamFlag:
(responseData, inStreamReadingFlag) = self.streamMultiplexer.sendRequest(
bytes(json.dumps(['getDataElementStream']), 'ascii'))
if not inStreamReadingFlag:
raise Exception('Protocol error')
else:
(responseData, inStreamReadingFlag) = self.streamMultiplexer.handleRequests(1000)
if inStreamReadingFlag != (len(responseData) != 0):
raise Exception('Protocol error')
if len(responseData) == 0:
# So this was the final chunk, switch to normal non-stream mode again.
self.inStreamReadingFlag = False
return responseData
def __internalStreamAbort(self):
"""Abort reading of the data stream currently open for reading."""
if not self.inStreamReadingFlag:
raise Exception('Illegal state')
# This is an exception to the normal client/server protocol: send
# the abort request even while the current request is still being
# processed.
(responseData, inStreamReadingFlag) = self.streamMultiplexer.sendRequest(
bytes(json.dumps(['abortDataElementStream']), 'ascii'))
# Now continue reading until all buffers have drained and final
# data chunk was removed.
while len(self.internalReadDataStream()) != 0:
pass
# Now read the response to the abort command itself.
(responseData, inStreamReadingFlag) = self.streamMultiplexer.sendRequest(None)
if len(responseData) != 0:
raise Exception('Unexpected response received')
class WrappedFileStreamMultipartResponseIterator(MultipartResponseIterator):
"""This class wraps an OS stream to provide the data as response
iterator."""
def __init__(self, streamFd, chunkSize=1<<16):
self.streamFd = streamFd
self.chunkSize = chunkSize
def getNextPart(self):
"""Get the next part from this iterator. After detecting
that no more parts are available or calling release(), the
caller must not attempt to invoke the method again.
@return the part data or None when no more parts are available."""
if self.streamFd < 0:
raise Exception('Illegal state')
readData = os.read(self.streamFd, self.chunkSize)
if len(readData) == 0:
# This is the end of the stream, we can release it.
self.release()
return None
return readData
def release(self):
"""This method releases all resources associated with this
iterator if the iterator end was not yet reached in getNextPart().
All future calls to getNextPart() or release() will cause
exceptions."""
if self.streamFd < 0:
raise Exception('Illegal state')
os.close(self.streamFd)
self.streamFd = -1
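# Illustrative sketch, not part of the original module: drains a
# WrappedFileStreamMultipartResponseIterator the way the multiplexer sends
# 'P' packets for a stream response. "pathName" is a hypothetical input
# file; the iterator closes the file descriptor itself when the end of the
# stream is reached.
def _sketchDrainIterator(pathName):
  iterator = WrappedFileStreamMultipartResponseIterator(
      os.open(pathName, os.O_RDONLY))
  totalLength = 0
  while True:
    partData = iterator.getNextPart()
    if partData is None:
      break
    totalLength += len(partData)
  return totalLength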
class JsonStreamServerProtocolRequestHandler():
"""This class handles incoming requests encoded in JSON and
passes them on to a server protocol adapter."""
def __init__(self, serverProtocolAdapter):
self.serverProtocolAdapter = serverProtocolAdapter
def handleRequest(self, requestData):
"""Handle an incoming request.
@return the serialized data."""
request = json.loads(str(requestData, 'ascii'))
if not isinstance(request, list):
raise Exception('Unexpected request data')
requestMethod = request[0]
responseData = None
noResponseFlag = False
if requestMethod == 'getPolicyInfo':
responseData = self.serverProtocolAdapter.getPolicyInfo()
elif requestMethod == 'startTransaction':
self.serverProtocolAdapter.startTransaction(request[1])
noResponseFlag = True
elif requestMethod == 'nextDataElement':
responseData = self.serverProtocolAdapter.nextDataElement(request[1])
elif requestMethod == 'getDataElementInfo':
elementInfo = self.serverProtocolAdapter.getDataElementInfo()
# Meta information needs separate serialization, do it.
if elementInfo != None:
responseData = (elementInfo[0], str(elementInfo[1].serialize(), 'ascii'), elementInfo[2])
elif requestMethod == 'getDataElementStream':
responseData = WrappedFileStreamMultipartResponseIterator(
self.serverProtocolAdapter.getDataElementStream())
else:
raise Exception('Invalid request %s' % repr(requestMethod))
if noResponseFlag:
return b''
if isinstance(responseData, MultipartResponseIterator):
return responseData
return bytes(json.dumps(responseData), 'ascii')
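# Illustrative sketch, not part of the original module: the request encoding
# accepted by JsonStreamServerProtocolRequestHandler.handleRequest() above
# is a JSON list with the method name first, e.g.
# bytes(json.dumps(['nextDataElement', True]), 'ascii').
def _sketchEncodeRequest(methodName, *args):
  return bytes(json.dumps([methodName]+list(args)), 'ascii')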
class SocketConnectorService():
"""This class listens on a socket and creates a TransferContext
for each incoming connection."""
def __init__(
self, socketPath, receiverTransferPolicy, senderTransferPolicy,
localStorage, transferAgent):
"""Create this service to listen on the given path.
@param socketPath the path where to create the local UNIX
socket. The service will not create the directory required
to hold the socket but will unlink any preexisting socket
file with that path. This operation is racy and might cause
security issues when applied by privileged user on insecure
directories."""
self.socketPath = socketPath
self.receiverTransferPolicy = receiverTransferPolicy
self.senderTransferPolicy = senderTransferPolicy
self.localStorage = localStorage
self.transferAgent = transferAgent
# The local socket to accept incoming connections. The only place
# to close the socket is in the shutdown method to avoid concurrent
# close in main loop.
self.socket = None
self.isRunningFlag = False
self.shutdownFlag = False
self.socket = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
self.socket.bind(self.socketPath)
except socket.error as bindError:
if bindError.errno != errno.EADDRINUSE:
raise
# Try to unlink the old socket and retry creation.
os.unlink(self.socketPath)
self.socket.bind(self.socketPath)
os.chmod(self.socketPath, 0o600)
def run(self):
"""Run this connector service. The method will not return until
shutdown is requested by another thread."""
# Keep a local copy of the server socket to avoid races with
# asynchronous shutdown signals.
serverSocket = self.socket
if self.shutdownFlag or (serverSocket is None):
raise Exception('Already shutdown')
self.isRunningFlag = True
serverSocket.listen(4)
while not self.shutdownFlag:
clientSocket = None
remoteAddress = None
try:
(clientSocket, remoteAddress) = serverSocket.accept()
except OSError as acceptError:
if (acceptError.errno != errno.EINVAL) or (not self.shutdownFlag):
print(
'Unexpected error %s accepting new connections' % acceptError,
file=sys.stderr)
continue
transferContext = TransferContext(
'socket', self.receiverTransferPolicy,
self.senderTransferPolicy, self.localStorage)
serverProtocolAdapter = DefaultTransferAgentServerProtocolAdapter(
transferContext)
# Extract a duplicate of the socket file descriptor. This is needed
# as the clientSocket might be garbage collected any time, thus
# closing the file descriptors while still needed.
clientSocketFd = os.dup(clientSocket.fileno())
streamMultiplexer = StreamRequestResponseMultiplexer(
clientSocketFd, clientSocketFd,
JsonStreamServerProtocolRequestHandler(serverProtocolAdapter))
# Do not wait for garbage collection, release the object immediately.
clientSocket.close()
transferContext.connect(
JsonStreamClientProtocolAdapter(streamMultiplexer),
serverProtocolAdapter)
self.transferAgent.addConnection(transferContext)
self.isRunningFlag = False
def shutdown(self, forceShutdownTime=-1):
"""Shutdown this connector service and all open connections
established by it. This method is usually called by a signal
handler as it will take down the whole service including all
open connections. The method might be invoked more than once
to force immediate shutdown after a previous attempt with
timeout did not complete yet.
@param forceShutdownTime when 0 this method will immediately
end all service activity, just undoing obvious intermediate
state, e.g. deleting temporary files, but will not notify the
remote side for a clean shutdown or wait for current processes
to complete. A value greater than zero indicates the intent to
terminate within that given amount of time."""
# Close the socket. This will also interrupt any other thread
# in run method if it was currently waiting for new connections.
if not self.shutdownFlag:
self.socket.shutdown(socket.SHUT_RDWR)
os.unlink(self.socketPath)
self.socketPath = None
# Indicate main loop termination on next possible occasion.
self.shutdownFlag = True
# Shutdown all open connections.
self.transferAgent.shutdown(forceShutdownTime)
guerillabackup-0.5.0/src/lib/guerillabackup/TransformationProcessOutputStream.py 0000664 0000000 0000000 00000005114 14501370353 0030353 0 ustar 00root root 0000000 0000000 """This module provides streams for linking of pipeline elements."""
import os
import select
import guerillabackup
class TransformationProcessOutputStream(
guerillabackup.TransformationProcessOutputInterface):
"""This class implements a filedescriptor stream based transformation
output. It can be used for both plain reading but also to pass
the file descriptor to downstream processes directly."""
def __init__(self, streamFd):
if not isinstance(streamFd, int):
raise Exception('Not a valid stream file descriptor')
self.streamFd = streamFd
def getOutputStreamDescriptor(self):
return self.streamFd
def readData(self, length):
"""Read data from this stream without blocking.
@return at most length bytes of data, zero-length data
if nothing available at the moment and None when end of input
was reached."""
# Perform a select before reading so that we do not need to switch
# the stream into non-blocking mode.
readFds, writeFds, exFds = select.select([self.streamFd], [], [], 0)
# Nothing available yet, do not attempt to read.
if len(readFds) == 0:
return b''
data = os.read(self.streamFd, length)
# Reading will return zero-length data when end of stream was reached.
# Return None in that case.
if len(data) == 0:
return None
return data
def close(self):
"""Close this interface. This will guarantee, that any future
access will report EOF or an error.
@raise Exception if close is attempted there still is data
available."""
data = self.readData(64)
os.close(self.streamFd)
self.streamFd = -1
if data is not None:
if len(data) == 0:
raise Exception('Closing output before EOF, data might be lost')
else:
raise Exception('Unhandled data in stream lost due to close before EOF')
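# Illustrative usage sketch (not part of the original module; assumes the
# caller owns both ends of a freshly created pipe):
#   readFd, writeFd = os.pipe()
#   os.write(writeFd, b'backup data')
#   os.close(writeFd)
#   output = TransformationProcessOutputStream(readFd)
#   chunk = output.readData(1 << 16)            # b'backup data'
#   assert output.readData(1 << 16) is None     # EOF was reached
#   output.close()                              # safe, no unread data left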
class NullProcessOutputStream(
guerillabackup.TransformationProcessOutputInterface):
"""This class implements a transformation output delivering
no output at all. It is useful to seal stdin of a toplevel OS
process pipeline element to avoid reading from real stdin."""
def getOutputStreamDescriptor(self):
return None
def readData(self, length):
"""Read data from this stream without blocking.
@return at most length bytes of data, zero-length data
if nothing available at the moment and None when end of input
was reached."""
return None
def close(self):
"""Close this interface. This will guarantee, that any future
access will report EOF or an error.
@raise Exception if close is attempted there still is data
available."""
pass
guerillabackup-0.5.0/src/lib/guerillabackup/UnitRunConditions.py 0000664 0000000 0000000 00000005473 14501370353 0025057 0 ustar 00root root 0000000 0000000 """This module provides various condition classes to determine
if a backup unit can be run when the unit itself is ready to
be run."""
import time
import guerillabackup
class IUnitRunCondition():
"""This is the interface of all unit run condition classes."""
def evaluate(self):
"""Evaluate this condition.
@return True if the condition is met, False otherwise."""
raise NotImplementedError()
class AverageLoadLimitCondition(IUnitRunCondition):
"""This condition class allows to check the load stayed below
a given limit for some time."""
def __init__(self, loadLimit, limitOkSeconds):
"""Create a new load limit condition.
@param loadLimit the 1 minute CPU load limit to stay below.
@param limitOkSeconds the number of seconds the machine's 1
minute load value has to stay below the limit for this condition
to be met."""
self.loadLimit = loadLimit
self.limitOkSeconds = limitOkSeconds
self.limitOkStartTime = None
def evaluate(self):
"""Evaluate this condition.
@return True when the condition is met."""
loadFile = open('/proc/loadavg', 'rb')
loadData = loadFile.read()
loadFile.close()
load1Min = float(loadData.split(b' ')[0])
if load1Min >= self.loadLimit:
self.limitOkStartTime = None
return False
currentTime = time.time()
if self.limitOkStartTime is None:
self.limitOkStartTime = currentTime
return (self.limitOkStartTime + self.limitOkSeconds <= currentTime)
class LogicalAndCondition(IUnitRunCondition):
"""This condition checks if all subconditions evaluate to true.
Even when a condition fails to evaluate to true all other conditions
are still checked to allow time sensitive conditions keep track
of time elapsed."""
def __init__(self, conditionList):
"""Create a logical and condition.
@param conditionList the list of conditions to be met."""
self.conditionList = conditionList
def evaluate(self):
"""Evaluate this condition.
@return True when the condition is met."""
result = True
for condition in self.conditionList:
if not condition.evaluate():
result = False
return result
class MinPowerOnTimeCondition(IUnitRunCondition):
"""This class checks if the machine was powered on long
enough to be in a stable state and ready for backup load."""
def __init__(self, minPowerOnSeconds):
"""Create a condition to check machine power up time.
@param minPowerOnSeconds the minimum number of seconds the
machine has to be powered on."""
self.minPowerOnSeconds = minPowerOnSeconds
def evaluate(self):
"""Evaluate this condition.
@return True when the condition is met."""
uptimeFile = open('/proc/uptime', 'rb')
uptimeData = uptimeFile.read()
uptimeFile.close()
uptime = float(uptimeData.split(b' ')[0])
return (uptime >= self.minPowerOnSeconds)
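# Illustrative usage sketch (not part of the original module): only run a
# unit when the machine is up for at least one hour and the 1 minute load
# stayed below 2.0 for the last 5 minutes. Note that evaluate() has to be
# polled repeatedly for the load condition to become true.
#   condition = LogicalAndCondition([
#       MinPowerOnTimeCondition(3600),
#       AverageLoadLimitCondition(2.0, 300)])
#   if condition.evaluate():
#       ...  # invoke the backup unit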
guerillabackup-0.5.0/src/lib/guerillabackup/Utils.py 0000664 0000000 0000000 00000003551 14501370353 0022514 0 ustar 00root root 0000000 0000000 """This module contains shared utility functions used e.g. by
generator, transfer and validation services."""
import json
def jsonLoadWithComments(fileName):
"""Load JSON data containing comments from a given file name."""
jsonFile = open(fileName, 'rb')
jsonData = jsonFile.read()
jsonFile.close()
jsonData = b'\n'.join([
b'' if x.startswith(b'#') else x for x in jsonData.split(b'\n')])
return json.loads(str(jsonData, 'utf-8'))
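# Illustrative example (not part of the original module; the file name is
# hypothetical): a file "/etc/guerillabackup/example-config.json" containing
#   # A comment line, replaced by an empty line before parsing.
#   {"Policies": []}
# yields {'Policies': []} when passed to jsonLoadWithComments(). Only lines
# starting with "#" in the first column are treated as comments.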
def parseTimeDef(timeStr):
"""Parse a time definition string returning the seconds it
encodes.
@param timeStr a human readable string defining a time interval.
It may contain one or more pairs of a numeric value and a unit
appended to each other without spaces, e.g. "6d20H". Valid
units are "m" (month with 30 days), "w" (week, 7 days), "d"
(day, 24 hours), "H" (hour with 60 minutes), "M" (minute with
60 seconds) and "s" (second). A trailing number without a unit
is also interpreted as seconds.
@return the number of seconds encoded in the interval specification."""
if not timeStr:
raise Exception('Empty time string not allowed')
timeDef = {}
numStart = 0
while numStart < len(timeStr):
numEnd = numStart
while (numEnd < len(timeStr)) and (timeStr[numEnd].isnumeric()):
numEnd += 1
if numEnd == numStart:
raise Exception('Invalid time specification "%s"' % timeStr)
number = int(timeStr[numStart:numEnd])
typeKey = 's'
if numEnd != len(timeStr):
typeKey = timeStr[numEnd]
numEnd += 1
numStart = numEnd
if typeKey in timeDef:
raise Exception('Duplicate time unit "%s" in "%s"' % (typeKey, timeStr))
timeDef[typeKey] = number
timeVal = 0
for typeKey, number in timeDef.items():
factor = {
'm': 30 * 24 * 60 * 60,
'w': 7 * 24 * 60 * 60,
'd': 24 * 60 * 60,
'H': 60 * 60,
'M': 60,
's': 1
}.get(typeKey, None)
if factor is None:
raise Exception('Unknown time specification element "%s"' % typeKey)
timeVal += number * factor
return timeVal
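# Illustrative worked examples (not part of the original module):
#   parseTimeDef('6d20H') == 6 * 24 * 3600 + 20 * 3600 == 590400
#   parseTimeDef('90') == 90    # a number without unit defaults to seconds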
guerillabackup-0.5.0/src/lib/guerillabackup/__init__.py 0000664 0000000 0000000 00000076022 14501370353 0023156 0 ustar 00root root 0000000 0000000 """This is the main guerillabackup module containing interfaces,
common helper functions."""
import errno
import os
import select
import sys
import time
CONFIG_GENERAL_PERSISTENCY_BASE_DIR_KEY = 'GeneralPersistencyBaseDir'
CONFIG_GENERAL_PERSISTENCY_BASE_DIR_DEFAULT = '/var/lib/guerillabackup/state'
CONFIG_GENERAL_RUNTIME_DATA_DIR_KEY = 'GeneralRuntimeDataDir'
CONFIG_GENERAL_RUNTIME_DATA_DIR_DEFAULT = '/run/guerillabackup'
CONFIG_GENERAL_DEBUG_TEST_MODE_KEY = 'GeneralDebugTestModeFlag'
GENERATOR_UNIT_CLASS_KEY = 'backupGeneratorUnitClass'
TRANSFER_RECEIVER_POLICY_CLASS_KEY = 'TransferReceiverPolicyClass'
TRANSFER_RECEIVER_POLICY_INIT_ARGS_KEY = 'TransferReceiverPolicyInitArgs'
TRANSFER_SENDER_POLICY_CLASS_KEY = 'TransferSenderPolicyClass'
TRANSFER_SENDER_POLICY_INIT_ARGS_KEY = 'TransferSenderPolicyInitArgs'
# Some constants not available on Python os module level yet.
AT_SYMLINK_NOFOLLOW = 0x100
AT_EMPTY_PATH = 0x1000
class TransformationPipelineElementInterface:
"""This is the interface to define data transformation pipeline
elements, e.g. for compression, encryption, signing. To really
start execution of a transformation pipeline, transformation
process instances have to be created for each pipe element."""
def getExecutionInstance(self, upstreamProcessOutput):
"""Get an execution instance for this transformation element.
@param upstreamProcessOutput this is the output of the upstream
process, that will be wired as input of the newly created
process instance."""
raise Exception('Interface method called')
class TransformationProcessOutputInterface:
"""This interface has to be implemented by all pipeline instances,
both synchronous and asynchronous. When an instance reaches
stopped state, it has to guarantee that both the upstream and
downstream instances will detect EOF or that an exception is
raised when output access is attempted."""
def getOutputStreamDescriptor(self):
"""Get the file descriptor to read output from this output
interface. When supported, a downstream asynchronous process
may decide to operate only using the stream, eliminating the
need to be invoked for IO operations after starting.
@return the file descriptor, pipe or socket or None if stream
operation is not available."""
raise Exception('Interface method called')
def readData(self, length):
"""Read data from this output. This method must not block
as it is usually invoked from synchronous pipeline elements.
@return at most length bytes of data, zero-length data
if nothing available at the moment and None when end of input
was reached."""
raise Exception('Interface method called')
def close(self):
"""Close this interface. This will guarantee, that any future
access will report EOF or an error.
@raise Exception if close is attempted there still is data
available."""
raise Exception('Interface method called')
class TransformationProcessInterface:
"""This is the interface of all pipe transformation process
instances."""
def getProcessOutput(self):
"""Get the output connector of this transformation process.
After calling this method, it is not possible to set an output
stream using setProcessOutputStream."""
raise Exception('Interface method called')
def setProcessOutputStream(self, processOutputStream):
"""Some processes may also support setting of an output stream
file descriptor. This is especially useful if the process
is the last one in a pipeline and hence could write directly
to a file or network descriptor. After calling this method,
it is not possible to switch back to getProcessOutput.
@throw Exception if this process does not support setting
of output stream descriptors."""
raise Exception('Interface method called')
def isAsynchronous(self):
"""A asynchronous process just needs to be started and will
perform data processing on streams without any further interaction
while running. This method may raise an exception when the
process element was not completely connected yet: operation
mode might not be known yet."""
raise Exception('Interface method called')
def start(self):
"""Start this execution process."""
raise Exception('Interface method called')
def stop(self):
"""Stop this execution process when still running.
@return None when the instance was already stopped, otherwise
information about stopping, e.g. the stop error message when
the process was really stopped."""
raise Exception('Interface method called')
def isRunning(self):
"""See if this process instance is still running.
@return False if instance was not yet started or already stopped.
If there are any unreported pending errors from execution,
this method will return True until doProcess() or stop() is
called at least once."""
raise Exception('Interface method called')
def doProcess(self):
"""This method triggers the data transformation operation
of this component. For components in synchronous mode, the
method will attempt to move data from input to output. Asynchronous
components will just check the processing status and may raise
an exception, when processing terminated with errors. As such
a component might not be able to detect the amount of data
really moved since last invocation, the component may report
a fake single byte move.
@throws Exception if an uncorrectable transformation state
was reached and transformation cannot proceed, even though
end of input data was not yet seen. Raise exception also when
process was not started or already stopped.
@return the number of bytes read or written or at least a
value greater than zero if any data was processed. A value of zero
indicates that data processing is currently not possible
due to filled buffers but should be attempted again. A value
below zero indicates that all input data was processed and
output buffers were flushed already."""
raise Exception('Interface method called')
def getBlockingStreams(self, readStreamList, writeStreamList):
"""Collect the file descriptors that are currently blocking
this synchronous component."""
raise Exception('Interface method called')
class SchedulableGeneratorUnitInterface:
"""This is the interface each generator unit has to provide
for interaction with a backup generator component. Therefore
this component has to provide both information about scheduling
and the backup data elements on request. In return, it receives
configuration information and persistency support from the
invoker."""
def __init__(self, unitName, configContext):
"""Initialize this unit using the given configuration. The
new object has to keep a reference to it when needed.
@param unitName The name of the activated unit main file in
/etc/guerillabackup/units."""
raise Exception('Interface method called')
def getNextInvocationTime(self):
"""Get the time in seconds until this unit should called again.
If a unit does not know (yet) as invocation needs depend on
external events, it should report a reasonable low value to
be queried again soon.
@return 0 if the unit should be invoked immediately, the seconds
to go otherwise."""
raise Exception('Interface method called')
def invokeUnit(self, sink):
"""Invoke this unit to create backup elements and pass them
on to the sink. Even when indicated via getNextInvocationTime,
the unit may decide that it is not yet ready and not write
any element to the sink.
@return None if currently there is nothing to write to the
sink, a number of seconds to retry invocation if the unit
assumes that there is data to be processed but processing
cannot start yet, e.g. due to locks held by other parties
or resources, e.g. network storage, currently not available.
@throw Exception if the unit internal logic failed in any
uncorrectable ways. Even when invoker decides to continue
processing, it must not reinvoke this unit before complete
reload."""
raise Exception('Interface method called')
class SinkInterface:
"""This is the interface each sink has to provide to store backup
data elements from different sources."""
def __init__(self, configContext):
"""Initialize this sink with parameters from the given configuration
context."""
raise Exception('Interface method called')
def getSinkHandle(self, sourceUrl):
"""Get a handle to perform transfer of a single backup data
element to a sink."""
raise Exception('Interface method called')
class SinkHandleInterface:
"""This is the common interface of all sink handles to store
a single backup data element to a sink."""
def getSinkStream(self):
"""Get the file descriptor to write directly to the open backup
data element at the sink, if available. The stream should
not be closed using os.close(), but via the close method from
SinkHandleInterface.
@return the file descriptor or None when not supported."""
raise Exception('Interface method called')
def write(self, data):
"""Write data to the open backup data element at the sink."""
raise Exception('Interface method called')
def close(self, metaInfo):
"""Close the backup data element at the sink and receive any
pending or current error associated with the writing process.
When there is sufficient risk that data written to the sink
might have been corrupted during transit or storage, the
sink may decide to perform a verification operation while
closing and return any verification errors here also.
@param metaInfo python objects with additional information
about this backup data element. This information is added
at the end of the sink procedure to allow inclusion of checksum
or signature fields created on the fly while writing. See
design and implementation documentation for requirements on
those objects."""
raise Exception('Interface method called')
def getElementId(self):
"""Get the storage element ID of the previously written data.
@throws Exception if the element ID is not yet available because
the object is not closed yet."""
raise Exception('Interface method called')
class StorageInterface:
"""This is the interface of all stores for backup data elements
providing access to content data and metainfo but also additional
storage attributes. The main difference to a generator unit
is that data is just retrieved but not generated on invocation."""
def __init__(self, configContext):
"""Initialize this store with parameters from the given configuration
context."""
raise Exception('Interface method called')
def getSinkHandle(self, sourceUrl):
"""Get a handle to perform transfer of a single backup data
element to a sink. This method may never block or raise an
exception, even other concurrent sink, query or update procedures
are in progress."""
raise Exception('Interface method called')
def getBackupDataElement(self, elementId):
"""Retrieve a single stored backup data element from the storage.
@param elementId the storage ID of the backup data element.
@throws Exception when an incompatible query, update or read
is in progress."""
raise Exception('Interface method called')
def getBackupDataElementForMetaData(self, sourceUrl, metaData):
"""Retrieve a single stored backup data element from the storage.
@param sourceUrl the URL identifying the source that produced
the stored data elements.
@param metaData metaData dictionary for the element of interest.
@throws Exception when an incompatible query, update or read
is in progress."""
raise Exception('Interface method called')
def queryBackupDataElements(self, query):
"""Query this storage.
@param query if None, return an iterator over all stored elements.
Otherwise query has to be a function returning True or False
for StorageBackupDataElementInterface elements.
@return BackupDataElementQueryResult iterator for this query.
@throws Exception if there are any open queries or updates
preventing response."""
raise Exception('Interface method called')
class StorageBackupDataElementInterface:
"""This class encapsulates access to a stored backup data element."""
def getElementId(self):
"""Get the storage element ID of this data element."""
raise Exception('Interface method called')
def getSourceUrl(self):
"""Get the source URL of the storage element."""
raise Exception('Interface method called')
def getMetaData(self):
"""Get only the metadata part of this element.
@return a BackupElementMetainfo object"""
raise Exception('Interface method called')
def getDataStream(self):
"""Get a stream to read data from that element.
@return a file descriptor for reading this stream."""
raise Exception('Interface method called')
def setExtraData(self, name, value):
"""Attach or detach extra data to this storage element. This
function is intended for agents to use the storage to persist
this specific data also.
@param value the extra data content or None to remove the
element."""
raise Exception('Interface method called')
def getExtraData(self, name):
"""@return None when no extra data was found, the content
otherwise"""
raise Exception('Interface method called')
def delete(self):
"""Delete this data element and all extra data element."""
raise Exception('Interface method called')
def lock(self):
"""Lock this backup data element.
@throws Exception if the element does not exist any more or
cannot be locked"""
raise Exception('Interface method called')
def unlock(self):
"""Unlock this backup data element."""
raise Exception('Interface method called')
class BackupDataElementQueryResult():
"""This is the interface of all query results."""
def getNextElement(self):
"""Get the next backup data element from this query iterator.
@return a StorageBackupDataElementInterface object."""
raise Exception('Interface method called')
# Define common functions:
def isValueListOfType(value, targetType):
"""Check if a give value is a list of values of given target
type."""
if not isinstance(value, list):
return False
for item in value:
if not isinstance(item, targetType):
return False
return True
def assertSourceUrlSpecificationConforming(sourceUrl):
"""Assert that the source URL is according to specification."""
if (sourceUrl[0] != '/') or (sourceUrl[-1] == '/'):
raise Exception('Slashes not conforming')
for urlPart in sourceUrl[1:].split('/'):
if len(urlPart) == 0:
raise Exception('No path part between slashes')
if urlPart in ('.', '..'):
raise Exception('. and .. forbidden')
for urlChar in urlPart:
if urlChar not in '%-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz':
raise Exception('Invalid character %s in URL part' % repr(urlChar))
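# Illustrative examples (not part of the original module; the URLs are
# hypothetical):
#   assertSourceUrlSpecificationConforming('/root/LogfileBackupUnit')  # passes
#   assertSourceUrlSpecificationConforming('/root/')    # raises: trailing slash
#   assertSourceUrlSpecificationConforming('/a/../b')   # raises: ".." forbidden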
def getDefaultDownstreamPipeline(configContext, encryptionKeyName):
"""This function returns the default processing pipeline for
locally generated backup data. It uses the GeneralDefaultCompressionElement
and GeneralDefaultEncryptionElement parameters from the configuration
to generate the pipeline, including a DigestPipelineElement
at the end.
@param encryptionKeyName when not None, this method will change the
key in the GeneralDefaultEncryptionElement when defined. Otherwise
a default gpg encryption element is created using keys from
/etc/guerillabackup/keys."""
downstreamPipelineElements = []
compressionElement = configContext.get(
'GeneralDefaultCompressionElement', None)
if compressionElement is not None:
downstreamPipelineElements.append(compressionElement)
encryptionElement = configContext.get('GeneralDefaultEncryptionElement', None)
if encryptionKeyName is not None:
if encryptionElement is not None:
encryptionElement = encryptionElement.replaceKey(encryptionKeyName)
else:
encryptionCallArguments = GpgEncryptionPipelineElement.gpgDefaultCallArguments
if compressionElement is not None:
# Create a new list to avoid modifying the shared default arguments.
encryptionCallArguments = encryptionCallArguments + [
'--compress-algo', 'none']
encryptionElement = GpgEncryptionPipelineElement(
encryptionKeyName, encryptionCallArguments)
if encryptionElement is not None:
downstreamPipelineElements.append(encryptionElement)
downstreamPipelineElements.append(DigestPipelineElement())
return downstreamPipelineElements
def instantiateTransformationPipeline(
pipelineElements, upstreamProcessOutput, downstreamProcessOutputStream,
doStartFlag=False):
"""Create transformation instances for the list of given pipeline
elements.
@param upstreamProcessOutput TransformationProcessOutputInterface
upstream output. This parameter might be None for a pipeline
element at first position creating the backup data internally.
@param downstreamProcessOutputStream if not None, this stream
is set as the output of the last pipeline element."""
if ((upstreamProcessOutput is not None) and
(not isinstance(
upstreamProcessOutput, TransformationProcessOutputInterface))):
raise Exception('upstreamProcessOutput not an instance of TransformationProcessOutputInterface')
if doStartFlag and (downstreamProcessOutputStream is None):
raise Exception('Cannot autostart instances without downstream')
instanceList = []
lastInstance = None
for element in pipelineElements:
if lastInstance is not None:
upstreamProcessOutput = lastInstance.getProcessOutput()
instance = element.getExecutionInstance(upstreamProcessOutput)
upstreamProcessOutput = None
lastInstance = instance
instanceList.append(instance)
if downstreamProcessOutputStream is not None:
lastInstance.setProcessOutputStream(downstreamProcessOutputStream)
if doStartFlag:
for instance in instanceList:
instance.start()
return instanceList
def runTransformationPipeline(pipelineInstances):
"""Run all processes included in the pipeline until processing
is complete or the first uncorrectable error is detected by
transformation process. All transformation instances have to
be started before calling this method.
@param pipelineInstances the list of pipeline instances. The
instances have to be sorted, so the one reading or creating
the input data is the first, the one writing to the sink the
last in the list.
@throws Exception when the first failing pipeline element is detected.
This will not terminate the whole pipeline, other elements have
to be stopped explicitly."""
# Keep list of still running synchronous instances.
syncInstancesList = []
for instance in pipelineInstances:
if not instance.isAsynchronous():
syncInstancesList.append(instance)
while True:
# Run all the synchronous units first.
processState = -1
for instance in syncInstancesList:
result = instance.doProcess()
processState = max(processState, result)
if result < 0:
if instance.isRunning():
raise Exception('Logic error')
syncInstancesList.remove(instance)
# Fake state and pretend something was moved. Modifying the list
# in loop might skip elements, thus cause invalid state information
# aggregation.
processState = 1
break
if instance.isAsynchronous():
# All synchronous IO was completed, the component may behave
# now like an asynchronous one. Therefore remove it from the
# list, otherwise we might spin here until the component stops
# because it will not report any blocking file descriptors below,
# thus skipping select() when there are no other blocking components
# in syncInstancesList.
syncInstancesList.remove(instance)
if processState == -1:
# There are no running synchronous instances, all remaining ones
# are asynchronous and do not need select() for synchronous IO
# data moving below.
break
if processState != 0:
# At least some data was moved, continue moving.
continue
# Not a single synchronous instance was able to move data. This
# is due to a blocking read operation at the upstream side of
# the pipeline or the downstream side write, e.g. due to network
# or filesystem IO blocking. Update the state of all synchronous
# components and if all are still running, wait for any IO to
# end blocking.
readStreamList = []
writeStreamList = []
for instance in syncInstancesList:
instance.getBlockingStreams(readStreamList, writeStreamList)
if readStreamList or writeStreamList:
# Wait for at least one stream from a synchronous instance to
# be ready for IO. There might be none at all due to a former
# synchronous component having finished handling all synchronous
# IO.
select.select(readStreamList, writeStreamList, [], 1)
# Only asynchronous instances remain (if any), so just wait until
# each one of them has stopped.
for instance in pipelineInstances:
while instance.isRunning():
time.sleep(1)
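# Illustrative usage sketch (not part of the original module; upstreamOutput
# is assumed to be some TransformationProcessOutputInterface implementation
# delivering the raw backup data, sinkFd a writable file descriptor and
# configContext the loaded configuration dictionary):
#   elements = getDefaultDownstreamPipeline(configContext, None)
#   instances = instantiateTransformationPipeline(
#       elements, upstreamOutput, sinkFd, doStartFlag=True)
#   runTransformationPipeline(instances)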
def listDirAt(dirFd, path='.'):
"""This function provides the os.listdir() functionality to
list files in an opened directory for python2.x. With Python
3.x listdir() also accepts file descriptors as argument."""
currentDirFd = os.open('.', os.O_DIRECTORY|os.O_RDONLY|os.O_NOCTTY)
result = None
try:
os.fchdir(dirFd)
result = os.listdir(path)
finally:
os.fchdir(currentDirFd)
os.close(currentDirFd)
return result
def secureOpenAt(
dirFd, pathName, symlinksAllowedFlag=False,
dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY,
dirCreateMode=None, fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY,
fileCreateMode=None):
"""Perform a secure opening of the file. This function does
not circumvent the umask in place when applying create modes.
@param dirFd the directory file descriptor where to open a relative
pathName, ignored for absolute pathName values.
@param pathName open the path denoted by this string relative
to the dirFd directory. When pathName is absolute, dirFd will
be ignored. The pathName must not end with '/' unless the directory
path '/' itself is specified. Distinction what should be opened
has to be made using the flags.
@param symlinksAllowedFlag when not set to True, O_NOFOLLOW
will be added to each open call.
@param dirOpenFlags these flags are used to open directory path
components without modification. The flags have to include the
O_DIRECTORY flag. The O_NOFOLLOW and O_NOCTTY flags are strongly
recommended and should be omitted only in special cases.
@param dirCreateMode if not None, missing directories will be created
with the given mode.
@param fileOpenFlags flags to apply when opening the last component
@param fileCreateMode if not None, missing files will be created
when O_CREAT was also in the fileOpenFlags."""
if (dirOpenFlags&os.O_DIRECTORY) == 0:
raise Exception('Directory open flags have to include O_DIRECTORY')
if not symlinksAllowedFlag:
dirOpenFlags |= os.O_NOFOLLOW
fileOpenFlags |= os.O_NOFOLLOW
if fileCreateMode is None:
fileCreateMode = 0
if pathName == '/':
return os.open(pathName, os.O_RDONLY|os.O_DIRECTORY|os.O_NOCTTY)
if pathName.endswith('/'):
raise Exception('Invalid path value')
currentDirFd = dirFd
if pathName.startswith('/'):
currentDirFd = os.open(
'/', os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY)
pathName = pathName[1:]
pathNameParts = pathName.split('/')
try:
# Traverse all the directory path parts, but just one at a time
# to avoid following symlinks.
for pathNamePart in pathNameParts[:-1]:
try:
nextDirFd = os.open(pathNamePart, dirOpenFlags, dir_fd=currentDirFd)
except OSError as openError:
if openError.errno == errno.EACCES:
raise
if dirCreateMode is None:
raise
os.mkdir(pathNamePart, mode=dirCreateMode, dir_fd=currentDirFd)
nextDirFd = os.open(pathNamePart, dirOpenFlags, dir_fd=currentDirFd)
if currentDirFd != dirFd:
os.close(currentDirFd)
currentDirFd = nextDirFd
# Now open the last part. Always open last part separately,
# also for directories: the last open may use different flags.
directoryCreateFlag = False
if (((fileOpenFlags&os.O_DIRECTORY) != 0) and
((fileOpenFlags&os.O_CREAT) != 0)):
directoryCreateFlag = True
# Clear the create flag, otherwise open would create a file instead
# of a directory, ignoring the O_DIRECTORY flag.
fileOpenFlags &= ~os.O_CREAT
resultFd = None
try:
resultFd = os.open(
pathNameParts[-1], fileOpenFlags, mode=fileCreateMode,
dir_fd=currentDirFd)
except OSError as openError:
if (not directoryCreateFlag) or (openError.errno != errno.ENOENT):
raise
os.mkdir(pathNameParts[-1], mode=dirCreateMode, dir_fd=currentDirFd)
resultFd = os.open(
pathNameParts[-1], fileOpenFlags, dir_fd=currentDirFd)
return resultFd
finally:
# Make sure to close the currentDirFd, otherwise we leak one fd
# per error.
if currentDirFd != dirFd:
os.close(currentDirFd)
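# Illustrative usage sketch (not part of the original module; the path is
# hypothetical):
#   fd = secureOpenAt(
#       -1, '/var/lib/guerillabackup/state/example.dat',
#       fileOpenFlags=os.O_WRONLY|os.O_CREAT|os.O_NOFOLLOW|os.O_NOCTTY,
#       fileCreateMode=0o600, dirCreateMode=0o700)
#   os.write(fd, b'...')
#   os.close(fd)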
# Fail on all errors not related to concurrent proc filesystem
# changes.
OPENER_INFO_FAIL_ON_ERROR = 0
# Do not fail on errors related to limited permissions accessing
# the information. This flag is needed when running without root
# privileges.
OPENER_INFO_IGNORE_ACCESS_ERRORS = 1
def getFileOpenerInformation(pathNameList, checkMode=OPENER_INFO_FAIL_ON_ERROR):
"""Get information about processes currently having access to
one of the absolute pathnames from the list. This is done reading
information from the proc filesystem. As access to proc might
be limited for processes with limited permissions, the function
can be forced to ignore the permission errors occurring during
those checks.
CAVEAT: The checks are meaningful to detect concurrent write
access to files where e.g. a daemon did not close them on error
or a file is currently filled. The function is always racy,
a malicious process can also trick guerillabackup to believe
a file is in steady state and not currently written even when
that is not true.
@param pathNameList a list of absolute pathnames to check in
parallel. All those entries have to pass a call to os.path.realpath
unmodified.
@return a list containing one entry per pathNameList entry.
The entry can be None if no access to the file was detected.
Otherwise the entry is a list with tuples containing the pid
of the process having access to the file and a list with tuples
containing the fd within that process and the flags."""
for pathName in pathNameList:
if pathName != os.path.realpath(pathName):
raise Exception('%s is not an absolute, canonical path' % pathName)
if checkMode not in [OPENER_INFO_FAIL_ON_ERROR, OPENER_INFO_IGNORE_ACCESS_ERRORS]:
raise Exception('Invalid checkMode given')
resultList = [None]*len(pathNameList)
for procPidName in os.listdir('/proc'):
procPid = -1
try:
procPid = int(procPidName)
except ValueError:
continue
fdDirName = '/proc/%s/fd' % procPidName
fdInfoDirName = '/proc/%s/fdinfo' % procPidName
fdFileList = []
try:
fdFileList = os.listdir(fdDirName)
except OSError as fdListException:
if fdListException.errno == errno.ENOENT:
continue
if ((fdListException.errno == errno.EACCES) and
(checkMode == OPENER_INFO_IGNORE_ACCESS_ERRORS)):
continue
raise
for openFdName in fdFileList:
targetPathName = None
try:
targetPathName = os.readlink('%s/%s' % (fdDirName, openFdName))
except OSError as readLinkError:
if readLinkError.errno == errno.ENOENT:
continue
raise
pathNameIndex = -1
try:
pathNameIndex = pathNameList.index(targetPathName)
except ValueError:
continue
# At least one hit, read the data.
infoFd = os.open(
'%s/%s' % (fdInfoDirName, openFdName),
os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY)
infoData = os.read(infoFd, 1<<16)
os.close(infoFd)
splitPos = infoData.find(b'flags:\t')
if splitPos < 0:
raise Exception('Unexpected proc behaviour')
endPos = infoData.find(b'\n', splitPos)
infoTuple = (int(openFdName), int(infoData[splitPos+7:endPos], 8))
while True:
if resultList[pathNameIndex] is None:
resultList[pathNameIndex] = [(procPid, [infoTuple])]
else:
pathNameInfo = resultList[pathNameIndex]
indexPos = -1-len(pathNameInfo)
for index, entry in enumerate(pathNameInfo):
if entry[0] == procPid:
indexPos = index
break
if entry[0] > procPid:
indexPos = -1-index
break
if indexPos >= 0:
pathNameInfo[indexPos][1].append(infoTuple)
else:
indexPos = -1-indexPos
pathNameInfo.insert(indexPos, (procPid, [infoTuple]))
try:
pathNameIndex = pathNameList.index(targetPathName, pathNameIndex+1)
except ValueError:
break
return resultList
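# Illustrative example of the return structure (not part of the original
# module; all values are hypothetical):
#   getFileOpenerInformation(['/var/log/syslog', '/tmp/unused'])
#   -> [[(1042, [(4, 0o100001)])], None]
# i.e. PID 1042 has file descriptor 4 opened on /var/log/syslog with octal
# flags 0o100001, while no process currently has /tmp/unused open.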
def getPersistencyBaseDirPathname(configContext):
"""Get the persistency data directory pathname from configuration
or return the default value."""
return configContext.get(
CONFIG_GENERAL_PERSISTENCY_BASE_DIR_KEY,
CONFIG_GENERAL_PERSISTENCY_BASE_DIR_DEFAULT)
def getRuntimeDataDirPathname(configContext):
"""Get the runtime data directory pathname from configuration
or return the default value."""
return configContext.get(
CONFIG_GENERAL_RUNTIME_DATA_DIR_KEY,
CONFIG_GENERAL_RUNTIME_DATA_DIR_DEFAULT)
def openPersistencyFile(configContext, pathName, flags, mode):
"""Open or possibly create a persistency file in the default
persistency directory."""
baseDir = getPersistencyBaseDirPathname(configContext)
return secureOpenAt(
-1, os.path.join(baseDir, pathName), symlinksAllowedFlag=False,
dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY,
dirCreateMode=0o700, fileOpenFlags=flags|os.O_NOFOLLOW|os.O_NOCTTY,
fileCreateMode=mode)
def readFully(readFd):
"""Read data from a file descriptor until EOF is reached."""
data = b''
while True:
block = os.read(readFd, 1<<16)
if len(block) == 0:
break
data += block
return data
def execConfigFile(configFileName, configContext):
"""Load code from file and execute it with given global context."""
configFile = open(configFileName, 'r')
configData = configFile.read()
configFile.close()
configCode = compile(configData, configFileName, 'exec')
exec(configCode, configContext, configContext)
# Load some classes into this namespace as shortcut for use in
# configuration files.
from guerillabackup.DefaultFileSystemSink import DefaultFileSystemSink
from guerillabackup.DigestPipelineElement import DigestPipelineElement
from guerillabackup.GpgEncryptionPipelineElement import GpgEncryptionPipelineElement
from guerillabackup.OSProcessPipelineElement import OSProcessPipelineElement
from guerillabackup.Transfer import SenderMoveDataTransferPolicy
from guerillabackup.UnitRunConditions import AverageLoadLimitCondition
from guerillabackup.UnitRunConditions import LogicalAndCondition
from guerillabackup.UnitRunConditions import MinPowerOnTimeCondition
import guerillabackup.Utils
guerillabackup-0.5.0/src/lib/guerillabackup/storagetool/ 0000775 0000000 0000000 00000000000 14501370353 0023400 5 ustar 00root root 0000000 0000000 guerillabackup-0.5.0/src/lib/guerillabackup/storagetool/Policy.py 0000664 0000000 0000000 00000002603 14501370353 0025212 0 ustar 00root root 0000000 0000000 class Policy():
"""This class defines a policy instance."""
def __init__(self, policyConfig):
"""Instantiate this policy object from a JSON policy configuration
definition."""
if not isinstance(policyConfig, dict):
raise Exception(
'Policy configuration has to be a dictionary: %s' % (
repr(policyConfig)))
self.policyName = policyConfig['Name']
if not isinstance(self.policyName, str):
raise Exception('Policy name has to be a string')
self.policyPriority = policyConfig.get('Priority', 0)
if not isinstance(self.policyPriority, int):
raise Exception('Policy priority has to be an integer')
def getPolicyName(self):
"""Get the name of this policy."""
return self.policyName
def getPriority(self):
"""Get the priority of this policy. A higher number indicates
that a policy has higher priority and shall override any
policies with lower or equal priority."""
return self.policyPriority
def apply(self, sourceStatus):
"""Apply this policy to a backup source status."""
raise Exception('Abstract method called')
def delete(self, sourceStatus):
"""Delete the policy status data of this policy for all elements
marked for deletion and update status data of those elements
kept to allow validation even after deletion of some elements."""
raise Exception('Abstract method called')
guerillabackup-0.5.0/src/lib/guerillabackup/storagetool/PolicyTypeInterval.py 0000664 0000000 0000000 00000023433 14501370353 0027565 0 ustar 00root root 0000000 0000000 """This module provides support for the backup interval checking
policy."""
import json
import sys
import guerillabackup.storagetool.Policy
class PolicyTypeInterval(guerillabackup.storagetool.Policy.Policy):
"""This policy type defines a policy checking file intervals
between full and incremental backups (if available).
Applying the policies will also use and modify following backup
data element status information fields:
* LastFull, LastInc: Timestamp of previous element in case
the element was deleted. Otherwise policy checks would report
policy failures when pruning data storage elements.
* Ignore: When set ignore interval policy violations of that
type, i.e. full, inc, both related to the previous backup
data element.
* Dead: When set, this source will not produce any more backups.
Therefore ignore any gap after this last backup and report
a policy violation if more backups are seen."""
POLICY_NAME = 'Interval'
def __init__(self, policyConfig):
"""Instantiate this policy object from a JSON policy configuration
definition."""
super().__init__(policyConfig)
self.config = {}
self.timeFullMin = None
self.timeFullMax = None
# This is the minimal time between incremental backups or None
# if this source does not emit any incremental backups.
self.timeIncMin = None
self.timeIncMax = None
for configKey, configValue in policyConfig.items():
if configKey in ('Name', 'Priority'):
continue
self.config[configKey] = configValue
if configKey not in ('FullMax', 'FullMin', 'IncMax', 'IncMin'):
raise Exception('Unknown policy configuration setting "%s"' % configKey)
timeVal = None
if configValue is not None:
if not isinstance(configValue, str):
raise Exception(
'Policy setting "%s" has to be a string' % configKey)
timeVal = guerillabackup.Utils.parseTimeDef(configValue)
if configKey == 'FullMax':
self.timeFullMax = timeVal
elif configKey == 'FullMin':
self.timeFullMin = timeVal
elif configKey == 'IncMax':
self.timeIncMax = timeVal
else:
self.timeIncMin = timeVal
# Validate the configuration settings.
if (self.timeFullMin is None) or (self.timeFullMax is None):
raise Exception('FullMin and FullMax settings are mandatory')
if self.timeFullMin > self.timeFullMax:
raise Exception('FullMin must not be greater than FullMax')
if self.timeIncMax is None:
if self.timeIncMin is not None:
raise Exception('IncMin defined but IncMax is missing')
elif self.timeIncMin is None:
raise Exception('IncMax defined but IncMin is missing')
elif (self.timeIncMin > self.timeIncMax) or \
(self.timeIncMin > self.timeFullMin) or \
(self.timeIncMax > self.timeFullMax):
raise Exception('Inconsistent IncMin/IncMax/FullMin/FullMax settings')
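# Illustrative policy configuration sketch (not part of the original module;
# the interval values are hypothetical): accept daily incremental and weekly
# full backups with some tolerance for schedule jitter.
#   PolicyTypeInterval({
#       'Name': 'Interval',
#       'FullMin': '6d', 'FullMax': '8d',
#       'IncMin': '20H', 'IncMax': '28H'})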
def apply(self, sourceStatus):
"""Apply this policy to a backup source status."""
elementList = sourceStatus.getDataElementList()
currentPolicy = None
lastElement = None
lastFullElementName = None
lastFullTime = None
lastIncTime = None
for elem in elementList:
policyData = elem.getPolicyData(PolicyTypeInterval.POLICY_NAME)
ignoreType = None
if policyData is not None:
for key in policyData.keys():
if key not in ('Config', 'Ignore', 'LastFull', 'LastInc'):
raise Exception(
'Policy status configuration for "%s" in ' \
'"%s" corrupted: %s' % (
elem.getElementName(),
sourceStatus.getStorageStatus().getStatusFileName(),
repr(policyData)))
if 'LastFull' in policyData:
lastFullTime = policyData['LastFull']
lastFullElementName = '?'
lastIncTime = policyData['LastInc']
if 'Config' in policyData:
# Use the additional policy data.
initConfig = dict(policyData['Config'])
initConfig['Name'] = PolicyTypeInterval.POLICY_NAME
currentPolicy = PolicyTypeInterval(initConfig)
ignoreType = policyData.get('Ignore', None)
if currentPolicy is None:
# This is the first element and default policy data was not copied
# yet.
elem.updatePolicyData(
PolicyTypeInterval.POLICY_NAME,
{'Config': self.config})
currentPolicy = self
elemTime = elem.getDatetimeSeconds()
if lastFullTime is None:
# This is the first element.
if elem.getType() != 'full':
print(
'Backup source %s starts with incremental ' \
'backups, storage might be corrupted.' % (
sourceStatus.getSourceName()), file=sys.stderr)
lastFullTime = lastIncTime = elemTime
lastFullElementName = elem.getElementName()
else:
# Now check the time policy.
fullValidationOkFlag = False
if elem.getType() == 'full':
if ignoreType not in ('both', 'full'):
fullDelta = elemTime - lastFullTime
if fullDelta < currentPolicy.timeFullMin:
print(
'Backup source %s emitting full backups ' \
'faster than expected between "%s" and "%s".' % (
sourceStatus.getSourceName(),
lastFullElementName,
elem.getElementName()), file=sys.stderr)
elif fullDelta > currentPolicy.timeFullMax:
print(
'Backup source "%s" full interval too big ' \
'between "%s" and "%s".' % (
sourceStatus.getSourceName(),
lastFullElementName,
elem.getElementName()), file=sys.stderr)
else:
fullValidationOkFlag = True
if not fullValidationOkFlag:
print(
'No interactive/automatic policy or ' \
'status update, consider adding this ' \
'manually to the status in %s:\n%s' % (
sourceStatus.getStorageStatus().getStatusFileName(),
json.dumps(
{elem.getElementName(): {'Interval': {'Ignore': 'full'}}}, indent=2)
), file=sys.stderr)
lastFullTime = elemTime
lastFullElementName = elem.getElementName()
else:
# This is an incremental backup, so incremental interval policy
# has to be defined.
if currentPolicy.timeIncMin is None:
raise Exception('Not expecting incremental backups')
# Always perform incremental checks if there is a policy for
# it.
if (currentPolicy.timeIncMin is not None) and \
(ignoreType not in ('both', 'inc')):
incDelta = elemTime - lastIncTime
# As full backup scheduling overrides the lower priority incremental
# schedule, that may have triggered a full backup while no incremental
# would have been required yet.
if (incDelta < currentPolicy.timeIncMin) and \
(elem.getType() != 'full') and (not fullValidationOkFlag):
print(
'Backup source %s emitting inc backups ' \
'faster than expected between "%s" and "%s".' % (
sourceStatus.getSourceName(),
lastElement.getElementName(),
elem.getElementName()), file=sys.stderr)
elif incDelta > currentPolicy.timeIncMax:
print(
'Backup source "%s" inc interval too big ' \
'between "%s" and "%s".' % (
sourceStatus.getSourceName(),
lastElement.getElementName(),
elem.getElementName()), file=sys.stderr)
print(
'No interactive/automatic policy or ' \
'status update, consider adding this ' \
'manually to the status in %s:\n%s' % (
sourceStatus.getStorageStatus().getStatusFileName(),
json.dumps(
{elem.getElementName(): {'Interval': {'Ignore': 'inc'}}}, indent=2)
), file=sys.stderr)
lastElement = elem
lastIncTime = elemTime
def delete(self, sourceStatus):
"""Prepare the policy status data for deletions going to
happen later on."""
elementList = sourceStatus.getDataElementList()
lastElement = None
lastFullTime = None
lastIncTime = None
# Keep also track of the last persistent and the current policy.
# When deleting an element with policy updates then move the
# current policy data to the first element not deleted.
persistentPolicyConfig = currentPolicyConfig = self.config
for elem in elementList:
policyData = elem.getPolicyData(PolicyTypeInterval.POLICY_NAME)
if policyData is not None:
if 'LastFull' in policyData:
lastFullTime = policyData['LastFull']
lastIncTime = policyData['LastInc']
if 'Config' in policyData:
currentPolicyConfig = policyData['Config']
if not elem.isMarkedForDeletion():
persistentPolicyConfig = currentPolicyConfig
elemTime = elem.getDatetimeSeconds()
if lastFullTime is None:
# This is the first element.
lastFullTime = lastIncTime = elemTime
else:
if (lastElement is not None) and \
(lastElement.isMarkedForDeletion()) and \
(not elem.isMarkedForDeletion()):
# So the previous element is to be deleted but this one is not. Make
# sure last timings are kept in policy data.
if policyData is None:
policyData = {}
elem.setPolicyData(PolicyTypeInterval.POLICY_NAME, policyData)
policyData['LastFull'] = lastFullTime
policyData['LastInc'] = lastIncTime
# Also copy policy configuration if the element defining the
# current policy was not persisted.
if persistentPolicyConfig != currentPolicyConfig:
policyData['Config'] = currentPolicyConfig
persistentPolicyConfig = currentPolicyConfig
if elem.getType() == 'full':
lastFullTime = elemTime
lastElement = elem
lastIncTime = elemTime
guerillabackup-0.5.0/src/lib/guerillabackup/storagetool/PolicyTypeLevelRetention.py 0000664 0000000 0000000 00000023047 14501370353 0030741 0 ustar 00root root 0000000 0000000 """This module provides backup data retention policy support."""
import datetime
import time
import guerillabackup.storagetool.Policy
class PolicyTypeLevelRetentionTagger():
"""This is a helper class for PolicyTypeLevelRetention to tag
backup elements to be kept."""
# This is the specification for calendar-based alignment. Each
# tuple contains the name and the offset when counting does not
# start with 0.
ALIGN_ATTR_SPEC = (
('year', 0), ('month', 1), ('day', 1),
('hour', 0), ('minute', 0), ('second', 0))
def __init__(self, configDict):
"""Create a new aligned tagger component. The class will
check all available backup data elements and tag those to
be kept, that match the specification of this tagger. All
backup data elements tagged by at least one tagger will be
kept, the others marked for deletion.
Configuration parameters for the tagger are:
* KeepCount: This is the number of time intervals to search
for matching backup data elements and thus the number of
backups to keep at most. When there is no matching backup
found within a time interval, the empty interval is also
included in the count. When there is more than one backup
within the interval, only the first one is kept. Thus e.g.
having a "KeepCount" of 30 and an "Interval" setting of
"day", this will cause backups to be kept from now back
to 30 days in the past (or counted from the time of the
latest backup if "TimeRef" was set to "latest").
* KeepInc: When true, keep also incremental backups within
any interval where backups would be kept. This will also
tag the previous full backup and any incremental backups
in between for retention. Incremental backups after the
last full backup are always kept unconditionally as they
are required to restore the latest backup. Default is false.
* Interval: This is the size of the retention interval to
keep one backup per interval. Values are "year", "month",
"day", "hour".
* TimeRef: This defines the time reference to use to perform
the retention policy evaluation. With "latest" the time
of the newest backup is used, while "now" uses the current
time. The default is "now".
" AlignModulus: This defines the modulus when aligning backup
retention to values other than the "Interval" unit, e.g.
to keep one backup every three month.
* AlignValue: When defined this will make backups to be aligned
to that value related to the modulus, e.g. to keep backups
of January, April, July, October an "AlignModulus" of 3
and "AlignValue" of 1 is required."""
self.intervalUnit = None
self.alignModulus = 1
self.alignValue = 0
self.keepCount = None
self.keepIncFlag = False
self.timeReferenceType = 'now'
for configKey, configValue in configDict.items():
if configKey == 'AlignModulus':
if (not isinstance(configValue, int)) or (configValue < 1):
raise Exception('AlignModulus has to be a positive integer')
self.alignModulus = configValue
elif configKey == 'AlignValue':
if (not isinstance(configValue, int)) or (configValue < 0):
raise Exception('AlignValue has to be a non-negative integer')
self.alignValue = configValue
elif configKey == 'Interval':
if configValue not in ['year', 'month', 'day', 'hour']:
raise Exception('Invalid Interval setting %s' % repr(configValue))
self.intervalUnit = configValue
elif configKey == 'KeepCount':
if (not isinstance(configValue, int)) or (configValue < 1):
raise Exception('KeepCount has to be a positive integer')
self.keepCount = configValue
elif configKey == 'KeepInc':
if not isinstance(configValue, bool):
raise Exception('KeepInc has to be a boolean value')
self.keepIncFlag = configValue
elif configKey == 'TimeRef':
if configValue not in ['latest', 'now']:
raise Exception('Invalid TimeRef setting %s' % repr(configValue))
self.timeReferenceType = configValue
else:
raise Exception(
'Unsupported configuration parameter %s' % repr(configKey))
if self.keepCount is None:
raise Exception('Mandatory KeepCount parameter not set')
if (self.intervalUnit is not None) and \
(self.alignModulus <= self.alignValue):
raise Exception('Align value has to be smaller than modulus')
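# Illustrative level configuration sketches (not part of the original module;
# the values are hypothetical): keep one backup per day for 30 days, and one
# backup every third month (January, April, July, October) for four intervals.
#   PolicyTypeLevelRetentionTagger({'KeepCount': 30, 'Interval': 'day'})
#   PolicyTypeLevelRetentionTagger({
#       'KeepCount': 4, 'Interval': 'month',
#       'AlignModulus': 3, 'AlignValue': 1})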
def tagBackups(self, backupList, tagList):
"""Check which backup elements should be kept and tag them
in the tagList.
@param backupList the sorted list of backup data elements.
@param tagList list of boolean values of same length as the
backupList. The list should be initialized to all False values
before calling tagBackups for the first time."""
if len(backupList) != len(tagList):
raise Exception('backupList and tagList have to be of equal length')
# Always tag the newest backup.
tagList[-1] = True
referenceTime = None
if self.timeReferenceType == 'now':
referenceTime = datetime.datetime.fromtimestamp(time.time())
elif self.timeReferenceType == 'latest':
referenceTime = datetime.datetime.fromtimestamp(
backupList[-1].getDatetimeSeconds())
else:
raise Exception('Logic error')
# Prefill the data with the default field offsets.
alignedTimeData = [x[1] for x in self.ALIGN_ATTR_SPEC]
alignFieldValue = None
for alignPos in range(0, len(self.ALIGN_ATTR_SPEC)):
alignedTimeData[alignPos] = getattr(
referenceTime, self.ALIGN_ATTR_SPEC[alignPos][0])
if self.intervalUnit == self.ALIGN_ATTR_SPEC[alignPos][0]:
alignFieldValue = alignedTimeData[alignPos]
break
startSearchTime = datetime.datetime(*alignedTimeData).timestamp()
# Handle alignment and modulus rules.
if (self.alignModulus != 1) and (self.alignValue is not None):
startSearchTime = self.decreaseTimeField(
startSearchTime,
(alignFieldValue - self.alignValue) % self.alignModulus)
keepAttempts = self.keepCount + 1
while keepAttempts != 0:
for pos, backupElement in enumerate(backupList):
backupElement = backupList[pos]
if (backupElement.getDatetimeSeconds() >= startSearchTime) and \
((backupElement.getType() == 'full') or self.keepIncFlag):
tagList[pos] = True
# Tag all incrementals within the interval.
if not self.keepIncFlag:
break
# Move down units of intervalUnit size.
startSearchTime = self.decreaseTimeField(
startSearchTime, self.alignModulus)
keepAttempts -= 1
# Now after tagging select also those incremental backups in between
# to guarantee that they can be really restored.
forceTaggingFlag = True
for pos in range(len(backupList) - 1, -1, -1):
if forceTaggingFlag:
tagList[pos] = True
if backupList[pos].getType() == 'full':
forceTaggingFlag = False
elif (tagList[pos]) and (backupList[pos].getType() == 'inc'):
forceTaggingFlag = True
def decreaseTimeField(self, timestamp, decreaseCount):
"""Decrease a given time value."""
timeValue = datetime.datetime.fromtimestamp(timestamp)
if self.intervalUnit == 'year':
timeValue = timeValue.replace(year=timeValue.year-decreaseCount)
elif self.intervalUnit == 'month':
# Decrease the month values. As the timestamp was aligned to
# the first of the month already, the month value can be simply
# decreased.
for modPos in range(0, decreaseCount):
if timeValue.month == 1:
timeValue = timeValue.replace(year=timeValue.year-1, month=12)
else:
timeValue = timeValue.replace(month=timeValue.month-1)
elif self.intervalUnit == 'day':
timeValue = timeValue - datetime.timedelta(days=decreaseCount)
else:
raise Exception('Logic error')
return timeValue.timestamp()
class PolicyTypeLevelRetention(guerillabackup.storagetool.Policy.Policy):
"""This policy type defines a data retention policy keeping
a number of backups for each time level. The policy itself
does not rely on the backup source status but only on the list
of backup data elmeents. The policy will flag elements to be
deleted when this retention policy has no use for those elmements
and will fail if another, concurrent policy or manual intervention,
has flagged elements for deletion which should be kept according
to this policy."""
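# A minimal, illustrative configuration sketch for this policy type.
# The values are hypothetical; only keys parsed by this class and by
# PolicyTypeLevelRetentionTagger above ('Levels', 'KeepCount',
# 'Interval', 'KeepInc', 'TimeRef') are shown:
#
# levelRetentionConfig = {
#     'Name': 'LevelRetention',
#     'Levels': [
#         {'KeepCount': 30, 'Interval': 'day',
#          'KeepInc': True, 'TimeRef': 'latest'},
#         {'KeepCount': 12, 'Interval': 'month',
#          'KeepInc': False, 'TimeRef': 'latest'}
#     ]
# }
# policy = PolicyTypeLevelRetention(levelRetentionConfig)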
POLICY_NAME = 'LevelRetention'
def __init__(self, policyConfig):
"""Instantiate this policy object from a JSON policy configuration
definition."""
super().__init__(policyConfig)
# This is the list of retention level definitions.
self.levelList = []
for configKey, configValue in policyConfig.items():
if configKey in ('Name', 'Priority'):
continue
if configKey == 'Levels':
self.parseLevelConfig(configValue)
continue
raise Exception('Unknown policy configuration setting "%s"' % configKey)
# Validate the configuration settings.
if not self.levelList:
raise Exception()
def parseLevelConfig(self, levelConfigList):
"""Parse the retention policy level configuration."""
if not isinstance(levelConfigList, list):
raise Exception()
for levelConfig in levelConfigList:
if not isinstance(levelConfig, dict):
raise Exception()
self.levelList.append(PolicyTypeLevelRetentionTagger(levelConfig))
def apply(self, sourceStatus):
"""Apply this policy to a backup source status."""
elementList = sourceStatus.getDataElementList()
tagList = [False] * len(elementList)
for levelTagger in self.levelList:
levelTagger.tagBackups(elementList, tagList)
for tagPos, element in enumerate(elementList):
element.markForDeletion(not tagList[tagPos])
def delete(self, sourceStatus):
"""Prepare the policy status data for deletions going to
happen later on."""
guerillabackup-0.5.0/src/lib/guerillabackup/storagetool/PolicyTypeSize.py 0000664 0000000 0000000 00000024044 14501370353 0026712 0 ustar 00root root 0000000 0000000 """This module provides support for the backup size checking
policy."""
import json
import sys
import guerillabackup.storagetool.Policy
class PolicyTypeSize(guerillabackup.storagetool.Policy.Policy):
"""This policy type defines a policy checking file sizes of
full and incremental backups.
Applying the policy will also use and modify the following backup
data element status information fields:
* FullSizeExpect: The expected size of full backups in bytes.
* FullSizeMax: The maximum size of backups still accepted as
normal.
* FullSizeMin: The minimum size of backups still accepted as
normal.
* IncSizeExpect: The expected size of incremental backups in
bytes."""
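# A minimal, illustrative configuration sketch for this policy type;
# the numbers are hypothetical. Relative limits are factors applied to
# the corresponding expected size, and exactly one of the absolute or
# relative variants has to be given for the full backup limits:
#
# sizePolicyConfig = {
#     'Name': 'Size',
#     'FullSizeExpect': 1 << 30,
#     'FullSizeMinRel': 0.8, 'FullSizeMaxRel': 1.2,
#     'IncSizeExpectRel': 0.1,
#     'IncSizeMinRel': 0.5, 'IncSizeMaxRel': 2.0}
# policy = PolicyTypeSize(sizePolicyConfig)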
POLICY_NAME = 'Size'
def __init__(self, policyConfig):
"""Instantiate this policy object from a JSON policy configuration
definition."""
super().__init__(policyConfig)
self.config = {}
self.fullSizeExpect = None
self.fullSizeMax = None
self.fullSizeMaxRel = None
self.fullSizeMin = None
self.fullSizeMinRel = None
self.incSizeExpect = None
self.incSizeExpectRel = None
self.incSizeMax = None
self.incSizeMaxRel = None
self.incSizeMin = None
self.incSizeMinRel = None
for configKey, configValue in policyConfig.items():
if configKey in ('Name', 'Priority'):
continue
if configKey in ('FullSizeExpect', 'FullSizeMax',
'FullSizeMin', 'IncSizeExpect', 'IncSizeMax', 'IncSizeMin'):
if not isinstance(configValue, int):
raise Exception(
'Policy setting "%s" has to be a integer' % configKey)
if configKey == 'FullSizeExpect':
self.fullSizeExpect = configValue
elif configKey == 'FullSizeMax':
self.fullSizeMax = configValue
elif configKey == 'FullSizeMin':
self.fullSizeMin = configValue
elif configKey == 'IncSizeExpect':
self.incSizeExpect = configValue
elif configKey == 'IncSizeMax':
self.incSizeMax = configValue
elif configKey == 'IncSizeMin':
self.incSizeMin = configValue
elif configKey in ('FullSizeMaxRel', 'FullSizeMinRel',
'IncSizeExpectRel', 'IncSizeMaxRel', 'IncSizeMinRel'):
if not isinstance(configValue, float):
raise Exception(
'Policy setting "%s" has to be a float' % configKey)
if configKey == 'FullSizeMaxRel':
self.fullSizeMaxRel = configValue
elif configKey == 'FullSizeMinRel':
self.fullSizeMinRel = configValue
elif configKey == 'IncSizeExpectRel':
self.incSizeExpectRel = configValue
elif configKey == 'IncSizeMaxRel':
self.incSizeMaxRel = configValue
elif configKey == 'IncSizeMinRel':
self.incSizeMinRel = configValue
else:
raise Exception('Unknown policy configuration setting "%s"' % configKey)
self.config[configKey] = configValue
# Validate the configuration settings. For the full backup limits
# exactly one of the relative and absolute settings has to be given.
if (self.fullSizeMax is not None) == (self.fullSizeMaxRel is not None):
raise Exception()
if (self.fullSizeMin is not None) == (self.fullSizeMinRel is not None):
raise Exception()
# Incremental settings might be missing when there are no incremental
# backups expected.
if (self.incSizeMax is not None) and (self.incSizeMaxRel is not None):
raise Exception()
if (self.incSizeMin is not None) and (self.incSizeMinRel is not None):
raise Exception()
if self.fullSizeExpect is not None:
if self.fullSizeMaxRel is not None:
self.fullSizeMax = int(self.fullSizeMaxRel * self.fullSizeExpect)
if self.fullSizeMinRel is not None:
self.fullSizeMin = int(self.fullSizeMinRel * self.fullSizeExpect)
if self.incSizeExpectRel is not None:
self.incSizeExpect = int(self.incSizeExpectRel * self.fullSizeExpect)
if self.incSizeExpect is not None:
if self.incSizeMaxRel is not None:
self.incSizeMax = int(self.incSizeMaxRel * self.incSizeExpect)
if self.incSizeMinRel is not None:
self.incSizeMin = int(self.incSizeMinRel * self.incSizeExpect)
def apply(self, sourceStatus):
"""Apply this policy to a backup source status."""
elementList = sourceStatus.getDataElementList()
currentPolicy = self
for elem in elementList:
# This is a new policy configuration created while checking this
# element. It has to be persisted before checking the next element.
newConfig = None
policyData = elem.getPolicyData(PolicyTypeSize.POLICY_NAME)
ignoreFlag = False
if policyData is not None:
for key in policyData.keys():
if key not in ('Config', 'Ignore'):
raise Exception(
'Policy status configuration for "%s" in ' \
'"%s" corrupted: %s' % (
elem.getElementName(),
sourceStatus.getStorageStatus().getStatusFileName(),
repr(policyData)))
if 'Config' in policyData:
# Use the additional policy data.
initConfig = dict(policyData['Config'])
initConfig['Name'] = PolicyTypeSize.POLICY_NAME
currentPolicy = PolicyTypeSize(initConfig)
ignoreFlag = policyData.get('Ignore', False)
if not isinstance(ignoreFlag, bool):
raise Exception(
'Policy status configuration for "%s" in ' \
'"%s" corrupted, "Ignore" parameter has to ' \
'be boolean: %s' % (
elem.getElementName(),
sourceStatus.getStorageStatus().getStatusFileName(),
repr(policyData)))
if elem.getType() == 'full':
# First see if the current policy already has an expected
# full backup size. If not, use the current element's size.
if currentPolicy.fullSizeExpect is None:
newConfig = dict(currentPolicy.config)
newConfig['FullSizeExpect'] = elem.getDataLength()
newConfig['Name'] = PolicyTypeSize.POLICY_NAME
currentPolicy = PolicyTypeSize(newConfig)
elif ((elem.getDataLength() < currentPolicy.fullSizeMin) or \
(elem.getDataLength() > currentPolicy.fullSizeMax)) and \
(not ignoreFlag):
print(
'Full backup size %s in source %s out of ' \
'limits, should be %d <= %d <= %d.' % (
elem.getElementName(),
sourceStatus.getSourceName(),
currentPolicy.fullSizeMin,
elem.getDataLength(),
currentPolicy.fullSizeMax), file=sys.stderr)
print(
'No interactive/automatic policy or ' \
'status update, consider adding this ' \
'manually to the status in %s:\n%s' % (
sourceStatus.getStorageStatus().getStatusFileName(),
json.dumps(
{elem.getElementName(): {'Size': {'Ignore': True}}}, indent=2)
), file=sys.stderr)
else:
# This is an incremental backup, so an incremental size policy
# has to be defined.
if currentPolicy.incSizeExpect is None:
if currentPolicy.incSizeExpectRel is not None:
# There was no full backup seen before any incremental one.
raise Exception(
'Not expecting incremental backups before full ones')
newConfig = dict(currentPolicy.config)
newConfig['IncSizeExpect'] = elem.getDataLength()
newConfig['Name'] = PolicyTypeSize.POLICY_NAME
currentPolicy = PolicyTypeSize(newConfig)
if (currentPolicy.incSizeMin is None) or \
(currentPolicy.incSizeMax is None):
raise Exception(
'Incremental backups from source "%s" in ' \
'config "%s" found but no Size policy ' \
'for incremental data defined' % (
sourceStatus.getSourceName(),
sourceStatus.getStorageStatus().getConfig().getConfigFileName()))
if ((elem.getDataLength() < currentPolicy.incSizeMin) or \
(elem.getDataLength() > currentPolicy.incSizeMax)) and \
(not ignoreFlag):
print(
'Incremental backup size %s in source %s out of ' \
'limits, should be %d <= %d <= %d.' % (
elem.getElementName(),
sourceStatus.getSourceName(),
currentPolicy.incSizeMin,
elem.getDataLength(),
currentPolicy.incSizeMax), file=sys.stderr)
print(
'No interactive/automatic policy or ' \
'status update, consider adding this ' \
'manually to the status in %s:\n%s' % (
sourceStatus.getStorageStatus().getStatusFileName(),
json.dumps(
{elem.getElementName(): {'Size': {'Ignore': True}}}, indent=2)
), file=sys.stderr)
if newConfig is not None:
if policyData is None:
policyData = {}
elem.setPolicyData(PolicyTypeSize.POLICY_NAME, policyData)
policyData['Config'] = newConfig
def delete(self, sourceStatus):
"""Prepare the policy status data for deletions going to
happen later on."""
elementList = sourceStatus.getDataElementList()
# Also keep track of the last persisted and the current policy.
# When deleting an element carrying policy updates, move the
# current policy data to the first element not deleted.
persistentPolicyConfig = currentPolicyConfig = self.config
for elem in elementList:
policyData = elem.getPolicyData(PolicyTypeSize.POLICY_NAME)
if (policyData is not None) and ('Config' in policyData):
currentPolicyConfig = policyData['Config']
if not elem.isMarkedForDeletion():
persistentPolicyConfig = currentPolicyConfig
if not elem.isMarkedForDeletion():
if persistentPolicyConfig != currentPolicyConfig:
if policyData is None:
policyData = {}
elem.setPolicyData(PolicyTypeSize.POLICY_NAME, policyData)
policyData['Config'] = currentPolicyConfig
persistentPolicyConfig = currentPolicyConfig
guerillabackup-0.5.0/test/ 0000775 0000000 0000000 00000000000 14501370353 0015466 5 ustar 00root root 0000000 0000000 guerillabackup-0.5.0/test/DefaultFileSystemSinkTest 0000775 0000000 0000000 00000003716 14501370353 0022501 0 ustar 00root root 0000000 0000000 #!/usr/bin/python3 -BEsStt
"""This test collection attempts to verify that the DefaultFileSystemSink
class works as expected."""
import sys
sys.path = sys.path[1:] + ['/usr/lib/guerillabackup/lib', '/etc/guerillabackup/lib-enabled']
import hashlib
import os
import time
import guerillabackup
from guerillabackup.BackupElementMetainfo import BackupElementMetainfo
testDirName = '/tmp/gb-test-%s' % time.time()
print('Using %s for testing' % testDirName, file=sys.stderr)
os.mkdir(testDirName, 0o700)
baseDirFd = os.open(testDirName, os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW)
sink = guerillabackup.DefaultFileSystemSink(
{guerillabackup.DefaultFileSystemSink.SINK_BASEDIR_KEY: testDirName})
try:
sinkHandle = sink.getSinkHandle('somepath/invalid')
raise Exception('Illegal state')
except Exception as testException:
if testException.args[0] != 'Slashes not conforming':
raise testException
try:
sinkHandle = sink.getSinkHandle('/somepath/invalid/')
raise Exception('Illegal state')
except Exception as testException:
if testException.args[0] != 'Slashes not conforming':
raise testException
try:
sinkHandle = sink.getSinkHandle('/somepath/../invalid')
raise Exception('Illegal state')
except Exception as testException:
if testException.args[0] != '. and .. forbidden':
raise testException
try:
sinkHandle = sink.getSinkHandle('/somepath/./invalid')
raise Exception('Illegal state')
except Exception as testException:
if testException.args[0] != '. and .. forbidden':
raise testException
sinkHandle = sink.getSinkHandle('/somepath/valid')
sinkInputFd = os.open('/dev/urandom', os.O_RDONLY)
sinkTestData = os.read(sinkInputFd, 1<<16)
sinkStream = sinkHandle.getSinkStream()
digestAlgo = hashlib.sha512()
os.write(sinkStream, sinkTestData)
digestAlgo.update(sinkTestData)
metaInfo = {
'BackupType': 'full',
'StorageFileChecksumSha512': digestAlgo.digest(),
'Timestamp': 1234567890}
sinkHandle.close(BackupElementMetainfo(metaInfo))
guerillabackup-0.5.0/test/LibIOTests 0000775 0000000 0000000 00000005630 14501370353 0017401 0 ustar 00root root 0000000 0000000 #!/usr/bin/python3 -BEsStt
"""This test collection attempts to verify that the library low-level
IO functions work as expected."""
import errno
import os
import sys
import time
sys.path = sys.path[1:] + ['/usr/lib/guerillabackup/lib', '/etc/guerillabackup/lib-enabled']
import guerillabackup
testDirName = '/tmp/gb-test-%s' % time.time()
os.mkdir(testDirName, 0o700)
baseDirFd = os.open(testDirName, os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW)
# Open a nonexistent file without creating it:
try:
newFileFd = guerillabackup.secureOpenAt(
baseDirFd, 'newfile',
fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY,
fileCreateMode=0o666)
raise Exception('Illegal state')
except OSError as osError:
if osError.errno != errno.ENOENT:
raise Exception('Illegal state: %s' % osError)
# Open a file, creating it:
newFileFd = guerillabackup.secureOpenAt(
baseDirFd, 'newfile',
fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY|os.O_CREAT|os.O_EXCL,
fileCreateMode=0o666)
os.close(newFileFd)
# Try again, now should fail as already existing:
try:
newFileFd = guerillabackup.secureOpenAt(
baseDirFd, 'newfile',
fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY|os.O_CREAT|os.O_EXCL,
fileCreateMode=0o666)
except OSError as osError:
if osError.errno != errno.EEXIST:
raise Exception('Illegal state: %s' % osError)
# Try to create directory and file but directory still missing:
try:
newFileFd = guerillabackup.secureOpenAt(
baseDirFd, 'newdir/newfile',
dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY,
dirCreateMode=None,
fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY|os.O_CREAT|os.O_EXCL,
fileCreateMode=0o666)
raise Exception('Illegal state')
except OSError as osError:
if osError.errno != errno.ENOENT:
raise Exception('Illegal state: %s' % osError)
# Try to create directory and file:
newFileFd = guerillabackup.secureOpenAt(
baseDirFd, 'newdir/newfile',
dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY,
dirCreateMode=0o777,
fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY|os.O_CREAT|os.O_EXCL,
fileCreateMode=0o666)
# Try to create only directories: a normal open call would create
# a file that could not be reopened using the same flags.
newFileFd = guerillabackup.secureOpenAt(
baseDirFd, 'newdir/subdir',
dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY,
dirCreateMode=0o777,
fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY|os.O_CREAT|os.O_EXCL|os.O_DIRECTORY,
fileCreateMode=0o777)
os.close(newFileFd)
newFileFd = guerillabackup.secureOpenAt(
baseDirFd, 'newdir/subdir',
dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY,
dirCreateMode=0o777,
fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY|os.O_DIRECTORY,
fileCreateMode=0o777)
print('No recursive testdir cleanup for %s' % testDirName, file=sys.stderr)
guerillabackup-0.5.0/test/LogfileBackupUnitTest/ 0000775 0000000 0000000 00000000000 14501370353 0021675 5 ustar 00root root 0000000 0000000 guerillabackup-0.5.0/test/LogfileBackupUnitTest/LogfileBackupUnit.config 0000664 0000000 0000000 00000003020 14501370353 0026426 0 ustar 00root root 0000000 0000000 # LogFileBackupUnit configuration template
# This list contains tuples with five elements per logfile backup
# input. The meaning of each value is:
# * Input directory: absolute directory name to search for logfiles.
# * Input file regex: regular expression to select compressed
# or uncompressed logfiles for inclusion. When the regex contains
# a named group "oldserial", a file with empty serial is handled
# as newest while file with largest serial value is the oldest.
# With named group "serial", oldest file will have smallest
# serial number, e.g. with date or timestamp file extensions.
# When a named group "compress" is found, the match content,
# e.g. "gz" or "bz2", will be used to find a decompressor and
# decompress the file before processing.
# * Source URL transformation: If None, the first named group
# of the "input file regex" is appended to the input directory
# name and used as source URL. When not starting with a "/",
# the transformation string is the name to include literally
# in the URL after the "input directory" name.
# * Policy: If not None, include this string as handling policy
# within the manifest.
# * Encryption key name: If not None, encrypt the input using
# the named key.
LogBackupUnitInputList = []
# Include old (rotated) default syslog files, where serial number
# was already appended. Accept also the compressed variants.
LogBackupUnitInputList.append((
'[TmpDir]/logs',
'^(test\\.log)\\.(?P<oldserial>[0-9]+)(?:\\.(?P<compress>gz))?$',
None, None, None))
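# Illustrative, commented-out variant (hypothetical file name pattern)
# using the "serial" named group instead of "oldserial", e.g. for
# date-stamped logfiles where the oldest file has the smallest serial:
# LogBackupUnitInputList.append((
#     '[TmpDir]/logs',
#     '^(app\\.log)\\.(?P<serial>[0-9]{8})(?:\\.(?P<compress>gz|bz2))?$',
#     None, None, None))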
guerillabackup-0.5.0/test/LogfileBackupUnitTest/Readme.txt 0000664 0000000 0000000 00000002707 14501370353 0023641 0 ustar 00root root 0000000 0000000 Description:
============
This directory contains configuration for logfile backup unit
testing.
Test invocation:
================
projectBaseDir="... directory with GuerillaBackup source ..."
tmpDir="$(mktemp -d)"
mkdir -- "${tmpDir}/config" "${tmpDir}/data" "${tmpDir}/sink" "${tmpDir}/logs"
cp -a -- "${projectBaseDir}/test/LogfileBackupUnitTest/config" "${tmpDir}/config"
ln -s -- "${projectBaseDir}/src/lib/guerillabackup/LogfileBackupUnit.py" "${tmpDir}/config/units/LogfileBackupUnit"
cp -a -- "${projectBaseDir}/test/LogfileBackupUnitTest/LogfileBackupUnit.config" "${tmpDir}/config/units"
sed -i -r -e "s:\[TmpDir\]:${tmpDir}:g" -- "${tmpDir}/config/config" "${tmpDir}/config/units/LogfileBackupUnit.config"
echo 0 > "${tmpDir}/logs/test.log"
dd if=/dev/zero bs=1M count=32 > "${tmpDir}/logs/test.log.1"
dd if=/dev/zero bs=1M count=32 | gzip -c9 > "${tmpDir}/logs/test.log.2.gz"
cp -a -- "${tmpDir}/logs/test.log.2.gz" "${tmpDir}/logs/test.log.3.gz"
cp -a -- "${tmpDir}/logs/test.log.2.gz" "${tmpDir}/logs/test.log.4.gz"
cp -a -- "${tmpDir}/logs/test.log.2.gz" "${tmpDir}/logs/test.log.5.gz"
cp -a -- "${tmpDir}/logs/test.log.2.gz" "${tmpDir}/logs/test.log.6.gz"
cp -a -- "${tmpDir}/logs/test.log.2.gz" "${tmpDir}/logs/test.log.7.gz"
echo "Starting LogfileBackupUnit testing in ${tmpDir}"
"${projectBaseDir}/src/gb-backup-generator" --ConfigDir "${tmpDir}/config"
sed -i -r -e "s:\[[0-9]+,:[123,:g" -- "${tmpDir}/state/generators/LogfileBackupUnit/state.current"
guerillabackup-0.5.0/test/LogfileBackupUnitTest/config 0000664 0000000 0000000 00000010255 14501370353 0023070 0 ustar 00root root 0000000 0000000 # GuerillaBackup main configuration file.
# General parameters influence behavior of various backup elements,
# e.g. source units, sinks or the generator itself. All those
# parameters start with "General" to indicate their global relevance.
# This is the default persistency storage base directory for all
# components. All components will create files or subdirectories
# starting with the component class name unless changed within
# configuration. See also "ComponentPersistencyPrefix" in unit
# or subunit configuration files.
GeneralPersistencyBaseDir = '[TmpDir]/state'
# This is the default runtime data directory for all components.
# It is used to create sockets, PID files and similar, that need
# not be preserved between reboots.
GeneralRuntimeDataDir = '[TmpDir]/run'
# This parameter defines the default pipeline element to use to
# compress backup data of any kind before sending it to downstream
# processing, usually encryption or sink. When enabling compression
# and encryption, you may want to disable the additional compression
# step included in many encryption tools, e.g. via "--compress-algo
# none" in gpg.
GeneralDefaultCompressionElement = guerillabackup.OSProcessPipelineElement('/bin/bzip2', ['/bin/bzip2', '-c9'])
# This parameter defines the default encryption pipeline element
# to use to encrypt backup data of any kind before sending it
# to downstream processing. For security reasons, a unit might
# use an alternative encryption element, e.g. with different options
# or keys, but it should NEVER ignore the parameter, even when
# unit-specific encryption is disabled. Hence the unit shall never
# generate unencrypted data while this parameter is not also
# overridden in the unit-specific configuration. See also function
# "getDefaultDownstreamPipeline" documentation.
# GeneralDefaultEncryptionElement = guerillabackup.GpgEncryptionPipelineElement('some key name')
# Debugging settings:
# This flag enables test mode for all configurable components
# in the data pipe from source to sink. As testing of most features
# will require running real backups, the testing mode will cause
# an abort in the very last moment before completion. Well-behaved
# components will roll back most of the actions under these circumstances.
# GeneralDebugTestModeFlag = False
# Generator specific settings: Those settings configure the local
# default backup generator.
# Use this sink for storage of backup data elements. The class
# has to have a constructor only taking one argument, that is
# the generator configuration context as defined by the SinkInterface.
# When empty, the guerillabackup.DefaultFileSystemSink is used.
# GeneratorSinkClass = guerillabackup.DefaultFileSystemSink
# Use this directory for storage of the backup data elements generated
# locally. The default location is "/var/lib/guerillabackup/data".
# You may also want to enable transfer services using this directory
# as a source to copy or move backup data to an offsite location.
DefaultFileSystemSinkBaseDir = '[TmpDir]/sink'
# Unit specific default and specific settings can be found in
# the units directory.
# Transfer service configuration: this part of configuration does
# not take effect automatically, a transfer service has to be
# started loading this configuration file. When security considerations
# prohibit use of same configuration, e.g. due to inaccessibility
# of configuration file because of permission settings, then this
# file should be copied to "config-[agent name]" instead.
# Storage directory used by this transfer service. When not present,
# the DefaultFileSystemSinkBaseDir is used instead.
# TransferServiceStorageBaseDir = '/var/spool/guerillabackup/transfer'
# Class to load to define the transfer receiver policy.
# TransferReceiverPolicyClass = guerillabackup.Transfer.ReceiverStoreDataTransferPolicy
# Arguments for creating the named transfer policy to pass after
# the configuration context.
# TransferReceiverPolicyInitArgs = None
# Class to load to define the transfer sender policy.
# TransferSenderPolicyClass = guerillabackup.Transfer.SenderMoveDataTransferPolicy
# Arguments for creating the named transfer policy to pass after
# the configuration context.
# TransferSenderPolicyInitArgs = [False]
guerillabackup-0.5.0/test/ReceiverOnlyTransferService/ 0000775 0000000 0000000 00000000000 14501370353 0023122 5 ustar 00root root 0000000 0000000 guerillabackup-0.5.0/test/ReceiverOnlyTransferService/Readme.txt 0000664 0000000 0000000 00000001757 14501370353 0025072 0 ustar 00root root 0000000 0000000 Description:
============
This directory contains a transfer service implementation with
a test receiver only transfer configuration. It just listens on
an input socket, which has to be connected externally.
Transfer invocation:
====================
projectBaseDir="... directory with GuerillaBackup source ..."
tmpDir="$(mktemp -d)"
mkdir -- "${tmpDir}/config" "${tmpDir}/data"
cp -a -- "${projectBaseDir}/test/ReceiverOnlyTransferService/config" "${tmpDir}/config"
sed -i -r -e "s:\[TmpDir\]:${tmpDir}:g" -- "${tmpDir}/config/config"
echo "Listening on socket ${tmpDir}/run/transfer.socket"
"${projectBaseDir}/src/gb-transfer-service" --Config "${tmpDir}/config/config"
Connect the gb-transfer-service to an instance with a sending policy,
e.g. see SenderOnlyTransferService testcase.
socat "UNIX-CONNECT:${tmpDir}/run/transfer.socket" "UNIX-CONNECT:...other socket"
Terminate the gb-transfer-service using [Ctrl]-C and check that backups
were transferred as expected.
ls -al -- "${tmpDir}/data"
guerillabackup-0.5.0/test/ReceiverOnlyTransferService/config 0000664 0000000 0000000 00000010421 14501370353 0024310 0 ustar 00root root 0000000 0000000 # GuerillaBackup main configuration file.
# General parameters influence behavior of various backup elements,
# e.g. source units, sinks or the generator itself. All those
# parameters start with "General" to indicate their global relevance.
# This is the default persistency storage base directory for all
# components. All components will create files or subdirectories
# starting with the component class name unless changed within
# configuration. See also "ComponentPersistencyPrefix" in unit
# or subunit configuration files.
GeneralPersistencyBaseDir = '[TmpDir]/state'
# This is the default runtime data directory for all components.
# It is used to create sockets, PID files and similar, that need
# not be preserved between reboots.
GeneralRuntimeDataDir = '[TmpDir]/run'
# This parameter defines the default pipeline element to use to
# compress backup data of any kind before sending it to downstream
# processing, usually encryption or sink. When enabling compression
# and encryption, you may want to disable the additional compression
# step included in many encryption tools, e.g. via "--compress-algo
# none" in gpg.
GeneralDefaultCompressionElement = guerillabackup.OSProcessPipelineElement(
'/bin/bzip2', ['/bin/bzip2', '-c9'])
# This parameter defines the default encryption pipeline element
# to use to encrypt backup data of any kind before sending it
# to downstream processing. For security reasons, a unit might
# use an alternative encryption element, e.g. with different options
# or keys, but it should NEVER ignore the parameter, even when
# unit-specific encryption is disabled. Hence the unit shall never
# generate unencrypted data while this parameter is not also
# overridden in the unit-specific configuration. See also function
# "getDefaultDownstreamPipeline" documentation.
# GeneralDefaultEncryptionElement = guerillabackup.GpgEncryptionPipelineElement('some key name')
# Debugging settings:
# This flag enables test mode for all configurable components
# in the data pipe from source to sink. As testing of most features
# will require running real backups, the testing mode will cause
# an abort in the very last moment before completion. Well-behaved
# components will roll back most of the actions under these circumstances.
# GeneralDebugTestModeFlag = False
# Generator specific settings: Those settings configure the local
# default backup generator.
# Use this sink for storage of backup data elements. The class
# has to have a constructor only taking one argument, that is
# the generator configuration context as defined by the SinkInterface.
# When empty, the guerillabackup.DefaultFileSystemSink is used.
# GeneratorSinkClass = guerillabackup.DefaultFileSystemSink
# Use this directory for storage of the backup data elements generated
# locally. Usually this is "/var/lib/guerillabackup/data" when
# used for temporary local storage (local disk-to-disk, e.g. to have
# older versions to recover e.g. after admin errors or failed
# system updates) or "/var/spool/guerillabackup/outgoing" when
# backup data should be transferred to a different location using
# asynchronous fetch operations.
DefaultFileSystemSinkBaseDir = '[TmpDir]/data'
# Unit specific default and specific settings can be found in
# the units directory.
# Transfer service configuration: this part of configuration does
# not take effect automatically, a transfer service has to be
# started loading this configuration file. When security considerations
# prohibit use of same configuration, e.g. due to inaccessibility
# of configuration file because of permission settings, then this
# file should be copied to "config-[agent name]" instead.
# Storage directory used by this transfer service. When not present,
# the DefaultFileSystemSinkBaseDir is used instead.
# TransferServiceStorageBaseDir = '/var/spool/guerillabackup/transfer'
# Class to load to define the transfer receiver policy.
TransferReceiverPolicyClass = guerillabackup.Transfer.ReceiverStoreDataTransferPolicy
# Arguments for creating the named transfer policy to pass after
# the configuration context.
# TransferReceiverPolicyInitArgs = None
# Class to load to define the transfer sender policy.
# TransferSenderPolicyClass = None
# Arguments for creating the named transfer policy to pass after
# the configuration context.
# TransferSenderPolicyInitArgs = None
guerillabackup-0.5.0/test/SenderOnlyTransferService/ 0000775 0000000 0000000 00000000000 14501370353 0022576 5 ustar 00root root 0000000 0000000 guerillabackup-0.5.0/test/SenderOnlyTransferService/Readme.txt 0000664 0000000 0000000 00000003723 14501370353 0024541 0 ustar 00root root 0000000 0000000 Description:
============
This directory contains a transfer service implementation with
a test backup generator adding one simple tar backup every minute
and a transfer service configuration to send those.
Generator invocation:
=====================
projectBaseDir="... directory with GuerillaBackup source ..."
tmpDir="$(mktemp -d)"
mkdir -- "${tmpDir}/config" "${tmpDir}/data" "${tmpDir}/log"
echo "Testlogdata" > "${tmpDir}/log/test.log.0"
cp -a -- "${projectBaseDir}/test/SenderOnlyTransferService/config" "${projectBaseDir}/test/SenderOnlyTransferService/units" "${tmpDir}/config"
sed -i -r -e "s:\[TmpDir\]:${tmpDir}:g" -- "${tmpDir}/config/config" "${tmpDir}/config/units/LogfileBackupUnit.config" "${tmpDir}/config/units/TarBackupUnit.config"
ln -s -- "${projectBaseDir}/src/lib/guerillabackup/LogfileBackupUnit.py" "${tmpDir}/config/units/LogfileBackupUnit"
ln -s -- "${projectBaseDir}/src/lib/guerillabackup/TarBackupUnit.py" "${tmpDir}/config/units/TarBackupUnit"
"${projectBaseDir}/src/gb-backup-generator" --ConfigDir "${tmpDir}/config"
Terminate the generator using [Ctrl]-C and check that backups
were created.
ls -alR -- "${tmpDir}/data"
To test data corruption handling, append a byte to one of the
data files.
echo "corrupted!" >> "${tmpDir}/data/.....data"
gb-transfer-service invocation:
===============================
Start the service:
echo "Listening on socket ${tmpDir}/run/transfer.socket"
"${projectBaseDir}/src/gb-transfer-service" --Config "${tmpDir}/config/config"
Send test requests using the fake client: IO-handling is simplified,
so just press return on empty lines until the expected response was
received.
"${projectBaseDir}/test/SyncProtoTestClient" "${tmpDir}/run/transfer.socket"
send Rnull
send R
send S["getPolicyInfo"]
send S["startTransaction", null]
send S["nextDataElement", false]
send S["getDataElementInfo"]
send S["getDataElementStream"]
send S["nextDataElement", true]
...
send S
Normal transfer client test:
See ReceiverOnlyTransferService
guerillabackup-0.5.0/test/SenderOnlyTransferService/config 0000664 0000000 0000000 00000010417 14501370353 0023771 0 ustar 00root root 0000000 0000000 # GuerillaBackup main configuration file.
# General parameters influence behavior of various backup elements,
# e.g. source units, sinks or the generator itself. All those
# parameters start with "General" to indicate their global relevance.
# This is the default persistency storage base directory for all
# components. All components will create files or subdirectories
# starting with the component class name unless changed within
# configuration. See also "ComponentPersistencyPrefix" in unit
# or subunit configuration files.
GeneralPersistencyBaseDir = '[TmpDir]/state'
# This is the default runtime data directory for all components.
# It is used to create sockets, PID files and similar, that need
# not be preserved between reboots.
GeneralRuntimeDataDir = '[TmpDir]/run'
# This parameter defines the default pipeline element to use to
# compress backup data of any kind before sending it to downstream
# processing, usually encryption or sink. When enabling compression
# and encryption, you may want to disable the additional compression
# step included in many encryption tools, e.g. via "--compress-algo
# none" in gpg.
GeneralDefaultCompressionElement = guerillabackup.OSProcessPipelineElement(
'/bin/bzip2', ['/bin/bzip2', '-c9'])
# This parameter defines the default encryption pipeline element
# to use to encrypt backup data of any kind before sending it
# to downstream processing. For security reasons, a unit might
# use an alternative encryption element, e.g. with different options
# or keys, but it should NEVER ignore the parameter, even when
# unit-specific encryption is disabled. Hence the unit shall never
# generate unencrypted data while this parameter is not also
# overridden in the unit-specific configuration. See also function
# "getDefaultDownstreamPipeline" documentation.
# GeneralDefaultEncryptionElement = guerillabackup.GpgEncryptionPipelineElement('some key name')
# Debugging settings:
# This flag enables test mode for all configurable components
# in the data pipe from source to sink. As testing of most features
# will require running real backups, the testing mode will cause
# an abort in the very last moment before completion. Well-behaved
# components will roll back most of the actions under these circumstances.
# GeneralDebugTestModeFlag = False
# Generator specific settings: Those settings configure the local
# default backup generator.
# Use this sink for storage of backup data elements. The class
# has to have a constructor only taking one argument, that is
# the generator configuration context as defined by the SinkInterface.
# When empty, the guerillabackup.DefaultFileSystemSink is used.
# GeneratorSinkClass = guerillabackup.DefaultFileSystemSink
# Use this directory for storage of the backup data elements generated
# locally. Usually this is "/var/lib/guerillabackup/data" when
# used for temporary local storage (local disk-to-disk, e.g. to have
# older versions to recover e.g. after admin errors or failed
# system updates) or "/var/spool/guerillabackup/outgoing" when
# backup data should be transferred to a different location using
# asynchronous fetch operations.
DefaultFileSystemSinkBaseDir = '[TmpDir]/data'
# Unit specific default and specific settings can be found in
# the units directory.
# Transfer service configuration: this part of configuration does
# not take effect automatically, a transfer service has to be
# started loading this configuration file. When security considerations
# prohibit use of same configuration, e.g. due to inaccessibility
# of configuration file because of permission settings, then this
# file should be copied to "config-[agent name]" instead.
# Storage directory used by this transfer service. When not present,
# the DefaultFileSystemSinkBaseDir is used instead.
# TransferServiceStorageBaseDir = '/var/spool/guerillabackup/transfer'
# Class to load to define the transfer receiver policy.
# TransferReceiverPolicyClass = None
# Arguments for creating the named transfer policy to pass after
# the configuration context.
# TransferReceiverPolicyInitArgs = None
# Class to load to define the transfer sender policy.
TransferSenderPolicyClass = guerillabackup.Transfer.SenderMoveDataTransferPolicy
# Arguments for creating the named transfer policy to pass after
# the configuration context.
TransferSenderPolicyInitArgs = [False]
guerillabackup-0.5.0/test/SenderOnlyTransferService/units/ 0000775 0000000 0000000 00000000000 14501370353 0023740 5 ustar 00root root 0000000 0000000 guerillabackup-0.5.0/test/SenderOnlyTransferService/units/LogfileBackupUnit.config 0000664 0000000 0000000 00000003015 14501370353 0030475 0 ustar 00root root 0000000 0000000 # LogFileBackupUnit configuration template
# This list contains tuples with five elements per logfile backup
# input. The meaning of each value is:
# * Input directory: absolute directory name to search for logfiles.
# * Input file regex: regular expression to select compressed
# or uncompressed logfiles for inclusion. When the regex contains
# a named group "oldserial", a file with empty serial is handled
# as newest while file with largest serial value is the oldest.
# With named group "serial", oldest file will have smallest
# serial number, e.g. with date or timestamp file extensions.
# When a named group "compress" is found, the match content,
# e.g. "gz" or "bz2", will be used to find a decompressor and
# decompress the file before processing.
# * Source URL transformation: If None, the first named group
# of the "input file regex" is appended to the input directory
# name and used as source URL. When not starting with a "/",
# the transformation string is the name to include literally
# in the URL after the "input directory" name.
# * Policy: If not None, include this string as handling policy
# within the manifest.
# * Encryption key name: If not None, encrypt the input using
# the named key.
LogBackupUnitInputList = []
# Include old (rotated) default syslog files, where serial number
# was already appended. Accept also the compressed variants.
LogBackupUnitInputList.append((
'[TmpDir]/log',
'^([a-z.-]+)\\.(?P<oldserial>[0-9]+)(?:\\.(?P<compress>gz))?$',
None, None, None))
guerillabackup-0.5.0/test/SenderOnlyTransferService/units/TarBackupUnit.config 0000664 0000000 0000000 00000005051 14501370353 0027644 0 ustar 00root root 0000000 0000000 # TarBackupUnit configuration template
# This list contains dictionaries with configuration parameters
# for each tar backup to run. All tar backups of one unit are
# run sequentially. Configuration parameters are:
# * PreBackupCommand: execute this command given as list of arguments
# before starting the backup, e.g. create a filesystem or virtual
# machine snapshot, perform cleanup.
# * PostBackupCommand: execute this command after starting the
# backup.
# * Root: root directory of tar backup, "/" when missing.
# * Include: list of paths to include, ["."] when missing.
# * Exclude: list of patterns to exclude from backup (see tar
# documentation "--exclude"). When missing and Root is "/",
# list ["./var/lib/guerillabackup/data"] is used.
# * IgnoreBackupRaces: flag to indicate if races during backup
# are acceptable, e.g. because the directories are modified.
# * FullBackupTiming: tuple with minimum and maximum interval
# between full backup invocations and modulo base and offset,
# all in seconds. Without modulo invocation (all values None),
# full backups will run as soon as minimum interval is exceeded.
# With modulo timing, modulo trigger is ignored when below minimum
# time. When the gap exceeds the maximum interval, an immediate
# backup is started.
# * IncBackupTiming: When set, incremental backups are created
# to fill the time between full backups. Timings are specified
# as tuple with same meaning as in FullBackupTiming parameter.
# This will also trigger generation of tar file indices when
# running full backups.
# * FullOverrideCommand: when set, parameters Exclude, Include,
# Root are ignored and exactly the given command is executed.
# * IncOverrideCommand: when set, parameters Exclude, Include,
# Root are ignored and exactly the given command is executed.
# * KeepIndices: number of old incremental tar backup indices
# to keep. With -1 keep all, otherwise keep only the given number.
# Default is 0.
# * Policy: If not None, include this string as handling policy
# * EncryptionKey: If not None, encrypt the input using the named
# key. Otherwise default encryption key from global configuration
# might be used.
TarBackupUnitConfigList = {}
TarBackupUnitConfigList['/test'] = {
'Root': '[TmpDir]', 'Include': ['.'],
'Exclude': ['./data'],
'IgnoreBackupRaces': False,
# Create full backup every 10 minutes.
'FullBackupTiming': [570, 630, 600, 0],
# Create incremental backup every minute.
'IncBackupTiming': [55, 65, 60, 0],
'KeepIndices': 20,
'Policy': 'default',
'EncryptionKey': None}
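# Illustrative, commented-out second entry showing the pre/post backup
# command hooks described above; the snapshot commands and paths are
# hypothetical placeholders, not part of the test configuration:
# TarBackupUnitConfigList['/srv-data'] = {
#     'PreBackupCommand': ['/usr/local/bin/create-snapshot'],
#     'PostBackupCommand': ['/usr/local/bin/remove-snapshot'],
#     'Root': '/srv/data', 'Include': ['.'],
#     'IgnoreBackupRaces': True,
#     'FullBackupTiming': [85800, 87000, 86400, 0],
#     'KeepIndices': -1,
#     'Policy': None,
#     'EncryptionKey': None}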
guerillabackup-0.5.0/test/SyncProtoTestClient 0000775 0000000 0000000 00000006016 14501370353 0021356 0 ustar 00root root 0000000 0000000 #!/usr/bin/python3 -BEsStt
"""This client connects to a sync service and sends requests from
stdin and prints the responses. This can be used for testing of
StreamRequestResponseMultiplexer from guerillabackup.Transfer.
See source of StreamRequestResponseMultiplexer for description
of protocol structure."""
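# Packet layout as handled by the loop below (sketch derived from the
# parsing code in this file, with the '<I' little-endian length format
# assumed): one type byte ('A', 'P', 'R' or 'S'), a 4-byte length and
# that many payload bytes, e.g. the "getPolicyInfo" request from the
# SenderOnlyTransferService Readme corresponds to
# b'S' + struct.pack('<I', 17) + b'["getPolicyInfo"]'.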
import sys
sys.path = sys.path[1:] + ['/usr/lib/guerillabackup/lib', '/etc/guerillabackup/lib-enabled']
import errno
import fcntl
import os
import socket
import struct
if len(sys.argv) != 2:
print('Usage %s [target]' % sys.argv[0], file=sys.stderr)
sys.exit(1)
connectAddress = sys.argv[1]
clientSocket = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
clientSocket.connect(connectAddress)
print('Connected to %s' % repr(connectAddress), file=sys.stderr)
flags = fcntl.fcntl(clientSocket.fileno(), fcntl.F_GETFL)
fcntl.fcntl(clientSocket.fileno(), fcntl.F_SETFL, flags|os.O_NONBLOCK)
remoteData = b''
while True:
readData = None
try:
readData = clientSocket.recv(1<<20)
except socket.error as receiveError:
if receiveError.errno == errno.EAGAIN:
readData = b''
else:
raise
if len(readData) != 0:
print('Received %d bytes of remote data' % len(readData), file=sys.stderr)
remoteData += readData
if len(remoteData) >= 5:
if remoteData[0] not in b'APRS':
print('Invalid remote data package type %s, purging data %s' % (
repr(remoteData[0]), repr(remoteData)), file=sys.stderr)
remoteData = b''
else:
remoteDataLength = struct.unpack('<I', remoteData[1:5])[0]
if (remoteDataLength > (1<<20)):
print('Invalid remote data length %d, purging data %s' % (
remoteDataLength, repr(remoteData)), file=sys.stderr)
remoteData = b''
elif remoteDataLength+5 <= len(remoteData):
print('Received valid packet %s' % repr(remoteData[0:1]+remoteData[5:5+remoteDataLength]),
file=sys.stderr)
remoteData = remoteData[5+remoteDataLength:]
# Try again to read more data
continue
# No remote data to dump, try to read a command
commandLine = sys.stdin.readline()
if commandLine == '':
# End of input.
break
commandLine = commandLine[:-1]
if commandLine == '':
continue
commandLength = commandLine.find(' ')
if commandLength < 0:
commandLength = len(commandLine)
command = commandLine[:commandLength]
if command == 'send':
sendData = bytes(commandLine[commandLength+1:], sys.getdefaultencoding())
if (len(sendData) == 0) or (sendData[0] not in b'APRS'):
print('Send data has to start with type letter, optionally ' \
'followed by data %s' % repr(sendData), file=sys.stderr)
continue
sendData = sendData[0:1]+struct.pack('<I', len(sendData)-1)+sendData[1:]