pax_global_header00006660000000000000000000000064145013703530014513gustar00rootroot0000000000000052 comment=30d1ca3a4243858eb811ef7a00fadeeb9530e135 guerillabackup-0.5.0/000077500000000000000000000000001450137035300145075ustar00rootroot00000000000000guerillabackup-0.5.0/LICENSE000066400000000000000000000167431450137035300155270ustar00rootroot00000000000000 GNU LESSER GENERAL PUBLIC LICENSE Version 3, 29 June 2007 Copyright (C) 2007 Free Software Foundation, Inc. Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. This version of the GNU Lesser General Public License incorporates the terms and conditions of version 3 of the GNU General Public License, supplemented by the additional permissions listed below. 0. Additional Definitions. As used herein, "this License" refers to version 3 of the GNU Lesser General Public License, and the "GNU GPL" refers to version 3 of the GNU General Public License. "The Library" refers to a covered work governed by this License, other than an Application or a Combined Work as defined below. An "Application" is any work that makes use of an interface provided by the Library, but which is not otherwise based on the Library. Defining a subclass of a class defined by the Library is deemed a mode of using an interface provided by the Library. A "Combined Work" is a work produced by combining or linking an Application with the Library. The particular version of the Library with which the Combined Work was made is also called the "Linked Version". The "Minimal Corresponding Source" for a Combined Work means the Corresponding Source for the Combined Work, excluding any source code for portions of the Combined Work that, considered in isolation, are based on the Application, and not on the Linked Version. The "Corresponding Application Code" for a Combined Work means the object code and/or source code for the Application, including any data and utility programs needed for reproducing the Combined Work from the Application, but excluding the System Libraries of the Combined Work. 1. Exception to Section 3 of the GNU GPL. You may convey a covered work under sections 3 and 4 of this License without being bound by section 3 of the GNU GPL. 2. Conveying Modified Versions. If you modify a copy of the Library, and, in your modifications, a facility refers to a function or data to be supplied by an Application that uses the facility (other than as an argument passed when the facility is invoked), then you may convey a copy of the modified version: a) under this License, provided that you make a good faith effort to ensure that, in the event an Application does not supply the function or data, the facility still operates, and performs whatever part of its purpose remains meaningful, or b) under the GNU GPL, with none of the additional permissions of this License applicable to that copy. 3. Object Code Incorporating Material from Library Header Files. The object code form of an Application may incorporate material from a header file that is part of the Library. You may convey such object code under terms of your choice, provided that, if the incorporated material is not limited to numerical parameters, data structure layouts and accessors, or small macros, inline functions and templates (ten or fewer lines in length), you do both of the following: a) Give prominent notice with each copy of the object code that the Library is used in it and that the Library and its use are covered by this License. 
b) Accompany the object code with a copy of the GNU GPL and this license document. 4. Combined Works. You may convey a Combined Work under terms of your choice that, taken together, effectively do not restrict modification of the portions of the Library contained in the Combined Work and reverse engineering for debugging such modifications, if you also do each of the following: a) Give prominent notice with each copy of the Combined Work that the Library is used in it and that the Library and its use are covered by this License. b) Accompany the Combined Work with a copy of the GNU GPL and this license document. c) For a Combined Work that displays copyright notices during execution, include the copyright notice for the Library among these notices, as well as a reference directing the user to the copies of the GNU GPL and this license document. d) Do one of the following: 0) Convey the Minimal Corresponding Source under the terms of this License, and the Corresponding Application Code in a form suitable for, and under terms that permit, the user to recombine or relink the Application with a modified version of the Linked Version to produce a modified Combined Work, in the manner specified by section 6 of the GNU GPL for conveying Corresponding Source. 1) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (a) uses at run time a copy of the Library already present on the user's computer system, and (b) will operate properly with a modified version of the Library that is interface-compatible with the Linked Version. e) Provide Installation Information, but only if you would otherwise be required to provide such information under section 6 of the GNU GPL, and only to the extent that such information is necessary to install and execute a modified version of the Combined Work produced by recombining or relinking the Application with a modified version of the Linked Version. (If you use option 4d0, the Installation Information must accompany the Minimal Corresponding Source and Corresponding Application Code. If you use option 4d1, you must provide the Installation Information in the manner specified by section 6 of the GNU GPL for conveying Corresponding Source.) 5. Combined Libraries. You may place library facilities that are a work based on the Library side by side in a single library together with other library facilities that are not Applications and are not covered by this License, and convey such a combined library under terms of your choice, if you do both of the following: a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities, conveyed under the terms of this License. b) Give prominent notice with the combined library that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work. 6. Revised Versions of the GNU Lesser General Public License. The Free Software Foundation may publish revised and/or new versions of the GNU Lesser General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. 
If the Library as you received it specifies that a certain numbered version of the GNU Lesser General Public License "or any later version" applies to it, you have the option of following the terms and conditions either of that published version or of any later version published by the Free Software Foundation. If the Library as you received it does not specify a version number of the GNU Lesser General Public License, you may choose any version of the GNU Lesser General Public License ever published by the Free Software Foundation.

If the Library as you received it specifies that a proxy can decide whether future versions of the GNU Lesser General Public License shall apply, that proxy's public statement of acceptance of any version is permanent authorization for you to choose that version for the Library.

guerillabackup-0.5.0/README.md

# GuerillaBackup:

GuerillaBackup is a minimalistic backup toolbox for asynchronous, locally coordinated, distributed, resilient and secure backup generation, data distribution, verification, storage and deletion suited for rugged environments.

GuerillaBackup could be the right solution for you if you want

* distributed backup data generation under control of the source system owner, assuming that the owner knows best which data is worth being written to backup and which policies (retention time, copy count, encryption, non-repudiation) should be applied
* operation with limited bandwidth, unstable network connectivity and limited storage space
* data confidentiality, integrity and availability guarantees even with a limited number of compromised or malicious backup processing nodes
* limited trust between backup data source and sink system(s)

When you need the following features, you might look for a standard free or commercial backup solution instead:

* central control of backup and retention policies
* central unlimited access to all data
* operation under stable conditions with a solid network, sufficient storage and trust between both backup data source and sink

# Getting started:

For those just wanting to get started quickly, the following trail might be the best:

* Build the software (see "Building" below) or install the binary package from file ("dpkg -i guerillabackup_[version]_all.deb") or repository ("apt-get install guerillabackup"), as sketched below.
* Follow the steps from "doc/Installation.txt", section "General GuerillaBackup Configuration".
* If something is not working as expected, see "doc/FAQs.txt" to check whether your problem is already known.
* If it is still not working, please file a bug/feature request on GitHub, see the "Resources" section below.
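On a Debian-based system the quick-start steps above boil down to a few commands. This is only a minimal sketch to be run as root; the package file name below assumes a 0.5.0 build, so adjust it to the version you actually built or downloaded.

```sh
# Option 1: install a previously built or downloaded package file
# (file name assumed here, adjust to your build).
dpkg -i guerillabackup_0.5.0_all.deb

# Option 2: install from a package repository that ships guerillabackup.
apt-get install guerillabackup

# Afterwards continue with doc/Installation.txt,
# section "General GuerillaBackup Configuration".
```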
# Building: * Build a native Debian test package using the default template: see data/debian.template/Readme.txt # Resources: * Bugs, feature requests: https://github.com/halfdog/guerillabackup/issues # Documentation: * doc/Design.txt: GuerillaBackup design documentation * doc/Implementation.txt: GuerillaBackup implementation documentation * doc/Installation.txt: GuerillaBackup end user installation documentation * Manual pages: doc/gb-backup-generator.1.xml and doc/gb-transfer-service.1.xml here on github, usually "man (gb-backup-generator|gb-transfer-service)" when installed from package repository * doc/FAQs.txt: GuerillaBackup frequently asked questions guerillabackup-0.5.0/data/000077500000000000000000000000001450137035300154205ustar00rootroot00000000000000guerillabackup-0.5.0/data/debian.template/000077500000000000000000000000001450137035300204545ustar00rootroot00000000000000guerillabackup-0.5.0/data/debian.template/NEWS000066400000000000000000000003341450137035300211530ustar00rootroot00000000000000guerillabackup (0.5.0) unstable; urgency=low A general overhaul of package structure was done to improve compatibility with Debian packaging standards. -- halfdog Thu, 14 Sep 2023 21:00:00 +0000 guerillabackup-0.5.0/data/debian.template/Readme.txt000066400000000000000000000012071450137035300224120ustar00rootroot00000000000000This directory contains the Debian packaging files to build a native package. They are kept in the data directory: quilt type Debian package building cannot use any files included in the DEBIAN directory of an upstream orig.tar.gz. To build a native package using the template files, use following commands: projectBaseDir="... directory with GuerillaBackup source ..." tmpDir="$(mktemp -d)" mkdir -- "${tmpDir}/guerillabackup" cp -aT -- "${projectBaseDir}" "${tmpDir}/guerillabackup" mv -i -- "${tmpDir}/guerillabackup/data/debian.template" "${tmpDir}/guerillabackup/debian" < /dev/null cd "${tmpDir}/guerillabackup" dpkg-buildpackage -us -uc guerillabackup-0.5.0/data/debian.template/changelog000066400000000000000000000074441450137035300223370ustar00rootroot00000000000000guerillabackup (0.5.0) unstable; urgency=low Features: * Added systemd unit hardening. Refactoring: * Renamed binaries, pathnames according to Debian package inclusion recommendations. * Removed obsolete "/var/run" directories. Bugfixes: * gb-backup-generator: * Provide working default configuration. * gb-storage-tool: * Avoid Exception, use exit(1) on error instead. * Improved error message. -- halfdog Thu, 14 Sep 2023 21:00:00 +0000 guerillabackup (0.4.0) unstable; urgency=low Features: * StorageTool: * Added "Size" check policy to detect backups with abnormal size. * Improved messages for interval policy violations and how to fix. * Warn about files not having any applicable policies defined. * Made policy inheritance control more explicit, improved documentation. Bugfixes: * BackupGenerator: * Fixed invalid executable path in systemd service templates. * Fixed backup generation pipeline race condition on asynchronous shutdown. * Applied pylint. * StorageTool: * Removed anti-file-deletion protection left in code accidentally. * Fixed Interval policy status handling when applying retention policies. * Fixed deletion mark handling with concurrent retention policies. * Fixed exception attempting element retrieval from nonexisting source. * Fixed error message typos. 
-- halfdog Thu, 19 Jan 2023 14:10:45 +0000 guerillabackup (0.3.0) unstable; urgency=low Features: * Added StorageTool policy support to verify sane backup intervals and to apply data retention policies. -- halfdog Wed, 30 Nov 2022 15:55:59 +0000 guerillabackup (0.2.0) unstable; urgency=low Features: * Added StorageTool to check storage data status, currently only checking for invalid file names in the storage directory. Bugfixes: * Improved TransferService error messages and formatting mistake in man page. -- halfdog Wed, 15 Jun 2022 07:57:19 +0000 guerillabackup (0.1.1) unstable; urgency=low * Bugfixes: * Correct handling of undefined start condition -- halfdog Sat, 15 Jan 2022 08:56:30 +0000 guerillabackup (0.1.0) unstable; urgency=low * Features: * Added flexible backup generator run condition support -- halfdog Sun, 5 Sep 2021 18:31:00 +0000 guerillabackup (0.0.2) unstable; urgency=low * Fixes: * Fixed wrong full tar backup interval defaults in template * Features: * Added TransferService clean shutdown on [Ctrl]-C * Misc: * Applied lintian/pylint suggestions -- halfdog Sat, 24 Oct 2020 11:08:00 +0000 guerillabackup (0.0.1) unstable; urgency=low * Fixes from Debian mentors review process * Removed postrm script template * Changed Debian package section from misc to extra * Moved file/directory permission setting changes from postinst to package building rules * Manpage text corrections after spellchecking * Features: * Python 2 to 3 transition applying pylint for coding style and syntax error detection, code refactoring * Enforce gnupg encryption exit status check * Bugfixes: * Improved IO-handling to external processes during shutdown * Improved transfer protocol error handling * Disable console output buffering when not operating on TTYs * Improved tar backup error status handling, cleanup * Handle broken full/inc backup timing configuration gracefully * Close file descriptors after file transfer or on shutdown due to protocol errors -- halfdog Thu, 19 Jul 2018 20:57:00 +0000 guerillabackup (0.0.0) unstable; urgency=low * Initial packaging of guerillabackup -- halfdog Fri, 30 Dec 2016 00:00:00 +0000 guerillabackup-0.5.0/data/debian.template/control000066400000000000000000000033071450137035300220620ustar00rootroot00000000000000Source: guerillabackup Section: misc Priority: optional Maintainer: halfdog Build-Depends: debhelper-compat (=13), dh-python, docbook-xsl, docbook-xml, xsltproc Standards-Version: 4.6.2 Rules-Requires-Root: no Homepage: https://github.com/halfdog/guerillabackup Vcs-Git: https://github.com/halfdog/guerillabackup.git Vcs-Browser: https://github.com/halfdog/guerillabackup Package: guerillabackup Architecture: all Depends: python3, ${misc:Depends} Description: resilient, distributed backup and archiving solution GuerillaBackup is a backup solution for tailoring special purpose backup data flows, e.g. in rugged environments, with special legal constraints (privacy regulations, cross border data storage), need for integration of custom data processing, auditing or quality assurance code. It is kept small to ease code audits and does not attempt to duplicate features, stable and trusted high quality encryption, networking or storage solutions already provide. So for example it can be integrated with your choice of trusted transports (ssh, SSL, xmpp, etc.) to transfer backups to other nodes according to predefined or custom transfer policies. GuerillaBackup ensures security by encrypting data at source and only this key can be used to decrypt the data. . 
WARNING: If you are familiar with GDPR, ITIL-service-strategy/design ISO-27k, ... this software will allow you to create custom solutions fulfilling your needs for high quality, legally sound backup and archiving data flows. If you do NOT run production with those (or similar terms) in mind, you should look out for something else. . See /usr/share/doc/guerillabackup/Design.txt.gz section "Requirements" for more information. guerillabackup-0.5.0/data/debian.template/copyright000066400000000000000000000020251450137035300224060ustar00rootroot00000000000000Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/ Upstream-Name: guerillabackup Upstream-Contact: me@halfdog.net Source: https://github.com/halfdog/guerillabackup.git Files: * Copyright: 2016-2023 halfdog License: LGPL-3.0+ License: LGPL-3.0+ This package is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. . This package is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. . You should have received a copy of the GNU General Public License along with this program. If not, see . . On Debian systems, the complete text of the GNU Lesser General Public License can be found in "/usr/share/common-licenses/LGPL-3". guerillabackup-0.5.0/data/debian.template/dirs000066400000000000000000000001571450137035300213430ustar00rootroot00000000000000etc/guerillabackup/lib-enabled var/lib/guerillabackup var/lib/guerillabackup/data var/lib/guerillabackup/state guerillabackup-0.5.0/data/debian.template/guerillabackup.install000066400000000000000000000007671450137035300250500ustar00rootroot00000000000000src/lib usr/lib/guerillabackup src/gb-backup-generator usr/bin src/gb-storage-tool usr/bin src/gb-transfer-service usr/bin data/etc/* etc/guerillabackup data/init/systemd/guerillabackup-generator.service lib/systemd/system data/init/systemd/guerillabackup-transfer.service lib/systemd/system doc/Design.txt usr/share/doc/guerillabackup doc/Implementation.txt usr/share/doc/guerillabackup doc/Installation.txt usr/share/doc/guerillabackup doc/BackupGeneratorUnit.py.template usr/share/doc/guerillabackup guerillabackup-0.5.0/data/debian.template/guerillabackup.lintian-overrides000066400000000000000000000013711450137035300270300ustar00rootroot00000000000000# We use a non-standard dir permission to prohibit any non-root # user to access keys or data by default. guerillabackup binary: non-standard-dir-perm # Ignore repeated path segment to have "guerillabackup" Python # package in an otherwise empty directory so that modifying # the Python search path to include "lib" will not include any # other files from this or other packages in the search path. guerillabackup binary: repeated-path-segment guerillabackup [usr/lib/guerillabackup/lib/guerillabackup/] guerillabackup binary : repeated-path-segment lib [usr/lib/guerillabackup/lib/] # Currently only systemd startup is provided, therefore ignore # the warning but document it here. 
guerillabackup binary: package-supports-alternative-init-but-no-init.d-script guerillabackup-0.5.0/data/debian.template/guerillabackup.manpages000066400000000000000000000001231450137035300251570ustar00rootroot00000000000000debian/gb-backup-generator.1 debian/gb-storage-tool.1 debian/gb-transfer-service.1 guerillabackup-0.5.0/data/debian.template/postinst000077500000000000000000000030551450137035300222700ustar00rootroot00000000000000#!/bin/sh # postinst script for guerillabackup # # see: dh_installdeb(1) set -e # summary of how this script can be called: # * `configure' # * `abort-upgrade' # * `abort-remove' `in-favour' # # * `abort-remove' # * `abort-deconfigure' `in-favour' # `removing' # # for details, see https://www.debian.org/doc/debian-policy/ or # the debian-policy package # dh_installdeb will replace this with shell code automatically # generated by other debhelper scripts. #DEBHELPER# # Start the service only when it was already enabled before updating. # This is required as "dh_systemd_start" is disabled in rules # file, thus not restarting the services. See rules file for more # information. if test -d /run/systemd/system && test -e /run/guerillabackup.dpkg-update.run-state; then daemonReloadedFlag="false" for serviceName in guerillabackup-generator.service guerillabackup-transfer.service; do if grep -q -e "${serviceName}: active" -- /run/guerillabackup.dpkg-update.run-state; then if [ "${daemonReloadedFlag}" != "true" ]; then systemctl --system daemon-reload >/dev/null || true daemonReloadedFlag="true" fi deb-systemd-invoke restart "${serviceName}" >/dev/null || true fi done fi rm -f -- /run/guerillabackup.dpkg-update.run-state exit 0 guerillabackup-0.5.0/data/debian.template/prerm000077500000000000000000000021551450137035300215320ustar00rootroot00000000000000#!/bin/sh # prerm script for guerillabackup # # see: dh_installdeb(1) set -e # summary of how this script can be called: # * `remove' # * `upgrade' # * `failed-upgrade' # * `remove' `in-favour' # * `deconfigure' `in-favour' # `removing' # # for details, see https://www.debian.org/doc/debian-policy/ or # the debian-policy package # Capture the state of all currently running services before stop. rm -f -- /run/guerillabackup.dpkg-update.run-state if test -d /run/systemd/system; then for serviceName in guerillabackup-generator.service guerillabackup-transfer.service; do echo "${serviceName}: $(systemctl is-active "${serviceName}")" >> /run/guerillabackup.dpkg-update.run-state deb-systemd-invoke stop "${serviceName}" > /dev/null done fi # dh_installdeb will replace this with shell code automatically # generated by other debhelper scripts. #DEBHELPER# exit 0 guerillabackup-0.5.0/data/debian.template/rules000077500000000000000000000021311450137035300215310ustar00rootroot00000000000000#!/usr/bin/make -f # -*- makefile -*- # Uncomment this to turn on verbose mode. # export DH_VERBOSE=1 %: dh $@ --with=python3 override_dh_auto_build: xsltproc --nonet \ --param make.year.ranges 1 \ --param make.single.year.ranges 1 \ --param man.charmap.use.subset 0 \ -o debian/ \ http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl \ doc/gb-backup-generator.1.xml doc/gb-storage-tool.1.xml \ doc/gb-transfer-service.1.xml dh_auto_build # Do not enable the services on fresh install by default. The # user should do that manually for those services, he really wants # to run. Also do not start the services after install or update. # Without this option, all units would be started during upgrade, # even those not enabled. 
When the user did not enable them, dpkg
# should respect that. Those enabled will still be started by
# custom postinst code.
override_dh_installsystemd:
	dh_installsystemd --no-enable --no-start

override_dh_fixperms:
	dh_fixperms
	chmod -R 00700 -- debian/guerillabackup/var/lib/guerillabackup
	chmod 00700 -- debian/guerillabackup/etc/guerillabackup/keys

guerillabackup-0.5.0/data/debian.template/source/format
3.0 (native)

guerillabackup-0.5.0/data/etc/config.template

# GuerillaBackup main configuration file.

# General parameters influence the behavior of various backup elements,
# e.g. source units, sinks or the generator itself. All those
# parameters start with "General" to indicate their global relevance.

# This is the default persistency storage base directory for all
# components. All components will create files or subdirectories
# starting with the component class name unless changed within
# the configuration. See also "ComponentPersistencyPrefix" in unit
# or subunit configuration files.
# GeneralPersistencyBaseDir = '/var/lib/guerillabackup/state'

# This is the default runtime data directory for all components.
# It is used to create sockets, PID files and similar data that
# need not be preserved between reboots.
# GeneralRuntimeDataDir = '/run/guerillabackup'

# This parameter defines the default pipeline element used to
# compress backup data of any kind before sending it to downstream
# processing, usually encryption or the sink. When enabling compression
# and encryption, you may want to disable the additional compression
# step included in many encryption tools, e.g. via "--compress-algo
# none" in gpg.
# GeneralDefaultCompressionElement = guerillabackup.OSProcessPipelineElement('/bin/bzip2', ['/bin/bzip2', '-c9'])

# This parameter defines the default encryption pipeline element
# used to encrypt backup data of any kind before sending it to
# downstream processing. For security reasons, a unit might
# use an alternative encryption element, e.g. with different options
# or keys, but it should NEVER ignore the parameter, even when
# unit-specific encryption is disabled. Hence the unit shall never
# generate unencrypted data while this parameter is not also
# overridden in the unit-specific configuration. See also the
# documentation of the function "getDefaultDownstreamPipeline".
# GeneralDefaultEncryptionElement = guerillabackup.GpgEncryptionPipelineElement('some key name')

# Debugging settings:

# This flag enables test mode for all configurable components
# in the data pipe from source to sink. As testing of most features
# requires running real backups, the test mode will cause an
# abort at the very last moment before completion. Well-behaved
# components will roll back most of their actions under these
# circumstances.
# GeneralDebugTestModeFlag = False

# Generator-specific settings: these settings configure the local
# default backup generator.

# Use this sink for storage of backup data elements. The class
# has to have a constructor taking only one argument, namely
# the generator configuration context as defined by the SinkInterface.
# When empty, the guerillabackup.DefaultFileSystemSink is used.
# GeneratorSinkClass = guerillabackup.DefaultFileSystemSink # Use this directory for storage of the backup data elements generated # locally. The default location is "/var/lib/guerillabackup/data". # You may want also to enable transfer services using this directory # as source to copy or move backup data to an offsite location. DefaultFileSystemSinkBaseDir = '/var/lib/guerillabackup/data' # This parameter defines the conditions that have to be met to # run any backup unit. The settings is intended to avoid running # units at unfavorable times, e.g. during machine maintenance, # immediately during boot time high CPU/disk activity but also # when there is abnormally high load on the machine. When the # condition is not met yet it will be reevaluated at the next # scheduler run, usually some seconds later. # DefaultUnitRunCondition = guerillabackup.LogicalAndCondition([ # guerillabackup.MinPowerOnTimeCondition(600), # guerillabackup.AverageLoadLimitCondition(0.5, 240)]) # Unit specific default and specific settings can be found in # the units directory. # Transfer service configuration: this part of configuration does # not take effect automatically, a transfer service has to be # started loading this configuration file. When security considerations # prohibit use of same configuration, e.g. due to inaccessibility # of configuration file because of permission settings, then this # file should be copied to "config-[agent name]" instead. # Storage directory used by this transfer service. When not present, # the DefaultFileSystemSinkBaseDir is used instead. # TransferServiceStorageBaseDir = '/var/spool/guerillabackup/transfer' # Class to load to define the transfer receiver policy. # TransferReceiverPolicyClass = guerillabackup.Transfer.ReceiverStoreDataTransferPolicy # Arguments for creating the named transfer policy to pass after # the configuration context. # TransferReceiverPolicyInitArgs = None # Class to load to define the transfer sender policy. # TransferSenderPolicyClass = guerillabackup.Transfer.SenderMoveDataTransferPolicy # Arguments for creating the named transfer policy to pass after # the configuration context. # TransferSenderPolicyInitArgs = [False] guerillabackup-0.5.0/data/etc/keys/000077500000000000000000000000001450137035300171465ustar00rootroot00000000000000guerillabackup-0.5.0/data/etc/keys/Readme.txt000066400000000000000000000005651450137035300211120ustar00rootroot00000000000000Use this directory to collect all the encryption keys required for guerillabackup tool operation. By default, the directory is only readable for the root user to hide the keys used by default. Change only when you know what you are doing. This directory is also the default home directory for Gnupg as it does not support operation without warnings and no home directory. guerillabackup-0.5.0/data/etc/storage-tool-config.json.template000066400000000000000000000053731450137035300245720ustar00rootroot00000000000000# This is the gb-storage-tool configuration template. See also the # gb-storage-tool man page for more information. { # These are the default policies to apply to resources in the # data directory when they are first seen. Policies are passed # on from a configuration to included subconfigurations. An included # configuration may override a policy by defining another one # on the same resource but with higher priority. To disable policy # inheritance, add a null policy as first element in the list. 
"Policies": [ { "Sources": "^(.*/)?root$", "Inherit": true, "List": [ { "Name": "Interval", "Priority": 100, "FullMin": "6d20H", "FullMax": "6d28H", "IncMin": "20H", "IncMax": "28H" }, { "Name": "LevelRetention", "Levels": [ # Keep weekly backups for 30 days, including incremental backups. { "KeepCount": 30, "Interval": "day", "TimeRef": "latest", "KeepInc": true }, # Keep weekly backups for 3 month, approx. 13 backups. { "KeepCount": 13, "Interval": "day", "AlignModulus": 7 }, # Keep monthly backups for 12 month. { "KeepCount": 12, "Interval": "month" }, # Keep 3-month backups for 3 years, total 12 backups. { "KeepCount": 12, "Interval": "month", "AlignModulus": 3, "AlignValue": 1 }, # Keep yearly backups. { "KeepCount": 10, "Interval": "year" } ] } ] }, { "Sources": "^(.*/)?var/log/.*$", "List": [ { "Name": "Interval", "Priority": 100, "FullMin": "20H", "FullMax": "28H" } ] } ], # This is the data directory for this configuration. All files # not within the data directory of another (sub-)configuration # have to be sane backup resource files or otherwise covered # by a policy, usually the "Ignore" policy in the status file. "DataDir": "/var/lib/guerillabackup/data", # Ignore those files in the data directory. Ignoring nonexisting # files will cause a warning. "Ignore": [ ], # This is the status file defining the current status associated # with files in "DataDir" when required. "Status": "/var/lib/guerillabackup/state/storage-tool-status.json" # Include a list of sub-configuration files for backup storages # spread out over multiple unrelated data directories or to split # one huge configuration into multiple smaller ones. # "Include": [ # "/...[another storage].../storage-tool-config.json" # ] } guerillabackup-0.5.0/data/etc/units/000077500000000000000000000000001450137035300173355ustar00rootroot00000000000000guerillabackup-0.5.0/data/etc/units/LogfileBackupUnit.config.template000066400000000000000000000061121450137035300257050ustar00rootroot00000000000000# LogFileBackupUnit configuration template # This list contains tuples with five elements per logfile backup # input. The meaning of each value is: # * Input directory: absolute directory name to search for logfiles. # * Input file regex: regular expression to select compressed # or uncompressed logfiles for inclusion. When the regex contains # a named group "oldserial", a file with empty serial is handled # as newest while file with largest serial value is the oldest. # With named group "serial", oldest file will have smallest # serial number, e.g. with date or timestamp file extensions. # When a named group "compress" is found, the match content, # e.g. "gz" or "bz2", will be used to find a decompressor and # decompress the file before processing. # * Source URL transformation: If None, the first named group # of the "input file regex" is appended to the input directory # name and used as source URL. When not starting with a "/", # the transformation string is the name to include literally # in the URL after the "input directory" name. # * Policy: If not none, include this string as handling policy # within the manifest. # * Encryption key name: If not None, encrypt the input using # the named key. LogBackupUnitInputList = [] # Include old (rotated) default syslog files, where serial number # was already appended. Accept also the compressed variants. 
# LogBackupUnitInputList.append(( # '/var/log', # '^(auth\\.log|daemon\\.log|debug|kern\\.log|mail\\.err|mail\\.info|' \ # 'mail\\.log|mail\\.warn|messages|syslog)\\.(?P[0-9]+)' \ # '(?:\\.(?Pgz))?$', # None, None, None)) # Other logs and backup files: # LogBackupUnitInputList.append(( # '/var/log', # '^(alternatives\\.log|btmp|dmesg|dpkg\\.log|wtmp)\\.' \ # '(?P[0-9]+)(?:\\.(?Pgz))?$', # None, None, None)) # Apt logs: # LogBackupUnitInputList.append(( # '/var/log/apt', # '^([a-z]+\\.log)\\.(?P[0-9]+)(?:\\.(?Pgz))?$', # None, None, None)) # Apache logs: # LogBackupUnitInputList.append(( # '/var/log/apache2', # '^([0-9a-zA-Z.-]+)\\.(?P[0-9]+)(?:\\.(?Pgz))?$', # None, None, None)) # Firewall logs: # LogBackupUnitInputList.append(( # '/var/log/ulog', # '^(ulogd\\.pcap)\\.(?P[0-9]+)(?:\\.(?Pgz))?$', # None, None, None)) # Tomcat logs: # LogBackupUnitInputList.append(( # '/var/log/tomcat8', # '^(catalina\\.out)\\.(?P[0-9]+)(?:\\.(?Pgz))?$', # None, None, None)) # LogBackupUnitInputList.append(( # '/var/log/tomcat8', # '^(catalina)\\.(?P[0-9-]{10})\\.log(?:\\.(?Pgz))?$', # 'catalina.log', None, None)) # LogBackupUnitInputList.append(( # '/var/log/tomcat8', # '^(localhost)\\.(?P[0-9-]{10})\\.log(?:\\.(?Pgz))?$', # 'localhost.log', None, None)) # LogBackupUnitInputList.append(( # '/var/log/tomcat8', # '^(localhost_access_log)\\.(?P[0-9-]{10})\\.txt(?:\\.(?Pgz))?$', # 'localhost_access_log.txt', None, None)) guerillabackup-0.5.0/data/etc/units/Readme.txt000066400000000000000000000012211450137035300212670ustar00rootroot00000000000000This directory contains all the loaded units plus configuration parameter overrides, if available. When not available, the main backup generator configuration, usually "/etc/guerillabackup/config", is passed to each unit unmodified. A valid unit can be a symlink to a guerillabackup core unit, e.g. /usr/lib/guerillabackup/lib/guerillabackup/LogfileBackupUnit.py but also a local unit definition written into a plain file. To be loaded, the unit definition file name has to contain only numbers and letters. An associated configuration file has the same name with suffix ".config" appended. This "Readme.txt" and all files named ".template" are ignored. guerillabackup-0.5.0/data/etc/units/TarBackupUnit.config.template000066400000000000000000000053731450137035300250620ustar00rootroot00000000000000# TarBackupUnit configuration template # This list contains dictionaries with configuration parameters # for each tar backup to run. All tar backups of one unit are # run sequentially. Configuration parameters are: # * PreBackupCommand: execute this command given as list of arguments # before starting the backup, e.g. create a filesystem or virtual # machine snapshot, perform cleanup. # * PostBackupCommand: execute this command after starting the # backup. # * Root: root directory of tar backup, "/" when missing. # * Include: list of pathes to include, ["."] when missing. # * Exclude: list of patterns to exclude from backup (see tar # documentation "--exclude"). When missing and Root is "/", # list ["./var/lib/guerillabackup/data"] is used. # * IgnoreBackupRaces: flag to indicate if races during backup # are acceptable, e.g. because the directories are modified, # * FullBackupTiming: tuple with minimum and maximum interval # between full backup invocations and modulo base and offset, # all in seconds. Without modulo invocation (all values None), # full backups will run as soon as minimum interval is exceeded. # With modulo timing, modulo trigger is ignored when below minimum # time. 
When gap above maximum interval, immediate backup is # started. # * IncBackupTiming: When set, incremental backups are created # to fill the time between full backups. Timings are specified # as tuple with same meaning as in FullBackupTiming parameter. # This will also trigger generation of tar file indices when # running full backups. # * FullOverrideCommand: when set, parameters Exclude, Include, # Root are ignored and exactly the given command is executed. # * IncOverrideCommand: when set, parameters Exclude, Include, # Root are ignored and exactly the given command is executed. # * KeepIndices: number of old incremental tar backup indices # to keep. With -1 keep all, otherwise keep one the given number. # Default is 0. # * Policy: If not none, include this string as handling policy # * EncryptionKey: If not None, encrypt the input using the named # key. Otherwise default encryption key from global configuration # might be used. TarBackupUnitConfigList = {} # TarBackupUnitConfigList['/root'] = { # 'PreBackupCommand': ['/usr/bin/touch', '/tmp/prebackup'], # 'PostBackupCommand': ['/usr/bin/touch', '/tmp/postbackup'], # 'Root': '/', # 'Include': ['.'], # 'Exclude': ['./proc', './sys', './var/lib/guerillabackup/data'], # 'IgnoreBackupRaces': False, # Schedule one root directory full backup every week. # 'FullBackupTiming': [(7*24-4)*3600, (7*24+4)*3600, 7*24*3600, 0], # Create a daily incremental backup when machine is up. # 'IncBackupTiming': [20*3600, 28*3600, 24*3600, 0], # 'Policy': 'default', 'EncryptionKey': None} guerillabackup-0.5.0/data/init/000077500000000000000000000000001450137035300163635ustar00rootroot00000000000000guerillabackup-0.5.0/data/init/systemd/000077500000000000000000000000001450137035300200535ustar00rootroot00000000000000guerillabackup-0.5.0/data/init/systemd/guerillabackup-generator.service000066400000000000000000000017501450137035300264160ustar00rootroot00000000000000[Unit] Description="Guerillabackup backup generator service" Documentation=man:gb-backup-generator(1) After=network.target [Service] Type=simple ExecStart=/usr/bin/gb-backup-generator Restart=always # Enable strict hardening by default: the settings here should # be compatible with the backup generator units provided by the # software package, but non-standard units may require these # settings to be relaxed. LockPersonality=true MemoryDenyWriteExecute=true # Do not provide a private view on devices as usually the devices # should also end up unmodified in the backup when included by # the backup source selection. PrivateDevices=false # Do not exclude the temporary directories from backup here but # using the source selection. PrivateTmp=false ProtectClock=true ProtectControlGroups=true ProtectHostname=true ProtectKernelLogs=true ProtectKernelModules=true ProtectKernelTunables=true ProtectSystem=full RestrictNamespaces=true RestrictRealtime=true [Install] WantedBy=multi-user.target guerillabackup-0.5.0/data/init/systemd/guerillabackup-transfer.service000066400000000000000000000010651450137035300262530ustar00rootroot00000000000000[Unit] Description="Guerillabackup data transfer service" Documentation=man:gb-transfer-service(1) After=network.target [Service] Type=simple ExecStart=/usr/bin/gb-transfer-service Restart=always # Enable strict hardening by default. 
LockPersonality=true MemoryDenyWriteExecute=true PrivateDevices=true PrivateTmp=true ProtectClock=true ProtectControlGroups=true ProtectHostname=true ProtectKernelLogs=true ProtectKernelModules=true ProtectKernelTunables=true ProtectSystem=full RestrictNamespaces=true RestrictRealtime=true [Install] WantedBy=multi-user.target guerillabackup-0.5.0/data/init/upstart/000077500000000000000000000000001450137035300200655ustar00rootroot00000000000000guerillabackup-0.5.0/data/init/upstart/guerillabackup-generator.conf000066400000000000000000000003021450137035300257050ustar00rootroot00000000000000# guerillabackup - Start the backup generator service description "Guerillabackup backup generator service" start on filesystem stop on starting rcS respawn exec /usr/bin/gb-backup-generator guerillabackup-0.5.0/doc/000077500000000000000000000000001450137035300152545ustar00rootroot00000000000000guerillabackup-0.5.0/doc/BackupGeneratorUnit.py.template000066400000000000000000000067651450137035300233720ustar00rootroot00000000000000# This file is a template to create own backup generator unit # definitions, that are in fact just plain python code. You may # also use other guerillabackup core units as basis for your new # code, e.g. /usr/lib/guerillabackup/lib/guerillabackup/LogfileBackupUnit.py """Your module docstring here ...""" import errno import json import os import guerillabackup # Declare the keys to access configuration parameters in the configuration # data dictionary here. CONFIG_SOME_KEY = 'SomeBackupUnitSomeParameter' class SomeBackupUnit(guerillabackup.SchedulableGeneratorUnitInterface): """Add documentation about this class here.""" def __init__(self, unitName, configContext): """Initialize this unit using the given configuration.... @param unitName The name of the activated unit main file in /etc/guerillabackup/units.""" # Keep the unitName, it is usefull to create unique persistency # directory names. self.unitName = unitName self.configContext = configContext # Each unit has to handle the test mode flag, so extract it here. self.testModeFlag = configContext.get( guerillabackup.CONFIG_GENERAL_DEBUG_TEST_MODE_KEY, False) if not isinstance(self.testModeFlag, bool): raise Exception('Configuration parameter %s has to be ' \ 'boolean' % guerillabackup.CONFIG_GENERAL_DEBUG_TEST_MODE_KEY) # Open a persistency directory. self.persistencyDirFd = guerillabackup.openPersistencyFile( configContext, os.path.join('generators', self.unitName), os.O_DIRECTORY|os.O_RDONLY|os.O_CREAT|os.O_EXCL|os.O_NOFOLLOW|os.O_NOCTTY, 0o600) handle = None try: handle = guerillabackup.secureOpenAt( self.persistencyDirFd, 'state.current', fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY) except OSError as openError: if openError.errno != errno.ENOENT: raise if handle != None: stateData = b'' while True: data = os.read(handle, 1<<20) if len(data) == 0: break stateData += data os.close(handle) stateInfo = json.loads(str(stateData, 'ascii')) if ((not isinstance(stateInfo, list)) or (len(stateInfo) != 2) or (not isinstance(stateInfo[0], int)) or (not isinstance(stateInfo[1], dict))): raise Exception('Persistency data structure mismatch') ... # Now use the persistency information. def getNextInvocationTime(self): """Get the time in seconds until this unit should called again. If a unit does not know (yet) as invocation needs depend on external events, it should report a reasonable low value to be queried again soon. 
@return 0 if the unit should be invoked immediately, the seconds to go otherwise.""" # Calculate the next invocation time. maxIntervalDelta = 600.0 ... return maxIntervalDelta def invokeUnit(self, sink): """Invoke this unit to create backup elements and pass them on to the sink. Even when indicated via getNextInvocationTime, the unit may decide, that it is not yet ready and not write any element to the sink. @return None if currently there is nothing to write to the source, a number of seconds to retry invocation if the unit assumes, that there is data to be processed but processing cannot start yet, e.g. due to locks held by other parties or resource, e.g. network storages, currently not available.""" ... # Declare the main unit class so that the backup generator can # instantiate it. backupGeneratorUnitClass = SomeBackupUnit guerillabackup-0.5.0/doc/Design.txt000066400000000000000000000411421450137035300172300ustar00rootroot00000000000000Terms: ====== The following terms shall be used within requirements and design to describe the components. * Backup data element: A complete, atomic backup data storage unit representing a defined complete state (full) or the change from a previous state (incremental). Each element is linked to a single source identified by a backup data element ID. * Backup data element ID: An unique identifier for backup data element storages to refer to a stored element. The storage has to be able to derive the corresponding "source URL" from the ID. * BackupGenerator: The tool implementing the "Backup Scheduler" and "Sink" functionality to trigger execution of registered "Generator Unit" elements. * Backup Scheduler: The scheduler will invoke backup generation of a given "Generator Unit", thus triggering the backup storage to a given sink. * Generator Unit: When invoked by a BackupGenerator, the unit delivers backup data elements from one or more "Sources". This does not imply, that the unit has direct access to the "Source", it may also retrieve elements from other generators or intermediate caching or storage units. * Source: A source is an identified data entity with a defined state in time. At some timepoints, backup data elements can be produced to represent that state or changes to that state to an extent depending on the source properties. The series of backup data elements produced for a single source are identified by a common "Source URL". * Source Multiplexer: A source multiplexer can retrieve or generate "backup data elements" from one or more sources and deliver deliver them to a sink multiplexer. * Source URL: See design. User stories: ============= * N-way redundant storage synchronization: There are n machines, all producing "backup data elements". All these machines communicate one with another. The backup data elements from one machine should be stored on at least two more machines. The source synchronization policy is to announce each element to all other machines until one of those confirms transmission. On transmission, the source keeps information about which elements were sucessfully stored by the remote receiver. As soon as the required number of copies is reached, the file is announced only to those agents any more, that have already fetched it. The receiver synchronization policy is to ask each machine for data elements. If the element is already present, it will not be fetched again, otherwise the local policy may decide to start a fetch procedure immediately. 
To conserve bandwidth and local resources, the policy may also refuse to fetch some elements now and retry later, thus giving another slower instance the chance to fetch the file by itself. For local elements not announced any more by the remote source, the receiver will move them to the attic and delete after some time. When an agent does not attempt to synchronize for an extended period of time, the source will not count copies made by this agent to the total number of copies any more. Thus the source may start announcing the same backup data element to other agents again. Requirements: ============= * [Req:SourceIdentification]: Each backup data source, that is an endpoint producing backups, shall have a unique address, both for linking stored backups to the source but also to apply policies to data from a single source or to control behaviour of a source. * [Req:SecureDataLocalDataTransfers]: Avoid copying of files between different user contexts to protect against filesystem based attacks, privilege escalation. * [Req:SynchronousGenerationAndStreaming]: Allow streaming of backups from generator context to other local or remote context immediately during generation. * [Req:Spooling]: Support spooling of "backup data elements" on intermediate storage location, which is needed when final storage location and source are not permanently connected during backup generation. As "backup data elements" may need to be generated timely, support writing to spool location and fetching from there. * [Req:DetectSpoolingManipulations]: A malicious spool instance shall not be able to remove or modify spooled backup data. * [Req:EncryptedBackups]: Support encryption of backup data. * [Req:Metainfo]: Allow transport of "backup data element" meta information: * Backup type ([Req:MetainfoBackupType]): * full: the file contains the complete copy of the data * inc: the backup has to be applied to the previous full backup and possibly all incremental backups in between. * storage data checksums * handling policy * fields for future use * [Req:NonRepudiation]: Ensure non-repudiation even for backup files in spool. This means, that as soon as a backup file was received by another party, the source shall not be able to deny having produced it. * [Req:OrderedProcessing]: With spooling, files from one source might not be transmitted in correct order. Some processing operations, e.g. awstats, might need to see all files in correct order. Thus where relevant, processor at end of pipeline shall be able to verify all files have arrived and are properly ordered. * [Req:DecentralizedScheduling]: Administrator at source shall be able to trigger immediate backup and change scheduling if not disabled upstream when using streaming generation (see also [Req:SynchronousGenerationAndStreaming]). * [Req:DecentralizedPolicing]: Administrator at source shall be able to generate a single set of cyclic backups with non-default policy tags. * [Req:ModularGeneratorUnitConfiguration]: Modularized configuration at backup generator level shall guarantee independent adding, changing or removal of configuration files but also ... * [Req:ModularGeneratorUnitCustomCode]: ... easy inclusion of user-specific custom code without touching the application core. * [Req:StorageBackupDataElementAttributes]: Apart from data element attributes, storage shall be able to keep track of additional storage attributes to manage data for current applications, e.g. policy based synchronization between multiple storages but also for future applications. 
Design: ======= * Source URL ([Req:SourceIdentification]): Each backup source is identified by a unique UNIX-pathlike string starting with '/' and path components consisting only of characters from the set [A-Za-z0-9%.-] separated by slashes. The path components '.' and '..' are forbidden for security reasons. The URL must not end with a slash. A source may decide to use the '%' or any other character for escaping when creating source URLs from any other kind of input with broader set of allowed characters, e.g. file names. The source must not rely on any downstream processor to treat it any special. * File storage: The storage file name is the resource name but with ISO-like timestamp (second precision) and optional serial number prepended to the last path part of the source URL followed by the backup type ([Req:MetainfoBackupType]) and suffix '.data'. Inclusion of the backup type in file name simplifies manual removal without use of staging tools. * Unit configuration: A "generator unit" of a given type might have to be added multiple times but with different configuration. Together with the configuration, each of the units might also require a separate persistency or locking directory. Therefore following scheme shall be used: * A unit is activated by adding a symlink to an guerillabackup core unit definition file ([Req:ModularGeneratorUnitConfiguration]) or creating a custom definition ([Req:ModularGeneratorUnitCustomCode]). * A valid "generator unit" has to declare the main class object to instantiate so that the generator can locate the class. * Existance of a configuration file with same name as unit file, just suffix changed to ".config", will cause this configuration to be added as overlay to the global backup generator configuration. * Data processing pipeline design: * Data processing pipelines shall be used to allow creation of customized processing pipelines. * To support parallel processing and multithreading, processing pipelines can be built using synchronous and asynchronous pipeline elements. * Data processing has to be protected against two kind of errors, that is blocking IO within one component while action from another component would be required. Thus blocking, if any requires a timeout under all circumstances to avoid that an asynchronous process enters error state while blocking. * Even when a pipeline instance in the middle of the complete pipeline has terminated, nothing can be infered about the termination behaviour of the up- or downstream elements, it is still required to await normal termination or errors when triggering processing: an asynchronous process may terminate after an unpredictable amount of time due to calculation or IO activites not interfering with the pipeline data streams. * Due to the asynchronous nature of pipeline processing and the use of file descriptors for optimization, closing of those descriptors may only occur after the last component using a descriptor has released it. The downstream component alone is allowed to close it and is also in charge of closing it. If a downstream component is multithreaded and one thread is ready to close it, another one still needs it, then the downstream component has to solve that problem on its own. * When all synchronous pipeline elements are stuck, select on all blocking file descriptors from those elements until the first one is ready for IO again. * Scheduling: The scheduler contains a registry of known sources. 
Each source is responsible to store scheduling information required for operation, e.g. when source was last run, when it is scheduled to be run again. Each source has to provide methods to schedule and run it. Each source has default processing pipeline associated, e.g. compression, encryption, signing. A source, that does not support multiple parallel invocation has to provide locking support. While the older source process shall continue processing uninterupted, the newer one may indicate a retry timeout. * Encryption and signing: GPG shall be used to create encrypted and signed files or detached signatures for immediate transmission to trusted third party. * Metainformation files ([Req:Metainfo]): This file has a simple structure just containing a json-serialized dictionary with metainformation key-value pairs. The file name is derived from the main backup file by appending ".info". As json serialization of binary data is inefficient regarding space, this data shall be encoded base64 before writing. * BackupType ([Req:MetainfoBackupType]): Mandatory field with type of the backup, only string "full" and "inc" supported. * DataUuid ([Req:OrderedProcessing]): Optional unique identifier of this file, e.g. when it may be possible, that datasets with same timestamp might be available from a single source. To allow reuse of DataUuid also for [Req:DetectSpoolingManipulations], it has to be unique not only for a single source but for all sources from a machine or even globally. This has to hold true also when two source produce identical data. Otherwise items with same Uuid could be swapped. It should be the same when rerunning exactly the same backup for a single source with identical state and data twice, e.g. when source wrote data to sink before failing to complete the last steps. * HandlingPolicy: Optional list of strings from set [A-Za-z0-9 ./-] defining how a downstream backup sink or storage maintenance process should handle the file. No list or an empty list is allowed. Receiver has to know how to deal with a given policy. * MetaDataSignature ([Req:DetectSpoolingManipulations]): A base64 encoded binary PGP signature on the whole metadata json artefact with MetaDataSignature field already present but set to null and TransferAttributes field missing. * Predecessor ([Req:OrderedProcessing]): String with UUID of file to be processed before this one. * StorageFileChecksumSha512: The base64 binary checksum of the stored backup file. As the file might be encrypted, this does not need to match the checksum of the embedded content. * StorageFileSignature: Base64 encoded data of binary PGP-signature made on the storage file immediately after creation. While signatures embedded in the encrypted storage file itself are problematic to detect manipulation on the source system between creation and retrieval, this detached signature can be easily copied from the source system to another more trustworthy machine, e.g. by sending as mail or writing to remote syslog. ([Req:NonRepudiation]) * Timestamp: Backup content timestamp in seconds since 1970. This field is mandatory. * Storage: * The storage uses one JSON artefact per stored backup data element containing a list of two items: the first one is the element's metainfo, the second one the dictionary containing the attributes. * Synchronization: * Daemons shall monitor the local storage to offer backup data elements to remote synchronization agents based on the local policy. 
* Daemons shall also fetch remote resources when offered and local storage is acceptable according to policy. * TransferAttributes: A dictionary with attributes set by the transfer proceses and used by backup data element announcement and storage policies to optimize operation. Each attribute holds a list of attribute values for each remote transfer agent and probably also the transfer source. * Transmission protocol: For interaction between the components, a minimalistic transmission protocol shall be used. * The protocol implementation shall be easily replaceable. * The default JSON protocol implementation only forwards data to ServerProtocolInterface implementation. Each request is just a list with method name to call, the remaining list items are used as call arguments. The supported method names are the same as in ServerProtocolInterface interface. As sending of large JSON responses is problematic, the protocol supports multipart responses, sending chunks of data. BackupGenerator Design: ======================= * The generator can write to a single but configurable sink, thus enabling both local file storage but also remote fetch. * It keeps track of the Schedulable Source Units. * It provides configuration support to the source units. * It provides persistency support to the source units. StorageTool Design: =================== The tool checks and modifies a storage using this workflow: * Load the master configuration and all included configurations. * Locate the storage data and meta information files in each configuration data directory. * Check for applicable but yet unused policy templates and apply them to files matching the selection pattern. Following policies can be applied to each source: * Backup interval policy: this policy checks if full and incremental backups are created and transfered with sane intervals between each element and also in relation to current time (generated at all). * Integrity policy: this policy checks that the hash value of the backup data matches the one in the metadata and that all metadata blocks are chained together appropriately, thus detecting manipulation of backup data and metadata in the storage. (Not implemented yet) * Retention policies: these policies check which backup data elements should be kept and which ones could be pruned from the storage to free space but also to comply with regulations regarding data retention, e.g. GDPR. Such policies also interact with the mechanisms to really delete the data without causing inconsistencies in other policies, see below. From all policies the retention policies are special as they may release backup data elements still needed by other policies when checking compliance. So for example deleting elements will always break the integrity policy as it shall detect any modification of backup data by design. Therefore applying any policy is a two step process: first all policies are applied (checked) and retention policies may mark some elements for deletion. Before performing the deletion each policy is invoked again to extract any information from the to be deleted elements that the policy will need when applied again to the same storage after the elements were already deleted. Thus any storage state that was valid according to a policy will stay valid even after deletion of some elements. guerillabackup-0.5.0/doc/FAQs.txt000066400000000000000000000101171450137035300166070ustar00rootroot00000000000000Introduction: ============= This document contains questions already asked regarding GuerillaBackup. 
Each entry gives only an introductory answer and references to core documentation regarding the question.

Questions regarding whole software suite:
=========================================
* Q: When should I use GuerillaBackup?
  A: Use it when you want to craft a backup toolchain fulfilling the special needs of your environment. Use it when you know that you just cannot simply "make backups and restore them" in a highly complex distributed system as done with a notebook and a USB backup drive. Therefore you most likely have performed a thorough risk analysis, developed mitigation strategies and derived backup and recovery plans. And now you need a flexible toolchain to be configured according to your specification and integrated into your software ecosystem, most likely using fully automated installation (e.g. ansible) and operation. Do NOT use it if you are looking for a 3-click graphical solution to synchronize your notebook desktop data to another location every few weeks or so.
* Q: How do I restore backups?
  A: Sarcastic answer: "exactly the one validated way you defined in your backup plans and standard operating procedures". Realistic answer: most likely you will have configured a simple tar backup during installation that might also be encrypted using the GnuPG encryption example. See "SoftwareOperation.txt" for some answers regarding restore procedures.

gb-backup-generator:
=================
* Q: Where do I find generated archives and backups?
  A: When using only "gb-backup-generator" without any transfers configured (see "gb-transfer-service"), you will find them at the sink location you configured when creating your generator configuration. Starting from a default configuration, you will usually use a file system sink storing to "/var/lib/guerillabackup/data". See the "gb-backup-generator" man page on general generator configuration and the configuration sections in "/usr/share/doc/guerillabackup/Installation.txt".
* Q: What are the PGP keys for, who maintains them?
  A: Backup storages contain all the valuable data from various backup sources and are thus a very interesting target for data theft. Therefore e.g. backup storage media (disks, tapes) have to be tracked, then wiped and destroyed to prevent data leakage. GuerillaBackup supports public key encryption at the source, thus an attack on the central storage system or theft of storage media cannot reveal any relevant information to the adversary. This makes e.g. backup media handling easier and thus more secure, and reduces costs. You could even synchronize your backups to the cloud without relevant data leakage risks. The private key is required only to restore backups and should be kept safe, ideally even offline. To use this feature, enable the "GeneralDefaultEncryptionElement" from "/etc/guerillabackup/config.template" (see the template for more information). Define your protection levels, generate the required keys and install them where needed.

gb-transfer-service:
=================
* Q: How do both sides authenticate?
  A: According to the specification, the "gb-transfer-service" implementation creates a UNIX domain socket which can be protected by standard means. In the default configuration it can only be accessed by the user root. To grant access to that socket remotely, use the very same techniques and tools you use for other network services also. It is quite useful e.g. to forward the UNIX domain socket via SSH on both interactive and non-interactive connections (depends on your use case, see man "ssh", "-L" option) or use "socat UNIX-CONNECT:...
OPENSSL" to use PKI/certificate based access control and data encryption. See "gb-transfer-service" man page for more information. * Q: Which network protocols are supported? A: "gb-transfer-service" requires bidirectional communication but does not aim to reimplement all the high-quality network communication tools out there already. Instead it provides means to easily integrate your preferred tools for network tunneling, network access control. See "Q: How do both sides authenticate?" for more information. guerillabackup-0.5.0/doc/Implementation.txt000066400000000000000000000256451450137035300210160ustar00rootroot00000000000000Introduction: ============= This document provides information on the implementation side design decisions and the blueprint of the implementation itself. Directory structure: ==================== * /etc/guerillabackup: This is the default configuration directory. * config: This is the main GuerillaBackup configuration file. Settings can be overridden e.g. in unit configuration files. * keys: This directory is the default backup encryption key location. Currently this is the home directory of a GnuPG key store. * lib-enabled: This directory is included in the site-path by default. Add symbolic links to include specific Python packages or machine/organisation specific code. * units: The units directory contains the enabled backup data generation units. To enable a unit, a symbolic link to the unit definition file has to be created. The name of the symbolic link has to consist only of letters and numbers. For units with an associated configuration file named "[unitname].config", configuration parameters from the main configuration file can be overridden within the unit-specific configuration. * /var/lib/guerillabackup: This directory is usually only readable by root user unless transfer agents with different UID are configured. * data: where backuped data from local backups is stored, usually by the default sink. * state: State persistency directory for all backup procedures. * state/generators/[UnitName]: File or directory to store state data for a given backup unit. * state/agents: Directory to store additional information of local backup data processing or remote transfer agents. * /run/guerillabackup: This directory is used to keep data, only needed while guerillabackup tools are running. This data can be discarded on reboot. * transfer.socket: Default socket location for "gb-transfer-service". Library functions: ================== * Configuration loading: Configuration loading happens in 2 stages: * Loading of the main configuration. * Loading of a component/module specific overlay configuration. This is allows tools to perform modularised tasks, e.g. a backup generator processing different sources, to apply user-defined configuration alterations to the configuration of a single unit. The overlay configuration is then merged with the main configuration. Defaults have to be set in the main configuration. A tool may refuse to start when required default values are missing in the configuration. Backup Generator: ================= * Process pipelines: Pipeline implementation is designed to support both operating system processes using only file descriptors for streaming, pure Python processes, that need to be run in a separate thread or are polled for normal operation and a mixture of both, i.e. only one of input or output is an operating system pipe. 
Because of the last case, a pipeline instance may behave like a synchronous component at the beginning until the synchronous input was processed and the end of input data was reached. From that moment on, until all processing is finished, it behaves like an asynchronous component.
* State handling: A pipeline instance may only change its state when the doProcess() or isRunning() method is called. It is forbidden to invoke doProcess() on an instance not running any more. Therefore after isRunning() returned true, it is safe to call doProcess().
* Error handling: The standard way to get processing errors is by calling the doProcess() method, even when the process is asynchronous. On error, the method should always return the same error message for a broken process until stop() is called. One error variant is that operating system processes did not read all input from their input pipes and some data remains in buffers. This error has to be reported to the caller either from doProcess() or stop(), whichever comes first. The correct detection of unprocessed input data may fail if a downstream component is stopped while the upstream is running and writing data to a pipe after the checks.
* Pipeline element implementation:
  * DigestPipelineElement: A synchronous element creating a checksum of all data passing through it.
  * GpgEncryptionPipelineElement: A pipeline element returning a generic OSProcessPipelineExecutionInstance to perform encryption using GnuPG.
  * OSProcessPipelineElement: This pipeline element will create an operating system level process wrapped in an OSProcessPipelineExecutionInstance to perform the transformation. The instance may be fully asynchronous when it is connected only by operating system pipes but is synchronous when connected via stdin/stdout data moving.

Policy Based Data Synchronisation:
==================================
To support various requirements, e.g. decentralised backup generation with secure spooling and asynchronous transfers, a component for synchronisation, the "gb-transfer-service", is required; see "doc/Design.txt" section "Synchronisation" for design information. The implementation of "gb-transfer-service" orchestrates all required components related to the following functional blocks:
* ConnectorService: This service provides functions to establish connectivity to other "gb-transfer-service" instances. Currently only "SocketConnectorService" together with the protocol handler "JsonStreamServerProtocolRequestHandler" is supported. The service has to take care of authentication and basic service access authorisation.
* Policies: Policies define how the "gb-transfer-service" should interact with other "gb-transfer-service" instances. There are two types of policies, "ReceiverTransferPolicy" for incoming transfers and "SenderTransferPolicy" for transmitting data. See "ReceiverStoreDataTransferPolicy" and "SenderMoveDataTransferPolicy" for the currently supported policies.
* Storage: A storage to store, fetch and delete StorageBackupDataElements. Some storages may support storing of custom annotation data per element. This can then be used in policies to perform policy decisions, e.g. to prioritise sending of files according to tags. The current storage implementation is "DefaultFileStorage".
* TransferAgent: The agent keeps track of all current connections created via the ConnectorService. It may control load balancing between multiple connections. The currently available agent implementation is "SimpleTransferAgent".
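As a rough illustration of how the policy block plugs into the transfer agent, a hypothetical custom sender policy could look like the sketch below. Only the method names ("queryBackupDataElements", "applyPolicy") follow the "SenderTransferPolicy" description in the next section; the import path, the constructor signature and the storage query call are assumptions that have to be checked against the real code.

# Hypothetical sketch only -- import path, constructor argument and
# the storage query call are assumptions, not the verified API.
import guerillabackup

class SenderOfferAllKeepDataTransferPolicy(guerillabackup.SenderTransferPolicy):
  """Offer every backup data element in the local storage for
  transfer but never delete or mark elements after a transfer."""

  def __init__(self, storage):
    self.storage = storage

  def queryBackupDataElements(self, queryData):
    # Ignore any remote side supplied query data and offer everything.
    return self.storage.queryBackupDataElements(None)

  def applyPolicy(self, transferContext):
    # Nothing to update: elements stay in the storage unchanged even
    # after a successful transfer.
    pass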
Classes and interfaces: * ClientProtocolInterface: Classes implementing this interface are passed to the TransferAgent by the ConnectorService to allow outbound calls to the other agent. * ConnectorService: A service to establish in or outbound connections to an active TransferAgent. Implementation will vary depending on underlying protocol, e.g. TCP, socket, ... and authentication type, which is also handled by the ConnectorService. * DefaultFileStorage: This storage implementation stores all relevant information on the filesystem, supporting locking and extra attribute handling. It uses the element name to create the storage file names, appending "data", "info" or "lock" to it for content, meta information storage and locking. Extra attribute data is stored in by using the attribute name as file extension. Thus extensions from above but also ones containing dashes or dots are not allowed. * JsonStreamServerProtocolRequestHandler: This handler implements a minimalistic JSON protocol to invoke ServerProtocolInterface methods. See "doc/Design.txt" section "Transmission protocol" for protocol design information. * ReceiverStoreDataTransferPolicy: This class defines a receiver policy, that attempts to fetch all data elements offered by the remote transfer agent. * ReceiverTransferPolicy: This is the common superinterface of all receiver transfer policies. * ServerProtocolInterface: This is the server side protocol adaptor to be provided to the transfer service to forward remote requests to the local SenderPolicy. * SenderMoveDataTransferPolicy(SenderTransferPolicy): This is a simple sender transfer policy just advertising all resources for transfer and removing them or marking them as transferred as soon as remote side confirms successful transfer. A file with a mark will not be offered for download any more. * applyPolicy(): deletes the file when transfer was successful. * SenderTransferPolicy: This is the common superinterface of all sender side transfer policies. A policy implementation may require to adjust the internal state after data was transferred. * queryBackupDataElements(): return an iterator over all elements eligible for transfer by the current policy. The query may support remote side supplied query data for optimisation. This should of course only be used when the remote side knows the policy. * applyPolicy(): update internal state after data transfer was rejected, attempted or even successful. * SocketConnectorService: This is currently the only ConnectorService available. It accepts incoming connections on a local UNIX socket. Authentication and socket access authorisation has to be handled UNIX permissions or integration with other tools, e.g. "socat". For each incoming connection it uses a "JsonStreamServerProtocolRequestHandler" protocol handler. * TransferAgent: This class provides the core functionality for in and outbound transfers. It has a single sender or receiver transfer policy or both attached. Protocol connections are attached to it using a ConnectorService. The agent does not care about authentication any more: everything relevant for authorisation has to be provided by the ConnectorService and stored to the TransferContext. gb-storage-tool: ============: The workflow described in "Design.txt" is implemented as such: * Load the master configuration and all included configurations: This really creates just the "StorageConfig" and validates that directories or files exist referenced by the configuration exist. 
It also loads the "StorageStatus" (if any) but does not validate any entries inside it. A subconfiguration is only loaded after initialisation of the parent configuration was completed. See "StorageTool.loadConfiguration", "StorageConfig.__init__". * Locate the storage data and meta information files in each configuration data directory: This step is done for the complete configuration tree with main configuration as root and all included subconfiguration branches and leaves. Starting with the most specific (leaf) configurations, all files in the data directory are listed, associated with the current configuration loading it. After processing of leaves the same is done for the branch configuration including the leaves, thus adding files not yet covered by the leaves and branches. When all available storage files are known, they are grouped into backup data elements (data and metadata) before grouping also elements from the same source. See "StorageTool.initializeStorage", "StorageConfig.initializeStorage". * Check for applicable but yet unused policy templates and apply them to files matching the selection pattern: guerillabackup-0.5.0/doc/Installation.txt000066400000000000000000000114421450137035300204600ustar00rootroot00000000000000Manual Installation: ==================== This installation guide applies to perform a manual installation of GuerillaBackup. * Create backup generation directory structures: mkdir -m 0700 -p /etc/guerillabackup/units /var/lib/guerillabackup/data /var/lib/guerillabackup/state cp -aT src /usr/lib/guerillabackup General GuerillaBackup Configuration: ===================================== All tools require a general configuration file, which usually is identical for backup generation and transfer. The default location is "/etc/guerillabackup/config". It can be derived from "/etc/guerillabackup/config.template". The file contains configuration paramers that influence the behavior of various backup elements, e.g. source units, sinks or the generator itself. All those parameters start with "General" to indicate their global relevance. See "/etc/guerillabackup/config.template" template file for extensive comments regarding each parameter. Configuration of gb-backup-generator: ================================= * Configure generator units: The unit configuration directory "/etc/guerillabackup/units" contains templates for all available units. The documentation for unit configuration parameters can be found within the template itself. To enable a unit, the configuration has to be created and the unit code to be activated. See "gb-backup-generator" manual page for more details. * Enable a default logfile archiving component: ln -s -- /usr/lib/guerillabackup/lib/guerillabackup/LogfileBackupUnit.py /etc/guerillabackup/units/LogfileBackupUnit cp /etc/guerillabackup/units/LogfileBackupUnit.config.template /etc/guerillabackup/units/LogfileBackupUnit.config Enable log data directories by editing "LogfileBackupUnit.config". * Add a cyclic tar backup component: ln -s -- /usr/lib/guerillabackup/lib/guerillabackup/TarBackupUnit.py /etc/guerillabackup/units/TarBackupUnit cp /etc/guerillabackup/units/TarBackupUnit.config.template /etc/guerillabackup/units/TarBackupUnit.config Add tar backup configurations needed on the source system to the configuration file. * Perform a generator test run in foreground mode: Start the backup generator directly: /usr/bin/gb-backup-generator The tool should not emit any errors during normal operation while running. 
After your CPU is idle, check that all backup volumes were generated as expected by verifying existence of backup files in the sink directory. You might use find /var/lib/guerillabackup -type f | sort for that. * Enable automatic startup of the generator after boot: * On systemd systems: mkdir -p /etc/systemd/system cp data/init/systemd/guerillabackup.service /etc/systemd/system/guerillabackup.service systemctl enable guerillabackup.service start guerillabackup * On upstart systems: cp data/init/upstart/guerillabackup.conf /etc/init/guerillabackup.conf * As cronjob after reboot: cat < /etc/cron.d/guerillabackup @reboot root (/usr/bin/gb-backup-generator < /dev/null >> /var/log/guerillabackup.log 2>&1 &) EOF Configuration of gb-transfer-service: ================================== * Configure the service: The main configuration can be found in "/etc/guerillabackup/config". The most simplified transfer scheme is just a sender and receiver to move backup data. Transfer can be started independently from backup generation when conditions are favourable, e.g. connectivity or bandwidth availability. The upstream source documentation contains two testcases for this scenario, "SenderOnlyTransferService" and "ReceiverOnlyTransferService". * Sender configuration: Just enable "TransferSenderPolicyClass" and "TransferSenderPolicyInitArgs" for a default move-only sender policy. * Receiver configuration: While sender often requires root privileges to read the backup data files to avoid privacy issues with backup content. The receiver on the other hand is usually running on a suitable intermediate transfer hop or final data sink, where isolation is easier. In such scenarios, "/etc/guerillabackup/config.template" can be copied and used with any user ID. To use it, adjust "GeneralRuntimeDataDir" and "TransferServiceStorageBaseDir" appropriately, e.g. GeneralRuntimeDataDir = '/[user data directory]/run' TransferServiceStorageBaseDir = '/[user data directory]/[host] The receiver policies have to be enabled also by enabling "TransferReceiverPolicyClass" and "TransferReceiverPolicyInitArgs". The service is then started using /usr/bin/gb-transfer-service --Config [configfile] * Automatic startup: Activation is similar to "Configuration of gb-backup-generator", only the systemd unit name "guerillabackup-transfer.service" has to be used for systemd. * Initiate the transfer: Transfer will start as soon as a connection between the two gb-transfer-service instances is established. See "gb-transfer-service" manual page for more information on that. guerillabackup-0.5.0/doc/SoftwareOperation.txt000066400000000000000000000153031450137035300214720ustar00rootroot00000000000000Introduction: ============= This document deals with software operation after initial installation, e.g. regarding: * Analyze system failures, restoring operation * Using archives and backups Using archives and backups: =========================== * Restoring backups: WORD OF WARNING: If you are planning to restore backups "free-handed" (without any defined, written, validated and trained operation procedures), using this software is most likely inefficient and risky and you might search for another solution. Otherwise this section gives you hints what should be considered when defining your own backup procedures. NOTE: Future releases may contain written procedures for some common use cases, so that only selection of relevant use cases, validation and training needs to be done by you or your organization. 
General procedure: To restore data you have to unwind all steps performed during backup generation. As GuerillaBackup is a toolchain with nearly unlimited flexibility in backup generation, processing and transfer procedures, those "unwinding" procedures might be very specific to your system and cannot be covered here. When running a setup following many recommendations from the installation guideline, the following steps should be considered:
1) Locate the (one) storage of the backup to restore
2) Validate backup integrity with (suspected) adversaries
3) Decrypt the data if encrypted
4) Unpack the data to a storage system for that type of data
5) Merge the data according to data type, validate the result

* 1) Locate the (one) storage of the backup to restore: In distributed setups, "gb-transfer-service" (see man page) will have synchronized the backups between one or more storages according to the transfer policies in place. In small environments you might be able to locate the data by checking your configuration. On larger setups the relevant (transfer-, retention-, access- ...) policies and data locations should be stored in your configuration management database that was also used for automated installation of the guerillabackup system. As backup data retrieval is a security critical step, you should also consider links to your ISMS here. Theft of backup data might be easier than stealing data from the live systems, e.g. by fooling the system operator performing the restore procedure.
The names of the backup storage files depend on your configuration. With the example configuration from the installation guide, just a simple tar backup of the file system root is generated. With the default storage module, a pair of files exists for each backup timepoint, e.g.
20181220083942-root-full.data
20181220083942-root-full.info
These files are usually located in "/var/lib/guerillabackup/data" by default. When using incremental backups, you will usually need to collect all following "-inc.(data|info)" files up to the restore point you want to reach.
* 2) Validate backup integrity with (suspected) adversaries: Two types of data integrity violations are possible:
  * The backup data file itself contains corrupted data violating the data type specification, e.g. an invalid compression format.
  * The data itself is not corrupted on the format level (so it can be restored technically) but its content was modified and does not match the data on the source system at the given timepoint.
The first issue can be addressed by checking the ".info" file (a simple JSON file) and comparing the "StorageFileChecksumSha512" value with the checksum of the ".data" file. If you are considering attacks on your backup system, the latter data integrity violation can only be detected by checking the chaining of all your backup archives. For that the "Predecessor" field from the ".info" file can be used. Manipulation of a single backup archive will break the chain at that point. The attacker would also have to manipulate all ".info" files from that position on to create a consistent picture at least on the backup storage system. As the ".info" files are small, a paranoid configuration should also send them to off-site locations where at least the last version of each backup source can be retrieved for comparison. Currently the guerillabackup package does not yet contain a tool for those different data integrity validation procedures. You may want to check "Design.txt", section "Metainformation files" on how to create a small tool for yourself; a minimal sketch of such a check is shown below.
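Such a check can be scripted in a few lines of Python. The following sketch is NOT a supported tool: it only relies on the metainformation layout described in "Design.txt" ("StorageFileChecksumSha512" holding the base64 encoded SHA-512 digest of the ".data" file, "Predecessor" holding the UUID of the preceding element) and on the ".data"/".info" file pairing of the default storage; argument handling and error reporting are purely illustrative.

#!/usr/bin/python3
"""Minimal backup data element integrity check sketch."""

import base64
import hashlib
import json
import sys

def verifyElement(infoFileName):
  """Verify a single backup data element given its ".info" file name
  and return the "Predecessor" reference (if any) so that the chain
  can be followed."""
  with open(infoFileName, 'rb') as infoFile:
    metaInfo = json.load(infoFile)
  if isinstance(metaInfo, list):
    # Storage artefacts may wrap the metainfo dictionary together with
    # a transfer attribute dictionary, see "Design.txt" section "Storage".
    metaInfo = metaInfo[0]
  expectedDigest = base64.b64decode(metaInfo['StorageFileChecksumSha512'])
  digest = hashlib.sha512()
  with open(infoFileName[:-len('.info')] + '.data', 'rb') as dataFile:
    while True:
      block = dataFile.read(1 << 20)
      if not block:
        break
      digest.update(block)
  if digest.digest() != expectedDigest:
    raise Exception('Checksum mismatch for %s' % repr(infoFileName))
  return metaInfo.get('Predecessor', None)

if __name__ == '__main__':
  for fileName in sys.argv[1:]:
    print('%s: OK, predecessor %s' % (fileName, verifyElement(fileName)))

Checking the whole chain additionally requires loading all ".info" files of a source, ordering them by timestamp and comparing each "Predecessor" value against the "DataUuid" of the preceding element.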
* 3) Decrypt the data if encrypted: With the GnuPG plugin, data is encrypted using the public key configured during installation. As backup data archives might be way too large to quickly decrypt them to intermediate storage during restore and because unencrypted content should only be seen by the target system operating on that data, not the backup or some intermediate system (data leakage prevention), thus streaming the encrypted data to the target system and decryption on the target should be the way to go. When playing it safe, the private key for decryption is stored on a hardware token and cannot (and should never) be copied to the target system. For that use-case GnuPG provides the feature of "session-key-extraction". It is sufficient to decrypt only the first few kB of the backup archive using "--show-session-key" on the machine holding the PKI-token. On the target system corresponding "--override-session-key" option allows to decrypt the backup data stream on the fly as it arives. Refer to you gnupg documentation for more information. So the data receiving pipeline on the data target system might look somethink like [backup data receive/retrieve command] | gpg "--override-session-key [key]" --decrypt | [backup data specific restore/merge command(s) - see below] * 4) Unpack the data to a storage system for that type of data: To unpack the data you need to apply an opposite command than the one you configured to create the backup data, e.g. "pg_restore" vs "pg_dump". The same is true for tar-backups created by the recommended configuration from the installation guide. On a full backup with default "bzip2" compression usually this will suite most: ... data strem souce] | tar -C [mergelocation] --numeric-owner -xj * mergelocation: when the target system is "hot" during restore (other processes accessing files might be active) or complex data merging is required, data should never be extracted to final location immediately due to data quality and security risks. With incremental backups the "...-full.data" file has to be extracted first, followed by all incremental data files. See "tar" manual pages for more information. * 5) Merge the data according to data type, validate result: These are the fine arts of system operators maintaining highly reliable systems and clearly out of scope of this document. Make a plan, write it down, validate it, train your staff. guerillabackup-0.5.0/doc/gb-backup-generator.1.xml000066400000000000000000000147771450137035300217740ustar00rootroot00000000000000 ]> &dhtitle; &dhpackage; &dhfirstname; &dhsurname; Wrote this manual page.
&dhemail;
2016-2023 &dhusername; This manual page was written for guerillabackup system on Linux systems, e.g. Debian. Permission is granted to copy, distribute and/or modify this document under the terms of the Lesser GNU General Public License, Version 3. On Debian systems, the complete text of the Lesser GNU General Public License can be found in /usr/share/common-licenses/LGPL-3.
GB-BACKUP-GENERATOR &dhsection; gb-backup-generator Program to generate backups or archives using configured generator units according to given schedule. gb-backup-generator DESCRIPTION This is the manual page for the gb-backup-generator command. For more details see documentation at /usr/share/doc/guerillabackup. The generator is responsible to keep track over all scheduled backup tasks (units), to invoke them and write the created backup data stream to the data sink, usually the file system. The generator supports generation of encrypted backups, management of information about the backed-up element, adds hashes to support detection of missing and manipulated backups. With that functions, confidentiality and integrity can be protected, also providing non-repudiation features. OPTIONS This optional parameter specifies an alternative configuration loading directory instead of /etc/guerillabackup. The directory has to contain the main configuration file (config), the units subdirectory. FILES /etc/guerillabackup/config The main configuration file for all guerillabackup tools. Use /etc/guerillabackup/config.template to create it. The template also contains the documentation for each available parameter. /etc/guerillabackup/units/[name] The units directory contains the enabled backup data generation units. To enable a unit, a symbolic link to the unit definition file has to be created. The name of the symbolic link has to consist only of letters and numbers. For example, to enable LogfileBackupUnit for log-file archiving, one could use "ln -s -- /usr/lib/guerillabackup/lib/guerillabackup/LogfileBackupUnit.py LogfileBackupUnit". For units with an associated configuration file named "[unitname].config", configuration parameters from the main configuration file can be overridden within the unit-specific configuration. For all standard units, /etc/guerillabackup/units contains templates for unit configuration files. It is also possible to link a the same unit definition file more than once using different symbolic link names. Usually this only makes sense when each of those units has a different unit configuration file. /etc/systemd/system/guerillabackup-generator.service On systemd installations, this is the systemd configuration for automatic startup of the gb-backup-generator service. Usually it is not enabled by default. To enable use "systemctl enable guerillabackup-generator.service". BUGS For guerillabackup setups installed from packages, e.g. .deb or .rpm files usually installed via package management software, e.g. apt-get, aptitude, rpm, yast, please report bugs to the package maintainer. For setups from unpackaged software trunk, please report at . SEE ALSO gb-transfer-service1
guerillabackup-0.5.0/doc/gb-storage-tool.1.xml000066400000000000000000000440141450137035300211450ustar00rootroot00000000000000 ]> &dhtitle; &dhpackage; &dhfirstname; &dhsurname; Wrote this manual page.
&dhemail;
2022-2023 &dhusername; This manual page was written for guerillabackup system on Linux systems, e.g. Debian. Permission is granted to copy, distribute and/or modify this document under the terms of the Lesser GNU General Public License, Version 3. On Debian systems, the complete text of the Lesser GNU General Public License can be found in /usr/share/common-licenses/LGPL-3.
GB-STORAGE-TOOL &dhsection; gb-storage-tool Manage guerillabackup backup data storages gb-storage-tool DESCRIPTION This is the manual page for the gb-storage-tool command. The tool is used to perform operations on backup file storage locations as used by gb-backup-generator or gb-transfer-service to store backup data. Currently the tool supports checking storage file naming to identify incomplete backups due to aborts during backup generation or transfer e.g. by reboots or crashes. To ignore files for a reason, e.g. notes, add entries to the status file, e.g. For all files defining valid backup data elements, configurable policies are applied. See POLICIES section below for supported policies. OPTIONS This optional parameter specifies an alternative configuration file instead of /etc/guerillabackup/storage-tool-config.json. This optional parameter will make gb-storage-tool perform policy checks only but will not modify the storage, e.g. by deleting files flagged for deletion by a retention policy. POLICIES gb-storage-tool can apply multiple policies to each backup data source but it is only possible to have one policy of a given type (see policy types below). Which policies to apply is defined by the gb-storage-tool configuration file "Policies" parameter. A regular expression is used to select which sources policies should be applied to with the first matching expression taking precedence. For each regular expression a list of polices with parameters is defined. See /data/etc/storage-tool-config.json.template for examples. To ease policy selection in large setups, policy inheritance can be used. A included configuration (see "Include" configuration parameter) may also define policies, which can extend or override the policies from the parent configuration(s) but also policies defined just earlier in the same configuration. The overriding policy definition has to have a higher priority, otherwise it will be ignored. To disable policy inheritance a subconfiguration may set the "Inherit" configuration parameter to false (default is true). This will also prevent any policies defined earlier in the very same configuration to be ignored. Thus to disable inheritance for all sources in a configuration, the first entry in the policy list should match all sources (.*) and disable inheritance. Each policy defined in the gb-storage-tool configuration file may also keep policy status information in the status file. The status data is usually updated as the policy is applied unless there is a significant policy violation. That will require the user either to fix the root cause of the violation (e.g. backup data was found to be missing) or the user may update the status to ignore the violation. The later cannot be done interactively via gb-storage-tool yet, one has to adjust the storage status configuration manually. Therefore the user has to create or update the status configuration with the the backup element name (the filename relative to the data directory without any suffix) as key and the status information for the policy as described below (and sometimes given as hint on the console too). gb-storage-tool supports following policies: Interval: Verify that all sources generate backups at expected rates and all backups were transferred successfully. Thus this policy eases spotting of system failures in the backup system. An example policy configuration is: ... 
"Policies": [ { "Sources": "^(.*/)?root$", "Inherit": false, "List": [ { "Name": "Interval", "Priority": 100, "FullMin": "6d20H", "FullMax": "6d28H", "IncMin": "20H", "IncMax": "28H" }, { ... This configuration specifies that to all backups from source with name "root" (the default backup created by the gb-backup-generator) an Interval policy shall be applied. The policy will expect full backups every 7 days +- 4 hours and incremental backups each day +- 4 hours. When policy validation fails for a given source, the policy configuration may be adjusted but also the violation may be ignored by updating the check status. Thus the validation error will not be reported any more in the next run. The status data in that case may look like: ... "20200102000000-root-full": { "Interval": { "Ignore": "both" } }, ... This status indicates, that the both interval checks for the interval from the previous full and incremental backup to the backup named above should be disabled. Do disable only one type of checks, the "full" or "inc" type keyword is used instead of "both". While above is fine to ignore singular policy violations, also the policy itself may be adjusted. This is useful when e.g. the backup generation intervals where changed at the source. The status data in that case could look like: ... "20200102000000-root-full": { "Interval": { "Config": { "FullMax": "29d28H", "FullMin": "29d20H", "IncMax": "6d28H", "IncMin": "6d20H" } } }, ... LevelRetention: This defines a retention policy defined by retention levels, e.g. on first level keep each backup for 30 days, next level keep 12 weekly backups, on the next level keep 12 monthly backups, then 12 every three month and from that on only yearly ones. ... "Policies": [ { "Sources": "^(.*/)?root$", "List": [ { "Name": "LevelRetention", "Levels": [ # Keep weekly backups for 30 days, including incremental backups. { "KeepCount": 30, "Interval": "day", "TimeRef": "latest", "KeepInc": true }, # Keep weekly backups for 3 month, approx. 13 backups. { "KeepCount": 13, "Interval": "day", "AlignModulus": 7 }, ... { "KeepCount": 12, "Interval": "month", "AlignModulus": 3, "AlignValue": 1 }, ... This configuration defines, that on the finest level, backups for 30 days should be kept counting from the most recent on ("TimeRef": "latest"), including incremental backups ("KeepInc": true). Thus for machines not producing backups any more, the most recent ones are kept unconditionally. On the next level, 13 weekly backups are kept, while may overlap with backups already kept due to the first level configuration from above. But here only full backups will be kept, that were generated after every 7th day due to "AlignModulus", preferring the one generated on day 0. At another level, only one backup is kept every three month, preferring the one from the month numbered 1, 4, 7, 10 due to "AlignModulus" and "AlignValue". Hence the first backup in January, April, ... should be kept. Size: This policy checks that backup data sizes are as expected as size changes may indicate problems, e.g. a size increase due to archives, database dumps, local file backups ... forgotten by the administrator (thus wasting backup space but sometimes also causing security issues due to lack of as strict access permissions on those files compared to their source), size increase due to rampant processes filling up database tables or log files in retry loops (also monitoring should catch that), core dumps accumulating, ... A "Size" policy can be defined for both full and incremental backups. 
For each backup type, the accepted size range can be defined by absolute or relative values. Without providing an expected size, the size of the first backup of that type seen is used. Therefore for servers without accumulating data, following policy could be defined: ... "Policies": [ { "Sources": "^(.*/)?root$", "List": [ { "Name": "Size", "Priority": 0, "FullSizeMinRel": 0.9, "FullSizeMaxRel": 1.1, "IncSizeMin": 100000, "IncSizeMaxRel": 10.0 }, { ... This configuration will check sizes of "root" backups using the first full and incremental size as reference. Full backups may vary in size between 90% and 110% while incremental backups have to be at least 100kb large but may vary 10-fold in size. All supported policy parameters are: Specify the expected full backup size. When missing the size of first full backup seen is used as default. Specify the absolute maximum backup size. You cannot use "FullSizeMaxRel" at the same time. Specify the absolute minimum backup size. You cannot use "FullSizeMinRel" at the same time. Same as "Full..." parameters just for incremental backups. See above. Specify the maximum backup size in relation to the expected size (see "FullSizeExpect"). You cannot use "FullSizeMax" at the same time. Specify the minimum backup size in relation to the expected size (see "FullSizeExpect"). You cannot use "FullSizeMin" at the same time. Specify the expected incremental backup size in relation to the expected full backup size (see "FullSizeExpect"). You cannot use "IncSizeExpect" at the same time. Same as "Full..." parameters just for incremental backups. See above. When policy validation fails for a given source, the policy configuration may be adjusted but also the violation may be ignored by updating the check status. Thus the validation error will not be reported any more in the next run. The status data in that case may look like: ... "20200102000000-root-full": { "Size": { "Ignore": true } }, ... While above is fine to ignore singular policy violations, also the policy itself may be adjusted. This is useful when e.g. the size of backups changed due to installing of new software or services. The updated policy configuration can then be attached to the first element it should apply to: ... "20200102000000-root-full": { "Size": { "Config": { "FullSizeExpect": 234567890, "FullSizeMinRel": 0.9, "FullSizeMaxRel": 1.1, "IncSizeMin": 100000, "IncSizeMaxRel": 10.0 } } }, ... FILES /etc/guerillabackup/storage-tool-config.json The default configuration file for gb-storage-tool tool. Use storage-tool-config.json.template to create it. The template also contains the documentation for each available parameter. The most relevant parameters for gb-storage-tool are DataDir, Include and Status. /var/lib/guerillabackup/state/storage-tool-status.json This is the recommended location for the toplevel gb-storage-tool status file. The file has to contain valid JSON data but also comment lines starting with #. See POLICIES section above for description of policy specific status data. BUGS For guerillabackup setups installed from packages, e.g. .deb or .rpm files usually installed via package management software, e.g. apt-get, aptitude, rpm, yast, please report bugs to the package maintainer. For setups from unpackaged software trunk, please report at . SEE ALSO gb-transfer-service1
guerillabackup-0.5.0/doc/gb-transfer-service.1.xml000066400000000000000000000167341450137035300220200ustar00rootroot00000000000000 ]> &dhtitle; &dhpackage; &dhfirstname; &dhsurname; Wrote this manual page.
&dhemail;
2016-2023 &dhusername; This manual page was written for guerillabackup system on Linux systems, e.g. Debian. Permission is granted to copy, distribute and/or modify this document under the terms of the Lesser GNU General Public License, Version 3. On Debian systems, the complete text of the Lesser GNU General Public License can be found in /usr/share/common-licenses/LGPL-3.
GB-TRANSFER-SERVICE &dhsection; gb-transfer-service Synchronise guerillabackup backup data storages gb-transfer-service DESCRIPTION This is the manual page for the gb-transfer-service command. For more details see packaged documentation at /usr/share/doc/guerillabackup. The service has two main purposes: providing a stream-based protocol for interaction with other gb-transfer-service instances and application of storage and retrieval policies for data synchronisation. The network part uses a local (AF_UNIX) socket to listen for incoming connections (see /run/guerillabackup/transfer.socket below). There is no authentication magic or likely-to-be-flawed custom-made crypto included in that part: any process allowed to open the socket can talk the protocol. For connectivity and authentication, use your favourite (trusted) tools. Good starting points are socat with OPENSSL X509 client/server certificate checks on one side and UNIX-CONNECT:/run/guerillabackup/transfer.socket for the other one. When using SSH to forward such connections, you should consider key-based authentication with command forcing (command="/usr/bin/socat - UNIX-CONNECT:/run/guerillabackup/transfer.socket") and default security options (restrict). The policies are the other domain of the gb-transfer-service. They define the authorisation rules granting access to backup data elements but do NOT grant access to the remote file system as such or allow creation or restore of backups. That is the domain of gb-backup-generator tool. The policy also defines, which backup elements should be copied or moved to other storages. Each gb-transfer-service may have two polices: one defining, what should be sent to other instances (sender policy) and what should be received (receiver policy). Without defining a policy for a transfer direction, no data will be sent in that direction. Currently there are two predefined policies: ReceiverStoreDataTransferPolicy: this policy attempts to create a copy of each file offered by a remote sender and keeps it, even after the sender stopped providing it. This policy is useful to fetch all files from a remote storage. SenderMoveDataTransferPolicy: this policy offers all backup files in the local storage for transfer. Depending on the settings, files are deleted after sending or just flagged as sent after successful transfer. A policy implements one of the policy interfaces, that are ReceiverTransferPolicy and SenderTransferPolicy. You may create a custom policy when the predefined do not match your requirements. OPTIONS This optional parameter specifies an alternative configuration file instead of /etc/guerillabackup/config. FILES /etc/guerillabackup/config The main configuration file for all guerillabackup tools. Use /etc/guerillabackup/config.template to create it. The template also contains the documentation for each available parameter. The most relevant parameters for gb-transfer-service are TransferServiceStorageBaseDir, TransferReceiverPolicyClass, TransferReceiverPolicyInitArgs, TransferSenderPolicyClass, TransferSenderPolicyInitArgs. /run/guerillabackup/transfer.socket This is the default socket file name to connect two gb-transfer-service instances. The path can be changed by modification of "GeneralRuntimeDataDir" configuration property from default "/run/guerillabackup". By default, the socket is only accessible to privileged users and the user, who created it (mode 0600). You might change permissions after startup to grant access to other users also. 
BUGS For guerillabackup setups installed from packages, e.g. .deb or .rpm files usually installed via package management software, e.g. apt-get, aptitude, rpm, yast, please report bugs to the package maintainer. For setups from unpackaged software trunk, please report at . SEE ALSO gb-backup-generator1
guerillabackup-0.5.0/src/000077500000000000000000000000001450137035300152765ustar00rootroot00000000000000guerillabackup-0.5.0/src/gb-backup-generator000077500000000000000000000125621450137035300210510ustar00rootroot00000000000000#!/usr/bin/python3 -BEsStt """This tool allows to generate handle scheduling of sources and generation of backup data elements by writing them to a sink.""" import os import re import sys import time import traceback # Adjust the Python sites path to include only the guerillabackup # library addons, thus avoiding a large set of python site packages # to be included in code run with root privileges. Also remove # the local directory from the site path. sys.path = sys.path[1:]+['/usr/lib/guerillabackup/lib', '/etc/guerillabackup/lib-enabled'] import guerillabackup def runUnits(unitList, defaultUnitRunCondition, backupSink): """Run all units in the list in an endless loop.""" while True: immediateUnitList = [] nextInvocationTime = 3600 for unit in unitList: unitInvocationTime = unit.getNextInvocationTime() if unitInvocationTime == 0: immediateUnitList.append(unit) nextInvocationTime = min(nextInvocationTime, unitInvocationTime) if len(immediateUnitList) != 0: # Really run the tasks when there was no condition defined or # the condition is met. if (defaultUnitRunCondition is None) or \ (defaultUnitRunCondition.evaluate()): for unit in immediateUnitList: unit.invokeUnit(backupSink) # Clear the next invocation time, we do not know how long we spent # inside the scheduled units. nextInvocationTime = 0 else: # The unit is ready but the condition was not met. Evaluate the # conditions again in 10 seconds. nextInvocationTime = 10 if nextInvocationTime > 0: time.sleep(nextInvocationTime) def main(): """This is the program main function.""" backupConfigDirName = '/etc/guerillabackup' unitNameRegex = re.compile('^[0-9A-Za-z]+$') argPos = 1 while argPos < len(sys.argv): argName = sys.argv[argPos] argPos += 1 if not argName.startswith('--'): print('Invalid argument "%s"' % argName, file=sys.stderr) sys.exit(1) if argName == '--ConfigDir': backupConfigDirName = sys.argv[argPos] argPos += 1 continue if argName == '--Help': print( 'Usage: %s [OPTION]\n' \ ' --ConfigDir [dir]: Use custom configuration directory not\n' \ ' default "/etc/guerillabackup/"\n' % (sys.argv[0])) sys.exit(0) print('Unknown parameter "%s", try "--Help"' % argName, file=sys.stderr) sys.exit(1) # Make stdout, stderr unbuffered to avoid data lingering in buffers # when output is piped to another program. sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 1) sys.stderr = os.fdopen(sys.stderr.fileno(), 'w', 1) mainConfig = {} mainConfigFileName = os.path.join(backupConfigDirName, 'config') if not os.path.exists(backupConfigDirName): print('Configuration file %s does not exist' % repr(mainConfigFileName), file=sys.stderr) sys.exit(1) try: mainConfig = {'guerillabackup': guerillabackup} guerillabackup.execConfigFile(mainConfigFileName, mainConfig) mainConfig.__delitem__('__builtins__') except Exception as loadException: print('Failed to load configuration "%s": %s' % ( mainConfigFileName, str(loadException)), file=sys.stderr) traceback.print_tb(sys.exc_info()[2]) sys.exit(1) # Initialize the sink. backupSinkClass = mainConfig.get( 'GeneratorSinkClass', guerillabackup.DefaultFileSystemSink) backupSink = backupSinkClass(mainConfig) # Get the default condition for running any unit when ready. 
defaultUnitRunCondition = mainConfig.get('DefaultUnitRunCondition', None) # Now search the unit directory and load all units that should # be scheduled. unitList = [] unitDir = os.path.join(backupConfigDirName, 'units') unitDirFileList = os.listdir(unitDir) for unitFileName in unitDirFileList[:]: if unitFileName.endswith('.config'): # Ignore config files for now, will be loaded when handling the # unit main file. continue if (unitFileName == 'Readme.txt') or (unitFileName.endswith('.template')): # Ignore main templates and Readme.txt also. unitDirFileList.remove(unitFileName) continue matcher = unitNameRegex.match(unitFileName) if matcher is None: continue unitDirFileList.remove(unitFileName) # See if there is a configuration file to load before initializing # the unit. Clone the main configuration anyway to avoid accidental # modification by units. unitConfig = dict(mainConfig) unitConfigFileName = os.path.join(unitDir, '%s.config' % unitFileName) if os.path.exists(unitConfigFileName): try: unitDirFileList.remove(unitFileName+'.config') except: pass guerillabackup.execConfigFile(unitConfigFileName, unitConfig) unitConfig.__delitem__('__builtins__') # Load the code within a new namespace and create unit object # from class with the same name as the file. localsDict = {} guerillabackup.execConfigFile(os.path.join(unitDir, unitFileName), localsDict) unitClass = localsDict[guerillabackup.GENERATOR_UNIT_CLASS_KEY] unitObject = unitClass(unitFileName, unitConfig) unitList.append(unitObject) for unhandledFileName in unitDirFileList: print('WARNING: File %s/%s is not unit definition nor unit configuration ' \ 'for activated unit' % (unitDir, unhandledFileName), file=sys.stderr) # Now all units are loaded, start the scheduling. runUnits(unitList, defaultUnitRunCondition, backupSink) if __name__ == '__main__': main() guerillabackup-0.5.0/src/gb-storage-tool000077500000000000000000001013671450137035300202410ustar00rootroot00000000000000#!/usr/bin/python3 -BEsStt """This tool performs operations on a local data storage, at the moment only checking that all file names are sane, warning about unexpected files, e.g. from failed transfers.""" import datetime import json import os import re import sys # Adjust the Python sites path to include only the guerillabackup # library addons, thus avoiding a large set of python site packages # to be included in code run with root privileges. Also remove # the local directory from the site path. 
sys.path = sys.path[1:]+['/usr/lib/guerillabackup/lib', '/etc/guerillabackup/lib-enabled'] import guerillabackup.Utils from guerillabackup.storagetool.PolicyTypeInterval import PolicyTypeInterval from guerillabackup.storagetool.PolicyTypeLevelRetention import PolicyTypeLevelRetention from guerillabackup.storagetool.PolicyTypeSize import PolicyTypeSize class PolicyGroup(): """A policy group is a list of policies to be applied to sources matching a given regular expression.""" def __init__(self, groupConfig): """Create a new policy configuration group.""" if not isinstance(groupConfig, dict): raise Exception('Policies entry not a dictionary') self.inheritFlag = True self.resourceRegex = re.compile(groupConfig['Sources']) if 'Inherit' in groupConfig: self.inheritFlag = groupConfig['Inherit'] if not isinstance(self.inheritFlag, bool): raise Exception( 'Inherit policy configuration flag has to be ' \ 'true or false (defaults to true)') self.policyList = [] for policyConfig in groupConfig['List']: policy = { PolicyTypeInterval.POLICY_NAME: PolicyTypeInterval, PolicyTypeLevelRetention.POLICY_NAME: PolicyTypeLevelRetention, PolicyTypeSize.POLICY_NAME: PolicyTypeSize }.get(policyConfig['Name'], None) if policy is None: raise Exception( 'Unknown policy type "%s"' % policyConfig['Name']) self.policyList.append(policy(policyConfig)) def isApplicableToSource(self, sourceName): """Check if this policy is applicable to a given source.""" return self.resourceRegex.match(sourceName) is not None def isPolicyInheritanceEnabled(self): """Check if policy inheritance is enabled for this group.""" return self.inheritFlag def getPolicies(self): """Get the list of policies in this group.""" return self.policyList class StorageConfig(): """This class implements the configuration of a storage location without the status information.""" def __init__(self, fileName, parentConfig): """Create a storage configuration from a given file name. @param fileName the filename where to load the storage configuration. @param parentConfig this is the parent configuration that included this configuration and therefor may also define policies.""" self.configFileName = fileName self.parentConfig = parentConfig self.dataDir = None # This is the list of files to ignore within the data directory. self.ignoreFileList = [] # Keep all policies in a list as we need to loop over all of # them anyway. 
self.policyList = None self.storageStatusFileName = None self.storageStatus = None self.includedConfigList = [] if not os.path.exists(self.configFileName): raise Exception( 'Configuration file "%s" does not exist' % self.configFileName) config = guerillabackup.Utils.jsonLoadWithComments(self.configFileName) includeFileList = [] if not isinstance(config, dict): raise Exception() for configName, configData in config.items(): if configName == 'DataDir': if not isinstance(configData, str): raise Exception() self.dataDir = self.canonicalizePathname(configData) if not os.path.isdir(self.dataDir): raise Exception( 'Data directory %s in configuration %s does ' \ 'not exist or is not a directory' % ( self.dataDir, self.configFileName)) continue if configName == 'Ignore': self.ignoreFileList = configData continue if configName == 'Include': includeFileList = configData continue if configName == 'Policies': self.initPolicies(configData) continue if configName == 'Status': if not isinstance(configData, str): raise Exception() self.storageStatusFileName = self.canonicalizePathname(configData) if os.path.exists(self.storageStatusFileName): if not os.path.isfile(self.storageStatusFileName): raise Exception('Status file "%s" has to be file' % configData) self.storageStatus = StorageStatus( self, guerillabackup.Utils.jsonLoadWithComments( self.storageStatusFileName)) continue raise Exception('Invalid configuration section "%s"' % configName) # Always have a status object even when not loaded from file. if self.storageStatus is None: self.storageStatus = StorageStatus(self, None) # Initialize the included configuration only after initialization # of this object was completed because we pass this object as # parent so it might already get used. for includeFileName in includeFileList: includeFileName = self.canonicalizePathname(includeFileName) if not os.path.isfile(includeFileName): raise Exception( 'Included configuration %s in configuration %s ' \ 'does not exist or is not a file' % ( includeFileName, self.configFileName)) try: includedConfig = StorageConfig(includeFileName, self) self.includedConfigList.append(includedConfig) except: print( 'Failed to load configuration "%s" included from "%s".' % ( includeFileName, self.configFileName), file=sys.stderr) raise def getConfigFileName(self): """Get the name of the file defining this configuration.""" return self.configFileName def canonicalizePathname(self, pathname): """Canonicalize a pathname which might be a name relative to the configuration file directory or a noncanonical absolute path.""" if not os.path.isabs(pathname): pathname = os.path.join(os.path.dirname(self.configFileName), pathname) return os.path.realpath(pathname) def getStatusFileName(self): """Get the name of the status file to use to keep validation status information between validation runs.""" return self.storageStatusFileName def getStatus(self): """Get the status object associated with this configuration.""" return self.storageStatus def initPolicies(self, policyConfig): """Initialize all policies from the given JSON policy configuration. 
The configuration may contain multiple policy definitions per resource name regular expression, but that is just for convenience and not reflected in the policy data structures.""" if not isinstance(policyConfig, list): raise Exception('Policies not a list') policyList = [] for policyGroupConfig in policyConfig: policyList.append(PolicyGroup(policyGroupConfig)) self.policyList = policyList def getDataDirectoryRelativePath(self, pathname): """Get the pathname of a file relative to the data directory. @param pathname the absolute path name to resolve.""" return os.path.relpath(pathname, self.dataDir) def initializeStorage(self, storageFileDict): """Initialize the storage by locating all files in this data storage directory and also apply already known status information.""" # First load included configuration data to assign the most specific # configuration and status to those resources. for includeConfig in self.includedConfigList: includeConfig.initializeStorage(storageFileDict) # Walk the data directory to include any files not included yet. for dirName, subDirs, subFiles in os.walk(self.dataDir): for fileName in subFiles: fileName = os.path.join(dirName, fileName) fileInfo = storageFileDict.get(fileName, None) if fileInfo is not None: refConfig = fileInfo.getConfig() if refConfig == self: raise Exception('Logic error') if self.dataDir == refConfig.dataDir: raise Exception( 'Data directory "%s" part of at least two ' \ '(included) configurations' % self.dataDir) if (not refConfig.dataDir.startswith(self.dataDir)) or \ (refConfig.dataDir[len(self.dataDir)] != '/'): raise Exception( 'Directory tree inconsistency due to logic ' \ 'error or concurrent (malicious) modifictions') else: # Add the file and reference to this configuration or override # a less specific previous one. storageFileDict[fileName] = StorageFileInfo(self.storageStatus) # Now detect all the valid resources and group them. self.storageStatus.updateResources(storageFileDict) self.storageStatus.validate(storageFileDict) def getSourcePolicies(self, sourceName): """Get the list of policies to be applied to a source.""" result = [] if self.parentConfig is not None: # No need to clone the list as the topmost configuration will # have started with a newly created list anyway. result = self.parentConfig.getSourcePolicies(sourceName) for policyGroup in self.policyList: # Ignore policies not matching the source. if not policyGroup.isApplicableToSource(sourceName): continue if not policyGroup.isPolicyInheritanceEnabled(): result = [] for policy in policyGroup.getPolicies(): for refPos, refPolicy in enumerate(result): refPolicy = result[refPos] if refPolicy.getPolicyName() == policy.getPolicyName(): if refPolicy.getPriority() < policy.getPriority(): result[refPos] = policy policy = None break if policy is not None: result.append(policy) return result class BackupDataElement(): """This class stores information about a single backup data element that is the reference to the backup data itself but also the meta information. The class is designed in a way to support current file based backup data element storage but also future storages where the meta information may end up in a relational database and the backup data is kept in a storage system.""" # This name is used to store deletion status information in the # storage status information in the same way as policy data. Unlike # policy data, the delete status data cannot be serialized. 
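# Illustrative (hypothetical) in-memory policyData content of an element
# while policies are being applied:
#   {'<policy name>': {...policy specific status...}, 'Delete': True}
# The 'Delete' entry exists only at runtime; it is stripped again in
# BackupSourceStatus.serializeStatus() before status data is written.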
DELETE_STATUS_POLICY = 'Delete' def __init__(self, sourceStatus, dateTimeId, elementType): """Create a element as part of the status of one backup source. @param sourceStatus this is the reference to the complete status of the backup source this element is belonging to. @param dateTimeId the datetime string when this element was created and optional ID number part. It has to contain at least a 14 digit timestamp but may include more digits which then sorted according to their integer value.""" self.sourceStatus = sourceStatus if (len(dateTimeId) < 14) or (not dateTimeId.isnumeric()): raise Exception('Datetime to short or not numeric') self.dateTimeId = dateTimeId if elementType not in ('full', 'inc'): raise Exception() self.type = elementType self.dataFileName = None self.dataLength = None self.infoFileName = None # This dictionary contains policy data associated with this element. # The key is the name of the policy holding the data. self.policyData = {} def getSourceStatus(self): """Get the complete source status information this backup data element is belonging to. @return the source status object.""" return self.sourceStatus def getDateTimeId(self): """Get the date, time and ID string of this element.""" return self.dateTimeId def setFile(self, fileName, fileType): """Set a file of given type to define this backup data element.""" if fileType == 'data': if self.dataFileName is not None: raise Exception( 'Logic error redefining data file for "%s"' % ( self.sourceStatus.getSourceName())) if not fileName.endswith('.data'): raise Exception() self.dataFileName = fileName elif fileType == 'info': if self.infoFileName is not None: raise Exception() self.infoFileName = fileName else: raise Exception('Invalid file type %s' % fileType) def getElementName(self): """Get the name of the element in the storage.""" sourceName = self.sourceStatus.getSourceName() elementStart = sourceName.rfind('/') +1 partStr = '%s-%s-%s' % ( self.dateTimeId, sourceName[elementStart:], self.type) if elementStart: return '%s%s' % (sourceName[:elementStart], partStr) return partStr def getDatetimeSeconds(self): """Get the datetime part of this element as seconds since epoche.""" dateTime = datetime.datetime.strptime(self.dateTimeId[:14], '%Y%m%d%H%M%S') # FIXME: no UTC conversion yet. return int(dateTime.timestamp()) def getDatetimeTuple(self): """Get the datetime of this element as a tuple. @return a tuple with the datetime part as string and the serial part as integer or -1 when there is no serial part.""" dateTime = self.dateTimeId[:14] serial = -1 if len(self.dateTimeId) > 14: serialStr = self.dateTimeId[14:] if (serialStr[0] == '0') and (len(serialStr) > 1): raise Exception() serial = int(serialStr) return (dateTime, serial) def getType(self): """Get the type of this backup data element.""" return self.type def getDataLength(self): """Get the length of the binary backup data of this element.""" if self.dataLength is None: self.dataLength = os.stat(self.dataFileName).st_size return self.dataLength def getStatusData(self): """Get the complete status data associated with this element. Currently this data is identical to the complete policy data. @return the status data or None when there is no status data associated with this element.""" return self.policyData def getPolicyData(self, policyName): """Get the policy data for a given policy name. 
@return the data or None when there was no data defined yet.""" return self.policyData.get(policyName, None) def setPolicyData(self, policyName, data): """Set the policy data for a given policy name.""" self.policyData[policyName] = data def updatePolicyData(self, policyName, data): """Update the policy data for a given policy name by adding or overriding the data. @return the updated policy data""" if not isinstance(data, dict): raise Exception() policyData = self.policyData.get(policyName, None) if policyData is None: policyData = dict(data) self.policyData[policyName] = policyData else: policyData.update(data) return policyData def removePolicyData(self, policyName): """Remove the policy data for a given policy name if it exists.""" if policyName in self.policyData: del self.policyData[policyName] def initPolicyData(self, data): """Initialize the complete policy data of this backup data element. This function may only be called while there is no policy data defined yet.""" if self.policyData: raise Exception( 'Attempted to initialize policy data twice for %s' % ( repr(self.dataFileName))) self.policyData = data def findUnsuedPolicyData(self, policyNames): """Check if this element contains policy data not belonging to any policy. @return the name of the first unused policy found or None.""" for key in self.policyData.keys(): if key not in policyNames: return key return None def markForDeletion(self, deleteFlag): """Mark this element for deletion if it was not marked at all or marked in the same way. @param deleteFlag the deletion mark to set or None to remove the current mark. @raise Exception if there is already a conflicting mark.""" if (deleteFlag is not None) and (not isinstance(deleteFlag, bool)): raise Exception() policyData = self.getPolicyData(BackupDataElement.DELETE_STATUS_POLICY) if (policyData is not None) and (not isinstance(policyData, bool)): raise Exception() if deleteFlag is None: if policyData: self.removePolicyData(BackupDataElement.DELETE_STATUS_POLICY) else: if (policyData is not None) and (policyData != deleteFlag): raise Exception() self.setPolicyData(BackupDataElement.DELETE_STATUS_POLICY, deleteFlag) def isMarkedForDeletion(self): """Check if this element is marked for deletion.""" policyData = self.getPolicyData(BackupDataElement.DELETE_STATUS_POLICY) if (policyData is not None) and (not isinstance(policyData, bool)): raise Exception() return bool(policyData) def delete(self): """Delete all resources associated with this backup data element.""" os.unlink(self.dataFileName) os.unlink(self.infoFileName) class BackupSourceStatus(): """This class stores status information about a single backup source, e.g. all BackupDataElements belonging to this source, current policy status information ...""" def __init__(self, storageStatus, sourceName): self.storageStatus = storageStatus # This is the unique name of this source. self.sourceName = sourceName # This dictionary contains all backup data elements belonging # to this source with datetime and type as key. self.dataElements = {} def getStorageStatus(self): """Get the complete storage status containing the status of this backup source. @return the storage status object.""" return self.storageStatus def getSourceName(self): """Get the name of the source that created the backup data elements. @return the name of the source.""" return self.sourceName def addFile(self, fileName, dateTimeId, elementType, fileType): """Add a file to this storage status. 
With the current file storage model, two files will define a backup data element. @param fileName the file to add. @param dateTimeId the datetime and additional serial number information. @param elementType the type of element to create, i.e. full or incremental. @param fileType the type of the file, i.e. data or info.""" key = (dateTimeId, elementType) element = self.dataElements.get(key, None) if element is None: element = BackupDataElement(self, dateTimeId, elementType) self.dataElements[key] = element element.setFile(fileName, fileType) def getDataElementList(self): """Get the sorted list of all data elements.""" result = list(self.dataElements.values()) result.sort(key=lambda x: x.getDatetimeTuple()) return result def findElementByKey(self, dateTimeId, elementType): """Find an element by the identification key values. @return the element or None when not found.""" return self.dataElements.get((dateTimeId, elementType), None) def getElementIdString(self, element): """Get the identification string of a given element of this source.""" idStr = '' pathEndPos = self.sourceName.rfind('/') if pathEndPos >= 0: idStr = '%s/%s-%s-%s' % ( self.sourceName[:pathEndPos], element.getDateTimeId(), self.sourceName[pathEndPos+1:], element.getType()) else: idStr = '%s-%s-%s' % ( element.getDateTimeId(), self.sourceName, element.getType()) return idStr def removeDeleted(self): """Remove all elements that are marked deleted. The method should only be invoked after applying all deletion policies.""" for key in list(self.dataElements.keys()): element = self.dataElements[key] if element.isMarkedForDeletion(): del self.dataElements[key] def serializeStatus(self): """Serialize the status of all files belonging to this source. @return a dictionary with the status information.""" status = {} for element in self.dataElements.values(): statusData = element.getStatusData() # Do not serialize the deletion policy data. if BackupDataElement.DELETE_STATUS_POLICY in statusData: statusData = dict(statusData) del statusData[BackupDataElement.DELETE_STATUS_POLICY] if not statusData: statusData = None if statusData: status[self.getElementIdString(element)] = statusData return status class StorageStatus(): """This class keeps track about the status of one storage location, that are all tracked resources but also unrelated files within the storage location.""" RESOURCE_NAME_REGEX = re.compile( '^(?P[0-9]{14,})-(?P[0-9A-Za-z.-]+)-' \ '(?Pfull|inc)\\.(?Pdata|info)$') def __init__(self, config, statusData): self.config = config self.statusData = statusData if self.statusData is None: self.statusData = {} # This dictionary contains the name of each source (not file) # found in this storage as key and the BackupSourceStatus element # bundling all relevant information about the source. self.trackedSources = {} def getConfig(self): """Get the configuration managing this status.""" return self.config def getStatusFileName(self): """Get the file name holding this backup status data.""" return self.config.getStatusFileName() def findElementByName(self, name): """Find a backup element tracked by this status object by name. 
@return the element when found or None""" relFileName = name + '.data' nameStartPos = relFileName.rfind('/') + 1 match = StorageStatus.RESOURCE_NAME_REGEX.match( relFileName[nameStartPos:]) if match is None: raise Exception( 'Invalid element name "%s"' % str(name)) sourceName = match.group('name') if nameStartPos != 0: sourceName = relFileName[:nameStartPos] + sourceName if sourceName not in self.trackedSources: return None sourceStatus = self.trackedSources[sourceName] return sourceStatus.findElementByKey( match.group('datetime'), match.group('type')) def validate(self, storageFileDict): """Validate that all files tracked in the status can be still found in the list of all storage files.""" for elemName in self.statusData.keys(): targetFileName = os.path.join(self.config.dataDir, elemName + '.data') if targetFileName not in storageFileDict: raise Exception( 'Invalid status of nonexisting file "%s.data" ' \ 'in data directory "%s"' % ( elemName, self.config.dataDir)) def updateResources(self, storageFileDict): """Update the status of valid sources. @param storageFileDict the dictionary keeping track of known files from all configurations.""" ignoreSet = set() ignoreSet.update(self.config.ignoreFileList) for fileName, fileInfo in storageFileDict.items(): if fileInfo.getStatus() != self: continue relFileName = self.config.getDataDirectoryRelativePath(fileName) if relFileName in ignoreSet: ignoreSet.remove(relFileName) continue nameStartPos = relFileName.rfind('/') + 1 match = StorageStatus.RESOURCE_NAME_REGEX.match( relFileName[nameStartPos:]) if match is None: print( 'File "%s" (absolute "%s") should be ignored by ' \ 'config "%s".' % ( fileInfo.getConfig().getDataDirectoryRelativePath(fileName), fileName, fileInfo.getConfig().getConfigFileName()), file=sys.stderr) continue sourceName = match.group('name') if nameStartPos != 0: sourceName = relFileName[:nameStartPos] + sourceName sourceStatus = self.trackedSources.get(sourceName, None) if sourceStatus is None: sourceStatus = BackupSourceStatus(self, sourceName) self.trackedSources[sourceName] = sourceStatus sourceStatus.addFile( fileName, match.group('datetime'), match.group('type'), match.group('element')) fileInfo.setBackupSource(sourceStatus) for unusedIgnoreFile in ignoreSet: print( 'WARNING: Nonexisting file "%s" ignored in configuration "%s".' % ( unusedIgnoreFile, self.config.getConfigFileName()), file=sys.stderr) for elemName, statusData in self.statusData.items(): element = self.findElementByName(elemName) if element is None: raise Exception( 'Invalid status of nonexisting element "%s" ' \ 'in data directory "%s"' % ( elemName, self.config.dataDir)) element.initPolicyData(statusData) def applyPolicies(self): """Apply the policy to this storage and all storages defined in subconfigurations. For each storage this will check all policy templates if one or more of them should be applied to known sources managed by this configuration.""" for includeConfig in self.config.includedConfigList: includeConfig.getStatus().applyPolicies() for sourceName, sourceStatus in self.trackedSources.items(): policyList = self.config.getSourcePolicies(sourceName) if len(policyList) == 0: print('WARNING: no policies for "%s" in "%s"' % ( sourceName, self.config.getConfigFileName()), file=sys.stderr) policyNames = set() policyNames.add(BackupDataElement.DELETE_STATUS_POLICY) for policy in policyList: policyNames.add(policy.getPolicyName()) policy.apply(sourceStatus) # Now check if any element is marked for deletion. 
In the same # round detect policy status not belonging to any policy. deleteList = [] for element in sourceStatus.getDataElementList(): if element.findUnsuedPolicyData(policyNames): print( 'WARNING: Unused policy data for "%s" in ' \ 'element "%s".' % ( element.findUnsuedPolicyData(policyNames), element.getElementName()), file=sys.stderr) if element.isMarkedForDeletion(): deleteList.append(element) if not deleteList: continue # So there are deletions, show them and ask for confirmation. print('Backup data to be kept/deleted for "%s" in storage "%s":' % (sourceName, self.config.getConfigFileName())) for element in sourceStatus.getDataElementList(): marker = '*' if element.isMarkedForDeletion(): marker = ' ' print( '%s %s %s' % (marker, element.getDateTimeId(), element.getType())) inputText = None if StorageTool.INTERACTIVE_MODE == 'keyboard': inputText = input('Delete elements in "%s" (y/n)? ' % sourceName) elif StorageTool.INTERACTIVE_MODE == 'force-no': pass elif StorageTool.INTERACTIVE_MODE == 'force-yes': inputText = 'y' if inputText != 'y': # Deletion was not confirmed. continue # So there are deletions. Deleting data may corrupt the status # data if there is any software or system error while processing # the deletions. Therefore save the current status in a temporary # file before modifying the data by applying deletion policies. # Verify that there was no logic flaw assinging sources to the # wrong storage. if sourceStatus.getStorageStatus() != self: raise Exception() # Now save the current storage status to a temporary file as # applying policies might have modified it already. statusFileNamePreDelete = self.save(suffix='.pre-delete') for policy in policyList: policy.delete(sourceStatus) # All policies were invoked, so remove the deleted elements from # the status. sourceStatus.removeDeleted() # So at least updating the status data has worked, so save the # status data. statusFileNamePostDelete = self.save(suffix='.post-delete') # Now just delete the files. File system errors or system crashes # will still cause an inconsistent state, but that is easy to # detect. for element in deleteList: element.delete() os.unlink(statusFileNamePreDelete) os.rename(statusFileNamePostDelete, self.config.getStatusFileName()) def save(self, suffix=None): """Save the current storage status to the status file or a new file derived from the status file name by adding a suffix. When saving to the status file, the old status file may exist and is replaced. Saving to files with suffix is only intended to create temporary files to avoid status data corruption during critical operations. These files shall be removed or renamed by the caller as soon as not needed any more. @return the file name the status data was written to.""" # First serialize all status data. statusData = {} for sourceStatus in self.trackedSources.values(): statusData.update(sourceStatus.serializeStatus()) targetFileName = self.config.getStatusFileName() if suffix is not None: targetFileName += suffix if os.path.exists(targetFileName): raise Exception() targetFile = open(targetFileName, 'wb') targetFile.write(bytes(json.dumps( statusData, indent=2, sort_keys=True), 'ascii')) targetFile.close() return targetFileName class StorageFileInfo(): """This class stores information about one file found in the file data storage directory.""" def __init__(self, status): # This is the storage configuration authoritative for defining # the status and policy of this file. self.status = status # This is the backup source that created the file data. 
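# It is assigned later via setBackupSource() once
# StorageStatus.updateResources() has matched the file name to a backup
# source.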
self.backupSource = None def getStatus(self): """Get the authoritative status for this file.""" return self.status def setBackupSource(self, backupSource): """Set the backup source this file belongs to.""" if self.backupSource is not None: raise Exception('Logic error') self.backupSource = backupSource def getConfig(self): """Get the configuration associated with this storage file.""" return self.status.config class StorageTool(): """This class implements the storage tool main functions.""" # Use a singleton variable to define interactive behaviour. INTERACTIVE_MODE = 'keyboard' def __init__(self): """Create a StorageTool object with default configuration. The object has to be properly initialized by loading configuration data from files.""" self.configFileName = '/etc/guerillabackup/storage-tool-config.json' self.config = None # This is the dictionary of all known storage files found in # the data directory of the main configuration or any subconfiguration. self.storageFileDict = {} def parseCommandLine(self): """This function will parse command line arguments and update settings before loading of configuration.""" if self.config is not None: raise Exception('Cannot reload configuration') argPos = 1 while argPos < len(sys.argv): argName = sys.argv[argPos] argPos += 1 if not argName.startswith('--'): raise Exception('Invalid argument "%s"' % argName) if argName == '--Config': self.configFileName = sys.argv[argPos] argPos += 1 continue if argName == '--Help': print( 'Usage: %s [options]\n' \ '* --Config [file]: Use this configuration file not the default\n' \ ' file at "/etc/guerillabackup/storage-tool-config.json".\n' \ '* --DryRun: Just report check results but do not ' \ 'modify storage.\n' \ '* --Help: This output' % sys.argv[0], file=sys.stderr) sys.exit(0) if argName == '--DryRun': StorageTool.INTERACTIVE_MODE = 'force-no' continue print( 'Unknown parameter "%s", use "--Help" or see man page.' % argName, file=sys.stderr) sys.exit(1) def loadConfiguration(self): """Load the configuration from the specified configuration file.""" if self.config is not None: raise Exception('Cannot reload configuration') self.config = StorageConfig(self.configFileName, None) def initializeStorage(self): """Initialize the storage by locating all files in data storage directories and also apply already known status information.""" self.config.initializeStorage(self.storageFileDict) def applyPolicies(self): """Check all policy templates if one or more should be applied to any known resource.""" self.config.getStatus().applyPolicies() def main(): """This is the program main function.""" tool = StorageTool() tool.parseCommandLine() tool.loadConfiguration() # Now all recursive configurations are loaded. First initialize # the storage. tool.initializeStorage() # Next check for files not already covered by policies according # to status data. Suggest status changes for those. tool.applyPolicies() print('All policies applied.', file=sys.stderr) if __name__ == '__main__': main() guerillabackup-0.5.0/src/gb-transfer-service000077500000000000000000000130271450137035300210770ustar00rootroot00000000000000#!/usr/bin/python3 -BEsStt """This file defines a simple transfer service, that supports inbound connections via a listening socket or receiving of file descriptors if supported. An authentication helper may then provide the agentId information. Otherwise it is just the base64 encoded binary struct sockAddr information extracted from the file descriptor. 
Authorization has to be performed outside the this service.""" import sys # Adjust the Python sites path to include only the guerillabackup # library addons, thus avoiding a large set of python site packages # to be included in code run with root privileges. Also remove # the local directory from the site path. sys.path = sys.path[1:] + ['/usr/lib/guerillabackup/lib', '/etc/guerillabackup/lib-enabled'] import errno import os import signal import guerillabackup from guerillabackup.DefaultFileStorage import DefaultFileStorage from guerillabackup.Transfer import SimpleTransferAgent from guerillabackup.Transfer import SocketConnectorService class TransferApplicationContext(): def __init__(self): """Initialize this application context without loading any configuration. That has to be done separately e.g. by invoking initFromSysArgs.""" self.serviceConfigFileName = '/etc/guerillabackup/config' self.mainConfig = None self.connectorService = None self.forceShutdownFlag = False def initFromSysArgs(self): """This method initializes the application context from the system command line arguments but does not run the service yet. Any errors during initialization will cause the program to be terminated.""" argPos = 1 while argPos < len(sys.argv): argName = sys.argv[argPos] argPos += 1 if not argName.startswith('--'): print('Invalid argument "%s"' % argName, file=sys.stderr) sys.exit(1) if argName == '--Config': self.serviceConfigFileName = sys.argv[argPos] argPos += 1 continue print('Unknown parameter "%s"' % argName, file=sys.stderr) sys.exit(1) if not os.path.exists(self.serviceConfigFileName): print('Configuration file %s does not exist' % ( repr(self.serviceConfigFileName),), file=sys.stderr) sys.exit(1) self.mainConfig = {} try: self.mainConfig = {'guerillabackup': guerillabackup} guerillabackup.execConfigFile( self.serviceConfigFileName, self.mainConfig) except: print('Failed to load configuration %s' % ( repr(self.serviceConfigFileName),), file=sys.stderr) import traceback traceback.print_tb(sys.exc_info()[2]) sys.exit(1) def createPolicy(self, classNameKey, initArgsKey): """Create a policy with given keys.""" policyClass = self.mainConfig.get(classNameKey, None) if policyClass is None: return None policyInitArgs = self.mainConfig.get(initArgsKey, None) if policyInitArgs is None: return policyClass(self.mainConfig) policyInitArgs = [self.mainConfig]+policyInitArgs return policyClass(*policyInitArgs) def startService(self): """This method starts the transfer service.""" # Make stdout, stderr unbuffered to avoid data lingering in buffers # when output is piped to another program. sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 1) sys.stderr = os.fdopen(sys.stderr.fileno(), 'w', 1) # Initialize the storage. 
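# The storage directory is taken from the 'TransferServiceStorageBaseDir'
# setting when present; otherwise the generator sink base directory
# (DefaultFileSystemSink.SINK_BASEDIR_KEY) is reused.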
storageDirName = self.mainConfig.get('TransferServiceStorageBaseDir', None) if storageDirName is None: storageDirName = self.mainConfig.get( guerillabackup.DefaultFileSystemSink.SINK_BASEDIR_KEY, None) if storageDirName is None: print('No storage configured, use configuration key "%s"' % ( guerillabackup.DefaultFileSystemSink.SINK_BASEDIR_KEY), file=sys.stderr) sys.exit(1) if not os.path.isdir(storageDirName): print('Storage directory %s does not exist or is inaccessible' % repr(storageDirName), file=sys.stderr) sys.exit(1) storage = DefaultFileStorage(storageDirName, self.mainConfig) transferAgent = SimpleTransferAgent() runtimeDataDirPathname = guerillabackup.getRuntimeDataDirPathname( self.mainConfig) receiverPolicy = self.createPolicy( guerillabackup.TRANSFER_RECEIVER_POLICY_CLASS_KEY, guerillabackup.TRANSFER_RECEIVER_POLICY_INIT_ARGS_KEY) senderPolicy = self.createPolicy( guerillabackup.TRANSFER_SENDER_POLICY_CLASS_KEY, guerillabackup.TRANSFER_SENDER_POLICY_INIT_ARGS_KEY) try: os.mkdir(runtimeDataDirPathname, 0o700) except OSError as mkdirError: if mkdirError.errno != errno.EEXIST: raise self.connectorService = SocketConnectorService( os.path.join(runtimeDataDirPathname, 'transfer.socket'), receiverPolicy, senderPolicy, storage, transferAgent) signal.signal(signal.SIGINT, self.shutdown) signal.signal(signal.SIGHUP, self.shutdown) signal.signal(signal.SIGTERM, self.shutdown) self.connectorService.run() def shutdown(self, signum, frame): """This function triggers shutdown of the service. By default when invoked for the first time, the method will still wait 10 seconds for any ongoing operations to complete. When invoked twice that will trigger immediate service shutdown.""" forceShutdownTime = 10 if self.forceShutdownFlag: forceShutdownTime = 0 self.connectorService.shutdown(forceShutdownTime=forceShutdownTime) self.forceShutdownFlag = True applicationContext = TransferApplicationContext() applicationContext.initFromSysArgs() applicationContext.startService() guerillabackup-0.5.0/src/lib/000077500000000000000000000000001450137035300160445ustar00rootroot00000000000000guerillabackup-0.5.0/src/lib/guerillabackup/000077500000000000000000000000001450137035300210365ustar00rootroot00000000000000guerillabackup-0.5.0/src/lib/guerillabackup/BackupElementMetainfo.py000066400000000000000000000053221450137035300256140ustar00rootroot00000000000000"""This module contains only the class for in memory storage of backup data element metadata.""" import base64 import json class BackupElementMetainfo(): """This class is used to store backup data element metadata in memory.""" def __init__(self, valueDict=None): """Create a a new instance. @param if not None, use this dictionary to initialize the object. Invocation without a dictionary should only be used internally during deserialization.""" self.valueDict = valueDict if valueDict != None: self.assertMetaInfoSpecificationConforming() def get(self, keyName): """Get the value for a given key. @return None when no value for the key was found.""" return self.valueDict.get(keyName, None) def serialize(self): """Serialize the content of this object. 
@return the ascii-encoded JSON serialization of this object.""" dumpMetainfo = {} for key, value in self.valueDict.items(): if key in [ 'DataUuid', 'MetaDataSignature', 'Predecessor', 'StorageFileChecksumSha512', 'StorageFileSignature']: if value != None: value = str(base64.b64encode(value), 'ascii') dumpMetainfo[key] = value return json.dumps(dumpMetainfo, sort_keys=True).encode('ascii') def assertMetaInfoSpecificationConforming(self): """Make sure, that meta information values are conforming to the minimal requirements from the specification for the in-memory object variant of meta information.""" timestamp = self.valueDict.get('Timestamp', None) if (timestamp is None) or not isinstance(timestamp, int) or (timestamp < 0): raise Exception('Timestamp not found or not a positive integer') backupType = self.valueDict.get('BackupType', None) if backupType not in ['full', 'inc']: raise Exception('BackupType missing or invalid') checksum = self.valueDict.get('StorageFileChecksumSha512', None) if checksum != None: if not isinstance(checksum, bytes) or (len(checksum) != 64): raise Exception('Invalid checksum type or length') @staticmethod def unserialize(serializedMetaInfoData): """Create a BackupElementMetainfo object from serialized data. @param serializedMetaInfoData binary ascii-encoded JSON data""" valueDict = json.loads(str(serializedMetaInfoData, 'ascii')) for key, value in valueDict.items(): if key in [ 'DataUuid', 'MetaDataSignature', 'Predecessor', 'StorageFileChecksumSha512', 'StorageFileSignature']: if value != None: value = base64.b64decode(value) valueDict[key] = value metaInfo = BackupElementMetainfo() metaInfo.valueDict = valueDict metaInfo.assertMetaInfoSpecificationConforming() return metaInfo guerillabackup-0.5.0/src/lib/guerillabackup/DefaultFileStorage.py000066400000000000000000000337171450137035300251340ustar00rootroot00000000000000"""This module provides a default file storage that allows storage of new element using the sink interface. The storage used 3 files, the main data file, an info file holding the meta information and a lock file to allow race-free operation when multiple processes use the same storage directory.""" import errno import os import stat import guerillabackup from guerillabackup.BackupElementMetainfo import BackupElementMetainfo class DefaultFileStorage( guerillabackup.DefaultFileSystemSink, guerillabackup.StorageInterface): """This is the interface of all stores for backup data elements providing access to content data and metainfo but also additional storage attributes. The main difference to a generator unit is, that data is just retrieved but not generated on invocation.""" def __init__(self, storageDirName, configContext): """Initialize this store with parameters from the given configuration context.""" self.storageDirName = None self.openStorageDir(storageDirName, configContext) def getBackupDataElement(self, elementId): """Retrieve a single stored backup data element from the storage. @throws Exception when an incompatible query, update or read is in progress.""" return FileStorageBackupDataElement(self.storageDirFd, elementId) def getBackupDataElementForMetaData(self, sourceUrl, metaData): """Retrieve a single stored backup data element from the storage. @param sourceUrl the URL identifying the source that produced the stored data elements. @param metaData metaData dictionary for the element of interest. @throws Exception when an incompatible query, update or read is in progress. 
@return the element or None if no matching element was found.""" # At first get an iterator over all elements in file system that # might match the given query. guerillabackup.assertSourceUrlSpecificationConforming(sourceUrl) elementIdParts = \ guerillabackup.DefaultFileSystemSink.internalGetElementIdParts( sourceUrl, metaData) # Now search the directory for all files conforming to the specifiction. # As there may exist multiple files with the same time stamp and # type, load also the meta data and check if matches the query. elementDirFd = None if len(elementIdParts[0]) == 0: elementDirFd = os.dup(self.storageDirFd) else: try: elementDirFd = guerillabackup.secureOpenAt( self.storageDirFd, elementIdParts[0][1:], symlinksAllowedFlag=False, dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY, dirCreateMode=0o700, fileOpenFlags=os.O_DIRECTORY|os.O_RDONLY|os.O_NOFOLLOW|os.O_CREAT|os.O_EXCL|os.O_NOCTTY) except OSError as dirOpenError: # Directory does not exist, so there cannot be any valid element. if dirOpenError.errno == errno.ENOENT: return None raise searchPrefix = elementIdParts[2] searchSuffix = '-%s-%s.data' % (elementIdParts[1], elementIdParts[3]) result = None try: fileList = guerillabackup.listDirAt(elementDirFd) for fileName in fileList: if ((not fileName.startswith(searchPrefix)) or (not fileName.endswith(searchSuffix))): continue # Just verify, that the serial part is really an integer but no # need to handle the exception. This would indicate storage corruption, # so we need to stop anyway. serialStr = fileName[len(searchPrefix):-len(searchSuffix)] if serialStr != '': int(serialStr) # So file might match, load the meta data. metaDataFd = -1 fileMetaInfo = None try: metaDataFd = guerillabackup.secureOpenAt( elementDirFd, './%s.info' % fileName[:-5], symlinksAllowedFlag=False, dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY, dirCreateMode=None, fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY) metaInfoData = guerillabackup.readFully(metaDataFd) fileMetaInfo = BackupElementMetainfo.unserialize(metaInfoData) finally: if metaDataFd >= 0: os.close(metaDataFd) if fileMetaInfo.get('DataUuid') != metaData.get('DataUuid'): continue elementId = '%s/%s' % (elementIdParts[0], fileName[:-5]) result = FileStorageBackupDataElement(self.storageDirFd, elementId) break finally: os.close(elementDirFd) return result def queryBackupDataElements(self, query): """Query this storage. @param query if None, return an iterator over all stored elements. Otherwise query has to be a function returning True or False for StorageBackupDataElementInterface elements. @return BackupDataElementQueryResult iterator for this query. @throws Exception if there are any open queries or updates preventing response.""" return FileBackupDataElementQueryResult(self.storageDirFd, query) class FileStorageBackupDataElement( guerillabackup.StorageBackupDataElementInterface): """This class implements a file based backup data element.""" def __init__(self, storageDirFd, elementId): """Create a file based backup data element and make sure the storage files are at least accessible without reading or validating the content.""" # Extract the source URL from the elementId. 
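    # Worked (hypothetical) example: an elementId of
    # '/var/log/20230101120000-syslog-inc' yields the sourceUrl
    # '/var/log/syslog', i.e. the timestamp prefix and the backup type
    # suffix of the storage file name are stripped again.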
    fileNameSepPos = elementId.rfind('/')
    if (fileNameSepPos < 0) or (elementId[0] != '/'):
      raise Exception('Invalid elementId without a separator')
    lastNameStart = elementId.find('-', fileNameSepPos)
    lastNameEnd = elementId.rfind('-')
    if ((lastNameStart < 0) or (lastNameEnd < 0) or
        (lastNameStart+1 >= lastNameEnd)):
      raise Exception('Malformed last name in elementId')
    self.sourceUrl = elementId[:fileNameSepPos+1]+elementId[lastNameStart+1:lastNameEnd]
    guerillabackup.assertSourceUrlSpecificationConforming(self.sourceUrl)
    # Now try to create the StorageBackupDataElementInterface element.
    self.storageDirFd = storageDirFd
    # Just stat the data and info files, which are mandatory.
    os.stat('.'+elementId+'.data', dir_fd=self.storageDirFd)
    os.stat('.'+elementId+'.info', dir_fd=self.storageDirFd)
    self.elementId = elementId
    # Cache the metainfo once loaded.
    self.metaInfo = None

  def getElementId(self):
    """Get the storage element ID of this data element."""
    return self.elementId

  def getSourceUrl(self):
    """Get the source URL of the storage element."""
    return self.sourceUrl

  def getMetaData(self):
    """Get only the metadata part of this element.
    @return a BackupElementMetainfo object"""
    if self.metaInfo != None:
      return self.metaInfo
    metaInfoData = b''
    metaDataFd = -1
    try:
      metaDataFd = self.openElementFile('info')
      metaInfoData = guerillabackup.readFully(metaDataFd)
      self.metaInfo = BackupElementMetainfo.unserialize(metaInfoData)
    finally:
      if metaDataFd >= 0:
        os.close(metaDataFd)
    return self.metaInfo

  def getDataStream(self):
    """Get a stream to read data from that element.
    @return a file descriptor for reading this stream."""
    dataFd = self.openElementFile('data')
    return dataFd

  def assertExtraDataName(self, name):
    """Make sure that the extra data name does not collide with
    the reserved file types and contains no separator characters."""
    if ((name in ['', 'data', 'info', 'lock']) or (name.find('/') >= 0) or
        (name.find('-') >= 0) or (name.find('.') >= 0)):
      raise Exception('Invalid extra data name')

  def setExtraData(self, name, value):
    """Attach or detach extra data to this storage element. This
    function is intended for agents to use the storage to persist
    this specific data also.
    @param value the extra data content or None to remove the element."""
    self.assertExtraDataName(name)
    valueFileName = '.'+self.elementId+'.'+name
    if value is None:
      try:
        os.unlink(valueFileName, dir_fd=self.storageDirFd)
      except OSError as unlinkError:
        if unlinkError.errno != errno.ENOENT:
          raise
      return
    # . and - are forbidden in name, so such a temporary file should
    # be collision free.
    temporaryExtraDataFileName = '.%s.%s-%d' % (
        self.elementId, name, os.getpid())
    extraDataFd = guerillabackup.secureOpenAt(
        self.storageDirFd, temporaryExtraDataFileName,
        symlinksAllowedFlag=False,
        dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY,
        dirCreateMode=None,
        fileOpenFlags=os.O_WRONLY|os.O_CREAT|os.O_EXCL|os.O_TRUNC|os.O_NOFOLLOW|os.O_NOCTTY)
    try:
      os.write(extraDataFd, value)
      os.close(extraDataFd)
      extraDataFd = -1
      extraDataFileName = '.%s.%s' % (self.elementId, name)
      try:
        os.unlink(extraDataFileName, dir_fd=self.storageDirFd)
      except OSError as unlinkError:
        if unlinkError.errno != errno.ENOENT:
          raise
      os.link(
          temporaryExtraDataFileName, extraDataFileName,
          src_dir_fd=self.storageDirFd, dst_dir_fd=self.storageDirFd,
          follow_symlinks=False)
      # Do not let "finally" do the cleanup on late failures to avoid
      # deletion of both versions.
os.unlink(temporaryExtraDataFileName, dir_fd=self.storageDirFd) finally: if extraDataFd >= 0: os.close(extraDataFd) os.unlink(temporaryExtraDataFileName, dir_fd=self.storageDirFd) def getExtraData(self, name): """@return None when no extra data was found, the content otherwise""" self.assertExtraDataName(name) value = None extraDataFd = -1 try: extraDataFd = self.openElementFile(name) value = guerillabackup.readFully(extraDataFd) except OSError as readError: if readError.errno != errno.ENOENT: raise finally: os.close(extraDataFd) return value def delete(self): """Delete this data element. This will remove all files for this element. The resource should be locked by the process attempting removal if concurrent access is possible.""" lastFileSepPos = self.elementId.rfind('/') dirFd = guerillabackup.secureOpenAt( self.storageDirFd, '.'+self.elementId[:lastFileSepPos], symlinksAllowedFlag=False, dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY, dirCreateMode=None, fileOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY) try: fileNamePrefix = self.elementId[lastFileSepPos+1:] for fileName in guerillabackup.listDirAt(dirFd): if fileName.startswith(fileNamePrefix): os.unlink(fileName, dir_fd=dirFd) finally: os.close(dirFd) def lock(self): """Lock this backup data element. @throws Exception if the element does not exist any more or cannot be locked""" lockFd = self.openElementFile( 'lock', fileOpenFlags=os.O_WRONLY|os.O_CREAT|os.O_EXCL|os.O_NOFOLLOW|os.O_NOCTTY) os.close(lockFd) def unlock(self): """Unlock this backup data element.""" os.unlink('.'+self.elementId+'.lock', dir_fd=self.storageDirFd) def openElementFile(self, name, fileOpenFlags=None): """Open the element file with given name. @param fileOpenFlags when None, open the file readonly without creating it. @return the file descriptor to the new file.""" if fileOpenFlags is None: fileOpenFlags = os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY valueFileName = '.'+self.elementId+'.'+name elementFd = guerillabackup.secureOpenAt( self.storageDirFd, valueFileName, symlinksAllowedFlag=False, dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY, dirCreateMode=None, fileOpenFlags=fileOpenFlags) return elementFd class FileBackupDataElementQueryResult(guerillabackup.BackupDataElementQueryResult): """This class provides results from querying a file based backup data element storage.""" def __init__(self, storageDirFd, queryFunction): self.queryFunction = queryFunction self.storageDirFd = storageDirFd # Create a stack with files and directory resources not listed yet. # Each entry is a tuple with the file name prefix and the list # of files. self.dirStack = [('.', ['.'])] def getNextElement(self): """Get the next backup data element from this query iterator. @return a StorageBackupDataElementInterface object.""" while len(self.dirStack) != 0: lastDirStackElement = self.dirStack[-1] if len(lastDirStackElement[1]) == 0: del self.dirStack[-1] continue # Check the type of the first element included in the list. testName = lastDirStackElement[1][0] del lastDirStackElement[1][0] testPath = lastDirStackElement[0]+'/'+testName if lastDirStackElement[0] == '.': testPath = testName # Stat without following links. statData = os.stat(testPath, dir_fd=self.storageDirFd) if stat.S_ISDIR(statData.st_mode): # Add an additional level of to the stack. 
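# Illustrative (hypothetical) dirStack content after descending into a
# subdirectory 'somedir' holding a single element:
#   [('.', [...remaining top level names...]),
#    ('somedir', ['20230101120000-mysource-full.data',
#                 '20230101120000-mysource-full.info'])]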
fileList = guerillabackup.listDirAt(self.storageDirFd, testPath) if len(fileList) != 0: self.dirStack.append((testPath, fileList)) continue if not stat.S_ISREG(statData.st_mode): raise Exception('Found unexpected storage data elements ' \ 'with stat data 0x%x' % statData.st_mode) # So this is a normal file. Find the common prefix and remove # all other files belonging to the same element from the list. testNamePrefixPos = testName.rfind('.') if testNamePrefixPos < 0: raise Exception('Malformed element name %s' % repr(testPath)) testNamePrefix = testName[:testNamePrefixPos+1] for testPos in range(len(lastDirStackElement[1])-1, -1, -1): if lastDirStackElement[1][testPos].startswith(testNamePrefix): del lastDirStackElement[1][testPos] # Create the element anyway, it is needed for the query. elementId = '/' if lastDirStackElement[0] != '.': elementId += lastDirStackElement[0]+'/' elementId += testNamePrefix[:-1] dataElement = FileStorageBackupDataElement(self.storageDirFd, elementId) if (self.queryFunction != None) and (not self.queryFunction(dataElement)): continue return dataElement return None guerillabackup-0.5.0/src/lib/guerillabackup/DefaultFileSystemSink.py000066400000000000000000000220051450137035300256250ustar00rootroot00000000000000"""This module defines the classes for writing backup data elements to the file system.""" import datetime import errno import hashlib import os import random import guerillabackup class DefaultFileSystemSink(guerillabackup.SinkInterface): """This class defines a sink to store backup data elements to the filesystem. In test mode it will unlink the data file during close and report an error.""" SINK_BASEDIR_KEY = 'DefaultFileSystemSinkBaseDir' def __init__(self, configContext): self.testModeFlag = False storageDirName = configContext.get( DefaultFileSystemSink.SINK_BASEDIR_KEY, None) if storageDirName is None: raise Exception('Mandatory sink configuration parameter ' \ '%s missing' % DefaultFileSystemSink.SINK_BASEDIR_KEY) self.storageDirName = None self.storageDirFd = -1 self.openStorageDir(storageDirName, configContext) def openStorageDir(self, storageDirName, configContext): """Open the storage behind the sink. This method may only be called once.""" if self.storageDirName != None: raise Exception('Already defined') self.storageDirName = storageDirName self.storageDirFd = guerillabackup.secureOpenAt( -1, self.storageDirName, symlinksAllowedFlag=False, dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY, dirCreateMode=None, fileOpenFlags=os.O_DIRECTORY|os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY, fileCreateMode=0o700) self.testModeFlag = configContext.get( guerillabackup.CONFIG_GENERAL_DEBUG_TEST_MODE_KEY, False) if not isinstance(self.testModeFlag, bool): raise Exception('Configuration parameter %s has to be ' \ 'boolean' % guerillabackup.CONFIG_GENERAL_DEBUG_TEST_MODE_KEY) def getSinkHandle(self, sourceUrl): """Get a handle to perform transfer of a single backup data element to a sink.""" return DefaultFileSystemSinkHandle( self.storageDirFd, self.testModeFlag, sourceUrl) @staticmethod def internalGetElementIdParts(sourceUrl, metaInfo): """Get the parts forming the element ID as tuple. The tuple elements are directory part, timestamp string, storage file name main part including the backup type. The storage file name can be created easily by adding separators, an optional serial after the timestamp and the file type suffix. @return the tuple with all fields filled when metaInfo is not None, otherwise only directory part is filled. 
The directory will be an empty string for top level elements or the absolute sourceUrl path up to but excluding the last slash.""" fileTimestampStr = None backupTypeStr = None if metaInfo != None: fileTimestampStr = datetime.datetime.fromtimestamp( metaInfo.get('Timestamp')).strftime('%Y%m%d%H%M%S') backupTypeStr = metaInfo.get('BackupType') lastPartSplitPos = sourceUrl.rfind('/') return ( sourceUrl[:lastPartSplitPos], sourceUrl[lastPartSplitPos+1:], fileTimestampStr, backupTypeStr) class DefaultFileSystemSinkHandle(guerillabackup.SinkHandleInterface): """This class defines a handle for writing a backup data to the file system.""" def __init__(self, storageDirFd, testModeFlag, sourceUrl): """Create a temporary storage file and a handle to it.""" self.testModeFlag = testModeFlag self.sourceUrl = sourceUrl guerillabackup.assertSourceUrlSpecificationConforming(sourceUrl) self.elementIdParts = DefaultFileSystemSink.internalGetElementIdParts( sourceUrl, None) self.storageDirFd = None if self.elementIdParts[0] == '': self.storageDirFd = os.dup(storageDirFd) else: self.storageDirFd = guerillabackup.secureOpenAt( storageDirFd, self.elementIdParts[0][1:], symlinksAllowedFlag=False, dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY, dirCreateMode=0o700, fileOpenFlags=os.O_DIRECTORY|os.O_RDONLY|os.O_NOFOLLOW|os.O_CREAT|os.O_EXCL|os.O_NOCTTY, fileCreateMode=0o700) # Generate a temporary file name in the same directory. while True: self.tmpFileName = 'tmp-%s-%d' % (self.elementIdParts[1], random.randint(0, 1<<30)) try: self.streamFd = guerillabackup.secureOpenAt( self.storageDirFd, self.tmpFileName, symlinksAllowedFlag=False, dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY, dirCreateMode=None, fileOpenFlags=os.O_RDWR|os.O_NOFOLLOW|os.O_CREAT|os.O_EXCL|os.O_NOCTTY, fileCreateMode=0o600) break except OSError as openError: if openError.errno != errno.EEXIST: os.close(self.storageDirFd) raise def getSinkStream(self): """Get the file descriptor to write directly to the open backup data element at the sink, if available. @return the file descriptor or None when not supported.""" if self.streamFd is None: raise Exception('Illegal state, already closed') return self.streamFd def write(self, data): """Write data to the open backup data element at the sink.""" os.write(self.streamFd, data) def close(self, metaInfo): """Close the backup data element at the sink and receive any pending or current error associated with the writing process. When there is sufficient risk, that data written to the sink is might have been corrupted during transit or storage, the sink may decide to perform a verification operation while closing and return any verification errors here also. @param metaInfo python objects with additional information about this backup data element. This information is added at the end of the sink procedure to allow inclusion of checksum or signature fields created on the fly while writing. See design and implementation documentation for requirements on those objects.""" if self.streamFd is None: raise Exception('Illegal state, already closed') self.elementIdParts = DefaultFileSystemSink.internalGetElementIdParts( self.sourceUrl, metaInfo) # The file name main part between timestamp (with serial) and # suffix as string. fileNameMainStr = '%s-%s' % (self.elementIdParts[1], self.elementIdParts[3]) fileChecksum = metaInfo.get('StorageFileChecksumSha512') metaInfoStr = metaInfo.serialize() try: if fileChecksum != None: # Reread the file and create checksum. 
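# Seek back to the start of the temporary file and stream it through
# the digest algorithm again, comparing the result against the
# 'StorageFileChecksumSha512' value from the meta information.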
os.lseek(self.streamFd, os.SEEK_SET, 0) digestAlgo = hashlib.sha512() while True: data = os.read(self.streamFd, 1<<20) if len(data) == 0: break digestAlgo.update(data) if fileChecksum != digestAlgo.digest(): raise Exception('Checksum mismatch') # Link the name to the final pathname. serial = -1 storageFileName = None while True: if serial < 0: storageFileName = '%s-%s.data' % ( self.elementIdParts[2], fileNameMainStr) else: storageFileName = '%s%d-%s.data' % ( self.elementIdParts[2], serial, fileNameMainStr) serial += 1 try: os.link( self.tmpFileName, storageFileName, src_dir_fd=self.storageDirFd, dst_dir_fd=self.storageDirFd, follow_symlinks=False) break except OSError as linkError: if linkError.errno != errno.EEXIST: raise # Now unlink the old file. With malicious actors we cannot be # sure to unlink the file we have currently opened, but in worst # case some malicious symlink is removed. os.unlink(self.tmpFileName, dir_fd=self.storageDirFd) # Now create the meta-information file. As the data file acted # as a lock, there is nothing to fail except for severe system # failure or malicious activity. So do not attempt to correct # any errors at this stage. Create a temporary version first and # then link it to have atomic completion operation instead of # risk, that another system could pick up the incomplete info # file. metaInfoFileName = storageFileName[:-4]+'info' metaInfoFd = guerillabackup.secureOpenAt( self.storageDirFd, metaInfoFileName+'.tmp', symlinksAllowedFlag=False, dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY, dirCreateMode=None, fileOpenFlags=os.O_RDWR|os.O_NOFOLLOW|os.O_CREAT|os.O_EXCL|os.O_NOCTTY, fileCreateMode=0o600) os.write(metaInfoFd, metaInfoStr) os.close(metaInfoFd) if self.testModeFlag: # Unlink all artefacts when operating in test mode to avoid accidential os.unlink(storageFileName, dir_fd=self.storageDirFd) os.unlink(metaInfoFileName+'.tmp', dir_fd=self.storageDirFd) raise Exception('No storage in test mode') os.link( metaInfoFileName+'.tmp', metaInfoFileName, src_dir_fd=self.storageDirFd, dst_dir_fd=self.storageDirFd, follow_symlinks=False) os.unlink(metaInfoFileName+'.tmp', dir_fd=self.storageDirFd) finally: os.close(self.storageDirFd) self.storageDirFd = None os.close(self.streamFd) self.streamFd = None guerillabackup-0.5.0/src/lib/guerillabackup/DigestPipelineElement.py000066400000000000000000000172451450137035300256400ustar00rootroot00000000000000"""This module contains only the classes for pipelined digest calculation.""" import fcntl import hashlib import os import guerillabackup class DigestPipelineElement( guerillabackup.TransformationPipelineElementInterface): """This class create pipeline instances for digest generation. The instances will forward incoming data unmodified to allow digest generation on the fly.""" def __init__(self, digestClass=hashlib.sha512): self.digestClass = digestClass def getExecutionInstance(self, upstreamProcessOutput): """Get an execution instance for this transformation element. 
@param upstreamProcessOutput this is the output of the upstream process, that will be wired as input of the newly created process instance.""" return DigestPipelineExecutionInstance( self.digestClass, upstreamProcessOutput) class DigestPipelineExecutionInstance( guerillabackup.TransformationProcessInterface): """This is the digest execution instance class created when instantiating the pipeline.""" def __init__(self, digestClass, upstreamProcessOutput): self.digest = digestClass() self.digestData = None # Keep the upstream process output until end of stream is reached. self.upstreamProcessOutput = upstreamProcessOutput self.processOutput = None # Output stream for direct writing. self.processOutputStream = None self.processOutputBuffer = '' def getProcessOutput(self): """Get the output connector of this transformation process.""" if self.processOutputStream is None: raise Exception('No access to process output in stream mode') if self.processOutput is None: self.processOutput = DigestOutputInterface(self) return self.processOutput def setProcessOutputStream(self, processOutputStream): """Some processes may also support setting of an output stream file descriptor. This is especially useful if the process is the last one in a pipeline and hence could write directly to a file or network descriptor. @throw Exception if this process does not support setting of output stream descriptors.""" if self.processOutput != None: raise Exception('No setting of output stream after call to getProcessOutput') # This module has no asynchronous operation mode, so writing to # a given output stream in doProcess has to be non-blocking to # avoid deadlock. flags = fcntl.fcntl(processOutputStream, fcntl.F_GETFL) fcntl.fcntl(processOutputStream, fcntl.F_SETFL, flags|os.O_NONBLOCK) self.processOutputStream = processOutputStream def isAsynchronous(self): """A asynchronous process just needs to be started and will perform data processing on streams without any further interaction while running.""" return False def start(self): """Start this execution process.""" if (self.processOutput is None) and (self.processOutputStream is None): raise Exception('Not connected') if self.digest is None: raise Exception('Cannot restart again') # Nothing to do with that type of process. def stop(self): """Stop this execution process when still running. @return None when the the instance was already stopped, information about stopping, e.g. the stop error message when the process was really stopped.""" stopException = None if self.processOutputBuffer != None: data = self.upstreamProcessOutput.read(64) self.upstreamProcessOutput.close() if data != None: stopException = Exception('Upstream output still open, there might be unprocessed data') if self.digest is None: return None self.digestData = self.digest.digest() self.digest = None return stopException def isRunning(self): """See if this process instance is still running.""" return self.digest != None def doProcess(self): """This method triggers the data transformation operation of this component. For components in synchronous mode, the method will attempt to move data from input to output. Asynchronous components will just check the processing status and may raise an exception, when processing terminated with errors. As such a component might not be able to detect the amount of data really moved since last invocation, the component may report a fake single byte move. 
@throws Exception if an uncorrectable transformation state was reached and transformation cannot proceed, even though end of input data was not yet seen. Raise exception also when process was not started or already stopped. @return the number of bytes read or written or at least a value greater zero if any data was processed. A value of zero indicates, that currently data processing was not possible due to filled buffers but should be attemted again. A value below zero indicates that all input data was processed and output buffers were flushed already.""" if self.digest is None: return -1 movedDataLength = 0 if ((self.upstreamProcessOutput != None) and (len(self.processOutputBuffer) == 0)): self.processOutputBuffer = self.upstreamProcessOutput.readData(1<<16) if self.processOutputBuffer is None: self.upstreamProcessOutput.close() self.upstreamProcessOutput = None self.digestData = self.digest.digest() self.digest = None return -1 movedDataLength = len(self.processOutputBuffer) if self.processOutputStream != None: writeLength = os.write(self.processOutputStream, self.processOutputBuffer) movedDataLength += writeLength self.digest.update(self.processOutputBuffer[:writeLength]) if writeLength == len(self.processOutputBuffer): self.processOutputBuffer = '' else: self.processOutputBuffer = self.processOutputBuffer[writeLength:] return movedDataLength def getBlockingStreams(self, readStreamList, writeStreamList): """Collect the file descriptors that are currently blocking this synchronous compoment.""" if ((self.upstreamProcessOutput != None) and (len(self.processOutputBuffer) == 0) and (self.upstreamProcessOutput.getOutputStreamDescriptor() != None)): readStreamList.append( self.upstreamProcessOutput.getOutputStreamDescriptor()) if ((self.processOutputStream != None) and (self.processOutputBuffer != None) and (len(self.processOutputBuffer) != 0)): writeStreamList.append(self.processOutputStream) def getDigestData(self): """Get the data from this digest after processing was completed.""" if self.digest != None: raise Exception('Digest processing not yet completed') return self.digestData class DigestOutputInterface( guerillabackup.TransformationProcessOutputInterface): """Digest pipeline element output class.""" def __init__(self, executionInstance): self.executionInstance = executionInstance def getOutputStreamDescriptor(self): """Get the file descriptor to read output from this output interface. This is not available for that type of digest element.""" return None def readData(self, length): """Read data from this output. 
@return the at most length bytes of data, zero-length data if nothing available at the moment and None when end of input was reached.""" if self.executionInstance.processOutputBuffer is None: return None returnData = self.executionInstance.processOutputBuffer if length < len(self.executionInstance.processOutputBuffer): returnData = self.executionInstance.processOutputBuffer[:length] self.executionInstance.processOutputBuffer = \ self.executionInstance.processOutputBuffer[length:] else: self.executionInstance.processOutputBuffer = '' return returnData guerillabackup-0.5.0/src/lib/guerillabackup/GpgEncryptionPipelineElement.py000066400000000000000000000031641450137035300272040ustar00rootroot00000000000000"""This module provides support for a GnuPG based encryption pipeline element.""" import guerillabackup from guerillabackup.OSProcessPipelineElement import OSProcessPipelineExecutionInstance class GpgEncryptionPipelineElement( guerillabackup.TransformationPipelineElementInterface): """This class create pipeline instances for PGP encryption of data stream using GnuPG.""" # Those are the default arguments beside key name. gpgDefaultCallArguments = [ '/usr/bin/gpg', '--batch', '--lock-never', '--no-options', '--homedir', '/etc/guerillabackup/keys', '--trust-model', 'always', '--throw-keyids', '--no-emit-version', '--encrypt'] def __init__(self, keyName, callArguments=gpgDefaultCallArguments): """Create the pipeline element. @param When defined, pass those arguments to gpg when encrypting. Otherwise gpgDefaultCallArguments are used.""" self.keyName = keyName self.callArguments = callArguments def getExecutionInstance(self, upstreamProcessOutput): """Get an execution instance for this transformation element. @param upstreamProcessOutput this is the output of the upstream process, that will be wired as input of the newly created process instance.""" return OSProcessPipelineExecutionInstance( self.callArguments[0], self.callArguments+['--hidden-recipient', self.keyName], upstreamProcessOutput, allowedExitStatusList=[0]) def replaceKey(self, newKeyName): """Return an encryption element with same gpg invocation arguments but key name replaced.""" return GpgEncryptionPipelineElement(newKeyName, self.callArguments) guerillabackup-0.5.0/src/lib/guerillabackup/LogfileBackupUnit.py000066400000000000000000000542261450137035300247700ustar00rootroot00000000000000"""This module provides all classes required for logfile backup.""" import base64 import errno import hashlib import json import os import re import sys import time import traceback import guerillabackup from guerillabackup.BackupElementMetainfo import BackupElementMetainfo from guerillabackup.TransformationProcessOutputStream import TransformationProcessOutputStream # This is the key to the list of source files to include using # a LogfileBackupUnit. The list is extracted from the configContext # at invocation time of a backup unit, not at creation time. The # structure of the parameter content is a list of source description # entries. Each entry in the list is used to create an input description # object of class LogfileBackupUnitInputDescription. CONFIG_INPUT_LIST_KEY = 'LogBackupUnitInputList' class LogfileBackupUnitInputDescription(): """This class stores information about one set of logfiles to be processed.""" def __init__(self, descriptionTuple): """Initialize a single input description using a 5-value tuple, e.g. extracted directly from the CONFIG_INPUT_LIST_KEY parameter. 
@param descriptionTuple the tuple, the meaning of the 5 values to be extracted is: * Input directory: directory to search for logfiles * Input file regex: regular expression to select compressed or uncompressed logfiles for inclusion. * Source URL transformation: If None, the first named group of the "input file regex" is used as source URL. When not starting with a "/", the transformation string is the name to include literally in the URL after the "input directory" name. * Policy: If not none, include this string as handling policy within the manifest. * Encryption key name: If not None, encrypt the input using the named key.""" # Accept list also. if ((not isinstance(descriptionTuple, tuple)) and (not isinstance(descriptionTuple, list))): raise Exception('Input description has to be list or tuple') if len(descriptionTuple) != 5: raise Exception('Input description has to be tuple with 5 elements') self.inputDirectoryName = os.path.normpath(descriptionTuple[0]) # "//..." is a normalized path, get rid of double slashes. self.sourceUrlPath = self.inputDirectoryName.replace('//', '/') if self.sourceUrlPath[-1] != '/': self.sourceUrlPath += '/' self.inputFileRegex = re.compile(descriptionTuple[1]) self.sourceTransformationPattern = descriptionTuple[2] try: if self.sourceTransformationPattern is None: guerillabackup.assertSourceUrlSpecificationConforming( self.sourceUrlPath+'testname') elif self.sourceTransformationPattern[0] != '/': guerillabackup.assertSourceUrlSpecificationConforming( self.sourceUrlPath+self.sourceTransformationPattern) else: guerillabackup.assertSourceUrlSpecificationConforming( self.sourceTransformationPattern) except Exception as assertException: raise Exception('Source URL transformation malformed: '+assertException.args[0]) self.handlingPolicyName = descriptionTuple[3] self.encryptionKeyName = descriptionTuple[4] def getTransformedSourceName(self, matcher): """Get the source name for logfiles matching the input description.""" if self.sourceTransformationPattern is None: return self.sourceUrlPath+matcher.group(1) if self.sourceTransformationPattern[0] != '/': return self.sourceUrlPath+self.sourceTransformationPattern return self.sourceTransformationPattern def tryIntConvert(value): """Try to convert a value to an integer for sorting. When conversion fails, the value itself returned, thus sorting will be performed lexigraphically afterwards.""" try: return int(value) except: return value class LogfileSourceInfo(): """This class provides support to collect logfiles from one LogfileBackupUnitInputDescription that all map to the same source URL. This is needed to process all of them in the correct order, starting with the oldest one.""" def __init__(self, sourceUrl): self.sourceUrl = sourceUrl self.serialTypesConsistentFlag = True self.serialType = None # List of tracked file information records. Each record is a tuple # with 3 values: file name, regex matcher and serial data. 
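# Editor's sketch (illustrative only, helper name is an assumption): how
# the serial sort key built by addFile() below behaves for typical rotated
# logfile names. Splitting the serial value with re.findall('(\d+|\D+)')
# and converting the numeric parts via tryIntConvert() gives natural
# ordering, so "10" sorts after "2" instead of lexicographically before it.
import re

def demoSerialKey(serialValue):
  """Build the same style of sort key as LogfileSourceInfo.addFile."""
  return [tryIntConvert(x) for x in re.findall('(\\d+|\\D+)', serialValue)]

assert demoSerialKey('2') < demoSerialKey('10')
assert demoSerialKey('7.1') == [7, '.', 1]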
self.fileList = [] def addFile(self, fileName, matcher): """Add a logfile that will be mapped to the source URL of this group.""" groupDict = matcher.groupdict() serialType = None if 'serial' in groupDict: serialType = 'serial' if 'oldserial' in groupDict: if serialType != None: self.serialTypesConsistentFlag = False else: serialType = 'oldserial' if self.serialType is None: self.serialType = serialType elif self.serialType != serialType: self.serialTypesConsistentFlag = False serialData = [] if serialType != None: serialValue = groupDict[serialType] if (serialValue != None) and (len(serialValue) != 0): serialData = [tryIntConvert(x) for x in re.findall('(\\d+|\\D+)', serialValue)] # This is not very efficient but try to detect duplicate serialData # values here already and tag the whole list as inconsistent. # This may happen with broken regular expressions or when mixing # compressed and uncompressed files with same serial. for elemFileName, elemMatcher, elemSerialData in self.fileList: if elemSerialData == serialData: self.serialTypesConsistentFlag = False self.fileList.append((fileName, matcher, serialData,)) def getSortedFileList(self): """Get the sorted file list starting with the oldest entry. The oldest one should be moved to backup first.""" if not self.serialTypesConsistentFlag: raise Exception('No sorting in inconsistent state') fileList = sorted(self.fileList, key=lambda x: x[2]) if self.serialType is None: if len(fileList) > 1: raise Exception('No serial type and more than one file') elif self.serialType == 'serial': # Larger serial numbers denote newer files, only elements without # serial data have to be moved to the end. moveCount = 0 while (moveCount < len(fileList)) and (len(fileList[moveCount][2]) == 0): moveCount += 1 fileList = fileList[moveCount:]+fileList[:moveCount] elif self.serialType == 'oldserial': # Larger serial numbers denote older files. File without serial would # be first, so just reverse is sufficient. fileList.reverse() else: raise Exception('Unsupported serial type %s' % self.serialType) return fileList class LogfileBackupUnit(guerillabackup.SchedulableGeneratorUnitInterface): """This class allows to schedule regular searches in a list of log file directories for files matching a pattern. If files are found and not open for writing any more, they are processed according to specified transformation pipeline and deleted afterwards. The unit will keep track of the last UUID reported for each resource and generate a new one for each handled file using json-serialized state data. The state data is a list with the timestamp of the last run as seconds since 1970, the next list value contains a dictionary with the resource name for each logfile group as key and the last UUID as value.""" def __init__(self, unitName, configContext): """Initialize this unit using the given configuration.""" self.unitName = unitName self.configContext = configContext # This is the maximum interval in seconds between two invocations. # When last invocation was more than that number of seconds in # the past, the unit will attempt invocation at first possible # moment. self.maxInvocationInterval = 3600 # When this value is not zero, the unit will attempt to trigger # invocation always at the same time using this value as modulus. self.moduloInvocationUnit = 3600 # This is the invocation offset when modulus timing is enabled. 
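# Hypothetical configuration sketch (editor's example; directory, pattern
# and the demo variable name are assumptions): a single input description
# tuple as it could appear in the 'LogBackupUnitInputList' configuration
# parameter. The first regex group provides the source URL suffix, the
# optional "oldserial" group orders rotated files (larger numbers are
# older) and the "compress" group marks gzip-compressed input files.
demoLogBackupUnitInputList = [
    ('/var/log',
     '^(syslog)(?:\\.(?P<oldserial>[0-9]+))?(?:\\.(?P<compress>gz))?$',
     None,   # derive the source URL "/var/log/syslog" from group 1
     None,   # no special handling policy
     None)   # no input specific encryption key
]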
self.moduloInvocationTime = 0 # As immediate invocation cannot be guaranteed, this value defines # the size of the window, within that the unit should still be # invoked, even when the targeted time slot has already passed # by. self.moduloInvocationTimeWindow = 10 self.testModeFlag = configContext.get( guerillabackup.CONFIG_GENERAL_DEBUG_TEST_MODE_KEY, False) if not isinstance(self.testModeFlag, bool): raise Exception('Configuration parameter %s has to be ' \ 'boolean' % guerillabackup.CONFIG_GENERAL_DEBUG_TEST_MODE_KEY) # Timestamp of last invocation end. self.lastInvocationTime = -1 # Map from resource name to UUID of most recent file processed. # The UUID is kept internally as binary data string. Only for # persistency, data will be base64 encoded. self.resourceUuidMap = {} self.persistencyDirFd = guerillabackup.openPersistencyFile( configContext, os.path.join('generators', self.unitName), os.O_DIRECTORY|os.O_RDONLY|os.O_CREAT|os.O_EXCL|os.O_NOFOLLOW|os.O_NOCTTY, 0o700) handle = None try: handle = guerillabackup.secureOpenAt( self.persistencyDirFd, 'state.current', fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY) except OSError as openError: if openError.errno != errno.ENOENT: raise # See if the state.previous file exists, if yes, the unit is likely # to be broken. Refuse to do anything while in this state. try: os.stat( 'state.previous', dir_fd=self.persistencyDirFd, follow_symlinks=False) raise Exception('Persistency data inconsistencies: found stale previous state file') except OSError as statError: if statError.errno != errno.ENOENT: raise # So there is only the current state file, if any. stateInfo = None if handle != None: stateData = b'' while True: data = os.read(handle, 1<<20) if len(data) == 0: break stateData += data os.close(handle) stateInfo = json.loads(str(stateData, 'ascii')) if ((not isinstance(stateInfo, list)) or (len(stateInfo) != 2) or (not isinstance(stateInfo[0], int)) or (not isinstance(stateInfo[1], dict))): raise Exception('Persistency data structure mismatch') self.lastInvocationTime = stateInfo[0] self.resourceUuidMap = stateInfo[1] for url, uuidData in self.resourceUuidMap.items(): self.resourceUuidMap[url] = base64.b64decode(uuidData) def getNextInvocationTime(self): """Get the time in seconds until this unit should called again. If a unit does not know (yet) as invocation needs depend on external events, it should report a reasonable low value to be queried again soon. @return 0 if the unit should be invoked immediately, the seconds to go otherwise.""" currentTime = int(time.time()) maxIntervalDelta = self.lastInvocationTime+self.maxInvocationInterval-currentTime # Already overdue, activate immediately. if maxIntervalDelta <= 0: return 0 # No modulo time operation, just return the next delta value. if self.moduloInvocationUnit == 0: return maxIntervalDelta # See if currentTime is within invocation window moduloIntervalDelta = (currentTime%self.moduloInvocationUnit)-self.moduloInvocationTime if moduloIntervalDelta < 0: moduloIntervalDelta += self.moduloInvocationUnit # See if immediate modulo invocation is possible. if moduloIntervalDelta < self.moduloInvocationTimeWindow: # We could be within the window, but only if last invocation happened # during the previous modulo unit. 
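# Editor's worked example of the modulo invocation window check in this
# method, assuming maxInvocationInterval=3600, moduloInvocationUnit=3600,
# moduloInvocationTime=0 and moduloInvocationTimeWindow=10: with
# lastInvocationTime=7100 and currentTime=7205 the maximum interval delta
# is 7100+3600-7205 = 3495 seconds, the modulo delta is 7205%3600 = 5 and
# thus inside the 10 second window, and the last invocation fell into the
# previous modulo unit, so 0 is returned and the unit runs immediately.
assert 7100+3600-7205 == 3495
assert 7205%3600 == 5
assert (7100-0)//3600 != (7205-0)//3600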
lastInvocationUnit = (self.lastInvocationTime-self.moduloInvocationTime)/self.moduloInvocationUnit currentInvocationUnit = (currentTime-self.moduloInvocationTime)/self.moduloInvocationUnit if lastInvocationUnit != currentInvocationUnit: return 0 # We are still within the same invocation interval. Fall through # to the out-of-window case to calculate the next invocation time. moduloIntervalDelta = self.moduloInvocationUnit-moduloIntervalDelta return min(maxIntervalDelta, moduloIntervalDelta) def processInput(self, unitInput, sink): """Process a single input description by searching for files that could be written to the sink.""" inputDirectoryFd = None getFileOpenerInformationErrorMode = guerillabackup.OPENER_INFO_FAIL_ON_ERROR if os.geteuid() != 0: getFileOpenerInformationErrorMode = guerillabackup.OPENER_INFO_IGNORE_ACCESS_ERRORS try: inputDirectoryFd = guerillabackup.secureOpenAt( None, unitInput.inputDirectoryName, fileOpenFlags=os.O_DIRECTORY|os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY) sourceDict = {} for fileName in guerillabackup.listDirAt(inputDirectoryFd): matcher = unitInput.inputFileRegex.match(fileName) if matcher is None: continue sourceUrl = unitInput.getTransformedSourceName(matcher) sourceInfo = sourceDict.get(sourceUrl, None) if sourceInfo is None: sourceInfo = LogfileSourceInfo(sourceUrl) sourceDict[sourceUrl] = sourceInfo sourceInfo.addFile(fileName, matcher) # Now we know all files to be included for each URL. Sort them # to fulfill Req:OrderedProcessing and start with the oldest. for sourceUrl, sourceInfo in sourceDict.items(): if not sourceInfo.serialTypesConsistentFlag: print('Inconsistent serial types in %s, ignoring ' \ 'source.' % sourceInfo.sourceUrl, file=sys.stderr) continue # Get the downstream transformation pipeline elements. downstreamPipelineElements = \ guerillabackup.getDefaultDownstreamPipeline( self.configContext, unitInput.encryptionKeyName) fileList = sourceInfo.getSortedFileList() fileInfoList = guerillabackup.getFileOpenerInformation( ['%s/%s' % (unitInput.inputDirectoryName, x[0]) for x in fileList], getFileOpenerInformationErrorMode) for fileListIndex in range(0, len(fileList)): fileName, matcher, serialData = fileList[fileListIndex] # Make sure, that the file is not written any more. logFilePathName = os.path.join( unitInput.inputDirectoryName, fileName) isOpenForWritingFlag = False if fileInfoList[fileListIndex] != None: for pid, fdInfoList in fileInfoList[fileListIndex]: for fdNum, fdOpenFlags in fdInfoList: if fdOpenFlags == 0o100001: print('File %s is still written by pid %d, ' \ 'fd %d' % (logFilePathName, pid, fdNum), file=sys.stderr) isOpenForWritingFlag = True elif fdOpenFlags != 0o100000: print('File %s unknown open flags 0x%x by pid %d, ' \ 'fd %d' % ( logFilePathName, fdOpenFlags, pid, fdNum), file=sys.stderr) isOpenForWritingFlag = True # Files have to be processed in correct order, so we have to stop # here. if isOpenForWritingFlag: break completePipleline = downstreamPipelineElements compressionType = matcher.groupdict().get('compress', None) if compressionType != None: # Source file is compressed, prepend a suffix/content-specific # decompression element. 
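# Editor's note, expressed as a sketch (the octal constants assume Linux
# /proc fdinfo semantics, where O_LARGEFILE, octal 0100000, is reported
# for files opened by 64 bit processes): 0o100000 therefore corresponds
# to O_RDONLY|O_LARGEFILE, a harmless reader, while 0o100001 corresponds
# to O_WRONLY|O_LARGEFILE, i.e. the logfile checked above is still open
# for writing and must not be archived yet.
import os
assert 0o100001 - 0o100000 == os.O_WRONLY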
compressionElement = None if compressionType == 'gz': compressionElement = guerillabackup.OSProcessPipelineElement( '/bin/gzip', ['/bin/gzip', '-cd']) else: raise Exception('Unkown compression type %s for file %s/%s' % ( compressionType, unitInput.inputDirectoryName, fileName)) completePipleline = [compressionElement]+completePipleline[:] logFileFd = guerillabackup.secureOpenAt( inputDirectoryFd, fileName, fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY) logFileStatData = os.fstat(logFileFd) # By wrapping the logFileFd into this object, the first pipeline # element will close it. So we do not need to care here. logFileOutput = TransformationProcessOutputStream(logFileFd) sinkHandle = sink.getSinkHandle(sourceInfo.sourceUrl) sinkStream = sinkHandle.getSinkStream() # Get the list of started pipeline instances. pipelineInstances = guerillabackup.instantiateTransformationPipeline( completePipleline, logFileOutput, sinkStream, doStartFlag=True) guerillabackup.runTransformationPipeline(pipelineInstances) digestData = pipelineInstances[-1].getDigestData() metaInfoDict = {} metaInfoDict['BackupType'] = 'full' if unitInput.handlingPolicyName != None: metaInfoDict['HandlingPolicy'] = [unitInput.handlingPolicyName] lastUuid = self.resourceUuidMap.get(sourceInfo.sourceUrl, None) currentUuidDigest = hashlib.sha512() if lastUuid != None: metaInfoDict['Predecessor'] = lastUuid currentUuidDigest.update(lastUuid) # Add the compressed file digest. The consequence is, that it # will not be completely obvious when the same file was processed # with twice with encryption enabled and processing failed in # late phase. Therefore identical file content cannot be detected. currentUuidDigest.update(digestData) # Also include the timestamp and original filename of the source # file in the UUID calculation: Otherwise retransmissions of files # with identical content cannot be distinguished. currentUuidDigest.update(bytes('%d %s' % ( logFileStatData.st_mtime, fileName), sys.getdefaultencoding())) currentUuid = currentUuidDigest.digest() metaInfoDict['DataUuid'] = currentUuid metaInfoDict['StorageFileChecksumSha512'] = digestData metaInfoDict['Timestamp'] = int(logFileStatData.st_mtime) metaInfo = BackupElementMetainfo(metaInfoDict) sinkHandle.close(metaInfo) if self.testModeFlag: raise Exception('No completion of logfile backup in test mode') # Delete the logfile. os.unlink(fileName, dir_fd=inputDirectoryFd) # Update the UUID map as last step: if any of the steps above # would fail, currentUuid generated in next run will be identical # to this. Sorting out the duplicates will be easy. self.resourceUuidMap[sourceInfo.sourceUrl] = currentUuid finally: if inputDirectoryFd != None: os.close(inputDirectoryFd) def invokeUnit(self, sink): """Invoke this unit to create backup elements and pass them on to the sink. Even when indicated via getNextInvocationTime, the unit may decide, that it is not yet ready and not write any element to the sink. @return None if currently there is nothing to write to the source, a number of seconds to retry invocation if the unit assumes, that there is data to be processed but processing cannot start yet, e.g. due to locks held by other parties or resource, e.g. network storages, currently not available.""" nextInvocationDelta = self.getNextInvocationTime() invocationAttemptedFlag = False try: if nextInvocationDelta == 0: # We are now ready for processing. Get the list of source directories # and search patterns to locate the target files. 
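# Editor's sketch of the DataUuid chaining performed above: each element
# UUID is derived from the predecessor UUID, the checksum of the stored
# data and the source file's timestamp and name, so retransmissions of
# identical content remain distinguishable while staying linkable to
# their predecessor. The helper name is illustrative only.
import hashlib
import sys

def demoDeriveDataUuid(lastUuid, storageDigest, modificationTime, fileName):
  uuidDigest = hashlib.sha512()
  if lastUuid is not None:
    uuidDigest.update(lastUuid)
  uuidDigest.update(storageDigest)
  uuidDigest.update(bytes(
      '%d %s' % (modificationTime, fileName), sys.getdefaultencoding()))
  return uuidDigest.digest()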
unitInputListConfig = self.configContext.get(CONFIG_INPUT_LIST_KEY, None) invocationAttemptedFlag = True nextInvocationDelta = None if unitInputListConfig is None: print('Suspected configuration error: LogfileBackupUnit ' \ 'enabled but %s configuration list empty' % CONFIG_INPUT_LIST_KEY, file=sys.stderr) else: for configItem in unitInputListConfig: unitInput = None try: unitInput = LogfileBackupUnitInputDescription(configItem) except Exception as configReadException: print('LogfileBackupUnit: failed to use configuration ' \ '%s: %s' % ( repr(configItem), configReadException.args[0]), file=sys.stderr) continue # Configuration parsing worked, start processing the inputs. self.processInput(unitInput, sink) finally: if invocationAttemptedFlag: try: # Update the timestamp. self.lastInvocationTime = int(time.time()) # Write back the new state information immediately after invocation # to avoid data loss when program crashes immediately afterwards. # Keep one old version of state file. try: os.unlink('state.old', dir_fd=self.persistencyDirFd) except OSError as relinkError: if relinkError.errno != errno.ENOENT: raise try: os.link( 'state.current', 'state.old', src_dir_fd=self.persistencyDirFd, dst_dir_fd=self.persistencyDirFd, follow_symlinks=False) except OSError as relinkError: if relinkError.errno != errno.ENOENT: raise try: os.unlink('state.current', dir_fd=self.persistencyDirFd) except OSError as relinkError: if relinkError.errno != errno.ENOENT: raise handle = guerillabackup.secureOpenAt( self.persistencyDirFd, 'state.current', fileOpenFlags=os.O_WRONLY|os.O_CREAT|os.O_EXCL|os.O_NOFOLLOW|os.O_NOCTTY, fileCreateMode=0o600) writeResourceUuidMap = {} for url, uuidData in self.resourceUuidMap.items(): writeResourceUuidMap[url] = str(base64.b64encode(uuidData), 'ascii') os.write( handle, json.dumps([ self.lastInvocationTime, writeResourceUuidMap]).encode('ascii')) os.close(handle) except Exception as stateSaveException: # Writing of state information failed. Print out the state information # for manual reconstruction as last resort. print('Writing of state information failed: %s\nCurrent ' \ 'state: %s' % ( str(stateSaveException), repr([self.lastInvocationTime, self.resourceUuidMap])), file=sys.stderr) traceback.print_tb(sys.exc_info()[2]) raise # Declare the main unit class so that the backup generator can # instantiate it. backupGeneratorUnitClass = LogfileBackupUnit guerillabackup-0.5.0/src/lib/guerillabackup/OSProcessPipelineElement.py000066400000000000000000000337411450137035300263000ustar00rootroot00000000000000"""This module contains classes for creation of asynchronous OS process based pipeline elements.""" import fcntl import os import subprocess import guerillabackup from guerillabackup.TransformationProcessOutputStream import NullProcessOutputStream from guerillabackup.TransformationProcessOutputStream import TransformationProcessOutputStream class OSProcessPipelineElement( guerillabackup.TransformationPipelineElementInterface): """This is the interface to define data transformation pipeline elements, e.g. for compression, encryption, signing. To really start execution of a transformation pipeline, transformation process instances have to be created for each pipe element.""" def __init__(self, executable, execArgs, allowedExitStatusList=None): """Create the OSProcessPipelineElement element. 
@param allowedExitStatusList when not defined, only command exit code of 0 is accepted to indicated normal termination.""" self.executable = executable self.execArgs = execArgs if not guerillabackup.isValueListOfType(self.execArgs, str): raise Exception('execArgs have to be list of strings') if allowedExitStatusList is None: allowedExitStatusList = [0] self.allowedExitStatusList = allowedExitStatusList def getExecutionInstance(self, upstreamProcessOutput): """Get an execution instance for this transformation element. @param upstreamProcessOutput this is the output of the upstream process, that will be wired as input of the newly created process instance.""" return OSProcessPipelineExecutionInstance( self.executable, self.execArgs, upstreamProcessOutput, self.allowedExitStatusList) class OSProcessPipelineExecutionInstance( guerillabackup.TransformationProcessInterface): """This class defines the execution instance of an OSProcessPipeline element.""" STATE_NOT_STARTED = 0 STATE_RUNNING = 1 # This state reached when the process has already terminated but # input/output shutdown is still pending. STATE_SHUTDOWN = 2 STATE_ENDED = 3 def __init__(self, executable, execArgs, upstreamProcessOutput, allowedExitStatusList): self.executable = executable self.execArgs = execArgs self.upstreamProcessOutput = upstreamProcessOutput if self.upstreamProcessOutput is None: # Avoid reading from real stdin, use replacement output. self.upstreamProcessOutput = NullProcessOutputStream() self.upstreamProcessOutputBuffer = b'' self.inputPipe = None self.allowedExitStatusList = allowedExitStatusList # Simple state tracking to be more consistent on multiple invocations # of the same method. States are "not starte", "running", "ended" self.processState = OSProcessPipelineExecutionInstance.STATE_NOT_STARTED self.process = None # Process output instance of this process only when no output # file descriptor is set. self.processOutput = None # This exception holds any processing error until doProcess() # or stop() is called. self.processingException = None def createProcess(self, outputFd): """Create the process. @param outputFd if not None, use this as output stream descriptor.""" # Create the process file descriptor pairs manually. Otherwise # it is not possible to wait() for the process first and continue # to read from the other side of the pipe after garbage collection # of the process object. self.inputPipe = None outputPipeFds = None if outputFd is None: outputPipeFds = os.pipe2(os.O_CLOEXEC) outputFd = outputPipeFds[1] if self.upstreamProcessOutput.getOutputStreamDescriptor() is None: self.process = subprocess.Popen( self.execArgs, executable=self.executable, stdin=subprocess.PIPE, stdout=outputFd) self.inputPipe = self.process.stdin flags = fcntl.fcntl(self.inputPipe.fileno(), fcntl.F_GETFL) fcntl.fcntl(self.inputPipe.fileno(), fcntl.F_SETFL, flags|os.O_NONBLOCK) else: self.process = subprocess.Popen( self.execArgs, executable=self.executable, stdin=self.upstreamProcessOutput.getOutputStreamDescriptor(), stdout=outputFd) self.processOutput = None if outputPipeFds is not None: # Close the write side now. 
os.close(outputPipeFds[1]) self.processOutput = TransformationProcessOutputStream( outputPipeFds[0]) def getProcessOutput(self): """Get the output connector of this transformation process.""" if self.processState != OSProcessPipelineExecutionInstance.STATE_NOT_STARTED: raise Exception('Output manipulation only when not started yet') if self.process is None: self.createProcess(None) if self.processOutput is None: raise Exception('No access to process output in stream mode') return self.processOutput def setProcessOutputStream(self, processOutputStream): """Some processes may also support setting of an output stream file descriptor. This is especially useful if the process is the last one in a pipeline and hence could write directly to a file or network descriptor. @throw Exception if this process does not support setting of output stream descriptors.""" if self.processState != OSProcessPipelineExecutionInstance.STATE_NOT_STARTED: raise Exception('Output manipulation only when not started yet') if self.process is not None: raise Exception('No setting of output stream after previous ' \ 'setting or call to getProcessOutput') self.createProcess(processOutputStream) def checkConnected(self): """Check if this process instance is already connected to an output, e.g. via getProcessOutput or setProcessOutputStream.""" if (self.processState == OSProcessPipelineExecutionInstance.STATE_NOT_STARTED) and \ (self.process is None): raise Exception('Operation mode not known while not fully connected') # Process instance only created when connected, so everything OK. def isAsynchronous(self): """A asynchronous process just needs to be started and will perform data processing on streams without any further interaction while running.""" self.checkConnected() return self.inputPipe is None def start(self): """Start this execution process.""" if self.processState != OSProcessPipelineExecutionInstance.STATE_NOT_STARTED: raise Exception('Already started') self.checkConnected() # The process itself was already started when being connected. # Just update the state here. self.processState = OSProcessPipelineExecutionInstance.STATE_RUNNING def stop(self): """Stop this execution process when still running. @return None when the the instance was already stopped, information about stopping, e.g. the stop error message when the process was really stopped.""" if self.processState == OSProcessPipelineExecutionInstance.STATE_NOT_STARTED: raise Exception('Not started') # We are already stopped, do othing here. if self.processState == OSProcessPipelineExecutionInstance.STATE_ENDED: return None # Clear any pending processing exceptions. This is the last chance # for reporting anyway. stopException = self.processingException self.processingException = None if self.processState == OSProcessPipelineExecutionInstance.STATE_RUNNING: # The process was not stopped yet, do it. There is a small chance # that we send a signal to a dead process before waiting on it # and hence we would have a normal termination here. Ignore that # case, a stop() indicates need for abnormal termination with # risk of data loss. self.process.kill() self.process.wait() self.process = None self.processState = OSProcessPipelineExecutionInstance.STATE_SHUTDOWN # Now we are in STATE_SHUTDOWN. self.finishProcess() # If there was no previous exception, copy any exception from # finishing the process. 
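# Editor's sketch of the wiring pattern used by createProcess() above,
# assuming /bin/cat as a trivial filter: the child's stdout is connected
# to a pipe created with os.pipe2 so the read side stays usable after
# wait(), and the child's stdin is switched to non-blocking mode so a
# synchronous caller can feed it without risking a deadlock.
import fcntl
import os
import subprocess

readFd, writeFd = os.pipe2(os.O_CLOEXEC)
child = subprocess.Popen(['/bin/cat'], stdin=subprocess.PIPE, stdout=writeFd)
os.close(writeFd)  # keep only the read side in the parent
flags = fcntl.fcntl(child.stdin.fileno(), fcntl.F_GETFL)
fcntl.fcntl(child.stdin.fileno(), fcntl.F_SETFL, flags|os.O_NONBLOCK)
child.stdin.write(b'example data')
child.stdin.close()
assert os.read(readFd, 64) == b'example data'
child.wait()
os.close(readFd)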
if stopException is None: stopException = self.processingException self.processingException = None return stopException def isRunning(self): """See if this process instance is still running. @return False if instance was not yet started or already stopped. If there are any unreported pending errors from execution, this method will return True until doProcess() or stop() is called at least once.""" if self.processState in [ OSProcessPipelineExecutionInstance.STATE_NOT_STARTED, OSProcessPipelineExecutionInstance.STATE_ENDED]: return False if self.processingException is not None: # There is a pending exception, which is cleared only in doProcess() # or stop(), so pretend that the process is still running. return True if self.processState == OSProcessPipelineExecutionInstance.STATE_RUNNING: (pid, status) = os.waitpid(self.process.pid, os.WNOHANG) if pid == 0: return True self.process = None self.processState = OSProcessPipelineExecutionInstance.STATE_SHUTDOWN if (status&0xff) != 0: self.processingException = Exception('Process end by signal %d, ' \ 'status 0x%x' % (status&0xff, status)) elif (status>>8) not in self.allowedExitStatusList: self.processingException = Exception('Process end with unexpected ' \ 'exit status %d' % (status>>8)) # We are in shutdown here. See if we can finish that phase immediately. if self.processingException is None: try: self.upstreamProcessOutput.close() self.processState = OSProcessPipelineExecutionInstance.STATE_ENDED except Exception as closeException: self.processingException = closeException # Pretend that we are still running so that pending exception # is reported with next doProcess() call. return self.processState != OSProcessPipelineExecutionInstance.STATE_ENDED def doProcess(self): """This method triggers the data transformation operation of this component. For components in synchronous mode, the method will attempt to move data from input to output. Asynchronous components will just check the processing status and may raise an exception, when processing terminated with errors. As such a component might not be able to detect the amount of data really moved since last invocation, the component may report a fake single byte move. @throws Exception if an uncorrectable transformation state was reached and transformation cannot proceed, even though end of input data was not yet seen. Raise exception also when process was not started or already stopped. @return the number of bytes read or written or at least a value greater zero if any data was processed. A value of zero indicates, that currently data processing was not possible due to filled buffers but should be attemted again. A value below zero indicates that all input data was processed and output buffers were flushed already.""" if self.processState == OSProcessPipelineExecutionInstance.STATE_NOT_STARTED: raise Exception('Not started') if self.processState == OSProcessPipelineExecutionInstance.STATE_ENDED: # This must be a logic error attempting to process data when # already stopped. raise Exception('Process %s already stopped' % self.executable) if self.processingException is not None: processingException = self.processingException self.processingException = None # We are dead here anyway, close inputs and outputs ignoring any # data possibly lost. 
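# Editor's worked example of the waitpid() status decoding used in
# isRunning() above: a raw status of 0x0200 encodes a normal exit with
# code 2 in the high byte, a raw status of 0x0009 encodes termination by
# signal 9 (SIGKILL) in the low byte. The os.WIF*/os.WEXITSTATUS helpers
# decode the same raw value portably.
import os
assert (0x0200 & 0xff) == 0 and (0x0200 >> 8) == 2
assert os.WIFEXITED(0x0200) and os.WEXITSTATUS(0x0200) == 2
assert (0x0009 & 0xff) == 9
assert os.WIFSIGNALED(0x0009) and os.WTERMSIG(0x0009) == 9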
self.finishProcess() self.processState = OSProcessPipelineExecutionInstance.STATE_ENDED raise processingException if self.inputPipe is not None: if len(self.upstreamProcessOutputBuffer) == 0: self.upstreamProcessOutputBuffer = self.upstreamProcessOutput.readData(1<<16) if self.upstreamProcessOutputBuffer is None: self.inputPipe.close() self.inputPipe = None if ((self.upstreamProcessOutputBuffer is not None) and (len(self.upstreamProcessOutputBuffer) != 0)): writeLength = self.inputPipe.write(self.upstreamProcessOutputBuffer) if writeLength == len(self.upstreamProcessOutputBuffer): self.upstreamProcessOutputBuffer = b'' else: self.upstreamProcessOutputBuffer = self.upstreamProcessOutputBuffer[writeLength:] return writeLength if self.isRunning(): # Pretend that we are still waiting for more input, thus polling # may continue when at least another component moved data. return 0 # All pipes are empty and no more processing is possible. return -1 def getBlockingStreams(self, readStreamList, writeStreamList): """Collect the file descriptors that are currently blocking this synchronous compoment.""" # The upstream input can be ignored when really a file descriptor, # it is wired to this process for asynchronous use anyway. When # not a file descriptor, writing to the input pipe may block. if self.inputPipe is not None: writeStreamList.append(self.inputPipe.fileno()) def finishProcess(self): """This method cleans up the current process after operating system process termination but maybe before handling of pending exceptions. The method will set the processingException when finishProcess caused any errors. An error is also when this method is called while upstream did not close the upstream output stream yet.""" readData = self.upstreamProcessOutput.readData(64) if readData is not None: if len(readData) == 0: self.processingException = Exception('Upstream did not finish yet, data might be lost') else: self.processingException = Exception('Not all upstream data processed') # Upstream is delivering data that was not processed. Close the # output so that upstream will also notice when attempting to # write to will receive an exception. self.upstreamProcessOutput.close() self.upstreamProcessOutput = None if self.upstreamProcessOutputBuffer is not None: self.processingException = Exception( 'Output buffers to process not drained yet, %d bytes lost' % len(self.upstreamProcessOutputBuffer)) self.upstreamProcessOutputBuffer = None self.process = None guerillabackup-0.5.0/src/lib/guerillabackup/TarBackupUnit.py000066400000000000000000000777451450137035300241500ustar00rootroot00000000000000"""This unit provides support for full and incremental tar backups. All tar backups within the same unit are created sequentially, starting with the one most overdue. For incremental backups, backup indices are kept in the persistency storage of this backup unit. Those files might be compressed by the backup unit and are kept for a limited timespan. An external process might also remove them without causing damage.""" import base64 import datetime import errno import hashlib import json import os import subprocess import sys import time import traceback import guerillabackup from guerillabackup.BackupElementMetainfo import BackupElementMetainfo # This is the key to the list of tar backups to schedule and perform # using the given unit. Each entry has to be a dictionary containing # entries to create TarBackupUnitDescription objects. See there # for more information. 
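# Hypothetical configuration sketch (editor's example; source URL, paths
# and timings are assumptions): one entry of the dictionary expected under
# CONFIG_LIST_KEY below, keyed by the source URL of the backup. The timing
# lists hold [minimum interval, maximum interval, modulo base, modulo
# offset] in seconds, here roughly weekly full and daily incremental
# backups.
demoTarBackupUnitConfigList = {
    '/systembackup': {
        'Root': '/',
        'Include': ['.'],
        'Exclude': ['./var/lib/guerillabackup/data', './proc', './sys'],
        'IgnoreBackupRaces': True,
        'FullBackupTiming': [6*86400, 8*86400, 7*86400, 0],
        'IncBackupTiming': [12*3600, 36*3600, 86400, 0],
        'KeepIndices': 3
    }
}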
CONFIG_LIST_KEY = 'TarBackupUnitConfigList' class TarBackupUnitDescription: """This class collects all the information about a single tar backup to be scheduled by a unit.""" def __init__(self, sourceUrl, descriptionDict): """Initialize a single tar backup description using values from a dictionary, e.g. extracted directly from the CONFIG_LIST_KEY parameter. @param sourceUrl the URL which will identify the backups from this subunit. It is used also to store persistency information for that unit. @param descriptionDict dictionary containing the parameters for a single tar backup task. * PreBackupCommand: execute this command given as list of arguments before starting the backup, e.g. create a filesystem or virtual machine snapshot, perform cleanup. * PostBackupCommand: execute this command after starting the backup. * Root: root directory of tar backup, "/" when missing. * Include: list of pathes to include, ["."] when missing. * Exclude: list of patterns to exclude from backup (see tar documentation "--exclude"). When missing and Root is "/", list ["./var/lib/guerillabackup/data"] is used. * IgnoreBackupRaces: flag to indicate if races during backup are acceptable, e.g. because the directories are modified, files changed or removed. When set, such races will not result in non-zero exit status. Off course, it would be more sensible to deploy a snapshot based backup variant using the Pre/PostBackupCommand functions. * FullBackupTiming: tuple with minimum and maximum interval between full backup invocations and modulo base and offset, all in seconds. Without modulo invocation (all values None), full backups will run as soon as minimum interval is exceeded. With modulo timing, modulo trigger is ignored when below minimum time. When gap above maximum interval, immediate backup is started. * IncBackupTiming: When set, incremental backups are created to fill the time between full backups. Timings are specified as tuple with same meaning as in FullBackupTiming parameter. This will also trigger generation of tar file indices when running full backups. * FullOverrideCommand: when set, parameters Exclude, Include, Root are ignored and exactly the given command is executed. * IncOverrideCommand: when set, parameters Exclude, Include, Root are ignored and exactly the given command is executed. * KeepIndices: number of old incremental tar backup indices to keep. With -1 keep all, otherwise keep one the given number. Default is 0. * Policy: If not none, include this string as handling policy within the manifest. * EncryptionKey: If not None, encrypt the input using the named key. 
Otherwise default encryption key from global configuration might be used.""" if not isinstance(sourceUrl, str): raise Exception('Source URL has to be string') guerillabackup.assertSourceUrlSpecificationConforming(sourceUrl) self.sourceUrl = sourceUrl if not isinstance(descriptionDict, dict): raise Exception('Input description has to be dictionary') self.preBackupCommandList = None self.postBackupCommandList = None self.backupRoot = None self.backupIncludeList = None self.backupExcludeList = None self.ignoreBackupRacesFlag = False self.fullBackupTiming = None self.incBackupTiming = None self.fullBackupOverrideCommand = None self.incBackupOverrideCommand = None self.handlingPolicyName = None self.encryptionKeyName = None self.keepOldIndicesCount = 0 for configKey, configValue in descriptionDict.items(): if ((configKey == 'PreBackupCommand') or (configKey == 'PostBackupCommand') or (configKey == 'FullOverrideCommand') or (configKey == 'IncOverrideCommand')): if not guerillabackup.isValueListOfType(configValue, str): raise Exception('Parameter %s has to be list of string' % configKey) if configKey == 'PreBackupCommand': self.preBackupCommandList = configValue elif configKey == 'PostBackupCommand': self.postBackupCommandList = configValue elif configKey == 'FullOverrideCommand': self.fullBackupOverrideCommand = configValue elif configKey == 'IncOverrideCommand': self.incBackupOverrideCommand = configValue else: raise Exception('Logic error') elif configKey == 'Root': self.backupRoot = configValue elif configKey == 'Include': if not (isinstance(configValue, list) or isinstance(configValue, tuple)): raise Exception( 'Parameter %s has to be list or tuple' % configKey) self.backupIncludeList = configValue elif configKey == 'Exclude': if not (isinstance(configValue, list) or isinstance(configValue, tuple)): raise Exception( 'Parameter %s has to be list or tuple' % configKey) self.backupExcludeList = configValue elif configKey == 'IgnoreBackupRaces': self.ignoreBackupRacesFlag = configValue elif ((configKey == 'FullBackupTiming') or (configKey == 'IncBackupTiming')): if (not isinstance(configValue, list)) or (len(configValue) != 4): raise Exception( 'Parameter %s has to be list with 4 values' % configKey) if configValue[0] is None: raise Exception('Parameter %s minimum interval value must not be None' % configKey) for timeValue in configValue: if (timeValue != None) and (not isinstance(timeValue, int)): raise Exception( 'Parameter %s contains non-number element' % configKey) if configValue[2] != None: if ((configValue[2] <= 0) or (configValue[3] < 0) or (configValue[3] >= configValue[2])): raise Exception( 'Parameter %s modulo timing values invalid' % configKey) if configKey == 'FullBackupTiming': self.fullBackupTiming = configValue else: self.incBackupTiming = configValue elif configKey == 'KeepIndices': if not isinstance(configValue, int): raise Exception('KeepIndices has to be integer value') self.keepOldIndicesCount = configValue elif configKey == 'Policy': self.handlingPolicyName = configValue elif configKey == 'EncryptionKey': self.encryptionKeyName = configValue else: raise Exception('Unsupported parameter %s' % configKey) if self.fullBackupTiming is None: raise Exception('Mandatory FullBackupTiming parameter missing') # The remaining values are not from the unit configuration but # unit state persistency instead. self.lastFullBackupTime = None self.lastAnyBackupTime = None self.lastUuidValue = None # When not None, delay execution of any any backup for that resource # beyond the given time. 
This is intended for backups that failed # to run to avoid invocation loops. This value is not persistent # between multiple software invocations. self.nextRetryTime = None def getNextInvocationInfo(self, currentTime): """Get the next invocation time for this unit description. @return a tuple with the the number of seconds till invocation or a negative value if invocation is already overdue and the type of backup to generate, full or incremental (inc).""" # See if backup generation is currently blocked. if self.nextRetryTime != None: if currentTime < self.nextRetryTime: return (self.nextRetryTime-currentTime, None) self.nextRetryTime = None if self.lastFullBackupTime is None: return (-self.fullBackupTiming[1], 'full') # Use this as default value. At invocation time, unit may still # decide if backup generation is really neccessary. lastOffset = currentTime-self.lastFullBackupTime result = (self.fullBackupTiming[1]-lastOffset, 'full') if self.fullBackupTiming[2] != None: # This is the delta to the previous (negative) or next (positive) # preferred timepoint. delta = self.fullBackupTiming[3]-(currentTime%self.fullBackupTiming[2]) if delta < 0: # Add default modulo value if preferred backup timepoint is in # the past. delta += self.fullBackupTiming[2] if delta+lastOffset < self.fullBackupTiming[0]: # If time from last backup to next preferred time is below minimum, # then add again the default modulo value. Most likely this will # increase the backup time to be above the maximum backup interval. delta += self.fullBackupTiming[2] if delta+lastOffset > self.fullBackupTiming[1]: # If time from last backup to next preferred time is above maximum, # use the maximum interval to calculate a new delta. delta = self.fullBackupTiming[1]-lastOffset if delta < result[0]: result = (delta, 'full') # When a full backup is overdue, report it. Do not care if there # are incremental backups more overdue. if result[0] <= 0: return result if self.incBackupTiming != None: lastOffset = currentTime-self.lastAnyBackupTime if self.incBackupTiming[2] is None: # Normal minimum, maximum timing mode. delta = self.incBackupTiming[0]-lastOffset if delta < result[0]: result = (delta, 'inc') else: delta = self.incBackupTiming[3]-(currentTime%self.incBackupTiming[2]) if delta < 0: delta += self.incBackupTiming[2] if delta+lastOffset < self.incBackupTiming[0]: delta += self.incBackupTiming[2] if delta+lastOffset > self.incBackupTiming[1]: delta = self.incBackupTiming[1]-lastOffset if delta < result[0]: result = (delta, 'inc') return result def getBackupCommand(self, backupType, indexPathname): """Get the command to execute to create the backup. @param backupType use this mode to create the backup. 
@param indexPathname path to the index file name, None when backup without indexing is requested.""" if (backupType == 'full') and (self.fullBackupOverrideCommand != None): return self.fullBackupOverrideCommand if (backupType == 'inc') and (self.incBackupOverrideCommand != None): return self.incBackupOverrideCommand backupCommand = ['tar'] if self.ignoreBackupRacesFlag: backupCommand.append('--ignore-failed-read') backupCommand.append('-C') backupCommand.append(self.backupRoot) if self.incBackupTiming != None: backupCommand.append('--listed-incremental') backupCommand.append(indexPathname) if self.backupExcludeList != None: for excludePattern in self.backupExcludeList: backupCommand.append('--exclude=%s' % excludePattern) backupCommand.append('-c') backupCommand.append('--') backupCommand += self.backupIncludeList return backupCommand def getJsonData(self): """Return the state of this object in a format suitable for JSON serialization.""" return [ self.lastFullBackupTime, self.lastAnyBackupTime, str(base64.b64encode(self.lastUuidValue), 'ascii')] class TarBackupUnit(guerillabackup.SchedulableGeneratorUnitInterface): """This class allows to schedule regular runs of tar backups. The unit supports different scheduling modes: modulo timing and maximum gap timing. The time stamps considered are always those from starting the tar backup, not end time. The unit will keep track of the last UUID reported for each resource and generate a new one for each produced file using json-serialized state data. The state data is a dictionary with the resource name as key to refer to a tuple of values: the last successful run timestamp as seconds since 1970, the timestamp of the last full backup run and the UUID value of the last run.""" def __init__(self, unitName, configContext): """Initialize this unit using the given configuration.""" self.unitName = unitName self.configContext = configContext self.testModeFlag = configContext.get(guerillabackup.CONFIG_GENERAL_DEBUG_TEST_MODE_KEY, False) if not isinstance(self.testModeFlag, bool): raise Exception('Configuration parameter %s has to be ' \ 'boolean' % guerillabackup.CONFIG_GENERAL_DEBUG_TEST_MODE_KEY) backupConfigList = configContext.get(CONFIG_LIST_KEY, None) if (backupConfigList is None) or (not isinstance(backupConfigList, dict)): raise Exception('Configuration parameter %s missing or of wrong type' % CONFIG_LIST_KEY) self.backupUnitDescriptions = {} for sourceUrl, configDef in backupConfigList.items(): self.backupUnitDescriptions[sourceUrl] = TarBackupUnitDescription( sourceUrl, configDef) # Start loading the persistency information. persistencyDirFd = None persistencyFileHandle = None stateData = None try: persistencyDirFd = guerillabackup.openPersistencyFile( configContext, os.path.join('generators', self.unitName), os.O_DIRECTORY|os.O_RDONLY|os.O_CREAT|os.O_EXCL|os.O_NOFOLLOW|os.O_NOCTTY, 0o700) try: persistencyFileHandle = guerillabackup.secureOpenAt( persistencyDirFd, 'state.current', fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY) except OSError as openError: if openError.errno != errno.ENOENT: raise # See if the state.previous file exists, if yes, the unit is likely # to be broken. Refuse to do anything while in this state. try: os.stat( 'state.previous', dir_fd=persistencyDirFd, follow_symlinks=False) raise Exception( 'Persistency data inconsistencies: found stale previous state file') except OSError as statError: if statError.errno != errno.ENOENT: raise # So there is only the current state file, if any. 
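# Editor's sketch of the JSON structure expected in the "state.current"
# file read here, mirroring getJsonData() above: each source URL maps to
# the last full backup timestamp, the last backup timestamp of any type
# and the base64 encoded UUID of the last element (values are made up).
import base64
import json

demoStateData = json.dumps({
    '/systembackup': [1700000000, 1700086400,
                      str(base64.b64encode(b'demo-last-uuid'), 'ascii')]})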
if persistencyFileHandle != None: stateData = b'' while True: data = os.read(persistencyFileHandle, 1<<20) if len(data) == 0: break stateData += data os.close(persistencyFileHandle) persistencyFileHandle = None finally: if persistencyFileHandle != None: os.close(persistencyFileHandle) if persistencyDirFd != None: os.close(persistencyDirFd) # Start mangling of data after closing all file handles. if stateData is None: print('%s: first time activation, no persistency data found' % self.unitName, file=sys.stderr) else: stateInfo = json.loads(str(stateData, 'ascii')) if not isinstance(stateInfo, dict): raise Exception('Persistency data structure mismatch') for url, stateData in stateInfo.items(): description = self.backupUnitDescriptions.get(url, None) if description is None: # Ignore this state, user might have removed a single tar backup # configuration without deleting the UUID and timing data. print('No tar backup configuration for %s resource state data %s' % ( url, repr(stateData)), file=sys.stderr) continue description.lastFullBackupTime = stateData[0] description.lastAnyBackupTime = stateData[1] # The UUID is kept internally as binary data string. Only for # persistency, data will be base64 encoded. description.lastUuidValue = base64.b64decode(stateData[2]) def findNextInvocationUnit(self): """Find the next unit to invoke. @return a tuple containing the seconds till next invocation and the corresponding TarBackupUnitDescription. Next invocation time might be negative if unit invocation is already overdue.""" currentTime = int(time.time()) nextInvocationTime = None nextDescription = None for url, description in self.backupUnitDescriptions.items(): info = description.getNextInvocationInfo(currentTime) if (nextInvocationTime is None) or (info[0] < nextInvocationTime): nextInvocationTime = info[0] nextDescription = description if nextInvocationTime is None: return None return (nextInvocationTime, nextDescription) def getNextInvocationTime(self): """Get the time in seconds until this unit should called again. If a unit does not know (yet) as invocation needs depend on external events, it should report a reasonable low value to be queried again soon. @return 0 if the unit should be invoked immediately, the seconds to go otherwise.""" nextUnitInfo = self.findNextInvocationUnit() if nextUnitInfo is None: return 3600 if nextUnitInfo[0] < 0: return 0 return nextUnitInfo[0] def processInput(self, tarUnitDescription, sink, persistencyDirFd): """Process a single input description by creating the tar stream and updating the indices, if any. When successful, persistency information about this subunit is updated also.""" # Keep time of invocation check and start of backup procedure # also for updating the unit data. currentTime = int(time.time()) (invocationTime, backupType) = tarUnitDescription.getNextInvocationInfo( currentTime) indexFilenamePrefix = None indexFilePathname = None nextIndexFileName = None if tarUnitDescription.incBackupTiming != None: # We will have to create an index, open the index directory at # first. indexFilenamePrefix = tarUnitDescription.sourceUrl[1:].replace('/', '-') # Make sure the filename cannot get longer than 256 bytes, even # with ".index(.bz2).yyyymmddhhmmss" (25 chars) appended. if len(indexFilenamePrefix) > 231: indexFilenamePrefix = indexFilenamePrefix[:231] # Create the new index file. 
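# Editor's worked example (the index pathname is a placeholder) of the
# command list assembled by getBackupCommand() for the hypothetical
# description sketched near CONFIG_LIST_KEY, with incremental indexing
# enabled:
demoTarCommand = [
    'tar', '--ignore-failed-read', '-C', '/',
    '--listed-incremental',
    '/path/to/persistency/generators/tar/systembackup.index.next',
    '--exclude=./var/lib/guerillabackup/data', '--exclude=./proc',
    '--exclude=./sys', '-c', '--', '.']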
nextIndexFileName = '%s.index.next' % indexFilenamePrefix nextIndexFileHandle = guerillabackup.secureOpenAt( persistencyDirFd, nextIndexFileName, fileOpenFlags=os.O_WRONLY|os.O_CREAT|os.O_EXCL|os.O_NOFOLLOW|os.O_NOCTTY, fileCreateMode=0o600) indexFilePathname = os.path.join( guerillabackup.getPersistencyBaseDirPathname(self.configContext), 'generators', self.unitName, nextIndexFileName) if backupType == 'inc': # See if there is an old index. When missing, change the mode # to "full". indexStatResult = None try: indexStatResult = os.stat( '%s.index' % indexFilenamePrefix, dir_fd=persistencyDirFd, follow_symlinks=False) except OSError as statError: if statError.errno != errno.ENOENT: raise if indexStatResult is None: backupType = 'full' else: # Copy content from current index to new one. currentIndexFileHandle = guerillabackup.secureOpenAt( persistencyDirFd, '%s.index' % indexFilenamePrefix, fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY) while True: data = os.read(currentIndexFileHandle, 1<<20) if len(data) == 0: break os.write(nextIndexFileHandle, data) os.close(currentIndexFileHandle) os.close(nextIndexFileHandle) # Everything is prepared for backup, start it. if tarUnitDescription.preBackupCommandList != None: if self.testModeFlag: print('No invocation of PreBackupCommand in test mode', file=sys.stderr) else: process = subprocess.Popen(tarUnitDescription.preBackupCommandList) returnCode = process.wait() if returnCode != 0: raise Exception('Pre backup command %s failed in %s, source %s' % ( repr(tarUnitDescription.preBackupCommandList)[1:-1], self.unitName, tarUnitDescription.sourceUrl)) # Start the unit itself. backupCommand = tarUnitDescription.getBackupCommand( backupType, indexFilePathname) # Accept creation of tar archives only with zero exit status or # return code 1, when files were concurrently modified and those # races should be ignored. allowedExitStatusList = [0] if tarUnitDescription.ignoreBackupRacesFlag: allowedExitStatusList.append(1) completePipleline = [guerillabackup.OSProcessPipelineElement( '/bin/tar', backupCommand, allowedExitStatusList)] # Get the downstream transformation pipeline elements. completePipleline += guerillabackup.getDefaultDownstreamPipeline( self.configContext, tarUnitDescription.encryptionKeyName) # Build the transformation pipeline instance. sinkHandle = sink.getSinkHandle(tarUnitDescription.sourceUrl) sinkStream = sinkHandle.getSinkStream() # Get the list of started pipeline instances. pipelineInstances = guerillabackup.instantiateTransformationPipeline( completePipleline, None, sinkStream, doStartFlag=True) try: guerillabackup.runTransformationPipeline(pipelineInstances) except: # Just cleanup the incomplete index file when incremental mode # was requested. if not nextIndexFileName is None: os.unlink(nextIndexFileName, dir_fd=persistencyDirFd) raise digestData = pipelineInstances[-1].getDigestData() metaInfoDict = {} metaInfoDict['BackupType'] = backupType if tarUnitDescription.handlingPolicyName != None: metaInfoDict['HandlingPolicy'] = [tarUnitDescription.handlingPolicyName] lastUuid = tarUnitDescription.lastUuidValue currentUuidDigest = hashlib.sha512() if lastUuid != None: metaInfoDict['Predecessor'] = lastUuid currentUuidDigest.update(lastUuid) # Add the compressed file digest to make UUID different for different # content. currentUuidDigest.update(digestData) # Also include the timestamp and source URL in the UUID calculation # to make UUID different for backup of identical data at (nearly) # same time. 
currentUuidDigest.update(bytes('%d %s' % ( currentTime, tarUnitDescription.sourceUrl), sys.getdefaultencoding())) currentUuid = currentUuidDigest.digest() metaInfoDict['DataUuid'] = currentUuid metaInfoDict['StorageFileChecksumSha512'] = digestData metaInfoDict['Timestamp'] = currentTime metaInfo = BackupElementMetainfo(metaInfoDict) sinkHandle.close(metaInfo) if self.testModeFlag: raise Exception('No completion of tar backup in test mode') if tarUnitDescription.postBackupCommandList != None: process = subprocess.Popen(tarUnitDescription.postBackupCommandList) returnCode = process.wait() if returnCode != 0: # Still raise an exception and thus prohibit completion of this # tar backup. The PostBackupCommand itself cannot have an influence # on the backup created before but the failure might indicate, # that the corresponding PreBackupCommand was problematic. Thus # let the user resolve the problem manually. raise Exception('Post backup command %s failed in %s, source %s' % ( repr(tarUnitDescription.postBackupCommandList)[1:-1], self.unitName, tarUnitDescription.sourceUrl)) if tarUnitDescription.incBackupTiming != None: # See if there is an old index to compress and move, but only # if it should be really kept. Currently fstatat function is not # available, so use open/fstat instead. currentIndexFd = None currentIndexName = '%s.index' % indexFilenamePrefix try: currentIndexFd = guerillabackup.secureOpenAt( persistencyDirFd, currentIndexName, fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY) except OSError as indexOpenError: if indexOpenError.errno != errno.ENOENT: raise targetFileName = None if currentIndexFd != None: if tarUnitDescription.keepOldIndicesCount == 0: os.close(currentIndexFd) os.unlink(currentIndexName, dir_fd=persistencyDirFd) else: statData = os.fstat(currentIndexFd) targetFileTime = int(statData.st_mtime) targetFileHandle = None while True: date = datetime.datetime.fromtimestamp(targetFileTime) dateStr = date.strftime('%Y%m%d%H%M%S') targetFileName = '%s.index.bz2.%s' % (indexFilenamePrefix, dateStr) try: targetFileHandle = guerillabackup.secureOpenAt( persistencyDirFd, targetFileName, fileOpenFlags=os.O_WRONLY|os.O_CREAT|os.O_EXCL|os.O_NOFOLLOW|os.O_NOCTTY, fileCreateMode=0o600) break except OSError as indexBackupOpenError: if indexBackupOpenError.errno != errno.EEXIST: raise targetFileTime += 1 # Now both handles are valid, use external bzip2 binary to perform # compression. process = subprocess.Popen( ['/bin/bzip2', '-c9'], stdin=currentIndexFd, stdout=targetFileHandle) returnCode = process.wait() if returnCode != 0: raise Exception('Failed to compress the old index: %s' % returnCode) os.close(currentIndexFd) # FIXME: we should use utime with targetFileHandle as pathlike # object, only available in Python3.6 and later. os.utime( '/proc/self/fd/%d' % targetFileHandle, (statData.st_mtime, statData.st_mtime)) os.close(targetFileHandle) os.unlink(currentIndexName, dir_fd=persistencyDirFd) # Now previous index was compressed or deleted, link the next # index to the current position. os.link( nextIndexFileName, currentIndexName, src_dir_fd=persistencyDirFd, dst_dir_fd=persistencyDirFd, follow_symlinks=False) os.unlink(nextIndexFileName, dir_fd=persistencyDirFd) if tarUnitDescription.keepOldIndicesCount != -1: # So we should apply limits to the number of index backups. fileList = [] searchPrefix = '%s.index.bz2.' 
% indexFilenamePrefix searchLength = len(searchPrefix)+14 for fileName in guerillabackup.listDirAt(persistencyDirFd): if ((len(fileName) != searchLength) or (not fileName.startswith(searchPrefix))): continue fileList.append(fileName) fileList.sort() if len(fileList) > tarUnitDescription.keepOldIndicesCount: # Make sure that the new index file was sorted last. When not, # the current state could indicate clock/time problems on the # machine. Refuse to process the indices and issue a warning. indexBackupPos = fileList.index(targetFileName) if indexBackupPos+1 != len(fileList): raise Exception('Sorting of old backup indices inconsistent, refusing cleanup') for fileName in fileList[:-tarUnitDescription.keepOldIndicesCount]: os.unlink(fileName, dir_fd=persistencyDirFd) # Update the UUID map as last step: if any of the steps above # would fail, currentUuid generated in next run will be identical # to this. Sorting out the duplicates will be easy. tarUnitDescription.lastUuidValue = currentUuid # Update the timestamp. tarUnitDescription.lastAnyBackupTime = currentTime if backupType == 'full': tarUnitDescription.lastFullBackupTime = currentTime # Write the new persistency data before returning. self.updateStateData(persistencyDirFd) def invokeUnit(self, sink): """Invoke this unit to create backup elements and pass them on to the sink. Even when indicated via getNextInvocationTime, the unit may decide, that it is not yet ready and not write any element to the sink. @return None if currently there is nothing to write to the sink, a number of seconds to retry invocation if the unit assumes, that there is data to be processed but processing cannot start yet, e.g. due to locks held by other parties or resource, e.g. network storages, currently not available. @throw Exception if the unit internal logic failed in any uncorrectable ways. Even when invoker decides to continue processing, it must not reinvoke this unit before complete reload.""" persistencyDirFd = None try: while True: nextUnitInfo = self.findNextInvocationUnit() if nextUnitInfo is None: return None if nextUnitInfo[0] > 0: return nextUnitInfo[0] if persistencyDirFd is None: # Not opened yet, do it now. persistencyDirFd = guerillabackup.openPersistencyFile( self.configContext, os.path.join('generators', self.unitName), os.O_DIRECTORY|os.O_RDONLY|os.O_CREAT|os.O_EXCL|os.O_NOFOLLOW|os.O_NOCTTY, 0o600) # Try to process the current tar backup unit. There should be # no state change to persist or cleanup, just let any exception # be passed on to caller. try: self.processInput(nextUnitInfo[1], sink, persistencyDirFd) except Exception as processException: print('%s: Error processing tar %s, disabling it temporarily\n%s' % ( self.unitName, repr(nextUnitInfo[1].sourceUrl), processException), file=sys.stderr) traceback.print_tb(sys.exc_info()[2]) nextUnitInfo[1].nextRetryTime = time.time()+3600 finally: if persistencyDirFd != None: try: os.close(persistencyDirFd) persistencyDirFd = None except Exception as closeException: print('FATAL: Internal Error: failed to close persistency ' \ 'directory handle %d: %s' % ( persistencyDirFd, str(closeException)), file=sys.stderr) def updateStateData(self, persistencyDirFd): """Replace the current state data file with one containing the current unit internal state. @throw Exception is writing fails for any reason. The unit will be in incorrectable state afterwards.""" # Create the data structures for writing. 
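# Illustrative sketch (assumption, summarising the index cleanup logic above,
# not part of the original module): old compressed indices named
# "<prefix>.index.bz2.YYYYMMDDHHMMSS" sort chronologically by name, so pruning
# keeps the newest keepCount entries and refuses to delete anything when the
# just-created index backup does not sort last, which would hint at clock
# problems on the machine. Assumes keepCount > 0.
def _examplePruneOldIndexBackups(fileNames, newestIndexBackupName, keepCount):
  """Return the index backup file names that may be removed safely."""
  fileList = sorted(fileNames)
  if len(fileList) <= keepCount:
    return []
  if fileList.index(newestIndexBackupName) + 1 != len(fileList):
    raise Exception('Sorting of old backup indices inconsistent, refusing cleanup')
  return fileList[:-keepCount]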
stateData = {} for sourceUrl, description in self.backupUnitDescriptions.items(): stateData[sourceUrl] = description.getJsonData() writeData = bytes(json.dumps(stateData), 'ascii') # Try to replace the current state file. At first unlink the old # one. try: os.unlink('state.old', dir_fd=persistencyDirFd) except OSError as unlinkError: if unlinkError.errno != errno.ENOENT: raise # Link the current to the old one. try: os.link( 'state.current', 'state.old', src_dir_fd=persistencyDirFd, dst_dir_fd=persistencyDirFd, follow_symlinks=False) except OSError as relinkError: if relinkError.errno != errno.ENOENT: raise # Unlink the current state. Thus we can then use O_EXCL on create. try: os.unlink('state.current', dir_fd=persistencyDirFd) except OSError as relinkError: if relinkError.errno != errno.ENOENT: raise # Create the new file. fileHandle = None try: fileHandle = guerillabackup.secureOpenAt( persistencyDirFd, 'state.current', fileOpenFlags=os.O_WRONLY|os.O_CREAT|os.O_EXCL|os.O_NOFOLLOW|os.O_NOCTTY, fileCreateMode=0o600) os.write(fileHandle, writeData) # Also close handle within try, except block to catch also delayed # errors after write. os.close(fileHandle) fileHandle = None except Exception as stateSaveException: # Writing of state information failed. Print out the state information # for manual reconstruction as last resort. print('Writing of state information failed: %s\nCurrent state: ' \ '%s' % (str(stateSaveException), repr(writeData)), file=sys.stderr) traceback.print_tb(sys.exc_info()[2]) raise finally: if fileHandle != None: os.close(fileHandle) # Declare the main unit class so that the backup generator can # instantiate it. backupGeneratorUnitClass = TarBackupUnit guerillabackup-0.5.0/src/lib/guerillabackup/Transfer.py000066400000000000000000001555771450137035300232200ustar00rootroot00000000000000"""This module contains a collection of interfaces and classes for agent-based transfer and synchronization.""" import errno import json import os import select import socket import struct import sys import threading import time import guerillabackup from guerillabackup.BackupElementMetainfo import BackupElementMetainfo class TransferContext(): """This class stores all information about a remote TransferAgent while it is attached to this TransferAgent. It is the responsibility of the class creating a new context to authenticate the remote side and to assign the correct agent id if needed. 
@param localStorage local storage to used by policies for data retrieval and storage.""" def __init__( self, agentId, receiverTransferPolicy, senderTransferPolicy, localStorage): self.agentId = agentId self.receiverTransferPolicy = receiverTransferPolicy self.senderTransferPolicy = senderTransferPolicy self.localStorage = localStorage self.clientProtocolAdapter = None self.serverProtocolAdapter = None self.shutdownOfferedFlag = False self.shutdownAcceptedFlag = False def connect(self, clientProtocolAdapter, serverProtocolAdapter): """Connect this context with the client and server adapters.""" self.clientProtocolAdapter = clientProtocolAdapter self.serverProtocolAdapter = serverProtocolAdapter def offerShutdown(self): """Offer protocol shutdown to the other side.""" if self.clientProtocolAdapter is None: raise Exception('Cannot offer shutdown while not connected') if self.shutdownOfferedFlag: raise Exception('Shutdown already offered') self.clientProtocolAdapter.offerShutdown() self.shutdownOfferedFlag = True def waitForShutdown(self): """Wait for the remote side to offer a shutdown and accept it.""" if self.shutdownAcceptedFlag: return self.clientProtocolAdapter.waitForShutdown() self.shutdownAcceptedFlag = True def isShutdownAccepted(self): """Check if we already accepted a remote shutdown offer.""" return self.shutdownAcceptedFlag class ProtocolDataElementStream: """This is the interface of any client protocol stream to a remote data element.""" def read(self, readLength=0): """Read data from the current data element. @param readLength if not zero, return a chunk of data with at most the given length. When zero, return chunks with the default size of the underlying IO layer. @return the amount of data read or an empty string when end of stream was reached.""" raise Exception('Interface method called') def close(self): """Close the stream. This call might discard data already buffered within this object or the underlying IO layer. This method has to be invoked also when the end of the stream was reached.""" raise Exception('Interface method called') class ClientProtocolInterface: """This is the client side protocol adapter to initiate retrieval of remote data from a remote SenderTransferPolicy. Each method of the interface but also the returned FileInfo objects may raise an IOError('Connection closed') to indicate connection failures.""" def getRemotePolicyInfo(self): """Get information about the remote policy. The local agent may then check if remote SenderTransferPolicy is compatible to local settings and ReceiverTransferPolicy. @return information about the remote sender policy or None when no sender policy is installed, thus requesting remote files is impossible.""" raise Exception('Interface method called') def startTransaction(self, queryData): """Start or restart a query transaction to retrive files from beginning on, even when skipped in previous round. The query pointer is positioned before the first FileInfo to retrieve. @param query data to send as query to remote side. @throws exception if transation start is not possible or query data was not understood by remote side.""" raise Exception('Interface method called') def nextDataElement(self, wasStoredFlag=False): """Move to the next FileInfo. @param wasStoredFlag if true, indicate that the previous file was stored successfully on local side. 
@return True if a next FileInfo is available, False otherwise.""" raise Exception('Interface method called') def getDataElementInfo(self): """Get information about the currently selected data element. The method can be invoked more than once on the same data element. Extraction of associated stream data is only be possible until proceeding to the next FileInfo using nextDataElement(). @return a tuple with the source URL, metadata and the attribute dictionary visible to the client or None when no element is currently selected.""" raise Exception('Interface method called') def getDataElementStream(self): """Get a stream to read from the remote data element. While stream is open, no other client protocol methods can be called. @throws Exception if no transaction is open or no current data element selected for transfer. @return an instance of ProtocolDataElementStream for reading.""" raise Exception('Interface method called') def getFileInfos(self, count): """Get the next file infos from the remote side. This method is equivalent to calling nextDataElement() and getDataElementInfo() count times. This will also finish any currently open FileInfo indicating no sucessful storage to remote side.""" raise Exception('Interface method called') def offerShutdown(self): """Offer remote side to shutdown the connection. The other side has to confirm the offer to allow real shutdown.""" raise Exception('Interface method called') def waitForShutdown(self): """Wait for the remote side to offer a shutdown. As we cannot force remote side to offer shutdown using, this method may block.""" raise Exception('Interface method called') def forceShutdown(self): """Force an immediate shutdown without prior anouncement just by terminating all connections and releasing all resources.""" raise Exception('Interface method called') class ServerProtocolInterface: """This is the server side protocol adapter to be provided to the transfer service to forward remote requests to the local SenderPolicy. Methods are named identically but have different service contract as in ClientProtocolInterface.""" def getPolicyInfo(self): """Get information about the remote SenderTransferPolicy. The local agent may then check if the policy is compatible to local settings and ReceiverTransferPolicy. @return information about the installed SenderTransferPolicy or None without a policy.""" raise Exception('Interface method called') def startTransaction(self, queryData): """Start or restart a query transaction to retrive files from beginning on, even when skipped in previous round. The query pointer is positioned before the first FileInfo to retrieve. @param query data received from remote side. @throws Exception if transation start is not possible or query data was not understood.""" raise Exception('Interface method called') def nextDataElement(self, wasStoredFlag=False): """Move to the next FileInfo. @param wasStoredFlag if true, indicate that the previous file was stored successfully on local side. @return True if a next FileInfo is available, False otherwise.""" raise Exception('Interface method called') def getDataElementInfo(self): """Get information about the currently selected data element. The method can be invoked more than once on the same data element. Extraction of associated stream data is only be possible until proceeding to the next FileInfo using nextDataElement(). 
@return a tuple with the source URL, metadata and the attribute dictionary visible to the client or None when no element is currently selected.""" raise Exception('Interface method called') def getDataElementStream(self): """Get a stream to read the currently selected data element. While stream is open, no other protocol methods can be called. @throws Exception if no transaction is open or no current data element selected for transfer.""" raise Exception('Interface method called') class SenderTransferPolicy(): """This is the common superinterface of all sender side transfer policies. A policy implementation has perform internal consistency checks after data modification as needed, the applyPolicy call is only to notify policy about state changes due to transfers.""" def getPolicyInfo(self): """Get information about the sender policy.""" raise Exception('Interface method called') def queryBackupDataElements(self, transferContext, queryData): """Query the local sender transfer policy to return a query result with elements to be transfered to the remote side. @param queryData when None, return all elements to be transfered. Otherwise apply the policy specific query data to limit the number of elements. @return BackupDataElementQueryResult""" raise Exception('Interface method called') def applyPolicy(self, transferContext, backupDataElement, wasStoredFlag): """Apply this policy to adopt to changes due to access to the a backup data element within a storage context. @param backupDataElement a backup data element instance of StorageBackupDataElementInterface returned by the queryBackupDataElements method of this policy. @param wasStoredFlag flag indicating if the remote side also fetched the data of this object.""" raise Exception('Interface method called') class ReceiverTransferPolicy: """This is the common superinterface of all receiver transfer policies.""" def isSenderPolicyCompatible(self, policyInfo): """Check if a remote sender policy is compatible to this receiver policy. @return True when compatible.""" raise Exception('Interface method called') def applyPolicy(self, transferContext): """Apply this policy for the given transfer context. This method should be invoked only after checking that policies are compliant. The policy will then use access to remote side within transferContext to fetch data elements and modify local and possibly also remote storage.""" raise Exception('Interface method called') class SenderMoveDataTransferPolicy(SenderTransferPolicy): """This is a simple sender transfer policy just advertising all resources for transfer and removing them or marking them as transfered as soon as remote side confirms sucessful transfer. A file with a mark will not be offered for download any more.""" def __init__(self, configContext, markAsDoneOnlyFlag=False): self.configContext = configContext self.markAsDoneOnlyFlag = markAsDoneOnlyFlag if self.markAsDoneOnlyFlag: raise Exception('FIXME: no persistency support for marking yet') def getPolicyInfo(self): """Get information about the sender policy.""" return 'SenderMoveDataTransferPolicy' def queryBackupDataElements(self, transferContext, queryData): """Query the local sender transfer policy to return a query result with elements to be transfered to the remote side. @param queryData when None, return all elements to be transfered. Otherwise apply the policy specific query data to limit the number of elements. 
@return BackupDataElementQueryResult""" query = None if queryData != None: if not isinstance(queryData, list): raise Exception('Unsupported query data') queryType = queryData[0] if queryType == 'SourceUrl': raise Exception('Not yet') else: raise Exception('Unsupported query type') return transferContext.localStorage.queryBackupDataElements(query) def applyPolicy(self, transferContext, backupDataElement, wasStoredFlag): """Apply this policy to adopt to changes due to access to the a backup data element within a storage context. @param backupDataElement a backup data element instance of StorageBackupDataElementInterface returned by the queryBackupDataElements method of this policy. @param wasStoredFlag flag indicating if the remote side also fetched the data of this object.""" # When other side did not confirm receiving the data, keep this # element active. if not wasStoredFlag: return if self.markAsDoneOnlyFlag: raise Exception('FIXME: no persistency support for marking yet') # Remove the element from the storage. backupDataElement.delete() class ReceiverStoreDataTransferPolicy(ReceiverTransferPolicy): """This class defines a receiver policy, that attempts to fetch all data elements offered by the remote transfer agent.""" def __init__(self, configContext): """Create this policy using the given configuration.""" self.configContext = configContext def isSenderPolicyCompatible(self, policyInfo): """Check if a remote sender policy is compatible to this receiver policy. @return True when compatible.""" return policyInfo == 'SenderMoveDataTransferPolicy' def applyPolicy(self, transferContext): """Apply this policy for the given transfer context. This method should be invoked only after checking that policies are compliant. The policy will then use access to remote side within transferContext to fetch data elements and modify local and possibly also remote storage.""" # Just start a transaction and try to verify, that each remote # element is also present in local storage. transferContext.clientProtocolAdapter.startTransaction(None) while transferContext.clientProtocolAdapter.nextDataElement(True): (sourceUrl, metaInfo, attributeDict) = \ transferContext.clientProtocolAdapter.getDataElementInfo() # Now we know about the remote object. See if it is already available # within local storage. localElement = transferContext.localStorage.getBackupDataElementForMetaData( sourceUrl, metaInfo) if localElement != None: # Element is already available, not attempting to copy. continue # Create the sink to store the element. sinkHandle = transferContext.localStorage.getSinkHandle(sourceUrl) dataStream = transferContext.clientProtocolAdapter.getDataElementStream() while True: streamData = dataStream.read() if streamData == b'': break sinkHandle.write(streamData) sinkHandle.close(metaInfo) class TransferAgent(): """The TransferAgent keeps track of all currently open transfer contexts and orchestrates transfer.""" def addConnection(self, transferContext): """Add a connection to the local agent.""" raise Exception('Interface method called') def shutdown(self, forceShutdownTime=-1): """Trigger shutdown of this TransferAgent and all open connections established by it. The method call shall return as fast as possible as it might be invoked via signal handlers, that should not be blocked. If shutdown requires activities with uncertain duration, e.g. remote service acknowledging clean shutdown, those tasks shall be performed in another thread, e.g. the main thread handling the connections. 
@param forceShutdowTime when 0 this method will immediately end all service activity just undoing obvious intermediate state, e.g. deleting temporary files, but will not notify remote side for a clean shutdown or wait for current processes to complete. A value greater zero indicates the intent to terminate within that given amount of time.""" raise Exception('Interface method called') class SimpleTransferAgent(TransferAgent): """This is a minimalistic implementation of a transfer agent. It is capable of single-threaded transfers only, thus only a single connection can be attached to this agent.""" def __init__(self): """Create the local agent.""" self.singletonContext = None self.lock = threading.Lock() def addConnection(self, transferContext): """Add a connection to the local transfer agent. As this agent is only single threaded, the method will return only after this connection was closed already.""" with self.lock: if self.singletonContext is not None: raise Exception( '%s cannot handle multiple connections in parallel' % ( self.__class__.__name__)) self.singletonContext = transferContext try: if not self.ensurePolicyCompliance(transferContext): print('Incompatible policies detected, shutting down', file=sys.stderr) elif transferContext.receiverTransferPolicy != None: # So remote sender policy is compliant to local storage policy. # Recheck local policy until we are done. transferContext.receiverTransferPolicy.applyPolicy(transferContext) # Indicate local shutdown to other side. transferContext.offerShutdown() # Await remote shutdown confirmation. transferContext.waitForShutdown() except OSError as communicationError: if communicationError.errno == errno.ECONNRESET: print('%s' % communicationError.args[1], file=sys.stderr) else: raise finally: transferContext.clientProtocolAdapter.forceShutdown() with self.lock: self.singletonContext = None def ensurePolicyCompliance(self, transferContext): """Check that remote sending policy is compliant to local receiver policy or both policies are not set.""" policyInfo = transferContext.clientProtocolAdapter.getRemotePolicyInfo() if transferContext.receiverTransferPolicy is None: # We are not expecting to receive anything, so remote policy can # never match local one. return policyInfo is None return transferContext.receiverTransferPolicy.isSenderPolicyCompatible( policyInfo) def shutdown(self, forceShutdownTime=-1): """Trigger shutdown of this TransferAgent and all open connections established by it. @param forceShutdowTime when 0 this method will immediately end all service activity just undoing obvious intermediate state, e.g. deleting temporary files, but will not notify remote side for a clean shutdown or wait for current processes to complete. A value greater zero indicates the intent to terminate within that given amount of time.""" transferContext = None with self.lock: if self.singletonContext is None: return transferContext = self.singletonContext if forceShutdownTime == 0: # Immedate shutdown was requested. transferContext.clientProtocolAdapter.forceShutdown() else: transferContext.offerShutdown() class DefaultTransferAgentServerProtocolAdapter(ServerProtocolInterface): """This class provides a default protocol adapter only relaying requests to the sender policy within the given transfer context.""" def __init__(self, transferContext, remoteStorageNotificationFunction=None): """Create the default adapter. This adapter just publishes all DataElements from local storage but does not provide any support for attributes or element deletion. 
@param remoteStorageNotificationFunction when not None, this function is invoked with context, FileInfo and wasStoredFlag before moving to the next resource.""" self.transferContext = transferContext self.remoteStorageNotificationFunction = remoteStorageNotificationFunction self.transactionIterator = None self.currentDataElement = None def getPolicyInfo(self): """Get information about the remote SenderTransferPolicy. The local agent may then check if the policy is compatible to local settings and ReceiverTransferPolicy. @return information about the installed SenderTransferPolicy or None without a policy.""" if self.transferContext.senderTransferPolicy is None: return None return self.transferContext.senderTransferPolicy.getPolicyInfo() def startTransaction(self, queryData): """Start or restart a query transaction to retrive files from beginning on, even when skipped in previous round. The query pointer is positioned before the first FileInfo to retrieve. @param query data received from remote side. @throws exception if transation start is not possible or query data was not understood.""" if self.currentDataElement != None: self.currentDataElement.invalidate() self.currentDataElement = None self.transactionIterator = \ self.transferContext.senderTransferPolicy.queryBackupDataElements( self.transferContext, queryData) def nextDataElement(self, wasStoredFlag=False): """Move to the next FileInfo. @param wasStoredFlag if true, indicate that the previous file was stored successfully on local side. @return True if a next FileInfo is available, False otherwise.""" if self.currentDataElement != None: self.transferContext.senderTransferPolicy.applyPolicy( self.transferContext, self.currentDataElement, wasStoredFlag) if self.remoteStorageNotificationFunction != None: self.remoteStorageNotificationFunction( self.transferContext, self.currentDataElement, wasStoredFlag) self.currentDataElement.invalidate() self.currentDataElement = None if self.transactionIterator is None: return False dataElement = self.transactionIterator.getNextElement() if dataElement is None: self.transactionIterator = None return False self.currentDataElement = WrappedStorageBackupDataElementFileInfo( dataElement) return True def getDataElementInfo(self): """Get information about the currently selected data element. The method can be invoked more than once on the same data element. Extraction of associated stream data is only be possible until proceeding to the next FileInfo using nextDataElement(). @return a tuple with the source URL, metadata and the attribute dictionary visible to the client.""" if self.currentDataElement is None: return None return (self.currentDataElement.getSourceUrl(), self.currentDataElement.getMetaInfo(), None) def getDataElementStream(self): """Get a stream to read the currently selected data element. While stream is open, no other protocol methods can be called. @throws Exception if no transaction is open or no current data element selected for transfer.""" if self.currentDataElement is None: raise Exception('No data element selected') return self.currentDataElement.getDataStream() class WrappedStorageBackupDataElementFileInfo(): """This is a simple wrapper over a a StorageBackupDataElement to retrieve requested data directly from storage. 
It does not support any attributes as those are usually policy specific.""" def __init__(self, dataElement): if not isinstance(dataElement, guerillabackup.StorageBackupDataElementInterface): raise Exception('Cannot wrap object not implementing StorageBackupDataElementInterface') self.dataElement = dataElement def getSourceUrl(self): """Get the source URL of this file object.""" return self.dataElement.getSourceUrl() def getMetaInfo(self): """Get only the metadata part of this element. @return a BackupElementMetainfo object""" return self.dataElement.getMetaData() def getAttributes(self): """Get the additional attributes of this file info object. Currently attributes are not supported for wrapped objects.""" return None def setAttribute(self, name, value): """Set an attribute for this file info.""" raise Exception('Not supported') def getDataStream(self): """Get a stream to read data from that element. @return a file descriptor for reading this stream.""" return self.dataElement.getDataStream() def delete(self): """Delete the backup data element behind this object.""" self.dataElement.delete() self.dataElement = None def invalidate(self): """Make sure all resources associated with this element are released. @throws Exception if element is currently in use, e.g. read.""" # Nothing to invalidate here when using primitive, uncached wrapping. pass class MultipartResponseIterator(): """This is the interface for all response iterators. Those should be used, where the response is too large for a single response data block or where means for interruption of an ongoing large response are needed.""" def getNextPart(self): """Get the next part from this iterator. After detecting that no more parts are available or calling release(), the caller must not attempt to invoke the method again. @return the part data or None when no more parts are available.""" raise Exception('Interface method called') def release(self): """This method releases all resources associated with this iterator if the iterator end was not yet reached in getNextPart(). All future calls to getNextPart() or release() will cause exceptions.""" raise Exception('Interface method called') class StreamRequestResponseMultiplexer(): """This class allows to send requests and reponses via a single bidirectional in any connection. The class is not thread-safe and hence does not support concurrent method calls. The transfer protocol consists of a few packet types: * 'A' for aborting a running streaming request. * 'S' for sending a request to the remote side server interface * 'P' for stream response parts before receiving a final 'R' packet. Thus a 'P' packet identifies a stream response. Therefore for zero byte stream responses, there has to be at least a single 'P' package of zero zero size. The final 'R' packet at the end of the stream has to be always of zero size. * 'R' for the remote response packet containing the response data. Any exception due to protocol violations, multiplex connection IO errors or requestHandler processing failures will cause the multiplexer to shutdown all functionality immediately for security reasons.""" def __init__(self, inputFd, outputFd, requestHandler): """Create this multiplexer based on the given input and output file descriptors. @param requestHandler a request handler to process the incoming requests.""" self.inputFd = inputFd self.outputFd = outputFd self.requestHandler = requestHandler # This flag stays true until the response for the last request # was received. 
self.responsePendingFlag = False # When true, this multiplexer is currently receiving a stream # response from the remote side. self.inStreamResponseFlag = False # This iterator will be none when an incoming request is returning # a group of packets as response. self.responsePartIterator = None self.shutdownOfferSentFlag = False self.shutdownOfferReceivedFlag = False # Input buffer of data from remote side. self.remoteData = b'' self.remoteDataLength = -1 def sendRequest(self, requestData): """Send request data to the remote side and await the result. The method will block until the remote data was received and will process incoming requests while waiting. The method relies on the caller to have fetched all continuation response parts for the previous requests using handleRequests, before submitting a new request. @rais Exception if multiplexer is in invalid state. @return response data when a response was pending and data was received within time. The data is is a tuple containing the binary content and a boolean value indicating if the received data is a complete response or belonging to a stream.""" if (requestData is None) or (len(requestData) == 0): raise Exception('No request data given') if self.responsePendingFlag: raise Exception('Cannot queue another request while response is pending') if self.shutdownOfferSentFlag: raise Exception('Shutdown already offered') return self.internalIOHandler(requestData, 1000) def handleRequests(self, selectTime): """Handle incoming requests by waiting for incoming data for the given amount of time. @return None when only request handling was performed, the no-response continuation data for the last request if received.""" if ((self.shutdownOfferSentFlag) and (self.shutdownOfferReceivedFlag) and not self.responsePendingFlag): raise Exception('Cannot handle requests after shutdown') return self.internalIOHandler(None, selectTime) def offerShutdown(self): """Offer the shutdown to the remote side. This method has the same limitations and requirements as the sendRequest method. It does not return any data.""" if self.shutdownOfferSentFlag: raise Exception('Already offered') self.shutdownOfferSentFlag = True result = self.internalIOHandler(b'', 600) if result[1]: raise Exception('Received unexpected stream response') if len(result[0]) != 0: raise Exception('Unexpected response data on shutdown offer request') def wasShutdownOffered(self): """Check if shutdown was already offered by this side.""" return self.shutdownOfferSentFlag def wasShutdownRequested(self): """Check if remote side has alredy requested shutdown.""" return self.shutdownOfferReceivedFlag def close(self): """Close this multiplexer by closing also all underlying streams.""" if self.inputFd == -1: raise Exception('Already closed') # We were currently transfering stream parts when being shutdown. # Make sure to release the iterator to avoid resource leaks. if self.responsePartIterator != None: self.responsePartIterator.release() self.responsePartIterator = None pendingException = None try: os.close(self.inputFd) except Exception as closeException: # Without any program logic errors, exceptions here are rare # and problematic, therefore report them immediately. 
print( 'Closing of input stream failed: %s' % str(closeException), file=sys.stderr) pendingException = closeException if self.outputFd != self.inputFd: try: os.close(self.outputFd) except Exception as closeException: print( 'Closing of output stream failed: %s' % str(closeException), file=sys.stderr) pendingException = closeException self.inputFd = -1 self.outputFd = -1 self.shutdownOfferSentFlag = True self.shutdownOfferReceivedFlag = True if pendingException != None: raise pendingException def internalIOHandler(self, requestData, maxHandleTime): """Perform multiplexer IO operations. When requestData was given, it will be sent before waiting for a response and probably handling incoming requests. @param maxHandleTime the maximum time to stay inside this function waiting for input data. When 0, attempt to handle only data without blocking. Writing is not affected by this parameter and will be attempted until successful or fatal error was detected. @return response data when a response was pending and data was received within time. The data is is a tuple containing the binary content and a boolean value indicating if the received data is a complete response or belonging to a stream. @throws OSError when lowlevel IO communication with the remote peer failed or was ended prematurely. @throws Exception when protocol violations were detected.""" sendQueue = [] writeSelectFds = [] if requestData is None: if self.shutdownOfferReceivedFlag and not self.inStreamResponseFlag: raise Exception( 'Request handling attempted after remote shutdown was offered') else: sendQueue.append([b'S'+struct.pack(' (1<<20)): raise Exception('Invalid input data chunk length 0x%x' % self.remoteDataLength) self.remoteDataLength += 5 if self.remoteDataLength != 5: # We read exactly 5 bytes but need more. Let the outer loop do that. return (True, True, None) if len(self.remoteData) < self.remoteDataLength: readData = os.read( self.inputFd, self.remoteDataLength-len(self.remoteData)) if len(readData) == 0: if self.responsePendingFlag: raise Exception('End of data while awaiting response') raise Exception('End of data with partial input') self.remoteData += readData if len(self.remoteData) != self.remoteDataLength: return (True, True, None) # Check the code. if self.remoteData[0] not in b'PRS': raise Exception('Invalid packet type 0x%x' % self.remoteData[0]) if not self.remoteData.startswith(b'S'): # Not sending a request but receiving some kind of response. if not self.responsePendingFlag: raise Exception( 'Received %s packet while not awaiting response' % repr(self.remoteData[0:1])) if self.remoteData.startswith(b'P'): # So this is the first or an additional fragment for a previous response. self.inStreamResponseFlag = True else: self.responsePendingFlag = False self.inStreamResponseFlag = False responseData = (self.remoteData[5:], self.inStreamResponseFlag) self.remoteData = b'' self.remoteDataLength = -1 # We could receive multiple response packets while sendqueue was # not yet flushed containing outgoing responses to requests processed. # Although all those responses would belong to the same request, # fusion is not an option to avoid memory exhaustion. So we have # to delay returning of response data until sendqueue is emptied. return (len(sendQueue) != 0, True, responseData) # So this was another incoming request. Handle it and queue the # response data. 
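# Illustrative sketch (assumption, not part of the original module): each
# multiplexer frame starts with a one-byte packet type ('A', 'S', 'P' or 'R')
# followed by a 4-byte payload length, as implied by the 5-byte header
# handling above and the literal b'R\x00\x00\x00\x00' empty-response frame.
# The exact struct format string was lost in this dump; little-endian '<I'
# is assumed here.
def _examplePackFrame(packetType, payload=b''):
  """Build one frame for the request/response multiplexer stream."""
  import struct
  if packetType not in (b'A', b'S', b'P', b'R'):
    raise Exception('Invalid packet type %s' % repr(packetType))
  if len(payload) > (1 << 20):
    raise Exception('Payload too large')
  return packetType + struct.pack('<I', len(payload)) + payload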
if self.responsePartIterator != None: raise Exception( 'Cannot handle additional request while previous one not done') if self.shutdownOfferReceivedFlag: raise Exception('Received request even after shutdown was offered') if self.remoteDataLength == 5: # This is a remote shutdown offer, the last request from the remote # side. Prepare for shutdown. self.shutdownOfferReceivedFlag = True sendQueue.append([b'R\x00\x00\x00\x00', 0]) if not self.responsePendingFlag: del readSelectFds[0] else: handlerResponse = self.requestHandler.handleRequest( self.remoteData[5:]) handlerResponseType = b'R' if isinstance(handlerResponse, MultipartResponseIterator): self.responsePartIterator = handlerResponse handlerResponse = handlerResponse.getNextPart() handlerResponseType = b'P' # Empty files might not even return a single part. But stream # responses have to contain at least one 'P' type packet, so send # an empty one. if handlerResponse is None: handlerResponse = b'' sendQueue.append([handlerResponseType+struct.pack(' readLength): self.dataBuffer = result[readLength:] result = result[:readLength] if len(result) == 0: self.endOfStreamFlag = True return result def close(self): """Close the stream. This call might discard data already buffered within this object or the underlying IO layer. This method has to be invoked also when the end of the stream was reached.""" if self.endOfStreamFlag: return self.clientProtocolAdapter.__internalStreamAbort() self.endOfStreamFlag = True class JsonStreamClientProtocolAdapter(ClientProtocolInterface): """This class forwards client protocol requests via stream file descriptors to the remote side.""" def __init__(self, streamMultiplexer): """Create this adapter based on the given input and output file descriptors.""" self.streamMultiplexer = streamMultiplexer self.inStreamReadingFlag = False def getRemotePolicyInfo(self): """Get information about the remote policy. The local agent may then check if remote SenderTransferPolicy is compatible to local settings and ReceiverTransferPolicy. @return information about the remote sender policy or None when no sender policy is installed, thus requesting remote files is impossible.""" if self.inStreamReadingFlag: raise Exception('Illegal state') (responseData, inStreamReadingFlag) = self.streamMultiplexer.sendRequest( bytes(json.dumps(['getPolicyInfo']), 'ascii')) if inStreamReadingFlag: raise Exception('Protocol error') return json.loads(str(responseData, 'ascii')) def startTransaction(self, queryData): """Start or restart a query transaction to retrive files from beginning on, even when skipped in previous round. The query pointer is positioned before the first FileInfo to retrieve. @param query data to send as query to remote side. @throws exception if transation start is not possible or query data was not understood by remote side.""" if self.inStreamReadingFlag: raise Exception('Illegal state') (responseData, inStreamReadingFlag) = self.streamMultiplexer.sendRequest( bytes(json.dumps(['startTransaction', queryData]), 'ascii')) if inStreamReadingFlag: raise Exception('Protocol error') if len(responseData) != 0: raise Exception('Unexpected response received') def nextDataElement(self, wasStoredFlag=False): """Move to the next FileInfo. @param wasStoredFlag if true, indicate that the previous file was stored successfully on local side. 
@return True if a next FileInfo is available, False otherwise.""" if self.inStreamReadingFlag: raise Exception('Illegal state') (responseData, inStreamReadingFlag) = self.streamMultiplexer.sendRequest( bytes(json.dumps(['nextDataElement', wasStoredFlag]), 'ascii')) if inStreamReadingFlag: raise Exception('Protocol error') return json.loads(str(responseData, 'ascii')) def getDataElementInfo(self): """Get information about the currently selected FileInfo. Extraction of information and stream may only be possible until proceeding to the next FileInfo using nextDataElement(). @return a tuple with the source URL, metadata and the attribute dictionary visible to the client.""" if self.inStreamReadingFlag: raise Exception('Illegal state') (responseData, inStreamReadingFlag) = self.streamMultiplexer.sendRequest( bytes(json.dumps(['getDataElementInfo']), 'ascii')) if inStreamReadingFlag: raise Exception('Protocol error') result = json.loads(str(responseData, 'ascii')) result[1] = BackupElementMetainfo.unserialize( bytes(result[1], 'ascii')) return result def getDataElementStream(self): """Get a stream to read from currently selected data element. While stream is open, no other protocol methods can be called. @throws Exception if no transaction is open or no current data element selected for transfer.""" self.inStreamReadingFlag = True return JsonClientProtocolDataElementStream(self) def getFileInfos(self, count): """Get the next file infos from the remote side. This method is equivalent to calling nextDataElement() and getDataElementInfo() count times. This will also finish any currently open FileInfo indicating no sucessful storage to remote side.""" raise Exception('Interface method called') def offerShutdown(self): """Offer remote side to shutdown the connection. The other side has to confirm the offer to allow real shutdown.""" if self.inStreamReadingFlag: raise Exception('Illegal state') self.streamMultiplexer.offerShutdown() def waitForShutdown(self): """Wait for the remote side to offer a shutdown. As we cannot force remote side to offer shutdown using the given multiplex mode, just wait until we receive the offer.""" while not self.streamMultiplexer.shutdownOfferReceivedFlag: responseData = self.streamMultiplexer.handleRequests(600) if responseData != None: raise Exception( 'Did not expect to receive data while waiting for shutdown') self.streamMultiplexer.close() self.streamMultiplexer = None def forceShutdown(self): """Force an immediate shutdown without prior anouncement just by terminating all connections and releasing all resources.""" if self.streamMultiplexer != None: self.streamMultiplexer.close() self.streamMultiplexer = None def internalReadDataStream(self, startStreamFlag=False): """Read a remote backup data element as stream.""" if not self.inStreamReadingFlag: raise Exception('Illegal state') responseData = None inStreamReadingFlag = None if startStreamFlag: (responseData, inStreamReadingFlag) = self.streamMultiplexer.sendRequest( bytes(json.dumps(['getDataElementStream']), 'ascii')) if not inStreamReadingFlag: raise Exception('Protocol error') else: (responseData, inStreamReadingFlag) = self.streamMultiplexer.handleRequests(1000) if inStreamReadingFlag != (len(responseData) != 0): raise Exception('Protocol error') if len(responseData) == 0: # So this was the final chunk, switch to normal non-stream mode again. 
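# Illustrative client-side usage sketch (assumption, error handling omitted,
# not part of the original module): the adapter above is driven by iterating
# the offered elements and streaming each one until an empty chunk signals the
# end of that element, the same pattern ReceiverStoreDataTransferPolicy uses.
def _exampleFetchAllElements(clientAdapter, chunkHandlerFunction):
  """Fetch all remote elements and pass their data chunks to a handler."""
  clientAdapter.startTransaction(None)
  while clientAdapter.nextDataElement(True):
    (sourceUrl, metaInfo, attributeDict) = clientAdapter.getDataElementInfo()
    dataStream = clientAdapter.getDataElementStream()
    while True:
      streamData = dataStream.read()
      if streamData == b'':
        break
      chunkHandlerFunction(sourceUrl, metaInfo, streamData)
    dataStream.close()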
self.inStreamReadingFlag = False return responseData def __internalStreamAbort(self): """Abort reading of the data stream currently open for reading.""" if not self.inStreamReadingFlag: raise Exception('Illegal state') # This is an exception to the normal client/server protocol: send # the abort request even while the current request is still being # processed. (responseData, inStreamReadingFlag) = self.streamMultiplexer.sendRequest( bytes(json.dumps(['abortDataElementStream']), 'ascii')) # Now continue reading until all buffers have drained and final # data chunk was removed. while len(self.internalReadDataStream()) != 0: pass # Now read the response to the abort command itself. (responseData, inStreamReadingFlag) = self.streamMultiplexer.sendRequest(None) if len(responseData) != 0: raise Exception('Unexpected response received') class WrappedFileStreamMultipartResponseIterator(MultipartResponseIterator): """This class wraps an OS stream to provide the data as response iterator.""" def __init__(self, streamFd, chunkSize=1<<16): self.streamFd = streamFd self.chunkSize = chunkSize def getNextPart(self): """Get the next part from this iterator. After detecting that no more parts are available or calling release(), the caller must not attempt to invoke the method again. @return the part data or None when no more parts are available.""" if self.streamFd < 0: raise Exception('Illegal state') readData = os.read(self.streamFd, self.chunkSize) if len(readData) == 0: # This is the end of the stream, we can release it. self.release() return None return readData def release(self): """This method releases all resources associated with this iterator if the iterator end was not yet reached in getNextPart(). All future calls to getNextPart() or release() will cause exceptions.""" if self.streamFd < 0: raise Exception('Illegal state') os.close(self.streamFd) self.streamFd = -1 class JsonStreamServerProtocolRequestHandler(): """This class handles incoming requests encoded in JSON and passes them on to a server protocol adapter.""" def __init__(self, serverProtocolAdapter): self.serverProtocolAdapter = serverProtocolAdapter def handleRequest(self, requestData): """Handle an incoming request. @return the serialized data.""" request = json.loads(str(requestData, 'ascii')) if not isinstance(request, list): raise Exception('Unexpected request data') requestMethod = request[0] responseData = None noResponseFlag = False if requestMethod == 'getPolicyInfo': responseData = self.serverProtocolAdapter.getPolicyInfo() elif requestMethod == 'startTransaction': self.serverProtocolAdapter.startTransaction(request[1]) noResponseFlag = True elif requestMethod == 'nextDataElement': responseData = self.serverProtocolAdapter.nextDataElement(request[1]) elif requestMethod == 'getDataElementInfo': elementInfo = self.serverProtocolAdapter.getDataElementInfo() # Meta information needs separate serialization, do it. 
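# Illustrative sketch (assumption, not part of the original module): every
# request on the wire is a JSON array with the method name first and the
# positional arguments after it, encoded as ASCII bytes, e.g.
# ['nextDataElement', true] or ['startTransaction', null], matching both the
# client adapter above and the dispatch code below.
def _exampleEncodeRequest(methodName, *args):
  """Encode one protocol request as JsonStreamClientProtocolAdapter sends it."""
  import json
  return bytes(json.dumps([methodName] + list(args)), 'ascii')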
if elementInfo != None: responseData = (elementInfo[0], str(elementInfo[1].serialize(), 'ascii'), elementInfo[2]) elif requestMethod == 'getDataElementStream': responseData = WrappedFileStreamMultipartResponseIterator( self.serverProtocolAdapter.getDataElementStream()) else: raise Exception('Invalid request %s' % repr(requestMethod)) if noResponseFlag: return b'' if isinstance(responseData, MultipartResponseIterator): return responseData return bytes(json.dumps(responseData), 'ascii') class SocketConnectorService(): """This class listens on a socket and creates a TransferContext for each incoming connection.""" def __init__( self, socketPath, receiverTransferPolicy, senderTransferPolicy, localStorage, transferAgent): """Create this service to listen on the given path. @param socketPath the path where to create the local UNIX socket. The service will not create the directory required to hold the socket but will unlink any preexisting socket file with that path. This operation is racy and might cause security issues when applied by privileged user on insecure directories.""" self.socketPath = socketPath self.receiverTransferPolicy = receiverTransferPolicy self.senderTransferPolicy = senderTransferPolicy self.localStorage = localStorage self.transferAgent = transferAgent # The local socket to accept incoming connections. The only place # to close the socket is in the shutdown method to avoid concurrent # close in main loop. self.socket = None self.isRunningFlag = False self.shutdownFlag = False self.socket = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) try: self.socket.bind(self.socketPath) except socket.error as bindError: if bindError.errno != errno.EADDRINUSE: raise # Try to unlink the old socket and retry creation. os.unlink(self.socketPath) self.socket.bind(self.socketPath) os.chmod(self.socketPath, 0x180) def run(self): """Run this connector service. The method will not return until shutdown is requested by another thread.""" # Keep a local copy of the server socket to avoid races with # asynchronous shutdown signals. serverSocket = self.socket if self.shutdownFlag or (serverSocket is None): raise Exception('Already shutdown') self.isRunningFlag = True serverSocket.listen(4) while not self.shutdownFlag: clientSocket = None remoteAddress = None try: (clientSocket, remoteAddress) = serverSocket.accept() except OSError as acceptError: if (acceptError.errno != errno.EINVAL) or (not self.shutdownFlag): print( 'Unexpected error %s accepting new connections' % acceptError, file=sys.stderr) continue transferContext = TransferContext( 'socket', self.receiverTransferPolicy, self.senderTransferPolicy, self.localStorage) serverProtocolAdapter = DefaultTransferAgentServerProtocolAdapter( transferContext) # Extract a duplicate of the socket file descriptor. This is needed # as the clientSocket might be garbage collected any time, thus # closing the file descriptors while still needed. clientSocketFd = os.dup(clientSocket.fileno()) streamMultiplexer = StreamRequestResponseMultiplexer( clientSocketFd, clientSocketFd, JsonStreamServerProtocolRequestHandler(serverProtocolAdapter)) # Do not wait for garbage collection, release the object immediately. clientSocket.close() transferContext.connect( JsonStreamClientProtocolAdapter(streamMultiplexer), serverProtocolAdapter) self.transferAgent.addConnection(transferContext) self.isRunningFlag = False def shutdown(self, forceShutdownTime=-1): """Shutdown this connector service and all open connections established by it. 
This method is usually called by a signal handler as it will take down the whole service including all open connections. The method might be invoked more than once to force immediate shutdown after a previous attempt with timeout did not complete yet. @param forceShutdowTime when 0 this method will immediately end all service activity just undoing obvious intermediate state, e.g. deleting temporary files, but will not notify remote side for a clean shutdown or wait for current processes to complete. A value greater zero indicates the intent to terminate within that given amount of time.""" # Close the socket. This will also interrupt any other thread # in run method if it was currently waiting for new connections. if not self.shutdownFlag: self.socket.shutdown(socket.SHUT_RDWR) os.unlink(self.socketPath) self.socketPath = None # Indicate main loop termination on next possible occasion. self.shutdownFlag = True # Shutdown all open connections. self.transferAgent.shutdown(forceShutdownTime) guerillabackup-0.5.0/src/lib/guerillabackup/TransformationProcessOutputStream.py000066400000000000000000000051141450137035300303530ustar00rootroot00000000000000"""This module provides streams for linking of pipeline elements.""" import os import select import guerillabackup class TransformationProcessOutputStream( guerillabackup.TransformationProcessOutputInterface): """This class implements a filedescriptor stream based transformation output. It can be used for both plain reading but also to pass the file descriptor to downstream processes directly.""" def __init__(self, streamFd): if not isinstance(streamFd, int): raise Exception('Not a valid stream file descriptor') self.streamFd = streamFd def getOutputStreamDescriptor(self): return self.streamFd def readData(self, length): """Read data from this stream without blocking. @return the at most length bytes of data, zero-length data if nothing available at the moment and None when end of input was reached.""" # Perform a select before reading so that we do not need to switch # the stream into non-blocking mode. readFds, writeFds, exFds = select.select([self.streamFd], [], [], 0) # Nothing available yet, do not attempt to read. if len(readFds) == 0: return b'' data = os.read(self.streamFd, length) # Reading will return zero-length data when end of stream was reached. # Return none in that case. if len(data) == 0: return None return data def close(self): """Close this interface. This will guarantee, that any future access will report EOF or an error. @raise Exception if close is attempted there still is data available.""" data = self.readData(64) os.close(self.streamFd) self.streamFd = -1 if data != None: if len(data) == 0: raise Exception('Closing output before EOF, data might be lost') else: raise Exception('Unhandled data in stream lost due to close before EOF') class NullProcessOutputStream( guerillabackup.TransformationProcessOutputInterface): """This class implements a transformation output delivering no output at all. It is useful to seal stdin of a toplevel OS process pipeline element to avoid reading from real stdin.""" def getOutputStreamDescriptor(self): return None def readData(self, length): """Read data from this stream without blocking. @return the at most length bytes of data, zero-length data if nothing available at the moment and None when end of input was reached.""" return None def close(self): """Close this interface. This will guarantee, that any future access will report EOF or an error. 
@raise Exception if close is attempted there still is data available.""" pass guerillabackup-0.5.0/src/lib/guerillabackup/UnitRunConditions.py000066400000000000000000000054731450137035300250570ustar00rootroot00000000000000"""This module provides various condition classes to determine if a backup unit can be run if the unit itself is ready to be run.""" import time import guerillabackup class IUnitRunCondition(): """This is the interface of all unit run condition classes.""" def evaluate(self): """Evaluate this condition. @return True if the condition is met, false otherwise.""" raise NotImplementedError() class AverageLoadLimitCondition(IUnitRunCondition): """This condition class allows to check the load stayed below a given limit for some time.""" def __init__(self, loadLimit, limitOkSeconds): """Create a new load limit condition. @param loadLimit the 1 minute CPU load limit to stay below. @param limitOkSeconds the number of seconds the machine 1 minute load value has to stay below the limit for this condition to be met.""" self.loadLimit = loadLimit self.limitOkSeconds = limitOkSeconds self.limitOkStartTime = None def evaluate(self): """Evaluate this condition. @return True when the condition is met.""" loadFile = open('/proc/loadavg', 'rb') loadData = loadFile.read() loadFile.close() load1Min = float(loadData.split(b' ')[0]) if load1Min >= self.loadLimit: self.limitOkStartTime = None return False currentTime = time.time() if self.limitOkStartTime is None: self.limitOkStartTime = currentTime return (self.limitOkStartTime + self.limitOkSeconds <= currentTime) class LogicalAndCondition(IUnitRunCondition): """This condition checks if all subconditions evaluate to true. Even when a condition fails to evaluate to true all other conditions are still checked to allow time sensitive conditions keep track of time elapsed.""" def __init__(self, conditionList): """Create a logical and condition. @param conditionList the list of conditions to be met.""" self.conditionList = conditionList def evaluate(self): """Evaluate this condition. @return True when the condition is met.""" result = True for condition in self.conditionList: if not condition.evaluate(): result = False return result class MinPowerOnTimeCondition(): """This class checks if the machine was in powered on long enough to be in stable state and ready for backup load.""" def __init__(self, minPowerOnSeconds): """Create a condition to check machine power up time. @param minPowerOnSeconds the minimum number of seconds the machine has to be powered on.""" self.minPowerOnSeconds = minPowerOnSeconds def evaluate(self): """Evaluate this condition. @return True when the condition is met.""" uptimeFile = open('/proc/uptime', 'rb') uptimeData = uptimeFile.read() uptimeFile.close() uptime = float(uptimeData.split(b' ')[0]) return (uptime >= self.minPowerOnSeconds) guerillabackup-0.5.0/src/lib/guerillabackup/Utils.py000066400000000000000000000035511450137035300225140ustar00rootroot00000000000000"""This module contains shared utility functions used e.g. by generator, transfer and validation services.""" import json def jsonLoadWithComments(fileName): """Load JSON data containing comments from a given file name.""" jsonFile = open(fileName, 'rb') jsonData = jsonFile.read() jsonFile.close() jsonData = b'\n'.join([ b'' if x.startswith(b'#') else x for x in jsonData.split(b'\n')]) return json.loads(str(jsonData, 'utf-8')) def parseTimeDef(timeStr): """Parse a time definition string returning the seconds it encodes. 
@param timeStr a human readable string defining a time interval. It may contain one or more pairs of numeric values and unit appended to each other without spaces, e.g. "6d20H". Valid units are "m" (month with 30 days), "w" (week, 7 days), "d" (day, 24 hours), "H" (hour with 60 minutes), "M" (minute with 60 seconds), "S" (second). @return the number of seconds encoded in the interval specification.""" if not timeStr: raise Exception('Empty time string not allowed') timeDef = {} numStart = 0 while numStart < len(timeStr): numEnd = numStart while (numEnd < len(timeStr)) and (timeStr[numEnd].isnumeric()): numEnd += 1 if numEnd == numStart: raise Exception() number = int(timeStr[numStart:numEnd]) typeKey = 's' if numEnd != len(timeStr): typeKey = timeStr[numEnd] numEnd += 1 numStart = numEnd if typeKey in timeDef: raise Exception() timeDef[typeKey] = number timeVal = 0 for typeKey, number in timeDef.items(): factor = { 'm': 30 * 24 * 60 * 60, 'w': 7 * 24 * 60 * 60, 'd': 24 * 60 * 60, 'H': 60 * 60, 'M': 60, 's': 1 }.get(typeKey, None) if factor is None: raise Exception('Unknown time specification element "%s"' % typeKey) timeVal += number * factor return timeVal guerillabackup-0.5.0/src/lib/guerillabackup/__init__.py000066400000000000000000000760221450137035300231560ustar00rootroot00000000000000"""This is the main guerillabackup module containing interfaces, common helper functions.""" import errno import os import select import sys import time CONFIG_GENERAL_PERSISTENCY_BASE_DIR_KEY = 'GeneralPersistencyBaseDir' CONFIG_GENERAL_PERSISTENCY_BASE_DIR_DEFAULT = '/var/lib/guerillabackup/state' CONFIG_GENERAL_RUNTIME_DATA_DIR_KEY = 'GeneralRuntimeDataDir' CONFIG_GENERAL_RUNTIME_DATA_DIR_DEFAULT = '/run/guerillabackup' CONFIG_GENERAL_DEBUG_TEST_MODE_KEY = 'GeneralDebugTestModeFlag' GENERATOR_UNIT_CLASS_KEY = 'backupGeneratorUnitClass' TRANSFER_RECEIVER_POLICY_CLASS_KEY = 'TransferReceiverPolicyClass' TRANSFER_RECEIVER_POLICY_INIT_ARGS_KEY = 'TransferReceiverPolicyInitArgs' TRANSFER_SENDER_POLICY_CLASS_KEY = 'TransferSenderPolicyClass' TRANSFER_SENDER_POLICY_INIT_ARGS_KEY = 'TransferSenderPolicyInitArgs' # Some constants not available on Python os module level yet. AT_SYMLINK_NOFOLLOW = 0x100 AT_EMPTY_PATH = 0x1000 class TransformationPipelineElementInterface: """This is the interface to define data transformation pipeline elements, e.g. for compression, encryption, signing. To really start execution of a transformation pipeline, transformation process instances have to be created for each pipe element.""" def getExecutionInstance(self, upstreamProcessOutput): """Get an execution instance for this transformation element. @param upstreamProcessOutput this is the output of the upstream process, that will be wired as input of the newly created process instance.""" raise Exception('Interface method called') class TransformationProcessOutputInterface: """This interface has to be implemented by all pipeline instances, both synchronous and asynchronous. When an instance reaches stopped state, it has to guarantee, that both upstream and downstream instance will detect EOF or an exception is raised when output access is attempted.""" def getOutputStreamDescriptor(self): """Get the file descriptor to read output from this output interface. When supported, a downstream asynchronous process may decide to operate only using the stream, eliminating the need to be invoked for IO operations after starting. 
@return the file descriptor, pipe or socket or None if stream operation is not available.""" raise Exception('Interface method called') def readData(self, length): """Read data from this output. This method must not block as it is usually invoked from synchronous pipeline elements. @return the at most length bytes of data, zero-length data if nothing available at the moment and None when end of input was reached.""" raise Exception('Interface method called') def close(self): """Close this interface. This will guarantee, that any future access will report EOF or an error. @raise Exception if close is attempted there still is data available.""" raise Exception('Interface method called') class TransformationProcessInterface: """This is the interface of all pipe transformation process instances.""" def getProcessOutput(self): """Get the output connector of this transformation process. After calling this method, it is not possible to set an output stream using setProcessOutputStream.""" raise Exception('Interface method called') def setProcessOutputStream(self, processOutputStream): """Some processes may also support setting of an output stream file descriptor. This is especially useful if the process is the last one in a pipeline and hence could write directly to a file or network descriptor. After calling this method, it is not possible to switch back to getProcessOutput. @throw Exception if this process does not support setting of output stream descriptors.""" raise Exception('Interface method called') def isAsynchronous(self): """A asynchronous process just needs to be started and will perform data processing on streams without any further interaction while running. This method may raise an exception when the process element was not completely connected yet: operation mode might not be known yet.""" raise Exception('Interface method called') def start(self): """Start this execution process.""" raise Exception('Interface method called') def stop(self): """Stop this execution process when still running. @return None when the the instance was already stopped, information about stopping, e.g. the stop error message when the process was really stopped.""" raise Exception('Interface method called') def isRunning(self): """See if this process instance is still running. @return False if instance was not yet started or already stopped. If there are any unreported pending errors from execution, this method will return True until doProcess() or stop() is called at least once.""" raise Exception('Interface method called') def doProcess(self): """This method triggers the data transformation operation of this component. For components in synchronous mode, the method will attempt to move data from input to output. Asynchronous components will just check the processing status and may raise an exception, when processing terminated with errors. As such a component might not be able to detect the amount of data really moved since last invocation, the component may report a fake single byte move. @throws Exception if an uncorrectable transformation state was reached and transformation cannot proceed, even though end of input data was not yet seen. Raise exception also when process was not started or already stopped. @return the number of bytes read or written or at least a value greater zero if any data was processed. A value of zero indicates, that currently data processing was not possible due to filled buffers but should be attemted again. 
A value below zero indicates that all input data was processed and output buffers were flushed already.""" raise Exception('Interface method called') def getBlockingStreams(self, readStreamList, writeStreamList): """Collect the file descriptors that are currently blocking this synchronous compoment.""" raise Exception('Interface method called') class SchedulableGeneratorUnitInterface: """This is the interface each generator unit has to provide for interaction with a backup generator component. Therefore this component has to provide both information about scheduling and the backup data elements on request. In return, it receives configuration information and persistency support from the invoker.""" def __init__(self, unitName, configContext): """Initialize this unit using the given configuration. The new object has to keep a reference to when needed. @param unitName The name of the activated unit main file in /etc/guerillabackup/units.""" raise Exception('Interface method called') def getNextInvocationTime(self): """Get the time in seconds until this unit should called again. If a unit does not know (yet) as invocation needs depend on external events, it should report a reasonable low value to be queried again soon. @return 0 if the unit should be invoked immediately, the seconds to go otherwise.""" raise Exception('Interface method called') def invokeUnit(self, sink): """Invoke this unit to create backup elements and pass them on to the sink. Even when indicated via getNextInvocationTime, the unit may decide, that it is not yet ready and not write any element to the sink. @return None if currently there is nothing to write to the sink, a number of seconds to retry invocation if the unit assumes, that there is data to be processed but processing cannot start yet, e.g. due to locks held by other parties or resource, e.g. network storages, currently not available. @throw Exception if the unit internal logic failed in any uncorrectable ways. Even when invoker decides to continue processing, it must not reinvoke this unit before complete reload.""" raise Exception('Interface method called') class SinkInterface: """This is the interface each sink has to provide to store backup data elements from different sources.""" def __init__(self, configContext): """Initialize this sink with parameters from the given configuration context.""" raise Exception('Interface method called') def getSinkHandle(self, sourceUrl): """Get a handle to perform transfer of a single backup data element to a sink.""" raise Exception('Interface method called') class SinkHandleInterface: """This is the common interface of all sink handles to store a single backup data element to a sink.""" def getSinkStream(self): """Get the file descriptor to write directly to the open backup data element at the sink, if available. The stream should not be closed using os.close(), but via the close method from SinkHandleInterface. @return the file descriptor or None when not supported.""" raise Exception('Interface method called') def write(self, data): """Write data to the open backup data element at the sink.""" raise Exception('Interface method called') def close(self, metaInfo): """Close the backup data element at the sink and receive any pending or current error associated with the writing process. When there is sufficient risk, that data written to the sink is might have been corrupted during transit or storage, the sink may decide to perform a verification operation while closing and return any verification errors here also. 
@param metaInfo python objects with additional information about this backup data element. This information is added at the end of the sink procedure to allow inclusion of checksum or signature fields created on the fly while writing. See design and implementation documentation for requirements on those objects.""" raise Exception('Interface method called') def getElementId(self): """Get the storage element ID of the previously written data. @throws Exception if the element ID is not yet available because the object is not closed yet.""" raise Exception('Interface method called') class StorageInterface: """This is the interface of all stores for backup data elements providing access to content data and metainfo but also additional storage attributes. The main difference to a generator unit is, that data is just retrieved but not generated on invocation.""" def __init__(self, configContext): """Initialize this store with parameters from the given configuration context.""" raise Exception('Interface method called') def getSinkHandle(self, sourceUrl): """Get a handle to perform transfer of a single backup data element to a sink. This method may never block or raise an exception, even other concurrent sink, query or update procedures are in progress.""" raise Exception('Interface method called') def getBackupDataElement(self, elementId): """Retrieve a single stored backup data element from the storage. @param elementId the storage ID of the backup data element. @throws Exception when an incompatible query, update or read is in progress.""" raise Exception('Interface method called') def getBackupDataElementForMetaData(self, sourceUrl, metaData): """Retrieve a single stored backup data element from the storage. @param sourceUrl the URL identifying the source that produced the stored data elements. @param metaData metaData dictionary for the element of interest. @throws Exception when an incompatible query, update or read is in progress.""" raise Exception('Interface method called') def queryBackupDataElements(self, query): """Query this storage. @param query if None, return an iterator over all stored elements. Otherwise query has to be a function returning True or False for StorageBackupDataElementInterface elements. @return BackupDataElementQueryResult iterator for this query. @throws Exception if there are any open queries or updates preventing response.""" raise Exception('Interface method called') class StorageBackupDataElementInterface: """This class encapsulates access to a stored backup data element.""" def getElementId(self): """Get the storage element ID of this data element.""" raise Exception('Interface method called') def getSourceUrl(self): """Get the source URL of the storage element.""" raise Exception('Interface method called') def getMetaData(self): """Get only the metadata part of this element. @return a BackupElementMetainfo object""" raise Exception('Interface method called') def getDataStream(self): """Get a stream to read data from that element. @return a file descriptor for reading this stream.""" raise Exception('Interface method called') def setExtraData(self, name, value): """Attach or detach extra data to this storage element. This function is intended for agents to use the storage to persist this specific data also. 
@param value the extra data content or None to remove the element.""" raise Exception('Interface method called') def getExtraData(self, name): """@return None when no extra data was found, the content otherwise""" raise Exception('Interface method called') def delete(self): """Delete this data element and all extra data element.""" raise Exception('Interface method called') def lock(self): """Lock this backup data element. @throws Exception if the element does not exist any more or cannot be locked""" raise Exception('Interface method called') def unlock(self): """Unlock this backup data element.""" raise Exception('Interface method called') class BackupDataElementQueryResult(): """This is the interface of all query results.""" def getNextElement(self): """Get the next backup data element from this query iterator. @return a StorageBackupDataElementInterface object.""" raise Exception('Interface method called') # Define common functions: def isValueListOfType(value, targetType): """Check if a give value is a list of values of given target type.""" if not isinstance(value, list): return False for item in value: if not isinstance(item, targetType): return False return True def assertSourceUrlSpecificationConforming(sourceUrl): """Assert that the source URL is according to specification.""" if (sourceUrl[0] != '/') or (sourceUrl[-1] == '/'): raise Exception('Slashes not conforming') for urlPart in sourceUrl[1:].split('/'): if len(urlPart) == 0: raise Exception('No path part between slashes') if urlPart in ('.', '..'): raise Exception('. and .. forbidden') for urlChar in urlPart: if urlChar not in '%-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz': raise Exception('Invalid character %s in URL part' % repr(urlChar)) def getDefaultDownstreamPipeline(configContext, encryptionKeyName): """This function returns the default processing pipeline for locally generated backup data. It uses the GeneralDefaultCompressionElement and GeneralDefaultEncryptionElement parameters from the configuration to generate the pipeline, including a DigestPipelineElement at the end. @encryptionKeyName when not None, this method will change the key in the GeneralDefaultEncryptionElement when defined. Otherwise a default gpg encryption element is created using keys from /etc/guerillabackup/keys.""" downstreamPipelineElements = [] compressionElement = configContext.get( 'GeneralDefaultCompressionElement', None) if compressionElement is not None: downstreamPipelineElements.append(compressionElement) encryptionElement = configContext.get('GeneralDefaultEncryptionElement', None) if encryptionKeyName is not None: if encryptionElement is not None: encryptionElement = encryptionElement.replaceKey(encryptionKeyName) else: encryptionCallArguments = GpgEncryptionPipelineElement.gpgDefaultCallArguments if compressionElement is not None: encryptionCallArguments += ['--compress-algo', 'none'] encryptionElement = GpgEncryptionPipelineElement( encryptionKeyName, encryptionCallArguments) if encryptionElement is not None: downstreamPipelineElements.append(encryptionElement) downstreamPipelineElements.append(DigestPipelineElement()) return downstreamPipelineElements def instantiateTransformationPipeline( pipelineElements, upstreamProcessOutput, downstreamProcessOutputStream, doStartFlag=False): """Create transformation instances for the list of given pipeline elements. @param upstreamProcessOutput TransformationProcessOutputInterface upstream output. 
This parameter might be None for a pipeline element at first position creating the backup data internally. @param downstreamProcessOutputStream if None, enable this stream as output of the last pipeline element.""" if ((upstreamProcessOutput is not None) and (not isinstance( upstreamProcessOutput, TransformationProcessOutputInterface))): raise Exception('upstreamProcessOutput not an instance of TransformationProcessOutputInterface') if doStartFlag and (downstreamProcessOutputStream is None): raise Exception('Cannot autostart instances without downstream') instanceList = [] lastInstance = None for element in pipelineElements: if lastInstance is not None: upstreamProcessOutput = lastInstance.getProcessOutput() instance = element.getExecutionInstance(upstreamProcessOutput) upstreamProcessOutput = None lastInstance = instance instanceList.append(instance) if downstreamProcessOutputStream is not None: lastInstance.setProcessOutputStream(downstreamProcessOutputStream) if doStartFlag: for instance in instanceList: instance.start() return instanceList def runTransformationPipeline(pipelineInstances): """Run all processes included in the pipeline until processing is complete or the first uncorrectable error is detected by transformation process. All transformation instances have to be started before calling this method. @param pipelineInstances the list of pipeline instances. The instances have to be sorted, so the one reading or creating the input data is the first, the one writing to the sink the last in the list. @throws Exception when first failing pipeline element is detected. This will not terminate the whole pipeline, other elements have to be stopped explicitely.""" # Keep list of still running synchronous instances. syncInstancesList = [] for instance in pipelineInstances: if not instance.isAsynchronous(): syncInstancesList.append(instance) while True: # Run all the synchronous units first. processState = -1 for instance in syncInstancesList: result = instance.doProcess() processState = max(processState, result) if result < 0: if instance.isRunning(): raise Exception('Logic error') syncInstancesList.remove(instance) # Fake state and pretend something was moved. Modifying the list # in loop might skip elements, thus cause invalid state information # aggregation. processState = 1 break if instance.isAsynchronous(): # All synchronous IO was completed, the component may behave # now like an asynchronous one. Therefore remove it from the # list, otherwise we might spin here until the component stops # because it will not report any blocking file descriptors below, # thus skipping select() when there are no other blocking components # in syncInstancesList. syncInstancesList.remove(instance) if processState == -1: # There is no running synchronous instances, all remaining ones # are asynchronous and do not need select() for synchronous IO # data moving below. break if processState != 0: # At least some data was moved, continue moving. continue # Not a single synchronous instance was able to move data. This # is due to a blocking read operation at the upstream side of # the pipeline or the downstream side write, e.g. due to network # or filesystem IO blocking. Update the state of all synchronous # components and if all are still running, wait for any IO to # end blocking. 
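# A minimal usage sketch of instantiateTransformationPipeline() and
# runTransformationPipeline(), assuming an external caller: read a file
# through a single DigestPipelineElement and write the passed-through
# data to a temporary file. It is assumed (not shown here) that the
# digest element's execution instance accepts a plain output file
# descriptor via setProcessOutputStream(); both paths are illustrative.
import os
import guerillabackup
from guerillabackup.TransformationProcessOutputStream import TransformationProcessOutputStream

demoInputFd = os.open('/etc/hostname', os.O_RDONLY | os.O_NOCTTY)
demoOutputFd = os.open(
    '/tmp/pipeline-demo.data', os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
demoInstances = guerillabackup.instantiateTransformationPipeline(
    [guerillabackup.DigestPipelineElement()],
    TransformationProcessOutputStream(demoInputFd),
    demoOutputFd, doStartFlag=True)
guerillabackup.runTransformationPipeline(demoInstances)
os.close(demoOutputFd)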
readStreamList = [] writeStreamList = [] for instance in syncInstancesList: instance.getBlockingStreams(readStreamList, writeStreamList) if readStreamList or writeStreamList: # Wait for at least one stream from a synchronous instance to # be ready for IO. There might be none at all due to a former # synchronous component having finished handling all synchronous # IO. select.select(readStreamList, writeStreamList, [], 1) # Only asynchronous instances remain (if any), so just wait until # each one of them has stopped. for instance in pipelineInstances: while instance.isRunning(): time.sleep(1) def listDirAt(dirFd, path='.'): """This function provides the os.listdir() functionality to list files in an opened directory for python2.x. With Python 3.x listdir() also accepts file descriptors as argument.""" currentDirFd = os.open('.', os.O_DIRECTORY|os.O_RDONLY|os.O_NOCTTY) result = None try: os.fchdir(dirFd) result = os.listdir(path) finally: os.fchdir(currentDirFd) os.close(currentDirFd) return result def secureOpenAt( dirFd, pathName, symlinksAllowedFlag=False, dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY, dirCreateMode=None, fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY, fileCreateMode=None): """Perform a secure opening of the file. This function does not circumvent the umask in place when applying create modes. @param dirFd the directory file descriptor where to open a relative pathName, ignored for absolute pathName values. @param pathName open the path denotet by this string relative to the dirFd directory. When pathName is absolute, dirFd will be ignored. The pathName must not end with '/' unless the directory path '/' itself is specified. Distinction what should be opened has to be made using the flags. @param symlinksAllowedFlag when not set to True, O_NOFOLLOW will be added to each open call. @param dirOpenFlag those flags are used to open directory path components without modification. The flags have to include the O_DIRECTORY flag. The O_NOFOLLOW and O_NOCTTY flags are strongly recommended and should be omitted only in special cases. @param dirCreateMode if not None, missing directories will be created with the given mode. @param fileOpenFlags flags to apply when opening the last component @param fileCreateMode if not None, missing files will be created when O_CREAT was also in the fileOpenFlags.""" if (dirOpenFlags&os.O_DIRECTORY) == 0: raise Exception('Directory open flags have to include O_DIRECTORY') if not symlinksAllowedFlag: dirOpenFlags |= os.O_NOFOLLOW fileOpenFlags |= os.O_NOFOLLOW if fileCreateMode is None: fileCreateMode = 0 if pathName == '/': return os.open(pathName, os.O_RDONLY|os.O_DIRECTORY|os.O_NOCTTY) if pathName.endswith('/'): raise Exception('Invalid path value') currentDirFd = dirFd if pathName.startswith('/'): currentDirFd = os.open( '/', os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY) pathName = pathName[1:] pathNameParts = pathName.split('/') try: # Traverse all the directory path parts, but just one at a time # to avoid following symlinks. for pathNamePart in pathNameParts[:-1]: try: nextDirFd = os.open(pathNamePart, dirOpenFlags, dir_fd=currentDirFd) except OSError as openError: if openError.errno == errno.EACCES: raise if dirCreateMode is None: raise os.mkdir(pathNamePart, mode=dirCreateMode, dir_fd=currentDirFd) nextDirFd = os.open(pathNamePart, dirOpenFlags, dir_fd=currentDirFd) if currentDirFd != dirFd: os.close(currentDirFd) currentDirFd = nextDirFd # Now open the last part. 
Always open last part separately, # also for directories: the last open may use different flags. directoryCreateFlag = False if (((fileOpenFlags&os.O_DIRECTORY) != 0) and ((fileOpenFlags&os.O_CREAT) != 0)): directoryCreateFlag = True # Clear the create flag, otherwise open would create a file instead # of a directory, ignoring the O_DIRECTORY flag. fileOpenFlags &= ~os.O_CREAT resultFd = None try: resultFd = os.open( pathNameParts[-1], fileOpenFlags, mode=fileCreateMode, dir_fd=currentDirFd) except OSError as openError: if (not directoryCreateFlag) or (openError.errno != errno.ENOENT): raise os.mkdir(pathNameParts[-1], mode=dirCreateMode, dir_fd=currentDirFd) resultFd = os.open( pathNameParts[-1], fileOpenFlags, dir_fd=currentDirFd) return resultFd finally: # Make sure to close the currentDirFd, otherwise we leak one fd # per error. if currentDirFd != dirFd: os.close(currentDirFd) # Fail on all errors not related to concurrent proc filesystem # changes. OPENER_INFO_FAIL_ON_ERROR = 0 # Do not fail on errors related to limited permissions accessing # the information. This flag is needed when running without root # privileges. OPENER_INFO_IGNORE_ACCESS_ERRORS = 1 def getFileOpenerInformation(pathNameList, checkMode=OPENER_INFO_FAIL_ON_ERROR): """Get information about processes currently having access to one of the absolute pathnames from the list. This is done reading information from the proc filesystem. As access to proc might be limited for processes with limited permissions, the function can be forced to ignore the permission errors occurring during those checks. CAVEAT: The checks are meaningful to detect concurrent write access to files where e.g. a daemon did not close them on error or a file is currently filled. The function is always racy, a malicious process can also trick guerillabackup to believe a file is in steady state and not currently written even when that is not true. @param pathNameList a list of absolute pathnames to check in parallel. All those entries have to pass a call to os.path.realpath unmodified. @return a list containing one entry per pathNameList entry. The entry can be none if no access to the file was detected. Otherwise the entry is a list with tuples containing the pid of the process having access to the file and a list with tuples containing the fd within that process and a the flags.""" for pathName in pathNameList: if pathName != os.path.realpath(pathName): raise Exception('%s is not an absolute, canonical path' % pathName) if checkMode not in [OPENER_INFO_FAIL_ON_ERROR, OPENER_INFO_IGNORE_ACCESS_ERRORS]: raise Exception('Invalid checkMode given') resultList = [None]*len(pathNameList) for procPidName in os.listdir('/proc'): procPid = -1 try: procPid = int(procPidName) except ValueError: continue fdDirName = '/proc/%s/fd' % procPidName fdInfoDirName = '/proc/%s/fdinfo' % procPidName fdFileList = [] try: fdFileList = os.listdir(fdDirName) except OSError as fdListException: if fdListException.errno == errno.ENOENT: continue if ((fdListException.errno == errno.EACCES) and (checkMode == OPENER_INFO_IGNORE_ACCESS_ERRORS)): continue raise for openFdName in fdFileList: targetPathName = None try: targetPathName = os.readlink('%s/%s' % (fdDirName, openFdName)) except OSError as readLinkError: if readLinkError.errno == errno.ENOENT: continue raise pathNameIndex = -1 try: pathNameIndex = pathNameList.index(targetPathName) except ValueError: continue # At least one hit, read the data. 
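# A minimal usage sketch of getFileOpenerInformation(), assuming an
# external caller: check whether any process currently holds
# /var/log/syslog open and print the opener PIDs with their descriptor
# numbers and flags. The path is illustrative and has to be absolute
# and canonical; access errors are ignored so the check also works
# without root privileges.
import guerillabackup

openerInfoList = guerillabackup.getFileOpenerInformation(
    ['/var/log/syslog'],
    checkMode=guerillabackup.OPENER_INFO_IGNORE_ACCESS_ERRORS)
if openerInfoList[0] is None:
  print('No process has the file open')
else:
  for openerPid, fdInfoList in openerInfoList[0]:
    print('PID %d: fds %s' % (openerPid, ', '.join(
        ['%d (flags 0o%o)' % fdInfo for fdInfo in fdInfoList])))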
infoFd = os.open( '%s/%s' % (fdInfoDirName, openFdName), os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY) infoData = os.read(infoFd, 1<<16) os.close(infoFd) splitPos = infoData.find(b'flags:\t') if splitPos < 0: raise Exception('Unexpected proc behaviour') endPos = infoData.find(b'\n', splitPos) infoTuple = (int(openFdName), int(infoData[splitPos+7:endPos], 8)) while True: if resultList[pathNameIndex] is None: resultList[pathNameIndex] = [(procPid, [infoTuple])] else: pathNameInfo = resultList[pathNameIndex] indexPos = -1-len(pathNameInfo) for index, entry in enumerate(pathNameInfo): if entry[0] == procPid: indexPos = index break if entry[0] > procPid: indexPos = -1-index break if indexPos >= 0: pathNameInfo[indexPos][1].append(infoTuple) else: indexPos = -1-indexPos pathNameInfo.insert(indexPos, (procPid, [infoTuple])) try: pathNameIndex = pathNameList.index(targetPathName, pathNameIndex+1) except ValueError: break return resultList def getPersistencyBaseDirPathname(configContext): """Get the persistency data directory pathname from configuration or return the default value.""" return configContext.get( CONFIG_GENERAL_PERSISTENCY_BASE_DIR_KEY, CONFIG_GENERAL_PERSISTENCY_BASE_DIR_DEFAULT) def getRuntimeDataDirPathname(configContext): """Get the runtime data directory pathname from configuration or return the default value.""" return configContext.get( CONFIG_GENERAL_RUNTIME_DATA_DIR_KEY, CONFIG_GENERAL_RUNTIME_DATA_DIR_DEFAULT) def openPersistencyFile(configContext, pathName, flags, mode): """Open or possibly create a persistency file in the default persistency directory.""" baseDir = getPersistencyBaseDirPathname(configContext) return secureOpenAt( -1, os.path.join(baseDir, pathName), symlinksAllowedFlag=False, dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY, dirCreateMode=0o700, fileOpenFlags=flags|os.O_NOFOLLOW|os.O_NOCTTY, fileCreateMode=mode) def readFully(readFd): """Read data from a file descriptor until EOF is reached.""" data = b'' while True: block = os.read(readFd, 1<<16) if len(block) == 0: break data += block return data def execConfigFile(configFileName, configContext): """Load code from file and execute it with given global context.""" configFile = open(configFileName, 'r') configData = configFile.read() configFile.close() configCode = compile(configData, configFileName, 'exec') exec(configCode, configContext, configContext) # Load some classes into this namespace as shortcut for use in # configuration files. 
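# A minimal usage sketch of execConfigFile(), openPersistencyFile() and
# readFully(), assuming an external caller and an existing configuration
# file: execute the configuration into a fresh context dictionary, then
# open and read a state file below the configured persistency directory.
# Both file names are illustrative assumptions.
import os
import guerillabackup

demoConfigContext = {}
guerillabackup.execConfigFile('/etc/guerillabackup/config', demoConfigContext)
demoStateFd = guerillabackup.openPersistencyFile(
    demoConfigContext, 'generators/DemoUnit/state.current',
    os.O_RDONLY | os.O_CREAT, 0o600)
demoStateData = guerillabackup.readFully(demoStateFd)
os.close(demoStateFd)
print('State data has %d bytes' % len(demoStateData))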
from guerillabackup.DefaultFileSystemSink import DefaultFileSystemSink from guerillabackup.DigestPipelineElement import DigestPipelineElement from guerillabackup.GpgEncryptionPipelineElement import GpgEncryptionPipelineElement from guerillabackup.OSProcessPipelineElement import OSProcessPipelineElement from guerillabackup.Transfer import SenderMoveDataTransferPolicy from guerillabackup.UnitRunConditions import AverageLoadLimitCondition from guerillabackup.UnitRunConditions import LogicalAndCondition from guerillabackup.UnitRunConditions import MinPowerOnTimeCondition import guerillabackup.Utils guerillabackup-0.5.0/src/lib/guerillabackup/storagetool/000077500000000000000000000000001450137035300234005ustar00rootroot00000000000000guerillabackup-0.5.0/src/lib/guerillabackup/storagetool/Policy.py000066400000000000000000000026031450137035300252120ustar00rootroot00000000000000class Policy(): """This class defines a policy instance.""" def __init__(self, policyConfig): """Instantiate this policy object from a JSON policy configuration definition.""" if not isinstance(policyConfig, dict): raise Exception( 'Policy configuration has to be a dictionary: %s' % ( repr(policyConfig))) self.policyName = policyConfig['Name'] if not isinstance(self.policyName, str): raise Exception('Policy name has to be a string') self.policyPriority = policyConfig.get('Priority', 0) if not isinstance(self.policyPriority, int): raise Exception('Policy priority has to be an integer') def getPolicyName(self): """Get the name of this policy.""" return self.policyName def getPriority(self): """Get the priority of this policy. A higher number indicates that a policy has higher priority and shall override any policies with lower or equal priority.""" return self.policyPriority def apply(self, sourceStatus): """Apply this policy to a backup source status.""" raise Exception('Abstract method called') def delete(self, sourceStatus): """Delete the policy status data of this policy for all elements marked for deletion and update status data of those elements kept to allow validation even after deletion of some elements.""" raise Exception('Abstract method called') guerillabackup-0.5.0/src/lib/guerillabackup/storagetool/PolicyTypeInterval.py000066400000000000000000000234331450137035300275650ustar00rootroot00000000000000"""This module provides support for the backup interval checking policy.""" import json import sys import guerillabackup.storagetool.Policy class PolicyTypeInterval(guerillabackup.storagetool.Policy.Policy): """This policy type defines a policy checking file intervals between full and incremental backups (if available). Applying the policies will also use and modify following backup data element status information fields: * LastFull, LastInc: Timestamp of previous element in case the element was deleted. Otherwise policy checks would report policy failures when pruning data storage elements. * Ignore: When set ignore interval policy violations of that type, i.e. full, inc, both related to the previous backup data element. * Dead: When set, this source will not produce any more backups. 
Therefore ignore any gap after this last backup and report a policy violation if more backups are seen.""" POLICY_NAME = 'Interval' def __init__(self, policyConfig): """Instantiate this policy object from a JSON policy configuration definition.""" super().__init__(policyConfig) self.config = {} self.timeFullMin = None self.timeFullMax = None # This is the minimal time between incremental backups or None # if this source does not emit any incremental backups. self.timeIncMin = None self.timeIncMax = None for configKey, configValue in policyConfig.items(): if configKey in ('Name', 'Priority'): continue self.config[configKey] = configValue if configKey not in ('FullMax', 'FullMin', 'IncMax', 'IncMin'): raise Exception('Unknown policy configuration setting "%s"' % configKey) timeVal = None if configValue is not None: if not isinstance(configValue, str): raise Exception( 'Policy setting "%s" has to be a string' % configKey) timeVal = guerillabackup.Utils.parseTimeDef(configValue) if configKey == 'FullMax': self.timeFullMax = timeVal elif configKey == 'FullMin': self.timeFullMin = timeVal elif configKey == 'IncMax': self.timeIncMax = timeVal else: self.timeIncMin = timeVal # Validate the configuration settings. if (self.timeFullMin is None) or (self.timeFullMax is None): raise Exception() if self.timeFullMin > self.timeFullMax: raise Exception() if self.timeIncMax is None: if self.timeIncMin is not None: raise Exception() elif (self.timeIncMin > self.timeIncMax) or \ (self.timeIncMin > self.timeFullMin) or \ (self.timeIncMax > self.timeFullMax): raise Exception() def apply(self, sourceStatus): """Apply this policy to a backup source status.""" elementList = sourceStatus.getDataElementList() currentPolicy = None lastElement = None lastFullElementName = None lastFullTime = None lastIncTime = None for elem in elementList: policyData = elem.getPolicyData(PolicyTypeInterval.POLICY_NAME) ignoreType = None if policyData is not None: for key in policyData.keys(): if key not in ('Config', 'Ignore', 'LastFull', 'LastInc'): raise Exception( 'Policy status configuration for "%s" in ' \ '"%s" corrupted: %s' % ( elem.getElementName(), sourceStatus.getStorageStatus().getStatusFileName(), repr(policyData))) if 'LastFull' in policyData: lastFullTime = policyData['LastFull'] lastFullElementName = '?' lastIncTime = policyData['LastInc'] if 'Config' in policyData: # Use the additional policy data. initConfig = dict(policyData['Config']) initConfig['Name'] = PolicyTypeInterval.POLICY_NAME currentPolicy = PolicyTypeInterval(initConfig) ignoreType = policyData.get('Ignore', None) if currentPolicy is None: # This is the first element and default policy data was not copied # yet. elem.updatePolicyData( PolicyTypeInterval.POLICY_NAME, {'Config': self.config}) currentPolicy = self elemTime = elem.getDatetimeSeconds() if lastFullTime is None: # This is the first element. if elem.getType() != 'full': print( 'Backup source %s starts with incremental ' \ 'backups, storage might be corrupted.' % ( sourceStatus.getSourceName()), file=sys.stderr) lastFullTime = lastIncTime = elemTime lastFullElementName = elem.getElementName() else: # Now check the time policy. fullValidationOkFlag = False if elem.getType() == 'full': if ignoreType not in ('both', 'full'): fullDelta = elemTime - lastFullTime if fullDelta < currentPolicy.timeFullMin: print( 'Backup source %s emitting full backups ' \ 'faster than expected between "%s" and "%s".' 
% ( sourceStatus.getSourceName(), lastFullElementName, elem.getElementName()), file=sys.stderr) elif fullDelta > currentPolicy.timeFullMax: print( 'Backup source "%s" full interval too big ' \ 'between "%s" and "%s".' % ( sourceStatus.getSourceName(), lastFullElementName, elem.getElementName()), file=sys.stderr) else: fullValidationOkFlag = True if not fullValidationOkFlag: print( 'No interactive/automatic policy or ' \ 'status update, consider adding this ' \ 'manually to the status in %s:\n%s' % ( sourceStatus.getStorageStatus().getStatusFileName(), json.dumps( {elem.getElementName(): {'Interval': {'Ignore': 'full'}}}, indent=2) ), file=sys.stderr) lastFullTime = elemTime lastFullElementName = elem.getElementName() else: # This is an incremental backup, so incremental interval policy # has to be defined. if currentPolicy.timeIncMin is None: raise Exception('Not expecting incremental backups') # Always perform incremental checks if there is a policy for # it. if (currentPolicy.timeIncMin is not None) and \ (ignoreType not in ('both', 'inc')): incDelta = elemTime - lastIncTime # As full backup scheduling overrides the lower priority incremental # schedule, that may have triggered a full backup while no incremental # would have been required yet. if (incDelta < currentPolicy.timeIncMin) and \ (elem.getType() != 'full') and (not fullValidationOkFlag): print( 'Backup source %s emitting inc backups ' \ 'faster than expected between "%s" and "%s".' % ( sourceStatus.getSourceName(), lastElement.getElementName(), elem.getElementName()), file=sys.stderr) elif incDelta > currentPolicy.timeIncMax: print( 'Backup source "%s" inc interval too big ' \ 'between "%s" and "%s".' % ( sourceStatus.getSourceName(), lastElement.getElementName(), elem.getElementName()), file=sys.stderr) print( 'No interactive/automatic policy or ' \ 'status update, consider adding this ' \ 'manually to the status in %s:\n%s' % ( sourceStatus.getStorageStatus().getStatusFileName(), json.dumps( {elem.getElementName(): {'Interval': {'Ignore': 'inc'}}}, indent=2) ), file=sys.stderr) lastElement = elem lastIncTime = elemTime def delete(self, sourceStatus): """Prepare the policy status data for deletions going to happen later on.""" elementList = sourceStatus.getDataElementList() lastElement = None lastFullTime = None lastIncTime = None # Keep also track of the last persistent and the current policy. # When deleting an element with policy updates then move the # current policy data to the first element not deleted. persistentPolicyConfig = currentPolicyConfig = self.config for elem in elementList: policyData = elem.getPolicyData(PolicyTypeInterval.POLICY_NAME) if policyData is not None: if 'LastFull' in policyData: lastFullTime = policyData['LastFull'] lastIncTime = policyData['LastInc'] if 'Config' in policyData: currentPolicyConfig = policyData['Config'] if not elem.isMarkedForDeletion(): persistentPolicyConfig = currentPolicyConfig elemTime = elem.getDatetimeSeconds() if lastFullTime is None: # This is the first element. lastFullTime = lastIncTime = elemTime else: if (lastElement is not None) and \ (lastElement.isMarkedForDeletion()) and \ (not elem.isMarkedForDeletion()): # So the previous element is to be deleted but this not. Make # sure last timings are kept in policy data. 
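# A minimal configuration sketch for this policy type, assuming it is
# instantiated directly: expect a full backup roughly once a week and
# incremental backups roughly once a day, with some tolerance. The
# interval strings use the guerillabackup.Utils.parseTimeDef() format.
demoIntervalPolicy = PolicyTypeInterval({
    'Name': PolicyTypeInterval.POLICY_NAME,
    'Priority': 0,
    'FullMin': '6d',
    'FullMax': '8d',
    'IncMin': '20H',
    'IncMax': '28H'})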
if policyData is None: policyData = {} elem.setPolicyData(PolicyTypeInterval.POLICY_NAME, policyData) policyData['LastFull'] = lastFullTime policyData['LastInc'] = lastIncTime # Also copy policy configuration if the element defining the # current policy was not persisted. if persistentPolicyConfig != currentPolicyConfig: policyData['Config'] = currentPolicyConfig persistentPolicyConfig = currentPolicyConfig if elem.getType() == 'full': lastFullTime = elemTime lastElement = elem lastIncTime = elemTime guerillabackup-0.5.0/src/lib/guerillabackup/storagetool/PolicyTypeLevelRetention.py000066400000000000000000000230471450137035300307410ustar00rootroot00000000000000"""This module provides backup data retention policy support.""" import datetime import time import guerillabackup.storagetool.Policy class PolicyTypeLevelRetentionTagger(): """This is a helper class for PolicyTypeLevelRetention to tag backup elements to be kept.""" # This is the specification for calender based alignment. Each # tuple contains the name and the offset when counting does not # start with 0. ALIGN_ATTR_SPEC = ( ('year', 0), ('month', 1), ('day', 1), ('hour', 0), ('minute', 0), ('second', 0)) def __init__(self, configDict): """Create a new aligned tagger component. The class will check all available backup data elements and tag those to be kept, that match the specification of this tagger. All backup data elements tagged by at least one tagger will be kept, the others marked for deletion. Configuration parameters for the tagger are: * KeepCount: This is the number of time intervals to search for matching backup data elements and thus the number of backups to keep at most. When there is no matching backup found within a time interval, the empty interval is also included in the count. When there is more than one backup within the interval, only the first one is kept. Thus e.g. having a "KeepCount" of 30 and a "Interval" setting of "day", this will cause backups to be kept from now till 30 days ahead (or the time of the latest backup if "TimeRef" was set to "latest"). * KeepInc: When true, keep also incremental backups within any interval where backups would be kept. This will also tag the previous full backup and any incremental backups in between for retention. Incremental backups after the last full backup are always kept unconditionally as the are required to restore the latest backup. Default is false. * Interval: This is the size of the retention interval to keep one backup per interval. Values are "year", "month", "day", "hour". * TimeRef: This defines the time reference to use to perform the retention policy evaluation. With "latest" the time of the newest backup is used, while "now" uses the current time. The default is "now". " AlignModulus: This defines the modulus when aligning backup retention to values other than the "Interval" unit, e.g. to keep one backup every three month. * AlignValue: When defined this will make backups to be aligned to that value related to the modulus, e.g. 
to keep backups of January, April, July, October an "AlignModulus" of 3 and "AlignValue" of 1 is required.""" self.intervalUnit = None self.alignModulus = 1 self.alignValue = 0 self.keepCount = None self.keepIncFlag = False self.timeReferenceType = 'now' for configKey, configValue in configDict.items(): if configKey == 'AlignModulus': if (not isinstance(configValue, int)) or (configValue < 1): raise Exception() self.alignModulus = configValue elif configKey == 'AlignValue': if (not isinstance(configValue, int)) or (configValue < 0): raise Exception() self.alignValue = configValue elif configKey == 'Interval': if configValue not in ['year', 'month', 'day', 'hour']: raise Exception() self.intervalUnit = configValue elif configKey == 'KeepCount': if (not isinstance(configValue, int)) or (configValue < 1): raise Exception() self.keepCount = configValue elif configKey == 'KeepInc': if not isinstance(configValue, bool): raise Exception() self.keepIncFlag = configValue elif configKey == 'TimeRef': if configValue not in ['latest', 'now']: raise Exception() self.timeReferenceType = configValue else: raise Exception( 'Unsupported configuration parameter %s' % repr(configKey)) if self.keepCount is None: raise Exception('Mandatory KeepCount parameter not set') if (self.intervalUnit is not None) and \ (self.alignModulus <= self.alignValue): raise Exception('Align value has to be smaller than modulus') def tagBackups(self, backupList, tagList): """Check which backup elements should be kept and tag them in the tagList. @param backupList the sorted list of backup data elements. @param tagList list of boolean values of same length as the backupList. The list should be initialized to all False values before calling tagBackups for the first time.""" if len(backupList) != len(tagList): raise Exception() # Always tag the newest backup. tagList[-1] = True referenceTime = None if self.timeReferenceType == 'now': referenceTime = datetime.datetime.fromtimestamp(time.time()) elif self.timeReferenceType == 'latest': referenceTime = datetime.datetime.fromtimestamp( backupList[-1].getDatetimeSeconds()) else: raise Exception('Logic error') # Prefill the data with the default field offsets. alignedTimeData = [x[1] for x in self.ALIGN_ATTR_SPEC] alignFieldValue = None for alignPos in range(0, len(self.ALIGN_ATTR_SPEC)): alignedTimeData[alignPos] = getattr( referenceTime, self.ALIGN_ATTR_SPEC[alignPos][0]) if self.intervalUnit == self.ALIGN_ATTR_SPEC[alignPos][0]: alignFieldValue = alignedTimeData[alignPos] break startSearchTime = datetime.datetime(*alignedTimeData).timestamp() # Handle alignment and modulus rules. if (self.alignModulus != 1) and (self.alignValue is not None): startSearchTime = self.decreaseTimeField( startSearchTime, (alignFieldValue - self.alignValue) % self.alignModulus) keepAttempts = self.keepCount + 1 while keepAttempts != 0: for pos, backupElement in enumerate(backupList): backupElement = backupList[pos] if (backupElement.getDatetimeSeconds() >= startSearchTime) and \ ((backupElement.getType() == 'full') or self.keepIncFlag): tagList[pos] = True # Tag all incrementals within the interval. if not self.keepIncFlag: break # Move down units of intervalUnit size. startSearchTime = self.decreaseTimeField( startSearchTime, self.alignModulus) keepAttempts -= 1 # Now after tagging select also those incremental backups in between # to guaranteee that they can be really restored. 
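# A minimal configuration sketch for this policy type, assuming it is
# instantiated directly: keep daily backups (including incremental ones)
# for 30 days, monthly full backups for a year and quarterly full
# backups, aligned to January, April, July and October, for three years.
demoRetentionPolicy = PolicyTypeLevelRetention({
    'Name': PolicyTypeLevelRetention.POLICY_NAME,
    'Levels': [
        {'Interval': 'day', 'KeepCount': 30, 'KeepInc': True},
        {'Interval': 'month', 'KeepCount': 12},
        {'Interval': 'month', 'KeepCount': 12,
         'AlignModulus': 3, 'AlignValue': 1}]})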
forceTaggingFlag = True for pos in range(len(backupList) - 1, -1, -1): if forceTaggingFlag: tagList[pos] = True if backupList[pos].getType() == 'full': forceTaggingFlag = False elif (tagList[pos]) and (backupList[pos].getType() == 'inc'): forceTaggingFlag = True def decreaseTimeField(self, timestamp, decreaseCount): """Decrease a given time value.""" timeValue = datetime.datetime.fromtimestamp(timestamp) if self.intervalUnit == 'year': timeValue = timeValue.replace(year=timeValue.year-decreaseCount) elif self.intervalUnit == 'month': # Decrease the month values. As the timestamp was aligned to # the first of the month already, the month value can be simple # decreased. for modPos in range(0, decreaseCount): if timeValue.month == 1: timeValue = timeValue.replace(year=timeValue.year-1, month=12) else: timeValue = timeValue.replace(month=timeValue.month-1) elif self.intervalUnit == 'day': timeValue = timeValue - datetime.timedelta(days=decreaseCount) else: raise Exception('Logic error') return timeValue.timestamp() class PolicyTypeLevelRetention(guerillabackup.storagetool.Policy.Policy): """This policy type defines a data retention policy keeping a number of backups for each time level. The policy itself does not rely on the backup source status but only on the list of backup data elmeents. The policy will flag elements to be deleted when this retention policy has no use for those elmements and will fail if another, concurrent policy or manual intervention, has flagged elements for deletion which should be kept according to this policy.""" POLICY_NAME = 'LevelRetention' def __init__(self, policyConfig): """Instantiate this policy object from a JSON policy configuration definition.""" super().__init__(policyConfig) # This is the list of retention level definitions. self.levelList = [] for configKey, configValue in policyConfig.items(): if configKey in ('Name', 'Priority'): continue if configKey == 'Levels': self.parseLevelConfig(configValue) continue raise Exception('Unknown policy configuration setting "%s"' % configKey) # Validate the configuration settings. if not self.levelList: raise Exception() def parseLevelConfig(self, levelConfigList): """Parse the retention policy level configuration.""" if not isinstance(levelConfigList, list): raise Exception() for levelConfig in levelConfigList: if not isinstance(levelConfig, dict): raise Exception() self.levelList.append(PolicyTypeLevelRetentionTagger(levelConfig)) def apply(self, sourceStatus): """Apply this policy to a backup source status.""" elementList = sourceStatus.getDataElementList() tagList = [False] * len(elementList) for levelTagger in self.levelList: levelTagger.tagBackups(elementList, tagList) for tagPos, element in enumerate(elementList): element.markForDeletion(not tagList[tagPos]) def delete(self, sourceStatus): """Prepare the policy status data for deletions going to happen later on.""" guerillabackup-0.5.0/src/lib/guerillabackup/storagetool/PolicyTypeSize.py000066400000000000000000000240441450137035300267120ustar00rootroot00000000000000"""This module provides support for the backup size checking policy.""" import json import sys import guerillabackup.storagetool.Policy class PolicyTypeSize(guerillabackup.storagetool.Policy.Policy): """This policy type defines a policy checking file sizes of full and incremental backups. Applying the policies will also use and modify following backup data element status information fields: * FullSizeExpect: The expected size of full backups in bytes. 
* FullSizeMax: The maximum size of backups still accepted as normal. * FullSizeMin: The minimum size of backups still accepted as normal. * IncSizeExpect: The expected size of incremental backups in bytes.""" POLICY_NAME = 'Size' def __init__(self, policyConfig): """Instantiate this policy object from a JSON policy configuration definition.""" super().__init__(policyConfig) self.config = {} self.fullSizeExpect = None self.fullSizeMax = None self.fullSizeMaxRel = None self.fullSizeMin = None self.fullSizeMinRel = None self.incSizeExpect = None self.incSizeExpectRel = None self.incSizeMax = None self.incSizeMaxRel = None self.incSizeMin = None self.incSizeMinRel = None for configKey, configValue in policyConfig.items(): if configKey in ('Name', 'Priority'): continue if configKey in ('FullSizeExpect', 'FullSizeMax', 'FullSizeMin', 'IncSizeExpect', 'IncSizeMax', 'IncSizeMin'): if not isinstance(configValue, int): raise Exception( 'Policy setting "%s" has to be a integer' % configKey) if configKey == 'FullSizeExpect': self.fullSizeExpect = configValue elif configKey == 'FullSizeMax': self.fullSizeMax = configValue elif configKey == 'FullSizeMin': self.fullSizeMin = configValue elif configKey == 'IncSizeExpect': self.incSizeExpect = configValue elif configKey == 'IncSizeMax': self.incSizeMax = configValue elif configKey == 'IncSizeMin': self.incSizeMin = configValue elif configKey in ('FullSizeMaxRel', 'FullSizeMinRel', 'IncSizeExpectRel', 'IncSizeMaxRel', 'IncSizeMinRel'): if not isinstance(configValue, float): raise Exception( 'Policy setting "%s" has to be a float' % configKey) if configKey == 'FullSizeMaxRel': self.fullSizeMaxRel = configValue elif configKey == 'FullSizeMinRel': self.fullSizeMinRel = configValue elif configKey == 'IncSizeExpectRel': self.incSizeExpectRel = configValue elif configKey == 'IncSizeMaxRel': self.incSizeMaxRel = configValue elif configKey == 'IncSizeMinRel': self.incSizeMinRel = configValue else: raise Exception('Unknown policy configuration setting "%s"' % configKey) self.config[configKey] = configValue # Validate the configuration settings. Relative and absolute # settings may not be set or unset at the same time. if (self.fullSizeMax is not None) == (self.fullSizeMaxRel is not None): raise Exception() if (self.fullSizeMin is not None) == (self.fullSizeMinRel is not None): raise Exception() # Incremental settings might be missing when there are no incremental # backups expected. if (self.incSizeMax is not None) and (self.incSizeMaxRel is not None): raise Exception() if (self.incSizeMin is not None) and (self.incSizeMinRel is not None): raise Exception() if self.fullSizeExpect is not None: if self.fullSizeMaxRel is not None: self.fullSizeMax = int(self.fullSizeMaxRel * self.fullSizeExpect) if self.fullSizeMinRel is not None: self.fullSizeMin = int(self.fullSizeMinRel * self.fullSizeExpect) if self.incSizeExpectRel is not None: self.incSizeExpect = int(self.incSizeExpectRel * self.fullSizeExpect) if self.incSizeExpect is not None: if self.incSizeMaxRel is not None: self.incSizeMax = int(self.incSizeMaxRel * self.incSizeExpect) if self.incSizeMinRel is not None: self.incSizeMin = int(self.incSizeMinRel * self.incSizeExpect) def apply(self, sourceStatus): """Apply this policy to a backup source status.""" elementList = sourceStatus.getDataElementList() currentPolicy = self for elem in elementList: # This is a new policy configuration created while checking this # element. It has to be persisted before checking the next element. 
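# A minimal configuration sketch for this policy type, assuming it is
# instantiated directly: accept full backups within 20 percent of an
# expected size of 2 GiB and incremental backups between 1 MiB and
# 200 MiB.
demoSizePolicy = PolicyTypeSize({
    'Name': PolicyTypeSize.POLICY_NAME,
    'FullSizeExpect': 2 * 1024 * 1024 * 1024,
    'FullSizeMaxRel': 1.2,
    'FullSizeMinRel': 0.8,
    'IncSizeMin': 1024 * 1024,
    'IncSizeMax': 200 * 1024 * 1024})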
newConfig = None policyData = elem.getPolicyData(PolicyTypeSize.POLICY_NAME) ignoreFlag = False if policyData is not None: for key in policyData.keys(): if key not in ('Config', 'Ignore'): raise Exception( 'Policy status configuration for "%s" in ' \ '"%s" corrupted: %s' % ( elem.getElementName(), sourceStatus.getStorageStatus().getStatusFileName(), repr(policyData))) if 'Config' in policyData: # Use the additional policy data. initConfig = dict(policyData['Config']) initConfig['Name'] = PolicyTypeSize.POLICY_NAME currentPolicy = PolicyTypeSize(initConfig) ignoreFlag = policyData.get('Ignore', False) if not isinstance(ignoreFlag, bool): raise Exception( 'Policy status configuration for "%s" in ' \ '"%s" corrupted, "Ignore" parameter has to ' \ 'be boolean: %s' % ( elem.getElementName(), sourceStatus.getStorageStatus().getStatusFileName(), repr(policyData))) if elem.getType() == 'full': # First see if the current policy has already an expected # full backup size. If not use the current size. if currentPolicy.fullSizeExpect is None: newConfig = dict(currentPolicy.config) newConfig['FullSizeExpect'] = elem.getDataLength() newConfig['Name'] = PolicyTypeSize.POLICY_NAME currentPolicy = PolicyTypeSize(newConfig) elif ((elem.getDataLength() < currentPolicy.fullSizeMin) or \ (elem.getDataLength() > currentPolicy.fullSizeMax)) and \ (not ignoreFlag): print( 'Full backup size %s in source %s out of ' \ 'limits, should be %d <= %d <= %d.' % ( elem.getElementName(), sourceStatus.getSourceName(), currentPolicy.fullSizeMin, elem.getDataLength(), currentPolicy.fullSizeMax), file=sys.stderr) print( 'No interactive/automatic policy or ' \ 'status update, consider adding this ' \ 'manually to the status in %s:\n%s' % ( sourceStatus.getStorageStatus().getStatusFileName(), json.dumps( {elem.getElementName(): {'Size': {'Ignore': True}}}, indent=2) ), file=sys.stderr) else: # This is an incremental backup, so incremental interval policy # has to be defined. if currentPolicy.incSizeExpect is None: if currentPolicy.incSizeExpectRel is not None: # There was no full backup seen before any incremental one. raise Exception( 'Not expecting incremental backups before full ones') newConfig = dict(currentPolicy.config) newConfig['IncSizeExpect'] = elem.getDataLength() newConfig['Name'] = PolicyTypeSize.POLICY_NAME currentPolicy = PolicyTypeSize(newConfig) if (currentPolicy.incSizeMin is None) or \ (currentPolicy.incSizeMax is None): raise Exception( 'Incremental backups from source "%s" in ' \ 'config "%s" found but no Size policy ' \ 'for incremental data defined' % ( sourceStatus.getSourceName(), sourceStatus.getStorageStatus().getConfig().getConfigFileName())) if ((elem.getDataLength() < currentPolicy.incSizeMin) or \ (elem.getDataLength() > currentPolicy.incSizeMax)) and \ (not ignoreFlag): print( 'Incremental backup size %s in source %s out of ' \ 'limits, should be %d <= %d <= %d.' 
% ( elem.getElementName(), sourceStatus.getSourceName(), currentPolicy.incSizeMin, elem.getDataLength(), currentPolicy.incSizeMax), file=sys.stderr) print( 'No interactive/automatic policy or ' \ 'status update, consider adding this ' \ 'manually to the status in %s:\n%s' % ( sourceStatus.getStorageStatus().getStatusFileName(), json.dumps( {elem.getElementName(): {'Size': {'Ignore': True}}}, indent=2) ), file=sys.stderr) if newConfig is not None: if policyData is None: policyData = {} elem.setPolicyData(PolicyTypeSize.POLICY_NAME, policyData) policyData['Config'] = newConfig def delete(self, sourceStatus): """Prepare the policy status data for deletions going to happen later on.""" elementList = sourceStatus.getDataElementList() # Keep also track of the last persistent and the current policy. # When deleting an element with policy updates then move the # current policy data to the first element not deleted. persistentPolicyConfig = currentPolicyConfig = self.config for elem in elementList: policyData = elem.getPolicyData(PolicyTypeSize.POLICY_NAME) if (policyData is not None) and ('Config' in policyData): currentPolicyConfig = policyData['Config'] if not elem.isMarkedForDeletion(): persistentPolicyConfig = currentPolicyConfig if not elem.isMarkedForDeletion(): if persistentPolicyConfig != currentPolicyConfig: if policyData is None: policyData = {} elem.setPolicyData(PolicyTypeSize.POLICY_NAME, policyData) policyData['Config'] = currentPolicyConfig persistentPolicyConfig = currentPolicyConfig guerillabackup-0.5.0/test/000077500000000000000000000000001450137035300154665ustar00rootroot00000000000000guerillabackup-0.5.0/test/DefaultFileSystemSinkTest000077500000000000000000000037161450137035300225010ustar00rootroot00000000000000#!/usr/bin/python3 -BEsStt """This test collection attempts to verify that the DefaultFileSystemSink class works as expected.""" import sys sys.path = sys.path[1:] + ['/usr/lib/guerillabackup/lib', '/etc/guerillabackup/lib-enabled'] import hashlib import os import time import guerillabackup from guerillabackup.BackupElementMetainfo import BackupElementMetainfo testDirName = '/tmp/gb-test-%s' % time.time() print('Using %s for testing' % testDirName, file=sys.stderr) os.mkdir(testDirName, 0o700) baseDirFd = os.open(testDirName, os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW) sink = guerillabackup.DefaultFileSystemSink( {guerillabackup.DefaultFileSystemSink.SINK_BASEDIR_KEY: testDirName}) try: sinkHandle = sink.getSinkHandle('somepath/invalid') raise Exception('Illegal state') except Exception as testException: if testException.args[0] != 'Slashes not conforming': raise testException try: sinkHandle = sink.getSinkHandle('/somepath/invalid/') raise Exception('Illegal state') except Exception as testException: if testException.args[0] != 'Slashes not conforming': raise testException try: sinkHandle = sink.getSinkHandle('/somepath/../invalid') raise Exception('Illegal state') except Exception as testException: if testException.args[0] != '. and .. forbidden': raise testException try: sinkHandle = sink.getSinkHandle('/somepath/./invalid') raise Exception('Illegal state') except Exception as testException: if testException.args[0] != '. and .. 
guerillabackup-0.5.0/test/
guerillabackup-0.5.0/test/DefaultFileSystemSinkTest

#!/usr/bin/python3 -BEsStt
"""This test collection attempts to verify that the
DefaultFileSystemSink class works as expected."""

import sys
sys.path = sys.path[1:] + ['/usr/lib/guerillabackup/lib', '/etc/guerillabackup/lib-enabled']

import hashlib
import os
import time

import guerillabackup
from guerillabackup.BackupElementMetainfo import BackupElementMetainfo

testDirName = '/tmp/gb-test-%s' % time.time()
print('Using %s for testing' % testDirName, file=sys.stderr)
os.mkdir(testDirName, 0o700)
baseDirFd = os.open(testDirName, os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW)

sink = guerillabackup.DefaultFileSystemSink(
    {guerillabackup.DefaultFileSystemSink.SINK_BASEDIR_KEY: testDirName})

try:
  sinkHandle = sink.getSinkHandle('somepath/invalid')
  raise Exception('Illegal state')
except Exception as testException:
  if testException.args[0] != 'Slashes not conforming':
    raise testException

try:
  sinkHandle = sink.getSinkHandle('/somepath/invalid/')
  raise Exception('Illegal state')
except Exception as testException:
  if testException.args[0] != 'Slashes not conforming':
    raise testException

try:
  sinkHandle = sink.getSinkHandle('/somepath/../invalid')
  raise Exception('Illegal state')
except Exception as testException:
  if testException.args[0] != '. and .. forbidden':
    raise testException

try:
  sinkHandle = sink.getSinkHandle('/somepath/./invalid')
  raise Exception('Illegal state')
except Exception as testException:
  if testException.args[0] != '. and .. forbidden':
    raise testException

sinkHandle = sink.getSinkHandle('/somepath/valid')
sinkInputFd = os.open('/dev/urandom', os.O_RDONLY)
sinkTestData = os.read(sinkInputFd, 1<<16)
sinkStream = sinkHandle.getSinkStream()
digestAlgo = hashlib.sha512()
os.write(sinkStream, sinkTestData)
digestAlgo.update(sinkTestData)
metaInfo = {
    'BackupType': 'full',
    'StorageFileChecksumSha512': digestAlgo.digest(),
    'Timestamp': 1234567890}
sinkHandle.close(BackupElementMetainfo(metaInfo))

guerillabackup-0.5.0/test/LibIOTests

#!/usr/bin/python3 -BEsStt
"""This test collection attempts to verify that the library
low-level IO functions work as expected."""

import errno
import os
import sys
import time

sys.path = sys.path[1:] + ['/usr/lib/guerillabackup/lib', '/etc/guerillabackup/lib-enabled']
import guerillabackup

testDirName = '/tmp/gb-test-%s' % time.time()
os.mkdir(testDirName, 0o700)
baseDirFd = os.open(testDirName, os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW)

# Open a nonexisting file without creating it:
try:
  newFileFd = guerillabackup.secureOpenAt(
      baseDirFd, 'newfile',
      fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY,
      fileCreateMode=0o666)
  raise Exception('Illegal state')
except OSError as osError:
  if osError.errno != errno.ENOENT:
    raise Exception('Illegal state: %s' % osError)

# Open a file, creating it:
newFileFd = guerillabackup.secureOpenAt(
    baseDirFd, 'newfile',
    fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY|os.O_CREAT|os.O_EXCL,
    fileCreateMode=0o666)
os.close(newFileFd)

# Try again, now should fail as already existing:
try:
  newFileFd = guerillabackup.secureOpenAt(
      baseDirFd, 'newfile',
      fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY|os.O_CREAT|os.O_EXCL,
      fileCreateMode=0o666)
except OSError as osError:
  if osError.errno != errno.EEXIST:
    raise Exception('Illegal state: %s' % osError)

# Try to create directory and file but directory still missing:
try:
  newFileFd = guerillabackup.secureOpenAt(
      baseDirFd, 'newdir/newfile',
      dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY,
      dirCreateMode=None,
      fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY|os.O_CREAT|os.O_EXCL,
      fileCreateMode=0o666)
  raise Exception('Illegal state')
except OSError as osError:
  if osError.errno != errno.ENOENT:
    raise Exception('Illegal state: %s' % osError)

# Try to create directory and file:
newFileFd = guerillabackup.secureOpenAt(
    baseDirFd, 'newdir/newfile',
    dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY,
    dirCreateMode=0o777,
    fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY|os.O_CREAT|os.O_EXCL,
    fileCreateMode=0o666)

# Try to create only directories: A normal open call would create
# a file, that could not be reopened using the same flags.
newFileFd = guerillabackup.secureOpenAt(
    baseDirFd, 'newdir/subdir',
    dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY,
    dirCreateMode=0o777,
    fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY|os.O_CREAT|os.O_EXCL|os.O_DIRECTORY,
    fileCreateMode=0o777)
os.close(newFileFd)

newFileFd = guerillabackup.secureOpenAt(
    baseDirFd, 'newdir/subdir',
    dirOpenFlags=os.O_RDONLY|os.O_DIRECTORY|os.O_NOFOLLOW|os.O_NOCTTY,
    dirCreateMode=0o777,
    fileOpenFlags=os.O_RDONLY|os.O_NOFOLLOW|os.O_NOCTTY|os.O_DIRECTORY,
    fileCreateMode=0o777)

print('No recursive testdir cleanup for %s' % testDirName, file=sys.stderr)

guerillabackup-0.5.0/test/LogfileBackupUnitTest/
guerillabackup-0.5.0/test/LogfileBackupUnitTest/LogfileBackupUnit.config

# LogFileBackupUnit configuration template

# This list contains tuples with five elements per logfile backup
# input. The meaning of each value is:
# * Input directory: absolute directory name to search for logfiles.
# * Input file regex: regular expression to select compressed
# or uncompressed logfiles for inclusion. When the regex contains
# a named group "oldserial", a file with an empty serial is handled
# as the newest while the file with the largest serial value is the
# oldest. With the named group "serial", the oldest file will have
# the smallest serial number, e.g. with date or timestamp file
# extensions. When a named group "compress" is found, the match
# content, e.g. "gz" or "bz2", will be used to find a decompressor
# and decompress the file before processing.
# * Source URL transformation: If None, the first named group
# of the "input file regex" is appended to the input directory
# name and used as source URL. When not starting with a "/",
# the transformation string is the name to include literally
# in the URL after the "input directory" name.
# * Policy: If not None, include this string as handling policy
# within the manifest.
# * Encryption key name: If not None, encrypt the input using
# the named key.
LogBackupUnitInputList = []

# Include old (rotated) default syslog files, where the serial number
# was already appended. Accept also the compressed variants.
LogBackupUnitInputList.append((
    '[TmpDir]/logs',
    '^(test\\.log)\\.(?P<oldserial>[0-9]+)(?:\\.(?P<compress>gz))?$',
    None, None, None))
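
# Illustrative example only (the path and pattern below are made up,
# not part of the test setup): logfiles rotated with a date suffix,
# e.g. "app.log-20230901" or "app.log-20230901.gz", could be matched
# using the "serial" named group for ordering and the "compress"
# group for decompression before processing:
# LogBackupUnitInputList.append((
#     '/var/log/example',
#     '^(app\\.log)-(?P<serial>[0-9]{8})(?:\\.(?P<compress>gz))?$',
#     None, None, None))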
guerillabackup-0.5.0/test/LogfileBackupUnitTest/Readme.txt

Description:
============

This directory contains configuration for logfile backup unit
testing.

Test invocation:
================

projectBaseDir="... directory with GuerillaBackup source ..."
tmpDir="$(mktemp -d)"
mkdir -- "${tmpDir}/config" "${tmpDir}/data" "${tmpDir}/sink" "${tmpDir}/logs"
cp -a -- "${projectBaseDir}/test/LogfileBackupUnitTest/config" "${tmpDir}/config"
ln -s -- "${projectBaseDir}/src/lib/guerillabackup/LogfileBackupUnit.py" "${tmpDir}/config/units/LogfileBackupUnit"
cp -a -- "${projectBaseDir}/test/LogfileBackupUnitTest/LogfileBackupUnit.config" "${tmpDir}/config/units"
sed -i -r -e "s:\[TmpDir\]:${tmpDir}:g" -- "${tmpDir}/config/config" "${tmpDir}/config/units/LogfileBackupUnit.config"

echo 0 > "${tmpDir}/logs/test.log"
dd if=/dev/zero bs=1M count=32 > "${tmpDir}/logs/test.log.1"
dd if=/dev/zero bs=1M count=32 | gzip -c9 > "${tmpDir}/logs/test.log.2.gz"
cp -a -- "${tmpDir}/logs/test.log.2.gz" "${tmpDir}/logs/test.log.3.gz"
cp -a -- "${tmpDir}/logs/test.log.2.gz" "${tmpDir}/logs/test.log.4.gz"
cp -a -- "${tmpDir}/logs/test.log.2.gz" "${tmpDir}/logs/test.log.5.gz"
cp -a -- "${tmpDir}/logs/test.log.2.gz" "${tmpDir}/logs/test.log.6.gz"
cp -a -- "${tmpDir}/logs/test.log.2.gz" "${tmpDir}/logs/test.log.7.gz"

echo "Starting LogfileBackupUnit testing in ${tmpDir}"
"${projectBaseDir}/src/gb-backup-generator" --ConfigDir "${tmpDir}/config"

sed -i -r -e "s:\[[0-9]+,:[123,:g" -- "${tmpDir}/state/generators/LogfileBackupUnit/state.current"

guerillabackup-0.5.0/test/LogfileBackupUnitTest/config

# GuerillaBackup main configuration file.

# General parameters influence behavior of various backup elements,
# e.g. source units, sinks or the generator itself. All those
# parameters start with "General" to indicate their global relevance.

# This is the default persistency storage base directory for all
# components. All components will create files or subdirectories
# starting with the component class name unless changed within
# the configuration. See also "ComponentPersistencyPrefix" in unit
# or subunit configuration files.
GeneralPersistencyBaseDir = '[TmpDir]/state'

# This is the default runtime data directory for all components.
# It is used to create sockets, PID files and similar, that need
# not be preserved between reboots.
GeneralRuntimeDataDir = '[TmpDir]/run'

# This parameter defines the default pipeline element to use to
# compress backup data of any kind before sending it to downstream
# processing, usually encryption or the sink. When enabling compression
# and encryption, you may want to disable the additional compression
# step included in many encryption tools, e.g. via "--compress-algo
# none" in gpg.
GeneralDefaultCompressionElement = guerillabackup.OSProcessPipelineElement('/bin/bzip2', ['/bin/bzip2', '-c9'])

# This parameter defines the default encryption pipeline element
# to use to encrypt backup data of any kind before sending it
# to downstream processing. For security reasons, a unit might
# use an alternative encryption element, e.g. with different options
# or keys, but it should NEVER ignore the parameter, even when
# unit-specific encryption is disabled. Hence the unit shall never
# generate unencrypted data while this parameter is not also
# overridden in the unit-specific configuration. See also the
# "getDefaultDownstreamPipeline" function documentation.
# GeneralDefaultEncryptionElement = guerillabackup.GpgEncryptionPipelineElement('some key name')

# Debugging settings:

# This flag enables test mode for all configurable components
# in the data pipe from source to sink. As testing of most features
# will require running real backups, the testing mode will cause
# an abort at the very last moment before completion. Well-behaved
# components will roll back most of their actions under these
# circumstances.
# GeneralDebugTestModeFlag = False

# Generator specific settings: Those settings configure the local
# default backup generator.

# Use this sink for storage of backup data elements. The class
# has to have a constructor taking only one argument, that is
# the generator configuration context as defined by the SinkInterface.
# When empty, the guerillabackup.DefaultFileSystemSink is used.
# GeneratorSinkClass = guerillabackup.DefaultFileSystemSink

# Use this directory for storage of the backup data elements generated
# locally. The default location is "/var/lib/guerillabackup/data".
# You may also want to enable transfer services using this directory
# as source to copy or move backup data to an offsite location.
DefaultFileSystemSinkBaseDir = '[TmpDir]/sink'

# Unit specific default and specific settings can be found in
# the units directory.

# Transfer service configuration: this part of the configuration does
# not take effect automatically, a transfer service has to be
# started loading this configuration file. When security considerations
# prohibit use of the same configuration, e.g. due to inaccessibility
# of the configuration file because of permission settings, then this
# file should be copied to "config-[agent name]" instead.

# Storage directory used by this transfer service. When not present,
# the DefaultFileSystemSinkBaseDir is used instead.
# TransferServiceStorageBaseDir = '/var/spool/guerillabackup/transfer'

# Class to load to define the transfer receiver policy.
# TransferReceiverPolicyClass = guerillabackup.Transfer.ReceiverStoreDataTransferPolicy

# Arguments for creating the named transfer policy to pass after
# the configuration context.
# TransferReceiverPolicyInitArgs = None

# Class to load to define the transfer sender policy.
# TransferSenderPolicyClass = guerillabackup.Transfer.SenderMoveDataTransferPolicy

# Arguments for creating the named transfer policy to pass after
# the configuration context.
# TransferSenderPolicyInitArgs = [False]

guerillabackup-0.5.0/test/ReceiverOnlyTransferService/
guerillabackup-0.5.0/test/ReceiverOnlyTransferService/Readme.txt

Description:
============

This directory contains a transfer service implementation with
a test receiver only transfer configuration. It just listens on
an input socket, which has to be connected externally.

Transfer invocation:
====================

projectBaseDir="... directory with GuerillaBackup source ..."
tmpDir="$(mktemp -d)"
mkdir -- "${tmpDir}/config" "${tmpDir}/data"
cp -a -- "${projectBaseDir}/test/ReceiverOnlyTransferService/config" "${tmpDir}/config"
sed -i -r -e "s:\[TmpDir\]:${tmpDir}:g" -- "${tmpDir}/config/config"

echo "Listening on socket ${tmpDir}/run/transfer.socket"
"${projectBaseDir}/src/gb-transfer-service" --Config "${tmpDir}/config/config"

Connect the gb-transfer-service to an instance with a sending
policy, e.g. see the SenderOnlyTransferService testcase.

socat "UNIX-CONNECT:${tmpDir}/run/transfer.socket" "UNIX-CONNECT:...other socket"

Terminate the gb-transfer-service using [Ctrl]-C and check that
backups were transferred as expected.
ls -al -- "${tmpDir}/data"

guerillabackup-0.5.0/test/ReceiverOnlyTransferService/config

# GuerillaBackup main configuration file.

# General parameters influence behavior of various backup elements,
# e.g. source units, sinks or the generator itself. All those
# parameters start with "General" to indicate their global relevance.

# This is the default persistency storage base directory for all
# components. All components will create files or subdirectories
# starting with the component class name unless changed within
# the configuration. See also "ComponentPersistencyPrefix" in unit
# or subunit configuration files.
GeneralPersistencyBaseDir = '[TmpDir]/state'

# This is the default runtime data directory for all components.
# It is used to create sockets, PID files and similar, that need
# not be preserved between reboots.
GeneralRuntimeDataDir = '[TmpDir]/run'

# This parameter defines the default pipeline element to use to
# compress backup data of any kind before sending it to downstream
# processing, usually encryption or the sink. When enabling compression
# and encryption, you may want to disable the additional compression
# step included in many encryption tools, e.g. via "--compress-algo
# none" in gpg.
GeneralDefaultCompressionElement = guerillabackup.OSProcessPipelineElement(
    '/bin/bzip2', ['/bin/bzip2', '-c9'])

# This parameter defines the default encryption pipeline element
# to use to encrypt backup data of any kind before sending it
# to downstream processing. For security reasons, a unit might
# use an alternative encryption element, e.g. with different options
# or keys, but it should NEVER ignore the parameter, even when
# unit-specific encryption is disabled. Hence the unit shall never
# generate unencrypted data while this parameter is not also
# overridden in the unit-specific configuration. See also the
# "getDefaultDownstreamPipeline" function documentation.
# GeneralDefaultEncryptionElement = guerillabackup.GpgEncryptionPipelineElement('some key name')

# Debugging settings:

# This flag enables test mode for all configurable components
# in the data pipe from source to sink. As testing of most features
# Transfer service configuration: this part of configuration does # not take effect automatically, a transfer service has to be # started loading this configuration file. When security considerations # prohibit use of same configuration, e.g. due to inaccessibility # of configuration file because of permission settings, then this # file should be copied to "config-[agent name]" instead. # Storage directory used by this transfer service. When not present, # the DefaultFileSystemSinkBaseDir is used instead. # TransferServiceStorageBaseDir = '/var/spool/guerillabackup/transfer' # Class to load to define the transfer receiver policy. TransferReceiverPolicyClass = guerillabackup.Transfer.ReceiverStoreDataTransferPolicy # Arguments for creating the named transfer policy to pass after # the configuration context. # TransferReceiverPolicyInitArgs = None # Class to load to define the transfer sender policy. # TransferSenderPolicyClass = None # Arguments for creating the named transfer policy to pass after # the configuration context. # TransferSenderPolicyInitArgs = None guerillabackup-0.5.0/test/SenderOnlyTransferService/000077500000000000000000000000001450137035300225765ustar00rootroot00000000000000guerillabackup-0.5.0/test/SenderOnlyTransferService/Readme.txt000066400000000000000000000037231450137035300245410ustar00rootroot00000000000000Description: ============ This directory contains a transfer service implementation with a test backup generator adding one simple tar backup every minute and a transfer service configuration to send those. Generator invocation: ===================== projectBaseDir="... directory with GuerillaBackup source ..." tmpDir="$(mktemp -d)" mkdir -- "${tmpDir}/config" "${tmpDir}/data" "${tmpDir}/log" echo "Testlogdata" > "${tmpDir}/log/test.log.0" cp -a -- "${projectBaseDir}/test/SenderOnlyTransferService/config" "${projectBaseDir}/test/SenderOnlyTransferService/units" "${tmpDir}/config" sed -i -r -e "s:\[TmpDir\]:${tmpDir}:g" -- "${tmpDir}/config/config" "${tmpDir}/config/units/LogfileBackupUnit.config" "${tmpDir}/config/units/TarBackupUnit.config" ln -s -- "${projectBaseDir}/src/lib/guerillabackup/LogfileBackupUnit.py" "${tmpDir}/config/units/LogfileBackupUnit" ln -s -- "${projectBaseDir}/src/lib/guerillabackup/TarBackupUnit.py" "${tmpDir}/config/units/TarBackupUnit" "${projectBaseDir}/src/gb-backup-generator" --ConfigDir "${tmpDir}/config" Terminate the generator using [Ctrl]-C and check, that backups were created. ls -alR -- "${tmpDir}/data" To test data corruption handling, append a byte to one of the data files. echo "corrupted!" >> "${tmpDir}/data/.....data" gb-transfer-service invocation: ============================ Start the service: echo "Listening on socket ${tmpDir}/run/transfer.socket" "${projectBaseDir}/src/gb-transfer-service" --Config "${tmpDir}/config/config" Send test requests using the fake client: IO-handling is simplified, so just press return on empty lines until expected response was received. "${projectBaseDir}/test/SyncProtoTestClient" "${tmpDir}/run/transfer.socket" send Rnull send R send S["getPolicyInfo"] send S["startTransaction", null] send S["nextDataElement", false] send S["getDataElementInfo"] send S["getDataElementStream"] send S["nextDataElement", true] ... send S Normal transfer client test: See ReceiverOnlyTransferService guerillabackup-0.5.0/test/SenderOnlyTransferService/config000066400000000000000000000104171450137035300237710ustar00rootroot00000000000000# GuerillaBackup main configuration file. 
guerillabackup-0.5.0/test/SenderOnlyTransferService/config

# GuerillaBackup main configuration file.

# General parameters influence behavior of various backup elements,
# e.g. source units, sinks or the generator itself. All those
# parameters start with "General" to indicate their global relevance.

# This is the default persistency storage base directory for all
# components. All components will create files or subdirectories
# starting with the component class name unless changed within
# the configuration. See also "ComponentPersistencyPrefix" in unit
# or subunit configuration files.
GeneralPersistencyBaseDir = '[TmpDir]/state'

# This is the default runtime data directory for all components.
# It is used to create sockets, PID files and similar, that need
# not be preserved between reboots.
GeneralRuntimeDataDir = '[TmpDir]/run'

# This parameter defines the default pipeline element to use to
# compress backup data of any kind before sending it to downstream
# processing, usually encryption or the sink. When enabling compression
# and encryption, you may want to disable the additional compression
# step included in many encryption tools, e.g. via "--compress-algo
# none" in gpg.
GeneralDefaultCompressionElement = guerillabackup.OSProcessPipelineElement(
    '/bin/bzip2', ['/bin/bzip2', '-c9'])

# This parameter defines the default encryption pipeline element
# to use to encrypt backup data of any kind before sending it
# to downstream processing. For security reasons, a unit might
# use an alternative encryption element, e.g. with different options
# or keys, but it should NEVER ignore the parameter, even when
# unit-specific encryption is disabled. Hence the unit shall never
# generate unencrypted data while this parameter is not also
# overridden in the unit-specific configuration. See also the
# "getDefaultDownstreamPipeline" function documentation.
# GeneralDefaultEncryptionElement = guerillabackup.GpgEncryptionPipelineElement('some key name')

# Debugging settings:

# This flag enables test mode for all configurable components
# in the data pipe from source to sink. As testing of most features
# will require running real backups, the testing mode will cause
# an abort at the very last moment before completion. Well-behaved
# components will roll back most of their actions under these
# circumstances.
# GeneralDebugTestModeFlag = False

# Generator specific settings: Those settings configure the local
# default backup generator.

# Use this sink for storage of backup data elements. The class
# has to have a constructor taking only one argument, that is
# the generator configuration context as defined by the SinkInterface.
# When empty, the guerillabackup.DefaultFileSystemSink is used.
# GeneratorSinkClass = guerillabackup.DefaultFileSystemSink

# Use this directory for storage of the backup data elements generated
# locally. Usually this is "/var/lib/guerillabackup/data" when used
# for temporary local storage (local disk-to-disk, e.g. to have
# older versions to recover e.g. after admin errors or failed
# system updates) or "/var/spool/guerillabackup/outgoing" when
# backup data should be transferred to a different location using
# asynchronous fetch operations.
DefaultFileSystemSinkBaseDir = '[TmpDir]/data'

# Unit specific default and specific settings can be found in
# the units directory.

# Transfer service configuration: this part of the configuration does
# not take effect automatically, a transfer service has to be
# started loading this configuration file. When security considerations
# prohibit use of the same configuration, e.g. due to inaccessibility
# of the configuration file because of permission settings, then this
# file should be copied to "config-[agent name]" instead.

# Storage directory used by this transfer service. When not present,
# the DefaultFileSystemSinkBaseDir is used instead.
# TransferServiceStorageBaseDir = '/var/spool/guerillabackup/transfer'

# Class to load to define the transfer receiver policy.
# TransferReceiverPolicyClass = None

# Arguments for creating the named transfer policy to pass after
# the configuration context.
# TransferReceiverPolicyInitArgs = None

# Class to load to define the transfer sender policy.
TransferSenderPolicyClass = guerillabackup.Transfer.SenderMoveDataTransferPolicy

# Arguments for creating the named transfer policy to pass after
# the configuration context.
TransferSenderPolicyInitArgs = [False]

guerillabackup-0.5.0/test/SenderOnlyTransferService/units/
guerillabackup-0.5.0/test/SenderOnlyTransferService/units/LogfileBackupUnit.config

# LogFileBackupUnit configuration template

# This list contains tuples with five elements per logfile backup
# input. The meaning of each value is:
# * Input directory: absolute directory name to search for logfiles.
# * Input file regex: regular expression to select compressed
# or uncompressed logfiles for inclusion. When the regex contains
# a named group "oldserial", a file with an empty serial is handled
# as the newest while the file with the largest serial value is the
# oldest. With the named group "serial", the oldest file will have
# the smallest serial number, e.g. with date or timestamp file
# extensions. When a named group "compress" is found, the match
# content, e.g. "gz" or "bz2", will be used to find a decompressor
# and decompress the file before processing.
# * Source URL transformation: If None, the first named group
# of the "input file regex" is appended to the input directory
# name and used as source URL. When not starting with a "/",
# the transformation string is the name to include literally
# in the URL after the "input directory" name.
# * Policy: If not None, include this string as handling policy
# within the manifest.
# * Encryption key name: If not None, encrypt the input using
# the named key.
LogBackupUnitInputList = []

# Include old (rotated) default syslog files, where the serial number
# was already appended. Accept also the compressed variants.
LogBackupUnitInputList.append((
    '[TmpDir]/log',
    '^([a-z.-]+)\\.(?P<oldserial>[0-9]+)(?:\\.(?P<compress>gz))?$',
    None, None, None))

guerillabackup-0.5.0/test/SenderOnlyTransferService/units/TarBackupUnit.config

# TarBackupUnit configuration template

# This list contains dictionaries with configuration parameters
# for each tar backup to run. All tar backups of one unit are
# run sequentially. Configuration parameters are:
# * PreBackupCommand: execute this command given as a list of arguments
# before starting the backup, e.g. create a filesystem or virtual
# machine snapshot, perform cleanup.
# * PostBackupCommand: execute this command after finishing the
# backup.
# * Root: root directory of the tar backup, "/" when missing.
# * Include: list of paths to include, ["."] when missing.
# * Exclude: list of patterns to exclude from backup (see tar
# documentation "--exclude"). When missing and Root is "/", the
# list ["./var/lib/guerillabackup/data"] is used.
# * IgnoreBackupRaces: flag to indicate if races during backup
# are acceptable, e.g. because the directories are modified.
# * FullBackupTiming: tuple with minimum and maximum interval
# between full backup invocations and modulo base and offset,
# all in seconds. Without modulo invocation (all values None),
# full backups will run as soon as the minimum interval is exceeded.
# With modulo timing, the modulo trigger is ignored when below the
# minimum time. When the gap is above the maximum interval, an
# immediate backup is started.
# * IncBackupTiming: When set, incremental backups are created
# to fill the time between full backups. Timings are specified
# as a tuple with the same meaning as in the FullBackupTiming
# parameter. This will also trigger generation of tar file indices
# when running full backups.
# * FullOverrideCommand: when set, the parameters Exclude, Include,
# Root are ignored and exactly the given command is executed.
# * IncOverrideCommand: when set, the parameters Exclude, Include,
# Root are ignored and exactly the given command is executed.
# * KeepIndices: number of old incremental tar backup indices
# to keep. With -1 keep all, otherwise keep only the given number.
# Default is 0.
# * Policy: If not None, include this string as handling policy
# within the manifest.
# * EncryptionKey: If not None, encrypt the input using the named
# key. Otherwise the default encryption key from the global
# configuration might be used.
TarBackupUnitConfigList = {}

TarBackupUnitConfigList['/test'] = {
    'Root': '[TmpDir]',
    'Include': ['.'],
    'Exclude': ['./data'],
    'IgnoreBackupRaces': False,
    # Create a full backup every 10 minutes.
    'FullBackupTiming': [570, 630, 600, 0],
    # Create an incremental backup every minute.
    'IncBackupTiming': [55, 65, 60, 0],
    'KeepIndices': 20,
    'Policy': 'default',
    'EncryptionKey': None}
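
# Illustrative example only (the '/etc-daily' key and all values below
# are made up): a second job creating one full backup of /etc per day,
# aligned to midnight via the modulo base/offset fields and without
# incremental backups in between.
# TarBackupUnitConfigList['/etc-daily'] = {
#     'Root': '/etc',
#     'Include': ['.'],
#     'IgnoreBackupRaces': False,
#     'FullBackupTiming': [82800, 90000, 86400, 0],
#     'Policy': 'default',
#     'EncryptionKey': None}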
guerillabackup-0.5.0/test/SyncProtoTestClient

#!/usr/bin/python3 -BEsStt
"""This client connects to a sync service, sends requests read
from stdin and prints the responses. This can be used for testing
of StreamRequestResponseMultiplexer from guerillabackup.Transfer.
See the source of StreamRequestResponseMultiplexer for a description
of the protocol structure."""

import sys
sys.path = sys.path[1:] + ['/usr/lib/guerillabackup/lib', '/etc/guerillabackup/lib-enabled']

import errno
import fcntl
import os
import socket
import struct

if len(sys.argv) != 2:
  print('Usage %s [target]' % sys.argv[0], file=sys.stderr)
  sys.exit(1)
connectAddress = sys.argv[1]

clientSocket = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
clientSocket.connect(connectAddress)
print('Connected to %s' % repr(connectAddress), file=sys.stderr)
flags = fcntl.fcntl(clientSocket.fileno(), fcntl.F_GETFL)
fcntl.fcntl(clientSocket.fileno(), fcntl.F_SETFL, flags|os.O_NONBLOCK)

remoteData = b''
while True:
  readData = None
  try:
    readData = clientSocket.recv(1<<20)
  except socket.error as receiveError:
    if receiveError.errno == errno.EAGAIN:
      readData = b''
    else:
      raise
  if len(readData) != 0:
    print('Received %d bytes of remote data' % len(readData), file=sys.stderr)
    remoteData += readData
    if len(remoteData) >= 5:
      if remoteData[0] not in b'APRS':
        print('Invalid remote data package type %s, purging data %s' % (
            repr(remoteData[0]), repr(remoteData)), file=sys.stderr)
        remoteData = b''
      else:
        remoteDataLength = struct.unpack('<I', remoteData[1:5])[0]
        if remoteDataLength > (1<<20):
          print('Invalid remote data length %d, purging data %s' % (
              remoteDataLength, repr(remoteData)), file=sys.stderr)
          remoteData = b''
        elif remoteDataLength+5 <= len(remoteData):
          print('Received valid packet %s' % repr(
              remoteData[0:1]+remoteData[5:5+remoteDataLength]),
              file=sys.stderr)
          remoteData = remoteData[5+remoteDataLength:]
          # Try again to read more data.
          continue

  # No remote data to dump, try to read a command.
  commandLine = sys.stdin.readline()
  if commandLine == '':
    # End of input.
    break
  commandLine = commandLine[:-1]
  if commandLine == '':
    continue
  commandLength = commandLine.find(' ')
  if commandLength < 0:
    commandLength = len(commandLine)
  command = commandLine[:commandLength]
  if command == 'send':
    sendData = bytes(commandLine[commandLength+1:], sys.getdefaultencoding())
    if (len(sendData) == 0) or (sendData[0] not in b'APRS'):
      print('Send data has to start with type letter, optionally ' \
          'followed by data %s' % repr(sendData), file=sys.stderr)
      continue
    sendData = sendData[0:1]+struct.pack('