././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1729007861.4743865 borgstore-0.1.0/0000755000076500000240000000000014703510365012077 5ustar00twstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1729007318.0 borgstore-0.1.0/CHANGES.rst0000644000076500000240000000455314703507326013713 0ustar00twstaffChangeLog ========= Version 0.1.0 2024-10-15 ------------------------ Breaking changes: - accepted store URLs: see README - Store: require complete levels configuration, #46 Other changes: - sftp/posixfs backends: remove ad-hoc mkdir calls, #46 - optimize Sftp._mkdir, #80 - sftp backend is now optional, avoids dependency issues on some platforms, #74. Use pip install "borgstore[sftp]" to install with the sftp backend. Version 0.0.5 2024-10-01 ------------------------ Fixes: - backend.create: only reject non-empty storage, #57 - backends.sftp: fix _mkdir edge case - backends.sftp: raise BackendDoesNotExist if base path is not found - rclone backend: - don't error on create if source directory is empty, #57 - fix hang on termination, #54 New features: - rclone backend: retry errors on load and store 3 times Other changes: - remove MStore for now, see commit 6a6fb334. - refactor Store tests, add Store.set_levels method - move types-requests to tox.ini, only needed for development Version 0.0.4 2024-09-22 ------------------------ - rclone: new backend to access any of the 100s of cloud backends rclone supports, needs rclone >= v1.57.0. See the rclone docs for installing rclone and creating remotes. After that, borgstore will support URLs like: - rclone://remote: - rclone://remote:path - rclone:///tmp/testdir (local fs, for testing) - Store.list: give up trying to do anything with a directory's "size" - .info / .list: return st.st_size for a directory "as is" - tests: BORGSTORE_TEST_RCLONE_URL to set rclone test URL - tests: allow BORGSTORE_TEST_*_URL into testenv to make tox work for testing sftp, rclone or other URLs. Version 0.0.3 2024-09-17 ------------------------ - sftp: add support for ~/.ssh/config, #37 - sftp: username is optional, #27 - load known_hosts, remove AutoAddPolicy, #39 - store: raise BE specific exceptions, #34 - add Store.stats property, #25 - bandwidth emulation via BORGSTORE_BANDWIDTH [bit/s], #24 - latency emulation via BORGSTORE_LATENCY [us], #24 - fix demo code, also output stats - tests: BORGSTORE_TEST_SFTP_URL to set sftp test URL Version 0.0.2 2024-09-10 ------------------------ - sftp backend: use paramiko's client.posix_rename, #17 - posixfs backend: hack: accept file://relative/path, #23 - support / test on Python 3.13, #21 Version 0.0.1 2024-08-23 ------------------------ First PyPi release. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1724149610.0 borgstore-0.1.0/LICENSE.rst0000644000076500000240000000271214661067552013725 0ustar00twstaffCopyright (C) 2024 Thomas Waldmann All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. 
The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1729007861.4741857 borgstore-0.1.0/PKG-INFO0000644000076500000240000001650714703510365013205 0ustar00twstaffMetadata-Version: 2.1 Name: borgstore Version: 0.1.0 Summary: key/value store Author-email: Thomas Waldmann License: BSD Project-URL: Homepage, https://github.com/borgbackup/borgstore Keywords: kv,key/value,store Classifier: Development Status :: 3 - Alpha Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: BSD License Classifier: Operating System :: POSIX Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3.9 Classifier: Programming Language :: Python :: 3.10 Classifier: Programming Language :: Python :: 3.11 Classifier: Programming Language :: Python :: 3.12 Classifier: Programming Language :: Python :: 3.13 Classifier: Topic :: Software Development :: Libraries Classifier: Topic :: Software Development :: Libraries :: Python Modules Requires-Python: >=3.9 Description-Content-Type: text/x-rst License-File: LICENSE.rst Requires-Dist: requests>=2.25.1 Provides-Extra: sftp Requires-Dist: paramiko>=1.9.1; extra == "sftp" BorgStore ========= A key/value store implementation in Python, supporting multiple backends. Keys ---- A key (str) can look like: - 0123456789abcdef... (usually a long, hex-encoded hash value) - Any other pure ASCII string without "/" or ".." or " ". Namespaces ---------- To keep stuff apart, keys should get prefixed with a namespace, like: - config/settings - meta/0123456789abcdef... - data/0123456789abcdef... Please note: 1. you should always use namespaces. 2. nested namespaces like namespace1/namespace2/key are not supported. 3. the code could work without a namespace (namespace ""), but then you can't add another namespace later, because then you would have created nested namespaces. Values ------ Values can be any arbitrary binary data (bytes). Store Operations ---------------- The high-level Store API implementation transparently deals with nesting and soft deletion, so the caller doesn't have to care much for that and the Backend API can be much simpler: - create/destroy: initialize or remove the whole store. - list: flat list of the items in the given namespace, with or without soft deleted items. - store: write a new item into the store (giving its key/value pair) - load: read a value from the store (giving its key), partial loads giving offset and/or size are supported. - info: get information about an item via its key (exists? size? ...) 
- delete: immediately remove an item from the store (giving its key) - move: implements rename, soft delete / undelete, move to current nesting level - stats: api call counters, time spent in api methods, data volume/throughput - latency/bandwidth emulator: can emulate higher latency (via BORGSTORE_LATENCY [us]) and lower bandwidth (via BORGSTORE_BANDWIDTH [bit/s]) than what is actually provided by the backend. Automatic Nesting ----------------- For the Store user, items have names like e.g.: namespace/0123456789abcdef... namespace/abcdef0123456789... If there are very many items in the namespace, this could lead to scalability issues in the backend, thus the Store implementation offers transparent nesting, so that internally the Backend API will be called with names like e.g.: namespace/01/23/56/0123456789abcdef... namespace/ab/cd/ef/abcdef0123456789... The nesting depth can be configured from 0 (= no nesting) to N levels and there can be different nesting configurations depending on the namespace. The Store supports operating at different nesting levels in the same namespace at the same time. When using nesting depth > 0, the backends will assume that keys are hashes (have hex digits) because some backends will want to pre-create the nesting directories at backend initialization time to optimize for better performance while using the backend. Soft deletion ------------- To soft delete an item (so its value could be still read or it could be undeleted), the store just renames the item, appending ".del" to its name. Undelete reverses this by removing the ".del" suffix from the name. Some store operations have a boolean flag "deleted" to choose whether they shall consider soft deleted items. Backends -------- The backend API is rather simple, one only needs to provide some very basic operations. Existing backends are listed below, more might come in future. posixfs ~~~~~~~ Use storage on a local POSIX filesystem: - URL: ``file:///absolute/path`` - it is the caller's task to create an absolute fs path from a relative one. - namespaces: directories - values: in key-named files - pre-creates nesting directories sftp ~~~~ Use storage on a sftp server: - URL: ``sftp://user@server:port/relative/path`` (strongly recommended) For user's and admin's convenience, mapping the URL path to the server fs path depends on the server configuration (home directory, sshd/sftpd config, ...). Usually the path is relative to the user's home directory. - URL: ``sftp://user@server:port//absolute/path`` As this uses an absolute path, things are more difficult here: - user's config might break if server admin moves a user home to a new location. - users must know the full absolute path of space they have permission to use. - namespaces: directories - values: in key-named files - pre-creates nesting directories rclone ~~~~~~ Use storage on any of the many cloud providers `rclone `_ supports: - URL: ``rclone:remote:path``, we just prefix "rclone:" and give all to the right of that to rclone, see: https://rclone.org/docs/#syntax-of-remote-paths - implementation of this primarily depends on the specific remote. Scalability ----------- - Count of key/value pairs stored in a namespace: automatic nesting is provided for keys to address common scalability issues. - Key size: there are no special provisions for extremely long keys (like: more than backend limitations). Usually this is not a problem though. 
- Value size: there are no special provisions for dealing with large value sizes (like: more than free memory, more than backend storage limitations, etc.). If one deals with very large values, one usually cuts them into chunks before storing them into the store. - Partial loads improve performance by avoiding a full load if only a part of the value is needed (e.g. a header with metadata). Installation ------------ Install without the ``sftp:`` backend:: pip install borgstore Install with the ``sftp:`` backend (more dependencies):: pip install "borgstore[sftp]" Please note that ``rclone:`` also supports sftp remotes. Want a demo? ------------ Run this to get instructions how to run the demo: python3 -m borgstore State of this project --------------------- **API is still unstable and expected to change as development goes on.** **As long as the API is unstable, there will be no data migration tools, like e.g. for upgrading an existing store's data to a new release.** There are tests and they succeed for the basic functionality, so some of the stuff is already working well. There might be missing features or optimization potential, feedback welcome! There are a lot of possible, but still missing backends. If you want to create and support one: pull requests are welcome. Borg? ----- Please note that this code is currently **not** used by the stable release of BorgBackup (aka "borg"), but only by borg2 beta 10+ and master branch. License ------- BSD license. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1729007176.0 borgstore-0.1.0/README.rst0000644000076500000240000001442714703507110013570 0ustar00twstaffBorgStore ========= A key/value store implementation in Python, supporting multiple backends. Keys ---- A key (str) can look like: - 0123456789abcdef... (usually a long, hex-encoded hash value) - Any other pure ASCII string without "/" or ".." or " ". Namespaces ---------- To keep stuff apart, keys should get prefixed with a namespace, like: - config/settings - meta/0123456789abcdef... - data/0123456789abcdef... Please note: 1. you should always use namespaces. 2. nested namespaces like namespace1/namespace2/key are not supported. 3. the code could work without a namespace (namespace ""), but then you can't add another namespace later, because then you would have created nested namespaces. Values ------ Values can be any arbitrary binary data (bytes). Store Operations ---------------- The high-level Store API implementation transparently deals with nesting and soft deletion, so the caller doesn't have to care much for that and the Backend API can be much simpler: - create/destroy: initialize or remove the whole store. - list: flat list of the items in the given namespace, with or without soft deleted items. - store: write a new item into the store (giving its key/value pair) - load: read a value from the store (giving its key), partial loads giving offset and/or size are supported. - info: get information about an item via its key (exists? size? ...) - delete: immediately remove an item from the store (giving its key) - move: implements rename, soft delete / undelete, move to current nesting level - stats: api call counters, time spent in api methods, data volume/throughput - latency/bandwidth emulator: can emulate higher latency (via BORGSTORE_LATENCY [us]) and lower bandwidth (via BORGSTORE_BANDWIDTH [bit/s]) than what is actually provided by the backend. 
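
For a quick impression of how these operations are used together, here is a minimal
sketch (the ``file:`` URL and the levels configuration are only examples; see
``python3 -m borgstore`` / ``src/borgstore/__main__.py`` for the full demo)::

    from borgstore.store import Store

    # nesting levels per namespace: flat "config/", 2 levels for "data/"
    store = Store(url="file:///tmp/borgstore_demo", levels={"config/": [0], "data/": [2]})
    store.create()                                       # initialize the storage
    with store:
        store.store("config/settings", b"value = 42")    # write an item
        value = store.load("config/settings")            # read it back
        names = list(store.list("config"))               # flat list of the namespace
        store.move("config/settings", delete=True)       # soft delete (keeps a ".del" item)
        print(store.stats)                               # call counters, times, volume
    store.destroy()                                      # completely remove the storage
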
Automatic Nesting ----------------- For the Store user, items have names like e.g.: namespace/0123456789abcdef... namespace/abcdef0123456789... If there are very many items in the namespace, this could lead to scalability issues in the backend, thus the Store implementation offers transparent nesting, so that internally the Backend API will be called with names like e.g.: namespace/01/23/56/0123456789abcdef... namespace/ab/cd/ef/abcdef0123456789... The nesting depth can be configured from 0 (= no nesting) to N levels and there can be different nesting configurations depending on the namespace. The Store supports operating at different nesting levels in the same namespace at the same time. When using nesting depth > 0, the backends will assume that keys are hashes (have hex digits) because some backends will want to pre-create the nesting directories at backend initialization time to optimize for better performance while using the backend. Soft deletion ------------- To soft delete an item (so its value could be still read or it could be undeleted), the store just renames the item, appending ".del" to its name. Undelete reverses this by removing the ".del" suffix from the name. Some store operations have a boolean flag "deleted" to choose whether they shall consider soft deleted items. Backends -------- The backend API is rather simple, one only needs to provide some very basic operations. Existing backends are listed below, more might come in future. posixfs ~~~~~~~ Use storage on a local POSIX filesystem: - URL: ``file:///absolute/path`` - it is the caller's task to create an absolute fs path from a relative one. - namespaces: directories - values: in key-named files - pre-creates nesting directories sftp ~~~~ Use storage on a sftp server: - URL: ``sftp://user@server:port/relative/path`` (strongly recommended) For user's and admin's convenience, mapping the URL path to the server fs path depends on the server configuration (home directory, sshd/sftpd config, ...). Usually the path is relative to the user's home directory. - URL: ``sftp://user@server:port//absolute/path`` As this uses an absolute path, things are more difficult here: - user's config might break if server admin moves a user home to a new location. - users must know the full absolute path of space they have permission to use. - namespaces: directories - values: in key-named files - pre-creates nesting directories rclone ~~~~~~ Use storage on any of the many cloud providers `rclone `_ supports: - URL: ``rclone:remote:path``, we just prefix "rclone:" and give all to the right of that to rclone, see: https://rclone.org/docs/#syntax-of-remote-paths - implementation of this primarily depends on the specific remote. Scalability ----------- - Count of key/value pairs stored in a namespace: automatic nesting is provided for keys to address common scalability issues. - Key size: there are no special provisions for extremely long keys (like: more than backend limitations). Usually this is not a problem though. - Value size: there are no special provisions for dealing with large value sizes (like: more than free memory, more than backend storage limitations, etc.). If one deals with very large values, one usually cuts them into chunks before storing them into the store. - Partial loads improve performance by avoiding a full load if only a part of the value is needed (e.g. a header with metadata). 
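
For example (an illustrative sketch; the key and sizes are made up and ``store`` is a
``Store`` configured with 2 nesting levels for ``data/`` as in the sketch above), nesting
maps a key to a nested backend name, and a partial load fetches just a header::

    from borgstore.utils.nesting import nest

    key = "data/0123456789abcdef"
    nest(key, 2)                        # -> "data/01/23/0123456789abcdef"

    # read only the first 32 bytes (e.g. a metadata header) instead of the whole value:
    header = store.load(key, size=32, offset=0)
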
Installation ------------ Install without the ``sftp:`` backend:: pip install borgstore Install with the ``sftp:`` backend (more dependencies):: pip install "borgstore[sftp]" Please note that ``rclone:`` also supports sftp remotes. Want a demo? ------------ Run this to get instructions how to run the demo: python3 -m borgstore State of this project --------------------- **API is still unstable and expected to change as development goes on.** **As long as the API is unstable, there will be no data migration tools, like e.g. for upgrading an existing store's data to a new release.** There are tests and they succeed for the basic functionality, so some of the stuff is already working well. There might be missing features or optimization potential, feedback welcome! There are a lot of possible, but still missing backends. If you want to create and support one: pull requests are welcome. Borg? ----- Please note that this code is currently **not** used by the stable release of BorgBackup (aka "borg"), but only by borg2 beta 10+ and master branch. License ------- BSD license. ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1728824406.0 borgstore-0.1.0/pyproject.toml0000644000076500000240000000401314702742126015012 0ustar00twstaff[project] name = "borgstore" dynamic = ["version"] authors = [{name="Thomas Waldmann", email="tw@waldmann-edv.de"}, ] description = "key/value store" readme = "README.rst" keywords = ["kv", "key/value", "store"] classifiers = [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Operating System :: POSIX", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.9", "Programming Language :: Python :: 3.10", "Programming Language :: Python :: 3.11", "Programming Language :: Python :: 3.12", "Programming Language :: Python :: 3.13", "Topic :: Software Development :: Libraries", "Topic :: Software Development :: Libraries :: Python Modules", ] license = {text="BSD"} requires-python = ">=3.9" dependencies = [ "requests >= 2.25.1", ] [project.optional-dependencies] sftp = [ "paramiko >= 1.9.1", # 1.9.1+ supports multiple IdentityKey entries in .ssh/config ] [project.urls] Homepage = "https://github.com/borgbackup/borgstore" [build-system] requires = ["setuptools", "setuptools_scm[toml]>=6.2"] build-backend = "setuptools.build_meta" [tool.setuptools_scm] # make sure we have the same versioning scheme with all setuptools_scm versions, to avoid different autogenerated files # https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1015052 # https://github.com/borgbackup/borg/issues/6875 write_to = "src/borgstore/_version.py" write_to_template = "__version__ = version = {version!r}\n" [tool.black] line-length = 120 skip-magic-trailing-comma = true [tool.pytest.ini_options] minversion = "6.0" testpaths = ["tests"] [tool.flake8] # Ignoring E203 due to https://github.com/PyCQA/pycodestyle/issues/373 ignore = ['E226', 'W503', 'E203'] max_line_length = 120 exclude = ['build', 'dist', '.git', '.idea', '.mypy_cache', '.tox'] [tool.mypy] python_version = '3.10' strict_optional = false local_partial_types = true show_error_codes = true files = 'src/borgstore/**/*.py' ././@PaxHeader0000000000000000000000000000003300000000000010211 xustar0027 mtime=1729007861.474421 borgstore-0.1.0/setup.cfg0000644000076500000240000000004614703510365013720 0ustar00twstaff[egg_info] tag_build = tag_date = 0 ././@PaxHeader0000000000000000000000000000003400000000000010212 
xustar0028 mtime=1729007861.4686706 borgstore-0.1.0/src/0000755000076500000240000000000014703510365012666 5ustar00twstaff././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1729007861.4704297 borgstore-0.1.0/src/borgstore/0000755000076500000240000000000014703510365014674 5ustar00twstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722853419.0 borgstore-0.1.0/src/borgstore/__init__.py0000644000076500000240000000013314654124053017002 0ustar00twstaff""" BorgStore - a key/value store. """ from ._version import __version__, version # noqa ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1726386464.0 borgstore-0.1.0/src/borgstore/__main__.py0000644000076500000240000000453314671510440016771 0ustar00twstaff""" Demo for BorgStore ================== Usage: python -m borgstore E.g.: python -m borgstore file:///tmp/borgstore_storage Please be careful: the given storage will be created, used and **completely deleted**! """ def run_demo(storage_url): from .store import Store def id_key(data: bytes): from hashlib import new h = new("sha256", data) return f"data/{h.hexdigest()}" levels_config = { "config/": [0], # no nesting needed/wanted for the configs "data/": [2], # 2 nesting levels wanted for the data } store = Store(url=storage_url, levels=levels_config) try: store.create() except FileExistsError: # currently, we only have file:// storages, so this should be fine. print("Error: you must not give an existing directory.") return with store: print("Writing 2 items to config namespace...") settings1_key = "config/settings1" store.store(settings1_key, b"value1 = 42") settings2_key = "config/settings2" store.store(settings2_key, b"value2 = 23") print(f"Listing config namespace contents: {list(store.list('config'))}") settings1_value = store.load(settings1_key) print(f"Loaded from store: {settings1_key}: {settings1_value.decode()}") settings2_value = store.load(settings2_key) print(f"Loaded from store: {settings2_key}: {settings2_value.decode()}") print("Writing 2 items to data namespace...") data1 = b"some arbitrary binary data." key1 = id_key(data1) store.store(key1, data1) data2 = b"more arbitrary binary data. " * 2 key2 = id_key(data2) store.store(key2, data2) print(f"Soft deleting item {key2} ...") store.move(key2, delete=True) print(f"Listing data namespace contents: {list(store.list('data', deleted=False))}") print(f"Listing data namespace contents, incl. deleted: {list(store.list('data', deleted=True))}") print(f"Stats: {store.stats}") answer = input("After you've inspected the storage, enter DESTROY to destroy the storage, anything else to abort: ") if answer == "DESTROY": store.destroy() if __name__ == "__main__": import sys if len(sys.argv) == 2: run_demo(sys.argv[1]) else: print(__doc__) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1729007861.0 borgstore-0.1.0/src/borgstore/_version.py0000644000076500000240000000004014703510365017064 0ustar00twstaff__version__ = version = '0.1.0' ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1729007861.4728682 borgstore-0.1.0/src/borgstore/backends/0000755000076500000240000000000014703510365016446 5ustar00twstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722853419.0 borgstore-0.1.0/src/borgstore/backends/__init__.py0000644000076500000240000000012414654124053020554 0ustar00twstaff""" Package with misc. backend implementations. See backends._base for details. 
""" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1728835103.0 borgstore-0.1.0/src/borgstore/backends/_base.py0000644000076500000240000000776414702767037020120 0ustar00twstaff""" Base class and type definitions for all backend implementations in this package. Docs that are not backend-specific are also found here. """ from abc import ABC, abstractmethod from collections import namedtuple from typing import Iterator from ..constants import MAX_NAME_LENGTH ItemInfo = namedtuple("ItemInfo", "name exists size directory") def validate_name(name): """validate a backend key / name""" if not isinstance(name, str): raise TypeError(f"name must be str, but got: {type(name)}") # name must not be too long if len(name) > MAX_NAME_LENGTH: raise ValueError(f"name is too long (max: {MAX_NAME_LENGTH}): {name}") # avoid encoding issues try: name.encode("ascii") except UnicodeEncodeError: raise ValueError(f"name must encode to plain ascii, but failed with: {name}") # security: name must be relative - can be foo or foo/bar/baz, but must never be /foo or ../foo if name.startswith("/") or name.endswith("/") or ".." in name: raise ValueError(f"name must be relative and not contain '..': {name}") # names used here always have '/' as separator, never '\' - # this is to avoid confusion in case this is ported to e.g. Windows. # also: no blanks - simplifies usage via CLI / shell. if "\\" in name or " " in name: raise ValueError(f"name must not contain backslashes or blanks: {name}") # name must be lowercase - this is to avoid troubles in case this is ported to a non-case-sensitive backend. # also, guess we want to avoid that a key "config" would address a different item than a key "CONFIG" or # a key "1234CAFE5678BABE" would address a different item than a key "1234cafe5678babe". if name != name.lower(): raise ValueError(f"name must be lowercase, but got: {name}") class BackendBase(ABC): # a backend can request all directories to be pre-created once at backend creation (initialization) time. # for some backends this will optimize the performance of store and move operation, because they won't # have to care for ad-hoc directory creation for every store or move call. of course, create will take # significantly longer, especially if nesting on levels > 1 is used. # otoh, for some backends this might be completely pointless, e.g. if mkdir is a NOP (is ignored). 
precreate_dirs: bool = False @abstractmethod def create(self): """create (initialize) a backend storage""" @abstractmethod def destroy(self): """completely remove the backend storage (and its contents)""" def __enter__(self): self.open() return self def __exit__(self, exc_type, exc_val, exc_tb): self.close() return False @abstractmethod def open(self): """open (start using) a backend storage""" @abstractmethod def close(self): """close (stop using) a backend storage""" @abstractmethod def mkdir(self, name: str) -> None: """create directory/namespace """ @abstractmethod def rmdir(self, name: str) -> None: """remove directory/namespace """ @abstractmethod def info(self, name) -> ItemInfo: """return information about """ @abstractmethod def load(self, name: str, *, size=None, offset=0) -> bytes: """load value from """ @abstractmethod def store(self, name: str, value: bytes) -> None: """store into """ @abstractmethod def delete(self, name: str) -> None: """delete """ @abstractmethod def move(self, curr_name: str, new_name: str) -> None: """rename curr_name to new_name (overwrite target)""" @abstractmethod def list(self, name: str) -> Iterator[ItemInfo]: """list the contents of , non-recursively. Does not yield TMP_SUFFIX items - usually they are either not finished uploading or they are leftover crap from aborted uploads. The yielded ItemInfos are sorted alphabetically by name. """ ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1726386464.0 borgstore-0.1.0/src/borgstore/backends/errors.py0000644000076500000240000000141614671510440020334 0ustar00twstaff""" Generic exception classes used by all backends. """ class BackendError(Exception): """Base class for exceptions in this module.""" class BackendURLInvalid(BackendError): """Raised when trying to create a store using an invalid backend URL.""" class NoBackendGiven(BackendError): """Raised when trying to create a store and not giving a backend nor a URL.""" class BackendAlreadyExists(BackendError): """Raised when a backend already exists.""" class BackendDoesNotExist(BackendError): """Raised when a backend does not exist.""" class BackendMustNotBeOpen(BackendError): """Backend must not be open.""" class BackendMustBeOpen(BackendError): """Backend must be open.""" class ObjectNotFound(BackendError): """Object not found.""" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1728835103.0 borgstore-0.1.0/src/borgstore/backends/posixfs.py0000644000076500000240000001534414702767037020533 0ustar00twstaff""" Filesystem based backend implementation - uses files in directories below a base path. """ import os import re from pathlib import Path import shutil import stat import tempfile from ._base import BackendBase, ItemInfo, validate_name from .errors import BackendError, BackendAlreadyExists, BackendDoesNotExist, BackendMustNotBeOpen, BackendMustBeOpen from .errors import ObjectNotFound from ..constants import TMP_SUFFIX def get_file_backend(url): # file:///absolute/path # notes: # - we only support **local** fs **absolute** paths. # - there is no such thing as a "relative path" local fs file: url # - the general url syntax is proto://host/path # - // introduces the host part. it is empty here, meaning localhost / local fs. # - the third slash is NOT optional, it is the start of an absolute path as well # as the separator between the host and the path part. # - the caller is responsible to give an absolute path. 
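    # e.g. "file:///srv/borgstore/demo" (illustrative path) would yield PosixFS(path="/srv/borgstore/demo")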
file_regex = r""" file:// # only empty host part is supported (?P(/.*)) # path must be an absolute path """ m = re.match(file_regex, url, re.VERBOSE) if m: return PosixFS(path=m["path"]) class PosixFS(BackendBase): # PosixFS implementation supports precreate = True as well as = False, # but be careful: if backend creation was with precreate_dirs = False, # backend usage must not be with precreate_dirs = True. precreate_dirs: bool = True def __init__(self, path, *, do_fsync=False): self.base_path = Path(path) if not self.base_path.is_absolute(): raise BackendError("path must be an absolute path") self.opened = False self.do_fsync = do_fsync # False = 26x faster, see #10 def create(self): if self.opened: raise BackendMustNotBeOpen() try: # we accept an already existing directory, but we do not create parent dirs: self.base_path.mkdir(exist_ok=True, parents=False) except FileNotFoundError: raise BackendError(f"posixfs storage base path's parent directory does not exist: {self.base_path}") contents = list(self.base_path.iterdir()) if contents: raise BackendAlreadyExists(f"posixfs storage base path is not empty: {self.base_path}") def destroy(self): if self.opened: raise BackendMustNotBeOpen() try: shutil.rmtree(os.fspath(self.base_path)) except FileNotFoundError: raise BackendDoesNotExist(f"posixfs storage base path does not exist: {self.base_path}") def open(self): if self.opened: raise BackendMustNotBeOpen() if not self.base_path.is_dir(): raise BackendDoesNotExist( f"posixfs storage base path does not exist or is not a directory: {self.base_path}" ) self.opened = True def close(self): if not self.opened: raise BackendMustBeOpen() self.opened = False def _validate_join(self, name): validate_name(name) return self.base_path / name def mkdir(self, name): if not self.opened: raise BackendMustBeOpen() path = self._validate_join(name) path.mkdir(parents=True, exist_ok=True) def rmdir(self, name): if not self.opened: raise BackendMustBeOpen() path = self._validate_join(name) try: path.rmdir() except FileNotFoundError: raise ObjectNotFound(name) from None def info(self, name): if not self.opened: raise BackendMustBeOpen() path = self._validate_join(name) try: st = path.stat() except FileNotFoundError: return ItemInfo(name=path.name, exists=False, directory=False, size=0) else: is_dir = stat.S_ISDIR(st.st_mode) return ItemInfo(name=path.name, exists=True, directory=is_dir, size=st.st_size) def load(self, name, *, size=None, offset=0): if not self.opened: raise BackendMustBeOpen() path = self._validate_join(name) try: with path.open("rb") as f: if offset > 0: f.seek(offset) return f.read(-1 if size is None else size) except FileNotFoundError: raise ObjectNotFound(name) from None def store(self, name, value): if not self.opened: raise BackendMustBeOpen() path = self._validate_join(name) tmp_dir = path.parent if not self.precreate_dirs: # note: tmp_dir already exists, if it was pre-created by Store.create_levels. tmp_dir.mkdir(parents=True, exist_ok=True) # write to a differently named temp file in same directory first, # so the store never sees partially written data. 
with tempfile.NamedTemporaryFile(suffix=TMP_SUFFIX, dir=tmp_dir, delete=False) as f: f.write(value) if self.do_fsync: f.flush() os.fsync(f.fileno()) tmp_path = Path(f.name) # all written and synced to disk, rename it to the final name: try: tmp_path.replace(path) except OSError: tmp_path.unlink() raise def delete(self, name): if not self.opened: raise BackendMustBeOpen() path = self._validate_join(name) try: path.unlink() except FileNotFoundError: raise ObjectNotFound(name) from None def move(self, curr_name, new_name): if not self.opened: raise BackendMustBeOpen() curr_path = self._validate_join(curr_name) new_path = self._validate_join(new_name) try: if not self.precreate_dirs: # note: new_path.parent dir already exists, if it was pre-created by Store.create_levels. new_path.parent.mkdir(parents=True, exist_ok=True) curr_path.replace(new_path) except FileNotFoundError: raise ObjectNotFound(curr_name) from None def list(self, name): if not self.opened: raise BackendMustBeOpen() path = self._validate_join(name) try: paths = sorted(path.iterdir()) except FileNotFoundError: raise ObjectNotFound(name) from None else: for p in paths: if not p.name.endswith(TMP_SUFFIX): try: st = p.stat() except FileNotFoundError: pass else: is_dir = stat.S_ISDIR(st.st_mode) yield ItemInfo(name=p.name, exists=True, size=st.st_size, directory=is_dir) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1728840930.0 borgstore-0.1.0/src/borgstore/backends/rclone.py0000644000076500000240000002234614703002342020300 0ustar00twstaff""" Borgstore backend for rclone """ import os import re import requests import subprocess import json import secrets from typing import Iterator import threading from ._base import BackendBase, ItemInfo, validate_name from .errors import ( BackendError, BackendDoesNotExist, BackendMustNotBeOpen, BackendMustBeOpen, BackendAlreadyExists, ObjectNotFound, ) from ..constants import TMP_SUFFIX # rclone binary - expected to be on the path RCLONE = "rclone" # Debug HTTP requests and responses if False: import logging import http.client as http_client http_client.HTTPConnection.debuglevel = 1 logging.basicConfig() logging.getLogger().setLevel(logging.DEBUG) requests_log = logging.getLogger("requests.packages.urllib3") requests_log.setLevel(logging.DEBUG) requests_log.propagate = True def get_rclone_backend(url): """get rclone URL rclone:remote: rclone:remote:path """ # Check rclone is on the path try: info = json.loads(subprocess.check_output([RCLONE, "rc", "--loopback", "core/version"])) except Exception: raise BackendDoesNotExist("rclone binary not found on the path or not working properly") if info["decomposed"] < [1, 57, 0]: raise BackendDoesNotExist(f"rclone binary too old - need at least version v1.57.0 - found {info['version']}") rclone_regex = r""" rclone: (?P(.*)) """ m = re.match(rclone_regex, url, re.VERBOSE) if m: return Rclone(path=m["path"]) class Rclone(BackendBase): """Borgstore backend for rclone This uses the rclone rc API to control an rclone rcd process. 
""" precreate_dirs: bool = False HOST = "localhost" TRIES = 3 # try failed load/store operations this many times def __init__(self, path, *, do_fsync=False): if not path.endswith(":") and not path.endswith("/"): path += "/" self.fs = path self.process = None self.url = None self.user = "borg" self.password = secrets.token_urlsafe(32) def open(self): """ Start using the rclone server """ if self.process: raise BackendMustNotBeOpen() # Open rclone rcd listening on a random port with random auth args = [ RCLONE, "rcd", "--rc-user", self.user, "--rc-addr", self.HOST + ":0", "--rc-serve", "--use-server-modtime", ] env = os.environ.copy() env["RCLONE_RC_PASS"] = self.password # pass password by env var so it isn't in process list self.process = subprocess.Popen( args, stderr=subprocess.PIPE, stdout=subprocess.DEVNULL, stdin=subprocess.DEVNULL, env=env ) # Read the log line with the port in it line = self.process.stderr.readline() m = re.search(rb"(http://.*/)", line) if not m: raise BackendDoesNotExist(f"rclone rcd did not return URL in log line: {line}") self.url = m.group(1).decode("utf-8") def discard(): """discard log output on stderr so we don't block the process""" while True: line = self.process.stderr.readline() if not line: break # Process has finished thread = threading.Thread(target=discard, daemon=True) thread.start() def close(self): """ Stop using the rclone server """ if not self.process: raise BackendMustBeOpen() self.process.terminate() self.process = None self.url = None def _requests(self, fn, *args, tries=1, **kwargs): """ Runs a call to the requests function fn with *args and **kwargs It adds auth and decodes errors in a consistent way It returns the response object This will retry any 500 errors received from rclone tries times as these correspond to backend or protocol or Internet errors. Note that rclone will retry all operations internally except those which stream data. 
""" if not self.process or not self.url: raise BackendMustBeOpen() for try_number in range(tries): r = fn(*args, auth=(self.user, self.password), **kwargs) if r.status_code in (200, 206): return r elif r.status_code == 404: raise ObjectNotFound(f"Not Found: error {r.status_code}: {r.text}") err = BackendError(f"rclone rc command failed: error {r.status_code}: {r.text}") if r.status_code != 500: break raise err def _rpc(self, command, json_input, **kwargs): """ Run the rclone command over the rclone API Additional kwargs may be passed to requests """ if not self.url: raise BackendMustBeOpen() r = self._requests(requests.post, self.url + command, json=json_input, **kwargs) return r.json() def create(self): """create (initialize) the rclone storage""" if self.process: raise BackendMustNotBeOpen() with self: try: if any(self.list("")): raise BackendAlreadyExists(f"rclone storage base path exists and isn't empty: {self.fs}") except ObjectNotFound: pass self.mkdir("") def destroy(self): """completely remove the rclone storage (and its contents)""" if self.process: raise BackendMustNotBeOpen() with self: info = self.info("") if not info.exists: raise BackendDoesNotExist(f"rclone storage base path does not exist: {self.fs}") self._rpc("operations/purge", {"fs": self.fs, "remote": ""}) def __enter__(self): self.open() return self def __exit__(self, exc_type, exc_val, exc_tb): self.close() return False def mkdir(self, name: str) -> None: """create directory/namespace """ validate_name(name) self._rpc("operations/mkdir", {"fs": self.fs, "remote": name}) def rmdir(self, name: str) -> None: """remove directory/namespace """ validate_name(name) self._rpc("operations/rmdir", {"fs": self.fs, "remote": name}) def _to_item_info(self, remote, item): """Converts an rclone item at remote into a borgstore ItemInfo""" if item is None: return ItemInfo(name=os.path.basename(remote), exists=False, directory=False, size=0) name = item["Name"] size = item["Size"] directory = item["IsDir"] return ItemInfo(name=name, exists=True, size=size, directory=directory) def info(self, name) -> ItemInfo: """return information about """ validate_name(name) try: result = self._rpc( "operations/stat", {"fs": self.fs, "remote": name, "opt": {"recurse": False, "noModTime": True, "noMimeType": True}}, ) item = result["item"] except ObjectNotFound: item = None return self._to_item_info(name, item) def load(self, name: str, *, size=None, offset=0) -> bytes: """load value from """ validate_name(name) headers = {} if size is not None or offset > 0: if size is not None: headers["Range"] = f"bytes={offset}-{offset+size-1}" else: headers["Range"] = f"bytes={offset}-" r = self._requests(requests.get, f"{self.url}[{self.fs}]/{name}", tries=self.TRIES, headers=headers) return r.content def store(self, name: str, value: bytes) -> None: """store into """ validate_name(name) files = {"file": (os.path.basename(name), value, "application/octet-stream")} params = {"fs": self.fs, "remote": os.path.dirname(name)} self._rpc("operations/uploadfile", None, tries=self.TRIES, params=params, files=files) def delete(self, name: str) -> None: """delete """ validate_name(name) self._rpc("operations/deletefile", {"fs": self.fs, "remote": name}) def move(self, curr_name: str, new_name: str) -> None: """rename curr_name to new_name (overwrite target)""" validate_name(curr_name) validate_name(new_name) self._rpc( "operations/movefile", {"srcFs": self.fs, "srcRemote": curr_name, "dstFs": self.fs, "dstRemote": new_name} ) def list(self, name: str) -> 
Iterator[ItemInfo]: """list the contents of , non-recursively. Does not yield TMP_SUFFIX items - usually they are either not finished uploading or they are leftover crap from aborted uploads. The yielded ItemInfos are sorted alphabetically by name. """ validate_name(name) result = self._rpc( "operations/list", {"fs": self.fs, "remote": name, "opt": {"recurse": False, "noModTime": True, "noMimeType": True}}, ) for item in result["list"]: name = item["Name"] if name.endswith(TMP_SUFFIX): continue yield self._to_item_info(name, item) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1729007178.0 borgstore-0.1.0/src/borgstore/backends/sftp.py0000644000076500000240000002561114703507112017775 0ustar00twstaff""" SFTP based backend implementation - on a sftp server, use files in directories below a base path. """ from pathlib import Path import random import re import stat from typing import Optional try: import paramiko except ImportError: paramiko = None from ._base import BackendBase, ItemInfo, validate_name from .errors import BackendError, BackendMustBeOpen, BackendMustNotBeOpen, BackendDoesNotExist, BackendAlreadyExists from .errors import ObjectNotFound from ..constants import TMP_SUFFIX def get_sftp_backend(url): # sftp://username@hostname:22/path # note: # - username and port optional # - host must be a hostname (not IP) # - must give a path, default is a relative path (usually relative to user's home dir - # this is so that the sftp server admin can move stuff around without the user needing to know). # - giving an absolute path is also possible: sftp://username@hostname:22//home/username/borgstore sftp_regex = r""" sftp:// ((?P[^@]+)@)? (?P([^:/]+))(?::(?P\d+))?/ # slash as separator, not part of the path (?P(.+)) # path may or may not start with a slash, must not be empty """ if paramiko is not None: m = re.match(sftp_regex, url, re.VERBOSE) if m: return Sftp(username=m["username"], hostname=m["hostname"], port=int(m["port"] or "0"), path=m["path"]) class Sftp(BackendBase): # Sftp implementation supports precreate = True as well as = False, # but be careful: if backend creation was with precreate_dirs = False, # backend usage must not be with precreate_dirs = True. 
precreate_dirs: bool = True def __init__(self, hostname: str, path: str, port: int = 0, username: Optional[str] = None): self.username = username self.hostname = hostname self.port = port self.base_path = path self.opened = False if paramiko is None: raise BackendError("sftp backend unavailable: could not import paramiko!") def _get_host_config_from_file(self, path: str, hostname: str): """lookup the configuration for hostname in path (ssh config file)""" config_path = Path(path).expanduser() try: ssh_config = paramiko.SSHConfig.from_path(config_path) except FileNotFoundError: return paramiko.SSHConfigDict() # empty dict else: return ssh_config.lookup(hostname) def _get_host_config(self): """assemble all given and configured host config values""" host_config = paramiko.SSHConfigDict() # self.hostname might be an alias/shortcut (with real hostname given in configuration), # but there might be also nothing in the configs at all for self.hostname: host_config["hostname"] = self.hostname # first process system-wide ssh config, then override with user ssh config: host_config.update(self._get_host_config_from_file("/etc/ssh/ssh_config", self.hostname)) # note: no support yet for /etc/ssh/ssh_config.d/* host_config.update(self._get_host_config_from_file("~/.ssh/config", self.hostname)) # now override configured values with given values if self.username is not None: host_config.update({"user": self.username}) if self.port != 0: host_config.update({"port": self.port}) # make sure port is present and is an int host_config["port"] = int(host_config.get("port") or 22) return host_config def _connect(self): ssh = paramiko.SSHClient() # note: we do not deal with unknown hosts and ssh.set_missing_host_key_policy here, # the user shall just make "first contact" to any new host using ssh or sftp cli command # and interactively verify remote host fingerprints. 
ssh.load_system_host_keys() # this is documented to load the USER's known_hosts file host_config = self._get_host_config() ssh.connect( hostname=host_config["hostname"], username=host_config.get("user"), # if None, paramiko will use current user port=host_config["port"], key_filename=host_config.get("identityfile"), # list of keys, ~ is already expanded allow_agent=True, ) self.client = ssh.open_sftp() def _disconnect(self): self.client.close() self.client = None def create(self): if self.opened: raise BackendMustNotBeOpen() self._connect() try: try: # we accept an already existing directory, but we do not create parent dirs: self._mkdir(self.base_path, exist_ok=True, parents=False) except FileNotFoundError: raise BackendError(f"sftp storage base path's parent directory does not exist: {self.base_path}") contents = list(self.client.listdir(self.base_path)) if contents: raise BackendAlreadyExists(f"sftp storage base path is not empty: {self.base_path}") except IOError as err: raise BackendError(f"sftp storage I/O error: {err}") finally: self._disconnect() def destroy(self): def delete_recursive(path): parent = Path(path) for child_st in self.client.listdir_attr(str(parent)): child = parent / child_st.filename if stat.S_ISDIR(child_st.st_mode): delete_recursive(child) else: self.client.unlink(str(child)) self.client.rmdir(str(parent)) if self.opened: raise BackendMustNotBeOpen() self._connect() try: delete_recursive(self.base_path) except FileNotFoundError: raise BackendDoesNotExist(f"sftp storage base path does not exist: {self.base_path}") finally: self._disconnect() def open(self): if self.opened: raise BackendMustNotBeOpen() self._connect() try: st = self.client.stat(self.base_path) # check if this storage exists, fail early if not. except FileNotFoundError: raise BackendDoesNotExist(f"sftp storage base path does not exist: {self.base_path}") from None if not stat.S_ISDIR(st.st_mode): raise BackendDoesNotExist(f"sftp storage base path is not a directory: {self.base_path}") self.client.chdir(self.base_path) # this sets the cwd we work in! self.opened = True def close(self): if not self.opened: raise BackendMustBeOpen() self._disconnect() self.opened = False def _mkdir(self, name, *, parents=False, exist_ok=False): # Path.mkdir, but via sftp p = Path(name) try: self.client.mkdir(str(p)) except FileNotFoundError: # the parent dir is missing if not parents: raise # first create parent dir(s), recursively: self._mkdir(p.parents[0], parents=parents, exist_ok=exist_ok) # then retry: self.client.mkdir(str(p)) except OSError: # maybe p already existed? if not exist_ok: raise def mkdir(self, name): if not self.opened: raise BackendMustBeOpen() validate_name(name) self._mkdir(name, parents=True, exist_ok=True) def rmdir(self, name): if not self.opened: raise BackendMustBeOpen() validate_name(name) try: self.client.rmdir(name) except FileNotFoundError: raise ObjectNotFound(name) from None def info(self, name): if not self.opened: raise BackendMustBeOpen() validate_name(name) try: st = self.client.stat(name) except FileNotFoundError: return ItemInfo(name=name, exists=False, directory=False, size=0) else: is_dir = stat.S_ISDIR(st.st_mode) return ItemInfo(name=name, exists=True, directory=is_dir, size=st.st_size) def load(self, name, *, size=None, offset=0): if not self.opened: raise BackendMustBeOpen() validate_name(name) try: with self.client.open(name) as f: f.seek(offset) f.prefetch(size) # speeds up the following read() significantly! 
return f.read(size) except FileNotFoundError: raise ObjectNotFound(name) from None def store(self, name, value): if not self.opened: raise BackendMustBeOpen() validate_name(name) tmp_dir = Path(name).parent if not self.precreate_dirs: # note: tmp_dir already exists, if it was pre-created by Store.create_levels. self._mkdir(str(tmp_dir), parents=True, exist_ok=True) # write to a differently named temp file in same directory first, # so the store never sees partially written data. tmp_name = str(tmp_dir / ("".join(random.choices("abcdefghijklmnopqrstuvwxyz", k=8)) + TMP_SUFFIX)) with self.client.open(tmp_name, mode="w") as f: f.set_pipelined(True) # speeds up the following write() significantly! f.write(value) # rename it to the final name: try: self.client.posix_rename(tmp_name, name) except OSError: self.client.unlink(tmp_name) raise def delete(self, name): if not self.opened: raise BackendMustBeOpen() validate_name(name) try: self.client.unlink(name) except FileNotFoundError: raise ObjectNotFound(name) from None def move(self, curr_name, new_name): if not self.opened: raise BackendMustBeOpen() validate_name(curr_name) validate_name(new_name) if not self.precreate_dirs: # note: the parent dir of new_name already exists, if it was pre-created by Store.create_levels. try: parent_dir = Path(new_name).parent self._mkdir(str(parent_dir), parents=True, exist_ok=True) except OSError: # exists already? pass try: self.client.posix_rename(curr_name, new_name) except FileNotFoundError: raise ObjectNotFound(curr_name) from None def list(self, name): if not self.opened: raise BackendMustBeOpen() validate_name(name) try: infos = self.client.listdir_attr(name) except FileNotFoundError: raise ObjectNotFound(name) from None else: for info in sorted(infos, key=lambda i: i.filename): if not info.filename.endswith(TMP_SUFFIX): is_dir = stat.S_ISDIR(info.st_mode) yield ItemInfo(name=info.filename, exists=True, size=info.st_size, directory=is_dir) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1722853419.0 borgstore-0.1.0/src/borgstore/constants.py0000644000076500000240000000072114654124053017262 0ustar00twstaff"""some constant definitions""" # namespace that needs to be given to list from the root of the storage: ROOTNS = "" # filename suffixes used for special purposes TMP_SUFFIX = ".tmp" # temporary file while being uploaded / written DEL_SUFFIX = ".del" # "soft deleted" item, undelete possible # max name length (not precise, suffixes might be added!) MAX_NAME_LENGTH = 100 # being rather conservative here to improve portability between backends and platforms ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1728914289.0 borgstore-0.1.0/src/borgstore/store.py0000644000076500000240000003000114703221561016371 0ustar00twstaff""" Key/Value Store Implementation. 
Store internally uses a backend to store k/v data and adds some functionality: - backend creation from a URL - configurable nesting - recursive .list method - soft deletion """ from binascii import hexlify from collections import Counter from contextlib import contextmanager import os import time from typing import Iterator, Optional from .utils.nesting import nest from .backends._base import ItemInfo, BackendBase from .backends.errors import ObjectNotFound, NoBackendGiven, BackendURLInvalid # noqa from .backends.posixfs import get_file_backend from .backends.rclone import get_rclone_backend from .backends.sftp import get_sftp_backend from .constants import DEL_SUFFIX def get_backend(url): """parse backend URL and return a backend instance (or None)""" backend = get_file_backend(url) if backend is not None: return backend backend = get_sftp_backend(url) if backend is not None: return backend backend = get_rclone_backend(url) if backend is not None: return backend class Store: def __init__(self, url: Optional[str] = None, backend: Optional[BackendBase] = None, levels: Optional[dict] = None): self.url = url if backend is None and url is not None: backend = get_backend(url) if backend is None: raise BackendURLInvalid(f"Invalid Backend Storage URL: {url}") if backend is None: raise NoBackendGiven("You need to give a backend instance or a backend url.") self.backend = backend self.set_levels(levels) self._stats: Counter = Counter() # this is to emulate additional latency to what the backend actually offers: self.latency = float(os.environ.get("BORGSTORE_LATENCY", "0")) / 1e6 # [us] -> [s] # this is to emulate less bandwidth than what the backend actually offers: self.bandwidth = float(os.environ.get("BORGSTORE_BANDWIDTH", "0")) / 8 # [bits/s] -> [bytes/s] def __repr__(self): return f"" def set_levels(self, levels: dict, create: bool = False) -> None: if not levels or not isinstance(levels, dict): raise ValueError("No or invalid levels configuration given.") # we accept levels as a dict, but we rather want a list of (namespace, levels) tuples, longest namespace first: self.levels = [entry for entry in sorted(levels.items(), key=lambda item: len(item[0]), reverse=True)] if create: self.create_levels() def create_levels(self): """creating any needed namespaces / directory in advance""" # doing that saves a lot of ad-hoc mkdir calls, which is especially important # for backends with high latency or other noticeable costs of mkdir. with self: for namespace, levels in self.levels: namespace = namespace.rstrip("/") level = max(levels) if level == 0: # flat, we just need to create the namespace directory: self.backend.mkdir(namespace) elif level > 0: # nested, we only need to create the deepest nesting dir layer, # any missing parent dirs will be created as needed by backend.mkdir. 
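                    # e.g. for level == 2 this pre-creates 2**16 == 65536 leaf directories like "data/01/23".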
limit = 2 ** (level * 8) for i in range(limit): dir = hexlify(i.to_bytes(length=level, byteorder="big")).decode("ascii") name = f"{namespace}/{dir}" if namespace else dir nested_name = nest(name, level) self.backend.mkdir(nested_name[: -2 * level - 1]) else: raise ValueError(f"Invalid levels: {namespace}: {levels}") def create(self) -> None: self.backend.create() if self.backend.precreate_dirs: self.create_levels() def destroy(self) -> None: self.backend.destroy() def __enter__(self): self.open() return self def __exit__(self, exc_type, exc_val, exc_tb): self.close() return False def open(self) -> None: self.backend.open() def close(self) -> None: self.backend.close() @contextmanager def _stats_updater(self, key): """update call counters and overall times, also emulate latency and bandwidth""" # do not use this in generators! volume_before = self._stats_get_volume(key) start = time.perf_counter_ns() yield be_needed_ns = time.perf_counter_ns() - start volume_after = self._stats_get_volume(key) volume = volume_after - volume_before emulated_time = self.latency + (0 if not self.bandwidth else float(volume) / self.bandwidth) remaining_time = emulated_time - be_needed_ns / 1e9 if remaining_time > 0.0: time.sleep(remaining_time) end = time.perf_counter_ns() self._stats[f"{key}_calls"] += 1 self._stats[f"{key}_time"] += end - start def _stats_update_volume(self, key, amount): self._stats[f"{key}_volume"] += amount def _stats_get_volume(self, key): return self._stats.get(f"{key}_volume", 0) @property def stats(self): """ return statistics like method call counters, overall time [ns], overall data volume, overall throughput. please note that the stats values only consider what is seen on the Store api: - there might be additional time spent by the caller, outside of Store, thus: - real time is longer. - real throughput is lower. - there are some overheads not accounted for, e.g. the volume only adds up the data size of load and store. - write buffering or cached reads might give a wrong impression. """ st = dict(self._stats) # copy Counter -> generic dict for key in "info", "load", "store", "delete", "move", "list": # make sure key is present, even if method was not called st[f"{key}_calls"] = st.get(f"{key}_calls", 0) # convert integer ns timings to float s st[f"{key}_time"] = st.get(f"{key}_time", 0) / 1e9 for key in "load", "store": v = st.get(f"{key}_volume", 0) t = st.get(f"{key}_time", 0) st[f"{key}_throughput"] = v / t return st def _get_levels(self, name): """get levels from configuration depending on namespace""" for namespace, levels in self.levels: if name.startswith(namespace): return levels # Store.create_levels requires all namespaces to be configured in self.levels. raise KeyError(f"no matching namespace found for: {name}") def find(self, name: str, *, deleted=False) -> str: """ Find an item checking all supported nesting levels and return its nested name: - item not in the store yet: we won't find it, but find will return a nested name for **last** level. - item is in the store already: find will return the same nested name as the already present item. If deleted is True, find will try to find a "deleted" item. 
""" nested_name = None suffix = DEL_SUFFIX if deleted else None for level in self._get_levels(name): nested_name = nest(name, level, add_suffix=suffix) info = self.backend.info(nested_name) if info.exists: break return nested_name def info(self, name: str, *, deleted=False) -> ItemInfo: with self._stats_updater("info"): return self.backend.info(self.find(name, deleted=deleted)) def load(self, name: str, *, size=None, offset=0, deleted=False) -> bytes: with self._stats_updater("load"): result = self.backend.load(self.find(name, deleted=deleted), size=size, offset=offset) self._stats_update_volume("load", len(result)) return result def store(self, name: str, value: bytes) -> None: # note: using .find here will: # - overwrite an existing item (level stays same) # - write to the last level if no existing item is found. with self._stats_updater("store"): self.backend.store(self.find(name), value) self._stats_update_volume("store", len(value)) def delete(self, name: str, *, deleted=False) -> None: """ Really and immediately deletes an item. See also .move(name, delete=True) for "soft" deletion. """ with self._stats_updater("delete"): self.backend.delete(self.find(name, deleted=deleted)) def move( self, name: str, new_name: Optional[str] = None, *, delete: bool = False, undelete: bool = False, change_level: bool = False, deleted: bool = False, ) -> None: if delete: # use case: keep name, but soft "delete" the item nested_name = self.find(name, deleted=False) nested_new_name = nested_name + DEL_SUFFIX elif undelete: # use case: keep name, undelete a previously soft "deleted" item nested_name = self.find(name, deleted=True) nested_new_name = nested_name.removesuffix(DEL_SUFFIX) elif change_level: # use case: keep name, changing to another nesting level suffix = DEL_SUFFIX if deleted else None nested_name = self.find(name, deleted=deleted) nested_new_name = nest(name, self._get_levels(name)[-1], add_suffix=suffix) else: # generic use (be careful!) if not new_name: raise ValueError("generic move needs new_name to be given.") nested_name = self.find(name, deleted=deleted) nested_new_name = self.find(new_name, deleted=deleted) with self._stats_updater("move"): self.backend.move(nested_name, nested_new_name) def list(self, name: str, deleted: bool = False) -> Iterator[ItemInfo]: """ List all names in the namespace . If deleted is True and soft deleted items are encountered, they are yielded as if they were not deleted. Otherwise, they are ignored. backend.list giving us sorted names implies store.list is also sorted, if all items are stored on same level. """ # we need this wrapper due to the recursion - we only want to increment list_calls once: self._stats["list_calls"] += 1 yield from self._list(name, deleted=deleted) def _list(self, name: str, deleted: bool = False) -> Iterator[ItemInfo]: # as the backend.list method only supports non-recursive listing and # also returns directories/namespaces we introduced for nesting, we do the # recursion here (and also we do not yield directory names from here). start = time.perf_counter_ns() backend_list_iterator = self.backend.list(name) if self.latency: # we add the simulated latency once per backend.list iteration, not per element. 
            time.sleep(self.latency)
        end = time.perf_counter_ns()
        self._stats["list_time"] += end - start
        while True:
            start = time.perf_counter_ns()
            try:
                info = next(backend_list_iterator)
            except StopIteration:
                break
            finally:
                end = time.perf_counter_ns()
                self._stats["list_time"] += end - start
            if info.directory:
                # note: we only expect subdirectories from key nesting, but not namespaces nested into each other.
                subdir_name = (name + "/" + info.name) if name else info.name
                yield from self._list(subdir_name, deleted=deleted)
            else:
                is_deleted = info.name.endswith(DEL_SUFFIX)
                if deleted and is_deleted:
                    yield info._replace(name=info.name.removesuffix(DEL_SUFFIX))
                elif not is_deleted:
                    yield info

borgstore-0.1.0/src/borgstore/utils/__init__.py (empty)

borgstore-0.1.0/src/borgstore/utils/nesting.py

"""
Nest / un-nest names to address directory scalability issues and deal with the suffix of deleted items.

Many directory implementations can't cope well with gazillions of entries, so we introduce
intermediate directories to lower the amount of entries per directory.

The name is expected to have the key as the last element, like:

name = "namespace/0123456789abcdef"  # often, the key is hex(hash(content))

As we can have a huge amount of keys, we could nest 2 levels deep:

nested_name = nest(name, 2)
nested_name == "namespace/01/23/0123456789abcdef"

Note that the final element is the **full** key - we assume that this is better to deal with in
case of errors (like a fs issue and stuff being pushed to lost+found) and also easier to deal
with (e.g. the directory listing directly gives keys without needing to reassemble the full key
from parent dirs and partial key). Also, a sorted directory list would be in the same order as a
sorted key list.

name = unnest(nested_name, namespace="namespace")  # namespace with a final slash is also supported
name == "namespace/0123456789abcdef"

Notes:

- it works the same way without a namespace, but guess one always wants to use a namespace.
- always use nest / unnest, even if levels == 0 is desired, as it also does some checks and
  cares for adding / removing a suffix.
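
For example, the ".del" suffix used by the store for soft deletion is handled via the
suffix parameters:

nested = nest("namespace/0123456789abcdef", 2, add_suffix=".del")
nested == "namespace/01/23/0123456789abcdef.del"
name = unnest(nested, namespace="namespace", remove_suffix=".del")
name == "namespace/0123456789abcdef"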
""" from typing import Optional def split_key(name: str) -> tuple[Optional[str], str]: namespace_key = name.rsplit("/", 1) if len(namespace_key) == 2: namespace, key = namespace_key else: # == 1 (no slash in name) namespace, key = None, name return namespace, key def nest(name: str, levels: int, *, add_suffix: Optional[str] = None) -> str: """namespace/12345678 --2 levels--> namespace/12/34/12345678""" if levels > 0: namespace, key = split_key(name) parts = [key[2 * level : 2 * level + 2] for level in range(levels)] parts.append(key) if namespace is not None: parts.insert(0, namespace) name = "/".join(parts) return (name + add_suffix) if add_suffix else name def unnest(name: str, namespace: str, *, remove_suffix: Optional[str] = None) -> str: """namespace/12/34/12345678 --namespace=namespace--> namespace/12345678""" if namespace: if not namespace.endswith("/"): namespace += "/" if not name.startswith(namespace): raise ValueError(f"name {name} does not start with namespace {namespace}") name = name.removeprefix(namespace) key = name.rsplit("/", 1)[-1] if remove_suffix: key = key.removesuffix(remove_suffix) return namespace + key ././@PaxHeader0000000000000000000000000000003300000000000010211 xustar0027 mtime=1729007861.473751 borgstore-0.1.0/src/borgstore.egg-info/0000755000076500000240000000000014703510365016366 5ustar00twstaff././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1729007861.0 borgstore-0.1.0/src/borgstore.egg-info/PKG-INFO0000644000076500000240000001650714703510365017474 0ustar00twstaffMetadata-Version: 2.1 Name: borgstore Version: 0.1.0 Summary: key/value store Author-email: Thomas Waldmann License: BSD Project-URL: Homepage, https://github.com/borgbackup/borgstore Keywords: kv,key/value,store Classifier: Development Status :: 3 - Alpha Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: BSD License Classifier: Operating System :: POSIX Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3.9 Classifier: Programming Language :: Python :: 3.10 Classifier: Programming Language :: Python :: 3.11 Classifier: Programming Language :: Python :: 3.12 Classifier: Programming Language :: Python :: 3.13 Classifier: Topic :: Software Development :: Libraries Classifier: Topic :: Software Development :: Libraries :: Python Modules Requires-Python: >=3.9 Description-Content-Type: text/x-rst License-File: LICENSE.rst Requires-Dist: requests>=2.25.1 Provides-Extra: sftp Requires-Dist: paramiko>=1.9.1; extra == "sftp" BorgStore ========= A key/value store implementation in Python, supporting multiple backends. Keys ---- A key (str) can look like: - 0123456789abcdef... (usually a long, hex-encoded hash value) - Any other pure ASCII string without "/" or ".." or " ". Namespaces ---------- To keep stuff apart, keys should get prefixed with a namespace, like: - config/settings - meta/0123456789abcdef... - data/0123456789abcdef... Please note: 1. you should always use namespaces. 2. nested namespaces like namespace1/namespace2/key are not supported. 3. the code could work without a namespace (namespace ""), but then you can't add another namespace later, because then you would have created nested namespaces. Values ------ Values can be any arbitrary binary data (bytes). 
Store Operations
----------------

The high-level Store API implementation transparently deals with nesting and soft deletion,
so the caller doesn't have to care much for that and the Backend API can be much simpler:

- create/destroy: initialize or remove the whole store.
- list: flat list of the items in the given namespace, with or without soft deleted items.
- store: write a new item into the store (giving its key/value pair)
- load: read a value from the store (giving its key), partial loads giving offset and/or size are supported.
- info: get information about an item via its key (exists? size? ...)
- delete: immediately remove an item from the store (giving its key)
- move: implements rename, soft delete / undelete, move to current nesting level
- stats: api call counters, time spent in api methods, data volume/throughput
- latency/bandwidth emulator: can emulate higher latency (via BORGSTORE_LATENCY [us]) and lower
  bandwidth (via BORGSTORE_BANDWIDTH [bit/s]) than what is actually provided by the backend.

Automatic Nesting
-----------------

For the Store user, items have names like e.g.:

namespace/0123456789abcdef...
namespace/abcdef0123456789...

If there are very many items in the namespace, this could lead to scalability issues in the
backend, thus the Store implementation offers transparent nesting, so that internally the
Backend API will be called with names like e.g.:

namespace/01/23/45/0123456789abcdef...
namespace/ab/cd/ef/abcdef0123456789...

The nesting depth can be configured from 0 (= no nesting) to N levels and there can be
different nesting configurations depending on the namespace.

The Store supports operating at different nesting levels in the same namespace at the same time.

When using nesting depth > 0, the backends will assume that keys are hashes (have hex digits),
because some backends will want to pre-create the nesting directories at backend initialization
time to optimize for better performance while using the backend.

Soft deletion
-------------

To soft delete an item (so its value could still be read or it could be undeleted),
the store just renames the item, appending ".del" to its name.

Undelete reverses this by removing the ".del" suffix from the name.

Some store operations have a boolean flag "deleted" to choose whether they shall consider
soft deleted items.

Backends
--------

The backend API is rather simple, one only needs to provide some very basic operations.

Existing backends are listed below, more might come in the future.

posixfs
~~~~~~~

Use storage on a local POSIX filesystem:

- URL: ``file:///absolute/path``
- it is the caller's task to create an absolute fs path from a relative one.
- namespaces: directories
- values: in key-named files
- pre-creates nesting directories

sftp
~~~~

Use storage on an sftp server:

- URL: ``sftp://user@server:port/relative/path`` (strongly recommended)

  For user's and admin's convenience, mapping the URL path to the server fs path depends on
  the server configuration (home directory, sshd/sftpd config, ...). Usually the path is
  relative to the user's home directory.

- URL: ``sftp://user@server:port//absolute/path``

  As this uses an absolute path, things are more difficult here:

  - user's config might break if server admin moves a user home to a new location.
  - users must know the full absolute path of space they have permission to use.
- namespaces: directories
- values: in key-named files
- pre-creates nesting directories

rclone
~~~~~~

Use storage on any of the many cloud providers `rclone <https://rclone.org/>`_ supports:

- URL: ``rclone:remote:path``, we just prefix "rclone:" and give all to the right of that to
  rclone, see: https://rclone.org/docs/#syntax-of-remote-paths
- implementation of this primarily depends on the specific remote.

Scalability
-----------

- Count of key/value pairs stored in a namespace: automatic nesting is provided for keys to
  address common scalability issues.
- Key size: there are no special provisions for extremely long keys (like: more than backend
  limitations). Usually this is not a problem though.
- Value size: there are no special provisions for dealing with large value sizes (like: more
  than free memory, more than backend storage limitations, etc.). If one deals with very large
  values, one usually cuts them into chunks before storing them into the store.
- Partial loads improve performance by avoiding a full load if only a part of the value is
  needed (e.g. a header with metadata).

Installation
------------

Install without the ``sftp:`` backend::

    pip install borgstore

Install with the ``sftp:`` backend (more dependencies)::

    pip install "borgstore[sftp]"

Please note that ``rclone:`` also supports sftp remotes.

Want a demo?
------------

Run this to get instructions how to run the demo::

    python3 -m borgstore

State of this project
---------------------

**API is still unstable and expected to change as development goes on.**

**As long as the API is unstable, there will be no data migration tools, like e.g. for
upgrading an existing store's data to a new release.**

There are tests and they succeed for the basic functionality, so some of the stuff is already
working well.

There might be missing features or optimization potential, feedback is welcome!

There are a lot of possible, but still missing backends. If you want to create and support one:
pull requests are welcome.

Borg?
-----

Please note that this code is currently **not** used by the stable release of BorgBackup
(aka "borg"), but only by borg2 beta 10+ and master branch.

License
-------

BSD license.
borgstore-0.1.0/src/borgstore.egg-info/SOURCES.txt

CHANGES.rst
LICENSE.rst
README.rst
pyproject.toml
src/borgstore/__init__.py
src/borgstore/__main__.py
src/borgstore/_version.py
src/borgstore/constants.py
src/borgstore/store.py
src/borgstore.egg-info/PKG-INFO
src/borgstore.egg-info/SOURCES.txt
src/borgstore.egg-info/dependency_links.txt
src/borgstore.egg-info/requires.txt
src/borgstore.egg-info/top_level.txt
src/borgstore/backends/__init__.py
src/borgstore/backends/_base.py
src/borgstore/backends/errors.py
src/borgstore/backends/posixfs.py
src/borgstore/backends/rclone.py
src/borgstore/backends/sftp.py
src/borgstore/utils/__init__.py
src/borgstore/utils/nesting.py

borgstore-0.1.0/src/borgstore.egg-info/dependency_links.txt (empty)

borgstore-0.1.0/src/borgstore.egg-info/requires.txt

requests>=2.25.1

[sftp]
paramiko>=1.9.1

borgstore-0.1.0/src/borgstore.egg-info/top_level.txt

borgstore