persist-queue-0.4.0/PKG-INFO

Metadata-Version: 1.2
Name: persist-queue
Version: 0.4.0
Summary: A thread-safe disk based persistent queue in Python.
Home-page: http://github.com/peter-wangxu/persist-queue
Author: Peter Wang
Author-email: wangxu198709@gmail.com
Maintainer: Peter Wang
Maintainer-email: wangxu198709@gmail.com
License: BSD
Description: persist-queue - A thread-safe, disk-based queue for Python
==========================================================

.. image:: https://img.shields.io/circleci/project/github/peter-wangxu/persist-queue/master.svg?label=Linux%20%26%20Mac
    :target: https://circleci.com/gh/peter-wangxu/persist-queue

.. image:: https://img.shields.io/appveyor/ci/peter-wangxu/persist-queue/master.svg?label=Windows
    :target: https://ci.appveyor.com/project/peter-wangxu/persist-queue

.. image:: https://img.shields.io/codecov/c/github/peter-wangxu/persist-queue/master.svg
    :target: https://codecov.io/gh/peter-wangxu/persist-queue

.. image:: https://img.shields.io/pypi/v/persist-queue.svg
    :target: https://pypi.python.org/pypi/persist-queue

``persist-queue`` implements a file-based queue and a series of sqlite3-based queues. The goal is to meet the following requirements:

* Disk-based: each queued item is stored on disk so it survives a crash.
* Thread-safe: can be used by multi-threaded producers and multi-threaded consumers.
* Recoverable: items can be read after a process restart.
* Green-compatible: can be used in ``greenlet`` or ``eventlet`` environments.

Neither *queuelib* nor *python-pqueue* fulfils all of the above. After some experimentation, I found it hard to achieve these goals on top of their current implementations without huge code changes; that is the motivation for starting this project.

*persist-queue* uses the *pickle* object serialization module to support object instances. Most built-in types, like ``int``, ``dict`` and ``list``, can be persisted by *persist-queue* directly. To support customized objects, please refer to `Pickling and unpickling extension types (Python 2) <https://docs.python.org/2/library/pickle.html#pickling-and-unpickling-extension-types>`_ and `Pickling Class Instances (Python 3) <https://docs.python.org/3/library/pickle.html#pickling-class-instances>`_.

This project is based on the achievements of `python-pqueue <https://github.com/balena/python-pqueue>`_ and `queuelib <https://github.com/scrapy/queuelib>`_.

Requirements
------------

* Python 2.7 or Python 3.x.
* Full support for Linux.
* Windows support (with `Caution`_ if ``persistqueue.Queue`` is used).

Installation
------------

from pypi
^^^^^^^^^

.. code-block:: console

    pip install persist-queue

from source code
^^^^^^^^^^^^^^^^

.. code-block:: console

    git clone https://github.com/peter-wangxu/persist-queue
    cd persist-queue
    python setup.py install
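Because items are serialized with ``pickle`` as described above, instances of ordinary user-defined classes can be enqueued without extra work. Below is a minimal sketch; the ``Job`` class and the ``'jobpath'`` directory are illustrative, not part of the library:

.. code-block:: python

    import persistqueue

    class Job(object):
        # Any picklable class works; no special base class is required.
        def __init__(self, job_id, payload):
            self.job_id = job_id
            self.payload = payload

    q = persistqueue.Queue('jobpath')
    q.put(Job(1, {'action': 'resize'}))  # the instance is pickled to disk
    job = q.get()                        # and unpickled on the way back
    print(job.job_id, job.payload)
    q.task_done()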
Benchmark
---------

Here are the results for writing/reading **1000** items to the disk, comparing the sqlite3 and file queues (time in seconds):

- Windows

  - OS: Windows 10
  - Disk: SATA3 SSD
  - RAM: 16 GiB

+---------------+---------+-------------------------+----------------------------+
|               | Write   | Write/Read(1 task_done) | Write/Read(many task_done) |
+---------------+---------+-------------------------+----------------------------+
| SQLite3 Queue | 1.8880  | 2.0290                  | 3.5940                     |
+---------------+---------+-------------------------+----------------------------+
| File Queue    | 15.0550 | 15.9150                 | 30.7650                    |
+---------------+---------+-------------------------+----------------------------+

- Linux

  - OS: Ubuntu 16.04 (VM)
  - Disk: SATA3 SSD
  - RAM: 4 GiB

+---------------+--------+-------------------------+----------------------------+
|               | Write  | Write/Read(1 task_done) | Write/Read(many task_done) |
+---------------+--------+-------------------------+----------------------------+
| SQLite3 Queue | 1.8282 | 1.8075                  | 2.8639                     |
+---------------+--------+-------------------------+----------------------------+
| File Queue    | 0.9123 | 1.0411                  | 2.5104                     |
+---------------+--------+-------------------------+----------------------------+

**note**

The results above were obtained with:

.. code-block:: console

    python benchmark/run_benchmark.py 1000

To see the real performance on your host, run the script ``benchmark/run_benchmark.py``:

.. code-block:: console

    python benchmark/run_benchmark.py

Examples
--------

Example usage with a SQLite3 based queue
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    >>> import persistqueue
    >>> q = persistqueue.SQLiteQueue('mypath', auto_commit=True)
    >>> q.put('str1')
    >>> q.put('str2')
    >>> q.put('str3')
    >>> q.get()
    'str1'
    >>> del q

Close the console, and then recreate the queue:

.. code-block:: python

    >>> import persistqueue
    >>> q = persistqueue.SQLiteQueue('mypath', auto_commit=True)
    >>> q.get()
    'str2'
    >>>

Example usage of SQLite3 based ``UniqueQ``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This queue does not allow duplicate items.

.. code-block:: python

    >>> import persistqueue
    >>> q = persistqueue.UniqueQ('mypath')
    >>> q.put('str1')
    >>> q.put('str1')
    >>> q.size
    1
    >>> q.put('str2')
    >>> q.size
    2
    >>>

Example usage of SQLite3 based ``SQLiteAckQueue``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The core functions:

* ``get``: get an item from the queue and mark it as ``unack``.
* ``ack``: mark the item as acked.
* ``nack``: something went wrong with the current consumer, so mark the item as ready again; a new consumer will get it.
* ``ack_failed``: something went wrong during processing, so just mark the item as failed.

.. code-block:: python

    >>> import persistqueue
    >>> ackq = persistqueue.SQLiteAckQueue('path')
    >>> ackq.put('str1')
    >>> item = ackq.get()
    >>> # Do something with the item
    >>> ackq.ack(item)         # If done with the item
    >>> ackq.nack(item)        # Else mark the item as `nack` so it can be processed again by any worker
    >>> ackq.ack_failed(item)  # Or else mark the item as `ack_failed` to discard it

Note: this queue does not support ``auto_commit=False``.

Example usage with a file based queue
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    >>> from persistqueue import Queue
    >>> q = Queue("mypath")
    >>> q.put('a')
    >>> q.put('b')
    >>> q.put('c')
    >>> q.get()
    'a'
    >>> q.task_done()

Close the python console, and then restart the queue from the same path:

.. code-block:: python

    >>> from persistqueue import Queue
    >>> q = Queue('mypath')
    >>> q.get()
    'b'
    >>> q.task_done()
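The file-based ``Queue`` also accepts an optional ``maxsize`` argument (``0`` means unbounded, see its constructor), and the non-blocking calls raise ``persistqueue.Full``/``persistqueue.Empty``. A short sketch of bounded usage; the ``'boundedpath'`` directory and the sizes are illustrative:

.. code-block:: python

    from persistqueue import Queue, Full

    q = Queue('boundedpath', maxsize=2)
    q.put('a')
    q.put('b')
    try:
        q.put_nowait('c')  # queue is full, raises Full immediately
    except Full:
        pass               # drop, retry later, or block: q.put('c', timeout=5)

    q.get()        # frees one slot
    q.task_done()
    q.put('c')     # now succeeds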
Example usage with a SQLite3 based dict
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    >>> from persistqueue import PDict
    >>> q = PDict("testpath", "testname")
    >>> q['key1'] = 123
    >>> q['key2'] = 321
    >>> q['key1']
    123
    >>> len(q)
    2
    >>> del q['key1']
    >>> q['key1']
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "persistqueue\pdict.py", line 58, in __getitem__
        raise KeyError('Key: {} not exists.'.format(item))
    KeyError: 'Key: key1 not exists.'

Close the console and restart the PDict:

.. code-block:: python

    >>> from persistqueue import PDict
    >>> q = PDict("testpath", "testname")
    >>> q['key2']
    321

Multi-thread usage for **SQLite3** based queue
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    from threading import Thread

    from persistqueue import FIFOSQLiteQueue

    q = FIFOSQLiteQueue(path="./test", multithreading=True)

    def worker():
        while True:
            item = q.get()
            do_work(item)

    for i in range(num_worker_threads):
        t = Thread(target=worker)
        t.daemon = True
        t.start()

    for item in source():
        q.put(item)

Multi-thread usage for **Queue**
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    from threading import Thread

    from persistqueue import Queue

    q = Queue("mypath")

    def worker():
        while True:
            item = q.get()
            do_work(item)
            q.task_done()

    for i in range(num_worker_threads):
        t = Thread(target=worker)
        t.daemon = True
        t.start()

    for item in source():
        q.put(item)

    q.join()  # block until all tasks are done

Tips
----

``task_done`` is required for both the file-based queue and the SQLite3-based queue (when ``auto_commit=False``) to persist the position of the next ``get`` to disk.

Performance impact
------------------

- **WAL**

  Starting with v0.3.2, ``persistqueue`` leverages sqlite3's built-in `WAL <https://www.sqlite.org/wal.html>`_ (write-ahead log), which improves performance significantly; general testing indicates that ``persistqueue`` is 2-4 times faster than the previous version.

- **auto_commit=False**

  Since persistqueue v0.3.0, the parameter ``auto_commit`` is available to tweak the performance of the sqlite3-based queues as needed. When ``auto_commit=False`` is specified, the user needs to call ``queue.task_done()`` to persist the changes made to disk since the last ``task_done`` invocation.

- **pickle protocol selection**

  From v0.3.6, ``persistqueue`` selects pickle ``Protocol version 2`` for Python 2 and ``Protocol version 4`` for Python 3, respectively. This selection only happens when the directory is not present when initializing the queue.
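A sketch of what ``task_done`` protects in the ``auto_commit=False`` mode; the behaviour mirrors ``test_task_done_with_restart`` in ``tests/test_sqlqueue.py``, and the ``'batchpath'`` directory and counts are illustrative:

.. code-block:: python

    import persistqueue

    q = persistqueue.SQLiteQueue('batchpath', auto_commit=False)
    for i in range(10):
        q.put('item%d' % i)

    q.get()        # 'item0'
    q.get()        # 'item1'
    q.task_done()  # the two reads above are now durable

    q.get()        # 'item2', but not yet persisted: after a crash or
                   # restart before task_done(), 'item2' is returned again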
Tests
-----

*persist-queue* uses ``tox`` to trigger tests.

- Unit test

.. code-block:: console

    tox -e <env>

Available ``<env>`` values: ``py27``, ``py34``, ``py35``, ``py36``, ``py37``

- PEP8 check

.. code-block:: console

    tox -e pep8

`pyenv <https://github.com/pyenv/pyenv>`_ is usually a helpful tool to manage multiple versions of Python.

Caution
-------

Currently, the atomic operation is not supported on Windows due to the limitation of Python's `os.rename <https://docs.python.org/3/library/os.html#os.rename>`_. That is to say, the data in ``persistqueue.Queue`` could be left in an unreadable state if an incidental failure occurs during ``Queue.task_done``.

**DO NOT put any critical data in persistqueue.Queue on Windows.**

This issue is tracked by the project issue *Atomic renames on windows*.

Contribution
------------

Simply fork this repo and send a PR for your code change (along with tests to cover your change); remember to give your PR a title and description. I am willing to enhance this project with you :).

License
-------

BSD (see the ``LICENSE`` file shipped with this distribution).

Contributors
------------

See the project's `Contributors <https://github.com/peter-wangxu/persist-queue/graphs/contributors>`_ page on GitHub.

FAQ
---

* ``sqlite3.OperationalError: database is locked`` is raised.

  persistqueue opens 2 connections for the db if ``multithreading=True``; the SQLite database is locked until that transaction is committed. The ``timeout`` parameter specifies how long the connection should wait for the lock to go away before raising an exception. The default timeout is **10** seconds; increase ``timeout`` when creating the queue if the above error occurs.

* sqlite3 based queues are not thread-safe.

  The sqlite3 queues are heavily tested in multi-threading environments; if you find one is not thread-safe, please make sure you set ``multithreading=True`` when initializing the queue before submitting a new issue :).
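For example, if consumers occasionally hit the lock under heavy write load, raising the timeout is a one-line change; the value ``30`` below is an arbitrary illustration:

.. code-block:: python

    import persistqueue

    # 30-second lock timeout instead of the default 10 (illustrative value)
    q = persistqueue.SQLiteQueue('mypath', multithreading=True, timeout=30)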
""" log.debug('Initializing File based Queue with path {}'.format(path)) self.path = path self.chunksize = chunksize self.tempdir = tempdir self.maxsize = maxsize self.protocol = None self._init(maxsize) if self.tempdir: if os.stat(self.path).st_dev != os.stat(self.tempdir).st_dev: raise ValueError("tempdir has to be located " "on same path filesystem") self.info = self._loadinfo() # truncate head case it contains garbage hnum, hcnt, hoffset = self.info['head'] headfn = self._qfile(hnum) if os.path.exists(headfn): if hoffset < os.path.getsize(headfn): _truncate(headfn, hoffset) # let the head file open self.headf = self._openchunk(hnum, 'ab+') # let the tail file open tnum, _, toffset = self.info['tail'] self.tailf = self._openchunk(tnum) self.tailf.seek(toffset) # update unfinished tasks with the current number of enqueued tasks self.unfinished_tasks = self.info['size'] # optimize info file updates self.update_info = True def _init(self, maxsize): self.mutex = threading.Lock() self.not_empty = threading.Condition(self.mutex) self.not_full = threading.Condition(self.mutex) self.all_tasks_done = threading.Condition(self.mutex) if not os.path.exists(self.path): os.makedirs(self.path) self.protocol = common.select_pickle_protocol() def join(self): with self.all_tasks_done: while self.unfinished_tasks: self.all_tasks_done.wait() def qsize(self): n = None with self.mutex: n = self._qsize() return n def _qsize(self): return self.info['size'] def put(self, item, block=True, timeout=None): "Interface for putting item in disk-based queue." self.not_full.acquire() try: if self.maxsize > 0: if not block: if self._qsize() == self.maxsize: raise Full elif timeout is None: while self._qsize() == self.maxsize: self.not_full.wait() elif timeout < 0: raise ValueError("'timeout' must be a non-negative number") else: endtime = _time() + timeout while self._qsize() == self.maxsize: remaining = endtime - _time() if remaining <= 0.0: raise Full self.not_full.wait(remaining) self._put(item) self.unfinished_tasks += 1 self.not_empty.notify() finally: self.not_full.release() def _put(self, item): pickle.dump(item, self.headf) self.headf.flush() hnum, hpos, _ = self.info['head'] hpos += 1 if hpos == self.info['chunksize']: hpos = 0 hnum += 1 self.headf.close() self.headf = self._openchunk(hnum, 'ab+') self.info['size'] += 1 self.info['head'] = [hnum, hpos, self.headf.tell()] self._saveinfo() def put_nowait(self, item): return self.put(item, False) def get(self, block=True, timeout=None): self.not_empty.acquire() try: if not block: if not self._qsize(): raise Empty elif timeout is None: while not self._qsize(): self.not_empty.wait() elif timeout < 0: raise ValueError("'timeout' must be a non-negative number") else: endtime = _time() + timeout while not self._qsize(): remaining = endtime - _time() if remaining <= 0.0: raise Empty self.not_empty.wait(remaining) item = self._get() self.not_full.notify() return item finally: self.not_empty.release() def get_nowait(self): return self.get(False) def _get(self): tnum, tcnt, toffset = self.info['tail'] hnum, hcnt, _ = self.info['head'] if [tnum, tcnt] >= [hnum, hcnt]: return None data = pickle.load(self.tailf) toffset = self.tailf.tell() tcnt += 1 if tcnt == self.info['chunksize'] and tnum <= hnum: tcnt = toffset = 0 tnum += 1 self.tailf.close() self.tailf = self._openchunk(tnum) self.info['size'] -= 1 self.info['tail'] = [tnum, tcnt, toffset] self.update_info = True return data def task_done(self): with self.all_tasks_done: unfinished = self.unfinished_tasks - 1 if 
    def _get(self):
        tnum, tcnt, toffset = self.info['tail']
        hnum, hcnt, _ = self.info['head']
        if [tnum, tcnt] >= [hnum, hcnt]:
            return None
        data = pickle.load(self.tailf)
        toffset = self.tailf.tell()
        tcnt += 1
        if tcnt == self.info['chunksize'] and tnum <= hnum:
            tcnt = toffset = 0
            tnum += 1
            self.tailf.close()
            self.tailf = self._openchunk(tnum)
        self.info['size'] -= 1
        self.info['tail'] = [tnum, tcnt, toffset]
        self.update_info = True
        return data

    def task_done(self):
        with self.all_tasks_done:
            unfinished = self.unfinished_tasks - 1
            if unfinished <= 0:
                if unfinished < 0:
                    raise ValueError("task_done() called too many times.")
                self.all_tasks_done.notify_all()
            self.unfinished_tasks = unfinished
            self._task_done()

    def _task_done(self):
        if self.update_info:
            self._saveinfo()
            self.update_info = False

    def _openchunk(self, number, mode='rb'):
        return open(self._qfile(number), mode)

    def _loadinfo(self):
        infopath = self._infopath()
        if os.path.exists(infopath):
            with open(infopath, 'rb') as f:
                info = pickle.load(f)
        else:
            info = {
                'chunksize': self.chunksize,
                'size': 0,
                'tail': [0, 0, 0],
                'head': [0, 0, 0],
            }
        return info

    def _gettempfile(self):
        if self.tempdir:
            return tempfile.mkstemp(dir=self.tempdir)
        else:
            return tempfile.mkstemp()

    def _saveinfo(self):
        tmpfd, tmpfn = self._gettempfile()
        os.write(tmpfd, pickle.dumps(self.info))
        os.close(tmpfd)
        # POSIX requires that 'rename' is an atomic operation
        try:
            os.rename(tmpfn, self._infopath())
        except OSError as e:
            if getattr(e, 'winerror', None) == 183:
                os.remove(self._infopath())
                os.rename(tmpfn, self._infopath())
            else:
                raise
        self._clear_tail_file()

    def _clear_tail_file(self):
        """Remove the tail files whose items were already consumed."""
        tnum, _, _ = self.info['tail']
        while tnum >= 1:
            tnum -= 1
            path = self._qfile(tnum)
            if os.path.exists(path):
                os.remove(path)
            else:
                break

    def _qfile(self, number):
        return os.path.join(self.path, 'q%05d' % number)

    def _infopath(self):
        return os.path.join(self.path, 'info')

persist-queue-0.4.0/persistqueue/sqlbase.py

import logging
import os
import sqlite3
import threading

from persistqueue import common

sqlite3.enable_callback_tracebacks(True)

log = logging.getLogger(__name__)


def with_conditional_transaction(func):
    def _execute(obj, *args, **kwargs):
        with obj.tran_lock:
            with obj._putter as tran:
                stat, param = func(obj, *args, **kwargs)
                tran.execute(stat, param)
    return _execute


def commit_ignore_error(conn):
    """Ignore the error when no transaction is active.

    The transaction may already be committed by the user's task_done
    call. It's safe to ignore all errors of this kind.
    """
    try:
        conn.commit()
    except sqlite3.OperationalError as ex:
        if 'no transaction is active' in str(ex):
            log.debug(
                'Not able to commit the transaction, '
                'may already be committed.')
        else:
            raise


class SQLiteBase(object):
    """SQLite3 base class."""

    _TABLE_NAME = 'base'  # DB table name
    _KEY_COLUMN = ''  # the name of the key column, used in DB CRUD
    _SQL_CREATE = ''  # SQL to create a table
    _SQL_UPDATE = ''  # SQL to update a record
    _SQL_INSERT = ''  # SQL to insert a record
    _SQL_SELECT = ''  # SQL to select a record
    _SQL_SELECT_WHERE = ''  # SQL to select a record with criteria
    _MEMORY = ':memory:'  # flag indicating store DB in memory

    def __init__(self, path, name='default', multithreading=False,
                 timeout=10.0, auto_commit=True):
        """Initiate a queue in sqlite3 or memory.

        :param path: path for storing DB file.
        :param name: the suffix for the table name,
                     table name would be ${_TABLE_NAME}_${name}
        :param multithreading: if set to True, two db connections will be
                               created, one for **put** and one for **get**.
        :param timeout: timeout in seconds waiting for the database lock.
        :param auto_commit: Set to True if commit is required on every
                            INSERT/UPDATE action, otherwise False, in which
                            case a **task_done** is required to persist
                            changes after **put**.
""" self.memory_sql = False self.path = path self.name = name self.timeout = timeout self.multithreading = multithreading self.auto_commit = auto_commit self.protocol = None self._init() def _init(self): """Initialize the tables in DB.""" if self.path == self._MEMORY: self.memory_sql = True log.debug("Initializing Sqlite3 Queue in memory.") self.protocol = common.select_pickle_protocol() elif not os.path.exists(self.path): os.makedirs(self.path) log.debug( 'Initializing Sqlite3 Queue with path {}'.format(self.path)) # Set to current highest pickle protocol for new queue. self.protocol = common.select_pickle_protocol() self._conn = self._new_db_connection( self.path, self.multithreading, self.timeout) self._getter = self._conn self._putter = self._conn self._conn.execute(self._sql_create) self._conn.commit() # Setup another session only for disk-based queue. if self.multithreading: if not self.memory_sql: self._putter = self._new_db_connection( self.path, self.multithreading, self.timeout) if self.protocol is not None: self._conn.text_factory = str self._putter.text_factory = str # SQLite3 transaction lock self.tran_lock = threading.Lock() self.put_event = threading.Event() def _new_db_connection(self, path, multithreading, timeout): conn = None if path == self._MEMORY: conn = sqlite3.connect(path, check_same_thread=not multithreading) else: conn = sqlite3.connect('{}/data.db'.format(path), timeout=timeout, check_same_thread=not multithreading) conn.execute('PRAGMA journal_mode=WAL;') return conn @with_conditional_transaction def _insert_into(self, *record): return self._sql_insert, record @with_conditional_transaction def _update(self, key, *args): args = list(args) + [key] return self._sql_update, args @with_conditional_transaction def _delete(self, key, op='='): sql = 'DELETE FROM {} WHERE {} {} ?'.format(self._table_name, self._key_column, op) return sql, (key,) def _select(self, *args, **kwargs): op = kwargs.get('op', None) column = kwargs.get('column', None) if op and column: return self._getter.execute( self._sql_select_where(op, column), args).fetchone() return self._getter.execute(self._sql_select, args).fetchone() def _count(self): sql = 'SELECT COUNT({}) FROM {}'.format(self._key_column, self._table_name) row = self._getter.execute(sql).fetchone() return row[0] if row else 0 def _task_done(self): """Only required if auto-commit is set as False.""" commit_ignore_error(self._putter) @property def _table_name(self): return '{}_{}'.format(self._TABLE_NAME, self.name) @property def _key_column(self): return self._KEY_COLUMN @property def _sql_create(self): return self._SQL_CREATE.format(table_name=self._table_name, key_column=self._key_column) @property def _sql_insert(self): return self._SQL_INSERT.format(table_name=self._table_name, key_column=self._key_column) @property def _sql_update(self): return self._SQL_UPDATE.format(table_name=self._table_name, key_column=self._key_column) @property def _sql_select(self): return self._SQL_SELECT.format(table_name=self._table_name, key_column=self._key_column) def _sql_select_where(self, op, column): return self._SQL_SELECT_WHERE.format(table_name=self._table_name, key_column=self._key_column, op=op, column=column) persist-queue-0.4.0/persistqueue/__init__.py0000644000076500000240000000100213311450020022103 0ustar yuzhi.wxstaff00000000000000# coding=utf-8 __author__ = 'Peter Wang' __license__ = 'BSD' __version__ = '0.4.0' from .exceptions import Empty, Full # noqa from .pdict import PDict # noqa from .queue import Queue # noqa from .sqlqueue 
import SQLiteQueue, FIFOSQLiteQueue, FILOSQLiteQueue, UniqueQ # noqa from .sqlackqueue import SQLiteAckQueue __all__ = ["Queue", "SQLiteQueue", "FIFOSQLiteQueue", "FILOSQLiteQueue", "UniqueQ", "PDict", "SQLiteAckQueue", "Empty", "Full", "__author__", "__license__", "__version__"] persist-queue-0.4.0/persistqueue/common.py0000644000076500000240000000050113277745217021671 0ustar yuzhi.wxstaff00000000000000#! coding = utf-8 import logging import pickle log = logging.getLogger(__name__) def select_pickle_protocol(): if pickle.HIGHEST_PROTOCOL <= 2: r = 2 # python2 use fixed 2 else: r = 4 # python3 use fixed 4 log.info("Selected pickle protocol: '{}'".format(r)) return r persist-queue-0.4.0/persistqueue/sqlackqueue.py0000644000076500000240000001740213307241164022716 0ustar yuzhi.wxstaff00000000000000# coding=utf-8 from __future__ import absolute_import from __future__ import unicode_literals import logging import pickle import sqlite3 import time as _time import threading import warnings from . import sqlbase from .exceptions import Empty sqlite3.enable_callback_tracebacks(True) log = logging.getLogger(__name__) # 10 seconds internal for `wait` of event TICK_FOR_WAIT = 10 class AckStatus(object): inited = '0' ready = '1' unack = '2' acked = '5' ack_failed = '9' class SQLiteAckQueue(sqlbase.SQLiteBase): """SQLite3 based FIFO queue with ack support.""" _TABLE_NAME = 'ack_queue' _KEY_COLUMN = '_id' # the name of the key column, used in DB CRUD _MAX_ACKED_LENGTH = 1000 # SQL to create a table _SQL_CREATE = ('CREATE TABLE IF NOT EXISTS {table_name} (' '{key_column} INTEGER PRIMARY KEY AUTOINCREMENT, ' 'data BLOB, timestamp FLOAT, status INTEGER)') # SQL to insert a record _SQL_INSERT = 'INSERT INTO {table_name} (data, timestamp, status)'\ ' VALUES (?, ?, %s)' % AckStatus.inited # SQL to select a record _SQL_SELECT = ('SELECT {key_column}, data, status FROM {table_name} ' 'WHERE status < %s ' 'ORDER BY {key_column} ASC LIMIT 1' % AckStatus.unack) _SQL_MARK_ACK_UPDATE = 'UPDATE {table_name} SET status = ?'\ ' WHERE {key_column} = ?' _SQL_SELECT_WHERE = 'SELECT {key_column}, data FROM {table_name}'\ ' WHERE status < %s AND' \ ' {column} {op} ? 
ORDER BY {key_column} ASC'\ ' LIMIT 1 ' % AckStatus.unack def __init__(self, *args, **kwargs): super(SQLiteAckQueue, self).__init__(*args, **kwargs) if not self.auto_commit: warnings.warn("disable auto commit is not support in ack queue") self.auto_commit = True self._unack_cache = {} def put(self, item): obj = pickle.dumps(item, protocol=self.protocol) self._insert_into(obj, _time.time()) self.total += 1 self.put_event.set() def _init(self): super(SQLiteAckQueue, self)._init() # Action lock to assure multiple action to be *atomic* self.action_lock = threading.Lock() self.total = self._count() def _count(self): sql = 'SELECT COUNT({}) FROM {}'\ ' WHERE status < ?'.format(self._key_column, self._table_name) row = self._getter.execute(sql, (AckStatus.unack,)).fetchone() return row[0] if row else 0 def _ack_count_via_status(self, status): sql = 'SELECT COUNT({}) FROM {}'\ ' WHERE status = ?'.format(self._key_column, self._table_name) row = self._getter.execute(sql, (status, )).fetchone() return row[0] if row else 0 def unack_count(self): return self._ack_count_via_status(AckStatus.unack) def acked_count(self): return self._ack_count_via_status(AckStatus.acked) def ready_count(self): return self._ack_count_via_status(AckStatus.ready) def ack_failed_count(self): return self._ack_count_via_status(AckStatus.ack_failed) @sqlbase.with_conditional_transaction def _mark_ack_status(self, key, status): return self._sql_mark_ack_status, (status, key, ) @sqlbase.with_conditional_transaction def clear_acked_data(self): sql = """DELETE FROM {table_name} WHERE {key_column} IN ( SELECT _id FROM {table_name} WHERE status = ? ORDER BY {key_column} DESC LIMIT 1000 OFFSET {max_acked_length} )""".format(table_name=self._table_name, key_column=self._key_column, max_acked_length=self._MAX_ACKED_LENGTH) return sql, AckStatus.acked @property def _sql_mark_ack_status(self): return self._SQL_MARK_ACK_UPDATE.format(table_name=self._table_name, key_column=self._key_column) def _pop(self): with self.action_lock: row = self._select() # Perhaps a sqlite3 bug, sometimes (None, None) is returned # by select, below can avoid these invalid records. if row and row[0] is not None: self._mark_ack_status(row[0], AckStatus.unack) pickled_data = row[1] # pickled data item = pickle.loads(pickled_data) self._unack_cache[row[0]] = item self.total -= 1 return item return None def _find_item_id(self, item): for key, value in self._unack_cache.items(): if value is item: return key log.warning("Can't find item %s from unack cache", item) return None def ack(self, item): with self.action_lock: _id = self._find_item_id(item) if _id is None: return self._mark_ack_status(_id, AckStatus.acked) self._unack_cache.pop(_id) def ack_failed(self, item): with self.action_lock: _id = self._find_item_id(item) if _id is None: return self._mark_ack_status(_id, AckStatus.ack_failed) self._unack_cache.pop(_id) def nack(self, item): with self.action_lock: _id = self._find_item_id(item) if _id is None: return self._mark_ack_status(_id, AckStatus.ready) self._unack_cache.pop(_id) self.total += 1 def get(self, block=True, timeout=None): if not block: pickled = self._pop() if not pickled: raise Empty elif timeout is None: # block until a put event. 
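            # put() sets put_event on every insert; even if the event is
            # missed between checks, the wait below times out every
            # TICK_FOR_WAIT (10) seconds and _pop() is retried, so the
            # consumer cannot stall forever.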
pickled = self._pop() while not pickled: self.put_event.clear() self.put_event.wait(TICK_FOR_WAIT) pickled = self._pop() elif timeout < 0: raise ValueError("'timeout' must be a non-negative number") else: # block until the timeout reached endtime = _time.time() + timeout pickled = self._pop() while not pickled: self.put_event.clear() remaining = endtime - _time.time() if remaining <= 0.0: raise Empty self.put_event.wait( TICK_FOR_WAIT if TICK_FOR_WAIT < remaining else remaining) pickled = self._pop() item = pickled return item def task_done(self): """Persist the current state if auto_commit=False.""" if not self.auto_commit: self._task_done() @property def size(self): return self.total def qsize(self): return self.size def __len__(self): return self.size FIFOSQLiteAckQueue = SQLiteAckQueue class FILOSQLiteAckQueue(SQLiteAckQueue): """SQLite3 based FILO queue with ack support.""" _TABLE_NAME = 'ack_filo_queue' # SQL to select a record _SQL_SELECT = ('SELECT {key_column}, data FROM {table_name} ' 'WHERE status < %s ' 'ORDER BY {key_column} DESC LIMIT 1' % AckStatus.unack) class UniqueAckQ(SQLiteAckQueue): _TABLE_NAME = 'ack_unique_queue' _SQL_CREATE = ( 'CREATE TABLE IF NOT EXISTS {table_name} (' '{key_column} INTEGER PRIMARY KEY AUTOINCREMENT, ' 'data BLOB, timestamp FLOAT, status INTEGER, UNIQUE (data))' ) def put(self, item): obj = pickle.dumps(item) try: self._insert_into(obj, _time.time()) except sqlite3.IntegrityError: pass else: self.total += 1 self.put_event.set() persist-queue-0.4.0/persistqueue/exceptions.py0000644000076500000240000000012713277745217022566 0ustar yuzhi.wxstaff00000000000000#! coding = utf-8 class Empty(Exception): pass class Full(Exception): pass persist-queue-0.4.0/persistqueue/sqlqueue.py0000644000076500000240000001121613277745217022252 0ustar yuzhi.wxstaff00000000000000# coding=utf-8 """A thread-safe sqlite3 based persistent queue in Python.""" import logging import pickle import sqlite3 import time as _time import threading from persistqueue import sqlbase from persistqueue.exceptions import Empty sqlite3.enable_callback_tracebacks(True) log = logging.getLogger(__name__) # 10 seconds internal for `wait` of event TICK_FOR_WAIT = 10 class SQLiteQueue(sqlbase.SQLiteBase): """SQLite3 based FIFO queue.""" _TABLE_NAME = 'queue' _KEY_COLUMN = '_id' # the name of the key column, used in DB CRUD # SQL to create a table _SQL_CREATE = ('CREATE TABLE IF NOT EXISTS {table_name} (' '{key_column} INTEGER PRIMARY KEY AUTOINCREMENT, ' 'data BLOB, timestamp FLOAT)') # SQL to insert a record _SQL_INSERT = 'INSERT INTO {table_name} (data, timestamp) VALUES (?, ?)' # SQL to select a record _SQL_SELECT = ('SELECT {key_column}, data FROM {table_name} ' 'ORDER BY {key_column} ASC LIMIT 1') _SQL_SELECT_WHERE = 'SELECT {key_column}, data FROM {table_name} WHERE' \ ' {column} {op} ? ORDER BY {key_column} ASC LIMIT 1 ' def put(self, item): obj = pickle.dumps(item, protocol=self.protocol) self._insert_into(obj, _time.time()) self.total += 1 self.put_event.set() def _init(self): super(SQLiteQueue, self)._init() # Action lock to assure multiple action to be *atomic* self.action_lock = threading.Lock() if not self.auto_commit: # Refresh current cursor after restart head = self._select() if head: self.cursor = head[0] - 1 else: self.cursor = 0 self.total = self._count() def _pop(self): with self.action_lock: if self.auto_commit: row = self._select() # Perhaps a sqlite3 bug, sometimes (None, None) is returned # by select, below can avoid these invalid records. 
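            # In auto_commit mode the row is deleted as soon as it is
            # read; the non-auto-commit branch below only advances
            # self.cursor, and task_done() later deletes all consumed
            # rows at once with op='<='.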
if row and row[0] is not None: self._delete(row[0]) self.total -= 1 return row[1] # pickled data else: row = self._select( self.cursor, op=">", column=self._KEY_COLUMN) if row and row[0] is not None: self.cursor = row[0] self.total -= 1 return row[1] return None def get(self, block=True, timeout=None): if not block: pickled = self._pop() if not pickled: raise Empty elif timeout is None: # block until a put event. pickled = self._pop() while not pickled: self.put_event.clear() self.put_event.wait(TICK_FOR_WAIT) pickled = self._pop() elif timeout < 0: raise ValueError("'timeout' must be a non-negative number") else: # block until the timeout reached endtime = _time.time() + timeout pickled = self._pop() while not pickled: self.put_event.clear() remaining = endtime - _time.time() if remaining <= 0.0: raise Empty self.put_event.wait( TICK_FOR_WAIT if TICK_FOR_WAIT < remaining else remaining) pickled = self._pop() item = pickle.loads(pickled) return item def task_done(self): """Persist the current state if auto_commit=False.""" if not self.auto_commit: self._delete(self.cursor, op='<=') self._task_done() @property def size(self): return self.total def qsize(self): return self.size def __len__(self): return self.size FIFOSQLiteQueue = SQLiteQueue class FILOSQLiteQueue(SQLiteQueue): """SQLite3 based FILO queue.""" _TABLE_NAME = 'filo_queue' # SQL to select a record _SQL_SELECT = ('SELECT {key_column}, data FROM {table_name} ' 'ORDER BY {key_column} DESC LIMIT 1') class UniqueQ(SQLiteQueue): _TABLE_NAME = 'unique_queue' _SQL_CREATE = ('CREATE TABLE IF NOT EXISTS {table_name} (' '{key_column} INTEGER PRIMARY KEY AUTOINCREMENT, ' 'data BLOB, timestamp FLOAT, UNIQUE (data))') def put(self, item): obj = pickle.dumps(item) try: self._insert_into(obj, _time.time()) except sqlite3.IntegrityError: pass else: self.total += 1 self.put_event.set() persist-queue-0.4.0/persistqueue/pdict.py0000644000076500000240000000375313277745217021520 0ustar yuzhi.wxstaff00000000000000#! coding = utf-8 import logging import pickle import sqlite3 from persistqueue import sqlbase log = logging.getLogger(__name__) class PDict(sqlbase.SQLiteBase, dict): _TABLE_NAME = 'dict' _KEY_COLUMN = 'key' _SQL_CREATE = ('CREATE TABLE IF NOT EXISTS {table_name} (' '{key_column} TEXT PRIMARY KEY, data BLOB)') _SQL_INSERT = 'INSERT INTO {table_name} (key, data) VALUES (?, ?)' _SQL_SELECT = ('SELECT {key_column}, data FROM {table_name} ' 'WHERE {key_column} = ?') _SQL_UPDATE = 'UPDATE {table_name} SET data = ? WHERE {key_column} = ?' 
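    # __setitem__ below emulates an upsert: it tries _SQL_INSERT first
    # and falls back to _SQL_UPDATE when the key already exists and
    # sqlite3 raises IntegrityError on the primary-key constraint.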
def __init__(self, path, name, multithreading=False): # PDict is always auto_commit=True super(PDict, self).__init__(path, name=name, multithreading=multithreading, auto_commit=True) def __iter__(self): raise NotImplementedError('Not supported.') def keys(self): raise NotImplementedError('Not supported.') def iterkeys(self): raise NotImplementedError('Not supported.') def values(self): raise NotImplementedError('Not supported.') def itervalues(self): raise NotImplementedError('Not supported.') def iteritems(self): raise NotImplementedError('Not supported.') def items(self): raise NotImplementedError('Not supported.') def __contains__(self, item): row = self._select(item) return row is not None def __setitem__(self, key, value): obj = pickle.dumps(value) try: self._insert_into(key, obj) except sqlite3.IntegrityError: self._update(key, obj) def __getitem__(self, item): row = self._select(item) if row: return pickle.loads(row[1]) else: raise KeyError('Key: {} not exists.'.format(item)) def __delitem__(self, key): self._delete(key) def __len__(self): return self._count() persist-queue-0.4.0/LICENSE0000644000076500000240000000301713277745217016303 0ustar yuzhi.wxstaff00000000000000Copyright (c) G. B. Versiani. Copyright (c) Peter Wang. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of python-pqueue nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
persist-queue-0.4.0/requirements.txt0000644000076500000240000000000013277745217020547 0ustar yuzhi.wxstaff00000000000000persist-queue-0.4.0/test-requirements.txt0000644000076500000240000000015313277745217021535 0ustar yuzhi.wxstaff00000000000000mock>=2.0.0 flake8>=3.2.1 eventlet>=0.19.0 nose2>=0.6.5 coverage!=4.5 cov_core>=1.15.0 virtualenv>=15.1.0 persist-queue-0.4.0/tests/0000755000076500000240000000000013311454610016416 5ustar yuzhi.wxstaff00000000000000persist-queue-0.4.0/tests/__init__.py0000644000076500000240000000000013277745217020536 0ustar yuzhi.wxstaff00000000000000persist-queue-0.4.0/tests/test_pdict.py0000644000076500000240000000461013277745217021154 0ustar yuzhi.wxstaff00000000000000 import shutil import tempfile import unittest from persistqueue import pdict class PDictTest(unittest.TestCase): def setUp(self): self.path = tempfile.mkdtemp(suffix='pdict') def tearDown(self): shutil.rmtree(self.path, ignore_errors=True) def test_unsupported(self): pd = pdict.PDict(self.path, 'pd') pd['key_a'] = 'value_a' self.assertRaises(NotImplementedError, pd.keys) self.assertRaises(NotImplementedError, pd.iterkeys) self.assertRaises(NotImplementedError, pd.values) self.assertRaises(NotImplementedError, pd.itervalues) self.assertRaises(NotImplementedError, pd.items) self.assertRaises(NotImplementedError, pd.iteritems) def _for(): for _ in pd: pass self.assertRaises(NotImplementedError, _for) def test_add(self): pd = pdict.PDict(self.path, 'pd') pd['key_a'] = 'value_a' self.assertEqual(pd['key_a'], 'value_a') self.assertTrue('key_a' in pd) self.assertFalse('key_b' in pd) self.assertRaises(KeyError, lambda: pd['key_b']) pd['key_b'] = 'value_b' self.assertEqual(pd['key_a'], 'value_a') self.assertEqual(pd['key_b'], 'value_b') def test_set(self): pd = pdict.PDict(self.path, 'pd') pd['key_a'] = 'value_a' pd['key_b'] = 'value_b' self.assertEqual(pd['key_a'], 'value_a') self.assertEqual(pd['key_b'], 'value_b') pd['key_a'] = 'value_aaaaaa' self.assertEqual(pd['key_a'], 'value_aaaaaa') self.assertEqual(pd['key_b'], 'value_b') def test_delete(self): pd = pdict.PDict(self.path, 'pd') pd['key_a'] = 'value_a' pd['key_b'] = 'value_b' self.assertEqual(pd['key_a'], 'value_a') self.assertEqual(pd['key_b'], 'value_b') del pd['key_a'] self.assertFalse('key_a' in pd) self.assertRaises(KeyError, lambda: pd['key_a']) self.assertEqual(pd['key_b'], 'value_b') def test_two_dicts(self): pd_1 = pdict.PDict(self.path, '1') pd_2 = pdict.PDict(self.path, '2') pd_1['key_a'] = 'value_a' pd_2['key_b'] = 'value_b' self.assertEqual(pd_1['key_a'], 'value_a') self.assertEqual(pd_2['key_b'], 'value_b') self.assertRaises(KeyError, lambda: pd_1['key_b']) self.assertRaises(KeyError, lambda: pd_2['key_a']) persist-queue-0.4.0/tests/test_queue.py0000644000076500000240000001602213277745217021175 0ustar yuzhi.wxstaff00000000000000# coding=utf-8 import mock import os import pickle import random import shutil import sys import tempfile import unittest from threading import Thread from persistqueue import Queue, Empty, Full class PersistTest(unittest.TestCase): def setUp(self): self.path = tempfile.mkdtemp(suffix='queue') def tearDown(self): shutil.rmtree(self.path, ignore_errors=True) def test_open_close_single(self): """Write 1 item, close, reopen checking if same item is there""" q = Queue(self.path) q.put(b'var1') del q q = Queue(self.path) self.assertEqual(1, q.qsize()) self.assertEqual(b'var1', q.get()) q.task_done() def test_open_close_1000(self): """Write 1000 items, close, reopen checking if all items are there""" q = 
Queue(self.path) for i in range(1000): q.put('var%d' % i) self.assertEqual(1000, q.qsize()) del q q = Queue(self.path) self.assertEqual(1000, q.qsize()) for i in range(1000): data = q.get() self.assertEqual('var%d' % i, data) q.task_done() with self.assertRaises(Empty): q.get_nowait() # assert adding another one still works q.put(b'foobar') data = q.get() def test_partial_write(self): """Test recovery from previous crash w/ partial write""" q = Queue(self.path) for i in range(100): q.put('var%d' % i) del q with open(os.path.join(self.path, 'q00000'), 'ab') as f: pickle.dump('文字化け', f) q = Queue(self.path) self.assertEqual(100, q.qsize()) for i in range(100): self.assertEqual('var%d' % i, q.get()) q.task_done() with self.assertRaises(Empty): q.get_nowait() def test_random_read_write(self): """Test random read/write""" q = Queue(self.path) n = 0 for i in range(1000): if random.random() < 0.5: if n > 0: q.get_nowait() q.task_done() n -= 1 else: with self.assertRaises(Empty): q.get_nowait() else: q.put('var%d' % random.getrandbits(16)) n += 1 def test_multi_threaded(self): """Create consumer and producer threads, check parallelism""" q = Queue(self.path) def producer(): for i in range(1000): q.put('var%d' % i) def consumer(): for i in range(1000): q.get() q.task_done() c = Thread(target=consumer) c.start() p = Thread(target=producer) p.start() c.join() p.join() q.join() with self.assertRaises(Empty): q.get_nowait() def test_garbage_on_head(self): """Adds garbage to the queue head and let the internal integrity checks fix it""" q = Queue(self.path) q.put(b'var1') del q with open(os.path.join(self.path, 'q00000'), 'ab') as f: f.write(b'garbage') q = Queue(self.path) q.put(b'var2') self.assertEqual(2, q.qsize()) self.assertEqual(b'var1', q.get()) q.task_done() def test_task_done_too_many_times(self): """Test too many task_done called.""" q = Queue(self.path) q.put(b'var1') q.get() q.task_done() with self.assertRaises(ValueError): q.task_done() def test_get_timeout_negative(self): q = Queue(self.path) q.put(b'var1') with self.assertRaises(ValueError): q.get(timeout=-1) def test_get_timeout(self): """Test when get failed within timeout.""" q = Queue(self.path) q.put(b'var1') q.get() with self.assertRaises(Empty): q.get(timeout=1) def test_put_nowait(self): """Tests the put_nowait interface.""" q = Queue(self.path) q.put_nowait(b'var1') self.assertEqual(b'var1', q.get()) q.task_done() def test_put_maxsize_reached(self): """Test that maxsize reached.""" q = Queue(self.path, maxsize=10) for x in range(10): q.put(x) with self.assertRaises(Full): q.put(b'full_now', block=False) def test_put_timeout_reached(self): """Test put with block and timeout.""" q = Queue(self.path, maxsize=2) for x in range(2): q.put(x) with self.assertRaises(Full): q.put(b'full_and_timeout', block=True, timeout=1) def test_put_timeout_negative(self): """Test and put with timeout < 0""" q = Queue(self.path, maxsize=1) with self.assertRaises(ValueError): q.put(b'var1', timeout=-1) def test_put_block_and_wait(self): """Test block until queue is not full.""" q = Queue(self.path, maxsize=10) def consumer(): for i in range(5): q.get() q.task_done() def producer(): for j in range(16): q.put('var%d' % j) p = Thread(target=producer) p.start() c = Thread(target=consumer) c.start() c.join() val = q.get_nowait() p.join() self.assertEqual('var5', val) def test_clear_tail_file(self): """Teat that only remove tail file when calling task_done.""" q = Queue(self.path, chunksize=10) for i in range(35): q.put('var%d' % i) for _ in range(15): 
q.get() q = Queue(self.path, chunksize=10) self.assertEqual(q.qsize(), 35) for _ in range(15): q.get() # the first tail file gets removed after task_done q.task_done() for _ in range(16): q.get() # the second and third files get removed after task_done q.task_done() self.assertEqual(q.qsize(), 4) def test_windows_error(self): """Test the rename restrictions of Windows""" q = Queue(self.path) q.put(b'a') fake_error = OSError('Cannot create a file when' 'that file already exists') setattr(fake_error, 'winerror', 183) os_rename = os.rename i = [] def fake_remove(src, dst): if not i: i.append(1) raise fake_error else: i.append(2) os_rename(src, dst) with mock.patch('os.rename', new=fake_remove): q.put(b'b') self.assertTrue(b'a', q.get()) self.assertTrue(b'b', q.get()) def test_protocol_1(self): shutil.rmtree(self.path) q = Queue(path=self.path) self.assertEqual(q.protocol, 2 if sys.version_info[0] == 2 else 4) def test_protocol_2(self): q = Queue(path=self.path) self.assertEqual(q.protocol, None) persist-queue-0.4.0/tests/test_sqlqueue.py0000644000076500000240000002500613307235336021705 0ustar yuzhi.wxstaff00000000000000# coding=utf-8 import random import shutil import sys import tempfile import unittest from threading import Thread from persistqueue import SQLiteQueue, FILOSQLiteQueue, UniqueQ from persistqueue import Empty class SQLite3QueueTest(unittest.TestCase): def setUp(self): self.path = tempfile.mkdtemp(suffix='sqlqueue') self.auto_commit = True def tearDown(self): shutil.rmtree(self.path, ignore_errors=True) def test_raise_empty(self): q = SQLiteQueue(self.path, auto_commit=self.auto_commit) q.put('first') d = q.get() self.assertEqual('first', d) self.assertRaises(Empty, q.get, block=False) # assert with timeout self.assertRaises(Empty, q.get, block=True, timeout=1.0) # assert with negative timeout self.assertRaises(ValueError, q.get, block=True, timeout=-1.0) def test_open_close_single(self): """Write 1 item, close, reopen checking if same item is there""" q = SQLiteQueue(self.path, auto_commit=self.auto_commit) q.put(b'var1') del q q = SQLiteQueue(self.path) self.assertEqual(1, q.qsize()) self.assertEqual(b'var1', q.get()) def test_open_close_1000(self): """Write 1000 items, close, reopen checking if all items are there""" q = SQLiteQueue(self.path, auto_commit=self.auto_commit) for i in range(1000): q.put('var%d' % i) self.assertEqual(1000, q.qsize()) del q q = SQLiteQueue(self.path) self.assertEqual(1000, q.qsize()) for i in range(1000): data = q.get() self.assertEqual('var%d' % i, data) # assert adding another one still works q.put('foobar') data = q.get() self.assertEqual('foobar', data) def test_random_read_write(self): """Test random read/write""" q = SQLiteQueue(self.path, auto_commit=self.auto_commit) n = 0 for _ in range(1000): if random.random() < 0.5: if n > 0: q.get() n -= 1 else: self.assertRaises(Empty, q.get, block=False) else: q.put('var%d' % random.getrandbits(16)) n += 1 def test_multi_threaded_parallel(self): """Create consumer and producer threads, check parallelism""" # self.skipTest("Not supported multi-thread.") m_queue = SQLiteQueue(path=self.path, multithreading=True, auto_commit=self.auto_commit) def producer(): for i in range(1000): m_queue.put('var%d' % i) def consumer(): for i in range(1000): x = m_queue.get(block=True) self.assertEqual('var%d' % i, x) c = Thread(target=consumer) c.start() p = Thread(target=producer) p.start() p.join() c.join() self.assertEqual(0, m_queue.size) self.assertEqual(0, len(m_queue)) self.assertRaises(Empty, m_queue.get, 
block=False) def test_multi_threaded_multi_producer(self): """Test sqlqueue can be used by multiple producers.""" queue = SQLiteQueue(path=self.path, multithreading=True, auto_commit=self.auto_commit) def producer(seq): for i in range(10): queue.put('var%d' % (i + (seq * 10))) def consumer(): for _ in range(100): data = queue.get(block=True) self.assertTrue('var' in data) c = Thread(target=consumer) c.start() producers = [] for seq in range(10): t = Thread(target=producer, args=(seq,)) t.start() producers.append(t) for t in producers: t.join() c.join() def test_multiple_consumers(self): """Test sqlqueue can be used by multiple consumers.""" queue = SQLiteQueue(path=self.path, multithreading=True, auto_commit=self.auto_commit) def producer(): for x in range(1000): queue.put('var%d' % x) counter = [] # Set all to 0 for _ in range(1000): counter.append(0) def consumer(index): for i in range(200): data = queue.get(block=True) self.assertTrue('var' in data) counter[index * 200 + i] = data p = Thread(target=producer) p.start() consumers = [] for index in range(5): t = Thread(target=consumer, args=(index,)) t.start() consumers.append(t) p.join() for t in consumers: t.join() self.assertEqual(0, queue.qsize()) for x in range(1000): self.assertNotEqual(0, counter[x], "not 0 for counter's index %s" % x) self.assertEqual(len(set(counter)), len(counter)) def test_task_done_with_restart(self): """Test that items are not deleted before task_done.""" q = SQLiteQueue(path=self.path, auto_commit=False) for i in range(1, 11): q.put(i) self.assertEqual(1, q.get()) self.assertEqual(2, q.get()) # size is correct before task_done self.assertEqual(8, q.qsize()) q.task_done() # make sure the size still correct self.assertEqual(8, q.qsize()) self.assertEqual(3, q.get()) # without task done del q q = SQLiteQueue(path=self.path, auto_commit=False) # After restart, the qsize and head item are the same self.assertEqual(8, q.qsize()) # After restart, the queue still works self.assertEqual(3, q.get()) self.assertEqual(7, q.qsize()) def test_protocol_1(self): shutil.rmtree(self.path, ignore_errors=True) q = SQLiteQueue(path=self.path) self.assertEqual(q.protocol, 2 if sys.version_info[0] == 2 else 4) def test_protocol_2(self): q = SQLiteQueue(path=self.path) self.assertEqual(q.protocol, None) class SQLite3QueueNoAutoCommitTest(SQLite3QueueTest): def setUp(self): self.path = tempfile.mkdtemp(suffix='sqlqueue_auto_commit') self.auto_commit = False def test_multiple_consumers(self): """ FAIL: test_multiple_consumers ( -tests.test_sqlqueue.SQLite3QueueNoAutoCommitTest) Test sqlqueue can be used by multiple consumers. 
---------------------------------------------------------------------- Traceback (most recent call last): File "persist-queue\tests\test_sqlqueue.py", line 183, -in test_multiple_consumers self.assertEqual(0, queue.qsize()) AssertionError: 0 != 72 :return: """ self.skipTest('Skipped due to a known bug above.') class SQLite3QueueInMemory(SQLite3QueueTest): def setUp(self): self.path = ":memory:" self.auto_commit = True def test_open_close_1000(self): self.skipTest('Memory based sqlite is not persistent.') def test_open_close_single(self): self.skipTest('Memory based sqlite is not persistent.') def test_multiple_consumers(self): self.skipTest('Skipped due to occasional crash during ' 'multithreading mode.') def test_multi_threaded_multi_producer(self): self.skipTest('Skipped due to occasional crash during ' 'multithreading mode.') def test_multi_threaded_parallel(self): self.skipTest('Skipped due to occasional crash during ' 'multithreading mode.') def test_task_done_with_restart(self): self.skipTest('Skipped due to not persistent.') def test_protocol_2(self): self.skipTest('In memory queue is always new.') class FILOSQLite3QueueTest(unittest.TestCase): def setUp(self): self.path = tempfile.mkdtemp(suffix='filo_sqlqueue') self.auto_commit = True def tearDown(self): shutil.rmtree(self.path, ignore_errors=True) def test_open_close_1000(self): """Write 1000 items, close, reopen checking if all items are there""" q = FILOSQLiteQueue(self.path, auto_commit=self.auto_commit) for i in range(1000): q.put('var%d' % i) self.assertEqual(1000, q.qsize()) del q q = FILOSQLiteQueue(self.path) self.assertEqual(1000, q.qsize()) for i in range(1000): data = q.get() self.assertEqual('var%d' % (999 - i), data) # assert adding another one still works q.put('foobar') data = q.get() self.assertEqual('foobar', data) class FILOSQLite3QueueNoAutoCommitTest(FILOSQLite3QueueTest): def setUp(self): self.path = tempfile.mkdtemp(suffix='filo_sqlqueue_auto_commit') self.auto_commit = False class SQLite3UniqueQueueTest(unittest.TestCase): def setUp(self): self.path = tempfile.mkdtemp(suffix='sqlqueue') self.auto_commit = True def test_add_duplicate_item(self): q = UniqueQ(self.path) q.put(1111) self.assertEqual(1, q.size) # put duplicate item q.put(1111) self.assertEqual(1, q.size) q.put(2222) self.assertEqual(2, q.size) del q q = UniqueQ(self.path) self.assertEqual(2, q.size) def test_multiple_consumers(self): """Test UniqueQ can be used by multiple consumers.""" queue = UniqueQ(path=self.path, multithreading=True, auto_commit=self.auto_commit) def producer(): for x in range(1000): queue.put('var%d' % x) counter = [] # Set all to 0 for _ in range(1000): counter.append(0) def consumer(index): for i in range(200): data = queue.get(block=True) self.assertTrue('var' in data) counter[index * 200 + i] = data p = Thread(target=producer) p.start() consumers = [] for index in range(5): t = Thread(target=consumer, args=(index,)) t.start() consumers.append(t) p.join() for t in consumers: t.join() self.assertEqual(0, queue.qsize()) for x in range(1000): self.assertNotEqual(0, counter[x], "not 0 for counter's index %s" % x) self.assertEqual(len(set(counter)), len(counter)) persist-queue-0.4.0/tests/test_sqlackqueue.py0000644000076500000240000002310413304764767022375 0ustar yuzhi.wxstaff00000000000000# coding=utf-8 import random import shutil import sys import tempfile import unittest from threading import Thread from persistqueue.sqlackqueue import ( SQLiteAckQueue, FILOSQLiteAckQueue, UniqueAckQ) from persistqueue import Empty class 
SQLite3AckQueueTest(unittest.TestCase): def setUp(self): self.path = tempfile.mkdtemp(suffix='sqlackqueue') self.auto_commit = True def tearDown(self): shutil.rmtree(self.path, ignore_errors=True) def test_raise_empty(self): q = SQLiteAckQueue(self.path, auto_commit=self.auto_commit) q.put('first') d = q.get() self.assertEqual('first', d) self.assertRaises(Empty, q.get, block=False) # assert with timeout self.assertRaises(Empty, q.get, block=True, timeout=1.0) # assert with negative timeout self.assertRaises(ValueError, q.get, block=True, timeout=-1.0) def test_open_close_single(self): """Write 1 item, close, reopen checking if same item is there""" q = SQLiteAckQueue(self.path, auto_commit=self.auto_commit) q.put(b'var1') del q q = SQLiteAckQueue(self.path) self.assertEqual(1, q.qsize()) self.assertEqual(b'var1', q.get()) def test_open_close_1000(self): """Write 1000 items, close, reopen checking if all items are there""" q = SQLiteAckQueue(self.path, auto_commit=self.auto_commit) for i in range(1000): q.put('var%d' % i) self.assertEqual(1000, q.qsize()) del q q = SQLiteAckQueue(self.path) self.assertEqual(1000, q.qsize()) for i in range(1000): data = q.get() self.assertEqual('var%d' % i, data) # assert adding another one still works q.put('foobar') data = q.get() self.assertEqual('foobar', data) def test_random_read_write(self): """Test random read/write""" q = SQLiteAckQueue(self.path, auto_commit=self.auto_commit) n = 0 for _ in range(1000): if random.random() < 0.5: if n > 0: q.get() n -= 1 else: self.assertRaises(Empty, q.get, block=False) else: q.put('var%d' % random.getrandbits(16)) n += 1 def test_multi_threaded_parallel(self): """Create consumer and producer threads, check parallelism""" # self.skipTest("Not supported multi-thread.") m_queue = SQLiteAckQueue( path=self.path, multithreading=True, auto_commit=self.auto_commit ) def producer(): for i in range(1000): m_queue.put('var%d' % i) def consumer(): for i in range(1000): x = m_queue.get(block=True) self.assertEqual('var%d' % i, x) c = Thread(target=consumer) c.start() p = Thread(target=producer) p.start() p.join() c.join() self.assertEqual(0, m_queue.size) self.assertEqual(0, len(m_queue)) self.assertRaises(Empty, m_queue.get, block=False) def test_multi_threaded_multi_producer(self): """Test sqlqueue can be used by multiple producers.""" queue = SQLiteAckQueue( path=self.path, multithreading=True, auto_commit=self.auto_commit ) def producer(seq): for i in range(10): queue.put('var%d' % (i + (seq * 10))) def consumer(): for _ in range(100): data = queue.get(block=True) self.assertTrue('var' in data) c = Thread(target=consumer) c.start() producers = [] for seq in range(10): t = Thread(target=producer, args=(seq,)) t.start() producers.append(t) for t in producers: t.join() c.join() def test_multiple_consumers(self): """Test sqlqueue can be used by multiple consumers.""" queue = SQLiteAckQueue( path=self.path, multithreading=True, auto_commit=self.auto_commit ) def producer(): for x in range(1000): queue.put('var%d' % x) counter = [] # Set all to 0 for _ in range(1000): counter.append(0) def consumer(index): for i in range(200): data = queue.get(block=True) self.assertTrue('var' in data) counter[index * 200 + i] = data p = Thread(target=producer) p.start() consumers = [] for index in range(5): t = Thread(target=consumer, args=(index,)) t.start() consumers.append(t) p.join() for t in consumers: t.join() self.assertEqual(0, queue.qsize()) for x in range(1000): self.assertNotEqual(0, counter[x], "not 0 for counter's index %s" % x) 
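    # The two protocol tests below pin common.select_pickle_protocol()
    # behaviour: a queue created in a fresh directory records pickle
    # protocol 2 on Python 2 or 4 on Python 3, while reopening an
    # existing directory leaves q.protocol as None.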
    def test_protocol_1(self):
        shutil.rmtree(self.path, ignore_errors=True)
        q = SQLiteAckQueue(path=self.path)
        self.assertEqual(q.protocol, 2 if sys.version_info[0] == 2 else 4)

    def test_protocol_2(self):
        q = SQLiteAckQueue(path=self.path)
        self.assertEqual(q.protocol, None)

    def test_ack_and_clear(self):
        q = SQLiteAckQueue(path=self.path)
        q._MAX_ACKED_LENGTH = 10
        ret_list = []
        for _ in range(100):
            q.put("val%s" % _)
        for _ in range(100):
            ret_list.append(q.get())
        for ret in ret_list:
            q.ack(ret)
        self.assertEqual(q.acked_count(), 100)
        q.clear_acked_data()
        self.assertEqual(q.acked_count(), 10)

    def test_ack_unknown_item(self):
        q = SQLiteAckQueue(path=self.path)
        q.put("val1")
        val1 = q.get()
        q.ack("val2")
        q.nack("val3")
        q.ack_failed("val4")
        self.assertEqual(q.qsize(), 0)
        self.assertEqual(q.unack_count(), 1)
        q.ack(val1)
        self.assertEqual(q.unack_count(), 0)

    def test_ack_unack_ack_failed(self):
        q = SQLiteAckQueue(path=self.path)
        q.put("val1")
        q.put("val2")
        q.put("val3")
        val1 = q.get()
        val2 = q.get()
        val3 = q.get()
        # qsize should be zero once all items have been fetched from the queue
        self.assertEqual(q.qsize(), 0)
        self.assertEqual(q.unack_count(), 3)
        # nack requeues the item with `ready` status
        q.nack(val1)
        self.assertEqual(q.qsize(), 1)
        self.assertEqual(q.ready_count(), 1)
        # ack_failed just marks the item as failed
        q.ack_failed(val3)
        self.assertEqual(q.ack_failed_count(), 1)
        # ack should not affect qsize
        q.ack(val2)
        self.assertEqual(q.acked_count(), 1)
        self.assertEqual(q.qsize(), 1)
        # every ack* action reduces the unack count
        self.assertEqual(q.unack_count(), 0)
        # fetch the nacked item again
        ready_val = q.get()
        self.assertEqual(ready_val, val1)
        q.ack(ready_val)
        self.assertEqual(q.qsize(), 0)
        self.assertEqual(q.acked_count(), 2)
        self.assertEqual(q.ready_count(), 0)


class SQLite3QueueInMemory(SQLite3AckQueueTest):
    def setUp(self):
        self.path = ":memory:"
        self.auto_commit = True

    def test_open_close_1000(self):
        self.skipTest('Memory based sqlite is not persistent.')

    def test_open_close_single(self):
        self.skipTest('Memory based sqlite is not persistent.')

    def test_multiple_consumers(self):
        self.skipTest('Skipped due to occasional crash during '
                      'multithreading mode.')

    def test_multi_threaded_multi_producer(self):
        self.skipTest('Skipped due to occasional crash during '
                      'multithreading mode.')

    def test_multi_threaded_parallel(self):
        self.skipTest('Skipped due to occasional crash during '
                      'multithreading mode.')

    def test_task_done_with_restart(self):
        self.skipTest('Skipped due to not persistent.')

    def test_protocol_2(self):
        self.skipTest('In memory queue is always new.')


class FILOSQLite3AckQueueTest(unittest.TestCase):
    def setUp(self):
        self.path = tempfile.mkdtemp(suffix='filo_sqlackqueue')
        self.auto_commit = True

    def tearDown(self):
        shutil.rmtree(self.path, ignore_errors=True)

    def test_open_close_1000(self):
        """Write 1000 items, close, reopen checking if all items are there"""
        q = FILOSQLiteAckQueue(self.path, auto_commit=self.auto_commit)
        for i in range(1000):
            q.put('var%d' % i)
        self.assertEqual(1000, q.qsize())
        del q
        q = FILOSQLiteAckQueue(self.path)
        self.assertEqual(1000, q.qsize())
        for i in range(1000):
            data = q.get()
            # FILO: items come back in reverse insertion order
            self.assertEqual('var%d' % (999 - i), data)
        # assert adding another one still works
        q.put('foobar')
        data = q.get()
        self.assertEqual('foobar', data)


class SQLite3UniqueAckQueueTest(unittest.TestCase):
    def setUp(self):
        self.path = tempfile.mkdtemp(suffix='sqlackqueue')
        self.auto_commit = True

    def tearDown(self):
        # clean up the temporary directory, mirroring the other test classes
        shutil.rmtree(self.path, ignore_errors=True)

    def test_add_duplicate_item(self):
        q = UniqueAckQ(self.path)
        q.put(1111)
        self.assertEqual(1, q.size)
        # put a duplicate item
        q.put(1111)
        self.assertEqual(1, q.size)
        q.put(2222)
        self.assertEqual(2, q.size)
        del q
        q = UniqueAckQ(self.path)
        self.assertEqual(2, q.size)
persist-queue-0.4.0/MANIFEST.in0000644000076500000240000000006113277745217017030 0ustar yuzhi.wxstaff00000000000000
include LICENSE
include README.rst
include *.txt
persist-queue-0.4.0/setup.py0000644000076500000240000000252113277745217017007 0ustar yuzhi.wxstaff00000000000000
#!/usr/bin/env python
# coding=utf-8

from setuptools import setup, find_packages

setup(
    name='persist-queue',
    version=__import__('persistqueue').__version__,
    description=(
        'A thread-safe disk based persistent queue in Python.'
    ),
    long_description=open('README.rst').read(),
    author=__import__('persistqueue').__author__,
    author_email='wangxu198709@gmail.com',
    maintainer=__import__('persistqueue').__author__,
    maintainer_email='wangxu198709@gmail.com',
    license=__import__('persistqueue').__license__,
    packages=find_packages(),
    platforms=["all"],
    url='http://github.com/peter-wangxu/persist-queue',
    classifiers=[
        'Development Status :: 4 - Beta',
        'Operating System :: OS Independent',
        'Intended Audience :: Developers',
        'License :: OSI Approved :: BSD License',
        'Programming Language :: Python',
        'Programming Language :: Python :: Implementation',
        'Programming Language :: Python :: 2',
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: 3.5',
        'Programming Language :: Python :: 3.6',
        'Programming Language :: Python :: 3.7',
        'Topic :: Software Development :: Libraries'
    ],
)
persist-queue-0.4.0/setup.cfg0000644000076500000240000000010313311454610017073 0ustar yuzhi.wxstaff00000000000000
[bdist_wheel]
universal = 1

[egg_info]
tag_build = 
tag_date = 0
persist-queue-0.4.0/README.rst0000644000076500000240000002563313311452716016755 0ustar yuzhi.wxstaff00000000000000
persist-queue - A thread-safe, disk-based queue for Python
==========================================================

.. image:: https://img.shields.io/circleci/project/github/peter-wangxu/persist-queue/master.svg?label=Linux%20%26%20Mac
   :target: https://circleci.com/gh/peter-wangxu/persist-queue
.. image:: https://img.shields.io/appveyor/ci/peter-wangxu/persist-queue/master.svg?label=Windows
   :target: https://ci.appveyor.com/project/peter-wangxu/persist-queue
.. image:: https://img.shields.io/codecov/c/github/peter-wangxu/persist-queue/master.svg
   :target: https://codecov.io/gh/peter-wangxu/persist-queue
.. image:: https://img.shields.io/pypi/v/persist-queue.svg
   :target: https://pypi.python.org/pypi/persist-queue

``persist-queue`` implements a file-based queue and a series of sqlite3-based queues. The goal is to meet the following requirements:

* Disk-based: each queued item is stored on disk so it survives a crash.
* Thread-safe: can be used by multi-threaded producers and multi-threaded consumers.
* Recoverable: items can be read after a process restart.
* Green-compatible: can be used in a ``greenlet`` or ``eventlet`` environment.

*queuelib* and *python-pqueue* cannot fulfil all of the above, and after some experimenting I found it hard to achieve these goals on top of their current implementations without huge code changes; this was the motivation to start this project.

*persist-queue* uses the *pickle* object serialization module to support object instances. Most built-in types, like `int`, `dict` and `list`, can be persisted by `persist-queue` directly; to support customized objects, please refer to `Pickling and unpickling extension types(Python2) `_ and `Pickling Class Instances(Python3) `_
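For instance, a user-defined class can be queued directly as long as it is picklable and importable at unpickling time; a minimal sketch (the ``Task`` class below is purely illustrative, not part of the library):

.. code-block:: python

    >>> import persistqueue
    >>> class Task(object):  # hypothetical user-defined class
    ...     def __init__(self, task_id):
    ...         self.task_id = task_id
    ...
    >>> q = persistqueue.SQLiteQueue('mypath', auto_commit=True)
    >>> q.put(Task(42))  # the instance is pickled transparently
    >>> q.get().task_id
    42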
This project is based on the achievements of `python-pqueue `_ and `queuelib `_

Requirements
------------
* Python 2.7 or Python 3.x.
* Full support for Linux.
* Windows support (with `Caution`_ if ``persistqueue.Queue`` is used).

Installation
------------

from pypi
^^^^^^^^^

.. code-block:: console

    pip install persist-queue

from source code
^^^^^^^^^^^^^^^^

.. code-block:: console

    git clone https://github.com/peter-wangxu/persist-queue
    cd persist-queue
    python setup.py install

Benchmark
---------

Here are the results (in seconds) for writing/reading **1000** items to disk, comparing the sqlite3 and file queues.

- Windows
    - OS: Windows 10
    - Disk: SATA3 SSD
    - RAM: 16 GiB

+---------------+---------+-------------------------+----------------------------+
|               | Write   | Write/Read(1 task_done) | Write/Read(many task_done) |
+---------------+---------+-------------------------+----------------------------+
| SQLite3 Queue | 1.8880  | 2.0290                  | 3.5940                     |
+---------------+---------+-------------------------+----------------------------+
| File Queue    | 15.0550 | 15.9150                 | 30.7650                    |
+---------------+---------+-------------------------+----------------------------+

- Linux
    - OS: Ubuntu 16.04 (VM)
    - Disk: SATA3 SSD
    - RAM: 4 GiB

+---------------+--------+-------------------------+----------------------------+
|               | Write  | Write/Read(1 task_done) | Write/Read(many task_done) |
+---------------+--------+-------------------------+----------------------------+
| SQLite3 Queue | 1.8282 | 1.8075                  | 2.8639                     |
+---------------+--------+-------------------------+----------------------------+
| File Queue    | 0.9123 | 1.0411                  | 2.5104                     |
+---------------+--------+-------------------------+----------------------------+

**note**

The results above were obtained with:

.. code-block:: console

    python benchmark/run_benchmark.py 1000

To see the real performance on your host, run the script ``benchmark/run_benchmark.py``:

.. code-block:: console

    python benchmark/run_benchmark.py

Examples
--------

Example usage with a SQLite3 based queue
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    >>> import persistqueue
    >>> q = persistqueue.SQLiteQueue('mypath', auto_commit=True)
    >>> q.put('str1')
    >>> q.put('str2')
    >>> q.put('str3')
    >>> q.get()
    'str1'
    >>> del q

Close the console, and then recreate the queue:

.. code-block:: python

    >>> import persistqueue
    >>> q = persistqueue.SQLiteQueue('mypath', auto_commit=True)
    >>> q.get()
    'str2'
    >>>

Example usage of SQLite3 based ``UniqueQ``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This queue does not allow duplicate items.

.. code-block:: python

    >>> import persistqueue
    >>> q = persistqueue.UniqueQ('mypath')
    >>> q.put('str1')
    >>> q.put('str1')
    >>> q.size
    1
    >>> q.put('str2')
    >>> q.size
    2
    >>>

Example usage of SQLite3 based ``SQLiteAckQueue``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The core functions:

``get``: fetch an item from the queue and mark it as unack.

``ack``: mark the item as acked.

``nack``: something might be wrong with the current consumer, so mark the item as ready again; a new consumer will get it.

``ack_failed``: something might have gone wrong during processing, so just mark the item as failed.

.. code-block:: python

    >>> import persistqueue
    >>> ackq = persistqueue.SQLiteAckQueue('path')
    >>> ackq.put('str1')
    >>> item = ackq.get()
    >>> # Do something with the item
    >>> ackq.ack(item)         # If done with the item
    >>> ackq.nack(item)        # Else mark the item as `nack` so that it can be processed again by any worker
    >>> ackq.ack_failed(item)  # Or else mark the item as `ack_failed` to discard it

Note: this queue does not support ``auto_commit=True``
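Putting these primitives together, a consumer loop might look like the following sketch. ``handle`` is a hypothetical user-supplied function, not part of the library:

.. code-block:: python

    import persistqueue

    ackq = persistqueue.SQLiteAckQueue('path')

    def handle(item):
        """Hypothetical worker; raises an exception on failure."""
        print(item)

    while True:
        item = ackq.get()    # the item is now in the `unack` state
        try:
            handle(item)
        except Exception:
            ackq.nack(item)  # back to `ready`; another worker may get it
        else:
            ackq.ack(item)   # done with the item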
Example usage with a file based queue
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    >>> from persistqueue import Queue
    >>> q = Queue("mypath")
    >>> q.put('a')
    >>> q.put('b')
    >>> q.put('c')
    >>> q.get()
    'a'
    >>> q.task_done()

Close the python console, then restart the queue from the same path:

.. code-block:: python

    >>> from persistqueue import Queue
    >>> q = Queue('mypath')
    >>> q.get()
    'b'
    >>> q.task_done()

Example usage with a SQLite3 based dict
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    >>> from persistqueue import PDict
    >>> q = PDict("testpath", "testname")
    >>> q['key1'] = 123
    >>> q['key2'] = 321
    >>> q['key1']
    123
    >>> len(q)
    2
    >>> del q['key1']
    >>> q['key1']
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "persistqueue\pdict.py", line 58, in __getitem__
        raise KeyError('Key: {} not exists.'.format(item))
    KeyError: 'Key: key1 not exists.'

Close the console and restart the PDict:

.. code-block:: python

    >>> from persistqueue import PDict
    >>> q = PDict("testpath", "testname")
    >>> q['key2']
    321

Multi-thread usage for **SQLite3** based queue
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    from persistqueue import FIFOSQLiteQueue

    q = FIFOSQLiteQueue(path="./test", multithreading=True)

    def worker():
        while True:
            item = q.get()
            do_work(item)

    for i in range(num_worker_threads):
        t = Thread(target=worker)
        t.daemon = True
        t.start()

    for item in source():
        q.put(item)

multi-thread usage for **Queue**
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    from persistqueue import Queue

    q = Queue()

    def worker():
        while True:
            item = q.get()
            do_work(item)
            q.task_done()

    for i in range(num_worker_threads):
        t = Thread(target=worker)
        t.daemon = True
        t.start()

    for item in source():
        q.put(item)

    q.join()  # block until all tasks are done

Tips
----
``task_done`` is required both for the file-based queue and the SQLite3-based queue (when ``auto_commit=False``) to persist the position of the next ``get`` to disk.

Performance impact
------------------

- **WAL**

  Starting with v0.3.2, ``persistqueue`` leverages the sqlite3 builtin feature `WAL `_, which can improve performance significantly; general testing indicates that ``persistqueue`` is 2-4 times faster than the previous version.

- **auto_commit=False**

  Since persistqueue v0.3.0, a new parameter ``auto_commit`` is introduced to tweak the performance of the sqlite3-based queues as needed. When ``auto_commit=False`` is specified, the user needs to perform ``queue.task_done()`` to persist to disk the changes made since the last ``task_done`` invocation (see the sketch after this list).

- **pickle protocol selection**

  From v0.3.6, ``persistqueue`` selects pickle ``Protocol version 2`` for python2 and ``Protocol version 4`` for python3 respectively. This selection only happens when the directory is not present when initializing the queue.
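As an illustration of the ``auto_commit=False`` trade-off, a batch of puts can be persisted with a single ``task_done``, amortizing the commit cost. A minimal sketch; the batch size of 100 is arbitrary:

.. code-block:: python

    import persistqueue

    q = persistqueue.SQLiteQueue('mypath', auto_commit=False)
    for i in range(100):
        q.put('item%d' % i)
    q.task_done()  # one commit persists all 100 puts at once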
Tests
-----

*persist-queue* uses ``tox`` to trigger tests.

- Unit test

  .. code-block:: console

      tox -e <env>

  Available ``<env>`` values: ``py27``, ``py34``, ``py35``, ``py36``, ``py37``

- PEP8 check

  .. code-block:: console

      tox -e pep8

`pyenv `_ is usually a helpful tool to manage multiple versions of Python.

Caution
-------

Currently, atomic renames are not supported on Windows due to the limitation of Python's `os.rename `_. That is to say, the data in ``persistqueue.Queue`` could be left in an unreadable state if an incidental failure occurs during ``Queue.task_done``.

**DO NOT put any critical data in persistqueue.Queue on Windows**.

This issue is tracked by `Atomic renames on windows `_

Contribution
------------

Simply fork this repo and send a PR for your code change (plus tests to cover your change); remember to give your PR a title and description. I am willing to enhance this project with you :).

License
-------

`BSD `_

Contributors
------------

`Contributors `_

FAQ
---

* ``sqlite3.OperationalError: database is locked`` is raised.

  persistqueue opens 2 connections to the db when ``multithreading=True``; the SQLite database is locked until the pending transaction is committed. The ``timeout`` parameter specifies how long a connection should wait for the lock to go away before raising an exception. The default is **10**; increase ``timeout`` when creating the queue if the above error occurs (see the sketch after this list).

* sqlite3 based queues are not thread-safe.

  The sqlite3 queues are heavily tested in multi-threading environments; if you find one is not thread-safe, please make sure you set ``multithreading=True`` when initializing the queue before submitting a new issue :).
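For the first item above, a queue shared by multiple threads could be created as in this sketch (``timeout=30`` is an arbitrary illustrative value):

.. code-block:: python

    import persistqueue

    # multithreading=True opens the extra connection mentioned above;
    # a larger timeout gives writers more time to release the lock
    # before "database is locked" is raised (default is 10).
    q = persistqueue.SQLiteQueue('mypath',
                                 multithreading=True,
                                 timeout=30)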
persist-queue-0.4.0/persist_queue.egg-info/0000755000076500000240000000000013311454610021643 5ustar yuzhi.wxstaff00000000000000
persist-queue-0.4.0/persist_queue.egg-info/SOURCES.txt0000644000076500000240000000103313311454610023524 0ustar yuzhi.wxstaff00000000000000
LICENSE
MANIFEST.in
README.rst
requirements.txt
setup.cfg
setup.py
test-requirements.txt
persist_queue.egg-info/PKG-INFO
persist_queue.egg-info/SOURCES.txt
persist_queue.egg-info/dependency_links.txt
persist_queue.egg-info/top_level.txt
persistqueue/__init__.py
persistqueue/common.py
persistqueue/exceptions.py
persistqueue/pdict.py
persistqueue/queue.py
persistqueue/sqlackqueue.py
persistqueue/sqlbase.py
persistqueue/sqlqueue.py
tests/__init__.py
tests/test_pdict.py
tests/test_queue.py
tests/test_sqlackqueue.py
tests/test_sqlqueue.py
persist-queue-0.4.0/persist_queue.egg-info/top_level.txt0000644000076500000240000000002313311454607024376 0ustar yuzhi.wxstaff00000000000000
persistqueue
tests
persist-queue-0.4.0/persist_queue.egg-info/dependency_links.txt0000644000076500000240000000000113311454607025717 0ustar yuzhi.wxstaff00000000000000